Chapter 9. Concurrency with Shared Variables
The previous chapter presented several programs that use goroutines and channels to express concurrency in a direct and natural way.
This chapter looks at the mechanics of concurrency, and in particular discusses:
- Some of the problems associated with sharing variables among multiple goroutines.
- The analytical techniques for recognizing those problems.
- The patterns for solving them.
- Some of the technical differences between goroutines and operating system threads.
Race Conditions
In a sequential program, that is, a program with only one goroutine, the steps of the program happen in the familiar execution order determined by the program logic. For instance, in a sequence of statements, the first one happens before the second one, and so on.
In a program with two or more goroutines, the steps within each goroutine happen in the familiar order, but in general we don't know whether an event x in one goroutine happens before an event y in another goroutine, or happens after it, or is simultaneous with it. When we cannot confidently say that one event happens before the other, then the events x and y are concurrent.
Consider a function that works correctly in a sequential program. That function is concurrency-safe if it continues to work correctly even when called concurrently (called from two or more goroutines with no additional synchronization).
This notion can be generalized to a set of collaborating functions, such as the methods and operations of a particular type. A type is concurrency-safe if all its accessible methods and operations are concurrency-safe.
We can make a program concurrency-safe without making every concrete type in that program concurrency-safe. Indeed, concurrency-safe types are the exception rather than the rule, so you should access a variable concurrently only if the documentation for its type says that this is safe. We avoid concurrent access to most variables by either of the following:
- Confining them to a single goroutine
- Maintaining a higher-level invariant of mutual exclusion
In contrast, exported package-level functions are generally expected to be concurrency-safe. Since package-level variables cannot be confined to a single goroutine, functions that modify them must enforce mutual exclusion.
There are many reasons a function might not work when called concurrently, including:
- Deadlock
- Livelock
- Resource starvation

This chapter focuses on the most important one: the race condition.
A race condition is a situation in which the program does not give the correct result for some interleavings of the operations of multiple goroutines. Race conditions are pernicious because they may remain latent in a program and appear infrequently, perhaps only under heavy load or when using certain compilers, platforms, or architectures. This makes them hard to reproduce and diagnose.
The following example explains the seriousness of race conditions through the metaphor of financial loss:
```go
// Package bank implements a bank with only one account.
package bank

var balance int

func Deposit(amount int) { balance = balance + amount }
func Balance() int       { return balance }
```
As a sequential program, any sequence of calls to `Deposit` and `Balance` will give the right answer, that is, `Balance` will report the sum of all amounts previously deposited. However, if we call these functions not in a sequence but concurrently, `Balance` is no longer guaranteed to give the right answer. Consider the following two goroutines, which represent two transactions on a joint bank account:
```go
// Alice:
go func() {
	bank.Deposit(200)                // A1
	fmt.Println("=", bank.Balance()) // A2
}()

// Bob:
go bank.Deposit(100) // B
```
Alice deposits $200, then checks her balance, while Bob deposits $100. Since the steps `A1` and `A2` occur concurrently with `B`, we cannot predict the order in which they happen.
Intuitively, it might seem that there are only three possible orderings:
- "Alice first"
- "Bob first"
- "Alice/Bob/Alice"
[p259]
In all cases the final balance is $300. The only variation is whether Alice's balance slip includes Bob's transaction or not, but the customers are satisfied either way.
However, this intuition is wrong. There is a fourth possible outcome, in which Bob's deposit occurs in the middle of Alice's deposit, after the balance has been read (`balance + amount`) but before it has been updated (`balance = ...`), causing Bob's transaction to disappear. This is because Alice's deposit operation `A1` is really a sequence of two operations, a read and a write; call them `A1r` and `A1w`. The following is the problematic interleaving:
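A sketch of this interleaving, reconstructed from the surrounding description and assuming the balance starts at 0:

```
A1r      0      ... = balance + amount   // Alice reads balance (0)
B      100      balance = 100            // Bob's deposit
A1w    200      balance = 200            // Alice's write loses Bob's $100
```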
After `A1r`, the expression `balance + amount` evaluates to 200, so this is the value written during `A1w`, despite the intervening deposit.
This program contains a particular kind of race condition called a data race, defined as follows:
A data race occurs whenever two goroutines access the same variable concurrently and at least one of the accesses is a write.
Things get even messier if the data race involves a variable of a type that is larger than a single machine word, such as an interface, a string, or a slice. The following code concurrently updates `x` to two slices of different lengths:
```go
var x []int
go func() { x = make([]int, 10) }()
go func() { x = make([]int, 1000000) }()
x[999999] = 1 // NOTE: undefined behavior; memory corruption possible!
```
The value of `x` in the final statement is not defined. It could be any of the following:
- `nil`
- A slice of length 10
- A slice of length 1,000,000
Recall that there are three parts to a slice: the pointer, the length, and the capacity. If the pointer comes from the first call to `make` and the length comes from the second, `x` would be a slice whose nominal length is 1,000,000 but whose underlying array has only 10 elements. In this case, storing to element 999,999 would clobber an arbitrary faraway memory location, with consequences that are impossible to predict and hard to debug and localize. This semantic minefield is called undefined behavior and is well known to C programmers; fortunately it is rarely as troublesome in Go as in C.
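For intuition, a slice value behaves like the small header below (a conceptual sketch, not the runtime's actual definition); a data race can pair the pointer from one write with the length from another:

```go
type sliceHeader struct {
	ptr *int // pointer to the first element of the underlying array
	len int  // nominal length
	cap int  // capacity of the underlying array
}
```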
Even the notion that a concurrent program is an interleaving of several sequential programs is a false intuition. Section 9.4 will show that data races may have even stranger outcomes. Many programmers will occasionally offer justifications for known data races in their programs. The absence of problems on a given compiler and platform may give them false confidence. A good rule of thumb is that there is no such thing as a benign data race. So how do we avoid data races in our programs?
Avoiding a data race *
There are three ways to avoid a data race.
Avoid writing the variable *
The first way is not to write the variable. Consider the map below, which is lazily populated as each key is requested for the first time. If `Icon` is called sequentially, the program works fine, but if `Icon` is called concurrently, there is a data race accessing the map.
```go
var icons = make(map[string]image.Image)

func loadIcon(name string) image.Image

// NOTE: not concurrency-safe!
func Icon(name string) image.Image {
	icon, ok := icons[name]
	if !ok {
		icon = loadIcon(name)
		icons[name] = icon
	}
	return icon
}
```
If instead we initialize the map with all necessary entries before creating additional goroutines and never modify it again, then any number of goroutines may safely call `Icon` concurrently, since each only reads the map.
```go
var icons = map[string]image.Image{
	"spades.png":   loadIcon("spades.png"),
	"hearts.png":   loadIcon("hearts.png"),
	"diamonds.png": loadIcon("diamonds.png"),
	"clubs.png":    loadIcon("clubs.png"),
}

// Concurrency-safe.
func Icon(name string) image.Image { return icons[name] }
```
In the example above, the `icons` variable is assigned during package initialization, which happens before the program's `main` function starts running. Once initialized, `icons` is never modified. Data structures that are never modified or are immutable are inherently concurrency-safe and need no synchronization. Of course, this approach cannot be used if updates are essential.
Avoid accessing the variable from multiple goroutines *
The second way to avoid a data race is to avoid accessing the variable from multiple goroutines. This is the approach taken by many of the programs in the previous chapter, for example:
- The main goroutine in the concurrent web crawler (Section 8.6) is the sole goroutine that accesses the `seen` map.
- The `broadcaster` goroutine in the chat server (Section 8.10) is the only goroutine that accesses the `clients` map.
These variables are confined to a single goroutine.
Since other goroutines cannot access the variable directly, they must use a channel to send the confining goroutine a request to query or update the variable. This is what is meant by the Go mantra:
"Do not communicate by sharing memory; instead, share memory by communicating."
A goroutine that brokers access to a confined variable using channel requests is called a monitor goroutine for that variable. For example, the `broadcaster` goroutine monitors access to the `clients` map.
The following is the `bank` example rewritten with the `balance` variable confined to a monitor goroutine called `teller`:
```go
// Package bank provides a concurrency-safe bank with one account.
package bank

var deposits = make(chan int) // send amount to deposit
var balances = make(chan int) // receive balance

func Deposit(amount int) { deposits <- amount }
func Balance() int       { return <-balances }

func teller() {
	var balance int // balance is confined to teller goroutine
	for {
		select {
		case amount := <-deposits:
			balance += amount
		case balances <- balance:
		}
	}
}

func init() {
	go teller() // start the monitor goroutine
}
```
Even when a variable cannot be confined to a single goroutine for its entire lifetime, confinement may still be a solution to the problem of concurrent access. For example, it's common to share a variable between goroutines in a pipeline by passing its address from one stage to the next over a channel. If each stage of the pipeline refrains from accessing the variable after sending it to the next stage, then all accesses to the variable are sequential. In effect, the variable is confined to one stage of the pipeline, then confined to the next, and so on. This discipline is sometimes called serial confinement.
In the example below, `Cake`s are serially confined, first to the `baker` goroutine, then to the `icer` goroutine:
```go
type Cake struct{ state string }

func baker(cooked chan<- *Cake) {
	for {
		cake := new(Cake)
		cake.state = "cooked"
		cooked <- cake // baker never touches this cake again
	}
}

func icer(iced chan<- *Cake, cooked <-chan *Cake) {
	for cake := range cooked {
		cake.state = "iced"
		iced <- cake // icer never touches this cake again
	}
}
```
Allow only one goroutine to access the variable at a time *
The third way to avoid a data race is to allow many goroutines to access the variable, but only one at a time. This approach is known as mutual exclusion and is the subject of the next section.
Mutual Exclusion: `sync.Mutex`
Section 8.6 uses a buffered channel as a counting semaphore to ensure that no more than 20 goroutines make simultaneous HTTP requests. With the same idea, we can use a channel of capacity 1 to ensure that at most one goroutine accesses a shared variable at a time. A semaphore that counts only to 1 is called a binary semaphore.
```go
var (
	sema    = make(chan struct{}, 1) // a binary semaphore guarding balance
	balance int
)

func Deposit(amount int) {
	sema <- struct{}{} // acquire token
	balance = balance + amount
	<-sema // release token
}

func Balance() int {
	sema <- struct{}{} // acquire token
	b := balance
	<-sema // release token
	return b
}
```
This pattern of mutual exclusion is so useful that it is supported directly by the `Mutex` type from the `sync` package. Its `Lock` method acquires the token (called a lock) and its `Unlock` method releases it:
```go
import "sync"

var (
	mu      sync.Mutex // guards balance
	balance int
)

func Deposit(amount int) {
	mu.Lock()
	balance = balance + amount
	mu.Unlock()
}

func Balance() int {
	mu.Lock()
	b := balance
	mu.Unlock()
	return b
}
```
Each time a goroutine accesses `balance`, it must call the mutex's `Lock` method to acquire an exclusive lock. If some other goroutine has acquired the lock, this operation will block until the other goroutine calls `Unlock` and the lock becomes available again. The mutex guards the shared variables. By convention, the variables guarded by a mutex are declared immediately after the declaration of the mutex itself; if you deviate from this, be sure to document it.
The region of code between `Lock` and `Unlock`, in which a goroutine is free to read and modify the shared variables, is called a critical section. The lock holder's call to `Unlock` happens before any other goroutine can acquire the lock for itself. It is essential that the goroutine release the lock once it is finished, on all paths through the function, including error paths.
The bank program above exemplifies a common concurrency pattern:
- A set of exported functions encapsulates one or more variables so that the only way to access the variables is through these functions (or methods, for the variables of an object).
- Each function acquires a mutex lock at the beginning and releases it at the end, thereby ensuring that the shared variables are not accessed concurrently.
This arrangement of functions, mutex lock, and variables is called a monitor. This older use of the word "monitor" inspired the term "monitor goroutine". Both uses share the meaning of a broker that ensures variables are accessed sequentially.
In more complex critical sections, especially those in which errors must be dealt with by returning early, it can be hard to tell that calls to `Lock` and `Unlock` are strictly paired on all paths. Go's `defer` statement comes to the rescue: by deferring the call to `Unlock`, the critical section implicitly extends to the end of the current function.
```go
func Balance() int {
	mu.Lock()
	defer mu.Unlock()
	return balance
}
```
In the example above:
- The deferred `Unlock` executes after the `return` statement has read the value of `balance`, so the `Balance` function is concurrency-safe. Also, the local variable `b` is no longer needed.
- Furthermore, a deferred `Unlock` will run even if the critical section panics, which may be important in programs that make use of `recover` (Section 5.10).

A `defer` is marginally more expensive than an explicit call to `Unlock`, but not enough to justify less clear code. When dealing with concurrent programs, always favor clarity and resist premature optimization. Where possible, use `defer` and extend critical sections to the end of a function.
In the `Withdraw` function below:
- On success, it reduces the balance by the specified amount and returns `true`.
- If the account holds insufficient funds for the transaction, `Withdraw` restores the balance and returns `false`.
```go
// NOTE: not atomic!
func Withdraw(amount int) bool {
	Deposit(-amount)
	if Balance() < 0 {
		Deposit(amount)
		return false // insufficient funds
	}
	return true
}
```
This function eventually gives the correct result, but it has a side effect. When an excessive withdrawal is attempted, the balance transiently dips below zero. This may cause a concurrent withdrawal for a modest sum to be spuriously rejected. So if Bob tries to buy a sports car, Alice can't pay for her morning coffee.
The problem is that `Withdraw` is not atomic: it consists of a sequence of three separate operations, each of which acquires and then releases the mutex lock, but nothing locks the whole sequence. Ideally, `Withdraw` should acquire the mutex lock once around the whole operation. However, this attempt won't work:
```go
// NOTE: incorrect!
func Withdraw(amount int) bool {
	mu.Lock()
	defer mu.Unlock()
	Deposit(-amount)
	if Balance() < 0 {
		Deposit(amount)
		return false // insufficient funds
	}
	return true
}
```
`Deposit` tries to acquire the mutex lock a second time by calling `mu.Lock()`, but because mutex locks are not re-entrant, it's not possible to lock a mutex that's already locked. This leads to a deadlock where nothing can proceed, and `Withdraw` blocks forever.
There is a good reason Go's mutexes are not re-entrant. The purpose of a mutex is to ensure that certain invariants of the shared variables are maintained at critical points during program execution. One of the invariants is "no goroutine is accessing the shared variables", but there may be additional invariants specific to the data structures that the mutex guards. When a goroutine acquires a mutex lock, it may assume that the invariants hold. While it holds the lock, it may update the shared variables so that the invariants are temporarily violated. However, when it releases the lock, it must guarantee that order has been restored and the invariants hold once again. Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.
A common solution is to divide a function such as `Deposit` into two:
- An unexported function, `deposit`, which assumes the lock is already held and does the real work
- An exported function `Deposit` that acquires the lock before calling `deposit`

The rewritten `Withdraw` in terms of `deposit` looks like this:
```go
func Withdraw(amount int) bool {
	mu.Lock()
	defer mu.Unlock()
	deposit(-amount)
	if balance < 0 {
		deposit(amount)
		return false // insufficient funds
	}
	return true
}

func Deposit(amount int) {
	mu.Lock()
	defer mu.Unlock()
	deposit(amount)
}

func Balance() int {
	mu.Lock()
	defer mu.Unlock()
	return balance
}

// This function requires that the lock be held.
func deposit(amount int) { balance += amount }
```
Encapsulation (Section 6.6), by reducing unexpected interactions in a program, helps us maintain data structure invariants. For the same reason, encapsulation also helps us maintain concurrency invariants. When you use a mutex, make sure that both it and the variables it guards are not exported, whether they are package-level variables or the fields of a struct.
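As a minimal sketch of this advice (the `Counter` type here is invented for illustration, not taken from the book), both the mutex and the variable it guards are unexported, so clients can reach the variable only through the exported methods:

```go
package counter

import "sync"

// Counter is safe for concurrent use. Because mu and n are
// unexported, the only way to touch n is through the methods,
// each of which holds the lock.
type Counter struct {
	mu sync.Mutex // guards n
	n  int
}

func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}
```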
Read/Write Mutexes: `sync.RWMutex`
Since the `Balance` function only needs to read the state of the variable, it would actually be safe for multiple `Balance` calls to run concurrently, as long as no `Deposit` or `Withdraw` call is running. In this scenario we need a special kind of lock that allows read-only operations to proceed in parallel with each other, but write operations to have fully exclusive access. This lock is called a multiple readers, single writer lock, and it is provided by `sync.RWMutex` in Go:
```go
var mu sync.RWMutex
var balance int

func Balance() int {
	mu.RLock() // readers lock
	defer mu.RUnlock()
	return balance
}
```
- The `Balance` function now calls the `RLock` and `RUnlock` methods to acquire and release a reader or shared lock.
- The `Deposit` function, which is unchanged, calls the `mu.Lock` and `mu.Unlock` methods to acquire and release a writer or exclusive lock.
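For reference, a sketch of the unchanged `Deposit` against the same `mu`, now of type `sync.RWMutex` (the section shows only `Balance`):

```go
func Deposit(amount int) {
	mu.Lock() // writers (exclusive) lock
	balance = balance + amount
	mu.Unlock()
}
```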
`RLock` can be used only if there are no writes to shared variables in the critical section. In general, we should not assume that logically read-only functions or methods don't also update some variables. For example, a method that appears to be a simple accessor might also increment an internal usage counter, or update a cache so that repeat calls are faster. If in doubt, use an exclusive `Lock`.
It's only profitable to use an `RWMutex` when most of the goroutines that acquire the lock are readers, and the lock is under contention, that is, goroutines routinely have to wait to acquire it. An `RWMutex` requires more complex internal bookkeeping, making it slower than a regular mutex for uncontended locks.
Memory Synchronization
The previous examples show that the `Balance` method needs mutual exclusion, either channel-based or mutex-based, even though, unlike `Deposit`, it consists of only a single operation, so there is no danger of another goroutine executing "in the middle" of it. There are two reasons we need a mutex:
- It's equally important that `Balance` not execute in the middle of some other operation like `Withdraw`.
- Synchronization is about more than just the order of execution of multiple goroutines; synchronization also affects memory.
In a modern computer there may be dozens of processors, each with its own local cache of the main memory. For efficiency, writes to memory are buffered within each processor and flushed out to main memory only when necessary. They may even be committed to main memory in a different order than they were written by the writing goroutine. Synchronization primitives like channel communications and mutex operations cause the processor to flush out and commit all its accumulated writes so that the effects of goroutine execution up to that point are guaranteed to be visible to goroutines running on other processors.
Consider the possible outputs of the following code:
```go
var x, y int
go func() {
	x = 1                   // A1
	fmt.Print("y:", y, " ") // A2
}()
go func() {
	y = 1                   // B1
	fmt.Print("x:", x, " ") // B2
}()
```
Since these two goroutines are concurrent and access shared variables without mutual exclusion, there is a data race, so the program is not deterministic. We might expect it to print any one of these four results, which correspond to intuitive interleavings of the labeled statements of the program:
```
y:0 x:1
x:0 y:1
x:1 y:1
y:1 x:1
```
The fourth line above could be explained by the sequence `A1,B1,A2,B2` or by `B1,A1,A2,B2`. However, the following two outcomes might also happen, depending on the compiler, CPU, and many other factors:
```
x:0 y:0
y:0 x:0
```
Within a single goroutine, the effects of each statement are guaranteed to occur in the order of execution; goroutines are sequentially consistent. But in the absence of explicit synchronization using a channel or mutex, there is no guarantee that events are seen in the same order by all goroutines. Although goroutine A must observe the effect of the write `x = 1` before it reads the value of `y`, it does not necessarily observe the write to `y` done by goroutine B, so A may print a stale value of `y`.
It is tempting to try to understand concurrency as if it corresponds to some interleaving of the statements of each goroutine, but as the example above shows, this is not how a modern compiler or CPU works:
- Because the assignment and the `Print` refer to different variables, a compiler may conclude that the order of the two statements cannot affect the result, and swap them.
- If the two goroutines execute on different CPUs, each with its own cache, writes by one goroutine are not visible to the other goroutine's `Print` until the caches are synchronized with main memory.
All these concurrency problems can be avoided by the consistent use of simple, established patterns. Where possible, confine variables to a single goroutine; for all other variables, use mutual exclusion.
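As a sketch of that advice applied to the example above (not from the book), guarding `x` and `y` with a single mutex rules out the `x:0 y:0` and `y:0 x:0` outcomes, because whichever critical section runs second is guaranteed to observe the first one's write:

```go
var (
	mu   sync.Mutex // guards x and y
	x, y int
)

go func() {
	mu.Lock()
	x = 1                   // A1
	fmt.Print("y:", y, " ") // A2
	mu.Unlock()
}()
go func() {
	mu.Lock()
	y = 1                   // B1
	fmt.Print("x:", x, " ") // B2
	mu.Unlock()
}()
```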
Lazy Initialization: `sync.Once`
Initializing a variable up front increases the start-up latency of a program and is unnecessary if execution doesn't always reach the part of the program that uses that variable. Therefore, it is good practice to defer an expensive initialization step until it is needed. Consider the `icons` variable discussed earlier in the chapter:
```go
var icons map[string]image.Image
```
The following version of `Icon` uses lazy initialization:
```go
func loadIcons() {
	icons = map[string]image.Image{
		"spades.png":   loadIcon("spades.png"),
		"hearts.png":   loadIcon("hearts.png"),
		"diamonds.png": loadIcon("diamonds.png"),
		"clubs.png":    loadIcon("clubs.png"),
	}
}

// NOTE: not concurrency-safe!
func Icon(name string) image.Image {
	if icons == nil {
		loadIcons() // one-time initialization
	}
	return icons[name]
}
```
For a variable accessed by only a single goroutine, we can use the pattern above, but this pattern is not safe if `Icon` is called concurrently. Like the bank's original `Deposit` function, `Icon` consists of multiple steps: it tests whether `icons` is nil, then it loads the icons, then it updates `icons` to a non-nil value.
Intuition might suggest that the worst possible outcome of the race condition above is that the `loadIcons` function is called several times: while the first goroutine is busy loading the icons, another goroutine entering `Icon` would find the variable still equal to nil and would also call `loadIcons`. However, this intuition is also wrong. As discussed in Section 9.4, in the absence of explicit synchronization, the compiler and CPU are free to reorder accesses to memory in any number of ways, so long as the behavior of each goroutine is sequentially consistent. One possible reordering of the statements of `loadIcons` is shown below. It stores the empty map in the `icons` variable before populating it:
```go
func loadIcons() {
	icons = make(map[string]image.Image)
	icons["spades.png"] = loadIcon("spades.png")
	icons["hearts.png"] = loadIcon("hearts.png")
	icons["diamonds.png"] = loadIcon("diamonds.png")
	icons["clubs.png"] = loadIcon("clubs.png")
}
```
Consequently, a goroutine finding `icons` to be non-nil may not assume that the initialization of the variable is complete.

The simplest correct way to ensure that all goroutines observe the effects of `loadIcons` is to synchronize them using a mutex:
```go
var mu sync.Mutex // guards icons
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
	mu.Lock()
	defer mu.Unlock()
	if icons == nil {
		loadIcons()
	}
	return icons[name]
}
```
However, the cost of enforcing mutually exclusive access to `icons` is that two goroutines cannot access the variable concurrently, even once the variable has been safely initialized and will never be modified again. This suggests a multiple-readers lock:
```go
var mu sync.RWMutex // guards icons
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
	mu.RLock()
	if icons != nil {
		icon := icons[name]
		mu.RUnlock()
		return icon
	}
	mu.RUnlock()

	// acquire an exclusive lock
	mu.Lock()
	if icons == nil { // NOTE: must recheck for nil
		loadIcons()
	}
	icon := icons[name]
	mu.Unlock()
	return icon
}
```
There are now two critical sections:
- The goroutine first acquires a reader lock, consults the map, then releases the lock.
- If an entry was found, it is returned. If no entry was found, the goroutine acquires a writer lock. There is no way to upgrade a shared lock to an exclusive one without first releasing the shared lock, so we must recheck the `icons` variable in case another goroutine already initialized it in the interim.
The pattern above has greater concurrency but is complex and thus error-prone. Fortunately, the `sync` package provides a specialized solution to the problem of one-time initialization: `sync.Once`. Conceptually, a `Once` consists of a mutex and a boolean variable that records whether initialization has taken place; the mutex guards both the boolean and the client's data structures. The sole method, `Do`, accepts the initialization function as its argument. The following `Icon` function uses `Once`:
```go
var loadIconsOnce sync.Once
var icons map[string]image.Image

// Concurrency-safe.
func Icon(name string) image.Image {
	loadIconsOnce.Do(loadIcons)
	return icons[name]
}
```
Each call to `Do(loadIcons)` locks the mutex and checks the boolean variable. In the first call, in which the variable is false, `Do` calls `loadIcons` and sets the variable to true. Subsequent calls do nothing, but the mutex synchronization ensures that the effects of `loadIcons` on memory (specifically, `icons`) become visible to all goroutines. Using `sync.Once` in this way, we can avoid sharing variables with other goroutines until they have been properly constructed.
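A conceptual sketch of `Once` along these lines (the real implementation in the `sync` package differs; it uses atomic operations to make the already-initialized path cheap):

```go
type Once struct {
	mu   sync.Mutex
	done bool
}

// Do calls f on the first invocation only. The mutex both
// serializes calls to Do and publishes f's effects on memory
// to every goroutine that calls Do afterward.
func (o *Once) Do(f func()) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if !o.done {
		f()
		o.done = true
	}
}
```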
The Race Detector
The Go runtime and toolchain are equipped with a sophisticated and easy-to-use dynamic analysis tool, the race detector. Add the `-race` flag to your `go build`, `go run`, or `go test` command. This causes the compiler to build a modified version of your application or test with additional instrumentation that effectively records all accesses to shared variables that occurred during execution, along with the identity of the goroutine that read or wrote the variable. In addition, the modified program records all synchronization events, such as `go` statements, channel operations, and calls to `(*sync.Mutex).Lock`, `(*sync.WaitGroup).Wait`, and so on. The complete set of synchronization events is specified by the Go Memory Model document that accompanies the language specification.
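For example (`main.go` is a placeholder; the package path is the one tested later in this section):

```
$ go build -race                    # build an instrumented binary
$ go run -race main.go              # run a program with instrumentation
$ go test -race gopl.io/ch9/memo1   # run tests under the race detector
```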
The race detector studies this stream of events, looking for cases in which one goroutine reads or writes a shared variable that was most recently written by a different goroutine without an intervening synchronization operation. This indicates a concurrent access to the shared variable, and thus a data race. The tool prints a report that includes the identity of the variable, and the stacks of active function calls in the reading goroutine and the writing goroutine. This is usually sufficient to pinpoint the problem. Section 9.7 contains an example of the race detector at work.
The race detector reports all data races that were actually executed. However, it can only detect races that occur during a run; it cannot prove that none will ever occur. For best results, make sure that your tests exercise your packages using concurrency. Due to the extra bookkeeping, a program built with race detection needs more time and memory to run, but the overhead is tolerable even for many production jobs. For infrequently occurring races, letting the race detector do its job can save hours or days of debugging.
Example: Concurrent Non-Blocking Cache
This section builds a concurrent non-blocking cache, an abstraction that solves a problem that arises in real-world concurrent programs but is not well addressed by existing libraries: memoizing a function, that is, caching the result of a function so that it need be computed only once.
The solution will be concurrency-safe and will avoid the contention associated with designs based on a single lock for the whole cache. We'll use the `httpGetBody` function below as an example of the function to memoize. It makes an HTTP GET request and reads the request body. Calls to this function are relatively expensive, so we want to avoid repeating them unnecessarily.
```go
func httpGetBody(url string) (interface{}, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return ioutil.ReadAll(resp.Body)
}
```
Notice the final line. `ReadAll` returns two results, a `[]byte` and an `error`, but since these are assignable to the declared result types of `httpGetBody`, `interface{}` and `error` respectively, we can return the result of the call without further ado. We chose this return type for `httpGetBody` so that it conforms to the type of functions that our cache is designed to memoize.
```go
// Package memo provides a concurrency-unsafe
// memoization of a function of type Func.
package memo

// A Memo caches the results of calling a Func.
type Memo struct {
	f     Func
	cache map[string]result
}

// Func is the type of the function to memoize.
type Func func(key string) (interface{}, error)

type result struct {
	value interface{}
	err   error
}

func New(f Func) *Memo {
	return &Memo{f: f, cache: make(map[string]result)}
}

// NOTE: not concurrency-safe!
func (memo *Memo) Get(key string) (interface{}, error) {
	res, ok := memo.cache[key]
	if !ok {
		res.value, res.err = memo.f(key)
		memo.cache[key] = res
	}
	return res.value, res.err
}
```
A `Memo` instance holds the function `f` to memoize, of type `Func`, and the `cache`, which is a mapping from strings to results. Each result is simply the pair of results returned by a call to `f`: a value and an error.
Below is an example of how to use `Memo`. For each element in a stream of incoming URLs, we call `Get`, logging the latency of the call and the amount of data it returns:
```go
m := memo.New(httpGetBody)
for url := range incomingURLs() {
	start := time.Now()
	value, err := m.Get(url)
	if err != nil {
		log.Print(err)
	}
	fmt.Printf("%s, %s, %d bytes\n",
		url, time.Since(start), len(value.([]byte)))
}
```
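The `incomingURLs` helper is not shown in the chapter; a hypothetical stand-in that feeds a fixed list of URLs through a channel, repeating each one so that the second lookup can hit the cache, might look like this:

```go
// incomingURLs is a stand-in for illustration only.
func incomingURLs() <-chan string {
	urls := []string{
		"https://golang.org", "https://godoc.org",
		"https://play.golang.org", "http://gopl.io",
	}
	ch := make(chan string)
	go func() {
		for i := 0; i < 2; i++ { // each URL appears twice
			for _, url := range urls {
				ch <- url
			}
		}
		close(ch)
	}()
	return ch
}
```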
We can use the `testing` package (the topic of Chapter 11) to systematically investigate the effect of memoization. The test output below shows that the URL stream contains duplicates, and that although the first call to `(*Memo).Get` for each URL takes hundreds of milliseconds, the second request returns the same amount of data in under a millisecond.
```
$ go test -v gopl.io/ch9/memo1
=== RUN   Test
https://golang.org, 175.026418ms, 7537 bytes
https://godoc.org, 172.686825ms, 6878 bytes
https://play.golang.org, 115.762377ms, 5767 bytes
http://gopl.io, 749.887242ms, 2856 bytes
https://golang.org, 721ns, 7537 bytes
https://godoc.org, 152ns, 6878 bytes
https://play.golang.org, 205ns, 5767 bytes
http://gopl.io, 326ns, 2856 bytes
--- PASS: Test (1.21s)
PASS
ok      gopl.io/ch9/memo1   1.257s
```
This test executes all calls to `Get` sequentially.

Since HTTP requests are a great opportunity for parallelism, the following test makes all requests concurrently. The test uses a `sync.WaitGroup` to wait until the last request is complete before returning.
gopl.io/ch9/memotest/memotest.go
```go
m := memo.New(httpGetBody)
var n sync.WaitGroup
for url := range incomingURLs() {
	n.Add(1)
	go func(url string) {
		start := time.Now()
		value, err := m.Get(url)
		if err != nil {
			log.Print(err)
		}
		fmt.Printf("%s, %s, %d bytes\n",
			url, time.Since(start), len(value.([]byte)))
		n.Done()
	}(url)
}
n.Wait()
```
The test runs much faster, but unfortunately it is unlikely to work correctly all the time. We may notice unexpected cache misses, or cache hits that return incorrect values, or even crashes.
Worse, it is likely to work correctly some of the time, so we may not even notice that it has a problem. But if we run it with the `-race` flag, the race detector (Section 9.6) often prints a report such as this one:
```
$ go test -run=TestConcurrent -race -v gopl.io/ch9/memo1
=== RUN   TestConcurrent
...
WARNING: DATA RACE
Write by goroutine 36:
  runtime.mapassign1()
      ~/go/src/runtime/hashmap.go:411 +0x0
  gopl.io/ch9/memo1.(*Memo).Get()
      ~/gobook2/src/gopl.io/ch9/memo1/memo.go:32 +0x205
  ...
Previous write by goroutine 35:
  runtime.mapassign1()
      ~/go/src/runtime/hashmap.go:411 +0x0
  gopl.io/ch9/memo1.(*Memo).Get()
      ~/gobook2/src/gopl.io/ch9/memo1/memo.go:32 +0x205
...
Found 1 data race(s)
FAIL    gopl.io/ch9/memo1   2.393s
```
The reference to `memo.go:32` tells us that two goroutines have updated the `cache` map without any intervening synchronization. `Get` is not concurrency-safe: it has a data race.
```go
28  func (memo *Memo) Get(key string) (interface{}, error) {
29      res, ok := memo.cache[key]
30      if !ok {
31          res.value, res.err = memo.f(key)
32          memo.cache[key] = res
33      }
34      return res.value, res.err
35  }
```
The simplest way to make the cache concurrency-safe is to use monitor-based synchronization: add a mutex to the `Memo`, acquire the mutex lock at the start of `Get`, and release it before `Get` returns, so that the two cache operations occur within the critical section:
```go
type Memo struct {
	f     Func
	mu    sync.Mutex // guards cache
	cache map[string]result
}

// Get is concurrency-safe.
func (memo *Memo) Get(key string) (value interface{}, err error) {
	memo.mu.Lock()
	res, ok := memo.cache[key]
	if !ok {
		res.value, res.err = memo.f(key)
		memo.cache[key] = res
	}
	memo.mu.Unlock()
	return res.value, res.err
}
```
Now the race detector is silent, even when running the tests concurrently. Unfortunately this change to `Memo` reverses our earlier performance gains. By holding the lock for the duration of each call to `f`, `Get` serializes all the I/O operations we intended to parallelize. What we need is a non-blocking cache, one that does not serialize calls to the function it memoizes.
In the following implementation of `Get`, the calling goroutine acquires the lock twice: once for the lookup, and then a second time for the update if the lookup returned nothing. In between, other goroutines are free to use the cache.
```go
func (memo *Memo) Get(key string) (value interface{}, err error) {
	memo.mu.Lock()
	res, ok := memo.cache[key]
	memo.mu.Unlock()
	if !ok {
		res.value, res.err = memo.f(key)

		// Between the two critical sections, several goroutines
		// may race to compute f(key) and update the map.
		memo.mu.Lock()
		memo.cache[key] = res
		memo.mu.Unlock()
	}
	return res.value, res.err
}
```
The performance improves again, but now we notice that some URLs are being fetched twice. This happens when two or more goroutines call `Get` for the same URL at about the same time. Both consult the cache, find no value there, and then call the slow function `f`. Then both of them update the map with the result they obtained; one of the results is overwritten by the other.
We need to avoid this redundant work. This feature is sometimes called duplicate suppression. In the version of `Memo` below, each map element is a pointer to an `entry` struct. Each `entry` contains the memoized result of a call to the function `f`, but it additionally contains a channel called `ready`. Right after the entry's result has been set, this channel will be closed, to broadcast (Section 8.9) to any other goroutines that it is now safe for them to read the result from the `entry`.
```go
type entry struct {
	res   result
	ready chan struct{} // closed when res is ready
}

func New(f Func) *Memo {
	return &Memo{f: f, cache: make(map[string]*entry)}
}

type Memo struct {
	f     Func
	mu    sync.Mutex // guards cache
	cache map[string]*entry
}

func (memo *Memo) Get(key string) (value interface{}, err error) {
	memo.mu.Lock()
	e := memo.cache[key]
	if e == nil {
		// This is the first request for this key.
		// This goroutine becomes responsible for computing
		// the value and broadcasting the ready condition.
		e = &entry{ready: make(chan struct{})}
		memo.cache[key] = e
		memo.mu.Unlock()

		e.res.value, e.res.err = memo.f(key)

		close(e.ready) // broadcast ready condition
	} else {
		// This is a repeat request for this key.
		memo.mu.Unlock()

		<-e.ready // wait for ready condition
	}
	return e.res.value, e.res.err
}
```
A call to `Get` now involves:
- Acquiring the mutex lock that guards the `cache` map
- Looking in the map for a pointer to an existing `entry`
- Allocating and inserting a new `entry` if none was found, then releasing the lock
But there are two cases this time:
- If there was an existing `entry` (in the `else` block), its value is not necessarily ready yet: another goroutine could still be calling the slow function `f`. Thus the calling goroutine must wait for the entry's "ready" condition before it reads the `entry`'s result. It does this by reading a value from the `ready` channel, since this operation blocks until the channel is closed.
- If there was no existing entry (in the `if e == nil` block), then by inserting a new "not ready" entry into the map, the current goroutine becomes responsible for invoking the slow function, updating the entry, and broadcasting the readiness of the new entry to any other goroutines that might (by then) be waiting for it.
Notice that the variables `e.res.value` and `e.res.err` in the `entry` are shared among multiple goroutines. The goroutine that creates the `entry` sets their values, and other goroutines read their values once the "ready" condition has been broadcast. Despite being accessed by multiple goroutines, no mutex lock is necessary. The closing of the `ready` channel happens before any other goroutine receives the broadcast event, so the write to those variables in the first goroutine happens before they are read by subsequent goroutines. There is no data race.
Our concurrent, duplicate-suppressing, non-blocking cache is complete.
The implementation of `Memo` above uses a mutex to guard a map variable that is shared by each goroutine that calls `Get`. It's interesting to contrast this design with an alternative one in which the map variable is confined to a monitor goroutine to which callers of `Get` must send a message.
The declarations of `Func`, `result`, and `entry` remain as before. However, the `Memo` type now consists of a channel, `requests`, through which the caller of `Get` communicates with the monitor goroutine. The element type of the channel is a `request`. Using this structure, the caller of `Get` sends the monitor goroutine both the key (the argument to the memoized function) and another channel, `response`, over which the result should be sent back when it becomes available. This channel will carry only a single value.
```go
// A request is a message requesting that the Func be applied to key.
type request struct {
	key      string
	response chan<- result // the client wants a single result
}

type Memo struct{ requests chan request }

// New returns a memoization of f. Clients must subsequently call Close.
func New(f Func) *Memo {
	memo := &Memo{requests: make(chan request)}
	go memo.server(f)
	return memo
}

func (memo *Memo) Get(key string) (interface{}, error) {
	response := make(chan result)
	memo.requests <- request{key, response}
	res := <-response
	return res.value, res.err
}

func (memo *Memo) Close() { close(memo.requests) }
```
The `Get` method above creates a `response` channel, puts it in the `request`, sends it to the monitor goroutine, then immediately receives from it.
The `cache` variable is confined to the monitor goroutine `(*Memo).server`, shown below. The monitor reads requests in a loop until the `requests` channel is closed by the `Close` method. For each request, it consults the cache, creating and inserting a new `entry` if none was found.
```go
func (memo *Memo) server(f Func) {
	cache := make(map[string]*entry)
	for req := range memo.requests {
		e := cache[req.key]
		if e == nil {
			// This is the first request for this key.
			e = &entry{ready: make(chan struct{})}
			cache[req.key] = e
			go e.call(f, req.key) // call f(key)
		}
		go e.deliver(req.response)
	}
}

func (e *entry) call(f Func, key string) {
	// Evaluate the function.
	e.res.value, e.res.err = f(key)
	// Broadcast the ready condition.
	close(e.ready)
}

func (e *entry) deliver(response chan<- result) {
	// Wait for the ready condition.
	<-e.ready
	// Send the result to the client.
	response <- e.res
}
```
Similar to the mutex-based version, the first request for a given key becomes responsible for calling the function `f` on that key, storing the result in the `entry`, and broadcasting the readiness of the `entry` by closing the `ready` channel. This is done by `(*entry).call`.
A subsequent request for the same key finds the existing `entry` in the map, waits for the result to become ready, and sends the result through the `response` channel to the client goroutine that called `Get`. This is done by `(*entry).deliver`.
The `call` and `deliver` methods must be called in their own goroutines to ensure that the monitor goroutine does not stop processing new requests.
This example shows that it's possible to build many concurrent structures using either of the two approaches without excessive complexity:
- Shared variables and locks
- Communicating sequential processes
It's not always obvious which approach is preferable in a given situation, but it's worth knowing how they correspond. Sometimes switching from one approach to the other can make your code simpler.
Goroutines and Threads
The previous chapter mentioned that the difference between goroutines and operating system (OS) threads could be ignored until later. Although the differences between them are essentially quantitative, a big enough quantitative difference becomes a qualitative one, and so it is with goroutines and threads.
Growable Stacks
Each OS thread has a fixed-size block of memory (often as large as 2MB) for its stack, the work area where it saves the local variables of function calls that are in progress or temporarily suspended while another function is called.
This fixed-size stack is simultaneously too much and too little:
- A 2MB stack would be a huge waste of memory for a little goroutine, such as one that merely waits for a `WaitGroup` then closes a channel, and it is far too much for a Go program that creates hundreds of thousands of goroutines at one time.
- Yet despite their size, fixed-size stacks are not always big enough for the most complex and deeply recursive of functions.
Changing the fixed size can improve space efficiency and allow more threads to be created, or it can enable more deeply recursive functions, but it cannot do both.
In contrast, a goroutine starts life with a small stack, typically 2KB. A goroutine's stack, like the stack of an OS thread, holds the local variables of active and suspended function calls, but unlike an OS thread, a goroutine’s stack is not fixed; it grows and shrinks as needed. The size limit for a goroutine stack may be as much as 1GB, orders of magnitude larger than a typical fixed-size thread stack, though few goroutines use that much.
Goroutine Scheduling
OS threads are scheduled by the OS kernel. Every few milliseconds, a hardware timer interrupts the processor, which causes a kernel function called the scheduler to be invoked. This function does the following things:
- Suspends the currently executing thread and saves its registers in memory.
- Looks over the list of threads and decides which one should run next.
- Restores that thread's registers from memory, then resumes the execution of that thread.
Because OS threads are scheduled by the kernel, passing control from one thread to another requires a full context switch, which does the following things:
- Saving the state of one user thread to memory.
- Restoring the state of another thread.
- Updating the scheduler's data structures.
This operation is slow, due to its poor locality and the number of memory accesses required, and has historically only gotten worse as the number of CPU cycles required to access memory has increased.
The Go runtime contains its own scheduler that uses a technique known as m:n scheduling, because it multiplexes (or schedules) m goroutines on n OS threads. The job of the Go scheduler is analogous to that of the kernel scheduler, but it is concerned only with the goroutines of a single Go program.
Unlike the operating system's thread scheduler, the Go scheduler is not invoked periodically by a hardware timer, but implicitly by certain Go language constructs. For example, when a goroutine calls `time.Sleep` or blocks in a channel or mutex operation, the scheduler puts it to sleep and runs another goroutine until it is time to wake the first one up. Because it doesn't need a switch to kernel context, rescheduling a goroutine is much cheaper than rescheduling a thread.
GOMAXPROCS
The Go scheduler uses a parameter called `GOMAXPROCS` to determine how many OS threads may be actively executing Go code simultaneously. Its default value is the number of CPUs on the machine, so on a machine with 8 CPUs, the scheduler will schedule Go code on up to 8 OS threads at once. `GOMAXPROCS` is the n in m:n scheduling.
- Goroutines that are sleeping or blocked in a communication do not need a thread at all.
- Goroutines that are blocked in I/O or other system calls, or are calling non-Go functions, do need an OS thread, but `GOMAXPROCS` need not account for them.
You can explicitly control this parameter using the `GOMAXPROCS` environment variable or the `runtime.GOMAXPROCS` function.
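For example (a small sketch; passing 0 to `runtime.GOMAXPROCS` queries the current value without changing it):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // query only

	prev := runtime.GOMAXPROCS(2) // set to 2; the previous value is returned
	fmt.Println("was:", prev)
}
```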
We can see the effect of `GOMAXPROCS` on the following program, which prints an endless stream of zeros and ones:
```go
for {
	go fmt.Print(0)
	fmt.Print(1)
}
```
```
$ GOMAXPROCS=1 go run hacker-cliché.go
111111111111111111110000000000000000000011111...

$ GOMAXPROCS=2 go run hacker-cliché.go
010101010101010101011001100101011010010100110...
```
- In the first run, at most one goroutine was executed at a time. Initially, it was the main goroutine, which prints ones. After a period of time, the Go scheduler put it to sleep and woke up the goroutine that prints zeros, giving it a turn to run on the OS thread.
- In the second run, there were two OS threads available, so both goroutines ran simultaneously, printing digits at about the same rate.
Note that many factors are involved in goroutine scheduling, and the runtime is constantly evolving, so your results may differ from the ones above.
Goroutines Have No Identity
In most operating systems and programming languages that support multithreading, the current thread has a distinct identity that can be easily obtained as an ordinary value, typically an integer or pointer. This makes it easy to build an abstraction called thread-local storage, which is essentially a global map keyed by thread identity, so that each thread can store and retrieve values independent of other threads.
Goroutines have no notion of identity that is accessible to the programmer. This is by design, since thread-local storage tends to be abused. For example, in a web server implemented in a language with thread-local storage, it's common for many functions to find information about the HTTP request on whose behalf they are currently working by looking in that storage. However, this can lead to an unhealthy "action at a distance" in which the behavior of a function is not determined by its arguments alone, but by the identity of the thread in which it runs. Consequently, if the identity of the thread changes (e.g. some worker threads are enlisted to help) the function misbehaves mysteriously.
Go encourages a simpler style of programming in which parameters that affect the behavior of a function are explicit. Not only does this make programs easier to read, but it also frees us to assign subtasks of a given function to many different goroutines without worrying about their identity.
At this point, all the language features needed for writing Go programs have been discussed. The next two chapters will step back to some of the practices and tools that support programming in the large: how to structure a project as a set of packages, and how to obtain, build, test, benchmark, profile, document, and share those packages.
Doubts and Solution
Verbatim
p265 on mutexes
Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.
Question: What does it mean?
p271 in `sync.Once`
... the mutex synchronization ensures that the effects of `loadIcons` on memory (specifically, `icons`) become visible to all goroutines. Using `sync.Once` in this way, we can avoid sharing variables with other goroutines until they have been properly constructed.
Question: What does it mean?