I recently worked on a project that required loading a large amount of data into memory and serving external queries against it. The data also needs to be updated according to a certain policy, but the update frequency is very low. In principle, concurrent reads and writes must be synchronized with locks. However, since writes are far rarer than reads, locking every access is wasteful, so I wanted to investigate whether a lock-free solution exists. That requires understanding the memory model described in this article.
Note that this article is intended for experienced developers. Newcomers are strongly advised to handle concurrent reads and writes in the classical way, such as with locks. As the official Go memory model documentation states:
If you must read the rest of this document to understand the behavior of your program, you are being too clever.
Immediately after that, the document warns:
Don’t be clever.
What this article covers are optimization techniques for specific conditions. Use them with care.
In C/C++, concurrent reads and writes of the same memory without locks are undefined behavior; the exact outcome is entirely up to the compiler. Go is different: its memory model specifies the behavior of concurrent reads in more detail, so programs largely avoid undefined behavior. When a program reads memory, if the data is no larger than one machine word (the word length is the amount of data the CPU can process at once, such as 64 or 32 bits), then the read is atomic. In other words, it will observe either the value before or the value after an update, but never a “half” update. This simplifies debugging and troubleshooting.
The so-called memory model is a technical specification that defines the behavior of different Goroutines reading and writing the same area of memory concurrently, without explicit synchronization. It contains two parts. One part specifies the order in which Goroutines execute in particular concurrency scenarios. The other specifies the rules for when an update made by one Goroutine can be read by others after multiple Goroutines concurrently access the same memory. It is on the basis of this memory model that we can implement lock-free data updates.
Next, we describe the main synchronized and unsynchronized scenarios. For convenience, we call the process by which a change made in one Goroutine becomes readable by other Goroutines “data synchronization”.
Program initialization runs in a single Goroutine, but that Goroutine may create new Goroutines, which then run concurrently.

If a package p imports a package q, then the init() functions in p start only after all the init() functions in q have finished executing.

The program's main function, main.main(), starts after all init() functions have finished.
Creating a new Goroutine performs a data synchronization before the Goroutine starts running. For example:
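The code under discussion did not survive here; the canonical version from the Go memory model documentation looks like this (a reconstruction, not necessarily the author's exact code):

```go
package main

var a string

func f() {
	print(a)
}

func hello() {
	a = "hello, world"
	go f() // the go statement synchronizes: f observes the write to a above
}

func main() {
	hello()
}
```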
Calling the function hello() will output “hello, world” at some point (possibly after hello() has returned). The go f() statement causes the runtime to synchronize the data update, so the new Goroutine can read the latest value of the variable updated before it.
Goroutine exit, however, does not guarantee data synchronization. For example:
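A sketch of the example being described, modeled on the Go memory model's goroutine-exit example:

```go
package main

var a string

func hello() {
	go func() { a = "hello" }() // nothing synchronizes this write back
	print(a)                    // may print "" — or the write may be elided entirely
}

func main() {
	hello()
}
```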
Here hello() does not necessarily output “hello”. Aggressive compilers may even “optimize away” the go func() line entirely.
If the program must observe the changes made in the Goroutine, it needs an explicit synchronization mechanism such as a lock.
Goroutines rely mainly on channels for communication. For a given channel, each send corresponds to a receive, and usually the send and the receive are performed in different Goroutines.
Changes made before a send on a channel are synchronized before the corresponding receive completes.
As in the following code.
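The code itself is missing here; the canonical buffered-channel example from the Go memory model looks like this (reconstructed):

```go
package main

var a string
var c = make(chan int, 10)

func f() {
	a = "hello, world"
	c <- 0 // the send synchronizes the write to a with the receive in main
}

func main() {
	go f()
	<-c      // once this receive completes, the write to a is visible
	print(a) // always prints "hello, world"
}
```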
It will always output hello, world. This is because f() sends to channel c after updating variable a, so the main function is guaranteed to read the latest value of a once the receive completes.
Closing the channel has the same effect: changing c <- 0 to close(c) in the above program also outputs “hello, world”.
For an unbuffered channel, changes made before the receiver receives are likewise synchronized before the sender finishes sending.
For example, consider this program, which swaps the order of sending and receiving in the code above and uses an unbuffered channel:
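Reconstructed from the Go memory model's unbuffered-channel example:

```go
package main

var a string
var c = make(chan int) // unbuffered

func f() {
	a = "hello, world"
	<-c // the receive completes before the send in main does
}

func main() {
	go f()
	c <- 0   // blocks until f is receiving, so f's write to a is visible after this
	print(a) // always prints "hello, world"
}
```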
The output hello, world is also guaranteed. Since there is no buffering, c <- 0 does not return until f() has performed its receive; in turn, the main function must see the changes f() made to a by the time c <- 0 completes.
If the channel is buffered, then the program does not necessarily output “hello, world”.
If a channel has buffer capacity C, then the kth receive completes before the (k+C)th send completes, so changes made before the kth receive are synchronized before the (k+C)th send finishes.
This behavior effectively implements a semaphore, and the typical use case is bounding the number of concurrent workers.
For example, the following code.
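The example appears to be the semaphore pattern from the Go memory model; a runnable sketch (the work slice and the WaitGroup used to wait for completion are illustrative additions):

```go
package main

import "sync"

var limit = make(chan int, 3) // capacity 3 ⇒ at most three concurrent workers

// run starts one Goroutine per task but lets at most three execute at once.
func run(work []func()) {
	var wg sync.WaitGroup
	for _, w := range work {
		wg.Add(1)
		go func(w func()) {
			defer wg.Done()
			limit <- 1 // acquire a slot; blocks once three are taken
			w()
			<-limit // release the slot
		}(w)
	}
	wg.Wait()
}

func main() {
	work := []func(){
		func() { println("task 1") },
		func() { println("task 2") },
		func() { println("task 3") },
		func() { println("task 4") },
	}
	run(work)
}
```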
It takes tasks from work and starts a Goroutine for each. Each Goroutine sends to limit before processing and receives from it when finished. Since limit has capacity 3, at most three Goroutines can be processing at the same time, which bounds the concurrency.
The atomic operations provided by sync/atomic are automatically synchronized: updates to atomic variables made by one Goroutine can be read by other Goroutines.
For example, the following code.
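A sketch of such a program (the variable names follow the memory model's atomic example; done here is an atomic.Bool, available since Go 1.19):

```go
package main

import "sync/atomic"

var a string
var done atomic.Bool

func setup() {
	a = "1"
	done.Store(true) // the atomic store publishes the write to a
}

func result() string {
	go setup()
	for !done.Load() { // the atomic load eventually observes the store
	}
	return a // guaranteed to be "1" once done is observed as true
}

func main() {
	print(result())
}
```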
The program outputs 1 in all cases.
The runtime package provides the SetFinalizer function, which registers a finalizer (akin to a destructor) on a specific object; the runtime executes it before reclaiming the object's memory. Changes made before calling SetFinalizer(x, f) complete a data synchronization, so f can read them when it executes.
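A runnable sketch of this guarantee (the obj type, the channel, and the GC-nudging loop are illustrative additions; finalizer timing is up to the runtime):

```go
package main

import (
	"runtime"
	"time"
)

type obj struct{ data string }

func demo() string {
	ch := make(chan string, 1)
	x := &obj{}
	x.data = "ready" // this write happens before SetFinalizer(x, f)…
	runtime.SetFinalizer(x, func(o *obj) {
		ch <- o.data // …so the finalizer is guaranteed to observe it
	})
	x = nil // drop the only reference
	for i := 0; i < 100; i++ {
		runtime.GC() // nudge the collector so the finalizer gets queued
		select {
		case s := <-ch:
			return s
		case <-time.After(10 * time.Millisecond):
		}
	}
	return "finalizer did not run"
}

func main() {
	println(demo())
}
```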
Other synchronization scenarios
All other explicit synchronization tools, such as those provided by the sync package, have their own data synchronization behavior, which can be found in the corresponding documentation.
Without explicit synchronization, a program may not observe updates in the same order in which they were made.
As an example.
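The example, as given in the Go memory model documentation (reconstructed):

```go
package main

var a, b int

func f() {
	a = 1
	b = 2
}

func g() {
	print(b) // may print 2…
	print(a) // …while a still prints as 0
}

func main() {
	go f()
	g()
}
```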
Can the above program output 2 and then 0? It actually can. The key point is that a is updated before b in f. Yet g may read the updated b, that is, 2, while still reading the old a. This does seem unbelievable, but it follows from how hardware and compilers reorder memory operations; Russ Cox discusses it in detail in his article Hardware Memory Models [^hw-mm], which is worth reading if you are interested.
This behavior can lead to errors like the following.
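A reconstruction of the buggy pattern, following the memory model's setup/done example:

```go
package main

var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func doprint() {
	for !done {
		// busy-wait: without synchronization, this loop may never
		// observe done becoming true
	}
	print(a) // and even if it does, a may still read as ""
}

func main() {
	go setup()
	doprint()
}
```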
The flag done marks whether setup has completed. But even after setup has completed, doprint may still read done as false, creating a bizarre bug.
Finally, here is a more obscure example:
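Reconstructed from the memory model's pointer-publication example:

```go
package main

type T struct {
	msg string
}

var g *T

func setup() {
	t := new(T)
	t.msg = "hello, world"
	g = t // the pointer may become visible before the write to msg does
}

func main() {
	go setup()
	for g == nil {
		// busy-wait until the pointer is published
	}
	print(g.msg) // may still print "" on some platforms
}
```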
Here a newly created structure is assigned to the pointer g. Even though the main function has confirmed that g != nil, the program may still print the empty string, because the write to g may become visible before the write to the structure's field. Again, this is related to underlying hardware behavior. The safest approach is to synchronize explicitly.
The above is the main content of the memory model. With this foundation, we can give lock-free data update solutions.
Let's say our data is stored in a map[int]float64, declared as a global variable m. Different Goroutines can then query or update this variable directly. To update, we assign a whole new map to m inside the updating Goroutine.
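A sketch of this approach (loadData is a hypothetical stand-in for the real loading policy):

```go
package main

var m map[int]float64

// loadData stands in for whatever policy builds the fresh dataset.
func loadData() map[int]float64 {
	return map[int]float64{1: 3.14}
}

// update swaps in a freshly built map with a single assignment.
// The map variable is one word, so readers see either the old map
// or the new one, never a half-updated value.
func update() {
	m = loadData()
}

// query runs in a per-request Goroutine and reads whatever map is current.
func query(k int) float64 {
	return m[k]
}

func main() {
	update()
	println(query(1))
}
```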
Note that this is a wholesale assignment. Modifying the existing map while other Goroutines are reading it would crash the program, but replacing the map by assignment is fine. And since a map variable is internally a pointer, i.e. one word long, the assignment is atomic in the sense described earlier.
For queries, since each request is handled by a new Goroutine, it naturally sees the latest data.
According to the memory model, a newly started Goroutine must read content that was updated before it was created. The only caveat is that Goroutines already running when the update happens may still read the old data; in my scenario this is not a problem at all.
This achieves lock-free updates nicely. In fact, such read-mostly scenarios are common: configuration loading, in-memory caching, and so on. Use this technique as needed. It does not look like much in code, but to make sure the whole program is correct you have to study the memory model carefully, which is why I wrote this article in the first place. By all means, be careful wherever concurrency is involved!