I recently worked on a project that required loading a large amount of data into memory and serving it for external queries. The data also has to be updated according to a certain policy, but updates are very infrequent. In principle, concurrent reads and writes must be synchronized with locks. However, since writes are far rarer than reads, locking every access felt wasteful, so I wanted to investigate whether a lock-free solution exists. That requires understanding the memory model described in this article.

Note that this article is aimed at experienced developers. Newcomers are strongly advised to handle concurrent reads and writes the classical way, for example with locks. As the official Go documentation puts it:

If you must read the rest of this document to understand the behavior of your program, you are being too clever.

Immediately after that, the document warns:

Don’t be clever.

The techniques covered in this article are optimizations for specific conditions. Use them with care.

In C/C++, any concurrent read and write without locking is undefined behavior; the exact outcome of execution is entirely up to the compiler. Go is different: its memory model specifies the behavior of concurrent reads in more detail, so programs basically run without undefined behavior. When a program reads a value no larger than one machine word (the word length is the amount of data the CPU can process at once, e.g. 64 or 32 bits), the read is atomic. In other words, it observes either the value before the update or the value after it, but never a “half-updated” value. This greatly simplifies debugging and troubleshooting.

The so-called memory model is a technical specification of how different Goroutines behave when they read and write the same area of memory concurrently without explicit synchronization. It contains two parts. One part specifies the order in which operations in each Goroutine are observed in a given concurrency scenario. The other specifies the rules for when an update made by one Goroutine becomes visible to other Goroutines. It is on this memory model that our lock-free data update will be built.

Next, we walk through the main synchronized and unsynchronized scenarios. For convenience, we call the process by which a change made in one Goroutine becomes readable by other Goroutines “data synchronization”.

Synchronization scenarios

Initialization

A program’s package initialization, including its init() functions, runs in a single Goroutine, but that Goroutine may create new Goroutines during execution, which then run concurrently with it.

If a package p imports a package q, then the init() functions in q must all complete before any init() function in p starts executing.

The program’s entry point main.main() runs only after all init() functions have finished.

Goroutine creation

When the program creates a new Goroutine, the go statement performs a data synchronization before the Goroutine starts running.

For example:

var a string

func f() {
  print(a)
}

func hello() {
  a = "hello, world"
  go f()
}

Calling hello() will print “hello, world” at some point in the future (possibly after hello() has already returned).

The go f() statement here makes the runtime synchronize the preceding update, so the new Goroutine is guaranteed to read the latest value of the variable a.

Goroutine exit

Goroutine exit does not guarantee data synchronization. For example:

var a string

func hello() {
  go func() { a = "hello" }()
  print(a)
}

Calling hello() does not necessarily print “hello”. An aggressive compiler may even “optimize away” the entire go func() line.

If the program must read the changes in the Goroutine, it needs to use an explicit synchronization mechanism such as a lock.

Channel communication

Goroutines communicate mainly through channels. For a given channel, each send corresponds to a receive, and the send and receive are usually performed in different Goroutines.

Changes made before a send on a channel are synchronized before the corresponding receive completes.

As in the following code.

var c = make(chan int, 10)
var a string

func f() {
  a = "hello, world"
  c <- 0
}

func main() {
  go f()
  <-c
  print(a)
}

It is guaranteed to print hello, world. f() sends to channel c after updating the variable a, so by the time the receive in main completes, it must observe the latest value of a.

Closing the channel has the same effect.

Changing c <- 0 to close(c) in the above program also outputs hello, world.

For an unbuffered channel, changes made before the receive are likewise synchronized before the send completes.

For example, this program (which swaps the send and receive in the code above and uses an unbuffered channel):

var c = make(chan int)
var a string

func f() {
  a = "hello, world"
  <-c
}

func main() {
  go f()
  c <- 0
  print(a)
}

The output hello, world is also guaranteed. Since the channel is unbuffered, c <- 0 cannot complete until f() performs its receive, and by the time the send completes, main must observe the receiver’s update to a.

If the channel were buffered, this program would no longer be guaranteed to print “hello, world”.

In general, for a channel with buffer capacity C, changes made before the kth receive are synchronized before the (k+C)th send completes.

This behavior effectively implements a counting semaphore; the typical use case is capping the number of concurrent workers.

For example, the following code.

var limit = make(chan int, 3)

func main() {
  for _, w := range work { // work is a slice of func() defined elsewhere
    go func(w func()) {
      limit <- 1
      w()
      <-limit
    }(w)
  }
  select {}
}

The loop takes tasks from work one at a time and starts a Goroutine for each. Each Goroutine sends 1 to limit before doing its work and receives one value back when it finishes. Since limit has capacity 3, at most three Goroutines can be working at any moment, which caps the concurrency.

Atomic variables

The atomic operations provided by sync/atomic synchronize automatically: if the effect of one atomic operation is observed by another, the first is synchronized before the second. In practice, an atomic write made by one Goroutine is visible to atomic reads in other Goroutines.

For example, the following code.

var i atomic.Int32

func f() {
  i.Add(1)
}

func main() {
  go f()
  for i.Load() == 0 {
  }
  print(i.Load())
}

The program always prints 1. Note that without the waiting loop, main could print 0, because go f() does not wait for f to run; it is the atomic Add/Load pair that guarantees that once the loop observes the increment, the value 1 is visible to main.

Finalizers

The runtime package provides the SetFinalizer function, which registers a finalizer function on a specific object; the runtime executes it before reclaiming the object’s memory. Changes made before the call SetFinalizer(x, f) are synchronized and will be visible to f when it runs.

Other synchronization scenarios

All other explicit synchronization tools such as those provided by the sync package have their own data synchronization behavior, which can be found in the corresponding documentation.

Non-synchronous scenarios

Without explicit synchronization, a program may not observe updates in the same order in which they were made.

As an example.

var a, b int

func f() {
  a = 1
  b = 2
}

func g() {
  print(b)
  print(a)
}

func main() {
  go f()
  g()
}

Can g print 2 first and then 0? It actually can. The key point is that a is updated before b in f, yet g observes the updated b (that is, 2) without observing the updated a. This seems unbelievable, but Russ Cox discusses it in detail in his article Hardware Memory Models [^hw-mm], which is worth reading if you are interested.

This behavior can make code like the following buggy.

var a string
var done bool
var once sync.Once // guards one-time execution of setup

func setup() {
  a = "hello, world"
  done = true
}

func doprint() {
  if !done {
    once.Do(setup)
  }
  print(a)
}

func twoprint() {
  go doprint()
  go doprint()
}

We use done to mark whether setup has completed. But even after setup finishes, another Goroutine’s doprint may still read done as false, or worse, read done as true while still observing the old, empty value of a. Either way, a bizarre bug.

Finally, a more obscure example is given.

type T struct {
  msg string
}

var g *T

func setup() {
  t := new(T)
  t.msg = "hello, world"
  g = t
}

func main() {
  go setup()
  for g == nil {
  }
  print(g.msg)
}

Here setup assigns a newly created struct to the pointer g. Even after the main function has observed that g != nil, the program may still print an empty string, because the write to t.msg may not yet be visible. Again, this comes down to the underlying hardware behavior. The safest approach is to synchronize explicitly.

The above is the main content of the memory model. With this foundation, we can give lock-free data update solutions.

Let’s say our data is saved using map[int]float64. Then we can declare a global variable m.

var m map[int]float64

Different Goroutines can then query or update this variable directly. To update, the updating Goroutine simply assigns a new map to m.

go func() {
  nm := load()
  m = nm
}()

Note that this is a whole-value assignment. Modifying the entries of the original map while other Goroutines are reading it would trigger a fatal concurrent map access error, but replacing the variable with a new map is fine. And since a map variable is internally a pointer, i.e. one word long, the assignment is atomic in the torn-value sense: readers see either the old map or the new one. (Strictly speaking, unsynchronized assignment is still a data race under the Go memory model, and the race detector will flag it.)

For queries, since each request is processed by a new Goroutine, it is natural to get the latest data.

go func() {
  v := m[k]
}()

According to the memory model, a Goroutine started after the update must read the latest content. The only caveat is that Goroutines already running while the update happens may still read the old map. In my scenario this is perfectly acceptable.

This achieves the desired lock-free update. Such read-mostly, write-rarely scenarios are in fact common: configuration loading, in-memory caches, and so on. Use the technique as needed. It looks simple in code, but to be sure the whole program is correct you have to study the memory model carefully, which is why I wrote this article in the first place. In any case, be careful wherever concurrency is involved!