1 Concurrency overload causes program crashes

Let’s start by looking at a very simple example.

// main.go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < math.MaxInt32; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(i)
			time.Sleep(time.Second)
		}(i)
	}
	wg.Wait()
}

This example launches math.MaxInt32 goroutines concurrently, about 2^31 ≈ 2.1 billion, each of which does almost nothing. In theory, the program would print the numbers 0 through math.MaxInt32-1 in random order.

So what does the actual run look like?

$ go run main.go
...
150577
150578
panic: too many concurrent operations on a single file or socket (max 1048575)

goroutine 1199236 [running]:
internal/poll.(*fdMutex).rwlock(0xc0000620c0, 0x0, 0xc0000781b0)
        /usr/local/go/src/internal/poll/fd_mutex.go:147 +0x13f
internal/poll.(*FD).writeLock(...)
        /usr/local/go/src/internal/poll/fd_mutex.go:239
internal/poll.(*FD).Write(0xc0000620c0, 0xc125ccd6e0, 0x11, 0x20, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:255 +0x5e
fmt.Fprintf(0x10ed3e0, 0xc00000e018, 0x10d3024, 0xc, 0xc0e69b87b0, 0x1, 0x1, 0x11, 0x0, 0x0)
        /usr/local/go/src/fmt/print.go:205 +0xa5
fmt.Printf(...)
        /usr/local/go/src/fmt/print.go:213
main.main.func1(0xc0000180b0, 0x124c31)
...

In the actual run, the program crashes almost immediately, with the key panic message:

panic: too many concurrent operations on a single file or socket (max 1048575)

The number of concurrent operations on a single file/socket exceeded the system limit. The panic is triggered by the fmt print call, which writes the formatted string to the screen, i.e. standard output. On Linux, standard output can also be treated as a file: the kernel accesses files through file descriptors, with 0 for standard input, 1 for standard output, and 2 for standard error. The limit of 1048575 (2^20 - 1) in the message comes from internal/poll.fdMutex, visible at the top of the stack trace, which caps the number of concurrent operations on a single file descriptor.
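As a small illustration (not from the original article), the three standard descriptors can be inspected directly from Go; writing to descriptor 1 by hand is equivalent to printing to standard output.

// main_fd.go (hypothetical file name)
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	fmt.Println(os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()) // prints: 0 1 2
	// write to file descriptor 1 directly, bypassing the fmt package
	syscall.Write(int(os.Stdout.Fd()), []byte("hello via fd 1\n"))
}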

In short, the system is running out of resources.

So what if we remove the fmt.Println line of code? Then the program will most likely crash from lack of memory instead. The reasoning is straightforward: each goroutine needs at least 2KB of stack space, so on a computer with 2GB of memory, at most 2GB/2KB = 1M goroutines can exist at the same time. If the goroutines perform other work that allocates memory, the number that can run concurrently drops by another order of magnitude.
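The 2KB figure can be checked empirically. The following is a rough measurement sketch (not part of the original article): it parks 100,000 idle goroutines and divides the growth in memory obtained from the OS by the goroutine count; expect a result in the low single-digit KB range.

// main_goroutine_mem.go (hypothetical file name)
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 100000

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	var wg sync.WaitGroup
	block := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			wg.Done()
			<-block // park the goroutine so its stack stays allocated
		}()
	}
	wg.Wait() // all n goroutines are now running

	runtime.ReadMemStats(&after)
	fmt.Printf("~%.1f KB per goroutine\n", float64(after.Sys-before.Sys)/float64(n)/1024)
	close(block)
}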

2 How to solve it

The amount of resources consumed varies from application to application. The recommended way is for the application to actively limit the number of concurrent goroutines.

2.1 Using a buffered channel

Concurrency can be limited by using a channel with a fixed buffer size as a semaphore.

// main_chan.go
package main

import (
	"log"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	ch := make(chan struct{}, 3)
	for i := 0; i < 10; i++ {
		ch <- struct{}{}
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			log.Println(i)
			time.Sleep(time.Second)
			<-ch
		}(i)
	}
	wg.Wait()
}
  • make(chan struct{}, 3) creates a channel with a buffer of size 3; a send blocks once 3 values are buffered and not yet received.
  • Before starting each goroutine, ch <- struct{}{} acquires a buffer slot, blocking while the buffer is full.
  • When a goroutine finishes, <-ch releases the slot.
  • sync.WaitGroup is only needed here to keep main alive; in an HTTP service, for example, each request is naturally handled in its own goroutine, so the channel alone controls the number of concurrently processed tasks and no sync.WaitGroup is required (a sketch follows the run output below).

The results of the run are as follows.

$ go run main_chan.go
2020/12/21 00:48:28 2
2020/12/21 00:48:28 0
2020/12/21 00:48:28 1
2020/12/21 00:48:29 3
2020/12/21 00:48:29 4
2020/12/21 00:48:29 5
2020/12/21 00:48:30 6
2020/12/21 00:48:30 7
2020/12/21 00:48:30 8
2020/12/21 00:48:31 9

The log timestamps make it easy to see that only 3 tasks run concurrently, with a new batch starting each second, which achieves the goroutine concurrency control we wanted.
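For the HTTP case mentioned above, the same pattern looks roughly like this. This is a hypothetical sketch (the handler, route, and port are made up, not from the original article): the buffered channel caps concurrent request processing at 3, and no sync.WaitGroup is involved because the server keeps running anyway.

// main_http.go (hypothetical file name)
package main

import (
	"fmt"
	"net/http"
	"time"
)

var limit = make(chan struct{}, 3) // at most 3 requests processed at once

func handler(w http.ResponseWriter, r *http.Request) {
	limit <- struct{}{}        // acquire a slot; blocks while 3 requests are in flight
	defer func() { <-limit }() // release the slot when done

	time.Sleep(time.Second) // stand-in for real work
	fmt.Fprintln(w, "done")
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}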

2.2 Using third-party libraries

There are many third-party libraries that implement goroutine pools, which can be easily used to control the number of concurrent goroutines.

Take tunny as an example.

package main

import (
	"log"
	"time"

	"github.com/Jeffail/tunny"
)

func main() {
	pool := tunny.NewFunc(3, func(i interface{}) interface{} {
		log.Println(i)
		time.Sleep(time.Second)
		return nil
	})
	defer pool.Close()

	for i := 0; i < 10; i++ {
		go pool.Process(i)
	}
	time.Sleep(time.Second * 4)
}
  • tunny.NewFunc(3, f): the first argument is the size of the goroutine pool (poolSize), and the second is the function (worker) that the pool's goroutines run.
  • pool.Process(i): passes the parameter i to a worker in the pool for processing; it blocks until a worker is free, which is why it is launched with go here.
  • pool.Close(): closes the goroutine pool.

The results of the run are as follows.

$ go run main_tunny.go
2020/12/21 01:00:21 6
2020/12/21 01:00:21 1
2020/12/21 01:00:21 3
2020/12/21 01:00:22 8
2020/12/21 01:00:22 4
2020/12/21 01:00:22 7
2020/12/21 01:00:23 5
2020/12/21 01:00:23 2
2020/12/21 01:00:23 0
2020/12/21 01:00:24 9
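
Under the hood, such libraries boil down to a fixed set of long-lived workers draining a shared job channel. Here is a minimal stdlib-only sketch of that idea (an illustration, not tunny's actual implementation):

// main_pool.go (hypothetical file name)
package main

import (
	"log"
	"sync"
	"time"
)

func main() {
	const poolSize = 3
	jobs := make(chan int)
	var wg sync.WaitGroup

	// start a fixed number of long-lived workers
	wg.Add(poolSize)
	for w := 0; w < poolSize; w++ {
		go func() {
			defer wg.Done()
			for i := range jobs { // each worker pulls jobs until the channel is closed
				log.Println(i)
				time.Sleep(time.Second)
			}
		}()
	}

	for i := 0; i < 10; i++ {
		jobs <- i // blocks until a worker is free to receive
	}
	close(jobs)
	wg.Wait()
}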

3 Adjusting the upper limit of system resources

3.1 ulimit

There are scenarios where, even though we have effectively limited the number of concurrent goroutines, some type of resource still runs out, e.g.

  • too many open files
  • out of memory
  • …

For example, a distributed compilation accelerator needs to parse gcc commands and their dependent source and header files, and a single compile command may depend on hundreds of headers. So even with goroutine concurrency limited to 1000, the process may still exceed the limit on concurrently open file handles at runtime. On the other hand, such a tool only distributes the dependent sources and headers to remote machines for execution and consumes little local memory and CPU, so a concurrency of 1000 is not high; reducing it would hurt compilation speed.

The operating system usually limits the number of concurrently open files, the stack size, and so on. ulimit -a shows the current settings.

$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       1418
-n: file descriptors                12800

We can run ulimit -n 999999 to raise the number of simultaneously open file handles to 999999, and other parameters can be adjusted similarly as needed. Note that ulimit only affects the current shell session; on most Linux distributions, persistent limits are configured in /etc/security/limits.conf.
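A Go process can also inspect and raise its own descriptor limit at startup instead of relying on the shell. A Linux-specific sketch (not from the original article; raising the soft limit up to the hard limit needs no special privileges, going beyond it does):

// main_rlimit.go (hypothetical file name)
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	fmt.Printf("soft=%d hard=%d\n", rl.Cur, rl.Max)

	rl.Cur = rl.Max // raise the soft limit to the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
}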

3.2 Virtual memory

Virtual memory is a very common technique that backs RAM with disk when physical memory is insufficient, for example swap space on Linux.

Creating and using a swap file on Linux is straightforward.

sudo fallocate -l 20G /mnt/.swapfile # create an empty 20G file
sudo mkswap /mnt/.swapfile    # format it as swap
sudo chmod 600 /mnt/.swapfile # restrict permissions to 600
sudo swapon /mnt/.swapfile    # activate the swap file
free -m # check current memory usage (including swap)

Disabling the swap file is just as simple.

sudo swapoff /mnt/.swapfile
rm -rf /mnt/.swapfile

The I/O performance gap between disk and RAM is very large: DDR3 memory easily reaches a 20GB/s read/write rate, while an SSD typically manages only around 0.5GB/s, a difference of up to 40x. Mapping disk into memory therefore clearly has some performance cost. If the application only needs a large amount of memory for a short period, virtual memory can effectively avoid the out-of-memory problem; if the application reads and writes large amounts of memory at high frequency over a long period, the performance impact of virtual memory becomes much more obvious.