Programmers with even a little exposure to Go know that the GOMAXPROCS variable limits the maximum number of operating system threads that can run user-level Go code concurrently, and that you can even change this limit while the program is running by calling the function func GOMAXPROCS(n int) int. But when you read the documentation more closely, or get deeper into Go development, you will find that the actual number of threads is larger than the value you set, sometimes much larger. Even more unfortunately, after your concurrent tasks have wound down to just a few, the thread count still does not come back down, wasting memory and CPU scheduling for nothing.

Of course, this problem has been encountered by many people.

The Go documentation also points out that the actual number of threads may not be bounded by GOMAXPROCS: threads blocked in system calls on behalf of Go code do not count against this limit, as stated below.

The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package’s GOMAXPROCS function queries and changes the limit.
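As a quick illustration of that last sentence, here is a minimal sketch of querying and changing the limit at runtime with runtime.GOMAXPROCS; the value 4 is arbitrary.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) only queries the current value without changing it.
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// GOMAXPROCS(n) with n > 0 sets a new limit and returns the previous one.
	prev := runtime.GOMAXPROCS(4)
	fmt.Println("previous limit:", prev)
}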

If there are many concurrent blocking system calls, Go will create a lot of threads, and because of the way the Go runtime is designed, those threads are not reclaimed after the system calls complete. See go issue #14592 for a detailed discussion. The issue dates back to 2016, around the Go 1.6 release, and years later no one has managed to fix or improve it. Obviously, it is not an easy thing to work on.

I’ll reorganize the material here to deepen my understanding of this point.

What is a blocking system call?

So what is a blocking system call? There is a Stack Overflow Q&A that answers this question very well.

A blocking system call is one that must wait until the action can be completed. read() would be a good example - if no input is ready, it’ll sit there and wait until some is (provided you haven’t set it to non-blocking, of course, in which case it wouldn’t be a blocking system call). Obviously, while one thread is waiting on a blocking system call, another thread can be off doing something else.


So does reading from network I/O in Go tie up a system thread for every reading goroutine? No! Go uses the netpoller to handle network reads and writes, and the netpoller relies on epoll (Linux), kqueue (BSD, Darwin), and IoCompletionPort (Windows) to poll network I/O state. Once a connection is accepted, its file descriptor is set to non-blocking, which means that when the connection has no data, a read does not block but instead returns a specific error. So reads and writes through the Go standard library do not generate a lot of threads, unless you set GOMAXPROCS very high or switch the underlying connection's file descriptor back to blocking mode.
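To see this in practice, here is a rough sketch in which many goroutines block on socket reads while the thread count stays small. The connection count and sleep duration are arbitrary; the threadcreate profile used here is introduced more fully in the example below.

package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"time"
)

func main() {
	threadProfile := pprof.Lookup("threadcreate")

	// A local listener that accepts connections but never writes to them,
	// so every client Read blocks indefinitely.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	var serverConns []net.Conn // keep accepted connections alive
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			serverConns = append(serverConns, c)
		}
	}()

	fmt.Printf("threads before: %d\n", threadProfile.Count())

	for i := 0; i < 100; i++ {
		go func() {
			conn, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				return
			}
			buf := make([]byte, 1)
			conn.Read(buf) // parked by the netpoller, no dedicated thread
		}()
	}

	time.Sleep(2 * time.Second)
	// The count stays close to the starting value, unlike the cgo DNS example below.
	fmt.Printf("threads after: %d\n", threadProfile.Count())
}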

But cgo calls or other blocking system calls may cause a large number of threads to be created and never reclaimed, as in the following example.

A simple test for dramatic thread count increase

Let me give you a simple example so you can see the large number of threads that are created.

package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
)

var threadProfile = pprof.Lookup("threadcreate")

func main() {
	// Number of threads before starting
	fmt.Printf("threads in starting: %d\n", threadProfile.Count())

	var wg sync.WaitGroup
	wg.Add(100)
	for i := 0; i < 100; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				net.LookupHost("www.google.com")
			}
		}()
	}
	wg.Wait()

	// Number of threads after the goroutines have finished
	fmt.Printf("threads after LookupHost: %d\n", threadProfile.Count())
}

Go provides two ways to resolve domain names: the cgo way and the pure Go way. Functions such as Dial, LookupHost, and LookupAddr in the net package end up calling the resolver directly or indirectly; the example above uses LookupHost. Under concurrency, the two resolver modes produce very different numbers of threads.

For example, with the pure Go resolver, the program ends up with 10 threads at exit:

$ GODEBUG=netdns=go go run main.go
threads in starting: 7
threads after LookupHost: 10

With the cgo resolver, the program exits with dozens, if not hundreds, of threads:

$ GODEBUG=netdns=cgo go run main.go
threads in starting: 7
threads after LookupHost: 109
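Incidentally, besides the GODEBUG environment variable, the resolver can also be selected in code: net.Resolver has a PreferGo field that asks for the pure Go DNS client where possible. A minimal sketch (the hostname is just an example):

package main

import (
	"context"
	"fmt"
	"net"
)

func main() {
	// PreferGo: true requests the pure Go resolver instead of the cgo/system one.
	r := &net.Resolver{PreferGo: true}
	addrs, err := r.LookupHost(context.Background(), "www.google.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}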

Infinite increase? No way!

The Go runtime does not reclaim these threads; instead it keeps them around for reuse when needed. Keeping a small number of idle threads for reuse is fine in theory, but if a burst of blocking calls created a large number of them, most of those threads will simply never be needed again.

If the program is not designed carefully, this can leave a large number of idle threads behind. If you make blocking system calls or cgo calls like these inside an HTTP handler, or in the handlers of a microservice server, high client concurrency can produce a “thread leak”.

So is the number of threads that can be created unlimited? No. For one thing, each thread takes up a certain amount of memory, so a huge number of threads can exhaust memory. In fact, the Go runtime does impose a limit on the number of threads it will create, which is 10000 by default.

You can set this using the debug.SetMaxThreads function. For example, you can set the maximum number of threads to 100 in the above example:

......
// Number of threads before starting
fmt.Printf("threads in starting: %d\n", threadProfile.Count())
debug.SetMaxThreads(100)

var wg sync.WaitGroup
wg.Add(100)
......

If you run the above program again, it crashes:

$ GODEBUG=netdns=cgo go run main.go
threads in starting: 7
runtime: program exceeds 100-thread limit
fatal error: thread exhaustion
runtime stack:
runtime.throw(0x54c3e2, 0x11)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.checkmcount()
        /usr/local/go/src/runtime/proc.go:622 +0xac
runtime.mReserveID(0x62c878)
        /usr/local/go/src/runtime/proc.go:636 +0
......

Reducing the number of threads

The official issue also mentions a way to kill a thread, using LockOSThread:

// KillOne kills one thread
func KillOne() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		runtime.LockOSThread()
		// return without calling UnlockOSThread: when a locked goroutine
		// exits, the runtime terminates the thread it is wired to
		return
	}()
	wg.Wait()
}

The LockOSThread function wires the calling goroutine to the operating system thread it is currently running on: that goroutine always executes on this thread, and no other goroutine executes on it. The goroutine is unwired from the thread only after it has called UnlockOSThread the same number of times it called LockOSThread.
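A minimal sketch of that nesting rule; the calls do nothing useful here and only illustrate the counting:

package main

import "runtime"

func main() {
	runtime.LockOSThread()
	runtime.LockOSThread() // nested lock on the same thread

	runtime.UnlockOSThread() // one lock still active, the goroutine stays wired
	runtime.UnlockOSThread() // now the goroutine is unwired from the thread
}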

If the goroutine exits without unlocking the thread, the thread is terminated. We can use this behavior to kill a thread: start a goroutine and call LockOSThread to claim a thread (since there are plenty of idle threads at this point, it will simply reuse one of them), then let the goroutine exit without calling UnlockOSThread, which causes that thread to be terminated.

Of course, the official issue also raises a concern: killing an idle thread this way could potentially cause a child process to receive the KILL signal.

You can extend this method to a Kill(n int) that terminates multiple threads; the principle is the same. From a practical point of view, you can start a watchdog goroutine that reclaims some threads whenever the thread count exceeds a certain threshold, or expose an API so that some threads can be terminated manually. Until the issue is solved officially, this is a workable approach; a rough sketch follows.
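This sketch is only an illustration of the ideas above, not a drop-in solution: Kill(n) extends KillOne to terminate n threads, and Watch is a hypothetical watchdog that reclaims threads above a threshold. The threshold and check interval are made up, and the threadcreate profile count is used as the thread measurement, as elsewhere in this post.

package main

import (
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

var threadProfile = pprof.Lookup("threadcreate")

// Kill terminates n threads: each goroutine wires itself to a thread and
// exits without unlocking, so the runtime terminates that thread.
func Kill(n int) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			runtime.LockOSThread()
		}()
	}
	wg.Wait()
}

// Watch periodically checks the thread count and reclaims the excess.
func Watch(threshold int, interval time.Duration) {
	go func() {
		for range time.Tick(interval) {
			if extra := threadProfile.Count() - threshold; extra > 0 {
				Kill(extra)
			}
		}
	}()
}

func main() {
	Watch(20, 10*time.Second) // hypothetical threshold and check interval
	// ... the real workload would run here ...
	time.Sleep(time.Minute)
}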


Reference https://colobu.com/2020/12/20/threads-in-go-runtime/