Goroutines are a core part of how Go implements concurrency. They are easy to get started with, but they also come with a variety of pitfalls, and goroutine leaks are one of the biggest: once a leak shows up, tracking it down often takes a long time. Some people suggest pprof, and it can do the job, but profiling tools like it are generally used to diagnose problems after they have already happened.

Is there a tool that can catch the problem before it happens? There is: goleak, open-sourced by Uber, detects goroutine leaks and can be combined with unit tests so that leaks are caught before they ship.

goroutine leaks

You may well have run into goroutine leaks in day-to-day development. A leaked goroutine is one that never exits, most often because it is blocked forever. Such goroutines live until the process ends, and the memory they hold (their stacks and anything they reference) is never released, so the memory available to the system keeps shrinking until the process may eventually crash. Common causes of leaks include:

  • The goroutine's logic enters an infinite loop and keeps consuming resources.
  • The goroutine uses a channel or mutex incorrectly and stays blocked forever.
  • The goroutine's logic waits for a long time, so goroutines pile up faster than they finish and their number skyrockets.

Next, let's use the classic goroutine + channel combination to demonstrate a goroutine leak.

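package main

import (
	"fmt"
	"runtime"
	"time"
)

// GetData starts a goroutine that receives from ch. Because ch is never
// initialized with make, it is nil and the receive blocks forever.
func GetData() {
	var ch chan struct{}
	go func() {
		<-ch
	}()
}

func main() {
	defer func() {
		// The leaked goroutine is still counted here.
		fmt.Println("goroutines: ", runtime.NumGoroutine())
	}()
	GetData()
	time.Sleep(2 * time.Second)
}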

In this example the channel is never initialized, so ch is nil, and both sending to and receiving from a nil channel block forever. A unit test written the usual way does not catch the problem:

func TestGetData(t *testing.T) {
 GetData()
}

Results of the run.

=== RUN   TestGetData
--- PASS: TestGetData (0.00s)
PASS

The test passes even though GetData leaks a goroutine: the standard testing package does not check for goroutines left running after a test. That is the gap goleak fills.

goleak

Using goleak mostly comes down to two functions: VerifyNone and VerifyTestMain. VerifyNone checks a single test case, while VerifyTestMain is called from TestMain and is less intrusive to the test code. Examples of both follow.

Using VerifyNone:

func TestGetDataWithGoleak(t *testing.T) {
 defer goleak.VerifyNone(t)
 GetData()
}

Results of the run.

=== RUN   TestGetDataWithGoleak
    leaks.go:78: found unexpected goroutines:
        [Goroutine 35 in state chan receive (nil chan), with asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData.func1 on top of the stack:
        goroutine 35 [chan receive (nil chan)]:
        asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData.func1()
         /Users/go/src/asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector/main.go:12 +0x1f
        created by asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData
         /Users/go/src/asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector/main.go:11 +0x3c
        ]
--- FAIL: TestGetDataWithGoleak (0.45s)

FAIL

Process finished with the exit code 1

The output points to the exact code that created the leaked goroutine and the line where it is blocked. However, VerifyNone has to be added to every test case, which is intrusive. VerifyTestMain integrates goleak into all of a package's tests at once:

func TestMain(m *testing.M) {
 goleak.VerifyTestMain(m)
}

Results of the run.

=== RUN   TestGetData
--- PASS: TestGetData (0.00s)
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 5 in state chan receive (nil chan), with asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData.func1 on top of the stack:
goroutine 5 [chan receive (nil chan)]:
asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData.func1()
 /Users/go/src/asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector/main.go:12 +0x1f
created by asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector.GetData
 /Users/go/src/asong.cloud/Golang_Dream/code_demo/goroutine_oos_detector/main.go:11 +0x3c
]

Process finished with the exit code 1

VerifyTestMain reports results a little differently from VerifyNone: it first reports the outcome of the test cases (TestGetData still shows PASS), and only then reports the leak analysis for the package as a whole, after which the process exits with code 1. The downside is that when a package contains several tests, the report cannot pinpoint which test leaked. To narrow it down, run each test on its own with a script like the following:

# Create a test binary which will be used to run each test individually
$ go test -c -o tests

# Run each test individually, printing "." for successful tests, or the test name
# for failing tests.
$ for test in $(go test -list . | grep -E "^(Test|Example)"); do ./tests -test.run "^$test\$" &>/dev/null && echo -n "." || echo -e "\n$test failed"; done

This prints the name of every test that fails when run on its own, which identifies the leaking test.

How goleak works

Starting from the VerifyNone entry point and following the source, we find that it calls the Find function:

// Find looks for extra goroutines, and returns a descriptive error if
// any are found.
func Find(options ...Option) error {
	// Get the ID of the current goroutine so it can be skipped.
	cur := stack.Current().ID()

	opts := buildOpts(options...)
	var stacks []stack.Stack
	retry := true
	for i := 0; retry; i++ {
		// Filter out goroutines that should not count as leaks.
		stacks = filterStacks(stack.All(), cur, opts)

		if len(stacks) == 0 {
			return nil
		}
		retry = opts.retry(i)
	}

	return fmt.Errorf("found unexpected goroutines:\n%s", stacks)
}

Let’s look at the filterStacks method.

// filterStacks will filter any stacks excluded by the given opts.
// filterStacks modifies the passed in stacks slice.
func filterStacks(stacks []stack.Stack, skipID int, opts *opts) []stack.Stack {
 filtered := stacks[:0]
 for _, stack := range stacks {
  // Always skip the running goroutine.
  if stack.ID() == skipID {
   continue
  }
  // Run any default or user-specified filters.
  if opts.filter(stack) {
   continue
  }
  filtered = append(filtered, stack)
 }
 return filtered
}

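filterStacks drops the goroutine stacks that should not count as leaks: the goroutine running the check is always skipped, and every other stack is passed through opts.filter, which discards it if any filter matches. The default filters are always installed, and filter options passed by the caller (such as goleak.IgnoreTopFunction) are added on top of them.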
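The filtering logic is easy to try in isolation. The sketch below mirrors the shape of filterStacks, but the Stack type, its fields, and the filter functions are hypothetical simplifications, not goleak's real stack.Stack or option types. It shows two things: a stack is dropped if it belongs to the current goroutine or if any filter matches it, and the result reuses the input slice's backing array (stacks[:0]), which is why the original comment warns that the passed-in slice is modified.

```go
package main

import (
	"fmt"
	"strings"
)

// Stack is a hypothetical, simplified stand-in for goleak's stack.Stack.
type Stack struct {
	ID        int
	FirstFunc string // function at the top of the stack
}

// filter reports whether a stack should be ignored.
type filter func(Stack) bool

// isTestRunner is a hypothetical filter, loosely inspired by goleak's
// isTestStack: it ignores goroutines sitting in the testing package.
func isTestRunner(s Stack) bool {
	return strings.HasPrefix(s.FirstFunc, "testing.")
}

// filterStacks keeps stacks that are neither the current goroutine nor
// matched by any filter. It writes into stacks[:0], so the caller's slice
// is overwritten in place.
func filterStacks(stacks []Stack, skipID int, filters []filter) []Stack {
	filtered := stacks[:0]
outer:
	for _, s := range stacks {
		if s.ID == skipID {
			continue // always skip the goroutine doing the check
		}
		for _, f := range filters {
			if f(s) {
				continue outer // any matching filter drops the stack
			}
		}
		filtered = append(filtered, s)
	}
	return filtered
}

func main() {
	stacks := []Stack{
		{ID: 1, FirstFunc: "main.TestGetData"},
		{ID: 2, FirstFunc: "testing.tRunner"},
		{ID: 35, FirstFunc: "main.GetData.func1"},
	}
	// A user-supplied filter, similar in spirit to ignoring a top function.
	ignoreWorker := func(s Stack) bool { return s.FirstFunc == "pkg.worker" }
	fmt.Println(filterStacks(stacks, 1, []filter{isTestRunner, ignoreWorker}))
}
```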

func buildOpts(options ...Option) *opts {
 opts := &opts{
  maxRetries: _defaultRetries,
  maxSleep:   100 * time.Millisecond,
 }
 opts.filters = append(opts.filters,
  isTestStack,
  isSyscallStack,
  isStdLibStack,
  isTraceStack,
 )
 for _, option := range options {
  option.apply(opts)
 }
 return opts
}

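As this shows, by default goleak retries up to 20 times (_defaultRetries), with maxSleep capping the wait between checks at 100ms, and it always installs the default filters before applying the caller's options.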

To summarize how goleak works:

goleak uses runtime.Stack() to capture the stacks of all currently running goroutines, removes the ones that should not be checked (the current goroutine plus anything matched by the default or user-supplied filters), and repeats the check with a short wait between attempts, up to the default number of retries. As soon as a check finds no unexpected goroutines, it reports no leak; if unexpected goroutines are still present after the final retry, Find returns an error listing their stacks.

Summary

In this article we introduced goleak, a tool that finds goroutine leaks during tests. It can only catch leaks in code the tests actually exercise, though, which underlines how important thorough test cases are.