Goroutine

for {
    go func() {}()
}

The barrier to starting a Goroutine is very low, so Goroutines are easy to abuse.

Goroutine Leak

The causes of Goroutine leaks are usually:

  • The Goroutine performs a blocking operation on a channel or mutex (a send, a receive, or a Lock) and, because of a logic error, stays blocked forever in some cases.
  • The business logic within the Goroutine enters an infinite loop, so its resources are never released.
  • The business logic within the Goroutine goes into a long wait while new Goroutines keep being started and join the wait.

Improper use of channel

Goroutine+Channel is the most classic combination, so many leaks occur here.

The most common case is the logic error mentioned above: a send or receive on a channel that never completes.

Send not receive

First example:

func main() {
    for i := 0; i < 4; i++ {
        queryAll()
        fmt.Printf("goroutines: %d\n", runtime.NumGoroutine())
    }
}

func queryAll() int {
    ch := make(chan int)
    for i := 0; i < 3; i++ {
        go func() { ch <- query() }()
    }
    return <-ch
}

func query() int {
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    return n
}

Output results:

goroutines: 3
goroutines: 5
goroutines: 7
goroutines: 9

In this example, main calls queryAll several times, and each call starts three Goroutines in a for loop. Each Goroutine sends the result of query to ch, while queryAll returns the first value it receives from ch.

The output shows the number of goroutines growing by 2 on every call. That is, each call leaks two goroutines.

The reason is that three values are sent on the unbuffered channel but only one is ever received, so the other two senders block forever, which is a Goroutine leak.
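
One common way to avoid this particular leak is to give the channel enough buffer for every sender. The sketch below is a minimal illustration rather than the only fix; queryAllBuffered and the workers constant are names introduced here for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

// query simulates a slow lookup, as in the example above.
func query() int {
	n := rand.Intn(100)
	time.Sleep(time.Duration(n) * time.Millisecond)
	return n
}

// queryAllBuffered is a hypothetical variant of queryAll. The channel has
// room for every result, so the slower senders can complete their send and
// exit even though only the first value is ever received.
func queryAllBuffered() int {
	const workers = 3
	ch := make(chan int, workers)
	for i := 0; i < workers; i++ {
		go func() { ch <- query() }()
	}
	return <-ch
}

func main() {
	for i := 0; i < 4; i++ {
		queryAllBuffered()
	}
	// Give the slower senders time to finish before counting.
	time.Sleep(200 * time.Millisecond)
	fmt.Printf("goroutines: %d\n", runtime.NumGoroutine())
}
```

With the buffer in place, the count should no longer grow with each call. The unread values are simply garbage collected together with the channel.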

Receive not send

Second example:

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    ch := make(chan struct{})
    go func() {
        <-ch // blocks forever: nothing ever sends on ch
    }()

    time.Sleep(time.Second)
}

Output results:

goroutines:  2

This example is the opposite of “send but not receive”: the Goroutine waits to receive from a channel that nothing ever sends on, so it also blocks forever.

Real-world business code is generally more complicated. The channel operation is buried in a pile of business logic, and when one send or receive loses its counterpart, the Goroutine blocks.
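
A blocked receiver can be given a way out by also waiting on a second channel that the caller closes when no value will arrive. The following is a minimal sketch under that assumption; startReceiver, done, and exited are hypothetical names used only for illustration.

```go
package main

import "fmt"

// startReceiver launches a goroutine that waits on ch but also watches
// done, so the caller can release it when no value will ever be sent.
// The returned channel is closed just before the goroutine returns.
func startReceiver(ch <-chan struct{}, done <-chan struct{}) <-chan struct{} {
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		select {
		case <-ch:
			// A value arrived; handle it here.
		case <-done:
			// The caller gave up; return instead of blocking forever.
		}
	}()
	return exited
}

func main() {
	ch := make(chan struct{})
	done := make(chan struct{})
	exited := startReceiver(ch, done)

	close(done) // nothing will be sent, so tell the receiver to stop
	<-exited    // wait until the goroutine is on its way out

	fmt.Println("receiver released")
}
```

In larger programs the same role is often played by a context.Context, whose Done channel is closed on cancellation.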

nil channel

Third example:

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    var ch chan int
    go func() {
        <-ch
    }()
    
    time.Sleep(time.Second)
}

Output results:

goroutines:  2

This example shows that a channel you forget to initialize is nil, and both sending on and receiving from a nil channel block forever.

The correct way is to initialize the channel first:

    ch := make(chan int)
    go func() {
        <-ch
    }()
    ch <- 0
    time.Sleep(time.Second)

Call the make function to initialize.

Strange slow wait

Fourth example:

func main() {
    for {
        go func() {
            _, err := http.Get("https://www.xxx.com/")
            if err != nil {
                fmt.Printf("http.Get err: %v\n", err)
            }
            // do something...
        }()

        time.Sleep(time.Second * 1)
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }
}

Output results:

goroutines:  5
goroutines:  9
goroutines:  13
goroutines:  17
goroutines:  21
goroutines:  25
...

This example shows a classic source of incidents in Go: an application that calls the API of a third-party service.

The third-party API, however, can sometimes be very slow and not return a response for a long time. As it happens, the default http.Client in Go does not set a timeout.

So the calls keep blocking, the number of Goroutines keeps climbing, and eventually resources are exhausted and an incident follows.

In Go projects, we generally recommend setting at least a timeout on http.Client:

    httpClient := http.Client{
        Timeout: time.Second * 15,
    }

It also helps to add measures such as rate limiting and circuit breaking, so that a sudden burst of traffic does not bring down a dependency.
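
To see the timeout take effect without depending on a real service, the sketch below points the client at a deliberately slow local test server. It is only an illustration: fetchWithTimeout and slowCallTimesOut are hypothetical helpers, and the 50ms and 500ms durations are arbitrary values chosen for the demo.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// fetchWithTimeout issues a GET with a client-side deadline and returns
// the response status code.
func fetchWithTimeout(target string, timeout time.Duration) (int, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(target)
	if err != nil {
		return 0, err
	}
	// Always close the body so the underlying connection is released.
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// slowCallTimesOut starts a local stand-in for a slow third-party service
// and reports whether the call was cut off by the client timeout.
func slowCallTimesOut() bool {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(500 * time.Millisecond) // much slower than the client allows
		fmt.Fprintln(w, "ok")
	}))
	defer slow.Close()

	_, err := fetchWithTimeout(slow.URL, 50*time.Millisecond)
	var urlErr *url.Error
	return errors.As(err, &urlErr) && urlErr.Timeout()
}

func main() {
	fmt.Println("timed out:", slowCallTimesOut())
}
```

Because the call returns with an error instead of waiting indefinitely, the calling Goroutine can log the failure and exit rather than pile up.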

Forget to unlock

Fifth example:

func main() {
    total := 0
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("total: ", total)
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    var mutex sync.Mutex
    for i := 0; i < 10; i++ {
        go func() {
            mutex.Lock()
            total += 1
        }()
    }
}

Output results:

total:  1
goroutines:  10

In this example, the first Goroutine acquires the sync.Mutex and never releases it, either because it is still busy with business logic or because the unlock was simply forgotten.

Every later Goroutine then blocks in mutex.Lock, because the lock is never released. In general, in Go projects, we recommend the following:

    var mutex sync.Mutex
    for i := 0; i < 10; i++ {
        go func() {
            mutex.Lock()
            defer mutex.Unlock()
            total += 1
        }()
    }

Improper use of sync.WaitGroup

Sixth example:

func handle(v int) {
    var wg sync.WaitGroup
    wg.Add(5)
    for i := 0; i < v; i++ {
        fmt.Println("nzjjy")
        wg.Done()
    }
    wg.Wait()
}

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    go handle(3)
    time.Sleep(time.Second)
}

In this example, we use sync.WaitGroup to coordinate the work, with the loop bound v standing in for a value passed in from outside.

However, because wg.Add(5) does not match the three wg.Done calls made when v is 3, the counter never reaches zero and wg.Wait blocks forever.

For use in a Go project, we would recommend writing it as follows:

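    var wg sync.WaitGroup
    for i := 0; i < v; i++ {
        wg.Add(1) // one Add per unit of work, paired with one Done
        go func() {
            defer wg.Done() // runs when this Goroutine returns
            fmt.Println("nzjjy")
        }()
    }
    wg.Wait()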

Verification method

We can call the runtime.NumGoroutine function to get the number of Goroutines that currently exist, and compare the value before and after an operation to see whether there is a leak.

However, for business services, most Goroutine leaks surface in test and production environments, so it is more common to use PProf:

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

http.ListenAndServe("localhost:6060", nil)

Requesting http://localhost:6060/debug/pprof/goroutine?debug=1 then returns the current Goroutines grouped by stack trace, with a count for each stack.
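
When exposing an HTTP endpoint is not convenient, a similar report can be produced in-process with the runtime/pprof package. The sketch below is one possible way to do that; goroutineDump is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// goroutineDump returns the goroutine profile as text, using the same
// debug level (1) as the URL above.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	block := make(chan struct{})
	go func() { <-block }() // a blocked goroutine that will appear in the report
	time.Sleep(50 * time.Millisecond)

	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Print(dump)
}
```

A leaked Goroutine shows up in the report with the stack of the operation it is blocked on, which usually points directly at the offending channel or lock.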


Reference https://eddycjy.com/posts/go/goroutine-leak/