Go 1.19 finally adds SetMemoryLimit. Go's GC exposes far fewer tuning parameters than Java's; until now there has been only one, GOGC, so a new knob for adjusting the GC is exciting.

Those who have been following Go performance will know that two hacks for tuning the Go GC have emerged in recent years:

  • ballast: This technique allocates a large block of memory that is never actually used (a “fake” footprint), which makes it harder for the heap to reach the threshold that triggers GC. That reduces the number of GC cycles and thus improves performance. It is very effective if your application’s real memory footprint stays below a certain threshold, because GC accounts for a large share of a Go program’s runtime overhead. The technique was published by engineers at twitch.tv.
  • GOGC tuner: This technique dynamically adjusts the GC target by tuning GOGC automatically, which reduces the number of GC cycles when there is enough memory available. It is also a very interesting and effective technique that works well in practice at Uber. It was developed by Uber engineers, who have not open-sourced it.

Now Go 1.19 provides SetMemoryLimit, which replaces the ballast approach and partially replaces the GOGC tuner approach.

The history of this feature dates back to issue #23044 in December 2017, which proposed adding a way to specify a minimum target heap size. The issue was hotly discussed, and in 2019 twitch.tv engineers implemented ballast, verifying from an engineering point of view that the GC can be tuned this way and that it works in practice.

In 2021, Go team engineer Michael Knyszek opened proposal #44309, which included the design document “user configurable memory target”. The tracking issue for this proposal eventually became #48409.

Originally, this proposal was expected to land in Go 1.18, but because its approval was delayed, it ships in Go 1.19 instead.

At the time of writing, Go 1.19 is still under development, but the proposal has already been implemented, and all that remains is some documentation and bug fixes, so we can use gotip to test it.

The original goal of this proposal was to replace the ballast technique, so once Go 1.19 is released, the ballast approach can be retired. This year, Uber’s engineers published their scheme for automatically adjusting GOGC, and SetMemoryLimit cannot completely replace it. GOGC tuner can adjust the GC target more flexibly. With SetMemoryLimit alone, a memory limit is set, but GC still runs frequently according to GOGC. If you add GOGC=off, GC only runs once the memory limit is reached, which is different from how GOGC tuner works.

The official GC tuning guide is not yet complete, so keep an eye on it for the official recommendations. It currently carries this notice:

This page is currently a work-in-progress and is expected to be complete by the time of the Go 1.19 release. See this tracking issue for more details.

Even though the official documentation is not yet complete, we can still get an early idea of the functionality and benefits of this feature from the proposal itself.

Here are four scenarios to observe the impact of this feature on GC:

  • SetMemoryLimit + GOGC=off + MemoryLimit is large enough
  • SetMemoryLimit + GOGC=off + MemoryLimit is not large enough
  • SetMemoryLimit + GOGC=100 + MemoryLimit is large enough
  • SetMemoryLimit + GOGC=100 + MemoryLimit is not large enough

Basic example

This article demonstrates these four scenarios with the btree example from Debian’s benchmarks game.

Because this example frequently creates and discards binary trees, it is well suited to exercising memory allocation and collection.

package main
import (
    "flag"
    "fmt"
    "sync"
    "time"
)
type node struct {
    next *next
}
type next struct {
    left, right node
}
func create(d int) node {
    if d == 1 {
        return node{&next{node{}, node{}}}
    }
    return node{&next{create(d - 1), create(d - 1)}}
}
func (p node) check() int {
    sum := 1
    current := p.next
    for current != nil {
        sum += current.right.check() + 1
        current = current.left.next
    }
    return sum
}
var (
    depth = flag.Int("depth", 10, "depth")
)
func main() {
    flag.Parse()
    start := time.Now()
    const MinDepth = 4
    const NoTasks = 4
    maxDepth := *depth
    longLivedTree := create(maxDepth)
    stretchTreeCheck := ""
    wg := new(sync.WaitGroup)
    wg.Add(1)
    go func() {
        stretchDepth := maxDepth + 1
        stretchTreeCheck = fmt.Sprintf("stretch tree of depth %d\t check: %d",
            stretchDepth, create(stretchDepth).check())
        wg.Done()
    }()
    results := make([]string, (maxDepth-MinDepth)/2+1)
    for i := range results {
        depth := 2*i + MinDepth
        n := (1 << (maxDepth - depth + MinDepth)) / NoTasks
        tasks := make([]int, NoTasks)
        wg.Add(NoTasks)
        // Start NoTasks goroutines; each one creates and checks n trees of depth `depth`,
        // for a total of n*NoTasks trees, each of depth `depth`.
        for t := range tasks {
            go func(t int) {
                check := 0
                for i := n; i > 0; i-- {
                    check += create(depth).check()
                }
                tasks[t] = check
                wg.Done()
            }(t)
        }
        wg.Wait()
        check := 0 // total check count across all tasks
        for _, v := range tasks {
            check += v
        }
        results[i] = fmt.Sprintf("%d\t trees of depth %d\t check: %d",
            n*NoTasks, depth, check)
    }
    fmt.Println(stretchTreeCheck)
    for _, s := range results {
        fmt.Println(s)
    }
    fmt.Printf("long lived tree of depth %d\t check: %d\n",
        maxDepth, longLivedTree.check())
    fmt.Printf("took %.02f s", float64(time.Since(start).Milliseconds())/1000)
}

You can use gotip build main.go to generate the Go 1.19 compiled binaries.

In the examples that follow, I do not set the memory limit with debug.SetMemoryLimit; instead I use the environment variable GOMEMLIMIT.

SetMemoryLimit + GOGC=off + MemoryLimit is large enough

First compile the executable soft_memory_limit with gotip, for example gotip build -o soft_memory_limit main.go (without -o, building main.go produces a binary named main).

Run GOMEMLIMIT=10737418240 GOGC=off GODEBUG=gctrace=1 ./soft_memory_limit -depth=21 to see the effect.


Here I set the memory limit to 10 GiB. The threshold is never reached while the program runs, so no GC happens.

This is the same effect as using a ballast.

SetMemoryLimit + GOGC=off + MemoryLimit is not large enough

Let’s set the memory limit to 1 GiB and see how GC behaves (GOMEMLIMIT=1073741824 GOGC=off GODEBUG=gctrace=1 ./soft_memory_limit -depth=21).


You can see that memory usage during the run does reach the 1 GiB threshold, which leads to several garbage collections. The overall running time is not much different from the first case, because there are only a few collections and their cost is negligible.

If you make the threshold ten times smaller, roughly 100 MiB (GOMEMLIMIT=107374182 GOGC=off GODEBUG=gctrace=1 ./soft_memory_limit -depth=21), you can see more frequent garbage collection and a significant increase in overall program runtime.


SetMemoryLimit + GOGC=100 + MemoryLimit is large enough

To achieve the ballast effect, the previous cases set GOGC to off. What if we set it to the default value of 100?

GOMEMLIMIT=10737418240 GOGC=100 GODEBUG=gctrace=1 ./soft_memory_limit -depth=21


As you can see, there are a large number of GC events, and many of them occur well before the memory limit is reached. This is expected: while memory usage is below the limit, the GOGC target still decides when to collect.

In this case, GOGC tuner can be used for tuning to avoid so many garbage collections.

SetMemoryLimit + GOGC=100 + MemoryLimit is not large enough

If the memory limit is not large enough, GC is triggered when memory usage reaches the limit. Since GOGC is not turned off, both GOGC and the memory limit can trigger GC, and the program still runs slower overall.
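
The interaction between the two triggers can be sketched with a deliberately simplified model. This is an assumption-laden illustration, not the runtime's actual pacer: it treats the GOGC goal as live heap times (1 + GOGC/100), ignores GC roots and non-heap memory, and caps the goal at the memory limit. The function name effectiveGoal and the sample sizes are made up for this example.

```go
package main

import "fmt"

// effectiveGoal is a simplified model of the heap size at which the next GC
// is expected. A negative gogc stands for GOGC=off, in which case only the
// memory limit triggers a collection.
func effectiveGoal(liveHeap, limit uint64, gogc int) uint64 {
	if gogc < 0 {
		return limit
	}
	goal := liveHeap + liveHeap*uint64(gogc)/100
	if goal > limit {
		return limit // the memory limit is reached first
	}
	return goal // the GOGC target is reached first
}

func main() {
	const MiB = 1 << 20
	const GiB = 1 << 30

	fmt.Println(effectiveGoal(200*MiB, 10*GiB, 100) / MiB) // 400: GOGC triggers long before the limit
	fmt.Println(effectiveGoal(200*MiB, 10*GiB, -1) / MiB)  // 10240: only the limit matters
	fmt.Println(effectiveGoal(800*MiB, 1*GiB, 100) / MiB)  // 1024: the limit caps the GOGC goal
}
```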


To sum up, by setting a large enough memory limit (with SetMemoryLimit or GOMEMLIMIT) and adding GOGC=off, you can achieve the effect of ballast.

But if GOGC is not turned off, many GC cycles may still be triggered, which hurts performance. In that case GOGC tuner is still needed to reduce the number of GC cycles before the memory limit is reached.
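
Since Uber's tuner is not open source, the following is only a hypothetical sketch of the general idea of adjusting GOGC dynamically, not their implementation. It picks a GOGC value so that the next heap goal lands near a chosen limit. The function name gogcFor, the clamp values 25 and 500, and the 1 GiB limit are all assumptions for illustration, and HeapAlloc is used as a rough stand-in for the live heap.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// gogcFor (a hypothetical helper) returns a GOGC percentage that places the
// next heap goal near limit for the given live heap size.
// The clamp bounds are arbitrary example values.
func gogcFor(liveHeap, limit uint64) int {
	const minGOGC, maxGOGC = 25, 500
	if liveHeap == 0 {
		return maxGOGC
	}
	if liveHeap >= limit {
		return minGOGC
	}
	pct := (limit - liveHeap) * 100 / liveHeap
	if pct < minGOGC {
		return minGOGC
	}
	if pct > maxGOGC {
		return maxGOGC
	}
	return int(pct)
}

func main() {
	const limit = 1 << 30 // hypothetical 1 GiB budget

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc also includes unreachable objects that have not been swept
	// yet, so it only approximates the live heap.
	pct := gogcFor(m.HeapAlloc, limit)
	debug.SetGCPercent(pct)
	fmt.Println("GOGC set to", pct)
}
```

A real tuner would repeat this adjustment periodically or after each GC cycle; this sketch applies it once.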