go gc closures

1. Go function closures

Go supports closures natively: in Go, a closure is a function literal. The Go specification describes closures as follows.

Function literals are closures: they can refer to variables defined in their wrapping function. These variables are then shared between the wrapping function and the function literals, and they continue to exist as long as they can be accessed.
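
As a minimal illustration of this sharing (the names counter, next, and total are made up for this sketch), the wrapping function and the function literal below read and write the same variable:

```go
package main

import "fmt"

// counter returns a closure together with a pointer to the variable the
// closure captures, so the caller can see that both use the same storage.
func counter() (next func() int, total *int) {
    count := 0
    next = func() int {
        count++ // modifies the variable declared in counter
        return count
    }
    return next, &count
}

func main() {
    next, total := counter()
    next()
    next()
    fmt.Println(*total) // prints 2
    *total = 10
    fmt.Println(next()) // prints 11
}
```

The variable count outlives the call to counter because both the closure and the returned pointer can still reach it.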

Closures are used widely in Go, very often together with the go keyword to start a new goroutine, as in the following code from the net/http package in the standard library.

// $GOROOT/src/net/http/fileTransport.go

func (t fileTransport) RoundTrip(req *Request) (resp *Response, err error) {
    rw, resc := newPopulateResponseWriter()
    go func() {
        t.fh.ServeHTTP(rw, req)
        rw.finish()
    }()
    return <-resc, nil
}

The RoundTrip method above uses the go keyword with a closure to start a new goroutine. The function running in that goroutine refers to variables that belong to its wrapping function, namely t, rw, and req; in other words, the two functions share these variables.

Once a variable that was used only inside the RoundTrip method is “shared” with another function in this way, it can no longer be allocated on the stack; its escape to the heap is a certainty.
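
A small sketch of the same pattern outside net/http may help (sumAsync and its variables are hypothetical names, not part of the standard library). The closure started with go captures nums and done, so it cannot live in sumAsync's stack frame. Building the file with go build -gcflags=-m asks the compiler to print its escape-analysis decisions; the exact messages differ between Go versions.

```go
package main

import "fmt"

// sumAsync starts a goroutine whose closure captures nums and done.
// The goroutine may still be running after sumAsync has returned.
func sumAsync(nums []int) <-chan int {
    done := make(chan int, 1)
    go func() {
        total := 0
        for _, n := range nums {
            total += n
        }
        done <- total
    }()
    return done
}

func main() {
    fmt.Println(<-sumAsync([]int{1, 2, 3})) // prints 6
}
```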

So here is the question: when can these heap-allocated outer variables, which the closure references (or “captures”), be reclaimed? The example above is easy to reason about: the variables can be reclaimed once the newly created goroutine finishes executing. But what about the following closure function?

func foo() func(int) int {
    i := []int{0: 10, 1: 11, 15: 128}
    return func(n int) int {
        n += i[0]
        return n
    }
}

In this foo function, when can the slice variable i of length 16, which is captured by the closure function, be reclaimed?

Note: When describing closures, we usually say that a closure references variables of its external wrapper function. The Go compiler's implementation code uses the term capture var, that is, “captured variables”, so this article also uses “capture” for variables of the wrapper function (or of functions even further out) that are shared with the closure.

The return value of foo is a function, which means that the local variable i of foo is captured by the newly created closure function that foo returns, so i is not reclaimed when foo returns. In general, an object on the heap stays alive only while something references it or holds a pointer to it; once it is unreachable, that is, once no reference or pointer to it remains, the GC reclaims it.

So, who exactly is variable i referenced by? When will variable i be reclaimed?

Let’s first go back to an ordinary function that is not a closure.

package main

import "fmt"

func f1() []int {
    i := []int{0: 10, 1: 11, 15: 128}
    return i
}

func f2() {
    sl := f1()
    sl[0] = sl[0] + 10
    fmt.Println(sl)
}

func main() {
    f2()
}

After f1 returns its local slice variable i, the slice is referenced by sl in the f2 function. Once f2 finishes executing, the slice's backing array becomes unreachable and the GC reclaims the corresponding heap memory.

If we switch to a closure function, such as the foo function from earlier, we are likely to use it like this.

// https://github.com/bigwhite/experiments/tree/master/closure/closure1.go

package main
import "fmt"
func foo() func(int) int {
    i := []int{0: 10, 1: 11, 15: 128}
    return func(n int) int {
        n += i[0]
        return n
    }
}
func bar() {
    f := foo()
    a := f(5)
    fmt.Println(a)
}

func main() {
    bar()
    g := foo()
    b := g(6)
    fmt.Println(b)
}

In this example, the local variable i of the foo function stays referenced for as long as the closure function exists. This brings to mind the fact that functions are first-class citizens in Go. Could it be that the closure function is an object that references the local variables of foo? If so, how does the closure function refer to foo's local integer slice variable i in memory, and what does the closure function itself look like in memory?

If a programming language places no restrictions on how a language element is created and used, so that the element can be treated like any other value, we call that element a “first-class citizen” of the language.
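
A short sketch of what that means for Go functions (apply, adder, and ops are illustrative names): a function can be stored in a variable, kept in a map, passed as an argument, and returned as a result.

```go
package main

import "fmt"

// apply receives a function as an argument, like any other value.
func apply(f func(int) int, v int) int {
    return f(v)
}

// adder returns a function as its result.
func adder(delta int) func(int) int {
    return func(n int) int { return n + delta }
}

func main() {
    // A function literal stored in a variable.
    double := func(n int) int { return n * 2 }

    // Function values stored in a map.
    ops := map[string]func(int) int{
        "double": double,
        "add3":   adder(3),
    }

    fmt.Println(apply(ops["double"], 5)) // prints 10
    fmt.Println(apply(ops["add3"], 5))   // prints 8
}
```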

2. Go closure function objects

To answer these questions, we have to turn to Go assembly. We generate the assembly code for closure1.go above (using Go compiler version 1.16.5).

$ go tool compile -S closure1.go > closure1.s

In the output, we find the assembly corresponding to the creation of the closure function at line 7 of closure1.go (the return func(n int) int { statement).

// https://github.com/bigwhite/experiments/tree/master/closure/closure1.s

    0x0052 00082 (closure1.go:7)    LEAQ    type.noalg.struct { F uintptr; "".i []int }(SB), CX
    0x0059 00089 (closure1.go:7)    MOVQ    CX, (SP)
    0x005d 00093 (closure1.go:7)    PCDATA  $1, $1
    0x005d 00093 (closure1.go:7)    NOP
    0x0060 00096 (closure1.go:7)    CALL    runtime.newobject(SB)
    0x0065 00101 (closure1.go:7)    MOVQ    8(SP), AX
    0x006a 00106 (closure1.go:7)    LEAQ    "".foo.func1(SB), CX
    0x0071 00113 (closure1.go:7)    MOVQ    CX, (AX)
    0x0074 00116 (closure1.go:7)    MOVQ    $16, 16(AX)
    0x007c 00124 (closure1.go:7)    MOVQ    $16, 24(AX)
    0x0084 00132 (closure1.go:7)    PCDATA  $0, $-2
    0x0084 00132 (closure1.go:7)    CMPL    runtime.writeBarrier(SB), $0
    0x008b 00139 (closure1.go:7)    JNE 165
    0x008d 00141 (closure1.go:7)    MOVQ    ""..autotmp_7+16(SP), CX
    0x0092 00146 (closure1.go:7)    MOVQ    CX, 8(AX)
    0x0096 00150 (closure1.go:7)    PCDATA  $0, $-1
    0x0096 00150 (closure1.go:7)    MOVQ    AX, "".~r0+40(SP)
    0x009b 00155 (closure1.go:7)    MOVQ    24(SP), BP
    0x00a0 00160 (closure1.go:7)    ADDQ    $32, SP
    0x00a4 00164 (closure1.go:7)    RET
    0x00a5 00165 (closure1.go:7)    PCDATA  $0, $-2
    0x00a5 00165 (closure1.go:7)    LEAQ    8(AX), DI
    0x00a9 00169 (closure1.go:7)    MOVQ    ""..autotmp_7+16(SP), CX
    0x00ae 00174 (closure1.go:7)    CALL    runtime.gcWriteBarrierCX(SB)
    0x00b3 00179 (closure1.go:7)    JMP 150
    0x00b5 00181 (closure1.go:7)    NOP

Assembly is always hard to read, so let’s focus on the first line.

 0x0052 00082 (closure1.go:7)    LEAQ    type.noalg.struct { F uintptr; "".i []int }(SB), CX

Line 7 of the Go source is where the closure function is created. This assembly instruction loads the address of the type descriptor for a structure into CX; that address is then passed to runtime.newobject, which allocates an object of that type. Let’s extract the structure.

struct {
    F uintptr
    i []int
}

Where does this structure come from? The Go compiler generates it from the “characteristics” of the closure function. F holds the address of the closure function's code; the closure is, after all, a function, so its code lives in the same memory area as that of an ordinary function (the program's text segment). And what about the integer slice i? It is the local variable i of the foo function, captured by the closure function. If you are not convinced, we can verify this by defining another closure function that captures more variables.
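
Before doing so, one way to picture this structure is to write it out by hand. The sketch below is only a conceptual model, not what the compiler emits: fooClosure and call are hypothetical names, and the method call stands in for the F field that holds the code address.

```go
package main

import "fmt"

// fooClosure is a hand-written stand-in for the compiler-generated struct.
// The real object starts with an F field holding the code address; here
// the method call plays that role.
type fooClosure struct {
    i []int // the captured variable
}

// call corresponds to the body of the function literal inside foo.
func (c *fooClosure) call(n int) int {
    n += c.i[0]
    return n
}

// foo mirrors the original foo but returns the explicit closure object.
func foo() *fooClosure {
    i := []int{0: 10, 1: 11, 15: 128}
    return &fooClosure{i: i}
}

func main() {
    f := foo()
    fmt.Println(f.call(5)) // prints 15
}
```

Seen this way, the slice stays reachable for exactly as long as the fooClosure value does, which is the relationship the rest of this section verifies.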

Here is the generator function for a closure function that captures 3 integer variables.

// https://github.com/bigwhite/experiments/tree/master/closure/closure2.go

func foo() func(int) int {
    var a, b, c int = 11, 12, 13
    return func(n int) int {
        a += n
        b += n
        c += n
        return a + b + c
    }
}

The structure of that closure function in its corresponding assembly code is as follows.

0x0084 00132 (closure2.go:10)   LEAQ    type.noalg.struct { F uintptr; "".a *int; "".b *int; "".c *int }(SB), CX

Extracting the structure gives:

struct {
    F uintptr
    a *int
    b *int
    c *int
}

At this point, we have confirmed that what references the local variables of the wrapper function is the closure function itself, or more precisely the closure structure object that the compiler creates for it in memory. With the unsafe package, we can even print the contents of this closure function object. Let’s try it with closure2.go, as shown in the following code.

// https://github.com/bigwhite/experiments/tree/master/closure/closure2.go

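package main

import (
    "fmt"
    "unsafe"
)

func foo() func(int) int {
    var a, b, c int = 11, 12, 13
    return func(n int) int {
        a += n
        b += n
        c += n
        return a + b + c
    }
}

// closure mirrors the layout of the compiler-generated closure object
// seen in the assembly output: the code address followed by pointers
// to the captured variables.
type closure struct {
    f uintptr
    a *int
    b *int
    c *int
}

func bar() {
    f := foo()
    f(5)
    // f holds the address of the closure object, so &f is a pointer
    // to that address.
    pc := *(**closure)(unsafe.Pointer(&f))
    fmt.Printf("%#v\n", *pc)
    fmt.Printf("a=%d, b=%d,c=%d\n", *pc.a, *pc.b, *pc.c)
    f(6)
    fmt.Printf("a=%d, b=%d,c=%d\n", *pc.a, *pc.b, *pc.c)
}

func main() {
    bar()
}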

In the code above, we define the closure struct by following the assembly output so that it matches the closure function object in memory. (Every closure object has a different layout, so the trick is to derive the struct from the assembly output.) Through an unsafe pointer conversion, we then map the closure object in memory onto an instance of the closure struct. Running the program produces the following output.

$ go run closure2.go
main.closure{f:0x10a4d80, a:(*int)(0xc000118000), b:(*int)(0xc000118008), c:(*int)(0xc000118010)}
a=16, b=17,c=18
a=22, b=23,c=24

In the example above, the closure function captures the external variables a, b, and c, which are in fact referenced by a closure object that the compiler creates in memory. When we call the foo function, the closure function object is created and its address is assigned to the variable f. From then on, f keeps a, b, and c reachable. Only once the closure object itself is unreachable and reclaimed can a, b, and c become unreachable and be reclaimed too.

If a closure function only reads a captured external variable, the closure function object does not store a pointer to the variable but simply holds a copy of its value. Of course, if a variable is captured by several closures created in the same function, and some only read it while others modify it, the closure function objects still store the address of that variable.
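
The same unsafe trick can be used to look at a read-only capture. This is a sketch under assumptions: readOnly, valueClosure, and capturedX are hypothetical names, and the valueClosure layout (code address followed by a copy of x) is an implementation detail of the gc compiler that may differ between versions.

```go
package main

import (
    "fmt"
    "unsafe"
)

// readOnly returns a closure that only reads the captured variable x.
func readOnly() func() int {
    x := 42
    return func() int { return x }
}

// valueClosure is the assumed layout of that closure object: the code
// address followed by a copy of x (an int, not a *int).
type valueClosure struct {
    f uintptr
    x int
}

// capturedX reads the x field straight out of the closure object.
func capturedX() int {
    f := readOnly()
    obj := *(**valueClosure)(unsafe.Pointer(&f))
    return obj.x
}

func main() {
    fmt.Println(capturedX()) // 42 if the assumed layout holds
}
```

This matches what the closure1.go assembly already showed: the captured slice i appears in the structure as []int, not as a pointer to it.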

Understanding what a closure function really is makes the question in the title of this article much easier to answer: once the closure function object that captured the variables has been reclaimed, and provided nothing else references them, the captured variables become unreachable and are subsequently reclaimed by the GC.
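
One way to observe this behaviour is with a finalizer. The sketch below (payload, makeClosure, collected, and observe are hypothetical names) attaches a finalizer to a captured object with runtime.SetFinalizer and checks whether it has run, first while the closure is still held and then after it has been dropped. The runtime does not guarantee when finalizers run, so the code retries a few collections and reports what it saw instead of assuming a result.

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

// payload stands in for a large object captured by a closure.
type payload struct {
    data [1 << 20]byte
}

// makeClosure returns a closure that captures p. The finalizer closes
// freed once the garbage collector has found p unreachable.
func makeClosure(freed chan struct{}) func() byte {
    p := &payload{}
    runtime.SetFinalizer(p, func(*payload) { close(freed) })
    return func() byte { return p.data[0] }
}

// collected forces up to attempts collections and reports whether the
// finalizer has signalled on freed.
func collected(freed <-chan struct{}, attempts int) bool {
    for i := 0; i < attempts; i++ {
        runtime.GC()
        select {
        case <-freed:
            return true
        case <-time.After(20 * time.Millisecond):
        }
    }
    return false
}

// observe checks the payload while the closure is held and after it is dropped.
func observe() (whileHeld, afterDrop bool) {
    freed := make(chan struct{})
    f := makeClosure(freed)
    whileHeld = collected(freed, 3)
    runtime.KeepAlive(f) // f, and therefore p, is reachable up to here
    f = nil              // drop the only reference to the closure object
    afterDrop = collected(freed, 50)
    return whileHeld, afterDrop
}

func main() {
    held, dropped := observe()
    fmt.Println("reclaimed while closure held:", held)
    fmt.Println("reclaimed after closure dropped:", dropped)
}
```

While f is kept alive, the payload cannot be collected; after f is dropped, nothing else references the payload, so a later collection is free to reclaim it.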

3. Summary

Let’s recall the sentence from the Go language specification quoted at the beginning of the article: “they continue to exist as long as they can be accessed”. We can now read it as follows: as long as the closure function object exists, the variables it captures also exist and will not be reclaimed.

This mechanism means that in day-to-day use we should always keep in mind that the variables captured by a closure function may be reclaimed late. Suppose the captured variables take up a lot of memory, and business requirements cause closure function objects to be created in large numbers and executed only after a long delay (timers, for example). Heap usage can then stay high for a long time, so we need to consider whether the available memory can sustain that level and, if not, choose a different implementation.
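
One possible alternative, sketched below with the hypothetical functions heavy and light, is to capture only the data the closure actually needs instead of the whole large object.

```go
package main

import "fmt"

// heavy captures the whole slice, so its backing array (1<<20 ints)
// stays reachable for as long as the returned closure does.
func heavy() func(int) int {
    big := make([]int, 1<<20)
    big[0] = 10
    return func(n int) int { return n + big[0] }
}

// light copies the one element it needs before creating the closure.
// The closure captures only an int, so the large slice becomes
// unreachable as soon as light returns.
func light() func(int) int {
    big := make([]int, 1<<20)
    big[0] = 10
    first := big[0]
    return func(n int) int { return n + first }
}

func main() {
    fmt.Println(heavy()(5), light()(5)) // prints 15 15
}
```

Both closures compute the same result; the difference is only in how much memory each one keeps alive while it waits to be called.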