In this article we take an in-depth look, from the perspective of assembly and the stack, at how Go performs ordinary function calls, struct method calls, and closure calls.

Preamble

Function call types

Functions in this article refer to any executable block of code in Go. As mentioned in Go 1.1 Function Calls, there are four types of functions in Go.

  • top-level func
  • method with value receiver
  • method with pointer receiver
  • func literal

A top-level func is the ordinary function we write every day.

func TopLevel(x int) {}

Methods with a value receiver and methods with a pointer receiver are the two kinds of struct methods.

A method adds behavior to a user-defined type. The difference between a method and a function is that a method has a receiver: add a receiver to a function and it becomes a method. The receiver can be either a value receiver or a pointer receiver.

Let’s take two simple structs, Man and Woman, as an example.

type Man struct {
}
type Woman struct {
}

func (*Man) Say() {
}

func (Woman) Say() {
}

(*Man) Say() uses a pointer receiver; (Woman) Say() uses a value receiver.

The definition of function literal is as follows.

A function literal represents an anonymous function.

That is, function literals cover both anonymous functions and closures.

The analysis below follows these four types.
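As a quick reference, the four kinds can be put side by side in one small program. This is only an illustrative sketch: the Counter type and the function names are hypothetical and do not come from the examples analyzed later.

```go
package main

import "fmt"

// Counter is a hypothetical type used only for this illustration.
type Counter struct{ N int }

// TopLevel is a top-level func.
func TopLevel(x int) int { return x + 1 }

// Value is a method with a value receiver: it works on a copy of c.
func (c Counter) Value() int { return c.N }

// Incr is a method with a pointer receiver: it can modify the original.
func (c *Counter) Incr() { c.N++ }

func main() {
	c := Counter{N: 1}
	c.Incr()

	// A func literal (anonymous function) assigned to a variable.
	double := func(x int) int { return x * 2 }

	fmt.Println(TopLevel(1), c.Value(), double(21))
}
```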

Basics

stack

On mainstream machine architectures (e.g., x86), the stack grows downward, from high addresses toward low addresses.


assembly function

Let’s take a look at the definition of the assembly function for plan9:

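[Figure: structure of a Plan 9 assembly function declaration, showing the stack frame size and the arguments size]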

stack frame size: includes the local variables and the space needed for calling other functions.

arguments size: includes the arguments and the return values. For example, with three int64 inputs and one int64 return value, the arguments size is sizeof(int64) * 4 = 32 bytes.
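The arithmetic can be checked with a few lines of Go. This is a minimal sketch: argsSize is a hypothetical helper, and it assumes every argument and result is an int64 with no padding in between.

```go
package main

import (
	"fmt"
	"unsafe"
)

// argsSize returns the number of bytes taken by nArgs int64 arguments
// plus nResults int64 return values (hypothetical helper, no padding assumed).
func argsSize(nArgs, nResults int) uintptr {
	return uintptr(nArgs+nResults) * unsafe.Sizeof(int64(0))
}

func main() {
	fmt.Println(argsSize(3, 1)) // three inputs and one result, 8 bytes each
	fmt.Println(argsSize(2, 1)) // two inputs and one result
}
```

The second call corresponds to a function with two 8-byte inputs and one 8-byte result, which is the shape of the add function used later in this article.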

Stack adjustment

Stack adjustment is done by operating on the hardware SP register, for example:

SUBQ    $24, SP  // subtract from SP to allocate the function's stack frame
...
ADDQ    $24, SP  // add to SP to release the function's stack frame

Since the stack grows downward, subtracting from SP with SUBQ allocates the function's stack frame, and adding to SP with ADDQ releases it.

Common instructions

Addition and subtraction operations.

ADDQ  AX, BX   // BX += AX
SUBQ  AX, BX   // BX -= AX

Data handling.

Constants are written as $num in Plan 9 assembly; they can be negative and are decimal by default. The number of bytes moved is determined by the suffix of the MOV instruction.

MOVB $1, DI      // 1 byte
MOVW $0x10, BX   // 2 bytes
MOVL $1, DX      // 4 bytes
MOVQ $-10, AX     // 8 bytes

When using MOVQ, also note the difference between operands with and without parentheses.

// Parentheses mean the register is dereferenced as a pointer
MOVQ (AX), BX   // => BX = *AX, copy the 8 bytes at the address held in AX into BX
MOVQ 16(AX), BX // => BX = *(AX + 16)

// Without parentheses the register's own value is used
MOVQ AX, BX     // => BX = AX, copy the content of AX into BX; note the difference
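The same distinction exists in Go between reading through a pointer and copying the pointer itself. The sketch below is only an analogy: load is a hypothetical helper, and the array mem stands in for memory addressed by AX.

```go
package main

import "fmt"

// load mimics MOVQ (AX), BX: it reads the value stored at the address in ax.
func load(ax *int64) int64 { return *ax }

func main() {
	mem := [3]int64{10, 20, 30} // three consecutive 8-byte words
	ax := &mem[0]

	bx := load(ax) // like MOVQ (AX), BX: BX = *AX
	cx := mem[2]   // like MOVQ 16(AX), CX: the word 16 bytes past AX
	dx := ax       // like MOVQ AX, DX: DX gets the address itself

	fmt.Println(bx, cx, dx == &mem[0])
}
```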

Address arithmetic.

LEAQ (AX)(AX*2), CX // => CX = AX + (AX * 2) = AX * 3

The 2 in the code above is the scale; the scale can only be 1, 2, 4, or 8.
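The address computation behind an operand of the form disp(base)(index*scale) can be modeled in Go. This is a sketch for illustration: effectiveAddr is a hypothetical helper, and it assumes the x86 rule that the scale factor is 1, 2, 4, or 8.

```go
package main

import "fmt"

// effectiveAddr models the operand disp(base)(index*scale):
// the result is base + index*scale + disp.
func effectiveAddr(base, index, scale, disp int64) int64 {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*scale + disp
	default:
		panic("scale must be 1, 2, 4, or 8")
	}
}

func main() {
	ax := int64(7)
	// LEAQ (AX)(AX*2), CX computes AX + AX*2, that is AX * 3.
	fmt.Println(effectiveAddr(ax, ax, 2, 0))
}
```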

Function call analysis

Direct function calls

We define a simple function here.

package main

func main() {
    add(1, 2)
}

func add(a, b int) int {
    return a + b
}

Then use the command to print out the assembly.

GOOS=linux GOARCH=amd64 go tool compile -S -N -l main.go

Let’s go through the assembly instructions and the stack step by step, starting with the call in main.

"".main STEXT size=71 args=0x0 locals=0x20
0x0000 00000 (main.go:3)        TEXT    "".main(SB), ABIInternal, $32-0
0x0000 00000 (main.go:3)        MOVQ    (TLS), CX
0x0009 00009 (main.go:3)        CMPQ    SP, 16(CX)   ; stack overflow check
0x000d 00013 (main.go:3)        PCDATA  $0, $-2      ; GC related
0x000d 00013 (main.go:3)        JLS     64
0x000f 00015 (main.go:3)        PCDATA  $0, $-1      ; GC related
0x000f 00015 (main.go:3)        SUBQ    $32, SP      ; allocate 32 bytes of stack space
0x0013 00019 (main.go:3)        MOVQ    BP, 24(SP)   ; save the value of BP on the stack
0x0018 00024 (main.go:3)        LEAQ    24(SP), BP   ; set BP to the address of the 8-byte slot just allocated
0x001d 00029 (main.go:3)        FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related
0x001d 00029 (main.go:3)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related
0x001d 00029 (main.go:4)        MOVQ    $1, (SP)     ; write add's first argument, 1, to (SP)
0x0025 00037 (main.go:4)        MOVQ    $2, 8(SP)    ; write add's second argument, 2, to 8(SP)
0x002e 00046 (main.go:4)        PCDATA  $1, $0
0x002e 00046 (main.go:4)        CALL    "".add(SB)   ; call the add function
0x0033 00051 (main.go:5)        MOVQ    24(SP), BP   ; restore BP from the value saved on the stack
0x0038 00056 (main.go:5)        ADDQ    $32, SP      ; increase SP to shrink the stack and release the 32 bytes
0x003c 00060 (main.go:5)        RET

Here’s a look at what the above assembly does.

0x0000 00000 (main.go:3)        TEXT    "".main(SB), ABIInternal, $32-0

0x0000 : the offset of the current instruction relative to the current function.

TEXT : a directive that defines a function; the name comes from the .text segment, where program code is placed at runtime.

"".main(SB) : the symbol of the function. The empty "" is a placeholder for the package name, and main is the function name. SB is a virtual register that holds the static-base pointer, which is the start address of our program address space.

$32-0 : $32 indicates the size of the stack frame to be allocated; 0 specifies the size of the arguments passed in by the caller.

0x000d 00013 (main.go:3)        PCDATA  $0, $-2      ; GC related
0x000f 00015 (main.go:3)        PCDATA  $0, $-1      ; GC related

0x001d 00029 (main.go:3)        FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related
0x001d 00029 (main.go:3)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related

The FUNCDATA and PCDATA instructions contain information that is used by garbage collection; these instructions are added by the compiler.

0x000f 00015 (main.go:3)        SUBQ    $32, SP

Because the stack grows from high addresses toward low addresses, SUBQ $32, SP subtracts the stack frame size from SP, which allocates 32 bytes of stack memory.

0x0013 00019 (main.go:3)        MOVQ    BP, 24(SP)   ; save the value of BP on the stack
0x0018 00024 (main.go:3)        LEAQ    24(SP), BP   ; set BP to the address of the 8-byte slot just allocated

Here 8 bytes (24(SP)-32(SP)) are used to store the current frame pointer BP.

0x001d 00029 (main.go:4)        MOVQ    $1, (SP)     ; write add's first argument, 1, to (SP)
0x0025 00037 (main.go:4)        MOVQ    $2, 8(SP)    ; write add's second argument, 2, to 8(SP)

Argument value 1 is written to the stack at position 0(SP) to 8(SP).

Argument value 2 is written to the stack at position 8(SP) to 16(SP).

Note that the parameter type here is int, which is 8 bytes on a 64-bit platform. Although the stack grows from high addresses toward low addresses, each data block within the stack is stored from low address to high address, and a pointer to the block points at its lowest address.

In summary, this function call tells us two things about how arguments are passed.

  1. Arguments are passed entirely on the stack.
  2. Arguments are placed on the stack from right to left in the argument list.

Here are the details of the call to the stack before the add function is called.

[Figure: stack layout before the add function is called]

Once the arguments are in place, the instruction CALL "".add(SB) is executed. It first pushes the return address into main (8 bytes) onto the stack, which moves the stack pointer SP down by 8, and then jumps to the code of add.

Let’s go to the add function.

"".add STEXT nosplit size=25 args=0x18 locals=0x0
0x0000 00000 (main.go:7)        TEXT    "".add(SB), NOSPLIT|ABIInternal, $0-24
0x0000 00000 (main.go:7)        FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related
0x0000 00000 (main.go:7)        FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) ; GC related
0x0000 00000 (main.go:7)        MOVQ    $0, "".~r2+24(SP)   ; initialize the return value
0x0009 00009 (main.go:8)        MOVQ    "".a+8(SP), AX      ; AX = 1
0x000e 00014 (main.go:8)        ADDQ    "".b+16(SP), AX     ; AX = AX + 2
0x0013 00019 (main.go:8)        MOVQ    AX, "".~r2+24(SP)   ; 24(SP) = AX = 3
0x0018 00024 (main.go:8)        RET

Since the stack pointer SP changes at this point, let’s look at the data on the stack before reading the assembly of this function. We can inspect it with dlv.

When execution reaches the call to add, before the CALL instruction runs, we can print the current Rsp and Rbp registers with regs.

(dlv) regs 
   Rsp = 0x000000c000044760
   Rbp = 0x000000c000044778
     ...

(dlv)  print uintptr(0x000000c000044778)
824634001272
(dlv)  print uintptr(0x000000c000044760)
824634001248

The difference between the address values of Rsp and Rbp is 24 bytes, which is consistent with our example above.

Then after entering the add function, we can use regs to print the current Rsp and Rbp registers.

(dlv) regs
   Rsp = 0x000000c000044758
   Rbp = 0x000000c000044778
   ...

(dlv)  print uintptr(0x000000c000044778)
824634001272
(dlv)  print uintptr(0x000000c000044758)
824634001240

The difference between the address values of Rsp and Rbp is now 32 bytes, because executing the CALL instruction pushed the function's return address (8 bytes) onto the top of the stack.
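The two differences are easy to verify from the register values printed by dlv. The sketch below simply redoes the subtraction; frameGap is a hypothetical helper, and the addresses are the ones shown in the dlv output above.

```go
package main

import "fmt"

// frameGap returns the distance in bytes between Rbp and Rsp.
func frameGap(rbp, rsp uint64) uint64 { return rbp - rsp }

func main() {
	const (
		rbp       = 0x000000c000044778
		rspBefore = 0x000000c000044760 // Rsp before CALL executes
		rspInside = 0x000000c000044758 // Rsp after entering add
	)
	fmt.Println(frameGap(rbp, rspBefore)) // gap before the call
	fmt.Println(frameGap(rbp, rspInside)) // gap inside add
	fmt.Println(rspBefore - rspInside)    // size of the pushed return address
}
```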

As a result, the offsets of argument 1 and argument 2 relative to SP also change. The data itself does not move; only SP does.

Argument value 1, which was at 0(SP) to 8(SP), is now at 8(SP) to 16(SP).

Argument value 2, which was at 8(SP) to 16(SP), is now at 16(SP) to 24(SP).

We can also print out the parameter values via dlv.

(dlv) print *(*int)(uintptr(0x000000c000044758)+8)
1
(dlv) print *(*int)(uintptr(0x000000c000044758)+16)
2

The following are the call details of the call stack after the add function is called.

[Figure: stack layout after the add function is called]

From the analysis of the add call above we can also conclude that

  • The return value is passed on the stack, and its slot comes before the arguments in the direction of stack growth, that is, at a higher address (24(SP) inside add)

After the call, we look at the return of the add function.

0x002e 00046 (main.go:4)        CALL    "".add(SB)   ; call the add function
0x0033 00051 (main.go:5)        MOVQ    24(SP), BP   ; restore BP from the value saved on the stack
0x0038 00056 (main.go:5)        ADDQ    $32, SP      ; increase SP to shrink the stack and release the 32 bytes
0x003c 00060 (main.go:5)        RET

After add returns, BP is restored and the ADDQ instruction then increases SP to shrink the stack. From this we can see that the caller is ultimately responsible for cleaning up the stack.

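To summarize, the calls above follow these rules. They describe the stack-based calling convention shown in this output; starting with Go 1.17, the compiler passes arguments in registers on amd64.

  1. Arguments are passed entirely on the stack.
  2. Arguments are placed on the stack from right to left in the argument list.
  3. The return value is passed on the stack, and its slot comes before the arguments in the direction of stack growth, that is, at a higher address.
  4. After the function call, the caller is responsible for cleaning up the stack.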
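These rules can be replayed with a toy model of the stack written in Go. This is a simplified sketch, not how the runtime implements stacks: the stack type, the at helper, and callAdd are hypothetical names, the saved BP is not modeled, and the stack is just a slice of 8-byte words.

```go
package main

import "fmt"

// stack is a toy model of a downward-growing stack made of 8-byte words.
type stack struct {
	mem []int64 // mem[i] stands for the 8-byte word at byte address i*8
	sp  int     // stack pointer, as a byte address
}

// at returns a pointer to the word at off(SP).
func (s *stack) at(off int) *int64 { return &s.mem[(s.sp+off)/8] }

// add models the callee: after CALL has pushed the return address,
// a is at 8(SP), b is at 16(SP), and the result slot is at 24(SP).
func add(s *stack) {
	*s.at(24) = 0   // initialize the return value
	ax := *s.at(8)  // AX = a
	ax += *s.at(16) // AX += b
	*s.at(24) = ax  // store the result
}

// callAdd models main: allocate a 32-byte frame, store the arguments,
// call add, read the result, and release the frame.
func callAdd(a, b int64) (result int64, finalSP int) {
	s := &stack{mem: make([]int64, 16), sp: 128}
	s.sp -= 32 // SUBQ $32, SP
	*s.at(0) = a
	*s.at(8) = b
	s.sp -= 8 // CALL pushes the 8-byte return address
	add(s)
	s.sp += 8          // RET pops the return address
	result = *s.at(16) // the result slot is 16(SP) from the caller's view
	s.sp += 32         // ADDQ $32, SP: the caller releases its frame
	return result, s.sp
}

func main() {
	r, sp := callAdd(1, 2)
	fmt.Println(r, sp)
}
```

The final SP equals the starting SP, which reflects rule 4: the caller releases everything it allocated.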

Struct methods: value receivers and pointer receivers

As mentioned above, Go methods have two kinds of receivers: value receivers and pointer receivers. Here is an example to illustrate.

package main

func main() { 
    p := Point{2, 5} 
    p.VIncr(10)
    p.PIncr(10)
}

type Point struct {
    X int
    Y int
}

func (p Point) VIncr(factor int) {
    p.X += factor
    p.Y += factor
}

func (p *Point) PIncr(factor int) {
    p.X += factor
    p.Y += factor
} 

You can print the assembly output yourself and follow along with the article.

Calling the value receiver method

At the assembly level, a struct is just a contiguous block of memory, so p := Point{2, 5} is initialized as follows.

0x001d 00029 (main.go:5)        XORPS   X0, X0                  ;; zero register X0
0x0020 00032 (main.go:5)        MOVUPS  X0, "".p+24(SP)         ;; zero a contiguous 16-byte block of memory
0x0025 00037 (main.go:5)        MOVQ    $2, "".p+24(SP)         ;; initialize field X of struct p
0x002e 00046 (main.go:5)        MOVQ    $5, "".p+32(SP)         ;; initialize field Y of struct p

Our Point struct consists of two int fields, and int is 8 bytes on 64-bit machines. So XORPS first zeroes the 128-bit X0 register, and MOVUPS then stores the 128-bit X0 at 24(SP), zeroing a 16-byte block of memory for p. After that, the two fields of Point are set to 2 and 5.

The next step is to copy the arguments onto the stack and then call the p.VIncr method.

0x0037 00055 (main.go:7)        MOVQ    $2, (SP)                ;; write the value 2 (copy of p.X)
0x003f 00063 (main.go:7)        MOVQ    $5, 8(SP)               ;; write the value 5 (copy of p.Y)
0x0048 00072 (main.go:7)        MOVQ    $10, 16(SP)             ;; write the value 10 (factor)
0x0051 00081 (main.go:7)        PCDATA  $1, $0
0x0051 00081 (main.go:7)        CALL    "".Point.VIncr(SB)      ;; call the value receiver method

Up to this point, the structure of the stack frame before the call probably looks like this.

[Figure: stack frame before the call to VIncr]

Look again at the assembly code for p.VIncr.

"".Point.VIncr STEXT nosplit size=31 args=0x18 locals=0x0
        0x0000 00000 (main.go:16)       TEXT    "".Point.VIncr(SB), NOSPLIT|ABIInternal, $0-24
        0x0000 00000 (main.go:16)       FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (main.go:16)       FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (main.go:17)       MOVQ    "".p+8(SP), AX          ;; AX = 8(SP) = 2
        0x0005 00005 (main.go:17)       ADDQ    "".factor+24(SP), AX    ;; AX = AX + 24(SP) = 2+10
        0x000a 00010 (main.go:17)       MOVQ    AX, "".p+8(SP)          ;; 8(SP) = AX = 12
        0x000f 00015 (main.go:18)       MOVQ    "".p+16(SP), AX         ;; AX = 16(SP) = 5
        0x0014 00020 (main.go:18)       ADDQ    "".factor+24(SP), AX    ;; AX = AX + 24(SP) = 5+10
        0x0019 00025 (main.go:18)       MOVQ    AX, "".p+16(SP)         ;; 16(SP) = AX  = 15
        0x001e 00030 (main.go:19)       RET 

The structure of the stack frame after the call here will probably look like this.

[Figure: stack frame after the call to VIncr]

From the analysis above we can see that when calling VIncr, the caller copies the receiver's field values onto the stack as arguments, and the changes made in VIncr only modify those copies in the argument area (8(SP) and 16(SP) inside VIncr), not the original p in main's frame.

Calling the pointer receiver method

In main, the call is made with the following instructions.

0x0056 00086 (main.go:8)        LEAQ    "".p+24(SP), AX         ;; load the address of 24(SP) into AX
0x005b 00091 (main.go:8)        MOVQ    AX, (SP)                ;; pass AX as the first argument: the address of p, whose first field holds 2
0x005f 00095 (main.go:8)        MOVQ    $10, 8(SP)              ;; pass 10 as the second argument
0x0068 00104 (main.go:8)        CALL    "".(*Point).PIncr(SB)   ;; call the pointer receiver method

From the assembly above, AX holds the address of 24(SP), and that pointer is also stored at (SP) as the first argument. In other words, both AX and the first argument at (SP) hold the address of 24(SP).

The entire stack frame structure should look like the following.

[Figure: stack frame before the call to PIncr]

Look at the assembly code for p.PIncr again.

"".(*Point).PIncr STEXT nosplit size=53 args=0x10 locals=0x0
        0x0000 00000 (main.go:21)       TEXT    "".(*Point).PIncr(SB), NOSPLIT|ABIInternal, $0-16
        0x0000 00000 (main.go:21)       FUNCDATA        $0, gclocals·1a65e721a2ccc325b382662e7ffee780(SB)
        0x0000 00000 (main.go:21)       FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
        0x0000 00000 (main.go:22)       MOVQ    "".p+8(SP), AX          ;; load the address stored at 8(SP) into AX
        0x0005 00005 (main.go:22)       TESTB   AL, (AX)
        0x0007 00007 (main.go:22)       MOVQ    "".p+8(SP), CX          ;; load the address stored at 8(SP) into CX
        0x000c 00012 (main.go:22)       TESTB   AL, (CX)
        0x000e 00014 (main.go:22)       MOVQ    (AX), AX                ;; read the value at the address held in AX into AX
        0x0011 00017 (main.go:22)       ADDQ    "".factor+16(SP), AX    ;; add the argument 10 to AX: AX = AX + 10 = 12
        0x0016 00022 (main.go:22)       MOVQ    AX, (CX)                ;; store the result at the address held in CX
        0x0019 00025 (main.go:23)       MOVQ    "".p+8(SP), AX          ;; load the address stored at 8(SP) into AX
        0x001e 00030 (main.go:23)       TESTB   AL, (AX)
        0x0020 00032 (main.go:23)       MOVQ    "".p+8(SP), CX          ;; load the address stored at 8(SP) into CX
        0x0025 00037 (main.go:23)       TESTB   AL, (CX)
        0x0027 00039 (main.go:23)       MOVQ    8(AX), AX               ;; read the value at address AX+8 into AX
        0x002b 00043 (main.go:23)       ADDQ    "".factor+16(SP), AX    ;; AX = 5+10
        0x0030 00048 (main.go:23)       MOVQ    AX, 8(CX)               ;; store the result 15 at address CX+8
        0x0034 00052 (main.go:24)       RET

This method is a bit more interesting and a bit more roundabout, because most of its operations go through pointers, so a change made through one pointer is visible through the other.

Here is a step-by-step analysis.

0x0000 00000 (main.go:22)       MOVQ    "".p+8(SP), AX
0x0007 00007 (main.go:22)       MOVQ    "".p+8(SP), CX 
0x000e 00014 (main.go:22)       MOVQ    (AX), AX

The first two instructions load the pointer stored at 8(SP) into AX and CX respectively; the third reads the value at the address held in AX and writes it to AX.

0x0011 00017 (main.go:22)       ADDQ    "".factor+16(SP), AX 
0x0016 00022 (main.go:22)       MOVQ    AX, (CX)

Here the argument at 16(SP) is added to AX, so AX now holds 12. AX is then stored to the memory that CX points to. As the assembly above shows, CX holds the pointer stored at 8(SP), so the value that pointer refers to is modified here.

We can verify this by using dlv to output regs.

(dlv) regs
    Rsp = 0x000000c000056748
    Rax = 0x000000000000000c
    Rcx = 0x000000c000056768

We can then look at the values stored in 8(SP) and CX.

(dlv) print *(*int)(uintptr(0x000000c000056748) +8  ) 
824634074984
(dlv) print uintptr(0x000000c000056768)
824634074984

You can see that both hold the same address, which is 32(SP).

(dlv) print uintptr(0x000000c000056748) +32
824634074984

Then we can print out the exact value of the pointer.

(dlv) print *(*int)(824634074984) 
12

At this point, the stack frame looks like this.

[Figure: stack frame after p.X is updated in PIncr]

Let’s move on to the next instructions.

0x0019 00025 (main.go:23)       MOVQ    "".p+8(SP), AX
0x0020 00032 (main.go:23)       MOVQ    "".p+8(SP), CX

Here the address stored at 8(SP) is loaded into AX and CX.

We single-step with the step-instruction command until just after MOVQ "".p+8(SP), CX has executed, and then look at where AX points.

(dlv) disassemble
        ...
        main.go:21      0x467980        488b4c2408      mov rcx, qword ptr [rsp+0x8]
=>      main.go:21      0x467985        8401            test byte ptr [rcx], al
        main.go:21      0x467987        488b4008        mov rax, qword ptr [rax+0x8]
        ...

(dlv) regs
    Rsp = 0x000000c000056748
    Rax = 0x000000c000056768 
    Rcx = 0x000000c000056768

(dlv) print uintptr(0x000000c000056768)
824634074984

You can see that AX and CX are pointing to the same memory address location. We then go to the following.

0x0027 00039 (main.go:23)       MOVQ    8(AX), AX

As mentioned earlier, a struct is allocated as a contiguous block of memory, and 32(SP) to 48(SP) on the stack hold the struct instantiated as variable p. So the 824634074984 printed above is the address of p.X, the address of p.Y is 824634074984+8, and we can print the value stored at that address with dlv.

(dlv) print *(*int)(824634074984+8) 
5

So MOVQ 8(AX), AX adds 8 to the address in AX, reads the value 5 stored there, and assigns it to AX.

0x002b 00043 (main.go:23)       ADDQ    "".factor+16(SP), AX ;; AX = AX +10
0x0030 00048 (main.go:23)       MOVQ    AX, 8(CX)

AX now equals 15, and the result is written to the memory at address CX+8, which modifies the value stored at 40(SP), that is, p.Y.

At the end of this method, the stack frame is as follows.

[Figure: stack frame at the end of PIncr]

The analysis above shows something interesting: when calling a pointer receiver method, the caller first copies the pointer to the struct onto the stack, and everything inside the method then operates through that pointer.

Summary

When a value receiver method is called, the caller writes the receiver's value to the stack as an argument, and the callee operates on that copy in the caller's stack frame.

With a pointer receiver method, the caller instead writes the address of the receiver to the stack, so modifications are reflected directly in the receiver's struct after the call.
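The observable effect of this difference can be confirmed by printing p after each call. The sketch reuses the Point type and the two methods from the example above; only the Println calls are added.

```go
package main

import "fmt"

type Point struct {
	X int
	Y int
}

// VIncr has a value receiver: it modifies a copy of p.
func (p Point) VIncr(factor int) {
	p.X += factor
	p.Y += factor
}

// PIncr has a pointer receiver: it modifies the caller's p.
func (p *Point) PIncr(factor int) {
	p.X += factor
	p.Y += factor
}

func main() {
	p := Point{2, 5}
	p.VIncr(10)
	fmt.Println(p) // p is unchanged
	p.PIncr(10)
	fmt.Println(p) // p has been updated in place
}
```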

Function literals (func literal)

In Go, function literals mainly cover anonymous functions and closures.

anonymous functions

I’m going to analyze this with a simple example.

package main

func main() {
    f := func(x int) int {
        x += x
        return x
    }
    f(100)
}

Let’s look at its assembly below.

        0x0000 00000 (main.go:3)        TEXT    "".main(SB), ABIInternal, $32-0
        ...
        0x001d 00029 (main.go:4)        LEAQ    "".main.func1·f(SB), DX
        0x0024 00036 (main.go:4)        MOVQ    DX, "".f+16(SP)
        0x0029 00041 (main.go:8)        MOVQ    $100, (SP)
        0x0031 00049 (main.go:8)        MOVQ    "".main.func1·f(SB), AX
        0x0038 00056 (main.go:8)        PCDATA  $1, $0
        0x0038 00056 (main.go:8)        CALL    AX
        0x003a 00058 (main.go:9)        MOVQ    24(SP), BP
        0x003f 00063 (main.go:9)        ADDQ    $32, SP
        0x0043 00067 (main.go:9)        RET

After the analysis above, you should be able to see what this assembly does: calling an anonymous function comes down to loading the function's entry address and calling through it.

Closures

What are closures? Wikipedia describes closures in the following way.

a closure is a record storing a function together with an environment.

A closure is an entity consisting of a function and its associated reference environment.

Let’s illustrate with a simple example.

package main

func test() func() {
    x := 100
    return func() {
        x += 100
    }
}

func main() {
    f := test()
    f() //x= 200
    f() //x= 300
    f() //x= 400
} 

A closure carries its context. In the test example, the variable x changes with every call to f(). But as we saw with the other calls, a variable stored on the stack becomes invalid once its stack frame exits, so the variable captured by the closure escapes to the heap.

We can perform an escape analysis to prove this.

[root@localhost gotest]$ go run -gcflags "-m -l" main.go 
# command-line-arguments
./main.go:4:2: moved to heap: x
./main.go:5:9: func literal escapes to heap

You can see that the variable x escapes to the heap.
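That x outlives the call to test can also be observed from Go itself. The sketch below is a small variation of the example: the closure returns the new value of x so that it can be printed, which the original example does not do.

```go
package main

import "fmt"

// test returns a closure over x. Unlike the original example, the
// closure returns the updated x so the caller can observe it.
func test() func() int {
	x := 100
	return func() int {
		x += 100
		return x
	}
}

func main() {
	f := test()
	fmt.Println(f(), f(), f()) // the same x is updated on every call

	g := test()
	fmt.Println(g()) // a second closure gets its own x
}
```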

Let’s take a look at the assembly directly.

Let’s look at the main function first.

"".main STEXT size=88 args=0x0 locals=0x18
        0x0000 00000 (main.go:10)       TEXT    "".main(SB), ABIInternal, $24-0
        0x0000 00000 (main.go:10)       MOVQ    (TLS), CX
        0x0009 00009 (main.go:10)       CMPQ    SP, 16(CX)
        0x000d 00013 (main.go:10)       PCDATA  $0, $-2
        0x000d 00013 (main.go:10)       JLS     81
        0x000f 00015 (main.go:10)       PCDATA  $0, $-1
        0x000f 00015 (main.go:10)       SUBQ    $24, SP
        0x0013 00019 (main.go:10)       MOVQ    BP, 16(SP)
        0x0018 00024 (main.go:10)       LEAQ    16(SP), BP
        0x001d 00029 (main.go:10)       FUNCDATA        $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
        0x001d 00029 (main.go:10)       FUNCDATA        $1, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
        0x001d 00029 (main.go:11)       PCDATA  $1, $0
        0x001d 00029 (main.go:11)       NOP
        0x0020 00032 (main.go:11)       CALL    "".test(SB)
        ...

This assembly is the same as for the other function calls: before calling test, main only sets up its stack frame.

Let’s go straight to the test function.

0x0000 00000 (main.go:3)        TEXT    "".test(SB), ABIInternal, $40-8
0x0000 00000 (main.go:3)        MOVQ    (TLS), CX
0x0009 00009 (main.go:3)        CMPQ    SP, 16(CX)
0x000d 00013 (main.go:3)        PCDATA  $0, $-2
0x000d 00013 (main.go:3)        JLS     171
0x0013 00019 (main.go:3)        PCDATA  $0, $-1
0x0013 00019 (main.go:3)        SUBQ    $40, SP
0x0017 00023 (main.go:3)        MOVQ    BP, 32(SP)
0x001c 00028 (main.go:3)        LEAQ    32(SP), BP
0x0021 00033 (main.go:3)        FUNCDATA        $0, gclocals·263043c8f03e3241528dfae4e2812ef4(SB)
0x0021 00033 (main.go:3)        FUNCDATA        $1, gclocals·568470801006e5c0dc3947ea998fe279(SB)
0x0021 00033 (main.go:3)        MOVQ    $0, "".~r0+48(SP)
0x002a 00042 (main.go:4)        LEAQ    type.int(SB), AX
0x0031 00049 (main.go:4)        MOVQ    AX, (SP)
0x0035 00053 (main.go:4)        PCDATA  $1, $0
0x0035 00053 (main.go:4)        CALL    runtime.newobject(SB)           ;; allocate memory
0x003a 00058 (main.go:4)        MOVQ    8(SP), AX                       ;; load the address of the allocated memory into AX
0x003f 00063 (main.go:4)        MOVQ    AX, "".&x+24(SP)                ;; store the address at 24(SP)
0x0044 00068 (main.go:4)        MOVQ    $100, (AX)                      ;; store 100 in the memory that AX points to
0x004b 00075 (main.go:5)        LEAQ    type.noalg.struct { F uintptr; "".x *int }(SB), AX ;; load the address of the closure struct's type descriptor into AX
0x0052 00082 (main.go:5)        MOVQ    AX, (SP)                        ;; store the address held in AX at (SP)
0x0056 00086 (main.go:5)        PCDATA  $1, $1
0x0056 00086 (main.go:5)        CALL    runtime.newobject(SB)           ;; allocate memory
0x005b 00091 (main.go:5)        MOVQ    8(SP), AX                       ;; load the address of the allocated memory into AX
0x0060 00096 (main.go:5)        MOVQ    AX, ""..autotmp_4+16(SP)        ;; store the address at 16(SP)
0x0065 00101 (main.go:5)        LEAQ    "".test.func1(SB), CX           ;; load the address of test.func1 into CX
0x006c 00108 (main.go:5)        MOVQ    CX, (AX)                        ;; store the function address held in CX in the memory that AX points to
0x006f 00111 (main.go:5)        MOVQ    ""..autotmp_4+16(SP), AX        ;; load the address saved at 16(SP) into AX
0x0074 00116 (main.go:5)        TESTB   AL, (AX)
0x0076 00118 (main.go:5)        MOVQ    "".&x+24(SP), CX                ;; load the address saved at 24(SP) into CX
0x007b 00123 (main.go:5)        LEAQ    8(AX), DI                       ;; compute AX + 8 into DI
0x007f 00127 (main.go:5)        PCDATA  $0, $-2
0x007f 00127 (main.go:5)        CMPL    runtime.writeBarrier(SB), $0
0x0086 00134 (main.go:5)        JEQ     138
0x0088 00136 (main.go:5)        JMP     164
0x008a 00138 (main.go:5)        MOVQ    CX, 8(AX)                       ;; store the address of x held in CX at AX+8
0x008e 00142 (main.go:5)        JMP     144
0x0090 00144 (main.go:5)        PCDATA  $0, $-1
0x0090 00144 (main.go:5)        MOVQ    ""..autotmp_4+16(SP), AX
0x0095 00149 (main.go:5)        MOVQ    AX, "".~r0+48(SP)
0x009a 00154 (main.go:5)        MOVQ    32(SP), BP
0x009f 00159 (main.go:5)        ADDQ    $40, SP
0x00a3 00163 (main.go:5)        RET

Let’s go through this assembly step by step.

0x002a 00042 (main.go:4)        LEAQ    type.int(SB), AX                ;; load the address of the type.int type descriptor into AX
0x0031 00049 (main.go:4)        MOVQ    AX, (SP)                        ;; store the address held in AX at (SP)
0x0035 00053 (main.go:4)        PCDATA  $1, $0
0x0035 00053 (main.go:4)        CALL    runtime.newobject(SB)           ;; allocate memory
0x003a 00058 (main.go:4)        MOVQ    8(SP), AX                       ;; load the address of the allocated memory into AX
0x003f 00063 (main.go:4)        MOVQ    AX, "".&x+24(SP)                ;; store the address at 24(SP)
0x0044 00068 (main.go:4)        MOVQ    $100, (AX)

This step writes the address of the type.int type descriptor to (SP) via AX and then calls runtime.newobject to allocate a block of memory. The returned address is written to 24(SP) via AX, which amounts to allocating memory for the variable x, and finally the value of x is set to 100.

At this point the stack frame structure should look like this.

[Figure: stack frame of test after x is allocated and set to 100]

0x004b 00075 (main.go:5)        LEAQ    type.noalg.struct { F uintptr; "".x *int }(SB), AX

This struct type represents the closure. The instruction loads the address of its type descriptor into the AX register.

0x0052 00082 (main.go:5)        MOVQ    AX, (SP)

This assembly instruction then writes the memory address stored in AX to (SP).

0x0056 00086 (main.go:5)        CALL    runtime.newobject(SB)           ;; allocate memory
0x005b 00091 (main.go:5)        MOVQ    8(SP), AX                       ;; load the address of the allocated memory into AX
0x0060 00096 (main.go:5)        MOVQ    AX, ""..autotmp_4+16(SP)        ;; store the address at 16(SP)

Here a new block of memory will be requested and the memory address will be written from AX to 16(SP).

0x0065 00101 (main.go:5)        LEAQ    "".test.func1(SB), CX           ;; load the address of test.func1 into CX
0x006c 00108 (main.go:5)        MOVQ    CX, (AX)                        ;; store the function address held in CX in the memory that AX points to
0x006f 00111 (main.go:5)        MOVQ    ""..autotmp_4+16(SP), AX        ;; load the address saved at 16(SP) into AX

Here the address of test.func1 is written to CX, and the address in CX is then stored to the memory that AX points to. The next instruction reloads the address saved at 16(SP) into AX. The value in AX does not actually change; the redundant load is most likely there because the code was compiled with optimizations disabled (-N).

At this point the address of the closure struct is held in three places: 8(SP), where runtime.newobject returned it, 16(SP), and AX. The stack frame looks as follows.

[Figure: stack frame of test after the closure struct is allocated]

0x0076 00118 (main.go:5)        MOVQ    "".&x+24(SP), CX                ;; load the address saved at 24(SP) into CX
0x007b 00123 (main.go:5)        LEAQ    8(AX), DI                       ;; compute AX + 8 into DI
0x007f 00127 (main.go:5)        CMPL    runtime.writeBarrier(SB), $0    ;; write barrier check
0x0086 00134 (main.go:5)        JEQ     138
0x0088 00136 (main.go:5)        JMP     164
0x008a 00138 (main.go:5)        MOVQ    CX, 8(AX)                       ;; store the address held in CX at AX+8

24(SP) holds the address of the variable x, which is loaded into CX here. LEAQ then computes the address AX+8 into DI, and finally the value in CX is stored at 8(AX).

Here is a little more detail on what AX refers to at this point.

AX -> the closure struct, whose first word holds the address of test.func1.

8(AX) -> address of x (the same address saved at 24(SP)) -> 100. That is, the word at 8(AX) holds the address of x, and the memory at that address holds 100.
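The layout just described can be imitated by hand in Go, which makes the two words of the closure struct explicit. This is a sketch under stated assumptions: the closure type and func1 below are hand-written stand-ins for the compiler-generated struct { F uintptr; x *int } and for test.func1, and passing the closure as an ordinary argument stands in for the context register DX.

```go
package main

import "fmt"

// closure is a hand-written stand-in for the compiler-generated
// struct { F uintptr; x *int }.
type closure struct {
	F func(c *closure) // plays the role of the code pointer to test.func1
	x *int             // pointer to the captured, heap-allocated variable
}

// func1 plays the role of test.func1: it reaches x through the closure.
func func1(c *closure) { *c.x += 100 }

// test builds the closure in the same order as the assembly above.
func test() *closure {
	x := new(int) // like runtime.newobject(type.int)
	*x = 100
	c := new(closure) // like runtime.newobject for the closure struct
	c.F = func1       // first word: the function
	c.x = x           // second word: the address of x
	return c
}

func main() {
	f := test()
	f.F(f)
	f.F(f)
	f.F(f)
	fmt.Println(*f.x)
}
```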

1
2
3
4
0x0090 00144 (main.go:5)        MOVQ    ""..autotmp_4+16(SP), AX        ;; load the address saved at 16(SP) into AX
0x0095 00149 (main.go:5)        MOVQ    AX, "".~r0+48(SP)               ;; store that address in the return slot at 48(SP)
0x009a 00154 (main.go:5)        MOVQ    32(SP), BP
0x009f 00159 (main.go:5)        ADDQ    $40, SP

The address saved at 16(SP) is loaded into AX and written to 48(SP), which is the return-value slot (~r0) in the caller's stack frame. BP is then restored, the stack shrinks by 40 bytes, and the callee is finished.

After the call, it returns to the main function, and the stack frame at this time is as follows.

[Figure: stack frame of main after test returns]

Back in main, the instructions right after the call to test are as follows.

0x0020 00032 (main.go:11)       CALL    "".test(SB)
0x0025 00037 (main.go:11)       MOVQ    (SP), DX                ;; load the closure address returned at (SP) into DX
0x0029 00041 (main.go:11)       MOVQ    DX, "".f+8(SP)          ;; save the closure address in f at 8(SP)
0x002e 00046 (main.go:12)       MOVQ    (DX), AX                ;; load the address of test.func1 from (DX) into AX

When test returns, the top of main's stack holds the address of the closure's memory block, whose first word is the address of test.func1. That address is loaded into DX and saved in f, the function address is then read from (DX) into AX, and the following instruction performs the call.

0x0031 00049 (main.go:12)       CALL    AX

Before stepping into test.func1, note that DX, like (SP), holds the address of the closure's memory block, the same address that AX held inside test.

test.func1 is the anonymous function that test wraps and returns.

"".test.func1 STEXT nosplit size=36 args=0x0 locals=0x10
        0x0000 00000 (main.go:5)        TEXT    "".test.func1(SB), NOSPLIT|NEEDCTXT|ABIInternal, $16-0
        0x0000 00000 (main.go:5)        SUBQ    $16, SP
        0x0004 00004 (main.go:5)        MOVQ    BP, 8(SP)
        0x0009 00009 (main.go:5)        LEAQ    8(SP), BP
        0x000e 00014 (main.go:5)        MOVQ    8(DX), AX       ;; load the address of x from the closure at DX+8
        0x0012 00018 (main.go:5)        MOVQ    AX, "".&x(SP)
        0x0016 00022 (main.go:6)        ADDQ    $100, (AX)      ;; add 100 to the value that the address of x points to
        0x001a 00026 (main.go:7)        MOVQ    8(SP), BP
        0x001f 00031 (main.go:7)        ADDQ    $16, SP
        0x0023 00035 (main.go:7)        RET

Since DX holds the same address that AX held inside test, 8(DX) yields the address of the variable x, which is loaded into AX. The ADDQ instruction then adds 100 to the value at that address.

Summary

From the analysis above, we can see that an anonymous function is really a kind of closure, just one that captures no variables. When a closure does capture variables, that context escapes to the heap so that it is not lost when the stack frame of the function that created it is released.
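The effect of that escape can be observed from plain Go code. The following is a minimal sketch, not the article's original program: newCounter and its two returned closures are hypothetical names chosen for illustration. It shows that the captured x keeps its value between calls even though the function that declared it has already returned, and that each call to newCounter gets its own x.

```go
package main

import "fmt"

// newCounter is a hypothetical stand-in for the article's test function.
// It returns two closures that capture the same local variable x.
func newCounter() (add func(), get func() int) {
	x := 100 // captured by both closures, so it must outlive this call
	add = func() { x += 100 }
	get = func() int { return x }
	return add, get
}

func main() {
	add, get := newCounter()
	add()
	add()
	fmt.Println(get()) // 300: x survived after newCounter returned

	// A second call builds a separate context with its own x.
	_, get2 := newCounter()
	fmt.Println(get2()) // 100
}
```

Building such a program with go build -gcflags=-m should typically print the compiler's escape-analysis decisions, which is one way to check whether x was moved to the heap.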

The closure example above shuffles many values between registers and the stack, but test really does only three things.

  1. Initialize the memory block that holds the context information.
  2. Save the address of that memory block in the AX register.
  3. Write that address, whose first word is the address of the wrapped function test.func1, to the return slot in the caller's stack frame.

The context information here refers to the x variable and the test.func1 function. Once both addresses are written to the memory block that AX points to, control returns to main, which reads the address at the top of its stack, loads the function address stored there into AX, and executes CALL AX.

Since the address of x was stored at AX + 8 in test, and DX holds that same address when test.func1 runs, test.func1 reads 8(DX) to get the address of x and modifies the closure's context through it.
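The same mechanism can be written out by hand in Go. The sketch below is only a model under stated assumptions: closureObj, func1, and the explicit ctx parameter are hypothetical names, and the real compiler passes the context implicitly in DX rather than as an argument. It mirrors the layout described above, with the function in the first word and the address of x in the second.

```go
package main

import "fmt"

// closureObj is a hypothetical model of the memory block built in test.
// fn stands in for the address of test.func1 stored at (AX),
// and x stands in for the address of x stored at 8(AX).
type closureObj struct {
	fn func(ctx *closureObj)
	x  *int
}

// func1 models test.func1. It receives the context explicitly here,
// whereas the compiled code finds it in DX, and updates x through it.
func func1(ctx *closureObj) {
	*ctx.x += 100
}

// test models the article's test function: allocate x, build the
// context block, and hand its address back to the caller.
func test() *closureObj {
	x := new(int)
	*x = 100
	return &closureObj{fn: func1, x: x}
}

func main() {
	f := test() // comparable to MOVQ (SP), DX
	f.fn(f)     // comparable to MOVQ (DX), AX followed by CALL AX
	fmt.Println(*f.x) // 200
}
```

Passing the context as an ordinary argument keeps the model in safe Go; it gives up fidelity to the register convention in exchange for code that does not depend on the compiler's internal layout.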

Summary

In this article, we first looked at how function calls work, including how parameters are passed, the order in which they are placed on the stack, and how return values are passed back. We then analyzed how value-receiver and pointer-receiver struct method calls differ, and how closure calls work.

The dlv tool's regs and step-instruction commands help a lot when analyzing closures; without them it is easy to get lost in the pointers passed between registers. Drawing the stack on paper while stepping through is also recommended.