Go 1.17 changed Go's long-standing stack-based calling convention. Before we look at what changed, we need to know what a calling convention is.

A calling convention is, in a nutshell, a language's agreement on how parameters are passed between functions. The caller knows in what form and in what order to pass parameters to the called function, and the called function relies on the same agreement to find each parameter in the expected place.

The argument-passing diagram for older versions of Go has appeared in many, many places; here is one I drew earlier.

You can see that the arguments and the return values are all on the stack, laid out in order from the low address to the high address.
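The layout described above can be modelled in a few lines of Go. This is only a sketch under simplifying assumptions: a 64-bit platform, every argument and result exactly one machine word wide, and a hypothetical helper named stackOffsets that is not part of any Go tooling.

```go
package main

import "fmt"

// wordSize is an assumption: a 64-bit platform where every value is one word.
const wordSize = 8

// stackOffsets is a hypothetical helper. It returns the FP-relative offsets of
// nArgs word-sized arguments, followed by nResults word-sized results, as they
// would be laid out under the stack-based convention.
func stackOffsets(nArgs, nResults int) (args, results []int) {
	for i := 0; i < nArgs; i++ {
		args = append(args, i*wordSize)
	}
	for j := 0; j < nResults; j++ {
		// Results are placed directly after the arguments, at higher addresses.
		results = append(results, (nArgs+j)*wordSize)
	}
	return args, results
}

func main() {
	args, results := stackOffsets(3, 2)
	fmt.Println("argument offsets:", args)
	fmt.Println("result offsets:  ", results)
}
```

For a function with three int arguments and two int results, the arguments land at offsets 0, 8 and 16 from FP and the results at 24 and 32.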

This stack-based scheme is indeed simpler to design and implement, but it forces values to move between registers and memory several times during a function call. At the call site, the arguments are stored at offsets from SP (register -> memory); before ret, the callee stores the return values at offsets from FP (again register -> memory); after ret, the caller loads the return values back (memory -> register).

Registers are internal components of the CPU, while main memory is generally external, and there is an order-of-magnitude performance difference between the two. This is why Go's function calls have long been criticised as slow and in need of optimisation (although function-call overhead is probably not what limits overall system performance either).
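One way to put a number on call overhead is a micro-benchmark of a function that the compiler is not allowed to inline. The sketch below is illustrative only: add3, benchCall and sink are hypothetical names, and the result depends on the machine and the Go version. Running the same program under Go 1.16 and Go 1.17 would give a rough comparison of the two conventions.

```go
package main

import (
	"fmt"
	"testing"
)

// add3 is a hypothetical workload. noinline forces a real CALL instruction,
// so the calling convention is actually exercised.
//
//go:noinline
func add3(x, y, z int) int { return x + y + z }

// sink keeps the compiler from discarding the call result.
var sink int

// benchCall measures the cost of calling add3 in a tight loop.
func benchCall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = add3(i, i+1, i+2)
	}
}

func main() {
	// testing.Benchmark can be used outside of "go test".
	r := testing.Benchmark(benchCall)
	fmt.Println(r.NsPerOp(), "ns/op over", r.N, "iterations")
}
```

A loop this small mostly measures call and return cost, so treat the number as a relative indicator rather than a statement about real programs.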

Go 1.17 introduced a register-based calling convention, which is currently enabled only on 64-bit x86 (amd64), and we can take a brief look at it through disassembly. To keep things simple, we use only int parameters here (floating-point values are passed in a separate set of registers).

package main

// noinline keeps add as a real call so that it shows up in the disassembly.
//
//go:noinline
func add(x int, y int, z int, a, b, c int, d, e, f int, g, h, l int) (int, int, int, int, int, int, int, int, int, int, int) {
	println(x, y)
	return 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
}

func main() {
	println(add(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
}

Using a large number of parameters makes the register assignment easier to see: here there are 12 arguments and 11 return values.

Looking directly at the disassembly, we start with the call to main.add inside main.main.

TEXT main.main(SB) /Users/xargin/test/abi.go
  abi.go:15             0x1054e60               4c8da42478ffffff        LEAQ 0xffffff78(SP), R12
  abi.go:15             0x1054e68               4d3b6610                CMPQ 0x10(R14), R12
  abi.go:15             0x1054e6c               0f865a020000            JBE 0x10550cc
  abi.go:15             0x1054e72               4881ec08010000          SUBQ $0x108, SP
  abi.go:15             0x1054e79               4889ac2400010000        MOVQ BP, 0x100(SP)
  abi.go:15             0x1054e81               488dac2400010000        LEAQ 0x100(SP), BP
  abi.go:16             0x1054e89               48c704240a000000        MOVQ $0xa, 0(SP) // 10th parameter
  abi.go:16             0x1054e91               48c74424080b000000      MOVQ $0xb, 0x8(SP) // 11th parameter
  abi.go:16             0x1054e9a               48c74424100c000000      MOVQ $0xc, 0x10(SP) // 12th parameter
  abi.go:16             0x1054ea3               b801000000              MOVL $0x1, AX // 1st parameter, and so on
  abi.go:16             0x1054ea8               bb02000000              MOVL $0x2, BX
  abi.go:16             0x1054ead               b903000000              MOVL $0x3, CX
  abi.go:16             0x1054eb2               bf04000000              MOVL $0x4, DI
  abi.go:16             0x1054eb7               be05000000              MOVL $0x5, SI
  abi.go:16             0x1054ebc               41b806000000            MOVL $0x6, R8
  abi.go:16             0x1054ec2               41b907000000            MOVL $0x7, R9
  abi.go:16             0x1054ec8               41ba08000000            MOVL $0x8, R10
  abi.go:16             0x1054ece               41bb09000000            MOVL $0x9, R11
  abi.go:16             0x1054ed4               e807fdffff              CALL main.add(SB)
  abi.go:16             0x1054ed9               48898424f8000000        MOVQ AX, 0xf8(SP)

As you can see, only nine general-purpose registers are used for integer arguments: AX, BX, CX, DI, SI, R8, R9, R10 and R11, in that order. Any arguments beyond the ninth are passed on the stack.
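The assignment seen in the disassembly above can be written down as a small model. This is a sketch, not the compiler's actual algorithm: assignIntArgs is a hypothetical helper, it assumes every argument is a plain int, and it ignores the more involved rules that apply to structs, floating-point values and other types.

```go
package main

import "fmt"

// intArgRegs lists the integer argument registers in the order observed in
// the disassembly of main.main.
var intArgRegs = []string{"AX", "BX", "CX", "DI", "SI", "R8", "R9", "R10", "R11"}

// assignIntArgs is a hypothetical helper. It returns where each of n int
// arguments would be placed: the first nine in registers, the rest in
// consecutive 8-byte stack slots starting at 0(SP).
func assignIntArgs(n int) []string {
	locs := make([]string, n)
	stackOff := 0
	for i := range locs {
		if i < len(intArgRegs) {
			locs[i] = intArgRegs[i]
			continue
		}
		locs[i] = fmt.Sprintf("0x%x(SP)", stackOff)
		stackOff += 8
	}
	return locs
}

func main() {
	for i, loc := range assignIntArgs(12) {
		fmt.Printf("argument %2d -> %s\n", i+1, loc)
	}
}
```

For 12 arguments the model places the 10th, 11th and 12th at 0x0(SP), 0x8(SP) and 0x10(SP), which matches the three MOVQ instructions at the top of the call sequence.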

Then there is the return value part of main.add.

TEXT main.add(SB) /Users/xargin/test/abi.go
  ... (println part omitted)
  abi.go:6              0x1054c2f               48c74424400a000000      MOVQ $0xa, 0x40(SP) // 10th return value
  abi.go:6              0x1054c38               48c74424480b000000      MOVQ $0xb, 0x48(SP) // 11th return value
  abi.go:6              0x1054c41               b801000000              MOVL $0x1, AX // 1st return value, and so on
  abi.go:6              0x1054c46               bb02000000              MOVL $0x2, BX
  abi.go:6              0x1054c4b               b903000000              MOVL $0x3, CX
  abi.go:6              0x1054c50               bf04000000              MOVL $0x4, DI
  abi.go:6              0x1054c55               be05000000              MOVL $0x5, SI
  abi.go:6              0x1054c5a               41b806000000            MOVL $0x6, R8
  abi.go:6              0x1054c60               41b907000000            MOVL $0x7, R9
  abi.go:6              0x1054c66               41ba08000000            MOVL $0x8, R10
  abi.go:6              0x1054c6c               41bb09000000            MOVL $0x9, R11
  abi.go:6              0x1054c72               488b6c2418              MOVQ 0x18(SP), BP
  abi.go:6              0x1054c77               4883c420                ADDQ $0x20, SP
  abi.go:6              0x1054c7b               c3                      RET

The return values use exactly the same sequence of registers as the arguments, and again, when there are more than nine return values, the excess is returned on the stack.

A traditional calling convention usually distinguishes between caller-saved and callee-saved registers. In Go, all registers are caller-saved (apart from a few with fixed roles, such as the stack pointer): the caller is responsible for saving any value it still needs after the call, and the callee gives no guarantee that it will leave the registers intact.

The disassembly shows this as well: the return values directly overwrite the registers that carried the incoming arguments.

Since arguments no longer all have to be passed through the stack, it is likely that goroutine stacks will use less memory in scenarios where function calls are deeply nested. But since I don't have a production environment at hand, I can't verify this for now.
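For readers who want to check the stack-usage claim themselves, a rough measurement harness might look like the following. It is a sketch under assumptions: recurse and stackInuseAtDepth are hypothetical names, the workload is artificial, and runtime.MemStats.StackInuse covers all goroutine stacks in the process rather than a single one. The numbers are only meaningful when the same program is built with two different Go versions and the outputs are compared.

```go
package main

import (
	"fmt"
	"runtime"
)

// recurse is a hypothetical workload: depth nested, non-inlined calls that
// each pass several int arguments. At the deepest point it samples how many
// bytes of stack memory the process has in use.
//
//go:noinline
func recurse(depth, a, b, c, d int) uint64 {
	if depth == 0 {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		return m.StackInuse
	}
	return recurse(depth-1, a+1, b+1, c+1, d+1)
}

// stackInuseAtDepth runs the workload on a fresh goroutine so that the
// measurement starts from a small stack.
func stackInuseAtDepth(depth int) uint64 {
	done := make(chan uint64)
	go func() { done <- recurse(depth, 0, 0, 0, 0) }()
	return <-done
}

func main() {
	fmt.Println("StackInuse at depth 100000:", stackInuseAtDepth(100000), "bytes")
}
```

Because goroutine stacks grow by doubling, small differences in frame size may not show up at every depth, so it is worth trying several depths before drawing a conclusion.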


Reference https://xargin.com/go1-17-new-calling-convention/