Go Assembly Overview

Direction

Plan 9 assembly orders operands the opposite way from Intel syntax: Plan 9 puts the source first and the destination last (left to right), while Intel puts the destination first and the source last (right to left).

// plan9 
MOVQ $123, AX

// intel
MOV RAX, 123

Stack push and pop

Compiler-generated Plan 9 assembly rarely uses PUSH and POP for stack operations, although the assembler accepts PUSHQ and POPQ. Instead, stack space is allocated with SUB and released with ADD, both applied to SP.

SP points to the top of the stack and BP to the bottom of the current frame. Adjusting SP is usually enough to complete push and pop operations, so BP is rarely manipulated directly.

SUBQ $0x18, SP // subtract 24 from SP: allocate 24 bytes on the stack (push)
ADDQ $0x18, SP // add 24 to SP: release those 24 bytes (pop)

Data Copy

Instructions that start with MOV copy data. The suffix of the instruction selects how many bytes are copied at a time.

MOVB $1, DI      // MOVB moves 1 byte
MOVW $0x10, BX   // MOVW moves 2 bytes
MOVL $1, DX      // MOVL moves 4 bytes
MOVQ $-10, AX    // MOVQ moves 8 bytes
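
As a rough cross-check of these widths, the sketch below maps each amd64 size suffix (B, W, L, Q) to the Go integer type of the same size. The helper name widthOf is made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// widthOf is a hypothetical helper: it returns the number of bytes moved
// by an amd64 MOV variant, keyed by its size suffix (B, W, L, Q).
func widthOf(suffix byte) int {
	switch suffix {
	case 'B':
		return int(unsafe.Sizeof(int8(0))) // 1 byte
	case 'W':
		return int(unsafe.Sizeof(int16(0))) // 2 bytes
	case 'L':
		return int(unsafe.Sizeof(int32(0))) // 4 bytes
	case 'Q':
		return int(unsafe.Sizeof(int64(0))) // 8 bytes
	}
	return 0 // unknown suffix
}

func main() {
	for _, s := range []byte("BWLQ") {
		fmt.Printf("MOV%c moves %d byte(s)\n", s, widthOf(s))
	}
}
```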

Operation commands

ADD to add
SUB to subtract
IMUL for multiplication

ADDQ AX, BX      // BX += AX
SUBQ AX, BX      // BX -= AX
IMULQ AX, BX     // BX *= AX

Jump Instructions

Jump instructions are how a program changes its flow of execution.

// unconditional jumping
JMP label   // jump to a label inside the same function (the common case)
JMP addr    // jump to an address in the code; in practice this is not written by hand
JMP 2(PC)   // jump forward 2 instructions relative to the current PC
JMP -2(PC)  // jump backward 2 instructions relative to the current PC

// Conditional jump
JNZ target // jump if the zero flag is not set (the last result was non-zero)
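
To make the conditional jump concrete, here is a minimal Go sketch that uses goto to mirror a decrement-and-JNZ loop. The function name countdown is invented for the example, and it assumes n is at least 1.

```go
package main

import "fmt"

// countdown mirrors a DECQ/JNZ loop: it decrements n and jumps back to
// the label while the result is non-zero. It assumes n >= 1.
func countdown(n int) int {
	iterations := 0
loop:
	iterations++
	n--
	if n != 0 { // like JNZ loop: taken while the zero flag is clear
		goto loop
	}
	return iterations
}

func main() {
	fmt.Println(countdown(3)) // prints 3
}
```

The backward goto plays the role of JNZ label: the loop body runs once per decrement and falls through as soon as the counter reaches zero.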

Constant definition

Plan 9 assembly writes constants as $num. The value can be negative, is decimal by default, and can be written in hexadecimal in the form $0x123.

Variable declarations

Variables in assembly are generally values stored in the .rodata (read-only) or .data segment. They correspond to the initialized package-level const and var declarations at the application level.

DATA can declare and initialize a variable

DATA symbol+offset(SB)/width, value
The statement above initializes width bytes of memory at symbol+offset(SB) with value. Offsets are relative to the address of the symbol, so successive DATA statements for the same symbol use increasing offsets.

GLOBL declares a symbol as a global variable

Inside a Go package, GLOBL makes a variable initialized by DATA available for use by other code. It also states the flags and the total size of the variable.

GLOBL runtime·tlsoffset(SB), NOPTR, $4
// Declare a global variable tlsoffset of 4 bytes. There is no DATA statement because its initial value is 0.
// NOPTR means the variable contains no pointers, so the GC does not need to scan it.
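
The effect of DATA and GLOBL can be modeled in plain Go. The sketch below is only an emulation under stated assumptions: the symbol type, the data method and the name count are invented here, and values are stored little-endian as on amd64.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// symbol is a hypothetical model of the memory backing one assembly
// symbol; its length plays the role of the size given to GLOBL.
type symbol []byte

// data mimics `DATA sym+offset(SB)/width, $value`: it stores the low
// width bytes of value at the given offset in little-endian order.
func (s symbol) data(offset, width int, value uint64) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], value)
	copy(s[offset:offset+width], buf[:width])
}

func main() {
	count := make(symbol, 8)     // like GLOBL ·count(SB), NOPTR, $8
	count.data(0, 4, 0x11223344) // like DATA ·count+0(SB)/4, $0x11223344
	count.data(4, 4, 0x55667788) // like DATA ·count+4(SB)/4, $0x55667788
	fmt.Printf("% x\n", []byte(count)) // prints 44 33 22 11 88 77 66 55
}
```

Bytes that no DATA statement touches stay zero, which matches the case of a GLOBL without any DATA part.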

Go’s registers

Go assembly introduces four pseudo-registers, PC, FP, SP and SB, to simplify writing assembly code. Together with the general-purpose registers, they form the Go assembly language’s re-abstraction of the CPU.

The purpose of each register is explained below, taking the AMD64 environment as an example.

Pseudo PC register

Meaning: alias of the IP register; it points to an instruction address
Purpose: holds the address of the next instruction (a logical address, i.e. an offset). Normally it advances to the following instruction after each one executes; on a transfer instruction such as JMP, CALL or LOOP, the target address is stored in PC instead
Frequency of use: apart from the occasional jump, handwritten code rarely deals with the PC register

Pseudo SB register

Meaning: can be thought of as the origin of memory; symbols in the global symbol table are addressed relative to it
Purpose: generally used to declare functions or global variables
Usage: foo(SB) uses the name foo to represent an address in memory, and can be used to define global functions and data. foo<>(SB) makes foo visible only in the current file, similar to static in C. An offset can also be added to the reference: foo+4(SB) is the address 4 bytes past the start of foo
Frequency of use: commonly used

BP register

Meaning: marks the base of the current function’s stack frame (the stack grows from high addresses to low addresses, and the true SP marks the top of the stack); it records the start position of the current function’s stack frame
Purpose: holds the stack base address saved on function entry, and works with the true SP to maintain the function call stack
Usage:
- Function call-related instructions implicitly affect the value of BP
- On X86 the BP register usually marks the starting position of the function’s stack frame, and serves only as an indicator
- Some debugging tools use this register to find function parameters, local variables, etc.
- For that reason, on amd64 the compiler reserves 8 bytes just below the return address to save the caller’s BP
If you maintain the call stack by hand, you need to save and restore BP and lay out the stack frame yourself.

SP register

Meaning:
- The true SP register marks the top of the function call stack (the stack grows from high addresses to low addresses, and BP marks the bottom); it records the end position of the current function’s stack frame.
- The pseudo SP register marks the highest address of the local variable area.
Purpose:
- The true SP is generally used to allocate and release stack space.
- The pseudo SP is generally used to locate local variables.
Using the pseudo SP:
- The pseudo SP sits at the high end of the local variables, so it is used with negative offsets.
- The form is symbol+offset(SP), and the legal range of offset is [-framesize, 0)
- For example, b-8(SP) refers to the local variable b located 8 bytes below the pseudo SP
Using the true SP:
- The true SP sits at the low end of the function’s stack frame, and the compiler subtracts from it or adds to it to allocate or release stack space
- The whole frame is allocated with a single subtraction and released with a single addition
Frequency of use: both the true and the pseudo SP are commonly used (the compiler ultimately emits only true SP references)

Pseudo FP register

Meaning: the compiler locates on-stack parameters by their offsets from FP
Purpose: generally used to identify and access the arguments and return values of a function
Usage: FP must always be referenced with an identifier prefix, which names the argument. On a 64-bit system, for a function whose arguments are named a and b, a+0(FP) is the first argument and b+8(FP) is the second; further arguments follow at larger offsets
Relationship to the pseudo SP register:
- The pseudo FP is the base address for accessing incoming arguments and return values, usually addressed with positive offsets.
- The pseudo SP is the base address for accessing local variables, usually addressed with negative offsets.
Frequency of use: commonly used
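
The FP offsets can be previewed from Go itself. The sketch below assumes the stack-based calling convention used by handwritten assembly and a hypothetical function func add(a, b int64) (ret int64); the struct addFrame only imitates how its arguments and result are laid out.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame imitates the argument area of a hypothetical
// func add(a, b int64) (ret int64) under the stack-based convention.
type addFrame struct {
	a   int64 // a+0(FP)
	b   int64 // b+8(FP)
	ret int64 // ret+16(FP)
}

// fpOffsets returns the FP-relative offsets of a, b and ret.
func fpOffsets() (a, b, ret uintptr) {
	var f addFrame
	return unsafe.Offsetof(f.a), unsafe.Offsetof(f.b), unsafe.Offsetof(f.ret)
}

func main() {
	a, b, ret := fpOffsets()
	fmt.Printf("a+%d(FP) b+%d(FP) ret+%d(FP)\n", a, b, ret)
	// prints a+0(FP) b+8(FP) ret+16(FP)
}
```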

General Purpose Registers

AX, BX, CX, DX, DI, SI, R8-R15

MMX registers

The MMX registers (named M0-M7 in the Go assembler) are not general-purpose registers; they are reserved for MMX instructions and share their storage with the x87 floating-point registers

TLS pseudo-registers

This pseudo-register gives access to the address of the current goroutine’s g structure

Frequently Asked Questions

1. How can I view the Go assembly code?

If you want to learn Go assembly, the Go source code is a good place to look: it contains many real-world assembly examples. You can also disassemble your own compiled Go programs to understand the underlying mechanisms and principles.

Here are some commands that produce the assembly for a program.

Compile output

# -N disables optimization
# -l disables inlining
# -S prints the assembly code
go build -gcflags='-N -l -S' main.go

# Equivalent to
go tool compile -N -l -S main.go

Disassembly

# Compile to main.o
go tool compile -N -l main.go

# Disassemble
go tool objdump -S main.o

SSA analysis

The SSA analysis generates an ssa.html page, which can be opened to see the whole process of compiling a Go program, the last step being the assembly instructions we want.

GOSSAFUNC="main" go build main.go
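
To try the commands above, any small program will do. The main.go below is one possible input; the function add and its noinline directive are choices made for this example so that the function shows up as its own symbol in the listing.

```go
package main

import "fmt"

// add is kept out of line so that it appears as a separate symbol in
// the assembly listing.
//
//go:noinline
func add(a, b int64) int64 {
	return a + b
}

func main() {
	fmt.Println(add(1, 2)) // prints 3
}
```

With this file, setting GOSSAFUNC to add instead of main would produce the SSA page for the add function.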

2. What do framesize and argsize mean when they are specified in an assembly function?

framesize is the size of the function’s whole stack frame
- it includes the function’s own local variables (as callee)
- it includes the space for the return values of the functions it calls (as caller)
- it includes the space for the input arguments of the functions it calls (as caller)
- it does not include the saved caller BP
- it does not include the return address
argsize is the size of the space the caller allocates for the function’s input arguments and return values
- When the function is marked NOSPLIT, argsize can be omitted
- It can be omitted because the size of the arguments can be derived from the function’s Go declaration
If framesize is greater than 0 and frame pointers are enabled, BP is also saved on the stack, and the true SP is moved down by framesize bytes to allocate the stack space

3. How to distinguish between true and pseudo registers?

A pseudo-register reference carries an identifier prefix in front of the offset; a reference without an identifier prefix is to the true register
(SP) and 8(SP) refer to the true SP register because they have no identifier prefix, while a(SP) and b-8(SP) refer to the pseudo SP register because they do
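
The rule can be written down as a small check. This is only a sketch of the convention described here, not how the assembler is implemented; the function name isPseudoSP is made up, and it simply looks at whether the operand starts with an identifier character.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isPseudoSP is a hypothetical helper: it reports whether an operand
// such as "b-8(SP)" carries an identifier prefix and therefore refers
// to the pseudo SP rather than the true SP.
func isPseudoSP(operand string) bool {
	if !strings.HasSuffix(operand, "(SP)") {
		return false // not an SP-relative operand at all
	}
	first := []rune(operand)[0]
	return unicode.IsLetter(first) || first == '_'
}

func main() {
	for _, op := range []string{"(SP)", "8(SP)", "a(SP)", "b-8(SP)"} {
		fmt.Printf("%-8s pseudo=%v\n", op, isPseudoSP(op))
	}
}
```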

4. What is the relationship between the FP and SP registers in the AMD64 environment?

Scenario
- Assume caller calls callee, callee’s framesize is greater than 0, and BP is saved on the stack
- callee is a leaf function marked NOSPLIT, so no stack-split check is inserted
- The pseudo FP points into caller’s frame at the high addresses, while callee’s true and pseudo SP are at the low addresses, so the layout spans both the caller and the callee
Relationship
- pseudo FP = true SP + framesize + 16
  - arg0+0(FP) = 0(SP) + framesize + 16
  - Note: the 16 bytes hold the 8-byte saved caller BP and the 8-byte return address
- pseudo SP = true SP + framesize
  - Note: the pseudo SP points to the low address of the saved BP slot; the local variables lie below it
- pseudo FP = pseudo SP + 16
Caution
- The pseudo SP and pseudo FP are both computed from the true SP, which keeps address calculations simple.
- When the true SP is modified, the pseudo SP and pseudo FP move by the same amount.
- It is therefore better not to modify the true SP by hand and to leave it to the compiler.
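
The three relationships are plain address arithmetic, which the following sketch spells out. The function name pseudoRegs and the sample addresses are invented for the illustration, and it assumes the scenario above (framesize greater than 0, caller BP saved).

```go
package main

import "fmt"

// Sizes on amd64: the return address and the saved caller BP each
// occupy 8 bytes.
const (
	retAddrSize = 8
	savedBPSize = 8
)

// pseudoRegs is a hypothetical helper that derives the pseudo SP and
// pseudo FP from the true SP, assuming framesize > 0 and a saved BP.
func pseudoRegs(trueSP, framesize uint64) (pseudoSP, pseudoFP uint64) {
	pseudoSP = trueSP + framesize
	pseudoFP = pseudoSP + savedBPSize + retAddrSize
	return pseudoSP, pseudoFP
}

func main() {
	sp, fp := pseudoRegs(0x1000, 0x20)
	fmt.Printf("pseudo SP = %#x, pseudo FP = %#x\n", sp, fp)
	// prints pseudo SP = 0x1020, pseudo FP = 0x1030
}
```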

5. What happens to the stack frame during a function call?

The caller executes the CALL instruction
- The return address is pushed onto the stack (the address of the caller’s next instruction, which executes after callee returns to the caller)
- Execution jumps to the callee’s first instruction, whose address is loaded into the PC register
Allocate the callee’s stack frame
- At the start of the function, the compiler inserts 3 instructions
  - First: SUBQ $16, SP allocates callee’s stack frame by moving SP down 16 bytes; SP now marks the top of callee’s stack.
  - Second: MOVQ BP, 8(SP) saves the caller’s BP just below the return address (at the lower address)
  - Third: LEAQ 8(SP), BP points the BP register at callee’s stack base (8(SP) is callee’s stack base)
Execute the instructions in the body of callee’s TEXT block.
Restore the caller’s stack frame
- Before the function returns, the caller’s stack frame must be restored, so the compiler inserts 2 instructions at the end of the function
  - First: MOVQ 8(SP), BP restores the caller’s BP
  - Second: ADDQ $16, SP releases callee’s stack space, which returns SP to where it was in the caller
Execute the RET instruction and return
- The return address is popped off the stack
- The PC register is set to the return address
- The instruction at the return address executes, and caller resumes
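
The sequence can be traced with a toy model. The sketch below is a simulation, not real machine state: the machine type, its methods and the starting addresses are all invented, and it follows the 16-byte frame from the instructions listed above.

```go
package main

import "fmt"

// machine is a hypothetical model of SP, BP and the stack memory;
// mem maps an address to the 8-byte word stored there.
type machine struct {
	sp, bp uint64
	mem    map[uint64]uint64
}

// call models CALL: push the return address onto the stack.
func (m *machine) call(retAddr uint64) {
	m.sp -= 8
	m.mem[m.sp] = retAddr
}

// prologue models SUBQ $16, SP; MOVQ BP, 8(SP); LEAQ 8(SP), BP.
func (m *machine) prologue() {
	m.sp -= 16
	m.mem[m.sp+8] = m.bp
	m.bp = m.sp + 8
}

// epilogue models MOVQ 8(SP), BP; ADDQ $16, SP.
func (m *machine) epilogue() {
	m.bp = m.mem[m.sp+8]
	m.sp += 16
}

// ret models RET: pop the return address and return it as the new PC.
func (m *machine) ret() uint64 {
	pc := m.mem[m.sp]
	m.sp += 8
	return pc
}

func main() {
	m := &machine{sp: 0x1000, bp: 0x1040, mem: map[uint64]uint64{}}
	m.call(0x401000)
	m.prologue()
	fmt.Printf("in callee: SP=%#x BP=%#x\n", m.sp, m.bp)
	m.epilogue()
	pc := m.ret()
	fmt.Printf("after RET: SP=%#x BP=%#x PC=%#x\n", m.sp, m.bp, pc)
}
```

After the epilogue and RET, SP and BP hold the values they had before the call, which is the property the compiler-inserted instructions are there to guarantee.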

Summary

Go’s assembly has several unique pseudo-registers that help users quickly locate hardware registers, global variables, functions, goroutines, etc., which is really convenient. On the other hand, if you don’t understand how these pseudo-registers work and how the stack is laid out in Go, you will have a hard time.

When writing and learning Go assembly, many register operations are handled for you, so focus on how return values, input arguments and local variables are laid out in the stack frame during a function call. In the compiled instructions, all references are converted to the true SP register and no pseudo-registers remain.

As for the others, such as the pseudo PC, pseudo SB and TLS pseudo-registers, it is enough to know that they exist. The BP register is seldom used directly, but it is still useful for understanding the layout of function stacks.
