1. How the OS runs Golang code

1.1. Compilation

Go source code is first compiled into an executable by go build; on Linux the result is an ELF executable. Compilation passes through three stages (compiler, assembler, and linker) before the final executable is produced.

  • Compiler: the Go compiler translates *.go source files into Plan 9 assembly (*.s). The compiler's entry point is the main function in compile/internal/gc/main.go.
  • Assembler: the Go assembler converts the compiler-generated *.s assembly into machine code and writes the object file (*.o). It is implemented in the src/cmd/internal/obj package.
  • Linker: the linker combines the *.o object files produced by the assembler into the final executable. It is implemented in the src/cmd/link/internal/ld package.

[Figure: flow of Golang code being run by the OS]

1.2. Running

Once the executable has been built through the steps above, the operating system takes the binary through the following stages when it loads and runs it.

  • Reading the executable from disk into memory.
  • Creating the process and the main thread.
  • Allocating stack space for the main thread.
  • Copying the arguments the user entered on the command line onto the main thread's stack.
  • Placing the main thread on the operating system's run queue, where it waits to be scheduled.

2. Golang program startup flow analysis

2.1. Analyze the program startup process through gdb debugging

Here a simple Go program is single-stepped in a debugger to analyze its startup process.

main.go

package main

import "fmt"

func main() {
    fmt.Println("hello world")
}

Compile the program and debug it with gdb. First set a breakpoint at the program entry point, then single-step to follow the code that executes during startup.

$ go build -gcflags "-N -l" -o main main.go

$ gdb ./main

(gdb) info files
Symbols from "/home/gosoon/main".
Local exec file:
	`/home/gosoon/main', file type elf64-x86-64.
	Entry point: 0x465860
	0x0000000000401000 - 0x0000000000497893 is .text
	0x0000000000498000 - 0x00000000004dbb65 is .rodata
	0x00000000004dbd00 - 0x00000000004dc42c is .typelink
	0x00000000004dc440 - 0x00000000004dc490 is .itablink
	0x00000000004dc490 - 0x00000000004dc490 is .gosymtab
	0x00000000004dc4a0 - 0x0000000000534b90 is .gopclntab
	0x0000000000535000 - 0x0000000000535020 is .go.buildinfo
	0x0000000000535020 - 0x00000000005432e4 is .noptrdata
	0x0000000000543300 - 0x000000000054aa70 is .data
	0x000000000054aa80 - 0x00000000005781f0 is .bss
	0x0000000000578200 - 0x000000000057d510 is .noptrbss
	0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid
(gdb) b *0x465860
Breakpoint 1 at 0x465860: file /home/gosoon/golang/go/src/runtime/rt0_linux_amd64.s, line 8.
(gdb) r
Starting program: /home/gaofeilei/./main

Breakpoint 1, _rt0_amd64_linux () at /home/gaofeilei/golang/go/src/runtime/rt0_linux_amd64.s:8
8		JMP	_rt0_amd64(SB)
(gdb) n
_rt0_amd64 () at /home/gaofeilei/golang/go/src/runtime/asm_amd64.s:15
15		MOVQ	0(SP), DI	// argc
(gdb) n
16		LEAQ	8(SP), SI	// argv
(gdb) n
17		JMP	runtime·rt0_go(SB)
(gdb) n
runtime.rt0_go () at /home/gaofeilei/golang/go/src/runtime/asm_amd64.s:91
91		MOVQ	DI, AX		// argc
......
231		CALL	runtime·mstart(SB)
(gdb) n
hello world
[Inferior 1 (process 39563) exited normally]

Single-stepping shows that the program's entry point is line 8 of runtime/rt0_linux_amd64.s. Execution eventually reaches the CALL runtime·mstart(SB) instruction, after which the program prints “hello world” and exits.

The function calls in the startup process flow are shown below.

rt0_linux_amd64.s --> _rt0_amd64 --> rt0_go --> runtime·settls --> runtime·check --> runtime·args --> runtime·osinit --> runtime·schedinit --> runtime·newproc --> runtime·mstart
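
The chain above ends in the scheduler, and the assembly frames are not visible from ordinary Go code. The Go-level tail of the startup path can be observed, though. The sketch below uses runtime.Callers and runtime.CallersFrames to list the functions on the main goroutine's stack; callChain is a hypothetical helper name. In a normally built program the list typically shows main.main being called from runtime.main, which matches the comments in the rt0_go listing that describe the first goroutine as running runtime.main.

```go
package main

import (
	"fmt"
	"runtime"
)

// callChain returns the names of the functions on the current goroutine's
// call stack, innermost first. (Hypothetical helper for illustration.)
func callChain() []string {
	pcs := make([]uintptr, 16)
	// skip=1 omits the frame of runtime.Callers itself, so the first
	// recorded frame is callChain.
	n := runtime.Callers(1, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	for _, name := range callChain() {
		fmt.Println(name)
	}
}
```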

2.2. Golang startup process analysis

The previous section used gdb to show that a Go program executes a series of assembly instructions during startup. This section analyzes what each of those instructions does, in order to understand the operations a Go program performs as it starts.

src/runtime/rt0_linux_amd64.s

#include "textflag.h"

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
JMP _rt0_amd64(SB)

TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
JMP _rt0_amd64_lib(SB)

Execution begins at line 8 of the source file, JMP _rt0_amd64(SB). On the amd64 platform this jumps to the _rt0_amd64 function, which is located in src/runtime/asm_amd64.s.

TEXT _rt0_amd64(SB),NOSPLIT,$-8
    // Handle the argc and argv parameters: argc is the number of command-line arguments, argv holds all of them
    MOVQ    0(SP), DI   // argc
    // argv is a pointer
    LEAQ    8(SP), SI   // argv
    JMP runtime·rt0_go(SB)

The _rt0_amd64 function saves argc and argv in the DI and SI registers and then jumps to rt0_go. The rt0_go function does the following.

  • Copy the argc and argv arguments onto the main thread's stack.
  • Initialize the global variable g0: reserve about 64 KB of the main thread's stack for g0 and set g0's stackguard0, stackguard1, and stack fields.
  • Execute the CPUID instruction to probe for CPU information.
  • Execute the nocpuinfo block to determine whether cgo needs to be initialized.
  • Execute the needtls block to initialize TLS and m0.
  • Execute the ok block: bind m0 to g0, call runtime·args to process the process arguments and environment variables, call runtime·osinit to initialize the CPU count, call runtime·schedinit to initialize the scheduler, call runtime·newproc to create the first goroutine, which will run the main function, and call runtime·mstart to start the main thread. The main thread then runs that first goroutine and stays in mstart until the process exits.

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
    // Handle the command-line arguments
    MOVQ    DI, AX      // AX = argc
    MOVQ    SI, BX      // BX = argv
    // Grow the stack by 39 bytes; the author has not yet worked out why 39
    SUBQ    $(4*8+7), SP
    ANDQ    $~15, SP    // align to 16 bytes
    MOVQ    AX, 16(SP)  // store argc at SP+16
    MOVQ    BX, 24(SP)  // store argv at SP+24

    // Start initializing g0. runtime·g0 is a global variable defined in src/runtime/proc.go. Global variables live in the data area of the process address space; a later section shows how to inspect the code, data, and global variables in the ELF binary
    // g0's stack is carved out of the process stack area; g0 takes about 64 KB
    MOVQ    $runtime·g0(SB), DI    // DI = address of g0
    LEAQ    (-64*1024+104)(SP), BX // BX = SP - 64*1024 + 104

    // Initialize the stackguard0, stackguard1, and stack fields of g0
    MOVQ    BX, g_stackguard0(DI) // g0.stackguard0 = SP - 64*1024 + 104
    MOVQ    BX, g_stackguard1(DI) // g0.stackguard1 = SP - 64*1024 + 104
    MOVQ    BX, (g_stack+stack_lo)(DI) // g0.stack.lo = SP - 64*1024 + 104
    MOVQ    SP, (g_stack+stack_hi)(DI) // g0.stack.hi = SP

After the execution of the above instructions, the process memory space layout is as follows.

[Figure: process memory space layout]

Next come the instructions that obtain CPU information and those related to cgo initialization. This code can be skipped for now.

    // Execute the CPUID instruction to try to obtain CPU information; this code probes the CPU and its instruction set
    MOVL    $0, AX
    CPUID
    MOVL    AX, SI
    CMPL    AX, $0
    JE  nocpuinfo

    // Figure out how to serialize RDTSC.
    // On Intel processors LFENCE is enough. AMD requires MFENCE.
    // Don't know about the rest, so let's do MFENCE.
    CMPL    BX, $0x756E6547  // "Genu"
    JNE notintel
    CMPL    DX, $0x49656E69  // "ineI"
    JNE notintel
    CMPL    CX, $0x6C65746E  // "ntel"
    JNE notintel
    MOVB    $1, runtime·isIntel(SB)
    MOVB    $1, runtime·lfenceBeforeRdtsc(SB)
notintel:

    // Load EAX=1 cpuid flags
    MOVL    $1, AX
    CPUID
    MOVL    AX, runtime·processorVersionInfo(SB)
    
nocpuinfo:
    // cgo initialization; _cgo_init is a global variable
    MOVQ    _cgo_init(SB), AX
    // Check whether AX is 0
    TESTQ   AX, AX
    // If it is, jump to needtls
    JZ  needtls
    // arg 1: g0, already in DI
    MOVQ    $setg_gcc<>(SB), SI // arg 2: setg_gcc

    CALL    AX
		
    // If cgo is enabled, some fields of g0 are updated
    MOVQ    $runtime·g0(SB), CX
    MOVQ    (g_stack+stack_lo)(CX), AX
    ADDQ    $const__StackGuard, AX
    MOVQ    AX, g_stackguard0(CX)
    MOVQ    AX, g_stackguard1(CX)

Next the needtls block runs and initializes TLS and m0. TLS is thread-local storage. While a Go program runs, every m must be associated with a worker thread, so how does a worker thread find the m it belongs to? This is where thread-local storage comes in. A thread-local variable is a global variable that is private to each thread: every thread can initialize its own private copy, and all threads can then use the same global variable name to reach a different m structure. As will be analyzed later, each worker thread uses thread-local storage, after it is created and before it enters the scheduling loop, to set up a private global variable that points to its own instance of the m structure.

In the code analysis that follows you will often see calls to the getg function. getg fetches the currently running g from thread-local storage; at this stage that is the g0 associated with the m.

The TLS address is written into m0 and m0 is bound to g0, so g0 can be obtained directly from TLS.

// Initialize TLS (thread-local storage): make m0 a thread-private variable and bind m0 to the main thread
needtls:
    LEAQ    runtime·m0+m_tls(SB), DI  // DI = &m0.tls, the address of m0's tls field

    // Call runtime·settls to set up thread-local storage; its argument is passed in the DI register
    // runtime·settls installs the address of m0.tls[1] as the TLS address
    // runtime·settls is at runtime/sys_linux_amd64.s#599
    CALL    runtime·settls(SB)

    // Verify that thread-local storage works by checking that the value is written to m0.tls correctly;
    // if something is wrong, abort the program
    // get_tls is a macro defined in runtime/go_tls.h
    get_tls(BX)                         // load the TLS address into BX, i.e. BX = &m0.tls[1]
    MOVQ    $0x123, g(BX)               // store 0x123 through TLS, i.e. m0.tls[0] = 0x123
    MOVQ    runtime·m0+m_tls(SB), AX    // AX = m0.tls[0]
    CMPQ    AX, $0x123
    JEQ 2(PC)                           // if equal, jump forward two instructions, i.e. to the ok block
    CALL    runtime·abort(SB)           // abort raises an interrupt with the INT instruction

Execution continues with the ok block, whose main logic is as follows.

  • Bind m0 to g0.
  • Call runtime·osinit to initialize the CPU count; the scheduler needs to know how many CPU cores the system has when it initializes.
  • Call runtime·schedinit to initialize m0 and the p objects. It also sets the maxmcount member of the global variable sched to 10000, which caps the number of OS threads the program can create at 10000.
  • Call runtime·newproc to create a goroutine for the main function.
  • Call runtime·mstart to start the main thread and execute the main function.

// First save g0's address in TLS, i.e. m0.tls[0] = &g0, then bind m0 and g0 to each other,
// i.e. m0.g0 = g0, g0.m = m0
ok:
    get_tls(BX)                 // load the TLS address into BX
    LEAQ    runtime·g0(SB), CX  // CX = &g0
    MOVQ    CX, g(BX)           // m0.tls[0] = &g0
    LEAQ    runtime·m0(SB), AX  // AX = &m0

    MOVQ    CX, m_g0(AX)  // m0.g0 = g0
    MOVQ    AX, g_m(CX)   // g0.m = m0

    CLD             // convention is D is always left cleared
    // check verifies that the various types and type conversions behave correctly; it is at runtime/runtime1.go#137
    CALL    runtime·check(SB)

    // Move argc and argv to SP+0 and SP+8
    // so that they become the arguments of runtime·args
    MOVL    16(SP), AX
    MOVL    AX, 0(SP)
    MOVQ    24(SP), AX
    MOVQ    AX, 8(SP)

    // args reads the arguments, environment variables, and so on from the stack and processes them
    // args is at runtime/runtime1.go#61
    CALL    runtime·args(SB)

    // osinit initializes the CPU count; it is at runtime/os_linux.go#301
    CALL    runtime·osinit(SB)
    // schedinit initializes the scheduler; it is at runtime/proc.go#654
    CALL    runtime·schedinit(SB)

    // Create the first goroutine to run runtime.main: take the address of runtime.main and call newproc to create the g
    MOVQ    $runtime·mainPC(SB), AX
    PUSHQ   AX            // push runtime.main as the second argument of newproc
    PUSHQ   $0            // push the first argument of newproc: the size of the arguments runtime.main needs; runtime.main takes none, so it is 0

    // newproc creates a new goroutine and puts it on the queue of goroutines waiting to run; that goroutine will execute runtime.main. newproc is at runtime/proc.go#4250
    CALL    runtime·newproc(SB)
    // Pop the two values pushed above
    POPQ    AX
    POPQ    AX

    // mstart starts the main thread's scheduling loop, which then runs the goroutine just created. mstart blocks unless the function exits. mstart is at runtime/proc.go#1328
    CALL    runtime·mstart(SB)

    CALL    runtime·abort(SB)   // mstart should never return
    RET

    // Prevent dead-code elimination of debugCallV2, which is
    // intended to be called by debuggers.
    MOVQ    $runtime·debugCallV2<ABIInternal>(SB), AX
    RET

The process memory space layout at this point is shown below.

[Figure: process memory space layout]

2.3. View ELF binary file structure

The readelf command shows the structure of an ELF binary, including the contents of its code and data areas. Global variables are stored in the data area and functions in the code area.

$ readelf -s main | grep runtime.g0
  1765: 000000000054b3a0   376 OBJECT  GLOBAL DEFAULT   11 runtime.g0
  
// _cgo_init is a global variable
$ readelf -s main | grep -i _cgo_init
  2159: 000000000054aa88     8 OBJECT  GLOBAL DEFAULT   11 _cgo_init
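
The same information can be read programmatically with Go's debug/elf package. The sketch below prints the entry point of an ELF file and looks up one symbol by name; the function name inspect is made up for this example. It inspects its own executable unless a path is given on the command line. Note that the lookup only succeeds when the binary still has its symbol table, which is not the case for stripped binaries.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// inspect returns the entry point of the ELF file at path and, if present,
// the symbol with the given name. A nil symbol with a nil error means the
// file was readable but the symbol was not found.
// (Hypothetical helper for illustration.)
func inspect(path, symbol string) (uint64, *elf.Symbol, error) {
	f, err := elf.Open(path)
	if err != nil {
		return 0, nil, err
	}
	defer f.Close()

	syms, err := f.Symbols() // returns an error if there is no symbol table
	if err != nil {
		return f.Entry, nil, nil
	}
	for i := range syms {
		if syms[i].Name == symbol {
			return f.Entry, &syms[i], nil
		}
	}
	return f.Entry, nil, nil
}

func main() {
	path, err := os.Executable() // default: this program's own binary
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		return
	}
	if len(os.Args) > 1 {
		path = os.Args[1] // or a binary named on the command line
	}

	entry, sym, err := inspect(path, "runtime.g0")
	if err != nil {
		fmt.Fprintln(os.Stderr, "not a readable ELF file:", err)
		return
	}
	fmt.Printf("entry point: %#x\n", entry)
	if sym == nil {
		fmt.Println("runtime.g0: not found (the symbol table may be stripped)")
		return
	}
	fmt.Printf("runtime.g0: value=%#x size=%d\n", sym.Value, sym.Size)
}
```

Pointing it at the main binary built earlier should report the same entry point that gdb's info files printed and the same address and size that readelf shows for runtime.g0.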

3. Summary

This article introduced the key code in the Golang program startup process. Most of that code is written in Plan 9 assembly, which is hard to read if you have not worked on low-level code before. The author does not fully understand some of the details, in particular some hard-coded numbers and parts of the operating system and hardware specifications, which are relatively difficult to follow; readers interested in the implementation details are welcome to discuss them privately. Analyses of several other components of the Golang runtime will follow in later articles.