Differences between threads, kernel threads and user threads

  • Threads: From the kernel’s point of view there is no separate concept of a thread. Linux implements all threads as processes, and the kernel has no special scheduling algorithm for them. A thread is simply a process that shares some resources with other processes. Like a process, each thread has its own task_struct, so from the kernel’s perspective a thread looks like a normal process; threads are therefore also called lightweight processes. A process can have multiple threads, each with its own independent stack, and the operating system schedules the switches between them. On Linux they can be created with the pthread_create() function or the clone() system call.

  • Kernel threads: standard processes that run independently in kernel space. The difference between kernel threads and normal threads is that kernel threads do not have a separate address space.

  • User threads: also known as coroutines, are execution contexts built on top of, and lighter than, threads. They exist in and are managed by the user-space runtime and are invisible to the operating system; their switching is controlled by the user program itself, but user threads still ultimately run on kernel threads. Coroutines in Lua and Python, and Golang’s goroutines, are all user threads.

The relationship between the three is shown below.

Threads/kernel threads/User thread

The relationship between goroutine and threads in Golang is shown below.

relationship between goroutine and threads in Golang

When a Golang program starts, it first creates a process, then a main thread. The main thread executes runtime initialization code, including scheduler initialization, and then starts the scheduler, which keeps looking for runnable goroutines and binds them to kernel threads.

Reasons for Golang to use goroutine

Although operating systems already provide multi-threading and multi-processing to handle concurrency, today’s Internet-scale, high-concurrency scenarios have increasingly demanding performance requirements. Large numbers of processes or threads bring high memory occupation and CPU consumption, and many services are refactored to reduce costs and increase efficiency.

A process can be associated with multiple threads, and the threads share some of the process’s resources, such as the memory address space, open files, and basic process information, while each thread keeps its own stack and register state. Threads are lighter than processes, and goroutines are lighter still: multiple goroutines are associated with one thread, and each goroutine has its own stack space. Moving from process to thread to goroutine is essentially a process of sharing more and more in order to reduce switching costs.

Golang uses goroutine mainly for the following reasons.

  • Kernel threads are too heavy to create and switch: both creation and switching require entering kernel mode, which carries a high overhead and performance cost, while switching goroutines does not require entering kernel mode.
  • Threads use too much memory: the default stack size of a kernel thread is 8 MB, while creating a user thread, i.e. a goroutine, requires only 2 KB of memory, and the goroutine stack grows automatically when it is not large enough (a small demonstration follows this list).
  • Goroutine scheduling is more flexible: all goroutine scheduling and switching happens in user space, there is no thread-creation overhead, and even when a goroutine blocks, the other goroutines on the same thread can be scheduled to run on other threads.
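As a rough illustration of how cheap goroutines are to create (a sketch, not a benchmark; the exact numbers depend on the Go version and platform), the following program spawns 100,000 goroutines that block on a channel and then prints the goroutine count and the stack memory in use:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Spawn 100,000 goroutines that simply park on a channel.
    block := make(chan struct{})
    for i := 0; i < 100000; i++ {
        go func() {
            <-block
        }()
    }

    // Report how many goroutines exist and how much stack memory they use.
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    fmt.Printf("goroutines: %d, stack in use: %d MB\n",
        runtime.NumGoroutine(), ms.StackInuse/1024/1024)

    close(block) // let the goroutines exit
}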

Distribution of goroutines in the process memory space

A goroutine is essentially a function that can be suspended and resumed. When a goroutine is created, a section of space is allocated in the heap of the process to hold the goroutine’s stack, and the saved scheduling state is restored from it when the goroutine needs to be resumed, so that the function can continue running.

Distribution of goroutines in the process memory space

GPM Model Analysis

The commit history of the Golang code shows that a lot of it was translated from C to Go. Much of the runtime code before Go 1.4 was implemented in C; for example, many of the scheduling-related functions now implemented in proc.go were originally written in C and only later translated into Go. If you want to know the details of Golang’s original design, you can look at the earliest C code commits.

commit b2cdf30eb6c4a76504956aaaad47df969274296b
Author: Russ Cox <rsc@golang.org>
Date:   Tue Nov 11 17:08:33 2014 -0500

    [dev.cc] runtime: convert scheduler from C to Go
    
    
    
commit 15ced2d00832dd9129b4ee0ac53b5367ade24c13
Author: Russ Cox <rsc@golang.org>
Date:   Tue Nov 11 17:06:22 2014 -0500

    [dev.cc] runtime: convert assembly files for C to Go transition

    The main change is that #include "zasm_GOOS_GOARCH.h"
    is now #include "go_asm.h" and/or #include "go_tls.h".

    Also, because C StackGuard is now Go _StackGuard,
    the assembly name changes from const_StackGuard to
    const__StackGuard.

    In asm_$GOARCH.s, add new function getg, formerly
    implemented in C.

This article focuses on the current GPM model in the scheduler, starting with an understanding of the roles and connections of the three components of the GPM model.

  • G: Goroutine, the function we run in a Go program using the go keyword.
  • M: Machine, i.e. a worker thread, representing a system thread. M is an object in the runtime; each time an M is created, a system thread is created and bound to that M.
  • P: Processor, similar in concept to a CPU core; an M can execute Go code only when it is associated with a P.

G needs to be bound to an M at runtime, and M needs to be bound to a P. The number of M is not equal to the number of P, because an M can get stuck in a system call or block for other reasons while running a G. The runtime creates new M’s when there are not enough, so the number of M may grow as the program executes, while the number of P stays the same without user intervention, and the number of G is determined by the user code.
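The number of P defaults to the number of logical CPUs, and the number of G can be observed from user code; the runtime package does not expose the current number of M directly, so the small sketch below only prints the values it does expose:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reads the current number of P without changing it.
    fmt.Println("P (GOMAXPROCS):", runtime.GOMAXPROCS(0))

    // Logical CPUs on the machine, which GOMAXPROCS defaults to.
    fmt.Println("logical CPUs:", runtime.NumCPU())

    // Number of G currently alive (including the main goroutine).
    fmt.Println("G (goroutines):", runtime.NumGoroutine())
}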

The connection between the three GPMs is shown below.

GMP

  • Global queue: Holds the G’s waiting to run.
  • P’s local queue: like the global queue, it holds G’s waiting to run, but the number it can store is limited. When a new G is created it is added to the local queue of the current P; if the queue is full, half of the G’s in the local queue are moved to the global queue.
  • P list: all P’s are created at program startup and stored in an array, with at most GOMAXPROCS (configurable) entries.
  • M: a thread that wants to run a task has to get a P and then take a G from that P’s local queue. When P’s local queue is empty, M tries to take a batch of G’s from the global queue and put them into P’s local queue, or steals half of another P’s local queue into its own. M then runs the G; after the G finishes, M takes the next G from P, and so on. A simplified sketch of this lookup order follows the list.
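The following is a deliberately simplified, hypothetical sketch of that lookup order; the types and helpers here (goro, proc, globalQueue, stealFromOtherP) are illustrative only and are not the runtime’s real data structures, which additionally use atomics, locks and fairness rules.

package main

// goro and proc stand in for the runtime's g and p objects.
type goro struct{ id int }

type proc struct {
    runnext *goro
    runq    []*goro
}

var globalQueue []*goro

// findRunnable mimics the order in which an M looks for work:
// P's runnext, P's local queue, the global queue, then stealing.
func findRunnable(p *proc) *goro {
    if g := p.runnext; g != nil {
        p.runnext = nil
        return g
    }
    if len(p.runq) > 0 {
        g := p.runq[0]
        p.runq = p.runq[1:]
        return g
    }
    if len(globalQueue) > 0 {
        g := globalQueue[0]
        globalQueue = globalQueue[1:]
        return g
    }
    return stealFromOtherP(p) // steal half of another P's local queue
}

// stealFromOtherP is a placeholder for the work-stealing step.
func stealFromOtherP(p *proc) *goro { return nil }

func main() {
    p := &proc{runnext: &goro{id: 1}}
    _ = findRunnable(p)
}

In practice, running a program with GODEBUG=schedtrace=1000 prints periodic scheduler summaries that include the global run queue length and each P’s local queue length, which is a convenient way to observe this behavior.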

GPM Life Cycle

1. Life cycle of P

The structure of the P object is shown below.

type p struct {
    id          int32
    status      uint32 // status of the P
    link        puintptr
    schedtick   uint32     // number of times this P has been scheduled
    syscalltick uint32     // number of system calls executed on this P
    sysmontick  sysmontick // time of the last sysmon run
    m           muintptr   // the M associated with this P
    mcache      *mcache    // small-object cache, can be accessed without locking
    pcache      pageCache  // page cache, can be accessed without locking
    raceprocctx uintptr    // race detector related

    // defer related
    deferpool    [5][]*_defer 
    deferpoolbuf [5][32]*_defer

    // cache of goroutine ids
    goidcache    uint64
    goidcacheend uint64

    // P's local G queue, can be accessed without locking
    runqhead uint32        // head of the local queue
    runqtail uint32        // tail of the local queue
    runq     [256]guintptr // local G queue, a circular queue implemented with an array
    runnext  guintptr      // the G to run next, with priority over runq

    // G's that have finished running (state Gdead) are kept in gFree so the G objects can be reused
    gFree struct {
        gList
        n int32
    }

    sudogcache []*sudog
    sudogbuf   [128]*sudog

    mspancache struct {
        len int
        buf [128]*mspan
    }

    tracebuf traceBufPtr
    traceSweep bool
    traceSwept, traceReclaimed uintptr

    palloc persistentAlloc 
    _ uint32 

    timer0When uint64

    timerModifiedEarliest uint64

    // GC related
    gcAssistTime         int64 
    gcFractionalMarkTime int64 
    gcMarkWorkerMode gcMarkWorkerMode
    gcMarkWorkerStartTime int64
    gcw gcWork
    wbBuf wbBuf

    ......

    // preemption flag
    preempt bool
}
(1) Why do we need P?

Before Golang 1.1, there was no P component in the scheduler, and the scheduler’s performance was poor. Dmitry Vyukov from the community summarized the problems with the scheduler at the time and designed the P component to solve them (Scalable Go Scheduler Design Doc); the P component was introduced in Go 1.1. Its introduction not only solved the problems listed in that document, but also brought in some good mechanisms.

The document lists four main issues with the old scheduler:

  • The global mutex (sched.Lock) problem: in testing, the community found that Golang programs spent about 14% of their CPU time at runtime on this global lock. Without the P component, M could only fetch G from the global queue while holding a mutex, and other goroutine operations (creation, completion, rescheduling, etc.) were also delayed by the lock. With the introduction of P, each P object holds a queue of G’s: the G queue is split into two kinds, with sched keeping the global G queue while each P has its own local G queue. M runs the G’s in its P’s local queue first and can access that queue without locking, which removes the single global lock bottleneck of the old scheduler.

  • The G hand-off problem: frequently handing runnable G’s between M’s increases latency and overhead. For example, a newly created G was put into the global queue instead of being run on the M that created it, which caused unnecessary overhead and latency; it should preferably run on the M that created it. After the P component was introduced, a newly created G is put first into the local queue of the P associated with the creating G.

  • M’s memory cache (M.mcache) problem: in the version without the P component, each M structure had an mcache field. mcache is a memory allocation pool from which small objects are allocated directly: when M runs a G and the G needs small objects, they are allocated from M’s mcache without locking, because each M runs only one G at a time. However, at any moment only some M’s are actively running G’s; the others, blocked in system calls and so on, do not actually need an mcache, so that memory is wasted. Each M’s mcache holds roughly 2 MB of available memory, and when thousands of M’s are in the blocking state a lot of memory is consumed. There was also poor data locality: M caches the small objects a G needs while running it, and if the G is later scheduled onto the same M the access is faster, but in practice the probability of a G landing on the same M again is low, so locality was poor. With the introduction of P, mcache moved from M to P, so there is no need to allocate an mcache for every M, avoiding excessive memory consumption. In a highly concurrent state each running G is bound to a P, so only the currently running G’s occupy mcaches; the number of mcaches equals the number of P, and no locking is needed for concurrent access.

  • Frequent thread blocking and waking: in the original scheduler, the number of system threads running Go code was limited by runtime.GOMAXPROCS(), which defaulted to 1. Because M performs operations such as system calls, when an M blocked the runtime did not create a new M to perform other tasks, but waited for the blocked M to wake up; M’s switched frequently between blocking and waking, which caused extra overhead. In the new scheduler, when an M enters a system call it is disassociated from its bound P, and an existing M is woken up, or a new one created, to run the other G’s bound to that P.

(2) Creation of P

The number of P’s is initialized at runtime startup and by default equals the number of logical CPU cores. It can be set at program startup with the GOMAXPROCS environment variable or with the runtime.GOMAXPROCS() function, and the number of P’s then stays fixed for the duration of the program unless it is changed again.
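For example (a minimal sketch; the value 4 below is arbitrary), the current number of P’s can be read and changed from user code:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) returns the current number of P's without changing it.
    fmt.Println("current P count:", runtime.GOMAXPROCS(0))

    // Setting a new value returns the previous one; the runtime adjusts
    // the number of P's accordingly.
    prev := runtime.GOMAXPROCS(4)
    fmt.Println("previous:", prev, "now:", runtime.GOMAXPROCS(0))
}

The same effect can be achieved without code changes by starting the program with the GOMAXPROCS environment variable set.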

In IO-intensive scenarios the number of P can be adjusted upward. An M must be bound to a P to run, and an M falls into a system call while executing a G; during that time the P associated with the M sits waiting. If the system call takes a long time, the CPU resources behind that P are effectively wasted. The sysmon monitoring thread in the runtime can retake the P associated with such a G and rebind it to another M so that G’s keep running, but sysmon runs periodically, checking roughly every 10 ms once it has stabilized, and the operating system can perform many thread switches within 10 ms, so runnable G’s behind a P stuck in a system call may not be executed and CPU resources are wasted. Some projects therefore tune the number of P’s: the open source database project https://github.com/dgraph-io/dgraph adjusts GOMAXPROCS to 128 to increase IO processing power.

(3) P destruction logic

If GOMAXPROCS is not adjusted while the program runs, unused P’s are placed in the scheduler’s global idle list schedt.pidle and are not destroyed. If GOMAXPROCS is turned down, p.destroy() reclaims the resources associated with the excess P’s and sets their state to _Pdead; since there may still be M objects referencing a P, the P objects themselves are not reclaimed.

(4) The state of P
  • _Pidle: the state after P has been initialized, when it is not yet running user code or the scheduler
  • _Prunning: the state of P when it is bound to an M and running user code
  • _Psyscall: set by the associated M when the G being executed needs to make a system call
  • _Pgcstop: set by the associated M when a GC occurs while the program is running
  • _Pdead: if GOMAXPROCS is reduced while the program is running, the excess P’s are set to this state

The state of P

2. Life cycle of M

The structure of the M object is as follows.

type m struct {
    // g0 records the stack used by the worker thread (i.e. the kernel thread);
    // it is needed when running the scheduling code
    g0      *g    
    
    morebuf gobuf  // used when growing the stack
    ......

    gsignal       *g   // used for signal handling
    ......
    
    // the m is bound to its worker thread through the tls (thread-local storage) structure
    tls           [tlsSlots]uintptr 
    
    mstartfn      func()   // function executed immediately when the m starts
    curg          *g       // points to the goroutine currently running
    caughtsig     guintptr 
    p             puintptr // the P currently bound to this m
    nextp         puintptr // the P to bind for the next run
    oldp          puintptr // the P that was bound before entering a system call
    id            int64    // unique id of the m
    mallocing     int32
    throwing      int32
    preemptoff    string // if set, keep curg running on this m
    locks         int32
    dying         int32
    profilehz     int32
    spinning      bool // true when this m is spinning, i.e. stealing work from other P's
    blocked       bool // the m is blocked on a note
    newSigstack   bool 
    printlock     int8
    incgo         bool   // whether a cgo call is being executed
    freeWait      uint32 
    fastrand      [2]uint32
    needextram    bool
    traceback     uint8
    
    // cgo call counters
    ncgocall      uint64      
    ncgo          int32       
    cgoCallersUse uint32      
    cgoCallers    *cgoCallers 
    
    // when there is no goroutine to run, the worker thread sleeps on this park member;
    // other threads use it to wake this worker thread up
    doesPark      bool        
    park          note
    
    alllink       *m       // linked list of all worker threads
    ......

    startingtrace bool
    syscalltick   uint32   // number of system calls executed
    freelink      *m 
    
    ......
    preemptGen uint32   // number of completed preemption signals
    ......
}
(1) Creation of M

M is the object in the runtime that represents a thread. Each M object that is created creates a thread bound to that M; new threads are created by executing the clone() system call. The runtime defines the maximum number of M’s as 10000, which can be adjusted with debug.SetMaxThreads(n).
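For example (a minimal sketch; the limit of 20000 below is arbitrary), a program that expects to hold many threads in blocking system calls can raise this limit, or lower it to catch thread leaks early:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // SetMaxThreads returns the previous limit (10000 by default).
    // If the program later exceeds the new limit, the runtime aborts it.
    prev := debug.SetMaxThreads(20000)
    fmt.Println("previous thread limit:", prev)
}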

New M will be created in the following two scenarios.

  • When a Golang program starts it creates the main thread, and the main thread is the first M, i.e. M0.
  • When a new G is created, or a G goes from _Gwaiting to _Grunnable and there is a free P, startm() is called: it first tries to get an idle M from the global idle list (sched.midle) and bind it to the free P to run the G; if there is no idle M, a new one is created with newm().
(2) Destruction of M

M is not destroyed. When it cannot find a runnable G or cannot bind a free P, it goes to sleep by executing the stopm() function, which happens in the following two cases.

  • When the P bound to M has no runnable G and no runnable G can be stolen from other P’s, M first tries to enter the spinning state. Only some M’s spin: the number of spinning M’s is at most half the number of non-idle P’s (sched.nmspinning < (procs - sched.npidle) / 2). A spinning M tries to steal runnable G’s from other P’s; if it steals nothing, or does not enter the spinning state at all, it goes to sleep.

  • When the G associated with M enters a system call, M actively unbinds from its associated P. When that G executes the exitsyscall() function to leave the system call, M looks for a free P to bind; if no free P is found, M calls stopm() and goes to sleep.

The stopm() function puts the sleeping M into the global idle queue (sched.midle).

(3) Operation of M

M needs to be associated with a P to run, and M has affinity with P. For example, when entersyscall() is executed to enter a system call, M actively unbinds from its current P and records it in m.oldp; when exitsyscall() is executed to exit the system call, M preferentially binds the P stored in m.oldp.

(4) What is the role of M0 and how does it differ from the M associated with other threads?

M0 is a global variable defined in src/runtime/proc.go, so it does not need to be allocated on the heap, whereas all other M’s are objects created with new(m) whose memory is allocated from the heap. M0 is bound to the main thread and is responsible for performing initialization and starting the first G; after M0 has started the first G, the main goroutine, it behaves just like any other M.

(5) Why should we limit the number of M’s?

Golang added a limit on the number of M’s in version 1.2 (runtime: limit number of operating system threads); the default maximum is 10000, and in version 1.17 this default is set during scheduler initialization in the schedinit() function (sched.maxmcount = 10000).

Why limit the number of M’s? In the “Potential Further Improvements” section of the scheduler redesign document, Dmitry Vyukov already mentioned the need to limit the number of M’s. In scenarios with high concurrency, or where large numbers of goroutines are repeatedly created, more threads are needed to run them, and too many threads can exhaust system resources or hit system limits and cause program exceptions. The kernel also consumes extra resources when scheduling a large number of threads, so limiting the number of M’s mainly guards against unreasonable resource usage.

The default thread stack size on Linux is 8 MB; creating 10,000 threads would by default require about 78 GB of (mostly virtual) memory, which is already a very large footprint for an ordinary program. The relevant system limits, and a small helper to read them, are shown below.

  • /proc/sys/kernel/threads-max: the maximum number of threads the system supports.
  • /proc/sys/kernel/pid_max: the system-wide limit on PID values; every process or thread has an ID, and creation fails if the ID would exceed this value.
  • /proc/sys/vm/max_map_count: the limit on the number of VMAs (virtual memory areas) a process may have.
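A small helper for inspecting these limits on Linux (a sketch that assumes the standard /proc paths exist and simply prints their raw values):

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // System limits relevant to how many threads a process can create.
    paths := []string{
        "/proc/sys/kernel/threads-max",
        "/proc/sys/kernel/pid_max",
        "/proc/sys/vm/max_map_count",
    }
    for _, p := range paths {
        data, err := os.ReadFile(p)
        if err != nil {
            fmt.Println(p, "read error:", err)
            continue
        }
        fmt.Println(p, "=", strings.TrimSpace(string(data)))
    }
}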
(6) States of M

From the analysis of how M is created and destroyed above, M has three states: running, spinning, and sleeping; the transitions between them are shown below.

States of M

3. Life cycle of G

The structure information of G is shown below.

type g struct {
    // stack memory range of the current Goroutine
    stack       stack   
    stackguard0 uintptr 
    stackguard1 uintptr 

    _panic    *_panic // panic handling for the current g
    _defer    *_defer // defer handling for the current g
    m         *m      // the m this g is bound to
    
    // scheduling data of the current Goroutine; the current context is saved here on a switch
    sched     gobuf   
    
    ......
    
    param        unsafe.Pointer  // parameter passed in when the G is woken up
    atomicstatus uint32  // current state of the G
    stackLock    uint32 
    goid         int64   // ID of the current G
    schedlink    guintptr
    waitsince    int64      // how long the G has been blocked
    waitreason   waitReason // reason for blocking

    // preemption flags
    preempt       bool 
    preemptStop   bool 
    preemptShrink bool 
    
   
    asyncSafePoint bool
    paniconfault bool 
    gcscandone   bool 
    throwsplit   bool 
    
    // whether there are unlocked channels pointing into the g's stack
    activeStackChans bool
    
    // whether the g is parked on chansend or chanrecv; used for stack shrinking
    parkingOnChan uint8
    
    raceignore     int8     // ignore race detection events
    sysblocktraced bool     // StartTrace has emitted EvGoInSyscall about this goroutine
    tracking       bool     // whether we're tracking this G for sched latency statistics
    trackingSeq    uint8    // used to decide whether to track this G
    runnableStamp  int64    // timestamp of when the G last became runnable, only used when tracking
    runnableTime   int64    // the amount of time spent runnable, cleared when running, only used when tracking
    sysexitticks   int64    // cputicks when syscall has returned (for tracing)
    traceseq       uint64   // trace event sequencer
    tracelastp     puintptr // last P emitted an event for this goroutine
    lockedm        muintptr
    sig            uint32
    writebuf       []byte
    sigcode0       uintptr
    sigcode1       uintptr
    sigpc          uintptr
    gopc           uintptr         // PC of the go statement that created this goroutine
    ancestors      *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
    startpc        uintptr         // PC of this goroutine's entry function
    racectx        uintptr
    waiting        *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
    cgoCtxt        []uintptr      // cgo traceback context
    labels         unsafe.Pointer // profiler labels
    timer          *timer         // cached timer for time.Sleep
    selectDone     uint32         // are we participating in a select and did someone win the race?
    
    // during GC, the number of bytes this Goroutine has assist-marked
    gcAssistBytes int64
}
(1) Creation of G

When a Golang program starts, the main thread creates the first goroutine to run the main function. A new goroutine is created whenever the user uses the go keyword, in main or in any other goroutine; in the runtime, goroutines are created by calling the newproc() function. The number of goroutines is limited by system resources (CPU, memory, file descriptors, etc.). If a goroutine contains only simple logic, it is theoretically fine to have as many as you want, but if goroutines create network connections or open files, too many of them may lead to errors such as "too many open files" or "resource temporarily unavailable" and make the program execute abnormally. One common way to bound the number of concurrently running goroutines is shown below.
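A minimal sketch of that pattern (a general Go idiom, not something prescribed by the runtime) uses a buffered channel as a counting semaphore to cap the number of in-flight goroutines:

package main

import (
    "fmt"
    "sync"
)

func main() {
    const maxInFlight = 10 // cap on concurrently running goroutines

    sem := make(chan struct{}, maxInFlight) // counting semaphore
    var wg sync.WaitGroup

    for i := 0; i < 100; i++ {
        sem <- struct{}{} // blocks while maxInFlight goroutines are running
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            fmt.Println("task", n)   // stand-in for real work (network, files, ...)
        }(i)
    }
    wg.Wait()
}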

The newly created G is placed first into the runnext slot of the P associated with the current G, via the runqput() function. P’s runnext holds only one G; if there is already a G there, it is replaced by the newly created one, and the G that was in runnext is pushed into P’s local queue runq. If P’s local queue is full, half of the G’s in it are moved to the global queue sched.runq. Putting the newly created G into runnext first is mainly a performance optimization: runnext is completely private to the P, whereas G’s in the local runq may be stolen by other M’s, and fetching a G from the local queue requires atomic.LoadAcq and atomic.CasRel operations, which introduce extra overhead.

(2) Destruction of G

When a G exits, the goexit() function runs and the state of the G changes from _Grunning to _Gdead. The G object is not freed directly; instead it is put into the local or global idle list gFree via gfput() so it can be reused. The local list of the associated P is preferred; if the P’s local gFree list holds more than 64 G’s, only 32 are kept locally and the rest are moved to the global idle list sched.gFree.

(3) Running of G

A G must be bound to an M to run, and the M must be bound to a P, so in theory the number of G’s running at the same time equals the number of P’s. M does not keep the state of G; G keeps its state in its own gobuf field, so a G can be scheduled across M’s. After an M finds a G to run, it switches from the g0 stack to the user G’s stack via the assembly function gogo().

(4) What are the states of G?

The states of G are defined in the src/runtime/runtime2.go file. They fall into three groups: the states a goroutine passes through in normal operation, the GC-related states, and the remaining states, which are unused.

The role of each transition state and the relationship between the states are shown below.

  • _Gidle: just created and not yet initialized
  • _Grunnable: not executing code and does not own a stack; stored in a run queue
  • _Grunning: executing code and owns its stack; bound to an M and a P
  • _Gsyscall: executing a system call
  • _Gwaiting: blocked in the runtime, not executing user code and not on a run queue
  • _Gdead: finished running, in the exit state
  • _Gcopystack: the stack is being copied
  • _Gpreempted: blocked due to preemption, waiting to be woken
  • _Gscan: GC is scanning the stack space

What are the states of G

4. The role of g0

type m struct {
    g0      *g     // goroutine with scheduling stack
    ......
}

There are two kinds of g0 in the runtime: the g0 associated with m0, and the g0’s associated with other m’s. The g0 of m0 is defined as a global variable and its memory is allocated on the system stack, with a size of 64 KB minus 104 bytes; the g0 of every other m has a stack allocated on the heap, 8 KB by default.

src/runtime/proc.go#1879

if iscgo || mStackIsSystemAllocated() {
    mp.g0 = malg(-1)
} else {
    // sys.StackGuardMultiplier is 1 on Linux
    mp.g0 = malg(8192 * sys.StackGuardMultiplier) 
}

Every time an M is started, the first goroutine created for it is its g0; each M has its own g0. g0 mainly records the stack information used by the worker thread and is used only for scheduling: the scheduling code runs on the g0 stack, while user goroutine code runs on the user goroutine’s own stack, and the stack is switched during scheduling.

M scheduling

Many functions in the runtime are invoked through the systemstack() function, which switches to the g0 stack, executes the given function, and then switches back to the original stack and returns. In principle, any function marked with the nosplit annotation needs to run on the g0 stack: nosplit tells the compiler not to insert the stack-overflow check at the start of the function, so such functions could overflow a small stack, whereas the g0 stack is relatively large. Inserting an overflow check into every runtime function would hurt efficiency, which is why the runtime switches to the g0 stack instead.

Summary

This article has mainly analyzed Golang’s GPM model. While reading the runtime code we found many details that would take a lot of time to analyze, so this article gives only a simple explanation of the general framework and touches on some details in passing. Later articles will analyze many of these details further.