Many systems have daemons that monitor the running state of the system in the background and respond to unexpected situations as they arise. The system monitor is an important component of the Go runtime: it checks the runtime at regular intervals to ensure that the program has not entered an abnormal state. This section describes the design and implementation of the Go system monitor, including its startup, execution process, and main responsibilities.
In operating systems that support multitasking, daemons are computer programs that run in the background and are not operated directly by the user; they typically start automatically when the operating system starts. Both the DaemonSet in Kubernetes and the system monitor in Go use a similar design to provide general-purpose functionality.
Daemons are an effective design: they exist throughout the life of a system, starting when the system starts and ending when the system ends. In operating systems and Kubernetes, we often run database services, logging services, and monitoring services as daemons.
Go’s system monitor plays a similar role. It starts an internal loop that never exits. Inside that loop it polls the network, preempts Goroutines that have been running for a long time or are blocked in a system call, and triggers garbage collection. These actions keep the program running in a healthier state.
When a Go program starts, the runtime calls runtime.main in the first Goroutine to start the main program. On the system stack, runtime.main calls runtime.newm to create a new thread for the system monitor.
runtime.newm creates a new runtime.m structure that stores the function to be executed and the processor to bind. The runtime executes the system monitor without a processor, so the system monitor runs directly on the newly created thread.
runtime.newm1 calls the platform-specific runtime.newosproc, which creates a new thread (on Linux, via the clone system call) and executes runtime.mstart in it.
In the newly created thread, the runtime executes the runtime.sysmon function stored in runtime.m to start the system monitor.
When runtime.sysmon is first called, it checks for deadlocks via runtime.checkdead and then enters the core monitoring loop. At the beginning of each iteration, the system monitor suspends the current thread via usleep, whose argument is in microseconds. The runtime determines the sleep time by the following rules:
- The initial sleep time is 20 μs.
- The maximum sleep time is 10 ms.
- When the system monitor has not woken up any Goroutine for 50 consecutive cycles, the sleep time is doubled in each cycle.
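The three rules above can be sketched as a small function. This is an illustrative sketch rather than the runtime's own code: the name nextDelay and the idle counter parameter are hypothetical, and the sketch assumes that the counter is reset to zero whenever the monitor wakes something up.

```go
package main

import "fmt"

// nextDelay returns the sleep time, in microseconds, for the next cycle.
// idle is a hypothetical counter of consecutive cycles in which the
// monitor woke nothing up; delay is the sleep time of the previous cycle.
func nextDelay(idle int, delay uint32) uint32 {
	const (
		minDelay = 20        // initial sleep time: 20 μs
		maxDelay = 10 * 1000 // maximum sleep time: 10 ms
	)
	if idle == 0 {
		// Something was woken up recently: go back to the shortest sleep.
		delay = minDelay
	} else if idle > 50 {
		// More than 50 idle cycles in a row: double the sleep time.
		delay *= 2
	}
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay
}

func main() {
	delay := uint32(20)
	for idle := 0; idle <= 60; idle++ {
		delay = nextDelay(idle, delay)
		if idle >= 50 {
			fmt.Printf("idle=%d delay=%dμs\n", idle, delay)
		}
	}
}
```

With these rules the sleep time stays at 20 μs for the first 50 idle cycles and then doubles until it is capped at 10 ms.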
Once the program stabilizes, the system monitor’s sleep time settles at 10 ms. In addition to checking for deadlocks, it does the following in the loop:
- Run timers: get the next timer that needs to be triggered.
- Poll the network: get the ready file descriptors that need to be processed.
- Preempt processors: preempt Goroutines that have been running for a long time or are in a system call.
- Garbage collection: trigger garbage collection to reclaim memory when conditions are met.
The rest of this section describes in turn how the system monitor accomplishes each of these tasks.
Checking for deadlocks
The system monitor checks for deadlocks with runtime.checkdead. We can break the process of checking for deadlocks into three steps:
- Check for the existence of a running thread.
- Check for the existence of a running Goroutine.
- Check for the existence of a timer on the processors.
This function first computes the number of running threads in the Go runtime from several fields in the scheduler:
- runtime.mcount returns the number of threads present in the system, based on the next thread ID to be created and the number of threads released.
- nmidle is the number of idle threads.
- nmidlelocked is the number of locked threads that are waiting for work.
- nmsys is the number of system threads, which are not counted when checking for deadlocks.
Using the above thread-related data, we can get the number of running threads. If the number of threads is greater than 0, it means that there is no deadlock in the current program; if the number of threads is less than 0, it means that the state of the current program is inconsistent; if the number of threads is equal to 0, we need to further check the running state of the program.
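A minimal sketch of this calculation follows. The schedCounts type and the classify function are hypothetical stand-ins for the scheduler fields listed above; the sketch assumes the running-thread count is obtained by subtracting the three counters from the total.

```go
package main

import "fmt"

// schedCounts is a hypothetical stand-in for the scheduler counters.
type schedCounts struct {
	mcount       int32 // threads present in the system
	nmidle       int32 // idle threads
	nmidlelocked int32 // locked threads waiting for work
	nmsys        int32 // threads excluded from the deadlock check
}

// running returns the number of threads considered to be running.
func (s schedCounts) running() int32 {
	return s.mcount - s.nmidle - s.nmidlelocked - s.nmsys
}

// classify maps the running-thread count to the three cases in the text.
func classify(run int32) string {
	switch {
	case run > 0:
		return "no deadlock"
	case run < 0:
		return "inconsistent state"
	default:
		return "check goroutines and timers"
	}
}

func main() {
	s := schedCounts{mcount: 4, nmidle: 2, nmidlelocked: 1, nmsys: 1}
	fmt.Println(s.running(), classify(s.running()))
}
```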
- When any Goroutine is in the _Grunnable, _Grunning, or _Gsyscall state even though no thread is running, the runtime reports an error and aborts, because a Goroutine that could run is not being run.
- When all Goroutines are in the _Gidle, _Gdead, or _Gcopystack state, it means that the main program called runtime.Goexit.
When there are waiting Goroutines but no running Goroutine, we check the timers present on the processors.
If there are pending timers on a processor, it is reasonable for all Goroutines to be asleep; if there are no pending timers, the runtime simply reports an error and exits the program.
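The Goroutine and timer checks can be summarized in a short sketch. The gStatus type, its constants, and checkGoroutines are hypothetical simplifications introduced for illustration; the returned strings are descriptive labels, not a claim about the runtime's exact output.

```go
package main

import "fmt"

// gStatus is a hypothetical, simplified copy of the Goroutine states
// mentioned in the text.
type gStatus int

const (
	gIdle gStatus = iota
	gRunnable
	gRunning
	gSyscall
	gWaiting
	gDead
	gCopystack
)

// checkGoroutines is called only when no thread is running. It inspects
// the Goroutine states and the number of pending timers.
func checkGoroutines(states []gStatus, pendingTimers int) string {
	waiting := 0
	for _, s := range states {
		switch s {
		case gRunnable, gRunning, gSyscall:
			// A Goroutine could make progress, yet no thread is running.
			return "error: runnable goroutine but no running thread"
		case gWaiting:
			waiting++
		}
	}
	if waiting == 0 {
		return "error: no goroutines (main called runtime.Goexit)"
	}
	if pendingTimers > 0 {
		// A timer will wake somebody up later, so sleeping is fine.
		return "ok: goroutines sleep until a timer fires"
	}
	return "error: all goroutines are asleep"
}

func main() {
	fmt.Println(checkGoroutines([]gStatus{gWaiting, gDead}, 0))
	fmt.Println(checkGoroutines([]gStatus{gWaiting, gDead}, 1))
}
```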
Running timers
In the system monitor loop, we use runtime.nanotime and runtime.timeSleepUntil to get the current time and the time at which the next timer needs to fire. When the scheduler is waiting to perform garbage collection or when all processors are idle, the system monitor can temporarily go to sleep if no timer needs to be triggered.
The sleep duration is determined by the forced GC period forcegcperiod and the time at which the next timer fires. runtime.notetsleep uses a semaphore to put the system monitor to sleep with a timeout. When the system monitor is woken up, we recompute the current time and the next timer to be triggered, call runtime.noteclear to reset the wake-up notification, and reset the sleep interval.
If we then find that the next timer was due before the current time, all threads are probably busy running Goroutines. In that case, the system monitor starts a new thread to trigger the timer, which avoids a large deviation in the timer’s expiration time.
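The sleep calculation and the overdue-timer check can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: that forcegcperiod is two minutes, and that the monitor sleeps for at most half of it. Times are nanoseconds, as returned by a monotonic clock.

```go
package main

import "fmt"

// forcegcperiod is assumed here to be two minutes, in nanoseconds.
const forcegcperiod int64 = 2 * 60 * 1e9

// sleepDuration returns how long the monitor may sleep: half of the forced
// GC period (an assumption), or less if a timer is due sooner.
func sleepDuration(now, next int64) int64 {
	sleep := forcegcperiod / 2
	if next-now < sleep {
		sleep = next - now
	}
	return sleep
}

// timerOverdue reports whether the next timer should already have fired,
// which suggests that every thread is busy running Goroutines.
func timerOverdue(now, next int64) bool {
	return next < now
}

func main() {
	now := int64(1e9)
	fmt.Println(sleepDuration(now, now+5e9)) // timer due in 5s
	fmt.Println(timerOverdue(now, now-1))    // timer already late
}
```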
Polling the network
If more than 10 ms have passed since the network was last polled, the system monitor also polls the network inside the loop to check for ready file descriptors.
It calls runtime.netpoll in non-blocking mode to check for ready file descriptors and adds all Goroutines that are ready to the global run queue via runtime.injectglist.
runtime.injectglist switches the state of these Goroutines from _Gwaiting to _Grunnable and adds them to the global run queue to wait for execution. If there are idle processors in the current program, it also starts threads via runtime.startm to execute those tasks.
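The polling step can be modeled with a toy scheduler. Everything in this sketch (the goroutine and scheduler types, shouldPoll, inject) is hypothetical and only imitates the behaviour described above; it does not call the real network poller.

```go
package main

import "fmt"

const pollInterval = 10 * 1000 * 1000 // 10 ms in nanoseconds

// goroutine and scheduler are hypothetical, simplified models.
type goroutine struct {
	id     int
	status string
}

type scheduler struct {
	lastPoll   int64 // time of the last network poll, 0 if never polled
	globalRunq []*goroutine
	idleProcs  int
}

// shouldPoll reports whether more than 10 ms passed since the last poll.
func (s *scheduler) shouldPoll(now int64) bool {
	return s.lastPoll != 0 && s.lastPoll+pollInterval < now
}

// inject marks the ready Goroutines runnable, appends them to the global
// run queue, and returns how many threads would be started, one per idle
// processor, to run them.
func (s *scheduler) inject(ready []*goroutine) int {
	for _, g := range ready {
		g.status = "runnable"
		s.globalRunq = append(s.globalRunq, g)
	}
	started := len(ready)
	if started > s.idleProcs {
		started = s.idleProcs
	}
	s.idleProcs -= started
	return started
}

func main() {
	s := &scheduler{lastPoll: 1, idleProcs: 1}
	if s.shouldPoll(pollInterval + 2) {
		ready := []*goroutine{{id: 1, status: "waiting"}, {id: 2, status: "waiting"}}
		fmt.Println("threads started:", s.inject(ready))
		fmt.Println("global queue length:", len(s.globalRunq))
	}
}
```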
Preempting processors
The system monitor calls runtime.retake in the loop to preempt processors that are running Goroutines or are in a system call. This function iterates over all global processors at runtime, each of which stores a runtime.sysmontick.
The four fields in this structure store the number of times the processor was scheduled, the last time the processor was scheduled, the number of system calls, and the time of the last system call. The loop in runtime.retake contains two different kinds of preemption logic:
- When the processor is in the _Prunning or _Psyscall state, we preempt the current processor with runtime.preemptone if 10 ms have elapsed since the last scheduling event.
- When the processor is in the _Psyscall state, runtime.handoffp is called to give up the processor when either of the following conditions is met:
  - The processor’s run queue is not empty, or no idle processor exists.
  - The system call has lasted longer than 10 ms.
By preempting processors in this loop, the system monitor avoids the starvation that occurs when a single Goroutine occupies a thread for too long.
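The two kinds of preemption logic can be expressed as one decision function. This is a simplified sketch: the proc type and retakeDecision are hypothetical, the tick counters that detect whether a processor was rescheduled in the meantime are left out, and the two hand-off conditions are treated as alternatives (either one suffices), which is an assumption about how they combine.

```go
package main

import "fmt"

const forcePreemptNS = 10 * 1000 * 1000 // 10 ms in nanoseconds

type pStatus int

const (
	pIdle pStatus = iota
	pRunning
	pSyscall
)

// proc is a hypothetical, simplified view of a processor and its
// monitoring record.
type proc struct {
	status      pStatus
	schedWhen   int64 // time of the last scheduling event
	syscallWhen int64 // time the current system call was first observed
	runqLen     int   // length of the local run queue
}

// retakeDecision reports whether the processor should be preempted and
// whether it should be handed off. idleOrSpinning is the number of idle
// processors plus spinning threads available to pick up work.
func retakeDecision(p proc, now int64, idleOrSpinning int) (preempt, handoff bool) {
	if p.status != pRunning && p.status != pSyscall {
		return false, false
	}
	if p.schedWhen+forcePreemptNS <= now {
		preempt = true // same Goroutine for 10 ms or more
	}
	if p.status == pSyscall {
		hasWork := p.runqLen > 0 || idleOrSpinning == 0
		slowCall := p.syscallWhen+forcePreemptNS <= now
		handoff = hasWork || slowCall
	}
	return preempt, handoff
}

func main() {
	p := proc{status: pSyscall, runqLen: 2}
	fmt.Println(retakeDecision(p, 1000*1000, 1)) // 1 ms into a system call
}
```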
Garbage collection
Finally, the system monitor determines whether a forced garbage collection needs to be triggered. runtime.sysmon builds a runtime.gcTrigger and calls its runtime.gcTrigger.test method to decide whether garbage collection is needed.
If garbage collection needs to be triggered, we add the Goroutine used for forced garbage collection to the global run queue and let the scheduler choose an appropriate processor to execute it.
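A time-based trigger of this kind might look like the sketch below. The function timeTriggered is hypothetical; the two-minute value of forcegcperiod and the rule that no forced collection happens before a first collection has completed are assumptions that the text does not state.

```go
package main

import "fmt"

// forcegcperiod is assumed here to be two minutes, in nanoseconds.
const forcegcperiod int64 = 2 * 60 * 1e9

// timeTriggered reports whether a forced garbage collection is due.
// lastGC is the time of the previous collection (0 means none yet, an
// assumed convention) and now is the current time, both in nanoseconds.
func timeTriggered(lastGC, now int64) bool {
	if lastGC == 0 {
		return false
	}
	return now-lastGC > forcegcperiod
}

func main() {
	lastGC := int64(1e9)
	fmt.Println(timeTriggered(lastGC, lastGC+60*1e9))  // one minute later
	fmt.Println(timeTriggered(lastGC, lastGC+121*1e9)) // just over two minutes later
}
```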
Summary
The runtime’s system monitor triggers processor preemption, network polling, and garbage collection to keep the Go runtime available. It helps with the tail latency problem by reducing the starvation that Goroutines can suffer in the scheduler and by ensuring that timers fire as close to their scheduled time as possible.