Golang's memory allocation

The Go language has a built-in runtime (that is, runtime) that abandons the traditional way of allocating memory in favor of autonomous management. This allows for autonomous implementation of better memory usage patterns, such as memory pooling, pre-allocation, etc. This way, you don’t need to make a system call for every memory allocation.

The Golang runtime memory allocation algorithm is derived from Google’s TCMalloc algorithm for C. The core idea is to reduce the granularity of locks by dividing memory into multiple levels of management. It manages the available heap memory in a two-level allocation: each thread maintains a separate memory pool of its own and allocates memory from that pool first, and only applies to the global memory pool when the pool is insufficient to avoid frequent competition between different threads for the global memory pool.

Basic Concepts

When Go starts a program, it first requests a block of memory from the operating system (note that this is still just a virtual address space and does not actually allocate memory), cuts it into small chunks and then manages it itself.

The requested memory block is allocated three areas, 512MB, 16GB, and 512GB in size on the X64.

memory areas

The arena is what we call the heap area. Go dynamically allocates memory in this area, which divides the memory into 8KB sized pages, some of which are combined and called mspan.

The bitmap area identifies which addresses in the arena area hold objects, and uses the 4bit flag bit to indicate whether the object contains pointer, GC tag information. One byte size of memory in bitmap corresponds to 4 pointer size (8B pointer size) of memory in the arena area, so the size of bitmap area is 512GB/(4*8B)=16GB.

bitmap area

From the above figure, you can actually see that the high address part of the bitmap points to the low address part of the arena area, which means that the address of the bitmap grows from the high address to the low address.

The spans area holds the pointers to the mspan (that is, the basic unit of memory management combined with some pages of the arena partition, which will be discussed later), and each pointer corresponds to a page, so the size of the spans area is 512GB/8KB*8B=512MB. Dividing by 8KB is to calculate the number of pages in the arena area, and multiplying by 8 is to calculate the size of all the pointers in the spans area. When creating an mspan, the corresponding spans area is filled by page, and when recycling the object, the mspan it belongs to can be easily found based on its address.

Memory management unit

mspan: The basic memory management unit in Go, a large block of memory consisting of a contiguous 8KB page. Note that a page here is not the same thing as an OS page, which is typically several times the size of an OS page. In a nutshell: mspan is a Doubly linked list containing the starting address, the mspan specification, the number of pages, etc.

Each mspan is divided into several objects according to the size of its own property Size Class, and each object can store one object. A bitmap is used to mark the objects that are not yet used. The property Size Class determines the size of the object, and mspan will only be assigned to objects that are close to the size of the object, but, of course, smaller than the size of the object. There is another concept: Span Class, which has a similar meaning to Size Class.

`1`	`Size_Class = Span_Class / 2`

This is because there are actually two mspans per Size Class, which means there are two Span Classs. One of them is allocated to objects that contain pointers, and the other one is allocated to objects that do not contain pointers. This will bring benefits to the garbage collection mechanism, which will be discussed in a later article.

As shown below, mspan consists of a set of consecutive pages divided into objects of a certain size.

mspan

Go1.9.2 mspan of Size Class has 67 kinds, each mspan partitioned object size is a multiple of 8*2n, this is written dead in the code.

// path: /usr/local/go/src/runtime/sizeclasses.go

const _NumSizeClasses = 67

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536,1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

According to the Size Class of mspan, you can get the size of the object it divides. For example, if Size Class is equal to 3, object size is 32B. An object of size 32B can store objects in the size range of 17B to 32B. For tiny objects (less than 16B), the allocator will merge them and allocate several objects into the same object.

The maximum number in the array is 32768, that is, 32KB, more than this size is a large object, it will be treated specially, this will be introduced later. By the way, a type Size Class of 0 indicates a large object, which is actually allocated directly from heap memory, while small objects are allocated through mspan.

For mspan, its Size Class determines the number of pages it can be allocated, which is also written dead in the code.

// path: /usr/local/go/src/runtime/sizeclasses.go

const _NumSizeClasses = 67

var class_to_allocnpages = [_NumSizeClasses]uint8{0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 3, 2, 3, 4, 5, 6, 1, 7, 6, 5, 4, 3, 5, 7, 2, 9, 7, 5, 8, 3, 10, 7, 4}

For example, when we want to request a mspan of object size 32B, the corresponding index in class_to_size is 3, and the index 3 corresponds to the number of pages in the class_to_allocnpages array is 1.

mspan structure definition.

// path: /usr/local/go/src/runtime/mheap.go

type mspan struct {
    //linked list 前向指针，用于将span链接起来
    next *mspan 
    //linked list 前向指针，用于将span链接起来
    prev *mspan 
    // 起始地址，也即所管理页的地址
    startAddr uintptr 
    // 管理的页数
    npages uintptr 
    // 块个数，表示有多少个块可供分配
    nelems uintptr 

    //分配位图，每一位代表一个块是否已分配
    allocBits *gcBits 

    // 已分配块的个数
    allocCount uint16 
    // class表中的class ID，和Size Classs相关
    spanclass spanClass  

    // class表中的对象大小，也即块大小
    elemsize uintptr 
}

Let’s put mspan into a larger perspective.

mspan

The above figure shows that there are two Ss pointing to the same mspan, because the P to which these two Ss point belongs to the same mspan. So, the S pointing to it can be quickly found by the address on the arena, and the mspan can be found by the S, recalling what we said earlier about each pointer to the mspan area corresponding to a page.

Assuming that the Size Class of the first mspan on the far left is equal to 10, according to the class_to_size array, the object size of this msapn partition is 144B, and the number of objects that can be allocated is 8KB/144B=56.89, rounded up to 56, so there will be some memory wasted. Go’s source code has the size of all mspan wasted memory of Size Class; then according to the class_to_allocnpages array, we get this mspan consists of only 1 page; suppose this mspan is allocated to pointerless objects, then spanClass is equal to 20.

startAddr points directly to a location in the arena region, indicating the starting address of this mspan, allocBits points to a bitmap, each representing whether a block has been allocated or not, and allocCount indicates the total number of objects allocated.

Thus, the parameters of each field of the first mspan from the left are as follows.

mspan

Memory Management Components

Memory allocation is done by the memory allocator. The allocator consists of 3 components: mcache , mcentral , mheap .

mcache

mcache : Each worker thread is bound to an mcache, which locally caches available mspan resources so that they can be allocated directly to Goroutines, as there is no competition from multiple Goroutines, so no lock resources are consumed.

Structure definition of mcache.

//path: /usr/local/go/src/runtime/mcache.go

type mcache struct {
    alloc [numSpanClasses]*mspan
}

numSpanClasses = _NumSizeClasses << 1

mcache uses Span Classes as an index to manage multiple mspan for allocation, and it contains all sizes of mspan. It is twice as large as _NumSizeClasses, which is 67*2=134. Why there is a twofold relationship, as we mentioned earlier: to speed up memory reclamation afterwards, half of the objects allocated in mspan in the array do not contain pointers, and the other half contain pointers.

The mspan of a pointerless object is garbage collected without further scanning for references to other active objects. More on this in a later garbage collection article, but this time we’ll stop here.

mspan

mcache is initialized without any mspan resources and is dynamically requested from mcentral during use and cached afterwards. When the object is less than or equal to 32KB in size, mcache is allocated using the corresponding size of mspan.

mcentral

mcentral: Provides a sliced mspan resource for all mcache. Each central holds a global mspan list of a specific size, both allocated and unallocated. Each mcentral corresponds to one type of mspan, and the type of mspan causes it to divide objects of different sizes. When there is no suitable (i.e., specific size) mspan in the worker thread’s mcache, it is fetched from mcentral.

mcentral is shared by all worker threads, and there is competition from multiple Goroutines, so it consumes lock resources. Structure definition.

//path: /usr/local/go/src/runtime/mcentral.go

type mcentral struct {
    // 互斥锁
    lock mutex 
    // 规格
    sizeclass int32 
    // 尚有空闲object的mspan linked list
    nonempty mSpanList 
    // 没有空闲object的mspan linked list，或者是已被mcache取走的msapn linked list
    empty mSpanList 
    // 已累计分配的对象个数
    nmalloc uint64 
}

mcentral

empty means that all the mspan in this linked list are either allocated object or mspan that have already been taken by cache, and this mspan is exclusive to that worker thread. And nonempty indicates a list of mspans that have free objects. Each central structure is maintained in mheap.

To briefly explain the flow of mcache fetching and returning mspan from mcentral.

Get Locks; finds an available mspan from the nonempty linked list; removes it from the nonempty linked list; adds the retrieved mspan to the empty linked list; returns the mspan to the worker thread; unlocks it.
Return lock; remove mspan from the empty linked list; add mspan to the nonempty linked list; unlock.

mheap

mheap: Represents all the heap space held by the Go program, which uses a global object _mheap of mheap to manage heap memory.

When mcentral does not have a free mspan, it requests it from mheap. And when mheap runs out of resources, it requests new memory from the OS. mheap is mainly used for memory allocation for large objects and for managing uncut mspan, which is used to cut mcentral into smaller objects.

Also we see that mheap contains mcentral of all sizes, so when an mcache requests mspan from mcentral, it only needs to use a lock in the separate mcentral and does not affect requests for mspan of other sizes.

mheap structure definition.

//path: /usr/local/go/src/runtime/mheap.go

type mheap struct {
    lock mutex
    // spans: 指向mspans区域，用于映射mspan和page的关系
    spans []*mspan 
    // 指向bitmap首地址，bitmap是从高地址向低地址增长的
    bitmap uintptr 

    // 指示arena区首地址
    arena_start uintptr 
    // 指示arena区已使用地址位置
    arena_used  uintptr 
    // 指示arena区末地址
    arena_end   uintptr 

    central [67*2]struct {
        mcentral mcentral
        pad [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
    }
}

mheap

Above we see that the bitmap and arena_start point to the same address. This is because the address of the bitmap grows from high to low, so they point to the same memory location.

Allocation process

Whether a variable is allocated on the stack or on the heap is determined by the result of the escape analysis. Usually, the compiler prefers to allocate variables on the stack because it has less overhead, and the most extreme is “zero garbage”, where all variables are allocated on the stack so that there is no memory fragmentation, garbage collection, or anything like that.

Go’s memory allocator allocates objects in three categories based on their size: small objects (less than or equal to 16B), general objects (greater than 16B, less than or equal to 32KB), and large objects (greater than 32KB).

The general allocation process.

32KB objects are allocated directly from the mheap.
<=16B objects allocated using the tiny allocator of mcache.
objects of [16B,32KB], first calculate the specification size of the object, and then allocate it using the mspan of the corresponding specification size in the mcache.
If mcache does not have mspan of the corresponding specification size, apply to mcentral
If mcentral does not have an mspan of the corresponding specification size, apply to mheap
If there is no mspan of the appropriate size in mheap either, apply to the operating system

Summary

Go requests a large chunk of memory from the operating system at program startup and manages it on its own afterwards.
The basic unit of Go memory management is the mspan, which consists of several pages, each mspan can allocate a specific size of object.
mcache, mcentral, and mheap are the three major components of Go memory management in a hierarchy. mcache manages the mspan that threads cache locally; mcentral manages the global mspan for use by all threads; and mheap manages all of Go’s dynamically allocated memory.
Very small objects are allocated in an object to save resources and use tiny allocator to allocate memory; generally small objects are allocated memory by mspan; and large objects are allocated memory directly by mheap.

Table of Contents

Basic Concepts

Memory management unit

Memory Management Components

mcache

mcentral

mheap

Allocation process

Summary