01 Local cache for P: mcache

// runtime/malloc.go

numSpanClasses = _NumSizeClasses << 1
_NumStackOrders = 4 - sys.PtrSize/4*sys.GoosWindows - 1*sys.GoosPlan9

// runtime/mcache.go

//go:notinheap
type mcache struct {
	nextSample uintptr // trigger heap sample after allocating this many bytes
	scanAlloc  uintptr // bytes of scannable heap allocated
	tiny       uintptr // base address of the current tiny block
	tinyoffset uintptr // offset of the next free byte in the tiny block
	tinyAllocs uintptr // number of tiny allocations performed by this P
	alloc      [numSpanClasses]*mspan         // spans to allocate from, indexed by spanClass
	stackcache [_NumStackOrders]stackfreelist // cached free stack segments, per order
	flushGen   uint32                         // sweepgen at the time of the last flush
}

type stackfreelist struct {
    list gclinkptr // linked list of free stacks
    size uintptr   // total size of stacks in list
}
  • nextSample triggers a heap profiling sample after this many bytes have been allocated.
  • mcache lives outside the GC-managed heap (//go:notinheap), so it must be released manually with mcache.releaseAll().
  • tiny is a heap pointer to the block used for tiny-object allocation.
  • tinyAllocs is the number of tiny-object allocations performed by the owning P.
  • tinyoffset is the offset within the tiny block at which the next tiny object will be placed.
  • alloc is the array of mspans of the different small size classes. There are _NumSizeClasses = 68, which means there are 68 different sizes of mspan. The reason numSpanClasses = _NumSizeClasses << 1 is that each size class comes in two variants, scan and noscan: an object that contains no pointers is allocated from a noscan span, and otherwise from a scan span, because spans holding no pointers do not need to be scanned by the GC.
// runtime/sizeclasses.go

_NumSizeClasses = 68

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
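The scan/noscan split can be made concrete with the spanClass encoding itself. The sketch below mirrors makeSpanClass and its accessors from runtime/mheap.go: the size class sits in the high 7 bits and the noscan flag in the low bit.

```go
package main

import "fmt"

// spanClass packs a size class and a noscan bit into a single byte,
// mirroring makeSpanClass in runtime/mheap.go: class<<1 | noscan.
type spanClass uint8

func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass) << 1
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }
func (sc spanClass) noscan() bool     { return sc&1 != 0 }

func main() {
	sc := makeSpanClass(5, true) // size class 5 (48B objects), pointer-free
	fmt.Println(sc, sc.sizeclass(), sc.noscan()) // 11 5 true
}
```

This is why the alloc array needs numSpanClasses = _NumSizeClasses << 1 entries: every size class occupies two consecutive slots, scan and noscan.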

stackcache caches free stack segments in _NumStackOrders different size orders. The fixed stack size and the number of orders vary from OS to OS:

//   OS               | FixedStack | NumStackOrders
//   -----------------+------------+---------------
//   linux/darwin/bsd | 2KB        | 4
//   windows/32       | 4KB        | 3
//   windows/64       | 8KB        | 2
//   plan9            | 4KB        | 3
  • flushGen records the sweep generation at which this mcache was last flushed; if it falls behind the heap's current sweepgen, the cached spans are stale and the mcache must be flushed before use.
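The per-OS values in the table above fall directly out of the _NumStackOrders formula shown earlier. The sketch below simply evaluates that formula, with the platform flags passed in as parameters (1 on that platform, 0 elsewhere):

```go
package main

import "fmt"

// numStackOrders evaluates the formula from runtime/malloc.go:
//   _NumStackOrders = 4 - sys.PtrSize/4*sys.GoosWindows - 1*sys.GoosPlan9
// goosWindows and goosPlan9 are 1 on those platforms and 0 elsewhere.
func numStackOrders(ptrSize, goosWindows, goosPlan9 uintptr) uintptr {
	return 4 - ptrSize/4*goosWindows - 1*goosPlan9
}

func main() {
	fmt.Println(numStackOrders(8, 0, 0)) // linux/darwin/bsd 64-bit: 4
	fmt.Println(numStackOrders(4, 1, 0)) // windows/32: 3
	fmt.Println(numStackOrders(8, 1, 0)) // windows/64: 2
	fmt.Println(numStackOrders(4, 0, 1)) // plan9/32: 3
}
```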

In summary, the overall memory allocation model for P is as follows.

Each P has its own cached memory-management object, mcache. Each mcache holds doubly linked lists of mspans in the different size classes, as well as free stack segments of different sizes in stackcache. The memory cached in an mcache is owned exclusively by the current P, so requests against it need no locking, which makes it very efficient.

02 Small object storage structure: mspan

// runtime/mheap.go

//go:notinheap
type mspan struct {
	next           *mspan     // next span in list, or nil if none
	prev           *mspan     // previous span in list, or nil if none
	list           *mSpanList // For debugging. TODO: Remove.
	startAddr      uintptr    // address of first byte of span
	npages         uintptr    // number of pages in span
	manualFreeList gclinkptr  // list of free objects in mSpanManual spans

	freeindex  uintptr // slot index at which to begin the next free-object scan
	nelems     uintptr // number of object slots in the span
	allocCache uint64  // cached complement of allocBits at freeindex
	allocBits  *gcBits // allocation bitmap
	gcmarkBits *gcBits // GC mark bitmap
	sweepgen    uint32
	divMul      uint32        // for divide by elemsize
	allocCount  uint16        // number of allocated objects
	spanclass   spanClass     // size class and noscan (uint8)
	state       mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
	needzero    uint8         // needs to be zeroed before allocation
	elemsize    uintptr       // computed from sizeclass or from npages
	limit       uintptr       // end of data in span
	speciallock mutex         // guards specials list
	specials    *special      // linked list of special records sorted by offset.
}

The mspan object has many fields. next and prev are the pointers that link the span into a doubly linked list.

  • startAddr is the starting memory address of the span.

  • npages is the size of the mspan measured in pages. A page is the smallest unit of memory that Go requests from the operating system, currently 8KB.

  • spanclass identifies the size class (and scan/noscan variant) of the mspan.

  • allocBits is the allocation bitmap of the mspan, recording which object slots have been allocated and which are free.

  • gcmarkBits is the GC mark bitmap, recording which objects in the mspan are still reachable; after a sweep it becomes the new allocBits.

  • freeindex lies between 0 and nelems; every slot before freeindex is known to be allocated, so the search for a free slot starts there. The rule for each allocation is as follows.

    Starting from n = freeindex, inspect allocBits to see whether there is free capacity: compute allocBits[n/8] & (1 << (n % 8)). If the result is 0, slot n is free and can be allocated; otherwise try the next slot. If no clear bit is found before nelems, the current span has no resources left.

  • nelems is the number of object slots in the mspan.
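The freeindex/allocBits rule above can be sketched as a linear scan. The runtime accelerates this with the 64-bit allocCache, but the simplified version below shows the bit test itself:

```go
package main

import "fmt"

// nextFree scans allocBits starting at freeindex and returns the index of
// the first free object slot, or nelems if the span is full. A set bit
// means the slot is allocated; a clear bit means it is free.
func nextFree(allocBits []uint8, freeindex, nelems uintptr) uintptr {
	for n := freeindex; n < nelems; n++ {
		if allocBits[n/8]&(1<<(n%8)) == 0 {
			return n // bit clear: slot n is free
		}
	}
	return nelems // no free slot left in this span
}

func main() {
	// 16 slots; bits 0-4 and 6 are already allocated.
	bits := []uint8{0b0101_1111, 0b0000_0000}
	fmt.Println(nextFree(bits, 0, 16)) // 5
	fmt.Println(nextFree(bits, 6, 16)) // 7
}
```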

03 Global memory cache mcentral

// runtime/mcentral.go

//go:notinheap
type mcentral struct {
    spanclass spanClass  // the single size class this mcentral manages
    partial   [2]spanSet // spans with a free object slot (swept / unswept)
    full      [2]spanSet // spans with no free object slots (swept / unswept)
}

type spanSet struct {
    spineLock mutex
	spine     unsafe.Pointer
	spineLen  uintptr       
	spineCap  uintptr
    index headTailIndex
}

mcentral exists so that allocation can proceed even when a P's local mcache is exhausted. It is a centralized allocation point: every P can request memory from mcentral, which involves data contention and requires locking, so it is less efficient than mcache.

  • mcentral, like mcache.alloc, is organized by specification: one mcentral maintains spans of only one size, identified by spanclass.
  • partial holds the sets of spans that still have free object slots.
  • full holds the sets of spans that have no free object slots. The two entries of each array separate swept spans from unswept ones.

In general, the structure of mcentral is as above. Each mcentral object maintains mspans of only one size class. When a P's mcache does not have enough memory, it requests a span of the needed class from the corresponding mcentral, and this step takes a lock. The mcentral first tries to hand out a span from the partial sets (spans with free slots); failing that, it sweeps spans from the full sets in the hope of freeing up slots; if that also fails, it grows by requesting fresh memory from mheap.
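The fallback order can be illustrated with a small simulation. The types and method names below (spanSetSim, mcentralSim, cacheSpan) are illustrative stand-ins, not the runtime's actual API; only the priority order they encode comes from the description above:

```go
package main

import "fmt"

// spanSetSim is a toy stand-in for runtime's spanSet: a pool of spans,
// here represented by name only.
type spanSetSim struct{ spans []string }

func (s *spanSetSim) pop() (string, bool) {
	if len(s.spans) == 0 {
		return "", false
	}
	sp := s.spans[len(s.spans)-1]
	s.spans = s.spans[:len(s.spans)-1]
	return sp, true
}

// mcentralSim models one mcentral with a partial set and a full set.
type mcentralSim struct {
	partial spanSetSim // spans with free slots
	full    spanSetSim // full spans that may yield slots after sweeping
}

// cacheSpan mirrors the priority: partial first, then full (swept on the
// way out), and finally growing from mheap.
func (c *mcentralSim) cacheSpan() string {
	if sp, ok := c.partial.pop(); ok {
		return sp
	}
	if sp, ok := c.full.pop(); ok {
		return sp + " (swept)"
	}
	return "grow from mheap"
}

func main() {
	c := &mcentralSim{}
	fmt.Println(c.cacheSpan()) // grow from mheap
	c.partial.spans = []string{"span-A"}
	c.full.spans = []string{"span-B"}
	fmt.Println(c.cacheSpan()) // span-A
	fmt.Println(c.cacheSpan()) // span-B (swept)
}
```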

04 Heap memory: mheap

Neither mcache nor mcentral can maintain memory blocks larger than 32KB, so large objects are allocated directly from the heap; mcentral also replenishes its spans from mheap. Both of these paths still require locking to protect the shared data.

//  runtime/mheap.go

//go:notinheap
type mheap struct {
    lock  mutex
	pages pageAlloc // page allocation data structure

	sweepgen     uint32 // sweep generation, see comment in mspan; written during STW
	sweepDrained uint32 // all spans are swept or are being swept
	sweepers     uint32 // number of active sweepone calls

    allspans []*mspan // all spans out there
    _ uint32 // align uint64 fields on 32-bit for atomics
    
    pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically
	pagesSwept         uint64  // pages swept this cycle; updated atomically
	pagesSweptBasis    uint64  
	sweepHeapLiveBasis uint64  
	sweepPagesPerByte  float64
    
    ...
}

//go:notinheap
type heapArena struct {
    bitmap       [heapArenaBitmapBytes]byte // pointer/scalar bitmap for the arena
    spans        [pagesPerArena]*mspan      // maps each page in the arena to its mspan
    pageInUse    [pagesPerArena / 8]uint8   // bitmap of pages holding in-use spans
    pageMarks    [pagesPerArena / 8]uint8   // bitmap of pages holding marked objects
    pageSpecials [pagesPerArena / 8]uint8   // bitmap of pages holding specials
    checkmarks   *checkmarksMap             // debug.gccheckmark state
    zeroedBase   uintptr                    // offset below which memory is known zeroed
}
  • A heapArena manages memory at a granularity of 64MB (on 64-bit OSes).
  • mheap manages memory at a minimum granularity of one page (1 page = 8KB).

05 Summary

  1. The heap in Go is composed of memory blocks, represented in the runtime as heapArena; each block is 64MB in size (on 64-bit) and consists of many pages, with one page occupying 8KB.
  2. mheap is the management object for these memory blocks and manages them at page granularity.
  3. mcentral is a further layer of management over the pages in mheap; because it is global, access requires locking. Each mspan specification corresponds to one mcentral. Each mcentral keeps two groups of mspan lists: partial, for spans that still have free slots, and full, for spans that do not.
  4. Each P maintains its own cache, mcache. The mcache holds mspans of the 68 size classes in an array; each size class occupies two array slots, one for spans of objects that contain pointers (scan) and one for pointer-free objects (noscan). mcache itself is exclusive to its P, so no locking is required.
  5. When a program requests memory (via make, new, etc.), the P that the current g is running on first tries its own mcache; if there is not enough space there, it requests a span from mcentral, and if that also fails, it goes on to mheap.
  6. Since mspan size classes only cover 1B up to 32KB, objects larger than 32KB are allocated directly from mheap.
  7. Go's memory allocation uses multi-level caching to reduce lock granularity and allocate more efficiently. When objects are reclaimed, their memory is not returned to the OS immediately but put back into the pre-allocated blocks, and is only released to the OS when too much memory sits idle.
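Point 5 and 6 hinge on rounding a requested size up to a size class. Using the class_to_size table shown earlier, a simplified lookup (the runtime uses precomputed index tables rather than this linear scan) looks like:

```go
package main

import "fmt"

// class_to_size from runtime/sizeclasses.go: object size of each size class.
var class_to_size = [68]uint16{0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}

// roundToClass returns the smallest size class whose object size fits the
// request, or -1 for large objects (>32KB) that bypass mcache/mcentral and
// go straight to mheap. Simplified linear scan for illustration.
func roundToClass(size uintptr) int {
	if size > 32768 {
		return -1 // allocated directly from mheap
	}
	for c := 1; c < len(class_to_size); c++ {
		if uintptr(class_to_size[c]) >= size {
			return c
		}
	}
	return -1
}

func main() {
	c := roundToClass(33)
	fmt.Println(c, class_to_size[c]) // a 33-byte request rounds up to the 48-byte class
}
```

The gap between the requested size and the class size is the internal fragmentation the size-class design trades for fast, lock-free allocation from mcache.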