References are an important feature that C++ adds on top of C. They make the syntax much more concise in many places, but how are they actually implemented under the hood?

In Wikipedia, pointers are described as follows.

In computer science, a pointer is a programming language object that stores the memory address of another value located in computer memory. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer.

As the definition shows, a pointer is essentially a variable that stores the address of another variable. Pointers in C are very flexible: a pointer can hold any address, whether or not that address is valid, and whether or not the data stored there is of the type the pointer declares.

It is not hard to imagine how a pointer is implemented: it is itself a variable in memory, and it holds the address of another variable.

In Wikipedia, references are described as follows.

In computer science, a reference is a value that enables a program to indirectly access a particular datum, such as a variable’s value or a record, in the computer’s memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference.

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object (similar to rename keyword in Ada).

As the definition above suggests, a C++ reference can, in a narrow sense, be thought of as an alias for a variable, one that need not exist as an object in its own right.

Based on that statement, I believed for a while that a reference was just compile-time black magic in the C++ compiler: it would resolve the two symbols to the same object, so that after compilation the reference would simply be the original variable (a value in a register or on the stack) with no storage of its own.

However, the facts smacked me in the face.

Let’s check with the following program.

int main()
{
    int a = 0;
    int *pa = &a;
    int &ra = a;
    ++(*pa);
    ++ra;
}

The program declares a variable a, then a pointer pa that holds the address of a and a reference ra bound to a. Finally, it increments a twice, once through the pointer and once through the reference.

Next, compile the program to assembly with gcc -S test.cpp -o test.s -O0 to see exactly how these operations are implemented. (macOS environment, LLVM 10.0.1; as the listing shows, the gcc command here produces LLVM output.)

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 14	sdk_version 10, 14
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	; save the caller's frame pointer
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	; set up the new frame pointer from the stack pointer
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	; set the return value to 0
	xorl	%eax, %eax
	; int a = 0
	movl	$0, -4(%rbp)
	; t0 = &a
	leaq	-4(%rbp), %rcx
	; int *pa = t0
	movq	%rcx, -16(%rbp)
	; int &ra = t0
	movq	%rcx, -24(%rbp)
	; t0 = pa
	movq	-16(%rbp), %rcx
	; t1 = *t0 = *pa
	movl	(%rcx), %edx
	; ++t1
	addl	$1, %edx
	; *t0 = *pa = t1
	movl	%edx, (%rcx)
	; t0 = &ra = &a
	movq	-24(%rbp), %rcx
	; t1 = *t0 = *(&ra) = ra = a
	movl	(%rcx), %edx
	; ++t1
	addl	$1, %edx
	; *t0 = *(&ra) = ra = a = t1
	movl	%edx, (%rcx)
	; restore the caller's frame pointer
	popq	%rbp
	; return
	retq
	.cfi_endproc
                                        ## -- End function

.subsections_via_symbols

I’ve annotated the key instructions in the assembly. The variable a, the pointer pa, and the reference ra all live on the stack, at offsets -4, -16, and -24 from %rbp. Note that the reference does not simply reuse a's slot at -4(%rbp). Like the pointer, it gets a slot of its own, and the address of a computed by leaq is written into it.

When the increment is performed, the pointer and the reference are used in exactly the same way: the stored address is loaded into a register, the value at that address is read, incremented, and written back. The only difference is the stack slot the address is loaded from.

This result surprised me: the compiler actually translated the operations on the reference into operations on a pointer.

In the end, it turns out that modern compilers are still very smart. If you raise the optimization level, the compiler removes all the intermediate computation in main, because the result is never observed and the work is therefore unnecessary. If the code is moved from main into another function that returns a, the compiler can no longer discard the computation, but it still does its best: it evaluates everything at compile time and returns the result directly (movl $2, %eax).

Further experiments

After the article was posted, some readers suspected that this might be behavior specific to the gcc command on macOS rather than something universal, so I repeated the experiment on Linux and Windows.

The assembly obtained by compiling with the same command on Linux (Ubuntu 18.04, gcc 7.5.0) is shown below.

	.file	"test_ref.cpp"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	; save the caller's frame pointer
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	; set up the new frame pointer from the stack pointer
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	; clear %eax after storing the stack canary (the return value is set later)
	xorl	%eax, %eax
	; int a = 0
	movl	$0, -28(%rbp)
	; t0 = &a
	leaq	-28(%rbp), %rax
	; int *pa = t0
	movq	%rax, -24(%rbp)
	; t0 = &a
	leaq	-28(%rbp), %rax
	; int &ra = t0
	movq	%rax, -16(%rbp)
	; t0 = pa
	movq	-24(%rbp), %rax
	; t1 = *t0 = *pa
	movl	(%rax), %eax
	; t2 = *t0 + 1
	leal	1(%rax), %edx
	; t0 = pa
	movq	-24(%rbp), %rax
	; *t0 = *pa = t2 = *t0 + 1
	movl	%edx, (%rax)
	; t0 = ra
	movq	-16(%rbp), %rax
	; t1 = *t0 = *(&ra) = ra = a
	movl	(%rax), %eax
	; t2 = *t0 + 1
	leal	1(%rax), %edx
	; t0 = ra
	movq	-16(%rbp), %rax
	; *t0 = *ra = t2 = *t0 + 1
	movl	%edx, (%rax)
	; set the return value to 0
	movl	$0, %eax
	movq	-8(%rbp), %rcx
	xorq	%fs:40, %rcx
	je	.L3
	call	__stack_chk_fail@PLT
.L3:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

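You can see that the result is essentially the same as on macOS: a, pa, and ra each get their own stack slot (at -28, -24, and -16 from %rbp), and the slot for the reference holds the address of a. The main addition is the stack-protector code (the %fs:40 loads and the call to __stack_chk_fail) that this gcc build emits around the function body.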

On Windows (Windows 10, VS2019, cl version 19.23.28106.4), compiling the source with cl /Od /FA .\test_ref.cpp produces the following assembly.

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.23.28106.4 

	TITLE	C:\Users\jason\test\test_ref.cpp
	.686P
	.XMM
	include listing.inc
	.model	flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC	_main
; Function compile flags: /Odtp
_TEXT	SEGMENT
_ra$ = -12						; size = 4
_pa$ = -8						; size = 4
_a$ = -4						; size = 4
_main	PROC
; File C:\Users\jason\test\test_ref.cpp
; Line 2
	push	ebp
	mov	ebp, esp
	sub	esp, 12					; 0000000cH
; Line 3
	mov	DWORD PTR _a$[ebp], 0
; Line 4
	lea	eax, DWORD PTR _a$[ebp]
	mov	DWORD PTR _pa$[ebp], eax
; Line 5
	lea	ecx, DWORD PTR _a$[ebp]
	mov	DWORD PTR _ra$[ebp], ecx
; Line 6
	mov	edx, DWORD PTR _pa$[ebp]
	mov	eax, DWORD PTR [edx]
	add	eax, 1
	mov	ecx, DWORD PTR _pa$[ebp]
	mov	DWORD PTR [ecx], eax
; Line 7
	mov	edx, DWORD PTR _ra$[ebp]
	mov	eax, DWORD PTR [edx]
	add	eax, 1
	mov	ecx, DWORD PTR _ra$[ebp]
	mov	DWORD PTR [ecx], eax
; Line 8
	xor	eax, eax
	mov	esp, ebp
	pop	ebp
	ret	0
_main	ENDP
_TEXT	ENDS
END

The cl compiler's assembly listing uses a different syntax from gcc's, but the meaning is similar, and the "; Line N" markers make it easy to match each group of instructions to its source line. As you can see, cl takes the same approach as the other two compilers: _ra$ gets its own stack slot, which is filled with the address of a.

To take it a step further, here is a comment from user “XZiar” on Zhihu, whose comment gives more insight into the mechanism.

It doesn’t really mean “interpreting references as pointers”, does it?

At the machine code level, there are no pointers either, only addresses (pointers actually also imply type information). The concept of variables also does not exist, there is only “unformatted data” that is just manipulated by instructions with formatting.

So you see that references and pointers have the same effect because at the machine code level, there is no extra information to indicate the difference between them.

And at the language level, references can indeed be understood as const pointers

In addition, XZiar gives a more in-depth explanation of why the reference gets a copy of the address in the assembly code.

It is also normal for the address to be copied into the reference: the compiler really cannot always work out at compile time exactly what a reference refers to. Consider the following code.

int a = 0, b = 1;
int& c = flag ? a : b;

A reference cannot be reseated only because it behaves as if it were const; what it refers to can still be decided at runtime.

At this point, the exploration of how pointers and references are implemented underneath is basically over. As you can see, with compiler optimizations disabled, mainstream compilers implement C++ references as “const pointers”.

But what happens when compiler optimizations are enabled? On macOS, I changed the program to return a (so that the compiler cannot conclude that nothing is observable and drop all the work) and raised the optimization level to -O1 and then -O2. Both levels produce the same result, shown below.

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15	sdk_version 10, 15, 4
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	movl	$2, %eax
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function

You can see that the optimized assembly omits everything related to pointers, references, and memory accesses, and simply sets the return value to 2.

This shows that the compiler's job is to translate code written in the source language into reasonable assembly, and any assembly will do as long as it behaves the way the source code intends. Machine code can express only a limited number of concepts (basically, operations on registers and memory), while high-level languages can express a wide variety of them, so the compiler has to map, or translate, the complex concepts of the high-level language onto the simple concepts of machine code. For C++ pointers and references, the mainstream C++ compilers have chosen to map both to addresses in machine code, discarding the type information they carry.