This article examines function calling conventions in C. It pairs the generated assembly code with step-by-step snapshots of the stack to visualize what happens during a function call. Only the x86 and x86-64 architectures in a Linux/GCC environment are covered, but other environments follow the same general ideas and have to solve the same problems.

Preface

What is a calling convention?

A calling convention exists mainly to make code easier to share and subfunctions simpler to use. How are arguments and return values passed? How are stack frames created and destroyed? What are the caller and the callee each responsible for? The calling convention specifies all of this. As long as both the author of a function and its caller follow the convention, they can interact without error; otherwise the inconsistent state can lead to fatal errors in the program.

On 32-bit x86, C compilers usually default to the cdecl calling convention, which is the de facto standard for C on that platform. There are a number of other calling conventions, which the reader can learn about through the references at the end of this article.

Typical stack frame structure

[Figure: typical stack frame structure]

The figure above shows the structure of a typical stack frame during a call to a subfunction. The stack starts at a high address (drawn at the bottom) and grows toward lower addresses (drawn upward); each nested function call adds one stack frame on top. The return address can be seen as the dividing line between frames: everything above it belongs to the callee, and everything below it belongs to the caller.

ESP holds the current top-of-stack position and EBP holds the base address of the current stack frame. Adding fixed offsets to EBP gives easy access to parameters and local variables, and also allows the frame to be torn down quickly.

The calling convention has two parts: rules for the caller and rules for the callee. They are described separately below.

Caller rules

When initiating a subfunction call, the caller must:

  1. Save the values of certain registers before calling the subfunction. These registers are designated caller-saved, which means the called function is free to modify them. If the caller relies on their values after the subfunction returns, it must push them onto the stack before the call and pop them off again afterwards. The caller-saved registers are EAX, ECX and EDX [2].
  2. Push the subfunction's arguments onto the stack in right-to-left order, so that the first argument ends up at the top of the stack (the lowest address) [3].
  3. Use the call instruction to call the subfunction. This instruction pushes the return address (that is, the address of the next instruction in the current function) onto the stack and then jumps to the subfunction to start executing it.

We will look at the callee rules later. For now, assume the subfunction has returned, so the stack is back to the state it was in just before the call instruction executed. The caller can read the subfunction's return value from the EAX register. To fully restore the state from before the call, the caller also has to:

  1. Remove the arguments from the stack.
  2. Pop the caller-saved registers that were pushed earlier, in the reverse order of the pushes. The caller can assume that the other registers have not been modified by the subfunction.

Callee rules

At its beginning, the subfunction must:

  1. Push the value of EBP, then copy the value of ESP into EBP. Think of this as opening the frame: first the base address of the previous stack frame is saved, then the base address of the current stack frame is set (i.e. the value of the stack pointer right after EBP has been pushed). Parameters and local variables sit at fixed offsets from EBP, so they can be accessed through EBP [2].
  2. Allocate stack space for local variables, which is done by decreasing the value of ESP [3].
  3. Push the values of the callee-saved registers onto the stack, if the subfunction uses them. The callee-saved registers are EBX, EDI and ESI.

After these 3 steps, the actual function body is executed. When the function body is finished and about to return, the subfunction must:

  1. Put the return value in EAX.
  2. Pop the callee-saved registers off the stack to restore their values (in the reverse order of the pushes).
  3. Release the stack space of the local variables. This can be done by increasing the value of ESP or, better, by copying the value of EBP back into ESP.
  4. Pop the saved EBP of the previous stack frame off the stack to restore it.
  5. Finally, execute the ret instruction. It pops the previously pushed return address off the stack and jumps to that address to continue execution.

C example

Let's look at this in practice with a simple example.

int foo(int a, int b)
{
    int fa = 0x10;
    int fb = 0x20;
    fa = a;
    fb = b;
    return fa + fb;
}

int main()
{
    int ret = 0;
    ret = foo(1, 2);
    return ret;
}

After compiling it, we use objdump to disassemble the result.

# Kernel version 2.6.8-2-686-smp
# GCC version 3.3.5
$ gcc -g -O0 test.c
$ objdump -Sd a.out > a.s
# To only generate the assembly, you can instead run
# gcc -S test.c
Next, we follow the call to the foo function above step by step, combining the assembly code with snapshots of the stack.

Caller section

First, let's look at the first part of the main function:

int main()
{
 804837c:   55                      push   %ebp
 804837d:   89 e5                   mov    %esp,%ebp
 804837f:   83 ec 18                sub    $0x18,%esp
 8048382:   83 e4 f0                and    $0xfffffff0,%esp
 8048385:   b8 00 00 00 00          mov    $0x0,%eax
 804838a:   29 c4                   sub    %eax,%esp

  1. The first two instructions open the stack frame of the main function.
  2. The following sub $0x18,%esp allocates stack space. More is allocated than strictly needed: it covers the local variables and the arguments for calling subfunctions, with room to spare.
  3. The following and $0xfffffff0,%esp aligns ESP to a 16-byte boundary.
  4. The next two instructions subtract 0 from ESP and so have no real effect. Their purpose is unclear; presumably they are an artifact of the compiler's code generation.
  5. The next instruction, movl $0x0,0xfffffffc(%ebp), sets the first local variable (i.e. ret) to 0. The offset 0xfffffffc is -4, so ret lives at EBP-4.

Next comes the part related to the call to foo. At this point the stack looks as follows.

stack

At this point EBP points to the base address of main's stack frame, and ESP points to the top of the stack. Because our program is very simple and the caller-saved registers are not used afterwards, the register-saving step is omitted. And because the stack space for the arguments was already allocated at the start, the next step is simply to store the values into the argument slots on the stack.

stack

The arguments are passed in right-to-left order, so the second argument goes first: the value 2 is stored at ESP+4.

stack

Then the first argument is passed: the value 1 is stored at ESP. With the arguments in place, the call instruction is executed and control enters the subfunction foo.

Let's skip ahead: assume foo has returned and look at the last few instructions of main. The return value is now in EAX.

  1. mov %eax,0xfffffffc(%ebp) stores the return value in the local variable ret.
  2. main itself then returns ret, so the value has to be loaded into EAX again. (Because we compiled with -O0, this step looks redundant.)
  3. Next, the arguments should be removed from the stack. This is not done as a separate operation; it is folded into the leave instruction.
  4. The leave instruction is equivalent to mov %ebp,%esp; pop %ebp. It closes the current stack frame in one step, discarding all arguments and local variables. As this example shows, the compiler does not naively push and pop on every function call. Instead it allocates enough stack space up front to hold the local variables and the arguments of the subfunctions called later, and releases all of it when the frame is closed.
  5. If any caller-saved registers had been pushed, they would be popped here to restore their values. That is not the case in our example.

Callee section

Now let's look at the callee, the foo function. After the call instruction in main has executed, the stack looks like this.

stack

At this point, the return address is already on the stack, and EBP is still the base address of the previous main function.

stack

First, the stack frame of foo has to be created: EBP is pushed onto the stack, which saves the base address of main's stack frame.

stack

Immediately afterwards EBP is set to the current value of ESP, so EBP becomes the base address of foo's stack frame. Parameters and local variables can then be accessed by adding fixed offsets to EBP.

stack

The next instruction allocates stack space for the local variables. Since there are two local variables of type int, 8 is subtracted from ESP here.

stack

Normally, if foo used any callee-saved registers, it would push them here to save their values. Our program is simple enough that this step does not occur.

Next, the EBP-4 location is set to 0x10, which corresponds to int fa = 0x10; in the C code.

stack

Similarly, the EBP-8 location is set to 0x20, which corresponds to int fb = 0x20; in the C code.

Next, the value of parameter a is assigned to fa. First the value of a, located at EBP+8, is loaded into the EAX register.

stack

Then the value of EAX is stored at EBP-4.

stack

The next step assigns the value of parameter b to fb in the same way, again passing through the EAX register.

stack

stack

Next, fa and fb are added. First the value at EBP-8 is loaded into EAX (because of -O0 this step again looks a little redundant), then the value at EBP-4 is added to EAX. Afterwards EAX holds 3.

stack

The function body is now done, but some cleanup remains.

  1. The return value is already in EAX at this point.
  2. If any callee-saved registers had been pushed, they would be popped here to restore them.
  3. Next, the stack space of the local variables has to be released and the EBP of the previous stack frame popped off the stack to restore it. The leave instruction does both. After leave executes, the stack looks like this: the top of the stack is the return address, and EBP has been restored to the base address of main's frame.

stack

After the final ret instruction executes, the return address is popped off the stack, the stack is back to the state it was in before the call instruction executed, and the program jumps to the return address to continue with the rest of main.

Differences on x64

On x64, the registers are not only widened to 64 bits; there are also more of them. Linux uses the System V AMD64 ABI calling convention on x64, whose main elements are as follows.

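64-bit   32-bit  Purpose
%rax     %eax    Return value
%rbx     %ebx    Callee-saved
%rcx     %ecx    4th argument
%rdx     %edx    3rd argument, 128-bit return value
%rsi     %esi    2nd argument
%rdi     %edi    1st argument
%rbp     %ebp    Base pointer, callee-saved
%rsp     %esp    Stack pointer, callee-saved
%r8      %r8d    5th argument
%r9      %r9d    6th argument
%r10     %r10d   Caller-saved
%r11     %r11d   Caller-saved
%r12     %r12d   Callee-saved
%r13     %r13d   Callee-saved
%r14     %r14d   Callee-saved
%r15     %r15d   Callee-saved
xmm0-7           First 8 floating-point arguments
xmm0-1           Floating-point return value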

The RDI, RSI, RDX, RCX, R8, R9 registers are used to pass the first 6 integer or pointer parameters respectively, and XMM0-7 are used to pass the first 8 floating point parameters. If there are additional parameters, then they are still passed through the stack.

Return values up to 64 bits are passed through RAX, and those up to 128 bits are passed through RAX and RDX. Floating point return values use XMM0 and XMM1.

The registers RBX, RBP, RSP, and R12-R15 are callee-saved; the rest are caller-saved.

Another point worth mentioning is the red zone: a 128-byte area below the stack pointer is reserved for the current function. A leaf function, which calls no other function, can let the compiler keep its local variables there without adjusting the stack pointer, which saves some instructions in the function prologue.

Let's recompile the previous example code in an x64 environment and see the difference:

// Kernel version 5.4.0-91-generic
// GCC version 7.5.0
int main()
{
 628:   55                      push   %rbp
 629:   48 89 e5                mov    %rsp,%rbp
 62c:   48 83 ec 10             sub    $0x10,%rsp
    int ret = 0;
 630:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    ret = foo(1, 2);
 637:   be 02 00 00 00          mov    $0x2,%esi
 63c:   bf 01 00 00 00          mov    $0x1,%edi
 641:   e8 b4 ff ff ff          callq  5fa <foo>
 646:   89 45 fc                mov    %eax,-0x4(%rbp)
    return ret;
 649:   8b 45 fc                mov    -0x4(%rbp),%eax
}
 64c:   c9                      leaveq
 64d:   c3                      retq
 64e:   66 90                   xchg   %ax,%ax

You can see that the two parameters are not passed through the stack, but through the edi and esi registers, respectively.

int foo(int a, int b)
{
 5fa:   55                      push   %rbp
 5fb:   48 89 e5                mov    %rsp,%rbp
 5fe:   89 7d ec                mov    %edi,-0x14(%rbp)
 601:   89 75 e8                mov    %esi,-0x18(%rbp)
    int fa = 0x10;
 604:   c7 45 f8 10 00 00 00    movl   $0x10,-0x8(%rbp)
    int fb = 0x20;
 60b:   c7 45 fc 20 00 00 00    movl   $0x20,-0x4(%rbp)
    fa = a;
 612:   8b 45 ec                mov    -0x14(%rbp),%eax
 615:   89 45 f8                mov    %eax,-0x8(%rbp)
    fb = b;
 618:   8b 45 e8                mov    -0x18(%rbp),%eax
 61b:   89 45 fc                mov    %eax,-0x4(%rbp)
    return fa + fb;
 61e:   8b 55 f8                mov    -0x8(%rbp),%edx
 621:   8b 45 fc                mov    -0x4(%rbp),%eax
 624:   01 d0                   add    %edx,%eax
}
 626:   5d                      pop    %rbp
 627:   c3                      retq

Looking at the foo function again, notice that it does not allocate stack space for its local variables (there is no sub instruction on rsp); it simply addresses them at negative offsets from rbp. This is because foo is a leaf function as described earlier: it does not call any other function, so it can use the red zone directly.