C Source Code to x86 Binary

Written on January 12, 2021

This blog post details how C source code is compiled into x86 binary.

Post Outline

Source Code to Executable

How is a program (specifically in C) that someone writes understood and executed by the computer? This is a multistep process that turns the source code, written by the programmer, into machine code, which can be executed by the machine. We will follow the process with the example C code below.

#include <stdio.h>

int main() {
    int x = 1 + 2;
    printf("Hello world!\n");
    return 0;
}

Here is an image detailing the whole process.

_config.yml Source

Preprocessor and Compiler

First, the source code is put through a preprocessor. In the case of the C code, this could be including header files and expanding macros. Then, a compiler parses the source code and turns it into an Abstract Syntax Tree, an internal representation of the program. The compiler then puts the code through many different intermediate representations that each optimize the code (e.g. removing unused code, loop unrolling). The result of the compiler is assembly code, which is still human readable.

Calling Conventions Aside

Below is the assembly representation of the above example C code.

0000000000001135 <main>:
    1135:	55                   	push   %rbp
    1136:	48 89 e5             	mov    %rsp,%rbp
    1139:	48 83 ec 10          	sub    $0x10,%rsp
    113d:	c7 45 fc 03 00 00 00 	movl   $0x3,-0x4(%rbp)
    1144:	48 8d 3d b9 0e 00 00 	lea    0xeb9(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    114b:	e8 e0 fe ff ff       	callq  1030 <puts@plt>
    1150:	b8 00 00 00 00       	mov    $0x0,%eax
    1155:	c9                   	leaveq
    1156:	c3                   	retq
    1157:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
    115e:	00 00

We can see the C language calling conventions in the snippet. There is a __libc_start_main() function that initializes the execution environment and calls the main() function with the appropriate arguments. We can see the standard subroutine prologue, body, and epilogue in the assembly code. Below is an image of the stack in memory during the execution of some function with three input parameters and a local variable. Note that the architecture below is 32 bit so the registers start with e as opposed to r.

_config.yml Source

Subroutine Prologue

First, we push the old base pointer value stored in %rbp, the register for the base pointer, onto the stack. Then we move the value that the stack pointer %rsp is pointing to into %rbp’s address. There is then a sub $0x10,%rsp, which moves our stack pointer down the stack with some space between the saved value of the %rbp base pointer. This middle space will contain local variables that will be filled in the subroutine body.

Subroutine Body

We see that the value 0x3 is moved into -0x4(%rbp), the location of the local variable x. We also see that there is a puts() call where we might have expected a printf() call. This is because the compiler saw that we were not passing in any parameters to printf() and called puts() instead, which only takes a single argument.

We also see that the value 0x0 is moved into the register %eax. The register %eax stores the return value, which in this case is 0 as can be seen in the source code.

Subroutine Epilogue

We see there a leaveq instruction. At this point in the function, we have finished our work in the current function being called (i.e. the callee) and want to return to the previous function (i.e. the caller). Therefore, we want to deallocate our local variables and restore our base pointer, which marks the start of a function’s stack frame to its previous value. Both of the above actions are done with the leaveq instruction, which copies the frame pointer in the %rbp register into the stack pointer %rsp. This releases the stack space allocated by the stack frame. The leaveq instruction then pops the value pointed to by %rsp into %rbp and thus restoring the calling procedure’s stack frame

Note that the leaveq instruction is functionally equivalent to the instructions mov %rsp, %rbp and pop %ebp.

The leaveq is then followed by a retq instruction. At the point that the retq instruction is being executed, the stack pointer %rsp points to the address 4 bytes above the saved value of %rbp, which is the return address of the next instruction in the callee function’s code to be executed. The retq instruction pops the return address from the stack into the instruction pointer %rip, which points to the next instruction to be executed. Thus the program will resume executing at the saved return address (which was pushed onto the stack by the caller function).

Note that the NOPW is a no-op instruction, which stands for no operation. The instruction does nothing but simply takes up a word sized byte for alignment purposes so that a code sequence ends on a particular memory boundary.

Assembler and Linker

To turn the assembly code into machine code, there is an assembler that generates a binary object file. Then, a static linker looks at the main binary and resolves any imports that the main binary might need, creating another binary object file that includes all the required static libraries. Note that there is also dynamic linking, which is when shared libraries are imported at run time. Also note that when people say compile they generally mean the entire process: preprocessing, compiling, assembling, and linking.

Dynamic Linking vs Static Linking Aside

In static linking, the entire library of a function that the main program is using is copied over and compiled, which may result in a lot of unnecessary bloat. Dynamic linking solves this problem by linking the libraries that the main program requires at run time. The downside is that while compilation time and resulting binary size is decreased, running time is increased because of the need to import functions being used during runtime.

In our code, we have #include <stdio.h>. This line uses the #include preprocessor to tell the compiler what libraries to import. On Linux, C standard libraries like stdio are by default linked dynamically.

Static Library vs Shared Library Aside

When program is linked against a static library, machine code from the object files for any external functions used by the program is copied from the library into the final executable.

A shared library is handled with dynamic linking, which makes the executable file smaller than if the library were static. Dynamic linking works by including only the functions that the main executable requires (instead of the complete machine code from the object files for the external functions that a static library would include). Before the executable file starts running, the machine code for the external functions is copied into memory from the shared library file on disk by the OS. The advantage is that the library can be updated without recompiling the programs which use it. With dynamic linking of shared libraries, one copy of a library can be shared between multiple programs.

Executable Format

Note that this blog post specifically covers compilation on x86 processors, specifically x86-64 (the 64 bit version). The term x86 refers to a family of instruction set architectures. An instruction set architecture can be thought of as an abstract model of a computer. It specifies the architecture and the programming environment that processors supporting x86 must support, including the instruction set supported, the types of registers, and the data types. A CPU is an example of an implementation of an instruction set architecture.

On x86 processors, the ELF format (Executable and Linkable Format) is the standard binary file format. This ELF file is the output of the assembler as described in the process above.

_config.yml Source

We can view our program’s elf binary after the entire compilation process using readelf -e playground.

ELF Header

_config.yml

ELF files have three parts: the elf header, section headers, and program headers. The elf header general information about the binary, such as the byte ordering (i.e. endianness), whether to use 32-bit or 64-bit addresses, and the type of the ELF binary.

There are multiple types of ELF binaries: REL (relocatable object file), DYN (representing a shared library), and EXEC (the final product). Note that the output of the assembler before linking is a binary of type REL. This is because at that stage, the code in the binary does not contain exact addresses. After the linker, we should get an EXEC. However, we are getting a ELF of type DYN. This is because GCC, our compiler, by default enables PIE (position independent code). This means that the addresses are not finalized yet. The purpose of PIE is to achieve ASLR (address space layout randomization) to make it harder for bad actors to hijack the binary by knowing the exact locations of parts of the binary during runtime.

Program Headers

_config.yml The purpose of the program header is to tell the OS how to load the program binary into memory and execute it. Note that the program header is requesting a program interpreter. Even though C is a compiled language, the compiler uses this interpreter during dynamic loading, which involve the final steps needed to complete the loading and prepare the program for execution.

Section Headers

Sections make up segments, which are part of the program header. Note that only the first few sections are included in the picture for brevity.

_config.yml

We can see that there is a Section to Segment mapping section where sections are assigned to segments.

_config.yml

The section header contains meta data about the sections, including the location, the size and the name. The sections are used to divide up the binary logically as described in more detail in the next section Running the program. The .text section contains the executable code. The .data section holds initialized data. The .bss section holds uninitialized data.

Program Execution

When we run the program, a process is created and the program represented by an ELF is loaded from the file into the process’s address space. The executable file has instructions for the OS on the address space layout. The OS looks at the ELF to load segments into memory and for invoking a user space run-time linker program. The process’s single thread of control then executes the program’s code. This program consists of executable code and data. Note that the code, once loaded, is never modified. On the other hand, much of data is modified so it makes sense to divide the two and put the code in a part of the address space that prevents modification. We could put all data in a readable writable region but that’s not necessarily what we want. There are global and local variables. Global variable’s scope is the entire program. Local variables scope is just the block and are not needed after the thread exits the local variable’s scope. Therefore, we will have different sections for each. These sections are described previously in the Section Headers part.

The OS also allocates the stack and the heap for the program. The stack is used for local variables of fixed size. We encountered this previously in the Calling Conventions Aside when analyzing the assembly code. The heap is used for dynamic memory location and variables allocated here can be accessed globally.

Resources