Hello there! This post is going to be about the Executable and Linkable Format (ELF). This is one of the most important binary formats out there and is designed to store object code in a way that can be used on a wide variety of processor types and operating systems. It’s the standard format for compiled programs and libraries on Linux and most other Unix-like systems.
Let’s take a file apart and look at what it’s made of!
ELF Header of a C Program
To kick things off I will build a very small C program, test.c:
int main(int argc, char *argv[]) {
return 0;
}
I’ll give it a quick and lazy build with make test and look at the first 64 bytes of it with hexdump.
We can see right away this is an ELF file: the first 4 bytes are "\x7fELF". The ELF format supports 32-bit and 64-bit machines. The header is 52 bytes long in a 32-bit file and 64 bytes long in a 64-bit file, to make room for the wider addresses and offsets. For this post I’ll only be looking at 64-bit files.
Here’s the structure of the 64-bit header from the ELF-64 documentation:
typedef struct {
unsigned char e_ident[16]; /* ELF identification */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Machine type */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point address */
Elf64_Off e_phoff; /* Program header offset */
Elf64_Off e_shoff; /* Section header offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size */
Elf64_Half e_phentsize; /* Size of program header entry */
Elf64_Half e_phnum; /* Number of program header entries */
Elf64_Half e_shentsize; /* Size of section header entry */
Elf64_Half e_shnum; /* Number of section header entries */
Elf64_Half e_shstrndx; /* Section name string table index */
} Elf64_Ehdr;
A visual representation of this structure:
The e_ident field contains all the information needed to understand what format and version of an ELF file this is. As I mentioned before, the first 4 bytes of this should be \x7f, E, L, F, the magic number for all ELF files. The remaining bytes of the e_ident field indicate whether the file is 32 or 64-bit, whether it is in little endian or big endian format, and a few other details about what the file was built for.
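To make that concrete, here is a minimal C sketch of decoding e_ident. The byte positions (class at index 4, data encoding at index 5) and their values come from the ELF specification; the decode_ident function and the ident_info struct are names I made up for this example, so treat it as an illustration rather than a full parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte positions inside e_ident, as given in the ELF specification. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_NIDENT = 16 };

/* Hypothetical result type, invented for this sketch. */
struct ident_info {
    int bits;          /* 32 or 64, 0 if the class byte is unrecognized */
    int little_endian; /* 1 = little endian, 0 = big endian, -1 if unrecognized */
};

/* Returns 0 on success, -1 if the buffer is too short or lacks the magic. */
int decode_ident(const unsigned char *ident, size_t len, struct ident_info *out)
{
    assert(ident != NULL && out != NULL);
    /* "\x7f" and "ELF" are separate literals so the hex escape stops at 7f. */
    if (len < EI_NIDENT || memcmp(ident, "\x7f" "ELF", 4) != 0)
        return -1;
    out->bits = ident[EI_CLASS] == 1 ? 32 : ident[EI_CLASS] == 2 ? 64 : 0;
    out->little_endian = ident[EI_DATA] == 1 ? 1 : ident[EI_DATA] == 2 ? 0 : -1;
    return 0;
}
```

Given the 16 identification bytes of a 64-bit little endian file, this should fill in bits = 64 and little_endian = 1.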
The e_type field indicates what type of data is represented in the file: a relocatable object (1), executable (2), shared object (3) or core file (4). The e_machine field represents what instruction set the file is in. e_version is always 1 for now.
Up to this point the header is the same for 32-bit and 64-bit files. The next three fields, e_entry (program entry point), e_phoff (program header offset) and e_shoff (section header offset), hold a memory address or a file offset, so they are twice as long in a 64-bit file. The e_entry field is the virtual memory address where execution of the program starts. If you’re unfamiliar with what virtual memory is I suggest this video from Android Authority’s YouTube channel. The e_phoff and e_shoff fields indicate the offsets within the ELF file to the program and section header tables.
The rest of the header contains some flags, counts and sizes relating to the program and section headers in the file. I suggest checking out the ELF header documentation on Wikipedia for more detail on the header.
We can easily see all of this header detail in a human readable way with the readelf program.
A Simpler ELF File
I want to dive a bit deeper into the file, but even an empty C program like this contains a ton of data compared to a small Hello, world! program written in assembly. Here’s my hello world assembly source for an x86_64 Linux machine:
section .text
global _start
_start:
; write(1, msg, len)
mov rax, 1
mov rdi, 1
mov rsi, msg
mov rdx, len
syscall
; exit(0)
mov rax, 60
mov rdi, 0
syscall
section .data
msg db "Hello, world!",0xa
len equ $ - msg
I won’t dive into the syntax of this assembly, but it makes one call to Linux to write Hello, world!\n to stdout (1) and a second call to exit the program with 0. I’ll assemble it into an elf64 object file with nasm -f elf64 hello.asm and link it into an executable with ld -s -o hello hello.o.
This file is a lot smaller than the C program: 512 bytes vs 8552. There is also a lot less data in the program and section headers, as the ELF header indicates.
There are 2 program headers instead of 9 and 4 section headers instead of 31, so much less to look at!
ELF Sections
Let’s look at more of what readelf can tell us about this ELF file, starting with a closer inspection of the section headers.
$ readelf -a hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4000b0
Start of program headers: 64 (bytes into file)
Start of section headers: 256 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 4
Section header string table index: 3
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 00000000004000b0 000000b0
0000000000000027 0000000000000000 AX 0 0 16
[ 2] .data PROGBITS 00000000006000d8 000000d8
000000000000000e 0000000000000000 WA 0 0 4
[ 3] .shstrtab STRTAB 0000000000000000 000000e6
0000000000000017 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000d7 0x00000000000000d7 R E 200000
LOAD 0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
0x000000000000000e 0x000000000000000e RW 200000
Section to Segment mapping:
Segment Sections...
00 .text
01 .data
There is no dynamic section in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
No version information found in this file.
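The numbers in the ELF header above are enough to account for the layout of the whole file. As a quick sanity check, here is a small C sketch (table_end and check_hello_layout are names I made up) that computes where each header table ends. The program header table ends exactly where .text begins, and the section header table ends at byte 512, which is the size of the file.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the first byte after a table of `count` entries of `entsize`
   bytes each, starting `offset` bytes into the file. */
uint64_t table_end(uint64_t offset, uint16_t entsize, uint16_t count)
{
    return offset + (uint64_t)entsize * count;
}

/* Values copied from the readelf output for the hello binary. */
void check_hello_layout(void)
{
    /* Program headers: start at 64, 2 entries of 56 bytes, end at 0xb0. */
    assert(table_end(64, 56, 2) == 0xb0);
    /* Section headers: start at 256, 4 entries of 64 bytes, end at 512. */
    assert(table_end(256, 64, 4) == 512);
}
```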
Looking first at the section table, there are 4 section entries. Since this is a 64-bit file, each section header is 64 bytes long and has the following structure:
typedef struct {
uint32_t sh_name;
uint32_t sh_type;
uint64_t sh_flags;
Elf64_Addr sh_addr;
Elf64_Off sh_offset;
uint64_t sh_size;
uint32_t sh_link;
uint32_t sh_info;
uint64_t sh_addralign;
uint64_t sh_entsize;
} Elf64_Shdr;
Precise detail on this structure and what the fields are for can be found pretty easily in the Linux man page for elf. Going by the output of readelf, the first section is a NULL section. The ELF standard reserves section index 0 to mean "no section", so this first entry is always an empty placeholder.
The next two sections are .text and .data, which were defined in the assembly code. These are both of type PROGBITS, which means all the data in the section is defined and formatted for the program’s own use; it is a chunk of arbitrary bytes as far as ELF is concerned.
The address listed is where the section should be placed in the program’s virtual memory space when the operating system loads the program, if that section is loaded into the process’s memory at all. The offset indicates how many bytes into the ELF file the section starts and size is the length of the section. So we can see here that .text will be loaded at 0x4000b0 for the process, that it starts 0xb0 bytes into the ELF binary file on the filesystem and that the chunk of data is 0x27 bytes long.
The flags tell us a few other things about each section. Both .text and .data have the allocate flag, so they should reside in the program’s memory space when the program is executed. The .text section is the only section marked as executable, so the system should only allow code execution to occur within that section. The .data section is the only section with the writable flag, so only the memory in that range can be modified by the running program.
The program headers are pretty similar to the section headers: they describe information the system needs to run the program. They describe program segments and generally point to the same regions of memory and of the binary file as the sections do, but with some detail that’s specific to executables and shared objects. I’ll defer again to the manual for all the raw details.
That covers my introduction to the ELF format. Hopefully this provides good enough context to explore a bit more on your own. I suggest trying to see if you can build an ELF file from scratch or modify one that is already built for some binary hacking shenanigans!