The document discusses the process from compiling source code to executing a program. It covers preprocessing, compilation, assembly, linking, and the ELF file format. Preprocessing handles macros and conditionals. Compilation translates to assembly code. Assembly generates machine code. Linking combines object files and resolves symbols statically or dynamically using libraries. The ELF file format organizes machine code and data into sections in the executable.
7. ELF Format
• Executable file formats
– Derived from COFF (Common Object File Format)
• Windows: PE (Portable Executable)
• Linux: ELF (Executable and Linkable Format)
– Dynamic link library
• Windows (.dll); Linux (.so)
– Static library
• Windows (.lib); Linux (.a)
– Object file
• Windows (.obj); Linux (.o)
• Uses the same format as executable files
• Intermediate file produced by compilation and consumed by linking
8. File Content
• Machine code, data, symbol table, string table
• File header
– Basic file information
• The rest of the file is divided into sections
– Code Section (.code, .text)
– Data Section (.data)
– Special Section (.symtab, .strtab)
9. File Header
• The file header contains the following information
– File type (e.g. relocatable object, executable, shared object)
– Statically or dynamically linked
– Entry point address
– Target hardware / OS
– Location of the section header table
10. File Header Structure
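• The ELF header is defined as the structure Elf64_Ehdr (Elf32_Ehdr for 32-bit files)
• e_ident
– The first 4 bytes are '\x7f', 'E', 'L', 'F'
– File signature (magic number)
• e_type
– Object file type, e.g. ET_REL (1) relocatable, ET_EXEC (2) executable, ET_DYN (3) shared object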
typedef struct
{
unsigned char e_ident[16]; /* ELF identification */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Machine type */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point address */
Elf64_Off e_phoff; /* Program header offset */
Elf64_Off e_shoff; /* Section header offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size */
Elf64_Half e_phentsize; /* Size of program header entry */
Elf64_Half e_phnum; /* Number of program header entries */
Elf64_Half e_shentsize; /* Size of section header entry */
Elf64_Half e_shnum; /* Number of section header entries */
Elf64_Half e_shstrndx; /* Section name string table index */
} Elf64_Ehdr;
11. File Header Structure
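• e_machine
– 62 (EM_X86_64) for AMD x86-64 architecture
– 243 (EM_RISCV) for RISC-V (HITCON 2015)
• e_entry
– Virtual address where execution starts
• e_shoff
– File offset of the section header table; follow this member to find the section table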
12. Section Table
• The section table stores an array of section headers
– Each entry has type Elf64_Shdr
– Each entry is 64 bytes
typedef struct
{
Elf64_Word sh_name; /* Section name */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section attributes */
Elf64_Addr sh_addr; /* Virtual address in memory */
Elf64_Off sh_offset; /* Offset in file */
Elf64_Xword sh_size; /* Size of section */
Elf64_Word sh_link; /* Link to other section */
Elf64_Word sh_info; /* Miscellaneous information */
Elf64_Xword sh_addralign; /* Address alignment boundary */
Elf64_Xword sh_entsize; /* Size of entries, if section has table */
} Elf64_Shdr;
(Figure: e_shoff locates the section header table; each sh_name is an offset into the .shstrtab section, e.g. .shstrtab + 0x20.)
14. Code Section
• The code section, usually named .text, stores the program's machine code
• objdump -s
– Display the full contents of all sections
• objdump -d
– Display the assembler contents of executable sections
15. Data Section
• Several sections store the program's data
– .data → initialized global and static variables
– .rodata → read-only constant values (e.g. string literals)
– .bss → uninitialized global and static variables
16. BSS Section
• The .bss section holds uninitialized data, or data initialized to zero
• It occupies no space in the ELF file
– But space is reserved for it when the program is loaded into memory
20. Static Linking
• Static linking combines several object files into the final executable
$ gcc hello.o -o hello.out
21. Two-pass Linking
• Pass 1: Space & Address Allocation
– Collect each section's length, attributes and position
– Collect symbols (definitions and references) into a global symbol table
• Pass 2: Symbol Resolution & Relocation
– Patch each location recorded in the relocation entries
22. Space & Address Allocation
• Defined symbols
– Variables
– Functions
• Their virtual addresses are assigned only at link time
(Figure: symbol table before linking vs. symbol table after linking)
23. Symbol Table
typedef struct elf64_sym {
Elf64_Word st_name; /* Symbol name, index in string tbl */
unsigned char st_info; /* Type and binding attributes */
unsigned char st_other; /* No defined meaning, 0 */
Elf64_Half st_shndx; /* Associated section index */
Elf64_Addr st_value; /* Value of the symbol */
Elf64_Xword st_size; /* Associated symbol size */
} Elf64_Sym;
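Size: 24 bytes
Example: for the first symbol shown, st_name = 0xe and the .strtab section starts at file offset 0x00000570, so the name is at 0x570 + 0xe = 0x57e. The string there is "shared", so this symbol is named shared.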
24. Symbol Resolution & Relocation
• Resolve each symbol's address in the final executable
– Addresses of external symbols are unknown before linking
– Until then, the object file holds a placeholder value at each such location
– The linker automatically patches the placeholder with the correct address
25. Relocation Table
• The relocation table records each location that must be patched and the symbol it refers to
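typedef struct elf64_rela {
Elf64_Addr r_offset; /* Location to patch */
Elf64_Xword r_info; /* Symbol index (high 32 bits) and type (low 32 bits) */
Elf64_Sxword r_addend; /* Constant addend used in the calculation */
} Elf64_Rela;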
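Symbol  r_offset                 r_info (type)  r_info (symbol)  r_addend
shared  14 00 00 00 00 00 00 00  0a 00 00 00    09 00 00 00      00 00 00 00 00 00 00 00
swap    21 00 00 00 00 00 00 00  02 00 00 00    0a 00 00 00      fc ff ff ff ff ff ff ff
(Little-endian: shared → offset 0x14, type 0x0a, symbol 9, addend 0; swap → offset 0x21, type 0x02, symbol 0x0a, addend -4)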
0x0a → R_X86_64_32 / R_AMD64_32 (absolute 32-bit address)
0x02 → R_X86_64_PC32 / R_AMD64_PC32 (32-bit PC-relative address)
The two types patch the address in different ways
26. Relocation Type and Patch Calculation
• Notation
A - The addend value of the relocatable field.
S - The value of the symbol.
P - The section offset or address of the storage unit being relocated, computed using r_offset.
GOT - The address of the global offset table.
• R_X86_64_32: S + A
• R_X86_64_PC32: S + A - P
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-54839.html
27. Program Execution
• After static linking, we have an executable file
• The loader is what makes the program run
– It loads the executable into memory
– It resolves dynamic symbols
28. Creation of Process
• Create an independent virtual address space (AS)
– On Linux, allocate a page directory
• Read the executable's file header and create the mapping between the virtual AS and the executable file
– VMA, Virtual Memory Area
• Set the program counter (PC) to the entry address
– Switch from the kernel stack to the process stack
– Switch the CPU access (privilege) level from kernel mode to user mode
29. Section to Segment Mapping
• Sections with the same permissions are merged into one segment
– Permissions:
• Read
• Write
• Execute
30. Load Executable into Mem
• The program header table describes the segments
– The program is loaded into memory in units of segments
– readelf -l displays the program header table of an ELF file
31. Advantages and Disadvantages of Static Linking
• Advantages
– Modules can be developed independently
– Modules can be tested individually
• Disadvantages
– Wastes memory and disk space
• Every program has its own copy of the runtime library (printf, scanf, strlen, ...)
– Updating a module is difficult
• The program must be re-linked and republished to users whenever a module is updated
32. Dynamic Linking
• Defer linking until the program is executed
– Load-time relocation
• Example:
– Program1.o, Program2.o, Lib.o
– Execute Program1 → load Program1.o
– Program1 uses Lib → load Lib.o
– Execute Program2 → load Program2.o
– Program2 uses Lib → Lib.o is already loaded in physical memory, so it is shared
• Advantages
– Saves space
– Modules are easier to update
33. Lazy Binding
• Bind a function (symbol lookup and relocation) the first time it is called
• More efficient
– In most executions, only a small fraction of the functions are called
– So there is no need to resolve every symbol up front
34. Global Offset Table
• Divided into two parts
– .got
• Stores the addresses of referenced global variables
– .got.plt
• Stores the addresses of referenced external functions
35. .got.plt
• The first three entries are reserved for special purposes
– Address of the .dynamic section
– link_map
– _dl_runtime_resolve
• The remaining entries hold the addresses of functions in shared libraries
(Figure: .dynamic, .got, .got.plt and .data laid out in order; .got.plt holds the address of .dynamic, link_map, dl_resolve, then one entry per imported function such as print.)
48. Calling Convention
• Caller and callee must agree on the following
• Argument passing order and method
– Stack or registers (eax holds the return value on i386)
• Stack maintenance
– The stack must be consistent before and after a function call
– Cleanup is the responsibility of either the caller or the callee
• Name mangling
• The default calling convention in C (on 32-bit x86) is "cdecl"
50. LD_PRELOAD
• Ordinarily the dynamic linker loads shared libraries in whatever order it needs them
• $LD_PRELOAD is an environment variable containing a colon- (or space-) separated list of libraries that the dynamic linker loads before any others
51. LD_PRELOAD
• Preloading a library means its functions are used before functions of the same name in later libraries
• This allows functions to be overridden, replaced, or intercepted
• Program behaviour can be modified "non-invasively"
– i.e. no recompiling or relinking is necessary
– Especially useful for closed-source programs
– And when the modifications don't belong in the program or the library
55. System Call
• User processes cannot perform privileged operations themselves
• They must ask the OS to perform them on their behalf by issuing system calls
• A system call switches the CPU into kernel mode so the kernel can carry out the privileged operation
56. Strace
• Tracing system calls in Linux: the strace command
• Output is printed for each system call as it is executed, including its parameters and return value
• strace is implemented with the ptrace() system call, which is also used by debuggers (breakpoints, single-stepping, etc.)
– A program may also use ptrace() as an anti-debugging measure
– How to get around it?
57. Summary
• ELF file format
• Sections
• Static linking
• Dynamic linking and lazy binding
• LD_PRELOAD
• strace