Runtime Symbol Resolution

Outline
• Overview of Dynamic Library
• Relocatable code / Relocation section
• Executable code / PLT & GOT
• Runtime process analysis with GDB
• Summary

Overview
• This is about runtime symbol resolution on Linux x86-64
  • Windows and Mac have different mechanisms
• The source code is assumed to be written in C
• “Linker” means “static linker” (not dynamic) in this material

Build (& Load) Process
Usually you just type “gcc foo.c” and it invokes the four sub-processes for you.

    C code (text)
      -> Preprocessor -> Preprocessed C code (text)
      -> Compiler     -> Assembly code (text)
      -> Assembler    -> Relocatable code (ELF)
      -> Linker       -> Executable code (ELF)
      -> Loader, together with DL (ELF) -> Process image

What's Dynamic Library?
• A dynamic library is linked at runtime
  • A static library, by contrast, is linked at “compile-time” (when the executable is built)
• Shared by more than one application
  • Often called a “shared library”

[Figure: memory (or storage) image. With static libs, Process 1 and Process 2 each contain their own copy; with a dynamic lib, both processes share one copy.]

How is DL different?
• Addresses of symbols (functions, variables) are only known at run-time
  • The linker cannot tell where the DL will be loaded
• Executables cannot contain exact addresses
• The loader must play a role so that the code can call functions (or refer to variables) defined in DLs

Sample Code: Hello, World!

hello.c:

    #include <stdio.h>

    int main() {
        puts("Hello, World!");
        return 0;
    }

Build
• Preprocess & Compile & Assemble
  • “gcc -c hello.c” generates a relocatable file “hello.o”
• Preprocess & Compile & Assemble & Link
  • “gcc hello.c” generates an executable file “a.out”

    % ls
    a.out  hello.c  hello.o

Relocatable code

    % objdump -d hello.o    # disassemble text section
    [...]
    0000000000000000 <main>:
       0:  55              push   %rbp
       1:  48 89 e5        mov    %rsp,%rbp
       4:  bf 00 00 00 00  mov    $0x0,%edi
       9:  e8 00 00 00 00  callq  e <main+0xe>
       e:  b8 00 00 00 00  mov    $0x0,%eax
      13:  5d              pop    %rbp
      14:  c3              retq

The “callq” at offset 9 must be a call to “puts”. But ...

“e8” or “callq” Instruction
• “e8” is “Call … displacement relative to next instruction...”
  • See “Intel Software Developer's Manual”
• “e8 00 00 00 00” has a displacement of zero, i.e. “call the next instruction”, which doesn't make sense
• “00 00 00 00” is a placeholder that must be replaced with the displacement to “puts”
• “hello.o” must contain relocation info

Relocation section

    % readelf -r hello.o    # output edited for better readability
    Relocation section '.rela.text' at offset 0x598 contains 2 entries:
    Offset    Info           Type           Sym. Val.  Sym. Name + Addend
    00000005  0005 0000000a  R_X86_64_32    00000000   .rodata + 0
    0000000a  000a 00000002  R_X86_64_PC32  00000000   puts - 4
    […]

• “readelf -r” shows the relocation section
• Offset: replace the value at 0x0a (the 4 bytes following “e8”)
• Info: the symbol has index 0x0a (= “puts”)
• “- 4” is the addend
• Value = [value of symbol] + [addend] - [offset]

Summary: Relocatable code
• Code (or “.text” section) has zero as a placeholder address
• Relocation table (or “.rela” section) tells what to replace the zero with
• A later stage must use the relocation info to actually link “puts” to the code

Executable code

    % objdump -d a.out    # disassemble text section
    [...]
    00000000004004c4 <main>:
      4004c4:  55              push   %rbp
      4004c5:  48 89 e5        mov    %rsp,%rbp
      4004c8:  bf e0 05 40 00  mov    $0x4005e0,%edi
      4004cd:  e8 e6 fe ff ff  callq  4003b8 <puts@plt>
      4004d2:  b8 00 00 00 00  mov    $0x0,%eax
      4004d7:  5d              pop    %rbp
      4004d8:  c3              retq

Calling “puts@plt”, not “puts”

Executable code: PLT

    % objdump -d a.out
    [...]
    00000000004003a8 <puts@plt-0x10>:
      4003a8:  pushq  0x2004b2(%rip)   # 600860  _GLOBAL_OFFSET_TABLE_+0x8
      4003ae:  jmpq   *0x2004b4(%rip)  # 600868  _GLOBAL_OFFSET_TABLE_+0x10
      4003b4:  nopl   0x0(%rax)
    00000000004003b8 <puts@plt>:
      4003b8:  jmpq   *0x2004b2(%rip)  # 600870  _GLOBAL_OFFSET_TABLE_+0x18
      4003be:  pushq  $0x0
      4003c3:  jmpq   4003a8 <_init+0x18>

Machine code is omitted as it's getting cryptic...

Executable code: GOT

    Symbol                        Address    Value
    _GLOBAL_OFFSET_TABLE_ + 0x0   0x600858   0x006006c0
    _GLOBAL_OFFSET_TABLE_ + 0x8   0x600860   0x00000000
    _GLOBAL_OFFSET_TABLE_ + 0x10  0x600868   0x00000000
    _GLOBAL_OFFSET_TABLE_ + 0x18  0x600870   0x004003be
    _GLOBAL_OFFSET_TABLE_ + 0x20  0x600878   0x004003ce

GOT contents can be shown with objdump:
“objdump -s --start-address=0x600858 --stop-address=0x600880 a.out”

Summary: Executable code
When “puts@plt” is called …
1. Jump to an address stored in GOT
   • The address points to the next instruction
2. Push 0 to stack
   • What's this 0?
3. Jump to “puts@plt - 0x10”
4. Push the value stored in a GOT entry (_GLOBAL_OFFSET_TABLE_ + 0x8) to stack
   • What's this value used for?
5. Jump to the address stored in GOT (_GLOBAL_OFFSET_TABLE_ + 0x10)
   • The address is set to 0
   • Jumping to address 0 will crash the process

What's going on???
