ELF
손승하
Contents
1. Concepts and Definitions
2. ELF Header
3. Segments and the Program Header Table
4. Sections and the Section Header Table
5. Relocation and Position-Independent Code (PIC)
6. Stripping an ELF Object
7. Symbol Resolution
8. Use of Weak Symbols for Problem Investigations
1. Concepts and Definitions
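• ELF
- ELF (Executable and Linking Format, also expanded as Executable and Linkable Format)
- Standard binary format for object files, executables, shared libraries, and core files on Unix-like systems
- Consists of one ELF header followed by the file data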
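Because every ELF file starts with the ELF header, and that header starts with a fixed 4-byte magic number (0x7f, 'E', 'L', 'F'), a program can recognize an ELF file from its first bytes. A minimal C sketch; the function name has_elf_magic is hypothetical, not a library API:

```c
#include <stddef.h>
#include <string.h>

/* First four bytes of e_ident: 0x7f followed by 'E', 'L', 'F'. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns 1 if the len-byte buffer starts with the ELF magic number, else 0. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    if (len < sizeof ELF_MAGIC)
        return 0;   /* too short to hold even the magic number */
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

In practice the buffer would hold the first bytes of a file read with fread; a real tool would go on to check e_ident[4] (class) and e_ident[5] (data encoding) before parsing anything else.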
• Segments and Sections
- An ELF file can be interpreted in two ways: as segments or as sections.
- Section: a smaller piece of an ELF file that contains one specific kind of information (machine instructions, a symbol table, …). Sections are what the linker works with.
- Segment: a larger grouping of one or more sections, all of which have the same memory attributes (read-only, writable, …). Segments are what the loader maps into memory.
• Symbol
- A description of a variable or function, stored in the ELF file
- Contains basic information about the function or variable (size, value, name, etc.)
• Object file
- A machine-readable (compiled) version of a source file
- It still carries hints of the original source code (for example, symbol names such as the names of global and static variables).
- Object files cannot be run directly because they do not contain information about how the object file should be loaded into memory.
• Shared Library
- Made up of the symbols from one or more object files
- Can be loaded anywhere in the address space
- Contains a list of symbols that are either defined or undefined; undefined symbols must be satisfied by other shared libraries.
• Executables
- An executable is very similar to a shared library.
- A (non-PIE) executable is loaded at a specific address in memory.
- It has a function that is called when the program starts: _start() runs first in an executable.
• Core Files
- A special type of ELF file
- The memory image of a process that was once running
- Contains a number of the memory segments that were originally used by the running process
• Linking
- Takes the symbols from the object files and shuffles them into a specific order
- Combines them into either a shared library or an executable
- Resolves some of the undefined symbols, using the functions and variables defined in the object files
• Run Time Linking
- The process of matching undefined symbols with defined symbols at run time
• Program Interpreter / Run Time Linker
- A special library that is responsible for bringing a program up and eventually transferring control to the program (on x86_64 Linux with glibc, typically ld-linux-x86-64.so.2)
2. ELF Header
- The ELF header appears at the start (file offset 0) of every ELF file.
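Field        Size (bytes)  Meaning
e_ident      16            Identification: magic number (0x7f 'E' 'L' 'F'), class, data encoding, version, OS/ABI
e_type       2             Object file type (relocatable, executable, shared object, core)
e_machine    2             Target architecture
e_version    4             Object file version
e_entry      8             Virtual address of the entry point
e_phoff      8             File offset of the program header table
e_shoff      8             File offset of the section header table
e_flags      4             Processor-specific flags
e_ehsize     2             Size of the ELF header
e_phentsize  2             Size of one program header table entry
e_phnum      2             Number of program header table entries
e_shentsize  2             Size of one section header table entry
e_shnum      2             Number of section header table entries
e_shstrndx   2             Section header index of the section-name string table
(Sizes for x86_64, i.e. 64-bit ELF)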
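The field sizes above add up to 64 bytes, which is also the value of e_ehsize in the dump below. As a sketch, the layout can be mirrored with a C struct; the struct name elf64_ehdr is hypothetical (glibc's <elf.h> provides the real Elf64_Ehdr):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 64-bit ELF header (Elf64_Ehdr in <elf.h>). */
struct elf64_ehdr {
    unsigned char e_ident[16]; /* magic, class, data encoding, version, OS/ABI */
    uint16_t e_type;           /* ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4 */
    uint16_t e_machine;        /* target architecture; EM_X86_64 = 62 */
    uint32_t e_version;
    uint64_t e_entry;          /* virtual address of the entry point */
    uint64_t e_phoff;          /* file offset of the program header table */
    uint64_t e_shoff;          /* file offset of the section header table */
    uint32_t e_flags;
    uint16_t e_ehsize;         /* size of this header */
    uint16_t e_phentsize;      /* size of one program header entry */
    uint16_t e_phnum;          /* number of program header entries */
    uint16_t e_shentsize;      /* size of one section header entry */
    uint16_t e_shnum;          /* number of section header entries */
    uint16_t e_shstrndx;       /* index of the section-name string table */
};

/* Every field is naturally aligned, so the compiler inserts no padding. */
static_assert(offsetof(struct elf64_ehdr, e_entry) == 24, "e_entry follows 24 bytes");
static_assert(sizeof(struct elf64_ehdr) == 64, "ELF64 header is 64 bytes");
```

On a little-endian host (such as x86_64) reading a little-endian ELF file, the first 64 bytes of the file could be copied straight into this struct.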
Field        Size (bytes)  Data (hex [decimal])
e_ident      16            7f 45 4c 46 02 01 01 00  (0x7f "ELF", 64-bit, little-endian, version 1, System V ABI)
e_type       2             2 [2]                    (ET_EXEC: executable file)
e_machine    2             3e [62]                  (EM_X86_64)
e_version    4             1 [1]
e_entry      8             400440 [4,195,392]
e_phoff      8             40 [64]                  (program headers start right after the ELF header)
e_shoff      8             11a0 [4512]
e_flags      4             0 [0]
e_ehsize     2             40 [64]
e_phentsize  2             38 [56]
e_phnum      2             9 [9]
e_shentsize  2             40 [64]
e_shnum      2             1e [30]
e_shstrndx   2             1b [27]
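The decimal values in the table can be checked by decoding each field as a little-endian integer (e_ident[5] = 01 selects little-endian). The C sketch below, whose function names are hypothetical, decodes a field from raw bytes and computes where the section header table ends (e_shoff + e_shnum * e_shentsize):

```c
#include <stdint.h>

/* Decodes an n-byte (n <= 8) little-endian unsigned integer, the encoding
   selected by e_ident[5] == 1 (ELFDATA2LSB). */
uint64_t read_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)   /* the most significant byte is last */
        v = (v << 8) | p[i];
    return v;
}

/* Offset of the first byte past the section header table. */
uint64_t shdr_table_end(uint64_t e_shoff, uint16_t e_shnum, uint16_t e_shentsize)
{
    return e_shoff + (uint64_t)e_shnum * e_shentsize;
}
```

With the values above, the bytes 40 04 40 00 00 00 00 00 decode to 0x400440 (4,195,392), and the section header table occupies bytes 4512 through 6431 of the file (4512 + 30 * 64 = 6432).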
3. Segments and the Program Header Table
- The program header table contains information about the segments in an ELF file and how to load them into memory.
  (Segments: contiguous ranges of an ELF file that have the same memory attributes.)
- Only used for executables, shared libraries, and core files (relocatable object files normally have no program header table)
• Fields of an element (entry) in the 64-bit program header table
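Each element of the 64-bit program header table is a fixed-size structure of e_phentsize = 56 (0x38) bytes, as in the header dump above. A C sketch of that layout; the struct and function names are hypothetical mirrors of Elf64_Phdr in <elf.h>:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of Elf64_Phdr: one element of the 64-bit program
   header table. In the 64-bit layout p_flags directly follows p_type. */
struct elf64_phdr {
    uint32_t p_type;   /* PT_LOAD, PT_INTERP, PT_DYNAMIC, ... */
    uint32_t p_flags;  /* permissions: PF_X = 1, PF_W = 2, PF_R = 4 */
    uint64_t p_offset; /* where the segment starts in the file */
    uint64_t p_vaddr;  /* virtual address of the segment in memory */
    uint64_t p_paddr;  /* physical address (unused on most systems) */
    uint64_t p_filesz; /* number of bytes in the file */
    uint64_t p_memsz;  /* bytes in memory (>= p_filesz; the rest is zero-filled) */
    uint64_t p_align;  /* alignment of the segment */
};

static_assert(sizeof(struct elf64_phdr) == 56, "matches e_phentsize = 0x38");

/* File offset of program header number i. */
uint64_t phdr_offset(uint64_t e_phoff, uint16_t e_phentsize, uint16_t i)
{
    return e_phoff + (uint64_t)i * e_phentsize;
}
```

With e_phoff = 64, e_phentsize = 56, and e_phnum = 9 from the dump, the table starts right after the 64-byte ELF header and ends at byte 64 + 9 * 56 = 568.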
• Segment types
- PHDR: the program header table itself.
- INTERP: contains only the path name of the program interpreter; this is how the executable names the program interpreter it needs.
- DYNAMIC: information used for dynamic linking.
- Special segments for vendor-specific information.
- LOAD: a loadable program segment.
- NOTE: specifies the location and size of auxiliary information.
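The segment kinds above are identified by the p_type field of each program header entry. A small C sketch mapping the numeric values from the ELF specification to names; the function name is hypothetical, and the OS-specific and processor-specific ranges are where vendor-specific segments (for example GNU_STACK, 0x6474e551) live:

```c
#include <stdint.h>

/* Maps a p_type value (from the ELF specification; PT_* in <elf.h>) to a name. */
const char *segment_type_name(uint32_t p_type)
{
    switch (p_type) {
    case 0: return "NULL";     /* unused entry */
    case 1: return "LOAD";     /* loadable segment */
    case 2: return "DYNAMIC";  /* dynamic linking information */
    case 3: return "INTERP";   /* path name of the program interpreter */
    case 4: return "NOTE";     /* auxiliary information */
    case 5: return "SHLIB";    /* reserved */
    case 6: return "PHDR";     /* the program header table itself */
    case 7: return "TLS";      /* thread-local storage template */
    default:
        if (p_type >= 0x60000000u && p_type <= 0x6fffffffu)
            return "OS-specific";         /* PT_LOOS..PT_HIOS */
        if (p_type >= 0x70000000u && p_type <= 0x7fffffffu)
            return "processor-specific";  /* PT_LOPROC..PT_HIPROC */
        return "unknown";
    }
}
```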
4. Sections and the Section Header Table
• Section Header Table
- Contains information about every part of an ELF file (except the ELF header, the program header table, and the section header table itself)
- A list of section header structures, each defining a different section in the ELF file
- The ELF header contains the file offset of the section header table (e_shoff), the size of each entry (e_shentsize), and the number of entries (e_shnum).
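Given those header fields, section header number i is found at file offset e_shoff + i * e_shentsize. A C sketch with a hypothetical mirror of Elf64_Shdr (64 bytes, matching e_shentsize = 0x40 in the dump); with e_shoff = 4512 and e_shstrndx = 27 from the dump, the header of the section-name string table is at 4512 + 27 * 64 = 6240:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of Elf64_Shdr: one entry of the section header table. */
struct elf64_shdr {
    uint32_t sh_name;      /* offset of the section name in .shstrtab */
    uint32_t sh_type;      /* SHT_PROGBITS, SHT_SYMTAB, SHT_NOBITS, ... */
    uint64_t sh_flags;     /* SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, ... */
    uint64_t sh_addr;      /* address in memory, if the section is loaded */
    uint64_t sh_offset;    /* offset of the section contents in the file */
    uint64_t sh_size;      /* size of the section in bytes */
    uint32_t sh_link;      /* type-dependent link to another section */
    uint32_t sh_info;      /* type-dependent extra information */
    uint64_t sh_addralign;
    uint64_t sh_entsize;   /* entry size for table sections such as .symtab */
};

static_assert(offsetof(struct elf64_shdr, sh_offset) == 24, "sh_offset at byte 24");
static_assert(sizeof(struct elf64_shdr) == 64, "matches e_shentsize = 0x40");

/* File offset of section header number idx. */
uint64_t shdr_offset(uint64_t e_shoff, uint16_t e_shentsize, uint16_t idx)
{
    return e_shoff + (uint64_t)idx * e_shentsize;
}
```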
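• String Table Format
- A list of the strings used by the ELF file (section names, symbol names, …)
- An ELF file can have several string tables (for the dynamic symbol table, the main symbol table, and the section header names).
- Layout: \0<string1>\0<string2>\0<string3>\0…<stringN>\0
  Byte 0 is a null byte, every string is null-terminated, and a string is referred to by its byte offset in the table.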
• Symbol Table Format
- An array of ELF symbol structures, each describing a function, variable, or other type of symbol
- Dynamic symbol table (.dynsym): used at run time to find the various symbols in an ELF object
- Main symbol table (.symtab): static symbol information used at link time
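Each element of a symbol table is an Elf64_Sym (24 bytes on 64-bit). Its st_info byte packs the symbol binding (local, global, weak) into the high 4 bits and the symbol type (object, function, …) into the low 4 bits. A C sketch with hypothetical names that mirrors the struct and the arithmetic of the ELF64_ST_BIND / ELF64_ST_TYPE / ELF64_ST_INFO macros in <elf.h>:

```c
#include <stdint.h>

/* Hypothetical mirror of Elf64_Sym (24 bytes). */
struct elf64_sym {
    uint32_t      st_name;  /* offset of the name in the linked string table */
    unsigned char st_info;  /* binding (high 4 bits) and type (low 4 bits) */
    unsigned char st_other; /* visibility */
    uint16_t      st_shndx; /* section the symbol is defined in; 0 = undefined */
    uint64_t      st_value; /* address (or section offset in a relocatable .o) */
    uint64_t      st_size;  /* size of the object or function in bytes */
};

/* A few values from the specification (STB_* and STT_* in <elf.h>). */
enum { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };
enum { TYPE_NOTYPE = 0, TYPE_OBJECT = 1, TYPE_FUNC = 2 };

/* Same arithmetic as ELF64_ST_BIND, ELF64_ST_TYPE and ELF64_ST_INFO. */
unsigned sym_bind(unsigned char info) { return info >> 4; }
unsigned sym_type(unsigned char info) { return info & 0xfu; }
unsigned char sym_info(unsigned bind, unsigned type)
{
    return (unsigned char)((bind << 4) | (type & 0xfu));
}

/* An undefined symbol must be satisfied by another object or library. */
int sym_is_undefined(const struct elf64_sym *s) { return s->st_shndx == 0; }
```

For example, st_info = 0x12 describes a global function, and a weak data object has st_info = 0x21.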
• Symbol Table Format - example
- File offset 0x140
(.data section offset 0x120 + list value offset 0x20)
• Section Names and Types
1) .bss
2) .data
3) .dynamic
4) .dynsym
5) .dynstr
6) .fini
7) .got
8) .hash
9) .init
10) .interp
11) .plt
12) .rodata
13) .shstrtab
14) .strtab
15) .symtab
16) .text
17) .rel
• Section Names and Types (cont)
1) .bss section: global and file-local variables that are not initialized (the section takes no space in the file and is zero-filled in memory)
2) .data section: initialized global, static, writable variables
3) .dynamic section: information about dynamic linking
4) .dynsym section: symbols required for program execution, i.e. global symbols (static symbols are not included)
5) .fini section: machine instructions for the function _fini()
6) .init section: machine instructions for the function _init()
7) .interp section: the path name of the program interpreter
8) .rodata section: read-only constant values, string literals, and other constant data
9) .shstrtab: string table that contains the names of the various sections
10) .strtab: symbol names for the main symbol table
11) .symtab: the full (main) symbol table, which also includes all static functions and variables
12) .text: the executable code of an ELF file
13) .got section (Global Offset Table): simply a table of addresses, residing in the data section; the GOT is required for position-independent code
14) .hash section (symbol hash table): ELF needs a quick way to find a symbol by name; this hash table maps symbol names to their entries in the symbol table
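The .hash section stores a bucket array and a chain array indexed by the System V ELF hash of each symbol name; a lookup hashes the name, takes it modulo the number of buckets, and walks the chain comparing names. Below is the hash function as given in the ELF specification, written with uint32_t; hash_bucket is a hypothetical helper. (Modern GNU toolchains usually emit .gnu.hash instead, which uses a different hash function.)

```c
#include <stdint.h>

/* The System V ELF symbol hash used by the .hash section. */
uint32_t elf_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;       /* shift in the next character */
        g = h & 0xf0000000u;       /* top 4 bits about to overflow */
        if (g)
            h ^= g >> 24;          /* fold them back into the low bits */
        h &= ~g;                   /* and clear them */
    }
    return h;
}

/* The chain for a name starts at bucket[elf_hash(name) % nbucket]. */
uint32_t hash_bucket(const char *name, uint32_t nbucket)
{
    return elf_hash(name) % nbucket;
}
```

For example, elf_hash("printf") is 0x077905a6.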
15) .plt section (Procedure Linkage Table): a list of instructions that help functions find other functions in the address space
16) .rel (relocation sections): named with the prefix .rel (or .rela) followed by the name of the section they operate on, e.g. .rel.text or .rela.plt
- Relocation is a critical part of ELF: it allows shared libraries to be loaded anywhere in the address space.
5. Relocation and Position-Independent Code (PIC)
• Linking
- Takes the symbols from the object files
- Shuffles them into a specific order
- Combines them into either a shared library or an executable
- Resolves some of the undefined symbols, using the functions and variables of the object files
• Linking (cont)
[Figure: a.o, b.o, and c.o each contribute Data and Code; the linker groups all Data together and all Code together, and the output file contains an ABI header, a symbol table, a relocation table, a Data segment, and a Code segment.]
• Position-Independent Code (PIC) in shared libraries
- The linker's problem: when the linker creates a shared library, it doesn't know in advance where the library will be loaded.
[Figure: the linker produces a shared library whose load address in memory is unknown.]
• Position-Independent Code (PIC) in shared libraries (cont)
- There are two main approaches to solving this problem in Linux ELF shared libraries:
  - Load-time relocation
  - Position-independent code (PIC)
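• Some problems of load-time relocation
- The performance problem: every reference to an absolute address must be patched by the dynamic linker when the library is loaded, which slows down program startup.
- The non-shareable text section problem: patching the text section makes its pages private to each process, so the library's code can no longer be shared in memory between processes.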
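Load-time relocation can be sketched as a loop that the dynamic linker runs over the relocation table after choosing a load address. The C below is a simplified model with hypothetical names: each record says "store load_base + addend in word slot", which is the effect of an x86-64 R_X86_64_RELATIVE relocation. Every iteration is a write into the loaded image, which is where both problems above come from:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record. */
struct rel_entry {
    size_t   slot;    /* index of the 64-bit word to patch in the image */
    uint64_t addend;  /* link-time address, assuming the library was linked at 0 */
};

/* Patches every relocated word of an image loaded at load_base. The cost
   grows with the number of relocations, and each patched page becomes a
   private, modified copy for this process. */
void apply_relocations(uint64_t *image, const struct rel_entry *rels,
                       size_t nrels, uint64_t load_base)
{
    for (size_t i = 0; i < nrels; i++)
        image[rels[i].slot] = load_base + rels[i].addend;
}
```

For example, with load_base = 0x7f0000000000, a record {slot 1, addend 0x1040} leaves 0x7f0000001040 in image[1]. Real relocation records (Elf64_Rela) also carry a symbol index and a relocation type, which this model omits.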
• PIC - Introduction
- Adds an additional level of indirection to all global data and function references in the code.
- By exploiting some artifacts of the linking and loading processes, it's possible to make the text section of the shared library position independent.
• PIC – Introduction (cont)
- The linker knows both the sizes of the sections and their relative locations, so the distance from any instruction in the text section to the data section is a constant known at link time, wherever the library is loaded.
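• PIC – The Global Offset Table (GOT)
- A table of addresses, residing in the data section
- Using the GOT gets rid of relocations in the code section: the code reaches the GOT at a fixed offset and loads the address it needs from there.
- But the GOT still has to contain the absolute addresses, so the relocations move into the GOT; since the GOT is in the writable data section, the text section stays shareable.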
• PIC – Lazy binding optimization
- The work of resolving a function's address is delayed until the last moment, when it's actually needed.
- The intention is to avoid doing this work at all if its result is never required during a particular run of the program.
- The lazy binding concept is built into the PLT.
• PIC – The Procedure Linkage Table (PLT)
- The PLT is part of the executable text section.
- It is a set of entries, one for each external function the shared library calls.
    Code:
        call func@plt
        …
    PLT:
        PLT[0]:
            call resolver
        …
        PLT[n]:
            jmp *GOT[n]
            prepare resolver
            jmp PLT[0]
    GOT:
        …
        GOT[n]:
            <addr>
- In the code, a function func is called. The compiler translates this into a call to func@plt, which is some N-th entry in the PLT.
- The PLT consists of a special first entry, followed by a number of identically structured entries, one for each function needing resolution.
- Each of these entries consists of:
  1. A jump to a location which is specified in the corresponding GOT entry.
  2. Preparation of arguments for a 'resolver' routine.
  3. A call to the resolver routine, which resides in the first entry of the PLT.
- Before the function's actual address has been resolved, the N-th GOT entry just points back to the instruction after the jump (the resolver preparation in PLT[n]).
- The resolver resolves the actual address of func, places it into GOT[n], and calls func.
- After the first call, GOT[n] points to the actual func instead of back into the PLT, so later calls through func@plt jump straight to func.
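The lazy-binding sequence above can be modeled in plain C with an array of function pointers standing in for the GOT. This is only a simulation: the names (got, plt_call, resolve_and_call, the stubs, and the two target functions) are hypothetical, and a real PLT is a few machine instructions plus the dynamic linker's resolver, not C code. Each GOT slot starts out pointing at a stub that enters the resolver; the resolver patches the slot, so only the first call takes the slow path:

```c
#include <stddef.h>

typedef int (*fn_t)(int);

/* Stand-ins for the external functions a library would call. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* What the "resolver" finds when it looks the symbols up. */
static fn_t const real_targets[2] = { twice, square };

int resolver_calls = 0;   /* how often the slow path has run */
static fn_t got[2];       /* GOT[n]: one slot per external function */

/* Resolver: find the real address, patch GOT[n], then call the function. */
static int resolve_and_call(size_t n, int arg)
{
    resolver_calls++;
    got[n] = real_targets[n];
    return got[n](arg);
}

/* Initial GOT targets, playing the role of "prepare resolver; jmp PLT[0]". */
static int stub0(int x) { return resolve_and_call(0, x); }
static int stub1(int x) { return resolve_and_call(1, x); }

void got_init(void)
{
    got[0] = stub0;
    got[1] = stub1;
    resolver_calls = 0;
}

/* plt_call(n, x) plays the role of "call func@plt", i.e. jmp *GOT[n]. */
int plt_call(size_t n, int x) { return got[n](x); }
```

After got_init(), calling plt_call(0, 5) twice returns 10 both times, but resolver_calls is 1: the second call goes through the already-patched GOT slot.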
• PIC – The Procedure Linkage Table (PLT)
This code will be compiled into libmlpic.so, and the focus is going to be on the call to ml_util_func from ml_func. Let's first disassemble ml_func:
- A jump to an address specified in the GOT
- Preparation of arguments for the resolver
- A call to the resolver
got.plt address = 0x1a9e + 0x56e = 0x2000
In pseudo-assembly, we replace an absolute addressing instruction:
    ; Place the value of the variable in edx
    mov edx, DWORD PTR [ADDR_OF_VAR]
with displacement addressing from a register, along with an extra indirection:
    ; 1. Somehow get the address of the GOT into ebx
    lea ebx, ADDR_OF_GOT
    ; 2. Suppose ADDR_OF_VAR is stored at offset 0x10
    ;    in the GOT. Then this will place ADDR_OF_VAR into edx.
    mov edx, DWORD PTR [ebx + 0x10]
    ; 3. Finally, access the variable and place its value into edx.
    mov edx, DWORD PTR [edx]
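In C terms, the extra indirection in the pseudo-assembly above looks like the sketch below. The compiler normally emits this for you when building with -fPIC; here the table is an ordinary array named got and the "loader" is a plain function, both hypothetical. The slot index is derived from the byte offset 0x10 used above, so it is 4 with 4-byte pointers (as on 32-bit x86) and 2 with 8-byte pointers. Step 1, finding the GOT's address, has no C-level equivalent and is left to the compiler:

```c
/* Byte offset of myglob's slot in the GOT, as in the pseudo-assembly. */
#define MYGLOB_GOT_OFFSET 0x10

int myglob = 42;          /* the variable being accessed (hypothetical) */
static int *got[8];       /* stand-in for the Global Offset Table */

/* What the dynamic linker does once it knows where myglob lives:
   write its absolute address into the GOT slot. */
void loader_fill_got(void)
{
    got[MYGLOB_GOT_OFFSET / sizeof(int *)] = &myglob;
}

/* What the PIC code does: it never embeds myglob's address itself. */
int read_myglob_via_got(void)
{
    int *addr = got[MYGLOB_GOT_OFFSET / sizeof(int *)]; /* step 2: load ADDR_OF_VAR */
    return *addr;                                       /* step 3: load the value */
}
```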
got.plt address = 0x1a9e + 0x56e = 0x2000
got.plt entry for ml_util_func = 0x1a9e + 0x56e + offset = 0x2014
• PIC in shared libraries on x64
- In general, the idea is similar on both platforms (x86 and x64), but some details differ because of the unique features of each architecture.
• RIP-relative addressing
- The instruction at 0x6bf places the address of myglob into rax by reading an entry in the GOT.
- It uses RIP-relative addressing: the displacement is added to the address of the next instruction.
- 0x6c6 + 0x200912 = 0x200fd8
- The GOT starts at 0x200fd0, so myglob is in its second entry (0x200fd8 - 0x200fd0 = 8 bytes, one 8-byte entry from the start).
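The address arithmetic on these x64 slides can be written as two small C helpers (hypothetical names). The only assumptions are those of x86-64 itself: the 32-bit displacement is sign-extended and added to the address of the following instruction, and GOT entries are 8 bytes:

```c
#include <stdint.h>

/* Target of a RIP-relative operand: the sign-extended 32-bit displacement
   is added to the address of the next instruction. */
uint64_t rip_target(uint64_t next_insn_addr, int32_t disp32)
{
    return next_insn_addr + (uint64_t)(int64_t)disp32;
}

/* Index of a GOT slot, given the GOT's start address (8-byte entries). */
uint64_t got_index(uint64_t slot_addr, uint64_t got_base)
{
    return (slot_addr - got_base) / 8;
}
```

rip_target(0x6c6, 0x200912) gives 0x200fd8, and got_index(0x200fd8, 0x200fd0) gives 1, the second entry. The function-call example's 0x200a1a + 0x606 = 0x201020 is the same computation.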
• RIP-relative addressing (cont)
- The relocation inserted for the GOT reference to myglob is a relocation entry for 0x200fd8.
- It tells the dynamic linker to place the address of myglob into that GOT entry once the final address of this symbol is known.
• x64 PIC with function calls
- The GOT entry holding the actual address of ml_util_func is at 0x200a1a + 0x606 = 0x201020.
• Performance implications
- With PIC, every access to global data and every call to an external function goes through an extra level of indirection (the GOT, plus the PLT for calls).
- On 32-bit x86, a register is also tied up holding the GOT address; x64's RIP-relative addressing avoids that cost.
• PIC vs non-PIC
- Much of the complexity in ELF exists to support PIC.
- The PIC version includes the global offset table (the non-PIC version does not).
• PIC in shared libraries on x64
- In general, the idea is similar on both platforms, but some details differ because of the unique features of each architecture.
- The key x64 feature is the RIP-relative addressing mode.
6. Stripping an ELF Object
- ELF objects can be stripped.
- Stripping removes the main symbol table and other sections that are not needed at run time.
- The strip command discards symbols from object files.
[Figure: section names of an example executable before and after running strip. Before: bss, comment, data, dynamic, dynstr, dynsym, eh_frame, eh_frame_hdr, fini, fini_array, gnu.hash, gnu.version, gnu.version_r, got, got.plt, init, init_array, jcr, note, note.gnu.build-i, plt, rela.dyn, rela.plt, shstrtab, strtab, symtab, text. After strip, the main symbol table (symtab) and its string table (strtab) are gone.]
7. Symbol Resolution
- Important symbols in each of these ELF files
[Figure: the important symbols in each ELF file built from the example sources resm.c and res.c]
- The -Bsymbolic option (GNU ld; passed through gcc as -Wl,-Bsymbolic)
: When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms which support shared libraries.
8. Use of Weak Symbols for Problem Investigations
• Strong and Weak Symbols
- Program symbols are either strong or weak
Strong : procedures and initialized globals.
Weak : uninitialized globals.
• Linker's Symbol Rules
Rule 1: Multiple strong symbols are not allowed.
- Each item can be defined only once.
- Otherwise: linker error
Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
- References to the weak symbol resolve to the strong symbol.
Rule 3: If there are multiple weak symbols, pick an arbitrary one.
- This can be overridden with gcc -fno-common, which turns multiple definitions of an uninitialized global into a linker error (the default since GCC 10).
• Weak symbol – Global Variables
- Avoid them if you can:
  : they have side effects
  : scope does matter
- Otherwise:
  : use static if you can
  : initialize a global variable when you define it
  : use extern when you use an external global variable
• Weak symbol - Syntax
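GCC and Clang also let you declare an explicitly weak symbol (binding STB_WEAK in the symbol table) with __attribute__((weak)). This is a compiler extension, separate from the uninitialized-global case above, but it follows the same precedence idea: a strong definition elsewhere in the link wins over the weak one. A minimal sketch with a hypothetical function name log_event:

```c
/* Weak default definition (GCC/Clang extension). If another object file
   in the link defines a strong log_event(), the linker uses that one and
   this body is ignored; otherwise this default is used. */
__attribute__((weak)) int log_event(const char *msg)
{
    (void)msg;   /* default: ignore the message */
    return 0;    /* 0 = "nothing was logged" */
}

/* Caller that works the same whether or not log_event() is overridden. */
int do_work(void)
{
    return log_event("work done");
}
```

Linking in another object file that defines a strong int log_event(const char *msg) returning, say, 1 would make do_work() return 1 without any change to this file.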
• Weak symbol – Example
- Source for an interception library for malloc, free, and realloc
- The LD_PRELOAD environment variable instructs the program interpreter to load a library before executing a program.
Q & A
Thank you.
ELF(executable and linkable format)

  • 1.
  • 2.
    Contents 1. Concepts andDefinitions 2. ELF Header 3. Segments and the Program Header Table 4. Sections and the Section Header Table 5. Relocation and Position Independent Code(PIC) 6. Stripping an ELF Object 7. Symbol Resolution 8. Use of Weak Symbols for Problem Investigations
  • 3.
    1. Concepts andDefinition
  • 4.
    1. Concepts andDefinition • ELF - ELF(Executable and Linking Format) - Standard binary format for object files - 하나의 ELF Header 와 파일데이터로 구성
  • 5.
    1. Concepts andDefinition • ELF
  • 6.
    1. Concepts andDefinition • Segment and Sections - An ELF file can be interpreted in two ways : segments or sections - Section : Smaller pieces of an ELF file that contain very specific information (machine instructions, symbol table, …) - Segment : Larger groupings of one or more sections All of which have the same memory attributes. ( read-only, writable, … )
  • 7.
    1. Concepts andDefinition • Symbol - ELF 파일 내에 저장되어 있는 변수 또는 함수의 description - 함수 또는 변수의 간단한 정보(size, value, name 등) 포함
  • 8.
    1. Concepts andDefinition • Object file - computer readable version of a source file - It still has hints of the original source code. (hints : symbols names – global and static variable names..) - Object files cannot be run directly because they do not contain information about how the object file should be loaded into memory
  • 9.
    1. Concepts andDefinition • Shared Library - made up of the symbols from one or more object files. - Can be loaded anywhere in the address space. - A list of symbols that either defined of undefined. Undefined symbols must be satisfied through other shared library.
  • 10.
    1. Concepts andDefinition • Executables - An executable is very similar to a shared library - Be loaded at a specific address in memory. - It has a function that is called when a program starts. - _start() run first in an executable.
  • 11.
    1. Concepts andDefinition • Core Files - Special type of ELF file - The memory image from a once running process. - A number of memory segments that were originally used by the running process.
  • 12.
    1. Concepts andDefinition • Linking - Takes the symbols from the object files, shuffles them into a specific order - Combines them into either a shared library or executable. - Resolve some of the undefined symbols, either using the functions and variables of the object files
  • 13.
    1. Concepts andDefinition • Linking
  • 14.
    1. Concepts andDefinition • Linking
  • 15.
    1. Concepts andDefinition • Run Time Linking - The process of matching undefined symbols with defined symbols at run time • Program Interpreter / Run Time Linker - Special library that has the responsibility of bringing a program up and eventually transferring control over to the program.
  • 16.
  • 17.
    2. ELF Header -ELF Header appears at the start of every ELF file.
  • 18.
    2. ELF Header variableSize(byte) e_ident 16 e_type 2 e_machine 2 e_version 4 e_entry 8 e_phoff 8 e_shoff 8 e_flags 4 e_ehsize 2 e_phentsize 2 e_phnum 2 e_shentsize 2 e_shnum 2 e_shstrndx 2 x86_64 기준
  • 19.
  • 20.
    2. ELF Header e_phoffe_shoff e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx
  • 21.
    2. ELF Header variableSize(byte) Data(Hex [Dex]) e_ident 16 7f 45 4c 46 02 01 01 00 e_type 2 2 [2] e_machine 2 3e [64] e_version 4 1 [1] e_entry 8 40 04 40 [4,195,392] e_phoff 8 40 [64] e_shoff 8 11 a0 [4512] e_flags 4 0 [0] e_ehsize 2 40 [64] e_phentsize 2 38 [56] e_phnum 2 9 [9] e_shentsize 2 40 [64] e_shnum 2 1e [30] e_shstrndx 2 1b [27]
  • 22.
    3. Segments and TheProgram Header Table
  • 23.
    3. Segments andThe Program Header Table - Program Header Table contains information about the segments in an ELF file and how to load them into memory (segments : contiguous ranges of an ELF file that have the same memory attribution.)
  • 24.
    3. Segments andThe Program Header Table - Only used for executable, shared libraries and core files
  • 25.
    3. Segments andThe Program Header Table • Element in the 64-bit program header table
  • 26.
    3. Segments andThe Program Header Table
  • 27.
    3. Segments andThe Program Header Table program header itself. “INTERP” segment. That only includes the name of the program interpreter
  • 28.
    3. Segments andThe Program Header Table program header itself. “INTERP” segment. That only includes the name of the program interpreter The executable contains the name of the program interpreter
  • 29.
    3. Segments andThe Program Header Table DYNAMIC segment used for dynamic linking. Special segments for Vendor-specific information
  • 30.
    3. Segments andThe Program Header Table LOAD segment : Loadable program segment Note Segment : The array element specifies the location and size of auxiliary infomation
  • 31.
    4. Sections andthe Section header Table
  • 32.
    4. Sections andthe Section Header Table - Information about every part of an ELF file (except the ELF Header, Program H.T , Section H.T itself) - List of section header structures, each defining a different section in the ELF file. • Section header Table
  • 33.
    4. Sections andthe Section Header Table - ELF header contains the file offset of the section header table • Section header Table(cont)
  • 34.
    4. Sections andthe Section Header Table • String Table Format - A list of all strings that are used by the ELF file. - A number of string tables in an ELF file (dynamic symbol table, main symbol table, section header names) <string1>0<string2>0<string3>0…<string>00
  • 35.
    4. Sections andthe Section Header Table • Symbol Table Format - An array of ELF symbol structures that describe a function, variable, or other type of symbol. - Dynamic Symbol Table : use at runtime to find the various symbol in ELF object - Main Symbol Table : static symbol information used at link time
  • 36.
    4. Sections andthe Section Header Table • Symbol Table Format - example - File offset 0x140 (.data section offset 0x120 + list value offset 0x20)
  • 37.
    4. Sections andthe Section Header Table • Section Names and Types 1) .bss 2) .data 3) .dynamic 4) .dynsym 5) .dynstr 6) .fini 7) .got 8) .hash 9) .init 10) .interp 11) .plt 12) .rodata 13) .shstrtab 14) .strtab 15) .symtab 16) .text 17) .rel
  • 38.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - global and file local variables that are not initialized 1) .bss section - Initialized global, static, writable variables. 2) .data section - information about dynamic linking 3) .dynamic section
  • 39.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - Symbols that are required for program execution, global symbols. (not contain static symbol) 4) .dynsym section - Machine instructions for the function _fini() 5) .fini section - Machine instructions of the function _init() 6) .init section
  • 40.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - The path name of the program interpreter 7) .interp section - Read-only constant values, string literals, and other constant data. 8) .rodata section - String table that contains the section names of the various sections. 9) .shstrtab
  • 41.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - Symbol names for the main symbol table. 10) .strtab - This is full(main) symbol table that also includes all static function and variables. 11) .symtab - Executable code of an ELF file. 12) .text
  • 42.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - Simply a table of addresses, residing in the data section. - GOT is required for position-independent code. 13) .got section(Global Offset Table) - ELF must implement a quick way to find the various symbol table names to find the corresponding symbol. 14) .hash section(Symbol hash table)
  • 43.
    4. Sections andthe Section Header Table • Section Names and Types(cont) 14) .hash section(Symbol hash table)
  • 44.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - ELF must implement a quick way to find the various symbol table names to find the corresponding symbol. - A list of instructions that help functions find other functions in the address space. 15) .plt section(Procedure Linkage Table)
  • 45.
    4. Sections andthe Section Header Table • Section Names and Types(cont) - Prefixed by the section name that they will be operation on. - Relocation is a critical part of ELF : it allows shared libraries to be loaded anywhere in the address space. 16) .rel(relocation section)
  • 46.
    4. Sections andthe Section Header Table • Section Names and Types(cont) 16) .rel(relocation section)
  • 47.
    5. Relocation andPosition Independent Code(PIC)
  • 48.
    5. Relocation andPosition Independent Code(PIC) • Linking - takes the symbols from the object files - shuffles them into a specific order - combines them into either a shared library or executable. - Resolve some of the undefined symbols, either using the functions and variables of the object files
  • 49.
    5. Relocation andPosition Independent Code(PIC) • Linking(cont) Data Code Data Code Data Code a.o b.o c.o a.o Data b.o Data c.o Data a.o Code b.o Code c.o Code Symbol Table Relocation Table Data Segment Code Segment ABI Header Symbol Table Relocation Table Data Segment Code Segment
  • 50.
    5. Relocation andPosition Independent Code(PIC) • Position Independent Code(PIC) in shared library - Linker’s problem : linker create a shared library, it doesn’t know in advance where it might be loaded. Linker …. Shared Library ? ? ? Memory
  • 51.
    5. Relocation andPosition Independent Code(PIC) • Position Independent Code(PIC) in shared library - There are two main approaches to solve this problem in Linux ELF shared libraries. - Load-time relocation - Position independent code(PIC)
  • 52.
    5. Relocation andPosition Independent Code(PIC) • Some problems of long-time relocation - The performance problem. - The non-shareable text section problem
  • 53.
    5. Relocation andPosition Independent Code(PIC) • PIC - Introduction - Add an additional level of indirection to all global data and function references in the code. - By utilizing some artifacts of the linking and loading processes, it’s possible to make the text section of the shared library position independent.
  • 54.
    5. Relocation andPosition Independent Code(PIC) • PIC – Introduction(cont) - The linker knows both about the sizes of the sections and about their relative locations.
  • 55.
    5. Relocation andPosition Independent Code(PIC) • PIC – The Global Offset Table(GOT) - A table of addresses, residing in the data section. - Get rid of a relocation in the code section by using GOT - But, the GOT still has to contain the absolute address
  • 56.
    5. Relocation andPosition Independent Code(PIC) • PIC – Lazy binding optimization - The work is delayed until the last moment when it’s actually needed - The intention of avoid doing this work if its results are never required during a specific run of a program. Lazy binding 개념은 PLT에 접목되어 있다.
  • 57.
    5. Relocation andPosition Independent Code(PIC) • PIC – The Procedure Linkage Table(PLT) - The PLT is part of the executable text section. - a set of entries. (one for each external function the shared library calls)
  • 58.
    5. Relocation andPosition Independent Code(PIC) • PIC – The Procedure Linkage Table(PLT) Call func@PTL … … PLT[0] : call resolver ….. PLT[n] : jmp *GOT[n] prepare resolver jmp PLT[0] Code : PLT : … GOT[n] : <addr> GOT : In the code, a function func is called. The compiler translates it to a call to func@plt, which is some N-th entry in the PLT
  • 59.
    5. Relocation andPosition Independent Code(PIC) • PIC – The Procedure Linkage Table(PLT) Call func@PTL … … PLT[0] : call resolver ….. PLT[n] : jmp *GOT[n] prepare resolver jmp PLT[0] Code : PLT : … GOT[n] : <addr> GOT : The PLT consists of a special first entry, followed by a bunch of identically structured entries, one for each function needing resolution
  • 60.
    5. Relocation andPosition Independent Code(PIC) • PIC – The Procedure Linkage Table(PLT) Call func@PTL … … PLT[0] : call resolver ….. PLT[n] : jmp *GOT[n] prepare resolver jmp PLT[0] Code : PLT : … GOT[n] : <addr> GOT : 1. A jump to a location which is specified in a corresponding GOT entry. 2. Preparation of arguments for a ‘resolver’ routine 3. Call to the resolver routine, which resides in the first entry of the PLT
  • 61.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    Before the function's actual address has been resolved, the N-th GOT entry simply points back to the instruction right after the jump in PLT[n] (the "prepare resolver" step).
  • 62.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    The resolver determines the actual address of func, places it into GOT[n], and calls func.
  • 63.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    After the first call, GOT[n] points to the actual func instead of back into the PLT, so later calls through PLT[n] jump straight to func.
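The lazy-binding sequence on these slides can be modelled in plain C with function pointers. This is only a sketch under stated assumptions: `got_n`, `plt_n`, `resolver_stub`, `resolver_runs` and `call_func` are hypothetical names, and a real resolver (the dynamic linker) looks the symbol up by name instead of assigning an already known function.

```c
typedef int (*func_ptr)(int);

/* Hypothetical target function that a shared library would export. */
static int func(int x) { return x + 1; }

/* Counts how many times the simulated resolver has run. */
int resolver_runs = 0;

static int resolver_stub(int x);

/* Simulated GOT[n]: before resolution it points back into the PLT
   (the "prepare resolver; jmp PLT[0]" part), modelled by resolver_stub. */
static func_ptr got_n = resolver_stub;

/* Simulated PLT[n]: "jmp *GOT[n]", an indirect transfer through the GOT slot. */
static int plt_n(int x) { return got_n(x); }

/* Simulated resolver: finds func, patches GOT[n], then calls func. */
static int resolver_stub(int x) {
    resolver_runs++;
    got_n = func;      /* later calls through PLT[n] go straight to func */
    return got_n(x);
}

/* Call site: corresponds to the compiler's "call func@plt". */
int call_func(int x) { return plt_n(x); }
```

Calling `call_func` twice runs the resolver only on the first call, which mirrors GOT[n] being patched once and then used directly.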
  • 64.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
  • 65.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    This code will be compiled into libmlpic.so, and the focus is going to be on the call to ml_util_func from ml_func. Let's first disassemble ml_func:
  • 66.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    - A jump to an address specified in the GOT
    - Preparation of arguments for the resolver
    - Call to the resolver
  • 67.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    got.plt address = 0x1a9e + 0x56e = 0x2000
  • 68.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    In pseudo-assembly, we replace an absolute addressing instruction:
        ; Place the value of the variable in edx
        mov edx, [ADDR_OF_VAR]
    with displacement addressing from a register, along with an extra indirection:
        ; 1. Somehow get the address of the GOT into ebx
        lea ebx, ADDR_OF_GOT
        ; 2. Suppose ADDR_OF_VAR is stored at offset 0x10
        ;    in the GOT. Then this will place ADDR_OF_VAR into edx.
        mov edx, DWORD PTR [ebx + 0x10]
        ; 3. Finally, access the variable and place its value into edx.
        mov edx, DWORD PTR [edx]
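The three steps of this pseudo-assembly can be mirrored in C, with an array of pointers standing in for the GOT. This is a minimal model, not how a compiler actually emits PIC: `got`, `VAR_SLOT`, `relocate_got` and `load_var_pic` are hypothetical names, and the slot index is chosen so that it matches the 0x10 byte offset with 4-byte 32-bit x86 entries.

```c
/* Hypothetical global variable defined by some module. */
int var = 123;

#define GOT_ENTRIES 8
#define VAR_SLOT    4   /* byte offset 0x10 / 4-byte entries on 32-bit x86 */

/* Model of the GOT: a writable table of addresses in the data section. */
void *got[GOT_ENTRIES];

/* Model of the dynamic linker applying the GOT relocation: once the final
   address of var is known, it is written into var's GOT slot. */
void relocate_got(void) { got[VAR_SLOT] = &var; }

/* Model of the PIC access sequence. */
int load_var_pic(void) {
    void **got_base = got;                  /* 1. address of the GOT ("ebx")     */
    int *addr_of_var = got_base[VAR_SLOT];  /* 2. load ADDR_OF_VAR from the GOT  */
    return *addr_of_var;                    /* 3. load the variable's value      */
}
```

The code never embeds the address of `var`; only the GOT slot changes when the variable's location is fixed, which is why the code section itself needs no relocation.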
  • 69.
    5. Relocation and Position Independent Code(PIC)
    • PIC – The Procedure Linkage Table(PLT)
    got.plt address = 0x1a9e + 0x56e = 0x2000
    got.plt address of ml_util_func = 0x1a9e + 0x56e + offset (0x14) = 0x2014
  • 70.
    5. Relocation and Position Independent Code(PIC)
    • PIC in shared libraries on x64
    - In general, the idea is similar on x86 and x64, but some details differ because of features unique to each architecture.
  • 71.
    5. Relocation and Position Independent Code(PIC)
    • RIP-relative addressing
    - 0x6bf: this instruction places the address of myglob into rax by reading an entry in the GOT.
    - It uses RIP-relative addressing: the displacement is added to the address of the next instruction (0x6c6).
    - 0x6c6 + 0x200912 = 0x200fd8
  • 72.
    5. Relocation and Position Independent Code(PIC)
    • RIP-relative addressing
    - The GOT starts at 0x200fd0 and its entries are 8 bytes, so myglob (at 0x200fd8) is in its second entry.
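The arithmetic on these two slides can be checked with a small helper. It assumes, as x86-64 defines, that a RIP-relative displacement is added to the address of the next instruction, and that GOT entries are 8 bytes; the function names are hypothetical, and the 7-byte instruction length is taken from the addresses above (0x6c6 - 0x6bf).

```c
#include <stdint.h>

/* Target of a RIP-relative memory operand: the signed 32-bit displacement
   is added to the address of the instruction that follows. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp32) {
    uint64_t next_insn = insn_addr + insn_len;
    return next_insn + (uint64_t)(int64_t)disp32;  /* sign-extend, wrap modulo 2^64 */
}

/* Zero-based index of an 8-byte GOT entry relative to the start of the GOT. */
uint64_t got_entry_index(uint64_t got_start, uint64_t entry_addr) {
    return (entry_addr - got_start) / 8;
}
```

For the slide's values, `rip_relative_target(0x6bf, 7, 0x200912)` gives 0x200fd8, and `got_entry_index(0x200fd0, 0x200fd8)` gives 1, the second entry.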
  • 73.
    5. Relocation and Position Independent Code(PIC)
    • RIP-relative addressing
    - The relocation inserted for the GOT reference to myglob is a relocation entry for 0x200fd8.
    - It tells the dynamic linker to place the address of myglob into that GOT entry once the final address of the symbol is known.
  • 74.
    5. Relocation and Position Independent Code(PIC)
    • x64 PIC with function calls
  • 75.
    5. Relocation and Position Independent Code(PIC)
    • x64 PIC with function calls
    - The GOT entry holding the actual address of ml_util_func is at 0x200a1a + 0x606 = 0x201020.
  • 76.
    5. Relocation and Position Independent Code(PIC)
    • Performance implications
  • 77.
    5. Relocation and Position Independent Code(PIC)
    • PIC vs non-PIC
    - PIC accounts for much of the complexity in ELF.
    - The PIC version includes a global offset table; the non-PIC version does not.
  • 78.
    5. Relocation and Position Independent Code(PIC)
    • PIC in shared libraries on x64
    - In general, the idea is similar on x86 and x64, but some details differ because of features unique to each architecture.
    - RIP-relative addressing mode
  • 79.
    6. Stripping an ELF object
  • 80.
    6. Stripping an ELF object
    - ELF objects can be stripped.
    - Stripping removes the main symbol table (.symtab) and other sections that are not needed at run time; the dynamic symbol table (.dynsym) stays because the dynamic linker needs it.
    - Command 'strip': discards symbols from object files.
  • 81.
    6. Stripping an ELF object
    • strip
  • 82.
    6. Stripping an ELF object
    [Figure: sections of an ELF object before and after running strip]
    Sections: note.gnu.build-i, got.plt, bss, init, comment, init_array, data, jcr, dynamic, note, dynstr, plt, dynsym, rela.dyn, eh_frame, rela.plt, eh_frame_hdr, shstrtab, fini, strtab, fini_array, symtab, gnu.version, text, gnu.version_r, got, gnu.hash
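Whether a file still has its main symbol table can be checked by walking the section header table and looking for a section of type SHT_SYMTAB, which strip removes while the SHT_DYNSYM table stays. A minimal sketch, assuming a Linux system that provides <elf.h>, a 64-bit ELF file in the host's byte order, and no extended section numbering; `has_symtab` is a hypothetical helper name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a SHT_SYMTAB section,
   0 if it has none (e.g. after strip), -1 on I/O error or non-ELF64 input.
   Assumes host byte order; extended numbering (e_shnum == 0) is not handled. */
int has_symtab(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    Elf64_Ehdr eh;
    int result = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto out;

    result = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        /* Section header i lives at e_shoff + i * e_shentsize. */
        long pos = (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize);
        if (fseek(f, pos, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
            result = -1;
            break;
        }
        if (sh.sh_type == SHT_SYMTAB) {   /* .symtab: removed by strip */
            result = 1;
            break;
        }
    }
out:
    fclose(f);
    return result;
}
```

For an ordinary unstripped executable, `has_symtab("./a.out")` would typically return 1 before `strip ./a.out` and 0 afterwards.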
  • 83.
  • 84.
  • 85.
    7. Symbol Resolution
    - Important symbols in each of these ELF files
  • 87.
  • 88.
  • 89.
  • 90.
    7. Symbol Resolution
    • -Bsymbolic option
    When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms that support shared libraries.
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98.
  • 99.
    7. Symbol Resolution
    - Important symbols in each of these ELF files
  • 100.
    8. Use of Weak Symbols for Problem Investigations
  • 101.
    8. Use of Weak Symbols for Problem Investigations
    • Strong and Weak Symbols
    - Program symbols are either strong or weak.
      Strong: procedures and initialized globals.
      Weak: uninitialized globals.
  • 102.
    8. Use of Weak Symbols for Problem Investigations
    • Linker's Symbol Rules
    Rule 1: Multiple strong symbols are not allowed.
    - Each item can be defined only once.
    - Otherwise: linker error.
    Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
    - References to the weak symbol resolve to the strong symbol.
    Rule 3: If there are multiple weak symbols, pick an arbitrary one.
    - Can override this with gcc -fno-common, which turns multiple uninitialized definitions into a link error (GCC 10 and later use -fno-common by default).
  • 103.
    8. Use of Weak Symbols for Problem Investigations
    • Linker's Puzzle
  • 104.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Global Variables
    - Avoid them if you can
      : side effects
      : scope does matter
    - Otherwise
      : use static if you can
      : initialize a global variable when you define it
      : use extern when you reference a global variable defined elsewhere
  • 105.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Syntax
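With GCC and Clang, a symbol can be made weak explicitly with `__attribute__((weak))` (or `#pragma weak name`). The sketch below shows both uses, a weak definition and a weak reference; `checksum`, `debug_hook` and `run_debug_hook` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Weak definition: a default implementation. A strong definition of the
   same name in another object file replaces it at link time (Rule 2). */
__attribute__((weak)) unsigned checksum(const unsigned char *buf, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Weak reference (declaration only): if no object defines debug_hook,
   the linker resolves its address to 0 instead of reporting an error. */
extern void debug_hook(void) __attribute__((weak));

/* Calls the hook only if some object file actually provides it. */
int run_debug_hook(void) {
    if (debug_hook) {
        debug_hook();
        return 1;
    }
    return 0;
}
```

Linking in another object file that defines a strong `checksum` or a `debug_hook` would change the behavior without touching this file, which is what makes weak symbols useful for swapping in instrumented versions during an investigation.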
  • 106.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Example
  • 107.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Example
  • 108.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Example
    [Figure: slow vs. fast]
  • 109.
    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Example
    - Source of an interception library for malloc, free, and realloc
  • 110.
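    8. Use of Weak Symbols for Problem Investigations
    • Weak symbol – Example
    - The LD_PRELOAD environment variable instructs the program interpreter (the dynamic linker) to load a library before executing a program, so the preloaded library's definitions of malloc, free, and realloc take precedence over those in libc.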
  • 111.
  • 112.
  • 113.