Program Structure in GNU/Linux (ELF Format)


Published in: Education

  1. 1. Program Structure in GNU/Linux. Author: Varun Mahajan
  2. 2. Contents
       $ gcc *.c -o Program
       – Processing of a User Program
           • Preprocessing
           • Compilation
           • Assembly
           • Linking
       – ELF Format
       The content is specific to a GNU/Linux system running on Intel architecture.
  3. 3. Processing of a User Program
       Input: .c, .h (C code)
       ● cpp (C pre-processor) produces .i (preprocessed C code):
           $ cpp main.c main.i   OR   $ gcc -E main.c -o main.i
       ● cc1 (C compiler) produces .s (assembly code):
           $ /usr/lib/gcc/i486-linux-gnu/4.3.2/cc1 -fpreprocessed main.i -o main.s -quiet   OR   $ gcc -S main.i -o main.s
       ● as (assembler) produces .o (object code):
           $ as main.s -o main.o   OR   $ gcc -c main.s -o main.o
  4. 4. Program
  5. 5. ELF Format: Object Files
       ● ELF Header
       ● Program Header Table (optional)
       ● Section Header Table
       ● Section 1 ... Section n
       Except for the ELF Header, which is at the beginning of the file, the remaining components may appear in any order.
  6. 6. ELF Header (.o)
       $ readelf -h main.o
       Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution.
       Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image.
       An ELF header resides at the beginning of the file and holds a road map describing the file's organization.
       ● ELF Identification (16 bytes):
           ● Magic no: Identifies the file as an ELF object file [0x7f, 'E', 'L', 'F']
           ● Class: Identifies the file's class or capacity. ELF32 supports machines with files and virtual address spaces up to 4 gigabytes
           ● Data: Data encoding for processor-specific data in the object file
           ● Version: ELF header version number
           ● OS/ABI: Operating system
           ● ABI Version: Application Binary Interface version (low-level interface between an application program and the OS)
       ● Type: Type of the object file (Relocatable, Executable, Shared object, etc.)
       ● Machine: The required architecture for the file
       ● Entry point address: The virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, it holds 0
       ● Start of program headers: The program header table's file offset in bytes. If the file has no program header table, it holds 0
       ● Start of section headers: The section header table's file offset in bytes. If the file has no section header table, it holds 0
       ● Flags: Processor-specific flags
       ● Section header string table index: The section header table index of the entry associated with the section name string table (this section holds the section names)
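
The header fields listed above can be decoded by hand. The following is a minimal sketch, assuming an ELF32 little-endian file (the usual case on the i386 systems this deck targets); the function name `parse_elf32_header` and the demo header bytes are hypothetical and only illustrate the layout, they are not taken from the slides.

```python
import struct

# Elf32_Ehdr fields that follow the 16-byte e_ident array (little-endian):
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR32_REST = struct.Struct("<HHIIIIIHHHHHH")

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN"}


def parse_elf32_header(data):
    """Decode the ELF32 little-endian header fields shown by readelf -h."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad magic number: not an ELF file")
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("this sketch only handles ELF32, little-endian")
    (e_type, e_machine, _e_version, e_entry, e_phoff, e_shoff, e_flags,
     _e_ehsize, _e_phentsize, e_phnum, _e_shentsize, e_shnum,
     e_shstrndx) = EHDR32_REST.unpack_from(data, 16)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
        "flags": e_flags,
        "phnum": e_phnum,
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,
    }


# Hypothetical header resembling a relocatable i386 object file:
# no entry point and no program header table, so both fields hold 0.
demo_ident = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + bytes(7)
demo_rest = EHDR32_REST.pack(1, 3, 1, 0, 0, 0x1A0, 0, 52, 0, 0, 40, 11, 8)
header = parse_elf32_header(demo_ident + demo_rest)
print(header)
```

To try it on a real object file, pass the file's bytes instead of the demo header, for example `parse_elf32_header(open("main.o", "rb").read())`.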
  7. 7. Section Header Table (.o)
       See also: Section Header Table (executable)
       $ readelf -S main.o
       $ readelf -p .shstrtab main.o
       A Section Header Table is an array of Section Headers.
       ● Name: Name of the section
       ● Type: Type of the section
           ● PROGBITS: Holds information whose format and meaning are determined solely by the program
           ● REL: Holds relocation entries without explicit addends
           ● NOBITS: Occupies no space in the file but otherwise resembles PROGBITS
           ● STRTAB: Holds a string table
           ● SYMTAB: Holds a symbol table
       ● Addr: If this section will appear in the memory image of a process, this member gives the address at which the section's first byte should reside. Otherwise it contains 0
       ● Off (Offset): The byte offset from the beginning of the file to the first byte in the section
       ● Size: The section's size in bytes
       ● ES (Entry Size): Size in bytes of each entry (for sections that hold a table of fixed-size entries)
       ● Flg (Flags): Miscellaneous attributes
           ● W: Contains data that should be writable during process execution
           ● X: Contains executable machine instructions
           ● A: Occupies memory during process execution
       ● Lk (Link), Inf (Info): Interpretation depends on the section type
       ● Al (Address Align): Some sections have address alignment constraints (0, 1: no constraints)
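
As a rough illustration of two of the columns above, the sketch below turns a raw flags word into the W/A/X letters and looks up a section name in the section name string table. The bit values are the SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR constants from the ELF specification; the helper names and the sample string table are hypothetical.

```python
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def flag_letters(sh_flags):
    """Return the Flg column letters for a section's sh_flags value."""
    letters = ""
    for bit, letter in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"), (SHF_EXECINSTR, "X")):
        if sh_flags & bit:
            letters += letter
    return letters


def section_name(shstrtab, sh_name):
    """sh_name is a byte offset into .shstrtab; names are NUL-terminated."""
    end = shstrtab.index(b"\0", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# Hypothetical contents of a small .shstrtab section.
shstrtab = b"\0.text\0.data\0.bss\0.shstrtab\0"

print(section_name(shstrtab, 1), flag_letters(SHF_ALLOC | SHF_EXECINSTR))
print(section_name(shstrtab, 7), flag_letters(SHF_WRITE | SHF_ALLOC))
```

The Name column is stored as an offset rather than as text, which is why the header needs the section header string table index to find the names.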
  8. 8. .symtab Section: Symbol Table (.o)
       See also: .symtab & .dynsym Sections: Symbol Tables (executable)
       $ readelf -s main.o
       $ readelf -p .strtab main.o
       The Symbol Table holds the information needed to locate and relocate a program's symbolic definitions and references.
       ● Name: Symbol name
       ● Type: Symbol type
           ● NOTYPE: Type not specified
           ● OBJECT: Symbol is associated with a data object
           ● FUNC: Symbol is associated with a function or other executable code
           ● SECTION: Symbol is associated with a section
           ● FILE: File symbol
       ● Bind:
           ● LOCAL: Symbol is not visible outside the object file in which it is defined
           ● GLOBAL: Symbol is visible to all object files being combined
       ● Size: Size in bytes (for symbols that have an associated size, e.g. data objects). 0 if the symbol has no size or an unknown size
       ● Ndx (Index):
           ● Relevant section header table index
           ● UND: Undefined, missing, irrelevant or otherwise meaningless section reference
           ● COM: Unallocated C external variables
           ● ABS: Specifies an absolute value for the corresponding reference
       ● Value: For relocatable files:
           ● Alignment constraints for a symbol whose Ndx is COM
           ● Section offset for a defined symbol
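
In the file, Bind and Type share a single byte (`st_info`): the binding is the upper four bits and the type the lower four. The sketch below decodes that byte and the special section indices, using constant values from the ELF specification; WEAK is included from the specification although the slide does not list it, and the function name is hypothetical.

```python
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
SPECIAL_NDX = {0x0000: "UND", 0xFFF1: "ABS", 0xFFF2: "COM"}


def describe_symbol(st_info, st_shndx):
    """Return (Bind, Type, Ndx) as readelf -s would label them."""
    bind = BINDINGS.get(st_info >> 4, str(st_info >> 4))
    sym_type = TYPES.get(st_info & 0xF, str(st_info & 0xF))
    ndx = SPECIAL_NDX.get(st_shndx, str(st_shndx))
    return bind, sym_type, ndx


# Hypothetical entries: a defined global function in section 1,
# an undefined external reference, and an unallocated common variable.
print(describe_symbol(0x12, 1))
print(describe_symbol(0x10, 0))
print(describe_symbol(0x11, 0xFFF2))
```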
  9. 9. .data & .bss Sections (.o)
       See also: .data & .bss Sections (executable)
       $ objdump -DxtT main.o
       ● .data: Holds initialized data that contributes to the program's memory image
       ● .bss: Holds uninitialized data that contributes to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space
  10. 10. .rodata Section (.o)
       $ objdump -s main.o
       $ readelf -p .rodata main.o
       The .rodata section holds read-only data that typically contributes to a non-writable segment in the process image.
  11. 11. .text Section (.o)
       See also: .text Section (executable)
       $ objdump -DxtT main.o
       The .text section holds the executable instructions of the program.
  12. 12. .rel.text Section (.o)
       $ readelf -r main.o
       .rel.text holds the relocation entries for the .text section.
       Relocation entries serve two functions. When a section of code is relocated to a different base address, relocation entries mark the places in the code that have to be modified. In a linkable file, there are also relocation entries that mark references to undefined symbols, so the linker knows where to patch in the symbol's value when the symbol is finally defined.
       Section header table entry:
       ● Lk (Link): Section header index of the associated symbol table
       ● Inf (Info): Section header index of the section to which the relocation applies
       Relocation section entries:
       ● Offset: The location at which to apply the relocation action. For a relocatable file:
           ● The byte offset from the beginning of the section to the storage unit affected by the relocation
       ● Info:
           ● ((info) >> 8) is the symbol table index with respect to which the relocation should be made. E.g. a call instruction's entry holds the symbol table index of the function being called: efunc: (0x1302 >> 8) = 0x13 = 19
           ● ((info) & 0xff) is the relocation type (processor specific). E.g. efunc: (0x1302 & 0xff) = 0x02 (R_386_PC32); gei: (0xf01 & 0xff) = 0x01 (R_386_32)
       The Link Editor merges one or more relocatable files to form the output (executable or shared object file). It first decides how to combine and locate the input files, then updates the symbol values, and finally performs relocation.
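
The Info arithmetic above, and the patching step the Link Editor performs afterwards, can be sketched as follows. This assumes i386 REL-style entries, where the addend is stored in the bytes being patched, and the standard formulas S + A for R_386_32 and S + A - P for R_386_PC32; the addresses and the function names `split_r_info` and `apply_rel` are hypothetical.

```python
import struct

R_386_32, R_386_PC32 = 1, 2


def split_r_info(r_info):
    """Return (symbol table index, relocation type) for an ELF32 r_info."""
    return r_info >> 8, r_info & 0xFF


def apply_rel(section, r_offset, r_type, sym_addr, section_addr):
    """Patch one 32-bit field in place; the existing bytes hold the addend."""
    (addend,) = struct.unpack_from("<i", section, r_offset)
    place = section_addr + r_offset  # P: address of the field being patched
    if r_type == R_386_32:
        value = sym_addr + addend            # S + A
    elif r_type == R_386_PC32:
        value = sym_addr + addend - place    # S + A - P
    else:
        raise ValueError("relocation type not handled in this sketch")
    struct.pack_into("<I", section, r_offset, value & 0xFFFFFFFF)


print(split_r_info(0x1302))  # efunc entry from the slide
print(split_r_info(0xF01))   # gei entry from the slide

# A call instruction as the assembler leaves it: opcode e8, addend -4.
text = bytearray.fromhex("e8fcffffff")
# Hypothetical layout: .text placed at 0x08048000, callee at 0x08048100.
apply_rel(text, 1, R_386_PC32, sym_addr=0x08048100, section_addr=0x08048000)
print(text.hex())
```

The addend of -4 compensates for the processor measuring the call displacement from the end of the four-byte field rather than from its start.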
  13. 13. Linking with External Libraries
       A library is a collection of precompiled object files that can be linked into programs, e.g. the C math library.
       Two types:
       ● Static Library: Archive file (.a). A collection of ordinary object files created using the GNU archiver (ar). When a program is linked against a static library, the machine code from the object files for any external functions used by the program is copied from the library into the final executable (Static Linking)
       ● Shared Library: Shared Object (.so). It is created from the object files using the -shared option of gcc. An executable file linked against a shared library contains only a small table of the functions it requires, instead of the complete machine code from the object files for the external functions. Before the executable file starts running, the machine code for the external functions is copied into memory from the shared library file on disk by the operating system (Dynamic Linking)
       The standard system libraries are usually found in the directories ‘/usr/lib’ and ‘/lib’.
  14. 14. Types of Object Files
       ● Relocatable File: Holds code and data suitable for linking with other object files to create an executable or shared object file
       ● Executable File: Holds a program suitable for execution
       ● Shared Object File: Holds code and data suitable for linking in two contexts:
           ● The Link Editor may process it with other relocatable and shared object files to create another object file
           ● The Dynamic Linker combines it with an executable file and other shared objects to create a process image
  15. 15. Processing of a User Program contd...
       Inputs: main.o, edf.o (Relocatable); *.o (Relocatable); *.a (Static Libraries); *.so (Shared Libraries)
       ● ld (Link Editor) produces Program (Executable):
           $ ld -dynamic-linker /lib/ /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.3.2/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.3.2/ main.o edf.o -lgcc -lgcc_eh -lc -lgcc_eh /usr/lib/gcc/i486-linux-gnu/4.3.2/crtend.o /usr/lib/crtn.o -o Program
  16. 16. ELF Header (.o, executable, .so)
       $ readelf -h main.o
       $ readelf -h Program
       $ readelf -h /lib/
  17. 17. Section Header Table (executable)
       See also: Section Header Table (.o)
       $ readelf -S Program
       ● Type:
           ● NOTE: Holds information that marks the file in some way
           ● HASH: Holds a symbol hash table
           ● DYNSYM: Holds a symbol table
           ● DYNAMIC: Holds information for dynamic linking
  18. 18. .symtab & .dynsym Sections: Symbol Tables (executable)
       See also: .symtab Section: Symbol Table (.o)
       $ readelf -s Program
  19. 19. .data & .bss Sections (executable)
       See also: .data & .bss Sections (.o)
       $ objdump -DxtT Program
  20. 20. .text Section (executable)
       See also: .text Section (.o)
       $ objdump -d Program
  21. 21. Program Header Table (executable)
       $ readelf -l Program
       An object file Segment contains one or more Sections.
       The Program Header Table is an array of structures, each describing a Segment or other information the system needs to prepare the program for execution.
       ● Offset: Offset from the beginning of the file at which the first byte of the segment resides
       ● VirtAddr: The virtual address at which the first byte of the segment resides in memory
       ● FileSiz: Number of bytes in the file image of the segment
       ● MemSiz: Number of bytes in the memory image of the segment
       ● Flg: Permissions (R W E)
       ● Type:
           ● PHDR: Specifies the location and size of the program header table itself, both in the file and in the memory image of the program
           ● INTERP: Specifies the location and size of a null-terminated path name to invoke as an interpreter
           ● LOAD: Loadable segment
           ● DYNAMIC: Specifies dynamic linking information
       ● Align: Gives the value to which the segments are aligned in memory and in the file
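
One entry of the table described above can be decoded as in the sketch below, assuming the ELF32 little-endian layout (eight 32-bit words per entry). The type and flag constants come from the ELF specification; the function name and the sample values are hypothetical. The difference between MemSiz and FileSiz is the part of the segment that has no bytes in the file and is zero-filled at load time, which is where .bss ends up.

```python
import struct

# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr,
#             p_filesz, p_memsz, p_flags, p_align
PHDR32 = struct.Struct("<8I")

PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def parse_phdr32(data, offset=0):
    """Decode one program header entry into readelf -l style fields."""
    (p_type, p_offset, p_vaddr, _p_paddr,
     p_filesz, p_memsz, p_flags, p_align) = PHDR32.unpack_from(data, offset)
    flags = (("R" if p_flags & PF_R else " ")
             + ("W" if p_flags & PF_W else " ")
             + ("E" if p_flags & PF_X else " "))
    return {
        "type": PT_NAMES.get(p_type, hex(p_type)),
        "offset": p_offset,
        "vaddr": p_vaddr,
        "filesz": p_filesz,
        "memsz": p_memsz,
        "flags": flags,
        "align": p_align,
        "zero_filled": p_memsz - p_filesz,  # bytes present only in memory
    }


# Hypothetical writable data segment whose memory image is 8 bytes
# larger than its file image.
raw = PHDR32.pack(1, 0x000F14, 0x08049F14, 0x08049F14, 0x100, 0x108,
                  PF_R | PF_W, 0x1000)
segment = parse_phdr32(raw)
print(segment)
```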
  22. 22. Brief description of some Sections
       ● The following sections provide information for dynamic linking:
           ● .dynsym: Holds the dynamic linking symbol table
           ● .dynstr: Holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries
           ● .interp: Holds the pathname of the program interpreter
           ● .hash: Holds a symbol hash table
           ● .dynamic: Holds dynamic linking information
           ● .relname & .relaname: Hold relocation information
           ● .got & .plt: Global offset table, Procedure linkage table (content is processor specific)
       ● Initialization and termination:
           ● .init: Holds executable instructions that contribute to the process initialization code. When a program starts to run, the system executes the code in this section before calling the main program entry point
           ● .fini: Holds executable instructions that contribute to the process termination code. When a program exits normally, the system executes the code in this section
  23. 23. Segment Loading
       ● Executable file segments typically contain absolute code. To let the process execute correctly, the segments must reside at the virtual addresses used to build the executable
       ● Shared object segments typically contain position-independent code. This lets a segment's virtual address change from one process to another without invalidating the execution behavior
  24. 24. An Example of Dynamic Linking
  25. 25. An Example of Dynamic Linking
  26. 26. An Example of Dynamic Linking Dynamic linker updates this with the Virtual address of printf function
  27. 27. An Example of Dynamic Linking Dynamic linker updates this with the Virtual address of printf function
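
The update described on the two slides above can be pictured with a toy model. This is only a sketch, assuming lazy binding (the slot is resolved on the first call rather than at load time); it uses Python callables in place of machine addresses, and every name in it (`make_plt_entry`, `fake_lookup`, the stand-in `printf`) is hypothetical rather than part of any real dynamic linker interface.

```python
def make_plt_entry(symbol, dynamic_linker_lookup):
    got = {}  # toy Global Offset Table: symbol name -> callable "address"

    def resolver_stub(*args):
        # First call: ask the simulated dynamic linker for the real
        # function and overwrite the GOT slot with it.
        got[symbol] = dynamic_linker_lookup(symbol)
        return got[symbol](*args)

    got[symbol] = resolver_stub  # the slot initially leads to the resolver

    def plt_call(*args):
        # The PLT entry always jumps through whatever the GOT slot holds.
        return got[symbol](*args)

    return got, plt_call


lookups = []  # records how often the simulated dynamic linker is consulted


def fake_lookup(name):
    lookups.append(name)
    # Stand-in for the shared library: "printf" just returns the length.
    return {"printf": lambda text: len(text)}[name]


got, call_printf = make_plt_entry("printf", fake_lookup)
first = call_printf("hello\n")   # resolved through the stub, slot patched
second = call_printf("hi\n")     # goes straight through the patched slot
print(first, second, lookups)
```

After the first call the slot holds the resolved function, so later calls no longer involve the lookup step.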
  28. 28. References
       ● Introduction to gcc
       ● Manuals: gcc, ld, ldd, objdump, nm, readelf
       ● linkers & loaders
       ● Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification Version 1.2
  29. 29. END...