Mach-O Internals


Published in: Technology
  • As a result of compiling we get an object file.
  • An object file contains:
      • Header information: overall information about the file, such as the size of the code, the name of the source file it was translated from, and the creation date.
      • Object code: binary instructions and data generated by a compiler or assembler.
      • Relocation: a list of the places in the object code that have to be fixed up when the linker changes the addresses of the object code.
      • Symbols: global symbols defined in this module, and symbols to be imported from other modules or defined by the linker.
      • Debugging information: other information about the object code that is not needed for linking but is of use to a debugger. This includes source file and line number information, local symbols, and descriptions of data structures used by the object code, such as C structure definitions.
  • A Mach-O file has three parts:
      • Header: stores information on the target architecture and various options that affect how the rest of the file is interpreted.
      • Load commands: tell the loader how and where to load the parts of the Mach-O file, namely segments (see below) and symbol tables. They also list the libraries this file depends on, so that those can be loaded first.
      • Segments: describe the regions of memory into which sections with code or data are loaded.
  • Everything begins with a magic value (0xFEEDFACE, or the byte-swapped 0xCEFAEDFE, depending on the byte order of machine words). It is followed by the processor architecture type, the number and total size of the load commands, and flags that describe other specifics.
  • The main load commands are:
      • LC_SEGMENT: contains information on a certain segment: size, number of sections, and offset in the file and in memory (after loading)
      • LC_SYMTAB: loads the symbol and string tables
      • LC_DYSYMTAB: creates an import table; data on symbols is taken from the symbol table
      • LC_LOAD_DYLIB: declares a dependency on a certain third-party library
    (The slide shows an example for the 32- and 64-bit versions, respectively.)
    The most important segments are:
      • __TEXT: the executable code and other read-only data
      • __DATA: writable data, including import tables that the dynamic loader can change during lazy binding
      • __OBJC: information for the Objective-C runtime library
      • __IMPORT: import table, only for the 32-bit architecture (I managed to generate it only on Mac OS 10.5)
      • __LINKEDIT: where the dynamic loader keeps its data for already loaded modules (symbol tables, string tables, etc.)
    The most interesting sections in these segments are:
      • __TEXT,__text: the code itself
      • __TEXT,__cstring: constant strings (in double quotes)
      • __TEXT,__const: various constants
      • __DATA,__data: initialized variables (strings and arrays)
      • __DATA,__la_symbol_ptr: table of pointers to imported functions
      • __DATA,__bss: uninitialized static variables
  • It is also worth mentioning that executable files and libraries have "learned" to store several variants of the executable code at once. This is a consequence of Apple's repeated changes of target architecture (Motorola -> IBM -> Intel). Such files are generally called fat binaries. In effect, a fat binary is several Mach-O files gathered into one file that begins with a special header. The header contains the number and type of supported architectures and the offset to each of them. An ordinary Mach-O file with the structure described above is located at each such offset. In the header, magic is 0xCAFEBABE (or the byte-swapped value; remember that byte order in machine words differs between processors). Exactly nfat_arch structures of the type described below then follow.
  • Welcome to __TEXT, __symbol_stub1. This table is a set of JMP instructions, one for each imported function. In our case there is only one such instruction, shown above.
  • Each such instruction performs a jump to the address stored in the corresponding cell of the __DATA, __la_symbol_ptr table. The latter is the import table for this Mach-O file.
  • We arrive in the __TEXT, __stub_helper section. In effect, it is the PLT (Procedure Linkage Table) for Mach-O. With the first instruction (in our case an LEA paired with R11, though it could also be a simple PUSH), the dynamic linker remembers which symbol requires relocation. The second instruction always leads to the same address: the beginning of the function __dyld_stub_binding_helper, which performs the linking.
  • After the dynamic linker performs the relocation for puts(), the corresponding cell in __DATA, __la_symbol_ptr holds the address of the puts() function from the libSystem.B.dylib module. This means that we can get the desired call redirection by replacing that address with our own.
  • Now let's arm ourselves with Mach-O View and explore the generated files.

    Mach-O Internals
      • Anthony Shoumikhin
    Agenda
      • Program linking and loading on Mac OS X
      • Mach-O structure
      • Dynamic linking details
      • Run-time hooking
    Compiling
      • Converting a human-readable text file to a Mach-O binary
          • Preprocessing
          • Generating assembler
          • Assembling to an object file
    Compiling
      • clang -c test.c
          • clang -E # Preprocess, but don't compile
          • clang -S # Compile, but don't assemble
          • clang -c # Assemble, but don't link
      • Object file (Mach-O format)
    Object file
      • Generated by the compiler (clang -c)
          • Header information
          • Object code
          • Relocation
          • Symbols
          • Debugging info
    Symbols in object files
      • Calls in code
          • Defined functions
          • Undefined functions
      • References to static data
          • Defined variables
          • Undefined variables
    Linking
      • The process of resolving undefined symbols
    Linking
      • ld just converts Mach-O files of one type to another
      • Executables and dynamically linked Mach-O files have no undefined symbols
    Dynamically linked library
      • A complete Mach-O file without startup code
      • Is linked against like any other object file by ld, but does not become part of the executable
      • Can be loaded at executable startup or manually in code at any moment
    Loading
      • Transferring a Mach-O file into process memory
    Process memory layout
      • Arguments & environment
      • Stack
      • Unused memory
      • Heap
      • Uninitialized data
      • Initialized data
      • Text
    File mapping into memory
      • Code maps read-only
      • Data maps copy-on-write
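The copy-on-write behaviour of a private file mapping can be observed directly with POSIX mmap. The sketch below is only an illustration under stated assumptions: the function name and the temporary file path are hypothetical, and it maps a scratch file rather than a real Mach-O data segment. It writes one byte to a file, maps the file with MAP_PRIVATE, changes the byte through the mapping, and then reads the file again.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and pread() */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte found in the file after a private
 * mapping of that file has been modified, or -1 on any error. */
int file_byte_after_private_write(void)
{
    char path[] = "/tmp/cow_sketch_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file lives until fd is closed */

    int result = -1;
    char on_disk = 0;
    if (write(fd, "A", 1) == 1) {
        char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            map[0] = 'B';         /* the kernel gives us a private page copy */
            assert(map[0] == 'B');            /* this process sees its write */
            if (pread(fd, &on_disk, 1, 0) == 1)
                result = on_disk;             /* what the file itself holds  */
            munmap(map, 1);
        }
    }
    close(fd);
    return result;
}
```

The write through the mapping is visible to the process but never reaches the file, which is the property that lets many processes share one library file on disk while each keeps its own modified data pages.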
    Introducing Mach-O
    File layout
    otool – CLI exploring
      • man otool
      • -v (verbose) rulez
        $ otool -h
        (architecture i386):
        Mach header
              magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
         0xFEEDFACE       7          3  0x00           2    19       2356 0x00000085
        (architecture ppc):
        Mach header
              magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
         0xFEEDFACE      18          0  0x00           2    17       2412 0x00000085
    Mach-O View – GUI advantages
    Header
        struct mach_header {
            uint32_t      magic;
            cpu_type_t    cputype;
            cpu_subtype_t cpusubtype;
            uint32_t      filetype;
            uint32_t      ncmds;
            uint32_t      sizeofcmds;
            uint32_t      flags;
        };
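As a rough sketch of how the magic value drives header parsing, the code below copies a 32-bit header out of a byte buffer and swaps every field when the magic shows the opposite byte order. The struct is a local mirror of the one on the slide (the real definition is in <mach-o/loader.h>), and the names mach_header_sketch, swap32 and read_mach_header are assumptions made for this illustration, not part of any Apple API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the slide's struct; cpu_type_t and cpu_subtype_t are
 * 32-bit integers, so plain int32_t is used here. */
struct mach_header_sketch {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};

#define MAGIC_32_NATIVE  0xFEEDFACEu   /* same byte order as the host    */
#define MAGIC_32_SWAPPED 0xCEFAEDFEu   /* written with the opposite order */

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 0 and fills *out on success; -1 if buf is too short or does not
 * start with a 32-bit Mach-O magic. */
int read_mach_header(const uint8_t *buf, size_t len,
                     struct mach_header_sketch *out)
{
    uint32_t words[7];
    assert(sizeof *out == sizeof words);   /* seven 32-bit fields, no padding */

    if (len < sizeof *out)
        return -1;
    memcpy(words, buf, sizeof words);
    if (words[0] == MAGIC_32_SWAPPED) {
        for (int i = 0; i < 7; i++)
            words[i] = swap32(words[i]);
    }
    if (words[0] != MAGIC_32_NATIVE)
        return -1;
    memcpy(out, words, sizeof words);
    return 0;
}
```

Fed the i386 values from the otool listing (cputype 7, ncmds 19, sizeofcmds 2356), the function returns the same fields whether the buffer is in host order or byte-swapped.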
    Load Commands
      • x32
      • x64
    Example - LC_SYMTAB
        struct load_command {
            uint32_t cmd;
            uint32_t cmdsize;
            //custom fields
        };
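Because every load command starts with the same cmd and cmdsize pair, a reader can step through the commands without understanding their custom fields. The following is a minimal sketch of that walk, assuming the commands are already in host byte order and packed back to back in a buffer; load_command_sketch and count_load_commands are illustrative names, while the numeric values of LC_SEGMENT (0x1) and LC_SYMTAB (0x2) match <mach-o/loader.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every load command, as on the slide. */
struct load_command_sketch {
    uint32_t cmd;
    uint32_t cmdsize;
};

#define LC_SEGMENT_VALUE 0x1u
#define LC_SYMTAB_VALUE  0x2u

/* Counts the commands of type `wanted` among `ncmds` load commands stored
 * in buf. Returns -1 if a command is truncated or has a bogus cmdsize. */
int count_load_commands(const uint8_t *buf, size_t len,
                        uint32_t ncmds, uint32_t wanted)
{
    size_t off = 0;
    int hits = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command_sketch lc;

        assert(off <= len);                   /* maintained by the checks below */
        if (len - off < sizeof lc)
            return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off)
            return -1;
        if (lc.cmd == wanted)
            hits++;
        off += lc.cmdsize;                    /* cmdsize covers the custom fields */
    }
    return hits;
}
```

In a real file, buf would point just past the Mach-O header, and ncmds and sizeofcmds from that header would supply the count and the length.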
    Introducing Fat Mach-O
      • Several Mach-O files of different target architectures in one
        struct fat_header
        {
            uint32_t magic; //0xCAFEBABE
            uint32_t nfat_arch;
        };
        struct fat_arch
        {
            cpu_type_t cputype;
            cpu_subtype_t cpusubtype;
            uint32_t offset;
            uint32_t size;
            uint32_t align;
        };
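The two structures above can be read from a file with a few lines of code. The sketch below assumes the fat header and its fat_arch entries are stored big-endian on disk, which is how the format defines them, and decodes them byte by byte so it works on any host. The names fat_arch_sketch, read_be32 and fat_arch_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the slide's fat_arch, with cpu types as 32-bit ints. */
struct fat_arch_sketch {
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t offset;   /* where this architecture's Mach-O starts */
    uint32_t size;
    uint32_t align;    /* alignment as a power of two */
};

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fills *out with entry number `index`. Returns 0 on success, -1 if buf is
 * not a fat file, is truncated, or has fewer than index + 1 entries. */
int fat_arch_at(const uint8_t *buf, size_t len, uint32_t index,
                struct fat_arch_sketch *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 8 || read_be32(buf) != 0xCAFEBABEu)
        return -1;
    uint32_t nfat_arch = read_be32(buf + 4);
    if (index >= nfat_arch || index >= (len - 8) / 20)
        return -1;

    const uint8_t *p = buf + 8 + (size_t)index * 20;   /* 20 bytes per entry */
    out->cputype    = (int32_t)read_be32(p);
    out->cpusubtype = (int32_t)read_be32(p + 4);
    out->offset     = read_be32(p + 8);
    out->size       = read_be32(p + 12);
    out->align      = read_be32(p + 16);
    return 0;
}
```

Once an entry is found, the ordinary Mach-O header for that architecture starts at the returned offset within the same file.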
    Let's explore dynamic linking
      • Test bed
          • File test.c
            void libtest(); //from libtest.dylib
            int main()
            {
                libtest(); //calls puts() from libSystem.B.dylib
                return 0;
            }
          • File libtest.c
            #include <stdio.h>
            void libtest() //just a simple library function
            {
                puts("libtest: calls the original puts()");
            }
    Debugging external call
      • Here is a simple CALL
    Debugging external call
      • Welcome to __TEXT, __symbol_stub1: a set of JMP instructions, one for each imported function
    Debugging external call
      • Each such instruction performs a jump to the address stored in the corresponding cell of the __DATA, __la_symbol_ptr table
    Procedure Linkage Table
      • Welcome to __TEXT, __stub_helper: a PLT for Mach-O
          • Remember which symbol requires the relocation
          • Jump to __dyld_stub_binding_helper for the actual linking
    Dynamic linker - dyld
      • dyld changes the corresponding cell in __DATA, __la_symbol_ptr
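The stub, helper and pointer-cell mechanism described in these slides can be imitated in plain C with a function pointer. This is an analogy only, not dyld's code: the cell variable stands in for one entry of __DATA, __la_symbol_ptr, lazy_strlen_helper stands in for the __stub_helper path, and strlen is used in place of puts so the result is easy to check. All names are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

static size_t lazy_strlen_helper(const char *s);

/* Stands in for one cell of __DATA,__la_symbol_ptr: before the first call
 * it points at the helper rather than at the real function. */
static strlen_fn la_symbol_ptr_cell = lazy_strlen_helper;
static int bind_count = 0;

/* Stands in for __stub_helper plus the binder: resolve the symbol, patch
 * the cell, then continue into the real function. */
static size_t lazy_strlen_helper(const char *s)
{
    bind_count++;
    la_symbol_ptr_cell = strlen;              /* the "relocation" */
    assert(la_symbol_ptr_cell != lazy_strlen_helper);
    return la_symbol_ptr_cell(s);
}

/* Stands in for the __symbol_stub1 entry: an indirect jump through the cell. */
size_t stub_strlen(const char *s)
{
    return la_symbol_ptr_cell(s);
}

/* Reports how many times the binding path has run. */
int times_bound(void)
{
    return bind_count;
}
```

The first call through stub_strlen takes the slow path and patches the cell; later calls go straight to the real function, so the binding count stays at one.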
    Let's hook
    Mach-O hook tool
        void *mach_hook_init(char const *library_filename, void const *library_address);
        mach_substitution mach_hook(void const *handle, char const *function_name, mach_substitution substitution);
        void mach_hook_free(void *handle);
      • Just download it and run the test project!
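The idea behind the hook, replacing the address in an import cell and keeping the old one so it can be restored, can be sketched without touching a real Mach-O image. The code below is not the mach_hook implementation and does not call it; the import table, the two puts-like functions and the substitute function are all hypothetical stand-ins that only mirror the shape of a call taking a function name and a substitution and returning the original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*text_fn)(const char *);

/* Stand-ins for the library's puts() and for our replacement. They return
 * distinguishable values instead of printing. */
int original_puts_like(const char *s) { return (int)strlen(s); }
int hooked_puts_like(const char *s)   { return -(int)strlen(s); }

/* Stand-in for the import table: one named cell per imported function. */
struct import_cell {
    const char *name;
    text_fn     target;
};

static struct import_cell import_table[] = {
    { "puts", original_puts_like },
};

/* Puts `substitution` into the cell for `name` and returns the address that
 * was there before, or NULL if no such import exists. */
text_fn substitute(const char *name, text_fn substitution)
{
    assert(name != NULL && substitution != NULL);

    for (size_t i = 0; i < sizeof import_table / sizeof import_table[0]; i++) {
        if (strcmp(import_table[i].name, name) == 0) {
            text_fn previous = import_table[i].target;
            import_table[i].target = substitution;
            return previous;
        }
    }
    return NULL;
}

/* What the calling code does: an indirect call through the cell. */
int call_puts(const char *s)
{
    return import_table[0].target(s);
}
```

Calling substitute with the hook, and then again with the saved address, reproduces the three phases of the demo: original, hooked, original again.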
    Mach-O exploring (live demo)
        $ arch -x86_64 ./test
        libtest: calls the original puts()
        -----------------------------
        libtest: calls the original puts()
        HOOKED!
        -----------------------------
        libtest: calls the original puts()
    Questions
      • More at