BINARY TRAIN
- PART I
CMJ / 2017.03.18
OUTLINE
NEXT 45 MIN
▸ In the next 45 min
▸ Learn the Mach-O binary format
▸ X86-64 Assembly Language / Machine Code
▸ Trivial Binary Bugs
▸ Order by DESC
"The hole (vulnerability) is right there"
Minmei Shobo (民明書房)
Ten Famous Quotes You Must Know
BUG TO VULNERABILITY
SIGNAL
▸ There are many signals in *nix-like systems
▸ Some are helpful
▸ Some are for bug prevention
▸ Understanding the bug helps find the vulnerabilities
▸ SIGFPE - division by zero
▸ SIGILL - illegal instruction
▸ SIGSEGV - invalid virtual memory reference
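As a rough illustration of how these signals show up from the outside, the sketch below assumes Python on a POSIX system. The helper name `died_from` is hypothetical. It starts a child process that sends one of the signals above to itself, then reads the signal number back from the exit status.

```python
import signal
import subprocess
import sys

# Child program: disable core dumps, then deliver the signal to itself.
CHILD = (
    "import os, resource, sys;"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0));"
    "os.kill(os.getpid(), int(sys.argv[1]))"
)


def died_from(sig):
    """Run a child that raises `sig` on itself; return the signal that killed it."""
    child = subprocess.run([sys.executable, "-c", CHILD, str(int(sig))])
    # On POSIX, a return code of -N means "terminated by signal N".
    return signal.Signals(-child.returncode)


if __name__ == "__main__":
    for sig in (signal.SIGFPE, signal.SIGILL, signal.SIGSEGV):
        print(sig.name, "->", died_from(sig).name)
```

A real bug would trigger the signal through a faulting instruction rather than `os.kill`, but the parent process observes the same result.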
BUG TO VULNERABILITY
ILLEGAL & INVALID
▸ Caused by the compiler, a library, or program logic
▸ Compiler - switch to a newer compiler
▸ Run-time library - switch to a newer library
▸ Run-time logic - supply correct input
▸ It is always "their" fault
VULNERABILITY
INPUT
▸ User Input
▸ User-Name, Age, email-address, Gender
▸ Store the user input into memory space
▸ ISSUE
A. How
B. What
C. Where
WORLD IN
X86-64
CPU
X86-64
▸ Registers - extended to 64 bits
▸ 8 / 16 / 32 / 64 bits
▸ 128 bits (SSE)
▸ NX (No-Execute) bit
▸ Registers are limited
▸ limited to 16 general-purpose registers
▸ 16 SSE registers
CPU
X86-64
▸ Von Neumann model
▸ Code and data are stored together (in memory)
▸ When data needs to be stored / loaded
▸ from register to memory
▸ from memory to register
STORAGE
SOMETHING IN MEMORY
▸ Code vs Data vs BSS vs Stack vs Heap
▸ Code is read-execute
▸ Data is read-write
▸ BSS stores uninitialized data
▸ Stack stores temporary (local) data
▸ Heap stores dynamic data
▸ All of these are stored in the memory
HOPE YOU HAVE …
DATA IN PROGRAM
▸ Data
▸ Gender - one letter or full description
▸ Age - possible integer or impossible integer
▸ Name - alphabet or unicode
▸ All data in registers / memory is integer-like
▸ 8-bit (0 ~ 255) to 128-bit SSE (0 ~ 3.4e38)
▸ signed or unsigned is the question
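To make the "signed or unsigned" point concrete, here is a minimal Python sketch. The helper names `as_unsigned` and `as_signed` are made up for illustration. The same raw byte is read both ways, and the ASCII letter for a gender field is shown as the integer it really is.

```python
def as_unsigned(raw: bytes) -> int:
    """Interpret little-endian bytes as an unsigned integer."""
    return int.from_bytes(raw, "little", signed=False)


def as_signed(raw: bytes) -> int:
    """Interpret the same bytes as a two's-complement signed integer."""
    return int.from_bytes(raw, "little", signed=True)


age = bytes([0xFF])                      # one byte, all bits set
print(as_unsigned(age), as_signed(age))  # 255 -1

gender = "F".encode("ascii")             # a letter is just a small integer
print(hex(as_unsigned(gender)))          # 0x46

print(float(2 ** 128))                   # upper bound of a 128-bit value
```

Whether an age of 0xFF means 255 or -1 is decided by the code that reads it, not by the stored bits.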
HOPE YOU HAVE …
DATA IN PROGRAM
▸ Can simply put age into register
▸ Gender could be
▸ one letter - to ASCII and put in register
▸ fixed-length - store in memory
▸ Name should be
▸ store in memory
MEMORY
WHERE TO STORE
▸ Memory
▸ Store user input sequentially
▸ decoded by the program / programmer
▸ ISSUE
▸ size
▸ permission
MEMORY
WHERE TO STORE
▸ Data vs BSS vs Stack vs Heap
▸ Fit the scenario (assumption)
▸ data is
1. temporary
2. global view
3. variable size
Mung bean cake or manuscript paper
Which one?
You may have read it - "Magnanimity" (雅量)
DECODE
TEXT
MOV
▸ Among x86-64 opcodes
▸ many are MOV variants
▸ moving data from/to memory is a frequent operation
▸ mov ch, dl
▸ mov rax, [rax-0x10]
▸ mov [r8], rsp
▸ lea cx, [rbx]
▸ But these are different opcodes!
AGE
SAVE DATA
▸ Save 18 as age into program
▸ mov rax, 18 ; save in a register
▸ mov [rax], 18 ; save into memory at the address held in rax
▸ push 18 ; save onto the stack
GENDER
SAVE DATA
▸ Save ‘F’ (0x46) as gender into program
▸ mov rax, 0x46 ; save in a register
▸ mov [rax], 0x46 ; save into memory at the address held in rax
▸ push 0x46 ; save onto the stack
GENDER
SAVE DATA
▸ Save ‘Female’ as gender into program
▸ mov [rax], 0x616D6546 ; bytes 46 65 6D 61 = "Fema" (little-endian)
▸ mov [rax+0x04], 0x0000656C ; bytes 6C 65 00 00 = "le" plus padding
▸ push 0x46
▸ push 0x65
▸ push …
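x86-64 is little-endian, so the lowest byte of an immediate lands at the lowest address. The following Python sketch uses the hypothetical helpers `store_dword` and `dwords_for`. It shows which bytes a 32-bit store leaves in memory, and which 32-bit immediates spell out a given string.

```python
import struct


def store_dword(value: int) -> bytes:
    """Bytes a 32-bit store of `value` leaves in memory on a little-endian CPU."""
    return struct.pack("<I", value)


def dwords_for(text: bytes) -> list:
    """Split `text` (zero-padded to a multiple of 4) into 32-bit immediates."""
    padded = text + b"\x00" * (-len(text) % 4)
    return [
        struct.unpack_from("<I", padded, offset)[0]
        for offset in range(0, len(padded), 4)
    ]


print(store_dword(0x46656D61))                 # b'ameF': the letters come out reversed
print([hex(v) for v in dwords_for(b"Female")])  # ['0x616d6546', '0x656c']
```

Writing the characters into the immediate in reading order therefore stores them backwards in memory.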
MEMORY
SIZE MATTERS
▸ Steps to store data in memory
1. decide the size of memory
2. decide how to encode/decode the data
3. decide the location in memory
4. put into / get from memory
MEMORY
OVERESTIMATE VS UNDERESTIMATE
▸ Over
▸ memory leak - OOM
▸ wasted resources
▸ Under
▸ data corruption
▸ overflow
MEMORY
▸ Move into memory space
▸ Where is the space? BSS or Data or Heap
▸ Compile-time or Run-time
▸ fixed-length or variable-length
▸ Save onto the stack
▸ Stack space is not unlimited
IN C LANGUAGE
ASSUMPTION
▸ Struct in C
struct foo {
    int age;
    char gender[8];
    char email[128];
};
‣ What happens if gender overflows?
‣ email is corrupted / age is corrupted
[Figure: memory layout of struct foo (age, gender, email), spanning 0x1230 to 0x12B9]
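The overflow can be reproduced safely from Python with `ctypes`, which lays out the same three fields the way a C compiler typically would. This is only a sketch: `Foo`, `overflow_gender` and the sample email address are illustrative, and the write is deliberately kept inside the struct so nothing else is damaged.

```python
import ctypes


class Foo(ctypes.Structure):
    # Mirrors: struct foo { int age; char gender[8]; char email[128]; };
    _fields_ = [
        ("age", ctypes.c_int),
        ("gender", ctypes.c_char * 8),
        ("email", ctypes.c_char * 128),
    ]


def overflow_gender(foo: Foo, data: bytes) -> None:
    """Copy `data` to the start of `gender` without checking its length (the bug)."""
    if Foo.gender.offset + len(data) > ctypes.sizeof(Foo):
        raise ValueError("sketch only: refusing to write outside the struct")
    ctypes.memmove(ctypes.addressof(foo) + Foo.gender.offset, data, len(data))


foo = Foo(age=18, gender=b"F", email=b"user@example.com")
overflow_gender(foo, b"A" * 12)   # 12 bytes into an 8-byte field

print(Foo.age.offset, Foo.gender.offset, Foo.email.offset)  # 0 4 12
print(foo.age)     # 18: age sits before gender, so it is untouched
print(foo.email)   # b'AAAA@example.com': the first 4 bytes were overwritten
```

Writing forward from `gender` damages `email`. Corrupting `age` would take a write that starts before `gender`, for example through a negative index.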
IN ASM
ASSUMPTION
[0x400000] call 0x400043
…
[0x400043] mov rax 18
[0x400048] ret
IN ASM
ASSUMPTION
[0x400000] call 0x400043
…
[0x400043] push 18
[0x400048] ret
IN ASM
ASSUMPTION
[0x400000] call 0x400043
…
[0x400043] mov [rbp-0x10] 0x46
[0x40004E] ret
IN ASM
ASSUMPTION
[0x400000] call 0x400043
…
[0x400043] mov r8 [rip+0x08]
[0x40004A] mov [r8] 18
[0x400051] ret
LEGACY
CODE/DATA BOTH IN MEMORY
▸ First: call is a combination of push and jump
▸ call 0x400035
1. push rip (the address of the next instruction)
2. jump 0x400035
‣ ret
1. pop rip
2. jump rip
‣ And more
▸ call rax
▸ call [rax]
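The push/jump and pop/jump pairs can be modelled in a few lines. The `TinyCpu` class below is a toy made up for this sketch, not a real emulator. It assumes a 5-byte `call rel32` and keeps the stack as a Python list. Its purpose is to show why a return address stored next to data is dangerous.

```python
class TinyCpu:
    """Toy model of rip and the stack, just enough for call and ret."""

    def __init__(self, rip: int):
        self.rip = rip
        self.stack = []

    def call(self, target: int, insn_len: int = 5) -> None:
        # 1. push rip (address of the instruction after the call)
        self.stack.append(self.rip + insn_len)
        # 2. jump to the target
        self.rip = target

    def ret(self) -> None:
        # pop the saved address and jump to it
        self.rip = self.stack.pop()


cpu = TinyCpu(0x400000)
cpu.call(0x400043)
cpu.ret()
print(hex(cpu.rip))          # 0x400005: back to the instruction after the call

cpu.call(0x400043)
cpu.stack[-1] = 0x41414141   # an overflow overwrites the saved return address
cpu.ret()
print(hex(cpu.rip))          # 0x41414141: control flow now follows the data
```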
LEGACY
PROGRAMS ALWAYS HAVE BUGS
EVEN COMPILERS
QUESTION
▸ If vulnerabilities can be introduced
▸ from source code to assembly code
▸ is there NO BUG from assembly code to machine code?
TEXT
ASSEMBLE
▸ From assembly code to machine code
▸ 1-1 mapping
▸ platform-dependent
▸ Example
▸ pop rax - 58
▸ syscall - 0F 05
▸ xor r8 0x10 - 49 83 F0 10
▸ mov eax 0xDEADBEEF - B8 EF BE AD DE
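A toy assembler for three of the examples above can be written as a lookup table plus one encoder. This is a minimal sketch and not how Zasm works: the names `FIXED`, `mov_eax_imm32` and `assemble` are invented here, and only these exact mnemonics are handled.

```python
import struct

# Instructions whose encoding has no variable part.
FIXED = {
    "pop rax": bytes([0x58]),
    "syscall": bytes([0x0F, 0x05]),
}


def mov_eax_imm32(value: int) -> bytes:
    """Opcode B8 followed by the 32-bit immediate in little-endian order."""
    return bytes([0xB8]) + struct.pack("<I", value)


def assemble(line: str) -> bytes:
    """Translate one line of assembly into machine code (tiny subset only)."""
    text = " ".join(line.replace(",", " ").lower().split())
    if text in FIXED:
        return FIXED[text]
    if text.startswith("mov eax "):
        return mov_eax_imm32(int(text.split()[-1], 0))
    raise ValueError(f"unsupported instruction: {line!r}")


for source in ("pop rax", "syscall", "mov eax 0xDEADBEEF"):
    print(source, "-", assemble(source).hex(" ").upper())
```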
X86-64
OPCODE
INSTRUCTION
X86-64 MACHINE CODE
▸ X86-64 machine code layout
▸ [Prefix] [Opcode] [ModR/M] [SIB] [Displacement] [Immediate]
▸ At most 15 bytes per instruction
▸ Displacement / immediate up to 8 bytes (64-bit address)
▸ R(educed)ISC vs C(omplex)ISC
STFW
OPCODE
▸ X86-64 opcode
▸ Intel Manual[0]
▸ Web Resource[1]
▸ An opcode byte can be 00 ~ FF
▸ Each value either has a defined use or is invalid
[0]: https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf
[1]: http://ref.x86asm.net/coder64.html
SIMPLE LIFE
OPCODE
▸ Simple (frequently-used) opcode
▸ No-OPeration
▸ NOP 90 (historically the encoding of xchg eax, eax)
▸ NOP 0F 0D
▸ FNOP D9 D0 (FPU nop)
[0]: http://stackoverflow.com/questions/25008772/whats-the-difference-between-the-x86-nop-and-fnop-instructions
X86-64
SLIGHTLY COMPLICATED
▸ Opcode extension
▸ add (01) supports 16 / 32 / 64-bit operands
▸ add r/m16/32/64, r16/32/64
▸ One opcode does multiple things?
▸ prefixes 48 ~ 4F (REX with W = 1) extend the operand size to 64 bits
  7       4   3   2   1   0
+---------+---+---+---+---+
| 0 1 0 0 | W | R | X | B |
+---------+---+---+---+---+
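The bit layout of this prefix translates directly into shifts and masks. In the Python sketch below, the helper names `rex` and `rex_fields` are illustrative. It builds the prefix byte from the four flag bits and splits it apart again.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix: fixed high nibble 0100, then the W, R, X, B bits."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def rex_fields(byte: int) -> dict:
    """Split a REX prefix byte back into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"0x{byte:02X} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModR/M reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModR/M r/m, SIB base, or opcode register
    }


print(hex(rex(w=1)))        # 0x48
print(hex(rex(w=1, b=1)))   # 0x49
print(rex_fields(0x41))     # {'W': 0, 'R': 0, 'X': 0, 'B': 1}
```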
X86-64
REGISTER EXTENSION
▸ Extension
▸ Size (32-bits to 64-bits)
▸ register (general to extension)
▸ mov eax, 0xdeadbeef B8 EF BE AD DE
▸ mov rax, 0xdeadbeef 48 B8 EF BE AD DE 00 00 00 00
▸ mov r8, 0xdeadbeef 49 B8 EF BE AD DE 00 00 00 00
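With REX.W set, opcode B8 takes a full 64-bit immediate, so the instruction is ten bytes long. A small encoder makes the two extensions visible: the W bit widens the operand, and the B bit selects r8 to r15. This is a sketch, and `REGS64` and `mov_r64_imm64` are names chosen for this example.

```python
import struct

REGS64 = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def mov_r64_imm64(reg: str, value: int) -> bytes:
    """Encode `mov reg, imm64` as REX.W (+B), opcode B8+r, 8-byte immediate."""
    number = REGS64.index(reg)
    prefix = 0x48 | (number >> 3)   # REX.W, plus REX.B for r8..r15
    opcode = 0xB8 + (number & 7)    # low 3 bits of the register go in the opcode
    return bytes([prefix, opcode]) + struct.pack("<Q", value)


print(mov_r64_imm64("rax", 0xDEADBEEF).hex(" ").upper())
print(mov_r64_imm64("r8", 0xDEADBEEF).hex(" ").upper())
```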
X86-64
TRICKY
▸ OPCODE
▸ push defaults to a 64-bit operand, so the REX.W prefix is redundant
▸ push rax 50
▸ push rax 48 50
X86-64
PRIMARY OPCODE
▸ Some opcodes are merged
▸ opcode + register number in one byte
▸ push r16/64 fits in a single opcode byte (50 + register)
▸ push ax 66 50
▸ push rax 50
▸ push r9w 66 41 51
▸ push r9 41 51
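The four encodings above follow one rule, which the sketch below spells out. The function name `push` and its `reg_index` / `bits` parameters are invented for this example. Register numbers run 0 to 15, as in rax = 0 and r9 = 9.

```python
def push(reg_index: int, bits: int = 64) -> bytes:
    """Encode push of a 16- or 64-bit general register in 64-bit mode."""
    if bits not in (16, 64):
        raise ValueError("push takes a 16- or 64-bit register in 64-bit mode")
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must be 0..15")
    out = bytearray()
    if bits == 16:
        out.append(0x66)                 # operand-size override prefix
    if reg_index >= 8:
        out.append(0x41)                 # REX.B selects r8..r15
    out.append(0x50 + (reg_index & 7))   # register merged into the opcode byte
    return bytes(out)


print(push(0, 16).hex(" "))   # 66 50     push ax
print(push(0).hex(" "))       # 50        push rax
print(push(9, 16).hex(" "))   # 66 41 51  push r9w
print(push(9).hex(" "))       # 41 51     push r9
```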
X86-64
TWO-BYTE OPCODE
▸ Some opcodes are two bytes long
▸ ADD 05
▸ syscall 0F 05
▸ 0F is the two-byte opcode escape prefix
X86-64
SOME PROBLEM
▸ Trivial case - condition check
▸ jz LABEL 48 0F 84 06 00 00 00
▸ Can be patched to
▸ nop 90 90 90 90 90 90 90
X86-64
SOME PROBLEM
▸ If we have
▸ add ax, 0x5150 66 05 50 51
▸ Can be patched to
▸ syscall 0F 05
▸ push rax 50
▸ push rcx 51
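Changing a single byte changes how every following byte is read. The toy decoder below is a sketch: it only knows the handful of encodings used on these slides, and `decode` and `PUSH_REGS` are made-up names. It shows the same four bytes before and after the first one is patched.

```python
PUSH_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code: bytes) -> list:
    """Decode a byte string using only the few encodings shown in the slides."""
    out = []
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x66\x05":          # add ax, imm16
            imm = int.from_bytes(code[i + 2:i + 4], "little")
            out.append(f"add ax, 0x{imm:04x}")
            i += 4
        elif code[i:i + 2] == b"\x0f\x05":        # syscall
            out.append("syscall")
            i += 2
        elif 0x50 <= code[i] <= 0x57:             # push r64
            out.append(f"push {PUSH_REGS[code[i] - 0x50]}")
            i += 1
        elif code[i] == 0x90:                     # nop
            out.append("nop")
            i += 1
        else:
            raise ValueError(f"unknown byte 0x{code[i]:02x} at offset {i}")
    return out


original = bytes.fromhex("66055051")
patched = b"\x0f" + original[1:]   # overwrite only the first byte

print(decode(original))   # ['add ax, 0x5150']
print(decode(patched))    # ['syscall', 'push rax', 'push rcx']
```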
REAL-CASE
- MAC OS X
POSSIBILITY
MACHO
▸ Mach-O is a binary format
▸ Header
▸ Commands
▸ Sections
▸ Segment
▸ Binary payload
▸ Multi-architecture binaries
MACH-O 64
HEADER
▸ Magic Number 0xFEEDFACF
▸ 64-bit
▸ CPU info
▸ X86_64 / ARM / ARM64 / POWERPC64 / …
▸ File Type
▸ Execute / Preload / DYLIB / …
▸ Number of commands (section/segment)
▸ Flags
▸ PIE / NOUNDEFS / DYLDLINK / LAZY_INIT / …
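The header fields listed above map onto the 32-byte `mach_header_64` structure. The Python sketch below parses such a header with `struct`. The constants are the usual values from Apple's `mach-o/loader.h`, while `parse_header` and the synthetic sample bytes are assumptions made for illustration, so no real binary is read.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_X86_64 = 0x01000007
MH_EXECUTE = 0x2
FLAG_NAMES = {0x1: "NOUNDEFS", 0x4: "DYLDLINK", 0x200000: "PIE"}

# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
HEADER = struct.Struct("<IiiIIIII")


def parse_header(blob: bytes) -> dict:
    """Parse a little-endian mach_header_64 from the start of `blob`."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = HEADER.unpack_from(blob)
    if magic != MH_MAGIC_64:
        raise ValueError("not a 64-bit little-endian Mach-O header")
    return {
        "cputype": cputype,
        "filetype": filetype,
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
    }


# Synthetic header: x86-64 executable, 7 commands in 664 bytes, three flags set.
sample = HEADER.pack(MH_MAGIC_64, CPU_TYPE_X86_64, 3, MH_EXECUTE, 7, 664, 0x200005, 0)
print(HEADER.size)           # 32
print(parse_header(sample))
```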
MACH-O 64
COMMANDS
▸ Lots of commands
▸ LC_SEGMENT_64
▸ LC_SYMTAB
▸ LC_LOAD_DYLIB
▸ LC_UNIXTHREAD
▸ LC_MAIN
▸ LC_RPATH
MACH-O 64
SEGMENT
▸ Segment
▸ command name
▸ memory address
▸ memory size
▸ file offset
▸ file size
▸ max VM protection
▸ initial VM protection
▸ number of sections
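An `LC_SEGMENT_64` command carries these fields in a fixed 72-byte layout, followed by its section records. The sketch below packs and parses a synthetic `__TEXT` segment. The names `parse_segment` and `prot_string` and the sample values are illustrative assumptions, not taken from a real file.

```python
import struct

LC_SEGMENT_64 = 0x19

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT = struct.Struct("<II16sQQQQiiII")


def prot_string(prot: int) -> str:
    """Render VM protection bits (read = 1, write = 2, execute = 4) as rwx."""
    return "".join(ch if prot & bit else "-" for bit, ch in ((1, "r"), (2, "w"), (4, "x")))


def parse_segment(blob: bytes, offset: int = 0) -> dict:
    """Parse one segment_command_64 located at `offset` in `blob`."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, _flags) = SEGMENT.unpack_from(blob, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    return {
        "segname": segname.rstrip(b"\x00").decode("ascii"),
        "vmaddr": vmaddr, "vmsize": vmsize,
        "fileoff": fileoff, "filesize": filesize,
        "maxprot": prot_string(maxprot), "initprot": prot_string(initprot),
        "nsects": nsects, "cmdsize": cmdsize,
    }


# Synthetic __TEXT segment with one section (72-byte command + 80-byte section).
sample = SEGMENT.pack(LC_SEGMENT_64, 72 + 80, b"__TEXT",
                      0x100000000, 0x1000, 0, 0x1000, 7, 5, 1, 0)
print(SEGMENT.size)            # 72
print(parse_segment(sample))
```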
MACH-O 64
SECTION
▸ Section Name
▸ Segment Name
▸ memory address
▸ size
▸ offset
▸ align
▸ flags
MACH-O 64
MINIMAL
▸ Minimal Mach-O 64 binary
▸ Small footprint - 4K
▸ Header
▸ 7 commands - 664 bytes
▸ Machine code - 12 bytes
▸ Padding with 0x00 bytes
ZASM
ASSEMBLER
▸ Assembler
▸ From assembly language to machine code
▸ Target format (ELF / Mach-O / …)
▸ Target platform (x86-64 / ARMv8 / …)
▸ Generator
[0]: https://github.com/cmj0121/Zerg/tree/master/src/zasm
Q&A
THANKS FOR YOUR ATTENTION

[2017.03.18] hst binary training part 1
