redhung@SQLab, NYCU
Unleashing
MAYHEM
On Binary Code
Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert and David Brumley, Carnegie Mellon University
2012 IEEE Symposium on Security and Privacy
Note: these slides were exported from Keynote to PowerPoint, so please excuse any layout glitches.
>_ CAT ./OUTLINE
0x00 Introduction
0x10 Overview
0x20 Techniques
0x30 Evaluation
Introduction
>_ Introduction
•AEG
---
-The automatic exploit generation (AEG) challenge: given a program, automatically
find vulnerabilities and generate exploits for them
-Heelan, AEG, Mayhem, CRAX
>_ Introduction
•MAYHEM
---
-MAYHEM’s design is based on four main principles:
1. The system should be able to make forward progress for arbitrarily long
times, ideally running “forever”
2. To maximize performance, the system should not repeat work
3. The system should not throw away any work; previous analysis results
should be reusable on subsequent runs
4. The system should be able to reason about symbolic memory, where a load
or store address depends on user input
>_ Introduction
•Symbolic Executors
---
-Current executors fall into two main categories: offline executors and
online executors
-Offline executors concretely run a single execution path and then
symbolically execute it
-Online executors try to execute all possible paths in a single run of the
system
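The trade-off between the two categories can be sketched on a toy model. This is only an illustration, not how any real executor is built: the "program" is assumed to be three input-dependent branches in a row, a path is the tuple of branch outcomes, and the names offline_explore and online_explore are hypothetical.

```python
DEPTH = 3  # toy program: DEPTH input-dependent branches in a row

def offline_explore(depth=DEPTH):
    """One path per run: every run re-executes its path from the start."""
    steps = 0                      # branch outcomes executed in total
    paths = []
    worklist = [()]                # branch prefixes still to try
    while worklist:
        prefix = worklist.pop()
        path = list(prefix)
        steps += len(prefix)       # re-execute the already known prefix
        while len(path) < depth:
            # remember the untaken side for a later, separate run
            worklist.append(tuple(path) + (False,))
            path.append(True)
            steps += 1
        paths.append(tuple(path))
    return paths, steps

def online_explore(depth=DEPTH):
    """Fork at every branch: no work is repeated, but all states stay live."""
    steps = 0
    peak_live = 1
    live = [()]
    while any(len(state) < depth for state in live):
        forked = []
        for state in live:
            forked += [state + (True,), state + (False,)]
            steps += 2             # each child executes one new branch outcome
        live = forked
        peak_live = max(peak_live, len(live))
    return live, steps, peak_live

offline_paths, offline_steps = offline_explore()
online_paths, online_steps, online_peak = online_explore()
print(offline_steps, online_steps, online_peak)  # 24 14 8
```

In this toy model both explorers find the same 8 paths. The offline one executes more steps because it repeats shared prefixes, while the online one holds up to 8 states in memory at once.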
>_ Introduction
•Overall, MAYHEM makes the following
contributions:
---
-Hybrid execution
-Index-based memory modeling
-Binary-only exploit generation
Overview
>_ Overview
---
•We use an HTTP server, orzHttpd, as an example to highlight the main challenges
and present how MAYHEM works.
•Note that we show source for clarity and simplicity; MAYHEM runs on binary code.
>_ Overview
---
•We highlight several key points for finding exploitable bugs:
-Low-level details matter
-There are an enormous number of paths
-The more paths checked, the better
-Execute as much natively as possible
>_ Overview
---
•MAYHEM consists of two concurrently running processes:
-Concrete Executor Client (CEC)
-Symbolic Executor Server (SES)
Techniques
>_ Techniques
---
•Challenges
-Resource management in symbolic execution
-Symbolic indices
>_ Techniques
---
•Offline Execution
-One path at a time
-Method 1: Rerun from scratch => Inefficient
>_ Techniques
---
•Online Execution
-Fork at branches
-Hit resource cap
-Method 2: Stop forking => Miss paths
-Method 3: Snapshot process => Huge disk image
>_ Techniques
---
•Hybrid Execution
-Fork at branches
-Hit resource cap
-MAYHEM’s Method: Don’t snapshot state; use path predicate to recreate state
-Checkpoint: reduce the program state from 9.4M to 500K
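A minimal sketch of the hybrid idea, under strong simplifying assumptions: the program is again a row of input-dependent branches, a checkpoint stores only the path predicate (the tuple of branch outcomes), and restoring means replaying the program along that predicate. The names CAP, step, replay and hybrid_explore are hypothetical, and the "scratch" field merely stands in for bulky concrete state. This is not MAYHEM's actual checkpoint format.

```python
DEPTH = 4   # toy program: DEPTH input-dependent branches in a row
CAP = 2     # hypothetical limit on states kept in memory

def step(state, taken):
    # "scratch" stands in for the bulky concrete state (memory, registers).
    return {"path": state["path"] + (taken,),
            "scratch": state["scratch"] + [taken] * 1000}

def replay(predicate):
    """Rebuild a live state by re-running the program along a predicate."""
    state = {"path": (), "scratch": []}
    for taken in predicate:
        state = step(state, taken)
    return state

def hybrid_explore(depth=DEPTH, cap=CAP):
    live = [replay(())]
    checkpoints = []      # path predicates only, no scratch data
    finished = []
    peak_live = len(live)
    while live or checkpoints:
        if not live:      # memory is free again: restore one checkpoint
            live.append(replay(checkpoints.pop()))
        state = live.pop()
        if len(state["path"]) == depth:
            finished.append(state["path"])
            continue
        for taken in (True, False):       # fork like an online executor
            child = step(state, taken)
            if len(live) < cap:
                live.append(child)
            else:                         # cap hit: keep the predicate only
                checkpoints.append(child["path"])
        peak_live = max(peak_live, len(live))
    return finished, peak_live

finished, peak_live = hybrid_explore()
print(len(finished), peak_live)
```

In this sketch every path is still explored, but the number of full states in memory never exceeds the cap; the price is the replay work when a checkpoint is restored.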
>_ Techniques
---
•Symbolic indices
-If user input is used as an index to access memory, we call it a
symbolic index
>_ Techniques
---
•Symbolic indices
x = user_input();
y = mem[x];
assert ( y == 42 );
-x can be anything
-Which memory cell contains 42?
-Memory spans addresses 0 to 2^32 - 1, so there are 2^32 cells to check
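To make the question on this slide concrete, here is a brute-force sketch that checks every cell of a toy memory. The 256-cell memory, its contents, and the helper name cells_satisfying are assumptions for illustration; the point is only that the same scan over 2^32 cells would be far too expensive to repeat for every symbolic access.

```python
def cells_satisfying(mem, target=42):
    """Return every index x for which mem[x] == target."""
    return [x for x, value in enumerate(mem) if value == target]

# Toy memory with 256 cells instead of 2**32.
mem = [0] * 256
mem[17] = 42
mem[200] = 42

solutions = cells_satisfying(mem)
print(solutions)            # indices that make the assert pass
print(2**32 // len(mem))    # how many times larger a 32-bit address space is
```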
>_ Techniques
---
•One cause: overwritten pointers
-Stack frame: arg, ret addr, ptr, buf
-ptr = 0x11223344, so *ptr reads mem[0x11223344]
assert ( *ptr == 42 );
return;
>_ Techniques
---
•One cause: overwritten pointers
-After the overflow, every stack slot holds AAAAAAA (user input)
-ptr = input, so *ptr reads mem[input]
assert ( *ptr == 42 );
return;
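The before and after pictures can be simulated with a byte array. The frame layout below (8-byte buf, then 4-byte ptr, ret addr and arg, little-endian) is a hypothetical one chosen for the sketch, and unchecked_copy is an illustrative stand-in for an unsafe copy routine.

```python
import struct

BUF_OFF, PTR_OFF, RET_OFF, ARG_OFF = 0, 8, 12, 16   # hypothetical layout
FRAME_SIZE = 20

def make_frame():
    """Fresh frame with ptr initialised to the value from the slide."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<I", frame, PTR_OFF, 0x11223344)
    return frame

def unchecked_copy(frame, data):
    """Copy into buf without checking buf's 8-byte size."""
    n = min(len(data), FRAME_SIZE - BUF_OFF)   # stay inside the toy frame
    frame[BUF_OFF:BUF_OFF + n] = data[:n]

def read_ptr(frame):
    return struct.unpack_from("<I", frame, PTR_OFF)[0]

frame = make_frame()
before = read_ptr(frame)                 # 0x11223344
unchecked_copy(frame, b"A" * FRAME_SIZE)
after = read_ptr(frame)                  # 0x41414141, i.e. "AAAA"

crafted = make_frame()
unchecked_copy(crafted, b"B" * 8 + struct.pack("<I", 0xDEADBEEF))
chosen = read_ptr(crafted)               # the input picks the address
print(hex(before), hex(after), hex(chosen))
```

Once ptr is controlled by the input, the later dereference becomes a memory access with a symbolic index.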
>_ Techniques
---
•Another cause: table lookups
-Very common in standard library APIs
-e.g. sscanf, vfprintf, toupper, tolower, etc.
-If user input is passed as an argument, these functions use it as a
memory index to look up a table
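As an illustration of such a lookup, the sketch below models a table-driven uppercase conversion. The 256-entry UPPER_TABLE and the function table_toupper are hypothetical and only similar in spirit to how C libraries commonly implement toupper; they are not taken from any specific libc.

```python
# Hypothetical 256-entry conversion table: lowercase ASCII letters map to
# their uppercase counterparts, every other byte maps to itself.
UPPER_TABLE = [c - 32 if ord("a") <= c <= ord("z") else c for c in range(256)]

def table_toupper(byte):
    """Convert one byte by indexing the table with the input itself."""
    return UPPER_TABLE[byte]    # the input byte is the memory index

word = bytes(table_toupper(b) for b in b"mayhem!")
print(word)
```

If the byte comes from user input, the index into the table is symbolic, so even this tiny helper produces a symbolic memory read.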
>_ Techniques
---
•Method 1: Concretization
-Original statement: mem[x] = 42;
-After concretizing x to a single value:
x = 17;
mem[x] = 42;
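A small sketch of what concretization gives up, assuming a toy 64-cell memory in which x could legitimately be any index. The helper concretized_write and the follow-up check on cell 23 are hypothetical; they only show that fixing x = 17 hides outcomes that other values of x would allow.

```python
MEM_SIZE = 64   # toy memory; x may be any index in range(MEM_SIZE)

def concretized_write(mem, concrete_x, value):
    """Model mem[x] = value after fixing symbolic x to one concrete value."""
    new_mem = list(mem)
    new_mem[concrete_x] = value
    return new_mem

mem = [0] * MEM_SIZE
after = concretized_write(mem, 17, 42)

# Later check in the program: is mem[23] == 42 reachable?
seen_with_concretization = after[23] == 42
actually_reachable = any(
    concretized_write(mem, x, 42)[23] == 42 for x in range(MEM_SIZE)
)
print(seen_with_concretization, actually_reachable)  # False True
```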
>_ Techniques
---
•Method 2: Fully symbolic
mem[x] = 42;
-Every cell is a candidate: mem[0] = v … mem[2^32-1] = v
-Memory addresses run from 0 to 2^32-1
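One way to picture the fully symbolic approach is as a case split with one case per cell the write could hit. The sketch below does this for an 8-cell toy memory; symbolic_write and the string conditions are illustrative assumptions, not a real solver encoding.

```python
def symbolic_write(mem, value):
    """Return one (condition, resulting memory) case per possible index."""
    cases = []
    for i in range(len(mem)):
        updated = list(mem)
        updated[i] = value
        cases.append((f"x == {i}", updated))
    return cases

mem = [0] * 8
cases = symbolic_write(mem, 42)
print(len(cases))   # one case per cell: 8 here, 2**32 for a 32-bit space
```

The number of cases grows with the size of memory, which is why treating all 2^32 addresses as candidates does not scale.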
>_ Techniques
---
•Observation
-Before any branch, x can be anything
-Branch 1: x <= 42 (T / F)
-Branch 2: x >= 50 (T / F)
-Where both branches are false, y = mem[x] runs with 42 < x < 50
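The observation can be sketched as interval narrowing: each comparison known to hold on the current path tightens the range of x. The function bound_index and its (operator, constant) encoding are hypothetical and handle only simple comparisons against constants; this is not MAYHEM's actual bounding algorithm.

```python
def bound_index(constraints, lo=0, hi=2**32 - 1):
    """Narrow [lo, hi] using comparisons that hold on the current path.

    Each constraint is (op, constant); only the four operators below are
    supported in this sketch.
    """
    for op, c in constraints:
        if op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "<":
            hi = min(hi, c - 1)
        elif op == "<=":
            hi = min(hi, c)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo, hi

# Both branches false: not (x <= 42) and not (x >= 50).
lo, hi = bound_index([(">", 42), ("<", 50)])
print(lo, hi, hi - lo + 1)   # 43 49 7
```

Instead of 2^32 candidate cells, only 7 remain to be considered for y = mem[x] on this path.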
>_ Techniques
---
•Use symbolic execution state to:
1. Bound the memory addresses referenced
2. Build a search tree over memory address values
-Value Set Analysis (VSA)
•Other algorithms:
-Refinement Cache
-Lemma Cache
-Index Search Trees (ISTs)
-Bucketization with Linear Functions
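As a rough illustration of the last item, the sketch below groups consecutive table indices whose values follow one linear function value = a*index + b, so a lookup needs a few range checks instead of one case per cell. The greedy bucketize and lookup helpers and the uppercase-conversion table are assumptions made for this example; the algorithm used in MAYHEM may differ in its details.

```python
def bucketize(table):
    """Greedily group consecutive indices with value = a*index + b."""
    buckets = []                 # entries: (first_index, last_index, a, b)
    i = 0
    while i < len(table):
        if i + 1 == len(table):  # single trailing entry: constant bucket
            buckets.append((i, i, 0, table[i]))
            break
        a = table[i + 1] - table[i]
        b = table[i] - a * i
        j = i + 1
        while j + 1 < len(table) and table[j + 1] == a * (j + 1) + b:
            j += 1
        buckets.append((i, j, a, b))
        i = j + 1
    return buckets

def lookup(buckets, x):
    """Evaluate the table at x using only the bucket descriptions."""
    for first, last, a, b in buckets:
        if first <= x <= last:
            return a * x + b
    raise IndexError(x)

# Hypothetical uppercase-conversion table with 256 entries.
table = [c - 32 if ord("a") <= c <= ord("z") else c for c in range(256)]
buckets = bucketize(table)
print(buckets)   # [(0, 96, 1, 0), (97, 122, 1, -32), (123, 255, 1, 0)]
```

Here 256 table entries collapse into three buckets, each described by a range and a linear function.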
Evaluation
>_ Evaluation
---
•Experimental Setup
-3.40GHz Intel® Core i7-2600 CPU and 16GB of RAM
-One VM was running Debian Linux (Squeeze)
-One VM was running Windows XP SP3
>_ Evaluation
---
•Exploitable Bug Detection
-Downloaded 29 different vulnerable programs
-2 Zero-Day
-1 Packed binary (Windows)
>_ Evaluation
---
•Scalability of Hybrid Symbolic Execution
-Less Memory-Hungry than Online Execution
-Faster than Offline Execution
>_ Evaluation
---
•Handling Symbolic Memory in Real-World Applications
>_ Evaluation
---
•MAYHEM Coverage Comparison
>_ Evaluation
---
•Comparison against AEG
>_ Evaluation
---
•Performance Tuning
THANK YOU!
redhung@hung.red
r3dhun9 @r3dhun9 Philip Chen