• Co-founder and Chief Scientist at Lastline, Inc.
– Lastline offers protection against zero-day threats and advanced
malware
• Professor in Computer Science at UC Santa Barbara
– many systems security papers in academic conferences
• Part of Shellphish
Who are we?
• PhD Student at UC Santa Barbara
– research focused primarily on binary security and embedded devices
• Part of Shellphish
– team leader of Shellphish's effort in the DARPA Cyber Grand
Challenge
• Doesn't like peanut butter
Who are we?
- firmware
- binary analysis
- angr
What are we talking about?
The “Internet of Things”
Embedded software is everywhere
• Embedded Linux and user-space programs
• Custom OS and custom programs combined together
in a binary blob
– typically, the binary is all that you get
– and, sometimes, it is not easy to get this off the device
What is on embedded devices?
Binary analysis
noun | bi·na·ry anal·y·sis | ˈbī-nə-rē ə-ˈna-lə-səs
1. The process of automatically deriving properties about the
behavior of binary programs
2. Including static binary analysis and dynamic binary analysis
Binary Analysis
• Program verification
• Program testing
• Vulnerability excavation
• Vulnerability signature generation
• Reverse engineering
• Vulnerability excavation
• Exploit generation
Goals of Binary Analysis
– reason over multiple (all) execution paths
– can achieve excellent coverage
– precision versus scalability trade-off
• very precise analysis can be slow and not scalable
• too much approximation leads to wrong results (false positives)
– often works on abstract program model
• for example, binary code is lifted to an intermediate representation
Static Binary Analysis
– examine individual program paths
– very precise
– coverage is (very) limited
– sometimes hard to properly run program
• hard to attach debugger to embedded system
• when code is extracted and emulated, what happens with calls to
peripherals?
Dynamic Binary Analysis
• Get the binary code
• Binaries lack significant information present in source
• Often no clear library or operating system abstractions
o where to start the analysis from?
o hard to handle environment interactions
Challenges of Static Binary Analysis
From Source to Binary Code
compile link
strip
From Source to Binary Code
compile link
strip
type info
function
names
variable
names
jump
targets
• (Linux) system call interface is great
– you know what the I/O routines are
• important to understand what user can influence
– you have typed parameters and return values
– lets the analysis focus on (much smaller) main program
• OS is not there or embedded in binary blob
– heuristics to find I/O routines
– open challenge to find mostly independent components
Missing OS and Library Abstractions
• Library functions are great
– you know what they do and can write a “function summary”
– you have typed parameters and return values
– lets the analysis focus on (much smaller) main program
• Library functions are embedded (like static linking)
– need heuristics to rediscover library functions
– IDA FLIRT (Fast Library Identification and Recognition Technology)
– more robustness based on looking for control flow similarity
Missing OS and Library Abstractions
• Memory safety vulnerabilities
– buffer overrun
– out of bounds reads (heartbleed)
– write-what-where
• Authentication bypass (backdoors)
• Actuator control!
Types of Vulnerabilities
Linux embedded device: HTTP server for
management and video monitoring, with a
known backdoor.
Backdoor!!!
➔ Username: 3sadmin
➔ Password: 27988303
Heffner, Craig. "Finding and Reversing Backdoors in
Consumer Firmware." EELive! (2014).
Motivating Example
Authentication Bypass
Prompt
Authentication
Success Failure
Authentication Bypass
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Authentication Bypass
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Hard to find.
Authentication Bypass
Prompt
Success
Missing!
Modeling Authentication Bypass
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Easier to find!
Hard to find.
Input Determinism
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Can we determine
the input needed to
reach the success
function, just by
analyzing the code?
The answer is NO
Input Determinism
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Can we determine
the input needed to
reach the success
function, just by
analyzing the code?
The answer is YES
Modeling Authentication Bypass
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Easier to find!
But how?
• Without OS/ABI information:
• With ABI information:
Finding “Authenticated Point”
EXEC()
Using Binary Analysis to Hunt for Vulnerabilities
Program
Symbolic Execution
Security
policies Security
Policy Checker
POCs
Static Analysis
angr: A Binary Analysis Framework
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
angr: A Binary Analysis Framework
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
"How do I trigger path X or condition Y?"
Symbolic Execution
Input Determinism
Prompt
Authentication
Success Failure
Backdoor
e.g. strcmp()
Can we determine
the input needed to
reach the success
function, just by
analyzing the code?
"How do I trigger path X or condition Y?"
- Dynamic analysis
- Input A? No. Input B? No. Input C? …
- Based on concrete inputs to application.
- (Concrete) static analysis
- "You can't"/"You might be able to"
- Based on various static techniques.
We need something slightly different.
Symbolic Execution
"How do I trigger path X or condition Y?"
1. Interpret the application.
2. Track "constraints" on variables.
3. When the required condition is triggered,
"concretize" to obtain a possible input.
Symbolic Execution
Constraint solving:
❏ Conversion from set of constraints to set of concrete values
that satisfy them.
❏ NP-complete, in general.
Constraints
x >= 10
x < 100
x = 42
Symbolic Execution
Pros
- Precise
- No false positives (with
correct environment
model)
- Produces directly-
actionable inputs
Symbolic Execution - Pros and Cons
Cons
- Not scalable
- constraint solving is np-
complete
- path explosion
Our Case
Worst-Case
Worst-Case
angr: A Binary Analysis Framework
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
Example
cmp rbx, 0x1024
ja _OUT
cmp [rax+rbx], 1337
je _OUT
add rbx, 4
rbx?
What is rbx in the yellow square?mov rax, 0x400000
mov rbx, 0
Symbolic execution: state explosion
Naive static analysis: "anything"
Range analysis: "< 0x1024"
Can we do better?
Memory access checks Type inference
Variable recovery Range recovery
Wrapped-interval analysis
Value-set analysis
Abstract interpretation
Value Set Analysis
Value Set Analysis - Strided Intervals
4[0x100, 0x120],32
Stride Low High Size
0x100 0x10c 0x118
0x104 0x110 0x11c
0x108 0x114 0x120
Example
cmp rbx, 0x1024
ja _OUT
cmp [rax+rbx], 1337
je _OUT
add rbx, 4
rbx?
What is rbx in the yellow square?
1. 1[0x0, 0x0],64
2. 4[0x0, 0x4],64
3. 4[0x0, 0x8],64
4. 4[0x0, 0xc],64
5. 4[0x0, ∞],64
6. 4[0x0, 0x1024],64
mov rax, 0x400000
mov rbx, 0
Widen
Narrow
1
234 5
6
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
CB
vulnerable program
RB
patched program
POV
exploit
Cyber
Reasoning
System
The Cyber Grand Challenge!
The Shellphish CRS
CB
Proposed
RBs
Autonomous
vulnerability
scanning
Autonomous
service
resiliency
PCAP
Test cases
POV
RB
Autonomous
processing
Autonomous
patching
Proposed
POVs
The Shellphish CRS
CB
Proposed
RBs
Autonomous
vulnerability
scanning
Autonomous
service
resiliency
PCAP
Test cases
POV
RB
Autonomous
processing
Autonomous
patching
Proposed
POVs
- ipython-accessible
- powerful analyses
- versatile
- well-encapsulated
- open and expandable
- architecture "independent"
Angr
Angr Mini-howto
# ipython
In [1]: import angr, networkx
In [2]: binary = angr.Project("/some/binary")
In [3]: cfg = binary.analyses.CFG()
In [4]: networkx.draw(cfg.graph)
In [5]: explorer = binary.factory.path_group()
In [6]: explorer.explore(find=0xc001b000)
➔ http://angr.io
➔ https://github.com/angr
➔ angr@lists.cs.ucsb.edu
Pull requests, issues, questions, etc super-welcome! Let's bring on
the next generation of binary analysis!
Angr - Open source!
Birthday: September 2013
Total line numbers: 59950
Total commits: ALMOST 9000!! (actually ~6000)
Value Set Analysis - Strided Intervals
4[0x100, 0x110],32 + 1 = 4[0x101, 0x111],32
4[0x100, 0x110],32 >> 1 = 2[0x80, 0x88],31
2[0x80, 0x88],31 << 1 = 1[0x100, 0x110],32
4[0x100, 0x110],32 ⋃ 4[0x102, 0x112],32 = 2[0x100, 0x112],32
4[0x100, 0x110],32 ⋂ 3[0x100, 0x110],32 = 12[0x100, 0x112],32
WIDEN (4[0x100, 0x110],32 ⋃ 4[0x100, 0x112],32) = 4[0x100, ∞],32

Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware

  • 2.
    • Co-founder andChief Scientist at Lastline, Inc. – Lastline offers protection against zero-day threats and advanced malware • Professor in Computer Science at UC Santa Barbara – many systems security papers in academic conferences • Part of Shellphish Who are we?
  • 3.
    • PhD Studentat UC Santa Barbara – research focused primarily on binary security and embedded devices • Part of Shellphish – team leader of Shellphish's effort in the DARPA Cyber Grand Challenge • Doesn't like peanut butter Who are we?
  • 4.
    - firmware - binaryanalysis - angr What are we talking about?
  • 5.
  • 6.
  • 7.
    • Embedded Linuxand user-space programs • Custom OS and custom programs combined together in a binary blob – typically, the binary is all that you get – and, sometimes, it is not easy to get this off the device What is on embedded devices?
  • 8.
    Binary analysis noun |bi·na·ry anal·y·sis | ˈbī-nə-rē ə-ˈna-lə-səs 1. The process of automatically deriving properties about the behavior of binary programs 2. Including static binary analysis and dynamic binary analysis Binary Analysis
  • 9.
    • Program verification •Program testing • Vulnerability excavation • Vulnerability signature generation • Reverse engineering • Vulnerability excavation • Exploit generation Goals of Binary Analysis
  • 10.
    – reason overmultiple (all) execution paths – can achieve excellent coverage – precision versus scalability trade-off • very precise analysis can be slow and not scalable • too much approximation leads to wrong results (false positives) – often works on abstract program model • for example, binary code is lifted to an intermediate representation Static Binary Analysis
  • 11.
    – examine individualprogram paths – very precise – coverage is (very) limited – sometimes hard to properly run program • hard to attach debugger to embedded system • when code is extracted and emulated, what happens with calls to peripherals? Dynamic Binary Analysis
  • 12.
    • Get thebinary code • Binaries lack significant information present in source • Often no clear library or operating system abstractions o where to start the analysis from? o hard to handle environment interactions Challenges of Static Binary Analysis
  • 13.
    From Source toBinary Code compile link strip
  • 14.
    From Source toBinary Code compile link strip type info function names variable names jump targets
  • 15.
    • (Linux) systemcall interface is great – you know what the I/O routines are • important to understand what user can influence – you have typed parameters and return values – lets the analysis focus on (much smaller) main program • OS is not there or embedded in binary blob – heuristics to find I/O routines – open challenge to find mostly independent components Missing OS and Library Abstractions
  • 16.
    • Library functionsare great – you know what they do and can write a “function summary” – you have typed parameters and return values – lets the analysis focus on (much smaller) main program • Library functions are embedded (like static linking) – need heuristics to rediscover library functions – IDA FLIRT (Fast Library Identification and Recognition Technology) – more robustness based on looking for control flow similarity Missing OS and Library Abstractions
  • 17.
    • Memory safetyvulnerabilities – buffer overrun – out of bounds reads (heartbleed) – write-what-where • Authentication bypass (backdoors) • Actuator control! Types of Vulnerabilities
  • 18.
    Linux embedded device:HTTP server for management and video monitoring, with a known backdoor. Backdoor!!! ➔ Username: 3sadmin ➔ Password: 27988303 Heffner, Craig. "Finding and Reversing Backdoors in Consumer Firmware." EELive! (2014). Motivating Example
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
    Modeling Authentication Bypass Prompt Authentication SuccessFailure Backdoor e.g. strcmp() Easier to find! Hard to find.
  • 24.
    Input Determinism Prompt Authentication Success Failure Backdoor e.g.strcmp() Can we determine the input needed to reach the success function, just by analyzing the code? The answer is NO
  • 25.
    Input Determinism Prompt Authentication Success Failure Backdoor e.g.strcmp() Can we determine the input needed to reach the success function, just by analyzing the code? The answer is YES
  • 26.
    Modeling Authentication Bypass Prompt Authentication SuccessFailure Backdoor e.g. strcmp() Easier to find! But how?
  • 27.
    • Without OS/ABIinformation: • With ABI information: Finding “Authenticated Point” EXEC()
  • 28.
    Using Binary Analysisto Hunt for Vulnerabilities Program Symbolic Execution Security policies Security Policy Checker POCs Static Analysis
  • 29.
    angr: A BinaryAnalysis Framework Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 30.
    angr: A BinaryAnalysis Framework Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 31.
    "How do Itrigger path X or condition Y?" Symbolic Execution
  • 32.
    Input Determinism Prompt Authentication Success Failure Backdoor e.g.strcmp() Can we determine the input needed to reach the success function, just by analyzing the code?
  • 33.
    "How do Itrigger path X or condition Y?" - Dynamic analysis - Input A? No. Input B? No. Input C? … - Based on concrete inputs to application. - (Concrete) static analysis - "You can't"/"You might be able to" - Based on various static techniques. We need something slightly different. Symbolic Execution
  • 34.
    "How do Itrigger path X or condition Y?" 1. Interpret the application. 2. Track "constraints" on variables. 3. When the required condition is triggered, "concretize" to obtain a possible input. Symbolic Execution
  • 35.
    Constraint solving: ❏ Conversionfrom set of constraints to set of concrete values that satisfy them. ❏ NP-complete, in general. Constraints x >= 10 x < 100 x = 42 Symbolic Execution
  • 37.
    Pros - Precise - Nofalse positives (with correct environment model) - Produces directly- actionable inputs Symbolic Execution - Pros and Cons Cons - Not scalable - constraint solving is np- complete - path explosion
  • 38.
  • 39.
  • 40.
  • 41.
    angr: A BinaryAnalysis Framework Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 42.
    angr: A BinaryAnalysis Framework Control-Flow Graph Data-Flow Analysis Value-Set Analysis Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 43.
    angr: A BinaryAnalysis Framework Control-Flow Graph Data-Flow Analysis Value-Set Analysis Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 44.
    angr: A BinaryAnalysis Framework Control-Flow Graph Data-Flow Analysis Value-Set Analysis Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 45.
    angr: A BinaryAnalysis Framework Control-Flow Graph Data-Flow Analysis Value-Set Analysis Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 46.
    Example cmp rbx, 0x1024 ja_OUT cmp [rax+rbx], 1337 je _OUT add rbx, 4 rbx? What is rbx in the yellow square?mov rax, 0x400000 mov rbx, 0 Symbolic execution: state explosion Naive static analysis: "anything" Range analysis: "< 0x1024" Can we do better?
  • 47.
    Memory access checksType inference Variable recovery Range recovery Wrapped-interval analysis Value-set analysis Abstract interpretation Value Set Analysis
  • 48.
    Value Set Analysis- Strided Intervals 4[0x100, 0x120],32 Stride Low High Size 0x100 0x10c 0x118 0x104 0x110 0x11c 0x108 0x114 0x120
  • 49.
    Example cmp rbx, 0x1024 ja_OUT cmp [rax+rbx], 1337 je _OUT add rbx, 4 rbx? What is rbx in the yellow square? 1. 1[0x0, 0x0],64 2. 4[0x0, 0x4],64 3. 4[0x0, 0x8],64 4. 4[0x0, 0xc],64 5. 4[0x0, ∞],64 6. 4[0x0, 0x1024],64 mov rax, 0x400000 mov rbx, 0 Widen Narrow 1 234 5 6
  • 50.
    angr: A BinaryAnalysis Framework Control-Flow Graph Data-Flow Analysis Value-Set Analysis Static Analysis Routines Symbolic Execution Engine Binary Loader angr
  • 51.
  • 52.
  • 53.
  • 54.
    - ipython-accessible - powerfulanalyses - versatile - well-encapsulated - open and expandable - architecture "independent" Angr
  • 55.
    Angr Mini-howto # ipython In[1]: import angr, networkx In [2]: binary = angr.Project("/some/binary") In [3]: cfg = binary.analyses.CFG() In [4]: networkx.draw(cfg.graph) In [5]: explorer = binary.factory.path_group() In [6]: explorer.explore(find=0xc001b000)
  • 56.
    ➔ http://angr.io ➔ https://github.com/angr ➔angr@lists.cs.ucsb.edu Pull requests, issues, questions, etc super-welcome! Let's bring on the next generation of binary analysis! Angr - Open source! Birthday: September 2013 Total line numbers: 59950 Total commits: ALMOST 9000!! (actually ~6000)
  • 58.
    Value Set Analysis- Strided Intervals 4[0x100, 0x110],32 + 1 = 4[0x101, 0x111],32 4[0x100, 0x110],32 >> 1 = 2[0x80, 0x88],31 2[0x80, 0x88],31 << 1 = 1[0x100, 0x110],32 4[0x100, 0x110],32 ⋃ 4[0x102, 0x112],32 = 2[0x100, 0x112],32 4[0x100, 0x110],32 ⋂ 3[0x100, 0x110],32 = 12[0x100, 0x112],32 WIDEN (4[0x100, 0x110],32 ⋃ 4[0x100, 0x112],32) = 4[0x100, ∞],32