Tamas K Lengyel, 7/6/2020
2
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express
and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement,
as well as any warranty arising from course of performance, course of dealing, or usage in trade.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products
described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter
disclosed herein.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands
may be claimed as the property of others.
3
Outline
1. Intro & Motivation
2. VM forking nuts & bolts
3. Kernel fuzzing with AFL on Xen
4. Malware fuzzing & Memory replay PoC
5. What’s next & challenges
6. Q&A
4
# whoami
 Senior Security Researcher @ Intel
 Maintainer of Xen’s introspection subsystem
 Maintainer of LibVMI
– Hypervisor-agnostic introspection library (Xen, KVM, Bareflank, etc.)
– Lots of super convenient APIs to do introspection with
 Background in malware research & black-box binary analysis
5
Why fuzzing?
 Time-tested approach to software validation
 Super simple, very effective
 Watch the 36C3 talk “No source, no problem! High speed binary fuzzing” for a good intro to fuzzing
 Requires some setup & writing a harness
 The harder it is to write the harness, the less likely it will be done
 How do you create a coverage trace for the kernel?
 How do you recover fast enough for fuzzing to be effective?
6
Kernel fuzzers do exist
 syzkaller
– Linux syscall fuzzer with built-in coverage guidance
– https://github.com/google/syzkaller
 kAFL
– KVM based using AFL, coverage via Intel PT & PML
– https://github.com/RUB-SysSec/kAFL
 Chocolate milk
– Custom bootloader & hypervisor, all in Rust
– https://github.com/gamozolabs/chocolate_milk
7
Why make another one?
 All of these platforms are very tightly coupled to their use-case
 We wanted something stable but also flexible to build on
 We prefer code that’s upstream, to cut down on the time it takes to maintain custom patches & debug things when they break
 Xen’s VMI subsystem is still experimental but fits the bill
 Also allows us to consider new types of fuzzing approaches
 Also allows us to target new use-cases
– Malware fuzzing!
8
Why VM forking?
 We need a way to restore VMs to a start point quickly after each fuzz cycle
 Restoring from a save-file can take up to 2s
 Even from a fast SSD or tmpfs
 For fuzzing to be effective we need to be faster than that
 Xen has a long-forgotten, half-abandoned subsystem:
– Memory sharing!
 Should be possible to use it to create forks in a fast & lightweight manner
9
Memory sharing code archeology
 First implemented by Citrix in 2009
 Fairly active development until ~2012
 Pretty much abandoned afterwards
 As expected, had some bit-rot over the years
 But for the most part it still “just works”!
10
Memory sharing
1. Enable memory sharing for each participating domain
2. Nominate a page for sharing
– Page ownership transferred to the dom_cow domain
– Page is marked read-only in the original domain’s p2m (i.e. EPT)
3. Multiple domains can now map this shared page
– Page contents are NOT checked, this is not KSM!
4. When the EPT faults due to a write access, deduplicate the page for the faulting domain and update its p2m to point to the new page
5. When no domain is left that uses the shared page, it’s released from dom_cow
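The sharing steps above can be sketched as a toy model. This is a minimal sketch, not Xen's implementation: `Page`, `DomCow`, `Domain` and all method names are hypothetical, step 1 (enabling sharing per domain) is not modeled, and a Python dict stands in for the p2m.

```python
class Page:
    """A machine page plus the number of p2m entries that map it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 0


class DomCow:
    """Hypothetical stand-in for dom_cow, the owner of all shared pages."""

    def __init__(self):
        self.owned = set()

    def nominate(self, domain, gfn):
        # Step 2: ownership moves to dom_cow, the entry becomes read-only.
        page, _ = domain.p2m[gfn]
        self.owned.add(page)
        domain.p2m[gfn] = (page, False)
        return page

    def share(self, domain, gfn, page):
        # Step 3: another domain maps the same page; contents are not compared.
        assert page in self.owned
        domain.map(gfn, page, writable=False)

    def unshare(self, domain, gfn):
        # Step 4: give the faulting domain a private, writable copy.
        shared, _ = domain.p2m[gfn]
        private = Page(shared.data)
        domain.map(gfn, private, writable=True)
        shared.refs -= 1
        if shared.refs == 0:
            # Step 5: nobody uses the shared page any more.
            self.owned.discard(shared)
        return private


class Domain:
    def __init__(self, cow):
        self.cow = cow
        self.p2m = {}  # gfn -> (Page, writable)

    def map(self, gfn, page, writable):
        page.refs += 1
        self.p2m[gfn] = (page, writable)

    def read(self, gfn):
        return bytes(self.p2m[gfn][0].data)

    def write(self, gfn, offset, value):
        page, writable = self.p2m[gfn]
        if not writable:  # models the EPT write fault
            page = self.cow.unshare(self, gfn)
        page.data[offset] = value


cow = DomCow()
dom_a, dom_b = Domain(cow), Domain(cow)
dom_a.map(0, Page(b"hello"), writable=True)
shared = cow.nominate(dom_a, 0)
cow.share(dom_b, 0, shared)
dom_b.write(0, 0, ord("J"))  # only dom_b sees this change
```

In this model the write from `dom_b` leaves `dom_a` still mapping the shared page, so the page stays with the toy dom_cow until `dom_a` also writes to it.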
11
Memory management in Xen
 The p2m is only for managing the domain’s view of its memory
 There are pages invisible to the guest that it still “owns”
 The domain struct maintains a linked list of all its pages
 How does Xen know when it’s safe to release a page?
– The actual domain is not the only one that may map it
– QEMU also needs to have access (in dom0, or a stubdom)
– Xen may also map pages itself (shared_info, vcpu_info_page)
 A shared page may also be mapped into dom0!
12
Memory management in Xen
 The solution: every time a page is mapped by anything, it’s reference counted
 A page is only safe to release when its reference count is 0
 Pages are also typed separately from the p2m
– See full list in xen/include/asm-x86/mm.h
 Surprisingly little documentation on what these types and flags do
– Or how they are even stored for the page
 Who holds each reference is not tracked, which makes debugging hard
– Pages can only be made sharable if their reference count is 1
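A minimal sketch of the reference-counting rule, assuming a bare counter per page. `RefCountedPage` and its methods are hypothetical names and do not mirror Xen's page_info layout. The model deliberately stores no holder identities, which is the debugging problem the slide mentions.

```python
class RefCountedPage:
    """Toy page: only a counter is kept, not who holds each reference."""

    def __init__(self):
        self.count = 1  # assumed: the owning domain's own mapping
        self.released = False

    def get(self):
        if self.released:
            raise RuntimeError("page already released")
        self.count += 1

    def put(self):
        if self.count == 0:
            raise RuntimeError("reference count underflow")
        self.count -= 1
        if self.count == 0:
            self.released = True  # only now is it safe to free the page

    def can_share(self):
        # Sharable only while exactly one reference exists.
        return not self.released and self.count == 1


page = RefCountedPage()
page.get()                      # e.g. QEMU in dom0 maps the page
blocked = not page.can_share()  # two references: cannot be nominated
page.put()                      # the extra mapping goes away
shareable = page.can_share()    # back to a single reference
```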
13
VM forking
1. Create domain with an empty p2m
2. Specify its parent
3. Copy vCPU parameters from parent (& some other stuff)
4. When domain is resumed, it will page-fault
5. Populate pages on-demand in the page-fault handler
– Read & execute accesses are populated with a shared entry
– Write accesses are deduplicated
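The forking steps can be sketched as a small simulation. This is a toy model under stated assumptions: `ToyVM` and its fields are hypothetical, faults are modeled as dictionary misses, and the `reset` method follows the fork-reset slide later in the deck (copy the vCPU state, free the fork's own pages, keep shared entries). Page lookup walks the whole ancestor chain so forks of forks resolve too.

```python
class ToyVM:
    def __init__(self, parent=None):
        self.parent = parent
        self.pages = {}      # gfn -> bytearray, pages this VM owns
        self.shared = set()  # gfns populated as shared (read-only) entries
        # Step 3: copy the vCPU state from the parent.
        self.vcpu = dict(parent.vcpu) if parent else {"rip": 0}

    def _find(self, gfn):
        """Search this VM and then each ancestor for the page."""
        vm = self
        while vm is not None:
            if gfn in vm.pages:
                return vm.pages[gfn]
            vm = vm.parent
        raise KeyError(gfn)

    def read(self, gfn):
        if gfn not in self.pages:
            self.shared.add(gfn)  # read/execute fault: shared entry
        return bytes(self._find(gfn))

    def write(self, gfn, offset, value):
        if gfn not in self.pages:
            # Write fault: the fork gets its own copy of the page.
            self.pages[gfn] = bytearray(self._find(gfn))
            self.shared.discard(gfn)
        self.pages[gfn][offset] = value

    def reset(self):
        # Copy the vCPU state again and free the fork's private pages.
        self.vcpu = dict(self.parent.vcpu)
        self.pages.clear()


parent = ToyVM()
parent.pages = {0: bytearray(b"code"), 1: bytearray(b"data")}
parent.vcpu["rip"] = 0x1000

fork = ToyVM(parent)        # steps 1 and 2: empty p2m, parent recorded
fork.read(0)                # populates a shared entry
fork.write(1, 0, ord("D"))  # populates a private copy
fork.vcpu["rip"] = 0x2000
```

After `fork.reset()` the fork reads the parent's original contents again, while the parent was never modified.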
14
VM forking: allocate metadata & copy vCPU
[Diagram: the parent VM (Windows/Linux) has metadata, a vCPU context and memory pages <pageX>, <pageY>. The forked VM gets its own metadata and a copy of the parent's vCPU context.]
15
Populate fork VM memory when MMU faults
[Diagram: the forked VM's memory entries all start as <n/a>. A fault on a read or execute access is populated with a shared entry for the parent's page.]
16
Populate fork VM memory when MMU faults
[Diagram: the forked VM now holds the shared entry {sharedX}. A fault on a write access is handled by deduplicating the page for the fork.]
17
Populate fork VM memory when MMU faults
[Diagram: the forked VM's memory now holds the shared entry {sharedX} and its own page <pageZ>; the remaining entries are still <n/a>. The parent VM keeps <pageX> and <pageY>.]
18
Fork reset: copy vCPU & free allocated pages
[Diagram: on reset the parent's vCPU context is copied into the fork again and the fork's allocated page <pageZ> is freed (back to <n/a>); the shared entry {sharedX} stays.]
19
VM forking!
VM fork creation time:
~745 μs ~= 1300 VM/s
VM fork reset time:
~111 μs ~= 9000 reset/s
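The rates follow directly from the latencies. The short calculation below only redoes that arithmetic, and compares the reset time with the 2 s save-file restore quoted earlier; `per_second` is a helper name made up for this sketch.

```python
def per_second(latency_us):
    """Convert a per-operation latency in microseconds to operations/second."""
    return 1_000_000 / latency_us


fork_rate = per_second(745)    # about 1342 forks per second
reset_rate = per_second(111)   # about 9009 resets per second
restore_latency_us = 2_000_000  # the "up to 2s" save-file restore
speedup_vs_restore = restore_latency_us / 111
```

So a fork reset is on the order of 18,000 times faster than the worst-case restore from a save-file.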
20
21
VM forking
 It’s different than fork() on Linux
 The parent domain currently has to remain paused while forks are active
– This was fine for our use-case
– For a full domain split, all the parent pages need to be made shared
– Pages that can’t be made shared would need an extra copy
– Doable, was out-of-scope for now
 Forks can be further forked!
– Pages are searched for through the whole chain
22
VM forking without a device model
 It’s possible to create a fork without the QEMU backend
 Launching QEMU is slow & there is no reset operation for the QEMU state
 The fork can execute with just CPU & memory assigned!
 At least some parts of the fork can
 Usually when fuzzing we are exercising very specific code locations
 Perfect for that use-case
 No interrupts!
 Fully functional VMI interface
23
24
VM forking with an IOMMU
 We wanted to fuzz the kernel and kernel modules
– Device drivers!
 Without real hardware present, initializing the code that handles it is hard
 Let’s pass the device through with an IOMMU and let everything initialize
 Code is now in a fully functional state
 When we fork, the device stays with the parent
 The fork still has fully functional, fully initialized kernel code to play with!
 Way easier than transplanting memory or hand-crafting the init
25
Fuzzing with AFL
 Another benefit of VM forks is that we can have many of them
– All running simultaneously on different cores
– Each can be created / destroyed / reset independently
– Fully utilize all your hardware!
 So let’s put it all together with AFL
– Pause parent VM when it executes magic CPUID (leaf 0x13371337)
– End of code needs to be marked with another magic CPUID
– Fork & breakpoint kernel crash handlers (oops, panic, etc)
– Run!
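The loop that results can be sketched in a few lines. This is a toy stand-in, not the released fuzzer and not AFL: `target` plays the role of the guest code between the two magic CPUIDs, `GuestCrash` plays the role of a breakpoint on a crash handler, `ToyFork` models a fork as a copied dict, and the deterministic walking-byte mutator replaces AFL's mutation engine. All of these names are hypothetical.

```python
class GuestCrash(Exception):
    """Raised when the toy target reaches its simulated crash handler."""


def target(buf):
    # Stand-in for the code between the start and end markers.
    if buf == b"FUZ":
        raise GuestCrash("oops")


def walking_byte_mutations(seed):
    """Yield every input that differs from the seed in exactly one byte."""
    for pos in range(len(seed)):
        for value in range(256):
            if value != seed[pos]:
                out = bytearray(seed)
                out[pos] = value
                yield bytes(out)


class ToyFork:
    def __init__(self, parent_memory):
        self.parent_memory = parent_memory
        self.memory = dict(parent_memory)

    def reset(self):
        self.memory = dict(self.parent_memory)


def fuzz(seed):
    parent_memory = {"input": seed}  # parent stays paused at the start marker
    fork = ToyFork(parent_memory)
    crashes, executed = [], 0
    for candidate in walking_byte_mutations(seed):
        fork.memory["input"] = candidate  # inject the test case
        try:
            target(fork.memory["input"])  # run until the end marker
        except GuestCrash:
            crashes.append(candidate)     # crash handler breakpoint hit
        executed += 1
        fork.reset()                      # back to the start point
    return crashes, executed, fork


crashes, executed, fork = fuzz(b"FUA")
```

Because each fork is independent, several such loops could run on different cores against the same paused parent.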
26
27
Coverage guidance
 We can use VMI to trace the execution
– MTF single-stepping would be way too slow for fuzzing
1. Disassemble code from the start and breakpoint the next control-flow instruction
2. When the breakpoint executes, record its location in the coverage map
3. Remove the breakpoint & enable single-step
4. Execute one instruction, record the new location & disable single-step
5. GOTO 1.
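The tracing loop above can be sketched on a made-up instruction set. Everything here is an assumption for illustration: the five opcodes, the `trace` function, and the AFL-style edge hashing used for the coverage map. The real tool disassembles guest code and uses VMI breakpoints.

```python
from collections import Counter

CONTROL_FLOW = {"jmp", "jz", "halt"}


def next_control_flow(program, pc):
    """'Disassemble' forward from pc to the next control-flow instruction."""
    while program[pc][0] not in CONTROL_FLOW:
        pc += 1
    return pc


def step(program, pc, regs):
    """Execute one instruction; return the next pc, or None on halt."""
    op, arg = program[pc]
    if op == "set":
        regs["acc"] = arg
    elif op == "dec":
        regs["acc"] -= 1
    elif op == "jmp":
        return arg
    elif op == "jz":
        return arg if regs["acc"] == 0 else pc + 1
    elif op == "halt":
        return None
    return pc + 1


def trace(program, map_size=1 << 16):
    coverage, locations, prev = Counter(), [], 0

    def record(loc):
        nonlocal prev
        locations.append(loc)
        coverage[(loc ^ prev) % map_size] += 1  # AFL-style edge hashing
        prev = loc >> 1

    pc, regs = 0, {"acc": 0}
    while pc is not None:
        bp = next_control_flow(program, pc)  # 1. place the breakpoint
        while pc != bp:                      # guest runs freely up to it
            pc = step(program, pc, regs)
        record(bp)                           # 2. breakpoint hit
        pc = step(program, bp, regs)         # 3-4. single-step over it
        if pc is not None:
            record(pc)                       # record where we landed
    return coverage, locations


# Countdown loop: set acc to 3, decrement until zero, then halt.
PROGRAM = [("set", 3), ("jz", 4), ("dec", None), ("jmp", 1), ("halt", None)]
coverage, locations = trace(PROGRAM)
```

Straight-line instructions between control-flow points run without interception, which is the reason for not single-stepping everything.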
28
29
Released as open-source (MIT)
https://github.com/intel/kernel-fuzzer-for-xen-project
30
Fuzzing malware!
Exercise binary to explore its available execution paths
Replace detection of “crash” with “malicious behavior”
Side-step reliance on anti-anti-analysis tricks
Gain confidence in results through large number of executions
Automate & scale
31
Fuzzing malware?
No source-code & debug data
Fuzzers are normally limited to ring3
Binary obfuscation & modular decryption
Encrypted communication
Scalability & containment
What is the “input” we fuzz?
32
How do we approach this?
Complexity is the bane of security
Complexity involves assumptions
Malware loves breaking our assumptions
We need to keep it simple
Our fuzzing system needs to “just work” on anything we throw at it
33
RAM
Key insight: all applications rely on memory
Inducing hardware-faults in memory has been shown to be an effective offensive
technique: Rowhammer!
We could use the same technique for fuzzing
Except we don’t have to actually hammer the RAM, we can virtualize it
34
Microsoft MicroX
Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/microx.pdf
35
We can do this!
1. Trap VM memory accesses to a hypervisor using EPT permissions
2. Fork the VM
3. Fuzz memory content in the VM fork
4. Resume VM fork & observe execution
5. Reset fork
6. Rinse & repeat
36
How to fuzz RAM of unknown binary?
Random binary is making accesses to memory
Purpose & context unknown
We can mutate the memory contents
We can do totally random values
We can mix & match
Is this going to be effective?
37
Memory replay
Key insight: memory values read or written by an application are for the most part
meaningful for the application
Replay attack is an effective offensive security technique: valid data is
maliciously or fraudulently repeated or delayed
1. Record memory values being accessed, replay them for future accesses
2. Don’t hardcode addresses
3. Don’t hardcode values
4. Dead simple
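A minimal sketch of the replay idea, assuming a toy target whose every memory read goes through a `read` callback (standing in for an EPT trap). `MEMORY`, `target`, `run` and `memory_replay` are hypothetical names, and each call to `run` plays the role of one fork. The fuzzer itself hardcodes no addresses and no values: it only replays what it has recorded.

```python
MEMORY = {"input": b"AAAAAAA", "magic": b"s3cr3t!"}


def target(read):
    """Toy guest: compares an input buffer against a magic value."""
    user = read("input")
    secret = read("magic")
    if user == secret:
        return "secret_path"
    return "normal_path"


def run(replace_at=None, value=None, log=None):
    """Run the target once; optionally replace the value of the N-th read."""
    count = 0

    def read(addr):
        nonlocal count
        data = MEMORY[addr]
        if count == replace_at:
            data = value  # replayed value instead of the real contents
        if log is not None:
            log.append(data)
        count += 1
        return data

    return target(read)


def memory_replay():
    recorded = []
    baseline = run(log=recorded)  # record the values being accessed
    findings = []
    for index in range(len(recorded)):
        # Replay values seen at earlier accesses into this later access.
        for earlier_value in recorded[:index]:
            outcome = run(replace_at=index, value=earlier_value)
            if outcome != baseline:
                findings.append((index, earlier_value, outcome))
    return baseline, findings


baseline, findings = memory_replay()
```

Here replaying the first recorded value into the second read makes the comparison succeed, so the hidden branch is reached without knowing the magic value in advance.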
38
PoC released as open-source (MIT)
https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git
39
Thank you
Questions? Comments?
Contact me: tamas.lengyel@intel.com
Twitter: @tklengyel
Repositories:
https://github.com/intel/kernel-fuzzer-for-xen-project
https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git
41
Backup slides
42
Why do we care about malware?
At IAGS Security, Privacy & Mitigations we do
- Pen Testing
- Software SAFE: secure architecture review
Both tasks require up-to-date knowledge on security issues
– How do you keep up & prioritize them?
– Knowing what interfaces are being attacked and how would help
Third party binaries
– Do we know if any of them have hidden capabilities (debug/trojan/etc)?
43
What we do today
CVEs, conferences, academic publications, blogposts, Twitter, etc.
– Ad-hoc, arbitrary, “shiny new thing” bias
Manual reverse engineering, source-code review
– Doesn’t scale, limited in scope
Fuzzing
– Mostly ring3 only, creating harness requires expert knowledge
44
What we need
We should understand what is being attacked
We should understand how it is being attacked
We should focus on hardening those components to maximize ROI
We should be able to tell when something new appears
We should get ahead of the curve
We need DATA
45
Why is that hard?
Malware fights back
Malware authors want to protect their investment
The longer the malware can spread & run, the better the ROI
Static fingerprinting has long been broken
Reverse engineering everything is not feasible
46
Dynamic analysis state-of-the-art
Some of the analysis systems are emulation based
Most recent systems are virtualization based
Most try to be stealthy to trick the malware into executing as it would in its actual
target environment
Large collection of anti-anti-analysis tricks
47
Dynamic analysis state-of-the-art
Dynamic malware analysis systems are inherently limited
check_if_malware(random_binary) == halting problem
The Engineer’s Proof by Induction: “If it’s not malware after 1 minute of execution, and it’s not malware after 2 minutes of execution, …, then it’s not malware”
¯_(ツ)_/¯
See Detecting traditional packers, decisively, D. Bueno, K. J. Compton, K. A. Sakallah and M. Bailey, RAID 2013.
48
Dynamic analysis state-of-the-art
Current automated malware analysis systems are only as good as their
understanding of the tricks that hide/delay malicious behavior
“malware can determine that a system is an artificial environment and not a real
user device with an accuracy of 92.86%”
(⋋▂⋌)
https://www.first.org/resources/papers/conf2017/Countering-Innovative-Sandbox-Evasion-Techniques-Used-by-Malware.pdf
Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts. Miramirkhani et al., IEEE S&P 2017
49
Is this really the best we can do?
50
Fuzzing malware!
51
Malware fuzzing
Fuzz known malicious binaries to find bugs
– Find botnet “kill-switch”
– Find bugs in C&C communication to take it offline
– Aid reverse engineering
– Make fun of malware
Cool stuff
Not what we are after
52
Malware detection using fuzzing
Fuzz unknown binary to detect known malware
– See if anything gets dropped while fuzzing that triggers on VirusTotal
– Monitor memory with YARA sigs
– Check for known IOCs
Cool stuff
Still not what we are after
53
Behavior modeling using fuzzing
Fuzz unknown binary to build a behavior model
– Detect hidden capabilities
– Detect capabilities that would never trigger under normal circumstances
– Perform similarity match of behavior model
– Detect unknown (buzz-word-alert: 0day) malware
Very cool stuff
That’s what we’ll talk about today
54
A simple test case
Replace magic_string with magic_string2 on-the-fly using the hypervisor!
55
56
Control-flow instructions
57
58
59
Did memory replay work?
60
Does memory replay work?
Yes! Secret path was executed
61
62
Let’s double check that we triggered secret_path at the memcmp..
63
Let’s double check that we triggered secret_path at the memcmp..
We did trigger new code..
But secret_path wasn’t triggered here as new code?
64
Let’s double check that we triggered secret_path at the memcmp..
We did trigger new code..
But secret_path wasn’t triggered here as new code?
65
Secret path is executed ~100 forks before memcmp!
66
Secret path is executed ~100 forks before memcmp!
Unknown memory location
67
Secret path is executed ~100 forks before memcmp!
Unknown memory location
Unknown fuzz value
68
This must be a ret!
69
This must be a valid address
Printed backwards due to system endianness: 0x5567a9b5c72d
This must be a ret!
70
This must be a valid address
Printed backwards due to system endianness: 0x5567a9b5c72d
And it is executed shortly after!
This must be a ret!
71
This must be a ret!
This must be a valid address
Printed backwards due to system endianness: 0x5567a9b5c72d
And it is executed shortly after!
We just smashed the stack!
72
Could it be that we smashed something in memcmp?
There are some function calls made
73
Could it be that we smashed something in memcmp?
There are some function calls made
No, this isn’t it: OP_T_THRES is defined as 8
We specifically called memcmp with a len of 7!
No other function calls are made by memcmp
74
Something executes between test() and memcmp()
There isn’t anything there though..
75
Something executes between test() and memcmp()
There isn’t anything there though..
Unless it’s the dynamic linker (ld) kicking in for a late binding!
76
Something executes between test() and memcmp()
There isn’t anything there though..
Unless it’s the dynamic linker (ld) kicking in for a late binding!
77
That explains a lot!
We have smashed the stack of the dynamic loader!
That’s why we have seen over 200 memory accesses for that extremely tiny code!
Let’s try again but with resolving imports at load time
• gcc -o test -Wl,-z,now test.c
• Memory accesses drop to 6 R and 3 R/W!
• Fuzzing this new binary results in secret_path being called where we expected
78
VM fork stats
• Forks deployed: 201
• Fuzz iterations executed: 8042
• Highest fork mem use: 13 MB
• Average fork mem use: 683 KB
79
So where are we?
We didn’t get rid of all assumptions
• Target binary must use memory in some way for its CF
• What if multiple memory locations need magic values in combination?
• AFL’s coverage map is not adequate for malware fuzzing, it can overflow
• We must have a definition of what we consider “malicious”!
• What is and isn’t malicious depends on the context
We now have a metric to measure our “trust”: number of fuzz-cases executed!
• Better than code-coverage since the code isn’t static
80
TODO
• Follow new paths and record memory values to be fed back to the fuzzer
• Actual fuzzing based on the recorded memory values
• Glitching of the registers
• Control-flow path inversion
• Taint-tracking
• Windows support
• Parallel fuzzing.. and more!
81
Challenges
API Hammering
Anti-fuzzing*
Speculative execution based path hiding**
*FUZZIFICATION: Anti-Fuzzing Techniques. Jung, J. et al., USENIX Security 2019
**ExSpectre: Hiding Malware in Speculative Execution. Wampler, J. et al., NDSS 2019

VM Forking and Hypervisor-based fuzzing

  • 1.
  • 2.
    2 Notices & Disclaimers Inteltechnologies may require enabled hardware, software or service activation. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy. No product or component can be absolutely secure. Your costs and results may vary. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
  • 3.
    3 Outline 1. Intro &Motivation 2. VM forking nuts & bolts 3. Kernel fuzzing with AFL on Xen 4. Malware fuzzing & Memory replay PoC 5. What’s next & challenges 6. Q&A
  • 4.
    4 # whoami  SeniorSecurity Research @ Intel  Maintainer of Xen’s introspection subsystem  Maintainer of LibVMI – Hypervisor agnostic introspection library (Xen, KVM, Bareflank, etc) – Lot’s of super convenient APIs to do introspection with  Background in malware research & black-box binary analysis
  • 5.
    5 Why fuzzing?  Time-testedapproach to software validation  Super simple, very effective  Watch 36c3 “No Source, no problem! High speed binary fuzzing” for a good intro to fuzzing  Requires some setup & writing a harness  The harder it is to write the harness the less likely it will be done  How do you create coverage trace for the kernel?  How do you recover fast enough for fuzzing to be effective?
  • 6.
    6 Kernel fuzzers doexist  syzkaller – Linux syscall fuzzer with built-in coverage guidance – https://github.com/google/syzkaller  kAFL – KVM based using AFL, coverage via Intel PT & PML – https://github.com/RUB-SysSec/kAFL  Chocolate milk – Custom bootloader & hypervisor, all in rust – https://github.com/gamozolabs/chocolate_milk
  • 7.
    7 Why make anotherone?  All of these platforms are very tightly coupled to their use-case  We wanted something stable but also flexible to build on  Preferring code that’s upstream to cut down on time it takes to maintain custom patches & debugging things when they break  Xen’s VMI subsystem is still experimental but fits the bill  Also allows us to consider new types of fuzzing approaches  Also allows us to target new use-cases – Malware fuzzing!
  • 8.
    8 Why VM forking? We need a way to restore VMs to a start point quickly after each fuzz cycle  Restoring from a save-file can take up to 2s  Even from a fast SSD or tmpfs  Fuzzing to be effective we need to be faster then that  Xen has a long-forgotten, half abandoned subsystem: – Memory sharing!  Should be possible to use it to create forks in a fast & lightweight manner
  • 9.
    9 Memory sharing codearcheology  First implemented by Citrix in 2009  Fairly active development until ~2012  Pretty much abandoned afterwards  As expected, had some bit-rot over the years  But for the most part it still “just works”!
  • 10.
    10 Memory sharing 1. Enablememory sharing for each participating domain 2. Nominate a page for sharing – Page ownership transferred to the dom_cow domain – Page is marked read-only in the original domain’s p2m (ie. EPT) 3. Multiple domains can now map this shared page – Page contents are NOT checked, this is not KSM! 4. When EPT faults due to write-access, deduplicate page for the faulting domain and update p2m to point to the new page 5. When no domain left that uses the shared page its released from dom_cow
  • 11.
    11 Memory management inXen  The p2m is only for managing the domain’s view of its memory  There are pages invisible to the guest but it still “owns them”  The domain struct maintains a linked_list of all pages  How does Xen know when it’s safe to release a page? – The actual domain is not the only one that may map it – QEMU also needs to have access (in dom0, or a stubdom) – Xen may also map pages itself (shared_info, vcpu_info_page)  A shared page may also be mapped into dom0!
  • 12.
    12 Memory management inXen  The solution: every time a page is mapped by anything its reference counted  Only safe to release when reference count is 0  Pages are also typed separately from the p2m – See full list in xen/include/asm-x86/mm.h  Surprisingly little documentation on what these types and flags do – Or how they are even stored for the page  Who holds the reference is also not kept, makes debugging things hard – Pages can only be made sharable if their reference count is 1
  • 13.
    13 VM forking 1. Createdomain with an empty p2m 2. Specify its parent 3. Copy vCPU parameters from parent (& some other stuff) 4. When domain is resumed, it will page-fault 5. Populate pages on-demand in the page-fault handler – Read & execute accesses are populated with a shared entry – Write accesses are deduplicated
  • 14.
    14 VM forking: allocatemetadata & copy vCPU Forked VM Metadata vCPU context Parent VM (Windows/Linux) Metadata vCPU context Memory pages <pageX> <pageY> Copy
  • 15.
    15 Populate fork VMmemory when MMU faults Forked VM Metadata vCPU context Parent VM (Windows/Linux) Metadata vCPU context Memory pages <pageX> <pageY> <n/a> <n/a> fault <n/a> Read/Exec? Share entry
  • 16.
    16 Populate fork VMmemory when MMU faults Forked VM Metadata vCPU context Parent VM (Windows/Linux) Metadata vCPU context Memory pages <pageX> <pageY> <n/a> <n/a> {sharedX} fault Write? Deduplicate
  • 17.
    17 Populate fork VMmemory when MMU faults Forked VM Metadata vCPU context Parent VM (Windows/Linux) Metadata vCPU context Memory pages <pageX> <pageY> <n/a> <n/a> {sharedX} <pageZ>
  • 18.
    18 Fork reset: copyvCPU & free allocated pages Forked VM Metadata vCPU context Parent VM (Windows/Linux) Metadata vCPU context Memory pages <pageX> <pageY> <n/a> <n/a> {sharedX} <n/a> Copy
  • 19.
    19 VM forking! VM forkcreation time: ~745 μs ~= 1300 VM/s VM fork reset time: ~111 μs ~= 9000 reset/s
  • 20.
  • 21.
    21 VM forking  It’sdifferent then fork() on Linux  The parent domain currently has to remain paused while forks are active – This was fine for our use-case – For a full domain split, all the parent pages need to be made shared – Pages that can’t be made shared would need an extra copy – Doable, was out-of-scope for now  Forks can be further forked! – Pages are searched for through the whole chain
  • 22.
    22 VM forking withouta device model  It’s possible to create a fork without the QEMU backend  Launching QEMU is slow & there is no reset operation for the QEMU state  The fork can execute with just CPU & memory assigned!  At least some parts of the fork can  Usually when fuzzing we are exercising very specific code locations  Perfect for that use-case  No interrupts!  Fully functional VMI interface
  • 23.
  • 24.
    24 VM forking withan IOMMU  We wanted to fuzz the kernel and kernel modules – Device drivers!  Without real hardware present initializing the code that handles it is hard  Let’s pass the device through with an IOMMU and let everything initialize  Code is now in fully functional state  When we fork, the device stays with the parent  The fork still has fully functional fully initialized kernel code to play with!  Way easier then having to transplant memory or hand-crafting the init
  • 25.
    25 Fuzzing with AFL Another benefit of VM forks is that we can have many of them – All running simultaneously on different cores – Each can be created / destroyed / reset independently – Fully utilize all your hardware!  So let’s put it all together with AFL – Pause parent VM when it executes magic CPUID (leaf 0x13371337) – End of code needs to be marked with another magic CPUID – Fork & breakpoint kernel crash handlers (oops, panic, etc) – Run!
  • 26.
  • 27.
    27 Coverage guidance  Wecan use VMI to trace the execution – MTF single-stepping would be way too slow for fuzzing 1. Disassemble code from the start and breakpoint next control-flow instruction 2. When breakpoint executes, record location in coverage map 3. Remove breakpoint & enable single-step 4. Execute one instruction, record location & disable singlestep 5. GOTO 1.
  • 28.
  • 29.
    29 Released as open-source(MIT) https://github.com/intel/kernel-fuzzer-for-xen-project
  • 30.
    30 Fuzzing malware! Exercise binaryto explore it’s available execution paths Replace detection of “crash” with “malicious behavior” Side-step reliance on anti-anti-analysis tricks Gain confidence in results through large number of executions Automate & scale
  • 31.
    31 Fuzzing malware? No source-code& debug data Fuzzers are normally limited to ring3 Binary obfuscation & modular decryption Encrypted communication Scalability & containment What is the “input” we fuzz?
  • 32.
    32 How do weapproach this? Complexity is the bane of security Complexity involves assumptions Malware loves breaking our assumptions We need to keep it simple Our fuzzing system needs to “just work” on anything we throw at it
  • 33.
    33 RAM Key insight: allapplications rely on memory Inducing hardware-faults in memory has been shown to be an effective offensive technique: Rowhammer! We could use the same technique for fuzzing Except we don’t have to actually hammer the RAM, we can virtualize it
  • 34.
  • 35.
    35 We can dothis! 1. Trap VM memory accesses to a hypervisor using EPT permissions 2. Fork the VM 3. Fuzz memory content in the VM fork 4. Resume VM fork & observe execution 5. Reset fork 6. Rinse & repeat
  • 36.
    36 How to fuzzRAM of unknown binary? Random binary is making accesses to memory Purpose & context unknown We can mutate the memory contents We can do totally random values We can mix & match Is this going to be effective?
  • 37.
    37 Memory replay Key insight:memory values read or written by an application are for the most part meaningful for the application Replay attack is an effective offensive security technique: valid data is maliciously or fraudulently repeated or delayed 1. Record memory values being accessed, replay them for future accesses 2. Don’t hardcode addresses 3. Don’t hardcode values 4. Dead simple
  • 38.
    38 PoC released asopen-source (MIT) https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git
  • 39.
    39 Thank you Questions? Comments? Contactme: tamas.lengyel@intel.com Twitter: @tklengyel Repositories: https://github.com/intel/kernel-fuzzer-for-xen-project https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git
  • 41.
  • 42.
    42 Why we careabout malware? At IAGS Security, Privacy & Mitigations we do - Pen Testing - Software SAFE: secure architecture review Both tasks require up-to-date knowledge on security issues – How do you keep up & prioritize them? – Knowing what interfaces are being attacked and how would help Third party binaries – Do we know if any of them have hidden capabilities (debug/trojan/etc)?
  • 43.
    43 What we dotoday CVEs, conferences, academic publications, blogposts, Twitter, etc. – Ad-hoc, arbitrary, “shiny new thing” bias Manual reverse engineering, source-code review – Doesn’t scale, limited in scope Fuzzing – Mostly ring3 only, creating harness requires expert knowledge
  • 44.
    44 What we need Weshould understand what is being attacked We should understand how it is being attacked We should focus on hardening those components to maximize ROI We should be able to tell when something new appears We should get ahead of the curve We need DATA
  • 45.
    45 Why is thathard? Malware fights back Malware authors want to protect their investment Longer the malware can spread & run the better the ROI Static fingerprinting has long been broken Reverse engineering everything is not feasible
  • 46.
    46 Dynamic analysis state-of-the-art Someof the analysis systems are emulation based Most recent systems are virtualization based Most try to be stealthy to trick the malware into executing as it would in its actual target environment Large collection of anti-anti-analysis tricks
  • 47.
    47 Dynamic analysis state-of-the-art Dynamicmalware analysis systems are inherently limited check_if_malware(random_binary) == halting problem The Engineer’s Proof by Induction: “If it’s not malware after 1 minute of execution, and it’s not malware after 2 minute of execution, …, then it’s not malware” ¯_(ツ)_/¯ See Detecting traditional packers, decisively, D. Bueno, K. J. Compton, K. A. Sakallah and M. Bailey, RAID 2013.
  • 48.
    48 Dynamic analysis state-of-the-art Currentautomated malware analysis systems are only as good as their understanding of the tricks that hide/delay malicious behavior “malware can determine that a system is an artificial environment and not a real user device with an accuracy of 92.86%” (⋋▂⋌) https://www.first.org/resources/papers/conf2017/Countering-Innovative-Sandbox-Evasion-Techniques-Used-by-Malware.pdf Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts. Miramirkhani et al., IEEE S&P 2017
  • 49.
    49 Is this reallythe best we can do?
  • 50.
  • 51.
    51 Malware fuzzing Fuzz knownmalicious binaries to find bugs – Find botnet “kill-switch” – Find bugs in c2c communication to take it offline – Aid reverse engineering – Make fun of malware Cool stuff Not what we are after
  • 52.
    52 Malware detection usingfuzzing Fuzz unknown binary to detect known malware – See if anything gets dropped while fuzzing that triggers on VirusTotal – Monitor memory with YARA sigs – Check for known IOCs Cool stuff Still not what we are after
  • 53.
    53 Behavior modeling using fuzzing Fuzz unknown binary to build a behavior model – Detect hidden capabilities – Detect capabilities that would never trigger under normal circumstances – Perform similarity match of behavior model – Detect unknown (buzz-word-alert: 0day) malware Very cool stuff That’s what we’ll talk about today
  • 54.
    54 A simple testcase Replace magic_string with magic_string2 on-the-fly using the hypervisor!
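The test program shown on this slide is not captured in the transcript. As a rough sketch of what such a testcase could look like: the names `test`, `secret_path`, `magic_string2` and the 7-byte `memcmp` length come from the slides, while the string contents, the hit counter and the `replay_write` helper (a stand-in for the hypervisor overwriting guest memory) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAGIC_LEN 7  /* the talk calls memcmp with a length of 7 */

/* Hypothetical contents: the real strings are not in the transcript. */
const char magic_string2[MAGIC_LEN + 1] = "s3cr3t!";

int secret_path_hits = 0;

/* The branch that ordinary inputs never reach. */
void secret_path(void)
{
    secret_path_hits++;
}

/* Compare the buffer against the magic value.
 * Returns 1 if the hidden branch ran, 0 otherwise. */
int test(const char *buf)
{
    if (memcmp(buf, magic_string2, MAGIC_LEN) == 0) {
        secret_path();
        return 1;
    }
    return 0;
}

/* Hypothetical stand-in for the hypervisor rewriting guest memory
 * inside a VM fork just before the comparison executes. */
void replay_write(char *guest_buf, const char *value, size_t len)
{
    assert(len <= MAGIC_LEN);
    memcpy(guest_buf, value, len);
}
```

Calling `test()` on a buffer holding any other 7 bytes returns 0; after `replay_write()` copies the magic value into that buffer, the same call reaches `secret_path()`. In the talk the overwrite happens from outside the guest, so the target binary itself never contains such a helper.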
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
    60 Does memory replay work? Yes! Secret path was executed
  • 61.
  • 62.
    62 Let’s double check that we triggered secret_path at the memcmp..
  • 63.
    63 Let’s double check that we triggered secret_path at the memcmp.. We did trigger new code.. But secret_path wasn’t triggered here as new code?
  • 64.
    64 Let’s double check that we triggered secret_path at the memcmp.. We did trigger new code.. But secret_path wasn’t triggered here as new code?
  • 65.
    65 Secret path is executed ~100 forks before memcmp!
  • 66.
    66 Secret path is executed ~100 forks before memcmp! Unknown memory location
  • 67.
    67 Secret path is executed ~100 forks before memcmp! Unknown memory location Unknown fuzz value
  • 68.
  • 69.
    69 This must be a valid address Printed backwards due to system endianness 0x5567a9b5c72d This must be a ret!
  • 70.
    70 This must be a valid address Printed backwards due to system endianness 0x5567a9b5c72d And it is executed shortly after! This must be a ret!
  • 71.
    71 This must be a ret! This must be a valid address Printed backwards due to system endianness 0x5567a9b5c72d And it is executed shortly after! We just smashed the stack!
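To illustrate the "printed backwards due to system endianness" remark: on a little-endian machine the least significant byte of a pointer is stored first, so a raw byte dump of a saved return address reads in reverse. The sketch below reassembles such a dump into the address from the slide. The exact dump format used in the talk is not in the transcript, so the byte array in the usage note and both helper names are assumptions; the user-space bound assumes x86-64 with 48-bit virtual addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassemble up to 8 bytes, stored least-significant first, into a 64-bit value. */
uint64_t le_bytes_to_u64(const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL);
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 8; i++)
        value |= (uint64_t)bytes[i] << (8 * i);
    return value;
}

/* Rough plausibility check (assumption: x86-64, 48-bit virtual addresses):
 * user-space addresses are non-zero and below 0x0000800000000000. */
int looks_like_user_address(uint64_t value)
{
    return value != 0 && value < UINT64_C(0x0000800000000000);
}
```

For example, the byte sequence `2d c7 b5 a9 67 55 00 00` passed to `le_bytes_to_u64()` yields `0x5567a9b5c72d`, which also passes the user-space check. A value like that sitting on the stack and being executed shortly afterwards is what points to an overwritten return address.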
  • 72.
    72 Could it be that we smashed something in memcmp? There are some function calls made
  • 73.
    73 Could it be that we smashed something in memcmp? There are some function calls made No, this isn’t it, OP_T_THRES is defined as 8 We specifically called memcmp with a len of 7! No other function calls are made by memcmp
  • 74.
    74 Something executes between test() and memcmp() There isn’t anything there though..
  • 75.
    75 Something executes between test() and memcmp() There isn’t anything there though.. Unless it’s the dynamic linker (ld) kicking in for a late binding!
  • 76.
    76 Something executes between test() and memcmp() There isn’t anything there though.. Unless it’s the dynamic linker (ld) kicking in for a late binding!
  • 77.
    77 That explains a lot! We have smashed the stack of the dynamic loader! That’s why we have seen over 200 memory accesses for that extremely tiny code! Let’s try again, this time resolving imports at load time • gcc -o test -Wl,-z,now test.c • Memory accesses drop to 6 R and 3 R/W! • Fuzzing this new binary results in secret_path being called where we expected
  • 78.
    78 VM fork stats • Forks deployed: 201 • Fuzz iterations executed: 8042 • Highest fork mem use: 13 MB • Average fork mem use: 683 KB
  • 79.
    79 So where are we? We didn’t get rid of all assumptions • Target binary must use memory in some way for its CF • What if multiple memory locations need magic values in combination? • AFL’s coverage map is not adequate for malware fuzzing, can be overflown • We must have a definition of what we consider “malicious”! • What is and isn’t malicious depends on the context We now have a metric to measure our “trust”: number of fuzz-cases executed! • Better than code-coverage since the code isn’t static
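One plausible reading of the coverage map remark, sketched under assumptions: AFL tracks edges in a fixed 64 KB map of 8-bit hit counters, indexed by the current block ID XORed with the shifted previous block ID. The code below is a simplified imitation of that scheme, not AFL's actual source. The function names are hypothetical, and masking arbitrary block IDs down to the map size is an assumption made here to model address-derived IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)  /* 64 KB map, matching AFL's default size */

uint8_t coverage_map[MAP_SIZE];
uint32_t prev_location;

/* Clear the map and the edge-tracking state. */
void cov_reset(void)
{
    memset(coverage_map, 0, sizeof(coverage_map));
    prev_location = 0;
}

/* Start a new execution without clearing accumulated coverage. */
void cov_new_run(void)
{
    prev_location = 0;
}

/* Record the edge (previous block -> current block), AFL style. */
void cov_visit(uint32_t cur_location)
{
    uint32_t cur = cur_location & (MAP_SIZE - 1);
    coverage_map[cur ^ prev_location]++;  /* 8-bit counter: wraps after 255 */
    prev_location = cur >> 1;
}

/* Number of map entries that currently look "covered". */
size_t cov_count(void)
{
    size_t covered = 0;
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (coverage_map[i] != 0)
            covered++;
    return covered;
}
```

Two effects show up in this model. Distinct blocks whose IDs differ only above bit 15 land in the same map entry, so new code can go unnoticed, and an edge hit exactly 256 times wraps its counter back to zero and looks as if it never ran. Code that unpacks or rewrites itself at runtime can keep producing fresh block IDs, which makes both effects more likely than with a static binary.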
  • 80.
    80 TODO • Follow new paths and record memory values to be fed back to the fuzzer • Actual fuzzing based on the recorded memory values • Glitching of the registers • Control-flow path inversion • Taint-tracking • Windows support • Parallel fuzzing.. and more!
  • 81.
    81 Challenges API Hammering Anti-fuzzing* Speculative execution-based path hiding** *FUZZIFICATION: Anti-Fuzzing Techniques. Jung, J. et al., USENIX Security 2018 **ExSpectre: Hiding Malware in Speculative Execution. Wampler, J. et al., NDSS 2018