VM Forking and Hypervisor-based fuzzing

2
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express
and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement,
as well as any warranty arising from course of performance, course of dealing, or usage in trade.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products
described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter
disclosed herein.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands
may be claimed as the property of others.

3
Outline
1. Intro & Motivation
2. VM forking nuts & bolts
3. Kernel fuzzing with AFL on Xen
4. Malware fuzzing & Memory replay PoC
5. What’s next & challenges
6. Q&A

4
# whoami
• Senior Security Researcher @ Intel
• Maintainer of Xen’s introspection subsystem
• Maintainer of LibVMI
– Hypervisor-agnostic introspection library (Xen, KVM, Bareflank, etc.)
– Lots of super convenient APIs to do introspection with
• Background in malware research & black-box binary analysis

5
Why fuzzing?
• Time-tested approach to software validation
• Super simple, very effective
• Watch the 36C3 talk “No Source, no problem! High speed binary fuzzing” for a good intro to fuzzing
• Requires some setup & writing a harness
• The harder the harness is to write, the less likely it is to get written
• How do you create a coverage trace for the kernel?
• How do you recover fast enough for fuzzing to be effective?

6
Kernel fuzzers do exist
• syzkaller
– Linux syscall fuzzer with built-in coverage guidance
– https://github.com/google/syzkaller
• kAFL
– KVM-based, using AFL, coverage via Intel PT & PML
– https://github.com/RUB-SysSec/kAFL
• Chocolate milk
– Custom bootloader & hypervisor, all in Rust
– https://github.com/gamozolabs/chocolate_milk

7
Why make another one?
• All of these platforms are very tightly coupled to their use-case
• We wanted something stable but also flexible to build on
• Preferring code that’s upstream cuts down on the time it takes to maintain custom patches & debug things when they break
• Xen’s VMI subsystem is still experimental but fits the bill
• It also allows us to consider new types of fuzzing approaches
• It also allows us to target new use-cases
– Malware fuzzing!

8
Why VM forking?
• We need a way to restore VMs to a start point quickly after each fuzz cycle
• Restoring from a save-file can take up to 2s
• Even from a fast SSD or tmpfs
• For fuzzing to be effective we need to be faster than that
• Xen has a long-forgotten, half-abandoned subsystem:
– Memory sharing!
• It should be possible to use it to create forks in a fast & lightweight manner

9
Memory sharing code archeology
• First implemented by Citrix in 2009
• Fairly active development until ~2012
• Pretty much abandoned afterwards
• As expected, it suffered some bit-rot over the years
• But for the most part it still “just works”!

10
Memory sharing
1. Enable memory sharing for each participating domain
2. Nominate a page for sharing
– Page ownership transferred to the dom_cow domain
– Page is marked read-only in the original domain’s p2m (i.e., EPT)
3. Multiple domains can now map this shared page
– Page contents are NOT checked, this is not KSM!
4. When EPT faults due to write-access, deduplicate page for the faulting
domain and update p2m to point to the new page
5. When no domain is left using the shared page, it is released from dom_cow
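A toy Python model of steps 2 to 5 might look like the following (step 1, enabling sharing per domain, is left out). It is a sketch under simplifying assumptions, not Xen's code: `SharedPool`, `nominate`, `share` and `write` are hypothetical names, dom_cow is a plain dict holding page contents plus a count of mapping domains, and a write replaces the whole page.

```python
class SharedPool:
    """Toy model of Xen memory sharing (hypothetical names, not Xen's API)."""

    def __init__(self):
        self.dom_cow = {}    # handle -> [page contents, number of domains mapping it]
        self.p2m = {}        # (domain, gfn) -> ("private", contents) or ("shared", handle)
        self.next_handle = 0

    def nominate(self, domain, gfn):
        """Step 2: hand a private page to dom_cow; the owner keeps a read-only mapping."""
        kind, contents = self.p2m[(domain, gfn)]
        if kind != "private":
            raise ValueError("page is already shared")
        handle = self.next_handle
        self.next_handle += 1
        self.dom_cow[handle] = [contents, 1]
        self.p2m[(domain, gfn)] = ("shared", handle)
        return handle

    def share(self, handle, domain, gfn):
        """Step 3: another domain maps the same page; contents are never compared."""
        self.dom_cow[handle][1] += 1
        self.p2m[(domain, gfn)] = ("shared", handle)

    def read(self, domain, gfn):
        kind, value = self.p2m[(domain, gfn)]
        return self.dom_cow[value][0] if kind == "shared" else value

    def write(self, domain, gfn, contents):
        """Step 4: a write to a shared page gives the writer its own private page."""
        kind, value = self.p2m[(domain, gfn)]
        if kind == "shared":
            self._unshare(value)
        self.p2m[(domain, gfn)] = ("private", contents)

    def _unshare(self, handle):
        """Step 5: drop one user; release the page from dom_cow when none are left."""
        self.dom_cow[handle][1] -= 1
        if self.dom_cow[handle][1] == 0:
            del self.dom_cow[handle]


pool = SharedPool()
pool.p2m[("domA", 0x10)] = ("private", b"hello")
handle = pool.nominate("domA", 0x10)
pool.share(handle, "domB", 0x20)
pool.write("domB", 0x20, b"world")   # domB deduplicates, domA still sees the original
```

The shared page only disappears from dom_cow once the last domain has written to (or otherwise dropped) its mapping.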

11
Memory management in Xen
• The p2m only manages the domain’s view of its memory
• There are pages invisible to the guest that it still “owns”
• The domain struct maintains a linked list of all its pages
• How does Xen know when it’s safe to release a page?
– The actual domain is not the only one that may map it
– QEMU also needs to have access (in dom0, or a stubdom)
– Xen may also map pages itself (shared_info, vcpu_info_page)
• A shared page may also be mapped into dom0!

12
Memory management in Xen
• The solution: every time a page is mapped by anything, it’s reference counted
• Only safe to release when the reference count is 0
• Pages are also typed separately from the p2m
– See the full list in xen/include/asm-x86/mm.h
• Surprisingly little documentation on what these types and flags do
– Or even how they are stored for the page
• Who holds a reference is not recorded either, which makes debugging hard
– Pages can only be made shareable if their reference count is 1
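The reference-counting rules can be sketched as a tiny class. `PageInfo` here is hypothetical and does not mirror Xen's real per-page layout; it assumes the owning domain's allocation counts as one reference. As on the slide, only a number is kept, so nothing records which party holds each reference.

```python
class PageInfo:
    """Toy per-page metadata: a bare reference count, with no record of holders."""

    def __init__(self, owner):
        self.owner = owner
        self.count = 1        # assumption: the owner's allocation is one reference
        self.freed = False

    def get(self):
        """Anything that maps the page (guest, QEMU, Xen, dom0) takes a reference."""
        if self.freed:
            raise RuntimeError("page already released")
        self.count += 1

    def put(self):
        """Unmapping drops a reference; the page is released only at zero."""
        if self.count == 0:
            raise RuntimeError("reference count underflow")
        self.count -= 1
        if self.count == 0:
            self.freed = True

    def can_share(self):
        """Shareable only when the owner's reference is the only one."""
        return self.count == 1


page = PageInfo("domU")
shareable_before = page.can_share()        # True: only the owner references it
page.get()                                 # e.g. QEMU in dom0 maps the page
shareable_while_mapped = page.can_share()  # False: someone else holds a reference
page.put()                                 # QEMU unmaps
page.put()                                 # owner drops its reference
```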

13
VM forking
1. Create domain with an empty p2m
2. Specify its parent
3. Copy vCPU parameters from parent (& some other stuff)
4. When domain is resumed, it will page-fault
5. Populate pages on-demand in the page-fault handler
– Read & execute accesses are populated with a shared entry
– Write accesses are deduplicated
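The on-demand population in steps 4 and 5, plus the reset shown a few slides later (free the fork's private pages, copy the vCPU state again), can be modeled in Python. `ParentVM`, `ForkVM` and their methods are hypothetical; the "p2m" is a dict from page number to a (kind, page) tuple and the vCPU is a dict of registers.

```python
class ParentVM:
    def __init__(self, memory, vcpu):
        self.memory = {gfn: bytearray(data) for gfn, data in memory.items()}
        self.vcpu = dict(vcpu)         # the parent stays paused while forks exist


class ForkVM:
    def __init__(self, parent):
        self.parent = parent
        self.p2m = {}                  # 1. empty p2m
        self.vcpu = dict(parent.vcpu)  # 3. copy vCPU state from the parent

    def _populate(self, gfn, write):
        """5. Page-fault handler: share on read/exec, private copy on write."""
        page = self.parent.memory[gfn]
        if write:
            self.p2m[gfn] = ("private", bytearray(page))
        else:
            self.p2m[gfn] = ("shared", page)   # same object as the parent's page

    def read(self, gfn, offset):
        if gfn not in self.p2m:
            self._populate(gfn, write=False)
        return self.p2m[gfn][1][offset]

    def write(self, gfn, offset, value):
        if self.p2m.get(gfn, ("missing",))[0] != "private":
            self._populate(gfn, write=True)
        self.p2m[gfn][1][offset] = value

    def reset(self):
        """Drop private pages, keep shared entries, and re-copy the vCPU."""
        self.p2m = {g: e for g, e in self.p2m.items() if e[0] == "shared"}
        self.vcpu = dict(self.parent.vcpu)


parent = ParentVM({0: b"\x00" * 8, 1: b"ABCDEFGH"}, {"rip": 0x1000})
fork = ForkVM(parent)
first_byte = fork.read(1, 0)      # read fault -> shared entry
fork.write(0, 0, 0xFF)            # write fault -> private copy
fork.vcpu["rip"] = 0x2000         # the fork executes ahead
fork.reset()
```

Because the parent is paused, a shared entry can safely point at the parent's own page; only writes ever need a copy.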

14
VM forking: allocate metadata & copy vCPU
[Diagram: the Forked VM gets its own metadata and a copy of the vCPU context of the Parent VM (Windows/Linux); the parent's memory pages <pageX>, <pageY> stay with the parent]

15
Populate fork VM memory when MMU faults
[Diagram: the Forked VM's memory entries start out empty (<n/a>); on a read/exec fault, the entry is populated with a shared entry pointing to the parent's <pageX>]

16
[Diagram: the Forked VM holds the shared entry {sharedX}; on a write fault, the page is deduplicated into a private copy for the fork]

17
[Diagram: the Forked VM now holds the shared entry {sharedX} and its own private page <pageZ>]

18
Fork reset: copy vCPU & free allocated pages
[Diagram: on reset, the parent's vCPU context is copied into the Forked VM again and the private page <pageZ> is freed (back to <n/a>); the shared entry {sharedX} is kept]

19
VM forking!
VM fork creation time:
~745 μs ~= 1300 VM/s
VM fork reset time:
~111 μs ~= 9000 reset/s
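The rates follow from simple arithmetic. A quick Python check, assuming one operation at a time on one core and ignoring the time the fuzz case itself runs; the 2 s figure is the upper bound quoted earlier for restoring from a save-file.

```python
FORK_CREATE_US = 745          # measured fork creation time (from the slide)
FORK_RESET_US = 111           # measured fork reset time (from the slide)
RESTORE_FILE_US = 2_000_000   # "up to 2s" to restore from a save-file


def per_second(latency_us):
    """Operations per second if each one takes latency_us microseconds."""
    return 1_000_000 / latency_us


creates_per_s = per_second(FORK_CREATE_US)             # ~1342
resets_per_s = per_second(FORK_RESET_US)               # ~9009
speedup_vs_restore = RESTORE_FILE_US / FORK_RESET_US   # ~18018x

print(f"{creates_per_s:.0f} forks/s, {resets_per_s:.0f} resets/s, "
      f"{speedup_vs_restore:.0f}x faster than a worst-case restore")
```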

21
VM forking
• It’s different than fork() on Linux
• The parent domain currently has to remain paused while forks are active
– This was fine for our use-case
– For a full domain split, all the parent pages would need to be made shared
– Pages that can’t be made shared would need an extra copy
– Doable, but out of scope for now
• Forks can be further forked!
– Pages are searched for through the whole chain
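Searching through the chain can be pictured as walking parent links until some VM already has the page. A minimal Python sketch: `VM` and `find_page` are hypothetical, and it leaves out whether intermediate forks get populated along the way, which the slide does not cover.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VM:
    name: str
    parent: Optional["VM"] = None
    pages: dict = field(default_factory=dict)   # gfn -> contents populated in this VM


def find_page(vm, gfn):
    """Return (owner name, contents) from the nearest VM in the chain that has gfn."""
    node = vm
    while node is not None:
        if gfn in node.pages:
            return node.name, node.pages[gfn]
        node = node.parent
    raise KeyError(f"gfn {gfn:#x} not found anywhere in the fork chain")


root = VM("parent", pages={0x10: b"kernel", 0x20: b"heap"})
fork1 = VM("fork1", parent=root, pages={0x20: b"heap-modified"})
fork2 = VM("fork2", parent=fork1)    # a fork of a fork

print(find_page(fork2, 0x20))   # ('fork1', b'heap-modified')
print(find_page(fork2, 0x10))   # ('parent', b'kernel')
```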

22
VM forking without a device model
• It’s possible to create a fork without the QEMU backend
• Launching QEMU is slow & there is no reset operation for the QEMU state
• The fork can execute with just CPU & memory assigned!
• At least some parts of the fork can
• Usually when fuzzing we are exercising very specific code locations
• Perfect for that use-case
• No interrupts!
• Fully functional VMI interface

24
VM forking with an IOMMU
• We wanted to fuzz the kernel and kernel modules
– Device drivers!
• Without the real hardware present, it is hard to initialize the code that handles it
• Let’s pass the device through with an IOMMU and let everything initialize
• The code is now in a fully functional state
• When we fork, the device stays with the parent
• The fork still has fully functional, fully initialized kernel code to play with!
• Way easier than having to transplant memory or hand-craft the init

25
Fuzzing with AFL
• Another benefit of VM forks is that we can have many of them
– All running simultaneously on different cores
– Each can be created / destroyed / reset independently
– Fully utilize all your hardware!
• So let’s put it all together with AFL
– Pause the parent VM when it executes a magic CPUID (leaf 0x13371337)
– Mark the end of the code with another magic CPUID
– Fork & breakpoint kernel crash handlers (oops, panic, etc.)
– Run!
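The control flow of that loop can be sketched as a simulation. Everything here is hypothetical: the guest is a Python generator that reports the events the hypervisor would observe, the fork is a dict copy of the paused parent's memory, `INPUT_GFN` is a made-up location for the test case, and the end marker is assumed to reuse leaf 0x13371337 (the slide only says "another magic CPUID").

```python
MAGIC_LEAF = 0x13371337   # start marker leaf from the slide; reused as end marker (assumption)
INPUT_GFN = 0x42          # hypothetical page holding the input buffer


def guest_code(memory):
    """Hypothetical kernel code between the markers.

    Yields ("crash", handler) if a breakpointed crash handler is reached,
    or ("cpuid", leaf) when the end-of-harness CPUID executes.
    """
    buf = memory[INPUT_GFN]
    if buf[:2] == b"\xde\xad":
        yield ("crash", "panic")
        return
    yield ("cpuid", MAGIC_LEAF)


def fuzz(parent_memory, inputs):
    """Run each input in a fresh fork of the paused parent; collect crashes."""
    crashes = []
    for data in inputs:
        fork = dict(parent_memory)      # fork (a reset would do the same job)
        fork[INPUT_GFN] = data          # inject the test case into the fork
        for kind, value in guest_code(fork):
            if kind == "crash":
                crashes.append((data, value))
                break
            if kind == "cpuid" and value == MAGIC_LEAF:
                break                   # end marker: clean exit, nothing found
    return crashes


parent_memory = {INPUT_GFN: b"\x00\x00", 0x43: b"other data"}
found = fuzz(parent_memory, [b"\x00\x01", b"\xde\xad", b"\xbe\xef"])
print(found)   # [(b'\xde\xad', 'panic')]
```

The parent's memory is never touched: every test case runs in its own copy, which is what lets many forks run side by side.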

27
Coverage guidance
• We can use VMI to trace the execution
– MTF single-stepping every instruction would be way too slow for fuzzing
1. Disassemble code from the start and breakpoint the next control-flow instruction
2. When the breakpoint executes, record the location in the coverage map
3. Remove the breakpoint & enable single-step
4. Execute one instruction, record the location & disable single-step
5. GOTO 1.
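The breakpoint/single-step loop can be simulated over a made-up instruction set. `PROGRAM`, `execute`, `next_control_flow` and `trace` are hypothetical; each control-flow instruction costs one breakpoint hit plus one single-step, while straight-line code runs without exits. The map update borrows AFL's classic edge hash (`cur ^ (prev >> 1)`), applied here to addresses as an illustration.

```python
# Toy ISA: address -> (size, kind, branch target); kind is "op", "jmp", "jz" or "halt".
PROGRAM = {
    0x00: (2, "op", None),
    0x02: (2, "jz", 0x08),
    0x04: (2, "op", None),
    0x06: (2, "jmp", 0x0A),
    0x08: (2, "op", None),
    0x0A: (1, "halt", None),
}
MAP_SIZE = 1 << 16


def execute(addr, zero_flag):
    """The 'CPU': run one instruction and return the next address."""
    size, kind, target = PROGRAM[addr]
    if kind == "jmp" or (kind == "jz" and zero_flag):
        return target
    return addr + size


def next_control_flow(addr):
    """'Disassemble' forward to the next control-flow instruction."""
    while PROGRAM[addr][1] == "op":
        addr += PROGRAM[addr][0]
    return addr


def trace(start, zero_flag):
    """Return the taken edges and an AFL-style coverage map for one run."""
    edges, bitmap = [], bytearray(MAP_SIZE)
    pc = start
    while True:
        pc = next_control_flow(pc)     # 1. breakpoint it; the guest runs up to it
        if PROGRAM[pc][1] == "halt":   # toy end condition
            return edges, bitmap
        dest = execute(pc, zero_flag)  # 2-4. bp hit: record, remove bp, single-step once
        edges.append((pc, dest))
        idx = (dest ^ (pc >> 1)) % MAP_SIZE
        bitmap[idx] = (bitmap[idx] + 1) & 0xFF   # 8-bit hit counter that wraps
        pc = dest                      # 5. GOTO 1 from where the branch went


taken, _ = trace(0x00, zero_flag=True)
not_taken, _ = trace(0x00, zero_flag=False)
print(taken)       # [(2, 8)]
print(not_taken)   # [(2, 4), (6, 10)]
```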

29
Released as open-source (MIT)
https://github.com/intel/kernel-fuzzer-for-xen-project

30
Fuzzing malware!
Exercise binary to explore its available execution paths
Replace detection of “crash” with “malicious behavior”
Side-step reliance on anti-anti-analysis tricks
Gain confidence in results through large number of executions
Automate & scale

31
Fuzzing malware?
No source-code & debug data
Fuzzers are normally limited to ring3
Binary obfuscation & modular decryption
Encrypted communication
Scalability & containment
What is the “input” we fuzz?

32
How do we approach this?
Complexity is the bane of security
Complexity involves assumptions
Malware loves breaking our assumptions
We need to keep it simple
Our fuzzing system needs to “just work” on anything we throw at it

33
RAM
Key insight: all applications rely on memory
Inducing hardware faults in memory has been shown to be an effective offensive technique: Rowhammer!
We could use the same technique for fuzzing
Except we don’t have to actually hammer the RAM, we can virtualize it

34
Microsoft MicroX
Source: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/microx.pdf

35
We can do this!
1. Trap VM memory accesses to a hypervisor using EPT permissions
2. Fork the VM
3. Fuzz memory content in the VM fork
4. Resume VM fork & observe execution
5. Reset fork
6. Rinse & repeat
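The six steps can be mimicked in Python with a dict standing in for guest memory. The target, the trapped address and the magic byte 0x5A are all made up; trapping with EPT is reduced to knowing which address the target reads, and values are drawn in a random order over all 256 byte values so the example stays reproducible.

```python
import random

TRAPPED_ADDR = 0x1000     # 1. hypothetical address whose read EPT would trap


def target(memory):
    """Hypothetical binary whose control flow depends on one byte of memory."""
    flag = memory[TRAPPED_ADDR]
    return "hidden_path" if flag == 0x5A else "normal_path"


def fuzz_memory(parent_memory, seed=1):
    """Fork, mutate the trapped byte, run, observe, reset; keep one value per path."""
    rng = random.Random(seed)
    paths = {}
    for value in rng.sample(range(256), k=256):   # 6. every byte value, random order
        fork = dict(parent_memory)                 # 2. fork (5. reset = discard it)
        fork[TRAPPED_ADDR] = value                 # 3. fuzz the memory content
        path = target(fork)                        # 4. resume and observe
        paths.setdefault(path, value)              # first value that hit each path
    return paths


parent_memory = {TRAPPED_ADDR: 0x00}
paths = fuzz_memory(parent_memory)
print(paths["hidden_path"])   # 90 (0x5A)
```

With a single byte, exhaustive random-order search is cheap; for wider values purely random mutation quickly becomes hopeless, which is the motivation for the next slides.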

36
How to fuzz RAM of unknown binary?
Random binary is making accesses to memory
Purpose & context unknown
We can mutate the memory contents
We can do totally random values
We can mix & match
Is this going to be effective?

37
Memory replay
Key insight: memory values read or written by an application are for the most part
meaningful for the application
Replay attack is an effective offensive security technique: valid data is
maliciously or fraudulently repeated or delayed
1. Record memory values being accessed, replay them for future accesses
2. Don’t hardcode addresses
3. Don’t hardcode values
4. Dead simple
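A minimal sketch of record-and-replay, modeled loosely on the magic_string/magic_string2 test case shown later. The two 7-byte strings and the `target` function are hypothetical, reads go through a callback standing in for the hypervisor's trap, and each tampered run stands for one fork. `replay` never hardcodes an address or a value: it only reuses what the recorded run read.

```python
MEMORY = {"magic_string": b"hello!!", "magic_string2": b"s3cr3t!"}   # hypothetical


def target(read):
    """Hypothetical program: takes the secret path only if both strings match."""
    a = read("magic_string")
    b = read("magic_string2")
    return "secret_path" if a == b else "normal_path"


def record(memory):
    """Run once and log every value the program reads."""
    log = []

    def read(addr):
        log.append(memory[addr])
        return memory[addr]

    return target(read), log


def run_with_override(memory, site, value):
    """Run in a 'fork' where the site-th read returns value instead of memory."""
    count = 0

    def read(addr):
        nonlocal count
        result = value if count == site else memory[addr]
        count += 1
        return result

    return target(read)


def replay(memory):
    """Replay every recorded value at every recorded read site."""
    baseline, recorded = record(memory)
    seen, discoveries = {baseline}, []
    for site in range(len(recorded)):
        for value in dict.fromkeys(recorded):      # unique values, first-seen order
            path = run_with_override(memory, site, value)
            if path not in seen:
                seen.add(path)
                discoveries.append((path, site, value))
    return baseline, discoveries


baseline, found = replay(MEMORY)
print(baseline, found)   # normal_path [('secret_path', 0, b's3cr3t!')]
```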

38
PoC released as open-source (MIT)
https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git

39
Thank you
Questions? Comments?
Contact me: tamas.lengyel@intel.com
Twitter: @tklengyel
Repositories:
https://github.com/intel/kernel-fuzzer-for-xen-project
https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git

42
Why do we care about malware?
At IAGS Security, Privacy & Mitigations we do
- Pen Testing
- Software SAFE: secure architecture review
Both tasks require up-to-date knowledge on security issues
– How do you keep up & prioritize them?
– Knowing what interfaces are being attacked and how would help
Third party binaries
– Do we know if any of them have hidden capabilities (debug/trojan/etc)?

43
What we do today
CVEs, conferences, academic publications, blogposts, Twitter, etc.
– Ad-hoc, arbitrary, “shiny new thing” bias
Manual reverse engineering, source-code review
– Doesn’t scale, limited in scope
Fuzzing
– Mostly ring3 only; creating a harness requires expert knowledge

44
What we need
We should understand what is being attacked
We should understand how it is being attacked
We should focus on hardening those components to maximize ROI
We should be able to tell when something new appears
We should get ahead of the curve
We need DATA

45
Why is that hard?
Malware fights back
Malware authors want to protect their investment
The longer the malware can spread & run, the better the ROI
Static fingerprinting has long been broken
Reverse engineering everything is not feasible

46
Dynamic analysis state-of-the-art
Some of the analysis systems are emulation based
Most recent systems are virtualization based
Most try to be stealthy to trick the malware into executing as it would in its actual
target environment
Large collection of anti-anti-analysis tricks

47
Dynamic malware analysis systems are inherently limited
check_if_malware(random_binary) == halting problem
The Engineer’s Proof by Induction: “If it’s not malware after 1 minute of execution, and it’s not malware after 2 minutes of execution, …, then it’s not malware”
¯\_(ツ)_/¯
See Detecting traditional packers, decisively, D. Bueno, K. J. Compton, K. A. Sakallah and M. Bailey, RAID 2013.

48
Current automated malware analysis systems are only as good as their
understanding of the tricks that hide/delay malicious behavior
“malware can determine that a system is an artificial environment and not a real
user device with an accuracy of 92.86%”
(⋋▂⋌)
https://www.first.org/resources/papers/conf2017/Countering-Innovative-Sandbox-Evasion-Techniques-Used-by-Malware.pdf
Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts. Miramirkhani et al., IEEE S&P 2017

49
Is this really the best we can do?

51
Malware fuzzing
Fuzz known malicious binaries to find bugs
– Find botnet “kill-switch”
– Find bugs in C2 communication to take it offline
– Aid reverse engineering
– Make fun of malware
Cool stuff
Not what we are after

52
Malware detection using fuzzing
Fuzz unknown binary to detect known malware
– See if anything gets dropped while fuzzing that triggers on VirusTotal
– Monitor memory with YARA sigs
– Check for known IOCs
Cool stuff
Still not what we are after

53
Behavior modeling using fuzzing
Fuzz unknown binary to build a behavior model
– Detect hidden capabilities
– Detect capabilities that would never trigger under normal circumstances
– Perform similarity match of behavior model
– Detect unknown (buzz-word-alert: 0day) malware
Very cool stuff
That’s what we’ll talk about today

54
A simple test case
Replace magic_string with magic_string2 on-the-fly using the hypervisor!

60
Does memory replay work?
Yes! Secret path was executed

62
Let’s double check that we triggered secret_path at the memcmp..

63
We did trigger new code...
But why wasn’t secret_path triggered here as new code?

65
Secret path is executed ~100 forks before memcmp!
Unknown memory location
Unknown fuzz value

69
This must be a valid address, printed backwards due to system endianness: 0x5567a9b5c72d
This must be a ret!
And it is executed shortly after!
We just smashed the stack!
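Reading a little-endian value back in the right order is a one-liner. The 8-byte dump below is a hypothetical reconstruction (the slide only shows the decoded address); x86-64 stores the least significant byte first, which is why the raw bytes appear "backwards".

```python
import struct

# Hypothetical raw stack bytes, lowest address first, as a memory dump shows them.
raw = bytes.fromhex("2dc7b5a967550000")

address = int.from_bytes(raw, "little")   # least significant byte first
same = struct.unpack("<Q", raw)[0]        # equivalent: little-endian uint64

print(hex(address))   # 0x5567a9b5c72d
```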

72
Could it be that we smashed something in memcmp? There are some function calls made
No, this isn’t it: OP_T_THRES is defined as 8, and we specifically called memcmp with a len of 7, so those calls are never reached!
No other function calls are made by memcmp

74
Something executes between test() and memcmp()
There isn’t anything there though..

75
Unless it’s the dynamic linker (ld.so) kicking in for lazy binding!

77
That explains a lot!
We have smashed the stack of the dynamic loader!
That’s why we saw over 200 memory accesses for that extremely tiny code!
Let’s try again, but resolving imports at load time
• gcc -o test -Wl,-z,now test.c
• Memory accesses drop to 6 R and 3 R/W!
• Fuzzing this new binary results in secret_path being called where we expected

78
VM fork stats
• Forks deployed: 201
• Fuzz iterations executed: 8042
• Highest fork mem use: 13 MB
• Average fork mem use: 683 KB

79
So where are we?
We didn’t get rid of all assumptions
• Target binary must use memory in some way for its CF
• What if multiple memory locations need magic values in combination?
• AFL’s coverage map is not adequate for malware fuzzing, as it can overflow
• We must have a definition of what we consider “malicious”!
• What is and isn’t malicious depends on the context
We now have a metric to measure our “trust”: the number of fuzz cases executed!
• Better than code coverage, since the code isn’t static

80
TODO
• Follow new paths and record memory values to be fed back to the fuzzer
• Actual fuzzing based on the recorded memory values
• Glitching of the registers
• Control-flow path inversion
• Taint-tracking
• Windows support
• Parallel fuzzing.. and more!

81
Challenges
API Hammering
Anti-fuzzing*
Speculative execution based path hiding**
*FUZZIFICATION: Anti-Fuzzing Techniques. Jung, J. et al., USENIX Security 2019
**ExSpectre: Hiding Malware in Speculative Execution. Wampler, J. et al., NDSS 2019
