
CrySys guest-lecture: Virtual machine introspection on modern hardware

Slides of guest lecture at CrySys Lab, Budapest

Published in: Devices & Hardware

  1. 1. Virtual machine introspection on modern hardware Tamas K Lengyel @tklengyel tamas@tklengyel.com 2/19/2015 – CrySys Lab, Budapest
  2. 2. Agenda 1. VMI intro 2. Intel’s split-TLB 3. Intel’s EPT and its limitations 4. Intel’s #VE / VMFUNC / EPTP-switching 5. Intel’s SMM/DMM 6. ARM 7. Conclusion
  3. 3. Virtual Machine Introspection (VMI) Interpret virtual hardware ● Network, Disk, vCPU & Memory ● The semantic gap problem: Reconstruct high-level state information from low-level data sources.
  4. 4. Virtual Machine Introspection (VMI) Bridging the semantic gap ● The guest OS is in charge of managing the virtual hardware ● How to get the info from it? o Install in-guest agent to query using standard interfaces o If OS is compromised in-guest agent can be disabled / tampered with o Just as vulnerable as your AntiVirus
  5. 5. Virtual Machine Introspection (VMI) Bridging the semantic gap ● Replicate guest OS functions externally o Requires expert knowledge on OS internals and hardware behavior o Requires debug data to understand in-memory structures ● Need access to VM memory, vCPU registers, etc. o Hypervisor support required (or custom VMM) o Need to emulate hardware v2p translation
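To make "emulate hardware v2p translation" concrete, here is a minimal sketch of a 4-level x86-64 page-table walk done from outside the guest. It assumes 4 KiB pages only (large pages are ignored), and `read_entry` is a hypothetical callback standing in for whatever guest-memory access the hypervisor provides; neither name comes from the slides.

```python
PRESENT = 0x1                      # bit 0 of every paging-structure entry
INDEX_MASK = 0x1FF                 # 9 index bits per level
ADDR_MASK = 0x000FFFFFFFFFF000     # bits 51:12: next table / final frame


def split_vaddr(vaddr):
    """Return ([pml4, pdpt, pd, pt] indices, page offset) for a virtual address."""
    indices = [(vaddr >> shift) & INDEX_MASK for shift in (39, 30, 21, 12)]
    return indices, vaddr & 0xFFF


def walk(read_entry, cr3, vaddr):
    """Translate vaddr using the tables rooted at cr3.

    read_entry(table_paddr, index) is a hypothetical helper that returns the
    64-bit entry stored in guest physical memory. Returns None if any level
    is not present.
    """
    indices, offset = split_vaddr(vaddr)
    table = cr3 & ADDR_MASK
    for index in indices:
        entry = read_entry(table, index)
        if not entry & PRESENT:
            return None
        table = entry & ADDR_MASK
    return table | offset


# Toy "guest memory": (table address, index) -> entry
toy_memory = {
    (0x1000, 496): 0x2000 | PRESENT,
    (0x2000, 0): 0x3000 | PRESENT,
    (0x3000, 0): 0x4000 | PRESENT,
    (0x4000, 1): 0xABCDE000 | PRESENT,
}
paddr = walk(lambda t, i: toy_memory.get((t, i), 0), 0x1000, 0xFFFFF80000001234)
```

A real introspection library also has to handle large pages, other paging modes, and entries that change while the walk is in progress.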
  6. 6. Translation lookaside buffer (TLB) poisoning Translation lookaside buffer (TLB) ● Virtual to physical address translation is expensive ● Hardware managed transparent cache of the results ● Separate cache for data read/write and instruction fetch (Harvard-type architecture)! ● The OS can flush it, but cannot query it o Opportunity to whack it out of sync! o Shadow Walker / FU rootkit
  7. 7. TLB poisoning Original algorithm:
       Input: Splitting Page Address (addr), Pagetable Entry for addr (pte)
       invalidate_instr_tlb (pte);           // flush TLB
       pte = the_shadow_code_page (addr);    // replace PTE in memory
       mark_global (pte);                    // disable auto-flush
       reload_instr_tlb (pte);               // load it into TLB
       pte = the_orig_code_page (addr);      // put original entry back
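The effect of the algorithm above can be illustrated with a toy model, assuming a TLB that is just two dictionaries (one for instruction fetches, one for data accesses) filled from an in-memory page table on a miss. The class and function names are made up for illustration, and global pages and eviction are not modelled.

```python
class SplitTLB:
    """Toy split TLB: separate caches for instruction fetch and data access."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual page -> physical frame (the PTEs)
        self.itlb = {}
        self.dtlb = {}

    def invalidate(self, vpage):
        self.itlb.pop(vpage, None)
        self.dtlb.pop(vpage, None)

    def fetch(self, vpage):
        """Instruction fetch: fill the iTLB from the page table on a miss."""
        if vpage not in self.itlb:
            self.itlb[vpage] = self.page_table[vpage]
        return self.itlb[vpage]

    def read(self, vpage):
        """Data read: fill the dTLB from the page table on a miss."""
        if vpage not in self.dtlb:
            self.dtlb[vpage] = self.page_table[vpage]
        return self.dtlb[vpage]


def poison(tlb, vpage, shadow_frame):
    """Desynchronise the iTLB from the dTLB for one page."""
    original = tlb.page_table[vpage]
    tlb.invalidate(vpage)                  # flush the stale translation
    tlb.page_table[vpage] = shadow_frame   # temporarily point the PTE at the shadow page
    tlb.fetch(vpage)                       # prime the iTLB with the shadow frame
    tlb.page_table[vpage] = original       # put the original PTE back


tlb = SplitTLB({0x10: 0x100})
poison(tlb, 0x10, shadow_frame=0x200)
executed_frame = tlb.fetch(0x10)   # served from the poisoned iTLB
read_frame = tlb.read(0x10)        # dTLB miss, filled from the restored PTE
```

After `poison`, code executes from the shadow frame while any reader of the same address sees the original frame.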
  8. 8. TLB poisoning and virtualization VMENTRY/VMEXIT automatically flushes the entire TLB ● TLB poisoning is impossible ● Performance hit Introduction of TLB tagging (VPID) in Intel Nehalem (2008) ● 16-bit field specified in the VMCS for each vCPU ● Performance boost! ● VM TLB entries invisible to the VMM ● The problem is not the split TLB, it’s the TLB itself
  9. 9. TLB poisoning with Windows / Linux TLB poisoning uses global pages ● CR4.PGE (bit 7) ● Makes PTEs marked as global survive context-switches ● Great performance boost for kernel pages! Windows 7 ● Regularly flushes global pages by disabling & re-enabling CR4.PGE Linux ● Doesn’t touch CR4.PGE after boot
  10. 10. The tagged TLB in Xen The TLB tag is assigned to the vCPU from a global counter ● asid->asid = data->next_asid++; No flushes, just assign a new tag when needed! When the counter overflows, flush everything and restart from 1 A new TLB tag is assigned to the vCPU every time a MOV-TO-CR3 is caught! ● The use of global pages is negated in the guest! ● The TLB needs to be primed on each context-switch!
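The counter scheme described on this slide can be sketched as a small model. This is a simplification for illustration, not Xen's actual code; `AsidAllocator`, `max_asid` and `flushes` are assumed names.

```python
class AsidAllocator:
    """Toy model: hand out TLB tags from a counter, flush only on overflow."""

    def __init__(self, max_asid):
        self.max_asid = max_asid   # largest tag the hardware supports (assumed parameter)
        self.next_asid = 1         # tags restart from 1 after an overflow
        self.flushes = 0           # number of full TLB flushes performed

    def new_asid(self):
        """Called whenever a vCPU needs a fresh tag, e.g. on a trapped MOV-TO-CR3."""
        if self.next_asid > self.max_asid:
            self.flushes += 1      # counter overflowed: flush everything
            self.next_asid = 1     # and restart from 1
        asid = self.next_asid
        self.next_asid += 1
        return asid


allocator = AsidAllocator(max_asid=3)
tags = [allocator.new_asid() for _ in range(5)]
```

Because every trapped MOV-TO-CR3 consumes a fresh tag, entries cached under the old tag are never hit again, which is why global pages stop helping the guest.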
  11. 11. The tagged TLB in KVM Tag is assigned when vCPU structure is created • Doesn’t matter if the vCPU is activated or not. • Disable tagging and revert to old VMENTRY/VMEXIT TLB flushing if out of tags Priming the TLB in Linux guests on KVM is a problem! • However, the split TLB still has issues
  12. 12. The sTLB! Intel Nehalem introduced a second-level (victim) cache The problem: ● Split-TLB relies on a custom page-fault handler being called to re-split the TLB when it’s evicted ● With sTLB, the entry is brought back.. o Both into the iTLB & the dTLB o Split-TLB becomes unsplit! ● Split-TLB poisoning is unreliable in VMs!
  13. 13. Disabling the sTLB 2014: MoRE Shadow Walker - The evolution of TLB splitting on x86 ● sTLB doesn’t merge entries with conflicting PTE permissions ● Definitely applies to EPT PTEs but hasn’t been tested with regular PTEs ● Make shadow code-page X only before loading and the custom #PF handler will be triggered when evicted
  14. 14. Reconstructing the guest OS’s view Use standard OS structures at known locations ● KPCR, KDBG ● Linux Sysmap Scan for objects of interest with signatures ● Memory is in fast-flux ● False positives ● Weak signatures ● Cross-view validation is costly
  15. 15. Interposition on x86 Intel Need to trap VM execution for coherent view ● Avoids memory fast-flux problems ● Can avoid DKOM/DKSM Two main options ● Tracing via EPT o Granularity is that of a page but can trap R/W/X ● Tracing via #BP o Only traps X on direct hit but is guest-visible o Can be protected with EPT to hide it
  16. 16. Extended Page Tables (EPT) Speed up guest virtual to machine physical address translation. Two sets of tables: 1st layer managed by guest OS 2nd layer managed by the VMM Permissions can be different in the two layers!
  17. 17. Extended Page Tables (EPT) Up to 512 EPTs per VM! ● Most VMMs just assign 1 EPT / VM ● Value defined in VMCS for each vCPU ● Xen 4.6 will allow different EPT / vCPU Permission stored in bit 0-2 of EPT PTE ● r : 1, /* bit 0 - Read permission */ ● w : 1, /* bit 1 - Write permission */ ● x : 1, /* bit 2 - Execute permission */ Lots of software programmable bits in EPT PTE available ● access : 4, /* bits 61:58 - p2m_access_t */
  18. 18. Extended Page Tables (EPT) Unlike normal PTEs, permission can be: • R • X • R/W • R/X • R/W/X • W and W/X trigger EPT misconfiguration • Still traps to the VMM but less information is transferred
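A minimal sketch of how the low three permission bits of an EPT entry could be classified according to the list above. It assumes the CPU supports execute-only mappings (as the slide does) and ignores every other field of the entry; the function name and return strings are illustrative only.

```python
EPT_R = 1 << 0   # bit 0: read permission
EPT_W = 1 << 1   # bit 1: write permission
EPT_X = 1 << 2   # bit 2: execute permission


def classify_ept_perms(pte):
    """Classify bits 2:0 of an EPT entry as not-present, misconfiguration or valid."""
    perms = pte & (EPT_R | EPT_W | EPT_X)
    if perms == 0:
        return "not-present"           # any access raises an EPT violation
    if (perms & EPT_W) and not (perms & EPT_R):
        return "misconfiguration"      # W and W/X are not allowed
    return "valid"                     # R, X, R/W, R/X, R/W/X


results = {bits: classify_ept_perms(bits) for bits in range(8)}
```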
  19. 19. EPT limitations ● When violation is trapped, the information only gives the start address and type of the violation. ● A single R/W violation may touch up to 8-bytes! ● Not sufficient to match violation offset against known locations ● Violations in the vicinity of a watched area need to be treated as potential hits as well!
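Since only the start address of the access is reported, a monitor has to treat the access as a range. A minimal sketch of that check, assuming the 8-byte upper bound from the slide; `may_hit` and its parameters are hypothetical names.

```python
MAX_ACCESS_BYTES = 8   # upper bound on a single R/W access, per the slide


def may_hit(violation_addr, watch_start, watch_len, max_access=MAX_ACCESS_BYTES):
    """True if an access starting at violation_addr could overlap the watched area.

    The access is assumed to cover [violation_addr, violation_addr + max_access),
    the watched area [watch_start, watch_start + watch_len).
    """
    return (violation_addr < watch_start + watch_len
            and violation_addr + max_access > watch_start)


# A 4-byte field watched at 0x1000: an access starting 4 bytes earlier may still touch it.
near_miss_is_hit = may_hit(0x0FFC, 0x1000, 4)
```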
  20. 20. EPT limitations Read/Write violation ambiguities “An EPT violation that occurs during as a result of execution of a read-modify-write operation sets bit 1 (data write). Whether it also sets bit 0 (data read) is implementation-specific and, for a given implementation, may differ for different kinds of read-modify-write operations.” - Intel SDM It is possible to siphon data using r-m-w operations from a page that doesn’t allow reading! Fixed in Xen 4.5
  21. 21. EPT limitations How to let the VM progress without missing a potential event?
  22. 22. EPT limitations How to let the VM progress without missing a potential event? Emulate the instruction! • Xen comes with a baked in x86 emulator! • Option to “emulate with no write” • Perform the instruction but don’t let it write to memory
  23. 23. Intel #VE (Virtualization Exceptions) EPT based tracing has a 4k granularity ● Too much overhead when we only care about certain points ● Handle EPT violations in the guest! #VE Interrupt Service Routine (ISR) ● Interrupt handler within the guest ● Defined in guest IDT #20 ● EPT PTE bit 63 determines if violation triggers #VE or VMEXIT
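The role of bit 63 can be sketched as a small decision function. In the Intel SDM this bit is the "suppress #VE" bit: a violation is only converted to #VE when the bit is clear and the "EPT-violation #VE" control is enabled. The sketch ignores the further conditions the SDM lists, and the function name is made up.

```python
SUPPRESS_VE = 1 << 63   # EPT PTE bit 63


def violation_target(ept_pte, ve_control_enabled):
    """Decide where an EPT violation on this entry is delivered (simplified)."""
    if ve_control_enabled and not (ept_pte & SUPPRESS_VE):
        return "#VE"      # handled by the guest's ISR at IDT vector 20
    return "VMEXIT"       # handled by the VMM


in_guest = violation_target(0x0, ve_control_enabled=True)
to_vmm = violation_target(SUPPRESS_VE, ve_control_enabled=True)
```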
  24. 24. VMFUNC and EPTP switching “This instruction allows software in VMX non-root operation to invoke a VM function, which is processor functionality enabled and configured by software in VMX root operation. No VM exit occurs.” VMFUNC with EAX=0 => EPTP switching ● Remember how we can have up to 512 EPTs per VM? ● Pass EPT IDX as parameter in ECX (0-511) ● Faulting translation of GPFN is performed via the new table! ● Guest never really “knows” the value of EPTP
  25. 25. EPTP switching How do we go “back” to the original EPTP afterwards? Single-step and restore? “The key observation is that at any single point in time, a given hardware thread can be fetching an instruction or reading data, but not both. There are ways of avoiding the single-step too. [...] It's an optimization certainly, but it's not required, and it's not a technique we have placed in the public domain. You could try talking to us under NDA or figure it out for yourself.” Ed White (Intel) xen-devel, 1/16/2015
  26. 26. 1-setting of the “EPT-violation #VE” VM-execution control Optional CPU feature baked into the chip During VMFUNC[0] saves the updated EPT index (ECX[15:0]) and gives it to the next #VE Allows you to juggle the “views” without missing anything! Index 0 := selective X traps; 1 := only X; 2 := only R/W • If #VE with IDX=0, switch to IDX=1 (VMEXIT if of interest) • If #VE with IDX=1, switch to IDX=2 • If #VE with IDX=2, switch to IDX=0
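The view rotation on this slide is a three-state machine. A minimal sketch of the decision the in-guest #VE handler would make; `on_ve` and `of_interest` are hypothetical names, and the actual switch would be a VMFUNC with EAX=0 and the returned index in ECX.

```python
# View indices as defined on the slide
VIEW_SELECTIVE_X = 0   # selective X traps
VIEW_ONLY_X = 1        # only X allowed
VIEW_ONLY_RW = 2       # only R/W allowed

NEXT_VIEW = {
    VIEW_SELECTIVE_X: VIEW_ONLY_X,
    VIEW_ONLY_X: VIEW_ONLY_RW,
    VIEW_ONLY_RW: VIEW_SELECTIVE_X,
}


def on_ve(current_idx, of_interest=False):
    """Return (next EPT index, whether to VMEXIT) for a #VE taken in current_idx."""
    vmexit = current_idx == VIEW_SELECTIVE_X and of_interest
    return NEXT_VIEW[current_idx], vmexit


# One full rotation, starting with a trap the monitor cares about
idx, trace = VIEW_SELECTIVE_X, []
for interesting in (True, False, False):
    idx, vmexit = on_ve(idx, interesting)
    trace.append((idx, vmexit))
```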
  27. 27. System Management Mode (SMM) SMM intended for low-level services, such as: ● thermal (fan) control ● USB emulation ● hw errata workarounds Can be used for: ● VMI ● Anti-VMI! Hard to take control of it on (most) Intel devices as it is loaded by the signed BIOS.
  28. 28. SMM ● Normal mode SMM is triggered by interrupts (SMI) ● Can be configured to happen periodically ● Always returns to the same execution mode afterwards
  29. 29. SMM The problem for SMM based VMI systems: “A limitation of any SMM-based solution [...] is that a malicious hypervisor could block SMI interrupts on every CPU in the APIC, effectively starving the introspection tool. For VMI, trusting the hypervisor is not a problem, but the hardware isolation from the hypervisor is incomplete.” Jain et al. “SoK: Introspections on Trust and the Semantic Gap” 2014, IEEE S&P
  30. 30. Intel Dual-monitor mode SMM – the DMM Available on all CPUs with VT-x SMM can become an independent hypervisor! SMIs are still available but.. A VMCALL executed by the VMM automatically and unconditionally traps into the DMM
  31. 31. Intel DMM The VMCALL instruction can be used to instrument the VMM. ● Same way the VMM can use #BP to instrument a VM. The DMM can enter any execution mode on the system! ● Full control over the execution flow ● Hidden VMs The DMM can disable SMIs for a VM! ● Forced execution ● Nothing in the system can preempt it (SMIs, NMIs, etc.)
  32. 32. ARM
  33. 33. ARM 2-stage paging Very similar to EPT • Fewer software-available bits • We need to store our custom permissions somewhere so we know if a violation was induced by us or not • With EPT it's just stored directly in the PTE Xen mem_access for ARM • VMM maintains a Radix tree to store custom permissions • During violation the tree is queried • If violation induced by monitor, forward to monitor
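The lookup described above can be sketched with a dictionary standing in for the radix tree. This is an illustrative model only, not Xen's mem_access code; the function name, the `"r"`/`"w"`/`"x"` encoding and the return strings are assumptions.

```python
def route_stage2_fault(custom_perms, gfn, access):
    """Decide who handles a stage-2 permission fault.

    custom_perms maps a guest frame number to the set of accesses the monitor
    still allows (subset of {"r", "w", "x"}); a plain dict stands in for the
    radix tree the VMM maintains.
    """
    allowed = custom_perms.get(gfn)
    if allowed is None or access in allowed:
        return "vmm"       # not restricted by the monitor: regular fault handling
    return "monitor"       # induced by our restriction: forward the event


# The monitor made frame 0x42 read-only and left everything else alone
perms = {0x42: {"r"}}
write_to_watched = route_stage2_fault(perms, 0x42, "w")
write_elsewhere = route_stage2_fault(perms, 0x43, "w")
```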
  34. 34. ARM 2-stage paging How do we let the VM progress without potentially missing the next event? No MTF-like singlestep available • Emulation? • On x86 Xen already comes with a built-in emulator • Not so much on ARM.. The same trick we did in the VMFUNC! • Only this time it traps into the VMM
  35. 35. ARM SMC is trappable to the VMM! ARM has no #BP that traps to the VMM • Secure Monitor Call (SMC) can be configured to do so! • SMC allowed only from VMM or guest kernel • Better than nothing What to use the TrustZone for? • Integrity check the VMM! • Make all VMM pages non-writable • VMM #PF handler executes SMC • Verify if change is allowed
  36. 36. Conclusion Modern hardware is complex Behavior sometimes vaguely defined Intel really made a lot of progress since VT-x was introduced but the DMM is very odd ARM generally needs to catch up but it's progressing rapidly Thanks! Questions?
