Turtles dc9723
Presentation Transcript

  • The Turtles Project: Design and Implementation of Nested Virtualization
    Muli Ben-Yehuda†, Michael D. Day‡, Zvi Dubitzky†, Michael Factor†, Nadav Har’El†, Abel Gordon†, Anthony Liguori‡, Orit Wasserman†, Ben-Ami Yassour†
    †IBM Research—Haifa   ‡IBM Linux Technology Center   Technion—Israel Institute of Technology
    Ben-Yehuda et al. (IBM Research), The Turtles Project: Nested Virtualization, dc9723, May 2011
  • What is nested x86 virtualization?
    - Running multiple unmodified guest hypervisors
    - With their associated unmodified VMs
    - Simultaneously
    - On the x86 architecture, which does not support nesting in hardware...
    - ...but does support a single level of virtualization
    (Figure: L2 guest OSes on a guest hypervisor, on the host hypervisor, on hardware)
  • Why?
    - Operating systems are already hypervisors (Windows 7 with XP mode, Linux/KVM)
    - Security: attack via, or defend against, hypervisor-level rootkits such as Blue Pill
    - Running other hypervisors in clouds
    - Co-design of x86 hardware and system software
    - Testing, demonstrating, debugging, and live migration of hypervisors
  • Related work
    - First models for nested virtualization [PopekGoldberg74, BelpaireHsu75, LauerWyeth73]
    - First implementation in IBM z/VM; relies on architectural support for nested virtualization (sie)
    - Microkernels meet recursive VMs [FordHibler96]: assumes software can be modified at all levels
    - x86 software-based approaches (slow!) [Berghmans10]
    - KVM [KivityKamay07] with AMD SVM [RoedelGraf09]
    - Early Xen prototype [He09]
    - Blue Pill rootkit hiding from other hypervisors [Rutkowska06]
  • What is the Turtles project?
    - Efficient nested virtualization for Intel x86, based on KVM
    - Runs multiple guest hypervisors and VMs: KVM, VMware, Linux, Windows, ...
    - Code publicly available
  • What is the Turtles project? (cont.)
    - Nested VMX virtualization for nested CPU virtualization
    - Multi-dimensional paging for nested MMU virtualization
    - Multi-level device assignment for nested I/O virtualization
    - Micro-optimizations to make it go fast
  • Theory of nested CPU virtualization
    - Trap and emulate [PopekGoldberg74] ⇒ it’s all about the traps
    - Single-level architectural support (x86) vs. multi-level (e.g., z/VM)
    - Single level ⇒ one hypervisor, many guests
    - Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0, without either being aware of it
    - (The scheme generalizes to n levels; our focus is n = 2)
    (Figure: multiple logical levels multiplexed onto the single hardware level)
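The multiplexing idea can be sketched as a toy model: when L2 traps, L0 either handles the exit itself or emulates an exit into L1. The exit reasons and the handle/forward split below are illustrative assumptions, not KVM's actual dispatch logic.

```python
# Hypothetical sketch of L0's multiplexing decision for n=2 nesting.
EXTERNAL_INTERRUPT = "external_interrupt"  # belongs to L0, which owns the hardware
EPT_VIOLATION = "ept_violation"
CPUID = "cpuid"

def handle_l2_exit(exit_reason, l1_intercepts):
    """Decide whether L0 handles an L2 exit itself or reflects it to L1."""
    if exit_reason == EXTERNAL_INTERRUPT:
        return "handled_by_L0"      # physical interrupts never reach L1 directly
    if exit_reason in l1_intercepts:
        return "forwarded_to_L1"    # emulate a VM exit from L2 into L1
    return "handled_by_L0"          # L0 fixes it up and resumes L2

assert handle_l2_exit(CPUID, {CPUID}) == "forwarded_to_L1"
assert handle_l2_exit(EXTERNAL_INTERRUPT, {CPUID}) == "handled_by_L0"
```

Neither L1 nor L2 observes this decision; each simply sees the architectural behavior it expects.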
  • Nested VMX virtualization: flow
    - L0 runs L1 with VMCS0→1
    - L1 prepares VMCS1→2 and executes vmlaunch
    - vmlaunch traps to L0
    - L0 merges VMCSs: VMCS0→1 merged with VMCS1→2 yields VMCS0→2
    - L0 launches L2
    - L2 causes a trap
    - L0 handles the trap itself or forwards it to L1
    - ... eventually, L0 resumes L2
    - Repeat
    (Figure: L0 maintains 0→1, 1→2, and merged 0→2 VMCS state and memory tables)
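The merge step above can be modeled with plain dictionaries. The field names here are hypothetical stand-ins (real VMCS fields are architected by Intel VMX); the point is only the shape of the merge: L2's guest state comes from VMCS1→2, the host state must return control to L0, and the exit controls combine both levels' intercepts.

```python
# Toy model of merging VMCS0->1 and VMCS1->2 into VMCS0->2.
# Field names are illustrative, not real VMCS encodings.
def merge_vmcs(vmcs01, vmcs12):
    return {
        "guest_state": vmcs12["guest_state"],   # L2's registers, as L1 prepared them
        "host_state": vmcs01["host_state"],     # on exit, the CPU returns to L0, not L1
        "exit_controls": vmcs01["exit_controls"] | vmcs12["exit_controls"],
    }

vmcs01 = {"guest_state": "L1 regs", "host_state": "L0 regs", "exit_controls": {"hlt"}}
vmcs12 = {"guest_state": "L2 regs", "host_state": "L1 regs", "exit_controls": {"cpuid"}}
vmcs02 = merge_vmcs(vmcs01, vmcs12)
assert vmcs02["host_state"] == "L0 regs"
assert vmcs02["exit_controls"] == {"hlt", "cpuid"}
```

Taking the union of exit controls guarantees that any event either level wants to intercept actually traps to L0, which can then decide whom the exit belongs to.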
  • Exit multiplication makes angry turtle angry
    - To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, ...
    - Those operations can themselves trap, leading to exit multiplication
    - Exit multiplication: a single L2 exit can cause 40-50 L1 exits!
    - Optimize: make a single exit fast and reduce the frequency of exits
    (Figure: exit cascades across one, two, and three levels of virtualization)
  • Introduction to x86 MMU virtualization
    - x86 does page-table walks in hardware
    - The MMU has one currently active hardware page table
    - Bare metal ⇒ only one logical translation needed: virtual → physical
    - Virtualization ⇒ two logical translations needed:
      1. Guest page table: guest virtual → guest physical
      2. Host page table: guest physical → host physical
    - ...but the MMU only knows how to walk a single table!
  • Software MMU virtualization: shadow paging
    (Figure: guest virtual →(GV→GP)→ guest physical →(GP→HP)→ host physical, compressed into one shadow page table, SPT)
    - Two logical translations compressed onto the shadow page table [DevineBugnion02]
    - The unmodified guest OS updates its own table
    - The hypervisor traps OS page-table updates
    - The hypervisor propagates updates to the hardware table
    - The MMU walks the shadow table
    - Problem: traps are expensive
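The compression step is just function composition over page mappings. A minimal sketch, with page-granular dictionaries standing in for real multi-level page tables:

```python
# Compose GV->GP with GP->HP into a single GV->HP shadow table the MMU can walk.
# Dicts model page tables at page granularity; real tables are multi-level trees.
def build_shadow(gpt, host_pt):
    return {gv: host_pt[gp] for gv, gp in gpt.items() if gp in host_pt}

gpt = {0x1000: 0x5000, 0x2000: 0x6000}      # guest virtual -> guest physical
host_pt = {0x5000: 0x9000, 0x6000: 0xA000}  # guest physical -> host physical
spt = build_shadow(gpt, host_pt)
assert spt == {0x1000: 0x9000, 0x2000: 0xA000}
```

Every guest write to its page table invalidates part of this composition, so the hypervisor must trap the write and re-propagate it into the shadow table, which is exactly why the traps dominate the cost.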
  • Hardware MMU virtualization: Extended Page Tables
    (Figure: guest virtual →(GPT)→ guest physical →(EPT)→ host physical)
    - Two-dimensional paging: the guest owns the GPT, the hypervisor owns the EPT [BhargavaSerebrin08]
    - The unmodified guest OS updates the GPT
    - The hypervisor updates the EPT, controlling guest physical → host physical translations
    - The MMU walks both tables
  • Nested MMU virt. via multi-dimensional paging
    - Three logical translations: L2 virtual → L2 physical, L2 physical → L1 physical, L1 physical → L0 physical
    - Only two tables in hardware with EPT: virtual → physical and guest physical → host physical
    - L0 compresses the three logical translations onto the two hardware tables
    (Figure: shadow-on-shadow (baseline), shadow-on-EPT (better), and multi-dimensional paging (best))
  • Baseline: shadow-on-shadow
    (Figure: L2 virtual →(GPT)→ L2 physical →(SPT12)→ L1 physical →(SPT02)→ L0 physical)
    - Assume no EPT table; all hypervisors use shadow paging
    - Useful for old machines and as a baseline
    - Maintaining shadow page tables is expensive
    - Compress: three logical translations ⇒ one table in hardware
  • Better: shadow-on-EPT
    (Figure: L2 virtual →(GPT)→ L2 physical →(SPT12)→ L1 physical →(EPT01)→ L0 physical)
    - Instead of one hardware table we have two
    - Compress: three logical translations ⇒ two tables in hardware
    - Simple approach: L0 uses EPT; L1 uses shadow paging for L2
    - Every L2 page fault leads to multiple L1 exits
  • Best: multi-dimensional paging
    (Figure: L2 virtual →(GPT)→ L2 physical →(EPT12)→ L1 physical →(EPT01)→ L0 physical, with EPT02 compressing the last two)
    - The EPT table rarely changes; the guest page table changes a lot
    - Again, compress three logical translations ⇒ two tables in hardware
    - L0 emulates EPT for L1
    - L0 uses EPT0→1 and EPT1→2 to construct EPT0→2
    - End result: far fewer exits!
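The construction of EPT0→2 can be sketched as a fault-driven composition: on an EPT violation by L2, L0 consults the EPT1→2 it emulates for L1 and its own EPT0→1, and installs the composed mapping. This is an illustrative model, not KVM's actual code.

```python
# Lazy construction of EPT0->2: compose EPT1->2 (emulated for L1) with EPT0->1
# on each L2 EPT violation. Dicts model page-granular EPT entries.
def handle_ept_violation(l2_phys, ept12, ept01, ept02):
    if l2_phys not in ept12:
        return "reflect_to_L1"       # L1 hasn't mapped it: emulate the violation for L1
    l1_phys = ept12[l2_phys]
    ept02[l2_phys] = ept01[l1_phys]  # compose and cache; future accesses don't exit
    return "resume_L2"

ept12 = {0x3000: 0x7000}             # L2 physical -> L1 physical
ept01 = {0x7000: 0xB000}             # L1 physical -> L0 physical
ept02 = {}
assert handle_ept_violation(0x3000, ept12, ept01, ept02) == "resume_L2"
assert ept02 == {0x3000: 0xB000}
assert handle_ept_violation(0x4000, ept12, ept01, ept02) == "reflect_to_L1"
```

Because EPT1→2 changes rarely, the cached EPT0→2 entries stay valid for a long time, and L2's ordinary page faults never need to leave L2 at all.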
  • Introduction to I/O virtualization
    - From the hypervisor’s perspective, I/O consists of: (1) PIO, (2) MMIO, (3) DMA, (4) interrupts
    - Device emulation [Sugerman01]
    - Para-virtualized drivers [Barham03, Russell08]
    - Direct device assignment [Levasseur04, Yassour08]
    - Direct assignment is the best-performing option
    - Direct assignment requires an IOMMU for safe DMA bypass
    (Figure: host/guest driver paths for emulation, para-virtualized front-end/back-end drivers, and direct assignment)
  • Multi-level device assignment
    - With nesting, there are 3×3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)
    - Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 for MMIOs and PIOs, with device DMA going through the platform IOMMU
    - How? L0 emulates an IOMMU for L1 [Amit11]
    - L0 compresses the multiple IOMMU translations onto the single hardware IOMMU page table
    - L2 programs the device directly
    - The device DMAs into L2 memory directly
    (Figure: L2 device driver programs the physical device; L1's emulated IOMMU and L0's IOMMU compress onto the platform IOMMU)
  • Micro-optimizations
    - Goal: reduce world-switch overheads
    - Reduce the cost of a single exit by focusing on VMCS merges:
      - Keep VMCS fields in processor encoding
      - Partial updates instead of wholesale copying
      - Copy multiple fields at once
      - Some optimizations are not safe according to the spec
    - Reduce the frequency of exits, focusing on vmread and vmwrite:
      - Avoid the exit-multiplier effect
      - Loads/stores vs. architected trapping instructions
      - Binary patching?
  • Nested VMX support in KVM
    Date: Mon, 16 May 2011 22:43:54 +0300
    From: Nadav Har’El <nyh (at) il.ibm.com>
    To: kvm [at] vger.kernel.org
    Cc: gleb [at] redhat.com, avi [at] redhat.com
    Subject: [PATCH 0/31] nVMX: Nested VMX, v10

    Hi,

    This is the tenth iteration of the nested VMX patch set. Improvements in this version over the previous one include:
    * Fix the code which did not fully maintain a list of all VMCSs loaded on each CPU. (Avi, this was the big thing that bothered you in the previous version.)
    * Add nested-entry-time (L1->L2) verification of control fields of vmcs12 - procbased, pinbased, entry, exit and secondary controls - compared to the capability MSRs which we advertise to L1.
    [many other changes trimmed]

    This new set of patches applies to the current KVM trunk (I checked with 6f1bd0daae731ff07f4755b4f56730a6e4a3c1cb).
    If you wish, you can also check out an already-patched version of KVM from branch "nvmx10" of the repository: git://github.com/nyh/kvm-nested-vmx.git
  • Demo: Windows XP on KVM (L1) on KVM (L0)
  • Demo: Linux on VMware (L1) on KVM (L0)
  • Experimental Setup
    - Running Linux, Windows, KVM, VMware, SMP, ...
    - Macro workloads: kernbench, SPECjbb, netperf
    - Questions: Does multi-dimensional paging help? Does multi-level device assignment help? KVM as L1 vs. VMware as L1?
    - See the paper for full experimental details and more benchmarks and analysis
  • Macro: SPECjbb and kernbench

    kernbench               Host     Guest    Nested   NestedDRW
    Run time                324.3    355      406.3    391.5
    % overhead vs. host     -        9.5      25.3     20.7
    % overhead vs. guest    -        -        14.5     10.3

    SPECjbb                 Host     Guest    Nested   NestedDRW
    Score                   90493    83599    77065    78347
    % degradation vs. host  -        7.6      14.8     13.4
    % degradation vs. guest -        -        7.8      6.3

    Table: kernbench and SPECjbb results
    - The exit multiplication effect is not as bad as we feared
    - Direct vmread and vmwrite (DRW) give an immediate boost
    - Take-away: each level of virtualization adds approximately the same overhead!
  • Macro: multi-dimensional paging
    (Figure: improvement ratio of multi-dimensional paging over shadow-on-EPT for kernbench, SPECjbb, and netperf)
    - The impact of multi-dimensional paging depends on the rate of page faults
    - Shadow-on-EPT: every L2 page fault causes L1 multiple exits
    - Multi-dimensional paging: only EPT violations cause L1 exits
    - The EPT table rarely changes: #(EPT violations) << #(page faults)
    - Multi-dimensional paging is a huge win for page-fault-intensive kernbench
  • Macro: multi-level device assignment
    (Figure: netperf throughput (Mbps) and %CPU for native, single-level emulation/virtio/direct, and nested combinations of emulation, virtio, and direct access)
    - Benchmark: netperf TCP_STREAM (transmit)
    - Multi-level device assignment is the best-performing option
    - But: native uses 20% CPU, multi-level device assignment 60% (3x!)
    - Interrupts considered harmful; they cause exit multiplication
  • Macro: multi-level device assignment (sans interrupts)
    (Figure: netperf throughput (Mbps) vs. message size (netperf -m, 16-512 bytes) for L0 bare metal, L2 direct/direct, and L2 direct/virtio)
    - What if we could deliver device interrupts directly to L2?
    - Only a 7% difference between native and the nested guest!
  • Micro: synthetic worst case: CPUID loop
    (Figure: CPU cycles per iteration, split between L1, L0, and CPU mode switches, for a single-level guest and nested guests with successive optimizations)
    - CPUID running in a tight loop is not a real-world workload!
    - Went from 30x worse to "only" 6x worse
    - A nested exit is still expensive; minimize both the cost of a single exit and the frequency of exits
  • Conclusions
    - Efficient nested x86 virtualization is challenging but feasible
    - A whole new ballpark, opening up many exciting applications: security, cloud, architecture, ...
    - Current overhead of 6-14%: negligible for some workloads, not yet for others
    - Work in progress; we expect at most 5% eventually
    - Code is available
    - Why Turtles? It’s turtles all the way down
  • Questions?