SlideShare a Scribd company logo
The Turtles Project: 
Design and Implementation of Nested Virtualization 
Muli Ben-Yehuday Michael D. Dayz Zvi Dubitzkyy Michael Factory 
Nadav Har’Ely Abel Gordony Anthony Liguoriz Orit Wassermany 
Ben-Ami Yassoury 
yIBM Research – Haifa 
zIBM Linux Technology Center 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 1 / 22
What is nested x86 virtualization? 
Running multiple unmodified 
hypervisors 
With their associated 
unmodified VM’s 
Guest 
Simultaneously 
Hypervisor 
On the x86 architecture 
Which does not support 
nesting in hardware. . . 
. . . but does support a single 
level of virtualization Hardware 
Hypervisor 
Guest 
OS 
Guest 
OS 
Guest 
OS 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 2 / 22
Why? 
Operating systems are already hypervisors (Windows 7 with XP 
mode, Linux/KVM) 
To be able to run other hypervisors in clouds 
Security (e.g., hypervisor-level rootkits) 
Co-design of x86 hardware and system software 
Testing, demonstrating, debugging, live migration of hypervisors 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22
Related work 
First models for nested virtualization [PopekGoldberg74, 
BelpaireHsu75, LauerWyeth73] 
First implementation in the IBM z/VM; relies on architectural 
support for nested virtualization (sie) 
Microkernels meet recursive VMs [FordHibler96]: assumes we 
can modify software at all levels 
x86 software based approaches (slow!) [Berghmans10] 
KVM [KivityKamay07] with AMD SVM [RoedelGraf09] 
Early Xen prototype [He09] 
Blue Pill rootkit hiding from other hypervisors [Rutkowska06] 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22
What is the Turtles project? 
Efficient nested virtualization for Intel x86 based on KVM 
Multiple guest hypervisors and VMs: VMware, Windows, . . . 
Code publicly available 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 5 / 22
What is the Turtles project? (cont’) 
Nested VMX virtualization for nested CPU virtualization 
Multi-dimensional paging for nested MMU virtualization 
Multi-level device assignment for nested I/O virtualization 
Micro-optimizations to make it go fast (see paper) 
+ + = 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22
Theory of nested CPU virtualization 
Single-level architectural support (x86) vs. multi-level architectural 
support (e.g., z/VM) 
Single level ) one hypervisor, many guests 
Turtles approach: L0 multiplexes the hardware between L1 and L2, 
running both as guests of L0—without either being aware of it 
(Scheme generalized for n levels; Our focus is n=2) 
Guest 
Host Hypervisor 
Hardware 
Host Hypervisor 
Hardware 
Multiple logical levels Multiplexed on a single level 
L1 
L0 
L2 
L1 
Guest 
L2 
Guest 
L2 
L0 
Guest 
L2 Guest 
Hypervisor 
Guest 
Hypervisor Guest Guest 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
Theory of nested CPU virtualization 
Single-level architectural support (x86) vs. multi-level architectural 
support (e.g., z/VM) 
Single level ) one hypervisor, many guests 
Turtles approach: L0 multiplexes the hardware between L1 and L2, 
running both as guests of L0—without either being aware of it 
(Scheme generalized for n levels; Our focus is n=2) 
Guest 
Host Hypervisor 
Hardware 
Host Hypervisor 
Hardware 
Multiple logical levels Multiplexed on a single level 
L1 
L0 
L2 
L1 
Guest 
L2 
Guest 
L2 
L0 
Guest 
L2 Guest 
Hypervisor 
Guest 
Hypervisor Guest Guest 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
Theory of nested CPU virtualization 
Single-level architectural support (x86) vs. multi-level architectural 
support (e.g., z/VM) 
Single level ) one hypervisor, many guests 
Turtles approach: L0 multiplexes the hardware between L1 and L2, 
running both as guests of L0—without either being aware of it 
(Scheme generalized for n levels; Our focus is n=2) 
Guest 
Host Hypervisor 
Hardware 
Host Hypervisor 
Hardware 
Multiple logical levels Multiplexed on a single level 
L1 
L0 
L2 
L1 
Guest 
L2 
Guest 
L2 
L0 
Guest 
L2 Guest 
Hypervisor 
Guest 
Hypervisor Guest Guest 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
Theory of nested CPU virtualization 
Single-level architectural support (x86) vs. multi-level architectural 
support (e.g., z/VM) 
Single level ) one hypervisor, many guests 
Turtles approach: L0 multiplexes the hardware between L1 and L2, 
running both as guests of L0—without either being aware of it 
(Scheme generalized for n levels; Our focus is n=2) 
Guest 
Host Hypervisor 
Hardware 
Host Hypervisor 
Hardware 
Multiple logical levels Multiplexed on a single level 
L1 
L0 
L2 
L1 
Guest 
L2 
Guest 
L2 
L0 
Guest 
L2 Guest 
Hypervisor 
Guest 
Hypervisor Guest Guest 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Nested VMX virtualization: flow 
L0 runs L1 with VMCS0!1 
L1 prepares VMCS1!2 and 
executes vmlaunch 
vmlaunch traps to L0 
L0 merges VMCS’s: 
VMCS0!1 merged with 
VMCS1!2 is VMCS0!2 
L0 launches L2 
L2 causes a trap 
L0 handles trap itself or 
forwards it to L1 
. . . 
eventually, L0 resumes L2 
repeat 
L1 L2 
Host Hypervisor 
Hardware 
Guest 
OS 
Guest 
Hypervisor 
VMCS 
Memory 
VMCS Tables 
Memory 
Tables VMCS Memory 
Tables 
L0 
1-2 State 
0-1 State 0-2 State 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
Exit multiplication makes angry turtle angry 
To handle a single L2 exit, L1 does many things: read and write 
the VMCS, disable interrupts, . . . 
Those operations can trap, leading to exit multiplication 
Exit multiplication: a single L2 exit can cause 40-50 L1 exits! 
Optimize: make a single exit fast and reduce frequency of exits 
… 
… 
… 
… 
L3 
L2 
L1 
L0 
Single Two 
Three Levels 
Level 
Levels 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
Exit multiplication makes angry turtle angry 
To handle a single L2 exit, L1 does many things: read and write 
the VMCS, disable interrupts, . . . 
Those operations can trap, leading to exit multiplication 
Exit multiplication: a single L2 exit can cause 40-50 L1 exits! 
Optimize: make a single exit fast and reduce frequency of exits 
… 
… 
… 
… 
L3 
L2 
L1 
L0 
Single Two 
Three Levels 
Level 
Levels 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
Exit multiplication makes angry turtle angry 
To handle a single L2 exit, L1 does many things: read and write 
the VMCS, disable interrupts, . . . 
Those operations can trap, leading to exit multiplication 
Exit multiplication: a single L2 exit can cause 40-50 L1 exits! 
Optimize: make a single exit fast and reduce frequency of exits 
… 
… 
… 
… 
L3 
L2 
L1 
L0 
Single Two 
Three Levels 
Level 
Levels 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
Exit multiplication makes angry turtle angry 
To handle a single L2 exit, L1 does many things: read and write 
the VMCS, disable interrupts, . . . 
Those operations can trap, leading to exit multiplication 
Exit multiplication: a single L2 exit can cause 40-50 L1 exits! 
Optimize: make a single exit fast and reduce frequency of exits 
… 
… 
… 
… 
L3 
L2 
L1 
L0 
Single Two 
Three Levels 
Level 
Levels 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
MMU virtualization via multi-dimensional paging 
Three logical translations: L2 virt ! phys, L2 ! L1, L1 ! L0 
Only two tables in hardware with EPT: 
virt ! phys and guest physical ! host physical 
L0 compresses three logical translations onto two hardware tables 
SPT12 
Shadow on top of 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
GPT 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
GPT 
SPT02 
shadow 
SPT12 
EPT01 
Multi-dimensional 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
GPT 
EPT02 
EPT12 
paging 
Shadow on top 
of EPT 
EPT01 
baseline better best 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 10 / 22
Baseline: shadow-on-shadow 
SPT12 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
GPT 
SPT02 
Assume no EPT table; all hypervisors use shadow paging 
Useful for old machines and as a baseline 
Maintaining shadow page tables is expensive 
Compress: three logical translations ) one table in hardware 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 11 / 22
Better: shadow-on-EPT 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
SPT12 
EPT01 
GPT 
Instead of one hardware table we have two 
Compress: three logical translations ) two in hardware 
Simple approach: L0 uses EPT, L1 uses shadow paging for L2 
Every L2 page fault leads to multiple L1 exits 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 12 / 22
Best: multi-dimensional paging 
L2 virtual 
L2 physical 
L1 physical 
L0 physical 
GPT 
EPT02 
EPT12 
EPT01 
EPT table rarely changes; guest page table changes a lot 
Again, compress three logical translations ) two in hardware 
L0 emulates EPT for L1 
L0 uses EPT0!1 and EPT1!2 to construct EPT0!2 
End result: a lot less exits! 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 13 / 22
Introduction to I/O virtualization 
Device emulation [Sugerman01] 
HOST 
GUEST 
1 
2 
device 
4 3 
device 
driver 
device 
emulation 
driver 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
Introduction to I/O virtualization 
Device emulation [Sugerman01] 
HOST 
GUEST 
1 
2 
device 
4 3 
device 
driver 
device 
emulation 
driver 
Para-virtualized drivers [Barham03, Russell08] 
HOST 
GUEST 
front−end 
virtual 
driver 
1 
driver 
3 2 
back−end 
device virtual 
driver 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
Introduction to I/O virtualization 
Device emulation [Sugerman01] 
HOST 
GUEST 
1 
2 
device 
4 3 
device 
driver 
device 
emulation 
driver 
Para-virtualized drivers [Barham03, Russell08] 
HOST 
GUEST 
front−end 
virtual 
driver 
1 
driver 
3 2 
back−end 
device virtual 
driver 
Direct device assignment [Levasseur04,Yassour08] 
HOST 
GUEST 
device 
driver 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
Introduction to I/O virtualization 
Device emulation [Sugerman01] 
HOST 
GUEST 
1 
2 
device 
4 3 
device 
driver 
device 
emulation 
driver 
Para-virtualized drivers [Barham03, Russell08] 
HOST 
GUEST 
front−end 
virtual 
driver 
1 
driver 
3 2 
back−end 
device virtual 
driver 
Direct device assignment [Levasseur04,Yassour08] 
HOST 
GUEST 
device 
driver 
Direct assignment best performing option 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
Introduction to I/O virtualization 
Device emulation [Sugerman01] 
HOST 
GUEST 
1 
2 
device 
4 3 
device 
driver 
device 
emulation 
driver 
Para-virtualized drivers [Barham03, Russell08] 
HOST 
GUEST 
front−end 
virtual 
driver 
1 
driver 
3 2 
back−end 
device virtual 
driver 
Direct device assignment [Levasseur04,Yassour08] 
HOST 
GUEST 
device 
driver 
Direct assignment best performing option 
Direct assignment requires IOMMU for safe DMA bypass 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Multi-level device assignment 
With nested 3x3 options for I/O virtualization (L2 , L1 , L0) 
Multi-level device assignment means giving an L2 guest direct 
access to L0’s devices, safely bypassing both L0 and L1 
L1 
hypervisor 
physical 
device 
L0 
hypervisor 
L2 device 
driver 
MMIOs and PIOs 
L L0 IOMMU 1 IOMMU 
Device DMA via 
platform IOMMU 
How? L0 emulates an IOMMU for L1 [Amit10] 
L0 compresses multiple IOMMU translations onto the single 
hardware IOMMU page table 
L2 programs the device directly 
Device DMA’s into L2 memory space directly 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
Experimental Setup 
Running Linux, Windows, 
KVM, VMware, SMP, . . . 
Macro workloads: 
kernbench 
SPECjbb 
netperf 
Multi-dimensional paging? 
Multi-level device assignment? 
KVM as L1 vs. VMware as L1? 
See paper for full experimental 
details, more benchmarks and 
analysis, including worst case 
synthetic micro-benchmark 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 16 / 22
Macro: SPECjbb and kernbench 
kernbench 
Host Guest Nested NestedDRW 
Run time 324.3 355 406.3 391.5 
% overhead vs. host - 9.5 25.3 20.7 
% overhead vs. guest - - 14.5 10.3 
SPECjbb 
Host Guest Nested NestedDRW 
Score 90493 83599 77065 78347 
% degradation vs. host - 7.6 14.8 13.4 
% degradation vs. guest - - 7.8 6.3 
Table: kernbench and SPECjbb results 
Exit multiplication effect not as bad as we feared 
Direct vmread and vmwrite (DRW) give an immediate boost 
Take-away: each level of virtualization adds approximately the same 
overhead! 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 17 / 22
Macro: multi-dimensional paging 
Shadow on EPT 
Multi−dimensional paging
 
3.5
 
3.0
 
2.5
 
2.0
 
1.5
 
1.0
 
0.5
 
0.0
 
kernbench specjbb netperf 
Improvement ratio
 
Impact of multi-dimensional paging depends on rate of page faults 
Shadow-on-EPT: every L2 page fault causes L1 multiple exits 
Multi-dimensional paging: only EPT violations cause L1 exits 
EPT table rarely changes: #(EPT violations) << #(page faults) 
Multi-dimensional paging huge win for page-fault intensive 
kernbench 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 18 / 22
Macro: multi-level device assignment 
throughput (Mbps) 
%cpu
 
1,000 
900 
800 
700 
600 
500 
400 
300 
200 
100 
0 
native 
nested guest 
nested guest 
nested guest 
emulation / emulation 
single level guest 
single emulation 
level virtio 
guest 
single direct level access 
guest 
nested guest 
direct / virtio 
virtio / emulation 
virtio / virtio 
100 
80 
60 
40 
20 
0 
nested guest 
direct / direct 
throughput (Mbps)
 
% cpu 
Benchmark: netperf TCP_STREAM (transmit) 
Multi-level device assignment best performing option 
But: native at 20%, multi-level device assignment at 60% (x3!) 
Interrupts considered harmful, cause exit multiplication 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 19 / 22
Macro: multi-level device assignment (sans interrupts) 
1000 
900 
800 
700 
600 
500 
400 
300 
200 
100 
L0 (bare metal) 
L2 (direct/direct) 
L2 (direct/virtio) 
16 32 64 128 256 512 
Throughput (Mbps) 
Message size (netperf -m) 
What if we could deliver device interrupts directly to L2? 
Only 7% difference between native and nested guest! 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 20 / 22
Conclusions 
Efficient nested x86 
virtualization is challenging but 
feasible 
A whole new ballpark opening 
up many exciting 
applications—security, cloud, 
architecture, . . . 
Current overhead of 6-14% 
Negligible for some 
workloads, not yet for others 
Work in progress—expect at 
most 5% eventually 
Code is available 
It’s turtles all the way down 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
Conclusions 
Efficient nested x86 
virtualization is challenging but 
feasible 
A whole new ballpark opening 
up many exciting 
applications—security, cloud, 
architecture, . . . 
Current overhead of 6-14% 
Negligible for some 
workloads, not yet for others 
Work in progress—expect at 
most 5% eventually 
Code is available 
It’s turtles all the way down 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
Conclusions 
Efficient nested x86 
virtualization is challenging but 
feasible 
A whole new ballpark opening 
up many exciting 
applications—security, cloud, 
architecture, . . . 
Current overhead of 6-14% 
Negligible for some 
workloads, not yet for others 
Work in progress—expect at 
most 5% eventually 
Code is available 
It’s turtles all the way down 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
Conclusions 
Efficient nested x86 
virtualization is challenging but 
feasible 
A whole new ballpark opening 
up many exciting 
applications—security, cloud, 
architecture, . . . 
Current overhead of 6-14% 
Negligible for some 
workloads, not yet for others 
Work in progress—expect at 
most 5% eventually 
Code is available 
It’s turtles all the way down 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
Conclusions 
Efficient nested x86 
virtualization is challenging but 
feasible 
A whole new ballpark opening 
up many exciting 
applications—security, cloud, 
architecture, . . . 
Current overhead of 6-14% 
Negligible for some 
workloads, not yet for others 
Work in progress—expect at 
most 5% eventually 
Code is available 
It’s turtles all the way down 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
Questions? 
Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 22 / 22

More Related Content

What's hot

Docker en kernel security
Docker en kernel securityDocker en kernel security
Docker en kernel security
smart_bit
 
sVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access ControlsVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access Control
James Morris
 
Cloudstack collaboration conference Europe - SDN and Devops
Cloudstack collaboration conference Europe - SDN and DevopsCloudstack collaboration conference Europe - SDN and Devops
Cloudstack collaboration conference Europe - SDN and Devops
John Willis
 
Security of Linux containers in the cloud
Security of Linux containers in the cloudSecurity of Linux containers in the cloud
Security of Linux containers in the cloud
Dobrica Pavlinušić
 
Linux Container Technology 101
Linux Container Technology 101Linux Container Technology 101
Linux Container Technology 101
inside-BigData.com
 
Containers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to KubernetesContainers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to Kubernetes
Shreyas MM
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
dotCloud
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
eurobsdcon
 
Introduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" EditionIntroduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" Edition
Jérôme Petazzoni
 
Windows Internals for Linux Kernel Developers
Windows Internals for Linux Kernel DevelopersWindows Internals for Linux Kernel Developers
Windows Internals for Linux Kernel Developers
Kernel TLV
 
JAVA AND ANDROID OS_PRESENTATION
JAVA AND ANDROID OS_PRESENTATIONJAVA AND ANDROID OS_PRESENTATION
JAVA AND ANDROID OS_PRESENTATIONBenjamin Agboola
 
Locally run a FIWARE Lab Instance In another Hypervisors
Locally run a FIWARE Lab Instance In another HypervisorsLocally run a FIWARE Lab Instance In another Hypervisors
Locally run a FIWARE Lab Instance In another Hypervisors
José Ignacio Carretero Guarde
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
Yandex
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Wan Leung Wong
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)
Jérôme Petazzoni
 
Learn OpenStack from trystack.cn ——Folsom in practice
Learn OpenStack from trystack.cn  ——Folsom in practiceLearn OpenStack from trystack.cn  ——Folsom in practice
Learn OpenStack from trystack.cn ——Folsom in practice
OpenCity Community
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
dotCloud
 
PIC your malware
PIC your malwarePIC your malware
PIC your malware
CODE WHITE GmbH
 
BlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Memory resident implants - code injection is alive and wellBlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat Security Conference
 

What's hot (19)

Docker en kernel security
Docker en kernel securityDocker en kernel security
Docker en kernel security
 
sVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access ControlsVirt: Hardening Linux Virtualization with Mandatory Access Control
sVirt: Hardening Linux Virtualization with Mandatory Access Control
 
Cloudstack collaboration conference Europe - SDN and Devops
Cloudstack collaboration conference Europe - SDN and DevopsCloudstack collaboration conference Europe - SDN and Devops
Cloudstack collaboration conference Europe - SDN and Devops
 
Security of Linux containers in the cloud
Security of Linux containers in the cloudSecurity of Linux containers in the cloud
Security of Linux containers in the cloud
 
Linux Container Technology 101
Linux Container Technology 101Linux Container Technology 101
Linux Container Technology 101
 
Containers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to KubernetesContainers and Cloud: From LXC to Docker to Kubernetes
Containers and Cloud: From LXC to Docker to Kubernetes
 
Docker introduction
Docker introductionDocker introduction
Docker introduction
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
Introduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" EditionIntroduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" Edition
 
Windows Internals for Linux Kernel Developers
Windows Internals for Linux Kernel DevelopersWindows Internals for Linux Kernel Developers
Windows Internals for Linux Kernel Developers
 
JAVA AND ANDROID OS_PRESENTATION
JAVA AND ANDROID OS_PRESENTATIONJAVA AND ANDROID OS_PRESENTATION
JAVA AND ANDROID OS_PRESENTATION
 
Locally run a FIWARE Lab Instance In another Hypervisors
Locally run a FIWARE Lab Instance In another HypervisorsLocally run a FIWARE Lab Instance In another Hypervisors
Locally run a FIWARE Lab Instance In another Hypervisors
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)
 
Learn OpenStack from trystack.cn ——Folsom in practice
Learn OpenStack from trystack.cn  ——Folsom in practiceLearn OpenStack from trystack.cn  ——Folsom in practice
Learn OpenStack from trystack.cn ——Folsom in practice
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
PIC your malware
PIC your malwarePIC your malware
PIC your malware
 
BlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Memory resident implants - code injection is alive and wellBlueHat v18 || Memory resident implants - code injection is alive and well
BlueHat v18 || Memory resident implants - code injection is alive and well
 

Similar to Ben yehuda

Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsopenstackindia
 
open source virtualization
open source virtualizationopen source virtualization
open source virtualization
Kris Buytaert
 
Virtual machines and containers
Virtual machines and containersVirtual machines and containers
Virtual machines and containers
Patrick Pierson
 
Proxmox for DevOps
Proxmox for DevOpsProxmox for DevOps
Proxmox for DevOps
Jorge Moratilla Porras
 
Virtual Container - Docker
Virtual Container - Docker Virtual Container - Docker
Virtual Container - Docker
Venkata Naga Ravi
 
Linux virtualization
Linux virtualizationLinux virtualization
Linux virtualization
Google
 
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
Jim St. Leger
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOS
Docker, Inc.
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
Anil Madhavapeddy
 
20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day
qnapivan
 
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
Gerard Braad
 
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
Takaya Saeki
 
RunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVMRunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVM
Takeshi HASEGAWA
 
Introduction to Linux for Windows Users
Introduction to Linux for Windows UsersIntroduction to Linux for Windows Users
Introduction to Linux for Windows Users
Robert McDermott
 
Lightning talk unikernels
Lightning talk unikernelsLightning talk unikernels
Lightning talk unikernels
Michael Bright
 
Hyper-V support for OpenStack Grizzly
Hyper-V support for OpenStack GrizzlyHyper-V support for OpenStack Grizzly
Hyper-V support for OpenStack Grizzly
Kamesh Pemmaraju
 
Virtualization concept slideshare
Virtualization concept slideshareVirtualization concept slideshare
Virtualization concept slideshare
Yogesh Kumar
 
SYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsSYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsDon Bosco BSIT
 
Virtualization technolegys for amdocs
Virtualization technolegys for amdocsVirtualization technolegys for amdocs
Virtualization technolegys for amdocs
Samuel Dratwa
 

Similar to Ben yehuda (20)

Nova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute modelsNova for Physicalization and Virtualization compute models
Nova for Physicalization and Virtualization compute models
 
open source virtualization
open source virtualizationopen source virtualization
open source virtualization
 
Virtual machines and containers
Virtual machines and containersVirtual machines and containers
Virtual machines and containers
 
Proxmox for DevOps
Proxmox for DevOpsProxmox for DevOps
Proxmox for DevOps
 
Virtual Container - Docker
Virtual Container - Docker Virtual Container - Docker
Virtual Container - Docker
 
Linux virtualization
Linux virtualizationLinux virtualization
Linux virtualization
 
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOS
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
 
20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day
 
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
F19 slidedeck (OpenStack^H^H^H^Hhift, what the)
 
Handout2o
Handout2oHandout2o
Handout2o
 
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
Noah - Robust and Flexible Operating System Compatibility Architecture - Cont...
 
RunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVMRunningFreeBSDonLinuxKVM
RunningFreeBSDonLinuxKVM
 
Introduction to Linux for Windows Users
Introduction to Linux for Windows UsersIntroduction to Linux for Windows Users
Introduction to Linux for Windows Users
 
Lightning talk unikernels
Lightning talk unikernelsLightning talk unikernels
Lightning talk unikernels
 
Hyper-V support for OpenStack Grizzly
Hyper-V support for OpenStack GrizzlyHyper-V support for OpenStack Grizzly
Hyper-V support for OpenStack Grizzly
 
Virtualization concept slideshare
Virtualization concept slideshareVirtualization concept slideshare
Virtualization concept slideshare
 
SYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsSYSAD323 Virtualization Basics
SYSAD323 Virtualization Basics
 
Virtualization technolegys for amdocs
Virtualization technolegys for amdocsVirtualization technolegys for amdocs
Virtualization technolegys for amdocs
 

Recently uploaded

Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 

Recently uploaded (20)

Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 

Ben yehuda

  • 1. The Turtles Project: Design and Implementation of Nested Virtualization Muli Ben-Yehuday Michael D. Dayz Zvi Dubitzkyy Michael Factory Nadav Har’Ely Abel Gordony Anthony Liguoriz Orit Wassermany Ben-Ami Yassoury yIBM Research – Haifa zIBM Linux Technology Center Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 1 / 22
  • 2. What is nested x86 virtualization? Running multiple unmodified hypervisors With their associated unmodified VM’s Guest Simultaneously Hypervisor On the x86 architecture Which does not support nesting in hardware. . . . . . but does support a single level of virtualization Hardware Hypervisor Guest OS Guest OS Guest OS Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 2 / 22
  • 3. Why? Operating systems are already hypervisors (Windows 7 with XP mode, Linux/KVM) To be able to run other hypervisors in clouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware and system software Testing, demonstrating, debugging, live migration of hypervisors Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22
  • 4. Related work First models for nested virtualization [PopekGoldberg74, BelpaireHsu75, LauerWyeth73] First implementation in the IBM z/VM; relies on architectural support for nested virtualization (sie) Microkernels meet recursive VMs [FordHibler96]: assumes we can modify software at all levels x86 software based approaches (slow!) [Berghmans10] KVM [KivityKamay07] with AMD SVM [RoedelGraf09] Early Xen prototype [He09] Blue Pill rootkit hiding from other hypervisors [Rutkowska06] Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22
  • 5. What is the Turtles project? Efficient nested virtualization for Intel x86 based on KVM Multiple guest hypervisors and VMs: VMware, Windows, . . . Code publicly available Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 5 / 22
  • 6. What is the Turtles project? (cont’) Nested VMX virtualization for nested CPU virtualization Multi-dimensional paging for nested MMU virtualization Multi-level device assignment for nested I/O virtualization Micro-optimizations to make it go fast (see paper) + + = Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22
  • 7. Theory of nested CPU virtualization Single-level architectural support (x86) vs. multi-level architectural support (e.g., z/VM) Single level ) one hypervisor, many guests Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0—without either being aware of it (Scheme generalized for n levels; Our focus is n=2) Guest Host Hypervisor Hardware Host Hypervisor Hardware Multiple logical levels Multiplexed on a single level L1 L0 L2 L1 Guest L2 Guest L2 L0 Guest L2 Guest Hypervisor Guest Hypervisor Guest Guest Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
  • 8. Theory of nested CPU virtualization Single-level architectural support (x86) vs. multi-level architectural support (e.g., z/VM) Single level ) one hypervisor, many guests Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0—without either being aware of it (Scheme generalized for n levels; Our focus is n=2) Guest Host Hypervisor Hardware Host Hypervisor Hardware Multiple logical levels Multiplexed on a single level L1 L0 L2 L1 Guest L2 Guest L2 L0 Guest L2 Guest Hypervisor Guest Hypervisor Guest Guest Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
  • 9. Theory of nested CPU virtualization Single-level architectural support (x86) vs. multi-level architectural support (e.g., z/VM) Single level ) one hypervisor, many guests Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0—without either being aware of it (Scheme generalized for n levels; Our focus is n=2) Guest Host Hypervisor Hardware Host Hypervisor Hardware Multiple logical levels Multiplexed on a single level L1 L0 L2 L1 Guest L2 Guest L2 L0 Guest L2 Guest Hypervisor Guest Hypervisor Guest Guest Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
  • 10. Theory of nested CPU virtualization Single-level architectural support (x86) vs. multi-level architectural support (e.g., z/VM) Single level ) one hypervisor, many guests Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0—without either being aware of it (Scheme generalized for n levels; Our focus is n=2) Guest Host Hypervisor Hardware Host Hypervisor Hardware Multiple logical levels Multiplexed on a single level L1 L0 L2 L1 Guest L2 Guest L2 L0 Guest L2 Guest Hypervisor Guest Hypervisor Guest Guest Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22
  • 11. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 12. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 13. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 14. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 15. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 16. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 17. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 18. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 19. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 20. Nested VMX virtualization: flow L0 runs L1 with VMCS0!1 L1 prepares VMCS1!2 and executes vmlaunch vmlaunch traps to L0 L0 merges VMCS’s: VMCS0!1 merged with VMCS1!2 is VMCS0!2 L0 launches L2 L2 causes a trap L0 handles trap itself or forwards it to L1 . . . eventually, L0 resumes L2 repeat L1 L2 Host Hypervisor Hardware Guest OS Guest Hypervisor VMCS Memory VMCS Tables Memory Tables VMCS Memory Tables L0 1-2 State 0-1 State 0-2 State Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22
  • 21. Exit multiplication makes angry turtle angry To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, . . . Those operations can trap, leading to exit multiplication Exit multiplication: a single L2 exit can cause 40-50 L1 exits! Optimize: make a single exit fast and reduce frequency of exits … … … … L3 L2 L1 L0 Single Two Three Levels Level Levels Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
  • 22. Exit multiplication makes angry turtle angry To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, . . . Those operations can trap, leading to exit multiplication Exit multiplication: a single L2 exit can cause 40-50 L1 exits! Optimize: make a single exit fast and reduce frequency of exits … … … … L3 L2 L1 L0 Single Two Three Levels Level Levels Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
  • 23. Exit multiplication makes angry turtle angry To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, . . . Those operations can trap, leading to exit multiplication Exit multiplication: a single L2 exit can cause 40-50 L1 exits! Optimize: make a single exit fast and reduce frequency of exits … … … … L3 L2 L1 L0 Single Two Three Levels Level Levels Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
  • 24. Exit multiplication makes angry turtle angry To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, . . . Those operations can trap, leading to exit multiplication Exit multiplication: a single L2 exit can cause 40-50 L1 exits! Optimize: make a single exit fast and reduce frequency of exits … … … … L3 L2 L1 L0 Single Two Three Levels Level Levels Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22
  • 25. MMU virtualization via multi-dimensional paging Three logical translations: L2 virt ! phys, L2 ! L1, L1 ! L0 Only two tables in hardware with EPT: virt ! phys and guest physical ! host physical L0 compresses three logical translations onto two hardware tables SPT12 Shadow on top of L2 virtual L2 physical L1 physical L0 physical GPT L2 virtual L2 physical L1 physical L0 physical GPT SPT02 shadow SPT12 EPT01 Multi-dimensional L2 virtual L2 physical L1 physical L0 physical GPT EPT02 EPT12 paging Shadow on top of EPT EPT01 baseline better best Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 10 / 22
  • 26. Baseline: shadow-on-shadow SPT12 L2 virtual L2 physical L1 physical L0 physical GPT SPT02 Assume no EPT table; all hypervisors use shadow paging Useful for old machines and as a baseline Maintaining shadow page tables is expensive Compress: three logical translations ) one table in hardware Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 11 / 22
  • 27. Better: shadow-on-EPT L2 virtual L2 physical L1 physical L0 physical SPT12 EPT01 GPT Instead of one hardware table we have two Compress: three logical translations ) two in hardware Simple approach: L0 uses EPT, L1 uses shadow paging for L2 Every L2 page fault leads to multiple L1 exits Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 12 / 22
  • 28. Best: multi-dimensional paging L2 virtual L2 physical L1 physical L0 physical GPT EPT02 EPT12 EPT01 EPT table rarely changes; guest page table changes a lot Again, compress three logical translations ) two in hardware L0 emulates EPT for L1 L0 uses EPT0!1 and EPT1!2 to construct EPT0!2 End result: a lot less exits! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 13 / 22
  • 29. Introduction to I/O virtualization Device emulation [Sugerman01] HOST GUEST 1 2 device 4 3 device driver device emulation driver Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
  • 30. Introduction to I/O virtualization Device emulation [Sugerman01] HOST GUEST 1 2 device 4 3 device driver device emulation driver Para-virtualized drivers [Barham03, Russell08] HOST GUEST front−end virtual driver 1 driver 3 2 back−end device virtual driver Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
  • 31. Introduction to I/O virtualization Device emulation [Sugerman01] HOST GUEST 1 2 device 4 3 device driver device emulation driver Para-virtualized drivers [Barham03, Russell08] HOST GUEST front−end virtual driver 1 driver 3 2 back−end device virtual driver Direct device assignment [Levasseur04,Yassour08] HOST GUEST device driver Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
  • 32. Introduction to I/O virtualization Device emulation [Sugerman01] HOST GUEST 1 2 device 4 3 device driver device emulation driver Para-virtualized drivers [Barham03, Russell08] HOST GUEST front−end virtual driver 1 driver 3 2 back−end device virtual driver Direct device assignment [Levasseur04,Yassour08] HOST GUEST device driver Direct assignment best performing option Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
  • 33. Introduction to I/O virtualization Device emulation [Sugerman01] HOST GUEST 1 2 device 4 3 device driver device emulation driver Para-virtualized drivers [Barham03, Russell08] HOST GUEST front−end virtual driver 1 driver 3 2 back−end device virtual driver Direct device assignment [Levasseur04,Yassour08] HOST GUEST device driver Direct assignment best performing option Direct assignment requires IOMMU for safe DMA bypass Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22
  • 34. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 35. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 36. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 37. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 38. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 39. Multi-level device assignment With nested 3x3 options for I/O virtualization (L2 , L1 , L0) Multi-level device assignment means giving an L2 guest direct access to L0’s devices, safely bypassing both L0 and L1 L1 hypervisor physical device L0 hypervisor L2 device driver MMIOs and PIOs L L0 IOMMU 1 IOMMU Device DMA via platform IOMMU How? L0 emulates an IOMMU for L1 [Amit10] L0 compresses multiple IOMMU translations onto the single hardware IOMMU page table L2 programs the device directly Device DMA’s into L2 memory space directly Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22
  • 40. Experimental Setup Running Linux, Windows, KVM, VMware, SMP, . . . Macro workloads: kernbench SPECjbb netperf Multi-dimensional paging? Multi-level device assignment? KVM as L1 vs. VMware as L1? See paper for full experimental details, more benchmarks and analysis, including worst case synthetic micro-benchmark Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 16 / 22
  • 41. Macro: SPECjbb and kernbench kernbench Host Guest Nested NestedDRW Run time 324.3 355 406.3 391.5 % overhead vs. host - 9.5 25.3 20.7 % overhead vs. guest - - 14.5 10.3 SPECjbb Host Guest Nested NestedDRW Score 90493 83599 77065 78347 % degradation vs. host - 7.6 14.8 13.4 % degradation vs. guest - - 7.8 6.3 Table: kernbench and SPECjbb results Exit multiplication effect not as bad as we feared Direct vmread and vmwrite (DRW) give an immediate boost Take-away: each level of virtualization adds approximately the same overhead! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 17 / 22
  • 42. Macro: multi-dimensional paging Shadow on EPT Multi−dimensional paging 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0 kernbench specjbb netperf Improvement ratio Impact of multi-dimensional paging depends on rate of page faults Shadow-on-EPT: every L2 page fault causes L1 multiple exits Multi-dimensional paging: only EPT violations cause L1 exits EPT table rarely changes: #(EPT violations) << #(page faults) Multi-dimensional paging huge win for page-fault intensive kernbench Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 18 / 22
  • 43. Macro: multi-level device assignment throughput (Mbps) %cpu 1,000 900 800 700 600 500 400 300 200 100 0 native nested guest nested guest nested guest emulation / emulation single level guest single emulation level virtio guest single direct level access guest nested guest direct / virtio virtio / emulation virtio / virtio 100 80 60 40 20 0 nested guest direct / direct throughput (Mbps) % cpu Benchmark: netperf TCP_STREAM (transmit) Multi-level device assignment best performing option But: native at 20%, multi-level device assignment at 60% (x3!) Interrupts considered harmful, cause exit multiplication Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 19 / 22
  • 44. Macro: multi-level device assignment (sans interrupts) 1000 900 800 700 600 500 400 300 200 100 L0 (bare metal) L2 (direct/direct) L2 (direct/virtio) 16 32 64 128 256 512 Throughput (Mbps) Message size (netperf -m) What if we could deliver device interrupts directly to L2? Only 7% difference between native and nested guest! Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 20 / 22
  • 45. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
  • 46. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
  • 47. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
  • 48. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
  • 49. Conclusions Efficient nested x86 virtualization is challenging but feasible A whole new ballpark opening up many exciting applications—security, cloud, architecture, . . . Current overhead of 6-14% Negligible for some workloads, not yet for others Work in progress—expect at most 5% eventually Code is available It’s turtles all the way down Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22
  • 50. Questions? Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 22 / 22