paravirtualization: modify guest OS code
binary translation: modify guest OS binary “on-the-fly”
[Figure: ring diagrams contrasting native execution (OS in ring 0, apps in ring 3) with virtualized layouts where the VMM sits beneath the guest OSes]
Intel ® Virtualization Technology
What is Intel VT? (formerly known as Vanderpool)
Silicon-level virtualization support to eliminate virtualization holes
Unmodified guest OSes can be executed.
VT-x : for the IA-32 architecture
VT-i : for the Itanium architecture
VT-d : for Directed I/O
cf. AMD-V (formerly codenamed Pacifica)
Benefits with VT-x
Reduce size and complexity of VMM SW
Reduce the need for VMM intervention
Reduce memory overhead (no side tables…)
Avoids the need to modify guest OSes, allowing them to run directly on the HW
Intel ® Virtualization Technology (cont’d)
VT-x : extension to the IA-32 Intel architecture
Virtual Machine Extension (VMX) operation
More-privileged mode (VMX root)
Less-privileged mode (VMX non-root)
10 new VMX instructions
Virtual Machine Control Structure (VMCS)
manages VM entry/exit
holds guest and host state
A VMCS is created for each virtual CPU.
4 privilege levels (rings 0–3), available in both VMX root and non-root operation
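The role of the VMCS can be sketched as a small simulation. This is a toy model, not real VMX: the structure holds separate guest-state and host-state areas, VM entry loads guest state onto the (simulated) CPU, and VM exit checkpoints the guest and restores the host. All field and function names here are illustrative, not the real VMCS encodings.

```c
#include <assert.h>

/* Toy CPU context; a real VMCS tracks far more state. */
struct cpu_state { unsigned long rip, rsp, cr3; };

struct vmcs {
    struct cpu_state guest;   /* guest-state area */
    struct cpu_state host;    /* host-state area  */
    int exit_reason;          /* why the last VM exit happened */
};

/* VM entry: switch the simulated CPU into the guest context. */
static void vm_entry(struct vmcs *v, struct cpu_state *cpu)
{
    v->host = *cpu;           /* remember where the VMM was */
    *cpu = v->guest;          /* load guest state */
}

/* VM exit: checkpoint the guest, resume the VMM context. */
static void vm_exit(struct vmcs *v, struct cpu_state *cpu, int reason)
{
    v->guest = *cpu;          /* save guest progress */
    v->exit_reason = reason;
    *cpu = v->host;           /* restore host state */
}
```

Because each virtual CPU has its own `struct vmcs`, the VMM can multiplex many guests over one physical CPU by alternating `vm_entry`/`vm_exit` pairs.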
[Figure: the VMM runs in VMX root mode; each guest OS (ring 0) and its apps (ring 3) run in VMX non-root mode; VM entry and VM exit transition between the two over the shared physical hardware]
Extending Xen * with Intel ® VT
HVM (Hardware-based Virtual Machine)
fully virtualized domain (unmodified guest OSes)
creating, controlling, and destroying HVM domains
load the guest FW into the HVM domain
create the device-model thread in Dom0
service I/O requests
Then the HVM guest is started, and control passes to the first instruction in the guest FW.
The HVM guest executes at native speed until it encounters an event that requires special handling by Xen.
The Virtual CPU module
provides the abstraction of processor(s) to the HVM guest.
manages the virtual processor and associated virtualization events.
for the IA-32 architecture
A VMCS is created for each virtual CPU in an HVM domain.
Instructions such as CPUID and MOV from/to CR3 are intercepted as VM exits.
Exceptions/faults, such as page faults, are intercepted as VM exits, and virtualized exceptions/faults are injected on VM entry to the guest.
External interrupts unrelated to the guest are intercepted as VM exits, and virtualized interrupts are injected on VM entry to the guest.
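The interception scheme above amounts to a dispatch on the VM-exit reason recorded in the VMCS. As a hedged sketch: the reason codes below are the basic exit reasons I believe the Intel SDM assigns (verify against your SDM revision), and the handler strings merely describe what a VMM like Xen would do at each case.

```c
#include <string.h>

/* Basic VM-exit reason codes (believed to match the Intel SDM). */
enum {
    EXIT_EXCEPTION_NMI      = 0,   /* guest exception or NMI      */
    EXIT_EXTERNAL_INTERRUPT = 1,   /* host-side interrupt arrived */
    EXIT_CPUID              = 10,  /* guest executed CPUID        */
    EXIT_CR_ACCESS          = 28,  /* MOV to/from a control reg   */
};

/* Hypothetical top-level dispatcher run after every VM exit. */
static const char *handle_vm_exit(int reason)
{
    switch (reason) {
    case EXIT_CPUID:              return "emulate CPUID, advance guest RIP";
    case EXIT_CR_ACCESS:          return "emulate MOV from/to CR3";
    case EXIT_EXCEPTION_NMI:      return "handle or reflect guest fault";
    case EXIT_EXTERNAL_INTERRUPT: return "deliver interrupt to the hypervisor";
    default:                      return "unhandled exit reason";
    }
}
```

After the handler runs, virtualized exceptions or interrupts destined for the guest are queued for injection on the next VM entry.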
Xen presents the abstraction of a HW MMU to the HVM domain
IA-32 Memory Virtualization
supports various kinds of page tables (2-/3-/4-level PTs with 4 KB pages)
maintains a shadow page table for the guest.
extends Xen’s shadow page table to support both paravirtualized and fully virtualized guests.
Optimized shadow page table management
Shadow page table code is the most performance-critical section
To detect any attempt to modify the guest page table, write protect the corresponding guest page table page.
Upon page fault against a guest page table, save a “snapshot” of the page and give write permission to the page
This page is then added to an “out-of-sync” list
When a TLB flush operation is executed, propagate all entries on the “out-of-sync” list to the shadow page table
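The out-of-sync optimization above can be sketched as a small simulation. This is a toy model under stated assumptions, not Xen's actual code: a guest page-table page starts write-protected; the first write fault snapshots it and grants write access; at the next guest TLB flush only the entries that differ from the snapshot are propagated into the shadow page table, and the page is re-protected.

```c
#include <string.h>

#define PT_ENTRIES 4   /* tiny page table, for illustration only */

struct guest_pt {
    unsigned long entry[PT_ENTRIES];     /* guest's live page table  */
    unsigned long snapshot[PT_ENTRIES];  /* copy taken at the fault  */
    int writable;                        /* 0 = write-protected      */
    int out_of_sync;                     /* on the out-of-sync list? */
};

/* Write fault against a protected guest PT page: snapshot it,
 * grant write access, and mark it out-of-sync. */
static void write_fault(struct guest_pt *pt)
{
    memcpy(pt->snapshot, pt->entry, sizeof pt->entry);
    pt->writable = 1;
    pt->out_of_sync = 1;
}

/* Guest TLB flush (e.g. CR3 write): diff against the snapshot and
 * propagate only changed entries; real code would also translate
 * guest frames to machine frames. Returns entries updated. */
static int tlb_flush_resync(struct guest_pt *pt, unsigned long *shadow)
{
    int i, updated = 0;
    if (!pt->out_of_sync)
        return 0;
    for (i = 0; i < PT_ENTRIES; i++)
        if (pt->entry[i] != pt->snapshot[i]) {
            shadow[i] = pt->entry[i];
            updated++;
        }
    pt->writable = 0;          /* re-protect the page */
    pt->out_of_sync = 0;
    return updated;
}
```

The win is batching: the guest takes one protection fault per page per flush interval instead of one VM exit per page-table write.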
Xen/VT-x HVM implements a shadow page table
A shadow TLB is inefficient on x86
A host page fault (VM exit) is very expensive
The guest OS purges the entire TLB at process-switch time (CR3 write)
A shadow TLB implementation would therefore raise excessive page faults
Shadow page table
Much more effective than a shadow TLB, but
duplicating the page table consumes both CPU cycles and memory
Xen/VT-i HVM implements a shadow TLB
A shadow TLB is highly efficient on Itanium
IA-64 uses region IDs (RIDs) to distinguish TLB entries of different processes, so the guest OS rarely flushes the entire TLB
reuses the open-source QEMU project’s device emulation module
runs one instance of the device models in Dom0 per HVM domain
performance-critical models are moved into the hypervisor
communication between the I/O device model and the Xen hypervisor uses shared memory
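The shared-memory channel can be sketched as a simplified I/O request record, loosely inspired by Xen's ioreq mechanism; the field and state names here are illustrative, not Xen's actual interface. The hypervisor publishes a trapped guest I/O access, the device model in Dom0 emulates it and writes back the result.

```c
#include <stdint.h>

enum { IOREQ_FREE, IOREQ_READY, IOREQ_DONE };  /* request lifecycle */
enum { IOREQ_READ, IOREQ_WRITE };              /* access direction  */

/* One request slot in the shared page. */
struct ioreq {
    uint64_t addr;          /* port number or MMIO address      */
    uint64_t data;          /* value written / value read back  */
    uint32_t size;          /* access width in bytes            */
    uint8_t  dir;           /* IOREQ_READ or IOREQ_WRITE        */
    volatile uint8_t state; /* handshake between the two sides  */
};

/* Hypervisor side: publish a trapped guest I/O access, then
 * notify the device model (event channel omitted here). */
static void submit_ioreq(struct ioreq *r, uint64_t addr, uint32_t size,
                         int dir, uint64_t data)
{
    r->addr = addr; r->size = size; r->dir = dir; r->data = data;
    r->state = IOREQ_READY;
}

/* Device-model side: emulate the access and complete the request. */
static void service_ioreq(struct ioreq *r)
{
    if (r->state != IOREQ_READY)
        return;
    if (r->dir == IOREQ_READ)
        r->data = 0xFF;     /* toy device: all reads return 0xFF */
    r->state = IOREQ_DONE;
}
```

In the real system the notification travels over an event channel, and the guest's virtual CPU is blocked until the request reaches the done state.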
I/O Port Access
port Xen’s VBD and VNIF to HVM domains
Memory-Mapped I/O Handling
HVM guests only see virtualized external interrupts.
Virtual Device Drivers
define a way to allow the hypervisor to access guest virtual addresses
define a way to signal Xen events to the virtual driver
Performance Tuning VT-x Guests
extending Xentrace to support HVM domains
counting the occurrences of events and their handling times in the hypervisor
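The profiling idea behind the Xentrace extension can be sketched as per-event counters plus accumulated handling time. This is a minimal sketch under stated assumptions; the event names and the `trace_event`/`dump_trace` API are hypothetical, not Xentrace's actual interface.

```c
#include <stdio.h>

/* Illustrative HVM event classes a tracer might distinguish. */
enum hvm_event { EV_VMEXIT_CPUID, EV_VMEXIT_PGFAULT, EV_VMEXIT_IO, EV_MAX };

static const char *ev_name[EV_MAX] = { "cpuid", "pgfault", "io" };
static unsigned long ev_count[EV_MAX];   /* occurrences per event     */
static unsigned long ev_cycles[EV_MAX];  /* total handling time spent */

/* Record one handled event and the cycles it cost the hypervisor. */
static void trace_event(enum hvm_event ev, unsigned long cycles)
{
    ev_count[ev]++;
    ev_cycles[ev] += cycles;
}

/* Summarize: count and average handling time per event class. */
static void dump_trace(void)
{
    for (int i = 0; i < EV_MAX; i++)
        if (ev_count[i])
            printf("%-8s count=%lu avg_cycles=%lu\n", ev_name[i],
                   ev_count[i], ev_cycles[i] / ev_count[i]);
}
```

Averaging cycles per event class is what makes hot spots visible, e.g. whether page-fault exits or I/O exits dominate a workload's virtualization overhead.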