1. Hardware Support for Virtualization (Yoonje Choi)
2. Origins
Formalized by:
‣ R. Goldberg. Architectural Principles for Virtual Computer Systems. Ph.D. thesis, Harvard University, Cambridge, MA, 1972.
‣ G. Popek and R. Goldberg. Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM, 17(7):412–421, 1974.
By their standards:
‣ Virtual Machine: an efficient, isolated duplicate of the real machine.
‣ Virtual Machine Monitor: a piece of software that meets the following requirements:
  • Equivalent execution. Programs running in a virtual environment run identically to running natively, barring differences in resource availability and timing.
  • Performance. A "statistically dominant" subset of instructions must be executed directly on the CPU.
  • Safety. The VMM must completely control system resources.
3. Origins: Instruction types
‣ Privileged: an instruction that traps in unprivileged (user) mode but not in privileged (supervisor) mode.
‣ Sensitive:
  ✓ Control sensitive: attempts to change the memory allocation or the privilege mode.
  ✓ Behavior sensitive:
    • Location sensitive: execution behavior depends on location in memory.
    • Mode sensitive: execution behavior depends on the privilege mode.
‣ Innocuous: an instruction that is not sensitive.
Theorem. For any conventional third-generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
The IA-32/x86 architecture is not virtualizable: it has sensitive instructions (e.g., POPF, which can modify the interrupt flag) that do not trap in user mode.
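The theorem above reduces to a set-inclusion check. A minimal sketch in Python; the instruction sets below are small illustrative samples, not a complete x86 classification:

```python
# Popek-Goldberg test: a classic trap-and-emulate VMM is constructible
# iff every sensitive instruction is also privileged (i.e., traps in user mode).
def virtualizable(sensitive, privileged):
    return set(sensitive) <= set(privileged)

# Illustrative subsets only. On IA-32, POPF is control sensitive (it can
# modify the interrupt flag) but in user mode it silently drops that change
# instead of trapping, so it is sensitive yet unprivileged.
ia32_sensitive  = {"POPF", "SGDT", "SIDT", "SMSW"}
ia32_privileged = {"LGDT", "LIDT", "HLT", "MOV to CR3"}

print(virtualizable(ia32_sensitive, ia32_privileged))  # False: x86 fails the test
```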
4. Full virtualization (direct execution)
‣ Exact hardware exposed to OS
‣ Efficient execution
‣ OS runs unchanged
‣ Requires a "virtualizable" architecture
‣ Example: VMware ESX
Paravirtualization
‣ OS modified to execute under VMM
‣ Requires porting OS code
‣ Execution overhead
‣ Necessary for some (popular) architectures (e.g., x86)
‣ Example: Xen
5. Binary Translation
[Figure: sensitive instructions are SIMULATE(d); innocuous instructions are translated IDENT(ically).]
‣ Binary: input is machine-level code
‣ Dynamic: occurs at runtime
‣ On demand: code translated only when needed for execution
‣ System level: makes no assumptions about guest code
‣ Subsetting: translates from the full instruction set to a safe subset
‣ Adaptive: adjusts code based on guest behavior to achieve efficiency
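The translate-on-demand idea can be sketched as a toy loop in Python; the instruction names, the `vmm_simulate_*` hooks, and the sensitive set are invented for illustration:

```python
# Toy dynamic binary translator: innocuous instructions pass through
# IDENT(ically); sensitive ones are rewritten into calls that SIMULATE
# their effect on virtual state. Real translators work on machine code;
# here instructions are just strings.
SENSITIVE = {"popf", "cli", "sti", "mov_cr3"}

translation_cache = {}  # basic-block address -> translated code

def translate_block(addr, fetch_block):
    """Translate one basic block on demand and cache the result."""
    if addr not in translation_cache:
        out = []
        for insn in fetch_block(addr):
            if insn in SENSITIVE:
                out.append(f"call vmm_simulate_{insn}")  # SIMULATE(d)
            else:
                out.append(insn)                         # IDENT(ical)
        translation_cache[addr] = out
    return translation_cache[addr]

# Hypothetical guest block mixing innocuous and sensitive instructions.
guest = {0x1000: ["add", "popf", "ret"]}
print(translate_block(0x1000, guest.__getitem__))
```

Caching translated blocks is what lets long-running workloads amortize translation cost, as the performance slide later notes.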
6. Intel® Virtualization Technology
What is Intel VT? (formerly known as Vanderpool)
‣ Silicon-level virtualization support to eliminate virtualization holes
‣ Unmodified guest OSes can be executed
‣ VT-x: for the IA-32 architecture
‣ VT-i: for the Itanium architecture
‣ VT-d: for Directed I/O
‣ cf. AMD-V (formerly known as Pacifica)
Benefits of VT-x
‣ Reduces the size and complexity of VMM software
‣ Reduces the need for VMM intervention
‣ Reduces memory overhead (no side tables…)
‣ Avoids the need to modify guest OSes, allowing them to run directly on the hardware
7. Intel VT-x Architecture
• Two new forms of CPU operation:
  - VMX root operation: for use by a VMM
  - VMX non-root operation: similar to IA-32 without VT-x
  - Both forms of operation support all four privilege levels
  - Guest OS can run at its intended privilege level
• Two new transitions:
  - VM entry: from VMX root operation to non-root operation
  - VM exit: from VMX non-root operation to root operation
• Under VMX non-root operation, many instructions/events cause VM exits
8. Intel VT-x Architecture
[Figure: inside each VM, apps run at Ring 3 and the guest OS at Ring 0 under VMX non-root operation; the VMM runs in VMX root operation; VM exit and VM entry transitions connect the two, over shared physical hardware with Intel® Virtualization Technology.]
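The two transitions imply a simple run loop in the VMM; a minimal Python sketch, where the guest model and the exit events are invented for illustration:

```python
# Sketch of the VMM run loop implied by VT-x: the VMM (root operation)
# enters the guest; the guest runs in non-root operation until an
# exiting event occurs, at which point control returns to the VMM,
# which handles the exit and re-enters the guest.
def run_guest(events):
    """Model guest execution: each yielded event is a VM exit."""
    for ev in events:
        yield ev  # VM exit: control transfers back to the VMM

def vmm_loop(events):
    handled = []
    for exit_reason in run_guest(events):  # VM entry on each resume
        handled.append(f"handled {exit_reason}")
    return handled

print(vmm_loop(["CPUID", "page fault", "external interrupt"]))
```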
9. Virtual Machine Control Structure (VMCS)
‣ A new data structure; one VMCS is created for each virtual CPU.
‣ The VMCS includes a guest-state area and a host-state area.
‣ At each transition (VM entry / VM exit), the corresponding state is loaded/saved.
‣ VM-exiting events are controlled through the VMCS.
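The guest/host state swap at VM entry and VM exit can be sketched as follows; the field names are illustrative, not real VMCS encodings:

```python
from dataclasses import dataclass, field

# Sketch of a per-virtual-CPU VMCS: a guest-state area and a host-state
# area. On VM entry the host (VMM) state is saved and guest state loaded;
# on VM exit the guest state is saved and host state restored.
@dataclass
class Vmcs:
    guest_state: dict = field(default_factory=dict)  # e.g. RIP, RSP, CR3
    host_state: dict = field(default_factory=dict)

def vm_entry(cpu, vmcs):
    vmcs.host_state = dict(cpu)                 # save VMM state
    cpu.clear(); cpu.update(vmcs.guest_state)   # load guest state

def vm_exit(cpu, vmcs):
    vmcs.guest_state = dict(cpu)                # save guest state
    cpu.clear(); cpu.update(vmcs.host_state)    # restore VMM state

cpu = {"RIP": 0xFFFF0000}                       # VMM currently running
vmcs = Vmcs(guest_state={"RIP": 0x1000})
vm_entry(cpu, vmcs)
print(hex(cpu["RIP"]))   # guest RIP now active
vm_exit(cpu, vmcs)
print(hex(cpu["RIP"]))   # VMM RIP restored
```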
14. VM exit/entry
‣ Instructions, such as CPUID and MOV from/to CR3, are intercepted as VM exits.
‣ Exceptions/faults, such as page faults, are intercepted as VM exits, and virtualized exceptions/faults are injected on VM entry to guests.
‣ External interrupts unrelated to guests are intercepted as VM exits, and virtualized interrupts are injected on VM entry to the guests.
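The exit-handling policy above can be sketched as a dispatch table; the exit-reason strings and handler actions are invented for illustration:

```python
# Sketch of VMM exit handling per the slide: intercepted instructions
# are emulated; guest-related faults are queued for injection on the
# next VM entry; external interrupts unrelated to the guest go to the
# host.
pending_injections = []  # delivered to the guest on the next VM entry

def handle_exit(reason, detail):
    if reason == "instruction":         # e.g. CPUID, MOV from/to CR3
        return f"emulate {detail}"
    if reason == "exception":           # e.g. guest page fault
        pending_injections.append(detail)
        return "inject on entry"
    if reason == "external interrupt":  # unrelated to the guest
        return "deliver to host"
    raise ValueError(f"unhandled exit reason: {reason}")

print(handle_exit("instruction", "CPUID"))
print(handle_exit("exception", "#PF"))
print(pending_injections)
```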
15. Performance

Table 1. Micro-architectural improvements (CPU cycles, smaller is better):
                       3.8GHz P4 672    2.66GHz Core 2 Duo
  VM entry                  2409               937
  Page fault VM exit        1931              1186
  VMCB read                  178                52
  VMCB write                 171                44

[Figure 4. Virtualization nanobenchmarks: CPU cycles (smaller is better) for syscall, in/out, cr8wr, call/ret, pgfault, divzero, and ptemod, comparing native, the software VMM, and the hardware VMM.]
[Figure 5. Sources of virtualization overhead (seconds) in an XP boot/halt workload for the software and hardware VMMs.]

‣ call/ret: BT slows down indirect control flow. Since the hardware VMM executes calls and returns without modification, both the hardware VMM and native execute the call/return pair in 11 cycles.
‣ cr8wr: the software VMM translates the %cr8 write into a short sequence of instructions, completing the %cr8 write in 35 cycles, faster than native.
‣ System calls were similar in frequency to PTE modifications. However, while the software VMM slows down system calls substantially, on an end-to-end basis system calls were not frequent enough to offset the hardware VMM's penalty for PTE modification (and I/O instructions), and the hardware VMM incurs considerably more total overhead than the software VMM in this workload, inducing approximately 4.4 times greater overhead. Still, this program stresses many divergent paths through both VMMs, such as system calls, context switching, creation of address spaces, modification of traced page table entries, and injection of page faults.
‣ The cost of running the binary translator (vs. executing the translated code) is rarely significant, for two reasons. First, the translation cache captures the working set, and continued execution amortizes away translation costs for long-running workloads. Second, the translator is quite fast because it does little flow analysis (about 2300 cycles of overhead per x86 instruction, compared with 100-200 kcycles per Java bytecode for some optimizing JITs). High translator throughput ensures good performance even for a worst-case workload like boot/halt that mostly executes cold code.
16. Conclusion
• While the new hardware removes the need for BT and simplifies VMM design, it rarely improves performance.
• Hardware overheads will shrink over time as the technology matures.
17. References
• Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization. In Proceedings of the ACM Symposium on Operating Systems Principles, October 2003.
• Jacob Faber Kloster, Jesper Kristensen, and Arne Mejlholm. Efficient Memory Sharing in the Xen Virtual Machine Monitor. http://www.cs.aau.dk/library/cgi-bin/detail.cgi?id=1136884892, January 2006.
• Gil Neiger, Amy Santoni, Felix Leung, Dion Rodgers, and Rich Uhlig. Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization. Intel Technology Journal, 10(3), 2006.
• J. Fisher-Ogden. Hardware Support for Efficient Virtualization. http://cseweb.ucsd.edu/~jfisherogden/hardwareVirt.pdf, 2006.
• http://courses.cs.vt.edu/cs5204/fall09-kafura/
18. Definitions
Virtualization
‣ A layer mapping its visible interface and resources onto the interface and resources of the underlying layer or system on which it is implemented.
‣ Purposes:
  • Abstraction: to simplify the use of the underlying resource (e.g., by removing details of the resource's structure)
  • Replication: to create multiple instances of the resource (e.g., to simplify management or allocation)
  • Isolation: to separate the uses which clients make of the underlying resources (e.g., to improve security)
Virtual Machine Monitor (VMM)
‣ A virtualization system that partitions a single physical "machine" into multiple virtual machines.
Terminology
‣ Host: the machine and/or software on which the VMM is implemented.
‣ Guest: the OS which executes under the control of the VMM.