  1. Chapter 8: System Virtual Machines. 2005.11.9, Dong In Shin, Distributed Computing System Laboratory, Seoul National Univ.
  2. Contents: 1. Performance Enhancement of System VMs; 2. Case Study: VMware Virtual Platform; 3. Case Study: The Intel VT-x Technology; 4. ** Case Study: Xen
  3. Performance Enhancement of System Virtual Machines
  4. Reasons for Performance Degradation <ul><li>Setup </li></ul><ul><li>Emulation </li></ul><ul><ul><li>Some guest instructions need to be emulated (usually via interpretation) by the VMM. </li></ul></ul><ul><li>Interrupt handling </li></ul><ul><li>State saving </li></ul><ul><li>Bookkeeping </li></ul><ul><ul><li>e.g., the accounting of time charged to a user </li></ul></ul><ul><li>Time elongation </li></ul>
  5. Instruction Emulation Assists <ul><li>The VMM emulates each privileged instruction using a routine whose operation depends on whether the virtual machine is supposed to be executing in system mode or in user mode. </li></ul><ul><ul><li>Hardware assists check the state and perform the actions. </li></ul></ul>
  6. Virtual Machine Monitor Assists <ul><li>Context switch </li></ul><ul><ul><li>Using hardware to save and restore registers </li></ul></ul><ul><li>Decoding of privileged instructions </li></ul><ul><ul><li>Hardware assists, such as decoding the privileged instructions in hardware. </li></ul></ul><ul><li>Virtual interval timer </li></ul><ul><ul><li>Decrementing the virtual counter by an amount the VMM estimates from the amount the real timer decrements. </li></ul></ul><ul><li>Adding to the instruction set </li></ul><ul><ul><li>A number of new instructions that are not part of the machine's ISA. </li></ul></ul>
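The virtual interval timer assist above can be sketched as follows. This is a toy model, not any real VMM's interface: the function name and the `guest_share` estimate are illustrative assumptions.

```python
# Sketch of a virtual interval timer: the VMM decrements a guest's virtual
# counter by an estimate derived from the real timer's decrement and the
# fraction of real time the guest actually ran.

def tick(real_decrement, guest_share, vtimer):
    # guest_share: VMM's estimate of the fraction of the tick the guest ran
    vtimer["count"] -= real_decrement * guest_share
    if vtimer["count"] <= 0:
        vtimer["count"] = vtimer["interval"]   # reload the interval
        return True                            # deliver a virtual timer interrupt
    return False

vt = {"count": 100, "interval": 100}
fired = tick(real_decrement=60, guest_share=0.5, vtimer=vt)
assert vt["count"] == 70 and not fired         # only 30 units charged to the guest
```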
  7. Improving Performance of the Guest System <ul><li>Non-paged mode </li></ul><ul><ul><li>The guest OS disables dynamic address translation and defines its real address space to be as large as the largest virtual address space; page frames are mapped to fixed real pages. </li></ul></ul><ul><ul><li>The guest OS no longer has to exercise demand paging. </li></ul></ul><ul><ul><li>No double paging </li></ul></ul><ul><ul><li>No potential conflict in paging decisions between the guest OS and the VMM </li></ul></ul>
  8. Double Paging <ul><li>Two independent layers of paging interact and can perform poorly. </li></ul> Figure: the guest OS incorrectly believes a page to be in physical memory (green/gold pages); the VMM believes an unneeded page is still in use (teal pages); the guest evicts a page despite available physical memory (red pages).
  9. Pseudo-Page-Fault Handling <ul><li>A page fault in a VM system </li></ul><ul><ul><li>A page fault in some VM's page table </li></ul></ul><ul><ul><li>A page fault in the VMM's page table </li></ul></ul><ul><ul><ul><li>Handled with pseudo page faults </li></ul></ul></ul><ul><li>Process </li></ul><ul><ul><li>VMM initiates the page-in operation from backing store. </li></ul></ul><ul><ul><li>VMM delivers a 'pseudo page fault' to the guest. </li></ul></ul><ul><ul><li>Guest OS suspends only the faulting user process. </li></ul></ul><ul><ul><li>VMM does not suspend the guest as a whole. </li></ul></ul><ul><li>On completion of the page-in operation </li></ul><ul><ul><li>VMM invokes the guest pseudo-page-fault handler again. </li></ul></ul><ul><ul><li>Guest OS handler wakes up the blocked user process. </li></ul></ul>
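The protocol on this slide can be sketched as a small simulation. The class and method names (`Guest`, `VMM`, `pseudo_page_fault`, and so on) are hypothetical stand-ins, not any real hypervisor's API; the point is only the control flow: one process blocks while the guest as a whole keeps running.

```python
# Toy model of pseudo-page-fault handling between a VMM and a guest OS.

class Guest:
    def __init__(self):
        self.blocked = set()      # user processes waiting on a page-in

    def pseudo_page_fault(self, pid, page):
        # Guest OS suspends only the faulting process and schedules others.
        self.blocked.add((pid, page))

    def page_in_complete(self, pid, page):
        # Completion notice from the VMM: wake the blocked process.
        self.blocked.discard((pid, page))

class VMM:
    def __init__(self, guest):
        self.guest = guest
        self.pending = []

    def handle_fault(self, pid, page):
        # Fault in the VMM's own page table: start the page-in from backing
        # store, but do NOT suspend the guest; deliver a pseudo page fault.
        self.pending.append((pid, page))
        self.guest.pseudo_page_fault(pid, page)

    def page_in_done(self):
        pid, page = self.pending.pop(0)
        self.guest.page_in_complete(pid, page)

guest = Guest()
vmm = VMM(guest)
vmm.handle_fault(pid=1, page=0x42)
assert (1, 0x42) in guest.blocked   # only process 1 is blocked; guest keeps running
vmm.page_in_done()
assert not guest.blocked            # process resumes once the page-in completes
```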
  10. Other Enhancements <ul><li>Spool files </li></ul><ul><ul><li>Without a special mechanism, the VMM must intercept the I/O commands and work out that several virtual machines are simultaneously attempting to send jobs to the I/O devices. </li></ul></ul><ul><ul><li>Handshaking allows the VMM to pick up the spool file and merge it into its own buffer. </li></ul></ul><ul><li>Inter-virtual-machine communication </li></ul><ul><ul><li>Communication between two physical machines involves processing message packets through several layers on the sender and receiver sides. </li></ul></ul><ul><ul><li>This process can be streamlined, simplified, and made faster when the two machines are virtual machines on the same host platform. </li></ul></ul>
  11. Specialized Systems <ul><li>Virtual-equals-real (V=R) virtual machine </li></ul><ul><ul><li>The host address space representing the guest real memory is mapped one-to-one to the host real memory address space. </li></ul></ul><ul><li>Shadow-table bypass assist </li></ul><ul><ul><li>The guest page tables can point directly to physical addresses if the dynamic address translation hardware is allowed to manipulate the guest page tables. </li></ul></ul><ul><li>Preferred-machine assist </li></ul><ul><ul><li>Allows a guest OS to operate in system mode rather than user mode. </li></ul></ul><ul><li>Segment sharing </li></ul><ul><ul><li>Sharing the code segments of the operating system among the virtual machines, provided the operating system code is written in a reentrant manner. </li></ul></ul>
  12. Generalized Support for Virtual Machines <ul><li>Interpretive Execution Facility (IEF) </li></ul><ul><ul><li>The processor directly executes most of the functions of the virtual machine in hardware. </li></ul></ul><ul><ul><li>An extreme case of a VM assist. </li></ul></ul><ul><li>Interpretive execution entry and exit </li></ul><ul><ul><li>Entry </li></ul></ul><ul><ul><ul><li>Start Interpretive Execution (SIE): the software gives up control to the hardware IEF and the processor enters interpretive-execution mode. </li></ul></ul></ul><ul><ul><li>Exit </li></ul></ul><ul><ul><ul><li>Host interrupt </li></ul></ul></ul><ul><ul><ul><li>Interception </li></ul></ul></ul><ul><ul><ul><ul><li>Unsupported hardware instructions. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>An exception during execution of an interpreted instruction. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Certain other special cases. </li></ul></ul></ul></ul>
  13. Interpretive Execution Entry and Exit (diagram): the VMM software issues SIE to enter interpretive-execution mode; execution exits either for an interception, which the VMM handles by emulation, or for a host interrupt, which the host interrupt handler services before re-entry.
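The entry/exit cycle in the diagram amounts to a dispatch loop. The sketch below is a schematic model, not the actual IBM interpretive-execution interface: `run_guest` stands in for the hardware running the guest until an exit condition, and the exit-reason strings are invented labels.

```python
# Schematic SIE-style dispatch loop: enter interpretive execution, handle
# whatever forced the exit, and re-enter.
import random

def run_guest(state):
    # Stand-in for hardware execution of the guest until a forced exit.
    return random.choice(["interception", "host_interrupt"])

def vmm_loop(state, steps=5):
    log = []
    for _ in range(steps):
        exit_reason = run_guest(state)    # "SIE": hand control to hardware
        if exit_reason == "interception":
            log.append("emulate")         # VMM emulates the intercepted op
        elif exit_reason == "host_interrupt":
            log.append("host_irq")        # host handler runs, then re-enter
    return log

print(vmm_loop({}))
```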
  14. Full Virtualization Versus Paravirtualization <ul><li>Full virtualization </li></ul><ul><ul><li>Provides a total abstraction of the underlying physical system and creates a complete virtual system in which the guest operating system can execute. </li></ul></ul><ul><ul><li>No modification is required in the guest OS or application. </li></ul></ul><ul><ul><li>The guest OS or application is not aware of the virtualized environment. </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>Streamlines the migration of applications and workloads between different physical systems. </li></ul></ul><ul><ul><li>Complete isolation of different applications, which makes this approach highly secure. </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>Performance penalty </li></ul></ul><ul><li>Examples: Microsoft Virtual Server and VMware ESX Server </li></ul>
  15. Full Virtualization Versus Paravirtualization <ul><li>Paravirtualization </li></ul><ul><ul><li>A virtualization technique that presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware. </li></ul></ul><ul><ul><li>This technique requires modifications to the guest OSes running on the VMs. </li></ul></ul><ul><ul><li>The guest OSes are aware that they are executing on a VM. </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>Near-native performance </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>Some limitations, including security concerns such as exposure of guest OS cached data, unauthenticated connections, and so forth. </li></ul></ul><ul><li>Example: the Xen system </li></ul>
  16. Case Study: VMware Virtual Platform
  17. VMware Virtual Platform <ul><li>A popular virtual machine infrastructure for IA-32-based PCs and servers. </li></ul><ul><li>An example of a hosted virtual machine system </li></ul><ul><ul><li>Its native-virtualization counterpart is VMware ESX Server. </li></ul></ul><ul><ul><li>This book is limited to the hosted system, VMware GSX Server (VMware 2001). </li></ul></ul><ul><li>Challenges </li></ul><ul><ul><li>The IA-32 environment is difficult to virtualize efficiently. </li></ul></ul><ul><ul><li>The openness of the system architecture. </li></ul></ul><ul><ul><li>The need for easy installation. </li></ul></ul>
  18. VMware's Hosted Virtual Machine Model
  19. Processor Virtualization <ul><li>Critical instructions in the Intel IA-32 architecture </li></ul><ul><ul><li>Not efficiently virtualizable. </li></ul></ul><ul><li>Protection system references </li></ul><ul><ul><li>Reference the storage protection system, memory system, or address relocation system (e.g., mov ax, cs ). </li></ul></ul><ul><li>Sensitive register instructions </li></ul><ul><ul><li>Read or change resource-related registers and memory locations (e.g., POPF). </li></ul></ul><ul><li>Problem </li></ul><ul><ul><li>Sensitive instructions executed in user mode do not behave as expected unless they are emulated. </li></ul></ul><ul><li>Solution </li></ul><ul><ul><li>The VM monitor substitutes another set of instructions and emulates the action of the original code. </li></ul></ul>
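The substitution idea on this slide can be illustrated with a toy scanner. POPF and `mov ax, cs` are real IA-32 mnemonics, but the scanner below operates on text labels rather than machine code, and the `call_vmm:` trap marker and `emulate_popf` routine are invented for illustration.

```python
# Toy illustration: find sensitive instructions in guest code and replace
# them with traps into emulation routines in the monitor.

SENSITIVE = {"popf", "mov_from_cs"}      # e.g. POPF; mov ax, cs

def emulate_popf(vstate):
    # Update the *virtual* interrupt flag, which a real user-mode POPF
    # would silently leave unchanged instead of faulting.
    vstate["IF"] = vstate.pop("pending_IF", vstate.get("IF", 0))

def rewrite(code):
    # Substitute each sensitive instruction with a trap to the monitor.
    return ["call_vmm:" + op if op in SENSITIVE else op for op in code]

guest_code = ["add", "popf", "mov_from_cs", "ret"]
rewritten = rewrite(guest_code)
assert rewritten == ["add", "call_vmm:popf", "call_vmm:mov_from_cs", "ret"]

vstate = {"IF": 1, "pending_IF": 0}
emulate_popf(vstate)
assert vstate["IF"] == 0                 # virtual flags updated correctly
```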
  20. Input/Output Virtualization <ul><li>The PC platform supports many more devices and types of devices than any other platform. </li></ul><ul><li>Emulation in the VMMonitor </li></ul><ul><ul><li>Intercepting IN and OUT I/O instructions and converting them into operations on the virtual devices. </li></ul></ul><ul><ul><li>Requires some knowledge of the device interfaces. </li></ul></ul><ul><li>New capability for devices through an abstraction layer </li></ul><ul><ul><li>VMApp can insert a layer of abstraction above the physical device. </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>Reduces performance losses due to virtualization. </li></ul></ul><ul><ul><ul><li>e.g., a virtual Ethernet switch between a virtual NIC and a physical NIC. </li></ul></ul></ul>
  21. Using the Services of the Host Operating System <ul><li>Each I/O request is converted into a host OS call. </li></ul><ul><li>Advantages </li></ul><ul><ul><li>The VMM can use the host OS's I/O features without restriction. </li></ul></ul><ul><ul><li>Performance-critical applications can still be run. </li></ul></ul>
  22. Memory Virtualization <ul><li>Paging requests of the guest OS </li></ul><ul><ul><li>Not directly intercepted by the VMM but converted into disk reads/writes. </li></ul></ul><ul><ul><li>The VMMonitor translates them into requests on the host OS through VMApp. </li></ul></ul><ul><li>Page replacement policy of the host OS </li></ul><ul><ul><li>The host could evict critical pages of the VM system when competing with other host applications. </li></ul></ul><ul><ul><li>VMDriver pins the critical pages of the virtual memory system. </li></ul></ul>
  23. VMware ESX Server <ul><li>Native VM </li></ul><ul><ul><li>A thin software layer designed to multiplex hardware resources among virtual machines </li></ul></ul><ul><ul><li>Provides higher I/O performance and complete control over resource management </li></ul></ul><ul><li>Full virtualization </li></ul><ul><ul><li>For servers running multiple instances of unmodified operating systems </li></ul></ul>
  24. Page Replacement Issues <ul><li>The problem of double paging </li></ul><ul><ul><li>Unintended interactions between the native memory-management policies of the guest operating systems and of the host system. </li></ul></ul><ul><li>Ballooning </li></ul><ul><ul><li>Reclaims the pages considered least valuable by the operating system running in a virtual machine. </li></ul></ul><ul><ul><li>A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service. </li></ul></ul><ul><ul><li>The module communicates with ESX Server via a private channel. </li></ul></ul>
  25. Ballooning in VMware ESX Server <ul><li>Inflating a balloon </li></ul><ul><ul><li>When the server wants to reclaim memory </li></ul></ul><ul><ul><li>The driver allocates pinned physical pages within the VM </li></ul></ul><ul><ul><li>Increased memory pressure makes the guest OS reclaim space to satisfy the driver's allocation request </li></ul></ul><ul><ul><li>The driver communicates the physical page number of each allocated page to ESX Server </li></ul></ul><ul><li>Deflating </li></ul><ul><ul><li>Frees up memory for general use within the guest OS </li></ul></ul>
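The inflate/deflate cycle can be sketched as a toy model. Everything here is illustrative: a real balloon driver reports page numbers to the hypervisor over a private channel rather than returning a Python list, and the fixed page budget is an assumption.

```python
# Toy model of ballooning against a guest with a fixed budget of pages.

class GuestOS:
    def __init__(self, total_pages):
        self.free = list(range(total_pages))  # free physical page numbers
        self.balloon = []                     # pages pinned by the balloon driver

    def inflate(self, n):
        # Driver allocates and pins n pages; under the resulting memory
        # pressure the guest would first reclaim its least valuable pages.
        taken, self.free = self.free[:n], self.free[n:]
        self.balloon += taken
        return taken        # page numbers reported to the hypervisor

    def deflate(self, n):
        # Hand pages back to the guest's free pool for general use.
        back, self.balloon = self.balloon[:n], self.balloon[n:]
        self.free += back

guest = GuestOS(total_pages=8)
reclaimed = guest.inflate(3)
assert len(reclaimed) == 3 and len(guest.free) == 5   # guest sees less memory
guest.deflate(3)
assert len(guest.free) == 8                           # memory returned on deflate
```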
  26. Virtualizing I/O Devices on VMware Workstation <ul><li>Supported virtual devices of VMware </li></ul><ul><ul><li>PS/2 keyboard, PS/2 mouse, floppy drive, IDE controllers with ATA disks and ATAPI CD-ROMs, a Sound Blaster 16 sound card, serial and parallel ports, virtual BusLogic SCSI controllers, AMD PCNet Ethernet adapters, and an SVGA video controller. </li></ul></ul><ul><li>Procedure </li></ul><ul><ul><li>Intercept I/O operations issued by the guest OS (IA-32 IN and OUT). </li></ul></ul><ul><ul><li>Emulate them in either the VMM or the VMApp. </li></ul></ul><ul><li>Drawbacks </li></ul><ul><ul><li>Virtualizing I/O devices can incur overhead from world switches between the VMM and the host </li></ul></ul><ul><ul><li>Overhead from handling the privileged instructions used to communicate with the hardware </li></ul></ul>
  27. Case Study: The Intel VT-x (Vanderpool) Technology
  28. Overview <ul><li>VT-x (Vanderpool) technology for IA-32 processors </li></ul><ul><ul><li>Enhances the performance of VM implementations through hardware enhancements to the processor. </li></ul></ul><ul><li>Main feature </li></ul><ul><ul><li>The inclusion of a new VMX mode of operation (VMX root/non-root operation) </li></ul></ul><ul><ul><li>VMX root operation </li></ul></ul><ul><ul><ul><li>Fully privileged, intended for the VM monitor; adds new VMX instructions </li></ul></ul></ul><ul><ul><li>VMX non-root operation </li></ul></ul><ul><ul><ul><li>Not fully privileged, intended for guest software </li></ul></ul></ul><ul><ul><ul><li>Reduces guest software privilege without relying on rings </li></ul></ul></ul>
  29. Technological Overview (diagram): vmxon enters VMX root mode, where the VMM runs; vmlaunch starts VM1 or VM2 in non-root mode and vmresume re-enters them; each VM exit returns control to the VMM in root mode; vmxoff returns the processor to regular mode.
  30. VT-x Operations (diagram): VMXON takes the processor from ordinary IA-32 operation into VMX root operation; VMLAUNCH and VMRESUME enter VMX non-root operation to run a guest (VM 1 through VM n, each spanning rings 0-3 and controlled by its own VMCS); a VM exit returns the processor to root operation.
  31. Capabilities of the Technology <ul><li>A key aspect </li></ul><ul><ul><li>The elimination of the need to run all guest code in user mode. </li></ul></ul><ul><li>Maintenance of state information </li></ul><ul><ul><li>A major source of overhead in a software-based solution </li></ul></ul><ul><ul><li>A hardware technique allows all of the state-holding data elements to be mapped to their native structures. </li></ul></ul><ul><ul><li>VMCS (Virtual Machine Control Structure) </li></ul></ul><ul><ul><ul><li>The hardware implementation takes over the tasks of loading and unloading the state from its physical locations. </li></ul></ul></ul>
  32. Virtual Machine Control Structure (VMCS) <ul><li>Control structures in memory </li></ul><ul><ul><li>Only one VMCS is active per virtual processor at any given time </li></ul></ul><ul><li>VMCS payload </li></ul><ul><ul><li>VM-execution, VM-exit, and VM-entry controls </li></ul></ul><ul><ul><li>Guest and host state </li></ul></ul><ul><ul><li>VM-exit information fields </li></ul></ul>
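The role of the VMCS guest-state and host-state areas can be sketched with a simplified model. The two register fields shown (`rip`, `cr3`) are stand-ins for the many fields the real hardware swaps; the function names mirror the VMX instruction names but the mechanics are, of course, a software caricature of what the processor does.

```python
# Sketch: a VMCS-like structure lets "hardware" swap guest and host state
# on VM entry and VM exit, and records why the exit happened.

def vmlaunch(cpu, vmcs):
    vmcs["host_state"] = dict(cpu)      # save host (VMM) state in the VMCS...
    cpu.update(vmcs["guest_state"])     # ...and load guest state onto the CPU

def vmexit(cpu, vmcs, reason):
    vmcs["guest_state"] = dict(cpu)     # save guest state back to the VMCS
    cpu.update(vmcs["host_state"])      # restore host (VMM) state
    vmcs["exit_reason"] = reason        # VM-exit information field

cpu = {"rip": 0x1000, "cr3": 0xAAAA}                  # VMM context
vmcs = {"guest_state": {"rip": 0x8000, "cr3": 0xBBBB}}
vmlaunch(cpu, vmcs)
assert cpu["rip"] == 0x8000             # now executing the guest
vmexit(cpu, vmcs, reason="io_access")
assert cpu["rip"] == 0x1000             # back in the VMM
assert vmcs["exit_reason"] == "io_access"
```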
  33. ** Case Study: Xen Virtualization
  34. Xen Design Principles <ul><li>Support for unmodified application binaries is essential. </li></ul><ul><li>Supporting full multi-application operating systems is important. </li></ul><ul><li>Paravirtualization is necessary to obtain high performance and strong resource isolation. </li></ul>
  35. Xen Features <ul><li>Secure isolation between VMs </li></ul><ul><li>Resource control and QoS </li></ul><ul><li>Only the guest kernel needs to be ported </li></ul><ul><ul><li>All user-level apps and libraries run unmodified. </li></ul></ul><ul><ul><li>Linux 2.4/2.6, NetBSD, FreeBSD, WinXP </li></ul></ul><ul><li>Execution performance is close to native. </li></ul><ul><li>Live migration of VMs between Xen nodes. </li></ul>
  36. Xen 3.0 Architecture
  37. Xen Paravirtualization <ul><li>On the Xen/x86 architecture, privileged instructions are replaced with Xen hypercalls. </li></ul><ul><li>Hypercalls and events </li></ul><ul><ul><li>Notifications are delivered to domains from Xen using an asynchronous event mechanism </li></ul></ul><ul><li>Modify the OS to understand the virtualized environment </li></ul><ul><ul><li>Wall-clock time vs. virtual processor time </li></ul></ul><ul><ul><ul><li>Xen provides both types of alarm timer </li></ul></ul></ul><ul><ul><li>Expose real resource availability </li></ul></ul><ul><li>Xen hypervisor </li></ul><ul><ul><li>An additional protection domain between guest OSes and I/O devices. </li></ul></ul>
  38. x86 Processor Virtualization <ul><li>Xen runs in ring 0 (most privileged) </li></ul><ul><li>Rings 1 and 2 are for the guest OS, ring 3 for user space </li></ul><ul><li>Xen lives in the top 64MB of the linear address space. </li></ul><ul><ul><li>Segmentation is used to protect Xen, as switching page tables is too slow on standard x86 </li></ul></ul><ul><li>Hypercalls jump to Xen in ring 0 </li></ul><ul><li>A guest OS may install a 'fast trap' handler </li></ul><ul><li>MMU virtualization: shadow vs. direct mode </li></ul>
  39. Paravirtualizing the MMU <ul><li>Guest OSes allocate and manage their own page tables </li></ul><ul><ul><li>Hypercalls are used to change the page-table base. </li></ul></ul><ul><li>The Xen hypervisor is responsible for trapping accesses to the virtual page table, validating updates, and propagating changes. </li></ul><ul><li>Xen must validate page-table updates before use </li></ul><ul><ul><li>Updates may be queued and batch processed </li></ul></ul><ul><li>Validation rules are applied to each PTE </li></ul><ul><ul><li>A guest may only map pages it owns </li></ul></ul><ul><li>XenoLinux implements a balloon driver </li></ul><ul><ul><li>Adjusts a domain's memory usage by passing memory pages back and forth between Xen and XenoLinux </li></ul></ul>
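The validation and batching rules above can be sketched as a toy hypercall. The ownership map and the function names are illustrative assumptions, not Xen's actual `mmu_update` interface; only the rule enforced (a guest may only map frames it owns, with updates queued and applied in a batch) comes from the slide.

```python
# Toy model of Xen-style page-table update validation.

owned = {0: {0x10, 0x11}, 1: {0x20}}     # domain -> machine frames it owns

def validate_pte_update(domain, new_frame, queue):
    # Rule from the slide: a guest may only map pages it owns.
    if new_frame not in owned[domain]:
        raise PermissionError(f"domain {domain} does not own frame {new_frame:#x}")
    queue.append(new_frame)              # updates may be queued and batched

def mmu_update_hypercall(domain, frames):
    queue = []
    for f in frames:                     # validate, then batch-process
        validate_pte_update(domain, f, queue)
    return queue                         # applied only after all checks pass

assert mmu_update_hypercall(0, [0x10, 0x11]) == [0x10, 0x11]
try:
    mmu_update_hypercall(1, [0x10])      # domain 1 mapping domain 0's frame
except PermissionError:
    pass                                 # rejected by the hypervisor
```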
  40. MMU Virtualization
  41. Writable Page Tables
  42. I/O Architecture <ul><li>Asynchronous buffer-descriptor rings </li></ul><ul><ul><li>Implemented using shared memory </li></ul></ul><ul><li>Xen I/O spaces delegate to guest OSes protected access to specified hardware devices </li></ul><ul><li>The guest OS passes buffer information vertically through the system. </li></ul><ul><li>Xen performs validation checks. </li></ul><ul><li>Xen supports a lightweight event-delivery mechanism that is used for sending asynchronous notifications to a domain. </li></ul>
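The asynchronous descriptor ring can be sketched as a producer/consumer ring in shared memory. This is a simplification in the spirit of Xen's I/O rings: the index names and the single request ring are assumptions, and real rings carry separate request and response halves plus event-channel notifications.

```python
# Toy shared-memory descriptor ring: the guest produces buffer descriptors,
# the backend (Xen or a driver domain) consumes them asynchronously.

class DescriptorRing:
    def __init__(self, size=4):
        self.size = size
        self.ring = [None] * size
        self.req_prod = 0      # producer index, advanced by the guest
        self.req_cons = 0      # consumer index, advanced by the backend

    def guest_post(self, buf):
        if self.req_prod - self.req_cons == self.size:
            return False                       # ring full; guest must wait
        self.ring[self.req_prod % self.size] = buf
        self.req_prod += 1                     # async: no trap into Xen here
        return True

    def backend_consume(self):
        if self.req_cons == self.req_prod:
            return None                        # nothing pending
        buf = self.ring[self.req_cons % self.size]
        self.req_cons += 1                     # real Xen would notify via event channel
        return buf

ring = DescriptorRing()
ring.guest_post({"op": "read", "sector": 7})
assert ring.backend_consume() == {"op": "read", "sector": 7}
assert ring.backend_consume() is None
```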
  43. Data Transfer: I/O Descriptor Rings
  44. Device Channel Interface
  45. Performance
  46. Thank You!