Chapter 8: System Virtual Machines. 2005.11.9. Dong In Shin, Distributed Computing System Laboratory, Seoul National Univ.
Contents: 1. Performance Enhancement of System VMs 2. Case Study: VMware Virtual Platform 3. Case Study: The Intel VT-x Technology 4. ** Case Study: Xen
Performance Enhancement of System Virtual Machines
Reasons for Performance Degradation
Some guest instructions need to be emulated (usually via interpretation) by the VMM.
Ex. The accounting of time charged to a user
Instruction Emulation Assists
The VMM emulates the privileged instruction using a routine whose operation depends on whether the virtual machine is supposed to be executing in system mode or in user mode.
Hardware assist for checking the state and performing the actions.
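A minimal sketch of the mode-dependent emulation routine, written in software to show the check that a hardware assist would take over. The names `VirtualCPU`, `emulate_privileged` and the `SET_TIMER` opcode are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Hypothetical per-guest state kept by the VMM."""
    virtual_mode: str = "user"   # mode the guest is *supposed* to be in
    timer: int = 0               # a virtual privileged resource
    pending_traps: list = field(default_factory=list)


def emulate_privileged(vcpu, opcode, operand):
    """Called by the VMM after a privileged instruction has trapped."""
    if vcpu.virtual_mode == "system":
        # The guest OS issued the instruction: apply its effect to the
        # virtual resource instead of the real one.
        if opcode == "SET_TIMER":
            vcpu.timer = operand
        return "emulated"
    # A guest application issued it: reflect a privilege exception to the
    # guest OS, exactly as real hardware would have done.
    vcpu.pending_traps.append(("privileged-operation", opcode))
    return "reflected"
```

The state check on `virtual_mode` and the resulting action are the two steps that the slide assigns to hardware.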
Virtual Machine Monitor Assists
Using hardware to save and restore registers
Decoding of privileged instructions
Hardware assists, such as decoding the privileged instructions.
Virtual interval timer
Decrementing the virtual counter by some amount estimated by the VMM from the amount that the real timer decrements.
Adding to the instruction set
A number of new instructions that are not a part of the ISA of the machine.
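The virtual interval timer item above can be sketched as follows. This assumes the VMM estimates the guest's share of elapsed real ticks as a simple fraction; the class name, the `guest_share` parameter and the estimation rule are illustrative assumptions, not the mechanism of any particular machine.

```python
class VirtualIntervalTimer:
    """Hypothetical virtual counter maintained by the VMM for one guest."""

    def __init__(self, initial_ticks):
        self.remaining = initial_ticks
        self.expired = False

    def charge(self, real_ticks_elapsed, guest_share):
        """Decrement by the VMM's estimate of the ticks this guest used."""
        estimated = int(real_ticks_elapsed * guest_share)
        self.remaining = max(0, self.remaining - estimated)
        if self.remaining == 0:
            # At this point the VMM would deliver a virtual timer interrupt.
            self.expired = True
```

For example, a guest that ran for half of an 80-tick real interval is charged 40 virtual ticks.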
Improving Performance of the Guest System
The guest OS disables dynamic address translation and defines its real address space to be as large as the largest virtual address space. Page frames are mapped to fixed real pages.
The guest OS no longer has to exercise demand paging.
No double paging
No potential conflict in paging decisions by the guest OS and the VMM
Two independent layers of paging interact and perform poorly.
[Figure: double paging. Legend: green/gold pages, the guest OS incorrectly believes a page to be in physical memory; teal pages, the VMM believes an unneeded page is still in use; red pages, the guest evicts a page despite available physical memory.]
A page fault in a VM system
A page fault in some VM's page table
A page fault in the VMM's page table
Pseudo page-fault handling
The VMM initiates the page-in operation from the backing store.
The VMM triggers a guest "pseudo page fault".
The guest OS suspends the faulting guest user process.
The VMM does not suspend the guest as a whole.
On completion of the page-in operation
The VMM calls the guest pseudo page-fault handler again.
The guest OS handler wakes up the blocked user process.
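The pseudo page-fault handshake can be sketched as below. The `Guest` and `VMM` classes and their method names are hypothetical; the sketch only models which guest processes are runnable while the VMM pages in.

```python
class Guest:
    """Hypothetical guest OS that cooperates with pseudo page faults."""

    def __init__(self, processes):
        self.runnable = set(processes)
        self.blocked = set()

    def pseudo_page_fault(self, process, completed):
        if not completed:
            # First call: suspend only the faulting user process.
            self.runnable.discard(process)
            self.blocked.add(process)
        else:
            # Second call: the page is resident, wake the process up.
            self.blocked.discard(process)
            self.runnable.add(process)


class VMM:
    def __init__(self, guest):
        self.guest = guest
        self.page_ins = {}   # host page -> guest process waiting for it

    def host_page_fault(self, page, process):
        self.page_ins[page] = process   # start page-in from backing store
        self.guest.pseudo_page_fault(process, completed=False)
        # The guest itself stays dispatchable and can run other processes.

    def page_in_complete(self, page):
        process = self.page_ins.pop(page)
        self.guest.pseudo_page_fault(process, completed=True)
```

The point of the design is that only one guest process waits for the page-in; the rest of the guest keeps running.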
Without any special mechanism, the VMM must intercept the I/O commands and work out that the virtual machines are simultaneously attempting to send jobs to the I/O devices.
Handshaking allows the VMM to pick up the guest's spool file and merge it into its own buffer.
Communication between two physical machines involves the processing of message packets through several layers at the sender and receiver sides.
This process can be streamlined, simplified, and made faster if the two machines are virtual machines on the same host platform.
Virtual-equals-real (V=R) virtual machine
The host address space representing the guest real memory is mapped one-to-one to the host real memory address space.
Shadow-table bypass assist
The guest page tables can point directly to physical addresses if the dynamic address translation hardware is allowed to manipulate the guest page tables.
Allow a guest OS to operate in system mode rather than user mode.
Sharing the code segments of the operating system among the virtual machines, provided the operating system code is written in a reentrant manner.
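The difference between a V=R guest and an ordinary guest can be illustrated with a small address-translation sketch. The function name, the 4 KB page size and the `page_map` dictionary are assumptions made for the example.

```python
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed page size for the example


def host_real_address(guest_real: int,
                      page_map: Optional[Dict[int, int]]) -> int:
    """Translate a guest real address to a host real address.

    page_map=None models a V=R guest, whose real memory is mapped
    one-to-one onto host real memory, so no lookup is needed.
    """
    if page_map is None:
        return guest_real
    frame, offset = divmod(guest_real, PAGE_SIZE)
    return page_map[frame] * PAGE_SIZE + offset
```

An ordinary guest pays for a table lookup on every translation, which is the cost that the V=R arrangement removes for one preferred guest.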
Generalized Support for Virtual Machines
Interpretive Execution Facility (IEF)
The processor directly executes most of the functions of the virtual machine in hardware.
An extreme case of a VM assist.
Interpretive Execution Entry and Exit
Entry, Start Interpretive Execution (SIE): the software gives up control to the hardware IEF and the processor enters interpretive execution mode.
Exit: an instruction that the hardware does not support.
Exit: an exception during the execution of an interpreted instruction.
Exit: some special cases.
[Figure: Interpretive execution entry and exit. The VMM software issues SIE to enter interpretive execution mode; the hardware exits to the VMM for interception (emulation) or to the host interrupt handler for a host interrupt.]
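The entry and exit cycle can be sketched as a VMM dispatch loop. This is a simulation under simple assumptions: instructions are plain strings, the set `supported` stands for what the interpretive execution hardware can run directly, and host interrupts are not modelled.

```python
def run_guest(instructions, supported):
    """Hypothetical VMM loop: enter with SIE, emulate on each interception."""
    log = []
    pc = 0
    while pc < len(instructions):
        # Entry: the VMM gives up control to the interpretive hardware.
        log.append("SIE")
        while pc < len(instructions) and instructions[pc] in supported:
            pc += 1   # executed directly, no VMM involvement
        if pc < len(instructions):
            # Exit for interception: the VMM emulates the instruction in
            # software, then loops around to re-enter with SIE.
            log.append("emulate:" + instructions[pc])
            pc += 1
    return log
```

For the input `["add", "load", "sio", "add"]` with `{"add", "load"}` supported, the log is `["SIE", "emulate:sio", "SIE"]`: one interception and two entries.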
Full-virtualization Versus Para-virtualization
Full virtualization provides a total abstraction of the underlying physical system and creates a complete virtual system in which the guest operating systems can execute.
No modification is required in the guest OS or application.
The guest OS or application is not aware of the virtualized environment.
Streamlining the migration of applications and workloads between different physical systems.
Complete isolation of different applications, which makes this approach highly secure.
Examples: Microsoft Virtual Server and VMware ESX Server
Full-virtualization Versus Para-virtualization
Para-virtualization is the virtualization technique that presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware.
This technique requires modifications to the guest OSs running on the VMs.
The guest OSs are aware that they are executing on a VM.
Some limitations remain, including security weaknesses such as guest OS cached data, unauthenticated connections, and so forth.
Case Study: VMware Virtual Platform
VMware Virtual Platform
A popular virtual machine infrastructure for IA-32-based PCs and servers.
An example of a hosted virtual machine system
Native virtualization architecture product: VMware ESX Server
The book's coverage is limited to the hosted system, VMware GSX Server (VMWare2001)
Difficulties in virtualizing the IA-32 environment efficiently.
The openness of the system architecture.
VMware's Hosted Virtual Machine Model
Critical Instructions in Intel IA-32 architecture
not efficiently virtualizable.
Protection system references
Reference the storage protection system, memory system, or address relocation system. (ex. mov ax, cs )
Sensitive register instructions
Read or change resource-related registers and memory locations (ex. POPF)
Sensitive instructions executed in user mode do not behave as expected unless they are emulated.
The VM monitor substitutes another set of instructions for the critical instruction and emulates the action of the original code.
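The POPF case can be illustrated with a simplified model of the flags register. The sketch tracks only the interrupt-enable flag (IF, bit 9 of EFLAGS) and ignores every other rule that POPF follows; `native_popf` and `vmm_popf` are hypothetical names.

```python
IF_FLAG = 1 << 9   # interrupt-enable flag in EFLAGS


def native_popf(eflags, popped, cpl, iopl=0):
    """Simplified IA-32 behaviour: IF changes only when CPL <= IOPL.

    With less privilege the instruction does not trap; the IF bit of the
    popped value is silently ignored.
    """
    if cpl <= iopl:
        return popped
    return (popped & ~IF_FLAG) | (eflags & IF_FLAG)


def vmm_popf(virtual_eflags, popped):
    """Effect of the substituted code: update the guest's virtual flags."""
    return popped
```

A guest OS running at ring 3 that pops a value with IF cleared still sees IF set under `native_popf`, and no trap tells the VMM about it. That silent difference is why the instruction has to be found and replaced rather than trapped.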
The PC platform supports many more devices and types of devices than any other platform.
Emulation in VMMonitor
Converting the IN and OUT I/O instructions to new I/O instructions.
Requires some knowledge of the device interfaces.
New Capability for Devices Through Abstraction Layer
VMApp’s ability to insert a layer of abstraction above the physical device.
Reduce performance losses due to virtualization.
Ex) Virtual Ethernet switch between a virtual NIC and a physical NIC.
Using the Services of the Host Operating System
The request is converted into a host OS call.
No limitations on the VMM's access to the host OS's I/O features.
Running the Performance-Critical applications
Paging requests of the guest OS
Not directly intercepted by the VMM, but converted into disk reads and writes.
VMMonitor translates them into requests to the host OS through VMApp.
Page replacement policy of host OS
The host could replace critical pages of the VM system when they compete with the pages of other host applications.
VMDriver pins critical pages for the virtual memory system.
VMware ESX Server
A thin software layer designed to multiplex hardware resources among virtual machines
Providing higher I/O performance and complete control over resource management
For servers running multiple instances of unmodified operating systems
Page Replacement Issues
Problem of double paging
Unintended interactions between the native memory management policies of the guest operating systems and the host system.
Ballooning reclaims the pages considered least valuable by the operating system running in a virtual machine.
Small balloon module loaded into the guest OS as a pseudo-device driver or kernel service.
Module communicates with ESX server via a private channel.
Ballooning in VMware ESX Server
Inflating a balloon
When the server wants to reclaim memory
The driver allocates pinned physical pages within the VM.
This increases memory pressure in the guest OS, which reclaims space to satisfy the driver's allocation request.
The driver communicates the physical page number of each allocated page to ESX Server.
Deflating the balloon frees up memory for general use within the guest OS.
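A minimal sketch of balloon inflation and deflation, assuming a toy guest whose pages are plain integers and whose `in_use` list is ordered least valuable first. `GuestOS`, `BalloonDriver` and the `report` callback (standing in for the private channel to ESX Server) are hypothetical names.

```python
class GuestOS:
    """Toy guest memory manager; page numbers are guest physical pages."""

    def __init__(self, total_pages):
        self.free = list(range(total_pages))
        self.in_use = []   # pages holding guest data, least valuable first

    def allocate_pinned(self):
        if not self.free:
            # Memory pressure: the guest's own policy picks the victim.
            self.free.append(self.in_use.pop(0))
        return self.free.pop()


class BalloonDriver:
    def __init__(self, guest, report):
        self.guest = guest
        self.report = report   # stands in for the private channel
        self.pages = []

    def inflate(self, count):
        for _ in range(count):
            ppn = self.guest.allocate_pinned()
            self.pages.append(ppn)
            self.report(ppn)   # the server may now reclaim this page

    def deflate(self, count):
        for _ in range(count):
            self.guest.free.append(self.pages.pop())
```

The server never chooses which guest page to evict. It only asks for pages, and the guest's own replacement policy decides what to give up.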
Virtualizing I/O Devices on VMware Workstation
Supported virtual devices of VMware
PS/2 keyboard, PS/2 mouse, floppy drive, IDE controllers with ATA disks and ATAPI CD-ROMs, a Soundblaster 16 sound card, serial and parallel ports, virtual BusLogic SCSI controllers, AMD PCNet Ethernet adapters, and an SVGA video controller.
Intercept I/O operations issued by the guest OS (IA-32 IN and OUT).
Emulated either in the VMM or the VMApp.
Virtualizing I/O devices can incur overhead from world switches between the VMM and the host.
Handling the privileged instructions used to communicate with the hardware
Case Study: The Intel VT-x (Vanderpool) Technology
VT-x (Vanderpool) technology for IA-32 processors
Enhances the performance of VM implementations through hardware enhancements to the processor.
The inclusion of the new VMX mode of operation (VMX root/non-root operation)
VMX root operation
Fully privileged, intended for the VM monitor. New instructions: the VMX instructions.
VMX non-root operation
Not fully privileged, intended for guest software
Reduces guest software privilege without relying on rings
[Figure: Technological overview. From regular mode, vmxon enters root mode (VMM); vmlaunch starts VM1 and VM2 in non-root mode; each VM exit returns to root mode; vmresume re-enters a VM; vmxoff returns to regular mode.]
[Figure: VT-x operations. VMXON moves from IA-32 operation to VMX root operation; VMLAUNCH and VMRESUME enter VMX non-root operation, and a VM exit returns to VMX root operation. VM 1 through VM n each have rings 0 to 3 and their own VMCS (VMCS 1 through VMCS n).]
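The transitions between regular, root and non-root operation can be sketched as a small state machine. The instruction names come from the slides; the `VmxCpu` class, the string-valued modes and the use of `RuntimeError` are assumptions of this sketch, and the launch check is a simplification of the hardware rule that VMLAUNCH is for a VMCS not yet launched and VMRESUME for one already launched.

```python
class VmxCpu:
    """Toy model of the VMX operating modes of one processor."""

    def __init__(self):
        self.mode = "regular"   # "regular", "root" or "non-root"
        self.launched = set()
        self.running = None

    def _require(self, mode):
        if self.mode != mode:
            raise RuntimeError("not valid in " + self.mode + " mode")

    def _enter(self, vm):
        self.mode, self.running = "non-root", vm

    def vmxon(self):
        self._require("regular")
        self.mode = "root"

    def vmlaunch(self, vm):
        self._require("root")
        if vm in self.launched:
            raise RuntimeError("already launched, use vmresume")
        self.launched.add(vm)
        self._enter(vm)

    def vmresume(self, vm):
        self._require("root")
        if vm not in self.launched:
            raise RuntimeError("not launched yet, use vmlaunch")
        self._enter(vm)

    def vm_exit(self):
        self._require("non-root")
        self.mode, self.running = "root", None

    def vmxoff(self):
        self._require("root")
        self.mode = "regular"
```

Every path between two guests passes through root mode, so the VMM regains control on each VM exit.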
Capabilities of the Technology
A key aspect
The elimination of the need to run all guest code in the user mode.
Maintenance of state information
Major source of overhead in a software-based solution
Hardware technique that allows all of the state-holding data elements to be mapped to their native structures.
VMCS (Virtual Machine Control Structure)
The hardware implementation takes over the tasks of loading and unloading the state from its physical locations.
Virtual Machine Control Structure (VMCS)
Control Structures in Memory
Only one VMCS active per virtual processor at any given time
VM execution, VM exit, and VM entry controls
Guest and host state
VM-exit information fields
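The VMCS areas listed above can be pictured as one record per virtual processor. The field names below are illustrative groupings only, not Intel's field encodings, and `vm_entry` and `vm_exit` are hypothetical helpers that model the state the processor loads and saves.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Illustrative grouping of the VMCS areas (names are assumptions)."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    execution_controls: set = field(default_factory=set)
    exit_controls: set = field(default_factory=set)
    entry_controls: set = field(default_factory=set)
    exit_info: dict = field(default_factory=dict)


def vm_entry(vmcs):
    """On VM entry the processor loads guest state from the VMCS."""
    return dict(vmcs.guest_state)


def vm_exit(vmcs, cpu_state, reason):
    """On VM exit guest state is saved, the exit reason is recorded,
    and host state is loaded."""
    vmcs.guest_state = dict(cpu_state)
    vmcs.exit_info = {"reason": reason}
    return dict(vmcs.host_state)
```

Keeping guest and host state in one structure is what lets the hardware do the save and restore work that a software VMM would otherwise perform on every switch.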
** Case Study: Xen Virtualization
Xen Design Principle
Support for unmodified application binaries is essential.
Supporting full multi-application operating systems is important.
Paravirtualization is necessary to obtain high performance and strong resource isolation.
Secure isolation between VMs
Resource Control and QoS
Only guest kernel needs to be ported
All user-level apps and libraries run unmodified.
Linux 2.4/2.6 , NetBSD, FreeBSD, WinXP
Execution performance is close to native.
Live Migration of VMs between Xen nodes.
Xen 3.0 Architecture
Arch Xen/x86: replace privileged instructions with Xen hypercalls.
Notifications are delivered to domains from Xen using an asynchronous event mechanism
Modify OS to understand virtualized environment
Wall-clock time vs. virtual processor time
Xen provides both types of alarm timer
Expose real resource availability
Additional protection domain between guest OSes and I/O devices.
X86 Processor Virtualization
Xen runs in ring 0 (most privileged)
Rings 1 and 2 for the guest OS, ring 3 for user space
Xen lives in the top 64MB of the linear address space.
Segmentation is used to protect Xen, as switching page tables is too slow on standard x86.
Hypercalls jump to Xen in ring 0
Guest OS may install ‘fast trap’ handler
MMU-virtualization : shadow vs. direct-mode
Para-virtualizing the MMU
Guest OSs allocate and manage their own page tables.
Hypercalls to change the page-table base.
Xen Hypervisor is responsible for trapping accesses to the virtual page table, validating updates and propagating changes.
Xen must validate page table updates before use
Updates may be queued and batch processed
Validation rules applied to each PTE
Guest may only map pages it owns
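The validation rule can be sketched as a batched update routine. This is a toy model and not Xen's hypercall interface: the `Hypervisor` class, the `apply_pt_updates` method, the ownership dictionary, and the choice to skip rejected entries instead of failing the whole batch are all assumptions of the example.

```python
class Hypervisor:
    """Toy model of validated, batched page-table updates."""

    def __init__(self, ownership):
        self.ownership = ownership   # machine frame -> owning domain
        self.page_tables = {}        # (domain, virtual page) -> frame

    def apply_pt_updates(self, domain, updates):
        """Validate each queued (virtual_page, frame) request in a batch.

        Returns the number of updates that passed validation.
        """
        applied = 0
        for vpage, frame in updates:
            # Rule: a guest may only map pages it owns.
            if self.ownership.get(frame) != domain:
                continue
            self.page_tables[(domain, vpage)] = frame
            applied += 1
        return applied
```

Queuing several updates and submitting them in one call amortizes the cost of entering the hypervisor over the whole batch.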
XenoLinux implements a balloon driver
Adjust a domain’s memory usage by passing memory pages back and forth between Xen and XenoLinux