Memory – Both virtual and physical memory, memory management, TLB/page tables, etc.
Exceptions – Trap architecture, interrupts, etc.
I/O – I/O devices accessed using programmed I/O, DMA, interrupts.
The state of a machine must be maintained.
Physical machine: latches, flip-flops, etc.
Virtual machine: combination of physical machine and state emulated in software using RAM, etc.
At certain points in execution, such as a trap, the state of the machine must be “materialized”.
Not trivial due to complex hardware techniques used to provide high performance.
This ability to materialize the state is termed “preciseness”.
Three aspects of virtualization
State: registers and memory
Instructions: may involve emulation
State materialization: when exceptions occur
VMs can support an individual process only, or can support a whole OS.
Can construct a useful taxonomy based on:
process or system
same ISA or different ISA
“Classic” OS VMs (IBM)
A process has the illusion of having the whole machine to itself.
What are relative merits?
Especially useful with some kind of profile-directed translation.
High Level Language VMs
High-level language is compiled to an intermediate language.
VM then runs the intermediate language.
Example is Java: Interpreted or translated?
Java Virtual Machine
The JVM executes platform-neutral bytecodes.
JVM consists of
- class loader
- class verifier
- runtime interpreter
Just-In-Time (JIT) compilers increase performance
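To make the runtime-interpreter idea concrete, here is a minimal sketch of a stack-based bytecode interpreter; the opcodes are invented for illustration and are not real JVM bytecodes:

```python
# Minimal sketch of what a runtime interpreter does: execute
# platform-neutral bytecodes on an operand stack. A JIT compiler
# would instead translate hot bytecode sequences to native code.

def interpret(bytecode):
    """Run a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for op, arg in bytecode:
        if op == "push":        # push a constant
            stack.append(arg)
        elif op == "add":       # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":       # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()

# (2 + 3) * 4
program = [("push", 2), ("push", 3), ("add", None),
           ("push", 4), ("mul", None)]
print(interpret(program))  # -> 20
```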
“Classic” (Define. Pros/cons?)
VMM built directly on top of hardware.
Most efficient, but requires wiping the slate clean.
Requires device drivers in the VMM.
Hosted (Define. Pros/cons?)
VMM built on top of existing OS.
Device drivers are supplied by the host OS; the VMM uses facilities provided by the host OS.
Whole System VMs: Emulation
ISA not the same, must emulate everything.
Co-Designed VMs: Optimization
Hardware designed to support VMs.
Provides a clean design for virtualization.
Can be significantly more efficient.
Virtual Machine Monitor
Thin layer of software that virtualizes the hardware
Exports a virtual machine abstraction that looks like the hardware
[Diagram: (left) apps on an OS on hardware; (right) the Virtual Machine Monitor (VMM) on hardware, hosting multiple virtual machines, each with its own OS and apps.]
Old idea from the 1960s
IBM VM/370 – A VMM for IBM mainframe
Multiplex multiple OS environments on expensive hardware.
Desirable when few machines were around.
Interest died out in the 1980s and 1990s.
Hardware got cheap.
Compare Windows NT versus N DOS machines.
Interesting again today
Different problems today – software management.
VMM attributes still relevant
Virtual Machine Monitor attributes
Runs pretty much all software
Trick: Make virtual hardware match real hardware.
Low overheads/High performance
Near “raw” machine performance
Direct execution of CPU/MMU.
Total data isolation between virtual machines
Use hardware protection.
Virtual machines are not tied to physical machines
Different thought about OSes
Installing software on hardware is broken
Tight coupling of OS and applications to hardware creates management problems.
Want to subdivide OS:
System support software
Turn OSes into normal software that can be managed
Backward compatibility with VMMs
Backward compatibility is bane of new OSes.
Huge effort required to innovate without breaking things.
Recent security considerations make it impossible.
Choice: Close security hole and break apps or be insecure
Example: Not all WinNT applications run on WinXP.
In spite of a huge effort to make WinXP compatible.
Given the number of applications that run on WinNT, practically any change will break something.
If (OS == WinNT)….
Solution: Use a VMM to run both WinNT and WinXP
Obvious for OS migration as well: Windows -> Linux
[Example: Cisco Content Engine 590 – an Intel appliance running Linux and Windows 2000 VMs (RealPlayer Server, Media Server, IP chain).]
Isolation: Access to Classified Networks
Traditional tension: Security vs. Usability
Secure systems tend not to be that usable.
Flexible systems are not that secure.
Additional information assurance requirement:
Data cannot flow between networks of different classification.
Solution: Run two VMs:
Use isolation property to isolate two VMs
VMM has control of the information flow between machines
[Example: National Security Agency NetTop – a classified VM (with VPN) and an Internet VM (with firewall), isolated on SE-Linux.]
Logical partitioning of server machines
Run multiple servers on same box
Ability to give away less than one machine
Modern CPUs have more power than most services need.
0.10U of rack space per machine – better power, cooling, floor space, etc.
Server consolidation trend: N machines -> 1 real machine.
Isolation of environments
Printer server doesn’t take down Exchange server
Compromise of one VM can’t get at data of others
Provide service-level agreements
Linux, FreeBSD, Windows, etc.
Scenario: Server Consolidation
[Diagram: multiple web, app, and database servers consolidated onto one set of physical hardware using VMware MultipleWorlds.]
VMM Arrangements
Type-2 VMM: runs on a host OS (examples: JVM, CLR).
Hybrid VMM: VMM alongside a host OS (examples: Virtual PC & Virtual Server – what we have today).
Type-1 VMM (hypervisor): runs directly on the hardware (example: Windows Virtualization – what we’re building for the future).
Very thin layer of software
Much smaller Trusted Computing Base (TCB)
No built-in driver model
Leverage the large base of Windows drivers
Drivers run in a partition
Will have a well-defined, published interface
Allows others to create support for their OSes as guests.
Hardware virtualization assists are required
Intel Virtualization Technology
Monolithic vs. Microkernelized
Simpler than a modern kernel, but still complex
Contains its own drivers model
Simple partitioning functionality
Increase reliability and minimize TCB
No third-party code
Drivers run within guests
[Diagram: monolithic hypervisor (drivers in the hypervisor, hosting an “Admin” VM 1 plus VMs 2–3) vs. microkernelized hypervisor (drivers in each VM, with a “Parent” VM 1 running the virtualization stack and “Child” VMs 2–3).]
The Hypervisor
[Diagram: hypervisor running directly on the hardware – CPU, RAM, Ethernet NIC, hard drive.]
Thin layer of software running on the hardware
Supports creation of partitions
Each partition is a virtual machine
Each partition has one or more virtual processors
Partitions can own or share hardware resources
Software running in partition is called a guest
Enforces memory access rules
Enforces policy for CPU usage
Virtual processors are scheduled on real processors
Enforces ownership of other devices
Provides simple inter-partition messaging
Messages appear as interrupts
Exposes simple programmatic interface called “hypercalls”
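A hypercall interface can be sketched as a dispatch table from call numbers to handlers; the call numbers, names, and return values below are invented for illustration and are not the real Windows hypervisor ABI:

```python
# Hedged sketch of a hypercall interface: a guest traps into the
# hypervisor with a call number plus arguments, and the hypervisor
# dispatches to the matching handler.

HYPERCALLS = {}

def hypercall(number):
    """Register a handler function for a hypercall number."""
    def register(fn):
        HYPERCALLS[number] = fn
        return fn
    return register

@hypercall(0x01)
def post_message(partition_id, message):
    # Inter-partition messaging: delivered to the target as an interrupt.
    return f"interrupt raised in partition {partition_id}: {message}"

def dispatch(number, *args):
    handler = HYPERCALLS.get(number)
    if handler is None:
        return "invalid hypercall"      # fault back to the guest
    return handler(*args)

print(dispatch(0x01, 2, "hello"))
```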
x86 problem: POPF has different semantics in different rings (when unprivileged, it fails silently instead of trapping).
Privilege level should not be visible to software in a VM.
x86 problem: MOV ax, cs – software in the VM can query and discover its actual privilege level.
Traps should be transparent to software in the VM – it should not be able to tell that an instruction trapped.
x86 problem: traps can destroy machine state.
CPU Trap architecture virtualization
What happens when an interrupt or trap occurs.
Like all OSes: we trap into the monitor.
What if the interrupt or trap should go to the VM?
Example: Page fault, illegal instruction, system call, interrupt.
Run the simulator again.
x86 example: look up the trap vector in the VM’s IDT.
Push cs, eip, and eflags on the stack.
Switch to privileged mode.
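The steps above can be sketched as a toy model of a virtual CPU; all structures are simplified stand-ins, not real x86 state:

```python
# Sketch of forwarding a trap to the guest: look up the vector in the
# VM's virtual IDT, push eflags/cs/eip on the guest stack, and switch
# the virtual CPU to privileged mode before jumping to the handler.

class VirtualCPU:
    def __init__(self, idt):
        self.idt = idt                  # vector -> handler address
        self.cs, self.eip, self.eflags = 0x1B, 0x1000, 0x202
        self.stack = []
        self.privileged = False

def deliver_trap(vcpu, vector):
    """Emulate what hardware would do on a trap inside the VM."""
    handler = vcpu.idt[vector]
    vcpu.stack += [vcpu.eflags, vcpu.cs, vcpu.eip]  # save return state
    vcpu.privileged = True              # enter the guest's kernel mode
    vcpu.eip = handler                  # jump to the guest's handler

vcpu = VirtualCPU(idt={14: 0x8000})     # vector 14 = page fault
deliver_trap(vcpu, 14)
print(hex(vcpu.eip), vcpu.privileged)   # -> 0x8000 True
```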
Virtualization requirements - Virtualizing Memory
Basic MMU functionality:
OS manages physical memory (0…MAX_MEM).
OS sets up page tables mapping VA->PA.
CPU accesses to a VA go to the corresponding PA. (Paging off: PA = VA.)
Used for every instruction fetch, load, or store.
Need to implement a virtual physical memory
Logically need additional level of indirection
VM’s VA -> VM’s PA -> machine address
Trick: Use hardware MMU to simulate virtual MMU.
Can be folded into page tables: VA->machine address
Trick: Monitor keeps shadow of VM’s page table
Contains mapping to physical memory allocated for that VM.
Access causes Page Fault:
Lookup in VM’s page table mapping from VPN to PPN.
Determine where PPN is in machine memory (MPN).
Monitor can demand page the virtual machine
Insert mapping from VPN->MPN into shadow page table.
Uses hardware protection
Monitor never maps itself into VM’s page table
Monitor never maps memory allocated to other VMs in VM’s page table
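The shadow page table maintenance above can be sketched as follows, assuming the two-level mapping just described (guest page table: VPN -> PPN; monitor table: PPN -> MPN); names and page numbers are illustrative:

```python
# Sketch of filling a shadow page table on a page fault. The real MMU
# walks only the shadow table (VPN -> MPN); the monitor keeps it
# consistent with the guest's own page table.

class Monitor:
    def __init__(self, guest_page_table, ppn_to_mpn):
        self.guest_pt = guest_page_table   # VPN -> PPN (guest-managed)
        self.ppn_to_mpn = ppn_to_mpn       # PPN -> MPN (monitor-managed)
        self.shadow_pt = {}                # VPN -> MPN (used by real MMU)

    def handle_page_fault(self, vpn):
        # 1. Look up the guest's mapping from VPN to PPN.
        if vpn not in self.guest_pt:
            return "forward fault to guest OS"   # a true guest page fault
        ppn = self.guest_pt[vpn]
        # 2. Find where that PPN lives in machine memory (could demand-page).
        mpn = self.ppn_to_mpn[ppn]
        # 3. Insert VPN -> MPN into the shadow table and retry.
        self.shadow_pt[vpn] = mpn
        return "retry instruction"

m = Monitor(guest_page_table={7: 3}, ppn_to_mpn={3: 42})
print(m.handle_page_fault(7), m.shadow_pt)  # -> retry instruction {7: 42}
```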
I/O device virtualization
Type of communication:
Special instruction – IN/OUT.
Memory-mapped I/O (MMIO).
Make IN/OUT and MMIO accesses trap into the monitor.
Run simulation of I/O device.
Interrupt – Tell CPU simulator to generate interrupt.
DMA – Copy data to/from physical memory of virtual machine.
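Trap-and-emulate for programmed I/O can be sketched like this; the port number and device behavior are invented for illustration:

```python
# Sketch of trap-and-emulate for programmed I/O: a guest OUT to a
# device port traps into the monitor, which updates the state of a
# simulated device instead of touching real hardware.

class VirtualSerialPort:
    DATA_PORT = 0x3F8                    # hypothetical data port

    def __init__(self):
        self.output = []

    def handle_out(self, port, value):
        if port == self.DATA_PORT:
            self.output.append(chr(value))   # emulate transmitting a byte

def on_out_trap(device, port, value):
    """Called by the monitor when the guest executes OUT port, value."""
    device.handle_out(port, value)
    # ...then resume the guest at the next instruction.

dev = VirtualSerialPort()
for byte in b"ok":
    on_out_trap(dev, 0x3F8, byte)
print("".join(dev.output))  # -> ok
```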
Virtual Machine Uses
One ISA can be used to emulate another.
Provides cross-platform portability.
Emulators can optimize as they emulate.
Can also optimize translation from an ISA to the same ISA.
A single physical machine can be replicated, providing isolation between the VMs.
Two virtual machines can be composed, combining the functionality of each.
Example: Using VMM to enhance security
Problem Area: Intrusion Detection Systems (IDS).
Host-based IDS (HIDS):
+ Good visibility to catch intruder.
- Weak isolation from intruder disabling/masking IDS.
Network-based IDS (NIDS):
+ Good isolation from attack from intruder.
- Weak visibility can allow intruder to slip by unnoticed.
Would like visibility of HIDS with isolation of NIDS.
Idea: Do it in the virtual machine monitor.
VMM-based Intrusion Detection System
The VMM isolates software in the VM from the VMM.
A compromised OS in the VM can’t disable an IDS in the VMM.
Introspection – Peer inside at software running in VM
VMM can see: Physical memory, registers, I/O device state, etc.
Signature scan of memory
Look through physical memory for patterns or signs of break-in
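A signature scan over guest physical memory can be sketched as below; the signature bytes and rootkit name are made up for illustration:

```python
# Sketch of VMM introspection by signature scan: the monitor reads the
# VM's physical memory directly and searches it for byte patterns known
# to indicate a break-in (e.g. a rootkit's code).

SIGNATURES = {
    b"\xde\xad\xbe\xef": "example-rootkit",   # hypothetical pattern
}

def scan_memory(physical_memory: bytes):
    """Return (offset, name) for every signature found in guest memory."""
    hits = []
    for pattern, name in SIGNATURES.items():
        start = 0
        while (idx := physical_memory.find(pattern, start)) != -1:
            hits.append((idx, name))
            start = idx + 1              # keep scanning past this hit
    return hits

mem = b"\x00" * 100 + b"\xde\xad\xbe\xef" + b"\x00" * 50
print(scan_memory(mem))  # -> [(100, 'example-rootkit')]
```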
Interposition – Modify VM abstraction to enhance security
Memory Access Enforcer
Interpose on page protection.
NIC Access Enforcer
Interpose on virtual network device.
Collective Project: A Compute Utility
Distributed system where all software runs in VMs
Research with Prof. Monica Lam and students.
Virtual Appliance abstraction
x86 virtual machine.
Target specialized environment (e.g. program development)
Store in a centralized persistent storage repository.
Cached on the machines where virtual appliances run.
Centralize and amortize administration of a virtual appliance.
Computing environment follows user around.
Virtualizing I/O Devices on VMware Workstation’s Host VMM
Virtualizing the PC Platform
Some privileged instructions fail silently. (Why is this a problem?) (What’s the solution?)
PC hardware diversity
Why is this problematic for a “classic” VM?
Pre-existing PC software
Must stay compatible
To address these, VMware uses a hosted VM. (Not a “classic” VM.)
Hosted VMware Architecture
VMware achieves both near-native execution speed and broad device support by transparently switching between Host Mode and VMM Mode (typically ~1000 mode switches per second). The VMware virtual machine monitor lets each guest OS directly access the processor (direct execution); VMware, acting as a host application, uses the host to access other devices such as the hard disk, floppy, or network card.
[Diagram: guest applications and guest OS running on the Virtual Machine Monitor (VMM Mode); VMware App and VMware Driver running on the host OS beside host apps (Host Mode); PC hardware – CPU, memory, disks, NIC – below.]
VMApp runs in the host, using the VMDriver host kernel component to establish the VMM.
CPU is thus executing in either the host world or the virtual world, using VMDriver to switch worlds.
World switches are expensive, since user and system state must be switched.
Virtualizing the NIC
I/O port operations by guest OS must be intercepted by VMM.
Must then be processed in the VMM (to maintain the virtual state).
Or executed in the host world. (When must it do what?)
Send operations start as a sequence of ops to virtual I/O ports.
Upon finalization of the send, the VMApp issues a host OS syscall to the VMNet driver, which passes it on to the real NIC.
Finally requires raising a virtual IRQ to signal completion.
Receive operations operate in reverse.
VMApp executes a select() syscall on possible sources.
Reads packet, forwards it to VMM which raises a virtual IRQ.
Virtualizing a Network Interface
[Diagram: guest OS NIC driver → VMM → VMDriver and VMApp on the host OS → VMNet driver (virtual bridge / virtual network hub) → physical Ethernet NIC driver → physical NIC.]
Guest OS out to I/O port
Trap to VMDriver
Pass to VMApp
Syscall to VMNet
Pass to actual NIC driver
Actual NIC delivers to VMNet driver
VMNet driver causes VMApp to return from select()
VMApp copies packet to VM memory
VMApp asks VMM to raise virtual IRQ
Guest OS performs port operations to read data
Trap to VMDriver
VMApp returns from ioctl() to raise IRQ
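The send path above can be sketched as a toy model that counts world switches, which is where the cost goes; the class and method names are stand-ins for VMDriver/VMApp/VMNet, not VMware's actual interfaces:

```python
# Sketch of the guest-to-wire send path: the guest's port operations
# trap to the host world, VMApp hands the packet to the (simulated)
# VMNet driver via a syscall, and control switches back so the VMM can
# raise a virtual IRQ. Each send costs two world switches.

class HostedVMM:
    def __init__(self):
        self.world_switches = 0
        self.wire = []                    # packets "on the network"

    def guest_out(self, packet_bytes):
        # Guest OS OUTs to the virtual NIC's I/O ports: trap to VMDriver.
        self.world_switches += 1          # virtual world -> host world
        self.vmapp_send(packet_bytes)
        self.world_switches += 1          # host world -> virtual world
        # ...VMM then raises a virtual IRQ to signal completion.

    def vmapp_send(self, packet_bytes):
        # VMApp issues a host syscall to the VMNet driver, which hands
        # the packet to the real NIC driver.
        self.wire.append(packet_bytes)

vmm = HostedVMM()
vmm.guest_out(b"frame-1")
print(vmm.wire, vmm.world_switches)  # -> [b'frame-1'] 2
```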
Reducing Network Virtualization Overheads
Handling I/O ports in the VMM
Many accesses don’t involve actual I/O.
Let the VMM maintain the state, avoiding a world switch.
If data rate is high, queue up packets, send them in a group.
Use shared memory bitmap rather than requiring VMApp to call select() when an IRQ is received on the host system.
Reducing CPU virtualization overhead
Find operations to the interrupt controller that have memory semantics and replace with MOV operation, which does not require intervention by the VMM.
Apparently requires dynamic binary translation.
Modifying the guest OS
Eliminate idle task page table switching, which is not necessary, since the idle task pages are mapped in every process page table.
Run idle task with page table of last process.
What would happen if the idle task had a bug and wrote to some random addresses?
Creating a custom virtual device
Virtualizing a real device is somewhat inefficient, since the interface to these devices is optimized for real devices, not virtual devices.
Designing a custom virtual device can reduce expensive operations.
Disadvantage: a new device driver must be written in the guest OS for this virtual device.
Modifying the host OS
The VMNet driver allocates a kernel sk_buff, then copies the packet from the VMApp into the sk_buff.
Can eliminate copy by using memory from VM physical memory.
Bypassing the host OS
VMM uses own drivers, rather than going through the host OS. (Note that going through the host OS is using a kind of process VM provided by the host OS.)
Disadvantage is that you have to write your own VMM driver for every supported real device.
Main goal is to develop some understanding of the issues of hosted system VM performance.
Windows Virtualization Architecture
Mark Kieffer, Group Program Manager, Windows Virtualization, Microsoft Corporation (markkie@microsoft.com)
Agenda:
Current virtualization uses and benefits
Uses for virtualization today
Microsoft’s current virtualization offerings
Windows Virtualization Architecture
Current Virtualization Uses and Benefits
Workloads that are enabled by virtualization
Efficient software development and test
Dynamic data centers
High availability partitions
Microsoft’s Current Virtualization Offerings
Virtual PC 2004
Being deployed in production environments
Demos, training, helpdesk
Being deployed in test and dev environments
Multiple test beds on a single piece of hardware
Virtual Server 2005
Released Q4 2004
Well received in the industry
Used for production server consolidation
Remote management of virtual machine operations
Great performance gains and functionality enhancements in SP1.
64-bit host support, PXE support, and others
Microsoft's Next Gen Virtualization Architecture
Introducing Windows virtualization for servers
Separate, small management partition (parent)
Takes device virtualization to the next level
Targeting availability in the Longhorn wave
Definition of a couple of terms
Parent partition: a partition that manages its children.
Child partition: a partition that is started, managed, and shut down by its parent; a parent can have any number of children.
Virtualization Stack: The collection of components that runs in the parent partition for VM management
Windows Virtualization for Servers
Some proposed features
32-bit and 64-bit guests
WMI management and control API
Save & restore
CPU and I/O resource controls
Tuning for NUMA
Dynamic resource addition & removal
Will run within a parent partition
Standalone in a small-footprint OS (MinWin)
Full Windows OS
Multiple virtualization stacks could co-exist
Virtualization Stack
[Diagram: the parent partition runs the VM Service, VM worker processes, the Virtualization Infrastructure Driver, and the VMBus bus driver on top of the hypervisor.]
Collection of user-mode & kernel-mode components
Runs within a partition on top of a (minimal) OS
Contains all VM support not in the hypervisor
Interacts with hypervisor
Calls the hypervisor to perform certain actions
Responds to messages from the hypervisor or from other partitions