Virtual Machines Fred Kuhns (firstname.lastname@example.org, http://www.arl.wustl.edu/~fredk) Department of Computer Science and Engineering Washington University in St. Louis
Layers of Abstraction
used to manage complexity
typically defined in layers
each layer has a well defined interface
lowest layers implemented in hardware
higher layers implemented in software
abstraction layer where software directly manipulates hardware components
Machine: denotes the system on which software is executed.
to an operating system this is generally the physical system
to an application program a machine is defined by the combination of hardware and OS-implemented abstractions
Abstraction layers have well defined interfaces
A processor’s instruction set defines such an interface: IA-32, IBM PowerPC, ARM
A platform’s ABI defines another: the SVR4 Application Binary Interface and its i386 specialization.
Assume one operating system instance controls all resources
Hardware implementation affects OS abstractions
Physical resources managed by a single entity (OS or executive) and shared amongst all users
Virtualization: defines an isomorphism that maps a virtual guest system to a physical host
Virtualization is not the same as abstraction since it does not necessarily simplify interfaces or hide information.
Adds another degree of freedom by enabling multiple resource managers and controlled sharing.
Adds a level of indirection
Can virtualize a single resource (DRAM, Disks) or an entire system (machine).
may create one or more virtual objects.
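The mapping and indirection described above can be sketched for a single resource. This is a minimal illustration, not a real disk driver: the class names PhysicalDisk and VirtualDisk and the contiguous-slice layout are assumptions made for the example. One physical disk is presented as two virtual disks, and every guest access passes through a translation step.

```python
class PhysicalDisk:
    """Host resource: a flat array of blocks (hypothetical model)."""

    def __init__(self, num_blocks):
        self.blocks = [0] * num_blocks


class VirtualDisk:
    """Guest-visible disk mapped onto a contiguous slice of the host disk."""

    def __init__(self, host, base, size):
        self.host, self.base, self.size = host, base, size

    def _translate(self, vblock):
        # The level of indirection: virtual block number -> physical block number.
        if not 0 <= vblock < self.size:
            raise IndexError("virtual block out of range")
        return self.base + vblock

    def write(self, vblock, value):
        self.host.blocks[self._translate(vblock)] = value

    def read(self, vblock):
        return self.host.blocks[self._translate(vblock)]


host = PhysicalDisk(8)
disk_a = VirtualDisk(host, base=0, size=4)   # two virtual objects carved
disk_b = VirtualDisk(host, base=4, size=4)   # from one physical resource

disk_a.write(1, 11)   # same virtual block number in both guests,
disk_b.write(1, 22)   # different physical blocks on the host
```

Each guest sees block numbers 0 to 3 and cannot reach the other guest's slice, which is the controlled sharing the slide refers to.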
Virtual Machine: Add virtualization layer which transforms the physical machine into the desired virtual architecture.
Multiple virtual machine instances on a single physical host
isolated OS instances
Use emulation to support different instruction set architectures such as Intel IA-32, PowerPC etc
Support novel architectures
Support for high-level language virtual machines (Java)
Defined by a specification
An architecture may have many implementations
low power consumption
Abstraction levels correspond to hardware and software implementation layers, each with its own specification
software: run-time system libraries, OS system calls
hardware: device controllers, I/O devices, memory architecture, system bus, ISA
Hardware/software boundary is defined by the Instruction Set Architecture (ISA)
user ISA: portion of architecture visible to an application program
system ISA: portion of architecture visible to the supervisor software
Application Binary Interface (ABI): defines program interface to the hardware resources and services
set of all user instructions
system instructions are not included in the ABI
user instructions allow program direct access to hardware
system call interface
indirect interface for accessing shared system resources and services
implemented by the system software
Application Programming Interface
defined in terms of a high-level language (HLL)
typically implemented as a system library and defined at the source level (for example libc which is linked into program’s address space)
specifies operations available by system which are implemented by the operating system or other system software
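The difference between the library-level API and the system call interface can be seen from a high-level language. The sketch below is only an illustration: it assumes Python's standard os module, where os.write is a thin wrapper over the operating system's write call, while a file object obtained from os.fdopen is a library abstraction that buffers data and decides when to issue the underlying call. The function names via_library and via_syscall_wrapper are made up for the example.

```python
import os


def via_syscall_wrapper(fd, data):
    # Low-level path: hands the bytes to the OS write call directly.
    return os.write(fd, data)


def via_library(fd, text):
    # High-level API: a buffered file object from the standard library.
    # The library issues the write when the buffer is flushed or closed.
    with os.fdopen(fd, "w") as f:
        f.write(text)


read_end, write_end = os.pipe()
via_syscall_wrapper(write_end, b"raw ")
via_library(write_end, "buffered")       # also closes write_end on exit
received = os.read(read_end, 100)
os.close(read_end)
```

Both paths end in the same OS service; only the interface the program is written against differs.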
Virtual Machine Basics
Process perspective: The system ABI defines the interface between the process and machine
user-level hardware access: logical memory space, user-level registers and instructions
OS mediated: Machine I/O or any shared resource or operations requiring system privilege.
Operating system perspective: ISA defines the interface between OS and machine
system is defined by the underlying machine
direct access to all resources
Virtual machine executes software (process or operating system) in the same manner as the target machine
Implemented with both hardware and software
VM resources may differ from those of the physical machine
Generally not necessary for VM to have equivalent performance
Representing the virtual machine’s resources
mapping of virtual resources or state to the real resources of the underlying machine
Emulate the virtual machine’s ABI or ISA
implement virtual instructions or system calls with the underlying real machine instructions or operating system calls.
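A toy interpreter shows both steps: the virtual machine's state is mapped onto host data structures, and each virtual instruction or system call is implemented with host operations. The three-instruction guest ISA ("li", "add", "sys") and the syscall table are invented for this sketch and do not correspond to any real architecture.

```python
def run(program, syscalls):
    """Interpret a toy guest program; return the final virtual registers."""
    regs = {"r0": 0, "r1": 0}    # virtual registers held in host memory
    pc = 0                       # virtual program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "li":           # load immediate into a virtual register
            regs[args[0]] = args[1]
        elif op == "add":        # virtual add implemented by a host add
            regs[args[0]] = regs[args[0]] + regs[args[1]]
        elif op == "sys":        # virtual system call mapped to a host routine
            syscalls[args[0]](regs)
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown guest instruction: {op}")
        pc += 1
    return regs


output = []
host_syscalls = {"print_r0": lambda regs: output.append(regs["r0"])}

guest_program = [
    ("li", "r0", 2),
    ("li", "r1", 40),
    ("add", "r0", "r1"),
    ("sys", "print_r0"),
    ("halt",),
]
final_regs = run(guest_program, host_syscalls)
```

Interpretation is the simplest emulation strategy; it trades performance for portability, which is acceptable given that equivalent performance is generally not required.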
Process virtual machine: supports an individual process
Emulates user-level instructions and operating system calls
Virtualizing software placed at the ABI layer
System virtual machine: emulates the target hardware ISA
guest and host environment may use the same ISA
Host: underlying hosting system
Guest: software running in the virtualized environment
Native: The virtual machine’s corresponding real machine
Runtime: virtualizing software in process-oriented VMs.
Virtual machine monitor: virtualizing software in system virtual machines
Virtual machines can provide emulation, optimization and replication
emulation: cross platform compatibility
optimization: by considering implementation specific information
replication: making a single resource or platform appear as many
System Virtual Machines
Early example: early time-sharing systems which multiplexed programs on the computer system
basic process virtual machine
each application program ran as a process in its own virtualized environment
System virtual machines apply similar techniques to provide a complete virtual system
each virtual system runs its own operating system
each OS instance is presented with a complete virtual system
OS instance manages assigned virtual resources as though they are physical devices/systems.
Host platform runs a layer of software which creates the virtual resources and manages sharing for guest VMs
VMM owns the real resources and manages shared access
physical resources are shared in time or space
emulate if no matching physical resource
Without loss of generality we will assume the host and guest ISA are the same.
If not then additional work must be performed to emulate instruction set and resources.
Implement multiprogramming: multiple single-user virtual machine instances. IBM System/370 used this approach to provide time-sharing behavior with each VM running a simple single-user OS (Conversational Monitor System or CMS)
Multiple single-application VMs: Dedicates a VM for each application program, uses a general purpose OS.
Multiple secure environments: VM creates sandbox to isolate environments and security domains.
Manage application environment: Install core applications in one VM then create per user VMs for them to load their own apps.
Mixed-OS environments: Single hardware platform can support multiple Operating System environments.
Legacy applications: Dedicate VMs for legacy applications.
Multiplatform application development: One hardware platform with VMs providing emulation of alternative hardware.
New system transition: Staged or gradual migration (opposite of legacy support).
System software development: For testing or developing new system software in a protected environment.
Operating system training: Run OS instance in a VM so parameter or configuration adjustments do not affect rest of system
Help desk support: Use VM to replicate user environment
Operating system instrumentation: Can monitor hardware access or low level software abstractions
Event monitoring: execution traces, machine state dumps and replaying of traces
System encapsulation: Checkpointing system state and restarting on same or different machine.
Maintaining Control of Hardware
Each VM has associated hardware state, similar to how a process has associated hardware state
VMM switches context between VMs by “swapping” the hardware context state
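The context swap can be sketched as a save and restore of per-VM hardware state. This is a simplified model under stated assumptions: the HardwareContext fields (a program counter and four registers) and the VMM class are hypothetical, and a real VMM would save far more state (control registers, address-translation state, pending interrupts).

```python
import copy
from dataclasses import dataclass, field


@dataclass
class HardwareContext:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])


class VMM:
    def __init__(self, vm_names):
        # One saved hardware context per VM, like a per-process context.
        self.saved = {name: HardwareContext() for name in vm_names}
        self.current = vm_names[0]
        self.cpu = copy.deepcopy(self.saved[self.current])  # live hardware state

    def switch_to(self, name):
        # "Swap" contexts: save the running VM's state, load the next VM's.
        self.saved[self.current] = copy.deepcopy(self.cpu)
        self.cpu = copy.deepcopy(self.saved[name])
        self.current = name


vmm = VMM(["vm_a", "vm_b"])
vmm.cpu.pc, vmm.cpu.regs[0] = 100, 7     # vm_a runs for a while
vmm.switch_to("vm_b")
pc_seen_by_b = vmm.cpu.pc                # vm_b starts from its own state
vmm.cpu.pc = 200                         # vm_b runs
vmm.switch_to("vm_a")                    # vm_a resumes where it left off
```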
A VMM has two mechanisms for gaining control of the processor (and thus of the hardware resources)
use interval timer: permits time-sharing of processor (or other resources) among the VM instances
emulate all privileged ISA instructions: enables isolation between VMs and gives the VMM control whenever resources are manipulated
Note, this implies that the VMM must also emulate the interval timer
must not allow VM direct access for writing or reading
responsible for the notion of virtual time and for how it is warped relative to real time
The VMM’s attempts to be fair across all VMs may ultimately cause it to be unfair to individual VM instances
for example, a VM requests a timer interrupt every 1 ms, but the VMM delivers them at worst every 500 ms (at which point the VM may receive 500 timer updates at once).
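The arithmetic behind the timer example can be sketched with a small virtual timer. The VirtualTimer class and its pending_ticks method are hypothetical names; the sketch assumes the VMM simply counts how many guest timer periods elapsed while the VM was not running and reports them together when the VM is next scheduled.

```python
class VirtualTimer:
    """Tracks the timer period a guest asked for, in milliseconds."""

    def __init__(self, period_ms):
        self.period_ms = period_ms
        self.last_delivery_ms = 0

    def pending_ticks(self, now_ms):
        """Whole guest ticks that elapsed since the last delivery."""
        elapsed = now_ms - self.last_delivery_ms
        ticks = elapsed // self.period_ms
        # Advance only by whole periods so fractional time is not lost.
        self.last_delivery_ms += ticks * self.period_ms
        return ticks


timer = VirtualTimer(period_ms=1)        # guest wants an interrupt every 1 ms
first_batch = timer.pending_ticks(500)   # VM is next scheduled at t = 500 ms
second_batch = timer.pending_ticks(500)  # nothing new at the same instant
```

Whether the VMM delivers such a backlog as a burst, coalesces it, or stretches virtual time is a policy choice that this sketch leaves open.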
Native VM System
Some part of the VMM must run at the highest privilege level of the system
Each guest VM’s kernel (the trusted system software) “perceives” itself as running with the highest privilege level.
The VMM runs with the highest “real” privilege level so that it may manage the resources and ensure isolation
So the VMM runs in system-mode and the guest OS runs either in the user-mode or a reduced system privilege level (platform dependent)
The VMM must emulate the system-mode privilege level for the guest OS
Hosted VM system
VMM is installed within an operating system already running on a hardware platform
VMM manages resources using the existing OS
User-mode VM system:
VMM implemented entirely at the user-level
Dual-mode VM system
Part of the VMM functionality implemented at user level
leverage existing mechanisms to extend OS functionality to run portions of the VMM within the host OS (for example using kernel drivers)
Resource Virtualization - Processors
Conditions for ISA Virtualization
G. J. Popek, R. P. Goldberg, “Formal Requirements for Virtualizable Third Generation Architectures”, Communications of the ACM 17(7), pp. 412-421, July 1974
Defined for Native Systems with VMM operating in system mode (most privileged)
VMM must keep track of the “virtual” mode (virtual user-mode, virtual system-mode) but must set actual mode of guest software to user-mode.
Assumptions (may be extended to include I/O):
Single processor and uniform memory access
Processor has two operational modes (user and system mode)
Subset of instructions are only available in system mode.
Memory addressing is relative to relocation registers (paged memory satisfies this assumption).
Virtual machine modeled as the 4-tuple S = <E, M, P, R>
E - executable storage
M - operational mode
P - Program counter
R - memory relocation registers (base and bounds)
Memory trap occurs if program accesses memory outside of R (specified bounds)
trap automatically saves machine state: M, P, R
It then copies new machine state into M, P and R
Privileged instructions also cause a trap if executed in user mode
It is not sufficient that an instruction have different behaviors in system and user modes. A trap must result if in user mode.
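The machine model and its trap behavior can be written down directly. The sketch below follows the 4-tuple S = <E, M, P, R> from the slides; where the trap saves the old state and where it fetches the new state are modeling assumptions here (the fields saved and trap_state), as is the example privileged instruction set_relocation.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    E: list                 # executable storage
    M: str                  # operational mode: "user" or "system"
    P: int                  # program counter
    R: tuple                # relocation registers: (base, bound)
    trap_state: tuple       # (M, P, R) loaded on a trap (assumed location)
    saved: tuple = None     # (M, P, R) stored by the most recent trap


def trap(m):
    """Save the current M, P, R, then load the handler's M, P, R."""
    m.saved = (m.M, m.P, m.R)
    m.M, m.P, m.R = m.trap_state


def load(m, addr):
    """Read a word through the relocation registers; trap if out of bounds."""
    base, bound = m.R
    if not 0 <= addr < bound:
        trap(m)             # memory trap
        return None
    return m.E[base + addr]


def set_relocation(m, new_r):
    """A privileged instruction: in user mode it traps instead of executing."""
    if m.M != "system":
        trap(m)
        return False
    m.R = new_r
    return True


m = Machine(E=list(range(16)), M="user", P=5, R=(8, 4),
            trap_state=("system", 0, (0, 16)))
in_bounds = load(m, 2)          # reads E[8 + 2]
out_of_bounds = load(m, 4)      # outside R, so the machine traps
```

Note that set_relocation does not silently behave differently in user mode; it traps, which is exactly the property the last bullet requires.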
Guest operating systems and their applications must both operate in user mode
Control sensitive - instructions which may change the configuration of system resources (e.g., the current page table register)
Behavior sensitive - instructions whose behavior or results depend on the configuration of resources or operational mode (e.g., load instruction which depends on the page table in use)
Innocuous - all remaining instructions.
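The three categories can be expressed as a small classifier. The instruction names and their properties below describe a made-up toy ISA, chosen to mirror the examples in the bullets; they are not taken from any real instruction set reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    changes_config: bool      # may change the configuration of resources
    depends_on_config: bool   # result depends on configuration or mode


def classify(instr):
    """Return the set of categories an instruction falls into."""
    kinds = set()
    if instr.changes_config:
        kinds.add("control sensitive")
    if instr.depends_on_config:
        kinds.add("behavior sensitive")
    return kinds or {"innocuous"}


TOY_ISA = [
    Instr("set_page_table", changes_config=True, depends_on_config=False),
    Instr("load", changes_config=False, depends_on_config=True),
    Instr("add", changes_config=False, depends_on_config=False),
]
classes = {i.name: classify(i) for i in TOY_ISA}
```

An instruction can be both control and behavior sensitive, which is why classify returns a set rather than a single label.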
Functions of a VMM
Dispatcher - system interrupts/traps are first processed by the dispatcher module. It in turn “dispatches” or demuxes the event to the appropriate handler
Allocator - Invoked by the dispatcher when the event requires system resource configuration changes.
Control sensitive operations which change resource allocations are directed to allocator
Implements the resource allocation and sharing policies of the VMM
Interpreter routines - emulate privileged instructions not affecting current allocations
emulates privileged instructions operating on virtual resources.
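The division of labor among the three modules might look as follows. This is a structural sketch only: the trap format (a dict with an "op" key), the two operations, and the first-come page allocation policy are all assumptions introduced for the example.

```python
class VM:
    def __init__(self, name):
        self.name = name
        self.pages = 0                          # memory currently allocated
        self.virtual_regs = {"status": 0}       # purely virtual resource


class VMM:
    def __init__(self, total_pages):
        self.free_pages = total_pages

    def dispatch(self, vm, trap):
        """Dispatcher: first stop for every trap; demuxes to a handler."""
        op = trap["op"]
        if op == "set_memory_size":             # changes resource allocation
            return self.allocate(vm, trap["pages"])
        if op == "write_status":                # touches only virtual state
            return self.interpret_write_status(vm, trap["value"])
        raise ValueError(f"unhandled trap: {op}")

    def allocate(self, vm, pages):
        """Allocator: enforces the sharing policy (here: grant if available)."""
        delta = pages - vm.pages
        if delta > self.free_pages:
            return False
        self.free_pages -= delta
        vm.pages = pages
        return True

    def interpret_write_status(self, vm, value):
        """Interpreter routine: emulates the instruction on virtual state."""
        vm.virtual_regs["status"] = value
        return True


vmm = VMM(total_pages=10)
vm_a, vm_b = VM("a"), VM("b")
granted_a = vmm.dispatch(vm_a, {"op": "set_memory_size", "pages": 6})
granted_b = vmm.dispatch(vm_b, {"op": "set_memory_size", "pages": 6})
vmm.dispatch(vm_b, {"op": "write_status", "value": 3})
```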
Properties of an Efficient VMM
Efficiency: Innocuous instructions must be executed natively (directly) on the hardware.
Resource control: Guest VM must not be able to directly change the configuration of system resources (only the virtual resources assigned to it)
Equivalence: Any program executing on a VM must behave identically to the way it would behave running natively on a dedicated hardware platform.
There are a few exceptions to this rule:
Reduced performance due to emulation is OK
May be a limitation on total available resources
Differences in timing relationships are OK
Theorem: For any conventional third-generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions is a subset of the set of privileged instructions
VMM must interpret sensitive instructions in terms of the current Virtual Machine’s state (i.e. the Guest VM’s state and virtual user/system mode)
If a privileged instruction is executed by a VM operating in virtual user-mode then a virtual trap is sent to the guest VM’s OS.
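How a VMM might act on a privileged-instruction trap, depending on the guest's virtual mode, is sketched below. The GuestVM fields, the handler address 0x100, and the two emulated instructions are hypothetical; the point is only the branch on virtual mode.

```python
class GuestVM:
    def __init__(self):
        self.virtual_mode = "user"        # tracked by the VMM, not the hardware
        self.virtual_pc = 0
        self.saved_pc = None
        self.guest_trap_handler = 0x100   # hypothetical guest OS entry point
        self.virtual_page_table = None


def on_privileged_trap(vm, instr, operand):
    """Called by the VMM after real hardware traps on a privileged instruction."""
    if vm.virtual_mode == "user":
        # A guest application tried it: reflect a virtual trap to the guest OS.
        vm.saved_pc = vm.virtual_pc
        vm.virtual_mode = "system"
        vm.virtual_pc = vm.guest_trap_handler
        return "virtual trap"
    # The guest OS tried it: emulate against the VM's virtual state.
    if instr == "set_page_table":
        vm.virtual_page_table = operand
        vm.virtual_pc += 1
    elif instr == "return_to_user":
        vm.virtual_mode = "user"
        vm.virtual_pc = operand
    else:
        raise ValueError(f"no emulation routine for {instr}")
    return "emulated"


vm = GuestVM()
vm.virtual_pc = 40
first = on_privileged_trap(vm, "set_page_table", 7)     # from guest user code
second = on_privileged_trap(vm, "set_page_table", 7)    # from guest kernel
third = on_privileged_trap(vm, "return_to_user", vm.saved_pc)
```

In all three cases the real processor stays in user mode for guest code; only the VMM's record of the virtual mode changes.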
Theorem Not Satisfied
There are sensitive instructions which are not also privileged
The Intel IA-32 POPF instruction behaves differently when executed in system mode versus user mode. It is not a privileged instruction
IA-32 has 17 critical instructions
VMM must use interpretation or emulation to detect and handle these critical instructions (sensitive but not privileged)
VMM may scan object code and just replace these critical instructions with a trap to the VMM (aka patching)
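Both the theorem's condition and the patching workaround reduce to simple set operations in a toy setting. In the sketch below the instruction sets are illustrative (only popf's status as sensitive but not privileged comes from the slides), object code is modeled as a list of mnemonics, and the "trap_to_vmm:" marker is an invented notation for the replacement trap.

```python
def satisfies_theorem(sensitive, privileged):
    """Condition from the theorem: sensitive is a subset of privileged."""
    return sensitive <= privileged


def critical_instructions(sensitive, privileged):
    """Sensitive but not privileged: these never trap on their own."""
    return sensitive - privileged


def patch(code, critical):
    """Scan the code and replace each critical instruction with a VMM trap."""
    return [f"trap_to_vmm:{op}" if op in critical else op for op in code]


# Toy instruction sets (assumed for illustration).
PRIVILEGED = {"load_page_table", "halt"}
SENSITIVE = {"load_page_table", "popf"}

critical = critical_instructions(SENSITIVE, PRIVILEGED)
guest_code = ["add", "popf", "load_page_table", "add"]
patched_code = patch(guest_code, critical)
```

Privileged instructions such as load_page_table are left alone because they already trap in user mode; only the critical ones need rewriting. Real patching works on binary code and must also handle instruction boundaries and self-modifying code, which this sketch ignores.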