Slides on Virtualization and Xen.
Virtualization CS623 11/8/2006 Caution: Still a new topic for me as well. Note: these slides draw, sometimes verbatim, on the papers cited on the next slide.
Xen and the Art of Virtualization
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, SOSP 2003.
What is virtualization?
Why would you do it?
Why is it important?
What is virtualization?
Used to present the illusion of many smaller virtual machines (VMs), each running a separate operating system instance.
Run multiple XPs on XP.
Run LINUX, Solaris, XP on XP.
Run XP on LINUX.
Why would you do it?
Single user: You have an XP box and you want to run LINUX.
Single user: You don’t trust security of one OS or of some applications and you’d like to “wall it off.”
Miroslav Ponec: Run linux on new laptop in vmware to avoid driver problems.
Enterprise manager: You have lots of boxes that sit idle a lot of the time. If you multiplex you can save hardware, etc. costs.
Shared grid/resource: host multiple (untrusted?) applications and servers on a shared machine.
Number of ways to build a system to host multiple applications and servers on a shared machine.
Deploy hosts running standard OS and allow users to install files and start processes, protection between processes provided by standard OS techniques.
System administration challenging due to complex configuration interaction
No adequate support for performance isolation: scheduling priority, memory demand, network traffic and disk accesses of one process impact the performance of others.
Possible solution: retrofit support for performance isolation to the operating system.
Hard to ensure that all resource usage is accounted to the correct process: there are complex interactions due to the buffer cache or page replacement algorithms.
Heck, we had trouble getting the Operating System to properly account for resource usage of different threads in one server.
Multiplex physical resources at the granularity of an entire operating system and provide performance isolation.
More heavyweight in terms of initialization and resource consumption.
Xen: “For target of up to 100 hosted OS instances, price worth paying.” Enhanced flexibility, avoid configuration interactions (e.g. Windows registry).
A little hype from the media …
eweek on Virtualization 11/25/05
IT departments are doing this to try to find "ways to use the newest in technology (processors, storage, memory, communications, and software) to improve: the application environment by increasing performance; optimizing processor utilization through workload management, scalability and reliability; increasing organizational efficiency by reducing costs of hardware, software and staff; and reducing both the number and the impact of system outages regardless of the underlying reason," said Kusnetzky.
At a recent Gartner Symposium/ITxpo, Gartner Inc. vice president John Enck called virtualization a "megatrend."
"We see virtualization being extremely important across all server types" and "virtualization is the best tool you have right now in the market to increase efficiency and drive up the utilization of your servers," said Enck.
What all this boils down to is that virtualization should make today's more powerful computers more productive while simultaneously making them easier and cheaper to manage.
The trick is how to make this happen.
NOVEMBER 21, 2005 (IDG NEWS SERVICE) - A recent survey of 100 IT executives predicts that IT spending will decrease slightly in 2006 as more businesses worry about global economic conditions, but security software and enterprise IT upgrades remain top concerns.
Macroeconomic factors such as high oil prices and a devastating hurricane season in the U.S. have caused 40% of the executives surveyed by Goldman, Sachs & Co. to consider reducing their 2006 IT budgets, according to survey results released Friday. Most executives, 52%, believe IT spending will be unchanged in 2006.
Security software has been a long-running priority among the executives on Goldman’s survey panel, and nothing has changed that mind-set based on the current results. Spending on antivirus products has eased up after a flurry of activity, but CIOs continue to focus on improving security in areas like identity management and regulatory compliance, the survey said.
Other enterprise software priorities include enterprise resource management and customer relationship management systems, with CIOs upgrading those two categories to top priorities. When Goldman polled its panel in April, ERP and CRM software were considered only medium priorities.
Among enterprise software vendors, VMware Inc. and SAP AG were the two most cited companies receiving a larger percentage of the respondents’ IT budgets. Virtualization technologies are a hot topic this year as Intel Corp. and Advanced Micro Devices Inc. prepare chips that improve the performance of virtualization software. Respondents listed Novell Inc. and Computer Associates International Inc. as receiving less of their IT budgets.
ZDNet Blog 11/14/05
“With virtual machines of the desktop sort that VW5 enables, PC users can literally carve their desktop and notebook systems into completely separate instances of Windows that run side-by-side with each other as though the other instances don't exist. In other words, if some process in one tries some sort of security exploit like a buffer overflow, it can't get to the others any more than a buffer overflow could affect another computer across the network. It can only get to whatever is running in that instance or "partition of Windows." The idea of partitioning systems in this way makes it possible to dedicate partitions to specific activities. For example, you can do all your Web browsing in one partition while you run your corporate applications in another and your personal applications like Quicken in a third and never the three shall meet. I'm a Firefox user. But for those Web sites that require Internet Explorer (which I'm always nervous about using), I just run it in a separate partition. Using a virtual machine for just one application is like driving on a completely empty road with airbags.”
Intel has announced the arrival of the first desktop chips to include its hardware-based virtualization technology known as VT (codenamed Vanderpool). This could very well signal a new era in desktop/notebook computing and I would think long and hard before buying a new system that doesn't include this new and worthwhile technology.
So, why is the Intel announcement so significant? Until Intel started releasing its VT technology (it first debuted in the company's recently announced Paxville Xeon server chips), companies like SWSoft, VMware, and Microsoft had to do a lot of the virtual machine heavy lifting in their software. Without any hardware assistance the likes of which VT provides, it takes far more in the way of physical resources (processor, memory) to launch and run virtual machines than it does if those instantiations can be activated through hardware. While such technologies make it easier for competing virtual machine software solutions like Xen to get in the virtual machine game, Raghu Raghuram, VMware's senior director of strategy and marketing, told me earlier this year that his company welcomes innovations like VT because end users will get better performance and his company can focus its attention on adding value in higher layers of the virtualization stack such as management. VMware is wasting no time in rolling out its support for Intel's VT technology. According to a press release on its Web site, VT support is being beta tested in version 5.5 of VMware Workstation, which the company expects to release by the end of the year.
Diane Greene, President, VMware
To start out, why don't you describe what your company does?
VMware produces virtualization software. What that means is we take a physical x86-based system and we provide the multiple isolated, movable partitions that you can run operating systems with their applications in. In terms of what the customer gets, they get a way to drive utilization from, say, 15 percent, on up to 85 percent. They get very cost-effective ways to do disaster recovery, high availability, provisioning--all sorts of system-level services.
Pick a typical customer. What's their life before and after VMware? What changes?
A typical customer has got widely proliferated x86 machines, and depending on the power of the server, they can get a 10-to-1, 4-to-1 reduction in the number of servers they need. Or they can stop that proliferation and contain it better. And beforehand, to bring a new service online you have to go order the machine, install it in the server room, get it network-connected, make sure the power is there--it can be a multi-month process. Post-VMware, all they do is keep pre-built images of different software services like SQL Server, and when someone needs that service, they just find some excess capacity somewhere and deploy it.
So what's the penalty? Why doesn't everybody do this?
Actually, what we were finding is that for people who use it, it's become the default way that they run their x86 workloads.
OK, I’m convinced
So what do we do?
First let’s think about high-level challenges and approaches.
VMs must be isolated from each other: it is not acceptable for execution of one to adversely affect performance of the other.
Have to think about what this really means.
Support variety of OSs.
Performance overhead introduced by virtualization should be small.
Virtual hardware exposed is functionally identical to the underlying machine.
Allows unmodified operating systems to be hosted.
Seems like this is what VMware supports.
Drawbacks of Full Virtualization
Especially on x86 architecture:
Support for full virtualization never part of x86 design, e.g. certain supervisor instructions would need to be handled by the VMM for correct virtualization, but executing with insufficient privilege fails silently as opposed to a nice trap.
Virtualizing the x86 MMU is also a challenge.
VMware ESX Server dynamically rewrites portions of the hosted machine code to insert traps wherever VMM intervention might be required. This is applied to the entire guest OS kernel, since all non-trapping privileged instructions must be caught and handled.
ESX maintains shadow versions of things like page tables and maintains consistency with the virtual tables by trapping every update attempt – high cost for update-intensive operations such as creating a new application process.
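The shadow-table bookkeeping described above can be sketched roughly as follows. This is a toy model, not ESX code: the class name `ShadowingVMM`, the `p2m` (guest-physical to machine) mapping and the trap counter are all assumptions made for illustration.

```python
class ShadowingVMM:
    """Toy model of shadow page tables (hypothetical names throughout)."""

    def __init__(self, p2m):
        self.p2m = p2m        # guest-physical frame -> machine frame
        self.guest_pt = {}    # virtual page -> guest-physical frame (guest's view)
        self.shadow_pt = {}   # virtual page -> machine frame (what the MMU walks)
        self.traps = 0        # number of VMM interventions

    def guest_write_pte(self, vpage, guest_frame):
        # Every update attempt by the guest traps into the VMM.
        self.traps += 1
        if guest_frame not in self.p2m:
            raise PermissionError("guest does not own that frame")
        # Keep the guest-visible table and the shadow table consistent.
        self.guest_pt[vpage] = guest_frame
        self.shadow_pt[vpage] = self.p2m[guest_frame]


vmm = ShadowingVMM(p2m={0: 70, 1: 71, 2: 72})
for vpage in range(3):
    vmm.guest_write_pte(vpage, vpage)
```

In this model a workload that fills in thousands of page-table entries, such as creating a new process, pays one trap per entry, which is the cost the slide refers to.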
More arguments against Full Virtualization
Sometimes it is desirable for hosted OS to see real as well as virtual resources:
Providing both real and virtual time allows a guest OS to better support time-sensitive tasks and to correctly handle TCP timeouts and RTT estimates.
Exposing real machine addresses allows a guest OS to improve performance by using superpages or page coloring.
Xen Approach: Paravirtualization
Present a virtual machine abstraction that is similar but not identical to the underlying hardware.
Requires modifications to the guest OS.
No changes to the application binary interface (ABI), so no modifications needed to applications.
Xen Design Principles
Support for unmodified application binaries is essential.
Need to support full multi-application operating systems.
Paravirtualization is necessary to obtain high performance and strong resource isolation on uncooperative machine architectures such as x86.
Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance.
Guest OS : one of the OSs that Xen can host.
Domain : running virtual machine within which a guest OS executes.
Xen itself is called the hypervisor since it operates at a higher privilege level than the supervisor code of the guest operating systems that it hosts.
Xen’s Paravirtualized (x86) Interface
Need to discuss
Virtualizing memory is easier if the architecture provides a software-managed TLB, as these can be efficiently virtualized.
Tagged TLB: ability to associate an address-space identifier tag with each TLB entry to allow the hypervisor and each guest OS to efficiently coexist in separate address spaces, with no need to flush the entire TLB when transferring execution.
(What’s a TLB?)
Short for translation look-aside buffer, a table in the processor’s memory that contains information about the pages in memory the processor has accessed recently. The table cross-references a program’s virtual addresses with the corresponding absolute addresses in physical memory that the program has most recently used. The TLB enables faster computing because it allows the address processing to take place independent of the normal address-translation pipeline.
Unfortunately x86 does not have a software-managed TLB: TLB misses are serviced automatically by the processor by walking the page table structure in hardware.
Thus to achieve best possible performance, all valid page translations for the current address space should be present in the hardware-accessible page table.
Moreover, because the TLB is not tagged, address space switches require a complete TLB flush.
Given these limitations, two decisions:
Guest OS has direct read access to hardware page tables, but updates are batched and validated by the hypervisor.
Xen exists in a 64MB section at the top of every address space, thus avoiding a TLB flush when entering and leaving the hypervisor.
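To see why the untagged x86 TLB pushes Xen into every address space, here is a small simulation. It is only a sketch: the `TLB` class, the page numbers and the three-round workload are invented for illustration, and a real TLB has limited capacity and replacement behaviour that this model ignores.

```python
class TLB:
    """Toy TLB caching (address-space id, virtual page) -> frame."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = {}
        self.current_asid = 0
        self.misses = 0
        self.flushes = 0

    def switch_address_space(self, asid):
        if asid == self.current_asid:
            return
        self.current_asid = asid
        if not self.tagged:
            # Without tags, old entries would alias the new space: flush all.
            self.entries.clear()
            self.flushes += 1

    def translate(self, vpage, page_table):
        key = (self.current_asid if self.tagged else 0, vpage)
        if key not in self.entries:
            self.misses += 1                       # hardware page-table walk
            self.entries[key] = page_table[vpage]
        return self.entries[key]


def run(tagged, xen_mapped_in_guest_space=False, rounds=3):
    guest_pt, xen_pt = {1: 100, 2: 200}, {9: 900}
    if xen_mapped_in_guest_space:
        guest_pt = {**guest_pt, **xen_pt}          # Xen lives in the same space
    tlb = TLB(tagged)
    for _ in range(rounds):
        tlb.switch_address_space(0)                # guest runs
        tlb.translate(1, guest_pt)
        tlb.translate(2, guest_pt)
        if xen_mapped_in_guest_space:
            tlb.translate(9, guest_pt)             # enter Xen, no switch needed
        else:
            tlb.switch_address_space(1)            # enter Xen in its own space
            tlb.translate(9, xen_pt)
    return tlb.misses, tlb.flushes
```

Under these assumptions, `run(tagged=False)` flushes on every entry to and exit from the hypervisor, while `run(tagged=False, xen_mapped_in_guest_space=True)` behaves like the tagged case.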
OS no longer most privileged entity in system. Guest OS must run at a lower privilege level than Xen.
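x86 has 4 privilege levels (rings 0 to 3); rings 1 and 2 are typically unused by operating systems, so the guest OS can be moved out of ring 0.
The guest OS can’t execute privileged instructions, but it remains protected from applications, which run at privilege level 3.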
Privileged instructions “paravirtualized” by requiring them to be validated and executed within Xen.
Exceptions (e.g. memory faults, software traps): Guest OS must register a descriptor table for exception handlers with Xen.
The table is usually the same as on real x86 hardware. The exception is the page fault handler, which would normally read the faulting address from a privileged register, so it needs a workaround.
Only two types of exception occur frequently enough to affect performance: system calls and page faults.
System Calls: Guest OS may install a “fast” handler for system calls, allowing direct calls from an application into its guest OS and avoiding indirecting through Xen on every call.
Page Faults: the same trick can’t be used here, because only code executing in ring 0 can read the faulting address from register CR2.
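The two delivery paths above can be sketched as a toy dispatcher. Everything here is illustrative: `ToyXen`, `set_trap_table` and `deliver` are hypothetical names, and the "direct" path stands in for the hardware jumping straight to the guest handler without entering Xen.

```python
PAGE_FAULT = 14   # x86 page-fault vector
SYSCALL = 0x80    # vector Linux traditionally uses for system calls


class ToyXen:
    def __init__(self):
        self.trap_table = {}       # vector -> guest handler
        self.fast_vectors = set()  # vectors dispatched without entering Xen

    def set_trap_table(self, entries):
        """Register guest handlers; entries maps vector -> (handler, ring)."""
        for vector, (handler, ring) in entries.items():
            if ring == 0:
                # Validation: a guest handler must never execute in ring 0.
                raise PermissionError("guest handlers may not run in ring 0")
            self.trap_table[vector] = handler
            if vector == SYSCALL:
                self.fast_vectors.add(vector)

    def deliver(self, vector, cr2=None):
        handler = self.trap_table[vector]
        if vector in self.fast_vectors:
            # Fast path: application calls into the guest OS directly.
            return "direct", handler({})
        frame = {}
        if vector == PAGE_FAULT:
            # Only ring 0 can read CR2, so Xen copies it for the guest.
            frame["fault_addr"] = cr2
        return "via_xen", handler(frame)


xen = ToyXen()
xen.set_trap_table({
    SYSCALL: (lambda frame: "sys_ok", 1),
    PAGE_FAULT: (lambda frame: frame["fault_addr"], 1),
})
```

With this setup, `xen.deliver(SYSCALL)` takes the direct path, while `xen.deliver(PAGE_FAULT, cr2=0x1000)` goes through Xen so the guest handler can see the faulting address.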
Hardware interrupts replaced with a lightweight event system.
Each guest OS has a timer interface and is aware of both ‘real’ and ‘virtual’ time.
Xen exposes a set of clean and simple device abstractions.
Allows protection and isolation
I/O Data transferred to and from each domain via Xen, using shared-memory asynchronous buffer rings.
Lightweight event delivery mechanism used for sending asynchronous notifications to a domain.
Control and Management
“Separate policy from mechanism”
Keep hypervisor out of as much as possible.
Hypervisor provides only basic control operations.
Exported through an interface accessible only from authorized domains.
A domain that is permitted to use the control interface is created at boot time. This domain (Domain0) is responsible for hosting application-level management software.
Control interface allows creation and termination of other domains and control of their scheduling parameters, physical memory allocation and access given to machine’s physical disks and network devices.
Control interface exported to a suite of application-level management software running in Domain0.
Tools allow creation and destruction of domains, set network filters and routing rules, creation and deletion of virtual network interfaces and virtual block devices.
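The split between mechanism (in the hypervisor) and policy (in Domain0) might look like this in miniature. This is an assumption-laden sketch: the `Hypervisor` class and its method names are hypothetical and do not correspond to Xen's real control interface.

```python
class Hypervisor:
    """Provides only basic control operations; policy lives in Domain0."""

    def __init__(self):
        # Domain0 is created at boot and is the only privileged domain.
        self.domains = {0: {"privileged": True, "pages": 0}}
        self.next_id = 1

    def _check(self, caller):
        if not self.domains.get(caller, {}).get("privileged"):
            raise PermissionError("control interface is restricted")

    def create_domain(self, caller, pages):
        self._check(caller)
        domid = self.next_id
        self.next_id += 1
        self.domains[domid] = {"privileged": False, "pages": pages}
        return domid

    def destroy_domain(self, caller, domid):
        self._check(caller)
        if domid == 0:
            raise ValueError("Domain0 cannot be destroyed")
        del self.domains[domid]


hv = Hypervisor()
guest = hv.create_domain(caller=0, pages=1024)  # management tool in Domain0
```

A call such as `hv.create_domain(caller=guest, pages=1)` raises `PermissionError`, since ordinary domains cannot reach the control interface.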
Cost of Porting
Synchronous calls from a domain to Xen made using a hypercall.
A domain can perform a synchronous software trap into the hypervisor to perform a privileged operation.
Notifications delivered to domains from Xen using asynchronous event mechanism.
Small number of events: new data received, virtual disk request has been completed.
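A minimal sketch of the two directions of control transfer, assuming a per-domain bitmask of pending events and a guest-registered callback. The names `ToyXen`, `hypercall_set_callback`, `held_off` and the event numbers are hypothetical.

```python
EV_NET_RX, EV_DISK_DONE = 0, 1   # illustrative event numbers


class Domain:
    def __init__(self):
        self.pending = 0        # bitmask of pending event types
        self.held_off = False   # guest may defer event delivery
        self.callback = None    # guest's event-callback handler


class ToyXen:
    def hypercall_set_callback(self, dom, fn):
        """Synchronous call from a domain into Xen (a 'hypercall')."""
        dom.callback = fn
        return 0

    def send_event(self, dom, event):
        """Asynchronous notification from Xen to a domain."""
        dom.pending |= 1 << event
        if not dom.held_off and dom.callback:
            self._upcall(dom)

    def release_hold(self, dom):
        dom.held_off = False
        if dom.pending and dom.callback:
            self._upcall(dom)

    def _upcall(self, dom):
        bits, dom.pending = dom.pending, 0
        dom.callback([e for e in range(bits.bit_length()) if bits >> e & 1])


xen, dom, seen = ToyXen(), Domain(), []
xen.hypercall_set_callback(dom, seen.append)
xen.send_event(dom, EV_NET_RX)      # delivered at once
dom.held_off = True
xen.send_event(dom, EV_DISK_DONE)   # only recorded in the bitmask
xen.send_event(dom, EV_NET_RX)
xen.release_hold(dom)               # both delivered in one upcall
```

Holding events off lets a guest batch notifications instead of being woken for each one.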
Data Transfer: I/O Rings
The presence of a hypervisor means there is an additional protection domain between guest OSes and I/O devices, so it is crucial that a data transfer mechanism be provided that allows data to move vertically through the system with as little overhead as possible.
Two main factors have shaped the design of I/O-transfer mechanism: resource management and event notification. For resource accountability, attempt to minimize the work required to demultiplex data to a specific domain when an interrupt is received from a device. The overhead of managing buffers is carried out later where computation may be accounted to the appropriate domain. Similarly, memory committed to device I/O is provided by the relevant domains wherever possible to prevent the crosstalk inherent in shared buffer pools; I/O buffers are protected during data transfer by pinning the underlying page frames within Xen.
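One way to picture such a ring is a fixed-size circular array of descriptors with two pairs of producer and consumer indices, one pair for requests and one for responses. The sketch below assumes that layout; `IORing` and its method names are hypothetical, and the descriptors are plain dictionaries rather than references to pinned buffers.

```python
class IORing:
    """Toy shared ring: the guest produces requests, Xen produces responses."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = self.req_cons = 0   # advanced by guest / by Xen
        self.rsp_prod = self.rsp_cons = 0   # advanced by Xen / by guest

    # Guest side
    def put_request(self, req):
        if self.req_prod - self.rsp_cons >= self.size:
            return False                    # every slot is still in use
        self.slots[self.req_prod % self.size] = req
        self.req_prod += 1
        return True

    def get_response(self):
        if self.rsp_cons == self.rsp_prod:
            return None
        rsp = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return rsp

    # Xen side
    def get_request(self):
        if self.req_cons == self.req_prod:
            return None
        req = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return req

    def put_response(self, rsp):
        if self.rsp_prod >= self.req_cons:
            raise RuntimeError("no consumed request slot to reuse")
        self.slots[self.rsp_prod % self.size] = rsp
        self.rsp_prod += 1
```

Each request carries its own `id` in this model, so a response can be matched to its request even when Xen completes requests out of order.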
Subsystem: CPU Scheduling
Uses Borrowed Virtual Time algorithm.
Has low-latency wakeup of a domain when it receives an event.
Fast dispatch important to minimize effect of virtualization on OS subsystems that need to run in a timely fashion, e.g. TCP relies on timely delivery of acknowledgements to estimate round-trip times.
BVT uses virtual-time warping, which temporarily violates ideal “fair sharing” to favor recently-woken domains.
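The core of BVT can be sketched in a few lines: each domain accumulates virtual time in inverse proportion to its weight, the scheduler runs the domain with the smallest effective virtual time, and a warped domain has its effective time temporarily reduced. This is a simplified model under assumed names (`Dom`, `pick`, `simulate`); it omits warp time limits and the adjustment normally applied to a waking domain's virtual time.

```python
class Dom:
    def __init__(self, name, weight=1, warp=0):
        self.name = name
        self.weight = weight   # larger weight means a larger CPU share
        self.warp = warp       # how far a wake-up may jump the queue
        self.avt = 0.0         # actual virtual time
        self.warped = False

    @property
    def evt(self):
        """Effective virtual time: the value the scheduler compares."""
        return self.avt - (self.warp if self.warped else 0)


def pick(runnable):
    return min(runnable, key=lambda d: d.evt)


def run_slice(dom, ms=10):
    dom.avt += ms / dom.weight   # heavier domains age more slowly


def simulate(doms, slices, ms=10):
    counts = {d.name: 0 for d in doms}
    for _ in range(slices):
        d = pick(doms)
        run_slice(d, ms)
        counts[d.name] += 1
    return counts


# A recently woken I/O domain that is "ahead" in virtual time still runs first.
cpu, io = Dom("cpu"), Dom("io", warp=50)
cpu.avt, io.avt = 100.0, 120.0
before = pick([cpu, io]).name
io.warped = True
after = pick([cpu, io]).name
```

Here `before` is the CPU-bound domain and `after` is the I/O domain: warping briefly breaks fair sharing in exchange for low wake-up latency.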
Subsystem: Time and Timers
Xen provides guest OSes with notions of
Wall-clock time: offset to real time
Each guest OS can program a pair of alarm timers, one for real time and one for virtual time.
Timeouts delivered using Xen’s event mechanism.
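The difference between the two alarm timers is easiest to see in a small model where real time always advances but virtual time advances only while the domain is running. The class `DomainClock` and its fields are assumptions for illustration, with times in nanoseconds.

```python
class DomainClock:
    def __init__(self, wall_offset_ns):
        self.real_ns = 0                    # advances continuously
        self.virtual_ns = 0                 # advances only while running
        self.wall_offset_ns = wall_offset_ns
        self.alarms = {"real": None, "virtual": None}
        self.events = []                    # timeouts delivered as events

    def wall_clock_ns(self):
        # Wall-clock time is expressed as an offset added to real time.
        return self.real_ns + self.wall_offset_ns

    def set_alarm(self, kind, deadline_ns):
        self.alarms[kind] = deadline_ns

    def advance(self, ns, running):
        self.real_ns += ns
        if running:
            self.virtual_ns += ns
        now = {"real": self.real_ns, "virtual": self.virtual_ns}
        for kind, deadline in self.alarms.items():
            if deadline is not None and now[kind] >= deadline:
                self.alarms[kind] = None
                self.events.append(kind)


clk = DomainClock(wall_offset_ns=1000)
clk.set_alarm("real", 50)
clk.set_alarm("virtual", 50)
clk.advance(30, running=True)
clk.advance(30, running=False)   # descheduled: only real time moves
clk.advance(20, running=True)
```

In this run the real-time alarm fires while the domain is descheduled, and the virtual-time alarm fires only after the domain has actually executed for 50 ns.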
Virtual Address Translation
Xen tries to virtualize this with as little overhead as possible.
Harder due to x86’s use of hardware page tables.
VMware: provide each guest OS with a virtual page table, not visible to the memory management unit. Hypervisor responsible for trapping accesses to the virtual page table, validating updates, and propagating changes back and forth between it and the MMU-visible “shadow” page table.
Full virtualization forces use of shadow page tables; Xen is not so constrained.
Xen only involved in page table updates to prevent guest OSes from making unacceptable changes.
Approach: Register guest OS page tables directly with MMU, and restrict guest OSes to read-only access.
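A rough sketch of validated, batched updates follows. It assumes two checks, that a guest may only map frames it owns and may never create a writable mapping of a frame used as a page table. The class `ToyXen` and the method `update_page_tables` are hypothetical names, and rejecting a whole batch on the first bad entry is a simplification.

```python
class ToyXen:
    def __init__(self, owned, pt_frames):
        self.owned = owned            # domain -> set of machine frames it owns
        self.pt_frames = pt_frames    # domain -> frames used as page tables
        self.page_tables = {dom: {} for dom in owned}
        self.hypercalls = 0

    def _validate(self, dom, frame, writable):
        if frame not in self.owned[dom]:
            raise PermissionError("frame not owned by this domain")
        if writable and frame in self.pt_frames.get(dom, set()):
            raise PermissionError("page tables must stay read-only to the guest")

    def update_page_tables(self, dom, batch):
        """Apply a batch of (vpage, frame, writable) updates in one call."""
        self.hypercalls += 1
        for _vpage, frame, writable in batch:   # validate everything first
            self._validate(dom, frame, writable)
        for vpage, frame, writable in batch:    # then apply
            self.page_tables[dom][vpage] = (frame, writable)
        return len(batch)


xen = ToyXen(owned={"dom1": {10, 11, 12}, "dom2": {20}},
             pt_frames={"dom1": {12}})
applied = xen.update_page_tables(
    "dom1", [(0, 10, True), (1, 11, False), (2, 12, False)])
```

Three entries are installed with a single entry into the hypervisor, which is the point of batching; reads of the page table need no hypervisor involvement at all.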
Initial memory allocation or reservation for each domain is specified at the time of its creation. Memory is statically partitioned between domains, providing strong isolation.
Maximum-allowable reservation also specified: if memory pressure in a domain increases, it may then attempt to claim additional memory pages from Xen, up to the limit.
If a domain wants to save resources, it can release pages back to Xen.
XenoLinux implements a balloon driver, which adjusts a domain’s memory usage by passing memory pages back and forth between Xen and XenoLinux’s page allocator.
The Linux MM routines could be modified directly, but the balloon driver makes adjustments by using existing OS functions, thus simplifying the Linux porting effort.
Paravirtualization could be used to extend the capabilities of this driver: e.g. the out-of-memory handling mechanism in the guest OS can be modified to automatically alleviate memory pressure by requesting more memory from Xen.
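The reservation and balloon mechanics can be sketched as simple page accounting. This is a toy model with assumed names (`ToyXen`, `Domain`, `balloon_inflate`, `balloon_deflate`); a real balloon driver obtains and returns pages through the guest's own page allocator.

```python
class ToyXen:
    def __init__(self, free_pages):
        self.free_pages = free_pages


class Domain:
    def __init__(self, xen, reservation, max_reservation):
        if reservation > xen.free_pages:
            raise MemoryError("not enough free memory for initial reservation")
        self.xen = xen
        self.pages = reservation
        self.max_reservation = max_reservation
        xen.free_pages -= reservation

    def balloon_inflate(self, n):
        """Balloon grows: release up to n of the guest's pages back to Xen."""
        n = min(n, self.pages)
        self.pages -= n
        self.xen.free_pages += n
        return n

    def balloon_deflate(self, n):
        """Balloon shrinks: claim up to n pages from Xen, within the limit."""
        n = min(n, self.max_reservation - self.pages, self.xen.free_pages)
        self.pages += n
        self.xen.free_pages -= n
        return n


xen = ToyXen(free_pages=100)
dom = Domain(xen, reservation=40, max_reservation=64)
granted = dom.balloon_deflate(50)    # capped by the maximum reservation
released = dom.balloon_inflate(10)
```

The request for 50 pages is trimmed to what the maximum reservation allows, so one domain's memory pressure cannot eat into another domain's allocation.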
Xen provides abstraction of virtual firewall-router where each domain has 1 or more network interfaces.
Rules for transmit/receive/whatever.
Only Domain0 has direct access to physical disks.
All other domains access disk through abstraction of virtual block devices.
Domain0 manages the VBDs – keeps mechanisms in Xen very simple.
VBD comprises a list of extents with associated ownership and access control information.
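A VBD's extent list can be modelled as a small translation table from virtual sectors to physical locations, with ownership and access checks on the way. The names `Extent`, `VBD` and `translate`, and the disk labels, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    disk: str        # physical disk holding the extent
    start: int       # first physical sector
    length: int      # number of sectors
    writable: bool   # access control for this extent


class VBD:
    def __init__(self, owner, extents):
        self.owner = owner
        self.extents = extents

    def translate(self, domain, sector, write=False):
        """Map a virtual sector to (disk, physical sector), or refuse."""
        if domain != self.owner:
            raise PermissionError("domain does not own this VBD")
        offset = sector
        for ext in self.extents:
            if offset < ext.length:
                if write and not ext.writable:
                    raise PermissionError("extent is read-only")
                return ext.disk, ext.start + offset
            offset -= ext.length
        raise ValueError("sector beyond end of VBD")


vbd = VBD("dom1", [Extent("sda", 1000, 100, True),
                   Extent("sdb", 0, 50, False)])
```

For example, `vbd.translate("dom1", 120)` falls in the second extent and resolves to sector 20 of the hypothetical disk `sdb`.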
Guest OS disk scheduling algorithm will reorder requests prior to queueing them on the ring in an attempt to reduce response time or to supply differentiated service.
Xen has more complete knowledge of actual disk layout, so we support reordering within Xen, and responses may come back out of order.
Xen services batches of requests from competing domains in a simple round-robin fashion; these are then passed to a standard elevator scheduler before reaching disk hardware. Domains can pass down reorder barriers to prevent reordering.
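The combination of round-robin batching, an elevator pass and reorder barriers can be sketched as below. This is a simplified model under stated assumptions: the elevator is approximated by sorting each round's requests by sector, a barrier simply ends that domain's batch for the round, and `dispatch_order` and `BARRIER` are hypothetical names.

```python
from collections import deque

BARRIER = object()   # marker a domain queues to forbid reordering across it


def dispatch_order(queues, batch=2):
    """queues: {domain: [sector or BARRIER, ...]} -> list of (domain, sector)."""
    pending = {dom: deque(reqs) for dom, reqs in queues.items()}
    order = []
    while any(pending.values()):
        window = []
        for dom, q in pending.items():         # round-robin over domains
            taken = 0
            while q and taken < batch:
                req = q.popleft()
                if req is BARRIER:
                    break                      # later requests wait for a later round
                window.append((req, dom))
                taken += 1
        window.sort(key=lambda item: item[0])  # elevator pass by sector
        order.extend((dom, sector) for sector, dom in window)
    return order


with_barrier = dispatch_order(
    {"A": [90, 10, BARRIER, 5], "B": [50, 40, 30]}, batch=4)
without_barrier = dispatch_order(
    {"A": [90, 10, 5], "B": [50, 40, 30]}, batch=4)
```

Without the barrier, domain A's request for sector 5 is served first; with it, sector 5 is held back until A's earlier requests have been dispatched.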