Present the Center by describing:
• This is Intel doing software in Latin America.
• The center has strong empowerment in defining the areas of expertise it wants to work in.
• Focus areas: Services; High Performance Computing (clusters); Pathfinding and Innovation.
Describe the role of ASPI:
• Identify market trends.
• Work through pathfinding technology projects to identify real business opportunities.
Describe how ASPI is divided:
• Areas of expertise, one of which is Virtualization Technology.
• Our mission is to identify opportunities in client platforms for this technology through projects supporting evolving usage models.
Get an understanding of the audience so that you can agree on what the course will provide and so that all expectations are considered and, hopefully, satisfied.
This course has a longer-term goal than just providing a good description of virtualization technology: it also aims to spark new projects or research proposals around virtualization technology.
The contents of this course were established to serve both beginners and advanced attendees:
• For beginners, it starts from the very beginning of virtualization technology with a high-level solution description, then describes usages, software solutions, hardware support and, finally, how to continue.
• For advanced attendees, it provides 2 specific topics where deep analysis is done: VMMs (especially Xen) and hardware-assisted virtualization.
Finally, give a description of the structure of the course, going day by day to clarify expectations. The last day will provide a chance to “play” with some of the VMMs to do some activities.
In computing, virtualization is a broad term that refers to the abstraction of computer resources:
• Platform virtualization, which separates an operating system from the underlying platform resources
o Full virtualization
o Hardware-assisted virtualization
o Paravirtualization
o Operating system-level virtualization
• Resource virtualization, the virtualization of specific system resources, such as storage volumes, name spaces, and network resources
o Virtual memory, which allows uniform, contiguous addressing of physically separate and non-contiguous memory and disk areas
o Redundant array of independent disks and logical volume management, which combine many disks into one large logical disk
o Storage virtualization, the process of completely abstracting logical storage from physical storage
o Channel bonding, the use of multiple links combined to work as though they offered a single, higher-bandwidth link
o Network virtualization, the creation of a virtualized network addressing space within or across network subnets
o Computer clusters and grid computing, the combination of multiple discrete computers into larger metacomputers
• Application virtualization, the hosting of individual applications on alien hardware/software
o Portable application
o Cross-platform virtualization
o Emulation or simulation
• Virtualization development, further work in this area
• Desktop virtualization, the remote manipulation of a computer desktop
In the past
One operating system ran on one machine, so the OS had complete control of the resources in that machine.
Virtualization era
Virtualization is a software layer between the machine and the operating system. Essentially, this software layer divides the resources of the machine among all the guest operating systems. The virtualization layer is in charge of multiplexing the hardware resources among several operating systems.
Each OS has the illusion that it controls the complete hardware but, in fact, the machine can now host a number of operating systems because the virtualization layer does all the switching behind the scenes.
Supporting multiple instances of operating systems, homogeneous or heterogeneous: one physical machine can host several Linux/Windows copies.
Machine states
The machine can exist in any one of a finite number of states, where each state has four components: executable storage E, processor mode M, program counter P, and relocation-bounds register R.
S = (E, M, P, R)
• Executable storage is a conventional word- or byte-addressed memory of size q.
• The relocation-bounds register R = (l, b) is always active, regardless of the machine's current mode. The relocation part l of the register gives an absolute address, which corresponds to the apparent address 0. The bounds part b gives the absolute size (not the largest valid address) of the virtual memory. If it is desired to access all of memory, the relocation must be set to 0 and the bounds to q - 1.
• The mode M takes one of two valid values: supervisor mode and user mode.
There are three properties of interest when analyzing the environment created by a VMM:
• Equivalence: A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.
• Resource control: The VMM must be in complete control of the virtualized resources.
• Efficiency: A statistically dominant fraction of machine instructions must be executed without VMM intervention.
In Popek and Goldberg's terminology, a VMM must present all three properties. In today's terminology, VMMs are typically assumed to satisfy only the equivalence and resource control properties. So, in a sense, Popek and Goldberg's VMMs are today's efficient VMMs.
Virtualization requirements
To derive their virtualization requirements, Popek and Goldberg introduce a classification of the instructions of an ISA into 3 different groups:
• Privileged instructions: Those that trap if the processor is in user mode and do not trap if it is in supervisor mode.
• Control sensitive instructions: Those that attempt to change the configuration of resources in the system (e.g., change the amount of memory used). Example: trying to change the limit of a segment.
• Behavior sensitive instructions: Those whose behavior or result depends on the configuration of resources (the content of the relocation register or the processor's mode). Example: trying to create a new page in a table.
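The machine state described above can be sketched in a few lines of Python. This is only an illustration, not code from Popek and Goldberg: the names `MachineState`, `MemoryTrap` and `load` are hypothetical, and the bounds check `a < b` follows the reading of b as the size of the virtual memory.

```python
from dataclasses import dataclass
from typing import List, Tuple


class MemoryTrap(Exception):
    """Raised when an apparent address falls outside the virtual memory."""


@dataclass
class MachineState:
    E: List[int]        # executable storage of size q
    M: str              # processor mode: "supervisor" or "user"
    P: int              # program counter (an apparent address)
    R: Tuple[int, int]  # relocation-bounds register (l, b)


def load(state: MachineState, a: int) -> int:
    """Read apparent address a through the relocation-bounds register."""
    l, b = state.R
    q = len(state.E)
    # Assumption: b is a size, so valid apparent addresses are 0 .. b - 1.
    if a < 0 or a >= b or l + a >= q:
        raise MemoryTrap(a)
    # Apparent address 0 corresponds to absolute address l.
    return state.E[l + a]


state = MachineState(E=list(range(100, 110)), M="user", P=0, R=(4, 3))
first_word = load(state, 0)  # reads absolute address 4
```

The register is applied in both modes, which is why the function never looks at `state.M`.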
Theorem 1. For any conventional third generation computer, a VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions. Every sensitive instruction then traps to the VMM, which guarantees the resource control property. Non-privileged instructions are instead executed natively (i.e., efficiently). The equivalence property also follows.
Theorem 2. A conventional third generation computer is recursively virtualizable if it is virtualizable and a VMM without any timing dependencies can be constructed for it.
Popek and Goldberg extractions:
Description of VM: A virtual machine is taken to be an efficient, isolated duplicate of the real machine.
Equivalence (defines the limits to create an identical machine): Any program run under the VMM should exhibit an effect identical with that demonstrated if the program had been run on the original machine directly, with the possible exception of differences caused by the availability of system resources and differences caused by timing dependencies.
Efficiency: All innocuous instructions are executed by the hardware directly, with no intervention at all on the part of the control program.
Resource control: The VMM is said to have complete control of resources such as memory, peripherals, and the like.
Definition of trap: When an instruction traps, storage is left unchanged, except for location zero, in which is put the PSW that was in effect just before the instruction trapped.
A trap automatically saves the current state of the machine and passes control to a prespecified routine by changing the processor mode, the relocation-bounds register, and the program counter to the values specified in E[1].
Description of the Virtual Machine Monitor: The virtual machine monitor will be a particular piece of software, which we shall call a control program, that exhibits certain properties. The control program modules fall into three groups, which we present fairly informally:
• First is a dispatcher D. Its initial instruction is placed at the location to which the hardware traps: the value of P in location 1. The dispatcher can be considered as the top-level control module of the control program. It decides which module to call.
• The second set in this skeletal specification has one member, an allocator A. It is the allocator's task to decide what system resource(s) are to be provided. In the case of a single VM, the allocator needs only to keep the VM and the VMM separate. In the case of a virtual machine monitor which hosts several VMs, it is also the allocator's task to avoid giving the same resource (such as part of memory) to more than one VM concurrently.
• The third set of modules in the control program can be thought of as interpreters for all of the other instructions which trap, one interpreter routine per privileged instruction. The purpose of each such routine is to simulate the effect of the instruction which trapped.
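The three module groups above (dispatcher, allocator, interpreters) can be sketched as a toy control program. Everything here is an assumption made for illustration: the two-opcode instruction set (`ADD`, `SET_R`), the class name `ControlProgram`, and the way a trap is modeled as a plain method call rather than a hardware event.

```python
PRIVILEGED = {"SET_R"}  # hypothetical toy ISA: SET_R changes the relocation-bounds register


class ControlProgram:
    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.next_base = 0
        self.vms = {}

    # Allocator A: never hands the same memory to two VMs.
    def allocate(self, vm_id, size):
        if self.next_base + size > self.memory_size:
            raise MemoryError("not enough free storage")
        region = (self.next_base, size)
        self.next_base += size
        self.vms[vm_id] = {"region": region, "virtual_R": (0, size), "acc": 0}
        return region

    # Interpreter routine for the privileged SET_R instruction.
    def interpret_SET_R(self, vm, l, b):
        _, size = vm["region"]
        if l < 0 or b < 0 or l + b > size:
            raise ValueError("guest asked for storage outside its region")
        vm["virtual_R"] = (l, b)

    # Dispatcher D: entered on every trap, picks the interpreter to call.
    def dispatch(self, vm_id, instr):
        opcode, *args = instr
        getattr(self, "interpret_" + opcode)(self.vms[vm_id], *args)

    def real_R(self, vm_id):
        """Value the real relocation-bounds register holds while this VM runs."""
        base, _ = self.vms[vm_id]["region"]
        l, b = self.vms[vm_id]["virtual_R"]
        return (base + l, b)

    def run(self, vm_id, program):
        for instr in program:
            if instr[0] in PRIVILEGED:
                self.dispatch(vm_id, instr)            # traps to the control program
            else:
                self.vms[vm_id]["acc"] += instr[1]     # innocuous ADD runs directly


cp = ControlProgram(memory_size=100)
cp.allocate("a", 40)
cp.allocate("b", 40)
cp.run("b", [("ADD", 2), ("SET_R", 10, 20), ("ADD", 3)])
```

The guest believes it set R to (10, 20); the control program keeps that as the virtual value and offsets it by the base of the region the allocator assigned.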
History of Virtualization
Virtualization is a proven concept that was first developed in the 1960s to partition large mainframe hardware. Today, computers based on the x86 architecture are faced with the same problems of rigidity and underutilization that mainframes faced in the 1960s. VMware invented virtualization for the x86 platform in the 1990s to address underutilization and other issues, overcoming many challenges in the process.
In the Beginning: Mainframe Virtualization
Virtualization was first implemented more than 30 years ago by IBM as a way to logically partition mainframe computers into separate virtual machines. These partitions allowed mainframes to “multitask”: run multiple applications and processes at the same time. Since mainframes were expensive resources at the time, they were designed for partitioning as a way to fully leverage the investment.
The first machine to fully support virtualization was IBM’s VM, which began life as part of the System/360 project. The idea of System/360 (S/360) was to provide a stable architecture and upgrade path to IBM customers. A variety of machines were produced with the same basic architecture, so small businesses could buy a minicomputer if that was all they needed, but upgrade to a large mainframe with the same software later. One key market IBM identified at the time was people wishing to consolidate S/360 machines. A company with a few S/360 minicomputers could save money by upgrading to a single S/360 mainframe, assuming the mainframe could provide the same features.
The Model 67 introduced the concept of a self-virtualizing instruction set. This meant that a Model 67 could be partitioned easily and appear to be a number of (less powerful) versions of itself. It could even be recursively virtualized: each virtual machine could be further partitioned.
The Need for x86 Virtualization
Virtualization was effectively abandoned during the 1980s and 1990s, when client-server applications and inexpensive x86 servers and desktops established the model of distributed computing. Rather than sharing resources centrally in the mainframe model, organizations used the low cost of distributed systems to build up islands of computing capacity. The broad adoption of Windows and the emergence of Linux as server operating systems in the 1990s established x86 servers as the industry standard.
x86 architecture
The generic term x86 refers to the "CISC" type instruction set of the most commercially successful CPU architecture in the history of personal computing, used in processors from Intel®, AMD, VIA, and others. It derives from the model numbers of the first few generations of CPUs, backward compatible with Intel®'s original 16-bit 8086 of 1978, most of which ended in "86". As the x86 term became common after the introduction of the 80386 in 1985, it usually also implies binary compatibility with the extended 32-bit instruction set of the 80386. This may sometimes be emphasized as x86-32 to distinguish it either from the original 16-bit x86-16 or from the newer 64-bit x86-64 (also called x64). Today, x86 hardware usually also implies 64-bit capabilities, at least for personal computers and servers. However, to avoid compatibility problems, x86 software usually implies only 32-bit, while x86-64 or x64 is used to denote exclusively 64-bit software.
The only significant competitors to x86 in PCs were the Motorola 68k (CISC type) and the PowerPC (RISC type) instruction sets. However, by August 7, 2006, Apple Inc. had switched to x86 CPUs, granting the x86 instruction set an effective monopoly among desktop and notebook processors. The x86 also held a growing majority among servers and workstations. Markets without a significant x86 presence include low-cost embedded processors found in appliances and toys, among others.
A vast amount of software is written for the x86 platform, including nearly all modern commercial operating systems from MS-DOS and Microsoft Windows to Linux, BSD, Solaris OS, and Mac OS X, making the x86 instruction set architecture indispensable on a global scale, and practically irreplaceable.
It was cheaper, safer and easier to buy whole new hardware for a new customer with new service requirements than to install the applications on the same hardware.
The logical solution was to implement virtualization in the x86 market, to make use of all the investment already made in the new, cheaper infrastructure.
Hardware utilization. Financial influences: higher utilization means less hardware, which in turn means less cooling and less electricity.
IA-32 (x86) (Main article: X86 virtualization)
The IA-32 instruction set contains 17 sensitive, unprivileged instructions. They can be categorized in two groups:
• Sensitive register instructions: read or change sensitive registers and/or memory locations such as a clock register or interrupt registers:
o PUSHF: Push EFLAGS onto stack
o POPF: Pop EFLAGS from stack
o SGDT: Store global descriptor table (GDT) register
o SIDT: Store interrupt descriptor table (IDT) register
o SLDT: Store local descriptor table (LDT) register
o SMSW: Store machine status word
• Protection system instructions: reference the storage protection system, memory or address relocation system:
o CALL: Call procedure
o JMP: Jump
o INT n: Software interrupt
o LAR: Load access rights
o LSL: Load segment limit
o MOV: Move data between general-purpose registers; move data between memory and general-purpose or segment registers; move immediates to general-purpose registers
o POP: Pop off of stack
o PUSH: Push onto stack
o RET: Return
o STR: Store task register
o VERR: Verify segment for reading
o VERW: Verify segment for writing
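Read against Popek and Goldberg's Theorem 1, the list above is exactly what prevents classic trap-and-emulate on IA-32: these instructions are sensitive but do not trap in user mode. A minimal sketch of that subset test follows. The set `PRIVILEGED_SAMPLE` is a partial sample of privileged IA-32 instructions chosen for illustration, not a complete list, and the function names are hypothetical.

```python
# Mnemonics taken from the list in these notes.
SENSITIVE_UNPRIVILEGED = {
    "PUSHF", "POPF", "SGDT", "SIDT", "SLDT", "SMSW",
    "CALL", "JMP", "INT n", "LAR", "LSL", "MOV",
    "POP", "PUSH", "RET", "STR", "VERR", "VERW",
}

# Assumption: a small sample of instructions that fault outside ring 0.
PRIVILEGED_SAMPLE = {
    "LGDT", "LIDT", "LLDT", "LTR", "LMSW", "CLTS",
    "HLT", "INVD", "WBINVD", "INVLPG", "RDMSR", "WRMSR",
}


def violating_instructions(sensitive, privileged):
    """Sensitive instructions that would not trap to a VMM."""
    return sensitive - privileged


def satisfies_theorem_1(sensitive, privileged):
    """True when the sensitive set is a subset of the privileged set."""
    return sensitive <= privileged


# The privileged sample is itself sensitive: it changes system configuration.
sensitive = SENSITIVE_UNPRIVILEGED | PRIVILEGED_SAMPLE
ok = satisfies_theorem_1(sensitive, PRIVILEGED_SAMPLE)
offenders = violating_instructions(sensitive, PRIVILEGED_SAMPLE)
```

With these sets, `ok` is false and `offenders` is the list from the notes, which is the motivation for the techniques covered later in the course.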
PRIVILEGE LEVELS The processor’s segment-protection mechanism recognizes 4 privilege levels, numbered from 0 to 3. The greater numbers mean lesser privileges. Figure 4-3 shows how these levels of privilege can be interpreted as rings of protection. The center (reserved for the most privileged code, data, and stacks) is used for the segments containing the critical software, usually the kernel of an operating system. Outer rings are used for less critical software. (Systems that use only 2 of the 4 possible privilege levels should use levels 0 and 3.) The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP). To carry out privilege-level checks between code segments and data segments, the processor recognizes the following three types of privilege levels: • Current privilege level (CPL) — The CPL is the privilege level of the currently executing program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of the code segment from which instructions are being fetched. The processor changes the CPL when program control is transferred to a code segment with a different privilege level. • Descriptor privilege level (DPL) — The DPL is the privilege level of a segment or gate. It is stored in the DPL field of the segment or gate descriptor for the segment or gate. When the currently executing code segment attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the segment or gate selector (as described later in this section). 
The DPL is interpreted differently, depending on the type of segment or gate being accessed.
• Requested privilege level (RPL): The RPL is an override privilege level that is assigned to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has sufficient privilege to access the segment, access is denied if the RPL is not of sufficient privilege level. That is, if the RPL of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and vice versa. The RPL can be used to ensure that privileged code does not access a segment on behalf of an application program unless the program itself has access privileges for that segment. See Section 4.10.4, “Checking Caller Access Privileges (ARPL Instruction),” for a detailed description of the purpose and typical use of the RPL.
Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register. The checks used for data access differ from those used for transfers of program control among code segments; therefore, the two kinds of accesses are considered separately in the following sections.
Equivalence Property violated: guest behavior is different than when running in ring 0 as the only guest. Resource Control violated:
• Descriptor privilege level (DPL): The DPL is the privilege level of a segment or gate. It is stored in the DPL field of the segment or gate descriptor for the segment or gate. When the currently executing code segment attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on the type of segment or gate being accessed:
o Data segment: The DPL indicates the numerically highest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment.
o Nonconforming code segment (without using a call gate): The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment.
o Call gate: The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the call gate. (This is the same access rule as for a data segment.)
o Conforming code segment and nonconforming code segment accessed through a call gate: The DPL indicates the numerically lowest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment.
o TSS: The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.)
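The DPL rules above reduce to a few numeric comparisons. The sketch below encodes only the rules as stated in these notes; the function name `can_access` and the `kind` strings are hypothetical, and for nonconforming code segments the RPL is deliberately ignored because the notes state that rule in terms of the CPL alone.

```python
def can_access(kind, cpl, dpl, rpl=None):
    """Return True if code at privilege level cpl may access the target.

    Lower numbers mean more privilege (0 is the most privileged ring).
    """
    # A numerically greater RPL overrides the CPL.
    effective = cpl if rpl is None else max(cpl, rpl)
    if kind in ("data", "call_gate", "tss"):
        # DPL is the numerically highest level still allowed in.
        return effective <= dpl
    if kind == "nonconforming_code":
        # Direct transfer without a call gate: levels must match exactly.
        return cpl == dpl
    if kind == "conforming_code":
        # DPL is the numerically lowest level allowed in.
        return cpl >= dpl
    raise ValueError("unknown segment or gate kind: " + kind)


# The examples from the notes:
data_ok = can_access("data", cpl=1, dpl=1)                   # allowed
data_denied = can_access("data", cpl=2, dpl=1)               # denied
conforming_denied = can_access("conforming_code", cpl=1, dpl=2)  # denied
```

A guest kernel moved from ring 0 to ring 1 fails the `cpl == dpl` and `effective <= dpl` comparisons wherever its descriptors assume DPL 0, which is one way the ring change becomes visible to the guest.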
GDT: Global Descriptor Table
LDT: Local Descriptor Table
IDT: Interrupt Descriptor Table
TR: Task Register
Virtual Machine Monitor
Each virtual machine interfaces with its host system via the virtual machine monitor (VMM). Being the primary link between a VM and the host OS and hardware, the VMM plays a crucial role. The VMM primarily:
• Presents emulated hardware to the virtual machine
• Isolates VMs from the host OS and from each other
• Throttles individual VM access to system resources, preventing an unstable VM from impacting system performance
• Passes hardware instructions to and from the VM and the host OS/hypervisor
When full virtualization is employed, the VMM will present a complete set of emulated hardware to the VM's guest operating system. This includes the CPU, motherboard, memory, disk, disk controller, and network cards. For example, Microsoft Virtual Server 2005 emulates an Intel 21140 NIC and an Intel 440BX chipset. Regardless of the actual physical hardware on the host system, the emulated hardware remains the same.
The next significant role of the VMM is to provide isolation. The VMM has full control of the physical host system's resources, leaving individual virtual machines with access only to their emulated hardware resources. The VMM contains no mechanisms for inter-VM communication, thus requiring that two virtual machines wishing to exchange data do so over the network.
Another major role of the VMM is to manage host system resource access. This is important, as it can prevent over-utilization by one VM from starving the other VMs on the same host. Through the system configuration console, system hardware resources such as the CPU, network, and disk access can be throttled, with maximum usage percentages assigned to each individual VM. This allows the VMM to properly schedule access to host system resources as well as to guarantee that critical VMs will have access to the amount of hardware resources they need to sustain their operations.
Host OS/Hypervisor The primary role of the host operating system or hypervisor is to work with the VMM to coordinate access to the physical host system's hardware resources. This includes scheduling access to the CPU as well as the drivers for communication with the physical devices on the host, such as its network cards. The term hypervisor is used to describe a lightweight operating shell that has the sole purpose of providing VM hosting services. The hypervisor differs from a traditional OS in that the OS may be designed for other roles on the network. As it is tailored to VM hosting, a hypervisor solution generally offers better performance and should have fewer security vulnerabilities because it runs few services and contains only essential code. Hypervisors written for hardware-assisted virtualization can embed themselves much deeper into the system architecture and offer superior performance improvements as a result. Like any traditional OS, a hypervisor-based OS still contains its own operating system code; therefore, maintaining security updates is still important. Unlike a traditional OS, hypervisors are vendor specific, so any needed hypervisor patches or security updates will come directly from the virtualization software vendor. Because hypervisors are vendor-centric, individual device support often comes directly from the virtualization vendors. Hence, it is important for the organization to ensure that any planned virtualization products are compatible with its existing or planned system hardware. When hosting VMs on a traditional OS such as SUSE Linux Enterprise Server or Windows Server “Longhorn,” the organization will find that while the host OS has a larger footprint than a hypervisor, it does provide additional flexibility with hardware devices. With SAN integration, for example, if the host OS does not recognize a Fibre Channel host bus adapter (HBA), the administrator can download the appropriate driver from the vendor's website. 
With a hypervisor, the administrator will need to get the driver from the virtualization software vendor, or learn that the device is not supported. Both hypervisors and operating systems have their strengths and weaknesses. Operating systems provide greater device support than hypervisors, but also require attention to ensure that they are current on all patches and security updates. Hypervisors run on minimal disk and storage resources, but patches and device drivers must come directly from the virtualization software vendor.
Dynamic binary translation
The VMM modifies the guest’s binary image at runtime to get processor control when guest software attempts to perform a privileged operation. The VMM can then emulate the privileged operation and return control to guest software.
There are different approaches to full virtualization: emulation and dynamic binary translation.
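The idea can be shown with a toy translator. This is a sketch under strong simplifying assumptions: guest code is a list of opcode strings rather than machine code, `CLI` and `STI` stand in for the operations the VMM must intercept, and `ToyVMM` and `translate_block` are hypothetical names.

```python
INTERCEPTED_OPS = {"CLI", "STI"}  # toy stand-ins for operations the VMM must control


def translate_block(block):
    """Rewrite a guest block so intercepted operations call into the VMM."""
    translated = []
    for op in block:
        if op in INTERCEPTED_OPS:
            translated.append(("CALL_VMM", op))
        else:
            translated.append(("NATIVE", op))
    return translated


class ToyVMM:
    def __init__(self):
        self.virtual_if = True  # the guest's virtual interrupt flag
        self.cache = {}         # translated blocks, keyed by block id
        self.native_log = []    # stands in for direct execution on the CPU

    def emulate(self, op):
        # Apply the effect to the guest's virtual state, not the real CPU.
        if op == "CLI":
            self.virtual_if = False
        elif op == "STI":
            self.virtual_if = True

    def run(self, block_id, block):
        # Translate each block once, at runtime, then reuse the translation.
        if block_id not in self.cache:
            self.cache[block_id] = translate_block(block)
        for kind, op in self.cache[block_id]:
            if kind == "CALL_VMM":
                self.emulate(op)
            else:
                self.native_log.append(op)


vmm = ToyVMM()
vmm.run(0, ["ADD", "CLI", "SUB"])
```

The cache is what separates translation from plain emulation in this sketch: a block that runs many times pays the rewriting cost only once.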
Comparing Traditional x86 Architecture and Virtualized Resource Access Operating systems designed for x86/x64 environments are written to have full access to Ring 0, which is where they run privileged OS instructions. Privileged instructions include OS kernel and device driver access to system hardware. Applications run at Ring 3. In a virtualized environment, the VMM runs at Ring 0 along with the host operating system's kernel and device drivers. Each VM cannot be given full access to Ring 0 without inducing conflicts, so the VMM runs all VMs at Ring 1. Because privileged instructions within the guest expect to run at Ring 0, the VMM must provide translation in order to “trick” the guest into believing that it has Ring 0 access. If the guest OS kernel did not demand Ring 0 access in the first place, then the translation would not be necessary and thus performance would improve substantially. This is where paravirtualization comes into play.
Hardware Emulation
Hardware emulation comes with a performance price, as the VMM translates instructions between the emulated hardware and the actual system device drivers. The overhead is usually less than 2% for emulated devices such as RAM. For input/output-intensive devices such as network cards and hard disks, emulation comes at a much higher price: from 8% to 20%.
Paravirtualization Paravirtualization was developed as a means to overcome the emulation requirement of privileged instructions from virtual machines. With paravirtualization, virtualization application programming interfaces (APIs) and drivers are loaded into the kernel of guest operating systems. This allows the guest operating systems to run while fully aware of the virtualization architecture and thus run kernel-level operations at Ring 1. The end result is that privileged instruction translation is not necessary. The architectural differences between paravirtualization and full virtualization exist between the VM and the VMM. Paravirtualization requires the existence of paravirtualization device drivers in the guest VM, the guest VM's OS, the VMM, and the hypervisor. By including paravirtualization APIs within the guest OS kernel, the guest is fully aware of how to process privileged instructions; thus, privileged instruction translation by the VMM is no longer necessary. Furthermore, paravirtualized device drivers such as for network and storage devices are written to communicate with the VMM and hypervisor drivers. Hence, the VMM does not have to present a legacy device to the guest OS and then translate its instructions for the physical host operating system or hypervisor. Removing the heavy emulation requirements from the VMM reduces its workload to merely isolating and coordinating individual VM access to the physical host's hardware resources. The other benefit of paravirtualization is hardware access. With appropriate device drivers in its kernel, the guest OS is now capable of directly communicating with the system hardware. Note that this doesn't mean that the VM has direct access to all system hardware. In most instances, some system hardware will be available, while other hardware devices will appear as generic representations, as determined by the paravirtualization drivers within the VM. 
To determine which elements of hardware are paravirtualized and which are available for direct access, consult with the prospective virtualization software vendor.
Paravirtualization
The most efficient approach to virtualization is paravirtualization. In paravirtualization, the guest operating system uses a specialized API to talk to the VMM, which is responsible for handling the virtualization requests and passing them to the hardware. Because of this special API, the VMM doesn't need to do a resource-intensive translation of instructions before they are passed to the hardware. Also, when using the paravirtualization API, the virtualized operating system is capable of generating much more efficient instructions. A disadvantage, however, is that you need a modified operating system that includes this specific API, and for certain operating systems (mainly Windows) this is an important disadvantage because that kind of API is not available.
For optimal performance, paravirtualization is currently the way to go, because instructions that are generated by the virtualized operating system don't need to be translated. Unfortunately, for some operating systems it is not possible to use complete paravirtualization, as it requires a specialized version of the operating system. To ensure good performance in such environments, paravirtualization can be applied to individual devices. This means that, for example, CPU instructions are handled by hardware virtualization, while the requests issued to specific devices such as network boards or graphics cards are modified by paravirtualized drivers before they leave the virtual machine.
Some vendors offer paravirtualized driver packages for specific operating systems. These are frequently available as a separate purchase. It often pays to use these specialized driver sets, as the performance of devices like the network board and the hard disk can increase considerably.
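The "specialized API" can be pictured as a guest kernel that calls the VMM explicitly instead of issuing instructions the VMM would have to intercept and translate. The sketch below is a toy model only: `ToyHypervisor`, `ParavirtGuestKernel` and the `map_pages` request are hypothetical names and do not correspond to the interface of Xen or any other product.

```python
class ToyHypervisor:
    def __init__(self):
        self.page_tables = {}  # per-guest page mappings owned by the VMM
        self.calls = 0         # number of guest-to-VMM transitions

    def hypercall(self, guest_id, name, *args):
        """Single explicit entry point that the modified guest kernel uses."""
        self.calls += 1
        handler = getattr(self, "_hc_" + name)
        return handler(guest_id, *args)

    def _hc_map_pages(self, guest_id, mappings):
        table = self.page_tables.setdefault(guest_id, {})
        table.update(mappings)
        return len(mappings)


class ParavirtGuestKernel:
    """A guest kernel modified to know that it runs on a VMM."""

    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hypervisor = hypervisor

    def map_pages(self, mappings):
        # One batched request instead of one intercepted write per entry.
        return self.hypervisor.hypercall(self.guest_id, "map_pages", dict(mappings))


hv = ToyHypervisor()
guest = ParavirtGuestKernel("g1", hv)
mapped = guest.map_pages({0: 5, 1: 7, 2: 9})
```

Three mappings cost a single transition into the VMM here, which illustrates the kind of saving the notes attribute to paravirtualization. It also shows the cost: the guest kernel had to be rewritten to call `hypercall`.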
Hardware-assisted virtualization Hardware-assisted virtualization is very likely to emerge as the standard for server virtualization well into the future. While the first-generation hardware that supports hardware-assisted virtualization offers better CPU performance and improved virtual machine isolation, future enhancements promise to extend both performance (such as memory) and isolation on the hardware level. The key to isolation and memory performance lies in dedicating hardware space to virtual machines. This will come in the form of dedicated address space that is assignable to each VM. AMD-V's forthcoming nested paging support will remove the paging bottleneck found in the current shadow paging methodology and in turn improve memory performance. Note that Intel will offer the same functionality, referred to as Extended Page Tables (EPT), in future enhancements to its VT chips. CPUs that support hardware-assisted virtualization are fully aware of the presence of the server virtualization stack. With hardware-assisted virtualization enabled via the system's Complementary Metal Oxide Semiconductor (CMOS) setup, the system will automatically reserve physical address space exclusively for virtual machines. This provides true isolation of virtual machine resources. Also note the existence of a device I/O pass-through bus in the virtualization stack. This is significant because virtual machines can use this bus to access high I/O devices such as disk and network directly instead of through emulated hardware resources. However, the pass-through bus, also known as the VMBus, is part of the VMM/hypervisor architecture for hypervisors designed to support hardware-assisted virtualization. Keep in mind that while the pass-through bus can provide a clear data path to physical hardware resources, all control information is processed by the VMM, which prevents one VM from taking full control of a hardware resource.
Hardware-assisted virtualization

Guest Operating Systems Provided with Ring 0 Access via Hardware-Assisted Virtualization

By allowing the hypervisor's VMM to run below Ring 0, guest operating systems can process privileged instructions without any translation on the part of the VMM. When an AMD-V or Intel VT platform detects the presence of the VMM, it allows the VMM to run at Ring -1, that is, in a super-privileged mode. The VMM maintains control of processor, memory, and system hardware access in order to coordinate access to hardware resources. At the same time, the VMM allocates specific hardware address space to each VM, thus providing hardware isolation between VMs.

Forthcoming releases of AMD-V and Intel VT chips will improve memory paging support. Full virtualization, paravirtualization, and first-generation hardware-assisted virtualization all rely on Shadow Page Tables (SPT) to translate RAM access for virtual machines. To manage memory, the VMM maintains an SPT for each VM in software. When a VM attempts to write to memory, the VMM intercepts the request, translates it, and stores the result in the SPT associated with the VM. The required translation causes significant performance overhead (25% to 75%) for memory paging.

AMD-V's Nested Page Tables (NPT) support and Intel VT's Extended Page Tables (EPT) support allow direct translation between guest OS memory addresses and physical host memory addresses. NPT and EPT enable VM guest operating systems to directly modify their own allocated physical page tables and also to handle their own page faults. With the VMM essentially acting as a bridge between a VM and physical memory space on the physical host, the memory performance bottleneck of SPT will no longer exist.
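The contrast between shadow paging and nested paging can be sketched with a toy model. In the sketch below, page tables are plain dictionaries from page number to page number, and the two classes (ShadowPagingVMM and NestedPagingVMM, both hypothetical names) count how often the VMM has to intercept a guest page-table update. Real hardware uses multi-level tables and TLBs, none of which is modeled here.

```python
class ShadowPagingVMM:
    """Toy model: the VMM keeps a software shadow table for the VM."""

    def __init__(self, guest_to_host):
        self.guest_to_host = guest_to_host  # guest-physical -> host-physical page
        self.shadow = {}                    # guest-virtual -> host-physical page
        self.intercepts = 0

    def guest_writes_pte(self, gva_page, gpa_page):
        # Every guest page-table write is intercepted and translated
        # by the VMM, which then stores the result in the shadow table.
        self.intercepts += 1
        self.shadow[gva_page] = self.guest_to_host[gpa_page]

    def translate(self, gva_page):
        return self.shadow[gva_page]


class NestedPagingVMM:
    """Toy model: hardware walks the guest table and the nested table."""

    def __init__(self, guest_to_host):
        self.guest_to_host = guest_to_host  # nested table, set up by the VMM
        self.guest_table = {}               # owned and edited by the guest
        self.intercepts = 0

    def guest_writes_pte(self, gva_page, gpa_page):
        # The guest edits its own table directly; the VMM is not involved.
        self.guest_table[gva_page] = gpa_page

    def translate(self, gva_page):
        # Two-stage lookup: guest-virtual -> guest-physical -> host-physical.
        return self.guest_to_host[self.guest_table[gva_page]]


guest_to_host = {0: 40, 1: 41, 2: 42}
guest_updates = [(10, 0), (11, 1), (12, 2)]

shadow_vmm = ShadowPagingVMM(guest_to_host)
nested_vmm = NestedPagingVMM(guest_to_host)
for gva_page, gpa_page in guest_updates:
    shadow_vmm.guest_writes_pte(gva_page, gpa_page)
    nested_vmm.guest_writes_pte(gva_page, gpa_page)

print(shadow_vmm.translate(11), nested_vmm.translate(11))
print("intercepts:", shadow_vmm.intercepts, nested_vmm.intercepts)
```

Both models resolve the same guest-virtual page to the same host page; the difference is that the shadow model pays one VMM intercept per guest update, while the nested model pays none in software.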
Desktop and application management are, if anything, greater enterprise pain points than server sprawl, skyrocketing energy costs and sluggish datacenter infrastructure. Intractable problems include running applications across incompatible operating systems, managing desktop sprawl and ensuring efficient provisioning and secure execution of desktop applications.
• Unlike server virtualization approaches, which are comparable if not standard, virtualization techniques for desktop and application management differ so greatly from one another that they come in at least six distinct flavors or use cases:
o Desktop virtualization itself
o Server-side workspace virtualization
o Client-side workspace virtualization
o Application isolation
o Application streaming
o Virtual appliances
Desktop virtualization, that is, emulating one operating system upon another, lets organizations run multiple operating systems supporting all applications seamlessly. Already beloved of PC gamers and enterprise Mac zealots, it’s especially important for programmers working to Flickr-speed iteration cycles, giving them the ability to test against multiple target environments simultaneously and securely.
Desktop consolidation, or server-side workspace virtualization, can replace traditional server-based computing with centrally managed PC images that are nonetheless fully customizable from the end user’s point of view. In the server-hosted model, a pool of virtual workspaces resides on the server, and remote users log into them from any networked device via Microsoft’s Remote Desktop Protocol (RDP). This architecture lets users customize their virtual workspace to their heart’s content, while operators enjoy the relatively straightforward task of managing desktop configuration on one central server rather than on proliferating desktop machines. A more useful term for this model might be desktop consolidation, but we’ll continue using server-side workspace virtualization in order to maintain the distinction between it and client-side workspace virtualization.

VMware’s server-side workspace virtualization offering, Virtual Desktop Infrastructure (VDI), is by far the best known product in this category (although stealth vendors are working on alternate platforms). Hewlett-Packard and IBM base their own packages around it to help sell their own server hardware. Significantly, VDI just got enhanced with the addition of Propero, a virtual connection broker.

Connection brokers are a critical component of server-side workspace virtualization. They arbitrate access to a pool of virtual workspaces residing on a central server and break the 1:1 ratio between virtual workspaces and end users, thus optimizing resource use. Dunes and LeoStream sell connection brokers as well. Provision Networks has a platform-agnostic approach, supporting Microsoft, Virtuozzo and Xen virtualization alongside terminal services. Even Citrix has gotten in on the game, and just in the nick of time.
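The connection broker role described above can be sketched as a small pool manager. The ConnectionBroker class, its methods and the workspace and user names below are hypothetical and only illustrate the idea of sharing a pool that is smaller than the user population; they do not reflect how Propero, LeoStream or any other product is implemented.

```python
class ConnectionBroker:
    """Toy broker that hands out workspaces from a shared pool."""

    def __init__(self, workspace_ids):
        self.free = list(workspace_ids)  # workspaces nobody is using
        self.assigned = {}               # user -> workspace in use

    def connect(self, user):
        # A user who is already connected keeps the same workspace.
        if user in self.assigned:
            return self.assigned[user]
        # No free workspace: the request cannot be served right now.
        if not self.free:
            return None
        workspace = self.free.pop(0)
        self.assigned[user] = workspace
        return workspace

    def disconnect(self, user):
        # Return the workspace to the pool so another user can take it.
        workspace = self.assigned.pop(user, None)
        if workspace is not None:
            self.free.append(workspace)


# Two workspaces serve three users who are not all online at once.
broker = ConnectionBroker(["ws-1", "ws-2"])
first = broker.connect("alice")
second = broker.connect("bob")
third = broker.connect("carol")      # pool exhausted
broker.disconnect("alice")
retry = broker.connect("carol")      # reuses the released workspace

print(first, second, third, retry)
```

In this model the pool needs to be sized for the number of simultaneous users only, which is one way to read the claim that brokers break the 1:1 ratio between workspaces and end users.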
Workspace virtualization advocates believe the technology improves upon terminal services by allowing end users more freedom to customize their workspaces, by providing better isolation between multiple workspaces hosted on a single machine, and by including high availability and disaster recovery as part of the package. Citrix’s Desktop Server requires customers to have Presentation Server, so customers have to use Citrix’s proprietary ICA to connect to Presentation Server, and then Microsoft’s Remote Desktop Protocol to connect to the virtual machine. Besides the architectural redundancy of this model, licenses for Presentation Server are going to eat into potential cost savings. Critics complain, with some justification, that this is the worst of all possible worlds. Citrix must, however, respond to the workspace virtualization threat. Organizations that want centralized control of desktops – that is, practically all of them – now have an option that overcomes end-user resistance to the comparatively restrictive terminal services model. We believe (and the virtualization customers who responded to our survey agree) that desktop and workspace virtualization in general, and server-side workspace virtualization in particular, will be this market’s hot spot for the next 12-18 months. It’s a very strong model, but like all architectures it has some obvious weaknesses. The biggest problem with server-hosted workspace virtualization is that it’s a bandwidth hog. Performance is constrained by the performance of your network.
In client-side workspace virtualization, a virtual workspace is served out to execute on the client device. This approach requires a desktop virtualization underpinning, so that Kidaro, for example, works with desktop VMMs from Microsoft and VMware, and Sentillion’s vThere works with Parallels. VMware has its own offering in this space, and the product’s original name, Assured Computing Environment (now just ACE), hints at its strength. Client-side workspace virtualization centralizes management, yes, but its big advantage over other models is the security and isolation of data and logic on the client. It’s the right model for organizations that need to ensure the security of environments served to remote users – defense contractors, for example, and healthcare providers.
Application isolation is also called encapsulation, a term which more precisely conveys what the software actually does. It packages a desktop application with the necessary extra bits and pieces included in the package, so that when the application executes on the desktop it does so within a secure sandbox. In Thinstall’s model, for example, applications use a virtual registry and file system embedded in the package with the application. These extra tools insulate applications from changes to and incompatibility with the underlying desktop operating system. Thinstall, in other words, is all about ending DLL hell – the mare’s nest created on Windows machines when multiple applications make obscure changes to the operating system’s registry, file system and libraries. Each application has its own unique registry and file system, but not a complete operating system (so users don’t have to pay for an extra Windows license.) Although Windows is the obvious market for application isolation, Linux and Solaris machines have their share of configuration drift as well. That’s probably why tiny Trigence has earned a blue-chip partner roster for its take on encapsulation, aimed at the Linux and Solaris markets. Isolation confers all the usual benefits of virtualization: better provisioning, better security and data protection, and more granular license management. Its limitations are mostly around the increased footprint of the application package and the correspondingly greater memory requirements. Compare the diagrams for client-hosted workspace virtualization and application isolation, and you’ll see that the two are very similar. The important distinction is that where application isolation assumes a host operating system on the client device, client-side workspace virtualization expects to see a desktop virtual machine.
Application streaming is intentionally named to evoke media streaming. Just as a QuickTime video will start playing before the whole file has downloaded from the server, a streamed application arrives on the client desktop in the order necessary for it to start executing before all of it has been streamed down from the server. For example, if it’s Microsoft Word streaming from the server, users should be able to open a new document even as the ‘help’ files are downloading in the background.

With its origins in the gaming industry (a market still being pursued by Endeavors and Exent), streaming has expanded to embrace enterprise application delivery in the form of Altiris Software Virtualization Solution, AppStream, Citrix Presentation Server Application Streaming (the former Project Tarpon) and Microsoft SoftGrid. Streaming combines the security and license management benefits of application isolation with just-in-time delivery.

Its limitations are mostly technical. Application code must be organized to stream in the correct order so that it can start execution before the download is complete. You don’t get the full PC environment, just the application, so you have to provide a workspace. That means maintaining the client-side operating system and ensuring compatibility. This may be why application streaming, which has been around for a long time (AppStream has already raised over $50m in venture capital), has not really lived up to its early hype.

With recent advances in server virtualization, it is now possible to combine application streaming with client-side workspace virtualization and get around all the limitations of application streaming alone. Client-side workspace virtualization provides the full PC environment as well as central maintenance. Application streaming adds just-in-time delivery of desktop applications. This combination of workspace virtualization with application streaming holds enormous promise in the Windows world. It should simplify application delivery, ensure license management and prevent operating system drift by keeping control central even as end users have all the power to personalize their environment.
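The requirement that application code be organized to stream in the correct order can be illustrated with a short sketch. The Block type, the block names and the two helper functions are hypothetical; real streaming products decide the order with their own packaging tools, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    needed_at_startup: bool


def streaming_order(blocks):
    # Stable sort: startup blocks first, original order kept otherwise.
    return sorted(blocks, key=lambda block: not block.needed_at_startup)


def blocks_until_launch(ordered):
    """Number of blocks that must arrive before the application can start."""
    count = 0
    for position, block in enumerate(ordered, start=1):
        if block.needed_at_startup:
            count = position
    return count


# Blocks listed in the order they happen to sit in the package.
package = [
    Block("help_files", False),
    Block("core_engine", True),
    Block("templates", False),
    Block("ui_shell", True),
]

ordered = streaming_order(package)
print([block.name for block in ordered])
print(blocks_until_launch(package), blocks_until_launch(ordered))
```

Streamed in package order, the application in this example could not start until all four blocks had arrived; reordered, it can start after two, with the help files and templates following in the background.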
As this discussion should illustrate, the approaches to desktop and application virtualization are many, with different strengths and weaknesses and multiple interdependencies. The overall picture is not so much one of a standard architecture, as in the Parthenon model of server virtualization, where different process automation tools (for example) offer comparable features and can be substituted for one another. Application virtualization is more like a series of atoms that can be combined to form larger molecules. Certain technologies take similar approaches and thus resemble one another, but work at different layers of the software stack. That’s why we’ve adopted the periodic table of elements as the visual metaphor for this market sector. Mapping the techniques as if they were atoms allows us to illustrate key patterns. We can plot streaming and virtual appliances separately, to indicate that they are distribution models as much as execution modes. We can suggest dependencies by proximity, so that IBM’s VDI offering sits next to LeoStream’s connection broker, for example, and Kidaro and Sentillion are aligned with the Microsoft and Parallels desktop virtualization layers on which they depend. We can illustrate the enormous power and influence of VMware, which has an entry in practically every category, and the ambition of Microsoft and Citrix, which are scrambling to catch up. We can suggest fruitful avenues for innovation. The open source universe, for example, presently lacks direct equivalents to VDI for server-side and ACE for client-side workspace virtualization. Most importantly, by sorting the many approaches to desktop and application virtualization into an orderly matrix such as this, we can preserve the subtlety and complexity of this competitive-yet-cooperative landscape, while striving to keep all the technologies straight in our minds.
Review all of the day's subjects and return to any of them if questions arise. The most important concepts, highlighted in red, will be reused and analyzed in more depth during the course.
• Requirements for HW Architecture Virtualization: Popek and Goldberg
o A Model of Third Generation Machines: machine states
o Instructions classification: privileged instructions, control sensitive instructions, behavior sensitive instructions
o Properties for a Virtual Machine Monitor: equivalence, resource control, efficiency
• Evolution for virtualization: from mainframes to x86 architecture due to business reasons
• Challenges around x86 virtualization -> ISA doesn’t comply with P&G
o The IA-32 instruction set contains 17 sensitive, unprivileged instructions
• Server virtualization approaches
o Full Virtualization
o Paravirtualization
o Hardware Assisted Virtualization
• Client virtualization approaches
o Desktop virtualization
o Server-side workspace virtualization
o Client-side workspace virtualization
• Application virtualization
o Application isolation
o Application streaming
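The Popek and Goldberg requirement recalled above (every sensitive instruction must also be privileged, so that it traps to the VMM) can be written as a set check. The instruction sets below are a small illustrative subset of IA-32, not the full list of 17 instructions mentioned in the summary, and the function names are hypothetical.

```python
def is_virtualizable(sensitive, privileged):
    # Popek and Goldberg: the sensitive instructions must be a
    # subset of the privileged instructions.
    return sensitive <= privileged


def problem_instructions(sensitive, privileged):
    # Sensitive instructions that do not trap when run unprivileged.
    return sensitive - privileged


# Illustrative subset only (assumption: chosen for this example).
privileged = {"LGDT", "LIDT", "LMSW"}
sensitive = {"LGDT", "LIDT", "LMSW", "SGDT", "SIDT", "SMSW", "POPF"}

print(is_virtualizable(sensitive, privileged))
print(sorted(problem_instructions(sensitive, privileged)))
```

Instructions such as SGDT, SIDT, SMSW and POPF expose or depend on privileged state without trapping in user mode, which is why the check fails for this subset and why x86 needed techniques beyond classic trap-and-emulate.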
Virtualization Technology Introduction Argentina Software Pathfinding and Innovation Intel® Corporation 28 July 2008
Why is Intel giving this course? Argentina Software Development Center in Córdoba - Strong investment in developing areas of expertise Software Pathfinding and Innovation - Seeking the next technological move Strategic Area in Virtualization Technology - Evolving expertise in Virtualization Technology - Augment critical mass in this area Introduction
What is virtualization?
[Diagram: a ‘nonvirtualized’ system (Hardware, Operating System, App. A to App. D) next to a virtualized system (Hardware, Virtualization Layer, two Virtual Containers holding App. A, App. B and App. C, App. D)]
• ‘Nonvirtualized’ system: a single OS controls all hardware platform resources
• Virtualized system: it makes it possible to run multiple Virtual Containers on a single physical platform
• Virtualization is a broad term (virtual memory, storage, network, etc.)
• Focus for this course: platform virtualization
• Virtualization basically allows one computer to do the job of multiple computers, by sharing the resources of a single piece of hardware across multiple environments
Big disadvantage: machine utilization is very low; most of the time it is below 25%.
Evolution of Virtualization
[Diagram: four separate x86 machines, each running its own applications: Windows XP, Windows 2003, Suse and Red Hat. Hardware utilization labels on the slide: 12%, 15%, 18% and 10%.]
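As a back-of-the-envelope reading of the four utilization figures on this slide, the sketch below adds them up as if all four workloads were moved onto one machine. This rests on strong assumptions that the slide does not state: the four machines have identical hardware, load adds linearly, and the virtualization layer itself costs nothing.

```python
# Utilization figures taken from the slide, in percent.
utilization_percent = [12, 15, 18, 10]

# Assumption: identical hardware and purely additive load.
combined = sum(utilization_percent)
headroom = 100 - combined

print("combined utilization:", combined, "%")
print("remaining headroom:", headroom, "%")
```

Under those assumptions a single machine would run at roughly 55% utilization instead of four machines each running below 25%.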
Just-in-time delivery of a server-hosted application to the desktop, such that the desktop application can execute before the entire file has been downloaded from the server
Use cases include:
Managing the number of instances of running applications, in the case of license constraints
Superset of Application Isolation, including a delivery method and an execution mode
You stream the application code to the desktop, where it runs in isolation
No full PC environment, just the application, so you have to provide a workspace
Requires maintaining the client-side operating system and ensuring compatibility. This may be why application streaming, which has been around for a long time (AppStream has already raised over $50m in venture capital), has not really lived up to its early hype.