  • ‡ doom [duːm] n. (uncountable) ① fate (usually ill fate), destiny; misfortune; ruin; death. ② (an unfavorable) judgment. ③ the Last Judgment (handed down by God). ④ (History) a statute.
  • In real mode and 16-bit protected mode, the x86 has six general 16-bit registers (AX, BX, CX, DX, SI, DI), two special stack registers (BP and SP), one 16-bit flags register (FLAGS), and four segment registers (CS, SS, DS, ES). The first four general registers are split into top and bottom 8-bit halves (AX = AH:AL, BX = BH:BL, CX = CH:CL, DX = DH:DL), which are independently usable in 8-bit instruction forms. The instruction pointer (IP) register exists but is used only implicitly (though its value can be stored on the stack and accessed without a problem). Starting with the Intel 80386 processor, 32-bit protected mode extended the 16-bit registers to 32 bits (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EFLAGS, EIP). The older 16-bit registers are overlaid on the bottom half of the 32-bit registers and can be accessed with an instruction override. There is no "high-half" 16-bit register access; instead, Intel chose to generalize the addressing so that every register could be used for scaled index addressing, and so that EBP could be used as a general register as well as a stack register. CLI/HALT: CLI is Clear Interrupt Flag (disable interrupts). Of course you can do that, unless your app runs under an NT-based OS (W2K, XP, Vista). On those OSes there is no real DOS, only an emulation layer. The emulation layer works to some extent, unless you want to manipulate hardware directly or cross the memory boundary of your application or any other kind of memory protection. Last but not least, you cannot use privileged opcodes, because your code resides in user space and therefore runs in Ring 3 of the CPU; only kernel-space code runs in Ring 0, where privileged opcodes such as STI, CLI, HLT and many others are allowed.
  • What kind is VMware?
  • Initially, a block of source instructions is interpreted, and profiling is used to determine which instruction sequences are frequently executed. A frequently executed block may then be binary translated. Multiprogramming: state is mapped 1:1; instructions run natively; state materialization is provided by hardware. Dynamic translation: registers are mapped to host registers as available (overflowing to memory) and memory is mapped to host memory; instructions are emulated; state materialization is provided by VM software. HLL VMs: state is mapped to host resources as available; instructions are emulated or JIT compiled; state materialization is provided by VM software.
  • In computing, just-in-time compilation (JIT), also known as dynamic translation, is a technique for improving the runtime performance of a computer program. It converts, at runtime, code from one format into another, for example bytecode into native machine code. The performance improvement originates from caching the results of translating blocks of code, rather than simply evaluating each line or operand separately (see Interpreted language) or compiling the code at development time. JIT builds upon two earlier ideas in run-time environments: bytecode compilation and dynamic compilation. Several modern runtime environments, such as Microsoft's .NET Framework and most implementations of Java, rely on JIT compilation for high-speed code execution.
  • “Classic” VMs: state is mapped 1:1, except for privileged registers; instructions run natively, except that privileged instructions trap; state materialization is provided by hardware. Whole-system VMs: state is mapped to available memory, not 1:1; instructions are emulated; state materialization is provided by VM software. Co-designed VMs: state is mapped 1:1; instructions are block-level translated; state materialization is provided by a combination of hardware and VM software.
  • bane [bein] n. (uncountable) poison; harm; disaster; (cause of) ruin; death. Example: Gambling was the bane of his existence.
  • PALO ALTO, Calif., January 31, 2001 -- VMware, Inc. today announced a cooperative research and development agreement for a joint initiative with the U.S. National Security Agency (NSA) to enhance and certify the security of VMware's virtual machine technology. The project, which builds upon VMware's patent pending MultipleWorlds(tm) technology, will enable government users to safely use commercial off-the-shelf software for certain sensitive or classified applications and environments. NSA's project NetTop plans to use security enhanced virtual machines as building blocks for applications requiring separation of information domains, such as providing secure remote access to classified computer networks over the Internet. NSA expects NetTop to deliver components suitable for use by the national security community. VMware plans to incorporate the security enhancements resulting from the agreement in its future product releases.
  • visor [váizər] n., vt. (History) the face guard of a helmet, (to cover with a face guard); the brim (of a cap); a mask, (to put on a mask); = SUN VISOR; a disguise.
  • POPF: Pop data into the flags register. CS: The segment containing the currently executing sequence of instructions is known as the current code segment; it is specified by means of the CS register. The 80386 fetches all instructions from this code segment, using as an offset the contents of the instruction pointer. CS is changed implicitly as the result of intersegment control-transfer instructions (for example, CALL and JMP), interrupts, and exceptions.
  • Windows XP Service Pack 1a When you try to start a Pre-Boot Execution Environment (PXE) client computer, you may receive one of the following error messages:
  • Windows Vista was known by its codename "Longhorn". [1]
  • Windows Management interface

Transcript

  • 1. An Overview of Virtual Machine Architectures
  • 2. Slide sources
    • Virtual Machines Background, Kenneth Chiu, Computer Science Department at SUNY Binghamton
    • Windows Virtualization Architecture, Mark Kieffer, Group Program Manager, Windows Virtualization, Microsoft Corporation
    • CS 140 Lecture 26: Virtual Machine Monitors, Mendel Rosenblum, Stanford CS Department
    • An Overview of Virtual Machine Architectures, Smith and Nair
  • 3. Review: What is an OS?
    • software between applications and reality:
      • abstracts hardware and makes portable
      • makes finite into (near)infinite
      • provides protection
    [Figure: applications (gcc, emacs, Doom, XXI) on top of the OS, on top of the hardware]
  • 4. What If?
    • Process abstraction looked just like hardware!
    [Figure: an OS/Virtual Machine Monitor on the hardware, hosting applications (gcc, emacs; Doom, XXI) that each run on their own OS and their own copy of the hardware]
  • 5. Virtual Machines
    • A virtual machine treats hardware and the operating system kernel as though they were all hardware.
    • A virtual machine provides an interface identical to the underlying bare hardware.
    • For example, the operating system creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory.
    • The resources of the physical computer are shared to create the virtual machines.
    • A normal user time-sharing terminal serves as the virtual machine operator’s console.
  • 6. System Models
    [Figure: non-virtual machine vs. virtual machine]
  • 7. Definitions
    • Instruction Set Architecture (ISA)
      • Precise specification of the interface between hardware and software
    • Application Binary Interface (ABI)
      • Defines how an application can work with a platform at the binary level. (Contrast with API.)
      • Includes user ISA, system call interface, etc.
      • Suppose an ABI is changed.
        • Recompile?
        • Source changes?
  • 8. Virtualization
    • VMM also known as hypervisor.
    [Figure: layer diagrams. Without virtualization: Application / OS / ISA / Hardware. With virtualization: Application / OS / Virtual ISA / VMM / ISA / Hardware, with the guest, the host, and the virtual machine labeled]
  • 9. Advantages/Disadvantages of Virtual Machines
    • The VM concept provides complete protection of system resources
      • since each VM is isolated from all other VMs.
    • What might be bad about this?
      • Isolation, however, permits no direct sharing of resources.
    • A VM system is a perfect vehicle for OS research and development.
    • The VM concept is difficult to implement
      • effort required to provide an exact duplicate to the underlying machine.
  • 10. Process vs. System
    • Meaning of “machine” depends on perspective.
      • To a process, the machine is the system calls, libraries, etc.
        • Already abstract.
      • The entire system also runs on a machine.
        • Includes ISA, actual devices, etc.
      • Other kinds of machines?
    • As there are two perspectives, there are two kinds of virtual machines: process and system.
      • Process virtual machine can support an individual process.
      • System virtual machine can run a complete OS plus environment.
  • 11. Process vs. System
    • Examples?
    [Figure: Process VM vs. System VM. Process VM panel labels: x86, Linux, Java VM, native apps, Java programs. System VM panel labels: x86, Linux, VMM, native apps, Windows, W32 apps]
  • 12. Making a process look like hardware - CPU
    • Observation: Most instructions are the same regardless of processor privilege level.
        • Example: inc %eax
    • Why not just give the CPU to the virtual machine to execute the instructions?
      • Safety – How are we going to get it back? Or stop it from stepping on us? How about CLI/HALT?
      • Answer: Use protection mechanism.
    • Run virtual machine directly on CPU at non-privileged level.
      • Most instructions just work.
      • Privileged instructions trap into the monitor, which runs a simulator on the instruction.
      • Makes some assumptions about architecture.
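
The trap-and-emulate idea on this slide can be sketched with a toy model. Everything below is an illustrative assumption: the mini-ISA, the `ToyCPU` and `Monitor` classes, and the use of a Python exception to stand in for a hardware trap are not taken from any real VMM.

```python
# Toy model of trap-and-emulate. The instruction names are a hypothetical
# mini-ISA for illustration, not real x86 encodings.
PRIVILEGED = {"cli", "sti", "hlt"}


class PrivilegeTrap(Exception):
    """Raised when deprivileged code issues a privileged instruction."""


class ToyCPU:
    """Stands in for the real CPU running the guest at non-privileged level."""

    def __init__(self):
        self.regs = {"eax": 0}

    def execute(self, instr):
        op, *args = instr
        if op in PRIVILEGED:
            raise PrivilegeTrap(op)  # hardware protection forces a trap
        if op == "inc":
            self.regs[args[0]] += 1  # ordinary instruction: runs directly
        else:
            raise ValueError(f"unknown instruction: {op}")


class Monitor:
    """Catches traps and simulates the privileged instruction on virtual state."""

    def __init__(self):
        self.cpu = ToyCPU()
        self.virtual_if = True  # the guest's virtual interrupt-enable flag
        self.halted = False
        self.traps = 0

    def emulate(self, op):
        if op == "cli":
            self.virtual_if = False
        elif op == "sti":
            self.virtual_if = True
        elif op == "hlt":
            self.halted = True

    def run(self, program):
        for instr in program:
            if self.halted:
                break
            try:
                self.cpu.execute(instr)
            except PrivilegeTrap as trap:
                self.traps += 1
                self.emulate(trap.args[0])


monitor = Monitor()
monitor.run([("inc", "eax"), ("cli",), ("inc", "eax"), ("hlt",), ("inc", "eax")])
```

The guest's CLI only clears a virtual flag kept by the monitor, so the real machine keeps its interrupts and the monitor can always regain control.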
  • 13. Complete Machine Simulation
    • Build a simulation of all the hardware.
      • CPU – A loop that fetches an instruction, decodes it, and simulates its effect on the machine state.
      • Memory – Physical memory is just an array, simulate the MMU on all memory accesses.
      • I/O – Simulate I/O devices, programmed I/O, DMA, interrupts.
    • Problem: Too slow!
      • 100x slowdown makes it not too useful.
      • CPU/Memory – 100x CPU/MMU simulation.
      • I/O Device – <2x slowdown.
    • Need to emulate CPU/MMU fast enough.
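
A minimal sketch of the CPU loop described above, assuming a hypothetical accumulator machine whose instructions (`load`, `add`, `store`, `jnz`, `halt`) are invented for illustration. Physical memory is just a Python list, and there is no MMU or I/O model here.

```python
def simulate(program, memory, max_steps=10_000):
    """Fetch, decode, and simulate each instruction of a toy accumulator machine."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        op, arg = program[pc]  # fetch
        pc += 1
        if op == "load":       # decode, then simulate the effect on machine state
            acc = memory[arg]
        elif op == "add":
            acc += memory[arg]
        elif op == "store":
            memory[arg] = acc
        elif op == "jnz":
            if acc != 0:
                pc = arg
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit exceeded")


memory = [2, 3, 0]
result = simulate(
    [("load", 0), ("add", 1), ("store", 2), ("halt", None)],
    memory,
)
```

Every simulated instruction costs many host instructions (the tuple unpacking, the chain of comparisons, the bookkeeping for `pc`), which is where the large CPU slowdown on this slide comes from.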
  • 14. How is a process different from HW?
    • Process
    • CPU – Non-Privileged registers and instructions.
    • Memory – Virtual memory.
    • Exceptions – signals, errors.
    • I/O - File System, Directory, Files, raw devices.
    • Hardware
    • CPU – All registers and instructions.
    • Memory – Both virtual and physical memory, memory management, TLB/page tables, etc.
    • Exceptions – Trap architecture, interrupts, etc.
    • I/O – I/O devices accessed using programmed I/O, DMA, interrupts.
  • 15. Virtualization
    • The state of a machine must be maintained.
      • Physical machine: latches, flip-flops, etc.
      • Virtual machine: combination of physical machine and state emulated in software using RAM, etc.
    • At certain points in execution, such as a trap, the state of the machine must be “materialized”.
      • Not trivial due to complex hardware techniques used to provide high performance.
      • This ability to materialize the state is termed “preciseness”.
    • Three aspects of virtualization
      • State: registers and memory
      • Instructions: may involve emulation
      • State materialization: when exceptions occur
  • 16. Key Ideas
    • VMs can support an individual process only, or can support a whole OS.
    • Can construct a useful taxonomy based on:
      • process or system
      • same ISA or different ISA
  • 17. Taxonomy
    • Process
      • Same ISA
        • Multiprogramming
        • Dynamic optimization
      • Different ISA
        • Dynamic translators
        • HLL VM
    • System
      • Same ISA
        • “Classic” OS VMs (IBM)
        • Hosted VMs
      • Different ISA
        • Whole system
        • Co-designed VMs
  • 18. Process VMs
    • Multiprogramming
      • A process has the illusion of having the whole machine to itself.
    • Emulation
      • Interpreted. (Define.)
      • Translated. (Define.)
      • What are relative merits?
    • Dynamic optimizers
      • Especially useful with some kind of profile-directed translation.
    • High Level Language VMs
      • High-level language is compiled to an intermediate language.
      • VM then runs the intermediate language.
      • Example is Java: Interpreted or translated?
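
The difference between interpreting and translating can be sketched on a toy stack bytecode. The bytecode, the function names, and the cache keyed by block name are assumptions made for illustration; Python closures stand in for translated native code.

```python
import operator

BINOPS = {"add": operator.add, "mul": operator.mul}


def interpret(block):
    """Interpretation: decode every instruction each time the block runs."""
    stack = []
    for op, *args in block:
        if op == "push":
            stack.append(args[0])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(BINOPS[op](a, b))
    return stack.pop()


def translate(block):
    """Translation: decode once, producing a reusable list of host-level steps."""
    steps = []
    for op, *args in block:
        if op == "push":
            steps.append(lambda stack, value=args[0]: stack.append(value))
        else:
            def binop(stack, fn=BINOPS[op]):
                b, a = stack.pop(), stack.pop()
                stack.append(fn(a, b))
            steps.append(binop)

    def run():
        stack = []
        for step in steps:
            step(stack)
        return stack.pop()

    return run


translation_cache = {}


def execute_translated(name, block):
    """Translate a block on first use, then reuse the cached translation."""
    if name not in translation_cache:
        translation_cache[name] = translate(block)
    return translation_cache[name]()


block = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
```

Interpretation has no startup cost but pays the decode cost on every execution. Translation pays once up front and wins when the block is executed often.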
  • 19. Java Virtual Machine
    • JVM executes platform-neutral bytecodes.
    • JVM consists of
    • - class loader
    • - class verifier
    • - runtime interpreter
    • Just-In-Time (JIT) compilers increase performance
  • 20. System VMs
    • Same ISA
      • “Classic” (Define. Pros/cons?)
        • VMM built directly on top of hardware.
        • Most efficient, but requires wiping the slate clean.
        • Requires device drivers in the VMM.
      • Hosted (Define. Pros/cons?)
        • VMM built on top of existing OS.
        • Most convenient
        • Device drivers supplied by host OS, VMM uses facilities provided by host OS.
    • Different ISA
      • Whole System VMs: Emulation
        • ISA not the same, must emulate everything.
      • Co-Designed VMs: Optimization
        • Hardware designed to support VMs.
        • Provides a clean design for virtualization.
        • Can be significantly more efficient.
  • 21. Virtual Machine Monitor
    • Thin layer of software that virtualizes the hardware
      • Exports a virtual machine abstraction that looks like the hardware
    [Figure: apps on one operating system on the hardware, versus apps on several operating systems on a Virtual Machine Monitor (VMM) on the hardware]
  • 22. Old idea from the 1960s
    • IBM VM/370 – A VMM for IBM mainframe
      • Multiplex multiple OS environments on expensive hardware.
      • Desirable when few machines are around.
    • Interest died out in the 1980s and 1990s.
      • Hardware got cheap.
      • Compare Windows NT versus N DOS machines
    • Interesting again today
      • Different problems today – software management
      • VMM attributes still relevant
  • 23. Virtual Machine Monitor attributes
    • Software compatibility
      • Runs pretty much all software
      • Trick: Make virtual hardware match real hardware.
    • Low overheads/High performance
      • Near “raw” machine performance
      • Direct execution of CPU/MMU.
    • Complete isolation
      • Total data isolation between virtual machines
      • Use hardware protection.
    • Encapsulation
      • Virtual machines are not tied to physical machines
      • Checkpoint/Migration.
  • 24. Different thought about OSes
    • Installing software on hardware is broken
      • Tight coupling of OS and applications to hardware creates management problems.
    • Want to subdivide OS:
      • Hardware drivers
      • Hardware management
      • System support software
    • Turn OSes into normal software that can be managed
  • 25. Backward compatibility with VMMs
    • Backward compatibility is bane of new OSes.
      • Huge effort required to innovate but not break.
    • Recent security considerations make it impossible
      • Choice: Close security hole and break apps or be insecure
    • Example: Not all WinNT applications run on WinXP.
      • In spite of a huge effort to make WinXP compatible.
      • Given the number of applications that run on WinNT, practically any change will break something.
        • If (OS == WinNT)….
    • Solution: Use a VMM to run both WinNT and WinXP
      • Obvious for OS migration as well: Windows -> Linux
  • 26. Cisco Content Engine 590
    [Figure labels: Intel appliance; Linux; Windows 2000; RealPlayer Server; Media Server; IP chain]
  • 27. Isolation: Access to Classified Networks
    • Traditional tension: Security vs. Usability
      • Secure systems tend not to be that usable.
      • Flexible systems are not that secure.
    • Additional information assurance requirement:
      • Data cannot flow between networks of different classification.
    • Solution: Run two VMs:
      • Classified VM
      • Internet VM
    • Use isolation property to isolate two VMs
      • VMM has control of the information flow between machines
        • Declassifier mechanism
  • 28. National Security Agency NetTop
    [Figure labels: classified VM; VPN; Internet VM; firewall; SE-Linux]
  • 29. Logical partitioning of server machines
    • Run multiple servers on same box
      • Ability to give away less than one machine
        • Modern CPUs have more power than most services need.
      • 0.10U rack space machine - Better power, cooling, floor space, etc.
      • Server consolidation trend: N machines -> 1 real machine
    • Isolation of environments
      • Printer server doesn’t take down Exchange server
      • Compromise of one VM can’t get at data of others
    • Resource management
      • Provide service-level agreements
    • Heterogeneous environments
      • Linux, FreeBSD, Windows, etc.
  • 30. Scenario: Server Consolidation
    [Figure: separate web, app, and database servers consolidated as virtual machines on VMware MultipleWorlds + physical hardware]
  • 31. VMM Arrangements
    • Type-2 VMM: guests on a VMM on a host OS on the hardware. Examples: JVM, CLR
    • Hybrid VMM: guests on a VMM beside the host OS on the hardware. Examples: Virtual PC & Virtual Server (what we have today)
    • Type-1 VMM (Hypervisor): guests on a VMM directly on the hardware. Examples: Windows Virtualization (what we’re building for the future)
  • 32. The Hypervisor
    • Very thin layer of software
      • Highly reliable
      • Much smaller Trusted Computing Base (TCB)
    • No built-in driver model
      • Leverage the large base of Windows drivers
      • Drivers run in a partition
    • Will have a well-defined, published interface
      • Allow others to create support for their OS’s as guests
    • Hardware virtualization assists are required
      • Intel Virtualization Technology
      • AMD “Pacifica”
  • 33. Monolithic vs. Microkernelized
    • Monolithic hypervisor
      • Simpler than a modern kernel, but still complex
      • Contains its own drivers model
    • Microkernelized hypervisor
      • Simple partitioning functionality
      • Increase reliability and minimize TCB
      • No third-party code
      • Drivers run within guests
    [Figure: monolithic hypervisor on the hardware, holding the drivers beneath VM 1 (“Admin”), VM 2, and VM 3; microkernelized hypervisor on the hardware beneath VM 1 (“Parent”, with the virtualization stack), VM 2 (“Child”), and VM 3 (“Child”), with the drivers inside the VMs]
  • 34. The Hypervisor
    [Figure labels: CPU; hard drive; Ethernet NIC; RAM]
    • Thin layer of software running on the hardware
    • Supports creation of partitions
      • Each partition is a virtual machine
      • Each partition has one or more virtual processors
      • Partitions can own or share hardware resources
      • Software running in partition is called a guest
    • Enforces memory access rules
    • Enforces policy for CPU usage
      • Virtual processors are scheduled on real processors
    • Enforces ownership of other devices
    • Provides simple inter-partition messaging
      • Messages appear as interrupts
    • Exposes simple programmatic interface called “hypercalls”
    [Figure labels: hypervisor; parent partition (minimum-footprint Windows)]
  • 35. CPU Virtualization Requirements
    • Need protection levels to run VMs and monitors
    • All unsafe/privileged operations should trap
      • Example: disable interrupt, access I/O dev, …
      • x86 problem: POPF (different semantics in different rings)
    • Privilege level should not be visible to software
      • Software in VM should not be able to query its privilege level and find out that it is running in a VM
      • x86 problem: MOV ax, cs
    • Trap should be transparent to software in VM
      • Software in VM should not be able to tell if an instruction trapped.
      • x86 problem: traps can destroy machine state.
    • Lost art.
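
The POPF problem can be illustrated with a deliberately simplified model. The function below tracks only the interrupt-enable flag and the rule that POPF changes it only when CPL <= IOPL; the function name and parameters are hypothetical, and the other rules of the real instruction are not modeled.

```python
IF_FLAG = 1 << 9  # interrupt-enable flag (IF) is bit 9 of EFLAGS


def popf(current_flags, popped_value, cpl, iopl=0):
    """Return the new flags value after a simplified POPF.

    Only IF is modeled: it changes when CPL <= IOPL and is silently
    preserved otherwise. Real POPF has further rules not modeled here.
    """
    if cpl <= iopl:
        return popped_value
    return (popped_value & ~IF_FLAG) | (current_flags & IF_FLAG)


flags = IF_FLAG | 0x1            # interrupts enabled, carry flag set
ring0 = popf(flags, 0x1, cpl=0)  # guest kernel on real hardware: IF cleared
ring3 = popf(flags, 0x1, cpl=3)  # same code deprivileged: IF unchanged, no trap
```

Because the deprivileged case neither traps nor takes effect, the monitor never gets a chance to update the guest's virtual interrupt flag, so plain trap-and-emulate is not enough for this instruction.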
  • 36. CPU Trap architecture virtualization
    • What happens when an interrupt or trap occurs.
      • Like all OSes: we trap into the monitor.
    • What if the interrupt or trap should go to the VM?
      • Example: Page fault, illegal instruction, system call, interrupt.
    • Run the simulator again.
      • X86 example: Lookup trap vector in VM’s IDT.
      • Push cs, eip, eflags, on stack.
      • Switch to privileged mode.
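
The delivery steps on this slide can be sketched as follows. `VirtualCPU`, `deliver_trap`, and the dictionary standing in for the guest's IDT are assumptions for illustration; a real x86 delivery also handles stack switching and error codes, which are left out.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Register state the monitor keeps for one virtual CPU (heavily simplified)."""
    cs: int
    eip: int
    eflags: int
    privileged: bool = False
    stack: list = field(default_factory=list)


def deliver_trap(vcpu, guest_idt, vector):
    """Simulate delivery of a trap to the guest's own handler."""
    handler_cs, handler_eip = guest_idt[vector]  # look up vector in the VM's IDT
    # x86 pushes EFLAGS first, then CS, then EIP.
    vcpu.stack.extend([vcpu.eflags, vcpu.cs, vcpu.eip])
    vcpu.privileged = True                       # switch to virtual privileged mode
    vcpu.cs, vcpu.eip = handler_cs, handler_eip  # resume the guest at its handler


# Hypothetical selector and address values, chosen only for the example.
vcpu = VirtualCPU(cs=0x1B, eip=0x1000, eflags=0x202)
deliver_trap(vcpu, {14: (0x08, 0xC0001000)}, vector=14)
```

The guest kernel still runs deprivileged on the real CPU; only the monitor's record of the virtual privilege level changes.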
  • 37. Virtualization requirements - Virtualizing Memory
    • Basic MMU functionality:
      • OS manages physical memory (0…MAX_MEM).
      • OS sets up page tables mapping VA->PA.
      • CPU accesses to a VA should go to the mapped PA. With paging off: PA = VA.
      • Used for every instruction fetch, load, or store.
    • Need to implement a virtual physical memory
      • Logically need additional level of indirection
        • VM’s VA -> VM’s PA -> machine address
    • Trick: Use hardware MMU to simulate virtual MMU.
      • Can be folded into page tables: VA->machine address
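
The extra level of indirection, and the way it can be folded into a single table, can be sketched with two dictionaries. The 4 KiB page size, the table contents, and the function names are assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def guest_va_to_machine(va, guest_page_table, pmap):
    """Translate in two steps: VM's VA -> VM's PA -> machine address."""
    vpn, offset = divmod(va, PAGE_SIZE)
    ppn = guest_page_table[vpn]  # mapping set up by the guest OS
    mpn = pmap[ppn]              # mapping set up by the monitor
    return mpn * PAGE_SIZE + offset


def fold(guest_page_table, pmap):
    """Compose both mappings into one VA -> machine-page table."""
    return {
        vpn: pmap[ppn]
        for vpn, ppn in guest_page_table.items()
        if ppn in pmap
    }


guest_page_table = {0: 5, 1: 7}  # virtual page -> guest "physical" page
pmap = {5: 100, 7: 42}           # guest "physical" page -> machine page
folded = fold(guest_page_table, pmap)
```

The folded table is what the hardware MMU can walk directly, so ordinary memory accesses need no software translation.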
  • 38. MMU Virtualization
    • Trick: Monitor keeps shadow of VM’s page table
      • Contains mapping to physical memory allocated for that VM.
      • Access causes Page Fault:
        • Lookup in VM’s page table mapping from VPN to PPN.
        • Determine where PPN is in machine memory (MPN).
          • Monitor can demand page the virtual machine
        • Insert mapping from VPN->MPN into shadow page table.
    • Uses hardware protection
      • Monitor never maps itself into VM’s page table
      • Monitor never maps memory allocated to other VMs in VM’s page table
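
A minimal sketch of the shadow page table fault path, assuming dictionaries for all three tables and a free list of machine pages. The class and method names are hypothetical, and invalidation when the guest edits its page table is not modeled.

```python
class GuestPageFault(Exception):
    """The guest itself has no mapping: the fault belongs to the guest OS."""


class ShadowMMU:
    def __init__(self, guest_page_table, free_machine_pages):
        self.guest_page_table = guest_page_table  # VPN -> PPN, owned by guest OS
        self.pmap = {}                            # PPN -> MPN, owned by monitor
        self.shadow = {}                          # VPN -> MPN, used by hardware
        self.free = list(free_machine_pages)      # pages allocated to this VM only

    def access(self, vpn):
        if vpn not in self.shadow:                # hardware would fault here
            self.handle_fault(vpn)
        return self.shadow[vpn]

    def handle_fault(self, vpn):
        if vpn not in self.guest_page_table:
            raise GuestPageFault(vpn)             # forward to the guest's handler
        ppn = self.guest_page_table[vpn]          # look up VPN -> PPN
        if ppn not in self.pmap:                  # demand-page the virtual machine
            self.pmap[ppn] = self.free.pop()
        self.shadow[vpn] = self.pmap[ppn]         # insert VPN -> MPN


mmu = ShadowMMU({0: 3, 1: 3}, free_machine_pages=[10, 11])
first = mmu.access(0)   # fills the shadow entry on demand
second = mmu.access(1)  # same guest physical page, so same machine page
```

Because the shadow table only ever receives pages from this VM's own pool, the hardware protection checks enforce isolation from the monitor and from other VMs.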
  • 39. I/O device virtualization
    • Type of communication:
      • Special instruction – IN/OUT.
      • Memory mapped I/O (PIO).
      • Interrupts.
      • DMA.
    • Virtualization
      • Make IN/OUT and PIO trap into monitor.
      • Run simulation of I/O device.
    • Simulation:
      • Interrupt – Tell CPU simulator to generate interrupt.
      • DMA – Copy data to/from physical memory of virtual machine.
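
The trap-then-simulate flow for devices can be sketched with a toy device model. The console device, its port number, and its IRQ number are invented for illustration and do not describe any real PC device.

```python
class ToyConsole:
    """Software model of a hypothetical output device with one data port."""
    IRQ = 4

    def __init__(self, raise_irq):
        self.output = bytearray()
        self.raise_irq = raise_irq

    def write_port(self, value):
        self.output.append(value & 0xFF)
        self.raise_irq(self.IRQ)  # tell the CPU simulator to generate an interrupt

    def read_port(self):
        return 0x01               # status: always ready


class IOMonitor:
    """Receives trapped IN/OUT operations and routes them to device models."""

    def __init__(self):
        self.devices = {}
        self.pending_irqs = []

    def attach(self, port, device_cls):
        device = device_cls(self.pending_irqs.append)
        self.devices[port] = device
        return device

    def trap_out(self, port, value):
        self.devices[port].write_port(value)

    def trap_in(self, port):
        return self.devices[port].read_port()

    def dma_write(self, guest_memory, address, data):
        # DMA: copy data into the virtual machine's physical memory.
        guest_memory[address:address + len(data)] = data


io = IOMonitor()
console = io.attach(0x10, ToyConsole)
for byte in b"hi":
    io.trap_out(0x10, byte)
```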
  • 40. Virtual Machine Uses
    • Emulation
      • One ISA can be used to emulate another.
      • Provides cross-platform portability.
    • Optimization
      • Emulators can optimize as they emulate.
      • Also can optimize same ISA to same ISA.
    • Replication
      • A single physical machine can be replicated, providing isolation between the VMs.
    • Composition
      • Two virtual machines can be composed, combining the functionality of each.
  • 41. Example: Using VMM to enhance security
    • Problem Area: Intrusion Detection Systems (IDS).
    • Trade-offs
      • Host-based IDS (HIDS):
        • + Good visibility to catch intruder.
        • - Weak isolation from intruder disabling/masking IDS.
      • Network-based IDS (NIDS):
        • + Good isolation from attack from intruder.
        • - Weak visibility can allow intruder to slip by unnoticed.
    • Would like visibility of HIDS with isolation of NIDS.
      • Idea: Do it in the virtual machine monitor.
  • 42. VMM-based Intrusion Detection System
    • Strong isolation
      • VMM isolates software in the VM from the VMM.
      • Compromised OS in VM can’t disable IDS in VMM.
    • Introspection – Peer inside at software running in VM
      • VMM can see: Physical memory, registers, I/O device state, etc.
      • Signature scan of memory
        • Look through physical memory for patterns or signs of break-in
    • Interposition – Modify VM abstraction to enhance security
      • Memory Access Enforcer
        • Interpose on page protection.
      • NIC Access Enforcer
        • Interpose on virtual network device.
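
The signature scan mentioned under introspection can be sketched as a byte-pattern search over a snapshot of the VM's physical memory. The signature name and pattern are made up, and a real IDS would use far richer signatures than exact byte strings.

```python
def scan_memory(memory, signatures):
    """Return (name, offset) for every occurrence of each signature in memory."""
    hits = []
    for name, pattern in signatures.items():
        start = memory.find(pattern)
        while start != -1:
            hits.append((name, start))
            start = memory.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical snapshot of guest physical memory, taken by the monitor.
snapshot = bytes(16) + b"EVIL" + bytes(4) + b"EVIL"
hits = scan_memory(snapshot, {"toy-rootkit": b"EVIL"})
```

The scan runs in the monitor against memory the guest cannot hide, which is what gives it host-level visibility with network-level isolation.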
  • 43. Collective Project: A Compute Utility
    • Distributed system where all software runs in VMs
      • Research with Prof. Monica Lam and students.
    • Virtual Appliance abstraction
      • x86 virtual machine.
      • Target specialized environment (e.g. program development)
      • Store in a centralized persistent storage repository.
      • Cached on the machine where virtual appliances run.
    • Target benefits
      • System administration
        • Centralize and amortize administration of a virtual appliance.
      • Mobility
        • Computing environment follows user around.
  • 44. Virtualizing I/O Devices on VMware Workstation’s Host VMM
  • 45. Virtualizing the PC Platform
    • Several hurdles
      • Non-virtualizable processor
        • Some privileged instructions fail silently. (Why is this a problem?) (What’s the solution?)
      • PC hardware diversity
        • Why is this problematic for a “classic” VM?
      • Pre-existing PC software
        • Must stay compatible
    • To address these, VMware uses a hosted VM. (Not a “classic” VM.)
  • 46. Hosted VMware Architecture
    • VMware achieves both near-native execution speed and broad device support by transparently switching* between Host Mode and VMM Mode.
      • *VMware typically switches modes 1000 times per second.
    • VMM Mode: The VMware virtual machine monitor allows each guest OS to directly access the processor (direct execution).
    • Host Mode: VMware, acting as an application, uses the host to access other devices such as the hard disk, floppy, or network card.
    [Figure labels: guest OS applications; guest operating system; virtual machine; virtual machine monitor; VMware app; VMware driver; host OS apps; host OS; PC hardware (disks, memory, CPU, NIC)]
  • 47. Two Worlds
    • VMApp runs in the host, using the VMDriver host kernel component to establish the VMM.
    • CPU is thus executing in either the host world or the virtual world, using VMDriver to switch worlds.
    • World switches are expensive, since user and system state must be switched.
  • 48. Virtualizing the NIC
    • I/O port operations by guest OS must be intercepted by VMM.
      • Must then be processed in the VMM (to maintain the virtual state).
      • Or executed in the host world. (When must it do what?)
    • Send operations start as a sequence of ops to virtual I/O ports.
      • Upon finalization of the send, the VMApp issues a host OS syscall to the VMNet driver, which passes it on to the real NIC.
      • Finally requires raising a virtual IRQ to signal completion.
    • Receive operations operate in reverse.
      • VMApp executes select() syscall on possible sources.
      • Reads packet, forwards it to VMM which raises a virtual IRQ.
  • 49. Virtualizing a Network Interface
    [Figure labels: guest OS; NIC driver; VMM; VMApp; VMDriver; host OS; virtual network hub; virtual bridge; NIC driver; PC hardware; physical NIC; physical Ethernet]
  • 50. Details
    • Send
      • Guest OS out to I/O port
      • Trap to VMDriver
      • Pass to VMApp
      • Syscall to VMNet
      • Pass to actual NIC driver
    • Receive
      • Hardware IRQ
      • Actual NIC delivers to VMNet driver
      • VMNet driver causes VMApp to return from select()
      • VMApp copies packet to VM memory
      • VMApp asks VMM to raise virtual IRQ
      • Guest OS performs port operations to read data
      • Trap to VMDriver
      • VMApp returns from ioctl() to raise IRQ
  • 51. Reducing Network Virtualization Overheads
    • Handling I/O ports in the VMM
      • Many accesses don’t involve actual I/O.
      • Let the VMM maintain the state, avoiding a world switch.
    • Send combining
      • If data rate is high, queue up packets, send them in a group.
    • IRQ notification
      • Use shared memory bitmap rather than requiring VMApp to call select() when an IRQ is received on the host system.
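
Send combining can be sketched as a small queue in front of the expensive path. The fixed batch size is an assumption made for the example (the slide only says packets are queued when the data rate is high), and `SendCombiner` is a hypothetical name.

```python
class SendCombiner:
    """Queue outgoing packets and hand them over in groups."""

    def __init__(self, transmit_batch, batch_size=3):
        self.transmit_batch = transmit_batch  # stands in for the switch to the host world
        self.batch_size = batch_size
        self.queue = []
        self.world_switches = 0

    def send(self, packet):
        self.queue.append(packet)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.queue:
            self.world_switches += 1          # one switch covers the whole group
            self.transmit_batch(list(self.queue))
            self.queue.clear()


batches = []
combiner = SendCombiner(batches.append, batch_size=3)
for packet in range(7):
    combiner.send(packet)
combiner.flush()  # push out the partial last group
```

Seven packets cost three world switches here instead of seven, at the price of some added latency for the queued packets.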
  • 52. Performance Enhancements
    • Reducing CPU virtualization overhead
      • Find operations to the interrupt controller that have memory semantics and replace with MOV operation, which does not require intervention by the VMM.
      • Apparently requires dynamic binary translation.
    • Modifying the guest OS
      • Eliminate idle task page table switching, which is not necessary, since the idle task pages are mapped in every process page table.
      • Run idle task with page table of last process.
      • What would happen if the idle task had a bug and wrote to some random addresses?
  • 53. Performance Enhancements
    • Creating a custom virtual device
      • Virtualizing a real device is somewhat inefficient, since the interface to these devices is optimized for real devices, not virtual devices.
      • Designing a custom virtual device can reduce expensive operations.
      • Disadvantage is that a new device driver must be written in the guest OS for this virtual device.
    • Modifying the host OS
      • VMNet driver allocates kernel memory sk_buff , then copies from VMApp to sk_buff .
      • Can eliminate copy by using memory from VM physical memory.
    • Bypassing the host OS
      • VMM uses own drivers, rather than going through the host OS. (Note that going through the host OS is using a kind of process VM provided by the host OS.)
      • Disadvantage is that you have to write your own VMM driver for every supported real device.
  • 54. Summary
    • Main goal is to develop some understanding of the issues of hosted system VM performance.
  • 55. Windows Virtualization Architecture Mark Kieffer Group Program Manager Windows Virtualization markkie @ microsoft.com Microsoft Corporation
  • 56. Microsoft’s current virtualization offerings
    • Current virtualization uses and benefits
      • Uses for virtualization today
      • Microsoft’s current virtualization offerings
    • Windows Virtualization Architecture
      • Hypervisor
      • Virtualization stack
      • Device virtualization
  • 57. Current Virtualization Uses and Benefits
    • Workloads that are enabled by virtualization
      • Server Consolidation
      • Efficient software development and test
      • Dynamic data centers
        • Resource Management
      • Application re-hosting
      • Application compatibility
      • High availability partitions
      • Many others
  • 58. Microsoft’s Current Virtualization Offerings
    • Virtual PC 2004
      • Being deployed in production environments
        • Application re-hosting
        • Demos, training, helpdesk
      • Being deployed in test and dev environments
        • Multiple test beds on a single piece of hardware
    • Virtual Server 2005
      • Released Q4 2004
      • Well received in the industry
        • Used for production server consolidation
      • Remote management of virtual machine operations
      • Great perf gains and functionality enhancement in SP1
        • 64-bit host support, PXE support, and others
  • 59. Microsoft's Next Gen Virtualization Architecture
    • Introducing Windows virtualization for servers
      • Hypervisor-based
      • Separate, small management partition (parent)
      • Takes device virtualization to the next level
      • Targeting availability in the Longhorn wave
    • Definition of a couple of terms
      • Parent partition: a partition that manages its children
      • Child partition: any number of partitions that are started, managed, and shut down by their parent
      • Virtualization Stack: The collection of components that runs in the parent partition for VM management
  • 60. Windows Virtualization for Servers
    • Some proposed features
      • 32-bit and 64-bit guests
      • x64-only hosts
      • Guest multiprocessing
      • Virtualized devices
      • WMI management and control API
      • Save & restore
      • Snapshotting
      • CPU and I/O resource controls
      • Tuning for NUMA
      • Dynamic resource addition & removal
      • Live migration
  • 61. Virtualization Stack
    • Will run within a parent partition
      • Stand alone in a small footprint OS (MinWin)
      • Full Windows OS
    • Multiple virtualization stacks could co-exist
  • 62. Virtualization Stack
    [Diagram labels: Hypervisor; Parent Partition; VM Service; VM Worker Process (×3); Virtualization Infrastructure Driver; VMBus Bus Driver]
    • Collection of user-mode & kernel-mode components
      • Runs within a partition on top of a (minimal) OS
      • Contains all VM support not in the hypervisor
    • Interacts with hypervisor
      • Calls the hypervisor to perform certain actions
      • Responds to messages from the hypervisor or from other partitions
    • Creates and manages a group of “child partitions”
      • Manages memory for child partitions
      • Virtualizes devices for child partitions
    • Exposes a management interface
    [Diagram labels: Child Partition 1; Child Partition 2; Hypervisor API & Message Library; WMI Provider]
  • 63. WMI Value Proposition
    • WMI is the interface that applications use to manage all aspects of Windows virtualization services
    • WMI is consumer agnostic
      • Can be accessed remotely via WS-Management
      • Programmable via C++, WSH, .NET
    • Hardware manufacturers benefit from understanding WMI
      • Understand how their hardware can participate within overall Windows virtualization services manageability
  • 64. Device Virtualization
    • Provides a method for sharing hardware efficiently
    • Physical devices are still managed by their device drivers
    • Definitions
      • Virtualization Service Providers (VSPs) & Clients (VSCs)
        • VSP = provider, VSC = consumer
        • VSPs typically run in a partition that “owns” a hardware resource
        • VSP/VSC pair per device type (storage, network, etc.)
        • May expose bandwidth resource controls
        • Protocol is specific to device type, but is generally OS-agnostic
  • 65. Device Virtualization
    • Standard VSPs
      • Storage: parses VHDs, supports differencing disk chains
      • Network: provides virtualized network mechanism
      • Video: 2D for servers
      • USB: allows a USB device to be assigned to a partition
      • Input: keyboard & mouse
      • Time: virtualization for RTC hardware
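
As a rough illustration of what a chain of difference drives means for the storage VSP, the sketch below resolves a read by walking from the newest disk toward the base. It is an in-memory model only: it does not parse the VHD file format, and `DiskLayer`, `BLOCK_SIZE`, and the block-level granularity are assumptions made for the example.

```python
from typing import Dict, Optional

BLOCK_SIZE = 512  # illustrative block size, not taken from the slides


class DiskLayer:
    """One disk in a chain: holds only the blocks written to it."""

    def __init__(self, parent: Optional["DiskLayer"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> None:
        # Writes always land in this layer; parent layers are never modified.
        self.blocks[block_no] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]

    def read(self, block_no: int) -> bytes:
        # Walk from this layer toward the base until some layer has the block.
        layer = self
        while layer is not None:
            if block_no in layer.blocks:
                return layer.blocks[block_no]
            layer = layer.parent
        return bytes(BLOCK_SIZE)  # never written anywhere: reads as zeros


base = DiskLayer()
base.write(0, b"base-0")
base.write(1, b"base-1")

diff = DiskLayer(parent=base)
diff.write(1, b"diff-1")  # overrides block 1 without touching the base

print(diff.read(0).rstrip(b"\0"))  # b'base-0'
print(diff.read(1).rstrip(b"\0"))  # b'diff-1'
```

Because the base is never written, several difference drives could share one base image, which is what makes chains useful for test beds and snapshots.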
  • 66. Device Virtualization
    [Diagram labels: Disk; Hypervisor; Storage VSP; VMBus (×2)]
    • Physical devices
      • Managed by traditional driver stacks
    • Virtualization service providers (VSPs)
      • Virtualize a specific class of device (e.g. networking, storage, etc.)
      • Expose an abstract device interface
      • Run within the partition that owns the corresponding physical device
    • Virtualization service clients (VSCs)
      • Consume virtualized hardware service
    • VMBus
      • Software “bus” (enumeration, hot plug, etc.)
      • Enables VSPs and VSCs to communicate efficiently
      • Uses memory sharing and hypervisor IPC messages
    [Diagram labels: Storage Stack (×2); Port Driver; Storage VSC; Parent Partition]
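
The division of labour between a VSP, a VSC, and the bus can be sketched as follows. This is a toy model under stated assumptions: `Channel` stands in for a VMBus channel using two in-process queues, the request tuples are an invented protocol, and a direct method call replaces the hypervisor IPC message that would normally signal the other partition.

```python
from collections import deque


class Channel:
    """Hypothetical stand-in for a VMBus channel: two queues modelling shared-memory rings."""

    def __init__(self) -> None:
        self.requests = deque()
        self.responses = deque()


class StorageVSP:
    """Runs in the partition that owns the disk; services requests from the channel."""

    def __init__(self, channel: Channel, disk: bytearray) -> None:
        self.channel = channel
        self.disk = disk  # stands in for the physical device behind its driver stack

    def service(self) -> None:
        # Drain every pending request and queue exactly one response for each.
        while self.channel.requests:
            op, offset, arg = self.channel.requests.popleft()
            if op == "write":
                self.disk[offset:offset + len(arg)] = arg
                self.channel.responses.append(("ok", b""))
            elif op == "read":
                data = bytes(self.disk[offset:offset + arg])
                self.channel.responses.append(("ok", data))
            else:
                self.channel.responses.append(("error", b""))


class StorageVSC:
    """Runs in a child partition; forwards read/write requests over the channel."""

    def __init__(self, channel: Channel, vsp: StorageVSP) -> None:
        self.channel = channel
        self._vsp = vsp  # direct call replaces the hypervisor signal in this sketch

    def _call(self, request) -> bytes:
        self.channel.requests.append(request)
        self._vsp.service()
        status, data = self.channel.responses.popleft()
        if status != "ok":
            raise IOError("request failed")
        return data

    def write(self, offset: int, data: bytes) -> None:
        self._call(("write", offset, data))

    def read(self, offset: int, length: int) -> bytes:
        return self._call(("read", offset, length))


disk = bytearray(64)
channel = Channel()
vsp = StorageVSP(channel, disk)
vsc = StorageVSC(channel, vsp)

vsc.write(8, b"hello")
print(vsc.read(8, 5))  # b'hello'
```

The client never touches `disk` directly; it sees only the abstract read/write interface, while the provider alone talks to the device.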
  • 67. Windows Enlightenments
    • Enlightenments
      • Modifications to an OS to make it aware that it’s running within a VM
    • Windows codenamed “Longhorn” enlightenments
      • Optimizations in memory manager (MM)
      • Win32 and kernel API: Am I running on a virtual machine?
    • Looking at additional enlightenments in the future
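
One way a guest can answer "am I running on a virtual machine?" on x86 is the hypervisor-present bit, which hypervisors conventionally set in bit 31 of ECX for CPUID leaf 1. The sketch below shows only the bit test and a branch on it. It is not the Win32 or kernel API the slide refers to; Python cannot execute CPUID itself, so the register value is passed in, and `choose_code_path` is a hypothetical helper.

```python
HYPERVISOR_PRESENT_BIT = 1 << 31  # CPUID leaf 1, ECX bit 31


def hypervisor_present(cpuid_leaf1_ecx: int) -> bool:
    """Return True if the hypervisor-present bit is set in the given ECX value."""
    return bool(cpuid_leaf1_ecx & HYPERVISOR_PRESENT_BIT)


def choose_code_path(cpuid_leaf1_ecx: int) -> str:
    # Hypothetical enlightenment: take a VM-aware path only when a hypervisor is present.
    return "enlightened" if hypervisor_present(cpuid_leaf1_ecx) else "native"


print(choose_code_path(0x80000000))  # enlightened
print(choose_code_path(0x00000000))  # native
```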
  • 68. Some System Requirements
    • Must support hardware virtualization
      • Intel’s Virtualization Technology
      • AMD’s “Pacifica”
      • We are not planning on supporting any other implementations
    • Must support x64 extensions
  • 69. Microkernels Meet Recursive Virtual Machines (Ford et al.)
  • 70. Decomposition
    • Microkernels decompose functionality horizontally (mainly).
      • Monolithic services separated horizontally.
      • Moved up one layer.
    • Stackable VMMs decompose functionality vertically.
      • Each layer supplies some functionality.
  • 71. Fluke
    • Uses a nested process architecture.
      • Each process provides a VM to its children, possibly with additional functionality.
      • Different from usual parent-child in that children are completely contained within and visible to parent.
      • This is necessary for the parent to be a VM to its children.
    • Two APIs
      • Low-level kernel API to microkernel for basic manipulation
      • High-level protocols to handle:
        • Parent Interface
        • Process
        • MemPool
        • FileSystem
      • Nested VMs interact directly with microkernel for the low-level API, but interact with the parent VM for high-level protocols.
      • Parent VM will use interposition to add additional functionality. This is how the stacking works.
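
The stacking described above can be sketched with plain function references. Each nested process receives a table of handlers for the high-level protocols from its parent; it wraps the ones it wants to interpose on and hands the rest to its own children unchanged, so calls on those protocols skip it entirely. Everything here is illustrative: `Nester`, `root_handlers`, the `tracer` and `sandbox` layers, and the string-based requests are hypothetical and are not Fluke's actual interfaces.

```python
PROTOCOLS = ("Parent", "Process", "MemPool", "FileSystem")


class Microkernel:
    """Low-level API: every nesting level calls it directly."""

    def __init__(self) -> None:
        self.calls = 0

    def low_level(self, op: str) -> str:
        self.calls += 1
        return f"kernel:{op}"


def root_handlers() -> dict:
    # Base implementation of each high-level protocol at the top of the stack.
    return {p: (lambda request, p=p: f"root.{p}({request})") for p in PROTOCOLS}


class Nester:
    """A process that acts as a VM for its children."""

    def __init__(self, name, parent_handlers, kernel, interposed=()):
        self.name = name
        self.kernel = kernel
        self.hops = 0  # number of requests this layer actually handled
        self.child_handlers = {}
        for protocol, handler in parent_handlers.items():
            if protocol in interposed:
                self.child_handlers[protocol] = self._wrap(handler)
            else:
                # Pass-through: give the child the very reference we were given.
                self.child_handlers[protocol] = handler

    def _wrap(self, upstream):
        def interposed_handler(request):
            self.hops += 1
            return upstream(f"{self.name}[{request}]")
        return interposed_handler

    def low_level(self, op: str) -> str:
        # Low-level calls never travel through the parent chain.
        return self.kernel.low_level(op)


kernel = Microkernel()
root = root_handlers()
tracer = Nester("tracer", root, kernel, interposed={"FileSystem"})
sandbox = Nester("sandbox", tracer.child_handlers, kernel, interposed={"MemPool"})
app = sandbox.child_handlers  # what the innermost process sees

print(app["FileSystem"]("open f"))  # root.FileSystem(tracer[open f])
```

In this model a request costs one hop per layer that interposes on its protocol, not one hop per layer in the stack, which is the effect pass-through is meant to achieve.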
  • 72.  
  • 73. Key Ideas
    • Implement a microkernel that allows process virtual machines to be stacked.
    • Each virtual machine is a user-level server.
    • Stacking occurs through process nesting.
    • Use pass-through to avoid exponential behavior.
    • Mainly interesting for its ideas; performance is relatively poor but may be improvable.