  • 1. Virtual machines Jinyang Li
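  • 2. OS sits between h/w and app
    • Diagram, top to bottom: apps (firefox, iTunes, emacs), syscall interface, OS, h/w interface (Intel manuals), hardware
    • OS abstracts the h/w interface
  • 3. VMM virtualizes hardware interface
    • Diagram: two guest OSes, each serving its own apps (firefox, iTunes, emacs) through a syscall interface
    • Each guest OS runs on a h/w interface provided by the virtual machine monitor, which in turn runs on the h/w interface of the real hardware
  • 4. VMM hosted architecture
    • Diagram: the host OS runs on the hardware (h/w interface, Intel manuals) and serves its apps through syscalls
    • The virtual machine monitor runs on top of the host OS and presents a h/w interface to the guest OS, which serves guest apps through syscalls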
  • 5. History of virtualization
    • Old idea dating from 1960s
      • IBM VM/370: a VMM for IBM mainframe
      • Multiplex multiple OS on expensive h/w
      • Desirable when few machines around
    • Interest died out in the 80s and 90s
      • PC h/w is cheap
  • 6. Why VM today?
    • Machine consolidation
      • N virtual machines → 1 physical machine
      • E.g. Amazon’s EC2 cloud
    • VM simplifies software management
      • Bundle OS/libraries/configurations together
    • Other cool uses
      • Security, fault tolerance, debugging …
  • 7. Similarities of OS and VMM
    • OS provides a virtual execution environment for processes
    • VMM provides a virtual execution env (virtual hardware) for OSes
  • 8. Differences btw. virtualization for processes and OSes
    • How do a process and an OS use hardware resources?
      • CPU: a process uses non-privileged registers and instructions; an OS also uses privileged registers and instructions
      • Memory: a process uses virtual memory; an OS also uses physical memory
      • Exceptions: a process sees signals and errors; an OS also handles traps and interrupts
      • I/O: a process uses the file system; an OS uses programmed I/O, DMA, and interrupts
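  • 9. Complete machine simulation
      #define REG_EAX 1          /* index into regs[]; no trailing semicolon */
      int32_t eip;               /* simulated instruction pointer */
      int32_t regs[8];           /* simulated general-purpose registers */
      int32_t segregs[4];        /* simulated segment registers */
      ...
      for (;;) {
          read_instruction();
          switch (decode_instruction_opcode()) {
          case OPCODE_ADD: {     /* braces: declarations follow the label */
              int src = decode_src_reg();
              int dst = decode_dst_reg();
              regs[dst] = regs[dst] + regs[src];
              break;
          }
          case ..
          }
          eip += instruction_length;
      }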
  • 10. Pros/Cons of simulation
    • Pros
      • Controlled execution
      • Great for debugging
    • Cons: too slow
      • 100x slowdown of the CPU
      • Software decode + execution takes 100s to 1000s of cycles per simulated instruction
  • 11. Virtualization’s goals
    • Fidelity
      • Software on VMM executes identically to its execution on h/w
    • Performance
      • Majority of guest instructions are directly executed by hardware
    • Safety
      • VMM manages h/w resources, provides isolation etc.
  • 12. Virtualization challenges
    • Insight: execute most instructions as they are
      • ADD $1, %eax
    • Challenges:
      • How to execute privileged instructions?
        • lgdt, cli, hlt
      • How to virtualize the MMU?
      • How to prevent guest from overwriting host or other guests?
        • mov $123, %cr3
      • How to virtualize I/O?
  • 13. Basic CPU virtualization techniques
    • Trap-and-emulate
      • KVM, QEMU
    • Paravirtualization
      • Xen
    • Dynamic binary translation
      • VMware
  • 14. Technique #1: trap-n-emulate
    • “trap-n-emulate” (classical virtualization)
      • Run guest OS at “lesser” privilege
      • Privileged instructions cause “traps”
      • VMM runs a simulator on trapped instructions
      • (Most) non-privileged instructions do not need traps
      • Need h/w support
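
The trap path described on this slide can be sketched roughly as follows. This is a minimal illustration, not any real VMM's code: the `vcpu` structure, the `trapped_op` opcodes, and `vmm_emulate_trap` are all hypothetical names, and a real handler would first decode the faulting instruction from guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for privileged instructions that trapped into the VMM. */
enum trapped_op { OP_CLI, OP_STI, OP_HLT, OP_MOV_TO_CR3 };

/* Per-guest virtual CPU state kept by the VMM (illustrative fields only). */
struct vcpu {
    bool     intr_enabled;  /* the guest's virtual interrupt flag */
    bool     halted;        /* guest ran hlt and is waiting for an interrupt */
    uint32_t cr3;           /* the guest's view of %cr3 */
    uint32_t eip;           /* guest instruction pointer */
};

/* Emulate one trapped instruction against the virtual CPU.
 * The real CPU's privileged state is never touched on the guest's behalf. */
void vmm_emulate_trap(struct vcpu *v, enum trapped_op op,
                      uint32_t operand, uint32_t insn_len)
{
    assert(v != NULL);
    switch (op) {
    case OP_CLI:        v->intr_enabled = false; break;
    case OP_STI:        v->intr_enabled = true;  break;
    case OP_HLT:        v->halted = true;        break;
    case OP_MOV_TO_CR3: v->cr3 = operand;        break;
    }
    v->eip += insn_len;  /* resume the guest after the emulated instruction */
}
```

Non-privileged instructions never reach this function; they run directly on the hardware, which is where the performance comes from.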
  • 15. Technique #1: x86 challenges
    • Traditional x86 is not amenable to technique #1
    • Problems:
      • Many sensitive instructions do not trap when executed without privilege!
        • popf does not trap when it cannot modify system flags; the change is silently dropped
      • Hardware-managed TLB
        • On TLB miss, h/w automatically loads from page table (VMM cannot intercept this event)
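
To see why popf defeats trap-and-emulate, here is a simplified model of its protected-mode behaviour. It assumes only the interrupt flag (IF) and the I/O privilege level (IOPL) matter and ignores reserved bits and the other system flags; `popf_model` is a hypothetical helper, not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CF   0x0001u   /* carry flag: an ordinary, unprivileged flag */
#define FLAG_IF   0x0200u   /* interrupt enable flag */
#define FLAG_IOPL 0x3000u   /* I/O privilege level, bits 12-13 */

/* Simplified popf: returns the new EFLAGS after popping `popped`
 * at privilege level `cpl`. Note that it never traps. */
uint32_t popf_model(uint32_t eflags, uint32_t popped, unsigned cpl)
{
    assert(cpl <= 3);
    unsigned iopl = (eflags & FLAG_IOPL) >> 12;
    uint32_t writable = ~(FLAG_IF | FLAG_IOPL);  /* ordinary flags */
    if (cpl <= iopl)
        writable |= FLAG_IF;     /* privileged enough to change IF */
    if (cpl == 0)
        writable |= FLAG_IOPL;   /* only ring 0 may change IOPL */
    /* Non-writable bits keep their old value: the write is silently lost. */
    return (eflags & ~writable) | (popped & writable);
}
```

A guest kernel deprivileged to CPL 3 that tries to clear IF this way gets no fault and no effect, so the VMM never learns that the guest wanted interrupts disabled.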
  • 16. Technique #1: h/w support
    • AMD’s SVM and Intel’s VT extension to x86
      • Starting in late 2005
      • AMD Athlon 64, Intel P4, Intel Core …
    • Many VMMs now utilize this h/w support
      • VMware, QEMU, KVM, VirtualBox, …
    • More than just simple fixes
      • I.e. make sure privileged instructions trap
    • H/w support’s goal: minimize traps and emulation in VMM
  • 17. Technique #1: h/w support
    • Diagram: the VMM runs in VMX root mode; each guest OS (CPL=0) and its apps (CPL=3) run in VMX non-root mode
    • vmrun enters a guest; vmexit returns control to the VMM
  • 18. Technique #1: h/w support
    • VMM sets up an in-memory VM control data structure (VMCS) per VM
    • VMCS virtualizes
      • System registers:
        • %CR0, %CR3, %EIP, %eflags, %CS, %SS, …
    • VMCS allows VMM to specify exit controls:
      • E.g. whether to trap upon “HLT”, “LGDT” instructions
    • Effects: fewer traps
  • 19. Technique #2: paravirtualization
    • Fancy word for “we have to modify and recompile OS”
    • Popular back when x86 was not easily virtualizable
    • VMM runs in privileged mode; VMs run in unprivileged mode
    • OS is modified to call into the VMM for memory, I/O, interrupt setup, etc.
      • ~3000 LoC modifications for Linux, ~5000 LoC for XP
  • 20. Technique #3: dynamic binary translation
    • We have seen BT before. Where?
      • Eraser intercepts all memory reads/writes to check for lock protection
    • How BT enables software virtualization:
      • find all privileged instructions in OS and replace them with call-ins to VMM for emulation
    • Why not static binary translation?
    • Popularized by VMware
      • QEMU also supports BT
  • 21. Technique #3: binary translation
    • Source:
        void clearbal() { while (balance>0) balance--; }
    • Original code:
        …
        804836d: a1 8c 95 04 08    mov 0x804958c,%eax
        8048372: 83 e8 01          sub $0x1,%eax
        8048375: a3 8c 95 04 08    mov %eax,0x804958c
        804837a: a1 8c 95 04 08    mov 0x804958c,%eax
        804837f: 85 c0             test %eax,%eax
        8048381: 7f ea             jg 804836d
        8048383: c3                ret
        …
    • The translation engine emits the block into the code cache at 90d:
        90d: mov… sub… mov… mov… test…
             jg call<TE_jmp>(804836d)
             call<TE_ret>
    • Lookup table (original → cache): 804836d → 90d; the diagram also shows the branch rewritten as jg 90d
  • 22. Technique #3: binary translation
    • Is BT applied on user-level programs?
    • BT performance
      • Most instructions can be executed identically
      • Translation overhead is incurred only the first time a piece of code is executed
      • Intercepting and emulating privileged instructions is expensive
        • e.g. syscalls
      • BT slows down call/ret control flow
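
The "translate once, then reuse" behaviour can be sketched as a lookup table from guest addresses to code-cache addresses. This is an illustrative toy rather than VMware's or QEMU's design: `tcache`, `tc_lookup`, and the fixed 16-byte block size in `translate_block` are assumptions made up for the example, and a colliding entry is simply overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TC_SLOTS 256

struct tc_entry { uint32_t guest_pc; uint32_t cache_pc; int valid; };

struct tcache {
    struct tc_entry slot[TC_SLOTS];
    uint32_t next_free;      /* next free address in the code cache */
    unsigned translations;   /* number of blocks translated so far */
};

/* Stand-in for the real translator: pretends every translated block
 * occupies 16 bytes of code cache (an assumption for illustration). */
static uint32_t translate_block(struct tcache *tc, uint32_t guest_pc)
{
    (void)guest_pc;
    uint32_t at = tc->next_free;
    tc->next_free += 16;
    tc->translations++;
    return at;
}

/* Map a guest code address to its translated copy, translating on a miss. */
uint32_t tc_lookup(struct tcache *tc, uint32_t guest_pc)
{
    assert(tc != NULL);
    struct tc_entry *e = &tc->slot[guest_pc % TC_SLOTS];
    if (!e->valid || e->guest_pc != guest_pc) {   /* miss: pay the cost once */
        e->guest_pc = guest_pc;
        e->cache_pc = translate_block(tc, guest_pc);
        e->valid = 1;
    }
    return e->cache_pc;
}
```

With `next_free` starting at 0x90d, the first lookup of 0x804836d translates the block and returns 0x90d, matching the Original/Cache table on the previous slide; later lookups hit the table and skip translation.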
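  • 23. Memory virtualization
    • Diagram: machine addresses (MA) run from 0 to 4G; each guest (OS1, OS2) sees its own “physical” addresses (PA) from 0 to 1G
    • The guest's page table (its %cr3 holds a PA) maps VA to PA. Can h/w use this page table?
    • VMM gives the corresponding shadow page table (%cr3 holds an MA, entries map VA to MA) to h/w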
  • 24. Maintain shadow page tables
    • Correctness requires:
      • A shadow pg table must be consistent with its actual pg table
    • Strawman 1:
      • On switching address space (“mov %cr3 …”), construct a fresh shadow pg table
      • Incurs expensive addr space switch overhead
    • Strawman 2:
      • On switching address space, use an empty shadow pg table
      • Upon incurring page faults, modify shadow PTE according to actual PTE
      • Incurs many hidden pg faults
  • 25. Maintain shadow page tables
    • Can VMM cache shadows?
      • Challenge: what if OS modifies one of the pg tables w/o knowledge of VMM?
    • Insight: write-protect the actual pg tables so that guest updates trap into the VMM
      • Referred to as “memory traces”
    • VMM may choose not to populate all shadow PTEs at once
      • saves addr space switch time
      • Fewer hidden pg faults than strawman #2 because shadow PTEs are cached
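
A toy version of a cached, lazily filled shadow page table is sketched below. It assumes single-level tables indexed by page number, and every name in it (`vm_mem`, `shadow_translate`, `guest_pt_write`) is hypothetical. The guest's write to its own page table is modelled as a direct call standing in for the write-protection trap.

```c
#include <assert.h>

#define NPAGES   16
#define UNMAPPED (-1)

struct vm_mem {
    int guest_pt[NPAGES];   /* guest's actual page table: VPN -> PPN */
    int pmap[NPAGES];       /* VMM's map: PPN -> MPN */
    int shadow[NPAGES];     /* shadow page table: VPN -> MPN (used by h/w) */
    unsigned hidden_faults; /* faults handled by the VMM, unseen by the guest */
};

void vm_mem_init(struct vm_mem *m)
{
    for (int i = 0; i < NPAGES; i++)
        m->guest_pt[i] = m->pmap[i] = m->shadow[i] = UNMAPPED;
    m->hidden_faults = 0;
}

/* Translate through the shadow table, filling a missing entry on demand.
 * Returns UNMAPPED when the fault must be forwarded to the guest OS. */
int shadow_translate(struct vm_mem *m, int vpn)
{
    assert(vpn >= 0 && vpn < NPAGES);
    if (m->shadow[vpn] == UNMAPPED) {
        int ppn = m->guest_pt[vpn];
        if (ppn == UNMAPPED)
            return UNMAPPED;              /* true guest page fault */
        m->hidden_faults++;               /* hidden fault: VMM fills the PTE */
        m->shadow[vpn] = m->pmap[ppn];
    }
    return m->shadow[vpn];
}

/* Models the trap taken when the guest writes a write-protected page table:
 * apply the write, then drop the stale shadow entry to stay consistent. */
void guest_pt_write(struct vm_mem *m, int vpn, int ppn)
{
    assert(vpn >= 0 && vpn < NPAGES);
    assert(ppn == UNMAPPED || (ppn >= 0 && ppn < NPAGES));
    m->guest_pt[vpn] = ppn;
    m->shadow[vpn] = UNMAPPED;
}
```

Only the first access after a page-table change pays for a hidden fault; repeated accesses reuse the cached shadow entry.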
  • 26. More h/w support
    • Intel/AMD added h/w support for memory virtualization
      • e.g. Intel Core i7 (Q4 2008)
      • Add new table from PA to MA
      • h/w traverses two pg tables, VA → PA and PA → MA, to fill the TLB
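
The two-table walk can be illustrated with a deliberately flattened model in which each table is a single array indexed by page number. `nested_walk` and `NO_MAP` are hypothetical names; real hardware walks multi-level tables and must also translate the guest page-table pointers themselves through the second table.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 16
#define NO_MAP (-1)

/* Compose the guest table (VPN -> PPN) with the VMM's table (PPN -> MPN).
 * Returns the machine page number to load into the TLB, or NO_MAP. */
int nested_walk(const int guest_pt[NPAGES], const int nested_pt[NPAGES],
                int vpn)
{
    assert(guest_pt != NULL && nested_pt != NULL);
    if (vpn < 0 || vpn >= NPAGES)
        return NO_MAP;
    int ppn = guest_pt[vpn];
    if (ppn < 0 || ppn >= NPAGES)
        return NO_MAP;          /* guest-level fault: handled by the guest OS */
    return nested_pt[ppn];      /* NO_MAP here is a fault for the VMM */
}
```

Because the hardware performs both lookups itself, the VMM no longer needs to keep a shadow table in sync with the guest's page table.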
  • 27. Virtualize I/O
    • OS communicates with I/O devices via
      • Special instructions: in/out
      • Memory-mapped I/O (MMIO)
      • Interrupts
      • DMA
    • Virtualization
      • in/out and MMIO accesses must trap into VMM
      • Run simulation of I/O device
    • Simulation:
      • Interrupt: generate interrupt in CPU simulator
      • DMA: copy data to/from physical memory of VM
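
A rough sketch of the device-simulation path: a trapped `out` instruction is dispatched to a device model, which performs the "DMA" as a memory copy into the VM's physical memory and then raises a virtual interrupt. The `toy_disk` device, its port number, and its one-register protocol are invented for the example and do not correspond to real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_MEM_SIZE 4096
#define TOY_DISK_PORT  0x300   /* hypothetical port number */

struct guest {
    uint8_t mem[GUEST_MEM_SIZE];  /* the VM's "physical" memory */
    bool    irq_pending;          /* virtual interrupt line to the guest CPU */
};

/* Toy device: writing a guest-physical address to its port asks it to
 * DMA one 16-byte sector to that address and then interrupt. */
struct toy_disk { uint8_t sector[16]; };

/* Called by the VMM when a guest `out` traps.
 * Returns false if the access is not for this device or is invalid. */
bool vmm_handle_out(struct guest *g, const struct toy_disk *d,
                    uint16_t port, uint32_t value)
{
    assert(g != NULL && d != NULL);
    if (port != TOY_DISK_PORT)
        return false;
    if (value > GUEST_MEM_SIZE - sizeof d->sector)
        return false;              /* DMA target outside the VM's memory */
    memcpy(&g->mem[value], d->sector, sizeof d->sector);  /* simulated DMA */
    g->irq_pending = true;         /* simulated completion interrupt */
    return true;
}
```

The bounds check is what keeps a guest from using the simulated device to write outside its own memory.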
  • 28. Managing memory in VMM
    • Configure VMs to use more “physical” memory than actually available
    • What happens when running out of memory?
    • Strawman: use LRU paging at VMM
      • OS already uses LRU → double paging
      • OS will recycle whatever “physical page” VMM just paged out
      • Better to do random eviction
  • 29. ESX: Reclaiming pages
    • Idea: trick OS to return memory to VMM
    • OS is better at deciding what to swap
      • Normally OS uses all available memory
      • E.g. buffer cache contains old pages, OS won’t discard if it doesn’t need memory
    • ESX trick: balloon driver
  • 30. Balloon driver
    • Diagram: the VMM hosts OS1 and OS2, each with a balloon inside
    • Balloon is a special pseudo-device loaded into OS
    • VMM instructs balloon to inflate or deflate depending on memory pressure
    • Balloon inflates by requesting lots of “pinned” memory pages
    • To accommodate inflated balloon, OS releases/swaps out some of its memory pages
    • Balloon tells VMM to recycle its “private” pinned pages
  • 31. ESX: sharing pages across VMs
    • Many VMs run same OS and programs
      • Many Linux boxes with Apache server
    • Idea: use 1 machine page for identical physical pages
    • Periodically scan to find identical machine pages
      • Do copy-on-write to eliminate redundancy
    • Optimization: use a hash table keyed by hash(content)
      • Allows quick lookup based on page content
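
The content-hash lookup might look roughly like the sketch below. It uses FNV-1a as the hash and linear probing as the table layout, both chosen only for illustration (the slides do not say what ESX uses), and the names `share_table` and `share_or_insert` are hypothetical. A hash match is confirmed with a full byte comparison before two pages are treated as identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096
#define HASH_SLOTS 64
#define EMPTY      (-1)

/* 32-bit FNV-1a over the page contents (illustrative choice of hash). */
static uint32_t page_hash(const uint8_t *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

struct share_table { int slot[HASH_SLOTS]; };  /* each slot: an MPN or EMPTY */

void share_table_init(struct share_table *t)
{
    for (int i = 0; i < HASH_SLOTS; i++)
        t->slot[i] = EMPTY;
}

/* Returns the machine page that should back `mpn`: an existing page with
 * identical contents if one is known, otherwise `mpn` itself. */
int share_or_insert(struct share_table *t, uint8_t (*machine)[PAGE_SIZE],
                    int mpn)
{
    assert(t != NULL && machine != NULL && mpn >= 0);
    uint32_t h = page_hash(machine[mpn]);
    for (unsigned probe = 0; probe < HASH_SLOTS; probe++) {
        int *s = &t->slot[(h + probe) % HASH_SLOTS];
        if (*s == EMPTY) { *s = mpn; return mpn; }  /* first copy seen */
        if (*s == mpn)
            return mpn;                             /* already recorded */
        if (memcmp(machine[*s], machine[mpn], PAGE_SIZE) == 0)
            return *s;                              /* verified duplicate */
    }
    return mpn;  /* table full: leave the page unshared */
}
```

When the returned page differs from `mpn`, the caller would remap the guest's physical page to it copy-on-write and free the duplicate.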
  • 32. Idle memory tax
    • Proportional share memory allocation
      • Important VM gets more memory
      • Reclaim memory from VM with smallest “shares-to-pages” (S/P) ratio
      • If S_A = 2·S_B, A can have 2x the memory of B
    • Problem:
      • high-share VMs hoard more memory than needed
    • Solution: idle memory tax
      • Instead of S/P, reclaim from the VM with the smallest S / (P · (f + k(1 − f)))
      • f: fraction of non-idle pages, determined by statistical sampling
      • k ≥ 1: a configurable idle page “cost” parameter
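
A small worked example of the adjusted ratio, under made-up numbers: VM A holds S = 200 shares and P = 100 pages but only a quarter of them are active (f = 0.25), while VM B holds S = 100, P = 100 and uses all of them (f = 1). The struct and function names below are hypothetical.

```c
#include <assert.h>

struct vm_alloc {
    double shares;       /* S */
    double pages;        /* P */
    double active_frac;  /* f: fraction of non-idle pages */
};

/* Adjusted shares-per-page ratio: S / (P * (f + k * (1 - f))). */
double adjusted_ratio(const struct vm_alloc *vm, double k)
{
    assert(vm->pages > 0.0 && k >= 1.0);
    assert(vm->active_frac >= 0.0 && vm->active_frac <= 1.0);
    double f = vm->active_frac;
    return vm->shares / (vm->pages * (f + k * (1.0 - f)));
}

/* Index of the VM to reclaim from: the one with the smallest ratio. */
int pick_victim(const struct vm_alloc *vms, int n, double k)
{
    assert(n > 0);
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (adjusted_ratio(&vms[i], k) < adjusted_ratio(&vms[victim], k))
            victim = i;
    return victim;
}
```

With k = 1 the formula reduces to plain S/P (A: 2, B: 1), so memory is reclaimed from B even though A is hoarding idle pages. With k = 4, A's ratio drops to 200 / (100 · 3.25), roughly 0.62, below B's 1.0, so the idle pages in A are reclaimed first.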
  • 33. Summary: VMM attributes
    • Software compatibility
      • Runs all software
    • Low overhead
      • Near “raw” machine performance
    • Complete isolation
      • Total data isolation between virtual machines
    • Encapsulation
      • VMs are not tied to physical machines
      • Checkpoint/migration
  • 34. Example: VMM-based IDS
    • Tradeoffs of intrusion detection systems (IDS):
      • Host-based IDS:
        • Good visibility to detect intruder
        • Weak isolation: an intruder on the host can disable the IDS
      • Network-based IDS:
        • Good isolation from attacker
        • Weak visibility of what’s actually going on
    • Can we have both visibility and isolation?
  • 35. Example: VMM-based IDS
    • Strong isolation
      • VMM isolates software in the VM from the VMM
      • Compromised OS cannot disable IDS in VMM
    • Introspection: peek inside at VM
      • Examine physical memory, registers, I/O devices for patterns of break-ins
    • Interposition: modify h/w abstraction to enhance security
  • 36. Compute Utility
    • Virtual appliance abstraction
      • Target specialized environment (e.g. program development)
      • Store targeted VMs in centralized repository
      • Cached on running machines
    • Benefits:
      • Simplified system admin
      • Mobility: computing environment follows user around
  • 37. Transparent replication
    • Replicate VMs across multiple physical machines
      • If one fails, another can take over immediately
    • No software modification necessary
    • Preserves all active network connections