  1. Virtual machines, Jinyang Li
  2. OS sits between h/w and app
     [Figure: apps (firefox, iTunes, emacs) sit on the OS through the syscall interface; the OS sits on the hardware through the h/w interface (Intel manuals).]
     • OS abstracts the h/w interface
  3. VMM virtualizes hardware interface
     [Figure: two guest OSes, each running firefox, iTunes and emacs through its own syscall interface. Each guest OS sits on an h/w interface exported by the virtual machine monitor, which in turn sits on the real hardware's h/w interface.]
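  4. VMM hosted architecture
     [Figure: a host OS runs on the hardware through the h/w interface (Intel manuals) and serves ordinary apps through its syscall interface. The virtual machine monitor runs on top of the host OS and exports an h/w interface to a guest OS, which serves its own apps through its own syscall interface.]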
  5. History of virtualization
     • Old idea dating from the 1960s
       • IBM VM/370: a VMM for IBM mainframes
       • Multiplexes multiple OSes on expensive h/w
       • Desirable when few machines were around
     • Interest died out in the 80s and 90s
       • PC h/w is cheap
  6. Why VM today?
     • Machine consolidation
       • N virtual machines → 1 physical machine
       • E.g. Amazon’s EC2 cloud
     • VM simplifies software management
       • Bundle OS/libraries/configurations together
     • Other cool uses
       • Security, fault tolerance, debugging …
  7. Similarities of OS and VMM
     • OS provides a virtual execution environment for processes
     • VMM provides a virtual execution environment (virtual hardware) for OSes
  8. Differences btw. virtualization for processes and OSes
     • How do a process and an OS use hardware resources?

                   process                                     OS
       CPU         Non-privileged registers and instructions   + Privileged registers and instructions
       memory      Virtual memory                              + Physical memory
       exceptions  Signals, errors                             + Traps, interrupts
       I/O         File system                                 + Programmed I/O, DMA, interrupts
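  9. Complete machine simulation

         #include <stdint.h>

         #define REG_EAX 1       /* register index; a #define takes no trailing semicolon */

         int32_t eip;            /* simulated instruction pointer */
         int32_t regs[8];        /* simulated general-purpose registers */
         int32_t segregs[4];     /* simulated segment registers */
         ...
         for (;;) {
             read_instruction();
             switch (decode_instruction_opcode()) {
             case OPCODE_ADD: {  /* braces give the declarations below their own scope */
                 int src = decode_src_reg();
                 int dst = decode_dst_reg();
                 regs[dst] = regs[dst] + regs[src];
                 break;
             }
             /* ... other opcodes ... */
             }
             eip += instruction_length;   /* advance past the instruction just simulated */
         }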
  10. Pros/Cons of simulation
      • Pros
        • Controlled execution
        • Great for debugging
      • Cons: too slow
        • 100x slowdown of CPU
        • Software decode + execute takes 100s to 1000s of cycles per simulated instruction
  11. Virtualization’s goals
      • Fidelity
        • Software on VMM executes identically to its execution on h/w
      • Performance
        • Majority of guest instructions are directly executed by hardware
      • Safety
        • VMM manages h/w resources, provides isolation etc.
  12. Virtualization challenges
      • Insight: execute most instructions as they are
        • ADD $1, %eax
      • Challenges:
        • How to execute privileged instructions?
          • lgdt, cli, hlt
        • How to virtualize the MMU?
        • How to prevent guest from overwriting host or other guests?
          • mov $123, %cr3
        • How to virtualize I/O?
  13. Basic CPU virtualization techniques
      • Trap-and-emulate
        • KVM, QEMU
      • Paravirtualization
        • Xen
      • Dynamic binary translation
        • VMware
  14. Technique #1: trap-n-emulate
      • “Trap-n-emulate” (classical virtualization)
        • Run guest OS at “lesser” privilege
        • Privileged instructions cause “traps”
        • VMM runs a simulator on trapped instructions
        • (Most) non-privileged instructions do not need traps
        • Needs h/w support
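The trap-and-emulate loop above can be sketched as a small handler. This is a minimal illustration, not any real VMM's code: `struct vcpu`, `struct trap_info`, and `vmm_handle_trap` are hypothetical names, and instruction decoding is assumed to have happened already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-guest virtual CPU state kept by the VMM. */
struct vcpu {
    uint32_t eip;      /* guest instruction pointer */
    uint32_t cr3;      /* guest's view of %cr3 */
    bool     if_flag;  /* guest's virtual interrupt-enable flag */
    bool     halted;   /* set once the guest executes hlt */
};

/* Hypothetical decoded form of the instruction that trapped. */
enum trapped_op { OP_CLI, OP_STI, OP_HLT, OP_MOV_TO_CR3 };

struct trap_info {
    enum trapped_op op;
    uint32_t operand;  /* value being written, used by OP_MOV_TO_CR3 */
    uint32_t length;   /* instruction length in bytes */
};

/* Emulate one trapped privileged instruction against the virtual state
 * (never the real CPU state), then step past it so the guest resumes at
 * the following instruction. */
void vmm_handle_trap(struct vcpu *v, const struct trap_info *t)
{
    assert(v && t);
    switch (t->op) {
    case OP_CLI:        v->if_flag = false;   break;
    case OP_STI:        v->if_flag = true;    break;
    case OP_HLT:        v->halted = true;     break;
    case OP_MOV_TO_CR3: v->cr3 = t->operand;  break;
    }
    v->eip += t->length;
}
```

The point of the design is that the guest's `cli` only ever changes `if_flag` in the VMM's bookkeeping; the physical interrupt flag stays under VMM control.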
  15. Technique #1: x86 challenges
      • Traditional x86 is not amenable to #1
      • Problems:
        • Many privileged instructions do not trap!
          • popf does not trap when it is not allowed to modify a system flag; the change is silently dropped
        • Hardware-managed TLB
          • On TLB miss, h/w automatically loads from page table (VMM cannot intercept this event)
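To make the popf problem concrete, here is a simplified model of how popf updates EFLAGS in protected mode. `popf_model` is a hypothetical helper; it only models the IF and IOPL fields (bit 9 and bits 12-13) and ignores reserved bits and the other system flags.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF    (1u << 9)    /* interrupt-enable flag */
#define EFLAGS_IOPL  (3u << 12)   /* I/O privilege level field */

/* Simplified model: returns the new EFLAGS after popf loads `popped`
 * while running at privilege level `cpl`. No trap is ever raised. */
uint32_t popf_model(uint32_t eflags, uint32_t popped, unsigned cpl)
{
    unsigned iopl = (eflags & EFLAGS_IOPL) >> 12;
    uint32_t writable = ~0u;

    assert(cpl <= 3);
    if (cpl != 0)
        writable &= ~EFLAGS_IOPL;  /* only ring 0 may change IOPL */
    if (cpl > iopl)
        writable &= ~EFLAGS_IF;    /* IF update is silently dropped */

    return (eflags & ~writable) | (popped & writable);
}
```

A guest kernel deprivileged to ring 3 that tries to disable interrupts with popf therefore keeps running with IF unchanged, and the VMM never learns about the attempt.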
  16. Technique #1: h/w support
      • AMD’s SVM and Intel’s VT extensions to x86
        • Starting in late 2005
        • AMD Athlon 64, Intel P4, Intel Core …
      • Many VMMs now utilize this h/w support
        • VMware, QEMU, KVM, VirtualBox, …
      • More than just simple fixes
        • I.e. more than making sure privileged instructions trap
      • H/w support’s goal: minimize traps and emulation in VMM
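  17. Technique #1: h/w support
      [Figure: two CPU modes side by side, each with its own CPL=0 and CPL=3. VMX root mode holds the VMM/OS (CPL=0) and its apps (CPL=3); VMX non-root mode holds the guest OS (CPL=0) and its apps (CPL=3). vmrun switches from root to non-root; vmexit switches back.]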
  18. Technique #1: h/w support
      • VMM sets up an in-memory VM control data structure (VMCS) per VM
      • VMCS virtualizes
        • System registers:
          • %CR0, %CR3, %EIP, %eflags, %CS, %SS, …
      • VMCS allows VMM to specify exit controls:
        • E.g. whether to trap upon “HLT”, “LGDT” instructions
      • Effect: fewer traps
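The idea of per-VM exit controls can be sketched as a bitmask the VMM fills in before entering the guest. The layout below is hypothetical and much simpler than the real VMCS; `struct vm_control`, `EXIT_ON_HLT`, `EXIT_ON_LGDT`, and `causes_vmexit` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit-control bits (not the real VMCS encoding). */
enum exit_ctl {
    EXIT_ON_HLT  = 1u << 0,
    EXIT_ON_LGDT = 1u << 1
};

/* Hypothetical per-VM control structure: guest register state that the
 * h/w loads on VM entry, plus the exits the VMM asked for. */
struct vm_control {
    uint32_t guest_cr0, guest_cr3, guest_eip, guest_eflags;
    uint32_t exit_controls;
};

enum guest_op { GUEST_ADD, GUEST_HLT, GUEST_LGDT };

/* Models the h/w decision: does this guest instruction leave the guest? */
bool causes_vmexit(const struct vm_control *vc, enum guest_op op)
{
    assert(vc);
    switch (op) {
    case GUEST_HLT:  return (vc->exit_controls & EXIT_ON_HLT) != 0;
    case GUEST_LGDT: return (vc->exit_controls & EXIT_ON_LGDT) != 0;
    default:         return false;  /* ordinary instructions run natively */
    }
}
```

Leaving a bit clear lets the guest run that instruction against its own virtualized state with no exit, which is where the "fewer traps" effect comes from.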
  19. Technique #2: paravirtualization
      • Fancy word for “we have to modify and recompile OS”
      • Popular back when x86 was not easily virtualizable
      • VMM runs in privileged mode, VMs run in unprivileged mode
      • Modified OS calls into VMM for memory, I/O, interrupt setup, etc.
        • ~3000 LoC modifications for Linux, ~5000 LoC for XP
  20. Technique #3: dynamic binary translation
      • We have seen BT before. Where?
        • Eraser intercepts all memory reads/writes to check for lock protection
      • How BT enables software virtualization:
        • Find all privileged instructions in OS and replace them with call-ins to VMM for emulation
      • Why not static binary translation?
      • Popularized by VMware
        • QEMU also supports BT
  21. Technique #3: binary translation
      Source:

          void clearbal() { while (balance > 0) balance--; }

      Original code:

          …
          804836d:  a1 8c 95 04 08   mov   0x804958c,%eax
          8048372:  83 e8 01         sub   $0x1,%eax
          8048375:  a3 8c 95 04 08   mov   %eax,0x804958c
          804837a:  a1 8c 95 04 08   mov   0x804958c,%eax
          804837f:  85 c0            test  %eax,%eax
          8048381:  7f ea            jg    804836d
          8048383:  c3               ret
          …

      [Figure: translation engine and code cache. Code cache entry at 90d: mov…, sub…, mov…, mov…, test…, jg, followed by call<TE_jmp>(804836d) and call<TE_ret>. Lookup table (Original → Cache): 804836d → 90d. Patched branch in the cache: jg 90d.]
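The Original → Cache table in this slide can be sketched as a small hash map from guest code addresses to code-cache addresses, with translation on a miss. This is a toy under stated assumptions: `translate_block` is a hypothetical stand-in that only reserves 0x40 bytes of cache per block instead of emitting real code, and the table size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TC_SLOTS 64

struct tc_entry { uint32_t orig_pc; uint32_t cache_pc; bool valid; };

struct tcache {
    struct tc_entry slot[TC_SLOTS];
    uint32_t next_free;       /* next unused code-cache address */
    unsigned translations;    /* how many blocks were translated */
};

/* Stand-in for the translation engine: a real one would copy the block,
 * rewriting privileged instructions and control transfers. */
static uint32_t translate_block(struct tcache *tc, uint32_t orig_pc)
{
    (void)orig_pc;
    uint32_t at = tc->next_free;
    tc->next_free += 0x40;
    tc->translations++;
    return at;
}

/* Return the cache address for orig_pc, translating on first use.
 * Linear probing; entries are never removed in this sketch. */
uint32_t tc_lookup(struct tcache *tc, uint32_t orig_pc)
{
    unsigned i = orig_pc % TC_SLOTS;
    for (unsigned n = 0; n < TC_SLOTS; n++, i = (i + 1) % TC_SLOTS) {
        struct tc_entry *e = &tc->slot[i];
        if (e->valid && e->orig_pc == orig_pc)
            return e->cache_pc;               /* hit: no translation cost */
        if (!e->valid) {                      /* miss: translate and record */
            e->orig_pc = orig_pc;
            e->cache_pc = translate_block(tc, orig_pc);
            e->valid = true;
            return e->cache_pc;
        }
    }
    assert(!"translation cache table full");
    return 0;
}
```

Once a branch target has an entry, the translator can patch the branch to jump straight to the cached address (the `jg 90d` in the figure) and skip the lookup entirely.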
  22. Technique #3: binary translation
      • Is BT applied on user-level programs?
      • BT performance
        • Most instructions can be executed identically
        • Translation overhead is incurred only the first time code is executed
        • Intercepting and emulating privileged instructions is expensive
          • e.g. syscalls
        • BT slows down call/ret control flow
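  23. Memory virtualization
      [Figure: machine memory spans MA=0 to 4G. OS1 and OS2 each see their own “physical” memory, PA=0 to 1G. Each guest page table (guest %cr3 holds a PA) maps VA → PA. Question posed: can h/w use this page table? The shadow page table (real %cr3 holds an MA) maps VA → MA.]
      • VMM gives the corresponding shadow page table to h/w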
  24. Maintain shadow page tables
      • Correctness requires:
        • A shadow pg table must be consistent with its actual pg table
      • Strawman 1:
        • On switching address space (“mov %cr3 …”), construct a fresh shadow pg table
        • Incurs expensive addr space switch overhead
      • Strawman 2:
        • On switching address space, use an empty shadow pg table
        • Upon incurring page faults, modify shadow PTE according to actual PTE
        • Incurs many hidden pg faults
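Strawman 2 can be sketched with flat, single-level tables. Everything here is a simplification for illustration: `struct vm_mmu`, `shadow_flush`, and `shadow_fault` are hypothetical names, real x86 page tables are multi-level, and permission bits are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 16

struct pte { uint32_t frame; bool present; };

struct vm_mmu {
    struct pte guest[NPAGES];   /* actual pg table, owned by guest: VPN -> PPN */
    uint32_t   pmap[NPAGES];    /* VMM's own map: PPN -> MPN */
    struct pte shadow[NPAGES];  /* shadow pg table given to h/w: VPN -> MPN */
    unsigned   hidden_faults;   /* faults the guest never sees */
};

enum fault_result { FAULT_HIDDEN, FAULT_TRUE };

/* Guest switched address space ("mov %cr3"): start from an empty shadow. */
void shadow_flush(struct vm_mmu *m)
{
    for (unsigned i = 0; i < NPAGES; i++)
        m->shadow[i].present = false;
}

/* Page fault taken while running on the shadow table. */
enum fault_result shadow_fault(struct vm_mmu *m, uint32_t vpn)
{
    assert(vpn < NPAGES);
    if (!m->guest[vpn].present)
        return FAULT_TRUE;            /* real fault: forward to the guest OS */

    uint32_t ppn = m->guest[vpn].frame;
    assert(ppn < NPAGES);
    m->shadow[vpn].frame = m->pmap[ppn];   /* compose VA->PA with PA->MA */
    m->shadow[vpn].present = true;
    m->hidden_faults++;
    return FAULT_HIDDEN;              /* fixed by VMM; guest just retries */
}
```

Every first touch of a page after an address-space switch costs one hidden fault, which is the overhead the slide calls out.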
  25. Maintain shadow page tables
      • Can VMM cache shadows?
        • Challenge: what if OS modifies one of the pg tables w/o knowledge of VMM?
      • Insight: write-protect actual pg tables
        • Referred to as “memory traces”
      • VMM may choose not to populate all shadow PTEs at once
        • Saves addr space switch time
        • Fewer hidden pg faults than strawman #2 because shadow PTEs are cached
  26. More h/w support
      • Intel/AMD added h/w support for memory virtualization
        • e.g. Intel Core i7 (Q4 2008)
        • Adds a new table from PA to MA
        • h/w traverses two pg tables, VA → PA and PA → MA, to fill the TLB
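The two-table walk can be sketched as a composition of two lookups. This is a flat-table simplification: real hardware walks multi-level tables at both stages, and `two_stage_translate` and `NO_MAPPING` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1)
#define NO_MAPPING  UINT32_MAX

/* guest_pt:  indexed by virtual page number, yields physical page number
 *            (maintained by the guest OS).
 * nested_pt: indexed by physical page number, yields machine page number
 *            (maintained by the VMM).
 * Returns false if either stage has no mapping (a fault in real h/w). */
bool two_stage_translate(const uint32_t *guest_pt, size_t guest_len,
                         const uint32_t *nested_pt, size_t nested_len,
                         uint32_t va, uint32_t *ma_out)
{
    assert(guest_pt && nested_pt && ma_out);

    uint32_t vpn = va >> PAGE_SHIFT;
    if (vpn >= guest_len || guest_pt[vpn] == NO_MAPPING)
        return false;                       /* stage 1: VA -> PA failed */

    uint32_t ppn = guest_pt[vpn];
    if (ppn >= nested_len || nested_pt[ppn] == NO_MAPPING)
        return false;                       /* stage 2: PA -> MA failed */

    *ma_out = (nested_pt[ppn] << PAGE_SHIFT) | (va & PAGE_MASK);
    return true;
}
```

Because the hardware does this composition itself on a TLB miss, the guest can edit its own page table freely and the VMM no longer needs shadow tables or write-protection traces.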
  27. Virtualize I/O
      • OS communicates with I/O devices via
        • Special instructions in/out (port I/O)
        • Memory-mapped I/O (MMIO)
        • Interrupts
        • DMA
      • Virtualization
        • in/out and memory-mapped I/O accesses must trap into VMM
        • VMM runs a simulation of the I/O device
      • Simulation:
        • Interrupt: generate interrupt in CPU simulator
        • DMA: copy data to/from physical memory of VM
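As a rough sketch of device simulation, the handler below is what a VMM might run when a guest `out` instruction traps. The device is a made-up byte-output device: `struct virt_uart`, `vmm_emulate_out`, and the choice of port 0x3f8 are assumptions for illustration, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UART_DATA_PORT 0x3f8   /* illustrative port number */

/* Hypothetical simulated device state held by the VMM. */
struct virt_uart {
    uint8_t out[64];      /* bytes the guest has written so far */
    size_t  out_len;
    bool    irq_pending;  /* VMM should inject a virtual interrupt */
};

/* Called after a guest `out` traps. Returns false for ports this
 * device does not own, so the VMM can try other simulated devices. */
bool vmm_emulate_out(struct virt_uart *u, uint16_t port, uint8_t value)
{
    assert(u);
    if (port != UART_DATA_PORT)
        return false;
    if (u->out_len < sizeof u->out)
        u->out[u->out_len++] = value;   /* simulate the device side effect */
    u->irq_pending = true;              /* later delivered to the guest CPU */
    return true;
}
```

The guest driver is unmodified: it believes it wrote to a real port, while the VMM decides what the write means and when the completion interrupt arrives.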
  28. Managing memory in VMM
      • Configure VMs to use more “physical” memory than actually available
      • What happens when running out of memory?
      • Strawman: use LRU paging at VMM
        • OS already uses LRU → double paging
        • OS will recycle whatever “physical page” VMM just paged out
        • Better to do random eviction
  29. ESX: Reclaiming pages
      • Idea: trick OS into returning memory to VMM
      • OS is better at deciding what to swap
        • Normally OS uses all available memory
        • E.g. buffer cache contains old pages; OS won’t discard them if it doesn’t need memory
      • ESX trick: balloon driver
  30. Balloon driver
      [Figure labels: VMM, OS1, OS2, balloon driver.]
      • Balloon is a special pseudo-device loaded into the OS
      • VMM instructs balloon to inflate or deflate depending on memory pressure
      • Balloon inflates by requesting lots of “pinned” memory pages
      • To accommodate the inflated balloon, OS releases/swaps out some of its memory pages
      • Balloon tells VMM to recycle its “private” pinned pages
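The inflate/deflate cycle can be sketched with simple page counters. This is a toy accounting model, not ESX code: `struct guest_mem`, `balloon_inflate`, and `balloon_deflate` are hypothetical names, and the guest's choice of which page to swap is abstracted away.

```c
#include <assert.h>

/* Toy view of one guest's memory, counted in pages. */
struct guest_mem {
    unsigned free_pages;      /* pages the guest OS currently has free */
    unsigned resident_pages;  /* in-use pages held in guest "physical" memory */
    unsigned swapped_pages;   /* pages the guest OS paged out to its own swap */
    unsigned balloon_pages;   /* pages pinned by the balloon driver */
};

/* Balloon asks the guest OS for `want` pinned pages. When the guest has no
 * free page, its own replacement policy swaps one out to make room.
 * Returns the number of pages the VMM may now reuse elsewhere. */
unsigned balloon_inflate(struct guest_mem *g, unsigned want)
{
    unsigned got = 0;
    assert(g);
    while (got < want) {
        if (g->free_pages == 0) {
            if (g->resident_pages == 0)
                break;                 /* nothing left to reclaim */
            g->resident_pages--;       /* guest picks a victim page... */
            g->swapped_pages++;        /* ...and swaps it out itself */
            g->free_pages++;
        }
        g->free_pages--;               /* balloon pins the free page */
        g->balloon_pages++;
        got++;
    }
    return got;
}

/* Memory pressure eased: balloon releases pages back to the guest OS. */
unsigned balloon_deflate(struct guest_mem *g, unsigned n)
{
    assert(g);
    if (n > g->balloon_pages)
        n = g->balloon_pages;
    g->balloon_pages -= n;
    g->free_pages += n;
    return n;
}
```

The VMM never chooses which guest page to evict; that decision stays with the guest OS, which is the point of the trick.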
  31. ESX: sharing pages across VMs
      • Many VMs run same OS and programs
        • Many Linux boxes with Apache server
      • Idea: use 1 machine page for identical physical pages
      • Periodically scan to find identical machine pages
        • Do copy-on-write to eliminate redundancy
      • Optimization: use a hash table keyed by hash(content)
        • Allows quick lookup based on page content
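The hash-keyed scan can be sketched as follows. The assumptions are mine: FNV-1a stands in for whatever hash the real system uses, the table has one page per bucket with no chaining, and `share_pages` only records which page would back each physical page rather than remapping anything copy-on-write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NBUCKETS  256

/* 32-bit FNV-1a over one page (stand-in hash for this sketch). */
static uint32_t page_hash(const uint8_t *p)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* mem holds npages contiguous pages. On return, backing[i] is the index of
 * the page whose contents back page i (i itself when unshared).
 * Returns how many pages could be dropped by sharing. */
size_t share_pages(const uint8_t *mem, size_t npages, size_t *backing)
{
    size_t bucket[NBUCKETS];
    bool   used[NBUCKETS] = { false };
    size_t shared = 0;

    assert(mem && backing);
    for (size_t i = 0; i < npages; i++) {
        const uint8_t *page = mem + i * PAGE_SIZE;
        size_t b = page_hash(page) % NBUCKETS;

        backing[i] = i;
        if (!used[b]) {
            used[b] = true;            /* first page seen with this hash */
            bucket[b] = i;
        } else if (memcmp(mem + bucket[b] * PAGE_SIZE, page, PAGE_SIZE) == 0) {
            backing[i] = bucket[b];    /* identical: map to the existing page */
            shared++;
        }
        /* same bucket, different content: leave the page unshared */
    }
    return shared;
}
```

The full `memcmp` after a hash match matters: equal hashes only suggest equal content, and sharing two different pages would corrupt a guest.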
  32. Idle memory tax
      • Proportional share memory allocation
        • Important VM gets more memory
        • Reclaim memory from VM with smallest “shares-to-pages” (S/P) ratio
        • If S_A = 2·S_B, A can have 2x as much memory as B
      • Problem:
        • High-share VMs hoard more memory than needed
      • Solution: idle memory tax
        • Instead of S/P, reclaim from VM w/ smallest S / (P · (f + k(1 − f)))
          • f: fraction of non-idle pages
          • k ≥ 1: a configurable idle page “cost” parameter
        • Statistically sample to determine f
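A small worked example of the adjusted ratio S / (P · (f + k(1 − f))) shows how the tax changes the reclamation victim. The struct and function names are hypothetical, and the share, page, and f values in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct vm_alloc {
    double shares;           /* S: the VM's share allocation */
    double pages;            /* P: pages currently held */
    double active_fraction;  /* f: fraction of non-idle pages */
};

/* Adjusted shares-per-page ratio; idle pages are charged k times as much
 * as active ones. With k = 1 this reduces to plain S/P. */
double adjusted_ratio(double shares, double pages, double f, double k)
{
    assert(pages > 0 && f >= 0 && f <= 1 && k >= 1);
    return shares / (pages * (f + k * (1.0 - f)));
}

/* Index of the VM to reclaim from: the one with the smallest ratio. */
size_t pick_victim(const struct vm_alloc *vms, size_t n, double k)
{
    assert(vms && n > 0);
    size_t victim = 0;
    double lowest = adjusted_ratio(vms[0].shares, vms[0].pages,
                                   vms[0].active_fraction, k);
    for (size_t i = 1; i < n; i++) {
        double r = adjusted_ratio(vms[i].shares, vms[i].pages,
                                  vms[i].active_fraction, k);
        if (r < lowest) {
            lowest = r;
            victim = i;
        }
    }
    return victim;
}
```

Take VM A with S = 200, P = 100, f = 0.2 and VM B with S = 100, P = 100, f = 1.0. With k = 1 the ratios are 2.0 and 1.0, so B loses pages even though A is mostly idle. With k = 4, A's ratio becomes 200 / (100 · 3.4) ≈ 0.59 while B's stays 1.0, so the hoarding VM A is reclaimed from first.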
  33. Summary: VMM attributes
      • Software compatibility
        • Runs all software
      • Low overhead
        • Near “raw” machine performance
      • Complete isolation
        • Total data isolation between virtual machines
      • Encapsulation
        • VMs are not tied to physical machines
        • Checkpoint/migration
  34. Example: VMM-based IDS
      • Tradeoffs of intrusion detection systems (IDS):
        • Host-based IDS:
          • Good visibility to detect intruder
          • Weak isolation: intruder can disable the IDS
        • Network-based IDS:
          • Good isolation from attacker
          • Weak visibility of what’s actually going on
      • Can we have both visibility and isolation?
  35. Example: VMM-based IDS
      • Strong isolation
        • VMM isolates software in VM from VMM
        • Compromised OS cannot disable IDS in VMM
      • Introspection: peek inside the VM
        • Examine physical memory, registers, I/O devices for patterns of break-ins
      • Interposition: modify h/w abstraction to enhance security
  36. Compute Utility
      • Virtual appliance abstraction
        • Target specialized environment (e.g. program development)
        • Store targeted VMs in centralized repository
        • Cached on running machines
      • Benefits:
        • Simplified system admin
        • Mobility: computing environment follows user around
  37. Transparent replication
      • Replicate VMs across multiple physical machines
        • If one fails, another can take over immediately
      • No software modification necessary
      • Preserves all active network connections