Your SlideShare is downloading. ×
0
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
summary
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

summary

432

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
432
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • PGFAULT– “This is due mostly to the shorter path whereby the software VMM receives control” PTEMOD– adaptive discovery/translation
  • Transcript

    • 1. A Comparison of Software and Hardware Techniques for x86 Virtualization Presented by Mike Marty By Keith Adams and Ole Ageson VMWare
    • 2. The Renaissance of Virtualization <ul><li>1970s: virtual machines first used </li></ul><ul><li>1990s: </li></ul><ul><ul><li>x86 becomes prominent server platform </li></ul></ul><ul><ul><li>No vertical integration in x86 </li></ul></ul><ul><ul><li>Lack of enterprise features in commodity OSs </li></ul></ul><ul><li>1999: VMWare first product to virtualize x86 </li></ul><ul><li>2006: AMD and Intel offer hardware support </li></ul>
    • 3. Outline <ul><li>Classic Virtualization </li></ul><ul><li>Software Virtualization </li></ul><ul><li>Intel/AMD Hardware Virtualization </li></ul><ul><li>Comparison and Results </li></ul><ul><li>Discussion </li></ul>
    • 4. Classic Virtualization <ul><li>Popek and Goldberg’s Criteria: </li></ul><ul><ul><li>Fidelity – run any software </li></ul></ul><ul><ul><li>Performance – run it fairly fast </li></ul></ul><ul><ul><li>Safety – VMM manages all hardware </li></ul></ul><ul><li>Trap-and-Emulate only real solution until recently </li></ul>
    • 5. Trap-and-Emulate Virtualization <ul><li>1. De-Privilege OS </li></ul>OS apps kernel mode user mode
    • 6. Trap-and-Emulate Virtualization <ul><li>1. De-Privilege OS </li></ul>OS apps kernel mode user mode virtual machine monitor OS apps
    • 7. Trap-and-Emulate Virtualization <ul><li>1. De-Privilege OS </li></ul><ul><li>2. Shadow structures and memory tracing </li></ul>OS apps kernel mode user mode virtual machine monitor OS apps primary page table shadow page table shadow page table
    • 8. Trap-and-Emulate cont. <ul><li>Traps are expensive (~3000 cycles) </li></ul><ul><li>Many traps unavoidable </li></ul><ul><ul><li>E.g., page faults </li></ul></ul><ul><li>Important enhancements </li></ul><ul><ul><li>“ Paravirtualization” to reduce traps (e.g., Xen) </li></ul></ul><ul><ul><li>Hardware VM modes (e.g., IBM s370) </li></ul></ul>
    • 9. Can x86 Trap and Emulate? <ul><li>No </li></ul><ul><ul><li>Even with 4 execution modes! </li></ul></ul><ul><ul><li>Key problem: dual-purpose instructions don’t trap </li></ul></ul><ul><li>Classic Example: popf instruction </li></ul><ul><ul><li>Same instruction behaves differently depending on execution mode </li></ul></ul><ul><ul><li>User Mode: changes ALU flags </li></ul></ul><ul><ul><li>Kernel Mode: changes ALU and system flags </li></ul></ul><ul><ul><li>Does not generate a trap in user mode </li></ul></ul>
    • 10. Outline <ul><li>Classic Virtualization </li></ul><ul><li>Software Virtualization </li></ul><ul><li>Intel/AMD Hardware Virtualization </li></ul><ul><li>Comparison and Results </li></ul><ul><li>Discussion </li></ul>
    • 11. Software Virtualization with VMWare <ul><li>Binary translation! </li></ul>X86 X86 (mostly safe, user-mode)
    • 12. VMWare’s Binary Translation <ul><li>On-the-fly </li></ul><ul><li>Only need to translate OS code </li></ul><ul><ul><li>Makes SPEC run fast by default </li></ul></ul><ul><li>Most instruction sequences don’t change </li></ul><ul><li>Instructions that do change: </li></ul><ul><ul><li>Indirect control flow: call/ret, jmp </li></ul></ul><ul><ul><li>PC-relative addressing </li></ul></ul><ul><ul><li>Privileged instructions </li></ul></ul><ul><li>Adaptive Translation </li></ul><ul><ul><li>“ Innocent until proven guilty” </li></ul></ul>
    • 13. Performance Advantages of BT <ul><li>Translation sequences can be faster than native: </li></ul><ul><ul><li>cli vs. vpu.flags.IF := 0 </li></ul></ul><ul><li>Avoid privilege instruction traps </li></ul><ul><ul><li>Example: rdtsc </li></ul></ul><ul><ul><ul><li>Trap-and-emulate: 2030 cycles </li></ul></ul></ul><ul><ul><ul><li>Callout-and-emulate: 1254 cycles </li></ul></ul></ul><ul><ul><ul><li>BT emulation : 216 cycles (but TSC value is stale) </li></ul></ul></ul>
    • 14. Outline <ul><li>Classic Virtualization </li></ul><ul><li>Software Virtualization </li></ul><ul><li>Intel/AMD Hardware Virtualization </li></ul><ul><li>Comparison and Results </li></ul><ul><li>Discussion </li></ul>
    • 15. AMD SVM and Intel VT <ul><li>Extensions to x86-32 and x86-64 </li></ul><ul><ul><li>Allows classic trap-and-emulate ! </li></ul></ul><ul><ul><li>Hardware VM modes to reduce traps </li></ul></ul><ul><ul><li>Details: </li></ul></ul><ul><ul><ul><li>VMCB – virtual machine control block </li></ul></ul></ul><ul><ul><ul><li>VMX mode for running guest OSs </li></ul></ul></ul><ul><ul><ul><li>Vmrun instruction to enter VMX mode </li></ul></ul></ul><ul><ul><ul><li>Many instructions and events cause VMX exits </li></ul></ul></ul><ul><ul><ul><li>Control fields in VMCB can change VMX exit behavior </li></ul></ul></ul>
    • 16. Hardware VM Example: syscall <ul><li>VMM fills in VMCB exception table for Guest OS </li></ul><ul><ul><li>Sets bit in VMCB not exit on syscall exception </li></ul></ul><ul><li>VMM executes vmrun </li></ul><ul><li>Application invokes syscall </li></ul><ul><li>CPU  CPL #0, does not trap , vectors to VMCB exception table </li></ul>
    • 17. Software BT vs. Hardware VM <ul><li>Binary Translation VMM: </li></ul><ul><ul><li>Converts traps to callouts </li></ul></ul><ul><ul><ul><li>Callouts faster than trapping </li></ul></ul></ul><ul><ul><li>Faster emulation routine </li></ul></ul><ul><ul><ul><li>VMM does not need to reconstruct state </li></ul></ul></ul><ul><ul><li>Avoids callouts entirely </li></ul></ul><ul><li>Hardware VMM: </li></ul><ul><ul><li>Preserves code density </li></ul></ul><ul><ul><li>No precise exception overhead </li></ul></ul><ul><ul><li>Faster system calls </li></ul></ul>
    • 18. Compute-bound Benchmarks Bottomline: little difference for SPEC
    • 19. Mixed Benchmarks Process-based Thread-based Who Cares? Would Hardware VM do better for multithreaded database? Cygwin Make is SLOW!
    • 20. Costs of Operations
    • 21. Nanobenchmarks
    • 22. VMWare Nanobenchmarks <ul><li>syscall </li></ul><ul><ul><li>Native/Hardware VMM: same </li></ul></ul><ul><ul><li>Software VMM: +2000 cycles </li></ul></ul><ul><li>in </li></ul><ul><ul><li>Native: 3209 cycles </li></ul></ul><ul><ul><li>Hardware VMM: 15826 cycles </li></ul></ul><ul><ul><li>Software VMM: 15x faster? </li></ul></ul><ul><li>call/ret </li></ul><ul><ul><li>Native/Hardware VMM: 11 cycles </li></ul></ul><ul><ul><li>Software VMM: 51 cycles </li></ul></ul>
    • 23. Opportunities <ul><li>Faster Microarchitecture implementations </li></ul><ul><ul><li>Intel Core Duo already much faster than P4 </li></ul></ul><ul><li>Hardware VMM algorithms </li></ul><ul><li>Software/Hardware Hybrid VMM </li></ul><ul><li>Hardware MMU </li></ul><ul><ul><li>Virtualize DMA </li></ul></ul>
    • 24. Catalysts for Discussion <ul><li>Is BT really faster for things that matter? </li></ul><ul><ul><li>Process-based Apache on Linux? </li></ul></ul><ul><ul><li>Who configures a system to constantly page? </li></ul></ul><ul><li>VMWare is done, why bother with Hardware VM support? </li></ul><ul><ul><li>Simplicity of VMM w/ Hardware support </li></ul></ul><ul><ul><li>New applications </li></ul></ul><ul><li>Will next-gen hardware make binary translation unnecessary? </li></ul>

    ×