IBM Presentations: Blue Pearl Asterisk template
IBM Presentations: Blue Pearl Asterisk template Presentation Transcript

  • 1. T.J. Watson Research Center Hardware Virtualization Trends Leendert van Doorn Hardware Virtualization Trends 6/14/2006 © 2006 IBM Corporation
  • 3. Outline
    Virtualization 101
    The world is changing
    Processor virtualization (Intel VT-x, VT-x2, AMD SVM)
    Security enhancements: LT & Presidio
    Paravirtualization (software isolation approach)
    I/O Virtualization (AMD, Intel VT-d)
    Hypervisor Landscape
    Talk is based on publicly available information
  • 4. Virtualization In Servers
    Reduce total cost of ownership (TCO)
      – Increased systems utilization (current servers have less than 10% utilization)
      – Reduce hardware (25% of the TCO)
      – Space, electricity, cooling (50% of the operating cost of a data center)
    Management simplification
      – Dynamic provisioning
      – Workload management/isolation
      – Virtual machine migration
      – Reconfiguration
    Better security
    Legacy compatibility
    Virtualization protects IT investment
    [Diagram: virtual machines on a virtualization layer on a physical machine; caption: increase server utilization]
  • 5. Virtualization is not a Panacea
    Increasing utilization through consolidation decreases the reliability
      – Need better hardware reliability (increased MTBF), error reporting, and fault tolerance
      – Need better software fault isolation
    [Diagram: 3 independent systems versus 3 dependent systems sharing one virtualization layer]
  • 6. Virtual Machine Monitor Approaches
    Type 2 VMM: apps run on a VMM hosted by a host OS on the hardware (JVM, CLR)
    Hybrid VMM: guest OSes run beside a host OS that shares the hardware with the VMM (MS Virtual Server, VMware Workstation)
    Type 1 VMM: guest OSes run on a VMM that sits directly on the hardware (VMware ESX, Xen, MS Viridian)
  • 7. Example: Xen System Architecture
    VMM and Microkernel are akin
    [Diagram: domain 0 (control, I/O) and guest domains running XenoLinux and Windows with their applications, on top of Xen and the hardware]
  • 8. x86 Virtualization Problem
    VMM needs to intercept the privileged instructions that are executed in the guest OS
    Traditional x86 architecture is not virtualizable
      – POPF: pops a value from the stack into EFLAGS
      – If in privileged mode: the interrupt flag (IF) is updated
      – If in non-privileged mode: IF is left unchanged and no exception is raised, so the VMM never sees the attempt
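The POPF behaviour described on this slide can be modelled in a few lines. This is a minimal sketch that tracks only the IF bit and assumes the protected-mode rule that IF changes only when CPL <= IOPL; the function name `popf` and its parameters are illustrative, not part of any real API.

```python
IF_FLAG = 1 << 9  # interrupt-enable flag, bit 9 of EFLAGS


def popf(eflags, popped, cpl, iopl=0):
    """Return EFLAGS after POPF in this toy model (only IF is modelled)."""
    if cpl <= iopl:
        # Sufficiently privileged: IF takes the value popped from the stack.
        return (eflags & ~IF_FLAG) | (popped & IF_FLAG)
    # Not privileged: IF keeps its old value and no exception is raised.
    return eflags


ring0 = popf(eflags=IF_FLAG, popped=0, cpl=0)  # kernel clears IF
ring3 = popf(eflags=IF_FLAG, popped=0, cpl=3)  # deprivileged guest tries the same
print(bool(ring0 & IF_FLAG), bool(ring3 & IF_FLAG))
```

A guest kernel that has been moved to ring 3 believes it disabled interrupts, yet nothing trapped, which is why the VMM cannot rely on trap-and-emulate for this instruction.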
  • 9. Virtualization Approaches
    Full virtualization
      – Binary rewriting
        • Inspect each basic block, rewrite privileged instructions
        • VMware, Virtual PC, qemu
      – Hardware assist (Intel VT-x, AMD SVM)
        • Conceptually, introduce a new CPU mode
        • Xen, VMware, MS Viridian
    Paravirtualization
      – Modify guest OS to cooperate with the VMM
      – Xen, VMware, L4, Denali
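The binary rewriting idea (inspect a basic block, replace the sensitive instructions) can be sketched at the level of mnemonics. This is a toy illustration only: real rewriters decode machine code, and the `SENSITIVE` set and the `call_vmm` marker are hypothetical names chosen for this example.

```python
# Hypothetical set of sensitive mnemonics; a real VMM works on decoded machine code.
SENSITIVE = {"popf", "cli", "sti", "mov_to_cr3"}


def rewrite_block(block):
    """Return a copy of the basic block with sensitive instructions
    replaced by a call into the VMM that emulates them."""
    return [("call_vmm", op) if op in SENSITIVE else op for op in block]


block = ["mov", "add", "popf", "jmp"]
print(rewrite_block(block))
```

Harmless instructions pass through unchanged, so only the rewritten ones pay the cost of entering the VMM.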
  • 10. The World is Changing
    Hardware virtualization support will become ubiquitous
      – Every CPU will have (some) kind of virtualization support
      – Intel VT-x, AMD SVM (Pacifica), PowerPC, Cell
    64-bit is here to stay
    Processor security features
      – Dynamic root of trust (AMD SVM, Intel LT)
    Massive multi/many core
      – Computing cycles will become really cheap and the vendors cannot keep up
      – What do you do with 64 cores on a single socket? Reliability? Health monitoring?
    I/O MMU
      – Intel, AMD IOMMU, IBM Summit/Hurricane allow secure direct device access to partitions
      – PCI Express will include virtualization support
  • 11. Processor Virtualization Features
    Both Intel and AMD defined processor extensions for their CPU architectures
    Intel: Vanderpool Technology (VT-x, VT-x2)
    AMD: Secure Virtual Machine (SVM, Pacifica), Rev F, Rev G, …
    From 10,000 ft. both look very similar:
      – Container model (similar to mainframe SIE, start interpretive execution)
  • 12. Intel Vanderpool Technology (VT)
    VT-x adds new instructions such as VMXON, VMXOFF, VMLAUNCH, VMRESUME, VMCALL, …
    VM entry is caused by a VMLAUNCH or a VMRESUME
    Each guest has a VMCS (virtual machine control structure) for its state
    [Diagram: instruction stream over time; the VMM runs between VMXON and VMXOFF, with VM entries into Guest OS 1 and Guest OS 2 and VM exits back to the VMM]
  • 13. AMD Secure Virtual Machine (SVM) Technology
    SVM adds the following new instructions: VMRUN, VMMCALL, …
    VM entry is caused by a VMRUN [rax]
    Each guest has a VMCB (VM control block) for its state
    Also known as Pacifica
    [Diagram: instruction stream over time, with VM entries from the VMM into Guest OS 1 and Guest OS 2 and VM exits back to the VMM]
  • 14. Intercepts and Exits
    A guest runs until
      – it performs an action that causes an exit
      – it executes a VMCALL/VMMCALL
    Exit conditions are specified per guest:
      – Exceptions (e.g., page faults) and interrupts
      – Instruction intercepts (CLTS, HLT, IN, OUT, INVLPG, MONITOR, MOV CR/DR, MWAIT, PAUSE, RDTSC, …)
    Intel VT-x has shadow registers
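The run-until-exit behaviour with per-guest intercepts can be sketched as a loop over guest events. This is a toy model under stated assumptions: events are plain strings, the intercept set stands in for the exit controls in the VMCS or VMCB, and `run_guest` is a hypothetical name.

```python
def run_guest(events, intercepts):
    """Return the guest events that cause a VM exit in this toy model."""
    exits = []
    for event in events:
        # An explicit call into the VMM always exits (VMCALL on Intel, VMMCALL on AMD);
        # everything else exits only if this guest's intercept set says so.
        if event in ("VMCALL", "VMMCALL") or event in intercepts:
            exits.append(event)
        # Otherwise the guest keeps running without involving the VMM.
    return exits


guest_a = {"HLT", "IN", "OUT"}           # this guest does not intercept RDTSC
trace = ["ADD", "HLT", "RDTSC", "VMCALL"]
print(run_guest(trace, guest_a))
```

Because the intercept set is per guest, the same trace can cause a different number of exits under a different configuration, which is the lever the VMM uses to trade control for performance.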
  • 15. Example: Full Virtualization Support for Xen
    Most device emulation is implemented in ioemu (PCI, VGA, IDE, NE2100, …)
    High performance drivers, such as ioapic, lapic, vpit are implemented in Xen
    Developed by Intel, AMD and IBM
    [Diagram: domain 0 running ioemu next to an HVM domain running RHEL3_U5 and its applications; exits go to Xen, which runs on the hardware]
  • 16. Example: Xen Implementation Statistics
    Lines of Code (C, assembly, headers):
      – Xen: Intel VT-x specific code: 3718 (3.7%)
      – Xen: AMD SVM specific code: 5721 (5.6%)
      – Xen: Common HVM code: 5794 (5.7%)
      – Tools: Common HVM code: 86313 (85%)
    Xen 3.0.2 contains both Intel VT-x and AMD SVM support
  • 17. The Cost Of VM Entry And Exit
    VM entries and exits are very heavyweight and expensive operations
    VT-x specification has 11 pages of conditions that need to be checked just on a single VM entry!
    It is paramount to reduce the number of exits
      – Shadow page tables
      – Direct device assignment
  • 18. Traditional Virtual Memory Map
    Virtual to physical translation (page table) is maintained by the OS
    The CPU walks the page tables automatically
    Page faults when page is not present or access violation
    CPU uses Translation-Lookaside-Buffer (TLB) to cache lookups
    [Diagram: virtual address space (0 to 1GB) mapped onto physical address space (0 to 4GB)]
  • 19. Virtualized Memory Map
    [Diagram: guest virtual address space (0 to 1GB) mapped onto guest physical address space (0 to 4GB), which is mapped onto host physical address space (0 to 4GB)]
  • 20. Shadow Page Tables
    VMM maintains a shadow copy of the guest page table to translate from guest virtual to host physical
    Hardware only sees the shadow copy
    [Diagram: the guest's page table maps guest virtual (0 to 1GB) to guest physical (0 to 4GB); the VMM's shadow table, pointed to by cr3, maps guest virtual (0 to 1GB) to host physical (0 to 4GB)]
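A shadow page table is, in effect, the composition of the guest's page table with the VMM's guest-physical to host-physical map. The sketch below illustrates that with single-level dictionaries; real tables are multi-level and filled lazily on faults, and the names `build_shadow`, `translate`, and `p2m` are assumptions made for this example.

```python
PAGE = 4096


def build_shadow(guest_pt, p2m):
    """Compose guest virtual -> guest physical (guest_pt)
    with guest physical -> host physical (p2m), page by page."""
    return {vpn: p2m[gpfn] for vpn, gpfn in guest_pt.items() if gpfn in p2m}


def translate(shadow, vaddr):
    """Translate a guest virtual address the way the hardware would,
    using only the shadow table."""
    vpn, offset = divmod(vaddr, PAGE)
    if vpn not in shadow:
        raise LookupError("page fault: the VMM must walk the guest page table")
    return shadow[vpn] * PAGE + offset


guest_pt = {0: 5, 1: 7}      # maintained by the guest OS
p2m = {5: 100, 7: 42}        # maintained by the VMM
shadow = build_shadow(guest_pt, p2m)
print(translate(shadow, PAGE + 16))
```

The guest never sees the host frame numbers; it continues to read and write its own table, and the VMM has to keep the shadow copy consistent with it.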
  • 21. Shadow Page Table Issues
    Managing the Shadow Page Table is expensive
      – All page faults are handled by the VMM, which has to walk the guest page tables and instantiate a shadow entry
      – VMM needs to propagate accessed and modified (A&M) bits
        • A&M bits are used by the demand paging algorithms
        • The hardware modifies the shadow page table entry
        • VMM needs to emulate A&M behavior for the guest
        • May take up to 3 actual page faults per one guest page fault
    Next generation virtualization extension will do this in hardware (Intel VT-x2, AMD SVM)
  • 22. Double (Page Table) Walker Hardware
    AMD Nested Page Table support
    Intel Extended Page Table (EPT)
    [Diagram: guest virtual address space (0 to 1GB) mapped onto guest physical address space (0 to 4GB), which is mapped onto host physical address space (0 to 4GB)]
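A hardware double walk trades VMM work for extra memory references on a TLB miss. The following sketch counts the worst case, assuming no caching of intermediate translations; the function name is illustrative. Every guest page table pointer is a guest physical address, so each guest level costs one full host walk plus the read of the guest entry, and the final guest physical address needs one more host walk.

```python
def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for one two-dimensional page walk."""
    per_guest_level = host_levels + 1   # host walk + read of the guest entry
    final_translation = host_levels     # host walk for the resulting guest physical address
    return guest_levels * per_guest_level + final_translation


print(nested_walk_refs(4, 4))   # four-level guest and host tables
print(nested_walk_refs(2, 2))   # two-level tables, as in 32-bit non-PAE paging
```

For four-level tables on both sides this gives 24 references instead of 4 for a native walk, which is why TLB behaviour matters so much with nested paging.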
  • 23. Processor Security Features
    Both Intel and AMD are working on hardware security constructs to enhance their virtualization offerings
    Primarily client focused and driven by Microsoft’s NGSCB designs
    These enhancements include processor modifications to support
      – Isolation (VMM)
      – Trusted computing (TCG)
      – Trusted keyboard/graphics I/O
    Intel Lagrande Technology
    AMD Presidio Technology
  • 24. AMD SVM Dynamic Root of Trust
    New instruction SKINIT that essentially reboots the CPU into a known state
      – Start a 64KB secure loader
      – Interrupts disabled and other processors idled
      – Inhibit DMA to the secure loader memory area
      – Measurement of the secure loader is stored in the TCG trusted platform module (a secure passive storage device)
    Measurement can be attested to by a remote party
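Storing a measurement in the TPM means extending a platform configuration register (PCR) rather than overwriting it. The sketch below assumes the TPM 1.2 style extend operation, new PCR = SHA-1(old PCR || digest), and uses a placeholder byte string in place of a real secure loader image; `extend` is a hypothetical helper, not a TPM library call.

```python
import hashlib


def extend(pcr, data):
    """Fold the SHA-1 digest of data into a 20-byte PCR value."""
    measurement = hashlib.sha1(data).digest()
    return hashlib.sha1(pcr + measurement).digest()


pcr = bytes(20)                             # register reset to zero before the launch
pcr = extend(pcr, b"secure loader image")   # placeholder for the loader contents
print(pcr.hex())
```

Because a PCR can only be extended, software that runs later cannot rewrite the recorded value, and a remote party that knows the expected loader can recompute the same digest and compare.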
  • 25. Intel VT-x / AMD SVM Comparison
    Intel VT-x (2005)
      – VMLAUNCH, VMRESUME, VMREAD, VMWRITE
      – VMCS: virtual machine control structure
    Intel VT-x2 (200?)
      – Extended Page Tables (EPT)
    Intel LT (200?)
      – SENTER (security)
      – DMA exclusion vector (security)
    AMD SVM (2006)
      – VMRUN
      – VMCB: VM control block
      – ASID tagged TLB (performance)
      – Paged realmode
      – SKINIT (security)
      – DMA exclusion vector (security)
    AMD SVM (2007)
      – Nested paging (performance)
  • 26. Paravirtualization
    What do you do when you don’t have hardware support?
    Modify guest kernel to cooperate with the hypervisor
      – Introduce hypercalls
      – All privileged instructions become hypercalls, such as: mov-to-cr3, sti, cli, etc.
    All guest applications & libraries are unmodified
    This is the approach IBM i/p-Series took, but also Xen, Microsoft’s Viridian, L4, Denali and Virtual Iron
      – VMware has now proposed a Virtual Machine Interface (VMI)
    Paravirtualization has the potential to obviate the need for all the CPU virtualization extensions
      – Not quite, certain features such as Thread Local Storage are hard to do in a paravirtualized environment
  • 27. Example: Thread Local Storage
    Thread local storage (TLS) is used by glibc to store per thread data
    In Linux, TLS resides at the top 64MB, where the Xen VMM resides
    Here Xen breaks compatibility at the application interface layer
      – Xen uses a modified glibc
    [Diagram: 4GB address space with the application from 0 to 3GB, Linux above 3GB, and TLS and the Xen VMM both in the top 64MB]
  • 28. Example: Xen Writable Page Tables
    [Diagram: guest domains and their applications share a globally visible page table (0 to 4GB) that they can only read; guest domain writes are inserted after being vetted by the Xen VMM]
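The vetting step in the diagram can be illustrated with a small check that a hypervisor might apply before inserting a guest's page table write. This is a simplified sketch, not Xen's actual code: the two rules shown (a guest may map only frames it owns, and frames holding page tables must not be mapped writable) are assumptions chosen for illustration, as are all names.

```python
def vet_pte_update(domain, frame, writable, owner, page_table_frames):
    """Return True if the proposed mapping is accepted in this toy model."""
    if owner.get(frame) != domain:
        return False    # a guest may only map frames it owns
    if writable and frame in page_table_frames:
        return False    # page-table frames stay read-only to the guest
    return True


owner = {10: "domU1", 11: "domU1", 20: "domU2"}   # frame -> owning domain
page_table_frames = {11}                           # frames currently used as page tables

print(vet_pte_update("domU1", 10, True, owner, page_table_frames))
print(vet_pte_update("domU1", 11, True, owner, page_table_frames))
print(vet_pte_update("domU1", 20, False, owner, page_table_frames))
```

Keeping page-table frames read-only is what forces every update through the hypervisor, so a guest cannot grant itself access to another domain's memory.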
  • 29. VMware’s Virtual Machine Interface
    CPU state: interrupt state, interrupt mask, halt, reboot, …
    CPU descriptors: gdt, idt, ldt, tr, …
    CPU control: read/write MSR, CR0..4, …
    CPU information: cpuid, read/write tsc, pmc, …
    Stack/privilege transitions: update kernel stack, iret, sysexit
    I/O: in/out, delay, IOPL, invd
    APIC: read/write
    Timer: get wall clock, frequency, cycles, set alarm, …
    MMU: set linear mapping, flush TLB, invalidate, set PTE, swap PTE, …
  • 30. CPU Virtualization Techniques Comparison
    Technique                      Performance  Legacy guest support  VMM complexity
    Binary rewriting               medium       yes                   high
    Paravirtualization             high         no                    medium
    Hardware assist (current gen)  low          yes                   medium-low
    Hardware assist (next gen)     medium       yes                   medium-low
    Legend: low, medium, high
  • 31. Typical System Architecture
    [Diagram: CPUs and RAM connect through a PCI bridge/IOMMU to the PCI bus with disk, network and video controllers; physical addresses are used on the CPU/RAM side and virtual I/O addresses on the device side]
  • 32. I/O Memory Management
    I/O MMU translates I/O virtual addresses to host physical addresses
    Main functions
      – Memory isolation
      – Address remapping (guest PA == I/O VA for devices assigned to guest)
      – Eliminate bounce buffers (32-bit devices into 64-bit PCI space)
    Protect system against BUSMASTER devices
    AMD disclosed their design in January 2006, Intel followed in March 2006 (VT-d)
      – Again, from a 10,000 ft. view, both are very similar
    IBM has the same Summit IOMMU in p/i/xSeries
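The isolation and remapping functions can be sketched with one translation table per device. This is a toy model under stated assumptions: single-level tables, string device identifiers instead of bus/device/function numbers, and a Python exception standing in for a DMA fault. None of the names correspond to a real IOMMU programming interface.

```python
PAGE = 4096


class IOMMU:
    def __init__(self):
        self.tables = {}   # device id -> {I/O virtual page: host physical page}

    def map(self, device, io_page, host_page):
        self.tables.setdefault(device, {})[io_page] = host_page

    def translate(self, device, io_addr):
        """Translate a device's I/O virtual address or refuse the access."""
        page, offset = divmod(io_addr, PAGE)
        table = self.tables.get(device, {})
        if page not in table:
            # Isolation: a device cannot reach memory that was never mapped for it.
            raise PermissionError(f"DMA fault: {device} has no mapping for page {page}")
        return table[page] * PAGE + offset


iommu = IOMMU()
iommu.map("nic0", io_page=2, host_page=900)   # guest PA page 2 backed by host page 900
print(iommu.translate("nic0", 2 * PAGE + 8))
```

Setting the I/O virtual page equal to the guest physical page is what lets an unmodified guest driver program the device with the addresses it already knows.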
  • 33. I/O Hosting Partition
        • With I/O hosting domain/partition, all real drivers extracted from DomU.
        • Can have multiple Device Domains to support different devices.
        • Reasonable performance possible through batching, (page flipping)… (“Unmodified device driver reuse via virtual machines” OSDI04…)
        • But performance is just not good enough to get rid of all native devices.
    [Diagram: logical partitions above the kernel <-> hypervisor interface, the hypervisor, the hardware <-> hypervisor interface, and the hardware platform]
  • 34. Direct Device Assignment
        • With IOMMU can directly give partitions control over Bus/Dev/Func.
        • Same HW results in improved reliability.
        • With right HW can support fully virtualized OS (e.g., Windows).
        • Clean support for legacy OSes and for highest performance devices (majority probably still virtualized)
        • Migration becomes impossible.
        • Device driver (DD) in each OS
    [Diagram: logical partitions above the kernel <-> hypervisor interface, the hypervisor, the hardware <-> hypervisor interface, and the hardware platform]
  • 35. Self Virtualizing Devices
        • Self virtualizing devices allow direct access by partition, e.g., InfiniBand.
        • No overhead (throughput, latency, serialization) to context switch to device domain.
        • Is exception for high performance devices… no migration.
        • Device driver (DD) in each OS
    [Diagram: logical partitions above the kernel <-> hypervisor interface, the hypervisor, the hardware <-> hypervisor interface, and the hardware platform]
  • 36. AMD I/O MMU
    AMD IOMMU has a distributed I/O TLB
    Devices themselves can do the translation of I/O virtual to physical
    [Diagram: PCI bridge/IOMMU with an IOTLB connects to the PCI bus; the Ethernet network controller has its own IOTLB, next to the disk and video controllers; physical addresses on one side, virtual I/O addresses on the other]
  • 37. Partition Migration
    IOMMU provides isolation and remapping for direct device assignment
    The current generation of IOMMU proposals do not handle partition migration
      – Problem: PCI has no ability to restart a request and does not notify failed PCI requests in all cases
    PCI SIG is working on adding support for this
  • 38. Hypervisor Landscape
    VMware is the undisputed leader in the x86 virtualization space
      – Its binary translation technology is superior
      – Only uses VT-x for x86-64 because, unlike AMD, Intel does not provide segment limits in 64-bit mode
      – Very mature product
    Xen is an open source hypervisor shipped as part of Red Hat and SUSE Linux
      – Uses paravirtualization for Linux
      – VT-x/SVM for unmodified guest OS support
      – Immature product
    And then there is the big unknown …
  • 39. Microsoft Viridian
    Will be released after Longhorn Server (end 2007)
    Will run Windows and Linux
    Uses Intel VT-x, AMD SVM, and paravirtualization (enlightenments)
    [Diagram: virtualization stack with WMI, VM service and VM worker processes, VSPs and a Windows kernel on one side; guest applications, VSCs and a Windows kernel with enlightenments on the other; connected by vmbus, on top of the hypervisor and the hardware]
  • 40. Summary
    Technology Trends
      – Hardware virtualization support will become ubiquitous
      – 64-bit is here to stay
      – Processor security features
      – Massive multi/many core
      – I/O MMU
    Challenges
      – First generation virtualization support is just a technology preview
      – Increase system reliability
        • Hardware fault tolerance
        • Software robustness
      – Management of virtual environments
  • 41. Sources
    AMD SVM specification, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf
    AMD SVM availability of Nested Paging, http://www.amd.com/us-en/assets/content_type/DownloadableAssets/JoeMenardAMDAnalystDayv2.pdf
    AMD IOMMU specification, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdf
    Intel VT-x and VT-d specification, http://www.intel.com/technology/computing/vptech/
    Intel VT-d release date (2007), http://www.virtualization.info/2006/03/intel-working-on-new-virtualization.html
    Intel LT specification, http://www.intel.com/technology/security/
    Merom, Conroe details, http://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors
    Microsoft Viridian, http://h40132.www4.hp.com/upload/ch/de/MicrosoftVS2005R2HPEvent.pdf, http://download.microsoft.com/download/4/1/e/41e56f34-8f90-405e-9daf-f8aeea249935/InTrack14dec_Presentatie_clean.ppt#35
    VMware and CPU virtualization technology, http://download3.vmware.com/vmworld/2005/pac346.pdf
    VMware Virtual Machine Interface, http://www.vmware.com/pdf/vmi_specs.pdf
  • 42. BACKUP
  • 43. Code Word Soup
    Intel VT-x, VT-x2: Vanderpool (processor virtualization) Technology for x86 architecture
    Intel VT-i: VT for IPF (Itanium)
    Intel VT-d: Intel’s IOMMU implementation
    Intel LT: Intel’s Lagrande (security) Technology
    AMD SVM (Pacifica): AMD’s Secure Virtual Machine CPU extensions
    AMD Presidio: AMD’s Lagrande equivalent
  • 44. TCG Static Root of Trust
    [Diagram: trusted boot chain starting at the root of trust (ROT): CRTM and POST in the BIOS, then GRUB Stage1 (MBR), GRUB Stage1.5 and GRUB Stage2 in the bootloader, then the Linux kernel, then the operating system (/usr/sbin/httpd, /bin/ls); measurements are recorded in TPM registers PCR01-07, PCR04-05, PCR08 and PCR10]