2. Agenda
• Introduction
History
Usage model
• Virtualization overview
CPU virtualization
Memory virtualization
I/O virtualization
• Xen/KVM architecture
Xen
KVM
• Some Intel work for OpenStack
OAT
2 2012/11/28
3. Virtualization history
• 1960s: IBM – CP/CMS on the S/360, VM/370, …
• 1970s–80s: silence
• 1998: VMware – roots in the SimOS project, Stanford
• 2003: Xen – the Xen project, Cambridge
• After that: KVM/Hyper-v/Parallels …
4. What is Virtualization
VM0          VM1                VMn
Apps         Apps      ...      Apps
Guest OS     Guest OS           Guest OS
       Virtual Machine Monitor (VMM)
Platform HW: Processors, Memory, I/O Devices
• VMM is a layer of abstraction
  Supports multiple guest OSes
  De-privileges each OS to run as a guest OS
• VMM is a layer of redirection
  Redirects the one physical platform into the illusion of many virtual platforms
  Provides a virtual platform to each guest OS
5. Server Virtualization Usage Model
• Server Consolidation
  [Diagram: several App/OS/HW stacks consolidated onto shared VMM hosts]
  Benefit: cost savings
  • Consolidate services
  • Power saving
• R&D / Production
  Benefit: business agility and productivity
• Dynamic Load Balancing
  [Diagram: VMs migrated between hosts to even out CPU usage, e.g. 90% vs. 30%]
  Benefit: productivity
• Disaster Recovery
  Benefit: loss reduction
  • RAS (reliability, availability, serviceability)
  • Live migration
  • Reduce losses
6. Agenda
• Introduction
• Virtualization overview
CPU virtualization
Memory virtualization
I/O virtualization
• Xen/KVM architecture
• Some Intel work for OpenStack
7. X86 virtualization challenges
• Ring deprivileging
  Goal: isolate the guest OS from
  • Controlling physical resources directly
  • Modifying VMM code and data
• Ring deprivileging layout
  • VMM runs at fully privileged ring 0
  • Guest kernel runs at
    • x86-32: ring 1 (deprivileged)
    • x86-64: ring 3 (deprivileged)
  • Guest apps run at ring 3
• Ring deprivileging problems
  • Unnecessary faulting
    • Some privileged instructions
    • Some exceptions
  • Guest kernel protection (x86-64)
  • Virtualization holes
    19 sensitive but unprivileged instructions
    • SIDT/SGDT/SLDT …
    • PUSHF/POPF …
    Some user-space holes are hard to fix by a software approach
    • Hard to trap, or
    • High performance overhead
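The virtualization-hole problem can be sketched as a toy trap-and-emulate loop (instruction names and values here are illustrative only, not real x86 semantics):

```python
# Toy trap-and-emulate model. Privileged instructions fault when run in a
# deprivileged ring, so the VMM intercepts and emulates them against the
# VM's virtual state. A sensitive-but-unprivileged instruction such as
# SGDT never faults: it runs directly and leaks host state to the guest.

HOST_GDTR = 0xFFFFF000            # the real (host) GDT base

class VM:
    def __init__(self, virt_gdtr, virt_cr0):
        self.virt_gdtr = virt_gdtr   # what the guest *should* see
        self.virt_cr0 = virt_cr0

PRIVILEGED = {"mov_from_cr0"}     # faults in ring 1/3 -> VMM emulates

def execute(vm, insn):
    if insn in PRIVILEGED:
        # Trap: the VMM emulates the read against per-VM virtual state.
        return ("trapped", vm.virt_cr0)
    if insn == "sgdt":
        # Virtualization hole: no fault, so the guest reads host state.
        return ("direct", HOST_GDTR)
    raise ValueError("unknown instruction")

vm = VM(virt_gdtr=0x1000, virt_cr0=0x8005003B)
assert execute(vm, "mov_from_cr0") == ("trapped", 0x8005003B)
assert execute(vm, "sgdt") == ("direct", HOST_GDTR)  # guest sees host value
```

Binary translation and hardware assistance (next slides) are two ways of closing the second case.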
8. X86 virtualization challenges
         VM0            VM1            VM2
Ring 3   Guest Apps     Guest Apps     Guest Apps
Ring 1   Guest Kernel   Guest Kernel   Guest Kernel
Ring 0        Virtual Machine Monitor (VMM)
9. Typical X86 virtualization approaches
• Para-virtualization (PV)
  Para-virtualization approach, e.g. Xen
  Modified guest OS is aware of, and cooperates with, the VMM
  Standardization milestone: Linux 3.0
  • VMI vs. pvops
  • Bare metal vs. virtual platform
• Binary Translation (BT)
  Full-virtualization approach, e.g. VMware
  Unmodified guest OS
  Translates binary code on the fly
  • Translation blocks with caching
  • Usually used for kernel code: ~80% of native performance
  • User-space apps run natively as much as possible: ~100% of native performance
  • Overall: ~95% of native performance
  • Complicated
    • Involves excessive complexity, e.g. self-modifying code
• Hardware-assisted Virtualization (VT)
  Full-virtualization approach assisted by hardware, e.g. KVM
  Unmodified guest OS
  Intel VT-x, AMD-V
  Benefits:
  • Closes virtualization holes in hardware
  • Simplifies VMM software
  • Optimized for performance
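Whether a host supports hardware-assisted virtualization can be read from its CPU feature flags; a minimal sketch (the Linux `/proc/cpuinfo` format; the function name is ours):

```python
# Look for the Intel VT-x (vmx) or AMD-V (svm) flag in /proc/cpuinfo text.
def hw_virt_flag(cpuinfo_text):
    """Return 'vmx', 'svm', or None given /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            if "vmx" in flags:
                return "vmx"   # Intel VT-x
            if "svm" in flags:
                return "svm"   # AMD-V
    return None

# On a Linux host:
# with open("/proc/cpuinfo") as f:
#     print(hw_virt_flag(f.read()))
```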
10. Memory virtualization challenges
• A guest OS makes two assumptions
  It expects to own physical memory starting from address 0
  • BIOS/legacy OSes are designed to boot from the low 1 MB
  It expects to own mostly contiguous physical memory
  • The OS kernel requires a minimum of contiguous low memory
  • DMA requires a certain level of contiguous memory
  • Efficient memory management, e.g. less buddy-allocator overhead
  • Efficient TLB usage, e.g. super-page TLB entries
• MMU virtualization
  How to keep the physical TLB valid
  Different approaches involve different complexity and overhead
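The first assumption is satisfied by a guest-physical to host-physical map; a toy sketch (page numbers are hypothetical):

```python
# The guest sees contiguous memory starting at 0, while the VMM backs it
# with scattered host frames via a gpa->hpa (p2m) map.
PAGE = 4096

# guest physical page number -> host physical page number
p2m = {0: 0x200, 1: 0x5A1, 2: 0x133}   # guest pages 0..2

def gpa_to_hpa(gpa):
    page, off = divmod(gpa, PAGE)
    if page not in p2m:
        raise KeyError("guest physical page not populated")
    return p2m[page] * PAGE + off

# Guest address 0 exists even though host frame 0 is not given to the guest:
assert gpa_to_hpa(0) == 0x200 * PAGE
assert gpa_to_hpa(PAGE + 8) == 0x5A1 * PAGE + 8
```

The MMU-virtualization approaches on the next slides differ in where this map is applied.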
12. Memory virtualization approaches
• Direct page table
  Guest and VMM live in the same linear address space
  Guest and VMM share the same page table
• Shadow page table
  Guest page table unmodified
  • gva -> gpa
  VMM maintains a shadow page table
  • gva -> hpa
  Adds complexity and memory overhead
• Extended page table
  Guest page table unmodified
  • gva -> gpa
  • Guest keeps full control of CR3 and page faults
  VMM maintains the extended page table
  • gpa -> hpa
  • Hardware-based
  • Good scalability for SMP
  • Low memory overhead
  • Greatly reduces page-fault VM exits
• Flexible choices
  Para-virtualization
  • Direct page table
  • Shadow page table
  Full virtualization
  • Shadow page table
  • Extended page table
[Diagram: GVA -> GPA via the guest page table; GVA -> HPA via a direct or shadow page table; GPA -> HPA via the extended page table]
13. Shadow page table
[Diagram: two page-table trees, the guest's rooted at vCR3 (virtual) and the hypervisor's shadow rooted at pCR3 (physical), each a page directory (PDE) pointing to page tables (PTE)]
• Guest page table remains unmodified and visible to the guest
  Translates gva -> gpa
• Hypervisor creates a new page table for the physical MMU
  Uses hpa in the PDE/PTE entries
  Translates gva -> hpa
  Invisible to the guest
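The shadowing idea can be sketched as composing the two mappings into one table for the real MMU (page-number granularity; all mappings are hypothetical):

```python
# Toy shadow page table: the VMM composes the guest's gva->gpa mapping
# with its own gpa->hpa (p2m) mapping into a gva->hpa table, which is
# what the hardware actually walks via pCR3.
guest_pt = {0x10: 0x2, 0x11: 0x3}    # guest page table: gva page -> gpa page
p2m      = {0x2: 0x7F0, 0x3: 0x7F1}  # VMM map: gpa page -> hpa page

def build_shadow(guest_pt, p2m):
    # Must be rebuilt whenever the guest edits its page table, which is
    # why the VMM write-protects guest page tables (complexity/overhead).
    return {gva: p2m[gpa] for gva, gpa in guest_pt.items()}

shadow = build_shadow(guest_pt, p2m)   # loaded into the physical CR3
assert shadow == {0x10: 0x7F0, 0x11: 0x7F1}
```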
14. Extended page table
[Diagram: the guest CR3 roots the guest page tables (guest linear address -> guest physical address); the EPT base pointer roots the extended page tables (guest physical address -> host physical address)]
• Extended page table
  Guest has full control over its page tables and related events
  • CR3, INVLPG, page faults
  VMM controls the extended page tables
  • The complicated shadow page table is eliminated
  • Improved scalability for SMP guests
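The nested walk can be sketched as two lookups per access, in contrast to the precomputed shadow table (page-number granularity; mappings are hypothetical):

```python
# Toy EPT-style nested translation: every guest-virtual access resolves
# in two stages at access time -- guest page table (gva->gpa), then
# extended page table (gpa->hpa). Both walks are done by hardware.
guest_pt = {0x10: 0x2}     # guest-controlled: gva page -> gpa page
ept      = {0x2: 0x7F0}    # VMM-controlled:   gpa page -> hpa page

def translate(gva_page):
    gpa = guest_pt[gva_page]   # stage 1: guest page-table walk
    return ept[gpa]            # stage 2: extended page-table walk

assert translate(0x10) == 0x7F0

# The guest may update guest_pt freely with no VM exit; the VMM only
# ever edits ept. Under shadowing, the same guest update would have to
# be trapped so the shadow table could be rebuilt.
guest_pt[0x11] = 0x2
assert translate(0x11) == 0x7F0
```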
15. I/O virtualization requirements
[Diagram: device and CPU interact through register access, interrupts, DMA, and shared memory]
• An I/O device, from the OS point of view
  Resource configuration and probing
  I/O requests: port I/O, MMIO
  I/O data: DMA
  Interrupts
• I/O virtualization requires presenting the guest OS driver a complete device interface
  • Presenting an existing interface
    • Software emulation
    • Direct assignment
  • Presenting a brand-new interface
    • Para-virtualization
16. I/O virtualization approaches
• Emulated I/O
  Software emulates a real hardware device
  VMs run the same driver as for the real hardware device
  Good legacy-software compatibility
  Emulation overhead limits performance
• Para-virtualized I/O
  Uses abstract interfaces and stacks for I/O services
  FE driver: the guest runs virtualization-aware front-end drivers
  BE driver: back-end driver based on a simplified I/O interface and stack
  Better performance than emulated I/O
• Direct I/O
  Directly assign a device to the guest
  • The guest accesses the I/O device directly
  • High performance and low CPU utilization
  DMA issue
  • The guest programs guest physical addresses
  • DMA hardware only accepts host physical addresses
  Solution: DMA remapping (a.k.a. IOMMU)
  • An I/O page table is introduced
  • The DMA engine translates according to the I/O page table
  Some limitations under live migration
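DMA remapping can be sketched as a per-device I/O page table consulted on every DMA (all mappings and device names are hypothetical):

```python
# Toy IOMMU: translates the guest physical addresses a passthrough
# device was programmed with into host physical addresses, and blocks
# DMA to pages the VMM never mapped for that device.
PAGE = 4096

class IOMMU:
    def __init__(self):
        self.io_pt = {}                 # (device, gpa page) -> hpa page

    def map(self, dev, gpa_page, hpa_page):
        self.io_pt[(dev, gpa_page)] = hpa_page

    def dma(self, dev, gpa):
        page, off = divmod(gpa, PAGE)
        hpa_page = self.io_pt.get((dev, page))
        if hpa_page is None:
            raise PermissionError("DMA fault: unmapped I/O page")
        return hpa_page * PAGE + off    # address put on the memory bus

iommu = IOMMU()
iommu.map("nic0", gpa_page=0x2, hpa_page=0x7F0)
assert iommu.dma("nic0", 0x2 * PAGE + 64) == 0x7F0 * PAGE + 64
```

The fault path is what isolates a misprogrammed or malicious device from the rest of host memory.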
17. Virtual platform models
• Hypervisor model: the hypervisor runs on bare metal and contains the processor-management code (P), memory-management code (M), device drivers (DR), and device model (DM); guests run on top of it
• Host-based model: a user-level monitor (ULM) with the device model runs on a host OS, which supplies a loadable kernel module (LKM) plus its own device drivers
• Hybrid model: a thin hypervisor (U-Hypervisor) keeps only P and M; a privileged service OS hosts the device drivers and device model on behalf of the other guests
Legend: P = processor mgmt code, M = memory mgmt code, DR = device driver, DM = device model, N = NoDMA
22. Trusted Pools - Implementation
• The user specifies VM requirements through the OpenStack APIs (OSAPI / EC2 API), e.g.:
  Mem > 2G, Disk > 50G, GPGPU=Intel, trusted_host=trusted
• On "create VM", the scheduler's TrustedFilter queries the attestation server for each host's status (trusted/untrusted) and places the VM only on trusted hosts
• Each host runs a host agent on an OS/hypervisor launched with tboot on TXT-capable hardware (HW/TXT); the agent reports measurements to the attestation service
• The OAT-based attestation service exposes a Query API, Host Agent API, and Whitelist API; its appraiser checks reported measurements against a whitelist DB, alongside a privacy CA
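The filter step can be sketched as follows (modeled loosely on a Nova-style scheduler filter; the in-memory attestation table is a stub standing in for the attestation server's Query API):

```python
# Toy trusted-pools filter: keep only hosts whose attestation status is
# "trusted" when the request asks for trusted_host=trusted.
ATTESTATION = {"host1": "trusted", "host2": "untrusted", "host3": "trusted"}

def host_trust_status(host):
    # Real code would call the attestation server's Query API over HTTPS.
    return ATTESTATION.get(host, "unknown")

def trusted_filter(hosts, extra_specs):
    if extra_specs.get("trusted_host") != "trusted":
        return list(hosts)            # the filter does not apply
    return [h for h in hosts if host_trust_status(h) == "trusted"]

hosts = ["host1", "host2", "host3"]
assert trusted_filter(hosts, {"trusted_host": "trusted"}) == ["host1", "host3"]
assert trusted_filter(hosts, {}) == hosts
```

Unknown hosts are treated as untrusted, so a host that never attested can never receive a trusted workload.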