We are building a high-performance NFV (Network Functions Virtualization) platform using Xen. Unlike conventional use cases of Xen-based systems, such as data centers or public clouds, NFV platforms require low-latency and high-bandwidth I/O so that virtual middleboxes can handle small packets efficiently. Meeting such requirements in virtualized environments is challenging because of the additional virtualization overhead.
We'll discuss several solutions for high-performance packet forwarding, including SR-IOV VFs (Virtual Functions) and shared-memory channels for VM-to-VM communication, resolving the major bottlenecks and latency/real-time issues while taking advantage of Xen's architecture. We also discuss how modern hardware-based virtualization features, such as Posted Interrupts, can help. Finally, we share best practices for building such high-performance systems.
"We are building a high-performance NFV (Network Functions Virtualization) platform using Xen. Unlike the conventional use cases of Xen-based systems, such as data center or public cloud, NFV platforms require low-latency and high-bandwidth I/O for virtual middleboxes to handle small packets efficiently. It is challenging to meet such requirements in virtualization environments because of additional virtualization overhead.
We'll discuss a couple of solutions for high-performance packet forwarding, including SR-IOV VF (Virtual Function), shared-memory channels for VM-to-VM communication, resolving the major bottlenecks and latency/realtime issues, taking advantage of the Xen's architecture. We also discuss how the modern hardware-based virtualization features, such as Posted Interrupts, are helpful. Finally we share the best practices when achieving such high-performance systems.
3. Agenda
• What’s NFV (Network Functions Virtualization)?
• NFV workloads from a virtualization perspective
• New/different requirements for NFV
• Challenges and solutions
• Architecture proposal for NFV on Xen
• Summary
4. NFV Vision from ETSI
Source: http://portal.etsi.org/nfv/nfv_white_paper2.pdf
5. Summary of NFV Workloads
[Figure: virtual middleboxes (Firewall, DPI*, vSwitch, Load Balancer, Router, …) running on a hypervisor]
• "Bump in the wire": telco and communications workloads
• Heavy inter-VM communication
• Very high I/O rate: N x 10 Gbps (14.8 Mpps per 10 GbE link at 64B)
• Compute intensive: CPU and/or memory
*: Deep Packet Inspection
6. New/Different Requirements for NFV
Compared with conventional virtualization:
• High performance across all packet sizes, including small packets (e.g. 64B)
• Real-time processing, including low latency and jitter
• RAS (Reliability, Availability, Serviceability)
• Security
• ...
7. Areas of Focus for NFV (Generic)
• Network I/O
  • 10/40 GbE NICs, virtual I/O (frontend/backend), …
  • Direct I/O assignment, SR-IOV, VT-d, Intel® Data Direct I/O
• Interrupt virtualization
  • Full APIC virtualization, Posted Interrupts
• Compute (CPU and memory)
  • CPU affinity, NUMA awareness
• Real-time
• RAS
• Security
• Guests themselves
• Inter-VM communication
8. Guests: Networking Performance* with Small Packets
[Figure: Tx path on the bare-metal host and in a VM with direct I/O assignment (VT-d); 64-byte packets are injected at points A = app, B = network stack, C = driver, D = device, and sent to an external traffic generator]

Injection point   Bare-metal    Direct assignment (VT-d)
A (app)           0.96 Mpps     0.97 Mpps
B (stack)         1.13 Mpps     1.01 Mpps
C (driver)        3.71 Mpps     3.67 Mpps
D (device)        14.77 Mpps    14.77 Mpps

The network stack in the VM can be a big performance bottleneck.
*Intel internal measurements
9. Inter-VM Communication: Switching Path
[Figure: Tx VM → tap → switch → tap → Rx VM, with the software switch in the host OS running above the hypervisor; measurement points A and C as on the previous slide]
• Notifications for queue control: kick, doorbell
• Virtual switch: packet handling (copy, etc.)

64-byte packets*:
A   0.712 Mpps
C   0.717 Mpps

The switching path can be a big performance bottleneck as well.
*Intel internal measurements
10. Recap: Guests and Inter-VM Communication
Guests themselves: optimize the guest itself first
• Optimize the app, your own code
• Bypass the OS software stacks: use user-level I/O that bypasses the OS (e.g. drivers, network software stack) in the Linux VM
• Go low-latency and polling-mode based (see the sketch after this slide)
• Intel® DPDK (Data Plane Development Kit), Snabb Switch, etc.
Inter-VM communication:
• Discussed in the following slides
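For illustration, here is a minimal sketch of such a polling-mode forwarding loop using DPDK's burst APIs. EAL initialization and port/queue setup are omitted, and the port numbers are placeholders:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

/* Forward packets from rx_port to tx_port in a tight busy-poll loop. */
static void forward_loop(uint16_t rx_port, uint16_t tx_port)
{
    struct rte_mbuf *pkts[BURST];

    for (;;) {
        /* Poll the Rx queue directly; no interrupts are involved. */
        uint16_t n = rte_eth_rx_burst(rx_port, 0, pkts, BURST);
        if (n == 0)
            continue;
        uint16_t sent = rte_eth_tx_burst(tx_port, 0, pkts, n);
        /* Free any packets the Tx queue refused to accept. */
        while (sent < n)
            rte_pktmbuf_free(pkts[sent++]);
    }
}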
11. Why Inter-VM Communication
• More cores
  • More middleboxes per socket, per server
  • Service chaining on a server
• Lower latency
  • Inter-VM (i.e. intra-node) vs. inter-node
• Higher bandwidth
  • Memory (or cache) vs. PCIe bus
Figure 1. The Intel® Xeon® processor E5-2600 V2 product family microarchitecture
Source (Figure 1): https://software.intel.com/en-us/articles/intel-xeon-processor-e5-2600-v2-product-family-technical-overview
13. Solutions: Empower VMs in a Safe Way
1. Move knowledge and control for inter-VM communication into the VMs
  • Allow VMs to share data/queue structures
  • Allow VMs to use faster notification mechanisms without VM exits or interrupts
  • E.g. MONITOR/MWAIT (no VM exits); see the sketch after this list
2. Allow VMs to share or access other VMs' memory in a safe way
  • Provide trusted entities in VMs with a "Protected Memory View"
  • The mapping itself is provided by the hypervisor
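A minimal sketch of such an exit-free doorbell, assuming the hypervisor lets the guest execute MONITOR/MWAIT without trapping and that a cache line is shared between the two VMs. The names are illustrative, and since MONITOR/MWAIT require ring 0, this runs in the guest kernel:

#include <stdint.h>

/* Wait until the shared doorbell changes from "old", without interrupts
 * or VM exits: MONITOR arms the address, MWAIT parks the core until the
 * monitored cache line is written (e.g. by the peer VM's store). */
static inline void doorbell_wait(volatile uint32_t *doorbell, uint32_t old)
{
    while (*doorbell == old) {
        asm volatile("monitor" :: "a"(doorbell), "c"(0), "d"(0));
        if (*doorbell != old)       /* re-check after arming the monitor */
            break;
        asm volatile("mwait" :: "a"(0), "c"(0));
    }
}

/* Sender side ("kick"): a plain store to the shared line wakes the waiter. */
static inline void doorbell_kick(volatile uint32_t *doorbell)
{
    (*doorbell)++;
}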
14. Access Other Guests' Memory or Shared Memory Using EPTP* Switching
[Figure: in the default EPT, the region for other guests' or shared memory is void; a trusted component in the guest executes the VMFUNC instruction to switch EPTPs, so the Protected View maps those host physical pages into its guest physical pages]
• VMFUNC instruction executes in the guest (no VM exits)
• Available in rings 0-3
• Can generate #VE in the guest
*: Extended-Page-Table Pointer
15. Details of VM Function 0 and #VE
VM function 0: EPTP switching
• VMFUNC instruction with EAX = 0 (see the sketch below)
• The value in ECX selects an entry from the EPTP list
• Available in rings 0-3, executed in the guest
• No VM exit
• Can be virtualized if not available
#VE: virtualization exception
• Can occur only in the guest (vector 20)
• Some EPT violations can generate #VE instead of VM exits (controlled by the hypervisor)
• Can be virtualized if not available
[Figure: the per-VCPU VMCS points to a 4KB EPTP list; ECX is the index that selects an EPTP entry]
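A minimal sketch of how a guest could wrap VM function 0, assuming the hypervisor has enabled VMFUNC for the guest and populated its EPTP list (the helper name is illustrative):

/* Switch this VCPU's active EPT to the EPTP-list entry "view".
 * VMFUNC (opcode 0F 01 D4) with EAX = 0 performs EPTP switching
 * without a VM exit; index 0 is assumed to be the default view. */
static inline void eptp_switch(unsigned int view)
{
    asm volatile(".byte 0x0f, 0x01, 0xd4"   /* VMFUNC */
                 :
                 : "a" (0),                  /* EAX = 0: EPTP switching */
                   "c" (view)                /* ECX = EPTP list index */
                 : "memory");
}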
16. Simple Example
• Map API: request that the hypervisor map foreign pages into the Protected View (alternate EPT), given the domain id, gfm (guest page frame #), etc.
  • int foreign_memory_alt_p2m_map(domid, gfm, …)
  • Returns the index into the EPTP list if accepted
• Unmap API: foreign_memory_alt_p2m_unmap(index, …)
  • Unmaps the view

while (work_to_do) {
    VMFUNC(EAX = 0, ECX = index);   /* open the Protected View */
    /* ... access the Protected View: e.g. the Rx/Tx queues ... */
    VMFUNC(EAX = 0, ECX = 0);       /* back to the default EPT when done */
}
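Putting the pieces together, a hedged end-to-end sketch combining the proposed map/unmap API above with the eptp_switch() helper from the previous slide. peer_domid, queue_gfm, work_to_do(), and poll_rx_tx_queues() are placeholders, not part of the proposed API:

/* Map the peer's queue pages into an alternate view, then poll them
 * from the Protected View without hypercalls or VM exits. */
static int serve_peer_queues(int peer_domid, unsigned long queue_gfm)
{
    int index = foreign_memory_alt_p2m_map(peer_domid, queue_gfm /* , ... */);
    if (index < 0)
        return index;              /* mapping rejected by the hypervisor */

    while (work_to_do()) {
        eptp_switch(index);        /* open the Protected View (no VM exit) */
        poll_rx_tx_queues();       /* touch the shared Rx/Tx queues */
        eptp_switch(0);            /* close: back to the default EPT */
    }

    foreign_memory_alt_p2m_unmap(index /* , ... */);
    return 0;
}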
17. Architecture Proposal for NFV on Xen
[Figure: a middlebox VM (e.g. vSwitch) on the Xen hypervisor contains a trusted component (kernel- or user-level) that uses VMFUNC to switch from the usual EPT view to Protected Views: (a) direct access to pages granted by VM1 via the grant table, and (b) direct access to the entire VM2*. VM1 and VM2 run kernels with virtual I/O; NICs are attached via VT-d/SR-IOV; CPU affinity and NUMA awareness apply throughout.]
(a) A "kick" doesn't cause a VM exit
*(b): Need to set up the VT-d page table as well
18. Benefits of the New Architecture
• Contains knowledge and control for inter-VM communication in the VMs
  • Keeps the hypervisor simple and thin
  • Suitable for the Xen architecture
  • More flexible and powerful than vhost-net
• Minimizes overhead and latency
  • No hypercalls or VM exits
  • Zero-copy (moves data from queue to queue directly)
• Allows the guest to handle invalid memory accesses efficiently in the guest
  • Uses #VE upon EPT violation (error handling, testing, etc.)
• Works with direct I/O assignment as well
19. Current Status
• Patches for EPTP switching and #VE for Xen: <being submitted>
  • Allow additional EPT page tables
• Considering how to extend the Protected View for page flipping
  • The backend wouldn't need to execute hypercalls to map granted pages
  • Amortizes the EPT invalidation cost
• Prototype
• Performance measurements