NETRONOME WHITE PAPER

Understanding Network IO Virtualization (IOV) and its Application in the Data Center and Network Infrastructure

EXECUTIVE OVERVIEW

In today’s network environment, servers and appliances in the data centers are increasingly being built around commodity multi-core CPUs – specifically around the x86 architecture. The same scenario applies to control plane and application layer functions in infrastructure equipment, as well. This CPU subsystem is being virtualized for efficient use of CPUs, better isolation, security, ease of management, lower cost and lower power. This trend is expected to accelerate.

As these servers, appliances and equipment are virtualized at the CPU level, they need an underlying IO subsystem that is also virtualized. The IOV-P, built into Netronome's NFP-32xx Network Flow Processor family, provides an ideal solution for network IO virtualization. Although the speed with which vendors will adopt SR-IOV for such virtualization may vary, Netronome intends to lead the pack by building flexibility on top of SR-IOV, while focusing on networking applications.

This paper discusses this new class of network IO virtualization architecture and its role as a key ingredient in virtualized systems. As described, network IO virtualization results in an efficient virtualized environment providing high performance, security and low power utilization.

With rising network traffic and the need for application awareness, content inspection, and security processing, the amount of network IO processing at line rates increases exponentially. This, coupled with the need for virtualization, places a huge burden on the network IO subsystem. At 10Gbps and beyond, this dictates the use of an IO virtualization co-processor (IOV-P). By classifying network traffic into flows, applying security rules and pinning flows to a specific virtual machine (VM) on a specific core on the host, and/or by load-balancing various flows into various VMs, the IOV-P enables the overall system to achieve full network performance.

As servers and network appliances in the data centers and control plane functions in infrastructure equipment are built around commodity multi-core CPUs – specifically x86 architectures – IO communications are becoming dependent on the system interconnect, such as PCIe. An eight-lane PCIe v2 interconnect can easily support over 10G of network IO traffic.

The increasing use of virtualization in servers, appliances and network equipment means that the underlying IO subsystems explicitly have to support virtualization. Virtualized data center servers and appliances using IOV-P-based intelligent network cards provide each VM with its own virtual NIC, allowing a number of VMs to share a single 10GbE physical Network Interface Card (NIC). Each virtual NIC can have its own IP and MAC address and be assigned to a separate VLAN. To the outside world and the host sub-system, a virtual NIC appears as a distinct and dedicated NIC. In the same way that multiple VMs running on a multi-core server replace multiple physical servers, the IOV-P can replace multiple NICs and help replace or simplify the top-of-the-rack switch and server load balancer. The result is higher overall performance, lower cost and easier system management, using fewer NICs, cables and switch ports while achieving full network IO performance. Similar benefits apply to network infrastructure equipment when the IOV-P is used for intelligent service blades and trunk cards serving the various line cards.
Effective Resource Utilization Requires Virtualization

As companies grow, their IT infrastructure also grows, leading to an increase in the number of stand-alone servers, storage devices and applications. Unmanaged, this growth can lead to enormous inefficiency, higher expense, availability issues and systems management headaches, negatively impacting the company’s core business. Smaller servers may have utilization rates of 20% or less.

To address these challenges, organizations are implementing a variety of virtualization solutions for server, storage, application and client environments. These virtualization solutions can deliver real business value through practical benefits, such as decreased IT costs and business risks; increased efficiency, utilization and flexibility; streamlined management; and enhanced business resilience and agility.
Enter Server Virtualization

In virtualized servers running VMware® or Xen®, the physical NIC becomes isolated from the Guest OS used by application software. The Guest OS, such as Windows® or Linux®, uses a NIC driver to talk to a virtual NIC. The virtualization software (Hypervisor) emulates a NIC for each Guest OS. One physical server could have eight or 16 VMs, each of which runs a Guest OS talking to a virtual NIC.

In addition to allowing multiple Guest OSs to share a single physical NIC, the Hypervisor typically emulates an Ethernet (L2) switch connecting virtual machines to physical NIC ports. Implementing virtual NIC functions and virtual switching functions within the virtualization software is performance-intensive and adds significant overhead in the networking path. This can reduce 10GbE throughput to 1GbE levels.

Introducing Network IO Virtualization

The PCI Special Interest Group (PCI-SIG) IO Virtualization (IOV) working group is developing extensions to PCIe. The first IOV specification maintains a single PCIe Root complex (SR-IOV), enabling one physical PCIe device to be divided into multiple virtual functions. Each virtual function can then be used by a VM, allowing one physical device to be shared by many VMs and their Guest OSs.

IO Virtualization – Implementation Options

In any given system, there are a limited number of IO devices – far fewer than the number of VMs the system may be hosting. As all VMs require access to IO, a Virtual Machine Monitor (VMM) or Hypervisor needs to mediate access to these shared IO devices. In this section we review the different IOV implementation options.

Software IO Virtualization

All VMMs and Hypervisors provide IO virtualization implemented in software. Commercial Hypervisor offerings run IO virtualization software in a special management – or otherwise privileged – VM to virtualize IO devices, as depicted in Figure 1. The Management VM has access to all IO devices to be shared, and the OS in the Management VM runs the normal device driver for each such device (labeled “DD” in the figure). The Management VM then needs to virtualize the device and present it to other VMs.

Conceptually, network device IO virtualization is straightforward. Guest VMs have a virtual network interface with associated MAC and IP addresses. In the Management VM, the physical device is visible with a MAC and IP address. Thus, the Management VM can use standard network mechanisms, such as bridging or routing, to direct traffic received from the physical interface to the virtual interfaces in the Guest VMs, and to direct traffic received from a Guest VM to other Guest VMs or the physical network device. In Figure 1, a software implementation of a normal Ethernet switch (labeled “SW”) performs this de-multiplexing and multiplexing of traffic to and from Guest VMs.

This type of software-based IO virtualization requires an efficient inter-VM communication mechanism to transport packets between the Management VM and Guest VMs. For bulk data transfers, either memory copies or virtual memory techniques, such as page mappings or flipping, are deployed. Further, a signaling mechanism is required, allowing VMs to notify each other that they have packets to send. It is important that the inter-VM communication mechanism does not violate basic isolation properties between VMs. For example, it should not be possible for a Guest VM to corrupt the Management VM or access data in other Guest VMs. Ideally, Guest VMs should also be protected to some extent from a misbehaving Management VM, though this is not typically completely possible due to the more privileged nature of the Management VM.

In Figure 1, the inter-VM communication is represented by an entity in the Management VM (the back-end [1]) and a corresponding entity in the Guest VMs (the front-end). The front-ends in the Guest VMs are normal network device drivers in the Guest OS. However, they exchange network packets with their corresponding back-end in the Management VM using the aforementioned inter-VM communication mechanism.

Software-based IO virtualization provides a great deal of flexibility. Within the Management VM, the virtual interfaces connected to front-ends can connect to the physical interfaces in arbitrary ways. In the simplest and most common case, the virtual network devices are all connected to a software Ethernet bridge or switch. For enterprise environments, this is typically a VLAN-capable switch. The Management VM may also implement a firewall, or other forms of filtering, to protect Guest VMs, as well as provide logging or other monitoring functions. In some environments, the Management VM may also provide other functions, such as Network Address Translation (NAT) or routing. In fact, some Hypervisors allow arbitrary virtual networks to be constructed to interconnect Guest VMs.

The obvious drawback of this flexibility is significant processing overhead, particularly when dealing with received packets. Each packet is received into buffers owned by the Management VM, which then needs to inspect the packet and determine the recipient Guest VM(s). Subsequently, the Management VM needs to transfer the packet into a receive buffer supplied by the recipient Guest VM.

Figure 1: Software IO Virtualization for network devices. All network traffic is passed through the Management VM, adding significant virtualization overheads and latency.
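The software receive path just described – the Management VM inspects the packet, looks up the recipient by destination MAC in its software switch, then copies the packet into a buffer the guest posted – can be sketched as a small simulation. All class and variable names here are illustrative, not actual hypervisor code:

```python
# Sketch of the software IOV receive path (a simulation, not hypervisor code).
# The Management VM owns the buffer a packet first lands in, looks up the
# recipient Guest VM by destination MAC (the software switch "SW"), and then
# copies the packet into a receive buffer supplied by that guest -- the
# per-packet copy that makes this path expensive.

class GuestVM:
    def __init__(self, name, mac):
        self.name = name
        self.mac = mac
        self.rx_buffers = [bytearray(2048) for _ in range(8)]  # posted by the front-end
        self.received = []

class ManagementVM:
    def __init__(self, guests):
        # MAC -> guest table: the software Ethernet switch's forwarding state
        self.switch = {g.mac: g for g in guests}

    def receive_from_nic(self, frame: bytes):
        dst_mac = frame[0:6]              # inspect the header in Mgmt VM memory
        guest = self.switch.get(dst_mac)
        if guest is None or not guest.rx_buffers:
            return False                  # no recipient / no posted buffer: drop
        buf = guest.rx_buffers.pop(0)     # buffer supplied by the recipient guest
        buf[:len(frame)] = frame          # the per-packet copy between VMs
        guest.received.append(bytes(buf[:len(frame)]))
        return True

guest_a = GuestVM("vm-a", b"\x00\x16\x3e\x00\x00\x01")
mgmt = ManagementVM([guest_a])
frame = b"\x00\x16\x3e\x00\x00\x01" + b"\x00" * 60
mgmt.receive_from_nic(frame)
print(len(guest_a.received))  # 1
```

The copy in `receive_from_nic` stands in for the memory copy or page flip discussed above; everything else (lookup, buffer management) is the bookkeeping the Management VM performs on every single packet.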
While different techniques are used by different hypervisors, they all have to copy the packet data or exchange pages using page flipping, both of which incur significant overheads. For example, a Xen system without further optimization spends more than five times as many cycles per received packet as native Linux [2].

The network transmit path incurs less overhead. However, the Management VM still has to inspect the packets transmitted by a Guest VM to determine where to send them. Further, the Management VM may perform some header checks (e.g., to prevent MAC address spoofing) or it may need to rewrite the packet header, for example, to add VLAN tags or perform NAT. This typically requires at least the packet header, if not the entire packet, to be accessible within the Management VM, adding extra CPU overhead on the transmit path as well.

Software-based IO virtualization, then, has its drawbacks. Not only does it add significant CPU overhead for each packet, it also adds significant latency. Packets, on both transmit and receive, are queued twice (at the device and for the inter-VM communication). Both the Management VM and the Guest VM may experience scheduling latencies, delaying the time taken to react to interrupts or inter-VM signals and increasing the latency for packet traffic.

Multi-queue NICs

Most modern NICs support multiple send and receive queues (MQ NICs), and many commercial hypervisors make use of these MQ NICs to accelerate network IO virtualization. There are a number of different approaches for utilizing MQ NICs in a virtualization environment, with the most suitable approach depending heavily on the detailed capabilities of the NIC.

All MQ NICs provide some filtering of incoming packets to decide onto which receive queue to place them. Typically, the filter is based on the destination MAC address and/or VLAN tags. Some MQ NICs also offer further filtering based on very simple L3 and L4 rules.

Early models of MQ NICs did not apply any filtering to transmitted packets – thus, they could not handle packets destined for other VMs connected to the same NIC. As a result, these MQ NICs required additional software to handle inter-VM network traffic. However, modern MQ NICs typically do not have this limitation, simplifying the software support required.

Figure 2 shows a common architecture for using MQ NICs as an IOV solution in virtualized environments. The main idea is to associate queues (more precisely, sets of queues) with individual Guest VMs. The OS in the Management VM still runs the device driver for the device. However, since the MQ NIC performs the multiplexing and de-multiplexing of traffic, the Management VM does not contain a software Ethernet switch. The Guest VMs still use a generic device driver (labeled “FE” or front-end) representing virtual network interfaces to their OS. However, unlike the software IO virtualization scenario, they are connected to a different, device-specific component in the Management VM (labeled “BE” or back-end).

Compared to software-based IOV, the receive path with MQ NICs is much more straightforward. A Guest VM transfers buffer descriptors to the Management VM, which can directly post these descriptors to the queue associated with the Guest VM. When packets arrive at the device, the filter mechanism on the device selects the destination queue and DMAs the packet into the buffer posted by the Guest VM. Subsequently, the descriptor is returned to the Management VM, which forwards it back to the Guest VM.

Buffer descriptors have to be passed through the Management VM, rather than allowing the Guest VM to post descriptors directly, so that the Management VM can check that the memory referred to by the descriptors belongs to the Guest VM. Without this check, a Guest VM could either accidentally or maliciously cause a device to access memory belonging to another Guest VM, thus violating isolation between VMs.

The transmit path from a Guest VM to the device is also straightforward. Transmit descriptors are passed from the Guest VM to the Management VM, which passes them on to the device. Once the packet is transmitted, the notifications are passed back to the Guest VM via the Management VM.

As should be obvious from this description, IO virtualization with MQ NICs incurs far less overhead than software-based IO virtualization, since the data does not need to be moved between VMs and the Management VM is not involved in the multiplexing and de-multiplexing of network traffic. Using this type of IO virtualization, close to 10Gbps line rate can be achieved with modern host hardware. In the Xen implementation, IO virtualization with MQ NICs still incurs a per-packet overhead of about twice as many CPU cycles as native Linux execution. Further, individual packets still incur significant additional latency, as the descriptors have to be passed through the Management VM.

The use of MQ NICs for IO virtualization also severely limits the flexibility offered by software-based IOV, as the packet multiplexing and de-multiplexing is performed by fixed-function hardware.

Figure 2: IO Virtualization with multi-queue devices. The device performs all multiplexing and de-multiplexing of network traffic, significantly reducing the CPU overheads on the host.

[1] Note, we are using the terminology of Xen in this example, but both Microsoft’s Hyper-V and VMware’s ESX Server have similar concepts.
[2] K. K. Ram, J. R. Santos, Y. Turner, A. L. Cox, S. Rixner: “Achieving 10 Gb/s using Safe and Transparent Network Interface Virtualization.” VEE 2009.
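The receive-queue selection an MQ NIC performs – matching the destination MAC and/or VLAN tag against a filter table to pick the queue, with unmatched traffic falling back to the Management VM – can be sketched as follows. This is a simplified model; the filter table and queue layout are illustrative assumptions, not any particular NIC's programming interface:

```python
# Sketch of MQ NIC receive filtering (hypothetical model). The NIC holds a
# table of (destination MAC, VLAN) -> queue entries; each queue is associated
# with one Guest VM. An arriving frame is matched against the table and
# placed on the selected queue (standing in for the DMA into a guest-posted
# buffer); unmatched frames go to a default queue owned by the Management VM.

DEFAULT_QUEUE = 0  # Management VM's queue

class MultiQueueNic:
    def __init__(self, num_queues):
        self.queues = {q: [] for q in range(num_queues)}
        self.filters = {}  # (dst_mac, vlan) -> queue index

    def add_filter(self, dst_mac, vlan, queue):
        self.filters[(dst_mac, vlan)] = queue

    def receive(self, dst_mac, vlan, payload):
        # Filter on destination MAC and/or VLAN tag, as typical MQ NICs do;
        # a MAC-only rule (vlan=None) matches any VLAN.
        queue = self.filters.get((dst_mac, vlan),
                                 self.filters.get((dst_mac, None), DEFAULT_QUEUE))
        self.queues[queue].append(payload)
        return queue

nic = MultiQueueNic(num_queues=4)
nic.add_filter(b"\x00\x16\x3e\x00\x00\x01", None, 1)   # Guest VM 1, any VLAN
nic.add_filter(b"\x00\x16\x3e\x00\x00\x02", 100, 2)    # Guest VM 2, VLAN 100

print(nic.receive(b"\x00\x16\x3e\x00\x00\x01", None, b"to-vm1"))   # 1
print(nic.receive(b"\x00\x16\x3e\x00\x00\x02", 100, b"to-vm2"))    # 2
print(nic.receive(b"\xff\xff\xff\xff\xff\xff", None, b"unknown"))  # 0
```

Note what the model leaves out: the descriptor round trip through the Management VM described above, which is exactly where the remaining per-packet overhead and latency of this approach come from.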
Typically, MQ NICs perform simple filtering at the MAC level in order to implement enough functionality for a simple L2 Ethernet switch.

PCI Device Assignment – Toward SR-IOV

Some Hypervisors, including Xen and VMware ESX Server, allow the direct assignment of PCI devices to Guest VMs. This is a relatively small extension to the techniques needed to run device drivers inside the Management VM. Assigning PCI devices directly to Guest VMs eliminates the remaining overhead and added latencies of the MQ NIC IO virtualization approach.

Figure 3 shows the common architecture for how hypervisors support PCI device assignment. The Hypervisor provides mechanisms to directly access a PCI device's hardware resources, and the Management VM needs to provide a way for Guest VMs to discover PCI devices assigned to them and their associated resources.

Device discovery by a Guest VM is typically achieved by providing a virtual PCI bus. The Management VM normally owns the physical PCI buses and enumerates all physical devices attached to them. If a PCI device is assigned to a Guest VM, it is enumerated on a virtual PCI (vPCI) bus exported to the Guest VM. This allows the guest to access the PCI configuration space of the device assigned to it. Importantly, all PCI configuration space accesses by a Guest VM are transferred to the Management VM, which can either pass them through to the device, intercept and emulate them, or discard them. This allows the Management VM to enable or configure hardware resources required by the Guest VM to use the device.

There are three different types of hardware resources a Guest VM must have access to in order to run a device driver for a physical device: device memory, device IO ports and device interrupts. The first two, device memory and IO ports, are described in the device's PCI configuration space as Base Address Registers (BARs). In order for a Guest VM to access device memory, the Management VM instructs the Hypervisor that a given Guest VM is allowed to map the physical addresses at which the device memory is located into its virtual address space. The Hypervisor can use memory protection provided by the CPU Memory Management Unit (MMU) to enforce that a Guest VM only accesses the device memory belonging to the assigned device. Access to IO ports can be restricted in a similar way using the Task Segment Selector (TSS) on x86 processors. Physical interrupts originating from a device need to be handled by the Hypervisor, as interrupts are only delivered to the highest-privileged software entity. Hypervisors then virtualize the physical interrupts and deliver them to the Guest VMs. In order to reduce interrupt latencies, it is important that physical interrupts are delivered to the same CPU core that the destination Guest VM is using to handle the resulting virtual interrupt.

In the previous MQ section, we argued that descriptors need to be passed through the Management VM to prevent breach of VM isolation due to rogue DMA setups. This is not required for PCI device assignment, since modern chipsets include IO MMUs, such as the Intel® VT-d, which can be set up by the Hypervisor to allow a device to access only certain pages of host memory. This is achieved by setting up a page table mapping in the IO MMU to map host memory into a device's DMA address space. On memory write and read requests from a PCI device to or from host memory, the chipset selects an IO MMU page table based on the Requester ID used by the PCI device. Thus, the Hypervisor sets up the IO MMU page tables for a device to map only the memory belonging to a Guest VM when the device is assigned to it. This prevents a Guest VM from intentionally or accidentally accessing other VMs' memory areas via a device's DMA engines.

Of all the IO virtualization options, direct PCI device assignment has the lowest overhead and the least added latencies. The Management VM is not involved in the data path; it just provides infrequent access to the device's PCI configuration space. The Hypervisor itself is only involved in the virtualization of device interrupts, which can be achieved with relatively low overhead, especially if physical interrupts are delivered to the same CPU cores on which the recipient Guest VM is executing.

However, it is clearly infeasible to have a separate PCI device for every Guest VM in a system, even if multi-functioned devices were used. The PCI-SIG introduced SR-IOV to address this issue.

Figure 3: PCI device assignment. Guest VMs can directly access hardware devices, eliminating all IO virtualization overheads.
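The Requester-ID-based DMA protection described above can be sketched as a toy model: the IO MMU keeps a separate page table per Requester ID, and a DMA to an address the Hypervisor never mapped for that device is rejected. Page size, addresses and the Requester ID below are illustrative; this is not VT-d code:

```python
# Toy model of IO MMU DMA protection (illustrative, not VT-d code). The
# chipset selects a page table by the PCI Requester ID of the device issuing
# the DMA; the table maps device DMA addresses to host pages belonging only
# to the VM the device is assigned to, so any DMA outside that mapping fails.

PAGE = 4096

class IoMmu:
    def __init__(self):
        self.tables = {}  # requester_id -> {device page number: host page number}

    def map(self, requester_id, dma_addr, host_addr):
        self.tables.setdefault(requester_id, {})[dma_addr // PAGE] = host_addr // PAGE

    def translate(self, requester_id, dma_addr):
        table = self.tables.get(requester_id, {})
        page = table.get(dma_addr // PAGE)
        if page is None:
            raise PermissionError("DMA outside the VM's mapped memory")
        return page * PAGE + dma_addr % PAGE

iommu = IoMmu()
# Device 02:00.0 (Requester ID 0x0200) is assigned to a guest owning host page 0x80000
iommu.map(0x0200, dma_addr=0x0, host_addr=0x80000)

print(hex(iommu.translate(0x0200, 0x10)))   # 0x80010
try:
    iommu.translate(0x0200, 0x2000)          # page never mapped for this device
except PermissionError:
    print("blocked")
```

The per-Requester-ID table is the key point: it is why devices with distinct Requester IDs (including, later in this paper, SR-IOV VFs and Netronome SCVFs) can be isolated from each other without the Management VM vetting every descriptor.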
Introducing PCIe SR-IOV

The PCI-SIG introduced the SR-IOV standard in September 2007, recognizing the need to provide a device-centric approach to IO virtualization. As such, the SR-IOV standard builds on top of a wide range of existing PCI standards, including PCI Express (PCIe), Alternative Routing ID (ARI), Address Translation Services (ATS) and Function Level Reset (FLR). From the host perspective, SR-IOV on its own is primarily an extension to the PCI configuration space, defining access to lightweight Virtual Functions (VFs).

With SR-IOV, a physical PCI device may contain several device functions. In SR-IOV parlance, these are called Physical Functions (PFs). PFs are standard PCIe devices with their own full PCI configuration space and set of resources. An SR-IOV-compliant PF has an additional SR-IOV Extended Capability as part of its configuration space. This extended capability contains configuration information about all VFs associated with the PF. In particular, it defines the BAR configuration for VFs, as well as the type of the VFs.

While the BAR configuration for VFs is described in the associated PF's Extended Capability, each VF also has a standard PCIe configuration space entry. However, certain fields in a VF's configuration space are ignored or undefined. Of particular note, the Vendor ID and Device ID fields in a VF's configuration space are not defined and have to be taken from the associated PF's configuration space fields. Due to this arrangement, all VFs of a PF have to be of the same type. Further, as previously outlined, the BAR configuration entries in a VF's configuration space are undefined, as the PF's extended capability defines the BARs for all VFs. Each VF has its own set of MSI/MSI-X vectors, and these are configured using the VF's PCIe configuration space.

The SR-IOV standard anticipates that host software (including virtualization software) requires a PCI Manager (PCIM) to manage PFs, VFs, their capabilities, configuration and error handling. However, the standard explicitly does not define any implementation of the PCIM. For example, for BAR and other configuration accesses to a VF, an implementation would typically present a VF's configuration space as a normal PCI device to the OS and/or Hypervisor and mask the differences through software emulation. Thus, a PCIM implementation is very similar in functionality to the vPCI module used for PCI device assignment. In fact, in most implementations, the vPCI module and the PCIM implementation cooperate.

SR-IOV for Network Devices

From a virtualization perspective, SR-IOV-capable network devices combine PCI device assignment with the network virtualization techniques of modern MQ devices. With the help of the PCIM, SR-IOV VFs are typically treated as standard PCI devices, which are directly assigned to Guest VMs. Since VFs use different Requester IDs, the chipset's IO MMU can be set up to provide appropriate DMA protection; and, with each VF possessing its own MSI/MSI-X vectors, interrupts can be directed to the cores executing the Guest VMs. Thus, SR-IOV provides the same low overhead and latency access to IO devices as PCI device assignment.

Like modern MQ NICs, SR-IOV-capable NICs need to multiplex and de-multiplex traffic between VFs, and typically implement the same fixed functionality: L2 switching combined with some basic higher-level filtering. Thus, SR-IOV NICs are similarly limited in flexibility as MQ NICs.

Challenges with SR-IOV

SR-IOV is well suited for providing hardware support for virtualizing fixed-function devices, such as network cards. However, its design has limitations in supporting highly programmable IO devices. With SR-IOV, VFs are enumerated in a hardware-based PCI configuration space, and all VFs associated with a PF have to be of the same device type. Programmable IO devices may allow vendors to dynamically create virtual functions and use different types of device functions to provide different interfaces to the IO device. For example, a networking device may be able to offer a standard NIC interface, as well as interfaces for efficient packet capture and network interfaces offloading network security protocols.

With SR-IOV, these three types of network interfaces would have to be represented as three different PFs, each with a set of VFs associated with them. From the SR-IOV standard, it is unclear if the assignment of VFs to different PFs can be easily achieved (i.e., it is unclear how VFs of a given type can be created dynamically). This limitation is a direct result of SR-IOV requiring VFs to be enumerated in hardware, which also results in higher hardware cost and complexity.

Despite this additional cost and complexity, a software component – the PCIM – is still required to manage VFs. In the next section, we outline an alternative solution that addresses these challenges.

Netronome IOV Solution

Netronome has designed a new IO co-processor, the NFP-32xx (NFP). The NFP offers up to 20Gbps of accelerated network processing for a variety of applications, including IDS/IPS systems, L2-L7 processing in network infrastructure equipment and network security appliances. The NFP is highly programmable, including many aspects of its PCIe x8 (v2.0) host interface. This programmability allows greater flexibility in how aspects of the device are presented to the host system. The NFP supports up to 256 queues to the host, which are grouped to form endpoints. The queues are generic in the sense that they can carry arbitrary descriptors, making it possible to create endpoints of different endpoint types. Example endpoint types include a standard NIC-style interface with RX and TX queues, optimized packet capture interfaces and look-aside crypto interfaces. The details of these interfaces are under the control of software running on the NFP, which defines, amongst others, the purpose of the assigned queues, the descriptor formats used and how and when DMA is initiated.
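How VF resources are derived from the PF's SR-IOV Extended Capability can be illustrated with the specification's arithmetic: a VF's Routing ID (which serves as its Requester ID for DMA) is computed from the PF's Routing ID plus the capability's First VF Offset and VF Stride fields, and each VF's BAR is a fixed-size slice of an aperture whose base and per-VF size the PF capability defines. The concrete numbers below are made up for illustration:

```python
# Arithmetic sketch of SR-IOV VF resource derivation, per the SR-IOV
# specification's First VF Offset / VF Stride scheme. VF numbers are
# 1-based in the specification. All numeric values below are invented
# for illustration only.

def vf_routing_id(pf_routing_id, first_vf_offset, vf_stride, vf_number):
    # Routing ID of VF n = PF Routing ID + First VF Offset + (n - 1) * VF Stride
    return pf_routing_id + first_vf_offset + (vf_number - 1) * vf_stride

def vf_bar_address(vf_bar_base, vf_bar_size, vf_number):
    # VF n's BAR is the n-th fixed-size slice of the aperture the PF defines
    return vf_bar_base + (vf_number - 1) * vf_bar_size

# Example: PF at bus 2, device 0, function 0 -> Routing ID 0x0200,
# First VF Offset = 16, VF Stride = 1, VF BAR0 aperture at 0xd0000000,
# 16 KiB of BAR space per VF.
for vf in (1, 2, 3):
    rid = vf_routing_id(0x0200, 16, 1, vf)
    bar = vf_bar_address(0xd000_0000, 0x4000, vf)
    print(f"VF{vf}: Routing ID {rid:#06x}, BAR0 {bar:#010x}")
```

This arithmetic makes the text's two observations concrete: each VF gets a distinct Requester ID (enabling per-VF IO MMU protection) without carrying a full configuration space of its own, and the rigid stride/aperture layout is exactly what fixes all of a PF's VFs to one type and size.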
  6. 6. formats used and how and when DMA is initiated. Endpoints Figure 4 depicts the commonalities and differences between of these different types can be created dynamically at runtime. SR-IOV and Netromone’s IOV solution. With the Netronome Furthermore, for a variety of applications in both the data solution, different types of endpoints (indicated by the different center and network infrastructure equipment, it is necessary for colors) are easily supported, while SR-IOV mandates that all these different endpoints to be accessible by different Guest VMs VFs associated with a PF are of the same device type. With executing on the host with low overhead – an IOV solution is Netronome’s IOV solution, the PF driver, in combination with a required. However, as previously noted, SR-IOV is not designed virtual PCI implementation, performs the same function as the to support this type of highly dynamic and highly flexible devices. PF driver and the PCIM for SR-IOV. A More Flexible IOV Solution For both SR-IOV and for Netronome’s IOV solution, The key insight is that SR-IOV relies on a number of PCI virtual functions are presented to the host OS or the Hypervisor standards and essentially only adds a device enumeration and as standard PCI devices. Thus, they both leverage the mecha- resource discovery mechanism. This mechanism is the primary nism provided by modern Hypervisors to assign PCI devices reason for the limitations of SR-IOV. to Guest VMs. With the Netronome IOV solution, device enumeration During initialization, each SCFV gets assigned a unique PCI is delegated to a driver running on the host. This driver is function ID, which the NFP uses to tag DMA requests originat- specific to the NFP and capable of managing endpoints on the ing from the corresponding endpoint. Thus, IO MMU page tables NFP – creation and removal of endpoints of arbitrary types. 
can be set up appropriately to provide DMA protection between To the host OS (or hypervisor) the host driver acts as a PCI bus different SCVFs. Each SCVF also gets assigned one or more driver and enumerates NFP endpoints as standard PCI devices MSI vectors from the PF’s set of MSI-X vectors, enabling them to – it essentially implements a virtual PCI bus. All configuration directly notify the Guest VMs to which they are assigned. space access for devices on this virtual PCI bus are passed to Network Virtualization with Netronome’s Solution the host driver which either emulates or translates them to The host side interactions of the NFP are identical to an SR-IOV accesses on the NFP . solution. SCVFs can be assigned to Guest VMs just like VFs or This solution is not dissimilar to the SR-IOV approach. The PCI devices. This provides Guest VMs with low overhead and host driver performs the same function as the PF driver for an low latency access to network devices. Guest VMs run standard SR-IOV device (management of VFs) and the PCIM (translating device drivers to talk to the SCVF and interrupts can be deliv- SR-IOV VFs to PCI devices to the host OS or hypervisor). ered directly to the cores on which a Guest VM is executing. However, the Netronome solution does not require VFs to also However, since the NFP is a programmable network device, be enumerated in hardware. We, therefore, refer to the NFP the multiplexing and de-multiplexing of packets from and endpoints as Software-configurable Virtual Functions (SCVFs). to SCVFs is not limited to some fixed-function hardware Because the Netronome host driver is not restricted by the implementation as in most SR-IOV or MQ NICs. Instead, SR-IOV device enumeration limitations, it can enumerate extensive packet processing can be performed, including flow- arbitrary types of functions on its virtual PCI bus. based classification, load balancing or filtering. 
This provides a significantly more flexible solution than other hardware-based IOV approaches.

[Figure 4 diagram: an SR-IOV device exposing a PF and uniform VFs to the OS through the PF driver and PCIM, alongside an NFP-32xx exposing a PF and endpoints (EPs) of mixed types to the OS through the PF driver and a virtual PCI bus (vPCI), with per-VF and per-SCVF drivers in the Guest VMs.]

Figure 4: SR-IOV (left) and Netronome's IOV (right) compared. The primary difference is that, with Netronome's IOV solution, different types of devices are easily supported.

NETRONOME WHITE PAPER Understanding Network IO Virtualization (IOV)
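The DMA-protection property mentioned earlier – IO MMU page tables set up per function ID so one SCVF cannot DMA into another Guest VM's memory – can be illustrated with a minimal sketch. This is a conceptual model only; the class and page addresses are invented, and a real IOMMU works on full page tables, not Python sets.

```python
# Toy illustration of per-function IOMMU protection: each DMA request is
# tagged with the originating function ID, and access is permitted only to
# pages mapped for that ID. Names and addresses are invented for illustration.
class SimpleIOMMU:
    def __init__(self):
        self.tables = {}                      # func_id -> set of allowed pages

    def map_page(self, func_id, page):
        self.tables.setdefault(func_id, set()).add(page)

    def dma(self, func_id, page):
        """Return True if the tagged DMA request is permitted."""
        return page in self.tables.get(func_id, set())

iommu = SimpleIOMMU()
iommu.map_page(func_id=1, page=0x1000)        # SCVF 1 -> Guest VM A's buffer
iommu.map_page(func_id=2, page=0x2000)        # SCVF 2 -> Guest VM B's buffer

assert iommu.dma(1, 0x1000)                   # SCVF 1 may reach its own VM
assert not iommu.dma(1, 0x2000)               # ...but not VM B's memory
```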
Comparing the Various IOV Implementation Options
The following table summarizes and compares the four different IOV options previously discussed.

                                     SW IOV    MQ IOV           SR-IOV           NFP IOV
   Flexibility in Packet Processing  High      Low              Low              High
   Overhead                          High      Medium           Low              Low
   Latency                           High      Medium           Low              Low
   IO-MMU Support                    Limited   Very Limited     Yes              Yes
   Guest VM Drivers                  Generic   Generic          Device-specific  Device-specific
   Management VM Support             Generic   Device-specific  Device-specific  Device-specific
   Flexibility in Device Support     Medium    Low              Low              High

The Netronome IOV solution has the advantage of being SR-IOV-compliant while providing flexible device support – most notably, the capability to dynamically assign different kinds of virtual functions at run time. The result is that a physical NIC can provide multiple virtual NIC types, including a dumb NIC, intelligent NIC, crypto NIC or Packet Capture (PCap) NIC.

Application of IOV in the Data Center
Next-generation data centers need to address a complex set of issues, such as IT consolidation, service continuity, service flexibility and energy efficiency. Virtualization of servers is already seen as a key factor in moving to a next-generation data center. Without IO virtualization, the limitations already discussed in this article will severely restrict the extent to which these goals can be met.

A single, multi-core server can easily support ten to 100 VMs, allowing numerous applications – which today require a dedicated server – to share a single physical server. This allows the number of servers in the data center to be reduced while increasing average utilization from as low as 5-15% to as high as 50-60%. With multiple VMs running on a single physical machine, there are opportunities for the VMs to cost-effectively share a pool of IO resources, such as intelligent NICs. In the single-application, single-server model, each application has access to the entire server's bandwidth. In the virtualized server model, however, network bandwidth is shared by multiple applications. As more applications are consolidated on one server, bandwidth requirements per server and server utilization both increase significantly. The result is that an intelligent NIC is needed to offload network processing from the host, to prevent the host CPU from becoming the bottleneck that limits application consolidation. This trend requires low-overhead delivery of network data directly to Guest VMs.

The move to a unified network for all traffic in the data center – with data and storage networks consolidating onto standard Ethernet using technologies such as Fibre Channel over Ethernet (FCoE) or iSCSI over 10Gbps Ethernet – is also a driver for IO virtualization. These consolidated data center networks require lower latency and higher throughput than traditional data-only networks.

As noted earlier, the different approaches to IO virtualization have very different latency characteristics. Next-generation data center requirements for low-latency, low-overhead delivery of network IO directly to VMs can only be met by hardware-based IOV solutions, such as SR-IOV-based NICs or Netronome's IOV solution. Software-based IO virtualization imposes too high an overhead to handle the expected network IO demand, and IOV solutions based on MQ NICs are unsuitable for latency-sensitive applications, such as FCoE.

As servers and network appliances in the data centers are built around commodity multi-core CPUs – specifically, x86 architectures – and network IO around PCIe, implementing IOV over PCIe becomes critical in allowing the many VMs to share network IO devices.

Data centers also deploy a wide range of advanced network and management technologies within the Ethernet infrastructure, such as extensive use of Access Control Lists (ACLs), sophisticated VLAN setups, Quality of Service and even some limited L3 and L4 processing. These technologies are readily available in modern network infrastructure equipment, such as Top-of-Rack (TOR) switches. However, even modern SR-IOV-based NICs provide only very limited, fixed-function switching capabilities, creating a disconnect between the sophisticated physical network infrastructure and the virtual network infrastructure implemented on the host. An IOV solution combined with intelligent IO processing (IOV-P) bridges this gap and extends sophisticated network processing and management into virtualized servers.
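One building block of such intelligent IO processing is flow-based classification: packets are classified into flows so that every packet of a flow is delivered to the same VM on the same core. A 5-tuple hash gives the basic pinning property; this is a toy sketch, not Netronome's classifier, and the field names and hash choice are assumptions.

```python
# Sketch of flow-based classification and pinning: packets are hashed on
# their 5-tuple, so all packets of one flow land in the same VM's queue.
# A real IOV-P classifier is programmable and far richer; this only
# illustrates the pinning property. Names are invented for illustration.
import hashlib

def flow_key(pkt):
    # pkt is a dict carrying the classic 5-tuple fields
    return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
            pkt["src_port"], pkt["dst_port"])

def pin_to_vm(pkt, n_vms):
    """Map a packet's flow deterministically onto one of n_vms VM queues."""
    digest = hashlib.sha256(repr(flow_key(pkt)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_vms

pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
       "proto": 6, "src_port": 12345, "dst_port": 80}
vm_a = pin_to_vm(pkt, n_vms=8)
vm_b = pin_to_vm(dict(pkt), n_vms=8)
assert vm_a == vm_b        # same flow always reaches the same VM/core
```

Because the mapping is deterministic per flow, per-flow state (TCP reassembly, security rules) stays local to one VM and one core, which is what makes the offload scale.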
An IOV-P-based intelligent NIC can implement the same functionality as modern data center switches, including monitoring and policy enforcement (ACLs), within the server. This enables these policies to be applied even to inter-VM network traffic, without it having to be passed through the TOR switches.

Furthermore, the proliferation of encrypted network traffic (IPsec and SSL) provides the opportunity to offload some or all of the required processing to an intelligent NIC, freeing up host CPU cycles for application processing.

In summary, intelligent Ethernet NICs with IOV capability are key ingredients of the virtualized system, as they greatly reduce utilization of the host CPU for network processing, allowing the system to support a larger number of applications while saving power. Adding IOV capability to the intelligent NIC ensures that each application can be configured with its own virtual NIC, allowing a number of applications to share a single 10GbE physical NIC. At the same time, the IOV-P concept allows a single physical NIC to provide many different "intelligent" functions to the VMs and even to create and refine these functions at run time.

[Figure 5 diagram: two multi-core x86 CPUs, each hosting VM1 through VMn with per-VM OS and IOV interfaces, connected via the x86 chipset (control plane) and an eight-lane PCIe Gen2 data plane to a Netronome NFP-32xx providing the IOV-P function, two 10GbE ports and a 40G backplane interface.]

Figure 5: Service blade / intelligent NIC architecture for infrastructure equipment. Integrating the IOV-P function in the data plane allows the virtualized equipment / appliance to support multiple different virtual functions. Netronome's NFP-32xx processor integrates the IOV-P function.
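The ACL-in-the-NIC idea can be made concrete with a minimal first-match rule table applied to inter-VM traffic inside the server, so that policy holds even when packets never reach the TOR switch. The rule format below is invented for illustration and is far simpler than a real switch ACL.

```python
# Toy model of ACL enforcement for inter-VM traffic on the NIC itself, so
# policies apply without hairpinning packets through a TOR switch. The rule
# format is invented for illustration; first matching rule wins.
ACLS = [
    # (src_vm, dst_vm, dst_port, action); "*" and None are wildcards
    ("vm1", "vm2", 22,   "deny"),      # e.g. block SSH between two tenants
    ("*",   "*",   None, "permit"),    # default: allow everything else
]

def check(src_vm, dst_vm, dst_port):
    for rule_src, rule_dst, rule_port, action in ACLS:
        if (rule_src in ("*", src_vm) and rule_dst in ("*", dst_vm)
                and rule_port in (None, dst_port)):
            return action
    return "deny"                      # implicit deny if no rule matches

assert check("vm1", "vm2", 22) == "deny"       # blocked inside the server
assert check("vm1", "vm2", 80) == "permit"     # other inter-VM traffic flows
```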
Application of IOV in Network Infrastructure Equipment
In much the same way as in data centers, control plane and application layer functions in infrastructure equipment are built around commodity, multi-core, virtualized CPUs. These, too, need an underlying IO subsystem that is virtualized. Such IOV subsystems will be used to implement intelligent service blades, network appliances, intelligent trunk cards and intelligent line cards in the core network infrastructure. Such cards usually serve the various line cards in the system and are best implemented based on the IOV-P. These cards can be intelligent virtualized NICs supporting 10GbE and above, service blades supporting multiple services, or trunk cards supporting nested tunnels.

Figure 5 depicts a virtualized multi-core system with IOV capability running multiple applications. Such applications run on multiple instances of the same OS, or on different OSs. These, in turn, run on a single core and single VM, or on multiple cores and multiple VMs. By classifying network IO traffic into flows, applying security rules, pinning flows to a specific VM on a specific core on the host, and/or load balancing various flows across various VMs, the IOV-P enables the overall system to achieve full network performance at 10Gbps and beyond.

Summary and Conclusion
In today's network environment, servers and appliances in the data centers, as well as control plane and application layer functions in infrastructure equipment, are increasingly being built around commodity multi-core CPUs – specifically around the x86 architecture. This CPU subsystem is being virtualized for efficient use of CPUs, better isolation, security, ease of management, lower cost and lower power. This trend is expected to accelerate.

As these servers, appliances and equipment are virtualized at the CPU level, they need an underlying IO subsystem that is also virtualized. The IOV-P provides an ideal solution for IO virtualization. However, the speed with which vendors will adopt SR-IOV for such virtualization remains to be seen. Netronome's IOV solution leads the pack by building flexibility on top of SR-IOV, while focusing on networking applications.

Nabil Damouny is the senior director of marketing and Rolf Neugebauer is a staff software engineer at Netronome Systems.

Netronome has operations in: USA (Pittsburgh [HQ], Santa Clara & Boston), UK (Cambridge), Malaysia (Penang), South Africa (Centurion) and China (Shenzhen, Hong Kong).
info@netronome.com | +1 877 638 7629 | netronome.com

Netronome® and the Netronome Logo are registered trademarks of Netronome Systems, Inc. "Intelligent to the Core."™ is a trademark of Netronome Systems, Inc. All other trademarks are the property of their respective owners.
© 2009 Netronome Systems, Inc. All rights reserved. Specifications are subject to change without notice. (9-09)