WHITE PAPER

Network IO Virtualization (IOV) and its Application in the Data Center and Network Infrastructure

EXECUTIVE OVERVIEW

In today's network environment, servers and appliances in the data centers are increasingly being built around commodity multi-core CPUs – specifically around the x86 architecture. The same scenario applies to control plane and application layer functions in infrastructure equipment, as well. This CPU subsystem is being virtualized for efficient use of CPUs, better isolation, security, ease of management, lower cost and lower power. This trend is expected to accelerate.

As these servers, appliances and equipment are virtualized at the CPU level, they need an underlying IO subsystem that is also virtualized. The IOV-P, built into Netronome's NFP-32xx Network Flow Processor family, provides an ideal solution for network IO virtualization. Although the speed with which vendors will adopt SR-IOV for such virtualization may vary, Netronome intends to lead the pack by building flexibility on top of SR-IOV, while focusing on networking applications.

This paper discusses this new class of network IO virtualization architecture and its role as a key ingredient in virtualized systems. As described, network IO virtualization results in an efficient virtualized environment providing high performance, security and low power utilization.

With rising network traffic and the need for application awareness, content inspection, and security processing, the amount of network IO processing at line rates increases exponentially. This, coupled with the need for virtualization, places a huge burden on the network IO subsystem. At 10Gbps and beyond, this dictates the use of an IO virtualization co-processor (IOV-P). By classifying network traffic into flows, applying security rules and pinning flows to a specific virtual machine (VM) on a specific core on the host, and/or by load-balancing various flows across various VMs, the IOV-P enables the overall system to achieve full network performance.

As servers and network appliances in the data centers and control plane functions in infrastructure equipment are built around commodity multi-core CPUs – specifically x86 architectures – IO communications are becoming dependent on the system interconnect, such as PCIe. An eight-lane PCIe v2 interconnect can easily support over 10G of network IO traffic. The increasing use of virtualization in servers, appliances and network equipment means that the underlying IO subsystems explicitly have to support virtualization. Virtualized data center servers and appliances using IOV-P-based intelligent network cards provide each VM with its own virtual NIC, allowing a number of VMs to share a single 10GbE physical Network Interface Card (NIC). Each virtual NIC can have its own IP and MAC address and be assigned to a separate VLAN. To the outside world and the host sub-system, the virtual NIC appears as a distinct and dedicated NIC. In the same way that multiple VMs running on a multi-core server replace multiple physical servers, the IOV-P can replace multiple NICs and help replace or simplify the top-of-the-rack switch and server load balancer. The result is higher overall performance, lower cost and easier system management, using fewer NICs, cables and switch ports while achieving full network IO performance. Similar benefits apply to network infrastructure equipment when the IOV-P is used for intelligent service blades and trunk cards serving the various line cards.

This paper discusses this new class of network IO virtualization architecture and its role as a key ingredient in virtualized systems. As described, network IO virtualization results in an efficient virtualized environment providing high performance, security and low power utilization.
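To make the flow-pinning idea concrete, the sketch below shows one simple way such a policy could work: each packet is classified by its 5-tuple and the flow is deterministically steered to one VM/core pair. The function names and the hash-based policy are purely illustrative, not Netronome's actual implementation.

```python
import hashlib

def flow_key(pkt):
    """Build a canonical 5-tuple key for a packet (dicts stand in for parsed headers)."""
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])

def pin_flow(pkt, num_vms, cores_per_vm):
    """Deterministically map a flow to a (vm, core) pair so every packet
    of the same flow lands on the same VM and core (illustrative policy)."""
    digest = hashlib.sha256(repr(flow_key(pkt)).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    vm = h % num_vms
    core = (h // num_vms) % cores_per_vm
    return vm, core

pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
       "src_port": 1234, "dst_port": 80, "proto": 6}
assert pin_flow(pkt, 8, 4) == pin_flow(pkt, 8, 4)  # same flow, same VM/core
```

Because the mapping is a pure function of the flow key, no per-flow state is needed to keep a flow on its assigned VM and core; an actual IOV-P would implement the equivalent classification in hardware at line rate.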
Effective Resource Utilization
As companies grow, their IT infrastructure also grows, leading to an increase in the
number of stand-alone servers, storage devices and applications. Unmanaged, this
growth can lead to enormous inefficiency, higher expense, availability issues and
systems management headaches negatively impacting the company’s core business.
Smaller servers may have utilization rates of 20% or less.
To address these challenges, organizations are implementing a variety of virtualization solutions for servers, storage, applications and client environments. These virtualization solutions can deliver real business value through practical benefits, such as decreased IT costs and business risks; increased efficiency, utilization and flexibility; streamlined management; and enhanced business resilience and agility.
Enter Server Virtualization

In virtualized servers running VMware® or Xen®, the physical NIC becomes isolated from the Guest OS used by application software. The Guest OS, such as Windows® or Linux®, uses a NIC driver to talk to a virtual NIC. The virtualization software (Hypervisor) emulates a NIC for each Guest OS. One physical server could have eight or 16 VMs, each of which runs a Guest OS talking to a virtual NIC.

In addition to allowing multiple Guest OSs to share a single physical NIC, the Hypervisor typically emulates an Ethernet (L2) switch connecting virtual machines to physical NIC ports. Implementing virtual NIC functions and virtual switching functions within the virtualization software is performance-intensive and adds significant overhead in the networking path. This can reduce 10GbE throughput to 1GbE levels.

Introducing Network IO Virtualization

The PCI Special Interest Group (PCI-SIG) IO Virtualization (IOV) working group is developing extensions to PCIe. The first IOV specification, Single Root IOV (SR-IOV), maintains a single PCIe Root Complex and enables one physical PCIe device to be divided into multiple virtual functions. Each virtual function can then be used by a VM, allowing one physical device to be shared by many VMs and their Guest OSs.

IO Virtualization – Implementation Options

In any given system, there are a limited number of IO devices – typically many fewer than the number of VMs the system may be hosting. As all VMs require access to IO, a Virtual Machine Monitor (VMM) or Hypervisor needs to mediate access to these shared IO devices. In this section, we review the different IOV implementation options.

Software IO Virtualization

All VMMs and Hypervisors provide IO virtualization implemented in software. Commercial Hypervisor offerings run IO virtualization software in a special management – or otherwise privileged – VM to virtualize IO devices, as depicted in Figure 1.

Figure 1: Software IO Virtualization for Network devices. All network traffic is passed through the Management VM, adding significant virtualization overheads and latency.

NETRONOME WHITE PAPER Understanding Network IO Virtualization (IOV) 2

The Management VM has access to all IO devices to be shared; and the OS in the Management VM is running the normal device driver for that device (labeled "DD" in the figure). The Management VM then needs to virtualize the device and present it to other VMs.

Conceptually, network device IO virtualization is straightforward. Guest VMs have a virtual network interface with associated MAC and IP addresses. In the Management VM, the physical device is visible with a MAC and IP address. Thus, the Management VM can use standard network mechanisms, such as bridging or routing, to direct traffic received from the physical interface to the virtual interfaces in the Guest VMs, and to direct traffic received from a Guest VM to other Guest VMs or the physical network device.

In Figure 1, a software implementation of a normal Ethernet switch (labeled "SW") performs this de-multiplexing and multiplexing of traffic to and from Guest VMs. This type of software-based IO virtualization requires an efficient inter-VM communication mechanism to transport packets between the Management VM and Guest VMs. For bulk data transfers, either memory copies or virtual memory techniques, such as page mappings or flipping, are deployed. Further, a signaling mechanism is required, allowing VMs to notify each other that they have packets to send. It is important that the inter-VM communication mechanism does not violate basic isolation properties between VMs. For example, it should not be possible for a Guest VM to corrupt the Management VM or access data in other Guest VMs. Ideally, Guest VMs should also be protected to some extent from a misbehaving Management VM, though this is not completely possible due to the more privileged nature of the Management VM.

In Figure 1, the inter-VM communication is represented by an entity in the Management VM (the back-end1) and a corresponding entity in the Guest VMs (the front-end). The front-ends in the Guest VMs are normal network device drivers in the Guest OS. However, they exchange network packets with their corresponding back-end in the Management VM using the aforementioned inter-VM communication mechanism.

Software-based IO virtualization provides a great deal of flexibility. Within the Management VM, the virtual interfaces connected to front-ends can connect to the physical interfaces in arbitrary ways. In the simplest and most common case, the virtual network devices are all connected to a software Ethernet bridge or switch. For enterprise environments, this is typically a VLAN-capable switch. The Management VM may also implement a firewall, or other forms of filtering, to protect Guest VMs, as well as provide logging or other monitoring functions. In some environments, the Management VM may also provide other functions, such as Network Address Translation (NAT) or routing. In fact, some Hypervisors allow arbitrary virtual networks to be constructed to interconnect Guest VMs.

The obvious drawback of this flexibility is a significant processing overhead, particularly when dealing with received packets. Each packet is received into buffers owned by the Management VM, which then needs to inspect the packet and determine the recipient Guest VM(s). Subsequently, the Management VM needs to transfer the packet into a receive buffer supplied by the recipient Guest VM.
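The software switch's receive-side role, inspecting each frame and copying it into the owning guest's buffer, can be sketched as follows. This is a minimal model of the "SW" entity of Figure 1, not any particular hypervisor's implementation; the class and method names are illustrative.

```python
class SoftwareSwitch:
    """Minimal model of the Management VM's software switch ("SW" in Figure 1):
    record which virtual interface owns each MAC address, then demultiplex
    received frames into the right Guest VM's receive queue (a copy per packet)."""

    def __init__(self):
        self.mac_table = {}   # MAC -> guest receive queue (a plain list here)

    def attach_guest(self, mac, rx_queue):
        self.mac_table[mac] = rx_queue

    def receive_from_nic(self, frame):
        # frame = (dst_mac, payload); unknown destinations are flooded
        dst, payload = frame
        if dst in self.mac_table:
            self.mac_table[dst].append(payload)   # one copy into the guest's buffer
        else:
            for q in self.mac_table.values():     # broadcast/unknown: copy to all
                q.append(payload)

sw = SoftwareSwitch()
vm1, vm2 = [], []
sw.attach_guest("aa:aa", vm1)
sw.attach_guest("bb:bb", vm2)
sw.receive_from_nic(("aa:aa", b"hello"))
assert vm1 == [b"hello"] and vm2 == []
```

Note that every received frame passes through Management VM code and is copied at least once, which is exactly the per-packet overhead the following paragraphs quantify.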
While different techniques are used by different hypervisors, they all have to copy the packet data or exchange pages using page flipping, both of which incur significant overheads. For example, a Xen system without further optimization spends more than five times as many cycles per packet received as compared to native Linux.2

The network transmit path incurs less overhead. However, the Management VM still has to inspect the packets transmitted by a Guest VM to determine where to send them. Further, the Management VM may perform some header checks (e.g., to prevent MAC address spoofing) or it may need to rewrite the packet header, for example, to add VLAN tags or perform NAT. This typically requires at least the packet header, if not the entire packet, to be accessible within the Management VM, thus adding extra CPU overheads on the transmit path, as well.

Software-based IO virtualization, however, has its drawbacks. Not only does it add significant CPU overhead for each packet, it also adds significant latencies. Packets, on both transmit as well as receive, are queued twice (at the device and for the inter-VM communication). Both the Management VM and the Guest VM may experience scheduling latencies, delaying the time taken to react to interrupts or inter-VM signals and increasing the latency for packet traffic.

Multi-queue NICs

Most modern NICs support multiple send and receive queues (MQ NICs); and many commercial hypervisors make use of these MQ NICs to accelerate network IO virtualization. There are a number of different approaches for utilizing MQ NICs in a virtualization environment, with the most suitable approach depending heavily on the detailed capabilities of the NIC.

All MQ NICs provide some filtering of incoming packets to decide onto which receive queue to place them. Typically, the filter is based on the destination MAC address and/or VLAN tags. Some MQ NICs also offer further filtering based on very simple L3 and L4 rules.

Early models of MQ NICs did not apply any filtering to transmitted packets – thus, they could not handle packets destined for other VMs connected to the same NIC. As a result, these MQ NICs required additional software to handle inter-VM network traffic. However, modern MQ NICs typically do not have this limitation, thus simplifying the software support required.

Figure 2 shows a common architecture for using MQ NICs as an IOV solution in virtualized environments. The main idea is to associate queues (more precisely, sets of queues) with individual Guest VMs. The OS in the Management VM still runs the device driver for the device. However, since the MQ NIC is performing the multiplexing and de-multiplexing of traffic, the Management VM does not contain a software Ethernet switch. The Guest VMs still use a generic device driver (labeled "FE" or Front-end) representing virtual network interfaces to their OS. However, unlike the software IO virtualization scenario, they are connected to a different, device-specific component in the Management VM (labeled "BE" or Back-end).

Figure 2: IO Virtualization with Multi-queue devices. The device performs all multiplexing and de-multiplexing of network traffic, significantly reducing the CPU overheads on the Host.

Compared to software-based IOV, the receive path with MQ NICs is much more straightforward. A Guest VM will transfer buffer descriptors to the Management VM, which can directly post these descriptors to the queue associated with the Guest VM. When packets arrive at the device, the filter mechanism on the device will select the destination queue and DMA the packet into the buffer posted by the Guest VM. Subsequently, the descriptor is returned to the Management VM, which will forward it back to the Guest VM.

Buffer descriptors have to be passed through the Management VM, rather than allowing the Guest VM to post descriptors directly, so that the Management VM can check that the memory referred to by the descriptors belongs to the Guest VM. Without this check, a Guest VM could either accidentally or maliciously cause a device to access memory belonging to another Guest VM, thus violating isolation between VMs.

The transmit path from a Guest VM to the device is also straightforward. Transmit descriptors are passed from the Guest VM to the Management VM, which passes them on to the device. Once the packet is transmitted, the notifications are passed back to the Guest VM via the Management VM.

As should be obvious from this description, IO virtualization with MQ NICs incurs far less overhead than software-based IO virtualization, since the data does not need to be moved between VMs, and the Management VM is not involved in the multiplexing and de-multiplexing of network traffic. Using this type of IO virtualization, close to 10Gbps line rate can be achieved with modern host hardware. In the Xen implementation, IO virtualization with MQ NICs still incurs a per-packet overhead of about twice as many CPU cycles per packet when compared to native Linux execution. Further, individual packets still incur significant additional latency, as the descriptors have to be passed through the Management VM.

The use of MQ NICs for IO virtualization severely limits the flexibility offered by software-based IOV, as the packet multiplexing and de-multiplexing is performed by fixed-function hardware. Typically, MQ NICs perform simple filtering at the MAC level in order to implement enough functionality for a simple L2 Ethernet switch.

1 Note, we are using the terminology of Xen in this example, but both Microsoft's Hyper-V and VMware's ESX Server have similar concepts.
2 K. K. Ram, J. R. Santos, Y. Turner, A. L. Cox, S. Rixner: "Achieving 10 Gb/s using Safe and Transparent Network Interface Virtualization." VEE 2009.
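The device-side receive filter at the heart of the MQ NIC approach can be sketched in a few lines. The model below is illustrative only: a real NIC implements this lookup in hardware, and the exact match fields (MAC, VLAN, simple L3/L4 rules) vary by device.

```python
class MultiQueueNic:
    """Sketch of an MQ NIC's receive path: the device itself (not the
    Management VM) picks the destination queue from the destination MAC
    and VLAN tag, then DMAs the packet into a buffer the guest posted."""

    def __init__(self, default_queue=0):
        self.filters = {}                    # (dst_mac, vlan) -> queue id
        self.queues = {default_queue: []}    # queue id -> posted guest buffers
        self.default_queue = default_queue

    def program_filter(self, dst_mac, vlan, queue_id):
        self.filters[(dst_mac, vlan)] = queue_id
        self.queues.setdefault(queue_id, [])

    def rx(self, dst_mac, vlan, payload):
        qid = self.filters.get((dst_mac, vlan), self.default_queue)
        self.queues[qid].append(payload)     # models the DMA into the guest buffer
        return qid

nic = MultiQueueNic()
nic.program_filter("aa:aa", vlan=10, queue_id=1)   # queue 1 belongs to Guest VM 1
assert nic.rx("aa:aa", 10, b"pkt") == 1            # matched: guest's own queue
assert nic.rx("cc:cc", 10, b"pkt") == 0            # no match: default queue
```

Because the queue selection happens on the device, the host CPU never touches packets destined for other guests, which is exactly why this approach removes the software switch from the Management VM.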
PCI Device Assignment – Toward SR-IOV

Some Hypervisors, including Xen and VMware ESX Server, allow the direct assignment of PCI devices to Guest VMs. This is a relatively small extension to the techniques needed to run device drivers inside the Management VM. Assigning PCI devices directly to Guest VMs eliminates the remaining overhead and added latencies of the MQ NIC IO virtualization approach.

Figure 3 shows the common architecture for how hypervisors support PCI device assignment. The Hypervisor provides mechanisms to directly access a PCI device's hardware resources; and the Management VM needs to provide a way for Guest VMs to discover PCI devices assigned to them and their associated resources.

Device discovery by a Guest VM is typically achieved by providing a virtual PCI bus. The Management VM normally owns the physical PCI buses and enumerates all physical devices attached to them. If a PCI device is assigned to a Guest VM, it is enumerated on a virtual PCI (vPCI) bus exported to the Guest VM. This allows the guest to access the PCI configuration space of the device assigned to it. Importantly, all PCI configuration space accesses by a Guest VM are transferred to the Management VM, which can either pass them through to the device, intercept and emulate them, or discard them. This allows the Management VM to enable or configure hardware resources required by the Guest VM to use the device.

There are three different types of hardware resources a Guest VM must have access to in order to run a device driver for a physical device: device memory, device IO ports and device interrupts. The first two, device memory and IO ports, are described in the device's PCI configuration space as Base Address Registers (BARs). In order for a Guest VM to access device memory, the Management VM instructs the Hypervisor that a given Guest VM is allowed to map the physical addresses at which the device memory is located into its virtual address space. The Hypervisor can use memory protection provided by the CPU Memory Management Unit (MMU) to enforce that a Guest VM only accesses the device memory belonging to the assigned device. Access to IO ports can be restricted in a similar way using the Task Segment Selector (TSS) on x86 processors. Physical interrupts originating from a device need to be handled by the Hypervisor, as interrupts are only delivered to the highest-privileged software entity. Hypervisors then virtualize the physical interrupts and deliver them to the Guest VMs. In order to reduce interrupt latencies, it is important that physical interrupts are delivered to the same CPU core that the destination Guest VM is using to handle the resulting virtual interrupt.

In the previous MQ section, we argued that descriptors need to be passed through the Management VM to prevent a breach of VM isolation due to rogue DMA setups. This is not required for PCI device assignment, since modern chipsets include IO MMUs, such as the Intel® VT-d, which can be set up by the Hypervisor to allow a device to access only certain pages of host memory. This is achieved by setting up a page table mapping in the IO MMU to map host memory into a device's DMA address space. On memory write and read requests from a PCI device to or from host memory, the chipset selects an IO MMU page table based on the Requester ID used by the PCI device. Thus, the Hypervisor sets up the IO MMU page tables for a device to map only the memory belonging to a Guest VM when the device is assigned to it. This prevents a Guest VM from intentionally or accidentally accessing other VMs' memory areas via a device's DMA engines.

Of all the IO virtualization options, direct PCI device assignment has the lowest overhead and the least added latencies. The Management VM is not involved in the data path; it just provides infrequent access to the device's PCI configuration space. The Hypervisor itself is only involved in the virtualization of device interrupts, which can be achieved with relatively low overhead, especially if physical interrupts are delivered to the same CPU cores on which the recipient Guest VM is executing. However, it is clearly infeasible to have a separate PCI device for every Guest VM in a system, even if multi-function devices were used. The PCI-SIG introduced SR-IOV to address this issue.

Figure 3: PCI device assignment. Guest VMs can directly access hardware devices, eliminating all IO virtualization overheads.
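The IO MMU's role in making direct assignment safe can be sketched as a per-Requester-ID page table consulted on every DMA request. This is a toy model of the mechanism described above (page granularity, requester IDs, allow/deny), not of any specific chipset such as VT-d.

```python
class IoMmu:
    """Toy model of IO MMU DMA protection: the Hypervisor maps, per
    Requester ID, only the pages belonging to the owning Guest VM;
    any DMA to an unmapped page is rejected before reaching host memory."""

    PAGE = 4096

    def __init__(self):
        self.tables = {}   # requester_id -> set of allowed page numbers

    def map_pages(self, requester_id, pages):
        self.tables.setdefault(requester_id, set()).update(pages)

    def dma_allowed(self, requester_id, addr):
        return addr // self.PAGE in self.tables.get(requester_id, set())

iommu = IoMmu()
iommu.map_pages(requester_id=0x100, pages={5, 6})   # device assigned to Guest VM A
assert iommu.dma_allowed(0x100, 5 * 4096 + 128)     # inside VM A's memory: allowed
assert not iommu.dma_allowed(0x100, 7 * 4096)       # another VM's page: blocked
```

The key point the sketch captures is that the check keys on the Requester ID carried by the PCI transaction, so a device can only ever reach the pages set up for the VM it is assigned to.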
Introducing PCIe SR-IOV

The PCI-SIG introduced the SR-IOV standard in September 2007, recognizing the need to provide a device-centric approach to IO virtualization. As such, the SR-IOV standard builds on top of a wide range of existing PCI standards, including PCI Express (PCIe), Alternative Routing ID (ARI), Address Translation Services (ATS) and Function Level Reset (FLR). From the host perspective, SR-IOV on its own is primarily an extension to the PCI configuration space, defining access to lightweight Virtual Functions (VFs).

With SR-IOV, a physical PCI device may contain several device functions. In SR-IOV parlance, these are called Physical Functions (PFs). PFs are standard PCIe devices with their own full PCI configuration space and set of resources. An SR-IOV-compliant PF has an additional SR-IOV Extended Capability as part of its configuration space. This extended capability in the PF's configuration space contains configuration information about all VFs associated with the PF. In particular, it defines the BAR configuration for the VFs, as well as the type of the VFs.

While the BAR configuration for VFs is described in the associated PF's Extended Capability, each VF also has a standard PCIe configuration space entry. However, certain fields in a VF's configuration space are ignored or undefined. Of particular note is that the Vendor ID and Device ID fields in a VF's configuration space are not defined and have to be taken from the associated PF's configuration space fields. Due to this arrangement, all VFs of a PF have to be of the same type. Further, as previously outlined, the BAR configuration entries in a VF's configuration space are undefined, as the PF's extended capability defines the BARs for all VFs. Each VF has its own set of MSI/MSI-X vectors; and these are configured using the VF's PCIe configuration space.

The SR-IOV standard anticipates that host software (including virtualization software) requires a PCI Manager (PCIM) to manage PFs, VFs, their capabilities, configuration and error handling. However, the standard explicitly does not define any implementation of the PCIM. For example, for BAR and other configuration accesses to a VF, an implementation would typically present a VF's configuration space as a normal PCI device to the OS and/or Hypervisor and mask the differences through software emulation. Thus, a PCIM implementation is very similar in functionality to the vPCI module used for PCI device assignment. In fact, in most implementations, the vPCI module and the PCIM implementation cooperate.

SR-IOV for Network Devices

From a virtualization perspective, SR-IOV-capable network devices combine PCI device assignment with the network virtualization techniques of modern MQ devices. With the help of the PCIM, SR-IOV VFs are typically treated as standard PCI devices, which are directly assigned to Guest VMs. Since VFs are using different Requester IDs, the chipset's IO MMU can be set up to provide appropriate DMA protection; and, with each VF possessing its own MSI/MSI-X vectors, interrupts can be directed to the cores executing the Guest VMs. Thus, SR-IOV provides the same low overhead and latency access to IO devices as PCI device assignment.

Like modern MQ NICs, SR-IOV-capable NICs have to multiplex and de-multiplex traffic between VFs and typically implement the same fixed functionality: L2 switching combined with some basic higher-level filtering. Thus, SR-IOV NICs are similarly limited in flexibility as MQ NICs.

Challenges with SR-IOV

SR-IOV is well suited for providing hardware support for virtualizing fixed-function devices, such as network cards. However, its design has limitations in supporting highly programmable IO devices. With SR-IOV, VFs are enumerated in a hardware-based PCI configuration space; and all VFs associated with a PF have to be of the same device type. Programmable IO devices may allow vendors to dynamically create virtual functions and use different types of device functions to provide different interfaces to the IO device. For example, a networking device may be able to offer a standard NIC interface as well as interfaces for efficient packet capture and network interfaces offloading network security protocols.

With SR-IOV, these three types of network interfaces would have to be represented as three different PFs, each with a set of VFs associated with them. From the SR-IOV standard, it is unclear if the assignment of VFs to different PFs can be easily achieved (i.e., it is unclear how dynamic VFs of a given type can be created). This limitation is a direct result of SR-IOV requiring VFs to be enumerated in hardware, which also results in higher hardware cost and complexity. Despite this additional cost and complexity, a software component – the PCIM – is still required to manage VFs. In the next section, we outline an alternative solution to address these challenges.

Netronome IOV Solution

Netronome has designed a new IO co-processor, the NFP-32xx (NFP). The NFP offers up to 20Gbps of accelerated network processing for a variety of applications, including IDS/IPS systems, L2-L7 processing in network infrastructure equipment and network security appliances. The NFP is highly programmable, including many aspects of its PCIe x8 (v2.0) host interface. This allows greater flexibility with how aspects of the device are presented to the host system.

The NFP supports up to 256 queues to the host, which are grouped to form endpoints. The queues are generic in the sense that they can carry arbitrary descriptors, thus making it possible to create endpoints of different endpoint types. Example endpoint types include a standard NIC-style interface with RX and TX queues, optimized packet capture interfaces and look-aside crypto interfaces. The details of these interfaces are under the control of software running on the NFP, which defines, amongst others, the purpose of the assigned queues, the descriptor formats used and how and when DMA is initiated.
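The queue-grouping idea just described can be sketched as a pool of generic queues from which typed endpoints are carved out and released at runtime. The endpoint types and sizes below are purely illustrative, not the NFP's actual interface.

```python
class QueuePool:
    """Sketch of the scheme described above: a pool of generic host queues
    (up to 256) from which typed endpoints are created and destroyed
    dynamically, under software control."""

    def __init__(self, total_queues=256):
        self.free = list(range(total_queues))
        self.endpoints = {}
        self.next_id = 0

    def create_endpoint(self, ep_type, num_queues):
        if num_queues > len(self.free):
            raise RuntimeError("not enough free queues")
        qs = [self.free.pop(0) for _ in range(num_queues)]
        ep_id = self.next_id
        self.next_id += 1
        self.endpoints[ep_id] = {"type": ep_type, "queues": qs}
        return ep_id

    def destroy_endpoint(self, ep_id):
        self.free.extend(self.endpoints.pop(ep_id)["queues"])

pool = QueuePool()
nic = pool.create_endpoint("nic", 2)     # e.g., an RX + TX queue pair
cap = pool.create_endpoint("pcap", 1)    # e.g., a packet-capture endpoint
assert pool.endpoints[nic]["type"] == "nic"
pool.destroy_endpoint(cap)               # endpoints come and go at runtime
assert len(pool.free) == 254
```

Because the queues themselves are untyped, the same pool can back NIC-style, capture or crypto endpoints side by side, which is the flexibility SR-IOV's fixed hardware enumeration cannot express.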
Endpoints of these different types can be created dynamically at runtime. Furthermore, for a variety of applications in both the data center and network infrastructure equipment, it is necessary for these different endpoints to be accessible by different Guest VMs executing on the host with low overhead – an IOV solution is required. However, as previously noted, SR-IOV is not designed to support this type of highly dynamic and highly flexible device.

A More Flexible IOV Solution

The key insight is that SR-IOV relies on a number of PCI standards and essentially only adds a device enumeration and resource discovery mechanism. This mechanism is the primary reason for the limitations of SR-IOV.

With the Netronome IOV solution, device enumeration is delegated to a driver running on the host. This driver is specific to the NFP and capable of managing endpoints on the NFP – creation and removal of endpoints of arbitrary types. To the host OS (or hypervisor) the host driver acts as a PCI bus driver and enumerates NFP endpoints as standard PCI devices – it essentially implements a virtual PCI bus. All configuration space accesses for devices on this virtual PCI bus are passed to the host driver, which either emulates or translates them to accesses on the NFP.

This solution is not dissimilar to the SR-IOV approach. The host driver performs the same function as the PF driver for an SR-IOV device (management of VFs) and the PCIM (translating SR-IOV VFs to PCI devices for the host OS or hypervisor). However, the Netronome solution does not require VFs to also be enumerated in hardware. We, therefore, refer to the NFP endpoints as Software-configurable Virtual Functions (SCVFs). Because the Netronome host driver is not restricted by the SR-IOV device enumeration limitations, it can enumerate arbitrary types of functions on its virtual PCI bus.

Figure 4 depicts the commonalities and differences between SR-IOV and Netronome's IOV solution. With the Netronome solution, different types of endpoints (indicated by the different colors) are easily supported, while SR-IOV mandates that all VFs associated with a PF are of the same device type. With Netronome's IOV solution, the PF driver, in combination with a virtual PCI implementation, performs the same function as the PF driver and the PCIM for SR-IOV.

For both SR-IOV and Netronome's IOV solution, virtual functions are presented to the host OS or the Hypervisor as standard PCI devices. Thus, they both leverage the mechanism provided by modern Hypervisors to assign PCI devices to Guest VMs.

During initialization, each SCVF gets assigned a unique PCI function ID, which the NFP uses to tag DMA requests originating from the corresponding endpoint. Thus, IO MMU page tables can be set up appropriately to provide DMA protection between different SCVFs. Each SCVF also gets assigned one or more MSI vectors from the PF's set of MSI-X vectors, enabling them to directly notify the Guest VMs to which they are assigned.

Network Virtualization with Netronome's Solution

The host-side interactions of the NFP are identical to an SR-IOV solution. SCVFs can be assigned to Guest VMs just like VFs or PCI devices. This provides Guest VMs with low overhead and low latency access to network devices. Guest VMs run standard device drivers to talk to the SCVF, and interrupts can be delivered directly to the cores on which a Guest VM is executing. However, since the NFP is a programmable network device, the multiplexing and de-multiplexing of packets from and to SCVFs is not limited to some fixed-function hardware implementation as in most SR-IOV or MQ NICs. Instead, extensive packet processing can be performed, including flow-based classification, load balancing or filtering. This provides a significantly more flexible solution than other hardware-based IOV solutions.

Figure 4: SR-IOV (left) and Netronome's IOV (right) compared. The primary difference is that, with Netronome's IOV solution, different types of devices are easily supported.
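Putting the SCVF pieces together, the sketch below models the host driver's virtual PCI bus: functions of arbitrary types are created in software, each receiving a unique function ID (used to tag DMA for IO MMU protection) and an MSI vector, and can then be assigned to Guest VMs. All names and fields are illustrative, not the actual driver interface.

```python
class VirtualPciBus:
    """Sketch of the host driver's virtual PCI bus for SCVFs: software-created
    functions of arbitrary types, each with a unique function ID and its own
    MSI vector, exposed to the Hypervisor for assignment to Guest VMs."""

    def __init__(self, msi_vectors):
        self.devices = {}
        self.next_fn = 0
        self.msi_free = list(range(msi_vectors))

    def create_scvf(self, dev_type):
        fn_id = self.next_fn            # unique ID also used to tag the SCVF's DMA
        self.next_fn += 1
        self.devices[fn_id] = {"type": dev_type,
                               "msi": self.msi_free.pop(0),
                               "assigned_vm": None}
        return fn_id

    def assign(self, fn_id, vm):
        self.devices[fn_id]["assigned_vm"] = vm

bus = VirtualPciBus(msi_vectors=32)
nic = bus.create_scvf("nic")        # unlike SR-IOV, types may differ per function
crypto = bus.create_scvf("crypto")
bus.assign(nic, "guest-1")
assert bus.devices[nic]["msi"] != bus.devices[crypto]["msi"]
assert bus.devices[nic]["assigned_vm"] == "guest-1"
```

Contrast this with SR-IOV, where the set of functions and their common type are fixed by the hardware-enumerated configuration space rather than created on demand.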
Comparing the Various IOV Implementation Options
The following table summarizes and compares the four different IOV options previously discussed.
                                   SW IOV    MQ IOV           SR-IOV           NFP IOV
Flexibility in Packet Processing   High      Low              Low              High
Overhead                           High      Medium           Low              Low
Latency                            High      Medium           Low              Low
IO-MMU Support                     Limited   Very Limited     Yes              Yes
Guest VM Drivers                   Generic   Generic          Device-specific  Device-specific
Management VM Support              Generic   Device-specific  Device-specific  Device-specific
Flexibility in Device Support      Medium    Low              Low              High
The Netronome IOV solution has the advantage of being SR-IOV-compliant while providing flexible device support – most
notably, the capability to dynamically assign different kinds of virtual functions at run time. The result is that a physical NIC can
provide multiple virtual NIC types, including a dumb NIC, intelligent NIC, crypto NIC or Packet Capture (PCap) NIC.
Application of IOV in the Data Center

Next-generation data centers need to address a complex set of issues, such as IT consolidation, service continuity, service flexibility and energy efficiency. Virtualization of servers is already seen as a key factor in moving to a next-generation data center. Without IO virtualization, the limitations already discussed in this article will severely restrict the extent to which the goals outlined above can be met.

A single, multi-core server can easily support ten to 100 VMs, allowing numerous applications – which, today, require a dedicated server – to share a single physical server. This allows the number of servers in the data center to be reduced while increasing average utilization from as low as 5-15% to up to 50-60%. With multiple VMs running on a single physical machine, there are opportunities for the VMs to cost-effectively share a pool of IO resources, such as intelligent NICs. In the single application, single server model, each application has access to the entire server's bandwidth. In the virtualized server model, however, network bandwidth is shared by multiple applications.

As more applications are consolidated on one server, bandwidth requirements per server and server utilization both increase significantly. The result is that an intelligent NIC is needed to offload network processing for the host in order to prevent the host CPU from becoming the bottleneck that limits application consolidation. This trend requires low overhead delivery of network data directly to Guest VMs.

The move to a unified network for all traffic in the data center – with data and storage networks consolidating onto standard Ethernet using technologies such as Fibre Channel over Ethernet (FCoE) or iSCSI over 10Gbps Ethernet – is also a driver for IO virtualization. These consolidated data center networks require lower latency and higher throughput than traditional data-only networks.

As noted earlier, the different approaches to IO virtualization have very different characteristics with regard to latency. Next-generation data center requirements for low latency and low overhead delivery of network IO directly to VMs can only be met by hardware-based IOV solutions, such as SR-IOV-based NICs or Netronome's IOV solution. Software-based IO virtualization imposes too high an overhead to handle the expected network IO demand; and IOV solutions based on MQ NICs are unsuitable for latency-sensitive applications, such as FCoE.

As servers and network appliances in the data centers are built around commodity multi-core CPUs – specifically, x86 architectures – and network IO around PCIe, implementing IOV over PCIe becomes critical in allowing the many VMs to share network IO devices.

Data centers also deploy a wide range of advanced network and management technologies within the Ethernet infrastructure, such as extensive use of Access Control Lists (ACLs), sophisticated VLAN setups, Quality of Service and even some limited L3 and L4 processing. These technologies are readily available in modern network infrastructure equipment, such as Top-of-the-Rack (TOR) switches. However, even modern SR-IOV-based NICs only provide very limited, fixed-function switching capabilities, creating a disconnect between the sophisticated physical network infrastructure and the virtual network infrastructure implemented on the host. An IOV solution combined with intelligent IO processing (IOV-P) bridges this gap and extends sophisticated network processing and management into virtualized servers. An IOV-P-based intelligent NIC