Seven years ago at LCA, Van Jacobson introduced the concept of net channels, but since then user-mode networking has not hit the mainstream. Several user-mode networking environments exist: Intel DPDK, BSD netmap, and Solarflare OpenOnload. Each provides higher performance than standard Linux kernel networking, but each also creates new problems. This talk explores the issues created by user-space networking, including performance, internal architecture, security and licensing.
10. The OpenOnload architecture
Network hardware provides a user-safe interface which can route Ethernet packets to an application context based on flow information contained within headers.
[Diagram: the kernel context and several application contexts each run their own protocol stack and driver instance; the network adaptor DMAs packets directly into the matching context. No new protocols.]
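Because OpenOnload accelerates the existing BSD sockets API ("no new protocols"), application source does not change. A minimal sketch, with an illustrative port number: this is ordinary UDP socket code, and when run under Onload's interception library (e.g. via LD_PRELOAD) the same calls are serviced by user-space protocol processing.

    /* Ordinary BSD sockets code: nothing OpenOnload-specific here. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7777);               /* illustrative port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }

        char buf[2048];
        ssize_t n = recv(fd, buf, sizeof(buf), 0); /* flow steered to this context by the NIC */
        if (n >= 0)
            printf("received %zd bytes\n", n);
        close(fd);
        return 0;
    }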
11. The OpenOnload architecture
Protocol processing can take place both in the application and kernel context for a given flow.
[Diagram: as above, but a flow's protocol instance appears in both the kernel and the application context. Enables persistent / asynchronous processing; maintains the existing network control-plane.]
12. The OpenOnload architecture
Protocol state is shared between the kernel and application contexts through a protected shared-memory communications channel.
[Diagram: the kernel and application protocol instances are linked by a protected shared-memory channel. Enables correct handling of protocol state with high performance.]
13. Performance metrics
Overhead
– Networking overheads take CPU time away from your application
Latency
– Holds your application up when it has nothing else to do
– H/W + flight time + overhead
Bandwidth
– Dominates latency when messages are large
– Limited by: algorithms, buffering and overhead
Scalability
– Determines how overhead grows as you add cores, memory, threads, sockets, etc.
23. OpenOnload Open Source
OpenOnload available as Open Source (GPLv2)
– Please contact us if you’re interested
Compatible with x86 (ia32, amd64/em64t)
Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT NICs
– Could support other user-accessible network interfaces
Very interested in user feedback
– On the technology and project directions
24. Netmap
http://info.iet.unipi.it/~luigi/netmap/
● BSD (and Linux port)
● Good scalability
● Libpcap emulation
26. Netmap API
● Access
– open("/dev/netmap")
– ioctl(fd, NIOCREG, arg)
– mmap(..., fd, 0) maps buffers and rings
● Transmit
– fill up to avail buffers, starting from slot cur
– ioctl(fd, NIOCTXSYNC) queues the packets
● Receive
– ioctl(fd, NIOCRXSYNC) reports newly received packets
– process up to avail buffers, starting from slot cur
These ioctl()s are non-blocking.
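A minimal sketch of that sequence in C, following the names used on this slide. Two assumptions to note: the shipped headers spell the register ioctl NIOCREGIF rather than NIOCREG, and later netmap releases replaced the per-ring cur/avail fields with head/cur/tail, so treat these names as version-dependent.

    /* Open, register, mmap, then one TX pass (original netmap API). */
    #include <fcntl.h>
    #include <net/netmap.h>
    #include <net/netmap_user.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    void tx_once(const char *ifname, const char *payload, size_t len)
    {
        int fd = open("/dev/netmap", O_RDWR);

        struct nmreq req;
        memset(&req, 0, sizeof(req));
        strncpy(req.nr_name, ifname, sizeof(req.nr_name) - 1);
        req.nr_version = NETMAP_API;
        ioctl(fd, NIOCREGIF, &req);              /* the slide's NIOCREG */

        /* One mmap() exposes all buffers and rings for this interface. */
        char *mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
        struct netmap_ring *ring = NETMAP_TXRING(nifp, 0);

        if (ring->avail > 0) {                   /* fill from slot cur */
            struct netmap_slot *slot = &ring->slot[ring->cur];
            char *buf = NETMAP_BUF(ring, slot->buf_idx);
            memcpy(buf, payload, len);
            slot->len = len;
            ring->cur = NETMAP_RING_NEXT(ring, ring->cur);
            ring->avail--;
        }
        ioctl(fd, NIOCTXSYNC, NULL);             /* non-blocking: queue packets */
    }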
27. Netmap API: synchronization
● poll() and select(), what else!
– POLLIN and POLLOUT decide which sets of rings to work on
– work as expected, returning when avail > 0
– interrupt mitigation delays are propagated up to the userspace process
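A sketch of the blocking receive path under those semantics, reusing the fd and interface pointer from the previous sketch. handle_packet() is a hypothetical consumer, and the cur/avail fields again follow the original API.

    /* Block until packets arrive, then drain the RX ring. */
    #include <net/netmap.h>
    #include <net/netmap_user.h>
    #include <poll.h>

    void handle_packet(const char *buf, unsigned len);   /* hypothetical consumer */

    void rx_loop(int fd, struct netmap_if *nifp)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        for (;;) {
            poll(&pfd, 1, -1);                   /* returns when avail > 0 */
            struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
            while (ring->avail > 0) {            /* process from slot cur */
                struct netmap_slot *slot = &ring->slot[ring->cur];
                char *buf = NETMAP_BUF(ring, slot->buf_idx);
                handle_packet(buf, slot->len);
                ring->cur = NETMAP_RING_NEXT(ring, ring->cur);
                ring->avail--;
            }
            /* poll() performs the ring sync, so no explicit
             * NIOCRXSYNC ioctl is needed here. */
        }
    }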
28. Netmap: multiqueue
● Of course.
– one netmap ring per physical ring
– by default, the fd is bound to all rings
– ioctl(fd, NIOCREG, arg) can restrict the binding to a single ring pair
– multiple fd's can be bound to different rings on the same card
– the fd's can be managed by different threads
– threads mapped to cores with pthread_setaffinity()
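A sketch of that thread-per-ring setup. The NETMAP_HW_RING flag (from the netmap headers) is what restricts an fd to a single ring pair; the 1:1 ring-to-core mapping is an illustrative assumption, and note that on Linux the affinity call is spelled pthread_setaffinity_np().

    /* One worker thread per hardware ring pair, each pinned to a core. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdlib.h>

    struct worker { int ring; };

    static void *ring_worker(void *arg)
    {
        struct worker *w = arg;

        /* Pin this thread to core w->ring (illustrative 1:1 mapping). */
        cpu_set_t cpus;
        CPU_ZERO(&cpus);
        CPU_SET(w->ring, &cpus);
        pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);

        /* Each worker would open its own /dev/netmap fd restricted to
         * one ring pair before registering:
         *     req.nr_ringid = w->ring | NETMAP_HW_RING;
         * then run the receive loop from the previous sketch. */
        return NULL;
    }

    void spawn_workers(int nrings)
    {
        for (int i = 0; i < nrings; i++) {
            struct worker *w = malloc(sizeof(*w));
            pthread_t tid;
            w->ring = i;
            pthread_create(&tid, NULL, ring_worker, w);
        }
    }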
29. Netmap and the host stack
● While in netmap mode, the control path remains unchanged:
– ifconfig, ioctl's, etc. still work as usual
– the OS still believes the interface is there
● The data path is detached from the host stack:
– packets from the NIC end up in RX netmap rings
– packets from TX netmap rings are sent to the NIC
● The host stack is attached to an extra pair of netmap rings:
– packets from the host go to a SW RX netmap ring
– packets from a SW TX netmap ring are sent to the host
– these rings are managed using the netmap API
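Since the software rings speak the same API, reinjecting NIC traffic into the host stack is just a loop that moves slots from a NIC RX ring to the host TX ring. A sketch under the same original-API (cur/avail) assumptions as above; the zero-copy buffer swap with NS_BUF_CHANGED follows the netmap headers, and error handling is omitted.

    /* Forward packets from a NIC RX ring to the host-stack TX ring so
     * the OS keeps receiving traffic while the NIC is in netmap mode. */
    #include <stdint.h>
    #include <net/netmap.h>
    #include <net/netmap_user.h>

    void nic_to_host(struct netmap_ring *nic_rx, struct netmap_ring *host_tx)
    {
        while (nic_rx->avail > 0 && host_tx->avail > 0) {
            struct netmap_slot *rs = &nic_rx->slot[nic_rx->cur];
            struct netmap_slot *ts = &host_tx->slot[host_tx->cur];

            /* Zero-copy handoff: swap buffer indices rather than copy,
             * and flag both slots so netmap notices the change. */
            uint32_t tmp = ts->buf_idx;
            ts->buf_idx = rs->buf_idx;
            rs->buf_idx = tmp;
            ts->len = rs->len;
            ts->flags |= NS_BUF_CHANGED;
            rs->flags |= NS_BUF_CHANGED;

            nic_rx->cur = NETMAP_RING_NEXT(nic_rx, nic_rx->cur);
            host_tx->cur = NETMAP_RING_NEXT(host_tx, host_tx->cur);
            nic_rx->avail--;
            host_tx->avail--;
        }
    }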
34. The Intel® DPDK Philosophy
Intel® DPDK Fundamentals
• Implements a run-to-completion model or pipeline model
• No scheduler - all devices accessed by polling
• Supports 32-bit and 64-bit, with/without NUMA
• Scales from Intel® Atom™ to Intel® Xeon® processors
• Number of cores and processors not limited
• Optimal packet allocation across DRAM channels
Control Plane
• Must run on any IA CPU
‒ From Intel® Atom™ processor to the latest Intel® Xeon® processor family
‒ Essential to the IA value proposition
• Focus on the fast path
‒ Sending large numbers of packets to the Linux kernel/GPOS will bog the system down
Data Plane
• Provide software examples that address common network performance deficits
‒ Best practices for software architecture
‒ Tips for data structure design and storage
‒ Help the compiler generate optimum code
‒ Address the challenges of achieving 80 Mpps per CPU socket
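In practice, run-to-completion with no scheduler is a polling loop owned by one core. A sketch using the DPDK burst API; the port and queue numbers are illustrative, and EAL/port initialization (rte_eal_init(), rte_eth_dev_configure(), etc.) is assumed to have been done elsewhere.

    /* One lcore polls an RX queue, processes, and transmits: no
     * interrupts, no scheduler, the core never blocks. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    static void lcore_loop(uint16_t rx_port, uint16_t tx_port)
    {
        struct rte_mbuf *pkts[BURST];

        for (;;) {                               /* poll, never block */
            uint16_t n = rte_eth_rx_burst(rx_port, 0, pkts, BURST);
            if (n == 0)
                continue;

            /* ... per-packet processing happens here, on this core ... */

            uint16_t sent = rte_eth_tx_burst(tx_port, 0, pkts, n);
            while (sent < n)                     /* drop what didn't fit */
                rte_pktmbuf_free(pkts[sent++]);
        }
    }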
35. Intel® Data Plane Development Kit (Intel® DPDK)
Intel® DPDK embeds optimizations for the IA platform:
- Data Plane Libraries and Optimized NIC Drivers in Linux User Space
- Run-time Environment
- Environment Abstraction Layer and Boot Code
- BSD-licensed & source downloadable from Intel and leading ecopartners
[Diagram: customer applications in user space run over the Intel® DPDK libraries (Buffer Management, Queue/Ring Functions, Packet Flow Classification, NIC Poll Mode Library) and an Environment Abstraction Layer; the Linux kernel and platform hardware sit below.]
36. Intel® DPDK Libraries and Drivers
• Memory Manager: Responsible for allocating pools of objects in memory. A pool is created in huge-page memory and uses a ring to store free objects. It also provides an alignment helper to ensure that objects are padded so they spread equally across all DRAM channels.
• Buffer Manager: Significantly reduces the time the operating system spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed-size buffers which are stored in memory pools.
• Queue Manager: Implements safe lockless queues, instead of using spinlocks, that allow different software components to process packets while avoiding unnecessary wait times.
• Flow Classification: Provides an efficient mechanism which incorporates Intel® Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple information, so that packets may be placed into flows quickly for processing, greatly improving throughput.
• Poll Mode Drivers: The Intel® DPDK includes Poll Mode Drivers for 1 GbE and 10 GbE Ethernet* controllers which are designed to work without asynchronous, interrupt-based signaling mechanisms, which greatly speeds up the packet pipeline.
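A short sketch of the first three managers in use, assuming EAL initialization has already run; the pool and ring names and sizes are illustrative.

    /* A pre-allocated mbuf pool in huge-page memory, plus a lockless
     * ring handing buffers between two components. */
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    void managers_example(void)
    {
        /* Buffer manager: fixed-size mbufs pre-allocated from a pool
         * (8191 mbufs, per-core cache of 256, default buffer size). */
        struct rte_mempool *pool = rte_pktmbuf_pool_create(
            "pkt_pool", 8191, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
            rte_socket_id());

        /* Queue manager: a lockless single-producer/single-consumer
         * ring instead of a spinlock-protected queue. */
        struct rte_ring *ring = rte_ring_create(
            "worker_ring", 1024, rte_socket_id(),
            RING_F_SP_ENQ | RING_F_SC_DEQ);

        /* Producer side: allocate a buffer and pass it on. */
        struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
        rte_ring_enqueue(ring, m);

        /* Consumer side: take it off the ring and release it. */
        void *obj;
        if (rte_ring_dequeue(ring, &obj) == 0)
            rte_pktmbuf_free(obj);
    }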