Networking in Userspace
   Living on the edge




  Stephen Hemminger
  stephen@networkplumber.org
Problem Statement

      [Chart: packets per second (bidirectional) vs. packet size (64–1504 bytes),
       y-axis 0 to 20,000,000; source: Intel DPDK Overview]

Server vs Infrastructure
                    Server Packets     Network Infrastructure
  Packet Size       1024 bytes         64 bytes
  Packets/second    1.2 Million        14.88 Million
  Arrival rate      835 ns             67.2 ns
  2 GHz clock       1670 cycles        135 cycles
  3 GHz clock       2505 cycles        201 cycles

L3 hit on Intel® Xeon® ~40 cycles
L3 miss (memory read): 201 cycles at 3 GHz
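
To make the cycle budgets concrete, the 64-byte column follows directly from the 14.88 Mpps line rate and the clock frequency (a worked check, added here for clarity):

    \[
      t_{\mathrm{arrival}} = \frac{1}{14.88 \times 10^{6}\ \mathrm{pkt/s}} \approx 67.2\ \mathrm{ns},
      \qquad
      67.2\ \mathrm{ns} \times 3\ \mathrm{GHz} \approx 201\ \mathrm{cycles\ per\ packet}
    \]

At 3 GHz a single L3 miss (201 cycles) therefore consumes the entire per-packet budget.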
Traditional Linux networking
TCP Offload Engine

Good old sockets
    –   Flexible, portable but slow

Memory mapped buffers
    –   Efficient, but still constrained by architecture

Run in kernel
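
For contrast with the kernel-bypass designs that follow, a minimal sketch of the "good old sockets" receive path: one recvfrom() system call, and at least one copy, per datagram (the function and names here are illustrative, not from the slides):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int udp_rx_loop(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        char buf[2048];

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            return -1;

        for (;;) {
            /* each packet costs at least one syscall and one copy to user space */
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0)
                break;
            /* ... process n bytes ... */
        }
        close(fd);
        return 0;
    }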
The OpenOnload architecture

      Network hardware provides a user-safe interface which
      can route Ethernet packets to an application context
      based on flow information contained within headers

      [Diagram: per-application user-level protocol stack and driver alongside
       the kernel stack, each with its own DMA path to the network adaptor;
       no new protocols]

The OpenOnload architecture

      Protocol processing can take place both in the
      application and kernel context for a given flow


      [Diagram: same stack split as before; enables persistent / asynchronous
       processing while maintaining the existing network control-plane]

The OpenOnload architecture

      Protocol state is shared between the kernel and
      application contexts through a protected shared
      memory communications channel

      [Diagram: kernel and application protocol state shared over a protected
       shared-memory channel; enables correct handling of protocol state with
       high performance]

Performance metrics

      Overhead
           – Networking overheads take CPU time away from your application

      Latency
           – Holds your application up when it has nothing else to do
           – H/W + flight time + overhead

      Bandwidth
           – Dominates latency when messages are large
           – Limited by: algorithms, buffering and overhead

      Scalability
           – Determines how overhead grows as you add cores, memory, threads, sockets
             etc.


Anatomy of kernel-based networking




A user-level architecture?




Direct & safe hardware access




Some performance results


      Test platform: typical commodity server
           – Intel Clovertown 2.3 GHz quad-core Xeon (x1),
             1.3 GHz FSB, 2 GB RAM
           – Intel 5000X chipset
           – Solarflare Solarstorm SFC4000 (B) controller, CX4
           – Back-to-back
           – Red Hat Enterprise Linux 5 (2.6.18-8.el5)




Performance: Latency and overhead

      TCP ping-pong with 4 byte payload
      70 byte frame: 14 (Ethernet) + 20 (IP) + 20 (TCP) + 12 (TCP options) + 4 (payload)

                        ½ round-trip latency    CPU overhead
                           (microseconds)       (microseconds)
  Hardware                      4.2                  --
  Kernel                       11.2                 7.0
  Onload                        5.3                 1.1

Performance: Streaming bandwidth




Performance: UDP transmit

      Message rate:
           – 4 byte UDP payload (46 byte frame)

                               Kernel      Onload
  1 sender                     473,000     2,030,000
  2 senders                    532,000     3,880,000

Performance: UDP receive




OpenOnload Open Source

      OpenOnload available as Open Source (GPLv2)
            – Please contact us if you’re interested

      Compatible with x86 (ia32, amd64/em64t)

      Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT
      NICs
            – Could support other user-accessible network interfaces

      Very interested in user feedback
            – On the technology and project directions


Netmap
        http://info.iet.unipi.it/~luigi/netmap/
●
    BSD (and Linux port)
●
    Good scalability
●
    Libpcap emulation
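
Because of the libpcap emulation, an unmodified pcap application can capture through netmap by being linked (or preloaded) against the netmap-enabled libpcap. A minimal consumer using only standard libpcap calls; the interface name is a placeholder:

    #include <pcap/pcap.h>
    #include <stdio.h>

    static void handler(u_char *user, const struct pcap_pkthdr *h,
                        const u_char *bytes)
    {
        (void)user; (void)bytes;
        printf("packet: %u bytes\n", h->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_live("eth0", 2048, 1, 100, errbuf);  /* "eth0": placeholder */

        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        return pcap_loop(p, -1, handler, NULL);
    }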
Netmap
Netmap API
●
    Access
    –   open("/dev/netmap")
    –   ioctl(fd, NIOCREG, arg)
    –   mmap(..., fd, 0) maps buffers and rings
●
    Transmit
    –   fill up to avail buffers, starting from slot cur.
    –   ioctl(fd,NIOCTXSYNC) queues the packets
●
    Receive
    –   ioctl(fd,NIOCRXSYNC) reports newly received packets
    –   process up to avail buffers, starting from slot cur.


                       These ioctl()s are non-blocking.
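
A sketch of the access and transmit sequence above, written against the netmap API of this era. The register ioctl is spelled NIOCREGIF in the headers (the slide abbreviates it NIOCREG); struct nmreq, NETMAP_IF, NETMAP_TXRING, NETMAP_BUF and the cur/avail ring fields come from <net/netmap_user.h> of that period and may differ in later netmap releases, so treat this as an approximation rather than a drop-in program:

    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/if.h>
    #include <net/netmap.h>
    #include <net/netmap_user.h>

    int tx_burst(const char *ifname, const void *pkt, uint16_t len, int npkts)
    {
        struct nmreq req;
        int fd = open("/dev/netmap", O_RDWR);            /* access */
        if (fd < 0)
            return -1;

        memset(&req, 0, sizeof(req));
        strncpy(req.nr_name, ifname, sizeof(req.nr_name) - 1);
        req.nr_version = NETMAP_API;
        if (ioctl(fd, NIOCREGIF, &req) < 0)              /* bind the fd to the NIC */
            return -1;

        /* mmap(..., fd, 0) maps buffers and rings */
        char *mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
        struct netmap_ring *ring = NETMAP_TXRING(nifp, 0);

        /* fill up to 'avail' buffers, starting from slot 'cur' */
        while (npkts-- > 0 && ring->avail > 0) {
            struct netmap_slot *slot = &ring->slot[ring->cur];
            memcpy(NETMAP_BUF(ring, slot->buf_idx), pkt, len);
            slot->len = len;
            ring->cur = NETMAP_RING_NEXT(ring, ring->cur);
            ring->avail--;
        }
        return ioctl(fd, NIOCTXSYNC, NULL);              /* queue the packets */
    }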
Netmap API: synchronization
●   poll() and select(), what else!
    –   POLLIN and POLLOUT decide which sets of rings to
        work on
    –   work as expected, returning when avail>0
    –   interrupt mitigation delays are propagated up to
        the userspace process
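
A receive loop driven by poll(), as described above: the kernel wakes the process when avail > 0, after any interrupt-mitigation delay. Same era-specific API caveats as the transmit sketch:

    #include <poll.h>

    void rx_loop(int fd, struct netmap_ring *rxring)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        for (;;) {
            poll(&pfd, 1, -1);                    /* returns when avail > 0 */
            while (rxring->avail > 0) {
                struct netmap_slot *slot = &rxring->slot[rxring->cur];
                char *buf = NETMAP_BUF(rxring, slot->buf_idx);
                /* ... process slot->len bytes at buf ... */
                (void)buf;
                rxring->cur = NETMAP_RING_NEXT(rxring, rxring->cur);
                rxring->avail--;
            }
            /* consumed slots are handed back to the kernel on the
             * next poll() or NIOCRXSYNC */
        }
    }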
Netmap: multiqueue
●
    Of course.
    –   one netmap ring per physical ring
    –   by default, the fd is bound to all rings
    –   ioctl(fd, NIOCREG, arg) can restrict the binding
        to a single ring pair
    –   multiple fd's can be bound to different rings on the same
        card
    –   the fd's can be managed by different threads
    –   threads mapped to cores with pthread_setaffinity()
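
On Linux the affinity call is pthread_setaffinity_np(); a small helper for pinning one ring-handling thread per core, as suggested above:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    int pin_to_core(pthread_t thread, int core)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(thread, sizeof(set), &set);
    }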
Netmap and the host stack
●
    While in netmap mode, the control path remains unchanged:
    –   ifconfig, ioctl's, etc still work as usual
    –   the OS still believes the interface is there
●
    The data path is detached from the host stack:
    –   packets from NIC end up in RX netmap rings
    –   packets from TX netmap rings are sent to the NIC
●
    The host stack is attached to an extra pair of netmap rings:
    –   packets from the host go to a SW RX netmap ring
    –   packets from a SW TX netmap ring are sent to the host
    –   these rings are managed using the netmap API
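
A sketch of zero-copy forwarding between an RX ring and a TX ring by swapping buffer indices, the technique netmap's bridge example uses; it works the same whether the rings belong to the NIC or to the host stack's SW ring pair described above (same era-specific API caveats as earlier):

    static void forward(struct netmap_ring *rx, struct netmap_ring *tx)
    {
        while (rx->avail > 0 && tx->avail > 0) {
            struct netmap_slot *rs = &rx->slot[rx->cur];
            struct netmap_slot *ts = &tx->slot[tx->cur];
            uint32_t idx = ts->buf_idx;

            ts->buf_idx = rs->buf_idx;        /* hand the received buffer to TX */
            rs->buf_idx = idx;                /* recycle the old TX buffer for RX */
            ts->len = rs->len;
            ts->flags |= NS_BUF_CHANGED;      /* tell the kernel the buffers moved */
            rs->flags |= NS_BUF_CHANGED;

            rx->cur = NETMAP_RING_NEXT(rx, rx->cur);
            tx->cur = NETMAP_RING_NEXT(tx, tx->cur);
            rx->avail--;
            tx->avail--;
        }
    }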
Netmap: Tx performance
Netmap: Rx Performance
Netmap Summary
 Packet Forwarding       Mpps
 FreeBSD bridging        0.690
 Netmap + libpcap        7.500
 Netmap                  14.88

 Open vSwitch            Mpps
 userspace               0.065
 Linux                   0.600
 FreeBSD                 0.790
 FreeBSD + netmap/pcap   3.050
Intel DPDK Architecture
The Intel® DPDK Philosophy


  Intel® DPDK Fundamentals
  •   Implements a run-to-completion model or pipeline model
  •   No scheduler - all devices accessed by polling
  •   Supports 32-bit and 64-bit with/without NUMA
  •   Scales from Intel® Atom™ to Intel® Xeon® processors
  •   Number of cores and processors not limited
  •   Optimal packet allocation across DRAM channels

  [Diagram: Control Plane / Data Plane split]

  •   Must run on any IA CPU
      ‒   From the Intel® Atom™ processor to the latest Intel® Xeon® processor family
      ‒   Essential to the IA value proposition
  •   Focus on the fast path
      ‒   Sending large numbers of packets to the Linux Kernel / GPOS will bog the system down
  •   Provide software examples that address common network performance deficits
      ‒   Best practices for software architecture
      ‒   Tips for data structure design and storage
      ‒   Help the compiler generate optimum code
      ‒   Address the challenges of achieving 80 Mpps per CPU socket
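
As an illustration of the "no scheduler, poll everything" model, a stripped-down run-to-completion loop. It uses present-day DPDK names (rte_eal_init, rte_eth_rx_burst, rte_pktmbuf_free), which postdate the release these slides describe, and omits port and queue setup; read it as a sketch of the model, not the kit's own example code:

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    int main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)       /* hugepages, PCI probe, lcores */
            return -1;

        /* rte_eth_dev_configure(), rte_eth_rx_queue_setup() and
         * rte_eth_dev_start() for port 0 omitted for brevity */

        for (;;) {                              /* run to completion: poll, never block */
            struct rte_mbuf *pkts[BURST];
            uint16_t i, n = rte_eth_rx_burst(0 /* port */, 0 /* queue */, pkts, BURST);

            for (i = 0; i < n; i++) {
                /* ... classify / rewrite / forward the packet here ... */
                rte_pktmbuf_free(pkts[i]);
            }
        }
    }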
Intel® Data Plane Development Kit (Intel® DPDK)
Intel® DPDK embeds optimizations for the IA platform:
 -   Data Plane Libraries and Optimized NIC Drivers in Linux User Space
 -   Run-time Environment
 -   Environment Abstraction Layer and Boot Code
 -   BSD-licensed & source downloadable from Intel and leading ecopartners

      [Diagram: Intel® DPDK libraries (Buffer Management, Queue/Ring Functions,
       Packet Flow Classification, NIC Poll Mode Library) on an Environment
       Abstraction Layer in user space, above the Linux kernel and platform
       hardware, with customer applications on top]

Intel® DPDK Libraries and Drivers

     • Memory Manager: Responsible for allocating pools of objects in memory. A pool is
       created in huge-page memory and uses a ring to store free objects. It also provides
       an alignment helper to ensure that objects are padded so that they spread equally
       across all DRAM channels.
     • Buffer Manager: Significantly reduces the time the operating system spends
       allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed-size
       buffers which are stored in memory pools.
     • Queue Manager: Implements safe lockless queues, instead of spinlocks, that allow
       different software components to process packets while avoiding unnecessary wait
       times.
     • Flow Classification: Provides an efficient mechanism which uses Intel® Streaming
       SIMD Extensions (Intel® SSE) to produce a hash from tuple information, so that
       packets can be placed into flows quickly for processing, greatly improving
       throughput.
     • Poll Mode Drivers: The Intel® DPDK includes Poll Mode Drivers for 1 GbE and 10 GbE
       Ethernet controllers which are designed to work without asynchronous, interrupt-
       based signaling mechanisms, which greatly speeds up the packet pipeline.
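
To show how the Buffer Manager and Queue Manager pieces fit together, a small sketch: a pre-allocated mbuf pool plus a lockless single-producer/single-consumer ring passing packets between two pipeline stages. Names follow current DPDK (rte_pktmbuf_pool_create, rte_ring_create and friends) and may differ from the release contemporary with these slides:

    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>
    #include <rte_ring.h>

    static struct rte_mempool *pool;   /* fixed-size buffers, pre-allocated in hugepages */
    static struct rte_ring *ring;      /* lockless queue between two pipeline stages */

    /* call once after rte_eal_init() */
    static int setup(void)
    {
        pool = rte_pktmbuf_pool_create("mbufs", 8191, 256, 0,
                                       RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
        ring = rte_ring_create("stage1_to_stage2", 1024, rte_socket_id(),
                               RING_F_SP_ENQ | RING_F_SC_DEQ);
        return (pool != NULL && ring != NULL) ? 0 : -1;
    }

    /* producer stage: grab a buffer from the pool; no malloc on the fast path */
    static int produce(void)
    {
        struct rte_mbuf *m = rte_pktmbuf_alloc(pool);

        if (m == NULL)
            return -1;
        /* ... build the packet in the mbuf's data area ... */
        return rte_ring_enqueue(ring, m);        /* lockless enqueue */
    }

    /* consumer stage: pull a buffer off the ring and return it to the pool */
    static int consume(void)
    {
        void *obj;

        if (rte_ring_dequeue(ring, &obj) != 0)
            return -1;                           /* ring empty */
        /* ... process the packet ... */
        rte_pktmbuf_free((struct rte_mbuf *)obj);
        return 0;
    }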




Intel® DPDK Native and Virtualized Forwarding Performance




Comparison
              Netmap            DPDK            OpenOnload

 License      BSD               BSD             GPL
 API          Packet + pcap     Packet + lib    Sockets
 Kernel       Yes               Yes             Yes
 HW support   Intel, Realtek    Intel           Solarflare
 OS           FreeBSD, Linux    Linux           Linux
Issues
●
    Out-of-tree kernel code
    –   Non-standard drivers
●
    Resource sharing
    –   CPU
    –   NIC
●
    Security
    –   No firewall
    –   DMA isolation
What's needed?
●
    Netmap
    –   Linux version (not port)
    –   Higher level protocols?
●
    DPDK
    –   Wider device support
    –   Ask Intel
●
    OpenOnload
    –   Ask Solarflare
References
●
    OpenOnload
    –   A user-level network stack (Google tech talk)
        ●
            Steve Pope
        ●
            David Riddoch
●
    Netmap - Luigi Rizzo
    –   http://info.iet.unipi.it/~luigi/netmap/talk-atc12.html
●
    DPDK
    –   Intel DPDK Overview
    –   Disruptive IP networking
        ●
            Naoto MATSUMOTO
Thank you
