
Userspace networking


Seven years ago at LCA, Van Jacobson introduced the concept of net channels, but since then user-mode networking has not hit the mainstream. There are several user-mode networking environments: Intel DPDK, BSD netmap, and Solarflare OpenOnload. Each provides higher performance than standard Linux kernel networking, but each also creates new problems. This talk will explore the issues created by userspace networking, including performance, internal architecture, security and licensing.


  1. Networking in Userspace Living on the edge Stephen Hemminger
  2. Problem Statement (Chart: packets per second, bidirectional, vs. packet size in bytes from 64 to 1504; line rate reaches roughly 15–20 million packets per second at the smallest sizes. Source: Intel DPDK Overview.)
  3. Server vs Infrastructure

                       Server         Network Infrastructure
     Packet size       1024 bytes     64 bytes
     Packets/second    1.2 million    14.88 million
     Arrival rate      835 ns         67.2 ns
     2 GHz clock       1670 cycles    135 cycles
     3 GHz clock       2505 cycles    201 cycles

     An L3 hit on Intel® Xeon® is ~40 cycles; an L3 miss (memory read) is ~201 cycles at 3 GHz.
  4. Traditional Linux networking
  5. TCP Offload Engine
  6. Good old sockets: flexible, portable, but slow
  7. Memory-mapped buffers: efficient, but still constrained by the architecture
  8. Run in kernel
  9. The OpenOnload architecture – Network hardware provides a user-safe interface which can route Ethernet packets to an application context based on flow information contained within the headers. No new protocols. (Diagram: applications with per-context protocol stacks DMA directly to the network adaptor, alongside the kernel driver.)
  10. The OpenOnload architecture – Protocol processing can take place both in the application and the kernel context for a given flow. Enables persistent/asynchronous processing; maintains the existing network control plane.
  11. The OpenOnload architecture – Protocol state is shared between the kernel and application contexts through a protected shared-memory communications channel. Enables correct handling of protocol state with high performance.
  12. Performance metrics
     Overhead – networking overheads take CPU time away from your application
     Latency – holds your application up when it has nothing else to do; H/W + flight time + overhead
     Bandwidth – dominates latency when messages are large; limited by algorithms, buffering and overhead
     Scalability – determines how overhead grows as you add cores, memory, threads, sockets etc.
  13. Anatomy of kernel-based networking
  14. A user-level architecture?
  15. Direct & safe hardware access
  16. Some performance results. Test platform: typical commodity server – Intel Clovertown 2.3 GHz quad-core Xeon (x1), 1.3 GHz FSB, 2 GB RAM – Intel 5000X chipset – Solarflare Solarstorm SFC4000 (B) controller, CX4 – back-to-back – RedHat Enterprise 5 (2.6.18-8.el5)
  17. Performance: Latency and overhead
     TCP ping-pong with 4-byte payload (70-byte frame: 14+20+20+12+4)

               ½ round-trip latency (µs)   CPU overhead (µs)
     Hardware  4.2                         --
     Kernel    11.2                        7.0
     Onload    5.3                         1.1
  18. Performance: Streaming bandwidth
  19. Performance: UDP transmit
     Message rate, 4-byte UDP payload (46-byte frame)

                Kernel     Onload
     1 sender   473,000    2,030,000
  20. Performance: UDP transmit (continued)

                Kernel     Onload
     1 sender   473,000    2,030,000
     2 senders  532,000    3,880,000
  21. Performance: UDP receive
  22. OpenOnload Open Source
     OpenOnload available as Open Source (GPLv2) – please contact us if you're interested
     Compatible with x86 (ia32, amd64/em64t)
     Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT NICs – could support other user-accessible network interfaces
     Very interested in user feedback – on the technology and project directions
  23. Netmap
     ● BSD (and Linux port)
     ● Good scalability
     ● Libpcap emulation
  24. Netmap
  25. Netmap API
     ● Access
       – open("/dev/netmap")
       – ioctl(fd, NIOCREG, arg)
       – mmap(..., fd, 0) maps buffers and rings
     ● Transmit
       – fill up to avail buffers, starting from slot cur
       – ioctl(fd, NIOCTXSYNC) queues the packets
     ● Receive
       – ioctl(fd, NIOCRXSYNC) reports newly received packets
       – process up to avail buffers, starting from slot cur
     These ioctl()s are non-blocking.
  26. Netmap API: synchronization
     ● poll() and select(), what else!
       – POLLIN and POLLOUT decide which sets of rings to work on
       – work as expected, returning when avail > 0
       – interrupt mitigation delays are propagated up to the userspace process
  27. Netmap: multiqueue
     ● Of course:
       – one netmap ring per physical ring
       – by default, the fd is bound to all rings
       – ioctl(fd, NIOCREG, arg) can restrict the binding to a single ring pair
       – multiple fds can be bound to different rings on the same card
       – the fds can be managed by different threads
       – threads are mapped to cores with pthread_setaffinity()
  28. Netmap and the host stack
     ● While in netmap mode, the control path remains unchanged:
       – ifconfig, ioctls, etc. still work as usual
       – the OS still believes the interface is there
     ● The data path is detached from the host stack:
       – packets from the NIC end up in RX netmap rings
       – packets from TX netmap rings are sent to the NIC
     ● The host stack is attached to an extra pair of netmap rings:
       – packets from the host go to a SW RX netmap ring
       – packets from a SW TX netmap ring are sent to the host
       – these rings are managed using the netmap API
  29. Netmap: Tx performance
  30. Netmap: Rx Performance
  31. Netmap summary

     Packet forwarding         Mpps
     FreeBSD bridging          0.690
     Netmap + libpcap          7.500
     Netmap                    14.88

     Open vSwitch              Mpps
     userspace                 0.065
     Linux                     0.600
     FreeBSD                   0.790
     FreeBSD + netmap/pcap     3.050
  32. Intel DPDK Architecture
  33. The Intel® DPDK Philosophy
     Intel® DPDK fundamentals:
     • Implements a run-to-completion model or pipeline model
     • No scheduler – all devices accessed by polling
     • Supports 32-bit and 64-bit, with/without NUMA
     • Scales from Intel® Atom™ to Intel® Xeon® processors
     • Number of cores and processors not limited
     • Optimal packet allocation across DRAM channels
     Control plane:
     • Must run on any IA CPU – from the Intel® Atom™ processor to the latest Intel® Xeon® processor family; essential to the IA value proposition
     • Focus on the fast path – sending large numbers of packets to the Linux kernel/GPOS will bog the system down
     Data plane:
     • Provide software examples that address common network performance deficits – best practices for software architecture, tips for data structure design and storage, help the compiler generate optimum code, address the challenges of achieving 80 Mpps per CPU socket
  34. Intel® Data Plane Development Kit (Intel® DPDK)
     Intel® DPDK embeds optimizations for the IA platform:
     – Data-plane libraries and optimized NIC drivers in Linux user space: buffer management, queue/ring functions, packet flow classification, NIC poll-mode library
     – Run-time environment
     – Environment abstraction layer and boot code
     – BSD-licensed and source downloadable from Intel and leading eco-partners
     (Diagram: customer applications sit on the Intel® DPDK libraries and environment abstraction layer in user space, above the Linux kernel and platform hardware.)
  35. Intel® DPDK Libraries and Drivers
     • Memory Manager: responsible for allocating pools of objects in memory. A pool is created in huge-page memory space and uses a ring to store free objects. It also provides an alignment helper to ensure that objects are padded so as to spread them equally across all DRAM channels.
     • Buffer Manager: significantly reduces the time the operating system spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed-size buffers which are stored in memory pools.
     • Queue Manager: implements safe lockless queues, instead of using spinlocks, allowing different software components to process packets while avoiding unnecessary wait times.
     • Flow Classification: provides an efficient mechanism which incorporates Intel® Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple information so that packets may be placed into flows quickly for processing, greatly improving throughput.
     • Poll Mode Drivers: the Intel® DPDK includes poll-mode drivers for 1 GbE and 10 GbE Ethernet controllers which are designed to work without asynchronous, interrupt-based signaling mechanisms, which greatly speeds up the packet pipeline.
  36. Intel® DPDK Native and Virtualized Forwarding Performance
  37. Comparison

                 Netmap           DPDK           OpenOnload
     License     BSD              BSD            GPL
     API         Packet + pcap    Packet + lib   Sockets
     Kernel      Yes              Yes            Yes
     HW support  Intel, Realtek   Intel          Solarflare
     OS          FreeBSD, Linux   Linux          Linux
  38. Issues
     ● Out-of-tree kernel code
       – Non-standard drivers
     ● Resource sharing
       – CPU
       – NIC
     ● Security
       – No firewall
       – DMA isolation
  39. What's needed?
     ● Netmap
       – Linux version (not a port)
       – Higher-level protocols?
     ● DPDK
       – Wider device support
       – Ask Intel
     ● OpenOnload
       – Ask Solarflare
  40. ● OpenOnload – A user-level network stack (Google tech talk)
       – Steve Pope
       – David Riddoch
     ● Netmap – Luigi Rizzo
     ● DPDK
       – Intel DPDK Overview
       – Disruptive network IP networking – Naoto MASMOTO
  41. Thank you