Networking in Userspace
   Living on the edge




  Stephen Hemminger
  stephen@networkplumber.org
Problem Statement
[Figure: packets per second (bidirectional), 0 to 20,000,000, plotted against packet size in bytes, 64 to 1504. Source: Intel: DPDK Overview]
Server vs Infrastructure
                   Server Packets   Network Infrastructure
Packet size        1024 bytes       64 bytes
Packets/second     1.2 million      14.88 million
Arrival rate       835 ns           67.2 ns
Cycles at 2 GHz    1670             135
Cycles at 3 GHz    2505             201

L3 hit on Intel® Xeon®: ~40 cycles
L3 miss, memory read: 201 cycles at 3 GHz
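
The per-packet budgets in this table can be reproduced with a few lines of arithmetic. The sketch below assumes a 10 Gbit/s link and 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap). Neither figure is stated on the slide, but together they reproduce its numbers.

```python
LINK_BPS = 10e9       # assumption: 10 Gbit/s Ethernet
WIRE_OVERHEAD = 20    # assumption: 7 preamble + 1 SFD + 12 inter-frame gap bytes


def arrival_ns(frame_bytes):
    """Time between back-to-back frames of the given size, in nanoseconds."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD) * 8
    return bits_on_wire / LINK_BPS * 1e9


def cycle_budget(frame_bytes, clock_ghz):
    """CPU cycles available per packet when frames arrive at line rate."""
    return arrival_ns(frame_bytes) * clock_ghz


for size in (1024, 64):
    ns = arrival_ns(size)
    mpps = 1e3 / ns  # packets per microsecond equals millions of packets per second
    print(f"{size:5d} bytes: {ns:6.1f} ns, {mpps:5.2f} Mpps, "
          f"{cycle_budget(size, 2):6.1f} cycles @ 2 GHz, "
          f"{cycle_budget(size, 3):6.1f} cycles @ 3 GHz")
```

For 64-byte frames at 2 GHz this arithmetic gives about 134.4 cycles, which the slide shows as 135. Either way, a single L3 miss at the quoted cost would use up the whole 64-byte budget.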
Traditional Linux networking
TCP Offload Engine
Good old sockets
   Flexible, portable but slow
Memory mapped buffers
   Efficient, but still constrained by architecture
Run in kernel
The OpenOnload architecture

      Network hardware provides a user-safe interface which
      can route Ethernet packets to an application context
      based on flow information contained within headers

      [Diagram: a kernel context and two application contexts; application, protocol, and driver layers sit above DMA paths into a shared network adaptor. Callout: No new protocols]
The OpenOnload architecture

      Protocol processing can take place both in the
      application and kernel context for a given flow

      [Diagram: a kernel context and two application contexts; application, protocol, and driver layers sit above DMA paths into a shared network adaptor. Callouts: Enables persistent / asynchronous processing; Maintains existing network control-plane]
The OpenOnload architecture

      Protocol state is shared between the kernel and
      application contexts through a protected shared
      memory communications channel

      [Diagram: a kernel context and two application contexts; application, protocol, and driver layers sit above DMA paths into a shared network adaptor. Callout: Enables correct handling of protocol state with high performance]
Performance metrics

      Overhead
           – Networking overheads take CPU time away from your application

      Latency
           – Holds your application up when it has nothing else to do
           – H/W + flight time + overhead

      Bandwidth
           – Dominates latency when messages are large
           – Limited by: algorithms, buffering and overhead

      Scalability
           – Determines how overhead grows as you add cores, memory, threads, sockets
             etc.
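
The latency and bandwidth bullets can be read as a two-term cost model: the time to deliver a message is a fixed latency plus the serialization time (size divided by bandwidth). The sketch below uses made-up values, 5 microseconds and 10 Gbit/s, purely for illustration. They are not measurements from this talk.

```python
def serialization_us(size_bytes, bandwidth_gbps=10.0):
    """Time to push the bits onto the link, in microseconds."""
    # 1 Gbit/s is 1000 bits per microsecond
    return size_bytes * 8 / (bandwidth_gbps * 1e3)


def transfer_time_us(size_bytes, latency_us=5.0, bandwidth_gbps=10.0):
    """Illustrative model: fixed latency plus serialization time."""
    return latency_us + serialization_us(size_bytes, bandwidth_gbps)


for size in (64, 1500, 1_000_000):
    total = transfer_time_us(size)
    share = serialization_us(size) / total
    print(f"{size:>9} bytes: {total:8.2f} us total, "
          f"{share:6.1%} spent on bandwidth")
```

With these assumed values, small messages are almost entirely latency-bound, while a megabyte-sized message spends nearly all of its time on serialization.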


Anatomy of kernel-based networking
A user-level architecture?
Direct & safe hardware access
Some performance results


      Test platform: typical commodity server
           – Intel Clovertown 2.3 GHz quad-core Xeon (x1),
             1.3 GHz FSB, 2 GB RAM
           – Intel 5000X chipset
           – Solarflare Solarstorm SFC4000 (B) controller, CX4
           – Back-to-back
           – Red Hat Enterprise Linux 5 (2.6.18-8.el5)
Performance: Latency and overhead

      TCP ping-pong with 4 byte payload
      70 byte frame: 14+20+20+12+4

                       ½ round-trip latency    CPU overhead
                         (microseconds)       (microseconds)
 Hardware                      4.2                  --

 Kernel                       11.2                 7.0

 Onload                        5.3                 1.1
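
The CPU overhead column matches the measured half round-trip minus the hardware-only figure. The sketch below checks that arithmetic. It is an observation about the numbers, not a statement of how the authors measured overhead, and the header labels in the frame-size sum are assumptions (the slide gives only the numbers).

```python
HARDWARE_US = 4.2                       # hardware-only half round-trip, from the table
half_rtt_us = {"Kernel": 11.2, "Onload": 5.3}


def software_overhead_us(latency_us, hardware_us=HARDWARE_US):
    """Latency left over after subtracting the hardware-only path."""
    return round(latency_us - hardware_us, 1)


overhead = {name: software_overhead_us(v) for name, v in half_rtt_us.items()}

# Assumed breakdown of the 70-byte frame: Ethernet, IP, TCP, TCP options, payload.
frame_bytes = 14 + 20 + 20 + 12 + 4

print(overhead, frame_bytes)
```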


Performance: Streaming bandwidth




Performance: UDP transmit

      Message rate:
           – 4 byte UDP payload (46 byte frame)

                               Kernel      Onload
 1 sender                      473,000     2,030,000
 2 senders                     532,000     3,880,000
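
The UDP transmit figures are easier to compare as ratios. This sketch derives the Onload-to-kernel speedup and the gain each stack gets from a second sender, using only the message rates from the table. The helper names are illustrative.

```python
rates = {  # messages per second, from the table
    1: {"Kernel": 473_000, "Onload": 2_030_000},
    2: {"Kernel": 532_000, "Onload": 3_880_000},
}


def speedup(senders):
    """How many times faster Onload is than the kernel stack."""
    return rates[senders]["Onload"] / rates[senders]["Kernel"]


def scaling(stack):
    """Throughput gain from adding a second sender."""
    return rates[2][stack] / rates[1][stack]


for n in (1, 2):
    print(f"{n} sender(s): Onload is {speedup(n):.1f}x the kernel rate")
for stack in ("Kernel", "Onload"):
    print(f"{stack}: second sender gives {scaling(stack):.2f}x")
```

On these numbers the kernel stack gains little from a second sender, while Onload nearly doubles.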
Performance: UDP receive




OpenOnload Open Source

      OpenOnload available as Open Source (GPLv2)
            – Please contact us if you’re interested

      Compatible with x86 (ia32, amd64/em64t)

      Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT
      NICs
            – Could support other user-accessible network interfaces

      Very interested in user feedback
            – On the technology and project directions


Netmap
        http://info.iet.unipi.it/~luigi/netmap/
●   BSD (and Linux port)
●   Good scalability
●   Libpcap emulation
Netmap
Netmap API
●   Access
    –   open("/dev/netmap")
    –   ioctl(fd, NIOCREGIF, arg)
    –   mmap(..., fd, 0) maps buffers and rings
●   Transmit
    –   fill up to avail buffers, starting from slot cur.
    –   ioctl(fd, NIOCTXSYNC) queues the packets
●   Receive
    –   ioctl(fd, NIOCRXSYNC) reports newly received packets
    –   process up to avail buffers, starting from slot cur.

These ioctl()s are non-blocking.
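
To make the cur/avail convention concrete, here is a toy model of a transmit ring in plain Python. It is not the netmap API: `ToyTxRing`, `fill`, and `txsync` are hypothetical names, and the real rings live in memory shared with the kernel and are synchronized with the ioctl()s listed above. In this toy the hand-over completes instantly, whereas real hardware reclaims slots some time later.

```python
class ToyTxRing:
    """Toy TX ring: `cur` is the next slot to fill, `avail` counts free slots."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.cur = 0              # next slot the application may fill
        self.avail = num_slots    # slots the application may still fill
        self.pending = 0          # filled slots not yet handed over
        self.sent = []            # stands in for the wire

    def fill(self, packets):
        """Fill up to `avail` slots starting at `cur`; return how many were queued."""
        queued = 0
        for pkt in packets:
            if self.avail == 0:
                break
            self.slots[self.cur] = pkt
            self.cur = (self.cur + 1) % len(self.slots)
            self.avail -= 1
            self.pending += 1
            queued += 1
        return queued

    def txsync(self):
        """Stand-in for the TX sync ioctl: send pending slots, then free them."""
        start = (self.cur - self.pending) % len(self.slots)
        for i in range(self.pending):
            idx = (start + i) % len(self.slots)
            self.sent.append(self.slots[idx])
            self.slots[idx] = None
        self.avail += self.pending
        self.pending = 0


ring = ToyTxRing(4)
queued = ring.fill([b"p0", b"p1", b"p2", b"p3", b"p4", b"p5"])  # only 4 slots free
ring.txsync()
queued_again = ring.fill([b"p4", b"p5"])  # retry what did not fit
ring.txsync()
```

The caller never blocks: when the ring is full, `fill` simply queues fewer packets and the application retries after the next sync.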
Netmap API: synchronization
●   poll() and select(), what else!
    –   POLLIN and POLLOUT decide which sets of rings to
        work on
    –   work as expected, returning when avail>0
    –   interrupt mitigation delays are propagated up to
        the userspace process
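
The readiness pattern is the ordinary poll() one. The sketch below shows it with Python's `select.poll`, using a socket pair as a stand-in for the netmap file descriptor, which is an assumption made so the example runs without netmap. `wait_readable` is an illustrative helper name.

```python
import select
import socket


def wait_readable(fd, timeout_ms):
    """Return True if fd becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(ev & select.POLLIN for _, ev in events)


rx, tx = socket.socketpair()          # stand-in for a netmap fd
before = wait_readable(rx.fileno(), 0)     # nothing queued yet
tx.send(b"x")                              # "a packet arrives"
after = wait_readable(rx.fileno(), 1000)   # now poll() reports POLLIN
rx.close()
tx.close()
```

Registering for `POLLOUT` instead would give the transmit-side equivalent: wake up when there is room to send.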
Netmap: multiqueue
●   Of course.
    –   one netmap ring per physical ring
    –   by default, the fd is bound to all rings
    –   ioctl(fd, NIOCREGIF, arg) can restrict the binding
        to a single ring pair
    –   multiple fd's can be bound to different rings on the same
        card
    –   the fd's can be managed by different threads
    –   threads mapped to cores with pthread_setaffinity_np()
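
The one-thread-per-ring layout with each thread pinned to a core can be sketched in Python. This uses `os.sched_setaffinity`, a Linux-only call that stands in here for the pthread affinity call named on the slide. `ring_worker` and the limit of four rings are hypothetical, and the worker only records its CPU mask where a real program would service its ring.

```python
import os
import threading


def ring_worker(ring_id, cpu, results):
    """Pin the calling thread to one core, then record the resulting CPU mask."""
    os.sched_setaffinity(0, {cpu})             # pid 0 means the calling thread
    results[ring_id] = os.sched_getaffinity(0)


allowed = sorted(os.sched_getaffinity(0))      # cores this process may use
results = {}
threads = [
    threading.Thread(target=ring_worker, args=(ring_id, cpu, results))
    for ring_id, cpu in enumerate(allowed[:4])  # at most 4 hypothetical rings
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```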
Netmap and the host stack
●   While in netmap mode, the control path remains unchanged:
    –   ifconfig, ioctl's, etc still work as usual
    –   the OS still believes the interface is there
●   The data path is detached from the host stack:
    –   packets from NIC end up in RX netmap rings
    –   packets from TX netmap rings are sent to the NIC
●   The host stack is attached to an extra pair of netmap rings:
    –   packets from the host go to a SW RX netmap ring
    –   packets from a SW TX netmap ring are sent to the host
    –   these rings are managed using the netmap API
Netmap: Tx performance
Netmap: Rx Performance
Netmap Summary
Packet Forwarding     Mpps
FreeBSD bridging      0.690
Netmap + libpcap      7.500
Netmap                14.88

Open vSwitch          Mpps
userspace             0.065
Linux                 0.600
FreeBSD               0.790
FreeBSD+netmap/pcap   3.050
Intel DPDK Architecture
The Intel® DPDK Philosophy


Intel® DPDK Fundamentals
•   Implements a run to completion model or pipeline model
•   No scheduler - all devices accessed by polling
•   Supports 32-bit and 64-bit with/without NUMA
•   Scales from Intel® Atom™ to Intel® Xeon® processors
•   Number of Cores and Processors not limited
•   Optimal packet allocation across DRAM channels

[Diagram: Control Plane and Data Plane]

 • Must run on any IA CPU
     ‒ From Intel® Atom™ processor to the latest Intel® Xeon® processor family
     ‒ Essential to the IA value proposition
 • Focus on the fast-path
     ‒ Sending large number of packets to the Linux Kernel /GPOS will bog the system down
 • Provide software examples that address common network performance deficits
     ‒ Best practices for software architecture
     ‒ Tips for data structure design and storage
     ‒ Help the compiler generate optimum code
     ‒ Address the challenges of achieving 80 Mpps per CPU Socket

20     Intel Restricted Secret
                                     TRANSFORMING COMMUNICATIONS
Intel® Data Plane Development Kit (Intel® DPDK)
Intel® DPDK embeds optimizations for the IA platform:
- Data Plane Libraries and Optimized NIC Drivers in Linux User Space
- Run-time Environment
- Environment Abstraction Layer and Boot Code
- BSD-licensed & source downloadable from Intel and leading ecopartners

[Diagram: in user space, customer applications sit on the Intel® DPDK libraries (Buffer Management, Queue/Ring Functions, Packet Flow Classification, NIC Poll Mode Library) and the Environment Abstraction Layer; in kernel space, the Environment Abstraction Layer and the Linux kernel; below both, the platform hardware]
Intel® DPDK Libraries and Drivers

     • Memory Manager: Responsible for allocating pools of objects in memory. A pool is
       created in huge page memory space and uses a ring to store free objects. It also
       provides an alignment helper to ensure that objects are padded to spread them
       equally on all DRAM channels.
     • Buffer Manager: Reduces by a significant amount the time the operating system
       spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed size
       buffers which are stored in memory pools.
     • Queue Manager: Implements safe lockless queues, instead of using spinlocks, that
       allow different software components to process packets, while avoiding unnecessary
       wait times.
     • Flow Classification: Provides an efficient mechanism which incorporates Intel®
       Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple
       information so that packets may be placed into flows quickly for processing, thus
       greatly improving throughput.
     • Poll Mode Drivers: The Intel® DPDK includes Poll Mode Drivers for 1 GbE and 10 GbE
       Ethernet* controllers which are designed to work without asynchronous, interrupt-
       based signaling mechanisms, which greatly speeds up the packet pipeline.
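
The memory and buffer manager bullets describe one pattern: allocate fixed-size buffers once, keep the free ones in a ring, and recycle them instead of calling the allocator per packet. Below is a toy version in Python. `ToyBufferPool` and its methods are hypothetical names rather than the DPDK API, and a `deque` stands in for the lockless ring.

```python
from collections import deque


class ToyBufferPool:
    """Fixed-size buffers allocated up front; a ring holds the free ones."""

    def __init__(self, count, size):
        self.size = size
        self._free = deque(bytearray(size) for _ in range(count))

    def get(self):
        """Take a free buffer, or None if the pool is exhausted (no fallback)."""
        return self._free.popleft() if self._free else None

    def put(self, buf):
        """Return a buffer to the free ring so it can be reused."""
        self._free.append(buf)

    def free_count(self):
        return len(self._free)


pool = ToyBufferPool(count=2, size=2048)
a = pool.get()
b = pool.get()
exhausted = pool.get()   # pool is empty, so no buffer is handed out
pool.put(a)
reused = pool.get()      # the same bytearray object comes back
```

Because nothing is allocated after start-up, the per-packet cost is one dequeue and one enqueue, and running out of buffers is an explicit condition the caller has to handle.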




Intel® DPDK Native and Virtualized Forwarding Performance
Comparison
             Netmap           DPDK           OpenOnload
License      BSD              BSD            GPL
API          Packet + pcap    Packet + lib   Sockets
Kernel       Yes              Yes            Yes
HW support   Intel, Realtek   Intel          Solarflare
OS           FreeBSD, Linux   Linux          Linux
Issues
●   Out of tree kernel code
    –   Non standard drivers
●   Resource sharing
    –   CPU
    –   NIC
●   Security
    –   No firewall
    –   DMA isolation
What's needed?
●   Netmap
    –   Linux version (not port)
    –   Higher level protocols?
●   DPDK
    –   Wider device support
    –   Ask Intel
●   OpenOnload
    –   Ask Solarflare

●   OpenOnload
    –   A user-level network stack (Google tech talk)
        ●   Steve Pope
        ●   David Riddoch
●   Netmap - Luigi Rizzo
    –   http://info.iet.unipi.it/~luigi/netmap/talk-atc12.html
●   DPDK
    –   Intel DPDK Overview
    –   Disruptive network IP networking
        ●   Naoto MATSUMOTO
Thank you

slides
 
Mina2
Mina2Mina2
Mina2
 
OpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosOpenStack and OpenFlow Demos
OpenStack and OpenFlow Demos
 
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and moreAdvanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
 
Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 

Userspace networking

  • 1. Networking in Userspace: Living on the edge. Stephen Hemminger, stephen@networkplumber.org
  • 2. Problem Statement [Figure: packets per second (bidirectional), 0 to 20,000,000, versus packet size (bytes), 64 to 1504. Source: Intel, DPDK Overview]
  • 3. Server vs Infrastructure
                        Server Packets    Network Infrastructure
      Packet size       1024 bytes        64 bytes
      Packets/second    1.2 Million       14.88 Million
      Arrival rate      835 ns            67.2 ns
      2 GHz clock       1670 cycles       135 cycles
      3 GHz clock       2505 cycles       201 cycles

      L3 hit on Intel® Xeon®: ~40 cycles
      L3 miss, memory read is (201 cycles at 3 GHz)
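
The figures on slide 3 can be reproduced with a few lines of arithmetic. The sketch below is not from the talk; it assumes a 10 Gbit/s link and 20 bytes of per-frame wire overhead (preamble, start delimiter and inter-frame gap), neither of which is stated on the slide. The function names are made up for illustration.

```python
# Per-packet time budget at line rate.
# Assumptions (not stated on the slide): a 10 Gbit/s link, and 20 extra bytes
# on the wire per frame (7-byte preamble, 1-byte start delimiter,
# 12-byte inter-frame gap).

LINK_BPS = 10_000_000_000
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(frame_bytes, link_bps=LINK_BPS):
    """Maximum frames per second the link can carry for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def arrival_interval_ns(frame_bytes, link_bps=LINK_BPS):
    """Time between back-to-back frames, in nanoseconds."""
    return 1e9 / packets_per_second(frame_bytes, link_bps)


def cycles_per_packet(frame_bytes, clock_hz, link_bps=LINK_BPS):
    """CPU cycles available per frame on one core running at clock_hz."""
    return clock_hz / packets_per_second(frame_bytes, link_bps)


if __name__ == "__main__":
    for size in (1024, 64):
        print(
            f"{size:5d} bytes: "
            f"{packets_per_second(size) / 1e6:6.2f} Mpps, "
            f"{arrival_interval_ns(size):6.1f} ns, "
            f"{cycles_per_packet(size, 2e9):7.1f} cycles at 2 GHz, "
            f"{cycles_per_packet(size, 3e9):7.1f} cycles at 3 GHz"
        )
```

Under these assumptions a 64-byte frame leaves roughly 200 cycles at 3 GHz, which is why a single L3 miss can consume most of the budget.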
  • 7. Good old sockets: flexible, portable, but slow
  • 8. Memory mapped buffers: efficient, but still constrained by architecture
  • 10. The OpenOnload architecture
      Network hardware provides a user-safe interface which can route Ethernet packets to an application context based on flow information contained within headers.
      – No new protocols
      [Diagram labels: kernel context, application contexts, application, protocol, driver, network driver, DMA, network adaptor]
  • 11. The OpenOnload architecture
      Protocol processing can take place both in the application and kernel context for a given flow.
      – Enables persistent / asynchronous processing
      – Maintains existing network control-plane
      [Diagram labels: kernel context, application contexts, application, protocol, driver, network driver, DMA, network adaptor]
  • 12. The OpenOnload architecture
      Protocol state is shared between the kernel and application contexts through a protected shared memory communications channel.
      – Enables correct handling of protocol state with high performance
      [Diagram labels: kernel context, application contexts, application, protocol, driver, network driver, DMA, network adaptor]
  • 13. Performance metrics
      – Overhead: networking overheads take CPU time away from your application
      – Latency: holds your application up when it has nothing else to do; H/W + flight time + overhead
      – Bandwidth: dominates latency when messages are large; limited by algorithms, buffering and overhead
      – Scalability: determines how overhead grows as you add cores, memory, threads, sockets etc.
  • 14. Anatomy of kernel-based networking
  • 16. Direct & safe hardware access
  • 17. Some performance results
      Test platform: typical commodity server
      – Intel Clovertown 2.3 GHz quad-core Xeon (x1), 1.3 GHz FSB, 2 GB RAM
      – Intel 5000X chipset
      – Solarflare Solarstorm SFC4000 (B) controller, CX4
      – Back-to-back
      – Red Hat Enterprise Linux 5 (2.6.18-8.el5)
  • 18. Performance: Latency and overhead
      TCP ping-pong with 4 byte payload; 70 byte frame: 14+20+20+12+4

                  ½ round-trip latency (microseconds)   CPU overhead (microseconds)
      Hardware    4.2                                   --
      Kernel      11.2                                  7.0
      Onload      5.3                                   1.1
  • 20. Performance: UDP transmit
      Message rate, 4 byte UDP payload (46 byte frame)

                  Kernel     Onload
      1 sender    473,000    2,030,000
  • 21. Performance: UDP transmit
      Message rate, 4 byte UDP payload (46 byte frame)

                  Kernel     Onload
      1 sender    473,000    2,030,000
      2 senders   532,000    3,880,000
  • 23. OpenOnload Open Source
      – OpenOnload available as Open Source (GPLv2); please contact us if you're interested
      – Compatible with x86 (ia32, amd64/em64t)
      – Currently supports SMC10GPCIe-XFP and SMC10GPCIe-10BT NICs; could support other user-accessible network interfaces
      – Very interested in user feedback on the technology and project directions
  • 24. Netmap
      http://info.iet.unipi.it/~luigi/netmap/
      ● BSD (and Linux port)
      ● Good scalability
      ● Libpcap emulation
  • 26. Netmap API
      ● Access
        – open("/dev/netmap")
        – ioctl(fd, NIOCREG, arg)
        – mmap(..., fd, 0) maps buffers and rings
      ● Transmit
        – fill up to avail buffers, starting from slot cur
        – ioctl(fd, NIOCTXSYNC) queues the packets
      ● Receive
        – ioctl(fd, NIOCRXSYNC) reports newly received packets
        – process up to avail buffers, starting from slot cur
      These ioctl()s are non-blocking.
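
The receive-side bookkeeping described on slide 26 (a sync call reports new packets, then the application processes up to avail slots starting at cur) can be illustrated with a toy model. This is plain Python and does not talk to netmap; `ToyRing`, `nic_receive`, `rxsync` and `drain` are hypothetical names invented for the sketch, and `rxsync` merely stands in for what `ioctl(fd, NIOCRXSYNC)` reports.

```python
class ToyRing:
    """Toy model of one receive ring (not the real netmap data structure).

    cur   -- index of the first slot ready for the application
    avail -- number of ready slots, starting at cur
    """

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.cur = 0
        self.avail = 0
        self._pending = []  # stands in for packets the NIC has received

    def nic_receive(self, packet):
        """Hypothetical helper: the 'hardware' receives a packet."""
        self._pending.append(packet)

    def rxsync(self):
        """Stand-in for ioctl(fd, NIOCRXSYNC): publish newly received packets.

        Only as many packets as there are free slots are published; the rest
        stay pending until the application has consumed some slots.
        """
        free = len(self.slots) - self.avail
        take = self._pending[:free]
        del self._pending[:free]
        for packet in take:
            index = (self.cur + self.avail) % len(self.slots)
            self.slots[index] = packet
            self.avail += 1
        return self.avail


def drain(ring):
    """Process up to avail buffers, starting from slot cur."""
    processed = []
    while ring.avail > 0:
        processed.append(ring.slots[ring.cur])
        ring.cur = (ring.cur + 1) % len(ring.slots)
        ring.avail -= 1
    return processed
```

A transmit ring would follow the same pattern in reverse: the application fills up to avail slots from cur, and the sync call hands them to the hardware.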
  • 27. Netmap API: synchronization
      ● poll() and select(), what else!
        – POLLIN and POLLOUT decide which sets of rings to work on
        – work as expected, returning when avail>0
        – interrupt mitigation delays are propagated up to the userspace process
  • 28. Netmap: multiqueue
      ● Of course.
        – one netmap ring per physical ring
        – by default, the fd is bound to all rings
        – ioctl(fd, NIOCREG, arg) can restrict the binding to a single ring pair
        – multiple fd's can be bound to different rings on the same card
        – the fd's can be managed by different threads
        – threads mapped to cores with pthread_setaffinity_np()
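
The last point on slide 28, one thread per ring with each thread mapped to a core, can be sketched without netmap at all. The example below is an illustration only: it uses Python's `os.sched_setaffinity`, which is available on Linux and, when called with pid 0, applies to the calling thread. `run_pinned` is a hypothetical helper name, and the `work` callable stands in for a per-ring packet loop.

```python
import os
import threading


def run_pinned(cpu, work):
    """Run work() in a new thread restricted to one CPU (Linux only).

    Returns (affinity_seen_by_thread, result_of_work).
    """
    out = {}

    def body():
        # pid 0 means "the calling thread" for the Linux affinity calls,
        # so only this worker is pinned, not the whole process.
        os.sched_setaffinity(0, {cpu})
        out["affinity"] = os.sched_getaffinity(0)
        out["result"] = work()

    worker = threading.Thread(target=body)
    worker.start()
    worker.join()
    return out["affinity"], out["result"]


if __name__ == "__main__":
    # One worker per allowed CPU; a real program would hand each worker
    # its own file descriptor bound to a single ring pair.
    for cpu in sorted(os.sched_getaffinity(0)):
        affinity, _ = run_pinned(cpu, lambda: None)
        print(f"worker pinned to {affinity}")
```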
  • 29. Netmap and the host stack
      ● While in netmap mode, the control path remains unchanged:
        – ifconfig, ioctl's, etc. still work as usual
        – the OS still believes the interface is there
      ● The data path is detached from the host stack:
        – packets from the NIC end up in RX netmap rings
        – packets from TX netmap rings are sent to the NIC
      ● The host stack is attached to extra netmap rings:
        – packets from the host go to a SW RX netmap ring
        – packets from a SW TX netmap ring are sent to the host
        – these rings are managed using the netmap API
  • 32. Netmap Summary
      Packet forwarding       Mpps
      FreeBSD bridging        0.690
      Netmap + libpcap        7.500
      Netmap                  14.88

      Open vSwitch            Mpps
      userspace               0.065
      linux                   0.600
      FreeBSD                 0.790
      FreeBSD+netmap/pcap     3.050
  • 34. The Intel® DPDK Philosophy
      Intel® DPDK Fundamentals
      • Implements a run to completion model or pipeline model
      • No scheduler - all devices accessed by polling
      • Supports 32-bit and 64-bit with/without NUMA
      • Scales from Intel® Atom™ to Intel® Xeon® processors
      • Number of Cores and Processors not limited
      • Optimal packet allocation across DRAM channels
      [Diagram labels: Control Plane, Data Plane]
      • Must run on any IA CPU
        ‒ From Intel® Atom™ processor to the latest Intel® Xeon® processor family
        ‒ Essential to the IA value proposition
      • Focus on the fast-path
        ‒ Sending large number of packets to the Linux Kernel/GPOS will bog the system down
      • Provide software examples that address common network performance deficits
        ‒ Best practices for software architecture
        ‒ Tips for data structure design and storage
        ‒ Help the compiler generate optimum code
        ‒ Address the challenges of achieving 80 Mpps per CPU Socket
      Intel Restricted Secret. TRANSFORMING COMMUNICATIONS
  • 35. Intel® Data Plane Development Kit (Intel® DPDK)
      Intel® DPDK embeds optimizations for the IA platform:
      - Data Plane Libraries and Optimized NIC Drivers in Linux User Space
      - Run-time Environment
      - Environment Abstraction Layer and Boot Code
      - BSD-licensed & source downloadable from Intel and leading ecopartners
      [Diagram: customer applications above the Intel® DPDK Libraries (Buffer Management, Queue/Ring Functions, Packet Flow Classification, NIC Poll Mode Library) and the Environment Abstraction Layer in user space; Environment Abstraction Layer and Linux Kernel in kernel space; Platform Hardware below]
  • 36. Intel® DPDK Libraries and Drivers
      • Memory Manager: Responsible for allocating pools of objects in memory. A pool is created in huge page memory space and uses a ring to store free objects. It also provides an alignment helper to ensure that objects are padded to spread them equally on all DRAM channels.
      • Buffer Manager: Significantly reduces the time the operating system spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed size buffers which are stored in memory pools.
      • Queue Manager: Implements safe lockless queues, instead of using spinlocks, that allow different software components to process packets, while avoiding unnecessary wait times.
      • Flow Classification: Provides an efficient mechanism which incorporates Intel® Streaming SIMD Extensions (Intel® SSE) to produce a hash based on tuple information so that packets may be placed into flows quickly for processing, thus greatly improving throughput.
      • Poll Mode Drivers: The Intel® DPDK includes Poll Mode Drivers for 1 GbE and 10 GbE Ethernet controllers which are designed to work without asynchronous, interrupt-based signaling mechanisms, which greatly speeds up the packet pipeline.
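
The buffer-manager idea on slide 36, pre-allocating fixed size buffers once and recycling them through a queue of free entries, can be shown in a few lines. This is a conceptual sketch only, not DPDK code: `BufferPool` and its methods are hypothetical names, and a Python `deque` stands in for the lockless ring and huge page memory that the real library uses.

```python
from collections import deque


class BufferPool:
    """Fixed-size buffers allocated once up front and recycled afterwards."""

    def __init__(self, count, size):
        # All allocation happens here, outside the packet fast path.
        self._buffers = [bytearray(size) for _ in range(count)]
        # Indices of free buffers; stands in for the ring of free objects.
        self._free = deque(range(count))

    def get(self):
        """Return the index of a free buffer, or None if the pool is empty."""
        return self._free.popleft() if self._free else None

    def put(self, index):
        """Return a buffer to the pool so it can be reused."""
        self._free.append(index)

    def buffer(self, index):
        """Access the storage behind a buffer index."""
        return self._buffers[index]

    def free_count(self):
        return len(self._free)
```

Because `get` and `put` only move indices, the per-packet cost does not depend on the buffer size, which is the point of pre-allocation.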
  • 37. Intel® DPDK Native and Virtualized Forwarding Performance
  • 38. Comparison
                    Netmap            DPDK            OpenOnload
      License       BSD               BSD             GPL
      API           Packet + pcap     Packet + lib    Sockets
      Kernel        Yes               Yes             Yes
      HW support    Intel, realtek    Intel           Solarflare
      OS            FreeBSD, Linux    Linux           Linux
  • 39. Issues
      ● Out of tree kernel code
        – Non standard drivers
      ● Resource sharing
        – CPU
        – NIC
      ● Security
        – No firewall
        – DMA isolation
  • 40. What's needed?
      ● Netmap
        – Linux version (not port)
        – Higher level protocols?
      ● DPDK
        – Wider device support
        – Ask Intel
      ● OpenOnload
        – Ask Solarflare
  • 41. OpenOnload
        – A user-level network stack (Google tech talk)
          ● Steve Pope
          ● David Riddoch
      ● Netmap - Luigi Rizzo
        – http://info.iet.unipi.it/~luigi/netmap/talk-atc12.html
      ● DPDK
        – Intel DPDK Overview
        – Disruptive network IP networking
          ● Naoto MASMOTO