SlideShare a Scribd company logo
1 of 32
Download to read offline
DPDK
Data Plane Development Kit
Ariel Waizel ariel.waizel@gmail.com
2Confidential
Who Am I?
• Programmer for 12 years
– Mostly C/C++ for cyber and networks applications.
– Some higher languages for machine learning and generic DevOps (Python and Java).
• M.sc in Information System Engineering (a.k.a machine learning).
• Currently I’m a solution architect at Contextream ( HPE)
– The company specializes in software defined networks for carrier grade companies (Like Bezek, Partner, etc.).
– Appeal: Everything is open-source!
– I work with costumers on end-to-end solutions and create POCs.
• Gaming for fun.
3Confidential
Let’s Talk About DPDK
• What is DPDK?
• Why use it?
• How good is it?
• How does it work?
4Confidential
The Challenge
• Network application, done in software, supporting native network speeds.
• Network speeds:
– 10 Gb per NIC (Network Interface Card).
– 40 Gb per NIC.
– 100 Gb?
• Network applications:
– Network device implemented in software (Router, switch, http proxy, etc.).
– Network services for virtual machines (Openstack, etc.).
– Security. protect your applications from DDOS.
5Confidential
What is DPDK?
• Set of UIO (User IO) drivers and user space libraries.
• Goal: forward network packet to/from NIC (Network Interface Card) from/to user application at native
speed:
– 10 or 40 Gb NICs.
– Speed is the most (only?) important criteria.
– Only forwards the packets – not a network stack (But there’re helpful libraries and examples to use).
• All traffic bypasses the kernel (We’ll get to why).
– When a NIC is controlled by a DPDK driver, it’s invisible to the kernel.
• Open source (BSD-3 for most, GPL for Linux Kernel related parts).
6Confidential
Why DPDK Should Interest Kernel Devs?
• Bypassing the kernel is importance because of performance - Intriguing by itself.
– At the very least, know the “competition”.
• DPDK is a very light-weight, low level, performance driven framework – Makes it a good learning ground
for kernel developers, to learn performance guidelines.
7Confidential
Why bypassing the kernel is a necessity.
Why Use DPDK?
8Confidential
10Gb – Crunching Numbers
• Minimum Ethernet packet: 64 bytes + 20 preamble.
• Maximum number of pps: 14,880,952
– 10^10 bps /(84 bytes * 8)
• Minimum time to process a single packet: 67.2 ns
• Minimum CPU cycles: 201 cycles on a 3 Ghz CPU (1 Ghz -> 1 cycle per ns).
9Confidential
Time (ish) per Operation (July 2014)
Time (Expected)Operation
32 nsCache Miss
4.3 nsL2 cache access
7.9 nsL3 cache access
8.25 ns (16.5 ns for unlock too)“LOCK” operation (like Atomic)
41.85 ns (75.34 for audit-syscall)syscall
80 ns for alloc + freeSLUB Allocator (Linux Kernel buffer allocator)
77 nsSKB (Linux Kernel packet struct) Memory Manager
48 ns minimum, between 58 to 68 ns averageQdisc (tx queue descriptor)
Up to several Cache Misses.TLB Miss
* Red Hat Challenge 10Gbit/s Jesper Dangaard Brouer et al. Netfilter Workshop July 2014.
Conclusion: Must use batching (Send/rcv several packet together at the same time, amortize costs)
10Confidential
Time Per Operation – The Big No-Nos
Average TimeOperation
Micro seconds (1000 ns)Context Switch
Micro secondsPage Fault
Conclusion: Must pin CPUs and pre-allocate resources
11Confidential
Benchmarks at the end.
As of Today: DPDK can handle 11X the traffic Linux Kernel Can!
12Confidential
How It Works.
DPDK Architecture
13Confidential
DPDK – What do you get?
• UIO drivers
• PMD per hardware NIC:
– PMD (Poll Mode Driver) support for RX/TX (Receive and Transmit).
– Mapping PCI memory and registers.
– Mapping user memory (for example: packet memory buffers) into the NIC.
– Configuration of specific HW accelerations in the NIC.
• User space libraries:
– Initialize PMDs
– Threading (builds on pthread).
– CPU Management
– Memory Management (Huge pages only!).
– Hashing, Scheduler, pipeline (packet steering), etc. – High performance support libraries for the application.
14Confidential
DPDK APP
DPDK APP
From NIC to Process – Pipe Line modelNIC
RX Q
RX Q
PMD
PMD
TX Q
TX Q
DPDK APP
RING
RINGIgb_uio
* igb_uio is the DPDK standard kernel uio driver for device control plane
15Confidential
From NIC to Process – Run To Completion ModelNIC
RX Q
RX Q
TX Q
TX Q
DPDK APP + PMD
DPDK APP + PMDIgb_uio
16Confidential
Software Configuration
• C Code
– GCC 4.5.X or later.
• Required:
– Kernel version >= 2.6.34
– Hugetablefs (For best performance use 1G pages, which require GRUB configuration).
• Recommended:
– isolcpus in GRUB configuration: isolates CPUs from the scheduler.
• Compilation
– DPDK applications are statically compiled/linked with the DPDK libraries, for best performance.
– Export RTE_SDK and RTE_TARGET to develop the application from a misc. directory
• Setup:
– Best to use tools/dpdk-setup.sh to setup/build the environment.
– Use tools/dpdk-devbind.py
o -- status let’s you see the available NICs and their corresponding drivers.
o -bind let’s you bind a NIC to a driver. Igb-uio, for example.
– Run the application with the appropriate arguments.
17Confidential
Software Architecture
• PMD drivers are just user space pthreads that call specific EAL functions
– These EAL functions have concrete implementations per NIC, and this costs couple of indirections.
– Access to RX/TX descriptors is direct.
– Uses UIO driver for specific control changes (like configuring interrupts).
• Most DPDK libraries are not thread-safe.
– PMD drivers are non-preemptive: Can’t have 2 PMDs handle the same HW queue on the same NIC.
• All inner-thread communication is based on librte_ring.
– A mp-mc lockless non-resizable queue ring implementation.
– Optimized for DPDK purposes.
• All resources like memory (malloc), threads, descriptor queues, etc. are initialized at the start.
18Confidential
Software Architecture
19Confidential
Application Bring up
20Confidential
Code Example – Basic Forwarding
• Main Function:
DPDK Init
Get all available NICS (binded with igb_uio)
Initialize packet buffers
Initialize NICs.
21Confidential
Code Example – port_init
NIC init: Set number of queues
Rx queue init: Set packet buffer pool for queue
* Uses librte_ring to be thread safe
Tx queue init: No need for buffer pool
Start Getting Packets
22Confidential
Code Example – lcore_main
PMD
23Confidential
Igb_uio
• For Intel Gigabit NICs.
– Simple enough to work for most NICs, nonetheless.
• Basically:
– Calls pci_enable_device.
– Enables bus mastering on the device (pci_set_master).
– Requests all BARs and mas them using ioremap.
– Setups ioports.
– Sets the dma mask to 64-bit.
• Code to support SRIOV and xen.
24Confidential
rte_ring
• Fixed sized, “lockless”, queue ring
• Non Preemptive.
• Supports multiple/single producer/consumer, and bulk actions.
• Uses:
– Single array of pointers.
– Head/tail pointers for both producer and consumer (total 4 pointers).
• To enqueue (Just like dequeue):
– Until successful:
o Save in local variable the current head_ptr.
o head_next = head_ptr + num_objects
o CAS the head_ptr to head_next
– Insert objects.
– Until successful:
o Update tail_ptr = head_next + 1 when tail_ptr == head_ptr
• Analysis:
– Light weight.
– In theory, both loops are costly.
– In practice, as all threads are cpu bound, the amortized cost is low for the first loop, and very unlikely at the second loop.
25Confidential
rte_mempool
• Smart memory allocation
• Allocate the start of each memory buffer at a different memory channel/rank:
– Most applications only want to see the first 64 bytes of the packet (Ether+ip header).
– Requests for memory at different channels, same rank, are done concurrently by the memory controller.
– Requests for memory at different ranks can be managed effectively by the memory controller.
– Pads objects until gcd(obj_size,num_ranks*num_channels) == 1.
• Maintain a per-core cache, bulk requests to the mempool ring.
• Allocate memory based on:
– NUMA
– Contiguous virtual memory (Means also contiguous physical memory for huge pages).
• Non- preemptive.
– Same lcore must not context switch to another task using the same mempool.
26Confidential
Linux kernel Vs. DPDK
Benchmarks
27Confidential
Linux Kernel Benchmarks, single core
Feb. 2016July 2014Benchmark
14.8 Mpps4 MppsTX
12 Mpps (experimental)6.4 MppsRX (Dump at driver)
2 Mpps1 MppsL3 Forwarding (RX+Filter+TX)
12 Mpps6 MppsL3 forwarding (Multi Core)
* Red Hat The 100Gbit/s Challenge Jesper Dangaard Brouer et al. DevConf Feb. 2016.
28Confidential
DPDK Benchmarks (March 2016)
Multi CoreSingle CoreBenchmark
Linear Increase22 MppsL3 forwarding (PHY-PHY)
Linear Increase11 MppsSwitch Forwarding (PHY-OVS-PHY)
Near Linear Increase3.4 MppsVM Forwarding (PHY-VM-PHY)
Linear Increase2 MppsVM to VM
* Intel Open Network Platform Release 2.1 Performance Test Report March 2016.
• All tests with:
• 4X40Gb ports
• E5-2695 V4 2.1Ghz Processor
• 16X1GB Huge Pages, 2048X2MB Huge Pages
29Confidential
DPDK Benchmarks Figures
PHY-PHY PHY-OVS-PHY VM-VM
30Confidential
DPDK Pros and Cons
31Confidential
DPDK Advantages
• Best forwarding performance solution to/from PHY/process/VM to date.
– Best single core performance.
– Scales: linear performance increase per core.
• Active and longstanding community (from 2012).
– DPDK.org
– Full of tutorials, examples and complementary features.
• Active popular products
– OVS-DPDK is the main solution for high speed networking in openstack.
– 6WIND.
– TRex.
• Great virtualization support.
– Deploy at the host of the guest environment.
32Confidential
DPDK Disadvantages
• Security
• Isolated ecosystem:
– Hard to use linux kernel infrastructure (While there are precedents).
• Requires modified code in applications to use:
– DPDK processes use a very specific API.
– DPDK application can’t interact transparently with Linux processes (important for transparent networking applications
like Firewall , DDOS mitigation, etc.).
o Solved for interaction with VMs by the vhost Library.
• Requires Huge pages (XDP doesn’t).

More Related Content

What's hot

DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingMichelle Holley
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmicsDenys Haryachyy
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingKernel TLV
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hwvideos
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
7 hands on
7 hands on7 hands on
7 hands onvideos
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?Michelle Holley
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptablesKernel TLV
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 

What's hot (20)

Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw
 
Ixgbe internals
Ixgbe internalsIxgbe internals
Ixgbe internals
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
7 hands on
7 hands on7 hands on
7 hands on
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
Dpdk pmd
Dpdk pmdDpdk pmd
Dpdk pmd
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 

Similar to DPDK: High-Performance Packet Processing

High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...Jim St. Leger
 
High Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing CommunityHigh Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing Community6WIND
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentOPNFV
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)Nicolas Poggi
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuAlan Sill
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
 
ODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Workgroup
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseDatabricks
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Grayharryvanhaaren
 
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-GeneOpenStack Korea Community
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 

Similar to DPDK: High-Performance Packet Processing (20)

100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
 
High Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing CommunityHigh Performance Networking Leveraging the DPDK and Growing Community
High Performance Networking Leveraging the DPDK and Growing Community
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deployment
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
pps Matters
pps Matterspps Matters
pps Matters
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
ODSA Use Case - SmartNIC
ODSA Use Case - SmartNICODSA Use Case - SmartNIC
ODSA Use Case - SmartNIC
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. GrayOVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray
 
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 

More from Kernel TLV

Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCKernel TLV
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution EnvironmentKernel TLV
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel TLV
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Kernel TLV
 
Present Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityPresent Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityKernel TLV
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to BottomKernel TLV
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsKernel TLV
 
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Kernel TLV
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and WhereKernel TLV
 
KernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernel TLV
 
Userfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future DevelopmentUserfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future DevelopmentKernel TLV
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesKernel TLV
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival GuideKernel TLV
 
WiFi and the Beast
WiFi and the BeastWiFi and the Beast
WiFi and the BeastKernel TLV
 
FreeBSD and Drivers
FreeBSD and DriversFreeBSD and Drivers
FreeBSD and DriversKernel TLV
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackKernel TLV
 
Linux Interrupts
Linux InterruptsLinux Interrupts
Linux InterruptsKernel TLV
 
Userfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy MigrationUserfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy MigrationKernel TLV
 

More from Kernel TLV (20)

Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution Environment
 
Fun with FUSE
Fun with FUSEFun with FUSE
Fun with FUSE
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and Containers
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
 
Present Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityPresent Absence of Linux Filesystem Security
Present Absence of Linux Filesystem Security
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to Bottom
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
 
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and Where
 
KernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernelTLV Speaker Guidelines
KernelTLV Speaker Guidelines
 
Userfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future DevelopmentUserfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future Development
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival Guide
 
WiFi and the Beast
WiFi and the BeastWiFi and the Beast
WiFi and the Beast
 
FreeBSD and Drivers
FreeBSD and DriversFreeBSD and Drivers
FreeBSD and Drivers
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network Stack
 
Linux Interrupts
Linux InterruptsLinux Interrupts
Linux Interrupts
 
Userfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy MigrationUserfaultfd and Post-Copy Migration
Userfaultfd and Post-Copy Migration
 

Recently uploaded

Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...OnePlan Solutions
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsDEEPRAJ PATHAK
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 

Recently uploaded (20)

Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software Projects
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 

DPDK: High-Performance Packet Processing

  • 1. DPDK Data Plane Development Kit Ariel Waizel ariel.waizel@gmail.com
  • 2. 2Confidential Who Am I? • Programmer for 12 years – Mostly C/C++ for cyber and networks applications. – Some higher languages for machine learning and generic DevOps (Python and Java). • M.sc in Information System Engineering (a.k.a machine learning). • Currently I’m a solution architect at Contextream ( HPE) – The company specializes in software defined networks for carrier grade companies (Like Bezek, Partner, etc.). – Appeal: Everything is open-source! – I work with costumers on end-to-end solutions and create POCs. • Gaming for fun.
  • 3. 3Confidential Let’s Talk About DPDK • What is DPDK? • Why use it? • How good is it? • How does it work?
  • 4. 4Confidential The Challenge • Network application, done in software, supporting native network speeds. • Network speeds: – 10 Gb per NIC (Network Interface Card). – 40 Gb per NIC. – 100 Gb? • Network applications: – Network device implemented in software (Router, switch, http proxy, etc.). – Network services for virtual machines (Openstack, etc.). – Security. protect your applications from DDOS.
  • 5. 5Confidential What is DPDK? • Set of UIO (User IO) drivers and user space libraries. • Goal: forward network packet to/from NIC (Network Interface Card) from/to user application at native speed: – 10 or 40 Gb NICs. – Speed is the most (only?) important criteria. – Only forwards the packets – not a network stack (But there’re helpful libraries and examples to use). • All traffic bypasses the kernel (We’ll get to why). – When a NIC is controlled by a DPDK driver, it’s invisible to the kernel. • Open source (BSD-3 for most, GPL for Linux Kernel related parts).
  • 6. 6Confidential Why DPDK Should Interest Kernel Devs? • Bypassing the kernel is importance because of performance - Intriguing by itself. – At the very least, know the “competition”. • DPDK is a very light-weight, low level, performance driven framework – Makes it a good learning ground for kernel developers, to learn performance guidelines.
  • 7. 7Confidential Why bypassing the kernel is a necessity. Why Use DPDK?
  • 8. 8Confidential 10Gb – Crunching Numbers • Minimum Ethernet packet: 64 bytes + 20 preamble. • Maximum number of pps: 14,880,952 – 10^10 bps /(84 bytes * 8) • Minimum time to process a single packet: 67.2 ns • Minimum CPU cycles: 201 cycles on a 3 Ghz CPU (1 Ghz -> 1 cycle per ns).
  • 9. 9Confidential Time (ish) per Operation (July 2014) Time (Expected)Operation 32 nsCache Miss 4.3 nsL2 cache access 7.9 nsL3 cache access 8.25 ns (16.5 ns for unlock too)“LOCK” operation (like Atomic) 41.85 ns (75.34 for audit-syscall)syscall 80 ns for alloc + freeSLUB Allocator (Linux Kernel buffer allocator) 77 nsSKB (Linux Kernel packet struct) Memory Manager 48 ns minimum, between 58 to 68 ns averageQdisc (tx queue descriptor) Up to several Cache Misses.TLB Miss * Red Hat Challenge 10Gbit/s Jesper Dangaard Brouer et al. Netfilter Workshop July 2014. Conclusion: Must use batching (Send/rcv several packet together at the same time, amortize costs)
  • 10. 10Confidential Time Per Operation – The Big No-Nos Average TimeOperation Micro seconds (1000 ns)Context Switch Micro secondsPage Fault Conclusion: Must pin CPUs and pre-allocate resources
  • 11. 11Confidential Benchmarks at the end. As of Today: DPDK can handle 11X the traffic Linux Kernel Can!
  • 13. 13Confidential DPDK – What do you get? • UIO drivers • PMD per hardware NIC: – PMD (Poll Mode Driver) support for RX/TX (Receive and Transmit). – Mapping PCI memory and registers. – Mapping user memory (for example: packet memory buffers) into the NIC. – Configuration of specific HW accelerations in the NIC. • User space libraries: – Initialize PMDs – Threading (builds on pthread). – CPU Management – Memory Management (Huge pages only!). – Hashing, Scheduler, pipeline (packet steering), etc. – High performance support libraries for the application.
  • 14. 14Confidential DPDK APP DPDK APP From NIC to Process – Pipe Line modelNIC RX Q RX Q PMD PMD TX Q TX Q DPDK APP RING RINGIgb_uio * igb_uio is the DPDK standard kernel uio driver for device control plane
  • 15. 15Confidential From NIC to Process – Run To Completion ModelNIC RX Q RX Q TX Q TX Q DPDK APP + PMD DPDK APP + PMDIgb_uio
  • 16. 16Confidential Software Configuration • C Code – GCC 4.5.X or later. • Required: – Kernel version >= 2.6.34 – Hugetablefs (For best performance use 1G pages, which require GRUB configuration). • Recommended: – isolcpus in GRUB configuration: isolates CPUs from the scheduler. • Compilation – DPDK applications are statically compiled/linked with the DPDK libraries, for best performance. – Export RTE_SDK and RTE_TARGET to develop the application from a misc. directory • Setup: – Best to use tools/dpdk-setup.sh to setup/build the environment. – Use tools/dpdk-devbind.py o -- status let’s you see the available NICs and their corresponding drivers. o -bind let’s you bind a NIC to a driver. Igb-uio, for example. – Run the application with the appropriate arguments.
  • 17. 17Confidential Software Architecture • PMD drivers are just user space pthreads that call specific EAL functions – These EAL functions have concrete implementations per NIC, and this costs couple of indirections. – Access to RX/TX descriptors is direct. – Uses UIO driver for specific control changes (like configuring interrupts). • Most DPDK libraries are not thread-safe. – PMD drivers are non-preemptive: Can’t have 2 PMDs handle the same HW queue on the same NIC. • All inner-thread communication is based on librte_ring. – A mp-mc lockless non-resizable queue ring implementation. – Optimized for DPDK purposes. • All resources like memory (malloc), threads, descriptor queues, etc. are initialized at the start.
  • 20. 20Confidential Code Example – Basic Forwarding • Main Function: DPDK Init Get all available NICS (binded with igb_uio) Initialize packet buffers Initialize NICs.
  • 21. 21Confidential Code Example – port_init NIC init: Set number of queues Rx queue init: Set packet buffer pool for queue * Uses librte_ring to be thread safe Tx queue init: No need for buffer pool Start Getting Packets
  • 23. 23Confidential Igb_uio • For Intel Gigabit NICs. – Simple enough to work for most NICs, nonetheless. • Basically: – Calls pci_enable_device. – Enables bus mastering on the device (pci_set_master). – Requests all BARs and mas them using ioremap. – Setups ioports. – Sets the dma mask to 64-bit. • Code to support SRIOV and xen.
  • 24. 24Confidential rte_ring • Fixed sized, “lockless”, queue ring • Non Preemptive. • Supports multiple/single producer/consumer, and bulk actions. • Uses: – Single array of pointers. – Head/tail pointers for both producer and consumer (total 4 pointers). • To enqueue (Just like dequeue): – Until successful: o Save in local variable the current head_ptr. o head_next = head_ptr + num_objects o CAS the head_ptr to head_next – Insert objects. – Until successful: o Update tail_ptr = head_next + 1 when tail_ptr == head_ptr • Analysis: – Light weight. – In theory, both loops are costly. – In practice, as all threads are cpu bound, the amortized cost is low for the first loop, and very unlikely at the second loop.
  • 25. 25Confidential rte_mempool • Smart memory allocation • Allocate the start of each memory buffer at a different memory channel/rank: – Most applications only want to see the first 64 bytes of the packet (Ether+ip header). – Requests for memory at different channels, same rank, are done concurrently by the memory controller. – Requests for memory at different ranks can be managed effectively by the memory controller. – Pads objects until gcd(obj_size,num_ranks*num_channels) == 1. • Maintain a per-core cache, bulk requests to the mempool ring. • Allocate memory based on: – NUMA – Contiguous virtual memory (Means also contiguous physical memory for huge pages). • Non- preemptive. – Same lcore must not context switch to another task using the same mempool.
  • 27. 27Confidential Linux Kernel Benchmarks, single core Feb. 2016July 2014Benchmark 14.8 Mpps4 MppsTX 12 Mpps (experimental)6.4 MppsRX (Dump at driver) 2 Mpps1 MppsL3 Forwarding (RX+Filter+TX) 12 Mpps6 MppsL3 forwarding (Multi Core) * Red Hat The 100Gbit/s Challenge Jesper Dangaard Brouer et al. DevConf Feb. 2016.
  • 28. 28Confidential DPDK Benchmarks (March 2016) Multi CoreSingle CoreBenchmark Linear Increase22 MppsL3 forwarding (PHY-PHY) Linear Increase11 MppsSwitch Forwarding (PHY-OVS-PHY) Near Linear Increase3.4 MppsVM Forwarding (PHY-VM-PHY) Linear Increase2 MppsVM to VM * Intel Open Network Platform Release 2.1 Performance Test Report March 2016. • All tests with: • 4X40Gb ports • E5-2695 V4 2.1Ghz Processor • 16X1GB Huge Pages, 2048X2MB Huge Pages
  • 31. 31Confidential DPDK Advantages • Best forwarding performance solution to/from PHY/process/VM to date. – Best single core performance. – Scales: linear performance increase per core. • Active and longstanding community (from 2012). – DPDK.org – Full of tutorials, examples and complementary features. • Active popular products – OVS-DPDK is the main solution for high speed networking in openstack. – 6WIND. – TRex. • Great virtualization support. – Deploy at the host of the guest environment.
  • 32. 32Confidential DPDK Disadvantages • Security • Isolated ecosystem: – Hard to use linux kernel infrastructure (While there are precedents). • Requires modified code in applications to use: – DPDK processes use a very specific API. – DPDK application can’t interact transparently with Linux processes (important for transparent networking applications like Firewall , DDOS mitigation, etc.). o Solved for interaction with VMs by the vhost Library. • Requires Huge pages (XDP doesn’t).