SlideShare a Scribd company logo
1 of 42
Understanding
DPDK
Description of techniques used to achieve
high throughput on a commodity hardware
How fast SW has to work?
14.88 millions of 64 byte packets per second on 10G interface
1.8 GHz -> 1 cycle = 0,55 ns
1 packet -> 67.2 ns = 120 clock cycles
IFG
Pream
ble
DST
MAC
SRC
MAC
SRC
MAC
Type Payload CRC
84 Bytes
412 8 60
Comparative speed values
CPU to memory speed = 6-8 GBytes/s
PCI-Express x16 speed = 5 GBytes/s
Access to RAM = 200 ns
Access to L3 cache = 4 ns
Context switch ~= 1000 ns (3.2 GHz)
Packet processing in Linux
User space
Kernel space
NIC
App
Driver
RX/TX queues
Socket
Ring
buffers
Linux kernel overhead
System calls
Context switching on blocking I/O
Data copying from kernel to user space
Interrupt handling in kernel
Expense of sendto
Function Activity Time (ns)
sendto system call 96
sosend_dgram lock sock_buff, alloc mbuf, copy in 137
udp_output UDP header setup 57
ip_output route lookup, ip header setup 198
ether_otput MAC lookup, MAC header setup 162
ixgbe_xmit device programming 220
Total 950
Packet processing with DPDK
User space
Kernel space
NIC
App DPDK
Ring
buffers
UIO driver
RX/TX
queues
Kernel space
Updating a register in Linux
User space
HW
ioctl()
Register
syscall
VFS
copy_from_user()
iowrite()
Updating a register with DPDK
User space
HW
assign
Register
What is used inside DPDK?
Processor affinity (separate cores)
Huge pages (no swap, TLB)
UIO (no copying from kernel)
Polling (no interrupts overhead)
Lockless synchronization (avoid waiting)
Batch packets handling
SSE, NUMA awareness
Linux default scheduling
Core 0
Core 1
Core 2
Core 3
t1 t4t3t2
How to isolate a core for a process
To diagnose use top
“top” , press “f” , press “j”
Before boot use isolcpus
“isolcpus=2,4,6”
After boot - use cpuset
“cset shield -c 1-3”, “cset shield -k on”
Core 2Core 1
Run-to-completion model
RX/TX
thread
RX/TX
thread
Port 1 Port 2
Core 2Core 1
Pipeline model
RX
thread
TX
thread
Port 1 Port 2
Ring
Page tables tree
Linux paging model
cr3
Page
Page
Global
Directory
Page
Table
Page
Middle
Directory
TLB
TLB
Page
Table
RAM
OffsetVirtual page
Physical Page Offset
TLB characteristics
$ cpuid | grep -i tlb
size: 12–4,096 entries
hit time: 0.5–1 clock cycle
miss penalty: 10–100 clock cycles
miss rate: 0.01–1%
It is very expensive resource!
Solution - Hugepages
Benefit: optimized TLB usage, no swap
Hugepage size = 2M
Usage:
mount hugetlbfs /mnt/huge
mmap
Library - libhugetlbfs
Lockless ring design
Writer can preempt writer and reader
Reader can not preempt writer
Reader and writer can work simultaneously on
different cores
Barrier
CAS operation
Bulk queue/dequeue
Lockless ring (Single Producer)
1
cons_head
cons_tail
prod_head
prod_tail
prod_next 2
cons_head
cons_tail
prod_head
prod_next
prod_tail
3
cons_head
cons_tail
prod_head
prod_tail
Lockless ring (Single Consumer)
1
cons_head
cons_tail
prod_head
prod_tail
cons_next 2
cons_tail prod_head
prod_tail
cons_next
cons_head
3
cons_head
cons_tail
prod_head
prod_tail
Lockless ring (Multiple Producers)
1
cons_head
cons_tail
prod_head
prod_tail
prod_next1
prod_next2 3
cons_head
cons_tail
prod_head
2
cons_head
cons_tail
prod_head
prod_next2
prod_tail
prod_next1
4
cons_head
cons_tail
5
cons_head
cons_tail
prod_head
prod_tail
prod_tail
prod_head
prod_tail
prod_next1
prod_next2
prod_next1
prod_next2
Kernel space network driver
App
IP stack
Driver
NIC
Data
Desc
Config
Data
User space
Kernel space
Interrupts
UIO
“The most important devices can’t be handled
in user space, including, but not
limited to, network interfaces and block
devices.” - LDD3
UIO
User space
Kernel space
Interfacesysfs /dev/uioX
App
US driver epoll()
mmap()
UIO framework
driver
NIC User space
Access to device from user space
BAR0 (Mem)
BAR1
BAR2 (IO)
BAR5
BAR4
BAR3
Vendor Id
Device Id
Command
Revision Id
Status
...
Configuration
registers
I/O and memory
regions
/sys/class/uio/uioX/maps/mapX
/sys/class/uio/uioX/portio/portX
/dev/uioX -> mmap (offset)
/sys/bus/pci/devices
Host memory NIC memory
DMA RX
Update RDT
DMA descriptor(s)
RX queue RX FIFO
DMA packet
Descriptor ringMemory
DMA descriptors
Host memory NIC memory
DMA TX
Update TDT
DMA descriptor(s)
TX queue TX FIFO
DMA packet
Descriptor ringMemory
DMA descriptors
Receive from SW side
DD DD DDDD
RDT
DD
mbuf1
addr
DD
mbuf2
addr
RDT
RDH = 1
RDT = 5
RDBA = 0
RDLEN = 6
mbuf1
RDH
RDH
mbuf2
Transmit from SW side
DD DD DDDD
TDT
DD
mbuf1
addr
DD
mbuf2
addr
TDT
TDH = 1
TDT = 5
TDBA = 0
TDLEN = 6
mbuf1
TDH
TDH
mbuf2
NUMA
CPU 0
Cores
Memory
controller
I/O controller
Memory
PCI-E PCI-E
CPU 1
Cores
Memory
controller
I/O controller
Memory
PCI-E PCI-E
QPI
Socket 0 Socket 1
RSS (Receive Side Scaling)
Hash
function
Queue 0 CPU N
...
Queue N
Incoming traffic Indirection
table
Flow director
Queue 0 CPU N
...
Queue N
Incoming traffic
Filter table
Hash
function
Outgoing traffic
Drop Route
Virtualization - SR-IOV
NIC
VMM
VM1
VF driver
VM2
VF driver
PF driver
VF
Virtual bridge
VF PF
NIC
Slow path using bifurcated driver
Kernel DPDK
VF
Virtual bridge
PF Filter table
Slow path using TAP
User space
Kernel space
NIC
App DPDK
Ring
buffers
TAP device
RX/TX
queues
TCP/IP
stack
Slow path using KNI
User space
Kernel space
NIC
App DPDK
Ring
buffers
KNI device
RX/TX
queues
TCP/IP
stack
x86 HW
Application 1 - Traffic generator
User space
Streams generator
DUT
Traffic analyzer
x86 HW
Application 2 - Router
Kernel
User space
Routing table
Routing table cacheDUT1 DUT2
x86 HW
Application 3 - Middlebox
User space
DPIDUT1 DUT2
References
Device Drivers in User Space
Userspace I/O drivers in a realtime context
The Userspace I/O HOWTO
The anatomy of a PCI/PCI Express kernel driver
From Intel® Data Plane Development Kit to Wind River Network Acceleration
Platform
DPDK Design Tips (Part 1 - RSS)
Getting the Best of Both Worlds with Queue Splitting (Bifurcated Driver)
Design considerations for efficient network applications with Intel® multi-core
processor-based systems on Linux
Introduction to Intel Ethernet Flow Director
My blog
Learning Network Programming

More Related Content

What's hot

Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabMichelle Holley
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hwvideos
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmicsDenys Haryachyy
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsVipin Varghese
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingKernel TLV
 
Poll mode driver integration into dpdk
Poll mode driver integration into dpdkPoll mode driver integration into dpdk
Poll mode driver integration into dpdkVipin Varghese
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPThomas Graf
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadKevin Traynor
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack monad bobo
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KernelThomas Graf
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?Michelle Holley
 
Intel 82599 10GbE Controllerで遊ぼう
Intel 82599 10GbE Controllerで遊ぼうIntel 82599 10GbE Controllerで遊ぼう
Intel 82599 10GbE Controllerで遊ぼうTakuya ASADA
 
VLANs in the Linux Kernel
VLANs in the Linux KernelVLANs in the Linux Kernel
VLANs in the Linux KernelKernel TLV
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 

What's hot (20)

Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw
 
Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpoints
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
Poll mode driver integration into dpdk
Poll mode driver integration into dpdkPoll mode driver integration into dpdk
Poll mode driver integration into dpdk
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
 
Ovs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offloadOvs dpdk hwoffload way to full offload
Ovs dpdk hwoffload way to full offload
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux Kernel
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
Intel 82599 10GbE Controllerで遊ぼう
Intel 82599 10GbE Controllerで遊ぼうIntel 82599 10GbE Controllerで遊ぼう
Intel 82599 10GbE Controllerで遊ぼう
 
VLANs in the Linux Kernel
VLANs in the Linux KernelVLANs in the Linux Kernel
VLANs in the Linux Kernel
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 

Viewers also liked

DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDKLagopus SDN/OpenFlow switch
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchTe-Yen Liu
 
Disruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxDisruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxNaoto MATSUMOTO
 
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークSeastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークTakuya ASADA
 
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京Kentaro Ebisawa
 
コンテナ情報交換会2
コンテナ情報交換会2コンテナ情報交換会2
コンテナ情報交換会2Masahide Yamamoto
 
cassandra 100 node cluster admin operation
cassandra 100 node cluster admin operationcassandra 100 node cluster admin operation
cassandra 100 node cluster admin operationoranie Narut
 
PaaSの作り方 Sqaleの場合
PaaSの作り方 Sqaleの場合PaaSの作り方 Sqaleの場合
PaaSの作り方 Sqaleの場合hiboma
 
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012Gosuke Miyashita
 
Nosqlの基礎知識(2013年7月講義資料)
Nosqlの基礎知識(2013年7月講義資料)Nosqlの基礎知識(2013年7月講義資料)
Nosqlの基礎知識(2013年7月講義資料)CLOUDIAN KK
 
Structural design of tunnel lining
Structural design of tunnel liningStructural design of tunnel lining
Structural design of tunnel liningMahesh Raj Bhatt
 
Ecg533 rock-tunnel-engineering
Ecg533 rock-tunnel-engineeringEcg533 rock-tunnel-engineering
Ecg533 rock-tunnel-engineeringJunaida Wally
 

Viewers also liked (20)

DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitch
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
Disruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxDisruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on Linux
 
Vagrant
VagrantVagrant
Vagrant
 
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワークSeastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
Seastar:高スループットなサーバアプリケーションの為の新しいフレームワーク
 
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京
OpenVZ - Linux Containers:第2回 コンテナ型仮想化の情報交換会@東京
 
コンテナ情報交換会2
コンテナ情報交換会2コンテナ情報交換会2
コンテナ情報交換会2
 
cassandra 100 node cluster admin operation
cassandra 100 node cluster admin operationcassandra 100 node cluster admin operation
cassandra 100 node cluster admin operation
 
PaaSの作り方 Sqaleの場合
PaaSの作り方 Sqaleの場合PaaSの作り方 Sqaleの場合
PaaSの作り方 Sqaleの場合
 
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
Inside Sqale's Backend at Sapporo Ruby Kaigi 2012
 
Nosqlの基礎知識(2013年7月講義資料)
Nosqlの基礎知識(2013年7月講義資料)Nosqlの基礎知識(2013年7月講義資料)
Nosqlの基礎知識(2013年7月講義資料)
 
Structural design of tunnel lining
Structural design of tunnel liningStructural design of tunnel lining
Structural design of tunnel lining
 
Tunnel engg.2
Tunnel engg.2Tunnel engg.2
Tunnel engg.2
 
Bridges precast
Bridges precastBridges precast
Bridges precast
 
Ecg533 rock-tunnel-engineering
Ecg533 rock-tunnel-engineeringEcg533 rock-tunnel-engineering
Ecg533 rock-tunnel-engineering
 
Tunneling
Tunneling  Tunneling
Tunneling
 
Guidelines
GuidelinesGuidelines
Guidelines
 
Precast segmental concrete bridges a
Precast segmental concrete bridges aPrecast segmental concrete bridges a
Precast segmental concrete bridges a
 
Diaphragm Wall Presentation By Gagan
Diaphragm Wall Presentation By GaganDiaphragm Wall Presentation By Gagan
Diaphragm Wall Presentation By Gagan
 

Similar to Understanding DPDK

Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5Steen Larsen
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
 
NUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioNUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioHajime Tazaki
 
Dpdk accelerated Ostinato
Dpdk accelerated OstinatoDpdk accelerated Ostinato
Dpdk accelerated Ostinatopstavirs
 
OSN days 2019 - Open Networking and Programmable Switch
OSN days 2019 - Open Networking and Programmable SwitchOSN days 2019 - Open Networking and Programmable Switch
OSN days 2019 - Open Networking and Programmable SwitchChun Ming Ou
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rFerdinand Jamitzky
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems PerformanceBrendan Gregg
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Hsien-Hsin Sean Lee, Ph.D.
 
Semiconductor memories
Semiconductor memoriesSemiconductor memories
Semiconductor memoriesSambitShreeman
 
Cisco crs1
Cisco crs1Cisco crs1
Cisco crs1wjunjmt
 
Introduction to tcpdump
Introduction to tcpdumpIntroduction to tcpdump
Introduction to tcpdumpLev Walkin
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Ontico
 

Similar to Understanding DPDK (20)

Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5
 
Polyraptor
PolyraptorPolyraptor
Polyraptor
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
NUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioNUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osio
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Memory management
Memory managementMemory management
Memory management
 
The Spectre of Meltdowns
The Spectre of MeltdownsThe Spectre of Meltdowns
The Spectre of Meltdowns
 
Dpdk accelerated Ostinato
Dpdk accelerated OstinatoDpdk accelerated Ostinato
Dpdk accelerated Ostinato
 
OSN days 2019 - Open Networking and Programmable Switch
OSN days 2019 - Open Networking and Programmable SwitchOSN days 2019 - Open Networking and Programmable Switch
OSN days 2019 - Open Networking and Programmable Switch
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
Polyraptor
PolyraptorPolyraptor
Polyraptor
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
The New Systems Performance
The New Systems PerformanceThe New Systems Performance
The New Systems Performance
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
 
Semiconductor memories
Semiconductor memoriesSemiconductor memories
Semiconductor memories
 
Brkdct 3101
Brkdct 3101Brkdct 3101
Brkdct 3101
 
Cisco crs1
Cisco crs1Cisco crs1
Cisco crs1
 
Introduction to tcpdump
Introduction to tcpdumpIntroduction to tcpdump
Introduction to tcpdump
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
 

More from Denys Haryachyy

More from Denys Haryachyy (6)

Understanding iptables
Understanding iptablesUnderstanding iptables
Understanding iptables
 
Secure communication
Secure communicationSecure communication
Secure communication
 
Network sockets
Network socketsNetwork sockets
Network sockets
 
C++ 11
C++ 11C++ 11
C++ 11
 
Git basics
Git basicsGit basics
Git basics
 
History of the personal computer
History of the personal computerHistory of the personal computer
History of the personal computer
 

Recently uploaded

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Recently uploaded (20)

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Understanding DPDK

  • 1. Understanding DPDK Description of techniques used to achieve high throughput on a commodity hardware
  • 2. How fast SW has to work? 14.88 millions of 64 byte packets per second on 10G interface 1.8 GHz -> 1 cycle = 0,55 ns 1 packet -> 67.2 ns = 120 clock cycles IFG Pream ble DST MAC SRC MAC SRC MAC Type Payload CRC 84 Bytes 412 8 60
  • 3. Comparative speed values CPU to memory speed = 6-8 GBytes/s PCI-Express x16 speed = 5 GBytes/s Access to RAM = 200 ns Access to L3 cache = 4 ns Context switch ~= 1000 ns (3.2 GHz)
  • 4. Packet processing in Linux User space Kernel space NIC App Driver RX/TX queues Socket Ring buffers
  • 5. Linux kernel overhead System calls Context switching on blocking I/O Data copying from kernel to user space Interrupt handling in kernel
  • 6. Expense of sendto Function Activity Time (ns) sendto system call 96 sosend_dgram lock sock_buff, alloc mbuf, copy in 137 udp_output UDP header setup 57 ip_output route lookup, ip header setup 198 ether_otput MAC lookup, MAC header setup 162 ixgbe_xmit device programming 220 Total 950
  • 7. Packet processing with DPDK User space Kernel space NIC App DPDK Ring buffers UIO driver RX/TX queues
  • 8. Kernel space Updating a register in Linux User space HW ioctl() Register syscall VFS copy_from_user() iowrite()
  • 9. Updating a register with DPDK User space HW assign Register
  • 10. What is used inside DPDK? Processor affinity (separate cores) Huge pages (no swap, TLB) UIO (no copying from kernel) Polling (no interrupts overhead) Lockless synchronization (avoid waiting) Batch packets handling SSE, NUMA awareness
  • 11. Linux default scheduling Core 0 Core 1 Core 2 Core 3 t1 t4t3t2
  • 12. How to isolate a core for a process To diagnose use top “top” , press “f” , press “j” Before boot use isolcpus “isolcpus=2,4,6” After boot - use cpuset “cset shield -c 1-3”, “cset shield -k on”
  • 13. Core 2Core 1 Run-to-completion model RX/TX thread RX/TX thread Port 1 Port 2
  • 14. Core 2Core 1 Pipeline model RX thread TX thread Port 1 Port 2 Ring
  • 15. Page tables tree Linux paging model cr3 Page Page Global Directory Page Table Page Middle Directory
  • 17. TLB characteristics $ cpuid | grep -i tlb size: 12–4,096 entries hit time: 0.5–1 clock cycle miss penalty: 10–100 clock cycles miss rate: 0.01–1% It is very expensive resource!
  • 18. Solution - Hugepages Benefit: optimized TLB usage, no swap Hugepage size = 2M Usage: mount hugetlbfs /mnt/huge mmap Library - libhugetlbfs
  • 19. Lockless ring design Writer can preempt writer and reader Reader can not preempt writer Reader and writer can work simultaneously on different cores Barrier CAS operation Bulk queue/dequeue
  • 20. Lockless ring (Single Producer) 1 cons_head cons_tail prod_head prod_tail prod_next 2 cons_head cons_tail prod_head prod_next prod_tail 3 cons_head cons_tail prod_head prod_tail
  • 21. Lockless ring (Single Consumer) 1 cons_head cons_tail prod_head prod_tail cons_next 2 cons_tail prod_head prod_tail cons_next cons_head 3 cons_head cons_tail prod_head prod_tail
  • 22. Lockless ring (Multiple Producers) 1 cons_head cons_tail prod_head prod_tail prod_next1 prod_next2 3 cons_head cons_tail prod_head 2 cons_head cons_tail prod_head prod_next2 prod_tail prod_next1 4 cons_head cons_tail 5 cons_head cons_tail prod_head prod_tail prod_tail prod_head prod_tail prod_next1 prod_next2 prod_next1 prod_next2
  • 23. Kernel space network driver App IP stack Driver NIC Data Desc Config Data User space Kernel space Interrupts
  • 24. UIO “The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.” - LDD3
  • 25. UIO User space Kernel space Interfacesysfs /dev/uioX App US driver epoll() mmap() UIO framework driver
  • 26. NIC User space Access to device from user space BAR0 (Mem) BAR1 BAR2 (IO) BAR5 BAR4 BAR3 Vendor Id Device Id Command Revision Id Status ... Configuration registers I/O and memory regions /sys/class/uio/uioX/maps/mapX /sys/class/uio/uioX/portio/portX /dev/uioX -> mmap (offset) /sys/bus/pci/devices
  • 27. Host memory NIC memory DMA RX Update RDT DMA descriptor(s) RX queue RX FIFO DMA packet Descriptor ringMemory DMA descriptors
  • 28. Host memory NIC memory DMA TX Update TDT DMA descriptor(s) TX queue TX FIFO DMA packet Descriptor ringMemory DMA descriptors
  • 29. Receive from SW side DD DD DDDD RDT DD mbuf1 addr DD mbuf2 addr RDT RDH = 1 RDT = 5 RDBA = 0 RDLEN = 6 mbuf1 RDH RDH mbuf2
  • 30. Transmit from SW side DD DD DDDD TDT DD mbuf1 addr DD mbuf2 addr TDT TDH = 1 TDT = 5 TDBA = 0 TDLEN = 6 mbuf1 TDH TDH mbuf2
  • 31. NUMA CPU 0 Cores Memory controller I/O controller Memory PCI-E PCI-E CPU 1 Cores Memory controller I/O controller Memory PCI-E PCI-E QPI Socket 0 Socket 1
  • 32. RSS (Receive Side Scaling) Hash function Queue 0 CPU N ... Queue N Incoming traffic Indirection table
  • 33. Flow director Queue 0 CPU N ... Queue N Incoming traffic Filter table Hash function Outgoing traffic Drop Route
  • 34. Virtualization - SR-IOV NIC VMM VM1 VF driver VM2 VF driver PF driver VF Virtual bridge VF PF
  • 35. NIC Slow path using bifurcated driver Kernel DPDK VF Virtual bridge PF Filter table
  • 36. Slow path using TAP User space Kernel space NIC App DPDK Ring buffers TAP device RX/TX queues TCP/IP stack
  • 37. Slow path using KNI User space Kernel space NIC App DPDK Ring buffers KNI device RX/TX queues TCP/IP stack
  • 38. x86 HW Application 1 - Traffic generator User space Streams generator DUT Traffic analyzer
  • 39. x86 HW Application 2 - Router Kernel User space Routing table Routing table cacheDUT1 DUT2
  • 40. x86 HW Application 3 - Middlebox User space DPIDUT1 DUT2
  • 41. References Device Drivers in User Space Userspace I/O drivers in a realtime context The Userspace I/O HOWTO The anatomy of a PCI/PCI Express kernel driver From Intel® Data Plane Development Kit to Wind River Network Acceleration Platform DPDK Design Tips (Part 1 - RSS) Getting the Best of Both Worlds with Queue Splitting (Bifurcated Driver) Design considerations for efficient network applications with Intel® multi-core processor-based systems on Linux Introduction to Intel Ethernet Flow Director