Understanding DPDK
Description of techniques used to achieve high throughput on commodity hardware
How fast does software have to work?
14.88 million 64-byte packets per second on a 10G interface
1.8 GHz -> 1 cycle ≈ 0.556 ns
1 packet -> 67.2 ns ≈ 121 clock cycles
Frame on the wire = 84 bytes:
IFG (12) | Preamble (8) | DST MAC, SRC MAC, Type, Payload (60) | CRC (4)
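The per-packet budget can be rechecked with a few lines of arithmetic. This is only a sketch of the calculation on the slide; the constant and function names are made up for illustration, and the 84 bytes are taken as 12 (IFG) + 8 (preamble) + 60 + 4 (CRC).

```python
# Worked arithmetic for the 10G line-rate budget (input values from the slide).
LINK_BPS = 10_000_000_000          # 10 Gbit/s link
WIRE_BYTES = 12 + 8 + 60 + 4       # IFG + preamble + frame without CRC + CRC = 84
CPU_HZ = 1_800_000_000             # 1.8 GHz core


def packets_per_second(link_bps: int = LINK_BPS, wire_bytes: int = WIRE_BYTES) -> float:
    """Maximum number of minimum-size frames the link can carry per second."""
    return link_bps / (wire_bytes * 8)


def ns_per_packet(pps: float) -> float:
    """Time available to handle one packet at that rate."""
    return 1e9 / pps


def cycles_per_packet(pps: float, cpu_hz: int = CPU_HZ) -> float:
    """CPU cycles available per packet on a single core."""
    return cpu_hz / pps


pps = packets_per_second()
print(f"{pps / 1e6:.2f} Mpps")                      # 14.88 Mpps
print(f"{ns_per_packet(pps):.1f} ns per packet")    # 67.2 ns per packet
print(f"{cycles_per_packet(pps):.0f} cycles")       # 121 cycles
```

The cycle count comes out just under 121, which is the "about 120 cycles" budget the rest of the deck works against.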
Comparative speed values
CPU to memory speed = 6-8 GBytes/s
PCI-Express x16 speed = 5 GBytes/s
Access to RAM = 200 ns
Access to L3 cache = 4 ns
Context switch ~= 1000 ns (3.2 GHz)
Packet processing in Linux
(Diagram: App in user space; Socket, Ring buffers and Driver in kernel space; RX/TX queues on the NIC.)
Linux kernel overhead
System calls
Context switching on blocking I/O
Data copying from kernel to user space
Interrupt handling in kernel
Expense of sendto
Function       Activity                              Time (ns)
sendto         system call                           96
sosend_dgram   lock sock_buff, alloc mbuf, copy in   137
udp_output     UDP header setup                      57
ip_output      route lookup, IP header setup         198
ether_output   MAC lookup, MAC header setup          162
ixgbe_xmit     device programming                    220
Total                                                950
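To put the 950 ns total in context, it can be compared with the 67.2 ns per-packet budget quoted earlier. The sketch below assumes a single core issuing back-to-back calls with no other work; the names are illustrative only.

```python
# Compare the measured sendto cost with the 10G line-rate budget.
SENDTO_NS = 950      # total per-packet cost from the table
BUDGET_NS = 67.2     # per-packet budget for 64-byte frames at 10G


def max_pps(cost_ns: float) -> float:
    """Packets per second one core can send if each packet costs cost_ns."""
    return 1e9 / cost_ns


print(f"{max_pps(SENDTO_NS) / 1e6:.2f} Mpps")        # 1.05 Mpps
print(f"{SENDTO_NS / BUDGET_NS:.1f}x over budget")   # 14.1x over budget
```

Under that assumption the socket path reaches roughly 7% of line rate for minimum-size packets.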
Packet processing with DPDK
(Diagram: App, DPDK and Ring buffers in user space; UIO driver in kernel space; RX/TX queues on the NIC.)
Updating a register in Linux
(Diagram: user space calls ioctl(); kernel space runs syscall -> VFS -> copy_from_user() -> iowrite(); the HW Register is updated.)
Updating a register with DPDK
(Diagram: user space assigns directly to the HW Register.)
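The "assign" path can be illustrated with a memory mapping: once a region is mapped into the process, a register update is just a store at an offset. The sketch below uses an anonymous mapping as a stand-in for a device BAR so that it runs anywhere; `REG_CTRL` and the helper names are hypothetical, and a real driver would map the device region instead.

```python
import mmap
import struct

REG_CTRL = 0x0010   # hypothetical register offset, for illustration only


def write_reg32(bar: mmap.mmap, offset: int, value: int) -> None:
    """Store a 32-bit little-endian value at the given offset of the mapping."""
    struct.pack_into("<I", bar, offset, value)


def read_reg32(bar: mmap.mmap, offset: int) -> int:
    """Load a 32-bit little-endian value from the given offset of the mapping."""
    return struct.unpack_from("<I", bar, offset)[0]


# Anonymous mapping standing in for a memory-mapped device region (assumption).
bar = mmap.mmap(-1, mmap.PAGESIZE)
write_reg32(bar, REG_CTRL, 0xDEADBEEF)   # no ioctl, no copy_from_user
print(hex(read_reg32(bar, REG_CTRL)))    # 0xdeadbeef
```

No system call is involved in the write itself, which is the point the two diagrams contrast.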
What is used inside DPDK?
Processor affinity (separate cores)
Huge pages (no swap, TLB)
UIO (no copying from kernel)
Polling (no interrupt overhead)
Lockless synchronization (avoid waiting)
Batch packet handling
SSE, NUMA awareness
Linux default scheduling
(Diagram: threads t1, t2, t3 and t4 scheduled across Core 0, Core 1, Core 2 and Core 3.)
How to isolate a core for a process
To diagnose, use top:
“top”, press “f”, press “j”
Before boot, use isolcpus:
“isolcpus=2,4,6”
After boot, use cpuset:
“cset shield -c 1-3”, “cset shield -k on”
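Isolating a core only keeps other tasks away from it; the packet-processing thread still has to be placed there explicitly. A minimal sketch of that pinning step, assuming Linux (the `os.sched_*` calls are not available elsewhere) and using a hypothetical helper name:

```python
import os


def pin_to_cpu(cpu: int) -> set:
    """Restrict the calling process to one CPU and return the resulting mask."""
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_setaffinity"):    # Linux only
    original = os.sched_getaffinity(0)
    # For the demo, pick a CPU we are already allowed to run on. On a tuned
    # system this would be one of the cores listed in isolcpus.
    target = min(original)
    print("pinned to", pin_to_cpu(target))
    os.sched_setaffinity(0, original)   # restore the previous mask
```

The same effect can be had from the shell with `taskset`; DPDK applications normally get it through their core-list start-up options.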
Run-to-completion model
(Diagram: Core 1 and Core 2 each run an RX/TX thread; Port 1 and Port 2.)
Pipeline model
(Diagram: Core 1 and Core 2; an RX thread and a TX thread connected by a Ring; Port 1 and Port 2.)
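Linux paging model
(Diagram: page tables tree: cr3 -> Page Global Directory -> Page Middle Directory -> Page Table -> Page.)
TLB
(Diagram: a virtual address (Virtual page, Offset) is translated through the TLB and the Page Table in RAM into a physical address (Physical page, Offset).)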
TLB characteristics
$ cpuid | grep -i tlb
size: 12–4,096 entries
hit time: 0.5–1 clock cycle
miss penalty: 10–100 clock cycles
miss rate: 0.01–1%
It is a very expensive resource!
Solution - Hugepages
Benefit: optimized TLB usage, no swap
Hugepage size = 2 MB
Usage:
mount -t hugetlbfs nodev /mnt/huge
mmap
Library - libhugetlbfs
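The TLB benefit can be quantified as "TLB reach": the amount of memory that can be addressed without a miss, which is the number of entries times the page size. The entry count below is a hypothetical value picked from the 12-4,096 range quoted earlier, and the function names are illustrative.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by the TLB when every entry maps one page."""
    return entries * page_size


def pages_needed(buffer_bytes: int, page_size: int) -> int:
    """Number of pages (and so translations) needed to map a buffer."""
    return -(-buffer_bytes // page_size)   # ceiling division


ENTRIES = 64   # hypothetical TLB size, not taken from the slides

print(tlb_reach_bytes(ENTRIES, 4 * KB) // KB, "KB reach with 4 KB pages")   # 256
print(tlb_reach_bytes(ENTRIES, 2 * MB) // MB, "MB reach with 2 MB pages")   # 128
print(pages_needed(1 * GB, 4 * KB), "vs", pages_needed(1 * GB, 2 * MB))     # 262144 vs 512
```

With the same number of entries, 2 MB pages cover 512 times more memory, so a packet-buffer pool of a gigabyte needs hundreds of translations rather than hundreds of thousands.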
Lockless ring design
Writer can preempt writer and reader
Reader cannot preempt writer
Reader and writer can work simultaneously on different cores
Barrier
CAS operation
Bulk enqueue/dequeue
Lockless ring (Single Producer)
(Diagram, three steps, with cons_head and cons_tail unchanged:
1. prod_head and prod_tail point at the same slot; prod_next points at the next slot.
2. prod_head moves to prod_next; prod_tail stays behind.
3. prod_tail moves up to prod_head.)
Lockless ring (Single Consumer)
(Diagram, three steps, with prod_head and prod_tail unchanged:
1. cons_head and cons_tail point at the same slot; cons_next points at the next slot.
2. cons_head moves to cons_next; cons_tail stays behind.
3. cons_tail moves up to cons_head.)
Lockless ring (Multiple Producers)
(Diagram, five steps, with cons_head and cons_tail unchanged:
1. Both producers read prod_head and compute prod_next1 and prod_next2.
2. prod_head is moved by CAS: one producer succeeds, the other retries and moves it again.
3. Both producers write their objects into the slots they reserved.
4. The producer that reserved first moves prod_tail to its prod_next1.
5. The second producer then moves prod_tail to its prod_next2, so prod_tail equals prod_head.)
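The multiple-producer steps can be replayed with a small single-threaded model. This is a simplified sketch, not DPDK's rte_ring code: the class and method names are invented, the compare-and-swap is emulated by an ordinary method, and the interleaving of the two producers is written out by hand so that the result is deterministic.

```python
class RingModel:
    """Single-threaded model of the producer side of a head/tail ring."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.mask = size - 1
        self.slots = [None] * size
        self.prod_head = 0      # next slot to reserve
        self.prod_tail = 0      # everything before this is visible to consumers
        self.cons_head = 0
        self.cons_tail = 0

    def _cas_prod_head(self, expected: int, new: int) -> bool:
        # Stand-in for an atomic compare-and-swap instruction.
        if self.prod_head != expected:
            return False
        self.prod_head = new
        return True

    def reserve(self, seen_head: int, n: int):
        """Try to claim n slots starting at the head value the producer saw.

        Returns (start, end), or None if the ring is full or the CAS was lost.
        """
        free = self.size - (seen_head - self.cons_tail)
        if n > free or not self._cas_prod_head(seen_head, seen_head + n):
            return None
        return seen_head, seen_head + n

    def fill(self, start: int, objs) -> None:
        """Write objects into slots that were reserved earlier."""
        for i, obj in enumerate(objs):
            self.slots[(start + i) & self.mask] = obj

    def commit(self, start: int, end: int) -> bool:
        """Publish a reservation; allowed only after earlier ones are published."""
        if self.prod_tail != start:
            return False        # a real producer would spin here
        self.prod_tail = end
        return True


ring = RingModel(8)
seen = ring.prod_head                   # both producers read the same head
p1 = ring.reserve(seen, 1)              # producer 1 wins the CAS
lost = ring.reserve(seen, 1)            # producer 2 loses with the stale value
p2 = ring.reserve(ring.prod_head, 1)    # and succeeds on retry
ring.fill(p2[0], ["obj5"])
ring.fill(p1[0], ["obj4"])
early = ring.commit(*p2)                # producer 2 must wait for producer 1
ring.commit(*p1)
late = ring.commit(*p2)
print(lost, early, late, ring.prod_head, ring.prod_tail)   # None False True 2 2
```

The only contended operation is the CAS on `prod_head`; the wait on `prod_tail` is what keeps a consumer from seeing a slot that has been reserved but not yet written.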
Kernel space network driver
(Diagram: App in user space; IP stack and Driver in kernel space; NIC below. Labeled flows: Data, Desc, Config, Interrupts.)
UIO
“The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.” - LDD3
UIO
(Diagram: in user space, App and US driver using epoll() and mmap(); interface: sysfs and /dev/uioX; in kernel space, UIO framework and driver; NIC.)
Access to device from user space
Configuration registers: Vendor Id, Device Id, Command, Status, Revision Id, ..., BAR0 (Mem), BAR1, BAR2 (IO), BAR3, BAR4, BAR5
I/O and memory regions:
/sys/class/uio/uioX/maps/mapX
/sys/class/uio/uioX/portio/portX
/dev/uioX -> mmap (offset)
/sys/bus/pci/devices
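The sysfs entries and the mmap offset fit together roughly as follows. In the UIO interface each mapping directory exposes `addr` and `size` files, and mapping N is selected by passing N times the page size as the mmap offset. The helper names and the `sysfs_root` parameter are assumptions made for this sketch; the demo only runs if a `uio0` device is actually present.

```python
import mmap
import os


def uio_map_info(uio_name: str, map_index: int, sysfs_root: str = "/sys/class/uio") -> dict:
    """Read the address and size of one UIO mapping from sysfs."""
    base = os.path.join(sysfs_root, uio_name, "maps", f"map{map_index}")

    def read_int(name: str) -> int:
        with open(os.path.join(base, name)) as f:
            return int(f.read().strip(), 0)    # accepts "0x..." or decimal

    return {
        "addr": read_int("addr"),
        "size": read_int("size"),
        # UIO convention: mapping N is selected by offset N * page size.
        "mmap_offset": map_index * mmap.PAGESIZE,
    }


def map_region(dev_path: str, info: dict) -> mmap.mmap:
    """Map one region of /dev/uioX into the process (needs a real device)."""
    fd = os.open(dev_path, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, info["size"], mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE,
                         offset=info["mmap_offset"])
    finally:
        os.close(fd)    # the mapping stays valid after the descriptor is closed


if os.path.isdir("/sys/class/uio/uio0/maps/map0"):
    print(uio_map_info("uio0", 0))
```

After `map_region` returns, register access is plain loads and stores on the mapping, with no system call per access.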
DMA RX
(Diagram: host memory holds the Descriptor ring and packet Memory; NIC memory holds the RX queue and RX FIFO. Labeled steps: Update RDT, DMA descriptor(s), DMA packet, DMA descriptors.)
DMA TX
(Diagram: host memory holds the Descriptor ring and packet Memory; NIC memory holds the TX queue and TX FIFO. Labeled steps: Update TDT, DMA descriptor(s), DMA packet, DMA descriptors.)
Receive from SW side
(Diagram: RX descriptor ring with RDBA = 0, RDLEN = 6, RDH = 1, RDT = 5. Each descriptor carries a DD bit and the addr of an mbuf (mbuf1, mbuf2); RDH and RDT advance around the ring.)
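The interplay of RDH, RDT and the DD bit can be modelled in a few lines. This is a simplified software model under stated assumptions, not driver code: the NIC side is simulated by a method, the ring length is counted in descriptors, and the class, method and field names are invented for the sketch.

```python
class RxRingModel:
    """Toy model of an RX descriptor ring shared by a NIC and a poll-mode driver."""

    def __init__(self, n_desc: int):
        self.n = n_desc
        self.desc = [{"buf": bytearray(), "dd": False} for _ in range(n_desc)]
        self.rdh = 0                 # head: next descriptor the NIC will fill
        self.rdt = n_desc - 1        # tail: the NIC may use descriptors [rdh, rdt)
        self.next_to_clean = 0       # software's own read position

    def nic_receive(self, packet: bytes) -> bool:
        """Hardware side: DMA a packet into the descriptor at RDH and set DD."""
        if self.rdh == self.rdt:
            return False             # no descriptors left: the packet is dropped
        d = self.desc[self.rdh]
        d["buf"] = bytearray(packet)
        d["dd"] = True
        self.rdh = (self.rdh + 1) % self.n
        return True

    def poll(self, max_burst: int = 32) -> list:
        """Software side: collect completed descriptors, hand them back to the NIC."""
        packets = []
        while len(packets) < max_burst and self.desc[self.next_to_clean]["dd"]:
            d = self.desc[self.next_to_clean]
            packets.append(bytes(d["buf"]))
            d["buf"] = bytearray()           # attach a fresh, empty buffer
            d["dd"] = False
            self.rdt = self.next_to_clean    # the "Update RDT" step
            self.next_to_clean = (self.next_to_clean + 1) % self.n
        return packets


ring = RxRingModel(6)
ring.nic_receive(b"pkt-1")
print(ring.rdh, ring.rdt)    # 1 5
print(ring.poll())           # [b'pkt-1']
print(ring.rdh, ring.rdt)    # 1 0
```

Software never reads RDH in this model; it discovers completed packets purely by polling the DD bit, which is the pattern the figure shows.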
Transmit from SW side
(Diagram: TX descriptor ring with TDBA = 0, TDLEN = 6, TDH = 1, TDT = 5. Each descriptor carries a DD bit and the addr of an mbuf (mbuf1, mbuf2); TDH and TDT advance around the ring.)
NUMA
(Diagram: Socket 0 (CPU 0) and Socket 1 (CPU 1), each with Cores, a Memory controller with its own Memory, and an I/O controller with PCI-E links; the two sockets are connected by QPI.)
RSS (Receive Side Scaling)
(Diagram: Incoming traffic -> Hash function -> Indirection table -> Queue 0 ... Queue N -> CPU N.)
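The hash-then-indirection path can be sketched as below. RSS implementations commonly use a Toeplitz hash over the address and port fields; the 40-byte key, the 128-entry table and the four queues here are example values chosen for the sketch, not taken from the slides.

```python
import socket
import struct

# Example 40-byte hash key; any key of this length works for the sketch.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)
RETA_SIZE = 128   # assumed indirection table size (must be a power of two here)
N_QUEUES = 4      # assumed number of RX queues

# Indirection table: spread the entries round-robin over the queues.
reta = [i % N_QUEUES for i in range(RETA_SIZE)]


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key windows selected by each set bit of data."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):                      # bits are taken MSB first
                shift = key_bits - 32 - (i * 8 + b)     # window starts at this bit
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input for IPv4 with ports: addresses, then ports, network order."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Pick the RX queue: low bits of the hash index the indirection table."""
    h = toeplitz_hash(RSS_KEY, flow_bytes(src_ip, dst_ip, src_port, dst_port))
    return reta[h & (RETA_SIZE - 1)]


print(rss_queue("10.0.0.1", "10.0.0.2", 12345, 80))
```

Because the queue depends only on the flow fields, all packets of one flow land on the same queue and therefore on the same core, while rewriting `reta` rebalances flows without changing the hash.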
Flow director
(Diagram: Incoming traffic -> Hash function -> Filter table -> Drop, or Route to Queue 0 ... Queue N -> CPU N. Outgoing traffic is also shown as an input.)
Virtualization - SR-IOV
(Diagram: VM1 and VM2 each run a VF driver; the VMM runs the PF driver; the NIC exposes two VFs and a PF joined by a Virtual bridge.)
Slow path using bifurcated driver
(Diagram: Kernel and DPDK side by side above the NIC; on the NIC, a VF and a PF joined by a Virtual bridge, with a Filter table.)
Slow path using TAP
(Diagram: App, DPDK and Ring buffers in user space; TAP device and TCP/IP stack in kernel space; RX/TX queues on the NIC.)
Slow path using KNI
(Diagram: App, DPDK and Ring buffers in user space; KNI device and TCP/IP stack in kernel space; RX/TX queues on the NIC.)
Application 1 - Traffic generator
(Diagram: x86 HW running a Streams generator and a Traffic analyzer in user space, connected to the DUT.)
Application 2 - Router
(Diagram: x86 HW between DUT1 and DUT2; Routing table in the kernel, Routing table cache in user space.)
Application 3 - Middlebox
(Diagram: x86 HW between DUT1 and DUT2; DPI in user space.)

Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 

Understanding DPDK

  • 1. Understanding DPDK: a description of the techniques used to achieve high throughput on commodity hardware.
  • 2. How fast does software have to work? 14.88 million 64-byte packets per second on a 10G interface. At 1.8 GHz, 1 cycle = 0.55 ns, so 1 packet -> 67.2 ns = about 120 clock cycles. Frame on the wire: IFG (12) + Preamble (8) + DST MAC, SRC MAC, Type, Payload (60) + CRC (4) = 84 Bytes.
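
The figures on slide 2 can be rechecked with a few lines of arithmetic. This is a minimal sketch; the constant names are illustrative, and the 84-byte wire size and 1.8 GHz clock are taken from the slide.

```python
# Per-packet time budget for minimum-size frames on 10G Ethernet.
LINK_BPS = 10e9                 # 10 Gbit/s line rate
WIRE_BYTES = 12 + 8 + 60 + 4    # IFG + preamble + frame without CRC + CRC = 84
CPU_HZ = 1.8e9                  # clock rate used on the slide

packets_per_second = LINK_BPS / (WIRE_BYTES * 8)
ns_per_packet = 1e9 / packets_per_second
cycles_per_packet = ns_per_packet * CPU_HZ / 1e9

print(f"{packets_per_second / 1e6:.2f} Mpps")
print(f"{ns_per_packet:.1f} ns per packet")
print(f"{cycles_per_packet:.2f} cycles per packet")
```

The cycle count comes out just under 121, which the slide rounds to 120.
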
  • 3. Comparative speed values: CPU to memory speed = 6-8 GBytes/s; PCI-Express x16 speed = 5 GBytes/s; access to RAM = 200 ns; access to L3 cache = 4 ns; context switch ~= 1000 ns (3.2 GHz).
  • 4. Packet processing in Linux (diagram). User space: App. Kernel space: Socket, Ring buffers, Driver. NIC: RX/TX queues.
  • 5. Linux kernel overhead: system calls; context switching on blocking I/O; data copying from kernel to user space; interrupt handling in the kernel.
  • 6. Expense of sendto
      Function        Activity                               Time (ns)
      sendto          system call                                   96
      sosend_dgram    lock sock_buff, alloc mbuf, copy in          137
      udp_output      UDP header setup                              57
      ip_output       route lookup, ip header setup                198
      ether_output    MAC lookup, MAC header setup                 162
      ixgbe_xmit      device programming                           220
      Total                                                        950
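
To put the sendto cost in context, the sketch below compares it with the 14.88 Mpps line rate from slide 2. The six listed rows add up to 870 ns rather than 950 ns; the remaining 80 ns presumably belongs to steps the slide does not list, which is an assumption here. The variable names are illustrative.

```python
# Cost of one sendto() call, per function, as listed on the slide (ns).
costs_ns = {
    "sendto": 96,
    "sosend_dgram": 137,
    "udp_output": 57,
    "ip_output": 198,
    "ether_output": 162,
    "ixgbe_xmit": 220,
}

listed_total_ns = sum(costs_ns.values())
SLIDE_TOTAL_NS = 950      # total printed on the slide
REQUIRED_MPPS = 14.88     # 10G line rate with 64-byte packets (slide 2)

# One core spending 950 ns per packet sends at most this many Mpps.
max_mpps_one_core = 1e3 / SLIDE_TOTAL_NS
shortfall = REQUIRED_MPPS / max_mpps_one_core

print(f"listed rows sum to {listed_total_ns} ns")
print(f"one core: {max_mpps_one_core:.2f} Mpps, {shortfall:.1f}x below line rate")
```
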
  • 7. Packet processing with DPDK (diagram). User space: App, DPDK, Ring buffers. Kernel space: UIO driver. NIC: RX/TX queues.
  • 8. Updating a register in Linux (diagram). User space: ioctl() -> syscall. Kernel space: VFS -> copy_from_user() -> iowrite(). HW: Register.
  • 9. Updating a register with DPDK (diagram). User space: assign. HW: Register.
  • 10. What is used inside DPDK? Processor affinity (separate cores); huge pages (no swap, TLB); UIO (no copying from kernel); polling (no interrupt overhead); lockless synchronization (avoid waiting); batch packet handling; SSE, NUMA awareness.
  • 11. Linux default scheduling (diagram): threads t1, t2, t3 and t4 scheduled across Core 0, Core 1, Core 2 and Core 3.
  • 12. How to isolate a core for a process. To diagnose, use top: run “top”, press “f”, press “j”. Before boot, use isolcpus: “isolcpus=2,4,6”. After boot, use cpuset: “cset shield -c 1-3”, “cset shield -k on”.
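
Slide 12 keeps other work off a core; the complementary step is pinning the packet-processing process onto that core (the "processor affinity" item on slide 10). A minimal sketch using Python's `os.sched_setaffinity`, which is available on Linux only; the helper name `pin_to_one_cpu` and the choice of the highest-numbered allowed CPU are assumptions for illustration.

```python
import os

def pin_to_one_cpu(pid=0):
    """Pin a process (0 = the caller) to a single CPU it may already run on.

    Returns the chosen CPU id, or None where the platform has no
    sched_setaffinity (for example macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)   # CPUs the process may currently use
    cpu = max(allowed)                    # illustrative choice: highest-numbered one
    os.sched_setaffinity(pid, {cpu})      # restrict the process to that CPU
    return cpu

if __name__ == "__main__":
    print("pinned to CPU", pin_to_one_cpu())
```

Pinning alone does not stop the scheduler from placing other tasks on the same core; that is what isolcpus or cset shield are for.
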
  • 13. Run-to-completion model (diagram): one RX/TX thread on Core 1 and one RX/TX thread on Core 2, serving Port 1 and Port 2.
  • 14. Pipeline model (diagram): an RX thread and a TX thread on separate cores (Core 1, Core 2), connected by a ring, serving Port 1 and Port 2.
  • 15. Page tables tree, Linux paging model: cr3 -> Page Global Directory -> Page Middle Directory -> Page Table -> Page.
  • 17. TLB characteristics ($ cpuid | grep -i tlb): size 12–4,096 entries; hit time 0.5–1 clock cycle; miss penalty 10–100 clock cycles; miss rate 0.01–1%. It is a very expensive resource!
  • 18. Solution: hugepages. Benefit: optimized TLB usage, no swap. Hugepage size = 2M. Usage: mount hugetlbfs at /mnt/huge, then mmap. Library: libhugetlbfs.
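
The TLB benefit of hugepages can be illustrated with a little arithmetic. In this sketch the 64-entry TLB is a hypothetical value inside the 12–4,096 range from slide 17, 4 KiB is the usual x86 base page size, and the helper names are illustrative.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3
TLB_ENTRIES = 64   # hypothetical TLB size, within the range quoted on slide 17

def tlb_reach(entries, page_size):
    """Bytes of memory addressable without a TLB miss."""
    return entries * page_size

def pages_needed(region_size, page_size):
    """Number of pages (and so TLB entries) needed to map a region."""
    return -(-region_size // page_size)   # ceiling division

for name, page_size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB)):
    reach_kib = tlb_reach(TLB_ENTRIES, page_size) // KIB
    pages = pages_needed(1 * GIB, page_size)
    print(f"{name} pages: TLB reach {reach_kib} KiB, {pages} pages per GiB")
```

With 2 MiB pages the same TLB covers 512 times more memory, so a packet buffer pool is far less likely to miss.
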
  • 19. Lockless ring design: a writer can preempt a writer and a reader; a reader cannot preempt a writer; reader and writer can work simultaneously on different cores; barrier; CAS operation; bulk enqueue/dequeue.
  • 20. Lockless ring (Single Producer): diagram in three steps showing the positions of cons_head, cons_tail, prod_head, prod_tail and prod_next.
  • 21. Lockless ring (Single Consumer): diagram in three steps showing the positions of cons_head, cons_tail, cons_next, prod_head and prod_tail.
  • 22. Lockless ring (Multiple Producers): diagram in five steps showing the positions of cons_head, cons_tail, prod_head, prod_tail, prod_next1 and prod_next2.
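
The head/tail scheme drawn on slides 20 to 22 can be sketched as follows. This is a simplified model, not DPDK's rte_ring code: Python has no hardware CAS, so `AtomicIndex.compare_and_swap` emulates one with a lock, and the class and method names are illustrative. A producer first reserves slots by moving prod_head with CAS, then copies its items, then publishes them by moving prod_tail once earlier producers have finished.

```python
import threading
import time

class AtomicIndex:
    """Integer index with an emulated compare-and-swap (assumption: a lock
    stands in for the CPU's atomic CAS instruction)."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value != expected:
                return False
            self.value = new
            return True

class Ring:
    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.size, self.mask = size, size - 1
        self.slots = [None] * size
        self.prod_head, self.prod_tail = AtomicIndex(), AtomicIndex()
        self.cons_head, self.cons_tail = AtomicIndex(), AtomicIndex()

    def enqueue_bulk(self, items):
        n = len(items)
        while True:
            head = self.prod_head.value
            free = self.size + self.cons_tail.value - head
            if n > free:
                return False                      # not enough room: enqueue nothing
            if self.prod_head.compare_and_swap(head, head + n):
                break                             # slots head .. head+n-1 are reserved
        for i, item in enumerate(items):
            self.slots[(head + i) & self.mask] = item
        while self.prod_tail.value != head:       # wait for earlier producers
            time.sleep(0)
        self.prod_tail.value = head + n           # publish to consumers
        return True

    def dequeue_bulk(self, n):
        while True:
            head = self.cons_head.value
            available = self.prod_tail.value - head
            if n > available:
                return None                       # not enough items: dequeue nothing
            if self.cons_head.compare_and_swap(head, head + n):
                break
        items = [self.slots[(head + i) & self.mask] for i in range(n)]
        while self.cons_tail.value != head:       # wait for earlier consumers
            time.sleep(0)
        self.cons_tail.value = head + n           # release the slots to producers
        return items
```

A short usage example: `ring = Ring(4)`, then `ring.enqueue_bulk([1, 2, 3])` reserves and fills three slots, and `ring.dequeue_bulk(2)` hands back the first two items in order. The indices grow without bound and are masked only on slot access, which keeps the free and available computations simple.
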
  • 23. Kernel space network driver (diagram). User space: App. Kernel space: IP stack, Driver. NIC. Flows labelled Data, Desc, Config and Interrupts.
  • 24. UIO: “The most important devices can’t be handled in user space, including, but not limited to, network interfaces and block devices.” - LDD3
  • 25. UIO (diagram). User space: App, US driver, using epoll() and mmap(). Interface: sysfs, /dev/uioX. Kernel space: UIO framework, driver.
  • 26. Access to device from user space. NIC configuration registers: Vendor Id, Device Id, Command, Status, Revision Id, ..., BAR0 (Mem), BAR1, BAR2 (IO), BAR3, BAR4, BAR5. I/O and memory regions: /sys/class/uio/uioX/maps/mapX, /sys/class/uio/uioX/portio/portX, /dev/uioX -> mmap (offset). Devices: /sys/bus/pci/devices.
  • 27. DMA RX (diagram). NIC memory: RX FIFO, RX queue. Host memory: Descriptor ring, Memory. Labels: Update RDT, DMA descriptor(s), DMA packet, DMA descriptors.
  • 28. DMA TX (diagram). NIC memory: TX FIFO, TX queue. Host memory: Descriptor ring, Memory. Labels: Update TDT, DMA descriptor(s), DMA packet, DMA descriptors.
  • 29. Receive from SW side (diagram): a descriptor ring with RDBA = 0, RDLEN = 6, RDH = 1, RDT = 5; descriptors carry a DD flag and mbuf addresses (mbuf1 addr, mbuf2 addr); RDH and RDT are shown at successive positions for mbuf1 and mbuf2.
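
The interplay of RDH, RDT and the DD flag on slide 29 can be modelled in a few lines. This is a simplified sketch of a head/tail descriptor ring, not a driver: the class and method names are illustrative, buffers are plain byte arrays standing in for mbufs, and the rule "hardware may fill descriptors from RDH up to, but not including, RDT" is the assumption the model is built on.

```python
from dataclasses import dataclass, field

@dataclass
class RxDescriptor:
    buf: bytearray = field(default_factory=lambda: bytearray(2048))  # stands in for an mbuf
    length: int = 0
    dd: bool = False          # "descriptor done": set by hardware after the DMA write

class RxRing:
    def __init__(self, n):
        self.n = n
        self.desc = [RxDescriptor() for _ in range(n)]
        self.rdh = 0              # head, advanced by hardware
        self.rdt = n - 1          # tail, advanced by software
        self.next_to_clean = 0    # next descriptor software will inspect

    def hw_receive(self, payload):
        """NIC side: place one packet in the descriptor at RDH."""
        if self.rdh == self.rdt:
            return False                          # no descriptor available: drop
        d = self.desc[self.rdh]
        d.buf[:len(payload)] = payload            # models the packet DMA (payload <= 2048 bytes)
        d.length, d.dd = len(payload), True       # models the descriptor write-back
        self.rdh = (self.rdh + 1) % self.n
        return True

    def sw_poll(self, max_burst):
        """Software side: collect completed packets, no interrupts involved."""
        packets = []
        while len(packets) < max_burst:
            d = self.desc[self.next_to_clean]
            if not d.dd:
                break                             # hardware has not finished this one
            packets.append(bytes(d.buf[:d.length]))
            d.dd = False                          # buffer is reused as the refill
            self.rdt = self.next_to_clean         # hand the descriptor back to hardware
            self.next_to_clean = (self.next_to_clean + 1) % self.n
        return packets
```

In a poll-mode driver the equivalent of `sw_poll` runs in a tight loop on a dedicated core, and the tail update is a single register write, as on slide 9.
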
  • 30. Transmit from SW side (diagram): a descriptor ring with TDBA = 0, TDLEN = 6, TDH = 1, TDT = 5; descriptors carry a DD flag and mbuf addresses (mbuf1 addr, mbuf2 addr); TDH and TDT are shown at successive positions for mbuf1 and mbuf2.
  • 31. NUMA (diagram): Socket 0 holds CPU 0 and Socket 1 holds CPU 1; each CPU has cores, a memory controller with its own memory, and an I/O controller with PCI-E links; the two sockets are connected by QPI.
  • 32. RSS (Receive Side Scaling): incoming traffic -> hash function -> indirection table -> Queue 0 ... Queue N, each served by a CPU.
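
The RSS path on slide 32 (hash, then indirection table, then queue) can be sketched as below. Real NICs typically compute a Toeplitz hash in hardware; here `zlib.crc32` is only a stand-in, and the table size of 128, the four queues and the function names are assumptions made for illustration.

```python
import socket
import struct
import zlib

NUM_QUEUES = 4
TABLE_SIZE = 128   # assumed indirection table size

# Default layout: spread table entries round-robin over the queues.
indirection_table = [i % NUM_QUEUES for i in range(TABLE_SIZE)]

def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Hash the flow tuple (crc32 as a stand-in for the NIC's hash function)."""
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HH", src_port, dst_port))
    return zlib.crc32(key)

def select_queue(src_ip, dst_ip, src_port, dst_port):
    """Low-order bits of the hash index the table; the entry names the RX queue."""
    return indirection_table[flow_hash(src_ip, dst_ip, src_port, dst_port) % TABLE_SIZE]

if __name__ == "__main__":
    for port in range(1000, 1005):
        print(port, "-> queue", select_queue("10.0.0.1", "10.0.0.2", port, 80))
```

Because the queue depends only on the flow tuple, all packets of one flow land on the same queue and core, which preserves per-flow ordering; rewriting table entries rebalances load without changing the hash.
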
  • 33. Flow director (diagram): incoming traffic -> hash function -> filter table -> Route (to Queue 0 ... Queue N, each served by a CPU) or Drop; outgoing traffic is also shown.
  • 34. Virtualization, SR-IOV (diagram): VM1 and VM2 each run a VF driver; the VMM runs the PF driver; the NIC exposes two VFs and a PF joined by a virtual bridge.
  • 35. Slow path using bifurcated driver (diagram): on the NIC, a filter table and a virtual bridge split traffic between the PF (Kernel) and a VF (DPDK).
  • 36. Slow path using TAP (diagram). User space: App, DPDK, Ring buffers. Kernel space: TAP device, TCP/IP stack. NIC: RX/TX queues.
  • 37. Slow path using KNI (diagram). User space: App, DPDK, Ring buffers. Kernel space: KNI device, TCP/IP stack. NIC: RX/TX queues.
  • 38. Application 1, traffic generator (diagram): on x86 HW, in user space, a streams generator and a traffic analyzer connected to the DUT.
  • 39. Application 2, router (diagram): on x86 HW, a routing table in the kernel and a routing table cache in user space, between DUT1 and DUT2.
  • 40. Application 3, middlebox (diagram): on x86 HW, DPI in user space, between DUT1 and DUT2.
  • 41. References: Device Drivers in User Space; Userspace I/O drivers in a realtime context; The Userspace I/O HOWTO; The anatomy of a PCI/PCI Express kernel driver; From Intel® Data Plane Development Kit to Wind River Network Acceleration Platform; DPDK Design Tips (Part 1 - RSS); Getting the Best of Both Worlds with Queue Splitting (Bifurcated Driver); Design considerations for efficient network applications with Intel® multi-core processor-based systems on Linux; Introduction to Intel Ethernet Flow Director.