DPDK: Multi-Architecture, High-Performance Packet Processing
M Jay
DPDK Presentation, March 1, 2017
Legal Disclaimer

Technology Disclaimer: Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Performance Disclaimers: Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

General Disclaimer: © Copyright 2017 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Inside, the Intel Inside logo, and Experience What's Inside are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
Agenda
•  DPDK – Multi Architecture Support
•  Why DPDK? - Optimizing Cycles per Packet
•  DPDK – Building Block for OVS/NFV
•  Enhancing OVS/NFV Infrastructure
•  Call To Action
What Problem Does DPDK Address?

40 Gbps line rate (or 4x10G): Rx -> Process Packet -> Tx

Packet size             64 bytes                1024 bytes
40G packets/second      59.5 million each way   4.8 million each way
Packet arrival rate     16.8 ns                 208.8 ns
2 GHz clock cycles      33 cycles               417 cycles

[Chart: packets per second vs. packet size (B), contrasting typical server packet sizes with network infrastructure packet sizes.]
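These figures follow directly from framing arithmetic. As a sketch, here is a small standalone C program that reproduces them, assuming the standard 20 bytes of per-frame overhead on the wire (7-byte preamble + 1-byte SFD + 12-byte inter-frame gap):

```c
#include <stdio.h>

int main(void)
{
    const double line_rate_bps = 40e9;   /* 40 GbE */
    const double cpu_hz        = 2e9;    /* 2 GHz core clock */
    const int    frame_bytes[] = { 64, 1024 };

    for (int i = 0; i < 2; i++) {
        /* On-wire bits = frame + preamble/SFD/IFG overhead. */
        double wire_bits  = (frame_bytes[i] + 20) * 8.0;
        double pps        = line_rate_bps / wire_bits;
        double arrival_ns = 1e9 / pps;
        double cycles     = cpu_hz / pps;
        printf("%4d B: %.1f Mpps, %.1f ns arrival, %.1f cycles/packet\n",
               frame_bytes[i], pps / 1e6, arrival_ns, cycles);
    }
    return 0;
}
```

For 64-byte frames this yields 59.5 Mpps, a 16.8 ns arrival time, and a budget of roughly 33 cycles per packet, matching the table.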
Packet Processing

Input Packet A -> Look up packet A -> Do the "desired" action
Input Packet B -> Look up packet B -> Do the "desired" action

Inter-packet arrival time at line rate, 64-byte packets:
10 GbE     67.2 ns
40 GbE     16.8 ns
100 GbE     6.7 ns

Rx budget = 19 cycles. Tx budget = 28 cycles.
DPDK Framework (diagram)

User space:
  Core: EAL, MBUF, MEMPOOL, RING, TIMER
  PMDs (native & virtual), under ETHDEV: E1000, IGB, IXGBE, I40E, FM10K, VMXNET3, VIRTIO, XENVIRT, PCAP, MLX4, MLX5, ENIC, CXGBE, BNX2X, ENA, THUNDERX, BNXT, QEDE, NFP, MPIPE, SZEDATA2, RING, NULL, AF_PKT, BONDING, VHOST
  Classification: HASH, LPM, ACL
  Extensions: KNI, JOBSTAT, DISTRIB, IP FRAG, REORDER, POWER, VHOST, IVSHMEM, PDUMP
  QoS: SCHED, METER
  Pkt Framework: PIPELINE, PORT, TABLE
  Accelerators, under CRYPTODEV: QAT, AESNI MB, AESNI GCM, SNOW 3G, KASUMI, ZUC, OpenSSL, NULL
Kernel: KNI, IGB_UIO, VFIO, UIO_PCI_GENERIC
On top: Network Functions (Cloud, Enterprise, Comms)

(Several of the components above are marked "New in 16.11" in the original diagram.)
DPDK Sample Apps (diagram)

Hello World, CLI, Timer, DPDK Multi Process, Ethtool, Exception Path, Skeleton, Packet Ordering,
L2fwd, L2fwd Jobstats, L2fwd IVSHMEM, L2fwd Keep Alive, L2fwd Crypto, L2fwd CAT, L2fwd Perf Thread,
L3fwd, L3fwd VF, L3fwd Power, L3fwd ACL,
Link Status Interrupt, Load Balancer, KNI, IPv4 Multicast, Packet Distrib, IP Pipeline,
IP Reass, IP Frag, QoS Sched, QoS Meter, Bond, VMDq, VMDq DCB, PTP Client,
RxTx Callbacks, Quota & W'mark, TEP Term, Vhost, Vhost Xen, VM Power Manager, Netmap, QAT, IPsec Sec GW
DPDK Acceleration Enhancements (diagram)

Above the DPDK API: traffic generators (Pktgen, T-Rex, Moongen, ...), vSwitches (OVS, Lagopus, ...), network stacks (libUNS, mTCP, SeaStar, libuinet, TLDK, ...), DPDK example apps, VNF apps, proxy apps, video apps, threading models (lthreads, ...), event-based programming models, future features.

Legacy DPDK framework:
  Core libraries: EAL, MALLOC, MBUF, MEMPOOL, RING, TIMER
  Platform: KNI, POWER, IVSHMEM
  Classify: LPM, ACL, Classification, HASH
  QoS: METER, SCHED
  Pipeline: PIPELINE, TABLE
  Packet access (PMD), under ETHDEV: e1000, ixgbe, i40e, fm10k, bonding, af_pkt, xenvirt, enic, ring, cxgbe, vmxnet3, virtio, mlx4, memnic, others
  Utilities: IP Frag, CMDLINE, JOBSTAT, KVARGS, REORDER

Acceleration enhancements: crypto (AES-NI, future accelerators), programmable classifier/parser HW, 3rd-party GPU/FPGA, 3rd-party SoC PMD (SoC model, SoC HW), external mempool manager, compression (3rd-party HW/SW), IPSec, DPI (Hyperscan).
DPDK in OS Distros

DPDK is also available as part of the following OS distributions (distribution names appear as logos in the original slide):
  Version 7.1 & higher
  Version 7.1 +
  Version 15.10 +
  Version 10.1 +
  Version 22 +
  Version 6 +
  Clear Linux version 7160 +
What Is The Task At Hand?

Receive (rx cost) -> Process -> Transmit (tx cost)

A chain is only as strong as …..
Benefits – Eliminating / Hiding Overheads

Eliminating                                How?
Interrupt / context-switch overhead        Polling
Kernel-user overhead                       User-mode driver
Core-to-thread scheduling overhead         Pthread affinity

Eliminating / hiding                       How?
4K paging overhead                         Huge pages
PCI bridge I/O overhead                    Lockless inter-core communication; high-throughput bulk-mode I/O calls

To tackle this challenge, what kinds of devices and latencies do we have at our disposal?
DPDK Generational Performance Gains

IPv4 L3 forwarding performance, 64-byte packets:

Year / Platform    L3fwd performance (Mpps)    Throughput (Gbps)
2010 (2S WMR)        55                          37
2011 (1S SNB)        80.1                        53.8
2012 (2S SNB)       164.9                       110.8
2013 (2S IVB)       255                         171.4
2014 (2S HSW)       279.9                       187.2
2015 (2S BDW)       346.7                       233

Broadwell-EP system configuration:
Hardware
  Platform: SuperMicro X10DRX
  CPU: Intel® Xeon® Processor E5-2658 v4
  Chipset: Intel® C612
  Sockets: 2
  Cores per socket: 14 (28 threads)
  LL cache: 30 MB
  QPI/DMI: 9.6 GT/s
  PCIe: Gen3 x8
  Memory: DDR4 2400 MHz, 1Rx4 8 GB (64 GB total), 4 channels per socket
  NIC: 10 x Intel® Ethernet CNA XL710-QDA2, PCI Express Gen3 x8, dual-port 40 GbE (1x40G per card)
  NIC Mbps: 40,000
  BIOS: version 1.0c (02/12/2015)
Software
  OS: Debian 8.0
  Kernel version: 3.18.2
  Other: DPDK 2.2.0

Disclaimer: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
What are the top two key performance metrics?
1) Latencies come in all shapes and sizes – what to do?

RTE_PREFETCH
What About Latency?
•  MIT* white paper on Fastpass
•  Dream of a system with ZERO queueing
•  The ultimate testimonial for latency
See how DPDK can address your latency concerns: http://fastpass.mit.edu
2) Throughput
Why go for 1s and 2s when you can take a truckload with the BULK API?
Great White Paper – A Must-Read
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-multicore-packet-processing-paper.pdf
Core Components Architecture
DPDK Model (diagram)

User space: DPDK PMDs (one per 10GbE port) feed either the L3 Forward app directly, RYO (roll-your-own) stacks, an "open-source stack" (NetBSD), or stacks available from the ecosystem, spanning the Ethernet / IP / TCP / session / presentation / application layers.
Kernel: SKbuff over 4K pages (64), IGB-UIO, the IGB and IXGBE drivers, and KNI as the path back into the kernel stack.

•  Run-to-completion model on each core used
•  Intel® DPDK allocates packet memory equally across 2, 3, or 4 channels, aligned to have an equal load over the channels
•  Packet buffers (60K x 2K buffers) and events (2K x 100B buffers); rings for cached buffers
•  Per-core lists, unique per lcore, allow packet movement without locks
•  2 MB / 1 GB huge pages for cache-aligned structures
High Performance Components of DPDK

Environment Abstraction Layer
•  Abstracts the huge-page file system, provides multi-thread and multi-process support, etc.
Memory Manager
•  Responsible for allocating pools of objects in memory. A pool is created in huge page
memory space and uses a ring to store free objects. It also provides an alignment
helper to ensure that objects are padded to spread them equally on all DRAM channels.
Buffer Manager
•  Significantly reduces the time the operating system spends allocating and de-allocating buffers. The Intel® DPDK pre-allocates fixed-size buffers, which are stored in memory pools (creation is sketched below).
Queue Manager
•  Implements safe, lockless queues (instead of spinlocks) that allow different software components to process packets while avoiding unnecessary wait times.
Flow Classification
•  Provides an efficient mechanism which incorporates Intel® Streaming SIMD Extensions
(Intel® SSE) to produce a hash based on tuple information so that packets may be
placed into flows quickly for processing, thus greatly improving throughput.
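As a concrete sketch of the Buffer and Memory Managers in use, the snippet below creates an mbuf pool backed by huge-page memory with a per-lcore cache; the pool and cache sizes here are illustrative, not prescriptive:

```c
#include <stdlib.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_errno.h>
#include <rte_debug.h>

/* Create a pool of 8191 packet buffers in huge-page memory.
 * A per-lcore cache of 256 objects keeps most get/put operations
 * off the shared lockless ring. Sizes are illustrative. */
struct rte_mempool *
create_pktmbuf_pool(int socket_id)
{
    struct rte_mempool *mp;

    mp = rte_pktmbuf_pool_create("mbuf_pool",
                                 8191,   /* number of mbufs  */
                                 256,    /* per-lcore cache  */
                                 0,      /* app private area */
                                 RTE_MBUF_DEFAULT_BUF_SIZE,
                                 socket_id);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed: %s\n",
                 rte_strerror(rte_errno));
    return mp;
}
```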
L1 Cache With 4-Cycle Latency

Read packet descriptor -> L1 cache hit (Core 0, latency: 4 cycles)

With 4 cycles of latency, achieving the Rx budget of 19 cycles seems within reach.
Challenge: What If There Is an L1 Cache Miss and an LLC Hit?

Read packet descriptor -> L1 cache miss (Core 0) -> L2 cache -> last level cache hit (latency: 40 cycles)

With a 40-cycle LLC hit, how will you achieve the Rx budget of 19 cycles? How?
EAL Initialization in a Linux Application Environment
DPDK: Overview of Components

EAL is primarily initialization code (a minimal bootstrap is sketched after this slide):
•  Bootstrap processor startup, MP-init
•  PCI scan for supported devices (NIC, CPM)
•  Console, keyboard, other services initialization
It ends with each logical core (execution unit) running its own dispatch loop.
•  Typically the bootstrap core (EU0) initializes all foundation library services

Generalized overview (diagram): the application's INIT sits on the foundation libraries – Queue Mgmt API, Buffer Mgmt API, Classification API, Poll Mode Driver API – over the Environment Abstraction Layer (EAL), the hardware, and the NIC.

Example of an IPv4 L3 forwarding instantiation (diagram): a NIC driver and a Classify-and-Forward app over the same APIs, EAL, hardware, and NIC.
•  The EAL dispatch loop has two applications: the NIC driver and the Classify+Forward app
•  They can run on one core or be split across multiple cores
•  The Queue Management library serves as the means of communication between the applications
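In application code, that initialization sequence reduces to the standard rte_eal_init() pattern. A minimal sketch (the per-lcore loop body is a placeholder):

```c
#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_launch.h>
#include <rte_lcore.h>
#include <rte_debug.h>

/* Placeholder per-lcore dispatch loop. */
static int
lcore_main_loop(void *arg)
{
    (void)arg;
    printf("lcore %u entering its dispatch loop\n", rte_lcore_id());
    /* ... poll queues, process packets, run to completion ... */
    return 0;
}

int
main(int argc, char **argv)
{
    /* EAL consumes its own arguments (core mask, huge pages, PCI
     * options, ...) and returns how many it consumed. */
    int ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");
    argc -= ret;
    argv += ret;

    /* The bootstrap (master) lcore launches the loop on every other
     * lcore, then runs it itself. */
    rte_eal_mp_remote_launch(lcore_main_loop, NULL, CALL_MASTER);
    rte_eal_mp_wait_lcore();
    return 0;
}
```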
Ethernet Device Framework (diagram)

Application (calls the rte_ethdev API with Port ID, Queue ID):
  rte_eth_rx_burst(…)  <- packet flow in
  rte_eth_tx_burst(…)  -> packet flow out
PMD layer (PMD-specific context):
  rrc_recv_pkts(…)
  rrc_xmit_pkts(…)
Network H/W (descriptors)
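Before rte_eth_rx_burst()/rte_eth_tx_burst() can be called, the port must be configured and its queues bound to a mempool. A setup sketch following the usual ethdev sequence (queue counts and descriptor ring sizes are illustrative; the uint8_t port ID matches the 16.11-era API):

```c
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Configure port `port_id` with one Rx and one Tx queue and start it.
 * Descriptor ring sizes are illustrative. Returns 0 on success. */
static int
port_init(uint8_t port_id, struct rte_mempool *mbuf_pool)
{
    struct rte_eth_conf port_conf = { 0 };  /* default configuration */
    int ret;

    ret = rte_eth_dev_configure(port_id, 1 /* rxq */, 1 /* txq */,
                                &port_conf);
    if (ret != 0)
        return ret;

    ret = rte_eth_rx_queue_setup(port_id, 0, 128,
                                 rte_eth_dev_socket_id(port_id),
                                 NULL, mbuf_pool);
    if (ret < 0)
        return ret;

    ret = rte_eth_tx_queue_setup(port_id, 0, 512,
                                 rte_eth_dev_socket_id(port_id),
                                 NULL);
    if (ret < 0)
        return ret;

    ret = rte_eth_dev_start(port_id);   /* PMD takes over the port */
    if (ret < 0)
        return ret;

    rte_eth_promiscuous_enable(port_id);
    return 0;
}
```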
Userspace I/O (UIO)
https://www.kernel.org/doc/htmldocs/uio-howto/
UIO Picture (diagram)

Kernel: uio.ko, igb_uio.ko
User: DPDK application -> ETHDEV -> PMD -> EAL
H/W: PCI BARs 0/2/4, mapped into the application via mmap()
UIO

Only one small kernel module to write and maintain (igb_uio.ko).
Develop the main part of the driver in user space, with all the tools and libraries you're used to.
Bugs in the driver won't crash the kernel.
Updates of the driver can take place without recompiling the kernel.

The user/admin binds PCI devices to igb_uio.
The UIO framework creates /dev/uioX and sysfs files describing the BAR regions (address, size).
DPDK scans the PCI bus looking for devices matching any of its PMDs.
If a matching driver is found, DPDK maps the BAR regions into user space and calls the initialization function originally registered by the PMD.
Poll Mode Driver (PMD) - Rx & Tx Overview
A 30,000 ft overview of packet flow (diagram: Initialization -> RX -> TX, with a polling loop and packets to send):

1.  Initialization
    o  Init memory zones and pools
    o  Init devices and device queues
    o  Start the packet forwarding application
2.  Packet Reception (RX)
    o  Poll the devices' RX queues and receive packets in bursts
    o  Allocate new RX buffers from per-queue memory pools to stuff into descriptors
3.  Packet Transmission (TX)
    o  Transmit the received packets from RX
    o  Free the buffers that were used to store the packets
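A run-to-completion sketch of that loop using the burst APIs (port and queue IDs are illustrative; a real application would also retry or count unsent packets):

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Echo traffic between two ports: receive a burst on one port,
 * transmit it on the other, and free anything the TX ring refused. */
static void
forward_loop(uint8_t rx_port, uint8_t tx_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* 2. RX: poll queue 0, receive up to BURST_SIZE packets. */
        uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* 3. TX: hand the burst to the other port's queue 0. */
        uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);

        /* Free any packets the TX queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
```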
Rx Overview

1.  CPU writes Rx descriptor
2.  NIC reads Rx descriptor to get the buffer address
3.  NIC writes Rx packet to the buffer address
4.  NIC writes Rx descriptor
5.  CPU reads Rx descriptor (polling)

[Diagram: cores and LLC on one side of PCIe; memory holding RX descriptors, TX descriptors, and buffers on the other; steps 1-5 annotated on the transfers.]
Tx Overview

1.  CPU writes data
2.  CPU writes Tx descriptor
3.  NIC reads Tx descriptor to get the buffer address
4.  NIC reads Tx packet from the buffer address
5.  NIC writes Tx descriptor
6.  CPU reads Tx descriptor

[Diagram: same core/LLC/PCIe/memory layout, with steps 1-6 annotated.]
Why Packet Framework?

•  Intel devices with increased acceleration capability need to be complemented by SW to enable complete functionality
•  Intel DPDK provides highly optimized SW primitives that can be further accelerated by Intel HW

Intel DPDK Packet Framework combines (diagram): software (Intel DPDK components); custom FPGAs (Mooresville / Columbia Park); Intel NICs (Fortville, Niantic); Intel switches (White Rock Canyon, Black Rock Canyon, Red Rock Canyon); Intel chipsets (Bell Creek / Lewisburg, Coleto Creek, Cave Creek); Intel SoCs (Broadwell-DE, Rangeley, Gladden).

Combine the best Intel HW with the best Intel SW to achieve the best functionality and performance.
What Is It?

A standard methodology for pipeline development.

(Diagram: Port In 0 and Port In 1 feed Table 0; each table entry maps a Flow # to Actions; Table 0 feeds Table 1, which feeds Port Out 0, Port Out 1, and Port Out 2.)

Ports and tables are connected together in tree-like topologies, with tables providing the actions to be executed on input packets.
Example: Actions

•  Assigned per table
•  Executed in priority order on all packets that share the current action before moving to the next action (as opposed to executing all actions for one packet at a time)
•  If (fn0) call the next fn(), else stop
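To make the table/action flow concrete, here is a deliberately simplified sketch of the pattern in plain C. This is not the actual librte_pipeline API; every type and name below is hypothetical, chosen only to mirror the flow-to-action-chain idea:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical miniature of the port -> table -> actions pattern.
 * Each table maps a flow ID to an action chain; an action returns
 * nonzero to continue down the chain, zero to stop. */
struct pkt { uint32_t flow_id; /* ... payload ... */ };

typedef int (*action_fn)(struct pkt *p);

struct table_entry {
    uint32_t  flow_id;
    action_fn actions[4];   /* priority-ordered action chain */
};

static void
run_table(struct table_entry *tbl, size_t n_entries, struct pkt *p)
{
    for (size_t i = 0; i < n_entries; i++) {
        if (tbl[i].flow_id != p->flow_id)
            continue;
        /* Walk the chain: if (fn) call the next fn, else stop. */
        for (int a = 0; a < 4 && tbl[i].actions[a] != NULL; a++)
            if (!tbl[i].actions[a](p))
                return;
        return;
    }
    /* No match: a default action (e.g., drop) would go here. */
}
```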
Memory Pools & Per-Core Cache

Object size fixed at creation time:
  Fixed-size elements
  Fixed number of elements
  Multi-producer/multi-consumer safe
  Safe for fast-path use
  Typical usage is packet buffers

Optimized for performance:
  No locking; uses CAS instructions
  All objects cache-aligned
  Per-core caches to minimise contention / use of CAS instructions
  Support for bulk allocation/freeing of buffers (see the sketch after this slide)

(Diagram: a memory pool on Processor 0 holding packet buffers (60K x 2K buffers) and events (2K x 100B buffers), shared by four DPDK data-plane cores C1-C4 between two 10G ports, each core with its own cache of buffers.)

You can implement S/W caches of large structures private to each core.
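A sketch of the bulk-allocation path; the burst count is illustrative, and the get/put signatures below match the long-stable rte_mempool API:

```c
#include <rte_mempool.h>

#define BULK 32

/* Grab and return BULK objects at a time; bulk operations hit the
 * per-lcore cache and amortize any access to the shared ring. */
static void
bulk_cycle(struct rte_mempool *mp)
{
    void *objs[BULK];

    /* Returns 0 on success, <0 if the pool cannot supply BULK objects. */
    if (rte_mempool_get_bulk(mp, objs, BULK) < 0)
        return;

    /* ... use the objects (e.g., as packet buffers) ... */

    rte_mempool_put_bulk(mp, objs, BULK);  /* back to the per-core cache */
}
```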
3. Building Block For NFV/OVS

Mbuf To Carry More Metadata From NIC
http://www.dpdk.org/browse/dpdk/tree/lib/librte_mbuf/rte_mbuf.h
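One concrete payoff: a PMD can deliver the RSS hash computed by the NIC inside the mbuf, sparing the application a software hash. A small sketch using the 16.11-era rte_mbuf fields (newer releases rename the flag with an RTE_MBUF_F_ prefix):

```c
#include <rte_mbuf.h>

/* Use NIC-provided metadata carried in the mbuf: if the hardware
 * computed an RSS hash for this packet, reuse it as the flow key. */
static uint32_t
flow_key_from_mbuf(const struct rte_mbuf *m)
{
    if (m->ol_flags & PKT_RX_RSS_HASH)
        return m->hash.rss;     /* hash computed by the NIC */

    /* Fall back to a software hash over the headers (not shown). */
    return 0;
}
```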
DPDK Trail Blazing – Performance & Functionality

1. Packet I/O
•  Data Direct I/O
•  AVX1, AVX2
•  4x10GbE NICs
•  PCI-E Gen2, Gen3

2. Extracting More Instructions Per Cycle
•  Optimize code
•  New / improved algorithms
•  Hash functions – jhash, rte_hash_crc
•  Cuckoo hash
•  Tune bulk operations
•  Prefetch

3. Building Block For NFV/OVS
•  Multiple pthreads per core
•  NAPI-style interrupt mode
•  Cgroups to manage resources
•  MBUF to carry more metadata from NIC

4. Distributed NFV
•  Platform QoS
•  Specifying the machine to run on
•  Adapting to the machine
•  8K match in h/w, more in s/w
•  ACL -> NIC

First, let us take a look at optimizations in Packet I/O.
Solution – Amortizing Over Multiple Descriptors

•  The 40-cycle LLC hit gets amortized over multiple descriptors
•  Roughly getting back to the latency of an L1 cache hit per packet
•  Similarly for packet I/O: go for burst reads

1. Packet I/O
Examine a Bunch of Descriptors at a Time

Read 8 packet descriptors at a time: Packet Descriptors 0 through 7 travel together from the last level cache (40-cycle hit) through L2 into the L1 cache of Core 0.

With 8 descriptors, the 40-cycle LLC latency gets amortized over 8 descriptors.

1. Packet I/O
Design Principle in Packet I/O Optimization

L3fwd default tuning is for performance:
•  Coalesces packets for up to 100 us
•  Receives and transmits at least 32 packets at a time
•  nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, MAX_PKT_BURST)
It could also bunch 8, 4, 2 (or 1) packets.

1. Packet I/O
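The companion technique, similar in spirit to what l3fwd does, is to prefetch the next packets' data while processing the current ones, so cache fetches overlap with work. A sketch (the per-packet handler is a placeholder; the offset is a tunable):

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

#define PREFETCH_OFFSET 3
#define MAX_PKT_BURST  32

static void process_packet(struct rte_mbuf *m);  /* placeholder */

static void
rx_and_process(uint8_t port, uint16_t queue)
{
    struct rte_mbuf *pkts[MAX_PKT_BURST];
    uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, MAX_PKT_BURST);

    /* Warm the cache for the first few packets. */
    for (uint16_t i = 0; i < PREFETCH_OFFSET && i < nb_rx; i++)
        rte_prefetch0(rte_pktmbuf_mtod(pkts[i], void *));

    /* Prefetch packet i+PREFETCH_OFFSET while handling packet i. */
    for (uint16_t i = 0; i < nb_rx; i++) {
        if (i + PREFETCH_OFFSET < nb_rx)
            rte_prefetch0(rte_pktmbuf_mtod(pkts[i + PREFETCH_OFFSET],
                                           void *));
        process_packet(pkts[i]);
    }
}
```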
Micro-Benchmarks – The Best Kept Secret

(Chart: cycle cost [enqueue + dequeue] in CPU cycles, roughly 0-600, for bulk enqueue/dequeue block sizes of 1, 2, 4, 8, 16, 32 – single producer/single consumer vs. multi producer/multi consumer.)

Next: 2. Extracting More Instructions Per Cycle

1. Packet I/O
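Numbers like these come from exercising the ring API directly. A minimal harness looks roughly like this, using an SP/SC ring; note the 3-argument bulk enqueue/dequeue signatures match pre-17.05 DPDK (later releases add a final out-parameter):

```c
#include <stdio.h>
#include <inttypes.h>
#include <rte_ring.h>
#include <rte_memory.h>
#include <rte_cycles.h>

#define BULK 32

/* Time one bulk enqueue + bulk dequeue round trip on an SP/SC ring.
 * (Pre-17.05 bulk signatures; later releases add an out-parameter.) */
static void
ring_microbench(void)
{
    struct rte_ring *r = rte_ring_create("bench", 1024, SOCKET_ID_ANY,
                                         RING_F_SP_ENQ | RING_F_SC_DEQ);
    void *objs[BULK] = { 0 };   /* dummy pointers are fine here */

    if (r == NULL)
        return;

    uint64_t start = rte_rdtsc();
    rte_ring_enqueue_bulk(r, objs, BULK);   /* return codes ignored */
    rte_ring_dequeue_bulk(r, objs, BULK);   /* in this toy harness  */
    uint64_t cycles = rte_rdtsc() - start;

    printf("enqueue+dequeue of %d objs: %" PRIu64 " cycles\n",
           BULK, cycles);
}
```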
SSE – 4 Lookups in Parallel

How can your NFV application benefit from SSE and AVX? (ACL, Classify)

2. Extracting More Instructions Per Cycle
Exploiting Data Parallelism (ACL, Classify)

2. Extracting More Instructions Per Cycle
What About Exact Match Lookup Optimization?
2. Extracting More Instructions Per Cycle
Comparison of Different Hash Implementations

Configuration:
  Intel® Core™ i7, 2 sockets
  Frequency: 3 GHz
  Memory: 2 MB huge pages, 2 GB per socket
  82599 10 GbE NIC

Faster hash functions. Higher flow counts (16M, 32M flows).
1 billion entries? Bring it on! – DPDK & Cuckoo Switch

2. Extracting More Instructions Per Cycle
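A sketch of the cuckoo-hash-backed exact-match API in librte_hash, using the CRC-based hash function the deck mentions (the key layout and table capacity are illustrative):

```c
#include <rte_hash.h>
#include <rte_hash_crc.h>
#include <rte_memory.h>

struct flow_key { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

/* Build an exact-match flow table; returns NULL on failure. */
static struct rte_hash *
make_flow_table(void)
{
    struct rte_hash_parameters params = {
        .name      = "flow_table",
        .entries   = 1 << 20,            /* illustrative capacity */
        .key_len   = sizeof(struct flow_key),
        .hash_func = rte_hash_crc,       /* CRC32-based hash */
        .socket_id = SOCKET_ID_ANY,
    };
    return rte_hash_create(&params);
}

static int
add_and_find(struct rte_hash *h, const struct flow_key *k)
{
    int pos = rte_hash_add_key(h, k);   /* slot index, or <0 on error */
    if (pos < 0)
        return pos;
    return rte_hash_lookup(h, k);       /* same index on success */
}
```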
Trail Blazing – Performance & Functionality

1. Packet I/O
•  Data Direct I/O
•  AVX1, AVX2
•  4x10GbE NICs
•  PCI-E Gen2, Gen3

2. Extracting More Instructions Per Cycle
•  Optimize code
•  New / improved algorithms
•  Hash functions – jhash, rte_hash_crc
•  Cuckoo hash
•  Tune bulk operations
•  Prefetch

3. Building Block For NFV/OVS
•  Multiple pthreads per core
•  NAPI-style interrupt mode
•  Cgroups to manage resources
•  MBUF to carry more metadata from NIC

4. Distributed NFV
•  Platform QoS
•  Specifying the machine to run on
•  Adapting to the machine
•  8K match in h/w, more in s/w
•  ACL -> NIC
Cryptodev Packet Processing Flow (diagram)

The DPDK application code sits on the DPDK API: it uses the ETHDEV API (I40E PMDs over NIC PF/VF) for packet I/O, and the CRYPTODEV API for crypto, which dispatches either to a SW crypto PMD or to the QAT PMD driving an Intel® QuickAssist Technology accelerator (PF/VF) across the HW/SW boundary. Plaintext packets flow in and encrypted packets flow out (encryption case).
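At the API level this reduces to enqueueing crypto operations to a cryptodev queue pair and polling for completions, mirroring the ethdev burst model. A fragment, assuming the `ops` array was allocated and populated elsewhere (signatures per the 16.11-era API):

```c
#include <rte_cryptodev.h>
#include <rte_crypto.h>

/* Submit a burst of prepared crypto ops to device `dev_id`, queue
 * pair `qp`, then poll for completed ops -- the cryptodev analogue
 * of rte_eth_tx_burst()/rte_eth_rx_burst(). */
static uint16_t
crypto_round_trip(uint8_t dev_id, uint16_t qp,
                  struct rte_crypto_op **ops, uint16_t nb_ops)
{
    uint16_t enq = rte_cryptodev_enqueue_burst(dev_id, qp, ops, nb_ops);

    /* In a real app, un-enqueued ops (enq < nb_ops) must be retried. */

    return rte_cryptodev_dequeue_burst(dev_id, qp, ops, enq);
}
```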
What DPDK Features Enhance NFV?

•  What about specifying which machine (with which capabilities) to run on?
•  If that machine is not available, how about adapting to the machine where the NFV was placed?
•  What about ...
•  To know more, register for free in the www.dpdk.org community

4. Distributed NFV
Summary
•  DPDK offers the best performance for packet processing.
•  OVS Netdev-DPDK is progressing with new features and performance
enhancements.
•  Ready for deployments today.
CALL TO ACTION – Thank You for Painting the NFV World with DPDK

1.  Register in the DPDK community – http://dpdk.org/ml/listinfo/dev
  •  Collaborate with Intel in open source and standards bodies:
  •  DPDK, virtual switch, OpenDaylight, OpenStack, etc.
2.  Develop applications with DPDK for a programmable & scalable VNF

Let's Collaborate and Accelerate DPDK Deployments
