Flexible High Performance Traffic
Generation on Commodity Multi-core
Platforms
Nicola Bonelli, Andrea Di Pietro,
Stefano Giordano, Gregorio Procissi
CNIT and Dip. di Ingegneria dell’Informazione - Università di Pisa
Introduction and Motivations
• New network devices are emerging… (probes, NIDs, shapers)
• Traffic generators available on the market:
• Expensive black-box solutions (e.g. Spirent AX analyzer)
• Not extensible enough: limited traffic patterns, poor semantics for randomization, etc.
• Solutions based on PCs and professional NICs are cheaper (Endace, Napatech,
Invea-tech)
• They enable fast packet transmission but usually do not provide a framework for traffic
generation
• A traffic generator should combine the flexibility of software with the
power of modern hardware
• Multi-core architectures equipped with multi-queue NICs are today
commodity hardware
• Is it possible to create traffic generation software that, running on
top of such a parallel architecture, provides hardware-class
performance?
Software for traffic generation
• A number of software solutions for traffic generation exist (trafgen, iperf, rude/crude,
mgen)
• Ostinato and Brute make use of PF_PACKET sockets and are therefore able to
customize traffic at the data-link layer, but:
• - Packet rate hardly exceeds a few million packets per second (no scalability)
• - No explicit support for multi-queue NICs
• - No support for timestamps to control when packets are transmitted
Fast packet transmission…
• Accelerated drivers have also emerged recently: netmap (Luigi Rizzo)
• It memory-maps the DMA descriptors of NICs to user-space and can transmit at wire-speed
(14.8Mpps) the same packet or a small set of packets
• A single thread generating IP packets with random addresses does not fill the pipe (~6/8 Mpps
each)
• Even when using the very fast Mersenne Twister random generator (~50 CPU cycles)
• Additional investigations are required…
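A back-of-the-envelope check of the figures above, as a minimal sketch. It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame) and the 2.57 GHz clock quoted for the test machines; the function names are illustrative and not part of netmap or PF_DIRECT.

```cpp
#include <cassert>
#include <cstdint>

// Bytes on the wire that are not part of the Ethernet frame itself:
// 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
constexpr double kWireOverheadBytes = 20.0;

// Maximum frames per second on a link of `link_bps` bits per second for
// frames of `frame_bytes` bytes (FCS included, so 64 is the minimum size).
inline double max_packet_rate(double link_bps, double frame_bytes)
{
    assert(link_bps > 0.0 && frame_bytes >= 64.0);
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8.0);
}

// CPU cycles one core can spend on each packet when it must sustain `pps`.
inline double cycles_per_packet(double cpu_hz, double pps)
{
    assert(cpu_hz > 0.0 && pps > 0.0);
    return cpu_hz / pps;
}
```

Under these assumptions, 64-byte frames on a 10G link give about 14.88 Mpps, which leaves roughly 172 cycles per packet on a single 2.57 GHz core. A random number draw of ~50 cycles therefore consumes a large share of the per-packet budget on its own.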
PF_DIRECT features
We implemented a brand new socket, named PF_DIRECT:
• A socket designed for traffic generation (and transmission)
• Compliant with vanilla drivers (no custom driver required)
• Designed to run on top of commodity parallel hardware
• Support for transmission timestamps
• Decouples traffic generation from packet transmission
• Packets are generated by a user-space thread and transmitted by
multiple kernel threads
• Simple patterns are generated and transmitted at nearly wire speed
• More complex patterns, most likely, do not have this requirement
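Timestamped transmission implies that the transmitting thread holds each packet until its deadline. The sketch below shows one plausible waiting policy: yield the core when the deadline is far away, busy-poll when it is close. It is not the PF_DIRECT kernel code. The names `wait_until`, `ns_to_ticks`, `WaitStats` and the `spin_threshold` parameter are hypothetical, and the clock is passed in as a callable so that a TSC reader (or a fake clock) can be plugged in.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Converts a duration in nanoseconds to ticks of a counter running at `hz`.
// The split avoids 64-bit overflow for counter frequencies of a few GHz.
inline std::uint64_t ns_to_ticks(std::uint64_t ns, std::uint64_t hz)
{
    assert(hz > 0);
    const std::uint64_t kNsPerSec = 1000000000ULL;
    return (ns / kNsPerSec) * hz + (ns % kNsPerSec) * hz / kNsPerSec;
}

// Counters kept only to make the policy observable.
struct WaitStats {
    std::uint64_t spins = 0;
    std::uint64_t yields = 0;
};

// Returns once now() >= deadline. `now` is any callable returning a
// monotonic tick counter (assumption: e.g. a wrapper around the TSC).
template <typename Clock>
WaitStats wait_until(std::uint64_t deadline, std::uint64_t spin_threshold, Clock now)
{
    WaitStats stats;
    for (;;) {
        const std::uint64_t t = now();
        if (t >= deadline) return stats;
        if (deadline - t > spin_threshold) {
            std::this_thread::yield();  // long wait: let other work run
            ++stats.yields;
        } else {
            ++stats.spins;              // short wait: busy-poll the clock
        }
    }
}
```

For example, a 10 µs gap (CBR at 100 kpps) corresponds to `ns_to_ticks(10000, 2570000000)` = 25700 ticks on a 2.57 GHz counter.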
PF_DIRECT architecture
PF_DIRECT consists of:
• A user-space library written in C++11 that handles memory mapping,
packet dispatching among kernel threads, etc.
• A special memory-mapped, byte-oriented SPSC queue
• Amortizes cache-coherence traffic between cores (queue index invalidations)
• Kernel threads that transmit the packets buffered in the SPSC
queues, each at its given timestamp
• Active wait, or reschedule in case of a long wait…
• The TSCs of different cores are synchronized on modern CPUs (INVARIANT_TSC)
• A ring of pre-allocated socket buffers (skb) that are re-used by the
kernel module and never get deallocated by network drivers
• User-counter trick
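The idea of amortizing index invalidations in an SPSC queue can be sketched as follows. This is an assumption-laden illustration, not the PF_DIRECT queue: it is an in-process queue of fixed-size slots rather than a memory-mapped, byte-oriented one, and the class name `SpscQueue` is hypothetical. Each side keeps a private cached copy of the other side's index and re-reads the shared index only when the queue looks full (producer) or empty (consumer).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class SpscQueue {
public:
    // Capacity must be a power of two so that indices wrap with a mask.
    explicit SpscQueue(std::size_t capacity_pow2)
        : buf_(capacity_pow2), mask_(capacity_pow2 - 1)
    {
        assert(capacity_pow2 >= 2 && (capacity_pow2 & mask_) == 0);
    }

    // Called by the single producer only.
    bool push(const T& v)
    {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_cache_ == buf_.size()) {
            // Looks full: only now touch the consumer's cache line.
            head_cache_ = head_.load(std::memory_order_acquire);
            if (t - head_cache_ == buf_.size()) return false;
        }
        buf_[t & mask_] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Called by the single consumer only.
    bool pop(T& out)
    {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_cache_) {
            // Looks empty: only now touch the producer's cache line.
            tail_cache_ = tail_.load(std::memory_order_acquire);
            if (h == tail_cache_) return false;
        }
        out = buf_[h & mask_];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    const std::size_t mask_;
    // Consumer-owned line: its index plus its cached view of the tail.
    alignas(64) std::atomic<std::size_t> head_{0};
    std::size_t tail_cache_{0};
    // Producer-owned line: its index plus its cached view of the head.
    alignas(64) std::atomic<std::size_t> tail_{0};
    std::size_t head_cache_{0};
};
```

With this layout a burst of pushes reads the consumer's index once per "looks full" event instead of once per packet, which is where the saving in coherence traffic would come from.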
[Figure: PF_DIRECT architecture]
Traffic generation with PF_DIRECT
Our experimental traffic generator, built on top of PF_DIRECT, consists of:
• A user-space application, where each thread of execution represents a
source of traffic
• Traffic sources “Engine” (that can concurrently make use of different
traffic models)
• User-space threads, one per core, each running a deadline scheduler (~20 ns
context switch)
• A user-defined traffic model (micro-thread) that is in charge of:
• Creating the packet to be transmitted
• Scheduling the timestamp for the packet transmission
• Sending the packet through the PF_DIRECT socket (buffering it in the SPSC queue)
• XML composition blocks that allow a given source to be instantiated and bound
to a core and to a given hardware queue
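The interplay between traffic sources and a deadline scheduler can be sketched as below. This is a simplified model under stated assumptions: sources are plain callables returning the gap to their next packet instead of micro-threads, no packets are built or sent, and the names `DeadlineScheduler`, `Source` and `Emission` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// One scheduled transmission: when, and by which source.
struct Emission {
    std::uint64_t timestamp;
    int source_id;
};

// A traffic source, reduced to its timing model.
struct Source {
    int id;
    std::uint64_t next_deadline;
    std::function<std::uint64_t()> next_gap;  // ticks until the next packet
};

class DeadlineScheduler {
public:
    void add(int id, std::uint64_t first_deadline, std::function<std::uint64_t()> gap)
    {
        heap_.push(Source{id, first_deadline, std::move(gap)});
    }

    // Always serves the source with the earliest deadline, then asks it
    // for its next gap and re-queues it. Stops at `horizon` (exclusive).
    std::vector<Emission> run(std::uint64_t horizon)
    {
        std::vector<Emission> out;
        while (!heap_.empty() && heap_.top().next_deadline < horizon) {
            Source s = heap_.top();
            heap_.pop();
            out.push_back({s.next_deadline, s.id});
            const std::uint64_t gap = s.next_gap();
            assert(gap > 0);  // a zero gap would never advance time
            s.next_deadline += gap;
            heap_.push(std::move(s));
        }
        return out;
    }

private:
    struct Later {
        bool operator()(const Source& a, const Source& b) const
        {
            return a.next_deadline > b.next_deadline;  // min-heap on deadline
        }
    };
    std::priority_queue<Source, std::vector<Source>, Later> heap_;
};
```

Two CBR sources with a gap of 10 ticks, starting at ticks 0 and 5, yield emissions at 0, 5, 10, 15, … alternating between the two sources. In a real generator each emission would correspond to a packet pushed to the socket with that timestamp.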
[Figure: Traffic generator architecture]
Experimental results: 1G
Testbed (1 Gb link; host label in the diagram: Monsters):
• Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM,
Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver,
PF_DIRECT for traffic generation
• Spirent AX-4000 Traffic Analyzer
• Model: CBR, 64-byte frames with random IP addresses
• Single source: 1 user-space thread
• Hardware queue: 1 kernel thread
[Figure: 1G link, CBR at 100 kpps, inter-arrival times]
[Figure: 1G link, variable rate up to 1.4 Mpps]
[Figure: 1G link, inter-arrival times of a Poisson process at 100 kpps]
[Figure: 1G link, inter-arrival times of a Poisson process at 1 Mpps]
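For reference, the inter-arrival times of a Poisson process with rate λ are exponentially distributed with mean 1/λ, so at 100 kpps the mean gap is 10 µs. A minimal sketch of how such transmission timestamps could be drawn with the C++11 standard library follows. The function name and the nanosecond time base are assumptions, and this is not the traffic model code used in the experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns `count` transmission timestamps, in nanoseconds from time 0,
// of a Poisson process with mean rate `pps` packets per second.
std::vector<double> poisson_timestamps_ns(double pps, std::size_t count, std::uint64_t seed)
{
    assert(pps > 0.0);
    std::mt19937_64 rng(seed);                               // Mersenne Twister
    std::exponential_distribution<double> gap_seconds(pps);  // mean gap = 1/pps
    std::vector<double> timestamps;
    timestamps.reserve(count);
    double now_ns = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        now_ns += gap_seconds(rng) * 1e9;  // advance by one exponential gap
        timestamps.push_back(now_ns);
    }
    return timestamps;
}
```

Averaged over many packets, the gap between consecutive timestamps approaches 1e9 / pps nanoseconds, while individual gaps vary widely.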
Experimental results: 10G
Testbed (10 Gb link; host labels in the diagram: Mascara, Monsters):
• Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM,
Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver,
PF_DIRECT for traffic generation
• Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM,
Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver,
PFQ for traffic capture
• Model: CBR, 64-byte frames with random IP addresses
• 1 user-space thread
• Multiple hardware queues: 4 kernel threads
[Figure: 10G link, variable rate up to 12.8 Mpps]
[Figure: 10G link, inter-arrival times of a Poisson process at 4 Mpps]
[Figures: 10G link, throughput in bps (two plots)]
Conclusions
• PF_DIRECT is a Linux socket that leverages the potential
of multi-core architectures and multi-queue NICs
• PF_DIRECT decouples the task of packet generation
from that of transmission
• A single thread is able to generate non-trivial traffic close
to the wire rate (~13 Mpps)
• Multiple kernel threads transmit packets through multiple
queues
• Supports transmission timestamps (in TSC)
• Experimental traffic generator built on top of PF_DIRECT
Future work
• Release the PF_DIRECT source code
• Additional performance improvements in PF_DIRECT
• Performance: identify a small set of changes, common to
different drivers, that could define a “PF_DIRECT-aware
driver”
• Implement a stable version of the “traffic generator” with
complex traffic models

PF_DIRECT@TMA12

  • 1. Flexible High Performance Traffic Generation on Commodity Multi-core Platforms
      • Nicola Bonelli, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi
      • CNIT and Dip. di Ingegneria dell’Informazione - Università di Pisa
  • 2. Introduction and Motivations
      • New network devices are emerging… (probes, NIDs, shapers)
      • Traffic generators available on the market:
      • Expensive black-box solutions (e.g. Spirent AX analyzer)
      • Not extensible enough: limited traffic patterns, poor semantics for randomization, etc.
      • Solutions based on PCs and professional NICs are cheaper (Endace, Napatech, Invea-tech)
      • They enable fast packet transmission but usually do not provide a framework for traffic generation
      • A traffic generator should combine the flexibility of software with the power of modern hardware
      • Multi-core architectures equipped with multi-queue NICs are commodity hardware today
      • Is it possible to create traffic generation software that, running on top of such a parallel architecture, provides hardware-class performance?
  • 3. Software for traffic generation
      • A number of software solutions for traffic generation exist (trafgen, iperf, rude/crude, mgen)
      • Ostinato and Brute make use of PF_PACKET sockets and are therefore able to customize traffic at the data-link layer:
          - The packet rate hardly exceeds a few million packets per second (no scalability)
          - No explicit support for multi-queue NICs
          - No support for time-stamping to adjust the timing with which packets are transmitted
      Fast packet transmission…
      • Accelerated drivers have also emerged recently: netmap (Luigi Rizzo)
      • They memory-map the DMA descriptors of NICs to user-space and can transmit the same packet, or a small set of packets, at wire speed (14.8 Mpps)
      • A single thread generating random-address IP packets does not fill the pipe (~6/8 Mpps each)
      • Even when using the very fast Mersenne Twister random generator! (~50 CPU cycles)
      • Additional investigations are required…
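The per-packet randomization mentioned on this slide can be illustrated with the Mersenne Twister engine from the C++11 standard library. This is only a minimal sketch: the helpers `random_ipv4`, `to_dotted` and `random_pair` and the `Ipv4Pair` struct are hypothetical names introduced here, not part of netmap or PF_DIRECT, and no real packet header is built.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper: one pseudo-random IPv4 address in host byte order.
// std::mt19937 produces values in [0, 2^32 - 1], so one draw fills the address.
inline std::uint32_t random_ipv4(std::mt19937 &rng)
{
    return static_cast<std::uint32_t>(rng());
}

// Hypothetical helper: dotted-quad rendering, useful for inspection only.
inline std::string to_dotted(std::uint32_t addr)
{
    return std::to_string((addr >> 24) & 0xffu) + "." +
           std::to_string((addr >> 16) & 0xffu) + "." +
           std::to_string((addr >> 8) & 0xffu) + "." +
           std::to_string(addr & 0xffu);
}

// Hypothetical stand-in for the address fields of an IPv4 header.
struct Ipv4Pair {
    std::uint32_t src;
    std::uint32_t dst;
};

// Two engine draws per packet: one for the source, one for the destination.
inline Ipv4Pair random_pair(std::mt19937 &rng)
{
    Ipv4Pair p;
    p.src = random_ipv4(rng);
    p.dst = random_ipv4(rng);
    return p;
}
```

Seeding two engines with the same value reproduces the same address sequence, which helps when a test run has to be repeated. Randomizing both addresses costs two draws per packet, so the generator cost quoted on the slide is paid twice in that case.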
  • 4. PF_DIRECT features
      We implemented a brand new socket, named PF_DIRECT:
      • A socket designed for traffic generation (and transmission)
      • Compliant with vanilla drivers (not a custom driver)
      • Designed to run on top of commodity parallel hardware
      • Supports timestamps in transmission
      • Decouples traffic generation from packet transmission
      • Packets are generated by a user-space thread and transmitted by multiple kernel threads
      • Simple patterns are generated and transmitted at nearly wire speed
      • More complex patterns, most likely, do not have this requirement
  • 5. PF_DIRECT architecture
      PF_DIRECT kernel module consists of:
      • A user-space library, written in C++11, that handles memory mapping, packet dispatching among kernel threads, etc.
      • A special memory-mapped, byte-oriented SPSC queue
      • It amortizes the coherence traffic between cores (queue index invalidations)
      • Kernel threads that transmit the packets buffered in the SPSC queues, each at its given timestamp
      • Active wait, or reschedule in case of a long wait…
      • The TSCs of different cores are synchronized on modern CPUs (INVARIANT_TSC)
      • A ring of pre-allocated socket buffers (skb), which are re-used by the kernel module and never get deallocated by network drivers
      • User-counter trick
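The idea of an SPSC queue that limits index invalidations between cores can be sketched in user-space C++11 as follows. The slides do not say how PF_DIRECT does it, so this sketch swaps in one common technique: each side keeps a private cached copy of the other side's index and re-reads the shared one only when the queue looks full or empty. The names `Slot` and `SpscQueue` are hypothetical, and the queue is slot-oriented rather than byte-oriented and memory-mapped as on the slide.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot: a transmission timestamp (in TSC ticks) plus an id
// that stands in for the packet bytes.
struct Slot {
    std::uint64_t tx_tsc;
    std::uint32_t packet_id;
};

class SpscQueue {
public:
    // Capacity must be a power of two so that index wrapping is a bit mask.
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity), mask_(capacity - 1)
    {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer side only.
    bool push(const Slot &s)
    {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_cache_ == buf_.size()) {
            // Looks full: only now touch the consumer's cache line.
            head_cache_ = head_.load(std::memory_order_acquire);
            if (t - head_cache_ == buf_.size())
                return false;
        }
        buf_[t & mask_] = s;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only.
    bool pop(Slot &out)
    {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_cache_) {
            // Looks empty: only now touch the producer's cache line.
            tail_cache_ = tail_.load(std::memory_order_acquire);
            if (h == tail_cache_)
                return false;
        }
        out = buf_[h & mask_];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }

private:
    std::vector<Slot> buf_;
    const std::size_t mask_;
    // Each index and each cached copy sits on its own 64-byte cache line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::size_t tail_cache_ = 0;  // private to the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::size_t head_cache_ = 0;  // private to the producer
};
```

With this layout the producer reads the consumer's index at most once per apparent "full" condition instead of once per packet, which is where the amortization comes from.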
  • 7. Traffic generation with PF_DIRECT
      Our experimental traffic generator, built on top of PF_DIRECT, consists of:
      • A user-space application, where each thread of execution represents a source of traffic
      • A traffic source “Engine” (which can concurrently make use of different traffic models)
      • User-space threads, one per core, each running a deadline scheduler (~20 ns context switch)
      • A user-defined traffic model (micro-thread), which is in charge of:
      • Creating the packet to be transmitted
      • Scheduling the timestamp for the packet transmission
      • Sending the packet through the PF_DIRECT socket (buffering it in the SPSC queue)
      • XML composition blocks that allow a given source to be instantiated and bound to a core and to a given hardware queue
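The deadline scheduler described on this slide can be sketched as an earliest-deadline-first loop over traffic sources. This is an illustrative sketch only: `CbrSource`, `Emission` and `run_deadline_scheduler` are hypothetical names, the sources are plain structs rather than micro-threads, and time is kept in nanoseconds instead of TSC ticks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// One scheduled packet: when to transmit it and which source produced it.
struct Emission {
    std::uint64_t tx_time_ns;
    int source_id;
};

// Hypothetical traffic source: emits one packet at next_ns, then advances
// its deadline by a fixed gap (constant bit rate).
struct CbrSource {
    int id;
    std::uint64_t next_ns;
    std::uint64_t gap_ns;
};

// Runs the sources in earliest-deadline-first order and returns every
// emission whose timestamp is below horizon_ns.
inline std::vector<Emission> run_deadline_scheduler(std::vector<CbrSource> sources,
                                                    std::uint64_t horizon_ns)
{
    // Orders source indexes so that the earliest deadline is on top;
    // ties are broken by the lower source id.
    auto later = [&sources](std::size_t a, std::size_t b) -> bool {
        if (sources[a].next_ns != sources[b].next_ns)
            return sources[a].next_ns > sources[b].next_ns;
        return sources[a].id > sources[b].id;
    };
    std::priority_queue<std::size_t, std::vector<std::size_t>, decltype(later)>
        ready(later);

    for (std::size_t i = 0; i < sources.size(); ++i) {
        assert(sources[i].gap_ns > 0);  // a zero gap would never terminate
        ready.push(i);
    }

    std::vector<Emission> out;
    while (!ready.empty()) {
        const std::size_t i = ready.top();
        ready.pop();
        if (sources[i].next_ns >= horizon_ns)
            continue;  // this source is done
        out.push_back(Emission{sources[i].next_ns, sources[i].id});
        sources[i].next_ns += sources[i].gap_ns;  // schedule the next packet
        ready.push(i);  // re-insert only after the deadline has been updated
    }
    return out;
}
```

In the architecture described on the slides, each emission would be buffered in the SPSC queue together with its timestamp instead of being collected in a vector, and a different traffic model would only change how the next deadline is computed.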
  • 9. Experimental results: 1G
      • Testbed (from the slide diagram): Monsters, 1 Gb link
      • Generator: Xeon 6-core X5650 @2.57 GHz, 12 GB RAM, Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver, PF_DIRECT for traffic generation
      • Analyzer: Spirent AX-4000 Traffic Analyzer
      • Model: CBR, 64-byte frames with random IP addresses
      • Single source: 1 user-space thread
      • Hardware queue: 1 kernel thread
  • 10. 1G link: CBR 100kpps, interarrival time
  • 11. 1G link: variable rate up to 1.4 Mpps
  • 12. 1G link: Inter-arrival times of Poisson process at 100Kpps
  • 13. 1G link: Inter-arrival times of Poisson process at 1Mpps
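The CBR and Poisson models measured on these slides differ only in how the next transmission timestamp is drawn. The sketch below shows one way to produce such timestamps in TSC ticks, under stated assumptions: `poisson_schedule` and `cbr_gap_ticks` are hypothetical names, and the TSC frequency is a parameter the caller has to supply (the 2.57 GHz used in the usage note is taken from the clock quoted for the testbed, and assuming it equals the TSC rate is a guess).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fixed inter-packet gap of a CBR source, rounded to the nearest TSC tick.
inline std::uint64_t cbr_gap_ticks(double rate_pps, double tsc_hz)
{
    assert(rate_pps > 0.0 && tsc_hz > 0.0);
    return static_cast<std::uint64_t>(tsc_hz / rate_pps + 0.5);
}

// Transmission timestamps (TSC ticks) for n packets of a Poisson process:
// inter-arrival times are exponential with mean 1 / rate_pps seconds.
inline std::vector<std::uint64_t> poisson_schedule(double rate_pps,
                                                   double tsc_hz,
                                                   std::size_t n,
                                                   std::uint64_t start_tsc,
                                                   std::mt19937_64 &rng)
{
    assert(rate_pps > 0.0 && tsc_hz > 0.0);
    std::exponential_distribution<double> gap_s(rate_pps);

    std::vector<std::uint64_t> ts;
    ts.reserve(n);
    double elapsed_s = 0.0;  // accumulated in seconds, converted once per packet
    for (std::size_t i = 0; i < n; ++i) {
        elapsed_s += gap_s(rng);
        ts.push_back(start_tsc + static_cast<std::uint64_t>(elapsed_s * tsc_hz));
    }
    return ts;
}
```

For example, `poisson_schedule(100000.0, 2.57e9, n, 0, rng)` yields timestamps whose average spacing approaches 25,700 ticks, the same gap that `cbr_gap_ticks(100000.0, 2.57e9)` returns for the CBR case.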
  • 14. Experimental results: 10G
      • Testbed (from the slide diagram): Mascara, Monsters, 10 Gb link
      • Generator: Xeon 6-core X5650 @2.57 GHz, 12 GB RAM, Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver, PF_DIRECT for traffic generation
      • Capture: Xeon 6-core X5650 @2.57 GHz, 12 GB RAM, Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver, PFQ for traffic capture
      • Model: CBR, 64-byte frames with random IP addresses
      • 1 user-space thread
      • Multiple hardware queues: 4 kernel threads
  • 15. 10G link: variable rate up to 12.8 Mpps
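The rates reported on these slides can be put next to the theoretical maximum for 64-byte frames with a short calculation. The sketch assumes standard Ethernet framing, where every frame carries 20 extra bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap); `max_pps` and `cycles_per_packet` are hypothetical helper names.

```cpp
#include <cassert>

// Per-frame overhead on the wire: preamble (7) + SFD (1) + inter-frame gap (12).
constexpr double kWireOverheadBytes = 20.0;

// Maximum packet rate of a link for a given frame size (FCS included).
inline double max_pps(double link_bps, double frame_bytes)
{
    assert(link_bps > 0.0 && frame_bytes > 0.0);
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8.0);
}

// CPU cycles available per packet at a given clock and packet rate.
inline double cycles_per_packet(double cpu_hz, double pps)
{
    assert(cpu_hz > 0.0 && pps > 0.0);
    return cpu_hz / pps;
}
```

`max_pps(1e9, 64)` gives about 1.488 Mpps and `max_pps(1e10, 64)` about 14.88 Mpps, so the 1.4 Mpps and 12.8 Mpps reached in the variable-rate tests correspond to roughly 94% and 86% of wire rate. At 12.8 Mpps, a 2.57 GHz core has about 200 cycles per packet, as returned by `cycles_per_packet(2.57e9, 12.8e6)`.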
  • 16. 10G link: Inter-arrival times of Poisson process at 4Mpps
  • 19. Conclusions
      • PF_DIRECT is a Linux socket that leverages the potential of multi-core architectures and multi-queue NICs
      • PF_DIRECT decouples the task of packet generation from that of transmission
      • A single thread is able to generate non-trivial traffic close to the wire rate (~13 Mpps)
      • Multiple kernel threads transmit packets through multiple queues
      • Supports transmission timestamps (in TSC)
      • Experimental traffic generator on top of PF_DIRECT
  • 20. Future work
      • Release the PF_DIRECT source code
      • Additional performance improvements in PF_DIRECT
      • Performance: identify a small set of changes, common to different drivers, that could define a “PF_DIRECT-aware driver”
      • Implement a stable version of the “traffic generator” with complex traffic models