PFQ: a Novel Architecture for Packet
Capture on Parallel Commodity
Hardware
Nicola Bonelli, Andrea Di Pietro,
Stefano Giordano, Gregorio Procissi
CNIT e Dip. di Ingegneria dell’Informazione - Università di Pisa
Outline
• Introduction and motivation
• Multi-core programming guidelines
• PFQ architecture
• Performance evaluation
• Conclusion and future work
Introduction and Motivations
• Designing monitoring applications has become a very challenging task:
– The hardware has evolved: 10 Gbit/s links, multi-core architectures and
multi-queue network devices (MSI-X)…
• Current traffic-monitoring software, including some parts of the
Linux kernel, is not optimized for the new hardware:
– (+) Kernel support for multi-queue network adapters is implemented
– (-) The Linux kernel has very poor support for monitoring applications
– (-) PF_PACKET is extremely slow, even when used in memory-mapped mode (pcap)
– (-) PF_RING was designed for single-processor systems
• Traffic monitoring should:
– Exploit modern hardware, scaling as close to linearly as possible with the number of cores
– Decouple the hardware parallelism from the software one
– Use a divide-and-conquer approach to steer packets to applications or threads
Multi-thread on Multi-core
• What’s wrong with the current software?
– Multi-threading paradigms designed for single-processor systems are still
valid, but prevent the software from scaling with the number of cores.
• For software to be effective on a multi-core system…
– Semaphores, mutexes, and spinlocks are out of the question!
– R/W mutexes prevent readers from scaling, even though they are supposed to
grant concurrent access to readers
– Atomic operations are sometimes required, but must be used in
moderation
• use sparse counters instead of atomic ones
• design algorithms so that they can use amortized atomic operations
– Sharing (writes to shared data) has a serious impact on performance
– Writes to shared memory are delayed by the hardware; reads must be synchronized
– False sharing must, and always can, be avoided
• Wait-free algorithms are mandatory; lock-free algorithms should be
avoided (if possible)…
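The "sparse counters" and "false sharing" guidelines above can be illustrated with a minimal sketch. It is not taken from the PFQ sources: the names `SparseCounter`, `Slot` and `kCacheLine` are hypothetical, and the 64-byte cache line is an assumption that holds on typical x86-64 parts. Each writer owns one padded slot and updates it with a plain load and store, so no atomic read-modify-write is needed; a reader sums the slots.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// One counter slot per writer, aligned (and therefore padded) to a full cache
// line so that two writers never touch the same line: no false sharing.
struct alignas(kCacheLine) Slot {
    std::atomic<std::uint64_t> value{0};
};

template <std::size_t Writers>
class SparseCounter {
public:
    // Must be called only by writer `id`. A relaxed load followed by a relaxed
    // store is enough because no other thread ever writes this slot.
    void add(std::size_t id, std::uint64_t n = 1) {
        std::atomic<std::uint64_t>& v = slots_[id].value;
        v.store(v.load(std::memory_order_relaxed) + n, std::memory_order_relaxed);
    }

    // May be called by any thread. The sum is a momentary snapshot: writers
    // can keep counting while the slots are being read.
    std::uint64_t read() const {
        std::uint64_t sum = 0;
        for (const Slot& s : slots_) sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    Slot slots_[Writers];
};
```

The trade-off is memory (one cache line per writer) and a slightly more expensive read, in exchange for writes that never contend.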
PFQ preamble
• PFQ is a novel capture system with native support for 64-bit multi-core
architectures, written following all the guidelines presented
above
• PFQ is not a custom driver
• It is an architecture running on top of standard Ethernet drivers, as
well as slightly modified ones, “PFQ-aware drivers” (an idea inherited from
PF_RING-aware drivers)
• PFQ enables packet capture, filtering, aggregation of hardware queues and
devices, packet classification, packet steering and so forth…
• It decouples the hardware parallelism (e.g. Intel RSS) from the
software one
PFQ architecture
Built on top of the following components…
• User-space C++11 library that provides the same abstractions as the STL:
containers and iterators
• DB-MPSC queue: double-buffered multiple-producer, single-consumer queue (for
communication with user space):
– Allows NAPI contexts to enqueue packets concurrently
– Reduces sharing and eliminates false sharing between user-space and NAPI contexts
– Lets user space copy packets from the queue to a private buffer in batches
• De-multiplexing matrix:
– A perfectly wait-free, concurrently accessible data structure
– No serialization is required to steer/copy packets
• SPSC queue:
– Enables batching of socket buffers (skb), to increase temporal locality for the memory
manager (SLAB for kernels prior to 2.6.39)
• Driver awareness:
– An effective idea inherited from PF_RING
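As an illustration of the SPSC queue listed above, here is a minimal wait-free single-producer/single-consumer ring with batched consumption. This is a generic textbook sketch in user-space C++11, not the kernel code used by PFQ; the class name `SpscQueue`, the `pop_batch` method and the 64-byte alignment are assumptions made for the example.

```cpp
#include <atomic>
#include <cstddef>

// Wait-free single-producer/single-consumer ring buffer.
// Capacity must be a power of two so that index wrapping is a cheap mask.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side only. Returns false when the ring is full.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        ring_[head & (Capacity - 1)] = item;
        // Publish the item: the consumer's acquire load of head_ sees it.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only. Copies up to `max` items into `out` and returns
    // how many were copied, so that the caller can process them as a batch.
    std::size_t pop_batch(T* out, std::size_t max) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        std::size_t n = head - tail;
        if (n > max) n = max;
        for (std::size_t i = 0; i < n; ++i)
            out[i] = ring_[(tail + i) & (Capacity - 1)];
        // Hand the consumed slots back to the producer.
        tail_.store(tail + n, std::memory_order_release);
        return n;
    }

private:
    // Producer and consumer indices live on separate (assumed 64-byte)
    // cache lines to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    T ring_[Capacity];
};
```

Neither side ever retries or blocks: each call finishes in a bounded number of steps, which is what makes the structure wait-free rather than merely lock-free.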
PFQ architecture
Packet steering
Given a packet and a set of sockets, which sockets need to receive it?
• In capture engines that do not support steering, filtering can be used to
dispatch packets across a number of sockets:
– Traversing the socket list to find the sockets interested in the packet has
linear complexity, O(n).
– It is a flexible approach because it enables dispatching as well as copies
• We designed a “packet steering” paradigm that:
– Identifies the destination sockets with O(1) complexity
– Supports both balancing and copies of packets
– Allows custom hash functions for packet dispatching
Packet steering
• Completely concurrent block (wait-free):
– The shared state (de-multiplexing matrix) is mostly read-only
– Writes, which are in general rare events, are serialized with each other to prevent
race conditions. The update of the state in the matrix is atomic
• Load-balancing groups:
– A socket can create or subscribe to a load-balancing group
– It will receive a fraction of the overall traffic
• Socket binding, to:
– One or more hardware queues of a given NIC
– One or more NICs
• Binding and balancing groups are orthogonal and can be used
concurrently
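One way to picture the steering described on the two slides above is a matrix indexed by (device, hardware queue) whose cells hold a bitmask of subscribed sockets: a lookup is a single atomic load, and rare updates are single atomic bit operations. The sketch below rests on assumptions and is not PFQ's actual layout: `SteeringMatrix`, `pick_socket`, the limits `kMaxDevices` and `kMaxQueues`, and the cap of 64 sockets per mask are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxDevices = 16;  // hypothetical limits for the sketch
constexpr std::size_t kMaxQueues  = 64;

// Cell (dev, queue) holds a bitmask: bit i is set when socket i wants the
// packets received on that hardware queue. At most 64 sockets are supported.
class SteeringMatrix {
public:
    SteeringMatrix() {
        for (auto& row : cells_)
            for (auto& cell : row) cell.store(0, std::memory_order_relaxed);
    }
    // Rare control-path writes: each update is one atomic bit operation.
    void bind(unsigned socket, std::size_t dev, std::size_t queue) {
        cells_[dev][queue].fetch_or(std::uint64_t(1) << socket,
                                    std::memory_order_release);
    }
    void unbind(unsigned socket, std::size_t dev, std::size_t queue) {
        cells_[dev][queue].fetch_and(~(std::uint64_t(1) << socket),
                                     std::memory_order_release);
    }
    // Hot path: one load, independent of how many sockets exist.
    std::uint64_t lookup(std::size_t dev, std::size_t queue) const {
        return cells_[dev][queue].load(std::memory_order_acquire);
    }
private:
    std::atomic<std::uint64_t> cells_[kMaxDevices][kMaxQueues];
};

// Load balancing: choose one member of a group (given as a bitmask) from the
// packet hash. Returns the socket id, or -1 when the group is empty.
int pick_socket(std::uint64_t group, std::uint32_t hash) {
    unsigned members = 0;
    for (std::uint64_t m = group; m != 0; m &= m - 1) ++members;  // count set bits
    if (members == 0) return -1;
    unsigned target = hash % members;
    for (int id = 0; id < 64; ++id) {
        if (group & (std::uint64_t(1) << id)) {
            if (target == 0) return id;
            --target;
        }
    }
    return -1;  // not reached: the group has at least one member
}
```

Copies correspond to delivering a packet to every bit set in `lookup(dev, queue)`, while balancing delivers it to the single socket returned by `pick_socket`. Because the two are independent, a socket can use both at once, as the slide states.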
Socket queue: DB-MPSC
• The socket queue is an unavoidable contention point:
– Load balancing shuffles packets across sockets
• How can contention be handled without impacting performance?
– Use an atomic operation to reserve a slot within the queue (will be amortized
in future implementations)
– Reduce coherence traffic among the cores running the kernel thread and the user-space
thread
– The swap between buffers is triggered by the user-space thread or by a watermark
– Packets can be copied in batches, or consumed in place
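The slot reservation and buffer swap described above can be sketched as follows. This is a simplified user-space model under stated assumptions, not the PFQ implementation: the class `DbMpscQueue`, the packing of buffer index and slot counter into one 64-bit word, and the per-slot `ready` flag are choices made for the example. Watermark-triggered swaps are not modelled, and the sketch ignores overflow of the 32-bit slot counter.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Double-buffered multi-producer/single-consumer queue (simplified model).
// state_ packs the active buffer index (bit 32) and the number of slots
// reserved in that buffer (low 32 bits).
template <typename T, std::uint32_t Capacity>
class DbMpscQueue {
public:
    // Any producer: a single fetch_add reserves a slot in the active buffer.
    // Returns false (the item is dropped) when that buffer is already full.
    bool push(const T& item) {
        const std::uint64_t s = state_.fetch_add(1, std::memory_order_acq_rel);
        const std::uint32_t slot = static_cast<std::uint32_t>(s);
        const std::uint32_t buf  = static_cast<std::uint32_t>(s >> 32) & 1u;
        if (slot >= Capacity) return false;
        Slot& dst = buffers_[buf][slot];
        dst.item = item;
        dst.ready.store(true, std::memory_order_release);  // publish the slot
        return true;
    }

    // Single consumer: make the other buffer active, then drain the retired
    // one in a batch without further interaction with the producers.
    std::vector<T> swap_and_drain() {
        const std::uint64_t next = static_cast<std::uint64_t>(active_ ^ 1u) << 32;
        const std::uint64_t old  = state_.exchange(next, std::memory_order_acq_rel);
        std::uint32_t n = static_cast<std::uint32_t>(old);
        if (n > Capacity) n = Capacity;  // reservations past the end were dropped
        std::vector<T> batch;
        batch.reserve(n);
        for (std::uint32_t i = 0; i < n; ++i) {
            Slot& src = buffers_[active_][i];
            // A producer may have reserved the slot but not yet filled it.
            while (!src.ready.load(std::memory_order_acquire)) { /* spin */ }
            batch.push_back(src.item);
            src.ready.store(false, std::memory_order_relaxed);  // reset for reuse
        }
        active_ ^= 1u;
        return batch;
    }

private:
    struct Slot {
        T item;
        std::atomic<bool> ready{false};
    };
    std::atomic<std::uint64_t> state_{0};
    std::uint32_t active_ = 0;  // touched by the consumer only
    Slot buffers_[2][Capacity];
};
```

Producers contend only on `state_`, and the consumer touches that word once per batch rather than once per packet, which is the reduction in coherence traffic the slide refers to. The short spin on `ready` means the consumer side of this sketch is not strictly wait-free.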
Testbed: Mascara & Monsters
Mascara and Monsters are connected by a 10 Gb link.
• Mascara (traffic generation):
– Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM
– Intel 82599 multi-queue 10G Ethernet adapter
– New socket PF_DIRECT for generation
– By deploying 3-4 cores, it is possible to generate up to ~12 Mpps of 64-byte packets
• Monsters (traffic capture):
– Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM
– Intel 82599 multi-queue 10G Ethernet adapter
– PFQ on board for traffic capture
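To put the ~12 Mpps figure in context, the theoretical packet rate of a 10 Gbit/s Ethernet link can be computed from the frame size. The sketch below assumes that "64 bytes" means minimum-size Ethernet frames and adds the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The names `max_pps` and `kWireOverheadBytes` are hypothetical.

```cpp
// Per-frame overhead on the wire for Ethernet:
// 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
constexpr double kWireOverheadBytes = 20.0;

// Maximum number of frames per second that fit on a link of `link_bps`
// bits per second when every frame is `frame_bytes` long.
constexpr double max_pps(double link_bps, double frame_bytes) {
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8.0);
}
```

Under these assumptions, `max_pps(10e9, 64)` is 10^10 / 672, or about 14.88 Mpps, so the generator's ~12 Mpps corresponds to roughly 80% of line rate for minimum-size frames.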
Single socket layout
Fully parallel layout
Load balancing across sockets
• Using 12 capturing NAPI contexts
• Varying the number of user-space threads
Packet copy
• Copying packets to a variable number of user-space threads
• 12 NAPI contexts within the kernel
Future directions
We are working to improve the packet steering framework…
• How can we better distribute packets according to application-
specific semantics?
• Enhance balancing groups, allowing a single socket to join multiple
balancing groups
• Associate each group with a specific “steering function”
• Investigate the implementation of wait-free stateful algorithms
(pimp/CAS)
• Add support for control-plane and data-plane sockets
• Implement a filtering mechanism by means of some Bloom filter
variant (capture filters)
Conclusions
• Modern commodity architectures are increasingly parallel
• Today's multi-threaded software is not ready for multi-core
architectures:
• Coding and design rules must be strictly followed to achieve linear
scalability
• PFQ: a novel Linux packet capture engine
– Better scalability than its competitors
– Flexible packet steering that eases the implementation of
multi-threaded user-space applications
– Decouples kernel-space and user-space parallelism
• PFQ webpage and download:
– netgroup.iet.unipi.it/software/pfq

PFQ@ PAM12

  • 1. PFQ: a Novel Architecture for Packet Capture on Parallel Commodity Hardware Nicola Bonelli, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi CNIT e Dip. di Ingegneria dell’Informazione - Università di Pisa
  • 2. Outline • Introduction and motivation • Multi-core programming guidelines • PFQ architecture • Performance evaluation • Conclusion and future work
  • 3. Introduction and Motivations • Designing monitoring applications has become a very challenging task: – The hardware has evolved: 10 Gbit/s links, multi-core architectures and multi-queue network devices (MSI-X)… • The present software for traffic monitoring, including some parts of the Linux kernel, is not optimized for the new hardware – (+) kernel support for multi-queue network adapters is implemented – (-) the Linux kernel has very poor support for monitoring applications – (-) PF_PACKET is extremely slow, even when used in memory-map mode (pcap) – (-) PF_RING has been designed for single-processor systems • Traffic monitoring should: – Exploit modern hardware, scaling possibly linearly with the number of cores – Decouple the hardware parallelism from the software one – Use a divide-and-conquer approach to steer packets to applications or threads
  • 4. Multi-thread on Multi-core • What’s wrong with the current software? – Previous multi-threading paradigms used for single-processor systems are still valid, but prevent the software from scaling with the number of cores. • For software to be effective on multi-core systems… – Semaphores, mutexes, and spinlocks are out of the question! – R/W mutexes prevent readers from scaling, even though they are supposed to grant concurrent access to readers – Atomic operations are sometimes required, but must be used in moderation • use sparse counters instead of atomic ones • design algorithms so that they can use amortized atomic operations – Sharing (writes to shared data) has a serious impact on performance – Writes to shared memory are delayed by the hardware, reads must be synchronized – False sharing must and can always be avoided • Wait-free algorithms are mandatory; the use of lock-free algorithms should be avoided (if possible)…
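The "sparse counters instead of atomic ones" guideline can be illustrated with a minimal C++11 sketch. It is not taken from PFQ: the names `SparseCounter`, `PaddedCounter`, `kCacheLine` and `kMaxCores` are hypothetical, and a 64-byte cache line is assumed. Each core owns one padded slot, so updates need no atomic read-modify-write and never share a cache line with another writer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names and sizes; assumptions for illustration only.
constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr std::size_t kMaxCores = 16;   // assumed upper bound on writer cores

// One counter per core, aligned so that each lives on its own cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

class SparseCounter {
public:
    // Must be called only by the owner of slot `core`. Because no other
    // thread writes this slot, a relaxed load followed by a relaxed store
    // is sufficient; no atomic read-modify-write instruction is issued.
    void add(std::size_t core, std::uint64_t n = 1) {
        assert(core < kMaxCores);
        auto &c = slots_[core].value;
        c.store(c.load(std::memory_order_relaxed) + n,
                std::memory_order_relaxed);
    }

    // Any thread may read. The sum is an approximate snapshot, since
    // writers may update their slots while the loop is running.
    std::uint64_t read() const {
        std::uint64_t total = 0;
        for (const auto &s : slots_)
            total += s.value.load(std::memory_order_relaxed);
        return total;
    }

private:
    std::array<PaddedCounter, kMaxCores> slots_;
};
```

The trade-off is memory (one cache line per core) and a slower, approximate read in exchange for contention-free updates on the hot path.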
  • 5. PFQ preamble • PFQ is a novel capture system that natively supports 64-bit multi-core architectures and is designed according to all the guidelines above • PFQ is not a custom driver • It is an architecture running on top of standard Ethernet drivers, as well as slightly modified ones, the “PFQ aware drivers” (an idea inherited from the PF_RING aware drivers) • PFQ enables packet capturing, filtering, aggregation of hardware queues and devices, packet classification, packet steering and so forth… • It decouples the hardware parallelism (i.e. Intel RSS) from the software one
  • 6. PFQ architecture Built on top of the following components… • User-space C++11 library that provides the same abstractions as the STL: containers and iterators • DB-MPSC queue: double-buffered multiple-producer single-consumer queue (for the communication to user-space): – Allows NAPI contexts to enqueue packets concurrently – Reduces sharing and eliminates false sharing between user-space and NAPI contexts – Enables user-space to copy packets from the queue to a private buffer in batches • De-multiplexing matrix: – perfectly wait-free, concurrently accessible data structure – no serialization is required to steer/copy packets • SPSC queue: – enables batching of socket buffers (skb), to increase temporal locality for the memory manager (SLAB for kernels prior to 2.6.39) • Driver aware: – an effective idea inherited from PF_RING
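As a rough illustration of the SPSC queue with batching mentioned above, here is a generic single-producer single-consumer ring buffer in C++11. It is a textbook sketch, not PFQ's kernel code: `SpscRing`, `push` and `pop_batch` are hypothetical names, and the element type stands in for a socket-buffer pointer.

```cpp
#include <atomic>
#include <cstddef>

// Generic wait-free SPSC ring; N must be a power of two.
// Exactly one thread may call push() and exactly one may call pop_batch().
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: returns false when the ring is full.
    bool push(const T &item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full
        buf_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer side: copies up to `max` items into `out` and releases all
    // of them with a single store, so the cost of synchronization is
    // amortized over the whole batch.
    std::size_t pop_batch(T *out, std::size_t max) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        std::size_t n = head - tail;
        if (n > max) n = max;
        for (std::size_t i = 0; i < n; ++i)
            out[i] = buf_[(tail + i) & (N - 1)];
        tail_.store(tail + n, std::memory_order_release);
        return n;
    }

private:
    T buf_[N];
    // Indices sit on separate (assumed 64-byte) cache lines so that the
    // producer and the consumer do not falsely share one.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Both operations complete in a bounded number of steps without retry loops, which is what makes this structure wait-free for its two threads.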
  • 8. Packet steering Given a packet and a set of sockets, which sockets need to receive it? • For capture engines that do not support it, filtering can be used to dispatch packets across a number of sockets: – Traversing the socket list to find those interested in the packet has linear complexity O(n). – It is a flexible approach because it enables dispatching as well as copies • We designed a “packet steering” paradigm that: – Identifies the destination sockets with O(1) complexity – Supports both balancing and copies of packets – Allows custom hash functions for packet dispatching
  • 9. Packet steering • Completely concurrent block (wait-free): – Shared state (de-multiplexing matrix) is mostly read-only – Writes, which are in general rare events, are serialized with respect to each other to prevent race conditions. The update of the state in the matrix is atomic • Load balancing groups: – A socket can create or subscribe to a load-balancing group – It will receive a fraction of the overall traffic • Socket binding: – To one or more hardware queues of a given NIC – To one or more NICs • Binding and balancing groups are orthogonal and can be used concurrently
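One way to picture the de-multiplexing matrix and the O(1) lookup is the following C++11 sketch. The layout is an assumption made for illustration, not PFQ's actual data structure: each (device, hardware queue) cell holds a 64-bit mask of bound sockets, and `DemuxMatrix`, `balance`, `kMaxDevs` and `kMaxQueues` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxDevs = 8;     // assumed limits
constexpr std::size_t kMaxQueues = 16;

// Each cell stores a bitmask of the (up to 64) sockets bound to that
// device and hardware queue. Readers do one atomic load per packet.
class DemuxMatrix {
public:
    DemuxMatrix() {
        for (auto &row : cells_)
            for (auto &cell : row) cell.store(0, std::memory_order_relaxed);
    }

    // Rare control-path writes; each update of a cell is a single atomic
    // operation, so readers never observe a half-written mask.
    void bind(std::size_t dev, std::size_t queue, unsigned socket_id) {
        assert(dev < kMaxDevs && queue < kMaxQueues && socket_id < 64);
        cells_[dev][queue].fetch_or(std::uint64_t{1} << socket_id,
                                    std::memory_order_release);
    }
    void unbind(std::size_t dev, std::size_t queue, unsigned socket_id) {
        assert(dev < kMaxDevs && queue < kMaxQueues && socket_id < 64);
        cells_[dev][queue].fetch_and(~(std::uint64_t{1} << socket_id),
                                     std::memory_order_release);
    }

    // Hot path: constant-time lookup of the destination sockets.
    std::uint64_t lookup(std::size_t dev, std::size_t queue) const {
        return cells_[dev][queue].load(std::memory_order_acquire);
    }

private:
    std::atomic<std::uint64_t> cells_[kMaxDevs][kMaxQueues];
};

// Load balancing: pick one member of `mask` from a packet hash.
// Returns the socket id, or -1 if the mask is empty. The loops are
// bounded by 64 iterations, so the cost does not grow with the
// number of open sockets.
int balance(std::uint64_t mask, std::uint32_t hash) {
    int members = 0;
    for (std::uint64_t m = mask; m != 0; m &= m - 1) ++members;  // popcount
    if (members == 0) return -1;
    int k = static_cast<int>(hash % static_cast<std::uint32_t>(members));
    for (int id = 0; id < 64; ++id) {
        if ((mask >> id) & 1u) {
            if (k == 0) return id;
            --k;
        }
    }
    return -1;  // not reached
}
```

Copying a packet to every interested socket corresponds to iterating over all set bits of the mask, while balancing delivers it only to the socket chosen by `balance`.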
  • 10. Socket queue: DB-MPSC • The socket queue is an unavoidable contention point: – Load balancing shuffles packets across sockets • How to handle contention without impacting the performance? – Use an atomic operation to reserve a slot within the queue (will be amortized in future implementations) – Reduce coherence traffic among the cores running the kernel thread and the user-space thread – The swap between buffers is triggered by the user-space thread or by a watermark – Packets can be copied in batches, or consumed in place
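The slot-reservation and buffer-swap idea can be sketched in C++11 as follows. This is a simplified illustration under stated assumptions, not PFQ's implementation: `DoubleBufferQueue` and its members are hypothetical, the watermark trigger is omitted, packets are dropped when the active buffer is full, and the consumer briefly spins on per-slot flags for producers that reserved a slot but have not finished writing it.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// state_ packs two fields: bit 32 = index of the active buffer,
// low 32 bits = number of slots reserved in it so far.
// Caveat of this sketch: the low field must not overflow, i.e. fewer than
// 2^32 push attempts may occur between two swaps.
template <typename T, std::uint32_t Cap>
class DoubleBufferQueue {
public:
    DoubleBufferQueue() : state_(0) {
        for (auto &buf : ready_)
            for (auto &flag : buf) flag.store(false, std::memory_order_relaxed);
    }

    // Producers (any number): a single fetch_add reserves a private slot.
    bool push(const T &item) {
        const std::uint64_t s = state_.fetch_add(1, std::memory_order_acq_rel);
        const std::uint32_t pos = static_cast<std::uint32_t>(s);
        const unsigned idx = static_cast<unsigned>((s >> 32) & 1);
        if (pos >= Cap) return false;  // active buffer full: drop
        data_[idx][pos] = item;
        ready_[idx][pos].store(true, std::memory_order_release);
        return true;
    }

    // Single consumer: make the other buffer active, then drain the
    // retired one in a batch without interfering with producers.
    std::size_t swap_and_drain(std::vector<T> &out) {
        const std::uint64_t cur = state_.load(std::memory_order_relaxed);
        const std::uint64_t next = (((cur >> 32) & 1) ^ 1) << 32;
        const std::uint64_t old =
            state_.exchange(next, std::memory_order_acq_rel);
        const unsigned idx = static_cast<unsigned>((old >> 32) & 1);
        std::uint32_t n = static_cast<std::uint32_t>(old);
        if (n > Cap) n = Cap;  // reservations beyond Cap were dropped
        for (std::uint32_t i = 0; i < n; ++i) {
            // Wait for a producer that reserved slot i but is still writing.
            while (!ready_[idx][i].load(std::memory_order_acquire)) {}
            out.push_back(data_[idx][i]);
            ready_[idx][i].store(false, std::memory_order_relaxed);
        }
        return n;
    }

private:
    std::atomic<std::uint64_t> state_;
    T data_[2][Cap];
    std::atomic<bool> ready_[2][Cap];
};
```

Producers touch shared state with exactly one atomic instruction per packet, and after a swap the consumer reads a buffer that producers no longer write to, which is the property that limits coherence traffic between the two sides.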
  • 11. Testbed: Mascara & Monsters • Two hosts connected by a 10 Gb link • Mascara (traffic generation): – Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM – Intel 82599 multi-queue 10G Ethernet adapter – New socket PF_DIRECT for generation – By deploying 3-4 cores, it is possible to generate up to ~12 Mpps of 64-byte packets • Monsters (traffic capture): – Xeon 6-core X5650 @2.57 GHz, 12 GBytes RAM – Intel 82599 multi-queue 10G Ethernet adapter – PFQ on board for traffic capture
  • 14. Load balancing across sockets • Using 12 capturing NAPI contexts • Varying the number of user-space threads
  • 15. Packet copy • Copying packets to a variable number of user-space threads • 12 NAPI contexts within the kernel
  • 16. Future directions We are working to improve the packet steering framework… • How can we better distribute packets according to application-specific semantics? • Enhance balancing groups: allow a single socket to join multiple balancing groups • Each group is associated with a “specific steering function” • Investigating the implementation of wait-free stateful algorithms (pimp/CAS) • Add support for control- and data-plane sockets • Implement a filtering mechanism by means of some Bloom filter variant (capture filters)
  • 17. Conclusions • Modern commodity architectures are increasingly parallel • Today’s multi-threaded software is not ready for multi-core architectures: – Coding and design rules must be strictly followed to achieve linear scalability • PFQ: a novel Linux packet capturing engine – Better scalability with respect to competitors – Flexible packet steering that eases the implementation of multi-threaded user-space applications – Decouples kernel-space and user-space parallelism • PFQ webpage and download: – netgroup.iet.unipi.it/software/pfq