Flexible High Performance Traffic Generation on Commodity Multi-core Platforms



  1. Flexible High Performance Traffic Generation on Commodity Multi-core Platforms
     Nicola Bonelli, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi
     CNIT and Dip. di Ingegneria dell'Informazione - Università di Pisa
  2. Introduction and Motivations
     • New network devices are emerging (probes, NIDs, shapers)
     • Traffic generators available on the market:
       • Expensive black-box solutions (e.g. the Spirent AX analyzer)
       • Not extensible enough: limited traffic patterns, poor semantics for randomization, etc.
     • Solutions based on PCs and professional NICs are cheaper (Endace, Napatech, Invea-Tech)
       • They enable fast packet transmission, but usually do not provide a framework for traffic generation
     • A traffic generator should combine the flexibility of software with the power of modern hardware
       • Multi-core architectures equipped with multi-queue NICs are commodity hardware today
     • Is it possible to create traffic-generation software that, running on top of such a parallel architecture, provides hardware-class performance?
  3. Software for traffic generation
     • A number of software solutions for traffic generation exist (trafgen, iperf, rude/crude, mgen)
     • Ostinato and Brute make use of PF_PACKET sockets and are therefore able to customize traffic at the data-link layer:
       • Packet rates hardly exceed a few million packets per second (no scalability)
       • No explicit support for multi-queue NICs
       • No support for timestamping to adjust the times at which packets are transmitted
     Fast packet transmission...
     • Recently, accelerated drivers have also emerged: netmap (Luigi Rizzo)
       • It memory-maps the DMA descriptors of NICs into user space and can transmit the same packet, or a small set of packets, at wire speed (14.8 Mpps)
       • A single thread generating random-address IP packets does not fill the pipe (~6-8 Mpps each)
       • Even when using the very fast Mersenne-twister random generator (~50 CPU cycles per draw)!
     • Additional investigation is required...
  4. PF_DIRECT features
     We implemented a brand new socket, named PF_DIRECT:
     • A socket designed for traffic generation (and transmission)
     • Works with vanilla drivers (no custom driver required)
     • Designed to run on top of commodity parallel hardware
     • Supports transmission timestamps
     • Decouples traffic generation from packet transmission
       • Packets are generated by a user-space thread and transmitted by multiple kernel threads
       • Simple patterns can be generated and transmitted nearly at wire speed
       • More complex patterns, most likely, do not have this requirement
  5. PF_DIRECT architecture
     The PF_DIRECT kernel module consists of:
     • A user-space library written in C++11 that handles memory mapping, packet dispatching among kernel threads, etc.
     • A special memory-mapped, byte-oriented SPSC queue
       • Amortizes cache-coherence traffic between cores (queue index invalidations)
     • Kernel threads that transmit the packets buffered in the SPSC queues, each at its given timestamp
       • Active wait, or reschedule in case of a long wait...
       • The TSCs of different cores are synchronized on modern CPUs (INVARIANT_TSC)
     • A ring of pre-allocated socket buffers (skb) that are re-used by the kernel module and never get deallocated by network drivers
       • User-counter trick
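The index-invalidation amortization mentioned above is the classic SPSC trick: each side keeps a stale, private copy of the other side's index and re-reads the shared index only when the ring looks full (or empty), so the cache lines holding the indices bounce between cores far less often. A minimal single-threaded-testable sketch of the idea (illustrative only, not PF_DIRECT's actual queue, which is byte-oriented and memory-mapped):

```cpp
#include <atomic>
#include <cstddef>

template <std::size_t N>                 // N must be a power of two
class spsc_ring {
    unsigned char buf_[N];
    std::atomic<std::size_t> head_{0};   // written only by the producer
    std::atomic<std::size_t> tail_{0};   // written only by the consumer
    std::size_t cached_tail_{0};         // producer's stale copy of tail_
    std::size_t cached_head_{0};         // consumer's stale copy of head_
    // (a real implementation would alignas(64) these fields to keep
    //  producer- and consumer-owned state on separate cache lines)

public:
    bool push(unsigned char b) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        // Re-read the shared tail only when the ring *looks* full:
        // this is what amortizes cross-core index invalidations.
        if (h - cached_tail_ == N) {
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (h - cached_tail_ == N) return false;   // truly full
        }
        buf_[h % N] = b;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }

    bool pop(unsigned char &b) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == cached_head_) {
            cached_head_ = head_.load(std::memory_order_acquire);
            if (t == cached_head_) return false;       // truly empty
        }
        b = buf_[t % N];
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

In the common case neither side touches the other's index at all; a shared-index read happens roughly once per batch of N items rather than once per item.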
  6. PF_DIRECT architecture
  7. Traffic generation with PF_DIRECT
     Our experimental traffic generator, built on top of PF_DIRECT, consists of:
     • A user-space application, where each thread of execution represents a source of traffic
     • A traffic source "Engine" (which can concurrently make use of different traffic models)
       • A user-space thread, one per core, running a deadline scheduler (~20 ns context switch)
     • A user-defined traffic model (micro-thread) in charge of:
       • Creating the packet to be transmitted
       • Scheduling the timestamp for the packet transmission
       • Sending the packet through the PF_DIRECT socket (buffering it in the SPSC queue)
     • XML composition blocks that allow a given source to be instantiated and bound to a core and to a given hardware queue
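The "schedule the timestamp" step above can be sketched for the Poisson model used in the experiments: Poisson arrivals correspond to i.i.d. exponential inter-arrival gaps, so each packet's deadline is the previous one plus a random exponential gap. This is an illustrative sketch only; the function name is hypothetical and deadlines are expressed in nanoseconds rather than raw TSC ticks.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Compute per-packet transmit deadlines for a Poisson source of the
// given rate (hypothetical helper, not PF_DIRECT's API).
std::vector<std::uint64_t> poisson_deadlines(double rate_pps,
                                             std::size_t n_packets,
                                             std::uint64_t t0_ns,
                                             std::mt19937 &gen)
{
    // Poisson arrivals <=> exponential inter-arrival times, mean 1/rate.
    std::exponential_distribution<double> gap{rate_pps};
    std::vector<std::uint64_t> deadlines;
    deadlines.reserve(n_packets);

    double t = static_cast<double>(t0_ns);
    for (std::size_t i = 0; i < n_packets; ++i) {
        t += gap(gen) * 1e9;             // seconds -> nanoseconds
        deadlines.push_back(static_cast<std::uint64_t>(t));
    }
    return deadlines;
}
```

A micro-thread would attach such a deadline to each packet before buffering it on the SPSC queue; the kernel thread then actively waits (or reschedules) until the deadline before handing the packet to the driver.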
  8. Traffic generator architecture
  9. Experimental results: 1G
     Monsters (1 Gb link):
     • Xeon 6-core X5650 @2.57 GHz, 12 GB RAM
     • Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver
     • PF_DIRECT for traffic generation
     Measurement: Spirent AX-4000 Traffic Analyzer
     Traffic model: CBR, 64-byte frames with random IP addresses
     • single source: 1 user-space thread
     • single hardware queue: 1 kernel thread
  10. 1G link: CBR 100 kpps, inter-arrival time
  11. 1G link: variable rate up to 1.4 Mpps
  12. 1G link: inter-arrival times of a Poisson process at 100 kpps
  13. 1G link: inter-arrival times of a Poisson process at 1 Mpps
  14. Experimental results: 10G
      Mascara - Monsters, 10 Gb link
      Sender:
      • Xeon 6-core X5650 @2.57 GHz, 12 GB RAM
      • Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver
      • PF_DIRECT for traffic generation
      Receiver:
      • Xeon 6-core X5650 @2.57 GHz, 12 GB RAM
      • Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver
      • PFQ for traffic capture
      Traffic model: CBR, 64-byte frames with random IP addresses
      • 1 user-space thread
      • multiple hardware queues: 4 kernel threads
  15. 10G link: variable rate up to 12.8 Mpps
  16. 10G link: inter-arrival times of a Poisson process at 4 Mpps
  17. 10G link: throughput (bps)
  18. 10G link: throughput (bps)
  19. Conclusions
      • PF_DIRECT is a Linux socket that leverages the potential of multi-core architectures and multi-queue NICs
      • PF_DIRECT decouples the task of packet generation from that of transmission
        • A single thread is able to generate non-trivial traffic close to the wire rate (~13 Mpps)
        • Multiple kernel threads transmit the packets through multiple queues
      • Supports transmission timestamps (in TSC units)
      • An experimental traffic generator was built on top of PF_DIRECT
  20. Future work
      • Release the PF_DIRECT source code
      • Additional performance improvements in PF_DIRECT
        • Performance: identify a small set of changes, common to different drivers, that could define a "PF_DIRECT-aware driver"
      • Implement a stable version of the traffic generator with complex traffic models