PFQ@ 10th Italian Networking Workshop (Bormio)

Running monitoring applications on accelerated capture engines

  1. 1. Running Monitoring Applications on Accelerated Capture Engines
     Nicola Bonelli
     N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi
  2. 2. Agenda
     • Capture engines overview
     • What's new in PFQ (2.0)
     • Accelerated pcap library
       – PF_RING, PF_RING+DNA, NETMAP, PFQ
     • Pcap-perf: a tool for benchmarking pcap apps
     • Experimental results
  3. 3. Speed matters…
  4. 4. Accelerated Capture Engine
     • Linux is provided with a default capture engine
       – the PF_PACKET socket
     • For the sake of speed, other capture engines emerged:
       – 2004: PF_RING
         • designed for single core, better performance than PF_PACKET at the time
       – 2011: PFQ
         • first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
       – 2012: PF_RING-DNA
         • accelerated drivers (Intel)
       – 2012: NetMap
         • accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC'12)
  5. 5. … but what happens on these tracks?
  6. 6. What's new in PFQ 2.0
     • From capture engine to monitoring framework…
     • Improved performance
       – ~14.8 Mpps with a single user-space thread
     • Improved features:
       – compliant with a plethora of NICs: pfq-omatic
       – monitoring groups and classes
       – in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
       – native bindings: C, C++11, Haskell (more to come)
       – per-group filtering: BPF, vlan (un-tagging)
       – pcap library
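To make the idea of address-based packet steering concrete, here is a minimal user-space sketch. It is not PFQ's in-kernel code: the function name `steer_by_ipv4`, the hash constant and the modulo mapping are all assumptions chosen for illustration. It only shows how a steering function could map packets to one of several consumers so that both directions of a conversation reach the same one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the PFQ API.
// Picks one of n_consumers sockets using only the IPv4 addresses of a packet.
inline std::size_t steer_by_ipv4(std::uint32_t src, std::uint32_t dst,
                                 std::size_t n_consumers)
{
    assert(n_consumers > 0);

    // XOR is symmetric: (src, dst) and (dst, src) produce the same key,
    // so both directions of a flow are delivered to the same consumer.
    std::uint32_t key = src ^ dst;

    // Cheap bit mixing (multiplicative hash, arbitrary odd constant)
    // so that similar addresses do not all fall into the same bucket.
    key *= 2654435761u;
    key ^= key >> 16;

    return key % n_consumers;
}
```

Calling `steer_by_ipv4(a, b, 4)` and `steer_by_ipv4(b, a, 4)` returns the same index in the range 0 to 3; a real steering function would also have to parse the packet headers first.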
  7. 7. Feature comparison

     |                 | PF_PACKET  | PF_RING 5.x                    | PF_RING-DNA      | NETMAP - 0813          | PFQ 2.0                         |
     | NIC             | *          | *, PF-AWARE (Intel, Broadcom)  | only Intel 1/10G | Intel 1/10G, forcedeth | * accelerated                   |
     | Driver compat.  | *          | yes, non accel.                | no               | no                     | yes, dynamic                    |
     | multi-core      | -          | Hardware (RSS)                 | Hardware (RSS)   | Hardware (RSS)         | Hw RSS + soft                   |
     | multi-queue     | yes (poor) | yes                            | yes              | yes                    | yes                             |
     | native binding  | C          | C                              | C                | C                      | C, C++11, Haskell, Java, Python |
     | groups          | -          | -                              | -                | -                      | yes                             |
     | class           | -          | -                              | -                | -                      | yes                             |
     | concurrent mon. | yes        | yes                            | commercial ?     | -                      | yes                             |
     | clustering      | -          | yes                            | -                | -                      | yes (MT, group)                 |
     | steering        | -          | -                              | commercial       | -                      | yes (MT, group)                 |
     | STM state       | -          | -                              | -                | -                      | work in progress                |

  8. 8. Feature comparison

     |                  | PF_PACKET | PF_RING 5.x | PF_RING-DNA      | NETMAP - 0813    | PFQ 2.0          |
     | Pcap library     | yes       | yes         | yes              | buggy/incomplete | yes              |
     | BPF (filters)    | yes (MT)  | yes (MT)    | yes (user-space) | -                | yes (MT, group)  |
     | vlan filters     | -         | yes         | yes (hw Intel)   | -                | yes (MT, group)  |
     | vlan untagging   | -         | -           | -                | -                | yes (MT, soft.)  |
     | Intel hw filters | -         | yes         | yes              | -                | no               |
     | bloom filters    | -         | -           | -                | -                | work in progress |
  9. 9. Accelerated PCAP library
     • The pcap library is the de facto standard interface for packet capture
     • Accelerated capture engines provide their own pcap library:
       – Both PF_RING and PF_RING-DNA provide a complete accelerated version
       – NetMap provides experimental and incomplete pcap support
         • BPF is missing
     • PFQ provides a complete implementation
       – The PFQ C API is mapped onto the pcap interface wherever possible, and exposed through environment variables otherwise
       – Clustering is enabled by specifying multiple NICs separated by colons; steering is selected through the PFQ_STEER variable

         PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
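The same configuration can be prepared from inside an application before it opens the capture device. The sketch below is an assumption-laden illustration: `pfq_device_name` and `pfq_export_env` are hypothetical helpers (not PFQ functions), and only the variable names `PFQ_GROUP` and `PFQ_STEER` and the colon-separated device syntax come from the slide. It relies on the POSIX `setenv` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // setenv (POSIX), std::getenv
#include <string>
#include <vector>

// Hypothetical helper: joins NIC names with ':' ("eth2:eth3"),
// the clustering syntax shown on the slide.
inline std::string pfq_device_name(const std::vector<std::string>& nics)
{
    assert(!nics.empty());
    std::string dev = nics.front();
    for (std::size_t i = 1; i < nics.size(); ++i)
        dev += ":" + nics[i];
    return dev;
}

// Hypothetical helper: exports the two variables used on the slide.
// Returns true when both variables were set successfully.
inline bool pfq_export_env(int group, const std::string& steer)
{
    return setenv("PFQ_GROUP", std::to_string(group).c_str(), 1) == 0
        && setenv("PFQ_STEER", steer.c_str(), 1) == 0;
}
```

After `pfq_export_env(10, "ipv4-addr")`, the string returned by `pfq_device_name({"eth2", "eth3"})` would be handed to the usual pcap open call of a PFQ-enabled pcap library, mirroring the tcpdump command line above.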
  10. 10. Pcap-perf
      • Pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
      • Support for multiple threads, BPF filters and plug-ins:

      | plug-in                 | kind                        |
      | Null                    | packet counter              |
      | IP checksum             | light CPU computation       |
      | MD5                     | CPU computation             |
      | SHA256                  | heavy CPU computation       |
      | Bloom Filter            | memory (linear)             |
      | Protocol Classification | memory tree                 |
      | TCP/UDP flow counter    | memory (std::unordered_set) |
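As an illustration of the last plug-in kind, a flow counter backed by `std::unordered_set` might look roughly like the following. This is a sketch under assumptions, not pcap-perf's actual code: the `FlowKey`, `FlowKeyHash` and `FlowCounter` names, the 5-tuple layout and the hash combination are all invented here, and header parsing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical 5-tuple identifying a TCP/UDP flow.
struct FlowKey {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    std::uint8_t  proto;              // e.g. 6 = TCP, 17 = UDP

    bool operator==(const FlowKey& o) const {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port &&
               proto == o.proto;
    }
};

// Packs the tuple into two 64-bit words and combines their hashes.
struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const {
        std::uint64_t a = (std::uint64_t(k.src_ip) << 32) | k.dst_ip;
        std::uint64_t b = (std::uint64_t(k.src_port) << 24) |
                          (std::uint64_t(k.dst_port) << 8) | k.proto;
        std::hash<std::uint64_t> h;
        return h(a) ^ (h(b) * 0x9e3779b97f4a7c15ULL);
    }
};

// Counts distinct flows; every packet costs one hash-set lookup,
// which is what makes this a memory-bound workload.
class FlowCounter {
public:
    // Returns true when the packet belongs to a flow not seen before.
    bool on_packet(const FlowKey& k) { return flows_.insert(k).second; }
    std::size_t flows() const { return flows_.size(); }
private:
    std::unordered_set<FlowKey, FlowKeyHash> flows_;
};
```

Feeding the same key twice leaves `flows()` at 1, while a key that differs in any field adds a new entry.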
  11. 11. Test-bed and measurements
      • Intel Xeon X5650 (6 cores) @ 2.67 GHz, 16 GB RAM + Intel 82599 10G (Debian Wheezy)
      • Accelerated drivers
        – PF_RING: ixgbe 3.11.33, PF_RING-aware
        – PF_RING-DNA: ixgbe 3.10.16-DNA driver
        – Netmap: ixgbe driver shipped with the netmap package
        – PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
      • Best interrupt affinity (MSI-X)
        – 4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
      • Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
        – using both PF_DIRECT and PF_RING-DNA
      [Figure: 10 Gb link; labels "mascara" and "monsters"]
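For reference, the maximum packet rate of the test link can be worked out from the frame size. The sketch below assumes that the 64 and 128 bytes refer to the Ethernet frame including the FCS, and adds the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function name `line_rate_pps` is made up for this example.

```cpp
#include <cassert>

// Per-frame overhead on the wire that is not part of the frame itself:
// 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
constexpr double kWireOverheadBytes = 20.0;

// Theoretical maximum packets per second on an Ethernet link.
// link_bps:    link speed in bits per second (10e9 for 10 GbE)
// frame_bytes: frame size in bytes, FCS included (assumption, see above)
inline double line_rate_pps(double link_bps, double frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8.0);
}
```

With these assumptions, `line_rate_pps(10e9, 64)` is about 14.88 Mpps, in line with the ~14.8 Mpps figures quoted in the slides, and `line_rate_pps(10e9, 128)` is about 8.45 Mpps.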
  12. 12. Counting packets is useless (native speed)

          uint64_t counter = 0;
          for (;;)
          {
              counter++;
          }
  13. 13. 1 thread user-space (Intel 10G)
  14. 14. pcap library
  15. 15. Pcap library, 1 thread counter
  16. 16. Pcap, 1 thread counter, BPF=udp
  17. 17. Pcap, 1 thread counter, BPF=http || udp
  18. 18. pcap-perf
  19. 19. pcap-perf
  20. 20. pcap-perf with BPF = udp
  21. 21. pcap-perf (2 threads)
  22. 22. tcpdump
  23. 23. tcpdump -s 64 -i dev -w /ramdisk/dump.pcap (300M@14.8Mpps)
  24. 24. tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)
  25. 25. tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)
  26. 26. tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)
  27. 27. Thanks for the attention! nicola.bonelli@cnit.it
