Spy hard, challenges of 100G deep packet inspection on x86 platform

Talk given at PLNOG, March 2017


  1. 1. Spy hard, challenges of 100G deep packet inspection on x86 platform. Paweł Małachowski, 2017.03.07
  2. 2. ^Why?$
  3. 3. Deep packet inspection (DPI). No DPI: • packet header lookup • route based on destination (unless PBR) • classify with static rules or state data • cheap. DPI: • packet header and payload lookup • may route based on content (e.g. uplinks for priority and "bulky" traffic) • classify with static rules, state data, multiple patterns and custom logic • expensive?
  4. 4. 100+ Gbit DPI – why? • end customers typically have < 10G uplinks – L7 filtering (WAF, IPS etc.) requested by enterprises – multiple IDS, IPS, NGFW, UTM and WAF products on the market – can be handled with open source tools • 100G+ speeds: ISP/Telco/large DCs – do not want to interfere with traffic • unless hit by a huge DDoS attack • or kindly asked by the local régime
  5. 5. Mirai botnet attacks – examples • attack_tcp_stomp – establish a legitimate TCP connection, then flood it – not to be confused with the STOMP protocol • attack_udp_dns – DNS "water torture", FQDN with random host • attack_app_http – HTTP request flood • attack_app_cfnull – HTTP POST junk • DPI may help easily :) source: https://github.com/rosgos/Mirai-Source-Code
  6. 6. Large DDoS attacks in 2016 – examples 1. 150M pps (650Gbps) of TCP SYN packets (mixed size), spoofed IPs 2. 1.75M rps peak of HTTP requests (~121B/r) from ~52k src IPs 3. 220k rps (360Gbps) of large HTTP requests from ~128k src IPs 4. ~1Tbps of recursive "water torture" DNS queries • DPI may help sources: • https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/ • https://www.incapsula.com/blog/650gbps-ddos-attack-leet-botnet.html • http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
  7. 7. 100Gbit/s sizing • ~148.8 Mpps in small frames, but no payload to scan • ~8.127 Mpps in 1514B frames • ~12.19 GB/s of IP payload • given a 16-core machine, our target is: – ~0.5M – 2M lookups/s per core – up to ~762 MB/s per core – note: not all packets and not entire payloads have to be scanned
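The sizing figures above can be reproduced with a few lines of arithmetic. This is a sketch under my own assumptions, which the slide does not spell out: standard Ethernet wire overhead (8 B preamble/SFD, 12 B inter-frame gap, 4 B FCS), a 64 B minimum frame, and "IP payload" read as the 1500 B IP packet carried in a 1514 B frame. All names are illustrative.

```python
LINE_RATE_BPS = 100e9      # 100 Gbit/s
WIRE_OVERHEAD = 8 + 12     # assumed: preamble/SFD + inter-frame gap, in bytes
FCS = 4                    # assumed: frame check sequence, in bytes
ETH_HEADER = 14            # Ethernet header, in bytes
CORES = 16                 # core count from the slide


def packets_per_second(frame_without_fcs):
    """Packets per second at line rate for a frame size given without FCS."""
    wire_bytes = frame_without_fcs + FCS + WIRE_OVERHEAD
    return LINE_RATE_BPS / (wire_bytes * 8)


small_pps = packets_per_second(60)      # 64 B minimum frame including FCS
large_pps = packets_per_second(1514)    # full-size frame from the slide
ip_bytes_per_second = large_pps * (1514 - ETH_HEADER)
per_core_bytes = ip_bytes_per_second / CORES

print(f"small frames: {small_pps / 1e6:.1f} Mpps")
print(f"1514B frames: {large_pps / 1e6:.3f} Mpps")
print(f"IP bytes:     {ip_bytes_per_second / 1e9:.2f} GB/s")
print(f"per core:     {per_core_bytes / 1e6:.0f} MB/s")
```

With these assumptions the script prints 148.8 Mpps, 8.127 Mpps, 12.19 GB/s and 762 MB/s, matching the slide.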
  8. 8. Payload lookup – position • fixed – e.g. NTP • network protocol aware – e.g. DNS • application aware – e.g. HTTP • anywhere in the packet – bad idea: $ strings /usr/bin/* | grep -c sex prints 93
  9. 9. Protocol design rant: "string: variable-length byte field, encoded in UTF-8, terminated by 0x00" source: https://developer.valvesoftware.com/wiki/Server_queries
  10. 10. Software payload lookup – approaches (method: example) • fixed position literal matching (sequence): <you name it> • fixed position literal matching (trie): DPDK ACL • computed position literal matching: tc u32 • application aware classifier: nDPI, netfilter l7-filter • application level gateway (ALG): netfilter nf_conntrack_* • programmable data path: netfilter xt_bpf, nftables, XDP+eBPF • embedded scripting language: NPFLua, pflua • hybrid with state machines: Hyperscan, Tempesta FW • regexp engine: Bro, Snort, Suricata
  11. 11. ^[Mm]atching\s+regexps$
  12. 12. Basic regexp (\w+ )+PLNOG1[68]$ tool: https://www.debuggex.com/
  13. 13. Finite-state machine • abstract machine • has states and transitions • some states are "accept states" • input updates machine state • accepts or rejects an input sequence of symbols • example: accepts binary strings with an even number of zeroes sources: • https://en.wikipedia.org/wiki/State_diagram • https://en.wikipedia.org/wiki/Deterministic_finite_automaton
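The slide's example machine (binary strings with an even number of zeroes) fits in a small transition table. This is a minimal sketch; the state names `even` and `odd` and the helper `accepts` are my own labels, not taken from the slide's diagram.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
START = "even"       # zero zeroes seen so far, which is an even count
ACCEPT = {"even"}    # accept states


def accepts(bits):
    """Run the DFA over a string of '0'/'1' and report acceptance."""
    state = START
    for symbol in bits:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPT


print(accepts("1001"))   # two zeroes
print(accepts("10"))     # one zero
```

Each input symbol costs exactly one table lookup, which is why a DFA has predictable per-byte cost.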
  14. 14. DFA vs. NFA • Deterministic finite automaton (DFA) – each of its transitions is uniquely determined by its source state and input symbol – reading an input symbol is required for each state transition • Nondeterministic finite automaton (NFA): otherwise • NFA can be converted to DFA – DFA is efficient to execute, but may grow large (state explosion) – NFA is easier to construct, but may be slower tools: • http://hackingoff.com/compilers/regular-expression-to-nfa-dfa • http://ivanzuzak.info/noam/webapps/fsm_simulator/
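The NFA to DFA conversion mentioned above is usually done with the subset construction, where each DFA state is a set of NFA states. The sketch below uses a textbook NFA for `(a|b)*abb` (my choice of example, not from the talk) without epsilon transitions; all names are illustrative.

```python
from collections import deque

# NFA for (a|b)*abb: (state, symbol) -> set of possible next states.
NFA = {
    (0, "a"): {0, 1},   # nondeterministic: stay in 0 or start matching "abb"
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START = 0
NFA_ACCEPT = {3}
ALPHABET = "ab"


def nfa_to_dfa():
    """Subset construction: returns (start state, DFA transition table)."""
    start = frozenset({NFA_START})
    dfa = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in dfa:
            continue
        dfa[state] = {}
        for symbol in ALPHABET:
            target = frozenset(
                nxt for s in state for nxt in NFA.get((s, symbol), ())
            )
            dfa[state][symbol] = target
            if target not in dfa:
                queue.append(target)
    return start, dfa


def dfa_accepts(text):
    """Match text (symbols from ALPHABET only) with one lookup per symbol."""
    state, dfa = nfa_to_dfa()
    for symbol in text:
        state = dfa[state][symbol]
    return bool(state & NFA_ACCEPT)


start_state, table = nfa_to_dfa()
print(len(table), "DFA states")
print(dfa_accepts("aabb"), dfa_accepts("abab"))
```

Here the 4-state NFA yields a 4-state DFA; in the worst case the construction can produce up to 2^n DFA states for an n-state NFA, which is the growth the slide warns about.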
  15. 15. PCRE vs. DFA and NFA • PCRE (Perl Compatible Regular Expression) engine is powerful • typical PCRE engine comes as NFA + backtracking • a DFA matches only (pure) regular languages, so it can handle only a subset of PCREs • fewer features, faster engines! – Hyperscan, https://01.org/hyperscan – Perl Incompatible Regular Expressions, https://github.com/yandex/pire
  16. 16. Features considered harmful • back-tracking (trial and error) • back references, e.g. \1 • lookarounds (lookahead, lookbehind), e.g. (?<!a)b • conditional regexps, e.g. (?(?=regex)then|else) see also: http://www.regular-expressions.info
  17. 17. Case: catastrophic backtracking • 34 min Stack Overflow outage in 2016 • \s+$ • "malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line" • O(n²): >>> sum(range(0,20001)) gives 200010000 • in other cases it may be O(2^n) sources: • http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016 • http://www.regular-expressions.info/catastrophic.html
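To see where the quadratic cost comes from, here is a simplified model (my own sketch, not the engine involved in the outage) of how a backtracking matcher tries `\s+$` on a run of whitespace followed by one non-whitespace character. From every start position it consumes the rest of the whitespace run, fails at `$`, and moves on. The helper name `backtracking_steps` is hypothetical.

```python
def backtracking_steps(text):
    """Count whitespace checks a naive matcher performs for \\s+$ on text."""
    steps = 0
    n = len(text)
    for start in range(n):
        pos = start
        # Greedily consume whitespace from this start position.
        while pos < n and text[pos].isspace():
            steps += 1
            pos += 1
        # `$` matches only at the very end of the text.
        if pos == n and pos > start:
            return steps
        # Otherwise the attempt fails and the next start position is tried.
    return steps


n = 1000
print(backtracking_steps(" " * n + "x"))   # n * (n + 1) / 2 checks
```

For n whitespace characters the count is n(n+1)/2, so 20,000 characters give the 200010000 steps shown on the slide.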
  18. 18. Sources 1. "Finite State Machine Parsing for Internet Protocols: Faster Than You Think", http://www.cs.dartmouth.edu/~pete/pubs/LangSec-2014-fsm-parsers.pdf 2. "100G Intrusion Detection", http://go.lbl.gov/100g 3. "DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching", http://domino.watson.ibm.com/library/cyberdig.nsf/papers/F38C0227DBF5C7E78525758C005BD05C/$File/rc24645.pdf 4. "Fast Regular Expression Matching Using Dual Glushkov NFA", https://www-alg.ist.hokudai.ac.jp/~thomas/TCSTR/tcstr_14_73/tcstr_14_73.pdf 5. PIRE discussion: https://news.ycombinator.com/item?id=10209775
  19. 19. ^Hyperscan$
  20. 20. What is Hyperscan? • "high-performance multiple regex matching library" • C (run-time, API) and C++ (compiler), BSD licensed • runs on Intel CPUs only, uses: – SIMD (Single Instruction, Multiple Data) – BMI (Bit Manipulation Instruction Sets) • "typically used in a DPI library stack"
  21. 21. Hyperscan history • developed by Sensory Networks • 2003-2008 hardware prototypes (GPGPU, FPGA), NodalCore C-series accelerators • 2009 software-based Hyperscan created (note: the hardware approach was a dead end) • 2009-2015 evolution (commercial) • 2015 acquired by Intel, released under the BSD license • 2017 v4.4 release sources: • https://01.org/hyperscan • https://lists.01.org/pipermail/hyperscan/2017-January/000078.html • "Hyperscan In SURICATA: STATE OF THE UNION"
  22. 22. Hyperscan usage examples (2016 EoY) • unknown commercial IDS/IPS and NGFW products • Snort integration (IDS/IPS signatures) • Suricata integration (IDS/IPS signatures) • RSPAMD integration (e-mail scanning) • redGuardian integration (DDoS patterns)
  23. 23. How it works – regexp database • a database is a group of regexps and their settings, thousands of regexps possible • table columns: #, pattern, flags, min offset, max offset, min length • row 0: ^foo • row 1: bar$ • row 2: \w+baz\s{2} singlematch • row 3: \d+ leftmost 5 • row 4: lorem\nipsum dotall 10 • row n: ^(all|your|base) caseless 15
  24. 24. How it works – independent scanning contexts • regex database compiled earlier • core 0: input, matcher, local data (scratch) • core n: input, matcher, local data (scratch)
  25. 25. How it works • may return multiple matches • by default, returns only the end offset • not greedy • the regular expression is parsed and split into: – literals (fixed strings) – DFA engines – NFA engines – custom engines (prefix, suffix, infix, outfix) – not Aho-Corasick • scanning mode – block, streaming, vectored • example instructions: PCMPEQB (compare packed bytes in xmm2/m128 and xmm1 for equality), POPCNT (return the count of bits set to 1)
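The first three bullets above (multiple matches, end offset only, not greedy) differ from what a typical PCRE-style search returns. The sketch below emulates that reporting style with Python's `re` module; it does not use the Hyperscan API, it is quadratic on purpose, and it is only meant for simple unanchored patterns. `match_end_offsets` is a hypothetical helper name.

```python
import re


def match_end_offsets(pattern, data):
    """Return every end offset at which some substring of data matches."""
    compiled = re.compile(pattern)
    ends = []
    for end in range(1, len(data) + 1):
        # A match "ends here" if any start position yields a full match.
        if any(compiled.fullmatch(data, start, end) for start in range(end)):
            ends.append(end)
    return ends


data = "xfooo"
print(match_end_offsets(r"fo+", data))    # all end offsets, no start offsets
print(re.search(r"fo+", data).span())     # greedy engine: one longest match
```

For `fo+` on `xfooo` the emulation reports end offsets 3, 4 and 5, while the greedy `re.search` reports the single span (1, 5).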
  26. 26. DPDK ACL vs. Hyperscan regexp. DPDK ACL: • compiled to "ACL" • fixed position pattern • looks up all fields in the packet • looks up multiple packets at once in one ACL (up to 16 categories) • predictable speed • returns one match (highest priority) per category. Regexp as ACL [1]: • compiled to "DB" • dynamic position pattern • skips irrelevant fields • looks up one packet in DB (multiple regexps at once) • speed depends on input • may return multiple matches. [1] speculation, v4.5 is not released yet
  27. 27. Sources (Hyperscan) 1. http://01org.github.io/hyperscan/ 2. http://www.slideshare.net/harryvanhaaren/hyperscan-mohammad-abdul-awal 3. "HYPERSCAN PERFORMANCE BENCHMARK ON INTEL XEON PROCESSORS, Delivering 160 Gbps DPI Throughput on the Intel Xeon Processor E5-2600 Series", https://networkbuilders.intel.com/docs/1645-Hyperscan-Performance-Benchmark-on-Intel-Xeon-Processors.pdf 4. "HOW WE MATCH REGULAR EXPRESSIONS", https://01.org/node/3777 5. "Hyperscan Glossary, a few philosophical points", https://lists.01.org/pipermail/hyperscan/2016-September/000035.html 6. "Software-based Acceleration of Deep Packet Inspection on Intel Architecture", https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf 7. "Hyperscan In SURICATA: STATE OF THE UNION", http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_GeoffLangdale.pdf 8. "Hyperscan in Rspamd", http://www.slideshare.net/VsevolodStakhov/rspamdhyperscan 9. https://www.reddit.com/r/cpp/comments/3picdx/hyperscan_highperformance_multiple_regex_matching/
  28. 28. redGuardian packet pipeline (simplified) [diagram with stages: DPDK RX, pre filtering, customer?, DPDK ACL1 ... DPDK ACLn, regexp, policing, DPDK TX; supported by state tables and protocol prefilters]
  29. 29. Basic benchmark • Xeon E3-1231 v3 @ 3.40GHz, turbo mode disabled, 10G ixgbe port, 1 core • two cache lines prefetched • results in Mpps • ACL config: network net.1 acl drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0 pass end • regexp config: regex baz "^foobar" network net.1 acl regex drop baz pass udp pass end • results: plnog_udp_acl rx_median 12.912; tx_median 0.000; gen_rx 0.000; gen_tx 14.881 plnog_udp_regexp rx_median 9.832; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
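The ACL rule in this benchmark is a value/mask comparison on 8 bytes of UDP payload: 0x666f6f626172 is ASCII "foobar" and the mask ignores the last two bytes. The sketch below is my reading of that rule, not redGuardian code; it assumes big-endian interpretation of the 8-byte window and zero padding for short payloads, and `u64_rule_matches` is a hypothetical name.

```python
VALUE = 0x666F6F6261720000   # "foobar" followed by two don't-care bytes
MASK = 0xFFFFFFFFFFFF0000    # compare the first six bytes only


def u64_rule_matches(payload, offset, value=VALUE, mask=MASK):
    """Value/mask compare of 8 payload bytes at a fixed offset."""
    # Assumption: short payloads are zero padded up to 8 bytes.
    window = payload[offset:offset + 8].ljust(8, b"\x00")
    word = int.from_bytes(window, "big")   # assumption: big-endian u64
    return (word & mask) == value


print(bytes.fromhex("666f6f626172"))        # the literal the rule encodes
print(u64_rule_matches(b"foobarXY", 0))
print(u64_rule_matches(b"foobazXY", 0))
```

The regexp rule `^foobar` expresses the same check, which is what makes the two benchmark results comparable.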
  30. 30. Basic benchmark • regexp config: // ETH() / IP() / UDP() / ('x'*64 + 'foobar') regex baz "^(.{8}){0,8}foobar" network net.1 acl regex drop baz pass udp pass end • ACL config: network net.1 acl drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 8 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 16 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 24 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 32 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 40 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 48 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 56 drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 64 pass end • matching: plnog_udp_acl_many rx_median 5.846; tx_median 0.000; gen_rx 0.000; gen_tx 9.191 plnog_udp_regexp_many rx_median 2.921; tx_median 0.000; gen_rx 0.000; gen_tx 9.191 • not matching: plnog_udp_acl_many rx_median 4.518; tx_median 4.518; gen_rx 4.517; gen_tx 9.124 plnog_udp_regexp_many rx_median 5.352; tx_median 5.352; gen_rx 5.353; gen_tx 9.124
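The single regexp `^(.{8}){0,8}foobar` and the nine fixed-offset ACL rules are meant to drop the same packets: "foobar" at payload offset 0, 8, ..., 64. The sketch below checks that equivalence on a few payloads using Python's `re` module; it models the ACL rules as plain 6-byte comparisons, which is a simplification of the value/mask syntax, and the function names are illustrative.

```python
import re

# DOTALL so that "." also matches newline bytes, as in raw packet payloads.
PATTERN = re.compile(rb"^(.{8}){0,8}foobar", re.DOTALL)
OFFSETS = range(0, 65, 8)   # 0, 8, ..., 64 as in the nine ACL rules


def regex_drops(payload):
    """Decision of the single regexp rule."""
    return PATTERN.match(payload) is not None


def acl_drops(payload):
    """Decision of the nine fixed-offset rules (simplified to byte compares)."""
    return any(payload[off:off + 6] == b"foobar" for off in OFFSETS)


samples = [
    b"x" * 64 + b"foobar",   # the benchmark's matching packet
    b"foobar",
    b"x" * 7 + b"foobar",    # not on an 8-byte boundary
    b"x" * 72 + b"foobar",   # beyond the last covered offset
]
for payload in samples:
    print(len(payload), regex_drops(payload), acl_drops(payload))
```

Both functions agree on every sample; the benchmark's matching packet hits only at offset 64, so the ACL version has to fall through to its last rule.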
  31. 31. Summary • header and payload are the same • regexp engines can be fast • careful benchmarking required • x86 platform can compete with "hardware appliances"
  32. 32. ^Backup\s+slides$
  33. 33. Hardware: CPU + FPGA hybrid? • CPU + FPGA hybrid – Atom + Altera FPGA (2010) – Intel bought Altera (2015) – Intel Stratix® 10 FPGA has built-in ARM Cortex-A53 – Xeon Broadwell-EP + FPGA rumours (2016) • Xeon v5 with AVX-512 • Knights Landing Xeon Phi™ – AVX-512 – 256 threads sources: • https://www.nextplatform.com/2016/03/14/intel-marrying-fpga-beefy-broadwell-open-compute-future/ • https://newsroom.intel.com/wp-content/uploads/sites/11/2016/01/ProductBrief-IntelAtomProcessor_E600C_series.pdf • https://www.nextplatform.com/2016/11/15/intel-sets-skylake-xeon-hpc-knights-mill-xeon-phi-ai/
  34. 34. Hardware: 100+ G NICs • Mellanox ConnectX®-6 (not available yet): ports 2 × 200G; bus lanes 2 × 16, PCIe 3 or 4 (can use 2 slots); chipset ConnectX-6; host CPU bypass ASAP²; driver mlx6? • Silicom PE3100G2DQIRL: ports 2 × 100G; bus lanes 2 × 8; chipset Intel® FM10420; host CPU bypass FlexPipe™; driver fm10k • QLogic FastLinQ QL45000: ports 1 × 100G; bus lanes 16; chipset cLOM8514; driver qede • Netronome Agilio LX: ports 1 × 100G; bus lanes 2 × 8; chipset NFP-6480; host CPU bypass programmable data path offload (C, P4); driver nfp sources: • http://www.mellanox.com/page/products_dyn?product_family=266&mtag=connectx_6_en_card • http://www.silicom-usa.com/pr/server-adapters/networking-adapters/100-gigabit-ethernet-networking-server-adapters/pe3100g2dqirl-server-adapter/ • http://www.qlogic.com/Resources/Documents/DataSheets/Adapters/DataSheet_QL45611HLCU_IEA.pdf • https://www.netronome.com/media/redactor_files/PB_Agilio_Lx_1x100GbE.pdf
  35. 35. ^Q&A.* https://twitter.com/redguardianeu