Spy hard, challenges of 100G deep packet inspection on x86 platform

Spy hard
challenges of 100G deep packet inspection on x86 platform
Paweł Małachowski, 2017.03.07

Deep packet inspection (DPI)
no DPI
• packet header lookup
• route based on destination (unless
PBR)
• classify with static rules or state data
• cheap
DPI
• packet header and payload lookup
• may route based on content (e.g.
uplinks for priority and `bulky’ traffic)
• classify with static rules, state data,
multiple patterns and custom logic
• expensive?
3

100+ Gbit DPI – why?
• end customers typically < 10G uplinks
– L7 filtering (WAF, IPS etc.) requested by enterprises
– multiple IDS, IPS, NGFW, UTM and WAFs on the market
– can be handled with open source tools
• 100G+ speeds: ISP/Telco/large DCs
– do not want to interfere with traffic
• unless hit by huge DDoS attack
• or kindly asked by local régime
4

Mirai botnet attacks – examples
• attack_tcp_stomp
– establish legal TCP connection, then flood it
– not to confuse with STOMP protocol
• attack_udp_dns
– DNS „water torture”, FQDN with random host
• attack_app_http
– HTTP request flood
• attack_app_cfnull
– HTTP POST junk
5
source: https://github.com/rosgos/Mirai-Source-Code
DPI may help
easy :)

Large DDoS attacks in 2016 – examples
1. 150M pps (650Gbps) of TCP SYN packets (mixed size), spoofed IPs
2. 1.75M rps peak of HTTP requests (~121B/r) from ~52k src IPs
3. 220k rps (360Gbps) of large HTTP requests from ~128k src IPs
4. ~1Tbps of recursive „water torture” DNS queries
sources:
• https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/
• https://www.incapsula.com/blog/650gbps-ddos-attack-leet-botnet.html
• http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
6
DPI may help

100Gbit/s sizing
• ~148.8 Mpps in small frames, but no payload to scan
• ~8.127 Mpps in 1514B frames
• ~12.19 GB/s of IP payload
• given 16 core machine, our target is:
– ~0.5M – 2M lookups /s per core
– up to ~762 MB/s per core
– note: not all packets and not entire payloads have to be scanned
7

Payload lookup – position
• fixed
– e.g. NTP
• network protocol aware
– e.g. DNS
• application aware
– e.g. HTTP
• anywhere in the packet
– bad idea
$ strings /usr/bin/* | grep -c sex
93
8

Protocol design rant
"string: variable-length byte field, encoded in UTF-8, terminated by 0x00”
source: https://developer.valvesoftware.com/wiki/Server_queries
9

Software payload lookup – approaches
Method Example
fixed position literal matching (sequence) <you name it>
fixed position literal matching (trie) DPDK ACL
computed position literal matching tc u32
application aware classifier nDPI, netfilter l7-filter
application level gateway (ALG) netfilter nf_conntrack_*
programmable data path netfilter xt_bpf, nftables, XDP+eBPF
embedded scripting language NPFLua, pflua
hybrid with state machines Hyperscan, Tempesta FW
regexp engine Bro, Snort, Suricata
10

Basic regexp
(w+ )+PLNOG1[68]$
tool: https://www.debuggex.com/
12

Finite–state machine
• abstract machine
• has states and transitions
• some states are "accept states"
• input updates machine state
• accepts and rejects input sequence
of symbols
sources:
• https://en.wikipedia.org/wiki/State_diagram
• https://en.wikipedia.org/wiki/Deterministic_finite_automaton
example: accepts binary strings with even number of zeroes
13

DFA vs. NFA
• Deterministic finite automaton (DFA)
– each of its transitions is uniquely determined by its source state and input
symbol
– reading an input symbol is required for each state transition.
• Nondeterministic finite automaton (NFA) otherwise
• NFA can be converted to DFA
– DFA is efficient to execute, but may grow
– NFA is easier to construct, but may be slower
tools:
• http://hackingoff.com/compilers/regular-expression-to-nfa-dfa
• http://ivanzuzak.info/noam/webapps/fsm_simulator/
14

PCRE vs. DFA and NFA
• PCRE (Perl Compatible Regular Expression) engine is powerful
• typical PCRE engine comes as NFA + backtracking
• DFA matches regular language (pure) thus can be used to match only
some of PCREs
• less features, faster engines!
– Hyperscan, https://01.org/hyperscan
– Perl Incompatible Regular Expressions, https://github.com/yandex/pire
15

Features considered harmful
• back-tracking (trial and error)
• back references 1
• lookarounds (lookahead, lookbehind) (?<!a)b
• conditional regexps (?(?=regex)then|else)
16
see also: http://www.regular-expressions.info

Case: catastrophic backtracking
• 34 min Stack Overflow outage in 2016
• s+$
• „malformed post contained roughly 20,000 consecutive
characters of whitespace on a comment line”
• O(n2)
• in other cases it may be 2n
sources:
• http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
• http://www.regular-expressions.info/catastrophic.html
17
>>> sum(range(0,20001))
200010000

Sources
1. „Finite State Machine Parsing for Internet Protocols: Faster Than You Think”,
http://www.cs.dartmouth.edu/~pete/pubs/LangSec-2014-fsm-parsers.pdf
2. „100G Intrusion Detection”, http://go.lbl.gov/100g
3. „DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching”,
http://domino.watson.ibm.com/library/cyberdig.nsf/papers/F38C0227DBF5C7E78525758C005BD05C/$File/rc24645.pdf
4. „Fast Regular Expression Matching Using Dual Glushkov NFA”,
https://www-alg.ist.hokudai.ac.jp/~thomas/TCSTR/tcstr_14_73/tcstr_14_73.pdf
5. PIRE discussion: https://news.ycombinator.com/item?id=10209775
18

What is Hyperscan?
• „high-performance multiple regex matching library”
• C (run-time, API) and C++ (compiler), BSD licensed
• runs on Intel CPUs only, uses:
– SIMD (Single Instruction, Multiple Data)
– BMI (Bit Manipulation Instruction Sets)
• „typically used in a DPI library stack”
20

Hyperscan history
• developed by Sensory Networks
• 2003-2008 hardware prototypes (GPGPU, FPGA), NodalCore C-series accelerators
• 2009 software-based Hyperscan created (note: hardware approach dead end)
• 2009-2015 evolution (commercial)
• 2015 acquired by Intel, released on BSD license
• 2017 v4.4 release
sources:
• https://01.org/hyperscan
• https://lists.01.org/pipermail/hyperscan/2017-January/000078.html
• "Hyperscan In SURICATA: STATE OF THE UNION"
21

Hyperscan usage examples (2016 EoY)
• unknown commercial IDS/IPS and NGFW products
• Snort integration (IDS/IPS signatures)
• Suricata integration (IDS/IPS signatures)
• RSPAMD integration (e-mail scanning)
• redGuardian integration (DDoS patterns)
22

How it works – regexp database
# pattern flags min offset max offset min length
0 ^foo
1 bar$
2 w+bazs{2} singlematch
3 d+ leftmost 5
4 loremnipsum dotall 10
n ^(all|your|base) caseless 15
23
database is a group of regexps and their settings, thousands of regexps possible

How it works – independent scanning contexts
24
regex
database
compiled
earlierinput core 0
matcher, local data (scratch)
input core n
matcher, local data (scratch)

How it works
• may return multiple matches
• by default, returns only end offset
• not greedy
• regexp expression parsed and split into:
– literals (fixed strings)
– DFA engines
– NFA engines
– custom engines (prefix, suffix, infix, outfix)
– not Aho-Corasick
• scanning mode – block, streaming, vectored
25
PCMPEQB (compare packed bytes in
xmm2/m128 and xmm1 for equality)
POPCNT (return the Count of Number
of Bits Set to 1)

DPDK ACL vs. Hyperscan regexp
DPDK ACL
• compiled to „ACL”
• fixed position pattern
• looks up all fields in the packet
• looks up multiple packets at once in
one ACL (up to 16 categories)
• predictable speed
• returns one match (highest priority) per
category
regexp as ACL1
• compiled to „DB”
• dynamic position pattern
• skip not relevant fields
• looks up one packet in DB (multiple
regexps at once)
• speed depends on input
• may return multiple matches
26
1 speculation, v4.5 is not released yet

Sources (Hyperscan)
1. http://01org.github.io/hyperscan/
2. http://www.slideshare.net/harryvanhaaren/hyperscan-mohammad-abdul-awal
3. „HYPERSCAN PERFORMANCE BENCHMARK ON INTEL XEON PROCESSORS, Delivering 160 Gbps DPI Throughput on the Intel
Xeon Processor E5-2600 Series”,
https://networkbuilders.intel.com/docs/1645-Hyperscan-Performance-Benchmark-on-Intel-Xeon-Processors.pdf
4. „HOW WE MATCH REGULAR EXPRESSIONS”, https://01.org/node/3777
5. „Hyperscan Glossary, a few philosophical points”, https://lists.01.org/pipermail/hyperscan/2016-September/000035.html
6. „Software-based Acceleration of Deep Packet Inspection on Intel Architecture”,
https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf
7. "Hyperscan In SURICATA: STATE OF THE UNION",
http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_GeoffLangdale.pdf
8. „Hyperscan in Rspamd”, http://www.slideshare.net/VsevolodStakhov/rspamdhyperscan
9. https://www.reddit.com/r/cpp/comments/3picdx/hyperscan_highperformance_multiple_regex_matching/
27

redGuardian packet pipeline (simplified)
DPDK RX
customer? policingregexppre filtering
state
tables,
protocol
prefilters
DPDK
ACL1
DPDK TX
DPDK
ACLn
28

Basic benchmark
• Xeon E3-1231 v3 @ 3.40GHz, turbo mode disabled, 10G ixgbe port, 1 core
• two cache lines prefetched
• results in Mpps
29
network net.1 acl
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
pass
end
regex baz "^foobar"
network net.1 acl
regex drop baz pass udp
pass
end
plnog_udp_acl rx_median 12.912; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
plnog_udp_regexp rx_median 9.832; tx_median 0.000; gen_rx 0.000; gen_tx 14.881

Basic benchmark
// ETH() / IP() / UDP() / ('x'*64 + 'foobar')
regex baz "^(.{8}){0,8}foobar"
network net.1 acl
regex drop baz pass udp
pass
end
matching
plnog_udp_acl_many rx_median 5.846; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
plnog_udp_regexp_many rx_median 2.921; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
not matching
plnog_udp_acl_many rx_median 4.518; tx_median 4.518; gen_rx 4.517; gen_tx 9.124
plnog_udp_regexp_many rx_median 5.352; tx_median 5.352; gen_rx 5.353; gen_tx 9.124
30
network net.1 acl
pass
end

Summary
• header and payload are the same
• regexp engines can be fast
• careful benchmarking required
• x86 platform can compete with „hardware appliances”
31

Hardware: CPU + FPGA hybrid?
• CPU + FPGA hybrid
– Atom + Altera FPGA (2010)
– Intel bought Altera (2015)
– Intel Stratix® 10 FPGA has built in ARM Cortex-A53
– Xeon Broadwell-EP + FPGA rumours (2016)
• Xeon v5 with AVX-512
• Knights Landing Xeon PhiTM
– AVX-512
– 256 threads
33
sources:
• https://www.nextplatform.com/2016/03/14/intel-marrying-fpga-beefy-broadwell-open-compute-future/
• https://newsroom.intel.com/wp-content/uploads/sites/11/2016/01/ProductBrief-IntelAtomProcessor_E600C_series.pdf
• https://www.nextplatform.com/2016/11/15/intel-sets-skylake-xeon-hpc-knights-mill-xeon-phi-ai/

Hardware: 100+ G NICs
Mellanox ConnectX®-6
(not available yet)
Silicom
PE3100G2DQIRL
QLogic FastLinQ
QL45000
Netronome Agilio LX
ports 2 × 200G 2 × 100G 1 × 100G 1 × 100G
bus lanes 2 × 16, PCIe 3 or 4
(can use 2 slots)
2 × 8 16 2 × 8
chipset ConnectX-6 Intel® FM10420 cLOM8514 NFP-6480
host CPU bypass ASAP2 FlexPipeTM programmable data
path offload (C, P4)
driver mlx6? fm10k qede nfp
sources:
• http://www.mellanox.com/page/products_dyn?product_family=266&mtag=connectx_6_en_card
• http://www.silicom-usa.com/pr/server-adapters/networking-adapters/100-gigabit-ethernet-networking-server-adapters/pe3100g2dqirl-server-adapter/
• http://www.qlogic.com/Resources/Documents/DataSheets/Adapters/DataSheet_QL45611HLCU_IEA.pdf
• https://www.netronome.com/media/redactor_files/PB_Agilio_Lx_1x100GbE.pdf
34

^Q&A.*
https://twitter.com/redguardianeu

Spy hard, challenges of 100G deep packet inspection on x86 platform

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Spy hard, challenges of 100G deep packet inspection on x86 platform

Similar to Spy hard, challenges of 100G deep packet inspection on x86 platform (20)

More from Redge Technologies

More from Redge Technologies (8)

Recently uploaded

Recently uploaded (20)

Spy hard, challenges of 100G deep packet inspection on x86 platform