2. Intrusion Detection and Prevention System
• IDS/IPS is deployed at the gateway to identify network threats
• Checks packets (including payloads) against complex rules
• Compute-intensive
[Figure: packets from the Internet pass through the IDS/IPS gateway before reaching applications on the local network]
3. Problem: State-of-the-Art Cannot Keep Up
[Figure: Line Rate Evolution. x-axis: Year (2009, 2012, 2015, 2017, Today); y-axis: Line rate (Gbps), 0 to 100]
4. Inefficient to Scale Up Using State-of-the-Art
• Evaluated Snort 3.0 equipped with the Hyperscan pattern-matching library
• Reaching 100Gbps needs 4-21 servers (32-core) and 1125-6000 W
[Figure: Number of Cores Needed to Reach 100Gbps. x-axis: Traces (Mixed-1 to Mixed-5, Norm-1, Norm-2); y-axis: Number of cores, 0 to 800. Bar labels range from 125 to 667 cores]
*Assuming ideal scaling for Snort
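The extrapolation behind these core and server counts is plain arithmetic. The following is a minimal sketch, assuming ideal (linear) scaling as the footnote states. The per-core throughput values are hypothetical, back-computed from the smallest and largest bar labels rather than quoted from the talk, and the function names are illustrative.

```python
TARGET_MBPS = 100_000      # 100 Gbps, kept in Mbps so the arithmetic stays in integers
CORES_PER_SERVER = 32      # server size quoted on the slide


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def cores_needed(per_core_mbps: int) -> int:
    """Cores required to reach the target, assuming ideal linear scaling."""
    return ceil_div(TARGET_MBPS, per_core_mbps)


def servers_needed(cores: int) -> int:
    """Whole 32-core servers required to host that many cores."""
    return ceil_div(cores, CORES_PER_SERVER)


# Hypothetical single-core zero-loss throughputs (Mbps), not measured values.
for per_core in (800, 150):
    cores = cores_needed(per_core)
    print(f"{per_core} Mbps/core: {cores} cores, {servers_needed(cores)} servers")
```

Under these assumptions, 125 cores round up to 4 servers and 667 cores round up to 21 servers, which matches the 4-21 server range on the slide.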
5. Pigasus: 100Gbps IPS on a Single Server
• 1 FPGA-based SmartNIC + 16-core CPU
[Figure: a single server containing one FPGA NIC and multiple CPU cores]
6. Order-of-Magnitude Efficiency Improvement
• Snort: 4-21 servers (32-core) and 1125-6000 W
• Pigasus: 1 server (16-core) and 49-166 W
[Figure: Number of Cores Needed to Reach 100Gbps, Snort vs. Pigasus. x-axis: Traces (Mixed-1 to Mixed-5, Norm-1, Norm-2); y-axis: Number of cores, 0 to 700. Snort bar labels range from 125 to 667; Pigasus bar labels: 7, 4, 6, 14, 2, 1, 1]
*Pigasus' numbers are actual core counts
7. What is the secret sauce behind the 100x improvement?
FPGA-First Architecture
Fundamentally different scheme to make 100x improvement possible
8. Traditional “FPGA-as-Offload” Acceleration
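[Figure: FPGA-as-offload architecture. Packets enter through the NIC and cross PCIe to the CPU, which runs the Parser, Flow Reassembly, and Full Match stages; Multi-String Pattern Match runs on the FPGA, reached over PCIe. Labels: Pkt in, Pkt out, Innocent pkts, Suspicious pkts]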
• Packets come into the CPU first
• The CPU is the main processing unit
• The FPGA accelerates a particular task, e.g., MSPM
9. Prior Work Cannot Get 100x Speedup
• No single task dominates anymore (Hyperscan has made MSPM 8x faster)
• Offloading any one task yields at most ~2x speedup, even assuming ideal acceleration
[Figure: Performance breakdown of Snort with Hyperscan. x-axis: Traces (Mixed-1 to Mixed-5, Norm-1, Norm-2); y-axis: Fraction of CPU time, 0% to 100%. Legend: Parsing, Reassembly, Multi-String Pattern Matching, Full Matching, Other]
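The ~2x ceiling is an instance of Amdahl's law: if a task takes a fraction f of CPU time and is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch follows; the fractions used are hypothetical round numbers, not values read from the breakdown chart, and `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(fraction: float, task_speedup: float = float("inf")) -> float:
    """Overall speedup when a task taking `fraction` of the time runs `task_speedup` times faster.

    The default of infinity models ideal acceleration (the task becomes free).
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / ((1.0 - fraction) + fraction / task_speedup)


# Hypothetical shares of CPU time for the offloaded task.
for f in (0.25, 0.5):
    print(f"offloading {f:.0%} of CPU time: at most {amdahl_speedup(f):.2f}x overall")
```

With no task above roughly half of CPU time, even a free accelerator cannot exceed about 2x. By the same formula, a 100x gain would need the accelerated portion to cover at least 99% of the work.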
10. Pigasus: Inverted Offload Approach
• “FPGA-first” architecture: the FPGA is the main processing unit
• Common cases are processed entirely on the FPGA
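[Figure: FPGA-first architecture. Packets arrive over Ethernet at the FPGA, which runs the Parser, Flow Reassembly, and Multi-String Pattern Match stages; innocent packets leave directly, while suspicious packets cross PCIe to Full Match on the CPU. Labels: Pkt in, Pkt out, Innocent pkts, Suspicious pkts, 95% packets, 5% packets]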
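The effect of inverting the offload can be put in numbers. This is a rough sketch, assuming the 5% label in the figure is the share of traffic that reaches the CPU and that packet share approximates byte share (both are assumptions); `cpu_bound_gbps` is an illustrative name.

```python
def cpu_bound_gbps(line_rate_gbps: int, percent_to_cpu: int) -> float:
    """Traffic the CPU must inspect, given the percentage of packets sent to it."""
    return line_rate_gbps * percent_to_cpu / 100


# FPGA-as-offload: every packet reaches the CPU first.
offload_load = cpu_bound_gbps(100, 100)
# FPGA-first: only the packets flagged as suspicious cross PCIe.
fpga_first_load = cpu_bound_gbps(100, 5)

print(f"offload: {offload_load} Gbps on CPU, FPGA-first: {fpga_first_load} Gbps on CPU")
```

Under these assumptions the CPU sees about 5 Gbps instead of 100 Gbps, which is why a small number of cores can suffice.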
11. Challenge: Limited Fast Memory on FPGA
[Figure: the Parser, Flow Reassembly, and Multi-String Pattern Match modules do not fit in the FPGA's 16 MB of fast memory]
• Only 16 MB of Block RAM (BRAM)
• Using existing FPGA modules would need more than 87 MB
12. What is the secret sauce behind the 100x improvement?
FPGA-First Architecture
Fundamentally different scheme to make 100x improvement possible
Hierarchical Multi-String Pattern Matching (MSPM)
One of the algorithms to address the memory challenge
*Please refer to our paper for Flow Reassembly and Memory Resource Management
13. Multi-String Pattern Matching (MSPM)
• Checking payload and port number against 10K rules in “one” pass
alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS
(…; content:"username=",fast_pattern;
content:"/GetPermissions.asp";
pcre:"(^|&)username=[^&]*?"; sid:2019; …)
Snort’s MSPM:
- Header
- Fast pattern
(Rest is checked by Full Matcher)
Pigasus’ MSPM:
- Header
- Fast pattern
- Non-fast string pattern
(Dominated Snort’s Full Matcher)
Any field mismatch => rule does not match
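The difference between the two MSPM scopes can be sketched in a few lines. This is an illustrative model, not the Snort or Pigasus implementation: the `Rule` class, the reduction of the header to a destination-port set, and the port values standing in for $HTTP_PORTS are all assumptions, and the strings are adapted from the example rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    sid: int
    dst_ports: set         # header constraint, simplified to destination ports
    fast_pattern: bytes    # the one string every MSPM checks
    other_patterns: list   # non-fast string patterns
    pcre: bytes            # regex, checked only by the full matcher


def mspm_match(rule: Rule, dst_port: int, payload: bytes, check_non_fast: bool) -> bool:
    """Return True if the packet survives MSPM for this rule.

    check_non_fast=False models the Snort-style scope (header + fast pattern);
    check_non_fast=True models the Pigasus-style scope (also non-fast strings).
    Any field mismatch means the rule does not match.
    """
    if dst_port not in rule.dst_ports:
        return False
    if rule.fast_pattern not in payload:
        return False
    if check_non_fast and any(p not in payload for p in rule.other_patterns):
        return False
    return True


def full_match(rule: Rule, payload: bytes) -> bool:
    """Full matcher: remaining strings plus the regex."""
    strings_ok = all(p in payload for p in rule.other_patterns)
    return strings_ok and re.search(rule.pcre, payload) is not None


# Hypothetical stand-in for $HTTP_PORTS.
rule = Rule(2019, {80, 8080}, b"username=", [b"/GetPermissions.asp"],
            rb"(^|&)username=[^&]*?")
attack = b"GET /GetPermissions.asp?x=1&username=admin HTTP/1.1"
decoy = b"GET /index.html?x=1&username=admin HTTP/1.1"

print(mspm_match(rule, 80, decoy, check_non_fast=False))  # reaches the full matcher
print(mspm_match(rule, 80, decoy, check_non_fast=True))   # filtered out earlier
```

In this model the decoy request passes the narrower MSPM and must be rejected later by the full matcher, while the wider MSPM rejects it up front, so fewer packets reach the full matcher.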
14. MSPM Design Options
• State machine-based: 23 MB of BRAM
• Snort Hyperscan (hashtable-based): 25 MB of BRAM
• Pigasus (hashtable-based): 3 MB of BRAM, while completing more work
15. Pigasus Utilizes Perf & Memory Tradeoff
• High performance => Process more data in parallel => More memory
[Figure: 32 matcher replicas (Matcher Replica_0 to Matcher Replica_31), each fed one payload window, from Payload[i,i+8] to Payload[i+31,i+31+8]]
Key observation: no need to keep up with 100Gbps **everywhere**
Why not use less memory where lower performance is acceptable?
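The tradeoff can be made concrete with a small model of the replicated matcher. This is a sketch under assumptions: windows are treated as half-open 8-byte slices, each replica is assumed to hold its own copy of the lookup table, and the 0.5 MB table size is a hypothetical figure chosen only for illustration.

```python
def windows_for_cycle(payload: bytes, i: int, replicas: int = 32, width: int = 8) -> list:
    """Windows inspected in one cycle: replica r sees payload[i + r : i + r + width]."""
    return [payload[i + r : i + r + width] for r in range(replicas)]


def memory_mb(replicas: int, table_mb_per_replica: float) -> float:
    """Total table memory when every replica needs its own copy."""
    return replicas * table_mb_per_replica


payload = bytes(range(64))
cycle = windows_for_cycle(payload, 0)
print(len(cycle), cycle[0], cycle[31])

# Hypothetical 0.5 MB table per replica.
print(memory_mb(32, 0.5), memory_mb(8, 0.5))
```

More replicas advance more byte offsets per cycle, but memory grows in step with the replica count, so a stage that sees less traffic can run with fewer replicas and a smaller footprint.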
16. Use Hierarchical Filters to Save Memory
[Figure: filter hierarchy fed with 100Gbps of data. Stages: Fast Pattern String Matcher (32X replicas), Rule Header Matcher for Header Match (8X replicas), Non-fast Pattern String Matcher (16X replicas), then to CPU. Data-rate labels between stages: ~23Gbps, ~11Gbps, ~5Gbps]
Key Idea: Hierarchical Filtering with Reduced Replicas at Each Layer
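The filtering cascade can be sketched as a list of predicates applied in order, where each stage only sees what the previous one let through. This is an illustrative model with hypothetical packets and patterns, not the hardware design; `hierarchical_filter` and the stage names are assumptions.

```python
def hierarchical_filter(packets, stages):
    """Run packets through the stages in order.

    Returns the survivors and, for each stage, how many packets it had to inspect.
    """
    seen = []
    survivors = list(packets)
    for name, predicate in stages:
        seen.append((name, len(survivors)))
        survivors = [p for p in survivors if predicate(p)]
    return survivors, seen


# Hypothetical traffic and rule fields.
packets = [
    {"dst_port": 80, "payload": b"GET /index.html"},
    {"dst_port": 80, "payload": b"GET /a?username=bob"},
    {"dst_port": 22, "payload": b"username=root"},
    {"dst_port": 80, "payload": b"GET /GetPermissions.asp?username=eve"},
]
stages = [
    ("fast pattern", lambda p: b"username=" in p["payload"]),
    ("header", lambda p: p["dst_port"] in {80, 8080}),
    ("non-fast pattern", lambda p: b"/GetPermissions.asp" in p["payload"]),
]

to_cpu, seen = hierarchical_filter(packets, stages)
for name, count in seen:
    print(f"{name}: inspects {count} packets")
print(f"to CPU: {len(to_cpu)} packets")
```

Only the first stage has to keep up with the full input; later stages inspect progressively fewer packets, which is what allows them to use fewer replicas and less memory.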
18. Pigasus Needs 100x Fewer Cores
[Figure: number of cores per trace, Snort vs. Pigasus. x-axis: Traces (Mixed-1 to Mixed-5, Norm-1, Norm-2); y-axis: Number of cores, 0 to 700. Snort bar labels range from 125 to 667; Pigasus bar labels: 7, 4, 6, 14, 2, 1, 1]
*Snort’s numbers are extrapolated from single-core zero-loss throughput
*Pigasus’ numbers are actual core counts
19. Pigasus is Much Cheaper
[Figure: power per trace, Snort vs. Pigasus. x-axis: Traces (Mixed-1 to Mixed-5, Norm-1, Norm-2); y-axis: Power (Watts), 0 to 7000. Snort bar labels range from 1125 to 6000; Pigasus bar labels: 103, 76, 94, 166, 58, 49, 49]
Snort’s Total Cost of Ownership (TCO): $36,539
Pigasus’ TCO: $10,642
*Assuming a 3-year lifetime
20. Conclusion
• Pigasus supports 100Gbps on a single server, saving hundreds of cores
• Pigasus proposes an “FPGA-first” architecture, which is promising in
performance but challenging to realize due to memory constraints
• Pigasus uses memory efficiently, e.g., hierarchical filtering in MSPM
Pigasus is publicly available at
https://github.com/cmu-snap/pigasus
Contact: zhipengzhao@cmu.edu