SlideShare a Scribd company logo
1 of 20
Achieving 100Gbps Intrusion
Prevention on a Single Server
Zhipeng Zhao, Hugo Sadok, Nirav Atre, James C. Hoe, Vyas Sekar, Justine Sherry
Intrusion Detection and Prevention System
Intrusion Detection and Prevention System
• IDS/IPS is deployed at the gateway to identify network threats
• Check packets (including payload) against complex rules
• Compute intensive
Internet
1000001000000010000
0001000100000111100
1000010111……………….
Local Network
2
App..
App..
Problem: State-of-the-Art Cannot Keep Up
10
0
90
80
70
60
50
40
30
20
10
0
Linerate(Gbps)
3
2015 2017
2009 2012
Line Rate Evolution
T
oday
Y
ear
Inefficient to Scale Up Using State-of-the-Art
• Evaluate Snort 3.0 equipped with Hyperscan pattern matching library
• Need 4-21 servers (32-core) and 1125-6000 W
Number of Cores Needed to Reach 100Gbps
667
4
500
400
333
400
125 125
0
200
400
600
800
Mixed-1 Mixed-2 Mixed-3 Mixed-5 Norm-1 Norm-2
Number
of
cores
Mixed-4
Traces
*Assuming ideal scaling for Snort
Pigasus: 100Gbps IPS on a Single Server
• 1 FPGA-based SmartNIC + 16-core CPU
FPGA
NIC
core core
core core
… …
Server
5
Order-of-Magnitude Efficiency Improvement
• Snort: 4-21 servers (32-core) and 1125-6000 W
• Pigasus: 1 server (16-core) and 49-166 W
Number of Cores Needed to Reach 100Gbps
667
700
600
500
400
300
200
100
0
Mixed-1 Mixed-2 Mixed-3 Mixed-5 Norm-1 Norm-2
Number
of
Cores
Mixed-4
Traces
Snort Pigasus
500
*Pigasus’ number
400 400
333 actual core-count
125 125
7 4 6 14 2 1 1
6
s are
What is the secret sauce behind the 100x improvement?
7
 FPGA-First Architecture
Fundamentally different scheme to make 100x improvement possible
Traditional “FPGA-as-Offload” Acceleration
Multi-String
Pattern Match
FPGA
NIC
Pkt in
Pkt out
Flow
Reassembly
Parser Full Match
PCIe
PCIe
CPU
Innocent pkts
Suspicious
pkts
8
• Packets come into CPU first
• CPU is the main processing unit
• FPGAaccelerates a particular task,
e.g MSPM
Prior Work Cannot Get 100x Speedup
• No dominating task anymore (Hyperscan has made MSPM 8x faster)
• Up to ~2X speedup assuming ideal acceleration
Performance breakdown of Snort with Hyperscan
Mixed-3 Mixed-5 Norm-1 Norm-2
Fraction
of
CPU
Time
Mixed-4
Traces
Parsing Reassembly
100.00%
80.00%
60.00%
40.00%
20.00%
0.00%
Mixed-1 Mixed-2
Multi-String Pattern Matching Full Matching Other
9
Pigasus: Inverted Offload Approach
• “FPGA-first” architecture: FPGAis the main processing unit
• Common cases are entirely processed on FPGA
Ethernet
Flow
Reassembly
Multi-String
Pattern Match Full Match
Pkt in
Suspicious
pkts
Innocent pkts
Pkt out
Innocent
pkts
FPGA
PCIe
Parser
CPU
95%
packets
5% packets
10
Challenge: Limited Fast Memory on FPGA
16 MB
Ethernet
Flow
Reassembly
Multi-String
Pattern Match
FPGA
Parser
Not Fit !
11
• Only 16 MB Block RAM (BRAM)
• Using existing FPGAmodules:
more than 87 MB
What is the secret sauce behind the 100x improvement?
12
 FPGA-First Architecture
Fundamentally different scheme to make 100x improvement possible
 Hierarchical Multi-String Pattern Matching (MSPM)
One of the algorithms to address the memory challenge
*Please refer to our paper for Flow Reassembly and Memory Resource Management
Multi-String Pattern Matching (MSPM)
13
• Checking payload and port number against 10K rules in “one” pass
alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS
(…;content:”username=”,fast_pattern;
content:”/GetPermisssions.asp”;
pcre:”(^|&)username=[^&]*?”; sid = 2019;…)
Snort’s MSPM:
- Header
- Fast pattern
(Rest is checked by Full Matcher)
Pigasus’ MSPM:
- Header
- Fast pattern
- Non-fast string pattern
(Dominated Snort’s Full Matcher)
Any field mismatch => Rulenot match
MSPM Design Options
14
State Machine-
Based:
23 MB of BRAM
Snort Hyperscan
(Hashtable-based):
25 MB of BRAM
Pigasus
(Hashtable-based):
3 MB of BRAM
Complete more work
Pigasus Utilizes Perf & Memory Tradeoff
Payload[i+31,i+31+8] Payload[i,i+8]
…
• High performance => Process more data in parallel => More memory
Matcher
Replica_31
Matcher
Replica_0
15
…
Key observation: no need to keep up 100Gbps **everywhere**
Why not use less memory when lower performance is allowed
Use Hierarchical Filters to Save Memory
Fast Pattern
32X
String Matcher
Replicas
100Gbps
of Data
Header Match
8X
Rule Header Matcher
Replicas Replicas ~5Gbps
~11Gbps
~23Gbps
Non-fast Pattern T
oCPU
16X
String Matcher
16
Key Idea: Hierarchical Filtering with Reduced Replicas at Each Layer
Evaluation
17
Pigasus Needs 100x Less Cores
667
700
600
500
400
300
200
100
0
Mixed-1 Mixed-2 Mixed-3 Mixed-4
Traces
Mixed-5 Norm-1 Norm-2
Number
of
Cores
Snort Pigasus
500
400 400
333
125 125
7 4 6 14 2 1 1
18
*Snort’s numbers are extrapolated from single core zero loss throughput
*Pigasus’ numbers are actual core-count
Pigasus is Much Cheaper
6000
4500
3600 3600
3000
1125 1125
103 76 94 166 58 49 4
9
7000
6000
5000
4000
3000
2000
1000
0
Mixed-1 Mixed-2 Mixed-3 Mixed-4
Traces
Mixed-5 Norm-1 Norm-2
Power
(Watts)
Snort Pigasus
19
Snort’s Total Cost of Ownership (TCO) $36,539
Pigasus’ TCO $10,642 *Assume 3 years lifetime
Conclusion
20
• Pigasus supports 100Gbps on a single server, saving hundreds of cores
• Pigasus proposes “FPGA-first” architecture, which is promising in
performance but challenging to realize due to memory constraints
• Pigasus efficiently uses memory, e.g. Hierarchical Filtering in MSPM
Pigasus is publicly available at
https://github.com/cmu-snap/pigasus
Contact: zhipengzhao@cmu.edu

More Related Content

Similar to osdi20-slides_zhao.pptx

cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance CachingScyllaDB
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL Bernd Ocklin
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkShay Hassidim
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programminginside-BigData.com
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
Performance challenges in software networking
Performance challenges in software networkingPerformance challenges in software networking
Performance challenges in software networkingStephen Hemminger
 
High Performance Hardware for Data Analysis
High Performance Hardware for Data AnalysisHigh Performance Hardware for Data Analysis
High Performance Hardware for Data AnalysisMike Pittaro
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis PyData
 
Sora- A High Performance Baseband DSP Processor
Sora- A High Performance Baseband DSP ProcessorSora- A High Performance Baseband DSP Processor
Sora- A High Performance Baseband DSP ProcessorHarshit Srivastava
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsYinghai Lu
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettJim St. Leger
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopDatabricks
 

Similar to osdi20-slides_zhao.pptx (20)

cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL
 
Basics of JVM Tuning
Basics of JVM TuningBasics of JVM Tuning
Basics of JVM Tuning
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programming
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
Performance challenges in software networking
Performance challenges in software networkingPerformance challenges in software networking
Performance challenges in software networking
 
High Performance Hardware for Data Analysis
High Performance Hardware for Data AnalysisHigh Performance Hardware for Data Analysis
High Performance Hardware for Data Analysis
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis
 
Sora- A High Performance Baseband DSP Processor
Sora- A High Performance Baseband DSP ProcessorSora- A High Performance Baseband DSP Processor
Sora- A High Performance Baseband DSP Processor
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and Solutions
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

osdi20-slides_zhao.pptx

  • 1. Achieving 100Gbps Intrusion Prevention on a Single Server Zhipeng Zhao, Hugo Sadok, Nirav Atre, James C. Hoe, Vyas Sekar, Justine Sherry
  • 2. Intrusion Detection and Prevention System Intrusion Detection and Prevention System • IDS/IPS is deployed at the gateway to identify network threats • Check packets (including payload) against complex rules • Compute intensive Internet 1000001000000010000 0001000100000111100 1000010111………………. Local Network 2 App.. App..
  • 3. Problem: State-of-the-Art Cannot Keep Up 10 0 90 80 70 60 50 40 30 20 10 0 Linerate(Gbps) 3 2015 2017 2009 2012 Line Rate Evolution T oday Y ear
  • 4. Inefficient to Scale Up Using State-of-the-Art • Evaluate Snort 3.0 equipped with Hyperscan pattern matching library • Need 4-21 servers (32-core) and 1125-6000 W Number of Cores Needed to Reach 100Gbps 667 4 500 400 333 400 125 125 0 200 400 600 800 Mixed-1 Mixed-2 Mixed-3 Mixed-5 Norm-1 Norm-2 Number of cores Mixed-4 Traces *Assuming ideal scaling for Snort
  • 5. Pigasus: 100Gbps IPS on a Single Server • 1 FPGA-based SmartNIC + 16-core CPU FPGA NIC core core core core … … Server 5
  • 6. Order-of-Magnitude Efficiency Improvement • Snort: 4-21 servers (32-core) and 1125-6000 W • Pigasus: 1 server (16-core) and 49-166 W Number of Cores Needed to Reach 100Gbps 667 700 600 500 400 300 200 100 0 Mixed-1 Mixed-2 Mixed-3 Mixed-5 Norm-1 Norm-2 Number of Cores Mixed-4 Traces Snort Pigasus 500 *Pigasus’ number 400 400 333 actual core-count 125 125 7 4 6 14 2 1 1 6 s are
  • 7. What is the secret sauce behind the 100x improvement? 7  FPGA-First Architecture Fundamentally different scheme to make 100x improvement possible
  • 8. Traditional “FPGA-as-Offload” Acceleration Multi-String Pattern Match FPGA NIC Pkt in Pkt out Flow Reassembly Parser Full Match PCIe PCIe CPU Innocent pkts Suspicious pkts 8 • Packets come into CPU first • CPU is the main processing unit • FPGAaccelerates a particular task, e.g MSPM
  • 9. Prior Work Cannot Get 100x Speedup • No dominating task anymore (Hyperscan has made MSPM 8x faster) • Up to ~2X speedup assuming ideal acceleration Performance breakdown of Snort with Hyperscan Mixed-3 Mixed-5 Norm-1 Norm-2 Fraction of CPU Time Mixed-4 Traces Parsing Reassembly 100.00% 80.00% 60.00% 40.00% 20.00% 0.00% Mixed-1 Mixed-2 Multi-String Pattern Matching Full Matching Other 9
  • 10. Pigasus: Inverted Offload Approach • “FPGA-first” architecture: FPGAis the main processing unit • Common cases are entirely processed on FPGA Ethernet Flow Reassembly Multi-String Pattern Match Full Match Pkt in Suspicious pkts Innocent pkts Pkt out Innocent pkts FPGA PCIe Parser CPU 95% packets 5% packets 10
  • 11. Challenge: Limited Fast Memory on FPGA 16 MB Ethernet Flow Reassembly Multi-String Pattern Match FPGA Parser Not Fit ! 11 • Only 16 MB Block RAM (BRAM) • Using existing FPGAmodules: more than 87 MB
  • 12. What is the secret sauce behind the 100x improvement? 12  FPGA-First Architecture Fundamentally different scheme to make 100x improvement possible  Hierarchical Multi-String Pattern Matching (MSPM) One of the algorithms to address the memory challenge *Please refer to our paper for Flow Reassembly and Memory Resource Management
  • 13. Multi-String Pattern Matching (MSPM) 13 • Checking payload and port number against 10K rules in “one” pass alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS (…;content:”username=”,fast_pattern; content:”/GetPermisssions.asp”; pcre:”(^|&)username=[^&]*?”; sid = 2019;…) Snort’s MSPM: - Header - Fast pattern (Rest is checked by Full Matcher) Pigasus’ MSPM: - Header - Fast pattern - Non-fast string pattern (Dominated Snort’s Full Matcher) Any field mismatch => Rulenot match
  • 14. MSPM Design Options 14 State Machine- Based: 23 MB of BRAM Snort Hyperscan (Hashtable-based): 25 MB of BRAM Pigasus (Hashtable-based): 3 MB of BRAM Complete more work
  • 15. Pigasus Utilizes Perf & Memory Tradeoff Payload[i+31,i+31+8] Payload[i,i+8] … • High performance => Process more data in parallel => More memory Matcher Replica_31 Matcher Replica_0 15 … Key observation: no need to keep up 100Gbps **everywhere** Why not use less memory when lower performance is allowed
  • 16. Use Hierarchical Filters to Save Memory Fast Pattern 32X String Matcher Replicas 100Gbps of Data Header Match 8X Rule Header Matcher Replicas Replicas ~5Gbps ~11Gbps ~23Gbps Non-fast Pattern T oCPU 16X String Matcher 16 Key Idea: Hierarchical Filtering with Reduced Replicas at Each Layer
  • 18. Pigasus Needs 100x Less Cores 667 700 600 500 400 300 200 100 0 Mixed-1 Mixed-2 Mixed-3 Mixed-4 Traces Mixed-5 Norm-1 Norm-2 Number of Cores Snort Pigasus 500 400 400 333 125 125 7 4 6 14 2 1 1 18 *Snort’s numbers are extrapolated from single core zero loss throughput *Pigasus’ numbers are actual core-count
  • 19. Pigasus is Much Cheaper 6000 4500 3600 3600 3000 1125 1125 103 76 94 166 58 49 4 9 7000 6000 5000 4000 3000 2000 1000 0 Mixed-1 Mixed-2 Mixed-3 Mixed-4 Traces Mixed-5 Norm-1 Norm-2 Power (Watts) Snort Pigasus 19 Snort’s Total Cost of Ownership (TCO) $36,539 Pigasus’ TCO $10,642 *Assume 3 years lifetime
  • 20. Conclusion 20 • Pigasus supports 100Gbps on a single server, saving hundreds of cores • Pigasus proposes “FPGA-first” architecture, which is promising in performance but challenging to realize due to memory constraints • Pigasus efficiently uses memory, e.g. Hierarchical Filtering in MSPM Pigasus is publicly available at https://github.com/cmu-snap/pigasus Contact: zhipengzhao@cmu.edu