SlideShare a Scribd company logo
1 of 26
Download to read offline
μžμ—°μ–΄μ²˜λ¦¬ 연ꡬ싀
M2020064
쑰단비
Content
1. Background
2. Introduce
3. Idea
4. Model
5. Experiments
6. Summary
#Kookmin_University #Natural_Language_Processing_lab. 1
Background
1) WAF(Web Application Firewall)
: Detecting and Blocking Web attacks
(ex: SQL Injection, Cross-Site Scripting(XSS))
- Signature-based detection
: one-to-one response method
- Rule-based detection
: signature + AI method
#Kookmin_University #Natural_Language_Processing_lab. 2
https://m.blog.naver.com/pentamkt/222084745829
Background
2) Zero-day attack
: Attacks vulnerability in computer software
(ex: spear phishing)
3) Whitelist
: Trusted target list
( blacklist)
#Kookmin_University #Natural_Language_Processing_lab. 3
Background
4) Recurrent Neural Network (RNN, LSTM)
#Kookmin_University #Natural_Language_Processing_lab. 4
https://ratsgo.github.io/natural%20language%20processing/2017/03/09/rnnlstm/
Background
4) Recurrent Neural Network (RNN, LSTM)
#Kookmin_University #Natural_Language_Processing_lab. 5
https://ratsgo.github.io/natural%20language%20processing/2017/03/09/rnnlstm/
Introduce
> Why a zero-day attack is difficult to detect?
1) Have not been previously seen
2) Can be carried out by a single malicious HTTP request
3) Very rare within a large number of Web requests
#Kookmin_University #Natural_Language_Processing_lab. 6
> Most supervised approaches are inappropriate
> Contextual information is not helpful
> Collective and statistical information are not effective
ZeroWall
- Unsupervised approach
(work with an existing WAF in pipeline)
- Detecting a zero-day Web attack
hidden in an individual Web request
Introduce
> What we want?
1) WAF detects those known attacks effectively
2) ZeroWall detects unknown attacks ignored by WAF rules
#Kookmin_University #Natural_Language_Processing_lab. 7
> Filter out known attacks
> Report new attack patterns to operators
and security engineers to update WAF rules
Idea
1) HTTP request: string following HTTP
2) Most requests are benign, malicious requests are rare
#Kookmin_University #Natural_Language_Processing_lab. 8
> Consider an HTTP request as one sentence in the HTTP request language
> Train a kind of language model based on historical logs,
to learn this language from benign requests
Historical
Web logs
Language
Model
Train
One request
Can
understand
Cannot
understand
Benign Malicious
Idea : Self-Translate Machine
> How to learn this β€˜Hyper-Text’ language?
: Use Neural Network Translation model to train a Self-Translate Machine
- Encoder: encode the original request into one representation
- Decoder: decode it back
#Kookmin_University #Natural_Language_Processing_lab. 9
Idea : Self-Translate Machine
> How to quantify the self-translation quality(anomaly score)?
(Translation Quality = Anomaly Score)
An attack detection problem = A machine translation quality assessment problem
: use machine translation metrics
#Kookmin_University #Natural_Language_Processing_lab. 10
Historical
Web logs
Self-Translate
Machine
Train
One request
Good
Translation
Bad
Translation
Benign Malicious
Idea : Self-Translate Machine
Translation Quality = Anomaly Score
: Use BLEU(Bi-Lingual Evaluation Understudy)
(Malicious Score = 1-BLEU_score)
#Kookmin_University #Natural_Language_Processing_lab. 11
π΅πΏπΈπ‘ˆ = 𝐡𝑃 Γ— exp ෍
𝑛=1
𝑁
𝑀𝑛 π‘™π‘œπ‘”π‘π‘›
𝐡𝑃 = ቐ
1, 𝑖𝑓 𝑐 > π‘Ÿ
𝑒
1βˆ’
π‘Ÿ
𝑐 , 𝑖𝑓 𝑐 ≀ π‘Ÿ
𝑝𝑛 =
Οƒπ‘›βˆ’π‘”π‘Ÿπ‘Žπ‘šβˆˆπΆπ‘Žπ‘›π‘‘π‘–π‘‘π‘Žπ‘‘π‘’ πΆπ‘œπ‘’π‘›π‘‘π‘π‘™π‘–π‘(𝑛 βˆ’ π‘”π‘Ÿπ‘Žπ‘š)
Οƒπ‘›βˆ’π‘”π‘Ÿπ‘Žπ‘šβˆˆπΆπ‘Žπ‘›π‘‘π‘–π‘‘π‘Žπ‘‘π‘’ πΆπ‘œπ‘’π‘›π‘‘(𝑛 βˆ’ π‘”π‘Ÿπ‘Žπ‘š)
Model: ZeroWall
#Kookmin_University #Natural_Language_Processing_lab. 12
Model: ZeroWall
> Offline Periodic Retraining
: build and update vocabulary and re-train the model
#Kookmin_University #Natural_Language_Processing_lab. 13
1
2 3
Model: ZeroWall
1) Building Vocabulary
#Kookmin_University #Natural_Language_Processing_lab. 14
Raw log
Bag of
Words
Vocabulary
Filtering
- Stop words
- variables
Model: ZeroWall
2) Parsing
#Kookmin_University #Natural_Language_Processing_lab. 15
Vocabulary
Model: ZeroWall
> Online Detecting
: Detect anomalies in real-time requests for manual investigation
#Kookmin_University #Natural_Language_Processing_lab. 16
1 2
3
4
Model: ZeroWall
1) Parsing = parsing in offline training
2) Translation
#Kookmin_University #Natural_Language_Processing_lab. 17
translation
Model: ZeroWall
3) Detection
4) Investigate
(1) BLEU metrics
(2) Threshold [Larger? Yes ⇨ Go to step 3; No ⇨ Benign]
(3) Check whitelist [Not in whitelist? Yes ⇨ Go to step 4; No ⇨ Benign]
(4) Investigation [True Attacks ⇨ Update WAF/IDS; False Alarms ⇨ Update whitelist rules]
#Kookmin_University #Natural_Language_Processing_lab. 18
Original sequence(token sequence) vs. Translated sequence(recovered token sequence)
1 2 3 4
Experiments
> Data
- 8 real world trace from an internet company
- Over 1.4 billion requests in a week
> Overview
- Captured 28 different types of zero-day attacks
- Contribute to 141,583 of zero-day attack requests in total
#Kookmin_University #Natural_Language_Processing_lab. 19
# B2M: Ratio of Benign to Malicious (in WAF)
# B2Z: Ratio of Benign to Zero-Day
Experiments
> Baseline & labels
1) Unsupervised Approaches
2) Supervised Approaches
#Kookmin_University #Natural_Language_Processing_lab. 20
> SAE(stacked auto-encoder), HMM and DFA(Deterministic Finite Automata)
> Use data filtered by WAF as training set
> CNN, RNN and DT
> Use all data (allowed/dropped) as training set and WAF results as labels
Experiments
> Evaluation Results
#Kookmin_University #Natural_Language_Processing_lab. 21
Experiments
> Zero-Day case
- These attack is detected by ZeroWall, CNN, and RNN
- WAF are usually based on keywords, e.g., eval, request, select, and execute
- ZeroWall is based on the β€œunderstanding” of benign requests
- The structure of this zero-day attack request is more like a programming language
#Kookmin_University #Natural_Language_Processing_lab. 22
ZeroWall
CNN and RNN
Experiments
> Whitelist
- To mitigate False Alarms
- The numbers of whitelist rules refer to β€œhow many whitelist rules are added each day”,
based on the FPs labeled on that day
(No rules applied on 0602 since it is the first day of testing set)
- The results shows that the whitelist reduces the number of FPs with low overhead
(Numbers of rules are very small)
- Based on these results, we believe ZeroWall is practical in real-world deployment
#Kookmin_University #Natural_Language_Processing_lab. 23
Summary
1) Present a zero-day web attack detection system ZeroWall
2) Deployed in the wild
#Kookmin_University #Natural_Language_Processing_lab. 24
- Augmenting existing signature-based WAFs
- Use Encoder-Decoder Network to learn patterns from normal requests
- Use Self-Translation Machine & BLEU metrics
- Over 1.4 billion requests
- Captured 28 different types of zero-day attacks
(100K of zero-day attack requests)
- Low overhead
Thank You.
25
#Kookmin_University #Natural_Language_Processing_lab.

More Related Content

Similar to Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks

MIT-6-determina-vps.ppt
MIT-6-determina-vps.pptMIT-6-determina-vps.ppt
MIT-6-determina-vps.ppt
webhostingguy
Β 
Kenneth Delos Santos -SQA - 7 years - long
Kenneth Delos Santos -SQA - 7 years - longKenneth Delos Santos -SQA - 7 years - long
Kenneth Delos Santos -SQA - 7 years - long
Kenneth Lloyd Delos Santos
Β 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Phil Agcaoili
Β 
Frehsher Testing Resume 1
Frehsher  Testing  Resume 1Frehsher  Testing  Resume 1
Frehsher Testing Resume 1
ncct
Β 

Similar to Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks (20)

Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
Β 
MIT-6-determina-vps.ppt
MIT-6-determina-vps.pptMIT-6-determina-vps.ppt
MIT-6-determina-vps.ppt
Β 
A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavi...
A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavi...A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavi...
A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavi...
Β 
Kenneth Delos Santos -SQA - 7 years - long
Kenneth Delos Santos -SQA - 7 years - longKenneth Delos Santos -SQA - 7 years - long
Kenneth Delos Santos -SQA - 7 years - long
Β 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode Operations
Β 
M phil-computer-science-cryptography-projects
M phil-computer-science-cryptography-projectsM phil-computer-science-cryptography-projects
M phil-computer-science-cryptography-projects
Β 
B-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive DefenseB-Sides Seattle 2012 Offensive Defense
B-Sides Seattle 2012 Offensive Defense
Β 
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Β 
OSCP Preparation Guide @ Infosectrain
OSCP Preparation Guide @ InfosectrainOSCP Preparation Guide @ Infosectrain
OSCP Preparation Guide @ Infosectrain
Β 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Β 
Frehsher Testing Resume 1
Frehsher  Testing  Resume 1Frehsher  Testing  Resume 1
Frehsher Testing Resume 1
Β 
Penetration testing dont just leave it to chance
Penetration testing dont just leave it to chancePenetration testing dont just leave it to chance
Penetration testing dont just leave it to chance
Β 
A Case Study Injecting Safety-Critical Thinking Into Graduate Software Engin...
A Case Study  Injecting Safety-Critical Thinking Into Graduate Software Engin...A Case Study  Injecting Safety-Critical Thinking Into Graduate Software Engin...
A Case Study Injecting Safety-Critical Thinking Into Graduate Software Engin...
Β 
Keeping Up with the Adversary: Creating a Threat-Based Cyber Team
Keeping Up with the Adversary:  Creating a Threat-Based Cyber TeamKeeping Up with the Adversary:  Creating a Threat-Based Cyber Team
Keeping Up with the Adversary: Creating a Threat-Based Cyber Team
Β 
Syed Ubaid Ali Jafri - Black Box Penetration testing for Associates
Syed Ubaid Ali Jafri - Black Box Penetration testing for AssociatesSyed Ubaid Ali Jafri - Black Box Penetration testing for Associates
Syed Ubaid Ali Jafri - Black Box Penetration testing for Associates
Β 
Zerovm backgroud
Zerovm backgroudZerovm backgroud
Zerovm backgroud
Β 
Battista Biggio @ MCS 2015, June 29 - July 1, Guenzburg, Germany: "1.5-class ...
Battista Biggio @ MCS 2015, June 29 - July 1, Guenzburg, Germany: "1.5-class ...Battista Biggio @ MCS 2015, June 29 - July 1, Guenzburg, Germany: "1.5-class ...
Battista Biggio @ MCS 2015, June 29 - July 1, Guenzburg, Germany: "1.5-class ...
Β 
BlueHat v18 || Protecting the protector, hardening machine learning defenses ...
BlueHat v18 || Protecting the protector, hardening machine learning defenses ...BlueHat v18 || Protecting the protector, hardening machine learning defenses ...
BlueHat v18 || Protecting the protector, hardening machine learning defenses ...
Β 
Today
TodayToday
Today
Β 
Building functional Quality Gates with ReportPortal
Building functional Quality Gates with ReportPortalBuilding functional Quality Gates with ReportPortal
Building functional Quality Gates with ReportPortal
Β 

More from Danbi Cho

More from Danbi Cho (11)

Crf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic networkCrf based named entity recognition using a korean lexical semantic network
Crf based named entity recognition using a korean lexical semantic network
Β 
Gpt models
Gpt modelsGpt models
Gpt models
Β 
Attention boosted deep networks for video classification
Attention boosted deep networks for video classificationAttention boosted deep networks for video classification
Attention boosted deep networks for video classification
Β 
A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...A survey on deep learning based approaches for action and gesture recognition...
A survey on deep learning based approaches for action and gesture recognition...
Β 
ELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than GeneratorsELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
ELECTRA_Pretraining Text Encoders as Discriminators rather than Generators
Β 
A survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in textA survey on automatic detection of hate speech in text
A survey on automatic detection of hate speech in text
Β 
Decision tree and ensemble
Decision tree and ensembleDecision tree and ensemble
Decision tree and ensemble
Β 
Can recurrent neural networks warp time
Can recurrent neural networks warp timeCan recurrent neural networks warp time
Can recurrent neural networks warp time
Β 
Man is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddingsMan is to computer programmer as woman is to homemaker debiasing word embeddings
Man is to computer programmer as woman is to homemaker debiasing word embeddings
Β 
Situation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understandingSituation recognition visual semantic role labeling for image understanding
Situation recognition visual semantic role labeling for image understanding
Β 
Mitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learningMitigating unwanted biases with adversarial learning
Mitigating unwanted biases with adversarial learning
Β 

Recently uploaded

Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Lisi Hocke
Β 

Recently uploaded (20)

OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
Β 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdf
Β 
Software Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements EngineeringSoftware Engineering - Introduction + Process Models + Requirements Engineering
Software Engineering - Introduction + Process Models + Requirements Engineering
Β 
Community is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletCommunity is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea Goulet
Β 
Abortion Clinic In Stanger ](+27832195400*)[ πŸ₯ Safe Abortion Pills In Stanger...
Abortion Clinic In Stanger ](+27832195400*)[ πŸ₯ Safe Abortion Pills In Stanger...Abortion Clinic In Stanger ](+27832195400*)[ πŸ₯ Safe Abortion Pills In Stanger...
Abortion Clinic In Stanger ](+27832195400*)[ πŸ₯ Safe Abortion Pills In Stanger...
Β 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Β 
[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)
Β 
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Β 
Abortion Pill Prices Mthatha (@](+27832195400*)[ πŸ₯ Women's Abortion Clinic In...
Abortion Pill Prices Mthatha (@](+27832195400*)[ πŸ₯ Women's Abortion Clinic In...Abortion Pill Prices Mthatha (@](+27832195400*)[ πŸ₯ Women's Abortion Clinic In...
Abortion Pill Prices Mthatha (@](+27832195400*)[ πŸ₯ Women's Abortion Clinic In...
Β 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Β 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
Β 
Abortion Pill Prices Turfloop ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in ...Abortion Pill Prices Turfloop ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in ...
Β 
Abortion Pill Prices Jozini ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in Jo...Abortion Pill Prices Jozini ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in Jo...
Abortion Pill Prices Jozini ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in Jo...
Β 
Abortion Pill Prices Germiston ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in...Abortion Pill Prices Germiston ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ πŸ₯ Women's Abortion Clinic in...
Β 
Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
Β 
Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024
Β 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
Β 
Encryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key ConceptsEncryption Recap: A Refresher on Key Concepts
Encryption Recap: A Refresher on Key Concepts
Β 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Β 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdf
Β 

Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks

  • 2. Content 1. Background 2. Introduce 3. Idea 4. Model 5. Experiments 6. Summary #Kookmin_University #Natural_Language_Processing_lab. 1
  • 3. Background 1) WAF(Web Application Firewall) : Detecting and Blocking Web attacks (ex: SQL Injection, Cross-Site Scripting(XSS)) - Signature-based detection : one-to-one response method - Rule-based detection : signature + AI method #Kookmin_University #Natural_Language_Processing_lab. 2 https://m.blog.naver.com/pentamkt/222084745829
  • 4. Background 2) Zero-day attack : Attacks vulnerability in computer software (ex: spear phishing) 3) Whitelist : Trusted target list ( blacklist) #Kookmin_University #Natural_Language_Processing_lab. 3
  • 5. Background 4) Recurrent Neural Network (RNN, LSTM) #Kookmin_University #Natural_Language_Processing_lab. 4 https://ratsgo.github.io/natural%20language%20processing/2017/03/09/rnnlstm/
  • 6. Background 4) Recurrent Neural Network (RNN, LSTM) #Kookmin_University #Natural_Language_Processing_lab. 5 https://ratsgo.github.io/natural%20language%20processing/2017/03/09/rnnlstm/
  • 7. Introduce > Why a zero-day attack is difficult to detect? 1) Have not been previously seen 2) Can be carried out by a single malicious HTTP request 3) Very rare within a large number of Web requests #Kookmin_University #Natural_Language_Processing_lab. 6 > Most supervised approaches are inappropriate > Contextual information is not helpful > Collective and statistical information are not effective ZeroWall - Unsupervised approach (work with an existing WAF in pipeline) - Detecting a zero-day Web attack hidden in an individual Web request
  • 8. Introduce > What we want? 1) WAF detects those known attacks effectively 2) ZeroWall detects unknown attacks ignored by WAF rules #Kookmin_University #Natural_Language_Processing_lab. 7 > Filter out known attacks > Report new attack patterns to operators and security engineers to update WAF rules
  • 9. Idea 1) HTTP request: string following HTTP 2) Most requests are benign, malicious requests are rare #Kookmin_University #Natural_Language_Processing_lab. 8 > Consider an HTTP request as one sentence in the HTTP request language > Train a kind of language model based on historical logs, to learn this language from benign requests Historical Web logs Language Model Train One request Can understand Cannot understand Benign Malicious
  • 10. Idea : Self-Translate Machine > How to learn this β€˜Hyper-Text’ language? : Use Neural Network Translation model to train a Self-Translate Machine - Encoder: encode the original request into one representation - Decoder: decode it back #Kookmin_University #Natural_Language_Processing_lab. 9
  • 11. Idea : Self-Translate Machine > How to quantify the self-translation quality(anomaly score)? (Translation Quality = Anomaly Score) An attack detection problem = A machine translation quality assessment problem : use machine translation metrics #Kookmin_University #Natural_Language_Processing_lab. 10 Historical Web logs Self-Translate Machine Train One request Good Translation Bad Translation Benign Malicious
  • 12. Idea : Self-Translate Machine Translation Quality = Anomaly Score : Use BLEU(Bi-Lingual Evaluation Understudy) (Malicious Score = 1-BLEU_score) #Kookmin_University #Natural_Language_Processing_lab. 11 π΅πΏπΈπ‘ˆ = 𝐡𝑃 Γ— exp ෍ 𝑛=1 𝑁 𝑀𝑛 π‘™π‘œπ‘”π‘π‘› 𝐡𝑃 = ቐ 1, 𝑖𝑓 𝑐 > π‘Ÿ 𝑒 1βˆ’ π‘Ÿ 𝑐 , 𝑖𝑓 𝑐 ≀ π‘Ÿ 𝑝𝑛 = Οƒπ‘›βˆ’π‘”π‘Ÿπ‘Žπ‘šβˆˆπΆπ‘Žπ‘›π‘‘π‘–π‘‘π‘Žπ‘‘π‘’ πΆπ‘œπ‘’π‘›π‘‘π‘π‘™π‘–π‘(𝑛 βˆ’ π‘”π‘Ÿπ‘Žπ‘š) Οƒπ‘›βˆ’π‘”π‘Ÿπ‘Žπ‘šβˆˆπΆπ‘Žπ‘›π‘‘π‘–π‘‘π‘Žπ‘‘π‘’ πΆπ‘œπ‘’π‘›π‘‘(𝑛 βˆ’ π‘”π‘Ÿπ‘Žπ‘š)
  • 14. Model: ZeroWall > Offline Periodic Retraining : build and update vocabulary and re-train the model #Kookmin_University #Natural_Language_Processing_lab. 13 1 2 3
  • 15. Model: ZeroWall 1) Building Vocabulary #Kookmin_University #Natural_Language_Processing_lab. 14 Raw log Bag of Words Vocabulary Filtering - Stop words - variables
  • 16. Model: ZeroWall 2) Parsing #Kookmin_University #Natural_Language_Processing_lab. 15 Vocabulary
  • 17. Model: ZeroWall > Online Detecting : Detect anomalies in real-time requests for manual investigation #Kookmin_University #Natural_Language_Processing_lab. 16 1 2 3 4
  • 18. Model: ZeroWall 1) Parsing = parsing in offline training 2) Translation #Kookmin_University #Natural_Language_Processing_lab. 17 translation
  • 19. Model: ZeroWall 3) Detection 4) Investigate (1) BLEU metrics (2) Threshold [Larger? Yes ⇨ Go to step 3; No ⇨ Benign] (3) Check whitelist [Not in whitelist? Yes ⇨ Go to step 4; No ⇨ Benign] (4) Investigation [True Attacks ⇨ Update WAF/IDS; False Alarms ⇨ Update whitelist rules] #Kookmin_University #Natural_Language_Processing_lab. 18 Original sequence(token sequence) vs. Translated sequence(recovered token sequence) 1 2 3 4
  • 20. Experiments > Data - 8 real world trace from an internet company - Over 1.4 billion requests in a week > Overview - Captured 28 different types of zero-day attacks - Contribute to 141,583 of zero-day attack requests in total #Kookmin_University #Natural_Language_Processing_lab. 19 # B2M: Ratio of Benign to Malicious (in WAF) # B2Z: Ratio of Benign to Zero-Day
  • 21. Experiments > Baseline & labels 1) Unsupervised Approaches 2) Supervised Approaches #Kookmin_University #Natural_Language_Processing_lab. 20 > SAE(stacked auto-encoder), HMM and DFA(Deterministic Finite Automata) > Use data filtered by WAF as training set > CNN, RNN and DT > Use all data (allowed/dropped) as training set and WAF results as labels
  • 22. Experiments > Evaluation Results #Kookmin_University #Natural_Language_Processing_lab. 21
  • 23. Experiments > Zero-Day case - These attack is detected by ZeroWall, CNN, and RNN - WAF are usually based on keywords, e.g., eval, request, select, and execute - ZeroWall is based on the β€œunderstanding” of benign requests - The structure of this zero-day attack request is more like a programming language #Kookmin_University #Natural_Language_Processing_lab. 22 ZeroWall CNN and RNN
  • 24. Experiments > Whitelist - To mitigate False Alarms - The numbers of whitelist rules refer to β€œhow many whitelist rules are added each day”, based on the FPs labeled on that day (No rules applied on 0602 since it is the first day of testing set) - The results shows that the whitelist reduces the number of FPs with low overhead (Numbers of rules are very small) - Based on these results, we believe ZeroWall is practical in real-world deployment #Kookmin_University #Natural_Language_Processing_lab. 23
  • 25. Summary 1) Present a zero-day web attack detection system ZeroWall 2) Deployed in the wild #Kookmin_University #Natural_Language_Processing_lab. 24 - Augmenting existing signature-based WAFs - Use Encoder-Decoder Network to learn patterns from normal requests - Use Self-Translation Machine & BLEU metrics - Over 1.4 billion requests - Captured 28 different types of zero-day attacks (100K of zero-day attack requests) - Low overhead