Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks

자연어처리 연구실
M2020064
조단비

Content
1. Background
2. Introduce
3. Idea
4. Model
5. Experiments
6. Summary
#Kookmin_University #Natural_Language_Processing_lab. 1

Background
1) WAF(Web Application Firewall)
: Detecting and Blocking Web attacks
(ex: SQL Injection, Cross-Site Scripting(XSS))
- Signature-based detection
: one-to-one response method
- Rule-based detection
: signature + AI method
https://m.blog.naver.com/pentamkt/222084745829

Background
2) Zero-day attack
: Attacks vulnerability in computer software
(ex: spear phishing)
3) Whitelist
: Trusted target list
( blacklist)

Background
4) Recurrent Neural Network (RNN, LSTM)
https://ratsgo.github.io/natural%20language%20processing/2017/03/09/rnnlstm/

Introduce
> Why a zero-day attack is difficult to detect?
1) Have not been previously seen
2) Can be carried out by a single malicious HTTP request
3) Very rare within a large number of Web requests
> Most supervised approaches are inappropriate
> Contextual information is not helpful
> Collective and statistical information are not effective
ZeroWall
- Unsupervised approach
(work with an existing WAF in pipeline)
- Detecting a zero-day Web attack
hidden in an individual Web request

Introduce
> What we want?
1) WAF detects those known attacks effectively
2) ZeroWall detects unknown attacks ignored by WAF rules
> Filter out known attacks
> Report new attack patterns to operators
and security engineers to update WAF rules

Idea
1) HTTP request: string following HTTP
2) Most requests are benign, malicious requests are rare
> Consider an HTTP request as one sentence in the HTTP request language
> Train a kind of language model based on historical logs,
to learn this language from benign requests
Historical
Web logs
Language
Model
Train
One request
Can
understand
Cannot
understand
Benign Malicious

Idea : Self-Translate Machine
> How to learn this ‘Hyper-Text’ language?
: Use Neural Network Translation model to train a Self-Translate Machine
- Encoder: encode the original request into one representation
- Decoder: decode it back

> How to quantify the self-translation quality(anomaly score)?
(Translation Quality = Anomaly Score)
An attack detection problem = A machine translation quality assessment problem
: use machine translation metrics
Historical
Web logs
Self-Translate
Machine
Train
One request
Good
Translation
Bad
Translation
Benign Malicious

Translation Quality = Anomaly Score
: Use BLEU(Bi-Lingual Evaluation Understudy)
(Malicious Score = 1-BLEU_score)
𝐵𝐿𝐸𝑈 = 𝐵𝑃 × exp ෍
𝑛=1
𝑁
𝑤𝑛 𝑙𝑜𝑔𝑝𝑛
𝐵𝑃 = ቐ
1, 𝑖𝑓 𝑐 > 𝑟
𝑒
1−
𝑟
𝑐 , 𝑖𝑓 𝑐 ≤ 𝑟
𝑝𝑛 =
σ𝑛−𝑔𝑟𝑎𝑚∈𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝐶𝑜𝑢𝑛𝑡𝑐𝑙𝑖𝑝(𝑛 − 𝑔𝑟𝑎𝑚)
σ𝑛−𝑔𝑟𝑎𝑚∈𝐶𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 𝐶𝑜𝑢𝑛𝑡(𝑛 − 𝑔𝑟𝑎𝑚)

Model: ZeroWall

Model: ZeroWall
> Offline Periodic Retraining
: build and update vocabulary and re-train the model
1
2 3

Model: ZeroWall
1) Building Vocabulary
Raw log
Bag of
Words
Vocabulary
Filtering
- Stop words
- variables

Model: ZeroWall
2) Parsing
Vocabulary

Model: ZeroWall
> Online Detecting
: Detect anomalies in real-time requests for manual investigation
1 2
3
4

Model: ZeroWall
1) Parsing = parsing in offline training
2) Translation
translation

Model: ZeroWall
3) Detection
4) Investigate
(1) BLEU metrics
(2) Threshold [Larger? Yes ⇨ Go to step 3; No ⇨ Benign]
(3) Check whitelist [Not in whitelist? Yes ⇨ Go to step 4; No ⇨ Benign]
(4) Investigation [True Attacks ⇨ Update WAF/IDS; False Alarms ⇨ Update whitelist rules]
Original sequence(token sequence) vs. Translated sequence(recovered token sequence)
1 2 3 4

Experiments
> Data
- 8 real world trace from an internet company
- Over 1.4 billion requests in a week
> Overview
- Captured 28 different types of zero-day attacks
- Contribute to 141,583 of zero-day attack requests in total
# B2M: Ratio of Benign to Malicious (in WAF)
# B2Z: Ratio of Benign to Zero-Day

Experiments
> Baseline & labels
1) Unsupervised Approaches
2) Supervised Approaches
> SAE(stacked auto-encoder), HMM and DFA(Deterministic Finite Automata)
> Use data filtered by WAF as training set
> CNN, RNN and DT
> Use all data (allowed/dropped) as training set and WAF results as labels

Experiments
> Evaluation Results

Experiments
> Zero-Day case
- These attack is detected by ZeroWall, CNN, and RNN
- WAF are usually based on keywords, e.g., eval, request, select, and execute
- ZeroWall is based on the “understanding” of benign requests
- The structure of this zero-day attack request is more like a programming language
ZeroWall
CNN and RNN

Experiments
> Whitelist
- To mitigate False Alarms
- The numbers of whitelist rules refer to “how many whitelist rules are added each day”,
based on the FPs labeled on that day
(No rules applied on 0602 since it is the first day of testing set)
- The results shows that the whitelist reduces the number of FPs with low overhead
(Numbers of rules are very small)
- Based on these results, we believe ZeroWall is practical in real-world deployment

Summary
1) Present a zero-day web attack detection system ZeroWall
2) Deployed in the wild
- Augmenting existing signature-based WAFs
- Use Encoder-Decoder Network to learn patterns from normal requests
- Use Self-Translation Machine & BLEU metrics
- Over 1.4 billion requests
- Captured 28 different types of zero-day attacks
(100K of zero-day attack requests)
- Low overhead

Thank You.
25
#Kookmin_University #Natural_Language_Processing_lab.

Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks

Recommended

Recommended

More Related Content

Similar to Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks

Similar to Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks (20)

More from Danbi Cho

More from Danbi Cho (11)

Recently uploaded

Recently uploaded (20)

Zero wall detecting zero-day web attacks through encoder-decoder recurrent neural networks