Ariu - Workshop on Multiple Classifier Systems - 2011
University of Cagliari, Department of Electrical and
Electronic Engineering
A modular architecture for the
analysis of HTTP payloads based
on Multiple Classifiers
Davide Ariu Giorgio Giacinto
davide.ariu@diee.unica.it giacinto@diee.unica.it
Naples, 17 June 2011
Pattern Recognition and Applications Group
http://prag.diee.unica.it
This research was sponsored by the Autonomous Region of Sardinia through a grant
financed with the "Sardinia PO FSE 2007-2013" funds and provided according to the L.R. 7/2007
Outline
• Motivations
• The proposed system
• Experimental Setup and Results
• Conclusions
The objective
Design of an anomaly-based
Intrusion Detection System
for the protection of
Web Servers and Applications.
The HTTP traffic toward the web
servers is inspected by a
multiple classifier system.
Why Web Applications?
Why Anomaly Detection?
A legitimate Payload...
Request Line
GET /pra/ita/home.php HTTP/1.1
Request Headers
Host: prag.diee.unica.it
Accept: text/*, text/html
User-Agent: Mozilla/4.0
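The split between Request Line and Request Headers can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `split_payload` is hypothetical, and it assumes a well-formed request whose header section ends at the first empty line.

```python
def split_payload(payload: str):
    """Split a raw HTTP request into its request line and a dict of headers."""
    # The header section ends at the first empty line; any body is ignored here.
    head = payload.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = [ln for ln in head.split("\n") if ln.strip()]
    request_line, header_lines = lines[0], lines[1:]
    headers = {}
    for ln in header_lines:
        # "Name: value" -> split at the first colon only
        name, _, value = ln.partition(":")
        headers[name.strip()] = value.strip()
    return request_line, headers


payload = (
    "GET /pra/ita/home.php HTTP/1.1\r\n"
    "Host: prag.diee.unica.it\r\n"
    "Accept: text/*, text/html\r\n"
    "User-Agent: Mozilla/4.0\r\n"
    "\r\n"
)
request_line, headers = split_payload(payload)
```

Each entry of `headers` is one line of the payload, which is the unit a structure-aware detector would analyze separately.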
...and some attacks
• Long Request Buffer Overflow
HEAD / aaaaaaa…aaaaaaaaaaaa
• URL Decoding Error
GET /d/winnt/sys32/cmd.exe?/c+dir HTTP/1.0
Host: www
Connection: close
Why Payload Analysis?
• Detection of Web-based attacks can be based on
– Analysis of the Request-Line
• Detects only attacks that exploit input-validation flaws,
e.g. Spectrogram ([Song,2009]), HMM-Web ([Corona,2009])
– HTTP Payload Analysis
• Takes into account the whole HTTP request,
and thus can (in principle) detect any kind of attack
State of the Art - Payload Analysis
• PAYL [Wang,2004]
– n-grams to represent byte statistics
• McPAD [Perdisci,2009]
– Ensemble of one-class SVMs trained on ν-grams
• Spectrogram [Song,2009]
– Ensemble of Markov chains to analyze the Request-Line
• HMMPayl [Ariu,2011]
– Ensemble of HMMs to analyze sequences of bytes from
the whole payload
None of the above techniques
represents the structure of the payload
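As an illustration of "n-grams to represent byte statistics", the sketch below computes the relative frequency of each byte n-gram in a payload. It shows only the feature-extraction idea, not the actual model of any of the systems listed above. The function name `ngram_frequencies` is hypothetical.

```python
from collections import Counter


def ngram_frequencies(payload: bytes, n: int = 1):
    """Relative frequency of each byte n-gram found in the payload."""
    total = len(payload) - n + 1  # number of sliding windows of length n
    if total <= 0:
        return {}
    counts = Counter(payload[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


# Bigram statistics of a short request line
freqs = ngram_frequencies(b"GET /a/a HTTP/1.1", n=2)
```

Note that the result treats the payload as one flat byte string: the same n-gram counts the same whether it occurs in the Request-Line or in a header, which is the limitation the slide points out.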
The proposed system
Basic Idea
• We propose to take into account the
structure of HTTP payloads
– For each line of the payload, an
ensemble of HMMs models the
sequence of bytes.
– The final decision is obtained by
using the HMM outputs as features.
The payload is thus classified by a
one-class classifier trained on the
outputs of the HMM ensembles.
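A minimal sketch of how one payload line could be scored by an HMM ensemble, under several assumptions. The HMM parameters are toy values rather than trained models. Combining the ensemble by averaging is an assumption, since the slides do not state the combination rule. The score is a per-byte log-likelihood, which may be scaled differently from the scores used in the actual system.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    start[i]   : probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][b] : probability that state i emits byte value b
    """
    n = len(start)
    alpha = None
    log_lik = 0.0
    for symbol in obs:
        if alpha is None:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def ensemble_score(obs, hmms):
    """Average the per-byte log-likelihood over all HMMs of one ensemble
    (averaging is an assumption made for this sketch)."""
    length = max(len(obs), 1)
    return sum(forward_log_likelihood(obs, *h) / length for h in hmms) / len(hmms)


# Toy 2-state HMM over byte values with uniform emissions (not a trained model)
uniform = [1.0 / 256] * 256
toy_hmm = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [uniform, uniform])
line = b"Host: prag.diee.unica.it"
score = ensemble_score(list(line), [toy_hmm, toy_hmm])
```

One such score per payload line (Request-Line, Host, User-Agent, ...) gives the feature vector passed to the one-class classifier.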
The proposed system
A scheme
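[Figure: inside the IDS, each line of the HTTP payload is routed to a dedicated HMM ensemble; the ensemble outputs feed a one-class classifier, which returns an output score or a class label.]
HTTP Payload:
GET /pra/index.php HTTP/1.1
Host: prag.diee.unica.it
User-Agent: Mozilla/5.0
Accept-Encoding: gzip, deflate
HMM Ensembles: Request-Line, Host, User-Agent, Accept-Encoding, Accept-Language
Ensemble outputs shown in the figure: 0.62, 0.53, 0.34, 0.49, -1
One-Class Classifier → Output Score or Class-Label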
Missing Features
• A request typically does not
contain all the headers
– Training phase: the feature related to
a missing header is set to the
average value
– Testing phase: the feature related to
a missing header is set to -1
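The two rules above can be sketched as follows. This is an illustration under assumptions: "average value" is read here as the mean of that feature over the training requests that do contain the header, and the fallback of 0.0 for a header never seen in training is hypothetical. The function names are also hypothetical.

```python
MISSING = None  # marker for a header that is absent from a request


def impute_training(rows):
    """Replace missing scores with the per-feature mean of the observed scores."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        # Assumption: fall back to 0.0 if the header never appears in training
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if r[j] is MISSING else r[j] for j in range(n_features)]
        for r in rows
    ]


def impute_test(row, missing_value=-1.0):
    """Replace missing scores in a test vector with the fixed value -1."""
    return [missing_value if v is MISSING else v for v in row]


# Columns: one HMM-ensemble score per header; None means the header is absent
train = [[0.6, MISSING], [0.4, 0.5], [MISSING, 0.3]]
train_filled = impute_training(train)
test_filled = impute_test([0.62, MISSING])
```

At test time the value -1 lies outside the range of scores seen in training, so an absent header stays visible to the one-class classifier instead of being hidden by an average.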
Experimental Setup - 1
• 2 Datasets of Real legitimate
traffic
– DIEE, collected at the University of
Cagliari
– GT, collected at Georgia Tech
Experimental Setup - 2
• 3 Datasets of Real Attacks
– Generic, 66 Attacks
– Shell-code, 11 Attacks
– XSS-SQL Injection, 38 Attacks
• Training: 1 day of traffic
• Test: the remaining traffic plus
attacks
– K-fold CV
Experimental Setup - 3
• 4 One-class classification algorithms
with default setting of parameters
– Gauss - Gaussian distribution
– Mog – Mixture of Gaussians
– Parzen – Parzen density estimator
– SVM – SVM with RBF Kernel
• Performance evaluated using the Partial
AUC
– Computed in the false-positive rate range [0, 0.1]
– Normalized by dividing by 0.1
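The normalized partial AUC described above can be sketched in plain Python. Assumptions in this sketch: a higher score means "more anomalous", attacks are the positive class, and the ROC curve is linearly interpolated at the cut-off. The function name `partial_auc` is hypothetical.

```python
def partial_auc(scores_neg, scores_pos, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr.

    scores_neg: scores of legitimate requests (negatives)
    scores_pos: scores of attacks (positives); higher = more anomalous
    """
    thresholds = sorted(set(scores_neg) | set(scores_pos), reverse=True)
    points = [(0.0, 0.0)]  # ROC points as (false-positive rate, detection rate)
    for th in thresholds:
        fpr = sum(s >= th for s in scores_neg) / len(scores_neg)
        tpr = sum(s >= th for s in scores_pos) / len(scores_pos)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area / max_fpr


perfect = partial_auc([0.1, 0.2, 0.3], [0.8, 0.9])
chance = partial_auc([0.5] * 4, [0.5] * 4)
```

With this normalization a detector that separates the two classes perfectly scores 1, while one that cannot rank them at all scores 0.05 rather than 0.5, since only the first tenth of the chance diagonal is integrated.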
Experimental Results
Partial AUC – DIEE Dataset
Experimental Results
Multiple HMM – DIEE Dataset – Shellcode Attacks
Experimental Results
Partial AUC – GT Dataset
Experimental Results
Comparison with similar IDS
Computational Cost
Conclusions
• We proposed an anomaly-based IDS for the
protection of Web Servers and Web Applications
• We exploited the MCS paradigm
– To analyze the structure of the HTTP payload
– By combining the outputs through a one-class
classifier
• Compared to similar systems, our proposal
– Provides high performance in attack detection
– Is fast