IEEE IWCMC TRAC 2012

Traffic classification remains a challenging problem
given the ever-evolving nature of the Internet, in which new
protocols and applications arise at a constant pace. In the past,
so-called behavioral approaches have been successfully proposed
as valid alternatives to traditional deep packet inspection (DPI)
based tools to classify traffic into a few coarse classes.
In this paper we push forward the adoption of behavioral
classifiers by engineering a Hierarchical classifier that allows
proper classification of traffic into more than twenty fine-grained
classes.
A thorough engineering process has been followed, which considers both
proper feature selection and the testing of seven different classification
algorithms. Results obtained over actual, large data sets show
that the proposed Hierarchical classifier outperforms off-the-shelf
non-hierarchical classification algorithms, exhibiting average
accuracy higher than 90%, with precision and recall
higher than 95% for the most popular classes of traffic.

    IEEE IWCMC TRAC 2012 Presentation Transcript

    • Hierarchical Learning for Fine Grained Internet Traffic Classification. Luigi Grimaudo, Marco Mellia, Elena Baralis. IEEE IWCMC-TRAC, Cyprus, 2012
    • Internet traffic classification: behavioral classifiers try to exploit some description of application behaviors by means of statistical characteristics (e.g., length of the first packets of a flow)
    • Behavioral classifiers
        • Pros:
            • No packet payload inspection
            • Extensions based on retraining steps
            • Decision can be lightweight
        • Cons:
            • A proper training set must be available
            • Training must be customized to the monitored network
            • Good accuracy only with a few coarse classes
    • Our goal
        • Is it possible to push behavioral classifiers further, to correctly identify a large and fine-grained set of classes?
        • Is it possible to identify application-specific traffic that runs over HTTP, e.g., distinguishing Facebook, YouTube, or Google Maps traffic?
        • How to handle the unknown class?
    • Considerations
        • All classification algorithms suffer when the number of classes increases
        • Not all features are good at distinguishing among all classes
        • Using a large number of features increases the computational cost of classification algorithms and can add noise in the learning step
        • It is difficult to properly select the most prominent features with a large number of applications
    • Classification problem
        • Feature selection: human-driven or algorithm-based
        • Classification algorithm choice: 7 different algorithms (Bayes, Kernel Bayes, Decision Tree, Neural Nets, SVM, KNN)
        • Performance evaluation: 10-fold cross-validation and training/test application
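The 10-fold cross-validation named on this slide can be sketched as below. This is only an illustration of the splitting step, not the authors' evaluation code: the helper name `k_fold_indices`, the fixed seed, and the round-robin fold assignment are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, then deals
    them round-robin into k folds, so fold sizes differ by at most one.
    Each fold is used exactly once as the test set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set: every fold except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each of the ten splits, and the ten scores averaged.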
    • Feature selection
        • Initial set of features: more than 200 behavioral layer-4 features provided by Tstat
        • Feature selection algorithm: minimum-Redundancy-Maximum-Relevance (mRMR)
            • Selects features that are both minimally redundant among themselves and maximally relevant to the target classes
        • Advantages:
            • Dimension reduction to reduce the computational cost
            • Noise reduction to improve classification accuracy
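A greedy mRMR selection over discretized features might look like the sketch below. It assumes the common "difference" form of the criterion (relevance minus mean redundancy, both measured by mutual information); the slides do not say which variant was used. The feature names and the toy data are hypothetical and are not Tstat's actual feature names.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance I(feature; class) minus its mean mutual information
    with the features already selected.

    `features` maps a feature name to its list of discretized values.
    """
    relevance = {name: mutual_information(vals, labels)
                 for name, vals in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): two independent informative features and
# one exact copy of the first, which mRMR should skip as redundant.
toy_features = {
    "first_pkt_len": [0, 0, 1, 1, 0, 0, 1, 1],
    "first_pkt_len_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "flow_duration": [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_labels = ["A", "B", "C", "D", "A", "B", "C", "D"]
chosen = mrmr_select(toy_features, toy_labels, k=2)
```

On the toy data, the duplicated feature carries no new information once the original is selected, so the second pick goes to the independent feature instead.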
    • Hierarchical behavioral classifier
        • Features that separate P2P traffic from HTTP traffic may be useless when trying to separate YouTube flows from Facebook flows
        • Split the classification process into several stages
        • At first, coarser classes are used
        • At following stages, finer-grained classification is performed
        • Classifiers are organized in a tree-based structure, defined according to our domain knowledge
    • Hierarchical behavioral classifier
        [Figure: classifier tree. Node labels: ROOT, GENERAL, UNKNOWN, P2P, SMTP, POP3, SSL/TLS, IMAP4, MSN, HTTP, BITTORRENT, BITTORRENT MSE/PE, ED2K, ED2K OBF, FLICKR, OPEN SOCIAL, ADV, FACEBOOK, VIDEO, WIKIPEDIA, GMAPS, MEGAUPLOAD, RTMPT, YOUTUBE VIDEO, FLASH VIDEO, YOUTUBE SITE, OTHER VIDEO]
    • Hierarchical behavioral classifier, advantages (1/2):
        • Different and specific set of features for each sub-classifier
        [Figure: selected features on the server-to-client traffic]
    • Hierarchical behavioral classifier, advantages (2/2):
        • Able to use a different algorithm for each sub-classifier
        • Reduced computational complexity
        • Possible to run in parallel too
        • Possible to stop the classification at different levels of granularity
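The tree-of-classifiers idea from the slides above can be sketched as follows: each node looks only at its own feature subset, passes the flow down to the child matching its prediction, and can stop early at a coarser level. Everything concrete here is an assumption made for illustration: the `NearestCentroid` stand-in (the slides allow a different algorithm per node), the feature names `port_entropy` and `mean_pkt_len`, the two-level tree, and the toy flows.

```python
from dataclasses import dataclass, field


class NearestCentroid:
    """Stand-in sub-classifier; any algorithm could take its place."""

    def fit(self, rows, labels):
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}
        return self

    def predict(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


@dataclass
class Node:
    features: list                                   # features this node uses
    children: dict = field(default_factory=dict)     # coarse label -> Node
    model: NearestCentroid = field(default_factory=NearestCentroid)

    def fit(self, samples, paths, depth=0):
        """samples: dicts of feature values; paths: label tuples, coarse to fine."""
        rows = [[s[f] for f in self.features] for s in samples]
        self.model.fit(rows, [p[depth] for p in paths])
        for label, child in self.children.items():
            # Each child is trained only on the flows of its parent class.
            subset = [(s, p) for s, p in zip(samples, paths) if p[depth] == label]
            child.fit([s for s, _ in subset], [p for _, p in subset], depth + 1)
        return self

    def predict(self, sample, max_depth=None):
        """Return the predicted label path, optionally stopping early."""
        label = self.model.predict([sample[f] for f in self.features])
        path = [label]
        child = self.children.get(label)
        if child is not None and (max_depth is None or max_depth > 1):
            deeper = None if max_depth is None else max_depth - 1
            path += child.predict(sample, deeper)
        return path


# Toy flows (hypothetical values).
samples = [
    {"port_entropy": 0.90, "mean_pkt_len": 600},
    {"port_entropy": 0.80, "mean_pkt_len": 1300},
    {"port_entropy": 0.10, "mean_pkt_len": 1400},
    {"port_entropy": 0.20, "mean_pkt_len": 1350},
    {"port_entropy": 0.15, "mean_pkt_len": 400},
    {"port_entropy": 0.10, "mean_pkt_len": 500},
]
paths = [("P2P",), ("P2P",), ("HTTP", "YOUTUBE"), ("HTTP", "YOUTUBE"),
         ("HTTP", "FACEBOOK"), ("HTTP", "FACEBOOK")]

tree = Node(features=["port_entropy"],
            children={"HTTP": Node(features=["mean_pkt_len"])})
tree.fit(samples, paths)
```

Calling `tree.predict(flow)` walks the full tree, while `tree.predict(flow, max_depth=1)` returns only the coarse class, which mirrors the "stop at different levels of granularity" advantage.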
    • Experimental analysis
        • Real traces
        • 1-day-long traces
        • ISP network
        • Each dataset is 1 h
        • Campus network
        [Figure: number of flows in each 1 h ISP dataset. Classes: UNKNOWN/OTHER, FACEBOOK, SMTP, OPENSOCIAL, POP3, YOUTUBE VIDEO, IMAP4, YOUTUBE SITE, SSL/TLS, FLASH VIDEO, MSN, RTMP, MSN HTTP, OTHER VIDEO, FLICKR, ED2K OBF, ADV, ED2K, MEGAUPLOAD, BT, GMAPS, BT MSE/PE, WIKI]
    • Performance Metrics
                               Actual: positive       Actual: negative
        Predicted: positive    tp (true positive)     fp (false positive)
        Predicted: negative    fn (false negative)    tn (true negative)
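The counts in this confusion matrix feed the accuracy, precision, and recall figures quoted in the abstract. The sketch below uses the standard definitions (precision = tp / (tp + fp), recall = tp / (tp + fn), accuracy = correct / total); the slide shows only the matrix, so the function name and the one-vs-rest treatment of each class are assumptions.

```python
def per_class_metrics(actual, predicted):
    """Return (accuracy, {class: (precision, recall)}).

    Each class is scored one-vs-rest: a flow counts as a true positive
    for class c when both its actual and predicted labels equal c.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(actual, predicted))
    metrics = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        # Report 0.0 when a ratio is undefined (no predictions or no samples).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    return accuracy, metrics
```

Per-class precision and recall matter here because overall accuracy can hide poor results on rare classes when the number of flows per class is very uneven.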
    • Classification algorithm selection
        • Best algorithms as Flat classifier
        • Best algorithms as Sub-classifier
        • Ten-fold cross-validation considering the training dataset
    • Classification algorithm selection
        • Best algorithms as Flat classifiers
        • Best algorithms as Sub-classifier
        • Ten-fold cross-validation considering the training dataset
    • Flat classifier vs. Hierarchical: ISP dataset h.18
    • Flat classifier vs. Hierarchical: ISP dataset h.19
    • Flat classifier vs. Hierarchical: Campus dataset, ISP dataset
    • Robustness versus time
    • Conclusions
        • Novel Hierarchical classifier
        • Proper feature selection
        • Best algorithm among 7 different classification algorithms
        • The application on real traces showed improvements in:
            • Classification performance
            • Robustness
            • Computational complexity
    • Thanks for your attention! Questions? Luigi Grimaudo – luigi.grimaudo@polito.it