Traffic Classification based on Machine Learning

  1. Traffic Classification based on Machine Learning using Flow-level Information. Jong Gun Lee (jglee@an.kaist.ac.kr), Advanced Networking Lab.
  2. Table of Contents
     • Motivation of this work
     • Background on machine learning
     • Our approach using machine learning
     • Experiment (dataset and results)
     • Conclusion
  3. Motivation
     • We cannot effectively classify the traffic of some newly emerging applications, such as online games and streaming applications, because there is no application information to rely on, such as a well-known port number or a common byte sequence in the payload.
     • We propose a methodology to classify Internet traffic with supervised and unsupervised learning.
  4. Basic Terminology of Machine Learning
     • Classifier: maps unlabeled instances into classes
     • Instance: a single object of the world
     • Attribute: a quantity describing an instance
     • Feature: the specification of an attribute and its value
     • Feature vector: a list of features describing an instance
  5. Unsupervised and Supervised Learning
     • Supervised learning (with an answer/teacher): given a training set, a classifier learns the characteristics of each class; when a new instance arrives, the classifier predicts its class.
     • Unsupervised learning (without an answer/teacher): given only a set of data (feature vectors), a classifier builds a set of clusters.
  6. K-Means
     • One of the unsupervised learning methods
     • K is the number of clusters and is given as an initial parameter
     • Procedure (a runnable sketch appears after the slide list)
        – First, the algorithm randomly chooses K points as the centers of K subspaces
        – Second, it divides the overall vector space into K subspaces according to those centers
        – Third, it picks a new center for each subspace
        – Then it repeats the second and third steps until the centers no longer change, or move by less than a threshold
  7. Example of K-Means: 8 instances, K = 2
  8. Overall Process of Our Method
     • N packets → feature extraction → N feature vectors → unsupervised learning → K clusters → supervised learning → classifier (classification method)
  9. Flow-level Feature Information (a feature-extraction sketch appears after the slide list)
     • Protocol number: 6 (TCP) or 17 (UDP)
     • Duration (seconds)
     • Number of packets per second (PPS)
     • Mean size of all packets
     • Mean size of non-ACK packets
     • Rate of ACK packets
     • Interaction information
  10. Feature Extraction (Interaction Information; a runnable sketch appears after the slide list)
      • H: a two-dimensional 16×16 histogram
      • p1, p2, p3, …, pn: the sequence of packet sizes of a flow and its partner flow, ordered by timestamp
      • For i = 1 to n−1: H[pi/100][pi+1/100]++
      • Example
         – Sequence of packet sizes: 40, 80, 1500, …, 40, 1500
         – Pair-wise representation: [40, 80], [80, 1500], …, [40, 1500]
         – Histogram bins: [40/100, 80/100], [80/100, 1500/100], …, [40/100, 1500/100] → [0, 0], [0, 15], …, [0, 15]
  11. Guideline: open design choices along the pipeline
      • Feature extraction: direction (Rx and Tx, Rx only, or Tx only); number of bins and bin size (dynamic/static); initial ?? packets
      • Unsupervised learning: effective estimation of K
      • Supervised learning: efficient threshold; which kind of learning method to use
      • Classifier: decide (yes/no) whether a flow belongs to a known class or is unknown traffic
  12. Dataset: flows per application (full count / balanced sample of 1500; an .arff loading sketch appears after the slide list)
      • bittorrent.arff: 6412 / 1500
      • clubbox.arff: 4913 / 1500
      • edonkey.arff: 101355 / 1500
      • fileguri.arff: 21060 / 1500
      • ftp.arff: 635 / 0
      • http.arff: 200274 / 1500
      • https.arff: 3611 / 1500
      • melon.arff: 22 / 0
      • msnp.arff: 4986 / 1500
      • nateon.arff: 1565 / 1500
      • nntp.arff: 169 / 0
      • pop3.arff: 63 / 0
      • sayclub.arff: 224 / 0
      • smtp.arff: 40556 / 1500
      • ssh.arff: 67 / 0
      • Total: 385912 / 13500
  13. (figure only; no transcript text)
  14. (figure only; no transcript text)
  15. Sum of Squared Error (SSE; a computation sketch appears after the slide list)
      • How to compute SSE
      • Number of bins: 8×8
      • Number of clusters: 1 to 20
  16. Fitting of SSE: Y = 1.446×10^4 · X^(−1.194) + 755.8
  17. Estimation of SSE
  18. Decrease Rate of SSE: 0.1% decrease threshold (a K-selection sketch appears after the slide list)
  19. To-do List
      • Direction: Rx and Tx, Rx only, or Tx only
      • Dynamic bin size
      • Initial N packets or all packets
      • Different (un)supervised learning methods
      • Different feature extraction methods
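
A minimal NumPy sketch of the K-Means procedure from slides 6 and 7; the toy 2-D points and the convergence threshold are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def k_means(X, k, threshold=1e-4, max_iter=100, seed=0):
    """K-Means as described on slide 6: random centers, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly choose K instances as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every instance to its nearest center (K subspaces).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: pick a new center (the mean) for each subspace.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Stop when no center moves by more than the threshold.
        if np.linalg.norm(new_centers - centers, axis=1).max() < threshold:
            centers = new_centers
            break
        centers = new_centers
    return labels, centers

# Toy version of slide 7: 8 instances, K = 2.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [2.0, 1.5],
              [5.0, 6.0], [5.5, 7.0], [6.0, 6.5], [5.0, 7.5]])
labels, centers = k_means(X, k=2)
print(labels)   # cluster label for each of the 8 instances
print(centers)  # one center per cluster
```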
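
A sketch of how the flow-level features on slide 9 could be computed; the packet-record layout (timestamp, size, is_ack, protocol) is an assumption for illustration, not the authors' format.

```python
def flow_features(packets):
    """Flow-level features from slide 9. Each packet is assumed to be a
    (timestamp_sec, size_bytes, is_ack, protocol) tuple."""
    timestamps = [p[0] for p in packets]
    sizes = [p[1] for p in packets]
    non_ack_sizes = [p[1] for p in packets if not p[2]]
    n_acks = sum(1 for p in packets if p[2])

    duration = max(timestamps) - min(timestamps)            # seconds
    return {
        "protocol": packets[0][3],                          # 6 (TCP) or 17 (UDP)
        "duration": duration,
        "pps": len(packets) / duration if duration > 0 else float(len(packets)),
        "mean_size": sum(sizes) / len(sizes),               # mean size of all packets
        "mean_non_ack_size": (sum(non_ack_sizes) / len(non_ack_sizes)
                              if non_ack_sizes else 0.0),   # mean size of non-ACK packets
        "ack_rate": n_acks / len(packets),                  # rate of ACK packets
    }

# Example: a tiny TCP flow of three packets.
flow = [(0.00, 60, True, 6), (0.01, 1500, False, 6), (0.25, 60, True, 6)]
print(flow_features(flow))
```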
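
The loop on slide 10 in runnable form. The slide uses a 16×16 histogram with 100-byte bins; clamping sizes of 1600 bytes or more into the last bin is an added assumption, since the slide does not say how they are handled.

```python
def interaction_histogram(sizes, n_bins=16, bin_width=100):
    """H[p_i / 100][p_{i+1} / 100] += 1 over consecutive packet sizes (slide 10)."""
    H = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(sizes, sizes[1:]):
        i = min(a // bin_width, n_bins - 1)   # clamp oversized packets (assumption)
        j = min(b // bin_width, n_bins - 1)
        H[i][j] += 1
    return H

# Slide 10's example: sizes 40, 80, 1500 give pairs [40, 80] and [80, 1500],
# which fall into bins [0, 0] and [0, 15].
H = interaction_histogram([40, 80, 1500])
print(H[0][0], H[0][15])   # 1 1
```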
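
A sketch of loading one of the .arff files listed on slide 12; scipy.io.arff and the local file path are assumptions about tooling, since the slides do not say how the files were read.

```python
from scipy.io import arff
import pandas as pd

# Assumed local path; slide 12 only gives the file names and flow counts.
data, meta = arff.loadarff("http.arff")
df = pd.DataFrame(data)

print(meta.names())        # attribute (feature) names stored in the file
print(len(df), "flows")    # slide 12 reports 200274 flows for http.arff
```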
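
A sketch of the SSE computation behind slides 15 to 18, using scikit-learn's KMeans as a stand-in implementation (an assumption; the slides do not name the tool). The inertia_ attribute is the within-cluster sum of squared errors.

```python
import numpy as np
from sklearn.cluster import KMeans

def sse_curve(X, k_max=20, seed=0):
    """SSE for K = 1..k_max, as on slide 15 (number of clusters: 1 to 20)."""
    return [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
            for k in range(1, k_max + 1)]

# X would hold one feature vector per flow; here 8*8 = 64 bins per slide 15,
# filled with random placeholder data instead of the real flows.
X = np.random.default_rng(0).random((1000, 64))
sse = sse_curve(X)
```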
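
One possible reading of slides 16 to 18: fit the SSE curve with the power-law model of slide 16 using scipy.optimize.curve_fit, then stop at the first K whose relative SSE decrease falls below 0.1% (slide 18). The exact form of the stopping rule is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(k, a, b, c):
    # Same form as slide 16's fit: Y = 1.446e4 * X^(-1.194) + 755.8
    return a * np.power(k, b) + c

def choose_k(sse, rate=0.001):
    """First K whose relative SSE decrease drops below `rate` (0.1%)."""
    ks = np.arange(1, len(sse) + 1, dtype=float)
    params, _ = curve_fit(power_law, ks, sse, p0=(sse[0], -1.0, sse[-1]), maxfev=10000)
    fitted = power_law(ks, *params)
    for i in range(1, len(ks)):
        if (fitted[i - 1] - fitted[i]) / fitted[i - 1] < rate:
            return int(ks[i])   # adding this cluster gains less than 0.1%
    return int(ks[-1])          # no K below the threshold; take the largest tried

# Example using slide 16's fitted curve as synthetic input.
sse = [1.446e4 * k ** -1.194 + 755.8 for k in range(1, 21)]
print(choose_k(sse))
```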
