`
Traffic Classification based on Machine Learning
using Flow-level Information
Jong Gun Lee (jglee@an.kaist.ac.kr)
Advanced Networking Lab.
`
Table of Contents
• Motivation of this work
• Background on machine learning
• Our approach using machine learning
• Experiment (dataset and result)
• Conclusion
`
Motivation
• We cannot effectively classify the traffic of some newly
emerging applications,
– such as online games and streaming applications
– because they expose no application information, such as a
well-known port number or a common byte sequence in the payload
We propose a methodology to classify Internet traffic
with supervised and unsupervised learning
`
Basic Terminologies of Machine Learning
• Classifier
maps unlabeled instances into classes
• Instance
is a single object of the world
• Attribute
is a single property (dimension) used to describe an instance
• Feature
is the specification of an attribute and its value
• Feature vector
is a list of features describing an instance
`
Unsupervised and Supervised Learning
• Supervised learning (with answer/teacher)
– With a training set, a classifier learns the characteristics of each
class. When a new instance arrives, the classifier predicts
the class of that instance.
• Unsupervised learning (without answer/teacher)
– With only a set of data (feature vectors), the algorithm groups the
data into a set of clusters.
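The supervised case can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, which is chosen here only for brevity and is not stated in the slides to be the method used; the class names and feature values are made up.

```python
from collections import defaultdict
from math import dist


def train_nearest_centroid(training_set):
    """Learn one centroid (mean feature vector) per class from labeled data.

    training_set: iterable of (feature_vector, class_label) pairs.
    """
    grouped = defaultdict(list)
    for vector, label in training_set:
        grouped[label].append(vector)
    # The "characteristics" of a class are reduced here to its mean vector.
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Predict the class whose centroid is closest to the new instance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical two-feature flows: (mean packet size, packets per second).
training_set = [
    ((1400.0, 90.0), "bulk"),
    ((1300.0, 110.0), "bulk"),
    ((80.0, 5.0), "interactive"),
    ((120.0, 9.0), "interactive"),
]
centroids = train_nearest_centroid(training_set)
print(predict(centroids, (100.0, 7.0)))
```

The training step needs the labels ("answers"); an unsupervised method would receive only the feature vectors.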
`
K-Means
• One of the unsupervised learning methods
• K is the number of clusters; it is given as
an initial parameter
• Procedure
– First, the algorithm randomly chooses K points as the centers of
K subspaces
– Second, it divides the overall vector space into K subspaces
by assigning each point to its nearest center
– Third, it computes a new center for each of the K subspaces
(the mean of the points in it)
– Then it repeats the second and third steps until no center
changes, or every center moves by less than a threshold value
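The procedure above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and initial centers drawn from the input points; the function name, the `threshold`, `max_iterations` and `seed` parameters, and the sample coordinates are all hypothetical.

```python
import random
from math import dist


def k_means(points, k, threshold=1e-6, max_iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: choose K distinct input points as the initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: dist(point, centers[c]))
            for point in points
        ]
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        # Step 4: stop once no center moved farther than the threshold.
        shift = max(dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= threshold:
            break
    return centers, labels


# Eight instances and K = 2; the coordinates are made up for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (2.0, 1.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 9.5)]
centers, labels = k_means(points, k=2)
print(centers, labels)
```

Because the starting centers are random, different seeds can give different cluster numbering, and on less well-separated data they can give different clusters.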
`
Example of K-Means
• Number of instances: 8, K = 2
`
Overall Process of Our Method
[Diagram: N packets → Feature Extraction → N feature vectors → Unsupervised Learning → K clusters → Supervised Learning → Classifier (classification method)]
`
Flow-level Feature Information
• Protocol number: 6 (TCP) or 17 (UDP)
• Duration, in seconds
• Number of packets per second (PPS)
• Mean size of all packets
• Mean size of non-ACK packets
• Rate of ACK packets
• Interaction information
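The scalar features in this list could be computed from a flow's packets roughly as follows. This is a sketch under assumptions: the `Packet` record and its `is_ack` flag are hypothetical, "rate of ACK packets" is read as the fraction of packets that are ACKs, and the interaction information is left out because it is a histogram rather than a single number.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds
    size: int         # bytes
    is_ack: bool      # assumed: True for a pure TCP ACK (no payload)


def flow_features(protocol, packets):
    """Compute the scalar flow-level features for one flow."""
    times = [p.timestamp for p in packets]
    duration = max(times) - min(times)
    sizes = [p.size for p in packets]
    non_ack_sizes = [p.size for p in packets if not p.is_ack]
    return {
        "protocol": protocol,  # 6 for TCP, 17 for UDP
        "duration": duration,
        # Guard against zero-duration (e.g. single-packet) flows.
        "pps": len(packets) / duration if duration > 0 else 0.0,
        "mean_size": sum(sizes) / len(sizes),
        "mean_non_ack_size": (
            sum(non_ack_sizes) / len(non_ack_sizes) if non_ack_sizes else 0.0
        ),
        "ack_rate": sum(p.is_ack for p in packets) / len(packets),
    }


packets = [
    Packet(0.0, 40, True),
    Packet(0.5, 1500, False),
    Packet(1.0, 40, True),
    Packet(2.0, 1500, False),
]
features = flow_features(6, packets)
print(features)
```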
`
Feature Extraction (Interaction Information)
• Interaction Information
– H: 2-dimensional histogram, 16x16
– p1, p2, p3, …, pn
• the sequence of packet sizes of a flow and its partner flow,
ordered by timestamp
– Update rule (integer division by a bin width of 100 bytes):
For i = 1 : n-1
    H[p(i)/100][p(i+1)/100]++
• Example
– Sequence of packet sizes: 40, 80, 1500, …, 40, 1500
– Pair-wise representation: [40, 80], [80, 1500], …, [40, 1500]
– Histogram indices: [40/100, 80/100], [80/100, 1500/100], …, [40/100, 1500/100]
= [0, 0], [0, 15], …, [0, 15]
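The histogram update can be sketched in Python as below. The function name and its `bins` and `bin_width` parameters are hypothetical, and clamping oversized packets into the last bin is an assumption the slide does not state.

```python
def interaction_histogram(sizes, bins=16, bin_width=100):
    """Count transitions between consecutive packet sizes in a bins x bins grid."""
    histogram = [[0] * bins for _ in range(bins)]
    for current, following in zip(sizes, sizes[1:]):
        # Integer division maps a size to its bin; clamping (an assumption)
        # keeps sizes of bins * bin_width bytes or more inside the grid.
        row = min(current // bin_width, bins - 1)
        col = min(following // bin_width, bins - 1)
        histogram[row][col] += 1
    return histogram


# Packet sizes of a flow and its partner flow, merged in timestamp order.
sizes = [40, 80, 1500, 40, 1500]
H = interaction_histogram(sizes)
print(H[0][0], H[0][15], H[15][0])
```

Flattening `H` row by row would give 256 values that can be appended to the other flow-level features.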
`
Guideline
[Diagram: Unknown traffic → Packets → Feature Extraction → N feature vectors → Unsupervised Learning → K clusters → Supervised Learning → Classifier, with a yes/no decision branch. Open questions annotated on the diagram:]
• Direction: Rx and Tx, Rx only, or Tx only
• #bins and bin size: dynamic or static
• Initial ?? packets
• Effective K estimation
• Efficient threshold
• What kind of learning method
`
Dataset
Number of instances per ARFF file (full set, and the subset of 1500 instances taken from each of 9 files):

File                Full set  Subset
bittorrent.arff         6412    1500
clubbox.arff            4913    1500
edonkey.arff          101355    1500
fileguri.arff          21060    1500
ftp.arff                 635       0
http.arff             200274    1500
https.arff              3611    1500
melon.arff                22       0
msnp.arff               4986    1500
nateon.arff             1565    1500
nntp.arff                169       0
pop3.arff                 63       0
sayclub.arff             224       0
smtp.arff              40556    1500
ssh.arff                  67       0
total                 385912   13500
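The per-file counts on this slide are consistent with a simple rule: keep 1500 instances from every application that has at least 1500, and drop the rest. The sketch below encodes that inferred rule. The slides do not say how the 1500 instances were chosen, so the random sampling in `sample_class` is an assumption, and all function names are hypothetical.

```python
import random

# Instance counts per application, taken from the full-set counts on the slide.
FULL_COUNTS = {
    "bittorrent": 6412, "clubbox": 4913, "edonkey": 101355,
    "fileguri": 21060, "ftp": 635, "http": 200274, "https": 3611,
    "melon": 22, "msnp": 4986, "nateon": 1565, "nntp": 169,
    "pop3": 63, "sayclub": 224, "smtp": 40556, "ssh": 67,
}


def subset_sizes(counts, per_class=1500):
    """Return per_class for classes with enough instances, otherwise 0."""
    return {name: per_class if n >= per_class else 0 for name, n in counts.items()}


def sample_class(instances, per_class=1500, seed=0):
    """Draw per_class instances without replacement (random choice is an assumption)."""
    if len(instances) < per_class:
        return []
    return random.Random(seed).sample(instances, per_class)


sizes = subset_sizes(FULL_COUNTS)
print(sum(FULL_COUNTS.values()), sum(sizes.values()))
```

Equal class sizes keep the large http and edonkey classes from dominating the clustering.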
`
Sum of Squared Error (SSE)
• How to get SSE
• #bins: 8*8
• #clusters: 1~20
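The formula for SSE did not survive on this slide. Assuming the usual definition for K-means (the sum, over all instances, of the squared Euclidean distance to the center of the cluster the instance belongs to), it can be computed as in this sketch; the function name and the sample data are hypothetical.

```python
from math import dist


def sum_of_squared_error(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    return sum(dist(p, centers[label]) ** 2 for p, label in zip(points, labels))


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (14.0, 0.0)]
centers = [(1.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
print(sum_of_squared_error(points, centers, labels))
```

Running the clustering once per value of K from 1 to 20 and recording this value would give one SSE point per cluster count.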
`
Fitting of SSE
Y = 1.446e4 * X^(-1.194) + 755.8 (X: number of clusters, Y: SSE)
`
Estimation of SSE
`
Decrease Rate of SSE
0.1% decrease
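One possible reading of this slide is that K is chosen where adding one more cluster lowers the fitted SSE by less than 0.1%. The sketch below applies that reading to the power-law fit from the "Fitting of SSE" slide. The stopping rule, the function names, and the extrapolation beyond the measured range of 1 to 20 clusters are all assumptions.

```python
def fitted_sse(num_clusters):
    """Power-law fit reported on the 'Fitting of SSE' slide."""
    return 1.446e4 * num_clusters ** -1.194 + 755.8


def decrease_rate(k):
    """Relative drop in fitted SSE when going from k to k + 1 clusters."""
    return (fitted_sse(k) - fitted_sse(k + 1)) / fitted_sse(k)


def smallest_k_below(rate_threshold=0.001, max_k=10_000):
    """First k whose decrease rate falls below the threshold (0.1% by default)."""
    for k in range(1, max_k + 1):
        if decrease_rate(k) < rate_threshold:
            return k
    return None


print(smallest_k_below())
```

A looser threshold gives a smaller K, so the result depends on how much residual error is acceptable.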
`
To do list
• Direction
– Rx and Tx, Rx only, or Tx only
• Dynamic bin size
• Initial N packets or all the packets
• Different (un)supervised learning methods
• Different feature extraction methods
