KSII The first International Conference on Internet (ICONI) 2009, December 2009
Copyright ⓒ 2009 KSII




     Classification of Traffic Flows into QoS
     Classes by Unsupervised Learning and
                  KNN Clustering
                                        Yi Zeng¹ and Thomas M. Chen²
                        ¹San Diego Supercomputer Center, University of California
                                          San Diego, CA 92093 - USA
                                            [e-mail: yzeng@sdsc.edu]
                                  ²School of Engineering, Swansea University
                                         Swansea, Wales SA2 8PP - UK
                                       [e-mail: t.m.chen@swansea.ac.uk]
                                    *Corresponding author: Thomas M. Chen



                                                   Abstract

Traffic classification seeks to assign packet flows to an appropriate quality of service (QoS) class
based on flow statistics without the need to examine packet payloads. Classification proceeds in two
steps. Classification rules are first built by analyzing traffic traces, and then the classification rules are
evaluated using test data. In this paper, we use the self-organizing map (SOM) and K-means clustering as
unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters
were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The
K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly
better than a minimum mean distance classifier.


Keywords: Traffic classification, unsupervised learning, k-nearest neighbor, clustering




                                        1. Introduction

Network operators and system administrators are interested in the mixture of traffic carried in
their networks for several reasons. Knowledge about traffic composition is valuable for network
planning, accounting, security, and traffic control. Traffic control includes packet scheduling and
intelligent buffer management to provide the quality of service (QoS) needed by applications. It is
necessary to determine to which applications packets belong, but traditional protocol layering
principles restrict the network to processing only the IP packet header.
   In Section 2, we review the previous work in traffic classification. Section 3 addresses the
question of useful features and number of QoS classes. We describe experiments with unsupervised
clustering of real traffic traces to build classification rules. Given the discovered QoS classes,
Section 4 presents experimental evaluation of classification accuracy using k-nearest neighbor
compared to a minimum mean distance classifier.

This research was supported by a research grant from the IT R&D program of MKE/IITA, the Korean
government [2005-Y-001-04, Development of Next Generation Security Technology]. We express our thanks to
Dr. Richard Berke who checked our manuscript.
2                                          Zeng et al.: Classification of Traffic Flows into QoS Classes by Clustering


                                        2. Related Work

Research in traffic classification, which avoids payload inspection, has accelerated over the last
five years. It is generally difficult to compare different approaches, because they vary in the
selection of features (some requiring inspection of the packet payload), choice of supervised or
unsupervised classification algorithms, and set of classified traffic classes. The wide range of
previous approaches can be seen in the comprehensive survey by Nguyen and Armitage [1]. Further
complicating comparisons between different studies is the fact that classification performance
depends on how the classifier is trained and the test data used to evaluate accuracy. Unfortunately,
a universal set of test traffic data does not exist to allow uniform comparisons of different
classifiers.
    A common approach is to classify traffic on the basis of flows instead of individual packets.
Trussell et al. proposed the distribution of packet lengths as a useful feature [2]. McGregor et al.
used a variety of features: packet length statistics, interarrival times, byte counts, and connection
duration [3]. Flows with similar features were grouped together using EM (expectation-maximization)
clustering. Having found the clusters representing a set of traffic classes, the features
contributing little were deleted to simplify classification and the clusters were recomputed with the
reduced feature set. EM clustering was also studied by Zander, Nguyen, and Armitage [4]. Sequential
forward selection (SFS) was used to reduce the feature set. The same authors also tried AutoClass, an
unsupervised Bayesian classifier, for cluster formation and SFS for feature set reduction [5].


                                   3. Unsupervised Clustering

3.1 Self-Organizing Map
SOM is trained iteratively. In each training step, one sample vector $x$ from the input data pool is
chosen randomly, and the distances between it and all the SOM codebook vectors are calculated using
some distance measure. The neuron whose codebook vector is closest to the input vector is called the
best-matching unit (BMU), denoted by $m_c$:

      $$ \| x - m_c \| = \min_i \| x - m_i \| \tag{1} $$

where $\| \cdot \|$ is the Euclidean distance, and $\{ m_i \}$ are the codebook vectors.
   After finding the BMU, the SOM codebook vectors are updated, such that the BMU is moved closer to
the input vector. The topological neighbors of the BMU are also treated this way. This procedure
moves the BMU and its topological neighbors towards the sample vectors. The update rule for the ith
codebook vector is:

      $$ m_i(n+1) = m_i(n) + \alpha_r(n) \, h_{ci}(n) \, [ x(n) - m_i(n) ] \tag{2} $$

where $n$ is the training iteration number, $x(n)$ is an input vector randomly selected from the
input data set at the nth training, $\alpha_r(n)$ is the learning rate in the nth training, and
$h_{ci}(n)$ is the kernel function around BMU $m_c$. The kernel function defines the region of
influence that $x$ has on the map.
   Fig. 1 shows the U-matrix and the component planes for the feature variables. The U-matrix is a
visualization of distance between neurons, where distance is color coded according to the spectrum
shown next to the map. Blue areas represent codebook vectors close to each other in input space,
i.e., clusters.

                      Fig. 1. U-matrix with 7 components scaled to [0,1].

3.2 K-Means Clustering
The K-means clustering algorithm starts with a training data set and a given number of clusters K.
The samples in the training data set are assigned to a cluster based on a similarity
measurement. Euclidean distance is generally used to measure the similarity. The K-means algorithm
tries to find an optimal solution by minimizing the square error:

      $$ Er = \sum_{i=1}^{K} \sum_{j=1}^{n} \| x_j - c_i \|^2 \tag{3} $$

where $K$ is the number of clusters and $n$ is the number of training samples, $c_i$ is the center of
the ith cluster, and $\| x_j - c_i \|$ is the Euclidean distance between sample $x_j$ and center
$c_i$ of the ith cluster.


                      4. Experimental Classification Results and Analysis

The previous section identified three clusters for QoS classes and features to build up
classification rules through unsupervised learning. In this section, the accuracy of the
classification rules is evaluated experimentally. For classification, we chose the K-nearest neighbor
(KNN) algorithm. Experimental results are compared with the minimum mean distance (MMD) classifier.
   The selected application lists for each class and the number of applications in each class are
shown in Table 1.

                          Table 1. Applications in each class

   Class           Applications                                    Total number
   Transactional   53/TCP, 13/TCP, 111/TCP,…                       112
   Interactive     23/TCP, 21/TCP, 43/TCP, 513/TCP, 514/TCP,       77
                   540/TCP, 251/TCP, 1017/TCP, 1019/TCP,
                   1020/TCP, 1022/TCP,…
   Bulk data       80/TCP, 20/TCP, 25/TCP, 70/TCP, 79/TCP,         1351
                   81/TCP, 82/TCP, 83/TCP, 84/TCP, 119/TCP,
                   210/TCP, 8080/TCP,…


                                        5. Conclusions

Traffic classification was carried out in two phases. In the first off-line phase, we started with no
assumptions about traffic classes and used the unsupervised SOM and K-means clustering algorithms to
find the structure in the traffic data. The data exploration procedure found three clusters
corresponding to three QoS classes: transactional, interactive, and bulk data transfer.
   In the second classification phase, the accuracy of the KNN classifier was evaluated for test
data. Leave-one-out cross-validation tests showed that this algorithm had a low error rate. The KNN
classifier was found to have an error rate of about 2 percent for the test data, compared to an error
rate of 7 percent for an MMD classifier. KNN is one of the simplest classification algorithms, but
not necessarily the most accurate. Other supervised algorithms, such as back propagation (BP) and
SVM, also have attractive features and should be compared in future work.


                                          References

[1] Thuy Nguyen and Grenville Armitage, “A survey of techniques for Internet traffic classification
    using machine learning,” IEEE Communications Surveys and Tutorials, vol. 10, no. 4, pp. 56-76,
    2008.
[2] H. Trussell, A. Nilsson, P. Patel, and Y. Wang, “Estimation and detection of network traffic,”
    in Proc. of 11th Digital Signal Processing Workshop, pp. 246-248, 2004.
[3] Anthony McGregor, Mark Hall, Perry Lorier, and James Brunskill, “Flow clustering using machine
    learning techniques,” in Proc. of 5th Int. Workshop on Passive and Active Network Measurement,
    pp. 205-214, 2004.
[4] Sebastian Zander, Thuy Nguyen, and Grenville Armitage, “Self-learning IP traffic classification
    based on statistical flow characteristics,” in Proc. of 6th Int. Workshop on Passive and Active
    Measurement, pp. 325-328, 2005.
[5] Sebastian Zander, Thuy Nguyen, and Grenville Armitage, “Automated traffic classification and
    application identification using machine learning,” in Proc. of IEEE Conf. on Local Computer
    Networks, pp. 250-257, 2005.
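
As a supplement to Section 3.1, one SOM training step (the BMU search of Eq. (1) followed by the codebook update of Eq. (2)) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify the map layout, the kernel, or the parameter schedule, so the 1-D neuron grid, the Gaussian kernel, the fixed learning rate and radius, the random toy data, and the names `find_bmu` and `som_step` are all hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def find_bmu(x, codebook):
    """Index c of the codebook vector closest to x, as in Eq. (1)."""
    return min(range(len(codebook)), key=lambda i: euclidean(x, codebook[i]))

def som_step(x, codebook, alpha, radius):
    """Move every codebook vector toward x, weighted by the kernel, as in Eq. (2).

    Assumption: neurons sit on a 1-D grid and the kernel h_ci is a Gaussian
    of the grid distance |c - i|; the paper does not state its kernel.
    """
    c = find_bmu(x, codebook)
    for i, m in enumerate(codebook):
        h = math.exp(-((c - i) ** 2) / (2.0 * radius ** 2))
        codebook[i] = [mk + alpha * h * (xk - mk) for mk, xk in zip(m, x)]
    return c

# Hypothetical 2-feature samples scaled to [0, 1]; not data from the paper.
rng = random.Random(0)
samples = [[rng.random(), rng.random()] for _ in range(50)]
codebook = [[rng.random(), rng.random()] for _ in range(5)]

x = samples[0]
before = euclidean(x, codebook[find_bmu(x, codebook)])
c = som_step(x, codebook, alpha=0.5, radius=1.0)
after = euclidean(x, codebook[c])  # BMU distance after the update
```

Because the kernel equals 1 at the BMU itself, a learning rate of 0.5 halves the distance between the BMU and the sample in this step. In a full training run the learning rate and radius would typically shrink with the iteration number; they are held fixed here for brevity.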
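
The square-error criterion of Eq. (3) in Section 3.2 can be illustrated with a small K-means sketch. It assumes the standard iteration (assign each sample to its nearest center, then recompute each center as the mean of its members), which the paper does not spell out, and it evaluates the error over each sample's own cluster. The toy samples, the choice of initial centers, and the names `assign`, `square_error`, and `kmeans` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(samples, centers):
    """Index of the nearest center for every sample."""
    return [min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
            for x in samples]

def square_error(samples, centers, labels):
    """Sum of squared distances from each sample to its own cluster center."""
    return sum(sq_dist(x, centers[k]) for x, k in zip(samples, labels))

def kmeans(samples, centers, iterations=10):
    """Alternate assignment and mean update, starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        labels = assign(samples, centers)
        for i in range(len(centers)):
            members = [x for x, k in zip(samples, labels) if k == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(samples, centers)
    return centers, labels, square_error(samples, centers, labels)

# Hypothetical 2-feature flow statistics forming three loose groups.
samples = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],
           [0.8, 0.9], [0.9, 0.8], [0.9, 0.9],
           [0.1, 0.9], [0.2, 0.8], [0.1, 0.8]]
initial = [samples[0], samples[3], samples[6]]

initial_err = square_error(samples, initial, assign(samples, initial))
centers, labels, err = kmeans(samples, initial)
```

Each mean update can only lower or preserve the square error for a fixed assignment, which is why `err` ends up no larger than `initial_err` here.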
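
The comparison of Section 4 and the leave-one-out evaluation mentioned in the Conclusions might be sketched as below. The sketch assumes majority voting among the k nearest training samples for KNN and, for MMD, assignment to the class whose mean feature vector is nearest; the paper gives neither these details nor the value of k. The toy samples and the names `knn_predict`, `mmd_predict`, and `leave_one_out_error` are hypothetical, so the resulting error rates say nothing about the 2 and 7 percent reported for the real traces.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(x, train, k=3):
    """Majority class among the k training samples nearest to x."""
    nearest = sorted(train, key=lambda s: sq_dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def mmd_predict(x, train):
    """Class whose mean feature vector is nearest to x."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    means = {label: [sum(col) / len(rows) for col in zip(*rows)]
             for label, rows in by_class.items()}
    return min(means, key=lambda label: sq_dist(x, means[label]))

def leave_one_out_error(data, predict):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(features, train) != label:
            errors += 1
    return errors / len(data)

# Hypothetical labeled flows: (feature vector, QoS class).
data = [
    ([0.1, 0.1], "transactional"), ([0.2, 0.1], "transactional"),
    ([0.1, 0.2], "transactional"), ([0.2, 0.2], "transactional"),
    ([0.1, 0.9], "interactive"), ([0.2, 0.8], "interactive"),
    ([0.1, 0.8], "interactive"), ([0.2, 0.9], "interactive"),
    ([0.8, 0.9], "bulk data"), ([0.9, 0.8], "bulk data"),
    ([0.9, 0.9], "bulk data"), ([0.8, 0.8], "bulk data"),
]

knn_err = leave_one_out_error(data, knn_predict)
mmd_err = leave_one_out_error(data, mmd_predict)
```

On such well-separated toy data both classifiers make no leave-one-out errors. A gap between them would only be expected when class regions overlap or are not well summarized by a single mean vector.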

A Professional QoS Provisioning in the Intra Cluster Packet Level Resource Al...
 
Performance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chipPerformance analysis of congestion-aware Q-routing algorithm for network on chip
Performance analysis of congestion-aware Q-routing algorithm for network on chip
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K Means
 
A Learning Automata Based Prediction Mechanism for Target Tracking in Wireles...
A Learning Automata Based Prediction Mechanism for Target Tracking in Wireles...A Learning Automata Based Prediction Mechanism for Target Tracking in Wireles...
A Learning Automata Based Prediction Mechanism for Target Tracking in Wireles...
 
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
 
Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...
 
AN IMPROVED MULTI-SOM ALGORITHM
AN IMPROVED MULTI-SOM ALGORITHMAN IMPROVED MULTI-SOM ALGORITHM
AN IMPROVED MULTI-SOM ALGORITHM
 
5113jgraph01
5113jgraph015113jgraph01
5113jgraph01
 
Dynamic K-Means Algorithm for Optimized Routing in Mobile Ad Hoc Networks
Dynamic K-Means Algorithm for Optimized Routing in Mobile Ad Hoc Networks Dynamic K-Means Algorithm for Optimized Routing in Mobile Ad Hoc Networks
Dynamic K-Means Algorithm for Optimized Routing in Mobile Ad Hoc Networks
 
Kernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of movingKernel based similarity estimation and real time tracking of moving
Kernel based similarity estimation and real time tracking of moving
 
AN IMPROVED MULTI-SOM ALGORITHM
AN IMPROVED MULTI-SOM ALGORITHMAN IMPROVED MULTI-SOM ALGORITHM
AN IMPROVED MULTI-SOM ALGORITHM
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

DOWNLOAD

KSII The first International Conference on Internet (ICONI) 2009, December 2009
Copyright ⓒ 2009 KSII

Classification of Traffic Flows into QoS Classes by Unsupervised Learning and KNN Clustering

Yi Zeng1 and Thomas M. Chen2
1 San Diego Supercomputer Center, University of California, San Diego, CA 92093 - USA [e-mail: yzeng@sdsc.edu]
2 School of Engineering, Swansea University, Swansea, Wales SA2 8PP - UK [e-mail: t.m.chen@swansea.ac.uk]
*Corresponding author: Thomas M. Chen

Abstract

Traffic classification seeks to assign packet flows to an appropriate quality of service (QoS) class based on flow statistics without the need to examine packet payloads. Classification proceeds in two steps. Classification rules are first built by analyzing traffic traces, and then the classification rules are evaluated using test data. In this paper, we use self-organizing map and K-means clustering as unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly better compared to a minimum mean distance classifier.

Keywords: Traffic classification, unsupervised learning, k-nearest neighbor, clustering

1. Introduction

Network operators and system administrators are interested in the mixture of traffic carried in their networks for several reasons. Knowledge about traffic composition is valuable for network planning, accounting, security, and traffic control. Traffic control includes packet scheduling and intelligent buffer management to provide the quality of service (QoS) needed by applications. It is necessary to determine to which applications packets belong, but traditional protocol layering principles restrict the network to processing only the IP packet header.

In Section 2, we review the previous work in traffic classification. Section 3 addresses the question of useful features and number of QoS classes. We describe experiments with unsupervised clustering of real traffic traces to build classification rules. Given the discovered QoS classes, Section 4 presents experimental evaluation of classification accuracy using k-nearest neighbor compared to minimum mean distance clustering.

This research was supported by a research grant from the IT R&D program of MKE/IITA, the Korean government [2005-Y-001-04, Development of Next Generation Security Technology]. We express our thanks to Dr. Richard Berke who checked our manuscript.

2. Related Work

Research in traffic classification, which avoids payload inspection, has accelerated over the last five years. It is generally difficult to compare different approaches, because they vary in the selection of features (some requiring inspection of the packet payload), choice of supervised or unsupervised classification algorithms, and set of classified traffic classes. The wide range of previous approaches can be seen in the comprehensive survey by Nguyen and Armitage [1]. Further complicating comparisons between different studies is the fact that classification performance depends on how the classifier is trained and the test data used to evaluate accuracy. Unfortunately, a universal set of test traffic data does not exist to allow uniform comparisons of different classifiers.

A common approach is to classify traffic on the basis of flows instead of individual packets. Trussell et al. proposed the distribution of packet lengths as a useful feature [2]. McGregor et al. used a variety of features: packet length statistics, interarrival times, byte counts, and connection duration [3]. Flows with similar features were grouped together using EM (expectation-maximization) clustering. Having found the clusters representing a set of traffic classes, the features contributing little were deleted to simplify classification and the clusters were recomputed with the reduced feature set. EM clustering was also studied by Zander, Nguyen, and Armitage [4]. Sequential forward selection (SFS) was used to reduce the feature set. The same authors also tried AutoClass, an unsupervised Bayesian classifier, for cluster formation and SFS for feature set reduction [5].

3. Unsupervised Clustering

3.1 Self-Organizing Map

SOM is trained iteratively. In each training step, one sample vector x from the input data pool is chosen randomly, and the distances between it and all the SOM codebook vectors are calculated using some distance measure. The neuron whose codebook vector is closest to the input vector is called the best-matching unit (BMU), denoted by $m_c$:

    \| x - m_c \| = \min_i \| x - m_i \|    (1)

where $\|\cdot\|$ is the Euclidean distance and $\{m_i\}$ are the codebook vectors.

After finding the BMU, the SOM codebook vectors are updated such that the BMU is moved closer to the input vector. The topological neighbors of the BMU are treated the same way, so the procedure moves the BMU and its topological neighbors towards the sample vectors. The update rule for the ith codebook vector is:

    m_i(n+1) = m_i(n) + \alpha_r(n) \, h_{ci}(n) \, [x(n) - m_i(n)]    (2)

where n is the training iteration number, $x(n)$ is an input vector randomly selected from the input data set at the nth training, $\alpha_r(n)$ is the learning rate in the nth training, and $h_{ci}(n)$ is the kernel function around the BMU $m_c$. The kernel function defines the region of influence that x has on the map.

Fig. 1 shows the U-matrix and the component planes for the feature variables. The U-matrix is a visualization of distance between neurons, where distance is color coded according to the spectrum shown next to the map. Blue areas represent codebook vectors close to each other in input space, i.e., clusters.

Fig. 1. U-matrix with 7 components scaled to [0,1].

3.2 K-Means Clustering

The K-means clustering algorithm starts with a training data set and a given number of clusters K. The samples in the training data set are assigned to a cluster based on a similarity
measurement. Euclidean distance is generally used to measure the similarity. The K-means algorithm tries to find an optimal solution by minimizing the square error:

    Er = \sum_{i=1}^{K} \sum_{j=1}^{n} \| x_j - c_i \|^2    (3)

where K is the number of clusters, n is the number of training samples, $c_i$ is the center of the ith cluster, and $\| x_j - c_i \|$ is the Euclidean distance between sample $x_j$ and center $c_i$ of the ith cluster.

4. Experimental Classification Results and Analysis

The previous section identified three clusters for QoS classes and features to build up classification rules through unsupervised learning. In this section, the accuracy of the classification rules is evaluated experimentally. For classification, we chose the K-nearest neighbor (KNN) algorithm. Experimental results are compared with the minimum mean distance (MMD) classifier.

The selected application lists for each class and the number of applications in each class are shown in Table 1.

Table 1. Applications in each class

Class          Applications                                   Total number
Transactional  53/TCP, 13/TCP, 111/TCP, …                     112
Interactive    23/TCP, 21/TCP, 43/TCP, 513/TCP, 514/TCP,      77
               540/TCP, 251/TCP, 1017/TCP, 1019/TCP,
               1020/TCP, 1022/TCP, …
Bulk data      80/TCP, 20/TCP, 25/TCP, 70/TCP, 79/TCP,        1351
               81/TCP, 82/TCP, 83/TCP, 84/TCP, 119/TCP,
               210/TCP, 8080/TCP, …

5. Conclusions

Traffic classification was carried out in two phases. In the first off-line phase, we started with no assumptions about traffic classes and used the unsupervised SOM and K-means clustering algorithms to find the structure in the traffic data. The data exploration procedure found three clusters corresponding to three QoS classes: transactional, interactive, and bulk data transfer.

In the second classification phase, the accuracy of the KNN classifier was evaluated for test data. Leave-one-out cross-validation tests showed that this algorithm had a low error rate. The KNN classifier was found to have an error rate of about 2 percent for the test data, compared to an error rate of 7 percent for an MMD classifier. KNN is one of the simplest classification algorithms, but not necessarily the most accurate. Other supervised algorithms, such as back propagation (BP) and SVM, also have attractive features and should be compared in future work.

References

[1] Thuy Nguyen and Grenville Armitage, "A survey of techniques for Internet traffic classification using machine learning," IEEE Communications Surveys and Tutorials, vol. 10, no. 4, pp. 56-76, 2008.
[2] H. Trussell, A. Nilsson, P. Patel, and Y. Wang, "Estimation and detection of network traffic," in Proc. of 11th Digital Signal Processing Workshop, pp. 246-248, 2004.
[3] Anthony McGregor, Mark Hall, Perry Lorier, and James Brunskill, "Flow clustering using machine learning techniques," in Proc. of 5th Int. Workshop on Passive and Active Network Measurement, pp. 205-214, 2004.
[4] Sebastian Zander, Thuy Nguyen, and Grenville Armitage, "Self-learning IP traffic classification based on statistical flow characteristics," in Proc. of 6th Int. Workshop on Passive and Active Measurement, pp. 325-328, 2005.
[5] Sebastian Zander, Thuy Nguyen, and Grenville Armitage, "Automated traffic classification and application identification using machine learning," in Proc. of IEEE Conf. on Local Computer Networks, pp. 250-257, 2005.
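
Illustrative sketch (not part of the original paper). The comparison between the K-nearest neighbor (KNN) classifier and the minimum mean distance (MMD) classifier, evaluated with leave-one-out cross-validation, can be sketched roughly as follows. This is not the authors' implementation: the two-feature toy flows, their values, the choice k = 3, and every function name are assumptions made only for the example, and the paper's real feature set and traffic traces are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy flows: two made-up features scaled to [0, 1], plus a class label.
# The numbers are invented for illustration and are not taken from the paper.
FLOWS = [
    ((0.1, 0.1), "transactional"), ((0.2, 0.1), "transactional"),
    ((0.1, 0.2), "transactional"), ((0.2, 0.2), "transactional"),
    ((0.9, 0.1), "interactive"), ((0.8, 0.1), "interactive"),
    ((0.9, 0.2), "interactive"), ((0.8, 0.2), "interactive"),
    ((0.5, 0.9), "bulk data"), ((0.4, 0.9), "bulk data"),
    ((0.5, 0.8), "bulk data"), ((0.4, 0.8), "bulk data"),
]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label seen first among the nearest neighbors wins.
    return votes.most_common(1)[0][0]


def mmd_predict(train, query):
    """Assign the query to the class whose mean vector is closest."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    means = {label: [s / counts[label] for s in acc] for label, acc in sums.items()}
    return min(means, key=lambda label: euclidean(means[label], query))


def leave_one_out_error(data, predict):
    """Hold out each sample in turn, train on the rest, and count mistakes."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, features) != label:
            errors += 1
    return errors / len(data)


if __name__ == "__main__":
    print("KNN leave-one-out error:", leave_one_out_error(FLOWS, knn_predict))
    print("MMD leave-one-out error:", leave_one_out_error(FLOWS, mmd_predict))
```

On this deliberately well-separated toy set both classifiers make no leave-one-out errors, so the sketch shows only the mechanics of the comparison. The error rates of about 2 percent and 7 percent reported in the paper come from the authors' real traffic data.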