TRAFFIC FEATURES EXTRACTION AND CLUSTERING ANALYSIS FOR ABNORMAL BEHAVIOR DETECTION
Presented by: Areej Qasrawi
Jian Zhang, Yan Tong, Tao Qin
Outline
 Abstract
 1. Introduction
 2. Traffic Collection And Feature Extraction
 3. DBSCAN Clustering Method
 4. Experiment Result
 5. Conclusion
 6. Review
Abstract
 As network bandwidth increases, users generate massive amounts of traffic data.
 How to extract effective traffic features and use them to detect abnormal behavior is an active and difficult problem in network security monitoring.
 In this paper, several traffic features are proposed to capture the traffic characteristics.
 We then employ the DBSCAN method to mine abnormal traffic based on those features.
 We extract four traffic features to capture the traffic characteristics:
 the ratio between the number of source IP addresses and the number of destination IP addresses;
 the ratio between the number of source ports and the number of destination ports;
 the ratio between the number of TCP packets and the total number of packets;
 the ratio between the number of small packets and the total number of packets in a specific time window.
 Based on the extracted features, we employ the DBSCAN method to group the network traffic into different clusters.
 Traffic packets in different clusters have different statistical characteristics,
 and the isolated points are used to detect abnormal traffic.
 The experimental results on the collected traffic show that:
 the proposed statistical features can capture traffic characteristics;
 the clustering method can place the abnormal traffic packets into a separate cluster.
Introduction
 Capturing and analyzing abnormal traffic is one of the most critical issues in keeping a network under control.
 Abnormal behaviors are behaviors caused by attacks that affect the normal operation of the Internet, such as worms and DDoS.
 Those attacks generate a huge number of traffic packets and cause changes in the flow patterns.
 Abnormal behavior detection is the task of discovering those attacks by analyzing the flow patterns with different methods,
 such as DPI and statistical methods.
 In early networks, the total number of bytes transmitted per unit time was usually used to reflect the characteristics of the network.
 In today's networks, however, the bandwidth and the number of users grow quickly, so effective abnormal behavior detection is increasingly difficult.
2. Traffic Collection And Feature Extraction
2.1 Framework of the Proposed Methods
 The framework for degree calculation, characteristics measurement, and abnormal users' location extraction is divided into four steps, as shown in Fig. 1.
 Step 1: Traffic data collection. The network traffic data is collected at the ingress router of our lab using CoralReef, developed by CAIDA. The flow model used is NetStream, which is similar to NetFlow.
 Step 2: Traffic feature generation and analysis. Traffic packets are processed and the traffic features are extracted according to their definitions.
 Step 3: Clustering analysis. Based on the extracted features, we employ the DBSCAN method to group the packets into different clusters.
 Step 4: Abnormal detection. We detect abnormal behaviors by analyzing the isolated points in the clustering results.
Fig. 1 Framework of measurement methods
2.2 Introduction to Netstream
 NetStream was proposed by Huawei in China.
 It is defined as a one-way network flow that transmits packets with the same specific properties: source IP address, source port, destination IP address, destination port, and protocol.
 Packets with the same properties are aggregated into one flow.
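The aggregation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` record and its field names are hypothetical and are not the NetStream export format, and the per-flow counters (packets, bytes) are chosen for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; the field names are assumptions for illustration.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int  # packet size in bytes


def aggregate_flows(packets):
    """Group packets into one-way flows keyed by the 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


packets = [
    Packet("10.0.0.1", 5000, "10.0.0.9", 80, "TCP", 60),
    Packet("10.0.0.1", 5000, "10.0.0.9", 80, "TCP", 1500),
    # Reverse direction: a different 5-tuple, so it becomes a separate flow.
    Packet("10.0.0.9", 80, "10.0.0.1", 5000, "TCP", 40),
]
flows = aggregate_flows(packets)
```

Because the flow is one-way, the reply packet is not merged with the two request packets.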
 We established a network traffic monitoring platform in our lab.
 The bandwidth of the campus network is 1 Gbps.
 We set up traffic mirroring at the ingress router to collect data.
 The traffic data is dumped as PCAP files. The collection period lasted nearly 100 hours, from Nov. 20, 2015 to Nov. 25, 2015.
 To verify the correctness of our methods, we also generated some attacks, including DDoS attacks and random brute-force incidents. Based on the NetStream model, we aggregate the packets into NetStream flows.
2.3 Traffic Feature Extraction
 To capture the characteristics of traffic, we must develop suitable traffic features.
 Abnormal behaviors usually cause changes in the distributions of IP addresses and ports.
 In this paper, we propose several traffic features, defined as follows:
 Definition 1: Feature $H_{ip}$ is defined in equation 1, where $D_{sip}$ and $D_{dip}$ are the numbers of distinct source IP addresses and distinct destination IP addresses in a specific time window, respectively.
$$H_{ip} = D_{sip} / D_{dip} \tag{1}$$
 Definition 2: Feature $H_{port}$ is defined in equation 2, where $D_{spt}$ and $D_{dpt}$ are the numbers of distinct source ports and distinct destination ports in a specific time window, respectively.
$$H_{port} = D_{spt} / D_{dpt} \tag{2}$$
 Definition 3: Feature $H_{tcp}$, defined in equation 3, captures the proportion of TCP packets, where $P_{tcp}$ and $P_{ip}$ are the numbers of TCP packets and IP packets in a specific time window, respectively.
$$H_{tcp} = P_{tcp} / P_{ip} \tag{3}$$
 Definition 4: Feature $H_{small}$, defined in equation 4, captures the proportion of small packets, where $P_{small}$ and $P_{all}$ are the numbers of small packets and total packets in a specific time window, respectively.
$$H_{small} = P_{small} / P_{all} \tag{4}$$
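As a rough illustration of equations 1 to 4, the sketch below computes the four ratios for the packets of a single time window. The dictionary keys and the `small_threshold` cut-off are assumptions made for this example; the slides do not state what size counts as a small packet. All packets in the window are treated as IP packets, so $P_{ip}$ and $P_{all}$ coincide here.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def window_features(packets, small_threshold=64):
    """Compute (H_ip, H_port, H_tcp, H_small) for the packets of one time window.

    Each packet is a dict with the hypothetical keys src_ip, dst_ip, src_port,
    dst_port, protocol and size. small_threshold (bytes) is an assumed cut-off.
    """
    total = len(packets)
    h_ip = ratio(len({p["src_ip"] for p in packets}),
                 len({p["dst_ip"] for p in packets}))
    h_port = ratio(len({p["src_port"] for p in packets}),
                   len({p["dst_port"] for p in packets}))
    h_tcp = ratio(sum(1 for p in packets if p["protocol"] == "TCP"), total)
    h_small = ratio(sum(1 for p in packets if p["size"] <= small_threshold), total)
    return (h_ip, h_port, h_tcp, h_small)


# A DDoS-like window: many sources, one destination address and port.
window = [
    {"src_ip": "1.1.1.1", "dst_ip": "9.9.9.9", "src_port": 1001, "dst_port": 80, "protocol": "TCP", "size": 40},
    {"src_ip": "1.1.1.2", "dst_ip": "9.9.9.9", "src_port": 1002, "dst_port": 80, "protocol": "TCP", "size": 40},
    {"src_ip": "1.1.1.3", "dst_ip": "9.9.9.9", "src_port": 1003, "dst_port": 80, "protocol": "TCP", "size": 40},
    {"src_ip": "1.1.1.4", "dst_ip": "9.9.9.9", "src_port": 1004, "dst_port": 80, "protocol": "UDP", "size": 1500},
]
features = window_features(window)
```

In this toy window the source-to-destination ratios are well above 1, which is the kind of shift in IP and port distributions that the features are meant to expose.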
 There are also attacks that generate a large number of traffic packets in short time windows.
 The above features lose their effectiveness in this situation.
 We therefore develop one more feature based on the number of packets in adjacent time windows, the packet size, and the similarity of protocol, defined in equation 5.
 Here $V_i$ = (protocol, number of packets, average packet size) and $V_{i+1}$ = (protocol, number of packets, average packet size) describe the traffic at the $i$-th and $(i+1)$-th time points, respectively.
$$H_{time} = \frac{V_i \cdot V_{i+1}}{\sqrt{V_i^2} \cdot \sqrt{V_{i+1}^2}} \tag{5}$$
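Read with $V_i^2$ as the dot product of $V_i$ with itself, equation 5 is the cosine similarity between the vectors of two adjacent time points. The sketch below follows that reading, which is an interpretation rather than something the slides state. The numeric encoding of the protocol (here the IP protocol number, 6 for TCP) is also an assumption.

```python
import math


def h_time(v_i, v_next):
    """Cosine similarity between the traffic vectors of adjacent time points."""
    dot = sum(a * b for a, b in zip(v_i, v_next))
    norm = math.sqrt(sum(a * a for a in v_i)) * math.sqrt(sum(b * b for b in v_next))
    return dot / norm if norm else 0.0


# Vectors are (protocol number, number of packets, average packet size).
steady = (6, 100, 500)     # ordinary window
burst = (6, 10000, 60)     # sudden flood of small packets in the next window

same = h_time(steady, steady)
changed = h_time(steady, burst)
```

A value near 1 means adjacent windows look alike; a sudden burst of small packets drives the value down.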
2.4 Abnormal Traffic Detection Method
 Because most packets are generated by normal users and have similar statistical characteristics, most packets fall into clusters with a huge number of packets.
 Packets generated by abnormal behaviors, however, have special statistical characteristics, such as sharing the same destination IP address in a DDoS or port scan attack.
 Those packets fall into clusters with a small number of packets, or even clusters with only one packet, which are called isolated points.
 Based on the extracted features, we can employ a clustering method to group the streams into different clusters. By analyzing in depth the flows in the isolated points, we can find the abnormal behavior and its characteristics.
 The clustering method used in this paper is DBSCAN.
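The detection idea can be sketched as a post-processing step on cluster labels: keep the isolated points and the members of very small clusters for closer inspection. The `max_cluster_size` threshold and the use of -1 as the isolated-point label are assumptions for this example; the slides give no numeric rule for what counts as a small cluster.

```python
from collections import Counter


def suspicious_points(labels, max_cluster_size=5, noise_label=-1):
    """Return indices of isolated points and of members of small clusters.

    labels holds one cluster label per sample; noise_label marks isolated
    points. max_cluster_size is a hypothetical threshold.
    """
    sizes = Counter(label for label in labels if label != noise_label)
    return [
        index
        for index, label in enumerate(labels)
        if label == noise_label or sizes[label] <= max_cluster_size
    ]


# Ten samples in a large cluster, two in a small one, one isolated point.
labels = [0] * 10 + [1] * 2 + [-1]
flagged = suspicious_points(labels)
```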
3. DBSCAN CLUSTERING METHOD
3.1 Introduction for DBSCAN Algorithm
 First, we analyze the element distribution of some databases, shown in Fig. 2.
 The main idea of DBSCAN is that, for each point belonging to a cluster, the neighborhood of a given radius ε centered on that point must contain at least a minimum number of points, MinPts.
 The shape of a neighborhood is determined by the user-selected distance function between two points p and q, denoted dist(p, q).
 For example, with the Manhattan distance in two dimensions, the neighborhood is rectangular. The user can therefore select a distance function suited to their own requirements in order to improve the clustering results.
 Consider a set of points in some dataset to be clustered. For the purpose of DBSCAN clustering, the points are classified into core points, (density-)reachable points, and outliers, as follows:
 A point p is a core point if at least MinPts points are within distance ε of it (including p), where ε is the maximum radius of the neighborhood of p. Those points are said to be directly reachable from p. By definition, no points are directly reachable from a non-core point.
 A point q is reachable from p if there is a path $p_1, \dots, p_n$ with $p_1 = p$ and $p_n = q$, where each $p_{i+1}$ is directly reachable from $p_i$.
 All points not reachable from any other point are outliers, also called isolated points.
 If p is a core point, then it forms a cluster together with all points (core or non-core) that are reachable from it. Each cluster contains at least one core point; non-core points can be part of a cluster, but they form its "edge", since they cannot be used to reach more points.
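The three kinds of points can be told apart with a brute-force neighborhood check, as in the sketch below. It relies on the fact that a non-core point is reachable exactly when it lies within ε of some core point, because the last step of any path must start at a core point. The function name and the label strings are choices made for this example, and Euclidean distance is assumed.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' (reachable, non-core) or 'outlier'."""
    n = len(points)
    # Neighborhoods include the point itself, matching the definition above.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            kinds.append("border")
        else:
            kinds.append("outlier")
    return kinds


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
kinds = classify_points(points, eps=1.0, min_pts=3)
```

Here the four corner points are core points, (2, 1) sits on the edge of the cluster, and (10, 10) is an isolated point.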
3.2 Definition of Distance
 Based on the extracted traffic features, we obtain a vector that captures the traffic characteristics and behavior dynamics for abnormal detection.
 The vector is given in equation 6, where $(x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$ denote $H_{ip}$, $H_{port}$, $H_{tcp}$, $H_{small}$, and $H_{time}$, respectively.
$$Feature = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}) \tag{6}$$
 The distance between vectors i and j is defined in equation 7, where $x_{ik}$ and $x_{jk}$ are samples of the traffic characteristic vectors at different time windows.
$$d(i, j) = \left( \sum_{k=1}^{5} (x_{ik} - x_{jk})^2 \right)^{1/2} \tag{7}$$
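Equation 7 is the Euclidean distance over the five features. A minimal sketch, with made-up feature values chosen so the result is easy to check by hand:

```python
import math


def feature_distance(x_i, x_j):
    """Euclidean distance between two feature vectors, as in equation 7."""
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Order: (H_ip, H_port, H_tcp, H_small, H_time); the values are illustrative only.
normal_window = (1.0, 1.0, 0.9, 0.5, 1.0)
attack_window = (4.0, 5.0, 0.9, 0.5, 1.0)
d = feature_distance(normal_window, attack_window)
```

Only the first two features differ (by 3 and 4), so the distance is the square root of 9 + 16.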
3.3 Implementation of DBSCAN Algorithm
 The DBSCAN clustering process is divided into the following steps:
 Step 1: DBSCAN class discovery process.
 First, check the ε-neighborhood of each object $p_i$ in the set of data objects D. If the number of objects directly density-reachable from $p_i$ with respect to ε and MinPts is at least MinPts, $p_i$ is a core object.
 Then we create a new class with core object $p_i$ with respect to ε and MinPts. If the number of objects directly density-reachable from $p_i$ with respect to ε and MinPts is zero, that is, the ε-neighborhood of $p_i$ contains no other objects, $p_i$ is temporarily marked as a noise point. The procedure is described as follows:
[Figure: procedure for Step 1, the DBSCAN class discovery process]
 Step 2: Assigning data objects to the known classes and merging classes.
 After all core objects in the data set have been found, repeatedly find the objects that are directly density-reachable from these core objects.
 If a non-core object q is directly density-reachable from a core object $p_i$, then q belongs to the class with core object $p_i$.
 If an object $p_j$ that is density-reachable from a core object $p_i$ is also a core object, we merge the class with core object $p_i$ and the class with core object $p_j$.
 A core object can belong to only one class, but a non-core object can belong to several classes.
 When no new objects can be added to any class, the process ends.
 The procedure is described as follows:
[Figure: procedure for Step 2, assigning data objects to the known classes and merging classes]
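Since the procedure figures are not available, the two steps can be approximated by the usual queue-based formulation of DBSCAN, sketched below. This is not the authors' code: it uses a brute-force neighborhood search and Euclidean distance, and it assigns each non-core object to the first class that reaches it instead of letting it belong to several classes. Merging happens implicitly, because a core object found while expanding a class keeps extending that same class.

```python
import math
from collections import deque

NOISE = -1  # label for isolated points


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks isolated points."""
    n = len(points)

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # temporary: may later join a class as an edge point
            continue
        cluster_id += 1  # point i is a core object: start a new class
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise point joins as a non-core object
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core object: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.0, min_pts=3)
```

With these toy points the result is two classes and one isolated point at (50, 50).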
 Finally, note that the DBSCAN parameters ε and MinPts are set manually by the user, according to the specific conditions of the monitored network and management experience.
4. EXPERIMENT RESULT
4.1 Generation of Test Data
 In this paper, the test data comes from our laboratory, which is served by an H3C MSR50-60 router.
 The network behind the router is divided into four subnets with about two hundred hosts, including mail and web servers, and more than 200 users, including graduate students, doctoral students, and tutors.
 We collected data using Jpcapture from December 20 to 22, 2015.
 To obtain a clearer experimental effect, a DDoS attack was launched against a target machine in the laboratory between 10 PM and 11 PM on December 22.
4.2 Testing Result Analysis
 We obtained three classes and a small number of isolated points.
 The first class contains 2469 points, the second 518, and the third 29.
 We mainly focus on the isolated points and small clusters. Their statistical characteristics are shown in Table 1.
 From the time view, the third class is highly concentrated in time, and its data are highly similar and few in number, which matches the profile of an abnormal cluster.
 The time stamps of this cluster are consistent with the time at which we launched the DDoS attack, which indicates that the detection of the network attack is effective.
4.3 Comparison with the Results of K-means
 Before using the K-means clustering method, we need to specify the number of clusters. For a fair comparison with the DBSCAN clustering results, we set the number of clusters to 4, the same as the number of clusters obtained using DBSCAN. The statistical features of the clustering results are shown in Table 2.
 The table shows that the numbers of points in the clusters generated by the two methods are extremely different.
 Further analysis shows that, although K-means places the abnormal traffic in one cluster, that cluster contains as many as 854 points, about 28.0% of the total.
 Thus the false alarm rate is very high, and the result is of little use for finding abnormal traffic.
 Further experiments show that, however the parameters are set, the clusters obtained using K-means have no obvious characteristics, and abnormal data is often split across different clusters.
 Clearly, the K-means clustering method is not suitable for abnormal behavior detection.
 This is because K-means cannot recognize non-spherical clusters.
 In contrast, with suitable ε and MinPts, DBSCAN gives more suitable results. As a density-based clustering method, DBSCAN is robust to noise and can handle clusters of arbitrary shape and size, so it can find clusters that the K-means method cannot.
5. Conclusion
 We collected traffic data in a real network environment and, based on network operation and security management experience, proposed five traffic features that reflect the traffic characteristics.
 Based on the extracted features, we employ DBSCAN to group the traffic into different clusters. By analyzing the characteristics of the isolated points, we can detect abnormal behavior, because most abnormal behaviors have similar patterns.
 The experimental results show that the proposed method has higher detection accuracy than the traditional clustering method.
 Because the proposed method needs no training, it can detect unknown abnormal behaviors, which is especially important for network security monitoring.
Reviews
 Strong points
 The paper follows a reasonable logical organization.
 It provides many simple examples.
 It clearly lays out the steps of the work.
 The arguments and designs are straightforward.
 Weak points
 The related work is not clearly presented.
 Some concepts are introduced without explanation, and others are explained only later.
Thank you ☺

More Related Content

What's hot

Performance evaluation of proactive, reactive and hybrid routing protocols wi...
Performance evaluation of proactive, reactive and hybrid routing protocols wi...Performance evaluation of proactive, reactive and hybrid routing protocols wi...
Performance evaluation of proactive, reactive and hybrid routing protocols wi...
eSAT Journals
 
A novel approach of hybrid multipath routing protocol for manets using receiv...
A novel approach of hybrid multipath routing protocol for manets using receiv...A novel approach of hybrid multipath routing protocol for manets using receiv...
A novel approach of hybrid multipath routing protocol for manets using receiv...
eSAT Publishing House
 
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
ijp2p
 

What's hot (17)

Dynamic time warping and PIC 16F676 for control of devices
Dynamic time warping and PIC 16F676 for control of devicesDynamic time warping and PIC 16F676 for control of devices
Dynamic time warping and PIC 16F676 for control of devices
 
Labeled generalized stochastic petri net Based approach for web services Comp...
Labeled generalized stochastic petri net Based approach for web services Comp...Labeled generalized stochastic petri net Based approach for web services Comp...
Labeled generalized stochastic petri net Based approach for web services Comp...
 
Analyzing performance of zrp by varying node density and transmission range
Analyzing performance of zrp by varying node density and transmission rangeAnalyzing performance of zrp by varying node density and transmission range
Analyzing performance of zrp by varying node density and transmission range
 
Transport Layer
Transport LayerTransport Layer
Transport Layer
 
Multiple Downlink Fair Packet Scheduling Scheme in Wi-Max
Multiple Downlink Fair Packet Scheduling Scheme in Wi-MaxMultiple Downlink Fair Packet Scheduling Scheme in Wi-Max
Multiple Downlink Fair Packet Scheduling Scheme in Wi-Max
 
Optimized Fuzzy Routing for MANET
Optimized Fuzzy Routing for MANETOptimized Fuzzy Routing for MANET
Optimized Fuzzy Routing for MANET
 
Performance evaluation of proactive, reactive and hybrid routing protocols wi...
Performance evaluation of proactive, reactive and hybrid routing protocols wi...Performance evaluation of proactive, reactive and hybrid routing protocols wi...
Performance evaluation of proactive, reactive and hybrid routing protocols wi...
 
Performance evaluation of proactive, reactive and
Performance evaluation of proactive, reactive andPerformance evaluation of proactive, reactive and
Performance evaluation of proactive, reactive and
 
Determining the Optimum Number of Paths for Realization of Multi-path Routing...
Determining the Optimum Number of Paths for Realization of Multi-path Routing...Determining the Optimum Number of Paths for Realization of Multi-path Routing...
Determining the Optimum Number of Paths for Realization of Multi-path Routing...
 
Extensive Reviews of OSPF and EIGRP Routing Protocols based on Route Summariz...
Extensive Reviews of OSPF and EIGRP Routing Protocols based on Route Summariz...Extensive Reviews of OSPF and EIGRP Routing Protocols based on Route Summariz...
Extensive Reviews of OSPF and EIGRP Routing Protocols based on Route Summariz...
 
A QOS BASED LOAD BALANCED SWITCH
A QOS BASED LOAD BALANCED SWITCHA QOS BASED LOAD BALANCED SWITCH
A QOS BASED LOAD BALANCED SWITCH
 
A novel approach of hybrid multipath routing protocol for manets using receiv...
A novel approach of hybrid multipath routing protocol for manets using receiv...A novel approach of hybrid multipath routing protocol for manets using receiv...
A novel approach of hybrid multipath routing protocol for manets using receiv...
 
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
PERFORMANCE ANALYSIS AND COMPARISON OF IMPROVED DSR WITH DSR, AODV AND DSDV R...
 
A QUANTITATIVE ANALYSIS OF HANDOVER TIME AT MAC LAYER FOR WIRELESS MOBILE NET...
A QUANTITATIVE ANALYSIS OF HANDOVER TIME AT MAC LAYER FOR WIRELESS MOBILE NET...A QUANTITATIVE ANALYSIS OF HANDOVER TIME AT MAC LAYER FOR WIRELESS MOBILE NET...
A QUANTITATIVE ANALYSIS OF HANDOVER TIME AT MAC LAYER FOR WIRELESS MOBILE NET...
 
E04402030037
E04402030037E04402030037
E04402030037
 
A018120105
A018120105A018120105
A018120105
 
Dynamic MPLS with Feedback
Dynamic MPLS with FeedbackDynamic MPLS with Feedback
Dynamic MPLS with Feedback
 

Similar to Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Detection

Discriminators for use in flow-based classification
Discriminators for use in flow-based classificationDiscriminators for use in flow-based classification
Discriminators for use in flow-based classification
Denis Zuev
 
Traffic Engineering in Software Defined Networking SDN
Traffic Engineering in Software Defined Networking SDNTraffic Engineering in Software Defined Networking SDN
Traffic Engineering in Software Defined Networking SDN
ijtsrd
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine Kang
Eugine Kang
 
OSPF (Open Shortest Path First) Case Study: Anil Nembang
OSPF (Open Shortest Path First) Case Study: Anil NembangOSPF (Open Shortest Path First) Case Study: Anil Nembang
OSPF (Open Shortest Path First) Case Study: Anil Nembang
Anil Nembang
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
IJCNCJournal
 

Similar to Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Detection (20)

Discriminators for use in flow-based classification
Discriminators for use in flow-based classificationDiscriminators for use in flow-based classification
Discriminators for use in flow-based classification
 
Itcm a real time internet traffic classifier monitor
Itcm a real time internet traffic classifier monitorItcm a real time internet traffic classifier monitor
Itcm a real time internet traffic classifier monitor
 
DIFFERENTIATED SERVICES ENSURING QOS ON INTERNET
DIFFERENTIATED SERVICES ENSURING QOS ON INTERNETDIFFERENTIATED SERVICES ENSURING QOS ON INTERNET
DIFFERENTIATED SERVICES ENSURING QOS ON INTERNET
 
Clustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow DetectionClustering-based Analysis for Heavy-Hitter Flow Detection
Clustering-based Analysis for Heavy-Hitter Flow Detection
 
Design, implementation and evaluation of icmp based available network bandwid...
Design, implementation and evaluation of icmp based available network bandwid...Design, implementation and evaluation of icmp based available network bandwid...
Design, implementation and evaluation of icmp based available network bandwid...
 
Chapter 3. sensors in the network domain
Chapter 3. sensors in the network domainChapter 3. sensors in the network domain
Chapter 3. sensors in the network domain
 
Traffic Engineering in Software Defined Networking SDN
Traffic Engineering in Software Defined Networking SDNTraffic Engineering in Software Defined Networking SDN
Traffic Engineering in Software Defined Networking SDN
 
IRJET- Performance Improvement of Wireless Network using Modern Simulation Tools
IRJET- Performance Improvement of Wireless Network using Modern Simulation ToolsIRJET- Performance Improvement of Wireless Network using Modern Simulation Tools
IRJET- Performance Improvement of Wireless Network using Modern Simulation Tools
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine Kang
 
New Scheme for Secured Routing in MANET
New Scheme for Secured Routing in MANET New Scheme for Secured Routing in MANET
New Scheme for Secured Routing in MANET
 
OSPF (Open Shortest Path First) Case Study: Anil Nembang
OSPF (Open Shortest Path First) Case Study: Anil NembangOSPF (Open Shortest Path First) Case Study: Anil Nembang
OSPF (Open Shortest Path First) Case Study: Anil Nembang
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
Dynamic bandwidth allocation scheme in lr pon with performance modelling and ...
 
Modified Headfirst Sliding Routing: A Time-Based Routing Scheme for Bus-Nochy...
Modified Headfirst Sliding Routing: A Time-Based Routing Scheme for Bus-Nochy...Modified Headfirst Sliding Routing: A Time-Based Routing Scheme for Bus-Nochy...
Modified Headfirst Sliding Routing: A Time-Based Routing Scheme for Bus-Nochy...
 
IMPACT OF CONTENTION WINDOW ON CONGESTION CONTROL ALGORITHMS FOR WIRELESS ADH...
IMPACT OF CONTENTION WINDOW ON CONGESTION CONTROL ALGORITHMS FOR WIRELESS ADH...IMPACT OF CONTENTION WINDOW ON CONGESTION CONTROL ALGORITHMS FOR WIRELESS ADH...
IMPACT OF CONTENTION WINDOW ON CONGESTION CONTROL ALGORITHMS FOR WIRELESS ADH...
 
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
IRJET- Survey on Implementation of Graph Theory in Routing Protocols of Wired...
 
ANALYZING NETWORK PERFORMANCE PARAMETERS USING WIRESHARK
ANALYZING NETWORK PERFORMANCE PARAMETERS USING WIRESHARKANALYZING NETWORK PERFORMANCE PARAMETERS USING WIRESHARK
ANALYZING NETWORK PERFORMANCE PARAMETERS USING WIRESHARK
 
cscn1819.pdf
cscn1819.pdfcscn1819.pdf
cscn1819.pdf
 
An Approach to Detect Packets Using Packet Sniffing
An Approach to Detect Packets Using Packet SniffingAn Approach to Detect Packets Using Packet Sniffing
An Approach to Detect Packets Using Packet Sniffing
 
Prediction System for Reducing the Cloud Bandwidth and Cost
Prediction System for Reducing the Cloud Bandwidth and CostPrediction System for Reducing the Cloud Bandwidth and Cost
Prediction System for Reducing the Cloud Bandwidth and Cost
 

Recently uploaded

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 

Recently uploaded (20)

How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 

Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Detection

  • 1. TRAFFIC FEATURES EXTRACTION AND CLUSTERING ANALYSIS FOR ABNORMAL BEHAVIOR DETECTION. Presented by: Areej Qasrawi. Jian Zhang, Yan Tong, Tao Qin
  • 2. Outline: Abstract; 1. Introduction; 2. Traffic Collection and Feature Extraction; 3. DBSCAN Clustering Method; 4. Experiment Result; 5. Conclusion; 6. Review
  • 3. Abstract: As network bandwidth increases, users generate massive amounts of traffic data. How to extract effective traffic features and use them to detect abnormal behavior is a hot and difficult problem in network security monitoring.
  • 4. Abstract: In this paper, several traffic features are proposed to capture traffic characteristics, and the DBSCAN method is then employed to mine abnormal traffic based on those features. We extract four traffic features, each computed over a specific time window: the ratio between the number of source IP addresses and the number of destination IP addresses; the ratio between the number of source ports and the number of destination ports; the ratio between the number of TCP packets and the total number of packets; and the ratio between the number of small packets and the total number of packets.
  • 5. Abstract: Based on the extracted features, we employ the DBSCAN method to group the network traffic into clusters. Traffic packets in different clusters have different statistical characteristics, and the isolated points are used to detect abnormal traffic. Results on the collected traffic show that the proposed statistical features can capture traffic characteristics and that the clustering method can classify abnormal traffic packets into a distinct cluster.
  • 6. Introduction: Capturing and analyzing abnormal traffic is one of the most critical issues in keeping a network under control. Abnormal behaviors are behaviors caused by attacks that affect the normal operation of the Internet, such as worms and DDoS. These attacks generate a huge number of packets and change the flow patterns. Abnormal behavior detection is the task of discovering those attacks by analyzing the flow patterns with methods such as DPI and statistical methods. In early networks, the total number of bytes transmitted per unit time was usually used to reflect the characteristics of the network. In today's networks, however, bandwidth and the number of users are growing quickly, and effective abnormal behavior detection is becoming more and more difficult.
  • 7. 2. Traffic Collection And Feature Extraction, 2.1 Framework of the Proposed Methods: The framework for degree calculation, characteristics measurement, and abnormal users' location extraction is divided into four steps, as shown in Fig. 1. Step 1: Traffic data collection. The network traffic data is collected at the ingress router of our lab using CoralReef, developed by CAIDA. The flow model used is netstream, which is similar to NetFlow. Step 2: Traffic feature generation and analysis. Traffic packets are processed according to the definitions of the traffic features, and the features are then extracted. Step 3: Clustering analysis. Based on the extracted features, we employ the DBSCAN method to group the packets into clusters. Step 4: Abnormal detection. We detect abnormal behaviors by analyzing the isolated points in the clustering results.
  • 8. Fig. 1 Framework of measurement methods
  • 9. 2.2 Introduction to Netstream: Netstream was proposed by Huawei in China. A netstream flow is defined as a one-way network flow whose packets share the same properties: source IP address, source port, destination IP address, destination port, and protocol. Packets with the same properties are aggregated into one flow.
  • 10. 2.2 Introduction to Netstream: We established a network traffic monitoring platform in our lab. The bandwidth of the campus network is 1 Gbps. We set up a mirror at the ingress router to collect data, and the traffic data is dumped as PCAP files. The collection period lasted nearly 100 hours, from Nov. 20, 2015 to Nov. 25, 2015. To verify the correctness of our methods, we also generated some attacks, including DDoS attacks and random brute-force incidents. Based on the netstream model, we aggregate the captured packets into netstream flows.
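The aggregation rule described for netstream (packets with the same source IP, source port, destination IP, destination port, and protocol form one one-way flow) can be sketched roughly as follows. This is only an illustration: the `Packet` record and the `aggregate_flows` helper are hypothetical names, not part of Netstream or of the authors' tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are assumptions for this sketch."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int  # packet size in bytes


def aggregate_flows(packets):
    """Group packets by their 5-tuple and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # The key is directional, so A->B and B->A are two separate flows.
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)
```

Because the key keeps source and destination in order, a request and its reply end up in two different flows, which matches the one-way definition on the slide.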
  • 11. 2.3 Traffic Feature Extraction: To capture the characteristics of traffic, we must develop suitable traffic features. Abnormal behaviors usually cause changes in the distributions of IP addresses and ports, so in this paper we propose several traffic features, defined as follows. Definition 1: Feature $H_{ip}$ is defined in equation 1, where $D_{sip}$ and $D_{dip}$ are the numbers of distinct source IP addresses and distinct destination IP addresses in a specific time window: $H_{ip} = D_{sip} / D_{dip}$ (1). Definition 2: Feature $H_{port}$ is defined in equation 2, where $D_{spt}$ and $D_{dpt}$ are the numbers of distinct source ports and distinct destination ports in a specific time window: $H_{port} = D_{spt} / D_{dpt}$ (2).
  • 12. 2.3 Traffic Feature Extraction: Definition 3: Feature $H_{tcp}$, defined in equation 3, captures the proportion of TCP packets, where $P_{tcp}$ and $P_{ip}$ are the numbers of TCP packets and IP packets in a specific time window: $H_{tcp} = P_{tcp} / P_{ip}$ (3). Definition 4: Feature $H_{small}$, defined in equation 4, captures the proportion of small packets, where $P_{small}$ and $P_{all}$ are the numbers of small packets and total packets in a specific time window: $H_{small} = P_{small} / P_{all}$ (4).
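Definitions 1 to 4 are plain ratios over one time window, so they can be sketched in a few lines. The sketch below makes two assumptions that the slides do not state: every captured packet is an IP packet (so the IP packet count equals the total), and a "small" packet is one of at most 64 bytes. The `Packet` record and `window_features` are hypothetical names.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are assumptions for this sketch."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"
    size: int      # packet size in bytes


# Assumption: the slides do not say what counts as a small packet.
SMALL_PACKET_BYTES = 64


def window_features(packets, small_bytes=SMALL_PACKET_BYTES):
    """Compute H_ip, H_port, H_tcp and H_small for the packets of one time window."""
    if not packets:
        raise ValueError("empty time window")
    d_sip = len({p.src_ip for p in packets})    # distinct source IPs
    d_dip = len({p.dst_ip for p in packets})    # distinct destination IPs
    d_spt = len({p.src_port for p in packets})  # distinct source ports
    d_dpt = len({p.dst_port for p in packets})  # distinct destination ports
    p_all = len(packets)                        # assumed equal to the IP packet count
    p_tcp = sum(1 for p in packets if p.protocol == "TCP")
    p_small = sum(1 for p in packets if p.size <= small_bytes)
    return {
        "H_ip": d_sip / d_dip,
        "H_port": d_spt / d_dpt,
        "H_tcp": p_tcp / p_all,
        "H_small": p_small / p_all,
    }
```

For instance, many sources hitting a single destination drive `H_ip` well above 1, which is the kind of shift in the IP distribution the features are meant to expose.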
  • 13. 2.3 Traffic Feature Extraction: Some attacks also generate a large number of packets within short time windows, and the above features lose their effectiveness in this situation. We therefore develop one more feature based on the number of packets in adjacent time windows, the packet size, and the similarity of protocol, defined in equation 5, where $V_i$ = (protocol, number of packets, average packet size) and $V_{i+1}$ = (protocol, number of packets, average packet size) describe the protocol, the number of packets, and the average packet size at the $i$-th and $(i+1)$-th time points, respectively: $H_{time} = \frac{V_i \cdot V_{i+1}}{\sqrt{V_i^2}\,\sqrt{V_{i+1}^2}}$ (5).
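Reading equation 5 as the cosine similarity between the vectors of two adjacent time points (an interpretation of the formula on the slide, not something the slides state in words), the feature can be sketched as below. The numeric encoding of the protocol component is an assumption; here it is simply the IP protocol number (6 for TCP, 17 for UDP), and `h_time` is a hypothetical name.

```python
import math


def h_time(v_i, v_next):
    """Cosine similarity between two (protocol, packet count, avg size) vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_next))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_next = math.sqrt(sum(b * b for b in v_next))
    if norm_i == 0 or norm_next == 0:
        raise ValueError("a zero vector has no direction to compare")
    return dot / (norm_i * norm_next)


# Hypothetical windows: a steady TCP window followed by a burst of tiny packets.
steady = (6, 100, 500.0)
burst = (6, 5000, 40.0)
similarity = h_time(steady, burst)
```

Values near 1 mean that adjacent windows look alike; a sudden burst of many small packets pulls the value down.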
  • 14. 2.4 Abnormal Traffic Detection Method: Because most packets are generated by normal users and have similar statistical characteristics, most packets fall into clusters containing a huge number of packets. Packets generated by abnormal behaviors, however, have distinctive statistical characteristics, such as sharing the same destination IP address in a DDoS or port scan attack. Those packets fall into clusters with a small number of packets, or even clusters with only one packet, which are called isolated points. Based on the extracted features, we can employ clustering methods to group the streams into clusters. Through deeper analysis of the flows in the isolated points, we can find the abnormal behavior and its characteristics. The clustering method used in this paper is DBSCAN.
  • 15. 3. DBSCAN CLUSTERING METHOD, 3.1 Introduction for DBSCAN Algorithm: First, we analyze the element distribution of some databases, shown in Fig. 2. The main idea of the DBSCAN method is that for a point belonging to a cluster, the region of a given radius ε centered on that point must contain at least a certain number, MinPts, of points. The shape of this neighborhood is determined by the distance function between two points p and q, chosen by the user and denoted dist(p, q). For example, with the Manhattan distance in two dimensions, the neighborhood is rectangular. The user can therefore select a distance function according to their own requirements in order to optimize the clustering results.
  • 16. 3.1 Introduction for DBSCAN Algorithm: Consider a set of points in some dataset to be clustered. For the purpose of DBSCAN clustering, the points are classified as core points, (density-)reachable points, and outliers, as follows. A point p is a core point if at least MinPts points are within distance ε of it (including p), where ε is the maximum radius of the neighborhood of p. Those points are said to be directly reachable from p. By definition, no points are directly reachable from a non-core point. A point q is reachable from p if there is a path p1, ..., pn with p1 = p and pn = q, where each pi+1 is directly reachable from pi. All points not reachable from any other point are outliers, also called isolated points. If p is a core point, it forms a cluster together with all points (core or non-core) that are reachable from it. Each cluster contains at least one core point; non-core points can be part of a cluster, but they form its "edge", since they cannot be used to reach more points.
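The three point types defined above can be told apart with a direct, brute-force check, sketched here with the Euclidean distance. The name `classify_points` and the label strings are hypothetical, and "border" is used for the non-core, reachable points that form a cluster's edge.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'outlier' following the definitions above."""
    # eps-neighborhood of every point (each point counts as its own neighbor).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            # Non-core but directly reachable from a core point: cluster edge.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
kinds = classify_points(pts, eps=1.5, min_pts=4)
```

A non-core point only needs one core point within ε to be reachable, because the last step of any path to it must start from a core point.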
  • 17. 3.2 Definition of Distance: Based on the extracted traffic features, we obtain a vector that captures the traffic characteristics and behavior dynamics for abnormal detection. The vector is given in equation 6, where $(x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$ denote $H_{ip}$, $H_{port}$, $H_{tcp}$, $H_{small}$, and $H_{time}$, respectively: $Feature = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$ (6). The distance between vectors $i$ and $j$ is defined in equation 7, where $x_{ik}$ and $x_{jk}$ are samples of the traffic characteristic vectors at different time windows: $d(i, j) = \left( \sum_{k=1}^{5} (x_{ik} - x_{jk})^2 \right)^{1/2}$ (7).
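Equation 7 is the ordinary Euclidean distance over the five features, which can be written out as a small helper. The function name `feature_distance` and the sample vectors are illustrative only and not taken from the paper's data.

```python
import math


def feature_distance(x_i, x_j):
    """Euclidean distance between two 5-dimensional traffic feature vectors."""
    if len(x_i) != 5 or len(x_j) != 5:
        raise ValueError("expected (H_ip, H_port, H_tcp, H_small, H_time)")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical feature vectors for two time windows.
window_a = (1.0, 1.2, 0.80, 0.30, 0.99)
window_b = (9.0, 1.1, 0.95, 0.90, 0.40)
d_ab = feature_distance(window_a, window_b)
```

Since the ratios are not bounded to a common range, a feature with a large spread (such as `H_ip` in this example) dominates the distance; whether the authors rescaled the features is not stated in the slides.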
  • 18. 3.3 Implementation of DBSCAN Algorithm: The DBSCAN clustering process is divided into the following steps. Step 1: Class discovery. First, check the ε-neighborhood of each object pi in the set of data objects D. If the ε-neighborhood of pi contains at least MinPts objects, pi is a core object, and we create a new class with core object pi with respect to ε and MinPts. If no objects are directly density-reachable from pi with respect to ε and MinPts, that is, the ε-neighborhood of pi contains no other objects, pi is temporarily marked as a noise point. The procedure is described as follows:
  • 19. Step 1: DBSCAN algorithm class discovery process.
  • 20. 3.3 Implementation of DBSCAN Algorithm: Step 2: Assigning data objects to the known classes and merging classes. After all core objects in the data set have been found, repeatedly find the objects that are directly density-reachable from these core objects. If a non-core object q is directly density-reachable from a core object pi, then q belongs to the class with core object pi. If an object pj that is density-reachable from a core object pi is also a core object, we merge the class with core object pi and the class with core object pj. A core object can belong to only one class, but a non-core object can belong to several classes. The process ends when no new objects can be added to any class. The procedure is described as follows:
  • 21. Step 2: The process by which DBSCAN assigns data objects to the known classes and merges classes.
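The two steps can be combined into one compact, brute-force sketch. It is a common textbook formulation rather than the authors' exact procedure: instead of building classes first and merging them afterwards, it grows each class from a core object until nothing more is reachable, which yields the same classes. A non-core object reachable from several classes is simply kept in the first class that reaches it. The names `dbscan` and `NOISE` are hypothetical.

```python
import math
from collections import deque

NOISE = -1  # label for isolated points


def dbscan(points, eps, min_pts):
    """Return one class label per point (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # temporary: may later become an edge point
            continue
        labels[i] = cluster  # i is a core object: start a new class
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point joins as an edge point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also core: keep growing the class
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
result = dbscan(pts, eps=1.5, min_pts=3)
```

The neighbor search here is quadratic in the number of points, which is fine for a demonstration but would need a spatial index for traffic collected over many hours.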
  • 22. 3.3 Implementation of DBSCAN Algorithm: Finally, note that the DBSCAN parameters ε and MinPts are set manually by the user according to the specific conditions of the monitored network and management experience.
  • 23. 4. EXPERIMENT RESULT, 4.1 The Generating of Test Data: In this paper, the test data comes from an H3C MSR50-60 router in our laboratory. The network behind the router is divided into four subnets. It has about two hundred hosts, including mail and web servers, and more than 200 users, including graduate students, doctoral students, and tutors. We collected data using Jpcapture from December 20 to 22, 2015. To obtain better experimental results, a DDoS attack was launched against a target machine in the laboratory between 10 PM and 11 PM on December 22.
  • 24. 4.2 Testing Result Analysis: We obtained three classes and a small number of isolated points. The first class contains 2469 points, the second 518, and the third 29. We mainly focus on the isolated points and small clusters, whose statistical characteristics are shown in Table 1. In terms of time, the third class is highly concentrated, and its data is highly similar and small in volume, which is typical of an abnormal cluster. The time stamps of this cluster are consistent with the time at which we launched the DDoS attack, which indicates that the detection of the network attack is effective.
  • 25. 4.3 Compare with the Results of Kmeans: Before using the Kmeans clustering method, we need to specify the number of clusters. For a better comparison with the DBSCAN clustering results, we set the number of clusters to 4, the same as the number of clusters obtained using DBSCAN. The statistical features of the clustering results are shown in Table 2. The table shows that the numbers of points in the clusters generated by the two methods are extremely different. Further analysis found that although Kmeans classifies the abnormal traffic into one cluster, that cluster contains as many as 854 points, about 28.0% of the total. The false alarm rate is therefore very high, and the result is of little use for finding abnormal traffic. Further experiments found that no matter how the parameters are set, the clusters obtained using Kmeans have no obvious characteristics, and abnormal data is often classified into different clusters.
  • 26. 4.3 Compare with the Results of Kmeans
  • 27. 4.3 Compare with the Results of Kmeans: Clearly, the Kmeans clustering method is not suitable for abnormal behavior detection. This is because Kmeans can recognize only roughly spherical clusters. In contrast, if we set suitable ε and MinPts for DBSCAN, we get more suitable results. This shows that density-based clustering with DBSCAN performs well: it is robust to noise and can handle clusters of arbitrary shape and size, so it can find clusters that Kmeans cannot.
  • 28. 5. Conclusion: We collected traffic data in a real network environment and, based on network operation and security management experience, proposed five traffic features that reflect traffic characteristics. Based on the extracted features, we employ DBSCAN to group the traffic into clusters. By analyzing the characteristics of the isolated points, we can detect abnormal behavior, because most abnormal behaviors have similar patterns. The experimental results show that the proposed method has higher detection accuracy than the traditional clustering method. Because the proposed method does not need training, it can also detect unknown abnormal behaviors, which is especially important for network security monitoring.
  • 29. Reviews: Strong points: The paper follows a reasonable logical organization, provides many simple examples, and clearly lays out the steps of the work. The arguments and designs are straightforward. Weak points: The related work is not clearly presented. The paper introduces some concepts and presents its work without explaining them, or explains them only later.