Anomaly Detection in Traffic Monitoring - Presentation Transcript
Anomaly Detection The state of the art Ignasi Paredes Oliva [email_address]
Outline
What is an anomaly?
Why anomaly detection?
Known anomalies
Present
Future work
Conclusions
What is an anomaly?
An attack (malicious anomaly)
DoS
Worms
etc.
Unexpected behaviour (unintentional anomaly)
A malfunctioning router.
Hardware damaged by an accident or natural event (such as a fire).
etc.
Internet
It changes very fast.
The number of threats grows and diversifies.
Networks are compromised every day by new kinds of attacks.
Under threat: network performance and security.
What security mechanisms do we have to fight against these threats?
Antivirus, firewalls, signature-based IDS.
They are not enough to provide full protection.
Why Anomaly Detection?
Drawbacks of firewalls, antivirus and signature-based IDS:
They only catch events that they have been told to look for.
Anything outside the list will not be recognised.
Solution
Anomaly Detection System (ADS): everything (known or unknown) that differs from the normal profile is flagged as anomalous.
Final goal
Firewalls + antivirus + IDS + ADS working together to achieve a robust defence.
Why Anomaly Detection? (II)
Known anomalies
Alpha
DoS and DDoS
Flash Crowd
Scan
Worm
Outage
Point to Multipoint
Ingress Shift
Alpha anomaly
Description
High rate point to point transfer.
Features
Spike in traffic between a single pair of IP addresses.
Short duration.
Example: FTP transfer.
DoS/DDoS anomaly
Description
An attempt to make a network resource unavailable so that authorized users cannot access it.
Features
Spike in traffic to a dominant destination IP.
Flash Crowd anomaly
Description
Unusually large demand for a resource/service.
Features
Spike in traffic to a dominant destination IP and port.
Scan anomaly
Description
Looking for a vulnerable port (port scan) or scanning the network for a target port (network scan).
Features
Spike in traffic from a dominant source IP.
[Figure: a scanner asking many targets whether port X is open.]
Worm anomaly
Description
Code that spreads across a network by exploiting security flaws.
Features
Spike in traffic with a dominant port.
Outage anomaly
Description
Decrease in traffic exchanged between an OD pair.
OD: O=origin (ingress router) and D=destination (egress router)
Change
Point to Multipoint anomaly
Description
Distribution of content from one server to many users.
Features
Spike in traffic from a dominant source IP to numerous destinations on the same (well-known) port.
Ingress Shift anomaly
Description
Customer shifts traffic from one ingress router to another one.
Features
Decrease in traffic in one OD flow and spike in another one.
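The features listed for each known anomaly can be read as a small lookup table. The sketch below encodes only what the slides state; the feature labels ("src_ip", "dst_ip", "port") and the function name are chosen for illustration, and outage and ingress shift are left out because they are described by volume changes between OD flows rather than by a dominant feature.

```python
# Dominant traffic features per anomaly type, transcribed from the slides above.
# The labels "src_ip", "dst_ip" and "port" are names chosen for this sketch.
SIGNATURES = {
    "alpha": frozenset({"src_ip", "dst_ip"}),        # single pair of IP addresses
    "DoS/DDoS": frozenset({"dst_ip"}),               # dominant destination IP
    "flash crowd": frozenset({"dst_ip", "port"}),    # dominant destination IP and port
    "scan": frozenset({"src_ip"}),                   # dominant source IP
    "worm": frozenset({"port"}),                     # dominant port
    "point to multipoint": frozenset({"src_ip", "port"}),  # one source, same port
}


def candidate_types(dominant_features):
    """Return the anomaly types whose listed features match exactly."""
    observed = frozenset(dominant_features)
    return sorted(name for name, sig in SIGNATURES.items() if sig == observed)


print(candidate_types({"dst_ip", "port"}))
```

An exact match is a deliberately strict choice: real traffic rarely matches one signature cleanly, which is why the classification methods later in the talk are needed.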
Present
Anomaly detection (AD)
Threshold-based
Profile-based
Subspace-based
Wavelet-based
Anomaly classification (AC)
Subspace-based
Clustering
Threshold-based AD
Description
Upper and lower traffic thresholds are set for each time interval considered (day and night, workday and weekend, ...).
Metrics: bits/s, packets/s and flows/s.
How?
An anomaly is flagged when a value is higher than the upper threshold.
An anomaly is flagged when a value is lower than the lower threshold.
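A minimal sketch of the threshold check, assuming one band per (day type, period) interval. The threshold values and the names Band, THRESHOLDS and is_anomalous are hypothetical; in practice the bands would be derived from past traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    lower: float
    upper: float


# Hypothetical thresholds in packets/s for each time interval considered.
THRESHOLDS = {
    ("workday", "day"): Band(2_000, 50_000),
    ("workday", "night"): Band(200, 10_000),
    ("weekend", "day"): Band(500, 20_000),
    ("weekend", "night"): Band(100, 5_000),
}


def is_anomalous(value: float, day_type: str, period: str) -> bool:
    """Flag a measurement that leaves the band of its time interval."""
    band = THRESHOLDS[(day_type, period)]
    return value > band.upper or value < band.lower


print(is_anomalous(60_000, "workday", "day"))
```

The same check would be repeated per metric (bits/s, packets/s, flows/s), each with its own table of bands.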
Profile-based AD
Time Series
A time series is a sequence of data points measured at successive times, spaced at (often uniform) time intervals.
Time series prediction: use of a model to predict future events based on known past events.
When applying time series for forecasting, we implicitly assume that there is an underlying model which governs the development of the system.
Method
There is a statistical definition of normal activity.
Future values are predicted using the past ones.
If the actual traffic measurement differs too much from its predicted value, it is flagged as an anomaly.
Profile-based AD
Profile-based
Exponential smoothing (ES)
The Holt-Winters forecasting algorithm
Exponential Smoothing
The next value is predicted given the current value and the current prediction.
$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t$, where $y_t$ is the current value and $\hat{y}_t$ is the current prediction.
$\alpha$ is the model parameter ($0 < \alpha < 1$).
The current value is the most informative for the next one, and the importance of older observations decays exponentially.
Main drawback
It assumes a steady evolution, so it will never foresee a turning point.
It does not account for seasonality.
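The exponential smoothing recursion can be sketched in a few lines. Seeding the first prediction with the first observation is an assumption of this sketch (the slides do not say how to start), and the fixed tolerance used to decide "too much" deviation is a placeholder.

```python
def exponential_smoothing(values, alpha):
    """One-step-ahead predictions; predictions[i] is the forecast for values[i]."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not values:
        return []
    predictions = [values[0]]  # assumption: seed with the first observation
    for y in values[:-1]:
        # next prediction = alpha * current value + (1 - alpha) * current prediction
        predictions.append(alpha * y + (1 - alpha) * predictions[-1])
    return predictions


def flag_deviations(values, alpha, tolerance):
    """Indices where the observation differs from its prediction by more than tolerance."""
    predictions = exponential_smoothing(values, alpha)
    return [i for i, (y, p) in enumerate(zip(values, predictions))
            if abs(y - p) > tolerance]


print(flag_deviations([10, 10, 10, 20], alpha=0.5, tolerance=5))
```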
The Holt-Winters Forecasting Algorithm
An observed time series can be decomposed into three components: a baseline, a linear trend and a seasonal effect.
ES is used to update each one ($y_t$ is the observed value and $m$ is the length of the seasonal period):
Baseline: $a_t = k (y_t - c_{t-m}) + (1 - k)(a_{t-1} + b_{t-1})$
Linear trend: $b_t = q (a_t - a_{t-1}) + (1 - q) b_{t-1}$
Seasonal effect: $c_t = r (y_t - a_t) + (1 - r) c_{t-m}$
$k$, $q$ and $r$ are the model parameters.
Larger values: fast adaptation to recent changes.
Smaller values: slow adaptation; more weight on the past history.
Seasonality is modelled explicitly, unlike in plain ES.
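A sketch of the three Holt-Winters updates, assuming the usual additive forecast (baseline + trend + seasonal effect of the previous cycle). The initialisation from the first season (mean as baseline, zero trend) is a simple choice made for this sketch, not something the slides specify.

```python
def holt_winters_forecasts(values, m, k, q, r):
    """One-step-ahead forecasts; the first m entries are None (initialisation)."""
    if len(values) <= m:
        raise ValueError("need more than one seasonal period of data")
    # Assumed initialisation: baseline = mean of the first season, no trend,
    # seasonal effect = deviation of each point of that season from the mean.
    a = sum(values[:m]) / m
    b = 0.0
    c = [y - a for y in values[:m]]  # c[t % m] holds the seasonal effect c_{t-m}
    forecasts = [None] * m
    for t in range(m, len(values)):
        season = c[t % m]
        forecasts.append(a + b + season)  # forecast made before seeing values[t]
        y = values[t]
        a_prev = a
        a = k * (y - season) + (1 - k) * (a_prev + b)   # baseline
        b = q * (a - a_prev) + (1 - q) * b              # linear trend
        c[t % m] = r * (y - a) + (1 - r) * season       # seasonal effect
    return forecasts


series = [1, 3, 1, 3, 1, 3, 1, 3]
print(holt_winters_forecasts(series, m=2, k=0.5, q=0.5, r=0.5))
```

On a perfectly periodic series such as the one above, the forecasts reproduce the observations, which is the behaviour plain exponential smoothing cannot achieve.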
Detecting the anomaly… (defining “too much” deviation)
For each predicted value of the time series there is a confidence band.
If the observed value falls outside its confidence band, it is flagged as an anomaly.
Drawback: high number of false positives.
Remedy: use a moving window with a fixed number of observations.
A maximum number of violations (threshold) is defined for the window.
If more values than the threshold fall outside their bands, an anomaly is flagged.
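The moving-window rule can be sketched as follows. A constant band width around each prediction is an assumption made to keep the example short; the function and parameter names are illustrative.

```python
from collections import deque


def windowed_alarms(observed, predicted, band_width, window, max_violations):
    """Indices where more than max_violations of the last `window`
    observations fell outside predicted +/- band_width."""
    recent = deque(maxlen=window)  # True for each recent confidence-band violation
    alarms = []
    for t, (y, y_hat) in enumerate(zip(observed, predicted)):
        recent.append(abs(y - y_hat) > band_width)
        if sum(recent) > max_violations:
            alarms.append(t)
    return alarms


observed = [10, 30, 10, 10, 30, 30, 30]
predicted = [10] * 7
print(windowed_alarms(observed, predicted, band_width=5, window=3, max_violations=1))
```

The isolated spike at index 1 violates its band but raises no alarm; only the sustained deviation at the end does, which is how the window reduces false positives.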
Profile-based AD
Subspace-based AD
Subspace-based
(“single-way”) Subspace Method
Multiway Subspace Method
Subspace method
Focus on OD flows instead of studying traffic on all links.
Volume anomalies produce sudden changes in an OD flow’s traffic.
Metrics: number of packets and number of bytes.
Problem: OD flows form a high-dimensional multivariate structure.
A lower-dimensional approximation is needed: Principal Component Analysis (PCA).
Subspace method (II) - PCA
Description
It works by examining the time series of traffic in all OD flows.
All their structure can be well captured by few dimensions.
How?
It transforms data onto a new set of axes (called principal components, PCs).
PCs are ordered by decreasing captured variance.
The first k PCs tend to capture the most significant and common periodic temporal trends.
Subspace method (SM) and PCA:
SM separates the PCs into two sets (normal and anomalous variations of traffic), which span the normal and anomalous subspaces.
The link traffic y is projected onto these subspaces.
$\tilde{y}$: residual vector that captures the unexplained variation in the metric used (obtained by projecting y onto the anomalous subspace).
If the norm of this vector is large, an anomaly is flagged.
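A minimal, dependency-free sketch of the idea: the top k principal components span the normal subspace, and the squared norm of what remains of each measurement after removing its normal part is the detection statistic. Power iteration with deflation stands in for a proper PCA routine, and all function names are illustrative; the decision threshold is left to the caller.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center_columns(rows):
    """Subtract each column's mean so that every flow has zero mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - mu for x, mu in zip(row, means)] for row in rows]


def principal_components(rows, k, iterations=300, seed=0):
    """Top-k principal axes of centred rows, by power iteration with deflation."""
    p = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iterations):
            w = [dot(cov_row, v) for cov_row in cov]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(cov_row, v) for cov_row in cov])
        components.append(v)
        # Deflate: remove the variance captured by this axis.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return components


def residual_energy(rows, k):
    """Squared norm of each time bin's projection onto the anomalous subspace."""
    centred = center_columns(rows)
    normal_axes = principal_components(centred, k)
    energies = []
    for row in centred:
        residual = list(row)
        for axis in normal_axes:
            score = dot(row, axis)
            residual = [x - score * a for x, a in zip(residual, axis)]
        energies.append(dot(residual, residual))
    return energies


# Three flows following one common pattern, with a spike in flow 0 at time bin 5.
pattern = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4]
traffic = [[s, 2 * s, 3 * s] for s in pattern]
traffic[5][0] += 8
energies = residual_energy(traffic, k=1)
print(max(range(len(energies)), key=energies.__getitem__))
```

In the example the common pattern is absorbed by the normal subspace, so the time bin with the injected spike has by far the largest residual energy.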
Subspace method (III)
Subspace method (IV)
Projection examples:
The traffic variations in the top figure (periodic and deterministic trends) are assigned to the normal subspace.
The projections in the bottom figure (spikes, anomalous behaviour) belong to the anomalous subspace.
Source: A. Lakhina, M. Crovella, and C. Diot. “Diagnosing Network-Wide Traffic Anomalies”. In ACM SIGCOMM, Portland, August 2004.
Multiway Subspace Method
Anomalies produce changes in traffic feature distributions.
Which features are used?
Source IP, destination IP, source port and destination port.
How can we study the distribution of a particular feature?
By measuring its dispersion through entropy.
Maximum concentration: entropy = 0.
Maximum dispersion: maximum entropy.
It allows detecting anomalies unseen by volume-based methods.
Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005.
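A short sketch of the entropy of one traffic feature, computed from the empirical distribution of its observed values (Shannon entropy in bits). The sample values are made up to mimic a port scan: one destination IP, many destination ports.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up packets of a port scan: a single target, a different port each time.
dst_ips = ["10.0.0.7"] * 8
dst_ports = [21, 22, 23, 25, 80, 110, 143, 443]
print(sample_entropy(dst_ips), sample_entropy(dst_ports))
```

The concentrated feature (destination IP) has entropy 0, while the dispersed one (destination port) reaches the maximum for eight values, 3 bits. The volume of such a scan can be tiny, which is why it may go unseen by volume-based methods.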
Multiway Subspace Method (II)
What is analyzed?
Multiple traffic features and flows.
How is it analyzed?
One matrix for each feature.
H(t,p,k) = entropy value at time t for OD flow p, of the traffic feature k.
How can the subspace method be used?
Problem: it only works for single-way data.
Solution: “unfold” the multiway matrix of the figure into a single, large matrix.
Now the subspace method can be used.
Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005
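One way to picture the unfolding step, assuming the per-feature matrices (time bins as rows, OD flows as columns) are simply placed side by side. The layout and the function name are assumptions of this sketch.

```python
def unfold(feature_matrices):
    """Concatenate k matrices of shape t x p side by side into one t x (p*k) matrix."""
    t = len(feature_matrices[0])
    if any(len(matrix) != t for matrix in feature_matrices):
        raise ValueError("all feature matrices need the same number of time bins")
    return [[x for matrix in feature_matrices for x in matrix[row]]
            for row in range(t)]


# Two time bins, two OD flows, two features (toy entropy values).
h_src_ip = [[1, 2], [3, 4]]
h_dst_ip = [[5, 6], [7, 8]]
print(unfold([h_src_ip, h_dst_ip]))
```

Each row of the result describes one time bin across all flows and features, so it can be fed to the single-way subspace method unchanged.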
Wavelet-based AD
What is a wavelet transform?
A given signal is divided into different frequency components.
How is it used to detect anomalies?
Each component is used to look for anomalies whose duration matches its scale.
Low frequency: very sparse information, long-duration anomalies.
L-part: isolates very long-duration anomalies (several days).
Source: P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies” In Internet Measurement Workshop, Marseille, November 2002.
Wavelet-based AD (III)
The Deviation Score method
Compute the local variability of the H- and M-parts by calculating the variance of the data falling within a moving window of specified size.
V-part = weighted sum of the obtained values.
Apply thresholding to the V-signal.
Source: P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies”. In Internet Measurement Workshop, Marseille, November 2002.
Interpreting the results…
Score ≥ 2.0: “high-confidence”
Score < 1.25: “low-confidence”
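A rough sketch of the three deviation-score steps. It assumes the H- and M-parts have already been extracted by the wavelet filter and scaled so that the thresholds above are meaningful; the equal weights, the trailing window and the label for scores between the two thresholds are placeholders, not values from the paper.

```python
from statistics import pvariance


def local_variance(signal, window):
    """Variance of the samples in a trailing window ending at each position."""
    return [pvariance(signal[max(0, i - window + 1): i + 1])
            for i in range(len(signal))]


def deviation_scores(h_part, m_part, window, h_weight=0.5, m_weight=0.5):
    """V-signal: weighted sum of the local variances of the H- and M-parts."""
    vh = local_variance(h_part, window)
    vm = local_variance(m_part, window)
    return [h_weight * a + m_weight * b for a, b in zip(vh, vm)]


def confidence(score):
    if score >= 2.0:
        return "high-confidence"
    if score < 1.25:
        return "low-confidence"
    return "intermediate"  # the slides do not name this range


scores = deviation_scores([0, 0, 0, 4, 0, 0], [0] * 6, window=2)
print([confidence(s) for s in scores])
```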
Anomaly Classification
Why?
AD alone is not enough. To mitigate an anomaly, we have to know what happened.
Mine patterns from anomaly data to gain better insight into the nature of the anomalies that have been detected.
What do we want?
Given a set of candidate anomalies, we would like to find the true anomaly type.
Subspace-based AC
AD = displacement of the state vector y away from the normal subspace.
AC = study of the direction of this displacement.
Select the anomaly that best describes this deviation.
It has to explain the largest amount of residual traffic.
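One simple reading of this selection rule, assuming each candidate anomaly is represented by a direction vector in the same space as the residual: pick the candidate whose direction accounts for the most residual energy. The candidate names and vectors below are hypothetical.

```python
def explained_energy(residual, direction):
    """Energy of the residual along one candidate direction."""
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        return 0.0
    projection = sum(r * d for r, d in zip(residual, direction))
    return projection * projection / norm_sq


def best_hypothesis(residual, candidates):
    """Name of the candidate that explains the largest amount of residual."""
    return max(candidates,
               key=lambda name: explained_energy(residual, candidates[name]))


# Hypothetical candidates: an anomaly confined to one of three OD flows.
candidates = {
    "flow A": [1, 0, 0],
    "flow B": [0, 1, 0],
    "flow C": [0, 0, 1],
}
print(best_hypothesis([0.1, 5.0, 0.2], candidates))
```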
AC: Clustering
Unsupervised method that groups similar anomalies together into clusters.
Two types of algorithms:
Partitional: k-means
It works by constructing new partitions, associating each item with the closest centroid (mean point of each set).
Hierarchical: agglomerative hierarchical clustering
It works by merging clusters iteratively (at the beginning there is one cluster for each item).
(this point follows the multiway subspace method described earlier)
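A compact sketch of k-means as described above. Taking the first k points as initial centroids is an assumption made so that the example is deterministic; real implementations usually initialise more carefully.

```python
import math


def kmeans(points, k, iterations=100):
    """Return (assignment, centroids); assignment[i] is the cluster of points[i]."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = None
    for _ in range(iterations):
        # Associate each item with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # partitions no longer change
        assignment = new_assignment
        # Recompute each centroid as the mean point of its set.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
print(kmeans(points, k=2)[0])
```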
AC: Clustering
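Each anomaly can be thought of as a point in a four-dimensional space, with one entropy coordinate per traffic feature (source IP, destination IP, source port and destination port).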
Focus on the relationship between anomalies.
Similar kinds of anomalies will be close to each other in the entropy space.
Method:
Cluster anomalies (automatic).
Find the correspondence between clusters and high level anomaly types (manual).
AC: Clustering (II)
The distinct separation among these three types of known anomalies suggests that it may be possible to divide this set of anomalies into groups automatically.
How? E.g., with agglomerative hierarchical clustering.
Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005.
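A small sketch of agglomerative hierarchical clustering on points in a four-dimensional space. Average linkage and stopping at a fixed number of clusters are choices made for this sketch, and the coordinates are invented.

```python
import math
from itertools import combinations


def agglomerative(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain.
    Returns clusters as sorted lists of point indices."""
    if target_clusters < 1:
        raise ValueError("target_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # one cluster per item at the start

    def linkage(a, b):
        # Average distance between all pairs of members (average linkage).
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > target_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Invented four-dimensional points forming three well-separated groups.
anomalies = [
    (0, 0, 0, 0), (0.1, 0, 0, 0),
    (5, 5, 0, 0), (5, 5.1, 0, 0),
    (0, 0, 9, 9), (0, 0, 9, 9.2),
]
print(agglomerative(anomalies, target_clusters=3))
```

Mapping each resulting cluster to a high-level anomaly type remains a manual step.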
AC: Clustering (III)
Cluster ↔ Known anomaly?
Anomalies within a cluster tend to share the same known anomaly type.
Clusters tend to have distinct meanings.
Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005
Future work
New useful metrics for AD and further understanding of the structure of anomalies.
Improve AC methods.
Study the impact of traffic sampling on AD.
Conclusions
A lot of work has been done on detecting specific anomalies: DoS/DDoS, scans and worms have been widely studied, while other anomalies such as flash crowds have not.
There are diverse methods for generic AD (known and unknown anomalies). A very large and distinct set of anomalies is detected.
There is a lot of work left to do in generic AC, since there is no method for determining an anomaly type through a fully automated process.
The literature on this subject is very large and growing rapidly.
Bibliography
Y. Zhang, Z.-H. Ge, A. Greenberg, and M. Roughan. “Network anomography”. In IMC, 2005.
A. Lakhina, M. Crovella, and C. Diot. “Mining Anomalies Using Traffic Feature Distributions”. In ACM SIGCOMM, Philadelphia, PA, August 2005.
S. Kim, A. L. N. Reddy, and M. Vannucci. “Detecting Traffic Anomalies through Aggregate Analysis of Packet Header Data”. In Networking, 2004.
A. Lakhina, M. Crovella, and C. Diot. “Characterization of Network-Wide Anomalies in Traffic Flows” (Short Paper). In Internet Measurement Conference, 2004.
A. Lakhina, M. Crovella, and C. Diot. “Diagnosing Network-Wide Traffic Anomalies”. In ACM SIGCOMM, Portland, August 2004.
A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft. “Structural Analysis of Network Traffic Flows”. In ACM SIGMETRICS, 2004.
M.-S. Kim, H.-J. Kang, S.-C. Hung, S.-H. Chung, and J. W. Hong. “A Flow-based Method for Abnormal Network Traffic Detection”. In IEEE/IFIP Network Operations and Management Symposium, Seoul, April 2004.
P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies”. In Internet Measurement Workshop, Marseille, November 2002.
P. Barford and D. Plonka. “Characteristics of network traffic flow anomalies”. In Proceedings of ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001.
J. Brutlag. “Aberrant behavior detection in time series for network monitoring”. In Proceedings of the USENIX Fourteenth System Administration Conference LISA XIV, New Orleans, LA, December 2000.
This work was done for a course called CBA (its acronym in Catalan), which stands for Broadband Communications. The course is given once a year at UPC (Barcelona, Catalonia, Spain).