Anomaly Detection in Traffic Monitoring

    Anomaly Detection in Traffic Monitoring - Presentation Transcript

    1. Anomaly Detection: The state of the art. Ignasi Paredes Oliva [email_address]
    2. Outline
      • What is an anomaly?
      • Why anomaly detection?
      • Known anomalies
      • Present
      • Future work
      • Conclusions
    4. What is an anomaly?
      • An attack (malicious anomaly)
        • DoS
        • Worms
        • etc
      • Unexpected behaviour (unintentional anomaly)
        • A malfunctioning router.
        • Hardware broken by an accident or natural event (such as a fire).
        • etc
    5. Why Anomaly Detection?
      • Internet
        • It changes very fast.
        • The number of threats grows and diversifies → networks are compromised every day by new kinds of attacks.
      • Threats: network performance and security.
      • What security mechanisms do we have to fight these threats?
        • Antivirus, firewalls, signature-based IDS.
          • They are not enough to provide full protection.
      Why Anomaly Detection? (II)
      • Drawbacks of firewalls, antivirus and signature-based IDS:
        • They only catch the events they have been told to look for.
        • Anything outside the list will not be recognised.
      • Solution
        • Anomaly Detection System (ADS) → everything (known or unknown) that differs from the normal profile is flagged as anomalous.
      • Final goal
        • Firewalls + antivirus + IDS + ADS working together to achieve a robust defence.
    7. Known anomalies
      • Alpha
      • DoS and DDoS
      • Flash Crowd
      • Scan
      • Worm
      • Outage
      • Point to Multipoint
      • Ingress Shift
    8. Alpha anomaly
      • Description
        • High rate point to point transfer.
      • Features
        • Spike in traffic between a single pair of IP addresses.
        • Short duration.
      • Example: an FTP transfer.
    9. DoS/DDoS anomaly
      • Description
        • An attempt to make a computer or network resource unavailable to its authorized users.
      • Features
        • Spike in traffic to a dominant destination IP.
    10. Flash Crowd anomaly
      • Description
        • Unusually large demand for a resource/service.
      • Features
        • Spike in traffic to a dominant destination IP and port.
    11. Scan anomaly
      • Description
        • Looking for a vulnerable port (port scan) or scanning the network for a target port (network scan).
      • Features
        • Spike in traffic from a dominant source IP.
      [Figure: a scanner repeatedly probing "open(port X)?"]
    12. Worm anomaly
      • Description
        • Code that spreads across a network by exploiting security flaws.
      • Features
        • Spike in traffic with a dominant port.
    13. Outage anomaly
      • Description
        • Decrease in traffic exchanged between an OD pair.
          • OD: O=origin (ingress router) and D=destination (egress router)
    14. Point to Multipoint anomaly
      • Description
        • Distribution of content from one server to many users.
      • Features
        • Spike in traffic from a dominant source IP to numerous destinations with the same (and well known) port.
    15. Ingress Shift anomaly
      • Description
        • A customer shifts traffic from one ingress router to another.
      • Features
        • Decrease in traffic in one OD flow and a spike in another.
    17. Present
      • Anomaly detection (AD)
        • Threshold-based
        • Profile-based
        • Subspace-based
        • Wavelet-based
      • Anomaly classification (AC)
        • Subspace-based
        • Clustering
    19. Threshold-based AD
      • Description
        • Upper and lower traffic thresholds are set for each time interval considered (day and night, workday and weekend…).
        • Metrics: bits/s, packets/s and flows/s.
      • How?
        • An anomaly is flagged when a value is higher than the upper threshold or smaller than the lower threshold.
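The threshold check described on this slide can be sketched in a few lines of Python. This is only an illustration: the interval names, the 8:00-20:00 split and the threshold values are assumptions made up for the example, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    lower: float
    upper: float


# Hypothetical thresholds in bits/s, one pair per time interval (illustrative values).
THRESHOLDS = {
    "day": Thresholds(lower=2e6, upper=9e7),
    "night": Thresholds(lower=1e5, upper=2e7),
}


def interval_of(hour: int) -> str:
    """Map an hour of the day (0-23) to an interval; the 8-20 split is an assumption."""
    return "day" if 8 <= hour < 20 else "night"


def is_anomalous(hour: int, bits_per_s: float) -> bool:
    """Flag a measurement that leaves the [lower, upper] range of its interval."""
    t = THRESHOLDS[interval_of(hour)]
    return bits_per_s > t.upper or bits_per_s < t.lower
```

The same structure would apply to packets/s and flows/s, with one threshold table per metric.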
    21. Profile-based AD
      • Time Series
        • Sequence of data points measured at successive times, spaced at (often uniform) time intervals.
        • Time series prediction: use of a model to predict future events based on known past events.
        • When applying time series for forecasting we implicitly assume that there is an underlying model which governs the development of the system.
      • Method
        • There is a statistical definition of normal activity.
        • Future values are predicted using the past ones.
        • If the actual traffic measure differs too much from its predicted value → anomaly.
    22. Profile-based AD
      • Profile-based
        • Exponential smoothing (ES)
        • The Holt-Winters forecasting algorithm
    23. Exponential Smoothing
      • The next value is predicted from the current observed value and the current prediction.
        • $\hat{y}_{t+1} = \alpha\, y_t + (1 - \alpha)\, \hat{y}_t$, where $y_t$ is the observed value and $\hat{y}_t$ its prediction.
      • α is the model parameter (0 < α < 1).
        • The current value is the most informative for the next one, and the importance of older observations decays exponentially.
      • Main drawback
        • It assumes a steady evolution and will never foresee a turning point → it does not account for seasonality.
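A minimal Python sketch of exponential smoothing used as a profile-based detector follows. Seeding the first prediction with the first observation and comparing against a fixed `max_deviation` are assumptions of this example; the slides do not specify either.

```python
def exponential_smoothing(observations, alpha):
    """Return one-step-ahead predictions; predictions[i] is the forecast for observations[i]."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not observations:
        return []
    # Assumption: the first prediction is seeded with the first observation.
    predictions = [observations[0]]
    for y in observations[:-1]:
        # next prediction = alpha * current value + (1 - alpha) * current prediction
        predictions.append(alpha * y + (1 - alpha) * predictions[-1])
    return predictions


def flag_anomalies(observations, alpha, max_deviation):
    """Mark observations that differ from their prediction by more than max_deviation."""
    predictions = exponential_smoothing(observations, alpha)
    return [abs(y - p) > max_deviation for y, p in zip(observations, predictions)]
```

For example, `flag_anomalies([10, 10, 10, 50, 10], alpha=0.5, max_deviation=25)` marks only the fourth value, although the prediction for the fifth value is pulled upwards by the spike.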
    25. The Holt-Winters Forecasting Algorithm
      • The observed time series can be divided into three components: baseline, linear trend and seasonal effect.
      • ES is used to update each one:
        • Baseline: $a_t = k\,(y_t - c_{t-m}) + (1 - k)\,(a_{t-1} + b_{t-1})$
        • Linear trend: $b_t = q\,(a_t - a_{t-1}) + (1 - q)\, b_{t-1}$
        • Seasonal effect: $c_t = r\,(y_t - a_t) + (1 - r)\, c_{t-m}$
      • k, q and r are the model parameters; m is the length of the seasonal period.
        • Larger values → fast adaptation to recent changes.
        • Smaller values → slow adaptation; more weight on the past history.
      • Seasonality is taken into account explicitly through the seasonal component.
      Profile-based AD
      • Detecting the anomaly (defining “too much” deviation)
        • For each predicted value of the time series there is a confidence band.
          • If the observed value falls outside this band → anomaly.
          • Drawback: high number of false positives.
        • Moving window over a fixed number of observations
          • A maximum number of violations is defined as a threshold.
          • If more values than the threshold fall outside the band → anomaly.
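The Holt-Winters updates and the moving-window rule can be combined in a short Python sketch. Several choices here are assumptions made for brevity and are not taken from the slides: the components are initialised from the first season, the confidence band is a fixed half-width `band` around the forecast, and the default `window` and `max_violations` values are arbitrary.

```python
from collections import deque


def holt_winters_detect(series, m, k, q, r, band, window=5, max_violations=2):
    """Return the time indices at which the moving window holds too many band violations.

    series: observed values; m: season length; k, q, r: smoothing parameters in (0, 1).
    band, window and max_violations are illustrative assumptions.
    """
    if len(series) <= m:
        raise ValueError("need more than one season of data")
    # Assumed initialisation from the first season.
    a = sum(series[:m]) / m               # baseline
    b = 0.0                               # linear trend
    c = [y - a for y in series[:m]]       # seasonal effects, indexed by t mod m
    recent = deque(maxlen=window)         # band violations inside the moving window
    alarms = []
    for t in range(m, len(series)):
        y = series[t]
        forecast = a + b + c[t % m]       # c[t % m] still holds c_{t-m} here
        recent.append(abs(y - forecast) > band)
        if sum(recent) > max_violations:
            alarms.append(t)
        a_prev = a
        a = k * (y - c[t % m]) + (1 - k) * (a_prev + b)
        b = q * (a - a_prev) + (1 - q) * b
        c[t % m] = r * (y - a) + (1 - r) * c[t % m]
    return alarms
```

Because the model keeps adapting while an anomaly is in progress, the forecasts right after a burst are contaminated by it. A real deployment would have to decide whether to freeze the updates during an alarm.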
    27. Subspace-based AD
      • Subspace-based
        • (“single-way”) Subspace Method
        • Multiway Subspace Method
    28. Subspace method
      • Focus on OD flows instead of studying traffic on all links.
      • Volume anomalies produce sudden changes in an OD flow’s traffic.
        • Measured in number of packets and number of bytes.
      • Problem: OD flows form a high-dimensional multivariate structure → a lower-dimensional approximation is needed → Principal Component Analysis (PCA).
    29. Subspace method (II) - PCA
      • Description
        • It works by examining the time series of traffic in all OD flows.
        • All their structure can be well captured by a few dimensions.
      • How?
        • It transforms the data onto a new set of axes (called principal components, PCs).
        • PCs are ordered by decreasing captured variance.
        • The first k PCs tend to capture the most significant and common periodic temporal trends.
      Subspace method (III)
      • Subspace method (SM) and PCA:
        • SM separates the PCs into two sets (normal and anomalous variations of traffic) → normal and anomalous subspaces.
        • The link traffic y is projected onto these subspaces.
        • $\tilde{y}$: residual vector that captures the unexplained variation in the used metric (the projection of y onto the anomalous subspace).
        • If the norm of this vector is large → anomaly.
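The projection step can be sketched with NumPy. This is a minimal illustration, assuming the traffic measurements are arranged as a matrix `Y` with one row per time bin and one column per traffic series, and that the caller chooses `k`. The function name is hypothetical, and how to set the alarm threshold on the residual norm is left open.

```python
import numpy as np


def subspace_spe(Y, k):
    """Squared norm of the residual (anomalous-subspace) projection for each time bin.

    Y: array of shape (t, n), one row per time bin, one column per traffic series.
    k: number of principal components assigned to the normal subspace.
    """
    Yc = Y - Y.mean(axis=0)                        # centre each column
    # Rows of Vt are the principal axes, ordered by decreasing captured variance.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                                   # (n, k) basis of the normal subspace
    residual = Yc - Yc @ P @ P.T                   # part of y left unexplained by the normal subspace
    return np.sum(residual ** 2, axis=1)
```

Time bins whose returned value is unusually large are the candidates for an anomaly.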
    30. Subspace method (IV)
      • Projection examples:
        • The traffic variations of the top figure (periodic and deterministic trends) are assigned to the normal subspace.
        • The projections in the bottom figure (spikes, anomalous behaviour) belong to the anomalous subspace.
      Source: A. Lakhina, M. Crovella, and C. Diot. “Diagnosing Network-Wide Traffic Anomalies.” In ACM SIGCOMM, Portland, August 2004.
    32. Multiway Subspace Method
      • Anomalies produce changes in traffic feature distributions.
        • Which features are used?
          • Source IP, destination IP, source port and destination port.
      • How can we study the distribution of a particular feature?
        • By studying its dispersion through entropy.
          • Maximum concentration → entropy = 0
          • Maximum dispersion → maximum entropy
      • It allows detecting anomalies unseen by volume-based methods.
      Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005.
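The entropy of a feature distribution can be computed directly from the observed values. The sketch below uses the empirical (sample) entropy in bits; the function name is hypothetical.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Empirical entropy (in bits) of the observed values of one traffic feature."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the observed values, with p = count / total.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

If every packet goes to the same destination IP (maximum concentration) the result is 0. If N distinct values are equally frequent (maximum dispersion) the result is log2(N). A port scan, for instance, would be expected to raise the destination-port entropy sharply.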
    33. Multiway Subspace Method (II)
      • What is analyzed?
        • Multiple traffic features and flows.
      • How is it analyzed?
        • One matrix for each feature.
        • H(t,p,k) = entropy value at time t for OD flow p, of the traffic feature k.
      • How can the subspace method be used?
        • Problem: it only works for single-way data.
        • Solution: “unfold” the multiway matrix into a single, large matrix.
        • Now the subspace method can be used.
      Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005
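The unfolding step can be illustrated with NumPy. The sketch assumes the entropy values are stored in a three-way array `H` indexed as H[t, p, k], matching the notation on the slide. Placing the per-feature matrices side by side is one straightforward way to unfold. Any normalisation of the individual feature matrices is left out here.

```python
import numpy as np


def unfold(H):
    """Unfold a (t, p, k) entropy array into a single (t, k * p) matrix.

    Columns 0..p-1 hold feature 0 for every OD flow, columns p..2p-1 hold feature 1, and so on.
    """
    _, _, k = H.shape
    return np.concatenate([H[:, :, f] for f in range(k)], axis=1)
```

The result has one row per time bin, so it can be passed to any single-way subspace method that expects a time-by-series matrix.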
    35. Wavelet-based AD
      • What is a wavelet transform?
        • A given signal is divided into different frequency components.
      • How is it used to detect anomalies?
        • Each component is used to look for anomalies whose duration matches its scale.
        • Low-frequency: very sparse information → long-lasting anomalies.
        • High-frequency: fine-grained details → spontaneous changes, short-term anomalies.
    36. Wavelet-based AD (II)
      • Three components are extracted from each signal:
        • H-part: isolates short-term variations ≡ “noise”.
        • M-part: isolates daily variations in the data.
        • L-part: isolates very long duration anomalies (several days).
      Source: P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies.” In Internet Measurement Workshop, Marseille, November 2002.
    37. Wavelet-based AD (III)
      • The Deviation Score method
        • Compute the local variability of the H- and M-parts by calculating the variance of the data falling within a moving window of specified size.
        • V-part = weighted sum of the obtained values.
        • Apply thresholding to the V-signal.
      • Interpreting the results
        • Score ≥ 2.0 → “high-confidence”
        • Score < 1.25 → “low-confidence”
      Source: P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies.” In Internet Measurement Workshop, Marseille, November 2002.
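The deviation score steps can be sketched in plain Python, assuming the H- and M-parts have already been extracted by a wavelet decomposition (not shown). The equal weights, the trailing window and the handling of scores between 1.25 and 2.0 are assumptions of this example. Only the two score limits come from the slide.

```python
from statistics import pvariance


def local_variability(signal, window):
    """Variance of the samples inside a trailing moving window ending at each position."""
    return [pvariance(signal[max(0, i - window + 1): i + 1]) for i in range(len(signal))]


def deviation_score(h_part, m_part, window, w_h=0.5, w_m=0.5):
    """V-signal: weighted sum of the local variability of the H- and M-parts."""
    v_h = local_variability(h_part, window)
    v_m = local_variability(m_part, window)
    return [w_h * a + w_m * b for a, b in zip(v_h, v_m)]


def label(score):
    """Apply the two limits quoted on the slide; the range in between is left unspecified."""
    if score >= 2.0:
        return "high-confidence"
    if score < 1.25:
        return "low-confidence"
    return "unspecified"
```

The numeric limits only make sense if the input signals are scaled the way the cited paper assumes, so the labels here should be read as a placeholder rather than a calibrated decision.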
    39. Anomaly Classification
      • Why?
        • AD alone is not enough: to mitigate an anomaly we have to know what happened.
        • Mine patterns from anomaly data to gain better insight into the nature of the anomalies that have been detected.
      • What do we want?
        • Given a set of possible candidate anomalies we would like to find the true anomaly type.
    41. Subspace-based AC
      • AD = displacement of the state vector y away from the normal subspace.
      • AC → study the direction of this displacement.
      • Select the anomaly which best describes this deviation.
        • It has to explain the largest amount of residual traffic.
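One way to read "explains the largest amount of residual traffic" is as a least-squares fit of the residual vector along each candidate direction. The sketch below makes that assumption. The candidate names and direction vectors are hypothetical inputs, and each direction is assumed to be non-zero and already expressed in the anomalous subspace.

```python
import numpy as np


def best_candidate(residual, candidates):
    """Return (name, explained_energy) for the candidate direction that explains most residual.

    residual: 1-D residual vector.
    candidates: dict mapping a hypothetical anomaly name to a non-zero direction vector.
    """
    best = None
    for name, direction in candidates.items():
        d = np.asarray(direction, dtype=float)
        # Energy of the least-squares projection of the residual onto d.
        energy = float(residual @ d) ** 2 / float(d @ d)
        if best is None or energy > best[1]:
            best = (name, energy)
    return best
```

Maximising the explained energy is equivalent to minimising the residual that remains after the candidate's contribution is removed.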
    43. AC: Clustering
      • Unsupervised method that groups similar anomalies together → clusters.
      • Two types of algorithms:
        • Partitional: k-means
          • It works by constructing new partitions, associating each item with the closest centroid (mean point of each set).
        • Hierarchical: agglomerative hierarchical clustering
          • It works by merging clusters iteratively (at the beginning there is one cluster for each item).
      (This step follows the multiway subspace method described earlier.)
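The k-means description above can be sketched in plain Python. The initial centroids are supplied by the caller, which is an assumption of this example since the slide does not say how they are chosen. The function names are hypothetical.

```python
import math


def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # The mean point of the cluster; an empty cluster keeps its old centroid.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

The result depends on the starting centroids, so in practice several initialisations are usually tried.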
    44. AC: Clustering
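      • Each anomaly can be thought of as a point in a four-dimensional space, with one entropy coordinate per traffic feature (source IP, destination IP, source port, destination port).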
      • Focus on the relationship between anomalies.
      • Similar kinds of anomalies will be close to each other in the entropy space.
      • Method:
        • Cluster anomalies (automatic).
        • Find the correspondence between clusters and high level anomaly types (manual).
    45. AC: Clustering (II)
      • The distinct separation among these three types of known anomalies suggests that it may be possible to divide this set of anomalies into groups automatically.
      • How?
        • E.g., with agglomerative hierarchical clustering.
      Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005.
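Agglomerative hierarchical clustering of points in the four-dimensional entropy space can be sketched as follows. Single linkage (distance between the two closest members) and stopping at a requested number of clusters are assumptions of this example. The slides do not state which linkage or stopping rule is used.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[p] for p in points]          # start with one cluster per item

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members (an assumption).
        return min(math.dist(x, y) for x in a for y in b)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters
```

This brute-force version recomputes all pairwise linkages at every merge, which is fine for a handful of anomalies but slow for large sets.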
    46. AC: Clustering (III)
      • Cluster ↔ Known anomaly?
        • Anomalies within a cluster tend to be of the same known anomaly type.
        • Clusters tend to have distinct meanings.
      Source: A. Lakhina, M. Crovella and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005
    48. Future work
      • New useful metrics for AD and a better understanding of anomaly structure.
      • Improve AC methods.
      • Impact of traffic sampling on AD.
    50. Conclusions
      • A lot of work has been done on specific AD: DoS/DDoS, scans and worms have been widely treated, while other anomalies such as flash crowds have not.
      • There are diverse methods for generic AD (known and unknown anomalies), which detect a large and varied set of anomalies.
      • There is a lot of work left to do in generic AC, since there is no method for determining an anomaly type through a fully automated process.
      • The literature on this subject is vast and growing rapidly.
    51. Bibliography
      • Y. Zhang, Z.-H. Ge, A. Greenberg, and M. Roughan. “Network anomography.” In IMC, 2005.
      • A. Lakhina, M. Crovella, and C. Diot. “Mining Anomalies Using Traffic Feature Distributions.” In ACM SIGCOMM, Philadelphia, PA, August 2005.
      • S. Kim, A. L. N. Reddy, and M. Vannucci. “Detecting Traffic Anomalies through Aggregate Analysis of Packet Header Data.” In Networking, 2004.
      • A. Lakhina, M. Crovella, and C. Diot. “Characterization of Network-Wide Anomalies in Traffic Flows” (Short Paper). In Internet Measurement Conference, 2004.
      • A. Lakhina, M. Crovella, and C. Diot. “Diagnosing Network-Wide Traffic Anomalies.” In ACM SIGCOMM, Portland, August 2004.
      • A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft. “Structural Analysis of Network Traffic Flows.” In ACM SIGMETRICS, 2004.
      • M.-S. Kim, H.-J. Kang, S.-C. Hung, S.-H. Chung, and J. W. Hong. “A Flow-based Method for Abnormal Network Traffic Detection.” In IEEE/IFIP Network Operations and Management Symposium, Seoul, April 2004.
      • P. Barford, J. Kline, D. Plonka, and A. Ron. “A signal analysis of network traffic anomalies.” In Internet Measurement Workshop, Marseille, November 2002.
      • P. Barford and D. Plonka. “Characteristics of network traffic flow anomalies.” In Proceedings of ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001.
      • J. Brutlag. “Aberrant behavior detection in time series for network monitoring.” In Proceedings of the USENIX Fourteenth System Administration Conference LISA XIV, New Orleans, LA, December 2000.

    Ignasi Paredes, 2 years ago

    This work was made in a subject called CBA (in cata…

    CC Attribution-NonCommercial-NoDerivs License