Chapter 10 Anomaly Detection

Chapter 10 Anomaly Detection sections (10.1 ~ 10.3).

Chapter 10 Anomaly Detection: Presentation Transcript

  • Anomaly Detection (10.1 ~ 10.3)
    Khalid Elshafie
    abolkog@dblab.cbnu.ac.kr
    Database / Bioinformatics Lab.
    Chungbuk National University
  • Contents
    1. Introduction
    2. Statistical Approach
    3. Proximity-based Approach
  • Introduction (1/4)
    Anomaly Detection
    Find objects that are different from most other objects.
    Anomalous objects are often known as outliers.
    On a scatter plot of the data, they lie far away from the other data points.
    Also known as
    Deviation detection: anomalous objects have attribute values that deviate significantly from the expected or typical attribute values.
    Exception mining: anomalies are exceptional in some sense.
  • Introduction (2/4)
    Applications
    Fraud detection: the purchasing behavior of someone who steals a credit card is probably different from that of the original owner.
    Intrusion detection: attacks on computer systems and computer networks.
    Ecosystem disturbances: hurricanes, floods, heat waves, etc.
    Medicine: unusual symptoms or test results may indicate potential health problems.
    …
  • Introduction (3/4)
    What causes anomalies
    Data from Different Sources
    Someone who is committing credit card fraud belongs to a different class than people who use credit cards legitimately.
    Such anomalies are often of considerable interest and are the focus of anomaly detection in the field of data mining.
    An outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins' definition of an outlier).
    Natural Variation
    Many data sets can be modeled by statistical distributions in which the probability of a data object decreases rapidly as the distance of the object from the center of the distribution increases.
    Most objects are near a center (an average object), and the likelihood that an object differs significantly from this average is small.
    Anomalies that represent extreme or unlikely variations are often interesting.
    Data Measurement and Collection Errors
    Errors in the data collection or measurement process are another source of anomalies.
    The goal is to eliminate such anomalies, since they provide no interesting information and only reduce the quality of the data and of the subsequent data analysis.
  • Introduction (4/4)
    Approaches to Anomaly Detection
    Model-based Techniques
    Build a model of the data.
    Anomalies are objects that do not fit the model very well.
    Proximity-based Techniques
    Many of the techniques in this area are based on distances and are referred to as distance-based outlier detection techniques.
    Anomalous objects are those that are distant from most of the other objects.
    Density-based Techniques
    Objects that are in regions of low density are relatively distant from their neighbors and can be considered anomalous.
  • Statistical Approach (1/2)
    Statistical approaches are model-based approaches.
    A model is created for the data, and objects are evaluated with respect to how well they fit the model.
    Most statistical approaches to outlier detection are based on building a probability distribution model and considering how likely objects are under that model.
    Probabilistic definition of an outlier: an outlier is an object that has a low probability with respect to the probability distribution model of the data.
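As a rough illustration of the probabilistic definition, the sketch below assumes a single numeric attribute that is approximately Gaussian. The function names, the `readings` data, and the threshold are made up for this example and do not come from the slides. An object is flagged when it lies many standard deviations from the mean, which under a Gaussian model is equivalent to having a low probability density.

```python
import statistics

def z_scores(values):
    """Absolute z-score of each value under a Gaussian fitted to `values`."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return [abs(x - mu) / sigma for x in values]

def gaussian_outliers(values, threshold=3.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    return [x for x, z in zip(values, z_scores(values)) if z > threshold]

# Hypothetical measurements: ten readings near 10 and one far away.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 25.0]
flagged = gaussian_outliers(readings, threshold=2.0)
print(flagged)
```

Note that the extreme value itself inflates the estimated mean and standard deviation, so on a small sample a lower threshold (2.0 here) or more robust estimates may be needed.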
  • Statistical Approach (2/2)
    Strengths and weaknesses
    Have a firm foundation and build on standard statistical techniques.
    When there is sufficient knowledge of the data and the type of test that should be applied, these tests can be very effective.
    There is a wide variety of statistical outlier tests for single attributes, but fewer options are available for multivariate data.
    Can perform poorly for high-dimensional data.
  • Proximity-based Approach (1/4)
    Proximity-based Approach
    The basic notion of this approach is straightforward.
    An object is an anomaly if it is distant from most points.
    More general and more easily applied than statistical approaches.
    It is easier to determine a meaningful proximity measure for a data set than to determine its statistical distribution.
    One of the simplest ways to measure whether an object is distant from most points is to use the distance to its k-th nearest neighbor.
    The outlier score of an object is given by the distance to its k-th nearest neighbor.
    The lowest value of the outlier score is 0.
    The highest value is the maximum possible value of the distance function (usually infinity).
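The outlier score described above can be sketched in a few lines. This is a minimal brute-force version that assumes Euclidean distance and points given as coordinate tuples. The function name and the sample points are illustrative only.

```python
import math

def knn_outlier_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: four points on a unit square and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
most_anomalous = scores.index(max(scores))
print(most_anomalous, scores)
```

With these points the distant object at (10, 10) receives by far the largest score, while the four square corners all score 1.0.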
  • Proximity-based Approach (2/4)
    Approach:
    Compute the distance between every pair of data points.
    There are various ways to define outliers:
    Data points for which there are fewer than p neighboring points within a distance D.
    The top n data points whose distance to the k-th nearest neighbor is greatest.
    The top n data points whose average distance to the k nearest neighbors is greatest.
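The three definitions can be compared side by side. The sketch below assumes Euclidean distance and counts a point as a neighbor when its distance is at most D. All function names and the sample data are hypothetical.

```python
import math

def neighbor_distances(points):
    """For each point, the sorted distances to all other points."""
    return [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def fewer_than_p_within_d(points, p, d):
    """Indices of points with fewer than p neighbors within distance d."""
    return [
        i for i, dists in enumerate(neighbor_distances(points))
        if sum(1 for x in dists if x <= d) < p
    ]

def top_n_by_kth_distance(points, n, k):
    """Indices of the n points with the largest k-th nearest neighbor distance."""
    scores = [dists[k - 1] for dists in neighbor_distances(points)]
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]

def top_n_by_mean_knn_distance(points, n, k):
    """Indices of the n points with the largest mean distance to k nearest neighbors."""
    scores = [sum(dists[:k]) / k for dists in neighbor_distances(points)]
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]

# Hypothetical data: four points on a unit square and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
by_count = fewer_than_p_within_d(points, p=2, d=1.5)
by_kth = top_n_by_kth_distance(points, n=1, k=2)
by_mean = top_n_by_mean_knn_distance(points, n=1, k=2)
print(by_count, by_kth, by_mean)
```

On this small data set all three definitions single out the same point, but they can disagree on data with clusters of different sizes or densities.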
  • Proximity-based Approach (3/4)
    Proximity-based Approach
    • The shading of each point indicates its outlier score, computed with k = 5.
    • The outlier score can be highly sensitive to the value of k.
    • If k is too small (e.g., 1), then a small number of nearby outliers can cause a low outlier score.
    • If k is too large, then it is possible for all objects in a cluster that has fewer than k objects to become outliers.
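The sensitivity to k can be reproduced on a toy data set. The sketch below assumes Euclidean distance and uses made-up points: a cluster of four objects plus two outliers that sit close to each other.

```python
import math

def knn_outlier_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

cluster = [(0, 0), (0, 1), (1, 0), (1, 1)]   # four ordinary objects
pair = [(10, 10), (10, 10.5)]                # two outliers close to each other
points = cluster + pair

scores_k1 = knn_outlier_scores(points, k=1)  # k too small
scores_k2 = knn_outlier_scores(points, k=2)
scores_k5 = knn_outlier_scores(points, k=5)  # k larger than the cluster size

for k, s in ((1, scores_k1), (2, scores_k2), (5, scores_k5)):
    print(k, [round(x, 2) for x in s])
```

With k = 1 each outlier finds the other as its nearest neighbor and scores lower than the cluster members. With k = 2 the pair stands out clearly. With k = 5 the cluster has fewer than k objects, so every object receives a large score.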
  • Proximity-based Approach (4/4)
    Strengths and Weaknesses
    Simple scheme.
    Proximity-based approaches typically take O(m^2) time, where m is the number of objects.
    For large data sets this can be too expensive.
    Sensitive to the choice of parameters.
    Cannot handle data sets with regions of widely differing densities.
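The weakness with differing densities can be shown on one-dimensional toy data. The values below are invented for this sketch: a dense cluster, a sparse cluster, and a candidate point that is isolated relative to the dense cluster.

```python
def knn_score_1d(values, i, k):
    """Distance from values[i] to its k-th nearest neighbor (1-D data)."""
    dists = sorted(abs(values[i] - v) for j, v in enumerate(values) if j != i)
    return dists[k - 1]

dense = [0.0, 0.1, 0.2, 0.3, 0.4]         # tightly packed cluster
sparse = [10.0, 12.0, 14.0, 16.0, 18.0]   # loosely packed cluster
candidate = 1.4                           # isolated relative to the dense cluster
values = dense + sparse + [candidate]

k = 2
scores = [knn_score_1d(values, i, k) for i in range(len(values))]
dense_scores = scores[:5]
sparse_scores = scores[5:10]
candidate_score = scores[10]

print(max(dense_scores), min(sparse_scores), candidate_score)
```

The candidate scores several times higher than any member of the dense cluster, yet lower than every member of the sparse cluster. Any single global threshold that flags the candidate therefore also flags the whole sparse cluster.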
  • Thank You !