ANOMALY DETECTION FOR REAL-WORLD SYSTEMS
Manojit Nandi
Data Scientist
STEALTHbits Technologies
@mnandi92
WHAT ARE ANOMALIES?
Hard to define; “You’ll know it when you see it”.
Generally, anomalies are “anything that is noticeably different” from the
expected.
One important thing to keep in mind is that what is considered anomalous
now may not be considered anomalous in the future.
Different approaches to anomaly detection
We can develop a statistical model of
normal behavior. Then we can test how
likely an observation is under the model.
We can take a machine learning approach
and use a classifier to classify data
points as “normal” or “anomalous”.
In this talk, I’m going to cover algorithms
specially designed to detect anomalies.
Photo Credit: Microsoft Azure
Anomalies in Data Streams
Problem: We have data streaming in continuously, and we want to identify
anomalies in real-time.
Constraint: We can only examine the last 100 events in our sliding window.
In data streaming problems, we are “restricted” to quick-and-dirty methods
due to the limited memory and need for rapid action.
Heuristic: Z-Scores
Z-scores are often used as a test statistic to
measure the “extremeness” of an
observation in statistical hypothesis
testing.
How many standard deviations away from
the mean is a particular observation?
Moving Averages and Moving St. Dev.
As the data comes in, we keep track of the average and the standard
deviation of the last n data points.
For each new data point, update the average and the standard deviation.
Using the new average and standard deviation, compute the z-score for this
data point.
If the Z-score exceeds some threshold, flag the data point as anomalous.
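The steps above can be sketched as a small generator. This is a minimal illustration, not a production implementation: the names `rolling_zscores` and `min_points` and the default threshold of 3 are assumptions, and the statistics are recomputed over the whole window on every event rather than updated incrementally.

```python
from collections import deque
from statistics import mean, pstdev

def rolling_zscores(stream, window=100, threshold=3.0, min_points=10):
    """Yield (value, z_score, is_anomaly) for each event in the stream."""
    recent = deque(maxlen=window)       # sliding window of the last `window` events
    for x in stream:
        recent.append(x)                # update the window with the new point
        if len(recent) < min_points:    # too little history to score reliably
            yield x, 0.0, False
            continue
        mu = mean(recent)
        sigma = pstdev(recent)
        z = 0.0 if sigma == 0 else (x - mu) / sigma
        yield x, z, abs(z) > threshold

# Hypothetical stream: a stable signal followed by one spike.
stream = [10.0, 10.5, 9.5] * 10 + [30.0]
flags = [is_anomaly for _, _, is_anomaly in rolling_zscores(stream)]
print(flags[-1], any(flags[:-1]))
```

Because the new point is included in its own window, its z-score is slightly dampened; scoring against the previous window instead is an equally reasonable design choice.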
Standard Deviation not robust
Standard deviation (and the mean, to a lesser
extent) is highly sensitive to extreme
values.
One extreme value can drastically increase
the standard deviation.
As a result, the Z-scores of the other data points
decrease dramatically.
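A small made-up example shows the masking effect: a moderately unusual value (20) has a large z-score until a single extreme value (1000) enters the data and inflates the standard deviation. The numbers are illustrative only.

```python
from statistics import mean, stdev

def zscore(x, data):
    """Classic z-score of x relative to the sample mean and standard deviation."""
    return (x - mean(data)) / stdev(data)

base = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
with_spike = base + [20]             # 20 is clearly unusual here
with_extreme = with_spike + [1000]   # one extreme value joins the data

z_before = zscore(20, with_spike)
z_after = zscore(20, with_extreme)
print(round(z_before, 2), round(z_after, 2))
```

After the extreme value arrives, the point 20 no longer looks unusual at all, even though nothing about it changed.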
Mathematics of Robustness
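The arithmetic mean s is the number which solves the following
minimization problem:
$$ s = \arg\min_{c} \sum_{i=1}^{n} (x_i - c)^2 $$
The median m is the number which solves the following minimization
problem:
$$ m = \arg\min_{c} \sum_{i=1}^{n} \lvert x_i - c \rvert $$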
Visualizing robustness
[Figure panels: Mean vs. Outlier; St. Dev vs. Outlier]
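A quick numerical check of the two minimization problems, using a brute-force search over an integer grid. The data and the grid are made up for illustration; the point is that the squared-error minimizer is dragged toward the extreme value while the absolute-error minimizer is not.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]        # one extreme value
candidates = range(0, 101)      # integer grid of possible "centers"

def squared_loss(c):
    return sum((x - c) ** 2 for x in data)

def absolute_loss(c):
    return sum(abs(x - c) for x in data)

best_squared = min(candidates, key=squared_loss)     # coincides with the mean
best_absolute = min(candidates, key=absolute_loss)   # coincides with the median

print(best_squared, mean(data))
print(best_absolute, median(data))
```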
Median Absolute Deviation
The Median Absolute Deviation (MAD) provides a robust way to measure the
“spread” of the data.
Essentially, it is the median of the absolute deviations from the “center” (the median).
Unlike the standard deviation, it is barely affected by a few extreme
values.
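Median Absolute Deviation
$$ \mathrm{MAD} = \operatorname{median}_i \left( \lvert x_i - \operatorname{median}(x) \rvert \right) $$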
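A minimal sketch of the MAD using only the standard library, compared against the standard deviation on made-up data with and without one extreme value.

```python
from statistics import median, stdev

def mad(data):
    """Median of the absolute deviations from the median."""
    center = median(data)
    return median(abs(x - center) for x in data)

clean = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
dirty = clean + [1000]   # add a single extreme value

print(mad(clean), mad(dirty))        # the MAD does not move
print(stdev(clean), stdev(dirty))    # the standard deviation explodes
```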
Modified Z-Scores
Now, we can use the median and the
MAD to compute the modified Z-score
for each data point.
We then use the modified z-score to
perform statistical hypothesis testing,
in the same manner as the standard
z-score.
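A sketch of the modified z-score. The scaling constant 0.6745 and the cutoff of 3.5 are the values commonly attributed to Iglewicz and Hoaglin; they are assumptions here, not something the slides state.

```python
from statistics import median

def modified_zscores(data, consistency=0.6745):
    """Modified z-score: consistency * (x - median) / MAD."""
    center = median(data)
    mad = median(abs(x - center) for x in data)
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [consistency * (x - center) / mad for x in data]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 20, 1000]
scores = modified_zscores(data)
flagged = [x for x, m in zip(data, scores) if abs(m) > 3.5]
print(flagged)
```

Both 20 and 1000 are flagged, because the extreme value no longer hides the moderate one.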
Density-Based Anomaly Detection
We have a bunch of points in
some n-dimensional space.
Which ones are “noticeably”
different from the others?
Probabilistic Density: A Primer
In statistics, we assume all data is
generated according to some
probability distribution.
The goal of density-based methods is to
estimate the underlying probability
density function, based on the data.
There are many density-based methods, such as
DBSCAN and Level Set Tree Clustering.
Local Outlier Factor
Goal: Quantify the relative density
about a particular data point.
Intuition: The anomalies should be
more isolated compared to
“normal” data points.
LOF estimates density by looking at
a small neighborhood about each
point.
K-Distance
For each datapoint, compute the
distance to the Kth-nearest
neighbor.
Meta-heuristic: The K-distance gives
us a notion of “volume”.
The more isolated a point is, the
larger its k-distance.
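A brute-force sketch of the K-distance using Euclidean distance. The function name, the choice of metric, and the sample points are assumptions for illustration; real implementations use spatial indexes instead of sorting all distances.

```python
import math

def k_distance(i, points, k):
    """Distance from points[i] to its k-th nearest neighbor (excluding itself)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]

# Four points in a tight square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(k_distance(0, points, k=2))   # small: the point sits in a cluster
print(k_distance(4, points, k=2))   # large: the point is isolated
```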
Reachability Distance
Reachability-Distance(A,B) =
Max(K-Dist(B), Dist(A,B))
“Do your close neighbors see you
as one of their close
neighbors?”
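The reachability distance is not symmetric, which a tiny example makes visible. This sketch reuses a brute-force K-distance with Euclidean distance; the names and sample points are illustrative assumptions.

```python
import math

def k_distance(i, points, k):
    """Distance from points[i] to its k-th nearest neighbor (excluding itself)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def reachability_distance(a, b, points, k):
    """Reachability-Distance(A, B) = max(K-Dist(B), Dist(A, B))."""
    return max(k_distance(b, points, k), math.dist(points[a], points[b]))

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
from_outlier = reachability_distance(4, 3, points, k=2)  # isolated point to cluster point
to_outlier = reachability_distance(3, 4, points, k=2)    # cluster point to isolated point
print(from_outlier, to_outlier)
```

From the isolated point to a cluster point, the plain distance dominates; in the other direction, the isolated point's own large K-distance dominates.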
Local Reachability Density
Now, we are going to estimate the local density about each point.
For each data point, compute the average reachability-distance to its
K-nearest neighbors.
The Local Reachability Density (LRD) of a data point A is defined as the
inverse of this average reachability-distance.
Local Outlier Factor
LOF score is the ratio of the average
density of point A’s neighbors to
A’s own density.
Outliers come from less dense
areas, so the ratio is higher for
outliers.
Interpreting LOF scores
Normal points typically have LOF scores close to 1 (roughly between 1 and 1.5).
Anomalous points have much higher LOF scores.
If a point has a LOF score of 3, then the average density of this
point’s neighbors is about 3 times its own local density.
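Putting the pieces together, here is a brute-force sketch of LOF: K-distance, reachability distance, local reachability density (LRD), and finally the ratio of the neighbors' average LRD to the point's own LRD. It assumes Euclidean distance, takes exactly K neighbors (ties at the K-distance are ignored), and does not handle duplicate points, so treat it as an illustration rather than a reference implementation.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach(a, b):
        # Reachability-Distance(A, B) = max(K-Dist(B), Dist(A, B))
        return max(k_dist[b], dist[a][b])

    # LRD: inverse of the average reachability distance to the k neighbors.
    lrd = [k / sum(reach(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: average LRD of the neighbors divided by the point's own LRD.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])
```

The four clustered points score about 1, while the isolated point scores far higher.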
Time Series based Anomalies
Given activity indexed by time, can we identify extreme spikes and troughs in
the time series?
We want to find both global anomalies and local anomalies.
Source: Twitter Engineering Blog
Seasonal Hybrid - ESD
Algorithm invented at Twitter in 2015.
Two components:
Seasonal Decomposition: Remove seasonal patterns from the time series
ESD: Iteratively test for outliers in the time series.
Remove periodic patterns from the time series, then identify anomalies with
the remaining “core” of the time series.
Seasonal Decomposition
Time Series Decomposition breaks a time series down into three
components:
a. Trend Component
b. Seasonal Component
c. Residual (or Random) Component
The trend component contains the “meat” of the time series that we are
interested in.
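A deliberately naive additive decomposition, to make the three components concrete. It assumes a known, fixed period, uses the overall median as a constant stand-in for the trend, and estimates the seasonal component as the per-phase median. Production systems typically use a procedure such as STL instead; the function name and data are made up.

```python
from statistics import median

def decompose(series, period):
    """Split series into (trend, seasonal, residual) with series = trend + seasonal + residual."""
    trend = median(series)                        # constant stand-in for the trend
    detrended = [x - trend for x in series]
    # Seasonal value for each position in the cycle: median over all cycles.
    phase_medians = [median(detrended[p::period]) for p in range(period)]
    seasonal = [phase_medians[i % period] for i in range(len(series))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual

# Five cycles of a repeating pattern with one injected spike.
series = [10, 20, 30, 20] * 5
series[9] += 50
trend, seasonal, residual = decompose(series, period=4)
print(residual)
```

Once the repeating pattern is removed, the spike is the only non-zero value left in the residual.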
Seasonality Example
Extreme Studentized Deviate
ESD is a statistical procedure to iteratively test for outliers in a dataset.
Specify the alpha level and the maximum number of anomalies to identify.
ESD naturally applies a statistical correction to compensate for multiple
hypothesis testing.
Extreme Studentized Deviate
1. For each datapoint, compute a G-Score (absolute value of Z-Score)
2. Take the point with the highest G-Score.
3. Using the pre-specified alpha value, compute a critical value.
4. If the G-Score of the test point is greater than the critical value, flag the
point as anomalous.
5. Remove this point from the data and repeat from step 1, up to the specified
maximum number of anomalies.
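The loop above can be sketched as follows. This version follows the generalized ESD convention of running all iterations and reporting the points up to the last iteration whose G-score exceeded its critical value. The Student-t quantile is approximated with a two-term Cornish-Fisher expansion so the example needs only the standard library; that approximation, and all function names, are assumptions of this sketch. A statistics library would give exact quantiles.

```python
import math
from statistics import NormalDist, mean, stdev

def t_quantile_approx(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion; an approximation)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2

def generalized_esd(data, max_outliers, alpha=0.05):
    """Return the original indices of the points flagged as outliers."""
    remaining = list(enumerate(data))
    removed, n_outliers = [], 0
    for i in range(1, max_outliers + 1):
        m = len(remaining)
        if m < 3:
            break
        values = [x for _, x in remaining]
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            break
        # Steps 1-2: the point with the highest G-score.
        pos = max(range(m), key=lambda j: abs(values[j] - mu))
        g = abs(values[pos] - mu) / sigma
        # Step 3: critical value for the current sample size.
        t = t_quantile_approx(1 - alpha / (2 * m), m - 2)
        critical = (m - 1) * t / math.sqrt((m - 2 + t * t) * m)
        # Step 5: remove the point; step 4: remember the last significant iteration.
        removed.append(remaining.pop(pos)[0])
        if g > critical:
            n_outliers = i
    return removed[:n_outliers]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 11, 10, 50]
outliers = generalized_esd(data, max_outliers=3)
print(outliers)
```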
Seasonal Hybrid - ESD: Example
Photo Credit: Twitter Engineering Blog
Robust PCA
Created in 2009 by Candes et al.
Regular PCA identifies a low-rank representation of the data using Singular
Value Decomposition.
Robust PCA identifies a low-rank representation, outliers, and noise.
Used by Netflix in the Robust Anomaly Detection (RAD) package.
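For reference, the decomposition can be written as the convex program that Candes et al. call Principal Component Pursuit. Here M is the data matrix, L the low-rank part, S the sparse outlier part, and E a dense noise term; the symbols and the suggested weight are taken from that formulation and are not spelled out in the slides.

```latex
% Principal Component Pursuit (noise-free case):
\min_{L,\,S} \; \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
\quad \text{subject to} \quad L + S = M,
\qquad \lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}

% With a dense noise term E, the constraint becomes:
M = L + S + E
```

The nuclear norm (sum of singular values) encourages L to be low-rank, and the entrywise L1 norm encourages S to be sparse; n_1 and n_2 are the dimensions of M.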
Robust PCA
Specify thresholds for the singular values and the error value.
Iterate through the data:
Apply Singular Value Decomposition.
Using thresholds, categorize the data into “normal”, “outlier”, “noise”.
Repeat until all points are classified.
Robust PCA - Example
Source: Netflix Techblog
Risk Scores
When doing anomaly detection in practice, you want to treat it more as a
regression problem rather than a classification problem.
Best practices recommend calculating the likelihood that a data point or
event is anomalous and converting the likelihood into a risk score.
Risk scores should range from 0 to 100 because people understand this scale
more intuitively.
Risk Scores using Bayesian Inference
We have some risk measure r, and we assume r comes from a well-defined
distribution.
Example: Prob(Getting risk measure r) = λ e^{−λr} (Exponential Distribution)
Want to compute likelihood Prob(Getting r | data).
Risk Scores using Bayesian Inference
Bayes’ Theorem tells us that
Prob(λ | Data) = Prob(Data | λ) · Prob(λ) / Prob(Data).
Prob(Data | λ) is the likelihood function, and by choosing Prob(λ) as the
Gamma distribution, we get a computable posterior distribution for
Prob( r | Data).
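One way to turn this into a 0-100 score, sketched under explicit assumptions: risk measures are Exponential with rate λ, the prior on λ is Gamma(shape, rate), and the score is the posterior predictive probability of seeing a value no larger than r. With those assumptions the posterior is Gamma(shape + n, rate + sum of observations) and the predictive tail has a closed form. The prior parameters and the mapping to a score are illustrative choices, not the author's stated method.

```python
def posterior_params(observations, prior_shape=1.0, prior_rate=1.0):
    """Gamma posterior for the Exponential rate after seeing the observations."""
    return prior_shape + len(observations), prior_rate + sum(observations)

def risk_score(r, observations, prior_shape=1.0, prior_rate=1.0):
    """Score in [0, 100): 100 * P(R <= r | data) under the posterior predictive."""
    shape, rate = posterior_params(observations, prior_shape, prior_rate)
    tail = (rate / (rate + r)) ** shape    # P(R > r | data)
    return 100.0 * (1.0 - tail)

history = [1.0] * 9                        # made-up past risk measures
for r in (0.0, 1.0, 10.0):
    print(r, round(risk_score(r, history), 1))
```

A value far above anything seen before gets a score close to 100, while a typical value lands mid-scale.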
Risk Scores using Bayesian Inference
Testing your algorithms
Be sure to test algorithms in different environments. Just because an algorithm
performs well in one environment does not mean it will generalize well to
other environments.
Since anomalies are rare, create synthetic datasets with built-in anomalies. If
you can’t identify the built-in anomalies, then you have a problem.
You should be constantly testing and fine-tuning your algorithms, so I
recommend building a test harness to automate testing.
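A minimal sketch of such a test harness: generate synthetic data with anomalies at known positions, run a detector, and report precision and recall. The detector shown is a simple median/MAD rule; all names, the anomaly size of 15 standard deviations, and the 3.5 cutoff are assumptions for illustration.

```python
import random
from statistics import median

def make_synthetic(n=500, n_anomalies=5, seed=0):
    """Gaussian noise with a few large shifts injected at known indices."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    anomaly_idx = set(rng.sample(range(n), n_anomalies))
    for i in anomaly_idx:
        data[i] += 15.0
    return data, anomaly_idx

def mad_detector(data, threshold=3.5):
    """Flag indices whose modified z-score exceeds the threshold."""
    center = median(data)
    mad = median(abs(x - center) for x in data)
    return {i for i, x in enumerate(data) if 0.6745 * abs(x - center) / mad > threshold}

def evaluate(detector, data, truth):
    """Return (precision, recall) of the detector against the known anomalies."""
    flagged = detector(data)
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth)
    return precision, recall

data, truth = make_synthetic()
precision, recall = evaluate(mad_detector, data, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Swapping in another detector or another synthetic generator only requires passing a different function, which keeps the comparison repeatable.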
Questions?
