Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and process monitoring in domains including energy, healthcare and finance.
In this workshop, we will cover the core techniques in anomaly detection and discuss advances in deep learning in this field.
Through case studies, we will discuss how anomaly detection techniques can be applied to various business problems. We will also demonstrate examples using R, Python, Keras and TensorFlow to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
1. Anomaly Detection
Techniques and Best Practices
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
2. About us:
• Data Science, Quant Finance and Machine Learning startup
• Technologies using MATLAB, Python and R
• Programs
  ◦ Analytics Certificate Program
  ◦ Fintech programs
• Platform
3. Sri Krishnamurthy, Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers
• Regular columnist for the Wilmott Magazine
• Author of the forthcoming book "Financial Modeling: A Case Study Approach", published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
4. What is anomaly detection?
• Anomalies or outliers are data points within a dataset that appear to deviate markedly from expected outputs under certain assumptions.
• Anomaly detection is the process of finding patterns in data that do not conform to prior expected behavior.
• Anomaly detection is increasingly employed on the big data captured by sensors, social media platforms, large networks, etc., in areas such as energy systems, medical devices, banking and network intrusion detection.
5. Anomaly vs Outliers
• Outliers are data points that are considered out of the ordinary or abnormal. This includes noise.
• Anomalies are a special kind of outlier that carries significant, critical or actionable information which could be of interest to analysts.
[Figure: two clusters, labeled 1 and 2. All points not in clusters 1 and 2 are outliers; point B, for which both X and Y are large, is an anomaly.]
7. Applications of Anomaly Detection
• Fraud detection
  ◦ Credit card fraud detection
    – By owner or by operation
  ◦ Mobile phone fraud/anomaly detection
    – Calling behavior, volume, etc.
  ◦ Insurance claim fraud detection
    – Medical malpractice
    – Auto insurance
  ◦ Insider trading detection
• E-commerce
  ◦ Pricing issues
  ◦ Network issues
8. Examples of Anomaly Detection
• Intrusion detection:
  ◦ Detect malicious activity in computer systems
  ◦ This could be host-based or network-based
• Medical anomalies
9. Examples of Anomaly Detection
• Manufacturing and sensors:
  ◦ Fault detection
  ◦ Heat and fire sensors
• Text data
  ◦ Novel topics and events
  ◦ Plagiarism
10. Anomaly Detection Methods
• Most outlier detection methods generate an output that falls into one of the following groups:
  ◦ Real-valued outlier score: quantifies the tendency of a data point to be an outlier by assigning it a score or probability.
  ◦ Binary label: obtained by applying a threshold to the outlier scores, labeling each point as an inlier or an outlier.
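A minimal base-R sketch of the two output types described above: a real-valued score (here simply the z-score magnitude) and the binary label obtained by thresholding it. The toy data and the 0.975 cut-off are illustrative assumptions, not part of the original slides.

```r
# Minimal sketch (base R): real-valued outlier scores vs. binary labels.
x <- c(2.1, 2.4, 2.2, 2.3, 9.7, 2.5, 2.2)   # toy data with one suspicious value

score <- abs(as.numeric(scale(x)))   # real-valued score: magnitude of the z-score
label <- score > qnorm(0.975)        # binary label: threshold converts scores to inlier/outlier

data.frame(x = x, score = round(score, 2), outlier = label)
```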
11. Four methodologies for anomaly detection
1. Graphical approaches
2. Statistical approaches
3. Machine learning approaches
4. Time series methods
13. Graphical approaches
• Graphical methods utilize extreme value analysis, in which outliers correspond to the statistical tails of probability distributions.
• Statistical tails are most commonly used for one-dimensional distributions, although the same concept can be applied to the multidimensional case.
• It is important to understand that all extreme values are outliers, but the reverse may not be true.
• For instance, in the one-dimensional dataset {1, 3, 3, 3, 50, 97, 97, 97, 100}, the observation 50 is close to the mean and isn't considered an extreme value, but since it is the most isolated point, it should be considered an outlier from a generative perspective.
14. Box plot
• A standardized way of displaying the variation of data based on the five-number summary: minimum, first quartile, median, third quartile and maximum.
• This plot does not make any assumptions about the underlying statistical distribution.
• Any data point falling outside the whiskers (the plotted "minimum" and "maximum") is considered an outlier.
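A minimal base-R sketch of the idea with illustrative data: boxplot.stats() applies the usual 1.5 × IQR whisker rule and returns the points plotted beyond the whiskers.

```r
# Minimal sketch (base R): boxplot-based outlier flagging.
x <- c(12, 13, 14, 15, 15, 16, 17, 18, 45)   # 45 lies well beyond the upper whisker

boxplot(x, main = "Box plot")
boxplot.stats(x)$out    # values outside the whiskers, i.e. the candidate outliers
```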
16. Scatter plot
• A mathematical diagram which uses Cartesian coordinates to plot ordered pairs, typically showing the relationship between two random variables.
• This plot is useful for detecting outliers.
• An outlier is a data point that doesn't seem to fit with the rest of the data points.
• A scatter plot can reveal outliers that are extreme in either one of the two variables or in their joint behavior.
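A minimal base-R sketch with simulated data: the injected point is unremarkable in each variable on its own but clearly breaks the joint relationship, which is exactly what a scatter plot exposes.

```r
# Minimal sketch (base R): a scatter plot revealing a joint outlier.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100, sd = 0.3)
x[100] <- 1.5; y[100] <- -4          # injected point that violates the x-y relationship

plot(x, y, main = "Scatter plot")
points(x[100], y[100], col = "red", pch = 19)   # highlight the suspect point
```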
18. Q-Q plot
• In statistics, a Q-Q plot is a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other.
• If the two distributions being compared are similar, the points in the Q-Q plot will approximately lie on the line y = x.
Source: Wikipedia
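A minimal base-R sketch with simulated data: comparing the sample against a normal distribution, points that fall far from the reference line are candidate outliers.

```r
# Minimal sketch (base R): normal Q-Q plot with injected extreme values.
set.seed(1)
x <- c(rnorm(98), 6, -7)   # two injected extreme values

qqnorm(x)
qqline(x)                  # points far from this line deviate from the normal pattern
```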
19. Adjusted quantile plot
• This plot identifies possible multivariate outliers by calculating the Mahalanobis distance of each point from the center of the data.
• The Mahalanobis distance between vectors x and y in R^n can be formulated as:
  d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}
  where x and y are random vectors from the same distribution with covariance matrix S.
• An outlier is defined as a point whose distance is larger than some predetermined value.
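A minimal base-R sketch of the distance-based idea, using the built-in iris data as an illustrative example; the 97.5% chi-square quantile is an assumed cut-off, whereas the adjusted quantile plot on the following slides adapts the threshold to the data.

```r
# Minimal sketch (base R): Mahalanobis distances with a chi-square cut-off.
X <- as.matrix(iris[, 1:4])

d2     <- mahalanobis(X, center = colMeans(X), cov = cov(X))   # squared distances from the center
cutoff <- qchisq(0.975, df = ncol(X))                          # assumed chi-square threshold
which(d2 > cutoff)                                             # indices of candidate outliers
```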
20. Adjusted quantile plot
• Before applying this method, and many other parametric multivariate methods, we first need to check whether the data are multivariate normally distributed, using multivariate normality tests such as Royston, Mardia, chi-square or univariate plots.
• In R, we use the 'mvoutlier' package, which implements the graphical approaches discussed above.
21. Adjusted quantile plot (see Graphical_Approach.ipynb)
• Min-max normalization is applied before diving into the analysis.
• A multivariate normality test is run first.
• The 'outliers' Boolean vector identifies the outliers.
• 'alpha' defines the maximum thresholding proportion.
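A minimal sketch of the workflow these annotations describe, assuming the 'mvoutlier' package is installed; the iris columns and the alpha value are illustrative choices, not taken from the notebook.

```r
# Minimal sketch: adjusted quantile plot with the 'mvoutlier' package.
library(mvoutlier)

X <- as.matrix(iris[, 1:4])
X <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))  # min-max normalization

res <- aq.plot(X, alpha = 0.05)   # alpha caps the maximum thresholding proportion
which(res$outliers)               # Boolean vector identifying the flagged observations
```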
24. Symbol plot
• This plot displays two-dimensional data using robust Mahalanobis distances based on the minimum covariance determinant (MCD) estimator with adjustment.
• The minimum covariance determinant (MCD) estimator looks for the subset of h data points whose covariance matrix has the smallest determinant.
• The four ellipsoids drawn in the plot show the Mahalanobis distances corresponding to the 25%, 50%, 75% and adjusted quantiles of the chi-square distribution.
25. Symbol plot (see Graphical_Approach.ipynb)
• The parameter 'quan' defines the fraction of observations used for the minimum covariance determinant estimate; the default is 0.5.
• 'alpha' defines the fraction of observations used for calculating the adjusted quantile.
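A minimal sketch assuming the 'mvoutlier' package; the two iris columns and the parameter values are illustrative, mirroring the 'quan' and 'alpha' arguments described above.

```r
# Minimal sketch: symbol plot of two-dimensional data with 'mvoutlier'.
library(mvoutlier)

X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])  # two-dimensional data

# 'quan': fraction of observations used for the MCD estimate (default 0.5)
# 'alpha': used for the adjusted quantile ellipsoid
symbol.plot(X, quan = 0.5, alpha = 0.025)
```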
27. Hypothesis testing
• This method draws conclusions about a sample point by testing whether it comes from the same distribution as the training data.
• Statistical tests, such as the t-test and the ANOVA table, can be used on multiple subsets of the data.
• Here, the level of significance, i.e., the probability of incorrectly rejecting the true null hypothesis, needs to be chosen.
• To apply this method in R, the 'outliers' package, which implements these statistical tests, is used.
28. Chi-square test
• The chi-square test is a simple test for detecting outliers in univariate data, based on the chi-square distribution of the squared difference between a data point and the sample mean.
• In this test, the sample variance is used as the estimator of the population variance.
• The chi-square test helps identify the lowest and highest values, since outliers can exist in both tails of the data.
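A minimal sketch assuming the 'outliers' package, with illustrative data; chisq.out.test() tests the value with the largest squared deviation from the sample mean, and opposite = TRUE tests the other tail.

```r
# Minimal sketch: chi-square outlier test from the 'outliers' package.
library(outliers)

x <- c(2.1, 2.4, 2.2, 2.3, 9.7, 2.5, 2.2)
chisq.out.test(x)                    # tests the most extreme value (here the highest, 9.7)
chisq.out.test(x, opposite = TRUE)   # tests the value in the opposite tail
```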
30. Grubbs' test
• This test is defined for the following hypotheses:
  H0: there are no outliers in the data set
  H1: there is exactly one outlier in the data set
• The Grubbs' test statistic is defined as:
  G = \frac{\max_{i} |x_i - \bar{x}|}{s}
  where \bar{x} is the sample mean and s is the sample standard deviation.
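A minimal sketch assuming the 'outliers' package, with illustrative data; grubbs.test() computes G as defined above together with its p-value.

```r
# Minimal sketch: Grubbs' test for a single outlier ('outliers' package).
library(outliers)

x <- c(2.1, 2.4, 2.2, 2.3, 9.7, 2.5, 2.2)
grubbs.test(x)   # a small p-value rejects H0, i.e. one outlier is present
```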
33. Scores
• Scores quantify the tendency of a data point to be an outlier by assigning it a score or probability.
• The most commonly used scores are:
  ◦ Normal (z) score: (x - \bar{x}) / s, the deviation from the mean divided by the standard deviation
  ◦ t-Student score: z \sqrt{n - 2} / \sqrt{n - 1 - z^2}
  ◦ Chi-square score: (x - \bar{x})^2 / s^2
  ◦ IQR score: the distance beyond the nearest quartile divided by the interquartile range (Q3 - Q1)
• By using the 'scores' function in R, p-values can be returned instead of scores.
35. Scores (see Statistical_Approach.ipynb)
• By setting 'prob' to a specific value, a logical vector is returned that flags as outliers the data points whose probabilities exceed this cut-off value.
• By setting 'type' to IQR, values below the first quartile or above the third quartile are considered, and the difference between each such value and the nearest quartile, divided by the IQR, is calculated.
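A minimal sketch assuming the 'outliers' package; the simulated data, the 0.95 probability cut-off and the 1.5 IQR limit are illustrative choices.

```r
# Minimal sketch: outlier scores with the 'outliers' package.
library(outliers)

set.seed(1)
x <- rnorm(50)
x[50] <- 6                                # one injected outlier

scores(x, type = "z")                     # normal (z) scores
scores(x, type = "chisq", prob = 0.95)    # TRUE where the chi-square probability exceeds 0.95
scores(x, type = "iqr", lim = 1.5)        # TRUE for points more than 1.5 IQRs beyond a quartile
```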
37. Linear regression
• Linear regression investigates the linear relationship between variables and predicts one variable based on one or more other variables. It can be formulated as:
  Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i
  where Y and the X_i are random variables, the \beta_i are regression coefficients and \beta_0 is a constant.
• In this model, the ordinary least squares estimator is usually used, minimizing the squared differences between the observed values of the dependent variable and the values predicted from the independent variables.
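A minimal base-R sketch of residual-based outlier flagging with an OLS fit on simulated data; the injected point and the |r| > 3 rule on standardized residuals are illustrative assumptions.

```r
# Minimal sketch (base R): flag points with large standardized residuals.
set.seed(1)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.2)
y[10] <- y[10] + 3                    # injected outlier

fit <- lm(y ~ x)                      # ordinary least squares fit
r   <- rstandard(fit)                 # standardized residuals
which(abs(r) > 3)                     # candidate outliers
```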
38. Piecewise/segmented regression
• A method in regression analysis in which the independent variable is partitioned into intervals, allowing a separate linear model to be fitted to the data in each range.
• This model can be applied when there are "breakpoints" and clearly two different linear relationships in the data, with a sudden, sharp change in directionality. Below is a simple segmented regression with a single breakpoint:
  Y = c_1 + b_1 X  for  X < X_b
  Y = c_2 + b_2 X  for  X > X_b
  where Y is the predicted value, X is the independent variable, c_1 and c_2 are constants, b_1 and b_2 are regression coefficients, and X_b is the breakpoint.
40. Piecewise/segmented regression
• For this example, we use the 'segmented' package in R, first illustrating piecewise regression for a two-dimensional data set which has a breakpoint around z = 0.5.
• pmax() (parallel maximum) is used when simulating y so that the slope changes at the breakpoint.
See Piecewise_Regression.ipynb
42. Piecewise/segmented regression
• Finally, outliers can be detected in each segment by setting rules on the residuals of the model.
• Here, a rule is set on the residuals of the points with z less than 0.5: points whose residuals fall outside the chosen cut-off are flagged as outliers.
See Piecewise_Regression.ipynb
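A minimal sketch of this pipeline assuming the 'segmented' package; the simulated data (with a kink near z = 0.5, built with pmax()), the injected outlier and the 3-standard-deviation residual rule are illustrative, not taken from the notebook.

```r
# Minimal sketch: segmented regression, then residual-based outlier rules per segment.
library(segmented)

set.seed(1)
z <- runif(200)
y <- 2 + 1.5 * pmax(z - 0.5, 0) + rnorm(200, sd = 0.05)   # slope changes at z = 0.5
y[15] <- y[15] + 0.4                                      # injected outlier

fit     <- lm(y ~ z)
seg.fit <- segmented(fit, seg.Z = ~z, psi = 0.5)          # estimate the breakpoint near 0.5
res     <- residuals(seg.fit)

which(z <  0.5 & abs(res) > 3 * sd(res))   # illustrative rule for the first segment
which(z >= 0.5 & abs(res) > 3 * sd(res))   # and for the second segment
```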
44. Autoencoder
• The goal is to have the reconstruction x̂ approximate the input x.
• Interesting applications include:
  ◦ Data compression
  ◦ Visualization
  ◦ Pre-training neural networks
45. Demo in Keras¹
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
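A minimal sketch in R, assuming the 'keras' package with a TensorFlow backend and loosely following the Keras blog post linked above; the layer sizes and the use of mean squared reconstruction error as an anomaly score are illustrative choices.

```r
# Minimal sketch: a small autoencoder whose reconstruction error serves as an anomaly score.
library(keras)

input   <- layer_input(shape = 784)                                    # e.g. flattened 28x28 images
encoded <- input %>% layer_dense(units = 32, activation = "relu")      # bottleneck representation
decoded <- encoded %>% layer_dense(units = 784, activation = "sigmoid")

autoencoder <- keras_model(input, decoded)
autoencoder %>% compile(optimizer = "adam", loss = "binary_crossentropy")

# After fitting on x_train (values scaled to [0, 1]):
# autoencoder %>% fit(x_train, x_train, epochs = 50, batch_size = 256)
# recon <- predict(autoencoder, x_test)
# score <- rowMeans((x_test - recon)^2)   # large reconstruction error suggests an anomaly
```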
46. Principal Component Analysis
• Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
• In outlier analysis, we perform principal component analysis and compute p-values to test for outliers.
https://en.wikipedia.org/wiki/Principal_component_analysis
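A minimal base-R sketch of one common way to turn PCA into an outlier score, using reconstruction error from a reduced number of components; the iris data, the choice of two components and the 97.5% quantile cut-off are illustrative assumptions.

```r
# Minimal sketch (base R): PCA reconstruction error as an outlier score.
X <- scale(iris[, 1:4])      # center and scale the data
k <- 2                       # number of retained principal components

pca   <- prcomp(X)
recon <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])   # reconstruction from the first k components
score <- rowSums((X - recon)^2)                    # squared reconstruction error per observation

which(score > quantile(score, 0.975))              # candidate outliers
```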
47. Clustering-based approaches
• These methods are suitable for unsupervised anomaly detection.
• They aim to partition the data into meaningful groups (clusters) based on the similarities and relationships between the data points.
• Each data point is assigned a degree of membership for each of the clusters.
• Anomalies are those data points that:
  ◦ Do not fit into any cluster.
  ◦ Belong to a particular cluster but are far away from the cluster centroid.
  ◦ Form small or sparse clusters.
48. Clustering-based approaches
• These methods partition the data into K clusters by assigning each data point to its closest cluster centroid, minimizing the within-cluster sum of squares (WSS):
  \mathrm{WSS} = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} (x_{ij} - \mu_{kj})^2
  where S_k is the set of observations in the kth cluster and \mu_{kj} is the mean of the jth variable of the cluster center of the kth cluster.
• Then, the top n points that are farthest away from their nearest cluster centers are selected as outliers.
50. Clustering-based approaches
• The 'Kmod' package in R is used to show the application of the K-means model.
• In this example the number of clusters is chosen from the bend (elbow) graph and then passed to the k-mod function.
See Clustering_Approach.ipynb
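A minimal base-R sketch of the distance-to-centroid idea using kmeans() rather than the 'Kmod' package used in the notebook; the iris data, k = 3 and the top-5 selection are illustrative assumptions.

```r
# Minimal sketch (base R): k-means clustering, then flag the points farthest from their centroid.
set.seed(1)
X <- as.matrix(iris[, 1:4])
k <- 3

km         <- kmeans(X, centers = k, nstart = 25)
d_centroid <- sqrt(rowSums((X - km$centers[km$cluster, ])^2))   # distance to assigned centroid

head(order(d_centroid, decreasing = TRUE), 5)                   # top-5 candidate outliers
```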
54. Time-series method
• The time-series model is used to identify outliers in univariate time-series data only.
• To apply this model, we use the 'AnomalyDetection' package in R.
• This package was published by Twitter for detecting anomalies in time-series data in the presence of seasonality and an underlying trend, using statistical approaches.
• Since this package uses a specific algorithm to detect anomalies, we go over it in detail on the next slide.
55. Anomaly detection, R package
• Twitter's R package: https://github.com/twitter/AnomalyDetection
• Seasonal Hybrid ESD (S-H-ESD), which builds upon the generalized ESD test, is the underlying algorithm of this package.
• The algorithm employs time-series decomposition and statistical metrics together with the ESD test.
• Since time-series data exhibit a huge variety of patterns, time-series decomposition, a statistical method, is used to decompose the data into its four components.
• The four components are:
  1. Trend: the long-term progression of the series
  2. Cyclical: variations in recognizable cycles
  3. Seasonal: seasonal variations or fluctuations
  4. Irregular: random, irregular influences
• Find more about the ESD test in the tutorial slides.
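A minimal sketch assuming Twitter's 'AnomalyDetection' package is installed from the GitHub repository above; raw_data is the example series shipped with the package, and the max_anoms and direction settings are illustrative.

```r
# Minimal sketch: S-H-ESD anomaly detection on a seasonal time series.
# install via: devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)

data(raw_data)                                 # (timestamp, count) example data from the package
res <- AnomalyDetectionTs(raw_data,
                          max_anoms = 0.02,    # flag at most 2% of the points
                          direction = "both",  # look for both high and low anomalies
                          plot      = TRUE)
res$anoms                                      # timestamps and values of the detected anomalies
```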
57. Summary
We have covered anomaly detection:
Introduction
✓ Definition of anomaly detection and its importance in energy systems
✓ Different types of anomaly detection methods: statistical, graphical and machine learning methods
Graphical approach
✓ Graphical methods include the box plot, scatter plot, adjusted quantile plot and symbol plot, which demonstrate outliers graphically
✓ The main assumption for applying graphical approaches is multivariate normality
✓ The Mahalanobis distance is mainly used for calculating the distance of a point from the center of a multivariate distribution
Statistical approach
✓ Statistical hypothesis testing includes the chi-square and Grubbs' tests
✓ Statistical methods may use either scores or p-values as thresholds to detect outliers
Machine learning approach
✓ Both supervised and unsupervised learning methods can be used for outlier detection
✓ Piecewise (segmented) regression can be used to identify outliers based on the residuals in each segment
✓ In the K-means clustering method, outliers are points that do not belong to any cluster, are far away from their cluster centroids, or form sparse clusters
✓ In PCA and autoencoder methods, points that are not reconstructed close to the original points are treated as anomalies
Time series
✓ Temporal outlier detection to detect anomalies, robust from a statistical standpoint, in the presence of seasonality and an underlying trend
60. Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.