Slide 2: Agenda
Part 1: Anomaly detection – taste of theory and code
  Statistical techniques
Part 2: Tools
Part 3: Clustering

High-level message:
IoE and every Cloud solution produce Big Data. A permanent focus on exploiting this Big Data allows new features and even new products to be developed. With the right expertise, we can choose between adopting, collaborating, buying, or developing.
Slide 3: Motivation example – detect failing servers on a network
Use case: a computer fan in one of your servers is not working.
Features that can help: 1) CPU load 2) temperature sensor
[Scatter plot: x_1 = CPU load (0–1) vs. x_2 = temperature (°C, 30–100); the combination of features helps reveal the anomaly]
Slide 4: Motivation example – detect failing servers on a network
Manual process:
1. Ask an expert and define the rule: if (cpuLoad < thr1 && tempSensor > thr2) -> Anomaly
2. Implementation: requires a rules language. Or let's just hardcode it for now!
Fundamental problems:
- Not scalable: in use cases, in rules, in features, in hardware
- Very static, not adaptable. Example: false positives if we decide to optimize the energy efficiency of our data center
- A posteriori knowledge; delays of months or years
[Scatter plot: x_1 = CPU load (0–1) vs. x_2 = temperature (°C, 30–100), with the manual rule drawn as thresholds]
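The "just hardcode it" option can be sketched as below; `thr1` and `thr2` are hypothetical thresholds an expert would supply (the deck does not give values), which is exactly why this approach is static and hard to adapt:

```python
# Hypothetical expert-chosen thresholds (not given in the slides).
THR1_CPU_LOAD = 0.2   # "low" CPU load
THR2_TEMP_C = 80.0    # "high" temperature, in degrees Celsius

def is_anomaly(cpu_load: float, temp_c: float) -> bool:
    """The hardcoded manual rule: low CPU load combined with high temperature."""
    return cpu_load < THR1_CPU_LOAD and temp_c > THR2_TEMP_C

print(is_anomaly(0.1, 95.0))  # failing fan: True
print(is_anomaly(0.9, 95.0))  # busy but healthy server: False
```

Any change in the environment (e.g. an energy-efficiency policy that lowers typical CPU load) silently invalidates the constants.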
Slide 5: Ideal anomaly detection for your domain
Vision (does not exist yet):
- Universal
- Scalable
- Real-time/offline, pluggable, …
In the next slides: a mathematical intro to universal, scalable solutions – with their limitations.
Why now?
- The switch to Big Data/Cloud brings new challenges.
- The benefits are easy to see – many others (Google, FB, …) use anomaly detection.
- New features for our products.
Slide 11: Density estimation
Training set: {x^{(1)}, ..., x^{(m)}}; each example is x \in \mathbb{R}^n.
Assume x_1 \sim N(\mu_1, \sigma_1^2), x_2 \sim N(\mu_2, \sigma_2^2), ..., x_n \sim N(\mu_n, \sigma_n^2).
p(x) = p(x_1; \mu_1, \sigma_1^2) \cdot p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)
Slide 12: Anomaly detection algorithm
1. Choose features x_i that you think might be indicative of anomalous examples.
2. Fit parameters \mu_1, ..., \mu_n, \sigma_1^2, ..., \sigma_n^2:
   \mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \quad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)} - \mu_j)^2
3. Given a new example x, compute p(x):
   p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)
   Flag an anomaly if p(x) < \varepsilon.
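The fit-then-score algorithm above can be sketched with numpy; the training data, the two features (CPU load and temperature), and the threshold `epsilon` are all assumptions for illustration:

```python
import numpy as np

def fit_gaussian_params(X):
    """Fit per-feature mean and variance on the training set."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # (1/m) * sum((x - mu)^2)
    return mu, sigma2

def p(x, mu, sigma2):
    """Product of univariate Gaussian densities for one example."""
    coeff = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((x - mu) ** 2) / (2.0 * sigma2))
    return float(np.prod(coeff * expo))

# Synthetic "normal" servers: CPU load ~ N(0.5, 0.1^2), temp ~ N(50, 5^2).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.5, 50.0], scale=[0.1, 5.0], size=(500, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-6                          # assumed; choose it on a CV set
x_new = np.array([0.05, 95.0])          # low CPU load, high temperature
print(p(x_new, mu, sigma2) < epsilon)   # flagged as an anomaly
```

Note the density of the failing-fan example is many orders of magnitude below that of typical training points, so the exact choice of `epsilon` is not delicate here.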
Slide 15: The importance of real-number evaluation
When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating the learning algorithm.
Assume we have some labeled data: anomalous examples (~0–50) and non-anomalous examples (~100–10,000). (y = 0 if normal, y = 1 if anomalous.)
- Training set: 60% of the data (assume normal/not anomalous examples)
- Cross-validation set: 20% of the normal data + 50% of the anomalies
- Test set: 20% of the normal data + 50% of the anomalies
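The 60/20/20 split above (with anomalies reserved for the CV and test sets) can be sketched as follows; the counts and the synthetic data are hypothetical:

```python
import numpy as np

# Hypothetical counts in the spirit of the slide: 10,000 normal, 50 anomalous.
rng = np.random.default_rng(1)
normal = rng.normal(size=(10_000, 2))
anomalous = rng.normal(loc=5.0, size=(50, 2))

# 60/20/20 split of the normal data; anomalies go only to CV and test sets.
n = len(normal)
train = normal[: int(0.6 * n)]                       # 6,000 normal examples
cv_normal = normal[int(0.6 * n): int(0.8 * n)]       # 2,000 normal
test_normal = normal[int(0.8 * n):]                  # 2,000 normal
cv_anom, test_anom = anomalous[:25], anomalous[25:]  # 50% / 50% of anomalies

X_cv = np.vstack([cv_normal, cv_anom])
y_cv = np.concatenate([np.zeros(len(cv_normal)), np.ones(len(cv_anom))])
print(train.shape, X_cv.shape)  # (6000, 2) (2025, 2)
```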
Slide 16: Algorithm evaluation
Fit model p(x) on the training set.
On a cross-validation/test example x, predict:
   y = 1 if p(x) < \varepsilon (anomaly), \quad y = 0 if p(x) \ge \varepsilon (normal)
Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall:
   prec = \frac{tp}{tp + fp}, \quad rec = \frac{tp}{tp + fn}
- F1-score:
   F_1 = \frac{2 \cdot prec \cdot rec}{prec + rec}
Can also use the cross-validation set to choose the parameter \varepsilon.
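The three metrics above are a few lines of code; the confusion counts in the example are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1

# Hypothetical counts from a cross-validation run.
prec, rec, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(prec, 3), round(rec, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Scanning candidate values of \varepsilon and keeping the one with the best CV F1-score is the usual way to pick the threshold, since raw accuracy is misleading on such skewed data.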
Slide 19: Monitoring computers in a data center
Choose features that might take on unusually large or small values in the event of an anomaly:
x_1 = memory use of computer
x_2 = number of disk accesses/sec
x_3 = CPU load
x_4 = network traffic
x_5 = \frac{\text{CPU load}}{\text{network traffic}}
x_6 = \frac{\text{temperature}}{\text{CPU load}}
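Deriving such ratio features is a one-liner; the telemetry values below are made up, and the exact numerator/denominator of the second ratio is an assumption (the slide's fraction is reconstructed from damaged text):

```python
# Hypothetical raw telemetry for one machine.
memory_use = 0.7        # x1
disk_accesses = 120.0   # x2, per second
cpu_load = 0.05         # x3
network_traffic = 0.9   # x4
temperature_c = 95.0    # assumed temperature reading, in degrees Celsius

# Ratio features: they blow up when the raw features move in unusual
# combinations (e.g. high temperature while the CPU is almost idle),
# even when each raw feature alone looks plausible.
x5 = cpu_load / network_traffic
x6 = temperature_c / cpu_load
print(x5, x6)
```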
Slide 31: Anomaly detection with the multivariate Gaussian
1. Fit model p(x) by setting
   \mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}, \quad \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T
2. Given a new example x, compute
   p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)
   Flag an anomaly if p(x) < \varepsilon.
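The two steps above map directly onto numpy/scipy; the correlated training distribution and the threshold are assumptions for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic correlated data: CPU load and temperature rise together.
rng = np.random.default_rng(2)
cov_true = np.array([[0.01, 0.4],
                     [0.4, 25.0]])
X = rng.multivariate_normal(mean=[0.5, 50.0], cov=cov_true, size=1000)

# Step 1: maximum-likelihood fit of mu and Sigma.
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False, bias=True)  # bias=True gives the 1/m estimate

# Step 2: evaluate p(x) for a new example and compare with epsilon.
model = multivariate_normal(mean=mu, cov=Sigma)
epsilon = 1e-6                              # assumed; tune on a CV set
x_new = np.array([0.05, 95.0])              # idle CPU but very hot
print(model.pdf(x_new) < epsilon)           # flagged as an anomaly
```

Because \Sigma is fitted with off-diagonal terms, this example is flagged for its unusual *combination* of values, without hand-built ratio features.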
Slide 32: Relationship to the original model
Original model: p(x) = p(x_1; \mu_1, \sigma_1^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)
Corresponds to a multivariate Gaussian p(x; \mu, \Sigma) where the covariance matrix is diagonal:
   \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}
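This equivalence is easy to check numerically: the product of univariate Gaussian densities equals the multivariate Gaussian density with \Sigma = diag(\sigma_1^2, ..., \sigma_n^2). The parameter values below are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.5, 50.0])
sigma2 = np.array([0.01, 25.0])  # per-feature variances
x = np.array([0.6, 55.0])

# Original model: product of independent univariate Gaussians.
p_orig = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

# Multivariate Gaussian with a diagonal covariance matrix.
p_mv = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(x)

print(np.isclose(p_orig, p_mv))  # True
```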
Slide 33: Original model vs. multivariate Gaussian

Original model:
- Manually create features to capture anomalies where x_1, x_2, ... take unusual combinations of values.
- Computationally cheaper (alternatively, scales better to large n).
- OK even if m (training set size) is small.

Multivariate Gaussian:
- Automatically captures correlations between features.
- Computationally more expensive.
- Must have m > n, or else \Sigma is non-invertible.
Slide 34: Agenda
- Anomaly detection – taste of theory and code
  - Statistical techniques
- Clustering: K-means algorithm
- PCA
- Neural network
- Practical tips: missing values, SW libraries, …
- Working with textual data, similarity techniques
- Tools
- Break