Slide 2: Agenda
Part 1: Anomaly detection – taste of theory and code
  Statistical techniques
Part 2: Tools
Part 3: Clustering

High-level message:
IoE and every Cloud solution produce Big Data. A permanent focus on exploiting this Big Data allows new features and even new products to be developed. With the right expertise, we can choose between adopting, collaborating, buying, or developing.
Slide 3: Motivation example – detect failing servers on a network
Use case: a computer fan in one of your servers is not working.
Features that can help: 1) CPU load 2) temperature sensor
[Scatter plot: x_1 = CPU load (0–1) vs. x_2 = temperature (°C, 30–100); the combination of features helps reveal the anomaly]
Slide 4: Motivation example – detect failing servers on a network
Manual process:
1. Ask an expert and define the rule: if (cpuLoad < thr1 && tempSensor > thr2) -> Anomaly
2. Implementation: requires a rules language. Or let's just hardcode it for now!
Fundamental problems:
- Not scalable: in use cases, in rules, in features, in hardware
- Very static, not adaptable. Example: false positives if we decide to optimize the energy efficiency of our data center
- A posteriori knowledge; delays of months or years
[Scatter plot: x_1 = CPU load (0–1) vs. x_2 = temperature (°C, 30–100), with the manual rule drawn as thresholds]
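The "just hardcode it" option can be sketched as below; `thr1` and `thr2` are hypothetical thresholds an expert would supply (the deck does not give values), which is exactly why this approach is static and hard to adapt:

```python
# Hypothetical expert-chosen thresholds (not given in the slides).
THR1_CPU_LOAD = 0.2   # "low" CPU load
THR2_TEMP_C = 80.0    # "high" temperature, in degrees Celsius

def is_anomaly(cpu_load: float, temp_c: float) -> bool:
    """The hardcoded manual rule: low CPU load combined with high temperature."""
    return cpu_load < THR1_CPU_LOAD and temp_c > THR2_TEMP_C

print(is_anomaly(0.1, 95.0))  # failing fan: True
print(is_anomaly(0.9, 95.0))  # busy but healthy server: False
```

Any change in the environment (e.g. an energy-efficiency policy that lowers typical CPU load) silently invalidates the constants.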
Slide 5: Ideal anomaly detection for your domain
Vision (does not exist yet):
- Universal
- Scalable
- Real-time/offline, pluggable, …
In the next slides: a mathematical intro to universal, scalable solutions – with their limitations.
Why now?
- The switch to Big Data/Cloud brings new challenges.
- The benefits are easy to see – many others (Google, FB, …) use anomaly detection.
- New features for our products.
Slide 11: Density estimation
Training set: {x^{(1)}, ..., x^{(m)}}; each example is x \in \mathbb{R}^n.
Assume x_1 \sim N(\mu_1, \sigma_1^2), x_2 \sim N(\mu_2, \sigma_2^2), ..., x_n \sim N(\mu_n, \sigma_n^2).
p(x) = p(x_1; \mu_1, \sigma_1^2) \cdot p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)
Slide 12: Anomaly detection algorithm
1. Choose features x_i that you think might be indicative of anomalous examples.
2. Fit parameters \mu_1, ..., \mu_n, \sigma_1^2, ..., \sigma_n^2:
   \mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \quad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)} - \mu_j)^2
3. Given a new example x, compute p(x):
   p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)
   Flag an anomaly if p(x) < \varepsilon.
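The fit-then-score algorithm above can be sketched with numpy; the training data, the two features (CPU load and temperature), and the threshold `epsilon` are all assumptions for illustration:

```python
import numpy as np

def fit_gaussian_params(X):
    """Fit per-feature mean and variance on the training set."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # (1/m) * sum((x - mu)^2)
    return mu, sigma2

def p(x, mu, sigma2):
    """Product of univariate Gaussian densities for one example."""
    coeff = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((x - mu) ** 2) / (2.0 * sigma2))
    return float(np.prod(coeff * expo))

# Synthetic "normal" servers: CPU load ~ N(0.5, 0.1^2), temp ~ N(50, 5^2).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.5, 50.0], scale=[0.1, 5.0], size=(500, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-6                          # assumed; choose it on a CV set
x_new = np.array([0.05, 95.0])          # low CPU load, high temperature
print(p(x_new, mu, sigma2) < epsilon)   # flagged as an anomaly
```

Note the density of the failing-fan example is many orders of magnitude below that of typical training points, so the exact choice of `epsilon` is not delicate here.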
Slide 15: The importance of real-number evaluation
When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating the learning algorithm.
Assume we have some labeled data: anomalous examples (~0–50) and non-anomalous examples (~100–10,000). (y = 0 if normal, y = 1 if anomalous.)
- Training set: 60% of the data (assume normal/not anomalous examples)
- Cross-validation set: 20% of the normal data + 50% of the anomalies
- Test set: 20% of the normal data + 50% of the anomalies
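The 60/20/20 split above (with anomalies reserved for the CV and test sets) can be sketched as follows; the counts and the synthetic data are hypothetical:

```python
import numpy as np

# Hypothetical counts in the spirit of the slide: 10,000 normal, 50 anomalous.
rng = np.random.default_rng(1)
normal = rng.normal(size=(10_000, 2))
anomalous = rng.normal(loc=5.0, size=(50, 2))

# 60/20/20 split of the normal data; anomalies go only to CV and test sets.
n = len(normal)
train = normal[: int(0.6 * n)]                       # 6,000 normal examples
cv_normal = normal[int(0.6 * n): int(0.8 * n)]       # 2,000 normal
test_normal = normal[int(0.8 * n):]                  # 2,000 normal
cv_anom, test_anom = anomalous[:25], anomalous[25:]  # 50% / 50% of anomalies

X_cv = np.vstack([cv_normal, cv_anom])
y_cv = np.concatenate([np.zeros(len(cv_normal)), np.ones(len(cv_anom))])
print(train.shape, X_cv.shape)  # (6000, 2) (2025, 2)
```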
Slide 16: Algorithm evaluation
Fit model p(x) on the training set.
On a cross-validation/test example x, predict:
   y = 1 if p(x) < \varepsilon (anomaly), \quad y = 0 if p(x) \ge \varepsilon (normal)
Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall:
   prec = \frac{tp}{tp + fp}, \quad rec = \frac{tp}{tp + fn}
- F1-score:
   F_1 = \frac{2 \cdot prec \cdot rec}{prec + rec}
Can also use the cross-validation set to choose the parameter \varepsilon.
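The three metrics above are a few lines of code; the confusion counts in the example are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1

# Hypothetical counts from a cross-validation run.
prec, rec, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(prec, 3), round(rec, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Scanning candidate values of \varepsilon and keeping the one with the best CV F1-score is the usual way to pick the threshold, since raw accuracy is misleading on such skewed data.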
Slide 19: Monitoring computers in a data center
Choose features that might take on unusually large or small values in the event of an anomaly:
x_1 = memory use of computer
x_2 = number of disk accesses/sec
x_3 = CPU load
x_4 = network traffic
x_5 = \frac{\text{CPU load}}{\text{network traffic}}
x_6 = \frac{\text{temperature}}{\text{CPU load}}
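Deriving such ratio features is a one-liner; the telemetry values below are made up, and the exact numerator/denominator of the second ratio is an assumption (the slide's fraction is reconstructed from damaged text):

```python
# Hypothetical raw telemetry for one machine.
memory_use = 0.7        # x1
disk_accesses = 120.0   # x2, per second
cpu_load = 0.05         # x3
network_traffic = 0.9   # x4
temperature_c = 95.0    # assumed temperature reading, in degrees Celsius

# Ratio features: they blow up when the raw features move in unusual
# combinations (e.g. high temperature while the CPU is almost idle),
# even when each raw feature alone looks plausible.
x5 = cpu_load / network_traffic
x6 = temperature_c / cpu_load
print(x5, x6)
```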
Slide 31: Anomaly detection with the multivariate Gaussian
1. Fit model p(x) by setting
   \mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}, \quad \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T
2. Given a new example x, compute
   p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)
   Flag an anomaly if p(x) < \varepsilon.
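The two steps above map directly onto numpy/scipy; the correlated training distribution and the threshold are assumptions for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic correlated data: CPU load and temperature rise together.
rng = np.random.default_rng(2)
cov_true = np.array([[0.01, 0.4],
                     [0.4, 25.0]])
X = rng.multivariate_normal(mean=[0.5, 50.0], cov=cov_true, size=1000)

# Step 1: maximum-likelihood fit of mu and Sigma.
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False, bias=True)  # bias=True gives the 1/m estimate

# Step 2: evaluate p(x) for a new example and compare with epsilon.
model = multivariate_normal(mean=mu, cov=Sigma)
epsilon = 1e-6                              # assumed; tune on a CV set
x_new = np.array([0.05, 95.0])              # idle CPU but very hot
print(model.pdf(x_new) < epsilon)           # flagged as an anomaly
```

Because \Sigma is fitted with off-diagonal terms, this example is flagged for its unusual *combination* of values, without hand-built ratio features.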
Slide 32: Relationship to the original model
Original model: p(x) = p(x_1; \mu_1, \sigma_1^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)
Corresponds to a multivariate Gaussian p(x; \mu, \Sigma) where the covariance matrix is diagonal:
   \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}
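This equivalence is easy to check numerically: the product of univariate Gaussian densities equals the multivariate Gaussian density with \Sigma = diag(\sigma_1^2, ..., \sigma_n^2). The parameter values below are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.5, 50.0])
sigma2 = np.array([0.01, 25.0])  # per-feature variances
x = np.array([0.6, 55.0])

# Original model: product of independent univariate Gaussians.
p_orig = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

# Multivariate Gaussian with a diagonal covariance matrix.
p_mv = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(x)

print(np.isclose(p_orig, p_mv))  # True
```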
Slide 33: Original model vs. multivariate Gaussian

Original model:
- Manually create features to capture anomalies where x_1, x_2, ... take unusual combinations of values.
- Computationally cheaper (alternatively, scales better to large n).
- OK even if m (training set size) is small.

Multivariate Gaussian:
- Automatically captures correlations between features.
- Computationally more expensive.
- Must have m > n, or else \Sigma is non-invertible.
Slide 34: Agenda
- Anomaly detection – taste of theory and code
  - Statistical techniques
- Clustering: K-means algorithm
- PCA
- Neural network
- Practical tips: missing values, SW libraries, …
- Working with textual data, similarity techniques
- Tools
- Break