Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark (Numenta)
Subutai Ahmad, VP of Research, presents NAB and discusses the need for evaluating real-time anomaly detection algorithms. This presentation was delivered at MLconf (the Machine Learning Conference) in San Francisco in 2015.
Abstract:
There’s no question that we are seeing an increase in the availability of streaming, time-series data. Largely driven by the rise of the Internet of Things (IoT) and connected real-time data sources, we now have an enormous number of applications with sensors that produce important data that changes over time. This data presents a challenge and opportunity for businesses across every industry. How do they handle the onslaught of streaming data? How can they exploit it to make decisions in real-time? One way is to detect, in real time, when something unusual occurs. Early anomaly detection in streaming data has significant implications, yet can be very difficult to execute. It requires detectors to process data in real-time, not batches, and learn while simultaneously making predictions. In this talk, we’ll look at algorithms designed for such data and analyze the components that lead to optimal performance. We’ll also discuss a new benchmark with a labeled, real-world data set, designed to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. How do we score in a way that rewards algorithms that detect all anomalies as soon as possible, triggers no false alarms, works with real-world time-series data across a variety of domains, and automatically adapts to changing statistics?
Subutai Ahmad, VP of Research, Numenta, at MLconf SF - 11/13/15 (MLconf)
Real-time Anomaly Detection for Real-time Data Needs: Much of the world’s data is becoming streaming, time-series data, where anomalies carry significant information in often-critical situations. Examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real time, not batches, and to learn while simultaneously making predictions. Are there algorithms up for the challenge? Which are the most capable? The Numenta Anomaly Benchmark (NAB) attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. These characteristics are formalized in NAB, using a custom scoring algorithm to evaluate detectors on a benchmark dataset of labeled, real-world time-series data. We present these components and describe the end-to-end scoring process. We give results and analyses for several algorithms to illustrate NAB in action. The goal for NAB is to provide a standard, open-source framework with which we can compare and evaluate different algorithms for detecting anomalies in streaming data.
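To make the scoring idea concrete, here is a minimal Python sketch of NAB-style windowed scoring. This is not NAB's exact implementation: the sigmoid steepness, the false-positive weight, and the single-window interface are simplifying assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nab_style_score(window, detections, fp_weight=0.11):
    """Score a detector against one labeled anomaly window (simplified).

    window     -- (start, end) index bounds of the anomaly window
    detections -- indices where the detector raised an alarm, sorted
    fp_weight  -- penalty per false positive (an application-profile knob)
    """
    start, end = window
    width = float(end - start)
    score = 0.0
    hits = [d for d in detections if start <= d <= end]
    if hits:
        # Reward the earliest detection inside the window: detecting at
        # the window start scores close to +1, at the window end about 0.
        rel = (hits[0] - end) / width        # in [-1, 0]
        score += 2.0 * sigmoid(-5.0 * rel) - 1.0
    else:
        score -= 1.0                         # missed the anomaly entirely
    # Detections outside the window count as false positives.
    score -= fp_weight * sum(1 for d in detections if not (start <= d <= end))
    return score
```

The shape captures the stated goals: earlier detections earn more credit, a missed window costs a full point, and every false alarm subtracts a profile-dependent penalty.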
Extending Flink for anomaly detection with Hierarchical Temporal Memory (HTM). Presented at Bay Area Apache Flink Meetup, in San Jose on June 27, 2016.
https://github.com/htm-community/flink-htm
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ... (tboubez)
This is my presentation from LISA 2014 in Seattle on November 14, 2014.
Most IT Ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this haystack of data and extracting signal from the noise is not easy and generates too many false positives.
In this talk I will show some of the types of anomalies commonly found in dynamic data center environments and discuss the top 5 things I learned while building algorithms to find them. You will see how various Gaussian-based techniques work (and why they don’t!), and we will go into some non-parametric methods that you can use to great advantage.
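As an illustration of the contrast the talk draws, here is a minimal sketch (in Python, which the talk itself does not use) of a Gaussian k-sigma detector next to a non-parametric median/MAD alternative:

```python
import statistics

def gaussian_flags(values, k=3.0):
    """Classic k-sigma rule: flag points more than k standard deviations
    from the mean. A single large outlier inflates both the mean and
    sigma, so it can mask itself, and skewed data breaks the assumption."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [abs(v - mu) > k * sigma for v in values]

def nonparametric_flags(values, k=3.0):
    """Robust alternative: flag points more than k scaled MADs from the
    median. No distributional assumption; an outlier barely moves the
    median, so it cannot mask itself."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad or 1e-9  # 1.4826 makes MAD comparable to sigma
    return [abs(v - med) > k * scale for v in values]
```

On `[10, 11, 10, 9, 10, 100]` the Gaussian version misses the obvious outlier, because the outlier itself inflates sigma past the 3-sigma bar, while the MAD version flags it.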
A Practical Guide to Anomaly Detection for DevOps (BigPanda)
Recent years have seen an explosion in the volume of data that modern production environments generate. Making fast, educated decisions about production incidents is more challenging than ever. BigPanda's team is passionate about solutions, such as anomaly detection, that tackle this very challenge.
This is a follow-up to a previous talk on hacking my energy monitor. In this talk I go into detail on how I used machine learning techniques in the area of anomaly detection to draw more value from my data collection.
Finding bad apples early: Minimizing performance impact (Arun Kejariwal)
The big data era is characterized by the ever-increasing velocity and volume of data. To store and analyze this ever-growing data, the operational footprint of data stores and Hadoop has also grown over time. (As per a recent report from IDC, spending on big data infrastructure is expected to reach $41.5 billion by 2018.) These clusters comprise several thousand nodes, and their high performance is vital for delivering the best user experience and team productivity.
The performance of such clusters is often limited by slow/bad nodes. Finding slow nodes in large clusters is akin to finding a needle in a haystack; hence, manual identification of slow/bad nodes is not practical. To this end, we developed a novel statistical technique to automatically detect slow/bad nodes in clusters comprising hundreds to thousands of nodes. We modeled the problem as a classification problem and employed a simple, yet very effective, distance measure to determine slow/bad nodes. The key highlights of the proposed technique are the following:
* Robustness against anomalies (note that anomalies may occur, for example, due to an ad-hoc heavyweight job on a Hadoop cluster)
* Given the varying data characteristics of different services, no one model fits all; consequently, we parameterized the threshold used for classification
The proposed technique works well with both hourly and daily data, and has been in use in production by multiple services. This has not only eliminated manual investigation efforts, but has also mitigated the impact of slow nodes, which used to get detected after several weeks/months of lag!
We walk the audience through how the techniques are being used with real data.
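The talk does not publish its exact distance measure; the following Python sketch shows one plausible shape of the idea - summarize each node robustly, then flag nodes far from the cluster-typical value under a per-service threshold. The MAD-based distance and the 1.4826 scaling are assumptions for illustration, not the authors' method.

```python
import statistics

def find_slow_nodes(latencies_by_node, threshold=3.0):
    """Flag slow/bad nodes by distance from cluster-typical behavior.

    latencies_by_node -- mapping of node name to a list of latency samples
    threshold         -- per-service knob (no one value fits all services)

    Each node is summarized by its median sample, which is robust to the
    occasional anomaly such as an ad-hoc heavyweight job; nodes whose
    summary sits far above the cluster median (in scaled-MAD units) are
    reported as slow.
    """
    summaries = {n: statistics.median(s) for n, s in latencies_by_node.items()}
    center = statistics.median(summaries.values())
    mad = statistics.median(abs(v - center) for v in summaries.values())
    scale = 1.4826 * mad or 1e-9
    return sorted(n for n, v in summaries.items()
                  if (v - center) / scale > threshold)
```

The one-sided test (`> threshold` rather than an absolute value) reflects that only slow nodes, not unusually fast ones, hurt cluster performance.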
Anomaly detection in real-time data streams using Heron (Arun Kejariwal)
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
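A minimal sketch of the kind of approach described above - a bounded window standing in for sampling, with median/MAD as the robust measure. This is illustrative only, not Heron's or Twitter's actual detector.

```python
from collections import deque
import statistics

class RobustStreamDetector:
    """Streaming detector sketch: keep a bounded recent window (a simple
    stand-in for reservoir sampling) and flag points far from the window
    median in scaled-MAD units. Median and MAD resist contamination by
    the very anomalies being detected, unlike mean and stddev."""

    def __init__(self, window=100, threshold=4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, x):
        """Process one point; return True if it looks anomalous."""
        anomalous = False
        if len(self.buf) >= 10:          # warm-up before judging
            med = statistics.median(self.buf)
            mad = statistics.median(abs(v - med) for v in self.buf)
            scale = 1.4826 * mad or 1e-9
            anomalous = abs(x - med) / scale > self.threshold
        self.buf.append(x)
        return anomalous
```

A production version would trade the per-point median recomputation for an incremental estimate; the point here is only the robust-statistics shape of the decision.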
Naveed Ahmad, Microsoft
Anomaly detection is the de facto standard in cyber defense. However, anomaly detection produces a large number of false alerts on highly unusual but benign activity. Security detections based on supervised machine learning can reduce the noise, but they require a large number of labelled attack examples for training, which are not always available.
Successful cyber-attacks against a well-guarded online service like Office 365 are scarce. There are hundreds of thousands of machines with daily benign activity against a meager few hundred attack examples collected over the years from pen-test engagements. Training a well-performing binary classifier with supervised machine learning on such a skewed dataset, with so few attack examples, is extremely hard.
The presentation covers various techniques for crafting synthetic attack examples from known past attacks. These techniques are used to train the machine learning models guarding Office 365 online services against cyber-attacks, predicting malicious activity with alertable accuracy. The presentation describes these techniques with use cases from Office 365 services and the resulting model performance improvements.
Techniques discussed in the presentation are:
Cartesian Bootstrapping - This technique samples benign activities from thousands of machines and combines them with known malicious examples via a Cartesian product, producing a large number of synthetic attack examples with varying degrees of embedded benign noise. This helps produce models that classify malicious and benign examples with greater accuracy and fewer false alerts.
Normalized Sampler Bootstrapping - This technique is very useful for micro-services with very few machines. It instead generates synthetic benign examples to match the relatively larger number of malicious examples borrowed from other services. The synthetic examples are generated by sampling benign noise from the examples after removing outliers. This allows measuring the effectiveness of a model for a micro-service when the model was trained on another, larger service.
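A toy sketch of the Cartesian bootstrapping idea described above. The set-of-features representation and the union operation are assumptions for illustration; the real pipeline's feature encoding is not described in the abstract.

```python
from itertools import product

def cartesian_bootstrap(benign_sets, attack_examples):
    """Cartesian-bootstrapping sketch, with activities as feature sets:
    every benign activity profile is combined with every known attack,
    yielding len(benign_sets) * len(attack_examples) synthetic attacks,
    each carrying a different flavor of embedded benign noise.

    benign_sets     -- list of per-machine sets of benign activity features
    attack_examples -- list of feature sets from known past attacks
    """
    return [benign | attack
            for benign, attack in product(benign_sets, attack_examples)]
```

A classifier trained on these synthetic positives learns that the attack features matter regardless of which benign background they appear against, which is the noise-robustness the abstract claims.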
Developing Highly Instrumented Applications with Minimal Effort (Tim Hobson)
Presentation from Silicon Valley Code Camp 2013. Related code on GitHub:
* https://github.com/hoserdude/mvcmusicstore-instrumented
* https://github.com/hoserdude/spring-petclinic-instrumented
* https://github.com/hoserdude/nodecellar-instrumented
Brains, Data, and Machine Intelligence (2014 04 14 London Meetup) (Numenta)
Jeff Hawkins discusses brains, data, and machine intelligence, the Cortical Learning Algorithm he developed, and the Numenta Platform for Intelligent Computing (NuPIC).
Why Do Neurons Have Thousands of Synapses? A Model of Sequence Memory in the Brain (Numenta)
Presentation given by Yuwei Cui, Numenta Research Engineer, at Beijing Normal University, December 2015.
Collaborators: Jeff Hawkins, Subutai Ahmad, Chetan Surpur
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...) (confluent)
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of lag in a real-time application may be an application problem – like poor data partitioning or load imbalance – or a Kafka problem – like resource exhaustion or suboptimal configuration. Therefore, getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation (Impetus Technologies)
Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.
However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager - StreamAnalytix, in a discussion on:
* Importance of anomaly detection in enterprise data, types of anomalies, and challenges
* Prominent real-time application areas
* Approaches, techniques and algorithms for anomaly detection
* Sample use-case implementation on the StreamAnalytix platform
SmartData Webinar: Applying Neocortical Research to Streaming Analytics (DATAVERSITY)
We are witnessing an explosion of sensors and machine generated data. Every server, every building, and every device generates a continuous stream of information that is ever changing and potentially valuable. The existing big data paradigm requires storing data for batch analysis, and extensive modeling by a human expert, prior to deployment. This is incredibly inefficient and cannot scale.
In this webinar, Ahmad will describe a new paradigm for streaming data algorithms, based on recent neuroscience findings and on the computational properties of the neocortex. These systems are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. Many of the core ideas have been implemented in the open source project NuPIC, and validated in commercial anomaly detection and predictive maintenance applications. Given the massive increase in the number of data sources, a general-purpose automated approach is the only scalable way to effectively analyze and act on continuously streaming information.
Drilling systems automation is the real-time reliance on digital technology in creating a wellbore. It encompasses downhole tools and systems, surface drilling equipment, remote monitoring and the use of models and simulations while drilling. While its scope is large, its potential benefits are impressive, among them: fewer workers exposed to rig-floor hazards, the ability to realize repeatable performance drilling, and lower drilling risk. While drilling systems automation includes new drilling technology, it is most importantly a collaborative infrastructure for performance drilling. In 2008, a small group of engineers and scientists attending an SPE conference noted that automation was becoming a key topic in drilling, and they formed a technical section to investigate it further. By 2015, the group had reached a membership of sixteen hundred as the technology rapidly gained acceptance. Why so much interest? The benefits and promises of an automated approach to drilling address the safety and fundamental economics of drilling. What will it take? Among the answers are an open collaborative digital environment at the wellsite, an openness of mind to digital technologies, and modified or new business practices. What are the barriers? The primary barrier is a lack of understanding and a fear of automation. When will it happen? It is happening now. Digital technologies are transforming the infrastructure of the drilling industry. Drilling systems automation uses this infrastructure to deliver safety and performance, and address cost.
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T... (Adrian Cockcroft)
Monitorama opening keynote talk on the challenges of Monitoring in a world where we need to deal with continuous delivery, cloud, and automated control feedback loops.
A Deep Learning use case for water end use detection by Roberto Díaz and José... (Big Data Spain)
Deep Learning (DL) is a major breakthrough in artificial intelligence with a high potential for predictive applications.
https://www.bigdataspain.org/2017/talk/a-deep-learning-use-case-for-water-end-use-detection
Big Data Spain 2017
November 16th - 17th, Kinépolis Madrid
ATI Courses Professional Development Short Course: Applied Measurement Engin... (Jim Jenkins)
How do you know your test measurements are valid? Since NIST traceability actually guarantees little about your test data, how do you know? Could you prove validity to your customer? What is the right measurements solution for your testing requirements? Is it really as simple as the vendors say? What is your real cost of invalid, ambiguous data causing retest or, worst of all, hardware redesign?
This course is for engineers, scientists, and managers who must use systems to understand experimental test measurements on a daily basis. Learn how to design, buy and operate effective automated measurement systems providing demonstrably valid test data, the first time.
Fundamental & underlying engineering principles governing the design and operation of effective automated systems are demonstrated experimentally.
In this talk I will review several real-world applications and tools developed at the University of Waikato over the past 15 years. The early applications focused on agricultural problems such as cow culling, venison bruising and grass grubs. Following this, we looked at the use of near-infrared spectroscopy coupled with data mining as an alternative laboratory technique for predicting compound concentrations in soil and plant samples. Our latest application is in the area of gas chromatography mass spectrometry (GCMS), a technique used, in environmental applications for example, to determine the petroleum content of soil and water samples.
Time Series Anomaly Detection with .net and AzureMarco Parenzan
If you have any device or source that generates values over time (even a log from a service), you want to determine whether, in a given time frame, the time series is normal or contains anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
Observability - The Good, the Bad and the Ugly (XP Days 2019, Kiev, Ukraine) (Aleksandr Tavgen)
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Also covers some capabilities of the Flux language by InfluxData.
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ... (Spark Summit)
At CERN, the biggest physics laboratory in the world, large volumes of data are generated every hour, which poses serious challenges for storing and processing it all. An important part of this responsibility falls to the database group, which provides services not only for RDBMSs but also for scalable systems such as Hadoop, Spark and HBase. Since databases are critical, they need to be monitored; to that end, we have built a highly scalable, secure and central repository that stores consolidated audit data as well as listener, alert and OS log events generated by the databases. This central platform is used for reporting, alerting and security policy management. The database group wants to further exploit the information available in this central repository to build an intrusion detection system that enhances the security of the database infrastructure, and to build pattern detection models that flush out anomalies using the monitoring and performance metrics available in the central repository. Finally, this platform also helps us with capacity planning for the database deployment. The audience will get first-hand experience of how to build a real-time Apache Spark application that is deployed in production. They will hear the challenges faced and decisions taken while developing the application, and how to troubleshoot Apache Spark and Spark Streaming applications in production.
Rise of the Machines -- OWASP Israel -- June 2014 Meetup (Shlomo Yona)
Shlomo Yona presents why it is a good idea to use machine learning in security, explains some machine learning jargon, and demonstrates with two fingerprinting examples: a WiFi device (PHY) and a browser (L7).
How the Big Data of APM can Supercharge DevOps (CA Technologies)
In an age where applications reign supreme, organizations must be agile in application performance management and app development in order to meet market demands and stay competitive. Even with mature APM solutions, developer, test and operations teams are strained by operational complexity, accelerated release schedules, and big data challenges when trying to quickly find the root cause of issues affecting end-user experience.
The power of advanced analytics and data science can help us make the most of the vast cache of APM data we collect and help our DevOps teams supercharge user experience. It’s time to take some of the load off of our humans and let technology make it easier to focus on meaningful changes in user, application and system behavior. Analytics are becoming a valuable component of APM solutions to redefine triage, improve application quality, and delight the end-user.
In a webcast on August 7th, 2014, Ken Godskind, Chief blogger and Analyst, APMExaminer.com shared how the big data of APM can supercharge your DevOps transformation. Chris Kline, Senior Director, CA Technologies followed Ken and discussed how the Advanced Behavior Analytics capability of CA APM can assist in this journey.
Ken and Chris used this slide set during the webcast which can be viewed at http://goo.gl/TZYEuq
Dependable Operation - Performance Management and Capacity Planning Under Con... (Liming Zhu)
Talk at the http://www.cmga.org.au/ meetup.
Modern large-scale applications experience sporadic changes due to operational activities such as upgrade, redeployment, on-demand scaling and interferences from other simultaneous operations. This poses new challenges in system monitoring, capacity planning, performance management, error detection and diagnosis. For example, the traditional anomaly-detection-based techniques are less effective during the “sporadic” operation period as a wide range of legitimate changes confound the situation and make performance baseline establishment for “normal” operation difficult. The increasing frequency of these sporadic operations (e.g. due to continuous deployment) is exacerbating the problem. In this talk, we will introduce a number of ongoing research activities at NICTA addressing these issues. For example, we propose the Process Oriented Dependability (POD) approach, an approach that explicitly models these sporadic operations as processes and uses the process context to filter logs, traverse fault trees and conduct adaptive monitoring.
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy (Numenta)
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
Neuromodulators are signalling chemicals in the brain, which control the emergence of adaptive learning and behaviour. Neuromodulators including dopamine, acetylcholine, serotonin and noradrenaline operate on a spectrum of spatio-temporal scales in tandem and opposition to reconfigure functions of biological neural networks and to regulate global cognition and state transition. Although neuromodulators are important in shaping cognition, their phenomenology is yet to be fully realized in deep neural networks (DNNs). In this talk, we will give an overview of the biological organizing principles of neuromodulators in adaptive cognition and highlight the competition and cooperation across neuromodulators.
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas MiconiNumenta
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
A hallmark of intelligence is the ability to learn new flexible, cognitive behaviors - that is, behaviors that require discovering, storing and exploiting novel information for each new instance of the task. In meta-learning, agents are trained with external algorithms to learn one specific cognitive task. However, animals are able to pick up such cognitive tasks automatically, as a result of their evolved neural architecture and synaptic plasticity mechanisms, including neuromodulation. Here we evolve neural networks, endowed with plastic connections and reward-based neuromodulation, over a sizable set of simple meta-learning tasks based on a framework from computational neuroscience. The resulting evolved networks can automatically acquire a novel simple cognitive task, never seen during evolution, through the spontaneous operation of their evolved neural organization and plasticity system. We suggest that attending to the multiplicity of loops involved in natural learning may provide useful insight into the emergence of intelligent behavior.
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Numenta
We receive information about the world through our sensors and influence the world through our effectors. Such low-level data has gradually come to play a greater role in AI during its 70-year history. I see this as occurring in four steps, two of which are mostly past and two of which are in progress or yet to come. The first step was to view AI as the design of agents which interact with the world and thereby have sensorimotor experience; this viewpoint became prominent in the 1980s and 1990s. The second step was to view the goal of intelligence in terms of experience, as in the reward signal of optimal control and reinforcement learning. The reward formulation of goals is now widely used but rarely loved. Many would prefer to express goals in non-experiential terms, such as reaching a destination or benefiting humanity, but settle for reward because, as an experiential signal, reward is directly available to the agent without human assistance or interpretation. This is the pattern that we see in all four steps. Initially a non-experiential approach seems more intuitive, is preferred and tried, but ultimately proves a limitation on scaling; the experiential approach is more suited to learning and scaling with computational resources. The third step in the increasing role of experience in AI concerns the agent’s representation of the world’s state. Classically, the state of the world is represented in objective terms external to the agent, such as “the grass is wet” and “the car is ten meters in front of me”, or with probability distributions over world states such as in POMDPs and other Bayesian approaches. Alternatively, the state of the world can be represented experientially in terms of summaries of past experience (e.g., the last four Atari video frames input to DQN) or predictions of future experience (e.g., successor representations). The fourth step is potentially the biggest: world knowledge. 
Classically, world knowledge has always been expressed in terms far from experience, and this has limited its ability to be learned and maintained. Today we are seeing more calls for knowledge to be predictive and grounded in experience. After reviewing the history and prospects of the four steps, I propose a minimal architecture for an intelligent agent that is entirely grounded in experience.
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...Numenta
In this talk, I will propose a conceptual framework sketching a path toward open-ended skill acquisition through the coupling of environmental, morphological, sensorimotor, cognitive, developmental, social, cultural and evolutionary mechanisms. I will illustrate parts of this framework through computational experiments highlighting the key role of intrinsically motivated exploration in the generation of behavioral regularity and diversity. Firstly, I will show how some forms of language can self-organize out of generic exploration mechanisms without any functional pressure to communicate. Secondly, we will see how language — once invented — can be recruited as a cognitive tool that enables compositional imagination and bootstraps open-ended cultural innovation.
For more:
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...Numenta
Most current deep neural networks learn from a static data set without active interaction with the world. We take a look at how learning through a closed loop between action and perception affects the representations learned in a DNN. We demonstrate how these representations are significantly different from DNNs that learn supervised or unsupervised from a static dataset without interaction. These representations are much sparser and encode meaningful content in an efficient way. Even an agent who learned without any external supervision, purely through curious interaction with the world, acquires encodings of the high dimensional visual input that enable the agent to recognize objects using only a handful of labeled examples. Our results highlight the capabilities that emerge from letting DNNs learn more similar to biological brains, though sensorimotor interaction with the world.
For more:
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence SpracklenNumenta
Numenta's Director of ML Architecture Lawrence Spracklen presented a talk at the SBMT Annual Congress on July 10th, 2021. He talked about how neuroscience principles can inspire better machine learning algorithms.
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...Numenta
Nick Ni (Xilinx) and Lawrence Spracklen (Numenta) presented a talk at the FGPA Conference Europe on July 8th, 2021. In this talk, they presented a neuroscience approach to optimize state-of-the-art deep learning networks into sparse topology and how it can unlock significant performance gains on FPGAs without major loss of accuracy. They then walked through the FPGA implementation where they exploited the advantage of sparse networks with a unique Domain Specific Architecture (DSA).
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...Numenta
Jeff Hawkins presented a talk on "The Thousand Brains Theory: A Roadmap to Machine Intelligence" at the Beijing Academy of Artificial Intelligence Conference on 1st June 2021. In this talk, he discussed the key components of The Thousand Brains Theory and Numenta's recent work.
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Numenta
Jeff Hawkins presents a talk on "How the Brain Uses Reference Frames to Model the World, Why AI Needs to do the Same." In this talk, he gives an overview of The Thousand Brains Theory and discusses how machine intelligence can benefit from working on the same principles as the neocortex.
This talk was first presented at the NAISys conference on November 10, 2020. You can find a re-recording of the talk here: https://youtu.be/mGSG7I9VKDU
OpenAI’s GPT 3 Language Model - guest Steve OmohundroNumenta
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...Numenta
Numenta VP Research Subutai Ahmad presents a talk on "Sparsity in the Neocortex and its Implications for Continual Learning" at the virtual CVPR 2020 workshop. In this talk, he discusses how continuous learning systems can benefit from sparsity, active dendrites and other neocortical mechanisms.
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...Numenta
Recent advances in reverse engineering the neocortex reveal that it is a highly-distributed sensory-motor modeling system. Each cortical column learns complete models of observed objects through movement and sensation. The columns use long-range connections to vote on what objects are currently being observed. In this talk, we introduce the key elements of this theory and describe how these elements can be introduced into current machine learning techniques to improve their capabilities, robustness, and power requirements.
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Numenta
Jeff Hawkins delivered this keynote presentation at the 2018 Human Brain Project Summit Open Day in Maastricht, the Netherlands on October 15, 2018. A screencast recording of the slides is also available at: https://numenta.com/resources/videos/jeff-hawkins-human-brain-project-screencast/
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Numenta
Jeff Hawkins gave this presentation as part of the Johns Hopkins APL Colloquium Series on Septemer 21, 2018.
View the video of the talk here: https://numenta.com/resources/videos/jeff-hawkins-johns-hopkins-apl-talk/
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...Numenta
Numenta VP of Research Subutai Ahmad delivered this presentation at the Centre for Theoretical Neuroscience, University of Waterloo on October 2, 2018.
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)Numenta
These are Matt Taylor's slides from the AI Singapore Meetup on May 17, 2018.
Abstract:
Today’s wave of AI technology is still being driven by the ANN neuron pioneered decades ago. Hierarchical Temporal Memory (HTM) is a realistic biologically-constrained model of the pyramidal neuron reflecting today’s most recent neocortical research. This talk will describe and visualize core HTM concepts like sparse distributed representations, spatial pooling and temporal memory. Strong AI is a common goal of many computer scientists. So far, machine learning techniques have created amazing results in narrow fields, but haven’t produced something we could all call “intelligent”. Given recent advances in neuroscience research, we know a lot more about how neurons work together now than we did when ANNs were created. We believe systems with a more realistic neuronal model will be more likely to produce Strong AI. Hierarchical Temporal Memory is a theory of intelligence based upon neuroscience research. The neocortex is the seat of intelligence in the brain, and it is structurally homogeneous throughout. This means a common algorithm is processing all your sensory input, no matter which sense. We believe we have discovered some of the foundational algorithms of the neocortex, and we’ve implemented them in software. I’ll show you how they work with detailed dynamic visualizations of Sparse Distributed Representations, Spatial Pooling, and Temporal Memory.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
4. 4
TYPES OF ANOMALIES IN STREAMING DATA
• Point anomalies
• Temporal anomalies (contextual/conditional)
5. 5
ANOMALY DETECTION TECHNIQUES
• Traditional techniques
• Classification-based
• Clustering & nearest-neighbor
• Statistical techniques
• Chandola et al., “Anomaly Detection: A Survey”
• In streaming we typically see a collection of statistical techniques
• time-series modeling and forecasting models (e.g. ARIMA)
• change point detection
• outlier tests (e.g. ESD, k-sigma)
• Most techniques not suitable for streaming data
• new approaches needed
• non-streaming benchmarks aren't very useful
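As a concrete example of the classical statistical toolkit mentioned above, here is a minimal sketch of a k-sigma outlier test over a rolling window. This is illustrative only; a production detector must also deal with warm-up, zero variance, and non-stationary data.

```python
from collections import deque
import math

def k_sigma_detector(stream, k=3.0, window=100):
    """Flag points more than k standard deviations from the rolling mean.

    Illustrative sketch of a classical streaming outlier test; names and
    parameters are chosen for this example, not taken from any library.
    """
    history = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(history) >= 2:
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / len(history)
            std = math.sqrt(var)
            flags.append(std > 0 and abs(x - mean) > k * std)
        else:
            flags.append(False)  # not enough history yet
        history.append(x)
    return flags
```

Such tests are cheap and fully online, but as the next slides argue, they miss temporal/contextual anomalies that only show up in the sequence structure of the data.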
6. 6
WHY CREATE A BENCHMARK?
• A benchmark consists of:
• Labeled data files
• Scoring mechanism
• Versioning system
• Most existing benchmarks are designed for batch data, not streaming data
• We saw a need for a benchmark designed to test anomaly detection algorithms on real-time, streaming data
• Hard to find benchmarks containing real-world data labeled with anomalies
• The impact of published techniques suffers because researchers use different data, and/or completely artificial data
• A standard community benchmark could spur innovation in real-time anomaly detection algorithms
7. 7
NUMENTA ANOMALY BENCHMARK (NAB)
• NAB: a rigorous benchmark for anomaly detection in streaming applications
• Real-world benchmark dataset
• 58 labeled data streams (47 real-world, 11 artificial streams)
• Total of 365,551 data points
• Scoring mechanism
• Custom scoring function
• Rewards early detection
• Anomaly windows
• Different “application profiles”
• Open resource
• AGPL repository contains data, source code, and documentation
• github.com/numenta/NAB
11. 11
HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector:
• Detects every anomaly
• Detects anomalies as soon as possible
• there is tremendous value in detecting anomalies early
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• manual tuning is impractical when running potentially thousands of models
• Automatically adapts to changing statistics
• e.g. servers receive new software
12. 12
HOW SHOULD WE SCORE ANOMALIES?
• Scoring methods in traditional benchmarks are insufficient
• Precision, recall, and F1-score do not incorporate the value of time
• early detections are not rewarded
• Artificial separation into training and test sets does not handle continuous learning
• Batch data files allow look ahead and multiple passes through the data
• this is unrealistic for real-world use
15. 15
SCORING FUNCTION
• The effect of each detection is scaled relative to its position within the window:
• Detections outside the window are false positives (scored low)
• Multiple detections within a window are ignored (only the earliest one counts)
• Total score is the sum of scaled detections + a weighted sum of missed detections
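The scoring mechanism above can be sketched as follows. This is a simplified illustration, not NAB's actual implementation: the scaled sigmoid follows the general form in the NAB paper, the weights roughly follow the "standard" application profile, false positives here get a flat penalty (NAB scales the penalty by distance past the nearest window), and normalization is omitted.

```python
import math

def scaled_sigmoid(y):
    # ~+1 for a detection at the left edge of the window (y = -1),
    # 0 at the right edge (y = 0), approaching -1 far beyond the window
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0

def score_detections(windows, detections, a_tp=1.0, a_fp=0.11, a_fn=1.0):
    """Simplified NAB-style score for a list of (start, end) anomaly
    windows and a list of detection timestamps."""
    score = 0.0
    credited = set()
    for start, end in windows:
        width = float(end - start)
        in_window = sorted(t for t in detections if start <= t <= end)
        if in_window:
            earliest = in_window[0]
            credited.update(in_window)          # later detections are ignored
            y = (earliest - end) / width        # relative position in [-1, 0]
            score += a_tp * scaled_sigmoid(y)   # earlier detection, higher reward
        else:
            score -= a_fn                       # missed window (false negative)
    # false positives: flat penalty in this sketch
    score -= a_fp * sum(1 for t in detections if t not in credited)
    return score
```

For example, detecting at the very start of a window earns close to +1.0, while missing the window entirely costs 1.0.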
16. 16
OTHER DETAILS
• Application profiles
• Application profiles assign different weightings based on the tradeoff between false positives and false negatives.
• EKG data on a cardiac patient favors FPs over FNs.
• IT / DevOps professionals hate FPs.
• Three application profiles: standard, favor low false positives, favor low false negatives.
• NAB emulates practical real-time scenarios
• Look-ahead is not allowed; detections must be made on the fly.
• No separation between training and test files. Invoke the model, start streaming, and go.
• No batch, per-data-file parameter tuning. Detectors must be fully automated with a single set of parameters across all data files; any further parameter tuning must be done on the fly.
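The constraints above suggest a detector interface along these lines. This is an illustrative sketch only loosely modeled on NAB's Python detector base class; the names and the placeholder k-sigma scoring rule are hypothetical, not NAB's actual API.

```python
class StreamingDetector:
    """One fixed parameter set, no look-ahead, no separate training phase:
    each record is scored as it arrives, using only the data seen so far.
    (Hypothetical interface; a real algorithm such as HTM would replace
    the simple k-sigma rule used here as a placeholder.)"""

    def __init__(self, k=3.0, window=100):
        # One parameter set shared across all data files: no per-file tuning.
        self.k = k
        self.window = window
        self.values = []

    def handle_record(self, value):
        """Return an anomaly score in [0, 1] for this record."""
        history = self.values[-self.window:]
        score = 0.0
        if len(history) >= 2:
            mean = sum(history) / len(history)
            std = (sum((v - mean) ** 2 for v in history) / len(history)) ** 0.5
            if std > 0:
                score = min(abs(value - mean) / (self.k * std), 1.0)
        self.values.append(value)
        return score
```

Because the model updates after every record, the detector keeps learning while it makes predictions, which is exactly what the benchmark requires.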
17. 17
TESTING ALGORITHMS WITH NAB
• NAB is a community effort
• The goal is to have researchers independently evaluate a large number of algorithms
• Very easy to plug in and test new algorithms
• Seed results with three algorithms:
• Hierarchical Temporal Memory
• Numenta’s open source streaming anomaly detection algorithm
• Models temporal sequences in data, continuously learning
• Etsy Skyline
• Popular open source anomaly detection technique
• Mixture of statistical experts, continuously learning
• Twitter AnomalyDetection
• Open source anomaly detection released earlier this year
• Robust outlier statistics + piecewise approximation
19. 19
DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• Simple spike: all 3 algorithms detect it
• Shift in usage
20. 20
DETECTION RESULTS: MACHINE TEMPERATURE READINGS
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• HTM detects a purely temporal anomaly
• All 3 detect the catastrophic failure
21. 21
DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• HTM detects the anomaly 3 hours earlier
22. 22
SUMMARY
• Anomaly detection is the most common application for streaming analytics
• NAB is a community benchmark for streaming anomaly detection
• Includes a labeled dataset with real data
• Scoring methodology designed for practical real-time applications
• Fully open source codebase
• What can you get out of NAB?
• Test and improve your algorithms
• Contribute and improve NAB
• Learn about streaming anomaly detection
23. 23
SUMMARY
• What’s next for NAB?
• We hope to see researchers test additional algorithms
• We hope to spark improved algorithms for streaming
• More data sets!
• Could incorporate the UC Irvine dataset and the Yahoo Labs dataset (not open source)
• We would love to get more labeled streaming datasets from you
• Add support for multivariate anomaly detection
• Any changes that affect the results will be released with v2.0
24. 24
NAB RESOURCES
Repository: github.com/numenta/NAB
Paper:
A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark,” to appear in 14th International Conference on Machine Learning and Applications (IEEE ICMLA’15), 2015.
Preprint available: arxiv.org/abs/1510.03336
Presentation from MLConf:
https://www.youtube.com/watch?v=SxtsCrTHz-4
Contact info:
nab@numenta.org
alavin@numenta.com, sahmad@numenta.com
26. 26
NUMENTA RESOURCES
• “Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory”: http://arxiv.org/abs/1503.07469
• “Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex”: http://arxiv.org/abs/1511.00083
• NuPIC: Numenta Platform for Intelligent Computing open source repo
• https://github.com/numenta/nupic
• http://numenta.org/
• Numenta
• http://numenta.com/
• HTM Whitepaper:
http://numenta.com/learn/hierarchical-temporal-memory-white-paper.html
27. 27
NAB EXAMPLES
• Figs. 1, 2, 5 from the paper: plot.ly/~alavin/3767
• Fig. 4 from the paper: plot.ly/~alavin/3753
• Fig. 6 from the paper: plot.ly/~alavin/3706
• Subtle change in CPU utilization that precedes a much larger anomaly: plot.ly/~alavin/3720
• An anomaly preceding a much larger drop in CPU utilization: plot.ly/~alavin/3717
• All three detectors get the two TPs, but in different orders: plot.ly/~alavin/3741
• Good detections by HTM, but a lot of FPs: plot.ly/~alavin/3711
• Noisy, difficult CPU utilization data: plot.ly/~alavin/3761
• Temporal anomalies in spiking social media data: plot.ly/~alavin/3815
• No true anomalies, but FP detections in CPU utilization data: plot.ly/~alavin/3723
29. 29
SCALED SIGMOID SCORING FUNCTION
• Scoring example:
(a) FP before the window
(b) TP in the window
(c) additional TP (not counted)
(d) FP soon after the window
(e) FP long after the window
→ total score = -1.809
• Missing a window completely (i.e. an FN) reduces the score by 1.0
[Figure: detections (a)-(e) shown against the anomaly window and the scaled sigmoid scoring function]
30. 30
ANOMALY DETECTION WITH HTM
• How do we turn a data stream into anomaly scores?
[Figure: pipeline: Data → Encoder → SDR → HTM Algorithms → Predictions → Raw anomaly score → Anomaly likelihood]
31. 31
CALCULATING RAW ANOMALY SCORE
• Raw anomaly score is the fraction of active columns that were not predicted.
• This is high when the spatial or temporal patterns deviate from the norm.
rawAnomalyScore = |A_t − (P_{t−1} ∩ A_t)| / |A_t|
where P_t = predicted columns at time t, A_t = active columns at time t
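With column sets represented as Python sets, the formula above is a one-liner; a minimal sketch:

```python
def raw_anomaly_score(active, predicted):
    """Fraction of active columns at time t that were not predicted at t-1:
    |A_t - (P_{t-1} & A_t)| / |A_t|, which equals |A_t - P_{t-1}| / |A_t|.
    0 means the input was fully predicted; 1 means it was fully unexpected."""
    if not active:
        return 0.0  # convention: no active columns -> nothing unexpected
    return len(active - predicted) / len(active)

# e.g. 40 active columns of which 30 were predicted -> score 0.25
```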
33. 33
CALCULATING ANOMALY LIKELIHOOD
• Compute a normal distribution over the history of raw anomaly scores
• Compute the probability of each new point relative to that distribution
μ = Σ_x x·P(x),  σ² = E[(X − μ)²]
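A minimal sketch of these two steps, assuming a plain Gaussian tail probability (the production implementation in NuPIC also smooths raw scores over a short window, a detail omitted here):

```python
import math

def anomaly_likelihood(raw_scores, new_score):
    """Model the history of raw anomaly scores as a normal distribution and
    return how unlikely the new score is under it: 1 minus the Gaussian
    upper-tail probability, so values near 1.0 mean 'very anomalous'."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    var = sum((x - mean) ** 2 for x in raw_scores) / n
    std = math.sqrt(var) or 1e-9                 # guard against zero variance
    z = (new_score - mean) / std
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(X >= new_score)
    return 1.0 - tail
```

Thresholding this likelihood (rather than the raw score) makes the detector robust to streams that are inherently noisy, where raw anomaly scores are always somewhat elevated.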
34. 34
CALCULATING ANOMALY LIKELIHOOD
[Figure: probability distribution of raw anomaly scores (mean 0.0201, std. dev. 0.1237), probability plotted against raw anomaly score]