Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark

EVALUATING REAL-TIME ANOMALY DETECTION:
THE NUMENTA ANOMALY BENCHMARK
MLCONF San Francisco
November 13, 2015
Subutai Ahmad
sahmad@numenta.com

2
Monitoring
IT infrastructure
Uncovering
fraudulent
transactions
Tracking
vehicles
Real-time
health
monitoring
Monitoring
energy
consumption
Detection is necessary, but prevention is often the goal
REAL-TIME ANOMALY DETECTION
• Exponential growth in IoT, sensors and real-time data collection is driving an
explosion of streaming data
• The biggest application for machine learning is anomaly detection

3
EXAMPLE: PREVENTATIVE MAINTENANCE

4
EXAMPLE: PREVENTATIVE MAINTENANCE
Planned
shutdown
Behavioral change
preceding failure
Catastrophic
failure

5
YET ANOTHER BENCHMARK?
• A benchmark consists of:
• Labeled data sets
• Scoring mechanism
• Versioning system
• Most existing benchmarks are designed for batch data, not
streaming data
• Hard to find benchmarks containing real world data labeled with
anomalies
• We saw a need for a benchmark that is designed to test anomaly
detection algorithms on real-time, streaming data
• A standard community benchmark could spur innovation in real-
time anomaly detection algorithms

6
NUMENTA ANOMALY BENCHMARK (NAB)
• NAB: a rigorous benchmark for anomaly
detection in streaming applications

7
• Real-world benchmark data set
• 58 labeled data streams
(47 real-world, 11 artificial streams)
• Total of 365,551 data points

8
• Reward early detection
• Anomaly windows
• Scoring function
• Different “application profiles”

9
• Reward early detection
• Anomaly windows
• Scoring function
• Different “application profiles”
• Open resource
• AGPL repository contains data, source code,
and documentation
• github.com/numenta/NAB

10
EXAMPLE: LOAD BALANCER HEALTH
Unusually high load balancer latency

11
EXAMPLE: HOURLY SERVICE DEMAND
Spike in demand
Unusually low demand

12
EXAMPLE: PRODUCTION SERVER CPU
Spiking behavior becomes the new norm
Spike anomaly

13
HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector
• Detects every anomaly
• Detects anomalies as soon as possible
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• Automatically adapts to changing statistics

14
HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector
• Detects every anomaly
• Detects anomalies as soon as possible
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• Automatically adapts to changing statistics
• Scoring methods in traditional benchmarks are insufficient
• Precision/recall does not incorporate importance of early detection
• Artificial separation into training and test sets does not handle continuous learning
• Batch data files allow look ahead and multiple passes through the data

16
NAB DEFINES ANOMALY WINDOWS

17
• Effect of each detection is scaled
relative to position within window:
• Detections outside window are false
positives (scored low)
• Multiple detections within window are
ignored (use earliest one)
SCORING FUNCTION

18
• Effect of each detection is scaled
relative to position within window:
• Detections outside window are false
positives (scored low)
• Multiple detections within window are
ignored (use earliest one)
• Total score is sum of scaled detections
+ weighted sum of missed detections:
SCORING FUNCTION

19
OTHER DETAILS
• Application profiles
• Three application profiles assign different weightings based on the tradeoff between
false positives and false negatives.
• EKG data on a cardiac patient favors False Positives.
• IT / DevOps professionals hate False Positives.
• Three application profiles: standard, favor low false positives, favor low false negatives.

20
OTHER DETAILS
• Application profiles
• Three application profiles assign different weightings based on the tradeoff between
false positives and false negatives.
• EKG data on a cardiac patient favors False Positives.
• IT / DevOps professionals hate False Positives.
• Three application profiles: standard, favor low false positives, favor low false negatives.
• NAB emulates practical real-time scenarios
• Look ahead not allowed for algorithms. Detections must be made on the fly.
• No separation between training and test files. Invoke model, start streaming, and go.
• No batch, per dataset, parameter tuning. Must be fully automated with single set of
parameters across datasets. Any further parameter tuning must be done on the fly.

21
TESTING ALGORITHMS WITH NAB
• NAB is a community effort
• The goal is to have researchers independently evaluate a large number of algorithms
• Very easy to plug in and test new algorithms

22
TESTING ALGORITHMS WITH NAB
• NAB is a community effort
• The goal is to have researchers independently evaluate a large number of algorithms
• Very easy to plug in and test new algorithms
• Seed results with three algorithms:
• Hierarchical Temporal Memory
• Numenta’s open source streaming anomaly detection algorithm
• Models temporal sequences in data, continuously learning
• Etsy Skyline
• Popular open source anomaly detection technique
• Mixture of statistical experts, continuously learning
• Twitter ADVec
• Open source anomaly detection released earlier this year
• Robust outlier statistics + piecewise approximation

23
NAB V1.0 RESULTS (58 FILES)

24
DETECTION RESULTS: CPU USAGE ON
PRODUCTION SERVER
Simple spike, all 3
algorithms detect
Shift in usage
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key

25
DETECTION RESULTS: MACHINE
TEMPERATURE READINGS
HTM detects purely
temporal anomaly
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key
All 3 detect
catastrophic failure

26
DETECTION RESULTS: TEMPORAL CHANGES IN
BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
HTM detects anomaly 3
hours earlier
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key

27
SUMMARY
• Anomaly detection is most common application for streaming analytics
• NAB is a community benchmark for streaming anomaly detection
• Includes a labeled dataset with real data
• Scoring methodology designed for practical real-time applications
• Fully open source codebase

28
SUMMARY
• Anomaly detection is most common application for streaming analytics
• NAB is a community benchmark for streaming anomaly detection
• Includes a labeled dataset with real data
• Scoring methodology designed for practical real-time applications
• Fully open source codebase
• What’s next for NAB?
• We hope to see researchers test additional algorithms
• We hope to spark improved algorithms for streaming
• More data sets!
• Could incorporate UC Irvine dataset, Yahoo labs dataset (not open source)
• Would love to get more labeled streaming datasets from you
• Add support for multivariate anomaly detection

29
NAB RESOURCES
Table 12 at MLConf
Repository: https://github.com/numenta/NAB
Paper:
A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms –
the Numenta Anomaly Benchmark,” to appear in 14th International Conference
on Machine Learning and Applications (IEEE ICMLA’15), 2015.
Preprint available: http://arxiv.org/abs/1510.03336
Contact info:
sahmad@numenta.com , alavin@numenta.com

Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark

Similar to Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark (20)

More from Numenta

More from Numenta (20)

Recently uploaded

Recently uploaded (20)

Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark