DETECTING ANOMALIES IN STREAMING DATA
Data By The Bay
May 19, 2016
Subutai Ahmad
@SubutaiAhmad
sahmad@numenta.com
OUTLINE
• Real-time streaming analytics
• Anomaly detection with Hierarchical Temporal Memory
• Benchmarking real-time anomaly detection
• Summary
• Monitoring IT infrastructure
• Uncovering fraudulent transactions
• Tracking vehicles
• Real-time health monitoring
• Monitoring energy consumption
Detection is necessary, but prevention is often the goal
REAL-TIME ANOMALY DETECTION
•  Exponential growth in IoT, sensors and real-time data collection is driving an
explosion of streaming data
•  The biggest application for machine learning is anomaly detection
EXAMPLE: PREVENTIVE MAINTENANCE
[Figure annotations: planned shutdown; behavioral change preceding failure; catastrophic failure]
THE STREAMING ANALYTICS PROBLEM
Given all past input and current
input, decide whether the system
behavior is anomalous right now.
Must report decision, perform
any retraining, bookkeeping,
etc. before next input arrives.
No look-ahead
No training/test set split – everything must be done online
System must be automated, and customized to each stream
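The constraints above can be sketched as a per-record loop. This is a minimal illustration only: the detector is a simple running z-score stand-in (not HTM), and the names `OnlineZScoreDetector`, `handle_record`, and `run_stream`, as well as the threshold and warm-up values, are assumptions made for the example.

```python
import math


class OnlineZScoreDetector:
    """Stand-in detector (not HTM): flags values far from the running mean."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.warmup = warmup        # records to observe before reporting
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations (Welford)

    def handle_record(self, value):
        """Decide on the current value, then learn from it, before returning."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Online update: no stored history, no second pass, no look-ahead.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return is_anomaly


def run_stream(detector, stream):
    """Feed records one at a time; each decision uses only past and current input."""
    return [detector.handle_record(x) for x in stream]
```

Decision and model update both happen inside `handle_record`, so all bookkeeping is finished before the next input arrives, and there is no separate training phase.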
HIERARCHICAL TEMPORAL MEMORY (HTM)
• Powerful sequence memory derived
from recent findings in experimental
neuroscience
• High capacity memory based system
• Models temporal sequences in data
• Inherently streaming
• Continuously learning and predicting
• No need to tune hyper-parameters
• Open source: github.com/numenta
HTM PREDICTS FUTURE INPUT
• Input to the system is a stream of data
• Encoded into a sparse high dimensional vector
• Learns temporal sequences in input stream and makes a prediction
in the form of a sparse vector
•  This sparse vector represents a prediction for upcoming input
ANOMALY DETECTION WITH HTM
[Diagram: HTM, raw anomaly score, anomaly likelihood]
• Raw anomaly score is an instantaneous measure of prediction error
•  0 if input was perfectly predicted
•  1 if it was completely unpredicted
•  Could threshold it directly to report anomalies, but in very noisy
environments we can do better
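One way to get a score with these two endpoints is to measure how much of the current input was missing from the prediction. The sketch below assumes both sparse vectors are given as sets of active bit indices. The function name and the formula are illustrative assumptions, not taken from the slides.

```python
def raw_anomaly_score(predicted_bits, active_bits):
    """Fraction of currently active bits that were not predicted.

    Returns 0.0 when every active bit was predicted and 1.0 when none were.
    """
    predicted = set(predicted_bits)
    active = set(active_bits)
    if not active:
        return 0.0  # assumption: an empty input carries no prediction error
    overlap = len(predicted & active)
    return 1.0 - overlap / len(active)
```

For example, if four bits are active and two of them were predicted, the score is 0.5.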
ANOMALY LIKELIHOOD
• Second order measure: did the predictability of the metric change?
1.  Estimate historical distribution of anomaly scores
2.  Check if recent scores are very different
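The two steps above can be sketched as follows. This is a simplified illustration, not Numenta's implementation: it assumes the historical raw scores are roughly Gaussian, and the class name, window sizes, and the neutral 0.5 returned during warm-up are all assumptions.

```python
import math
from collections import deque


class AnomalyLikelihood:
    """Compares the recent average raw score with its historical distribution."""

    def __init__(self, history_size=200, recent_size=10):
        self.history = deque(maxlen=history_size)  # step 1: long window
        self.recent = deque(maxlen=recent_size)    # step 2: short window

    def update(self, raw_score):
        self.history.append(raw_score)
        self.recent.append(raw_score)
        if len(self.history) < 2 * self.recent.maxlen:
            return 0.5  # not enough history yet; report a neutral value

        # Step 1: estimate the historical distribution (mean and std).
        mean = sum(self.history) / len(self.history)
        var = sum((s - mean) ** 2 for s in self.history) / len(self.history)
        std = max(math.sqrt(var), 1e-6)

        # Step 2: how unusual is the recent average under that distribution?
        recent_mean = sum(self.recent) / len(self.recent)
        z = (recent_mean - mean) / std
        tail = 0.5 * math.erfc(z / math.sqrt(2))  # Gaussian upper-tail probability
        return 1.0 - tail
```

A stream whose raw scores are noisy but stable stays near 0.5, while a sustained rise in raw scores pushes the likelihood toward 1. This is why it can be thresholded more reliably than the raw score in noisy environments.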
ANOMALY DETECTION WITH HTM
• HTM
•  Learns temporal sequences
•  Continuously makes predictions
•  Continuously learning
• Raw anomaly score
•  Was current input predicted?
• Anomaly likelihood
•  Has level of predictability changed significantly?
ANOMALIES IN IT INFRASTRUCTURE
• Grok
•  Commercial server based product detects anomalies in IT infrastructure
•  Runs thousands of HTM anomaly detectors in real time
•  10 milliseconds per input per metric, including continuous learning
•  No parameter tuning required
•  http://grokstream.com
ANOMALIES IN FINANCIAL DATA
• HTM for Stocks
•  Real-time free demo application
•  Continuously monitors top 200 stocks
•  Available on iOS App Store or Google Play Store
•  Open source application: github.com/numenta/numenta-apps
EVALUATING STREAMING ANOMALY DETECTION
•  Most existing benchmarks are designed for batch data, not
streaming data
•  Hard to find benchmarks containing real world data labeled with
anomalies
•  There is a need for an open benchmark designed to test real-time
anomaly detection
•  A standard community benchmark could spur innovation in
streaming anomaly detection algorithms
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
•  Real-world benchmark data set
•  58 labeled data streams
(47 real-world, 11 artificial streams)
•  Total of 365,551 data points
•  Scoring mechanism
•  Rewards early detection
•  Different “application profiles”
•  Open resource
•  AGPL repository contains data, source code,
and documentation
•  github.com/numenta/NAB
•  Ongoing competition to expand NAB
EXAMPLE: HOURLY SERVICE DEMAND
Spike in demand
Unusually low demand
EXAMPLE: PRODUCTION SERVER CPU
Spiking behavior becomes the new norm
Spike anomaly
HOW SHOULD WE SCORE ANOMALIES?
•  The perfect detector
•  Detects anomalies as soon as possible
•  Provides detections in real time
•  Triggers no false alarms
•  Requires no parameter tuning
•  Automatically adapts to changing statistics
•  Scoring methods in traditional benchmarks are insufficient
•  Precision/recall does not incorporate importance of early detection
•  Artificial separation into training and test sets does not handle continuous learning
•  Batch data files allow look ahead and multiple passes through the data
WHERE IS THE ANOMALY?
NAB DEFINES ANOMALY WINDOWS
NAB scoring function gives higher score to earlier detections in window
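A scoring curve with this property can be sketched with a scaled sigmoid over the detection's position relative to the window. The function names and the steepness constant 5.0 below are assumptions for illustration. See the NAB repository for the scoring function actually used.

```python
import math


def relative_position(t, window_start, window_end):
    """Map a detection time to -1 (window start) .. 0 (window end).

    Values above 0 mean the detection came after the window closed.
    """
    width = window_end - window_start
    return (t - window_end) / width


def detection_weight(y):
    """Scaled sigmoid: close to +1 early in the window, 0 at its end,
    and approaching -1 for detections well after the window."""
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0
```

With a window spanning timestamps 100 to 200, a detection at 100 gets a weight of about 0.99, one at 190 about 0.24, and one at 250 is penalized with a negative weight.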
OTHER DETAILS
•  Application profiles
•  Three application profiles assign different weightings based on the tradeoff between
false positives and false negatives.
•  EKG data on a cardiac patient favors False Positives (a missed anomaly costs more than a false alarm).
•  IT / DevOps professionals hate False Positives.
•  Three application profiles: standard, favor low false positives, favor low false negatives.
•  NAB emulates practical real-time scenarios
•  Look ahead not allowed for algorithms. Detections must be made on the fly.
•  No separation between training and test files. Invoke model, start streaming, and go.
•  No batch parameter tuning. Must be fully automated with single set of parameters
across data streams. Any further parameter tuning must be done on the fly.
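The idea of application profiles can be sketched as a table of per-outcome weights. The weight values and profile keys below are hypothetical, chosen only to show the mechanism. They are not NAB's actual weights.

```python
# Hypothetical weights: each profile changes how much a false positive (fp)
# or false negative (fn) costs relative to a true positive (tp).
PROFILES = {
    "standard":     {"tp": 1.0, "fp": 1.0, "fn": 1.0},
    "favor_low_fp": {"tp": 1.0, "fp": 2.0, "fn": 1.0},
    "favor_low_fn": {"tp": 1.0, "fp": 1.0, "fn": 2.0},
}


def profile_score(counts, profile):
    """Weighted score for one detector run under the given profile."""
    w = PROFILES[profile]
    return (w["tp"] * counts["tp"]
            - w["fp"] * counts["fp"]
            - w["fn"] * counts["fn"])
```

The same detector output then ranks differently by profile: a run with 5 true positives, 3 false positives, and 1 false negative scores 1.0 under "standard", 0.0 under "favor_low_fn", and -2.0 under "favor_low_fp".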
TESTING ALGORITHMS WITH NAB
•  NAB is designed to easily plug in and test new algorithms
•  Results with several algorithms:
•  Hierarchical Temporal Memory
•  Etsy Skyline
•  Popular open source anomaly detection technique
•  Mixture of statistical experts, continuously learning
•  Twitter ADVec
•  Open source anomaly detection released last year
•  Robust outlier statistics + piecewise approximation
•  Bayesian Online Change Point Detection
•  Formal Bayesian method for detecting anomalies in time series
NAB V1.0 RESULTS (58 FILES)
DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
Simple spike, all 3 algorithms detect
Shift in usage
[Plot panels: Etsy Skyline, Numenta HTM, Twitter ADVec. Key: red denotes false positive]
DETECTION RESULTS: MACHINE TEMPERATURE READINGS
HTM detects purely temporal anomaly
All 3 detect catastrophic failure
[Plot panels: Etsy Skyline, Numenta HTM, Twitter ADVec. Key: red denotes false positive]
DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
HTM detects anomaly 3 hours earlier
[Plot panels: Etsy Skyline, Numenta HTM, Twitter ADVec. Key: red denotes false positive]
NAB COMPETITION!!
•  NAB is a resource for the streaming analytics community
•  Need additional real-world data files and more algorithms tested
•  NAB Competition offers cash prizes for:
•  Additional anomaly detection algorithms tested on NAB
•  Submission of real-world data files with labeled real anomalies
•  Cash prizes of $2,500 each for algorithms and data
•  Easy to enter, high likelihood of winning!
•  Go to http://numenta.org/nab for details
SUMMARY
•  Anomaly detection for streaming data imposes unique challenges
•  Stringent real-time constraints and automation requirements
•  Typical batch methodologies do not work well
•  HTM learning algorithms
•  Can be used to create a streaming anomaly detection system
•  Performs very well across a wide range of datasets
•  Open source, commercially deployable
•  NAB is an open source benchmark for streaming anomaly detection
•  Includes a labeled dataset with real world data
•  Scoring methodology designed for practical real-time applications
•  NAB competition!
RESOURCES
Grok (anomalies in IT infrastructure): http://grokstream.com
HTM Studio (desktop app for easy experimentation): contact me
Open Source Repositories:
Algorithm code: https://github.com/numenta/nupic
HTM Stocks demo: https://github.com/numenta/numenta-apps
NAB code + paper: https://github.com/numenta/nab
Apache Flink: https://github.com/nupic-community/flink-htm
Contact info:
Subutai Ahmad sahmad@numenta.com, @SubutaiAhmad
Alex Lavin alavin@numenta.com, @theAlexLavin

State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 

Detecting Anomalies in Streaming Data

  • 1. DETECTING ANOMALIES IN STREAMING DATA Data By The Bay May 19, 2016 Subutai Ahmad @SubutaiAhmad sahmad@numenta.com
  • 2. OUTLINE • Real-time streaming analytics • Anomaly detection with Hierarchical Temporal Memory • Benchmarking real-time anomaly detection • Summary
  • 3. REAL-TIME ANOMALY DETECTION •  Exponential growth in IoT, sensors and real-time data collection is driving an explosion of streaming data •  The biggest application for machine learning is anomaly detection •  Example applications: monitoring IT infrastructure, uncovering fraudulent transactions, tracking vehicles, real-time health monitoring, monitoring energy consumption •  Detection is necessary, but prevention is often the goal
  • 5. EXAMPLE: PREVENTIVE MAINTENANCE Planned shutdown Behavioral change preceding failure Catastrophic failure
  • 6. THE STREAMING ANALYTICS PROBLEM Given all past input and current input, decide whether the system behavior is anomalous right now. Must report decision, perform any retraining, bookkeeping, etc. before next input arrives. No look-ahead No training/test set split – everything must be done online System must be automated, and customized to each stream
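The constraints on slide 6 can be sketched as a per-record loop. This is a minimal illustration only: `RunningZScoreDetector`, `handle_record`, and `run_stream` are hypothetical names, and the running z-score is a stand-in statistic, not HTM. The point is the shape of the loop: each decision uses only past and current input, and learning happens before the next record arrives.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningZScoreDetector:
    """Stand-in online detector (NOT HTM): flags values far from the running mean."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10) -> None:
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.warmup = warmup        # records to observe before reporting anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations (Welford's algorithm)

    def handle_record(self, value: float) -> bool:
        # Decide using only past inputs and the current one (no look-ahead).
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(value - self.mean) / std > self.threshold
        # Learn from the current record before the next one arrives.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return anomalous


def run_stream(values: Iterable[float]) -> Iterator[Tuple[int, float, bool]]:
    """Single pass over the stream: no train/test split, no second look at the data."""
    detector = RunningZScoreDetector()
    for t, value in enumerate(values):
        yield t, value, detector.handle_record(value)
```

Any detector that exposes a similar one-record-at-a-time method could be dropped into `run_stream` unchanged.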
  • 7. HIERARCHICAL TEMPORAL MEMORY (HTM) • Powerful sequence memory derived from recent findings in experimental neuroscience • High capacity memory based system • Models temporal sequences in data • Inherently streaming • Continuously learning and predicting • No need to tune hyper-parameters • Open source: github.com/numenta
  • 8. HTM PREDICTS FUTURE INPUT • Input to the system is a stream of data • Encoded into a sparse high dimensional vector • Learns temporal sequences in input stream and makes a prediction in the form of a sparse vector •  represents a prediction for upcoming input HTM
  • 9. ANOMALY DETECTION WITH HTM HTM Raw anomaly score Anomaly likelihood •  The raw anomaly score is an instantaneous measure of prediction error •  0 if the input was perfectly predicted •  1 if it was completely unpredicted •  Could threshold it directly to report anomalies, but in very noisy environments we can do better
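One way to get a score with the 0-to-1 behaviour described on slide 9 is to measure how much of the current input was covered by the previous prediction. The sketch below assumes sparse vectors are represented as sets of active bit indices; `raw_anomaly_score` is an illustrative name, and treating an empty input as score 0 is an assumption.

```python
from typing import Set


def raw_anomaly_score(active: Set[int], predicted: Set[int]) -> float:
    """Fraction of currently active bits that were not predicted."""
    if not active:
        return 0.0  # assumption: an empty input counts as nothing unpredicted
    overlap = len(active & predicted)
    return 1.0 - overlap / len(active)
```

For example, if half of the active bits were predicted, the score is 0.5.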
  • 11. ANOMALY LIKELIHOOD • Second order measure: did the predictability of the metric change? 1.  Estimate historical distribution of anomaly scores 2.  Check if recent scores are very different
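The two steps of the anomaly likelihood can be sketched as follows, under some loudly stated assumptions: the historical distribution of raw scores is modelled as a Gaussian over a rolling window, "recent scores" means a short moving average, and the window sizes and the neutral value 0.5 returned during warm-up are illustrative choices, not values from the talk.

```python
import math
from collections import deque
from typing import Deque


class AnomalyLikelihood:
    """Second-order measure: has the level of predictability changed?"""

    def __init__(self, history: int = 200, recent: int = 10) -> None:
        self.scores: Deque[float] = deque(maxlen=history)  # assumed window size
        self.recent = recent                               # assumed short window

    def update(self, raw_score: float) -> float:
        self.scores.append(raw_score)
        n = len(self.scores)
        if n < 2 * self.recent:
            return 0.5  # assumption: stay neutral until there is enough history
        # Step 1: estimate the historical distribution of raw anomaly scores.
        mean = sum(self.scores) / n
        var = sum((s - mean) ** 2 for s in self.scores) / n
        std = max(math.sqrt(var), 1e-6)
        # Step 2: check whether recent scores are very different from history.
        recent_mean = sum(list(self.scores)[-self.recent:]) / self.recent
        z = (recent_mean - mean) / std
        tail = 0.5 * math.erfc(z / math.sqrt(2))  # Gaussian tail probability Q(z)
        return 1.0 - tail
```

A value near 1 means the recent scores sit far in the upper tail of the historical distribution; a threshold close to 1 would then be applied to report an anomaly.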
  • 12. ANOMALY DETECTION WITH HTM HTM Raw anomaly score Anomaly likelihood Learns temporal sequences Continuously makes predictions Continuously learning Was current input predicted? Has level of predictability changed significantly?
  • 13. ANOMALIES IN IT INFRASTRUCTURE • Grok •  Commercial server based product detects anomalies in IT infrastructure •  Runs thousands of HTM anomaly detectors in real time •  10 milliseconds per input per metric, including continuous learning •  No parameter tuning required •  http://grokstream.com
  • 14. ANOMALIES IN FINANCIAL DATA • HTM for Stocks •  Real-time free demo application •  Continuously monitors top 200 stocks •  Available on iOS App Store or Google Play Store •  Open source application: github.com/numenta/numenta-apps
  • 15. OUTLINE • Real-time streaming analytics • Anomaly detection with Hierarchical Temporal Memory • Benchmarking real-time anomaly detection • Summary
  • 16. EVALUATING STREAMING ANOMALY DETECTION •  Most existing benchmarks are designed for batch data, not streaming data •  Hard to find benchmarks containing real world data labeled with anomalies •  There is a need for an open benchmark designed to test real-time anomaly detection •  A standard community benchmark could spur innovation in streaming anomaly detection algorithms
  • 20. NUMENTA ANOMALY BENCHMARK (NAB) •  NAB: a rigorous benchmark for anomaly detection in streaming applications •  Real-world benchmark data set •  58 labeled data streams (47 real-world, 11 artificial streams) •  Total of 365,551 data points •  Scoring mechanism •  Rewards early detection •  Different “application profiles” •  Open resource •  AGPL repository contains data, source code, and documentation •  github.com/numenta/NAB •  Ongoing competition to expand NAB
  • 21. EXAMPLE: HOURLY SERVICE DEMAND Spike in demand Unusually low demand
  • 22. EXAMPLE: PRODUCTION SERVER CPU Spiking behavior becomes the new norm Spike anomaly
  • 23. HOW SHOULD WE SCORE ANOMALIES? •  The perfect detector •  Detects anomalies as soon as possible •  Provides detections in real time •  Triggers no false alarms •  Requires no parameter tuning •  Automatically adapts to changing statistics •  Scoring methods in traditional benchmarks are insufficient •  Precision/recall does not incorporate importance of early detection •  Artificial separation into training and test sets does not handle continuous learning •  Batch data files allow look ahead and multiple passes through the data
  • 24. WHERE IS THE ANOMALY?
  • 25. NAB DEFINES ANOMALY WINDOWS NAB scoring function gives higher score to earlier detections in window
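A scoring function that favours earlier detections inside a window can be sketched with a scaled sigmoid. The function names, the steepness constant 5.0, the cutoff at 3.0, and the position convention (-1.0 at the window start, 0.0 at the window end, positive after it) are assumptions for illustration; see the NAB repository for the benchmark's actual scoring code.

```python
import math


def relative_position(t: int, window_start: int, window_end: int) -> float:
    """Map a detection index to -1.0 at window start and 0.0 at window end."""
    return (t - window_end) / (window_end - window_start)


def detection_weight(position: float) -> float:
    """Scaled sigmoid: close to +1 early in the window, 0 at its end, negative after."""
    if position > 3.0:
        return -1.0  # far past the window: treat as a full false positive
    return 2.0 / (1.0 + math.exp(5.0 * position)) - 1.0
```

With this shape, a detection at the start of a window earns nearly full credit, one at the very end earns none, and one well after the window is penalised.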
  • 26. OTHER DETAILS •  Application profiles •  Three application profiles (standard, favor low false positives, favor low false negatives) assign different weightings based on the tradeoff between false positives and false negatives. •  EKG data on a cardiac patient favors accepting false positives over missing an anomaly. •  IT / DevOps professionals hate false positives. •  NAB emulates practical real-time scenarios •  Look-ahead is not allowed for algorithms. Detections must be made on the fly. •  No separation between training and test files. Invoke model, start streaming, and go. •  No batch parameter tuning. Must be fully automated with a single set of parameters across data streams. Any further parameter tuning must be done on the fly.
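The effect of an application profile can be illustrated with a simplified weighted count. The weights below are hypothetical values chosen to make the tradeoff visible, not NAB's published profile settings, and the plain counts ignore the window-position weighting that NAB applies to each detection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    tp_weight: float  # reward per true positive
    fp_weight: float  # penalty per false positive
    fn_weight: float  # penalty per missed anomaly


# Hypothetical weights for illustration only.
PROFILES = {
    "standard": Profile(tp_weight=1.0, fp_weight=1.0, fn_weight=1.0),
    "favor_low_fp": Profile(tp_weight=1.0, fp_weight=2.0, fn_weight=1.0),
    "favor_low_fn": Profile(tp_weight=1.0, fp_weight=1.0, fn_weight=2.0),
}


def profile_score(profile: Profile, tp: int, fp: int, fn: int) -> float:
    """Weighted score for one detector under one application profile."""
    return profile.tp_weight * tp - profile.fp_weight * fp - profile.fn_weight * fn
```

Under these assumed weights, a detector that catches everything but raises many false alarms ranks higher in the low-false-negative profile, while a cautious detector that misses a few anomalies ranks higher in the low-false-positive profile.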
  • 27. TESTING ALGORITHMS WITH NAB •  NAB is designed to easily plug in and test new algorithms •  Results with several algorithms: •  Hierarchical Temporal Memory •  Etsy Skyline •  Popular open source anomaly detection technique •  Mixture of statistical experts, continuously learning •  Twitter ADVec •  Open source anomaly detection released last year •  Robust outlier statistics + piecewise approximation •  Bayesian Online Change Point Detection •  Formal Bayesian method for detecting anomalies in time series
  • 28. NAB V1.0 RESULTS (58 FILES)
  • 29. DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER •  Simple spike, all 3 algorithms detect •  Shift in usage •  Key: Etsy Skyline, Numenta HTM, Twitter ADVec; red denotes false positive
  • 30. DETECTION RESULTS: MACHINE TEMPERATURE READINGS •  HTM detects purely temporal anomaly •  All 3 detect catastrophic failure •  Key: Etsy Skyline, Numenta HTM, Twitter ADVec; red denotes false positive
  • 31. DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT •  HTM detects anomaly 3 hours earlier •  Key: Etsy Skyline, Numenta HTM, Twitter ADVec; red denotes false positive
  • 32. NAB COMPETITION!! •  NAB is a resource for the streaming analytics community •  Need additional real-world data files and more algorithms tested •  NAB Competition offers cash prizes for: •  Additional anomaly detection algorithms tested on NAB •  Submission of real-world data files with labeled real anomalies •  Cash prizes of $2,500 each for algorithms and data •  Easy to enter, high likelihood of winning! •  Go to http://numenta.org/nab for details
  • 33. SUMMARY •  Anomaly detection for streaming data imposes unique challenges •  Stringent real-time constraints and automation requirements •  Typical batch methodologies do not work well •  HTM learning algorithms •  Can be used to create a streaming anomaly detection system •  Performs very well across a wide range of datasets •  Open source, commercially deployable •  NAB is an open source benchmark for streaming anomaly detection •  Includes a labeled dataset with real world data •  Scoring methodology designed for practical real-time applications •  NAB competition!
  • 34. RESOURCES Grok (anomalies in IT infrastructure): http://grokstream.com HTM Studio (desktop app for easy experimentation): contact me Open Source Repositories: Algorithm code: https://github.com/numenta/nupic HTM Stocks demo: https://github.com/numenta/numenta-apps NAB code + paper: https://github.com/numenta/nab Apache Flink: https://github.com/nupic-community/flink-htm Contact info: Subutai Ahmad sahmad@numenta.com, @SubutaiAhmad Alex Lavin alavin@numenta.com, @theAlexLavin