This document discusses anomaly detection in streaming data using Hierarchical Temporal Memory (HTM). It describes how HTM can be used to build a real-time anomaly detection system that continuously learns and predicts patterns in streaming data. It also introduces the Numenta Anomaly Benchmark (NAB), an open benchmark for evaluating streaming anomaly detection algorithms that contains labeled real-world data streams and a scoring system that rewards early detection of anomalies. Several state-of-the-art algorithms are shown to perform well on NAB, including HTM, and a competition is announced to expand NAB with more data and algorithms.
Subutai Ahmad, VP of Research, Numenta at MLconf SF - 11/13/15 (MLconf)
Real-time Anomaly Detection for Real-time Data Needs: Much of the world’s data is becoming streaming, time-series data, where anomalies give significant information in often-critical situations. Examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real time, not batches, and to learn while simultaneously making predictions. Are there algorithms up for the challenge? Which are the most capable? The Numenta Anomaly Benchmark (NAB) attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. These characteristics are formalized in NAB, using a custom scoring algorithm to evaluate the detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and describe the end-to-end scoring process. We give results and analyses for several algorithms to illustrate NAB in action. The goal for NAB is to provide a standard, open-source framework with which we can compare and evaluate different algorithms for detecting anomalies in streaming data.
An overview of streaming algorithms: what they are, what the general principles regarding them are, and how they fit into a big data architecture. Also four specific examples of streaming algorithms and use-cases.
Streaming data presents new challenges for statistics and machine learning on extremely large data sets. Tools such as Apache Storm, a stream processing framework, can power a range of data analytics but lack advanced statistical capabilities. These slides are from the ApacheCon talk, which discussed developing streaming algorithms with the flexibility of both Storm and R, a statistical programming language.
At the talk I discussed why and how to use Storm and R to develop streaming algorithms; in particular I focused on:
• Streaming algorithms
• Online machine learning algorithms
• Use cases showing how to process hundreds of millions of events a day in (near) real time
See: https://apacheconna2015.sched.org/event/09f5a1cc372860b008bce09e15a034c4#.VUf7wxOUd5o
This webinar series covers Apache Kafka and Apache Storm for streaming data processing. It also discusses new streaming innovations for Kafka and Storm included in HDP 2.2.
Streaming ML with Flink (Márton Balassi, Flink Forward)
http://flink-forward.org/kb_sessions/streaming-ml-with-flink/
As continuous big data processing gains popularity, it naturally implies a need to transition much of the distributed machine learning functionality to a streaming backend. The most common use case is to serve streaming predictions based on a model learnt in batch; however, in some cases it is beneficial to also update the model on the fly. It is not uncommon that streaming learners need different algorithms than their batch counterparts. The talk discusses the common use cases and the pitfalls of the streaming ML transition through the example of recommender systems. It also offers a dive into the implementation of a Scala library augmenting FlinkML with streaming predictors.
Data Stream Analytics - Why they are important (Paris Carbone)
Streaming is cool and it can help us do quick analytics and make a profit, but what about tsunamis? This is a motivation talk presented at the SeRC Big Data Workshop in Sweden in spring 2016. It motivates the streaming paradigm and provides examples using Apache Flink.
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark (Numenta)
Subutai Ahmad, VP of Research, presenting NAB and discussing the need to evaluate real-time anomaly detection algorithms. This presentation was delivered at MLconf (Machine Learning Conference) in San Francisco in 2015.
SmartData Webinar: Applying Neocortical Research to Streaming Analytics (DATAVERSITY)
We are witnessing an explosion of sensors and machine generated data. Every server, every building, and every device generates a continuous stream of information that is ever changing and potentially valuable. The existing big data paradigm requires storing data for batch analysis, and extensive modeling by a human expert, prior to deployment. This is incredibly inefficient and cannot scale.
In this webinar, Ahmad will describe a new paradigm for streaming data algorithms, based on recent neuroscience findings and on the computational properties of the neocortex. These systems are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. Many of the core ideas have been implemented in the open source project NuPIC, and validated in commercial anomaly detection and predictive maintenance applications. Given the massive increase in the number of data sources, a general-purpose automated approach is the only scalable way to effectively analyze and act on continuously streaming information.
How the Big Data of APM can Supercharge DevOps (CA Technologies)
In the age where applications reign supreme, your organization must be agile in application performance management and app development in order to meet market demands and stay competitive. Even with mature APM solutions, developer, test and operations teams are strained by operational complexity, accelerated release schedules, and big data challenges to quickly find the root cause of issues affecting end user experience.
The power of advanced analytics and data science can help us make the most of the vast cache of APM data we collect and help our DevOps teams supercharge user experience. It’s time to take some of the load off of our humans and let technology make it easier to focus on meaningful changes in user, application and system behavior. Analytics are becoming a valuable component of APM solutions to redefine triage, improve application quality, and delight the end-user.
In a webcast on August 7th, 2014, Ken Godskind, Chief blogger and Analyst, APMExaminer.com shared how the big data of APM can supercharge your DevOps transformation. Chris Kline, Senior Director, CA Technologies followed Ken and discussed how the Advanced Behavior Analytics capability of CA APM can assist in this journey.
Ken and Chris used this slide set during the webcast which can be viewed at http://goo.gl/TZYEuq
Use of machine intelligence in the manufacturing industry poses a special challenge due to a wide range of use cases, inherent complexity in data collection, availability of information, and the disconnect between information islands in different manufacturing steps.
Within our talk we present several machine intelligence projects we did in the manufacturing industry, which helped our customers in product quality improvement, cost reduction and better asset management. We will talk about the methodologies used, the results achieved and the lessons learned from these projects. We will specifically focus on the importance of process and business knowledge for successful implementation of any industrial project.
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom (Sudarson Roy Pratihar)
Presentation of a successful project executed on telecom fraud analytics at the 3rd International Conference for Business Analytics and Intelligence, Indian Institute of Management Bangalore.
Data Science methods and case studies in anomaly detection - from SPC to Deep Learning Autoencoders and signature analysis. Includes application of models on event streams. Case study - IoT sensor data from energy production facilities.
(Mike Graham + Dan Carroll, Comcast) Kafka Summit SF 2018
Comcast manages over 2 million miles of fiber and coax, and over 40 million in-home devices. This “outside plant” is subject to adverse conditions from severe weather to power grid outages to construction-related disruptions. Maintaining the health of this large and important infrastructure requires a distributed, scalable, reliable and fast information system capable of real-time processing and rapid analysis and response. Using Apache Kafka and the Kafka Streams Processor API, Comcast built an innovative new system for monitoring, problem analysis, metrics reporting and action response for the outside plant.
In this talk, you’ll learn how topic partitions, state stores, key mapping, source and sink topics and processors from the Kafka Streams Processor API work together to build a powerful dynamic system. We will dive into the details about the inner workings of the state store—how it is backed by a Kafka “changelog” topic, how it is scaled horizontally by partition and how the instances are rebuilt on startup or on processor failure. We will discuss how these state stores essentially become like materialized views in a SQL database but are updated incrementally as data flows through the system, and how this allows the developers to maintain the data in the optimal structures for performing the processing. The best part is that the data is readily available when needed by the processors. You will see how a REST API using Kafka Streams “interactive queries” can be used to retrieve the data in the state stores. We will explore the deployment and monitoring mechanisms used to deliver this system as a set of independently deployed components.
Observability - The good, the bad and the ugly (Aleksandr Tavgen, XP Days 2019, Kyiv, Ukraine)
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Plus some capabilities of the Flux language by InfluxData.
Grails has great performance characteristics but as with all full stack frameworks, attention must be paid to optimize performance. In this talk Lari will discuss common missteps that can easily be avoided and share tips and tricks which help profile and tune Grails applications.
In this presentation, you'll learn how to troubleshoot bandwidth issues with NetFlow Analyzer.
Topics covered:
1. Customizing data storage
2. Customizing dashboards
3. Reporting and automation
4. Troubleshooting with forensics
5. Traffic shaping
6. Capacity planning and billing
To know more, visit www.netflowanalyzer.com
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex (Apache Apex)
This is an overview of the architecture, with use cases, of Apache Apex, a big data analytics platform. It comes with a powerful stream processing engine, a rich set of functional building blocks and an easy-to-use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn more about two use cases: A leading ad tech company serves billions of advertising impressions and collects terabytes of data from several data centers across the world every day. Apex was used to implement rapid actionable insights, for real-time reporting and allocation, utilizing Kafka and files as sources, dimensional computation and low-latency visualization. A customer in the IoT space uses Apex for a time series service, including efficient storage of time series data, data indexing for quick retrieval, and queries at high scale and precision. The platform leverages the high availability, horizontal scalability and operability of Apex.
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation (Impetus Technologies)
Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.
However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager - StreamAnalytix, in a discussion on:
• Importance of anomaly detection in enterprise data, types of anomalies, and challenges
• Prominent real-time application areas
• Approaches, techniques and algorithms for anomaly detection
• Sample use-case implementation on the StreamAnalytix platform
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection (Sam Bowne)
Slides for a college course based on "Incident Response & Computer Forensics, Third Edition" by Jason Luttgens, Matthew Pepe, and Kevin Mandia.
Teacher: Sam Bowne
Website: https://samsclass.info/121/121_F16.shtml
6. THE STREAMING ANALYTICS PROBLEM
Given all past input and the current input, decide whether the system behavior is anomalous right now. Must report the decision and perform any retraining, bookkeeping, etc. before the next input arrives.
• No look-ahead
• No training/test set split – everything must be done online
• System must be automated, and customized to each stream
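The constraints above amount to a single online loop: score each point first, then learn from it, with no look-ahead and no train/test split. The toy detector below is a deliberately simple stand-in (a running mean/variance rule, not HTM) used only to illustrate that contract:

```python
# Toy detector obeying the streaming contract: decide on each point
# using only the past, then update state before the next point arrives.
# (A running mean/variance rule stands in for a real model like HTM.)
class RunningStatsDetector:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's algorithm)
        self.threshold = threshold

    def handle(self, x):
        # 1. Decide first: is this point anomalous right now? (No look-ahead.)
        if self.n > 1:
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomalous = std > 0 and abs(x - self.mean) > self.threshold * std
        else:
            anomalous = False  # too little history to judge
        # 2. Then retrain/bookkeep before the next input arrives.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = RunningStatsDetector()
stream = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0]
flags = [detector.handle(x) for x in stream]  # only the final point flags
```

Everything a real streaming detector does has to fit inside `handle`: the decision, the model update, and any bookkeeping, all before the next input lands.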
7. HIERARCHICAL TEMPORAL MEMORY (HTM)
• Powerful sequence memory derived from recent findings in experimental neuroscience
• High-capacity memory-based system
• Models temporal sequences in data
• Inherently streaming
• Continuously learning and predicting
• No need to tune hyper-parameters
• Open source: github.com/numenta
8. HTM PREDICTS FUTURE INPUT
• Input to the system is a stream of data
• Each input is encoded into a sparse high-dimensional vector
• HTM learns temporal sequences in the input stream and makes a prediction in the form of a sparse vector
• The predicted sparse vector represents a prediction for the upcoming input
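The "encoded into a sparse high-dimensional vector" step can be illustrated with a toy scalar encoder — a greatly simplified stand-in for the encoders in NuPIC, with all parameter values invented for illustration:

```python
# Toy scalar encoder: a contiguous run of w active bits out of n,
# positioned by the value. Nearby values share active bits, so the
# overlap between two encodings reflects similarity. (Parameters are
# invented; NuPIC's real encoders are more elaborate.)
def encode_scalar(value, min_val=0.0, max_val=100.0, n=400, w=21):
    clipped = min(max(value, min_val), max_val)
    # Leftmost active bit, scaled so the run always fits within n bits.
    start = int((clipped - min_val) / (max_val - min_val) * (n - w))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits

a = encode_scalar(50.0)
b = encode_scalar(52.0)
c = encode_scalar(90.0)
near = sum(x & y for x, y in zip(a, b))  # high overlap: similar values
far = sum(x & y for x, y in zip(a, c))   # no overlap: dissimilar values
```

Overlap-as-similarity is what lets the sequence memory generalize: an input slightly different from a prediction still shares most of its active bits with it.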
9. ANOMALY DETECTION WITH HTM
[Diagram: input → HTM → raw anomaly score → anomaly likelihood]
• The raw anomaly score is an instantaneous measure of prediction error
• 0 if the input was perfectly predicted
• 1 if it was completely unpredicted
• Could threshold it directly to report anomalies, but in very noisy environments we can do better
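For sparse binary vectors, this score has a natural form: the fraction of the currently active bits that were not in the previous prediction. A minimal sketch, using sets of active-bit indices to stand in for the sparse vectors:

```python
def raw_anomaly_score(predicted_bits, actual_bits):
    """Fraction of the active input that was NOT predicted:
    0.0 -> input perfectly predicted, 1.0 -> completely unpredicted."""
    actual = set(actual_bits)
    if not actual:
        return 0.0  # nothing active, nothing to be surprised by
    predicted = set(predicted_bits)
    return 1.0 - len(actual & predicted) / len(actual)

perfect = raw_anomaly_score({3, 8, 21}, {3, 8, 21})     # 0.0
surprise = raw_anomaly_score({3, 8, 21}, {40, 41, 42})  # 1.0
partial = raw_anomaly_score({3, 8, 99}, {3, 8, 21})     # 1/3 unpredicted
```

Because the score is instantaneous and bounded in [0, 1], one noisy input produces one high score — which is exactly why a second-order measure is needed in noisy environments.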
10. ANOMALY LIKELIHOOD
• Second order measure: did the predictability of the metric change?
1. Estimate historical distribution of anomaly scores
2. Check if recent scores are very different
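The two steps above can be sketched directly. This is not NuPIC's exact implementation (which maintains rolling windows and moving averages); it is the same idea in its simplest form: fit a Gaussian to the historical raw scores and ask how unlikely the recent short-term mean is under it:

```python
import math

def anomaly_likelihood(score_history, recent_window=10):
    """Second-order measure over raw anomaly scores:
    1. estimate the historical distribution of scores (Gaussian fit),
    2. check whether recent scores are very different (tail probability
       of the recent short-term mean under that distribution)."""
    mean = sum(score_history) / len(score_history)
    var = sum((s - mean) ** 2 for s in score_history) / len(score_history)
    std = max(math.sqrt(var), 1e-6)  # guard against zero variance
    recent = score_history[-recent_window:]
    recent_mean = sum(recent) / len(recent)
    # Gaussian tail probability (Q-function) of the recent mean.
    tail = 0.5 * math.erfc((recent_mean - mean) / (std * math.sqrt(2)))
    return 1.0 - tail  # near 1.0 when predictability has dropped

steady = anomaly_likelihood([0.1] * 100)               # nothing changed
shifted = anomaly_likelihood([0.1] * 90 + [0.9] * 10)  # recent scores jumped
```

A single noisy spike barely moves the recent mean, so the likelihood stays low; only a sustained change in predictability pushes it toward 1.0 — which is the point of the second-order measure.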
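The two steps above can be sketched as follows: model the historical distribution of raw anomaly scores as a Gaussian, then ask how unlikely the recent short-term average is under that distribution. This is a simplified illustration of the idea; window sizes and the Gaussian assumption are mine, not the exact NuPIC computation.

```python
# Anomaly-likelihood sketch: a high value means the recent average of raw
# anomaly scores is far above the historical distribution, i.e. the
# predictability of the metric has changed.

import math
from statistics import mean, pstdev

def anomaly_likelihood(scores, short_window=10):
    """Likelihood that recent scores differ from the history."""
    history, recent = scores[:-short_window], scores[-short_window:]
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return 1.0 if mean(recent) > mu else 0.0
    z = (mean(recent) - mu) / sigma
    tail = 0.5 * math.erfc(z / math.sqrt(2))  # Gaussian upper-tail probability
    return 1.0 - tail  # near 1.0 => predictability changed significantly

steady = anomaly_likelihood([0.1, 0.2] * 55)               # no change
shifted = anomaly_likelihood([0.1, 0.2] * 50 + [0.9] * 10)  # recent jump
```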
12. ANOMALY DETECTION WITH HTM
[Diagram: input → HTM (learns temporal sequences, continuously learning, continuously makes predictions) → raw anomaly score (was the current input predicted?) → anomaly likelihood (has the level of predictability changed significantly?)]
13. ANOMALIES IN IT INFRASTRUCTURE
• Grok
• Commercial server-based product that detects anomalies in IT infrastructure
• Runs thousands of HTM anomaly detectors in real time
• 10 milliseconds per input per metric, including continuous learning
• No parameter tuning required
• http://grokstream.com
14. ANOMALIES IN FINANCIAL DATA
• HTM for Stocks
• Real-time free demo application
• Continuously monitors top 200 stocks
• Available on iOS App Store or Google Play Store
• Open source application: github.com/numenta/numenta-apps
16. EVALUATING STREAMING ANOMALY DETECTION
• Most existing benchmarks are designed for batch data, not streaming data
• Hard to find benchmarks containing real-world data labeled with anomalies
• There is a need for an open benchmark designed to test real-time anomaly detection
• A standard community benchmark could spur innovation in streaming anomaly detection algorithms
20. NUMENTA ANOMALY BENCHMARK (NAB)
• NAB: a rigorous benchmark for anomaly detection in streaming applications
• Real-world benchmark data set
• 58 labeled data streams (47 real-world, 11 artificial streams)
• Total of 365,551 data points
• Scoring mechanism
• Rewards early detection
• Different “application profiles”
• Open resource
• AGPL repository contains data, source code, and documentation
• github.com/numenta/NAB
• Ongoing competition to expand NAB
23. HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector
• Detects anomalies as soon as possible
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• Automatically adapts to changing statistics
• Scoring methods in traditional benchmarks are insufficient
• Precision/recall does not incorporate the importance of early detection
• Artificial separation into training and test sets does not handle continuous learning
• Batch data files allow look ahead and multiple passes through the data
25. NAB DEFINES ANOMALY WINDOWS
The NAB scoring function gives a higher score to earlier detections within the window
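The shape of that reward can be sketched with a scaled sigmoid over a detection's position relative to the anomaly window: close to +1 near the start of the window, 0 at the window's end, and negative beyond it (false-positive territory). This is a simplified illustration; see the NAB repository for the exact scoring function and weights.

```python
# NAB-style scoring sketch: earlier detections inside the anomaly window
# earn higher scores; detections past the window are penalized.

import math

def scaled_sigmoid(y):
    """~ +1 deep inside the window, 0 at the window end, negative beyond."""
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0

def detection_score(t, window_start, window_end):
    width = window_end - window_start
    y = (t - window_end) / width  # -1 at window start, 0 at window end
    return scaled_sigmoid(y)

early = detection_score(100, 100, 110)  # detection at window start
late = detection_score(110, 100, 110)   # detection at window end
```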
26. OTHER DETAILS
• Application profiles
• Three application profiles assign different weightings based on the tradeoff between false positives and false negatives
• For EKG data on a cardiac patient, a missed anomaly is worse than a false alarm, so false positives are tolerated
• IT / DevOps professionals hate false positives
• The three profiles: standard, reward low false positives, reward low false negatives
• NAB emulates practical real-time scenarios
• Look ahead not allowed for algorithms. Detections must be made on the fly.
• No separation between training and test files. Invoke model, start streaming, and go.
• No batch parameter tuning. Must be fully automated with a single set of parameters across data streams. Any further parameter tuning must be done on the fly.
27. TESTING ALGORITHMS WITH NAB
• NAB is designed to easily plug in and test new algorithms
• Results with several algorithms:
• Hierarchical Temporal Memory
• Etsy Skyline
• Popular open source anomaly detection technique
• Mixture of statistical experts, continuously learning
• Twitter ADVec
• Open source anomaly detection released last year
• Robust outlier statistics + piecewise approximation
• Bayesian Online Change Point Detection
• Formal Bayesian method for detecting anomalies in time series
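Plugging in a new algorithm amounts to wrapping it in a detector that receives one record at a time and returns an anomaly score in [0, 1]. The base-class and method names below are illustrative, not NAB's actual interface; see the NAB repository for the real detector API.

```python
# Sketch of a NAB-style detector plug-in: the harness streams records one
# at a time and expects a per-record anomaly score in [0, 1].
# Class and method names here are approximations, not NAB's actual API.

class BaseDetector:
    def __init__(self, input_min, input_max):
        self.input_min = input_min
        self.input_max = input_max

    def handle_record(self, value):
        raise NotImplementedError

class RangeDetector(BaseDetector):
    """Scores each value by how far it sits outside the expected range."""

    def handle_record(self, value):
        span = self.input_max - self.input_min
        if self.input_min <= value <= self.input_max:
            return 0.0
        overshoot = max(self.input_min - value, value - self.input_max)
        return min(1.0, overshoot / span)

detector = RangeDetector(input_min=0.0, input_max=10.0)
scores = [detector.handle_record(v) for v in [5.0, 9.0, 15.0, 30.0]]
```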
29. DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec on a CPU-usage stream; red denotes a false positive. All 3 algorithms detect a simple spike; a shift in usage follows.]
30. DETECTION RESULTS: MACHINE TEMPERATURE READINGS
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec on machine temperature readings; red denotes a false positive. HTM detects a purely temporal anomaly; all 3 algorithms detect the catastrophic failure.]
31. DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes a false positive. HTM detects the anomaly 3 hours earlier.]
32. NAB COMPETITION!!
• NAB is a resource for the streaming analytics community
• Need additional real-world data files and more algorithms tested
• NAB Competition offers cash prizes for:
• Additional anomaly detection algorithms tested on NAB
• Submission of real-world data files with labeled real anomalies
• Cash prizes of $2,500 each for algorithms and data
• Easy to enter, high likelihood of winning!
• Go to http://numenta.org/nab for details
33. SUMMARY
• Anomaly detection for streaming data imposes unique challenges
• Stringent real-time constraints and automation requirements
• Typical batch methodologies do not work well
• HTM learning algorithms
• Can be used to create a streaming anomaly detection system
• Perform very well across a wide range of datasets
• Open source, commercially deployable
• NAB is an open source benchmark for streaming anomaly detection
• Includes a labeled dataset with real world data
• Scoring methodology designed for practical real-time applications
• NAB competition!
34. RESOURCES
Grok (anomalies in IT infrastructure): http://grokstream.com
HTM Studio (desktop app for easy experimentation): contact me
Open Source Repositories:
Algorithm code: https://github.com/numenta/nupic
HTM Stocks demo: https://github.com/numenta/numenta-apps
NAB code + paper: https://github.com/numenta/nab
Apache Flink: https://github.com/nupic-community/flink-htm
Contact info:
Subutai Ahmad sahmad@numenta.com, @SubutaiAhmad
Alex Lavin alavin@numenta.com, @theAlexLavin