Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark (Numenta)
Subutai Ahmad, VP of Research, presents NAB and discusses the need for evaluating real-time anomaly detection algorithms. This presentation was delivered at MLconf (the Machine Learning Conference) in San Francisco in 2015.
Abstract:
There’s no question that we are seeing an increase in the availability of streaming, time-series data. Largely driven by the rise of the Internet of Things (IoT) and connected real-time data sources, we now have an enormous number of applications with sensors that produce important data that changes over time. This data presents a challenge and opportunity for businesses across every industry. How do they handle the onslaught of streaming data? How can they exploit it to make decisions in real-time? One way is to detect, in real time, when something unusual occurs. Early anomaly detection in streaming data has significant implications, yet can be very difficult to execute. It requires detectors to process data in real-time, not batches, and learn while simultaneously making predictions. In this talk, we’ll look at algorithms designed for such data and analyze the components that lead to optimal performance. We’ll also discuss a new benchmark with a labeled, real-world data set, designed to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. How do we score in a way that rewards algorithms that detect all anomalies as soon as possible, triggers no false alarms, works with real-world time-series data across a variety of domains, and automatically adapts to changing statistics?
Subutai Ahmad, VP of Research, Numenta, at MLconf SF - 11/13/15 (MLconf)
Real-time Anomaly Detection for Real-time Data Needs: Much of the world’s data is becoming streaming, time-series data, where anomalies carry significant information in often-critical situations. Examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real time, not batches, and to learn while simultaneously making predictions. Are there algorithms up for the challenge? Which are the most capable? The Numenta Anomaly Benchmark (NAB) attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. These characteristics are formalized in NAB, using a custom scoring algorithm to evaluate detectors on a benchmark dataset of labeled, real-world time-series data. We present these components and describe the end-to-end scoring process. We give results and analyses for several algorithms to illustrate NAB in action. The goal for NAB is to provide a standard, open-source framework with which we can compare and evaluate different algorithms for detecting anomalies in streaming data.
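To make the scoring idea concrete, here is a minimal Python sketch of NAB-style windowed scoring. This is not NAB's exact implementation: the sigmoid steepness, the false-positive weight, and the single-window interface are simplifying assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nab_style_score(window, detections, fp_weight=0.11):
    """Score a detector against one labeled anomaly window (simplified).

    window     -- (start, end) index bounds of the anomaly window
    detections -- indices where the detector raised an alarm, sorted
    fp_weight  -- penalty per false positive (an application-profile knob)
    """
    start, end = window
    width = float(end - start)
    score = 0.0
    hits = [d for d in detections if start <= d <= end]
    if hits:
        # Reward the earliest detection inside the window: detecting at
        # the window start scores close to +1, at the window end about 0.
        rel = (hits[0] - end) / width        # in [-1, 0]
        score += 2.0 * sigmoid(-5.0 * rel) - 1.0
    else:
        score -= 1.0                         # missed the anomaly entirely
    # Detections outside the window count as false positives.
    score -= fp_weight * sum(1 for d in detections if not (start <= d <= end))
    return score
```

The shape captures the stated goals: earlier detections earn more credit, a missed window costs a full point, and every false alarm subtracts a profile-dependent penalty.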
Extending Flink for anomaly detection with Hierarchical Temporal Memory (HTM). Presented at Bay Area Apache Flink Meetup, in San Jose on June 27, 2016.
https://github.com/htm-community/flink-htm
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ... (tboubez)
This is my presentation from LISA 2014 in Seattle on November 14, 2014.
Most IT Ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this haystack of data and extracting signal from the noise is not easy and generates too many false positives.
In this talk I will show some of the types of anomalies commonly found in dynamic data center environments and discuss the top 5 things I learned while building algorithms to find them. You will see how various Gaussian-based techniques work (and why they don’t!), and we will go into some non-parametric methods that you can use to great advantage.
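As an illustration of the contrast the talk draws, here is a minimal sketch (in Python, which the talk itself does not use) of a Gaussian k-sigma detector next to a non-parametric median/MAD alternative:

```python
import statistics

def gaussian_flags(values, k=3.0):
    """Classic k-sigma rule: flag points more than k standard deviations
    from the mean. A single large outlier inflates both the mean and
    sigma, so it can mask itself, and skewed data breaks the assumption."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [abs(v - mu) > k * sigma for v in values]

def nonparametric_flags(values, k=3.0):
    """Robust alternative: flag points more than k scaled MADs from the
    median. No distributional assumption; an outlier barely moves the
    median, so it cannot mask itself."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad or 1e-9  # 1.4826 makes MAD comparable to sigma
    return [abs(v - med) > k * scale for v in values]
```

On `[10, 11, 10, 9, 10, 100]` the Gaussian version misses the obvious outlier, because the outlier itself inflates sigma past the 3-sigma bar, while the MAD version flags it.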
A Practical Guide to Anomaly Detection for DevOps (BigPanda)
Recent years have seen an explosion in the volume of data that modern production environments generate. Making fast, educated decisions about production incidents is more challenging than ever. BigPanda's team is passionate about solutions, such as anomaly detection, that tackle this very challenge.
This is a follow-up to a previous talk on hacking my energy monitor. In this talk I go into detail on how I used machine learning techniques in the area of anomaly detection to draw more value from my data collection.
Finding bad apples early: Minimizing performance impact (Arun Kejariwal)
The big data era is characterized by the ever-increasing velocity and volume of data. To store and analyze this ever-growing data, the operational footprint of data stores and Hadoop has also grown over time. (As per a recent report from IDC, spending on big data infrastructure is expected to reach $41.5 billion by 2018.) These clusters comprise several thousand nodes, and their high performance is vital for delivering the best user experience and team productivity.
The performance of such clusters is often limited by slow/bad nodes. Finding slow nodes in large clusters is akin to finding a needle in a haystack; hence, manual identification of slow/bad nodes is not practical. To this end, we developed a novel statistical technique to automatically detect slow/bad nodes in clusters comprising hundreds to thousands of nodes. We modeled the problem as a classification problem and employed a simple, yet very effective, distance measure to determine slow/bad nodes. The key highlights of the proposed technique are the following:
* Robustness against anomalies (note that anomalies may occur, for example, due to an ad-hoc heavyweight job on a Hadoop cluster)
* Given the varying data characteristics of different services, no one model fits all; consequently, we parameterized the threshold used for classification
The proposed technique works well with both hourly and daily data, and has been in use in production by multiple services. This has not only eliminated manual investigation efforts, but has also mitigated the impact of slow nodes, which used to get detected after several weeks/months of lag!
We walk the audience through how the techniques are being used with real data.
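The talk does not publish its exact distance measure; the following Python sketch shows one plausible shape of the idea - summarize each node robustly, then flag nodes far from the cluster-typical value under a per-service threshold. The MAD-based distance and the 1.4826 scaling are assumptions for illustration, not the authors' method.

```python
import statistics

def find_slow_nodes(latencies_by_node, threshold=3.0):
    """Flag slow/bad nodes by distance from cluster-typical behavior.

    latencies_by_node -- mapping of node name to a list of latency samples
    threshold         -- per-service knob (no one value fits all services)

    Each node is summarized by its median sample, which is robust to the
    occasional anomaly such as an ad-hoc heavyweight job; nodes whose
    summary sits far above the cluster median (in scaled-MAD units) are
    reported as slow.
    """
    summaries = {n: statistics.median(s) for n, s in latencies_by_node.items()}
    center = statistics.median(summaries.values())
    mad = statistics.median(abs(v - center) for v in summaries.values())
    scale = 1.4826 * mad or 1e-9
    return sorted(n for n, v in summaries.items()
                  if (v - center) / scale > threshold)
```

The one-sided test (`> threshold` rather than an absolute value) reflects that only slow nodes, not unusually fast ones, hurt cluster performance.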
Anomaly detection in real-time data streams using Heron (Arun Kejariwal)
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
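A minimal sketch of the kind of approach described above - a bounded window standing in for sampling, with median/MAD as the robust measure. This is illustrative only, not Heron's or Twitter's actual detector.

```python
from collections import deque
import statistics

class RobustStreamDetector:
    """Streaming detector sketch: keep a bounded recent window (a simple
    stand-in for reservoir sampling) and flag points far from the window
    median in scaled-MAD units. Median and MAD resist contamination by
    the very anomalies being detected, unlike mean and stddev."""

    def __init__(self, window=100, threshold=4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, x):
        """Process one point; return True if it looks anomalous."""
        anomalous = False
        if len(self.buf) >= 10:          # warm-up before judging
            med = statistics.median(self.buf)
            mad = statistics.median(abs(v - med) for v in self.buf)
            scale = 1.4826 * mad or 1e-9
            anomalous = abs(x - med) / scale > self.threshold
        self.buf.append(x)
        return anomalous
```

A production version would trade the per-point median recomputation for an incremental estimate; the point here is only the robust-statistics shape of the decision.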
Naveed Ahmad, Microsoft
Anomaly detection is the de facto standard in cyber defense. However, anomaly detection produces a large number of false alerts on highly unusual but benign activity. Security detections based on supervised machine learning can reduce the noise, but they require a large number of labelled attack examples for training, which are not always available.
Successful cyber-attacks against a well-guarded online service like Office 365 are scarce. There are hundreds of thousands of machines with daily benign activity against a meager few hundred attack examples collected over the years from pen-test engagements. Training a well-performing binary classifier with supervised machine learning on such a skewed dataset, with so few attack examples, is extremely hard.
The presentation covers various techniques for crafting synthetic attack examples from known past attacks. These techniques are used to train the machine learning models guarding Office 365 online services against cyber-attacks, predicting malicious activity with alertable accuracy. The presentation describes these techniques with use cases from Office 365 services and the resulting model performance improvements.
Techniques discussed in the presentation are:
Cartesian Bootstrapping - This technique samples benign activities from thousands of machines and combines them with known malicious examples via a Cartesian product, producing a large number of synthetic attack examples with varying degrees of embedded benign noise. This helps produce models that classify malicious and benign examples with greater accuracy and fewer false alerts.
Normalized Sampler Bootstrapping - This technique is very useful for micro-services with very few machines. It instead generates synthetic benign examples to match the relatively larger number of malicious examples borrowed from other services. The synthetic examples are generated by sampling benign noise from the examples after removing outliers. This allows measuring the effectiveness of a model for a micro-service when the model was trained on another, larger service.
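A toy sketch of the Cartesian bootstrapping idea described above. The set-of-features representation and the union operation are assumptions for illustration; the real pipeline's feature encoding is not described in the abstract.

```python
from itertools import product

def cartesian_bootstrap(benign_sets, attack_examples):
    """Cartesian-bootstrapping sketch, with activities as feature sets:
    every benign activity profile is combined with every known attack,
    yielding len(benign_sets) * len(attack_examples) synthetic attacks,
    each carrying a different flavor of embedded benign noise.

    benign_sets     -- list of per-machine sets of benign activity features
    attack_examples -- list of feature sets from known past attacks
    """
    return [benign | attack
            for benign, attack in product(benign_sets, attack_examples)]
```

A classifier trained on these synthetic positives learns that the attack features matter regardless of which benign background they appear against, which is the noise-robustness the abstract claims.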
Developing Highly Instrumented Applications with Minimal Effort (Tim Hobson)
Presentation from Silicon Valley Code Camp 2013. Related code on GitHub:
* https://github.com/hoserdude/mvcmusicstore-instrumented
* https://github.com/hoserdude/spring-petclinic-instrumented
* https://github.com/hoserdude/nodecellar-instrumented
Brains, Data, and Machine Intelligence (2014 04 14 London Meetup) (Numenta)
Jeff Hawkins discusses brains, data, and machine intelligence, the Cortical Learning Algorithm he developed, and the Numenta Platform for Intelligent Computing (NuPIC).
Why Do Neurons Have Thousands of Synapses? A Model of Sequence Memory in the Brain (Numenta)
Presentation given by Yuwei Cui, Numenta Research Engineer, at Beijing Normal University, December 2015.
Collaborators: Jeff Hawkins, Subutai Ahmad, Chetan Surpur
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...) (confluent)
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of lag in a real-time application may be an application problem – like poor data partitioning or load imbalance – or a Kafka problem – like resource exhaustion or suboptimal configuration. Therefore, getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation (Impetus Technologies)
Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.
However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager - StreamAnalytix, in a discussion on:
* Importance of anomaly detection in enterprise data, types of anomalies, and challenges
* Prominent real-time application areas
* Approaches, techniques and algorithms for anomaly detection
* Sample use-case implementation on the StreamAnalytix platform
SmartData Webinar: Applying Neocortical Research to Streaming Analytics (DATAVERSITY)
We are witnessing an explosion of sensors and machine generated data. Every server, every building, and every device generates a continuous stream of information that is ever changing and potentially valuable. The existing big data paradigm requires storing data for batch analysis, and extensive modeling by a human expert, prior to deployment. This is incredibly inefficient and cannot scale.
In this webinar, Ahmad will describe a new paradigm for streaming data algorithms, based on recent neuroscience findings and on the computational properties of the neocortex. These systems are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. Many of the core ideas have been implemented in the open source project NuPIC, and validated in commercial anomaly detection and predictive maintenance applications. Given the massive increase in the number of data sources, a general-purpose automated approach is the only scalable way to effectively analyze and act on continuously streaming information.
Drilling systems automation is the real-time reliance on digital technology in creating a wellbore. It encompasses downhole tools and systems, surface drilling equipment, remote monitoring and the use of models and simulations while drilling. While its scope is large, its potential benefits are impressive, among them: fewer workers exposed to rig-floor hazards, the ability to realize repeatable performance drilling, and lower drilling risk. While drilling systems automation includes new drilling technology, it is most importantly a collaborative infrastructure for performance drilling. In 2008, a small group of engineers and scientists attending an SPE conference noted that automation was becoming a key topic in drilling, and they formed a technical section to investigate it further. By 2015, the group had reached a membership of sixteen hundred as the technology rapidly gained acceptance. Why so much interest? The benefits and promises of an automated approach to drilling address the safety and fundamental economics of drilling. What will it take? Among the answers are an open collaborative digital environment at the wellsite, an openness of mind to digital technologies, and modified or new business practices. What are the barriers? The primary barrier is a lack of understanding and a fear of automation. When will it happen? It is happening now. Digital technologies are transforming the infrastructure of the drilling industry. Drilling systems automation uses this infrastructure to deliver safety and performance, and address cost.
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T... (Adrian Cockcroft)
Monitorama opening keynote talk on the challenges of Monitoring in a world where we need to deal with continuous delivery, cloud, and automated control feedback loops.
A Deep Learning use case for water end use detection by Roberto Díaz and José... (Big Data Spain)
Deep Learning (DL) is a major breakthrough in artificial intelligence with a high potential for predictive applications.
https://www.bigdataspain.org/2017/talk/a-deep-learning-use-case-for-water-end-use-detection
Big Data Spain 2017
November 16th - 17th, Kinépolis Madrid
ATI Courses Professional Development Short Course: Applied Measurement Engin... (Jim Jenkins)
How do you know your test measurements are valid? Since NIST traceability actually guarantees little about your test data, how do you know? Could you prove validity to your customer? What is the right measurements solution for your testing requirements? Is it really as simple as the vendors say? What is your real cost of invalid, ambiguous data causing retest or, worst of all, hardware redesign?
This course is for engineers, scientists, and managers who must use systems to understand experimental test measurements on a daily basis. Learn how to design, buy and operate effective automated measurement systems providing demonstrably valid test data, the first time.
Fundamental & underlying engineering principles governing the design and operation of effective automated systems are demonstrated experimentally.
In this talk I will review several real-world applications and tools developed at the University of Waikato over the past 15 years. The early applications focused on agricultural problems such as cow culling, venison bruising and grass grubs. Following this, we looked at the use of near-infrared spectroscopy coupled with data mining as an alternative laboratory technique for predicting compound concentrations in soil and plant samples. Our latest application is in the area of gas chromatography mass spectrometry (GCMS), a technique used, in environmental applications for example, to determine the petroleum content of soil and water samples.
Time Series Anomaly Detection with .net and AzureMarco Parenzan
If you have any device or source that generates values over time (even a log from a service), you want to determine whether, in a given time frame, the time series is normal or contains anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
Observability - The Good, the Bad and the Ugly (XP Days 2019, Kiev, Ukraine) (Aleksandr Tavgen)
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Also covers some capabilities of the Flux language by InfluxData.
Real-Time Detection of Anomalies in the Database Infrastructure using Apache ... (Spark Summit)
At CERN, the biggest physics laboratory in the world, large volumes of data are generated every hour, which poses serious challenges for storing and processing it all. An important part of this responsibility falls to the database group, which provides services not only for RDBMSs but also for scalable systems such as Hadoop, Spark and HBase. Since databases are critical, they need to be monitored; to that end, we have built a highly scalable, secure and central repository that stores consolidated audit data as well as listener, alert and OS log events generated by the databases. This central platform is used for reporting, alerting and security policy management. The database group wants to further exploit the information available in this central repository to build an intrusion detection system that enhances the security of the database infrastructure, and to build pattern detection models that flush out anomalies using the monitoring and performance metrics available in the central repository. Finally, this platform also helps us with capacity planning for the database deployment. The audience will get first-hand experience of how to build a real-time Apache Spark application that is deployed in production. They will hear the challenges faced and decisions taken while developing the application, and how to troubleshoot Apache Spark and Spark Streaming applications in production.
Rise of the Machines -- OWASP Israel -- June 2014 Meetup (Shlomo Yona)
Shlomo Yona presents why it is a good idea to use machine learning in security, explains some machine learning jargon, and demonstrates with two fingerprinting examples: a WiFi device (PHY) and a browser (L7).
How the Big Data of APM can Supercharge DevOps (CA Technologies)
In an age where applications reign supreme, organizations must be agile in application performance management and app development in order to meet market demands and stay competitive. Even with mature APM solutions, developer, test and operations teams are strained by operational complexity, accelerated release schedules, and big data challenges when trying to quickly find the root cause of issues affecting end-user experience.
The power of advanced analytics and data science can help us make the most of the vast cache of APM data we collect and help our DevOps teams supercharge user experience. It’s time to take some of the load off of our humans and let technology make it easier to focus on meaningful changes in user, application and system behavior. Analytics are becoming a valuable component of APM solutions to redefine triage, improve application quality, and delight the end-user.
In a webcast on August 7th, 2014, Ken Godskind, Chief blogger and Analyst, APMExaminer.com shared how the big data of APM can supercharge your DevOps transformation. Chris Kline, Senior Director, CA Technologies followed Ken and discussed how the Advanced Behavior Analytics capability of CA APM can assist in this journey.
Ken and Chris used this slide set during the webcast which can be viewed at http://goo.gl/TZYEuq
Dependable Operation - Performance Management and Capacity Planning Under Con... (Liming Zhu)
Talk at the http://www.cmga.org.au/ meetup.
Modern large-scale applications experience sporadic changes due to operational activities such as upgrade, redeployment, on-demand scaling and interferences from other simultaneous operations. This poses new challenges in system monitoring, capacity planning, performance management, error detection and diagnosis. For example, the traditional anomaly-detection-based techniques are less effective during the “sporadic” operation period as a wide range of legitimate changes confound the situation and make performance baseline establishment for “normal” operation difficult. The increasing frequency of these sporadic operations (e.g. due to continuous deployment) is exacerbating the problem. In this talk, we will introduce a number of ongoing research activities at NICTA addressing these issues. For example, we propose the Process Oriented Dependability (POD) approach, an approach that explicitly models these sporadic operations as processes and uses the process context to filter logs, traverse fault trees and conduct adaptive monitoring.
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy (Numenta)
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
Neuromodulators are signalling chemicals in the brain, which control the emergence of adaptive learning and behaviour. Neuromodulators including dopamine, acetylcholine, serotonin and noradrenaline operate on a spectrum of spatio-temporal scales in tandem and opposition to reconfigure functions of biological neural networks and to regulate global cognition and state transition. Although neuromodulators are important in shaping cognition, their phenomenology is yet to be fully realized in deep neural networks (DNNs). In this talk, we will give an overview of the biological organizing principles of neuromodulators in adaptive cognition and highlight the competition and cooperation across neuromodulators.
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas MiconiNumenta
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
A hallmark of intelligence is the ability to learn new flexible, cognitive behaviors - that is, behaviors that require discovering, storing and exploiting novel information for each new instance of the task. In meta-learning, agents are trained with external algorithms to learn one specific cognitive task. However, animals are able to pick up such cognitive tasks automatically, as a result of their evolved neural architecture and synaptic plasticity mechanisms, including neuromodulation. Here we evolve neural networks, endowed with plastic connections and reward-based neuromodulation, over a sizable set of simple meta-learning tasks based on a framework from computational neuroscience. The resulting evolved networks can automatically acquire a novel simple cognitive task, never seen during evolution, through the spontaneous operation of their evolved neural organization and plasticity system. We suggest that attending to the multiplicity of loops involved in natural learning may provide useful insight into the emergence of intelligent behavior.
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Numenta
We receive information about the world through our sensors and influence the world through our effectors. Such low-level data has gradually come to play a greater role in AI during its 70-year history. I see this as occurring in four steps, two of which are mostly past and two of which are in progress or yet to come. The first step was to view AI as the design of agents which interact with the world and thereby have sensorimotor experience; this viewpoint became prominent in the 1980s and 1990s. The second step was to view the goal of intelligence in terms of experience, as in the reward signal of optimal control and reinforcement learning. The reward formulation of goals is now widely used but rarely loved. Many would prefer to express goals in non-experiential terms, such as reaching a destination or benefiting humanity, but settle for reward because, as an experiential signal, reward is directly available to the agent without human assistance or interpretation. This is the pattern that we see in all four steps. Initially a non-experiential approach seems more intuitive, is preferred and tried, but ultimately proves a limitation on scaling; the experiential approach is more suited to learning and scaling with computational resources. The third step in the increasing role of experience in AI concerns the agent’s representation of the world’s state. Classically, the state of the world is represented in objective terms external to the agent, such as “the grass is wet” and “the car is ten meters in front of me”, or with probability distributions over world states such as in POMDPs and other Bayesian approaches. Alternatively, the state of the world can be represented experientially in terms of summaries of past experience (e.g., the last four Atari video frames input to DQN) or predictions of future experience (e.g., successor representations). The fourth step is potentially the biggest: world knowledge. 
Classically, world knowledge has always been expressed in terms far from experience, and this has limited its ability to be learned and maintained. Today we are seeing more calls for knowledge to be predictive and grounded in experience. After reviewing the history and prospects of the four steps, I propose a minimal architecture for an intelligent agent that is entirely grounded in experience.
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...Numenta
In this talk, I will propose a conceptual framework sketching a path toward open-ended skill acquisition through the coupling of environmental, morphological, sensorimotor, cognitive, developmental, social, cultural and evolutionary mechanisms. I will illustrate parts of this framework through computational experiments highlighting the key role of intrinsically motivated exploration in the generation of behavioral regularity and diversity. Firstly, I will show how some forms of language can self-organize out of generic exploration mechanisms without any functional pressure to communicate. Secondly, we will see how language — once invented — can be recruited as a cognitive tool that enables compositional imagination and bootstraps open-ended cultural innovation.
For more:
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...Numenta
Most current deep neural networks learn from a static data set without active interaction with the world. We take a look at how learning through a closed loop between action and perception affects the representations learned in a DNN. We demonstrate how these representations are significantly different from DNNs that learn supervised or unsupervised from a static dataset without interaction. These representations are much sparser and encode meaningful content in an efficient way. Even an agent who learned without any external supervision, purely through curious interaction with the world, acquires encodings of the high dimensional visual input that enable the agent to recognize objects using only a handful of labeled examples. Our results highlight the capabilities that emerge from letting DNNs learn more similar to biological brains, though sensorimotor interaction with the world.
For more:
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence SpracklenNumenta
Numenta's Director of ML Architecture Lawrence Spracklen presented a talk at the SBMT Annual Congress on July 10th, 2021. He talked about how neuroscience principles can inspire better machine learning algorithms.
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...Numenta
Nick Ni (Xilinx) and Lawrence Spracklen (Numenta) presented a talk at the FGPA Conference Europe on July 8th, 2021. In this talk, they presented a neuroscience approach to optimize state-of-the-art deep learning networks into sparse topology and how it can unlock significant performance gains on FPGAs without major loss of accuracy. They then walked through the FPGA implementation where they exploited the advantage of sparse networks with a unique Domain Specific Architecture (DSA).
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...Numenta
Jeff Hawkins presented a talk on "The Thousand Brains Theory: A Roadmap to Machine Intelligence" at the Beijing Academy of Artificial Intelligence Conference on 1st June 2021. In this talk, he discussed the key components of The Thousand Brains Theory and Numenta's recent work.
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Numenta
Jeff Hawkins presents a talk on "How the Brain Uses Reference Frames to Model the World, Why AI Needs to do the Same." In this talk, he gives an overview of The Thousand Brains Theory and discusses how machine intelligence can benefit from working on the same principles as the neocortex.
This talk was first presented at the NAISys conference on November 10, 2020. You can find a re-recording of the talk here: https://youtu.be/mGSG7I9VKDU
OpenAI’s GPT 3 Language Model - guest Steve OmohundroNumenta
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...Numenta
Numenta VP Research Subutai Ahmad presents a talk on "Sparsity in the Neocortex and its Implications for Continual Learning" at the virtual CVPR 2020 workshop. In this talk, he discusses how continuous learning systems can benefit from sparsity, active dendrites and other neocortical mechanisms.
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...Numenta
Recent advances in reverse engineering the neocortex reveal that it is a highly-distributed sensory-motor modeling system. Each cortical column learns complete models of observed objects through movement and sensation. The columns use long-range connections to vote on what objects are currently being observed. In this talk, we introduce the key elements of this theory and describe how these elements can be introduced into current machine learning techniques to improve their capabilities, robustness, and power requirements.
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Numenta
Jeff Hawkins delivered this keynote presentation at the 2018 Human Brain Project Summit Open Day in Maastricht, the Netherlands on October 15, 2018. A screencast recording of the slides is also available at: https://numenta.com/resources/videos/jeff-hawkins-human-brain-project-screencast/
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Numenta
Jeff Hawkins gave this presentation as part of the Johns Hopkins APL Colloquium Series on Septemer 21, 2018.
View the video of the talk here: https://numenta.com/resources/videos/jeff-hawkins-johns-hopkins-apl-talk/
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...Numenta
Numenta VP of Research Subutai Ahmad delivered this presentation at the Centre for Theoretical Neuroscience, University of Waterloo on October 2, 2018.
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)Numenta
These are Matt Taylor's slides from the AI Singapore Meetup on May 17, 2018.
Abstract:
Today’s wave of AI technology is still being driven by the ANN neuron pioneered decades ago. Hierarchical Temporal Memory (HTM) is a realistic biologically-constrained model of the pyramidal neuron reflecting today’s most recent neocortical research. This talk will describe and visualize core HTM concepts like sparse distributed representations, spatial pooling and temporal memory. Strong AI is a common goal of many computer scientists. So far, machine learning techniques have created amazing results in narrow fields, but haven’t produced something we could all call “intelligent”. Given recent advances in neuroscience research, we know a lot more about how neurons work together now than we did when ANNs were created. We believe systems with a more realistic neuronal model will be more likely to produce Strong AI. Hierarchical Temporal Memory is a theory of intelligence based upon neuroscience research. The neocortex is the seat of intelligence in the brain, and it is structurally homogeneous throughout. This means a common algorithm is processing all your sensory input, no matter which sense. We believe we have discovered some of the foundational algorithms of the neocortex, and we’ve implemented them in software. I’ll show you how they work with detailed dynamic visualizations of Sparse Distributed Representations, Spatial Pooling, and Temporal Memory.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
4. 4
TYPES OF ANOMALIES IN STREAMING DATA
• Point anomalies
• Temporal anomalies (contextual/conditional)
5. 5
ANOMALY DETECTION TECHNIQUES
• Traditional techniques
• Classification-based
• Clustering & nearest-neighbor
• Statistical techniques
• Chandola et al., “Anomaly Detection: A Survey”
• In streaming we typically see a collection of statistical techniques
• time-series modeling and forecasting models (e.g. ARIMA)
• change point detection
• outlier tests (e.g. ESD, k-sigma)
• Most techniques not suitable for streaming data
• new approaches needed
• non-streaming benchmarks aren't very useful
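As a concrete example of the classical statistical toolkit mentioned above, here is a minimal sketch of a k-sigma outlier test over a rolling window. This is illustrative only; a production detector must also deal with warm-up, zero variance, and non-stationary data.

```python
from collections import deque
import math

def k_sigma_detector(stream, k=3.0, window=100):
    """Flag points more than k standard deviations from the rolling mean.

    Illustrative sketch of a classical streaming outlier test; names and
    parameters are chosen for this example, not taken from any library.
    """
    history = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(history) >= 2:
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / len(history)
            std = math.sqrt(var)
            flags.append(std > 0 and abs(x - mean) > k * std)
        else:
            flags.append(False)  # not enough history yet
        history.append(x)
    return flags
```

Such tests are cheap and fully online, but as the next slides argue, they miss temporal/contextual anomalies that only show up in the sequence structure of the data.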
6. 6
WHY CREATE A BENCHMARK?
• A benchmark consists of:
• Labeled data files
• Scoring mechanism
• Versioning system
• Most existing benchmarks are designed for batch data, not streaming data
• We saw a need for a benchmark designed to test anomaly detection algorithms on real-time, streaming data
• Hard to find benchmarks containing real-world data labeled with anomalies
• The impact of published techniques suffers because researchers use different data, and/or completely artificial data
• A standard community benchmark could spur innovation in real-time anomaly detection algorithms
7. 7
NUMENTA ANOMALY BENCHMARK (NAB)
• NAB: a rigorous benchmark for anomaly detection in streaming applications
• Real-world benchmark dataset
• 58 labeled data streams (47 real-world, 11 artificial streams)
• Total of 365,551 data points
• Scoring mechanism
• Custom scoring function
• Rewards early detection
• Anomaly windows
• Different “application profiles”
• Open resource
• AGPL repository contains data, source code, and documentation
• github.com/numenta/NAB
11. 11
HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector:
• Detects every anomaly
• Detects anomalies as soon as possible
• there is tremendous value in detecting anomalies early
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• manual tuning is impractical when running potentially thousands of models
• Automatically adapts to changing statistics
• e.g. servers receive new software
12. 12
HOW SHOULD WE SCORE ANOMALIES?
• Scoring methods in traditional benchmarks are insufficient
• Precision, recall, and F1-score do not incorporate the value of time
• early detections are not rewarded
• Artificial separation into training and test sets does not handle continuous learning
• Batch data files allow look ahead and multiple passes through the data
• this is unrealistic for real-world use
15. 15
SCORING FUNCTION
• The effect of each detection is scaled relative to its position within the window:
• Detections outside the window are false positives (scored low)
• Multiple detections within a window are ignored (only the earliest one counts)
• Total score is the sum of scaled detections + a weighted sum of missed detections
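The scoring mechanism above can be sketched as follows. This is a simplified illustration, not NAB's actual implementation: the scaled sigmoid follows the general form in the NAB paper, the weights roughly follow the "standard" application profile, false positives here get a flat penalty (NAB scales the penalty by distance past the nearest window), and normalization is omitted.

```python
import math

def scaled_sigmoid(y):
    # ~+1 for a detection at the left edge of the window (y = -1),
    # 0 at the right edge (y = 0), approaching -1 far beyond the window
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0

def score_detections(windows, detections, a_tp=1.0, a_fp=0.11, a_fn=1.0):
    """Simplified NAB-style score for a list of (start, end) anomaly
    windows and a list of detection timestamps."""
    score = 0.0
    credited = set()
    for start, end in windows:
        width = float(end - start)
        in_window = sorted(t for t in detections if start <= t <= end)
        if in_window:
            earliest = in_window[0]
            credited.update(in_window)          # later detections are ignored
            y = (earliest - end) / width        # relative position in [-1, 0]
            score += a_tp * scaled_sigmoid(y)   # earlier detection, higher reward
        else:
            score -= a_fn                       # missed window (false negative)
    # false positives: flat penalty in this sketch
    score -= a_fp * sum(1 for t in detections if t not in credited)
    return score
```

For example, detecting at the very start of a window earns close to +1.0, while missing the window entirely costs 1.0.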
16. 16
OTHER DETAILS
• Application profiles
• Application profiles assign different weightings based on the tradeoff between false positives and false negatives.
• EKG data on a cardiac patient favors FPs over FNs.
• IT / DevOps professionals hate FPs.
• Three application profiles: standard, favor low false positives, favor low false negatives.
• NAB emulates practical real-time scenarios
• Look-ahead is not allowed; detections must be made on the fly.
• No separation between training and test files. Invoke the model, start streaming, and go.
• No batch, per-data-file parameter tuning. Detectors must be fully automated with a single set of parameters across all data files; any further parameter tuning must be done on the fly.
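The constraints above suggest a detector interface along these lines. This is an illustrative sketch only loosely modeled on NAB's Python detector base class; the names and the placeholder k-sigma scoring rule are hypothetical, not NAB's actual API.

```python
class StreamingDetector:
    """One fixed parameter set, no look-ahead, no separate training phase:
    each record is scored as it arrives, using only the data seen so far.
    (Hypothetical interface; a real algorithm such as HTM would replace
    the simple k-sigma rule used here as a placeholder.)"""

    def __init__(self, k=3.0, window=100):
        # One parameter set shared across all data files: no per-file tuning.
        self.k = k
        self.window = window
        self.values = []

    def handle_record(self, value):
        """Return an anomaly score in [0, 1] for this record."""
        history = self.values[-self.window:]
        score = 0.0
        if len(history) >= 2:
            mean = sum(history) / len(history)
            std = (sum((v - mean) ** 2 for v in history) / len(history)) ** 0.5
            if std > 0:
                score = min(abs(value - mean) / (self.k * std), 1.0)
        self.values.append(value)
        return score
```

Because the model updates after every record, the detector keeps learning while it makes predictions, which is exactly what the benchmark requires.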
17. 17
TESTING ALGORITHMS WITH NAB
• NAB is a community effort
• The goal is to have researchers independently evaluate a large number of algorithms
• Very easy to plug in and test new algorithms
• Seed results with three algorithms:
• Hierarchical Temporal Memory
• Numenta’s open source streaming anomaly detection algorithm
• Models temporal sequences in data, continuously learning
• Etsy Skyline
• Popular open source anomaly detection technique
• Mixture of statistical experts, continuously learning
• Twitter AnomalyDetection
• Open source anomaly detection released earlier this year
• Robust outlier statistics + piecewise approximation
19. 19
DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• Simple spike: all 3 algorithms detect it
• Shift in usage
20. 20
DETECTION RESULTS: MACHINE TEMPERATURE READINGS
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• HTM detects a purely temporal anomaly
• All 3 detect the catastrophic failure
21. 21
DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes false positives]
• HTM detects the anomaly 3 hours earlier
22. 22
SUMMARY
• Anomaly detection is the most common application for streaming analytics
• NAB is a community benchmark for streaming anomaly detection
• Includes a labeled dataset with real data
• Scoring methodology designed for practical real-time applications
• Fully open source codebase
• What can you get out of NAB?
• Test and improve your algorithms
• Contribute and improve NAB
• Learn about streaming anomaly detection
23. 23
SUMMARY
• What’s next for NAB?
• We hope to see researchers test additional algorithms
• We hope to spark improved algorithms for streaming
• More data sets!
• Could incorporate the UC Irvine dataset and the Yahoo Labs dataset (not open source)
• We would love to get more labeled streaming datasets from you
• Add support for multivariate anomaly detection
• Any changes that affect the results will be released with v2.0
24. 24
NAB RESOURCES
Repository: github.com/numenta/NAB
Paper:
A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark,” to appear in 14th International Conference on Machine Learning and Applications (IEEE ICMLA’15), 2015.
Preprint available: arxiv.org/abs/1510.03336
Presentation from MLConf:
https://www.youtube.com/watch?v=SxtsCrTHz-4
Contact info:
nab@numenta.org
alavin@numenta.com, sahmad@numenta.com
26. 26
NUMENTA RESOURCES
• “Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory”: http://arxiv.org/abs/1503.07469
• “Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex”: http://arxiv.org/abs/1511.00083
• NuPIC: Numenta Platform for Intelligent Computing open source repo
• https://github.com/numenta/nupic
• http://numenta.org/
• Numenta
• http://numenta.com/
• HTM Whitepaper:
http://numenta.com/learn/hierarchical-temporal-memory-white-paper.html
27. 27
NAB EXAMPLES
• Figs. 1, 2, 5 from the paper: plot.ly/~alavin/3767
• Fig. 4 from the paper: plot.ly/~alavin/3753
• Fig. 6 from the paper: plot.ly/~alavin/3706
• Subtle change in CPU utilization that precedes a much larger anomaly: plot.ly/~alavin/3720
• An anomaly preceding a much larger drop in CPU utilization: plot.ly/~alavin/3717
• All three detectors get the two TPs, but in different orders: plot.ly/~alavin/3741
• Good detections by HTM, but a lot of FPs: plot.ly/~alavin/3711
• Noisy, difficult CPU utilization data: plot.ly/~alavin/3761
• Temporal anomalies in spiking social media data: plot.ly/~alavin/3815
• No true anomalies, but FP detections in CPU utilization data: plot.ly/~alavin/3723
29. 29
SCALED SIGMOID SCORING FUNCTION
• Scoring example:
(a) FP before the window
(b) TP in the window
(c) additional TP (not counted)
(d) FP soon after the window
(e) FP long after the window
→ total score = -1.809
• Missing a window completely (i.e. an FN) reduces the score by 1.0
[Figure: detections (a)-(e) shown against the anomaly window and the scaled sigmoid scoring function]
30. 30
ANOMALY DETECTION WITH HTM
• How do we turn a data stream into anomaly scores?
[Figure: pipeline: Data → Encoder → SDR → HTM Algorithms → Predictions → Raw anomaly score → Anomaly likelihood]
31. 31
CALCULATING RAW ANOMALY SCORE
• Raw anomaly score is the fraction of active columns that were not predicted.
• This is high when the spatial or temporal patterns deviate from the norm.
rawAnomalyScore = |A_t − (P_{t−1} ∩ A_t)| / |A_t|
where P_t = predicted columns at time t, A_t = active columns at time t
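With column sets represented as Python sets, the formula above is a one-liner; a minimal sketch:

```python
def raw_anomaly_score(active, predicted):
    """Fraction of active columns at time t that were not predicted at t-1:
    |A_t - (P_{t-1} & A_t)| / |A_t|, which equals |A_t - P_{t-1}| / |A_t|.
    0 means the input was fully predicted; 1 means it was fully unexpected."""
    if not active:
        return 0.0  # convention: no active columns -> nothing unexpected
    return len(active - predicted) / len(active)

# e.g. 40 active columns of which 30 were predicted -> score 0.25
```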
33. 33
CALCULATING ANOMALY LIKELIHOOD
• Compute a normal distribution over the history of raw anomaly scores
• Compute the probability of each new point relative to that distribution
μ = Σ_x x·P(x),  σ² = E[(X − μ)²]
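A minimal sketch of these two steps, assuming a plain Gaussian tail probability (the production implementation in NuPIC also smooths raw scores over a short window, a detail omitted here):

```python
import math

def anomaly_likelihood(raw_scores, new_score):
    """Model the history of raw anomaly scores as a normal distribution and
    return how unlikely the new score is under it: 1 minus the Gaussian
    upper-tail probability, so values near 1.0 mean 'very anomalous'."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    var = sum((x - mean) ** 2 for x in raw_scores) / n
    std = math.sqrt(var) or 1e-9                 # guard against zero variance
    z = (new_score - mean) / std
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(X >= new_score)
    return 1.0 - tail
```

Thresholding this likelihood (rather than the raw score) makes the detector robust to streams that are inherently noisy, where raw anomaly scores are always somewhat elevated.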
34. 34
CALCULATING ANOMALY LIKELIHOOD
[Figure: probability distribution of raw anomaly scores (mean 0.0201, std. dev. 0.1237), probability plotted against raw anomaly score]