History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of Computer Science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Searching for Anomalies, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
By popular demand, here is a case study of my first Kaggle competition from about a year ago. Hope you find it useful. Thank you again to my fantastic team.
DutchMLSchool. Machine Learning: A Technical Perspective, by BigML, Inc.
Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
This presentation includes a step-by-step tutorial with screen recordings for learning RapidMiner. It also includes the step-by-step procedure to use its most interesting features: Turbo Prep and Auto Model.
A brief introduction to clustering with scikit-learn. In this presentation, we provide an overview, with real examples, of how to use and optimize k-means clustering.
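As a concrete starting point, a minimal k-means run with scikit-learn might look like the following sketch; the toy 2-D data and the choice of k=2 are illustrative assumptions, not examples from the presentation.

```python
# A minimal k-means example with scikit-learn; the toy 2-D data and the
# choice of k=2 are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment for each point
```

To tune k, a common approach is to compare `km.inertia_` (the within-cluster sum of squares) across several values of k and look for an elbow.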
Finding interesting patterns in data can lead to uncovering new knowledge. New patterns that haven’t occurred before can signify events of interest. Depending on context, these can be called novelties, anomalies, outliers or events. Whatever they are called, they are interesting because they tell a story different from the norm. In this talk, we will call them anomalies. Two diverse applications of anomaly detection are detecting fraudulent credit card transactions and identifying astronomical anomalies such as solar flares.
However, there are many challenges in anomaly detection including high false positive rates and low predictive accuracy. Ensemble learning is a way of combining many algorithms or models to obtain better predictive performance. Anomaly detection is generally an unsupervised task, that is, we do not train models using labelled data. Constructing an unsupervised anomaly detection ensemble is challenging because we do not know the labels. In this talk we discuss two topics in anomaly detection. First, we introduce an anomaly detection ensemble using Item Response Theory (IRT) – a class of models used in educational psychometrics. Using IRT we construct an ensemble that can downplay noisy, non-discriminatory methods and accentuate sharper methods.
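The IRT-based ensemble described in the talk is beyond a short sketch, but the naive baseline it improves on can be shown directly: rank-normalize each detector's raw scores and average them, so detectors with very different score scales can still be combined. The detector scores below are illustrative assumptions.

```python
# Naive unsupervised anomaly ensemble: rank-normalize each detector's raw
# scores, then average. This is NOT the talk's IRT method, only a baseline.
import numpy as np

def rank_normalize(scores):
    """Map raw scores to ranks in (0, 1]; higher raw score -> higher rank."""
    ranks = scores.argsort().argsort() + 1.0  # ranks 1..n (ties by order)
    return ranks / len(scores)

def ensemble_score(score_matrix):
    """score_matrix: (n_detectors, n_points) raw anomaly scores.
    Returns one combined score per point."""
    normalized = np.vstack([rank_normalize(s) for s in score_matrix])
    return normalized.mean(axis=0)

# Two detectors on three points; point 2 looks anomalous to both detectors.
raw = np.array([[0.1, 0.2, 9.0],
                [5.0, 4.0, 50.0]])
combined = ensemble_score(raw)
```

Plain averaging treats every detector equally; the IRT approach described above instead learns per-detector discrimination, allowing it to downplay noisy, non-discriminatory members.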
Then we explore anomaly detection in computer network security. With cyber incidents and data breaches becoming increasingly common, we have seen a massive increase in computer network attacks over the years. Anomaly detection methods, even though used to detect suspicious behaviour, are criticized for high false positive rates. In addition, computer networks produce a large amount of complex data. We go through the end-to-end process of detecting anomalies in this scenario and show how we can minimize false positives and visualise anomalies developing over time.
The Power of Auto ML and How It Works, by Ivo Andreev
Automated ML is an approach that minimizes the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics, or programming. The mechanism works by allowing end users to simply provide data; the system automatically does the rest by determining the approach to perform the particular ML task. At first this may sound discouraging to those aiming at the “sexiest job of the 21st century” - the data scientists. However, Auto ML should be considered a democratization of ML, rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft, and how it can improve the productivity of even professional data scientists.
This slide deck gives a brief overview of supervised, unsupervised, and reinforcement learning. Algorithms discussed are Naive Bayes, k-nearest neighbours, SVM, decision trees, and Markov models.
It also covers the difference between regression and classification, the difference between supervised and reinforcement learning, the iterative functioning of Markov models, and machine learning applications.
In Part II of the Anomaly Detection Series, we discuss the challenges in analyzing temporal datasets and methods for outlier analysis. We focus on single time series and discuss point-outlier and sub-sequence methods.
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis..., by Kevin Mader
Review the basic principles of predictive analytics.
Be exposed to some of the existing validation methodologies to test predictive models.
Understand how to incorporate radiology data sources (PACS, RIS, etc) into predictive modeling
Learn how to interpret results and make visualizations.
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
Machine learning and linear regression programming, by Soumya Mukherjee
Overview of AI and ML
Terminology awareness
Applications in real world
Use cases within Nokia
Types of Learning
Regression
Classification
Clustering
Linear Regression with a Single Variable in Python
A data science observatory based on RAMP - rapid analytics and model prototyping, by Akin Osman Kazakci
RAMP approach to analytics: Rapid Analytics and Model Prototyping; collaborative data challenges with in-built data science process management tools and analytics; An observatory of data science and scientists. Presented at the Design Theory Special Interest Group of International Design Society. Mines ParisTech and Centre for Data Science.
5 Practical Steps to a Successful Deep Learning Research, by Brodmann17
Deep Learning has gained huge popularity over the last several years, especially due to its remarkable progress in many domains.
Many resources are out there, including open-source implementations of recent research advancements. This vast availability is somewhat misleading, because when one actually wants to create a Deep Learning based product, one soon realizes that there is a large gap between these open-source implementations and a real production-grade Deep Learning product. Closing this gap can take months of work involving large costs, especially in manpower and compute power.
In this talk I will discuss, based on my experience leading the research at Brodmann17, several aspects we have found to be important for building Deep Learning based computer vision products.
Digital Transformation and Process Optimization in Manufacturing, by BigML, Inc
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks in order to optimize processes and focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML, completes the session by showcasing how Machine Learning is put to use in the manufacturing industry with a use case to detect factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
Similar to DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - ML for AML Compliance, by BigML, Inc
Machine Learning for Anti-Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective Anomalies, by BigML, Inc
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector, by BigML, Inc
My First Anomaly Detector: Practical Workshop, by Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven Company, by BigML, Inc
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal Sector, by BigML, Inc
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
Machine Learning for Public Safety: Reducing Violence and Discrimination in Stadiums.
Speakers: Ramon van Ingen, Co-Founder at Siip, Entrepreneur, Researcher, and Pablo González, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants, by BigML, Inc
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Anomaly Detection at Scale, by BigML, Inc
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AI, by BigML, Inc
Citizen Development in AI, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your Future, by BigML, Inc
This session presents a quite common situation for those working in food and beverage (FnB) retail and highlights interesting insights for reducing waste.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail Sector, by BigML, Inc
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot, by BigML, Inc
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac..., by BigML, Inc
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance, by BigML, Inc
Some of these concepts (Cybersecurity, Governance, Risk Management, and Compliance) overlap and sometimes they can be confusing. This session helps us understand why those terms are key for any business to be successful.
Speaker: Jon Shende, Founding Investor at MyVayda.
*ML in GRC 2021: Virtual Conference.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag..., by sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Global Situational Awareness of A.I. and where it's headed, by vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance, by roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data: Choosing a Trusted Analytics Platform, by Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
3. BigML, Inc #DutchMLSchool
Outline
• Anomaly Detection Use Cases
• Four Basic Methods for Anomaly Detection with Engineered Features
• Benchmarking Study
• Incorporating Feedback
• Deep Versions of the Four Basic Methods
• Classifier-Based Anomaly Detection using the Max Logit Score
• Familiarity Hypothesis
• Challenges for the Future
5. Use Cases
• Data Cleaning
  • Remove corrupted data from the training data
  • Example: Typos in feature values, feature values interchanged, test results from two patients combined
• Fault Detection, Fraud Detection, Cyber Attack
  • At training or test time, faulty or illegal behavior creates anomalous data
• Open Category Detection
  • At test time, the classifier is given an instance of a novel category
  • Example: Self-driving car (trained in Europe) encounters a kangaroo (in Australia)
• Out-of-Distribution Detection
  • At test time, the classifier is given an instance collected in a different way
  • Example: Chest X-ray classifier trained only on front views is shown a side view
  • Example: Self-driving car trained in clear conditions must operate during rainy conditions
6. Protecting a Classifier

• Claim: every deployed ML classifier should include an anomaly detector to detect queries that lie outside the classifier's region of competence
• Also useful as a performance indicator to detect that the classifier needs to be retrained

[Diagram: a query x_q first goes to an anomaly detector; if A(x_q) > τ, the query is rejected; otherwise the classifier f, trained on examples (x_i, y_i), outputs ŷ = f(x_q)]
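The rejection pipeline on this slide can be sketched in a few lines. The detector (nearest-neighbor distance), the classifier (nearest class mean), and the 99% calibration quantile below are all illustrative choices of mine, not part of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: two Gaussian classes in 2-D.
X0 = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
X1 = rng.normal(loc=[4, 4], scale=0.5, size=(100, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 100 + [1] * 100)

means = np.stack([X_train[y_train == k].mean(axis=0) for k in (0, 1)])

def anomaly_score(x):
    """A(x): distance to the nearest training point (a simple detector)."""
    return np.linalg.norm(X_train - x, axis=1).min()

def classify(x):
    """f(x): nearest class mean."""
    return int(np.argmin(np.linalg.norm(means - x, axis=1)))

# Calibrate tau as the 99th percentile of leave-one-out training scores.
train_scores = []
for i, x in enumerate(X_train):
    d = np.linalg.norm(X_train - x, axis=1)
    d[i] = np.inf                      # exclude the point itself
    train_scores.append(d.min())
tau = np.quantile(train_scores, 0.99)

def gated_predict(x_q):
    """Reject the query if A(x_q) > tau; otherwise return f(x_q)."""
    if anomaly_score(x_q) > tau:
        return "reject"
    return classify(x_q)

in_dist = gated_predict(np.array([0.1, -0.2]))     # near class 0
far_away = gated_predict(np.array([20.0, -15.0]))  # far from both classes
```

Any anomaly detector and classifier pair can be wired together this way; only the calibration of τ ties them to the training data.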
7. Anomaly Detection Definitions

• Definition: an "anomaly" is a data point generated by a process that is different from the process generating the "nominal" data
• Let D_0 be the probability distribution of the nominal process
• Let D_a be the probability distribution of the anomaly process
• Two formal settings:
  • Clean training data
  • Contaminated training data
8. Clean Training Data

• Given:
  • Training data x_1, x_2, …, x_N, all drawn from D_0, the "nominal" distribution
  • Test data x_{N+1}, …, x_{N+M}, drawn from a mixture of D_0 and D_a (the anomaly distribution)
• Find: the data points in the test data that belong to D_a
• Examples:
  • Protecting a classifier
  • Detecting manufacturing defects / equipment failure
9. Contaminated Training Data

• Given: training data x_1, x_2, …, x_N drawn from a mixture of D_0 and D_a (the anomaly distribution)
• Find: the data points in the training data that belong to D_a
• Use cases:
  • Data cleaning
  • Fraud detection, insider-threat detection
• The two settings can be combined: contaminated training data + separate contaminated test data
11. Theoretical Approaches to Anomaly Detection

• Distance-Based Methods
  • Anomaly score: A(x_q) = min_{x ∈ D} ‖x_q − x‖
• Density Estimation Methods
  • Model the joint distribution P_D(x) of the input data points x_1, … ∈ D
  • Surprise: A(x_q) = −log P_D(x_q)
• Quantile Methods
  • Find a smooth function f such that {x : f(x) ≥ 0} contains 1 − α of the training data
  • Anomaly score: A(x) = −f(x)
• Reconstruction Methods
  • Train an autoencoder, x ≈ D(E(x)), where E is the encoder and D is the decoder
  • Anomaly score: A(x_q) = ‖x_q − D(E(x_q))‖
12. Approach 1: Distance-Based Methods

• Define a distance d(x_i, x_j)
• A(x_q) = min_{x ∈ D} d(x_q, x)
• Requires a good distance metric
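A common smoothing of the distance-based score above (and the "kNN" detector in the benchmarking study later) uses the mean distance to the k nearest training points rather than the single minimum. A minimal numpy sketch, with k=5 as an arbitrary choice:

```python
import numpy as np

def knn_anomaly_scores(X_train, X_query, k=5):
    """Mean distance from each query point to its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    d.sort(axis=1)                     # nearest distances first in each row
    return d[:, :k].mean(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))          # nominal cluster around the origin
queries = np.array([[0.0, 0.0],        # inlier
                    [8.0, 8.0]])       # far from all the data
scores = knn_anomaly_scores(X, queries)
```

With k = 1 this reduces exactly to A(x_q) = min_{x ∈ D} d(x_q, x).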
13. Isolation Forest [Liu, Ting, Zhou, 2008]

• Approximates the L1 (Manhattan) distance (Guha, et al., ICML 2016)
• Construct a fully random binary tree:
  • choose attribute j at random
  • choose a splitting threshold θ uniformly from [min(x_·j), max(x_·j)]
  • repeat until every data point is in its own leaf
• Let d(x_i) be the depth of point x_i
• Repeat L times; let d̄(x_i) be the average depth of x_i
• A(x_i) = 2^(−d̄(x_i) / r(x_i)), where r(x_i) is the expected depth
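The tree-growing procedure on this slide is short enough to implement directly. The sketch below follows the slide's variant (grow until every point is isolated, no subsampling or height cap as in the published algorithm), and uses the usual harmonic-number normalizer for the expected depth r:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_depths(X, idx, depth, depths):
    """Recursively split on random attributes/thresholds until each point
    in idx is isolated; record the leaf depth of every point."""
    if len(idx) <= 1:
        for i in idx:
            depths[i] = depth
        return
    j = rng.integers(X.shape[1])                  # random attribute
    lo, hi = X[idx, j].min(), X[idx, j].max()
    if lo == hi:                                  # cannot split further
        for i in idx:
            depths[i] = depth
        return
    theta = rng.uniform(lo, hi)                   # random threshold
    left = idx[X[idx, j] <= theta]
    right = idx[X[idx, j] > theta]
    grow_depths(X, left, depth + 1, depths)
    grow_depths(X, right, depth + 1, depths)

def isolation_forest_scores(X, n_trees=50):
    n = len(X)
    all_depths = np.zeros((n_trees, n))
    for t in range(n_trees):
        grow_depths(X, np.arange(n), 0, all_depths[t])
    d_bar = all_depths.mean(axis=0)
    # r: expected depth, approximated via the harmonic number
    r = 2 * (np.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-d_bar / r)                      # A(x_i) = 2^(-d_bar/r)

X = rng.normal(size=(100, 2))
X = np.vstack([X, [[6.0, 6.0]]])                  # append one obvious outlier
scores = isolation_forest_scores(X)
```

Isolated points sit close to the root, so their average depth is small and their score approaches 1; deeply buried inliers score well below that.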
14. Approach 2: Density Estimation

• Given a data set x_1, …, x_N, where x_i ∈ R^d
• Assume the data are drawn i.i.d. from an unknown probability density: x_i ∼ P(x_i)
• Goal: estimate P
• Anomaly score: A(x_q) = −log P(x_q), the "surprisal" from information theory
• Why density estimation? It gives a more global view by combining distances to all data points
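As a concrete instance of the surprisal score, here is a minimal Gaussian kernel density estimate in numpy. The bandwidth of 0.5 is an arbitrary illustrative choice, not a recommendation from the talk:

```python
import numpy as np

def kde_log_density(X_train, X_query, bandwidth=0.5):
    """log P(x_q) under a Gaussian kernel density estimate of the data."""
    d = X_train.shape[1]
    diff = X_query[:, None, :] - X_train[None, :, :]
    sq = (diff ** 2).sum(axis=2) / (2 * bandwidth ** 2)
    # Per-kernel log densities, including the Gaussian normalizing constant
    log_kernels = -sq - d * np.log(np.sqrt(2 * np.pi) * bandwidth)
    # Numerically stable log-mean-exp over the kernels
    m = log_kernels.max(axis=1)
    return m + np.log(np.exp(log_kernels - m[:, None]).mean(axis=1))

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))                  # nominal data
queries = np.array([[0.0, 0.0], [6.0, 6.0]])
surprisal = -kde_log_density(X, queries)       # A(x_q) = -log P(x_q)
```

The query far from the data gets a much larger surprisal, because every kernel assigns it a vanishing density.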
15. Example: LODA (Pevny, 2015)

• Introduce sparse random projections Π_l into 1-dimensional space
• Fit a density estimator P_l(Π_l x) in each 1-d space
• A(x_q) = (1/L) Σ_{l=1}^{L} −log P_l(Π_l x_q)
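A minimal sketch of the LODA idea, using histograms as the 1-d density estimators. The bin count, projection count, and Laplace smoothing below are my simplifications; the paper's construction (e.g., adaptive histograms) differs in detail:

```python
import numpy as np

def fit_loda(X, n_projections=100, n_bins=30):
    """Fit sparse random 1-d projections with histogram density estimates."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    # Sparse projections: about sqrt(d) nonzero Gaussian entries each
    k = max(1, int(round(np.sqrt(d))))
    W = np.zeros((n_projections, d))
    hists = []
    for l in range(n_projections):
        nz = rng.choice(d, size=k, replace=False)
        W[l, nz] = rng.normal(size=k)
        z = X @ W[l]
        counts, edges = np.histogram(z, bins=n_bins)
        # Laplace-smoothed bin probabilities converted to densities
        dens = (counts + 1) / (counts.sum() + n_bins) / np.diff(edges)
        hists.append((edges, dens))
    return W, hists

def loda_score(x, W, hists):
    """A(x) = average negative log density across the 1-d projections."""
    total = 0.0
    for l, (edges, dens) in enumerate(hists):
        z = x @ W[l]
        i = np.clip(np.searchsorted(edges, z) - 1, 0, len(dens) - 1)
        total += -np.log(dens[i])
    return total / len(hists)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6))
W, hists = fit_loda(X)
inlier = loda_score(np.zeros(6), W, hists)
outlier = loda_score(np.full(6, 8.0), W, hists)
```

Each projection alone is a weak detector; averaging the negative log densities over many projections is what makes the ensemble effective.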
16. Approach 3: Quantile Methods

• Vapnik's principle: we only need to estimate the "decision boundary" between nominal and anomalous
• Surround the data with a function f that captures 1 − ε of the training data
• One-Class Support Vector Machine (OCSVM): f is a hyperplane in "kernel space"
• Support Vector Data Description (SVDD): f is a sphere in "kernel space"
• Issue: ε must be chosen at learning time rather than at run time
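To make the "surround the data" idea concrete, here is a deliberately kernel-free caricature of SVDD: fit a sphere around the data whose radius captures 1 − ε of the training points. The choice ε = 0.05 and the mean-centered sphere are illustrative assumptions, not the actual SVDD optimization:

```python
import numpy as np

def fit_svdd_lite(X, eps=0.05):
    """Fit a sphere (center c, radius R) capturing 1 - eps of the data."""
    c = X.mean(axis=0)
    dists = np.linalg.norm(X - c, axis=1)
    R = np.quantile(dists, 1 - eps)    # radius covering 1 - eps of the points
    return c, R

def f(x, c, R):
    """f(x) >= 0 inside the sphere; anomaly score is A(x) = -f(x)."""
    return R - np.linalg.norm(x - c)

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
c, R = fit_svdd_lite(X)
inside = f(np.zeros(2), c, R)          # a central point
outside = f(np.array([5.0, 5.0]), c, R)
```

Note how ε is baked in when R is computed: to change the target false-alarm rate at run time, the boundary would have to be refit, which is exactly the issue the slide raises.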
17. Approach 4: Reconstruction Methods

• NavLab self-driving van (Pomerleau, NIPS 1992)
  • Primary head: predict the steering angle from the input image
  • Secondary head: predict the input image (an "auto-encoder")
  • A(x_q) = ‖x_q − x̂_q‖
  • If reconstruction is poor, this suggests that the steering angle should not be trusted
• Principle: anomaly detection through failure
  • Define a task on which the learned system should fail for anomalies
18. Application: Finding Unusual Chemical Spectra

• NASA Mars Science Laboratory ChemCam instrument
  • Collects 6144 spectral bands on rock samples from a 7 m distance using laser stimulation
• Goal: active learning to find interesting spectra
• DEMUD (Wagstaff, et al., 2013)
  • Incremental PCA applied to samples one at a time
  • Fit only to the samples labeled as "uninteresting" by the user
  • Show the user the most un-uninteresting sample (the sample with the highest PCA reconstruction error)
  • Rapidly discovers interesting samples
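The reconstruction-error score underlying DEMUD can be sketched with plain batch PCA (DEMUD itself is incremental and interactive; this is only the scoring step, on synthetic data of my own construction):

```python
import numpy as np

def pca_reconstruction_scores(X_train, X_query, n_components=2):
    """Anomaly score = norm of the PCA reconstruction residual."""
    mu = X_train.mean(axis=0)
    # Principal directions from the SVD of the centered training data
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    V = Vt[:n_components]                  # top principal components
    Z = (X_query - mu) @ V.T               # encode
    X_hat = Z @ V + mu                     # decode
    return np.linalg.norm(X_query - X_hat, axis=1)

rng = np.random.default_rng(5)
# Training data lie near a 2-D plane embedded in 5-D space
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 5))
X = latent @ basis + 0.01 * rng.normal(size=(200, 5))

on_plane = rng.normal(size=(1, 2)) @ basis            # consistent with training
off_plane = on_plane + 3.0 * rng.normal(size=(1, 5))  # leaves the plane
scores = pca_reconstruction_scores(X, np.vstack([on_plane, off_plane]))
```

Samples consistent with the learned subspace reconstruct almost perfectly; samples with structure outside it get large residuals, and those are the ones DEMUD shows the user next.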
19. Benchmarking Study [Andrew Emmott, 2015, 2020]

• Distance-Based Methods
  • kNN: mean distance to the k nearest neighbors
  • LOF: Local Outlier Factor (Breunig, et al., 2000)
  • ABOD: kNN Angle-Based Outlier Detector (Kriegel, et al., 2008)
  • IFOR: Isolation Forest (Liu, et al., 2008)
• Density-Based Methods
  • RKDE: Robust Kernel Density Estimation (Kim & Scott, 2008)
  • EGMM: Ensemble Gaussian Mixture Model (our group)
  • LODA: Lightweight Online Detector of Anomalies (Pevny, 2016)
• Quantile-Based Methods
  • OCSVM: One-Class SVM (Schoelkopf, et al., 1999)
  • SVDD: Support Vector Data Description (Tax & Duin, 2004)
20. Benchmarking Methodology

• Select 19 data sets from the UC Irvine repository
• Choose one or more classes to be "anomalies"; the rest are "nominals"
• Manipulate:
  • Relative frequency
  • Point difficulty
  • Irrelevant features
  • Clusteredness
• 20 replicates of each configuration
• Result: 11,888 non-trivial benchmark datasets
21. Analysis of Variance

• Linear ANOVA: log(AUC / (1 − AUC)) ~ rf + pd + cl + ir + pset + algo
  • rf: relative frequency
  • pd: point difficulty
  • cl: normalized clusteredness
  • ir: irrelevant features
  • pset: "parent" set
  • algo: anomaly detection algorithm
• AUC: area under the ROC curve for the nominal vs. anomaly binary decision
• Assess the algo effect while controlling for all other factors
22. Benchmarking Study Results

• 19 UCI datasets; 9 leading "feature-based" algorithms; 11,888 non-trivial benchmark datasets
• Mean AUC effect for "nominal" vs. "anomaly" decisions, controlling for:
  • Parent data set
  • Difficulty of individual queries
  • Fraction of anomalies
  • Irrelevant features
  • Clusteredness of anomalies
• Baseline method: distance to the nominal mean ("tmd")
• Best methods: k-nearest neighbors and Isolation Forest
• Worst methods: kernel-based OCSVM and SVDD

[Bar chart: mean AUC effect per algorithm, in decreasing order knn, iforest, egmm, rkde, lof, abod, loda, svdd, tmd, ocsvm; values range from roughly 0.78 down to 0.62]
23. Incorporating User Feedback: Initial Work

• Show the top-ranked candidate to the user
• The user labels the candidate
• The label is used to update the anomaly detector
• Two methods:
  • AAD [Das, et al., ICDM 2016]
  • GLAD-OMD (a modified version of iForest) [Siddiqui, et al., KDD 2018]

[Diagram: data feeds the anomaly detector, which proposes its best candidate to the user; the user's yes/no label is fed back to update the detector, and confirmed anomalies go on to anomaly analysis]
24. User Feedback Yields Big Improvements in Anomaly Discovery

• APT Engagement 3 results
27. Distance-Based Methods

• K-nearest neighbor in the latent space
• Issue: what distance metric should be used?
• Cosine distance is the most popular: d(z_1, z_2) = 1 − (z_1 · z_2) / (‖z_1‖ ‖z_2‖)
28. Density-Based Methods

• Mahalanobis Method
  • Fit a joint multivariate Gaussian
  • Each class k has its own mean μ_k
  • Shared covariance matrix Σ
• Given a new x: −log P(x) ∝ min_k (x − μ_k)^T Σ^{−1} (x − μ_k)
  • This is the squared Mahalanobis distance to the nearest class mean
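The Mahalanobis method is simple enough to sketch directly. Below, the latent vectors z are synthetic stand-ins for classifier features, and the small ridge added before inversion is my own numerical safeguard:

```python
import numpy as np

def fit_mahalanobis(Z, y):
    """Per-class means plus one shared covariance, as in the Mahalanobis method."""
    classes = np.unique(y)
    means = np.stack([Z[y == k].mean(axis=0) for k in classes])
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(Z)
    # Small ridge keeps the inverse well conditioned (my addition)
    return means, np.linalg.inv(cov + 1e-6 * np.eye(Z.shape[1]))

def mahalanobis_score(z, means, cov_inv):
    """A(z) = min over classes of the squared Mahalanobis distance."""
    diffs = means - z
    return min(d @ cov_inv @ d for d in diffs)

rng = np.random.default_rng(6)
Z = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),    # class 0 features
               rng.normal([4, 0], 0.5, (100, 2))])   # class 1 features
y = np.array([0] * 100 + [1] * 100)
means, cov_inv = fit_mahalanobis(Z, y)
near = mahalanobis_score(np.array([0.1, 0.0]), means, cov_inv)
far = mahalanobis_score(np.array([2.0, 10.0]), means, cov_inv)
```

Because −log P(x) is proportional to this minimum squared distance (up to constants), thresholding the score is equivalent to thresholding the Gaussian density.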
29. Open Hybrid: Classification + Density Estimation (Tack, Li, Guo, Guo, 2020)

• Residual Flow deep density estimator (Chen, Behrmann, Duvenaud, et al., NeurIPS 2019)
• Standard cross-entropy supervised loss
• Claim: this helps focus P(x) on relevant aspects of the images
• Anomaly score: A(x_q) = −log P(x_q)
30. Quantile Method: Deep SVDD (Ruff, et al., ICML 2018)

• The method is somewhat tricky to work with:
  • Set the center c as the mean of a small set of points passed through the untrained network
  • Use no bias weights
  • These choices help prevent "hypersphere collapse"
31. Reconstruction Methods: Deep Autoencoders

• Encoder: z = E(x)
• Decoder: x̂ = D(z)
• Challenge: how can E and D be constrained so that the autoencoder fails on anomalies but succeeds on nominal images?
• Autoencoders often learn general-purpose image compression methods
33. Surprise: The Max Logit Score

• Garrepalli (2020)
• Train a classifier to optimize softmax likelihood (minimize "cross-entropy loss")
• The maximum logit score is better than two distance methods:
  • Isolation Forest
  • LOF (a nearest-neighbor method)

[Bar chart: AUROC of anomaly measures on latent representations for CIFAR-100: H(y|x) 0.68, max softmax probability 0.67, max BCE probability 0.63, max logit 0.72, Isolation Forest 0.51, LOF 0.44]
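The max logit score itself is a one-liner once a classifier produces logits. The logit values below are invented for illustration; in practice they would come from the trained network's final pre-softmax layer:

```python
import numpy as np

def max_logit_scores(logits):
    """Familiarity-based anomaly score: A(x) = -max_k logit_k(x).
    A large max logit means familiar; a small one flags likely novelty."""
    return -logits.max(axis=1)

# Hypothetical logits for three queries from a 5-class classifier
logits = np.array([
    [9.2, 0.3, -1.0, 0.5, 0.1],   # confidently class 0 (familiar)
    [4.0, 3.8, 3.5, 3.9, 3.7],    # ambiguous class, but strong activations
    [0.4, 0.2, 0.1, 0.3, 0.2],    # weak activations everywhere (likely novel)
])
scores = max_logit_scores(logits)
```

Note the contrast with the max softmax probability: the second query would look uncertain under softmax (the probabilities are nearly uniform), yet its large logits still mark it as familiar under the max logit score.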
34. More Evidence for Max Logit

• Vaze, Han, Vedaldi, Zisserman (2021): "Open-Set Recognition: A Good Classifier is All You Need" (ICLR 2022; arXiv 2110.06207)
• Carefully train a classifier using the latest tricks: standard cross-entropy combined with
  • a cosine learning-rate schedule
  • learning-rate warmup
  • RandAugment augmentations
  • label smoothing
• Anomaly score: max logit, A(x) = −max_k ℓ_k
• Protocol from Lawrence Neal et al. (2018)
35. Still More Evidence for Max Logit

• Novel-class difficulty based on semantic distance
  • CUB: bird species
  • Air: aircraft
  • ImageNet
37. How Are Open-Set Images Represented by Deep Learning? (Alex Guyer)

• DenseNet with a 384-dimensional latent space
• CIFAR-10: 6 known classes, 4 novel classes
• UMAP visualization: light green = novel classes; darker greens = known classes
• Many novel classes stay toward the center of the space; others overlap with known classes
• Training was not required to "pull them out" so that they could be discriminated
38. Similar Results from Other Groups

• [Tack, et al., NeurIPS 2020]
• [Vaze, et al., arXiv 2110.06207]
39. The Familiarity Hypothesis

• A convolutional neural network learns "features" that detect image patches relevant to the classification task
• The logit layer weights these features to make the classification decision
• Novel classes activate fewer of these features, so their activation vectors are smaller
• Hypothesis: the network doesn't detect that an elephant is novel because of its trunk and tusks, but because its head doesn't activate known features
• The network doesn't detect novelty; it detects the absence of familiarity
40. Evidence: Number of Activated Features (Alex Guyer, unpublished)

• Novel images strongly activate fewer features
• CIFAR-10: 6 known classes; 4 novel classes
• DenseNet (z has 324 dimensions)
• Choose an activation threshold θ and count the number of features whose activation exceeds θ
• OOD images activate fewer features
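The feature-counting evidence is easy to reproduce in spirit. The activations below are synthetic stand-ins I generated to mimic the claimed pattern (known-class images activating features more strongly than OOD images); only the counting logic mirrors the slide:

```python
import numpy as np

def count_activated(Z, theta=0.5):
    """Number of latent features whose activation exceeds the threshold theta."""
    return (Z > theta).sum(axis=1)

rng = np.random.default_rng(7)
d = 324   # latent dimensionality from the slide
# Hypothetical ReLU-style activations: known-class images activate many
# features strongly; OOD images activate fewer (the familiarity hypothesis)
known = np.clip(rng.normal(0.8, 0.5, (50, d)), 0, None)
ood = np.clip(rng.normal(0.2, 0.5, (50, d)), 0, None)
known_counts = count_activated(known)
ood_counts = count_activated(ood)
```

Comparing the two count distributions (e.g., their means or histograms) is exactly the kind of evidence the slide reports for real DenseNet activations.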
41. Which Features Are Responsible for the Drop in Activation?

• Are the features "on" the object or on the background?
• Strategy: blur the object and see how the feature activations change
  • Activations that change must be on the object
• Details:
  • PASCAL VOC segmented images
  • Blur the original image (31x31 kernel; sd = 31)
  • Form a composite image where the blurred region replaces the segmented region
• Blur tool: https://www.peko-step.com/en/tool/blur.html
42. Blurring Examples

• Note: this does not remove all object-related information (e.g., the object boundary), so we don't detect all on-object features
43. Blurring Effect

• Define the "blurring effect" of feature j on image i: BE(i, j) = z_ij − z̃_ij, where
  • z_ij is the activation of latent feature j on image i
  • z̃_ij is the activation of latent feature j on the blurred version of image i
• "Presence feature": BE(i, j) > 0
  • Blurring decreases the activity of the feature; its net effect is to measure the presence of one or more image patterns
  • Its activity is high when those patterns are present
• "Absence feature": BE(i, j) < 0
  • Blurring increases the activity of the feature; its net effect is to measure the absence of one or more image patterns
  • Its activity is high when those patterns are absent
44. "On-Object" Score of Feature j for Class k

• On average, the activation of a feature changes when the object (of class k) is blurred:
  OO(j, k) = (1/N_k) Σ_{i : y_i = k} (z_ij − z̃_ij)
• Feature j is a net presence feature for class k if OO(j, k) > 0.02
• Feature j is a net absence feature for class k if OO(j, k) < −0.02
• Otherwise, j is net neutral for class k
45. Feature Taxonomy

• Logit score: ℓ_ik = Σ_j w_jk z_ij
• Contribution of feature j in image i to class k:
  • c_ijk = w_jk z_ij (in normal images)
  • c̃_ijk = w_jk z̃_ij (in blurred images)
• Mean contributions over the images of class k:
  • c̄_jk = (1/N_k) Σ_{i : y_i = k} c_ijk
  • c̄̃_jk = (1/N_k) Σ_{i : y_i = k} c̃_ijk

                       w_jk > 0            w_jk < 0
  OO(j, k) > 0.02      positive presence   negative presence
  OO(j, k) < −0.02     positive absence    negative absence

• Sun & Li: On the Effectiveness of Sparsification for Detecting the Deep Unknowns. arXiv 2111.09805
46. Mean Feature Types for Class 3

[Figure: mean on-object index (0.00 to 1.00) for positive and negative features; red = presence features, blue = absence features]
47. Zoomed View: Blurring Reduces c̄_jk

• Blurring reduces the contribution of positive presence features (red dots)
• Blurring reduces the contribution of negative absence features (blue dots)

[Figure: mean unblurred contribution vs. mean blurred contribution, plotted against the on-object index for presence and absence features]
48. Decomposing the Logit Score: Four Cases

• Positive presence: w_jk > 0 and OO(j, k) > 0
• Positive absence: w_jk > 0 and OO(j, k) < 0
• Negative presence: w_jk < 0 and OO(j, k) > 0
• Negative absence: w_jk < 0 and OO(j, k) < 0
52. Decomposing the Novelty Scores

• The positive presence features dominate the max logit score
• The negative absence and positive absence features (purple and blue lines) make a small contribution
• Negative presence features make no contribution
• Conclusion: decreases in the activations of positive presence features account for most of the max logit score
53. Decreases in Positive Presence Features Account for Novelty Detection Accuracy

• Red line: trend of the positive presence contribution to the max logit score
• Black line: smoothed estimate of classification accuracy ("known" vs. "novel")
54. Can We Expect Computer Vision Systems to Perceive Things They Have Not Been Trained On?

• Blakemore, Colin, and Grahame F. Cooper. "Development of the brain depends on the visual environment." (1970): 477-478.
  • Kittens raised in environments with only horizontal or only vertical lines
  • "They were virtually blind for contours perpendicular to the orientation they had experienced."
• Chomsky: "poverty of the stimulus"
• Image source: Li Yang Ku, https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/
55. Implications

• Familiarity-based anomaly detection advantages:
  • Easy to implement: the anomaly signal (max logit) can be extracted from the classifier; no separate anomaly detection model is needed
  • Training on additional, auxiliary classes improves both classification and anomaly detection performance
• Familiarity-based anomaly detection weaknesses:
  • Partially occluded nominal objects will be flagged as anomalies
  • If an image contains both a novel object and a known object, the novel object will not be detected
  • Adversarial attacks can easily cause false anomalies and missed anomalies
57. Challenges for Anomaly Detection

• Representation learning: can we learn deep representations that can represent outliers?
• Nonstationarity: as the world changes, the anomaly detection model must also change
• Explanation: users often want to know why something was labeled anomalous, in order to provide feedback or take other actions
• Setting alarm thresholds: how can we set a threshold to control the false-alarm and missed-alarm rates?
• Incremental (continual) learning in deep networks: how can we efficiently update a trained neural network to incorporate user feedback?
• Anomaly detection in temporal, spatial, and spatio-temporal data, in video data, etc.
• Anomaly detection at multiple scales
59. Shallow and Deep Methods for Anomaly Detection

• Four basic methods: distances, densities, density quantiles, and reconstruction
  • Distances work best; Isolation Forest is very robust
• Anomaly detection in deep learning
  • The four basic methods have been extended to deep learning
  • They often do not work well when applied to learned representations
• The classifier max logit score gives very competitive performance
  • Computed as a side effect of standard deep classifiers
  • Measures familiarity rather than novelty, which makes it risky in many settings
• Advances in deep anomaly detection require learning better representations