July 4 - 6, 2022
2nd Edition
BigML, Inc #DutchMLSchool
Anomaly Detection at Scale
Lessons Learned deploying thousands of Anomaly Detectors
Alvaro Clemente
Machine Learning Engineer, BigML

Agenda
1. Anomaly Detection Primer
2. Lessons
3. Conclusion

Anomaly Detection in a Nutshell
[Figure: "Identify the anomalies" exercise. Numbered example items (1 to 9, then 1 to 15); the final builds are marked "YES" and "YES *".]
Identify examples that are different from the rest of the dataset
• Dataset cleaning: Remove instances that are not representative of your dataset
• Exploring data: Identify outlier situations in your dataset
• Classification: Work with very unbalanced data or uncertainty
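To make "different from the rest of the dataset" concrete, here is a minimal sketch that scores one numeric column by its distance from the median, measured in units of the median absolute deviation (MAD). This is a simple stand-in chosen for illustration, not the detector used in the talk; the function name and the sample readings are made up.

```python
from statistics import median


def robust_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median, so this simple
        # score cannot rank the rest; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [abs(v - center) / mad for v in values]


# Made-up sensor readings with one value far from the others.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
scores = robust_scores(readings)
most_unusual = max(range(len(readings)), key=scores.__getitem__)
print(readings[most_unusual])
```

The same score supports the three uses listed above: drop high-scoring rows (cleaning), inspect them (exploration), or treat them as the rare class (classification).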

Anomaly Detectors for Classification
• Fraud Detection: Detecting money laundering and fraud in bank transactions
• Intrusion Detection: Detecting unexpected events in network traffic
• Quality Control: Detecting failures in manufacturing processes

Lessons
Design
• You don’t need Anomaly Detection
• Divide and Conquer
Domain
• Data Cleaning for free
• Automatic Experts
• Missing the forest for the trees
Training
• Features, Features, Features
• Set up a Feedback Loop
• Customize Thresholds
Operation
• You can’t evaluate it!
• Adapt to new times
• Size matters

Lesson 1: You don’t need Anomaly Detection
Design
• You are interested in the unusual case
• You have very few examples of the interesting class
• You can’t have fully labeled datasets
• The class shows unexpected behaviors
When possible, other methods will give better control over the performance

Lesson 2: Divide and Conquer
Design
• Identify the different tasks in your problem domain
• Build a model trained with data from that specific task, even with different features
• Each model will be easier to track and reason about
Prefer multiple simpler models over a single complex one
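One way to picture "one model per task, even with different features" is a plain dictionary of small detectors keyed by task. The sketch below is an assumption-laden illustration: the `TaskDetector` class, the task names, the feature names, and the z-score scoring are all hypothetical stand-ins, not the models from the talk.

```python
from statistics import mean, pstdev


class TaskDetector:
    """A small per-task detector with its own feature list (hypothetical)."""

    def __init__(self, features):
        self.features = features
        self.stats = {}

    def fit(self, rows):
        for name in self.features:
            column = [row[name] for row in rows]
            # Fall back to 1.0 when a feature is constant, to avoid
            # dividing by zero when scoring.
            self.stats[name] = (mean(column), pstdev(column) or 1.0)
        return self

    def score(self, row):
        # Largest absolute z-score across this task's own features.
        return max(abs(row[f] - m) / s for f, (m, s) in self.stats.items())


# Made-up training data: each task has its own features and rows.
training = {
    "spot_weld": (["current", "duration"], [
        {"current": 100, "duration": 20},
        {"current": 102, "duration": 21},
        {"current": 98, "duration": 19},
    ]),
    "seam_weld": (["speed"], [{"speed": 5.0}, {"speed": 5.2}, {"speed": 4.8}]),
}
detectors = {
    task: TaskDetector(features).fit(rows)
    for task, (features, rows) in training.items()
}

# Route each new row to the detector trained for its task.
print(detectors["spot_weld"].score({"current": 120, "duration": 20}))
```

Because each detector only sees one task, a surprising score can be traced back to a single feature list and a single slice of data.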

Lesson 3: Data cleaning for free
Domain
• Find issues in your data pipelines
• Have a fast feedback loop for reporting issues
• Have an off-switch when those issues are detected
Anomaly Detectors will find data issues
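The off-switch can be sketched as a small circuit breaker: if the share of anomalies in a recent window is implausibly high, the likelier explanation is a broken data pipeline, so alerts are disabled until someone looks. The class name, window size, and rate limit below are illustrative assumptions.

```python
from collections import deque


class OffSwitch:
    """Disable alerting when too many recent predictions are anomalies."""

    def __init__(self, window=100, max_anomaly_rate=0.2):
        self.recent = deque(maxlen=window)
        self.max_anomaly_rate = max_anomaly_rate
        self.enabled = True

    def record(self, is_anomaly):
        self.recent.append(bool(is_anomaly))
        window_full = len(self.recent) == self.recent.maxlen
        rate = sum(self.recent) / len(self.recent)
        if window_full and rate > self.max_anomaly_rate:
            # Stays off until a person resets `enabled` by hand.
            self.enabled = False
        return self.enabled


switch = OffSwitch(window=10, max_anomaly_rate=0.3)
for flag in [False] * 10 + [True] * 4:
    still_on = switch.record(flag)
print(still_on)
```

The switch is deliberately one-way: turning the detector back on is left as a manual decision after the data issue has been reported and fixed.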

Lesson 4: Automatic Experts
Domain
Use expert rules to check predictions
• A combination of anomalies and rules will yield the best results
• You can automate your rules with a model
• Train a model to detect False Positives
• This requires even more data
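A minimal sketch of combining anomalies with expert rules: the detector proposes an alert, and a list of rules can veto it when the situation matches a known false positive. The rule contents and field names (`maintenance_mode`, `sensor_status`) are hypothetical examples, not rules from the talk.

```python
def is_alert(row, score, threshold, veto_rules):
    """Raise an alert only if the score is high and no expert rule vetoes it."""
    if score < threshold:
        return False
    return not any(rule(row) for rule in veto_rules)


# Hypothetical expert rules describing known false-positive situations.
veto_rules = [
    lambda row: row.get("maintenance_mode", False),
    lambda row: row.get("sensor_status") == "calibrating",
]

print(is_alert({"sensor_status": "ok"}, score=0.9, threshold=0.6,
               veto_rules=veto_rules))
```

Replacing the hand-written rules with a trained false-positive classifier keeps the same shape: the classifier becomes one more callable in `veto_rules`, at the cost of needing labeled false positives to train on.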

Lesson 5: Don’t miss the forest for the trees
Domain
Anomaly patterns contain very interesting information!
• Looking at how anomalies happen over time can reveal very useful information
• Random failures → Random anomalies
• Significant events → Groups of anomalies
• Macro Alerts
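A macro alert can be sketched as counting anomalies per time window and flagging the windows where they cluster. The function below uses fixed, non-overlapping windows for simplicity; the window length and minimum count are assumptions to tune per system.

```python
from collections import Counter


def macro_alerts(anomaly_times, window_seconds=3600, min_count=5):
    """Return the start times of fixed windows holding many anomalies."""
    buckets = Counter(int(t // window_seconds) for t in anomaly_times)
    return sorted(
        bucket * window_seconds
        for bucket, count in buckets.items()
        if count >= min_count
    )


# Made-up anomaly timestamps in seconds: isolated hits plus one burst.
times = [10, 5000, 7200, 7300, 7400, 7500, 7600, 20000]
print(macro_alerts(times))  # [7200]
```

Isolated anomalies stay below `min_count` and are handled individually, while the burst in the third hour is reported once as a group. A burst that straddles a window boundary can be split across two buckets, which a sliding window would avoid.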

Lesson 6: Features, Features, Features
Training
Use the features to tune model behavior
• Feature engineering and selection are one of the two main ways to tune model behavior
• Keep the number of features to a minimum
• Keep it explainable

Lesson 7: Set up a Feedback Loop
Training
Set up a feedback loop for tuning model behavior
• Keep a database of predictions and outcomes
• Usually requires human inspection
• Monitor performance of the models for tuning
• With BigML*, you can update your models with new data
* currently only available in private deployments
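The "database of predictions and outcomes" can start very small. The sketch below uses an in-memory SQLite table; the schema, the outcome labels (`confirmed`, `false_alarm`), and the helper names are assumptions made for illustration and are not part of BigML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE predictions (
           id INTEGER PRIMARY KEY,
           model_id TEXT,
           score REAL,
           flagged INTEGER,
           outcome TEXT  -- filled in later, usually after human inspection
       )"""
)


def log_prediction(model_id, score, flagged):
    cur = conn.execute(
        "INSERT INTO predictions (model_id, score, flagged) VALUES (?, ?, ?)",
        (model_id, score, int(flagged)),
    )
    conn.commit()
    return cur.lastrowid


def record_outcome(prediction_id, outcome):
    conn.execute(
        "UPDATE predictions SET outcome = ? WHERE id = ?",
        (outcome, prediction_id),
    )
    conn.commit()


def confirmed_rate(model_id):
    """Share of reviewed, flagged predictions that were real failures."""
    row = conn.execute(
        "SELECT AVG(outcome = 'confirmed') FROM predictions "
        "WHERE model_id = ? AND flagged = 1 AND outcome IS NOT NULL",
        (model_id,),
    ).fetchone()
    return row[0]  # None when nothing has been reviewed yet
```

Tracking `confirmed_rate` per model over time gives a signal for when a detector needs new features, new thresholds, or retraining.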

Lesson 8: Customize Thresholds
Training
Customize the thresholds to your requirements and data
• Anomaly can be a fuzzy and subjective concept
• Use multiple thresholds: low, medium and high
• Use dynamic thresholds: data driven
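Multiple, data-driven thresholds can be sketched by taking percentiles of recently observed scores rather than hard-coding cut-offs. The percentile levels (80, 90, 99) and function names below are illustrative assumptions.

```python
def dynamic_thresholds(recent_scores, levels):
    """Map each level name to a percentile of the recent scores."""
    ordered = sorted(recent_scores)
    last = len(ordered) - 1
    return {
        name: ordered[min(len(ordered) * percent // 100, last)]
        for name, percent in levels.items()
    }


def severity(score, thresholds):
    """Return the highest level whose threshold the score reaches, or None."""
    reached = [
        name
        for name, limit in sorted(thresholds.items(), key=lambda kv: kv[1])
        if score >= limit
    ]
    return reached[-1] if reached else None


# Made-up recent scores 0..99, used as the reference distribution.
recent = list(range(100))
thresholds = dynamic_thresholds(recent, {"low": 80, "medium": 90, "high": 99})
print(thresholds, severity(95, thresholds))
```

Recomputing `thresholds` on a schedule from fresh scores keeps the alert volume roughly constant even when the score distribution shifts.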

Lesson 9: You can’t evaluate it!
Operation
Evaluating these models is complicated
• Evaluation will not be that simple
• Lack of information
• Macro events affect the individual performance
• Precision and Recall don’t translate so well to these kinds of problems
• Find some useful and realistic evaluation metrics
• Indirect metrics, business metrics (e.g., recall rates on cars)
• Manual exploration of random samples of data
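For the manual exploration step, a reproducible random sample that covers both flagged and unflagged records lets reviewers see missed failures as well as false alarms. This is a small sketch; the record layout (a `flagged` key) and the function name are assumptions.

```python
import random


def review_sample(records, n_flagged, n_unflagged, seed=0):
    """Draw a seeded random sample from flagged and unflagged records."""
    rng = random.Random(seed)
    flagged = [r for r in records if r["flagged"]]
    unflagged = [r for r in records if not r["flagged"]]
    return (
        rng.sample(flagged, min(n_flagged, len(flagged)))
        + rng.sample(unflagged, min(n_unflagged, len(unflagged)))
    )


# Made-up records: every tenth one was flagged by a detector.
records = [{"id": i, "flagged": i % 10 == 0} for i in range(100)]
to_review = review_sample(records, n_flagged=5, n_unflagged=5)
print([r["id"] for r in to_review])
```

Fixing the seed means two reviewers can be handed exactly the same sample, which helps when labels are subjective.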

Lesson 10: Adapt to new times
Operation
Anomaly Detectors are very sensitive to changes in the working conditions
• Model quality will deteriorate over time faster than with other models
• Monitor the model performance
• Retrain often
• Find change indicators and disable proactively
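One simple change indicator is the ratio between the current flagged rate and the flagged rate seen on reference data from training time. The sketch below treats a large ratio as a sign that working conditions changed; the ratio limit and function names are assumptions, and real deployments may prefer a proper statistical test.

```python
def flagged_rate(scores, threshold):
    """Fraction of scores at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def has_drifted(reference_scores, recent_scores, threshold, max_ratio=3.0):
    """Compare the recent flagged rate against the reference flagged rate."""
    baseline = flagged_rate(reference_scores, threshold)
    recent = flagged_rate(recent_scores, threshold)
    if baseline == 0:
        # Nothing was flagged on reference data, so any flag is a change.
        return recent > 0
    return recent / baseline > max_ratio


# Made-up scores: 2% flagged at training time, 20% flagged recently.
reference = [0.1] * 98 + [0.9] * 2
shifted = [0.1] * 80 + [0.9] * 20
print(has_drifted(reference, shifted, threshold=0.5))  # True
```

A `True` result is a cue to disable the detector and retrain it on data from the new conditions, rather than a verdict on individual predictions.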

Lesson 11: Size matters
Operation
Storage and caching for Anomaly Detectors is key for fast predictions
• You will be deploying a lot of these models
• Anomaly Detectors can be heavy
• Efficient storage, transport and loading will be key
• Near real time scenarios and distributed prediction
• Use efficient representations of the models
• Minomaly
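As a rough illustration of compact storage plus caching, the sketch below keeps models as compressed JSON and memoizes loading with an LRU cache. It does not describe Minomaly or BigML's model format; the `STORE` dictionary stands in for a real blob store, and the model contents are made up.

```python
import json
import zlib
from functools import lru_cache

STORE = {}  # stand-in for a blob store or a file system


def save_model(model_id, model):
    """Store the model as compact, compressed JSON bytes."""
    payload = json.dumps(model, separators=(",", ":")).encode("utf-8")
    STORE[model_id] = zlib.compress(payload)


@lru_cache(maxsize=256)
def load_model(model_id):
    """Decompress and parse a model, keeping recently used ones in memory."""
    return json.loads(zlib.decompress(STORE[model_id]).decode("utf-8"))


save_model("welder-17", {"trees": [[0.5, 1, 2], [0.1, 3, 4]]})
first = load_model("welder-17")   # decompressed and parsed
second = load_model("welder-17")  # served from the cache
print(load_model.cache_info())
```

The cache returns the same object on every hit, so callers should treat loaded models as read-only. `maxsize` bounds memory when thousands of detectors are deployed but only some are active at once.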

Conclusion
Solve problems that are impractical with traditional methods
Discover unexpected facts about your systems

Q & A

MLToolbox Context
From Data to Real Time Alerts
Using ML to improve Quality Control
[Figure: production line stages BODY SHOP, PAINT SHOP and ASSEMBLY SHOP, annotated with "Cost of fixing a welding failure: $" and "Cost of fixing a welding failure: $$$"]
Detect as many failures as possible, while keeping a manageable number of car extractions
A case for Anomaly Detectors
• Large amounts of data processed in real time: 300,000 welds / day
• Extremely few examples of failures: 1 in 15,000 welds fails (0.0067%!)
• Failures can have unexpected shapes
• Hundreds of machines doing slightly different tasks
• Unreliable labels
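The figures on this slide can be checked with a few lines of arithmetic. The sketch below only uses the two numbers stated above (300,000 welds per day and 1 failure in 15,000 welds); the variable names are illustrative.

```python
from fractions import Fraction

welds_per_day = 300_000
failure_rate = Fraction(1, 15_000)

# Expected number of failing welds per day across the plant.
expected_failures_per_day = welds_per_day * failure_rate

# Failure rate as a percentage, rounded as on the slide.
failure_percent = round(float(failure_rate * 100), 4)

# Accuracy of a useless detector that never raises an alert.
never_alert_accuracy = round(float((1 - failure_rate) * 100), 4)

print(expected_failures_per_day, failure_percent, never_alert_accuracy)
```

About 20 failing welds are expected per day, and a detector that never alerts would still be roughly 99.9933% accurate, which is why plain accuracy says little in this setting.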
