Anomaly Detection with Deep Learning
November 2017
SKYMIND OVERVIEW
Founded 2014
Funding $6.3M
OSS 3,700 GitHub forks; 300,000+ downloads/mo.
Team 35 employees; 25 engineers; 7 PhDs
OUR BOOK GIVEAWAY!
(pub. Aug. 2017)
SKIL
Our Production Deep Learning Solution
Anomaly Detection Approaches
• There are five main approaches to anomaly detection:
• Probabilistic
• Distance-based
• Domain-based
• Reconstruction-based
• Information-theoretic
• All of these methods have drawbacks that prevent them from applying to every type of data. They either:
• Have built-in assumptions about the data (like Gaussian mixture models)
• Require specific domain knowledge
• Detect only certain patterns of anomalies
• Are unsuitable for data with high temporal dependencies
• Are unsuitable for multivariate data, or are computationally infeasible at our scale
• Multiple approaches are necessary for a comprehensive detection pipeline; a sketch of one of them (distance-based scoring) follows below.
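As a concrete illustration of one of the listed approaches, here is a minimal sketch of a distance-based detector: each point is scored by its average distance to its k nearest neighbors, so isolated points score high. The data, k, and the use of scikit-learn's NearestNeighbors are illustrative assumptions, not details from the deck.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 2-D data: a dense cluster plus one planted outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               [[8.0, 8.0]]])  # the anomaly

# Distance-based score: mean distance to the k nearest neighbors.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
dist, _ = nn.kneighbors(X)
scores = dist[:, 1:].mean(axis=1)                # drop the self-distance column

print("highest-scoring point:", X[scores.argmax()])  # should be the outlier at (8, 8)
```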
Anomaly Detection Example
Cluster-Based Methods (1)
Cluster-based methods work by creating a dictionary of non-anomalous data and finding the entry that best matches the actual data at inference time. If the pattern was never seen before, the reconstruction will be very different from the actual data.
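A minimal sketch of this idea, assuming a k-means codebook learned on normal data only: each incoming point is "reconstructed" as its nearest centroid, and the distance to that centroid is the anomaly score. The cluster count and the stand-in data are illustrative, not values from the deck.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_dictionary(normal_data, n_clusters=16):
    """Learn a dictionary (codebook) of non-anomalous patterns."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(normal_data)

def anomaly_score(model, x):
    """Reconstruct x as its best-matching dictionary entry and
    return the distance between the reconstruction and the input."""
    centroid = model.cluster_centers_[model.predict(x.reshape(1, -1))[0]]
    return np.linalg.norm(x - centroid)

# Hypothetical usage: fit on normal data, then score a new observation.
normal = np.random.randn(1000, 8)           # stand-in for non-anomalous data
km = fit_dictionary(normal)
print(anomaly_score(km, np.full(8, 10.0)))  # far from every centroid -> large score
```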
Cluster-Based Methods (2)
(Figure: the input data, the best reconstruction from the dictionary, and their difference.)
Example Workflow for Anomaly Detection
Raw data → join → transform → feed groups into the autoencoder and save the reconstruction error of the center.
(Diagram: input data and its reconstruction.)
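A sketch of the "groups" part of this workflow, under the assumption (not spelled out in the deck) that the groups are sliding windows over a time series: each window is fed to a trained autoencoder and the reconstruction error of the window's center element is recorded.

```python
import numpy as np

def center_errors(autoencoder, series, window=9):
    """Slide a window over the series, reconstruct each window with the
    autoencoder (assumed trained, input dim == window), and keep the
    squared reconstruction error of the center element of each window."""
    half = window // 2
    errors = np.full(len(series), np.nan)  # edges have no full window
    windows = np.stack([series[t - half:t + half + 1]
                        for t in range(half, len(series) - half)])
    recon = autoencoder.predict(windows)
    # squared error of the center column only
    errors[half:len(series) - half] = (recon[:, half] - windows[:, half]) ** 2
    return errors
```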
Example of a VAE (Variational Autoencoder) Detecting Anomalies
(Figure: examples with low reconstruction error vs. examples with high reconstruction error.)
Data Processing Step 3: Training
The autoencoder will be trained to recreate the input data as closely as possible (one row at a time), as sketched below.
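A minimal sketch of this training step in Keras, assuming tabular rows of n_features columns (the deck itself ships on Skymind's DL4J stack, so treat this as an illustration of the idea, not the production code): the network is trained with the input as its own target under mean-squared-error loss.

```python
import numpy as np
from tensorflow import keras

n_features = 32  # hypothetical number of columns per row

# A small symmetric autoencoder: compress each row, then expand it back.
autoencoder = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(8, activation="relu"),    # bottleneck
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

# Train to recreate the input: the target is the input itself, row by row.
X_train = np.random.randn(10000, n_features)     # stand-in for the real rows
autoencoder.fit(X_train, X_train, epochs=10, batch_size=128)
```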
Data Processing Step 4: Ranking
The trained autoencoder will then be run on the new data, and we will store the error (sum of the mean squared difference per column, per row) in a ranking engine. Unusual patterns will have a high reconstruction error; a sketch follows below.
(Diagram: input data → output (reconstruction); (Reconstruction − Input)² → error table.)
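A sketch of the ranking step under the same assumptions as the training sketch above: the per-row error is the mean squared difference between reconstruction and input, and rows are ranked by that error. The "ranking engine" here is just a sorted array; in a real deployment it would be an external store.

```python
import numpy as np

def reconstruction_errors(autoencoder, X):
    """Per-row error: mean squared difference between output and input."""
    recon = autoencoder.predict(X)
    return ((recon - X) ** 2).mean(axis=1)

# Hypothetical usage with the autoencoder trained above.
X_new = np.random.randn(500, 32)        # stand-in for new, unseen rows
errors = reconstruction_errors(autoencoder, X_new)

# "Ranking engine" stand-in: row indices sorted by descending error.
ranking = np.argsort(errors)[::-1]
print("most anomalous rows:", ranking[:10])
```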
Problem: No Free Lunch
Wolpert's no-free-lunch theorem states that no single machine learning algorithm can perform well on every task. Deep learning is itself a set of very different techniques that are good or bad at various problems.
The system will have to use various algorithms to detect different types of anomalies and possibly different root causes.
Class imbalance will be a problem at the beginning of the system's lifetime. Labeled data will be overshadowed by unlabeled data, and the anomaly detectors will not be able to improve for some time. One possible solution is pseudo-labeling: using a trained classifier to label the unlabeled data. These labels will be very noisy at first, so they may have to be deployed in stages; a sketch of the idea follows below.
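A minimal sketch of the pseudo-labeling idea, with a hypothetical classifier and confidence cutoff: predictions above the cutoff are treated as labels and folded into the training set. The staging mentioned in the slide would gate how many of these noisy labels are trusted at each step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pseudo_label(clf, X_unlabeled, confidence=0.95):
    """Keep only the unlabeled rows the classifier is confident about,
    and use its predictions as (noisy) labels for them."""
    proba = clf.predict_proba(X_unlabeled)
    keep = proba.max(axis=1) >= confidence
    return X_unlabeled[keep], clf.classes_[proba[keep].argmax(axis=1)]

# Hypothetical usage: a small labeled set and a large unlabeled pool.
X_lab = np.random.randn(100, 8)
y_lab = np.random.randint(0, 2, size=100)
X_unlab = np.random.randn(5000, 8)

clf = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)
X_pseudo, y_pseudo = pseudo_label(clf, X_unlab)

# Retrain on the enlarged set; in practice this would be staged/gated.
clf.fit(np.vstack([X_lab, X_pseudo]), np.concatenate([y_lab, y_pseudo]))
```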
System Requirements
• The system needs to handle terabytes of data at a minimum.
• The system is required to train a large number of neural networks within a short time frame; GPU servers are a cost-effective way to provide the necessary compute.
• GPU servers are storage-inefficient, so the system will employ the Hadoop File System (HDFS) and Spark on commodity servers to meet the storage requirements.
• The system must scale to larger problems.
Automatic Labeling
Use k-means and t-SNE cluster highlighting to label data points:
• Use the representation learned by the autoencoder to automatically group data
• t-SNE visualization allows highlighting and automatic labeling
• Use k-NN and vantage-point trees (VP-trees) to sample the hidden activations learned by the neural net and label interactively, as sketched below
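A sketch of the grouping-and-projection part, assuming `hidden` stands in for the autoencoder's bottleneck activations (hypothetical here): cluster the hidden representation with k-means and project it to 2-D with t-SNE for interactive inspection and labeling.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def group_and_project(hidden, n_clusters=10):
    """Cluster the autoencoder's hidden activations and project them
    to 2-D for interactive highlighting and labeling."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(hidden)
    coords = TSNE(n_components=2, random_state=0).fit_transform(hidden)
    return labels, coords  # cluster id per point, 2-D position per point

# Hypothetical usage on stand-in bottleneck activations.
hidden = np.random.randn(2000, 8)
labels, coords = group_and_project(hidden)
# A UI would plot `coords` colored by `labels` and let the user
# assign a label to a whole highlighted cluster at once.
```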
Root Cause Analysis
• Autoencoders can be trained to identify the causes of certain kinds of behavior
• "Spikes" in reconstruction error on time series can be used to detect problems in infrastructure as well as in network monitoring (dropped connections, unusually high latency); a sketch follows below
• Use k-NN and VP-trees to sample the hidden activations learned by the neural net and label interactively
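A sketch of spike detection on the reconstruction-error time series, using a simple rolling baseline; the window size and the 3-sigma rule are illustrative choices, not values from the deck.

```python
import numpy as np

def find_spikes(errors, window=60, n_sigmas=3.0):
    """Flag points where reconstruction error jumps well above the
    recent baseline (rolling mean + n_sigmas * rolling std)."""
    spikes = []
    for t in range(window, len(errors)):
        baseline = errors[t - window:t]
        if errors[t] > baseline.mean() + n_sigmas * baseline.std():
            spikes.append(t)
    return spikes

# Hypothetical usage: a flat error series with one injected incident.
errors = np.abs(np.random.randn(1000)) * 0.1
errors[700:705] += 5.0          # the "spike" (e.g. a latency incident)
print(find_spikes(errors))      # should report indices around 700
```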
Design Considerations for Production
• The system will not use redundant environments, since disaster recovery is not a requirement.
• Inside an environment, the system will have redundant hardware and can tolerate the loss of one GPU node and one app node without service degradation.
• The system does not employ a remote backup strategy because all data is ephemeral and can be recreated from the data on S3.
THANK YOU