CTF: Anomaly Detection in High-Dimensional
Time Series with Coarse-to-Fine Model Transfer
Ming Sun, Ya Su, Shenglin Zhang, Yuanpu Cao, Yuqing Liu, Dan Pei,
Wenfei Wu, Yongsu Zhang, Xiaozhou Liu, Junliang Tang
INFOCOM 2021
Outline
Background Design Evaluation Conclusion
DL Algorithms in Infrastructure Operation
• Advantages
– Automation
– Robustness
– Saves operators' labor
• Example:
– RNN-VAE for anomaly detection
RNN-VAE Based Algorithms
Network architecture of RNN-VAE models at time t:
𝒙𝒕 (49) -> 𝒛𝒕 (3) -> reconstructed 𝒙𝒕 (49)
Figure: RNN and dense layers map 𝒙𝒕 to 𝒛𝒕; RNN and dense layers map 𝒛𝒕 back to the reconstructed 𝒙𝒕
• Variational Auto-Encoder (VAE): KPI dimension reduced
Network layers
• RNN: shallow & general
• Dense layers: deep & specific
Scalability is the problem at large scale
• High-dimensional data
– Machines: in millions
– KPIs: in tens
– Time: frequent data queries (2880 samples/day)
• One model per machine: training time is prohibitive (tens of minutes per model x millions of machines)
• One model for all machines: accuracy suffers
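To make the one-model-per-machine cost concrete, here is a rough back-of-the-envelope sketch. The slide gives only orders of magnitude, so the values of 10 minutes per model and 1 million machines are assumed placeholders, and training is assumed to run sequentially on a single worker.

```python
# Back-of-the-envelope cost of training one model per machine.
# ASSUMPTIONS (not from the slides): exactly 10 minutes per model,
# exactly 1 million machines, strictly sequential training.
MINUTES_PER_MODEL = 10
NUM_MACHINES = 1_000_000


def sequential_training_days(minutes_per_model: float, num_machines: int) -> float:
    """Total wall-clock days if the models are trained one after another."""
    total_minutes = minutes_per_model * num_machines
    return total_minutes / (60 * 24)


days = sequential_training_days(MINUTES_PER_MODEL, NUM_MACHINES)
print(f"{days:.0f} days (about {days / 365:.0f} years) of sequential training")
```

Parallel workers would divide this figure, but the total compute still grows linearly with the number of machines.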
Goal: devise scalable deep learning (DL) algorithms for
large-scale anomaly detection
Intuition and Challenges
• Intuition: cluster machines first, then run DL for each cluster
• Challenge 1: dependency between clustering and model training
– Clustering cannot run on high-dimensional data
– DL cannot run on the whole dataset without clustering
– Solution: synthetic framework (coarse-grained model -> clustering -> fine-grained models)
• Challenge 2: high dimension of the time domain
– Hard to cluster even when the KPI dimension is compressed
– Solution: compress each sequence to a z-distribution
• Challenge 3: neural network training method
– Solution: fine-tuning strategy
– Freeze the RNN and tune the dense layers
Framework of model training
Figure: framework of model training (diagram labels include (M) and (K<<M))
• Sampling strategy:
– Machine sampling
– Time sampling
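One possible reading of the two sampling steps is to pre-train on a random subset of machines and, for each chosen machine, on a subset of its time points. The slide does not state the sampling rule or ratios, so the uniform random choice, the single contiguous window, the function name, and both parameters below are illustrative assumptions.

```python
import random


def sample_training_data(series_by_machine, machine_ratio, window_len, seed=0):
    """Pick a random subset of machines, then one contiguous time window per machine.

    series_by_machine maps a machine id to its list of observations over time.
    machine_ratio and window_len are hypothetical knobs, not values from the slides.
    """
    rng = random.Random(seed)
    machine_ids = sorted(series_by_machine)
    num_picked = max(1, int(len(machine_ids) * machine_ratio))
    picked = rng.sample(machine_ids, num_picked)        # machine sampling

    sampled = {}
    for machine_id in picked:
        series = series_by_machine[machine_id]
        length = min(window_len, len(series))
        start = rng.randint(0, len(series) - length)    # time sampling
        sampled[machine_id] = series[start:start + length]
    return sampled


# Toy data: 10 machines with 100 time points each.
toy = {f"m{i}": list(range(100)) for i in range(10)}
subset = sample_training_data(toy, machine_ratio=0.5, window_len=20)
print({machine_id: (win[0], win[-1]) for machine_id, win in subset.items()})
```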
• 𝒙𝒕 sequence -> 𝒛𝒕 sequence -> 𝒛𝒕 distribution
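The step from a 𝒛𝒕 sequence to a 𝒛𝒕 distribution removes the time axis, so every machine is described by a fixed-size summary regardless of sequence length. The slides do not say how the distribution is represented; the sketch below assumes one normalised histogram per latent dimension, with a hypothetical bin count and value range.

```python
def z_distribution(z_sequence, num_bins=10, low=-3.0, high=3.0):
    """Summarise a sequence of latent vectors as one normalised histogram per dimension.

    ASSUMPTION: the histogram form, num_bins, low, and high are illustrative
    choices and are not taken from the slides.
    """
    num_dims = len(z_sequence[0])
    width = (high - low) / num_bins
    hists = [[0.0] * num_bins for _ in range(num_dims)]
    for z in z_sequence:
        for dim, value in enumerate(z):
            clipped = min(max(value, low), high)          # keep values inside the range
            idx = min(int((clipped - low) / width), num_bins - 1)
            hists[dim][idx] += 1.0
    total = len(z_sequence)
    return [[count / total for count in hist] for hist in hists]


# Toy sequence of 3-dimensional latent vectors (3 matches the z size on the slides).
z_seq = [(0.1, -1.0, 2.5), (0.2, -1.1, 2.4), (-0.1, -0.9, 2.6)]
dist = z_distribution(z_seq)
print(len(dist), len(dist[0]))
```

The output size depends only on the latent dimension and the bin count, which is what makes machines with long histories comparable.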
• 𝒛𝒕 distribution -> distance matrix (Wasserstein distance) -> clustering results (HAC algorithm)
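The clustering step can be sketched as a pairwise distance matrix followed by hierarchical agglomerative clustering (HAC). This is a minimal pure-Python illustration, not the authors' implementation: it assumes one-dimensional, equal-size samples per machine (the slides use a 3-dimensional 𝒛𝒕 and do not say how dimensions are combined), and the average linkage and the target cluster count are assumptions.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def distance_matrix(samples):
    """Symmetric matrix of pairwise Wasserstein distances."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = wasserstein_1d(samples[i], samples[j])
    return dist


def hac(dist, num_clusters):
    """Naive average-linkage HAC: merge the closest pair until num_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy latent samples for four machines: two near 0, two near 5.
samples = [[0.0, 0.1, 0.2], [0.05, 0.15, 0.25], [5.0, 5.1, 5.2], [5.05, 5.1, 5.3]]
clusters = sorted(sorted(c) for c in hac(distance_matrix(samples), num_clusters=2))
print(clusters)
```

The naive merge loop is cubic in the number of machines, so it only suits small examples like this one.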
• Fine-tuning strategy:
– RNN: fixed
– Dense layers: tuned
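The fine-tuning rule (RNN fixed, dense layers tuned) amounts to skipping the update for one group of parameters. The sketch below shows that idea with plain Python lists. The parameter names, gradient values, and learning rate are made-up placeholders, and a real model would use a deep learning framework, where the same effect is usually obtained by leaving the RNN parameters out of the optimizer.

```python
def fine_tune_step(params, grads, frozen_prefixes=("rnn",), lr=0.01):
    """One SGD step that skips every parameter whose name starts with a frozen prefix."""
    updated = {}
    for name, values in params.items():
        if name.startswith(frozen_prefixes):
            # Frozen: keep the weights of the pre-trained coarse-grained model.
            updated[name] = list(values)
        else:
            # Tuned: ordinary gradient descent update.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


# Hypothetical pre-trained weights and per-cluster gradients (placeholder numbers).
pretrained = {"rnn.w": [0.5, -0.2], "dense.w": [1.0, 2.0], "dense.b": [0.0]}
cluster_grads = {"rnn.w": [0.3, 0.3], "dense.w": [0.1, -0.1], "dense.b": [0.5]}

cluster_model = fine_tune_step(pretrained, cluster_grads)
print(cluster_model)
```

Only the dense-layer entries differ from the pre-trained values after the step, so each cluster's model shares the RNN with the coarse-grained model.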
System architecture
Figure: system architecture. Components: monitored machine entities, data API, offline data, online data, data preprocessing (IV-A), offline model training (IV-B), online anomaly detection (IV-C), model score, outlier alerting (V-D), results & visualization
1. Data preprocessing
2. Offline model training
3. Online anomaly detection
Labeling tools
Figure: The interface of the labeling tool
Dataset & performance metrics
• Dataset:
– # Machine entities: 533
– Dimension of each machine entity: 49 KPIs x 37440 time points (frequency: 30s, 13 days)
– Training = first 5 days, Testing = last 8 days
• Metrics:
– F1, Precision, Recall: average over all machine entities
– Model training time
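The dataset figures and the averaged metrics can be checked with a short sketch. The point counts follow directly from the 30 s sampling interval; the per-machine confusion counts are made-up toy values, and the slides do not say how detections are matched to labels, so plain counts of true positives, false positives, and false negatives are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
points_per_day = SECONDS_PER_DAY // 30      # 30 s sampling interval
total_points = 13 * points_per_day          # 13 days of data
train_points = 5 * points_per_day           # first 5 days
test_points = 8 * points_per_day            # last 8 days


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one machine entity."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_over_machines(counts_per_machine):
    """Average each metric across machine entities (each machine weighs the same)."""
    scores = [precision_recall_f1(*counts) for counts in counts_per_machine]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Toy (tp, fp, fn) counts for two machines; not results from the deck.
avg_precision, avg_recall, avg_f1 = average_over_machines([(8, 2, 2), (5, 0, 5)])
print(total_points, train_points, test_points)
print(round(avg_precision, 3), round(avg_recall, 3), round(avg_f1, 3))
```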
Overall performance
• Scalability
– Pre-training: fixed (5493s)
– Feature extraction: 0.3s / machine
– Clustering: much smaller
– Fine-tuning: 448s / model
• Effectiveness
– F1: 0.830 -> 0.892
Figure: The execution time of each step under different numbers of machine entities
Figure: F1, Precision, and Recall scores of CTF without and with alerting
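The per-step timings suggest a simple cost model: a fixed pre-training cost, a per-machine feature extraction cost, and a per-cluster fine-tuning cost. The sketch below uses the three numbers reported on the slide. The cluster count of 5 is a hypothetical value (the deck does not state it), clustering time is left as a parameter because the slide only calls it much smaller, and fine-tuning is assumed to run sequentially.

```python
PRETRAIN_S = 5493.0                      # fixed, from the slide
FEATURE_EXTRACTION_S_PER_MACHINE = 0.3   # from the slide
FINE_TUNE_S_PER_MODEL = 448.0            # from the slide


def ctf_training_seconds(num_machines, num_clusters, clustering_s=0.0):
    """Estimated total training time for M machines grouped into K clusters."""
    return (
        PRETRAIN_S
        + FEATURE_EXTRACTION_S_PER_MACHINE * num_machines
        + clustering_s
        + FINE_TUNE_S_PER_MODEL * num_clusters
    )


# 533 machines as in the dataset; K = 5 is an assumed, illustrative cluster count.
total_s = ctf_training_seconds(num_machines=533, num_clusters=5)
print(f"{total_s:.1f} s (about {total_s / 3600:.1f} hours)")
```

Under this model the only term that grows with the number of machines is feature extraction at 0.3 s each, as long as the number of clusters stays small.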
Overall performance
• Validating the Synthetic Framework
– One model/machine
– One model for all
– CTF w/o transfer
Figure: Comparison with model variations
Figure: F1 and training time under different numbers of epochs for CTF w/o transfer
Validating Design Choices
• Choice of Clustering Objects
– SPF, ROCKA, DCN
• Choice of Distance Measures
– KL divergence, JS divergence, mean squared error
• Choice of Clustering Algorithms
– DBSCAN, K-medoids
Conclusion
• CTF: synthetic framework for high-dimensional time series (machine, KPI, time)
• Techniques: 𝒛𝒕 distribution clustering, model reuse, fine-tuning
• Evaluation: CTF scalability and effectiveness
• Labeling tool + labeled dataset
Thank you!
Q & A
sunm19@mails.tsinghua.edu.cn
