CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Model Transfer
Ming Sun, Ya Su, Shenglin Zhang, Yuanpu Cao, Yuqing Liu, Dan Pei,
Wenfei Wu, Yongsu Zhang, Xiaozhou Liu, Junliang Tang
INFOCOM 2021
Outline
Background Design Evaluation Conclusion
DL Algorithms in Infrastructure Operations
• Advantages
– Automation
– Robustness
– Saves operators' labor
• Example:
– RNN-VAE for anomaly detection
RNN-VAE Based Algorithms
Network architecture of RNN-VAE models at time t:
𝒙𝒕 (49) -> 𝒛𝒕 (3) -> reconstructed 𝒙𝒕 (49)
[Figure labels: RNN, dense layers, 𝒙𝒕, 𝒛𝒕, reconstructed 𝒙𝒕]
Variational Auto-Encoder (VAE)
KPI dimension reduced
Network Layers
• RNN: Shallow & general
• Dense layers: Deep & specific
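To make the dimension flow above concrete, here is a minimal sketch of one VAE-style encode, sample, decode step at a single time t. It is not the authors' model: plain random linear maps stand in for the RNN and dense layers, there is no recurrence or training, and the names `encode_sample_decode`, `enc_mu`, `enc_logvar`, and `dec` are hypothetical. Only the sizes 49 and 3 come from the slide.

```python
import math
import random

KPI_DIM = 49  # size of x_t on the slide
Z_DIM = 3     # size of z_t on the slide


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode_sample_decode(x_t, enc_mu, enc_logvar, dec, rng):
    """One VAE-style step: x_t -> (mu, logvar) -> z_t -> reconstruction."""
    mu = linear(enc_mu, x_t)
    logvar = linear(enc_logvar, x_t)
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    z_t = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
           for m, lv in zip(mu, logvar)]
    x_rec = linear(dec, z_t)
    return z_t, x_rec


rng = random.Random(0)
enc_mu = random_matrix(Z_DIM, KPI_DIM, rng)      # stand-in for the encoder (mean)
enc_logvar = random_matrix(Z_DIM, KPI_DIM, rng)  # stand-in for the encoder (log-variance)
dec = random_matrix(KPI_DIM, Z_DIM, rng)         # stand-in for the decoder

x_t = [rng.random() for _ in range(KPI_DIM)]
z_t, x_rec = encode_sample_decode(x_t, enc_mu, enc_logvar, dec, rng)
print(len(x_t), len(z_t), len(x_rec))  # 49 3 49
```

The point is only the shape of the bottleneck: 49 KPI values are squeezed into 3 latent values and expanded back to 49.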
Scalability is the problem at large scale
• High-Dimensional Data
– Machines: in millions
– KPI: in tens
– Time: frequent data query (2880 samples/day)
• One model per machine: training time is the bottleneck (10X minutes * 1X million machines)
• One model for all: accuracy suffers
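A rough worked version of the one-model-per-machine cost above. The slide gives only orders of magnitude (10X minutes, 1X million machines), so the concrete values of 10 minutes and 1,000,000 machines below are assumptions chosen for illustration, as is the idea that the models are trained one after another on a single worker.

```python
MINUTES_PER_DAY = 24 * 60


def sequential_training_days(minutes_per_model, num_machines):
    """Days needed to train one model per machine, one after another."""
    return minutes_per_model * num_machines / MINUTES_PER_DAY


# Assumed values: 10 minutes per model, 1,000,000 machines.
days = sequential_training_days(10, 1_000_000)
print(f"{days:.0f} days, about {days / 365:.0f} years")  # 6944 days, about 19 years
```

Parallel workers divide this figure, but the total compute stays proportional to the number of machines.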
Goal: devise scalable deep learning (DL) algorithms for large-scale anomaly detection
Intuition and Challenges
• Intuition: cluster machines first, then run DL for each cluster
• Challenge 1: dependency between clustering and model training
– Clustering cannot run on high-dimensional data
– DL cannot run on the whole dataset without clustering
– Solution: synthetic framework (coarse-grained model -> clustering -> fine-grained models)
• Challenge 2: high dimension of the time domain
– Hard to cluster even when the KPI dimension is compressed
– Solution: compress the sequence to a z-distribution
• Challenge 3: neural network training method
– Solution: fine-tuning strategy
– Freeze the RNN and tune the dense layers
[Figure labels: RNN, dense layers, 𝒙𝒕, 𝒛𝒕, reconstructed 𝒙𝒕]
Framework of model training
[Figure: framework of model training. Recoverable labels: (M), (K << M)]
• Sampling strategy:
– Machine sampling
– Time sampling
• Feature extraction: 𝒙𝒕 sequence -> 𝒛𝒕 sequence -> 𝒛𝒕 distribution
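The step from a 𝒛𝒕 sequence to a 𝒛𝒕 distribution can be sketched as follows. The slides do not say how the distribution is represented, so this example assumes a normalized histogram per latent dimension; the function name `z_distribution`, the bin count, and the value range are all hypothetical.

```python
def z_distribution(z_sequence, num_bins=10, low=-3.0, high=3.0):
    """Per-dimension normalized histogram of a z_t sequence (assumed form)."""
    dims = len(z_sequence[0])
    width = (high - low) / num_bins
    hist = [[0.0] * num_bins for _ in range(dims)]
    for z_t in z_sequence:
        for d, value in enumerate(z_t):
            idx = int((value - low) / width)
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(idx, 0), num_bins - 1)
            hist[d][idx] += 1.0
    n = len(z_sequence)
    return [[count / n for count in row] for row in hist]


# Three time steps of a 3-dimensional z_t (made-up values).
z_seq = [[0.1, -0.2, 2.5], [0.3, -0.1, 2.9], [-0.4, 0.0, 3.5]]
dist = z_distribution(z_seq, num_bins=6)
print(len(dist), len(dist[0]))  # 3 6
```

Whatever the sequence length, the result has a fixed size (latent dimensions x bins), which is what makes machines comparable in the clustering step.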
• Clustering: 𝒛𝒕 distribution -> distance matrix (Wasserstein distance) -> clustering results (hierarchical agglomerative clustering, HAC)
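A minimal sketch of the distance-matrix and clustering steps, under stated assumptions: each machine is represented by a 1-D sample of latent values (the real 𝒛𝒕 has 3 dimensions), all samples have the same size, and HAC uses average linkage. The slides name the Wasserstein distance and HAC but give neither the linkage nor the multi-dimensional treatment, and the helper names here are hypothetical.

```python
def wasserstein_1d(samples_a, samples_b):
    """W1 distance between two equal-size 1-D empirical samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


def distance_matrix(items):
    """Symmetric matrix of pairwise Wasserstein distances."""
    n = len(items)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = wasserstein_1d(items[i], items[j])
    return dist


def hac(dist, num_clusters):
    """Average-linkage agglomerative clustering on a distance matrix."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Four machines with made-up latent samples: two near 0, two near 5.
machines = [[0.0, 0.1, 0.2], [0.05, 0.15, 0.25],
            [5.0, 5.1, 5.2], [5.05, 5.1, 5.3]]
print(hac(distance_matrix(machines), num_clusters=2))  # [[0, 1], [2, 3]]
```

This pure-Python version is quadratic or worse in the number of machines and is only meant to show the data flow: distributions in, distance matrix in the middle, cluster memberships out.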
[Figure labels: RNN, dense layers, 𝒙𝒕, 𝒛𝒕, reconstructed 𝒙𝒕]
• Fine-tuning strategy:
– RNN: fixed
– Dense layers: tuned
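The fine-tuning strategy above amounts to copying the coarse-grained model for each cluster and updating only the dense-layer parameters. The sketch below illustrates that with plain lists and a single SGD step; the parameter layout, the gradient values, and the name `fine_tune_step` are hypothetical and not taken from the authors' code.

```python
import copy


def fine_tune_step(params, grads, lr, trainable=("dense",)):
    """Apply one SGD step to trainable groups; leave the rest frozen."""
    updated = copy.deepcopy(params)
    for group in trainable:
        updated[group] = [w - lr * g
                          for w, g in zip(params[group], grads[group])]
    return updated


# Coarse-grained model shared by all clusters (made-up weights).
coarse = {"rnn": [0.5, -0.2], "dense": [1.0, 2.0]}

# Made-up gradients computed on each cluster's own data.
cluster_grads = {
    "cluster_0": {"rnn": [0.1, 0.1], "dense": [0.5, -0.5]},
    "cluster_1": {"rnn": [0.3, 0.2], "dense": [-1.0, 1.0]},
}

fine_models = {name: fine_tune_step(coarse, grads, lr=0.1)
               for name, grads in cluster_grads.items()}

for name, model in fine_models.items():
    print(name, model["rnn"] == coarse["rnn"], model["dense"])
```

Every fine-grained model keeps the same RNN weights as the coarse-grained model, and only the dense weights diverge between clusters.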
System architecture
[Figure: system architecture. Components: Monitored machine entities, Data API, Offline Data, Online Data, Data Preprocessing (Sec. IV-A), Offline Model Training (Sec. IV-B), Model, Online Anomaly Detection (Sec. IV-C), Score, Outlier Alerting (Sec. V-D), Results & Visualization]
1. Data preprocessing
2. Offline model training
3. Online anomaly detection
Labeling tools
The interface of the labeling tool
Dataset & performance metrics
• Dataset:
– Number of machine entities: 533
– Dimension of each machine entity: 49 KPIs x 37440 time points (sampling interval: 30 s, 13 days)
– Training = first 5 days, testing = last 8 days
• Metrics:
– F1, Precision, Recall: averaged over all machine entities
– Model training time
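The dataset figures above are consistent with each other, which the following arithmetic checks. All constants come from the slide; the variable names are just for illustration.

```python
SAMPLE_INTERVAL_S = 30
DAYS_TOTAL, DAYS_TRAIN = 13, 5
NUM_MACHINES, NUM_KPIS = 533, 49

points_per_day = 24 * 60 * 60 // SAMPLE_INTERVAL_S
points_total = points_per_day * DAYS_TOTAL
points_train = points_per_day * DAYS_TRAIN
points_test = points_total - points_train
values_total = NUM_MACHINES * NUM_KPIS * points_total

print(points_per_day)  # 2880
print(points_total)    # 37440
print(points_train)    # 14400
print(points_test)     # 23040
print(values_total)    # 977820480
```

So each machine contributes 14400 training points and 23040 testing points per KPI, and the whole dataset holds close to a billion individual values.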
Overall performance
• Scalability
– Pre-training: fixed (5493 s)
– Feature extraction: 0.3 s / machine
– Clustering: much smaller
– Fine-tuning: 448 s / model
• Effectiveness
– F1: 0.830 -> 0.892
The execution time of each step under different numbers of machine entities
F1, Precision, and Recall scores of CTF without and with alerting
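The per-step timings above suggest a simple cost model for total training time. The sketch below assumes the steps run one after another and that fine-tuned models are trained sequentially; the clustering time is left as a parameter because the slide only calls it "much smaller", and the cluster count of 5 in the example is a made-up value, not a reported result.

```python
PRETRAIN_S = 5493.0            # fixed, from the slide
FEATURE_S_PER_MACHINE = 0.3    # from the slide
FINE_TUNE_S_PER_MODEL = 448.0  # from the slide


def ctf_training_seconds(num_machines, num_clusters, clustering_s=0.0):
    """Estimated end-to-end training time under the stated assumptions."""
    return (PRETRAIN_S
            + FEATURE_S_PER_MACHINE * num_machines
            + clustering_s
            + FINE_TUNE_S_PER_MODEL * num_clusters)


# 533 machines as in the dataset; 5 clusters is an assumed value.
total = ctf_training_seconds(533, 5)
print(f"{total:.1f} s, about {total / 3600:.1f} hours")
```

Under this model the cost grows with 0.3 s per additional machine plus 448 s per additional cluster, rather than with a full training run per machine.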
Overall performance
• Validating the Synthetic Framework
– One model/machine
– One model for all
– CTF w/o transfer
Comparison with model variations
F1 and training time under different numbers of epochs for CTF w/o transfer
Validating Design Choices
• Choice of Clustering Objects
– SPF, ROCKA, DCN
• Choice of Distance Measures
– KL divergence, JS divergence, mean squared error
• Choice of Clustering Algorithms
– DBSCAN, K-medoids
Conclusion
• CTF: synthetic framework, high-dimensional time series (machine, KPI, time)
• Techniques: 𝒛𝒕 distribution clustering, model reuse, fine-tuning
• Evaluation: CTF scalability and effectiveness
• Labeling tool + labeled dataset
Thank you!
Q & A
sunm19@mails.tsinghua.edu.cn
