Improving Resource Availability in
Data Centers using Deep Learning
(Improving Resource Utilization Efficiency in Data Centers Using Deep Learning)
Kundjanasith Thonglek
Software Design & Analysis Laboratory
Outline
➢ Introduction
➢ Improving availability of computing resources
○ Methodology
○ Evaluation
➢ Improving availability of storage resources
○ Methodology
○ Evaluation
➢ Conclusion
Data Centers
Data centers are centralized facilities where computing and storage
hardware are aggregated to handle large amounts of data and computation.
Technical challenges
➢ System monitoring
➢ Energy management
➢ Continuous migration
➢ Availability improvement
Objective
I aim to improve the availability of computing and storage resources in
data centers by applying deep learning.
Resource utilization is paramount to many cloud providers as they need
to utilize their hardware resources efficiently to maximize profit.
Storage Resources
❖ Hard Disk
Computing Resources
❖ CPU, Memory
Users excessively request computing resources
➢ Users tend to request more computing resources than their applications actually need
○ Computing resources left unused by applications are wasted
○ Overall computing resource utilization in the data centers degrades
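The over-request problem above can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the function `wasted_fraction`, the field names and the two example jobs are hypothetical and are not taken from the trace.

```python
def wasted_fraction(requested, used):
    """Fraction of the requested resource that was never used."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    # Usage above the request is not counted as negative waste.
    return max(requested - used, 0.0) / requested

# Hypothetical jobs: resources requested by the user vs. actually used.
jobs = [
    {"cpu_req": 4.0, "cpu_used": 1.0, "mem_req": 8.0, "mem_used": 6.0},
    {"cpu_req": 4.0, "cpu_used": 3.0, "mem_req": 8.0, "mem_used": 6.0},
]

cpu_waste = wasted_fraction(sum(j["cpu_req"] for j in jobs),
                            sum(j["cpu_used"] for j in jobs))
mem_waste = wasted_fraction(sum(j["mem_req"] for j in jobs),
                            sum(j["mem_used"] for j in jobs))
print(f"CPU wasted: {cpu_waste:.0%}, memory wasted: {mem_waste:.0%}")
```

Aggregating before dividing weights large requests more heavily than small ones, which is usually what matters for cluster-level utilization.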
wasted resource
Overview of Proposed Method
➢ Analyzing Cluster Usage: Analyze Google’s cluster usage trace obtained from a production data center
➢ Designing Neural Network: Design an LSTM-based model to predict better resource allocation from historical data of resource usage and allocation
➢ Training LSTM Model: Train our model using Google’s cluster usage trace
➢ Evaluation: Evaluate improvement of resource utilization using Google’s cluster scheduler simulator
Analyzing Cluster Usage
Google’s cluster usage trace is real workload data in Google’s data center
Computing Resource | Requested Resource | Used Resource
CPU                | Requested CPU      | Used CPU
Memory             | Requested memory   | Used memory
Long Short-Term Memory
Recurrent Neural Network (RNN)
➢ Deep learning model for time-series forecasting
➢ Model size not increasing with size of input
➢ Weights are shared across time
Long Short-Term Memory (LSTM) introduces long-term memory into RNN
➢ LSTM mitigates the vanishing gradient problem, where the neural
network stops learning because the updates to the weights
within the network become smaller and smaller
➢ Memory cells replace the hidden neurons used in traditional RNNs to
build a hidden layer
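For reference, the memory cell can be written in the standard textbook form below. This formulation is not taken from the slides; the notation is assumed here: x_t is the input at step t, h_t the hidden state, c_t the memory cell, and W, U, b the learned parameters of each gate.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Because c_t is updated additively, the direct path from c_{t-1} to c_t is scaled only by the forget gate f_t, so gradients can survive over many time steps as long as f_t stays close to 1.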
Proposed neural network
Input: The requested and used CPU and memory resources
1st LSTM: Finds the correlation between CPU and memory
2nd LSTM: Finds the correlation between allocated and used resources
Fully Connected: Connects each neuron to the neurons of the next layer
Output: The efficient CPU and memory allocation
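The layer stack above can be sketched as a forward pass in plain Python. This is an illustration under loud assumptions, not the author's implementation: the hidden width of 8, the sigmoid on the output (so that allocations read as fractions between 0 and 1), the random untrained weights, and the names `LSTMLayer` and `predict_allocation` are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMLayer:
    """Minimal LSTM layer, forward pass only, with random (untrained) weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden

        def rows():
            # One row per hidden unit over [x_t, h_{t-1}, 1]; the last weight is the bias.
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden + 1)]
                    for _ in range(n_hidden)]

        self.w_f, self.w_i, self.w_o, self.w_c = rows(), rows(), rows(), rows()

    @staticmethod
    def _affine(weights, z):
        return [sum(w * v for w, v in zip(row, z)) for row in weights]

    def forward(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            z = list(x) + h + [1.0]
            f = [sigmoid(a) for a in self._affine(self.w_f, z)]    # forget gate
            i = [sigmoid(a) for a in self._affine(self.w_i, z)]    # input gate
            o = [sigmoid(a) for a in self._affine(self.w_o, z)]    # output gate
            g = [math.tanh(a) for a in self._affine(self.w_c, z)]  # candidate cell
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs  # one hidden vector per time step

def predict_allocation(sequence, seed=0):
    """Input -> 1st LSTM -> 2nd LSTM -> fully connected -> (CPU, memory)."""
    rng = random.Random(seed)
    lstm1 = LSTMLayer(4, 8, rng)   # 4 inputs: requested/used CPU and memory
    lstm2 = LSTMLayer(8, 8, rng)
    dense = [[rng.uniform(-0.1, 0.1) for _ in range(8 + 1)] for _ in range(2)]
    last_hidden = lstm2.forward(lstm1.forward(sequence))[-1]
    z = last_hidden + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, z))) for row in dense]

# Each step: [requested CPU, requested memory, used CPU, used memory] as fractions.
history = [[0.50, 0.40, 0.20, 0.10]] * 4
cpu_alloc, mem_alloc = predict_allocation(history)
print(cpu_alloc, mem_alloc)
```

The weights are untrained, so the printed values carry no meaning; the sketch only shows how data flows through the two stacked LSTM layers and the fully connected layer.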
Training LSTM Model
Improve resource utilization by
implementing a Long Short-Term Memory
model that uses requested CPU, requested
memory, used CPU and used memory.
[Figure: Allocated Resource (Memory %, CPU %), Used Resource (Memory %, CPU %), MODEL]
Memory cell size
➔ 20 minutes
➔ 40 minutes
➔ 60 minutes
The memory cell size of the
Long Short-Term Memory model is
the span over which it memorizes the
input-output pair of values at each
step of a sequence.
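One possible reading of the 20, 40 and 60 minute settings is as the length of history the model sees before each prediction. The sketch below builds such training pairs; the 5-minute sampling interval, the function name `make_windows` and the synthetic series are assumptions made for illustration and are not stated in the slides.

```python
def make_windows(series, window_minutes, sample_minutes=5):
    """Split a usage series into (history, next_value) training pairs."""
    steps = window_minutes // sample_minutes  # time steps covered by one window
    if steps < 1:
        raise ValueError("window must cover at least one sample")
    return [(series[i:i + steps], series[i + steps])
            for i in range(len(series) - steps)]

# 20 synthetic CPU usage samples, one per assumed 5-minute interval.
cpu_usage = [round(0.30 + 0.01 * t, 2) for t in range(20)]

# Number of training pairs produced for each memory cell size.
window_counts = {m: len(make_windows(cpu_usage, m)) for m in (20, 40, 60)}
print(window_counts)
```

Longer windows give the model more context per prediction but yield fewer training pairs from the same trace.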
Usage Simulation
Simulate resource utilization in the
data center by applying the resource
allocation predicted by our time-series
model to the actual computing resources.
➢ Google’s cluster usage data (513,000 jobs) is split into a training dataset (80%) and a testing dataset (20%)
➢ The [LSTM/RNN] model outputs the predicted allocated resource: CPU (%) and memory (%)
➢ The predicted resource allocation, CPU (%) and memory (%), is passed to Google’s simulation
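The 80%/20% split of the 513,000 jobs can be sketched as follows. Whether the jobs were shuffled or split in trace order is not stated in the slides; this sketch assumes an in-order split, and `split_jobs` and the integer job IDs are hypothetical stand-ins for the real records.

```python
def split_jobs(jobs, train_percent=80):
    """Split jobs in order; integer arithmetic avoids float rounding at the cut."""
    cut = len(jobs) * train_percent // 100
    return jobs[:cut], jobs[cut:]

job_ids = list(range(513_000))  # placeholder for the trace's job records
train_jobs, test_jobs = split_jobs(job_ids)
print(len(train_jobs), len(test_jobs))
```

The testing share works out to 102,600 jobs, which matches the job count quoted for inference time in the evaluation.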
Decreased Computing Resource Wastage
[Figure: two panels, CPU and Memory. Data labels as extracted, in order: -3%, -6%, -4%, -8%, -7%, -11%, -12%, -27%, -23%, -34%, -35%, -48%]
Training time & Inference time
[Figure: two panels, Training time (*for 100 epochs) and Inference time (*for 102,600 jobs). Data labels as extracted, in order: 408.93, 35.67, 130.82, 49.77, 35.13, 28.78]
ML models are becoming larger
ML model compression improves the storage usage efficiency by reducing
the size of ML models, and increases the availability of storage resources.
Model Name | Model Size | Application
GPT-3      | 700 GB     | Language Processing
VGG-16     | 528 MB     | Image Classification
Mask RCNN  | 256 MB     | Object Detection
Normally, ML model compression reduces the model size, but it also
decreases the accuracy.
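The sizes listed for GPT-3 and VGG-16 are consistent with storing every parameter as a 32-bit float. The check below is a sketch under assumptions: the parameter counts (175 billion for GPT-3, 138,357,544 for VGG-16) are commonly cited figures, not values from the slides, and the 528 MB figure is assumed to be in binary megabytes.

```python
BYTES_PER_FP32 = 4  # assumption: parameters stored as 32-bit floats

def model_size_bytes(num_parameters, bytes_per_parameter=BYTES_PER_FP32):
    return num_parameters * bytes_per_parameter

gpt3_gb = model_size_bytes(175_000_000_000) / 10**9   # decimal gigabytes
vgg16_mib = model_size_bytes(138_357_544) / 2**20     # binary megabytes

print(f"GPT-3: {gpt3_gb:.0f} GB, VGG-16: {vgg16_mib:.0f} MB")
```

This is why reducing the number of bits per parameter, as quantization does, translates directly into storage savings.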
Compressing models while maintaining accuracy
Original Model ➔ Quantization ➔ Quantized Model ➔ Retraining ➔ Compressed Model
➢ Quantization: Decrease model size with loss of accuracy
➢ Retraining: Increase model accuracy while keeping the model size
Vector Quantization - lossy data compression
➢ Calculate clusters
➢ Calculate centroids
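The two steps, calculating clusters and calculating centroids, can be sketched with a one-dimensional k-means over a layer's weights. This is an illustration rather than the author's code: the evenly spaced initial centroids, the function names and the six example weights are assumptions.

```python
def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))

def kmeans_1d(values, k, iters=20):
    lo, hi = min(values), max(values)
    if k == 1:
        centroids = [sum(values) / len(values)]
    else:
        # Assumed initialisation: centroids evenly spaced over the value range.
        centroids = [lo + j * (hi - lo) / (k - 1) for j in range(k)]
    for _ in range(iters):
        # Step 1: calculate clusters (assign each value to its nearest centroid).
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Step 2: calculate centroids (mean of each non-empty cluster).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

weights = [-0.51, -0.49, 0.0, 0.02, 0.48, 0.52]   # hypothetical layer weights
codebook = kmeans_1d(weights, k=3)
indices = [nearest(w, codebook) for w in weights]  # what gets stored per weight
restored = [codebook[i] for i in indices]          # lossy reconstruction
print(codebook, indices)
```

Only the small codebook and one index per weight need to be stored; the reconstruction differs slightly from the original weights, which is where the accuracy loss comes from.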
Retraining using unlabeled data
Most existing retraining methods require a labeled dataset.
Using an unlabeled dataset for retraining is highly useful when the labeled
dataset is unavailable.
[Figure: a Labeled Dataset (DATA + LABEL) can be out of the Researcher/Developer's reach because of privacy policy or license limitation; an Unlabeled Dataset (DATA only) remains usable.]
Proposed Retraining Method
[Figure: the unlabeled dataset is fed to both the quantized model and the original model, each producing an output vector; the loss is computed between the two output vectors. Layers are marked as non-trainable or trainable.]
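The idea of using the original model's output vector as the training target can be sketched on a toy model. Everything concrete here is an assumption for illustration: the two linear layers, the codebook, mean squared error as the loss, and plain SGD. The slides do not state which loss or optimizer was used.

```python
import random

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Original model: two linear layers standing in for a real network.
W1 = [[0.52, -0.31], [0.18, 0.77]]
W2 = [[0.9, -0.4]]

# Quantized model: first layer snapped to a coarse codebook and frozen
# (non-trainable); the second layer stays trainable.
codebook = [-0.5, 0.0, 0.5, 1.0]
W1_q = [[min(codebook, key=lambda c: abs(c - w)) for w in row] for row in W1]
W2_t = [row[:] for row in W2]

def original(x):
    return matvec(W2, matvec(W1, x))

def quantized(x):
    return matvec(W2_t, matvec(W1_q, x))

# Unlabeled inputs: no ground-truth labels are needed anywhere below.
rng = random.Random(0)
unlabeled = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

def mean_loss():
    return sum(mse(quantized(x), original(x)) for x in unlabeled) / len(unlabeled)

loss_before = mean_loss()
lr = 0.1
for _ in range(50):
    for x in unlabeled:
        h = matvec(W1_q, x)                      # frozen quantized layer
        err = quantized(x)[0] - original(x)[0]   # gap to the original's output
        for j in range(len(h)):
            W2_t[0][j] -= lr * 2 * err * h[j]    # gradient of err**2 w.r.t. W2_t
loss_after = mean_loss()
print(loss_before, loss_after)
```

In this toy case the trainable layer can fully compensate for the quantization error; in a real network the recovery is only partial, which is why the accuracy after retraining has to be measured.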
Case Study
VGG-16 ResNet-50
Case Study of VGG-16
[Figure panels: Model Architecture, Bias Value, Weight Value]
Model Quantization
[Figures: Size of Quantized VGG-16 models; Accuracy of Quantized VGG-16 models. Axis: # of quantized layers]
Model Retraining
Retraining Quantized VGG-16 models
Quantizing the 14th and 15th layers using 32-256 centroids
achieved nearly the accuracy of the original model.
The best configuration for quantizing the VGG-16 model:
- Quantize the biases in all layers using 1 centroid
- Quantize the weights in the 14th and 15th layers using 32 centroids
This configuration compresses the model to the smallest possible size without
significant accuracy loss.
[Figure axis: # of centroids]
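The effect of the number of centroids on storage can be estimated with simple arithmetic. This sketch assumes that original weights are 32-bit floats, that each quantized weight is stored as a bit-packed index into the codebook, and that the layer has one million weights; none of these details come from the slides, and `quantized_bits` is a hypothetical helper.

```python
def quantized_bits(num_weights, num_centroids, float_bits=32):
    """Bits for bit-packed indices plus the codebook of float centroids."""
    index_bits = (num_centroids - 1).bit_length()  # 32 -> 5, 256 -> 8, 1 -> 0
    return num_weights * index_bits + num_centroids * float_bits

n = 1_000_000            # hypothetical number of weights in one layer
original_bits = n * 32

ratio_32 = original_bits / quantized_bits(n, 32)
ratio_256 = original_bits / quantized_bits(n, 256)
print(f"32 centroids: {ratio_32:.1f}x smaller, 256 centroids: {ratio_256:.1f}x smaller")
```

With a single centroid the index costs zero bits, so only the codebook entry remains, which is consistent with quantizing the biases using 1 centroid.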
Case Study of ResNet-50
[Figure panels: Model Architecture, Bias Value, Weight Value]
Model Quantization
[Figures: Size of Quantized ResNet-50 models; Accuracy of Quantized ResNet-50 models. Axis: # of quantized layers]
Model Retraining
Retraining Quantized ResNet-50 models
Quantizing the 13th to 49th layers using 128 or fewer centroids
clearly degrades the accuracy of the model.
The best configuration for quantizing the ResNet-50 model:
- Quantize the biases in all layers using 1 centroid
- Quantize the weights in the 13th to 49th layers using 256 centroids
This configuration compresses the model to the smallest possible size without
significant accuracy loss.
[Figure axis: # of centroids]
Conventional & Proposed Retraining
[Figure panels: Accuracy of quantized model through retraining; Retraining time of quantized model. Annotated values: 85%, 82%]
*The conventional retraining method retrains all layers in the model
Conclusion
➢ Improving availability of computing resources
○ We proposed a method that predicts efficient computing resource allocations with an
LSTM-based prediction model to improve computing resource availability
○ The proposed method improves the computing resource availability of CPU and
memory by 11% and 48%, respectively
➢ Improving availability of storage resources
○ We proposed a method that reduces the size of neural network models without
significant accuracy loss to improve storage resource availability
○ The proposed method improves the storage resource availability for VGG-16 and
ResNet-50 by 81% and 52%, respectively
Future Work
➢ Improving availability of computing resources
○ The features that significantly affect computing resource availability should be
investigated to make the method more efficient
○ We would like to apply other time-series forecasting techniques to improve the
availability of computing resources
➢ Improving availability of storage resources
○ The structure of other neural network models should be investigated to design an efficient
retraining method
○ We would like to apply compression techniques other than quantization to
reduce the size of neural network models
Publications
➢ Improving availability of computing resources
○ Kundjanasith Thonglek, Kohei Ichikawa, Keichi Takahashi, Chawanat Nakasan, and Hajimu
Iida, “Improving Resource Utilization in Data Centers using an LSTM-based Prediction
Model”, Proceedings of Workshop on Monitoring and Analysis for High Performance
Computing Systems Plus Applications (HPCMASPA 2019), September 2019.
➢ Improving availability of storage resources
○ Kundjanasith Thonglek, Keichi Takahashi, Kohei Ichikawa, Chawanat Nakasan, Hidemoto
Nakada, Ryousei Takano, and Hajimu Iida, “Retraining Quantized Neural Network Models
with Unlabeled Data”, Proceedings of International Joint Conference on Neural Networks
(IJCNN 2020), July 2020.
Q&A
Thank you
Email: thonglek.kundjanasith.ti7@is.naist.jp
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

Improving Resource Availability in Data Center using Deep Learning.pdf

  • 1. Improving Resource Availability in Data Centers using Deep Learning (Japanese title: "Improving Resource Utilization Efficiency in Data Centers Using Deep Learning"), Kundjanasith Thonglek, Software Design & Analysis Laboratory
  • 2. Outline ➢ Introduction ➢ Improving availability of computing resources ○ Methodology ○ Evaluation ➢ Improving availability of storage resources ○ Methodology ○ Evaluation ➢ Conclusion
  • 4. Data Centers: Data centers are centralized facilities where computing and storage hardware are aggregated to handle large amounts of data and computation. Technical challenges ➢ System monitoring ➢ Energy management ➢ Continuous migration ➢ Availability improvement
  • 5. Objective: I aim to improve the availability of computing and storage resources in data centers by applying deep learning. Resource utilization is paramount to many cloud providers as they need to utilize their hardware resources efficiently to maximize profit. ➢ Computing Resources ❖ CPU, Memory ➢ Storage Resources ❖ Hard Disk
  • 7. Users excessively request computing resources ➢ Users tend to request more computing resources than their applications actually need ○ Computing resources left unused by applications are wasted ○ Overall computing resource utilization in the data center degrades [Figure label: wasted resource]
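
The notion of a wasted resource on slide 7 can be made concrete with a small calculation. The sketch below is illustrative only: the function name `wasted_fraction` and the example job are hypothetical and do not come from the slides. It treats waste as the share of a request that the application never uses.

```python
def wasted_fraction(requested, used):
    """Fraction of the requested resource that the application left unused."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    # Usage above the request is treated as zero waste, not negative waste.
    return max(requested - used, 0.0) / requested


# Hypothetical job: requests 8 CPU cores but uses 3 on average.
print(wasted_fraction(8.0, 3.0))
```

Summing this quantity over all jobs gives one possible cluster-wide measure of how much allocated capacity sits idle.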
  • 8. Overview of Proposed Method ➢ Analyzing Cluster Usage: Analyze Google’s cluster usage trace obtained from a production data center ➢ Designing Neural Network: Design an LSTM-based model to predict better resource allocation from historical data of resource usage and allocation ➢ Training LSTM Model: Train our model using Google’s cluster usage trace ➢ Evaluation: Evaluate improvement of resource utilization using Google’s cluster scheduler simulator
  • 9. Analyzing Cluster Usage: Google’s cluster usage trace is real workload data from Google’s data center.
      Computing Resource | Requested Resource | Used Resource
      CPU                | Requested CPU      | Used CPU
      Memory             | Requested memory   | Used memory
  • 10. Long Short-Term Memory. Recurrent Neural Network (RNN) ➢ Deep learning model for time-series forecasting ➢ Model size does not increase with the size of the input ➢ Weights are shared across time. Long Short-Term Memory (LSTM) introduces long-term memory into the RNN ➢ LSTM mitigates the vanishing gradient problem, where the neural network stops learning because the updates to its weights become smaller and smaller ➢ The memory cell replaces the hidden neurons used in traditional RNNs to build a hidden layer
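
The vanishing gradient problem mentioned on slide 10 can be seen numerically. The sketch below is not from the slides; it assumes a scalar tanh RNN, h_t = tanh(w * h_{t-1} + x_t), with a hypothetical weight w = 0.5 and constant inputs, and tracks how strongly the state at step t still depends on the initial state.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| after each step of h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    magnitudes = []
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes


# Hypothetical weight and inputs, chosen only for illustration.
mags = gradient_through_time(w=0.5, inputs=[0.1] * 50)
print(mags[0], mags[9], mags[49])
```

Each step multiplies the gradient by a factor below 0.5 in this setting, so the influence of early inputs shrinks geometrically. This is the effect that the LSTM memory cell is designed to counteract.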
  • 11. Proposed neural network ➢ Input: The requested and used CPU and memory resources ➢ 1st LSTM: Finds the correlation between CPU and memory ➢ 2nd LSTM: Finds the correlation between allocated and used resources ➢ Fully Connected: Connects each neuron to every neuron in the adjacent layer ➢ Output: The efficient CPU and memory allocation
  • 12. Training LSTM Model: Improve resource utilization by implementing a Long Short-Term Memory model that uses requested CPU, requested memory, used CPU and used memory. [Diagram: allocated resource (memory %, CPU %) and used resource (memory %, CPU %) connected to the model] Memory cell size ➔ 20 minutes ➔ 40 minutes ➔ 60 minutes. The memory cell size of the Long Short-Term Memory model is the span over which it memorizes the input-output pair of each step in a sequence.
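
The memory cell sizes on slide 12 are given in minutes, which suggests a window of past samples. The sketch below shows one generic way to turn a usage series into fixed-length windows for next-step forecasting. It is not necessarily how the slides' model was trained: the 5-minute sampling interval, the function name `make_windows`, and the sample values are all assumptions made for illustration.

```python
def make_windows(series, window_minutes, sample_minutes=5):
    """Split a time series into (window, next value) training pairs."""
    steps = window_minutes // sample_minutes
    pairs = []
    for start in range(len(series) - steps):
        window = series[start:start + steps]
        target = series[start + steps]
        pairs.append((window, target))
    return pairs


# Hypothetical used-CPU percentages, assumed to be sampled every 5 minutes.
used_cpu = [12, 15, 14, 18, 21, 19, 17, 22, 25, 24, 20, 18, 16]

for minutes in (20, 40, 60):
    pairs = make_windows(used_cpu, minutes)
    print(minutes, len(pairs[0][0]), len(pairs))
```

Under the assumed interval, the 20, 40 and 60 minute settings correspond to windows of 4, 8 and 12 samples. Longer windows give the model more history per example but yield fewer examples from the same trace.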
  • 13. Usage Simulation: Simulate resource utilization in the data center by applying the resource allocation predicted by our time-series model to the actual computing resources. [Diagram: Google’s cluster usage data (513,000 jobs) → training dataset (80%) and testing dataset (20%) → LSTM/RNN model → predicted allocated resource (CPU %, memory %) → resource allocation (CPU %, memory %) → Google’s simulation]
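
The 80%/20% split on slide 13 can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, `split_jobs`, and stand-in job identifiers. It assumes an ordered split, which the slides do not specify.

```python
def split_jobs(job_ids, train_percent=80):
    """Split jobs into training and testing sets, preserving their order."""
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(job_ids) * train_percent // 100
    return job_ids[:cut], job_ids[cut:]


# Stand-in identifiers for the 513,000 jobs mentioned on the slide.
jobs = list(range(513_000))
train, test = split_jobs(jobs)
print(len(train), len(test))
```

The testing set comes out at 102,600 jobs, which matches the job count quoted for inference time on slide 16.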
  • 15. Decreased Computing Resource Wastage [Figure: reduction in wasted resources; CPU: -3%, -6%, -4%, -8%, -7%, -11%; Memory: -12%, -27%, -23%, -34%, -35%, -48%]
  • 16. Training time & Inference time [Figure: training time for 100 epochs and inference time for 102,600 jobs; plotted values: 408.93, 35.67, 130.82, 49.77, 35.13, 28.78]
  • 18. ML models are becoming larger: ML model compression improves storage usage efficiency by reducing the size of ML models, and increases the availability of storage resources. Normally, ML model compression reduces the model size, but it also decreases the accuracy.
      Model Name | Model Size | Application
      GPT-3      | 700 GB     | Language Processing
      VGG-16     | 528 MB     | Image Classification
      Mask RCNN  | 256 MB     | Object Detection
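
The model sizes on slide 18 follow roughly from parameter counts. The sketch below assumes 32-bit (4-byte) parameters and uses commonly cited parameter counts that do not appear in the slides: 175 billion for GPT-3 and 138,357,544 for the standard VGG-16. The helper name `model_size_bytes` is hypothetical.

```python
def model_size_bytes(num_parameters, bytes_per_parameter=4):
    """Storage needed for the parameters alone (32-bit floats by default)."""
    return num_parameters * bytes_per_parameter


# Parameter counts are assumptions; they are not stated in the slides.
gpt3 = model_size_bytes(175_000_000_000)
vgg16 = model_size_bytes(138_357_544)

print(gpt3 / 10**9)   # size in GB (decimal units)
print(vgg16 / 2**20)  # size in MiB (binary units)
```

Under these assumptions the results land close to the 700 GB and 528 MB in the table, which suggests the listed sizes are dominated by uncompressed 32-bit weights.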
  • 19. Compressing models while maintaining accuracy [Diagram: Original Model → Quantization → Quantized Model → Retraining → Compressed Model] ➢ Quantization: Decrease model size with loss of accuracy ➢ Retraining: Increase model accuracy while keeping the model size
  • 21. Vector Quantization: a lossy data compression technique ➢ Calculate clusters ➢ Calculate centroids
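
The two steps named on slide 21, calculating clusters and calculating centroids, are the alternating steps of k-means clustering. The slides do not say which clustering algorithm was used, so the sketch below is an assumption: a plain Lloyd's algorithm over scalar weights, with hypothetical weight values and a hypothetical function name `kmeans_1d`.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values with Lloyd's algorithm; return (centroids, assignments)."""
    lo, hi = min(values), max(values)
    # Initialise centroids evenly across the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Calculate clusters: assign each value to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Calculate centroids: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical layer weights, for illustration only.
weights = [-0.52, -0.48, -0.05, 0.01, 0.03, 0.47, 0.50, 0.55]
centroids, codes = kmeans_1d(weights, k=3)
quantized = [centroids[c] for c in codes]
print(centroids)
print(codes)
```

After quantization each weight is stored as a small index into the centroid table instead of a full 32-bit value, which is where the size reduction comes from. The approximation error introduced by replacing weights with centroids is what makes the compression lossy.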
  • 23. Retraining using unlabeled data: Most existing retraining methods require a labeled dataset. Retraining with an unlabeled dataset is highly useful when the labeled dataset is unavailable, for example because of a privacy policy or license limitation. [Diagram: labeled dataset (data + label) versus unlabeled dataset (data only) available to the researcher/developer]
  • 24. Proposed Retraining Method [Diagram: an unlabeled dataset is fed to both the original model and the quantized model, whose layers are marked as non-trainable or trainable; the loss is computed from the two output vectors]
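
The idea on slide 24, training the quantized model to reproduce the original model's output vector on unlabeled inputs, can be sketched with a toy two-layer scalar model. Everything below is an assumption made for illustration: the mean squared error loss, the choice of which layer is frozen, the weight values, and the function names. The slides do not specify these details.

```python
def retrain_last_layer(original, quantized, inputs, lr=0.1, epochs=200):
    """Tune the trainable weight of the quantized model to match the original's outputs."""
    w1_orig, w2_orig = original
    w1_q, w2_q = quantized  # w1_q stays frozen; only w2_q is updated
    for _ in range(epochs):
        grad = 0.0
        for x in inputs:
            target = w2_orig * (w1_orig * x)  # output of the original model
            hidden = w1_q * x                 # quantized, non-trainable layer
            output = w2_q * hidden            # trainable layer
            # Gradient of the squared error with respect to w2_q.
            grad += 2.0 * (output - target) * hidden
        w2_q -= lr * grad / len(inputs)
    return w1_q, w2_q


def mse(original, model, inputs):
    """Mean squared difference between the two models' outputs."""
    return sum(
        (model[1] * model[0] * x - original[1] * original[0] * x) ** 2
        for x in inputs
    ) / len(inputs)


original = (0.83, 1.20)           # hypothetical full-precision weights
quantized = (0.75, 1.20)          # first weight snapped to a hypothetical centroid
unlabeled = [0.5, 1.0, 1.5, 2.0]  # inputs only; no labels are needed

before = mse(original, quantized, unlabeled)
retrained = retrain_last_layer(original, quantized, unlabeled)
after = mse(original, retrained, unlabeled)
print(before, after)
```

Because the training target is the original model's own output, no ground-truth labels are required. The trainable layer compensates for the error introduced by the frozen quantized layer.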
  • 26. Case Study ➢ VGG-16 ➢ ResNet-50
  • 27. Case Study of VGG-16 [Figure panels: Model Architecture; Bias Value; Weight Value]
  • 28. Model Quantization [Figure panels: Size of Quantized VGG-16 models; Accuracy of Quantized VGG-16 models; x-axis: # of quantized layers]
  • 29. Model Retraining [Figure: Retraining Quantized VGG-16 models; x-axis: # of centroids] Quantizing the 14th and 15th layers using 32 to 256 centroids achieved nearly the accuracy of the original model. The best configuration for quantizing the VGG-16 model: - Quantize the biases in all layers using 1 centroid - Quantize the weights in the 14th and 15th layers using 32 centroids. This configuration compresses the model to the smallest possible size without significant accuracy loss.
  • 30. Case Study of ResNet-50 [Figure panels: Model Architecture; Bias Value; Weight Value]
  • 31. Model Quantization [Figure panels: Size of Quantized ResNet-50 models; Accuracy of Quantized ResNet-50 models; x-axis: # of quantized layers]
  • 32. Model Retraining [Figure: Retraining Quantized ResNet-50 models; x-axis: # of centroids] Quantizing the 13th to 49th layers using 128 or fewer centroids clearly degrades the accuracy of the model. The best configuration for quantizing the ResNet-50 model: - Quantize the biases in all layers using 1 centroid - Quantize the weights in the 13th to 49th layers using 256 centroids. This configuration compresses the model to the smallest possible size without significant accuracy loss.
  • 33. Conventional & Proposed Retraining [Figure panels: Accuracy of quantized model through retraining; Retraining time of quantized model; annotated values: 85%, 82%] *The conventional retraining method retrains all layers in the model.
  • 35. Conclusion ➢ Improving availability of computing resources ○ We proposed a method that uses an LSTM-based prediction model to predict efficient computing resource allocations and thereby improve computing resource availability ○ The proposed method improves the availability of CPU and memory by 11% and 48%, respectively ➢ Improving availability of storage resources ○ We proposed a method for reducing the size of neural network models without significant accuracy loss to improve storage resource availability ○ The proposed method improves storage resource availability for VGG-16 and ResNet-50 by 81% and 52%, respectively
  • 36. Future Work ➢ Improving availability of computing resources ○ The features that significantly affect computing resource availability should be investigated to devise a more efficient method ○ We would like to apply other time-series forecasting techniques to improve the availability of computing resources ➢ Improving availability of storage resources ○ The structure of other neural network models should be investigated to devise an efficient retraining method ○ We would like to apply compression techniques other than quantization to reduce the size of neural network models
  • 37. Publications ➢ Improving availability of computing resources ○ Kundjanasith Thonglek, Kohei Ichikawa, Keichi Takahashi, Chawanat Nakasan, and Hajimu Iida, “Improving Resource Utilization in Data Centers using an LSTM-based Prediction Model”, Proceedings of Workshop on Monitoring and Analysis for High Performance Computing System Plus Applications (HPCMASPA 2019), September 2019. ➢ Improving availability of storage resources ○ Kundjanasith Thonglek, Keichi Takahashi, Kohei Ichikawa, Chawanat Nakasan, Nakada Hidemoto, Ryousei Takano, and Hajimu Iida, “Retraining Quantized Neural Network Model with Unlabeled Data”, Proceedings of International Joint Conference on Neural Networks (IJCNN 2020), July 2020.