PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters
1. 21st International Middleware Conference
December 7 - 11, 2020, Delft, Netherlands
Isabelly Rocha1, Nathaniel Morris2, Lydia Y. Chen3, Pascal Felber1, Robert Birke4, Valerio Schiavoni1
1University of Neuchâtel, 2The Ohio State University, 3TU Delft, 4ABB Research
6. Auto-tuning: What is the problem?
[Figure: Estimated cost [$] of tuning 6 parameters across EC2 instances (m4.4xlarge, m4.8xlarge, m5.12xlarge, m5.16xlarge, m5.24xlarge).]
[Figure: Tuning time [hours] as a function of the number of tuned parameters (1 to 6).]
Existing auto-tuning tools let the user define only a single objective function.
That objective is typically model accuracy, so tuning performance (time, cost, energy) is ignored.
Tuning duration grows exponentially with the number of parameters to be tuned (a worked example follows this list).
Throwing more resources at the search to improve tuning performance is an expensive solution.
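To make the exponential growth concrete, here is a toy illustration; the parameter names, candidate values, per-trial time, and instance price below are assumptions for exposition, not numbers from the paper. Grid search over k parameters with v candidate values each must evaluate v^k configurations, and both tuning time and dollar cost scale with that trial count.

```python
from itertools import product

# Hypothetical search space: 6 parameters with 4 candidate values each.
# Every name and value below is illustrative, not taken from the paper.
space = {
    "batch_size":    [32, 64, 256, 1024],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "momentum":      [0.0, 0.5, 0.9, 0.99],
    "epochs":        [5, 10, 20, 40],
    "num_cores":     [1, 2, 4, 8],
    "cpu_freq_ghz":  [1.2, 1.8, 2.4, 3.0],
}

trials = len(list(product(*space.values())))  # 4**6 = 4096 configurations
hours_per_trial = 0.05                        # assumed: 3 minutes per trial
price_per_hour = 0.40                         # assumed EC2 on-demand price [$]

print(f"trials: {trials}")
print(f"tuning time: {trials * hours_per_trial:.0f} h")
print(f"estimated cost: ${trials * hours_per_trial * price_per_hour:.2f}")
```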
7. Auto-tuning: How to improve it?
1. Hyperparameters impact not only accuracy but also tuning duration and energy consumption.
2. The optimal system parameters depend on the chosen hyperparameters (a toy sketch of this joint, pipelined tuning follows the figures below).
[Figure: Batch Size Impact. Difference [%] in accuracy, duration, and energy for batch sizes 64, 256, and 1024, relative to a baseline batch size of 32.]
[Figure: Cores Impact on Duration. Duration difference [%] for 2, 4, and 8 cores under batch sizes 64, 256, and 1024, relative to a baseline of 1 core.]
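A minimal sketch of how these two insights can be combined, in the spirit of PipeTune's pipelined design: the early epochs of each hyperparameter trial are profiled, and the system parameters (here just the core count) are re-tuned from that profile while training continues. All function names, the toy cost model, and the candidate values are illustrative assumptions, not the authors' implementation.

```python
import time

def train_epoch(hparams, num_cores):
    """Placeholder for one real training epoch; returns (loss, epoch_time)."""
    time.sleep(0.01)                      # stand-in for actual training work
    return 1.0 / hparams["batch_size"], 0.01

def tune_system_params(epoch_time, candidates=(2, 4, 8)):
    """Toy cost model: pick the core count minimizing predicted epoch time.
    A real tuner would use low-level profiling metrics, not just wall time."""
    return min(candidates, key=lambda c: epoch_time / c)

def pipetune_trial(hparams, epochs=10, profile_epochs=2):
    """One hyperparameter trial with system tuning pipelined into training."""
    cores, times = 2, []
    for epoch in range(epochs):
        loss, t = train_epoch(hparams, cores)
        times.append(t)
        if epoch + 1 == profile_epochs:
            # Instead of a separate tuning run per configuration, reuse the
            # early-epoch profile and keep training under the new setting.
            cores = tune_system_params(sum(times) / len(times))
    return loss, cores

for batch in (64, 256, 1024):
    loss, cores = pipetune_trial({"batch_size": batch})
    print(f"batch={batch}: final loss={loss:.4f}, chosen cores={cores}")
```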
10. Evaluation: Setup
Baseline
Tune: hyperparameter tuning only (i.e., no system parameters considered).
Workloads
DNN tuning jobs over the datasets shown in the Scenario II results (e.g., mnist, new20).
Environment
I. Single node (Intel E5-2620 with 8 cores), implemented on top of Keras and TensorFlow.
II. Distributed cluster (4x Intel E3-1275 with 8 cores), implemented on top of Spark using BigDL.
Scenarios
I. Single-tenancy: "offline mode", showing results of running an independent, unseen hyperparameter tuning (HPT) job.
II. Multi-tenancy: "online mode", showing the averaged response time of a synthetic trace at 90% load.
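For concreteness, a minimal sketch of what a hyperparameter-only baseline like Tune does on the single-node Keras/TensorFlow setup. The model architecture, search space, trial budget, and random-search strategy are illustrative assumptions, not the evaluated baseline's actual code.

```python
import random
import tensorflow as tf

def build_model(learning_rate):
    """Small MNIST-style classifier; architecture is illustrative."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Downloads MNIST on first run; the test split serves as validation here.
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# Hyperparameter-only random search: accuracy is the single objective;
# tuning time, energy, and system parameters (cores, frequency) are ignored.
best = (0.0, None)
for _ in range(5):
    hp = {
        "batch_size": random.choice([32, 64, 256, 1024]),
        "learning_rate": random.choice([1e-4, 1e-3, 1e-2]),
    }
    model = build_model(hp["learning_rate"])
    model.fit(x_train, y_train, batch_size=hp["batch_size"], epochs=2, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if acc > best[0]:
        best = (acc, hp)
print("best:", best)
```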
12. Evaluation: Scenario II
[Figure: Averaged response time [s] of Tune vs. PipeTune under multi-tenancy, for the Single Node and Distributed Cluster setups; one panel covers the workloads jacobi, spkmeans, and bfs, the other covers mnist, new20, and all.]
13. Summary
• PipeTune is a novel approach for DNN tuning jobs;
• It combines hyperparameter and system parameter tuning to achieve high model accuracy with low runtime and energy consumption;
• Experimental evaluation across various scenarios and state-of-the-art workloads shows promising results;
• Reduces tuning time by up to 23%;
• Speeds up training by up to 1.7x;
• Lowers energy consumption by up to 20%;
• Refer to the paper for a more detailed evaluation and the intermediate solution;
• Source code available at: https://github.com/isabellyrocha/pipetune.