PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters
1. 21st International Middleware Conference
December 7 - 11, 2020, Delft, Netherlands
Isabelly Rocha1, Nathaniel Morris2, Lydia Y. Chen3, Pascal Felber1, Robert Birke4, Valerio Schiavoni1
1University of Neuchâtel, 2The Ohio State University, 3TU Delft, 4ABB Research
6. Auto-tuning: What is the problem?
[Figure: Estimated cost [$] of tuning 6 parameters across EC2 instances (m4.4xlarge, m4.8xlarge, m5.12xlarge, m5.16xlarge, m5.24xlarge).]
[Figure: Tuning time [hours] as a function of the number of tuned parameters (1 to 6).]
Existing auto-tuning tools let the user define only a single objective function.
That objective is typically model accuracy, so tuning performance (time, cost, energy) is ignored.
Tuning duration grows exponentially with the number of parameters to be tuned (a worked example follows this list).
Throwing more resources at the search to improve tuning performance is an expensive solution.
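To make the exponential growth concrete, here is a toy illustration; the parameter names, candidate values, per-trial time, and instance price below are assumptions for exposition, not numbers from the paper. Grid search over k parameters with v candidate values each must evaluate v^k configurations, and both tuning time and dollar cost scale with that trial count.

```python
from itertools import product

# Hypothetical search space: 6 parameters with 4 candidate values each.
# Every name and value below is illustrative, not taken from the paper.
space = {
    "batch_size":    [32, 64, 256, 1024],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "momentum":      [0.0, 0.5, 0.9, 0.99],
    "epochs":        [5, 10, 20, 40],
    "num_cores":     [1, 2, 4, 8],
    "cpu_freq_ghz":  [1.2, 1.8, 2.4, 3.0],
}

trials = len(list(product(*space.values())))  # 4**6 = 4096 configurations
hours_per_trial = 0.05                        # assumed: 3 minutes per trial
price_per_hour = 0.40                         # assumed EC2 on-demand price [$]

print(f"trials: {trials}")
print(f"tuning time: {trials * hours_per_trial:.0f} h")
print(f"estimated cost: ${trials * hours_per_trial * price_per_hour:.2f}")
```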
7. Auto-tuning: How to improve it?
1. Hyperparameters impact not only accuracy but also tuning duration and energy consumption.
2. The optimal system parameters depend on the chosen hyperparameters (a toy sketch of this joint, pipelined tuning follows the figures below).
[Figure: Batch Size Impact. Difference [%] in accuracy, duration, and energy for batch sizes 64, 256, and 1024, relative to a baseline batch size of 32.]
[Figure: Cores Impact on Duration. Duration difference [%] for 2, 4, and 8 cores under batch sizes 64, 256, and 1024, relative to a baseline of 1 core.]
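A minimal sketch of how these two insights can be combined, in the spirit of PipeTune's pipelined design: the early epochs of each hyperparameter trial are profiled, and the system parameters (here just the core count) are re-tuned from that profile while training continues. All function names, the toy cost model, and the candidate values are illustrative assumptions, not the authors' implementation.

```python
import time

def train_epoch(hparams, num_cores):
    """Placeholder for one real training epoch; returns (loss, epoch_time)."""
    time.sleep(0.01)                      # stand-in for actual training work
    return 1.0 / hparams["batch_size"], 0.01

def tune_system_params(epoch_time, candidates=(2, 4, 8)):
    """Toy cost model: pick the core count minimizing predicted epoch time.
    A real tuner would use low-level profiling metrics, not just wall time."""
    return min(candidates, key=lambda c: epoch_time / c)

def pipetune_trial(hparams, epochs=10, profile_epochs=2):
    """One hyperparameter trial with system tuning pipelined into training."""
    cores, times = 2, []
    for epoch in range(epochs):
        loss, t = train_epoch(hparams, cores)
        times.append(t)
        if epoch + 1 == profile_epochs:
            # Instead of a separate tuning run per configuration, reuse the
            # early-epoch profile and keep training under the new setting.
            cores = tune_system_params(sum(times) / len(times))
    return loss, cores

for batch in (64, 256, 1024):
    loss, cores = pipetune_trial({"batch_size": batch})
    print(f"batch={batch}: final loss={loss:.4f}, chosen cores={cores}")
```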
10. Evaluation: Setup
Baseline
Tune: hyperparameter tuning only (i.e., no system parameters considered).
Workloads
DNN tuning jobs over the datasets shown in the Scenario II results (e.g., mnist, new20).
Environment
I. Single node (Intel E5-2620 with 8 cores), implemented on top of Keras and TensorFlow.
II. Distributed cluster (4x Intel E3-1275 with 8 cores), implemented on top of Spark using BigDL.
Scenarios
I. Single-tenancy: "offline mode", showing results of running an independent, unseen hyperparameter tuning (HPT) job.
II. Multi-tenancy: "online mode", showing the averaged response time of a synthetic trace at 90% load.
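For concreteness, a minimal sketch of what a hyperparameter-only baseline like Tune does on the single-node Keras/TensorFlow setup. The model architecture, search space, trial budget, and random-search strategy are illustrative assumptions, not the evaluated baseline's actual code.

```python
import random
import tensorflow as tf

def build_model(learning_rate):
    """Small MNIST-style classifier; architecture is illustrative."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Downloads MNIST on first run; the test split serves as validation here.
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# Hyperparameter-only random search: accuracy is the single objective;
# tuning time, energy, and system parameters (cores, frequency) are ignored.
best = (0.0, None)
for _ in range(5):
    hp = {
        "batch_size": random.choice([32, 64, 256, 1024]),
        "learning_rate": random.choice([1e-4, 1e-3, 1e-2]),
    }
    model = build_model(hp["learning_rate"])
    model.fit(x_train, y_train, batch_size=hp["batch_size"], epochs=2, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if acc > best[0]:
        best = (acc, hp)
print("best:", best)
```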
12. Evaluation: Scenario II
[Figure: Averaged response time [s] of Tune vs. PipeTune under multi-tenancy, for the Single Node and Distributed Cluster setups; one panel covers the workloads jacobi, spkmeans, and bfs, the other covers mnist, new20, and all.]
13. Summary
• PipeTune is a novel approach for DNN tuning jobs;
• It combines hyperparameter and system parameter tuning to achieve high model accuracy with low runtime and energy consumption;
• Experimental evaluation across various scenarios and state-of-the-art workloads shows promising results;
• Reduces tuning time by up to 23%;
• Speeds up training by up to 1.7x;
• Lowers energy consumption by up to 20%;
• Refer to the paper for a more detailed evaluation and the intermediate solution;
• Source code available at: https://github.com/isabellyrocha/pipetune.