INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
Distributed Deep Learning Using Hopsworks
SF Machine Learning
Mesosphere
Kim Hammar
kim@logicalclocks.com
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: distributed computing + a neural network (deep learning)]
Why Combine the two?
[1] Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[2] Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: distributed computing + a neural network (deep learning)]
Why Combine the two?
We like challenging problems
[1] Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[2] Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED COMPUTING + DEEP LEARNING = ?
[Figure: distributed computing + a neural network (deep learning)]
Why Combine the two?
We like challenging problems
More productive data science
Unreasonable effectiveness of data [1]
To achieve state-of-the-art results [2]
[1] Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[2] Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE
SCALING
[3] Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE
SCALING
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DDL IS NOT A SECRET ANYMORE
[4] Tal Ben-Nun and Torsten Hoefler. “Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis”. In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DDL IS NOT A SECRET ANYMORE
Frameworks for DDL: Distributed TF, TensorflowOnSpark, CaffeOnSpark.
[Figure: logos of frameworks for DDL and of companies using DDL]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DDL REQUIRES AN ENTIRE
SOFTWARE/INFRASTRUCTURE STACK
[Figure: distributed training (executors e1–e4 exchanging gradients) is just one box in a much larger stack]
Surrounding components: Distributed Systems, Data Validation, Feature Engineering, Data Collection, Hardware Management, HyperParameter Tuning, Model Serving, Pipeline Management, A/B Testing, Monitoring
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTLINE
1. Hopsworks: Background of the platform
2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow
3. Black-Box Optimization using Hopsworks, Metadata Store, PySpark, and Maggy [5]
[5] Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
HopsYARN
(GPU/CPU as a resource)
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
HopsYARN
(GPU/CPU as a resource)
Frameworks
(ML/Data)
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
HopsYARN
(GPU/CPU as a resource)
Frameworks
(ML/Data)
Feature Store Pipelines Experiments Models
[Figure: neural network diagram]
ML/AI Assets
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
HopsYARN
(GPU/CPU as a resource)
Frameworks
(ML/Data)
Feature Store Pipelines Experiments Models
[Figure: neural network diagram]
ML/AI Assets
from hops import featurestore
from hops import experiment
features = featurestore.get_features([
    "average_attendance",
    "average_player_age"])
experiment.collective_all_reduce(features, model)
APIs
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS
HopsFS
HopsYARN
(GPU/CPU as a resource)
Frameworks
(ML/Data)
Distributed Metadata
(Available from REST API)
Feature Store Pipelines Experiments Models
[Figure: neural network diagram]
ML/AI Assets
from hops import featurestore
from hops import experiment
features = featurestore.get_features([
    "average_attendance",
    "average_player_age"])
experiment.collective_all_reduce(features, model)
APIs
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
INNER AND OUTER LOOP OF LARGE SCALE DEEP
LEARNING
Inner loop
[Figure: data-parallel training. worker1, worker2, ..., workerN each hold a replica of the network, train on data shards 1..N, and synchronize with each other]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
INNER AND OUTER LOOP OF LARGE SCALE DEEP
LEARNING
Outer loop: a search method proposes hyperparameters h and receives a metric τ back from the inner loop.
Inner loop: [Figure: data-parallel training across worker1, ..., workerN on data shards 1..N with synchronization]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: four executors e1–e4]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: executors e1–e4 exchanging gradients]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
INNER LOOP: DISTRIBUTED DEEP LEARNING
[Figure: executors e1–e4, each computing gradients on its own data partition p1–p4]
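To make the data-parallel pattern above concrete, here is a minimal sketch (plain Python/NumPy, not Hopsworks code) of one synchronous step: every executor computes a gradient on its own data partition, the gradients are averaged (the result an all-reduce would give), and all replicas apply the same update. The linear model and random data are toy stand-ins.

import numpy as np

def local_gradient(weights, x_part, y_part):
    # Least-squares gradient of a toy linear model on one data partition
    preds = x_part @ weights
    return 2.0 * x_part.T @ (preds - y_part) / len(y_part)

rng = np.random.default_rng(0)
weights = np.zeros(3)
# Four executors e1..e4, each holding its own data partition p1..p4
partitions = [(rng.normal(size=(32, 3)), rng.normal(size=32)) for _ in range(4)]

for step in range(10):
    # Each executor computes a gradient on its local partition
    grads = [local_gradient(weights, x, y) for x, y in partitions]
    # All-reduce: average the gradients across the executors
    avg_grad = np.mean(grads, axis=0)
    # Every replica applies the identical update, so the replicas stay in sync
    weights -= 0.01 * avg_grad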
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
DISTRIBUTED DEEP LEARNING IN PRACTICE
Implementation of distributed algorithms is becoming a commodity (TF, PyTorch, etc.)
The hardest part of DDL is now:
Cluster management
Allocating GPUs
Data management
Operations & performance
[Figure: a question mark over models, GPUs, data, and distribution; neural network diagram]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: the client API sends resource requests to the HopsYARN RM, which allocates YARN containers with GPUs as a resource, one per worker plus one extra container]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: as above, now with a Spark executor in each GPU container and a Spark driver in the extra container]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: as above, now with a conda environment inside every container (executors and driver)]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: as above; each executor reports its IP address to the driver ("Here is my ip: 192.168.1.1" ... "Here is my ip: 192.168.1.4")]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: as above; the executors exchange gradients with one another, and all containers sit on top of the Hops Distributed File System (HopsFS)]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
HOPSWORKS DDL SOLUTION
from hops import experiment
experiment.collective_all_reduce(train_fn)
[Figure: full picture: client API, HopsYARN RM, GPU-equipped YARN containers with conda environments and Spark executors, a Spark driver, gradient exchange between executors, and HopsFS underneath]
Hide complexity behind a simple API (see the sketch after this list)
Allocate resources using pyspark
Allocate GPUs for spark executors using HopsYARN
Serve sharded training data to workers from HopsFS
Use HopsFS for aggregating logs, checkpoints and results
Store experiment metadata in metastore
Use dynamic allocation for interactive resource management
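As a rough sketch of how these pieces come together from the user's side: you write an ordinary training function and hand it to the experiment API, which runs it on the GPU-equipped executors shown above. The body of train_fn below (a small Keras model on MNIST) is illustrative only, and the exact keyword arguments accepted by experiment.collective_all_reduce may differ between Hopsworks versions.

from hops import experiment

def train_fn():
    # Any TensorFlow/Keras training code can go here; this toy model is a placeholder.
    import tensorflow as tf
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=1, batch_size=128)
    return history.history["accuracy"][-1]

# Hopsworks allocates the executors/GPUs via HopsYARN, runs train_fn on them,
# and aggregates logs, checkpoints and results on HopsFS.
experiment.collective_all_reduce(train_fn)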
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
Outer loop: a search method proposes hyperparameters h and receives a metric τ back from the inner loop.
Inner loop: [Figure: data-parallel training across worker1, ..., workerN on data shards 1..N with synchronization]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION




[Figure: training pipeline. A feature vector (x1, ..., xn) and hyperparameters (η, num_layers, neurons) feed a model θ, which outputs a prediction ŷ; the loss L(y, ŷ) gives the gradient ∇θ L(y, ŷ) used to update θ]
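Written out, the figure is a nested optimization problem: the inner loop fits the model parameters θ for a fixed hyperparameter configuration h by following the gradient of the loss, while the outer loop can only query the resulting metric τ for each h, a black box, since no gradients with respect to h are available (D_train and D_val denote the training and validation sets):

% Inner loop: train the model parameters for a fixed hyperparameter configuration h
\theta^*(h) = \arg\min_{\theta} \sum_{(x,y) \in D_{\mathrm{train}}} L\big(y, \hat{y}(x; \theta, h)\big)

% Outer loop: black-box search over the hyperparameter space S, guided only by the metric \tau
h^* = \arg\max_{h \in S} \tau\big(\theta^*(h), D_{\mathrm{val}}\big)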
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
Example Use-Case from one of our clients:
Goal: Train a One-Class GAN model for fraud detection
Problem: GANs are extremely sensitive to hyperparameters, and the space of possible hyperparameters is very large.
Example hyperparameters to tune: learning rate η, optimizers, number of layers, etc.
[Figure: GAN. A generator network maps random noise z to a generated sample, and a discriminator network tries to tell generated samples apart from the real input x]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space over number of neurons per layer (25–45), number of layers (2–12), and learning rate (0.00–0.10)]
[Figure: training pipeline as above: features (x1, ..., xn), hyperparameters (η, num_layers, neurons), model θ, prediction ŷ, loss L(y, ŷ), gradient ∇θ L(y, ŷ)]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of hyperparameter configurations (η1, ..; η2, ..; ...; η5, ..) is consumed by parallel workers w1–w4.
[Figure: training pipeline as above]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of configurations consumed by parallel workers w1–w4.
Which algorithm to use for search?
[Figure: training pipeline as above]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of configurations consumed by parallel workers w1–w4.
How to monitor progress?
Which algorithm to use for search?
[Figure: training pipeline as above]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of configurations consumed by parallel workers w1–w4.
How to aggregate results?
How to monitor progress?
Which algorithm to use for search?
[Figure: training pipeline as above]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of configurations consumed by parallel workers w1–w4.
How to aggregate results?
How to monitor progress?
Which algorithm to use for search?
Fault Tolerance?
[Figure: training pipeline as above]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
OUTER LOOP: BLACK BOX OPTIMIZATION
[Figure: 3D search space as above]
A shared task queue of configurations consumed by parallel workers w1–w4.
How to aggregate results?
How to monitor progress?
Which algorithm to use for search?
Fault Tolerance?
[Figure: training pipeline as above]
This should be managed with platform support!
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: A FRAMEWORK FOR
SYNCHRONOUS/ASYNCHRONOUS
HYPERPARAMETER TUNING ON HOPSWORKS [7]
A flexible framework for running different black-box optimization algorithms on Hopsworks
ASHA, Hyperband, Differential Evolution, Random search, Grid search, etc.
[7] Authors of Maggy: Moritz Meister and Sina Sheikholeslami. Author of the base framework that Maggy builds on: Robin Andersson.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH
ALGORITHMS
[Figure: un-directed search: the driver/parameter server launches N Spark tasks with hyperparameters h, collects N evaluation metrics, and takes the max/min. Synchronous directed search (2 iterations): N Spark tasks per iteration separated by barriers, with the evaluation metrics synchronized at the driver/parameter server between iterations]
Parallel un-directed/synchronous search is trivial using Spark and a distributed file system
Example of un-directed search algorithms: random and grid search
Example of synchronous search algorithms: differential evolution
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH
ALGORITHMS
Fits very well with Spark BSP model
[Figure: un-directed and synchronous directed search, as above]
Parallel un-directed/synchronous search is trivial using Spark and a distributed file system
Example of un-directed search algorithms: random and grid search
Example of synchronous search algorithms: differential evolution
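For example, an un-directed search such as grid or random search maps directly onto this model: build the list of configurations on the driver, evaluate each one as a Spark task, and reduce to the best result. A rough sketch, assuming an existing SparkContext sc and a user-supplied evaluate(config) function that trains a model and returns its metric:

import itertools
import random

# Grid search: the full Cartesian product of the (discretized) search space
grid = [{"lr": lr, "layers": n}
        for lr, n in itertools.product([0.001, 0.01, 0.1], [2, 4, 8])]

# Random search: sample configurations instead of enumerating them
samples = [{"lr": random.uniform(0.001, 0.1), "layers": random.choice([2, 4, 8])}
           for _ in range(9)]

# One Spark task per configuration; keep the configuration with the best metric
best_metric, best_config = (sc.parallelize(grid)
                              .map(lambda cfg: (evaluate(cfg), cfg))
                              .max(key=lambda pair: pair[0]))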
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
PROBLEM WITH THE BULK-SYNCHRONOUS
PROCESSING MODEL FOR PARALLEL SEARCH
[Figure: N Spark tasks behind a barrier; finished tasks wait for stragglers (wasted compute) before the driver/parameter server collects the N evaluation metrics]
Synchronous search is sensitive to stragglers and not suitable for early stopping
... For large scale search problems we need asynchronous search
Problem: Asynchronous search is much harder to implement with big data processing tools such as Spark
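A back-of-the-envelope illustration of the straggler cost, with hypothetical per-trial runtimes: under a barrier the whole iteration lasts as long as the slowest trial, so the fast workers sit idle for roughly half of the reserved compute in this example.

# Hypothetical runtimes (minutes) of four trials launched in one synchronous iteration
runtimes = [12, 14, 15, 40]

barrier_time = max(runtimes)               # the iteration ends when the straggler ends: 40
useful = sum(runtimes)                     # 81 worker-minutes of actual training
reserved = barrier_time * len(runtimes)    # 160 worker-minutes held until the barrier
wasted = reserved - useful                 # 79 worker-minutes spent waiting

print(f"wasted: {wasted} worker-minutes ({100 * wasted / reserved:.0f}% of the iteration)")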
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
ENTER MAGGY: A FRAMEWORK FOR RUNNING
ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
[Figure: one Spark task per worker, coordinated by an async task queue/driver/parameter server, with HopsFS underneath]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
ENTER MAGGY: A FRAMEWORK FOR RUNNING
ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
[Figure: as above; one Spark task per worker, with many async tasks running inside each task]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
ENTER MAGGY: A FRAMEWORK FOR RUNNING
ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
[Figure: as above; workers write checkpoints & results to HopsFS]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
ENTER MAGGY: A FRAMEWORK FOR RUNNING
ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
[Figure: as above; workers asynchronously fetch new tasks from the async task queue/driver/parameter server over an RPC framework]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
ENTER MAGGY: A FRAMEWORK FOR RUNNING
ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS
Robust against stragglers
Supports early stopping
Fault tolerance with checkpointing
Monitoring with Tensorboard
Log aggregation with HopsFS
Simple and extendable API
[Figure: one Spark task per worker with many async tasks inside; workers write checkpoints & results to HopsFS and asynchronously fetch new tasks from the async task queue/driver/parameter server over an RPC framework]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: ASYNCHRONOUS SEARCH WORKFLOW
[Figure: asynchronous search workflow. Each worker receives hyperparameter suggestions λ, runs a trial (trains a network), and produces a metric α; a coordinator with a global task queue and black-box optimizers (min_x f(x), x ∈ S) sends suggested tasks to the workers and collects results; a live plot tracks accuracy vs. epochs for trials such as lr=0.0021, layers=5]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: ASYNCHRONOUS SEARCH WORKFLOW
[Figure: as above, with the workers additionally sending heartbeats to the coordinator]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: ASYNCHRONOUS SEARCH WORKFLOW
[Figure: as above; based on the heartbeat metrics, the coordinator can send an early-stop signal to individual trials]
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: ASYNCHRONOUS SEARCH WORKFLOW
[Figure: as above; in addition, trial checkpoints are written so that stopped or failed trials can be resumed]
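The coordination pattern in these figures (a global task queue, workers reporting heartbeat metrics, early stopping of unpromising trials) can be sketched with plain Python threads. This is a toy model of the idea rather than Maggy's implementation; the sampled hyperparameters, the fake training curve, and the median-based stopping rule are all illustrative.

import queue
import random
import threading

tasks = queue.Queue()
results, heartbeats = [], {}
for _ in range(8):  # the coordinator fills a global task queue with trial suggestions
    tasks.put({"lr": random.uniform(1e-4, 1e-1), "layers": random.choice([2, 4, 8])})

def should_stop_early(trial_id):
    # Toy early-stopping rule: stop a trial whose latest metric is below the median
    latest = [h[-1] for h in heartbeats.values() if h]
    if len(latest) < 3 or not heartbeats[trial_id]:
        return False
    return heartbeats[trial_id][-1] < sorted(latest)[len(latest) // 2]

def worker(worker_id):
    while True:
        try:
            trial = tasks.get_nowait()  # asynchronously fetch the next suggestion
        except queue.Empty:
            return
        trial_id = f"w{worker_id}-lr{trial['lr']:.4f}"
        heartbeats[trial_id] = []
        for epoch in range(10):
            metric = 1.0 - (0.5 ** epoch) * (1.0 + trial["lr"])  # fake training curve
            heartbeats[trial_id].append(metric)  # heartbeat reported to the coordinator
            if should_stop_early(trial_id):
                break
        results.append((heartbeats[trial_id][-1], trial))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("best trial:", max(results, key=lambda r: r[0]))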
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
Users have to extend the AbstractOptimizer base class to implement their own algorithms.
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
Users have to extend the AbstractOptimizer base class to implement their own algorithms.
Initializing search space
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
Users have to extend the AbstractOptimizer base class to implement their own algorithms.
Initializing search space
Suggestions to be evaluated by workers
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
Users have to extend the AbstractOptimizer base class to implement their own algorithms.
Initializing search space
Suggestions to be evaluated by workers
Aggregate results
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
Users have to extend the AbstractOptimizer base class to implement their own algorithms.
Initializing search space
Suggestions to be evaluated by workers
Aggregate results
Configure early-stop policy
class RandomSearch(AbstractOptimizer):
    def initialize(self):
        # ..
    def get_suggestion(self, trial=None):
        # ..
    def finalize_experiment(self, trials):
        # ..
    def early_check(self, to_check, trials, direction):
        # ..
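As an illustration, here is a minimal way the four methods could be filled in for random search. It assumes the search space is available as a dict of name → (low, high) ranges and returns plain dicts as suggestions; the real Maggy classes (Searchspace, Trial) and the import path of AbstractOptimizer may look different, so treat this as a sketch of the control flow rather than the actual implementation.

import random
from maggy.optimizer import AbstractOptimizer  # import path assumed; adjust to your Maggy version

class RandomSearch(AbstractOptimizer):

    def __init__(self, num_trials, searchspace):
        self.num_trials = num_trials
        self.searchspace = searchspace      # assumed shape: {"lr": (0.001, 0.1), ...}
        self.issued = 0

    def initialize(self):
        # Random search keeps no model of past trials; just reset the budget counter.
        self.issued = 0

    def get_suggestion(self, trial=None):
        # `trial` is the trial that just finished; random search ignores it.
        if self.issued >= self.num_trials:
            return None                     # budget exhausted, signal the experiment to stop
        self.issued += 1
        return {name: random.uniform(low, high)
                for name, (low, high) in self.searchspace.items()}

    def finalize_experiment(self, trials):
        # Aggregate the finished trials, e.g. return the one with the best metric
        # (the attribute name is assumed for illustration).
        return max(trials, key=lambda t: t.final_metric)

    def early_check(self, to_check, trials, direction):
        # Pure random search does not early-stop trials on its own.
        return False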
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
MAGGY: API
from maggy import experiment
from maggy.searchspace import Searchspace
from maggy.randomsearch import RandomSearch
sp = Searchspace(argument_param=('DOUBLE', [1, 5]))
rs = RandomSearch(5, sp)
result = experiment.launch(train_fn, sp, optimizer=rs,
                           num_trials=5, name='test',
                           direction="max")
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
SUMMARY
Deep Learning is going distributed
Algorithms for DDL are available in several frameworks
Applying DDL in practice brings a lot of operational complexity
Hopsworks is a platform for scale-out deep learning and big data processing
Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments, and much more
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso,
Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson,
Alex Ormenisan, and Rasmus Toivonen. And our interns: Moritz Meister and Sina Sheikholeslami.
INTRODUCTION HOPSWORKS DISTRIBUTED DEEP LEARNING PARALLEL BLACK-BOX OPTIMIZATION SUMMARY
REFERENCES
Example notebooks: https://github.com/logicalclocks/hops-examples
HopsML [8]
Hopsworks [9]
Hopsworks’ feature store [10]
Maggy: https://github.com/logicalclocks/maggy
[8] Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
[9] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
[10] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.
