Jim Dowling, Logical Clocks AB
Distributed Deep Learning with Apache Spark and TensorFlow
#SAISDL2
jim_dowling
The Cargobike Riddle
2/48
?
Spark & TensorFlow
Dynamic Executors (release GPUs when training finishes)
Container (GPUs)
Blacklisting Executors (for Fault-Tolerant Hyperparameter Optimization)
Optimizing GPU Resource Utilization
3/48
Scalable ML Pipeline
[Figure: pipeline stages: Data Collection, Feature Extraction, Data Transformation & Verification, Experimentation, Training, Test, Serving]
Distributed Storage
Potential Bottlenecks
Object Stores (S3, GCS), HDFS, Ceph
Standalone, Single-Host TensorFlow, Single GPU
4/48
Scalable ML Pipeline
5/48
[Figure: pipeline stages: Data Collection, Feature Extraction, Data Transformation & Verification, Experimentation, Training, Test, Serving]
HopsFS (Hopsworks)
PySpark, Kubernetes, TensorFlow
6/48
Hopsworks
REST API
JWT / TLS
Airflow
Spark/TensorFlow in Hopsworks
7/48
Executor Executor
Driver
HopsFS, TensorBoard/Logs, Model Serving
Conda Envs Conda Envs
HopsML
8/48
• Experiments
– Dist. Hyperparameter Optimization
– Versioning of Models/Code/Resources
– Visualization with TensorBoard
– Distributed Training with checkpointing
• [Feature Store]
• Model Serving and Monitoring
Feature Extraction
Experimentation
Training
Test + Serve
Data Acquisition
Clean/Transform Data
Why Distributed Deep Learning?
9/48
10/48
Prof Nando de Freitas @NandoDF
“ICLR 2019 lessons thus far: The deep neural
nets have to be BIGGER and they’re hungry for
data, memory and compute.”
https://twitter.com/NandoDF/status/1046371764211716096
All Roads Lead to Distribution
11/48
Distributed Deep Learning
Hyperparameter Optimization
Distributed Training
Larger Training Datasets
Elastic Model Serving
Parallel Experiments
(Commodity) GPU Clusters
AutoML
Hyperparameter Optimization
12/48
(Because DL Theory Sucks!)
Faster Experimentation
13/48
GPU Servers
Search space:
  LearningRate (LR): [0.0001-0.001]
  NumOfLayers (NL): [5-10]
  ……
Trials (TensorFlow Program + Hyperparameters):
  LR      NL  Error
  0.0001  5   0.35
  0.0001  7   0.34
  0.0005  5   0.38
  0.0005  10  0.37
  0.001   6   0.31
  0.001   9   0.36
Best trial: LR 0.001, NL 6, Error 0.31
Blacklist Executor
HyperParameter Optimization
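A minimal sketch of the final step of the search pictured above: once every trial has reported its error, the driver keeps the combination with the lowest one. The tuples are the values listed on the slide; the helper name best_trial is hypothetical and is not part of HopsML.

```python
# Trial results as listed on the slide: (learning rate, number of layers, error).
trials = [
    (0.0001, 5, 0.35),
    (0.0001, 7, 0.34),
    (0.0005, 5, 0.38),
    (0.0005, 10, 0.37),
    (0.001, 6, 0.31),
    (0.001, 9, 0.36),
]


def best_trial(results):
    """Return the (lr, layers, error) tuple with the lowest error (hypothetical helper)."""
    return min(results, key=lambda trial: trial[2])


best = best_trial(trials)
print("best: LR=%s NL=%s error=%s" % best)
```

Because each trial is independent, a failed trial can be rerun elsewhere without affecting the others, which is what makes blacklisting an executor a workable fault-tolerance strategy here.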
Declarative or API Approach?
• Declarative Hyperparameters in external files
– Vizier/CloudML (yaml)
– Sagemaker (json)*
• API-Driven
– Databricks' MLflow
– HopsML
14/48
*https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html
Notebook-Friendly
from hops import experiment

def train(learning_rate, dropout):
    ...  # [TensorFlow code here]

args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}

# Grid search: one run per combination of the listed values
experiment.launch(train, args_dict)
GridSearch for Hyperparameters on HopsML
Launch 6 Spark Executors
Dynamic Executors, Blacklisting
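The count of six executors follows from the grid: three learning rates times two dropout values. The sketch below expands a dictionary of value lists into individual trials using only the standard library. The function expand_grid is a hypothetical illustration of the idea, not the HopsML implementation.

```python
from itertools import product


def expand_grid(args_dict):
    """Expand {name: [values]} into one dict per combination (hypothetical helper)."""
    names = sorted(args_dict)
    value_lists = [args_dict[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}

grid = expand_grid(args_dict)
print(len(grid))  # number of trials, one Spark executor each
for params in grid:
    print(params)
```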
Distributed Training
16/48
Image from @hardmaru on Twitter.
Data Parallel Distributed Training
17/48
[Figure: Generalization Error vs. Training Time]
(Synchronous Stochastic Gradient Descent (SGD))
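As a rough illustration of synchronous data-parallel SGD, the sketch below splits a toy dataset across four simulated workers, lets each compute a gradient on its own shard, averages the gradients, and applies one shared update. The one-parameter model, the data, and the function names are assumptions made for this example only. Real frameworks run the per-worker step in parallel on GPUs.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: each worker computes a gradient on its shard,
    the gradients are averaged, and every worker applies the same update."""
    grads = [gradient(w, shard) for shard in shards]  # parallel on real workers
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Toy data for y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards, lr=0.01)
print(round(w, 3))
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, so the workers stay in lockstep with what a single machine would compute on the combined batch.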
Frameworks for Distributed Training
18/48
Distributed TensorFlow / TfOnSpark
TF_CONFIG
Bring your own Distribution!
1. Start all processes for P1, P2, G1-G4 yourself
2. Enter all IP addresses in TF_CONFIG along with GPU device IDs.
19/48
Parameter Servers: P1 P2
GPU Servers: G1 G2 G3 G4
TF_CONFIG
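To make the manual work concrete, here is a sketch of the TF_CONFIG value that each process would need for the P1-P2 / G1-G4 layout above. The JSON shape (a "cluster" map plus a "task" entry) is the one TensorFlow reads from the TF_CONFIG environment variable. The host addresses, port, and helper name are made up for illustration, and GPU device selection is not shown.

```python
import json
import os


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Serialize the cluster layout and this process's role (hypothetical helper)."""
    return json.dumps({
        "cluster": {"ps": ps_hosts, "worker": worker_hosts},
        "task": {"type": task_type, "index": task_index},
    })


# Made-up addresses: two parameter servers (P1, P2) and four GPU workers (G1-G4).
ps_hosts = ["10.0.0.1:2222", "10.0.0.2:2222"]
worker_hosts = ["10.0.1.%d:2222" % i for i in range(1, 5)]

# Every process needs its own copy with a different "task" entry; this one is for G1.
os.environ["TF_CONFIG"] = build_tf_config(ps_hosts, worker_hosts, "worker", 0)
print(os.environ["TF_CONFIG"])
```

Six processes means six such values to generate and keep consistent by hand, which is the burden the "Bring your own Distribution!" slide points at.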
Horovod
• Bandwidth optimal
• Builds the Ring, runs AllReduce using MPI and NCCL2
• Available in
– Hopsworks
– Databricks (Spark 2.4)
20/48
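The ring algorithm can be simulated in a few lines of plain Python. This is a sketch of the general ring all-reduce technique (a scatter-reduce phase followed by an all-gather phase), not Horovod's implementation, which delegates the communication to MPI and NCCL. The function name and the toy gradients are assumptions for illustration.

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce (sum) over equally sized per-worker vectors."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must split evenly into n chunks"
    chunk = size // n
    data = [list(v) for v in vectors]

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: after n-1 steps worker i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so sends within a step are simultaneous.
        sends = [(i, (i - step) % n, data[i][span((i - step) % n)]) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][span((i + 1 - step) % n)]) for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][span(c)] = payload

    return data


# Four simulated GPUs, each holding a gradient vector of length 8.
grads = [[float(w * 10 + k) for k in range(8)] for w in range(4)]
reduced = ring_allreduce(grads)
print(reduced[0])
```

Each worker sends only one chunk per step to its neighbour, so the data sent per worker is about 2(N-1)/N times the vector size regardless of the number of workers N.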
Tf CollectiveAllReduceStrategy
TF_CONFIG, again.
Bring your own Distribution!
1. Start all processes for G1-G4 yourself
2. Enter all IP addresses in TF_CONFIG along with GPU device IDs.
21/48
G1 G2 G3 G4
TF_CONFIG
Available from TensorFlow 1.11
Tf CollectiveAllReduceStrategy Gotchas
• Specify GPU order in the ring statically
– gpu_indices
• Configure the batch size for merging tensors
– allreduce_merge_scope
• Set to ‘1’ for no merging
• Set to ’32’ for higher throughput.*
22/48
2018-10-06
* https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
HopsML CollectiveAllReduceStrategy
• Uses Spark/YARN to add distribution to TensorFlow’s CollectiveAllReduceStrategy
– Automatically builds the ring (Spark/YARN)
– Allocates GPUs to Spark Executors
23/48
https://github.com/logicalclocks/hops-examples/tree/master/tensorflow/notebooks/Distributed_Training
CollectiveAllReduce vs Horovod Benchmark
TensorFlow: 1.11
Model: Inception v1
Dataset: imagenet (synthetic)
Batch size: 256 global, 32.0 per device
Num batches: 100
Optimizer: Momentum
Num GPUs: 8
AllReduce: collective
Step Img/sec total_loss
1 images/sec: 2972.4 +/- 0.0
10 images/sec: 3008.9 +/- 8.9
100 images/sec: 2998.6 +/- 4.3
------------------------------------------------------------
total images/sec: 2993.52
TensorFlow: 1.7
Model: Inception v1
Dataset: imagenet (synthetic)
Batch size: 256 global, 32.0 per device
Num batches: 100
Optimizer: Momentum
Num GPUs: 8
AllReduce: horovod
Step Img/sec total_loss
1 images/sec: 2816.6 +/- 0.0
10 images/sec: 2808.0 +/- 10.8
100 images/sec: 2806.9 +/- 3.9
-----------------------------------------------------------
total images/sec: 2803.69
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
Small Model
24/48
CollectiveAllReduce vs Horovod Benchmark
TensorFlow: 1.11
Model: VGG19
Dataset: imagenet (synthetic)
Batch size: 256 global, 32.0 per device
Num batches: 100
Optimizer: Momentum
Num GPUs: 8
AllReduce: collective
Step Img/sec total_loss
1 images/sec: 634.4 +/- 0.0
10 images/sec: 635.2 +/- 0.8
100 images/sec: 635.0 +/- 0.5
------------------------------------------------------------
total images/sec: 634.80
TensorFlow: 1.7
Model: VGG19
Dataset: imagenet (synthetic)
Batch size: 256 global, 32.0 per device
Num batches: 100
Optimizer: Momentum
Num GPUs: 8
AllReduce: horovod
Step Img/sec total_loss
1 images/sec: 583.01 +/- 0.0
10 images/sec: 582.22 +/- 0.1
100 images/sec: 583.61 +/- 0.2
-----------------------------------------------------------
total images/sec: 583.61
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
Big Model
25/48
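The relative throughput difference between the two runs can be worked out from the "total images/sec" lines of both benchmarks. The sketch below does only that arithmetic on the figures reported above. Note that the slides list TensorFlow 1.11 for the collective runs and 1.7 for the Horovod runs, so the comparison is not strictly like-for-like.

```python
# "total images/sec" figures copied from the two benchmark slides.
results = {
    "Inception v1 (small model)": {"collective": 2993.52, "horovod": 2803.69},
    "VGG19 (big model)": {"collective": 634.80, "horovod": 583.61},
}


def relative_gain(collective, horovod):
    """Throughput gain of the collective run over the Horovod run, in percent."""
    return (collective / horovod - 1.0) * 100.0


gains = {model: relative_gain(**r) for model, r in results.items()}
for model, gain in gains.items():
    print("%s: %.1f%% more images/sec with CollectiveAllReduce" % (model, gain))
```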
Reduction in LoC for Dist Training
26/48
Released     Framework                             Lines of Code in Hops
March 2016   DistributedTensorFlow                 ~1000
Feb 2017     TensorFlowOnSpark*                    ~900
Jan 2018     Horovod (Keras)*                      ~130
June 2018    Databricks’ HorovodEstimator**        ~100
Sep 2018     HopsML (Keras/CollectiveAllReduce)*   ~100
*https://github.com/logicalclocks/hops-examples
**https://docs.azuredatabricks.net/_static/notebooks/horovod-estimator.html
Estimator APIs in TensorFlow
27/48
Estimators log to the Distributed Filesystem
tf.estimator.RunConfig('CollectiveAllReduceStrategy', model_dir, tensorboard_logs, checkpoints)
experiment.launch(…)
HopsFS (HDFS):
  /Experiments/appId/run.ID/<name>
  /Experiments/appId/run.ID/<name>/eval
  /Experiments/appId/run.ID/<name>/checkpoint
  /Experiments/appId/run.ID/<name>/*.ipynb
  /Experiments/appId/run.ID/<name>/conda.yml
28/48
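from hops import experiment

def distributed_training():
    import tensorflow as tf

    def input_fn():
        ...  # return dataset

    model = ...
    optimizer = ...
    model.compile(...)
    # The strategy is passed to RunConfig through train_distribute (TensorFlow 1.11 contrib API)
    strategy = tf.contrib.distribute.CollectiveAllReduceStrategy()
    rc = tf.estimator.RunConfig(train_distribute=strategy)
    keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, config=rc)
    train_spec = tf.estimator.TrainSpec(input_fn=input_fn)
    eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)
    tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)

experiment.allreduce(distributed_training)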
HopsML CollectiveAllReduceStrategy with Keras
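from hops import experiment

def distributed_training():
    import tensorflow as tf
    from hops import tensorboard
    # Write checkpoints and TensorBoard events to the experiment's log directory
    model_dir = tensorboard.logdir()

    def input_fn():
        ...  # return dataset

    model = ...
    optimizer = ...
    model.compile(...)
    strategy = tf.contrib.distribute.CollectiveAllReduceStrategy()
    rc = tf.estimator.RunConfig(train_distribute=strategy)
    keras_estimator = tf.keras.estimator.model_to_estimator(
        keras_model=model, model_dir=model_dir, config=rc)
    train_spec = tf.estimator.TrainSpec(input_fn=input_fn)
    eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)
    tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)

experiment.allreduce(distributed_training)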
Add Tensorboard Support
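from hops import experiment

def distributed_training():
    import tensorflow as tf
    from hops import devices

    def input_fn():
        ...  # return dataset

    model = ...
    optimizer = ...
    model.compile(...)
    # Use as many GPUs per worker as were allocated to this executor
    strategy = tf.contrib.distribute.CollectiveAllReduceStrategy(
        num_gpus_per_worker=devices.get_num_gpus())
    rc = tf.estimator.RunConfig(train_distribute=strategy)
    keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, config=rc)
    train_spec = tf.estimator.TrainSpec(input_fn=input_fn)
    eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)
    tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)

experiment.allreduce(distributed_training)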
GPU Device Awareness
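from hops import experiment, hdfs

def distributed_training():
    import tensorflow as tf

    def input_fn():
        ...  # return dataset

    model = ...
    optimizer = ...
    model.compile(...)
    strategy = tf.contrib.distribute.CollectiveAllReduceStrategy()
    rc = tf.estimator.RunConfig(train_distribute=strategy)
    keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, config=rc)
    train_spec = tf.estimator.TrainSpec(input_fn=input_fn)
    eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)
    tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)

# Version the notebook alongside the experiment's results
notebook = hdfs.project_path() + '/Jupyter/Experiment/inc.ipynb'
experiment.allreduce(distributed_training, name='inception',
                     description='An Inception example with hidden layers',
                     versioned_resources=[notebook])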
Experiment Versioning (.ipynb, conda, results)
Experiments/Versioning in Hopsworks
34/48
35/48
36/48
The Data Layer (Foundations)
Feeding Data to TensorFlow
38/48
[Figure: Wrangling/Cleaning DataFrame (CPUs) → Filesystem (.tfrecords, .csv, .parquet) → Model Training (GPUs)]
Project Hydrogen: Barrier Execution mode in Spark. JIRA: SPARK-24374, SPARK-24723, SPARK-24579
Filesystems are not good enough
Uber on Petastorm:
“[Using files] is hard to implement at large scale,
especially using modern distributed file systems
such as HDFS and S3 (these systems are typically
optimized for fast reads of large chunks of data).”
https://eng.uber.com/petastorm/
39/48
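import tensorflow as tf
from petastorm.reader import Reader
from petastorm.tf_utils import make_petastorm_dataset

# Stream rows from a Parquet dataset into a tf.data pipeline
with Reader('hdfs://myhadoop/dataset.parquet') as reader:
    dataset = make_petastorm_dataset(reader)
    iterator = dataset.make_one_shot_iterator()
    tensor = iterator.get_next()
    with tf.Session() as sess:
        sample = sess.run(tensor)
        print(sample.id)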
PetaStorm: Read Parquet directly into TensorFlow
40/48
NVMe Disks – Game Changer
• HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports orders of magnitude faster random disk I/O.
• Can we support faster random disk I/O with HDFS?
– Yes, with HopsFS.
41/48
Small files on NVMe
• At Spotify’s HDFS*:
– 33% of files < 64KB in size
– 42% of operations are on files < 16KB in size
42/48
*Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
HopsFS – NVMe Performance
• HDFS with Distributed Metadata
– Winner IEEE Scale Prize 2017
• Small files stored replicated in the metadata layer on NVMe disks*
– Read 10s of 1000s of images/second from HopsFS
43/48
*Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
Model Serving
Model Serving on Kubernetes
45/48
Model Serving
46/48
Hopsworks REST API
External Apps
Internal Apps
Logging
Load Balancer
Model Serving Containers
Streaming
Notifications
Retrain
Features:
• Canary
• Multiple Models
• Scale-Out/In
Frameworks:
✓ TensorFlow Serving
✓ MLeap for Spark
✓ scikit-learn
Orchestrating ML Pipelines with Airflow
47/48
Airflow
Spark
ETL
Dist Training
TensorFlow
Test & Optimize
Kubernetes
Model Serving
Spark Streaming
Monitoring
Summary
48/48
• The future of Deep Learning is Distributed
https://www.oreilly.com/ideas/distributed-tensorflow
• Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs
hopshadoop logicalclocks
www.hops.io
www.logicalclocks.com

More Related Content

What's hot

Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBrendan Gregg
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたMITSUNARI Shigeo
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixRoopa Tangirala
 
ONNX - The Lingua Franca of Deep Learning
ONNX - The Lingua Franca of Deep LearningONNX - The Lingua Franca of Deep Learning
ONNX - The Lingua Franca of Deep LearningHagay Lupesko
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
プロセスとコンテキストスイッチ
プロセスとコンテキストスイッチプロセスとコンテキストスイッチ
プロセスとコンテキストスイッチKazuki Onishi
 
Protocol Buffers 入門
Protocol Buffers 入門Protocol Buffers 入門
Protocol Buffers 入門Yuichi Ito
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDBLinaro
 
DNS Query Deconstructed
DNS Query DeconstructedDNS Query Deconstructed
DNS Query DeconstructedDNS Made Easy
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelKernel TLV
 
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenIntel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenMITSUNARI Shigeo
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
Modern Kernel Pool Exploitation: Attacks and Techniques
Modern Kernel Pool Exploitation: Attacks and TechniquesModern Kernel Pool Exploitation: Attacks and Techniques
Modern Kernel Pool Exploitation: Attacks and TechniquesMichael Scovetta
 
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...KTN
 
PHPでマルチスレッド
PHPでマルチスレッドPHPでマルチスレッド
PHPでマルチスレッドkarky7
 

What's hot (20)

Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame Graphs
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみた
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ Netflix
 
ONNX - The Lingua Franca of Deep Learning
ONNX - The Lingua Franca of Deep LearningONNX - The Lingua Franca of Deep Learning
ONNX - The Lingua Franca of Deep Learning
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
プロセスとコンテキストスイッチ
プロセスとコンテキストスイッチプロセスとコンテキストスイッチ
プロセスとコンテキストスイッチ
 
Protocol Buffers 入門
Protocol Buffers 入門Protocol Buffers 入門
Protocol Buffers 入門
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
 
DNS Query Deconstructed
DNS Query DeconstructedDNS Query Deconstructed
DNS Query Deconstructed
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
Tips of Malloc & Free
Tips of Malloc & FreeTips of Malloc & Free
Tips of Malloc & Free
 
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenIntel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Modern Kernel Pool Exploitation: Attacks and Techniques
Modern Kernel Pool Exploitation: Attacks and TechniquesModern Kernel Pool Exploitation: Attacks and Techniques
Modern Kernel Pool Exploitation: Attacks and Techniques
 
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi...
 
PHPでマルチスレッド
PHPでマルチスレッドPHPでマルチスレッド
PHPでマルチスレッド
 

Similar to Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling

Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraJim Dowling
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...gdgsurrey
 
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceAnirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceLviv Startup Club
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsJim Dowling
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...Chris Fregly
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...Chris Fregly
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUsOptimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUsChris Fregly
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
Distributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestDistributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestgeetachauhan
 
Horovod ubers distributed deep learning framework by Alex Sergeev from Uber
Horovod ubers distributed deep learning framework  by Alex Sergeev from UberHorovod ubers distributed deep learning framework  by Alex Sergeev from Uber
Horovod ubers distributed deep learning framework by Alex Sergeev from UberBill Liu
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsJim Dowling
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Chris Fregly
 
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsAkihiro Hayashi
 
Uber's Journey in Distributed Deep Learning
Uber's Journey in Distributed Deep LearningUber's Journey in Distributed Deep Learning
Uber's Journey in Distributed Deep Learninginside-BigData.com
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 

Similar to Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling (20)

Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
 
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceAnirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
 
Odsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on HopsOdsc workshop - Distributed Tensorflow on Hops
Odsc workshop - Distributed Tensorflow on Hops
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...
Optimizing, Profiling, and Deploying TensorFlow AI Models in Production with ...
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUsOptimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
Distributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBestDistributed deep learning optimizations - AI WithTheBest
Distributed deep learning optimizations - AI WithTheBest
 
Horovod ubers distributed deep learning framework by Alex Sergeev from Uber
Horovod ubers distributed deep learning framework  by Alex Sergeev from UberHorovod ubers distributed deep learning framework  by Alex Sergeev from Uber
Horovod ubers distributed deep learning framework by Alex Sergeev from Uber
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on Hops
 
Clustering tensor flow con kubernetes y raspberry pi
Clustering tensor flow con kubernetes y raspberry piClustering tensor flow con kubernetes y raspberry pi
Clustering tensor flow con kubernetes y raspberry pi
 
Terraform training 🎒 - Basic
Terraform training 🎒 - BasicTerraform training 🎒 - Basic
Terraform training 🎒 - Basic
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
 
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
 
Uber's Journey in Distributed Deep Learning
Uber's Journey in Distributed Deep LearningUber's Journey in Distributed Deep Learning
Uber's Journey in Distributed Deep Learning
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling

  • 1. Jim Dowling, Logical Clocks AB Distributed Deep Learning with Apache Spark and TensorFlow #SAISDL2 jim_dowling
  • 3. Optimizing GPU Resource Utilization: Spark & TensorFlow; Container (GPUs); Dynamic Executors (release GPUs when training finishes); Blacklisting Executors (for Fault Tolerant Hyperparameter Optimization)
  • 4. Scalable ML Pipeline: Data Collection, Experimentation, Training, Serving, Feature Extraction, Data Transformation & Verification, Test; Distributed Storage; Potential Bottlenecks; Object Stores (S3, GCS), HDFS, Ceph; Standalone Single-Host TensorFlow; Single GPU
  • 5. Scalable ML Pipeline: Data Collection, Experimentation, Training, Serving, Feature Extraction, Data Transformation & Verification, Test; HopsFS (Hopsworks); PySpark; Kubernetes; TensorFlow
  • 7. Spark/TensorFlow in Hopsworks: Executor, Executor; Driver; HopsFS; TensorBoard/Logs; Model Serving; Conda Envs, Conda Envs
  • 8. HopsML • Experiments – Dist. Hyperparameter Optimization – Versioning of Models/Code/Resources – Visualization with TensorBoard – Distributed Training with checkpointing • [Feature Store] • Model Serving and Monitoring. Pipeline labels: Feature Extraction, Experimentation, Training, Test + Serve, Data Acquisition, Clean/Transform Data
  • 9. Why Distributed Deep Learning?
  • 10. Prof Nando de Freitas @NandoDF: “ICLR 2019 lessons thus far: The deep neural nets have to be BIGGER and they’re hungry for data, memory and compute.” https://twitter.com/NandoDF/status/1046371764211716096
  • 11. All Roads Lead to Distribution: Distributed Deep Learning; Hyperparameter Optimization; Distributed Training; Larger Training Datasets; Elastic Model Serving; Parallel Experiments; (Commodity) GPU Clusters; AutoML
  • 13. Faster Experimentation: HyperParameter Optimization on GPU Servers. TensorFlow Program; Hyperparameters: LearningRate (LR): [0.0001-0.001], NumOfLayers (NL): [5-10], … Trials: LR 0.0001, NL 5, Error 0.35; LR 0.0001, NL 7, Error 0.34; LR 0.0005, NL 5, Error 0.38; LR 0.0005, NL 10, Error 0.37; LR 0.001, NL 6, Error 0.31; LR 0.001, NL 9, Error 0.36 (Blacklist Executor). Highlighted result: LR 0.001, NL 6, Error 0.31
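The trial results on slide 13 can be compared programmatically. The following is a minimal sketch, not HopsML code: the list-of-dicts layout and the key names `lr`, `layers`, and `error` are assumptions chosen for illustration, while the numbers are the ones shown on the slide.

```python
# Trial results copied from slide 13 (LR = learning rate, NL = number of layers).
# The dict layout and key names are illustrative assumptions, not a HopsML format.
trials = [
    {"lr": 0.0001, "layers": 5, "error": 0.35},
    {"lr": 0.0001, "layers": 7, "error": 0.34},
    {"lr": 0.0005, "layers": 5, "error": 0.38},
    {"lr": 0.0005, "layers": 10, "error": 0.37},
    {"lr": 0.001, "layers": 6, "error": 0.31},
    {"lr": 0.001, "layers": 9, "error": 0.36},
]

# Hyperparameter optimization keeps the configuration with the lowest error.
best = min(trials, key=lambda trial: trial["error"])
print(best)
```

Because each trial is independent of the others, the six runs can execute in parallel on separate GPUs, which is what makes experimentation faster.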
  • 14. Declarative or API Approach? • Declarative Hyperparameters in external files – Vizier/CloudML (yaml) – SageMaker (json)* • API-Driven – Databricks' MLflow – HopsML. Notebook-Friendly. *https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html
  • 15. GridSearch for Hyperparameters on HopsML: def train(learning_rate, dropout): [TensorFlow Code here] args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]} experiment.launch(train, args_dict). Launch 6 Spark Executors; Dynamic Executors, Blacklisting
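The "6 Spark Executors" on slide 15 follow from the grid itself: 3 learning rates times 2 dropout values. A minimal sketch of that expansion, using only the standard library, is shown here. The helper `expand_grid` is a hypothetical name for illustration; how `experiment.launch` enumerates combinations internally is not shown in the slides.

```python
from itertools import product


def expand_grid(args_dict):
    """Return one dict per hyperparameter combination (Cartesian product)."""
    names = list(args_dict)
    value_lists = [args_dict[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Same grid as on the slide.
args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
combinations = expand_grid(args_dict)

for combo in combinations:
    print(combo)  # each combination would be one call to train(**combo)
```

Each combination maps to one independent training run, so the number of executors needed grows multiplicatively with every hyperparameter added to the grid.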
  • 16. Distributed Training. Image from @hardmaru on Twitter.
  • 17. Data Parallel Distributed Training (Synchronous Stochastic Gradient Descent (SGD)). Plot labels: Training Time, Generalization Error
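To make the data-parallel idea on slide 17 concrete, here is a minimal pure-Python sketch of synchronous SGD. It is an illustration under stated assumptions, not code from the talk: the model is a one-parameter linear fit `y ≈ w * x`, the "workers" are simulated by a loop, and the names `shard_gradient` and `synchronous_sgd_step` are hypothetical.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error for y ≈ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def synchronous_sgd_step(w, shards, lr):
    """One synchronous step: every worker computes a gradient, then all are averaged."""
    grads = [shard_gradient(w, shard) for shard in shards]  # parallel on a real cluster
    avg_grad = sum(grads) / len(grads)  # the aggregation (all-reduce) step
    return w - lr * avg_grad  # every worker applies the same update


# Toy dataset generated by y = 2 * x, split evenly across two simulated workers.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = synchronous_sgd_step(w, shards, lr=0.05)
print(w)
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, so the workers behave like a single trainer with a larger batch while each processes only its own share of the data.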
  • 18. Frameworks for Distributed Training
  • 19. Distributed TensorFlow / TfOnSpark: TF_CONFIG. Bring your own Distribution! 1. Start all processes for P1, P2, G1-G4 yourself 2. Enter all IP addresses in TF_CONFIG along with GPU device IDs. Diagram: Parameter Servers P1, P2; GPU Servers G1, G2, G3, G4; TF_CONFIG
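As a rough illustration of the manual work slide 19 describes, the sketch below builds a `TF_CONFIG` value for a cluster with two parameter servers and four workers. The `cluster`/`task` JSON layout is TensorFlow's documented format; the host names, the port, and the helper `make_tf_config` are assumptions for illustration, and the sketch leaves out GPU device selection.

```python
import json


def make_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Serialize a TF_CONFIG value for one process in the cluster."""
    return json.dumps({
        "cluster": {"ps": ps_hosts, "worker": worker_hosts},
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical addresses for parameter servers P1-P2 and GPU workers G1-G4.
ps_hosts = ["p1.example.com:2222", "p2.example.com:2222"]
worker_hosts = ["g%d.example.com:2222" % i for i in range(1, 5)]

# Every process needs its own value; this one is for worker G1 (index 0).
tf_config_g1 = make_tf_config(ps_hosts, worker_hosts, "worker", 0)
print(tf_config_g1)
```

In practice each of the six processes would export its own value as the `TF_CONFIG` environment variable before starting, which is the bookkeeping that the "Bring your own Distribution!" remark refers to.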
  • 20. Horovod • Bandwidth optimal • Builds the Ring, runs AllReduce using MPI and NCCL2 • Available in – Hopsworks – Databricks (Spark 2.4)
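The ring AllReduce mentioned on slide 20 can be simulated in a few lines. This is a single-process sketch of the general ring algorithm (a scatter-reduce phase followed by an all-gather phase), not Horovod's implementation, which runs over MPI and NCCL2. The function name `ring_allreduce` and the requirement that the vector length be divisible by the number of workers are simplifying assumptions.

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce (sum) over equally sized per-worker vectors."""
    n = len(vectors)
    length = len(vectors[0])
    assert length % n == 0, "sketch assumes length divisible by worker count"
    size = length // n
    data = [list(v) for v in vectors]

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Phase 1, scatter-reduce: after n - 1 steps, worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [((i - step) % n, data[i][chunk((i - step) % n)]) for i in range(n)]
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n  # each worker only talks to its ring neighbour
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [((i + 1 - step) % n, data[i][chunk((i + 1 - step) % n)]) for i in range(n)]
        for i, (c, payload) in enumerate(sends):
            data[(i + 1) % n][chunk(c)] = payload

    return data


gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(gradients)
print(reduced)
```

Each worker sends 2(n - 1) chunks of size 1/n of the vector, so the traffic per worker stays close to twice the vector size regardless of how many workers join the ring. That property is what "bandwidth optimal" refers to.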
  • 21. Tf CollectiveAllReduceStrategy: TF_CONFIG, again. Bring your own Distribution! 1. Start all processes for G1-G4 yourself 2. Enter all IP addresses in TF_CONFIG along with GPU device IDs. Diagram: G1, G2, G3, G4; TF_CONFIG. Available from TensorFlow 1.11
  • 22. Tf CollectiveAllReduceStrategy Gotchas (2018-10-06) • Specify GPU order in the ring statically – gpu_indices • Configure the batch size for merging tensors – allreduce_merge_scope • Set to '1' for no merging • Set to '32' for higher throughput.* * https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
  • 23. HopsML CollectiveAllReduceStrategy • Uses Spark/YARN to add distribution to TensorFlow’s CollectiveAllReduceStrategy – Automatically builds the ring (Spark/YARN) – Allocates GPUs to Spark Executors https://github.com/logicalclocks/hops-examples/tree/master/tensorflow/notebooks/Distributed_Training
  • 24. CollectiveAllReduce vs Horovod Benchmark (Small Model). CollectiveAllReduce run: TensorFlow: 1.11; Model: Inception v1; Dataset: imagenet (synthetic); Batch size: 256 global, 32.0 per device; Num batches: 100; Optimizer: Momentum; Num GPUs: 8; AllReduce: collective; Step 1: 2972.4 +/- 0.0 images/sec; Step 10: 3008.9 +/- 8.9 images/sec; Step 100: 2998.6 +/- 4.3 images/sec; total images/sec: 2993.52. Horovod run: TensorFlow: 1.7; Model: Inception v1; Dataset: imagenet (synthetic); Batch size: 256 global, 32.0 per device; Num batches: 100; Optimizer: Momentum; Num GPUs: 8; AllReduce: horovod; Step 1: 2816.6 +/- 0.0 images/sec; Step 10: 2808.0 +/- 10.8 images/sec; Step 100: 2806.9 +/- 3.9 images/sec; total images/sec: 2803.69. https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
  • 25. CollectiveAllReduce vs Horovod Benchmark (Big Model). CollectiveAllReduce run: TensorFlow: 1.11; Model: VGG19; Dataset: imagenet (synthetic); Batch size: 256 global, 32.0 per device; Num batches: 100; Optimizer: Momentum; Num GPUs: 8; AllReduce: collective; Step 1: 634.4 +/- 0.0 images/sec; Step 10: 635.2 +/- 0.8 images/sec; Step 100: 635.0 +/- 0.5 images/sec; total images/sec: 634.80. Horovod run: TensorFlow: 1.7; Model: VGG19; Dataset: imagenet (synthetic); Batch size: 256 global, 32.0 per device; Num batches: 100; Optimizer: Momentum; Num GPUs: 8; AllReduce: horovod; Step 1: 583.01 +/- 0.0 images/sec; Step 10: 582.22 +/- 0.1 images/sec; Step 100: 583.61 +/- 0.2 images/sec; total images/sec: 583.61. https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/7T05tNV08Us
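The throughput totals on slides 24 and 25 can be turned into relative figures with a short calculation. The sketch below uses only the "total images/sec" numbers from the slides; the dictionary layout is an assumption for illustration. Note that the two runs used different TensorFlow versions (1.11 and 1.7), so the ratio is indicative rather than a controlled comparison.

```python
# "total images/sec" values copied from slides 24 (Inception v1) and 25 (VGG19).
totals = {
    "Inception v1": {"collective": 2993.52, "horovod": 2803.69},
    "VGG19": {"collective": 634.80, "horovod": 583.61},
}

# Ratio of CollectiveAllReduce throughput to Horovod throughput per model.
ratios = {
    model: result["collective"] / result["horovod"]
    for model, result in totals.items()
}

for model, ratio in ratios.items():
    print("%s: %.1f%% higher throughput with CollectiveAllReduce"
          % (model, (ratio - 1.0) * 100.0))
```

On these numbers CollectiveAllReduce comes out roughly 7% ahead on the small model and roughly 9% ahead on the big one.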
  • 26. Reduction in LoC for Dist Training. Released | Framework | Lines of Code in Hops: March 2016 | DistributedTensorFlow | ~1000; Feb 2017 | TensorFlowOnSpark* | ~900; Jan 2018 | Horovod (Keras)* | ~130; June 2018 | Databricks’ HorovodEstimator | ~100; Sep 2018 | HopsML (Keras/CollectiveAllReduce)* | ~100. *https://github.com/logicalclocks/hops-examples **https://docs.azuredatabricks.net/_static/notebooks/horovod-estimator.html
  • 27. Estimator APIs in TensorFlow
  • 28. Estimators log to the Distributed Filesystem: tf.estimator.RunConfig( 'CollectiveAllReduceStrategy' model_dir tensorboard_logs checkpoints ) experiment.launch(…) Paths in HopsFS (HDFS): /Experiments/appId/run.ID/<name>; /Experiments/appId/run.ID/<name>/eval; /Experiments/appId/run.ID/<name>/checkpoint; /Experiments/appId/run.ID/<name>/*.ipynb; /Experiments/appId/run.ID/<name>/conda.yml
  • 29. HopsML CollectiveAllReduceStrategy with Keras: def distributed_training(): def input_fn(): # return dataset model = … optimizer = … model.compile(…) rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy') keras_estimator = tf.keras.estimator.model_to_estimator(…) tf.estimator.train_and_evaluate(keras_estimator, input_fn) experiment.allreduce(distributed_training)
  • 30. Add TensorBoard Support: def distributed_training(): from hops import tensorboard model_dir = tensorboard.logdir() def input_fn(): # return dataset model = … optimizer = … model.compile(…) rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy') keras_estimator = keras.model_to_estimator(model_dir) tf.estimator.train_and_evaluate(keras_estimator, input_fn) experiment.allreduce(distributed_training)
  • 31. GPU Device Awareness: def distributed_training(): from hops import devices def input_fn(): # return dataset model = … optimizer = … model.compile(…) est.RunConfig(num_gpus_per_worker=devices.get_num_gpus()) keras_estimator = keras.model_to_estimator(…) tf.estimator.train_and_evaluate(keras_estimator, input_fn) experiment.allreduce(distributed_training)
  • 32. Experiment Versioning (.ipynb, conda, results): def distributed_training(): def input_fn(): # return dataset model = … optimizer = … model.compile(…) rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy') keras_estimator = keras.model_to_estimator(…) tf.estimator.train_and_evaluate(keras_estimator, input_fn) notebook = hdfs.project_path()+'/Jupyter/Experiment/inc.ipynb' experiment.allreduce(distributed_training, name='inception', description='An inception example with hidden layers', versioned_resources=[notebook])
  • 37. The Data Layer (Foundations)
  • 38. Feeding Data to TensorFlow: Wrangling/Cleaning DataFrame; DataFrame; CPUs; Filesystem (.tfrecords, .csv, .parquet); Model Training; GPUs. Project Hydrogen: Barrier Execution mode in Spark: JIRA: SPARK-24374, SPARK-24723, SPARK-24579
  • 39. Filesystems are not good enough. Uber on Petastorm: “[Using files] is hard to implement at large scale, especially using modern distributed file systems such as HDFS and S3 (these systems are typically optimized for fast reads of large chunks of data).” https://eng.uber.com/petastorm/
  • 40. PetaStorm: Read Parquet directly into TensorFlow: with Reader('hdfs://myhadoop/dataset.parquet') as reader: dataset = make_petastorm_dataset(reader) iterator = dataset.make_one_shot_iterator() tensor = iterator.get_next() with tf.Session() as sess: sample = sess.run(tensor) print(sample.id)
  • 41. NVMe Disks – Game Changer • HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports orders of magnitude faster random disk I/O. • Can we support faster random disk I/O with HDFS? – Yes with HopsFS.
  • 42. Small files on NVMe • At Spotify’s HDFS: – 33% of files < 64KB in size – 42% of operations are on files < 16KB in size *Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
  • 43. HopsFS – NVMe Performance • HDFS with Distributed Metadata – Winner IEEE Scale Prize 2017 • Small files stored replicated in the metadata layer on NVMe disks* – Read 10s of 1000s of images/second from HopsFS *Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
  • 45. Model Serving on Kubernetes
  • 46. Model Serving: Hopsworks REST API; External Apps; Internal Apps; Logging; Load Balancer; Model Serving Containers; Streaming; Notifications; Retrain. Features: • Canary • Multiple Models • Scale-Out/In. Frameworks: – TensorFlow Serving – MLeap for Spark – scikit-learn
  • 47. Orchestrating ML Pipelines with Airflow: Airflow; Spark ETL; Dist Training TensorFlow; Test & Optimize; Kubernetes ModelServing; SparkStreaming Monitoring
  • 48. Summary • The future of Deep Learning is Distributed https://www.oreilly.com/ideas/distributed-tensorflow • Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs. hopshadoop logicalclocks www.hops.io www.logicalclocks.com