SlideShare a Scribd company logo
1 of 19
Download to read offline
Distributed Deep Learning (DDL) with HopsML
RISE Machine Learning Study Group
Kim Hammar
kim@logicalclocks.com
November 29, 2018
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 1 / 20
Outline
1 Distributed Deep Learning (DDL) Theory
2 HopsML: Distributed Deep Learning in Practice
3 Use-Case of DDL: Anti-Money-Laundering
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 2 / 20
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Distributed Computing Deep Learning
Why Combine the two?
More productive Data Science1
Unreasonable effectiveness of data2
To achieve state-of-the-art results3
1
Alex Sergeev and title = Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for
TensorFlow howpublished = https://eng.uber.com/horovod/ note = Accessed: 2018-11-24 Mike
Del Balso year=2017.
2
Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR
abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
3
Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing
Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231. URL:
http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf.
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 3 / 20
What is Distributed Deep Learning?
HPC
Deep Learning
Systems
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 4 / 20
Model Parallelism
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
b1
x1,1
x1,2
x1,3
b1
x1,1
x1,2
x1,3
b1
x1,1
x1,2
x1,3
b1
x1,1
x1,2
x1,3
ˆy
Machine 1 Machine 2 Machine 3
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 5 / 20
Data Parallelism
Data Parallel Workers W
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy . . .
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Data Partitions P p1 p2 . . . pn
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 6 / 20
When to use Model Parallel and Data Parallel?
How big is your model parameters θ vs
GPU memory? If size(θ) > size(gpu) you have to use model parallelism
If your model fits on a single GPU =⇒ in 99.999% you want to use
data parallelism to reduce training time
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 7 / 20
Parameter Server Architecture
Parameter Server ps d
Data Parallel Workers W e1 e2 . . . en
Data Partitions P p1 p2 . . . pn
Broadcast parameters
Upload gradients
Local training
Read data
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 8 / 20
Ring-All-Reduce Architecture
e1
e2
e3
e4
p1
p2
p3
p4
Broadcast parameters
Upload gradients
Local training
Read data
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 9 / 20
When to use Parameter-Server and when to use
Ring-All-Reduce?
Ring-all-reduce scales better =⇒ generally prefer ring-all-reduce4
4
Alex Sergeev and title = Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for
TensorFlow howpublished = https://eng.uber.com/horovod/ note = Accessed: 2018-11-24 Mike
Del Balso year=2017.
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 10 / 20
How to get started?
ICE (RISE SICS NORTH) provides the hardware that you need
GPU Machines for training 
CPU Machines for data prep 
Disks for storing large datasets 
HopsML provides the ML infrastructure that you need
Fast Distributed File System 
Spark-jobs and notebooks for data prep 
Framework for reproducible and versioned parallel experiments 
Framework for distributed training 
Framework for monitoring training 
Support for auto-scaling model serving 
Feature store (Soon!)
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 11 / 20
Hopsworks: UI-driven front-end to the ML infrastructure
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 12 / 20
Python-First API-powered workflow
Write your regular tensorflow/python/pytorch/keras code and put it in a
function, for example called collective_all_reduce_mnist, then you
can create a reproducible experiment using many GPUs and
collective-all-reduce as follows:
from hops import experiment
from hops import hdfs
notebook = hdfs.project_path() +
Jupyter/Distributed_Training/collective_allreduce_strategy/mnist.ipynb
experiment.collective_all_reduce(collective_all_reduce_mnist ,
name=’mnist estimator’,
description=’A minimal mnist example with two hidden layers’,
versioned_resources=[notebook], local_logdir=True)
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 13 / 20
Single-GPU Training on Hops
HopsFS
Spark Driver
Read Data
Write Results
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 14 / 20
Parallel Experiments on Hops
HopsFS
Spark Driver
Distributed Hyperparameter Search
With GPUs
E1 E2 E3 E4 E5 E6
read data
write results
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 15 / 20
Multi-GPU Training on Hops
HopsFS
Spark Driver
Read Data
Write Results
send/receiveGPUs aranged in
a logical ring for
ring-all-reduce training
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 16 / 20
Distributed GPU Training on Hops
HopsFS
Spark Driver
E1
E2
E3
E4
read data
write results
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 17 / 20
Model Serving on Hops
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 18 / 20
Register at hops.site, email: kim@logicalclocks.com if your
registration is not approved
Try out the deep learning tour on hopsworks
Example code:
https://github.com/logicalclocks/hops-examples
Look at the docs: https://www.hops.io/
If you get stuck, write on gitter:
https://gitter.im/hopshadoop/hopsworks
DEMO

More Related Content

What's hot

Evaluation of Caching Strategies Based on Access Statistics on Past Requests
Evaluation of Caching Strategies Based on Access Statistics on Past RequestsEvaluation of Caching Strategies Based on Access Statistics on Past Requests
Evaluation of Caching Strategies Based on Access Statistics on Past Requests
SmartenIT
 
Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
Pandey_G
 
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
T. E. BOGALE
 

What's hot (20)

20180722 pyro
20180722 pyro20180722 pyro
20180722 pyro
 
Day 3 plotting.pptx
Day 3   plotting.pptxDay 3   plotting.pptx
Day 3 plotting.pptx
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
 
Evaluation of Caching Strategies Based on Access Statistics on Past Requests
Evaluation of Caching Strategies Based on Access Statistics on Past RequestsEvaluation of Caching Strategies Based on Access Statistics on Past Requests
Evaluation of Caching Strategies Based on Access Statistics on Past Requests
 
Regularised Cross-Modal Hashing (SIGIR'15 Poster)
Regularised Cross-Modal Hashing (SIGIR'15 Poster)Regularised Cross-Modal Hashing (SIGIR'15 Poster)
Regularised Cross-Modal Hashing (SIGIR'15 Poster)
 
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
Analyzing Larger RasterData in a Jupyter Notebook with GeoPySpark on AWS - FO...
 
chapter - 6.ppt
chapter - 6.pptchapter - 6.ppt
chapter - 6.ppt
 
Redpoll
RedpollRedpoll
Redpoll
 
Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
 
Histogram of Image Colors
Histogram of Image ColorsHistogram of Image Colors
Histogram of Image Colors
 
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
R user-group-2011-09
R user-group-2011-09R user-group-2011-09
R user-group-2011-09
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Adaptive Channel Prediction, Beamforming and Scheduling Design for 5G V2I Net...
Adaptive Channel Prediction, Beamforming and Scheduling Design for 5G V2I Net...Adaptive Channel Prediction, Beamforming and Scheduling Design for 5G V2I Net...
Adaptive Channel Prediction, Beamforming and Scheduling Design for 5G V2I Net...
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
 
6. Implementation
6. Implementation6. Implementation
6. Implementation
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Effective machine learning_with_tpu
Effective machine learning_with_tpuEffective machine learning_with_tpu
Effective machine learning_with_tpu
 

Similar to Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup

Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
Ilya Grigorik
 

Similar to Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup (20)

Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
Kim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature StoresKim Hammar - Spotify ML Guild Meetup - Feature Stores
Kim Hammar - Spotify ML Guild Meetup - Feature Stores
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Machine learning at scale by Amy Unruh from Google
Machine learning at scale by  Amy Unruh from GoogleMachine learning at scale by  Amy Unruh from Google
Machine learning at scale by Amy Unruh from Google
 
Google Big Data Expo
Google Big Data ExpoGoogle Big Data Expo
Google Big Data Expo
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
BOS K8S Meetup - Finetuning LLama 2 Model on GKE.pdf
BOS K8S Meetup - Finetuning LLama 2 Model on GKE.pdfBOS K8S Meetup - Finetuning LLama 2 Model on GKE.pdf
BOS K8S Meetup - Finetuning LLama 2 Model on GKE.pdf
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
 
Streaming Random Forest Learning in Spark and StreamDM with Heitor Murilogome...
Streaming Random Forest Learning in Spark and StreamDM with Heitor Murilogome...Streaming Random Forest Learning in Spark and StreamDM with Heitor Murilogome...
Streaming Random Forest Learning in Spark and StreamDM with Heitor Murilogome...
 
Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland
 
End-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in FinanceEnd-to-End Platform Support for Distributed Deep Learning in Finance
End-to-End Platform Support for Distributed Deep Learning in Finance
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 

More from Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
Kim Hammar
 

More from Kim Hammar (20)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 

Recently uploaded

Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Recently uploaded (20)

Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup

  • 1. Distributed Deep Learning (DDL) with HopsML RISE Machine Learning Study Group Kim Hammar kim@logicalclocks.com November 29, 2018 Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 1 / 20
  • 2. Outline 1 Distributed Deep Learning (DDL) Theory 2 HopsML: Distributed Deep Learning in Practice 3 Use-Case of DDL: Anti-Money-Laundering Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 2 / 20
  • 3. b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Distributed Computing Deep Learning Why Combine the two? More productive Data Science1 Unreasonable effectiveness of data2 To achieve state-of-the-art results3 1 Alex Sergeev and title = Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow howpublished = https://eng.uber.com/horovod/ note = Accessed: 2018-11-24 Mike Del Balso year=2017. 2 Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968. 3 Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231. URL: http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf. Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 3 / 20
  • 4. What is Distributed Deep Learning? HPC Deep Learning Systems Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 4 / 20
  • 6. Data Parallelism Data Parallel Workers W b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy . . . b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Data Partitions P p1 p2 . . . pn Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 6 / 20
  • 7. When to use Model Parallel and Data Parallel? How big is your model parameters θ vs GPU memory? If size(θ) > size(gpu) you have to use model parallelism If your model fits on a single GPU =⇒ in 99.999% you want to use data parallelism to reduce training time Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 7 / 20
  • 8. Parameter Server Architecture Parameter Server ps d Data Parallel Workers W e1 e2 . . . en Data Partitions P p1 p2 . . . pn Broadcast parameters Upload gradients Local training Read data Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 8 / 20
  • 9. Ring-All-Reduce Architecture e1 e2 e3 e4 p1 p2 p3 p4 Broadcast parameters Upload gradients Local training Read data Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 9 / 20
  • 10. When to use Parameter-Server and when to use Ring-All-Reduce? Ring-all-reduce scales better =⇒ generally prefer ring-all-reduce4 4 Alex Sergeev and title = Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow howpublished = https://eng.uber.com/horovod/ note = Accessed: 2018-11-24 Mike Del Balso year=2017. Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 10 / 20
  • 11. How to get started? ICE (RISE SICS NORTH) provides the hardware that you need GPU Machines for training CPU Machines for data prep Disks for storing large datasets HopsML provides the ML infrastructure that you need Fast Distributed File System Spark-jobs and notebooks for data prep Framework for reproducible and versioned parallel experiments Framework for distributed training Framework for monitoring training Support for auto-scaling model serving Feature store (Soon!) Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 11 / 20
  • 12. Hopsworks: UI-driven front-end to the ML infrastructure Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 12 / 20
  • 13. Python-First API-powered workflow Write your regular tensorflow/python/pytorch/keras code and put it in a function, for example called collective_all_reduce_mnist, then you can create a reproducible experiment using many GPUs and collective-all-reduce as follows: from hops import experiment from hops import hdfs notebook = hdfs.project_path() + Jupyter/Distributed_Training/collective_allreduce_strategy/mnist.ipynb experiment.collective_all_reduce(collective_all_reduce_mnist , name=’mnist estimator’, description=’A minimal mnist example with two hidden layers’, versioned_resources=[notebook], local_logdir=True) Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 13 / 20
  • 14. Single-GPU Training on Hops HopsFS Spark Driver Read Data Write Results Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 14 / 20
  • 15. Parallel Experiments on Hops HopsFS Spark Driver Distributed Hyperparameter Search With GPUs E1 E2 E3 E4 E5 E6 read data write results Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 15 / 20
  • 16. Multi-GPU Training on Hops HopsFS Spark Driver Read Data Write Results send/receiveGPUs aranged in a logical ring for ring-all-reduce training Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 16 / 20
  • 17. Distributed GPU Training on Hops HopsFS Spark Driver E1 E2 E3 E4 read data write results Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 17 / 20
  • 18. Model Serving on Hops Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 18 / 20
  • 19. Register at hops.site, email: kim@logicalclocks.com if your registration is not approved Try out the deep learning tour on hopsworks Example code: https://github.com/logicalclocks/hops-examples Look at the docs: https://www.hops.io/ If you get stuck, write on gitter: https://gitter.im/hopshadoop/hopsworks DEMO