Distributed Deep Learning (DDL) with HopsML
RISE Machine Learning Study Group
Kim Hammar
kim@logicalclocks.com
November 29, 2018
Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 1 / 20
Outline
1 Distributed Deep Learning (DDL) Theory
2 HopsML: Distributed Deep Learning in Practice
3 Use-Case of DDL: Anti-Money-Laundering
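[Figure: Distributed Computing and Deep Learning, the latter drawn as a small feed-forward network with inputs x0,1 to x0,3, hidden units x1,1 to x1,3, biases b0 and b1, and output ŷ]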
Why Combine the two?
More productive Data Science [1]
Unreasonable effectiveness of data [2]
To achieve state-of-the-art results [3]
[1] Alex Sergeev and Mike Del Balso. "Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow". 2017. URL: https://eng.uber.com/horovod/ (accessed 2018-11-24).
[2] Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
[3] Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231. URL: http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf.
What is Distributed Deep Learning?
[Figure: Distributed Deep Learning shown in relation to HPC, Deep Learning, and Systems]
Model Parallelism
[Figure: one neural network (inputs x0,1 to x0,3, hidden layers, output ŷ) partitioned across Machine 1, Machine 2, and Machine 3]
Data Parallelism
[Figure: data-parallel workers W, each holding a copy of the same network (inputs x0,1 to x0,3, output ŷ), placed above the data partitions P = p1, p2, ..., pn]
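As a minimal sketch of the picture above, the following NumPy code assumes a linear model trained with synchronous SGD (both are illustrative choices, not taken from the slides). Every worker computes a gradient on its own partition, and the averaged gradient is applied once to the shared parameters. The names `local_gradient` and `data_parallel_step` are hypothetical.

```python
import numpy as np

def local_gradient(theta, X, y):
    """Gradient of the mean squared error of a linear model on one partition."""
    residual = X @ theta - y
    return 2.0 * X.T @ residual / len(y)

def data_parallel_step(theta, partitions, lr=0.1):
    """One synchronous step: each worker computes a gradient on its own
    partition, then the gradients are averaged and applied once."""
    grads = [local_gradient(theta, X, y) for X, y in partitions]
    return theta - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])   # assumed ground truth for the demo
X = rng.normal(size=(400, 3))
y = X @ true_theta

# Split the dataset into n = 4 equally sized partitions p1..p4.
partitions = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

theta = np.zeros(3)
for _ in range(200):
    theta = data_parallel_step(theta, partitions)
```

With equally sized partitions, the averaged gradient equals the gradient over the full dataset, so the workers only change how the work is divided, not the update itself.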
When to use Model Parallel and Data Parallel?
How large are your model parameters θ compared with GPU memory?
If size(θ) > size(GPU), you have to use model parallelism.
If your model fits on a single GPU ⟹ in 99.999% of cases you want to use
data parallelism to reduce training time.
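The rule of thumb above can be sketched as a small helper. This is an illustration only: `choose_parallelism` is a hypothetical name, and the check counts parameter memory alone (float32 assumed), whereas activations, gradients, and optimizer state also need GPU memory in practice.

```python
def choose_parallelism(num_params, gpu_memory_bytes, bytes_per_param=4):
    """Apply the rule of thumb: model parallelism only when the parameters
    alone do not fit on one GPU, otherwise data parallelism."""
    param_bytes = num_params * bytes_per_param
    if param_bytes > gpu_memory_bytes:
        return "model parallelism"
    return "data parallelism"

GIB = 1024 ** 3
small = choose_parallelism(25_000_000, 16 * GIB)       # about 100 MB of float32 parameters
large = choose_parallelism(10_000_000_000, 16 * GIB)   # about 40 GB of float32 parameters
print(small, "/", large)
```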
Parameter Server Architecture
[Figure: a parameter server ps above the data-parallel workers W = e1, e2, ..., en, which sit above the data partitions P = p1, p2, ..., pn. Arrow labels: broadcast parameters, upload gradients, local training, read data]
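The four arrows in the figure (broadcast parameters, read data, local training, upload gradients) can be traced in a single-process simulation. This is a sketch under stated assumptions: a linear model, synchronous updates, and a hypothetical `ParameterServer` class that is not part of any library.

```python
import numpy as np

class ParameterServer:
    """Holds the global parameters and applies averaged worker gradients."""
    def __init__(self, dim, lr=0.1):
        self.theta = np.zeros(dim)
        self.lr = lr

    def broadcast(self):
        # Each worker receives its own copy of the current parameters.
        return self.theta.copy()

    def apply(self, gradients):
        # Synchronous update with the mean of the uploaded gradients.
        self.theta -= self.lr * np.mean(gradients, axis=0)

def worker_gradient(theta, X, y):
    """Local training step: mean-squared-error gradient on one partition."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(1)
true_theta = np.array([1.0, -2.0])        # assumed ground truth for the demo
X = rng.normal(size=(300, 2))
y = X @ true_theta
partitions = list(zip(np.array_split(X, 3), np.array_split(y, 3)))  # p1..p3

ps = ParameterServer(dim=2)
for _ in range(200):
    uploads = []
    for X_i, y_i in partitions:                              # workers e1..e3
        theta_i = ps.broadcast()                             # broadcast parameters
        uploads.append(worker_gradient(theta_i, X_i, y_i))   # read data, local training
    ps.apply(uploads)                                        # upload gradients
```

All traffic passes through `ps`, which is why a single parameter server can become the bottleneck as the number of workers grows.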
Ring-All-Reduce Architecture
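[Figure: four workers e1, e2, e3, e4 connected in a ring, each with its own data partition p1, p2, p3, p4. Arrow labels: broadcast parameters, upload gradients, local training, read data]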
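To make the ring exchange concrete, here is a single-process simulation of ring all-reduce (sum) in NumPy. It is a sketch of the textbook two-phase algorithm (scatter-reduce followed by all-gather), not the code used by HopsML, Horovod, or TensorFlow; `ring_all_reduce` is a hypothetical name.

```python
import numpy as np

def ring_all_reduce(vectors):
    """Simulate ring all-reduce (sum) over equally long vectors, one per worker."""
    n = len(vectors)
    # Each worker copies its vector and splits it into n chunks.
    chunks = [np.array_split(np.array(v, dtype=float), n) for v in vectors]

    # Phase 1, scatter-reduce: in each step worker i sends one chunk to worker
    # i + 1, which adds it to its own. After n - 1 steps worker i holds the
    # full sum of chunk (i + 1) mod n.
    for step in range(n - 1):
        sent = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                for i in range(n)]
        for sender, idx, data in sent:
            chunks[(sender + 1) % n][idx] += data

    # Phase 2, all-gather: the completed chunks travel once around the ring,
    # overwriting the partial sums on each worker.
    for step in range(n - 1):
        sent = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                for i in range(n)]
        for sender, idx, data in sent:
            chunks[(sender + 1) % n][idx] = data

    return [np.concatenate(c) for c in chunks]

# Gradients held by workers e1..e4 (illustrative values).
grads = [np.full(8, float(i + 1)) for i in range(4)]
reduced = ring_all_reduce(grads)
```

Every worker ends up with the same summed gradient; dividing by the number of workers gives the average used for the update. No worker plays a central role, and each one only talks to its two ring neighbours.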
When to use Parameter-Server and when to use Ring-All-Reduce?
Ring-all-reduce scales better ⟹ generally prefer ring-all-reduce [4]
[4] Alex Sergeev and Mike Del Balso. "Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow". 2017. URL: https://eng.uber.com/horovod/ (accessed 2018-11-24).
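One way to see the scaling difference is a simple bandwidth count per training step. The sketch below assumes a single, unsharded parameter server and ignores latency, so it is a rough model rather than a benchmark; the function names and the 100 MB model size are hypothetical.

```python
def ring_bytes_sent_per_worker(n_workers, model_bytes):
    """Ring all-reduce: 2 * (n - 1) steps, each moving model_bytes / n."""
    return 2 * (n_workers - 1) * model_bytes / n_workers

def ps_bytes_through_server(n_workers, model_bytes):
    """Single parameter server: receives one gradient from, and sends one
    parameter copy to, every worker."""
    return 2 * n_workers * model_bytes

model_bytes = 100 * 10**6  # hypothetical 100 MB model
for n in (2, 8, 32):
    print(n,
          ring_bytes_sent_per_worker(n, model_bytes),
          ps_bytes_through_server(n, model_bytes))
```

In this model the per-worker traffic of the ring stays below twice the model size however many workers are added, while the traffic through the single server grows linearly with the number of workers.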
How to get started?
ICE (RISE SICS NORTH) provides the hardware that you need
  GPU machines for training
  CPU machines for data prep
  Disks for storing large datasets
HopsML provides the ML infrastructure that you need
  Fast distributed file system
  Spark jobs and notebooks for data prep
  Framework for reproducible and versioned parallel experiments
  Framework for distributed training
  Framework for monitoring training
  Support for auto-scaling model serving
  Feature store (Soon!)
Hopsworks: UI-driven front-end to the ML infrastructure
Python-First API-powered workflow
Write your regular TensorFlow/Python/PyTorch/Keras code and put it in a
function, for example one called collective_all_reduce_mnist. You can
then create a reproducible experiment using many GPUs and
collective-all-reduce as follows:
from hops import experiment
from hops import hdfs

# Notebook to version together with the experiment
notebook = hdfs.project_path() + \
    "Jupyter/Distributed_Training/collective_allreduce_strategy/mnist.ipynb"

# collective_all_reduce_mnist is the training function defined beforehand
experiment.collective_all_reduce(
    collective_all_reduce_mnist,
    name='mnist estimator',
    description='A minimal mnist example with two hidden layers',
    versioned_resources=[notebook],
    local_logdir=True)
Single-GPU Training on Hops
[Figure: Spark Driver and HopsFS, with arrows labeled Read Data and Write Results]
Parallel Experiments on Hops
Distributed Hyperparameter Search With GPUs
[Figure: Spark Driver, HopsFS, and executors E1 to E6, with arrows labeled read data and write results]
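The idea of running one trial per executor can be sketched with the Python standard library. This is not the hops API: the threads below merely stand in for the executors E1 to E6, and `train` is a hypothetical placeholder that returns a synthetic score instead of training a model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train(learning_rate, dropout):
    """Hypothetical stand-in for a training run; returns a synthetic
    validation score that peaks at learning_rate=0.01, dropout=0.3."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(dropout - 0.3)

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.1, 0.3],
}
# One trial per hyperparameter combination: 3 * 2 = 6 trials.
trials = [dict(zip(grid, values)) for values in product(*grid.values())]

# Six threads stand in for the executors E1..E6, each running one trial.
with ThreadPoolExecutor(max_workers=6) as pool:
    scores = list(pool.map(lambda params: train(**params), trials))

best_score, best_params = max(zip(scores, trials), key=lambda pair: pair[0])
print(best_params, best_score)
```

Because the trials are independent, they need no communication with each other; each one only reads its data and writes its result.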
Multi-GPU Training on Hops
[Figure: Spark Driver and HopsFS, with arrows labeled Read Data and Write Results; GPUs arranged in a logical ring for ring-all-reduce training, linked by send/receive arrows]
Distributed GPU Training on Hops
[Figure: Spark Driver, HopsFS, and executors E1 to E4, with arrows labeled read data and write results]
Model Serving on Hops
Register at hops.site; email kim@logicalclocks.com if your
registration is not approved
Try out the deep learning tour on Hopsworks
Example code:
https://github.com/logicalclocks/hops-examples
Look at the docs: https://www.hops.io/
If you get stuck, ask on Gitter:
https://gitter.im/hopshadoop/hopsworks
DEMO

 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 

Kim Hammar - Distributed Deep Learning - RISE Learning Machines Meetup

  • 1. Distributed Deep Learning (DDL) with HopsML RISE Machine Learning Study Group Kim Hammar kim@logicalclocks.com November 29, 2018 Kim Hammar (Logical Clocks) DDL on Hops November 29, 2018 1 / 20
  • 2. Outline: (1) Distributed Deep Learning (DDL) Theory; (2) HopsML: Distributed Deep Learning in Practice; (3) Use-Case of DDL: Anti-Money-Laundering
  • 3. Distributed Computing and Deep Learning: Why Combine the two? [Figure: neural network diagram]
      • More productive Data Science [1]
      • Unreasonable effectiveness of data [2]
      • To achieve state-of-the-art results [3]
    [1] Alex Sergeev and Mike Del Balso. Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow. 2017. URL: https://eng.uber.com/horovod/ (accessed 2018-11-24).
    [2] Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
    [3] Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231. URL: http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf.
  • 4. What is Distributed Deep Learning? [Figure labels: HPC, Deep Learning, Systems]
  • 6. Data Parallelism [Figure: data-parallel workers W, each holding a copy of the neural network, and data partitions P = p1, p2, ..., pn]
  • 7. When to use Model Parallel and Data Parallel? How big are your model parameters θ compared with GPU memory? If size(θ) > size(gpu), you have to use model parallelism. If your model fits on a single GPU, then in 99.999% of cases you want to use data parallelism to reduce training time.
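The rule on this slide can be written down as a small check. This is a minimal sketch, not something from the slides: the function name and its parameters are hypothetical, and it counts only the parameters θ, ignoring activations, gradients, and optimizer state, which also take up GPU memory in practice.

```python
def choose_parallelism(num_params: int, bytes_per_param: int, gpu_memory_bytes: int) -> str:
    """Apply the slide's rule: size(theta) > size(gpu) means model parallelism."""
    # Assumption: model size is approximated by the raw parameter bytes only.
    model_bytes = num_params * bytes_per_param
    if model_bytes > gpu_memory_bytes:
        return "model parallelism"
    return "data parallelism"


if __name__ == "__main__":
    sixteen_gib = 16 * 1024 ** 3
    # 25 million float32 parameters (about 100 MB) fit easily on a 16 GiB GPU.
    print(choose_parallelism(25_000_000, 4, sixteen_gib))
    # 10 billion float32 parameters (about 40 GB) do not.
    print(choose_parallelism(10_000_000_000, 4, sixteen_gib))
```

A real decision would leave headroom for activations and optimizer state, so the threshold is best treated as an upper bound on what fits.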
  • 8. Parameter Server Architecture [Figure: a parameter server ps, data-parallel workers W = e1, e2, ..., en, and data partitions P = p1, p2, ..., pn. Arrow labels: broadcast parameters, upload gradients, local training, read data]
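The four steps named in the figure (read data, local training, upload gradients, broadcast parameters) can be illustrated with a toy simulation. This is a sketch under stated assumptions: it is synchronous, runs in a single process, and trains a hypothetical one-parameter model y_hat = w * x; none of the function names come from the slides or from HopsML.

```python
from typing import List, Tuple

Partition = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def local_gradient(w: float, partition: Partition) -> float:
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in partition) / len(partition)


def parameter_server_sgd(partitions: List[Partition], steps: int, lr: float) -> float:
    """Synchronous data-parallel SGD with a single simulated parameter server."""
    w = 0.0  # the parameter held by the server
    for _ in range(steps):
        # Broadcast parameters: every worker receives the current w,
        # reads its own partition, and computes a local gradient.
        grads = [local_gradient(w, p) for p in partitions]
        # Upload gradients: the server averages them and updates w.
        w -= lr * sum(grads) / len(grads)
    return w


if __name__ == "__main__":
    # Two workers, each with its own partition of data drawn from y = 3x.
    data = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    print(parameter_server_sgd(data, steps=50, lr=0.05))
```

In a real deployment the list comprehension would be replaced by workers on separate machines, and the server becomes the point through which all gradients and parameters flow.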
  • 9. Ring-All-Reduce Architecture [Figure: workers e1, e2, e3, e4 with data partitions p1, p2, p3, p4. Labels: broadcast parameters, upload gradients, local training, read data]
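The slide shows only the ring layout, so the following sketch fills in the commonly used two-phase ring all-reduce (scatter-reduce followed by all-gather) as a single-process simulation. The function name is hypothetical, and the sketch assumes the gradient vector splits evenly into one chunk per worker.

```python
from typing import List


def ring_all_reduce(vectors: List[List[float]]) -> List[List[float]]:
    """Sum equal-length vectors across workers arranged in a logical ring."""
    n = len(vectors)
    length = len(vectors[0])
    assert length % n == 0, "this sketch assumes the vector splits evenly into n chunks"
    size = length // n
    # chunks[i][c] is worker i's local copy of chunk c.
    chunks = [[list(v[c * size:(c + 1) * size]) for c in range(n)] for v in vectors]

    # Phase 1, scatter-reduce: in each step worker i sends one chunk to worker i+1,
    # which adds it to its own copy. After n-1 steps worker i holds the complete
    # sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sent = [(i, (i - step) % n, chunks[i][(i - step) % n][:]) for i in range(n)]
        for i, c, payload in sent:
            dst = (i + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], payload)]

    # Phase 2, all-gather: the completed chunks are passed around the ring,
    # each receiver overwriting its stale copy.
    for step in range(n - 1):
        sent = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n][:]) for i in range(n)]
        for i, c, payload in sent:
            chunks[(i + 1) % n][c] = payload

    # Flatten every worker's chunks back into a single vector.
    return [[x for chunk in worker for x in chunk] for worker in chunks]


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    for worker_result in ring_all_reduce(grads):
        print(worker_result)
```

Each worker only ever talks to its ring neighbour, which is why no single node has to handle traffic from all workers at once.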
  • 10. When to use Parameter-Server and when to use Ring-All-Reduce? Ring-all-reduce scales better, so generally prefer ring-all-reduce [4].
    [4] Alex Sergeev and Mike Del Balso. Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow. 2017. URL: https://eng.uber.com/horovod/ (accessed 2018-11-24).
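One common way to see why ring-all-reduce scales better is to count the bytes moved per training step. The sketch below is an illustration, not a measurement from the talk: it assumes a single parameter server, the chunked ring all-reduce in which each worker sends 2(n - 1) chunks of size model_bytes / n, and it ignores latency and compute. Both function names are hypothetical.

```python
def ring_bytes_sent_per_worker(n_workers: int, model_bytes: int) -> float:
    """Each worker sends 2 * (n - 1) chunks, each of size model_bytes / n."""
    return 2 * (n_workers - 1) * model_bytes / n_workers


def server_bytes_received_per_step(n_workers: int, model_bytes: int) -> int:
    """A single parameter server receives one full gradient from every worker."""
    return n_workers * model_bytes


if __name__ == "__main__":
    model_bytes = 100 * 1024 ** 2  # a hypothetical 100 MiB model
    for n in (2, 8, 64):
        ring = ring_bytes_sent_per_worker(n, model_bytes)
        server = server_bytes_received_per_step(n, model_bytes)
        print(n, ring, server)
```

Under these assumptions the per-worker traffic in the ring stays below twice the model size however many workers are added, while the traffic into a single server grows linearly with the number of workers.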
  • 11. How to get started?
      • ICE (RISE SICS NORTH) provides the hardware that you need:
          • GPU machines for training
          • CPU machines for data prep
          • Disks for storing large datasets
      • HopsML provides the ML infrastructure that you need:
          • Fast distributed file system
          • Spark jobs and notebooks for data prep
          • Framework for reproducible and versioned parallel experiments
          • Framework for distributed training
          • Framework for monitoring training
          • Support for auto-scaling model serving
          • Feature store (Soon!)
  • 12. Hopsworks: UI-driven front-end to the ML infrastructure
  • 13. Python-First API-powered workflow. Write your regular tensorflow/python/pytorch/keras code and put it in a function, for example called collective_all_reduce_mnist. You can then create a reproducible experiment using many GPUs and collective-all-reduce as follows:

        from hops import experiment
        from hops import hdfs

        # Notebook to version together with the experiment
        notebook = hdfs.project_path() + "Jupyter/Distributed_Training/collective_allreduce_strategy/mnist.ipynb"

        # Launch the user-defined training function on many GPUs
        experiment.collective_all_reduce(
            collective_all_reduce_mnist,
            name='mnist estimator',
            description='A minimal mnist example with two hidden layers',
            versioned_resources=[notebook],
            local_logdir=True)
  • 14. Single-GPU Training on Hops [Figure labels: HopsFS, Spark Driver; read data, write results]
  • 15. Parallel Experiments on Hops: Distributed Hyperparameter Search With GPUs [Figure labels: HopsFS, Spark Driver, E1 to E6; read data, write results]
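The idea of running one experiment per hyperparameter combination in parallel can be sketched with the Python standard library. This is not the Hops API: a thread pool stands in for the Spark executors on the slide, and `train` is a hypothetical placeholder that returns a made-up score instead of training a model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train(learning_rate: float, dropout: float) -> float:
    """Hypothetical stand-in for a training run; returns a made-up score."""
    return -(learning_rate - 0.01) ** 2 - (dropout - 0.5) ** 2


def grid_search(grid: dict, max_workers: int = 6):
    """Evaluate every combination in the grid in parallel and return the best one."""
    names = sorted(grid)
    combos = [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
    # Each combination is an independent experiment, so the runs can proceed
    # side by side without communicating with each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda kwargs: train(**kwargs), combos))
    best = max(range(len(combos)), key=scores.__getitem__)
    return combos[best], scores[best]


if __name__ == "__main__":
    grid = {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.3, 0.5]}
    best_params, best_score = grid_search(grid)
    print(best_params, best_score)
```

Because the runs are independent, this pattern needs no gradient exchange between workers, unlike the distributed training setups on the following slides.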
  • 16. Multi-GPU Training on Hops [Figure labels: HopsFS, Spark Driver; read data, write results, send/receive] GPUs arranged in a logical ring for ring-all-reduce training.
  • 17. Distributed GPU Training on Hops [Figure labels: HopsFS, Spark Driver, E1 to E4; read data, write results]
  • 18. Model Serving on Hops
  • 19. Next steps:
      • Register at hops.site; email kim@logicalclocks.com if your registration is not approved
      • Try out the deep learning tour on Hopsworks
      • Example code: https://github.com/logicalclocks/hops-examples
      • Look at the docs: https://www.hops.io/
      • If you get stuck, write on gitter: https://gitter.im/hopshadoop/hopsworks
      • DEMO