SlideShare a Scribd company logo
@ODSC
Distributed
Deep
Learning on
Hops Robin Andersson
Fabio Buso
RISE SICS AB
Logical Clocks AB
London | October 12th
-14th
2017
Please register on odsc.hops.site
Big Data and AI
3
Why you are here
4From: https:// research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf
Deep Learning with GPUs (on Hops)
5
Separate Clusters for Big Data and ML
6
*Slide from: TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters, YAHOO!
7
I need estimates for the
ROI on these candidate
features in our product
We are on it. Need
to first sync up with
IT and engineering
Data Science in Enterprises Today
7
Data Science Team
CTO
88
IT
Collaboration Overhead is High
Prepare Dataset
samples for Data
Science
Data Science Team Data Engineering
We need access
to these
Datasets
DataLake
Ok
1. Update
Access
Rights
GPU Cluster
2. Copy Dataset Samples
(some time later)
3. Run experiments
99
How it should be
IT
Data Science Data Engineering
Here’s someone
who can help you
out
I need help to
work on a project
for the CTO
Project
Conda Env, CPU/Storage Quotas, Self-Service, GDPR
Kafka
Topics
DataLake
GPU Cluster
Elasticsearch
HopsWorks Data Platform
10
HopsWorks
11
Kafka Topic
Project X Project Y
Project Data
HopsFS
12
Open Source fork of
Apache HDFS
16x faster than HDFS
37x more capacity than
HDFS
SSL/TLS instead of
Kerberos
Scale Challenge Winner (2017)
https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
HopsYARN GPUs
13
Native GPU support in YARN - world first
Implications
- Schedule GPUs just like memory or CPU
- Exclusive allocation (no GPU-sharing)
- Distributed, scale-out Machine Learning
TensorFlow first-class support in Hops
14
Run in
Spark Executor
TensorFlow code
0.003 learning rate, 0.3 dropout0.001 learning rate, 0.5 dropout
0.002 learning rate, 0.7 dropout
Spark Executor
TensorFlow code
Spark Executor
TensorFlow code
HopsUtil
Library for launching TensorFlow jobs
Manages the TensorBoard lifecycle
Helper Functions for Spark/Kafka/HDFS/etc
15
HopsUtil - Read data
from hopsutil import hdfs
dataset=path.join(hdfs.project_path(),‘Resources/
mnist/tfr/train’)
files=tf.gfile.Glob(path.join(dataset,‘part-*’))
file_queue=tf.train.string_input_producer(files,
… )
16
17
HopsUtil - initialize Pydoop HDFS API
Pydoop HDFS API is a rich api that provides operations such as
- Connecting to an HDFS instance
- General file operations (create, read, write)
- Get information on files, directories, fs
Connect to HopsFS using HopsUtil:
from hopsutil import hdfs
pydoop_handle = hdfs.get()
17
HopsUtil - TensorBoard
from hopsutil import tensorboard
[...]
logdir = tensorboard.logdir()
sv = tf.train.Supervisor(is_chief=True,
logdir=logdir, [...], save_model_secs=60)
18
HopsUtil - Hyperparameter searching
from hopsutil import tflauncher
def training(learning_rate, dropout):
[....]
params = {‘learning_rate': [0.001, 0.002, 0.003],
'dropout': [0.3, 0.5, 0.7]}
tflauncher.launch(spark, training, params)
19
HopsUtil - Logging
from hopsutil import hdfs
[...]
while not sv.should_stop() and step < steps:
hdfs.log(sess.run(accuracy))
[...]
20
DEMO TIME!
TensorFlow tour on HopsWorks
21
22
How to get started
23
How to get started (2)
24
How to get started (3)
25
TensorBoard
26
Dela - Search for interesting datasets
27
Dela - Import a Dataset
Dela
28
p2p network of Hops clusters
Find and share interesting datasets
Exploits unused bandwidth and backs off in case of
network traffic
The Challenge
29
http://timdettmers.com/2017/08/31/deep-learning-research-directions
Experiment Time and Research Productivity
● Minutes, Hours:
○ Interactive analysis!
● 1-4 days
○ Interactivity replaced by
many parallel experiments
● 1-4 weeks
○ High value experiments only
● >1 month
○ Don’t even try!
30
Solution: Go distributed
31
State-of-the-Art in GPU Hardware
32
Nvidia DGX-1
33
SingleRoot Commodity GPU Cluster Computing
34
The budget side
35
Commodity Server*
➔ 10 Nvidia GTX 1080Ti
◆ 11 GB Memory
➔ 256 GB Ram
➔ 2 Intel Xeon CPUs
➔ Infiniband
➔ SingleRoot PCI Complex
10 x Commodity Server =
150K Euro
Nvidia DGX-1
➔ 8 Nvidia Tesla V100
◆ 16 GB Memory
➔ 512 GB Ram
➔ 2 Intel Xeon CPUs
➔ Infiniband
➔ NVLink
Price per DGX-1 = 150K Euro
*www.servethehome.com/single-root-or-dual-root-for-deep-learning-gpu-to-gpu-systems/
36
Distributed TensorFlow
Distribute TensorFlow graph
Workers / Parameter server
Synchronous / Asynchronous
Model / Data parallelism
Problems:
- Clusterspec
- Manually starting process
37
Introducing TensorFlowOnSpark by YAHOO!
Wrapper for Distributed TensorFlow
- Creates clusterspec automatically!
- Runs on a Hadoop/Spark cluster
- Starts the workers/parameter servers automatically
- First attempt at “scheduling” GPUs
- Simplifies the programming model
- Manages TensorBoard
- “Migrate all existing TF programs with < 10 lines of code”
37
TensorFlowOnSpark architecture
38
HopsFs
Spark Driver
Spark Executor
Parameter
Server
Spark Executor
Worker
Spark Executor
Worker
Scaling TensorFlowOnSpark
39
Near linear scaling
up to 8 workers
*Slide from: TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters, YAHOO!
TensorFlowOnSpark on Hops
40
41
Our improved TensorFlowOnSpark - 1
Problem:
Use RAM (1GPU = 27GB RAM) as a proxy to ‘schedule’
GPUs.
Solution:
Hops provides GPU scheduling!
41
42
Our improved TensorFlowOnSpark - 2
Problem:
A worker will wait until GPUs become available,
potentially forever!
Solution:
GPU scheduling ensures that the GPU is only allocated
for that particular worker.
42
43
Our improved TensorFlowOnSpark - 3
Problem:
Each parameter server allocates 1 GPU, this is a waste!
Solution:
Only workers may use GPUs
43
44
Conversion guide: TensorFlowOnSpark
TFCluster.run(spark, training_fun, num_executors, num_ps…)
Add PySpark and TensorFlowOnSpark imports
Create your own FileWriter
Replace tf.train.Server() with TFNode.start_cluster_server()
Full conversion guide for Distributed TensorFlow to TensorFlowOnSpark
https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide
44
DEMO TIME!
Distributed TF on Spark
45
Distributed Stochastic Gradient
Descent
46
SDG with Data Parallelism (Single Host)
47
Facebook: Scaling Synchronous SDG
June 2017: training time on
ImageNet from 2 weeks to 1
hour
➔ ~90% scaling efficiency
going from 8 to 256 GPUs
Learning rate heuristic/
Warm up phase/ Large
batches
48Paper: https:// research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf
All-Reduce
49
N GPUs, K parameters
Comm. cost: 2(N-1) * K/N
Independent from # GPUs
overlap communication and
computation
Drawback: Synchronous
communication
From: http://research.baidu.com/bringing-hpc-techniques-deep-learning/
Baidu All-Reduce - Performance scaling
50
From: http://research.baidu.com/bringing-hpc-techniques-deep-learning/
Horovod - Better than Baidu All-Reduce?
51
Fork of Baidu All-Reduce
Improvements
1. Replaced Baidu ring-allreduce with NVIDIA NCCL
2. Tensor Fusion
3. Support for larger models
4. Pip package
5. Horovod Timeline
5252
Migrating existing code to run on Horovod
1. Run hvd.init()
2. Pin a server GPU to be used by this process using
config.gpu_options.visible_device_list. Local rank
maps to unique GPU for the process.
3. Wrap optimizer in hvd.DistributedOptimizer.
4. Add hvd.BroadcastGlobalVariablesHook(0) to
broadcast initial variable states from rank 0 to all other
processes.
Horovod/Baidu AllReduce
53
Provide as a service on HopsWorks
Integration of All-Reduce with a Hadoop cluster
- Use YARN to schedule GPUs
Scheduling of homogeneous GPUs and network
- YARN supports node labels
HopsFS authentication/authorization
TensorBoard lifecycle management as in HopsUtil
The team
Active contributors:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos
Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson,
August Bonds, Filotas Siskos, Mahmoud Hamed.
Past contributors:
Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca,
Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis
Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana
Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid
Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Aruna Kumari Yedurupaka, Tobias Johansson,
Roberto Bampi, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid.
54
www.hops.io
github.com/hopshadoop
@hopshadoop
55

More Related Content

What's hot

Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUShohei Hido
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsBrendan Gregg
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisBrendan Gregg
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodBrendan Gregg
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizationsBrendan Gregg
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesKoan-Sin Tan
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰Wayne Chen
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: IntroductionBrendan Gregg
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internalHyunghun Cho
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaBrendan Gregg
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Chainer ui v0.3 and imagereport
Chainer ui v0.3 and imagereportChainer ui v0.3 and imagereport
Chainer ui v0.3 and imagereportPreferred Networks
 

What's hot (20)

Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
MeetBSD2014 Performance Analysis
MeetBSD2014 Performance AnalysisMeetBSD2014 Performance Analysis
MeetBSD2014 Performance Analysis
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
 
DTrace Topics: Introduction
DTrace Topics: IntroductionDTrace Topics: Introduction
DTrace Topics: Introduction
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internal
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versa
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Chainer ui v0.3 and imagereport
Chainer ui v0.3 and imagereportChainer ui v0.3 and imagereport
Chainer ui v0.3 and imagereport
 

Viewers also liked

Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopJim Dowling
 
Multi-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksMulti-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksJim Dowling
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Flink Forward
 

Viewers also liked (6)

Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
 
Multi-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksMulti-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on Hopsworks
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
 

Similar to Odsc workshop - Distributed Tensorflow on Hops

Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsAltoros
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceData Works MD
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchDirk Petersen
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...Chris Fregly
 
Capital onehadoopintro
Capital onehadoopintroCapital onehadoopintro
Capital onehadoopintroDoug Chang
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...E-Commerce Brasil
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringKeith Kraus
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformWangda Tan
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 

Similar to Odsc workshop - Distributed Tensorflow on Hops (20)

Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
 
Capital onehadoopintro
Capital onehadoopintroCapital onehadoopintro
Capital onehadoopintro
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfS51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdf
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Apache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning PlatformApache Submarine: Unified Machine Learning Platform
Apache Submarine: Unified Machine Learning Platform
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 

More from Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines Jim Dowling
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyJim Dowling
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 

More from Jim Dowling (20)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
 

Recently uploaded

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 

Odsc workshop - Distributed Tensorflow on Hops

  • 1. @ODSC Distributed Deep Learning on Hops Robin Andersson Fabio Buso RISE SICS AB Logical Clocks AB London | October 12th -14th 2017
  • 2. Please register on odsc.hops.site
  • 4. Why you are here 4From: https:// research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf
  • 5. Deep Learning with GPUs (on Hops) 5
  • 6. Separate Clusters for Big Data and ML 6 *Slide from: TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters, YAHOO!
  • 7. 7 I need estimates for the ROI on these candidate features in our product We are on it. Need to first sync up with IT and engineering Data Science in Enterprises Today 7 Data Science Team CTO
  • 8. 88 IT Collaboration Overhead is High Prepare Dataset samples for Data Science Data Science Team Data Engineering We need access to these Datasets DataLake Ok 1. Update Access Rights GPU Cluster 2. Copy Dataset Samples (some time later) 3. Run experiments
  • 9. 99 How it should be IT Data Science Data Engineering Here’s someone who can help you out I need help to work on a project for the CTO Project Conda Env, CPU/Storage Quotas, Self-Service, GDPR Kafka Topics DataLake GPU Cluster Elasticsearch
  • 11. HopsWorks 11 Kafka Topic Project X Project Y Project Data
  • 12. HopsFS 12 Open Source fork of Apache HDFS 16x faster than HDFS 37x more capacity than HDFS SSL/TLS instead of Kerberos Scale Challenge Winner (2017) https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
  • 13. HopsYARN GPUs 13 Native GPU support in YARN - world first Implications - Schedule GPUs just like memory or CPU - Exclusive allocation (no GPU-sharing) - Distributed, scale-out Machine Learning
  • 14. TensorFlow first-class support in Hops 14 Run in Spark Executor TensorFlow code 0.003 learning rate, 0.3 dropout0.001 learning rate, 0.5 dropout 0.002 learning rate, 0.7 dropout Spark Executor TensorFlow code Spark Executor TensorFlow code
  • 15. HopsUtil Library for launching TensorFlow jobs Manages the TensorBoard lifecycle Helper Functions for Spark/Kafka/HDFS/etc 15
  • 16. HopsUtil - Read data from hopsutil import hdfs dataset=path.join(hdfs.project_path(),‘Resources/ mnist/tfr/train’) files=tf.gfile.Glob(path.join(dataset,‘part-*’)) file_queue=tf.train.string_input_producer(files, … ) 16
  • 17. 17 HopsUtil - initialize Pydoop HDFS API Pydoop HDFS API is a rich api that provides operations such as - Connecting to an HDFS instance - General file operations (create, read, write) - Get information on files, directories, fs Connect to HopsFS using HopsUtil: from hopsutil import hdfs pydoop_handle = hdfs.get() 17
  • 18. HopsUtil - TensorBoard from hopsutil import tensorboard [...] logdir = tensorboard.logdir() sv = tf.train.Supervisor(is_chief=True, logdir=logdir, [...], save_model_secs=60) 18
  • 19. HopsUtil - Hyperparameter searching from hopsutil import tflauncher def training(learning_rate, dropout): [....] params = {‘learning_rate': [0.001, 0.002, 0.003], 'dropout': [0.3, 0.5, 0.7]} tflauncher.launch(spark, training, params) 19
  • 20. HopsUtil - Logging from hopsutil import hdfs [...] while not sv.should_stop() and step < steps: hdfs.log(sess.run(accuracy)) [...] 20
  • 21. DEMO TIME! TensorFlow tour on HopsWorks 21
  • 22. 22 How to get started
  • 23. 23 How to get started (2)
  • 24. 24 How to get started (3)
  • 26. 26 Dela - Search for interesting datasets
  • 27. 27 Dela - Import a Dataset
  • 28. Dela 28 p2p network of Hops clusters Find and share interesting datasets Exploits unused bandwidth and backs off in case of network traffic
  • 30. Experiment Time and Research Productivity ● Minutes, Hours: ○ Interactive analysis! ● 1-4 days ○ Interactivity replaced by many parallel experiments ● 1-4 weeks ○ High value experiments only ● >1 month ○ Don’t even try! 30
  • 32. State-of-the-Art in GPU Hardware 32
  • 34. SingleRoot Commodity GPU Cluster Computing 34
  • 35. The budget side 35 Commodity Server* ➔ 10 Nvidia GTX 1080Ti ◆ 11 GB Memory ➔ 256 GB Ram ➔ 2 Intel Xeon CPUs ➔ Infiniband ➔ SingleRoot PCI Complex 10 x Commodity Server = 150K Euro Nvidia DGX-1 ➔ 8 Nvidia Tesla V100 ◆ 16 GB Memory ➔ 512 GB Ram ➔ 2 Intel Xeon CPUs ➔ Infiniband ➔ NVLink Price per DGX-1 = 150K Euro *www.servethehome.com/single-root-or-dual-root-for-deep-learning-gpu-to-gpu-systems/
  • 36. 36 Distributed TensorFlow Distribute TensorFlow graph Workers / Parameter server Synchronous / Asynchronous Model / Data parallelism Problems: - Clusterspec - Manually starting process
  • 37. 37 Introducing TensorFlowOnSpark by YAHOO! Wrapper for Distributed TensorFlow - Creates clusterspec automatically! - Runs on a Hadoop/Spark cluster - Starts the workers/parameter servers automatically - First attempt at “scheduling” GPUs - Simplifies the programming model - Manages TensorBoard - “Migrate all existing TF programs with < 10 lines of code” 37
  • 38. TensorFlowOnSpark architecture 38 HopsFs Spark Driver Spark Executor Parameter Server Spark Executor Worker Spark Executor Worker
  • 39. Scaling TensorFlowOnSpark 39 Near linear scaling up to 8 workers *Slide from: TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters, YAHOO!
  • 41. 41 Our improved TensorFlowOnSpark - 1 Problem: Use RAM (1GPU = 27GB RAM) as a proxy to ‘schedule’ GPUs. Solution: Hops provides GPU scheduling! 41
  • 42. 42 Our improved TensorFlowOnSpark - 2 Problem: A worker will wait until GPUs become available, potentially forever! Solution: GPU scheduling ensures that the GPU is only allocated for that particular worker. 42
  • 43. 43 Our improved TensorFlowOnSpark - 3 Problem: Each parameter server allocates 1 GPU, this is a waste! Solution: Only workers may use GPUs 43
  • 44. 44 Conversion guide: TensorFlowOnSpark TFCluster.run(spark, training_fun, num_executors, num_ps…) Add PySpark and TensorFlowOnSpark imports Create your own FileWriter Replace tf.train.Server() with TFNode.start_cluster_server() Full conversion guide for Distributed TensorFlow to TensorFlowOnSpark https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide 44
  • 47. SDG with Data Parallelism (Single Host) 47
  • 48. Facebook: Scaling Synchronous SDG June 2017: training time on ImageNet from 2 weeks to 1 hour ➔ ~90% scaling efficiency going from 8 to 256 GPUs Learning rate heuristic/ Warm up phase/ Large batches 48Paper: https:// research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf
  • 49. All-Reduce 49 N GPUs, K parameters Comm. cost: 2(N-1) * K/N Independent from # GPUs overlap communication and computation Drawback: Synchronous communication From: http://research.baidu.com/bringing-hpc-techniques-deep-learning/
  • 50. Baidu All-Reduce - Performance scaling 50 From: http://research.baidu.com/bringing-hpc-techniques-deep-learning/
  • 51. Horovod - Better than Baidu All-Reduce? 51 Fork of Baidu All-Reduce Improvements 1. Replaced Baidu ring-allreduce with NVIDIA NCCL 2. Tensor Fusion 3. Support for larger models 4. Pip package 5. Horovod Timeline
  • 52. 5252 Migrating existing code to run on Horovod 1. Run hvd.init() 2. Pin a server GPU to be used by this process using config.gpu_options.visible_device_list. Local rank maps to unique GPU for the process. 3. Wrap optimizer in hvd.DistributedOptimizer. 4. Add hvd.BroadcastGlobalVariablesHook(0) to broadcast initial variable states from rank 0 to all other processes.
  • 53. Horovod/Baidu AllReduce 53 Provide as a service on HopsWorks Integration of All-Reduce with a Hadoop cluster - Use YARN to schedule GPUs Scheduling of homogeneous GPUs and network - YARN supports node labels HopsFS authentication/authorization TensorBoard lifecycle management as in HopsUtil
  • 54. The team Active contributors: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed. Past contributors: Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Aruna Kumari Yedurupaka, Tobias Johansson, Roberto Bampi, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid. 54