Jfokus 2019-dowling-logical-clocks
Distributed Deep Learning on Hopsworks at JFokus 2019

  1. 1. Distributed Deep Learning (on Hopsworks) Jfokus, February 5th, 2019 jim_dowling
  2. 2. ©2018 Logical Clocks AB. All Rights Reserved Massive Increase in Compute for AI* Distributed Systems 3.5-month doubling time *https://blog.openai.com/ai-and-compute 2
  3. 3. ©2018 Logical Clocks AB. All Rights Reserved All Roads Lead to Distribution 3 Distributed Deep Learning Hyper Parameter Optimization Distributed Training Larger Training Datasets Elastic Model Serving Parallel Experiments (Commodity) GPU Clusters Auto ML
  4. 4. ©2018 Logical Clocks AB. All Rights Reserved Model Training just One Piece of the Puzzle 4 MODEL TRAINING [Diagram adapted from “technical debt of machine learning”]
  5. 5. This is a GPU (graphics processing unit) Deep Learning needs Big Data and GPUs •Lots of data is needed to train machine learning (ML) models Big Data Analytics Features DL
  6. 6. ©2018 Logical Clocks AB. All Rights Reserved Hopsworks – the Data and AI Platform 6 Hopsworks Batch Deep Learning Streaming
  7. 7. ©2018 Logical Clocks AB. All Rights Reserved Hopsworks – the Data and AI Platform 7 Hopsworks Batch Deep Learning Streaming DATA AI
  8. 8. Data Sources HopsFS Kafka Airflow Stream Processing Batch Processing Feature Store Data Warehouse Deep Learning BI Tools & Reporting Notebooks Model Serving Hopsworks On-Premise, AWS, Azure, GCE External Service Hopsworks Service
  9. 9. Data Sources HopsFS Kafka Airflow Stream Processing Batch Processing Feature Store Data Warehouse Deep Learning BI Tools & Reporting Notebooks Model Serving Hopsworks On-Premise, AWS, Azure, GCE External Service Hopsworks Service BATCH STREAMING DEEP LEARNING
  10. 10. ©2018 Logical Clocks AB. All Rights Reserved ML Pipelines in Hopsworks 10 Event Data Store Raw Data Pre-Process Ingest Explore/Deploy REST API HopsFS FeatureStore BI Tools Experiment/Train Serving Airflow
  11. 11. ©2018 Logical Clocks AB. All Rights Reserved Scalable ML Pipeline 11 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test Distributed Storage Potential Bottlenecks Object Stores (S3, GCS), HDFS, Ceph Standalone Single-Host Single GPU
  12. 12. TensorFlow Extended (TFX) Pipeline 12 2/5/2019 Model Monitoring? TF Serving TF Model Analysis TF Estimator TF Transform TF Data Validation
  13. 13. ©2018 Logical Clocks AB. All Rights Reserved A Fully Distributed Deep Learning Pipeline 13 PySpark Kubernetes Distributed TensorFlow Input Data Training Data Model DO NOT feed Spark DataFrames to TensorFlow
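      A minimal sketch of the pattern this warning implies: materialize the training data as files on HopsFS and read them with TensorFlow's own input pipeline rather than passing a Spark DataFrame into the training code. The path and the TFRecord format below are assumptions for illustration, not taken from the deck:
          import tensorflow as tf

          # Hypothetical HopsFS path where a Spark job materialized training data as TFRecords.
          TRAIN_FILES = "hdfs:///Projects/demo/TrainingDatasets/mnist_train/*.tfrecords"

          def input_fn(batch_size=128):
              # Read the files with tf.data instead of handing a Spark DataFrame to TensorFlow.
              files = tf.data.Dataset.list_files(TRAIN_FILES)
              dataset = tf.data.TFRecordDataset(files)
              # (feature parsing with tf.parse_single_example omitted for brevity)
              return dataset.shuffle(10000).repeat().batch(batch_size)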
  14. 14. ©2018 Logical Clocks AB. All Rights Reserved Hopsworks Machine Learning Pipeline HopsFS PySpark Kubernetes Tensorboard Logs + Experiments Feature Store Training Data Models Kafka REST API Keras, TensorFlow, PyTorch
  15. 15. ©2018 Logical Clocks AB. All Rights Reserved Orchestrating ML Pipelines with Airflow 15 Airflow Spark ETL Dist Training TensorFlow Test & Optimize Kubernetes Model Serving Spark Streaming Monitoring
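      A hedged sketch of how this pipeline could be expressed as an Airflow DAG of that era (Airflow 1.x); the task commands are placeholder echoes and only the task ordering mirrors the slide:
          from datetime import datetime
          from airflow import DAG
          from airflow.operators.bash_operator import BashOperator

          # Hypothetical DAG mirroring the slide: Spark ETL -> distributed training -> test/optimize -> serving.
          dag = DAG("ml_pipeline", start_date=datetime(2019, 2, 5), schedule_interval="@daily")

          etl = BashOperator(task_id="spark_etl", bash_command="echo run Spark ETL", dag=dag)
          train = BashOperator(task_id="dist_training", bash_command="echo run distributed TensorFlow training", dag=dag)
          test = BashOperator(task_id="test_and_optimize", bash_command="echo evaluate and optimize the model", dag=dag)
          serve = BashOperator(task_id="model_serving", bash_command="echo deploy to model serving on Kubernetes", dag=dag)

          etl >> train >> test >> serve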
  16. 16. Data Problem “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models. Broken data is the most common cause of problems in production ML systems” – Uber 16 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test FEATURE STORE
  17. 17. Hopsworks’ Feature Store - Reusability of features between models and teams - Automatic backfilling of features - Automatic feature documentation and analysis - Feature versioning - Standardized access of features between training and serving - Feature discovery 17
  18. 18. Without a Feature Store 18
  19. 19. With a Feature Store 19
  20. 20. Modelling Data in the Feature Store 20
  21. 21. •The Storage Layer: For storing feature data in the feature store •The Metadata Layer: For storing feature metadata (versioning, feature analysis, documentation, jobs) •The Feature Engineering Jobs: For computing features •The Feature Registry: A user interface to share and discover features •The Feature Store API: For writing/reading to/from the feature store Building Blocks of a Feature Store 21
  22. 22. Feature Storage 22
  23. 23. Feature Metadata 23
  24. 24. ©2018 Logical Clocks AB. All Rights Reserved 24
      Writing to the Feature Store (Data Engineer API):
          from hops import featurestore
          raw_data = spark.read.parquet(filename)
          polynomial_features = raw_data.map(lambda x: x**2)  # squared features; the slide's x^2 would be bitwise XOR in Python
          featurestore.insert_into_featuregroup(polynomial_features, "polynomial_featuregroup")
      Reading from the Feature Store (Data Scientist API):
          from hops import featurestore
          features_df = featurestore.get_features(["average_attendance", "average_player_age"])
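      As a usage sketch, the features DataFrame returned by get_features (the call shown on the slide) is a regular Spark DataFrame, so splitting and materializing train/test sets uses standard Spark calls; the output paths below are hypothetical:
          from hops import featurestore

          # Feature names are the ones from the slide; the split/write below is standard Spark.
          features_df = featurestore.get_features(["average_attendance", "average_player_age"])
          train_df, test_df = features_df.randomSplit([0.8, 0.2], seed=42)

          # Hypothetical dataset paths inside a Hopsworks project on HopsFS.
          train_df.write.mode("overwrite").parquet("hdfs:///Projects/demo/TrainingDatasets/team_train")
          test_df.write.mode("overwrite").parquet("hdfs:///Projects/demo/TrainingDatasets/team_test")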
  25. 25. Distribution with PySpark 25 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test
  26. 26. ©2018 Logical Clocks AB. All Rights Reserved PySpark gets GPUs from a Resource Manager YARN/Kubernetes/Mesos PySpark Application 10 GPUs, please
  27. 27. ©2018 Logical Clocks AB. All Rights Reserved PySpark with GPUs in Hopsworks 27 Executor Executor Driver
  28. 28. ©2018 Logical Clocks AB. All Rights Reserved PySpark Hello World 28
  29. 29. ©2018 Logical Clocks AB. All Rights Reserved PySpark in Code 29 Hello GPU World – the Driver calls experiment.launch(..) and 1 Executor runs print(“Hello from GPU”)
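      The same Hello World written out as a hedged sketch: the Driver hands a plain Python function to the hops experiment module, which runs it inside a Spark Executor that holds the GPU (experiment.launch is the call shown on the slide; everything else is illustrative):
          from hops import experiment

          def hello():
              # Runs inside a Spark Executor that has been allocated a GPU.
              print("Hello from GPU")

          # Driver-side call from the slide: ship hello() to an Executor and run it there.
          experiment.launch(hello)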
  30. 30. ©2018 Logical Clocks AB. All Rights Reserved Driver with 4 Executors 30 – the Driver launches print(“Hello from GPU”) on each of the 4 Executors
  31. 31. ©2018 Logical Clocks AB. All Rights Reserved Driver with 4 Executors in Code 31
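      A hedged sketch of the 4-Executor variant, assuming a hypothetical args_dict whose four values fan the same function out to four Executors (the one-combination-per-Executor pattern used by the grid search on slide 41):
          from hops import experiment

          def hello(task_id):
              print("Hello from GPU")

          # Hypothetical argument grid: four values -> four Executors, each running hello() once.
          args_dict = {'task_id': [0, 1, 2, 3]}
          experiment.launch(hello, args_dict)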
  32. 32. ©2018 Logical Clocks AB. All Rights Reserved Hopsworks: Replicated Conda Environment 32 conda_env conda_env conda_env conda_env conda_env
  33. 33. ©2018 Logical Clocks AB. All Rights Reserved 33 A Conda Environment Per Project in Hopsworks
  34. 34. ©2018 Logical Clocks AB. All Rights Reserved 34 Executor 1 Executor N Driver HopsFS TensorBoard HopsFS for: • Training/Test Datasets (Big Data) • Model checkpointing, Model Serving • State for Distributed Training & Parallel Experiments Model Serving Distributed Filesystem for Coordination
  35. 35. Filesystems are not good enough Uber on Petastorm: “[Using files] is hard to implement at large scale, especially using modern distributed file systems such as HDFS and S3 (these systems are typically optimized for fast reads of large chunks of data).” https://eng.uber.com/petastorm/ 35
  36. 36. NVMe Disks – Game Changer •HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports orders of magnitude faster random disk I/O. •Can we support faster random disk I/O with HDFS? -Yes with HopsFS. 36
  37. 37. ©2018 Logical Clocks AB. All Rights Reserved Small files on NVMe* 37 •Spotify’s HDFS: -33% of files < 64KB in size -42% of operations are on files < 16KB in size *Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
  38. 38. ©2018 Logical Clocks AB. All Rights Reserved HopsFS – NVMe Performance •HopsFS is HDFS with Distributed Metadata -Winner IEEE Scale Prize 2017 •Small files stored replicated in the metadata layer on NVMe disks* - Read/Writes tens of 1000s of images/second with HopsFS + NVMe *Size Matters: Improving the Performance of Small Files in Hadoop, Middleware 2018. Niazi et al
  39. 39. ©2018 Logical Clocks AB. All Rights Reserved Hyperparameter Optimization 39 (Because DL Theory Sucks!)
  40. 40. ©2018 Logical Clocks AB. All Rights Reserved Faster Experimentation 40 – search space: LearningRate (LR): [0.0001-0.001], NumOfLayers (NL): [5-10], …… Trials and their errors: (LR 0.0001, NL 5, Error 0.35), (LR 0.0001, NL 7, Error 0.34), (LR 0.0005, NL 5, Error 0.38), (LR 0.0005, NL 10, Error 0.37), (LR 0.001, NL 6, Error 0.31), (LR 0.001, NL 9, Error 0.36); best so far: LR 0.001, NL 6, Error 0.31 [diagram: HyperParameter Optimization loop – TensorFlow Program, Hyperparameters, Blacklist, Executor]
  41. 41. ©2018 Logical Clocks AB. All Rights Reserved GridSearch for Hyperparameters on HopsML – Launch 6 Spark Executors:
          def train(learning_rate, dropout):
              [TensorFlow Code here]
          args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
          experiment.launch(train, args_dict)
  42. 42. ©2018 Logical Clocks AB. All Rights Reserved Data Parallel (Distributed) Training 42 Training Time Generalization Error (Synchronous Stochastic Gradient Descent (SGD))
  43. 43. ©2018 Logical Clocks AB. All Rights Reserved Data Parallel Training 43 – a copy of the model on each GPU (GPU0, GPU1, …, GPU99); each GPU reads 32 files from the Distributed Filesystem, giving a Batch Size of 3200; at the Barrier the gradients are aggregated and a New Model is produced
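      Spelling out the synchronous SGD step the diagram implies (standard notation, not verbatim from the deck): each GPU k computes a gradient on its own 32-example mini-batch B_t^{(k)}, the gradients are averaged at the barrier, and every replica applies the same update:
          g_t = \frac{1}{100} \sum_{k=0}^{99} \frac{1}{32} \sum_{i \in B_t^{(k)}} \nabla_w \, \ell(x_i, w_t)
          w_{t+1} = w_t - \eta \, g_t
      One step therefore consumes 100 × 32 = 3200 examples – the “Batch Size of 3200” on the slide.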
  44. 44. Frameworks for Distributed Training
  45. 45. ©2018 Logical Clocks AB. All Rights Reserved Distributed TensorFlow / TfOnSpark 45 TF_CONFIG – Bring your own Distribution! 1. Start all processes for P1, P2, G1-G4 yourself 2. Enter all IP addresses in TF_CONFIG along with GPU device IDs. [diagram: Parameter Servers P1, P2 and GPU Servers G1-G4, each with TF_CONFIG]
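      A hedged sketch of step 2 for the topology on the slide (two Parameter Servers P1-P2, four GPU Servers G1-G4): every process sets its own TF_CONFIG before TensorFlow starts; the hostnames and ports are hypothetical, the cluster/task structure is TensorFlow's standard one:
          import json
          import os

          # Hypothetical hosts; with plain Distributed TensorFlow you must list them all yourself.
          cluster = {
              "ps": ["p1:2222", "p2:2222"],
              "worker": ["g1:2222", "g2:2222", "g3:2222", "g4:2222"],
          }

          # Each process sets its own role and index before starting TensorFlow.
          os.environ["TF_CONFIG"] = json.dumps({
              "cluster": cluster,
              "task": {"type": "worker", "index": 0},  # e.g. this process is G1
          })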
  46. 46. ©2018 Logical Clocks AB. All Rights Reserved Ring AllReduce 46 •Bandwidth optimal •Builds the Ring, runs AllReduce using MPI and NCCL2 -Horovod -TensorFlow 1.11+
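      A minimal Horovod sketch of the ring-allreduce approach (TensorFlow 1.x-era API): initialize one process per GPU, scale the learning rate by the worker count, and wrap the optimizer so gradients are averaged over the ring; GPU pinning and the training loop are omitted:
          import horovod.tensorflow as hvd
          import tensorflow as tf

          hvd.init()  # one process per GPU; MPI/NCCL build and run the ring underneath

          # Scale the learning rate with the number of workers (a common Horovod recipe).
          opt = tf.train.AdagradOptimizer(0.01 * hvd.size())
          # Wrap the optimizer: gradients are averaged across workers with ring-allreduce.
          opt = hvd.DistributedOptimizer(opt)

          # Broadcast initial variables from rank 0 so every worker starts from the same model.
          hooks = [hvd.BroadcastGlobalVariablesHook(0)]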
  47. 47. ©2018 Logical Clocks AB. All Rights Reserved Tf CollectiveAllReduceStrategy 47 TF_CONFIG, again. Bring your own Distribution! 1. Start all processes for G1-G4 yourself 2. Enter all IP addresses in TF_CONFIG along with GPU device IDs. G1 G2 G3 G4 TF_CONFIG Available from TensorFlow 1.11
  48. 48. ©2018 Logical Clocks AB. All Rights Reserved HopsML CollectiveAllReduceStrategy 48 •Uses Spark/YARN to add distribution to TensorFlow’s CollectiveAllReduceStrategy -Automatically builds the ring (Spark/YARN) -Allocates GPUs to Spark Executors https://github.com/logicalclocks/hops-examples/tree/master/tensorflow/notebooks/Distributed_Training
  49. 49. ©2018 Logical Clocks AB. All Rights Reserved Estimator APIs in TensorFlow 49
  50. 50. ©2018 Logical Clocks AB. All Rights Reserved Estimators log to the Distributed Filesystem 50
          #over-simplified code – see
          #notebook for full example
          tf.estimator.RunConfig(
              ‘CollectiveAllReduceStrategy’
              model_dir
              tensorboard_logs
              checkpoints
          )
          experiment.collective_all_reduce(…)
      [diagram: the model_dir, TensorBoard logs and checkpoints all land under /Experiments/appId/run.ID/<nam… on HopsFS (HDFS)]
  51. 51. ©2018 Logical Clocks AB. All Rights Reserved HopsML CollectiveAllReduceStrategy with Keras/Tf
          #over-simplified code – see notebook for full example
          def distributed_training():
              …
              rc = tf.estimator.RunConfig(‘CollectiveAllReduceStrategy’)
              tf.estimator.train_and_evaluate(…)
          experiment.collective_all_reduce(distributed_training)
  52. 52. ©2018 Logical Clocks AB. All Rights Reserved Experiments/Versioning in Hopsworks
  53. 53. ©2018 Logical Clocks AB. All Rights Reserved 53 Experiments Tensorboard
  54. 54. ©2018 Logical Clocks AB. All Rights Reserved 54 Experiment Details
  55. 55. ©2018 Logical Clocks AB. All Rights Reserved 55 Tensorboard Visualizations
  56. 56. ©2018 Logical Clocks AB. All Rights Reserved Model Serving 56 Hopsworks REST API External Apps Internal Apps Logging Load Balancer Model Serving Containers Streaming Notifications Retrain Features: • Canary • Multiple Models • Scale-Out/In Frameworks: • TensorFlow Serving • MLeap for Spark • scikit-learn
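      A hedged sketch of how an external application might call a served model over the REST API; the endpoint path and API-key header are hypothetical, and the request body follows TensorFlow Serving's usual {"instances": [...]} convention:
          import requests

          # Hypothetical Hopsworks inference endpoint and API key.
          URL = "https://hopsworks.example.com/hopsworks-api/api/project/1/inference/models/mnist:predict"
          HEADERS = {"Authorization": "ApiKey <key>", "Content-Type": "application/json"}

          payload = {"instances": [[0.0] * 784]}  # one flattened 28x28 image
          response = requests.post(URL, json=payload, headers=HEADERS)
          print(response.json())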
  57. 57. Summary •The future of Deep Learning is Distributed https://www.oreilly.com/ideas/distributed-tensorflow •Hopsworks is a new Data and AI Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs 57
  58. 58. HopsML Meetup •https://www.meetup.com/HopsML-Stockholm/ •Only 3 months old, already 320+ members •Next Event in March 2019
  59. 59. ©2018 Logical Clocks AB. All Rights Reserved Hops / Hopsworks Open Source Projects Contributors: Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, Kim Hammar, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi, Roshan Sedar, August Bonds. www.hops.io @hopshadoop
  60. 60. ©2018 Logical Clocks AB. All Rights Reserved 60 @logicalclocks www.logicalclocks.com
  61. 61. 1. Register for an account at: www.hops.site 2. Enter your Firstname/lastname here: https://bit.ly/2UEixTr 61
