End-to-End Deep Learning with Horovod on Apache Spark

Data processing and deep learning are often split into two pipelines, one for ETL processing, the second for model training. Enabling deep learning frameworks to integrate seamlessly with ETL jobs allows for more streamlined production jobs, with faster iteration between feature engineering and model training.


  1. End-to-End Deep Learning with Horovod on Spark Clusters. Travis Addair, Uber; Thomas Graves, NVIDIA
  2. Agenda. Travis Addair: ▪ Overview ▪ Introduction to Horovod ▪ Horovod Estimator API. Thomas Graves: ▪ Apache Spark 3.0 accelerator-aware scheduling ▪ Demo of the end-to-end pipeline
  3. Data Processing and Deep Learning
  4. End-to-End Pipelines ▪ Pipelines include ETL before deep learning ▪ Previously, applications had to split ETL and deep learning into separate jobs ▪ The Horovod Estimator API integrates the two seamlessly ▪ Deep learning is accelerated with GPUs ▪ What about GPU-accelerating the ETL?
  5. Introduction to Horovod
  6. Deep Learning Refresher
  7. Distributed Deep Learning
  8. Early Distributed Training: Parameter Servers
  9. Parameter Servers: Tradeoffs. Pros: ▪ Fault tolerant ▪ Supports asynchronous SGD. Cons: ▪ Usability (tight coupling between model and parameter servers) ▪ Scalability (many-to-one) ▪ Convergence (with async SGD). Source: Analysis and Comparison of Distributed Training Techniques for Deep Neural Networks in a Dynamic Environment (https://pdfs.semanticscholar.org/b745/74da37b775bf813bd9a28a72ba13ea6d47b3.pdf)
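The parameter-server pattern above can be sketched in a few lines of plain Python. This is a hypothetical toy illustrating the many-to-one aggregation step, not Horovod or any real parameter-server implementation; the function name and values are invented for illustration:

```python
# Toy synchronous parameter-server update: every worker sends its
# gradient to one central server (the many-to-one bottleneck), which
# averages them, applies the update, and broadcasts new parameters back.

def parameter_server_step(params, worker_grads, lr=0.1):
    n = len(worker_grads)
    # Server aggregates gradients from all workers (many-to-one).
    avg_grad = [sum(g[i] for g in worker_grads) / n for i in range(len(params))]
    # Server applies the update; result is broadcast to every worker.
    return [p - lr * g for p, g in zip(params, avg_grad)]

params = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, 3.0]]  # gradients from two workers
print(parameter_server_step(params, grads))  # → [0.9, 1.8]
```

With many workers, the single server becomes both a bandwidth and a convergence bottleneck, which is the scalability "con" listed above.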
  10. Introducing Horovod ▪ Framework agnostic: TensorFlow, Keras, PyTorch, Apache MXNet ▪ High-performance features: NCCL, GPUDirect, RDMA, tensor fusion ▪ Easy to use: just 5 lines of Python ▪ Open source: Linux Foundation AI Foundation ▪ Easy to install: pip install horovod ▪ horovod.ai
  11. Horovod Technique: Allreduce
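The semantics of the allreduce step can be sketched in plain Python. This toy shows only the contract (every worker ends with the same elementwise sum), not Horovod's bandwidth-optimal ring algorithm or its NCCL backend:

```python
# Toy allreduce: each worker contributes a gradient tensor and every
# worker receives the elementwise sum. Horovod implements this with a
# ring algorithm (NCCL on GPUs); this sketch shows only the semantics.

def allreduce(worker_tensors):
    length = len(worker_tensors[0])
    reduced = [sum(t[i] for t in worker_tensors) for i in range(length)]
    # Every worker gets an identical copy of the reduced result.
    return [list(reduced) for _ in worker_tensors]

print(allreduce([[1, 2], [3, 4], [5, 6]]))  # → [[9, 12], [9, 12], [9, 12]]
```

Unlike the parameter-server design, there is no central node: the reduction is symmetric across workers, which is what makes it scale.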
  12. Benchmarking Horovod. Horovod scales well beyond 128 GPUs; RDMA helps at large scale.
  13. Introduction to the Horovod Spark Estimator API
  14. Deep Learning at Uber: Recent Trends 1. DL is now achieving state-of-the-art performance on tabular data ▪ Existing tree models built with Spark ML / XGBoost are migrating to DL 2. Many features, but low average quality ▪ Lots of iteration between feature engineering and model training. Apache Hive and Apache Spark logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
  15. End-to-End Deep Learning at Uber
  16. Model Training in Production. How do we combine deep learning training with Apache Spark? TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. PyTorch, the PyTorch logo, and all other trademarks, service marks, graphics and logos used in connection with PyTorch are trademarks or registered trademarks of PyTorch or PyTorch's licensors. No endorsement of Google or PyTorch is implied by the use of these marks.
  17. Preprocessing. Often comes in two different kinds: 1. Example-dependent (image color adjustments, image resizing) 2. Dataset-dependent (string indexing, normalization). Solution: fit the preprocessing first, then apply it.
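The "fit first, then apply" point can be made concrete with dataset-dependent normalization: the statistics must be computed over the whole training set before any single row can be transformed. A minimal pure-Python sketch (toy code, not Spark ML's StandardScaler):

```python
# Dataset-dependent preprocessing: standardization must first "fit"
# (compute mean and std over the full dataset), and only then can it
# "apply" (transform individual values). Example-dependent steps like
# image resizing need no fitting phase.

def fit_standardizer(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    # Return a transformer closed over the fitted statistics.
    return lambda v: (v - mean) / std

scale = fit_standardizer([2.0, 4.0, 6.0])  # fit on the training set
print([round(scale(v), 3) for v in [2.0, 4.0, 6.0]])
```

This two-phase shape is exactly what Spark ML's Estimator/Transformer split (next slide) captures.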
  18. Spark ML Pipelines. Concepts: Estimator, Transformer, Pipeline
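The Estimator/Transformer contract can be shown in miniature: an Estimator's fit() returns a fitted Transformer, whose transform() is then applied to data. This is a pure-Python toy mimicking the shape of the API; the class names are invented and these are not pyspark.ml classes:

```python
# Spark ML contract in miniature: Estimator.fit(data) -> Transformer,
# Transformer.transform(data) -> transformed data. A Pipeline chains
# these stages, fitting each Estimator in turn.

class ScalerEstimator:
    def fit(self, data):
        peak = max(abs(v) for v in data)
        return ScalerModel(peak)  # the fitted Transformer

class ScalerModel:
    def __init__(self, peak):
        self.peak = peak

    def transform(self, data):
        return [v / self.peak for v in data]

model = ScalerEstimator().fit([1.0, -4.0, 2.0])  # fit on training data
print(model.transform([2.0, 4.0]))  # → [0.5, 1.0]
```

The Horovod KerasEstimator on the next slide plugs into this same contract: fit() runs distributed training and returns a Transformer that serves predictions.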
  19. Horovod Spark Estimators
      from pyspark.ml import Pipeline
      from tensorflow import keras
      import horovod.spark.keras as hvd

      model = keras.models.Sequential()
      model.add(keras.layers.Dense(8, input_dim=2))
      model.add(keras.layers.Activation('tanh'))
      model.add(keras.layers.Dense(1))
      model.add(keras.layers.Activation('sigmoid'))

      optimizer = keras.optimizers.SGD(lr=0.1)
      loss = 'binary_crossentropy'

      keras_estimator = hvd.KerasEstimator(model=model, optimizer=optimizer, loss=loss)
      pipeline = Pipeline(stages=[..., keras_estimator, ...])
      trained_pipeline = pipeline.fit(train_df)
      pred_df = trained_pipeline.transform(test_df)
  20. Horovod Spark Estimators: Keras (same code example, with the Keras portions highlighted)
  21. Horovod Spark Estimators: PySpark (same code example, with the Spark ML portions highlighted)
  22. Horovod Spark Estimators: Horovod (same code example, with the Horovod portions highlighted)
  23. Deep Learning in Spark: Performance Challenges 1. DataFrames / RDDs are not well suited to deep learning (no random access) 2. Spark applications typically run on CPUs, DL training on GPUs
  24. Deep Learning in Spark: Performance Challenges (cont.) Spark: ▪ Jobs are typically easy to fan out across cheap CPU machines ▪ Transformations do not benefit as much from GPU acceleration. Deep learning: ▪ Not embarrassingly parallel ▪ Compute bound, not data bound ▪ Computations are easy to represent with linear algebra
  25. Petastorm: Data Access for Deep Learning Training. Challenges of training on large datasets: ▪ Sharding ▪ Streaming ▪ Shuffling / buffering / caching. Parquet: ▪ Large continuous reads (HDFS/S3-friendly) ▪ Fast access to individual columns ▪ Faster row queries in some cases ▪ Written and read natively by Apache Spark
  26. Deep Learning in Spark with Horovod + Petastorm
  27. Horovod on Spark 3.0: Accelerator-Aware Scheduling ▪ End-to-end training in a single Spark application ▪ ETL on CPU can hand off data to Horovod on GPU ▪ Fine-grained control over resource allocation ▪ Tasks are assigned GPUs by Spark, and GPU ownership is isolated ▪ Multi-GPU nodes can be shared across different applications
  28. Horovod on Spark 3.0: Accelerator-Aware Scheduling (cont.)
      from pyspark import SparkConf
      from pyspark.sql import SparkSession

      conf = SparkConf()
      conf = conf.set("spark.executor.resource.gpu.discoveryScript", DISCOVERY_SCRIPT)
      conf = conf.set("spark.executor.resource.gpu.amount", 4)
      conf = conf.set("spark.task.resource.gpu.amount", 1)
      spark = SparkSession.builder.config(conf=conf).getOrCreate()
  29. Deep Learning in Spark 3.0 Cluster
  30. Spark 3.0 Accelerator-Aware Scheduling
  31. Spark 3.0 Accelerator-Aware Scheduling ▪ SPARK-24615 ▪ Request resources at the executor, driver, and task level ▪ Resource discovery ▪ API to determine assignment ▪ Supported on YARN, Kubernetes, and Standalone
  32. GPU Scheduling Example
      $SPARK_HOME/bin/spark-shell --master yarn --executor-cores <num-cores> \
        --conf spark.driver.resource.gpu.amount=1 \
        --conf spark.driver.resource.gpu.discoveryScript=/opt/spark/getGpusResources.sh \
        --conf spark.executor.resource.gpu.amount=2 \
        --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
        --conf spark.task.resource.gpu.amount=1 \
        --files examples/src/main/scripts/getGpusResources.sh
      An example discovery script ships in the Apache Spark GitHub repository.
  33. Spark 3.0 Accelerator-Aware Scheduling (cont.)
      // Task API
      val context = TaskContext.get()
      val resources = context.resources()
      val assignedGpuAddrs = resources("gpu").addresses
      // Pass assignedGpuAddrs into TensorFlow or other AI code

      // Driver API
      scala> sc.resources("gpu").addresses
      Array[String] = Array(0)
  34. Spark 3.0 Columnar Processing APIs
  35. Spark 3.0 GPU Columnar Processing ▪ Columnar processing (SPARK-27396) ▪ Catalyst API for columnar processing ▪ Plugins can modify the query plan with columnar operations ▪ RAPIDS Accelerator for Apache Spark ▪ A plugin that allows running Spark on GPUs ▪ No code changes required by the user ▪ Runs the operations it supports on the GPU ▪ If an operation is not supported or not GPU-compatible, it runs on the CPU ▪ Automatically handles transitioning from row to columnar and back ▪ Uses the RAPIDS cuDF library
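The row-to-columnar transition the plugin automates can be sketched in plain Python. This is an invented toy showing only the data-layout idea, not the Catalyst columnar API or cuDF:

```python
# Row-oriented vs. columnar layout of the same data. Columnar batches
# let an engine (or a GPU kernel) process one column's values
# contiguously, instead of picking a field out of every row; the RAPIDS
# plugin handles this row<->columnar transition automatically.

rows = [{"price": 10.0, "qty": 2}, {"price": 3.0, "qty": 5}]

def to_columnar(rows):
    # Pivot a list of row dicts into one list per column.
    return {key: [row[key] for row in rows] for key in rows[0]}

cols = to_columnar(rows)
# A column-wise operation over contiguous values:
totals = [p * q for p, q in zip(cols["price"], cols["qty"])]
print(totals)  # → [20.0, 15.0]
```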
  36. Demo: Databricks notebook running ETL and Horovod
  37. Feedback. Your feedback is important to us. Don’t forget to rate and review the sessions.
