Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

868 views

Published on

Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk, we discuss the philosophy behind Deep Learning Pipelines, as well as the main tools it provides, how they fit into the deep learning ecosystem, and how they demonstrate Spark's role in deep learning.

Published in: Software
  • Be the first to comment

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

  1. 1. Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark Bay Area Spark Meetup Nov8, 2017 Sue Ann Hong, Databricks
  2. 2. This talk • Deep Learning at scale: current state • Deep Learning Pipelines: the philosophy • End-to-end workflow with DL Pipelines • Future
  3. 3. Deep Learning at Scale : current state 3put your #assignedhashtag here by setting the
  4. 4. What is Deep Learning? • A set of machine learning techniques that use layers that transform numerical inputs • Classification • Regression • Arbitrary mapping • Popular in the 80’s as Neural Networks • Recently came back thanks to advances in data collection, computation techniques, and hardware.
  5. 5. Success of Deep Learning Tremendous success for applications with complex data • AlphaGo • Image interpretation • Automatictranslation • Speech recognition
  6. 6. Deep Learning is often challenging Labeled Data Compute Resources & Time Engineer hours • Tedious or difficult to distribute computations • No exact science around deep learning à lots of tweaking • Low level APIs with steep learning curve • Complex models à need a lot of data
  7. 7. Deep Learning in industry • Currently limited adoption • Huge potential beyond the industrial giants • How do we accelerate the road to massive availability?
  8. 8. Deep Learning Pipelines 8put your #assignedhashtag here by setting the
  9. 9. Deep Learning Pipelines: Deep Learning with Simplicity • Open-source Databricks library • Focuses on easeof useand integration • without sacrificing performance
  10. 10. Deep Learning is often challenging Compute Resources & Time Engineer hours Labeled Data • Tedious or difficult to distribute computations • No exact science around deep learning à lots of tweaking • Low level APIs with steep learning curve • Complex models à need a lot of data
  11. 11. Deep Learning is often challenging • Tedious or difficult to distribute computations • No exact science around deep learning à lots of tweaking • Low level APIs with steep learning curve • Complex models à need a lot of data
  12. 12. • Tedious or difficult to distribute computations • No exact science around deep learning à lots of tweaking • Low level APIs with steep learning curve • Complex models à need a lot of data Instead • Be easy to scale • Require little tweaking • Be easy to write • Require little or no data Common workflows should
  13. 13. How • Be easy to scale • Require little tweaking • Be easy to write • Require little or no data Common workflows should • Use Apache Spark for scaling out common tasks • Leverage well-knownmodel architectures • Integrate with MLlib Pipelines API to capture ML workflowconcisely • Leverage pre-trained models for common tasks Apache Spark for scaling out MLlib Pipelines API pre-trained models
  14. 14. Demo: Build a visual recommendation AI 10minutes 7lines of code Elastic Scale-out using Apache Spark MLlib Pipelines API Leverage pre-trained models 0labels
  15. 15. To the notebook!
  16. 16. End-to-EndWorkflow with Deep Learning Pipelines 18put your #assignedhashtag here by setting the
  17. 17. A typical Deep Learning workflow • Load data (images, text, time series, …) • Interactive work • Train • Select an architecture for a neural network • Optimize the weights of the NN • Evaluateresults, potentially re-train • Apply: • Pass the data through the NN to produce new features or output Load data Interactive work Train Evaluate Apply
  18. 18. A typical Deep Learning workflow Load data Interactive work Train Evaluate Apply • Image loading in Spark • Distributed batch prediction • Deploying models in SQL • Transfer learning • Distributed tuning • Pre-trained models
  19. 19. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply
  20. 20. Adds support for images in Spark • ImageSchema, reader, conversion functions to/from numpy arrays • Most of the tools we’ll describe work on ImageSchema columns from sparkdl import readImages image_df = readImages(sample_img_dir)
  21. 21. Upcoming: built-in support in Spark • Spark-21866 • Contributing image format & reading to Spark • Targeted for Spark 2.3 • Joint work with Microsoft 23put your #assignedhashtag here by setting the
  22. 22. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply
  23. 23. Applying popular models • Popular pre-trained models accessible through MLlib Transformers predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3") predictions_df = predictor.transform(image_df)
  24. 24. Applying popular models predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3") predictions_df = predictor.transform(image_df)
  25. 25. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning
  26. 26. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning
  27. 27. Transfer learning • Pre-trained models may not be directly applicable • New domain, e.g. shoes • Training from scratch requires • Enormous amounts of data • A lot of compute resources & time • Idea: intermediate representations learned for one task may be useful for other related tasks
  28. 28. Transfer Learning SoftMax GIANT PANDA 0.9 RACCOON 0.05 RED PANDA 0.01 …
  29. 29. Transfer Learning
  30. 30. Transfer Learning Classifier
  31. 31. Transfer Learning Classifier Rose: 0.7 Daisy: 0.3
  32. 32. Transfer Learning as a Pipeline DeepImageFeaturizer Image Loading Preprocessing Logistic Regression MLlib Pipeline
  33. 33. Transfer Learning as a Pipeline 35put your #assignedhashtag here by setting the featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3") lr = LogisticRegression(labelCol="label") p = Pipeline(stages=[featurizer, lr]) p_model = p.fit(train_df)
  34. 34. Transfer Learning • Usually for classification tasks • Similar task, new domain • But other forms of learning leveraging learned representations can be loosely considered transfer learning
  35. 35. Featurization for similarity-based ML DeepImageFeaturizer Image Loading Preprocessing Logistic Regression
  36. 36. Featurization for similarity-based ML DeepImageFeaturizer Image Loading Preprocessing Clustering KMeans GaussianMixture Nearest Neighbor KNN LSH Distance computation
  37. 37. Featurization for similarity-based ML DeepImageFeaturizer Image Loading Preprocessing Clustering KMeans GaussianMixture Nearest Neighbor KNN LSH Distance computation Duplicate Detection Recommendation Anomaly Detection Search result diversification
  38. 38. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Hyperparameter tuning Transfer learning
  39. 39. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply
  40. 40. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Spark SQL Batch prediction s
  41. 41. Deep Learning Pipelines • Load data • Interactive work • Train • Evaluate model • Apply Spark SQL Batch prediction
  42. 42. Batch prediction as an MLlib Transformer • A model is a Transformer in MLlib • DataFrame goes in, DataFrame comes out with output columns predictor = XXTransformer(inputCol="image", outputCol="prediction", modelSpecification={…}) predictions = predictor.transform(test_df)
  43. 43. Hierarchy of DL transformers for images 46 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer(model_name) (keras.Model) (tf.Graph) Input (Image) Output (vector) model_namemodel_name Keras.Model tf.Graph
  44. 44. Hierarchy of DL transformers for images 47 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer(model_name) (keras.Model) (tf.Graph) Input (Image) Output (vector) model_namemodel_name Keras.Model tf.Graph
  45. 45. Hierarchy of DL transformers 48 TFImageTransformer KerasImageTransformer DeepImageF eaturizer DeepImageP redictor NamedImageTransformer ) Input (Image ) Output (vector ) model_namemodel_name Keras.Model tf.Graph TFTextTransformer KerasTextTransformer DeepTextFeat urizer DeepTextPre dictor NamedTextTransformerInput (Text) Output (vector) model_namemodel_name Keras.Model tf.Graph TFTransformer
  46. 46. Hierarchy of DL transformers 49 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer TFTextTransformer KerasTextTransformer DeepTextFeaturizerDeepTextPredictor NamedTextTransformer TFTransformer
  47. 47. Hierarchy of DL transformers 50 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer TFTextTransformer KerasTextTransformer DeepTextFeaturizerDeepTextPredictor NamedTextTransformer TFTransformer
  48. 48. CommonTasks Easy Advanced Ones Possible 51 TFImageTransformer KerasImageTransformer DeepImageFeaturizerDeepImagePredictor NamedImageTransformer TFTextTransformer KerasTextTransformer DeepTextFeaturizerDeepTextPredictor NamedTextTransformer TFTransformer
  49. 49. CommonTasks Easy Advanced Ones Possible 52 TFImageTransformer KerasImageTransformer DeepImageFeaturizer DeepImagePredictor NamedImageTransformer TFTextTransformer KerasTextTransformer DeepTextFeaturizer DeepTextPredictor NamedTextTransformer TFTransformer Pre-built Solutions 80%-built Solutions Self-built Solutions
  50. 50. Deep Learning Pipelines : Future In progress • Text featurization (embeddings) • TFTransformer for arbitrary vectors Future directions • Non-image data domains: video, text, speech, … • Distributed training • Support for more backends, e.g. MXNet, PyTorch, BigDL
  51. 51. Questions? Thank you! 54put your #assignedhashtag here by setting the
  52. 52. Resources DL Pipelines GitHub Repo, Spark Summit Europe 2017 Deep Dive Blog posts & webinars (http://databricks.com/blog) • Deep Learning Pipelines • GPU acceleration in Databricks • BigDL on Databricks • Deep Learning and Apache Spark Docs for Deep Learningon Databricks (http://docs.databricks.com) • Getting started • Deep Learning Pipelines Example • Spark integration

×