Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Convolutional Neural Networks at scale in Spark MLlib:
Jeremy Nixon will focus on the engineering and applications of a new algorithm built on top of MLlib. The presentation will focus on the methods the algorithm uses to automatically generate features to capture nonlinear structure in data, as well as the process by which it’s trained. Major aspects of that include compositional transformations over the data, convolution, and distributed backpropagation via SGD with adaptive gradients and an adaptive learning rate. Applications will look into how to use convolutional neural networks to model data in computer vision, natural language and signal processing. Details around optimal preprocessing, the type of structure that can be learned, and managing its ability to generalize will inform developers looking to apply nonlinear modeling tools to problems that they face.


  1. 1. Spark Technology Center Convolutional Neural Networks at Scale in MLlib Jeremy Nixon
  2. 2. Spark Technology Center 1. Machine Learning Engineer at the Spark Technology Center 2. Contributor to MLlib, dedicated to scalable deep learning. 3. Previously, studied Applied Mathematics to Computer Science and Economics at Harvard Jeremy Nixon
  3. 3. Spark Technology Center Large Scale Data Processing ● In-memory compute ● Up to 100x faster than Hadoop Improved Usability ● Rich APIs in Scala, Java, Python ● Interactive Shell
  4. 4. Spark Technology Center Spark’s Machine Learning Library ● Alternating Least Squares ● Lasso ● Ridge Regression ● Logistic Regression ● Decision Trees ● Naive Bayes ● SVMs ● … MLlib
  5. 5. Spark Technology Center Part of Spark ● Integrated Data Analysis ● Scalable Python, Scala, Java APIs MLlib
  6. 6. Spark Technology Center ● Deep Learning benefits from large datasets ● Spark allows for Large Scale Data Analysis ● Compute is Local to Data ● Integrated into organization’s Spark Jobs ● Leverages existing compute cluster Deep Learning in MLlib
  7. 7. Spark Technology Center Github Link: Spark Package: Links
  8. 8. Spark Technology Center 1. Framing Deep Learning 2. MLlib Deep Learning API 3. Optimization 4. Performance 5. Future Work 6. Deep Learning Options on Spark 7. Deep Learning Outside of Spark Structure
  9. 9. Spark Technology Center 1. Structural Assumptions 2. Automated Feature Engineering 3. Learning Representations 4. Applications Framing Convolutional Neural Networks
  10. 10. Spark Technology Center Structural Assumptions: Location Invariance - Convolution is a restriction on the features that can be combined. - Location Invariance leads to strong accuracy in vision, audio, and language.
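A minimal sketch of location invariance, assuming a hypothetical two-tap edge-detecting kernel and toy step signals (neither comes from the talk). The same filter is slid over every position, so moving the pattern in the input only moves the response; the strength of the response is unchanged.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation: slide one kernel over every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical edge detector: responds where the signal steps from 0 to 1.
kernel = [-1.0, 1.0]

# The same step pattern placed at two different locations (illustrative data).
early = [0, 0, 1, 1, 1, 1, 1, 1]  # step between index 1 and 2
late = [0, 0, 0, 0, 0, 1, 1, 1]   # step between index 4 and 5

resp_early = conv1d(early, kernel)
resp_late = conv1d(late, kernel)

# Position of the strongest response in each case.
peak_early = resp_early.index(max(resp_early))
peak_late = resp_late.index(max(resp_late))
```

The peak moves by the same three positions as the pattern did, while its height stays the same, which is the restriction on feature combination that the slide describes.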
  11. 11. Spark Technology Center Structural Assumptions: Hierarchical Abstraction
  12. 12. Spark Technology Center - Pixels - Edges - Shapes - Parts - Objects - Learn features that are optimized for the data - Makes transfer learning feasible Structural Assumptions: Hierarchical Abstraction
  13. 13. Spark Technology Center - Character - Word - Phrase - Sentence - Phonemes - Words - Pixels - Edges - Shapes - Parts - Objects Structural Assumptions: Composition
  14. 14. Spark Technology Center 1. CNNs - State of the art a. Object Recognition b. Object Localization c. Image Segmentation d. Image Restoration e. Music Recommendation 2. RNNs (LSTM) - State of the Art a. Speech Recognition b. Question Answering c. Machine Translation d. Text Summarization e. Named Entity Recognition f. Natural Language Generation g. Word Sense Disambiguation h. Image / Video Captioning i. Sentiment Analysis Applications
  15. 15. Spark Technology Center ● Computationally Efficient ● Makes Transfer Learning Easy ● Takes advantage of location invariance Structural Assumptions: Weight Sharing
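To make the efficiency claim concrete, here is a rough parameter count under assumed layer sizes (a 28x28 single-channel input, a 5x5 kernel, and 8 filters are illustrative choices, not figures from the talk). A convolutional layer reuses one small kernel at every location, while a fully connected layer producing the same number of outputs needs a separate weight for every input-output pair.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases for a fully connected layer."""
    return in_h * in_w * in_c * out_units + out_units

def conv_params(k_h, k_w, in_c, out_c):
    """Weights plus biases for a convolutional layer; shared across locations."""
    return k_h * k_w * in_c * out_c + out_c

# Assumed sizes for illustration only.
in_h, in_w, in_c = 28, 28, 1
k, filters = 5, 8

# A valid 5x5 convolution over 28x28 yields a 24x24 map per filter.
out_h, out_w = in_h - k + 1, in_w - k + 1
out_units = out_h * out_w * filters

n_conv = conv_params(k, k, in_c, filters)
n_dense = dense_params(in_h, in_w, in_c, out_units)
```

Under these assumptions the shared-weight layer needs a few hundred parameters where the dense layer needs millions, which is also why the learned filters are small enough to carry over to a new task.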
  16. 16. Spark Technology Center - Network depth creates an extraordinary range of possible models. - That flexibility creates value in large datasets to reduce variance. Structural Assumptions: Combinatorial Flexibility
  17. 17. Spark Technology Center Automated Feature Engineering - Feature hierarchy is too complex to engineer manually - Works well for compositional structure, overfits elsewhere
  18. 18. Spark Technology Center Learning Representations Hidden Layer + Nonlinearity pology/
  19. 19. Spark Technology Center Flexibility. High level enough to be efficient. Low level enough to be expressive. MLlib Flexible Deep Learning API
  20. 20. Spark Technology Center Modularity enables Logistic Regression, Feedforward Networks. MLlib Flexible Deep Learning API
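The modularity point can be sketched with a few composable layers. The class names below (`Affine`, `Sigmoid`, `ReLU`, `Sequential`) and all weights are hypothetical and are not the MLlib API; the sketch only shows that logistic regression is the single-layer case of the same building blocks that make up a feedforward network.

```python
import math

class Affine:
    """Linear map plus bias; one row of weights per output unit."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def forward(self, x):
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]

class Sigmoid:
    def forward(self, x):
        return [1.0 / (1.0 + math.exp(-v)) for v in x]

class ReLU:
    def forward(self, x):
        return [max(0.0, v) for v in x]

class Sequential:
    """Applies each layer in order."""
    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

# Logistic regression: one affine layer followed by a sigmoid.
logistic_regression = Sequential(Affine([[0.5, -0.25]], [0.0]), Sigmoid())

# Feedforward network: the same pieces with a hidden layer and nonlinearity added.
feedforward = Sequential(
    Affine([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ReLU(),
    Affine([[0.5, -0.25]], [0.0]),
    Sigmoid(),
)

p_lr = logistic_regression.forward([2.0, -4.0])[0]
p_ff = feedforward.forward([2.0, -4.0])[0]
```

With the illustrative weights above, the hidden ReLU zeroes out the negative input, so the two models disagree on the same example even though they share the output layer.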
  21. 21. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. Momentum cancels noise in the gradient.
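A small sketch of why momentum cancels gradient noise, using the classical update v ← μv − ηg, w ← w + v. The learning rate, momentum coefficient, and gradient sequences are assumed values for illustration. Gradients that keep the same sign accumulate in the velocity, while gradients that flip sign from step to step largely cancel.

```python
def momentum_step(weight, velocity, grad, lr=0.1, mu=0.9):
    """One SGD-with-momentum update on a scalar weight."""
    velocity = mu * velocity - lr * grad
    return weight + velocity, velocity

def run(grads, lr=0.1, mu=0.9):
    """Apply a sequence of gradients starting from zero weight and velocity."""
    weight, velocity = 0.0, 0.0
    for g in grads:
        weight, velocity = momentum_step(weight, velocity, g, lr, mu)
    return weight, velocity

# Consistent signal: the gradient always points the same way.
_, v_consistent = run([1.0] * 50)

# Pure noise: the gradient alternates in sign every step.
_, v_noisy = run([1.0 if i % 2 == 0 else -1.0 for i in range(50)])
```

After 50 steps the velocity for the consistent gradient approaches lr / (1 - mu) in magnitude, while the velocity for the alternating gradient stays near lr / (1 + mu), roughly twenty times smaller with these settings.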
  22. 22. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. RMSProp automatically adapts the learning rate.
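The adaptive learning rate can be sketched with the standard RMSProp update, which divides each gradient by a running root-mean-square of recent gradients. The hyperparameters below (`lr`, `decay`, `eps`) are common illustrative defaults, not values taken from the talk.

```python
import math

def rmsprop_step(weight, cache, grad, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update on a scalar weight."""
    # Running average of squared gradients.
    cache = decay * cache + (1.0 - decay) * grad * grad
    # Scale the step by the inverse root of that average.
    weight = weight - lr * grad / (math.sqrt(cache) + eps)
    return weight, cache

# Two gradients that differ by four orders of magnitude.
w_small, _ = rmsprop_step(0.0, 0.0, 0.01)
w_large, _ = rmsprop_step(0.0, 0.0, 100.0)

step_small = abs(w_small)
step_large = abs(w_large)
```

Both updates move the weight by almost the same distance, so a single global learning rate works for parameters whose raw gradients have very different scales.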
  23. 23. Spark Technology Center Parallel implementation of backpropagation: 1. Each worker gets weights from master node. 2. Each worker computes a gradient on its data. 3. Each worker sends gradient to master. 4. Master averages the gradients and updates the weights. Distributed Optimization
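The four steps above can be simulated on a single machine. This is a sketch under stated assumptions: plain Python lists stand in for Spark partitions, the model is a one-weight linear regression on toy data for y = 2x, and the function names are hypothetical rather than taken from the MLlib implementation.

```python
def local_gradient(weights, partition):
    """Mean squared-error gradient of a linear model on one worker's data."""
    grad = [0.0] * len(weights)
    for features, target in partition:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(partition)
    return grad

def distributed_step(weights, partitions, lr):
    # Steps 1 and 2: each worker receives the weights and computes a gradient.
    grads = [local_gradient(weights, part) for part in partitions]
    # Step 3: gradients are sent back to the master (here, collected in a list).
    # Step 4: the master averages the gradients and updates the weights.
    averaged = [
        sum(g[i] for g in grads) / len(grads) for i in range(len(weights))
    ]
    return [w - lr * a for w, a in zip(weights, averaged)]

# Toy data for y = 2x, split evenly across two simulated workers.
partitions = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0), ([4.0], 8.0)],
]

weights = [0.0]
for _ in range(200):
    weights = distributed_step(weights, partitions, lr=0.05)
```

Because the simulated partitions are the same size, averaging the per-worker gradients gives the same result as the full-batch gradient; with uneven partitions a weighted average would be needed to keep that property.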
  24. 24. Spark Technology Center ● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node). ● Advantages to parallelism diminish with additional nodes due to communication costs. ● Additional workers are valuable up to ~20 workers. ● See hmark for more details Performance
  25. 25. Spark Technology Center Github: Spark Package: parkdl Access
  26. 26. Spark Technology Center 1. GPU Acceleration (External) 2. Python API 3. Keras Integration 4. Residual Layers 5. Hardening 6. Regularization 7. Batch Normalization 8. Tensor Support Future Work
  27. 27. Deep Learning on Spark 1. Major Projects a. DL4J b. BigDL c. Spark-deep-learning d. Tensorflow-on-Spark e. SystemML 2. Important Comparisons 3. Minor & Abandoned Projects a. H2O.ai DeepWater b. TensorFrames c. Caffe-on-Spark d. Scalable-deep-learning e. MLlib Deep Learning f. Sparknet g. DeepDist
  28. 28. ● Distributed GPU support for all major deep learning architectures ○ CPU / Distributed CPU / Single GPU options exist ○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec ● Actively Supported and Improved ● APIs in Java, Scala, Python ○ Fairly Inelegant API, there’s an option through ScalNet (Keras-like front end) ○ Working towards becoming a Keras Backend ● Backed by Skymind (Committed) ○ ~15 person startup, Adam Gibson + Chris Nicholson ● Modular front end in DL4J ● Backed by linear algebra library ND4J ○ Numerical computing wrapper over BLAS for various backends ● Python API has Keras import / export ● Production with proprietary ‘Skymind Intelligence Layer’ DL4J
  29. 29. BigDL ● Distributed CPU based library ○ Backed by Intel MKL / multithreading ○ No benchmark out as yet ● Support for most major deep learning architectures ○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove ● Backed by Intel (Committed) ○ Actively Supported / Improved ○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear. ○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL ● Scala and Python API Support ○ API Modeled after Torch ● Support for numeric computing via tensors
  30. 30. Spark-deep-learning ● Databricks’ library focused on model serving, to allow scaled out inference ● ‘Transfer Learning’ (Allows logistic regression layer to be retrained) ● Python API ○ One-liner for integrating Keras model into a pipeline ● Supports Tensorflow models ○ Keras Import for Tensorflow backed Keras Models ● Support for image processing only ● Weakly Supported by Databricks ○ Last commit was a month ago ○ Qualifying lines - “We will implement text processing, audio processing if there is interest”
  31. 31. 1. Goal is to scale out Caffe / Tensorflow on heterogeneous GPU / CPU setup a. Each executor launches a Caffe / TF instance b. RDMA / Infiniband for distributing compute in TF on Spark, improvement over TF’s ethernet model 2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout 3. Allows for Model / Data parallelism 4. Weakly supported by Yahoo a. Caffe-on-spark hasn’t seen a commit in 6 months b. Tensorflow-on-spark gets about 2 minor commits / month 5. Yahoo demonstrated capability on large scale Flickr dataset 6. Visualization with tensorboard Caffe / Tensorflow -on-Spark
  32. 32. SystemML ● Deep Learning library with single-node GPU support, moving towards distributed GPU support ○ Supports CNNs for Classification, Localization, Segmentation ○ Supports RNNs / LSTM ● Attached to linear algebra focused ML library w/ linear algebra compiler ● Backed by IBM ○ Actively being Improved ● Provides CPU based support for most computer vision tasks ○ Convolutional Networks ● Caffe2DML for caffe integration ● DML API ○ SystemML has Python API for a handful of algorithms, may come out with Python DL API
  33. 33. Important Comparisons
      Framework | Hardware Supported | Models | API
      DL4J | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Java, Scala, Python
      BigDL | CPU / Distributed CPU | CNNs, RNNs, Feedforward Nets | Scala, Python
      Spark-Deep-Learning | CPU / Distributed CPU | Vision - CNNs, Feedforward Nets | Python
      Caffe / Tensorflow on Spark | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Python
      SystemML Deep Learning | CPU, Towards GPU / Distributed GPU | CNNs, RNNs, Feedforward Nets | DML, Potentially Python
  34. 34. Important Comparisons
      Framework | Support Strength | Goal | Distinguishing Value
      DL4J | Skymind. Fully focused on package, but still a Startup. | Fully fledged Deep Learning solution from training to production | Comprehensive, Distributed GPU.
      BigDL | Intel. Fairly strong AI/DL commitment. Has Chainer, Nirvana. | Spark / Hadoop solution, bring DL to the data | Comprehensive
      Spark-Deep-Learning | Databricks, ambiguous level of commitment | Scaleout solution for TF users | Scaling out with Spark at inference time
      Caffe / Tensorflow on Spark | Yahoo. Caffe-on-spark looks abandoned, TF-on Spark better. | Scaling out training on heterogeneous hardware. | Scaling out training with distributed CPU / GPU.
      SystemML Deep Learning | IBM team. | Deep Learning Training solution | GPU Support, Moving towards Distributed GPU Support.
  35. 35. Minor & Abandoned Projects 1. H2O.ai DeepWater a. Integrates other frameworks (TF, MXNet, Caffe) into H2O Platform b. Only native support is for feedforward networks 2. MXNet Integration a. Nascent, few commits from Microsoft engineer 3. TensorFrames a. Focused on hyperparameter tuning, running TF instances in parallel. ~ 2 commits / month 4. Caffe-on-Spark a. No commits for ~6 months 5. Scalable-deep-learning a. Only supports feedforward networks / autoencoder, CPU based 6. MLlib Deep Learning a. Only supports feedforward networks, CPU based 7. Sparknet a. Abandoned, no commits for 18 months
  36. 36. Deep Learning Outside of Spark
  37. 37. Deep Learning Outside of Spark
  38. 38. Spark Technology Center Thank you for your attention! Questions?