Deeplearning on Hadoop @OSCON 2014

Distributed Deep Learning on Hadoop

Deep learning is useful for detecting anomalies such as fraud, spam and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices.

Deeplearning4j is a deep-learning architecture designed to scale out on Hadoop and other big-data platforms. It includes both a distributed deep-learning framework and a standard, non-distributed one, so it also runs on a single thread. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they work equally well from Java, Scala and Clojure. The distributed framework is built for data ingestion and neural-net training at scale, and its output should be highly accurate predictive models.
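
Iterative reduce can be pictured as a series of supersteps: every worker starts from the current global model, trains on its own data split, and a master averages the partial models into the next global model. The sketch below illustrates that loop in plain Python for brevity (DL4J itself is Java). The linear model, squared loss, learning rate and every function name are assumptions made for illustration; this is not the DL4J or IterativeReduce API.

```python
from typing import List, Tuple

# Assumptions for this sketch (not DL4J's API): a linear model y = w*x + b,
# squared loss, one plain-SGD pass per worker per superstep, equal-weight averaging.
Params = Tuple[float, float]          # (w, b)
Split = List[Tuple[float, float]]     # one worker's (x, y) examples


def train_on_split(params: Params, split: Split, lr: float) -> Params:
    """Worker: one SGD pass over its own split, starting from the global model."""
    w, b = params
    for x, y in split:
        err = (w * x + b) - y   # prediction error
        w -= lr * err * x       # step against d(0.5 * err^2)/dw
        b -= lr * err           # step against d(0.5 * err^2)/db
    return w, b


def average(partials: List[Params]) -> Params:
    """Master: average the workers' partial models into one global model."""
    n = len(partials)
    return (sum(p[0] for p in partials) / n, sum(p[1] for p in partials) / n)


def iterative_reduce(splits: List[Split], supersteps: int, lr: float = 0.05) -> Params:
    global_model: Params = (0.0, 0.0)
    for _ in range(supersteps):
        # On a cluster each worker would run concurrently; here they run in a loop.
        partials = [train_on_split(global_model, s, lr) for s in splits]
        global_model = average(partials)  # sent back to every worker next superstep
    return global_model


if __name__ == "__main__":
    # Noise-free data for y = 2x + 1, dealt round-robin into three splits.
    data = [(x / 10, 2 * (x / 10) + 1) for x in range(30)]
    splits = [data[0::3], data[1::3], data[2::3]]
    w, b = iterative_reduce(splits, supersteps=200)
    print(f"w={w:.3f}, b={b:.3f}")
```

With this noise-free data the averaged model should approach w = 2 and b = 1. Averaging once per pass, rather than after every example, is what keeps message passing between workers and master infrequent.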

The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
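
Because RBMs are the building block of the deep-belief networks listed above, here is a minimal sketch of a contrastive-divergence (CD-1) update for an RBM with binary units, in plain Python. To keep it deterministic it uses mean-field probabilities for the hidden and reconstructed states instead of sampled binary states; that simplification, the learning rate, the toy data and all names are assumptions, not DL4J's implementation. Continuous inputs would need different visible units.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # W[i][j] connects visible unit i to hidden unit j


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_probs(v: Vector, W: Matrix, c: Vector) -> Vector:
    """p(h_j = 1 | v) for each binary hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def visible_probs(h: Vector, W: Matrix, b: Vector) -> Vector:
    """p(v_i = 1 | h) for each binary visible unit i."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(b))]


def cd1_update(v0: Vector, W: Matrix, b: Vector, c: Vector, lr: float = 0.1) -> float:
    """One mean-field CD-1 step on one example; updates W, b, c in place.

    Returns the squared reconstruction error, a rough training monitor.
    """
    h0 = hidden_probs(v0, W, c)   # positive phase: driven by the data
    v1 = visible_probs(h0, W, b)  # reconstruction of the input
    h1 = hidden_probs(v1, W, c)   # negative phase: driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (h0[j] - h1[j])
    return sum((v0[i] - v1[i]) ** 2 for i in range(len(b)))


if __name__ == "__main__":
    random.seed(0)
    n_visible, n_hidden = 6, 3
    W = [[random.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b, c = [0.0] * n_visible, [0.0] * n_hidden
    data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
    for epoch in range(500):
        err = sum(cd1_update(v, W, b, c) for v in data)
        if epoch % 100 == 0:
            print(f"epoch {epoch}: reconstruction error {err:.4f}")
```

A deep-belief network would train one such RBM, feed its hidden probabilities to the next RBM as input, and repeat; that stacking is the pretraining phase described in the slides.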

  1. Deep Learning on Hadoop: Scale out Deep Learning on YARN
  2. Adam Gibson. Email: 0@blix.io; Twitter: @agibsonccc; GitHub: github.com/agibsonccc; Slideshare: slideshare.net/agibsonccc; Teaching: zipfianacademy.com; Press: wired.com/2014/06/skymind-deep-learning
  3. Josh Patterson. Email: josh@pattersonconsultingtn.com; Twitter: @jpatanooga; GitHub: github.com/jpatanooga. Past: published in IAAI-09: “TinyTermite: A Secure Routing Algorithm”; grad work in meta-heuristics and ant algorithms; Tennessee Valley Authority (TVA): Hadoop and the smart grid; Cloudera: Principal Solution Architect. Today: Patterson Consulting
  4. Overview • What Is Deep Learning? • Neural Nets and Optimization Algorithms • Implementation on Hadoop/YARN • Results
  5. What Is Deep Learning? Machine perception, pattern recognition.
  6. What Is Deep Learning? Algorithms called neural nets that learn to recognize patterns: nodes learn smaller features of larger patterns and combine them to recognize feature groups, until finally they can classify objects, faces, etc. Each node layer in the net learns larger groups.
  7. Properties of Deep Learning: They need only small labeled training sets because they also learn from unlabeled (unsupervised) data. They save data scientists months of work. Anything you can vectorize, DL nets can learn. They can handle millions of parameters. After training, a DL model is a single parameter vector that is small compared with its training data.
  8. Chasing Nature: Learning sparse representations of auditory signals leads to filters that correspond to neurons in early audio processing in mammals. When applied to speech, the learned representations resemble cochlear filters, the frequency-selective filters of the inner ear.
  9. Yann LeCun on Deep Learning: DL is the dominant method for acoustic modeling in speech recognition. It is becoming dominant in machine vision for object recognition, object detection and semantic segmentation.
  10. Deep Neural Nets: “deep” means more than one hidden layer.
  11. Restricted Boltzmann Machines: RBMs are building blocks for deeper nets. They handle binary and continuous data differently. [Figure panels: Binary; Continuous]
  12. What Is a Deep-Belief Network? A stack of restricted Boltzmann machines, forming a generative probabilistic model with 1) a visible (input) layer, 2) two or more hidden layers that learn more and more complex features, and 3) an output layer that classifies the input.
  13. A Recursive Neural Tensor Network? RNTNs are hierarchical (tree-structured); DBNs are feed-forward. A tensor is a 3-D matrix. RNTNs handle multiplicity. Uses: scene and sentence parsing, windows of events.
  14. A Deep Autoencoder? DAs are good for QA systems like Watson. They encode large amounts of data as smaller numeric vectors. Good for image search and topic modeling.
  15. A Convolutional Net? ConvNets slice up features with shared weights. ConvNets learn images in patches from a grid. Very good at generalization.
  16. DeepLearning4J: The most complete, production-ready open-source DL library. Written in Java; uses Akka, Hazelcast and jblas. Distributed to run fast, built for non-specialists. More features than Theano-based tools. Talks to any data source, expects one format.
  17. DL4J Serves Industry: Nonspecialists can rely on its conventions to solve computationally intensive problems. Usability first: DL4J follows ML tool conventions. DL4J’s nets work equally well with text, image, sound and time-series data. DL4J will integrate with the Python community through SDKs.
  18. Vectorized Implementation: Handles lots of data concurrently: any number of examples at once, without changing the code. Faster: allows for native and GPU execution. One input format: everything is a matrix. Images, sound, text and time series are vectorized.
  19. DL4J vs Theano vs Torch: DL4J’s distributed nature means problems can be solved by “throwing CPUs at them.” The Java ecosystem has GPU integration tools. Theano is not distributed, and Torch7 has not automated its distribution like DL4J. DL4J’s matrix multiplication is native with jblas.
  20. What Are Good Applications for DL? Recommendation engines (e-commerce): DL can model consumer and user behavior. Anomaly detection (fraud, money laundering): DL can recognize early signals of bad outcomes. Signal processing (CRM, ERP): DL has predictive capacity with time-series data.
  21. DL4J Vectorizes & Analyzes Text: sentiment analysis, logs, news articles, social media.
  22. Build Your Own Google Brain …DL on Hadoop and AWS
  23. Past Work: Parallel Iterative Algos on YARN. Started with parallel linear and logistic regression and parallel neural networks. “Metronome” packages DL4J for Hadoop. 100% Java, ASF 2.0 licensed, on GitHub.
  24. MapReduce vs. Parallel Iterative [Diagram: MapReduce (Input, three Map tasks, two Reduce tasks, Output) vs. parallel iterative processing (processors working through Superstep 1, Superstep 2, …)]
  25. SGD: Serial vs Parallel [Diagram: serial SGD trains one model on the training data; parallel SGD assigns Split 1, Split 2, Split 3, … to Workers 1 through N, each producing a partial model, and a master combines them into a global model]
  26. Managing Resources: Running through YARN on Hadoop is important. It allows for workflow scheduling and scheduler oversight, makes the jobs first-class citizens on Hadoop, and shares resources nicely.
  27. Parallelizing Deep-Belief Networks: Two-phase training: pretrain, then fine-tune. Each phase can do multiple passes over the dataset. The entire network is averaged at the master.
  28. Pretraining and Lots of Data: We’re exploring how to better leverage the unsupervised aspects of the pretraining phase of deep-belief networks. This allows for the use of far more unlabeled data and lets us more easily model the massive amounts of structured data in HDFS.
  29. Results: DL4J on Hadoop is fast and accurate
  30. DBNs on IR (iterative reduce): Performance. Faster to train. Parameter averaging is an automatic form of regularization. Adagrad with IR allows for better generalization of different features and even pacing.
  31. Scale-out Metrics: Batches of records can be processed by as many workers as there are data splits. Message-passing overhead is minimal. Exhibits linear scaling; for example, 3x workers give 3x faster learning.
  32. Usage from the Command Line: Run deep learning on Hadoop with yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]. Evaluate the model with ./score_model.sh [props file].
  33. Handwriting Renders
  34. Facial Renders
  35. What’s Next? GPU integration in the cloud (AWS); better vectorization tooling and data pipelines; move the YARN version back over to jblas for matrices; Spark.
  36. References: Hinton, G. E., Osindero, S. and Teh, Y., “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation (2006). Dean, J., Corrado, G., Monga, R., et al., “Large Scale Distributed Deep Networks,” NIPS (2012). Yosinski, J. and Lipson, H., “Visually Debugging Restricted Boltzmann Machine Training with a 3D Example,” Representation Learning Workshop (2012).
  37. Parameter Averaging: McDonald et al. (2010), “Distributed Training Strategies for the Structured Perceptron”; Langford (2007), Vowpal Wabbit; Jeff Dean’s work on parallel SGD (Downpour SGD).
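
Slide 15 (A Convolutional Net?) says convnets apply shared weights to patches from a grid. A minimal plain-Python sketch of that idea is a "valid" 2-D cross-correlation (what most deep-learning libraries call convolution): one small kernel is reused at every patch position, so the number of weights does not grow with the image. The function name and example values are illustrative assumptions; this is not DL4J's convolution layer.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide one shared kernel over every patch of the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for col in range(out_w):
            # The same kernel weights are applied to the patch whose corner is (r, col).
            row.append(sum(image[r + i][col + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]            # 4 shared weights, whatever the image size
    print(conv2d_valid(image, kernel))   # [[6, 8], [12, 14]]
```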
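
Slide 18 (Vectorized Implementation) says everything is a matrix and that any number of examples can be processed without changing the code; slide 10 defines "deep" as more than one hidden layer. Both points can be illustrated with a batched forward pass: examples are stacked as rows of one matrix, and each layer is a single matrix product plus bias and sigmoid. Plain Python lists stand in for a native matrix library such as jblas, and the weights and layer sizes below are made up for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights: n_in x n_out, bias: n_out)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) times (k x m) gives (n x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dense_sigmoid(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """One layer for the whole batch: sigmoid(X W + bias), bias added to every row."""
    return [[1.0 / (1.0 + math.exp(-(z + bias[j]))) for j, z in enumerate(row)]
            for row in matmul(x, w)]


def forward(batch: Matrix, layers: List[Layer]) -> Matrix:
    """Apply each layer in turn; the code is identical for 1 row or 1,000,000 rows."""
    out = batch
    for w, bias in layers:
        out = dense_sigmoid(out, w, bias)
    return out


# Hypothetical weights for a 3 -> 4 -> 3 -> 2 net: two hidden layers, so "deep".
LAYERS: List[Layer] = [
    ([[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.4, 0.2], [-0.3, 0.2, 0.1, 0.4]], [0.0, 0.0, 0.0, 0.0]),
    ([[0.2, -0.1, 0.3], [0.3, 0.4, -0.2], [-0.5, 0.1, 0.2], [0.1, 0.2, 0.1]], [0.0, 0.1, 0.0]),
    ([[0.4, -0.3], [0.1, 0.2], [-0.2, 0.5]], [0.0, 0.0]),
]

if __name__ == "__main__":
    batch = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
    print(forward(batch, LAYERS))      # three examples in, three rows out
    print(forward(batch[:1], LAYERS))  # same code path, batch of one
```

Replacing the list-based matmul with a native or GPU routine speeds things up without touching forward, which is the practical payoff of the single matrix format.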
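
Slide 21 (DL4J Vectorizes & Analyzes Text) and slide 18's "one input format" imply that text must become rows of a numeric matrix before a net can learn from it. A minimal sketch of one generic way to do that, a bag-of-words count vectorizer, is shown below. This technique, the whitespace tokenization and the function names are assumptions for illustration; they are not DL4J's text pipeline.

```python
from typing import Dict, List


def build_vocab(docs: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(docs: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """One row per document, one column per vocabulary word; unknown words are dropped."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    docs = ["fraud alert on account", "account login ok", "fraud fraud detected"]
    vocab = build_vocab(docs)
    print(vectorize(docs, vocab))
```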
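
Slide 30 credits Adagrad, used with iterative reduce, for better generalization across features and even pacing. For reference, a minimal sketch of the standard Adagrad rule follows: each parameter's step is divided by the square root of its own accumulated squared gradients. The toy objective, learning rate and epsilon are illustrative assumptions, not DL4J's settings.

```python
import math
from typing import Callable, List


def adagrad(grad: Callable[[List[float]], List[float]], x0: List[float],
            lr: float = 0.5, steps: int = 100, eps: float = 1e-8) -> List[float]:
    """Minimize a function from its gradient with per-parameter Adagrad step sizes."""
    x = list(x0)
    g_sq = [0.0] * len(x)  # running sum of squared gradients, one per parameter
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            g_sq[i] += g[i] ** 2
            # Large accumulated gradients shrink this parameter's effective step.
            x[i] -= lr * g[i] / (math.sqrt(g_sq[i]) + eps)
    return x


def quadratic_grad(x: List[float]) -> List[float]:
    """Gradient of the toy objective f(x) = 100*x[0]**2 + x[1]**2."""
    return [200.0 * x[0], 2.0 * x[1]]


if __name__ == "__main__":
    print(adagrad(quadratic_grad, [1.0, 1.0], steps=1))  # one step
    print(adagrad(quadratic_grad, [1.0, 1.0]))           # 100 steps
```

On this objective the two coordinates follow essentially the same trajectory even though their curvatures differ by a factor of 100, because Adagrad's per-parameter scaling cancels the gradient's magnitude. That is one way to read "even pacing" across features.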
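
Slide 31's linear-scaling claim (3x workers, 3x faster learning) is consistent with a simple cost model, stated here as an assumption rather than a measurement. Let N records be split evenly over k workers that each process r records per second, and let c be the per-superstep cost of parameter averaging:

```latex
T(k) = \frac{N}{k\,r} + c,
\qquad
S(k) = \frac{T(1)}{T(k)} = \frac{N/r + c}{N/(k\,r) + c}
\;\longrightarrow\; k \quad \text{as } c \to 0 .
```

The speedup is close to k, so S(3) is close to 3, whenever the averaging overhead c is small relative to the per-worker compute time N/(kr), which is what "message-passing overhead is minimal" asserts.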
