
Georgia Tech cse6242 - Intro to Deep Learning and DL4J



Introduction to deep learning and DL4J: a guest lecture by Josh Patterson at Georgia Tech for the cse6242 graduate class.

Published in: Data & Analytics


  1. Josh Patterson. Email: Twitter: @jpatanooga Github: nooga. Past: published in IAAI-09, “TinyTermite: A Secure Routing Algorithm”; grad work in meta-heuristics and ant algorithms; Tennessee Valley Authority (TVA), Hadoop and the Smartgrid; Cloudera, Principal Solution Architect. Today: Patterson Consulting.
  2. Overview • What is Deep Learning? • Deep Belief Networks • DL4J
  3. What is Deep Learning? An algorithm that tries to learn simple features in lower layers and more complex features in higher layers.
  4. Interesting Properties of Deep Learning: Reduces a problem with overfitting in neural networks. Introduces new techniques for “unsupervised feature learning”: new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm.
  5. Chasing Nature: Learning sparse representations of auditory signals leads to filters that closely correspond to neurons in early audio processing in mammals. When applied to speech, the learned representations showed a striking resemblance to the cochlear filters in the auditory cortex.
  6. Yann LeCun on Deep Learning: Has become the dominant method for acoustic modeling in speech recognition. Quickly becoming the dominant method for several vision tasks, such as object recognition, object detection, and semantic segmentation.
  7. What is a Deep Belief Network? A generative probabilistic model composed of one visible layer and many hidden layers, built from Restricted Boltzmann Machines. Each hidden layer learns the relationships between units in the layer below. Higher-layer representations tend to become more complex.
  8. Restricted Boltzmann Machines • Unsupervised model • Does feature learning by repeated sampling of the input data • Learns how to reconstruct data for good feature detection
  9. Deep Belief Network Training. Pre-Train: We show each RBM layer unlabeled vectors (“unsupervised learning”). For each layer we want to minimize the cross entropy. Fine-Tune: We move the learned weights (hidden bias units) from the RBMs to a traditional feed-forward neural network. We run gentle back-propagation with some labeled data.
  10. Pre-Train Reconstructions (figure panels: High Cross Entropy, Low Cross Entropy)
  11. Deep Belief Network Diagram • DBNs are classifiers • Layers of RBMs • Capped with a Logistic Layer • RBMs are feature extractors • RBMs learn features via sampling • Creates “simpler problem” for later layers in stack
  12. Rendering RBM Hidden Neuron Filters
  13. DeepLearning4J: Implementation in Java. Self-contained and built on Akka, Hazelcast, and Jblas. Runs on desktop. Runs on Hadoop via YARN natively to scale out. Distributed to run faster and with more features than current Theano-based implementations.
  14. Vectorized Implementation: Handles lots of data concurrently. Any number of examples at once, but the code does not change. Faster: allows for native/GPU execution. One format: everything is a matrix.
  15. What are Good Applications for Deep Learning? Image processing: high MNIST scores. Audio processing: current champ on the TIMIT dataset. Text / NLP processing: Word2vec, etc.
  16. Parameter Averaging: McDonald, 2010, “Distributed Training Strategies for the Structured Perceptron”. Langford, 2007, Vowpal Wabbit. Jeff Dean’s work on parallel SGD: DownPour SGD.
  17. Parallelizing Deep Belief Networks: Two-phase training: Pre-Train, then Fine-Tune. Each phase can do multiple passes over the dataset. The entire network is averaged at the master.
  18. PreTrain and Lots of Data: We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks. Allows for the use of far less labeled data. Allows us to more easily model the massive amounts of structured data in HDFS.
  19. References: Visualizing RBMs tml DL4J
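
The RBM and pre-training slides describe learning features by repeatedly sampling the input and minimizing the cross entropy of the reconstruction. A minimal sketch of that idea follows, in plain Python rather than DL4J. The `TinyRBM` class, its method names, the toy data, and the choice of one-step contrastive divergence (CD-1) as the update rule are all assumptions made for illustration; the slides do not say which sampling scheme DL4J uses.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reconstruction_cross_entropy(visible, reconstruction, eps=1e-12):
    """Cross entropy between a binary input vector and its reconstruction."""
    total = 0.0
    for v, r in zip(visible, reconstruction):
        r = min(max(r, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= v * math.log(r) + (1.0 - v) * math.log(1.0 - r)
    return total


class TinyRBM:
    """Hypothetical minimal RBM with binary units (illustration only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # w[i][j] connects visible unit i to hidden unit j
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hidden[j]
                        + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hidden))]

    def visible_probs(self, h):
        return [sigmoid(self.b_visible[i]
                        + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_visible))]

    def reconstruct(self, v):
        """Visible -> hidden probabilities -> reconstructed visible probabilities."""
        return self.visible_probs(self.hidden_probs(v))

    def cd1_update(self, v0, learning_rate=0.1):
        """One CD-1 step: sample hidden units, reconstruct, nudge the weights."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                # data-driven statistics minus reconstruction-driven statistics
                self.w[i][j] += learning_rate * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_visible[i] += learning_rate * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hidden[j] += learning_rate * (h0[j] - h1[j])


# Toy unlabeled data: two binary patterns (hypothetical).
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
for _ in range(500):
    for v in data:
        rbm.cd1_update(v)

for v in data:
    print(v, round(reconstruction_cross_entropy(v, rbm.reconstruct(v)), 4))
```

A reconstruction that stays close to the input gives a low cross entropy and a poor one gives a high value, which is the contrast the Pre-Train Reconstructions slide illustrates. No labels are used anywhere in this loop.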
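
The parameter-averaging slides say that each training phase can make multiple passes over the dataset and that the entire network is averaged at the master. The sketch below shows one way that loop could look. It is not DL4J's code: a small logistic-regression model stands in for the network, the workers run sequentially in one process instead of on a cluster, and the function names and toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_sgd_pass(weights, shard, learning_rate=0.1):
    """One SGD pass of logistic regression over a single worker's shard."""
    w = list(weights)  # each worker starts from the master's current copy
    for features, label in shard:
        prediction = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
        error = prediction - label
        for i, xi in enumerate(features):
            w[i] -= learning_rate * error * xi
    return w


def average_parameters(worker_weights):
    """Master step: element-wise mean of every worker's parameter vector."""
    n = len(worker_weights)
    return [sum(column) / n for column in zip(*worker_weights)]


def train(shards, n_features, passes=20):
    """Repeat: send parameters out, train locally per shard, average at master."""
    master = [0.0] * n_features
    for _ in range(passes):
        worker_results = [local_sgd_pass(master, shard) for shard in shards]
        master = average_parameters(worker_results)
    return master


def make_toy_shards(n_shards=4, per_shard=50, seed=0):
    """Hypothetical data: label is 1 when x > 0; features are [bias, x]."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_shards):
        shard = []
        for _ in range(per_shard):
            x = rng.uniform(-1.0, 1.0)
            shard.append(([1.0, x], 1.0 if x > 0 else 0.0))
        shards.append(shard)
    return shards


shards = make_toy_shards()
weights = train(shards, n_features=2)
print(weights)
```

In a real deployment the list comprehension inside `train` would be replaced by workers running in parallel on separate data splits. Only the averaging step needs to see every worker's parameters.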