Adam Gibson
Email:
0@blix.io
Twitter:
@agibsonccc
Github:
https://github.com/a
gibsonccc
Slideshare:
http://slideshare.net...
Josh Patterson
Email:
josh@pattersonconsultingtn.com
Twitter:
@jpatanooga
Github:
https://github.com/jpata
nooga
Past
Publ...
Overview
• What is Deep Learning?
• Deep Belief Networks
• Implementation on Hadoop/YARN
• Results
What is Deep Learning?
Algorithm that tries to learn simple features in lower
layers
And more complex features in higher l...
Interesting Properties of Deep Learning
Reduces a problem with overfitting in neural
networks.
Introduces new techniques f...
Chasing Nature
Learning sparse representations of auditory signals
leads to filters that closely correspond to neurons in
...
Yann LeCunn on Deep Learning
Has become the dominant method for acoustic
modeling in speech recognition
Quickly becoming t...
What is a Deep Belief Network?
Generative probabilistic model
Composed of one visible layer
Many hidden layers
Each hidden...
Restricted Boltzmann Machines
Unsupervised model: Does feature learning by repeated sampling of the
input data. Learns how...
DeepLearning4J
Implementation in Java
Self-contained & built on Akka, Hazelcast, Jblas
Distributed to run faster and with ...
Vectorized Implementation
Handles lots of data concurrently.
Any number of examples at once, but the code does
not change....
DL4J vs Theano Perf
GPUs are inherently faster than normal native.
Theano is not distributed, and GPUs have very low
RAM.
...
What are Good Applications for Deep Learning?
Image Processing
High MNIST Scores
Audio Processing
Current Champ on TIMIT d...
Past Work: Parallel Iterative Algorithms on YARN
Started with
Parallel linear, logistic regression
Parallel Neural Network...
Parameter Averaging
McDonald, 2010
Distributed Training Strategies for the Structured Perceptron
Langford, 2007
Vowpal Wab...
MapReduce vs. Parallel Iterative
20
Input
Output
Map Map Map
Reduce Reduce
Processor Processor Processor
Superstep 1
Proce...
SGD: Serial vs Parallel
21
Model
Training Data
Worker 1
Master
Partial
Model
Global Model
Worker 2
Partial Model
Worker N
...
Managing Resources
Running through YARN on hadoop is important
Allows for workflow scheduling
Allows for scheduler oversig...
Parallelizing Deep Belief Networks
Two phase training
Pre Train
Fine tune
Each phase can do multiple passes over dataset
E...
PreTrain and Lots of Data
We’re exploring how to better leverage the
unsupervised aspects of the PreTrain phase of
Deep Be...
DBNs on IR Performance
Faster to Train.
Parameter averaging is an automatic form of
regularization.
Adagrad with IR allows...
Scale Out Metrics
Batches of records can be processed by as many
workers as there are data splits
Message passing overhead...
Usage From Command Line
Run Deep Learning on Hadoop
yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]
Evaluate model
...
Handwriting Renders
Faces Renders
…In Which We Gather Lots of Cat Photos
Future Direction
GPUs
Better Vectorization tooling
Move YARN version back over to JBLAS for matrices
References
“A Fast Learning Algorithm for Deep Belief Nets”
Hinton, G. E., Osindero, S. and Teh, Y. - Neural Computation
(...
Deep Learning on Hadoop
Deep Learning on Hadoop
Deep Learning on Hadoop
Deep Learning on Hadoop
Deep Learning on Hadoop
Upcoming SlideShare
Loading in...5
×

Deep Learning on Hadoop

987

Published on

Published in: Technology, Education
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
987
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide
  • Bottou similar to Xu2010 in the 2010 paper
  • Benefits of data flow: runtime can decide where to run tasks and can automatically recover from failures
    Acyclic data flow is a powerful abstraction, but is not efficient for applications that repeatedly reuse a working set of data:
    Iterative algorithms (many in machine learning)
    • No single programming model or framework can excel at
    every problem; there are always tradeoffs between simplicity, expressivity, fault tolerance, performance, etc.
  • POLR: Parallel Online Logistic Regression
    Talking points:
    wanted to start with a known tool to the hadoop community, with expected characteristics
    Mahout’s SGD is well known, and so we used that as a base point
  • Deep Learning on Hadoop

    1. 1. Adam Gibson Email: 0@blix.io Twitter: @agibsonccc Github: https://github.com/a gibsonccc Slideshare: http://slideshare.net/agibsonc cc/ Instructor at http://zipfianacademy.com/ Wired Coverage: http://www.wired.com/2014/0 6/skymind-deep-learning/
    2. 2. Josh Patterson Email: josh@pattersonconsultingtn.com Twitter: @jpatanooga Github: https://github.com/jpata nooga Past Published in IAAI-09: “TinyTermite: A Secure Routing Algorithm” Grad work in Meta-heuristics, Ant-algorithms Tennessee Valley Authority (TVA) Hadoop and the Smartgrid Cloudera Principal Solution Architect Today: Patterson Consulting
    3. 3. Overview • What is Deep Learning? • Deep Belief Networks • Implementation on Hadoop/YARN • Results
    4. 4. What is Deep Learning? Algorithm that tries to learn simple features in lower layers And more complex features in higher layers
    5. 5. Interesting Properties of Deep Learning Reduces a problem with overfitting in neural networks. Introduces new techniques for "unsupervised feature learning” introduces new more automatic ways to figure out the parts of your data you should feed into your learning algorithm.
    6. 6. Chasing Nature Learning sparse representations of auditory signals leads to filters that closely correspond to neurons in early audio processing in mammals When applied to speech Learned representations showed a striking resemblance to the cochlear filters in the auditory cortext
    7. 7. Yann LeCunn on Deep Learning Has become the dominant method for acoustic modeling in speech recognition Quickly becoming the dominant method for several vision tasks such as object recognition object detection semantic segmentation.
    8. 8. What is a Deep Belief Network? Generative probabilistic model Composed of one visible layer Many hidden layers Each hidden layer learns relationship between units in lower layer Higher layer representations tend to become more complext
    9. 9. Restricted Boltzmann Machines Unsupervised model: Does feature learning by repeated sampling of the input data. Learns how to reconstruct data for good feature detection. RBMs have different formulas for different kinds of data: Binary Continuous
    10. 10. DeepLearning4J Implementation in Java Self-contained & built on Akka, Hazelcast, Jblas Distributed to run faster and with more features than current Theano-based implementations. Talks to any data source, expects one format.
    11. 11. Vectorized Implementation Handles lots of data concurrently. Any number of examples at once, but the code does not change. Faster: Allows for native/GPU execution. One format: Everything is a matrix.
    12. 12. DL4J vs Theano Perf GPUs are inherently faster than normal native. Theano is not distributed, and GPUs have very low RAM. DL4J allows for situations where you have to “throw CPUs at it.”
    13. 13. What are Good Applications for Deep Learning? Image Processing High MNIST Scores Audio Processing Current Champ on TIMIT dataset Text / NLP Processing Word2vec, etc
    14. 14. Past Work: Parallel Iterative Algorithms on YARN Started with Parallel linear, logistic regression Parallel Neural Networks Packaged in Metronome 100% Java, ASF 2.0 Licensed, on github
    15. 15. Parameter Averaging McDonald, 2010 Distributed Training Strategies for the Structured Perceptron Langford, 2007 Vowpal Wabbit Jeff Dean’s Work on Parallel SGD DownPour SGD 19
    16. 16. MapReduce vs. Parallel Iterative 20 Input Output Map Map Map Reduce Reduce Processor Processor Processor Superstep 1 Processor Processor Superstep 2 . . . Processor
    17. 17. SGD: Serial vs Parallel 21 Model Training Data Worker 1 Master Partial Model Global Model Worker 2 Partial Model Worker N Partial Model Split 1 Split 2 Split 3 …
    18. 18. Managing Resources Running through YARN on hadoop is important Allows for workflow scheduling Allows for scheduler oversight Allows the jobs to be first class citizens on Hadoop And share resources nicely
    19. 19. Parallelizing Deep Belief Networks Two phase training Pre Train Fine tune Each phase can do multiple passes over dataset Entire network is averaged at master
    20. 20. PreTrain and Lots of Data We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks Allows for the use of far less unlabeled data Allows us to more easily modeled the massive amounts of structured data in HDFS
    21. 21. DBNs on IR Performance Faster to Train. Parameter averaging is an automatic form of regularization. Adagrad with IR allows for better generalization of different features and even pacing.
    22. 22. Scale Out Metrics Batches of records can be processed by as many workers as there are data splits Message passing overhead is minimal Exhibits linear scaling Example: 3x workers, 3x faster learning
    23. 23. Usage From Command Line Run Deep Learning on Hadoop yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file] Evaluate model ./score_model.sh [props file]
    24. 24. Handwriting Renders
    25. 25. Faces Renders
    26. 26. …In Which We Gather Lots of Cat Photos
    27. 27. Future Direction GPUs Better Vectorization tooling Move YARN version back over to JBLAS for matrices
    28. 28. References “A Fast Learning Algorithm for Deep Belief Nets” Hinton, G. E., Osindero, S. and Teh, Y. - Neural Computation (2006) “Large Scale Distributed Deep Networks” Dean, Corrado, Monga - NIPS (2012) “Visually Debugging Restricted Boltzmann Machine Training with a 3D Example” Yosinski, Lipson - Representation Learning Workshop (2012)

    ×