1. Building Distributed Deep Learning Engine
Guangdeng Liao, Zhan Zhang and Murtaza Zafer
2. What is Deep Learning
Deep learning is a set of algorithms that attempt to model high-level
abstractions in data by using architectures composed of multiple non-linear
transformations.
Examples: learning hidden features, learning state emission probabilities,
and learning word vectors.
3. Thanks to Big Data, Deep Learning is no longer just research
Usage scenarios: speech recognition, image processing, and NLP
4. Why does Samsung need Deep Learning?
To make our devices smarter and more intelligent by recognizing
voice, images, and even language
5. What does Deep Learning look like?
Many more examples (millions to billions of parameters) in speech
recognition, image processing, and NLP
Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks
6. Deep Learning is challenging…
BIG DATA + BIG MODEL
Building a distributed deep learning platform for Samsung R&D is hard:
the field is quite new with no mature platform yet, and DL algorithms are
hard to design and develop.
7. Distributed Deep Learning Platform we are building
The platform is a stack of three layers:
Applications: object recognition, speech recognition, …
Algorithms: RBM, FF, DA, CNN, …
Infrastructure (our focus): model-parallel engine, parameter server,
execution engine, I/O, math
8. Now, let's dive deeper and get more technical…
9. Model-Parallel Engine (MPE)
Parallelizes a big ML model over a Hadoop YARN cluster.
Workflow: a user-defined model → auto-generation of the model topology →
auto-partition of the topology over the cluster → auto-deployment of the
topology (in-memory).
Programming model (a sketch follows this list):
• Neuron-like programming: define nodes, define groups, define connections
• Message-based communication
• Message-driven computation
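To make the programming model concrete, here is a minimal sketch of what
neuron-like model definition could look like. All names here (ModelBuilder,
NodeGroup, group, connect) are hypothetical illustrations, not MPE's actual API.

    import java.util.ArrayList;
    import java.util.List;

    public class NeuronStyleModel {
        // A group of neuron-like nodes (e.g., one layer).
        static class NodeGroup {
            final String name;
            final int size;
            NodeGroup(String name, int size) { this.name = name; this.size = size; }
        }

        // Records groups and the connections between them; the engine would
        // later auto-generate, partition, and deploy this topology.
        static class ModelBuilder {
            final List<NodeGroup> groups = new ArrayList<>();
            final List<String[]> connections = new ArrayList<>();
            NodeGroup group(String name, int size) {
                NodeGroup g = new NodeGroup(name, size);
                groups.add(g);
                return g;
            }
            void connect(NodeGroup from, NodeGroup to) {
                connections.add(new String[] { from.name, to.name });
            }
        }

        public static void main(String[] args) {
            ModelBuilder model = new ModelBuilder();
            NodeGroup input  = model.group("input", 784);   // define nodes/groups
            NodeGroup hidden = model.group("hidden", 1024);
            NodeGroup output = model.group("output", 10);
            model.connect(input, hidden);                   // define connections
            model.connect(hidden, output);
            System.out.println("groups=" + model.groups.size()
                    + " connections=" + model.connections.size());
        }
    }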
10. MPE’s Architecture
The Application Master hosts a Controller that partitions the topology and
deploys it across containers.
Each container runs a node manager.
Control communication is based on Thrift.
Data communication (node-level and group-level) between containers is based
on Netty; a generic sketch follows.
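The slides only name Netty for the data plane, so the following is a generic
Netty 4 server bootstrap of the kind a node manager might run; the framing
choice and the empty handler are assumptions, not MPE's code.

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;
    import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

    public class DataCommServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);
            EventLoopGroup workers = new NioEventLoopGroup();
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(boss, workers)
                 .channel(NioServerSocketChannel.class)
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         // Frame messages by a 4-byte length header, then hand them
                         // to a handler that would route node/group-level messages.
                         ch.pipeline()
                           .addLast(new LengthFieldBasedFrameDecoder(1 << 20, 0, 4, 0, 4))
                           .addLast(new SimpleChannelInboundHandler<Object>() {
                               @Override
                               protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
                                   // deserialize and dispatch to the local node manager here
                               }
                           });
                     }
                 });
                b.bind(9000).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }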
11. How to partition big models
Two approaches: vertical partition and horizontal partition (one reading is
sketched below).
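The slide does not define the two schemes, so this sketch assumes one common
reading: horizontal partitioning assigns whole layers to workers, while
vertical partitioning slices every layer across all workers. Treat the mapping
of names to schemes as an assumption.

    public class PartitionDemo {
        public static void main(String[] args) {
            int[] layerSizes = {784, 1024, 1024, 10}; // neurons per layer
            int workers = 2;

            // Horizontal: each worker owns a contiguous block of whole layers.
            for (int w = 0; w < workers; w++) {
                int from = w * layerSizes.length / workers;
                int to = (w + 1) * layerSizes.length / workers;
                System.out.println("worker " + w + " owns layers " + from + ".." + (to - 1));
            }

            // Vertical: each worker owns a slice of the neurons in every layer.
            for (int w = 0; w < workers; w++) {
                StringBuilder sb = new StringBuilder("worker " + w + " owns slices:");
                for (int size : layerSizes) {
                    int from = w * size / workers;
                    int to = (w + 1) * size / workers;
                    sb.append(" [").append(from).append(",").append(to).append(")");
                }
                System.out.println(sb);
            }
        }
    }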
12. Execution Engine (Layer-by-Layer Training)
Can stack different layers and training algorithms
Input is read from and output is written back to HDFS/LFS.
14. Deep Learning Infra.: Hybrid of Data-parallelism and Model-parallelism
• Data-parallelism: lots of model instances train concurrently, each on its
own data chunks
• Each model instance is itself model-parallel
• Parameter servers (Server 1 … Server n) coordinate parameters so the model
instances learn from each other (a worker-side sketch follows)
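A worker-side view of this hybrid, as a hedged sketch: ParameterClient and its
pull/push methods are hypothetical stand-ins for the platform's client API.

    public class ReplicaLoop {
        interface ParameterClient {
            double[] pull();                 // fetch latest global parameters
            void push(double[] gradient);    // send local gradient, asynchronously
        }

        static double[] computeGradient(double[] params, double[][] dataChunk) {
            // placeholder: a real replica computes this with its model-parallel engine
            return new double[params.length];
        }

        static void train(ParameterClient ps, double[][][] chunks) {
            for (double[][] chunk : chunks) {
                double[] params = ps.pull();                    // 1. pull parameters
                double[] grad = computeGradient(params, chunk); // 2. local gradient
                ps.push(grad);                                  // 3. push, no global barrier
            }
        }

        public static void main(String[] args) {
            // a no-op client just to show the call pattern
            ParameterClient ps = new ParameterClient() {
                final double[] params = new double[4];
                public double[] pull() { return params.clone(); }
                public void push(double[] gradient) { /* server applies update */ }
            };
            train(ps, new double[2][3][4]); // 2 chunks of 3 examples x 4 features
            System.out.println("done");
        }
    }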
15. Distributed Parameter Servers
Currently we support asynchronous stochastic gradient descent with AdaGrad
Clients pull parameters from and push updates to the servers (Server 1,
Server 2, Server 3, …) over asynchronous communication.
Each server keeps an in-memory cache/storage backed by HBase/HDFS.
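The AdaGrad update itself is standard; here is a sketch of how a server shard
might apply it on each push. Only the algorithm (async SGD with AdaGrad) is
confirmed by the slide; the class layout is our own illustration.

    public class AdaGradShard {
        private final double[] params;
        private final double[] sumSquaredGrads; // per-coordinate gradient history
        private final double learningRate;
        private static final double EPS = 1e-8;

        public AdaGradShard(int dim, double learningRate) {
            this.params = new double[dim];
            this.sumSquaredGrads = new double[dim];
            this.learningRate = learningRate;
        }

        // Called on every push; synchronized per shard, but clients never wait
        // for each other globally (asynchronous SGD).
        public synchronized void applyGradient(double[] grad) {
            for (int i = 0; i < params.length; i++) {
                sumSquaredGrads[i] += grad[i] * grad[i];
                params[i] -= learningRate * grad[i] / (Math.sqrt(sumSquaredGrads[i]) + EPS);
            }
        }

        public synchronized double[] snapshot() { // served on pull
            return params.clone();
        }

        public static void main(String[] args) {
            AdaGradShard shard = new AdaGradShard(3, 0.1);
            shard.applyGradient(new double[] {1.0, -2.0, 0.5});
            System.out.println(java.util.Arrays.toString(shard.snapshot()));
        }
    }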
16. Deep Learning Algorithms
• Feed-forward Neural Network
• Restricted Boltzmann Machine
• Denoising Auto-encoder
• Deep Belief Network
More importantly, we can stack them layer by layer (sketched below).
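A minimal sketch of greedy layer-by-layer stacking, assuming a hypothetical
Layer interface; this shows the pattern, not the engine's real types.

    import java.util.List;

    public class StackedTraining {
        interface Layer {
            void pretrain(double[][] input);        // unsupervised pretraining (RBM, DA, ...)
            double[][] transform(double[][] input); // forward pass feeding the next layer
        }

        // Greedy layer-wise training: the idea behind stacking RBMs or
        // denoising autoencoders into a deep belief network.
        static void trainStack(List<Layer> layers, double[][] data) {
            double[][] current = data;
            for (Layer layer : layers) {
                layer.pretrain(current);            // train this layer in isolation
                current = layer.transform(current); // its output is the next layer's input
            }
        }

        public static void main(String[] args) {
            // with real Layer implementations, this would pretrain a whole stack
            trainStack(java.util.Collections.emptyList(), new double[4][8]);
            System.out.println("stack trained");
        }
    }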
17. More Challenging Algorithm: Convolutional Neural Network
• Different convolutional, normalization, and pooling layers
• Weight-shared and non-shared feature maps
• The feature map is the minimum partition unit
Network structure: the input (e.g., an image or a spectral map of voice data)
feeds successive layers of multi-dimensional feature-map neurons, ending in a
dense feed-forward output layer. A toy convolution that produces one feature
map follows.
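A minimal valid 2D convolution producing a single feature map from a single
input channel, to make "feature map" concrete; real CNN layers add multiple
channels plus the normalization and pooling listed above.

    public class FeatureMapDemo {
        static double[][] convolve(double[][] input, double[][] kernel) {
            int h = input.length - kernel.length + 1;
            int w = input[0].length - kernel[0].length + 1;
            double[][] featureMap = new double[h][w];
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    double sum = 0;
                    // The same kernel weights are reused at every position:
                    // this is the weight sharing the slide refers to.
                    for (int ky = 0; ky < kernel.length; ky++)
                        for (int kx = 0; kx < kernel[0].length; kx++)
                            sum += input[y + ky][x + kx] * kernel[ky][kx];
                    featureMap[y][x] = Math.max(0, sum); // ReLU nonlinearity
                }
            return featureMap;
        }

        public static void main(String[] args) {
            double[][] image = new double[8][8]; // e.g., a tiny grayscale image
            double[][] edgeKernel = {{1, 0, -1}, {1, 0, -1}, {1, 0, -1}};
            double[][] fm = convolve(image, edgeKernel);
            System.out.println("feature map size: " + fm.length + "x" + fm[0].length);
        }
    }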
18. Sharing some early experiences/lessons
Infrastructure:
• The computation abstraction might be too low-level (a lot of pros and cons)
• A generic deep learning platform is very challenging (e.g., supporting
recurrent NNs)
• Communication is important
• How models are partitioned is important
• A high-performance mathematical library is useful
19. Sharing some early experiences
Algorithms/Models:
• Models for ASR are relatively small
• Models for images are much larger
• Models for NLP are typically small
• DA seems more efficient than RBM for images
• Accelerated SGD and Hessian-free optimization still need to be explored
20. Use cases of Deep Learning
21. Image Recognition
Deep networks learn a feature hierarchy: pixels → edges → object parts →
object models.
Traditional pipeline: image pixels → hand-designed feature extraction (SIFT,
HOG, etc.) → trainable classifier → object category.
Deep learning pipeline: image pixels → feature learner (convolutional NN is
popular) → learned high-level features → classifier → object category.
Data augmentation: central and corner crops of the original image (sketched
below).
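A sketch of the crop-based augmentation mentioned above: one central crop plus
the four corner crops (the five-crop scheme popularized by the AlexNet paper
cited earlier).

    import java.util.ArrayList;
    import java.util.List;

    public class CropAugmentation {
        static double[][] crop(double[][] img, int top, int left, int size) {
            double[][] out = new double[size][size];
            for (int y = 0; y < size; y++)
                System.arraycopy(img[top + y], left, out[y], 0, size);
            return out;
        }

        // Returns [center, top-left, top-right, bottom-left, bottom-right] crops.
        static List<double[][]> fiveCrops(double[][] img, int size) {
            int h = img.length, w = img[0].length;
            List<double[][]> crops = new ArrayList<>();
            crops.add(crop(img, (h - size) / 2, (w - size) / 2, size)); // central
            crops.add(crop(img, 0, 0, size));                // top-left corner
            crops.add(crop(img, 0, w - size, size));         // top-right corner
            crops.add(crop(img, h - size, 0, size));         // bottom-left corner
            crops.add(crop(img, h - size, w - size, size));  // bottom-right corner
            return crops;
        }

        public static void main(String[] args) {
            double[][] image = new double[256][256];
            System.out.println(fiveCrops(image, 224).size() + " crops generated");
        }
    }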
22. Speech Recognition
• DNNs are used to replace GMMs for learning the state output probabilities
in HMMs.
• FF and DBN have been used for ASR.
• CNNs are starting to be used to further improve WER.
• Rectified linear activation seems better than sigmoid (both are sketched
after the citation below).
• Models are relatively small (e.g., 5 layers, 2560 neurons per hidden layer).
Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning
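For reference, the two activations compared above; the printed values are just
a toy illustration of how ReLU stays linear for positive inputs while the
sigmoid saturates.

    public class Activations {
        static double relu(double x)    { return Math.max(0.0, x); }
        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        public static void main(String[] args) {
            for (double x : new double[] {-4, -1, 0, 1, 4}) {
                System.out.printf("x=%5.1f  relu=%5.2f  sigmoid=%4.2f%n",
                        x, relu(x), sigmoid(x));
            }
        }
    }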
23. NLP
Deep Learning in NLP is quite new
Learning word vectors (illustrated below)
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space
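To illustrate what learned word vectors buy you, a toy cosine-similarity
check: semantically related words end up close in the vector space. The 3-d
vectors here are made up; real ones come from training word2vec-style models.

    public class WordVectorDemo {
        static double cosine(double[] a, double[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na += a[i] * a[i];
                nb += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        public static void main(String[] args) {
            double[] king  = {0.8, 0.3, 0.1}; // toy 3-d "embeddings"
            double[] queen = {0.7, 0.4, 0.1};
            double[] apple = {0.1, 0.1, 0.9};
            System.out.printf("king~queen: %.2f%n", cosine(king, queen)); // high
            System.out.printf("king~apple: %.2f%n", cosine(king, apple)); // low
        }
    }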
24. NLP
Building on word vectors, sentences can now be mapped into the vector space
as well
Sentiment Analysis
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning, Semi-Supervised Recursive Autoencoders
for Predicting Sentiment Distributions