1. Building Distributed Deep Learning Engine
Guangdeng Liao, Zhan Zhang and Murtaza Zafer
2. What is Deep Learning
Deep learning is a set of algorithms that attempt to model high-level
abstractions in data by using architectures composed of multiple non-linear
transformations.
Examples: learning hidden features, learning state emission probabilities,
and learning word vectors.
3. Thanks to Big Data, Deep Learning is no longer just research
Usage scenarios: speech recognition, image processing, and NLP
4. Why does Samsung need Deep Learning?
To make our devices smarter and more intelligent by recognizing
voice, images, and even language
5. What does Deep Learning look like?
Many more examples (millions to billions of parameters) in speech
recognition, image processing, and NLP
Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks
6. Deep Learning is challenging…
BIG DATA + BIG MODEL
Building a distributed deep learning platform for Samsung R&D is hard:
the field is quite new with no mature platform yet, and DL algorithms are
hard to design and develop.
7. Distributed Deep Learning Platform we are building
The platform is a stack of three layers:
Applications: object recognition, speech recognition, …
Algorithms: RBM, FF, DA, CNN, …
Infrastructure (our focus): model-parallel engine, parameter server,
execution engine, I/O, math
8. Now, let's dive deeper and get more technical…
9. Model-Parallel Engine (MPE)
Parallelizes a big ML model over a Hadoop YARN cluster.
Workflow: a user-defined model → auto-generation of the model topology →
auto-partition of the topology over the cluster → auto-deployment of the
topology (in-memory).
Programming model (a sketch follows this list):
• Neuron-like programming: define nodes, define groups, define connections
• Message-based communication
• Message-driven computation
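To make the programming model concrete, here is a minimal sketch of what
neuron-like model definition could look like. All names here (ModelBuilder,
NodeGroup, group, connect) are hypothetical illustrations, not MPE's actual API.

    import java.util.ArrayList;
    import java.util.List;

    public class NeuronStyleModel {
        // A group of neuron-like nodes (e.g., one layer).
        static class NodeGroup {
            final String name;
            final int size;
            NodeGroup(String name, int size) { this.name = name; this.size = size; }
        }

        // Records groups and the connections between them; the engine would
        // later auto-generate, partition, and deploy this topology.
        static class ModelBuilder {
            final List<NodeGroup> groups = new ArrayList<>();
            final List<String[]> connections = new ArrayList<>();
            NodeGroup group(String name, int size) {
                NodeGroup g = new NodeGroup(name, size);
                groups.add(g);
                return g;
            }
            void connect(NodeGroup from, NodeGroup to) {
                connections.add(new String[] { from.name, to.name });
            }
        }

        public static void main(String[] args) {
            ModelBuilder model = new ModelBuilder();
            NodeGroup input  = model.group("input", 784);   // define nodes/groups
            NodeGroup hidden = model.group("hidden", 1024);
            NodeGroup output = model.group("output", 10);
            model.connect(input, hidden);                   // define connections
            model.connect(hidden, output);
            System.out.println("groups=" + model.groups.size()
                    + " connections=" + model.connections.size());
        }
    }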
10. MPE’s Architecture
The Application Master hosts a Controller that partitions the topology and
deploys it across containers.
Each container runs a node manager.
Control communication is based on Thrift.
Data communication (node-level and group-level) between containers is based
on Netty; a generic sketch follows.
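The slides only name Netty for the data plane, so the following is a generic
Netty 4 server bootstrap of the kind a node manager might run; the framing
choice and the empty handler are assumptions, not MPE's code.

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;
    import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

    public class DataCommServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);
            EventLoopGroup workers = new NioEventLoopGroup();
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(boss, workers)
                 .channel(NioServerSocketChannel.class)
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         // Frame messages by a 4-byte length header, then hand them
                         // to a handler that would route node/group-level messages.
                         ch.pipeline()
                           .addLast(new LengthFieldBasedFrameDecoder(1 << 20, 0, 4, 0, 4))
                           .addLast(new SimpleChannelInboundHandler<Object>() {
                               @Override
                               protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
                                   // deserialize and dispatch to the local node manager here
                               }
                           });
                     }
                 });
                b.bind(9000).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }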
11. How to partition big models
Two approaches: vertical partition and horizontal partition (one reading is
sketched below).
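The slide does not define the two schemes, so this sketch assumes one common
reading: horizontal partitioning assigns whole layers to workers, while
vertical partitioning slices every layer across all workers. Treat the mapping
of names to schemes as an assumption.

    public class PartitionDemo {
        public static void main(String[] args) {
            int[] layerSizes = {784, 1024, 1024, 10}; // neurons per layer
            int workers = 2;

            // Horizontal: each worker owns a contiguous block of whole layers.
            for (int w = 0; w < workers; w++) {
                int from = w * layerSizes.length / workers;
                int to = (w + 1) * layerSizes.length / workers;
                System.out.println("worker " + w + " owns layers " + from + ".." + (to - 1));
            }

            // Vertical: each worker owns a slice of the neurons in every layer.
            for (int w = 0; w < workers; w++) {
                StringBuilder sb = new StringBuilder("worker " + w + " owns slices:");
                for (int size : layerSizes) {
                    int from = w * size / workers;
                    int to = (w + 1) * size / workers;
                    sb.append(" [").append(from).append(",").append(to).append(")");
                }
                System.out.println(sb);
            }
        }
    }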
12. Execution Engine (Layer-by-Layer Training)
Can stack different layers and training algorithms
Input is read from and output is written back to HDFS/LFS.
14. Deep Learning Infra.: Hybrid of Data-parallelism and Model-parallelism
• Data-parallelism: lots of model instances train concurrently, each on its
own data chunks
• Each model instance is itself model-parallel
• Parameter servers (Server 1 … Server n) coordinate parameters so the model
instances learn from each other (a worker-side sketch follows)
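A worker-side view of this hybrid, as a hedged sketch: ParameterClient and its
pull/push methods are hypothetical stand-ins for the platform's client API.

    public class ReplicaLoop {
        interface ParameterClient {
            double[] pull();                 // fetch latest global parameters
            void push(double[] gradient);    // send local gradient, asynchronously
        }

        static double[] computeGradient(double[] params, double[][] dataChunk) {
            // placeholder: a real replica computes this with its model-parallel engine
            return new double[params.length];
        }

        static void train(ParameterClient ps, double[][][] chunks) {
            for (double[][] chunk : chunks) {
                double[] params = ps.pull();                    // 1. pull parameters
                double[] grad = computeGradient(params, chunk); // 2. local gradient
                ps.push(grad);                                  // 3. push, no global barrier
            }
        }

        public static void main(String[] args) {
            // a no-op client just to show the call pattern
            ParameterClient ps = new ParameterClient() {
                final double[] params = new double[4];
                public double[] pull() { return params.clone(); }
                public void push(double[] gradient) { /* server applies update */ }
            };
            train(ps, new double[2][3][4]); // 2 chunks of 3 examples x 4 features
            System.out.println("done");
        }
    }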
15. Distributed Parameter Servers
Currently we support asynchronous stochastic gradient descent with AdaGrad
Clients pull parameters from and push updates to the servers (Server 1,
Server 2, Server 3, …) over asynchronous communication.
Each server keeps an in-memory cache/storage backed by HBase/HDFS.
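The AdaGrad update itself is standard; here is a sketch of how a server shard
might apply it on each push. Only the algorithm (async SGD with AdaGrad) is
confirmed by the slide; the class layout is our own illustration.

    public class AdaGradShard {
        private final double[] params;
        private final double[] sumSquaredGrads; // per-coordinate gradient history
        private final double learningRate;
        private static final double EPS = 1e-8;

        public AdaGradShard(int dim, double learningRate) {
            this.params = new double[dim];
            this.sumSquaredGrads = new double[dim];
            this.learningRate = learningRate;
        }

        // Called on every push; synchronized per shard, but clients never wait
        // for each other globally (asynchronous SGD).
        public synchronized void applyGradient(double[] grad) {
            for (int i = 0; i < params.length; i++) {
                sumSquaredGrads[i] += grad[i] * grad[i];
                params[i] -= learningRate * grad[i] / (Math.sqrt(sumSquaredGrads[i]) + EPS);
            }
        }

        public synchronized double[] snapshot() { // served on pull
            return params.clone();
        }

        public static void main(String[] args) {
            AdaGradShard shard = new AdaGradShard(3, 0.1);
            shard.applyGradient(new double[] {1.0, -2.0, 0.5});
            System.out.println(java.util.Arrays.toString(shard.snapshot()));
        }
    }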
16. Deep Learning Algorithms
• Feed-forward Neural Network
• Restricted Boltzmann Machine
• Denoising Auto-encoder
• Deep Belief Network
More importantly, we can stack them layer by layer (sketched below).
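A minimal sketch of greedy layer-by-layer stacking, assuming a hypothetical
Layer interface; this shows the pattern, not the engine's real types.

    import java.util.List;

    public class StackedTraining {
        interface Layer {
            void pretrain(double[][] input);        // unsupervised pretraining (RBM, DA, ...)
            double[][] transform(double[][] input); // forward pass feeding the next layer
        }

        // Greedy layer-wise training: the idea behind stacking RBMs or
        // denoising autoencoders into a deep belief network.
        static void trainStack(List<Layer> layers, double[][] data) {
            double[][] current = data;
            for (Layer layer : layers) {
                layer.pretrain(current);            // train this layer in isolation
                current = layer.transform(current); // its output is the next layer's input
            }
        }

        public static void main(String[] args) {
            // with real Layer implementations, this would pretrain a whole stack
            trainStack(java.util.Collections.emptyList(), new double[4][8]);
            System.out.println("stack trained");
        }
    }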
17. More Challenging Algorithm: Convolutional Neural Network
• Different convolutional, normalization, and pooling layers
• Weight-shared and non-shared feature maps
• The feature map is the minimum partition unit
Network structure: the input (e.g., an image or a spectral map of voice data)
feeds successive layers of multi-dimensional feature-map neurons, ending in a
dense feed-forward output layer. A toy convolution that produces one feature
map follows.
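A minimal valid 2D convolution producing a single feature map from a single
input channel, to make "feature map" concrete; real CNN layers add multiple
channels plus the normalization and pooling listed above.

    public class FeatureMapDemo {
        static double[][] convolve(double[][] input, double[][] kernel) {
            int h = input.length - kernel.length + 1;
            int w = input[0].length - kernel[0].length + 1;
            double[][] featureMap = new double[h][w];
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    double sum = 0;
                    // The same kernel weights are reused at every position:
                    // this is the weight sharing the slide refers to.
                    for (int ky = 0; ky < kernel.length; ky++)
                        for (int kx = 0; kx < kernel[0].length; kx++)
                            sum += input[y + ky][x + kx] * kernel[ky][kx];
                    featureMap[y][x] = Math.max(0, sum); // ReLU nonlinearity
                }
            return featureMap;
        }

        public static void main(String[] args) {
            double[][] image = new double[8][8]; // e.g., a tiny grayscale image
            double[][] edgeKernel = {{1, 0, -1}, {1, 0, -1}, {1, 0, -1}};
            double[][] fm = convolve(image, edgeKernel);
            System.out.println("feature map size: " + fm.length + "x" + fm[0].length);
        }
    }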
18. Sharing some early experiences/lessons
Infrastructure:
• The computation abstraction might be too low-level (a lot of pros and cons)
• A generic deep learning platform is very challenging (e.g., supporting
recurrent NNs)
• Communication is important
• How models are partitioned is important
• A high-performance mathematical library is useful
19. Sharing some early experiences
Algorithms/Models:
• Models for ASR are relatively small
• Models for images are much larger
• Models for NLP are typically small
• DA seems more efficient than RBM for images
• Accelerated SGD and Hessian-free optimization still need to be explored
20. Use cases of Deep Learning
21. Image Recognition
Deep networks learn a feature hierarchy: pixels → edges → object parts →
object models.
Traditional pipeline: image pixels → hand-designed feature extraction (SIFT,
HOG, etc.) → trainable classifier → object category.
Deep learning pipeline: image pixels → feature learner (convolutional NN is
popular) → learned high-level features → classifier → object category.
Data augmentation: central and corner crops of the original image (sketched
below).
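A sketch of the crop-based augmentation mentioned above: one central crop plus
the four corner crops (the five-crop scheme popularized by the AlexNet paper
cited earlier).

    import java.util.ArrayList;
    import java.util.List;

    public class CropAugmentation {
        static double[][] crop(double[][] img, int top, int left, int size) {
            double[][] out = new double[size][size];
            for (int y = 0; y < size; y++)
                System.arraycopy(img[top + y], left, out[y], 0, size);
            return out;
        }

        // Returns [center, top-left, top-right, bottom-left, bottom-right] crops.
        static List<double[][]> fiveCrops(double[][] img, int size) {
            int h = img.length, w = img[0].length;
            List<double[][]> crops = new ArrayList<>();
            crops.add(crop(img, (h - size) / 2, (w - size) / 2, size)); // central
            crops.add(crop(img, 0, 0, size));                // top-left corner
            crops.add(crop(img, 0, w - size, size));         // top-right corner
            crops.add(crop(img, h - size, 0, size));         // bottom-left corner
            crops.add(crop(img, h - size, w - size, size));  // bottom-right corner
            return crops;
        }

        public static void main(String[] args) {
            double[][] image = new double[256][256];
            System.out.println(fiveCrops(image, 224).size() + " crops generated");
        }
    }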
22. Speech Recognition
• DNNs are used to replace GMMs for learning the state output probabilities
in HMMs.
• FF and DBN have been used for ASR.
• CNNs are starting to be used to further improve WER.
• Rectified linear activation seems better than sigmoid (both are sketched
after the citation below).
• Models are relatively small (e.g., 5 layers, 2560 neurons per hidden layer).
Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning
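For reference, the two activations compared above; the printed values are just
a toy illustration of how ReLU stays linear for positive inputs while the
sigmoid saturates.

    public class Activations {
        static double relu(double x)    { return Math.max(0.0, x); }
        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        public static void main(String[] args) {
            for (double x : new double[] {-4, -1, 0, 1, 4}) {
                System.out.printf("x=%5.1f  relu=%5.2f  sigmoid=%4.2f%n",
                        x, relu(x), sigmoid(x));
            }
        }
    }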
23. NLP
Deep Learning in NLP is quite new
Learning word vectors (illustrated below)
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space
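To illustrate what learned word vectors buy you, a toy cosine-similarity
check: semantically related words end up close in the vector space. The 3-d
vectors here are made up; real ones come from training word2vec-style models.

    public class WordVectorDemo {
        static double cosine(double[] a, double[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na += a[i] * a[i];
                nb += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        public static void main(String[] args) {
            double[] king  = {0.8, 0.3, 0.1}; // toy 3-d "embeddings"
            double[] queen = {0.7, 0.4, 0.1};
            double[] apple = {0.1, 0.1, 0.9};
            System.out.printf("king~queen: %.2f%n", cosine(king, queen)); // high
            System.out.printf("king~apple: %.2f%n", cosine(king, apple)); // low
        }
    }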
24. NLP
Building on word vectors, sentences can now be mapped into the vector space
as well
Sentiment Analysis
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning, Semi-Supervised Recursive Autoencoders
for Predicting Sentiment Distributions