Which Is Deeper
Comparison of Deep Learning
Frameworks Atop Spark
Zhe Dong, Dr. Yu Cao
EMC Corporation
Outline
• Motivation
• Theoretical Principle
• State-of-the-Art
• Evaluation Criteria
• Evaluation Results
• Summary
• Conclusion
Deep Learning on Spark – Motivation
• Single-machine DL
• Low efficiency (training takes hours or even days)
• Limited DNN model capacity (hard to support billions of parameters)
• Dedicated deep learning cluster
• Massive data movement
• High maintenance cost
• Spark+Deep Learning = Truly All-in-One
Theoretical Principle
• Large Scale Distributed Deep Networks, Jeffrey Dean et al., 2012
• Model parallelism
• Data parallelism
https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf
Data Parallelism for Distributed SGD
• Model is replicated on worker nodes
• Two repeating steps
– Train each model replica with mini-batches
– Synchronize model parameters across cluster
• Specific implementations can be different
– How parameters are combined
– Synchronization (strong or weak)
– Parameter server (centralized or not)
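The two-step loop above can be sketched in a few lines. This is a minimal pure-Python illustration, not any framework's actual code: the single scalar parameter, the toy gradient function, and the hard-coded data shards are hypothetical stand-ins for a real replicated DNN.

```python
# Sketch of synchronous data-parallel SGD: each replica trains on its own
# mini-batches (step 1), then parameters are combined across the cluster
# (step 2, here by plain averaging). Toy model: a scalar w fitted to y = 3x.

def gradient(w, batch):
    # Mean-squared-error gradient for the toy model y_hat = w * x.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train_replica(w, shard, lr=0.02):
    # Step 1: one worker trains its local replica on its local mini-batches.
    for batch in shard:
        w -= lr * gradient(w, batch)
    return w

def synchronize(replicas):
    # Step 2: combine model parameters across the cluster.
    return sum(replicas) / len(replicas)

# Two workers, each holding two mini-batches of (x, y) pairs with y = 3x.
shards = [
    [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]],
    [[(4.0, 12.0)], [(5.0, 15.0), (6.0, 18.0)]],
]

w = 0.0
for _ in range(20):                       # repeat the two steps
    replicas = [train_replica(w, s) for s in shards]
    w = synchronize(replicas)

print(round(w, 2))                        # prints 3.0
```

Real implementations differ exactly along the axes listed above: how `synchronize` combines parameters, whether workers wait for each other, and whether a central parameter server mediates the exchange.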
Downpour SGD Client Pseudocode
http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf
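The client loop from the DistBelief appendix can be sketched as follows. This is a hedged illustration, not the paper's code: a plain dict stands in for the parameter-server RPC interface, and the toy model and data are invented for the example. The `n_fetch`/`n_push` knobs mirror the paper's option to fetch parameters and push gradients only every few steps.

```python
# Sketch of the Downpour SGD client loop: each replica repeatedly fetches
# parameters from the (centralized) parameter server, computes a gradient
# on its next mini-batch, and pushes accumulated gradients back. Because
# replicas do this independently, synchronization is weak (asynchronous).

def downpour_client(server, batches, grad_fn, lr=0.1, n_fetch=1, n_push=1):
    w = dict(server)                      # initial parameter fetch
    acc = {k: 0.0 for k in w}             # locally accumulated gradient
    for step, batch in enumerate(batches, start=1):
        if step % n_fetch == 0:
            w = dict(server)              # fetch parameters from server
        g = grad_fn(w, batch)
        for k in w:                       # local update + accumulation
            w[k] -= lr * g[k]
            acc[k] += g[k]
        if step % n_push == 0:            # push gradients to server
            for k in server:
                server[k] -= lr * acc[k]
            acc = {k: 0.0 for k in w}
    return server

# Toy model: single weight "w", data y = 2x, squared-error gradient.
def grad_fn(params, batch):
    x, y = batch
    return {"w": 2 * (params["w"] * x - y) * x}

server = {"w": 0.0}
data = [(1.0, 2.0), (2.0, 4.0)] * 20
downpour_client(server, data, grad_fn, lr=0.05)
print(round(server["w"], 2))              # prints 2.0
```

In the real system the server itself is sharded, and replicas may run on stale parameters between fetches; that staleness is the price of removing the synchronization barrier.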
DL on Spark – State-of-the-Art
• AMPLab SparkNet
• Yahoo! CaffeOnSpark
• Arimo Tensorflow On Spark
• Skymind DeepLearning4J
• DeepDist
• H2O Spark
Evaluation Criteria

Criteria                 Dimension            For Example
Ease of Getting Started  Documentation        Are there detailed, well-organized, up-to-date documents?
                         Installation         How automatic is it?
                         Built-in Examples    Are examples available for quick warm-up?
Ease of Use              Interface            Programming language support
                         Model Encapsulation  Model/Layer/Node
Functionality            Built-in Models      Which NN models have been implemented?
                         Parallelism          Model parallelism or data parallelism?
Performance              Performance          MNIST benchmark results
Status Quo               Community Vitality   GitHub project statistics
                         Enterprise Support   Contributions from organizations?
SparkNet
• Started by AMPLab in 2015
• Wrapper of Caffe and Tensorflow
• Centralized parameter server
• Strong SGD synchronization
• Differentiating feature: each worker runs a fixed number (τ) of mini-batch iterations on its subset of the data before synchronizing
http://arxiv.org/pdf/1511.06051v4
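SparkNet's scheme can be sketched as below. This is a hedged stand-in, not SparkNet's API: plain Python lists play the role of RDD partitions, and the scalar model replaces the Caffe/TensorFlow solver that runs the inner loop in the real system.

```python
# Sketch of SparkNet's parallelization: the driver broadcasts the current
# weights, each worker runs tau mini-batch SGD iterations on its own data
# partition, and the driver averages the resulting weights. Running tau
# local steps per round amortizes the cost of each synchronization.

import random

def run_tau_iterations(w, partition, tau, lr=0.02):
    # Worker side: tau local SGD steps on mini-batches from this partition.
    for _ in range(tau):
        x, y = random.choice(partition)   # sample a mini-batch (size 1 here)
        w -= lr * 2 * (w * x - y) * x     # squared-error gradient step
    return w

def sparknet_round(w, partitions, tau):
    # Driver side: broadcast w, train each partition, then average.
    results = [run_tau_iterations(w, p, tau) for p in partitions]
    return sum(results) / len(results)

random.seed(0)
partitions = [[(1.0, 4.0), (2.0, 8.0)], [(3.0, 12.0), (0.5, 2.0)]]  # y = 4x
w = 0.0
for _ in range(30):
    w = sparknet_round(w, partitions, tau=10)
print(round(w, 1))                        # prints 4.0
```

The trade-off τ controls: larger τ means fewer synchronization barriers (less communication) but more divergence between replicas before each averaging step.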
AMPLab SparkNet - Evaluation

Criteria                 Dimension            SparkNet Score
Ease of Getting Started  Documentation        Paper; no blog; README.md on GitHub
                         Installation         No installation; must copy to each worker node
                         Built-in Examples    Cifar10/MNIST/ImageNet
Ease of Use              Interface            Java/Scala
                         Model Encapsulation  Model/Layer
Functionality            Built-in Models      Tensorflow and Caffe
                         Parallelism          Data parallelism
Performance              Performance          MNIST
Status Quo               Community Vitality
                         Enterprise Support   AMPLab

Run                1       2       3       4
Iterations         1000    2000    5000    10000
Time (seconds)     2130    4218    10471   21003
Accuracy           94.13%  94.26%  94.01%  94.22%
Deeplearning4J
• Started by Skymind in 2014
• An open-source, distributed deep-learning project in Java
and Scala
• Parameter server: IterativeReduce
• Strong SGD synchronization
http://deeplearning4j.org/iterativereduce.html
Deeplearning4J - Evaluation

Criteria                 Dimension            DL4J Score
Ease of Getting Started  Documentation        Comprehensive but poorly organized
                         Installation         No installation
                         Built-in Examples    Only for CDH5; MNIST/Iris/GravesLSTM
Ease of Use              Interface            Java/Scala
                         Model Encapsulation  Layer
Functionality            Built-in Models      CNN/RNN/LSTM/DBN/SAE
                         Parallelism          Data parallelism
Performance              Performance          MNIST
Status Quo               Community Vitality
                         Enterprise Support   Skymind

Run                1       2       3       4
Epochs             5       10      15      20
Time (seconds)     2098    4205    6303    8367
Accuracy           70%     79%     82.7%   84.6%
CaffeOnSpark
• Started by Yahoo! in 2015
• Peer-to-Peer parameter server
• Strong SGD synchronization
• Distinguishing feature: MPI Allreduce, RDMA, Infiniband
[Diagram: three workers, each acting as the parameter server for one weight shard (Worker 1 serves w1, Worker 2 serves w2, Worker 3 serves w3); weights propagate between workers, and gradients are sent in the reverse direction.]
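The sharded peer-to-peer exchange in the diagram above can be sketched as follows. This is an illustrative reduction in plain Python, not CaffeOnSpark's code: the real system performs this exchange with MPI-style allreduce over RDMA/Infiniband, and the scalar shards and unit gradients here are toy stand-ins.

```python
# Sketch of a peer-to-peer parameter server: there is no central server;
# worker i owns (serves) shard i of the weights. Each step, every worker
# sends its gradient for shard i to worker i, which reduces the gradients,
# applies the update, and broadcasts the new shard back to all peers.

def p2p_sync_step(shards, grads_per_worker, lr=0.1):
    # shards[i]              : parameter shard served by worker i
    # grads_per_worker[w][i] : worker w's gradient for shard i
    n = len(shards)
    for i in range(n):
        # Worker i reduces (averages) the gradients sent for its shard...
        g = sum(gw[i] for gw in grads_per_worker) / n
        shards[i] -= lr * g               # ...and applies the update.
    return shards                         # updated shards go back to peers

# Three workers, three scalar shards; identical toy gradients of 1.0.
shards = [0.0, 0.0, 0.0]
grads = [[1.0, 1.0, 1.0]] * 3             # worker w's gradient per shard
shards = p2p_sync_step(shards, grads)
print(shards)                             # each shard moved by -lr * 1.0
```

Sharding the server role across all workers spreads the communication load evenly, which is what lets the design exploit Infiniband bandwidth instead of bottlenecking on one central node.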
CaffeOnSpark - Evaluation

Criteria                 Dimension            CaffeOnSpark Score
Ease of Getting Started  Documentation        Blog; README.md on GitHub
                         Installation         Caffe and all its dependencies must be installed on each node
                         Built-in Examples    Cifar10/MNIST
Ease of Use              Interface            Java/Scala, DataFrames
                         Model Encapsulation  Model
Functionality            Built-in Models      Caffe
                         Parallelism          Data parallelism
Performance              Performance          MNIST
Status Quo               Community Vitality
                         Enterprise Support   Yahoo!

Run                1       2       3       4
Iterations         1000    2000    5000    10000
Time (seconds)     224     445     1113    2229
Accuracy           97%     99.4%   99.7%   99.6%
Tensorflow on Spark
• Started by Arimo in 2014
• A data-parallel Downpour SGD implementation on Spark
• Centralized parameter server
• Weak SGD synchronization
Tensorflow on Spark - Evaluation

Criteria                 Dimension            Tensorflow on Spark Score
Ease of Getting Started  Documentation        Blog; Spark Summit East 2016 slides and video
                         Installation         Depends on Tensorflow and Tornado
                         Built-in Examples    MNISTcnn/MNISTdnn/higgsdnn/moleculardnn
Ease of Use              Interface            Python
                         Model Encapsulation  Model/Layer
Functionality            Built-in Models      Tensorflow
                         Parallelism          Data parallelism
Performance              Performance          MNIST
Status Quo               Community Vitality
                         Enterprise Support   Arimo

Run                1       2       3       4
Epochs             5       10      15      20
Time (seconds)     223     415     615     828
Accuracy           93%     94%     94.2%   95.4%
Benchmark – MNIST

[Chart: test accuracy (20–100%) vs. training time (0–10000 seconds) on MNIST for SparkNet, DL4J, CaffeOnSpark, and Tensorflow on Spark.]
One master (16-core, 64GB); five slaves (8-core, 32GB); executor memory: 20GB; batch size: 64
Benchmark – MNIST

[Chart: test accuracy (20–100%) vs. training time (0–1000 seconds) on MNIST for SparkNet, CaffeOnSpark, and Tensorflow on Spark.]
One master (16-core, 64GB); five slaves (8-core, 32GB); executor memory: 20GB; batch size: 64
Summary

• Ease of Use
  – Language: Java/Scala
  – Interface level: Model / high-level network structure
• Functionality
  – Algorithms: excellent
  – Data parallelism only
  – Ethernet only
• Ease of Getting Started
  – Documentation: average
  – Needs setup on each node
  – Examples: average
• Performance
• Maturity
  – Early stage
  – Community: weak
  – Commercial/big-company support: AMPLab

[Summary matrix comparing SparkNet, DL4J, CaffeOnSpark, and Tensorflow on Spark across all evaluation criteria (Documentation, Installation, Built-in Examples, Interface, Model Encapsulation, Built-in Models, Parallelism, Performance, Community Vitality, Enterprise Support); the per-cell ratings are graphical and not reproduced here.]
Conclusion
• Common issues
– Lack of model parallelism
– Potential network congestion
– Early-stage development
• Future evaluation work
– GPU integration
– SGD synchronization
– Scalability
THANK YOU.
Zhe.Dong@emc.com
