Comparison of deep learning frameworks from a viewpoint of double backpropagation
Kenta Oono <oono@preferred.jp>
Preferred Networks, Inc.
Chainer Meetup #6 @ Preferred Networks, Sep. 30th 2017
Agenda
• Technology stack of DL frameworks
• Design choices in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TensorFlow
Technology stack of a DL framework

Layer                                | Functions                               | Examples
Graphical visualization              |                                         | DIGITS, TensorBoard
Machine learning workflow management | Dataset prep, save/load, training loop  | Keras, TF Slim
Computational graph (CG) management  | Build/optimize CGs, forward/back prop   | Theano, TensorFlow, torch.nn
Multi-dimensional array processing   | High-level array manipulation           | NumPy, CuPy, Eigen, Torch (core)
Numerical computation                | Matrix operation, convolution           | BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL-DNN
Computational device                 |                                         | CPU, GPU, TPU, FPGA
Technology stack of Chainer

[Stack diagram: Chainer sits on NumPy (CPU) and CuPy (GPU); NumPy runs on BLAS on the CPU, while CuPy runs on cuBLAS, cuRAND, and cuDNN on the GPU.]
Technology stack of TensorFlow

[Stack diagram: TensorBoard, TF Slim, and Keras sit on TensorFlow; TensorFlow runs on Eigen::Tensor, which uses BLAS on the CPU and cuBLAS, cuRAND, and cuDNN on the GPU.]
Technology stack of Theano

[Stack diagram: Keras, Lasagne, Blocks, etc. sit on Theano; Theano runs on NumPy and BLAS on the CPU, and on libgpuarray, the CUDA Toolkit, and CUDA/OpenCL on the GPU.]
Technology stack of Keras

[Stack diagram: Keras sits on top of either the TensorFlow stack or the Theano stack.]
Important design choices through a user's typical workflow

Coding
• Write NNs (in which language?)
• Compute backprop (how?)
• Update parameters (how to represent them? how to update them?)
Execution
• Run user code (when?)
Improvement
• Optimize the CG (how?)
• Scale up training (how?)
http://bit.ly/aaai-dlif
Neural Network as a Computational Graph
• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of CG is a bipartite DAG (directed acyclic graph) consisting of data nodes and operator nodes.
Example: y = x1 * x2, z = y - x3
[Graph: data nodes x1 and x2 feed the mul operator producing y; y and x3 feed the sub operator producing z.]
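To make this concrete, a minimal sketch of recording and backtracking such a graph in Chainer (the values are arbitrary; any define-by-run framework looks similar):

import numpy as np
from chainer import Variable

x1 = Variable(np.array([3.0], dtype=np.float32))
x2 = Variable(np.array([4.0], dtype=np.float32))
x3 = Variable(np.array([5.0], dtype=np.float32))

y = x1 * x2   # data nodes x1, x2 feed the mul operator node
z = y - x3    # y and x3 feed the sub operator node

z.backward()    # backtrack the recorded graph
print(x1.grad)  # dz/dx1 = x2 = [4.]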
Multi-Layer Perceptron (MLP)

[Graph: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → prob → CrossEntropy(prob, t) → loss]
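For concreteness, a minimal sketch of this MLP in Chainer; the layer sizes are made up, and the ReLU after the second Affine follows the diagram above:

import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self, n_units=100, n_out=10):  # hypothetical sizes
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)  # Affine with W1, b1
            self.l2 = L.Linear(None, n_out)    # Affine with W2, b2

    def __call__(self, x, t):
        a1 = F.relu(self.l1(x))   # h1 -> ReLU -> a1
        a2 = F.relu(self.l2(a1))  # h2 -> ReLU -> a2
        # Softmax and CrossEntropy are fused into one function here.
        return F.softmax_cross_entropy(a2, t)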
How to compute backprop
Backprop through graphs: the framework builds graphs only for the forward prop, and does backprop by backtracking those graphs. E.g. Torch.nn, Caffe.
Backprop as extended graphs: the framework builds graphs for backprop as well as for forward prop. E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch.
[Graph: the forward nodes a, b → mul → y and y, c → sub → z are extended with gradient nodes: gz = id (since ∇z z = 1), gy and gc via a neg node, and ga, gb via mul nodes.]
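A hedged sketch of the extended-graph style in Chainer, using the a, b, c graph above: with enable_double_backprop=True, the returned gradients are themselves graph nodes, not plain arrays:

import numpy as np
import chainer
from chainer import Variable

a = Variable(np.array([2.0], dtype=np.float32))
b = Variable(np.array([3.0], dtype=np.float32))
c = Variable(np.array([4.0], dtype=np.float32))

y = a * b
z = y - c

# The gradients come back as Variables wired into an extended graph,
# so they can be differentiated again.
ga, gb, gc = chainer.grad([z], [a, b, c], enable_double_backprop=True)
print(ga.data, gb.data, gc.data)  # [3.] [2.] [-1.]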
How to compute backprop: trade-offs
Backprop through graphs: easy and simple to implement, since backprop computation need not be defined as graphs. Low flexibility: features available for graphs may not apply to backprop computations.
Backprop as extended graphs: implementation gets complicated, but high flexibility: any feature available for graphs can also be applied to backprop computations (e.g. backprop of backprop).
Double backprop

[Graph: x, y → F → z → … → L]

class F(FunctionNode):
    def forward(self, x, y):
        # Runs on NumPy/CuPy arrays.
        return x * x + y

    def backward(self, x, y, gz):
        # Runs on chainer.Variable, so it creates a CG itself.
        return 2 * gz * x, gz

Note: the interface is simplified from the actual implementation.
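For reference, a runnable variant against Chainer's actual FunctionNode interface (v3+), reconstructed here as a sketch: forward receives a tuple of arrays, and backward receives Variables:

import numpy as np
from chainer import FunctionNode, Variable

class F(FunctionNode):
    def forward(self, inputs):
        # Runs on NumPy/CuPy arrays.
        x, y = inputs
        self.retain_inputs((0,))  # keep x for the backward pass
        return x * x + y,

    def backward(self, indexes, grad_outputs):
        # Runs on Variables, so it builds a CG itself.
        # Simplified: assumes gradients for both inputs are requested.
        x, = self.get_retained_inputs()
        gz, = grad_outputs
        return 2 * gz * x, gz

x = Variable(np.array([3.0], dtype=np.float32))
y = Variable(np.array([1.0], dtype=np.float32))
z, = F().apply((x, y))
z.backward()
print(x.grad, y.grad)  # [6.] [1.]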
Double backprop

[Graph: running backprop through x, y → F → z → … → L extends the graph with a Grad F node: seeded with 1.0 = ∂L/∂L, it takes gz = ∂L/∂z and produces gx = ∂L/∂x and gy = ∂L/∂y. Internally, Grad F is itself a small graph of Mul nodes: gx = 2 * gz * x, and gy = gz.]
Double backprop

[Graph: feeding the seed 1.0 directly into Grad F yields gx = ∂z/∂x and gy = ∂z/∂y, the gradients of z itself.]
Double backprop

[Graph: the same extended graph, with gx now viewed as an ordinary graph output that can be differentiated again.]
Double backprop

[Graph: running backprop once more, from gx with seed 1.0, through Grad F (here F = Mul) adds a Double Grad F node and produces ggx = ∂²z/∂x².]
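The sequence of slides above, in code: a minimal Chainer sketch that computes ∂z/∂x and then ∂²z/∂x² for z = x * x + y:

import numpy as np
import chainer
from chainer import Variable

x = Variable(np.array([3.0], dtype=np.float32))
y = Variable(np.array([1.0], dtype=np.float32))
z = x * x + y

# First backprop: build the gradient as part of the graph.
gx, gy = chainer.grad([z], [x, y], enable_double_backprop=True)
print(gx.data, gy.data)  # dz/dx = 2x = [6.], dz/dy = [1.]

# Second backprop, through Grad F itself.
ggx, = chainer.grad([gx], [x])
print(ggx.data)  # d2z/dx2 = [2.]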
Double backprop
Computes the derivative of L = G(f(x), ∇f(x)) with respect to x.

[Graph sequence over four slides: forward pass x → f → z; backprop adds Grad f, producing gx = ∇f(x); z and gx feed further computation … → L; finally, backprop from L (seed 1.0) runs through Grad f again via a Double Grad f node, yielding ggx and, overall, ∂L/∂x.]
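A toy instance of this pattern, as a hedged Chainer sketch: take f(x) = sum(x²) and G(z, gx) = sum(gx²), a gradient-penalty-style objective as used in e.g. WGAN-GP:

import numpy as np
import chainer
import chainer.functions as F
from chainer import Variable

x = Variable(np.array([1.0, 2.0], dtype=np.float32))
z = F.sum(x ** 2)                                          # f(x)
gx, = chainer.grad([z], [x], enable_double_backprop=True)  # grad f(x) = 2x
L = F.sum(gx ** 2)                # L = G(f(x), grad f(x))
L.backward()                      # differentiating L needs double backprop
print(x.grad)                     # dL/dx = 8x = [ 8. 16.]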
Example	(Chainer)
http://bit.ly/2wpEzO5
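The slide itself is a screenshot (full code at the link above); a minimal sketch of the same idea, assuming Chainer v3+ and an arbitrarily chosen function sin:

import numpy as np
import chainer
import chainer.functions as F
from chainer import Variable

x = Variable(np.array([2.0], dtype=np.float32))
y = F.sin(x)

gx, = chainer.grad([y], [x], enable_double_backprop=True)
ggx, = chainer.grad([gx], [x])
print(gx.data)   # cos(2.0)
print(ggx.data)  # -sin(2.0)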
Example	(PyTorch)
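No code survives in the scrape; a comparable sketch in PyTorch (written against the modern tensor API; the 2017 slide likely used Variable), where create_graph=True plays the role of enable_double_backprop:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.sin(x)

# create_graph=True records the backward pass itself as a graph.
gx, = torch.autograd.grad(y, x, create_graph=True)
ggx, = torch.autograd.grad(gx, x)
print(gx)   # cos(2.0)
print(ggx)  # -sin(2.0)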
Example	(TensorFlow)
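Again no code in the scrape; a comparable sketch in TF 1.x graph mode (the API of 2017), where tf.gradients adds gradient ops to the graph and can therefore be nested:

import tensorflow as tf  # TF 1.x API

x = tf.placeholder(tf.float32)
y = tf.sin(x)

gx = tf.gradients(y, x)[0]    # gradient ops become part of the graph
ggx = tf.gradients(gx, x)[0]  # so they can be differentiated again

with tf.Session() as sess:
    print(sess.run([gx, ggx], feed_dict={x: 2.0}))  # [cos(2.0), -sin(2.0)]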
Conclusion
• Several DL frameworks share a similar layered structure.
• Differences in design choices determine each framework's capabilities.
• Introduced double backprop, with toy examples in several frameworks.