Tutorial:
Deep Learning Implementations
and Frameworks
Seiya Tokui*, Kenta Oono*, Atsunori Kanemura+, Toshihiro Kamishima+
*Preferred Networks, Inc. (PFN)
{tokui,oono}@preferred.jp
+National Institute of Advanced Industrial Science and Technology (AIST)
atsu-kan@aist.go.jp, mail@kamishima.net
Overview of this tutorial
• 1st session (KO, 8:30 ‒ 10:00)
  • Introduction
  • Basics of neural networks
  • Common design of neural network implementations
• 2nd session (ST, 10:30 ‒ 12:30)
  • Differences of deep learning frameworks
  • Coding examples of frameworks
  • Conclusion
Common Design of Deep Learning Frameworks
Kenta Oono <oono@preferred.jp>
Preferred Networks Inc.
2016/4/19, DLIF Tutorial @ PAKDD2016
Objective of this part
• How deep learning frameworks represent various neural networks.
• How deep learning frameworks realize the training procedure of neural networks.
• The technology stack that is common to most deep learning frameworks.
Steps for training neural networks
1. Prepare the training dataset
2. Initialize the Neural Network (NN) parameters
3. Repeat until meeting some criterion:
   • Prepare the next (mini)batch
   • Define how to compute the loss of this batch
   • Compute the loss (forward prop)
   • Compute the gradient (backprop)
   • Update the NN parameters
4. Save the NN parameters
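As a concrete illustration of these steps, here is a minimal, self-contained sketch in plain NumPy that trains a single-layer logistic-regression model on toy data. It is only a stand-in for a real NN and is not taken from any particular framework:

import numpy as np

# Prepare the training dataset (toy data: label = sign of the feature sum).
rng = np.random.RandomState(0)
X = rng.randn(1000, 10)
t = (X.sum(axis=1) > 0).astype(np.float64)

# Initialize the NN parameters.
w = 0.01 * rng.randn(10)
b = 0.0
lr = 0.1

for epoch in range(10):                   # repeat until meeting some criterion
    for i in range(0, len(X), 100):       # prepare the next (mini)batch
        x, y = X[i:i + 100], t[i:i + 100]
        # Compute the loss (forward prop): sigmoid + binary cross entropy.
        p = 1.0 / (1.0 + np.exp(-(x.dot(w) + b)))
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        # Compute the gradient (backprop), derived analytically here.
        gw = x.T.dot(p - y) / len(x)
        gb = np.mean(p - y)
        # Update the NN parameters (plain SGD).
        w -= lr * gw
        b -= lr * gb

np.savez('model.npz', w=w, b=b)           # save the NN parameters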
Technology stack of DL framework

name                                 | functions                                          | example
Graphical interface                  |                                                    | DIGITS, TensorBoard
Machine learning workflow management | dataset management, training loop                  | Keras, Lasagne, Blocks, TF Learn
Computational graph management       | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library      | linear algebra                                     | NumPy, CuPy, Eigen, torch (core)
Numerical computation package        | matrix operation, convolution                      | BLAS, cuBLAS, cuDNN
Hardware                             |                                                    | CPU, GPU
Neural Network as a Computational Graph
• In its simplest form, a NN is represented as a computational graph (CG): a bipartite DAG (directed acyclic graph) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Diagram: data nodes x1, x2, x3, y, z and operator nodes mul (x1, x2 → y) and sub (y, x3 → z)]
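A minimal sketch (not from any real framework) of how such a graph can be represented with explicit data and operator node objects:

class DataNode:
    def __init__(self, value=None, creator=None):
        self.value = value        # the array/scalar held by this data node
        self.creator = creator    # operator node that produced it (None for inputs)

class OperatorNode:
    def __init__(self, fn, inputs):
        self.fn = fn              # the concrete operation (e.g. mul, sub)
        self.inputs = inputs      # list of input data nodes
    def forward(self):
        values = [x.value for x in self.inputs]
        return DataNode(self.fn(*values), creator=self)

x1, x2, x3 = DataNode(2.0), DataNode(3.0), DataNode(1.0)
y = OperatorNode(lambda a, b: a * b, [x1, x2]).forward()   # y = x1 * x2
z = OperatorNode(lambda a, b: a - b, [y, x3]).forward()    # z = y - x3
print(z.value)                                             # 5.0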
Example: Multi-layer Perceptron (MLP)

[Diagram: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → y → CrossEntropyLoss (with label t)]

• It is a choice of implementation whether the CG includes the weights and biases.
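The same MLP written as a composition of functions in plain NumPy (a sketch with toy shapes; the fused softmax + cross entropy at the end mirrors the last two nodes of the diagram):

import numpy as np

def affine(x, W, b):
    return x.dot(W) + b

def relu(h):
    return np.maximum(h, 0)

def softmax_cross_entropy(a, t):
    a = a - a.max(axis=1, keepdims=True)   # shift for numerical stability
    log_y = a - np.log(np.exp(a).sum(axis=1, keepdims=True))
    return -log_y[np.arange(len(t)), t].mean()

rng = np.random.RandomState(0)
x = rng.randn(32, 784)                     # a mini-batch of inputs
t = rng.randint(10, size=32)               # integer class labels
W1, b1 = 0.01 * rng.randn(784, 100), np.zeros(100)
W2, b2 = 0.01 * rng.randn(100, 10), np.zeros(10)

h1 = affine(x, W1, b1)
a1 = relu(h1)
h2 = affine(a1, W2, b2)
a2 = relu(h2)
loss = softmax_cross_entropy(a2, t)        # softmax y, then cross entropy with t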
Example: Recurrent Neural Network (RNN)

[Diagram: h0 → RNN Unit (input x1) → h1 → RNN Unit (input x2) → h2 → ・・・ → RNN Unit (input xT) → hT. Each RNN Unit maps (xt, ht-1) to ht using shared parameters W, b.]

The RNN unit can be:
• Affine + activation function
• LSTM (Long Short-Term Memory)
• GRU (Gated Recurrent Unit)
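A sketch of the simplest choice, "Affine + activation function", in NumPy; note that the same W and b are reused at every time step when the graph is unrolled:

import numpy as np

def rnn_unit(x_t, h_prev, W, b):
    # h_t = tanh(W [x_t; h_{t-1}] + b)
    return np.tanh(np.concatenate([x_t, h_prev]).dot(W) + b)

rng = np.random.RandomState(0)
W = 0.01 * rng.randn(20 + 50, 50)       # input dim 20, hidden dim 50
b = np.zeros(50)

h = np.zeros(50)                        # h0
xs = [rng.randn(20) for _ in range(7)]  # x1, ..., xT
for x_t in xs:                          # unrolling the graph over time
    h = rnn_unit(x_t, h, W, b)          # shared parameters at every step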
Example: Stacked RNN

[Diagram: two RNN layers stacked. First layer: h0 → RNN Unit (x1) → h1 → RNN Unit (x2) → h2 → ・・・ → hT. Second layer: z0 → RNN Unit (h1) → z1 → RNN Unit (h2) → z2 → ・・・ → zT → Affine → Softmax → y.]
Example: RNN with control flow nodes

[Diagram: a loop-enter node receives h0 and x and emits state s; a predicate node computes pred from s and a counter i; a switch node routes s depending on pred. While pred = True, the RNN Unit (with parameters W, b) produces s' and an update node feeds it back into the loop; when pred = False, a loop-end node emits the output y.]

• TensorFlow has control flow nodes (e.g. cond, switch, while).
• As the CG has a loop, some mechanism is necessary that resolves the dependency of nodes to schedule the order of calculation.
Automatic Differentiation
• Computes the gradient of some specified data node (e.g. the loss) with respect to each data node.
• Each operator node must have a backward operation that calculates the gradients w.r.t. its inputs from the gradients w.r.t. its outputs (a realization of the chain rule).
  • e.g. the Function class of Chainer has a backward method.
  • e.g. each layer class of Caffe has Backward_cpu and Backward_gpu methods.
  • e.g. Autograd has a thin wrapper that adds a gradient method as a closure to most NumPy functions.
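A sketch of the forward/backward pair an operator node provides, loosely modeled on (but not copied from) APIs like Chainer's Function.backward, for the mul and sub nodes of the running example:

class Mul:
    def forward(self, x1, x2):
        self.x1, self.x2 = x1, x2     # remember inputs for the backward pass
        return x1 * x2
    def backward(self, gy):
        # chain rule: d(x1*x2)/dx1 = x2, d(x1*x2)/dx2 = x1
        return gy * self.x2, gy * self.x1

class Sub:
    def forward(self, y, x3):
        return y - x3
    def backward(self, gz):
        # d(y-x3)/dy = 1, d(y-x3)/dx3 = -1
        return gz, -gz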
Backprop through CG

y = x1 * x2
z = y - x3

[Diagram: the same graph, annotated with gradients flowing backward: ∇z z = 1 at the output z, ∇y z at y, ∇x1 z at x1]
Backprop as extended graphs

y = x1 * x2
z = y - x3

[Diagram: the forward graph (x1, x2 → mul → y; y, x3 → sub → z) is extended with a backward subgraph: dz → id → dy; dz → neg → dx3; dy and x2 → mul → dx1; dy and x1 → mul → dx2. Forward propagation and backward propagation are two halves of one graph.]
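The extended graph above, executed by hand on the running example (a standalone sketch). Note that the backward subgraph is evaluated by ordinary forward execution, in reverse topological order of the original graph:

x1, x2, x3 = 2.0, 3.0, 1.0
y = x1 * x2              # forward: mul
z = y - x3               # forward: sub

dz = 1.0                 # seed: ∇z z = 1
dy = dz * 1.0            # id node: dz/dy = 1
dx3 = -dz                # neg node: dz/dx3 = -1
dx1 = dy * x2            # mul node: dz/dx1 = dy * x2
dx2 = dy * x1            # mul node: dz/dx2 = dy * x1
print(dx1, dx2, dx3)     # 3.0 2.0 -1.0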
Example: Theano
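The original slide shows a Theano code screenshot that was not preserved in this text; below is a minimal example in the same spirit, using the Theano 0.x-era API to build the running-example graph and let T.grad extend it with gradient nodes:

import theano
import theano.tensor as T

x1, x2, x3 = T.dscalars('x1', 'x2', 'x3')
y = x1 * x2
z = y - x3

gx1, gx2, gx3 = T.grad(z, [x1, x2, x3])   # extends the CG with backward nodes
f = theano.function([x1, x2, x3], [z, gx1, gx2, gx3])
print(f(2.0, 3.0, 1.0))                   # z = 5.0, gradients 3.0, 2.0, -1.0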
Numerical optimizer
• Many gradient-based optimization algorithms are implemented.
• Stochastic Gradient Descent (SGD) is implemented in most DL frameworks.
• Which optimizer works best depends on the concrete task.

w: parameters of the neural network
θ: state of the optimizer
L: loss function
Γ: optimizer-specific update function

initialize w, θ
until the criterion is met:
    get data (x, y)
    calculate ∇w L(x, y; w)
    w, θ ← Γ(w, θ, ∇w L)
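For concreteness, a sketch (not framework code) of two choices of Γ: plain SGD carries no optimizer state θ, while momentum SGD keeps a velocity per parameter as its state:

def sgd(w, theta, grad, lr=0.01):
    # plain SGD: Γ(w, θ, ∇w L) = (w - lr * ∇w L, θ); θ is unused
    return w - lr * grad, theta

def momentum_sgd(w, v, grad, lr=0.01, mu=0.9):
    # momentum SGD: the optimizer state θ is the velocity v
    v = mu * v - lr * grad
    return w + v, v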
Serialization
• Save/load snapshots of the training process in a specified format (e.g. HDF5, npz, protobuf):
  • the model being trained (= architecture and parameters of the NN)
  • the state of the training procedure (e.g. epoch, learning rate, momentum)
• Serialization enhances the portability of models:
  • publish pre-trained models (e.g. Model Zoo (Caffe), MXNet, TensorFlow)
  • import pre-trained models from other DL frameworks
    • e.g. Chainer supports the BVLC-official reference models of Caffe.
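A sketch of such a snapshot using NumPy's npz format (the helper names here are hypothetical; real frameworks offer analogous save/load utilities):

import numpy as np

def save_snapshot(path, params, epoch, lr):
    # Store both the model parameters and the training state in one file.
    np.savez(path, epoch=epoch, lr=lr,
             **{'param/' + name: value for name, value in params.items()})

def load_snapshot(path):
    f = np.load(path)
    params = {k[len('param/'):]: f[k] for k in f.files if k.startswith('param/')}
    return params, int(f['epoch']), float(f['lr'])

save_snapshot('snapshot.npz', {'W1': np.zeros((784, 100))}, epoch=3, lr=0.01)
params, epoch, lr = load_snapshot('snapshot.npz')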
Computational optimizer
• Converts CGs to make them simpler and more efficient (e.g. Theano).

y = x1 * x2
z = y - x3
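A toy rewrite pass in the spirit of such optimizers (this is not Theano's actual implementation): it walks a tuple-encoded CG and removes no-op patterns like x * 1 and x + 0:

def simplify(node):
    if not isinstance(node, tuple):       # leaf: a variable name or a constant
        return node
    op, a, b = node
    a, b = simplify(a), simplify(b)       # simplify subgraphs first
    if op == 'mul' and b == 1:
        return a                          # x * 1 -> x
    if op == 'add' and b == 0:
        return a                          # x + 0 -> x
    return (op, a, b)

g = ('add', ('mul', 'x1', 1), 0)          # (x1 * 1) + 0
print(simplify(g))                        # 'x1'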
Abstraction of ML workflow
• Offers typical training/validation/evaluation procedures as APIs.
• Users call a single API and do not have to write the procedure manually; the API wraps the whole "Steps for training neural networks" procedure shown earlier.
• e.g. the fit and evaluate methods of the Model class in Keras.
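A sketch using the Keras 1.x-era API current at the time of this tutorial: the whole training loop collapses into compile/fit, and evaluation into a single evaluate call (toy random data):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(100, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])

X = np.random.randn(256, 784)
Y = np.eye(10)[np.random.randint(10, size=256)]    # one-hot labels
model.fit(X, Y, batch_size=32, nb_epoch=2)         # the training loop in one call
loss, acc = model.evaluate(X, Y, batch_size=32)    # evaluation in one call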
Graphical interface
• Computational graph management
  • Editor, visualizer
• Visualization of the training procedure
  • Visualization of feature maps, outputs of NNs, etc.
  • Progression of error and accuracy
• Performance monitor
  • e.g. throughput, latency, memory usage
GPU support
• CUDA: a computing platform for GPGPU on NVIDIA GPUs
  • language extensions, a compiler, libraries, etc.
• DL frameworks provide wrappers for CUDA:
  • GPU-array libraries that utilize cuBLAS, cuRAND, etc.
  • layer implementations with cuDNN (e.g. convolution, sigmoid, LSTM)
• Designed to switch between CPU and GPU easily:
  • e.g. users can write CPU/GPU-agnostic code (see the sketch below).
  • e.g. switch CPU/GPU with environment variables.
• Some frameworks support OpenCL as a GPU environment, but CUDA is more popular for now.
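For example, Chainer's cuda.get_array_module returns NumPy for CPU arrays and CuPy for GPU arrays, so one function body can serve both devices (a small sketch against the Chainer 1.x-era API):

import numpy as np
from chainer import cuda

def softplus(x):
    xp = cuda.get_array_module(x)        # numpy or cupy, depending on x
    return xp.log(1 + xp.exp(x))

y_cpu = softplus(np.array([0.5, -1.0]))              # runs with NumPy
# y_gpu = softplus(cuda.to_gpu(np.array([0.5])))     # same code runs with CuPy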
Multi-dimensional array library (CPU / GPU)
• In charge of the concrete computation held by data nodes.
• Heavily depends on BLAS (CPU) or CUDA / the CUDA toolkit libraries (GPU).
• CPU
  • Third-party library: Eigen::Tensor, NumPy
  • From scratch: ND4J (DL4J), mshadow (MXNet)
• GPU
  • Third-party library: Eigen::Tensor, PyCUDA, gpuarray
  • From scratch: ND4J (DL4J), mshadow (MXNet), CuPy (Chainer)
Which device to use?
• GPU is (by far) faster than CPU in most cases.
  • Most tensor calculations consist of element-wise operations, matrix multiplications, and convolutions.
• Exceptional cases:
  • When it is difficult to apply the mini-batch technique
    • e.g. variable-length training data
    • e.g. the architecture of the NN depends on the training data
  • When GPU calculation cannot hide the transfer of data to the GPU
    • e.g. the minibatch size is too small
Technology stack of Chainer

[Stack diagram, top to bottom, against the generic layers above:]
• Machine learning workflow / computational graph management: Chainer
• Multi-dimensional array library: NumPy (CPU), CuPy (GPU)
• Numerical computation package: BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
• Hardware: CPU, GPU
Technology stack of TensorFlow

[Stack diagram, top to bottom:]
• Graphical interface: TensorBoard
• Machine learning workflow management: TF Learn
• Computational graph management: TensorFlow
• Multi-dimensional array library: Eigen::Tensor
• Numerical computation package: BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
• Hardware: CPU, GPU
Technology stack of Theano

[Stack diagram, top to bottom:]
• Computational graph management: Theano
• Multi-dimensional array library: NumPy (CPU), libgpuarray (GPU)
• Numerical computation package: BLAS (CPU); CUDA Toolkit (GPU)
• Hardware: CPU, GPU (via CUDA or OpenCL)
Technology stack of Keras

[Stack diagram, top to bottom:]
• Machine learning workflow management: Keras
• Computational graph management and below: Theano or TensorFlow (Keras runs on top of either backend, reusing the technology stack of Theano or TF shown above)
Summary
• Most DL frameworks have many components in common and can be organized into a similar technology stack.
• At the upper layers of the stack, frameworks are designed to help users follow typical ML workflows.
• At the middle layer, manipulations of computational graphs are automated.
• At the lower layers, optimized tensor calculations are implemented.
• The realization of these components differs between frameworks, as we will see in the following part.
Memorandum
Training of Neural Networks

argmin_w Σ_(x,y) L(x, y; w)

w: parameters
x: feature vector
y: training label (e.g. in a classification problem)
L: loss function

• L is designed so that its value gets smaller as the prediction becomes more accurate.
• In the deep learning context:
  • L is represented by neural networks
  • w are the parameters of the neural networks
Layer = function + data nodes
• Layers (e.g. a fully connected layer, a convolutional layer) can be considered as functions with parameters to be optimized.
• In most modern frameworks, the parameters of layers can be considered as data nodes in a computational graph.
• The framework needs to differentiate which data nodes are parameters to be optimized and which are data points (see the sketch below).
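A minimal sketch (not framework code) of a fully connected layer that owns its parameters as data nodes, with a flag that lets the framework tell parameters apart from data points:

import numpy as np

class DataNode:
    def __init__(self, value, is_param=False):
        self.value = value
        self.is_param = is_param      # True for parameters to be optimized

class Linear:
    def __init__(self, n_in, n_out):
        self.W = DataNode(0.01 * np.random.randn(n_in, n_out), is_param=True)
        self.b = DataNode(np.zeros(n_out), is_param=True)
    def __call__(self, x):
        # x is a data point node; W and b are parameter nodes of this layer.
        return DataNode(x.value.dot(self.W.value) + self.b.value)

layer = Linear(784, 100)
x = DataNode(np.random.randn(32, 784))                   # data point, not a parameter
h = layer(x)
params = [n for n in (layer.W, layer.b) if n.is_param]   # what the optimizer updates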
Execution Engine
• It calculates the dependencies between data nodes and schedules the execution of the parts of the computational graph (especially in multi-node or multi-GPU settings).
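A sketch of the core of such an engine: a topological sort over the dependency graph. Nodes whose inputs are all finished become "ready" and could be dispatched in parallel (e.g. to different GPUs); the encoding of the graph here is hypothetical:

from collections import deque

def schedule(graph):
    """graph maps each node to the list of nodes it depends on."""
    indeg = {n: len(deps) for n, deps in graph.items()}
    users = {n: [] for n in graph}
    for n, deps in graph.items():
        for d in deps:
            users[d].append(n)
    ready = deque(n for n, k in indeg.items() if k == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)               # execute (or dispatch) node n here
        for u in users[n]:
            indeg[u] -= 1
            if indeg[u] == 0:         # all dependencies of u are done
                ready.append(u)
    return order

print(schedule({'x1': [], 'x2': [], 'x3': [],
                'mul': ['x1', 'x2'], 'sub': ['mul', 'x3']}))
# ['x1', 'x2', 'x3', 'mul', 'sub']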