Title: "Understanding PyTorch: PyTorch in Image Processing". Github: https://github.com/azarnyx/PyData_Meetup. The Dataset: https://goo.gl/CWmLWD.
The talk was given in PyData Meetup which took place in Munich on 06.03.2019 in Data Reply office. The talk was given by Dmitrii Azarnykh, data scientist in Data Reply.
1. PYDATA MEETUP MUNICH IN DATA REPLY
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/z6Guvo WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A PASS: DesignOffice
GitHub: https://goo.gl/j8LEb9
2. CONSULTING TEAMS IN DATA REPLY
(Diagram: the six consulting teams)
1. Data Science
2. Data Incubator
3. Big Data
4. Ab Initio
5. MicroStrategy
6. Data Strategy
• Different aspects of Data Science are done by different types of specialists
• Python is the most used language in the Data Science group
• International, fast-growing team: more than 30 nationalities
• Employees from the best universities, >30% hold a PhD
• Free trainings and certificates
• Travel to conferences: ICML, ML Prague
4. UNDERSTANDING PYTORCH: PYTORCH IN IMAGE PROCESSING
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/spXV6b WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A PASS: DesignOffice
21. PYTORCH TENSOR
A tensor has many attributes, among them:
• The data of a tensor is a tensor itself
• The gradient of a tensor is also a tensor, of the same size as the data tensor, or None
• The parameter requires_grad: gradients need to be computed only for the weights, not for the data
• A function that computes backpropagation
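For illustration, a minimal sketch of these attributes (the variable names and shapes are mine):

```python
import torch

w = torch.randn(3, requires_grad=True)  # weights: gradients are tracked
x = torch.randn(3)                      # data: requires_grad defaults to False

print(w.data)           # the data of a tensor is a tensor itself
print(w.grad)           # None until backward() has been called
print(w.requires_grad)  # True for the weights, False for the data

loss = (w * x).sum()
loss.backward()         # the function that runs backpropagation
print(w.grad)           # now a tensor of the same size as w.data
```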
27. COMPUTE GRADIENTS
Gradients always sum up after backpropagation, so we need to set them to zero before calling the backward() function a second time.
(Computational graph: inputs $x_1, x_2$; nodes exp and mult give $x_3 = \exp(x_1)$ and $x_4 = x_1 x_2$, which feed the output $x_5 = x_3 + x_4$.)

$d_3 = d_5 \frac{\partial x_5}{\partial x_3} = d_5$
$d_4 = d_5 \frac{\partial x_5}{\partial x_4} = d_5$
$d_1 = d_3 \frac{\partial x_3}{\partial x_1} + d_4 \frac{\partial x_4}{\partial x_1} = e^{x_1} + x_2$ (with $d_5 = 1$ at the output)
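A small sketch of this accumulation behaviour on the graph above (the input values are illustrative):

```python
import torch

# x3 = exp(x1), x4 = x1 * x2, x5 = x3 + x4, as in the graph above
x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0)

x5 = torch.exp(x1) + x1 * x2
x5.backward()
print(x1.grad)   # e^x1 + x2 = e + 2 ≈ 4.7183

# calling backward() again without zeroing accumulates the gradients
x5 = torch.exp(x1) + x1 * x2
x5.backward()
print(x1.grad)   # ≈ 9.4366: the two gradients summed up

x1.grad.zero_()  # reset before the next backward pass
```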
28. LINEAR REGRESSION
Equation of the orange line: $\hat{y} = w x + b$
Blue dots: $(x_i, y_i)$
Minimize the sum of squared lengths of the green lines: $\sum_i (\hat{y}_i - y_i)^2$
(Plot: blue data points, orange regression line, green residuals; axes $x$ and $y$.)
29. LINEAR REGRESSION
features and labels: $(x_i, y_i)$
initialize weights that need gradients: $w, b$
train with gradient descent (a minimal sketch follows below):
• compute predictions: $\hat{y} = w x + b$
• backpropagate the loss: $\sum_i (\hat{y}_i - y_i)^2$
• update the weights
• set the gradients to zero
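A minimal sketch of this loop on synthetic data (the data, learning rate, and iteration count are my assumptions):

```python
import torch

x = torch.linspace(0, 1, 100)
y = 2.0 * x + 0.5 + 0.05 * torch.randn(100)  # noisy line

w = torch.zeros(1, requires_grad=True)       # initialize weights,
b = torch.zeros(1, requires_grad=True)       # need gradient

lr = 1e-3
for _ in range(500):
    y_hat = w * x + b                        # compute predictions
    loss = ((y_hat - y) ** 2).sum()          # sum of squared errors
    loss.backward()                          # backpropagate the loss
    with torch.no_grad():
        w -= lr * w.grad                     # update the weights
        b -= lr * b.grad
    w.grad.zero_()                           # set the gradients to zero
    b.grad.zero_()
```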
30. LINEAR REGRESSION
It is also possible to use an optimizer that accepts the weights as parameters.
The optimizer then updates all weights and sets the gradients of all weights to zero.
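The same loop with an optimizer could look like this (a sketch; SGD and the hyperparameters are my choices):

```python
import torch

x = torch.linspace(0, 1, 100)
y = 2.0 * x + 0.5 + 0.05 * torch.randn(100)
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# the optimizer accepts the weights as parameters
optimizer = torch.optim.SGD([w, b], lr=1e-3)

for _ in range(500):
    y_hat = w * x + b
    loss = ((y_hat - y) ** 2).sum()
    loss.backward()
    optimizer.step()        # updates all registered weights
    optimizer.zero_grad()   # sets all their gradients to zero
```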
34. OUTLINE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
35. STEP 1: BUILD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
36. BUILD ALEXNET MODEL
Weights are downloaded automatically.
The features part is pretrained on ImageNet. It extracts the most useful features from the images. We will not train this part and will use the downloaded weights.
We will substitute and retrain this part (the classifier).
37. BUILD ALEXNET MODEL
no gradients are needed for the feature-extractor weights
a new model for classification; the syntax is similar to Keras
set the classifier as trainable
set the features as not trainable
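A sketch combining these two slides; the shape of the new classifier and the two-class output are my assumptions:

```python
import torchvision
from torch import nn

# weights are downloaded automatically
alexnet = torchvision.models.alexnet(pretrained=True)

# set the features as not trainable: no gradients for these weights
for param in alexnet.features.parameters():
    param.requires_grad = False

# substitute the classifier with a new model; parameters of freshly
# created layers require gradients by default, so it is trainable
alexnet.classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),  # 9216 = flattened AlexNet feature map
    nn.ReLU(inplace=True),
    nn.Linear(4096, 2),            # assumed: two target classes
)
```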
38. STEP 2: LOAD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
39. LOAD DATASET
set the transformations for the data
create the dataset: no images in memory yet, only their paths and labels
split into train and test: still no images in memory
balance the dataset and create a generator that yields batches of images
Images are loaded into memory only when iteration happens.
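A sketch of this pipeline; the data path, image size, split ratio, and batch size are my assumptions, and a weighted sampler is one common way to realise the balancing step:

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # AlexNet input size
    transforms.ToTensor(),          # (ImageNet normalization omitted)
])

# only file paths and labels are collected; no images in memory yet
dataset = datasets.ImageFolder("data/", transform=transform)

# split into train and test: still no images in memory
n_test = len(dataset) // 5
train_set, test_set = torch.utils.data.random_split(
    dataset, [len(dataset) - n_test, n_test])

# balance: weight each training sample inversely to its class frequency
labels = torch.tensor([dataset.samples[i][1] for i in train_set.indices])
counts = torch.bincount(labels).float()
weights = 1.0 / counts[labels]
sampler = torch.utils.data.WeightedRandomSampler(weights, len(weights))

# generators that yield batches; images are read from disk only
# while iterating
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32,
                                           sampler=sampler)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)
```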
40. STEP 3: CONVERT
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
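The code for the CUDA/GPU step is not in this extract; a common pattern for it looks like the following sketch:

```python
import torch
import torchvision

# pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# moving the model transfers its weights to the chosen device
alexnet = torchvision.models.alexnet(pretrained=True)
alexnet = alexnet.to(device)
```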
42. STEP 4: TRAIN
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
43. MODEL TRAINING
send images and labels to the GPU, if a GPU is used
non_blocking=True is used for asynchronous transfers, which speeds up CUDA computations
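The transfer could look like this (a sketch; `device` and `train_loader` are the names from the earlier sketches):

```python
for images, labels in train_loader:
    # non_blocking=True returns control to the host while the copy to
    # the GPU is still in flight, overlapping transfer with computation
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    ...  # forward and backward pass follow
```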
45. MODEL TRAINING
make one step of gradient descent and set the gradients of the trainable weights in alexnet.classifier to zero
only the classifier parameters are passed to the optimizer
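A sketch of the update step; the choice of Adam and the learning rate are my assumptions, and `alexnet`, `images`, `labels` come from the earlier sketches:

```python
import torch

# only the classifier parameters are handed to the optimizer,
# so the frozen feature extractor is never updated
optimizer = torch.optim.Adam(alexnet.classifier.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

outputs = alexnet(images)          # forward pass
loss = criterion(outputs, labels)
loss.backward()                    # backpropagate
optimizer.step()                   # one step of gradient descent
optimizer.zero_grad()              # zero the classifier gradients
```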
46. MODEL TRAINING
use tqdm to show a progress bar and report the current average batch loss
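One way to wire this up, reusing the names from the sketches above:

```python
from tqdm import tqdm

running_loss = 0.0
progress = tqdm(train_loader)
for i, (images, labels) in enumerate(progress, start=1):
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    loss = criterion(alexnet(images), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    running_loss += loss.item()
    # show the running average batch loss next to the progress bar
    progress.set_description(f"avg batch loss: {running_loss / i:.4f}")
```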
47. STEP 5: EVALUATE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
48. SPEED UP IMAGE LOADING
(Diagram: the model weights live on the graphics processing unit (GPU), the labels in random-access memory (RAM), and the images on the solid-state drive (SSD), from which the DataLoader streams them.)
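The DataLoader itself offers the usual levers for this step (the worker count is my assumption; the slide's exact settings are not in the extract):

```python
import torch

train_loader = torch.utils.data.DataLoader(
    train_set,          # dataset from the loading step
    batch_size=32,
    sampler=sampler,    # balanced sampler from the loading step
    num_workers=4,      # parallel image reads from the SSD
    pin_memory=True,    # page-locked RAM allows async copies to the GPU
)
```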
49. MODEL EVALUATION
iterate over the test_loader DataLoader
move labels and probabilities first to the CPU and then to NumPy
use scikit-learn to show the metrics
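An evaluation sketch; `alexnet`, `device`, and `test_loader` are the names from the earlier sketches, and the metric choice is mine:

```python
import numpy as np
import torch
from sklearn.metrics import classification_report

alexnet.eval()
all_labels, all_preds = [], []
with torch.no_grad():                  # no gradients needed here
    for images, labels in test_loader:
        outputs = alexnet(images.to(device))
        preds = outputs.argmax(dim=1)
        # first to the CPU, then to NumPy
        all_labels.append(labels.cpu().numpy())
        all_preds.append(preds.cpu().numpy())

print(classification_report(np.concatenate(all_labels),
                            np.concatenate(all_preds)))
```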
50. SAVE/LOAD MODEL
save the torch model and the state of the optimizer
when loading the weights, the model and the optimizer need to be initialized first
then load the weights and the state of the optimizer
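A checkpointing sketch (the file name is my assumption):

```python
import torch

# save the model weights together with the optimizer state
torch.save({
    "model": alexnet.state_dict(),
    "optimizer": optimizer.state_dict(),
}, "checkpoint.pth")

# when loading, the model and the optimizer must be initialized first;
# only then can the weights and the optimizer state be restored
checkpoint = torch.load("checkpoint.pth")
alexnet.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```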