Title: "Understanding PyTorch: PyTorch in Image Processing". Github: https://github.com/azarnyx/PyData_Meetup. The Dataset: https://goo.gl/CWmLWD.
The talk was given in PyData Meetup which took place in Munich on 06.03.2019 in Data Reply office. The talk was given by Dmitrii Azarnykh, data scientist in Data Reply.
1. PYDATA MEETUP MUNICH IN DATA REPLY
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/z6Guvo WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A PASS: DesignOffice
GitHub: https://goo.gl/j8LEb9
2. CONSULTING TEAMS IN DATA REPLY
(Diagram: the six consulting teams)
1. Data Science
2. Data Incubator
3. Big Data
4. Ab Initio
5. MicroStrategy
6. Data Strategy
• Different aspects of Data Science are done by different types of specialists
• Python is the most used language in the Data Science group
• International, fast-growing team: more than 30 nationalities
• Employees from the best universities, >30% hold a PhD
• Free trainings and certificates
• Travel to conferences: ICML, ML Prague
4. UNDERSTANDING PYTORCH: PYTORCH IN IMAGE PROCESSING
Dmitrii Azarnykh | Data Scientist at Data Reply
Jupyter notebook: https://goo.gl/spXV6b WLAN: DO-Tagungswelt
HTML version: https://goo.gl/Nh953A PASS: DesignOffice
21. PYTORCH TENSOR
A tensor has many attributes, among them:
• The data of a tensor is a tensor itself
• The gradient of a tensor is also a tensor, of the same size as the data tensor, or None
• The parameter requires_grad: gradients need to be computed only for the weights, not for the data
• A function that computes backpropagation
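For illustration, a minimal sketch of these attributes (the variable names and shapes are mine):

```python
import torch

w = torch.randn(3, requires_grad=True)  # weights: gradients are tracked
x = torch.randn(3)                      # data: requires_grad defaults to False

print(w.data)           # the data of a tensor is a tensor itself
print(w.grad)           # None until backward() has been called
print(w.requires_grad)  # True for the weights, False for the data

loss = (w * x).sum()
loss.backward()         # the function that runs backpropagation
print(w.grad)           # now a tensor of the same size as w.data
```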
27. COMPUTE GRADIENTS
Gradients always sum up after backpropagation, so we need to set them to zero before calling the backward() function a second time.
(Computational graph: inputs $x_1, x_2$; nodes exp and mult give $x_3 = \exp(x_1)$ and $x_4 = x_1 x_2$, which feed the output $x_5 = x_3 + x_4$.)

$d_3 = d_5 \frac{\partial x_5}{\partial x_3} = d_5$
$d_4 = d_5 \frac{\partial x_5}{\partial x_4} = d_5$
$d_1 = d_3 \frac{\partial x_3}{\partial x_1} + d_4 \frac{\partial x_4}{\partial x_1} = e^{x_1} + x_2$ (with $d_5 = 1$ at the output)
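A small sketch of this accumulation behaviour on the graph above (the input values are illustrative):

```python
import torch

# x3 = exp(x1), x4 = x1 * x2, x5 = x3 + x4, as in the graph above
x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0)

x5 = torch.exp(x1) + x1 * x2
x5.backward()
print(x1.grad)   # e^x1 + x2 = e + 2 ≈ 4.7183

# calling backward() again without zeroing accumulates the gradients
x5 = torch.exp(x1) + x1 * x2
x5.backward()
print(x1.grad)   # ≈ 9.4366: the two gradients summed up

x1.grad.zero_()  # reset before the next backward pass
```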
28. LINEAR REGRESSION
Equation of the orange line: $\hat{y} = w x + b$
Blue dots: $(x_i, y_i)$
Minimize the sum of squared lengths of the green lines: $\sum_i (\hat{y}_i - y_i)^2$
(Plot: blue data points, orange regression line, green residuals; axes $x$ and $y$.)
29. LINEAR REGRESSION
features and labels: $(x_i, y_i)$
initialize weights that need gradients: $w, b$
train with gradient descent (a minimal sketch follows below):
• compute predictions: $\hat{y} = w x + b$
• backpropagate the loss: $\sum_i (\hat{y}_i - y_i)^2$
• update the weights
• set the gradients to zero
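A minimal sketch of this loop on synthetic data (the data, learning rate, and iteration count are my assumptions):

```python
import torch

x = torch.linspace(0, 1, 100)
y = 2.0 * x + 0.5 + 0.05 * torch.randn(100)  # noisy line

w = torch.zeros(1, requires_grad=True)       # initialize weights,
b = torch.zeros(1, requires_grad=True)       # need gradient

lr = 1e-3
for _ in range(500):
    y_hat = w * x + b                        # compute predictions
    loss = ((y_hat - y) ** 2).sum()          # sum of squared errors
    loss.backward()                          # backpropagate the loss
    with torch.no_grad():
        w -= lr * w.grad                     # update the weights
        b -= lr * b.grad
    w.grad.zero_()                           # set the gradients to zero
    b.grad.zero_()
```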
30. LINEAR REGRESSION
It is also possible to use an optimizer that accepts the weights as parameters.
The optimizer then updates all weights and sets the gradients of all weights to zero.
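The same loop with an optimizer could look like this (a sketch; SGD and the hyperparameters are my choices):

```python
import torch

x = torch.linspace(0, 1, 100)
y = 2.0 * x + 0.5 + 0.05 * torch.randn(100)
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# the optimizer accepts the weights as parameters
optimizer = torch.optim.SGD([w, b], lr=1e-3)

for _ in range(500):
    y_hat = w * x + b
    loss = ((y_hat - y) ** 2).sum()
    loss.backward()
    optimizer.step()        # updates all registered weights
    optimizer.zero_grad()   # sets all their gradients to zero
```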
34. OUTLINE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
35. STEP 1: BUILD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
36. BUILD ALEXNET MODEL
Weights are downloaded automatically.
The features part is pretrained on ImageNet. It extracts the most useful features from the images. We will not train this part and will use the downloaded weights.
We will substitute and retrain this part (the classifier).
37. BUILD ALEXNET MODEL
no gradients are needed for the feature-extractor weights
a new model for classification; the syntax is similar to Keras
set the classifier as trainable
set the features as not trainable
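A sketch combining these two slides; the shape of the new classifier and the two-class output are my assumptions:

```python
import torchvision
from torch import nn

# weights are downloaded automatically
alexnet = torchvision.models.alexnet(pretrained=True)

# set the features as not trainable: no gradients for these weights
for param in alexnet.features.parameters():
    param.requires_grad = False

# substitute the classifier with a new model; parameters of freshly
# created layers require gradients by default, so it is trainable
alexnet.classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),  # 9216 = flattened AlexNet feature map
    nn.ReLU(inplace=True),
    nn.Linear(4096, 2),            # assumed: two target classes
)
```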
38. STEP 2: LOAD
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
39. LOAD DATASET
set the transformations for the data
create the dataset: no images in memory yet, only their paths and labels
split into train and test: still no images in memory
balance the dataset and create a generator that yields batches of images
Images are loaded into memory only when iteration happens.
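A sketch of this pipeline; the data path, image size, split ratio, and batch size are my assumptions, and a weighted sampler is one common way to realise the balancing step:

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # AlexNet input size
    transforms.ToTensor(),          # (ImageNet normalization omitted)
])

# only file paths and labels are collected; no images in memory yet
dataset = datasets.ImageFolder("data/", transform=transform)

# split into train and test: still no images in memory
n_test = len(dataset) // 5
train_set, test_set = torch.utils.data.random_split(
    dataset, [len(dataset) - n_test, n_test])

# balance: weight each training sample inversely to its class frequency
labels = torch.tensor([dataset.samples[i][1] for i in train_set.indices])
counts = torch.bincount(labels).float()
weights = 1.0 / counts[labels]
sampler = torch.utils.data.WeightedRandomSampler(weights, len(weights))

# generators that yield batches; images are read from disk only
# while iterating
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32,
                                           sampler=sampler)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32)
```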
40. STEP 3: CONVERT
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
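The code for the CUDA/GPU step is not in this extract; a common pattern for it looks like the following sketch:

```python
import torch
import torchvision

# pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# moving the model transfers its weights to the chosen device
alexnet = torchvision.models.alexnet(pretrained=True)
alexnet = alexnet.to(device)
```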
42. STEP 4: TRAIN
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
43. MODEL TRAINING
send images and labels to the GPU, if a GPU is used
non_blocking=True is used for asynchronous transfers, which speeds up CUDA computations
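The transfer could look like this (a sketch; `device` and `train_loader` are the names from the earlier sketches):

```python
for images, labels in train_loader:
    # non_blocking=True returns control to the host while the copy to
    # the GPU is still in flight, overlapping transfer with computation
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    ...  # forward and backward pass follow
```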
45. MODEL TRAINING
make one step of gradient descent and set the gradients of the trainable weights in alexnet.classifier to zero
only the classifier parameters are passed to the optimizer
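A sketch of the update step; the choice of Adam and the learning rate are my assumptions, and `alexnet`, `images`, `labels` come from the earlier sketches:

```python
import torch

# only the classifier parameters are handed to the optimizer,
# so the frozen feature extractor is never updated
optimizer = torch.optim.Adam(alexnet.classifier.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

outputs = alexnet(images)          # forward pass
loss = criterion(outputs, labels)
loss.backward()                    # backpropagate
optimizer.step()                   # one step of gradient descent
optimizer.zero_grad()              # zero the classifier gradients
```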
46. MODEL TRAINING
use tqdm to show a progress bar and report the current average batch loss
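One way to wire this up, reusing the names from the sketches above:

```python
from tqdm import tqdm

running_loss = 0.0
progress = tqdm(train_loader)
for i, (images, labels) in enumerate(progress, start=1):
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    loss = criterion(alexnet(images), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    running_loss += loss.item()
    # show the running average batch loss next to the progress bar
    progress.set_description(f"avg batch loss: {running_loss / i:.4f}")
```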
47. STEP 5: EVALUATE
1. Build AlexNet model
2. Load dataset
3. CUDA/GPU compatibility
4. Training
5. Speed-up, save/load model, evaluation
48. SPEED UP IMAGE LOADING
(Diagram: the model weights live on the graphics processing unit (GPU), the labels in random-access memory (RAM), and the images on the solid-state drive (SSD), from which the DataLoader streams them.)
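The DataLoader itself offers the usual levers for this step (the worker count is my assumption; the slide's exact settings are not in the extract):

```python
import torch

train_loader = torch.utils.data.DataLoader(
    train_set,          # dataset from the loading step
    batch_size=32,
    sampler=sampler,    # balanced sampler from the loading step
    num_workers=4,      # parallel image reads from the SSD
    pin_memory=True,    # page-locked RAM allows async copies to the GPU
)
```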
49. MODEL EVALUATION
iterate over the test_loader DataLoader
move labels and probabilities first to the CPU and then to NumPy
use scikit-learn to show the metrics
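An evaluation sketch; `alexnet`, `device`, and `test_loader` are the names from the earlier sketches, and the metric choice is mine:

```python
import numpy as np
import torch
from sklearn.metrics import classification_report

alexnet.eval()
all_labels, all_preds = [], []
with torch.no_grad():                  # no gradients needed here
    for images, labels in test_loader:
        outputs = alexnet(images.to(device))
        preds = outputs.argmax(dim=1)
        # first to the CPU, then to NumPy
        all_labels.append(labels.cpu().numpy())
        all_preds.append(preds.cpu().numpy())

print(classification_report(np.concatenate(all_labels),
                            np.concatenate(all_preds)))
```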
50. SAVE/LOAD MODEL
save the torch model and the state of the optimizer
when loading the weights, the model and the optimizer need to be initialized first
then load the weights and the state of the optimizer
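A checkpointing sketch (the file name is my assumption):

```python
import torch

# save the model weights together with the optimizer state
torch.save({
    "model": alexnet.state_dict(),
    "optimizer": optimizer.state_dict(),
}, "checkpoint.pth")

# when loading, the model and the optimizer must be initialized first;
# only then can the weights and the optimizer state be restored
checkpoint = torch.load("checkpoint.pth")
alexnet.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```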