ReCap: PyTorch,
CNNs, RNNs
Andrey Ustyuzhanin
Andrey Ustyuzhanin 1
Deep Learning Frameworks
Source
Andrey Ustyuzhanin 2
Andrey Ustyuzhanin 3
▶ Simple, transparent development and debugging
▶ Rich Ecosystem:
– Plenty of pretrained models
– NLP, Vision, …
– Interpretation
– Hyper-optimization
▶ Production Ready (C++, ONNX, Services)
▶ Distributed Training, declarative data parallelism
▶ Cloud Deployment support
▶ Choice of many industry leaders and researchers
PyTorch highlights
Andrey Ustyuzhanin 4
Neural network representation
Tensors
Functions
[figure: computation graph with input X, output Y, tensors W1, b1, W2, b2, and two ReLU function nodes]
Andrey Ustyuzhanin 5
Building blocks, tensors
Andrey Ustyuzhanin 6
▶ Keep in mind that tensors live on devices: CPU, GPU, TPU
▶ Operations can be performed only if all their arguments reside on the same device
Tensor creation and placement
Andrey Ustyuzhanin 7
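The code on this slide was an image; below is a minimal sketch of tensor creation and placement, assuming an optional CUDA device:

```python
import torch

# Pick a device; falls back to CPU if no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, 4)                   # created on CPU by default
y = torch.zeros(3, 4, device=device)    # created directly on the target device

x = x.to(device)                        # move x to the same device as y
z = x + y                               # OK: both operands live on one device
```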
GPU, TPU support
▶ https://pytorch.org/docs/stable/cuda.html
▶ http://pytorch.org/xla/release/1.5/index.html
Andrey Ustyuzhanin 8
Building blocks, graph
source
Toy example:
Andrey Ustyuzhanin 9
▶ https://pytorch.org/docs/stable/torch.html?highlight=mm#math-
operations
Math operations
Andrey Ustyuzhanin 10
Computing backpropagation
Andrey Ustyuzhanin 12
Computing gradients automatically
Each Tensor has an attribute grad_fn,
which refers to the mathematical
operator that created it.
If a Tensor is a leaf node (initialized by the
user), its grad_fn is None.
Andrey Ustyuzhanin 13
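A small sketch illustrating grad_fn on leaf and non-leaf tensors:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor: x.grad_fn is None
y = x ** 2 + 3 * x                         # non-leaf: created by operations

print(x.grad_fn)   # None
print(y.grad_fn)   # <AddBackward0 ...>, the operator that produced y

y.backward()       # backpropagate through the recorded graph
print(x.grad)      # dy/dx = 2*x + 3 = tensor(7.)
```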
▶ All math operations are represented by classes inherited from torch.autograd.Function
– forward computes the node output and buffers it
– backward stores the incoming gradient in grad and passes it further
Functions
Andrey Ustyuzhanin 14
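A hedged sketch of this forward/backward contract; the Square class below is our own illustration, not a PyTorch built-in:

```python
import torch

class Square(torch.autograd.Function):
    """Custom autograd Function implementing y = x^2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)     # buffer the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x   # chain rule: dL/dx = dL/dy * dy/dx

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor(6.)
```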
▶ Compute the gradient for every tensor involved
▶ Make a gradient descent step in the opposite direction:
Gradient descent
Andrey Ustyuzhanin 15
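A minimal sketch of one manual gradient descent step on f(w) = (w − 5)²:

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

loss = (w - 5) ** 2
loss.backward()            # populates w.grad

with torch.no_grad():      # the update itself should not be tracked
    w -= lr * w.grad       # step opposite to the gradient
w.grad.zero_()             # clear the gradient for the next iteration
```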
▶ Calling forward creates
– graph with the intermediate node output values,
– buffers for the non-leaf nodes,
– buffers for intermediate gradient values.
▶ Calling backward
– computes gradients and
– frees the buffers and destroys the graph.
▶ The next time forward is called,
– leaf node buffers from the previous run will be shared,
– non-leaf node buffers will be recreated.
Dynamic graph
Andrey Ustyuzhanin 16
▶ Due to the flexibility of the network architecture, it is not obvious when an
iteration of gradient descent stops, so backward’s gradients are
accumulated each time a variable (Tensor) occurs in the graph;
▶ This is usually desired for RNN cases;
▶ If you do not need to accumulate them, you must clear the previous
gradient values at the end of each iteration:
– either by x.grad.zero_() for every model tensor x;
– or by the optimizer’s zero_grad() method, which is preferable.
Gradient cleaning
Andrey Ustyuzhanin 17
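A sketch of the standard training-loop pattern with optimizer.zero_grad(); the model, loss and data below are illustrative:

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for _ in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    opt.zero_grad()               # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()               # accumulate fresh gradients
    opt.step()                    # apply the update
```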
▶ requires_grad is an attribute of the Tensor
class. By default it is False. It comes in handy
when you must freeze some layers and stop
them from updating parameters while training.
▶ Thus, no gradient is propagated to them,
or to the layers that depend on them
only for gradient flow.
▶ requires_grad is contagious: if even one
operand of an operation has requires_grad
set to True, so will the result.
Freezing weights
Andrey Ustyuzhanin 18
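For instance, a hedged sketch of freezing a pre-trained backbone and training only a new head; torchvision's resnet18 is used purely as an example (assumes torchvision ≥ 0.13 for the weights argument):

```python
import torch
from torchvision import models

net = models.resnet18(weights="IMAGENET1K_V1")

for p in net.parameters():
    p.requires_grad = False       # freeze the whole backbone

# Replace the classification head; its new parameters require gradients.
net.fc = torch.nn.Linear(net.fc.in_features, 10)

# Only the head's parameters are handed to the optimizer.
opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
```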
Pre-trained models' enhancement
Andrey Ustyuzhanin 19
▶ When we are computing gradients, we need to cache input values and intermediate features,
as they may be required to compute the gradient later. The gradient of b = w1 ∗ a w.r.t. its
inputs w1 and a is a and w1, respectively.
▶ We need to store these values for gradient computation during the backward pass. This affects
the memory footprint of the network.
▶ While performing inference, we don’t compute gradients, so these buffers can be skipped
▶ An even more recent, optimized context manager: torch.inference_mode (link)
Inference
Andrey Ustyuzhanin 20
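A minimal sketch of both inference contexts:

```python
import torch

model = torch.nn.Linear(10, 2)
x = torch.randn(5, 10)

# Disable graph construction during inference to save memory and time.
with torch.no_grad():
    y1 = model(x)

# inference_mode is a newer, stricter and slightly faster alternative.
with torch.inference_mode():
    y2 = model(x)
```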
Neural Network class: torch.nn.Module
Andrey Ustyuzhanin 21
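The code on this slide was an image; below is a minimal sketch of a torch.nn.Module matching the two-layer ReLU network from the representation slide (sizes are illustrative):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in=10, d_hid=32, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hid)
        self.fc2 = nn.Linear(d_hid, d_out)

    def forward(self, x):
        h = torch.relu(self.fc1(x))   # hidden layer with ReLU activation
        return self.fc2(h)

model = MLP()
out = model(torch.randn(4, 10))       # forward pass on a batch of 4
```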
Helpful functions
Andrey Ustyuzhanin
Loss functions: BCELoss, CrossEntropyLoss, L1Loss, MSELoss, NLLLoss, SoftMarginLoss,
MultiLabelSoftMarginLoss, CosineEmbeddingLoss, KLDivLoss, …
Activation functions: ReLU, ELU, SELU, PReLU, LeakyReLU, Threshold, Hardtanh, Sigmoid,
Tanh, LogSigmoid, Softplus, Softshrink, Softsign, Tanhshrink, Softmin, Softmax, Softmax2d, …
Optimizers: opt = optim.X(model.parameters(), ...) where X is SGD, Adadelta, Adagrad,
Adam, SparseAdam, Adamax, ASGD, LBFGS, RMSprop or Rprop
Visualization: from torchviz import make_dot; make_dot(y.mean(), params=dict(model.named_parameters()))
22
Open Neural Network eXchange (ONNX) is an open standard format for
representing machine learning models. The torch.onnx module can export
PyTorch models to ONNX. The model can then be consumed by any of the many
runtimes that support ONNX.
▶ Export and run: the code was shown as an image; a sketch follows this slide
ONNX
Andrey Ustyuzhanin 23
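A hedged sketch of the export/run flow, assuming onnxruntime is installed (model and shapes are illustrative):

```python
import torch

model = torch.nn.Linear(10, 2).eval()
dummy = torch.randn(1, 10)

# Export the model to the ONNX format.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Run it with an ONNX runtime.
import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("model.onnx")
out = sess.run(["output"], {"input": np.random.randn(1, 10).astype(np.float32)})
```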
ONNX Runtimes
Andrey Ustyuzhanin 24
▶ https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader
Data Utils
▶ https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset
Andrey Ustyuzhanin 25
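A minimal sketch of a custom Dataset wrapped in a DataLoader (the ToyDataset class is our own illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n=100):
        self.x = torch.randn(n, 10)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
for xb, yb in loader:     # iterate over shuffled mini-batches
    pass
```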
▶ PyTorch 2.0 has a new API called torch.compile that wraps models and returns
compiled ones.
▶ It is 100% backward compatible and features technologies like TorchInductor
with Nvidia and AMD GPUs, Accelerated Transformers, Metal Performance
Shaders backend, and more.
▶ Metal Performance Shaders (MPS) backend provides GPU accelerated PyTorch
training on Mac platforms with added support for Top 60 most used ops,
bringing coverage to over 300 operators.
▶ Amazon AWS also added optimizations for PyTorch CPU inference on their
Graviton3-based C7g instances.
▶ Lastly, there are new features and technologies like TensorParallel, DTensor, 2D
parallel, TorchDynamo, AOTAutograd, PrimTorch, and TorchInductor.
PyTorch 2.0 (since March 2023)
Andrey Ustyuzhanin
https://bit.ly/3KgcsYE
26
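A minimal sketch of the torch.compile API (requires PyTorch ≥ 2.0):

```python
import torch

model = torch.nn.Linear(10, 2)
compiled = torch.compile(model)       # returns an optimized, drop-in wrapper

out = compiled(torch.randn(4, 10))    # used exactly like the original model
```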
▶ The lightweight PyTorch wrapper for high-performance AI research.
PyTorch Lightning
Andrey Ustyuzhanin https://github.com/PyTorchLightning/pytorch-lightning 27
https://github.com/PyTorchLightning/pytorch-lightning
Andrey Ustyuzhanin 28
▶ PyTorch Lightning
▶ PyTorch Geometric – graph neural nets
▶ Hydra – parametrization and experiments
▶ Horovod – distributed training
▶ Skorch – scikit-learn compatible neural network library
▶ Captum – model interpretation
▶ Ray – accelerating ML workloads
▶ And many others, see https://pytorch.org/ecosystem/
Ecosystem
Andrey Ustyuzhanin 29
PyTorch wrap-up
▶ PyTorch is a solid, flexible, production-ready
foundation for real-life deep-learning applications
▶ Building blocks:
– Tensors
– Functions
▶ Dynamic graph automatic differentiation
– CPU, GPU, TPU
▶ Rich ecosystem
Andrey Ustyuzhanin 30
Convolutional Neural Networks
Andrey Ustyuzhanin
Image Recognition
Andrey Ustyuzhanin
Banksy
RGB image is a 3D array
[HxWx3]
or [3xHxW]
32
General idea
Andrey Ustyuzhanin
P(”Kid”)
33
Problem with images
Slide from Andrey Zimovnov's “Deep learning for computer vision”
Your network will have to learn those two cases separately!
Worst case: one neuron per position.
👶
👶
Main idea: let’s encode ”kid” by a
weight tensor that we can shift
across the image.
Andrey Ustyuzhanin 35
Same features for each spot
Weights 𝑤ij
Portable kid detector pro!
Slide the same detector across the image; at each position it answers:
No kid … No kid … Kid! 👶 … No kid
There's a kid on the pic!
Convolution
● Apply same weights to all patches
● 1D example: slide the same weights across the image; each output element is the dot product of the weights with one patch
[figure: image, weights, output]
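A sketch of the 1D example with F.conv1d; the numbers are illustrative:

```python
import torch
import torch.nn.functional as F

# The same 3-tap weights are applied to every patch of the input.
image = torch.tensor([[[1., 2., 3., 4., 5., 6.]]])  # (batch, channels, width)
weights = torch.tensor([[[1., 0., -1.]]])           # (out_ch, in_ch, kernel)

output = F.conv1d(image, weights)  # width 6 - 3 + 1 = 4
print(output)                      # tensor([[[-2., -2., -2., -2.]]])
```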
Convolution, 2D
apply one “filter” to all patches
Convolution
Intuition: how kid-like is this square?
[figure: a 5x5 input convolved with a 3x3 kernel yields a 3x3 output (5 − 3 + 1)]
Example of convolution
Intuition: how edge-like is this square?
▶ Pad the edges with extra, “fake” pixels
(usually of value 0, hence the oft-used term
“zero padding”). This way the sliding kernel
can place the original edge pixels at its
center while extending into the fake pixels
beyond the edge, producing an output
the same size as the input;
▶ padding_mode (string, optional) – 'zeros',
'reflect', 'replicate' or 'circular'. Default: 'zeros'
Convolution tricks: padding
Andrey Ustyuzhanin 45
▶ The idea of the stride is to skip some of the
slide locations of the kernel (see figure on the right)
▶ Dilated convolution is a basic convolution
applied to the input volume with defined gaps
(see below, and the shape sketch after this slide).
You may use dilated convolution when:
– you are working with higher-resolution images,
but fine-grained details are still important
– you are constructing a network with fewer parameters
Convolution tricks: striding and dilation
Andrey Ustyuzhanin 46
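As referenced above, a shape sketch of padding, stride and dilation (sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)

same    = nn.Conv2d(3, 8, kernel_size=3, padding=1)             # 32 -> 32
strided = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)   # 32 -> 16
dilated = nn.Conv2d(3, 8, kernel_size=3, dilation=2)            # 32 -> 28

print(same(x).shape, strided(x).shape, dilated(x).shape)
```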
Convolution Demo
Andrey Ustyuzhanin
▶ <Demo>
47
▶ Intuition: learns the kernel weights by gradient descent
▶ May learn an arbitrary number of kernels, one per output channel
▶ The output value of the layer with input size (N, C_in, H, W) and
output (N, C_out, H_out, W_out) is
out(N_i, C_out_j) = bias(C_out_j) + Σ_{k=0}^{C_in−1} weight(C_out_j, k) ⋆ input(N_i, k),
where N is the batch size, C the number of channels, and ⋆ the 2D cross-correlation operator (see the Conv2d docs linked below).
▶ Number of learnable parameters: C_out * (C_in * size(kernel) + 1)
▶ https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html
Convolution Layer
Andrey Ustyuzhanin 48
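A quick check of the parameter-count formula (channel counts are illustrative):

```python
import torch.nn as nn

# Verify C_out * (C_in * size(kernel) + 1) for a 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, bias=True)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)               # 448
print(16 * (3 * 3 * 3 + 1))   # 448, matches the formula
```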
Receptive field
We can recognize larger objects by stacking
several small convolutions!
[figure: two stacked 3x3 convs; the first has a 3x3 receptive field, the second sees 5x5 of the input]
Receptive field
[figure: a stack of 1x3 conv layers; the receptive field over the inputs grows from 1x3 after the 1st layer to 1x5, 1x7 and 1x9 after the 2nd, 3rd and 4th]
Q: how many 3x3 convolutions should we use to recognize a 100x100px kid?
A: around 50... we need to increase the receptive field faster!
Kernel size choice heuristic
- Odd number per dimension (the kernel should have a center)
- Is the input image larger than 128x128?
- Use 5x5 or 7x7 kernels first
- and quickly reduce the spatial dimensions, then start
working with 3×3 kernels
- Otherwise, consider sticking with
1×1 and 3×3 filters.
Pooling
Intuition: What is the highest kid-ness
over this area?
Pooling
Popular types:
● Max
● Mean (average)
No params to train!
Motivation:
● Reduce layer size by a factor
● Make NN less sensitive to small
image shifts
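A minimal sketch of both pooling types (shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)

# No learnable parameters; the spatial size shrinks by the pooling factor.
print(nn.MaxPool2d(2)(x).shape)   # torch.Size([1, 8, 16, 16])
print(nn.AvgPool2d(2)(x).shape)   # torch.Size([1, 8, 16, 16])
```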
Pooling Demo
Andrey Ustyuzhanin
▶ <Demo>
55
▶ How large is the receptive field of the following module:
– Conv1d(3x1, stride=1)
– MaxPool1d(2x1, stride=2)
▶ Answer:
▶ How many parameters should such a network learn:
– Input (100x100x3)
– Conv2d(3x3, 2 out channels, bias=True)
– MaxPool2d(3x3, stride=3, dilation=1)
▶ Answer:
Self-check
Andrey Ustyuzhanin 56
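One way to verify your parameter count without printing the answer here: build the layers and count their parameters (a sketch; the receptive-field question should still be derived by hand):

```python
import torch.nn as nn

# Pooling layers contribute no learnable parameters.
net = nn.Sequential(
    nn.Conv2d(3, 2, kernel_size=3, bias=True),
    nn.MaxPool2d(kernel_size=3, stride=3, dilation=1),
)
print(sum(p.numel() for p in net.parameters()))
```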
Convolutional features stacking
https://arxiv.org/pdf/1409.4842.pdf
Andrey Ustyuzhanin 57
Computer vision example: GoogLeNet
Andrey Ustyuzhanin https://distill.pub/2017/feature-visualization/appendix/
https://arxiv.org/pdf/1409.4842.pdf
58
▶ MemoryError(0x...)
▶ Gradients can vanish
▶ Gradients can explode
▶ Activations can vanish
▶ Activations can explode
Possible solutions:
▶ Use pre-trained networks
▶ Different normalization techniques
- BatchNorm, LayerNorm, InstanceNorm, LocalResponseNorm
Possible problems with large networks
Andrey Ustyuzhanin 59
Combining networks
Andrey Ustyuzhanin 60
General
▶ Image tagging, captioning,
retrieval, morphing, encoding,
upscaling
▶ Video processing, interpolation
▶ 3D point-clouds
HEP examples
▶ Jet tagging
▶ Particle tracking
▶ Calorimeter image analysis
▶ Cherenkov detector image
analysis
Other image-related tasks and libraries
Andrey Ustyuzhanin
▶ Torchvision,
▶ Kornia, Computer Vision
Library for PyTorch
▶ MMF, a modular
framework for vision &
language multimodal
research from Facebook
AI Research (FAIR),
▶ PyTorch3D,
https://pytorch3d.org/
61
▶ Dimensionality reduction using 1x1 convolutions,
https://machinelearningmastery.com/introduction-to-1x1-convolutions-to-
reduce-the-complexity-of-convolutional-neural-networks/
▶ Sparse convolutions, https://github.com/facebookresearch/SparseConvNet
▶ Gauge Equivariant Convolutional Networks,
https://arxiv.org/pdf/1902.04615.pdf
▶ Dilated convolutions, http://www.erogol.com/dilated-convolution/
▶ Receptive field guide https://medium.com/mlreview/a-guide-to-receptive-
field-arithmetic-for-convolutional-neural-networks-e0f514068807
Moar info
Andrey Ustyuzhanin 62
▶ Images are first-class citizens in the deep learning world;
▶ Images are high-dimensional objects that demand powerful techniques for
efficient processing:
– Convolution;
– Max-pooling;
This effectively makes neural networks capable of dealing with certain kinds of
symmetries (translational, scale). For rotational and other kinds of
symmetries, see Steerable CNNs (arXiv:1612.08498v1).
▶ Architecture selection:
– pre-trained models;
– Neural architecture search (see module on optimization);
▶ Further tasks: segmentation, object detection, saliency map.
Convolutions wrap-up
Andrey Ustyuzhanin 63
Sequence modeling
Andrey Ustyuzhanin
Sequential data
Represent:
▶ Time series:
– Financial data, stock market
– Healthcare: pulse rate, sugar level
▶ Text and speech
▶ Spatiotemporal data:
– Self-driving and object tracking
– Plate tectonic activity
▶ Physics:
– Accelerator and detector data flows,
anomaly detection
– Tracking
▶ etc.
Andrey Ustyuzhanin 65
▶ Problem statement:
– x = {x_1, x_2, . . . , x_n} - sequence of objects, x_i ∈ V
– y ∈ {1 . . . C} - label (C is the number of classes)
– {(x^1, y^1), (x^2, y^2), . . . , (x^m, y^m)} - training data of size m
– Aim: predict y given x
▶ Examples:
– Medicine: x - pulse rate, y - health (healthy, unhealthy)
– Opinion mining: x - sentence, y - sentiment (positive, negative)
– Physics: ghost track filtering. x - sequence of hits for some track, y - class (ghost or not)
1. Sequence classification
Andrey Ustyuzhanin 66
Sequence classification. Ghost tracks filtering
Figure: link
Problem: which set of hits forms a true
track, and which is just a random
track-like set of hits (a ghost)?
Andrey Ustyuzhanin 67
Problem statement:
▶ x = {x_1, x_2, . . . , x_n} - sequence of objects, x_i ∈ V
▶ y = {y_1, y_2, . . . , y_n} - sequence of labels, y_i ∈ {1 . . . C} (C is the number of classes)
▶ {(x^1, y^1), (x^2, y^2), . . . , (x^m, y^m)} - training data of size m
▶ Aim: predict y given x
Examples:
▶ Genome annotation: x - DNA, y - genes
▶ Hardware: x - system logs, y - system state at each time
▶ Physics: track reconstruction. x - set of hits, y - set of tracks (each hit is assigned to
some track)
2. Sequence labelling
Andrey Ustyuzhanin 68
Sequence labelling. Track reconstruction
example
Figure:
http://old.inspirehep.net/record/1729722/plots
Andrey Ustyuzhanin 69
▶ Problem statement:
– x = {x_1, x_2, . . . , x_n} - source sequence of objects, x_i ∈ V_source
– y = {y_1, y_2, . . . , y_n} - target sequence of objects, y_i ∈ V_target
– {(x^1, y^1), (x^2, y^2), . . . , (x^m, y^m)} - training data of size m
– Aim: fit a transformation from x to y
▶ Examples:
– Speech to text: x is speech, y is text
– Machine translation: x are sentences in English, y are sentences in German
– Logs compression: x are raw logs, y are compressed logs
3. Sequence to sequence transformation
Andrey Ustyuzhanin 70
Sequence transformation. Example
Figure:
https://buomsoo-kim.github.io/attention/2020/01/12/Attention-mechanism-3.md/
Andrey Ustyuzhanin 71
Recurrent neural networks
Andrey Ustyuzhanin 72
Problem
Question: How to perform element-wise processing of sequential data while effectively
preserving its context?
Answer: Let’s keep all the context in a separate term (the hidden state).
Figure: towardsdatascience.com 73 / 48
Recurrent neural networks (RNN)
▶ Input: (x_i, h_i), from the sequence of input vectors x = {x_1, x_2, . . . , x_n} and hidden states
(context) h = {0, h_2, . . . , h_{n−1}}, with x_i ∈ R^{d_in}, h_i ∈ R^{d_hid}
▶ Output: (y_i, h_{i+1}), from the sequence of output vectors y = {y_1, y_2, . . . , y_n} and next
hidden states (context) h = {h_2, h_3, . . . , h_n}, with y_i ∈ R^{d_out}, h_i ∈ R^{d_hid}
74 / 48
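As a concrete sketch, the classic recurrent update h_i = tanh(W_x x_i + W_h h_{i−1} + b), y_i = W_y h_i, written by hand with illustrative dimensions:

```python
import torch

d_in, d_hid, d_out, n = 4, 8, 3, 10
Wx = torch.randn(d_hid, d_in)
Wh = torch.randn(d_hid, d_hid)
Wy = torch.randn(d_out, d_hid)
b = torch.zeros(d_hid)

x = torch.randn(n, d_in)
h = torch.zeros(d_hid)          # h_1 = 0: empty context

ys = []
for x_i in x:                   # reuse the same weights at every step
    h = torch.tanh(Wx @ x_i + Wh @ h + b)
    ys.append(Wy @ h)
```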
Stacked RNN
Figure: Goldberg, Yoav. Neural network methods for natural language processing
Andrey Ustyuzhanin 75
RNN in sequence modeling
▶ Sequence classification - x = {x_1, x_2, . . . , x_n} → y_n, where y_n is the final prediction and x is the input sequence
▶ Sequence labelling - x = {x_1, x_2, . . . , x_n} → y = {y_1, y_2, . . . , y_n}, where x is the input sequence and y is the labeled sequence
▶ Sequence transformation - x = {x_1, x_2, . . . , x_n} → y = {y_1, y_2, . . . , y_n}, where x is the input sequence and y is the output sequence
Andrey Ustyuzhanin 76
Advantages:
▶ Preserves sequence context
▶ Good performance and flexibility
▶ Effectively handles longer sequences (longer memory)
▶ Handles sequences with variable size
▶ Model complexity does not depend on sequence length
Disadvantages:
▶ Moves through the sequence in one direction only (some sequences must be processed in
both directions)
▶ Still has short memory due to the vanishing gradients problem
▶ Exploding gradients due to cycles in the recurrent connections
RNN vs. Fully-connected. Advantages and disadvantages
Andrey Ustyuzhanin 77
LSTM
Andrey Ustyuzhanin 78
LSTM. Idea
Idea: regulate the flow of context by special gates (auxiliary neural networks) and split the
memory (hidden state) into two separate terms: short-term h and long-term c
Andrey Ustyuzhanin 79
LSTM vs. RNN
Figure 1: Basic RNN Figure 2: LSTM
Figures: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Andrey Ustyuzhanin 80
LSTM inside
Figure 3: Long-memory update Figure 4: Short-memory and output update
Figures: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Andrey Ustyuzhanin 81
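A minimal sketch of nn.LSTM, showing the two memory streams h and c (sizes are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

x = torch.randn(2, 10, 4)   # (batch, sequence length, features)
out, (h, c) = lstm(x)       # out: (2, 10, 8); h, c: (1, 2, 8)
```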
Advantages:
▶ Preserves sequence context
▶ Good performance and flexibility
Disadvantages:
▶ Moves through the sequence in one direction only (some sequences must be processed in
both directions)
▶ Short memory due to the vanishing gradients problem
▶ Exploding gradients due to cycles in the recurrent connections
▶ Slower and heavier than a plain RNN
LSTM vs. RNN. Advantages and disadvantages
Andrey Ustyuzhanin 82
GRU
Andrey Ustyuzhanin 83
GRU
Idea: combine the long-term and short-term updates in a single channel
Andrey Ustyuzhanin 84
Advantages:
▶ All context stored in single hidden state
▶ Lower model complexity
▶ Faster
Disadvantages:
▶ Shorter memory
GRU vs. LSTM. Advantages and
disadvantages
Andrey Ustyuzhanin 85
Probabilistic sequence
modeling
Andrey Ustyuzhanin 86
Probabilistic sequence modeling
How to fit p(x_i | x_{1:i−1}) and generate a sequence x_i ~ p(x_i | x_{1:i−1})?
Andrey Ustyuzhanin 87
Probabilistic sequence modeling using RNN
Andrey Ustyuzhanin 88
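A hedged sketch of the autoregressive pattern: the RNN emits a distribution over the next token, we sample and feed the sample back (a toy character-level model; all sizes are illustrative):

```python
import torch
import torch.nn as nn

vocab, d_hid = 50, 64
embed = nn.Embedding(vocab, 16)
rnn = nn.GRU(16, d_hid, batch_first=True)
head = nn.Linear(d_hid, vocab)          # logits over the next token

token = torch.tensor([[0]])             # start token (illustrative)
h = None
generated = []
for _ in range(20):
    out, h = rnn(embed(token), h)       # one step with carried context
    probs = torch.softmax(head(out[:, -1]), dim=-1)
    token = torch.multinomial(probs, 1) # sample x_i ~ p(x_i | x_1..x_{i-1})
    generated.append(token.item())
```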
Probabilistic sequence modeling using RNN
▶ Examples of generated texts:
“The surprised in investors weren’t going to raise money. I’m not the company with the
time there are all interesting quickly, don’t have to get off the same programmers...”
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
▶ Examples of generated MIDI music: https://towardsdatascience.com/how-to-
generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5
Andrey Ustyuzhanin 89
Pros and cons of RNNs
Advantages:
▶ RNNs are popular and successful for variable-length sequences
▶ The gating models such as LSTM are suited for long-range error propagation
Problems:
▶ No transfer learning options
▶ The sequentiality prohibits parallelization within instances
▶ Long-range dependencies still tricky, despite gating (long-term dependencies are
actually not so long due to gradient vanishing)
Andrey Ustyuzhanin 90
▶ Every dataset exhibits certain kinds of symmetry, which need to be
recognized by a learner (neural network, NN) in order to generalize
successfully;
▶ An NN architecture encodes our assumptions about data symmetry as a
data transformation pipeline:
– Images: translation, scale -> convolutions;
– Sequences: context -> RNNs, LSTMs;
▶ PyTorch provides a flexible toolbox for writing such pipelines (almost)
effortlessly;
▶ The rich PyTorch ecosystem spans widely.
Conclusion
Andrey Ustyuzhanin 91
Thank you!
andrey.ustyuzhanin@constructor.org
anaderiRu
Andrey Ustyuzhanin
Andrey Ustyuzhanin 92
▶ https://pytorch.org/docs/stable/index.html
▶ https://pytorch.org/tutorials/beginner/ptcheat.html
▶ http://neuralnetworksanddeeplearning.com/chap2.html
▶ https://www.khanacademy.org/math/differential-calculus/dc-chain
▶ https://blog.paperspace.com/pytorch-101-understanding-graphs-and-
automatic-differentiation/
More on PyTorch
Andrey Ustyuzhanin 94
Bi-RNN
Andrey Ustyuzhanin 95
Bi-RNN
Figure: Goldberg, Yoav. Neural network methods for natural language processing
Andrey Ustyuzhanin 96
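A minimal sketch of a bidirectional recurrent layer in PyTorch (sizes are illustrative):

```python
import torch
import torch.nn as nn

# Two passes over the sequence, one per direction, with their hidden
# states concatenated at every step.
birnn = nn.LSTM(input_size=4, hidden_size=8, batch_first=True,
                bidirectional=True)

x = torch.randn(2, 10, 4)
out, _ = birnn(x)   # out: (2, 10, 16) = forward 8 + backward 8
```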
Advantages:
▶ Preserves sequence context
▶ Good performance and flexibility
Disadvantages:
▶ Moves through the sequence in one direction only (some sequences must be processed in
both directions; for instance, a word's meaning depends both on the previous and the
following context)
▶ Short memory due to the vanishing gradients problem
▶ Exploding gradients due to cycles in the recurrent connections
Bi-RNN. Advantages and disadvantages
Andrey Ustyuzhanin 97