https://telecombcn-dl.github.io/2018-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
18. Reminder: Convolutional Layer
Slide Credit: CS231n
Typical 3 x 3 convolution, stride 2 pad 1
Input: 4 x 4 Output: 2 x 2
Dot product between filter and input
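The sizes on this slide can be checked with a small sketch. It assumes a square single-channel input, a square filter and zero padding; the function names `conv_output_size` and `conv2d` are illustrative and do not come from the slides.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Output size of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation) on square nested lists."""
    n, k = len(image), len(kernel)
    size = n + 2 * padding
    # Zero-pad the input on every side.
    padded = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + padding][c + padding] = image[r][c]
    out_size = conv_output_size(n, k, stride, padding)
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Each output value is a dot product between the filter
            # and the input window it currently covers.
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += kernel[u][v] * padded[i * stride + u][j * stride + v]
            row.append(acc)
        out.append(row)
    return out


ones_4x4 = [[1.0] * 4 for _ in range(4)]
ones_3x3 = [[1.0] * 3 for _ in range(3)]
result = conv2d(ones_4x4, ones_3x3, stride=2, padding=1)
print(conv_output_size(4, 3, 2, 1))  # 2
print(result)
```

With stride 2 and pad 1 the 4 x 4 input is reduced to 2 x 2, as stated on the slide.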
20. Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
21. Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
Input gives weight for filter values
22. Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
Input gives weight for filter values
Sum where output overlaps
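The two steps above (the input weights the filter, overlaps are summed) can be sketched in plain Python. This is a minimal illustration, not the CS231n implementation: the name `conv_transpose2d` is hypothetical, and the 4 x 4 output is obtained here by assuming one extra row and column is kept at the bottom and right (comparable to the `output_padding` option of common frameworks), since symmetric cropping with pad 1 would give 3 x 3.

```python
def conv_transpose2d(x, kernel, stride, padding, output_padding=0):
    """Naive transposed convolution on square nested lists (sketch)."""
    # Assumption of this sketch: the extra rows fit inside the cropped border.
    assert output_padding <= padding
    n, k = len(x), len(kernel)
    full = (n - 1) * stride + k
    canvas = [[0.0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            for u in range(k):
                for v in range(k):
                    # The input value weights a copy of the filter;
                    # copies are summed where they overlap in the output.
                    canvas[i * stride + u][j * stride + v] += x[i][j] * kernel[u][v]
    out_size = (n - 1) * stride - 2 * padding + k + output_padding
    # Crop `padding` rows/columns at the top/left and
    # `padding - output_padding` at the bottom/right.
    return [row[padding:padding + out_size]
            for row in canvas[padding:padding + out_size]]


x = [[1.0, 2.0], [3.0, 4.0]]
ones_3x3 = [[1.0] * 3 for _ in range(3)]
up = conv_transpose2d(x, ones_3x3, stride=2, padding=1, output_padding=1)
for row in up:
    print(row)
```

The value 10.0 in the result is the position where all four weighted filter copies overlap.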
23. Learnable Upsample: Transposed Convolution
Warning: Checkerboard effect when kernel size is not divisible by the stride
Source: distill.pub
24. Learnable Upsample: Transposed Convolution
Source: distill.pub
stride = 2, kernel_size = 3
Warning: Checkerboard effect when kernel size is not divisible by the stride
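One way to see where the checkerboard comes from is to count, in 1D, how many filter copies land on each output position. The helper `overlap_counts` below is a hypothetical illustration and is not taken from distill.pub.

```python
def overlap_counts(input_len, kernel_size, stride):
    """How many filter copies contribute to each output position (1D)."""
    out_len = (input_len - 1) * stride + kernel_size
    counts = [0] * out_len
    for i in range(input_len):
        for u in range(kernel_size):
            counts[i * stride + u] += 1
    return counts


uneven = overlap_counts(4, kernel_size=3, stride=2)
even = overlap_counts(4, kernel_size=4, stride=2)
print(uneven)  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(even)    # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

With kernel_size = 3 and stride = 2 the interior alternates between one and two contributions, which is the uneven overlap behind the checkerboard pattern. With kernel_size = 4 every interior position receives the same number of contributions.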
27. Semantic Segmentation
CNN Coarse output
Problem 2: High-level features (e.g. conv5 layer) from a pretrained classification network are the input for the segmentation branch
28. Skip Connections
Slide Credit: CS231n
Skip connections = Better results
Recovering low-level features from early layers
Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
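A minimal sketch of the fusion idea: a coarse map from a deep layer is upsampled and summed with a finer map from an earlier layer. Nearest-neighbour upsampling is used here only to keep the example short and is an assumption of this sketch; Long et al. use learned upsampling. The names `upsample_nearest` and `fuse_skip` are hypothetical.

```python
def upsample_nearest(feature_map, factor):
    """Repeat every value `factor` times along both spatial axes."""
    return [[v for v in row for _ in range(factor)]
            for row in feature_map for _ in range(factor)]


def fuse_skip(coarse, fine, factor=2):
    """Upsample the coarse map and add the finer map element-wise."""
    up = upsample_nearest(coarse, factor)
    return [[a + b for a, b in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2], [3, 4]]        # e.g. scores from a deep layer
fine = [[0, 1, 0, 1],            # e.g. scores from an earlier layer
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
fused = fuse_skip(coarse, fine)
for row in fused:
    print(row)
```

The fused map keeps the coarse layout of the deep layer while recovering the finer detail carried by the early layer.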
29. Dilated Convolutions
Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016
Structural change in convolutional layers for dense prediction problems (e.g. image segmentation)
● The receptive field grows exponentially as you add more layers (when the dilation factor doubles at each layer) → more context information in deeper layers than with regular convolutions
● Number of parameters increases linearly as you add more layers
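The two bullets can be checked with a short calculation. The sketch assumes stride 1, a single channel, 3 x 3 kernels without bias and dilation factors 1, 2, 4, 8; the function names are illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (one side) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A kernel with dilation d spans (kernel_size - 1) * d + 1 inputs.
        rf += (kernel_size - 1) * d
    return rf


def parameter_count(kernel_size, num_layers):
    """Weights of stacked single-channel layers, ignoring biases."""
    return num_layers * kernel_size * kernel_size


for depth in range(1, 5):
    dilated = receptive_field(3, [2 ** i for i in range(depth)])
    regular = receptive_field(3, [1] * depth)
    print(depth, dilated, regular, parameter_count(3, depth))
```

For depths 1 to 4 the dilated stack covers 3, 7, 15 and 31 pixels per side, against 3, 5, 7 and 9 for regular convolutions, while the parameter count grows as 9, 18, 27, 36 in both cases.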