Comprehension
of deep-learning
- CNN from VGG to DenseNet
19.07.18 You Sung Min
1. Review of Deep learning
(Convolutional Neural Network)
2. Residual network (ResNet)
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings
of the IEEE conference on computer vision and pattern recognition. 2016.
3. Densely connected convolutional network
(DenseNet)
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of
the IEEE conference on computer vision and pattern recognition. 2017.
Contents
Structure of Neural Networks
 A simple model to emulate a single neuron
 This model produces a binary output
Review of Deep learning
output = 0 if ∑_j ω_j x_j ≤ T
         1 if ∑_j ω_j x_j > T
[Figure: inputs x_j weighted by ω_1, ω_2, ω_3 feed the weighted sum ∑_j ω_j x_j, which is compared against the threshold T; the perceptron (1958) as a model of a biological neuron]
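The threshold rule above can be sketched in a few lines of Python (a minimal illustration; the function name and the AND-gate example weights are assumptions, not from the slides):

```python
# Perceptron sketch: output 0 if the weighted input sum is at or below
# threshold T, output 1 otherwise (the binary rule on this slide).
def perceptron(inputs, weights, T):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 0 if s <= T else 1

# Example: two binary inputs with equal weights behave as an AND gate
# when the threshold is set to 1.5.
print(perceptron([1, 1], [1.0, 1.0], 1.5))  # -> 1
print(perceptron([1, 0], [1.0, 1.0], 1.5))  # -> 0
```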
Review of Deep learning
Multilayer Perceptron (MLP)
 A network model consisting of layers of perceptrons
 This model produces vectorized outputs
Multilayer Perceptron (MLP)
Review of Deep learning
Handwritten digit as a 28 by 28 pixel image
Binary input (intensity of a pixel)
Input layer: 784 (= 28 ∗ 28) units
Desired output for “5”
𝒚(𝒙) = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0)ᵀ
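The one-hot target above can be produced with a short helper (hypothetical name, assuming the usual 10 digit classes):

```python
# One-hot encoding sketch: a 10-way target vector with a single 1
# at the index of the desired digit.
def one_hot(digit, num_classes=10):
    y = [0] * num_classes
    y[digit] = 1
    return y

print(one_hot(5))  # -> [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```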
Convolutional Neural Network
 Convolution layer
 Subsampling (Pooling) layer
 Rectified Linear Unit (ReLU)
Review of Deep learning
Feature Extractor Classifier
Review of Deep learning
 Local receptive field (connectivity)
2D Convolution: a 5 by 5 kernel (window) slides over the 28 by 28 input,
producing a 24 by 24 feature map
1. Detect local information (features)
(e.g., edges, shapes)
2. Reduce connections between layers
• Fully connected network → 28 ∗ 28 ∗ 24 ∗ 24 connections
• Locally connected network → 5 ∗ 5 ∗ 24 ∗ 24 connections
[Figure: 5 by 5 kernel weights w11, w12, …, w55]
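The connection counts above can be checked with a few lines (assuming stride 1 and no padding, i.e. a "valid" convolution):

```python
# Output size of a valid convolution: each output pixel sees one k x k window.
def valid_output_size(n, k):
    return n - k + 1

out = valid_output_size(28, 5)           # 24
fully_connected = 28 * 28 * out * out    # every input unit to every output unit
locally_connected = 5 * 5 * out * out    # one 5x5 window per output unit
print(out, fully_connected, locally_connected)  # -> 24 451584 14400
```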
Review of Deep learning
 Shared weights
1. Detect same feature
in other positions
2. Reduce total number of
weights and bias
3. Construct multiple feature
maps (kernels)
output = σ(b + ∑_{l=0}^{4} ∑_{m=0}^{4} ω_{l,m} · a_{j+l, k+m})
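A minimal sketch of the shared-weight unit above, assuming σ is the logistic sigmoid (the slide does not specify which nonlinearity σ is):

```python
import math

# One output unit of a shared-weight convolution at position (j, k):
# the same 5x5 kernel w and bias b are applied at every position.
def conv_unit(a, w, b, j, k):
    s = b + sum(w[l][m] * a[j + l][k + m] for l in range(5) for m in range(5))
    return 1.0 / (1.0 + math.exp(-s))  # logistic sigmoid (assumption)

a = [[1.0] * 28 for _ in range(28)]  # toy input of ones
w = [[0.0] * 5 for _ in range(5)]    # all-zero kernel
print(conv_unit(a, w, 0.0, 0, 0))    # -> 0.5 (sigmoid of zero)
```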
Review of Deep learning
 Pooling layer
1. Simplify (condense)
information in the feature
map
2. Reduce connections
(weights and biases)
Max-pooling:
Output only the maximum activation in each pooling window
Conv. Pooling
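Max-pooling as described above can be sketched as follows (assuming a 2 by 2 window with stride 2, the common configuration):

```python
# Max-pooling sketch: keep only the maximum activation in each 2x2 window,
# halving the height and width of the feature map.
def max_pool_2x2(fmap):
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]

print(max_pool_2x2([[1, 3, 2, 0],
                    [4, 2, 1, 1],
                    [0, 0, 5, 6],
                    [1, 2, 7, 8]]))  # -> [[4, 2], [2, 8]]
```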
Convolutional Neural Network
Review of Deep learning
y = max(x,0)
Convolutional Neural Network
Review of Deep learning
Feature map
 Deeper neural networks are more difficult to train
 Vanishing (or exploding) gradient problem
 Degradation problem
Residual network (ResNet)
 Residual learning
 Desired underlying mapping: ℋ(𝒙)
 Nonlinear stacked-layer mapping: ℱ(𝒙) ≔ ℋ(𝒙) − 𝒙
 ∴ ℋ(𝒙) = ℱ(𝒙) + 𝒙
Residual network (ResNet)
Residual mapping
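The residual identity ℋ(x) = ℱ(x) + x can be sketched with a stand-in residual function (an assumption for illustration; in the paper ℱ is a stack of convolutional layers):

```python
# Residual block sketch: the stacked layers learn the residual F(x),
# and the identity shortcut adds the input x back to the output.
def residual_block(x, F):
    return F(x) + x

# If the residual function learns zero, the block is an identity mapping,
# which is why extra residual layers cannot easily hurt the network.
print(residual_block(3.0, lambda x: 0.0))      # -> 3.0
print(residual_block(2.0, lambda x: x * 0.5))  # -> 3.0
```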
Degradation problem
Residual network (ResNet)
 A: zero-padding shortcuts for increasing dimensions
 B: projection shortcuts for increasing dimensions; other shortcuts are identity
 C: all shortcuts are projections
 Dense block
 Short paths from early layers to later layers
Densely connected convolutional network
 Connect all layers (with matching feature-map size) directly
 Combine features by concatenation
 ∴ an L-layer block has L(L + 1)/2 connections
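The connection count can be verified directly: layer l receives inputs from all l preceding feature maps, so the total is 1 + 2 + … + L = L(L + 1)/2.

```python
# Number of direct connections in a dense block of L layers.
def dense_connections(L):
    return L * (L + 1) // 2

print(dense_connections(4))   # -> 10
print(dense_connections(12))  # -> 78
```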
 DenseNet
 ResNet output: 𝒙_𝒍 = 𝑯_𝒍(𝒙_{𝒍−𝟏}) + 𝒙_{𝒍−𝟏}
 DenseNet output: 𝒙_𝒍 = 𝑯_𝒍([𝒙_𝟎, 𝒙_𝟏, …, 𝒙_{𝒍−𝟏}])
where 𝑯_𝒍 is BN + ReLU + 3 × 3 convolution
Densely connected convolutional network
Transition layer: 1 by 1 Conv followed by 2 by 2 average pooling (AVP)
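The two update rules above can be contrasted on toy feature vectors (H_l here is a stand-in doubling function, an assumption for illustration only):

```python
# ResNet combines features by element-wise addition; shapes must match.
def resnet_step(H, x_prev):
    return [h + x for h, x in zip(H(x_prev), x_prev)]

# DenseNet combines features by concatenating all earlier outputs,
# then applying H_l to the concatenation.
def densenet_step(H, features):
    concatenated = [v for f in features for v in f]
    return H(concatenated)

double = lambda xs: [2 * v for v in xs]        # stand-in H_l (assumption)
print(resnet_step(double, [1, 2]))             # -> [3, 6]
print(densenet_step(double, [[1], [2], [3]]))  # -> [2, 4, 6]
```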
 Collective knowledge: very narrow layers (e.g., growth rate k = 12)
 Bottleneck layer: 1 by 1 conv before each 3 by 3 conv
 Compression: reduce the number of feature-maps at transition layers
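Compression at a transition layer can be sketched as follows (the DenseNet paper keeps ⌊θm⌋ of the m feature maps and uses θ = 0.5 for its compressed models):

```python
import math

# Compression sketch: a transition layer keeps floor(theta * m) of the
# m incoming feature maps, with 0 < theta <= 1.
def compressed_maps(m, theta=0.5):
    return int(math.floor(theta * m))

print(compressed_maps(256))  # -> 128
```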
Densely connected convolutional network
References
 Image source: https://deeplearning4j.org/convolutionalnets
 Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding
convolutional networks." European Conference on Computer Vision.
Springer International Publishing, 2014.
 Jia-Bin Huang, “Lecture 29 Convolutional Neural Networks”,
Computer Vision Spring 2015
 He, Kaiming, et al. "Deep residual learning for image
recognition." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016.
 Huang, Gao, et al. "Densely connected convolutional
networks." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2017.