8.
Max Pooling and Average Pooling Layers
Roozbeh Sanaei
Yani, Muhamad. "Application of transfer learning using convolutional neural network method for early detection of terry’s nail." Journal
of Physics: Conference Series. Vol. 1201. No. 1. IOP Publishing, 2019.
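The two pooling operations named in this slide's title can be illustrated with a minimal sketch. The 4x4 input, the 2x2 non-overlapping window, and the helper names `pool2d` and `mean` are assumptions chosen for illustration; they do not come from the slide.

```python
def pool2d(image, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2-D list of numbers."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Illustrative 4x4 feature map (values are made up).
image = [
    [1, 3, 2, 1],
    [2, 9, 1, 1],
    [1, 3, 2, 3],
    [5, 6, 1, 2],
]

max_pooled = pool2d(image, 2, max)   # keeps the strongest response per window
avg_pooled = pool2d(image, 2, mean)  # keeps the average response per window
print(max_pooled)
print(avg_pooled)
```

Both variants shrink the 4x4 map to 2x2 and have no learnable parameters; they differ only in the function applied to each window.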
9.
Why Convolutions?
Source: Coursera Deep Learning Specialization, Convolutional Neural Networks
Parameter sharing: a feature detector (such as a vertical edge detector)
that is useful in one part of the image is probably useful in another part
of the image.
Sparsity of connections: in each layer, each output value depends on only
a small number of inputs.
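To make the two points above concrete, the following rough sketch compares parameter counts for a fully connected layer and a convolutional layer that produce the same output volume. The sizes (a 32x32x3 input, six 5x5 filters, stride 1, no padding) and the helper names are assumptions picked for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel, in_channels, n_filters):
    """Weights plus one bias per filter of a square-kernel convolution."""
    return (kernel * kernel * in_channels + 1) * n_filters


# Assumed sizes: 32x32x3 input, six 5x5 filters, giving a 28x28x6 output.
n_in = 32 * 32 * 3
n_out = 28 * 28 * 6

fc_count = dense_params(n_in, n_out)
conv_count = conv_params(5, 3, 6)

# Sparsity of connections: inputs that influence a single output value.
inputs_per_output_fc = n_in
inputs_per_output_conv = 5 * 5 * 3

print(fc_count, conv_count)
print(inputs_per_output_fc, inputs_per_output_conv)
```

Under these assumptions the convolution reuses the same few hundred weights at every spatial position (parameter sharing), and each output looks at 75 inputs instead of all 3072 (sparsity of connections).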
10.
LeNet-5
Yani, Muhamad. "Application of transfer learning using convolutional neural network method for early detection of
terry’s nail." Journal of Physics: Conference Series. Vol. 1201. No. 1. IOP Publishing, 2019.
31.
RCNN
1. Pre-train the CNN on an image classification task.
2. Propose RoI candidates through selective search (class-independent).
3. Warp the RoI candidates to the input size the CNN requires.
4. Fine-tune the CNN on the warped RoIs for K + 1 classes (the additional class is background).
5. Feed the CNN-generated features to a binary SVM for each class.
6. Train a regression model to correct the predicted window.
32.
Selective Search
Generate region proposals that may contain objects:
1. Create the initial regions through image segmentation.
2. Iteratively, until the whole image becomes a single region:
   1. Calculate the similarities between all neighboring regions.
   2. Merge the two most similar regions into one.
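The greedy grouping loop of selective search can be sketched as follows. This is a toy version under stated assumptions: each region is reduced to a `(size, mean_intensity)` pair, and similarity is the negative difference in mean intensity, which stands in for the color, texture, size, and fill measures of the real algorithm. The function name `merge_regions` and the data layout are hypothetical.

```python
def merge_regions(regions, neighbors):
    """Greedy hierarchical grouping of regions.

    regions:   dict id -> (size, mean_intensity)
    neighbors: set of frozenset({id_a, id_b}) for adjacent regions
    Returns every region seen along the way, initial regions included.
    """
    regions = dict(regions)
    neighbors = set(neighbors)
    proposals = list(regions.values())
    next_id = max(regions) + 1

    def similarity(pair):
        a, b = tuple(pair)
        # Stand-in similarity: closer mean intensity means more similar.
        return -abs(regions[a][1] - regions[b][1])

    while neighbors:
        best = max(neighbors, key=similarity)
        a, b = tuple(best)
        size = regions[a][0] + regions[b][0]
        mean = (regions[a][0] * regions[a][1] + regions[b][0] * regions[b][1]) / size
        regions[next_id] = (size, mean)

        # Point the neighbors of a and b at the merged region instead.
        rewired = set()
        for pair in neighbors:
            if pair == best:
                continue
            rewired.add(frozenset(next_id if r in best else r for r in pair))
        neighbors = rewired

        del regions[a], regions[b]
        proposals.append(regions[next_id])
        next_id += 1
    return proposals


# Four initial regions in a chain 0-1-2-3 (made-up sizes and intensities).
initial = {0: (4, 10.0), 1: (4, 12.0), 2: (4, 50.0), 3: (4, 53.0)}
adjacent = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
proposals = merge_regions(initial, adjacent)
print(proposals)
```

Every intermediate region is kept as a proposal, so the output covers several scales, from the initial segments up to the whole image.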
33.
Bounding Box Regression
A regression model learns correction functions that map the predicted bounding box coordinates to the ground truth box coordinates.
These correction functions are unbounded: each can take any value in (-∞, +∞).
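The slide's formulas did not survive extraction, so the sketch below assumes the parameterization commonly used for R-CNN style box regression: boxes are (center x, center y, width, height), the center shifts are scaled by the proposal size, and the size changes are expressed as log ratios. The function names and the example boxes are hypothetical.

```python
import math


def encode_offsets(p, g):
    """Regression targets that move proposal p toward ground truth g."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return (
        (gx - px) / pw,     # horizontal shift, in units of proposal width
        (gy - py) / ph,     # vertical shift, in units of proposal height
        math.log(gw / pw),  # log-scale change in width
        math.log(gh / ph),  # log-scale change in height
    )


def apply_offsets(p, d):
    """Correct proposal p with offsets d = (dx, dy, dw, dh)."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))


proposal = (50.0, 50.0, 20.0, 40.0)      # made-up predicted box
ground_truth = (60.0, 45.0, 40.0, 20.0)  # made-up ground truth box

targets = encode_offsets(proposal, ground_truth)
corrected = apply_offsets(proposal, targets)
print(targets)
print(corrected)
```

Because the shifts are ratios and the size changes are logarithms, each target can be any real number, positive or negative, which matches the statement that the correction functions are unbounded.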
34.
Hard Negative Mining
• Find the false positive samples during the training loops.
• Include them in the training data to improve the classifier.
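One round of hard negative mining might look like the sketch below. The scores, labels, the 0.5 threshold, and the function name `find_hard_negatives` are all assumptions for illustration; a real pipeline would obtain the scores from the current classifier and retrain after each round.

```python
def find_hard_negatives(scores, labels, threshold=0.5):
    """Indices of background samples (label 0) that the classifier scores as objects."""
    return [
        i
        for i, (score, label) in enumerate(zip(scores, labels))
        if label == 0 and score >= threshold
    ]


# Hypothetical classifier scores for five candidate windows.
# Label 1 means object, label 0 means background.
scores = [0.9, 0.2, 0.7, 0.4, 0.8]
labels = [1, 0, 0, 0, 0]

training_indices = [0, 1]  # samples used in the first training round
hard = find_hard_negatives(scores, labels)

# Add the false positives so that the next training round sees them.
training_indices += [i for i in hard if i not in training_indices]
print(hard, training_indices)
```

Only the confidently misclassified background windows are added, which keeps the training set small while focusing it on the examples the classifier currently gets wrong.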
35.
Fast RCNN
Instead of extracting a separate CNN feature vector for each region proposal, the computation is aggregated into one CNN forward pass over the entire image, so the region proposals share the resulting feature map. The pre-trained CNN is altered as follows:
• The last max pooling layer of the pre-trained CNN is replaced with an RoI pooling layer.
• The last fully connected layer and the last softmax layer (K classes) are replaced with a fully connected layer and a softmax over K + 1 classes.
• The last layer also branches out to a bounding-box regression model, which predicts offsets relative to the original RoI for each of the K classes.
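The RoI pooling layer mentioned above turns a region of any size on the shared feature map into a fixed-size grid by max pooling within bins. The following is a minimal single-channel sketch; the bin boundaries (floor for the start, ceiling for the end), the `(row0, col0, row1, col1)` RoI format with exclusive end indices, and the function name are assumptions.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (row0, col0, row1, col1) into an out_h x out_w grid."""
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; neighboring bins may overlap by one row.
        rs = r0 + math.floor(i * h / out_h)
        re = r0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            cs = c0 + math.floor(j * w / out_w)
            ce = c0 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c] for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# Made-up 4x6 feature map with values 1..24.
feature_map = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]

small = roi_max_pool(feature_map, (0, 1, 3, 6), 2, 2)  # a 3x5 region
whole = roi_max_pool(feature_map, (0, 0, 4, 6), 2, 2)  # the full 4x6 map
print(small)
print(whole)
```

Both regions come out as 2x2 grids even though their sizes differ, which is what lets the following fully connected layers accept every proposal.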
36.
RCNN Loss Function
The loss function sums the costs of classification and bounding box prediction; the bounding box term is ignored for the “background” class.
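A loss of this shape can be sketched as below. The specific choices are assumptions: class 0 is treated as background, the classification cost is the negative log probability of the true class, the box cost is a smooth L1 penalty as used in Fast R-CNN, and `lam` is a hypothetical weight balancing the two terms.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def detection_loss(class_probs, true_class, pred_offsets, target_offsets, lam=1.0):
    """Classification cost plus box cost; the box cost is skipped for background."""
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:  # assumed convention: class 0 is background
        return cls_loss
    box_loss = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + lam * box_loss


# Made-up predictions for one background RoI and one object RoI.
background = detection_loss([0.5, 0.25, 0.25], 0, (0.3, 0.1, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
foreground = detection_loss([0.25, 0.5, 0.25], 1, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 2.0))
print(background, foreground)
```

For the background RoI the predicted offsets have no effect on the result, since there is no ground truth box to regress toward.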
44.
Feature Pyramid
The reconstructed layers are semantically strong, but their localization is inaccurate due to down-sampling and up-sampling.
Lateral connections between the reconstructed layers and the corresponding feature maps improve localization.
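One top-down step with a lateral connection can be sketched as follows, assuming single-channel maps, nearest-neighbor up-sampling by a factor of two, and element-wise addition. The 1x1 convolution that a feature pyramid network applies to the lateral map and the 3x3 convolution applied after merging are omitted to keep the sketch short; the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor up-sampling: repeat every value twice along both axes."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge_with_lateral(top_down, lateral):
    """Up-sample the coarse top-down map and add the finer lateral map to it."""
    up = upsample2x(top_down)
    return [[u + l for u, l in zip(up_row, lat_row)] for up_row, lat_row in zip(up, lateral)]


# Made-up maps: a coarse 2x2 reconstructed layer and a 4x4 bottom-up feature map.
coarse = [[1, 2], [3, 4]]
lateral = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

merged = merge_with_lateral(coarse, lateral)
print(merged)
```

The up-sampled map alone is blocky, because each coarse value is copied into a 2x2 patch; adding the lateral map restores the finer spatial detail that was lost in down-sampling.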