Modern Convolutional Neural Network
techniques for image segmentation
Deep Learning Journal Club
Gioele Ciaparrone
Michele Curci
November 30, 2016
University of Salerno
Index
1. Introduction
2. The Inception architecture
3. Fully convolutional networks
4. Hypercolumns
5. Conclusion
Introduction
CNN recap
• Sequence of convolutional and pooling layers
• Rectifier activation function
• Fully connected layers at the end
• Softmax function for classification
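As a rough illustration of the two functions named in the recap, here is a minimal pure-Python sketch of the rectifier and of softmax. The input values are arbitrary examples, not taken from the slides.

```python
import math

def relu(x):
    """Rectifier: keep positive values, clamp negative ones to zero."""
    return max(0.0, x)

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

activations = [relu(v) for v in [-1.5, 0.0, 2.0]]
probs = softmax([2.0, 1.0, 0.1])  # the largest score gets the largest probability
```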
Convolution I
Convolution II
Valid padding (left) and same padding (right) convolutions
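The effect of the two padding modes on the output size can be sketched as follows. The "same" rule used here (output size = ceil(n / stride)) is the convention of some common libraries and is an assumption, since the slide does not state a formula.

```python
import math

def conv_output_size(n, k, stride=1, padding="valid"):
    """Output size along one axis for an n-wide input and a k-wide kernel."""
    if padding == "valid":
        # The kernel must fit entirely inside the input.
        return (n - k) // stride + 1
    if padding == "same":
        # Zeros are added around the input so that, at stride 1,
        # the output has the same size as the input.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

valid_size = conv_output_size(5, 3, padding="valid")
same_size = conv_output_size(5, 3, padding="same")
```

With a 5-wide input and a 3-wide kernel, valid padding shrinks the output while same padding preserves the input size.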
LeNet-5 (1989-1998)
• First CNN (1989) proven to work well, used for handwritten Zip
code recognition [1]
• Refined through the years until the LeNet-5 version (1998) [2]
LeNet-5 interactive visualization [3]
It’s possible to interact with the network in 3D, manually drawing a digit
to be classified, clicking on the neurons to get info about the parameters
and the connected units, or rotating and zooming the network:
http://scs.ryerson.ca/~aharley/vis/conv/
AlexNet (2012) [5]
• After a long hiatus in which neural networks were largely ignored [4], they
received attention once again after Alex Krizhevsky et al. overwhelmingly
won the ILSVRC in 2012 with AlexNet
• Structure very similar to LeNet-5, but with some new key insights:
very efficient GPU implementation, ReLU neurons and dropout
The Inception architecture
Motivations
• Increasing model size tends to improve quality
• More computational resources are needed
• Computational efficiency and low parameter count are still important
• Mobile vision and embedded systems
• Big Data
Going Deeper with Convolutions [6]
• The Inception module addresses this problem by making better use of the
computing resources
• Proposed in 2014 by Christian Szegedy and other Google researchers
• Used in the GoogLeNet architecture that won both the ILSVRC
2014 classification and detection challenges
Inception module I
• Visual information is processed at various scales and then aggregated
• Since pooling operations are beneficial in CNNs, a parallel pooling
path has been added
• Problems:
• 3x3 and 5x5 convolutions can be very expensive on top of a layer
with lots of filters
• The number of filters substantially increases for each Inception layer
added, leading to a computational blow-up
Inception module II
• Adding the 1x1 convolutions before the bigger convolutions reduces
dimensionality
• The same is done after the pooling layer
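To get a feel for the saving, the sketch below counts multiply-accumulates for a 5x5 convolution with and without a preceding 1x1 reduction. The grid size and filter counts are illustrative assumptions, and the cost model ignores biases.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates of a stride-1 'same' convolution (bias ignored)."""
    return height * width * kernel * kernel * c_in * c_out

# Illustrative sizes (assumed): 28x28 grid, 192 input filters,
# 32 output filters, 1x1 reduction down to 16 filters.
H = W = 28
C_IN, C_OUT, C_REDUCED = 192, 32, 16

# 5x5 convolution applied directly to all input filters.
direct = conv_macs(H, W, 5, C_IN, C_OUT)

# 1x1 reduction first, then the 5x5 convolution on fewer filters.
reduced = conv_macs(H, W, 1, C_IN, C_REDUCED) + conv_macs(H, W, 5, C_REDUCED, C_OUT)
```

Under these assumptions the reduced path needs roughly a tenth of the operations of the direct one.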
GoogLeNet I
• GoogLeNet is a particular incarnation of the Inception architecture
• 22 layers with parameters (27 when pooling layers are also counted)
• 9 Inception modules
• 2 auxiliary classifiers to mitigate the vanishing gradient problem and
to provide regularization
• Designed with computational efficiency in mind
• Inference can be run on devices with limited computational
resources, especially memory
• 7 of these networks used in an ensemble for the ILSVRC 2014
classification task
GoogLeNet II
GoogLeNet III
GoogLeNet - Training
• Trained with the DistBelief distributed machine learning system
• Asynchronous stochastic gradient descent with 0.9 momentum
• Image sampling methods changed many times in the months before the
competition
• Already converged models were trained further with other options
• Models were trained on crops of different sizes
• There is no definitive guidance on the single most effective way to
train these networks
GoogLeNet - ILSVRC 2014 Results
Classification (above) and object detection (below) results.
DeepDream
Google’s DeepDream uses a GoogLeNet to produce “machine dreams”
Inception-v2 and Inception-v3
• The Inception module authors later presented new optimized
versions of the architecture, called Inception-v2 and Inception-v3 [7]
• They managed to significantly improve on GoogLeNet's ILSVRC 2014
results
• The improvements were based on various key principles:
• Avoid representational bottlenecks
• Spatial aggregation on lower dimensional embeddings doesn’t usually
induce relevant losses in representational power
• Balance the width and depth of the network
Convolution factorization I
• Factorizing convolutions reduces the number of parameters without
losing much expressiveness
• For example, a 5x5 convolution can be factorized into a pair of 3x3
convolutions
• It is also possible to factorize an NxN convolution into a 1xN
convolution followed by an Nx1 convolution
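The parameter saving can be checked with a little arithmetic. The sketch below counts weights per input/output filter pair and confirms that two stacked 3x3 convolutions see the same 5x5 window. The helper names are hypothetical, and N = 7 is just an example value.

```python
def kernel_weights(kh, kw):
    """Weights of one kh x kw kernel (per input/output filter pair)."""
    return kh * kw

def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# 5x5 versus a pair of 3x3 convolutions.
single_5x5 = kernel_weights(5, 5)
pair_3x3 = 2 * kernel_weights(3, 3)

# NxN versus 1xN followed by Nx1, here with N = 7.
N = 7
full_nxn = kernel_weights(N, N)
asymmetric = kernel_weights(1, N) + kernel_weights(N, 1)
```

In both cases the factorized version covers the same input window with fewer weights.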
Convolution factorization II
The original Inception module (left) and the new factorized module
(right).
Efficient grid size reduction - problem
• Suppose we want to pass from a $d \times d$ grid with $k$ filters to a
$\frac{d}{2} \times \frac{d}{2}$ grid with $2k$ filters
• We need to compute a stride-1 convolution and then a pooling
• Computational cost dominated by convolutions: $2d^2k^2$ operations
• Inverting the order, the number of operations is reduced to
$2\left(\frac{d}{2}\right)^2 k^2$, but we violate the bottleneck principle
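The two operation counts above can be compared numerically. This is a minimal sketch of the slide's cost model (grid positions times input filters times output filters, with the kernel area dropped as a constant factor). The values of d and k are arbitrary examples.

```python
def conv_cost(grid, c_in, c_out):
    """Slide's cost model: grid positions x input filters x output filters."""
    return grid * grid * c_in * c_out

d, k = 36, 64  # example values, not from the slides

# Convolution on the full d x d grid (k -> 2k filters), pooling afterwards.
conv_then_pool = conv_cost(d, k, 2 * k)

# Pooling first, then the same convolution on the d/2 x d/2 grid.
pool_then_conv = conv_cost(d // 2, k, 2 * k)
```

Pooling first is four times cheaper, but the representation is squeezed down to a d/2 x d/2 grid with only k filters before the convolution, which is the bottleneck the slide warns about.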
Efficient grid size reduction - solution
• The solution is an Inception module with convolution and pooling
blocks with stride 2
• Computationally efficient and no representational bottleneck
introduced
The new architecture
• Using various modified Inception modules, here is the new
Inception-v2 architecture
Inception-v2: modules used
n = 7
Inception-v2: training and observations
• The network was trained on the ILSVRC 2012 images using
stochastic gradient descent and the TensorFlow library
• Experiments showed that the two auxiliary classifiers have less
impact on training convergence than expected
• In the early training phases, the model performance was not affected
by the presence of the auxiliary classifiers: they only improved the
performance near the end of training
• Removing the lower auxiliary classifier didn’t have any effect
• The main classifier performs better if batch normalization or dropout
is added to the auxiliary ones
• The model was also trained and tested on smaller receptive fields
with only a small loss of top-1 accuracy (76.6% for 299x299 RF vs.
75.2% on 79x79 RF). This is important for the post-classification of
detections
Inception-v2 to Inception-v3 results (single model)
• Each row’s Inception-v2 model adds a feature with respect to the
previous row’s model
• The last row's model is referred to as the Inception-v3 model
Inception-v3 vs other models (single and ensemble)
[Tables: single model results; ensemble results]
• On the ILSVRC 2012 dataset, there is a significant improvement
versus state-of-the-art models, both with a single model and with an
ensemble of models
• Note that the ensemble errors here are validation errors (except for
the one marked with '*', which is a test error)
Fully convolutional networks
Semantic segmentation
• Image segmentation is the process of partitioning an image into
multiple segments (sets of pixels, or super-pixels)
• Semantic segmentation partitions an image into semantically
meaningful parts and classifies each part into one of the
pre-determined classes
• The same result can be achieved with pixel-wise classification,
i.e. assigning a class to each pixel
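Pixel-wise classification can be sketched as taking, at every pixel, the class with the highest score. The score layout `scores[c][y][x]` and the toy values below are assumptions made for illustration.

```python
def segment(scores):
    """Label map from per-class score maps: scores[c][y][x] -> labels[y][x]."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Toy example: 2 classes on a 2x2 image.
scores = [
    [[0.9, 0.2], [0.6, 0.1]],  # class 0
    [[0.1, 0.8], [0.4, 0.9]],  # class 1
]
labels = segment(scores)
```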
Fully convolutional networks
• Shelhamer et al. [8] showed that fully convolutional networks trained
pixels-to-pixels exceed the state-of-the-art in semantic segmentation
• The fully convolutional networks they proposed take input of
arbitrary size and produce same-sized output to make dense
predictions
Convolutionalization of a classic net I
• Typical recognition nets (AlexNet, GoogLeNet, etc.) take fixed-sized
inputs and produce non-spatial outputs
• The fully connected layers have fixed dimensions and drop the
spatial coordinates
• However we can view these fully connected layers as convolutions
that cover their entire input regions
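The equivalence can be illustrated on a toy case: a dense layer that expects a flattened 2x2 input is reshaped into 2x2 kernels and slid over a larger 3x3 image. All weights and sizes below are hypothetical, and the image has a single channel for brevity.

```python
def dense(flat_input, weights, bias):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, flat_input)) + b
            for row, b in zip(weights, bias)]

def conv_valid(image, kernels, bias):
    """Stride-1 valid convolution (no kernel flip); returns out[unit][y][x]."""
    h, w = len(kernels[0]), len(kernels[0][0])
    H, W = len(image), len(image[0])
    return [
        [[sum(kernel[i][j] * image[y + i][x + j]
              for i in range(h) for j in range(w)) + b
          for x in range(W - w + 1)]
         for y in range(H - h + 1)]
        for kernel, b in zip(kernels, bias)
    ]

# Dense layer with 2 units on a flattened 2x2 input (hypothetical weights).
weights = [[1, 0, -1, 2], [1, 1, 1, 1]]
bias = [0, 1]

# Reshape every weight row into a 2x2 kernel covering the whole input region.
kernels = [[row[0:2], row[2:4]] for row in weights]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
score_maps = conv_valid(image, kernels, bias)  # 2 maps of size 2x2
```

Each entry of `score_maps` matches what the dense layer would produce on the corresponding 2x2 patch, e.g. `dense([1, 2, 4, 5], weights, bias)` for the top-left patch.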
Convolutionalization of a classic net II
• These fully convolutional networks take input of any size and output
classification maps
• The resulting maps are equivalent to the evaluation of the original
network on particular input patches
• Computing a 10x10 output grid with the new network is more than
5 times faster than running the original network on each patch
separately, both at learning time and at inference time
• Note that the output dimensions are typically reduced by
subsampling
• So output interpolation is needed to obtain dense predictions
• The interpolation is obtained through backwards convolutions (also
known as transposed convolutions or deconvolutions)
Backwards strided convolution
Upsampling from 3x3 grid to 5x5
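A backwards strided convolution can be sketched as a scatter operation: every input value adds a scaled copy of the kernel to the output. The figure's exact settings are not recoverable from the transcript, so the kernel size (3), stride (2) and padding (1) below are assumptions chosen to reproduce the 3x3 to 5x5 case. The bilinear kernel is likewise only an example.

```python
def transposed_conv2d(x, kernel, stride=2, padding=1):
    """Backwards (transposed) convolution of a square input, single channel."""
    n, k = len(x), len(kernel)
    full = (n - 1) * stride + k
    out = [[0.0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            # Each input value scatters a scaled kernel; overlaps add up.
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    # Cropping the border mirrors the padding of the forward convolution.
    if padding:
        out = [row[padding:full - padding] for row in out[padding:full - padding]]
    return out

# Kernel that performs bilinear interpolation at stride 2 (example choice).
bilinear = [[0.25, 0.5, 0.25],
            [0.5, 1.0, 0.5],
            [0.25, 0.5, 0.25]]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
up = transposed_conv2d(x, bilinear)  # 5x5 output
```

With this kernel the original values land on the even positions of the output and the odd positions are averages of their neighbours. In a network the kernel weights are learned rather than fixed.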
Architecture I
• Coarse and local information is fused by combining lower and higher
layers
• 3 network types with different layers fused were tested
Architecture II
• 3 proven classification architectures were transformed to fully
convolutional: AlexNet, VGG16 and GoogLeNet
• Each net’s final classifier layer is discarded and all the fully
connected layers are converted to convolutions
• A 1x1 convolution with 21 channels (the number of classes in the
PASCAL VOC 2011 dataset, including background) is added to the end,
followed by a backwards convolution layer
Architecture III
• The original nets were first pre-trained using image classification
• Then they were transformed to fully convolutional for fine tuning
using whole images (using SGD with momentum)
• The best results were obtained with FCN-VGG16
• Training on whole images proved to be as effective as sampling
patches
Architecture comparison
• The first models (FCN-32s) didn't fuse different layers, so the
resulting output is very coarse
• Lower layers were then fused with the last one (as shown earlier) to
obtain better results (mean IU 62.7 for FCN-8s vs. 59.4 for
FCN-32s)
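Mean IU (intersection over union, averaged over classes) can be computed from a confusion matrix. The sketch below assumes `confusion[i][j]` counts the pixels of true class i predicted as class j. The matrix values are made up for the example.

```python
def mean_iu(confusion):
    """Mean intersection over union from a square confusion matrix."""
    n = len(confusion)
    ious = []
    for i in range(n):
        intersection = confusion[i][i]
        truth = sum(confusion[i])                          # pixels of class i
        predicted = sum(confusion[j][i] for j in range(n))  # pixels labeled i
        union = truth + predicted - intersection
        if union > 0:  # skip classes that never appear
            ious.append(intersection / union)
    return sum(ious) / len(ious)

# Toy 2-class example: rows are true classes, columns are predictions.
confusion = [[8, 2],
             [1, 9]]
score = mean_iu(confusion)
```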
Results comparison I
• The model reaches state-of-the-art performance on semantic
segmentation
• The model is also much faster at inference time than previous
architectures
Results comparison II
Hypercolumns
Hypercolumns I
• The last layer of a CNN captures general features of the image, but
is too coarse spatially to allow precise localization
• Earlier layers instead may be precise in localization but will not
capture semantics
• Hariharan et al. [9] presented the hypercolumn concept, which puts
together the information from both higher and lower layers to obtain
better results on 3 fine-grained localization tasks:
• Simultaneous detection and segmentation
• Keypoint localization
• Part labeling
Hypercolumns II
• The hypercolumn corresponding to a given input location is defined
as the outputs of all units above that location at all layers of the
CNN, stacked into one vector
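The definition can be sketched in a few lines: every layer's feature map is resized to a common resolution and the values found at the chosen location are concatenated. Nearest-neighbour resizing is used here only to keep the sketch short and is an assumption, as are the toy feature maps and the `fmap[channel][y][x]` layout.

```python
def upsample_nearest(fmap, size):
    """Resize fmap[channel][y][x] to size x size with nearest neighbour."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[y * h // size][x * w // size] for x in range(size)]
             for y in range(size)]
            for ch in fmap]

def hypercolumn(feature_maps, y, x, size):
    """Stack the activations of all layers above location (y, x)."""
    column = []
    for fmap in feature_maps:
        resized = upsample_nearest(fmap, size)
        column.extend(ch[y][x] for ch in resized)
    return column

# Toy network: a fine 4x4 layer with 1 channel, a coarse 2x2 layer with 2.
fine = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]
coarse = [[[10, 20], [30, 40]], [[-1, -2], [-3, -4]]]

column = hypercolumn([fine, coarse], y=3, x=1, size=4)
```

The resulting vector has one entry per channel across all layers, so its length is the sum of the layers' channel counts.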
Problem setting I
• Input: a set of detections (subjected to non-maximum suppression),
each with a bounding box, a category label and a score
• Depending on the task, for each detection we want to:
• segment out the object
• segment its parts
• predict its keypoints
• Whatever the task, the bounding boxes are slightly expanded and a
50x50 heatmap is predicted on each of them
Problem setting II
• The information encoded in each heatmap and the number of
heatmaps depend on the chosen task:
• For segmentation, the heatmap encodes the probability that a
particular location is inside the object
• For part labeling a separate heatmap is predicted for each part,
where each heatmap is the probability a location belongs to that part
• For keypoint localization a separate heatmap is predicted for each
keypoint, with each heatmap encoding the probability that the
keypoint is at a particular location
• The heatmaps are finally resized to the size of the expanded
bounding boxes
• So all the tasks are solved by assigning a probability to each of the
50x50 locations
Problem setting III
• For each of the 50x50 locations and for each category a classifier
should be trained
• But doing so has 3 problems:
• The amount of data that each classifier sees during training is
heavily reduced
• Training so many classifiers is computationally expensive
• While the classifier should vary with the location, two adjacent
pixels should be classified similarly
• The solution is to train a coarse K × K (usually K = 5 or K = 10)
grid of classifiers and interpolate between them
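One way to picture the interpolation is shown below: each of the 50x50 locations blends the scores of its neighbouring coarse classifiers, and a sigmoid turns the blended score into a probability. The bilinear weighting, the linear form of each classifier and all function names are assumptions made for this sketch (it also assumes K >= 2), not the exact scheme of [9].

```python
import math

def axis_weights(p, n, k):
    """Linear interpolation weights of position p (0..n-1) over k grid points."""
    g = (p + 0.5) * k / n - 0.5    # position in grid coordinates
    g = min(max(g, 0.0), k - 1.0)  # clamp to the grid
    lo = min(int(g), k - 2)        # index of the lower neighbour
    frac = g - lo
    return [(lo, 1.0 - frac), (lo + 1, frac)]

def interpolation_weights(y, x, n=50, k=5):
    """Weight of each neighbouring coarse classifier for location (y, x)."""
    return {(ky, kx): wy * wx
            for ky, wy in axis_weights(y, n, k)
            for kx, wx in axis_weights(x, n, k)}

def interpolated_probability(feature, classifiers, y, x, n=50, k=5):
    """Blend the neighbouring classifiers' linear scores, then apply a sigmoid."""
    score = 0.0
    for (ky, kx), weight in interpolation_weights(y, x, n, k).items():
        w, b = classifiers[ky][kx]  # (weight vector, bias) of one classifier
        score += weight * (sum(wi * fi for wi, fi in zip(w, feature)) + b)
    return 1.0 / (1.0 + math.exp(-score))
```

Because neighbouring locations share almost the same weights, they are classified similarly, while only K x K classifiers have to be trained.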
Network architecture
[Figure: network architecture with conv, upsample, classifier interpolation and sigmoid blocks]
Note: inverting the order of upsampling and convolutions (that calculate
the K × K grids) and computing them separately for each of the 3
combined layers reduces the computational cost
Bounding box refining
• A special technique is used to improve the box selection, called
rescoring
SDS results
Keypoint prediction results
Part labeling results
Conclusion
Conclusion
• We have seen how the Inception modules make it possible to train deeper
and better networks in a computationally efficient manner
• We have then observed how to transform a classification CNN into a
fully convolutional network for pixel-wise classification
• We have learned the hypercolumn technique to combine high and
low level information to improve the accuracy on various fine-grained
localization tasks
Thank you for your patience! :)
References I
[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard,
W. Hubbard, and L. D. Jackel, “Backpropagation applied to
handwritten zip code recognition,” Neural Computation, vol. 1(4),
pp. 541–551, 1989.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proc. IEEE, vol. 86,
pp. 2278–2324, 1998.
[3] A. W. Harley, “An interactive node-link visualization of convolutional
neural networks,” in ISVC, pp. 867–877, 2015.
[4] A. Kurenkov, “A ’brief’ history of neural nets and deep learning, part
4.” http://www.andreykurenkov.com/writing/
a-brief-history-of-neural-nets-and-deep-learning-part-4/.
References II
[5] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification
with deep convolutional neural networks,” Advances in Neural
Information Processing Systems, vol. 25, pp. 1106–1114, 2012.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov,
D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with
convolutions,” CoRR, vol. abs/1409.4842, 2014.
[7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna,
“Rethinking the inception architecture for computer vision,” CoRR,
vol. abs/1512.00567, 2015.
[8] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks
for semantic segmentation,” CoRR, vol. abs/1605.06211, 2016.
References III
[9] B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik,
“Hypercolumns for object segmentation and fine-grained
localization,” CoRR, vol. abs/1411.5752, 2014.

 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 

Recently uploaded (20)

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 

Modern Convolutional Neural Network techniques for image segmentation

  • 1. Modern Convolutional Neural Network techniques for image segmentation Deep Learning Journal Club Gioele Ciaparrone Michele Curci November 30, 2016 University of Salerno
  • 2. Index 1. Introduction 2. The Inception architecture 3. Fully convolutional networks 4. Hypercolumns 5. Conclusion 2
  • 4. CNN recap • Sequence of convolutional and pooling layers • Rectifier activation function • Fully connected layers at the end • Softmax function for classification 4
  • 6. Convolution II Valid padding (left) and same padding (right) convolutions 6
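A minimal sketch of how the two padding modes named on the slide affect the output size along one spatial dimension. The function name `conv_output_size` and its parameters are illustrative, and the "same" rule follows the common convention (output size = ceil(n / stride)), which is an assumption rather than something stated on the slide.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one axis for an input of length n."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Input is zero-padded so that, with stride 1, the size is preserved.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")


valid_size = conv_output_size(5, 3, padding="valid")  # 5x5 input, 3x3 kernel
same_size = conv_output_size(5, 3, padding="same")
```

With a 5x5 input and a 3x3 kernel, valid padding shrinks the grid to 3x3 while same padding keeps it at 5x5.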
  • 7. LeNet-5 (1989-1998) • First CNN (1989) proven to work well, used for handwritten Zip code recognition [1] • Refined through the years until the LeNet-5 version (1998) [2] 7
  • 8. LeNet-5 interactive visualization [3] It’s possible to interact with the network in 3D, manually drawing a digit to be classified, clicking on the neurons to get info about the parameters and the connected units, or rotating and zooming the network: http://scs.ryerson.ca/~aharley/vis/conv/ 8
  • 9. AlexNet (2012) [5] • After a long hiatus in which deep learning was ignored [4], neural networks received attention once again after Alex Krizhevsky overwhelmingly won the ILSVRC in 2012 with AlexNet • Structure very similar to LeNet-5, but with some new key insights: very efficient GPU implementation, ReLU neurons and dropout 9
  • 11. Motivations • Increasing model size tends to improve quality • More computational resources are needed • Computational efficiency and low parameter count are still important • Mobile vision and embedded systems • Big Data 11
  • 12. Going Deeper with Convolutions [6] • The Inception module addresses this problem by making better use of the computing resources • Proposed in 2014 by Christian Szegedy and other Google researchers • Used in the GoogLeNet architecture that won both the ILSVRC 2014 classification and detection challenges 12
  • 13. Inception module I • Visual information is processed at various scales and then aggregated • Since pooling operations are beneficial in CNNs, a parallel pooling path has been added • Problems: • 3x3 and 5x5 convolutions can be very expensive on top of a layer with lots of filters • The number of filters substantially increases for each Inception layer added, leading to a computational blow up 13
  • 14. Inception module II • Adding the 1x1 convolutions before the bigger convolutions reduces dimensionality • The same is done after the pooling layer 14
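The saving from a 1x1 reduction can be checked with a quick weight count. This is a sketch under stated assumptions: the channel sizes (192 input channels, 16 after the 1x1 reduction, 32 output filters) are illustrative values chosen here, biases are ignored, and `conv_weights` is a hypothetical helper.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


in_ch, reduced_ch, out_ch = 192, 16, 32  # illustrative sizes (assumption)

# 5x5 convolution applied directly to the wide input
direct = conv_weights(5, in_ch, out_ch)

# 1x1 convolution to shrink the channels first, then the 5x5 convolution
reduced = conv_weights(1, in_ch, reduced_ch) + conv_weights(5, reduced_ch, out_ch)
```

Under these assumed sizes the reduced path needs roughly a tenth of the weights of the direct 5x5 convolution.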
  • 15. GoogLeNet I • GoogLeNet is a particular incarnation of the Inception architecture • 22 convolutional layers (27 including pooling) • 9 Inception modules • 2 auxiliary classifiers to solve the vanishing gradient problem and for regularization • Designed with computational efficiency in mind • Inference can be run on devices with limited computational resources, especially memory • 7 of these networks used in an ensemble for the ILSVRC 2014 classification task 15
  • 18. GoogLeNet - Training • Trained with the DistBelief distributed machine learning system • Asynchronous stochastic gradient descent with 0.9 momentum • Image sampling methods changed many times before the competition • Already converged models were trained further with other options • Models were trained on crops of different sizes • There is no definitive guidance on the single most effective way to train these networks 18
  • 19. GoogLeNet - ILSVRC 2014 Results Classification (above) and object detection (below) results. 19
  • 20. DeepDream Google’s DeepDream uses a GoogLeNet to produce “machine dreams” 20
  • 21. Inception-v2 and Inception-v3 • The Inception module authors later presented new optimized versions of the architecture, called Inception-v2 and Inception-v3 [7] • They managed to significantly improve GoogLeNet ILSVRC 2014 results • The improvements were based on various key principles: • Avoid representational bottlenecks • Spatial aggregation on lower dimensional embeddings doesn’t usually induce relevant losses in representational power • Balance the width and depth of the network 21
  • 22. Convolution factorization I • Factorizing convolutions reduces the number of parameters without losing much expressiveness • For example, a 5x5 convolution can be factorized into a pair of 3x3 convolutions • It is also possible to factorize an NxN convolution into a 1xN and an Nx1 convolution 22
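A small sketch of the parameter counts behind the two factorizations, counting weights per input/output channel pair and ignoring biases. The helper names are hypothetical, and N = 7 is only an example value.

```python
def square_kernel_weights(n):
    """Weights in a single n x n kernel."""
    return n * n


def stacked_3x3_weights(num_layers):
    """Weights in a stack of 3x3 kernels (one kernel per layer)."""
    return num_layers * 3 * 3


def asymmetric_weights(n):
    """Weights in a 1 x n kernel followed by an n x 1 kernel."""
    return n + n


five_by_five = square_kernel_weights(5)   # one 5x5 kernel
two_three_by_three = stacked_3x3_weights(2)  # same 5x5 receptive field
seven_by_seven = square_kernel_weights(7)
seven_asymmetric = asymmetric_weights(7)
```

Two stacked 3x3 kernels cover the same 5x5 receptive field with 18 weights instead of 25, and the 1xN/Nx1 pair grows linearly in N rather than quadratically.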
  • 23. Convolution factorization II The original Inception module (left) and the new factorized module (right). 23
  • 24. Efficient grid size reduction - problem • Suppose we want to pass from a $d \times d$ grid with $k$ filters to a $\frac{d}{2} \times \frac{d}{2}$ grid with $2k$ filters • We need to compute a stride-1 convolution and then a pooling • Computational cost dominated by convolutions: $2d^2k^2$ operations • Inverting the order, the number of operations is reduced to $2\left(\frac{d}{2}\right)^2 k^2$, but we violate the bottleneck principle 24
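As a worked check of the two costs on this slide (counting one operation per grid position, input filter and output filter, and ignoring the kernel size as the slide does), pooling first cuts the convolution cost to one quarter:

```latex
\begin{align*}
\text{convolution, then pooling:} \quad & d^2 \cdot k \cdot 2k = 2d^2k^2 \\
\text{pooling, then convolution:} \quad & \left(\tfrac{d}{2}\right)^2 \cdot k \cdot 2k
    = 2\left(\tfrac{d}{2}\right)^2 k^2 = \tfrac{1}{2}\, d^2 k^2 \\
\text{ratio:} \quad & \frac{2\left(\tfrac{d}{2}\right)^2 k^2}{2d^2k^2} = \frac{1}{4}
\end{align*}
```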
  • 25. Efficient grid size reduction - solution • The solution is an Inception module with convolution and pooling blocks with stride 2 • Computationally efficient and no representational bottleneck introduced 25
  • 26. The new architecture • Using various modified Inception modules, here is the new Inception-v2 architecture 26
  • 28. Inception-v2: training and observations • The network was trained on the ILSVRC 2012 images using stochastic gradient descent and the TensorFlow library • Experiments showed that the two auxiliary classifiers have less impact on training convergence than expected • In the early training phases, the model performance was not affected by the presence of the auxiliary classifiers: they only improved the performance near the end of training • Removing the lower auxiliary classifier didn’t have any effect • The main classifier performs better if batch normalization or dropout is added to the auxiliary ones • The model was also trained and tested on smaller receptive fields with only a small loss of top-1 accuracy (76.6% for 299x299 RF vs. 75.2% on 79x79 RF). Important for post-classification of detection 28
  • 29. Inception-v2 to Inception-v3 results (single model) • Each row’s Inception-v2 model adds a feature with respect to the previous row’s model • The last line’s model is referred to as the Inception-v3 model 29
  • 30. Inception-v3 vs other models (single and ensemble) Single model results Ensemble results • On the ILSVRC 2012 dataset, there is a significant improvement versus state-of-the-art models, both with a single model and with an ensemble of models • Note that the ensemble errors here are validation errors (except for the one marked with ’*’, that is a test error) 30
  • 32. Semantic segmentation • Image segmentation is the process of partitioning an image into multiple segments (sets of pixels, or super-pixels) • Semantic segmentation is the partitioning of an image into semantically meaningful parts and the classification of each part into one of a set of predetermined classes • It’s possible to achieve the same result with pixel-wise classification, i.e. assigning a class to each pixel 32
  • 33. Fully convolutional networks • Shelhamer et al. [8] showed that fully convolutional networks trained pixels-to-pixels exceed the state-of-the-art in semantic segmentation • The fully convolutional networks they proposed take input of arbitrary size and produce same-sized output to make dense predictions 33
  • 34. Convolutionalization of a classic net I • Typical recognition nets (AlexNet, GoogLeNet, etc.) take fixed-sized inputs and produce non-spatial outputs • The fully connected layers have fixed dimensions and drop the spatial coordinates • However we can view these fully connected layers as convolutions that cover their entire input regions 34
  • 35. Convolutionalization of a classic net II • These fully convolutional networks take input of any size and output classification maps • The resulting maps are equivalent to the evaluation of the original network on particular input patches • The new network is more than 5 times faster than the original network both at learning time and at inference time (considering a 10x10 output grid) • Note that the output dimensions are typically reduced by subsampling • So output interpolation is needed to obtain dense predictions • The interpolation is obtained through backwards convolutions 35
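The equivalence between a fully connected layer and a convolution covering its whole input can be sketched on a toy example. Everything here is illustrative: a single-channel 3x3 image, one fully connected unit that expects a flattened 2x2 input, and hypothetical helper names `fc` and `conv_valid`.

```python
def fc(patch, weights, bias):
    """Fully connected unit applied to a flattened 2-D patch."""
    flat = [v for row in patch for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def conv_valid(image, kernel, bias):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = bias
            for a in range(k):
                for b in range(k):
                    acc += kernel[a][b] * image[i + a][j + b]
            row.append(acc)
        out.append(row)
    return out


weights = [1.0, -2.0, 0.5, 3.0]          # FC weights for a flattened 2x2 input
bias = 0.25
kernel = [weights[0:2], weights[2:4]]    # the same weights reshaped to 2x2

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]                # larger than the FC layer's input

score_map = conv_valid(image, kernel, bias)  # 2x2 map of FC outputs
```

Each entry of `score_map` equals the fully connected unit evaluated on the corresponding 2x2 patch, but the convolution shares the computation across overlapping patches in a single pass.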
  • 36. Backwards strided convolution Upsampling from 3x3 grid to 5x5 36
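A one-dimensional sketch of a backwards (transposed) strided convolution that maps 3 values to 5, mirroring one axis of the 3x3 to 5x5 upsampling on the slide. The stride (2), kernel size (3), padding (1) and the linear-interpolation kernel are assumptions chosen so that the sizes match; the slide does not state them, and `transposed_conv1d` is a hypothetical helper.

```python
def transposed_conv1d(x, kernel, stride, padding):
    """Each input value scatters a scaled copy of the kernel into the output."""
    full_len = (len(x) - 1) * stride + len(kernel)
    full = [0.0] * full_len
    for i, value in enumerate(x):
        for a, weight in enumerate(kernel):
            full[i * stride + a] += value * weight
    # Remove `padding` entries from both ends.
    return full[padding:full_len - padding]


coarse = [1.0, 2.0, 3.0]
linear_kernel = [0.5, 1.0, 0.5]   # performs linear interpolation at stride 2

dense = transposed_conv1d(coarse, linear_kernel, stride=2, padding=1)
```

With this fixed kernel the layer reproduces linear interpolation; in a network the kernel weights can instead be learned.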
  • 37. Architecture I • Coarse and local information is fused by combining lower and higher layers • 3 network types with different layers fused were tested 37
  • 38. Architecture II • 3 proven classification architectures were transformed to fully convolutional: AlexNet, VGG16 and GoogLeNet • Each net’s final classifier layer is discarded and all the fully connected layers are converted to convolutions • A 1x1 convolution with 21 channels (the number of classes in the PASCAL VOC 2011 dataset) is added to the end, followed by a backwards convolution layer 38
  • 39. Architecture III • The original nets were first pre-trained using image classification • Then they were transformed to fully convolutional for fine tuning using whole images (using SGD with momentum) • The best results were obtained with FCN-VGG16 • Training on whole images proved to be as effective as sampling patches 39
  • 40. Architecture comparison • The first model (FCN-32s) didn’t fuse different layers, so the resulting output is very coarse • They then fused lower layers with the last one (as shown earlier) to obtain better results (mean IU 62.7 for FCN-8s vs. 59.4 for FCN-32s) 40
  • 41. Results comparison I • The model reaches state-of-the-art performance on semantic segmentation • Also the model is much faster at inference time than previous architectures 41
  • 44. Hypercolumns I • The last layer of a CNN captures general features of the image, but is too coarse spatially to allow precise localization • Earlier layers instead may be precise in localization but will not capture semantics • Hariharan et al. [9] presented the hypercolumn concept, which puts together the information from both higher and lower layers to obtain better results on 3 fine-grained localization tasks: • Simultaneous detection and segmentation • Keypoint localization • Part labeling 44
  • 45. Hypercolumns II • The hypercolumn corresponding to a given input location is defined as the outputs of all units above that location at all layers of the CNN, stacked into one vector 45
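A minimal sketch of assembling a hypercolumn for one location. It assumes each layer is given as a list of square 2-D channel maps and uses nearest-neighbour upsampling to bring every map to a common resolution; that resampling choice is a simplification made here for brevity, and the helper names are hypothetical.

```python
def upsample_nearest(fmap, size):
    """Resize a square 2-D feature map to size x size by nearest neighbour."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]


def hypercolumn(layers, size, i, j):
    """Stack the activations of every channel of every layer at location (i, j)."""
    column = []
    for channels in layers:
        for fmap in channels:
            column.append(upsample_nearest(fmap, size)[i][j])
    return column


# A fine layer (one 4x4 channel) and a coarse layer (two 2x2 channels).
fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
coarse = [[[10, 20], [30, 40]],
          [[-1, -2], [-3, -4]]]

column = hypercolumn([fine, coarse], size=4, i=1, j=2)
```

The resulting vector has one entry per channel across all layers, so its length is the total channel count of the layers being combined.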
  • 46. Problem setting I • Input: a set of detections (subjected to non-maximum suppression), each with a bounding box, a category label and a score • Depending on the task, for each detection we want to: • segment out the object • segment its parts • predict its keypoints • Whatever the task, the bounding boxes are slightly expanded and a 50x50 heatmap is predicted on each of them 46
  • 47. Problem setting II • The information encoded in each heatmap and the number of heatmaps depend on the chosen task: • For segmentation, the heatmap encodes the probability that a particular location is inside the object • For part labeling a separate heatmap is predicted for each part, where each heatmap is the probability a location belongs to that part • For keypoint localization a separate heatmap is predicted for each keypoint, with each heatmap encoding the probability that the keypoint is at a particular location • The heatmaps are finally resized to the size of the expanded bounding boxes • So all the tasks are solved assigning a probability to each of the 50x50 locations 47
  • 48. Problem setting III • For each of the 50x50 locations and for each category a classifier should be trained • But doing so has 3 problems: • The amount of data that each classifier sees during training is heavily reduced • Training so many classifiers is computationally expensive • While the classifier should vary according to the location, two adjacent pixels should be classified similarly • The solution is to train a coarse K × K (usually K = 5 or K = 10) grid of classifiers and interpolate between them 48
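One way to picture interpolating between a coarse K × K grid of classifiers is a bilinear blend of their scores at each pixel. This is a sketch under stated assumptions: `scores[a][b]` stands for the output of grid classifier (a, b) on the pixel's features, classifier centres are placed at the middle of each grid cell, and positions outside the outermost centres are clamped. None of these details are given on the slide, and the helper names are hypothetical.

```python
def axis_weights(pos, size, K):
    """Linear interpolation weights over K grid cells along one axis."""
    cell = size / K
    t = pos / cell - 0.5                 # position in units of cell centres
    t = min(max(t, 0.0), K - 1.0)        # clamp outside the outermost centres
    k0 = int(t)
    k1 = min(k0 + 1, K - 1)
    frac = t - k0
    weights = [0.0] * K
    weights[k0] += 1.0 - frac
    weights[k1] += frac
    return weights


def interpolated_score(scores, y, x, size=50):
    """Blend the K x K classifier scores for the pixel at (y, x)."""
    K = len(scores)
    wy = axis_weights(y, size, K)
    wx = axis_weights(x, size, K)
    return sum(wy[a] * wx[b] * scores[a][b]
               for a in range(K) for b in range(K))


K = 5
scores = [[a * 10.0 + b for b in range(K)] for a in range(K)]  # toy scores

at_centre = interpolated_score(scores, 15, 25)   # centre of cell (1, 2)
between = interpolated_score(scores, 10, 10)     # corner shared by 4 cells
```

At a cell centre only that cell's classifier contributes, while between centres neighbouring classifiers are blended, so adjacent pixels receive similar predictions.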
  • 49. Network architecture [Figure: network diagram; block labels: conv (x3), upsample (x3), sigmoid, classifier interpolation] Note: inverting the order of upsampling and convolutions (which compute the K × K grids) and computing them separately for each of the 3 combined layers reduces the computational cost 49
  • 50. Bounding box refining • A special technique is used to improve the box selection, called rescoring 50
  • 55. Conclusion • We have seen how the Inception modules allow training deeper and better networks in a computationally efficient manner • We have then observed how to transform a classification CNN into a fully convolutional network for pixel-wise classification • We have learned the hypercolumn technique to combine high and low level information to improve the accuracy on various fine-grained localization tasks 55
  • 56. Thank you for your patience! :) 56
  • 57. References I [1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1(4), pp. 541–551, 1989. [2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, pp. 2278–2324, 1998. [3] A. W. Harley, “An interactive node-link visualization of convolutional neural networks,” in ISVC, pp. 867–877, 2015. [4] A. Kurenkov, “A ‘brief’ history of neural nets and deep learning, part 4.” http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/. 57
  • 58. References II [5] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1106–1114, 2012. [6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CoRR, vol. abs/1409.4842, 2014. [7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” CoRR, vol. abs/1512.00567, 2015. [8] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1605.06211, 2016. 58
  • 59. References III [9] B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” CoRR, vol. abs/1411.5752, 2014. 59