SlideShare a Scribd company logo
1 of 27
UC Berkeley
Fully Convolutional Networks
for Semantic Segmentation
Jonathan Long* Evan Shelhamer* Trevor Darrell
1
Presented by: Gordon Christie
Slide credit: Jonathan Long
Overview
• Reinterpret standard classification convnets as “Fully
convolutional” networks (FCN) for semantic segmentation
• Use AlexNet, VGG, and GoogleNet in experiments
• Novel architecture: combine information from different
layers for segmentation
• State-of-the-art segmentation for PASCAL VOC
2011/2012, NYUDv2, and SIFT Flow at the time
• Inference less than one fifth of a second for a typical
image
2
Slide credit: Jonathan Long
pixels in, pixels out
monocular depth estimation (Liu et al. 2015)
boundary prediction (Xie & Tu 2015)
semantic
segmentation
3
Slide credit: Jonathan Long
4
“tabby cat”
1000-dim vector
< 1 millisecond
convnets perform classification
end-to-end learning
Slide credit: Jonathan Long
R-CNN
5
many seconds
“cat”
“dog”
R-CNN does detection
Slide credit: Jonathan Long
6
R-CNN
figure: Girshick et al.
Slide credit: Jonathan Long
7
< 1/5 second
end-to-end learning
???
Slide credit: Jonathan Long
“tabby cat”
8
a classification network
Slide credit: Jonathan Long
9
becoming fully convolutional
Slide credit: Jonathan Long
10
becoming fully convolutional
Slide credit: Jonathan Long
11
upsampling output
Slide credit: Jonathan Long
conv, pool,
nonlinearity
upsampling
pixelwise
output + loss
end-to-end, pixels-to-pixels
network
12
Slide credit: Jonathan Long
Dense Predictions
• Shift-and-stitch: trick that yields dense predictions
without interpolation
• Upsampling via deconvolution
• Shift-and-stitch used in preliminary experiments, but not
included in final model
• Upsampling found to be more effective and efficient
13
Classifier to Dense FCN
• Convolutionalize proven classification architectures:
AlexNet, VGG, and GoogLeNet (reimplementation)
• Remove classification layer and convert all fully
connected layers to convolutions
• Append 1x1 convolution with channel dimensions and
predict scores at each of the coarse output locations (21
categories + background for PASCAL)
14
Classifier to Dense FCN
Cast ILSVRC classifiers into FCNs and compare
performance on validation set of PASCAL 2011
15
spectrum of deep features
combine where (local, shallow) with what (global, deep)
fuse features into deep jet
(cf. Hariharan et al. CVPR15 “hypercolumn”) 16
Slide credit: Jonathan Long
skip layers
interp + sum
interp + sum
dense output 17
end-to-end, joint learning
of semantics and location
Slide credit: Jonathan Long
skip layers
18
Comparison of skip FCNs
19
Results on subset of validation set of PASCAL VOC 2011
stride 32
no skips
stride 16
1 skip
stride 8
2 skips
ground truth
input image
skip layer refinement
20
Slide credit: Jonathan Long
training + testing
- train full image at a time without patch sampling
- reshape network to take input of any size
- forward time is ~150ms for 500 x 500 x 21 output
21
Slide credit: Jonathan Long
Results – PASCAL VOC 2011/12
VOC 2011: 8498 training images (from additional labeled data
22
Results – NYUDv2
23
Table 4. Results on NYUDv2. RGBD is early-fusion of the
RGB anddepth channelsat theinput. HHAisthedepthembed-
ding of [14] as horizontal disparity, height above ground, and
the angle of the local surface normal with the inferred gravity
direction. RGB-HHA is the jointly trained late fusion model
that sums RGB and HHA predictions.
pixel
acc.
mean
acc.
mean
IU
f.w.
IU
Gupta et al. [14] 60.3 - 28.6 47.0
FCN-32s RGB 60.0 42.2 29.2 43.9
FCN-32s RGBD 61.5 42.4 30.5 45.5
FCN-32s HHA 57.1 35.2 24.2 40.4
FCN-32s RGB-HHA 64.3 44.9 32.8 48.0
FCN-16s RGB-HHA 65.4 46.1 34.0 49.5
Table 5.
(center) a
a non-pa
SVM wh
vnet train
samples (
noted RC
L
Tigh
Tighe
Tighe
Farabe
Farabe
1449 RGB-D images with pixelwise labels  40 categories
Results – SIFT Flow
2688 images with pixel labels
33 semantic categories, 3 geometric categories
Learn both label spaces jointly
 learning and inference have similar performance and
computation as independent models
24
is early-fusion of the
HAisthedepth embed-
ght above ground, and
th the inferred gravity
ned late fusion model
ean
c.
mean
IU
f.w.
IU
28.6 47.0
2 29.2 43.9
4 30.5 45.5
2 24.2 40.4
9 32.8 48.0
1 34.0 49.5
D images, with pixel-
Table 5. Results on SIFT Flow10
with class segmentation
(center) and geometric segmentation (right). Tighe [33] is
a non-parametric transfer method. Tighe 1 is an exemplar
SVM while 2 is SVM + MRF. Farabet is a multi-scale con-
vnet trained onclass-balanced samples(1) or natural frequency
samples (2). Pinheiro is a multi-scale, recurrent convnet, de-
noted RCNN3 (◦ 3
). Themetric for geometry ispixel accuracy.
pixel
acc.
mean
acc.
mean
IU
f.w.
IU
geom.
acc.
Liu et al. [23] 76.7 - - - -
Tigheet al. [33] - - - - 90.8
Tigheet al. [34] 1 75.6 41.1 - - -
Tigheet al. [34] 2 78.6 39.2 - - -
Farabet et al. [8] 1 72.3 50.8 - - -
Farabet et al. [8] 2 78.5 29.6 - - -
Pinheiro et al. [28] 77.7 29.8 - - -
FCN-16s 85.2 51.7 39.5 76.1 94.3
FCN SDS* Truth Input
25
Relative to prior state-of-the-
art SDS:
- 20% relative
improvement
for mean IoU
- 286× faster
*Simultaneous Detection and Segmentation
Hariharan et al. ECCV14
Slide credit: Jonathan Long
leaderboard
== segmentation with Caffe
26
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
FCN
Slide credit: Jonathan Long
conclusion
fully convolutional networks are fast, end-
to-end models for pixelwise problems
- code in Caffe branch (merged soon)
- models for PASCAL VOC, NYUDv2,
SIFT Flow, PASCAL-Context
27
caffe.berkeleyvision.org
github.com/BVLC/caffe
fcn.berkeleyvision.org
Slide credit: Jonathan Long

More Related Content

Similar to fully convolutional networks for semantic segmentation

DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
cscpconf
 
A Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model MatchingA Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model Matching
rafi
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image Enhancement
Sean Moran
 

Similar to fully convolutional networks for semantic segmentation (20)

A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Colored inversion
Colored inversionColored inversion
Colored inversion
 
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
 
A Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model MatchingA Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model Matching
 
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Single image super resolution with improved wavelet interpolation and iterati...
Single image super resolution with improved wavelet interpolation and iterati...Single image super resolution with improved wavelet interpolation and iterati...
Single image super resolution with improved wavelet interpolation and iterati...
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image Enhancement
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
Sub1539
Sub1539Sub1539
Sub1539
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
 
RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...
RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...
RADIAL BASIS FUNCTION PROCESS NEURAL NETWORK TRAINING BASED ON GENERALIZED FR...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 

fully convolutional networks for semantic segmentation

  • 1. UC Berkeley Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell 1 Presented by: Gordon Christie Slide credit: Jonathan Long
  • 2. Overview • Reinterpret standard classification convnets as “Fully convolutional” networks (FCN) for semantic segmentation • Use AlexNet, VGG, and GoogleNet in experiments • Novel architecture: combine information from different layers for segmentation • State-of-the-art segmentation for PASCAL VOC 2011/2012, NYUDv2, and SIFT Flow at the time • Inference less than one fifth of a second for a typical image 2 Slide credit: Jonathan Long
  • 3. pixels in, pixels out monocular depth estimation (Liu et al. 2015) boundary prediction (Xie & Tu 2015) semantic segmentation 3 Slide credit: Jonathan Long
  • 4. 4 “tabby cat” 1000-dim vector < 1 millisecond convnets perform classification end-to-end learning Slide credit: Jonathan Long
  • 5. R-CNN 5 many seconds “cat” “dog” R-CNN does detection Slide credit: Jonathan Long
  • 6. 6 R-CNN figure: Girshick et al. Slide credit: Jonathan Long
  • 7. 7 < 1/5 second end-to-end learning ??? Slide credit: Jonathan Long
  • 8. “tabby cat” 8 a classification network Slide credit: Jonathan Long
  • 9. 9 becoming fully convolutional Slide credit: Jonathan Long
  • 10. 10 becoming fully convolutional Slide credit: Jonathan Long
  • 12. conv, pool, nonlinearity upsampling pixelwise output + loss end-to-end, pixels-to-pixels network 12 Slide credit: Jonathan Long
  • 13. Dense Predictions • Shift-and-stitch: trick that yields dense predictions without interpolation • Upsampling via deconvolution • Shift-and-stitch used in preliminary experiments, but not included in final model • Upsampling found to be more effective and efficient 13
  • 14. Classifier to Dense FCN • Convolutionalize proven classification architectures: AlexNet, VGG, and GoogLeNet (reimplementation) • Remove classification layer and convert all fully connected layers to convolutions • Append 1x1 convolution with channel dimensions and predict scores at each of the coarse output locations (21 categories + background for PASCAL) 14
  • 15. Classifier to Dense FCN Cast ILSVRC classifiers into FCNs and compare performance on validation set of PASCAL 2011 15
  • 16. spectrum of deep features combine where (local, shallow) with what (global, deep) fuse features into deep jet (cf. Hariharan et al. CVPR15 “hypercolumn”) 16 Slide credit: Jonathan Long
  • 17. skip layers interp + sum interp + sum dense output 17 end-to-end, joint learning of semantics and location Slide credit: Jonathan Long
  • 19. Comparison of skip FCNs 19 Results on subset of validation set of PASCAL VOC 2011
  • 20. stride 32 no skips stride 16 1 skip stride 8 2 skips ground truth input image skip layer refinement 20 Slide credit: Jonathan Long
  • 21. training + testing - train full image at a time without patch sampling - reshape network to take input of any size - forward time is ~150ms for 500 x 500 x 21 output 21 Slide credit: Jonathan Long
  • 22. Results – PASCAL VOC 2011/12 VOC 2011: 8498 training images (from additional labeled data 22
  • 23. Results – NYUDv2 23 Table 4. Results on NYUDv2. RGBD is early-fusion of the RGB anddepth channelsat theinput. HHAisthedepthembed- ding of [14] as horizontal disparity, height above ground, and the angle of the local surface normal with the inferred gravity direction. RGB-HHA is the jointly trained late fusion model that sums RGB and HHA predictions. pixel acc. mean acc. mean IU f.w. IU Gupta et al. [14] 60.3 - 28.6 47.0 FCN-32s RGB 60.0 42.2 29.2 43.9 FCN-32s RGBD 61.5 42.4 30.5 45.5 FCN-32s HHA 57.1 35.2 24.2 40.4 FCN-32s RGB-HHA 64.3 44.9 32.8 48.0 FCN-16s RGB-HHA 65.4 46.1 34.0 49.5 Table 5. (center) a a non-pa SVM wh vnet train samples ( noted RC L Tigh Tighe Tighe Farabe Farabe 1449 RGB-D images with pixelwise labels  40 categories
  • 24. Results – SIFT Flow 2688 images with pixel labels 33 semantic categories, 3 geometric categories Learn both label spaces jointly  learning and inference have similar performance and computation as independent models 24 is early-fusion of the HAisthedepth embed- ght above ground, and th the inferred gravity ned late fusion model ean c. mean IU f.w. IU 28.6 47.0 2 29.2 43.9 4 30.5 45.5 2 24.2 40.4 9 32.8 48.0 1 34.0 49.5 D images, with pixel- Table 5. Results on SIFT Flow10 with class segmentation (center) and geometric segmentation (right). Tighe [33] is a non-parametric transfer method. Tighe 1 is an exemplar SVM while 2 is SVM + MRF. Farabet is a multi-scale con- vnet trained onclass-balanced samples(1) or natural frequency samples (2). Pinheiro is a multi-scale, recurrent convnet, de- noted RCNN3 (◦ 3 ). Themetric for geometry ispixel accuracy. pixel acc. mean acc. mean IU f.w. IU geom. acc. Liu et al. [23] 76.7 - - - - Tigheet al. [33] - - - - 90.8 Tigheet al. [34] 1 75.6 41.1 - - - Tigheet al. [34] 2 78.6 39.2 - - - Farabet et al. [8] 1 72.3 50.8 - - - Farabet et al. [8] 2 78.5 29.6 - - - Pinheiro et al. [28] 77.7 29.8 - - - FCN-16s 85.2 51.7 39.5 76.1 94.3
  • 25. FCN SDS* Truth Input 25 Relative to prior state-of-the- art SDS: - 20% relative improvement for mean IoU - 286× faster *Simultaneous Detection and Segmentation Hariharan et al. ECCV14 Slide credit: Jonathan Long
  • 26. leaderboard == segmentation with Caffe 26 FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN FCN Slide credit: Jonathan Long
  • 27. conclusion fully convolutional networks are fast, end- to-end models for pixelwise problems - code in Caffe branch (merged soon) - models for PASCAL VOC, NYUDv2, SIFT Flow, PASCAL-Context 27 caffe.berkeleyvision.org github.com/BVLC/caffe fcn.berkeleyvision.org Slide credit: Jonathan Long

Editor's Notes

  1. Goal of work is to use FCn to predict class at every pixel Transfer existing classification models to dense prediction tasks
  2. Note that using existing networks is transfer learning
  3. note omissions “activations” fixed size input, single label output desire: efficient per-pixel output
  4. “Final layer deconvolutional filters are fixed to bilinear inter- polation, while intermediate upsampling layers are initial- ized to bilinear upsampling” Changing only the filters and layer strides of a convnet can produce the same output as this shift-and-stitch trick.
  5. “Despite similar classification accuracy, our implementation of GoogLeNet did not match this segmentation result.”
  6. THESE ARE VAL NUMBERS. Just begun and they are already state of the art They initialize using the classification models trained on imagenet Train with per-pixel multinomial loss and validate with mean intersection over union
  7. “Max fusion made learning difficult due to gradient switching.” Decreasing the stride of pooling layers is the most straightforward way to obtain finer predictions. However, doing so is problematic for our VGG16-based net. Setting the pool5 layer to have stride 1 requires our convolutionalized fc6 to have a kernel size of 14 × 14 in order to maintain its receptive field size. In addi- tion to their computational cost, we had difficulty learning such large filters. We made an attempt to re-architect the layers above pool5 with smaller filters, but were not suc- cessful in achieving comparable performance; one possible explanation is that the initialization from ImageNet-trained weights in the upper layers is important.
  8. Fixed = only fine tuning in final layer
  9. For following 3 results, dropout was used when used in original network SDS: MCG proposals, feature extraction, SVM to classify, region refinement
  10. Gupta: region proposals (using depth and rgb), deep features for depth and rgb, svm classifier, segmentation Gupta et all encode depth differently (surface normals and height from ground included) RGBD (early fusion) little improvement, perhaps difficult to propogate meaningful gradients through model To add depth information, we train on a model upgraded to take four-channel RGB-D input (early fusion)
  11. Semantic: bridge, mountain, sun, etc Geometric: horizontal, vertical, sky Farabet: multi-scale convnet, averaging class predictions across superpixels Pinheiro: patch based learning using multiple scales with rcnns
  12. + NYUD net for multi-modal input and SIFT Flow net for multi-task output
  13. Many segmentation methods powered by Caffe, most FCNs