DeconvNet, DecoupledNet,
TransferNet in Image Segmentation
NamHyuk Ahn @ Ajou Univ.
2016. 05. 11
Contents
- Semantic Segmentation
- Deconvolution Network for Supervised Learning
- Decoupled Network for Semi-Supervised Learning
- Transfer Learning in Semantic Segmentation
Semantic Segmentation
Semantic Segmentation
- Predict a class label for every pixel in the image
[Shotton et al., 2007]
Datasets
PASCAL VOC
- 20 classes
- 12K training / 1K test images

MS COCO
- 91 classes
- 120K training / 40K test images
Deconvolution Network for Supervised Learning
Problems of FCN
- FCN handles only single-scale semantics, since it has a fixed-size receptive field
- The output label map is small, so detailed structures of objects tend to be lost
DeconvNet
- To address these issues, the authors use “deconvolution”
- The convolution network extracts features (VGG-16 net)
- The deconvolution network generates a probability map (same size as the input image)
- The probability map indicates the probability that each pixel belongs to each of the classes
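As a rough illustration of how a per-pixel probability map turns into a segmentation, here is a minimal NumPy sketch. It assumes the network's final layer yields a `(num_classes, H, W)` score volume; the function names, the random scores, and the choice of 21 channels (20 classes plus background) are assumptions made for the example, not details taken from the slides.

```python
import numpy as np

def pixelwise_probabilities(scores):
    """Softmax over the class axis of a (num_classes, H, W) score volume."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def label_map(prob):
    """Pick the most probable class at every pixel."""
    return prob.argmax(axis=0)

# Hypothetical scores: 21 channels (assumed 20 classes + background), 4x6 image.
rng = np.random.default_rng(0)
scores = rng.normal(size=(21, 4, 6))
prob = pixelwise_probabilities(scores)
labels = label_map(prob)
```

Each pixel's probabilities sum to one, and the label map has the same spatial size as the input scores.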
Deconvolution Network
- Unpooling
• Reconstructs the structure of the original activation map
• The original activation size is restored, but the map is still sparse
- Deconvolution
• Densifies the sparse (enlarged) activation map
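The two operations can be sketched in plain NumPy as follows. This is a simplified single-channel illustration, not the authors' implementation: it assumes that pooling records the argmax position in each window and that unpooling writes values back to those positions, and it uses a stride-1 transposed convolution without padding. All function names are hypothetical.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling (stride k) that also records where each max came from."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0
    # Flatten every k x k window onto the last axis: shape (h//k, w//k, k*k).
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    switches = windows.argmax(axis=-1)
    pooled = windows.max(axis=-1)
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded position; other entries stay 0."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, switches] = pooled
    return (windows.reshape(ph, pw, k, k)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * k, pw * k))

def deconv2d(x, kernel):
    """Stride-1 transposed convolution: every input activation stamps a scaled kernel."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h + kh - 1, w + kw - 1), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i:i + kh, j:j + kw] += x[i, j] * kernel
    return out

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [0., 0., 6., 0.],
              [7., 0., 0., 0.]])
pooled, switches = max_pool_with_switches(x)
sparse = unpool(pooled, switches)           # same size as x, only 4 non-zeros
dense = deconv2d(sparse, np.ones((3, 3)))   # non-zeros spread over neighbours
```

The unpooled map has the original size but only one non-zero value per pooling window; the transposed convolution then spreads each value over its neighbourhood, which is the "densify" step.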
Analysis of DeconvNet
- DeconvNet is better at segmentation since it produces a dense, enlarged pixel-wise map
- Shallow layers tend to capture the overall structure of an object (shape, region, position), while deeper layers capture more complicated patterns
- Unpooling captures example-specific structure, so it can reconstruct object details at higher resolution
- Deconvolution captures class-specific shape, so activations closely related to the target class are amplified and noisy activations are suppressed
Analysis of DeconvNet
More details of DeconvNet
- Instance-wise segmentation
- Use batch normalization in both networks
- Two-stage training
- Ensemble with FCN
• FCN and DeconvNet are complementary
• The ensemble gives the best result
Instance-wise Segmentation
- Feed proposal instances to the network (not the entire image)
- Obtain proposal instances using the EdgeBox algorithm
- Multiple scales help identify finer details of objects
- The search space is reduced, which reduces memory use during training
Two-stage Training
- DeconvNet has many parameters, but little segmentation data is available (10K in PASCAL VOC)
• Use two-stage training to address this issue
• First stage: input center-cropped images
• Second stage: input proposal sub-images
- As a result, the network generalizes better
Result
- 2nd best among methods trained only on PASCAL VOC
- Note: the paper reports a mean IoU of 72.5, but the presentation files report 74.8
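For readers unfamiliar with the metric, mean IoU can be sketched as below. This is a simplified per-image version for illustration only; benchmark evaluation typically accumulates intersections and unions over the whole test set, and the function name is an assumption.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Average intersection-over-union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0],
                 [1, 1]])
gt = np.array([[0, 1],
               [1, 1]])
score = mean_iou(pred, gt, num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.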
Qualitative Example
Recap
- Produces dense, precise segmentation masks through coarse-to-fine reconstruction
- With instance-wise segmentation, it can handle object scale variation
- But it has many parameters (almost 2x VGG-16), so an additional training stage is needed
Decoupled Network for Semi-Supervised Learning
Motivation
- Creating ground-truth segmentation is costly, so the problem is treated as semi-supervised learning
- Use many image-level annotations and few pixel-level annotations
- Modify DeconvNet
- With little data (25 per class), it achieves a good result (62.5 mean IoU)
Main idea
- Semantic segmentation can be decomposed into multi-label classification and binary segmentation
[Figure: semantic segmentation = multi-label classification (Person, Bottle) + binary segmentation]
Overview
- Classification network for multi-label classification
- Segmentation network for binary segmentation
- Bridging layers for delivering class-specific
information to segmentation network
Architecture
- Classification Network (Same as VGG-16)
- Segmentation Network
• Takes the class-specific activation map from the bridging layers and performs binary segmentation (the main difference from DeconvNet)
• Binary segmentation reduces the number of parameters, so the network can be trained with few pixel-wise annotations
Architecture
- Bridging Layers
• The segmentation network needs class-specific and spatial information to produce a class-specific segmentation mask
• Spatial information comes from pool5 in the classification network
• The pool5 output has useful information for shape generation, but it mixes information from all relevant labels → class-specific activations must be identified
• A saliency map is used to identify class-specific activations
Architecture
- Saliency Map
1. Compute the score vector, then set its gradient (dscore) to all 0 except for 1 at the index of the label to track
2. Backpropagate to an arbitrary layer (pool5 in this paper)
- The saliency map gives class-specific information for each label (class)
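The two steps can be sketched on a toy network. The example below is only an illustration under stated assumptions: a tiny two-layer ReLU network stands in for the layers above pool5, the vector `f` stands in for the pool5 activation, and all names and sizes are hypothetical.

```python
import numpy as np

def class_saliency(f, W1, W2, label):
    """Backpropagate a one-hot score gradient to the input feature f."""
    # Forward pass of a toy network: scores = W2 @ relu(W1 @ f).
    h_pre = W1 @ f
    h = np.maximum(h_pre, 0.0)
    scores = W2 @ h
    # Step 1: gradient on the scores is all 0 except 1 at the tracked label.
    dscore = np.zeros_like(scores)
    dscore[label] = 1.0
    # Step 2: backpropagate down to the chosen layer (here, the input f).
    dh = W2.T @ dscore
    dh_pre = dh * (h_pre > 0)
    df = W1.T @ dh_pre
    return scores, df

rng = np.random.default_rng(0)
f = rng.normal(size=6)          # stand-in for a flattened pool5 activation
W1 = rng.normal(size=(5, 6))
W2 = rng.normal(size=(3, 5))
scores, saliency = class_saliency(f, W1, W2, label=1)
```

The returned `saliency` has the same shape as `f` and equals the derivative of the tracked class score with respect to each entry of `f`.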
Qualitative example of saliency map [Karen Simonyan et al., 2014]
Architecture
- Bridging Layers
• Combine the spatial activation (pool5) and the class-specific saliency map to produce a class-specific activation map g
• Pass it through an fc layer and feed it to the segmentation network
• g has both spatial and class-specific information
Inference
- Compute a segmentation map for each identified label
- Aggregate the per-label segmentation maps M pixel-wise
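One plausible way to aggregate per-label binary maps into a single label map is sketched below. The slides do not spell out the aggregation rule, so the per-pixel maximum and the background threshold used here are assumptions, as are the function name and the example class ids.

```python
import numpy as np

def aggregate_masks(fg_probs, labels, threshold=0.5):
    """Merge K foreground probability maps (K, H, W) into one (H, W) label map.

    Assumed rule: each pixel takes the label with the highest foreground
    probability, or 0 (background) if no label reaches the threshold.
    """
    fg_probs = np.asarray(fg_probs, dtype=float)
    best = fg_probs.argmax(axis=0)
    best_prob = fg_probs.max(axis=0)
    out = np.asarray(labels)[best]   # fancy indexing returns a fresh array
    out[best_prob < threshold] = 0
    return out

# Hypothetical class ids for the two labels the classifier identified.
labels = [15, 5]
fg_probs = [
    [[0.9, 0.2], [0.6, 0.1]],   # binary segmentation for label 15
    [[0.1, 0.8], [0.7, 0.3]],   # binary segmentation for label 5
]
segmentation = aggregate_masks(fg_probs, labels)
```

With these inputs the bottom-right pixel falls below the threshold for both labels and is assigned to background.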
Training
- Train the classification network with many image-level annotations
- Train the segmentation network and bridging layers with few pixel-level annotations
Result
Qualitative Example
Recap
- Uses many image-level annotations and few pixel-level annotations
- Adds bridging layers to DeconvNet for binary segmentation, which reduces the number of parameters
- The bridging layers output both spatial and class-specific information for each class (label)
- The two networks are trained separately (decoupled)
• Performance is worse under full supervision, where joint optimization is more desirable
- With few strongly annotated examples (25 per class), it achieves a good result (62.5 mean IoU)
Transfer Learning in Semantic Segmentation
Motivation
- Pre-train a network on one dataset and run inference on a new dataset (e.g. train on MS COCO, run inference on PASCAL VOC)
- This idea doesn’t work well with DecoupledNet
• DecoupledNet is trained with class-specific input, so it cannot generalize to new classes
• Train the network with class-independent input instead!
Overview
- The attention model identifies the salient region of each class associated with the input image
• The output of the attention model holds location information for each class in a coarse feature map
- The encoder extracts features; the decoder generates a dense foreground segmentation mask for each focused region
- Training stage
• Fix the (pre-trained) encoder and train the decoder and attention model using pixel-level annotations from the source domain
• Train the attention model using image-level annotations from both domains
- After training, the decoder has been trained on the source domain and the attention model on both domains, so the attention is adapted to the target domain
Overview
- The decoupled encoder-decoder makes it possible to share shape-generation information among different classes
- The attention model provides
• Predictions for localization
• Class-specific information → allows the decoder to be adapted to the target domain
- With the attention model, the network obtains information that is transferable across domains and provides a useful segmentation prior
Architecture
- Encoder
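• Extracts a feature descriptor $A \in \mathbb{R}^{M \times D}$; A is obtained from the last conv layer to retain spatial information
• M and D are the number of hidden units (20x20) and the number of channels, respectively
- Attention model
• Trains a weight vector $\alpha^l \in \mathbb{R}^M$, where $\alpha_i^l$ represents the relevance of location i to class l
• Formally, $\alpha^l$ is normalized over locations with a softmax: $\alpha_i^l = \exp(v_i^l) / \sum_j \exp(v_j^l)$, where $v^l$ is the unnormalized score
• An additional technique from [R. Memisevic, 2013] is used to reduce the number of parameters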
Architecture
- Attention model
• To apply attention in this model, it has to be trainable in both domains
• Add additional layers on top of the attention model, and train both the attention model and the added layers under a classification objective
• Finally, the attention-weighted sum of A over locations gives z, which represents the class-specific feature
• z can be optimized using weak annotations from both domains

• Example of attention
Architecture
- Decoder
• The output of the attention model is sparse due to the softmax, so it may lose information needed for shape generation
• Feed A as an additional input and multiply it with z → densified attention
• With densified attention, optimize the segmentation loss; the procedure is the same as in DecoupledNet, but the decoder is optimized only on the source domain
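The attention and densification steps can be sketched numerically. This is a loose illustration under stated assumptions, not the paper's exact parameterization: the unnormalized score is taken to be a bilinear form `A @ W @ y` with a one-hot label `y`, and the names `W`, `y`, and the sizes of D and the label set are hypothetical. Only M = 20x20 comes from the slides.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def attention_features(A, W, y):
    """Return attention weights, class-specific feature and densified attention."""
    v = A @ W @ y        # (M,) unnormalized relevance per location (assumed form)
    alpha = softmax(v)   # (M,) sums to 1, typically peaked on few locations
    z = A.T @ alpha      # (D,) attention-weighted sum of location features
    s = A @ z            # (M,) densified attention: A multiplied with z
    return alpha, z, s

rng = np.random.default_rng(0)
M, D, num_classes = 20 * 20, 8, 5     # D and num_classes are made-up sizes
A = rng.normal(size=(M, D))           # stand-in for encoder features
W = rng.normal(size=(D, num_classes)) # hypothetical attention weights
y = np.eye(num_classes)[2]            # one-hot vector for the queried class
alpha, z, s = attention_features(A, W, y)
```

Because `alpha` comes from a softmax it tends to concentrate on a few locations, whereas `s` scores every location by its similarity to `z`, which gives the decoder a denser input.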
Analysis of TransferNet
- The decoder generates a foreground segmentation from the attention for each label
- By decoupling classification (a domain-specific task), it can capture class-independent information for shape generation and apply it to unseen classes
- The attention model is trained using not only pixel-level but also image-level annotations, so it can handle unseen classes
• In DecoupledNet, the bridging layers are trained only on pixel-level data

Train / Inference
- Training: optimize a joint objective
• Training with class labels only works, but joint training with segmentation labels regularizes the noise
• After training, remove the additional classification layers, since they are required only in training to learn attention from the target domain
- Inference
1. Iteratively obtain the attention and segmentation mask for each label
2. Aggregate the masks (same as DecoupledNet)
Result
Qualitative Example
Reference
- Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. “Learning
deconvolution network for semantic segmentation.” Proceedings of the
IEEE International Conference on Computer Vision. 2015.
- Seunghoon Hong, Hyeonwoo Noh, and Bohyung Han. “Decoupled deep
neural network for semi-supervised semantic segmentation.” Advances in
Neural Information Processing Systems. 2015.
- Seunghoon Hong, et al. “Learning Transferrable Knowledge for Semantic
Segmentation with Deep Convolutional Neural Network.” arXiv preprint
arXiv:1512.07928 (2015).
- Hyeonwoo Noh. “Semantic Segmentation and Visual Question Answering”
(https://drive.google.com/file/d/0B5xl2L77gZfVRXZxQWNmSGlBemc/view)

More Related Content

What's hot

Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsMathias Niepert
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionKai-Wen Zhao
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...Edge AI and Vision Alliance
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesDmytro Mishkin
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Universitat Politècnica de Catalunya
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)SungminYou
 

What's hot (20)

Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Deep learning
Deep learningDeep learning
Deep learning
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
 
crfasrnn_presentation
crfasrnn_presentationcrfasrnn_presentation
crfasrnn_presentation
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person Detection
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
Cnn
CnnCnn
Cnn
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 

Similar to DeconvNet, DecoupledNet, TransferNet in Image Segmentation

Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA Taiwan
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptxManojGowdaKb
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksMarcinJedyk
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer visionMarcin Jedyk
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdfFEG
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用CHENHuiMei
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep LearningIRJET Journal
 
multi modal transformers representation generation .pptx
multi modal transformers representation generation .pptxmulti modal transformers representation generation .pptx
multi modal transformers representation generation .pptxsiddharth1729
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)FEG
 
Deep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfDeep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfAubainYro1
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonAditya Bhattacharya
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSitakanta Mishra
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptxNoorUlHaq47
 

Similar to DeconvNet, DecoupledNet, TransferNet in Image Segmentation (20)

Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
SPPNet
SPPNetSPPNet
SPPNet
 
Presentation roi
Presentation roiPresentation roi
Presentation roi
 
Dl
DlDl
Dl
 
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digitsNVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
NVIDIA 深度學習教育機構 (DLI): Medical image segmentation using digits
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptx
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep Learning
 
multi modal transformers representation generation .pptx
multi modal transformers representation generation .pptxmulti modal transformers representation generation .pptx
multi modal transformers representation generation .pptx
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
 
Deep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfDeep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdf
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
 

Recently uploaded

Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunicationnovrain7111
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliNimot Muili
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
tourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdftourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdfchess188chess188
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...gerogepatton
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfModule-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfManish Kumar
 

Recently uploaded (20)

Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question

DeconvNet, DecoupledNet, TransferNet in Image Segmentation

  • 1. DeconvNet, DecoupledNet, TransferNet in Image Segmentation NamHyuk Ahn @ Ajou Univ. 2016. 05. 11
  • 2. Contents - Semantic Segmentation - Deconvolution Network for Supervised Learning - Decoupled Network for Semi-Supervised Learning - Transfer Learning in Semantic Segmentation
  • 4. Semantic Segmentation - Predict a label for every pixel in the image [Shotton et al. 2007]
  • 5. Datasets - PASCAL VOC: 20 classes, 12K training / 1K test images - MS COCO: 91 classes, 120K training / 40K test images
  • 7. Problems of FCN - FCN handles only a single semantic scale, since it has a fixed-size receptive field - The label map is so small that it tends to lose the detailed structure of objects
  • 8. DeconvNet - To address these issues, the authors use “deconvolution” - The convolution network extracts features (VGG-16 net) - The deconvolution network generates a probability map (same size as the input image) - The probability map gives the probability that each pixel belongs to each class
  • 9. Deconvolution Network - Unpooling • Reconstructs the structure of the original activation map • The activation map is enlarged back to its original size, but it is still sparse - Deconvolution • Densifies the sparse (enlarged) activation map
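The unpooling step above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming 2x2 max pooling with stride 2 and recorded "switch" locations; the function names `max_pool_with_switches` and `unpool` are hypothetical and are not taken from the DeconvNet code. It shows why the unpooled map has the original size but is sparse: only the recorded max positions are filled.

```python
def max_pool_with_switches(x, k=2):
    """k x k max pooling (stride k) that also records where each max came from."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h, k):
        pooled_row, switch_row = [], []
        for j in range(0, w, k):
            # All (value, position) pairs inside the current pooling window.
            window = [(x[a][b], (a, b))
                      for a in range(i, i + k)
                      for b in range(j, j + k)]
            value, position = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, h, w):
    """Place each pooled value back at its recorded location; all else stays 0."""
    out = [[0.0] * w for _ in range(h)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (a, b) in zip(pooled_row, switch_row):
            out[a][b] = value
    return out


x = [[1.0, 2.0, 0.0, 3.0],
     [5.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 6.0],
     [2.0, 0.0, 7.0, 8.0]]
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches, 4, 4)
```

In the network, a learned deconvolution (transposed convolution) layer would follow to densify `restored`; that part is omitted here.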
  • 10. Analysis of DeconvNet - DeconvNet is better at segmentation since it produces a dense, enlarged pixel-wise map - Shallow layers tend to capture the overall structure of the object (shape, region, position), while deep layers capture complicated patterns - Unpooling captures example-specific structure, so it can reconstruct object details at higher resolution - Deconvolution captures class-specific shape, so activations closely related to the target class are amplified and noisy activations are suppressed
  • 12. More details of DeconvNet - Instance-wise segmentation - Batch normalization in both networks - Two-stage training - Ensemble with FCN • FCN and DeconvNet are complementary • The ensemble gives the best result
  • 13. Instance-wise Segmentation - Feed object proposals into the network (not the entire image) - Get the proposals using the EdgeBox algorithm - Identify finer object details by handling multiple scales - Reduces the search space, which reduces memory use during training
  • 14. Two-stage Training - DeconvNet has many parameters, but little segmentation data is available (10K in PASCAL VOC) • Use two-stage training to address this issue • First stage: input center-cropped images • Second stage: input proposal sub-images - So the network generalizes better
  • 15. Result - 2nd best among methods trained only on PASCAL VOC - Note: the paper reports a mean IoU of 72.5, but the presentation file reports 74.8
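For readers unfamiliar with the metric quoted above, here is a minimal sketch of mean IoU (intersection over union) on a single pair of label maps. It is a simplification: the PASCAL VOC benchmark accumulates counts over the whole test set before averaging, and the function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        intersection = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                intersection += (p == c and t == c)
                union += (p == c or t == c)
        if union:  # skip classes that appear in neither map
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 0],
        [1, 1]]
target = [[0, 1],
          [1, 1]]
score = mean_iou(pred, target, num_classes=2)  # class 0: 1/2, class 1: 2/3
```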
  • 17. Recap - Can produce dense, precise segmentation masks through coarse-to-fine reconstruction - With instance-wise segmentation, it can handle object scale variation - But it has many parameters (almost 2x VGG-16), so an additional training stage is needed
  • 18. Decoupled Network for Semi-Supervised Learning
  • 19. Motivation - Creating segmentation ground truth is costly, so use semi-supervised learning - Use many image-level annotations and few pixel-level annotations - Modify DeconvNet - With little pixel-level data (25 per class), achieves a good result (62.5 mean IoU)
  • 20. Main idea - Semantic segmentation can be decomposed into multi-label classification and binary segmentation [Figure: multi-label classification (Person, Bottle), binary segmentation, semantic segmentation]
  • 21. Overview - Classification network for multi-label classification - Segmentation network for binary segmentation - Bridging layers deliver class-specific information to the segmentation network
  • 22. Architecture - Classification network (same as VGG-16) - Segmentation network • Takes the class-specific activation map from the bridging layers and performs binary segmentation (the main difference from DeconvNet) • Binary segmentation reduces the number of parameters, so the network can be trained with few pixel-wise annotations
  • 23. Architecture - Bridging Layers • The segmentation network needs both class-specific and spatial information to produce a class-specific segmentation mask • Spatial information comes from pool5 in the classification network • The pool5 output has useful information for shape generation, but it mixes information from all relevant labels, so class-specific activations must be identified • Build a saliency map to identify the class-specific activations
  • 24. Architecture - Saliency Map 1. Compute the score vector, then set the score gradient (dscore) to 0 everywhere except 1 at the index of the label to track 2. Backpropagate to an arbitrary layer (pool5 in this paper) - The saliency map gives class-specific information for each label (class) [Figure: qualitative example of a saliency map, Karen Simonyan et al. 2014]
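The two saliency-map steps above can be sketched on a toy network. This is an illustrative example only, assuming a two-layer fully connected scorer (`W1`, ReLU, `W2`) on top of a flattened feature vector `f` that stands in for pool5; the real classification network is VGG-16, and all names here are hypothetical.

```python
def matvec(W, v):
    """Compute W v for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, v):
    """Compute W^T v, which is how a gradient flows back through W."""
    return [sum(W[i][j] * v[i] for i in range(len(W)))
            for j in range(len(W[0]))]


def class_saliency(W1, W2, f, label):
    """Gradient of one class score with respect to the feature vector f."""
    # Forward pass: features -> hidden (ReLU) -> class scores.
    pre = matvec(W1, f)
    hidden = [max(0.0, p) for p in pre]
    scores = matvec(W2, hidden)
    # Step 1: dscore is all zeros except a 1 at the label to track.
    dscore = [1.0 if k == label else 0.0 for k in range(len(scores))]
    # Step 2: backpropagate down to the feature layer.
    dhidden = matvec_t(W2, dscore)
    dpre = [g if p > 0 else 0.0 for g, p in zip(dhidden, pre)]  # ReLU gate
    return matvec_t(W1, dpre)


W1 = [[1.0, 0.0, -1.0],
      [0.0, 2.0, 0.0]]
W2 = [[1.0, 0.0],
      [0.0, 1.0]]
f = [1.0, 1.0, 3.0]
saliency_for_class_1 = class_saliency(W1, W2, f, label=1)
```

Each entry of the returned vector says how strongly the corresponding feature influences the chosen class score, which is the class-specific signal the bridging layers rely on.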
  • 25. Architecture - Bridging Layers • Combine the pool5 spatial activations and the class-specific saliency map to produce the class-specific activation map g • Pass the result through an fc layer and feed it to the segmentation network • g carries both spatial and class-specific information
  • 27. Inference - Compute a segmentation map for each identified label - Aggregate the per-label segmentation maps M pixel-wise
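The pixel-wise aggregation above can be sketched as follows. The exact rule is not given on the slide, so this assumes a simple one: each pixel takes the label whose foreground probability is highest, and falls back to background when no map reaches a threshold. The function name `aggregate_maps`, the 0.5 threshold, and the label IDs are illustrative assumptions.

```python
def aggregate_maps(maps, background=0, threshold=0.5):
    """Merge per-label foreground probability maps into one label map.

    maps: dict mapping a class label to a 2D list of foreground probabilities.
    """
    labels = list(maps)
    first = maps[labels[0]]
    h, w = len(first), len(first[0])
    out = [[background] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Label with the strongest foreground response at this pixel.
            best = max(labels, key=lambda label: maps[label][i][j])
            if maps[best][i][j] >= threshold:
                out[i][j] = best
    return out


# Two labels identified by the classification network (IDs are illustrative).
maps = {
    15: [[0.9, 0.2],
         [0.6, 0.1]],
    5: [[0.3, 0.8],
        [0.7, 0.2]],
}
segmentation = aggregate_maps(maps)
```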
  • 28. Training - Train the classification network with many image-level annotations - Train the segmentation network and bridging layers with few pixel-level annotations
  • 31. Recap - Uses many image-level annotations and few pixel-level annotations - Adds bridging layers to DeconvNet and uses binary segmentation to reduce parameters - The bridging layers output both spatial and class-specific information for each class (label) - The two networks are trained separately (decoupled) • Performs worse under full supervision, where joint optimization is more desirable - With few strongly annotated examples (25 per class), achieves a good result (62.5 mean IoU)
  • 32. Transfer Learning in Semantic Segmentation
  • 33. Motivation - Pre-train a network and run inference on a new dataset (e.g., train on MS COCO, run inference on PASCAL VOC) - This idea doesn’t work well with DecoupledNet • DecoupledNet is trained with class-specific input, so it can’t generalize to new classes • Train the network with class-independent input!
  • 34. Overview - The attention model identifies the salient region of each class associated with the input image • The output of the attention model carries location information for each class on a coarse feature map - The encoder extracts features; the decoder generates a dense foreground segmentation mask for each attended region - Training stage • Fix the (pre-trained) encoder and train the decoder and attention model using pixel-level annotations from the source domain • Train the attention model using image-level annotations from both domains - After training, the decoder has been trained on the source domain only, while the attention model has been trained on both domains, so attention is adapted to the target domain
  • 35. Overview - The decoupled encoder-decoder makes it possible to share shape-generation information across different classes - The attention model provides • Predictions for localization • Class-specific information → enables the decoder to be adapted to the target domain - With the attention model, the network obtains information that transfers across domains and provides a useful segmentation prior
  • 36. Architecture - Encoder • Extracts the feature descriptor A; A is taken from the last conv layer to retain spatial information • M and D are the number of hidden units (20x20) and the number of channels, respectively - Attention model • Learns a weight vector over locations, where each weight represents the relevance of a location to class l • Uses an extra technique from [R. Memisevic. 2013] to reduce the number of parameters
  • 37. Architecture - Attention model • For attention to be useful here, it has to be trainable in both domains • Add additional layers on top of the attention model and train both under a classification objective • The resulting z represents the class-specific feature • z can be optimized using weak annotations from both domains • [Figure: example of attention]
  • 38. Architecture - Decoder • The output of the attention model is sparse due to the softmax, so it may lose information needed for shape generation • Feed A as an additional input alongside z (multiply) → densified attention • With densified attention, optimize the segmentation loss; the procedure is the same as in DecoupledNet, but the decoder is optimized only on the source domain
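The attention and densification steps above can be sketched as follows. The slide equations were not preserved, so the exact form is an assumption: attention weights are a softmax over per-location scores, z is the attention-weighted sum of the rows of A, and the densified attention is the inner product of each row of A with z. The function names and the `scores` input (which the attention model would produce for class l) are hypothetical.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def attend(A, scores):
    """Return attention weights over the M locations and the feature z.

    A: M x D feature descriptor (one D-dimensional row per location).
    scores: M unnormalized relevance scores for one class.
    """
    alpha = softmax(scores)
    d = len(A[0])
    z = [sum(a * row[k] for a, row in zip(alpha, A)) for k in range(d)]
    return alpha, z


def densify(A, z):
    """Multiply A by z: one response per location, no longer softmax-sparse."""
    return [sum(x * y for x, y in zip(row, z)) for row in A]


A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
alpha, z = attend(A, scores=[0.0, 0.0, 0.0])
dense = densify(A, z)
```

Because `dense` is computed from A directly, locations that share features with the attended region get a response even when their softmax weight was close to zero.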
  • 39. Analysis of TransferNet - The decoder generates a foreground segmentation of the attended region for each label - By decoupling classification (a domain-specific task), the decoder can capture class-independent information for shape generation and apply it to unseen classes - Because the attention model is trained with image-level as well as pixel-level annotations, it can handle unseen classes • In DecoupledNet, the bridging layers are trained only on pixel-level data

  • 40. Train / Inference - Training optimizes a joint objective over class labels and segmentation labels • Training with class labels alone works, but jointly training with segmentation labels regularizes the noise • After training, remove the additional classification layers, since they are required only during training to learn attention from the target domain - Inference 1. Iteratively obtain the attention and segmentation mask for each label 2. Aggregate the masks (same as DecoupledNet)
  • 43. Reference - Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. “Learning deconvolution network for semantic segmentation.” Proceedings of the IEEE International Conference on Computer Vision. 2015. - Seunghoon Hong, Hyeonwoo Noh, and Bohyung Han. “Decoupled deep neural network for semi-supervised semantic segmentation.” Advances in Neural Information Processing Systems. 2015. - Seunghoon Hong, et al. “Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network.” arXiv preprint arXiv:1512.07928 (2015). - Hyeonwoo Noh. “Semantic Segmentation and Visual Question Answering.” (https://drive.google.com/file/d/0B5xl2L77gZfVRXZxQWNmSGlBemc/view)