Torr Vision Group, Engineering Department
Semantic Image
Segmentation with
Deep Learning
Sadeep Jayasumana
07/10/2015
Collaborators:
Bernardino Romera-Paredes
Shuai Zheng
Philip Torr
Torr Vision Group, Engineering Department
Live Demo - http://crfasrnn.torr.vision/
Torr Vision Group, Engineering Department
Outline
• Semantic segmentation
• Why?
• CNNs for pixel-wise prediction
• CRFs
• CRF as RNN
• Conclusion
Torr Vision Group, Engineering Department
Semantic Segmentation
• Recognizing and delineating objects in an image → classifying each pixel in the image.
Torr Vision Group, Engineering Department
Why Semantic Segmentation?
• To help partially sighted people by highlighting
important objects in their glasses
Torr Vision Group, Engineering Department
Why Semantic Segmentation?
• To let robots segment objects so that they can grasp
them
Torr Vision Group, Engineering Department
• Road scene understanding
• Useful for autonomous navigation of cars and
drones
Image taken from the Cityscapes dataset.
Why Semantic Segmentation?
Torr Vision Group, Engineering Department
• Useful tool for editing images
Why Semantic Segmentation?
Torr Vision Group, Engineering Department
• Medical purposes: e.g. segmenting tumours, dental cavities, ...
Image credits: Mauricio Reyes; ISBI Challenge 2015, dental X-ray images
Why Semantic Segmentation?
Torr Vision Group, Engineering Department
But How?
• Deep convolutional neural networks are successful at learning good representations of visual inputs.
• However, here we have a structured output: a label for every pixel.
Torr Vision Group, Engineering Department
CNN for Pixel-wise Labelling
• Usual convolutional networks
Torr Vision Group, Engineering Department
CNN for Pixel-wise Labelling
• Usual convolutional networks
• Fully convolutional networks
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
Torr Vision Group, Engineering Department
Fully Convolutional Networks
[Long et al., CVPR 2015]
Torr Vision Group, Engineering Department
+ Significantly improved the state of the art in semantic
segmentation.
- Poor object delineation: e.g. spatial consistency is neglected.
Fully Convolutional Networks
[Long et al., CVPR 2015]
Image | FCN result | Ground truth
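To make the pixel-wise prediction concrete, here is a minimal sketch (my own illustration, not the authors' code) of the last step of an FCN-style head: a coarse per-class score map is upsampled back to the input resolution and each pixel takes the highest-scoring class. The 21-class setting and the stride of 8 are assumptions for the example.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_and_label(coarse_scores, stride=8):
    """coarse_scores: (num_classes, h, w) class scores at the network's output resolution."""
    # FCNs learn this upsampling as a (de)convolution layer initialised to
    # bilinear interpolation; here we simply interpolate each class plane.
    full_res = np.stack([zoom(plane, stride, order=1) for plane in coarse_scores])
    return full_res.argmax(axis=0)  # per-pixel label map, shape (h*stride, w*stride)

# Example: 21 PASCAL VOC classes on a 32x32 coarse grid -> 256x256 label map.
labels = upsample_and_label(np.random.randn(21, 32, 32), stride=8)
print(labels.shape)  # (256, 256)
```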
Torr Vision Group, Engineering Department
• A CRF can account for contextual information in the
image
Conditional Random Fields (CRFs)
Coarse output from the pixel-wise classifier → MRF/CRF modelling → output after CRF inference
Torr Vision Group, Engineering Department
Conditional Random Fields (CRFs)
Xi ∈ {bg, cat, tree, person, …} (e.g. Xi = cat for a cat pixel, Xi = bg for a background pixel)
• Define a discrete random variable Xi for each pixel i.
• Each Xi can take a value from the label set.
• Connect random variables to form a random field. (MRF)
• Most probable assignment given the image → segmentation.
Torr Vision Group, Engineering Department
Finding the Best Assignment
Pr(X1 = x1, X2 = x2, …, Xn = xn | I) = Pr(X = x | I)
Pr(X = x | I) = (1/Z(I)) exp(−E(x | I))
• Maximize Pr(X = x | I) → Minimize E(x | I)
• So we have formulated the problem as an energy minimization.
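Written out in full (my transcription of the standard formulation, with X = (X1, …, Xn) the labelling, I the image, and Z(I) the partition function):

```latex
\Pr(\mathbf{X}=\mathbf{x}\mid \mathbf{I})
  = \frac{1}{Z(\mathbf{I})}\exp\!\bigl(-E(\mathbf{x}\mid \mathbf{I})\bigr),
\qquad
Z(\mathbf{I}) = \sum_{\mathbf{x}'}\exp\!\bigl(-E(\mathbf{x}'\mid \mathbf{I})\bigr)

\mathbf{x}^{\star}
  = \arg\max_{\mathbf{x}} \Pr(\mathbf{X}=\mathbf{x}\mid \mathbf{I})
  = \arg\min_{\mathbf{x}} E(\mathbf{x}\mid \mathbf{I})
```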
Torr Vision Group, Engineering Department
E(x | I) = Σi ψ_unary(xi) + Σi<j ψ_pairwise(xi, xj)

Unary energy
• ψ_unary(xi = cat) = ?
• Your label doesn't agree with the initial classifier → you pay a penalty.

Pairwise energy
• ψ_pairwise(xi = cat, xj = tree) = ?
• You assign different labels to two very similar pixels → you pay a penalty.
• How do you measure similarity? (See the sketch below.)
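As a concrete (and deliberately simplified) answer to the similarity question: dense-CRF models typically measure it with Gaussian kernels on pixel position and colour, as in Krähenbühl & Koltun. The sketch below is my own illustration; the kernel weights and bandwidths are hypothetical values, not ones taken from the talk.

```python
import numpy as np

def pairwise_similarity(pos_i, pos_j, rgb_i, rgb_j,
                        theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0,
                        w_appearance=5.0, w_smoothness=3.0):
    """Similarity k(i, j) between two pixels, from their positions and colours."""
    d_pos = np.sum((np.asarray(pos_i, float) - np.asarray(pos_j, float)) ** 2)
    d_rgb = np.sum((np.asarray(rgb_i, float) - np.asarray(rgb_j, float)) ** 2)
    # Appearance kernel: nearby pixels with similar colour are similar.
    appearance = w_appearance * np.exp(-d_pos / (2 * theta_alpha ** 2)
                                       - d_rgb / (2 * theta_beta ** 2))
    # Smoothness kernel: nearby pixels are similar regardless of colour.
    smoothness = w_smoothness * np.exp(-d_pos / (2 * theta_gamma ** 2))
    return appearance + smoothness

# Two nearby, similarly coloured pixels -> high similarity, so assigning them
# different labels incurs a large pairwise penalty.
print(pairwise_similarity((10, 10), (12, 11), (200, 30, 30), (198, 35, 28)))
```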
Torr Vision Group, Engineering Department
Dense CRF Formulation
• Pairwise energies are defined for every pixel pair in the image:
E(x) = Σi ψ_u(xi) + Σi<j ψ_p(xi, xj)
• Exact inference is not feasible.
• Use approximate mean-field inference: approximate the true distribution P(X) = (1/Z) exp(−E(x)) by a factorized distribution Q(X) = Πi Qi(Xi).
[Krähenbühl & Koltun, NIPS 2011]
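For completeness, the dense-CRF energy referred to here has the standard form from Krähenbühl & Koltun (my transcription; f_i denotes the feature vector of pixel i, e.g. its position and colour, and μ is the label-compatibility function):

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j)\sum_{m} w^{(m)} k^{(m)}(\mathbf{f}_i, \mathbf{f}_j)

P(\mathbf{X}) = \frac{1}{Z}\exp\bigl(-E(\mathbf{x})\bigr)
\;\approx\; Q(\mathbf{X}) = \prod_i Q_i(X_i)
```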
Torr Vision Group, Engineering Department
Fully Connected CRFs as a CNN
One mean-field iteration can be expressed as a stack of CNN-style blocks (the diagram built up over these slides): the current marginal estimates Q, the image I, and the unary potentials U flow through
Bilateral filtering → Conv → Conv → + (add unaries) → SoftMax,
i.e. message passing by bilateral filtering, two convolutions (weighting the filter outputs and applying the label-compatibility transform), adding the unary potentials, and a SoftMax normalization.
Torr Vision Group, Engineering Department
CRF as a Recurrent Neural Network
• Each of these blocks is differentiable → we can backprop through a full mean-field iteration.
Torr Vision Group, Engineering Department
CRF as a Recurrent Neural Network
• Stacking the iterations: the image and the unary potentials feed a SoftMax initialization and a CRF-iteration block that is applied repeatedly with the same parameters, producing the output labelling.
• Repeating a differentiable block with shared weights is exactly a recurrent neural network: CRF as RNN (a minimal sketch of the loop follows below).
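Below is a minimal sketch, in plain NumPy, of the loop just described. It is my own simplification rather than the released implementation: message passing uses an explicit N×N similarity matrix (e.g. built with the pairwise_similarity sketch above), whereas the real system uses fast bilateral filtering, and the two convolution steps are collapsed into a single learnable label-compatibility matrix.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_iteration(Q, unary, kernel, compatibility):
    """One iteration of the blocks on the slides.
    Q, unary: (num_labels, num_pixels) arrays; kernel: (num_pixels, num_pixels)
    pixel-similarity matrix; compatibility: (num_labels, num_labels) matrix."""
    messages = Q @ kernel                       # message passing / bilateral filtering
    pairwise = compatibility @ messages         # compatibility transform ("Conv")
    return softmax(-unary - pairwise, axis=0)   # add the unaries, then SoftMax

def crf_as_rnn(unary, kernel, compatibility, num_iterations=5):
    Q = softmax(-unary, axis=0)                 # initialise from the unary potentials
    for _ in range(num_iterations):             # recurrence with shared parameters
        Q = mean_field_iteration(Q, unary, kernel, compatibility)
    return Q                                    # approximate per-pixel marginals
```

Because every step is an ordinary differentiable array operation, gradients can flow from the output Q back into the compatibility parameters and into the unaries produced by the FCN, which is what makes end-to-end training of the whole pipeline possible.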
Torr Vision Group, Engineering Department
Putting Things Together
FCN → CRF-RNN
Torr Vision Group, Engineering Department
Experiments
• FCN [Long et al., 2014]: 68.3
• FCN + CRF [Chen et al., 2015]: 69.5
• FCN + CRF-RNN (Ours): 72.9
Torr Vision Group, Engineering Department
Try our demo: http://crfasrnn.torr.vision
Code & model: https://github.com/torrvision/crfasrnn
Shuai Zheng
Bernardino
Romera-Paredes
Philip Torr
Torr Vision Group, Engineering Department
Examples
http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg
Torr Vision Group, Engineering Department
Examples
http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg
Torr Vision Group, Engineering Department
Examples
http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg
Torr Vision Group, Engineering Department
Examples
http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg
Torr Vision Group, Engineering Department
Not-so-good examples
http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg
Torr Vision Group, Engineering Department
http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500
Not-so-good examples
Torr Vision Group, Engineering Department
Tricky examples
http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png
Torr Vision Group, Engineering Department
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg
Tricky examples
Torr Vision Group, Engineering Department
http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg
Tricky examples
Torr Vision Group, Engineering Department
Conclusion
• CNNs yield a coarse prediction on pixel-labelling tasks.
• CRFs improve the result by accounting for contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.
CNN → CRF
Thank You!