1. Torr Vision Group, Engineering Department
Semantic Image Segmentation with Deep Learning
Sadeep Jayasumana
07/10/2015
Collaborators:
Bernardino Romera-Paredes
Shuai Zheng
Philip Torr
2. Live Demo - http://crfasrnn.torr.vision/
3. Outline
Semantic segmentation
Why?
CNNs for pixel-wise prediction
CRFs
CRF as RNN
Conclusion
4. Semantic Segmentation
• Recognizing and delineating objects in an image
• Classifying each pixel in the image
5. Why Semantic Segmentation?
• To help partially sighted people by highlighting
important objects in their glasses
6. Why Semantic Segmentation?
• To let robots segment objects so that they can grasp
them
7. Why Semantic Segmentation?
• Road scene understanding
• Useful for autonomous navigation of cars and drones
Image taken from the Cityscapes dataset.
8. Why Semantic Segmentation?
• A useful tool for image editing
9. Why Semantic Segmentation?
• Medical purposes, e.g. segmenting tumours, dental cavities, ...
Image credits: Mauricio Reyes; ISBI Challenge 2015, dental X-ray images.
10. But How?
• Deep convolutional neural networks are successful at
learning a good representation of the visual inputs.
• However, here we have a structured output.
11. CNN for Pixel-wise Labelling
• Usual convolutional networks
12. CNN for Pixel-wise Labelling
• Usual convolutional networks
• Fully convolutional networks
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
13. Fully Convolutional Networks
[Long et al., CVPR 2015]
14. Fully Convolutional Networks [Long et al., CVPR 2015]
+ Significantly improved the state of the art in semantic segmentation.
- Poor object delineation, e.g. spatial consistency is neglected.
[Figure: Image | FCN result | Ground truth]
15. Conditional Random Fields (CRFs)
• A CRF can account for contextual information in the image.
[Figure: coarse output from the pixel-wise classifier → MRF/CRF modelling → output after CRF inference]
16. Conditional Random Fields (CRFs)
• Define a discrete random variable X_i for each pixel i.
• Each X_i can take a value from the label set: X_i ∈ {bg, cat, tree, person, …}.
• Connect the random variables to form a random field (MRF).
17. Conditional Random Fields (CRFs)
• E.g. a pixel on the cat takes X_i = cat; a background pixel takes X_i = bg.
• The most probable assignment given the image → the segmentation.
18. Finding the Best Assignment
Pr(X_1 = x_1, X_2 = x_2, …, X_n = x_n | I) = Pr(X = x | I)
Pr(X = x | I) = (1/Z(I)) exp(−E(x | I))
• Maximizing Pr(X = x | I) ⇔ minimizing E(x | I).
• So we have formulated segmentation as an energy minimization problem.
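For a tiny problem, the MAP assignment can be found by brute force, which makes the "maximize probability = minimize energy" equivalence concrete. A minimal sketch (the two-pixel setup, label set, and all numbers are illustrative assumptions, not values from the talk): enumerate every labelling, score each with E(x), and keep the minimum.

```python
# Hypothetical toy example: a 2-pixel CRF over labels {bg, cat}.
# We enumerate all labelings, compute E(x) = unary + pairwise terms,
# and pick the minimum, i.e. the MAP assignment under Pr(x) ∝ exp(-E(x)).
import itertools

labels = ["bg", "cat"]
# unary[i][l]: penalty for assigning label l to pixel i (assumed numbers)
unary = [{"bg": 0.2, "cat": 1.5}, {"bg": 1.0, "cat": 0.3}]

def pairwise(li, lj, similarity=0.8):
    # similar pixels with different labels pay a cost (Potts model)
    return similarity if li != lj else 0.0

def energy(x):
    e = sum(unary[i][l] for i, l in enumerate(x))
    e += pairwise(x[0], x[1])
    return e

best = min(itertools.product(labels, repeat=2), key=energy)
print(best, energy(best))  # ('bg', 'bg') 1.2
```

Exhaustive search is only feasible here because there are 2² labelings; a real image has |labels|^(number of pixels), which is why the talk turns to approximate inference.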
20. Unary and Pairwise Energies
E(x | I) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)
Unary energy ψ_u(X_i = x_i): your label doesn't agree with the initial classifier → you pay a penalty.
Pairwise energy ψ_p(X_i = x_i, X_j = x_j): you assign different labels to two very similar pixels → you pay a penalty.
How do you measure similarity?
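One common answer, used in the dense CRF, is a Gaussian (bilateral) kernel: two pixels count as similar when they are close in both position and colour. A minimal sketch of this penalty (the kernel widths theta_pos and theta_rgb and the sample pixels are illustrative assumptions, not the paper's tuned values):

```python
# Sketch of a bilateral similarity kernel for the pairwise energy:
# similarity decays with squared distance in position AND in colour.
import math

def bilateral_similarity(pos_i, pos_j, rgb_i, rgb_j,
                         theta_pos=3.0, theta_rgb=10.0):
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    return math.exp(-d_pos / (2 * theta_pos ** 2)
                    - d_rgb / (2 * theta_rgb ** 2))

def pairwise_energy(label_i, label_j, similarity):
    # Potts compatibility: a penalty only when the labels differ
    return similarity if label_i != label_j else 0.0

# Adjacent pixels with almost identical colour: similarity near 1,
# so giving them different labels is expensive.
s = bilateral_similarity((0, 0), (1, 0), (120, 90, 30), (122, 91, 29))
print(pairwise_energy("cat", "bg", s))
```

Distant or differently coloured pixel pairs get a similarity near 0, so labelling them differently is essentially free, which is what lets the model respect object boundaries.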
25. Dense CRF Formulation
• Pairwise energies are defined for every pixel pair in the image:
E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)
• Exact inference is not feasible.
• Use approximate mean-field inference. [Krähenbühl & Koltun, NIPS 2011]
26. Dense CRF Formulation
• Mean field approximates the true distribution with a product of independent per-pixel marginals:
Pr(x | I) = (1/Z) exp(−E(x | I)) ≈ Q(x) = Π_i Q_i(x_i)
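A single mean-field update can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the sizes, the scalar weights w (standing in for the bilateral kernel values), and the simple Potts compatibility are all assumptions.

```python
# One approximate mean-field update for a tiny fully connected CRF.
# Each marginal is recomputed as
#   Q_i(l) ∝ exp( -ψ_u(l) - Σ_{j≠i} w_ij · (1 - Q_j(l)) )
# i.e. a label gets more expensive when similar pixels avoid it.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def mean_field_step(Q, unary, w):
    new_Q = []
    for i in range(len(Q)):
        logits = []
        for l in range(len(Q[i])):
            # Potts message: similar pixels that disagree are penalised
            msg = sum(w[i][j] * (1.0 - Q[j][l])
                      for j in range(len(Q)) if j != i)
            logits.append(-unary[i][l] - msg)
        new_Q.append(softmax(logits))  # normalisation
    return new_Q

# Two pixels, two labels: pixel 0 is confident, pixel 1 is uncertain;
# one update pulls pixel 1 toward agreement with pixel 0.
unary = [[0.0, 3.0], [1.0, 1.1]]
w = [[0.0, 2.0], [2.0, 0.0]]
Q = [softmax([-u for u in row]) for row in unary]
Q = mean_field_step(Q, unary, w)
```

Iterating this update a handful of times is the inference procedure the following slides turn into network layers.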
28-32. Fully Connected CRFs as a CNN
[Block diagram, built up step by step: inputs are the image I and the unary potentials U; one mean-field iteration consists of bilateral filtering of the current marginals Q, a convolution to weight the filter outputs, another convolution for the label-compatibility transform, addition of the unaries U, and a SoftMax normalisation.]
33. CRF as a Recurrent Neural Network
[Diagram: one mean-field iteration as Bilateral → Conv → Conv → + U → SoftMax, producing updated marginals Q.]
• Each of these blocks is differentiable → we can backprop.
34. CRF as a Recurrent Neural Network
[Diagram: the Image and the Unaries feed a CRF Iteration block whose output loops back as its own input; after the final iteration a SoftMax produces the Output, i.e. the CRF as an RNN.]
• Each of these blocks is differentiable → we can backprop.
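The recurrence can be sketched compactly: one mean-field iteration is a block of differentiable operations, and applying the same block T times is a recurrent network unrolled for T steps. Everything below is a toy stand-in for illustration, not the real CRF-RNN layer; the unaries, the scalar coupling w, and the sequential update scheme are assumptions.

```python
# Toy sketch of the "CRF as RNN" idea: repeat one mean-field block
# with SHARED parameters, like an unrolled recurrent network.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def iteration(Q, unary, w):
    """One mean-field step, a stand-in for the
    Bilateral -> Conv -> Conv -> + -> SoftMax block from the slides."""
    for i in range(len(Q)):
        logits = []
        for l in range(len(Q[i])):
            # cost of label l grows when the other pixels avoid it
            msg = sum(w * (1.0 - Q[j][l])
                      for j in range(len(Q)) if j != i)
            logits.append(-unary[i][l] - msg)
        Q[i] = softmax(logits)  # sequential (in-place) update
    return Q

def crf_as_rnn(unary, w=1.0, T=5):
    Q = [softmax([-u for u in row]) for row in unary]  # init from unaries
    for _ in range(T):  # recurrence: the same weight w reused every step
        Q = iteration(Q, unary, w)
    return Q

# Pixel 0 weakly prefers label 0, but its neighbours firmly vote label 1;
# the iterations pull it into agreement.
unary = [[0.9, 1.1], [2.0, 0.0], [1.8, 0.2]]
Q = crf_as_rnn(unary)
```

Because every operation inside `iteration` is differentiable, gradients can flow through all T unrolled steps back into the network that produces the unaries, which is what enables end-to-end training.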
35. Putting Things Together
[Diagram: FCN → CRF-RNN]
36. Experiments (mean IoU)
FCN [Long et al., 2014]: 68.3
FCN + CRF [Chen et al., 2015]: 69.5
CRF-RNN (ours): 72.9
37. Try our demo: http://crfasrnn.torr.vision
Code & model: https://github.com/torrvision/crfasrnn
Code & model: https://github.com/torrvision/crfasrnn
Shuai Zheng
Bernardino Romera-Paredes
Philip Torr
38. Examples
http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg
39. Examples
http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg
40. Examples
http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg
41. Examples
http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg
42. Not-so-good examples
http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg
43. Not-so-good examples
http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500
44. Tricky examples
http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png
45. Tricky examples
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg
46. Tricky examples
http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg
47. Conclusion
• CNNs yield a coarse prediction on pixel-labelling tasks.
• CRFs improve the result by accounting for contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.
CNN → CRF
48. Thank You!