1. Torr Vision Group, Engineering Department
Semantic Image Segmentation with Deep Learning
Sadeep Jayasumana
07/10/2015
Collaborators:
Bernardino Romera-Paredes
Shuai Zheng
Philip Torr
2. Live Demo - http://crfasrnn.torr.vision/
3. Outline
Semantic segmentation
Why?
CNNs for pixel-wise prediction
CRFs
CRF as RNN
Conclusion
4. Semantic Segmentation
• Recognizing and delineating objects in an image
• Classifying each pixel in the image
5. Why Semantic Segmentation?
• To help partially sighted people by highlighting
important objects in their glasses
6. Why Semantic Segmentation?
• To let robots segment objects so that they can grasp
them
7. Why Semantic Segmentation?
• Road scene understanding
• Useful for autonomous navigation of cars and drones
Image taken from the Cityscapes dataset.
8. Why Semantic Segmentation?
• A useful tool for image editing
9. Why Semantic Segmentation?
• Medical purposes, e.g. segmenting tumours, dental cavities, ...
Image credits: Mauricio Reyes; ISBI Challenge 2015, dental X-ray images.
10. But How?
• Deep convolutional neural networks are successful at
learning a good representation of the visual inputs.
• However, here we have a structured output.
11. CNN for Pixel-wise Labelling
• Usual convolutional networks
12. CNN for Pixel-wise Labelling
• Usual convolutional networks
• Fully convolutional networks
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.
13. Fully Convolutional Networks
[Long et al., CVPR 2015]
14. Fully Convolutional Networks [Long et al., CVPR 2015]
+ Significantly improved the state of the art in semantic segmentation.
- Poor object delineation, e.g. spatial consistency is neglected.
[Figure: Image | FCN result | Ground truth]
15. Conditional Random Fields (CRFs)
• A CRF can account for contextual information in the image.
[Figure: coarse output from the pixel-wise classifier → MRF/CRF modelling → output after CRF inference]
16. Conditional Random Fields (CRFs)
• Define a discrete random variable X_i for each pixel i.
• Each X_i can take a value from the label set: X_i ∈ {bg, cat, tree, person, …}.
• Connect the random variables to form a random field (MRF).
17. Conditional Random Fields (CRFs)
• E.g. a pixel on the cat takes X_i = cat; a background pixel takes X_i = bg.
• The most probable assignment given the image → the segmentation.
18. Finding the Best Assignment
Pr(X_1 = x_1, X_2 = x_2, …, X_n = x_n | I) = Pr(X = x | I)
Pr(X = x | I) = (1/Z(I)) exp(−E(x | I))
• Maximizing Pr(X = x | I) ⇔ minimizing E(x | I).
• So we have formulated segmentation as an energy minimization problem.
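For a tiny problem, the MAP assignment can be found by brute force, which makes the "maximize probability = minimize energy" equivalence concrete. A minimal sketch (the two-pixel setup, label set, and all numbers are illustrative assumptions, not values from the talk): enumerate every labelling, score each with E(x), and keep the minimum.

```python
# Hypothetical toy example: a 2-pixel CRF over labels {bg, cat}.
# We enumerate all labelings, compute E(x) = unary + pairwise terms,
# and pick the minimum, i.e. the MAP assignment under Pr(x) ∝ exp(-E(x)).
import itertools

labels = ["bg", "cat"]
# unary[i][l]: penalty for assigning label l to pixel i (assumed numbers)
unary = [{"bg": 0.2, "cat": 1.5}, {"bg": 1.0, "cat": 0.3}]

def pairwise(li, lj, similarity=0.8):
    # similar pixels with different labels pay a cost (Potts model)
    return similarity if li != lj else 0.0

def energy(x):
    e = sum(unary[i][l] for i, l in enumerate(x))
    e += pairwise(x[0], x[1])
    return e

best = min(itertools.product(labels, repeat=2), key=energy)
print(best, energy(best))  # ('bg', 'bg') 1.2
```

Exhaustive search is only feasible here because there are 2² labelings; a real image has |labels|^(number of pixels), which is why the talk turns to approximate inference.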
20. Unary and Pairwise Energies
E(x | I) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)
Unary energy ψ_u(X_i = x_i): your label doesn't agree with the initial classifier → you pay a penalty.
Pairwise energy ψ_p(X_i = x_i, X_j = x_j): you assign different labels to two very similar pixels → you pay a penalty.
How do you measure similarity?
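One common answer, used in the dense CRF, is a Gaussian (bilateral) kernel: two pixels count as similar when they are close in both position and colour. A minimal sketch of this penalty (the kernel widths theta_pos and theta_rgb and the sample pixels are illustrative assumptions, not the paper's tuned values):

```python
# Sketch of a bilateral similarity kernel for the pairwise energy:
# similarity decays with squared distance in position AND in colour.
import math

def bilateral_similarity(pos_i, pos_j, rgb_i, rgb_j,
                         theta_pos=3.0, theta_rgb=10.0):
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    return math.exp(-d_pos / (2 * theta_pos ** 2)
                    - d_rgb / (2 * theta_rgb ** 2))

def pairwise_energy(label_i, label_j, similarity):
    # Potts compatibility: a penalty only when the labels differ
    return similarity if label_i != label_j else 0.0

# Adjacent pixels with almost identical colour: similarity near 1,
# so giving them different labels is expensive.
s = bilateral_similarity((0, 0), (1, 0), (120, 90, 30), (122, 91, 29))
print(pairwise_energy("cat", "bg", s))
```

Distant or differently coloured pixel pairs get a similarity near 0, so labelling them differently is essentially free, which is what lets the model respect object boundaries.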
25. Dense CRF Formulation
• Pairwise energies are defined for every pixel pair in the image:
E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j)
• Exact inference is not feasible.
• Use approximate mean-field inference. [Krähenbühl & Koltun, NIPS 2011]
26. Dense CRF Formulation
• Mean field approximates the true distribution with a product of independent per-pixel marginals:
Pr(x | I) = (1/Z) exp(−E(x | I)) ≈ Q(x) = Π_i Q_i(x_i)
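A single mean-field update can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the sizes, the scalar weights w (standing in for the bilateral kernel values), and the simple Potts compatibility are all assumptions.

```python
# One approximate mean-field update for a tiny fully connected CRF.
# Each marginal is recomputed as
#   Q_i(l) ∝ exp( -ψ_u(l) - Σ_{j≠i} w_ij · (1 - Q_j(l)) )
# i.e. a label gets more expensive when similar pixels avoid it.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def mean_field_step(Q, unary, w):
    new_Q = []
    for i in range(len(Q)):
        logits = []
        for l in range(len(Q[i])):
            # Potts message: similar pixels that disagree are penalised
            msg = sum(w[i][j] * (1.0 - Q[j][l])
                      for j in range(len(Q)) if j != i)
            logits.append(-unary[i][l] - msg)
        new_Q.append(softmax(logits))  # normalisation
    return new_Q

# Two pixels, two labels: pixel 0 is confident, pixel 1 is uncertain;
# one update pulls pixel 1 toward agreement with pixel 0.
unary = [[0.0, 3.0], [1.0, 1.1]]
w = [[0.0, 2.0], [2.0, 0.0]]
Q = [softmax([-u for u in row]) for row in unary]
Q = mean_field_step(Q, unary, w)
```

Iterating this update a handful of times is the inference procedure the following slides turn into network layers.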
28-32. Fully Connected CRFs as a CNN
[Block diagram, built up step by step: inputs are the image I and the unary potentials U; one mean-field iteration consists of bilateral filtering of the current marginals Q, a convolution to weight the filter outputs, another convolution for the label-compatibility transform, addition of the unaries U, and a SoftMax normalisation.]
33. CRF as a Recurrent Neural Network
[Diagram: one mean-field iteration as Bilateral → Conv → Conv → + U → SoftMax, producing updated marginals Q.]
• Each of these blocks is differentiable → we can backprop.
34. CRF as a Recurrent Neural Network
[Diagram: the Image and the Unaries feed a CRF Iteration block whose output loops back as its own input; after the final iteration a SoftMax produces the Output, i.e. the CRF as an RNN.]
• Each of these blocks is differentiable → we can backprop.
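The recurrence can be sketched compactly: one mean-field iteration is a block of differentiable operations, and applying the same block T times is a recurrent network unrolled for T steps. Everything below is a toy stand-in for illustration, not the real CRF-RNN layer; the unaries, the scalar coupling w, and the sequential update scheme are assumptions.

```python
# Toy sketch of the "CRF as RNN" idea: repeat one mean-field block
# with SHARED parameters, like an unrolled recurrent network.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def iteration(Q, unary, w):
    """One mean-field step, a stand-in for the
    Bilateral -> Conv -> Conv -> + -> SoftMax block from the slides."""
    for i in range(len(Q)):
        logits = []
        for l in range(len(Q[i])):
            # cost of label l grows when the other pixels avoid it
            msg = sum(w * (1.0 - Q[j][l])
                      for j in range(len(Q)) if j != i)
            logits.append(-unary[i][l] - msg)
        Q[i] = softmax(logits)  # sequential (in-place) update
    return Q

def crf_as_rnn(unary, w=1.0, T=5):
    Q = [softmax([-u for u in row]) for row in unary]  # init from unaries
    for _ in range(T):  # recurrence: the same weight w reused every step
        Q = iteration(Q, unary, w)
    return Q

# Pixel 0 weakly prefers label 0, but its neighbours firmly vote label 1;
# the iterations pull it into agreement.
unary = [[0.9, 1.1], [2.0, 0.0], [1.8, 0.2]]
Q = crf_as_rnn(unary)
```

Because every operation inside `iteration` is differentiable, gradients can flow through all T unrolled steps back into the network that produces the unaries, which is what enables end-to-end training.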
35. Putting Things Together
[Diagram: FCN → CRF-RNN]
36. Experiments (mean IoU)
FCN [Long et al., 2014]: 68.3
FCN + CRF [Chen et al., 2015]: 69.5
CRF-RNN (ours): 72.9
37. Try our demo: http://crfasrnn.torr.vision
Code & model: https://github.com/torrvision/crfasrnn
Code & model: https://github.com/torrvision/crfasrnn
Shuai Zheng
Bernardino Romera-Paredes
Philip Torr
38. Examples
http://pp.vk.me/c622119/v622119584/20dc3/7lS5BU2Bp_k.jpg
39. Examples
http://media1.fdncms.com/boiseweekly/imager/mountain-bikers-are-advised-to-dism/u/original/3446917/walk_thru_sheep_1_.jpg
40. Examples
http://img.rtvslo.si/_up/upload/2014/07/22/65129194_tour-3.jpg
41. Examples
http://www.toxel.com/wp-content/uploads/2010/11/bike05.jpg
42. Not-so-good examples
http://www.independent.co.uk/incoming/article10335615.ece/alternates/w620/planecat.jpg
43. Not-so-good examples
http://i1.wp.com/theverybesttop10.files.wordpress.com/2013/02/the-world_s-top-10-best-images-of-camouflage-cats-5.jpg?resize=375,500
44. Tricky examples
http://se-preparer-aux-crises.fr/wp-content/uploads/2013/10/Golum.png
45. Tricky examples
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRf4J7Hszkc8Wf6riVUX-cV_K-un8LJy5dYIBW1KDIn6i7UCzGHpg
46. Tricky examples
http://i.huffpost.com/gen/1478236/thumbs/s-DIRD6-large640.jpg
47. Conclusion
• CNNs yield a coarse prediction on pixel-labelling tasks.
• CRFs improve the result by accounting for contextual information in the image.
• Learning the whole pipeline end-to-end significantly improves the results.
CNN → CRF
48. Thank You!