Presentation for the Berlin Computer Vision Group, December 2020 on deep learning methods for image segmentation: Instance segmentation, semantic segmentation, and panoptic segmentation.
4. Classification, detection, and segmentation
Classification refers to image-wide labels
Detection refers to localization of bounding boxes with labels
Segmentation refers to pixel-wise localization of the labels
5. Goals of supervised image segmentation
Given an input image we wish to obtain:
1. A class label associated with each individual pixel in the image. This is also called pixel-wise
localization.
2. The probability score associated with each class label
15. “Fully Convolutional” networks draw segmentation masks
All layers in the network are convolutional; there is no fully connected (aka “dense”) layer as in most
classifiers, so we use the local information of each pixel's neighborhood
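The point above can be illustrated with a toy example: a convolution kernel slides over the input, so the same weights apply to an image of any size, whereas a dense layer's weight matrix pins the input to one fixed shape. A minimal sketch in NumPy (the blur kernel is just an illustrative choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1) via explicit loops."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0  # a simple 3x3 blur kernel
# The same kernel works on inputs of any size -- no fixed input dimension,
# unlike a fully connected layer whose weight matrix fixes the input shape.
small = conv2d(np.random.rand(8, 8), kernel)    # output is (6, 6)
large = conv2d(np.random.rand(32, 32), kernel)  # output is (30, 30)
print(small.shape, large.shape)
```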
16. What is a convolution filter?
https://setosa.io/ev/image-kernels/
18. What is a convolution filter?
Convolution with a 3x3 kernel and stride = 1, without padding
Effect: the output shrinks by two pixels in each dimension (one on each side)
19. What is a convolution filter?
Convolution with a 3x3 kernel and stride = 1, with zero padding
Effect: the output preserves the original image size
20. What is a convolution filter?
Convolution with a 3x3 kernel and stride = 2, with zero padding
Effect: the output is downsampled to about half its size
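The three cases above all follow from the standard output-size formula, out = floor((n + 2p − k) / s) + 1, where n is the input size, k the kernel size, p the padding, and s the stride. A quick sketch in plain Python (the input size 28 is just an example):

```python
def conv_output_size(n, k=3, p=0, s=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 28
print(conv_output_size(n, k=3, p=0, s=1))  # no padding: 26 (shrinks by 2)
print(conv_output_size(n, k=3, p=1, s=1))  # zero padding: 28 (size preserved)
print(conv_output_size(n, k=3, p=1, s=2))  # stride 2: 14 (about half)
```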
22. U-net for semantic segmentation
All layers in the network are convolutional; there is no fully connected (aka “dense”) layer as in most
classifiers. We need this fully convolutional architecture to label images pixel by pixel while preserving
their local information
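A minimal encoder-decoder sketch in PyTorch can make the U-net idea concrete. This is not the original U-net (which is much deeper, with more channels per level); the layer sizes here are illustrative, and the skip connection shows how high-resolution local information from the encoder is reused in the decoder:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-net: one downsampling step, one upsampling step, one skip connection."""
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 8, 3, padding=1)       # encoder conv
        self.down = nn.MaxPool2d(2)                        # halve resolution
        self.mid = nn.Conv2d(8, 16, 3, padding=1)          # bottleneck conv
        self.up = nn.ConvTranspose2d(16, 8, 2, stride=2)   # double resolution
        self.dec = nn.Conv2d(16, n_classes, 3, padding=1)  # decoder conv (after concat)

    def forward(self, x):
        e = torch.relu(self.enc(x))
        m = torch.relu(self.mid(self.down(e)))
        u = self.up(m)
        # Skip connection: concatenate encoder features with upsampled features,
        # preserving local, high-resolution information for the pixel-wise labels.
        return self.dec(torch.cat([e, u], dim=1))

net = TinyUNet()
out = net(torch.randn(1, 3, 32, 32))
print(out.shape)  # one score map per class, at full input resolution
```

The output has one channel per class at the same spatial size as the input, which is exactly what pixel-wise labeling requires.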
54. Which things should be kept in this picture?
Kid, ball, 2 dogs, 9 people?
Example Case: Image Matting
Photo: Treddy Chen
https://unsplash.com/photos/UdQWvefOXJk
55. Issue: When there is more than one person in the image...
Example Case: Image Matting
56. Review questions
- How do we compute the confusion matrix for a segmentation mask? How do we
compute it for a bounding box?
- Can we use the Intersection over Union equation to evaluate the quality of a
segmentation mask?
- What’s the recall of a classifier that only outputs ‘1’ (positive class)?
- What’s the precision of a classifier that outputs a single true positive, with all its
other predictions being equal to ‘0’ (negative class)?
- Why does precision go down when recall increases?
- Does the F1 measure weigh precision and recall equally?
- What’s the appeal of using Detectron2? Do we need to write a PyTorch model to
use it for inference or training?
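Several of these questions can be checked numerically. A sketch with a made-up binary mask pair (NumPy; all values are illustrative, not from the slides):

```python
import numpy as np

# Pixel-wise confusion-matrix entries for a binary segmentation mask:
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
true = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0]])
tp = np.sum((pred == 1) & (true == 1))  # 2
fp = np.sum((pred == 1) & (true == 0))  # 1
fn = np.sum((pred == 0) & (true == 1))  # 1
tn = np.sum((pred == 0) & (true == 0))  # 4

# IoU applies to masks exactly as to boxes: intersection / union of pixels.
iou = tp / (tp + fp + fn)  # 0.5

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # F1 weighs both equally

# A classifier that only outputs '1' misses no positives, so fn = 0
# and its recall is 1.0 -- though its precision is usually poor.
all_ones_pred = np.ones_like(true)
all_ones_recall = np.sum((all_ones_pred == 1) & (true == 1)) / np.sum(true == 1)
print(iou, precision, recall, f1, all_ones_recall)
```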
57. Google Colab Notebooks
● Unet in FastAI 2
● Mask R-CNN and Panoptic Segmentation with Detectron2
58. Review questions
- How does panoptic segmentation combine instance and semantic
segmentation? Which method produces the ‘stuff’? Which method produces
the ‘things’?
- Is semantic segmentation more computationally costly than instance
segmentation? Why?
- Is panoptic segmentation more computationally costly than instance
segmentation? Why?
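One way to picture the combination asked about above: semantic segmentation supplies the ‘stuff’ labels covering every pixel, and instance masks overwrite them with the ‘things’. A toy merge in NumPy (the class and instance IDs are made up for illustration):

```python
import numpy as np

# Semantic map: 0 = sky (stuff), 1 = grass (stuff); covers every pixel.
semantic = np.array([[0, 0, 0],
                     [1, 1, 1],
                     [1, 1, 1]])

# Instance masks for 'things' (e.g. two dogs), each a binary mask plus an ID.
instances = [
    (np.array([[0, 0, 0],
               [1, 1, 0],
               [0, 0, 0]]), 100),  # instance ID 100
    (np.array([[0, 0, 0],
               [0, 0, 0],
               [0, 1, 1]]), 101),  # instance ID 101
]

# Panoptic map: start from the stuff labels, paint each thing instance on top,
# so every pixel ends up with exactly one semantic or instance label.
panoptic = semantic.copy()
for mask, inst_id in instances:
    panoptic[mask == 1] = inst_id
print(panoptic)
```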