Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018

225 views

Published on

https://telecombcn-dl.github.io/2018-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.

Published in: Data & Analytics
  • Be the first to comment

Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018

  1. 1. Míriam Bellver miriam.bellver@bsc.edu PhD Candidate Barcelona Supercomputing Center Semantic Segmentation Day 2 Lecture 3 #DLUPC http://bit.ly/dlcv2018
  2. 2. Segmentation Segmentation Define the accurate boundaries of all objects in an image 2
  3. 3. Semantic Segmentation Label every pixel! Don’t differentiate instances (cows) Classic computer vision problem Slide Credit: CS231n 3
  4. 4. Instance Segmentation Detect instances, give category, label pixels “simultaneous detection and segmentation” (SDS) Label are class-aware and instance-aware Slide Credit: CS231n 4
  5. 5. Outline Segmentation Datasets Semantic Segmentation Methods ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections 5
  6. 6. Segmentation: Datasets ● 20 categories ● +10,000 images ● Semantic segmentation GT ● Instance segmentation GT ● Real indoor & outdoor scenes ● 540 categories ● +10,000 images ● Dense annotations ● Semantic segmentation GT ● Objects + stuff Pascal Visual Object Classes Pascal Context 6
  7. 7. Segmentation: Datasets ● Real indoor & outdoor scenes ● 80 categories ● +300,000 images ● 2M instances ● Partial annotations ● Semantic segmentation GT ● Instance segmentation GT ● Objects, but no stuff COCO Common Objects in Context 7 ● Real general scenes ● +150 categories ● +22,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff ADE20K
  8. 8. Segmentation: Datasets ● Real driving scenes ● 30 categories ● +25,000 images ● 20,000 partial annotations ● 5,000 dense annotations ● Semantic segmentation GT ● Instance segmentation GT ● Depth, GPS and other metadata ● Objects and stuff ● Real driving scenes ● 100 categories ● 25,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff CityScapes Mapillary Vistas Dataset 8
  9. 9. Outline Segmentation Datasets Semantic Segmentation Methods ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections 9
  10. 10. From Classification to Segmentation Slide Credit: CS231n CNN COW Extract patch Run through a CNN Classify center pixel Repeat for every pixel 10
  11. 11. From Classification to Segmentation Slide Credit: CS231n CNN Run “fully convolutional” network to get all pixels at once 11
  12. 12. Semantic Segmentation Slide Credit: CS231n CNN Smaller output due to pooling Problem 1: 12
  13. 13. Learnable upsampling Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015 Slide Credit: CS231n 13
  14. 14. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 14
  15. 15. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 Dot product between filter and input 15
  16. 16. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 Dot product between filter and input 16
  17. 17. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 17
  18. 18. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 Dot product between filter and input 18
  19. 19. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 Dot product between filter and input 19
  20. 20. Learnable Upsample: Transposed Convolution Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 20
  21. 21. Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 Input gives weight for filter values Learnable Upsample: Transposed Convolution 21
  22. 22. Learnable Upsample: Transposed Convolution Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 Input gives weight for filter values Sum where output overlaps 22
  23. 23. Learnable Upsample: Transposed Convolution Warning: Checkerboard effect when kernel size is not divisible by the stride Source: distill.pub 23
  24. 24. Learnable Upsample: Transposed Convolution Source: distill.pub stride = 2, kernel_size = 3 24 Warning: Checkerboard effect when kernel size is not divisible by the stride
  25. 25. Learnable Upsample: Transposed Convolution Warning: Checkerboard effect in images generated by neural networks
  26. 26. Learnable Upsample: Transposed Convolution Slide Credit: CS231n Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015 “Regular” VGG “Upside down” VGG 26
  27. 27. Semantic Segmentation CNN Coarse output Problem 2: High-level features (e.g. conv5 layer) from a pretrained classification network are the input for the segmentation branch 27
  28. 28. Skip Connections Slide Credit: CS231n Skip connections = Better results “skip connections” Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015 Recovering low level features from early layers 28
  29. 29. Dilated Convolutions Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016 Structural change in convolutional layers for dense prediction problems (e.g. image segmentation) ● The receptive field grows exponentially as you add more layers → more context information in deeper layers wrt regular convolutions ● Number of parameters increases linearly as you add more layers 29
  30. 30. Dilated Convolutions 30Source: https://github.com/vdumoulin/conv_arithmetic
  31. 31. State-of-the-art models 31 ● U-Net ○ Deconvolutions ○ skip connections Ronneberger et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015
  32. 32. State-of-the-art models 32 ● PSPNet (dilated convolutions + pyramid pooling) Zhao et al. Pyramid Scene Parsing Network. CVPR 2017
  33. 33. State-of-the-art models 33 ● DeepLab v2 (dilated convolutions + CRF) ● DeepLab v3 (added pyramid pooling. Removed CRF) Chen et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2017 Chen et al. Rethinking Atrous Convolution for Semantic Image Segmentation. TPAMI 2017
  34. 34. Summary Segmentation Datasets Semantic Segmentation Methods ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections 34
  35. 35. Questions? 35

×