
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)

https://telecombcn-dl.github.io/2017-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection, or image captioning.


  1. 1. [course site] Verónica Vilaplana veronica.vilaplana@upc.edu Associate Professor Universitat Politècnica de Catalunya (Technical University of Catalonia) Segmentation Day 3 Lecture 1 #DLUPC
  2. 2. Outline ● What is object segmentation? ○ Applications ● Semantic segmentation ○ From image classification to semantic segmentation ■ Fully convolutional networks ■ Learnable Upsampling ■ Skip connections ○ FCN8s, Dilated Convolutions, U-Net, ... ● Instance segmentation ○ Simultaneous Detection and Segmentation ○ Mask R-CNN 2
  3. 3. Image and object segmentation ● Image segmentation ○ Group pixels into regions that share some similar properties ● Segmenting images into meaningful objects ○ Object-level segmentation: accurate localization and recognition Superpixels (Ren, ICCV 2003) 3
  4. 4. Object segmentation: applications Image editing and composition (Xu, 2016) Robotics Autonomous driving (Cordts, 2016) Medical image analysis (Casamitjana, 2017) 4
  5. 5. Semantic segmentation ● Label every pixel: recognize the class of every pixel ● Do not differentiate instances 5 Mottaghi et al, “The role of context for object detection and semantic segmentation in the wild”, CVPR 2014
  6. 6. Instance segmentation ● Detect instances, categorize and label every pixel ● Labels are class-aware and instance-aware 6 Arnab,Torr “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017 Object detection Semantic Segm. Instance segm. Ground truth
  7. 7. Datasets for semantic/instance segmentation 7 ● 20 categories ● +10,000 images ● Semantic segmentation GT ● Instance segmentation GT ● Real indoor & outdoor scenes ● 540 categories ● +10,000 images ● Dense annotations ● Semantic segmentation GT ● Objects + stuff Pascal Visual Object Classes Pascal Context
  8. 8. Datasets for semantic/instance segmentation 8 ● Real indoor scenes ● 10,000 images ● 58,658 3D bounding boxes ● Dense annotations ● Instances GT ● Semantic segmentation GT ● Objects + stuff ● Real indoor & outdoor scenes ● 80 categories ● +300,000 images ● 2M instances ● Partial annotations ● Semantic segmentation GT ● Instance segmentation GT ● Objects, but no stuff SUN RGB-D COCO Common Objects in Context
  9. 9. Datasets for semantic/instance segmentation 9 ● Real driving scenes ● 30 categories ● +25,000 images ● 20,000 partial annotations ● 5,000 dense annotations ● Semantic segmentation GT ● Instance segmentation GT ● Depth, GPS and other metadata ● Objects and stuff ● Real general scenes ● +150 categories ● +22,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff CityScapes ADE20K
  10. 10. From classification to semantic segmentation 10 Extract a patch, run it through a CNN trained for image classification (e.g. outputs DOG / CAT), classify the center pixel, and repeat for every pixel
  11. 11. From classification to semantic segmentation ● A classification network becoming fully convolutional ○ Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region 11 Shelhamer, Long, Darrell, Fully Convolutional Networks for Semantic Segmentation, 2014-2016
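The equivalence above — a fully connected layer viewed as a convolution whose kernel covers the entire input region — can be checked numerically. A minimal numpy sketch with toy, hypothetical sizes (a 2x3x3 feature map feeding 4 output units):

```python
import numpy as np

C, H, W = 2, 3, 3   # toy feature map: channels, height, width
n_out = 4           # toy number of output units
rng = np.random.default_rng(0)

x = rng.standard_normal((C, H, W))
W_fc = rng.standard_normal((n_out, C * H * W))   # FC weight matrix

# Fully connected view: flatten the features, then a matrix-vector product.
y_fc = W_fc @ x.reshape(-1)

# Convolutional view: each output unit is one CxHxW kernel applied at the
# single valid position (the kernel covers the whole input).
W_conv = W_fc.reshape(n_out, C, H, W)
y_conv = np.array([(W_conv[k] * x).sum() for k in range(n_out)])

assert np.allclose(y_fc, y_conv)  # same weights, same outputs
```

On a larger input, the convolutional view simply slides, producing a coarse spatial map of class scores instead of a single vector — which is what makes the network fully convolutional.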
  12. 12. From classification to semantic segmentation ● Dense prediction: fully convolutional, end-to-end, pixel-to-pixel network ● Problems ○ Output is smaller than input → add upsampling layers ○ Output is very coarse → add fine details from previous layers 12 Final layer is a 1x1 conv with #channels = #classes Pixelwise loss function: Credit: Shelhamer, Long
  13. 13. ● Dense prediction: fully convolutional, end-to-end, pixel-to-pixel network From classification to semantic segmentation 13 Final layer is a 1x1 conv with #channels = #classes Pixelwise loss function: Conv, pool, non-linearity Learnable upsampling Pixelwise output + loss Credit: Shelhamer, Long
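The pixelwise loss on the final 1x1-conv score maps is typically a per-pixel softmax cross-entropy averaged over all pixels. A minimal numpy sketch with toy shapes and random scores (an illustrative stand-in, not the lecture's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, H, W = 3, 4, 4

logits = rng.standard_normal((n_classes, H, W))   # one score map per class
target = rng.integers(0, n_classes, size=(H, W))  # ground-truth class per pixel

# Softmax over the class axis, independently at every pixel.
e = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = e / e.sum(axis=0, keepdims=True)

# Cross-entropy: -log of the probability of the true class, mean over pixels.
rows = np.arange(H)[:, None]
cols = np.arange(W)[None, :]
loss = -np.log(probs[target, rows, cols]).mean()
```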
  14. 14. Learnable upsampling: recovering spatial shape Upsampling: transposed convolution, also called fractionally strided convolution or ‘deconvolution’ 14 Convolution as a matrix operation: input image I, 4x4, vectorized to 16x1; output image O, 4x1 (later reshaped to 2x2); 3x3 kernel h written as a weight matrix C. The backward pass is obtained by transposing C: Cᵀ. A transposed convolution swaps the forward and backward passes of a convolution. More info: Dumoulin et al, A guide to convolution arithmetic for deep learning, 2016 https://github.com/vdumoulin/conv_arithmetic
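The matrix view can be made concrete in numpy. This sketch uses one common layout — C of shape 4x16, so that O = C·I (the slide writes the same weights transposed) — checks that C·I reproduces direct convolution, then applies Cᵀ to map 4 values back to a 4x4 grid:

```python
import numpy as np

rng = np.random.default_rng(0)
k = rng.standard_normal((3, 3))   # 3x3 kernel
x = rng.standard_normal((4, 4))   # 4x4 input image

# Build C (4x16): row r holds the kernel weights placed at output
# position r, zero elsewhere. Valid convolution of 4x4 with 3x3 -> 2x2.
C = np.zeros((4, 16))
for r, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    patch = np.zeros((4, 4))
    patch[i:i + 3, j:j + 3] = k
    C[r] = patch.reshape(-1)

y = C @ x.reshape(-1)             # forward pass: 16 values -> 4 (a 2x2 map)

# Direct sliding-window cross-correlation for comparison.
y_direct = np.array([(x[i:i + 3, j:j + 3] * k).sum()
                     for i in range(2) for j in range(2)])
assert np.allclose(y, y_direct)

# Transposed convolution: the same matrix, transposed. It maps 4 values
# back to a 16-value (4x4) grid -- upsampling with learnable weights.
g = rng.standard_normal(4)
up = (C.T @ g).reshape(4, 4)
```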
  15. 15. Learnable upsampling: recovering spatial shape It is always possible to emulate a transposed convolution with a direct convolution (fractional stride) 15 More info: Dumoulin et al, A guide to convolution arithmetic for deep learning, 2016 https://github.com/vdumoulin/conv_arithmetic
  16. 16. Learnable upsampling: recovering spatial shape 16 1D example: 1D convolution with stride 2; 1D transposed convolution with stride 2; 1D subpixel convolution with stride 1/2. The two operators can achieve the same result if the filters are learned. Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
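The equivalence between a stride-2 transposed convolution and a "fractionally strided" direct convolution (zero-insertion followed by an ordinary convolution) can be verified in 1D with a few lines of numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
k = rng.standard_normal(3)        # 1D kernel, size 3
g = rng.standard_normal(4)        # low-resolution input, 4 samples

# Transposed convolution with stride 2, written directly: each input
# sample scatters a scaled copy of the kernel at stride-2 positions.
out_len = 2 * (len(g) - 1) + len(k)
y_t = np.zeros(out_len)
for m, v in enumerate(g):
    y_t[2 * m: 2 * m + len(k)] += v * k

# Same result via zero-insertion ("fractional stride") followed by an
# ordinary full convolution (np.convolve flips the kernel internally,
# which is exactly the convolution sum we need here).
z = np.zeros(2 * len(g) - 1)
z[::2] = g                        # insert a zero between every two samples
y_s = np.convolve(z, k, mode="full")

assert np.allclose(y_t, y_s)
```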
  17. 17. DeconvNet: VGG-16 (conv + ReLU + max pool) + mirrored VGG (unpooling + ‘deconv’ + ReLU); more than one upsampling layer 17 Normal VGG “Upside down” VGG Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
  18. 18. Resolution: spectrum of deep features Problem: coarse output ● Combine where (local, shallow) with what (global, deep) 18 fuse features into deep jet (cf. Hariharan et al. CVPR15 “hypercolumn”) Credit: Shelhamer, Long
  19. 19. Fine details: skip connections 19 Add a 1x1 conv classifying layer on top of pool4; upsample the conv7 prediction x2 (initialized to bilinear, then learned); sum both; and upsample x16 for the output. End-to-end, joint learning of semantics and location: skips fuse layers. Credit: Shelhamer, Long
  20. 20. Skip connections ● A multi-stream network that fuses features/predictions across layers 20 Input image; predictions at stride 32, stride 16, stride 8; ground truth — no skips, 1 skip, 2 skips. Credit: Shelhamer, Long
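The stride numbers translate into a simple shape recipe for fusion: upsample the coarser score map x2 and sum it with the finer one. A numpy shape sketch with hypothetical sizes for a 512x512 input (nearest-neighbor upsampling stands in for the learned transposed convolution):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 21

# Hypothetical score maps from two depths of the network, for a 512x512
# input: conv7 predictions at stride 32, pool4 scores at stride 16.
score32 = rng.standard_normal((n_classes, 16, 16))   # coarse, deep ("what")
score16 = rng.standard_normal((n_classes, 32, 32))   # finer, shallower ("where")

# Upsample the coarse map x2 and fuse by elementwise sum (FCN-16s-style).
up = score32.repeat(2, axis=1).repeat(2, axis=2)
fused = up + score16

assert fused.shape == (n_classes, 32, 32)  # ready for a further x16 upsample
```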
  21. 21. Transfer learning ● Cast ILSVRC classifiers (AlexNet, VGG, GoogLeNet) into FCNs and augment them for dense prediction: discard the classifier layer, transform FC layers to conv layers, add a 1x1 conv with 21 filters for scoring at each output location, and upsample ● Add skip connections ● Train for segmentation by fine-tuning all layers on PASCAL VOC 2011 with a pixelwise loss ● Metrics: pixel accuracy, mean accuracy, mean pixel intersection over union 21 Mean IU: per-class evaluation — the intersection of the predicted and true sets of pixels for a given class, divided by their union Pascal Test Set Pascal Validation Set Based on VGG. FCN-32s = FCN-VGG
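Mean IU as defined on the slide can be computed from a confusion matrix. A small self-contained sketch on toy 3x3 label maps (illustrative data, not Pascal):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean per-class intersection-over-union over classes present in the data."""
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)   # rows: GT, cols: prediction
    inter = np.diag(conf).astype(float)              # pixels where pred == gt == c
    union = conf.sum(0) + conf.sum(1) - inter        # |pred c| + |gt c| - |both|
    valid = union > 0                                # skip classes absent from both
    return (inter[valid] / union[valid]).mean()

gt   = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
pred = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 2]])

miou = mean_iou(pred, gt, 3)   # per-class IoUs: 1, 2/3, 4/5 -> mean ≈ 0.822
```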
  22. 22. FCN-8s results on Pascal 22 SDS: Simultaneous Detection and Segmentation Hariharan et al. ECCV14
  23. 23. Semantic segmentation Typical architecture ● Downsampling path: extracts coarse features ● Upsampling path: recovers input image resolution ● Skip connections: recover detailed information ● Post-processing (optional): refines predictions (CRF) Other architectures: ● DeepLab: ‘atrous’ convolutions + spatial pyramid + CRF (Chen, ICLR 2015) ● CRF-RNN: FCN + CRF as Recurrent NN (Zheng, ICCV 2015) ● U-Net (Ronneberger, 2015) ● Fully Convolutional DenseNets (Jégou, 2016) ● Dilated convolutions (Yu, 2016) 23
  24. 24. U-Net ● A contracting path and an expansive path ● Adds convolutions in the upsampling path (“symmetric” net) ● Skip connections: concatenation of feature maps 24 Ronneberger et al, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv 2015 Winner of the ISBI 2015 CAD Caries and Cell Tracking challenges
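The difference between FCN-style and U-Net-style skips — elementwise sum versus channel concatenation — is just a choice of fusion operator. A numpy shape sketch with hypothetical sizes (nearest-neighbor upsampling as a stand-in for the learned up-convolution):

```python
import numpy as np

rng = np.random.default_rng(0)

enc = rng.standard_normal((64, 32, 32))   # encoder feature map (channels, H, W)
dec = rng.standard_normal((64, 16, 16))   # decoder feature map, one level deeper

up = dec.repeat(2, axis=1).repeat(2, axis=2)      # upsample x2 to match the encoder

fused_sum    = up + enc                           # FCN-style skip: elementwise sum
fused_concat = np.concatenate([up, enc], axis=0)  # U-Net-style skip: channel concat

assert fused_sum.shape    == (64, 32, 32)
assert fused_concat.shape == (128, 32, 32)  # doubled channels feed the next conv
```

Concatenation preserves both feature sets intact and lets the following convolutions learn how to combine them, at the cost of doubling the channel count.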
  25. 25. Fully convolutional DenseNets ● Adds feed-forward connections between layers ● Based on U-Nets: ○ connections between downsampling – upsampling paths ● Based on DenseNets* (for image classification): ○ each layer directly connected to every other layer ○ alleviate the vanishing-gradient problem ○ strengthen feature propagation ○ encourage feature reuse ○ substantially reduce the number of parameters 25 Jégou et al, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation”, Dec. 2016 * Huang et al, “Densely connected convolutional networks”, arXiv Aug. 2016 Dense block: Complete architecture:
  26. 26. Dilated convolutions ● Systematically aggregate multiscale contextual information without losing resolution ○ Usual convolution ○ Dilated convolution 26 Yu, Koltun, Multi-scale context aggregation by dilated convolutions, 2016
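A dilated ("atrous") convolution spreads a small kernel over a wider support by inserting zeros between its taps, so stacking 3x3 layers with growing rates covers an exponentially growing receptive field at constant resolution and parameter count. A numpy sketch of the kernel dilation and the receptive-field arithmetic:

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert rate-1 zeros between kernel taps ('atrous' convolution)."""
    kh, kw = k.shape
    d = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    d[::rate, ::rate] = k
    return d

k = np.ones((3, 3))
assert dilate_kernel(k, 1).shape == (3, 3)   # ordinary convolution
assert dilate_kernel(k, 2).shape == (5, 5)   # same 9 weights, wider support
assert dilate_kernel(k, 4).shape == (9, 9)

# Stacking 3x3 convs with rates 1, 2, 4 grows the receptive field
# exponentially: each 3x3 conv at rate r adds 2r to the field.
rf = 1
for rate in (1, 2, 4):
    rf += 2 * rate
# rf == 15, versus 7 for three ordinary (rate-1) 3x3 convs
```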
  27. 27. Instance segmentation ● Detect instances, categorize and label every pixel ● Labels are class-aware and instance-aware 27 Arnab,Torr “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017 Object detection Semantic Segm. Instance segm. Ground truth
  28. 28. Instance segmentation: Multi-task cascades 28 Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015. Won the COCO 2015 challenge (with ResNet). The entire model is learned end-to-end.
  29. 29. Instance segmentation: Multi-task cascades Results on Pascal VOC 2012 and MS COCO: 29
  30. 30. Instance segmentation: Mask R-CNN ● Extension of Faster R-CNN to instance segmentation ● A Fully Convolutional Network (FCN) is added on top of the CNN features of Faster R-CNN to generate a mask (segmentation output) ● This runs in parallel to the classification and bounding box regression network of Faster R-CNN ● RoIAlign instead of RoIPool to properly align extracted features with the input 30 He et al, Mask R-CNN, 2017
  31. 31. Instance segmentation: Mask R-CNN ● Classification and bounding box detection losses like Faster R-CNN ● A new loss term for mask prediction ● Output: C x m x m volume for mask prediction (C classes, m size of square mask) 31 He et al, Mask R-CNN, 2017
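The mask loss is a per-pixel sigmoid binary cross-entropy applied only to the m x m mask channel of the ground-truth class, so classes do not compete within the mask head. A numpy sketch with toy, hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
C, m = 4, 7                                   # C classes, one m x m mask each

mask_logits = rng.standard_normal((C, m, m))  # mask-head output for one RoI
gt_class = 2                                  # class from the classification branch
gt_mask = rng.integers(0, 2, size=(m, m))     # binary ground-truth mask

# Per-pixel sigmoid + binary cross-entropy, on the ground-truth class's
# channel only; the other C-1 channels contribute no loss.
p = 1.0 / (1.0 + np.exp(-mask_logits[gt_class]))
loss = -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()
```

Using an independent sigmoid per class (rather than a softmax across classes) is what decouples mask prediction from classification: the box branch picks the class, and the mask branch only has to produce a good binary mask for it.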
  32. 32. Instance segmentation: Mask R-CNN ● Masks are combined with classifications and bounding boxes from Faster R-CNN 32 He et al, Mask R-CNN, 2017
  33. 33. Instance segmentation: Mask R-CNN ● Results on COCO dataset ● MNC and FCIS were winners of COCO 2015 and 2016 33 MNC: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015 FCIS: Li et al., “Fully convolutional instance-aware semantic segmentation”, CVPR 2017
  34. 34. Summary ● Semantic segmentation ○ Fully convolutional networks ○ Learnable upsampling ○ Skip connections ○ Models: ■ FCN-8s, Dilated Convolutions, U-Net, FC DenseNets ● Instance segmentation ○ Based on object/segment proposals ■ Simultaneous Detection and Segmentation (R-CNN) ■ Multi-task cascade (Faster R-CNN) ■ Mask R-CNN (Faster R-CNN) ○ Others ■ Recurrent instance segmentation 34
  35. 35. Questions? 35
  36. 36. FCN-8s vs DeepLab vs Dilated Convolutions 36 Input image FCN-8s DeepLab DilConv Ground truth Pascal VOC 2012 test set Mean IoU: FCN-8s = 62.2 DeepLab = 62.1 DilConv = 67.6 FCN-8s: Fully Convolutional Networks for Semantic Segmentation, Long, Shelhamer, Darrell, 2014-2016 DeepLab: Semantic Image Segm. with Deep Conv. Nets, Atrous Convolution, and Fully Connected CRFs, Chen et al, 2015 Dilated convolutions: Multi-scale context aggregation by dilated convolutions, Yu, Koltun, 2016
