Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)

887 views

Published on

https://telecombcn-dl.github.io/dlmm-2017-dcu/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.

Published in: Data & Analytics
  • Be the first to comment

Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)

  1. 1. Amaia Salvador amaia.salvador@upc.edu PhD Candidate Universitat Politècnica de Catalunya DEEP LEARNING WORKSHOP Dublin City University 27-28 April 2017 Object Segmentation Day 2 Lecture 7
  2. 2. Object Segmentation Define the accurate boundaries of all objects in an image 2
  3. 3. Semantic Segmentation Label every pixel! Don’t differentiate instances (cows) Classic computer vision problem Slide Credit: CS231n 3
  4. 4. Instance Segmentation Detect instances, give category, label pixels “simultaneous detection and segmentation” (SDS) Slide Credit: CS231n 4
  5. 5. Object Segmentation: Datasets Pascal Visual Object Classes 20 Classes ~ 5.000 images Pascal Context 540 Classes ~ 10.000 images 5
  6. 6. Object Segmentation: Datasets SUN RGB-D 19 Classes ~ 10.000 images Microsoft COCO 80 Classes ~ 300.000 images 6
  7. 7. Object Segmentation: Datasets CityScapes 30 Classes ~ 25.000 images ADE20K >150 Classes ~ 22.000 images 7
  8. 8. Semantic Segmentation Slide Credit: CS231n CNN COW Extract patch Run through a CNN Classify center pixel Repeat for every pixel 8
  9. 9. Semantic Segmentation Slide Credit: CS231n CNN Run “fully convolutional” network to get all pixels at once 9
  10. 10. Semantic Segmentation Slide Credit: CS231n CNN Smaller output due to pooling Problem 1: 10
  11. 11. Learnable upsampling Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015 Learnable upsampling! Slide Credit: CS231n 11
  12. 12. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 12
  13. 13. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 Dot product between filter and input 13
  14. 14. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 1 pad 1 Input: 4 x 4 Output: 4 x 4 Dot product between filter and input 14
  15. 15. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 15
  16. 16. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 Dot product between filter and input 16
  17. 17. Reminder: Convolutional Layer Slide Credit: CS231n Typical 3 x 3 convolution, stride 2 pad 1 Input: 4 x 4 Output: 2 x 2 Dot product between filter and input 17
  18. 18. Learnable Upsample: Deconvolutional Layer Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 18
  19. 19. Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 Input gives weight for filter values Learnable Upsample: Deconvolutional Layer 19
  20. 20. Learnable Upsample: Deconvolutional Layer Slide Credit: CS231n 3 x 3 “deconvolution”, stride 2 pad 1 Input: 2 x 2 Output: 4 x 4 Input gives weight for filter values Sum where output overlaps 20
  21. 21. Learnable Upsample: Deconvolutional Layer Warning: Checkerboard effect when kernel size is not divisible by the stride Source: distill.pub 21
  22. 22. Learnable Upsample: Deconvolutional Layer Source: distill.pub stride = 2, kernel_size = 3 22 Warning: Checkerboard effect when kernel size is not divisible by the stride
  23. 23. Semantic Segmentation Slide Credit: CS231n Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015 “Regular” VGG “Upside down” VGG 23
  24. 24. Better Upsampling: Subpixel Re-arange features in previous convolutional layer to form a higher resolution output Shi et al.Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network.CVPR 2016 24
  25. 25. Semantic Segmentation CNN Blobby-like segmentations Problem 2: High-level features (e.g. conv5 layer) from a pretrained classification network are the input for the segmentation branch 25
  26. 26. Skip Connections Slide Credit: CS231n Skip connections = Better results “skip connections” Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015 Recovering low level features from early layers 26
  27. 27. Dilated Convolutions Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016 Structural change in convolutional layers for dense prediction problems (e.g. image segmentation) ● The receptive field grows exponentially as you add more layers → more context information in deeper layers wrt regular convolutions ● Number of parameters increases linearly as you add more layers 27
  28. 28. Instance Segmentation Detect instances, give category, label pixels “simultaneous detection and segmentation” (SDS) Slide Credit: CS231n 28
  29. 29. Instance Segmentation More challenging than Semantic Segmentation ● Number of objects is variable ● No unique match between predicted and ground truth objects (cannot use instance IDs) Several attack lines: ● Proposal-based methods ● Recurrent Neural Networks 29
  30. 30. Proposal-based Slide Credit: CS231nHariharan et al. Simultaneous Detection and Segmentation. ECCV 2014 External Segment proposals Mask out background with mean image Similar to R-CNN, but with segment proposals 30
  31. 31. Proposal-based Slide Credit: CS231nHariharan et al. Hypercolumns for Object Segmentation and Fine-grained Localization. CVPR 2015 31
  32. 32. Proposal-based Instance Segmentation: MNC Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016 Won COCO 2015 challenge (with ResNet) Region proposal network (RPN) Reshape boxes to fixed size, figure / ground logistic regression Mask out background, predict object class Learn entire model end-to-end! Faster R-CNN for Pixel Level Segmentation in a multi-stage cascade strategy 32
  33. 33. Dai et al. Instance-aware Semantic Segmentation via Multi-task Network Cascades. CVPR 2016 Predictions Ground truth Proposal-based Instance Segmentation: MNC 33
  34. 34. He et al. Mask R-CNN. arXiv Mar 2017 Proposal-based Instance Segmentation: Mask R-CNN Faster R-CNN for Pixel Level Segmentation as a parallel prediction of masks and class labels 34
  35. 35. He et al. Mask R-CNN. arXiv Mar 2017 Mask R-CNN ● Classification & box detection losses are identical to those in Faster R-CNN ● Addition of a new loss term for mask prediction: The network outputs a K x m x m volume for mask prediction, where K is the number of categories and m is the size of the mask (square) 35
  36. 36. He et al. Mask R-CNN. arXiv Mar 2017 Mask R-CNN: RoI Align Reminder: RoI Pool from Fast R-CNN Hi-res input image: 3 x 800 x 600 with region proposal Convolution and Pooling Hi-res conv features: C x H x W with region proposal Fully-connected layers Max-pool within each grid cell RoI conv features: C x h x w for region proposal Fully-connected layers expect low-res conv features: C x h x w x/16 & rounding → misalignment ! + not differentiable 36
  37. 37. Jaderberg et al. Spatial Transformer Networks. NIPS 2015 Mask R-CNN: RoI Align Use bilinear interpolation instead of cropping + maxpool 37 Mapping given by box coordinates ( 12 and 21 = 0 translation + scale)
  38. 38. 38
  39. 39. He et al. Mask R-CNN. arXiv Mar 2017 Mask R-CNN Object Detection Instance Segmentation 39
  40. 40. Recurrent Instance Segmentation Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 40 Sequential mask generation
  41. 41. Recurrent Instance Segmentation Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 41 Mapping between ground truth and predicted masks ?
  42. 42. Recurrent Instance Segmentation: Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 Slide Credit: M. Baradad, ReadCV@UPC 1-Compute the IoU for all pairs of Predicted/GT masks Ŷt Yt 42 Coverage Loss
  43. 43. Recurrent Instance Segmentation: Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 Slide Credit: M. Baradad, ReadCV@UPC 1-Compute the IoU for all pairs of Predicted/GT masks 0.9 0 0 0.1 0.8 0.1 ... ... ... ... 43 Coverage Loss
  44. 44. Recurrent Instance Segmentation: Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 Slide Credit: M. Baradad, ReadCV@UPC 2-Find best matching: Loss: Sum of the Intersections over the union for the best matching (*-1) 44 Coverage Loss
  45. 45. Recurrent Instance Segmentation: Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 Slide Credit: M. Baradad, ReadCV@UPC 3-Also take into account the scores s1 = 0.93 s2 = 0.73 s3 = 0.86 s4 = 0.63 s5 = 0.56 Where: is the binary cross entropy: is the Iverson bracket which: Is 1 if the condition is true and 0 else 45 Coverage Loss
  46. 46. Recurrent Instance Segmentation: Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 Slide Credit: M. Baradad, ReadCV@UPC 4-Add everything together 46 Coverage Loss
  47. 47. Recurrent Instance Segmentation Romera-Paredes & H.S. Torr. Recurrent Instance Segmentation ECCV 2016 47
  48. 48. Summary Segmentation Datasets Semantic Segmentation Methods ● Deconvolution ● Dilated Convolution ● Skip Connections Instance Segmentation Methods ● Proposal-Based ● Recurrent 48
  49. 49. Questions ?

×