
Video Object Segmentation - Laura Leal-Taixé - UPC Barcelona 2018


Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has made it possible to train neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, and deep Q-networks for reinforcement learning have reshaped the signal processing landscape. This course covers the basic principles and applications of deep learning for computer vision problems such as image classification, object detection, and image captioning.


  1. Prof. Laura Leal-Taixé: Dynamic Scene Understanding
  2. Dynamic Scene Understanding: understand every pixel of a video
  3. Dynamic Scene Understanding: understand every pixel of a video. Semantic segmentation (labels: road, tree, person, car)
  4. Dynamic Scene Understanding: understand every pixel of a video. Instance-based segmentation (labels: road, tree, car, person 1, person 2, person 3)
  5. Dynamic Scene Understanding: understand every pixel of a video. Semantic segmentation, instance-based segmentation, multiple object tracking
  6. Prof. Laura Leal-Taixé: Video Object Segmentation without Temporal Information
  7. Supervised video object segmentation: input image and first mask
  8. Architecture • Series of convolutional filters • Upsampling at different scales • Final summation before prediction → gets information from all scales. Three components: (1) foreground branch: specific object, less accurate contours; (2) contour branch: accurate contours, generic objects; (3) boundary snapping: snap the foreground mask to accurate contours (a minimal code sketch of this multi-scale design is given after the slide list)
  9. Architecture: improving boundaries. Foreground branch: specific object, less accurate contours. Contour branch: accurate contours, generic objects. Boundary snapping: snap the foreground mask to accurate contours
  10. Architecture: improving boundaries (build of the same three-branch diagram)
  11. Architecture: improving boundaries. Boundary snapping: select the superpixels that are 50% or more covered by the foreground mask (see the superpixel-selection sketch after the slide list)
  12. One-Shot Video Object Segmentation. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, L. van Gool. CVPR 2017. Training pipeline: (1) base network, pre-trained on ImageNet (edges and basic image features); (2) parent network, trained on the DAVIS training set (learns how to do video segmentation); (3) test network, fine-tuned on frame 1 of the test sequence (learns which object to segment). Results are reported on frame N of the test sequence
  13. One-Shot Video Object Segmentation, finetuning step: the test network is fine-tuned on frame 1 of the test sequence and learns which object to segment • Learning the appearance of the foreground and background objects (see the one-shot finetuning sketch after the slide list)
  14. Finetuning time vs. quality on the DAVIS dataset (plot; parent network at 102 ms, a gap of 11.8 pp.)
  15. Experiments: heavy occlusions. First mask (input); occlusion 1; occlusion 2
  16. Experiments: number of annotations
  17. Experiments: number of annotations
  18. Experiments: highly dynamic scenes
  19. Introducing Semantics • The network has no notion of objects • Broken contours are often seen, especially when previously occluded parts become visible
  20. Introducing Semantics
  21. Introducing Semantics
  22. Semantic propagation • OSVOS path: obtain a coarse foreground estimation (pipeline: input image → CNN appearance model → first-round foreground estimation)
  23. Semantic propagation • Semantic prior branch that gives us instance proposals to select from (semantic prior → instance proposals → top matching instances)
  24. Semantic propagation • Semantic prior branch that gives us proposals to select from • Enforce that semantics stay coherent throughout the sequence (semantic selection & propagation + conditional classifier → semantic instance segmentation result; see the instance-selection sketch after the slide list)
  25.–29. Semantic propagation, example sequence (frames 0, 18, 24, 30, 36). Semantic selection: selected instances are person and motorbike. Figure rows: ground truth, instance segmentation proposals, first-round foreground estimation, top person and motorbike proposals
  30. Ablation studies on DAVIS 2016: finetuning on the first mask
  31. Ablation studies on DAVIS 2016: pre-training the parent network
  32. Ablation studies on DAVIS 2016: adding semantic information
  33. Ablation studies on DAVIS 2016: adding semantic information
  34. Ablation studies on DAVIS 2016: adding semantic information brings a substantial boost in temporal stability
  35. Conclusions • We can do consistent video object segmentation without the need for temporal information • Pose changes reveal parts of the object the network has not seen, which creates holes in the result • Semantic priors help deliver consistent object-type segmentations
  36. Thank you. Dr. Laura Leal-Taixé
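
Code sketches for the main technical steps in the talk follow. First, the multi-scale architecture from slide 8: a stack of convolutional filters, side outputs taken at several scales, each upsampled to the input resolution and summed before the final prediction. This is a minimal PyTorch sketch of that idea, not the authors' OSVOS implementation; the layer widths and depths are illustrative assumptions.

```python
# Minimal sketch of a multi-scale fully convolutional segmentation network:
# side outputs from several scales are upsampled and summed before prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSegNet(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        # Three convolutional stages at decreasing resolution (hypothetical widths).
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # 1x1 convolutions turn each stage's features into a single-channel side output.
        self.side1 = nn.Conv2d(32, 1, 1)
        self.side2 = nn.Conv2d(64, 1, 1)
        self.side3 = nn.Conv2d(128, 1, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Upsample every side output to the input size...
        s1 = F.interpolate(self.side1(f1), size=(h, w), mode='bilinear', align_corners=False)
        s2 = F.interpolate(self.side2(f2), size=(h, w), mode='bilinear', align_corners=False)
        s3 = F.interpolate(self.side3(f3), size=(h, w), mode='bilinear', align_corners=False)
        # ...and sum them before the prediction, so the output mixes information
        # from all scales (coarse semantics plus fine detail).
        return torch.sigmoid(s1 + s2 + s3)
```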
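The boundary-snapping rule from slide 11 keeps every superpixel that is at least 50% covered by the coarse foreground mask. A hedged sketch of that rule, assuming SLIC superpixels from scikit-image stand in for the contour-derived regions used in the talk:

```python
# Keep a superpixel as foreground if >= 50% of its pixels fall inside the coarse mask.
import numpy as np
from skimage.segmentation import slic

def snap_mask_to_superpixels(image, fg_mask, n_segments=600, threshold=0.5):
    """image: HxWx3 array; fg_mask: HxW boolean coarse foreground estimate."""
    superpixels = slic(image, n_segments=n_segments, start_label=0)
    snapped = np.zeros_like(fg_mask, dtype=bool)
    for label in np.unique(superpixels):
        region = superpixels == label
        # Fraction of this superpixel covered by the foreground mask.
        coverage = fg_mask[region].mean()
        if coverage >= threshold:
            snapped[region] = True  # keep the whole superpixel -> contours snap to accurate edges
    return snapped
```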
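Slides 12–13 describe the three training stages of OSVOS: ImageNet pre-training, parent-network training on DAVIS, and one-shot finetuning on the first annotated frame of the test sequence. The sketch below shows only the last, one-shot step; `parent_net` is any per-pixel foreground network (e.g. the multi-scale sketch above), and the optimizer, loss, and step count are illustrative assumptions rather than the paper's settings.

```python
# One-shot finetuning: adapt the parent network to the single annotated first frame,
# so it learns which specific object to segment in the rest of the sequence.
import torch
import torch.nn.functional as F

def one_shot_finetune(parent_net, first_frame, first_mask, steps=200, lr=1e-4):
    """first_frame: 1x3xHxW tensor; first_mask: 1x1xHxW binary tensor."""
    net = parent_net  # in practice, fine-tune a copy so the parent stays reusable
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    net.train()
    for _ in range(steps):
        optimizer.zero_grad()
        pred = net(first_frame)                       # foreground probability map
        loss = F.binary_cross_entropy(pred, first_mask)
        loss.backward()
        optimizer.step()
    net.eval()
    return net  # now specialised to the annotated object; run it frame by frame
```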
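Slides 22–24 combine the coarse OSVOS foreground estimate with semantic instance proposals. A minimal sketch of the selection step, using IoU against the foreground estimate as the matching criterion; the threshold and data structures are hypothetical, and the talk's conditional classifier and frame-to-frame propagation are not shown:

```python
# Pick the instance proposals that best overlap the first-round foreground estimate,
# and remember their semantic classes so they can be kept coherent across the sequence.
import numpy as np

def select_instances(fg_estimate, proposals, min_iou=0.3):
    """fg_estimate: HxW bool; proposals: list of dicts {'mask': HxW bool, 'class': str}."""
    selected = []
    for p in proposals:
        inter = np.logical_and(fg_estimate, p['mask']).sum()
        union = np.logical_or(fg_estimate, p['mask']).sum()
        iou = inter / union if union > 0 else 0.0
        if iou >= min_iou:
            selected.append((iou, p))
    # Sort by overlap; the classes of the top matches (e.g. person, motorbike)
    # define the semantic prior enforced throughout the sequence.
    selected.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in selected], {p['class'] for _, p in selected}
```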
