RVOS: End-to-End Recurrent Network for Video Object Segmentation (CVPR 2019)

https://imatge-upc.github.io/rvos/

Multiple-object video object segmentation is a challenging task, especially in the zero-shot case, where no object mask is given at the initial frame and the model has to discover the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence in two different domains: (i) the spatial domain, which allows it to discover the different object instances within a frame, and (ii) the temporal domain, which allows it to keep the segmented objects coherent over time. We train RVOS for zero-shot video object segmentation and are the first to report zero-shot quantitative results on the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS to one-shot video object segmentation by feeding the masks obtained at previous time steps as inputs to the recurrent module. Our model achieves results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods that do not use online learning on the DAVIS-2017 benchmark. Moreover, our model has faster inference runtimes than previous methods, running at 44 ms/frame on a P100 GPU.
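The two recurrence domains can be illustrated with a toy sketch. This is not the RVOS architecture. A small dense tanh cell with random weights stands in for the trained convolutional recurrent cells, and every name here is hypothetical (`rvos_toy_forward`, `W_s`, `W_t`, `W_m`, `mode`, `first_masks`). The hidden state of instance i at frame t receives two inputs:

- the state of instance i-1 in the same frame (spatial recurrence);
- the state of instance i in the previous frame (temporal recurrence).

When first-frame masks are given (one-shot), the masks from the previous time step are also fed back as input. In the zero-shot case there is no mask input. The `mode` switch drops one of the two recurrences, mimicking the spatial-only and temporal-only variants the slides compare against spatio-temporal recurrence.

```python
import numpy as np


def rvos_toy_forward(frames, n_instances, hidden=8, mode="spatiotemporal",
                     first_masks=None, seed=0):
    """Toy spatio-temporal recurrence over frames (time) and instances (space).

    Hypothetical sketch, not the RVOS model.
    frames: array of shape (T, P), one flattened feature vector per frame.
    first_masks: optional (n_instances, P) masks given at frame 0 (one-shot);
                 None means zero-shot (no masks are fed back).
    Returns soft masks of shape (T, n_instances, P).
    """
    if mode not in ("spatial", "temporal", "spatiotemporal"):
        raise ValueError(f"unknown mode: {mode}")
    rng = np.random.default_rng(seed)
    T, P = frames.shape
    # Random weights stand in for trained parameters (hypothetical names).
    W_x = rng.normal(scale=0.1, size=(hidden, P))       # frame features
    W_s = rng.normal(scale=0.1, size=(hidden, hidden))  # instance i-1, same frame
    W_t = rng.normal(scale=0.1, size=(hidden, hidden))  # instance i, previous frame
    W_m = rng.normal(scale=0.1, size=(hidden, P))       # mask fed back as input
    W_o = rng.normal(scale=0.1, size=(P, hidden))       # hidden state -> mask logits

    one_shot = first_masks is not None
    h_prev_frame = np.zeros((n_instances, hidden))
    # Toy simplification: the given masks also condition the frame-0 prediction.
    masks_prev = (np.asarray(first_masks, dtype=float) if one_shot
                  else np.zeros((n_instances, P)))
    out = np.zeros((T, n_instances, P))

    for t in range(T):
        h_cur_frame = np.zeros((n_instances, hidden))
        h_spatial = np.zeros(hidden)  # state of the previous instance in frame t
        for i in range(n_instances):
            pre = W_x @ frames[t]
            if mode in ("spatial", "spatiotemporal"):
                pre = pre + W_s @ h_spatial          # spatial recurrence
            if mode in ("temporal", "spatiotemporal"):
                pre = pre + W_t @ h_prev_frame[i]    # temporal recurrence
            if one_shot:
                pre = pre + W_m @ masks_prev[i]      # previous mask as input
            h = np.tanh(pre)
            out[t, i] = 1.0 / (1.0 + np.exp(-(W_o @ h)))  # sigmoid -> soft mask
            h_cur_frame[i] = h
            h_spatial = h
        h_prev_frame = h_cur_frame
        masks_prev = out[t]  # masks from step t feed step t+1 (used only in one-shot)
    return out
```

In this toy, a temporal-only cell with no mask input gives every instance slot an identical state at every frame, because nothing distinguishes slot i from slot i+1. Spatial recurrence, or the first-frame masks in the one-shot case, is what separates the slots.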

  1. RVOS: End-to-End Recurrent Network for Video Object Segmentation
     Carles Ventura, Míriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marqués, Xavi Giró
  2. Video Object Segmentation
     ● One-shot vs. zero-shot video object segmentation
  3. Motivation
     ● End-to-end trainable model
        ○ YouTube-VOS includes 3,471 training videos
        ○ No dependency on other pre-trained networks such as optical flow
     ● Recurrent network
        ○ Extend RSIS (Recurrent Semantic Instance Segmentation) to video object segmentation
     ● Spatial and temporal recurrence
        ○ Study whether spatio-temporal recurrence outperforms spatial-only and temporal-only recurrence
     ● One-shot and zero-shot video object segmentation
        ○ RSIS is able to discover the object instances in an image
        ○ No published results for zero-shot multiple-object video object segmentation
     ● Fast method
        ○ No need for fine-tuning at inference (no online learning)
  4. Related Work
     ● Sequence-to-Sequence (S2S) [1]
        ○ Drawbacks
           ■ Each instance is trained and segmented independently
           ■ Designed only for one-shot video object segmentation
     [1] N. Xu et al., YouTube-VOS: Sequence-to-Sequence Video Object Segmentation. ECCV 2018
  5. Related Work
     ● ConvGRU [2]
        ○ Drawbacks
           ■ Each instance is trained and segmented independently
           ■ Optical flow depends on a network trained for another task, so the model is not end-to-end trainable
     [2] P. Tokmakov et al., Learning Video Object Segmentation with Visual Memory. ICCV 2017
  6. Related Work
     ● RSIS [3]
        ○ Drawbacks
           ■ Model is designed for image (not video) object segmentation
     [3] A. Salvador et al., Recurrent Neural Networks for Semantic Instance Segmentation. arXiv
  7-9. Proposed model
     ● RVOS: Recurrent Video Object Segmentation [architecture figures not included]
  10. Experiments
     ● Two different benchmarks
        ○ YouTube-VOS
           ■ 3,471 training videos
           ■ 474 testing videos (validation set)
        ○ DAVIS-2017
           ■ 90 training videos (train+val)
           ■ 30 testing videos (test-dev set)
     ● Two different tasks
        ○ One-shot video object segmentation
        ○ Zero-shot video object segmentation
     ● Evaluation measures
        ○ Region similarity J
        ○ Contour accuracy F
  11. Experiments: One-shot VOS on YouTube-VOS
     ● Ablation study
  12. Experiments: One-shot VOS on YouTube-VOS
     ● Comparison with state-of-the-art (SoA) techniques
  13. Experiments: One-shot VOS on YouTube-VOS
     ● Performance vs. number of instances
  14. Experiments: One-shot VOS on YouTube-VOS
  15-18. Experiments: One-shot VOS on YouTube-VOS
     ● Qualitative results
  19. Experiments: Zero-shot VOS on YouTube-VOS
     ● Problem related to annotated objects
        ○ Missing object annotations
  20. Experiments: Zero-shot VOS on YouTube-VOS
     ● Ablation study
     ● First results reported for zero-shot VOS on YouTube-VOS
  21. Experiments: Zero-shot VOS on YouTube-VOS
  22-24. Experiments: Zero-shot VOS on YouTube-VOS
     ● Qualitative results
  25. Experiments: One-shot VOS on DAVIS-2017
     ● We take advantage of the model already trained on YouTube-VOS
        ○ Apply the pre-trained model directly
        ○ Fine-tune the pre-trained model on DAVIS-2017
     ● S2S model (SoA, also trained on YouTube-VOS)
        ○ Results on DAVIS-2016 (single object, foreground-background video object segmentation)
        ○ No results on DAVIS-2017 (multiple objects)
  26. Experiments: One-shot VOS on DAVIS-2017
     ● Comparison with SoA techniques
  27. Experiments: One-shot VOS on DAVIS-2017
     ● Qualitative results
  28. Experiments: Zero-shot VOS on DAVIS-2017
     ● First results reported for zero-shot VOS on DAVIS-2017
     ● Results for zero-shot VOS on YouTube-VOS for unseen categories were also low
  29. Experiments: Zero-shot VOS on DAVIS-2017
     ● Qualitative results
  30. Conclusions
     ● Fully end-to-end trainable model for video object segmentation
     ● Designed for multiple-object video object segmentation
     ● Designed for both one-shot and zero-shot video object segmentation
     ● Spatio-temporal recurrence outperforms spatial-only and temporal-only recurrence
     ● One-shot video object segmentation:
        ○ YouTube-VOS: results comparable to SoA techniques (S2S)
        ○ DAVIS-2017:
           ■ Outperforms other SoA techniques that do not use online learning
           ■ Comparable results to some SoA techniques that use online learning
     ● Zero-shot video object segmentation:
        ○ First results reported on both YouTube-VOS and DAVIS-2017 (no previous results on either)
  31. Thank you for your attention
     Carles Ventura Royo
     cventuraroy@uoc.edu
     https://imatge-upc.github.io/rvos/
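The two evaluation measures listed for the experiments can be sketched as follows, assuming binary NumPy masks for one object in one frame.

- Region similarity J is the intersection over union of the predicted and ground-truth masks.
- Contour accuracy F is the F-measure of boundary precision and recall. A boundary pixel counts as matched if the other mask's boundary lies within `tol` pixels.

This is a simplified, hypothetical implementation: the function names, the 4-neighbour boundary, and the square dilation are choices made here. The official DAVIS evaluation code differs in details; for example, its matching tolerance scales with the image size.

```python
import numpy as np


def region_similarity(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match
    return np.logical_and(pred, gt).sum() / union


def boundary(mask):
    """Mask pixels with at least one 4-neighbour outside the mask."""
    m = np.asarray(mask, bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior


def dilate(mask, r):
    """Square (Chebyshev-distance) dilation of a boolean mask by r pixels."""
    padded = np.pad(mask, r, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def contour_accuracy(pred, gt, tol=1):
    """Contour accuracy F: F-measure of boundary precision and recall."""
    bp, bg = boundary(pred), boundary(gt)
    n_p, n_g = bp.sum(), bg.sum()
    if n_p == 0 and n_g == 0:
        return 1.0
    if n_p == 0 or n_g == 0:
        return 0.0
    precision = (bp & dilate(bg, tol)).sum() / n_p  # predicted contour near GT contour
    recall = (bg & dilate(bp, tol)).sum() / n_g     # GT contour near predicted contour
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A one-pixel shift of a square mask lowers J but keeps F at 1.0 with `tol=1`. This is the intended difference between a region-based and a contour-based measure.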
