CURRICULUM LEARNING FOR
RECURRENT VIDEO OBJECT SEGMENTATION
Co-directors: Xavier Giró Nieto and Carles Ventura Royo
Author: Maria Gonzàlez Calabuig
CONTENTS
Introduction
Dataset
The model
Experiment sets
Techniques
Qualitative results
YouTube-VOS
Conclusions
INTRODUCTION
Curriculum Learning for Recurrent VOS - 4 of 144
Curriculum Learning:
Methodology inspired by the learning process of humans. The training data is presented in a meaningful way, from simple to complex concepts.
4 curriculums
THE DATASET
THE MODEL
Yoshua Bengio et al. “Curriculum Learning”, ICML 2009.
THE TASK
Semi-supervised or “one-shot” Video Object Segmentation
Given to the model: the object masks of the first frame.
Estimated by the model: the object masks of the remaining frames.
DATASET
KITTI-MOTS
Its video sequences present challenges:
Andreas Geiger, Philip Lenz, and Raquel Urtasun. “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite”, CVPR 2012.
THE MODEL
End-to-End Recurrent Network for video object segmentation: RVOS
Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé and Bastian Leibe. “STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos”, ECCV 2020.
EXPERIMENT SETS
SETS OF EXPERIMENTS
All techniques tested on two sets of experiments:
Resolution   Batch size   Clip length
287x950      2            3
256x448      4            5
METRICS
The results have been evaluated on the official metrics of the MOTS Challenge.
- sMOTSA has been defined as the reference metric.
Paul Voigtlaender et al. “MOTS: Multi-Object Tracking and Segmentation”, CVPR 2019.
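For reference, the sMOTSA definition from the cited MOTS paper can be written as below. The formula on the original slide is not recoverable from the text, so this is restated from that paper; the symbols are those of the paper, not of these slides.

```latex
\[
\mathrm{sMOTSA} = \frac{\widetilde{TP} - |FP| - |IDS|}{|M|},
\qquad
\widetilde{TP} = \sum_{h \in TP} \mathrm{IoU}\bigl(h, c(h)\bigr)
\]
```

Here M is the set of ground-truth masks, TP and FP are the sets of true-positive and false-positive predicted masks, IDS is the set of identity switches, and c(h) is the ground-truth mask matched to the predicted mask h.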
EVALUATION METHOD
Proposal: compute the metrics for each sequence and average them over the sequences.
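A minimal sketch of what averaging per sequence could look like for sMOTSA, compared with pooling the counts of all sequences into a single score. The `SequenceCounts` container, the function names and the example numbers are illustrative assumptions, not taken from the slides or from the official evaluation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceCounts:
    """Hypothetical per-sequence counts needed for sMOTSA."""
    soft_tp: float  # sum of mask IoUs over the true positives
    fp: int         # number of false positives
    ids: int        # number of identity switches
    num_gt: int     # number of ground-truth masks


def smotsa(counts: SequenceCounts) -> float:
    """sMOTSA for one set of counts: (soft TP - FP - IDS) / number of GT masks."""
    if counts.num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return (counts.soft_tp - counts.fp - counts.ids) / counts.num_gt


def smotsa_pooled(sequences: List[SequenceCounts]) -> float:
    """Pool the counts of all sequences, then compute one score."""
    total = SequenceCounts(
        soft_tp=sum(s.soft_tp for s in sequences),
        fp=sum(s.fp for s in sequences),
        ids=sum(s.ids for s in sequences),
        num_gt=sum(s.num_gt for s in sequences),
    )
    return smotsa(total)


def smotsa_per_sequence(sequences: List[SequenceCounts]) -> float:
    """Compute one score per sequence, then average the scores."""
    return sum(smotsa(s) for s in sequences) / len(sequences)


# Made-up counts: one long sequence and one short, harder sequence.
example = [
    SequenceCounts(soft_tp=80.0, fp=10, ids=10, num_gt=100),
    SequenceCounts(soft_tp=4.0, fp=1, ids=1, num_gt=10),
]
pooled_score = smotsa_pooled(example)
averaged_score = smotsa_per_sequence(example)
```

With pooling, long sequences dominate the score; averaging per sequence gives every sequence the same weight regardless of its length.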
SCHEDULE SAMPLING
VOS requires information about the previous step.
During training, that information can come from two sources:
- Train using the model’s outputs.
- Train using the ground-truth annotations (TEACHER FORCING): fast and efficient, but leads to exposure bias.
Schedule Sampling strategies over training time:
- Linear: Forward, Inverse
- Step: Forward, Inverse
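The four strategies can be sketched as a function that returns the probability of feeding the ground-truth mask of the previous frame instead of the model's own prediction. The slides do not define the terms, so this sketch assumes that "forward" moves from ground truth to predictions as training advances and that "inverse" does the opposite. The function names and the number of stages in the step schedule are also assumptions.

```python
import random


def ground_truth_probability(progress, shape="linear", direction="forward", num_stages=4):
    """Probability of using the ground-truth mask as the previous-step input.

    progress: fraction of training completed, in [0, 1].
    shape: "linear" (smooth ramp) or "step" (piecewise-constant ramp).
    direction: "forward" (assumed: ground truth first, predictions later)
               or "inverse" (assumed: the reverse).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if shape == "linear":
        p_gt = 1.0 - progress
    elif shape == "step":
        if num_stages < 2:
            raise ValueError("num_stages must be at least 2")
        # Training is split into equal stages; the probability is constant per stage.
        stage = min(int(progress * num_stages), num_stages - 1)
        p_gt = 1.0 - stage / (num_stages - 1)
    else:
        raise ValueError("shape must be 'linear' or 'step'")
    if direction == "inverse":
        p_gt = 1.0 - p_gt
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'inverse'")
    return p_gt


def pick_previous_mask(gt_mask, predicted_mask, p_gt, rng=random):
    """Sample which mask is fed to the next step, given the current probability."""
    return gt_mask if rng.random() < p_gt else predicted_mask
```

At `progress=0.0` the forward schedules behave like pure teacher forcing, and at `progress=1.0` the model is trained only on its own outputs, which is the condition it faces at inference time.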
RESULTS ON THE FORWARD STRATEGIES
RESULTS ON THE INVERSE STRATEGIES
OVERVIEW
FRAME SKIPPING
KITTI-MOTS has slow-motion video sequences.
(Example: consecutive frames #1 to #6 of a sequence.)
Ideally: N frames of the sequence
But we have limitations (e.g. memory constraints)
Frame Skipping configurations over training time:
- From 0 to 9: during all training, or during the first half of training
- From 1 to 5: during all training, or during the first half of training
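A minimal sketch of a frame-skipping curriculum. It assumes that the number of skipped frames grows linearly with training and that, when the curriculum covers only the first half, training continues afterwards without skipping. Neither detail is stated on the slides, and all function and parameter names are hypothetical.

```python
def frame_skip(epoch, total_epochs, min_skip=0, max_skip=9, active_fraction=1.0):
    """Number of frames skipped between consecutive clip frames at a given epoch.

    active_fraction: share of training during which the curriculum is applied
    (1.0 for all training, 0.5 for the first half). Outside that window this
    sketch assumes no skipping.
    """
    active_epochs = max(1, int(total_epochs * active_fraction))
    if epoch >= active_epochs:
        return 0
    if active_epochs == 1:
        return min_skip
    progress = epoch / (active_epochs - 1)
    return min_skip + round(progress * (max_skip - min_skip))


def clip_indices(start, clip_length, skip, num_frames):
    """Indices of the frames in a training clip when `skip` frames are skipped."""
    if clip_length < 1:
        raise ValueError("clip_length must be at least 1")
    stride = skip + 1
    indices = [start + i * stride for i in range(clip_length)]
    if indices[-1] >= num_frames:
        raise ValueError("clip does not fit in the sequence")
    return indices
```

For a clip of 3 frames, `clip_indices(2, 3, 2, 100)` selects frames 2, 5 and 8, so the clip spans more motion than three consecutive frames while using the same amount of memory.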
RESULTS ON THE FRAME SKIPPING APPLIED DURING ALL TRAINING
RESULTS ON THE FRAME SKIPPING APPLIED ONLY WITH GROUND-TRUTH
OVERVIEW
TEMPORAL AND SPATIAL RECURRENCES
KITTI-MOTS is a crowded dataset.
TEMPORAL RECURRENCE: time (frame sequence)
SPATIAL RECURRENCE: space (object sequence)
Proposed curriculum: only temporal recurrence during the first half of training.
Temporal and Spatial Recurrence configurations:
- Spatio-temporal during all training
- Only temporal during all training
- Only temporal during the first half of training
- Only temporal during the second half of training
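The four configurations can be expressed as a small selector that decides, per epoch, which recurrence the model is trained with. This sketch assumes that whenever training is not "only temporal" it uses the full spatio-temporal recurrence; the strategy names are hypothetical labels for the configurations listed above.

```python
STRATEGIES = (
    "spatio_temporal_all",
    "temporal_all",
    "temporal_first_half",
    "temporal_second_half",
)


def recurrence_mode(epoch, total_epochs, strategy):
    """Return "temporal" or "spatio-temporal" for the given epoch and strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    in_first_half = epoch < total_epochs / 2
    if strategy == "spatio_temporal_all":
        return "spatio-temporal"
    if strategy == "temporal_all":
        return "temporal"
    if strategy == "temporal_first_half":
        return "temporal" if in_first_half else "spatio-temporal"
    # Remaining case: temporal_second_half.
    return "spatio-temporal" if in_first_half else "temporal"
```

With `total_epochs=10`, the proposed curriculum (`temporal_first_half`) trains epochs 0 to 4 with temporal recurrence only and epochs 5 to 9 with both recurrences.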
Qualitative comparison (panels): Ground-truth; Only Spatio-Temporal; Only Temporal; Only Temporal first half; Only Temporal second half
LOSS PENALIZATION BY OBJECT AREA
KITTI-MOTS contains instances with different resolution.
A hypothesis is made: depending on their area, some instances are DIFFICULT and others are EASY.
A curriculum is created over training time.
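One possible reading of this curriculum, sketched under strong assumptions: larger instances are taken to be the easy ones, and the per-object loss weight starts proportional to the object area and moves linearly towards equal weights as training advances. The slides do not give the actual weighting rule, so the formula and all names below are illustrative only.

```python
def area_weight(area, max_area, progress):
    """Loss weight for one object.

    progress: fraction of training completed, in [0, 1].
    At progress 0 the weight equals the relative area (assumed: large = easy);
    at progress 1 every object has weight 1.
    """
    if max_area <= 0:
        raise ValueError("max_area must be positive")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    relative_area = min(area / max_area, 1.0)
    return (1.0 - progress) * relative_area + progress


def weighted_loss(losses, areas, progress):
    """Weighted average of per-object losses using the area-based weights."""
    max_area = max(areas)
    weights = [area_weight(a, max_area, progress) for a in areas]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    return sum(w * l for w, l in zip(weights, losses)) / total_weight
```

With two objects of areas 100 and 50 and losses 1.0 and 3.0, the weighted loss is about 1.67 at the start of training and exactly 2.0 at the end, when both objects count equally.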
Experiment sets:
Resolution   Batch size   Clip length
287x950      2            3
256x448      4            5
LAST MINUTE RESULTS
QUALITATIVE RESULTS
YouTube-VOS
Ning Xu et al. “YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark”, ECCV 2018
- Training parameters:
Resolution   Batch size   Clip length
256x448      4            5
- Evaluated with the official metrics of the YouTube-VOS challenge.
Schedule Sampling (panels): Forward Linear; Inverse Linear; Forward Step; Inverse Linear
Results on KITTI-MOTS / Results on YouTube-VOS
Frame Skipping: “From 0 to 9” adapted to “from 0 to 3”
Results on KITTI-MOTS / Results on YouTube-VOS
CONCLUSIONS
SCHEDULE SAMPLING
FRAME SKIPPING
TEMPORAL AND SPATIAL RECURRENCES
LOSS PENALIZATION BY OBJECT AREA
Importance of knowing the dataset.
KITTI-MOTS, YouTube-VOS
FUTURE WORK
- Schedule Sampling
- Frame Skipping
- Loss penalization by object area
- Other curriculums
- Combination of the best curriculums
WORKSHOP SUBMISSIONS
Acceptance Notification: August 3, 2020
PAD2020
Curriculum Learning for Recurrent Video Object Segmentation
Maria Gonzàlez Calabuig
Barcelona, 24th July 2020