https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different scheduled sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse scheduled sampling is a better option than the classic forward one, and that progressively skipping frames during training is beneficial, but only when training with the ground-truth masks instead of the predicted ones.
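The forward and inverse scheduled sampling variants compared above can be sketched as follows; the linear schedules and the function names are illustrative assumptions, not the exact schedules used in the work.

```python
import random

def forward_schedule(epoch, num_epochs):
    # Classic scheduled sampling: start by feeding ground-truth masks,
    # then progressively switch to the model's own predictions.
    return 1.0 - epoch / num_epochs   # probability of using ground truth

def inverse_schedule(epoch, num_epochs):
    # Inverse scheduled sampling: start from the model's predictions,
    # then progressively introduce the ground-truth masks.
    return epoch / num_epochs

def pick_input(prev_gt_mask, prev_pred_mask, epoch, num_epochs, schedule):
    """Choose which previous-frame mask is fed to the recurrent model."""
    if random.random() < schedule(epoch, num_epochs):
        return prev_gt_mask
    return prev_pred_mask
```

At epoch 0 the forward schedule always feeds the ground truth, while the inverse one always feeds the prediction; both cross over as training progresses.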
A presentation by SMART Infrastructure Facility Research Director Dr Pascal Perez to the 11th International Multidisciplinary Modeling and Simulation Multiconference (I3M), Bordeaux, September 2014.
Coordinated by the OER Foundation, OERu is an independent, not-for-profit organization with 35 participating higher education institutions worldwide, making higher education accessible to everyone by offering free online courses and “affordable ways for learners to gain academic credit towards qualifications from recognised institutions” (McGreal et al. 2014). The 2015 OERu evaluation follows the CIPP (context, input, process, and product) evaluation framework (Stufflebeam 2003) and focuses on “input analysis” at this stage. The evaluation aims to assess different design options and identify major challenges in online curriculum development, covering the nomination of open courses by participating institutions, open business models, open governance, and other aspects. Issues raised in the evaluation process are not unique to OERu and will have relevance to other practitioners designing open education.
Slides EMOOCs 2016 European MOOC Stakeholders Summit, Olivier Bernaert
SLIDES - Presentation by Lucie Dhorne
The past few years have seen an exponential growth in the number of MOOCs worldwide. However, the available completion rate data shows that motivation can quickly fade even for students who are highly motivated at the beginning of the courses. Faced with this reality, it seems crucial for the future of MOOCs to address this motivational issue and to find ways to improve completion rates.
IFP School launched two MOOCs – “Sustainable Mobility” and “Oil & Gas” – which saw unusually high completion rates.
In this paper we analyze the results obtained within these two MOOCs. Our goal is to identify the factors that made such completion rates possible and to understand how these key issues help to produce a successful MOOC. Through this analysis, we are able to give some tips in terms of video recording, interactive assignment design (such as serious games or mini-games) and participant mentoring to promote motivation. Applying these tips when designing a MOOC will minimize the chance of participant withdrawal and thus lead to high completion rates.
It is important to understand the basic concept of a staff development program and its significance when applied within an organisation to build a stronger and more efficient workforce.
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
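As a small illustration of one of the training procedures mentioned, the standard VAE objective combines a reconstruction term with a KL regularizer; the squared-error reconstruction below is a common simplification and an assumption here, not necessarily the formulation used in the slides.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Standard VAE training objective (negative ELBO): a reconstruction
    term plus the KL divergence of the approximate posterior
    N(mu, diag(exp(log_var))) from the standard normal prior N(0, I)."""
    recon = np.sum((x - x_recon) ** 2)  # squared-error reconstruction
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl
```

When the posterior matches the prior (mu = 0, log_var = 0) and the reconstruction is perfect, the loss is zero.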
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data has been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, such as scarce video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence in RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
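The attention mechanism described above can be sketched in a few lines of NumPy; this is a single head without learned projections or masking, a minimal illustration rather than a full transformer layer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    In self-attention, Q, K and V are projections of the same token
    sequence; in cross-attention, Q comes from one sequence and K, V
    from another."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values
```

Each output token is a convex combination of the value vectors, weighted by how similar its query is to every key.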
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. A large and diverse amount of textual and visual data has been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, such as scarce video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor-quality annotations, in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.
The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.
By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.
The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that by pre-training a deep neural network with the proposed synthetic dataset one can improve the ability of the network to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation by making use of variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information theoretic objective. We assess our method in Minecraft 3D maps with different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
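The information-theoretic objective mentioned above can be illustrated with a DIAYN-style intrinsic reward; this is an assumption for illustration, not necessarily the exact objective used in the thesis.

```python
import numpy as np

def skill_discovery_reward(log_q_z_given_s, num_skills):
    """Intrinsic reward in the style of DIAYN: r = log q(z|s) - log p(z).

    Maximizing it maximizes a variational lower bound on the mutual
    information I(S; Z) between visited states and the active skill z.
    log_q_z_given_s is a learned discriminator's log-probability of the
    active skill given the current state; p(z) is uniform over skills."""
    log_p_z = -np.log(num_skills)
    return log_q_z_given_s - log_p_z
```

The reward is positive whenever the discriminator identifies the active skill better than chance, pushing skills to visit distinguishable states.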
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error-prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to low accuracy scores of human keypoint estimation by OpenPose and the accompanying loss of information, or insufficient capacity of the used transformer model. Results may improve with the use of datasets containing higher repetition rates of individual signs, or by focusing more precisely on keypoint extraction of the hands.
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
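As a toy illustration of the input-attribution idea mentioned above, consider a linear classifier, where the input gradient is available in closed form; real saliency methods backpropagate through a deep network, so this is only a sketch of the principle.

```python
import numpy as np

def linear_saliency(W, target_class):
    """Saliency of each input dimension for a linear classifier s = W x + b.

    For a linear model, the input gradient d s[target] / d x is simply the
    corresponding weight row, so |W[target]| indicates which input
    dimensions most influence the prediction -- the core idea behind
    gradient-based saliency maps."""
    return np.abs(W[target_class])

# Toy example: 4-dimensional input, 2 classes.
W = np.array([[0.0, 2.0, -1.0, 0.5],
              [1.0, 0.0,  0.0, 0.0]])
saliency = linear_saliency(W, target_class=0)  # -> [0., 2., 1., 0.5]
```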
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and a computational perspective.
https://telecombcn-dl.github.io/dlai-2020/
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g. robotics, autonomous driving) or decision making (e.g. resource optimization in wireless communication networks). It also advances in the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will firstly review the basic neural architectures to encode and decode vision, text and audio, to later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last years, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
Deep neural networks have revolutionized the data analytics scene by improving results in several and diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised the interest across multiple scientific fields, especially in those where large amounts of data and computation are available. This change of paradigm in data analytics has several ethical and economic implications that are driving large investments, political debates and sounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point at how the same principle has been adopted for several other tasks. Finally, some of the forthcoming potentials and risks for AI will be pointed out.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT … by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
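The first technique, skipping computation on vertices that have already converged, can be sketched as follows; the per-vertex freezing heuristic and tolerances here are illustrative assumptions, not the exact criteria of STICD, and the sketch models the idea rather than the actual computational saving.

```python
import numpy as np

def pagerank_skip_converged(adj, d=0.85, tol=1e-8, vertex_tol=1e-10):
    """Power-iteration PageRank that freezes vertices once their rank
    change falls below vertex_tol, skipping further updates to them.

    adj[u] lists the out-neighbours of vertex u."""
    n = len(adj)
    rank = np.full(n, 1.0 / n)
    out_deg = np.array([max(len(adj[u]), 1) for u in range(n)])
    converged = np.zeros(n, dtype=bool)
    while True:
        new_rank = np.full(n, (1.0 - d) / n)  # teleport contribution
        for u in range(n):
            share = d * rank[u] / out_deg[u]
            for v in adj[u]:
                new_rank[v] += share           # push rank along out-links
        new_rank[converged] = rank[converged]  # frozen vertices keep rank
        delta = np.abs(new_rank - rank)
        converged |= delta < vertex_tol        # freeze newly converged ones
        rank = new_rank
        if delta.sum() < tol:
            return rank
```

On a symmetric graph such as a 3-cycle, every vertex converges immediately to the uniform rank 1/3.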
4. INTRODUCTION
Curriculum Learning for Recurrent VOS - 4 of 144
Curriculum Learning:
Methodology inspired by the learning process of humans. The training data is presented in a meaningful order, from simple to complex concepts.
4 curriculums, applied to THE DATASET and THE MODEL.
Yoshua Bengio et al. “Curriculum Learning”, ICML 2009.
11. INTRODUCTION
THE TASK
Semi-supervised or “one-shot” Video Object Segmentation
[Figure labels: given to the model / estimated by the model]
13. KITTI-MOTS
DATASET
Andreas Geiger, Philip Lenz, and Raquel Urtasun. “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite”, CVPR 2012.
Its video sequences present challenges.
19. THE MODEL
End-to-End Recurrent Network for video object segmentation: RVOS
Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
Ali Athar et al. “STEM-Seg: Spatio-Temporal Embeddings for Instance Segmentation in Videos”, ECCV 2020.
22. SETS OF EXPERIMENTS
All techniques tested on two sets of experiments:
- Set 1: Resolution 287x950, Batch Size 2, Clip Length 3
- Set 2: Resolution 256x448, Batch Size 4, Clip Length 5
23. METRICS
The results have been evaluated on the official metrics of the MOTS Challenge.
- sMOTSA has been defined as the reference metric.
Paul Voigtlaender et al. “MOTS: Multi-Object Tracking and Segmentation”, CVPR 2019.
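For reference (the slide showed the formula as an image, so this is reconstructed from the MOTS paper by Voigtlaender et al.), sMOTSA combines a soft true-positive count with false positives and identity switches:

```latex
\mathrm{sMOTSA} = \frac{\widetilde{TP} - |FP| - |IDS|}{|M|},
\qquad
\widetilde{TP} = \sum_{h \in \mathrm{TP}} \mathrm{IoU}\big(h, c(h)\big)
```

where M is the set of ground-truth masks, FP the false positives, IDS the identity switches, and c(h) the ground-truth mask matched to hypothesis h.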
59. FRAME SKIPPING
Ideally: train on all N frames of the sequence.
But we have limitations (e.g. memory constraints).
75. FRAME SKIPPING
Frame skipping configurations tested (skip range x schedule):
- From 0 to 9: applied during all training, or only during the first half of training
- From 1 to 5: applied during all training, or only during the first half of training
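A progressive frame-skipping schedule like the ones above can be sketched as follows. This is a hedged illustration, not the authors' training code: the linear schedule, the function names, and the uniform clip sampling are assumptions.

```python
# Progressive frame-skipping curriculum: the maximum allowed skip grows
# linearly over training, so early clips are densely sampled and later
# clips span longer time ranges.
import random

def max_skip(epoch, total_epochs, start_skip=0, end_skip=9, first_half_only=False):
    """Maximum allowed skip at a given epoch (linear schedule).

    With first_half_only=True the schedule finishes ramping by mid-training
    and then stays at end_skip.
    """
    horizon = total_epochs // 2 if first_half_only else total_epochs
    progress = min(epoch / max(horizon - 1, 1), 1.0)
    return round(start_skip + progress * (end_skip - start_skip))

def sample_clip(num_frames, clip_len, skip):
    """Pick clip_len frame indices separated by `skip` intermediate frames."""
    stride = skip + 1
    span = (clip_len - 1) * stride
    start = random.randint(0, max(num_frames - 1 - span, 0))
    return [start + i * stride for i in range(clip_len)]

# At the start of training, clips are consecutive frames (skip 0);
# by the last epoch of a 10-epoch run, the skip reaches 9.
print(max_skip(0, 10))   # 0
print(max_skip(9, 10))   # 9
print(sample_clip(100, 5, max_skip(0, 10)))  # 5 consecutive frame indices
```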
83. TEMPORAL AND SPATIAL RECURRENCES
KITTI-MOTS is a crowded dataset:
- Temporal recurrence: along time (the frame sequence).
- Spatial recurrence: along space (the object sequence).
Proposed curriculum, four configurations of the temporal and spatial recurrences:
- Spatio-temporal during all training
- Only temporal during all training
- Only temporal during the first half of training
- Only temporal during the second half of training
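The four configurations above can be sketched as a helper that decides, per epoch, whether the spatial (object-sequence) recurrence is active alongside the temporal one. The mode names and the epoch-based switching are illustrative assumptions, not the authors' exact implementation.

```python
# Recurrence curriculum: decide per epoch whether the spatial recurrence
# is enabled. "Only temporal during the first half" means the spatial
# recurrence is switched on only for the second half, and vice versa.

def use_spatial_recurrence(epoch, total_epochs, mode):
    half = total_epochs // 2
    if mode == "spatio-temporal":        # spatial + temporal, all training
        return True
    if mode == "temporal-only":          # temporal only, all training
        return False
    if mode == "temporal-first-half":    # temporal alone early, spatial added later
        return epoch >= half
    if mode == "temporal-second-half":   # spatial early, temporal alone later
        return epoch < half
    raise ValueError(f"unknown mode: {mode}")
```

In a training loop this flag would gate the spatial (object-sequence) hidden-state updates while the temporal recurrence always stays on.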
92. TEMPORAL AND SPATIAL RECURRENCES
[Qualitative results: ground-truth masks compared against predictions from Only Spatio-Temporal, Only Temporal, Only Temporal first half, and Only Temporal second half.]
119. YouTube-VOS
Ning Xu et al. “YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark”, ECCV 2018.
- Training parameters: Resolution 256x448, Batch Size 4, Clip Length 5.
- Evaluated with the official metrics of the YouTube-VOS challenge.
129. CONCLUSIONS
- SCHEDULE SAMPLING
- FRAME SKIPPING
- TEMPORAL AND SPATIAL RECURRENCES
- LOSS PENALIZATION BY OBJECT AREA
140. FUTURE WORK
- Schedule Sampling
- Frame Skipping
- Loss penalization by object area
- Other curriculums
- Combination of the best curriculums