Review: How Useful is Self-Supervised Pretraining for Visual Tasks?
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
1. How Useful is Self-Supervised Pretraining for Visual Tasks?
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
Princeton University | CVPR 2020
2020.07.05
2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
3. Self Supervised Pretraining
Introduction – Proposal
• There has been a lot of progress in self-supervised pretraining for vision. This paper offers insights into when and how to use self-supervised pretraining.
• Self-supervised models now produce features that are comparable to, or even outperform, ImageNet-pretrained features.
• Networks pretrained on ImageNet are relatively vulnerable to domain shift; self-supervised methods have an advantage here.
• Investigates how useful self-supervision is when there is a sufficient amount of labeled data.
• A large amount of data does not guarantee good performance, since fitting a neural network to a large dataset is difficult; self-supervised pretraining may produce better representations that help optimization.
4. Self Supervised Pretraining
Introduction – Proposal
• Through experiments, outcome (c), in which pretraining helps when labels are scarce but its benefit disappears as labeled data grows, was found to be the most common.
5. Self Supervised Pretraining
Introduction – Contributions
• Found that leading self-supervised pretraining methods are useful with a
small labeling budget, but utility tends to decrease with ample labels.
• Found that self-supervision is more helpful when applied to larger models
and to more difficult versions of the data.
• The relative performance of methods is not consistent across downstream settings.
6. Related Work
Pretraining Methods
1. Variational AutoEncoder (VAE)
– Standard baseline for mapping images to a low-dimensional latent space.
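A minimal sketch of what such a VAE baseline looks like; the architecture below (conv layer sizes, a 128-dimensional latent) is an illustrative assumption, not the paper's implementation.

```python
# Minimal VAE: encode a 64x64 image into a low-dimensional latent,
# decode it back, and train with reconstruction + KL terms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: 3x64x64 image -> mean and log-variance of the latent.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # Decoder: latent -> reconstructed 3x64x64 image.
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec(self.fc_dec(z).view(-1, 128, 8, 8))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Pixel-wise reconstruction error plus KL divergence to a unit Gaussian.
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```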
7. Related Work
Pretraining Methods
2. Rotation
– The network is tasked with predicting which rotation (0°, 90°, 180°, or 270°) has been applied to an image.
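A minimal sketch of the rotation pretext task; `backbone` is assumed to be any feature extractor (e.g. a ResNet without its classification head) returning `feat_dim`-dimensional features.

```python
# Rotate each image by 0/90/180/270 degrees and train the network
# to predict which rotation was applied.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_rotation_batch(images):
    """images: (B, C, H, W). Returns rotated copies and their rotation labels."""
    rotated, labels = [], []
    for k in range(4):  # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

class RotationPretrainer(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, 4)  # 4-way rotation classifier

    def forward(self, images):
        x, y = make_rotation_batch(images)
        logits = self.head(self.backbone(x))
        return F.cross_entropy(logits, y.to(logits.device))
```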
8. Related Work
Pretraining Methods
3. Contrastive Multiview Coding (CMC)
– Splits an image into multiple views, such as the L and ab channels of the image in Lab color space, and trains encoders so that views of the same image match.
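A minimal sketch of the two ingredients: splitting an RGB image into L/ab views and an InfoNCE-style contrastive loss that pulls matching views together. The helper names and the use of `skimage` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F
from skimage.color import rgb2lab

def split_lab_views(rgb_image):
    """rgb_image: HxWx3 float array in [0, 1]. Returns (L view, ab view) tensors."""
    lab = rgb2lab(rgb_image)
    l_view = torch.from_numpy(lab[..., :1]).permute(2, 0, 1).float()   # 1 x H x W
    ab_view = torch.from_numpy(lab[..., 1:]).permute(2, 0, 1).float()  # 2 x H x W
    return l_view, ab_view

def info_nce(z_l, z_ab, temperature=0.07):
    """z_l, z_ab: (B, D) embeddings of the two views of the same B images."""
    z_l, z_ab = F.normalize(z_l, dim=1), F.normalize(z_ab, dim=1)
    logits = z_l @ z_ab.t() / temperature                    # similarity of every pair
    targets = torch.arange(z_l.size(0), device=z_l.device)   # matches lie on the diagonal
    return F.cross_entropy(logits, targets)
```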
9. Related Work
Pretraining Methods
4. Augmented Multiscale Deep InfoMax (AMDIM)
– Instead of comparing across image channels, AMDIM compares representations from two augmented versions of the same image.
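A minimal sketch of producing two augmented views of the same image, which are then encoded and compared with a contrastive loss (e.g. the `info_nce` sketch above). The specific augmentations are assumptions, and real AMDIM additionally matches features across multiple spatial scales, which is omitted here.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(64, scale=(0.3, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return two differently augmented tensors of the same source image."""
    return augment(pil_image), augment(pil_image)
```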
10. Methods and Experiments
Experimental Settings – Data
• To control the dataset, the authors synthesized images, giving an endless supply of training data.
• Images are rendered with Blender using object models from ShapeNet.
• Generated images consist of objects floating in empty space.
• The number of objects, orientation, texture, lighting, and position can all be varied.
Texture / Color / Viewpoint / Lighting
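Purely illustrative sketch of the controllable rendering factors; the placeholder model ids, texture names, and value ranges are assumptions, and this is not the authors' Blender pipeline.

```python
import random

SHAPENET_MODELS = ["chair_0001", "table_0002", "car_0003"]   # placeholder model ids
TEXTURES = ["wood", "metal", "checker", "flat_gray"]         # placeholder texture names

def sample_scene(num_objects=1, vary_texture=True, vary_viewpoint=True,
                 vary_lighting=True, vary_position=True):
    """Sample one scene configuration: which objects to render and how."""
    objects = []
    for _ in range(num_objects):
        objects.append({
            "model": random.choice(SHAPENET_MODELS),
            "orientation_deg": random.uniform(0, 360) if vary_viewpoint else 0.0,
            "texture": random.choice(TEXTURES) if vary_texture else "flat_gray",
            "position": [random.uniform(-1, 1) for _ in range(3)] if vary_position else [0.0, 0.0, 0.0],
        })
    lighting = {"intensity": random.uniform(0.5, 2.0) if vary_lighting else 1.0}
    return {"objects": objects, "lighting": lighting}

# Example: a two-object scene with all factors varied.
print(sample_scene(num_objects=2))
```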
11. Methods and Experiments
Experimental Settings – Downstream tasks
1. Object Classification
- Distinguish between ten ShapeNet classes.
2. Object Pose Estimation
- Pose is discretized into five bins (upward, forward, backward, left, right); see the sketch after this list.
3. Semantic Segmentation
- Images are rendered with multiple objects.
4. Depth Estimation
- Multi-object images at a coarser output resolution.
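An illustrative sketch of the pose-discretization idea: assign an object's facing direction to the nearest of the five canonical bins. The exact binning rule in the paper may differ; the canonical direction vectors here are assumptions.

```python
import numpy as np

POSE_BINS = {
    "upward":   np.array([0.0, 0.0, 1.0]),
    "forward":  np.array([0.0, 1.0, 0.0]),
    "backward": np.array([0.0, -1.0, 0.0]),
    "left":     np.array([-1.0, 0.0, 0.0]),
    "right":    np.array([1.0, 0.0, 0.0]),
}

def discretize_pose(facing_direction):
    """facing_direction: length-3 vector giving the object's facing direction."""
    v = np.asarray(facing_direction, dtype=float)
    v = v / np.linalg.norm(v)
    # Choose the bin whose canonical direction has the largest dot product with v.
    return max(POSE_BINS, key=lambda name: float(POSE_BINS[name] @ v))

print(discretize_pose([0.1, 0.9, 0.2]))  # -> "forward"
```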
12. Methods and Experiments
Experimental Settings
• Image resolutions are 64×64 and 128×128.
• A total of 480,000 images were rendered.
• ResNet9 and ResNet50 are used for all experiments.
• Self-supervised algorithms are pretrained for 100 to 200 epochs.
• For finetuning, a pretrained model is loaded and trained for 75 to 200 additional epochs.
• Evaluation: accuracy and utility.
* Utility: U(x) = x′/x − 1, where x is the number of labeled samples used with pretraining and x′ is the number the from-scratch baseline needs to reach the same accuracy, so U > 0 means pretraining saves labels.
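A minimal sketch of how utility could be computed from two accuracy-vs-labels curves, under the interpretation above (x = labels with pretraining, x′ = labels the baseline needs for the same accuracy); the interpolation scheme and example numbers are illustrative, not from the paper.

```python
import numpy as np

def utility(x, labels, acc_baseline, acc_pretrained):
    """Utility of pretraining when finetuning on x labeled samples.

    labels         : label counts at which both models were evaluated (increasing)
    acc_baseline   : accuracy of the from-scratch model at each label count (increasing)
    acc_pretrained : accuracy of the pretrained model at each label count (increasing)
    """
    # Accuracy the pretrained model reaches with x labels (log-scale interpolation,
    # since label counts span orders of magnitude).
    a = np.interp(np.log(x), np.log(labels), acc_pretrained)
    # Labels x' the baseline would need to reach that same accuracy a.
    x_prime = np.exp(np.interp(a, acc_baseline, np.log(labels)))
    return x_prime / x - 1.0

# Made-up example curves: at 1,000 labels the pretrained model matches what
# the baseline achieves with 10,000 labels, giving a utility of about 9.
labels = np.array([100, 1_000, 10_000, 100_000])
acc_base = np.array([0.30, 0.50, 0.70, 0.85])
acc_pre = np.array([0.50, 0.70, 0.85, 0.90])
print(utility(1_000, labels, acc_base, acc_pre))  # -> ~9.0
```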
13. Methods and Experiments
Results – Utility vs. Number of Labeled Samples
• Self-supervision has significant utility when the number of labeled samples is small, but utility approaches zero as the amount of labeled data grows.
• Self-supervised pretraining thus provides regularization that reduces overfitting, not better optimization that reduces underfitting.
14. Methods and Experiments
Results – Utility vs. Downstream Task
• CMC performs best on object classification.
• Rotation and AMDIM perform better on segmentation and depth estimation, respectively.
15. Methods and Experiments
Results – Utility vs. Data Complexity
• For CMC, utility goes up with texture variation and down with viewpoint changes; the opposite holds for AMDIM.
• For the VAE, utility drops as data complexity increases, since the latent space must encode all the information necessary to reproduce the image.
• Contrastive approaches teach a network to map to the same embedding after different image transformations, helping the network ignore changes in pixel space.
16. Methods and Experiments
Results – Utility vs. Model Size
• For downstream performance, applying self-supervised pretraining to a larger backbone network works better.
17. Conclusion
• Investigated a number of factors that affect the utility of self-supervised pretraining.
• The greatest benefits of pretraining are currently in low-data regimes.
• The performance of a self-supervised algorithm in one setting may not necessarily reflect its performance in others.