Review: How Useful is Self-Supervised Pretraining for Visual Tasks?
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
1. How Useful is Self-Supervised Pretraining for Visual Tasks?
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
Princeton University | CVPR 2020
2020.07.05
2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
3. Self Supervised Pretraining
Introduction – Proposal
• There has been a lot of progress in self-supervised pretraining for vision. This paper offers insights into when and how to use self-supervised pretraining.
• Self-supervised models now produce features that are comparable to, or even outperform, ImageNet-pretrained features.
• Networks pretrained on ImageNet are relatively vulnerable to domain shift; self-supervised methods have an advantage here.
• Investigates how useful self-supervision is when there is a sufficient amount of labeled data.
• A large amount of data does not guarantee good performance, since fitting a neural network to a large dataset is difficult; self-supervised pretraining may produce better representations that help optimization.
4. Self Supervised Pretraining
Introduction – Proposal
• Through experiments, outcome (c), in which pretraining helps when labels are scarce but its benefit disappears as labeled data grows, was found to be the most common.
5. Self Supervised Pretraining
Introduction – Contributions
• Found that leading self-supervised pretraining methods are useful with a
small labeling budget, but utility tends to decrease with ample labels.
• Found that self-supervision is more helpful when applied to larger models
and to more difficult versions of the data.
• The relative performance of methods is not consistent across downstream settings.
6. Related Work
Pretraining Methods
1. Variational AutoEncoder (VAE)
– Standard baseline for mapping images to a low-dimensional latent space.
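A minimal sketch of what such a VAE baseline looks like; the architecture below (conv layer sizes, a 128-dimensional latent) is an illustrative assumption, not the paper's implementation.

```python
# Minimal VAE: encode a 64x64 image into a low-dimensional latent,
# decode it back, and train with reconstruction + KL terms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: 3x64x64 image -> mean and log-variance of the latent.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # Decoder: latent -> reconstructed 3x64x64 image.
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.dec(self.fc_dec(z).view(-1, 128, 8, 8))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Pixel-wise reconstruction error plus KL divergence to a unit Gaussian.
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```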
7. Related Work
Pretraining Methods
2. Rotation
– The network is tasked with predicting which rotation (0°, 90°, 180°, or 270°) has been applied to an image.
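A minimal sketch of the rotation pretext task; `backbone` is assumed to be any feature extractor (e.g. a ResNet without its classification head) returning `feat_dim`-dimensional features.

```python
# Rotate each image by 0/90/180/270 degrees and train the network
# to predict which rotation was applied.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_rotation_batch(images):
    """images: (B, C, H, W). Returns rotated copies and their rotation labels."""
    rotated, labels = [], []
    for k in range(4):  # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

class RotationPretrainer(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, 4)  # 4-way rotation classifier

    def forward(self, images):
        x, y = make_rotation_batch(images)
        logits = self.head(self.backbone(x))
        return F.cross_entropy(logits, y.to(logits.device))
```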
8. Related Work
Pretraining Methods
3. Contrastive Multiview Coding (CMC)
– Splits an image into multiple views, such as the L and ab channels of the image in Lab color space, and trains encoders so that views of the same image match.
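A minimal sketch of the two ingredients: splitting an RGB image into L/ab views and an InfoNCE-style contrastive loss that pulls matching views together. The helper names and the use of `skimage` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F
from skimage.color import rgb2lab

def split_lab_views(rgb_image):
    """rgb_image: HxWx3 float array in [0, 1]. Returns (L view, ab view) tensors."""
    lab = rgb2lab(rgb_image)
    l_view = torch.from_numpy(lab[..., :1]).permute(2, 0, 1).float()   # 1 x H x W
    ab_view = torch.from_numpy(lab[..., 1:]).permute(2, 0, 1).float()  # 2 x H x W
    return l_view, ab_view

def info_nce(z_l, z_ab, temperature=0.07):
    """z_l, z_ab: (B, D) embeddings of the two views of the same B images."""
    z_l, z_ab = F.normalize(z_l, dim=1), F.normalize(z_ab, dim=1)
    logits = z_l @ z_ab.t() / temperature                    # similarity of every pair
    targets = torch.arange(z_l.size(0), device=z_l.device)   # matches lie on the diagonal
    return F.cross_entropy(logits, targets)
```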
9. Related Work
Pretraining Methods
4. Augmented Multiscale Deep InfoMax (AMDIM)
– Instead of comparing across image channels, AMDIM compares representations from two augmented versions of the same image.
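A minimal sketch of producing two augmented views of the same image, which are then encoded and compared with a contrastive loss (e.g. the `info_nce` sketch above). The specific augmentations are assumptions, and real AMDIM additionally matches features across multiple spatial scales, which is omitted here.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(64, scale=(0.3, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return two differently augmented tensors of the same source image."""
    return augment(pil_image), augment(pil_image)
```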
10. Methods and Experiments
Experimental Settings – Data
• To control the dataset, the authors synthesized images, giving an endless supply of training data.
• Images are rendered with Blender using object models from ShapeNet.
• Generated images consist of objects floating in empty space.
• The number of objects, orientation, texture, lighting, and position can all be varied.
Texture / Color / Viewpoint / Lighting
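Purely illustrative sketch of the controllable rendering factors; the placeholder model ids, texture names, and value ranges are assumptions, and this is not the authors' Blender pipeline.

```python
import random

SHAPENET_MODELS = ["chair_0001", "table_0002", "car_0003"]   # placeholder model ids
TEXTURES = ["wood", "metal", "checker", "flat_gray"]         # placeholder texture names

def sample_scene(num_objects=1, vary_texture=True, vary_viewpoint=True,
                 vary_lighting=True, vary_position=True):
    """Sample one scene configuration: which objects to render and how."""
    objects = []
    for _ in range(num_objects):
        objects.append({
            "model": random.choice(SHAPENET_MODELS),
            "orientation_deg": random.uniform(0, 360) if vary_viewpoint else 0.0,
            "texture": random.choice(TEXTURES) if vary_texture else "flat_gray",
            "position": [random.uniform(-1, 1) for _ in range(3)] if vary_position else [0.0, 0.0, 0.0],
        })
    lighting = {"intensity": random.uniform(0.5, 2.0) if vary_lighting else 1.0}
    return {"objects": objects, "lighting": lighting}

# Example: a two-object scene with all factors varied.
print(sample_scene(num_objects=2))
```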
11. Methods and Experiments
Experimental Settings – Downstream tasks
1. Object Classification
- Distinguish between ten ShapeNet classes.
2. Object Pose Estimation
- Pose is discretized into five bins (upward, forward, backward, left, right); see the sketch after this list.
3. Semantic Segmentation
- Images are rendered with multiple objects.
4. Depth Estimation
- Multi-object images at a coarser output resolution.
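An illustrative sketch of the pose-discretization idea: assign an object's facing direction to the nearest of the five canonical bins. The exact binning rule in the paper may differ; the canonical direction vectors here are assumptions.

```python
import numpy as np

POSE_BINS = {
    "upward":   np.array([0.0, 0.0, 1.0]),
    "forward":  np.array([0.0, 1.0, 0.0]),
    "backward": np.array([0.0, -1.0, 0.0]),
    "left":     np.array([-1.0, 0.0, 0.0]),
    "right":    np.array([1.0, 0.0, 0.0]),
}

def discretize_pose(facing_direction):
    """facing_direction: length-3 vector giving the object's facing direction."""
    v = np.asarray(facing_direction, dtype=float)
    v = v / np.linalg.norm(v)
    # Choose the bin whose canonical direction has the largest dot product with v.
    return max(POSE_BINS, key=lambda name: float(POSE_BINS[name] @ v))

print(discretize_pose([0.1, 0.9, 0.2]))  # -> "forward"
```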
12. Methods and Experiments
Experimental Settings
• Image resolutions are 64×64 and 128×128.
• A total of 480,000 images were rendered.
• ResNet9 and ResNet50 are used for all experiments.
• Self-supervised algorithms are pretrained for 100 to 200 epochs.
• For finetuning, a pretrained model is loaded and trained for 75 to 200 additional epochs.
• Evaluation: accuracy and utility.
* Utility: U(x) = x′/x − 1, where x is the number of labeled samples used with pretraining and x′ is the number the from-scratch baseline needs to reach the same accuracy, so U > 0 means pretraining saves labels.
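A minimal sketch of how utility could be computed from two accuracy-vs-labels curves, under the interpretation above (x = labels with pretraining, x′ = labels the baseline needs for the same accuracy); the interpolation scheme and example numbers are illustrative, not from the paper.

```python
import numpy as np

def utility(x, labels, acc_baseline, acc_pretrained):
    """Utility of pretraining when finetuning on x labeled samples.

    labels         : label counts at which both models were evaluated (increasing)
    acc_baseline   : accuracy of the from-scratch model at each label count (increasing)
    acc_pretrained : accuracy of the pretrained model at each label count (increasing)
    """
    # Accuracy the pretrained model reaches with x labels (log-scale interpolation,
    # since label counts span orders of magnitude).
    a = np.interp(np.log(x), np.log(labels), acc_pretrained)
    # Labels x' the baseline would need to reach that same accuracy a.
    x_prime = np.exp(np.interp(a, acc_baseline, np.log(labels)))
    return x_prime / x - 1.0

# Made-up example curves: at 1,000 labels the pretrained model matches what
# the baseline achieves with 10,000 labels, giving a utility of about 9.
labels = np.array([100, 1_000, 10_000, 100_000])
acc_base = np.array([0.30, 0.50, 0.70, 0.85])
acc_pre = np.array([0.50, 0.70, 0.85, 0.90])
print(utility(1_000, labels, acc_base, acc_pre))  # -> ~9.0
```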
13. Methods and Experiments
Results – Utility vs. Number of Labeled Samples
• Self-supervision has significant utility when the number of labeled samples is small, but utility approaches zero as the amount of labeled data grows.
• Self-supervised pretraining thus provides regularization that reduces overfitting, not better optimization that reduces underfitting.
14. Methods and Experiments
Results – Utility vs. Downstream Task
• CMC performs best on object classification.
• Rotation and AMDIM perform better on segmentation and depth estimation, respectively.
15. Methods and Experiments
Results – Utility vs. Data Complexity
• For CMC, utility goes up with texture variation and down with viewpoint changes; the opposite holds for AMDIM.
• For the VAE, utility drops as data complexity increases, since the latent space must encode all the information necessary to reproduce the image.
• Contrastive approaches teach a network to map to the same embedding after different image transformations, helping the network ignore changes in pixel space.
16. Methods and Experiments
Results – Utility vs. Model Size
• For downstream performance, applying self-supervised pretraining to a larger backbone network works better.
17. Conclusion
• Investigated a number of factors that affect the utility of self-supervised pretraining.
• The greatest benefits of pretraining are currently in low-data regimes.
• The performance of a self-supervised algorithm in one setting may not necessarily reflect its performance in others.