"Revisiting self supervised visual representation learning" Paper Review

2nd February 2020
PR12 Paper Review
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab KR
2019 CVPR
PR-222: Revisiting Self-Supervised Visual Representation Learning

Contents
• Introduction
• Self-Supervised Study Setup
• Architectures of CNN models
• Self-supervised techniques in this study
• Evaluation
• Datasets
• Experiments and Results
• Conclusion

Before We Start
[PR-208] Unsupervised Visual Representation Learning Overview: Toward Self-Supervision
• Video Link: https://youtu.be/eDDHsbMgOJQ
• I highly recommend watching the video above (PR-208) before listening to this presentation

Introduction
“Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR
• Many pretext tasks for self-supervised learning have been studied
• But performance is still lower than in the supervised setting
• Other important aspects, such as CNN architecture, have not received equal attention

• So, the paper revisits previously proposed self-supervised models and conducts a large-scale study

Self-Supervised Study Setup
3.1. Architectures of CNN models
• Most self-supervised visual representation learning approaches use AlexNet. Why use an old-fashioned architecture?!
• This study employs modern network architectures
• ResNet50, with pre-logits of size 512*k
• RevNet (the Reversible ResNet), using a unit without the G function, as in the Real NVP paper
• VGG with batch normalization: the initial conv layer has 8*k channels and the fc layer has 512*k channels
• Widening factor k, k ∈ {4, 8, 12, 16}
[Figure: ResNet unit vs. RevNet unit]
reference: The Reversible Residual Network: Backpropagation Without Storing Activations, 2017 NIPS
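The reversible unit mentioned above can be illustrated with a small sketch. It assumes a Real NVP-style additive coupling without G (y1 = x1 + F(x2), y2 = x2). The function `f` is a hypothetical stand-in for the residual branch, and plain Python lists stand in for feature channels, so this is an illustration of invertibility rather than the paper's implementation.

```python
from typing import Callable, List

Vector = List[float]


def f(x: Vector) -> Vector:
    """Hypothetical stand-in for the residual branch F (any function works)."""
    return [2.0 * v + 1.0 for v in x]


def reversible_forward(x: Vector, branch: Callable[[Vector], Vector]) -> Vector:
    """Split the channels in half, then apply y1 = x1 + F(x2), y2 = x2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y1 = [a + b for a, b in zip(x1, branch(x2))]
    return y1 + x2


def reversible_inverse(y: Vector, branch: Callable[[Vector], Vector]) -> Vector:
    """Recover the input exactly: x2 = y2, x1 = y1 - F(x2)."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x1 = [a - b for a, b in zip(y1, branch(y2))]
    return x1 + y2


x = [1.0, 2.0, 3.0, 4.0]
y = reversible_forward(x, f)
x_rec = reversible_inverse(y, f)
```

Because the input can be recomputed from the output, the branch `f` never has to be inverted itself; it is only evaluated again on the untouched half.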

3.2. Self-supervised techniques in this study
• Use 4 self-supervised techniques for experiments
• Rotation
• Exemplar
• Jigsaw
• Relative Patch Location
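Of the four techniques, Rotation is the easiest to show in code: each image is rotated by 0, 90, 180, or 270 degrees, and the network is trained to predict which rotation was applied. The sketch below only generates the (image, label) pairs. The helper names and the nested-list image format are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_examples(img: Image) -> List[Tuple[Image, int]]:
    """Return (rotated image, label) pairs for 0, 90, 180, 270 degrees."""
    examples = []
    current = [list(row) for row in img]
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


image = [[1, 2],
         [3, 4]]
pairs = rotation_examples(image)
```

The labels come for free from the transformation itself, which is what makes this a pretext task: no human annotation is needed.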

3.3. Evaluation
• Follow the common protocol: train a linear logistic regression model to solve a multi-class classification task
• Extract the representation from the frozen network at the pre-logit level
• Train the logistic regression using L-BFGS, except in Table 2
• For consistency and fair comparison, Table 2 uses SGD with momentum and data augmentation
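The linear evaluation protocol can be sketched as follows. This is a minimal illustration, not the paper's setup: it uses plain full-batch gradient descent instead of L-BFGS, and the toy 2-D `features` are hypothetical stand-ins for frozen pre-logit representations. All function names are made up for the example.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def logits_for(W: Matrix, b: List[float], x: List[float]) -> List[float]:
    """Linear scores: one dot product plus bias per class."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_linear_probe(features: Matrix, labels: List[int], num_classes: int,
                       lr: float = 0.2, steps: int = 300) -> Tuple[Matrix, List[float]]:
    """Fit multi-class logistic regression on fixed features (backbone untouched)."""
    dim, n = len(features[0]), len(features)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(steps):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(logits_for(W, b, x))
            for c in range(num_classes):
                # Gradient of mean cross-entropy w.r.t. the class-c logit.
                err = (probs[c] - (1.0 if c == y else 0.0)) / n
                grad_b[c] += err
                for j in range(dim):
                    grad_W[c][j] += err * x[j]
        for c in range(num_classes):
            b[c] -= lr * grad_b[c]
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j]
    return W, b


def predict(W: Matrix, b: List[float], x: List[float]) -> int:
    """Return the index of the highest-scoring class."""
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical "frozen" pre-logit features for three classes (toy data).
features = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.8], [-2.0, -2.0], [-1.5, -1.8]]
labels = [0, 0, 1, 1, 2, 2]
W, b = train_linear_probe(features, labels, num_classes=3)
accuracy = sum(predict(W, b, x) == y for x, y in zip(features, labels)) / len(labels)
```

Only `W` and `b` are trained; the features are never updated, so the resulting accuracy reflects how linearly separable the frozen representation already is.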

3.4. Datasets
• ImageNet (Train + Validation)
• To avoid overfitting, use a custom validation split (50,000 random images from the training split) for all studies except Table 2
• All self-supervised models are trained on ImageNet (without labels)
• Places205 (Validation only)
• Qualitatively different from ImageNet → good candidate for evaluating how well the learned representations generalize to new unseen data of a different nature
• Same procedure as for ImageNet regarding validation splits (random splitting)
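Holding out a random validation split from the training images can be sketched as below. The function name, the fixed seed, and the scaled-down sizes (1,000 images, 50 held out) are assumptions for illustration. The study itself holds out 50,000 images.

```python
import random
from typing import List, Tuple


def split_train_val(num_images: int, num_val: int,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly hold out num_val image indices; the rest stay in training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    val = sorted(rng.sample(range(num_images), num_val))
    val_set = set(val)
    train = [i for i in range(num_images) if i not in val_set]
    return train, val


train_idx, val_idx = split_train_val(num_images=1000, num_val=50)
```

Model selection then uses `val_idx` only, so the official validation set is not touched during tuning.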

Experiments and Results
4.1. Evaluation on ImageNet and Places205
• Measure the representation quality produced by 6 different CNN architectures with various widening factors
• Increasing the number of channels improves the performance of self-supervised models
[Results table; annotations: widening factor, random initialization, without ReLU before GAP layer]

• The ranking of architectures is not consistent across methods, nor is the ranking of methods consistent across architectures
• Rankings on Places205 are consistent with those on ImageNet → the representations generalize to a new dataset
• VGG19-BN consistently shows the worst performance, even though it achieves performance similar to ResNet50 on standard vision benchmarks (fully supervised setting)
Best architecture per method:
• Rotation → RevNet50
• Exemplar → ResNet50 v1
• Rel. Patch Loc. → ResNet50 v1
• Jigsaw → ResNet50 v1
• VGG19-BN → worst performance in all cases

4.2. Comparison to prior work
• For consistency and fair evaluation, use SGD with momentum and data augmentation in Table 2
• By selecting the right architecture, the study significantly outperforms previously reported results
[Table 2: comparison with previously reported results]

4.3. A linear model is adequate for evaluation
• Consider an alternative evaluation scenario: use an MLP to solve the evaluation task
• Add a single hidden layer with 1000 channels, ReLU, and dropout to make the model non-linear
• The MLP provides only marginal improvement over the linear evaluation

4.4. Better performance on the pretext task does not always translate to better representations
• Performance on the pretext task is often a good proxy for representation quality, but not always

4.5. Skip-connections prevent degradation of representation quality towards the end of CNNs
• With VGG-BN, representation quality gets worse towards the end of the network, but not with ResNet or RevNet
• Models specialize to the pretext task and discard more general semantic features in the later layers
• Skip-connections preserve information learned in intermediate layers

4.6. Model width and representation size strongly influence the representation quality
• Check whether the increase in performance is due to increased network capacity, to the use of higher-dimensional representations, or to the interplay of both
• Disentangle the network width from the representation size (pre-logit channels)
• Increasing the widening factor consistently boosts performance in both the full- and low-data regimes

4.7. SGD for training linear model takes a long time to converge
• Previous works train the linear model for only a short time
• Investigate the importance of the SGD optimization schedule for training logistic regression
• The timing of the first learning-rate decay has a large influence on the final accuracy
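The schedule question above can be made concrete with a step-decay learning-rate function. This is a minimal sketch: the function name, base learning rate, decay factor, and decay epochs are illustrative assumptions, not the settings used in the paper.

```python
from typing import Sequence


def step_decay_lr(epoch: int, base_lr: float,
                  decay_epochs: Sequence[int], factor: float = 0.1) -> float:
    """Multiply the learning rate by `factor` at every decay epoch already reached."""
    num_decays = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * factor ** num_decays


# Two hypothetical schedules that differ only in when the first decay happens.
short_schedule = [30, 40]
long_schedule = [480, 500]

lr_short = step_decay_lr(epoch=100, base_lr=0.1, decay_epochs=short_schedule)
lr_long = step_decay_lr(epoch=100, base_lr=0.1, decay_epochs=long_schedule)
```

At epoch 100 the short schedule has already decayed twice, while the long schedule is still training at the base learning rate; the slide's point is that this choice matters for the final accuracy of the linear model.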

Conclusion
Revisit previously proposed self-supervised models and conduct a large-scale study
• Architecture design choices from the fully-supervised setting do not necessarily translate to the self-supervised setting (VGG19-BN)
• In contrast to AlexNet, architectures with skip-connections achieve consistently good results
• The widening factor of CNNs has a drastic effect on the performance of self-supervised techniques
• SGD training of linear logistic regression requires a very long time to converge
• The ranking of architectures is not consistent across methods, and the ranking of methods is not consistent across architectures
