2nd February 2020
PR12 Paper Review
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab KR
2019 CVPR
PR-222: Revisiting Self-Supervised Visual Representation Learning 1
Contents
• Introduction
• Self-Supervised Study Setup
• Architectures of CNN models
• Self-supervised techniques in this study
• Evaluation
• Datasets
• Experiments and Results
• Conclusion
Before We Start
[PR-208] Unsupervised Visual Representation Learning Overview: Toward Self-Supervision
• Video Link: https://youtu.be/eDDHsbMgOJQ
• I highly recommend watching the video above (PR-208) before this presentation!
Introduction
“Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR
• Many pretext tasks for self-supervised learning have been studied
• But performance is still lower than in the supervised setting
• Other important aspects, such as the CNN architecture, have not received equal attention
• So the paper revisits previously proposed self-supervised models and conducts a large-scale study
Self-Supervised Study Setup
3.1. Architectures of CNN models
• Most self-supervised visual representation learning approaches use AlexNet
Why use an old-fashioned architecture?!
• This study employs modern network architectures instead
• ResNet50, with pre-logits of size 512*k
• RevNet (the Reversible ResNet), but without the G function, as in the real NVP paper
• VGG with batch normalization: the initial conv layer has 8*k channels, the fc layer has 512*k channels
widening factor k, k ∈ {4, 8, 12, 16}
Figure: ResNet vs. RevNet
reference: The Reversible Residual Network: Backpropagation Without Storing Activations, 2017 NIPS
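A minimal Python sketch of two points from this slide, under stated assumptions. The reversible unit is assumed to be the additive coupling y1 = x1 + F(x2), y2 = x2, which is one reading of "do not use G". The names `toy_f`, `rev_forward`, `rev_inverse`, and `prelogit_size` are hypothetical, and `toy_f` only stands in for the real convolutional residual function.

```python
def toy_f(values):
    # Hypothetical stand-in for the residual function F (conv layers in practice).
    return [2.0 * v + 1.0 for v in values]


def rev_forward(x):
    # Split the channels into two halves; only the first half is updated.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y1 = [a + b for a, b in zip(x1, toy_f(x2))]
    return y1 + x2  # y2 is x2, passed through unchanged


def rev_inverse(y):
    # The input is recovered exactly, so activations need not be stored.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x1 = [a - b for a, b in zip(y1, toy_f(y2))]
    return x1 + y2


def prelogit_size(k):
    # Pre-logit size 512*k for widening factor k, as stated on the slide.
    return 512 * k


x = [1.0, -2.0, 0.5, 3.0]
assert rev_inverse(rev_forward(x)) == x
print({k: prelogit_size(k) for k in (4, 8, 12, 16)})
```

The inverse only needs F itself, never its inverse, which is why any residual function can be plugged in.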
3.2. Self-supervised techniques in this study
• Use 4 self-supervised techniques for experiments
• Rotation
• Exemplar
• Jigsaw
• Relative Patch Location
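As an illustration of how one of these pretext tasks builds labels without annotation, here is a hedged sketch of the Rotation task: each image is rotated by 0, 90, 180, and 270 degrees, and the rotation index is the classification target. Images are plain nested lists here, and `rotate90` and `rotation_pretext_examples` are hypothetical helper names, not code from the paper.

```python
def rotate90(img):
    # Rotate a 2D image (list of rows) by 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]


def rotation_pretext_examples(img):
    # Return (rotated_image, label) pairs; label k means k * 90 degrees clockwise.
    examples = []
    current = [list(row) for row in img]
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


image = [[1, 2],
         [3, 4]]
for rotated, label in rotation_pretext_examples(image):
    print(label, rotated)
```

A network trained to predict `label` from `rotated` gets a supervised signal for free, which is the common pattern behind all four techniques.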
3.3. Evaluation
• Follow the common protocol: train a linear logistic regression model to solve a multi-class classification task
• Extract the representation from the frozen network at the pre-logit level
• Train the logistic regression with L-BFGS, except in Table 2
• For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
Table 2
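The linear evaluation protocol can be sketched in plain Python as follows. This is only an illustration: the "frozen features" are made-up toy vectors, plain full-batch gradient descent is swapped in for L-BFGS to keep the example dependency-free, and `train_linear_probe` and `predict` are hypothetical names.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, x):
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]


def train_linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    # Multinomial logistic regression on fixed features; the backbone is never updated.
    dim, n = len(features[0]), len(features)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(steps):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(scores(W, b, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err / n
                for j in range(dim):
                    grad_W[c][j] += err * x[j] / n
        for c in range(num_classes):
            b[c] -= lr * grad_b[c]
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j]
    return W, b


def predict(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=lambda c: s[c])


# Toy stand-ins for pre-logit features extracted from a frozen network.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
labels = [0, 0, 1, 1, 2, 2]
W, b = train_linear_probe(features, labels, num_classes=3)
accuracy = sum(predict(W, b, x) == y for x, y in zip(features, labels)) / len(labels)
print(accuracy)
```

Because only `W` and `b` are trained, the resulting accuracy reflects how linearly separable the frozen representation is, which is the quantity this protocol is meant to measure.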
3.4. Datasets
• ImageNet (Train + Validation)
• To avoid overfitting to the official validation split, use a custom validation split (50,000 random images from the training split) for all studies except Table 2
• All self-supervised models are trained on ImageNet (without labels)
• Places205 (Validation only)
• Qualitatively different from ImageNet → a good candidate for evaluating how well the learned representations generalize to new, unseen data of a different nature
• Same procedure as for ImageNet regarding validation splits (random splitting)
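The random hold-out described above can be sketched like this. The function name `split_train_val`, the fixed seed, and the toy id range are assumptions for illustration; the slides do not say how the split was seeded.

```python
import random


def split_train_val(image_ids, num_val, seed=0):
    # Hold out `num_val` randomly chosen ids from the training split.
    ids = list(image_ids)
    if not 0 <= num_val <= len(ids):
        raise ValueError("num_val must be between 0 and the number of ids")
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    rng.shuffle(ids)
    return ids[num_val:], ids[:num_val]


# Toy scale: 1,000 ids with 50 held out (the slide holds out 50,000 ImageNet images).
train_ids, val_ids = split_train_val(range(1000), num_val=50)
print(len(train_ids), len(val_ids))
```

Keeping the held-out ids disjoint from the training ids is what lets model selection run without touching the official validation split.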
Experiments and Results
4.1. Evaluation on ImageNet and Places205
• Measure the representation quality produced by six different CNN architectures with various widening factors
• Increasing the number of channels improves the performance of self-supervised models
Table annotations: widening factor; random initialization; without ReLU before the GAP layer
4.1. Evaluation on ImageNet and Places205
• The ranking of architectures is not consistent across methods, nor is the ranking of methods consistent across architectures
• The ranking on Places205 is consistent with that on ImageNet → the representations generalize to a new dataset
• VGG19-BN consistently performs worst, even though it achieves performance similar to ResNet50 on standard vision benchmarks (fully supervised setting)
Best architecture per method:
• Rotation → RevNet50
• Exemplar → ResNet50 v1
• Rel. Patch Loc. → ResNet50 v1
• Jigsaw → ResNet50 v1
• VGG19-BN → worst performance in all cases
4.2. Comparison to prior work
• For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
• Selecting the right architecture yields results that significantly outperform previously reported ones
4.3. A linear model is adequate for evaluation
• Consider an alternative evaluation scenario: use an MLP to solve the evaluation task
• The MLP adds a single hidden layer with 1000 channels, ReLU, and dropout, which makes the model non-linear
• The MLP provides only a marginal improvement over linear evaluation
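To give a sense of how much extra capacity the MLP head adds, the sketch below counts trainable parameters for both evaluation heads. The feature size 512*4 is an assumed example (widening factor k = 4), 1000 is the number of ImageNet classes, and the function names are hypothetical.

```python
def linear_head_params(feature_dim, num_classes):
    # One weight matrix plus one bias vector.
    return feature_dim * num_classes + num_classes


def mlp_head_params(feature_dim, num_classes, hidden=1000):
    # Hidden layer (weights + biases) followed by the output layer.
    # ReLU and dropout add no trainable parameters.
    return (feature_dim * hidden + hidden) + (hidden * num_classes + num_classes)


feature_dim = 512 * 4  # assumed example: pre-logit size for k = 4
num_classes = 1000     # ImageNet classes

print(linear_head_params(feature_dim, num_classes))
print(mlp_head_params(feature_dim, num_classes))
```

Under these assumptions the MLP head has roughly 1.5 times the parameters of the linear head, yet the slide reports only a marginal accuracy gain.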
4.4. Better performance on the pretext task does not always translate to better representations
• Performance on the pretext task is a good proxy for representation quality, but not always
4.5. Skip-connections prevent degradation of representation quality towards the end of CNNs
• VGG-BN representations get worse towards the end of the network, but those of ResNet and RevNet do not
• Models specialize to the pretext task and discard more general semantic features in the later layers
• Skip-connections preserve information learned in intermediate layers
4.6. Model width and representation size strongly influence the representation quality
• Check whether the increase in performance is due to increased network capacity, to the use of higher-dimensional representations, or to the interplay of both
• Disentangle the network width from the representation size (pre-logits channels)
• Increasing the widening factor consistently boosts performance in both the full and low-data regimes
4.7. SGD training of the linear model takes a long time to converge
• Previous works use short training schedules
• Investigate the importance of the SGD optimization schedule for training logistic regression
• The first learning-rate decay has a large influence on the final accuracy
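A hedged sketch of the kind of schedule being varied here: SGD with momentum combined with a step-wise learning-rate decay. The milestone epochs, base learning rate, and decay factor below are illustrative assumptions, not the values used in the paper, and both function names are hypothetical.

```python
def step_decay_lr(epoch, base_lr, milestones, factor=0.1):
    # Multiply the learning rate by `factor` once for every milestone already reached.
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * (factor ** num_decays)


def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.9):
    # Classic momentum update: v <- momentum * v - lr * g, then w <- w + v.
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    weights = [w + v for w, v in zip(weights, velocity)]
    return weights, velocity


milestones = (30, 60)  # assumed decay epochs, for illustration only
for epoch in (0, 29, 30, 60):
    print(epoch, step_decay_lr(epoch, base_lr=0.1, milestones=milestones))
```

Moving the first entry of `milestones` earlier or later changes how long the model trains at the high learning rate, which is the knob the slide says matters most.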
Conclusion
Revisit previously proposed self-supervised models and conduct a large-scale study
• Architecture design choices from the fully-supervised setting do not necessarily translate to the self-supervised setting (VGG19-BN)
• Architectures with skip-connections achieve consistently good results, in contrast to AlexNet
• The widening factor of CNNs has a drastic effect on the performance of self-supervised techniques
• SGD training of the linear logistic regression requires a very long time to converge
• The ranking of architectures is not consistent across methods, and the ranking of methods is not consistent across architectures
More Related Content

What's hot

YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
LEE HOSEONG
 
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Sujit Pal
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
Jinwon Lee
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Sungchul Kim
 
Improving neural question generation using answer separation
Improving neural question generation using answer separationImproving neural question generation using answer separation
Improving neural question generation using answer separation
NAVER Engineering
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
Sujit Pal
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
PyData
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
Jinwon Lee
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
JaeJun Yoo
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
Emanuele Ghelfi
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
JaeJun Yoo
 
Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Sergey Karayev
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
Sungchul Kim
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
Sungjoon Choi
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
SigOpt
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
JaeJun Yoo
 
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursionMaking neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
Katy Lee
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
太一郎 遠藤
 
Plotcon 2016 Visualization Talk by Alexandra Johnson
Plotcon 2016 Visualization Talk  by Alexandra JohnsonPlotcon 2016 Visualization Talk  by Alexandra Johnson
Plotcon 2016 Visualization Talk by Alexandra Johnson
SigOpt
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
Wanjin Yu
 

What's hot (20)

YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Improving neural question generation using answer separation
Improving neural question generation using answer separationImproving neural question generation using answer separation
Improving neural question generation using answer separation
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursionMaking neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 
Plotcon 2016 Visualization Talk by Alexandra Johnson
Plotcon 2016 Visualization Talk  by Alexandra JohnsonPlotcon 2016 Visualization Talk  by Alexandra Johnson
Plotcon 2016 Visualization Talk by Alexandra Johnson
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 

Similar to "Revisiting self supervised visual representation learning" Paper Review

PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
Sunghoon Joo
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
Seunghyun Hwang
 
Vue.js Use Cases
Vue.js Use CasesVue.js Use Cases
Vue.js Use Cases
GlobalLogic Ukraine
 
An Automation Framework That Really Works
An Automation Framework That Really WorksAn Automation Framework That Really Works
An Automation Framework That Really WorksBasivi Reddy Junna
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdf
BoahKim2
 
The Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on Reproducibility
Balázs Hidasi
 
Student feedback system
Student feedback systemStudent feedback system
Student feedback system
msandbhor
 
Refactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test AutomationRefactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test Automation
Stephen Fuqua
 
18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development
ESEM 2014
 
addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenance
Soheila Dehghanzadeh
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
Dongmin Choi
 
tip oopt pse-summit2017
tip oopt pse-summit2017tip oopt pse-summit2017
tip oopt pse-summit2017
domenico di mola
 
MSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for ADMSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for AD
Mayank Gupta
 
ABC of developer test
ABC of developer testABC of developer test
ABC of developer test
Dr. Anish Cheriyan (PhD)
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
andyr91
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
Atul Karmyal
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptx
ssuser2624f71
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Seunghyun Hwang
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
LDBC council
 

Similar to "Revisiting self supervised visual representation learning" Paper Review (20)

PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
Vue.js Use Cases
Vue.js Use CasesVue.js Use Cases
Vue.js Use Cases
 
An Automation Framework That Really Works
An Automation Framework That Really WorksAn Automation Framework That Really Works
An Automation Framework That Really Works
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdf
 
The Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on Reproducibility
 
Student feedback system
Student feedback systemStudent feedback system
Student feedback system
 
Refactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test AutomationRefactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test Automation
 
18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development
 
Sudhakar Resume
Sudhakar ResumeSudhakar Resume
Sudhakar Resume
 
addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenance
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
tip oopt pse-summit2017
tip oopt pse-summit2017tip oopt pse-summit2017
tip oopt pse-summit2017
 
MSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for ADMSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for AD
 
ABC of developer test
ABC of developer testABC of developer test
ABC of developer test
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptx
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image Synthesis
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 

More from LEE HOSEONG

Unsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationUnsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillation
LEE HOSEONG
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to Z
LEE HOSEONG
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classification
LEE HOSEONG
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 Review
LEE HOSEONG
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
LEE HOSEONG
 
2019 ICLR Best Paper Review
2019 ICLR Best Paper Review2019 ICLR Best Paper Review
2019 ICLR Best Paper Review
LEE HOSEONG
 
"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review
LEE HOSEONG
 
"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review
LEE HOSEONG
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...
LEE HOSEONG
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
LEE HOSEONG
 
"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review
LEE HOSEONG
 
"From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ..."From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ...
LEE HOSEONG
 
"simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r..."simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r...
LEE HOSEONG
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review
LEE HOSEONG
 

More from LEE HOSEONG (14)

Unsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationUnsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillation
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to Z
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classification
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 Review
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
2019 ICLR Best Paper Review
2019 ICLR Best Paper Review2019 ICLR Best Paper Review
2019 ICLR Best Paper Review
 
"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review
 
"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
 
"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review
 
"From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ..."From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ...
 
"simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r..."simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r...
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review
 

"Revisiting self supervised visual representation learning" Paper Review

  • 1. 2nd February 2020 PR12 Paper Review Ho Seong Lee (hoya012) Cognex Deep Learning Lab KR 2019 CVPR PR-222: Revisiting Self-Supervised Visual Representation Learning 1
  • 2. Contents • Introduction • Self-Supervised Study Setup • Architectures of CNN models • Self-supervised techniques in this study • Evaluation • Datasets • Experiments and Results • Conclusion
  • 3. Before We Start: [PR-208] Unsupervised Visual Representation Learning Overview: Toward Self-Supervision • Video Link: https://youtu.be/eDDHsbMgOJQ • I highly recommend watching the video above (PR-208) before listening to this presentation!
  • 4. Introduction: “Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR • Many pretext tasks for self-supervised learning have been studied • However, performance is still lower than in the supervised setting • Other important aspects, such as the CNN architecture, have not received equal attention
  • 5. Introduction: “Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR • Other important aspects, such as the CNN architecture, have not received equal attention • So the paper revisits previously proposed self-supervised models and conducts a large-scale study
  • 6. Self-Supervised Study Setup: 3.1. Architectures of CNN models • Most self-supervised techniques for visual representation learning use AlexNet. Why use an old-fashioned architecture?! • Employ modern network architectures • ResNet50, pre-logits of size 512*k • RevNet (the Reversible ResNet), but do not use G like the Real NVP paper • VGG with batch normalization; the initial conv layer has 8*k channels, the fc layer has 512*k channels • Widening factor k, k ∈ {4, 8, 12, 16} • Figure: ResNet block vs. RevNet block (reference: The Reversible Residual Network: Backpropagation Without Storing Activations, 2017 NIPS)
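As a quick arithmetic check of the multipliers on slide 6, here is a minimal Python sketch. The function name `channel_sizes` and the dictionary keys are illustrative choices, not names from the paper or the slides; only the multipliers 512*k and 8*k come from the slide.

```python
def channel_sizes(k):
    """Channel counts implied by widening factor k, using the slide's multipliers."""
    return {
        "resnet50_pre_logits": 512 * k,  # ResNet50: pre-logits of size 512*k
        "vgg_initial_conv": 8 * k,       # VGG-BN: initial conv layer has 8*k channels
        "vgg_fc": 512 * k,               # VGG-BN: fc layer has 512*k channels
    }


# Widening factors listed on the slide.
for k in (4, 8, 12, 16):
    print(k, channel_sizes(k))
```

With k = 4 the ResNet50 pre-logit size comes out to 2048, and with k = 8 the VGG sizes come out to 64 and 4096, which match the standard (unwidened) versions of those networks.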
  • 7. Self-Supervised Study Setup: 3.2. Self-supervised techniques in this study • Use 4 self-supervised techniques for experiments • Rotation • Exemplar • Jigsaw • Relative Patch Location
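To make the Rotation pretext task on slide 7 concrete, the sketch below builds the four training examples for one image: the image rotated by 0, 90, 180, and 270 degrees, each labeled with its rotation index. This is a simplified illustration in plain Python on a tiny grid, not the paper's data pipeline; `rotate90` and `make_rotation_examples` are hypothetical helper names.

```python
def rotate90(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def make_rotation_examples(image):
    """Return (rotated_image, label) pairs, where label k means k * 90 degrees."""
    examples = []
    current = [list(row) for row in image]
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


image = [[1, 2],
         [3, 4]]
examples = make_rotation_examples(image)
for rotated, label in examples:
    print(label, rotated)
```

The network is then trained to predict the label from the rotated image alone, so no human annotation is needed.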
  • 8. Self-Supervised Study Setup: 3.3. Evaluation • Follow the common protocol: train a linear logistic regression model to solve a multi-class classification task • Extract the representation from the frozen network at the pre-logit level • Train the logistic regression using L-BFGS, except in Table 2 • For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
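The linear evaluation protocol on slide 8 can be sketched as follows: features come from a frozen encoder, and only a multi-class logistic regression is trained on top of them. Several things here are assumptions made for the sake of a small runnable example: `frozen_encoder` is a hypothetical stand-in for a pretrained network, the data is a toy set of three clusters, and plain full-batch gradient descent is swapped in for the L-BFGS optimizer mentioned on the slide.

```python
import math


def frozen_encoder(x):
    """Hypothetical stand-in for a frozen network's pre-logit features."""
    return [math.tanh(x[0]), math.tanh(x[1])]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, feature):
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(weights, biases)]


def train_linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Fit multi-class logistic regression on fixed features (gradient descent)."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for feature, label in zip(features, labels):
            probs = softmax(logits_for(weights, biases, feature))
            for c in range(num_classes):
                # Gradient of mean cross-entropy w.r.t. logit c.
                err = (probs[c] - (1.0 if c == label else 0.0)) / n
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * feature[d]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d]
    return weights, biases


def predict(weights, biases, feature):
    scores = logits_for(weights, biases, feature)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (assumption): three well-separated clusters, one per class.
centers = [(-2.0, -2.0), (2.0, 2.0), (2.0, -2.0)]
offsets = (-0.3, 0.0, 0.3)
inputs, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for dx in offsets:
        for dy in offsets:
            inputs.append((cx + dx, cy + dy))
            labels.append(label)

# The encoder is only called, never updated: this is what "frozen" means here.
features = [frozen_encoder(x) for x in inputs]
weights, biases = train_linear_probe(features, labels, num_classes=3)
accuracy = sum(predict(weights, biases, f) == y
               for f, y in zip(features, labels)) / len(labels)
print(f"linear-probe accuracy: {accuracy:.2f}")
```

Because the classifier is linear, its accuracy reflects how linearly separable the frozen features already are, which is the quantity the study uses to compare representations.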
  • 9. Self-Supervised Study Setup: 3.4. Datasets • ImageNet (Train + Validation) • To avoid overfitting, use a custom validation split (50,000 random images from the training split) for all studies except Table 2 • All self-supervised models are trained on ImageNet (without labels) • Places205 (Validation only) • Qualitatively different from ImageNet → a good candidate for evaluating how well the learned representations generalize to unseen data of a different nature • Same procedure as for ImageNet regarding validation splits (random splitting)
  • 10. Experiments and Results: 4.1. Evaluation on ImageNet and Places205 • Measure the representation quality produced by 6 different CNNs with various widening factors • Increasing the number of channels improves the performance of self-supervised models • Table annotations: widening factor; random initialization; without ReLU before the GAP layer
  • 11. Experiments and Results: 4.1. Evaluation on ImageNet and Places205 • Neither the ranking of architectures is consistent across methods, nor the ranking of methods across architectures • The ranking on Places205 is consistent with that on ImageNet → the representations generalize to a new dataset • VGG19-BN consistently demonstrates the worst performance, even though it achieves performance similar to ResNet50 on standard vision benchmarks (fully supervised setting) • Table annotations: Rotation → RevNet50; Exemplar → ResNet50 v1; Rel. Patch Loc. → ResNet50 v1; Jigsaw → ResNet50 v1; VGG19-BN → worst performance in all cases
  • 12. Experiments and Results: 4.2. Comparison to prior work • For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation • As a result of selecting the right architecture, the models significantly outperform previously reported results • Table annotation: Prev. Result
  • 13. Experiments and Results: 4.3. A linear model is adequate for evaluation • Consider an alternative evaluation scenario: use an MLP to solve the evaluation task • Add a single hidden layer with 1000 channels, ReLU, and Dropout to make the model non-linear • The MLP provides only marginal improvement over the linear evaluation
  • 14. Experiments and Results: 4.4. Better performance on the pretext task does not always translate to better representations • Performance on the pretext task is a good proxy, but not always
  • 15. Experiments and Results: 4.5. Skip-connections prevent degradation of representation quality towards the end of CNNs • VGG-BN representations get worse towards the end of the network, but those of ResNet and RevNet do not • Models specialize to the pretext task and discard more general semantic features in the later layers • Skip-connections preserve information learned in intermediate layers
  • 16. Experiments and Results: 4.6. Model width and representation size strongly influence the representation quality • Check whether the increase in performance is due to increased network capacity, to the use of higher-dimensional representations, or to the interplay of both • Disentangle the network width from the representation size (pre-logit channels) • Increasing the widening factor consistently boosts performance in both the full and low-data regimes
  • 17. Experiments and Results: 4.7. SGD for training the linear model takes a long time to converge • Previous works use short training times • Investigate the importance of the SGD optimization schedule for training logistic regression • The first learning-rate decay has a large influence on the final accuracy
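For readers unfamiliar with what an "SGD optimization schedule" with decays looks like, here is a minimal sketch of a step-decay learning rate. The base learning rate, the decay epochs, and the decay factor below are illustrative assumptions, not the values used in the paper; the point is only that moving the first boundary changes how long training runs at the initial learning rate.

```python
def step_decay_lr(epoch, base_lr=0.1, decay_epochs=(30, 60, 90), factor=0.1):
    """Piecewise-constant learning rate: multiply by `factor` at each boundary passed."""
    num_decays = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return base_lr * factor ** num_decays


# Compare an early first decay with a late one (both hypothetical schedules).
early = [step_decay_lr(e, decay_epochs=(10, 60, 90)) for e in range(0, 100, 10)]
late = [step_decay_lr(e, decay_epochs=(50, 60, 90)) for e in range(0, 100, 10)]
print("early first decay:", early)
print("late first decay: ", late)
```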
  • 18. Conclusion: Revisit previously proposed self-supervised models and conduct a large-scale study • Architecture designs from the fully supervised setting do not necessarily translate to the self-supervised setting (VGG19-BN) • Using skip-connections can achieve consistently good results, in contrast to AlexNet • The widening factor of CNNs has a drastic effect on the performance of self-supervised techniques • SGD training of linear logistic regression requires a very long time to converge • The ranking of architectures does not determine the ranking of methods (neither is consistent across the other)