A Simple Framework for Contrastive Learning of Visual Representations

Review : A Simple Framework for Contrastive Learning of Visual Representat - by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)

A Simple Framework for Contrastive Learning of
Visual Representations
Hwang seung hyun
Yonsei University Severance Hospital CCIDS
Google Research Team, Geoffrey Hinton | ICML 2020
2020.07.19

Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion

SimCLR
Introduction – Proposal
• Most mainstream approaches to unsupervised visual representation learning fall into one of two classes: generative or discriminative.
[Figure: example methods: Autoencoder, Predict rotation, Jigsaw Puzzle]

SimCLR
Introduction – Proposal
• Discriminative approaches based on Contrastive Learning in the latent space have
recently shown state-of-the-art results.
[AMDIM]

SimCLR
Introduction – Proposal
• SimCLR outperforms previous work while being simpler.
• A linear classifier trained on SimCLR representations achieves 76.5% top-1 accuracy on ImageNet, a 7% relative improvement over the previous SOTA method.
• When fine-tuned with only 1% of the ImageNet labels, SimCLR achieves 85.8% top-5 accuracy.

SimCLR
Introduction – Contributions
• Composition of multiple data augmentation operations is crucial in unsupervised contrastive learning.
• A learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations.
• Contrastive learning benefits from larger batch sizes and longer training.
• Like supervised learning, contrastive learning benefits from deeper and wider networks.
• Representation learning with a contrastive cross entropy loss benefits from normalized embeddings and an appropriately adjusted temperature parameter.

Related Work
Handcrafted pretext tasks
• Relative patch prediction
• Jigsaw puzzles
• Rotation prediction
• Colorization prediction
• ...
Relying on handcrafted pretext tasks limits the GENERALITY of the learned representations!

Related Work
Contrastive Visual Representation Learning
• CPC V2
• AMDIM
• Rotation Prediction
• MoCo (by Facebook)
• ...
SimCLR composes components that already appeared in these methods!

Methods and Experiments
Overall Architecture
https://www.youtube.com/watch?v=5lsmGWtxnKA

Methods and Experiments
Architecture – Data Augmentation
https://www.youtube.com/watch?v=5lsmGWtxnKA

Methods and Experiments
Architecture – loss function
https://www.youtube.com/watch?v=5lsmGWtxnKA

Methods and Experiments
Architecture – loss function
Final Loss [Normalized temperature-scaled cross entropy loss]
https://www.youtube.com/watch?v=5lsmGWtxnKA
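
The normalized temperature-scaled cross entropy (NT-Xent) loss named on this slide can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' implementation: the function names, the pairing layout (views `2k` and `2k + 1` come from the same image), and the default `temperature=0.5` are assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss averaged over 2N projected embeddings.

    Assumed layout (illustrative): z[2k] and z[2k + 1] are the two augmented
    views of image k, so len(z) must be even. The temperature default is an
    arbitrary placeholder, not a recommended value.
    """
    z = [l2_normalize(v) for v in z]
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # positive partner of i: 0<->1, 2<->3, ...
        # Temperature-scaled cosine similarity to every other embedding.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(n)
            if k != i
        }
        # Numerically stable log-sum-exp over all k != i (the denominator).
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Cross entropy with the positive partner as the "correct class".
        total += log_denominator - logits[j]
    return total / n
```

Each embedding treats its partner view as the positive and the remaining 2N - 2 embeddings in the batch as negatives, so the loss drops when partners align and other images are pushed apart.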

Methods and Experiments
Algorithm

Methods and Experiments
Other Methods
• Large Batch Size
- Train with a batch size of 4096.
- Use the LARS optimizer, since standard SGD/Momentum can be unstable at large batch sizes.
• Global BN
- When training with data parallelism, BN mean and variance are typically aggregated locally per device.
- SimCLR instead aggregates BN mean and variance over all devices during training.
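
The difference between per-device and global BN statistics can be illustrated with a toy example. This is only a sketch under stated assumptions: activations are plain scalars, shards are equally sized, and the helper names `local_bn_stats` and `global_bn_stats` are hypothetical. A real framework would do this with a synchronized (cross-device) batch norm layer.

```python
from statistics import fmean, pvariance


def local_bn_stats(shards):
    """Per-device statistics: each device only sees its own shard of the batch."""
    return [(fmean(shard), pvariance(shard)) for shard in shards]


def global_bn_stats(shards):
    """Combine per-device moments into whole-batch mean and variance.

    Assumes equally sized shards, so averaging the per-device first and
    second moments gives the exact statistics of the full batch.
    """
    stats = local_bn_stats(shards)
    mean = fmean([m for m, _ in stats])
    # E[x^2] per device is var + mean^2; average it across devices.
    second_moment = fmean([v + m * m for m, v in stats])
    return mean, second_moment - mean * mean
```

With local statistics, every device normalizes by different numbers; with the global version, all devices share one mean and variance for the whole batch.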

Methods and Experiments
Evaluation Protocol
• Dataset and Metrics
- ImageNet
- Transfer learning on a wide range of datasets (CIFAR-10, CIFAR-100, etc.)
• Default Setting
- Augmentations: random crop and resize, color distortions, Gaussian blur
- ResNet-50 as the base encoder network
- 2-layer MLP projection head that projects the representation to a 128-dimensional latent space
- Trained at batch size 4096 for 100 epochs

Methods and Experiments
Ablation Studies – Data Augmentation
• Color distortion and random crop are the crucial augmentations.
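
How the two augmentations singled out on this slide (crop and color) compose to produce a positive pair can be sketched as follows. This is a simplified illustration, not the augmentation pipeline used in the paper: images are nested lists of RGB tuples in [0, 1], the crop is square with nearest-neighbour resizing, the color distortion is a crude per-channel gain, and all names and default values are assumptions.

```python
import random


def random_crop_resize(img, out_size, rng, min_scale=0.3):
    """Crop a random square region and resize it (nearest neighbour) to out_size x out_size."""
    h, w = len(img), len(img[0])
    side = max(1, int(min(h, w) * rng.uniform(min_scale, 1.0)))
    top = rng.randint(0, h - side)
    left = rng.randint(0, w - side)
    return [
        [img[top + (r * side) // out_size][left + (c * side) // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]


def color_distort(img, rng, strength=0.5):
    """Scale each RGB channel by a random gain and clamp the result to [0, 1]."""
    gains = [1.0 + rng.uniform(-strength, strength) for _ in range(3)]
    return [
        [tuple(min(1.0, max(0.0, ch * g)) for ch, g in zip(px, gains)) for px in row]
        for row in img
    ]


def two_views(img, out_size=8, seed=0):
    """Apply crop then color distortion twice to get two correlated views of one image."""
    rng = random.Random(seed)
    return tuple(
        color_distort(random_crop_resize(img, out_size, rng), rng) for _ in range(2)
    )
```

The two returned views form one positive pair; views generated from other images in the batch act as negatives.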

Methods and Experiments
Ablation Studies – Data Augmentation

Methods and Experiments
Ablation Studies – Nonlinear Projection head
• The hidden layer before the projection head is a better representation than the layer after.
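
The relationship between the representation h (before the head) and the projection z (after the head) can be sketched as below. This is a toy, dependency-free illustration under assumptions: the class name `ProjectionHead`, the bias-free layers, the uniform initialization, and the hidden width are all hypothetical; only the 2-layer MLP shape and the 128-dimensional output come from the slides.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (toy initialization)."""
    scale = 1.0 / n_in ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ProjectionHead:
    """2-layer MLP: z = W2 * ReLU(W1 * h), biases omitted for brevity."""

    def __init__(self, rep_dim, hidden_dim, out_dim=128, seed=0):
        rng = random.Random(seed)
        self.w1 = make_linear(rep_dim, hidden_dim, rng)
        self.w2 = make_linear(hidden_dim, out_dim, rng)

    def __call__(self, h):
        hidden = [max(0.0, a) for a in matvec(self.w1, h)]  # ReLU nonlinearity
        return matvec(self.w2, hidden)
```

In this setup z is used only to compute the contrastive loss during pretraining, while h is the vector kept for downstream tasks, which is the layer the slide reports as the better representation.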

Methods and Experiments
Ablation Studies – Batch Size

Methods and Experiments
Results – ImageNet

Methods and Experiments
Results – semi-supervised learning

Methods and Experiments
Results – Transfer Learning

Conclusion
• SimCLR improves considerably over previous methods for self-supervised, semi-supervised, and transfer learning.
• SimCLR differs from standard supervised learning on ImageNet only in the choice of data augmentation, the use of a nonlinear head, and the loss function.
• Despite a recent surge in interest, self-supervised learning remains undervalued.

