A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, et al., “A Simple Framework for Contrastive Learning of Visual Representations”
8th March, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
References
• The Illustrated SimCLR Framework
 https://amitness.com/2020/03/illustrated-simclr/
• Exploring SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
 https://towardsdatascience.com/exploring-simclr-a-simple-framework-for-contrastive-learning-of-visual-representations-158c30601e7e
• SimCLR: Contrastive Learning of Visual Representations
 https://medium.com/@nainaakash012/simclr-contrastive-learning-of-visual-representations-52ecf1ac11fa
Introduction
• Learning effective visual representations without human supervision
is a long-standing problem.
• Most mainstream approaches fall into one of two classes: generative
or discriminative.
 Generative approaches – pixel-level generation is computationally expensive and may not be necessary for representation learning.
 Discriminative approaches learn representations with objective functions similar to those used in supervised learning, but their pretext tasks have relied on somewhat ad-hoc heuristics, which limits the generality of the learned representations.
Contrastive Learning – Intuition
Contrastive Learning
• Contrastive methods aim to learn representations by encouraging representations of similar elements to be close and representations of dissimilar elements to be far apart.
Contrastive Learning – Data
• Training requires example pairs of images that are similar and pairs of images that are different.
Images from “The Illustrated SimCLR Framework”
Supervised & Self-supervised Approach
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Representations
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Similarity Metric
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Noise Contrastive
Estimator Loss
• x+ is a positive example and x- is a negative example
• sim(·) is a similarity function
• Note that for each positive pair (x, x+) we have a set of K negative examples
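The loss itself appeared only as an image on the slide; a standard form of this noise-contrastive (InfoNCE-style) objective, written with the symbols above, would be (a reconstruction, not copied from the slide):

```latex
\mathcal{L}_{\mathrm{NCE}} = -\log \frac{\exp\left(\mathrm{sim}(x, x^{+})\right)}{\exp\left(\mathrm{sim}(x, x^{+})\right) + \sum_{k=1}^{K} \exp\left(\mathrm{sim}(x, x_{k}^{-})\right)}
```

The numerator rewards agreement with the positive example, while the K negatives in the denominator push dissimilar examples apart.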
SimCLR
SimCLR – Overview
• A stochastic data augmentation module transforms any given data example randomly, resulting in two correlated views of the same example.
 Random crop and resize (with random flip), color distortion, and Gaussian blur
• ResNet-50 is adopted as the encoder, and the output vector is taken from the GAP layer (2048-dimensional).
• A two-layer MLP is used as the projection head (128-dimensional latent space).
• No explicit negative sampling: the other 2(N-1) augmented examples within a minibatch are used as negatives (N is the batch size).
• Cosine similarity is used as the similarity metric.
• The normalized temperature-scaled cross-entropy (NT-Xent) loss is used; see the sketch below.
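A minimal PyTorch sketch of the NT-Xent loss, assuming paired projections z1 and z2 from the two views (my own illustration, not the authors' code; the function name and the temperature default are assumptions):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired projections.

    z1, z2: (N, d) projections of the two augmented views.
    Each of the 2N samples treats its counterpart as the positive and
    the remaining 2(N-1) in-batch samples as negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # never treat self as a logit
    # The positive of sample i is i+N (first half) or i-N (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```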
SimCLR – Overview
• Training
 Batch size: 256 to 8192
 A batch size of 8192 gives 16382 negative examples per positive pair from both augmentation views.
 To stabilize training at large batch sizes, the LARS optimizer is used.
 BN means and variances are aggregated over all devices during training (global BN).
 With 128 TPU v3 cores, it takes ~1.5 hours to train ResNet-50 with a batch size of 4096 for 100 epochs.
• Dataset – ImageNet 2012 dataset
• To evaluate the learned representations, the linear evaluation protocol is used; a sketch follows below.
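A minimal sketch of the linear evaluation protocol: the pretrained encoder is frozen and only a linear classifier on top of its features is trained (names such as `encoder` and `classifier` are illustrative, not from the paper's code):

```python
import torch
import torch.nn as nn

def linear_eval_step(encoder, classifier, images, labels, optimizer):
    """One step of linear evaluation on frozen features."""
    encoder.eval()
    with torch.no_grad():              # no gradients flow into the encoder
        features = encoder(images)     # e.g. (N, 2048) vectors from the GAP layer
    logits = classifier(features)      # classifier = nn.Linear(2048, num_classes)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```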
Step by Step Example
(a sequence of slides walking through the SimCLR pipeline one step at a time; images from “The Illustrated SimCLR Framework”)
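Condensing the step-by-step images into code, one SimCLR training step looks roughly like this (an illustrative sketch reusing `nt_xent_loss` from above; `augment` is assumed to produce one random view per call):

```python
def simclr_training_step(images, augment, encoder, projection_head, optimizer):
    """One SimCLR step: two views -> representations h -> projections z -> loss."""
    x1, x2 = augment(images), augment(images)           # two correlated views
    h1, h2 = encoder(x1), encoder(x2)                   # representations, e.g. (N, 2048)
    z1, z2 = projection_head(h1), projection_head(h2)   # latents, e.g. (N, 128)
    loss = nt_xent_loss(z1, z2, temperature=0.5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```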
Data Augmentation for Contrastive
Representation Learning
• Many existing approaches define contrastive prediction tasks by changing the architecture.
• The authors use only simple data augmentation methods; this simple design choice conveniently decouples the predictive task from other components such as the NN architecture.
Data Augmentation for Contrastive
Representation Learning
Augmentations in the red boxes are used
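A sketch of such an augmentation module with torchvision, composing the augmentations in the red boxes; the strength parameter `s`, the probabilities, and the blur kernel size follow the commonly cited SimCLR recipe but should be treated as assumptions:

```python
from torchvision import transforms

def simclr_augmentation(size=224, s=1.0):
    """Random crop and resize (with flip), color distortion of strength s,
    and Gaussian blur."""
    color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return transforms.Compose([
        transforms.RandomResizedCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([color_jitter], p=0.8),
        transforms.RandomGrayscale(p=0.2),
        transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),  # ~10% of image size
        transforms.ToTensor(),
    ])
```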
Linear Evaluation under Individual or
Composition of Data Augmentation
Evaluation of Data Augmentation
• An asymmetric data transformation method is used.
 Only one branch of the framework is given the target transformation(s).
• No single transformation suffices to learn good representations, even though the model can almost perfectly identify the positive pairs.
• The combination of random cropping and random color distortion stands out.
 When using only random cropping as data augmentation, most patches from an image share a similar color distribution, which the network can exploit as a shortcut instead of learning general features.
Contrastive Learning Needs Stronger Data
Augmentation
• Stronger color augmentation substantially improves the linear evaluation of the learned unsupervised models.
• A sophisticated augmentation policy (such as AutoAugment) does not work better than simple cropping + (stronger) color distortion.
• Unsupervised contrastive learning benefits from stronger (color) data augmentation than supervised learning does.
• Data augmentation that does not yield accuracy benefits for supervised learning can still help considerably with contrastive learning.
Unsupervised Contrastive Learning Benefits from
Bigger Models
• Unsupervised learning benefits more from bigger models than its supervised counterpart.
Nonlinear Projection Head
• A nonlinear projection is better than a linear projection (+3%) and much better than no projection (>10%).
Nonlinear Projection Head
• The hidden layer before the projection head is a better representation than the layer after.
• The importance of using the representation before the nonlinear projection is due to the loss of information induced by the contrastive loss; in particular, z = g(h) is trained to be invariant to data transformations, so h can retain information (such as color or orientation) that z discards.
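A minimal sketch of the projection head g(·), with layer sizes taken from the overview slide (the class name and hidden width are assumptions; h is kept for downstream evaluation while z = g(h) feeds the contrastive loss):

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Two-layer MLP g(.) mapping the 2048-d representation h
    to the 128-d latent z used only by the contrastive loss."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):
        return self.net(h)  # z = g(h); h itself is what linear evaluation uses
```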
Loss Function
• l2 normalization along with temperature effectively weights
different examples, and an appropriate temperature can help the
model learn from hard negatives.
• Unlike cross-entropy, other objective functions do not weigh the
negatives by their relative hardness.
Larger Batch Size and Longer Training
• When the number of training epochs is small, larger batch sizes have a significant advantage.
• Larger batch sizes provide more negative examples, facilitating convergence, and training longer also provides more negative examples, improving the results.
Comparison with SOTA – Linear Evaluation
Comparison with SOTA – Semi-supervised
Learning
Comparison with SOTA
Transfer Learning
Appendix – Effects of Longer Training for Supervised Learning
• There is no significant benefit from training supervised models longer on ImageNet.
• Stronger data augmentation slightly improves the accuracy of ResNet-50 (4x) but does not help ResNet-50.
Appendix – CIFAR-10 Dataset Results
Conclusion
• SimCLR differs from standard supervised learning on ImageNet only in the choice of data augmentation, the use of a nonlinear head at the end of the network, and the loss function.
• Composition of data augmentations plays a critical role in defining effective predictive tasks.
• A nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations.
• Contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
• SimCLR achieves 76.5% top-1 accuracy, a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50.