Members : 김상현, 고형권, 허다운, 전선영, 김준철, 조경진
Presenter : 조경진
2021.10.10
Joint Contrastive Learning with Infinite Possibilities
(Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei , NeurIPS, 2020)
https://arxiv.org/abs/2009.14776
Contents
1. Introduction
2. Related Work
3. Methods
4. Experiments
5. Conclusion
1
Introduction Related Work Methods Experiments Conclusions
❖ What is Contrastive learning?
Contrastive learning is a machine learning technique used to learn the general features of a dataset
without labels by teaching the model which data points are similar or different.
https://amitness.com/2020/03/illustrated-simclr/ 2
Introduction Related Work Methods Experiments Conclusions
❖ What is Contrastive learning?
Contrastive learning is a machine learning technique used to learn the general features of a dataset
without labels by teaching the model which data points are similar or different.
https://amitness.com/2020/03/illustrated-simclr/ 3
Introduction Related Work Methods Experiments Conclusions
❖ What is Contrastive learning?
Contrastive learning is a machine learning technique used to learn the general features of a dataset
without labels by teaching the model which data points are similar or different.
CNN
Supervised learning
1. Label cost
2. Task-specific solution (poor generalizability) 4
Introduction Related Work Methods Experiments Conclusions
❖ What is Contrastive learning?
Contrastive learning is a machine learning technique used to learn the general features of a dataset
without labels by teaching the model which data points are similar or different.
5
Contrastive learning
1. No label cost
2. Task-agnostic solution (good generalizability)
How can we train a model using contrastive learning?
Introduction Related Work Methods Experiments Conclusions
6
Encoder
Mechanism to get representations that allow the machine to
understand an image.
Introduction Related Work Methods Experiments Conclusions
❖ How do we train with contrastive learning?
CNN
Image Representation
Similarity measure
Mechanism to compute the similarity of two images.
Similarity ( , )
7
Similar and dissimilar images
Example pairs of similar and dissimilar images.
Image Same
Different
Different
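As a rough illustration of the two mechanisms above (an encoder produces a representation, and a similarity function compares two of them), here is a minimal Python sketch. The ResNet-18 backbone and cosine similarity are illustrative choices, not mandated by the slide, and the `weights=None` keyword assumes a recent torchvision (older versions use `pretrained=False`).

```python
import torch
import torchvision.models as models

# Encoder: a CNN backbone that maps an image to a representation vector.
encoder = models.resnet18(weights=None)
encoder.fc = torch.nn.Identity()          # drop the classification head

def similarity(img_a, img_b):
    """Cosine similarity between the representations of two images (1, 3, H, W)."""
    with torch.no_grad():
        z_a = encoder(img_a)
        z_b = encoder(img_b)
    return torch.nn.functional.cosine_similarity(z_a, z_b).item()
```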
Introduction Related Work Methods Experiments Conclusions
❖ How do we train with contrastive learning?
The positive sample should be recognized as a similar image, and the negative sample as a different image.
Score ( , )
maximize
https://ankeshanand.com/blog/2020/01/26/contrative-self-supervised-learning.html
Score ( , )
Score ( , ) Score ( , )
(the denominator additionally sums the scores over the K−1 negative samples)
Applying a softmax to these scores and taking the negative log-likelihood (NLL) yields the InfoNCE loss (a minimal code sketch follows below).
InfoNCE
8
Score ( , )
minimize
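Here is a minimal PyTorch-style sketch of the InfoNCE loss described above, over one query, one positive key, and K−1 negative keys. The tensor names (`q`, `k_pos`, `k_neg`) and the temperature value are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k_pos, k_neg, temperature=0.2):
    """Minimal InfoNCE sketch.

    q:     (N, D)  L2-normalized query embeddings
    k_pos: (N, D)  L2-normalized positive key embeddings
    k_neg: (K, D)  L2-normalized negative key embeddings (e.g. a queue)
    """
    # Similarity scores: one positive logit and K negative logits per query.
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, k_neg)                # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature    # (N, 1+K)

    # Softmax + NLL: the positive key is always class 0.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```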
Introduction Related Work Methods Experiments Conclusions
❖ InfoNCE loss
Maximize mutual Information
https://arxiv.org/pdf/1807.03748.pdf
Mutual Information
MI maximize
9
InfoNCE
❖ InfoNCE loss
Maximize mutual Information
https://arxiv.org/pdf/1807.03748.pdf
MI maximize
Mutual Information
Introduction Related Work Methods Experiments Conclusions
10
InfoNCE
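For reference, the InfoNCE objective from the slides above (the linked paper, arXiv:1807.03748) can be written as below; the relation to mutual information is the standard lower bound, stated here as a sketch rather than a derivation, with K the total number of samples in the softmax.

```latex
\mathcal{L}_{\text{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{\exp\!\big(q^{\top}k^{+}/\tau\big)}
                {\exp\!\big(q^{\top}k^{+}/\tau\big)
                 + \sum_{j=1}^{K-1}\exp\!\big(q^{\top}k_{j}^{-}/\tau\big)}
    \right],
\qquad
I\!\left(q;\,k^{+}\right) \;\geq\; \log K \;-\; \mathcal{L}_{\text{InfoNCE}}.
```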
Introduction Related Work Methods Experiments Conclusions
❖ Self-supervised learning
Pretext Task: pre-designed tasks for networks to solve; visual features are learned by optimizing the objective functions of these pretext tasks.
Downstream Task: computer vision applications that are used to evaluate the quality of the features learned by self-supervised learning.
No label
Label
11
Introduction Related Work Methods Experiments Conclusions
❖ Self-supervised learning: pretext tasks
https://arxiv.org/pdf/1803.07728.pdf,https://arxiv.org/pdf/1603.09246.pdf
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf 12
Introduction Related Work Methods Experiments Conclusions
❖ Self-supervised learning: pretext tasks
https://arxiv.org/pdf/1803.07728.pdf,https://arxiv.org/pdf/1603.09246.pdf
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf
Rotation
0, 90, 180, 270 degree random rotation
4-class classification problem
13
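A minimal sketch of how the rotation pretext task above can be set up: rotate each image by one of {0°, 90°, 180°, 270°} and train a 4-way classifier on the rotation index. The function name and use of `torch.rot90` are illustrative assumptions, and square images are assumed so that rotated tensors stack cleanly.

```python
import torch

def make_rotation_batch(images):
    """images: (N, C, H, W), assumed square. Returns rotated images and 4-class rotation labels."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))   # rotate by k * 90 degrees in the H-W plane
        for img, k in zip(images, labels)
    ])
    return rotated, labels   # train a network to predict `labels` from `rotated`
```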
Introduction Related Work Methods Experiments Conclusions
❖ Self-supervised learning: pretext tasks
https://arxiv.org/pdf/1803.07728.pdf,https://arxiv.org/pdf/1603.09246.pdf
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf
Jigsaw Puzzle
Randomly shuffle the patches, then predict the permutation
Context Prediction
Predict the relative location of patches
14
Introduction Related Work Methods Experiments Conclusions
❖ MoCo
https://arxiv.org/abs/1911.05722 ,
https://arxiv.org/abs/2003.04297
● Dictionary as a queue
(JCL adopted)
● Momentum update
● Memory bank
● Contrastive loss function
○ one positive pair
○ many negative pairs
❖ SimCLR
https://arxiv.org/abs/2002.05709
● Random cropping with resize to the original size, color distortion, Gaussian blur
● Encoder f (ᐧ)
● Projection head g (ᐧ) mapping
● Contrastive loss function
○ one positive pair
○ many negative pairs
15
● SimCLR augmentation strategy
● MLP projection head
● Cosine LR scheduler
● Same loss function as MoCo v1
❖ MoCo v2
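A hedged sketch of two MoCo ingredients listed above: the momentum update of the key encoder and the dictionary-as-a-queue. Variable names and the momentum value 0.999 follow common convention but are assumptions here, not a verbatim reproduction of the official code.

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Key encoder parameters follow the query encoder as an exponential moving average."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

@torch.no_grad()
def enqueue_dequeue(queue, keys, ptr):
    """Replace the oldest keys in the (K, D) queue with the newest batch of keys."""
    batch = keys.size(0)
    queue[ptr:ptr + batch] = keys          # assumes K is divisible by the batch size
    return (ptr + batch) % queue.size(0)   # advance the queue pointer
```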
How does JCL actually work?
Introduction Related Work Methods Experiments Conclusions
16
Introduction Related Work Methods Experiments Conclusions
● Most existing contrastive learning methods only consider independently penalizing the incompatibility of
each single positive query-key pair at a time.
● This does not fully leverage the assumption that all augmentations corresponding to a specific
image are statistically dependent on each other, and are simultaneously similar to the query.
❖ Positive, Negative key pairs
17
Introduction Related Work Methods Experiments Conclusions
● Each instance xi is first augmented M+1 times (M augmentations for positive keys and 1 extra for the query itself).
● Unfortunately, this is not computationally feasible, as carrying all (M+1)×N pairs in a mini-batch would quickly drain the GPU memory even when M is moderately small.
● Capitalizing on this infinity limit, the statistics of the data become sufficient to reach the same goal as multiple pairing; see the limit sketched below.
❖ Positive, Negative key pairs
18
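The "infinity limit" mentioned above can be read as a law-of-large-numbers statement: as the number of sampled positive keys M grows, the per-instance average loss approaches an expectation over the positive-key distribution, so the statistics of the keys (mean and covariance) suffice. This is a hedged paraphrase of the paper's argument, not its exact equation (the paper combines the pairs jointly inside the loss).

```latex
\lim_{M\to\infty}\;\frac{1}{M}\sum_{m=1}^{M}
  \ell\!\left(q_{i},\,k_{i,m}^{+}\right)
  \;=\;
  \mathbb{E}_{\,k^{+}\sim p(k^{+}\mid x_{i})}\!\left[\ell\!\left(q_{i},\,k^{+}\right)\right].
```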
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
https://en.wikipedia.org/wiki/Jensen%27s_inequality 19
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
https://en.wikipedia.org/wiki/Jensen%27s_inequality 20
● Jensen's inequality
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
https://en.wikipedia.org/wiki/Jensen%27s_inequality 21
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
Implicit
Explicit
22
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
23
● Under this Gaussian assumption, Eq.(7) eventually reduces to Eq.(8).
● For any random variable x that follows a Gaussian distribution x ∼ N(μ, Σ), where μ is the expectation of x and Σ is the covariance of x, the moment generating function satisfies the identity sketched below.
● We scale the influence of Σk+ by multiplying it by a scalar λ; tuning λ helps stabilize the training.
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
24
● Under this Gaussian assumption, Eq.(7) eventually reduces to Eq.(8).
● For any random variable x that follows a Gaussian distribution x ∼ N(μ, Σ), where μ is the expectation of x and Σ is the covariance of x, the moment generating function satisfies the identity sketched below.
● We scale the influence of Σk+ by multiplying it by a scalar λ; tuning λ helps stabilize the training.
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
25
● Under this Gaussian assumption, Eq.(7) eventually reduces to Eq.(8).
● For any random variable x that follows a Gaussian distribution x ∼ N(μ, Σ), where μ is the expectation of x and Σ is the covariance of x, the moment generating function satisfies the identity sketched below.
● We scale the influence of Σk+ by multiplying it by a scalar λ; tuning λ helps stabilize the training.
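The moment generating function referred to above is the standard Gaussian identity; applied roughly with t = qi/τ it is what turns the expectation over positive keys into a closed-form mean-plus-covariance term (in JCL, Σk+ is additionally scaled by λ as described in the bullets). The precise form of Eq.(8) should be checked against the paper; this is only a sketch using the slide's symbols qi, τ, μk+, Σk+.

```latex
x \sim \mathcal{N}(\mu, \Sigma)
\;\;\Longrightarrow\;\;
\mathbb{E}\!\left[e^{\,t^{\top}x}\right]
  = \exp\!\Big(t^{\top}\mu + \tfrac{1}{2}\,t^{\top}\Sigma\,t\Big),
\quad\text{so}\quad
\mathbb{E}_{k^{+}}\!\left[e^{\,q_{i}^{\top}k^{+}/\tau}\right]
  = \exp\!\Big(\tfrac{1}{\tau}\,q_{i}^{\top}\mu_{k^{+}}
               + \tfrac{1}{2\tau^{2}}\,q_{i}^{\top}\Sigma_{k^{+}}\,q_{i}\Big).
```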
Question?
Introduction Related Work Methods Experiments Conclusions
26
Introduction Related Work Methods Experiments Conclusions
❖ Implicit Semantic Data Augmentation
https://arxiv.org/abs/1909.12220
https://neurohive.io/en/news/semantic-data-augmentation-improves-neural-network-s-generalization/
● We hope that directions corresponding to meaningful transformations for each class
are well represented by the principal components of the covariance matrix of that class.
● Consider training a deep network G with weights Θ on a training set
D = {(xi,yi)}Ni=1, where yi ∈ {1, . . . , C} is the label of the i-th sample xi over C classes.
● Let the A-dimensional vector ai = [ai1, . . . , aiA]T = G(xi, Θ) denote the deep features of xi
learned by G, and aij indicate the j-th element of ai.
● To obtain semantic directions for augmenting ai, we randomly sample vectors from a zero-mean multivariate normal distribution N(0, Σyi), where Σyi is the class-conditional covariance matrix estimated from the features of all the samples in class yi.
● During training, C covariance matrices are computed, one for each class. The augmented feature ãi is obtained by translating ai along a random direction sampled from N(0, λΣyi); a minimal sketch of this sampling follows below.
27
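A minimal sketch of the feature-level augmentation described above: sample a semantic direction from the class-conditional Gaussian N(0, λΣy) and translate the feature along it. The function name, the per-class covariance dictionary, and the constant λ value are illustrative assumptions, and each covariance is assumed to be positive definite so that `MultivariateNormal` accepts it.

```python
import torch
from torch.distributions import MultivariateNormal

def semantic_augment(features, labels, class_cov, lam=0.5):
    """features: (N, A); labels: (N,); class_cov: dict mapping class index -> (A, A) covariance."""
    augmented = []
    for a_i, y_i in zip(features, labels):
        cov = lam * class_cov[int(y_i)]                       # lambda-scaled class covariance
        delta = MultivariateNormal(torch.zeros_like(a_i),
                                   covariance_matrix=cov).sample()
        augmented.append(a_i + delta)                         # translate along a sampled semantic direction
    return torch.stack(augmented)
```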
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
● The variable k+i follows a Gaussian distribution, where μk+ and Σk+ are respectively the mean and the covariance matrix of the positive keys for query qi.
● This assumption is legitimate, as positive keys more or less share similarities in the embedding space around some mean value, since they all mirror the nature of the query to some extent.
Explicit representation in negative keys
https://blog.daum.net/shksjy/228
https://neurohive.io/en/news/semantic-data-augmentation-improves-neural-network-s-generalization/
Implicit representation in positive keys
1 2 3 4
28
Introduction Related Work Methods Experiments Conclusions
❖ Joint Contrastive Learning
29
Features are directly extracted from a JCL pre-trained and a MoCo v2 pre-trained ResNet-18 network.
Fig. 3(a): cosine similarities are computed for each pair of features (every pair of 2 out of the 32 augmentations) belonging to the same identity image.
Fig. 3(b): accordingly, the variance of the obtained features belonging to the same image is much smaller.
(Figure labels: non-contrastive / contrastive, supervised / unsupervised)
Hyper-parameters: positive key number M = 5, τ = 0.2, λ = 4.0, batch size N = 512
Introduction Related Work Methods Experiments Conclusions
❖ Conclusion in Joint Contrastive Learning
● JCL implicitly involves the joint learning of an infinite number of query-key pairs for each instance.
● By applying rigorous bounding techniques to the proposed formulation, we transform the originally intractable loss function into a practical implementation.
● Most notably, although JCL is an unsupervised algorithm, JCL pre-trained networks even outperform their supervised counterparts in many scenarios.
30
Question?
Introduction Related Work Methods Experiments Conclusions
31
Editor's Notes
  1. Joint Contrastive Learning with Infinite Possibilities : https://arxiv.org/abs/2009.14776 contrastive learning
  2. Implicit semantic augmentation does not explicitly teach the model with many augmentations directly; instead, to teach meaningful directions on the latent feature map, it assumes a Gaussian distribution and samples from it. The mean and variance of the corresponding Gaussian distribution are learned, and the principal components are extracted so that meaningful directions are obtained.