SlideShare a Scribd company logo
MCSE: Multimodal Contrastive Learning of
Sentence Embeddings
Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, and Dietrich Klakow,
2022
Experiments
Conclusions
Introduction
Related Work
01
02
03
04
Introduction
• MCSE : Multimodal Contrastive Learning of Sentence Embeddings
 Background: Unsupervised SimCSE (Gao et al., 2021)
 Extend a multimodal contrastive objective
 Experiments on standard Semantic Textual Similarity (STS)
• Architecture of MCSE
Introduction
f v (·
) is a pre-trained image encoder such as ResNet
• Contrastive learning : background Unsupervised SimCSE
 data augmentation strategy : dropout noise
 pulling positive sentences closer and pushing apart negatives
Related Work
Cosine similarity
• Multimodal Contrastive Learning
 sentence-image pairs , sentence xi and image yi
• f v (·
) : pre-trained image encoder such as ResNet
• fθ(·
) : pre-trained language encoder such as BERT
 pull semantically close image-sentence pairs together and push away non-related pairs
Related Work
• Dataset
 Multimodal datasets : Flickr30k (29,783 images) and MS-COCO (82,783 images)
 text corpus : Wiki1M (English Wikipedia : 106 sentences)
• Encoder
 Language encoders : BERT and RoBERTa
 Image encoder : ResNet-50
Single layer MLPs
• Evaluation
 7 Semantic Textual Similarity (STS) : STS 2012-2016, STS Benchmark, SICK-Relatedness
 Spearman’s correlation
Experiments
Results
• MCSE : Wiki1M, Flickr30k
BERT (76.3 → 77.3)
RoBERTa (76.6 → 78.3)
• STS16 MCSE-BERT
-> the domain discrepancy
Performance comparison on STS tasks
Results
• the performances decrease
(without the large text-only corpus)
• MCSE models (0.9 – 3.8 points improvement)
• Spearman’s correlation(0.8 – 5.0 points reduction)
-> validating the efficacy of visual semantics
Average Spearman’s correlation on 7 STS tasks
Results
• Alignment-Uniformity
 Alignment : paired instance 사이의 거리
(짧을수록 좋음)
Similar samples have similar features
 Uniformity : embedding이 얼만큼 균일하게
분포하는지 (균일 할수록 좋음)
Preserve maximal information
* 참고 논문 : Understanding Contrastive Representation Learning through
Alignment and Uniformity on the Hypersphere (ICML 2020)
• Embedding space가 넓고, 고르게
분포하여 각 단어가 고유한 의미를
보존하는 것이 중요함.
• Contrastive learning을 통해
Negative Pair를 Positive Pair와 멀게
강제하는 과정에서 embedding space를
균일하게 분포하게 함.
Results
• Alignment-Uniformity
 PPOS : positive pairs distribution
 Pdata : data distribution
MCSE models : visually grounding
enhance by improving the
alignment property
The alignment-uniformity plot of models (BERT)
Results
• Improvements on Different Subsets
 different degrees from the visually grounding
because of domain discrepancy
Results
• SimCSE는 구문이 유사한 문장을 검색하는 반면
MCSE는 구문이 다양하고 의미 체계를 공유하는 문장을 검색
Results
• Cross-Modal Retrieval : metric Recall@K
 Recall@K : k개 추천 결과에 대한 recall
Conclusion
• MCSE 제안 : sentence embedding learning
• MCSE consistently improves the performance on STS tasks
• the superiority of method : by analyzing the alignment and uniformity properties
of the embedding space.
• SimCSE는 limited SAMPLE에서 MCSE 보다 나은 성능을 보임
MCSE는 큰 데이터에서는 SimCSE 성능을 능가함.
-> multimodal weight training 관련
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정

More Related Content

Similar to MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정

Zeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media postsZeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media posts
Syo Kyojin
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
Antonio Tejero de Pablos
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEEBEBTECHSTUDENTPROJECTS
 
do adversarially robust image net models transfer better
do adversarially robust image net models transfer betterdo adversarially robust image net models transfer better
do adversarially robust image net models transfer better
LEE HOSEONG
 
Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Giorgio Orsi
 
Transfer Learning for Natural Language Processing
Transfer Learning for Natural Language ProcessingTransfer Learning for Natural Language Processing
Transfer Learning for Natural Language Processing
Sebastian Ruder
 
Distilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model CompressionDistilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model Compression
GeonDoPark1
 
Distilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model CompressionDistilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model Compression
Gyeongman Kim
 
A reconstruction error based framework for multi label and multi-view learning
A reconstruction error based framework for multi label and multi-view learningA reconstruction error based framework for multi label and multi-view learning
A reconstruction error based framework for multi label and multi-view learning
ieeepondy
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
Antonio Tejero de Pablos
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
WarNik Chow
 
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
Symeon Papadopoulos
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
Bhaskar Mitra
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
Goergen Institute for Data Science
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
Goergen Institute for Data Science
 
Action Sequence Mining and Behavior Pattern Analysis for User Modeling
Action Sequence Mining and Behavior Pattern Analysis for User ModelingAction Sequence Mining and Behavior Pattern Analysis for User Modeling
Action Sequence Mining and Behavior Pattern Analysis for User Modeling
Peter Brusilovsky
 
Text Representation & Fixed-Size Ordinally-Forgetting Encoding Approach
Text Representation & Fixed-Size Ordinally-Forgetting Encoding ApproachText Representation & Fixed-Size Ordinally-Forgetting Encoding Approach
Text Representation & Fixed-Size Ordinally-Forgetting Encoding ApproachAhmed Hani Ibrahim
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
indico data
 
A neural image caption generator
A neural image caption generatorA neural image caption generator
A neural image caption generator
heedaeKwon
 

Similar to MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정 (20)

Zeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media postsZeroshot multimodal named entity disambiguation for noisy social media posts
Zeroshot multimodal named entity disambiguation for noisy social media posts
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
CVPR2022 paper reading - Balanced multimodal learning - All Japan Computer Vi...
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
 
do adversarially robust image net models transfer better
do adversarially robust image net models transfer betterdo adversarially robust image net models transfer better
do adversarially robust image net models transfer better
 
Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5Dexa2007 Orsi V1.5
Dexa2007 Orsi V1.5
 
Transfer Learning for Natural Language Processing
Transfer Learning for Natural Language ProcessingTransfer Learning for Natural Language Processing
Transfer Learning for Natural Language Processing
 
Distilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model CompressionDistilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model Compression
 
Distilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model CompressionDistilling Linguistic Context for Language Model Compression
Distilling Linguistic Context for Language Model Compression
 
A reconstruction error based framework for multi label and multi-view learning
A reconstruction error based framework for multi label and multi-view learningA reconstruction error based framework for multi label and multi-view learning
A reconstruction error based framework for multi label and multi-view learning
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
Compact and Distinctive Visual Vocabularies for Efficient Multimedia Data Ind...
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Action Sequence Mining and Behavior Pattern Analysis for User Modeling
Action Sequence Mining and Behavior Pattern Analysis for User ModelingAction Sequence Mining and Behavior Pattern Analysis for User Modeling
Action Sequence Mining and Behavior Pattern Analysis for User Modeling
 
Text Representation & Fixed-Size Ordinally-Forgetting Encoding Approach
Text Representation & Fixed-Size Ordinally-Forgetting Encoding ApproachText Representation & Fixed-Size Ordinally-Forgetting Encoding Approach
Text Representation & Fixed-Size Ordinally-Forgetting Encoding Approach
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
A neural image caption generator
A neural image caption generatorA neural image caption generator
A neural image caption generator
 

More from taeseon ryu

VoxelNet
VoxelNetVoxelNet
VoxelNet
taeseon ryu
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
taeseon ryu
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
taeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
taeseon ryu
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
taeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
taeseon ryu
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
taeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
taeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
taeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
taeseon ryu
 
mPLUG
mPLUGmPLUG
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
taeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
taeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
taeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
taeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
taeseon ryu
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
taeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
 

Recently uploaded

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 

Recently uploaded (20)

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 

MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정

  • 1. MCSE: Multimodal Contrastive Learning of Sentence Embeddings Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, and Dietrich Klakow, 2022
  • 3. Introduction • MCSE : Multimodal Contrastive Learning of Sentence Embeddings  Background: Unsupervised SimCSE (Gao et al., 2021)  Extend a multimodal contrastive objective  Experiments on standard Semantic Textual Similarity (STS)
  • 4. • Architecture of MCSE Introduction f v (· ) is a pre-trained image encoder such as ResNet
  • 5. • Contrastive learning : background Unsupervised SimCSE  data augmentation strategy : dropout noise  pulling positive sentences closer and pushing apart negatives Related Work Cosine similarity
  • 6. • Multimodal Contrastive Learning  sentence-image pairs , sentence xi and image yi • f v (· ) : pre-trained image encoder such as ResNet • fθ(· ) : pre-trained language encoder such as BERT  pull semantically close image-sentence pairs together and push away non-related pairs Related Work
  • 7. • Dataset  Multimodal datasets : Flickr30k (29,783 images) and MS-COCO (82,783 images)  text corpus : Wiki1M (English Wikipedia : 106 sentences) • Encoder  Language encoders : BERT and RoBERTa  Image encoder : ResNet-50 Single layer MLPs • Evaluation  7 Semantic Textual Similarity (STS) : STS 2012-2016, STS Benchmark, SICK-Relatedness  Spearman’s correlation Experiments
  • 8. Results • MCSE : Wiki1M, Flickr30k BERT (76.3 → 77.3) RoBERTa (76.6 → 78.3) • STS16 MCSE-BERT -> the domain discrepancy Performance comparison on STS tasks
  • 9. Results • the performances decrease (without the large text-only corpus) • MCSE models (0.9 – 3.8 points improvement) • Spearman’s correlation(0.8 – 5.0 points reduction) -> validating the efficacy of visual semantics Average Spearman’s correlation on 7 STS tasks
  • 10. Results • Alignment-Uniformity  Alignment : paired instance 사이의 거리 (짧을수록 좋음) Similar samples have similar features  Uniformity : embedding이 얼만큼 균일하게 분포하는지 (균일 할수록 좋음) Preserve maximal information * 참고 논문 : Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (ICML 2020) • Embedding space가 넓고, 고르게 분포하여 각 단어가 고유한 의미를 보존하는 것이 중요함. • Contrastive learning을 통해 Negative Pair를 Positive Pair와 멀게 강제하는 과정에서 embedding space를 균일하게 분포하게 함.
  • 11. Results • Alignment-Uniformity  PPOS : positive pairs distribution  Pdata : data distribution MCSE models : visually grounding enhance by improving the alignment property The alignment-uniformity plot of models (BERT)
  • 12. Results • Improvements on Different Subsets  different degrees from the visually grounding because of domain discrepancy
  • 13. Results • SimCSE는 구문이 유사한 문장을 검색하는 반면 MCSE는 구문이 다양하고 의미 체계를 공유하는 문장을 검색
  • 14. Results • Cross-Modal Retrieval : metric Recall@K  Recall@K : k개 추천 결과에 대한 recall
  • 15.
  • 16. Conclusion • MCSE 제안 : sentence embedding learning • MCSE consistently improves the performance on STS tasks • the superiority of method : by analyzing the alignment and uniformity properties of the embedding space. • SimCSE는 limited SAMPLE에서 MCSE 보다 나은 성능을 보임 MCSE는 큰 데이터에서는 SimCSE 성능을 능가함. -> multimodal weight training 관련