This paper presents MCSE, a new approach that learns meaningful sentence embeddings by combining visual and textual information. It shows performance gains across diverse datasets and pre-trained encoders, and aligns semantically similar sentences well. The authors argue that using vision as an additional source of semantic information can further promote sentence representation learning. The method is compared against existing sentence embedding approaches and performs strongly both in analysis and in practice.
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
1. MCSE: Multimodal Contrastive Learning of Sentence Embeddings
Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, and Dietrich Klakow, 2022
3. Introduction
• MCSE: Multimodal Contrastive Learning of Sentence Embeddings
• Background: unsupervised SimCSE (Gao et al., 2021)
• Extends SimCSE with a multimodal contrastive objective
• Experiments on standard Semantic Textual Similarity (STS) tasks
4. Introduction
• Architecture of MCSE
• f_v(·): a pre-trained image encoder such as ResNet
5. Related Work
• Contrastive learning background: unsupervised SimCSE
• Data augmentation strategy: dropout noise
• Pulls positive sentence pairs closer and pushes negatives apart
• Similarity measure: cosine similarity
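The dropout-based SimCSE objective can be sketched in NumPy (a minimal illustration, not the authors' code; the temperature `tau=0.05` follows SimCSE's common setting and is an assumption here):

```python
import numpy as np

def simcse_loss(z1, z2, tau=0.05):
    """InfoNCE loss over two dropout views z1, z2 of the same batch.

    z1, z2: (N, d) embedding arrays; row i of z2 is the positive for
    row i of z1, and all other rows act as in-batch negatives.
    """
    # cosine similarity matrix between all pairs across the two views
    z1n = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2n = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1n @ z2n.T) / tau                    # (N, N)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    # softmax cross-entropy with the diagonal (positives) as the target
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

With two identical, well-separated views the diagonal dominates and the loss approaches zero; shuffling one view breaks the positive pairing and the loss becomes large.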
6. Related Work
• Multimodal contrastive learning
• Sentence-image pairs: sentence x_i and image y_i
• f_v(·): pre-trained image encoder such as ResNet
• f_θ(·): pre-trained language encoder such as BERT
• Pulls semantically close sentence-image pairs together and pushes unrelated pairs apart
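Combining the text-only SimCSE term with the sentence-image term can be sketched as follows (a simplified NumPy sketch, not the paper's implementation: the weighting `lam` is a hypothetical hyperparameter, and the symmetric two-direction multimodal term is an assumption):

```python
import numpy as np

def info_nce(a, b, tau=0.05):
    """InfoNCE with in-batch negatives: row i of b is the positive for row i of a."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / tau
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def mcse_loss(sent_v1, sent_v2, sent_proj, img_proj, lam=1.0):
    """Text-only SimCSE term plus a sentence-image term.

    sent_v1 / sent_v2: two dropout views of the sentences (from f_theta).
    sent_proj / img_proj: sentence and image features projected into a
    shared space (via MLP heads). `lam` and the symmetric form are
    assumptions, not taken from the paper.
    """
    l_text = info_nce(sent_v1, sent_v2)
    l_mm = 0.5 * (info_nce(sent_proj, img_proj) + info_nce(img_proj, sent_proj))
    return l_text + lam * l_mm
```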
7. Experiments
• Dataset
  Multimodal datasets: Flickr30k (29,783 images) and MS-COCO (82,783 images)
  Text corpus: Wiki1M (English Wikipedia, 10^6 sentences)
• Encoder
  Language encoders: BERT and RoBERTa
  Image encoder: ResNet-50
  Single-layer MLP projection heads
• Evaluation
  7 Semantic Textual Similarity (STS) tasks: STS 2012-2016, STS Benchmark, SICK-Relatedness
  Metric: Spearman's correlation
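STS evaluation scores each sentence pair by the cosine similarity of its embeddings and reports Spearman's rank correlation against the gold scores. A minimal sketch (no tie handling; real evaluations typically use `scipy.stats.spearmanr`):

```python
import numpy as np

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Minimal version without tie handling (ties would need averaged ranks).
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def sts_score(emb_a, emb_b, gold):
    """Cosine similarity per sentence pair, correlated with gold STS scores."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)
    return spearman(cos, gold)
```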
8. Results
• MCSE (Wiki1M + Flickr30k) improves average STS performance:
  BERT (76.3 → 77.3), RoBERTa (76.6 → 78.3)
• On STS16, MCSE-BERT gains less, attributed to the domain discrepancy
Performance comparison on STS tasks
9. Results
• Without the large text-only corpus (multimodal data only), absolute performance decreases
• MCSE models still outperform SimCSE by 0.9-3.8 points
• Dropping the text corpus costs 0.8-5.0 points of Spearman's correlation
→ validating the efficacy of visual semantics
Average Spearman's correlation on 7 STS tasks
10. Results
• Alignment-Uniformity
  Alignment: distance between paired instances (smaller is better);
  similar samples should have similar features
  Uniformity: how uniformly the embeddings are distributed (more uniform is better);
  a uniform distribution preserves maximal information
  * Reference: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (ICML 2020)
• It matters that the embedding space is broadly and evenly covered, so that each item preserves its own distinct meaning.
• Contrastive learning forces negative pairs away from positive pairs, which spreads the embeddings uniformly over the space.
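The two properties can be measured as in Wang & Isola (ICML 2020): alignment as the mean (squared) distance between positive pairs, uniformity as the log mean Gaussian potential over all pairs, both on the unit hypersphere. A minimal NumPy sketch; `alpha=2` and `t=2` follow that paper's common defaults:

```python
import numpy as np

def alignment(z1, z2, alpha=2):
    """Mean distance^alpha between positive pairs on the unit sphere (lower = better)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return float(np.mean(np.linalg.norm(z1 - z2, axis=1) ** alpha))

def uniformity(z, t=2):
    """Log of the mean Gaussian potential over all distinct pairs (lower = more uniform)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    # pairwise squared Euclidean distances via broadcasting
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(z.shape[0], k=1)     # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * d2[iu]))))
```

Perfectly matched positive pairs give alignment 0; a collapsed embedding (all points identical) gives uniformity 0, while spread-out points drive uniformity negative.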
11. Results
• Alignment-Uniformity
  p_pos: distribution of positive pairs
  p_data: data distribution
• MCSE models: visual grounding enhances performance by improving the alignment property
The alignment-uniformity plot of models (BERT)
12. Results
• Improvements on Different Subsets
• Subsets benefit to different degrees from visual grounding, because of domain discrepancy
16. Conclusion
• Proposes MCSE for sentence embedding learning
• MCSE consistently improves performance on STS tasks
• The superiority of the method is demonstrated by analyzing the alignment and uniformity properties of the embedding space
• With limited samples, SimCSE outperforms MCSE; on larger data, MCSE surpasses SimCSE
→ related to training the multimodal weights