mPLUG: Effective and Efficient Vision-Language Learning by
Cross-modal Skip-connections
2023.04.09
Image Processing Team
이찬혁, 김준철, 류채은, 이해원, 최승준, 현청천
Introduction
Introduction Vision-Language Task
Introduction Problem of previous works
Introduction Proposed method
Introduction Contribution
• We propose mPLUG, a unified vision-language pretrained model for cross-modal understanding and generation that targets both effectiveness and efficiency in cross-modal learning.
• We introduce a new asymmetric vision-language architecture with novel cross-modal skip-connections to address two fundamental problems in multi-modal fusion: information asymmetry and computation inefficiency.
• mPLUG achieves state-of-the-art performance on a wide range of vision-language tasks, including image captioning, image-text retrieval, visual grounding, and visual question answering, and also shows zero-shot transferability.
Proposed method
Proposed method Overall architecture
[Figure: overall architecture, showing two skip-connected fusion blocks]
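The figure labels two skip-connected fusion blocks but the slide text does not describe their contents. As a rough sketch only, the Python below assumes that each block consists of S asymmetric co-attention layers followed by one connected-attention layer, where S is the stride. The function name `fusion_schedule` and the layer labels are hypothetical, chosen for illustration.

```python
from typing import List


def fusion_schedule(num_blocks: int, stride: int) -> List[str]:
    """List the layer types, in order, for a stack of skip-connected fusion blocks.

    Assumption: one block = `stride` asymmetric co-attention layers
    followed by a single connected-attention layer.
    """
    if num_blocks < 1 or stride < 1:
        raise ValueError("num_blocks and stride must be positive")
    layers: List[str] = []
    for _ in range(num_blocks):
        # Layers that the visual features skip over in this sketch.
        layers.extend(["asymmetric_co_attention"] * stride)
        # Layer where the skipped visual features rejoin the text features.
        layers.append("connected_attention")
    return layers


schedule = fusion_schedule(num_blocks=2, stride=3)
num_connected = schedule.count("connected_attention")
```

Under these assumptions, two blocks with stride 3 give eight layers, of which only two are connected-attention layers. A larger stride therefore means fewer layers that attend over the joint image-text sequence.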
Proposed method Model architecture
Losses shown on the architecture: $L_{ITC} + L_{ITM} + L_{MaskLM}$ and $L_{PrefixLM}$
Proposed method Pre-training methods (Image-Text Contrastive learning)
[Figure: a batch of four images $I_1, I_2, I_3, I_4$ and four texts $T_1, T_2, T_3, T_4$. Image $I_1$ is contrasted against $T_2, T_3, T_4$, and image $I_3$ against $T_1, T_2, T_4$.]
Similarity matrix for a batch of four image-text pairs ($S_{ij}$ is the similarity between image $I_i$ and text $T_j$):
Image \ Text | T1  | T2  | T3  | T4
I1           | S11 | S12 | S13 | S14
I2           | S21 | S22 | S23 | S24
I3           | S31 | S32 | S33 | S34
I4           | S41 | S42 | S43 | S44
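To make the role of the similarity matrix concrete, here is a minimal sketch of a symmetric contrastive loss over such a matrix, assuming the matched pairs lie on the diagonal. The temperature of 0.07 and the function names are assumptions, not values from the slides. Details such as a momentum encoder or a queue of extra negatives are left out.

```python
import math
from typing import List


def softmax(row: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax of a row of similarities."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def itc_loss(sim: List[List[float]], temperature: float = 0.07) -> float:
    """Average of image-to-text and text-to-image cross-entropy.

    sim[i][j] is the similarity of image i and text j; pair (i, i) is the match.
    """
    n = len(sim)
    # Image-to-text: row i should put its probability mass on column i.
    i2t = -sum(math.log(softmax(sim[i], temperature)[i]) for i in range(n)) / n
    # Text-to-image: column j should put its probability mass on row j.
    columns = [[sim[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(math.log(softmax(columns[j], temperature)[j]) for j in range(n)) / n
    return 0.5 * (i2t + t2i)


aligned = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
misaligned = [aligned[1], aligned[0], aligned[3], aligned[2]]
uniform = [[0.0] * 4 for _ in range(4)]
```

The loss is small when the diagonal entries dominate (`aligned`), large when the high similarities sit off the diagonal (`misaligned`), and equal to log 4 when all similarities are the same (`uniform`).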
Proposed method Pre-training methods (Image-Text Matching)
[Figure: the image-text similarity matrix $S_{ij}$, used to pick a hard negative for each image and each text]
Hard negative image-text pairs: $I_1$-$T_2$, $I_2$-$T_1$, $I_3$-$T_1$, $I_4$-$T_3$
Hard negative text-image pairs: $T_1$-$I_2$, $T_2$-$I_1$, $T_3$-$I_4$, $T_4$-$I_1$
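The hard negative pairs listed on this slide can be reproduced with a small sketch that, for every image, picks the most similar non-matching text, and for every text the most similar non-matching image. Two caveats: the similarity values below are invented so that the result matches the slide, and taking the single highest score is a simplification, since implementations may instead sample negatives in proportion to similarity. The function name is hypothetical.

```python
from typing import List, Tuple


def hardest_negatives(sim: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Return (hardest text index per image, hardest image index per text).

    sim[i][j] is the similarity of image i and text j; pair (i, i) is the
    true match and is excluded from the candidates.
    """
    n = len(sim)
    text_for_image: List[int] = []
    image_for_text: List[int] = []
    for i in range(n):
        candidates = [k for k in range(n) if k != i]
        # Row i: most similar wrong text for image i.
        text_for_image.append(max(candidates, key=lambda j: sim[i][j]))
        # Column i: most similar wrong image for text i.
        image_for_text.append(max(candidates, key=lambda k: sim[k][i]))
    return text_for_image, image_for_text


# Illustrative values only (rows I1..I4, columns T1..T4).
S = [
    [0.9, 0.7, 0.1, 0.6],
    [0.8, 0.9, 0.2, 0.3],
    [0.5, 0.1, 0.9, 0.2],
    [0.1, 0.2, 0.6, 0.9],
]
neg_text, neg_image = hardest_negatives(S)
# neg_text  == [1, 0, 0, 2]  -> I1-T2, I2-T1, I3-T1, I4-T3
# neg_image == [1, 0, 3, 0]  -> T1-I2, T2-I1, T3-I4, T4-I1
```

Indices are zero-based, so index 1 in `neg_text[0]` corresponds to the pair $I_1$-$T_2$.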
Proposed method Pre-training methods (Masked Language Modeling)
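The slide content for this objective did not survive extraction, so the following is only a generic sketch of how inputs for masked language modeling are commonly prepared. The 15% masking rate is the usual BERT-style default and is an assumption here, as are the `[MASK]` string and the function name. The common refinement of sometimes keeping or randomly replacing a selected token is omitted.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # placeholder token, assumed for illustration


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly hide tokens and record what the model must recover.

    Returns (masked_tokens, labels). labels[i] is the original token at a
    masked position and None elsewhere (positions the loss ignores).
    """
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


caption = "a dog runs across the grass".split()
masked_caption, targets = mask_tokens(caption, seed=0)
```

In a vision-language setting the model would predict each hidden token from both the remaining text and the image features.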
Proposed method Pre-training methods (Prefix Language Modeling)
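This slide's body is also missing, so the sketch below shows only the generic prefix language modeling idea: positions in the prefix see each other in both directions, while the continuation is generated left to right. It is written as a single attention mask for clarity. mPLUG itself may instead feed the prefix to the encoder side and generate the continuation with the decoder. The function name is hypothetical.

```python
from typing import List


def prefix_lm_mask(seq_len: int, prefix_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Every position sees the whole prefix; positions after the prefix
    additionally see earlier continuation tokens and themselves.
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must lie between 0 and seq_len")
    return [
        [(j < prefix_len) or (j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = prefix_lm_mask(seq_len=5, prefix_len=2)
```

With `prefix_len=0` this reduces to an ordinary causal mask, and with `prefix_len=seq_len` to full bidirectional attention.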
Proposed method Pre-training methods
Losses: $L_{ITC}$, $L_{ITM}$, $L_{MLM}$ (written $L_{MaskLM}$ on the architecture slides), $L_{PrefixLM}$
Question
Experiments
Experiments Data & Setup
• $BERT_{base}$
• Multi-modal network (last 6 layers of $BERT_{base}$)
• Decoder (12-layer Transformer)
• 16 NVIDIA A100 GPUs
• AdamW (weight decay 0.02)
• Learning rate: 1e-5 (ViT) / 1e-4 (BERT)
• Random image crop, RandAugment
• 65,536
Experiments Distributed learning on a large scale
Evaluation on Vision-Language Tasks Visual Question Answering
Evaluation on Vision-Language Tasks Image captioning
Evaluation on Vision-Language Tasks Image-Text retrieval
Evaluation on Vision-Language Tasks Visual grounding
Evaluation on Vision-Language Tasks Visual reasoning
Effectiveness and Efficiency Analysis of Stride for Skip
Effectiveness and Efficiency Analysis of Cross-modal Fusion
Zero-shot Transferability Image caption
Zero-shot Transferability Image-Text retrieval
Conclusion
• Presents mPLUG, an effective and efficient VLP framework with novel cross-modal skip-connections for both cross-modal understanding and generation.
• mPLUG achieves state-of-the-art performance on a wide range of vision-language tasks and also shows zero-shot transferability.
THANK YOU

More Related Content

What's hot

論文の図表レイアウト例
論文の図表レイアウト例論文の図表レイアウト例
論文の図表レイアウト例Sunao Hara
 
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -Akisato Kimura
 
機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論Taiji Suzuki
 
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価Daichi Kitamura
 
今、改めて振り返りたいiPhoneセンサの種類と使われ方
今、改めて振り返りたいiPhoneセンサの種類と使われ方今、改めて振り返りたいiPhoneセンサの種類と使われ方
今、改めて振り返りたいiPhoneセンサの種類と使われ方Atsushi Otsubo
 
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料Amelieff
 
Dl hacks輪読: "Unifying distillation and privileged information"
Dl hacks輪読: "Unifying distillation and privileged information"Dl hacks輪読: "Unifying distillation and privileged information"
Dl hacks輪読: "Unifying distillation and privileged information"Yusuke Iwasawa
 
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...Deep Learning JP
 
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたいTakuma Wakamori
 
相互相関関数の最大化と時間差推定
相互相関関数の最大化と時間差推定相互相関関数の最大化と時間差推定
相互相関関数の最大化と時間差推定KoueiYamaoka
 
Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習Plot Hong
 
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開Hironobu Fujiyoshi
 
【論文紹介】Understanding Back-Translation at Scale
【論文紹介】Understanding Back-Translation at Scale【論文紹介】Understanding Back-Translation at Scale
【論文紹介】Understanding Back-Translation at ScaleTomoyuki Hioki
 
[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian ProcessesDeep Learning JP
 
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image GeneratorsDeep Learning JP
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)MasanoriSuganuma
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについてDeep Learning JP
 
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)AtCoder Inc.
 
Automatic Summarization (2014)
Automatic Summarization (2014)Automatic Summarization (2014)
Automatic Summarization (2014)Hitoshi Nishikawa
 

What's hot (20)

論文の図表レイアウト例
論文の図表レイアウト例論文の図表レイアウト例
論文の図表レイアウト例
 
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -
人間の視覚的注意を予測するモデル - 動的ベイジアンネットワークに基づく 最新のアプローチ -
 
機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論機械学習におけるオンライン確率的最適化の理論
機械学習におけるオンライン確率的最適化の理論
 
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価
スペクトログラム無矛盾性を用いた 独立低ランク行列分析の実験的評価
 
今、改めて振り返りたいiPhoneセンサの種類と使われ方
今、改めて振り返りたいiPhoneセンサの種類と使われ方今、改めて振り返りたいiPhoneセンサの種類と使われ方
今、改めて振り返りたいiPhoneセンサの種類と使われ方
 
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料
フリーソフトではじめるメチル化データ解析入門 SeqCap Epiデータ対応_第40回勉強会資料
 
辺彩色
辺彩色辺彩色
辺彩色
 
Dl hacks輪読: "Unifying distillation and privileged information"
Dl hacks輪読: "Unifying distillation and privileged information"Dl hacks輪読: "Unifying distillation and privileged information"
Dl hacks輪読: "Unifying distillation and privileged information"
 
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
 
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい
強化学習初心者が強化学習でニューラルネットワークの設計を自動化してみたい
 
相互相関関数の最大化と時間差推定
相互相関関数の最大化と時間差推定相互相関関数の最大化と時間差推定
相互相関関数の最大化と時間差推定
 
Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習
 
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
 
【論文紹介】Understanding Back-Translation at Scale
【論文紹介】Understanding Back-Translation at Scale【論文紹介】Understanding Back-Translation at Scale
【論文紹介】Understanding Back-Translation at Scale
 
[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes
 
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)
実践・最強最速のアルゴリズム勉強会 第二回講義資料(ワークスアプリケーションズ & AtCoder)
 
Automatic Summarization (2014)
Automatic Summarization (2014)Automatic Summarization (2014)
Automatic Summarization (2014)
 

Similar to Effective and Efficient Vision-Language Learning with Cross-Modal Skip-Connections

Transfer Learning CV - Souradip and Sayak
Transfer Learning CV - Souradip and SayakTransfer Learning CV - Souradip and Sayak
Transfer Learning CV - Souradip and SayakSayak Paul
 
Minor Project Report on Denoising Diffusion Probabilistic Model
Minor Project Report on Denoising Diffusion Probabilistic ModelMinor Project Report on Denoising Diffusion Probabilistic Model
Minor Project Report on Denoising Diffusion Probabilistic Modelsoxigoh238
 
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC... BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...Nexgen Technology
 
final year ieee pojects in pondicherry,bulk ieee projects ,bulk 2015-16 i...
  final  year ieee pojects in pondicherry,bulk ieee projects ,bulk  2015-16 i...  final  year ieee pojects in pondicherry,bulk ieee projects ,bulk  2015-16 i...
final year ieee pojects in pondicherry,bulk ieee projects ,bulk 2015-16 i...nexgentech
 
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Vitaly Bondar
 
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Sangmin Woo
 
NS - CUK Seminar: J.H.Lee, Review on "Graph Pre-training for AMR Parsing an...
 NS - CUK Seminar: J.H.Lee,  Review on "Graph Pre-training for AMR Parsing an... NS - CUK Seminar: J.H.Lee,  Review on "Graph Pre-training for AMR Parsing an...
NS - CUK Seminar: J.H.Lee, Review on "Graph Pre-training for AMR Parsing an...ssuser4b1f48
 
Prespective analytics with DOcplex and pandas
Prespective analytics with DOcplex and pandasPrespective analytics with DOcplex and pandas
Prespective analytics with DOcplex and pandasPyDataParis
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017Manish Pandey
 
NS-Lab Seminar : J.H.Lee, Review on "Graph Pre-training for AMR Parsing and ...
NS-Lab Seminar : J.H.Lee,  Review on "Graph Pre-training for AMR Parsing and ...NS-Lab Seminar : J.H.Lee,  Review on "Graph Pre-training for AMR Parsing and ...
NS-Lab Seminar : J.H.Lee, Review on "Graph Pre-training for AMR Parsing and ...ssuser4b1f48
 
Manoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma
 
Manoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingEray Cakici
 
LearnSQL: Online Learning and Evaluation System for Databases Courses
LearnSQL: Online Learning and Evaluation System for Databases CoursesLearnSQL: Online Learning and Evaluation System for Databases Courses
LearnSQL: Online Learning and Evaluation System for Databases CoursesCarme Quer
 
A neural image caption generator
A neural image caption generatorA neural image caption generator
A neural image caption generatorheedaeKwon
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用CHENHuiMei
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 

Similar to Effective and Efficient Vision-Language Learning with Cross-Modal Skip-Connections (20)

Transfer Learning CV - Souradip and Sayak
Transfer Learning CV - Souradip and SayakTransfer Learning CV - Souradip and Sayak
Transfer Learning CV - Souradip and Sayak
 
Minor Project Report on Denoising Diffusion Probabilistic Model
Minor Project Report on Denoising Diffusion Probabilistic ModelMinor Project Report on Denoising Diffusion Probabilistic Model
Minor Project Report on Denoising Diffusion Probabilistic Model
 
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC... BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 
final year ieee pojects in pondicherry,bulk ieee projects ,bulk 2015-16 i...
  final  year ieee pojects in pondicherry,bulk ieee projects ,bulk  2015-16 i...  final  year ieee pojects in pondicherry,bulk ieee projects ,bulk  2015-16 i...
final year ieee pojects in pondicherry,bulk ieee projects ,bulk 2015-16 i...
 
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Unde...
 
Venu gopal_CV
Venu gopal_CVVenu gopal_CV
Venu gopal_CV
 
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
 
NS - CUK Seminar: J.H.Lee, Review on "Graph Pre-training for AMR Parsing an...
 NS - CUK Seminar: J.H.Lee,  Review on "Graph Pre-training for AMR Parsing an... NS - CUK Seminar: J.H.Lee,  Review on "Graph Pre-training for AMR Parsing an...
NS - CUK Seminar: J.H.Lee, Review on "Graph Pre-training for AMR Parsing an...
 
A0540106
A0540106A0540106
A0540106
 
Prespective analytics with DOcplex and pandas
Prespective analytics with DOcplex and pandasPrespective analytics with DOcplex and pandas
Prespective analytics with DOcplex and pandas
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
NS-Lab Seminar : J.H.Lee, Review on "Graph Pre-training for AMR Parsing and ...
NS-Lab Seminar : J.H.Lee,  Review on "Graph Pre-training for AMR Parsing and ...NS-Lab Seminar : J.H.Lee,  Review on "Graph Pre-training for AMR Parsing and ...
NS-Lab Seminar : J.H.Lee, Review on "Graph Pre-training for AMR Parsing and ...
 
Manoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9years
 
Manoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9yearsManoj Sharma_Enovia_9years
Manoj Sharma_Enovia_9years
 
NMSL_2017summer
NMSL_2017summerNMSL_2017summer
NMSL_2017summer
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
 
LearnSQL: Online Learning and Evaluation System for Databases Courses
LearnSQL: Online Learning and Evaluation System for Databases CoursesLearnSQL: Online Learning and Evaluation System for Databases Courses
LearnSQL: Online Learning and Evaluation System for Databases Courses
 
A neural image caption generator
A neural image caption generatorA neural image caption generator
A neural image caption generator
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 

More from taeseon ryu

OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splattingtaeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptxtaeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extractiontaeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learningtaeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Modelstaeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuningtaeseon ryu
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdftaeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithmtaeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networkstaeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarizationtaeseon ryu
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimizationtaeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
 

Recently uploaded

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Recently uploaded (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Call Girls In Mahipalpur O9654467111 Escorts Service

Effective and Efficient Vision-Language Learning with Cross-Modal Skip-Connections

  • 1. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. 2023.04.09. Image Processing Team: 이찬혁, 김준철, 류채은, 이해원, 최승준, 현청천
  • 4. Introduction: Problem of previous works
  • 5. Introduction: Problem of previous works
  • 7. Introduction: Contribution
      • We propose mPLUG, a unified vision-language pretrained model for cross-modal understanding and generation that targets both effectiveness and efficiency in cross-modal learning.
      • We introduce a new asymmetric vision-language architecture with novel cross-modal skip-connections to address two fundamental problems in multi-modal fusion: information asymmetry and computation inefficiency.
      • mPLUG achieves state-of-the-art performance on a wide range of vision-language tasks, including image captioning, image-text retrieval, visual grounding and visual question answering, in a zero-shot manner.
  • 9. Proposed method: Overall architecture (figure label: skip-connected fusion block)
  • 10. Proposed method: Model architecture. Objectives: $L_{ITC} + L_{ITM} + L_{MaskLM}$, $L_{PrefixLM}$
  • 11. Proposed method: Model architecture. Objectives: $\mathbf{L_{ITC}} + L_{ITM} + L_{MaskLM}$, $L_{PrefixLM}$
  • 12. Proposed method: Pre-training methods (Image-Text Contrastive learning). Figure: images $I_1, I_2, I_3, I_4$ and texts $T_1, T_2, T_3, T_4$; $I_1$ vs. $T_2, T_3, T_4$; $I_3$ vs. $T_1, T_2, T_4$
  • 13. Proposed method: Pre-training methods (Image-Text Contrastive learning). Figure: 4×4 image-text matrix $S_{ij}$ (rows: images $I_1$ to $I_4$; columns: texts $T_1$ to $T_4$). Objectives: $\mathbf{L_{ITC}} + L_{ITM} + L_{MaskLM}$, $L_{PrefixLM}$
  • 14. Proposed method: Model architecture. Objectives: $L_{ITC} + \mathbf{L_{ITM}} + L_{MaskLM}$, $L_{PrefixLM}$
  • 15. Proposed method: Pre-training methods (Image-Text Matching). Figure: 4×4 image-text matrix $S_{ij}$ (rows: images $I_1$ to $I_4$; columns: texts $T_1$ to $T_4$)
      • [Hard negative Image-Text pair] $I_1 - T_2$, $I_2 - T_1$, $I_3 - T_1$, $I_4 - T_3$
      • [Hard negative Text-Image pair] $T_1 - I_2$, $T_2 - I_1$, $T_3 - I_4$, $T_4 - I_1$
  • 16. Proposed method: Pre-training methods (Image-Text Matching) → $\mathbf{L_{ITM}}$. Objectives: $L_{ITC} + \mathbf{L_{ITM}} + L_{MaskLM}$, $L_{PrefixLM}$
  • 17. Proposed method: Model architecture. Objectives: $L_{ITC} + L_{ITM} + \mathbf{L_{MaskLM}}$, $L_{PrefixLM}$
  • 18. Proposed method: Pre-training methods (Masked Language Modeling). Objectives: $L_{ITC} + L_{ITM} + \mathbf{L_{MaskLM}}$, $L_{PrefixLM}$
  • 19. Proposed method: Model architecture. Objectives: $L_{ITC} + L_{ITM} + L_{MaskLM}$, $\mathbf{L_{PrefixLM}}$
  • 20. Proposed method: Pre-training methods (Prefix Language Modeling)
  • 21. Proposed method: Pre-training methods. $\mathbf{L_{ITC}}$, $\mathbf{L_{MLM}}$, $\mathbf{L_{ITM}}$, $\mathbf{L_{PrefixLM}}$
  • 24. Experiments: Data & Setup
      • $BERT_{base}$; multi-modal network (last 6 layers of $BERT_{base}$); decoder (12-layer Transformer)
      • NVIDIA A100 GPUs (16 GPUs)
      • AdamW (weight decay 0.02)
      • 1e-5 (ViT) / 1e-4 (BERT)
      • Random image crop, RandAugment
      • 65,536
  • 26. Evaluation on Vision-Language Tasks: Visual Question Answering
  • 27. Evaluation on Vision-Language Tasks: Image captioning
  • 28. Evaluation on Vision-Language Tasks: Image-Text retrieval
  • 29. Evaluation on Vision-Language Tasks: Visual grounding
  • 30. Evaluation on Vision-Language Tasks: Visual reasoning
  • 31. Effectiveness and Efficiency: Analysis of Stride for Skip
  • 32. Effectiveness and Efficiency: Analysis of Cross-modal Fusion
  • 35. Conclusion
      • The paper presents mPLUG, an effective and efficient VLP framework with novel cross-modal skip-connections for both cross-modal understanding and generation.
      • mPLUG achieves state-of-the-art performance on a wide range of vision-language tasks in a zero-shot manner.
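
The image-text contrastive and image-text matching slides (12 to 15) revolve around a 4×4 image-text matrix and the hard negative pairs read off it. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes that the matrix holds similarity scores with matched pairs on the diagonal, that the contrastive loss is a symmetric softmax cross-entropy, and that a hard negative is simply the highest-scoring non-matching entry (a simplification; sampling in proportion to similarity is another common choice). The function names, the temperature value and the numbers in `sim` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def itc_loss(sim, temperature=0.07):
    """Symmetric contrastive loss for a square similarity matrix.

    sim[i, j] is the similarity of image i and text j; the matched
    pairs are assumed to lie on the diagonal. The temperature is a
    hypothetical default, not a value taken from the slides.
    """
    logits = sim / temperature
    idx = np.arange(sim.shape[0])
    image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (image_to_text + text_to_image)


def hardest_negatives(sim):
    """Pick the most similar non-matching partner in each direction.

    Returns (text index per image, image index per text), 0-based.
    """
    masked = sim.astype(float).copy()
    np.fill_diagonal(masked, -np.inf)  # exclude the matched pairs
    return masked.argmax(axis=1), masked.argmax(axis=0)


# Made-up similarity values (rows: images I1..I4, columns: texts T1..T4).
sim = np.array([
    [0.9, 0.6, 0.1, 0.3],
    [0.7, 0.8, 0.2, 0.1],
    [0.5, 0.2, 0.9, 0.2],
    [0.1, 0.3, 0.6, 0.8],
])

text_for_image, image_for_text = hardest_negatives(sim)
print(text_for_image.tolist())  # -> [1, 0, 0, 2]
print(image_for_text.tolist())  # -> [1, 0, 3, 0]
print(itc_loss(sim))
```

With these made-up values the 0-based indices correspond to the pairs listed on slide 15: images I1 to I4 select T2, T1, T1, T3, and texts T1 to T4 select I2, I1, I4, I1. In an image-text matching step such pairs would be fed to the fusion network as negatives alongside the matched pairs.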