SlideShare a Scribd company logo
ProbVLM: Probabilistic Adapter for
Frozen Vison-Language Models
Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata,
ICCV2023
木全潤(名工大)
2023/10/30
概要
n決定的に事前学習されたVision-Languageモデルに
確率的な埋め込みを追加
• マルチモーダル埋め込みの不確実性を推定
• 下流タスクへの支援
• 潜在拡散モデルの埋め込みの可視化
babilistic Adapter for Frozen Vison-Language Models
1
Shyamgopal Karthik⇤,1
Massimiliano Mancini2
Zeynep Akata1,3
Tübingen 2
University of Trento 3
MPI for Intelligent Systems
ct
models (VLMs) like CLIP
between images and text.
stic mapping process, an
ped to a single vector in
problematic: as multiple
bstract the same concept
手法の目的
nCLIP [Radford+, ICML2021]などのVision-Languageモデル
• 画像とテキストを共有埋め込み空間に埋め込む
n提案手法
• アダプタを用いて埋め込まれた情報を確率的に再度埋め込み
• 出力の不確実性を定量化
babilistic Adapter for Frozen Vison-Language Models
1
Shyamgopal Karthik⇤,1
Massimiliano Mancini2
Zeynep Akata1,3
Tübingen 2
University of Trento 3
MPI for Intelligent Systems
ct
models (VLMs) like CLIP
between images and text.
stic mapping process, an
ped to a single vector in
problematic: as multiple
bstract the same concept
手法
Figure 2: Proposed framework (ProbVLM) takes an existing vision-language model and introduces a probabilistic adapter
手法
nIntra-modal Alignment
• ProbVLMの出力が元の埋め込みから
離れすぎないようにする
nCross-modal Alignment
• ProbVLMの出力がテキストと画像で
互い近くなるようにする
Figure 2: Proposed framework (ProbVLM) takes an existing vision-language model and introduces a probabilistic
over the image and text encoders. These adapters predict the parameters of a parameterized distribution for a given
ding. Models are trained by minimizing an objective consisting of intra/cross-modal supervision as detailed in Sectio
SLIP [51], Flava [74] and BLIP [45], are in frozen state, i.e.,
we have V (·; ✓⇤
V ) and T (·; ✓⇤
T ), where ✓⇤
V , ✓⇤
T represents
the parameters of the pretrained frozen encoders. These en-
coders are deterministic and map an image/text to vectors
in the shared space, i.e., given a sample image xV (and sim-
that learns to estimate the parameters {ẑ, ˆ
⌫...ˆ
⇢} w
help of frozen encoders V (·; ✓⇤
V ) and T (·; ✓⇤
T
functions V (·; ⇣V ) and T (·; ⇣T ) operate on im
text embeddings respectively, but during training de
both modalities, as discussed later. We design th
実験
nデータセット
• MS-COCO [Lin+, ECCV2014]
• Flickr-30k [Plummer+, CVPR2015]
• CUB [Wah+, Calteck2011]
• Oxford-Flowers 102 [Nilsback+,
ICVGIP2008]
n評価指標
• Recall@k
• Uncertainty level [chun+,
CVPR2021]の定義
nベースライン
• PFE [Shi and Jain, ICCV2019]から
適応
n学習率
• 1e-4
n学習エポック
• 100
較正された不確実性の計測
n不確実性が増加すると性能が低下
• S:R@kと不確実性レベルのスピアマン順位相関
• R^2:不確実性レベルとR@1の回帰適合
• -SR^2:これらの積
i2t t2i
VL M Metrics COCO Flickr FLO CUB COCO Flickr FLO CUB
CLIP
ProbVLM
S # -0.99 -0.70 -0.90 -0.60 -0.30 -0.70 -0.99 -0.89
R2
" 0.93 0.71 0.62 0.67 0.35 0.50 0.99 0.70
-SR2
" 0.93 0.49 0.56 0.40 0.10 0.35 0.99 0.63
PFE*[73]
S # -0.79 -0.19 0.60 -0.60 0.79 0.30 -0.89 -0.10
R2
" 0.59 0.01 0.30 0.28 0.74 0.44 0.52 0.00
-SR2
" 0.47 0.00 -0.18 0.17 -0.59 -0.13 0.47 -0.00
PCME*[10]
S # -0.89 -0.30 -0.30 -0.60 0.30 0.09 -0.70 0.30
R2
" 0.75 0.07 0.07 0.20 0.16 0.01 0.57 0.01
-SR2
" 0.68 0.02 0.02 0.12 -0.05 -0.00 0.40 -0.00
TTDA[2]
S # -0.79 -0.30 0.00 -0.60 -0.10 -0.19 -0.89 -0.50
R2
" 0.69 0.09 0.00 0.41 0.26 0.071 0.80 0.15
-SR2
" 0.55 0.03 0.00 0.24 0.00 0.01 0.73 0.07
BLIP
ProbVLM
S # -0.87 -0.79 -0.74 -0.66 -0.43 -0.38 -0.31 -0.22
R2
" 0.92 0.83 0.68 0.61 0.52 0.48 0.45 0.38
-SR2
" 0.80 0.66 0.50 0.40 0.22 0.18 0.14 0.08
PFE*[73]
S # -0.82 -0.74 -0.63 -0.63 -0.39 -0.32 -0.28 -0.18
R2
" 0.72 0.76 0.62 0.44 0.48 0.38 0.39 0.37
-SR2
" 0.58 0.57 0.39 0.27 0.19 0.12 0.11 0.07
PCME*[10]
S # -0.76 -0.53 -0.60 -0.44 -0.28 -0.26 -0.28 -0.21
R2
" 0.81 0.56 0.60 0.53 0.50 0.34 0.44 0.36
-SR2
" 0.62 0.29 0.36 0.23 0.14 0.09 0.12 0.08
TTDA[2]
S # -0.44 -0.33 -0.74 -0.60 -0.19 -0.26 -0.21 -0.21
R2
" 0.66 0.56 0.42 0.55 0.49 0.23 0.35 0.36
-SR2
" 0.29 0.18 0.31 0.33 0.10 0.06 0.07 0.08
Table 1: Metrics to evaluate the calibration of the uncer-
曖昧さの可視化
nCUBの画像を固定した予測埋込み分布
• CUBとCOCOのすべてのサンプルの尤度を計算
nProbVLM(左)とCLIP(右)の比較
• 重なっている部分によって曖昧さが捉えられている
Stable diffusionを用いた埋め込みの可視化
n方法
• キャプションの予測分布から埋め込みベクトルをサンプリング
• Stable diffusionモデルに通して可視化
n結果
• 平均に近いサンプルほど
生成画像に意味のある
バリエーション
• 離れすぎると強い
アーチファクト
まとめ
nProbVLMの提案
• 凍結された大規模な決定的VLMの埋め込み分布を推定
• 下流タスクへの有用性を示す
• 予測した埋め込み分布を拡散モデルで解釈する実験

More Related Content

Similar to 論文紹介:ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models

[DL輪読会]Dense Captioning分野のまとめ
[DL輪読会]Dense Captioning分野のまとめ[DL輪読会]Dense Captioning分野のまとめ
[DL輪読会]Dense Captioning分野のまとめ
Deep Learning JP
 
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
SSII
 
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
atsushi061452
 
PredCNN: Predictive Learning with Cascade Convolutions
PredCNN: Predictive Learning with Cascade ConvolutionsPredCNN: Predictive Learning with Cascade Convolutions
PredCNN: Predictive Learning with Cascade Convolutions
harmonylab
 
【2016.01】(2/3)cvpaper.challenge2016
【2016.01】(2/3)cvpaper.challenge2016【2016.01】(2/3)cvpaper.challenge2016
【2016.01】(2/3)cvpaper.challenge2016
cvpaper. challenge
 
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
Tomoharu ASAMI
 
論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation
Toru Tamaki
 
文献紹介:Length-Controllable Image Captioning
文献紹介:Length-Controllable Image Captioning文献紹介:Length-Controllable Image Captioning
文献紹介:Length-Controllable Image Captioning
Toru Tamaki
 
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
Toru Tamaki
 
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
harmonylab
 
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
Deep Learning JP
 
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
Deep Learning JP
 
論文紹介:A Survey of Vision-Language Pre-Trained Models
論文紹介:A Survey of Vision-Language Pre-Trained Models論文紹介:A Survey of Vision-Language Pre-Trained Models
論文紹介:A Survey of Vision-Language Pre-Trained Models
Toru Tamaki
 

Similar to 論文紹介:ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models (13)

[DL輪読会]Dense Captioning分野のまとめ
[DL輪読会]Dense Captioning分野のまとめ[DL輪読会]Dense Captioning分野のまとめ
[DL輪読会]Dense Captioning分野のまとめ
 
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
SSII2022 [TS3] コンテンツ制作を支援する機械学習技術​〜 イラストレーションやデザインの基礎から最新鋭の技術まで 〜​
 
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
論文紹介: Offline Q-Learning on diverse Multi-Task data both scales and generalizes
 
PredCNN: Predictive Learning with Cascade Convolutions
PredCNN: Predictive Learning with Cascade ConvolutionsPredCNN: Predictive Learning with Cascade Convolutions
PredCNN: Predictive Learning with Cascade Convolutions
 
【2016.01】(2/3)cvpaper.challenge2016
【2016.01】(2/3)cvpaper.challenge2016【2016.01】(2/3)cvpaper.challenge2016
【2016.01】(2/3)cvpaper.challenge2016
 
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
テスト 【クラウドアプリケーションのためのオブジェクト指向分析設計講座 第33回】
 
論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation論文紹介:Tracking Anything with Decoupled Video Segmentation
論文紹介:Tracking Anything with Decoupled Video Segmentation
 
文献紹介:Length-Controllable Image Captioning
文献紹介:Length-Controllable Image Captioning文献紹介:Length-Controllable Image Captioning
文献紹介:Length-Controllable Image Captioning
 
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
論文紹介:A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, a...
 
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
Generating Better Search Engine Text Advertisements with Deep Reinforcement L...
 
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
[DL輪読会]Meta-Learning Probabilistic Inference for Prediction
 
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
【DL輪読会】Data-Efficient Reinforcement Learning with Self-Predictive Representat...
 
論文紹介:A Survey of Vision-Language Pre-Trained Models
論文紹介:A Survey of Vision-Language Pre-Trained Models論文紹介:A Survey of Vision-Language Pre-Trained Models
論文紹介:A Survey of Vision-Language Pre-Trained Models
 

More from Toru Tamaki

論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
Toru Tamaki
 
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Toru Tamaki
 
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
Toru Tamaki
 
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Toru Tamaki
 
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Toru Tamaki
 
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
Toru Tamaki
 
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
Toru Tamaki
 
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
Toru Tamaki
 
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
Toru Tamaki
 
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
Toru Tamaki
 
論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet
Toru Tamaki
 
論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey
Toru Tamaki
 
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
Toru Tamaki
 
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
Toru Tamaki
 
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
Toru Tamaki
 
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
Toru Tamaki
 
論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning
Toru Tamaki
 
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
Toru Tamaki
 
論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA
Toru Tamaki
 
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
Toru Tamaki
 

More from Toru Tamaki (20)

論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
論文紹介:Deep Learning-Based Human Pose Estimation: A Survey
 
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
論文紹介:Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
 
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
論文紹介:Multi-criteria Token Fusion with One-step-ahead Attention for Efficient ...
 
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
論文紹介:ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
 
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
論文紹介:ArcFace: Additive Angular Margin Loss for Deep Face Recognition
 
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
論文紹介:Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
 
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
論文紹介:Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Groun...
 
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
論文紹介:Selective Structured State-Spaces for Long-Form Video Understanding
 
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
論文紹介:Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Gene...
 
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
論文紹介:Content-Aware Token Sharing for Efficient Semantic Segmentation With Vis...
 
論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet論文紹介:Automated Classification of Model Errors on ImageNet
論文紹介:Automated Classification of Model Errors on ImageNet
 
論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey論文紹介:Semantic segmentation using Vision Transformers: A survey
論文紹介:Semantic segmentation using Vision Transformers: A survey
 
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
論文紹介:MOSE: A New Dataset for Video Object Segmentation in Complex Scenes
 
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
論文紹介:MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Acti...
 
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
論文紹介:Real-Time Evaluation in Online Continual Learning: A New Hope
 
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
論文紹介:PointNet: Deep Learning on Point Sets for 3D Classification and Segmenta...
 
論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning論文紹介:Multitask Vision-Language Prompt Tuning
論文紹介:Multitask Vision-Language Prompt Tuning
 
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies論文紹介:MovieCLIP: Visual Scene Recognition in Movies
論文紹介:MovieCLIP: Visual Scene Recognition in Movies
 
論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA論文紹介:Discovering Universal Geometry in Embeddings with ICA
論文紹介:Discovering Universal Geometry in Embeddings with ICA
 
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
論文紹介:Efficient Video Action Detection with Token Dropout and Context Refinement
 

Recently uploaded

ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMMハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
osamut
 
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
嶋 是一 (Yoshikazu SHIMA)
 
Generating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language ModelsGenerating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language Models
harmonylab
 
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライドHumanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
tazaki1
 
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
Osaka University
 
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobodyロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
azuma satoshi
 

Recently uploaded (6)

ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMMハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
ハイブリッドクラウド研究会_Hyper-VとSystem Center Virtual Machine Manager セッションMM
 
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
「進化するアプリ イマ×ミライ ~生成AIアプリへ続く道と新時代のアプリとは~」Interop24Tokyo APPS JAPAN B1-01講演
 
Generating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language ModelsGenerating Automatic Feedback on UI Mockups with Large Language Models
Generating Automatic Feedback on UI Mockups with Large Language Models
 
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライドHumanoid Virtual Athletics Challenge2024 技術講習会 スライド
Humanoid Virtual Athletics Challenge2024 技術講習会 スライド
 
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
生成AIがもたらすコンテンツ経済圏の新時代  The New Era of Content Economy Brought by Generative AI
 
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobodyロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
ロジックから状態を分離する技術/設計ナイト2024 by わいとん @ytnobody
 

論文紹介:ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models