2. Introduction
• This section introduces two papers aimed at improving dense captioning performance:
• L. Yang et al. Dense Captioning with Joint Inference and Visual Context. CVPR, 2017.
• G. Yin et al. Context and Attribute Grounded Dense Captioning. CVPR, 2019.
• Both are methods that generate a caption for a target region while also taking the surrounding context into account (see the sketch below).
[Yang+ CVPR’17]
[Yin+ CVPR’19]
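The idea the two papers share, fusing the feature of a target region with a feature of its surrounding context before decoding a caption, can be illustrated with a minimal sketch in PyTorch. This is not either paper's actual architecture: the module, the use of a pooled whole-image feature as the context, and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class ContextAwareCaptioner(nn.Module):
    """Minimal sketch: fuse region + context features, then decode a caption."""
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Project the concatenated region and context features into a joint space.
        self.fuse = nn.Linear(feat_dim * 2, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feat, context_feat, captions):
        # region_feat:  (B, feat_dim) pooled CNN feature of the target box
        # context_feat: (B, feat_dim) pooled feature of the whole image,
        #               standing in for the papers' richer context modeling
        # captions:     (B, T) token ids of the caption being decoded
        fused = torch.tanh(self.fuse(torch.cat([region_feat, context_feat], dim=1)))
        # Use the fused feature as the initial hidden state of the decoder.
        h0 = fused.unsqueeze(0)
        c0 = torch.zeros_like(h0)
        hidden, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(hidden)  # (B, T, vocab_size) logits

# Usage with dummy tensors:
model = ContextAwareCaptioner()
logits = model(torch.randn(2, 2048), torch.randn(2, 2048),
               torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])

Initializing the decoder state with the fused feature is only one simple fusion choice; the two papers study richer ways of grounding context (and, in Yin et al., attributes) into the caption decoder.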
References
• J. Johnson et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. CVPR, 2016.
• O. Vinyals et al. Show and Tell: A Neural Image Caption Generator. CVPR, 2015.
• K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.
• S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS, 2015.
• P. Anderson et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. CVPR, 2018.
• T. Yao et al. Exploring Visual Relationship for Image Captioning. ECCV, 2018.
• R. Krishna et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. 2016.
• L. Yang et al. Dense Captioning with Joint Inference and Visual Context. CVPR, 2017.
• G. Yin et al. Context and Attribute Grounded Dense Captioning. CVPR, 2019.
• D-J Kim et al. Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning. CVPR, 2019.