Dense-captioning events in videos


  1. Dense-Captioning Events in Videos
  2. Dense-Captioning
  3. Highlight
     • Task: dense-captioning events.
     • Dataset: ActivityNet Captions.
     • Events range across multiple time scales and can even overlap.
     • The proposal module extends action-proposal generation to multi-scale event detection, and processes each video in a single forward pass to detect events as they occur.
     • Events in a given video are usually related to one another.
     • A captioning module uses the context from all events found by the proposal module to generate each sentence.
  4. DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  5. DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  6. Method
     • V. Escorcia, F. C. Heilbron, J. C. Niebles, and B. Ghanem. DAPs: Deep action proposals for action understanding. ECCV, 2016.
     • J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning.
     • A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese. Social LSTM: Human trajectory prediction in crowded spaces.
     • Object-centric in images; action-centric in videos.
  7. Performance
  8. Discussion: Jointly Localizing and Describing Events for Dense Video Captioning
  9. Discussion: Joint Event Detection and Description in Continuous Video Streams

Editor's Notes

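  • 1. Given a video, generate a feature sequence. In the experiments, the video is processed in units of 16 frames, each of which is fed to C3D to extract features.

    2. Proposal module. The proposal module is a lightly modified version of DAPs: it outputs K proposals at every time step. It uses an LSTM that takes the C3D feature sequence above as input, sampled at different strides, strides = {1, 2, 4, 8}.
    The generated proposals may overlap in time. Each time an event is detected, the current hidden state is used as the video description.

    3. Captioning module. It uses the context of neighboring events to generate each event caption, and is built on an LSTM.
    All events are divided into two buckets relative to the current event: past events and future events. Concurrent events are assigned to past events or future events according to their end times.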
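
Steps 1 and 2 of the notes can be illustrated with a small Python sketch of the data flow only: frames are grouped into 16-frame units, and the resulting feature sequence is subsampled at strides {1, 2, 4, 8}. The function names `split_into_clips` and `strided_views` are hypothetical, the clips are assumed to be non-overlapping, and a trailing partial clip is assumed to be dropped; the notes specify none of these details. C3D and the LSTM are not modeled here.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

CLIP_LEN = 16            # frames per C3D input unit (from the notes)
STRIDES = (1, 2, 4, 8)   # sampling strides (from the notes)


def split_into_clips(frames: Sequence[T], clip_len: int = CLIP_LEN) -> List[Sequence[T]]:
    """Group frames into consecutive clips of clip_len frames.

    Assumption: clips do not overlap and a trailing partial clip is dropped.
    """
    n_clips = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def strided_views(features: Sequence[T], strides: Sequence[int] = STRIDES) -> Dict[int, List[T]]:
    """Return one subsampled copy of the feature sequence per stride."""
    return {s: list(features[::s]) for s in strides}


# Stand-in data: 100 frame indices instead of real video frames.
frames = list(range(100))
clips = split_into_clips(frames)

# Placeholder "features": one value per clip (here, the clip index).
features = list(range(len(clips)))
views = strided_views(features)
```

Each entry of `views` would be fed to the proposal LSTM as a separate input sequence, so that larger strides cover longer events with the same number of steps.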
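
The past/future bucketing in step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Event` class and `split_context` function are hypothetical names, and sending an event whose end time equals the current event's end time to the future bucket is an assumption, since the notes do not cover ties.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    start: float  # start time in seconds
    end: float    # end time in seconds


def split_context(events: List[Event], current_idx: int) -> Tuple[List[Event], List[Event]]:
    """Split all other events into (past, future) relative to the current one.

    Events are compared by end time, which also resolves concurrent
    (overlapping) events. Assumption: equal end times go to the future bucket.
    """
    current = events[current_idx]
    past: List[Event] = []
    future: List[Event] = []
    for i, ev in enumerate(events):
        if i == current_idx:
            continue  # the current event is not part of its own context
        if ev.end < current.end:
            past.append(ev)
        else:
            future.append(ev)
    return past, future


events = [Event(0, 10), Event(5, 20), Event(8, 15), Event(25, 30)]
past, future = split_context(events, current_idx=2)
```

In this example the event spanning 5 to 20 overlaps the current event (8 to 15) but ends later, so it lands in the future bucket, while the event spanning 0 to 10 ends earlier and lands in the past bucket.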
