
ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning


Oral presentation at IEEE International Conference on Computer Vision (ICCV) 2019, Seoul, Korea.

  1. ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning. Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
  2. Problem statement: Given two arbitrary videos, calculate their similarity based on their visual content. Example videos: a query video alongside a Complementary Scene Video, a Duplicate Scene Video, and an Incident Scene Video. Application scenario: video retrieval.
  3. Video-level methods: video similarity calculation disregards the spatio-temporal information of videos. Z. Gao et al., “ER3: A unified framework for event retrieval, recognition and recounting”, CVPR, 2017. G. Kordopatis-Zilos et al., “Near-duplicate video retrieval with deep metric learning”, ICCVW, 2017.
  4. Frame-level methods: frame-to-frame similarity calculation disregards the spatial structure of frames. Y. Jiang and J. Wang, “Partial copy detection in videos: A benchmark and an evaluation of popular methods”, Trans. on Big Data, 2016. L. Baraldi et al., “LAMV: Learning to align and match videos with kernelized temporal layers”, CVPR, 2018.
  5. Motivation: fine-grained similarity calculation. Learn a video similarity function that respects both the spatial structure of video frames (intra-frame relations) and the temporal structure of videos (inter-frame relations).
  6. Frame-to-frame similarity: computed with Chamfer Similarity between the region-level feature vectors of the two frames.
  7. Frame-to-frame similarity: a baseline frame-to-frame similarity matrix compared with the ViSiL frame-to-frame similarity matrix.
  8. Video-to-video similarity: the Video Similarity Learning network, a 4-layer CNN, captures the temporal structures in the frame-to-frame similarity matrix with its convolutional filters; Chamfer Similarity over the network's output gives the video-to-video similarity.
  9. Training ViSiL
  10. Experimental results: Near-Duplicate Video Retrieval (CC_WEB_VIDEO), Fine-grained Incident Video Retrieval (FIVR-200K), Action Video Retrieval (ActivityNet), and Event-based Video Retrieval (EVVE).
  11. Visual examples: each example shows a query video, a database video, their frame-to-frame similarity matrix, the ViSiL output, and the resulting video-to-video similarity (scores shown: 0.8, 0.5, 0.7), for near-duplicate videos, same-event videos, and same-action videos.
  12. Thank you! Poster ID: No. 39. Code & models are available. With the support of grants No. EP/R026424/1 and No. 825297. Get in touch: Giorgos Kordopatis-Zilos (@g_kordo).
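Slides 3 and 4 argue that video-level methods discard spatio-temporal information and frame-level methods discard the spatial structure of frames. The NumPy sketch below makes both blind spots concrete. It is a simplified illustration, not the pipeline of any cited work: each frame is assumed to be a set of region feature vectors (random here), the video-level baseline is assumed to be mean pooling plus cosine similarity, and the frame-level baseline is assumed to pool the regions of each frame into one vector. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def video_level_similarity(q_frames, r_frames):
    """Hypothetical video-level baseline: mean-pool the (T, D) frame vectors of
    each video into a single descriptor, then return their cosine similarity."""
    q = l2_normalize(q_frames.mean(axis=0))
    r = l2_normalize(r_frames.mean(axis=0))
    return float(q @ r)

def frame_level_matrix(q_regions, r_regions):
    """Hypothetical frame-level baseline: mean-pool the (N, D) region vectors of
    each frame into one global vector, then return the (T_q, T_r) matrix of
    pairwise cosine similarities between frames."""
    q = l2_normalize(q_regions.mean(axis=1))
    r = l2_normalize(r_regions.mean(axis=1))
    return q @ r.T

# Random stand-in features: T frames, N regions per frame, D dimensions.
T, N, D = 8, 9, 16
query = rng.standard_normal((T, N, D))
ref = rng.standard_normal((T, N, D))

# Video-level: reversing the reference video's frame order does not change the score.
frames_q, frames_r = query.mean(axis=1), ref.mean(axis=1)
order_blind = np.isclose(video_level_similarity(frames_q, frames_r),
                         video_level_similarity(frames_q, frames_r[::-1]))

# Frame-level: shuffling the regions inside every reference frame does not change the matrix.
shuffled = ref[:, rng.permutation(N), :]
layout_blind = np.allclose(frame_level_matrix(query, ref),
                           frame_level_matrix(query, shuffled))

print(order_blind, layout_blind)  # True True
```

Reversing the frames breaks a video's temporal order, and shuffling regions scrambles each frame's spatial layout, yet neither baseline's output changes; this is the kind of information that a fine-grained spatio-temporal similarity aims to keep.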
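Slides 6 and 7 compute frame-to-frame similarity with Chamfer Similarity. The sketch below assumes each frame is already represented by a set of L2-normalized region vectors (the paper's feature extraction is not reproduced) and implements Chamfer Similarity as the average, over the regions of one frame, of each region's best dot-product match in the other frame. The helper names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def chamfer_similarity(a, b):
    """Chamfer Similarity of region set `a` (N, D) to region set `b` (M, D):
    average over the regions of `a` of their maximum dot product with `b`.
    The measure is asymmetric in general."""
    sim = a @ b.T                        # (N, M) region-to-region similarities
    return float(sim.max(axis=1).mean())

def frame_to_frame_matrix(q_regions, r_regions):
    """Entry (i, j) is the Chamfer Similarity between the regions of query frame i
    and reference frame j. Shapes: (T_q, N, D) and (T_r, M, D) -> (T_q, T_r)."""
    # sim[i, j, n, m] = <q_regions[i, n], r_regions[j, m]>
    sim = np.einsum("ind,jmd->ijnm", q_regions, r_regions)
    return sim.max(axis=3).mean(axis=2)  # best match over m, then average over n

# Toy frames: `a` has two regions, `b` contains only the first of them.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0]])
print(chamfer_similarity(a, b), chamfer_similarity(b, a))  # 0.5 1.0

# Full matrix on random region features; each entry agrees with the pairwise helper.
rng = np.random.default_rng(0)
q = l2_normalize(rng.standard_normal((4, 9, 16)))
r = l2_normalize(rng.standard_normal((5, 9, 16)))
S = frame_to_frame_matrix(q, r)
print(S.shape, np.isclose(S[2, 3], chamfer_similarity(q[2], r[3])))  # (4, 5) True
```

In this sketch each region only needs its best match somewhere in the other frame, so the score still depends on region-level content rather than on a single pooled vector per frame, while tolerating regions that appear at different positions.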
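Slide 8 describes the video-level step: a 4-layer CNN processes the frame-to-frame similarity matrix, and Chamfer Similarity over the result gives the video-to-video score. The sketch below is a heavily simplified stand-in, not the trained network: a single hand-crafted 3x3 diagonal filter (hypothetical; the real filters are learned) replaces the CNN, and the output is clipped to [-1, 1] in the spirit of a hard tanh, which is an assumption of this sketch. It illustrates why convolution helps: two matrices with identical row maxima get the same plain Chamfer score, but only the temporally aligned one stays high after the diagonal filter.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2D cross-correlation with zero padding; output has x's shape (odd-sized k)."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

def video_to_video_similarity(f2f, kernel):
    """Simplified video-level step: filter the (T_q, T_r) frame-to-frame matrix
    (stand-in for the learned CNN), clip to [-1, 1] (assumed hard-tanh-like
    activation), then apply Chamfer Similarity: mean over rows of the row maximum."""
    refined = np.clip(conv2d_same(f2f, kernel), -1.0, 1.0)
    return float(refined.max(axis=1).mean())

# Hypothetical filter that averages along the diagonal, i.e. rewards runs of
# consecutive query frames matching consecutive reference frames.
diagonal_kernel = np.eye(3) / 3.0

# Frame i matches frame i: temporally consistent, like a near-duplicate.
aligned = np.eye(6)
# Every frame still has one perfect match, but no two matches lie on a shared diagonal.
scrambled = np.eye(6)[[2, 5, 0, 3, 1, 4]]

# Plain Chamfer Similarity on the raw matrices cannot tell them apart.
print(aligned.max(axis=1).mean(), scrambled.max(axis=1).mean())  # 1.0 1.0

# The diagonal filter keeps the aligned case high and penalizes the scrambled one.
print(round(video_to_video_similarity(aligned, diagonal_kernel), 3))    # 0.889
print(round(video_to_video_similarity(scrambled, diagonal_kernel), 3))  # 0.333
```

In ViSiL the convolutional filters are learned during training, so the network can pick up such temporal patterns from data instead of relying on a fixed diagonal kernel as this toy example does.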