This document proposes ViSiL, a method for fine-grained video similarity learning that respects both the spatial structure of video frames and the temporal structure of videos. ViSiL learns a video similarity function using a 4-layer CNN that captures temporal structures in a frame-to-frame similarity matrix. Experimental results show ViSiL can accurately retrieve near-duplicate, same incident, same action, and same event videos from databases.
2. Problem statement
Given two arbitrary videos, calculate their similarity based on their visual content.
[Figure: a Query Video shown next to three related videos: a Duplicate Scene Video, a Complementary Scene Video, and an Incident Scene Video.]
Application scenario
• Video Retrieval
3. Video-level methods
Z. Gao et al. “ER3: A unified framework for event retrieval, recognition and recounting”. CVPR, 2017.
G. Kordopatis-Zilos et al. “Near-duplicate video retrieval with deep metric learning”. ICCVW, 2017.
Video similarity calculation disregards the spatio-temporal information of videos
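The limitation above can be made concrete with a minimal sketch. It assumes each video is a list of per-frame descriptor vectors and that the video-level representation is simply their mean; this is one common aggregation choice used here for illustration, not necessarily what the cited methods do. The function names and toy descriptors are hypothetical.

```python
import math

def video_descriptor(frames):
    """Average per-frame descriptors into one vector (assumed aggregation)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def video_level_similarity(x, y):
    """Compare two videos through their single aggregated descriptors."""
    return cosine(video_descriptor(x), video_descriptor(y))

# Toy descriptors: the second video is the first one played backwards.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reversed_video = list(reversed(video))

sim = video_level_similarity(video, reversed_video)
```

Because averaging ignores frame order, `sim` comes out as (numerically) 1 even though the two frame sequences differ, which is the kind of temporal information a video-level representation cannot see.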
4. Frame-level methods
Y. Jiang and J. Wang. “Partial copy detection in videos: A benchmark and an evaluation of popular methods”. Trans. on Big Data, 2016.
L. Baraldi et al. “LAMV: Learning to align and match videos with kernelized temporal layers”. CVPR, 2018.
Frame-to-frame similarity calculation disregards the spatial structure of frames
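As a rough illustration of a frame-level comparison, the sketch below builds a frame-to-frame similarity matrix from one global descriptor per frame. It assumes cosine similarity between L2-normalized descriptors; the helper names and toy vectors are hypothetical and do not reproduce the cited methods.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def frame_similarity_matrix(query_frames, db_frames):
    """Entry [i][j] is the cosine similarity of query frame i and database frame j."""
    q = [l2_normalize(f) for f in query_frames]
    d = [l2_normalize(f) for f in db_frames]
    return [[sum(a * b for a, b in zip(qf, df)) for df in d] for qf in q]

# Toy data: 2 query frames, 3 database frames, 2-dimensional descriptors.
query = [[1.0, 0.0], [0.0, 1.0]]
database = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]

matrix = frame_similarity_matrix(query, database)
```

Each frame is reduced to a single vector before comparison, so where the matching content sits inside the frame has no influence on `matrix`.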
5. Motivation
Fine-grained similarity calculation
• Learn a video similarity function that respects:
  • Spatial structure of video frames (intra-frame relations)
  • Temporal structure of videos (inter-frame relations)
8. Video-to-video similarity
Video Similarity Learning network
• 4-layer CNN
• Captures the temporal structures in the frame-to-frame similarity matrix with convolutional filters
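To see why convolutional filters can pick up temporal structure, consider a single hand-written 3x3 diagonal filter slid over a similarity matrix. This is only an illustration: the network in the source learns its filters, and the fixed kernel, the `conv2d_valid` helper, and the toy matrices below are assumptions.

```python
def conv2d_valid(matrix, kernel):
    """Plain 2D cross-correlation without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(
                matrix[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hand-picked filter (assumption): rewards similarity along the diagonal,
# penalizes similarity elsewhere in the 3x3 window.
DIAGONAL_KERNEL = [
    [1.0, -0.5, -0.5],
    [-0.5, 1.0, -0.5],
    [-0.5, -0.5, 1.0],
]

# Frames match in the same temporal order: ones on the diagonal.
aligned = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Every query frame matches the same single database frame: no temporal order.
scattered = [[1.0 if j == 0 else 0.0 for j in range(4)] for i in range(4)]

aligned_response = max(max(row) for row in conv2d_valid(aligned, DIAGONAL_KERNEL))
scattered_response = max(max(row) for row in conv2d_valid(scattered, DIAGONAL_KERNEL))
```

Both toy matrices contain four high-similarity entries, but only the temporally aligned one produces a strong filter response, which is the kind of pattern a learned filter can exploit.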
Chamfer Similarity
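Chamfer Similarity, as commonly defined, matches each row of a similarity matrix to its best column and averages those maxima. A minimal sketch, assuming the matrix is stored as a list of rows (query frames) by columns (database frames); the function name and toy values are illustrative, and any processing applied to the network output before this step is not covered here.

```python
def chamfer_similarity(sim_matrix):
    """Mean over rows of the maximum similarity found in each row."""
    if not sim_matrix or not sim_matrix[0]:
        raise ValueError("similarity matrix must be non-empty")
    return sum(max(row) for row in sim_matrix) / len(sim_matrix)

# Toy matrix: 3 query frames (rows) against 3 database frames (columns).
sim_matrix = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]

score = chamfer_similarity(sim_matrix)  # roughly 0.8 for this toy matrix
```

The measure is not symmetric in general: swapping the roles of query and database video corresponds to transposing the matrix, which can change the result.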
10. Experimental results
• Near-Duplicate Video Retrieval (CC_WEB_VIDEO)
• Fine-grained Incident Video Retrieval (FIVR-200K)
• Action Video Retrieval (ActivityNet)
• Event-based Video Retrieval (EVVE)
11. Visual examples
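[Figure: for each pair of query video and database video, the frame-to-frame similarity matrix, the ViSiL output, and the resulting video-to-video similarity. Examples cover near-duplicate videos, same event videos, and same action videos; the similarity values shown are 0.8, 0.5, and 0.7.]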
12. Thank you!
Poster ID: No. 39
Code & models:
https://github.com/MKLab-ITI/visil
Get in touch:
Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr / @g_kordo
With the support of: No. EP/R026424/1, No. 825297