2. Problem statement
Given two arbitrary videos, calculate their similarity based on their visual content.
[Figure: a query video shown with example Complementary Scene, Duplicate Scene and Incident Scene videos]
Application scenario
• Video Retrieval
3. Video-level methods
Z. Gao et al. “ER3: A unified framework for event retrieval, recognition and recounting”. CVPR, 2017.
G. Kordopatis-Zilos et al. “Near-duplicate video retrieval with deep metric learning”. ICCVW, 2017.
Video similarity calculation disregards the spatio-temporal information of videos
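A minimal sketch of why video-level aggregation loses temporal information. It assumes each frame is already described by one feature vector and that the video descriptor is the L2-normalized mean of those vectors. This generic global-pooling baseline is chosen for illustration; it is not the exact ER3 or deep metric learning pipeline cited above. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def video_descriptor(frames: List[Vector]) -> Vector:
    """Average all frame vectors into one global video vector.

    The mean does not depend on frame order, so the temporal order is lost.
    """
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    return l2_normalize(mean)


def video_level_similarity(query: List[Vector], target: List[Vector]) -> float:
    """Cosine similarity of the two global video descriptors."""
    q = video_descriptor(query)
    t = video_descriptor(target)
    return sum(a * b for a, b in zip(q, t))


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # The same frames played backwards: different in time, identical descriptor.
    print(video_level_similarity(video, list(reversed(video))))
```

Reversing the frame order leaves the descriptor unchanged, so the similarity is (up to rounding) maximal even though the two videos unfold differently in time.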
4. Frame-level methods
Y. Jiang and J. Wang. “Partial copy detection in videos: A benchmark and an evaluation of popular methods”. Trans. on Big Data, 2016.
L. Baraldi et al. “LAMV: Learning to align and match videos with kernelized temporal layers”. CVPR, 2018.
Frame-to-frame similarity calculation disregards the spatial structure of frames
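A minimal sketch of the frame-level setting. It assumes each frame arrives as a list of region vectors that is sum-pooled into a single global frame vector; this pooling is an illustrative choice, and the cited methods use their own frame features. Once regions are pooled, frames whose regions are spatially rearranged become indistinguishable. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one vector per spatial region of the frame


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def frame_descriptor(regions: Frame) -> Vector:
    """Sum-pool region vectors into one frame vector; region positions are discarded."""
    dim = len(regions[0])
    pooled = [sum(region[d] for region in regions) for d in range(dim)]
    return l2_normalize(pooled)


def frame_similarity_matrix(query: List[Frame], target: List[Frame]) -> List[List[float]]:
    """S[i][j] = cosine similarity between query frame i and target frame j."""
    q = [frame_descriptor(f) for f in query]
    t = [frame_descriptor(f) for f in target]
    return [[sum(a * b for a, b in zip(qi, tj)) for tj in t] for qi in q]


if __name__ == "__main__":
    frame = [[1.0, 0.0], [0.0, 1.0]]    # region A on the left, region B on the right
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # same regions, positions swapped
    print(frame_similarity_matrix([frame], [swapped]))
```

Both frames pool to the same vector, so their frame-to-frame similarity is (up to rounding) 1 even though their spatial layouts differ.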
5. Motivation
Fine-grained similarity calculation
• Learn a video similarity function that respects:
  • Spatial structure of video frames (intra-frame relations)
  • Temporal structure of videos (inter-frame relations)
8. Video-to-video similarity
Video Similarity Learning network
• 4-layer CNN
• Captures the temporal structures in the similarity matrix with convolutional filters
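To illustrate how a convolutional filter can pick up temporal structure in a frame-to-frame similarity matrix, the sketch below slides a single hand-set 3x3 kernel over two toy matrices. This is not ViSiL's learned 4-layer CNN. The kernel values and function names are hypothetical, chosen for clarity. Both matrices contain the same four perfect frame matches, but only in the first do the matches form a diagonal run, which is the pattern temporally aligned videos produce.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(sim: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `sim` without padding and return the response map.

    Like most deep-learning conv layers, this is a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(sim), len(sim[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += sim[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


# Hand-set (not learned) kernel: rewards high similarity along the main diagonal
# and penalizes off-diagonal matches inside the 3x3 window.
DIAGONAL_KERNEL = [
    [1.0, -0.5, -0.5],
    [-0.5, 1.0, -0.5],
    [-0.5, -0.5, 1.0],
]

if __name__ == "__main__":
    # Frame i of the query matches frame i of the target: aligned in time.
    aligned = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    # Same number of matches, but the target plays in reverse order.
    reversed_order = [[1.0 if i + j == 3 else 0.0 for j in range(4)] for i in range(4)]
    for name, sim in [("aligned", aligned), ("reversed", reversed_order)]:
        response = conv2d_valid(sim, DIAGONAL_KERNEL)
        print(name, max(max(r) for r in response))
```

A score that only counts individual high-similarity cells cannot tell the two matrices apart. The diagonal filter responds strongly only to the aligned one.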
Chamfer Similarity
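Chamfer Similarity reduces an N x M similarity matrix S to a single score. For each row i (a query item), it takes the best-matching column j (a target item) and averages these maxima: CS(S) = (1/N) Σ_i max_j S(i, j). The sketch below is a plain-Python version of that formula; the function name is hypothetical. Applying it to the CNN output to obtain the video-to-video score follows the slide's layout and is not a complete description of ViSiL.

```python
from typing import List


def chamfer_similarity(sim: List[List[float]]) -> float:
    """Average, over rows (query items), of each row's best match (target items)."""
    if not sim or not sim[0]:
        raise ValueError("similarity matrix must be non-empty")
    return sum(max(row) for row in sim) / len(sim)


if __name__ == "__main__":
    S = [
        [0.9, 0.1, 0.2],
        [0.3, 0.8, 0.1],
    ]
    transposed = [list(col) for col in zip(*S)]
    print(chamfer_similarity(S))           # averages the row maxima 0.9 and 0.8
    print(chamfer_similarity(transposed))  # averages the column maxima 0.9, 0.8, 0.2
```

Because only row maxima are averaged, the measure is asymmetric: transposing S (swapping query and target) generally changes the score.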
10. Experimental results
• Near-Duplicate Video Retrieval (CC_WEB_VIDEO)
• Fine-grained Incident Video Retrieval (FIVR-200K)
• Action Video Retrieval (ActivityNet)
• Event-based Video Retrieval (EVVE)
11. Visual examples
[Figure: for each query/database video pair, the frame-to-frame similarity matrix, the ViSiL output and the resulting video-to-video similarity. Example pairs: near-duplicate videos, same event videos, same action videos; similarity scores shown: 0.8, 0.5, 0.7]
12. Thank you!
Poster ID: No. 39
Code & models:
https://github.com/MKLab-ITI/visil
With the support of:
Get in touch:
Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr / @g_kordo
No. EP/R026424/1, No. 825297