5. Aim & solution
Our Aim
• Development of a system for reverse video search
• Very high retrieval performance
• Fast retrieval speed
Our Solution
• Two main components
  • Video indexing & filtering
    • Video-level method
    • Efficient video indexing and filtering
  • Video similarity calculation
    • Frame-level method
    • Video similarity learning
6. Video indexing & filtering (1/2)
Layer Bag-of-Words (LBoW)
• Extract L visual words from each video frame, one per intermediate CNN layer
• Index videos based on the extracted words
Kordopatis-Zilos et al. “Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers”. MMM, 2017.
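A minimal sketch of LBoW indexing, assuming each frame is already described by one feature vector per intermediate CNN layer and that one codebook per layer has been learned offline (for example with k-means). The function names, the toy codebooks and the `(layer, codeword)` word encoding are hypothetical and do not come from the authors' code. The idea it shows is that every frame yields L words and an inverted index lets a query touch only the videos that share at least one word with it.

```python
from collections import defaultdict

# Hypothetical sketch of LBoW indexing; codebooks are assumed to be given
# (one per CNN layer, e.g. learned offline with k-means).


def nearest_codeword(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centre in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centre))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def frame_to_words(layer_features, codebooks):
    """One visual word per layer, encoded as (layer, codeword) so that
    words from different layers never collide."""
    return [(layer, nearest_codeword(vec, codebooks[layer]))
            for layer, vec in enumerate(layer_features)]


def build_inverted_index(videos, codebooks):
    """videos: {video_id: [frame, ...]}, frame = [vec_layer_0, ..., vec_layer_L-1].
    Returns {word: set(video_ids)} and the bag of words of each video."""
    index = defaultdict(set)
    bags = {}
    for vid, frames in videos.items():
        words = [w for frame in frames for w in frame_to_words(frame, codebooks)]
        bags[vid] = words
        for w in words:
            index[w].add(vid)
    return index, bags


def candidates(query_words, index):
    """Videos sharing at least one visual word with the query."""
    return set().union(*(index.get(w, set()) for w in query_words))


# Toy example: L = 2 layers, 2 codewords per layer, one frame per video
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
videos = {
    "v1": [[[0.1, 0.0], [0.0, 0.9]]],
    "v2": [[[0.9, 1.0], [0.8, 0.1]]],
}
index, bags = build_inverted_index(videos, codebooks)
print(bags["v1"])  # [(0, 0), (1, 0)]
```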
7. Video indexing & filtering (2/2)
Similarity calculation
• Video-level representations with tf-idf weighting
• Cosine similarity
Video filtering
• Rank videos based on their similarity
• Select the top N videos (set to N = 5,000)
• Select videos with similarity greater than t
Kordopatis-Zilos et al. “Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers”. MMM, 2017.
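The similarity and filtering steps can be sketched as follows, assuming each video is already reduced to a bag of visual words (as LBoW produces). The tf-idf variant (raw term counts, idf = log(N / df)) and all function names are assumptions for illustration, not the exact weighting used in the paper.

```python
import math
from collections import Counter

# Hypothetical sketch: video-level tf-idf + cosine ranking and filtering.
# Raw term counts and idf = log(N / df) are assumed; the paper's exact
# weighting variant may differ.


def idf_weights(bags):
    n_videos = len(bags)
    df = Counter(w for words in bags.values() for w in set(words))
    return {w: math.log(n_videos / count) for w, count in df.items()}


def tfidf_vector(words, idf):
    """Sparse vector {word: tf * idf}; words absent from the database get 0."""
    return {w: c * idf.get(w, 0.0) for w, c in Counter(words).items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_database(query_words, bags):
    """Return [(video_id, similarity), ...] sorted by decreasing similarity."""
    idf = idf_weights(bags)
    q = tfidf_vector(query_words, idf)
    scores = {vid: cosine(q, tfidf_vector(words, idf)) for vid, words in bags.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_n(ranked, n=5000):
    """Keep the N most similar videos (N = 5,000 on the slide)."""
    return ranked[:n]


def above_threshold(ranked, t):
    """Keep videos whose similarity is greater than t."""
    return [(vid, s) for vid, s in ranked if s > t]


bags = {"a": ["x", "y"], "b": ["y", "z"], "c": ["z", "w"]}
ranked = rank_database(["x", "y"], bags)
print([vid for vid, _ in ranked])  # ['a', 'b', 'c']
```

The slide lists both a top-N cut and a threshold t without stating whether they are combined, so the sketch keeps them as separate helpers.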
8. Video similarity calculation (1/2)
Video Similarity Learning (ViSiL)
• Learn a video similarity function that considers:
  • Spatial structure of video frames (intra-frame relations)
  • Temporal structure of videos (inter-frame relations)
Kordopatis-Zilos et al. “ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning”. ICCV, 2019.
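To illustrate how intra-frame (spatial) structure can enter the comparison, the sketch below builds a frame-to-frame similarity matrix with Chamfer similarity, the aggregation ViSiL uses: each frame is assumed to be a set of region vectors, and two frames are compared by averaging, over the query frame's regions, the best dot product with any region of the database frame. ViSiL's learned components (such as feature whitening and attention) are omitted, and all function names are hypothetical.

```python
import math

# Hypothetical sketch of Chamfer-based frame-to-frame similarity in the
# spirit of ViSiL; learned parts (whitening, attention, network) omitted.


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def chamfer(sim_rows):
    """Chamfer similarity: average over rows of each row's maximum."""
    return sum(max(row) for row in sim_rows) / len(sim_rows)


def frame_similarity(query_regions, db_regions):
    """Compare two frames through their region vectors (intra-frame structure)."""
    q = [l2_normalize(r) for r in query_regions]
    d = [l2_normalize(r) for r in db_regions]
    return chamfer([[dot(a, b) for b in d] for a in q])


def frame_to_frame_matrix(query_video, db_video):
    """Videos are lists of frames, frames are lists of region vectors.
    Returns an X x Y matrix (X query frames, Y database frames)."""
    return [[frame_similarity(qf, df) for df in db_video] for qf in query_video]


# Chamfer similarity is asymmetric:
print(frame_similarity([[1, 0], [0, 1]], [[1, 0], [1, 0]]))  # 0.5
print(frame_similarity([[1, 0], [1, 0]], [[1, 0], [0, 1]]))  # 1.0
```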
9. Video similarity calculation (2/2)
Video Similarity Learning network
• 4-layer CNN
• Captures temporal structure in the frame-to-frame similarity matrix
Kordopatis-Zilos et al. “ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning”. ICCV, 2019.
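The learned network cannot be reconstructed from the slide, but the intuition that a CNN over the similarity matrix picks up temporal structure can be shown with one hand-set filter: a 3×3 diagonal kernel responds strongly to the diagonal runs produced by temporally aligned frame sequences and weakly to isolated high similarities. The kernel, the clipping to [-1, 1] (in the spirit of a hard tanh) and the final Chamfer aggregation are assumptions for illustration; the actual ViSiL network learns its filters across 4 layers.

```python
# Illustration only: one hand-set diagonal filter stands in for the learned
# 4-layer CNN; the clipping and Chamfer aggregation are assumptions.


def conv2d_same(mat, kernel):
    """Zero-padded 2D cross-correlation; output has the same size as mat.
    Assumes a square kernel with odd side length."""
    rows, cols, k = len(mat), len(mat[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[di][dj] * mat[r][c]
            out[i][j] = acc
    return out


def hard_tanh(x):
    return max(-1.0, min(1.0, x))


def chamfer(mat):
    return sum(max(row) for row in mat) / len(mat)


# Averages three consecutive cells along the main-diagonal direction
DIAGONAL_KERNEL = [[1 / 3, 0, 0], [0, 1 / 3, 0], [0, 0, 1 / 3]]


def refined_video_similarity(sim_matrix, kernel=DIAGONAL_KERNEL):
    filtered = conv2d_same(sim_matrix, kernel)
    clipped = [[hard_tanh(x) for x in row] for row in filtered]
    return chamfer(clipped)


aligned = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
scattered = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
# Both matrices have a 1 in every row, so plain Chamfer scores them equally,
# while the filtered version favours the temporally consistent diagonal.
print(chamfer(aligned), chamfer(scattered))  # 1.0 1.0
print(refined_video_similarity(aligned), refined_video_similarity(scattered))
```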
10. Experimental setup
FIVR-200K dataset
• 225,960 videos from 4,687 news events
• 100 query videos
• Three retrieval tasks simulating different scenarios
Evaluation metrics
• mean Average Precision (mAP)
Kordopatis-Zilos et al. “FIVR: Fine-grained Incident Video Retrieval”. IEEE TMM, 2019.
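mAP averages, over all queries, the average precision (AP) of each query's ranked list. A minimal sketch using the standard non-interpolated AP, where relevant videos that are never retrieved count as misses; the function names are hypothetical, and the FIVR evaluation protocol may differ in details.

```python
# Hypothetical sketch of non-interpolated AP and mAP.


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which relevant items appear,
    divided by the total number of relevant items."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


queries = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
print(mean_average_precision(queries))
```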
16. Experiments
Kordopatis-Zilos et al. “ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning”. ICCV, 2019.
Additional experimental results
• Evaluation on four video retrieval problems
• Achieves state-of-the-art performance
17. Video verification
Figure: query video and database video → frame-to-frame similarity matrix → ViSiL → output video similarity, with thresholds marked at 0.8 (near-duplicate videos) and 0.5 (same event videos)
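Reading the two values in the figure as decision thresholds, verification reduces to comparing the ViSiL output against them. In this sketch the threshold direction, the use of `>=`, the function name and the `"unrelated"` fallback label are assumptions, not taken from the slide.

```python
# Hypothetical sketch: thresholds 0.8 / 0.5 read from the figure;
# comparison direction and the fallback label are assumptions.


def verify(similarity, near_duplicate_t=0.8, same_event_t=0.5):
    """Map a video-level similarity score to a coarse relation label."""
    if similarity >= near_duplicate_t:
        return "near-duplicate"
    if similarity >= same_event_t:
        return "same event"
    return "unrelated"


print(verify(0.9), verify(0.6), verify(0.2))
```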
21. Tips
• Video-level methods offer fast video retrieval, but with limited retrieval performance
• Frame-level methods achieve high retrieval performance, but at a very high computational cost
• Combine the two method types, with a similarity threshold carefully selected for the application scenario
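The combination recommended above can be sketched as a two-stage search: a cheap video-level score (such as an LBoW tf-idf/cosine similarity) prunes the database to the top N candidates, and an expensive frame-level score (such as ViSiL) re-ranks only those. The function name and the scoring callables are hypothetical placeholders; N = 5,000 mirrors the filtering setting given earlier, and the threshold is left for the caller to choose per application.

```python
# Hypothetical two-stage pipeline: coarse_score / fine_score are placeholders
# for a video-level and a frame-level similarity function.


def two_stage_search(query, database, coarse_score, fine_score, threshold, n=5000):
    """database: {video_id: video}. Returns [(video_id, fine_similarity), ...]
    sorted by decreasing similarity and filtered by threshold."""
    # Stage 1: cheap scoring of the whole database, keep the top n candidates
    coarse = sorted(database, key=lambda vid: coarse_score(query, database[vid]),
                    reverse=True)[:n]
    # Stage 2: expensive scoring only on the surviving candidates
    fine = [(vid, fine_score(query, database[vid])) for vid in coarse]
    fine.sort(key=lambda item: item[1], reverse=True)
    return [(vid, s) for vid, s in fine if s >= threshold]


# Toy usage with numbers standing in for videos
db = {"a": 5, "b": 6, "c": 9, "d": 1}
result = two_stage_search(5, db,
                          coarse_score=lambda q, x: -abs(q - x),
                          fine_score=lambda q, x: 1 - abs(q - x) / 10,
                          threshold=0.95, n=2)
print(result)  # [('a', 1.0)]
```

Only the n candidates reach the expensive scorer, which is where the speed-up over exhaustive frame-level comparison comes from.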
22. Thank you!
Get in touch:
Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr / @g_kordo
Team info:
https://mever.iti.gr/
https://twitter.com/meverteam
Code & models:
https://github.com/MKLab-ITI/FIVR-200K
https://github.com/MKLab-ITI/visil
With the support of:
MeVer