This document proposes ViSiL, a method for fine-grained video similarity learning that respects both the spatial structure of individual frames and the temporal structure of videos. ViSiL learns a video-to-video similarity function by applying a 4-layer CNN to the frame-to-frame similarity matrix of a video pair, capturing temporal similarity patterns. Experimental results show that ViSiL accurately retrieves near-duplicate, same-incident, same-action, and same-event videos from video databases.
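The overall idea can be illustrated with a minimal sketch. This is not the authors' implementation: it simplifies ViSiL to frame-level descriptors (the actual method uses region-level features), uses a single-channel hand-rolled convolution instead of a trained multi-channel CNN, and the kernel weights, function names, and the mean-of-row-maxima (Chamfer-style) pooling at the end are illustrative assumptions.

```python
import numpy as np

def frame_similarity_matrix(A, B):
    """Cosine similarity between every frame pair of two videos.
    A: (Na, d) frame descriptors, B: (Nb, d). Returns an (Na, Nb) matrix."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def conv2d_valid(x, w):
    """Naive 'valid' 2-D convolution of a single-channel map x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def video_similarity(A, B, kernels):
    """Toy stand-in for ViSiL's similarity CNN: stack small convolutions
    (with ReLU) on the frame-to-frame similarity matrix, then pool the
    result to a single score via the mean of row-wise maxima.
    `kernels` is a list of 2-D arrays; four entries mimic the 4-layer CNN."""
    s = frame_similarity_matrix(A, B)
    for w in kernels:
        s = np.maximum(conv2d_valid(s, w), 0.0)  # conv + ReLU
    return float(np.mean(np.max(s, axis=1)))

# Example: random 12-frame "videos" with 8-dim descriptors, and four
# untrained 3x3 averaging kernels standing in for learned filters.
rng = np.random.default_rng(0)
A = rng.normal(size=(12, 8))
B = rng.normal(size=(12, 8))
kernels = [np.full((3, 3), 1.0 / 9.0) for _ in range(4)]
score = video_similarity(A, B, kernels)
```

In the real method the convolutional filters are trained so that temporally consistent high-similarity diagonals in the matrix (matching segments) yield a high video-level score, while scattered spurious matches are suppressed.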