VERGE: A Multimodal Interactive Search Engine for Video Browsing and Retrieval
Stelios Andreadis1, Anastasia Moumtzidou1, Damianos Galanopoulos1, Foteini Markatopoulou1,2, Konstantinos Apostolidis1, Thanassis Mavropoulos1, Ilias Gialampoukidis1, Stefanos Vrochidis1, Vasileios Mezaris1, Ioannis Kompatsiaris1, and Ioannis Patras2
1Information Technologies Institute, CERTH, Thessaloniki, Greece
2School of Electronic Engineering and Computer Science, QMUL, UK
Introduction
VERGE is an interactive video search engine
Enables browsing & retrieval of video/image collections
Includes query submissions & a re-ranking capability
Friendly and efficient graphical user interface (GUI)
VERGE supports Known Item Search (KIS), Instance Search (INS) & Ad-Hoc Video Search (AVS) tasks
Participation in video-oriented benchmarks and workshops:
TRECVID (2007 – 2018)
Video Browser Showdown – VBS (2014 – 2019)

VERGE GUI
Interface & Interaction Modes
Essential improvements over the new version of the VERGE GUI that was introduced at VBS 2018
Navigation is limited to a single page for efficiency
A variety of retrieval modalities, mostly offered in a dashboard menu
Shot-based or video-based representation of results
The complete set of shots of a video can be displayed as a film-strip
Built with common web technologies, RESTful web services and a MongoDB database
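As an illustration of this stack, here is a minimal sketch of a RESTful endpoint that could serve the film-strip view from MongoDB; Flask, the "shots" collection and its field names are assumptions for illustration, not VERGE's actual schema.

    # Hypothetical film-strip endpoint (Flask + pymongo); names are illustrative.
    from flask import Flask, jsonify
    from pymongo import MongoClient

    app = Flask(__name__)
    db = MongoClient("mongodb://localhost:27017")["verge"]

    @app.route("/videos/<video_id>/shots")
    def film_strip(video_id):
        # Return all shots of a video, ordered by time, for the film-strip view.
        shots = db.shots.find({"video_id": video_id}).sort("start_time")
        return jsonify([{"shot_id": s["shot_id"], "keyframe": s["keyframe_url"]}
                        for s in shots])

    if __name__ == "__main__":
        app.run()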

System: Indexing & Retrieval Modules

Concept-based Retrieval
1000 ImageNet concepts - late fusion of 4 different state-of-the-art ImageNet-pretrained Deep Convolutional Neural Networks (DCNNs)
345 TRECVID SIN concepts - a ResNet DCNN pre-trained on ImageNet and fine-tuned on these concepts; the 323 top-performing concepts are selected
500 event-related concepts - using an existing DCNN fine-tuned on the EventNet dataset
365 place-related concepts - using an existing DCNN fine-tuned on the Places dataset
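A minimal sketch of what such late fusion could look like; averaging the per-network concept scores is an assumption here, since the text does not name the fusion operator.

    # Late fusion of concept scores from several DCNNs (averaging is assumed).
    import numpy as np

    def late_fusion(score_vectors):
        """Average the 1000-dim ImageNet concept scores of multiple DCNNs."""
        return np.mean(np.stack(score_vectors), axis=0)

    # e.g. scores from 4 different networks for one keyframe (random stand-ins):
    scores = [np.random.rand(1000) for _ in range(4)]
    fused = late_fusion(scores)
    top5 = np.argsort(fused)[::-1][:5]   # indices of the 5 strongest concepts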

Visual Similarity Search
Train GoogleNet on 5055 ImageNet concepts
Use the last pooling layer (length 1024) as the global key-frame representation
Retrieval:
Create an asymmetric distance computation index of the database vectors for fast image retrieval
Use K-Nearest Neighbours to find images visually similar to the query image
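A minimal sketch of asymmetric-distance-computation (ADC) indexing and k-NN search using FAISS; the library choice and index parameters are assumptions, but IVFPQ search does compute asymmetric distances between raw queries and product-quantized database vectors.

    # ADC index + k-NN search over 1024-dim GoogleNet pooling features (sketch).
    import numpy as np
    import faiss

    d = 1024                                           # key-frame descriptor length
    xb = np.random.rand(10000, d).astype("float32")    # database vectors (stand-in)
    xq = np.random.rand(1, d).astype("float32")        # query image descriptor

    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(quantizer, d, 256, 64, 8) # 256 lists, 64 sub-quantizers
    index.train(xb)
    index.add(xb)

    distances, neighbours = index.search(xq, 10)       # 10 visually nearest images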

Multimodal Fusion & Search
Consider high-level visual concepts:
Use 20 concepts to represent each video frame
Create a video concept vector by taking the sum or the product over its frames
Use the Cosine/Euclidean distance to calculate the distance among videos
Fast retrieval is achieved by pre-calculating all distances across the video collection
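A minimal sketch under these assumptions; the top-20 truncation follows the text, while the concept dimensionality and the choice of sum over product are illustrative.

    # Build video-level concept vectors and pre-compute pairwise cosine distances.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def video_vector(frame_concepts, keep=20):
        """Sum frame concept vectors, keeping the top `keep` concepts per frame."""
        vecs = []
        for f in frame_concepts:                 # one row of scores per frame
            v = np.zeros_like(f)
            top = np.argsort(f)[::-1][:keep]
            v[top] = f[top]                      # zero out all but the top-20
            vecs.append(v)
        return np.sum(vecs, axis=0)              # the text also allows the product

    videos = [video_vector(np.random.rand(30, 345)) for _ in range(100)]
    dist_matrix = squareform(pdist(np.stack(videos), metric="cosine"))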

Clustering

Image/Shot color clustering
Extraction of the MPEG-7 Color Layout descriptor from all frames
Mapping of frames to a color of an 8-color palette using the Euclidean distance
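A minimal sketch of the palette assignment; an off-the-shelf MPEG-7 Color Layout extractor is assumed to exist, and the mean RGB value below is only a simplified stand-in for that descriptor.

    # Map a frame to the nearest of 8 palette colors via Euclidean distance.
    import numpy as np

    PALETTE = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0], [0, 255, 0],
                        [0, 0, 255], [255, 255, 0], [0, 255, 255], [255, 0, 255]])

    def palette_colour(frame_rgb):
        """Assign a frame (H x W x 3 array) to the nearest palette color."""
        descriptor = frame_rgb.reshape(-1, 3).mean(axis=0)  # stand-in descriptor
        return int(np.argmin(np.linalg.norm(PALETTE - descriptor, axis=1)))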
Video clustering into topics
Exploit textual metadata (description and tags) per video
Topic modeling using Latent Dirichlet Allocation (LDA)
The most frequent terms per topic are presented
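A minimal sketch with scikit-learn; the text only names LDA, so the concrete implementation and parameters here are assumptions.

    # LDA topic modeling over video metadata; prints the top terms per topic.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    metadata = ["sunset beach waves surfing", "goal football match stadium",
                "beach holiday sea sand", "league match player goal"]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(metadata)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[::-1][:4]]
        print(f"topic {k}: {top}")       # most frequent terms per topic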

Text-based Search
Leverage semantic features of the text metadata (title, description, tags, category) of each video, exploiting online databases (WordNet, BabelNet) to replace terms with their respective semantic concepts
Semantically adjacent terms offer increased versatility in the retrieved results of a user query
Combine Apache Lucene for text normalization with MongoDB for data storage, indexing and retrieval
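A minimal sketch of term expansion plus a MongoDB text query; the "videos" collection and its field names are hypothetical, and the BabelNet lookup and Lucene normalization steps are omitted.

    # WordNet-based query expansion + MongoDB full-text search (sketch).
    # Requires nltk.download("wordnet") once before use.
    from nltk.corpus import wordnet as wn
    from pymongo import MongoClient

    def expand(term):
        """Collect the term plus the lemmas of its WordNet synsets."""
        lemmas = {term}
        for synset in wn.synsets(term):
            lemmas.update(lem.name().replace("_", " ") for lem in synset.lemmas())
        return lemmas

    db = MongoClient()["verge"]
    db.videos.create_index([("title", "text"), ("description", "text"),
                            ("tags", "text")])
    query = " ".join(expand("car"))      # e.g. adds "auto", "automobile", ...
    results = db.videos.find({"$text": {"$search": query}})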

Automatic Query Formulation & Expansion
Translation of the query into a set of high-level concepts C_Q
Decomposition of the query into elementary “sub-queries”
“Sub-queries” are created using POS and NER tagging, string matching and Noun Phrase extraction
Semantic relatedness via Explicit Semantic Analysis (ESA) for every “sub-query”-concept pair
Selection of the most closely related concepts for each sub-query
Finally, C_Q is filled with the concepts that describe the input query
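A minimal sketch of the sub-query extraction step with spaCy; the ESA relatedness computation is only stubbed out, since it requires a Wikipedia-derived concept space that is not reproduced here.

    # Sub-query extraction via noun phrases, named entities and POS tags (sketch).
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def sub_queries(query):
        doc = nlp(query)
        subs = {chunk.text for chunk in doc.noun_chunks}      # noun phrases
        subs.update(ent.text for ent in doc.ents)             # named entities
        subs.update(t.text for t in doc if t.pos_ == "VERB")  # action terms
        return subs

    def esa_relatedness(sub_query, concept):
        raise NotImplementedError("Explicit Semantic Analysis stub")

    for sq in sub_queries("a man riding a red bicycle in Paris"):
        print(sq)   # each sub-query is then matched to its closest concepts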

Contact point: Stefanos Vrochidis (stefanos@iti.gr)
URL: http://mklab-services.iti.gr/vbs2019