This document summarizes the ITI-CERTH participation in TRECVID 2018, covering the team's approaches to ad-hoc video search, instance search, and activity detection in extended videos. Key techniques include using pre-trained deep convolutional neural networks (DCNNs) to annotate video keyframes with concepts, late fusion of multiple DCNNs, and combining DCNNs with traditional approaches. The best ad-hoc video search run achieved a mean extended inferred average precision (MXinfAP) of 0.047, and the activity recognition models exceeded 99% training accuracy.
ITI-CERTH participation in TRECVID 2018
Konstantinos Avgerinakis, Anastasia Moumtzidou, Damianos Galanopoulos, Georgios Orfanidis, Stelios Andreadis, Foteini Markatopoulou,
Elissavet Batziou, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris
Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
{koafgeri, moumtzid, dgalanop, g.orfanidis, andreadisst, markatopoulou, batziou.el, kioannid, stefanos, bmezaris, ikom}@iti.gr
Ad-hoc Video Search
Keyframe Representation: keyframe annotation using 33142 concept detectors.
• ImageNet 1000: late fusion of 5 different DCNNs (a minimal fusion sketch follows this list).
• Pre-trained ResNet ImageNet DCNN, fine-tuned on the 345 SIN concepts, plus 3 additional concepts derived from SIN.
• Pre-trained DCNNs for FCVID-239, Places-205, Places-365, Hybrid-1365 and ImageNet 4000, 4437, 8201, 12988.
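As a concrete illustration of the late-fusion step above: a minimal sketch, assuming each of the five networks outputs a 1000-dim ImageNet concept-score vector per keyframe and that fusion is a simple arithmetic mean (the exact fusion rule is not detailed in this summary):

import numpy as np

# Minimal late-fusion sketch. `scores_per_dcnn` stands in for the
# 1000-dim ImageNet concept probabilities that each of the 5 DCNNs
# produces for one keyframe; mean fusion is an assumption here.
def late_fusion(scores_per_dcnn: list) -> np.ndarray:
    """Average the per-network concept scores into a single vector."""
    stacked = np.stack(scores_per_dcnn)  # shape: (n_networks, n_concepts)
    return stacked.mean(axis=0)          # shape: (n_concepts,)

# Example with 5 random "networks" scoring 1000 ImageNet concepts.
rng = np.random.default_rng(0)
fused = late_fusion([rng.random(1000) for _ in range(5)])
print(fused.shape)  # (1000,)

Weighted averaging or rank-based fusion are common alternatives when the individual networks differ in reliability.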
Query Representation: cue extraction and query representation as a vector of concepts. Three approaches:
i. A variety of complex NLP rules for cue extraction; multiple stages of matching cues with concepts.
ii. Cue extraction by finding noun phrases; multiple stages of cue-concept matching.
iii. Keywords are extracted by finding nouns; the most similar concept is selected for each noun (a minimal sketch follows).
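A minimal sketch of approach (iii), assuming NLTK for noun extraction and a toy embedding table in place of a real word-embedding model (the cue-extraction details, the embedding model, and the concept vocabulary below are assumptions, not the authors' exact setup):

import numpy as np
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

# Toy embeddings standing in for a real model (e.g. word2vec); values are illustrative only.
EMBEDDINGS = {
    "dog":    np.array([0.9, 0.1, 0.0]),
    "animal": np.array([0.8, 0.2, 0.1]),
    "car":    np.array([0.1, 0.9, 0.2]),
    "street": np.array([0.2, 0.8, 0.3]),
}
CONCEPTS = ["animal", "street"]  # hypothetical concept-detector vocabulary

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def query_to_concepts(query: str) -> list:
    """Approach (iii): keep the nouns of the query and map each one
    to its most similar concept by embedding cosine similarity."""
    tokens = nltk.word_tokenize(query)
    nouns = [w.lower() for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]
    matched = []
    for noun in nouns:
        if noun not in EMBEDDINGS:
            continue
        matched.append(max(CONCEPTS, key=lambda c: cosine(EMBEDDINGS[noun], EMBEDDINGS[c])))
    return matched

print(query_to_concepts("a dog crossing the street"))  # ['animal', 'street']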
Semantic embeddings: both query and keyframe concept-based representations are transformed into semantic-embedding representations.

Runs:
ITI-CERTH 1: Combination of Runs 2 & 4 (late fusion).
ITI-CERTH 2: Approach (iii) for query representation.
ITI-CERTH 3: Approach (ii) for query representation.
ITI-CERTH 4: Approach (i) for query representation.
Results:
Run      ITI-CERTH 1   ITI-CERTH 2   ITI-CERTH 3   ITI-CERTH 4
MXinfAP  0.043         0.047         0.040         0.034
Activities in Extended Video

Activity Recognition Pipeline:
• Combination of DCNNs with traditional approaches (HOG-HOF descriptors and SVM classification); a minimal sketch of this fusion is given at the end of this section.
Results:
mean P_miss @ Rate_FA = 1     min         max
AD                            0.4536572   0.99807
AOD                           0.5576526   0.9994005

Conclusions:
• Training accuracy above 99% indicates that the contest's activity features are linearly separable.
• The performance of the object detection module is vital and, in our case, was not secured; for this reason, many activities were not evaluated.
• An improved object detection model, as well as a filter for irrelevant activities, would improve performance.
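A minimal sketch of the DCNN + HOG-HOF fusion step, assuming pre-extracted per-segment descriptors that are simply concatenated before a linear SVM (the feature dimensions and the concatenation strategy are assumptions; the summary does not specify how the two views are combined):

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for pre-extracted descriptors of 200 video segments:
# a DCNN appearance vector and a HOG-HOF motion histogram per segment.
dcnn_feats = rng.random((200, 2048))   # hypothetical DCNN dimensionality
hoghof_feats = rng.random((200, 162))  # hypothetical HOG-HOF dimensionality
labels = rng.integers(0, 2, 200)       # binary activity labels

# Concatenate the two views, then train a linear SVM,
# mirroring the "DCNN + traditional features + SVM" pipeline.
X = np.hstack([dcnn_feats, hoghof_feats])
clf = LinearSVC(C=1.0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))

With features of this dimensionality, near-perfect training accuracy is expected whenever the classes are linearly separable, which is consistent with the above-99% figure reported in the conclusions.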
Instance Search

VERGE video search engine modules:
High Level Visual Concept Retrieval:
• 346 TRECVID concepts using pre-trained DCNNs and training with Linear SVMs.
• GoogLeNet CNN for landscape recognition using the Places-205 scene categories.
Visual Similarity Search module:
• Use of a DCNN pre-trained on 5055 ImageNet categories; the last pooling layer is selected for keyframe representation.
• Nearest-neighbour search realized using Asymmetric Distance Computation (see the sketch after this list).
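A minimal sketch of Asymmetric Distance Computation over product-quantized keyframe vectors, assuming already-trained sub-quantizer codebooks (the sizes below are illustrative, not the authors' parameters):

import numpy as np

rng = np.random.default_rng(0)
D, M, K = 64, 8, 256          # vector dim, sub-vectors, centroids per sub-quantizer
d_sub = D // M

# Hypothetical pre-trained codebooks and PQ codes for 10k database vectors.
codebooks = rng.random((M, K, d_sub))         # one codebook per sub-space
codes = rng.integers(0, K, size=(10_000, M))  # each vector stored as M byte codes

def adc_search(query: np.ndarray, topk: int = 5) -> np.ndarray:
    """Asymmetric distance: the query stays un-quantized; per-sub-space
    distances to all centroids are tabulated once, then distances to the
    compressed database vectors reduce to table lookups."""
    tables = np.stack([
        ((codebooks[m] - query[m * d_sub:(m + 1) * d_sub]) ** 2).sum(axis=1)
        for m in range(M)
    ])                                                # shape: (M, K)
    dists = tables[np.arange(M), codes].sum(axis=1)   # shape: (n_database,)
    return np.argsort(dists)[:topk]

print(adc_search(rng.random(D)))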
Face Detection and Face Retrieval module:
• Face detection: detect faces using the Tiny Face detection algorithm.
• Face feature extraction: extract CNN descriptors using the VGG-Very-Deep-16 architecture; the last FC layer is used as the feature vector.
• Construction of an IVFADC index for fast face retrieval (see the sketch after this list).
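IVFADC pairs an inverted file (coarse quantizer) with product-quantized codes searched via asymmetric distances; in the Faiss library this corresponds to IndexIVFPQ. A minimal sketch with random stand-in face descriptors and illustrative parameters (the authors' implementation and settings are not specified in this summary):

import numpy as np
import faiss

d = 4096  # VGG-16 last-FC descriptor size
rng = np.random.default_rng(0)
face_feats = rng.random((10_000, d)).astype("float32")  # stand-in face descriptors

# IVFADC = coarse quantizer (inverted lists) + product quantizer (ADC).
nlist, m, nbits = 128, 32, 8  # inverted lists, PQ sub-vectors, bits per code
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(face_feats)       # learn coarse centroids and PQ codebooks
index.add(face_feats)

index.nprobe = 8              # how many inverted lists to visit per query
distances, ids = index.search(face_feats[:1], k=5)
print(ids)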
Scene Similarity Search module:
• Use of the feature vector of the last fully connected layer and the output of the softmax layer of a VGG16 DCNN pre-trained on the Places-365 dataset (a minimal sketch follows).
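A minimal sketch of this descriptor, using torchvision's ImageNet VGG-16 as a stand-in for the authors' Places-365 checkpoint; concatenating the last-FC output with the softmax posteriors is one plausible reading of the module description, not a confirmed detail:

import torch
import torch.nn.functional as F
import torchvision.models as models

# Stand-in for the Places-365 VGG-16: torchvision's ImageNet weights.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

def scene_descriptor(image_batch: torch.Tensor) -> torch.Tensor:
    """Concatenate the last fully connected layer's output with the
    softmax posteriors, as the module description suggests."""
    with torch.no_grad():
        feats = vgg.features(image_batch)
        feats = vgg.avgpool(feats).flatten(1)
        fc_in = vgg.classifier[:-1](feats)   # activations feeding the last FC layer
        logits = vgg.classifier[-1](fc_in)   # last FC layer output
        return torch.cat([logits, F.softmax(logits, dim=1)], dim=1)

print(scene_descriptor(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2000])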
VERGE graphical user interface:
• Friendly and efficient navigation.
• Various retrieval utilities, displayed in a sliding menu or as buttons.
• Shot-based representation of results.
• Demo: http://mklab-services.iti.gr/verge/trec2018
Results:
Run ID   MAP     Recall
Run 1    0.114   478/11717

Conclusion:
• High-level visual features from the above modules are extracted and combined.