ITI-CERTH participation in TRECVID 2018

•

0 likes•302 views

The document summarizes the ITI-CERTH participation in TRECVID 2018, including their approaches to ad-hoc video search, instance search, and activity recognition. Their key approaches included using pre-trained deep convolutional neural networks (DCNNs) to extract concepts from videos, late fusion of multiple DCNNs, and combining traditional and DCNN approaches. Their system achieved a maximum infAP of 0.047 for ad-hoc video search and had high accuracy for activity recognition training.

Technology

Supported by:
ITI-CERTH participation in TRECVID 2018
Konstantinos Avgerinakis, Anastasia Moumtzidou, Damianos Galanopoulos, Georgios Orfanidis, Stelios Andreadis, Foteini Markatopoulou,
Elissavet Batziou, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris
Ad-hoc Video Search
Activities in Extended Video
Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
{koafgeri, moumtzid, dgalanop, g.orfanidis, andreadisst, markatopoulou, batziou.el, kioannid, stefanos, bmezaris, ikom} @iti.gr
Instance Search
Keyframe Representation: Keyframe
annotation using 33142 concept detectors.
• ImageNet 1000 - Late fusion of 5
different DCNNs.
• Pre-trained ResNet ImageNet DCNN;
fine-tuned on SIN 345 concepts, plus 3
additional concepts derived from SIN.
• Pre-trained DCNNs for FCVID-239,
Places-205, Places-365, Hybrid-1365 &
ImageNet 4000, 4437, 8201, 12988.
Runs:
ITI-CERTH 1: Runs 2 & 4 combination (late fusion)
ITI-CERTH 2: Approach (iii) for query representation
ITI-CERTH 3: Approach (ii) for query representation
ITI-CERTH 4: Approach (i) for query representation
Run
ITI-
CERTH 1
ITI-
CERTH 2
ITI-
CERTH 3
ITI-
CERTH 4
MXinfAP 0.043 0.047 0.040 0.034
Conclusions:
• Combination of DCNNs with traditional
approaches (HOG-HOF and SVM classification).
• Accuracy above 99% in training, indicates that the
contestants activity features are linearly separable.
Activity Recognition Pipeline:
min max min max
0.4536572 0.99807
0.5576526 0.9994005
mean 𝑃 𝑚𝑖𝑠𝑠@𝑅𝑎𝑡𝑒 𝐹𝐴=1𝐴𝐷
mean 𝑃 𝑚𝑖𝑠𝑠@𝑅𝑎𝑡𝑒 𝐹𝐴=1𝐴O𝐷
Results:
High Level Visual Concept Retrieval:
• 346 TRECVID concepts using pre-trained DCNNs
& Training with Linear SVMs.
• GoogLeNet CNN network for landscape
recognition using the Places-205 scene
categories.
Visual Similarity Search module:
• Use of pre-trained DCNN on 5055 ImageNet
categories & selection of last pooling layer for
keyframe representation.
• Nearest Neighbour search realized using
Asymmetric Distance Computation.
Face detection and Face Retrieval Module:
• Face detection: detect faces using Tiny face
detection algorithm.
• Face feature extraction: extract CNN descriptors
using VGG-Very-Deep-16 CNN architecture &
use of last FC layer as feature vector.
• Construction of an IVFADC index for fast face
retrieval.
Scene Similarity Search Module:
• Use the feature vector of the last fully
connected layer and the output of the softmax
layer of the VGG16 DCNN pre-trained on Places-
365 dataset. VERGE GUI
http://mklab-services.iti.gr/verge/trec2018
VERGE video search engine modules:
VERGE graphical user interface:
• Friendly and efficient navigation.
• Various retrieval utilities, displayed
in a sliding menu or as buttons.
• Shot-based representation of
results.
Conclusion:
• High level features are extracted
and combined.
Run ID MAP Recall
Run 1 0.114 478/11717
Semantic embeddings: Both query and keyframe concept-based
representations are transformed to semantic-embedding representations.
Query Representation: Cue extraction and
query representation as a vector of concepts.
Three approaches:
i. A variety of complex NLP rules for cue
extraction; multiple stages of matching
cues with concepts.
ii. Cue extraction by finding noun phrases;
multiple stages of cue-concept matching.
iii. Keywords are extracted by finding nouns;
the most similar concept is selected for
each noun.
• The performance of the object detection module is vital, which in our case was not
secured. For this, many activities weren’t evaluated.
• An improved object detection model, as well as, an activity filter for irrelevant activities,
will improve the performance.
Results:
Results:

Similar to ITI-CERTH participation in TRECVID 2018

kanimozhi2019.pdfAshrafDabbas1

Motion capture for AnimationIRJET Journal

Operation-wise Attention Network for Tampering Localization Fusion.Weverify

Foreground algorithms for detection and extraction of an object in multimedia...IJECEIAES

ROAD SIGN DETECTION USING CONVOLUTIONAL NEURAL NETWORK (CNN)IRJET Journal

Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL

Content-based product image retrieval using squared-hinge loss trained convol...IJECEIAES

“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...Edge AI and Vision Alliance

IRJET- A Deep Learning based Approach for Automatic Detection of Bike Rid...IRJET Journal

“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...Edge AI and Vision Alliance

Combining textual and visual features for Ad-hoc Video SearchVasileiosMezaris

Virtual Contact Discovery using Facial RecognitionIRJET Journal

Presentation of the InVID verification technologies at IPTC 2018InVID Project

IRJET - 3D Virtual Dressing Room ApplicationIRJET Journal

Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...YutaSuzuki27

Siddha Ganju. Deep learning on mobileLviv Startup Club

Siddha Ganju, NVIDIA. Deep Learning for MobileIT Arena

Poster Competition - Hwan LeeHwan Lee

USING IMAGE CLASSIFICATION TO INCENTIVIZE RECYCLINGIRJET Journal

Synthesizing pseudo 2.5 d content from monocular videos for mixed realityNAVER Engineering

Similar to ITI-CERTH participation in TRECVID 2018 (20)

kanimozhi2019.pdf

Motion capture for Animation

Operation-wise Attention Network for Tampering Localization Fusion.

Foreground algorithms for detection and extraction of an object in multimedia...

ROAD SIGN DETECTION USING CONVOLUTIONAL NEURAL NETWORK (CNN)

Secure IoT Systems Monitor Framework using Probabilistic Image Encryption

Content-based product image retrieval using squared-hinge loss trained convol...

“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...

IRJET- A Deep Learning based Approach for Automatic Detection of Bike Rid...

“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...

Combining textual and visual features for Ad-hoc Video Search

Virtual Contact Discovery using Facial Recognition

Presentation of the InVID verification technologies at IPTC 2018

IRJET - 3D Virtual Dressing Room Application

Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...

Siddha Ganju. Deep learning on mobile

Siddha Ganju, NVIDIA. Deep Learning for Mobile

Poster Competition - Hwan Lee

USING IMAGE CLASSIFICATION TO INCENTIVIZE RECYCLING

Synthesizing pseudo 2.5 d content from monocular videos for mixed reality

Recently uploaded

Salesforce Community Group Quito, Salesforce 101Paola De la Torre

Developing An App To Navigate The Roads of BrazilV3cube

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong

Histor y of HAM Radio presentation slidevu2urc

Presentation on how to chat with PDF using ChatGPT code interpreternaman860154

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge

CNv6 Instructor Chapter 6 Quality of Servicegiselly40

Finology Group – Insurtech Innovation Award 2024The Digital Insurer

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j

The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc

Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies

08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

Recently uploaded (20)

Salesforce Community Group Quito, Salesforce 101

Developing An App To Navigate The Roads of Brazil

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

How to Troubleshoot Apps for the Modern Connected Worker

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Histor y of HAM Radio presentation slide

Presentation on how to chat with PDF using ChatGPT code interpreter

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

Driving Behavioral Change for Information Management through Data-Driven Gree...

CNv6 Instructor Chapter 6 Quality of Service

Finology Group – Insurtech Innovation Award 2024

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...

The Codex of Business Writing Software for Real-World Solutions 2.pptx

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

Factors to Consider When Choosing Accounts Payable Services Providers.pptx

08448380779 Call Girls In Friends Colony Women Seeking Men

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

ITI-CERTH participation in TRECVID 2018

1. Supported by: ITI-CERTH participation in TRECVID 2018 Konstantinos Avgerinakis, Anastasia Moumtzidou, Damianos Galanopoulos, Georgios Orfanidis, Stelios Andreadis, Foteini Markatopoulou, Elissavet Batziou, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris Ad-hoc Video Search Activities in Extended Video Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece {koafgeri, moumtzid, dgalanop, g.orfanidis, andreadisst, markatopoulou, batziou.el, kioannid, stefanos, bmezaris, ikom} @iti.gr Instance Search Keyframe Representation: Keyframe annotation using 33142 concept detectors. • ImageNet 1000 - Late fusion of 5 different DCNNs. • Pre-trained ResNet ImageNet DCNN; fine-tuned on SIN 345 concepts, plus 3 additional concepts derived from SIN. • Pre-trained DCNNs for FCVID-239, Places-205, Places-365, Hybrid-1365 & ImageNet 4000, 4437, 8201, 12988. Runs: ITI-CERTH 1: Runs 2 & 4 combination (late fusion) ITI-CERTH 2: Approach (iii) for query representation ITI-CERTH 3: Approach (ii) for query representation ITI-CERTH 4: Approach (i) for query representation Run ITI- CERTH 1 ITI- CERTH 2 ITI- CERTH 3 ITI- CERTH 4 MXinfAP 0.043 0.047 0.040 0.034 Conclusions: • Combination of DCNNs with traditional approaches (HOG-HOF and SVM classification). • Accuracy above 99% in training, indicates that the contestants activity features are linearly separable. Activity Recognition Pipeline: min max min max 0.4536572 0.99807 0.5576526 0.9994005 mean 𝑃 𝑚𝑖𝑠𝑠@𝑅𝑎𝑡𝑒 𝐹𝐴=1𝐴𝐷 mean 𝑃 𝑚𝑖𝑠𝑠@𝑅𝑎𝑡𝑒 𝐹𝐴=1𝐴O𝐷 Results: High Level Visual Concept Retrieval: • 346 TRECVID concepts using pre-trained DCNNs & Training with Linear SVMs. • GoogLeNet CNN network for landscape recognition using the Places-205 scene categories. Visual Similarity Search module: • Use of pre-trained DCNN on 5055 ImageNet categories & selection of last pooling layer for keyframe representation. • Nearest Neighbour search realized using Asymmetric Distance Computation. Face detection and Face Retrieval Module: • Face detection: detect faces using Tiny face detection algorithm. • Face feature extraction: extract CNN descriptors using VGG-Very-Deep-16 CNN architecture & use of last FC layer as feature vector. • Construction of an IVFADC index for fast face retrieval. Scene Similarity Search Module: • Use the feature vector of the last fully connected layer and the output of the softmax layer of the VGG16 DCNN pre-trained on Places- 365 dataset. VERGE GUI http://mklab-services.iti.gr/verge/trec2018 VERGE video search engine modules: VERGE graphical user interface: • Friendly and efficient navigation. • Various retrieval utilities, displayed in a sliding menu or as buttons. • Shot-based representation of results. Conclusion: • High level features are extracted and combined. Run ID MAP Recall Run 1 0.114 478/11717 Semantic embeddings: Both query and keyframe concept-based representations are transformed to semantic-embedding representations. Query Representation: Cue extraction and query representation as a vector of concepts. Three approaches: i. A variety of complex NLP rules for cue extraction; multiple stages of matching cues with concepts. ii. Cue extraction by finding noun phrases; multiple stages of cue-concept matching. iii. Keywords are extracted by finding nouns; the most similar concept is selected for each noun. • The performance of the object detection module is vital, which in our case was not secured. For this, many activities weren’t evaluated. • An improved object detection model, as well as, an activity filter for irrelevant activities, will improve the performance. Results: Results:

ITI-CERTH participation in TRECVID 2018

Recommended

Recommended

More Related Content

Similar to ITI-CERTH participation in TRECVID 2018

Similar to ITI-CERTH participation in TRECVID 2018 (20)

More from MOVING Project

More from MOVING Project (20)

Recently uploaded

Recently uploaded (20)

ITI-CERTH participation in TRECVID 2018