CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments

•

0 likes•269 views

In this paper, we describe our participation in the Search part of the Search and Hyperlinking Task in MediaEval Benchmark 2014. In our experiments, we compare two types of segmentation: fixed-length segmentation and segmentation employing Decision Trees on a set of various features. We also show usefulness of exploiting metadata and explore removal of overlapping retrieved segments. http://ceur-ws.org/Vol-1263/mediaeval2014_submission_17.pdf

Software

CUNI at MediaEval 2014 Search and Hyperlinking Task:
Search Task Experiments
Petra Galuščáková and Pavel Pecina
Charles University in Prague, Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
• Enable users to find information relevant to the
submitted textual query in a collection of audio-visual
recordings (TV programmes provided by
BBC).
• Available metadata, subtitles, and automatic
transcripts (LIMSI, LIUM, and NST-Sheffield).
1. System Description
• The recordings were divided into shorter
segments.
• We employed the Terrier IR system and its
implementation of the Hiemstra language model
with stemming and stopwords removal.
• Segmentation: we used segments of fixed length
and we used our segmentation system based on
Decision Trees (DT).
3. Results
Tab1. Results of the Search sub-task for different transcripts, segmentation types,
segment lengths, meta-data, and removal of overlapping segments.
2. System Tuning
• We tuned segment length for fixed length (60
seconds) and DT-based (50 and 120 seconds)
segmentation.
• We experimented with post-filtering of the
retrieved segments.
• We employed the metadata.
• The best results are achieved in experiments
using subtitles.
• In most of the cases, LIMSI scored better than
LIUM and NST-Sheffield transcripts and NST-Sheffield
scored better than LIUM.
• In most cases, the concatenation of the segment
with metadata improved the results.
• The fixed-length segmentation outperforms the
DT-based segmentation with 120-seconds-long
segments.
• 50-seconds-long DT-segments outperform the
fixed-length segments measured by MAP and
precision-based measures, but are outperformed by
fixed-length segments using the MAP-bin and
MAP-tol measures.
4. Conclusion
Acknowledgments
This research is supported by the Czech Science Foundation, grant
number P103/12/G084, Charles University Grant Agency GA UK, grant
number 920913, and by SVV project number 260 104.
Transcripts Segmentation Segment
Length Metadata Overlap MAP P 5 P 10 P 20 MAP-bin MAP-tol
Subtitles Fixed 60s No No 0.4209 0.7933 0.7433 0.5950 0.3192 0.3155
Subtitles Fixed 60s Yes No 0.5127 0.7467 0.7267 0.6100 0.3433 0.3023
Subtitles Fixed 60s Yes Yes 4.3527 0.7867 0.7733 0.7683 0.4150 0.1459
Subtitles DT 120s Yes No 0.3692 0.7467 0.7133 0.6050 0.2606 0.2157
Subtitles DT 120s Yes Yes 16.3486 0.8400 0.8367 0.8433 0.3172 0.0515
Subtitles DT 50s Yes No 0.8028 0.7867 0.7667 0.6933 0.3199 0.2350
LIMSI DT 120s Yes Yes 4.6366 0.7133 0.7300 0.7617 0.3706 0.1007
LIMSI Fixed 60s Yes No 0.4725 0.6667 0.6633 0.5467 0.3160 0.2696
LIUM DT 120s Yes Yes 4.0709 0.6533 0.6800 0.6900 0.3345 0.099
LIUM Fixed 60s Yes No 0.4371 0.6333 0.6400 0.5367 0.2651 0.2327
NST-Sheffield DT 120s Yes Yes 10.0198 0.7267 0.7533 0.7650 0.3342 0.0675
NST-Sheffield Fixed 60s Yes No 0.4645 0.6667 0.6600 0.5667 0.2974 0.2598
• The highest results are achieved on the subtitles.
• Metadata improve the results.
• Segmentation into fixed-length segments is very
effective, but it can be outperformed.
• MAP-tol measure may be preferred if retrieved
segments can overlap.
DT-based Segmentation:
• For each word in the transcript, it decides
whether the segment ends after this word or
not.
• Makes use of cue word n-grams, cue tag
n-grams, silence between words, capital letters,
division given in transcripts, and the output of
the TextTiling algorithm.
• The score of all measures, except the MAP-tol
measure, are notably higher in the experiments
which do not use post-filtering.
• In these cases, the relevant retrieved segments
frequently overlap each other.
• User may have already seen the relevant
segment → these segments could not be expected
to be so beneficial.

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments

Multimodal Features for Search and Hyperlinking of Video ContentPetra Galuscakova

CUNI at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012

MediaEval 2015 - The SPL-IT-UC Query by Example Search on Speech system for M...multimediaeval

LACS S y stem A nalysis on R etrieval M odels for the MediaEval 2014 Search a...multimediaeval

BMSB_QianQian Han

Poster SCGlowTTS Interspeech 2021Bilkent University

Eshg sequencing workshopOxford Gene Technology

A Transcat.com Webinar Presented by Aglient Technolgoes: Scope Technology Imp...Transcat

NR_Frame_Structure_and_Air_Interface_Resources.pptxBijoy Banerjee

IRJET-Analysis of Medium Access Protocols with Channel Bonding for Cognitive ...IRJET Journal

Gwas.emes.compRichard Emes

Fast channel zapping with destination oriented multicast for ip video deliveryecway

Java fast channel zapping with destination-oriented multicast for ip video d...ecwayerode

Blast fasta 4Er Puspendra Tripathi

Transition-based Dependency Parsing with Selectional BranchingJinho Choi

Measurement Procedures for Design and Enforcement of Harm Claim ThresholdsPierre de Vries

DCU Search Runs at MediaEval 2014 Search and Hyperlinkingmultimediaeval

A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...a3labdsp

Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Sri Manakula Vinayagar Engineering College

Mediaeval 2013 Spoken Web Search results slidesXavier Anguera

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments (20)

Multimodal Features for Search and Hyperlinking of Video Content

CUNI at MediaEval 2012: Search and Hyperlinking Task

MediaEval 2015 - The SPL-IT-UC Query by Example Search on Speech system for M...

LACS S y stem A nalysis on R etrieval M odels for the MediaEval 2014 Search a...

BMSB_Qian

Poster SCGlowTTS Interspeech 2021

Eshg sequencing workshop

A Transcat.com Webinar Presented by Aglient Technolgoes: Scope Technology Imp...

NR_Frame_Structure_and_Air_Interface_Resources.pptx

IRJET-Analysis of Medium Access Protocols with Channel Bonding for Cognitive ...

Gwas.emes.comp

Fast channel zapping with destination oriented multicast for ip video delivery

Java fast channel zapping with destination-oriented multicast for ip video d...

Blast fasta 4

Transition-based Dependency Parsing with Selectional Branching

Measurement Procedures for Design and Enforcement of Harm Claim Thresholds

DCU Search Runs at MediaEval 2014 Search and Hyperlinking

A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al...

Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...

Mediaeval 2013 Spoken Web Search results slides

Recently uploaded

MYjobs Presentation Django-based projectAnoyGreter

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin

Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110

Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin

办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea

The Evolution of Karaoke From Analog to App.pdfPower Karaoke

EY_Graph Database Powered SustainabilityNeo4j

Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app

Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.

chapter--4-software-project-planning.pptkotipi9215

Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ

Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp

Asset Management Software - InfographicHr365.us smith

Recently uploaded (20)

MYjobs Presentation Django-based project

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...

Folding Cheat Sheet #4 - fourth in a series

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide

办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样

The Evolution of Karaoke From Analog to App.pdf

EY_Graph Database Powered Sustainability

Automate your Kamailio Test Calls - Kamailio World 2024

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx

Unveiling Design Patterns: A Visual Guide with UML Diagrams

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...

chapter--4-software-project-planning.ppt

Cloud Management Software Platforms: OpenStack

Intelligent Home Wi-Fi Solutions | ThinkPalm

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE

Asset Management Software - Infographic

CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments

1. CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments Petra Galuščáková and Pavel Pecina Charles University in Prague, Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics • Enable users to find information relevant to the submitted textual query in a collection of audio-visual recordings (TV programmes provided by BBC). • Available metadata, subtitles, and automatic transcripts (LIMSI, LIUM, and NST-Sheffield). 1. System Description • The recordings were divided into shorter segments. • We employed the Terrier IR system and its implementation of the Hiemstra language model with stemming and stopwords removal. • Segmentation: we used segments of fixed length and we used our segmentation system based on Decision Trees (DT). 3. Results Tab1. Results of the Search sub-task for different transcripts, segmentation types, segment lengths, meta-data, and removal of overlapping segments. 2. System Tuning • We tuned segment length for fixed length (60 seconds) and DT-based (50 and 120 seconds) segmentation. • We experimented with post-filtering of the retrieved segments. • We employed the metadata. • The best results are achieved in experiments using subtitles. • In most of the cases, LIMSI scored better than LIUM and NST-Sheffield transcripts and NST-Sheffield scored better than LIUM. • In most cases, the concatenation of the segment with metadata improved the results. • The fixed-length segmentation outperforms the DT-based segmentation with 120-seconds-long segments. • 50-seconds-long DT-segments outperform the fixed-length segments measured by MAP and precision-based measures, but are outperformed by fixed-length segments using the MAP-bin and MAP-tol measures. 4. Conclusion Acknowledgments This research is supported by the Czech Science Foundation, grant number P103/12/G084, Charles University Grant Agency GA UK, grant number 920913, and by SVV project number 260 104. Transcripts Segmentation Segment Length Metadata Overlap MAP P 5 P 10 P 20 MAP-bin MAP-tol Subtitles Fixed 60s No No 0.4209 0.7933 0.7433 0.5950 0.3192 0.3155 Subtitles Fixed 60s Yes No 0.5127 0.7467 0.7267 0.6100 0.3433 0.3023 Subtitles Fixed 60s Yes Yes 4.3527 0.7867 0.7733 0.7683 0.4150 0.1459 Subtitles DT 120s Yes No 0.3692 0.7467 0.7133 0.6050 0.2606 0.2157 Subtitles DT 120s Yes Yes 16.3486 0.8400 0.8367 0.8433 0.3172 0.0515 Subtitles DT 50s Yes No 0.8028 0.7867 0.7667 0.6933 0.3199 0.2350 LIMSI DT 120s Yes Yes 4.6366 0.7133 0.7300 0.7617 0.3706 0.1007 LIMSI Fixed 60s Yes No 0.4725 0.6667 0.6633 0.5467 0.3160 0.2696 LIUM DT 120s Yes Yes 4.0709 0.6533 0.6800 0.6900 0.3345 0.099 LIUM Fixed 60s Yes No 0.4371 0.6333 0.6400 0.5367 0.2651 0.2327 NST-Sheffield DT 120s Yes Yes 10.0198 0.7267 0.7533 0.7650 0.3342 0.0675 NST-Sheffield Fixed 60s Yes No 0.4645 0.6667 0.6600 0.5667 0.2974 0.2598 • The highest results are achieved on the subtitles. • Metadata improve the results. • Segmentation into fixed-length segments is very effective, but it can be outperformed. • MAP-tol measure may be preferred if retrieved segments can overlap. DT-based Segmentation: • For each word in the transcript, it decides whether the segment ends after this word or not. • Makes use of cue word n-grams, cue tag n-grams, silence between words, capital letters, division given in transcripts, and the output of the TextTiling algorithm. • The score of all measures, except the MAP-tol measure, are notably higher in the experiments which do not use post-filtering. • In these cases, the relevant retrieved segments frequently overlap each other. • User may have already seen the relevant segment → these segments could not be expected to be so beneficial.

CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments

Recommended

Recommended

More Related Content

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments

Similar to CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments (20)

More from multimediaeval

More from multimediaeval (20)

Recently uploaded

Recently uploaded (20)

CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments