In this paper, we describe our participation in the Search part of the Search and Hyperlinking Task in the MediaEval 2014 Benchmark. In our experiments, we compare two types of segmentation: fixed-length segmentation and segmentation employing Decision Trees on a set of various features. We also show the usefulness of exploiting metadata and explore the removal of overlapping retrieved segments.
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_17.pdf
CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments
Petra Galuščáková and Pavel Pecina
Charles University in Prague, Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
• Goal: enable users to find information relevant to a submitted textual query in a collection of audio-visual recordings (TV programmes provided by the BBC).
• Available data: metadata, subtitles, and automatic transcripts (LIMSI, LIUM, and NST-Sheffield).
1. System Description
• The recordings were divided into shorter
segments.
• We employed the Terrier IR system and its implementation of the Hiemstra language model, with stemming and stopword removal.
• Segmentation: we used fixed-length segments and our own segmentation system based on Decision Trees (DT).
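A minimal sketch of fixed-length segmentation, assuming the transcript is available as time-stamped words. The Segment class, the function name, and the step parameter (which lets consecutive segments overlap when it is smaller than the length) are illustrative assumptions, not details of the submitted system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    text: str     # words whose start time falls inside [start, end)


def fixed_length_segments(
    words: List[Tuple[float, str]],
    length: float = 60.0,
    step: float = 60.0,
) -> List[Segment]:
    """Cut a transcript into windows of `length` seconds.

    `words` holds (start_time_seconds, token) pairs sorted by time.
    `step` is a hypothetical parameter: a value below `length`
    produces overlapping segments.
    """
    if not words:
        return []
    segments = []
    last_time = words[-1][0]
    start = 0.0
    while start <= last_time:
        end = start + length
        tokens = [token for time, token in words if start <= time < end]
        if tokens:  # skip windows that contain no speech
            segments.append(Segment(start, end, " ".join(tokens)))
        start += step
    return segments


words = [(0.0, "good"), (30.0, "evening"), (61.0, "tonight"), (125.0, "weather")]
for seg in fixed_length_segments(words):
    print(seg)
```

Each resulting segment would then be indexed as a separate document by the retrieval system.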
2. System Tuning
• We tuned the segment length for the fixed-length (60 seconds) and DT-based (50 and 120 seconds) segmentation.
• We experimented with post-filtering of the retrieved segments.
• We concatenated the segments with the metadata.
3. Results
Table 1. Results of the Search sub-task for different transcripts, segmentation types, segment lengths, metadata, and removal of overlapping segments.
• The best results are achieved in experiments using subtitles.
• In most cases, the LIMSI transcripts scored better than the LIUM and NST-Sheffield transcripts, and NST-Sheffield scored better than LIUM.
• In most cases, concatenating the segment with metadata improved the results.
• The fixed-length segmentation outperforms the DT-based segmentation with 120-second segments.
• The 50-second DT segments outperform the fixed-length segments on MAP and the precision-based measures, but are outperformed by the fixed-length segments on the MAP-bin and MAP-tol measures.
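The concatenation of a segment with metadata mentioned above could look roughly like the sketch below. The function name and the metadata field names (title, description) are hypothetical; the poster does not list which fields were used.

```python
from typing import Dict


def indexed_text(segment_text: str, metadata: Dict[str, str]) -> str:
    """Append the non-empty metadata values to the segment transcript.

    The field names in `metadata` are placeholders chosen for this example.
    """
    parts = [segment_text] + [value for value in metadata.values() if value]
    return " ".join(parts)


doc = indexed_text(
    "the weather today",
    {"title": "News at Ten", "description": ""},
)
print(doc)
```

Under this assumption, a query term that occurs only in the metadata can still match the segment.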
Transcripts    Segmentation  Segment Length  Metadata  Overlap  MAP      P5      P10     P20     MAP-bin  MAP-tol
Subtitles      Fixed         60s             No        No       0.4209   0.7933  0.7433  0.5950  0.3192   0.3155
Subtitles      Fixed         60s             Yes       No       0.5127   0.7467  0.7267  0.6100  0.3433   0.3023
Subtitles      Fixed         60s             Yes       Yes      4.3527   0.7867  0.7733  0.7683  0.4150   0.1459
Subtitles      DT            120s            Yes       No       0.3692   0.7467  0.7133  0.6050  0.2606   0.2157
Subtitles      DT            120s            Yes       Yes      16.3486  0.8400  0.8367  0.8433  0.3172   0.0515
Subtitles      DT            50s             Yes       No       0.8028   0.7867  0.7667  0.6933  0.3199   0.2350
LIMSI          DT            120s            Yes       Yes      4.6366   0.7133  0.7300  0.7617  0.3706   0.1007
LIMSI          Fixed         60s             Yes       No       0.4725   0.6667  0.6633  0.5467  0.3160   0.2696
LIUM           DT            120s            Yes       Yes      4.0709   0.6533  0.6800  0.6900  0.3345   0.099
LIUM           Fixed         60s             Yes       No       0.4371   0.6333  0.6400  0.5367  0.2651   0.2327
NST-Sheffield  DT            120s            Yes       Yes      10.0198  0.7267  0.7533  0.7650  0.3342   0.0675
NST-Sheffield  Fixed         60s             Yes       No       0.4645   0.6667  0.6600  0.5667  0.2974   0.2598
4. Conclusion
• The best results are achieved with the subtitles.
• Metadata improve the results.
• Segmentation into fixed-length segments is very effective, but it can be outperformed.
• The MAP-tol measure may be preferred if retrieved segments can overlap.
Acknowledgments
This research is supported by the Czech Science Foundation, grant number P103/12/G084, Charles University Grant Agency GA UK, grant number 920913, and by SVV project number 260 104.
DT-based Segmentation:
• For each word in the transcript, the system decides whether a segment ends after that word or not.
• It uses cue word n-grams, cue tag n-grams, silence between words, capital letters, the division given in the transcripts, and the output of the TextTiling algorithm.
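The per-word boundary decision described above can be sketched as follows. This is only an illustration: it computes a small subset of the listed features, the feature names and the CUE_WORDS set are hypothetical, and toy_rule is a hand-written stand-in for a trained decision tree, not the model used in the experiments.

```python
from typing import Callable, Dict, List, Tuple

Word = Tuple[float, float, str]  # (start_seconds, end_seconds, token)

CUE_WORDS = {"now", "next", "welcome"}  # hypothetical cue words


def boundary_features(words: List[Word], i: int) -> Dict[str, object]:
    """Features describing the gap after word i (illustrative subset)."""
    _, end, _ = words[i]
    nxt = words[i + 1] if i + 1 < len(words) else None
    return {
        "silence_after": (nxt[0] - end) if nxt else 0.0,
        "next_is_capitalized": bool(nxt and nxt[2][:1].isupper()),
        "next_is_cue_word": bool(nxt and nxt[2].lower() in CUE_WORDS),
    }


def segment(
    words: List[Word],
    ends_segment: Callable[[Dict[str, object]], bool],
) -> List[List[str]]:
    """Close the current segment after every word the classifier flags."""
    segments, current = [], []
    for i, (_, _, token) in enumerate(words):
        current.append(token)
        if ends_segment(boundary_features(words, i)):
            segments.append(current)
            current = []
    if current:  # keep the trailing words as a final segment
        segments.append(current)
    return segments


def toy_rule(features: Dict[str, object]) -> bool:
    """Stand-in for a trained tree: long pause followed by a cue word."""
    return features["silence_after"] > 1.0 and features["next_is_cue_word"]


words = [
    (0.0, 0.4, "thanks"),
    (0.5, 0.9, "everyone"),
    (2.5, 2.8, "Now"),
    (2.9, 3.3, "sports"),
]
print(segment(words, toy_rule))
```

In a real setting, the ends_segment callable would wrap a classifier trained on annotated segment boundaries, and the feature dictionary would be extended with the remaining feature types.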
• The scores of all measures except MAP-tol are notably higher in the experiments that do not use post-filtering.
• In these cases, the relevant retrieved segments frequently overlap each other.
• The user may have already seen the relevant content, so such overlapping segments cannot be expected to be as beneficial.
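The effect of post-filtering on the scores can be illustrated with a toy example. The sketch assumes a greedy filter that keeps a retrieved segment only if it does not overlap a higher-ranked kept segment from the same recording, and it assumes, for illustration only, that a retrieved segment counts as a hit whenever it overlaps a relevant segment. Neither the filter nor this simplified average precision is claimed to match the system or the official measure definitions.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (recording_id, start_seconds, end_seconds)


def overlaps(a: Segment, b: Segment) -> bool:
    """True if both segments come from one recording and share some time."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_overlaps(ranked: List[Segment]) -> List[Segment]:
    """Greedy post-filter: walk down the ranking, drop overlapping segments."""
    kept: List[Segment] = []
    for seg in ranked:
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept


def average_precision(ranked: List[Segment], relevant: List[Segment]) -> float:
    """Simplified AP: every retrieved segment overlapping a relevant one is a hit."""
    hits, total = 0, 0.0
    for rank, seg in enumerate(ranked, start=1):
        if any(overlaps(seg, r) for r in relevant):
            hits += 1
            total += hits / rank
    return total / len(relevant)


ranked = [
    ("rec1", 0.0, 60.0),
    ("rec1", 10.0, 70.0),
    ("rec1", 20.0, 80.0),
    ("rec2", 0.0, 60.0),
]
relevant = [("rec1", 30.0, 50.0)]

print(average_precision(ranked, relevant))                   # 3.0
print(average_precision(remove_overlaps(ranked), relevant))  # 1.0
```

In this toy setting, three overlapping segments all hit the same relevant passage, so the unfiltered score exceeds 1, while the filtered ranking credits the passage once.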