LinkedTV @ MediaEval 2013 Search and Hyperlinking Task
 


This paper presents the results of LinkedTV's first participation in the Search and Hyperlinking task of the MediaEval 2013 challenge. We used textual information (transcripts, subtitles and metadata) and tested its combination with automatically detected visual concepts. We submitted several runs to compare different approaches and to measure the improvement obtained by adding visual information.

Presentation Transcript

  • Television Linked To The Web: LinkedTV @ MediaEval Search and Hyperlinking
    M. Sahuguet¹, B. Huet¹, B. Cervenková², E. Apostolidis⁴, V. Mezaris⁴, D. Stein³, S. Eickeler³, J.L. Redondo Garcia¹, R. Troncy¹, and L. Pikora²
    MediaEval 2013 Workshop, Barcelona, Catalunya, Spain, 18-19 October 2013
    www.linkedtv.eu
  • LinkedTV: Television Linked To the Web
    - Interweaving Web and TV into a single experience
    - Second-screen scenario for enriching television content and enabling interaction between the user and the content
    - Web: http://www.linkedtv.eu
    LinkedTV @ MediaEval Search and Hyperlinking 2013, 10/18/2013
  • LinkedTV @ MediaEval Search & Hyperlinking: an overview of LinkedTV's enrichment process
    - Brainstorming
    - Pre-processing (BBC dataset)
    - Video segmentation
    - Indexing data in Lucene
    - From visual cues to detected concepts
    - Search task
    - Hyperlinking task
    - Conclusion
  • Brainstorming: task and dataset analysis
    - Shots are too short to return to the user
    - Typos in the queries
    - Duplicate videos in the dataset
    - Visual concepts are not usable as such
    - Visual cues may not be helpful
    - Visual cues can also help as search terms
    - Could the videos be segmented differently?
    - Can we use speaker information?
    - The name of the show or channel may appear in the query
    - Actor or character names may appear in the query
    - What further analysis can we apply to the videos?
  • Brainstorming: task and dataset analysis (continued)
    - Search:
      - Retrieving the right video is possible; the segment must also be extracted with good timing
      - The segmentation level is of major importance: shots are too short, and segments should be as close as possible to the viewer's perception
    - Visual cues are not always helpful, e.g.:
      <visualQueues>2 men sitting opposite each other</visualQueues>
      <visualQueues>stands out and grabs your attention</visualQueues>
    - A framework is needed to make use of the visual cues
    - How can the LinkedTV media analysis tools be used?
  • Pre-processing the dataset: ~1697 h of BBC video data
    - Visual concept detection (151 concepts), CERTH: 20 days on 100 cores
    - Scene segmentation, CERTH: 2 days on 6 cores
    - OCR, Fraunhofer: 1 day on 10 cores
    - Keyword extraction, Fraunhofer: 5 hours
    - Named entity extraction, Eurecom: 4 days
    - Face detection and tracking, Eurecom: 4 days on 160 cores
  • Video segmentation
    - Shots (provided by the task organisers)
    - Scenes: groups of adjacent shots, based on visual similarity and temporal consistency (P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso. Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features. IEEE Transactions on Circuits and Systems for Video Technology, 2011)
    - Sliding windows: inspired by M. Eskevich, G. Jones, C. Wartena, M. Larson, R. Aly, T. Verschoor, and R. Ordelman. Comparing Retrieval Effectiveness of Alternative Content Segmentation Methods for Internet Video Search. 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012
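Sliding-window segmentation can be pictured with a short Python sketch. It assumes window sizes are in seconds and uses a hypothetical step of half the window (the run names mention sizes 60 and 40, but the slides give neither the unit nor the step); each window then collects the transcript words whose timestamps fall inside it.

```python
def sliding_windows(duration, size=60.0, step=30.0):
    """Return (start, end) windows of `size` seconds, one every `step` seconds.

    step < size makes consecutive windows overlap (assumed, not from the slides);
    the last window is clipped to the end of the video.
    """
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + size, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


def window_documents(words, windows):
    """Group timestamped transcript words [(time, word), ...] into one text per window."""
    docs = []
    for start, end in windows:
        text = " ".join(w for t, w in words if start <= t < end)
        docs.append(((start, end), text))
    return docs


if __name__ == "__main__":
    wins = sliding_windows(100.0)
    print(wins)  # [(0.0, 60.0), (30.0, 90.0), (60.0, 100.0)]
    words = [(5.0, "odd"), (35.0, "cars"), (95.0, "ferrari")]
    for span, text in window_documents(words, wins):
        print(span, text)
```

Overlapping windows mean a passage near a boundary is always fully contained in at least one segment, at the cost of indexing some words twice.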
  • Indexing data in Lucene
    - The Lucene engine is used to index the data
    - Indexing at different temporal granularities:
      - Video level (for pre-filtering)
      - Scene level
      - Shot level
      - Sliding-window segment level
    - Indexing different features at each temporal granularity:
      - Text (transcripts, subtitles)
      - Metadata (title, synopsis, cast, etc.)
      - OCR
      - Visual concept values (floating-point fields)
    - Design of a framework for querying the indexes and returning video segments for a query
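To make the multi-granularity idea concrete, here is a toy in-memory stand-in for the Lucene indexes: one document collection per granularity, searched with a plain TF-IDF sum. This is not Lucene's actual scoring formula, and the granularity names and segment IDs are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SegmentIndex:
    """Toy index with one document collection per temporal granularity."""

    def __init__(self):
        # granularity ("video", "scene", "shot", "window") -> [(segment_id, term counts)]
        self.docs = defaultdict(list)

    def add(self, granularity, segment_id, text):
        self.docs[granularity].append((segment_id, Counter(text.lower().split())))

    def search(self, granularity, query, k=5):
        """Rank the segments of one granularity by a simple TF-IDF sum over query terms."""
        docs = self.docs.get(granularity, [])
        n = len(docs)
        terms = query.lower().split()
        # Document frequency of each query term within this granularity.
        df = {t: sum(1 for _, counts in docs if t in counts) for t in terms}
        ranked = []
        for segment_id, counts in docs:
            score = sum(counts[t] * math.log(n / df[t]) for t in terms if counts[t])
            if score > 0:
                ranked.append((score, segment_id))
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        return [segment_id for _, segment_id in ranked[:k]]


if __name__ == "__main__":
    index = SegmentIndex()
    index.add("scene", "v1_sc1", "odd cars and a fake maclaren")
    index.add("scene", "v1_sc2", "the ferrari on the track")
    index.add("scene", "v2_sc1", "cannabis documentary")
    print(index.search("scene", "fake maclaren"))  # ['v1_sc1']
```

Keeping one collection per granularity means the same query can be replayed at video, scene, shot or window level simply by changing the first argument.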
  • From visual cues to detected concepts
    - Text search is straightforward (default TF-IDF scoring)
    - Visual information needs to be incorporated into the search
    - Which concepts are present in the query?
      - Semantic word distance based on WordNet synsets
      - Mapping between keywords (extracted from the visual cues of the query) and visual concepts, e.g. <visualQueues>animals, kenya wildlife reserve, marathon</visualQueues> is mapped to the visual concepts Athlete, Dogs, Horse, Animal
    - Integration of the detected visual concepts into the Lucene search: concept filtering
      - First results (correct detection rate measured on the first 100 results): threshold at 0.5
      - With normalized confidence: threshold at 0.7
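The keyword-to-concept mapping can be sketched with a WordNet-style path similarity, 1 / (1 + shortest path length) in a hypernym graph. To keep the example self-contained, the tiny `HYPERNYMS` table is a hypothetical stand-in for WordNet, the 0.3 cut-off is an assumed value, and keywords are assumed to be already lemmatised (e.g. "animals" to "animal").

```python
from collections import deque

# Hypothetical toy hypernym table (child -> parent) standing in for WordNet.
HYPERNYMS = {
    "dog": "animal",
    "horse": "animal",
    "animal": "organism",
    "person": "organism",
    "athlete": "person",
    "runner": "athlete",
}

# Detector concept name -> lemma used in the toy taxonomy (hypothetical).
CONCEPTS = {"Animal": "animal", "Dogs": "dog", "Horse": "horse", "Athlete": "athlete"}


def neighbours(word):
    """Words one hypernym or hyponym link away from `word`."""
    linked = {child for child, parent in HYPERNYMS.items() if parent == word}
    if word in HYPERNYMS:
        linked.add(HYPERNYMS[word])
    return linked


def path_similarity(a, b):
    """1 / (1 + length of the shortest path between a and b), or 0.0 if unconnected."""
    if a == b:
        return 1.0
    seen, queue = {a}, deque([(a, 0)])
    while queue:  # breadth-first search over the undirected taxonomy
        word, dist = queue.popleft()
        for nxt in neighbours(word):
            if nxt == b:
                return 1.0 / (dist + 2)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def map_keywords(keywords, concepts=CONCEPTS, min_sim=0.3):
    """Map each concept to its best similarity with any keyword, keeping those >= min_sim."""
    mapping = {}
    for concept, lemma in concepts.items():
        best = max((path_similarity(k, lemma) for k in keywords), default=0.0)
        if best >= min_sim:
            mapping[concept] = best
    return mapping


if __name__ == "__main__":
    print(map_keywords(["animal", "runner"]))
```

Words missing from the taxonomy (such as "wildlife" here) simply map to nothing; with the real WordNet, coverage and the choice of synset would decide which concepts survive.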
  • From visual cues to detected concepts (continued)
    - Integration of the detected visual concepts into the Lucene search: concept selection
    - Designing an enriched query that combines textual information (text query) and visual information (range query)
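One way to express the enriched query is a Lucene classic query-parser string: the text part is an ordinary clause, and each selected concept becomes a range clause over its confidence field. The sketch assumes min-max normalisation of each concept's detector scores (the slides only say that confidence is normalized) and hypothetical field names `text` and `concept_<Name>`; the 0.7 threshold is the value given on the slides. The string form is shown for readability only.

```python
def normalize(scores):
    """Min-max normalise one concept's raw detector scores across segments to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def build_enriched_query(text_query, concepts, threshold=0.7):
    """Text clause OR one inclusive range clause per selected concept."""
    clauses = [f"text:({text_query})"]
    clauses += [f"concept_{name}:[{threshold} TO 1.0]" for name in sorted(concepts)]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(normalize([2.0, 4.0, 3.0]))  # [0.0, 1.0, 0.5]
    print(build_enriched_query("kenya wildlife reserve", {"Horse", "Animal"}))
```

Joining the clauses with OR lets segments that match only the text, or only the concepts, still be returned, while segments matching both score higher.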
  • Search task
    - Search videos at different temporal granularities
    - Concatenate the textual and visual queries for text search, e.g.:
      <queryText>Odd cars, Fake MacLaren, </queryText>
      <visualQueues>Jeremy Clarkson, Richard Hammond, James May, Ferrari 430 Scuderia</visualQueues>
      Visual cues can be found in queryText too
    - If a TV channel is mentioned, filter on it, e.g. <visualQueues>Cannabis on BBC ONE</visualQueues>
      - The same should also be done for show titles (next year?)
    - For some runs, filter at video level first (focused search):
      - Run a text query on the video index
      - Use the top 20 videos for the segment search
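The channel filter and the video-level pre-filter can be sketched as follows. The channel list, the `<video>_<segment>` ID convention and the channel lookup table are hypothetical; only the top-20 cut-off comes from the slides.

```python
import re

# Hypothetical list of channel names to look for in a query.
KNOWN_CHANNELS = ["BBC ONE", "BBC TWO", "BBC THREE", "BBC FOUR"]


def extract_channel(query):
    """Return (channel, query without the channel mention), or (None, query)."""
    for channel in KNOWN_CHANNELS:
        pattern = re.compile(r"\b(?:on\s+)?" + re.escape(channel) + r"\b", re.IGNORECASE)
        if pattern.search(query):
            return channel, pattern.sub("", query).strip()
    return None, query


def video_of(segment_id):
    """Hypothetical ID convention: '<video id>_<segment id>'."""
    return segment_id.rsplit("_", 1)[0]


def focused_search(video_ranking, segment_ranking, top_videos=20):
    """Keep only segments whose video is among the `top_videos` best-ranked videos."""
    allowed = set(video_ranking[:top_videos])
    return [seg for seg in segment_ranking if video_of(seg) in allowed]


def filter_by_channel(segment_ranking, channel_of_video, channel):
    """Keep only segments from videos broadcast on `channel`."""
    return [seg for seg in segment_ranking if channel_of_video.get(video_of(seg)) == channel]


if __name__ == "__main__":
    print(extract_channel("Cannabis on BBC ONE"))  # ('BBC ONE', 'Cannabis')
    print(focused_search(["v1", "v2"], ["v3_s1", "v1_s4", "v2_s2"], top_videos=1))
```

Both filters keep the segment ranking order intact; they only remove segments, so they can be chained after any segment-level search.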
  • Search task: runs
    - Different granularities:
      - scenes
      - partial scenes (beginning at a shot and ending where the corresponding scene ends)
      - temporally clustered shots (within a video)
      - sliding windows
    - Different textual data (transcripts/ASR)
    - With/without visual concepts
    - With/without synonyms
    - 9 runs, with the goal of comparing approaches and features
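Two of these granularities can be derived from shot hits with a few lines of Python. This is a sketch, not the runs' actual code: the partial scene follows the slide's definition, while the temporal clustering uses a simple gap rule whose `max_gap` of 10 seconds is a hypothetical value.

```python
def partial_scene(shot, scenes):
    """Start at the matched shot and end where its enclosing scene ends."""
    start, end = shot
    for scene_start, scene_end in scenes:
        if scene_start <= start < scene_end:
            return (start, scene_end)
    return shot  # no enclosing scene found: fall back to the shot itself


def cluster_shots(shots, max_gap=10.0):
    """Merge matched shots (start, end) of one video whose gaps are <= max_gap seconds."""
    clusters = []
    for start, end in sorted(shots):
        if clusters and start - clusters[-1][1] <= max_gap:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    print(partial_scene((30, 35), [(0, 20), (20, 60)]))   # (30, 60)
    print(cluster_shots([(0, 5), (8, 12), (40, 45)]))     # [(0, 12), (40, 45)]
```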
  • Search task: results
    Run               MRR     mGAP    MASP
    scenes-C          0.3095  0.1770  0.1951
    scenes-noC        0.3091  0.1767  0.1947
    scenes-S          0.3152  0.1635  0.2021
    scenes-I          0.2613  0.1444  0.1582
    scenes-U          0.2458  0.1344  0.1528
    part-scenes-C     0.2284  0.1241  0.1024
    part-scenes-noC   0.2281  0.1240  0.1021
    clustering-C      0.2929  0.1525  0.1814
    clustering-noC    0.2849  0.1479  0.1713
    SW-60-S           0.2833  0.1925  0.2027
    SW-60-I           0.1965  0.1206  0.1204
    SW-40-U           0.2368  0.1342  0.1501
    - scenes-C: scene search using textual and visual concepts; scenes-S: scene search using only subtitles; SW-60: search over sliding-window segments (size 60)
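For reference, MRR is the mean over queries of 1/rank of the first relevant result (0 when none is returned). The sketch uses exact ID matching as a simplification; it is not the task's scoring tool, whose time-aware matching, and the mGAP and MASP measures, are not covered here.

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings: {query_id: [result ids in rank order]}; relevant: {query_id: set of ids}."""
    if not rankings:
        return 0.0
    total = 0.0
    for qid, results in rankings.items():
        for rank, result in enumerate(results, start=1):
            if result in relevant.get(qid, set()):
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(rankings)


if __name__ == "__main__":
    runs = {"q1": ["a", "b"], "q2": ["c", "d", "e"]}
    print(mean_reciprocal_rank(runs, {"q1": {"a"}, "q2": {"e"}}))  # (1 + 1/3) / 2
```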
  • Hyperlinking task
    - Re-use of the search component (shot clustering approach, scene approach)
      - Create a query from the anchor:
        - Get the subtitles and shots aligned with the anchor
        - Text query: keywords extracted with the Alchemy API (higher weight for the anchor than for the context)
        - Visual cues query: for each concept, the highest score over all shots
    - Use of the "MoreLikeThis" (MLT) feature in Lucene, combined with THD (sliding window approach)
      - Create temporary documents from the anchor
    - THD = Targeted Hypernym Discovery (UEP): returns semantic annotations and synonyms
    - MLT: finding documents similar to an input document
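Building a query from the anchor can be sketched as below. The keyword relevances are taken as precomputed dictionaries (a hypothetical stand-in for the Alchemy API output), and the 2.0/1.0 anchor/context weights are assumed values; the per-concept maximum over shots follows the slide. For the anchor-only condition, pass an empty context.

```python
def pool_concept_scores(shot_scores):
    """Visual query: for each concept, the highest score over all shots of the anchor."""
    pooled = {}
    for scores in shot_scores:
        for concept, value in scores.items():
            pooled[concept] = max(value, pooled.get(concept, value))
    return pooled


def weight_keywords(anchor_kw, context_kw, anchor_weight=2.0, context_weight=1.0):
    """Text query: merge keyword relevances, giving anchor keywords a higher weight."""
    merged = {}
    for keywords, weight in ((anchor_kw, anchor_weight), (context_kw, context_weight)):
        for term, relevance in keywords.items():
            merged[term] = max(merged.get(term, 0.0), weight * relevance)
    return merged


def to_boosted_query(weighted):
    """Render weighted terms in Lucene classic syntax, e.g. 'ferrari^1.8'."""
    return " ".join(f"{term}^{round(w, 2)}" for term, w in sorted(weighted.items()))


if __name__ == "__main__":
    print(pool_concept_scores([{"Car": 0.4, "Road": 0.9}, {"Car": 0.8}]))
    kw = weight_keywords({"ferrari": 0.9}, {"ferrari": 0.5, "track": 0.6})
    print(to_boosted_query(kw))  # ferrari^1.8 track^0.6
```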
  • Hyperlinking results
    Run             MAP     P-5     P-10    P-20
    LA clustering   0.0577  0.4467  0.3200  0.2067
    LA SW MLT       0.1201  0.4200  0.4200  0.3217
    LA scenes       0.1770  0.6867  0.5867  0.4167
    LC clustering   0.0823  0.5733  0.4833  0.2767
    LC SW MLT       0.1820  0.5667  0.5667  0.4300
    LC scenes       0.2523  0.8133  0.7300  0.5283
    - LA: anchor only; LC: anchor + context (e.g. LA scenes: scene search in the LA condition)
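The columns above are standard ranking measures: P-k is the fraction of relevant items among the top k links, and MAP averages, over anchors, the precision at each rank where a relevant link appears. A generic sketch with exact ID matching follows; the official evaluation's relevance judgements on video segments are not reproduced here.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that are relevant (missing ranks count as misses)."""
    return sum(1 for r in results[:k] if r in relevant) / k


def average_precision(results, relevant):
    """Average of precision@rank over the ranks of the relevant results found."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, r in enumerate(results, start=1):
        if r in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (results, relevant) pairs, one per anchor."""
    if not runs:
        return 0.0
    return sum(average_precision(res, rel) for res, rel in runs) / len(runs)


if __name__ == "__main__":
    ranked = ["a", "x", "b", "y", "z"]
    print(precision_at_k(ranked, {"a", "b"}, 5))  # 0.4
    print(average_precision(ranked, {"a", "b"}))  # (1/1 + 2/3) / 2
```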
  • Conclusions
    - Major findings:
      - The scene segmentation approach performs best
      - Using visual concepts brings an improvement, when they are carefully employed
    - Future work:
      - Improve scene detection so that it more closely follows human perception
      - Improve the link between the query and the visual concepts
      - Use named entities
    Thank you! Questions?