Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities

  1. ELIS – Multimedia Lab
     MediaEval: Search and Hyperlinking, 4-5 October, Pisa, Italy
     Tom De Nies, Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle
     Ghent University – IBBT – Multimedia Lab
  2. Our approach in a nutshell
     1. Create enriched representation of videos and queries
     2. Apply multiple similarity metrics
     3. Merge results by late fusion
     MediaEval 2012: Brave New Task: Search and Hyperlinking, Tom De Nies (IBBT-MMLab), 05/10/2012
  3. Enriched Data Representation
  4. Enriched Data Representation
     Advantages:
     • Comparable queries and videos
     • Extra metadata containing disambiguated concepts
     • Easy conversion from video to query object → the same approach can be used for Search and Linking!
     Disadvantages:
     • Enrichment step when ingesting data can take a while
     • Only English NER (named entity recognition) tools → automatic translation step needed for other languages
  5. (1) Create enriched representation of videos and queries; (2) Apply multiple similarity metrics; (3) Merge results by late fusion
  6. Similarity metrics
     1. “Bag of words” similarity
     2. Named Entity-based similarity
     3. Tag-based similarity
  7. Bag of Words similarity
     Text → stop word removal → text without stop words → calculate term frequency (TF) → for each word, calculate inverse document frequency (IDF)
     TF(t, D) = number of occurrences of term t in document D
  8. Bag of Words similarity
  9. Bag of Words similarity
     + Both corpus and documents taken into account
     + Common words get lower weight, to exploit unique features
     - Expensive training step (IDF initialization)
     - No semantics → ambiguity
  10. Named Entity-based Similarity
      Named Entities are extracted from content
      → Similar content will have similar entities!
  11. Named Entity-based Similarity
  12. Named Entity-based Similarity
      + Fewer entities than terms → fewer calculations than BoW
      + IDF → IS: no indexing of corpus required
      + Named Entities are unambiguous
      - Lower precision / coarser granularity than BoW
  13. Tag-based similarity
  14. Tag-based similarity
      + Uses user-generated metadata
      + Synonyms for higher recall
      - Very coarse granularity / low precision
  15. (1) Create enriched representation of videos and queries; (2) Apply multiple similarity metrics; (3) Merge results by late fusion
  16. Late Fusion
  17. Evaluation: Search
                                     MRR                   mGAP                  MASP
      Run                            60     30     10      60     30     10      60     30     10
      1 (LIMSI: BoW+NE)              0.188  0.15   0.117   0.120  0.089  0.033   0.066  0.066  0.061
      2 (LIUM: BoW+NE)               0.254  0.187  0.054   0.140  0.069  0.033   0.046  0.046  0.028
      3 (LIMSI: BoW+NE+Tags)         0.165  0.128  0.094   0.099  0.069  0.017   0.061  0.061  0.057
      4 (LIUM: BoW+NE+Tags)          0.221  0.154  0.038   0.115  0.053  0.017   0.040  0.041  0.023
  18. Evaluation: Search
      Unexpected:
      • LIUM > LIMSI, even though LIMSI had better language detection → due to automatic translation?
      • NE + BoW > NE + BoW + Tags → tags give false positives a higher rank and find more results, so MRR decreases
  19. Evaluation: Search
      Run                      Precision @60   Recall @60
      1 (LIMSI: BoW+NE)        0.056           0.40
      2 (LIUM: BoW+NE)         0.061           0.467
      3 (LIMSI: BoW+NE+Tags)   0.054           0.433
      4 (LIUM: BoW+NE+Tags)    0.059           0.50
  20. Evaluation: Linking
      Run                       MAP (Ground Truth)   MAP (Search results)
      LIMSI (BoW + NE)          0.157                0.014
      LIUM (BoW + NE)           0.171                0.040
      LIMSI (BoW + NE + Tags)   0.157                0.003
      LIUM (BoW + NE + Tags)    0.171                0.037
      Possible explanations:
      • Thresholds optimized for the Search task, not for Linking
      • User-generated tags vs. extracted tags
      … to be investigated!
  21. Improvements / Future Work
      • Better ranking criteria / late fusion
      • Improve tag similarity
      • Optimize parameters for linking
  22. Discussion
      These research activities were funded by Ghent University, IBBT, the IWT-Flanders, the FWO-Flanders, and the European Union, in the context of the IBBT project Smarter Media in Flanders (SMIF).
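The bag-of-words pipeline of slide 7 (stop word removal, then TF and IDF per word) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the tiny stop word list, the regex tokenizer, the IDF variant log(N / df) and the use of cosine similarity to compare TF-IDF vectors are all assumptions, since the formula on slide 8 is not preserved.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (a real system would use a full one).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "to", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic runs and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_frequencies(text: str) -> Counter:
    """TF(t, D): number of occurrences of term t in document D."""
    return Counter(tokenize(text))


def inverse_document_frequencies(corpus: list[str]) -> dict[str, float]:
    """Assumed IDF(t) = log(N / df(t)), computed once over the whole corpus."""
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log(len(corpus) / df) for t, df in doc_freq.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Weight each term's TF by its IDF; terms unseen in the corpus get weight 0."""
    return {t: tf * idf.get(t, 0.0) for t, tf in term_frequencies(text).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bow_similarity(query: str, document: str, idf: dict[str, float]) -> float:
    """Bag-of-words similarity: cosine of the two TF-IDF vectors (an assumed choice)."""
    return cosine(tfidf_vector(query, idf), tfidf_vector(document, idf))
```

With a three-document corpus, for instance, the query "tower of Pisa" scores higher against a document containing both "tower" and "Pisa" than against one sharing only "Pisa", and exactly 0 against a document with no shared terms. `inverse_document_frequencies` has to see the whole corpus before any comparison can be made, which corresponds to the "expensive training step (IDF initialization)" listed on slide 9.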
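The named entity-based similarity of slides 10 to 12 could be approximated as follows, assuming an upstream NER and disambiguation step (not shown) has already mapped each query and video to a list of entity identifiers. The weighted Jaccard measure and the `dbr:` identifiers in the example are illustrative assumptions; the slides do not give the exact score, and the "IDF → IS" variant they mention is not modelled here.

```python
from collections import Counter


def entity_similarity(query_entities: list[str], video_entities: list[str]) -> float:
    """Weighted Jaccard over entity counts: sum of minimum counts / sum of maximum counts.

    Inputs are lists of disambiguated entity identifiers (one item per mention),
    assumed to come from an upstream NER + disambiguation step.
    """
    q, v = Counter(query_entities), Counter(video_entities)
    union = q | v          # element-wise maximum of the counts
    if not union:
        return 0.0
    intersection = q & v   # element-wise minimum of the counts
    return sum(intersection.values()) / sum(union.values())


if __name__ == "__main__":
    # Illustrative DBpedia-style identifiers, not real task data.
    query = ["dbr:Pisa", "dbr:Leaning_Tower_of_Pisa"]
    video = ["dbr:Pisa", "dbr:Pisa", "dbr:Italy"]
    print(entity_similarity(query, video))                         # 0.25
    print(entity_similarity(["dbr:Paris"], ["dbr:Paris_Hilton"]))  # 0.0
```

Because entities are compared by identifier rather than by surface string, a mention of the city `dbr:Paris` does not match `dbr:Paris_Hilton`, which is the disambiguation advantage claimed on slide 12.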
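Slide 14 states that tag-based similarity relies on user-generated tags and uses synonyms to raise recall. A minimal sketch of that idea follows; the `SYNONYMS` table is a hypothetical stand-in for whatever thesaurus was actually used, and scoring the fraction of query tags that find a match is an assumed measure.

```python
# Hypothetical synonym table; a real system would consult a thesaurus or lexical database.
SYNONYMS: dict[str, set[str]] = {
    "football": {"soccer"},
    "soccer": {"football"},
    "film": {"movie"},
    "movie": {"film"},
}


def expand(tag: str, use_synonyms: bool = True) -> set[str]:
    """Normalize a tag to lowercase and optionally add its synonyms."""
    tag = tag.lower()
    if use_synonyms:
        return {tag} | SYNONYMS.get(tag, set())
    return {tag}


def tag_similarity(
    query_tags: list[str], video_tags: list[str], use_synonyms: bool = True
) -> float:
    """Fraction of query tags that match a video tag, directly or via a synonym."""
    if not query_tags:
        return 0.0
    video_terms: set[str] = set()
    for tag in video_tags:
        video_terms |= expand(tag, use_synonyms)
    matched = sum(1 for tag in query_tags if expand(tag, use_synonyms) & video_terms)
    return matched / len(query_tags)
```

With synonyms enabled, the query tags `Soccer, Italy` score 0.5 against a video tagged `football, Pisa`; with `use_synonyms=False` the score drops to 0. Matching whole tags rather than individual content words is also why this metric has such coarse granularity.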
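The late fusion figure on slide 16 is not preserved. One common way to fuse per-metric scores, shown here purely as an assumed stand-in, is to min-max normalize each metric's scores for a query and combine them with a weighted sum (a CombSUM-style fusion), optionally discarding results below a threshold. The metric names, weights and the `threshold` parameter are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one metric's scores to [0, 1] so that metrics become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal (e.g. a single result): treat them all as top matches.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(
    runs: dict[str, dict[str, float]],
    weights: dict[str, float],
    threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Weighted sum of normalized per-metric scores, ranked from best to worst."""
    fused: dict[str, float] = {}
    for metric, scores in runs.items():
        weight = weights.get(metric, 0.0)
        for doc, score in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    kept = [(doc, s) for doc, s in fused.items() if s >= threshold]
    # Sort by descending fused score, breaking ties by document id for determinism.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

A document that a metric does not return simply receives no contribution from that metric. A cut-off such as `threshold` is the kind of parameter that slide 20 suspects was tuned for Search rather than for Linking.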
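The MRR figures on slide 17 reward a run for placing a correct result early in its ranking. A sketch of the computation follows, assuming each query has a single known target segment and that a result counts as a hit when it points into the right video with a start time within `window` seconds of the target. Reading the 60/30/10 column headers as such windows is an assumption, the `Segment` type is hypothetical, and mGAP and MASP are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A result or target: a video id plus a jump-in time in seconds."""
    video_id: str
    start: float


def is_hit(result: Segment, target: Segment, window: float) -> bool:
    """Assumed hit rule: same video, start time within `window` seconds of the target."""
    return result.video_id == target.video_id and abs(result.start - target.start) <= window


def mean_reciprocal_rank(
    rankings: list[list[Segment]], targets: list[Segment], window: float
) -> float:
    """Average of 1 / rank of the first hit per query (0 when no hit is returned)."""
    if not targets:
        return 0.0
    total = 0.0
    for ranking, target in zip(rankings, targets):
        for rank, result in enumerate(ranking, start=1):
            if is_hit(result, target, window):
                total += 1.0 / rank
                break
    return total / len(targets)
```

This also illustrates the observation on slide 18: an extra false positive ranked above the correct hit moves it from rank 1 to rank 2 and halves that query's reciprocal rank, even though the hit is still found.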
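The precision and recall figures on slide 19 move in the direction slide 18 describes: the runs with tags find more targets (higher recall) at the cost of more false positives (lower precision). The sketch below assumes each query has a single known target segment, that a result is a hit when it lies in the right video within `window` seconds of the target's start, and that "@60" denotes such a 60-second window rather than a rank cut-off; all three are assumptions, and the `Segment` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_id: str
    start: float  # jump-in time in seconds


def is_hit(result: Segment, target: Segment, window: float) -> bool:
    """Assumed hit rule: same video, start time within `window` seconds of the target."""
    return result.video_id == target.video_id and abs(result.start - target.start) <= window


def precision_recall(
    rankings: list[list[Segment]], targets: list[Segment], window: float
) -> tuple[float, float]:
    """Precision: share of returned results that are hits.
    Recall: share of queries whose target is found at least once."""
    returned = sum(len(ranking) for ranking in rankings)
    correct = 0
    found = 0
    for ranking, target in zip(rankings, targets):
        hits = sum(1 for result in ranking if is_hit(result, target, window))
        correct += hits
        if hits:
            found += 1
    precision = correct / returned if returned else 0.0
    recall = found / len(targets) if targets else 0.0
    return precision, recall
```

A run that returns more results can raise recall while lowering precision, which is the pattern the tag-based runs show on slide 19.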
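The Linking results on slide 20 are reported as mean average precision (MAP). A standard, non-interpolated computation is sketched below, assuming each anchor comes with a ranked list of link targets and a set of targets judged relevant; the task's exact relevance criteria are not modelled, and the function names are hypothetical.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen: set[str] = set()
    for k, item in enumerate(ranking, start=1):
        # Count each relevant item only once, even if it is returned twice.
        if item in relevant and item not in seen:
            seen.add(item)
            hits += 1
            precision_sum += hits / k
    # Relevant items never retrieved contribute 0, hence division by |relevant|.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: list[list[str]], relevant_sets: list[set[str]]
) -> float:
    """MAP: mean of AP over all anchors (one ranking and one relevant set per anchor)."""
    if not rankings:
        return 0.0
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)
```

For a ranking `a, x, b` with relevant set `{a, b}`, AP is (1/1 + 2/3) / 2, so a single relevant target pushed down the list already costs a third of its contribution.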
