Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities

  1. ELIS – Multimedia Lab
     MediaEval: Search and Hyperlinking, 4-5 October, Pisa, Italy
     Tom De Nies, Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle
     Ghent University – IBBT – Multimedia Lab
  2. Our approach in a nutshell
     1. Create enriched representation of videos and queries
     2. Apply multiple similarity metrics
     3. Merge results by late fusion
     MediaEval 2012: Brave New Task: Search and Hyperlinking, Tom De Nies (IBBT-MMLab), 05/10/2012
  3. Enriched Data Representation
  4. Enriched Data Representation
     Advantages
     • Comparable queries and videos
     • Extra metadata containing disambiguated concepts
     • Easy conversion from video to query object → possible to use same approach for Search and Linking!
     Disadvantages
     • Enrichment step when ingesting data can take a while
     • Only English NER tools → automatic translation step for other languages
  5. 1. Create enriched representation of videos and queries
     2. Apply multiple similarity metrics
     3. Merge results by late fusion
  6. Similarity metrics
     1. “Bag of words” similarity
     2. Named Entity-based similarity
     3. Tag-based similarity
  7. Bag of Words similarity
     TEXT → STOP WORD REMOVAL → TEXT WITHOUT STOPWORDS
     FOR EACH WORD:
     • CALCULATE TERM FREQUENCY (TF): TF(t,D) = # of occurrences of t in D
     • CALCULATE INVERSE DOCUMENT FREQUENCY (IDF)
  8. Bag of Words similarity
  9. Bag of Words similarity
     • Both corpus & documents taken into account
     • Common words get lower weight to exploit unique features
     • Expensive training step (IDF initialization)
     • No semantics → ambiguity
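The bag-of-words pipeline from the slides above (stop word removal, TF, IDF) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop word list, the `log(N / df)` form of IDF, and the use of cosine similarity to compare the weighted vectors are all assumptions, since the slide carrying the actual formula was not captured in the transcript.

```python
import math
from collections import Counter

# Illustrative stop word list (assumption; the real list is not given).
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def idf_table(corpus_tokens):
    """IDF per term over the whole corpus: log(N / document frequency)."""
    n = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """TF(t, D) = number of occurrences of t in D, weighted by IDF."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

corpus = ["the tower of pisa", "pisa cathedral square", "ghent city hall"]
docs = [tokenize(d) for d in corpus]
idf = idf_table(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
query = tfidf_vector(tokenize("tower in pisa"), idf)
scores = [cosine(query, v) for v in vectors]
```

The IDF table has to be built over the whole corpus before any query can be scored, which is the "expensive training step" the slide refers to.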
  10. Named Entity-based Similarity
      Named Entities are extracted from content
      Similar content will have similar entities!
  11. Named Entity-based Similarity
  12. Named Entity-based Similarity
      • Fewer entities than terms → fewer calculations than BoW
      • IDF → IS : no indexing of corpus required
      • Named Entities are unambiguous
      • Lower precision / coarser granularity than BoW
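One way to picture "similar content will have similar entities" is as an overlap between the sets of disambiguated entities found in two items. The sketch below uses Jaccard overlap, which is an assumption chosen for illustration; the slide with the actual formula (and the "IS" weighting it mentions) was not captured in the transcript. The entity identifiers are hypothetical.

```python
def entity_similarity(entities_a, entities_b):
    """Jaccard overlap between two collections of entity identifiers."""
    a, b = set(entities_a), set(entities_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical disambiguated entities for a query and two videos.
query_entities = {"Pisa", "Italy", "Galileo_Galilei"}
video_1 = {"Pisa", "Italy"}
video_2 = {"Ghent", "Belgium"}

sim_1 = entity_similarity(query_entities, video_1)
sim_2 = entity_similarity(query_entities, video_2)
```

Because each item carries only a handful of entities and no corpus-wide statistics are needed here, the comparison is cheap, at the cost of coarser granularity than term-level matching.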
  13. Tag-based similarity
  14. Tag-based similarity
      • Uses user-generated metadata
      • Synonyms for higher recall
      • Very coarse granularity / Low precision
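The effect of synonym expansion on tag matching can be sketched as below. The synonym dictionary, the `expand` helper, and the overlap-coefficient scoring are hypothetical choices made for illustration; the slides do not state which synonym source or scoring function was used.

```python
# Hypothetical synonym dictionary (assumption; the real source is not given).
SYNONYMS = {
    "film": {"movie"},
    "movie": {"film"},
    "car": {"automobile"},
    "automobile": {"car"},
}

def expand(tags):
    """Lowercase each tag and add its known synonyms."""
    expanded = set()
    for tag in tags:
        tag = tag.lower()
        expanded.add(tag)
        expanded |= SYNONYMS.get(tag, set())
    return expanded

def tag_similarity(query_tags, video_tags):
    """Overlap coefficient between the synonym-expanded tag sets."""
    q, v = expand(query_tags), expand(video_tags)
    if not q or not v:
        return 0.0
    return len(q & v) / min(len(q), len(v))

with_synonyms = tag_similarity(["Film"], ["movie", "comedy"])
no_match = tag_similarity(["car"], ["boat"])
```

Without expansion, "Film" and "movie" would not match at all, which is how synonyms raise recall; the same looseness is what lets unrelated items match and lowers precision.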
  15. 1. Create enriched representation of videos and queries
      2. Apply multiple similarity metrics
      3. Merge results by late fusion
  16. Late Fusion
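Late fusion means each similarity metric ranks candidates on its own and the scores are only merged afterwards. The content of this slide was not captured, so the sketch below shows one common variant, a weighted sum of per-metric scores. The weights, the metric names, and the `late_fusion` function are assumptions, not the authors' fusion rule.

```python
def late_fusion(scores_per_metric, weights):
    """Merge per-metric score dicts into one ranking by weighted sum."""
    fused = {}
    for metric, scores in scores_per_metric.items():
        weight = weights.get(metric, 0.0)
        for item, score in scores.items():
            fused[item] = fused.get(item, 0.0) + weight * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical scores from two metrics for three video segments.
scores_per_metric = {
    "bow": {"v1": 0.8, "v2": 0.2},
    "ne": {"v2": 0.9, "v3": 0.5},
}
weights = {"bow": 0.5, "ne": 0.5}  # illustrative, not tuned values

ranking = late_fusion(scores_per_metric, weights)
```

An item that is missing from one metric's results simply contributes nothing for that metric, so segments found by several metrics tend to rise in the fused ranking.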
  17. Evaluation: Search

                                MRR                    mGAP                   MASP
      Run                       60     30     10       60     30     10       60     30     10
      1 (LIMSI: BoW+NE)         0.188  0.15   0.117    0.120  0.089  0.033    0.066  0.066  0.061
      2 (LIUM: BoW+NE)          0.254  0.187  0.054    0.140  0.069  0.033    0.046  0.046  0.028
      3 (LIMSI: BoW+NE+Tags)    0.165  0.128  0.094    0.099  0.069  0.017    0.061  0.061  0.057
      4 (LIUM: BoW+NE+Tags)     0.221  0.154  0.038    0.115  0.053  0.017    0.040  0.041  0.023
  18. Evaluation: Search
      Unexpected:
      • LIUM > LIMSI, even though LIMSI had better language detection → due to automatic translation?
      • NE + BoW > NE + BoW + Tags → Tags give false positives higher rank and find more results, so MRR decreases
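The second observation can be illustrated with a small MRR computation. This sketch uses the standard list-based definition (reciprocal of the rank of the first relevant result, averaged over queries) and ignores the time windows used in the task's official scoring, which is a simplification. The result lists are made-up toy data.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Toy data: the same two queries, without and with a false positive
# pushed above the first relevant result.
without_tags = [(["a", "x"], {"a"}), (["y", "b"], {"b"})]
with_tags = [(["fp1", "a", "x"], {"a"}), (["y", "fp2", "b"], {"b"})]

mrr_without = mean_reciprocal_rank(without_tags)
mrr_with = mean_reciprocal_rank(with_tags)
```

A single false positive ranked above the first correct result is enough to lower the reciprocal rank for that query, even when the overall result list contains more relevant items.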
  19. Evaluation: Search

      Run                       Precision @60   Recall @60
      1 (LIMSI: BoW+NE)         0.056           0.40
      2 (LIUM: BoW+NE)          0.061           0.467
      3 (LIMSI: BoW+NE+Tags)    0.054           0.433
      4 (LIUM: BoW+NE+Tags)     0.059           0.50
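The table shows recall rising and precision dropping slightly when tags are added. A minimal set-based sketch of both measures shows how that pattern arises when a run returns more results. The data is made up, and the plain set-based definitions are an assumption; the task's "@60" variants are not reproduced here.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"s1", "s2", "s3", "s4"}
without_tags = ["s1", "s2", "x1", "x2"]               # 2 hits in 4 results
with_tags = without_tags + ["s3", "x3", "x4", "x5"]   # 3 hits in 8 results

p_without, r_without = precision_recall(without_tags, relevant)
p_with, r_with = precision_recall(with_tags, relevant)
```

The extra results recover one more relevant segment, so recall goes up, but most of the additions are false positives, so precision goes down.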
  20. Evaluation: Linking

      Run                       MAP (Ground Truth)   MAP (Search results)
      LIMSI (BoW + NE)          0.157                0.014
      LIUM (BoW + NE)           0.171                0.040
      LIMSI (BoW + NE + Tags)   0.157                0.003
      LIUM (BoW + NE + Tags)    0.171                0.037

      Possible explanations:
      • Thresholds optimized for Search task, not for Linking
      • User-generated tags vs. extracted tags … to be investigated!
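For reference, MAP is the mean over queries of average precision, which rewards relevant links that appear early in the ranking. The sketch below uses the standard textbook definition with toy data; it is not the task's official evaluation script.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant item occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    """Average of average_precision over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy data: relevant items at ranks 1 and 3, and a query with no hits.
ap = average_precision(["a", "x", "b"], {"a", "b"})
map_score = mean_average_precision([(["a", "x", "b"], {"a", "b"}),
                                    (["y", "z"], {"c"})])
```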
  21. Improvements / Future Work
      • Better ranking criteria / late fusion
      • Improve tag-similarity
      • Optimize parameters for linking
  22. Discussion
      These research activities were funded by Ghent University, IBBT, the IWT Flanders, the FWO-Flanders, and the European Union, in the context of the IBBT project Smarter Media in Flanders (SMIF).
