1. The document discusses an approach for the MediaEval 2012 Search and Hyperlinking task that creates an enriched representation of videos and queries, applies multiple similarity metrics, and merges results through late fusion.
2. Three similarity metrics are used (bag-of-words, named entity-based, and tag-based), each with its own advantages and disadvantages.
3. Evaluation results showed the combination of bag-of-words and named entity-based similarity performed best for search, while improvements are needed for linking, including optimizing parameters.
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities
1. ELIS – Multimedia Lab
MediaEval: Search and Hyperlinking
4-5 October, Pisa, Italy
Tom De Nies
Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle
Ghent University – IBBT – Multimedia Lab
2. ELIS – Multimedia Lab
Our approach in a nutshell
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
MediaEval 2012: Brave New Task: Search and Hyperlinking
Tom De Nies (IBBT-MMLab) 2
05/10/2012
3. ELIS – Multimedia Lab
Enriched Data Representation
4. ELIS – Multimedia Lab
Enriched Data Representation
Advantages
• Comparable queries and videos
• Extra metadata containing disambiguated concepts
• Easy conversion from video to query object
→ possible to use same approach for Search and Linking!
Disadvantages
• Enrichment step when ingesting data can take a while
• Only English NER tools → automatic translation step for other languages
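A minimal sketch of what such a shared representation could look like, assuming one record type serves both videos and queries. The class name `EnrichedItem`, its fields, and `video_to_query` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EnrichedItem:
    """Shared record for videos and queries (hypothetical field names)."""
    text: str                                        # transcript or query text, in English
    entities: Set[str] = field(default_factory=set)  # disambiguated named entities
    tags: Set[str] = field(default_factory=set)      # user-generated or extracted tags


def video_to_query(video: EnrichedItem) -> EnrichedItem:
    """Copy a video's enriched fields into a query object, e.g. for Linking."""
    return EnrichedItem(
        text=video.text,
        entities=set(video.entities),
        tags=set(video.tags),
    )


video = EnrichedItem(
    text="A documentary about Ghent",
    entities={"Ghent"},
    tags={"documentary"},
)
query = video_to_query(video)
```

Because queries and videos share one type, the same similarity functions can be applied to either, which is what makes reuse across Search and Linking possible.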
5. ELIS – Multimedia Lab
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
6. ELIS – Multimedia Lab
Similarity metrics
1. “Bag of words” similarity
2. Named Entity-based similarity
3. Tag-based similarity
7. ELIS – Multimedia Lab
Bag of Words similarity
Text → stop word removal → text without stopwords
For each word:
• Calculate term frequency (TF): TF(t,D) = # of occurrences of t in D
• Calculate inverse document frequency (IDF)
8. ELIS – Multimedia Lab
Bag of Words similarity
9. ELIS – Multimedia Lab
Bag of Words similarity
Both corpus & documents taken into account
Common words get lower weight to exploit unique features
Expensive training step (IDF initialization)
No semantics → ambiguity
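The bag-of-words pipeline can be sketched as follows. This is an illustration under assumptions: the slides give only TF(t,D), so the IDF formula (log of corpus size over document frequency), the cosine comparison, the tiny stop word list, and all function names are choices made here, not the authors' implementation.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in", "is"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def idf_table(corpus_tokens):
    """IDF(t) = log(N / df(t)), computed once over the whole corpus."""
    n_docs = len(corpus_tokens)
    df = Counter(t for tokens in corpus_tokens for t in set(tokens))
    return {t: math.log(n_docs / count) for t, count in df.items()}


def tfidf_vector(tokens, idf):
    """Weight each term by TF(t, D) * IDF(t); unseen terms get weight 0."""
    tf = Counter(tokens)  # TF(t, D) = number of occurrences of t in D
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(t, 0.0) for t, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "the history of jazz music",
    "jazz music in new orleans",
    "cooking pasta in italy",
]
corpus = [tokenize(d) for d in docs]
idf = idf_table(corpus)  # the expensive step: needs the whole corpus
vectors = [tfidf_vector(tokens, idf) for tokens in corpus]
query_vec = tfidf_vector(tokenize("jazz history"), idf)
scores = [cosine(query_vec, v) for v in vectors]
```

The `idf_table` call is the step that requires indexing the full corpus up front, and the purely lexical matching is why ambiguous words cannot be told apart.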
10. ELIS – Multimedia Lab
Named Entity-based Similarity
Named Entities are extracted from content
Similar content will have similar entities!
11. ELIS – Multimedia Lab
Named Entity-based Similarity
12. ELIS – Multimedia Lab
Named Entity-based Similarity
Fewer entities than terms → fewer calculations than BoW
IDF → IS : no indexing of corpus required
Named Entities are unambiguous
Lower precision / coarser granularity than BoW
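The slides do not give the entity similarity formula, so the following is only a sketch of the general idea that similar content shares entities. It assumes entities are already disambiguated identifiers and uses Jaccard overlap as a stand-in measure; the function name and the example identifiers are hypothetical.

```python
def entity_similarity(entities_a, entities_b):
    """Jaccard overlap between two sets of disambiguated entity identifiers."""
    a, b = set(entities_a), set(entities_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical entity identifiers for a query and a video segment.
query_entities = {"dbpedia:Ghent", "dbpedia:Belgium"}
video_entities = {"dbpedia:Ghent", "dbpedia:Pisa"}
score = entity_similarity(query_entities, video_entities)
```

No corpus-wide statistics are needed here, and the sets are small, but a score built from a handful of entities is necessarily coarser than one built from every term.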
13. ELIS – Multimedia Lab
Tag-based similarity
14. ELIS – Multimedia Lab
Tag-based similarity
Uses user-generated metadata
Synonyms for higher recall
Very coarse granularity / Low precision
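One way to picture tag matching with synonym expansion, as a sketch only. The synonym table, the function names, and the use of Jaccard overlap are assumptions made for illustration; the slides do not say which synonym source or scoring formula was used.

```python
# Hypothetical synonym table; a real system would use a lexical resource.
SYNONYMS = {
    "film": {"movie"},
    "movie": {"film"},
    "car": {"automobile"},
}


def expand(tags):
    """Lowercase each tag and add its known synonyms."""
    expanded = set()
    for tag in tags:
        tag = tag.lower()
        expanded.add(tag)
        expanded |= SYNONYMS.get(tag, set())
    return expanded


def tag_similarity(query_tags, video_tags):
    """Jaccard overlap between the synonym-expanded tag sets."""
    q, v = expand(query_tags), expand(video_tags)
    if not q or not v:
        return 0.0
    return len(q & v) / len(q | v)


match = tag_similarity({"Film"}, {"movie"})
no_match = tag_similarity({"film"}, {"car"})
```

Expansion lets "Film" match "movie", which raises recall, but the same loosening also lets weakly related items match, which is consistent with the low precision noted above.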
15. ELIS – Multimedia Lab
1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion
16. ELIS – Multimedia Lab
Late Fusion
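The fusion rule itself is not spelled out in the transcript. As an illustration of late fusion in general, the sketch below scores each video independently per metric and then combines the scores with a weighted sum and a cut-off threshold. The weights, the threshold value, and the function name are assumptions.

```python
def late_fusion(score_lists, weights, threshold=0.0):
    """Combine per-metric scores into one ranking by weighted sum."""
    fused = {}
    for name, scores in score_lists.items():
        for video_id, score in scores.items():
            fused[video_id] = fused.get(video_id, 0.0) + weights[name] * score
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    # Drop results whose fused score falls below the threshold.
    return [(vid, s) for vid, s in ranked if s >= threshold]


# Hypothetical per-metric scores for three videos.
scores = {
    "bow": {"v1": 0.8, "v2": 0.2},
    "ne": {"v1": 0.5, "v3": 0.6},
    "tags": {"v2": 1.0},
}
weights = {"bow": 0.5, "ne": 0.4, "tags": 0.1}
ranking = late_fusion(scores, weights, threshold=0.1)
```

A video missing from one metric's results simply contributes nothing for that metric, so each metric can be run and tuned separately before merging.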
18. ELIS – Multimedia Lab
Evaluation: Search
Unexpected:
• LIUM > LIMSI, even though LIMSI had better language detection
→ due to automatic translation?
• NE + BoW > NE + BoW + Tags
→ Tags give false positives higher rank and find more results, so MRR decreases
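To make the MRR effect concrete, here is a small sketch of mean reciprocal rank, assuming one relevant item per query. The video identifiers and rankings are invented for illustration and are not the evaluation data.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """Return 1/rank of the relevant item, or 0.0 if it is not retrieved."""
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranking, relevant_id) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Invented example: a false positive ranked first pushes the target down.
without_tags = [(["v1", "v2"], "v1")]
with_tags = [(["v9", "v1", "v2"], "v1")]
mrr_without = mean_reciprocal_rank(without_tags)
mrr_with = mean_reciprocal_rank(with_tags)
```

In this toy case a single false positive placed above the target halves the reciprocal rank, even though the target is still retrieved.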
19. ELIS – Multimedia Lab
Evaluation: Search
Run                       Precision @60   Recall @60
1 (LIMSI: BoW+NE)         0.056           0.40
2 (LIUM: BoW+NE)          0.061           0.467
3 (LIMSI: BoW+NE+Tags)    0.054           0.433
4 (LIUM: BoW+NE+Tags)     0.059           0.50
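For reference, precision and recall at a cut-off k can be computed as in the sketch below. It uses the standard rank-based definitions; how the task's official scoring handled the 60 in "@60" is not stated in the slides, and the example data is invented.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Standard precision@k and recall@k over the top-k results."""
    top_k = ranked_ids[:k]
    hits = sum(1 for video_id in top_k if video_id in relevant_ids)
    precision = hits / k  # divides by k even if fewer than k results exist
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Invented example: one of two relevant items appears in the top 4.
precision, recall = precision_recall_at_k(
    ["a", "b", "c", "d"], {"b", "x"}, k=4
)
```

With few relevant items per query, precision stays small at any large cut-off while recall can still be substantial, which matches the shape of the figures in the table.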
20. ELIS – Multimedia Lab
Evaluation: Linking
Run                       MAP (Ground Truth)   MAP (Search results)
LIMSI (BoW + NE)          0.157                0.014
LIUM (BoW + NE)           0.171                0.040
LIMSI (BoW + NE + Tags)   0.157                0.003
LIUM (BoW + NE + Tags)    0.171                0.037
Possible explanations:
• Thresholds optimized for Search task, not for Linking
• User-generated tags vs. extracted tags
… to be investigated!
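The MAP figures above follow the usual mean average precision definition, which can be sketched as below. The rankings and relevance sets are invented for illustration; the function names are not from the slides.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision values taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Average AP over (ranking, relevant_ids) pairs, one per anchor."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Invented example with two anchors.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Relevant links that are never retrieved still count in the denominator, so a threshold that cuts the result list too early lowers MAP directly.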
21. ELIS – Multimedia Lab
Improvements / Future Work
• Better ranking criteria / late fusion
• Improve tag-similarity
• Optimize parameters for linking
22. ELIS – Multimedia Lab
Discussion
These research activities were funded by Ghent University, IBBT, the IWT Flanders, the FWO-Flanders, and the European Union, in the context of the IBBT project Smarter Media in Flanders (SMIF).