NUIG Research Showcase 2014


Poster presented at NUIG Research Showcase 2014

Published in: Technology
Entity Linking with Multiple Knowledge Bases
What is the text talking about?

Bianca Pereira

This project has been funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.

Motivation

Written communication has long been a common way of sharing knowledge between humans. Machines, however, treat natural language text as a sequence of characters without any meaning. When asked about a term (a sequence of characters), a computer can spot that sequence but cannot explain its meaning.

Problem Statement

Natural language texts are hard to understand because of two linguistic features: polysemy and synonymy. Polysemy happens when a single term may be related to more than one concept. Synonymy happens when many terms refer to the same concept.

[Figure text: "This text is not meaningful for machines." Labels: "Jackson"; "NUIG", "National University of Ireland, Galway". Example sentences: "Michael Jackson, the singer of Black or White, died in 2009." and "I started my night watching Copacabana and ended in a party dancing Havana D’Primera."]

Related Work

Humans process the content of a text first by matching its terms with their previous knowledge. In a computer-based environment, this previous knowledge is given by a Knowledge Base. In Computer Science, the task that mimics this matching is called Entity Linking: linking terms in a text with the Knowledge Base entries that represent the same real-world concept. Previous work [1][2] has been successful in linking text with cross-domain Knowledge Bases (e.g. Wikipedia, DBpedia and YAGO).

Challenges

The disambiguation of terms is our key challenge: determining the right concept for each term cited in the text. Since our goal is to use multiple Knowledge Bases, there are two further challenges to address: the processing of Big Data and the heterogeneity in the semantic description of Knowledge Bases.

Proposed Solution

Even big cross-domain Knowledge Bases do not cover all knowledge in the world. Our solution therefore aims to use multiple Knowledge Bases to perform Entity Linking. In other words, we want to enable the use of different sources of concepts. Our approach is based on three main steps: selection of textual features, selection of Knowledge Base features, and use of a Collective Inference algorithm.

When a human reader wants to understand the content of a text, she uses the words around a given term (context words) to determine its meaning. Noun phrases and verbs are the main source of information. Likewise, words appearing near the term are more relevant than those appearing far from it in the text. In a computer-based environment, these features are extracted and used to measure how probable it is that the term cites a given concept in the Knowledge Base.

When analyzing those context words, a human also maps the words in the text to her previous knowledge. This mapping modifies the probability that the term cites one concept rather than another. In a computer-based environment, the relationships between concepts in a Knowledge Base can be used to modify the probability of linking with a given entry.

In the last step, a human relies on the coherence of a text to understand all of its terms. The basic assumption is that terms appearing in a coherent text are somehow related in the previous knowledge of the reader (unless they are concepts introduced by the text). In a computer-based environment, this step aggregates all features and, using the computed probabilities, detects the best linking between each term in the text and its respective concept in the Knowledge Base. This is done through a process called Collective Inference.
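The three steps of the Proposed Solution (context-word features, Knowledge Base relations, Collective Inference) can be illustrated with a minimal Python sketch. The poster does not specify its features, scoring functions or inference algorithm, so everything here is an assumption made for illustration: the toy `KB`, the `CANDIDATES` table, the entry names, the 1/distance weighting, and the exhaustive search used in place of a real Collective Inference algorithm. The restriction to noun phrases and verbs is omitted for brevity.

```python
import re
from itertools import product

# Hypothetical toy Knowledge Base: entry -> descriptive words and related entries.
KB = {
    "mj_singer": {"words": {"singer", "pop"}, "related": {"black_or_white"}},
    "mj_composer": {"words": {"composer", "boogie"}, "related": {"blame_it_on_the_boogie"}},
    "black_or_white": {"words": {"song"}, "related": {"mj_singer"}},
    "blame_it_on_the_boogie": {"words": {"song"}, "related": {"mj_composer"}},
}
# Hypothetical candidate lookup: surface term -> possible KB entries (polysemy).
CANDIDATES = {
    "michael jackson": ["mj_singer", "mj_composer"],
    "black or white": ["black_or_white"],
}

def find_span(tokens, term):
    """Return the token span [start, end) of the first occurrence of the term."""
    words = term.split()
    for i in range(len(tokens) - len(words) + 1):
        if tokens[i:i + len(words)] == words:
            return i, i + len(words)
    raise ValueError(f"term not found: {term}")

def context_weights(tokens, start, end):
    """Step 1: weight context words so that nearer words count more (1/distance)."""
    weights = {}
    for i, tok in enumerate(tokens):
        if start <= i < end:
            continue  # skip the term itself
        distance = start - i if i < start else i - end + 1
        weights[tok] = max(weights.get(tok, 0.0), 1.0 / distance)
    return weights

def local_score(entry, weights):
    """Score an entry by the weighted context words that match its KB words."""
    return sum(weights.get(word, 0.0) for word in KB[entry]["words"])

def relatedness(a, b):
    """Step 2: 1.0 if the KB relates the two entries in either direction."""
    return 1.0 if b in KB[a]["related"] or a in KB[b]["related"] else 0.0

def link(text, terms):
    """Step 3: choose the joint assignment with the best local + coherence score."""
    tokens = re.findall(r"\w+", text.lower())
    weights = [context_weights(tokens, *find_span(tokens, t)) for t in terms]
    best, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[t] for t in terms)):
        score = sum(local_score(c, w) for c, w in zip(combo, weights))
        score += sum(relatedness(a, b)
                     for i, a in enumerate(combo) for b in combo[i + 1:])
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(terms, best))

if __name__ == "__main__":
    sentence = "Michael Jackson, the singer of Black or White, died in 2009."
    print(link(sentence, ["michael jackson", "black or white"]))
    # {'michael jackson': 'mj_singer', 'black or white': 'black_or_white'}
```

The exhaustive search over all candidate combinations grows exponentially with the number of terms, so it only suits a toy example; it is used here because it makes the aggregation of local and coherence scores easy to follow.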
[Figure: the sentence "Michael Jackson, the composer of Blame it on the Boogie, has the same name as the member of Jackson 5." shown with the labels "context words", "singer_of" and "composer_of" and the identifiers 3ddb-4a6a-a2d5-8ec5ecee1c78, af05-4f36-916e-3d57f91ecf5e and af63-4d90-8078-ebed36985fff.]

Main Findings

Not all Knowledge Bases contain textual descriptions for all concepts, contrary to what most previous work assumes.

Is it possible to perform Entity Linking with Knowledge Bases other than the previously used cross-domain ones [3]? How does the method perform when applied to cross-domain ones [4]?

To be continued... (a.k.a. Future Work)

References

[1] Hachey, B., Radford, W., Nothman, J., Honnibal, M., & Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, 130-150.
[2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., … & Weikum, G. (2011, July). Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.
[3] Pereira, B., Aggarwal, N., & Buitelaar, P. (2013, May). AELA: An Adaptive Entity Linking Approach. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 87-88). International World Wide Web Conferences Steering Committee.
[4] EuroSentiment Project. Work Package 4.