NUIG Research Showcase 2014
Poster presented at NUIG Research Showcase 2014

Usage Rights

CC Attribution License

NUIG Research Showcase 2014 Document Transcript

  • 1. Entity Linking with Multiple Knowledge Bases: What is the text talking about?

    Bianca Pereira

    This project has been funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.

    Motivation

    Written communication has long been a common way of sharing knowledge between humans. Machines, however, process natural language text as a sequence of characters without any meaning. When asked about a term (a sequence of characters), a computer can spot that sequence but cannot explain its meaning.

    Proposed Solution

    Even big cross-domain Knowledge Bases do not cover all knowledge in the world. Our solution therefore aims to use multiple Knowledge Bases to perform Entity Linking. In other words, we want to enable the use of different sources of concepts. Our approach is based on three main steps: selection of textual features, selection of Knowledge Base features, and use of a Collective Inference algorithm.

    When a human reader wants to understand the content of a text, she uses the words around a given term (its context words) to determine its meaning. Noun phrases and verbs are the main source of information, and words appearing near the term are more relevant than those appearing far from it in the text. In a computer-based environment, those features are extracted and used to measure how probable it is that a given concept in the Knowledge Base is the one cited by that term.

    When analyzing those context words, a human also maps the words in the text to her previous knowledge. This mapping modifies the probability that the term cites a given concept rather than another one. In a computer-based environment, the relationships between concepts in a Knowledge Base can be used to modify the probability of linking a term with a given entry.

    In the last step, a human uses the coherence of a text to understand all of its terms. The basic assumption is that terms appearing in a coherent text are somehow related in the previous knowledge of the reader (unless they are concepts introduced by the text). In a computer-based environment, this step aggregates all features and, using the computed probabilities, determines the best overall linking between each term in the text and its respective concept in the Knowledge Base. This is done through a process called Collective Inference.

    Problem Statement

    Natural language texts are hard to understand because of two linguistic features: polysemy and synonymy. Polysemy happens when a single term may be related to more than one concept. Synonymy happens when many terms refer to the same concept.

    Related Work

    Humans process the content of a text first by matching its terms with their previous knowledge. In a computer-based environment, this previous knowledge is given by a Knowledge Base. In Computer Science, the process that mimics this matching is called Entity Linking: the task of linking terms in a text with the Knowledge Base entries that represent the same real-world concept. Previous work [1][2] has been successful in linking text with cross-domain Knowledge Bases (e.g. Wikipedia, DBpedia and YAGO).

    Challenges

    The disambiguation of terms is our key challenge, that is, determining the right concept for each term cited in the text. Since our goal is to use multiple Knowledge Bases, there are two further challenges to address: the processing of Big Data and the heterogeneity in the semantic descriptions of Knowledge Bases.

    [Figure text]
    "This text is not meaningful for machines." SOURCE: http://google.com, http://bing.com, http://yahoo.com
    Example terms: "Jackson"; "NUIG" and "National University of Ireland, Galway".
    "Michael Jackson, the singer of Black or White, died in 2009." URLs shown: http://en.wikipedia.org/wiki/Michael_Jackson and http://en.wikipedia.org/wiki/Black_or_White
    "I started my night watching Copacabana and ended in a party dancing Havana D’Primera." (marked "X X")
    "Michael Jackson, the composer of Blame it on the Boogie, has the same name as the member of Jackson 5." (marked "? ?")
    Labels shown: context words, singer_of, composer_of. URLs shown: http://musicbrainz.org/work/8ffc75e5-3ddb-4a6a-a2d5-8ec5ecee1c78, http://musicbrainz.org/artist/f27ec8db-af05-4f36-916e-3d57f91ecf5e and http://musicbrainz.org/artist/059e57d8-af63-4d90-8078-ebed36985fff (same sentence, marked "? ? ?")

    Main Findings

    Not all Knowledge Bases contain textual descriptions for all concepts, as most previous work assumes.

    Is it possible to perform Entity Linking with Knowledge Bases other than the previous cross-domain ones [3]?

    How does the method perform when applied to cross-domain ones [4]?

    To be continued... (a.k.a. Future Work)

    References

    [1] Hachey, B., Radford, W., Nothman, J., Honnibal, M., & Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, 130-150.
    [2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., … & Weikum, G. (2011, July). Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.
    [3] Pereira, B., Aggarwal, N., & Buitelaar, P. (2013, May). AELA: An Adaptive Entity Linking Approach. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 87-88). International World Wide Web Conferences Steering Committee.
    [4] EuroSentiment Project. Work Package 4. http://eurosentiment.eu

    Pictures from http://pixabay.com
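
As a rough illustration of the three steps described under Proposed Solution, the Python sketch below links the terms of the poster's "Blame it on the Boogie" example against a toy candidate table. It is not the poster's actual method: the entry identifiers (`kb1:...`, `kb2:...`), the keyword sets, the relation set, the overlap-based scoring, and the exhaustive search are all assumptions made for illustration, and the two prefixes merely stand in for entries drawn from two different Knowledge Bases.

```python
from itertools import product

# Hypothetical candidate entries per term, pooled from two toy Knowledge Bases.
# All identifiers, keywords and relations below are invented for illustration.
CANDIDATES = {
    "Michael Jackson": ["kb1:mj_singer", "kb2:mj_composer"],
    "Blame it on the Boogie": ["kb2:blame_it_on_the_boogie"],
}
KEYWORDS = {
    "kb1:mj_singer": {"singer", "pop", "died"},
    "kb2:mj_composer": {"composer", "songwriter"},
    "kb2:blame_it_on_the_boogie": {"song", "composer", "boogie"},
}
RELATIONS = {
    ("kb2:mj_composer", "kb2:blame_it_on_the_boogie"),  # composer_of
}


def local_score(entry, context_words):
    """Step 1 (textual features): share of the entry's keywords seen in the context."""
    keywords = KEYWORDS[entry]
    return len(keywords & context_words) / len(keywords)


def related(a, b):
    """Step 2 (Knowledge Base features): 1.0 if the two entries are related, else 0.0."""
    return 1.0 if (a, b) in RELATIONS or (b, a) in RELATIONS else 0.0


def link(terms, context_words, coherence_weight=1.0):
    """Step 3 (collective inference, brute force): best joint assignment of entries."""
    best, best_score = None, float("-inf")
    for assignment in product(*(CANDIDATES[t] for t in terms)):
        # Sum of per-term context scores ...
        score = sum(local_score(e, context_words) for e in assignment)
        # ... plus a coherence bonus for every related pair of chosen entries.
        score += coherence_weight * sum(
            related(a, b)
            for i, a in enumerate(assignment)
            for b in assignment[i + 1:]
        )
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(terms, best))


text = ("Michael Jackson, the composer of Blame it on the Boogie, "
        "has the same name of the member of Jackson 5.")
context = {w.strip(".,").lower() for w in text.split()}
links = link(["Michael Jackson", "Blame it on the Boogie"], context)
print(links)
```

In this toy setting the composer entry wins for "Michael Jackson" because it matches the context word "composer" and is related to the only candidate for the song. Enumerating every joint assignment is only workable for a handful of terms; a realistic system would need an approximate inference strategy and real Knowledge Base lookups.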