Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
Transcript

  • 1. Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation. Panos Alexopoulos, Carlos Ruiz, Jose Manuel Gomez Perez. 1st Semantic Web and Information Extraction Workshop, Galway, Ireland, October 9th, 2012.
  • 2. Agenda
      ● Introduction
          ● Problem Definition and Paper Focus
          ● Approach Overview and Rationale
      ● Proposed Disambiguation Framework
          ● Disambiguation Evidence Model
          ● Entity Disambiguation Process
      ● Framework Evaluation
          ● Evaluation Process
          ● Evaluation Results
      ● Conclusions and Future Work
  • 3. Introduction: Problem Definition (Entity Resolution & Disambiguation)
      ● Named entity resolution involves detecting mentions of named entities (e.g. people, organizations or locations) within texts and mapping them to their corresponding entities in a given knowledge source.
      ● One important challenge in this task is the correct disambiguation of the detected entities.
      ● For example:
          ● “Siege of Tripolitsa took place in Tripoli with Theodoros Kolokotronis being the leader of the Greeks. This event marked an early victory for the fight for independence from Turkey but it was also a massacre against the Muslim and Jewish population of the city.”
          ● The term “Tripoli” here refers to http://dbpedia.org/resource/Tripoli,_Greece but it can be confused with, for example, Tripoli in Libya or the one in Lebanon.
  • 4. Introduction: Disambiguation Approaches
      ● The majority of disambiguation approaches rely on the strong contextual hypothesis that terms with similar meanings are often used in similar contexts.
      ● The role of these contexts is typically played by already annotated documents (e.g. Wikipedia articles), which are used to train term classifiers.
      ● These classifiers link a term to its correct meaning entity, based on the similarity between the term’s textual context and the contexts of its potential entities.
      ● Some more recent approaches use semantic structures to determine this similarity in a semantic way.
      ● The effectiveness of these latter approaches is highly dependent on:
          ● The availability of comprehensive semantic information.
          ● The degree of alignment between the content of the texts to be disambiguated and the semantic data to be used.
  • 5. Introduction: Alignment and its Importance
      ● Alignment means that the ontology’s elements should cover the domain(s) of the texts to be disambiguated but should not contain additional elements that:
          ● Do not belong to the domain.
          ● Do belong to it but do not appear in the texts.
      ● For example, assume the text “Ronaldo scored two goals for Real Madrid” from a contemporary football match description.
      ● To disambiguate the term “Ronaldo” using an ontology, the only contextual evidence that can be used is the entity “Real Madrid”.
      ● Yet there are two players with that name who are semantically related to Real:
          ● Cristiano Ronaldo (current player)
          ● Ronaldo Luis Nazario de Lima (former player)
      ● This means that if both relations are considered, the term will not be disambiguated.
  • 6. Introduction: Towards Better Alignment
      ● In the previous example, the fact that the text describes a contemporary football match suggests that, in general, the relation between a team and its former players is not expected to appear in it.
      ● Thus, for such texts, it would make sense to ignore this relation in order to achieve more accurate disambiguation.
      ● Based on this observation, we make two claims:
          ● That there are certain scenarios where a priori knowledge is available about which entities and relations are expected to be present in the text.
          ● That this knowledge can be exploited for better alignment between semantic information and content, leading to more effective disambiguation.
      ● To verify these claims, we define an entity disambiguation framework that can perform better disambiguation in such scenarios.
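The Ronaldo example can be made concrete with a minimal Python sketch. The triples and the relation names `currentTeam` and `formerTeam` are illustrative assumptions, not actual DBpedia properties; the point is only that restricting the allowed relations changes how many candidates the evidence "Real Madrid" supports.

```python
# Toy triples (subject, relation, object). Relation names are hypothetical.
TRIPLES = [
    ("Cristiano Ronaldo", "currentTeam", "Real Madrid"),
    ("Ronaldo Luis Nazario de Lima", "formerTeam", "Real Madrid"),
]


def candidates(evidence_entity, allowed_relations):
    """Return the entities linked to evidence_entity through an allowed relation."""
    return sorted(
        subj
        for subj, rel, obj in TRIPLES
        if obj == evidence_entity and rel in allowed_relations
    )


# Considering both relations leaves "Ronaldo" ambiguous (two candidates).
both = candidates("Real Madrid", {"currentTeam", "formerTeam"})

# Ignoring the former-player relation, as suggested for contemporary
# match reports, leaves a single candidate.
current_only = candidates("Real Madrid", {"currentTeam"})

print(both)
print(current_only)
```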
  • 7. Proposed Framework: Approach
      ● We target the task of entity disambiguation based on the intuition that a given ontological entity is more likely to represent the meaning of an ambiguous term when the text contains many entities that are ontologically related to it.
      ● E.g. in the example text, the entities “Siege of Tripolitsa” and “Theodoros Kolokotronis” indicate that the term “Tripoli” refers to the city in Greece.
      ● These evidential entities are derived from one or more domain ontologies.
      ● However, which entities may serve as evidence in a given application scenario, and to what extent, depends on the domain and expected content of the texts.
      ● For that reason, the key ability our framework provides to its users is to construct, in a semi-automatic manner, semantic evidence models for specific disambiguation scenarios and to use them to perform entity disambiguation within those scenarios.
  • 8. Proposed Framework: Framework Components
      ● A Disambiguation Evidence Model that contains the semantic entities that may serve as disambiguation evidence for the target entities of the given scenario.
          ● Each pair of a target entity and an evidential one is accompanied by a degree that quantifies the latter’s evidential power for the given target entity.
      ● A Disambiguation Evidence Model Construction Process that builds, in a semi-automatic manner, a disambiguation evidence model for a given scenario.
      ● An Entity Disambiguation Process that uses the evidence model to detect and extract from a given text the terms that refer to the scenario’s target entities.
          ● Each term is linked to one or more possible entity URIs, along with a confidence score calculated for each of them.
          ● The entity with the highest confidence should be the one the term actually refers to.
  • 9. Proposed Framework: Disambiguation Evidence Model
      ● Defines, for each ontology entity, which other instances should be used as evidence towards its correct meaning interpretation, and to what extent.
      ● It consists of entity pairs where a particular entity provides quantified evidence for another one.
  • 10. Proposed Framework: Evidence Model Construction
      ● Construction of the evidence model depends on the characteristics of the domain and the texts.
      ● The first step of the construction is manual and involves:
          ● The identification of the concepts whose instances we wish to disambiguate (e.g. locations).
          ● The determination, for each of these concepts, of the related concepts whose instances may serve as contextual disambiguation evidence.
              ● For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons.
          ● The identification, for each pair of evidence and target concept, of the relation paths that link them.
  • 11. Proposed Framework: Evidence Model Construction
      ● The result of this first step is a table like the following ones: (the tables shown on the slide are not included in this transcript)
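Since the slide's tables are not in the transcript, the following is only a sketch of how the output of the manual step could be recorded: one row per target concept, evidence concept, and relation path. The class name `EvidenceRule`, the concept names, and the relation paths are all hypothetical; only the choice of evidence concepts for locations (related locations, historical events, historical groups and persons) follows the example on slide 10.

```python
from typing import NamedTuple, Tuple


class EvidenceRule(NamedTuple):
    """One manually specified row: which concept gives evidence for which target."""
    target_concept: str
    evidence_concept: str
    relation_path: Tuple[str, ...]  # ordered property names from evidence to target


# Hypothetical rows for texts describing historical events.
RULES = [
    EvidenceRule("Location", "HistoricalEvent", ("place",)),
    EvidenceRule("Location", "Person", ("participatedIn", "place")),
    EvidenceRule("Location", "Location", ("isPartOf",)),
]


def rules_for(target_concept):
    """Return all rows whose target concept matches."""
    return [rule for rule in RULES if rule.target_concept == target_concept]


for rule in rules_for("Location"):
    print(rule.evidence_concept, "->", rule.target_concept, "via", rule.relation_path)
```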
  • 12. Proposed Framework: Evidence Model Construction
      ● Based on these tables, the second step of the construction is automatic and involves the generation of the target-evidence entity pairs, each with a disambiguation evidential strength.
      ● This strength is inversely proportional to the number of different same-name target entities a given evidential entity provides evidence for.
      ● For example, “Getafe” provides evidence for “Pedro Leon” with a strength of 0.5 because Getafe has another player called Pedro.
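A minimal sketch of the strength computation, reproducing the Getafe example. It assumes that two target entities are "same-name" when they share at least one surface name, and that the strength is exactly 1 divided by the number of such entities; the slides state only the inverse proportionality. The second Getafe player is a placeholder, and `evidential_strengths` is a hypothetical helper name.

```python
from collections import defaultdict

# (target entity, surface names it may be mentioned by, evidential entity).
# Toy data; "Pedro (other Getafe player)" is a placeholder.
RELATED = [
    ("Pedro Leon", {"Pedro Leon", "Pedro"}, "Getafe"),
    ("Pedro (other Getafe player)", {"Pedro"}, "Getafe"),
    ("Lionel Messi", {"Lionel Messi", "Messi"}, "Barcelona"),
]


def evidential_strengths(related):
    """Map (target, evidence) to 1 / number of same-name targets of that evidence."""
    by_evidence = defaultdict(list)
    for target, names, evidence in related:
        by_evidence[evidence].append((target, names))

    strengths = {}
    for evidence, targets in by_evidence.items():
        for target, names in targets:
            # Count targets of this evidence sharing a name with `target`
            # (the target itself is always counted).
            same_name = sum(1 for _, other in targets if names & other)
            strengths[(target, evidence)] = 1.0 / same_name
    return strengths


strengths = evidential_strengths(RELATED)
print(strengths[("Pedro Leon", "Getafe")])
print(strengths[("Lionel Messi", "Barcelona")])
```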
  • 13. Proposed Framework: Entity Resolution Process
      ● Step 1: We extract from the text the terms that possibly refer to the target entities as well as those that refer to their respective evidential entities.
          ● Extraction is performed with Knowledge Tagger, an in-house tool based on GATE.
      ● Step 2: Using the evidential entities, we compute for each extracted target entity term the confidence that it refers to a particular target entity.
          ● The target entity with the highest confidence is expected to be the correct one.
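Step 2 can be sketched as follows. The slides do not give the confidence formula, so this sketch assumes a simple one: sum the strengths of the evidential entities found in the text for each candidate, then normalize across candidates. The model entries, their strengths, and the function name `confidence_scores` are illustrative assumptions, and the entity extraction of Step 1 is taken as already done.

```python
# Evidence model: (target entity, evidential entity) -> strength.
# Entries and strengths are illustrative only.
MODEL = {
    ("Tripoli, Greece", "Siege of Tripolitsa"): 1.0,
    ("Tripoli, Greece", "Theodoros Kolokotronis"): 1.0,
    ("Tripoli, Libya", "Libya"): 1.0,
}


def confidence_scores(candidates, evidence_in_text, model):
    """Score each candidate by summed evidence strength, normalized to sum to 1."""
    raw = {
        cand: sum(model.get((cand, ev), 0.0) for ev in evidence_in_text)
        for cand in candidates
    }
    total = sum(raw.values())
    if total == 0:
        # No evidence found: every candidate gets zero confidence.
        return {cand: 0.0 for cand in candidates}
    return {cand: score / total for cand, score in raw.items()}


candidates = ["Tripoli, Greece", "Tripoli, Libya", "Tripoli, Lebanon"]
evidence = {"Siege of Tripolitsa", "Theodoros Kolokotronis"}

scores = confidence_scores(candidates, evidence, MODEL)
best = max(scores, key=scores.get)
print(best, scores)
```

With the evidential entities from the Tripoli example text, only the Greek city accumulates any evidence, so it is selected.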
  • 14. Proposed Framework: Disambiguation Example
      ● Correct disambiguation of the term “Atletico” (the figure shown on the slide is not included in this transcript)
  • 15. Framework Evaluation: Evaluation Process (Description)
      ● Two disambiguation scenarios:
          ● Football match descriptions.
          ● Texts describing military conflicts.
      ● DBpedia as the source of semantic information in both cases.
      ● Disambiguation effectiveness measured through precision and recall.
      ● Evaluation results were compared to those achieved by two publicly available semantic annotation and disambiguation systems:
          ● DBpedia Spotlight
          ● AIDA
      ● The two systems:
          ● Also use DBpedia as a knowledge source.
          ● Let users select the classes whose instances are to be included in the process.
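For reference, precision and recall over disambiguated mentions can be computed as in the sketch below. It assumes the standard definitions (correct links divided by links produced, and correct links divided by gold links); the slides do not spell out how mentions were counted, and the mention ids and URIs here are made-up toy data.

```python
def precision_recall(predicted, gold):
    """predicted and gold map a mention id to an entity URI."""
    correct = sum(1 for mention, uri in predicted.items() if gold.get(mention) == uri)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: four gold mentions, three system links, two of them correct.
gold = {"m1": "ex:A", "m2": "ex:B", "m3": "ex:C", "m4": "ex:D"}
predicted = {"m1": "ex:A", "m2": "ex:B", "m3": "ex:X"}

precision, recall = precision_recall(predicted, gold)
print(round(precision, 3), round(recall, 3))
```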
  • 16. Framework Evaluation: Evaluation Results (Football Match Descriptions Scenario)
      ● 50 texts describing football matches.
      ● E.g. “It’s the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real.”
      ● Disambiguation Results (the results table shown on the slide is not included in this transcript)
  • 17. Framework Evaluation: Evaluation Results (Military Conflict Texts Scenario)
      ● 50 historical texts describing military conflicts.
      ● E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South.”
      ● Disambiguation Results (the results table shown on the slide is not included in this transcript)
  • 18. Conclusions and Future Work: Key Points
      ● We proposed a novel framework for optimizing named entity disambiguation in well-defined and adequately constrained scenarios through the customized selection and exploitation of semantic data.
      ● Our purpose was not to build another generic disambiguation system but rather a reusable framework that can:
          ● Be relatively easily adapted to the particular characteristics of the domain and application scenario at hand.
          ● Exploit these characteristics to increase the effectiveness of the disambiguation process.
      ● The key aspect of the framework is the semi-automatic process it defines for selecting the optimal evidence model for the scenario at hand.
  • 19. Conclusions and Future Work: Key Points
      ● Comparative evaluation in two specific scenarios verified the framework’s superiority over existing approaches that are designed to work in open domains and unconstrained scenarios.
      ● This verified our hypothesis that the scenario adaptation capabilities of such generic disambiguation systems can be inadequate in certain scenarios.
      ● Of course, the framework’s usability and effectiveness depend directly on the content specificity of the texts to be disambiguated and on the availability and quality of a priori semantic knowledge about their content.
          ● The greater these two parameters are, the more applicable our approach is and the more effective the disambiguation is expected to be.
          ● The opposite is true as the texts become more generic and the information we have about them scarcer.
  • 20. Conclusions and Future Work: Framework Extensions
      ● Fully automated construction of the disambiguation evidence model.
          ● The challenge here is how to automatically identify the text’s domain/topic.
      ● Combination with statistical methods for cases where the available domain semantic information is incomplete.
          ● The challenge here is how to select the optimal ratio of ontological vs. statistical evidence.
      ● Development of a tool that enables users to dynamically build such models out of existing semantic data and use them for disambiguation purposes.
  • 21. Thank you! Questions?
      ● Contact: iSOCO, Dr. Panos Alexopoulos, Senior Researcher, palexopoulos@isoco.com
      ● Barcelona: Tel +34 935 677 200. Edificio Testa A, C/ Alcalde Barnils, 64-68, St. Cugat del Vallès, 08174 Barcelona
      ● Madrid: Tel +34 913 349 797. Av. del Partenón, 16-18, 1º7ª, Campo de las Naciones, 28042 Madrid
      ● Pamplona: Tel +34 948 102 408. Parque Tomás Caballero, 2, 6º-4ª, 31006 Pamplona
      ● Valencia: Tel +34 963 467 143. Oficina 107, C/ Prof. Beltrán Báguena, 4, 46009 Valencia
