
SANAPHOR: Ontology-based Coreference Resolution

Presented at International Semantic Web Conference, 2015.
Bethlehem, PA.

1. SANAPHOR: Ontology-Based Coreference Resolution. Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Difallah and Philippe Cudré-Mauroux. eXascale Infolab, University of Fribourg, Switzerland. October 14th, ISWC’15, Bethlehem PA, USA.
2. Motivations and Task Overview. Task: identify groups (clusters) of co-referring mentions. Example: “Xi Jinping was due to arrive in Washington for a dinner with Barack Obama on Thursday night, in which he will aim to reassure the US president about a rising China. The Chinese president said he favors a ‘new model of major country relationship’ built on understanding, rather than suspicion.” (http://www.telegraph.co.uk/) Benefits: • identification of the specific type of an unknown entity • extraction of more relationships between named entities
3. State of the Art in Coreference Resolution. The best approaches use a generic multi-step algorithm: 1. Pre-processing (POS tagging, parsing, NER) 2. Identification of referring expressions (e.g., pronouns) 3. Anaphoricity determination (“it rains” vs. “he took it”) 4. Generation of antecedent candidates 5. Searching/clustering of candidates. [Lee et al., Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task]
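As a rough illustration of how these five stages fit together, here is a hypothetical pipeline skeleton; every stage function is a placeholder supplied by the caller, not the Stanford sieve implementation.

```python
# Hypothetical skeleton of the generic multi-step coreference pipeline described
# on the slide above; it only shows how the stages compose, not how any
# particular system implements them.
def coreference_pipeline(text, preprocess, find_mentions,
                         is_anaphoric, candidate_antecedents, cluster):
    doc = preprocess(text)                                    # 1. POS tagging, parsing, NER
    mentions = find_mentions(doc)                             # 2. referring expressions
    anaphors = [m for m in mentions if is_anaphoric(m, doc)]  # 3. anaphoricity determination
    candidates = {m: candidate_antecedents(m, doc, mentions)
                  for m in anaphors}                          # 4. antecedent candidates
    return cluster(candidates)                                # 5. search / clustering
```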
4. Motivations for a rich semantic layer. “Xi Jinping was due to arrive in Washington for a dinner with Barack Obama on Thursday night, in which he will aim to reassure the US president about a rising China. The Chinese president said he favors a ‘new model of major country relationship’ built on understanding, rather than suspicion.” (http://www.telegraph.co.uk/) Purely syntactic approaches are not able to differentiate between the names of the city and the province.
5. Semantic layer on top of an existing system. [Diagram: documents are fed to Stanford Coref (deterministic coreference resolution), which outputs mention clusters such as [US President], [Barack Obama], [Australia], [Quintex Australia], [Quintex ltd.]; the semantic layer operates on this output.]
6. Generic overview of the approach. Key technique: split and merge clusters based on their semantics. [Diagram: clusters produced by Stanford Coref → entity/type linking → split clusters → merge clusters → SANAPHOR output.]
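A minimal sketch of this post-processing flow, with the annotation, split and merge steps passed in as functions; all names here are hypothetical, and concrete sketches of the individual steps follow the corresponding slides below.

```python
# Hypothetical high-level skeleton of the split-and-merge post-processing:
# take clusters from an existing coreference system, attach entity/type
# annotations to every mention, then split and merge based on them.
def semantic_postprocess(clusters, annotate, split_clusters, merge_clusters):
    """clusters: list of lists of mention strings (e.g. from Stanford Coref);
    annotate(mention) -> entity/type annotation or None."""
    annotated = [[(m, annotate(m)) for m in cluster] for cluster in clusters]
    return merge_clusters(split_clusters(annotated))
```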
7. Pre-Processing: Entity Linking. [Diagram: the mentions US President, Barack Obama, Australia, Quintex Australia and Quintex ltd. go through entity linking; Barack Obama is linked to e1, Australia to e2, Quintex Australia and Quintex ltd. both to e3, while US President remains unlinked.]
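A toy sketch of the entity-linking step, using a hand-made surface-form index in place of a real entity linker; the mappings mirror the slide's example and are purely illustrative.

```python
# Toy entity linking: look mentions up in a surface-form index and attach an
# entity id when one is found. The index is a hand-made stand-in for a real linker.
SURFACE_FORMS = {
    "barack obama": "e1:Barack_Obama",
    "australia": "e2:Australia",
    "quintex australia": "e3:Quintex_Australia",
    "quintex ltd.": "e3:Quintex_Australia",  # both surface forms resolve to e3
}

def link_entity(mention):
    """Return an entity id for the mention, or None if it cannot be linked."""
    return SURFACE_FORMS.get(mention.lower().strip())

mentions = ["US President", "Barack Obama", "Australia",
            "Quintex Australia", "Quintex ltd."]
print({m: link_entity(m) for m in mentions})
# "US President" stays unlinked here; it is handled by semantic typing (next slide).
```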
8. Pre-Processing: Semantic Typing. Recognized entities are typed; other mentions are typed by string similarity with YAGO. [Diagram: via the YAGO index, the unlinked mention US President receives the type t1: US President, next to the already linked e1: Barack Obama.]
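A rough sketch of typing by string similarity, assuming Python's difflib as the similarity measure and a tiny hand-made label list standing in for the YAGO index; the labels and threshold are illustrative assumptions.

```python
# Type unlinked mentions by string similarity against type labels.
import difflib

TYPE_LABELS = {
    "us president": "t1:US_President",
    "airline": "t2:Airline",
}

def link_type(mention, threshold=0.6):
    """Return the most string-similar type for a mention, or None below threshold."""
    best_id, best_score = None, 0.0
    for label, type_id in TYPE_LABELS.items():
        score = difflib.SequenceMatcher(None, mention.lower(), label).ratio()
        if score > best_score:
            best_id, best_score = type_id, score
    return best_id if best_score >= threshold else None

print(link_type("the US president"))  # -> t1:US_President
print(link_type("a rising China"))    # -> None (no sufficiently similar label)
```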
9. Cluster splits. Entity- and type-based splitting of clusters. [Diagram: the cluster {e2: Australia, e3: Quintex Australia, e3: Quintex ltd.} is split into {e2: Australia} and {e3: Quintex Australia, e3: Quintex ltd.}.]
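An illustrative version of entity-based splitting; type-based splitting works the same way on type ids. The data layout is an assumption for the sketch, not the actual SANAPHOR code.

```python
# Split a cluster so that mentions linked to different entities end up in
# different sub-clusters; unlinked mentions are returned separately and are
# re-assigned by the heuristics on the next slide.
from collections import defaultdict

def split_cluster(cluster):
    """cluster: list of (mention, entity_id_or_None) pairs."""
    groups, unlinked = defaultdict(list), []
    for mention, entity in cluster:
        (groups[entity] if entity else unlinked).append(mention)
    return list(groups.values()), unlinked

cluster = [("Australia", "e2"), ("Quintex Australia", "e3"), ("Quintex ltd.", "e3")]
print(split_cluster(cluster))
# -> ([['Australia'], ['Quintex Australia', 'Quintex ltd.']], [])
```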
10. Cluster splits: heuristics. 1. Non-identified mention assignment, based on exclusive words in each cluster: Obama ⇒ Barack Obama, Jinping ⇒ Xi Jinping. 2. Ignore mentions that are complete subsets of other identified mentions: ✕ Aspen (“Aspen Airways”), ✕ Obama (“Barack Obama”).
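A sketch of heuristic 1 (exclusive-word assignment of non-identified mentions); the helper is hypothetical, and heuristic 2 would additionally skip mentions whose words are a complete subset of another identified mention.

```python
# Assign an unlinked mention to the sub-cluster that exclusively shares one of
# its words; if its words occur in several sub-clusters, leave it unassigned.
def assign_unlinked(mention, sub_clusters):
    """sub_clusters: list of lists of mention strings; returns an index or None."""
    words = set(mention.lower().split())
    matches = [i for i, ms in enumerate(sub_clusters)
               if words & {w for m in ms for w in m.lower().split()}]
    return matches[0] if len(matches) == 1 else None

subs = [["Barack Obama", "the US president"], ["Xi Jinping", "the Chinese president"]]
print(assign_unlinked("Obama", subs))      # -> 0    (word exclusive to the first cluster)
print(assign_unlinked("president", subs))  # -> None (word appears in both clusters)
```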
11. Cluster merges. Merge different clusters that contain the same types/entities. [Diagram: the cluster {t1: US President, e1: Barack Obama} absorbs the separate clusters (e1: Barack Obama) and (t1: US President).]
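An illustrative single-pass version of the merge step; the data layout is an assumption, and longer chains of shared keys would need a second pass or a union-find in a real implementation.

```python
# Merge clusters that share at least one entity or type annotation.
def merge_clusters(clusters):
    """clusters: list of (mentions, keys) pairs, keys being a set of entity/type ids."""
    merged = []
    for mentions, keys in clusters:
        for entry in merged:
            if entry[1] & keys:           # shared entity or type -> same referent
                entry[0].extend(mentions)
                entry[1] |= keys
                break
        else:
            merged.append([list(mentions), set(keys)])
    return merged

clusters = [
    (["Barack Obama", "the US president"], {"e1", "t1"}),
    (["Obama"], {"e1"}),
    (["The American president"], {"t1"}),
]
print(merge_clusters(clusters))  # -> a single cluster holding all four mentions
```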
12. Evaluation. CoNLL-2012 Shared Task on Coreference Resolution: • over 1M words • 3 parts: development, training and test; methods are designed on the development set and evaluated on the test set. Metrics: • Precision/Recall/F1 for the clustering case • noun-only clusters (no pronouns) are evaluated separately.
13. Cluster linking statistics. Total clusters (Stanford Coref): 5078.

                        0 entities   1 entity   2 entities   3 entities
   All clusters               4175        849           49            5
   Noun-only clusters         1208        502           33            2

                        To be merged   To be split
   All clusters                  270           118
   Noun-only clusters             77            52
14. Cluster optimization results. • The system improves on top of Stanford Coref in both the split and merge tasks. • The improvement on the split task is greater for noun-only clusters, since we do not re-assign pronouns.
15. Conclusions. • Leveraging semantic information improves coreference resolution on top of existing NLP systems. • Performance improves further as entity and type linking improve. • Complete evaluation code is available at: https://github.com/xi-lab/sanaphor Roman Prokofyev (@rprokofyev), eXascale Infolab (exascale.info), University of Fribourg, Switzerland. http://www.slideshare.net/eXascaleInfolab/
16. Anaphora vs. Coreference. “Do you have a cat? I love them.” Here “a cat” is not an antecedent of “them”.
17. Metrics. • True positive (TP): two similar documents assigned to the same cluster. • True negative (TN): two dissimilar documents assigned to different clusters. • False positive (FP): two dissimilar documents assigned to the same cluster. • False negative (FN): two similar documents assigned to different clusters.
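Applied to mention pairs, these definitions give a simple pairwise precision/recall/F1 scorer; the sketch below is illustrative and is not the official CoNLL scorer (which combines MUC, B³ and CEAF).

```python
# Pairwise clustering metrics: each pair of mentions counts as TP/FP/FN/TN
# depending on whether the system and the gold standard cluster them together.
from itertools import combinations

def pairwise_prf(gold, system):
    """gold, system: dicts mapping each mention to a cluster id."""
    tp = fp = fn = 0
    for a, b in combinations(gold, 2):
        same_gold, same_sys = gold[a] == gold[b], system[a] == system[b]
        if same_gold and same_sys:
            tp += 1
        elif same_sys:
            fp += 1
        elif same_gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"Obama": 1, "Barack Obama": 1, "the US president": 1, "Xi Jinping": 2}
system = {"Obama": 1, "Barack Obama": 1, "the US president": 2, "Xi Jinping": 2}
print(pairwise_prf(gold, system))  # -> precision 0.5, recall ~0.33, F1 0.4
```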
