
Context-Enhanced Adaptive Entity Linking


"Context-Enhanced Adaptive Entity Linking" talk given at LREC2016



  1. Context-Enhanced Adaptive Entity Linking. @giusepperizzo. F. Ilievski, G. Rizzo, M. van Erp, J. Plu, R. Troncy
  2. http://babelfy.org, as of 2016-05-24
  3. Current approaches performing the NEL task:
     Linguistic approach: a text is parsed by a NER classifier; entity labels are used to look up resources in a referent KB; a ranking function is used to select the best match (relatedness, semantic similarity).
     End-to-end approach: a dictionary of mentions and links is built from a referent KB; the text is split into n-grams that are used to look up candidate links from the dictionary; a selection function is used to pick the best match (relatedness, semantic similarity, relevance).
     Hybrid approach: a combination of both.
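The end-to-end approach above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: `DICTIONARY` stands in for a mention dictionary built from a referent KB, and the text is split into n-grams that are looked up for candidate links.

```python
# Minimal sketch of the end-to-end approach: look up every n-gram of the
# text in a mention dictionary built from a referent KB. The dictionary
# entries here are illustrative assumptions; a real selection function
# would then rank the returned candidates.

def ngrams(tokens, max_n=3):
    """Yield all n-grams (as strings) from longest to shortest."""
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

# Toy mention -> candidate-links dictionary (illustrative entries only).
DICTIONARY = {
    "lincoln": ["db:Abraham_Lincoln", "db:Lincoln_Motor_Company"],
    "lincoln motor company": ["db:Lincoln_Motor_Company"],
}

def candidate_links(text):
    """Map every n-gram found in the dictionary to its candidate links."""
    tokens = text.lower().split()
    return {g: DICTIONARY[g] for g in ngrams(tokens) if g in DICTIONARY}
```

A selection function would then pick one candidate per matched n-gram, e.g. by relatedness to the other matches.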
  4. Ranking and selection of the candidate links are led by the relatedness of the entities in the knowledge base. Example: Henry Leland (db:Henry_M._Leland), Lincoln Motor Company (db:Lincoln_Motor_Company), Cadillac (db:Cadillac), Abraham Lincoln (db:Abraham_Lincoln); the mentions "Joe" and "Lincoln" remain ambiguous. When the context is poor, head entities are favoured.
  5. “Henry Leland … formed the Lincoln Motor Company... Joe drove a Lincoln for the first time in his life”. Expected annotations: Joe → PER, NIL; Lincoln → PRO, db:Lincoln_Motor_Company
  6. Pipeline: text as input → General-purpose Hybrid Annotator (Mention Extraction → Candidate Selection → Resolution and Classification) → Reranking with Context → output annotations, e.g. Joe → PER, NIL; Lincoln → PRO, db:Lincoln_Motor_Company
  7. General-purpose Hybrid Annotator (text as input):
     Mention Extraction: Named Entity Recognition; Proper Noun Extractor
     Candidate Selection: dictionary fuzzy match; entity popularity
     Resolution and Classification: longest match; entity type propagation to the longest match
     Output: each extracted entity e1, …, en with its candidate links ci,1, …, ci,10
  8. General-purpose Hybrid Annotator (I). Mention extraction: proper nouns as classified by the Stanford POS Tagger (trained with the english-bidirectional-distsim model) and named entities as classified by the Stanford NERClassifierCombiner (trained with the CoNLL 2003, MUC 6, and MUC 7 corpora). Resolution and typing: when one mention is a substring of another, we take the longest, e.g. POS: (United States, NNPS) + NER: (United States of America, PLACE) → (United States of America, PLACE). When a part of one mention is a substring of another, we merge them to create a new one, e.g. POS: (United States, NNPS) + NER: (States of America, PLACE) → (United States of America, PLACE). Plu et al., Revealing Entities from Textual Documents Using a Hybrid Approach, NLP & DBpedia 2015 (ISWC'15).
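The two resolution rules above can be sketched as follows. Mentions are simplified to (text, type) pairs, and the string-based overlap detection is an assumption of the sketch (a real implementation works over token offsets):

```python
# Sketch of the resolution-and-typing rules: substring containment keeps
# the longest mention; partial overlap merges the two into a new one.

def resolve(pos_mention, ner_mention):
    """Combine a POS mention and a NER mention into one typed mention."""
    pos_text, _ = pos_mention
    ner_text, ner_type = ner_mention
    # Rule 1: one mention is a substring of the other -> keep the longest.
    if pos_text in ner_text:
        return (ner_text, ner_type)
    if ner_text in pos_text:
        return (pos_text, ner_type)
    # Rule 2: a part of one mention is a substring of the other ->
    # merge the two into a new, longer mention.
    for i in range(min(len(pos_text), len(ner_text)), 0, -1):
        if pos_text.endswith(ner_text[:i]):
            return (pos_text + ner_text[i:], ner_type)
    return ner_mention  # no overlap: fall back to the NER mention
```

In both rules the NER type is propagated to the resulting (longest) mention, matching the "entity type propagation to the longest match" step of the annotator.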
  9. General-purpose Hybrid Annotator (II). Candidate selection: fuzzy string match over an index based on DBpedia 2015-04. NIL clustering when no candidates are found: exact match of labels within the boundaries of a sentence. Candidate ranking if multiple candidates are found, with:
     r(l): the score of the label l
     L: the Levenshtein distance
     m: the extracted mention
     title: the title of the label l
     R: the set of redirect pages associated to the label l
     D: the set of disambiguation pages associated to the label l
     PR: the PageRank associated to the label l
     a, b, c: weights such that a > b > c and a + b + c = 1
  10. Reranking with Context: takes the candidate lists e1: c1,1, …, c1,10; …; en: cn,1, …, cn,10 produced by the annotator and reorders them; it may also introduce a new entity en+1 with candidate cn+1.
  11. Reranking with Context. Aim: adapt the linking task to the textual content being analysed. Approach: leverage the genre and topic-domain information about the text. Apply: 4 heuristics (H1, H2, H3, H4) in cascade; they take the form of binary rules.
  12. H1: Order of Processing. Process the running text sequentially, starting from the first sentence; process the title at the end. Reasoning: the title is typically ambiguous/catchy, while the first sentences of an article are written most explicitly.
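H1 amounts to a simple reordering of the text units. In this sketch an article is assumed to be a dict with a "title" and a list of body "sentences" (a representation chosen for illustration, not taken from the talk):

```python
# Minimal sketch of H1: process body sentences in document order and
# queue the (typically ambiguous) title last. The article structure is
# an assumption of the sketch.

def h1_processing_order(article):
    """Return the text units in H1 order: body first, title last."""
    return list(article["sentences"]) + [article["title"]]
```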
  13. H2: Coherence. Detect whether an entity is co-referential (an abbreviation or a substring) with an entity that occurs earlier in the same news article. Reasoning: once the writer has clearly introduced an entity, she can use abbreviations or more ambiguous ways to refer to it later in the text.
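The substring and abbreviation checks of H2 can be sketched as follows; the helper names and the first-letter abbreviation rule are assumptions of the sketch:

```python
# Hedged sketch of H2: a later mention is considered co-referential with
# an earlier one when it is a substring of it or matches its initials.

def initials(text):
    """Build an abbreviation from the first letter of each word."""
    return "".join(word[0] for word in text.split()).upper()

def coreferent_with(mention, earlier_mentions):
    """Return the earlier mention this one corefers with, or None."""
    for prev in earlier_mentions:
        if mention != prev and (mention in prev or mention == initials(prev)):
            return prev
    return None
```

When a match is found, the later mention inherits the link of the earlier, fully introduced entity.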
  14. H3: Domain Relevance. Use a contextual knowledge base to examine whether a mention has been frequently and dominantly associated with a certain entity within a domain. Reasoning: it is customary that the entities mentioned in domain-specific text stem from the same domain; also, within a domain, a mention is typically associated with one dominant entity.
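A minimal sketch of H3, where `CONTEXT_KB` stands in for the contextual knowledge base as per-domain counts of mention-entity associations; the counts and the dominance threshold are illustrative assumptions:

```python
# Hedged sketch of H3: resolve a mention within a domain only when one
# entity dominates its association counts. Counts and the 0.8 threshold
# are assumptions for the sketch.

CONTEXT_KB = {
    ("automotive", "Lincoln"): {"db:Lincoln_Motor_Company": 40,
                                "db:Abraham_Lincoln": 2},
}

def h3_dominant_entity(domain, mention, min_share=0.8):
    """Return the dominant entity for (domain, mention), or None."""
    counts = CONTEXT_KB.get((domain, mention), {})
    total = sum(counts.values())
    if total == 0:
        return None
    entity, freq = max(counts.items(), key=lambda kv: kv[1])
    return entity if freq / total >= min_share else None
```

Returning None when no entity dominates lets the cascade fall back to the general-purpose ranking instead of forcing a domain link.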
  15. H4: Semantic Typing. Check whether the semantic type of the entity resolved by H2 or H3 fits the textual context. Reasoning: the entity should fit the textual context and fulfil a certain role in the text.
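As a binary rule, H4 reduces to a type-agreement check. The coarse type labels in this sketch are assumptions (e.g. the NER label of the mention as the context type):

```python
# Hedged sketch of H4: keep the entity proposed by H2/H3 only when its
# semantic type agrees with the type expected from the textual context.

def h4_semantic_typing(candidate_entity, entity_type, context_type):
    """Keep the H2/H3 candidate only if its type fits the context."""
    return candidate_entity if entity_type == context_type else None
```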
  16. Benchmark corpora. MEANTIME*: used to benchmark the approach with a corpus composed of 4 topic-specific gold standards. AIDA-YAGO2**: used to test the generalizability of the approach. *Minard et al., MEANTIME, the NewsReader multilingual event and time corpus. LREC 2016. **Hoffart et al., Robust Disambiguation of Named Entities in Text. EMNLP 2011.
  17. Corpora statistics:

      Corpus              Articles   Tokens   Entities   Links   NILs   Entity Types
      MEANTIME* airbus          30    3,620        614     414    200              5
      MEANTIME* apple           30    3,452        812     525    287              5
      MEANTIME* gm              30    3,641        760     526    234              5
      MEANTIME* stock           30    3,362        449     331    118              4
      AIDA-YAGO2**             231   46,435      5,616   4,485  1,131              4
  18. Experimental results (P / R / F1 per corpus):

      System               airbus              apple               gm                  stock               AIDA-YAGO2
      Hybrid               58.74/40.58/48      19.78/10.09/13.37   50.36/26.81/34.99   59.12/32.33/41.8    49.14/43.41/46.1
      Hybrid+H1+H2         59.09/40.82/48.29   19.78/10.09/13.37   55/29.28/38.21      59.12/32.33/41.8    48.67/43.01/45.67
      Hybrid+H1+H3         62.5/43.48/51.28    20.07/10.29/13.6    63.54/34.79/44.96   66.31/37.46/47.88   57.89/52.04/54.81
      Hybrid+H1+H2+H3      62.15/43.24/51      20.07/10.29/13.6    67.36/36.88/47.67   68.98/38.97/49.81   57.65/51.84/54.59
      Hybrid+H1+H2+H3+H4   61.46/42.75/50.43   20.07/10.29/13.6    62.1/33.65/43.65    63.1/35.65/45.56    55.21/49.61/52.26
  19. Discussion. Reranking with context is effective and brings improvement over the baseline for all corpora. Improvement is also observed on AIDA-YAGO2, even though it stems from a neutral topic domain: MEANTIME and AIDA-YAGO2 share the genre domain, and many of the entities in MEANTIME stem from the neutral domain as well. H1 (Order of Processing) and H3 (Domain Relevance) are, with these settings, the most effective heuristics. H4 (Semantic Typing) requires further investigation.
  20. Future work. Model the genre and topic domains to further contextualize the entity linking, i.e. adding more features to improve our adaptive contextual model. Investigate dynamic adaptability in different contexts using knowledge bases as inputs.
  21. Acknowledgements
