
Missing Mr. Brown and buying an Abraham Lincoln

We argue that the community needs to address the issue of “dark entities”: those domain entities for which a knowledge base has no information, in the context of the entity linking task for building Event-Centric Knowledge Graphs. Through an analysis of a large (1.2 million article) automotive newswire corpus against DBpedia, we identify six classes of errors that lead to dark entities. Finally, we outline further steps that can be taken to tackle this issue.



  1. Missing Mr. Brown and buying an Abraham Lincoln: Dark Entities and DBpedia
     Marieke van Erp, Filip Ilievski, Marco Rospocher and Piek Vossen
  2. The Problem
     • Entity linking is an important step in building knowledge graphs
     • DBpedia is the de facto resource for entity linking:
       • it’s large
       • it’s broad
       • it’s got good documentation
       • it’s got tools
     • Still, its coverage is insufficient
  3. Dark entities
     • Not the same as NIL entities!
     • Dark entities are those entities for which a knowledge base has no information, in the context of the entity linking task
     • In NewsReader we use this context for building event-centric knowledge graphs
     • We need to know more about an entity than just its type
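The distinction the slide draws can be made concrete with a toy sketch (this is not the NewsReader code, and all knowledge-base contents and entity names below are invented): NIL describes a linker's *output* for a mention, while "dark" describes whether the knowledge base actually covers the mention's real-world entity. The two can disagree, e.g. when a dark entity is wrongly linked to a KB entry.

```python
# Toy illustration of NIL (a linker decision) vs. dark (a KB property).
# The KB contents and entity identifiers are invented for this sketch.

toy_kb = {"dbp:Volkswagen", "dbp:Audi_Brussels"}   # entities the KB knows


def describe(mention, linker_output, true_entity):
    """Report both the linker's decision and the KB's coverage."""
    link = "NIL" if linker_output is None else linker_output
    coverage = "dark" if true_entity not in toy_kb else "covered"
    return (mention, link, coverage)


# 1. Correctly linked, covered entity.
print(describe("Volkswagen", "dbp:Volkswagen", "dbp:Volkswagen"))
# -> ('Volkswagen', 'dbp:Volkswagen', 'covered')

# 2. A dark entity the linker correctly refuses to link: NIL *and* dark.
print(describe("Mr. Brown", None, "ex:Mr_Brown_the_dealer"))
# -> ('Mr. Brown', 'NIL', 'dark')

# 3. A dark entity wrongly linked to a covered one: dark but *not* NIL,
#    e.g. the plant "Volkswagen Pamplona" linked to the company Volkswagen.
print(describe("Volkswagen Pamplona", "dbp:Volkswagen", "ex:Volkswagen_Pamplona"))
# -> ('Volkswagen Pamplona', 'dbp:Volkswagen', 'dark')
```

Case 3 is why dark entities matter beyond NIL handling: the linker reports a confident link, yet the knowledge base has no information about the entity actually mentioned.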
  4. 1.2 Million News Articles on Cars
     • 2003–2013
     • Born digital
     • Deep processing via a 15-module NLP pipeline
     • First intra-document information extraction, followed by cross-document event and entity coreference
  5. Performance of the system

     NERC (CoNLL 2003):
       System                  Precision  Recall  F1
       NewsReader              91.64      90.21   90.92
       Stanford NER            --         --      88.08
       Ratinov et al. (2009)   --         --      90.57
       Passos et al. (2014)    --         --      90.90

     NEL (NewsReader system):
       Dataset      Precision  Recall
       CoNLL/AIDA   79.67      75.95
       TAC2010      79.77      60.68
  6. Performance of the system
  7. What’s going wrong in the pipeline?
     • Real-world data is dirty
     • NER isn’t perfect
     • Conjunctions
     • Coreference resolution
  8. What’s going wrong with linking to DBpedia?
     • Subdivisions, e.g. the Volkswagen plant in Vorst:
       • April 2006: production of the Polo moved from Spain to Eastern Europe because of social problems at Volkswagen Pamplona, and maybe to Volkswagen Vorst in Belgium
       • July 2006: Polo production in Vorst; no jobs lost in Spain, but extra jobs in Belgium
       • August 2006: fewer Golfs produced in Vorst, maybe more Polos. ‘If not, we have a problem’, says a union representative. Chances that Vorst will not make any Polos next year are minimal, because the factory invested this year in a new welding installation specific to Polo cars
       • November 2006: Volkswagen stops production of the Golf in Vorst; 3,500 jobs are lost and the plant is renamed Audi Brussels
       • November 2009: the Audi plant in Vorst stops production of the Polo; 300 jobs lost
     • Audi-Brussels is present in DBpedia, but Volkswagen Pamplona gets linked to Volkswagen: ‘Volkswagen closes Volkswagen Pamplona’ ≠ ‘dbp:Volkswagen closes dbp:Volkswagen’
  9. What’s going wrong with linking to DBpedia?
     • Domain mismatch / ambiguity
  10. What can we do?
      • Dynamic set of knowledge bases
      • Expand knowledge bases
      • Leverage latent semantics
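The first remedy, a dynamic set of knowledge bases, can be sketched as a lookup cascade: when the domain is known, consult domain resources before the general-purpose KB. This is a hypothetical illustration, not the authors' implementation; the KB contents and entity identifiers are invented, and it echoes the title's ambiguity, where "Lincoln" in automotive text should not resolve to the president.

```python
# Hypothetical sketch of a "dynamic set of knowledge bases":
# domain-specific resources are consulted before the general KB.
# All KB contents and identifiers below are invented.

general_kb = {"abraham lincoln": "dbp:Abraham_Lincoln"}
automotive_kb = {
    "lincoln": "ex:Lincoln_Motor_Company",
    "vorst": "ex:Volkswagen_Vorst_plant",
}


def link(mention, domain=None):
    """Resolve a mention against a domain-dependent cascade of KBs."""
    kbs = ([automotive_kb] if domain == "automotive" else []) + [general_kb]
    key = mention.lower()
    for kb in kbs:                    # first KB with a match wins
        if key in kb:
            return kb[key]
    return None                       # NIL: no KB in the current set covers it


print(link("Lincoln", domain="automotive"))   # ex:Lincoln_Motor_Company
print(link("Abraham Lincoln"))                # dbp:Abraham_Lincoln
print(link("Mr. Brown"))                      # None (NIL)
```

The design point is that the *set* of KBs, not just the disambiguation model, changes with the corpus: a domain resource can both override misleading general-KB candidates and cover entities that would otherwise stay dark.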
  12. This research was supported by the European Union’s 7th Framework Programme via the NewsReader project (ICT-316404)
  13. http://www.newsreader-project.eu
