
Using DBpedia for Spotting and Disambiguating Entities

Talk for the 3rd DBpedia community meeting.

Published in: Data & Analytics

  1. Using DBpedia for Spotting and Disambiguating Entities
     Julien Plu, Giuseppe Rizzo, Raphaël Troncy
     {firstname.lastname}@eurecom.fr, @julienplu, @giusepperizzo, @rtroncy
  2. Agenda
     • Entity Linking task
     • Why use DBpedia?
     • Workflow
     • How is the index created?
     • Experiments on tweets
     • Future work
     09/02/2015 - 3rd DBpedia Community Meeting – Dublin, Ireland
  3. Entity Linking Task
     • The goal is to link entity mentions found in text to their corresponding entries in a knowledge base.
     • Example: "Last year I went to Paris to see the Eiffel Tower with some friends."
       Paris: http://dbpedia.org/resource/Paris
       Eiffel Tower: http://dbpedia.org/resource/Eiffel_Tower
  4. Why use DBpedia?
     • No legacy problems compared with Freebase
     • The knowledge base is constantly evolving
     • Available in many languages, which are interlinked
     • Most resources have a type
     • All resources have semantic relations with other resources
     • The popularity of a resource can be obtained for each language
  5. Workflow
     • Pipeline:
       1. Input text
       2. POS tagging / n-gram analysis to get the entities
       3. Lookup in the index to get candidates for each entity
       4. Linking each entity by choosing the right one among its candidates
     • Not domain-dependent
     • The lookup and linking processes run on top of an index created from DBpedia
  6. How is the index created?
     • 3 datasets are used: titles, redirects, and disambiguation links
     • Structure of the index:
       • First column: the label of the entity
       • Second column: the URI of the entity
       • Third column: all the labels of the redirect pages linked to the entity
       • Fourth column: the label of the disambiguation page of the entity
  7. Experiments on tweets
     • Dataset from the #Microposts2014 NEEL challenge
     • Tasks: entity recognition; entity recognition + linking
     • Results, in the order given on the slide:
       Precision | Recall | F-measure
       31.29%    | 20.64% | 24.88%
       63.51%    | 41.91% | 50.50%
  8. Future work
     • Using DBpedia more deeply:
       • Relations among the entities
       • Computing the popularity of an entity (i.e., PageRank for a given language)
       • Relations between different languages for the same entity
       • Using the types of each entity
     • Using a better algorithm to rank candidates after the lookup
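
The workflow and index slides describe a lookup-then-link pipeline over a four-column index. A minimal Python sketch of that idea follows, under several assumptions that are not in the slides: the names `IndexRow`, `INDEX`, `lookup`, `choose` and `link_entities` are hypothetical, the index rows are toy data rather than actual DBpedia dump content, spotting uses plain word n-grams (no POS tagging), and the linking step simply prefers an exact label match. The talk does not say how candidates are actually ranked.

```python
import string
from dataclasses import dataclass, field


@dataclass
class IndexRow:
    label: str                      # column 1: label of the entity
    uri: str                        # column 2: URI of the entity
    redirect_labels: list = field(default_factory=list)  # column 3: redirect labels
    disambiguation_label: str = ""  # column 4: label of the disambiguation page


# Toy rows for illustration only (assumption, not real dump content).
INDEX = [
    IndexRow("Paris", "http://dbpedia.org/resource/Paris",
             [], "Paris (disambiguation)"),
    IndexRow("Paris, Texas", "http://dbpedia.org/resource/Paris,_Texas",
             [], "Paris (disambiguation)"),
    IndexRow("Eiffel Tower", "http://dbpedia.org/resource/Eiffel_Tower",
             ["Tour Eiffel"], ""),
]


def lookup(mention):
    """Return every row whose label, redirect labels or disambiguation
    page name matches the mention (case-insensitive)."""
    key = mention.lower()
    hits = []
    for row in INDEX:
        names = [row.label] + row.redirect_labels
        if row.disambiguation_label:
            names.append(row.disambiguation_label.replace(" (disambiguation)", ""))
        if key in (name.lower() for name in names):
            hits.append(row)
    return hits


def choose(mention, candidates):
    """Placeholder ranking (assumption): exact label match, else first candidate."""
    for row in candidates:
        if row.label.lower() == mention.lower():
            return row
    return candidates[0]


def link_entities(text, max_n=3):
    """Greedy longest-match spotting over word n-grams, then linking."""
    tokens = [t.strip(string.punctuation) for t in text.split()]
    tokens = [t for t in tokens if t]
    links, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            candidates = lookup(mention)
            if candidates:
                links.append((mention, choose(mention, candidates).uri))
                i += n          # skip past the matched n-gram
                break
        else:
            i += 1              # no n-gram starting here matched
    return links


if __name__ == "__main__":
    sentence = "Last year I went to Paris to see the Eiffel Tower with some friends."
    for mention, uri in link_entities(sentence):
        print(mention, uri)
```

In this sketch the mention "Paris" has two candidates, because both toy rows share the same disambiguation page, and the placeholder `choose` resolves it to the row whose label matches exactly. A better ranking step, as the future-work slide suggests, would replace `choose`.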

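The experiments slide reports precision, recall and F-measure. As a quick consistency check, the sketch below recomputes the F-measure from the reported precision and recall, assuming it is the balanced F1 score (the harmonic mean of precision and recall); the slides do not state which variant was used, and `f_measure` is a hypothetical helper name.

```python
def f_measure(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the slides' "F-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs as reported on the slide, in percent.
    for p, r in [(31.29, 20.64), (63.51, 41.91)]:
        print(f"P={p:.2f}%  R={r:.2f}%  F1={f_measure(p, r):.2f}%")
```

Recomputed from the rounded percentages, both values agree with the reported F-measures to within about 0.01 points, which is consistent with the F1 assumption.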