Extending DBpedia with Wikipedia List Pages

Thanks to its wide coverage and general-purpose ontology, DBpedia is a prominent dataset in the Linked Open Data cloud. DBpedia's content is harvested from Wikipedia's infoboxes, based on manually created mappings. In this paper, we explore the use of a promising source of knowledge for extending DBpedia, i.e., Wikipedia's list pages. We discuss how a combination of frequent pattern mining and natural language processing (NLP) methods can be leveraged to extend both the DBpedia ontology and the instance information in DBpedia. We provide an illustrative example to show the potential impact of our approach and discuss its main challenges.

Transcript

  • 1. Extending DBpedia with Wikipedia List Pages • Heiko Paulheim, Simone Paolo Ponzetto • 10/22/13
  • 2. Disclaimer • This presentation shows an idea – after all, it says “position paper” – We don't know if it works! – (but we are quite confident)
  • 3. Lists in Wikipedia • Wikipedia loves lists • As of June 2013, there are almost 600,000 list pages • Lists organize Wikipedia pages – that correspond to DBpedia instances • Example: – List of African-American writers
  • 4. Lists in Wikipedia
  • 5. Lists in Wikipedia • Different types of lists – simple bullet point lists – broken bullet point lists (i.e., split into different sections) • sometimes, the sections are semantically meaningful – tables – ... • [Chart legend: Simple Bullet List, Broken Bullet List, Table, Other]
  • 6. Lists in Wikipedia • What information is in a list? – the linked things have the same “type” • The type can be a complex construct – e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American} • Sometimes, there is additional information – e.g., birth dates for persons
  • 7. Extracting Information from Lists • Goal: – find the common characteristics of all things in the list • Example: African-American writers – all instances are writers (25%) – all instances have nationality=United_States (12%) – all instances have ethnicity=African_American (3%) • Information in DBpedia is far from complete – makes extraction difficult – but: big potential to add information to DBpedia
  • 8. Extracting Information from Lists • Possible approach: finding characteristics with high TF-IDF – TF: percentage of instances in the list that carry the characteristic – IDF: 1 / (percentage of all DBpedia instances that carry the characteristic) • Rationale: going by frequency alone would rate owl:Thing the highest • Example: African-American writers – type=Writer: 0.608 (maximal across all possible classes) – nationality=United_States: 0.277 – ethnicity=African_American: 0.127 • But: – deathPlace=New_York_City: 0.157 :-(
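The scoring idea on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `tfidf_scores`, the toy instances, and the `property=value` string encoding are all assumptions made for the example. It applies the TF and IDF definitions literally as stated on the slide; the slide does not say how its scores (0.608, 0.277, ...) were scaled, so the sketch is only meant to show the ranking effect, not to reproduce those numbers.

```python
from collections import Counter


def tfidf_scores(list_instances, all_instances):
    """Score each characteristic ("property=value") for one list page.

    Both arguments map an instance name to the set of characteristics it
    carries; list_instances is assumed to be a subset of all_instances.
    TF  = share of list instances that carry the characteristic
    IDF = 1 / (share of all instances that carry the characteristic)
    """
    list_counts = Counter(c for chars in list_instances.values() for c in chars)
    global_counts = Counter(c for chars in all_instances.values() for c in chars)
    n_list, n_all = len(list_instances), len(all_instances)
    scores = {}
    for characteristic, count in list_counts.items():
        tf = count / n_list
        idf = n_all / global_counts[characteristic]
        scores[characteristic] = tf * idf
    return scores


# Toy data (hypothetical instances A-D; A and B are on the list page).
all_instances = {
    "A": {"type=Thing", "type=Writer", "nationality=United_States"},
    "B": {"type=Thing", "type=Writer"},
    "C": {"type=Thing", "nationality=United_States"},
    "D": {"type=Thing"},
}
list_instances = {name: all_instances[name] for name in ("A", "B")}

scores = tfidf_scores(list_instances, all_instances)
# type=Writer scores 2.0, type=Thing only 1.0, although both have TF = 1.0
```

The toy run shows the rationale from the slide: the generic type is carried by every list member, but the IDF factor pushes it below the more specific one.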
  • 9. Extracting Information from Lists • Example: African-American writers – ethnicity=African_American: 0.127 – deathPlace=New_York_City: 0.157 • Exploit further information from the list page – e.g., wiki:African_American is linked from the page, New_York_City is not – e.g., analyze the list page title, for instance with DBpedia Spotlight • African_American is recognized as an entity
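One way to combine the statistical score with the page evidence from slide 9 is to boost characteristics whose value is linked from, or recognized on, the list page. The sketch below is only an assumption about how such a combination could look: the function `rerank_with_page_evidence` and the boost factor of 2.0 are hypothetical, and the slides do not specify a combination rule. The two input scores are the ones given on the slide.

```python
def rerank_with_page_evidence(scores, linked_entities, boost=2.0):
    """Re-rank characteristics using evidence from the list page itself.

    scores maps "property=value" strings to statistical scores.
    linked_entities is the set of entities linked from the list page or
    recognized in its title. boost is a hypothetical multiplier.
    """
    reranked = {}
    for characteristic, score in scores.items():
        _, value = characteristic.split("=", 1)
        reranked[characteristic] = score * boost if value in linked_entities else score
    return sorted(reranked.items(), key=lambda item: item[1], reverse=True)


scores = {
    "ethnicity=African_American": 0.127,
    "deathPlace=New_York_City": 0.157,
}
# African_American is linked from the list page; New_York_City is not.
ranked = rerank_with_page_evidence(scores, linked_entities={"African_American"})
```

With this evidence, the ethnicity characteristic moves ahead of the spurious death place, which is the behavior the slide asks for.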
  • 10. Lists of Lists in Wikipedia • Wikipedia also has ~600 lists of lists – they organize lists – they form a hierarchy • E.g.: – Lists of Writers – Lists of American writers – List of African American writers
  • 11. From Lists of Lists to an Extended Ontology • Idea: – find corresponding “Lists of ...” pages for DBpedia classes – extend hierarchy • [Diagram: the DBpedia ontology (owl:Thing ... Agent ... Person ... Artist ... Writer) next to the corresponding Wikipedia pages (Lists of Writers, Lists of American Writers, List of African-American Writers); the extended ontology adds American Writer and African-American Writer below Writer]
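The hierarchy extension on slide 11 can be illustrated with a small sketch that turns the nesting of list pages into candidate subclass edges. Everything here is an assumption for illustration: the helper names `candidate_class` and `subclass_edges`, the naive rule of stripping the "List(s) of" prefix and a trailing "s", and the `nesting` dictionary standing in for links between list pages. Matching the resulting labels to existing DBpedia classes (e.g., Writer) is left out.

```python
import re


def candidate_class(title):
    """Derive a candidate class label from a list page title (naive rule)."""
    label = re.sub(r"^Lists? of ", "", title)
    return label[:-1] if label.endswith("s") else label


def subclass_edges(nesting):
    """Turn 'parent list page -> child list pages' into (subclass, superclass) pairs."""
    edges = []
    for parent, children in nesting.items():
        for child in children:
            edges.append((candidate_class(child), candidate_class(parent)))
    return edges


# Hypothetical nesting, following the example on slide 10.
nesting = {
    "Lists of writers": ["Lists of American writers"],
    "Lists of American writers": ["List of African-American writers"],
}
edges = subclass_edges(nesting)
# [("American writer", "writer"), ("African-American writer", "American writer")]
```

A real implementation would need proper lemmatization and a check that the top of each chain corresponds to an existing ontology class before new classes are attached below it.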
  • 12. Potential of the Idea • Assuming we extract everything correctly from List of African American writers, we get – 814 new type statements (only DBpedia ontology) – 1409 new property assertions – two entirely new instances • ...and there are ~600,000 list pages – extrapolation: we can roughly double the information in DBpedia • many list pages contain extra information – e.g., birth places and birth dates of persons
  • 13. Challenges • Robust extraction of instances – from different kinds of list pages – e.g., picking the right column in a table – tables and bullet point lists already account for 75% • Picking good scoring functions – TF-IDF looks promising at first glance • Combining statistical and textual evidence • Scalable implementation – Advantage: perfectly parallelizable
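For the simplest case named under the first challenge, a plain bullet point list, instance extraction could start from something like the following sketch. It is an assumption-laden illustration, not a robust extractor: the function `first_links_in_bullets` and the heuristic "take the first wiki link on each bullet line" are made up for this example, and tables or sectioned lists would need additional handling.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#section|label]];
# group 1 is the link target.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def first_links_in_bullets(wikitext):
    """Return the first link target of every bullet line as a candidate instance."""
    instances = []
    for line in wikitext.splitlines():
        if line.lstrip().startswith("*"):
            match = WIKI_LINK.search(line)
            if match:
                instances.append(match.group(1).strip().replace(" ", "_"))
    return instances


# Toy wikitext (hypothetical excerpt of a list page).
wikitext = """== A ==
* [[Maya Angelou]], poet
* [[James Baldwin|Baldwin, James]], novelist and [[essay]]ist
Some prose with [[a link]] that is not a list entry.
"""
instances = first_links_in_bullets(wikitext)
```

Because each list page can be processed independently of all others, a function of this kind can be applied to pages in parallel, which is the advantage noted on the slide.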
  • 14. Extending DBpedia with Wikipedia List Pages