Extending DBpedia (LOD) using WikiTables

Presentation Transcript

  • 1. Extending DBpedia (LOD) using WikiTables
      Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
  • 2. Linked Open Data
      Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch (http://lod-cloud.net/).
      October 12, 2012 -- E. Muñoz
  • 3. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data: DBpedia provides an RDF version of all Wikipedia structured data (infoboxes).
  • 4. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data: DBpedia provides an RDF version of all Wikipedia structured data (infoboxes), but not yet of ordinary Wikipedia tables (wikitables).
  • 5. Tables as a source of LOD
      • Tables are inherently concise as well as information-rich.
      • Table example (http://en.wikipedia.org/wiki/Dublin): the caption appears as another row, column headers represent types of information, and the values represent instances of those types.
      • Infobox example (http://en.wikipedia.org/wiki/Galway): attribute-value pairs.
  • 6. Reasoning over Wikipedia Tables: Recovering Table Semantics
      • Example from http://en.wikipedia.org/wiki/Dublin: "Dublin is twinned with the following places: …"
  • 7. Reasoning over Wikipedia Tables: entity annotation for cells and mappings to DBpedia resources (http://en.wikipedia.org/wiki/Dublin)
      • Column headers map to properties: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer)
      • City cells map to: dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/Liverpool, dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Barcelona, dbpedia.org/resource/Beijing
      • Nation cells map to: dbpedia.org/resource/United_States, dbpedia.org/resource/United_Kingdom, dbpedia.org/resource/Japan, dbpedia.org/resource/Spain, dbpedia.org/resource/People's_Republic_of_China
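A minimal sketch of the cell-annotation step, assuming each cell links to a Wikipedia page and that its DBpedia resource URI can be derived from the page title (whitespace to underscores, first letter upper-cased). The helpers `to_dbpedia_uri` and `header_to_property` are hypothetical; redirects and disambiguation pages are not handled, and the header rule (lower-case, spaces dropped) is an assumption, not DBpedia's general naming scheme.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_PROPERTY = "http://dbpedia.org/property/"


def to_dbpedia_uri(title: str) -> str:
    """Map a Wikipedia page title (e.g. a cell's link target) to a DBpedia resource URI.

    Assumption: whitespace becomes underscores and the first character is
    upper-cased, as in Wikipedia page titles. Redirects are NOT resolved.
    """
    title = "_".join(title.split())  # collapse runs of whitespace into "_"
    if not title:
        raise ValueError("empty title")
    return DBPEDIA_RESOURCE + title[0].upper() + title[1:]


def header_to_property(header: str) -> str:
    """Hypothetical rule: lower-case the column header and drop spaces."""
    name = "".join(header.lower().split())
    if not name:
        raise ValueError("empty header")
    return DBPEDIA_PROPERTY + name


# Usage on one row of the Dublin twin-cities table (header text assumed)
headers = ["City", "Nation", "Since"]
row = ["San Jose, California", "United States"]
print([header_to_property(h) for h in headers])
print([to_dbpedia_uri(c) for c in row])
```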
  • 8. Reasoning over Wikipedia Tables: extracting relations (http://en.wikipedia.org/wiki/Dublin)
      • Relations found between the city column and the nation column of the annotated table: dbpedia.org/ontology/country and dbpedia.org/property/subdivisionName (each nation "is dbpedia.org/ontology/country of" the city in its row).
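One way to read slides 7 and 8 as code, offered as a sketch under assumptions rather than the author's implementation: for every row, look up which predicates already link the city cell to the nation cell, keep the predicates supported by enough rows, and propose them for the rows where they are missing. DBpedia is replaced by a small in-memory set of toy triples using compact `dbr:`/`dbo:`/`dbp:` prefixes, and `min_support` is a hypothetical threshold.

```python
from collections import Counter, defaultdict

Triple = tuple[str, str, str]


def column_relations(pairs: list[tuple[str, str]], kb: set[Triple]) -> Counter:
    """Count, over table rows, the predicates p with (subject, p, object) in kb."""
    by_pair = defaultdict(set)  # (subject, object) -> predicates linking them
    for s, p, o in kb:
        by_pair[(s, o)].add(p)
    counts: Counter = Counter()
    for s, o in pairs:
        counts.update(by_pair.get((s, o), set()))
    return counts


def propose_triples(pairs, kb, min_support: int = 2) -> list[Triple]:
    """Emit (s, p, o) for every row and well-supported predicate not already in kb."""
    counts = column_relations(pairs, kb)
    predicates = sorted(p for p, n in counts.items() if n >= min_support)
    return [(s, p, o) for s, o in pairs for p in predicates if (s, p, o) not in kb]


# Toy knowledge base (illustrative only, not real DBpedia content)
kb = {
    ("dbr:San_Jose,_California", "dbo:country", "dbr:United_States"),
    ("dbr:San_Jose,_California", "dbp:subdivisionName", "dbr:United_States"),
    ("dbr:Barcelona", "dbo:country", "dbr:Spain"),
    ("dbr:Barcelona", "dbp:subdivisionName", "dbr:Spain"),
}
# (city, nation) pairs taken from the annotated table rows
pairs = [
    ("dbr:San_Jose,_California", "dbr:United_States"),
    ("dbr:Liverpool", "dbr:United_Kingdom"),
    ("dbr:Barcelona", "dbr:Spain"),
]
for t in propose_triples(pairs, kb):
    print(t)
```

With this toy data, only the Liverpool row lacks the two relations, so it is the only row for which new triples are proposed.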
  • 9. Reasoning over Wikipedia Tables
      <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
      <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
      <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
      <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
      <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
      <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
      <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
      <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
      <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .
      <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
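The triples on slide 9 are in N-Triples form. A small sketch of serializing extracted triples that way, assuming compact `dbr:`/`dbo:`/`dbp:` prefixes internally. IRIs are only checked against the characters N-Triples forbids in an IRI reference (controls, space, `<>"{}|^`` ` ``\`); they are not percent-encoded, and the helper names are hypothetical.

```python
PREFIXES = {
    "dbr:": "http://dbpedia.org/resource/",
    "dbo:": "http://dbpedia.org/ontology/",
    "dbp:": "http://dbpedia.org/property/",
}
# Characters not allowed inside an N-Triples IRIREF (besides controls and space)
FORBIDDEN = set('<>"{}|^`\\')


def expand(term: str) -> str:
    """Replace a known compact prefix by its full namespace IRI."""
    for prefix, base in PREFIXES.items():
        if term.startswith(prefix):
            return base + term[len(prefix):]
    return term  # assumed to be a full IRI already


def iri(term: str) -> str:
    """Return the term as <IRI>, rejecting characters N-Triples does not allow."""
    full = expand(term)
    if any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in full):
        raise ValueError(f"not a valid N-Triples IRI: {full!r}")
    return f"<{full}>"


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)


print(to_ntriples([("dbr:Beijing", "dbo:country", "dbr:People's_Republic_of_China")]))
```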
  • 11. Reasoning over Wikipedia Tables
      • Let’s analyze these cases: Liverpool, Matsue, Beijing.
  • 12. Not that simple…
      • Web tables usually don’t have explicit semantics by themselves.
      • Main issues:
          – complex tables with spans
          – captions inside the table as another row
          – malformed tables (i.e., not a matrix)
          – filters are needed (e.g., at least 2 columns and 2 rows)
      • We extract relations at the row level, and between the main entity (the subject of the page) and the resources in the table.
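The well-formedness check and size filter from slide 12 can be sketched as follows. The helper names are hypothetical, and whether the header row counts toward the 2-row minimum is an assumption (here it does).

```python
def is_matrix(rows: list[list[str]]) -> bool:
    """True if the table is non-empty and every row has the same number of cells."""
    return bool(rows) and all(len(r) == len(rows[0]) for r in rows)


def keep_table(rows: list[list[str]], min_rows: int = 2, min_cols: int = 2) -> bool:
    """Hypothetical filter: keep well-formed tables of at least min_rows x min_cols.

    Assumption: rows are counted after span expansion and include the header row.
    """
    return is_matrix(rows) and len(rows) >= min_rows and len(rows[0]) >= min_cols


print(keep_table([["City", "Nation"], ["Liverpool", "United Kingdom"]]))  # kept
print(keep_table([["City", "Nation"], ["Liverpool"]]))  # ragged: not a matrix
```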
  • 13. Parsing: Extracting Tables
      • First step: parsing the wiki format.
      • Example from http://en.wikipedia.org/wiki/People%27s_Republic_of_China: the caption appears as another row, a table is split, and rowspans contain pictures.
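Rowspans and colspans are what keep a parsed table from being a matrix. A common way to normalize them, sketched here under the assumption that the parser already yields each cell's text with its `rowspan`/`colspan` values (the `Cell` type and `expand_spans` are hypothetical), is to copy each spanning cell into every grid slot it covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_spans(rows: list[list[Cell]]) -> list[list[Optional[str]]]:
    """Copy each spanning cell into every grid slot it covers.

    Slots no cell covers (a ragged table) stay None; rowspans that reach
    past the last row are truncated.
    """
    filled: dict[tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in filled:  # skip slots taken by a rowspan from above
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    filled[(r + dr, c + dc)] = cell.text
            c += cell.colspan
    n_rows = len(rows)
    n_cols = max((c for (r, c) in filled if r < n_rows), default=-1) + 1
    return [[filled.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# Toy cells: a picture spanning two rows, like the "rowspans with pictures" case
grid = expand_spans([[Cell("[picture]", rowspan=2), Cell("row 1")], [Cell("row 2")]])
print(grid)  # [['[picture]', 'row 1'], ['[picture]', 'row 2']]
```

Copying the value (rather than leaving blanks) makes each row self-contained, which matters when relations are later extracted row by row.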
  • 14. Parsing: Extracting Tables
      • Problems with parsing the cells’ content (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 16. Parsing: Extracting Tables
      • Example from http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s: same-page links, many different formats, and anchor text vs. content text.
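A sketch of separating link targets, anchor text, and visible content text in a wikitext cell, assuming only plain internal links of the form `[[Target]]` and `[[Target|Anchor]]`. Templates, external links, and `File:` links are not handled, and the cell strings in the test are toy examples. Same-page links (`[[#Section|…]]`) are skipped because they point to a section rather than to an entity.

```python
import re

# [[Target]] or [[Target|Anchor]] (no nested brackets)
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def cell_links(wikitext: str) -> list[tuple[str, str]]:
    """Return (target page, anchor text) for each entity link in a cell."""
    links = []
    for m in LINK.finditer(wikitext):
        target = m.group(1).strip()
        anchor = (m.group(2) if m.group(2) is not None else target).strip()
        if target.startswith("#"):
            continue  # same-page link: a section of this page, not an entity
        links.append((target.split("#", 1)[0], anchor))  # drop any #fragment
    return links


def cell_text(wikitext: str) -> str:
    """Visible content text: each link is replaced by its anchor text."""
    return LINK.sub(lambda m: m.group(2) if m.group(2) is not None else m.group(1), wikitext)


cell = "[[Sony Pictures Television|Sony]] and [[Warner Bros.]] ([[#Notes|see notes]])"
print(cell_links(cell))
print(cell_text(cell))
```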
  • 17. Extracting Relations
      • A table containing tables (http://en.wikipedia.org/wiki/AFC_Ajax)
  • 18. Extracting Relations
      • Also relations between the main entity and the entities in the table
      • Table of 16 players at http://en.wikipedia.org/wiki/AFC_Ajax; relations found to dbpedia.org/resource/AFC_Ajax:
          14  dbpedia.org/ontology/team
          14  dbpedia.org/property/clubs
          11  dbpedia.org/property/currentclub
           3  dbpedia.org/property/youthclubs
      • For one of the players, the DBpedia page makes no mention of AFC Ajax.
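Counts like those on slide 18 could be produced as sketched below: for each entity found in the table, collect the predicates that link it to the page's main entity, counting each predicate once per entity. This is an assumption-laden sketch: only the row-entity to main-entity direction is checked, DBpedia is replaced by a toy in-memory set, and the player names are hypothetical.

```python
from collections import Counter


def main_entity_relations(main_entity: str, row_entities: list[str], kb) -> Counter:
    """Count predicates p with (row_entity, p, main_entity) in kb, once per entity.

    Only the row-entity -> main-entity direction is checked (an assumption).
    """
    counts: Counter = Counter()
    for entity in row_entities:
        preds = {p for (s, p, o) in kb if s == entity and o == main_entity}
        counts.update(preds)
    return counts


# Toy data, not real DBpedia content. Player_C points to the disambiguation
# page dbr:Ajax instead of dbr:AFC_Ajax, so it contributes nothing.
kb = {
    ("dbr:Player_A", "dbo:team", "dbr:AFC_Ajax"),
    ("dbr:Player_A", "dbp:clubs", "dbr:AFC_Ajax"),
    ("dbr:Player_B", "dbo:team", "dbr:AFC_Ajax"),
    ("dbr:Player_B", "dbp:youthclubs", "dbr:AFC_Ajax"),
    ("dbr:Player_C", "dbo:team", "dbr:Ajax"),
}
players = ["dbr:Player_A", "dbr:Player_B", "dbr:Player_C"]
print(main_entity_relations("dbr:AFC_Ajax", players, kb))
```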
  • 19. Example: dbpedia.org/resource/Christian_Eriksen and the disambiguation page dbpedia.org/resource/Ajax (table from http://en.wikipedia.org/wiki/AFC_Ajax)
  • 20. Our Dataset
      • enwiki dump from 2012-09-03 02:17:37
      • 8.6 GB of Wikipedia pages, comprising:
          – 10,531,986 documents (HTML pages)
          – only 413,256 HTML pages contain tables
          – 2,989,098 tables
          – 905,929 tables after the filter
      • 27.7% of all tables
      • 0.46 tables per page (or 2.15 when discarding pages without tables)
  • 21. Methodology
  • 22. Ranking of Relationships
      • The current ranking function is naïve: $\mathit{score} = \frac{f_{rel}}{n_{rows}}$, where $f_{rel}$ is the frequency of a relationship and $n_{rows}$ is the number of rows in the table.
      • Example from http://en.wikipedia.org/wiki/AFC_Ajax (16 players, so $n_{rows} = 16$):
          freq  relationship                        score
          14    dbpedia.org/ontology/team           0.875
          14    dbpedia.org/property/clubs          0.875
          11    dbpedia.org/property/currentclub    0.6875
           3    dbpedia.org/property/youthclubs     0.1875
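The naïve score on slide 22 is simple enough to check directly. This sketch (the function name is hypothetical) computes f_rel / n_rows for each relationship and sorts best first, breaking ties alphabetically; with the slide's AFC Ajax counts and 16 rows it reproduces the slide's scores.

```python
def rank_relations(counts: dict[str, int], n_rows: int) -> list[tuple[str, float]]:
    """Naive score from the slide: score = f_rel / n_rows, sorted best first."""
    if n_rows <= 0:
        raise ValueError("table has no rows")
    scored = [(rel, freq / n_rows) for rel, freq in counts.items()]
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ajax_counts = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}
for rel, score in rank_relations(ajax_counts, n_rows=16):
    print(f"{score:.4f}  {rel}")
```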
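  • 23. Ranking of Relationships
      • For cases like this the function does not work well, and $\mathit{score} \notin [0,1]$: the score exceeds 1 whenever a relationship is found more times than the table has rows (http://en.wikipedia.org/wiki/Danny_Kaye).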
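The slide does not say how the function should change. One possible bounded variant, offered only as a hypothetical sketch, counts each relationship at most once per row, so the score becomes the fraction of rows supporting it and always lies in [0, 1]. The input format (one list of matched predicates per row) and the predicate name `p` are assumptions.

```python
from collections import Counter


def _n_rows(row_matches: list[list[str]]) -> int:
    if not row_matches:
        raise ValueError("table has no rows")
    return len(row_matches)


def naive_score(row_matches: list[list[str]], rel: str) -> float:
    """f_rel / n_rows, where f_rel counts every match, even several in one row."""
    return sum(m.count(rel) for m in row_matches) / _n_rows(row_matches)


def row_support_scores(row_matches: list[list[str]]) -> dict[str, float]:
    """Hypothetical variant: fraction of rows with at least one match (in [0, 1])."""
    n = _n_rows(row_matches)
    per_row = Counter(rel for matches in row_matches for rel in set(matches))
    return {rel: c / n for rel, c in per_row.items()}


# Toy table: the first row's cell lists two entities that both match "p"
rows = [["p", "p"], ["p"]]
print(naive_score(rows, "p"))    # 1.5, outside [0, 1]
print(row_support_scores(rows))  # {'p': 1.0}
```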
  • 24. Ongoing Work and Challenges
      • Improve the ranking function for relations.
      • Store the 5.5M DBpedia (transitive) redirects locally to reduce lookup time.
      • Statistical analysis of Wikipedia tables:
          – number of columns and rows
          – headers and captions
          – external and internal links
      • The next big challenge is evaluation.
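Storing redirects locally pays off most if chains are resolved once up front, so each later lookup is a single dictionary access. A minimal sketch, assuming the redirects are available as a source-title to target-title mapping; the function names, the `max_hops` safety limit, and the toy titles are hypothetical.

```python
def resolve_redirect(title: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow a chain of redirects to its final target.

    Stops on a cycle or after max_hops (a hypothetical safety limit).
    """
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def transitive_closure(redirects: dict[str, str]) -> dict[str, str]:
    """Precompute the final target of every redirect source."""
    return {src: resolve_redirect(src, redirects) for src in redirects}


redirects = {"A": "B", "B": "C"}  # toy titles: A -> B -> C
print(transitive_closure(redirects))  # {'A': 'C', 'B': 'C'}
```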
  • 25. What’s next?
      • Some ideas:
          – Use the extracted relations to classify WikiTables.
          – Define a similarity function for WikiTables (example: English and Italian tables).
  • 26. What’s next?
      • http://en.wikipedia.org/wiki/Electronegativity: What does this number mean? There is no reference to these numbers here!
  • 27. What’s next?
      • http://en.wikipedia.org/wiki/Electronegativity, http://en.wikipedia.org/wiki/Chlorine
      • "Chlorous acid is a chlorite" (http://dbpedia.org/page/Chlorous_acid)
  • 28. Open problems
      • Handle multiple entities in the same cell
      • Improve the ranking function
      • Handle redirects before querying DBpedia
      • How to evaluate the outcome
      Thanks! Q & A