Extending DBpedia (LOD) using WikiTables
Published in: Technology

Transcript

  • 1. Extending DBpedia (LOD) using WikiTables. Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
  • 2. Linked Open Data: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch (http://lod-cloud.net/). October 12, 2012 -- E. Muñoz
  • 3. Linked Open Data • DBpedia, an export of Wikipedia's structured data: DBpedia provides an RDF version of all Wikipedia structured data (infoboxes).
  • 4. Linked Open Data • DBpedia, an export of Wikipedia's structured data: DBpedia provides an RDF version of all Wikipedia structured data (infoboxes), but not yet a version of the ordinary Wikipedia tables (wikitables).
  • 5. Tables as a source of LOD • Tables are inherently concise as well as information-rich. • Wikitable example (http://en.wikipedia.org/wiki/Dublin): the caption appears as another row, the column headers represent types of information, and the values represent instances of those types. • Infobox example (http://en.wikipedia.org/wiki/Galway): attribute-value pairs.
  • 6. Reasoning over Wikipedia Tables • Recovering table semantics… • Example (http://en.wikipedia.org/wiki/Dublin): "Dublin is twinned with the following places:"
  • 7. Reasoning over Wikipedia Tables • Entity annotation for cells: mappings to DBpedia resources (example: http://en.wikipedia.org/wiki/Dublin) – Cities: dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/Liverpool, dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Barcelona, dbpedia.org/resource/Beijing – Nations: dbpedia.org/resource/United_States, dbpedia.org/resource/United_Kingdom, dbpedia.org/resource/Japan, dbpedia.org/resource/Spain, dbpedia.org/resource/People's_Republic_of_China – Columns: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer)
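Cell annotation can be sketched as follows. The sketch assumes each linked cell carries the title of the Wikipedia page it links to. The helper names `title_to_dbpedia_uri` and `annotate_cell` are hypothetical. The rule that an unlinked, all-digit cell becomes an `xsd:integer` literal is an assumption modelled on the `since` column, not the author's annotator. Terms are returned in N-Triples syntax.

```python
import re
from typing import Optional

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def title_to_dbpedia_uri(title: str) -> str:
    """Map a Wikipedia page title to a DBpedia resource URI.

    MediaWiki titles use underscores for spaces and an upper-case first
    letter; DBpedia builds resource names from that form. Any further
    character encoding DBpedia may apply is not handled here.
    """
    name = title.strip().replace(" ", "_")
    if name:
        name = name[0].upper() + name[1:]
    return DBPEDIA_RESOURCE + name


def annotate_cell(text: str, link: Optional[str]) -> str:
    """Return an N-Triples term for one cell (hypothetical heuristic).

    Linked cell -> DBpedia resource URI; unlinked all-digit cell ->
    xsd:integer literal; anything else -> escaped plain string literal.
    """
    if link is not None:
        return "<" + title_to_dbpedia_uri(link) + ">"
    value = text.strip()
    if re.fullmatch(r"[0-9]+", value):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


if __name__ == "__main__":
    # Row shaped like the Dublin twin-cities table; the year is a placeholder.
    row = [("San Jose", "San Jose, California"), ("United States", "United States"), ("1999", None)]
    print([annotate_cell(text, link) for text, link in row])
```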
  • 8. Reasoning over Wikipedia Tables • Extracting relations (example: http://en.wikipedia.org/wiki/Dublin) – Cities: dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/Liverpool, dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Barcelona, dbpedia.org/resource/Beijing – Nations: dbpedia.org/resource/United_States, dbpedia.org/resource/United_Kingdom, dbpedia.org/resource/Japan, dbpedia.org/resource/Spain, dbpedia.org/resource/People's_Republic_of_China – Columns: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer) – Relations between the city and nation columns: dbpedia.org/ontology/country, dbpedia.org/property/subdivisionName (nation "is dbpedia.org/ontology/country of" city)
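Relation extraction between two annotated columns can be pictured as a count, over all rows, of the predicates that already link the two cell entities in DBpedia. This sketch assumes a `lookup(subject, object)` callback. `sparql_predicates_query` builds the standard SPARQL query such a callback could send to a DBpedia endpoint. `TOY_KB` is a hypothetical in-memory stand-in built from the slide 9 triples, so the example runs offline. All function names are illustrative, not the author's code.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Set

DBR = "http://dbpedia.org/resource/"
COUNTRY = "http://dbpedia.org/ontology/country"
SUBDIVISION = "http://dbpedia.org/property/subdivisionName"

Lookup = Callable[[str, str], Set[str]]


def sparql_predicates_query(subject_uri: str, object_uri: str) -> str:
    """SPARQL query returning every predicate p such that (subject, p, object) holds."""
    return f"SELECT DISTINCT ?p WHERE {{ <{subject_uri}> ?p <{object_uri}> . }}"


def count_column_relations(
    rows: Sequence[Sequence[Optional[str]]],
    subj_col: int,
    obj_col: int,
    lookup: Lookup,
) -> Counter:
    """Count, for each predicate, the rows whose two cells it links."""
    counts: Counter = Counter()
    for row in rows:
        subj, obj = row[subj_col], row[obj_col]
        if subj is None or obj is None:  # cell without an entity annotation
            continue
        counts.update(lookup(subj, obj))  # a set, so each predicate counts once per row
    return counts


# Hypothetical offline stand-in for DBpedia, built from the slide 9 triples.
TOY_KB = {
    (DBR + city, DBR + nation): {COUNTRY, SUBDIVISION}
    for city, nation in [
        ("San_Jose,_California", "United_States"),
        ("Liverpool", "United_Kingdom"),
        ("Matsue,_Shimane", "Japan"),
        ("Barcelona", "Spain"),
        ("Beijing", "People's_Republic_of_China"),
    ]
}


def toy_lookup(subj: str, obj: str) -> Set[str]:
    return TOY_KB.get((subj, obj), set())


if __name__ == "__main__":
    rows = [
        [DBR + "Liverpool", DBR + "United_Kingdom"],
        [DBR + "Barcelona", DBR + "Spain"],
        [DBR + "Barcelona", None],  # unannotated nation cell is skipped
    ]
    print(count_column_relations(rows, 0, 1, toy_lookup))
    print(sparql_predicates_query(DBR + "Liverpool", DBR + "United_Kingdom"))
```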
  • 9. Reasoning over Wikipedia Tables
    – <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
    – <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
    – <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
    – <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
    – <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
    – <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
    – <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
    – <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
    – <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .
    – <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
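The triples above follow the N-Triples pattern `<subject> <predicate> <object> .`. Here is a minimal sketch that produces such lines from annotated (subject, object) row pairs and a chosen list of predicates. The function name `to_ntriples` is hypothetical. The sketch assumes every URI is already a valid absolute IRI and does no escaping.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def to_ntriples(
    pairs: Iterable[Tuple[str, str]], predicates: Sequence[str]
) -> Iterator[str]:
    """Yield one N-Triples line per (subject, object) pair and predicate.

    Assumes every URI is already a valid absolute IRI; no escaping is done.
    """
    for subj, obj in pairs:
        for pred in predicates:
            yield f"<{subj}> <{pred}> <{obj}> ."


if __name__ == "__main__":
    dbr = "http://dbpedia.org/resource/"
    preds = [
        "http://dbpedia.org/property/subdivisionName",
        "http://dbpedia.org/ontology/country",
    ]
    for line in to_ntriples([(dbr + "Liverpool", dbr + "United_Kingdom")], preds):
        print(line)
```

Run as is, it prints the two Liverpool triples in the same order as the slide.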
  • 11. Reasoning over Wikipedia Tables • Let's analyze these cases… • Liverpool • Matsue • Beijing
  • 12. Not that simple… • Web tables usually have no explicit semantics of their own. • Main issues: – Complex tables with row and column spans – Captions inside the table as another row – Tables that are not well formed (i.e., not a matrix) – Filters are needed (e.g., at least 2 columns and 2 rows) • We extract relations at the row level, and between the page's main entity and the resources in the table.
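The filter criteria on this slide can be sketched as a predicate over a parsed table: a well-formed matrix with at least 2 columns and 2 rows. Two things here are assumptions. A table is represented as a list of rows of cell strings, and a header row counts as a row; the slide does not say how headers are treated. The function names are hypothetical.

```python
from typing import List

Table = List[List[str]]  # rows of cell texts, after span expansion


def is_matrix(table: Table) -> bool:
    """True if the table is non-empty and every row has the same length."""
    return bool(table) and all(len(row) == len(table[0]) for row in table)


def passes_filter(table: Table, min_rows: int = 2, min_cols: int = 2) -> bool:
    """Keep only well-formed tables with enough rows and columns."""
    return is_matrix(table) and len(table) >= min_rows and len(table[0]) >= min_cols


if __name__ == "__main__":
    print(passes_filter([["city", "nation"], ["Liverpool", "United Kingdom"]]))  # True
    print(passes_filter([["city", "nation"], ["Liverpool"]]))  # False: not a matrix
    print(passes_filter([["city"], ["Liverpool"]]))  # False: only one column
```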
  • 13. Parsing: Extracting Tables • First step: parsing the wiki markup. • Problems seen in http://en.wikipedia.org/wiki/People%27s_Republic_of_China: the caption appears as another row, a table is split, and rowspans contain pictures.
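Rowspans and colspans must be expanded before a table can be treated as a matrix. One common approach copies each spanning cell's text into every grid position it covers. It is sketched here as an assumption, not as the parser used in this work. Cells are given as hypothetical `(text, rowspan, colspan)` tuples that have already been extracted from the wiki markup.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)


def expand_spans(rows: Sequence[Sequence[Cell]]) -> List[List[str]]:
    """Expand row/column spans into a rectangular grid of cell texts.

    Positions that no cell covers are filled with "".
    """
    n_rows = len(rows)
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip columns already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(max(1, rowspan)):
                if r + dr >= n_rows:
                    break  # clamp rowspans that run past the last row
                for dc in range(max(1, colspan)):
                    grid[(r + dr, c + dc)] = text
            c += max(1, colspan)
    n_cols = max((col for _, col in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    # "A" spans two rows; the second row therefore starts in column 1.
    print(expand_spans([[("A", 2, 1), ("B", 1, 1)], [("C", 1, 1)]]))
    # [['A', 'B'], ['A', 'C']]
```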
  • 14. Parsing: Extracting Tables • Problems with parsing the cell's content (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 16. Parsing: Extracting Tables • Same-page links • Many different formats • Anchor text vs. content text • Example: http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s
  • 17. Extracting Relations • A table containing tables (http://en.wikipedia.org/wiki/AFC_Ajax)
  • 18. Extracting Relations • Relations are also extracted between the main entity and the entities in the table. • Example: http://en.wikipedia.org/wiki/AFC_Ajax, a table of 16 players; number of players linked to dbpedia.org/resource/AFC_Ajax by each relation: – 14 dbpedia.org/ontology/team – 14 dbpedia.org/property/clubs – 11 dbpedia.org/property/currentclub – 3 dbpedia.org/property/youthclubs • In his DBpedia page there is no mention of AFC Ajax.
  • 19. dbpedia.org/resource/Christian_Eriksen • Disambiguation page: dbpedia.org/resource/Ajax • http://en.wikipedia.org/wiki/AFC_Ajax
  • 20. Our Dataset • enwiki dump from 2012-09-03 02:17:37 • 8.6 GB of Wikipedia pages, comprising: – 10,531,986 documents (HTML pages) – only 413,256 of which contain tables – 2,989,098 tables – 905,929 tables after the filter • 27.7% of all tables – 0.46 tables per page (or 2.15 discarding pages without tables)
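The ratios on this slide can be recomputed from the reported counts with plain arithmetic. Computed directly, the filter keeps about 30.3% of the tables. There are about 0.086 filtered tables per document, and about 2.19 per document that contains tables. These values do not reproduce the 27.7%, 0.46 and 2.15 quoted on the slide, and the transcript does not state which base those figures use. The dictionary keys below are illustrative labels.

```python
# Counts reported on the "Our Dataset" slide (enwiki dump of 2012-09-03).
DOCUMENTS = 10_531_986
DOCUMENTS_WITH_TABLES = 413_256
TABLES = 2_989_098
TABLES_AFTER_FILTER = 905_929

ratios = {
    "tables kept by the filter": TABLES_AFTER_FILTER / TABLES,
    "documents containing tables": DOCUMENTS_WITH_TABLES / DOCUMENTS,
    "filtered tables per document": TABLES_AFTER_FILTER / DOCUMENTS,
    "filtered tables per document with tables": TABLES_AFTER_FILTER / DOCUMENTS_WITH_TABLES,
}

for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```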
  • 21. Methodology
  • 22. Ranking of Relationships • The current ranking function is naïve: $score = \frac{f_{rel}}{n_{rows}}$, where $f_{rel}$ is the frequency of the relationship (freq) and $n_{rows}$ is the number of rows in the table. • Example: http://en.wikipedia.org/wiki/AFC_Ajax, 16 players:
      freq  relationship                       score
      14    dbpedia.org/ontology/team          0.875
      14    dbpedia.org/property/clubs         0.875
      11    dbpedia.org/property/currentclub   0.6875
      3     dbpedia.org/property/youthclubs    0.1875
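The naive score is simple to compute. The sketch below uses a hypothetical function name and the AFC Ajax frequencies from the slide. It also makes explicit that the score stays in [0,1] only while $f_{rel}$ never exceeds $n_{rows}$.

```python
from typing import Dict


def naive_scores(relation_freq: Dict[str, int], n_rows: int) -> Dict[str, float]:
    """score = f_rel / n_rows for every candidate relationship."""
    if n_rows <= 0:
        raise ValueError("table must have at least one row")
    return {rel: freq / n_rows for rel, freq in relation_freq.items()}


if __name__ == "__main__":
    # Frequencies from the AFC Ajax example (16 player rows).
    freqs = {
        "dbpedia.org/ontology/team": 14,
        "dbpedia.org/property/clubs": 14,
        "dbpedia.org/property/currentclub": 11,
        "dbpedia.org/property/youthclubs": 3,
    }
    for rel, score in sorted(naive_scores(freqs, 16).items(), key=lambda kv: -kv[1]):
        print(f"{score:.4f}  {rel}")
```

Running it reproduces the score column above: 0.875, 0.875, 0.6875 and 0.1875.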
  • 23. Ranking of Relationships • For cases like this one the function does not work well, and $score \notin [0,1]$ (http://en.wikipedia.org/wiki/Danny_Kaye).
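The slide does not explain how the score leaves [0,1]. This sketch assumes one plausible cause: a relationship is counted once per matching entity pair. A row with several entities in one cell (an open problem on the last slide) can then add more than 1 to $f_{rel}$. Counting each relationship at most once per row bounds $f_{rel}$ by $n_{rows}$. The sketch contrasts the two counting schemes on hypothetical data; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List

# row_relations[i] lists the relationships found in row i, one entry per
# matching entity pair, so a relationship can repeat within a row.
RowRelations = List[List[str]]


def pair_level_scores(row_relations: RowRelations) -> Dict[str, float]:
    """Count every matching pair; the score can exceed 1."""
    n_rows = len(row_relations)
    if n_rows == 0:
        return {}
    counts = Counter(rel for rels in row_relations for rel in rels)
    return {rel: c / n_rows for rel, c in counts.items()}


def row_level_scores(row_relations: RowRelations) -> Dict[str, float]:
    """Count each relationship at most once per row; the score stays in [0, 1]."""
    n_rows = len(row_relations)
    if n_rows == 0:
        return {}
    counts = Counter(rel for rels in row_relations for rel in set(rels))
    return {rel: c / n_rows for rel, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical two-row table: row 0 has three entities in one cell,
    # each linked to the main entity by the same relationship.
    rows = [["rel:a", "rel:a", "rel:a"], ["rel:a", "rel:b"]]
    print(pair_level_scores(rows))  # {'rel:a': 2.0, 'rel:b': 0.5}
    print(row_level_scores(rows))   # {'rel:a': 1.0, 'rel:b': 0.5}
```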
  • 24. Ongoing Work and Challenges • Improve the ranking function for relations. • Store the 5.5M DBpedia (transitive) redirects locally to reduce query time. • Statistical analysis of Wikipedia tables: – number of columns and rows – headers and captions – external and internal links • The next big challenge is evaluation.
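Resolving redirects locally amounts to following each redirect chain to its final target. Here is a minimal sketch. It assumes the redirects have already been loaded into a plain dictionary from source URI to target URI; DBpedia publishes them as `dbo:wikiPageRedirects` triples, and the loading step is omitted. It adds a hop limit and a loop check in case the data contains cycles. The function names are hypothetical.

```python
from typing import Dict, Set

Redirects = Dict[str, str]  # source URI -> redirect target URI


def resolve(uri: str, redirects: Redirects, max_hops: int = 10) -> str:
    """Follow redirects from uri until a non-redirect, a loop, or max_hops."""
    seen: Set[str] = {uri}
    current = uri
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None or target in seen:
            break  # final page reached, or a loop detected
        seen.add(target)
        current = target
    return current


def flatten(redirects: Redirects) -> Redirects:
    """Resolve every chain up front so each later lookup is one dict access."""
    return {source: resolve(source, redirects) for source in redirects}


if __name__ == "__main__":
    dbr = "http://dbpedia.org/resource/"
    # Hypothetical chain: A redirects to B, which redirects to C.
    chain = {dbr + "A": dbr + "B", dbr + "B": dbr + "C"}
    print(flatten(chain))
```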
  • 25. What's next? • Some ideas: – Use the extracted relations to classify WikiTables – Define a similarity function for WikiTables (example figure labelled English / Italian)
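The slide leaves the similarity function open. One simple candidate is the Jaccard index over the sets of relationships extracted from two tables. It is chosen here purely for illustration and is not the author's proposal. The set representation and the function name are assumptions.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical relationship sets extracted from two tables.
    t1 = {"dbpedia.org/ontology/country", "dbpedia.org/property/subdivisionName"}
    t2 = {"dbpedia.org/ontology/country", "dbpedia.org/property/since"}
    print(jaccard(t1, t2))
```

Treating two empty sets as dissimilar is itself a design choice; returning 1.0 instead is an equally common convention.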
  • 26. What's next? • Example: http://en.wikipedia.org/wiki/Electronegativity. What does this number mean? There is no reference to these numbers here!
  • 27. What's next? • Examples: http://en.wikipedia.org/wiki/Electronegativity, http://en.wikipedia.org/wiki/Chlorine • "Chlorous acid is a chlorite" (http://dbpedia.org/page/Chlorous_acid)
  • 28. Open problems • Handle multiple entities in the same cell • Improve the ranking function • Handle redirects before querying DBpedia • How to evaluate the outcome • Thanks! Q & A. Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org