WikiTables DERI Talk

Published in: Education

Transcript

  • 1. Extending DBpedia (LOD) using WikiTables
      Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
  • 2. Linked Open Data
      Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
      October 12, 2012 -- E. Muñoz
  • 3. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data
      DBpedia provides an RDF version of all Wikipedia structured data (infoboxes).
  • 4. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data
      DBpedia provides an RDF version of all Wikipedia structured data (infoboxes),
      but not yet a version of all normal Wikipedia tables, or wikitables.
  • 5. Tables as a source of LOD
      Tables are inherently concise as well as information rich.
          – Infoboxes (attr-value)
          – Column headers represent types of information
          – The values represent instances of those types
          – Caption as another row
      http://en.wikipedia.org/wiki/Dublin http://en.wikipedia.org/wiki/Galway
  • 6. Reasoning over Wikipedia Tables
      Recovering Table Semantics
      “…Dublin is twinned with the following places:” http://en.wikipedia.org/wiki/Dublin
  • 7. Reasoning over Wikipedia Tables
      Entity annotation for cells, mappings to DBpedia resources http://en.wikipedia.org/wiki/Dublin
      Columns: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer)
          – dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/United_States
          – dbpedia.org/resource/Liverpool, dbpedia.org/resource/United_Kingdom
          – dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Japan
          – dbpedia.org/resource/Barcelona, dbpedia.org/resource/Spain
          – dbpedia.org/resource/Beijing, dbpedia.org/resource/People’s_Republic_of_China
  • 8. Reasoning over Wikipedia Tables
      Extracting relations http://en.wikipedia.org/wiki/Dublin
      Relations: dbpedia.org/ontology/country, dbpedia.org/property/subdivisionName, is dbpedia.org/ontology/country of
      Columns: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer)
          – dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/United_States
          – dbpedia.org/resource/Liverpool, dbpedia.org/resource/United_Kingdom
          – dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Japan
          – dbpedia.org/resource/Barcelona, dbpedia.org/resource/Spain
          – dbpedia.org/resource/Beijing, dbpedia.org/resource/People’s_Republic_of_China
  • 9. Reasoning over Wikipedia Tables
      • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
      • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
      • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
      • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
      • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
      • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
      • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
      • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
      • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Peoples_Republic_of_China> .
      • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Peoples_Republic_of_China> .
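Triples like the ones on this slide can be generated mechanically once the cells of a row are linked to DBpedia resources. The following is a minimal sketch in Python, not the talk's actual implementation: the function name `rows_to_ntriples` and its input shape are assumptions made for illustration, and the predicates are taken as given rather than discovered by querying DBpedia.

```python
DBR = "http://dbpedia.org/resource/"

def rows_to_ntriples(rows, predicates):
    """Emit one N-Triples line per (row, predicate) pair.

    rows: list of (subject, object) DBpedia resource names (hypothetical input shape).
    predicates: full predicate IRIs assumed to hold between the two columns.
    """
    lines = []
    for subj, obj in rows:
        for pred in predicates:
            lines.append(f"<{DBR}{subj}> <{pred}> <{DBR}{obj}> .")
    return lines

# Two rows from the Dublin twin-cities example.
rows = [
    ("San_Jose,_California", "United_States"),
    ("Liverpool", "United_Kingdom"),
]
predicates = [
    "http://dbpedia.org/property/subdivisionName",
    "http://dbpedia.org/ontology/country",
]
triples = rows_to_ntriples(rows, predicates)
for line in triples:
    print(line)
```

The sketch does no IRI escaping, so it is only suitable for resource names that are already valid in an IRI.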
  • 11. Reasoning over Wikipedia Tables
      • Let’s analyze these cases …
      • Liverpool
      • Matsue
      • Beijing
  • 12. Not that simple…
      • Web tables usually don’t have explicit semantics by themselves.
      • Main issues:
          – Complex tables with spans
          – Captions inside the table as another row
          – Not well-formed tables (i.e., not a matrix)
          – We need filters (e.g., min 2 columns, 2 rows)
      • We are extracting relations at row level and between the main entity and the table resources
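The filter mentioned on this slide (at least 2 columns, at least 2 rows, and a matrix shape) could look roughly like the sketch below. The function name `is_well_formed` and the list-of-lists table representation are assumptions for illustration; the talk does not show how its filter is implemented.

```python
def is_well_formed(table, min_rows=2, min_cols=2):
    """Return True if `table` passes the size and shape filter.

    table: list of rows, each row a list of cell strings (assumed representation,
    with row and column spans already expanded by the parser).
    """
    if len(table) < min_rows:
        return False
    width = len(table[0])
    if width < min_cols:
        return False
    # A well-formed table is a matrix: every row has the same number of cells.
    return all(len(row) == width for row in table)

good = [["City", "Nation"], ["Liverpool", "United Kingdom"]]
ragged = [["City", "Nation"], ["Liverpool"]]
single_column = [["City"], ["Liverpool"]]

print(is_well_formed(good), is_well_formed(ragged), is_well_formed(single_column))
```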
  • 13. Parsing: Extracting Tables
      First step: parsing Wiki format
          – Caption as another row
          – Rowspans
          – Table split with pictures
      http://en.wikipedia.org/wiki/People%27s_Republic_of_China
  • 14. Parsing: Extracting Tables
      • Problems with parsing the cell’s content http://en.wikipedia.org/wiki/Danny_Kaye
  • 16. Parsing: Extracting Tables
          – Same page link
          – Many different formats
          – Anchor text vs. content text
      http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s
  • 17. Extracting Relations
      A table containing tables http://en.wikipedia.org/wiki/AFC_Ajax
  • 18. Extracting Relations
      • Also relations between the main entity and the entities in the table http://en.wikipedia.org/wiki/AFC_Ajax
      16 players, dbpedia.org/resource/AFC_Ajax
          – 14 dbpedia.org/ontology/team
          – 14 dbpedia.org/property/clubs
          – 11 dbpedia.org/property/currentclub
          – 3 dbpedia.org/property/youthclubs
      In his DBpedia page there is no mention of AFC Ajax.
  • 19. dbpedia.org/resource/Christian_Eriksen http://en.wikipedia.org/wiki/AFC_Ajax
      Disambiguation page: dbpedia.org/resource/Ajax
  • 20. Our Dataset
      • enwiki dump from 2012-09-03 02:17:37
      • 8.6 GB of Wikipedia pages that comprise
          – 10,531,986 documents (HTML pages)
          – Only 413,256 HTML pages contain tables
          – 2,989,098 tables
          – 905,929 tables after the filter
              • 27.7% of all tables
          – 0.46 tables per page (or 2.15 discarding pages without tables)
  • 21. Methodology
  • 22. Ranking of Relationships
      • The current ranking function is naïve: $score = \frac{f_{rel}}{n_{rows}}$
      http://en.wikipedia.org/wiki/AFC_Ajax (16 players)
          freq  relationship                         score
          14    dbpedia.org/ontology/team            0.875
          14    dbpedia.org/property/clubs           0.875
          11    dbpedia.org/property/currentclub     0.6875
          3     dbpedia.org/property/youthclubs      0.1875
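The naïve score on this slide is the frequency of a relationship divided by the number of rows in the table. As a rough sketch in Python (the function name `rank_relations` and the dictionary input are assumptions for illustration, not the talk's code), the AFC Ajax figures can be reproduced like this:

```python
def rank_relations(freq_by_relation, n_rows):
    """Score each relation as freq / n_rows and sort by descending score."""
    scores = {rel: freq / n_rows for rel, freq in freq_by_relation.items()}
    # sorted() is stable, so relations with equal scores keep their input order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Frequencies from the AFC Ajax table (16 player rows).
ajax_freqs = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}
ranked = rank_relations(ajax_freqs, n_rows=16)
for relation, score in ranked:
    print(f"{score:.4f}  {relation}")
```

Nothing in this function bounds the result: if a relation is counted more times than there are rows, the score exceeds 1.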
  • 23. Ranking of Relationships
      • For these cases it does not work well, and $score \notin [0,1]$ http://en.wikipedia.org/wiki/Danny_Kaye
  • 24. Ongoing Work and Challenges
      • Improve the ranking function for relations.
      • Store the 5.5M DBpedia (transitive) redirects locally (optimizing time).
      • Statistical analysis of Wikipedia tables
          – Number of columns, rows
          – Headers, Captions
          – External and internal links
      • The next big challenge is the evaluation.
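Keeping the redirects in a local map lets a resource be resolved without a remote lookup per cell. The sketch below shows one way transitive redirects might be followed in Python; the function name `resolve_redirect`, the dictionary representation, and the placeholder resource names are all assumptions for illustration, not real DBpedia redirect data.

```python
def resolve_redirect(resource, redirects):
    """Follow redirects transitively until a resource with no redirect is reached.

    redirects: dict mapping a resource name to the resource it redirects to.
    A `seen` set guards against redirect cycles.
    """
    seen = set()
    while resource in redirects and resource not in seen:
        seen.add(resource)
        resource = redirects[resource]
    return resource

# Placeholder names, not actual DBpedia redirects.
redirects = {"Alias_A": "Alias_B", "Alias_B": "Canonical_Page"}

print(resolve_redirect("Alias_A", redirects))
print(resolve_redirect("Canonical_Page", redirects))
```

If the redirect map is already transitively closed, a single dictionary lookup is enough and the loop exits after one step.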
  • 25. What’s next?
      • Some ideas in mind:
          – Use the extracted relations to classify WikiTables
          – Define a similarity function for WikiTables
      English, Italian
  • 26. What’s next?
      http://en.wikipedia.org/wiki/Electronegativity
      What does this number mean? Here there is no reference to those numbers!
  • 27. What’s next?
      http://dbpedia.org/page/Chlorous_acid http://en.wikipedia.org/wiki/Electronegativity
      Chlorous acid is a chlorite http://en.wikipedia.org/wiki/Chlorine
  • 28. Open problems
      • Handle multiple entities in the same cell
      • Improve the ranking function
      • Handle redirects before querying DBpedia
      • How to evaluate the outcome
      Thanks! Q&A
      Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
