• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Extending DBpedia (LOD) using WikiTables
 

Extending DBpedia (LOD) using WikiTables

on

  • 538 views

 

Statistics

Views

Total Views
538
Views on SlideShare
533
Embed Views
5

Actions

Likes
1
Downloads
7
Comments
0

1 Embed 5

http://net2.deri.ie 5

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Extending DBpedia (LOD) using WikiTables Extending DBpedia (LOD) using WikiTables Presentation Transcript

    • Extending DBpedia (LOD) using WikiTables Emir Muñoz Unit for Reasoning and Querying emir.munoz@deri.org
    • Linked Open Data Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ October 12, 2012 -- E. Muñoz
    • Linked Open Data • DBpedia, an export of Wikipedia’s structured data DBpedia provides RDF version of all wikipedia structured data (infoboxes) October 12, 2012 -- E. Muñoz
    • Linked Open Data • DBpedia, an export of Wikipedia’s structured data DBpedia provides RDF version of all wikipedia structured data (infoboxes) But not yet a version of all normal Wikipedia tables or wikitables October 12, 2012 -- E. Muñoz
    • Tables as a source of LOD http://en.wikipedia.org/wiki/Dublin Caption as another row Column header represents types of information The values represent instances of that types http://en.wikipedia.org/wiki/Galway Infoboxes (attr-value) October 12, 2012 -- E. Muñoz Tables are inherently concise as well as information rich
    • Reasoning over Wikipedia Tables http://en.wikipedia.org/wiki/Dublin Recovering Table Semantics … October 12, 2012 -- E. Muñoz Dublin is twinned with the following places:
    • Reasoning over Wikipedia Tables dbpedia.org/resource/San_Jose,_California dbpedia.org/resource/Liverpool dbpedia.org/resource/Matsue,_Shimane dbpedia.org/resource/Barcelona dbpedia.org/resource/Beijing dbpedia.org/resource/United_States dbpedia.org/resource/United_Kingdom dbpedia.org/resource/Japan dbpedia.org/resource/Spain dbpedia.org/resource/People’s_Republic_of_China dbpedia.org/property/city dbpedia.org/property/nation dbpedia.org/property/since http://en.wikipedia.org/wiki/Dublin Entity annotation for cells, mappings to DBpedia resources (xsd:integer) October 12, 2012 -- E. Muñoz
    • Reasoning over Wikipedia Tables dbpedia.org/resource/San_Jose,_California dbpedia.org/resource/Liverpool dbpedia.org/resource/Matsue,_Shimane dbpedia.org/resource/Barcelona dbpedia.org/resource/Beijing dbpedia.org/resource/United_States dbpedia.org/resource/United_Kingdom dbpedia.org/resource/Japan dbpedia.org/resource/Spain dbpedia.org/resource/People’s_Republic_of_China (xsd:integer) dbpedia.org/property/city dbpedia.org/property/nation dbpedia.org/property/since dbpedia.org/ontology/country dbpedia.org/property/subdivisionName is dbpedia.org/ontology/country of http://en.wikipedia.org/wiki/Dublin Extracting relations October 12, 2012 -- E. Muñoz
    • Reasoning over Wikipedia Tables • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> . • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> . • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> . • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> . • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> . • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> . • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> . • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> . • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> . • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> . October 12, 2012 -- E. Muñoz
    • Reasoning over Wikipedia Tables • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> . • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> . • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> . • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> . • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> . • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> . • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> . • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> . • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> . • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> . October 12, 2012 -- E. Muñoz
    • Reasoning over Wikipedia Tables • Let’s analyze these cases … • Liverpool • Matsue • Beijing October 12, 2012 -- E. Muñoz
    • Not that simple… • Web tables usually don’t have explicit semantics by themselves. • Main issues: – Complex tables with spans – Captions inside the table as another row – Not well-formed tables (i.e., not a matrix) – We need filters (e.g., min 2 columns, 2 rows) • We are extracting relations at row level and between the main entity and the table resources October 12, 2012 -- E. Muñoz
    • Parsing: Extracting Tables http://en.wikipedia.org/wiki/People%27s_Republic_of_China Caption as another row Table split October 12, 2012 -- E. Muñoz Rowspans with pictures First step: parsing Wiki format
    • Parsing: Extracting Tables • Problems with parsing the cell’s content http://en.wikipedia.org/wiki/Danny_Kaye October 12, 2012 -- E. Muñoz
    • Parsing: Extracting Tables • Problems with parsing the cell’s content http://en.wikipedia.org/wiki/Danny_Kaye October 12, 2012 -- E. Muñoz
    • Parsing: Extracting Tables Same page link Many different formats Anchor text vs. Content text http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s October 12, 2012 -- E. Muñoz
    • Extracting Relations A table containing tables http://en.wikipedia.org/wiki/AFC_Ajax October 12, 2012 -- E. Muñoz
    • Extracting Relations • Also relations between the main entity and the entities in the table dbpedia.org/resource/AFC_Ajax 14 dbpedia.org/ontology/team 14 dbpedia.org/property/clubs 11 dbpedia.org/property/currentclub 3 dbpedia.org/property/youthclubs In his dbpedia page there is no mention to AFC Ajax http://en.wikipedia.org/wiki/AFC_Ajax 16 players October 12, 2012 -- E. Muñoz
    • dbpedia.org/resource/Christian_Eriksen Disambiguation page dbpedia.org/resource/Ajax http://en.wikipedia.org/wiki/AFC_Ajax October 12, 2012 -- E. Muñoz
    • Our Dataset • enwiki dump from 2012-09-03 02:17:37 • 8.6 GB of Wikipedia pages that comprise – 10,531,986 documents (HTML pages) – Only 413,256 HTML contains tables – 2,989,098 tables – 905,929 tables after the filter • 27.7% of the whole tables – 0.46 tables per page (or 2.15 discarding pages without tables) October 12, 2012 -- E. Muñoz
    • Methodology October 12, 2012 -- E. Muñoz
    • Ranking of Relationships • The current ranking function is naïve October 12, 2012 -- E. Muñoz http://en.wikipedia.org/wiki/AFC_Ajax 16 players freq relationship score 14 dbpedia.org/ontology/team 0,875 14 dbpedia.org/property/clubs 0,875 11 dbpedia.org/property/currentclub 0,6875 3 dbpedia.org/property/youthclubs 0,1875 𝑠𝑐𝑜𝑟𝑒 = 𝑓𝑟𝑒𝑙 𝑛 𝑟𝑜𝑤𝑠
    • Ranking of Relationships • For this cases is not good and 𝑠𝑐𝑜𝑟𝑒 ∉ [0,1] October 12, 2012 -- E. Muñoz http://en.wikipedia.org/wiki/Danny_Kaye
    • Ongoing Work and Challenges • Improve the ranking function for relations. • Store the 5.5M DBpedia (transitive) redirects locally (optimizing time). • Statistical analysis of Wikipedia tables – Number of columns, rows – Headers, Captions – External and internal links • The big following challenge is the evaluation. October 12, 2012 -- E. Muñoz
    • What’s next? • Some ideas in mind: – Use the extracted relations to classify WikiTables – Define a similarity function for WikiTables English Italian October 12, 2012 -- E. Muñoz
    • What’s next? October 12, 2012 -- E. Muñoz http://en.wikipedia.org/wiki/Electronegativity What means this number? Here there is no reference to those numbers!
    • What’s next? October 12, 2012 -- E. Muñoz http://en.wikipedia.org/wiki/Electronegativity http://en.wikipedia.org/wiki/Chlorine Chlorous acid is a chlorite http://dbpedia.org/page/Chlorous_acid
    • Open problems • Handle multiple-entities in the same cell • Improve the ranking function • Handle redirects before querying DBpedia • How to evaluate the outcome October 12, 2012 -- E. Muñoz Thanks! Q & A Thanks! Emir Muñoz Unit for Reasoning and Querying emir.munoz@deri.org