Aidan's PhD Viva

Slides of Aidan's PhD defense.

Published in: Technology

Aidan's PhD Viva

  1. 1. Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora Aidan Hogan PhD Viva
  2. 2. Cold Open
     • Figure 1: Web of Data (explicit data vs. implicit data)
     • Topic of thesis: how can consumers tap into the implicit data?
  3. 3. PRELUDE
     • The Area…
     • The Problem…
     • The Hypothesis…
  4. 4. The Area…
     • … Linked Data / Linking Open Data
  5. 5. … the micro … Linked Data Principles
     • Bottom-up approach to the Semantic Web
     • Individual publishers should:
       • Use URIs to name things (not just documents)
       • Use HTTP URIs that can be looked up
       • Return information in a common structured data model (RDF)
       • Use external URIs in your data so as to link to related data
  6. 6. … the macro … A Web of Data
     • [Figure: snapshots of the Web of Data for August 2007, November 2007, February 2008, March 2008, September 2008, March 2009, July 2009 and September 2010]
     • Images from: http://richard.cyganiak.de/2007/10/lod/ ; Cyganiak, Jentzsch
  7. 7. … so what’s The Problem? …
     • … heterogeneity
  8. 8. Take Query Answering …
     • SPARQL endpoints over Web data such as YARS2, Virtuoso, FactForge, etc.
     • Search engines such as SWSE, Sindice, Falcons, Swoogle, Watson, etc.
  9. 9. Take Query Answering …
     • Gimme webpages relating to Tim Berners-Lee:
       timbl:i foaf:page ?pages .
  10. 10. Heterogeneity in terminology … webpage properties:
     • foaf:page, foaf:homepage, foaf:isPrimaryTopicOf, foaf:weblog, doap:homepage, foaf:topic, foaf:primaryTopic, mo:musicBrainz, mo:myspace, …
     • (edges in the figure = rdfs:subPropertyOf and owl:inverseOf)
  11. 11. Linked Data, RDFS and OWL: Linked Vocabularies …
     • Image from http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg ; Giasson, Bergman
  12. 12. Heterogeneity in naming … Tim Berners-Lee URIs:
     • timbl:i, dblp:100007, identica:45563, adv:timbl, fb:en.tim_berners-lee, db:Tim-Berners_Lee, …
     • (edges in the figure = owl:sameAs)
  13. 13. Returning to our Query …
     • Gimme webpages relating to Tim Berners-Lee:
       timbl:i foaf:page ?pages .
     • Properties: foaf:page, foaf:homepage, foaf:isPrimaryTopicOf, doap:homepage, foaf:topic, foaf:primaryTopic, mo:myspace
     • URIs: timbl:i, dblp:100007, identica:45563, adv:timbl, fb:en.tim_berners-lee, db:Tim-Berners_Lee
     • … 7 x 6 = 42 possible patterns
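The 7 x 6 = 42 count on this slide can be reproduced with a short sketch. The property and URI lists come from the slide. Treating foaf:topic and foaf:primaryTopic as inverse properties (so the person URI goes in the object position) is an assumption based on the owl:inverseOf edges mentioned on slide 10, and the function name is hypothetical.

```python
from itertools import product

# Webpage-related properties and Tim Berners-Lee URIs listed on the slide.
PROPERTIES = [
    "foaf:page", "foaf:homepage", "foaf:isPrimaryTopicOf", "doap:homepage",
    "foaf:topic", "foaf:primaryTopic", "mo:myspace",
]
URIS = [
    "timbl:i", "dblp:100007", "identica:45563", "adv:timbl",
    "fb:en.tim_berners-lee", "db:Tim-Berners_Lee",
]
# Assumption: these two point from the page to the person (inverse direction).
INVERSE_PROPERTIES = {"foaf:topic", "foaf:primaryTopic"}


def query_patterns():
    """Enumerate every triple pattern a consumer would need without reasoning."""
    patterns = []
    for prop, uri in product(PROPERTIES, URIS):
        if prop in INVERSE_PROPERTIES:
            patterns.append(f"?pages {prop} {uri} .")
        else:
            patterns.append(f"{uri} {prop} ?pages .")
    return patterns


all_patterns = query_patterns()
print(len(all_patterns))
```

Without reasoning or consolidation, a consumer would have to issue the union of all of these patterns to approximate the single pattern in the original query.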
  14. 14. … The Hypothesis? …
     • … we can use the OWL and RDFS inherent in Linked Data to attenuate the problem of heterogeneity for consumers
  15. 15. Scenario…
     • … take a static corpus crawled from Linked Data…
     • … about a billion triples or so…
     • … and tackle the problem(s) of heterogeneity
     • … (without domain-specific “cheats”).
  16. 16. Setup… hardware
     • … 9 machines
       • … ~6 years old… 4 GB RAM, 2.2 GHz, Ethernet
  17. 17. Setup… corpus
     • … crawl (9 machines: 52.5 hr)
       • … took random seed URIs from Billion Triple Challenge 2009 dataset
       • … crawled ~4 million RDF/XML documents
       • … from arbitrary domains (e.g., dbpedia.org)
         • Only found 785 domains providing RDF/XML
       • … 1.118 billion quadruples
       • … 947 million unique triples
  18. 18. Setup… ranking (9 machines: 30.3 hr)
     • … applied PageRank over interlinked source docs.
       • … source A links to source B if A uses a URI which “dereferences” (points) to B
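As a rough illustration of the ranking step, the following is a minimal single-machine PageRank sketch over a document link graph. It is not the distributed implementation used for the thesis. The damping factor, iteration count, function name and document names are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each source document to the documents it links to, i.e. the
    documents that the URIs it uses dereference to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {doc: 1.0 / n for doc in nodes}
    for _ in range(iterations):
        # Rank held by documents without outlinks is spread evenly.
        dangling = sum(rank[doc] for doc in nodes if not links.get(doc))
        new_rank = {doc: (1 - damping) / n + damping * dangling / n
                    for doc in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: two documents both use terms from a vocabulary document.
example_links = {"doc:a": ["doc:foaf"], "doc:b": ["doc:foaf"], "doc:foaf": []}
example_ranks = pagerank(example_links)
```

In this toy graph the vocabulary document ends up with the highest rank, which matches the intuition that widely used vocabularies are well linked.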
  19. 19. Challenges…
     • … what (OWL) reasoning is feasible for Linked Data?
  20. 20. Linked Data Reasoning: Challenges
     • Scalable
     • Expressive
     • Robust
     • Domain-Agnostic
  21. 21. CORE
     • 1. Reasoning…
     • 2. Annotated Reasoning…
     • 3. Consolidation…
  22. 22. 1. Reasoning
  23. 23. High Level Approach…
     • … apply a subset of OWL 2 RL/RDF rules over the data
  24. 24. Web Reasoning: Forward Chaining!
     • Forward-chaining materialisation:
       • Avoid runtime expense of backward-chaining
         • Users taught impatience by Google
       • Pre-compute answers for quick retrieval
       • Web-scale systems should be scalable!
         • More data = more disk-space/machines
     • One size does not fit all!
     • Don't materialise too much!
  25. 25. Our Approach
     • Scalable Authoritative OWL Reasoner
  26. 26. Our Approach…
     • INPUT: flat file of triples (quads)
     • OUTPUT: flat file of (partial) inferred triples (quads)
  27. 27. Scalable Reasoning: In-mem T-Box
     • Main optimisation: store T-Box in memory
     • T-Box: (loosely) data describing classes and properties.
       • Aka. schemata/vocabularies/ontologies/terminologies.
       • E.g.,
         • foaf:topic owl:inverseOf foaf:page .
         • sioc:UserAccount rdfs:subClassOf foaf:OnlineAccount .
     • Most commonly accessed data for reasoning
     • Quite small (~0.1% for our Linked Data corpus)
       • High selectivity (if you prefer)
     • A-Box: lots of matches for ?s foaf:page ?o .
     • vs. T-Box: few matches for foaf:page ?p ?o . + ?s ?p foaf:page .
  28. 28. Scalable Reasoning: Two Scans
     • Scan 1: scan input data, separate T-Box statements, load T-Box statements into memory
       • Do T-Box level reasoning if required (semi-naïve)
     • Scan 2: scan all on-disk data, join with in-memory T-Box.
  29. 29. Scalable Reasoning: No A-Box Joins
     • [Figure: IN-MEM T-BOX joined with ON-DISK A-BOX to produce ON-DISK OUTPUT]
     • ON-DISK A-BOX: ... ex:me foaf:homepage ex:hp . ...
     • ON-DISK OUTPUT: ... ex:hp rdf:type foaf:Document . ex:me foaf:page ex:hp . ex:hp foaf:topic ex:me . ...
     • Execution of three rules:
       • OWL 2 RL rule prp-inv1: ?p1 owl:inverseOf ?p2 . ?x ?p1 ?y . ⇒ ?y ?p2 ?x .
       • OWL 2 RL rule prp-rng: ?p rdfs:range ?c . ?x ?p ?y . ⇒ ?y a ?c .
       • OWL 2 RL rule prp-spo1: ?p1 rdfs:subPropertyOf ?p2 . ?x ?p1 ?y . ⇒ ?x ?p2 ?y .
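The two scans and the three rules on this slide can be sketched in a few lines. This is a simplified in-memory illustration, not the on-disk reasoner described in the talk. The T-Box triples in the example are assumptions chosen so that the slide's A-Box triple yields the slide's output, and the function names are hypothetical. No T-Box level (semi-naïve) reasoning is attempted here.

```python
TBOX_PREDICATES = {"rdfs:subPropertyOf", "rdfs:range", "owl:inverseOf"}


def scan1_load_tbox(triples):
    """Scan 1: pull the T-Box statements into in-memory lookup maps."""
    sub_prop, prop_range, inverse = {}, {}, {}
    index = {"rdfs:subPropertyOf": sub_prop, "rdfs:range": prop_range,
             "owl:inverseOf": inverse}
    for s, p, o in triples:
        if p in index:
            index[p].setdefault(s, set()).add(o)
    return sub_prop, prop_range, inverse


def scan2_infer(triples, tbox):
    """Scan 2: join each A-Box triple, on its own, with the in-memory T-Box."""
    sub_prop, prop_range, inverse = tbox
    for triple in triples:
        if triple[1] in TBOX_PREDICATES:
            continue
        seen, queue = {triple}, [triple]
        while queue:  # re-apply the rules to each inference; no A-Box joins
            x, p, y = queue.pop()
            new = [(x, q, y) for q in sub_prop.get(p, ())]            # prp-spo1
            new += [(y, "rdf:type", c) for c in prop_range.get(p, ())]  # prp-rng
            new += [(y, q, x) for q in inverse.get(p, ())]            # prp-inv1
            for inferred in new:
                if inferred not in seen:
                    seen.add(inferred)
                    queue.append(inferred)
                    yield inferred


data = [
    # Assumed T-Box (not shown in the transcript of the slide).
    ("foaf:homepage", "rdfs:subPropertyOf", "foaf:page"),
    ("foaf:homepage", "rdfs:range", "foaf:Document"),
    ("foaf:page", "owl:inverseOf", "foaf:topic"),
    # A-Box triple from the slide.
    ("ex:me", "foaf:homepage", "ex:hp"),
]
output = set(scan2_infer(data, scan1_load_tbox(data)))
```

Because every rule here has at most one A-Box pattern in its body, each input triple can be processed independently, which is what makes a streaming scan possible.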
  30. 30. Scalable Reasoning: A-Box joins?
     • However: some rules do require A-Box joins
       • ?p a owl:TransitiveProperty . ?x ?p ?y . ?y ?p ?z . ⇒ ?x ?p ?z .
       • Difficult to engineer a scalable solution (which reaches a fixpoint) for Linked Data(?)
       • Can lead to quadratic inferences
     • A lot of useful reasoning still possible without A-Box joins…
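To see why the transitivity rule can lead to quadratic inferences, consider a naive fixpoint computation over a simple chain. This sketch is only illustrative: the chain data and function name are hypothetical, and the naive loop is not how a scalable system would be engineered.

```python
def transitive_closure(edges):
    """Apply ?x ?p ?y . ?y ?p ?z . => ?x ?p ?z . until a fixpoint is reached."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        successors = {}
        for x, y in closure:
            successors.setdefault(x, set()).add(y)
        # Iterate over a snapshot because the closure grows inside the loop.
        for x, y in list(closure):
            for z in successors.get(y, ()):
                if (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# A chain n0 -> n1 -> ... -> n9 for a single transitive property.
chain = [(f"n{i}", f"n{i + 1}") for i in range(9)]
closed = transitive_closure(chain)
print(len(chain), len(closed))
```

A chain over n nodes has n - 1 input edges but n(n - 1)/2 edges after closure, and each round of the loop needs a join of the A-Box with itself.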
  31. 31. Authoritative Reasoning
     • Consider source of T-Box (schemata) data
     • Class/property URIs dereference to their authoritative document
       • FOAF spec authoritative for foaf:Person ✓
       • MY spec not authoritative for foaf:Person ✘
     • Allow “extension” in authoritative documents
       • my:Person rdfs:subClassOf foaf:Person . (MY spec) ✓
     • BUT: Reduce obscure memberships
       • foaf:Person rdfs:subClassOf my:Person . (MY spec) ✘
     • ALSO: Protect specifications
       • foaf:knows a owl:SymmetricProperty . (MY spec) ✘
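The three examples on this slide can be checked with a deliberately simplified authority test: an axiom is kept only if it comes from the document that its subject term dereferences to. Looking only at the subject is an assumption that happens to fit these examples; it is not a full statement of the authoritative reasoning rules. The dereferencing map and document names are hypothetical.

```python
# Hypothetical dereferencing map: term -> document it dereferences to.
DEREF = {
    "foaf:Person": "doc:foaf-spec",
    "foaf:knows": "doc:foaf-spec",
    "my:Person": "doc:my-spec",
}


def is_authoritative(axiom, source_doc, deref=DEREF):
    """Simplified check: the source must be authoritative for the subject."""
    subject, _predicate, _obj = axiom
    return deref.get(subject) == source_doc


def filter_authoritative(axioms_with_source, deref=DEREF):
    """Keep only the axioms served by an authoritative document."""
    return [axiom for axiom, doc in axioms_with_source
            if is_authoritative(axiom, doc, deref)]


# The three axioms from the slide, all published in the MY spec.
my_spec_axioms = [
    (("my:Person", "rdfs:subClassOf", "foaf:Person"), "doc:my-spec"),
    (("foaf:Person", "rdfs:subClassOf", "my:Person"), "doc:my-spec"),
    (("foaf:knows", "rdf:type", "owl:SymmetricProperty"), "doc:my-spec"),
]
accepted = filter_authoritative(my_spec_axioms)
```

Only the extension axiom survives, so third-party documents can add their own subclasses but cannot change what is inferred for members of foaf:Person or for uses of foaf:knows.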
  32. 32. Survey of terminology: counts
     • Looked at use of RDFS and OWL in our corpus
     • rdfs:subClassOf: ~307k axioms, ~51k docs ✓
     • owl:equivalentClass: ~23k axioms, ~23k docs ✓
     • rdfs:domain: ~16k axioms, 623 docs ✓
     • rdfs:range: ~14k axioms, 717 docs ✓
     • owl:unionOf: ~13k axioms, 109 docs ✓
     • rdfs:subPropertyOf: ~9k axioms, 227 docs ✓
     • owl:inverseOf: ~1k axioms, 98 docs ✓
     • owl:disjointWith: 917 axioms, 60 docs ✘
     • owl:someValuesFrom: 465 axioms, 48 docs ✓
     • owl:intersectionOf: 325 axioms, 12 docs ✓ / ✘
     • …
  33. 33. Survey of terminology: counts
     • ...summary please?
       • Our “cheap rules” cover 99% of RDFS/OWL axioms in our corpus
       • 82.3% of such axioms have an authoritative version
         • 78.3% of all non-authoritative axioms come from one doc
         • (without which, ~96% of axioms have auth. version)
       • 9.1% of documents have non-authoritative axioms
       • Authoritative reasoning for cheap rules fully supports 90.6% of the “vocabulary documents”
  34. 34. Survey of terminology: ranks
     • Looked at use of RDFS and OWL wrt. ranks of documents…
     • rdfs:subClassOf: 0.295 ✓
     • rdfs:range: 0.294 ✓
     • rdfs:domain: 0.292 ✓
     • rdfs:subPropertyOf: 0.090 ✓
     • owl:FunctionalProperty: 0.063 ✘
     • owl:disjointWith: 0.049 ✘
     • owl:inverseOf: 0.047 ✓
     • owl:unionOf: 0.035 ✓
     • owl:SymmetricProperty: 0.033 ✓
     • owl:equivalentClass: 0.021 ✓
     • owl:InverseFunctionalProperty: 0.030 ✘
     • owl:equivalentProperty: 0.030 ✓
     • owl:someValuesFrom: 0.030 ✓ / ✘
  35. 35. Survey of terminology: ranks
     • ...summary please?
       • Adding up the ranks of all vocabularies our rules fully support gives 77% of the total rank of all vocabularies
       • Adding up the ranks of all vocabularies our authoritative rules fully support gives 70% of the total rank of all vocabularies
       • The highest ranked document our rules do not fully support was 5th overall: SKOS
       • The highest ranked document with non-authoritative axioms was 7th overall: FOAF
  36. 36. ...let’s stick to the simple rules
  37. 37. Scalable Distributed Reasoning
     • [Figure: each machine holds a different A-Box segment (e.g. ex:me ex:presented ex:ThisTalk .); the steps EXTRACT T-BOX and COLLECT T-BOX give every machine the same T-Box; each machine then writes its LOCAL OUTPUT (e.g. ex:me rdf:type ex:Awesome .)]
  38. 38. Reasoning Performance (1 machine)
  39. 39. Reasoning Performance: Distrib.
     • 9 machines: Total 3.35 hours
  40. 40. Reasoning: Results
     • 947 million unique triples (input)
     • 962 million unique/novel triples (inferred)
  41. 41. 2. Annotated Reasoning
  42. 42. Annotated Reasoning
     • Let’s try to track some meta-information during the reasoning process
     • Annotate input triples with information
     • Use annotated reasoning framework for transforming annotations on input triples into annotations on output triples
  43. 43. Annotated Reasoning: ranks
     • Each input triple is assigned the sum of the ranks of the documents in which it appears…
     • foaf:Person rdfs:subClassOf foaf:Agent 0.3 .
     • timbl:i rdf:type foaf:Person 0.04 .
     • aidan:me rdf:type foaf:Person 0.0001 .
  44. 44. Annotated Reasoning
     • During reasoning, inferences are assigned the rank of the least-trustworthy triple involved in their “proof”
     • foaf:Person rdfs:subClassOf foaf:Agent 0.3 .
     • timbl:i rdf:type foaf:Person 0.04 .
     • ⇒
     • timbl:i rdf:type foaf:Agent 0.04 .
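The rank propagation on this slide amounts to taking the minimum over the premises of a proof. The sketch below does this for the subclass rule only. Keeping the best (maximum) rank when an inference has several proofs is an assumption, suggested by the later remark about removing non-optimal triples; the function name is hypothetical.

```python
def annotated_subclass_inference(ranked_triples):
    """ranked_triples maps (s, p, o) to a rank; returns ranked inferences."""
    superclasses = {}
    for (s, p, o), rank in ranked_triples.items():
        if p == "rdfs:subClassOf":
            superclasses.setdefault(s, []).append((o, rank))
    inferred = {}
    for (x, p, cls), rank in ranked_triples.items():
        if p != "rdf:type":
            continue
        for sup, axiom_rank in superclasses.get(cls, ()):
            triple = (x, "rdf:type", sup)
            # A proof is only as trustworthy as its weakest premise.
            proof_rank = min(rank, axiom_rank)
            # Assumption: keep the best rank over alternative proofs.
            inferred[triple] = max(proof_rank, inferred.get(triple, 0.0))
    return inferred


# Ranked input triples from the slides.
ranked_input = {
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"): 0.3,
    ("timbl:i", "rdf:type", "foaf:Person"): 0.04,
    ("aidan:me", "rdf:type", "foaf:Person"): 0.0001,
}
ranked_output = annotated_subclass_inference(ranked_input)
```

With the slide's input, the inference about timbl:i inherits 0.04 from its membership triple rather than 0.3 from the highly ranked FOAF axiom.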
  45. 45. Why?
     • Can do top-k materialisation
       • Only give me inferences above a certain rank threshold
       • Only give me top-k inferences
     • Can fix inconsistencies in the data…
       • … aka. logical contradictions
       • … interpreting the rank values as denoting “trustworthy” data
  46. 46. Inconsistencies: aka. Contradictions
     • foaf:Person owl:disjointWith foaf:Document .
  47. 47. Cannot compute…
     • ?c1 owl:disjointWith ?c2 . ?x rdf:type ?c1 . ?x rdf:type ?c2 . ⇒ false
     • foaf:Person owl:disjointWith foaf:Document . ex:sleepygirl rdf:type foaf:Person . ex:sleepygirl rdf:type foaf:Document . ⇒ false
  48. 48. Fixing inconsistencies
     • Considered two approaches:
     • 1. Find the “consistency threshold” of the input + inferred data:
       • The largest rank such that all data above that rank are consistent
       • Unfortunately, the 22nd ranked document had an ill-typed literal, and so was inconsistent…
       • So we would keep the data of ~22 documents
       • And throw away the data of nearly four million
  49. 49. Fixing inconsistencies
     • Time for Plan B:
     • 2. Perform a “granular” repair of the data
       • Remove the weakest triple causing each contradiction
     • foaf:Person owl:disjointWith foaf:Document 0.3 .
     • ex:sleepygirl rdf:type foaf:Person 0.007 .
     • ex:sleepygirl rdf:type foaf:Document 0.002 .
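A minimal sketch of the granular repair for disjoint-class contradictions follows, using the ranked triples from this slide. It handles only owl:disjointWith and works in memory. The function name is hypothetical, and the real diagnosis and repair procedure is likely more involved than this.

```python
def repair_disjointness(ranked_triples):
    """Drop the lowest-ranked triple from each disjointness contradiction."""
    disjoint_pairs = [(s, o) for (s, p, o) in ranked_triples
                      if p == "owl:disjointWith"]
    memberships = {}
    for (x, p, cls) in ranked_triples:
        if p == "rdf:type":
            memberships.setdefault(x, set()).add(cls)
    removed = set()
    for c1, c2 in disjoint_pairs:
        for x, classes in memberships.items():
            if c1 in classes and c2 in classes:
                proof = [(c1, "owl:disjointWith", c2),
                         (x, "rdf:type", c1),
                         (x, "rdf:type", c2)]
                if any(t in removed for t in proof):
                    continue  # this contradiction is already broken
                removed.add(min(proof, key=ranked_triples.get))
    return {t: r for t, r in ranked_triples.items() if t not in removed}


# Ranked triples from the slide.
inconsistent = {
    ("foaf:Person", "owl:disjointWith", "foaf:Document"): 0.3,
    ("ex:sleepygirl", "rdf:type", "foaf:Person"): 0.007,
    ("ex:sleepygirl", "rdf:type", "foaf:Document"): 0.002,
}
repaired = repair_disjointness(inconsistent)
```

Here the membership in foaf:Document has the lowest rank (0.002), so it is the triple that gets removed while the rest of the data is kept.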
  50. 50. Inconsistencies found
     • ~294k ill-typed datatypes
     • ~7k members of disjoint classes
  51. 51. Performance
     • 9 machines
     • Annotated Reasoning: 14.6 hrs
       • (vs. 3.35 hrs w/o annotations: need to do a distributed sort to remove non-optimal triples)
     • Detect/Extract Inconsistencies: 2.9 hrs
     • Diagnosis/Repair: 2.8 hrs
       • Total ~20.3 hours
  52. 52. 3. Consolidation
  53. 53. Consolidation for Linked Data
  54. 54. Baseline Approach…
     • … use the explicit owl:sameAs relations given in the data…
  55. 55. Consolidation: Baseline
     • Scan the data and extract all owl:sameAs triples
       • timbl:i owl:sameAs identica:45563 .
       • dbpedia:Berners-Lee owl:sameAs identica:45563 .
     • Load into memory
       • Use a map to store equivalences:
       • timbl:i, identica:45563 and dbpedia:Berners-Lee each map to the same set { timbl:i, identica:45563, dbpedia:Berners-Lee }
  56. 56. Consolidation: Baseline
     • For each set of equivalent identifiers, choose a canonical term
     • { timbl:i, identica:45563, dbpedia:Berners-Lee }
  57. 57. Canonicalisation
     • Scan data a second time:
       • Rewrite identifiers to their canonical version
         • Skip predicates and values of rdf:type
     • Equivalence set: { timbl:i, identica:45563, dbpedia:Berners-Lee }
     • Input:
       • timbl:i rdf:type foaf:Person .
       • identica:48404 foaf:knows identica:45563 .
       • dbpedia:Berners-Lee dpo:birthDate "1955-06-08"^^xsd:date .
     • Output:
       • dbpedia:Berners-Lee rdf:type foaf:Person .
       • identica:48404 foaf:knows dbpedia:Berners-Lee .
       • dbpedia:Berners-Lee dpo:birthDate "1955-06-08"^^xsd:date .
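The baseline consolidation steps on slides 55 to 57 can be sketched as follows, using the triples shown there. Picking the lexicographically smallest identifier as the canonical term is an assumption (the slides do not say how it is chosen), as is dropping the owl:sameAs triples from the rewritten output. Function names are hypothetical.

```python
def build_equivalences(triples):
    """First scan: map every identifier to its set of equivalent identifiers."""
    equivalences = {}
    for s, p, o in triples:
        if p != "owl:sameAs":
            continue
        merged = equivalences.get(s, {s}) | equivalences.get(o, {o})
        for term in merged:
            equivalences[term] = merged  # all members share one set
    return equivalences


def canonicalise(triples, equivalences):
    """Second scan: rewrite subjects and objects to their canonical term."""
    # Assumption: the canonical term is the lexicographically smallest member.
    canonical = {term: min(members) for term, members in equivalences.items()}
    rewritten = []
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue  # assumption: equivalence triples are not re-emitted
        new_s = canonical.get(s, s)
        # Predicates and the values of rdf:type are left untouched.
        new_o = o if p == "rdf:type" else canonical.get(o, o)
        rewritten.append((new_s, p, new_o))
    return rewritten


corpus = [
    ("timbl:i", "owl:sameAs", "identica:45563"),
    ("dbpedia:Berners-Lee", "owl:sameAs", "identica:45563"),
    ("timbl:i", "rdf:type", "foaf:Person"),
    ("identica:48404", "foaf:knows", "identica:45563"),
    ("dbpedia:Berners-Lee", "dpo:birthDate", '"1955-06-08"^^xsd:date'),
]
consolidated = canonicalise(corpus, build_equivalences(corpus))
```

With this choice of canonical term, the three data triples are rewritten to use dbpedia:Berners-Lee, matching the output shown on slide 57.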
  58. 58. Baseline Consolidation: Performance
     • 9 machines
     • Extract owl:sameAs: 0.2 hr
     • Gather owl:sameAs: 0.1 hr
     • Canonicalise data: 0.7 hr
       • Total ~1.1 hours
  59. 59. Baseline Consolidation: Results
     • Applied over raw input data
     • ~12 million owl:sameAs triples
     • ~2.2 million sets of equivalent identifiers
     • ~5.8 million identifiers involved
       • ~2.65 identifiers per set
       • ~99.99% of terms were URIs
       • ~6.25% of all URIs
  60. 60. Extended Approach…
     • … use the owl:sameAs relations inferable through reasoning…
  61. 61. Extended Consolidation
     • Infer owl:sameAs through reasoning (OWL 2 RL/RDF)
       • explicit owl:sameAs (again)
       • owl:InverseFunctionalProperty
       • owl:FunctionalProperty
       • owl:cardinality 1 / owl:maxCardinality 1
     • foaf:homepage a owl:InverseFunctionalProperty .
     • timbl:i foaf:homepage w3c:timblhomepage .
     • adv:timbl foaf:homepage w3c:timblhomepage .
     • ⇒
     • timbl:i owl:sameAs adv:timbl .
     • … then apply consolidation as before
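The inverse-functional-property example on this slide can be reproduced with a small in-memory sketch. The talk derives these relations with on-disk batch processing, so this only illustrates the rule itself. The function name is hypothetical, and `rdf:type` is written out where the slide uses the shorthand `a`.

```python
from itertools import combinations


def infer_same_as_from_ifps(triples):
    """Subjects sharing a value for an inverse-functional property are equal."""
    ifps = {s for s, p, o in triples
            if p == "rdf:type" and o == "owl:InverseFunctionalProperty"}
    subjects_by_value = {}
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value.setdefault((p, o), set()).add(s)
    inferred = set()
    for subjects in subjects_by_value.values():
        # owl:sameAs is symmetric, so one direction per pair is emitted.
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred


# Triples from the slide.
ifp_data = [
    ("foaf:homepage", "rdf:type", "owl:InverseFunctionalProperty"),
    ("timbl:i", "foaf:homepage", "w3c:timblhomepage"),
    ("adv:timbl", "foaf:homepage", "w3c:timblhomepage"),
]
same_as = infer_same_as_from_ifps(ifp_data)
```

Grouping subjects by (property, value) is itself an A-Box join, which is why this step is costlier than the join-free rules used for the general reasoning.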
  62. 62. Derive owl:sameAs on-disk
     • OWL 2 RL/RDF consolidation rules require A-Box joins!
     • Might not be able to fit owl:sameAs index in memory (4 GB)!
     • ⇒ Use on-disk batch-processing
       • Distributed sorts, scans and merge-joins
  63. 63. Extended Consolidation: Performance
     • 9 machines
     • Inferring owl:sameAs: ~7.4 hr
     • Canonicalise data: ~4.9 hr
       • Total ~12.3 hours
       • (11x baseline)
  64. 64. Extended Consolidation: Results
     • ~12 million explicit owl:sameAs triples (as before)
     • ~8.7 million thru. owl:InverseFunctionalProperty
     • ~106 thousand thru. owl:FunctionalProperty
     • none thru. owl:cardinality / owl:maxCardinality
     • ~2.8 million sets of equivalent identifiers
       • (1.31x baseline)
     • ~14.86 million identifiers involved
       • (2.58x baseline)
     • ~5.8 million URIs
       • (1.014x baseline)
  65. 65. CONCLUSION
  66. 66. timbl:i foaf:page ?pages . ⇒ dbpedia:Berners-Lee foaf:page ?pages .
     • Equivalence set: { timbl:i, identica:45563, dbpedia:Berners-Lee }
  67. 67. Conclusions
     • Heterogeneity poses a significant problem for consuming Linked Data
     • Lightweight reasoning can go a long way
       • Simple/authoritative rules have reasonable coverage
     • Deceit/Noise ≠ End Of World
     • Inconsistency ≠ End Of World
       • Useful for finding noise in fact!
     • Explicit owl:sameAs vs. extended consolidation:
       • Extended consolidation mostly for consolidating blank-nodes from older FOAF exporters