Enhancing Large-Scale RDF Web Knowledge-bases for Query Answering

Given at DERI Galway on 2010/02/12.

More and more data is being published on the Web through RDF; in particular – and largely under the auspices of the pragmatic Linked Data community – more and more structured data is being published within a plethora of different domains: e.g., information is being published from Wikipedia, BBC, LastFM, the UK government, etc., describing people, organisations, online communities, movies, proteins, and so forth. In this talk, I will present research I have carried out during my PhD which aims at enhancing large-scale RDF web datasets for query-answering: I will show some simple examples of problems with query-answering over the “native” RDF data, and discuss pragmatic solutions – in light of these examples – which involve using a scalable and best-effort reasoning approach. I will also discuss open questions and future directions along the lines of the above topic.

Published in: Technology, Education
2 Comments
  • Complexity of knowledge representation leads to an inefficient reasoning service.
    The structure of the representation must make it possible to simplify the reasoners.
  • Conclusion:

    1. First T-Box reasoning, second A-Box reasoning?
    2. Good T-Box reasoning needs good interlinked ontology specifications?
    3. Establishing a better dialogue between different ontology modelling developers to avoid ontology hijacking.

    Cheers,

    Bob

Enhancing Large-Scale RDF Web Knowledge-bases for Query Answering

  1. Enhancing Large-Scale RDF Web Knowledge-bases for Query Answering. Mini-Viva, Aidan Hogan, 12th February 2010
  2. Overview. Fig 1: RDF Web Dataset (explicit data, implicit data). Topic of today’s talk: how to exploit implicit data for “query answering”
  3. Query Answering…
     • … over RDF Web data
     • … (Linked Data if you prefer)
     • Search engines such as SWSE, Sindice, Falcons, Swoogle, Watson, etc.
     • SPARQL endpoints over Web data such as YARS2, Virtuoso, etc.
  4. Problem: Synonymous Omissions
     • Data: ex:Aidan ex:presented ex:RR2009Talk . deri:Aidan ex:presented deri:FridayTalk .
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
     • Figure labels: IMPLICIT, EXPLICIT
  5. Problem: Synonymous Duplicates
     • Data: ex:Aidan ex:presented deri:FridayTalk . ex:Aidan ex:presented ex:FridayTalk .
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
     • Figure labels: IMPLICIT, EXPLICIT
  6. Problem: Incomplete Answers
     • Data: ex:RR2009Talk ex:presentedBy ex:Aidan .
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
     • Figure labels: IMPLICIT, EXPLICIT
  7. Solution: Publish Complete Data?
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
     • Data: ex:RR2009Talk ex:presentedBy ex:Aidan . ex:Aidan ex:presented ex:RR2009Talk .
     • Figure labels: IMPLICIT, EXPLICIT
  8. Solution: Write Query in many ways?
     • Data: ex:RR2009Talk ex:presentedBy ex:Aidan .
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk . ?talk ex:presentedBy ex:Aidan .
     • Figure labels: IMPLICIT, EXPLICIT
  9. Solution: Exploit OWL and RDFS…
     • deri:Aidan ex:presented deri:FridayTalk .
     • deri:Aidan owl:sameAs ex:Aidan .
     • ex:RR2009Talk ex:presentedBy ex:Aidan .
     • ex:presentedBy owl:inverseOf ex:presented .
     • ex:Aidan ex:presented deri:FridayTalk .
     • ex:Aidan ex:presented ex:RR2009Talk .
     • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
     • Figure labels: IMPLICIT, EXPLICIT
  10. OWL / RDFS
      • (Loosely) Define the semantics of classes and properties… (define relationships between terms)
        • ex:presentedBy owl:inverseOf ex:presented .
        • ex:presentedBy rdfs:domain ex:Talk .
        • ex:presentedBy rdfs:range ex:Person .
      • Define equivalence between individuals (owl:sameAs)
        • ex:Aidan owl:sameAs deri:Aidan .
      • Give machines an insight into the meaning of data
      • Allows for reasoning
  11. Reasoning
      • (Loosely) Use the semantics of classes and properties, as defined in RDFS and OWL, to make implicit knowledge explicit
      • One approach is using rules: IF condition THEN consequent
        • Rule: ?p1 owl:inverseOf ?p2 . ?s ?p1 ?o . => ?o ?p2 ?s .
          Example: ex:presentedBy owl:inverseOf ex:presented . ex:FridayTalk ex:presentedBy ex:Aidan . => ex:Aidan ex:presented ex:FridayTalk .
        • Rule: ?p rdfs:domain ?c . ?s ?p ?o . => ?s rdf:type ?c .
          Example: ex:presentedBy rdfs:domain ex:Talk . ex:FridayTalk ex:presentedBy ex:Aidan . => ex:FridayTalk rdf:type ex:Talk .
        • Rule: ?p rdfs:range ?c . ?s ?p ?o . => ?o rdf:type ?c .
          Example: ex:presentedBy rdfs:range ex:Person . ex:FridayTalk ex:presentedBy ex:Aidan . => ex:Aidan rdf:type ex:Person .
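The three rules on the Reasoning slide can be sketched as a tiny forward-chaining loop. This is only an illustrative sketch, not the reasoner discussed in the talk: triples are modelled as plain Python tuples of prefixed names, and the function name `forward_chain` is an assumption introduced here.

```python
# Triples are (subject, predicate, object) tuples of prefixed-name strings.
OWL_INVERSE_OF = "owl:inverseOf"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDF_TYPE = "rdf:type"


def forward_chain(triples):
    """Apply the inverseOf, domain and range rules until nothing new is inferred."""
    closure = set(triples)
    while True:
        inferred = set()
        for (prop, relation, value) in closure:
            if relation not in (OWL_INVERSE_OF, RDFS_DOMAIN, RDFS_RANGE):
                continue  # not a schema statement one of the three rules uses
            for (s, p, o) in closure:
                if p != prop:
                    continue
                if relation == OWL_INVERSE_OF:
                    inferred.add((o, value, s))         # => ?o ?p2 ?s
                elif relation == RDFS_DOMAIN:
                    inferred.add((s, RDF_TYPE, value))  # => ?s rdf:type ?c
                else:
                    inferred.add((o, RDF_TYPE, value))  # => ?o rdf:type ?c
        if inferred <= closure:  # fixpoint reached
            return closure
        closure |= inferred


data = {
    ("ex:presentedBy", OWL_INVERSE_OF, "ex:presented"),
    ("ex:presentedBy", RDFS_DOMAIN, "ex:Talk"),
    ("ex:presentedBy", RDFS_RANGE, "ex:Person"),
    ("ex:FridayTalk", "ex:presentedBy", "ex:Aidan"),
}
new_triples = forward_chain(data) - data
for triple in sorted(new_triples):
    print(triple)
```

The nested loop joins every schema statement against every triple, which is fine for four triples but is exactly the kind of cost that a Web-scale system has to avoid.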
  12. Reasoning: Make Explicit the Implicit
      • deri:Aidan ex:presented deri:FridayTalk .
      • deri:Aidan owl:sameAs ex:Aidan .
      • ex:RR2009Talk ex:presentedBy ex:Aidan .
      • ex:presentedBy owl:inverseOf ex:presented .
      • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
      • ex:Aidan ex:presented deri:FridayTalk .
      • ex:Aidan ex:presented ex:RR2009Talk .
      • Figure labels: IMPLICIT, EXPLICIT
  13. Web Reasoning: Challenges
      • Scalability
        • Billions or tens of billions of statements (for the moment)
          • Near linear scale!!!
      • Noisy data
        • Inconsistencies galore
        • Publishing errors
        • “Ontology hijacking”
  14. Web Reasoning: Challenges
      • Challenges (Semantic Web Wikipedia Article)
      • Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.
      • Vastness: The World Wide Web contains at least 48 billion pages as of this writing (August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.
      • Vagueness: These are imprecise concepts like "young" or "tall". This arises from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms and of trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.
      • Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms which correspond to a number of different distinct diagnoses each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.
      • Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques which can be employed to deal with inconsistency.
      • Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptography techniques are currently utilized to ameliorate this threat.
  15. Noisy Data: Omnipotent Being
      • Proposition 1: Web data is noisy.
      • Proof:
        • 08445a31a78661b5c746feff39a9db6e4e2cc5cf
        • sha1-sum of ‘mailto:’
        • common value for foaf:mbox_sha1sum
          • An inverse-functional (uniquely identifying) property!!!
          • Any person who shares the same value will be considered the same
      • Q.E.D.
  16. Noisy Data: Redefining Everything …and home in time for tea
      • More Proof:
      • From http://www.eiao.net/rdf/1.0
          <owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
            <rdfs:label xml:lang="en">type</rdfs:label>
            <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
          </owl:Property>
      • Ontology hijacking!!
  17. The Web… …forecast is for muck
  18. Web Reasoning: Use Rules!
      • (Briefly) Why use a rule-based approach?
      • … as opposed to a Description Logics based approach
        • Massive A-Box (i.e., instance data)
        • Inconsistencies galore
        • Publishing errors / messy data
        • Popular Web ontologies are fairly inexpressive
  19. Web Reasoning: Forward Chaining!
      • Forward-chaining materialisation:
        • Avoid runtime expense of backward-chaining
          • Users taught impatience by Google
        • Pre-compute answers for quick retrieval
        • Web-scale systems should be scalable!
          • More data = more disk space AND/OR more machines
  20. What rules?
      • “Standard”
        • RDFS
        • OWL 2 RL (W3C Rec: 27 Oct. 2009)
      • “Non-standard”
        • DLP
        • pD* (OWL Horst)
        • OWL –
      • OWL 2 RL is the first standard rule-expressible OWL “fragment”!
      • More inclusive than previous non-standard OWL rule fragments
      • Includes RDFS rules
      • Includes rule support for new OWL 2 constructs
        • … although I don’t know of any OWL 2 data on the Web
  21. Okay, so let’s do forward-chaining OWL 2 RL on billions of triples collected from the Web…
      • foaf:mbox_sha1sum a owl:InverseFunctionalProperty .
      • ?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf .
      • OWL 2 RL rule prp-ifp:
        ?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z . ⇒ ?x1 owl:sameAs ?x2 .
      • 10^6 ?x1 / ?x2 bindings in body
      • 10^12 inferred pair-wise and reflexive owl:sameAs statements
      • … or in simpler terms: pow!
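To see why the prp-ifp rule explodes, note that n subjects sharing one inverse-functional value yield n × n ordered owl:sameAs pairs (reflexive pairs included), so 10^6 subjects give 10^12 statements. The following is a minimal sketch on a toy input of 100 made-up subjects; the `prp_ifp` function and the `ex:person…` names are assumptions for illustration only.

```python
from collections import defaultdict

# Value quoted on the slides as the sha1-sum of 'mailto:'.
SHA1_OF_MAILTO = "08445a31a78661b5c746feff39a9db6e4e2cc5cf"


def prp_ifp(triples, inverse_functional):
    """Naive prp-ifp: subjects sharing a value for an IFP are all owl:sameAs each other."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)
    return {
        (x1, "owl:sameAs", x2)
        for subjects in subjects_by_value.values()
        for x1 in subjects
        for x2 in subjects  # includes x1 == x2, i.e. reflexive statements
    }


# 100 hypothetical people who all published the same bogus checksum.
people = [f"ex:person{i}" for i in range(100)]
triples = [(person, "foaf:mbox_sha1sum", SHA1_OF_MAILTO) for person in people]

inferred = prp_ifp(triples, {"foaf:mbox_sha1sum"})
print(len(inferred))  # 100 * 100 = 10000
```

Scaling the same count from 100 subjects to the 10^6 bindings mentioned on the slide gives (10^6)^2 = 10^12 inferred statements.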
  22. Okay, so let’s do forward-chaining OWL 2 RL on billions of triples collected from the Web…
      • OWL 2 RL rule eq-ref:
        ?s ?p ?o . ⇒ ?s owl:sameAs ?s . ?p owl:sameAs ?p . ?o owl:sameAs ?o .
      • Adds |T| triples, where T is the set of RDF terms in the data
      • Could be easily supported by backward-chaining/query rewriting
      • Boring
  23. SAOR: Scalable Authoritative OWL Reasoner
      • Goals:
      • Scalability
        • Separate T-Box (schema) data
        • Incomplete reasoning!
      • Reduced output
        • Incomplete reasoning!
      • Web tolerance
        • Consider provenance of Web data
        • Incomplete reasoning!
  24. Scalable Reasoning: In-mem T-Box
      • Main optimisation: store T-Box in memory
      • T-Box: (loosely) data describing classes and properties
      • By far the most commonly accessed segment of data for reasoning
      • Quite small (1-2%)
      • e.g., from a 100M statement Web crawl:
        • A-Box: 3,753,791 × ?s foaf:name ?o .
        • vs. T-Box: <20 × foaf:name ?p ?o . + ?s ?p foaf:name .
  25. Scalable Reasoning: Scans
      • Scan 1: Scan the input data, separate out the T-Box statements, and load them into memory
      • Scan 2: Scan all on-disk data, joining with the in-memory T-Box
  26. Scalable Reasoning: No A-Box Joins
      • Figure: IN-MEM T-BOX; ON-DISK A-BOX (… ex:me ex:presented ex:FridayTalk …); ON-DISK OUTPUT (… ex:FridayTalk ex:presentedBy ex:me . ex:me rdf:type foaf:Person . ex:me rdf:type foaf:Agent . …)
      • Execution of three rules:
        • OWL 2 RL rule prp-inv1: ?p1 owl:inverseOf ?p2 . ?x ?p1 ?y . ⇒ ?y ?p2 ?x .
        • OWL 2 RL rule prp-dom: ?p rdfs:domain ?c . ?x ?p ?y . ⇒ ?x a ?c .
        • OWL 2 RL rule cax-sco: ?c1 rdfs:subClassOf ?c2 . ?x a ?c1 . ⇒ ?x a ?c2 .
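The two-scan idea can be sketched as follows: a first pass collects the schema statements into in-memory indexes, and a second pass processes each instance triple on its own against those indexes, never joining two instance triples. This is a simplified in-memory sketch, not SAOR itself; the function names and the three T-Box statements in `data` are assumptions chosen so that the output matches the figure on the slide.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def scan_tbox(triples):
    """Scan 1: index the schema statements needed by the three rules."""
    tbox = {
        "owl:inverseOf": defaultdict(set),
        "rdfs:domain": defaultdict(set),
        "rdfs:subClassOf": defaultdict(set),
    }
    for s, p, o in triples:
        if p in tbox:
            tbox[p][s].add(o)
    return tbox


def infer_from(triple, tbox):
    """Everything derivable from ONE triple plus the T-Box (no A-Box joins)."""
    seen, todo = set(), [triple]
    while todo:
        s, p, o = todo.pop()
        new = set()
        for p2 in tbox["owl:inverseOf"].get(p, ()):      # prp-inv1
            new.add((o, p2, s))
        for c in tbox["rdfs:domain"].get(p, ()):         # prp-dom
            new.add((s, RDF_TYPE, c))
        if p == RDF_TYPE:
            for c2 in tbox["rdfs:subClassOf"].get(o, ()):  # cax-sco
                new.add((s, RDF_TYPE, c2))
        for t in new - seen:
            seen.add(t)
            todo.append(t)  # inferred triples are re-fed to the same rules
    return seen


def reason(triples):
    """Scan 2: stream over all triples, joining each with the in-memory T-Box."""
    tbox = scan_tbox(triples)
    output = set()
    for triple in triples:
        output |= infer_from(triple, tbox)
    return output


data = [
    # Assumed T-Box statements (not shown on the slide).
    ("ex:presented", "owl:inverseOf", "ex:presentedBy"),
    ("ex:presented", "rdfs:domain", "foaf:Person"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    # A-Box statement from the slide.
    ("ex:me", "ex:presented", "ex:FridayTalk"),
]
output = reason(data)
```

Because each instance triple is handled independently, the second scan can stream from disk and its cost grows roughly linearly with the size of the A-Box.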
  27. Scalable Reasoning: A-Box joins?
      • However: some rules do require A-Box joins
        • ?p a owl:TransitiveProperty . ?x ?p ?y . ?y ?p ?z . ⇒ ?x ?p ?z .
        • Difficult to engineer a scalable solution (which reaches a fixpoint)
        • No A-Box joins for SAOR reasoning over 1B statements
        • ~99% of inferences over Web data possible without A-Box joins
      • 48/76 OWL 2 RL rules don’t require A-Box joins
        • Side note: No RDFS rule requires A-Box joins
        • And rules cover ~most of what current Web ontologies use
  28. Authoritative Reasoning
      • T-Box only!
      • Document D is authoritative for class/property X iff:
        • X is a blank node, OR
        • the de-referenced URI of X coincides with or redirects to D
        • FOAF spec authoritative for foaf:Person ✓
        • MY spec not authoritative for foaf:Person ✘
      • Only allow extension in authoritative documents
        • my:Person rdfs:subClassOf foaf:Person . (MY spec) ✓
      • BUT: Reduce obscure memberships
        • foaf:Person rdfs:subClassOf my:Person . (MY spec) ✘
      • ALSO: Protect specifications
        • foaf:mbox rdf:type owl:SymmetricProperty . (MY spec) ✘
      • Similarly for other T-Box statements.
      • In-memory T-Box stores authoritative information for rule execution
      • Greatly reduces output size!!!
      • Compatible with FOAF, SIOC, DC (common Web vocabulary etiquette)
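The authority check can be sketched as below. This is a deliberately simplified reading that covers only the three examples on the slide: a T-Box statement is accepted when the document serving it is authoritative for the statement's subject. Real dereferencing involves HTTP lookups and redirects; here it is replaced by a hypothetical lookup table `DEREFERENCES_TO`, and the document identifiers `doc:foaf-spec` and `doc:my-spec` are made up.

```python
# Hypothetical result of dereferencing each term: the document it resolves to.
DEREFERENCES_TO = {
    "foaf:Person": "doc:foaf-spec",
    "foaf:mbox": "doc:foaf-spec",
    "my:Person": "doc:my-spec",
}


def is_authoritative(document, term, dereferences_to):
    """A document is authoritative for blank nodes and for terms that resolve to it."""
    if term.startswith("_:"):  # blank node
        return True
    return dereferences_to.get(term) == document


def accept_tbox_statement(document, triple, dereferences_to):
    """Simplification: require authority over the subject of the T-Box statement."""
    subject, _predicate, _object = triple
    return is_authoritative(document, subject, dereferences_to)


examples = [
    ("doc:my-spec", ("my:Person", "rdfs:subClassOf", "foaf:Person")),
    ("doc:my-spec", ("foaf:Person", "rdfs:subClassOf", "my:Person")),
    ("doc:my-spec", ("foaf:mbox", "rdf:type", "owl:SymmetricProperty")),
]
for document, triple in examples:
    verdict = accept_tbox_statement(document, triple, DEREFERENCES_TO)
    print(triple, "accepted" if verdict else "rejected")
```

Only statements that pass this filter would be loaded into the in-memory T-Box, which is how third-party redefinitions of popular terms are kept out of rule execution.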
  29. Noisy Data: Redefining Everything …revisited
      • More Proof:
      • From http://www.eiao.net/rdf/1.0
          <owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
            <rdfs:label xml:lang="en">type</rdfs:label>
            <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
            <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
          </owl:Property>
      • Ontology hijacking!!
      • Not authoritative!!!!
  30. Distributed Reasoning
      • More recently performed reasoning over a cluster of commodity hardware
      • Ran the “easy” OWL 2 RL rules (no A-Box joins)
        • Duplicate the T-Box to all machines… A-Box can be arbitrarily distributed…
      • Authoritative (of course)
      • Eight machines, 4GB main memory, 2.2 GHz
      • 1.192b input statements crawled last month, pre-distributed over the machines
        • Reasoning in 113 minutes
          • Extract T-Box: 16 mins
          • Aggregate, perform authoritative analysis, broadcast T-Box: 14 mins
          • Reasoning over A-Box: 83 mins
        • Output 570m inferences
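As a quick sanity check on the figures above, the three phase timings do add up to the reported 113 minutes, and the same numbers give a rough aggregate throughput. The snippet below is only back-of-the-envelope arithmetic on the slide's figures; the variable names are illustrative.

```python
# Figures quoted on the slide.
phase_minutes = {
    "extract T-Box": 16,
    "aggregate, authoritative analysis, broadcast": 14,
    "reason over A-Box": 83,
}
input_statements = 1.192e9
inferences = 570e6

total_minutes = sum(phase_minutes.values())
statements_per_second = input_statements / (total_minutes * 60)
inferences_per_input = inferences / input_statements

print(total_minutes)  # 113
print(f"{statements_per_second:,.0f} input statements per second across the cluster")
print(f"{inferences_per_input:.2f} inferred statements per input statement")
```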
  31. … and back again
      • Data: ex:RR2009Talk ex:presentedBy ex:Aidan . ex:presentedBy owl:inverseOf ex:presented .
      • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
      • Figure labels: IMPLICIT, EXPLICIT
      • ex:Aidan ex:presented ex:RR2009Talk .
  32. … but what about…
      • Data: ex:Aidan ex:presented ex:RR2009Talk . deri:Aidan ex:presented deri:FridayTalk .
      • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
      • Figure labels: IMPLICIT, EXPLICIT
  33. … and…
      • Data: ex:Aidan ex:presented deri:FridayTalk . ex:Aidan ex:presented ex:FridayTalk .
      • Query: Give me all talks presented by Aidan: ex:Aidan ex:presented ?talk .
      • Figure labels: IMPLICIT, EXPLICIT
  34. Equality Reasoning / Entity Consolidation
      • Equality Reasoning:
        • Standard (e.g. OWL 2 RL) rules for OWL equality not great:
        • Saw noisy data earlier
        • Quadratic explosion of inferences for equivalences, with high duplication
        • Do not solve “synonymous duplicates” query answering problem
      • Entity Consolidation:
        • Instead use “canonical” identifiers: one term to represent set of equivalent individuals
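Consolidation with canonical identifiers can be sketched with a union-find structure: each set of owl:sameAs-equivalent terms is collapsed onto a single representative, and the data is rewritten to use it. This is an illustrative sketch rather than the method used in the talk; picking the lexicographically smallest term as the canonical one is an assumption, as is the owl:sameAs link between the two FridayTalk identifiers.

```python
def canonical_map(same_as_pairs):
    """Map every term to one canonical representative of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            smaller, larger = sorted((root_a, root_b))
            parent[larger] = smaller  # assumption: smallest term is canonical

    return {term: find(term) for term in list(parent)}


def consolidate(triples, canon):
    """Rewrite subjects and objects to their canonical identifiers."""
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


same_as = [
    ("deri:Aidan", "ex:Aidan"),
    ("deri:FridayTalk", "ex:FridayTalk"),  # assumed link, for illustration
]
triples = {
    ("ex:Aidan", "ex:presented", "deri:FridayTalk"),
    ("ex:Aidan", "ex:presented", "ex:FridayTalk"),
    ("ex:Aidan", "ex:presented", "ex:RR2009Talk"),
    ("deri:Aidan", "ex:presented", "deri:FridayTalk"),
}

canon = canonical_map(same_as)
consolidated = consolidate(triples, canon)
for triple in sorted(consolidated):
    print(triple)
```

Four input triples collapse to two, so a query rewritten to the canonical identifier returns each talk once, which addresses both the synonymous-omission and synonymous-duplicate problems without materialising a quadratic number of owl:sameAs statements.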
  35. Need owl:sameAs relations
      • Explicit owl:sameAs statements: good precision / poor recall
      • Inverse-functional properties: reasonable recall / poor precision
        • Publishers not aware of inverse-functional semantics of such properties
      • Other rules: very poor recall / ? high precision
      • Been there, done that…
      • Most recently, consolidation using explicit owl:sameAs statements
        • 8 machines (as before), 61 mins for 1.193bn statements
  36. Future Work
      • Probabilistic/statistical approach to equality reasoning
        • Identify resources with high probability of being equivalent:
          • use semantics in the data (equality reasoning);
          • use statistics derived from the data; e.g.:
            • Two people have same birthday, name and share co-authors…
        • Identify resources with high probability of being different:
          • use semantics in the data (inconsistencies?);
          • again use statistics derived from the data; e.g.:
            • Two people have different dates-of-birth and names
      • Perform “fuzzy” reasoning
        • Leverage links-based analysis from input data to give inferences “scores” of trustworthiness
        • Depending on results, could be used to, e.g.:
          • identify “interesting” inferences for partial materialisation;
          • identify “trustworthy” inferences for bypassing noise in Web data.
      • Must be domain-agnostic/scalable/distributable/give good results for noisy and heterogeneous Web data
  37. Conclusion
      1. Web data is messy
      2. Reasoning over Web data is difficult
      3. Need incomplete, albeit inclusive reasoning
      4. Rule execution optimisations possible through special treatment of terminological data
      5. Need to consider the provenance of data
      6. OWL 2 RL not immediately suitable to application over Web data
      7. Incomplete OWL 2 RL support can be offered using existing technologies, in a scalable and tolerant way
      8. Busy year ahead