cade23-schneidsut-atp4owlfull-2011

Presentation of the paper "Reasoning in the OWL 2 Full Ontology Language using First-Order Automated Theorem Proving" by Michael Schneider, FZI Karlsruhe, and Geoff Sutcliffe, University of Miami, at the 23rd International Conference on Automated Deduction (CADE 23), August 2011.

Presentation Transcript

  • Reasoning in the OWL 2 Full Ontology Language using First-Order Automated Theorem Proving
    Michael Schneider, FZI Research Center for Information Technology, Germany
    Geoff Sutcliffe, University of Miami, USA
    23rd International Conference on Automated Deduction (CADE 23), Wrocław, Poland, August 2011
  • Introduction
    • Context: Semantic Web, OWL, First-Order Logic, ATP
    • Focus: W3C ontology language OWL 2 Full
      – highly expressive ontology formalism
      – no reasoner as of today
    • Aims: find out …
      – … to what extent practical OWL 2 Full reasoning can be implemented with off-the-shelf FOL reasoning technology
      – … whether FOL-based OWL 2 Full reasoning can provide added value over today's state-of-the-art OWL reasoners
  • PRELIMINARIES
  • Semantic Web
    • Semantic Web (SW): extends the WWW by machine-processable, interlinked resource descriptions and vocabularies
      – Resources (everything one can talk about):
        • identified by URIs
        • described by property-value pairs
        • classified by classes
        • related to other resources via properties
      – Vocabularies: define classes and properties and their (formal) semantics, e.g. that class Eagle is a subclass of class Animal, so Harry becomes an Animal
    • RDF ("Resource Description Framework"): language to define "graphs" of interlinked resource descriptions
    • OWL ("Web Ontology Language"): language to define vocabularies
    [Figure: example RDF graph with the nodes :Harry, :Larry and :Eagle and the labels age (3), father and sex (m)]
  • OWL Flavours
    • OWL 2: family of ontology languages for the Semantic Web
      – W3C Recommendation (2009)
      – version 2 is a revised and extended version of OWL (2004)
    • Two major "flavours" of OWL: OWL 2 DL and OWL 2 Full
      – OWL 2 DL: basically a description logic (SROIQ[D]) adjusted to SW needs
      – OWL 2 Full: similar to OWL 2 DL, but directly designed for an RDF-based SW
    • Some observable distinctive features of OWL 2 Full:
      – can be applied to weakly-structured SW data (LOD) and RDFS vocabularies
      – no restrictions on use of OWL constructs (e.g. asymmetric transitive properties)
      – support for (semantic) metamodeling (Harry, the Eagle, has meta-class Species)
    • Theoretical issue: OWL 2 Full is undecidable (practical problem?)
  • OWL 2 Full Semantics
    • Specification: OWL 2 Full semantics is specified via a set of model-theoretic "semantic conditions"
      OWL 2 RDF-Based Semantics: http://www.w3.org/TR/owl2-rdf-based-semantics/
    • Core Observation:
      – all semantic conditions have the form of standard first-order logic formulae → OWL 2 Full semantics is essentially a FOL theory!
      – all input RDF graphs are representable as FOL formulae
      – hence: OWL 2 Full reasoning is implementable in terms of FOL reasoning!
    • Question: How well does it work in practice?
    • Prior Art: very little research based on this observation so far; none for OWL 2 Full
  • OWL 2 Full Semantic Conditions
    • Typical format of semantic conditions: if a certain semantic relationship holds, then another associated relationship also holds (Note: many semantic conditions are in fact if-and-only-if conditions)
    • Example (1st semantic condition in Table 5.8 of OWL 2 RDF-Based Semantics): if two individuals c1 and c2 are related by the denotation of URI rdfs:subClassOf, then c1 and c2 are classes (members of set IC) and ICEXT(c1), the class extension of c1, is a subset of ICEXT(c2), the class extension of c2
  • Prior Art in OWL Full Reasoning
    • Fikes, McGuinness, Waldinger: A First-Order Logic Semantics for Semantic Web Markup Languages. TR, Stanford, 2002.
      – translation of specifications of precursors of OWL and RDF into a first-order logic (FOL) theory, and application of FOL reasoners
      – focus: checking for technical issues in specifications (less on inferencing)
    • Hayes: Translating Semantic Web Languages into Common Logic. TR, Pensacola (Florida), 2005.
      – translation of OWL 1 Full into Common Logic (basically a variant of FOL)
      – no report on reasoning experiments
    • Hawke: Surnia. 2003. URL: http://www.w3.org/2003/08/surnia
      – OWL 1 Full reasoner based on FOL translation using the Otter FOL reasoner
      – did not perform well on W3C OWL 1 test suite
      – ad hoc implementation: does not properly follow specification; many flaws
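The example condition for rdfs:subClassOf can be written out as a first-order formula. The following is a sketch in the slide's notation; the symbols IEXT (property extension) and I (interpretation of URIs) are not named on the slide and are assumed here from the RDF-Based Semantics. Only the "if … then" direction described on the slide is shown; for a condition that is stated as if-and-only-if, the implication becomes an equivalence.

```latex
\forall c_1, c_2 :\;
  \langle c_1, c_2 \rangle \in \mathrm{IEXT}\bigl(I(\texttt{rdfs:subClassOf})\bigr)
  \;\Rightarrow\;
  c_1 \in \mathrm{IC} \;\wedge\; c_2 \in \mathrm{IC} \;\wedge\;
  \mathrm{ICEXT}(c_1) \subseteq \mathrm{ICEXT}(c_2)
```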
  • APPROACH
  • Translation into FOL: General Process
    OWL 2 Full entailment checking using FOL reasoners and the concrete FOL syntax TPTP
    [Diagram: OWL 2 Full semantic conditions → TPTP axiom set; premise RDF graph → TPTP axiom; conclusion RDF graph → TPTP conjecture; all formulae go to a FOL theorem prover and a FOL model finder; possible results: theorem, counter-sat, unknown]
    1. Input:
      • translate semantic conditions into set of TPTP axioms
      • translate premise RDF graph into TPTP axiom
      • translate conclusion RDF graph into TPTP conjecture
    2. Reasoning: feed all TPTP formulae into FOL reasoners (parallel execution):
      • FOL theorem provers: used to detect positive entailments
      • FOL model-finders: used to detect non-entailments
    3. Output: integrate results from FOL reasoners into single result
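Step 3 of the process above can be sketched as a small function. This is only an illustration of how the three result values named in the diagram (theorem, counter-sat, unknown) might be derived; the function name, the boolean inputs and the conflict check are assumptions, not the authors' implementation.

```python
from enum import Enum


class Status(Enum):
    THEOREM = "theorem"          # some theorem prover found a proof
    COUNTER_SAT = "counter-sat"  # some model finder found a counter-model
    UNKNOWN = "unknown"          # no reasoner finished within the time limit


def combine(prover_found_proof, finder_found_model):
    """Integrate the results of several reasoners into a single result.

    prover_found_proof: iterable of bools, one per theorem prover run
    finder_found_model: iterable of bools, one per model-finder run
    (both input shapes are hypothetical)
    """
    proved = any(prover_found_proof)
    refuted = any(finder_found_model)
    if proved and refuted:
        # A proof and a counter-model cannot both exist for the same problem.
        raise ValueError("conflicting reasoner results")
    if proved:
        return Status.THEOREM
    if refuted:
        return Status.COUNTER_SAT
    return Status.UNKNOWN
```

For example, `combine([False, True], [False])` yields `Status.THEOREM`, while timeouts everywhere yield `Status.UNKNOWN`.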
  • Translation into FOL: Semantic Conditions
    [Figure: a model-theoretic OWL 2 Full semantic condition (Table 5.8, an "iff" condition) next to the corresponding FOL formula (TPTP)]
  • Translation into FOL: RDF Graphs
    [Figure: an RDF graph (Turtle) next to the corresponding FOL formula (TPTP)]
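The translation of an RDF graph into a TPTP formula can be illustrated with a minimal sketch. It assumes that each triple becomes one atom of a ternary predicate, here called `iext(property, subject, object)`, that URIs become constants with a `uri_` prefix, and that blank nodes become existentially quantified variables. These naming choices are assumptions for illustration and may differ from the authors' converter; literals are not handled.

```python
import re


def _term(node, bnode_vars):
    """Map a prefixed name or blank node label to a TPTP term."""
    if node.startswith("_:"):
        # TPTP variables start with an upper-case letter.
        var = "B" + re.sub(r"\W", "_", node[2:]).upper()
        if var not in bnode_vars:
            bnode_vars.append(var)
        return var
    # TPTP constants start with a lower-case letter.
    return "uri_" + re.sub(r"\W", "_", node)


def graph_to_tptp(name, role, triples):
    """Translate (subject, predicate, object) triples into one fof formula.

    role is "axiom" for a premise graph and "conjecture" for a conclusion graph.
    """
    bnode_vars = []
    atoms = []
    for s, p, o in triples:
        atoms.append(
            f"iext({_term(p, bnode_vars)},{_term(s, bnode_vars)},{_term(o, bnode_vars)})"
        )
    body = "( " + " & ".join(atoms) + " )"
    if bnode_vars:
        # Blank nodes are read as existentially quantified variables.
        body = f"? [{','.join(bnode_vars)}] : {body}"
    return f"fof({name}, {role}, {body})."


if __name__ == "__main__":
    print(graph_to_tptp("premise", "axiom", [
        ("ex:c1", "rdfs:subClassOf", "ex:c2"),
        ("ex:w", "rdf:type", "ex:c1"),
    ]))
    print(graph_to_tptp("conclusion", "conjecture", [
        ("ex:harry", "rdf:type", "_:x"),
        ("_:x", "rdf:type", "ex:Species"),
    ]))
```

Putting the whole graph into a single conjunction keeps the scope of blank-node variables across all triples of the graph.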
  • EVALUATION SETTING
  • Evaluation Setting: TPTP-Encoding
    • FOL Axiomatization:
      – translated most normative semantic conditions of OWL 2 Full
      – excluded: datatype reasoning-related semantics
      – size of complete axiomatization: 558 FOL formulae
    • Syntax Statistics:
      Number of formulae:    558 ( 196 unit )
      Number of atoms:       1772 ( 90 equality )
      Maximal formula depth: 27 ( 5 average )
      Number of connectives: 1350 ( 136 ~ ; 35 | ; 758 & ; 126 <=> ; 295 => )
      Number of predicates:  13 ( 1 propositional ; 0-3 arity )
      Number of functors:    157 ( 156 constant ; 0-2 arity )
      Number of variables:   973 ( 0 sgn ; 911 ! ; 62 ? )
      Maximal term depth:    2 ( 1 average )
    • RDF Graph Conversion:
      – implemented simple RDF-to-TPTP converter tool
  • Evaluation Setting: Experiments
    1. Language Coverage: completeness w.r.t. OWL 2 Full specification
    2. Characteristic OWL 2 Full Conclusions: semantic capabilities beyond OWL 2 DL or OWL 2 RL/RDF Rules
    3. Scalability: reasoning upon large data sets
    4. Model Finding: detecting consistent ontologies and non-entailments
  • Evaluation Setting: Reasoners
    • FOL Theorem Provers:
      – Vampire 0.6 (using two modes: "auto", and with SInE strategy)
      – iProver-SInE 0.8 (iProver with SInE strategy and strategy scheduling)
    • FOL Model-Finders:
      – Paradox 4.0 (finite model finder)
      – DarwinFM 1.4.5 (finite model finder)
    • OWL Reasoners:
      – Pellet 2.2.2 (tableaux-based OWL 2 DL reasoner)
      – HermiT 1.3.2 (tableaux-based OWL 2 DL reasoner)
      – FaCT++ 1.5.0 (tableaux-based OWL 2 DL reasoner)
      – BigOWLIM 3.4, using "owl-rl" ruleset (RDF entailment-rule reasoner)
      – Jena 2.6.4, using OWL_MEM_RULE_INF spec (RDF framework with rule engine)
      – Parliament 2.6.9 (reasoning-enabled RDF triple store)
  • Evaluation Setting: Environment
    • Computers:
      – CPU: Intel Pentium 4, 2.8 GHz
      – Memory: 2 GB
      – Operating System: Linux FC8
    • Max. CPU time per run: 300 s
  • EVALUATION RESULTS: 1. LANGUAGE COVERAGE
  • Experiment 1: Language Coverage Overview
    • Aim: analyse completeness w.r.t. OWL 2 Full specification
    • Method: check that all parts of the OWL 2 Full semantics specification are covered (except for datatype reasoning)
    • Test Data: dedicated OWL 2 Full coverage test suite targeted to specification level (Schneider & Mainzer, 2009):
      – at least one test case for each OWL 2 Full semantic condition
      – each test case focuses as much as possible on the targeted semantic condition
      – generally easy to solve, hence failure indicates flaw or lack of coverage
    • Adjustments:
      – removal of datatype-reasoning related test cases (currently unsupported)
      – only using entailment and inconsistency test cases
      – size of remaining test suite: 411 test cases
  • Experiment 1: Language Coverage Example Test Case
    Test case for probing coverage of the RDFS semantic condition for class subsumption:
      TESTCASE rdfbased-sem-rdfs-subclass-cond p rdfs owl2rl
      The extensions of two classes related by rdfs:subClassOf are in a subsumption relationship.
      +
      ex:c1 rdfs:subClassOf ex:c2 .
      ex:w rdf:type ex:c1 .
      +
      ex:w rdf:type ex:c2 .
      +
    This positive entailment ('p') test case applies to RDFS ('rdfs'), the OWL 2 RL/RDF Rules ('owl2rl'), and all common semantic extensions, including OWL 2 Full. The upper RDF graph is the premise graph, the lower RDF graph is the conclusion graph.
  • Experiment 1: Language Coverage Results
    Number of test cases per outcome (411 test cases in total):
      Reasoner                              Success  Wrong  Unknown
      Pellet                                    237    168        6
      HermiT                                    246    157        8
      FaCT++                                    190     45      176
      BigOWLIM                                  282    129        0
      Jena                                      129    282        0
      Parliament                                 14    373       24
      Vampire, OWL 2 Full axioms                349      0       62
      iProver-SInE, OWL 2 Full axioms           383      0       28
      iProver-SInE with small axiom sets        396      0       15
    Notes:
    • All DL reasoners show similar results (although FaCT++ signals many more errors)
    • Results of BigOWLIM and DL reasoners are very different (BigOWLIM not "better")
    • "small axiom set": a manually selected subset of axioms from the complete OWL 2 Full axiom set that is small but sufficient to succeed on the given test case
  • Experiment 1: Language Coverage Runtimes (sorted)
    For each reasoner: runtimes for all test cases are sorted in increasing order (all runtimes are for the complete OWL 2 Full axiom set; small axiom sets are ignored)
    Notes:
    • Most problems solved in less than 1 s
    • Vampire solves slightly fewer problems, but is generally faster → suggests strategy to run both reasoners in parallel
  • EVALUATION RESULTS: 2. CHARACTERISTIC CONCLUSIONS
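The original slide shows these counts as percentage bars. As a small worked example, the shares can be recomputed from the counts listed above; the dictionary layout and the function name `rates` are illustrative choices.

```python
# (success, wrong, unknown) counts per reasoner, copied from the results slide.
RESULTS = {
    "Pellet": (237, 168, 6),
    "HermiT": (246, 157, 8),
    "FaCT++": (190, 45, 176),
    "BigOWLIM": (282, 129, 0),
    "Jena": (129, 282, 0),
    "Parliament": (14, 373, 24),
    "Vampire, OWL 2 Full axioms": (349, 0, 62),
    "iProver-SInE, OWL 2 Full axioms": (383, 0, 28),
    "iProver-SInE with small axiom sets": (396, 0, 15),
}


def rates(counts):
    """Return the (success, wrong, unknown) shares in percent, one decimal."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


if __name__ == "__main__":
    for reasoner, counts in RESULTS.items():
        success, wrong, unknown = rates(counts)
        print(f"{reasoner:36s} {success:5.1f}% {wrong:5.1f}% {unknown:5.1f}%")
```

Each row sums to the 411 test cases of the adjusted suite, which is a quick consistency check on the transcribed numbers.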
  • Experiment 2: Characteristic Conclusions Overview
    • Aim: analyse ability to infer semantic conclusions that are characteristic for OWL 2 Full (beyond OWL 2 DL or OWL 2 RL/RDF)
    • Test Data: new "Characteristic Conclusions" test suite
      – 32 test cases (manually created)
      – probes many distinctive features of OWL 2 Full, including:
        • strong logic-based reasoning
        • unrestricted use of complex properties
        • blank nodes as existentially quantified variables
        • metamodeling
        • use of data values as individuals
        • semantic annotation properties
        • reflective use of built-in vocabulary terms
      – Differences from the Language Coverage test suite:
        • focus is on "emergent behaviour" of OWL 2 Full rather than on technical specification
        • most test cases depend on interplay of several OWL 2 Full semantic conditions
        • results often technically non-obvious (proof needed)
  • Experiment 2: Characteristic Conclusions Example Test Case
    Test case for probing metamodeling with Boolean logic reasoning and blank node semantics:
      Test Case: 014_Harry_belongs_to_some_Species
      Premise Graph:
        ex:Eagle rdf:type ex:Species .
        ex:Falcon rdf:type ex:Species .
        ex:harry rdf:type [ owl:unionOf ( ex:Eagle ex:Falcon ) ] .
      Conclusion Graph:
        ex:harry rdf:type _:x .
        _:x rdf:type ex:Species .
    This positive entailment test case applies to OWL 2 Full, but neither to OWL 2 DL (requires reasoning based on metamodeling) nor to OWL 2 RL/RDF (requires strong support for class union and existential blank node semantics).
  • Experiment 2: Characteristic Conclusions Results
    Legend: + success, - wrong, ? unknown. Columns are test cases 01 to 32, in groups of ten.
      Reasoner                  01-10       11-20       21-30       31-32
      Pellet                    +++-----++  ----+----+  +----+--?-  --
      HermiT                    +?+--?-+++  ----+----+  +--+?+--?-  --
      FaCT++                    +??????-?+  ---?+?---+  +????+-??-  -?
      BigOWLIM                  +--+--++--  ++--+--++-  ----------  --
      Jena                      +----+++--  +------+--  --+--+----  -+
      Parliament                +------+--  ?-------?-  ---------?  ?-
      Vampire / complete        +++++++++?  +??++++++?  ??++?+??++  ++
      iProver-SInE / complete   ++++++++++  +??++++++?  ?+++++++++  ++
      Vampire / small           ++++++++++  ++++++++++  ++++++++++  ++
      iProver-SInE / small      ++++++++++  ++++++++++  ++++++++++  ++
    Notes:
    • OWL reasoners weak: < 30% success rate
    • DL vs RDF-rule reasoners: nearly no overlap for successful results
    • FOL reasoners: much better on complete axiom set; perfect on small axiom sets
  • Experiment 2: Characteristic Conclusions Runtimes (sorted)
    Notes:
    • FOL reasoners often slow when using complete axiom set
    • Generally much faster with small axiom sets (by up to several orders of magnitude)
  • EVALUATION RESULTS: 3. SCALABILITY
  • Experiment 3: Scalability Overview
    • Aim: analyse reasoning upon large data sets, when most data is not relevant for the reasoning result (simplest scenario for a start)
    • Method: using an existing reasoning test suite, but with large masses of "bulk" RDF data added to the premise graph, where the bulk data is semantically weak and unrelated to the test suite
    • Test Data:
      – Reasoning test suite: Characteristic Conclusions test suite
      – Bulk RDF data: 1 million triples, no RDF(S)/OWL vocabulary terms, no URIs shared with reasoning test cases
    • Reasoning Scenarios:
      – auto reasoning mode vs. SInE strategy
      – complete axiom set vs. small axiom sets
  • Experiment 3: Scalability Example Bulk RDF Data Set
    This is an example bulk RDF data set consisting of 20 RDF triples. The data set has no names in common with any of the test cases being used in the evaluation, nor does the bulk data refer to any built-in terms of the OWL and RDF(S) vocabularies. There are no blank nodes, i.e., the bulk data consists entirely of a "ground" RDF graph. The bulk data sets being used in the evaluation have been much larger, still having the same basic format as the example set presented here.
      ex:si1 ex:pi1 ex:oi1 .
      ex:si2 ex:pi2 ex:oi2 .
      ex:si3 ex:pi3 ex:oi3 .
      ex:si4 ex:pi4 ex:oi4 .
      ex:si5 ex:pi5 ex:oi5 .
      ex:ss ex:ps1 ex:os1 .
      ex:ss ex:ps2 ex:os2 .
      ex:ss ex:ps3 ex:os3 .
      ex:ss ex:ps4 ex:os4 .
      ex:ss ex:ps5 ex:os5 .
      ex:sp1 ex:pp ex:op1 .
      ex:sp2 ex:pp ex:op2 .
      ex:sp3 ex:pp ex:op3 .
      ex:sp4 ex:pp ex:op4 .
      ex:sp5 ex:pp ex:op5 .
      ex:ssp ex:psp ex:osp1 .
      ex:ssp ex:psp ex:osp2 .
      ex:ssp ex:psp ex:osp3 .
      ex:ssp ex:psp ex:osp4 .
      ex:ssp ex:psp ex:osp5 .
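The four patterns visible in the example set (all three positions vary, only predicate and object vary, only subject and object vary, only the object varies) can be reproduced with a short generator. This is a sketch that assumes the larger data sets simply extend each pattern to a higher index; the function name and the block ordering are illustrative, and how the authors actually produced their 1 million triples is not stated on the slides.

```python
def bulk_triples(n):
    """Return 4*n ground triples (as Turtle lines) in the example's four patterns."""
    lines = []
    # Pattern 1: subject, predicate and object all vary.
    lines += [f"ex:si{i} ex:pi{i} ex:oi{i} ." for i in range(1, n + 1)]
    # Pattern 2: shared subject.
    lines += [f"ex:ss ex:ps{i} ex:os{i} ." for i in range(1, n + 1)]
    # Pattern 3: shared predicate.
    lines += [f"ex:sp{i} ex:pp ex:op{i} ." for i in range(1, n + 1)]
    # Pattern 4: shared subject and predicate.
    lines += [f"ex:ssp ex:psp ex:osp{i} ." for i in range(1, n + 1)]
    return lines


if __name__ == "__main__":
    # n = 5 reproduces the 20-triple example; n = 250_000 would give 1 million triples.
    print("\n".join(bulk_triples(5)))
```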
  • Experiment 3: Scalability Results
    Legend: + success, - wrong, ? unknown. Columns are test cases 01 to 32, in groups of ten.
      Reasoner                  01-10       11-20       21-30       31-32
      Vampire auto / complete   +++???????  ????+?????  ??????????  ??
      Vampire SInE / complete   ++++++?+??  +???++?++?  ??+??+???+  ?+
      iProver-SInE / complete   ++++++++++  +??++++++?  ?+++++++++  ++
      Vampire auto / small      +?+???????  ??????????  ??????????  ??
      Vampire SInE / small      ++++++?+++  ++?+++?++?  ?++++++?++  ?+
      iProver-SInE / small      ++++++++++  ++?+++++++  ++++++++++  ++
    Notes:
    • Vampire performs very poorly with the "auto" strategy: times out in most cases
    • Improvement by using SInE strategy (iProver and Vampire) on complete axiom set
    • Major improvement by combining SInE strategy with removal of irrelevant OWL axioms (small axiom sets)
  • Experiment 3: Scalability Runtimes (sorted)
    Notes:
    • General offset of ca. 20 s for parsing large input data (ca. 55 MB)
    • SInE strategy successful (Vampire mostly fails when using "auto" mode)
    • Further improvements by using small axiom sets
  • EVALUATION RESULTS: 4. MODEL FINDING
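Why a relevance-based strategy helps with unrelated bulk data can be illustrated with a deliberately simplified selection procedure: start from the symbols of the conjecture and repeatedly add every axiom that shares a symbol with what has been selected so far. This is not the SInE algorithm itself, which uses a more refined trigger relation; the function name, the data layout and the choice to compare only URI constants (not the predicate shared by all triples) are assumptions made for this sketch.

```python
def select_axioms(axioms, goal_symbols):
    """Select axioms reachable from the goal via shared symbols.

    axioms: dict mapping an axiom name to the set of constant symbols it uses
    goal_symbols: set of constant symbols occurring in the conjecture
    Returns the names of the selected axioms in order of selection.
    """
    relevant = set(goal_symbols)
    selected = []
    changed = True
    while changed:
        changed = False
        for name, symbols in axioms.items():
            if name not in selected and symbols & relevant:
                selected.append(name)
                relevant |= symbols  # symbols of selected axioms become relevant too
                changed = True
    return selected


if __name__ == "__main__":
    axioms = {
        "subclass_condition": {"uri_rdfs_subClassOf", "uri_rdf_type"},
        "premise": {"uri_rdfs_subClassOf", "uri_rdf_type",
                    "uri_ex_c1", "uri_ex_c2", "uri_ex_w"},
        "bulk_1": {"uri_ex_si1", "uri_ex_pi1", "uri_ex_oi1"},
        "bulk_2": {"uri_ex_ss", "uri_ex_ps1", "uri_ex_os1"},
    }
    goal = {"uri_rdf_type", "uri_ex_w", "uri_ex_c2"}
    print(select_axioms(axioms, goal))
```

Because the bulk triples share no URIs with the test cases or the OWL and RDF(S) vocabularies, a procedure of this kind never reaches them, which is consistent with the observation that the SInE-based configurations cope far better with the added data than the "auto" mode.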
  • Experiment 4: Model-Finding Overview
    • Aim: analyse ability to detect non-entailments and consistent ontologies
    • Method: Using FOL model-finders on a test suite with consistent ontologies and non-entailments. Also using sub-axiom sets of the OWL 2 Full axiom set in order to see how well model-finding improves for smaller sublanguages of OWL 2 Full. For sub-axiom sets, some of the OWL 2 Full entailments and inconsistencies in a test suite will become non-entailments and consistent ontologies.
    • Axiom Sets:
      – OWL 2 Full
      – ALCO Full: undecidable sublanguage of OWL 2 Full [Motik 05]
      – RDFS-EXT: "extensional RDFS" [RDF Semantics, Sec. 4.2]
    • Test Data: Characteristic Conclusions test suite
  • Experiment 4: Model-Finding Results (Summary)
    • OWL 2 Full (unsuccessful!):
      – No FOL model-finder confirmed satisfiability of axiomatization (timeouts)
      – Fortunately: no theorem prover confirmed unsatisfiability
      – Good: all "small-sufficient" sub-axiomatizations of test cases satisfiable
    • ALCO Full:
      – Satisfiability checking for axiomatization successful
      – Checking non-entailment/consistency successful in 2/3 of the test cases
      – Runtimes: median ca. 18 s with model-finder Paradox
    • RDFS:
      – Checking non-entailment/consistency always successful
      – Runtimes: ca. 1/10 s for most experiments with model-finder DarwinFM
  • Experiment 4: Model-Finding Results (Concrete)
    Legend: + success, - wrong, ? unknown, black cell: not probed
    [Table: per-test-case results for test cases 01 to 32, with rows Paradox / ALCO Full, Paradox / RDFS and DarwinFM / RDFS]
    Notes:
    • black cells are still entailments or inconsistent ontologies: not probed!
    • 1st line: ALCO Full axiom set, using Paradox model-finder. Result: 15 successful detections, 9 time-outs
    • 2nd/3rd line: RDFS-EXT axiom set, using Paradox/DarwinFM model-finders. Result: always successful
  • CONCLUSIONS
  • Summary
    • Using ATP-based OWL 2 Full reasoning works in principle:
      – Language Coverage: basically complete (skipped datatypes)
        • for a few test cases, it was necessary to select a small axiom set from the complete OWL 2 Full axiomatization sufficient to prove the result
      – Characteristic OWL 2 Full Conclusions: all, if using small axiom sets
      – Performance: often quick (< 1/10 s), if using small axiom sets
      – Scalability: works for semantically weak and unrelated "bulk" data
      – Model-Finding: works for certain fragments of OWL 2 Full
    • Identified Problems (motivates future work):
      – slow or even dysfunctional on complete axiomatization (> 500 axioms)
      – no successful model-finding for complete OWL 2 Full axiomatization
  • Future Work
    • develop automated method for selecting small axiom sets
    • conduct more realistic scalability experiments
    • investigate query answering with FOL ATPs
    • add support for datatype reasoning
    • try to manually find a model for the OWL 2 Full axiomatization
    • implement a prototypical OWL 2 Full reasoner
  • Links
    • Conference Paper: http://dx.doi.org/10.1007/978-3-642-22438-6_35
    • Extended Version of Paper (detailed results, "Characteristic Conclusions" test suite): http://arxiv.org/abs/1108.0155
    • Supplementary Material (all axiom sets, test data, raw results, software): http://www.fzi.de/downloads/ipe/schneid/cade2011-schneidsut-owlfullatp.zip