Reasoning in the OWL 2 Full Ontology Language
 using First-Order Automated Theorem Proving
                          Michael Schneider
       FZI Research Center for Information Technology, Germany

                           Geoff Sutcliffe
                      University of Miami, USA



   23rd International Conference on Automated Deduction (CADE 23)
                     Wrocław, Poland, August 2011
Introduction
• Context: Semantic Web, OWL, First-Order Logic, ATP

• Focus: W3C ontology language OWL 2 Full
   – highly expressive ontology formalism
   – no reasoner as of today

• Aims: find out …
   – … to what extent practical OWL 2 Full reasoning can be
     implemented with off-the-shelf FOL reasoning technology
   – … whether FOL-based OWL 2 Full reasoning can provide
     added value over today's state-of-the-art OWL reasoners
PRELIMINARIES
Semantic Web
• Semantic Web (SW): extends WWW by machine-processible,
  interlinked resource descriptions and vocabularies
   – Resources (everything one can talk about):
       •   identified by URIs
       •   described by property-value pairs
       •   classified by classes
       •   related to other resources via properties
     [Figure: example resource :Harry of class :Eagle, with property values
      age 3 and sex m, linked to resource :Larry by property father]
   – Vocabularies: define classes and properties and their (formal) semantics,
     e.g. that class Eagle is a subclass of class Animal, so Harry becomes an Animal

• RDF ("Resource Description Framework"): language to define
  "graphs" of interlinked resource descriptions

• OWL ("Web Ontology Language"): language to define vocabularies
OWL Flavours
• OWL 2: family of ontology languages for the Semantic Web
   – W3C Recommendation (2009)
   – version 2 is a revised and extended version of OWL (2004)

• Two major "flavours" of OWL: OWL 2 DL and OWL 2 Full
   – OWL 2 DL: basically a description logic (SROIQ[D]) adjusted to SW needs
   – OWL 2 Full: similar to OWL 2 DL, but directly designed for an RDF-based SW

• Some observable distinctive features of OWL 2 Full:
   – can be applied to weakly-structured SW data (LOD) and RDFS vocabularies
   – no restrictions on use of OWL constructs (e.g. asymmetric transitive properties)
   – support for (semantic) metamodeling (Harry, the Eagle, has meta-class Species)

• Theoretical issue: OWL 2 Full is undecidable (practical problem?)
OWL 2 Full Semantics
• Specification: OWL 2 Full semantics is specified via a set of
  model-theoretic "semantic conditions"
                          OWL 2 RDF-Based Semantics:
                 http://www.w3.org/TR/owl2-rdf-based-semantics/

• Core Observation:
   – all semantic conditions have the form of standard first-order logic formulae
     → OWL 2 Full semantics is essentially a FOL theory!
   – all input RDF graphs are representable as FOL formulae
   – hence: OWL 2 Full reasoning is implementable in terms of FOL reasoning!
• Question: How well does it work in practice?
• Prior Art: very little research based on this observation so far;
  none for OWL 2 Full
OWL 2 Full Semantic Conditions
•   Typical format of semantic conditions:
      if a certain semantic relationship holds,
      then another associated relationship also holds
    (Note: many semantic conditions are in fact if-and-only-if conditions)
•   Example (1st semantic condition in Table 5.8 of OWL 2 RDF-Based Semantics):
      if two individuals c1 and c2 are related by the denotation of URI rdfs:subClassOf,
      then c1 and c2 are classes (members of set IC) and ICEXT(c1), the class extension
      of c1, is a subset of ICEXT(c2), the class extension of c2
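Written as a formula, the example condition can be sketched as follows. This is only a transcription of the "if ... then" reading given above; the symbols IEXT (property extension) and I (denotation of a URI) are assumed here in the style of the RDF semantics notation and do not appear on the slide. Since many conditions are in fact if-and-only-if conditions, the implication may need to be read as an equivalence.

```latex
\forall c_1, c_2 :\;
\langle c_1, c_2 \rangle \in \mathrm{IEXT}(I(\texttt{rdfs:subClassOf}))
\;\Rightarrow\;
c_1 \in \mathrm{IC} \,\wedge\, c_2 \in \mathrm{IC} \,\wedge\,
\mathrm{ICEXT}(c_1) \subseteq \mathrm{ICEXT}(c_2)
```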
Prior Art in OWL Full Reasoning
• Fikes, McGuinness, Waldinger: A First-Order Logic Semantics
  for Semantic Web Markup Languages. TR, Stanford, 2002.
   – translation of specifications of precursors of OWL and RDF into first-order
     logic (FOL) theory, and application of FOL reasoners
   – focus: checking for technical issues in specifications (less on inferencing)
• Hayes: Translating Semantic Web Languages into Common Logic.
  TR, Pensacola (Florida), 2005.
   – translation of OWL 1 Full into Common Logic (basically a variant of FOL)
   – no report on reasoning experiments
• Hawke: Surnia. 2003. URL: http://www.w3.org/2003/08/surnia
   – OWL 1 Full reasoner based on FOL translation using Otter FOL reasoner
   – did not perform well on W3C OWL 1 test suite
   – ad hoc implementation: does not properly follow specification; many flaws
APPROACH
Translation into FOL:
                        General Process
OWL 2 Full Entailment Checking using FOL Reasoners and concrete FOL syntax TPTP:

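[Figure: processing pipeline. The OWL 2 Full semantic conditions are translated
 into a TPTP axiom set, the premise RDF graph into a TPTP axiom, and the
 conclusion RDF graph into a TPTP conjecture. All formulae are passed to a FOL
 theorem prover and a FOL model finder; the overall result is one of
 "theorem", "counter-sat", or "unknown".]
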
 1. Input:
     • translate semantic conditions into set of TPTP axioms
     • translate premise RDF graph into TPTP axiom
     • translate conclusion RDF graph into TPTP conjecture
 2. Reasoning: feed all TPTP formulae into FOL reasoners (parallel execution):
     • FOL theorem provers: used to detect positive entailments
     • FOL model-finders: used to detect non-entailments
 3. Output: integrate results from FOL reasoners into single result
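
Step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `integrate` and its two Boolean inputs are assumptions made for this sketch. The three result values are the ones named in the process overview.

```python
from enum import Enum


class Result(Enum):
    THEOREM = "theorem"          # a theorem prover found a proof: entailment holds
    COUNTER_SAT = "counter-sat"  # a model finder found a counter-model: no entailment
    UNKNOWN = "unknown"          # no reasoner gave a definite answer (e.g. timeout)


def integrate(prover_found_proof: bool, model_finder_found_model: bool) -> Result:
    """Combine the outcomes of the parallel reasoner runs into a single result.

    Both inputs are hypothetical flags: True means the respective reasoner
    terminated with a definite answer within the time limit.
    """
    if prover_found_proof and model_finder_found_model:
        # A proof and a counter-model cannot both exist for the same problem.
        raise ValueError("contradictory reasoner results")
    if prover_found_proof:
        return Result.THEOREM
    if model_finder_found_model:
        return Result.COUNTER_SAT
    return Result.UNKNOWN
```

For example, `integrate(False, False)` yields `Result.UNKNOWN`, which corresponds to all reasoners timing out.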
Translation into FOL:
        Semantic Conditions
[Figure: a model-theoretic OWL 2 Full semantic condition (Table 5.8), stated
 with "iff", and the corresponding FOL formula (TPTP)]
Translation into FOL:
    RDF Graphs
[Figure: an RDF graph (Turtle) and the corresponding FOL formula (TPTP)]
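
The translation of an RDF graph into a single TPTP formula might look roughly like the sketch below. It is an illustration under assumptions, not the converter used in the evaluation: the ternary predicate name `iext`, the `uri_` prefix for constants, and the `B_` prefix for blank-node variables are all hypothetical naming choices. Each triple becomes one atom, the atoms are conjoined, and blank nodes are existentially quantified. Literals are not handled.

```python
import re


def term(node: str) -> str:
    """Map an RDF node name to a TPTP term (naming scheme is an assumption)."""
    if node.startswith("_:"):
        # TPTP variables start with an uppercase letter.
        return "B_" + re.sub(r"\W", "_", node[2:])
    # TPTP constants start with a lowercase letter.
    return "uri_" + re.sub(r"\W", "_", node)


def graph_to_tptp(name: str, role: str, triples) -> str:
    """Translate a list of (subject, predicate, object) triples into one fof formula.

    role is "axiom" for a premise graph and "conjecture" for a conclusion graph.
    """
    atoms = [f"iext({term(p)},{term(s)},{term(o)})" for s, p, o in triples]
    body = " & ".join(atoms)
    # Blank nodes act as existentially quantified variables.
    bnodes = sorted({term(n) for t in triples for n in t if n.startswith("_:")})
    if bnodes:
        body = f"? [{','.join(bnodes)}] : ( {body} )"
    return f"fof({name},{role},( {body} ))."


conclusion = [("ex:harry", "rdf:type", "_:x"), ("_:x", "rdf:type", "ex:Species")]
print(graph_to_tptp("conclusion", "conjecture", conclusion))
```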
EVALUATION SETTING
Evaluation Setting:
                               TPTP-Encoding
•   FOL Axiomatization:
    –   translated most normative semantic conditions of OWL 2 Full
    –   excluded: datatype reasoning-related semantics
    –   size of complete axiomatization: 558 FOL formulae
           Syntax Statistics
           Number of formulae:     558 ( 196 unit )
           Number of atoms:       1772 ( 90 equality )
           Maximal formula depth:   27 ( 5 average )
           Number of connectives: 1350 ( 136 '~' ; 35 '|' ; 758 '&' ; 126 '<=>' ; 295 '=>' )
           Number of predicates:    13 ( 1 propositional ; 0-3 arity )
           Number of functors:     157 ( 156 constant ; 0-2 arity )
           Number of variables:    973 ( 0 sgn ; 911 '!' ; 62 '?' )
           Maximal term depth:       2 ( 1 average )



•   RDF Graph Conversion:
    –   implemented simple RDF-to-TPTP converter tool
Evaluation Setting:
                    Experiments

1. Language Coverage
   completeness w.r.t. OWL 2 Full specification
2. Characteristic OWL 2 Full Conclusions
   semantic capabilities beyond OWL 2 DL or OWL 2 RL/RDF Rules
3. Scalability
   reasoning upon large data sets
4. Model Finding
   detecting consistent ontologies and non-entailments
Evaluation Setting:
                          Reasoners
• FOL Theorem Provers:
  – Vampire 0.6 (using two modes: "auto", and with SInE strategy)
  – iProver-SInE 0.8 (iProver with SInE strategy and strategy scheduling)
• FOL Model-Finders:
  – Paradox 4.0 (finite model finder)
  – DarwinFM 1.4.5 (finite model finder)
• OWL Reasoners:
  –   Pellet 2.2.2 (tableaux-based OWL 2 DL reasoner)
  –   HermiT 1.3.2 (tableaux-based OWL 2 DL reasoner)
  –   FaCT++ 1.5.0 (tableaux-based OWL 2 DL reasoner)
  –   BigOWLIM 3.4, using "owl-rl" ruleset (RDF entailment-rule reasoner)
  –   Jena 2.6.4, using OWL_MEM_RULE_INF spec (RDF framework with rule engine)
  –   Parliament 2.6.9 (reasoning-enabled RDF triple store)
Evaluation Setting:
                   Environment

• Computers:
   – CPU: Intel Pentium 4, 2.8 GHz
   – Memory: 2 GB
   – Operating System: Linux FC8
• Max. CPU time per run: 300 s
EVALUATION RESULTS
1. LANGUAGE COVERAGE
Experiment 1: Language Coverage
                 Overview
• Aim: analyse completeness w.r.t. OWL 2 Full specification
• Method: check that all parts of OWL 2 Full semantics specification
  are covered (except for datatype reasoning)
• Test Data: dedicated OWL 2 Full coverage test suite targeted to
  specification level (Schneider & Mainzer, 2009):
   – at least one test case for each OWL 2 Full semantic condition
   – each test case focuses as much as possible on targeted semantic condition
   – generally easy to solve, hence failure indicates flaw or lack of coverage
• Adjustments:
   – removal of datatype-reasoning related test cases (currently unsupported)
   – only using entailment and inconsistency test cases
   – size of remaining test suite: 411 test cases
Experiment 1: Language Coverage
              Example Test Case
Test case for probing coverage of the RDFS semantic condition for class subsumption:
This positive entailment ('p') test case applies to RDFS ('rdfs'), the OWL 2 RL/RDF Rules
('owl2rl'), and all common semantic extensions, including OWL 2 Full.
The upper RDF graph is the premise graph, the lower RDF graph is the conclusion graph.


    TESTCASE rdfbased-sem-rdfs-subclass-cond
    p rdfs owl2rl
    The extensions of two classes related by rdfs:subClassOf
    are in a subsumption relationship.
    +
    ex:c1 rdfs:subClassOf ex:c2 .
    ex:w rdf:type ex:c1 .
    +
    ex:w rdf:type ex:c2 .
    +
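
Why this entailment holds can be illustrated with a small forward-chaining sketch in Python. It applies only one direction of the single semantic condition targeted by the test case and is in no way a complete reasoner; the function name `apply_subclass_rule` is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def apply_subclass_rule(triples):
    """Return the closure of a triple set under the class subsumption rule:
    from (w rdf:type c1) and (c1 rdfs:subClassOf c2) derive (w rdf:type c2)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        for w, p, c in list(closed):
            if p != RDF_TYPE:
                continue
            for c1, c2 in subclass_pairs:
                if c == c1 and (w, RDF_TYPE, c2) not in closed:
                    closed.add((w, RDF_TYPE, c2))
                    changed = True
    return closed


premise = {("ex:c1", SUBCLASS, "ex:c2"), ("ex:w", RDF_TYPE, "ex:c1")}
conclusion = {("ex:w", RDF_TYPE, "ex:c2")}

# The conclusion graph is entailed if all its triples are in the closure.
entailed = conclusion <= apply_subclass_rule(premise)
```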
Experiment 1: Language Coverage
                   Results
 Reasoner                             Success   Wrong   Unknown
 Pellet                                   237     168         6
 HermiT                                   246     157         8
 FaCT++                                   190      45       176
 BigOWLIM                                 282     129         0
 Jena                                     129     282         0
 Parliament                                14     373        24
 Vampire, OWL 2 Full axioms               349       0        62
 iProver-SInE, OWL 2 Full axioms          383       0        28
 iProver-SInE with small axiom sets       396       0        15
 (numbers of test cases out of 411; shown on the slide as stacked bars from 0% to 100%)
Notes:
• All DL reasoners show similar results (although FaCT++ signals many more errors)
• Results of BigOWLIM and DL reasoners are very different (BigOWLIM not "better")
• "small axiom set": a manually selected subset of axioms from the complete
  OWL 2 Full axiom set that is small but sufficient to succeed on the given test case
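
The percentages behind the bar chart follow directly from the counts. A short Python sketch of the arithmetic, using the counts reported above (the helper name `shares` is hypothetical):

```python
# (success, wrong, unknown) counts per reasoner, as reported in the results.
counts = {
    "Pellet": (237, 168, 6),
    "HermiT": (246, 157, 8),
    "FaCT++": (190, 45, 176),
    "BigOWLIM": (282, 129, 0),
    "Jena": (129, 282, 0),
    "Parliament": (14, 373, 24),
    "Vampire, OWL 2 Full axioms": (349, 0, 62),
    "iProver-SInE, OWL 2 Full axioms": (383, 0, 28),
    "iProver-SInE with small axiom sets": (396, 0, 15),
}


def shares(success, wrong, unknown):
    """Return the three counts as percentages of the total, rounded to one decimal."""
    total = success + wrong + unknown
    return tuple(round(100 * n / total, 1) for n in (success, wrong, unknown))


for reasoner, row in counts.items():
    print(f"{reasoner:36s} {shares(*row)}")
```

Each row sums to the 411 test cases of the adjusted test suite.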
Experiment 1: Language Coverage
              Runtimes (sorted)
For each reasoner: runtimes for all test cases are sorted increasingly
(all runtimes are for the complete OWL 2 Full axiom set; small axiom sets are ignored)




  Notes:
  • Most problems solved in less than 1s
  • Vampire solves slightly fewer problems, but is generally faster
    → suggests strategy to run both reasoners in parallel
EVALUATION RESULTS:
2. CHARACTERISTIC CONCLUSIONS
Experiment 2: Characteristic Conclusions
               Overview
• Aim: analyse ability to infer semantic conclusions that are
  characteristic for OWL 2 Full (beyond OWL 2 DL or OWL 2 RL/RDF)
• Test Data: new "Characteristic Conclusions" test suite
   – 32 test cases (manually created)
   – probes many distinctive features of OWL 2 Full, including:
       •   strong logic-based reasoning
       •   unrestricted use of complex properties
       •   blank nodes as existentially quantified variables
       •   metamodeling
       •   use of data values as individuals
       •   semantic annotation properties
       •   reflective use of built-in vocabulary terms
   – Differences to Language Coverage test suite:
       • focus is on "emergent behaviour" of OWL 2 Full rather than on technical specification
       • most test cases depend on interplay of several OWL 2 Full semantic conditions
       • results often technically non-obvious (proof needed)
Experiment 2: Characteristic Conclusions
           Example Test Case
Test case for probing metamodeling with Boolean logic reasoning and blank node semantics:
This positive entailment test case applies to OWL 2 Full, but neither to OWL 2 DL (requires
reasoning based on metamodeling) nor to OWL 2 RL/RDF (requires strong support for class
union and existential blank node semantics).

       Test Case: 014_Harry_belongs_to_some_Species
       Premise Graph:
         ex:Eagle rdf:type ex:Species .
         ex:Falcon rdf:type ex:Species .
         ex:harry rdf:type [
             owl:unionOf ( ex:Eagle ex:Falcon )
         ] .

       Conclusion Graph:
          ex:harry rdf:type _:x .
          _:x rdf:type ex:Species .
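
The intuition behind this entailment is a case analysis over the union, which the following Python sketch spells out informally. It is not a reasoner and not how the FOL provers proceed internally; the data structures and the helper `conclusion_holds_if_harry_in` are hypothetical and serve only to show why both metamodeling and reasoning by cases are needed.

```python
# Classes used as individuals (metamodeling): each class has the meta-class ex:Species.
premise_types = {
    "ex:Eagle": {"ex:Species"},
    "ex:Falcon": {"ex:Species"},
}

# ex:harry is an instance of the union of these classes.
union_operands = ["ex:Eagle", "ex:Falcon"]


def conclusion_holds_if_harry_in(cls: str) -> bool:
    """Case: ex:harry is an instance of cls.

    Take cls itself as the witness for the blank node _:x. Then
    'ex:harry rdf:type _:x' holds by the case assumption, and it remains
    to check '_:x rdf:type ex:Species'.
    """
    return "ex:Species" in premise_types.get(cls, set())


# The conclusion is entailed if it holds in every case of the union.
entailed = all(conclusion_holds_if_harry_in(c) for c in union_operands)
```

Note that the witness differs between the two cases, so no single named class can be substituted for `_:x`; the existential reading of the blank node is essential.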
Experiment 2: Characteristic Conclusions
                Results
   +   success
   -   wrong
   ?   unknown
                             0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
                             1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
   Pellet                    + + + - - - - - + + - - - - + - - - - + + - - - - + - - ? - - -
   HermiT                    + ? + - - ? - + + + - - - - + - - - - + + - - + ? + - - ? - - -
   FaCT++                    + ? ? ? ? ? ? - ? + - - - ? + ? - - - + + ? ? ? ? + - ? ? - - ?
   BigOWLIM                  + - - + - - + + - - + + - - + - - + + - - - - - - - - - - - - -
   Jena                      + - - - - + + + - - + - - - - - - + - - - - + - - + - - - - - +
   Parliament                + - - - - - - + - - ? - - - - - - - ? - - - - - - - - - - ? ? -



   Vampire / complete        + + + + + + + + + ? + ? ? + + + + + + ? ? ? + + ? + ? ? + + + +
   iProver-SInE / complete   + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
   Vampire / small           + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
   iProver-SInE / small      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Notes:
• OWL reasoners weak: success rate of about 30% or less
• DL vs RDF-rule reasoners: nearly no overlap for successful results
• FOL reasoners: much better on complete axiom set; perfect on small axiom sets
Experiment 2: Characteristic Conclusions
          Runtimes (sorted)




Notes:
• FOL reasoners often slow when using complete axiom set
• Generally much faster with small axiom sets (up to several orders of magnitude)
EVALUATION RESULTS:
3. SCALABILITY
Experiment 3: Scalability
                      Overview
• Aim: analyse reasoning over large data sets when most of the data is
  not relevant to the reasoning result (simplest scenario for a start)
• Method: using existing reasoning test suite, but with large masses
  of "bulk" RDF data added to premise graph, where the bulk data is
  semantically weak and unrelated to the test suite
• Test Data:
   – Reasoning test suite: Characteristic Conclusions test suite
   – Bulk RDF data: 1 Million triples, no RDF(S)/OWL vocabulary terms, no URIs
     shared with reasoning test cases
• Reasoning Scenarios:
   – auto reasoning mode vs. SInE strategy
   – complete axiom set vs. small axiom sets
Experiment 3: Scalability
           Example Bulk RDF Data Set
This is an example bulk RDF data set consisting of 20 RDF triples. The data set
has no names in common with any of the test cases being used in the evaluation,
nor does the bulk data refer to any built-in terms of the OWL and RDF(S)
vocabularies. There are no blank nodes, i.e., the bulk data consists entirely of
a "ground" RDF graph. The bulk data sets being used in the evaluation have been
much larger, still having the same basic format as the example set presented here.

    ex:si1 ex:pi1 ex:oi1 .
    ex:si2 ex:pi2 ex:oi2 .
    ex:si3 ex:pi3 ex:oi3 .
    ex:si4 ex:pi4 ex:oi4 .
    ex:si5 ex:pi5 ex:oi5 .
    ex:ss ex:ps1 ex:os1 .
    ex:ss ex:ps2 ex:os2 .
    ex:ss ex:ps3 ex:os3 .
    ex:ss ex:ps4 ex:os4 .
    ex:ss ex:ps5 ex:os5 .
    ex:sp1 ex:pp ex:op1 .
    ex:sp2 ex:pp ex:op2 .
    ex:sp3 ex:pp ex:op3 .
    ex:sp4 ex:pp ex:op4 .
    ex:sp5 ex:pp ex:op5 .
    ex:ssp ex:psp ex:osp1 .
    ex:ssp ex:psp ex:osp2 .
    ex:ssp ex:psp ex:osp3 .
    ex:ssp ex:psp ex:osp4 .
    ex:ssp ex:psp ex:osp5 .
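
A data set of this shape can be produced for any size with a few lines of Python. The sketch below reproduces the four naming patterns of the 20-triple example; whether the larger bulk data sets were actually generated this way is an assumption, and the function names are hypothetical.

```python
def bulk_triples(n: int):
    """Yield 4*n ground triples in the four patterns of the example data set."""
    for i in range(1, n + 1):
        # Subject, predicate and object all unique to this triple.
        yield (f"ex:si{i}", f"ex:pi{i}", f"ex:oi{i}")
    for i in range(1, n + 1):
        # Shared subject.
        yield ("ex:ss", f"ex:ps{i}", f"ex:os{i}")
    for i in range(1, n + 1):
        # Shared predicate.
        yield (f"ex:sp{i}", "ex:pp", f"ex:op{i}")
    for i in range(1, n + 1):
        # Shared subject and predicate.
        yield ("ex:ssp", "ex:psp", f"ex:osp{i}")


def serialize(triples) -> str:
    """Write the triples one per line in the style used on the slide."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(serialize(bulk_triples(5)))  # the 20 triples of the example
```

With `n = 250000` the generator yields one million triples, the size used in the scalability experiment.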
Experiment 3: Scalability
                                   Results
  +   success
  -   wrong
  ?   unknown
                             0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
                             1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
  Vampire auto / complete    + + + ? ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
  Vampire SInE / complete    + + + + + + ? + ? ? + ? ? ? + + ? + + ? ? ? + ? ? + ? ? ? + ? +
  iProver-SInE / complete    + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
  Vampire auto / small       + ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
  Vampire SInE / small       + + + + + + ? + + + + + ? + + + ? + + ? ? + + + + + + ? + + ? +
  iProver-SInE / small       + + + + + + + + + + + + ? + + + + + + + + + + + + + + + + + + +




Notes:
• Vampire performs poorly with "auto" strategy: times out in most cases
• Improvement by using SInE strategy (iProver and Vampire) on complete axiom set
• Major improvement by combining SInE strategy with removal of irrelevant OWL axioms
   (small axiom sets)
Experiment 3: Scalability
                 Runtimes (sorted)




Notes:
• General offset of ca. 20s for parsing large input data (ca. 55MB)
• SInE strategy successful (Vampire mostly fails when using "auto" mode)
• further improvements by using small axiom sets
EVALUATION RESULTS:
4. MODEL FINDING
Experiment 4: Model-Finding
                  Overview
• Aim: analyse ability to detect non-entailments and consistent
  ontologies
• Method: Using FOL model-finders on test suite with consistent
  ontologies and non-entailments. Also using sub-axiom sets of
  OWL 2 Full axiom set in order to see how well model-finding
  improves for smaller sublanguages of OWL 2 Full. For sub-axiom
  sets, some of the OWL 2 Full entailments and inconsistencies in a
  test suite will become non-entailments and consistent ontologies.
• Axiom Sets:
   – OWL 2 Full
   – ALCO Full: undecidable sublanguage of OWL 2 Full [Motik 05]
   – RDFS-EXT: "extensional RDFS" [RDF Semantics, Sec. 4.2]
• Test Data: Characteristic Conclusions test suite
Experiment 4: Model-Finding
              Results (Summary)
• OWL 2 Full (unsuccessful!):
   – No FOL model-finder confirmed satisfiability of axiomatization (timeouts)
   – Fortunately: no theorem prover confirmed unsatisfiability
   – Good: all "small-sufficient" sub-axiomatizations of test cases satisfiable
• ALCO Full:
   – Satisfiability checking for axiomatization successful
   – Checking non-entailment/consistency successful in 2/3 of the test cases
   – Runtimes: median ca. 18s with model-finder Paradox
• RDFS-EXT:
   – Checking non-entailment/consistency always successful
   – Runtimes: ca. 1/10s for most experiments with model-finder DarwinFM


Experiment 4: Model-Finding
                        Results (Concrete)
  +   success
  -   wrong
  ?   unknown
      not probed
                        0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
                        1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
  Paradox / ALCO Full        + + + + +     +   ?   + + + + + ? ? ? ? + ? ? ? +    + ? +
  Paradox / RDFS             + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
  DarwinFM / RDFS            + + + + + + + + + + + + + + + + + + + + + + + + + + + + +




Notes:
• black cells are still entailments or inconsistent ontologies: not probed!
• 1st line: ALCO Full axiom set, using Paradox model-finder
  Result: 15 successful detections, 9 time-outs
• 2nd/3rd line: RDFS-EXT axiom set, using Paradox/DarwinFM model-finders
  Result: always successful
CONCLUSIONS
Summary
• Using ATP-based OWL 2 Full reasoning works in principle:
   – Language Coverage: basically complete (skipped datatypes)
        • for a few test cases, it was necessary to select a small axiom set from the
          complete OWL 2 Full axiomatization sufficient to prove the result
   –   Characteristic OWL 2 Full Conclusions: all, if using small axiom sets
   –   Performance: often quick (< 1/10s), if using small axiom sets
   –   Scalability: works for semantically weak and unrelated "bulk" data
   –   Model-Finding: works for certain fragments of OWL 2 Full


• Identified Problems (motivates future work):
   – slow or even dysfunctional on complete axiomatization (> 500 axioms)
   – no successful model-finding for complete OWL 2 Full axiomatization
Future Work
•   develop automated method for selecting small axiom sets
•   conduct more realistic scalability experiments
•   investigate query answering with FOL ATPs
•   add support for datatype reasoning
•   try to manually find a model for the OWL 2 Full axiomatization
•   implement a prototypical OWL 2 Full reasoner
Links

• Conference Paper:
       http://dx.doi.org/10.1007/978-3-642-22438-6_35

• Extended Version of Paper
  (detailed results, "Characteristic Conclusions" test suite)
       http://arxiv.org/abs/1108.0155

• Supplementary Material
  (all axiom sets, test data, raw results, software):
       http://www.fzi.de/downloads/ipe/schneid/cade2011-schneidsut-owlfullatp.zip

cade23-schneidsut-atp4owlfull-2011

  • 1.
    Reasoning in theOWL 2 Full Ontology Language using First-Order Automated Theorem Proving Michael Schneider FZI Research Center for Information Technology, Germany Geoff Sutcliffe University of Miami, USA 23rd International Conference on Automated Deduction (CADE 23) Wrocław, Poland, August 2011
  • 2.
    Introduction • Context: SemanticWeb, OWL, First-Order Logic, ATP • Focus: W3C ontology language OWL 2 Full – highly expressive ontology formalism – no reasoner as of today • Aims: find out … – … to what extend can practical OWL 2 Full reasoning be implemented with off-the-shelf FOL reasoning technology – … whether FOL-based OWL 2 Full reasoning can provide added value over today's state-of-the art OWL reasoners
  • 3.
  • 4.
    Semantic Web • SemanticWeb (SW): extends WWW by machine-processible, interlinked resource descriptions and vocabularies – Resources (everything one can talk about): :Eagle • identified by URIs age • described by property-value pairs 3 father • classified by classes :Harry :Larry sex • related to other resources via properties m – Vocabularies: define classes and properties and their (formal) semantics, e.g. that class Eagle is a subclass of class Animal, so Harry becomes an Animal • RDF ("Resource Description Framework"): language to define "graphs" of interlinked resource descriptions • OWL ("Web Ontology Language"): language to define vocabularies
  • 5.
    OWL Flavours • OWL2: family of ontology languages for the Semantic Web – W3C Recommendation (2009) – version 2 is revised and extended version of OWL (2004) • Two major "flavours" of OWL: OWL 2 DL and OWL 2 Full – OWL 2 DL: basically a description logic (SROIQ[D]) adjusted to SW needs – OWL 2 Full: similar to OWL 2 DL, but directly designed for an RDF-based SW • Some observable distinctive features of OWL 2 Full: – can be applied to weakly-structured SW data (LOD) and RDFS vocabularies – no restrictions on use of OWL constructs (e.g. asymmetric transitive properties) – support for (semantic) metamodeling (Harry, the Eagle, has meta-class Species) • Theoretical issue: OWL 2 Full is undecidable (practical problem?)
  • 6.
    OWL 2 FullSemantics • Specification: OWL 2 Full semantics is specified via a set of model-theoretic "semantic conditions" OWL 2 RDF-Based Semantics: http://www.w3.org/TR/owl2-rdf-based-semantics/ • Core Observation: – all semantic conditions have the form of standard first-order logic formulae → OWL 2 Full semantics is essentially a FOL theory! – all input RDF graphs are representable as FOL formulae – hence: OWL 2 Full reasoning is implementable in terms of FOL reasoning! • Question: How well does it work in practice? • Prior Art: very little research based on this observation so far; none for OWL 2 Full
  • 7.
    OWL 2 FullSemantic Conditions • Typical format of semantic conditions: if a certain semantic relationship holds, then another associated relationship also holds (Note: many semantic conditions are in fact if-and-only-if conditions) • Example (1st semantic condition in Table 5.8 of OWL 2 RDF-Based Semantics): if two individuals c1 and c2 are related by the denotation of URI rdfs:subClassOf, then c1 and c2 are classes (members of set IC) and ICEXT(c1), the class extension of c1, is a subset of ICEXT(c2), the class extensions of c2
  • 8.
    Prior Art inOWL Full Reasoning • Fikes, McGuinness, Waldinger: A First-Order Logic Semantics for Semantic Web Markup Languages. TR, Stanford, 2002. – translation of specifications of precursers of OWL and RDF into first-order logic (FOL) theory, and application of FOL reasoners – focus: checking for technical issues in specifications (less on inferencing) • Hayes: Translating Semantic Web Languages into Common Logic. TR, Pensacola (Florida), 2005. – translation of OWL 1 Full into Common Logic (basically a variant of FOL) – no report on reasoning experiments • Hawke: Surnia. 2003. URL: http://www.w3.org/2003/08/surnia – OWL 1 Full reasoner based on FOL translation using Otter FOL reasoner – did not perform well on W3C OWL 1 test suite – ad hoc implementation: does not properly follow specification; many flaws
  • 9.
  • 10.
    Translation into FOL: General Process OWL 2 Full Entailment Checking using FOL Reasoners and concrete FOL syntax TPTP: OWL 2 Full TPTP Semantic Conditions Axiom Set FOL { } Theorem Prover theorem Premise TPTP counter-sat RDF Graph Axiom unknown FOL Conclusion TPTP Model Finder RDF Graph Conjecture 1. Input: • translate semantic conditions into set of TPTP axioms • translate premise RDF graph into TPTP axiom • translate conclusion RDF graph into TPTP conjecture 2. Reasoning: feed all TPTP formulae into FOL reasoners (parallel execution): • FOL theorem provers: used to detect positive entailments • FOL model-finders: used to detect non-entailments 3. Output: integrate results from FOL reasoners into single result 10
  • 11.
    Translation into FOL: Semantic Conditions iff model-theoretic OWL 2 Full semantic condition (Table 5.8) corresponding FOL formula (TPTP)
  • 12.
    Translation into FOL: RDF Graphs RDF graph (Turtle) corresponding FOL formula (TPTP)
  • 13.
  • 14.
    Evaluation Setting: TPTP-Encoding • FOL Axiomatization: – translated most normative semantic conditions of OWL 2 Full – excluded: datatype reasoning-related semantics – size of complete axiomatization: 558 FOL formulae Syntax Statistics Number of formulae: 558 ( 196 unit ) Number of atoms: 1772 ( 90 equality ) Maximal formula depth: 27 ( 5 average ) Number of connectives: 1350 ( 136 '~' ; 35 '|' ; 758 '&' ; 126 '<=>' ; 295 '=>' ) Number of predicates: 13 ( 1 propositional ; 0-3 arity ) Number of functors: 157 ( 156 constant ; 0-2 arity ) Number of variables: 973 ( 0 sgn ; 911 '!' ; 62 '?' ) Maximal term depth: 2 ( 1 average ) • RDF Graph Conversion: – implemented simple RDF-to-TPTP converter tool
  • 15.
    Evaluation Setting: Experiments 1. Language Coverage completeness w.r.t. OWL 2 Full specification 2. Characteristic OWL 2 Full Conclusions semantic capabilities beyond OWL 2 DL or OWL 2 RL/RDF Rules 3. Scalability reasoning upon large data sets 4. Model Finding detecting consistent ontologies and non-entailments
  • 16.
    Evaluation Setting: Reasoners • FOL Theorem Provers: – Vampire 0.6 (using two modes: "auto", and with SInE strategy) – iProver-SInE 0.8 (iProver with SInE strategy and strategy scheduling) • FOL Model-Finders: – Paradox 4.0 (finite model finder) – DarwinFM 1.4.5 (finite model finder) • OWL Reasoners: – Pellet 2.2.2 (tableaux-based OWL 2 DL reasoner) – HermiT 1.3.2 (tableaux-based OWL 2 DL reasoner) – FaCT++ 1.5.0 (tableaux-based OWL 2 DL reasoner) – BigOWLIM 3.4, using "owl-rl" ruleset (RDF entailment-rule reasoner) – Jena 2.6.4, using OWL_MEM_RULE_INF spec (RDF framework with rule engine) – Parliament 2.6.9 (reasoning-enabled RDF triple store)
  • 17.
    Evaluation Setting: Environment • Computers: – CPU: Intel Pentium 4, 2.8 GHz – Memory: 2 GB – Operating System: Linux FC8 • Max. CPU time per run: 300 s
  • 18.
  • 19.
    Experiment 1: LanguageCoverage Overview • Aim: analyse completeness w.r.t. OWL 2 Full specification • Method: check that all parts of OWL 2 Full semantics specification are covered (except for datatype reasoning) • Test Data: dedicated OWL 2 Full coverage test suite targeted to specification level (Schneider & Mainzer, 2009): – at least one test case for each OWL 2 Full semantic condition – each test case focuses as much as possible on targeted semantic condition – generally easy to solve, hence failure indicates flaw or lack of coverage • Adjustments: – removal of datatype-reasoning related test cases (currently unsupported) – only using entailment and inconsistency test cases – size of remaining test suite: 411 test cases
  • 20.
    Experiment 1: LanguageCoverage Example Test Case Test case for probing coverage of the RDFS semantic condition for class subsumption: This positive entailment (‘p‘) test case applies to RDFS (‘rdfs‘), the OWL 2 RL/RDF Rules (‘owl2rl‘), and all common semantic extensions, including OWL 2 Full. The upper RDF graph is the premise graph, the lower RDF graph is the conclusion graph. TESTCASE rdfbased-sem-rdfs-subclass-cond p rdfs owl2rl The extensions of two classes related by rdfs:subClassOf are in a subsumption relationship. + ex:c1 rdfs:subClassOf ex:c2 . ex:w rdf:type ex:c1 . + ex:w rdf:type ex:c2 . +
  • 21.
    Experiment 1: LanguageCoverage Results Pellet 237 168 6 HermiT 246 157 8 FaCT++ 190 45 176 BigOWLIM 282 129 0 Jena 129 282 0 Parliament 14 373 24 Vampire, OWL 2 Full axioms 349 0 62 iProver-SInE, OWL 2 Full axioms 383 028 iProver-SInE with small axiom sets 396 0 15 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Success Wrong Unknown Notes: •All DL reasoners show similar results (although FaCT++ signals much more errors) •Results of BigOWLIM and DL reasoners are very different (BigOWLIM not "better") •"small axiom set": a manually selected subset of axioms from the complete OWL 2 Full axiom set that is small but sufficient to succeed on the given test case 21
  • 22.
    Experiment 1: Language Coverage
    Runtimes (sorted)
    For each reasoner: runtimes for all test cases are sorted increasingly (all runtimes are for the complete OWL 2 Full axiom set; small axiom sets are ignored)
    Notes:
    • Most problems solved in less than 1s
    • Vampire solves slightly fewer problems, but is generally faster → suggests strategy to run both reasoners in parallel
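    The parallel strategy suggested in the last note could look roughly like the following sketch. The two prover functions are hypothetical stand-ins that merely sleep; a real setup would launch the external prover processes and terminate the slower one once an answer is available.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_portfolio(provers, problem):
    """Start every prover on the same problem and return (name, answer)
    for the first one that reports a definite answer (anything but None)."""
    pool = ThreadPoolExecutor(max_workers=len(provers))
    try:
        futures = {pool.submit(fn, problem): name for name, fn in provers.items()}
        for future in as_completed(futures):
            answer = future.result()
            if answer is not None:
                return futures[future], answer
        return None, None
    finally:
        # Do not block on provers that are still running.
        pool.shutdown(wait=False)


# Hypothetical stand-ins for two real provers (illustration only).
def quick_prover(problem):
    time.sleep(0.01)
    return "Theorem" if problem == "easy" else None  # gives up on hard input


def thorough_prover(problem):
    time.sleep(0.2)
    return "Theorem"


provers = {"quick": quick_prover, "thorough": thorough_prover}
```

    With these stand-ins, an "easy" problem is answered by the quick prover, while any other problem falls through to the thorough one.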
  • 23.
  • 24.
    Experiment 2: Characteristic Conclusions
    Overview
    • Aim: analyse ability to infer semantic conclusions that are characteristic for OWL 2 Full (beyond OWL 2 DL or OWL 2 RL/RDF)
    • Test Data: new "Characteristic Conclusions" test suite
       – 32 test cases (manually created)
       – probes many distinctive features of OWL 2 Full, including:
           • strong logic-based reasoning
           • unrestricted use of complex properties
           • blank nodes as existentially quantified variables
           • metamodeling
           • use of data values as individuals
           • semantic annotation properties
           • reflective use of built-in vocabulary terms
       – Differences from Language Coverage test suite:
           • focus is on "emergent behaviour" of OWL 2 Full rather than on technical specification
           • most test cases depend on interplay of several OWL 2 Full semantic conditions
           • results often technically non-obvious (proof needed)
  • 25.
    Experiment 2: Characteristic Conclusions
    Example Test Case
    Test case for probing metamodeling with Boolean logic reasoning and blank node semantics:

    Test Case: 014_Harry_belongs_to_some_Species
    Premise Graph:
        ex:Eagle rdf:type ex:Species .
        ex:Falcon rdf:type ex:Species .
        ex:harry rdf:type [ owl:unionOf ( ex:Eagle ex:Falcon ) ] .
    Conclusion Graph:
        ex:harry rdf:type _:x .
        _:x rdf:type ex:Species .

    This positive entailment test case applies to OWL 2 Full, but neither to OWL 2 DL (requires reasoning based on metamodeling) nor to OWL 2 RL/RDF (requires strong support for class union and existential blank node semantics).
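    The conclusion follows by a case split on the union. The following is a sketch of the argument, written with the class extension mapping ICEXT of the RDF-based semantics; the name u for the resource denoted by the blank node in the premise graph is an assumption made for readability.

```latex
\begin{align*}
\text{Premises:}\quad
  & \mathit{Eagle} \in \mathrm{ICEXT}(\mathit{Species}), \qquad
    \mathit{Falcon} \in \mathrm{ICEXT}(\mathit{Species}), \\
  & \mathit{harry} \in \mathrm{ICEXT}(u), \qquad
    \mathrm{ICEXT}(u) = \mathrm{ICEXT}(\mathit{Eagle}) \cup \mathrm{ICEXT}(\mathit{Falcon}) \\[4pt]
\text{Case 1:}\quad
  & \mathit{harry} \in \mathrm{ICEXT}(\mathit{Eagle})
    \;\Rightarrow\; \text{choose } x := \mathit{Eagle} \\
\text{Case 2:}\quad
  & \mathit{harry} \in \mathrm{ICEXT}(\mathit{Falcon})
    \;\Rightarrow\; \text{choose } x := \mathit{Falcon} \\[4pt]
\text{Conclusion:}\quad
  & \exists x \,\bigl( \mathit{harry} \in \mathrm{ICEXT}(x)
    \;\wedge\; x \in \mathrm{ICEXT}(\mathit{Species}) \bigr)
\end{align*}
```

    In both cases the witness x is a resource that is used as a class (harry is an instance of it) and as an individual (it is an instance of Species), which is the metamodeling aspect of the test case.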
  • 26.
    Experiment 2: Characteristic Conclusions
    Results
    Legend: + success, - wrong, ? unknown

    Test case (tens)          0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
    Test case (units)         1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
    Pellet                    + + + - - - - - + + - - - - + - - - - + + - - - - + - - ? - - -
    HermiT                    + ? + - - ? - + + + - - - - + - - - - + + - - + ? + - - ? - - -
    FaCT++                    + ? ? ? ? ? ? - ? + - - - ? + ? - - - + + ? ? ? ? + - ? ? - - ?
    BigOWLIM                  + - - + - - + + - - + + - - + - - + + - - - - - - - - - - - - -
    Jena                      + - - - - + + + - - + - - - - - - + - - - - + - - + - - - - - +
    Parliament                + - - - - - - + - - ? - - - - - - - ? - - - - - - - - - - ? ? -
    Vampire / complete        + + + + + + + + + ? + ? ? + + + + + + ? ? ? + + ? + ? ? + + + +
    iProver-SInE / complete   + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
    Vampire / small           + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    iProver-SInE / small      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Notes:
    • OWL reasoners weak: < 30% success rate
    • DL vs RDF-rule reasoners: nearly no overlap for successful results
    • FOL reasoners: much better on complete axiom set; perfect on small axiom sets
  • 27.
    Experiment 2: Characteristic Conclusions
    Runtimes (sorted)
    Notes:
    • FOL reasoners often slow when using complete axiom set
    • Generally much faster with small axiom sets (up to several orders of magnitude)
  • 28.
  • 29.
    Experiment 3: Scalability
    Overview
    • Aim: analyse reasoning upon large data sets, when most data is not relevant for the reasoning result (simplest scenario for a start)
    • Method: use an existing reasoning test suite, but with large masses of "bulk" RDF data added to the premise graph, where the bulk data is semantically weak and unrelated to the test suite
    • Test Data:
       – Reasoning test suite: Characteristic Conclusions test suite
       – Bulk RDF data: 1 million triples, no RDF(S)/OWL vocabulary terms, no URIs shared with reasoning test cases
    • Reasoning Scenarios:
       – auto reasoning mode vs. SInE strategy
       – complete axiom set vs. small axiom sets
  • 30.
    Experiment 3: Scalability
    Example Bulk RDF Data Set

        ex:si1 ex:pi1 ex:oi1 .
        ex:si2 ex:pi2 ex:oi2 .
        ex:si3 ex:pi3 ex:oi3 .
        ex:si4 ex:pi4 ex:oi4 .
        ex:si5 ex:pi5 ex:oi5 .
        ex:ss ex:ps1 ex:os1 .
        ex:ss ex:ps2 ex:os2 .
        ex:ss ex:ps3 ex:os3 .
        ex:ss ex:ps4 ex:os4 .
        ex:ss ex:ps5 ex:os5 .
        ex:sp1 ex:pp ex:op1 .
        ex:sp2 ex:pp ex:op2 .
        ex:sp3 ex:pp ex:op3 .
        ex:sp4 ex:pp ex:op4 .
        ex:sp5 ex:pp ex:op5 .
        ex:ssp ex:psp ex:osp1 .
        ex:ssp ex:psp ex:osp2 .
        ex:ssp ex:psp ex:osp3 .
        ex:ssp ex:psp ex:osp4 .
        ex:ssp ex:psp ex:osp5 .

    This is an example bulk RDF data set consisting of 20 RDF triples. The data set has no names in common with any of the test cases being used in the evaluation, nor does the bulk data refer to any built-in terms of the OWL and RDF(S) vocabularies. There are no blank nodes, i.e., the bulk data consists entirely of a "ground" RDF graph. The bulk data sets being used in the evaluation have been much larger, still having the same basic format as the example set presented here.
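    Data of this shape is easy to generate programmatically. The sketch below reproduces the four triple patterns of the example; the function name is hypothetical, and the equal split between the four patterns is an assumption taken from the 20-triple example, not a statement about the data sets actually used in the evaluation.

```python
def bulk_triples(n):
    """Return 4*n ground triples (as text lines) in the four patterns of the example."""
    lines = []
    # Pattern 1: subject, predicate and object are all distinct per triple.
    lines += [f"ex:si{i} ex:pi{i} ex:oi{i} ." for i in range(1, n + 1)]
    # Pattern 2: one shared subject.
    lines += [f"ex:ss ex:ps{i} ex:os{i} ." for i in range(1, n + 1)]
    # Pattern 3: one shared predicate.
    lines += [f"ex:sp{i} ex:pp ex:op{i} ." for i in range(1, n + 1)]
    # Pattern 4: shared subject and shared predicate.
    lines += [f"ex:ssp ex:psp ex:osp{i} ." for i in range(1, n + 1)]
    return lines


example = bulk_triples(5)
print("\n".join(example))
```

    With n = 5 this yields the 20 triples shown above; under the equal-split assumption, n = 250000 would give 1 million triples.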
  • 31.
    Experiment 3: Scalability
    Results
    Legend: + success, - wrong, ? unknown

    Test case (tens)          0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
    Test case (units)         1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
    Vampire auto / complete   + + + ? ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
    Vampire SInE / complete   + + + + + + ? + ? ? + ? ? ? + + ? + + ? ? ? + ? ? + ? ? ? + ? +
    iProver-SInE / complete   + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
    Vampire auto / small      + ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
    Vampire SInE / small      + + + + + + ? + + + + + ? + + + ? + + ? ? + + + + + + ? + + ? +
    iProver-SInE / small      + + + + + + + + + + + + ? + + + + + + + + + + + + + + + + + + +

    Notes:
    • Vampire is very bad with "auto" strategy: times out in most cases
    • Improvement by using SInE strategy (iProver and Vampire) on complete axiom set
    • Major improvement by combining SInE strategy with removal of irrelevant OWL axioms (small axiom sets)
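    Why goal-directed axiom selection helps with unrelated bulk data can be illustrated with a toy relevance filter. This is a deliberately simplified sketch based on plain symbol reachability from the conjecture; it is not the SInE algorithm itself, whose trigger relation also takes into account how common each symbol is. All names are hypothetical.

```python
def select_relevant(axioms, goal_symbols, max_depth=3):
    """axioms maps an axiom name to the set of symbols it mentions.
    Returns the names of all axioms reachable from the goal symbols
    within max_depth rounds of symbol sharing."""
    selected = set()
    seen_symbols = set(goal_symbols)
    frontier = set(goal_symbols)
    for _ in range(max_depth):
        newly = {name for name, symbols in axioms.items()
                 if name not in selected and symbols & frontier}
        if not newly:
            break
        selected |= newly
        # Symbols introduced by the new axioms drive the next round.
        new_symbols = set().union(*(axioms[name] for name in newly)) - seen_symbols
        seen_symbols |= new_symbols
        frontier = new_symbols
    return selected


axioms = {
    "premise_1": {"ex:c1", "rdfs:subClassOf", "ex:c2"},
    "premise_2": {"ex:w", "rdf:type", "ex:c1"},
    "bulk_1": {"ex:si1", "ex:pi1", "ex:oi1"},
    "bulk_2": {"ex:ss", "ex:ps1", "ex:os1"},
}
goal = {"ex:w", "rdf:type", "ex:c2"}
selected = select_relevant(axioms, goal)
```

    Because the bulk triples share no names with the test case, none of them is ever reached from the goal, so only the two premise triples are selected.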
  • 32.
    Experiment 3: Scalability
    Runtimes (sorted)
    Notes:
    • General offset of ca. 20s for parsing large input data (ca. 55MB)
    • SInE strategy successful (Vampire mostly fails when using "auto" mode)
    • Further improvements by using small axiom sets
  • 33.
  • 34.
    Experiment 4: Model-Finding
    Overview
    • Aim: analyse ability to detect non-entailments and consistent ontologies
    • Method: use FOL model-finders on a test suite with consistent ontologies and non-entailments. Also use sub-axiom sets of the OWL 2 Full axiom set in order to see how well model-finding improves for smaller sublanguages of OWL 2 Full. For sub-axiom sets, some of the OWL 2 Full entailments and inconsistencies in a test suite will become non-entailments and consistent ontologies.
    • Axiom Sets:
       – OWL 2 Full
       – ALCO Full: undecidable sublanguage of OWL 2 Full [Motik 05]
       – RDFS-EXT: "extensional RDFS" [RDF Semantics, Sec. 4.2]
    • Test Data: Characteristic Conclusions test suite
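    The idea behind detecting a non-entailment by model-finding can be shown on a toy scale: search for a finite interpretation in which the premises hold and the conclusion fails. The sketch below does this by brute force for the hypothetical non-entailment "c1 is a subclass of c2 and w is in c2, therefore w is in c1". It treats classes directly as sets and is not how Paradox or DarwinFM operate on the FOL axiomatization.

```python
from itertools import product


def subsets(domain):
    """Yield every subset of the domain as a frozenset."""
    for bits in product([False, True], repeat=len(domain)):
        yield frozenset(x for x, keep in zip(domain, bits) if keep)


def find_countermodel(max_size=2):
    """Search domains of size 1..max_size for an interpretation where
    c1 <= c2 and w in c2 hold (premises) but w in c1 fails (conclusion)."""
    for size in range(1, max_size + 1):
        domain = range(size)
        for c1 in subsets(domain):
            for c2 in subsets(domain):
                for w in domain:
                    premises_hold = c1 <= c2 and w in c2
                    conclusion_holds = w in c1
                    if premises_hold and not conclusion_holds:
                        return {"domain": set(domain), "c1": set(c1),
                                "c2": set(c2), "w": w}
    return None  # no countermodel up to max_size


countermodel = find_countermodel()
```

    A one-element domain already suffices here: c1 is empty, c2 contains the single element, and w denotes that element.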
  • 35.
    Experiment 4: Model-Finding
    Results (Summary)
    • OWL 2 Full (unsuccessful!):
       – No FOL model-finder confirmed satisfiability of axiomatization (timeouts)
       – Fortunately: no theorem prover confirmed unsatisfiability
       – Good: all "small-sufficient" sub-axiomatizations of test cases satisfiable
    • ALCO Full:
       – Satisfiability checking for axiomatization successful
       – Checking non-entailment/consistency successful in 2/3 of the test cases
       – Runtimes: median ca. 18s with model-finder Paradox
    • RDFS:
       – Checking non-entailment/consistency always successful
       – Runtimes: ca. 1/10s for most experiments with model-finder DarwinFM
  • 36.
    Experiment 4: Model-Finding
    Results (Concrete)
    Legend: + success, - wrong, ? unknown; black cells in the original chart were not probed and are not shown here, so positions within a row do not map directly to test case numbers 01 to 32

    Paradox / ALCO Full       + + + + + + ? + + + + + ? ? ? ? + ? ? ? + + ? +
    Paradox / RDFS            + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    DarwinFM / RDFS           + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Notes:
    • black cells are still entailments or inconsistent ontologies: not probed!
    • 1st line: ALCO Full axiom set, using Paradox model-finder
      Result: 15 successful detections, 9 time-outs
    • 2nd/3rd line: RDFS-EXT axiom set, using Paradox/DarwinFM model-finders
      Result: always successful
  • 37.
  • 38.
    Summary
    • ATP-based OWL 2 Full reasoning works in principle:
       – Language Coverage: basically complete (skipped datatypes)
           • for a few test cases, it was necessary to select a small axiom set from the complete OWL 2 Full axiomatization sufficient to prove the result
       – Characteristic OWL 2 Full Conclusions: all, if using small axiom sets
       – Performance: often quick (< 1/10s), if using small axiom sets
       – Scalability: works for semantically weak and unrelated "bulk" data
       – Model-Finding: works for certain fragments of OWL 2 Full
    • Identified Problems (motivate future work):
       – slow or even dysfunctional on complete axiomatization (> 500 axioms)
       – no successful model-finding for complete OWL 2 Full axiomatization
  • 39.
    Future Work
    • develop automated method for selecting small axiom sets
    • conduct more realistic scalability experiments
    • investigate query answering with FOL ATPs
    • add support for datatype reasoning
    • try to manually find a model for the OWL 2 Full axiomatization
    • implement a prototypical OWL 2 Full reasoner
  • 40.
    Links
    • Conference Paper:
      http://dx.doi.org/10.1007/978-3-642-22438-6_35
    • Extended Version of Paper (detailed results, "Characteristic Conclusions" test suite):
      http://arxiv.org/abs/1108.0155
    • Supplementary Material (all axiom sets, test data, raw results, software):
      http://www.fzi.de/downloads/ipe/schneid/cade2011-schneidsut-owlfullatp.zip