cade23-schneidsut-atp4owlfull-2011

Reasoning in the OWL 2 Full Ontology Language
using First-Order Automated Theorem Proving
Michael Schneider
FZI Research Center for Information Technology, Germany

Geoff Sutcliffe
University of Miami, USA

23rd International Conference on Automated Deduction (CADE 23)
Wrocław, Poland, August 2011

Introduction
• Context: Semantic Web, OWL, First-Order Logic, ATP

• Focus: W3C ontology language OWL 2 Full
– highly expressive ontology formalism
– no reasoner as of today

• Aims: find out …
– … to what extent practical OWL 2 Full reasoning can be
implemented with off-the-shelf FOL reasoning technology
– … whether FOL-based OWL 2 Full reasoning can provide
added value over today's state-of-the-art OWL reasoners

Semantic Web
• Semantic Web (SW): extends WWW by machine-processible,
interlinked resource descriptions and vocabularies
– Resources (everything one can talk about):
• identified by URIs
• described by property-value pairs
• classified by classes
• related to other resources via properties
[Figure: example graph with resources :Harry and :Larry, class :Eagle, and properties age (value 3), father, and sex (value m)]
– Vocabularies: define classes and properties and their (formal) semantics,
e.g. that class Eagle is a subclass of class Animal, so Harry becomes an Animal

• RDF ("Resource Description Framework"): language to define
"graphs" of interlinked resource descriptions

• OWL ("Web Ontology Language"): language to define vocabularies

OWL Flavours
• OWL 2: family of ontology languages for the Semantic Web
– W3C Recommendation (2009)
– version 2 is a revised and extended version of OWL (2004)

• Two major "flavours" of OWL: OWL 2 DL and OWL 2 Full
– OWL 2 DL: basically a description logic (SROIQ[D]) adjusted to SW needs
– OWL 2 Full: similar to OWL 2 DL, but directly designed for an RDF-based SW

• Some observable distinctive features of OWL 2 Full:
– can be applied to weakly-structured SW data (LOD) and RDFS vocabularies
– no restrictions on use of OWL constructs (e.g. asymmetric transitive properties)
– support for (semantic) metamodeling (Harry, the Eagle, has meta-class Species)

• Theoretical issue: OWL 2 Full is undecidable (practical problem?)

OWL 2 Full Semantics
• Specification: OWL 2 Full semantics is specified via a set of
model-theoretic "semantic conditions"
OWL 2 RDF-Based Semantics:
http://www.w3.org/TR/owl2-rdf-based-semantics/

• Core Observation:
– all semantic conditions have the form of standard first-order logic formulae
→ OWL 2 Full semantics is essentially a FOL theory!
– all input RDF graphs are representable as FOL formulae
– hence: OWL 2 Full reasoning is implementable in terms of FOL reasoning!
• Question: How well does it work in practice?
• Prior Art: very little research based on this observation so far;
none for OWL 2 Full

OWL 2 Full Semantic Conditions
• Typical format of semantic conditions:
if a certain semantic relationship holds,
then another associated relationship also holds
(Note: many semantic conditions are in fact if-and-only-if conditions)
• Example (1st semantic condition in Table 5.8 of OWL 2 RDF-Based Semantics):
if two individuals c1 and c2 are related by the denotation of URI rdfs:subClassOf,
then c1 and c2 are classes (members of set IC) and ICEXT(c1), the class extension
of c1, is a subset of ICEXT(c2), the class extension of c2
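
Written as a formula, the example condition above reads roughly as follows. This is a sketch of the "if ... then" direction exactly as stated on the slide; the symbols IEXT (property extension) and I (interpretation of a URI) follow the notation of the OWL 2 RDF-Based Semantics document and do not appear on the slide itself, so treat them as an assumption here.

```latex
\forall c_1, c_2 :\quad
\langle c_1, c_2 \rangle \in \mathrm{IEXT}\bigl(I(\texttt{rdfs:subClassOf})\bigr)
\;\Rightarrow\;
c_1 \in \mathrm{IC} \;\wedge\; c_2 \in \mathrm{IC} \;\wedge\;
\mathrm{ICEXT}(c_1) \subseteq \mathrm{ICEXT}(c_2)
```

The subset relation can itself be unfolded into a universally quantified implication over members of the two class extensions, which is what makes the condition a plain first-order formula.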

Prior Art in OWL Full Reasoning
• Fikes, McGuinness, Waldinger: A First-Order Logic Semantics
for Semantic Web Markup Languages. TR, Stanford, 2002.
– translation of specifications of precursors of OWL and RDF into first-order
logic (FOL) theory, and application of FOL reasoners
– focus: checking for technical issues in specifications (less on inferencing)
• Hayes: Translating Semantic Web Languages into Common Logic.
TR, Pensacola (Florida), 2005.
– translation of OWL 1 Full into Common Logic (basically a variant of FOL)
– no report on reasoning experiments
• Hawke: Surnia. 2003. URL: http://www.w3.org/2003/08/surnia
– OWL 1 Full reasoner based on FOL translation using Otter FOL reasoner
– did not perform well on W3C OWL 1 test suite
– ad hoc implementation: does not properly follow specification; many flaws

Translation into FOL:
General Process
OWL 2 Full Entailment Checking using FOL Reasoners and concrete FOL syntax TPTP:

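[Diagram: the OWL 2 Full semantic conditions are translated into a TPTP axiom set, the premise RDF graph into a TPTP axiom, and the conclusion RDF graph into a TPTP conjecture. These FOL inputs are passed to a theorem prover and a model finder, which answer theorem, counter-sat, or unknown.]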
1. Input:
• translate semantic conditions into set of TPTP axioms
• translate premise RDF graph into TPTP axiom
• translate conclusion RDF graph into TPTP conjecture
2. Reasoning: feed all TPTP formulae into FOL reasoners (parallel execution):
• FOL theorem provers: used to detect positive entailments
• FOL model-finders: used to detect non-entailments
3. Output: integrate results from FOL reasoners into single result
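
Step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the status strings "theorem", "counter-sat" and "unknown" are taken from the process diagram, while the function name `integrate` and the verdict labels "entailment" and "non-entailment" are hypothetical names chosen for this sketch.

```python
def integrate(prover_status, finder_status):
    """Combine a theorem prover status and a model finder status into one verdict.

    prover_status: "theorem" or "unknown" (prover run on axioms + conjecture)
    finder_status: "counter-sat" or "unknown" (model finder run on the same input)
    """
    proved = prover_status == "theorem"
    refuted = finder_status == "counter-sat"
    if proved and refuted:
        # Both cannot be right: one tool or the translation must be faulty.
        raise ValueError("contradictory results from prover and model finder")
    if proved:
        return "entailment"
    if refuted:
        return "non-entailment"
    return "unknown"


print(integrate("theorem", "unknown"))      # prover succeeded
print(integrate("unknown", "counter-sat"))  # model finder succeeded
print(integrate("unknown", "unknown"))      # neither tool decided in time
```

Only one of the two tools is expected to give a decisive answer for a given problem, which is why they can be run side by side on identical input.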

Semantic Conditions
[Figure: a model-theoretic OWL 2 Full semantic condition (Table 5.8) and the corresponding FOL formula (TPTP)]

RDF Graphs
[Figure: an RDF graph (Turtle) and the corresponding FOL formula (TPTP)]
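
The translation of an RDF graph into a single TPTP formula can be sketched as below. The encoding is an assumption made for illustration: each triple "s p o" becomes an atom `iext(p,s,o)`, names become constants with a `uri_` prefix, and blank nodes become existentially quantified variables. The predicate name `iext`, the prefix, and the helper names `term` and `graph_to_tptp` are hypothetical and are not taken from the slides.

```python
import re


def term(node, bnode_vars):
    """Map an RDF node to a TPTP term.

    Blank nodes ("_:x") become variables (upper-case initial),
    all other names become constants (lower-case initial).
    """
    if node.startswith("_:"):
        return bnode_vars.setdefault(node, "B" + re.sub(r"\W", "_", node[2:]))
    return "uri_" + re.sub(r"\W", "_", node)


def graph_to_tptp(name, role, triples):
    """Render a list of (subject, predicate, object) triples as one fof formula."""
    bnode_vars = {}
    atoms = []
    for s, p, o in triples:
        atoms.append(
            f"iext({term(p, bnode_vars)},{term(s, bnode_vars)},{term(o, bnode_vars)})"
        )
    body = "(" + " & ".join(atoms) + ")"
    if bnode_vars:
        # Blank nodes are read as existentially quantified variables.
        body = f"? [{','.join(bnode_vars.values())}] : {body}"
    return f"fof({name},{role},{body})."


premise = [("ex:c1", "rdfs:subClassOf", "ex:c2"), ("ex:w", "rdf:type", "ex:c1")]
conclusion = [("ex:harry", "rdf:type", "_:x"), ("_:x", "rdf:type", "ex:Species")]
print(graph_to_tptp("premise", "axiom", premise))
print(graph_to_tptp("conclusion", "conjecture", conclusion))
```

The name mangling here is deliberately naive (distinct names such as `ex:a-b` and `ex:a_b` would collide), so a real converter would need a collision-free mapping.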

Evaluation Setting:
TPTP-Encoding
• FOL Axiomatization:
– translated most normative semantic conditions of OWL 2 Full
– excluded: datatype reasoning-related semantics
– size of complete axiomatization: 558 FOL formulae
Syntax Statistics
Number of formulae: 558 ( 196 unit )
Number of atoms: 1772 ( 90 equality )
Maximal formula depth: 27 ( 5 average )
Number of connectives: 1350 ( 136 '~' ; 35 '|' ; 758 '&' ; 126 '<=>' ; 295 '=>' )
Number of predicates: 13 ( 1 propositional ; 0-3 arity )
Number of functors: 157 ( 156 constant ; 0-2 arity )
Number of variables: 973 ( 0 sgn ; 911 '!' ; 62 '?' )
Maximal term depth: 2 ( 1 average )

• RDF Graph Conversion:
– implemented simple RDF-to-TPTP converter tool

Evaluation Setting:
Experiments

1. Language Coverage
completeness w.r.t. OWL 2 Full specification
2. Characteristic OWL 2 Full Conclusions
semantic capabilities beyond OWL 2 DL or OWL 2 RL/RDF Rules
3. Scalability
reasoning upon large data sets
4. Model Finding
detecting consistent ontologies and non-entailments

Evaluation Setting:
Reasoners
• FOL Theorem Provers:
– Vampire 0.6 (using two modes: "auto", and with SInE strategy)
– iProver-SInE 0.8 (iProver with SInE strategy and strategy scheduling)
• FOL Model-Finders:
– Paradox 4.0 (finite model finder)
– DarwinFM 1.4.5 (finite model finder)
• OWL Reasoners:
– Pellet 2.2.2 (tableaux-based OWL 2 DL reasoner)
– HermiT 1.3.2 (tableaux-based OWL 2 DL reasoner)
– FaCT++ 1.5.0 (tableaux-based OWL 2 DL reasoner)
– BigOWLIM 3.4, using "owl-rl" ruleset (RDF entailment-rule reasoner)
– Jena 2.6.4, using OWL_MEM_RULE_INF spec (RDF framework with rule engine)
– Parliament 2.6.9 (reasoning-enabled RDF triple store)

Evaluation Setting:
Environment

• Computers:
– CPU: Intel Pentium 4, 2.8 GHz
– Memory: 2 GB
– Operating System: Linux FC8
• Max. CPU time per run: 300 s

EVALUATION RESULTS
1. LANGUAGE COVERAGE

Experiment 1: Language Coverage
Overview
• Aim: analyse completeness w.r.t. OWL 2 Full specification
• Method: check that all parts of OWL 2 Full semantics specification
are covered (except for datatype reasoning)
• Test Data: dedicated OWL 2 Full coverage test suite targeted to
specification level (Schneider & Mainzer, 2009):
– at least one test case for each OWL 2 Full semantic condition
– each test case focuses as much as possible on targeted semantic condition
– generally easy to solve, hence failure indicates flaw or lack of coverage
• Adjustments:
– removal of datatype-reasoning related test cases (currently unsupported)
– only using entailment and inconsistency test cases
– size of remaining test suite: 411 test cases

Example Test Case
Test case for probing coverage of the RDFS semantic condition for class subsumption:
This positive entailment ('p') test case applies to RDFS ('rdfs'), the OWL 2 RL/RDF Rules
('owl2rl'), and all common semantic extensions, including OWL 2 Full.
The upper RDF graph is the premise graph, the lower RDF graph is the conclusion graph.

TESTCASE rdfbased-sem-rdfs-subclass-cond
p rdfs owl2rl
The extensions of two classes related by rdfs:subClassOf
are in a subsumption relationship.
+
ex:c1 rdfs:subClassOf ex:c2 .
ex:w rdf:type ex:c1 .
+
ex:w rdf:type ex:c2 .
+
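
Why the conclusion graph follows from the premise graph can be seen with a small rule-based sketch. It applies only the "if" direction of the subclass condition to ground triples, in the style of an entailment rule; it is meant as an illustration of this one test case, not as a model of how the FOL provers work. The function name `apply_subclass_rule` is hypothetical.

```python
def apply_subclass_rule(triples):
    """Close a set of triples under: (c1 subClassOf c2), (w type c1) => (w type c2)."""
    graph = set(triples)
    while True:
        derived = {
            (w, "rdf:type", c2)
            for (c1, p, c2) in graph if p == "rdfs:subClassOf"
            for (w, q, c) in graph if q == "rdf:type" and c == c1
        }
        if derived <= graph:   # nothing new: fixpoint reached
            return graph
        graph |= derived


premise = {
    ("ex:c1", "rdfs:subClassOf", "ex:c2"),
    ("ex:w", "rdf:type", "ex:c1"),
}
conclusion = {("ex:w", "rdf:type", "ex:c2")}

closure = apply_subclass_rule(premise)
print(conclusion <= closure)  # True: every conclusion triple was derived
```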

Results
Reasoner                             Success  Wrong  Unknown
Pellet                                   237    168        6
HermiT                                   246    157        8
FaCT++                                   190     45      176
BigOWLIM                                 282    129        0
Jena                                     129    282        0
Parliament                                14    373       24
Vampire, OWL 2 Full axioms               349      0       62
iProver-SInE, OWL 2 Full axioms          383      0       28
iProver-SInE with small axiom sets       396      0       15
(numbers of test cases out of 411; the original bar chart shows them as percentages from 0% to 100%)
Notes:
•All DL reasoners show similar results (although FaCT++ signals many more errors)
•Results of BigOWLIM and DL reasoners are very different (BigOWLIM not "better")
•"small axiom set": a manually selected subset of axioms from the complete
OWL 2 Full axiom set that is small but sufficient to succeed on the given test case

Runtimes (sorted)
For each reasoner: runtimes for all test cases are sorted increasingly
(all runtimes are for the complete OWL 2 Full axiom set; small axiom sets are ignored)

Notes:
• Most problems solved in less than 1s
• Vampire solves slightly fewer problems, but is generally faster
→ suggests strategy to run both reasoners in parallel
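
A parallel strategy of this kind could look roughly like the following sketch. The reasoners are represented by plain Python callables that return a status string; how Vampire or iProver would actually be invoked is not shown and is left as an assumption. The name `run_portfolio` and the dummy provers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_portfolio(provers, problem):
    """Run all prover callables on the problem and return the first decisive answer.

    provers: dict mapping a name to a callable(problem) -> status string.
    Returns (name, status), or (None, "unknown") if no prover decides.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(provers))) as pool:
        futures = {pool.submit(fn, problem): name for name, fn in provers.items()}
        for future in as_completed(futures):
            status = future.result()
            if status != "unknown":
                # Note: leaving the "with" block still waits for the other
                # workers; a real harness would terminate the slower process.
                return futures[future], status
    return None, "unknown"


# Dummy stand-ins for real reasoner invocations.
provers = {
    "prover_a": lambda problem: "unknown",
    "prover_b": lambda problem: "theorem",
}
print(run_portfolio(provers, "fof(c,conjecture,(p => p))."))
```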

EVALUATION RESULTS:
2. CHARACTERISTIC CONCLUSIONS

Experiment 2: Characteristic Conclusions
Overview
• Aim: analyse ability to infer semantic conclusions that are
characteristic for OWL 2 Full (beyond OWL 2 DL or OWL 2 RL/RDF)
• Test Data: new "Characteristic Conclusions" test suite
– 32 test cases (manually created)
– probes many distinctive features of OWL 2 Full, including:
• strong logic-based reasoning
• unrestricted use of complex properties
• blank nodes as existentially quantified variables
• metamodeling
• use of data values as individuals
• semantic annotation properties
• reflective use of built-in vocabulary terms
– Differences to Language Coverage test suite:
• focus is on "emergent behaviour" of OWL 2 Full rather than on technical specification
• most test cases depend on interplay of several OWL 2 Full semantic conditions
• results often technically non-obvious (proof needed)

Example Test Case
Test case for probing metamodeling with Boolean logic reasoning and blank node semantics:
This positive entailment test case applies to OWL 2 Full, but neither to OWL 2 DL (requires
reasoning based on metamodeling) nor to OWL 2 RL/RDF (requires strong support for class
union and existential blank node semantics).

Test Case: 014_Harry_belongs_to_some_Species
Premise Graph:
ex:Eagle rdf:type ex:Species .
ex:Falcon rdf:type ex:Species .
ex:harry rdf:type [
owl:unionOf ( ex:Eagle ex:Falcon )
] .

Conclusion Graph:
ex:harry rdf:type _:x .
_:x rdf:type ex:Species .

Results
+ success
- wrong
? unknown
Test case (tens digit)   0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
Test case (units digit)  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
Pellet                   + + + - - - - - + + - - - - + - - - - + + - - - - + - - ? - - -
HermiT                   + ? + - - ? - + + + - - - - + - - - - + + - - + ? + - - ? - - -
FaCT++                   + ? ? ? ? ? ? - ? + - - - ? + ? - - - + + ? ? ? ? + - ? ? - - ?
BigOWLIM                 + - - + - - + + - - + + - - + - - + + - - - - - - - - - - - - -
Jena                     + - - - - + + + - - + - - - - - - + - - - - + - - + - - - - - +
Parliament               + - - - - - - + - - ? - - - - - - - ? - - - - - - - - - - ? ? -

Vampire / complete       + + + + + + + + + ? + ? ? + + + + + + ? ? ? + + ? + ? ? + + + +
iProver-SInE / complete  + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
Vampire / small          + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
iProver-SInE / small     + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Notes:
•OWL reasoners weak: < 30% success rate
•DL vs RDF-rule reasoners: nearly no overlap for successful results
•FOL reasoners: much better on complete axiom set; perfect on small axiom sets

Runtimes (sorted)

Notes:
• FOL reasoners often slow when using complete axiom set
• Generally much faster with small axiom sets (by up to several orders of magnitude)


EVALUATION RESULTS:
3. SCALABILITY

Experiment 3: Scalability
Overview
• Aim: analyse reasoning upon large data sets, when most data is
not relevant for the reasoning result (simplest scenario for a start)
• Method: using existing reasoning test suite, but with large masses
of "bulk" RDF data added to premise graph, where the bulk data is
semantically weak and unrelated to the test suite
• Test Data:
– Reasoning test suite: Characteristic Conclusions test suite
– Bulk RDF data: 1 Million triples, no RDF(S)/OWL vocabulary terms, no URIs
shared with reasoning test cases
• Reasoning Scenarios:
– auto reasoning mode vs. SInE strategy
– complete axiom set vs. small axiom sets

Example Bulk RDF Data Set
This is an example bulk RDF data set consisting of 20 RDF triples. The data
set has no names in common with any of the test cases being used in the
evaluation, nor does the bulk data refer to any built-in terms of the OWL and
RDF(S) vocabularies. There are no blank nodes, i.e., the bulk data consists
entirely of a "ground" RDF graph. The bulk data sets being used in the
evaluation have been much larger, still having the same basic format as the
example set presented here.

ex:si1 ex:pi1 ex:oi1 .
...
ex:si4 ex:pi4 ex:oi4 .
ex:si5 ex:pi5 ex:oi5 .
ex:ss ex:ps1 ex:os1 .
ex:ss ex:ps2 ex:os2 .
...
ex:sp1 ex:pp ex:op1 .
ex:sp2 ex:pp ex:op2 .
ex:sp3 ex:pp ex:op3 .
ex:sp4 ex:pp ex:op4 .
ex:sp5 ex:pp ex:op5 .
ex:ssp ex:psp ex:osp1 .
...
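
Bulk data of this shape is easy to generate. The sketch below is a guess at the pattern based only on the visible triples: four groups (all names unique, shared subject, shared predicate, shared subject and predicate), each with the same number of triples. The actual generator used in the evaluation is not described on the slides, and the function names `bulk_triples` and `to_turtle` are hypothetical.

```python
def bulk_triples(k):
    """Yield 4*k ground triples in the four shapes visible in the example."""
    for i in range(1, k + 1):
        yield (f"ex:si{i}", f"ex:pi{i}", f"ex:oi{i}")  # all three names unique
    for i in range(1, k + 1):
        yield ("ex:ss", f"ex:ps{i}", f"ex:os{i}")      # shared subject
    for i in range(1, k + 1):
        yield (f"ex:sp{i}", "ex:pp", f"ex:op{i}")      # shared predicate
    for i in range(1, k + 1):
        yield ("ex:ssp", "ex:psp", f"ex:osp{i}")       # shared subject and predicate


def to_turtle(triples):
    """Serialise triples one per line in the simple 's p o .' form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


small = list(bulk_triples(5))
print(len(small))            # 4 groups of 5 triples
print(to_turtle(small[:2]))
```

Under this assumed pattern, `bulk_triples(250000)` would yield one million triples, none of which mention RDF(S) or OWL vocabulary terms or blank nodes.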

Results
+ success
- wrong
? unknown
Test case (tens digit)   0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
Test case (units digit)  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
Vampire auto / complete  + + + ? ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Vampire SInE / complete  + + + + + + ? + ? ? + ? ? ? + + ? + + ? ? ? + ? ? + ? ? ? + ? +
iProver-SInE / complete  + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
Vampire auto / small     + ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Vampire SInE / small     + + + + + + ? + + + + + ? + + + ? + + ? ? + + + + + + ? + + ? +
iProver-SInE / small     + + + + + + + + + + + + ? + + + + + + + + + + + + + + + + + + +

Notes:
• Vampire is very bad with "auto" strategy: times out in most cases
• Improvement by using SInE strategy (iProver and Vampire) on complete axiom set
• Major improvement by combining SInE strategy with removal of irrelevant OWL axioms
(small axiom sets)
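
The idea of discarding axioms that cannot matter for a given conjecture can be illustrated with a crude symbol-based filter. This sketch is not the SInE algorithm (which uses a more refined, frequency-based notion of relevance) and it is not how the small axiom sets in this evaluation were obtained (they were selected manually). It assumes each axiom is summarised by the set of constant symbols it mentions; the names `select_axioms` and the toy axioms are hypothetical.

```python
def select_axioms(axioms, goal_symbols, depth=2):
    """Pick axioms reachable from the goal's symbols within `depth` rounds.

    axioms: dict mapping an axiom name to the set of symbols it mentions.
    goal_symbols: symbols occurring in the conjecture.
    """
    relevant = set(goal_symbols)
    selected = set()
    for _ in range(depth):
        new = {
            name for name, symbols in axioms.items()
            if name not in selected and symbols & relevant
        }
        if not new:
            break
        selected |= new
        for name in new:
            relevant |= axioms[name]   # symbols of selected axioms become relevant
    return selected


axioms = {
    "subclass_cond": {"rdfs:subClassOf", "rdfs:Class"},
    "class_cond": {"rdfs:Class", "rdf:type"},
    "bulk_1": {"ex:si1", "ex:pi1", "ex:oi1"},
}
print(sorted(select_axioms(axioms, {"rdfs:subClassOf"}, depth=1)))
print(sorted(select_axioms(axioms, {"rdfs:subClassOf"}, depth=2)))
```

Because bulk triples share no symbols with the test cases or the OWL vocabulary, a filter of this kind never selects them, which matches the intuition behind the results above.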

Runtimes (sorted)

Notes:
• General offset of ca. 20s for parsing large input data (ca. 55MB)
• SInE strategy successful (Vampire mostly fails when using "auto" mode)
• further improvements by using small axiom sets

EVALUATION RESULTS:
4. MODEL FINDING

Experiment 4: Model-Finding
Overview
• Aim: analyse ability to detect non-entailments and consistent
ontologies
• Method: Using FOL model-finders on test suite with consistent
ontologies and non-entailments. Also using sub-axiom sets of
OWL 2 Full axiom set in order to see how well model-finding
improves for smaller sublanguages of OWL 2 Full. For sub-axiom
sets, some of the OWL 2 Full entailments and inconsistencies in a
test suite will become non-entailments and consistent ontologies.
• Axiom Sets:
– OWL 2 Full
– ALCO Full: undecidable sublanguage of OWL 2 Full [Motik 05]
– RDFS-EXT: "extensional RDFS" [RDF Semantics, Sec. 4.2]
• Test Data: Characteristic Conclusions test suite

Results (Summary)
• OWL 2 Full (unsuccessful!):
– No FOL model-finder confirmed satisfiability of axiomatization (timeouts)
– Fortunately: no theorem prover confirmed unsatisfiability
– Good: all "small-sufficient" sub-axiomatizations of test cases satisfiable
• ALCO Full:
– Satisfiability checking for axiomatization successful
– Checking non-entailment/consistency successful in 2/3 of the test cases
– Runtimes: median ca. 18s with model-finder Paradox
• RDFS:
– Checking non-entailment/consistency always successful
– Runtimes: ca. 1/10s for most experiments with model-finder DarwinFM


Results (Concrete)
+ success
- wrong
? unknown
(black cell) not probed
[Table: per-test-case results for test cases 01 to 32. Not-probed test cases are black cells in the original, so each row below lists only the probed test cases, in test case order.]
Paradox / ALCO Full  + + + + + + ? + + + + + ? ? ? ? + ? ? ? + + ? +
Paradox / RDFS       + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DarwinFM / RDFS      + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Notes:
• black cells are still entailments or inconsistent ontologies: not probed!
• 1st line: ALCO Full axiom set, using Paradox model-finder
Result: 15 successful detections, 9 time-outs
• 2nd/3rd line: RDFS-EXT axiom set, using Paradox/DarwinFM model-finders
Result: always successful

Summary
• Using ATP-based OWL 2 Full reasoning works in principle:
– Language Coverage: basically complete (skipped datatypes)
• for a few test cases, it was necessary to select a small axiom set from the
complete OWL 2 Full axiomatization sufficient to prove the result
– Characteristic OWL 2 Full Conclusions: all, if using small axiom sets
– Performance: often quick (< 1/10s), if using small axiom sets
– Scalability: works for semantically weak and unrelated "bulk" data
– Model-Finding: works for certain fragments of OWL 2 Full

• Identified Problems (motivates future work):
– slow or even dysfunctional on complete axiomatization (> 500 axioms)
– no successful model-finding for complete OWL 2 Full axiomatization

Future Work
• develop automated method for selecting small axiom sets
• conduct more realistic scalability experiments
• investigate query answering with FOL ATPs
• add support for datatype reasoning
• try to manually find a model for the OWL 2 Full axiomatization
• implement a prototypical OWL 2 Full reasoner

Links

• Conference Paper:
http://dx.doi.org/10.1007/978-3-642-22438-6_35

• Extended Version of Paper
(detailed results, "Characteristic Conclusions" test suite)
http://arxiv.org/abs/1108.0155

• Supplementary Material
(all axiom sets, test data, raw results, software):
http://www.fzi.de/downloads/ipe/schneid/cade2011-schneidsut-owlfullatp.zip
