RR2010 Keynote
1. Data Validation with OWL Integrity Constraints
Evren Sirin, CTO
Clark & Parsia, LLC
evren@clarkparsia.com
Wednesday, September 22, 2010
2. Who are we?
• Clark & Parsia is a semantic software startup
– HQ in Washington, DC & office in Boston
• Provides software development and integration
services
• Specializing in Semantic Web, web services, and
advanced AI technologies for federal and
enterprise customers
http://clarkparsia.com/
Twitter: @candp
3. Overview
• Data validation with OWL
– Representation and validation of integrity constraints
• Use cases
– Examples, issues, workarounds
• OWL Integrity Constraints
– Syntax, semantics, validation
• Comparison with other approaches
– Epistemic DLs, Epistemic QLs, Rules
• Implementation and performance
4. Some Applications
• Customer and product data
– Find which customer would be interested in buying a
certain product
• System and component descriptions
– Configure components to build a desired system
• Workforce and employee data
– Locate employees with desired expertise
• Patient history and drug data
– Detect and prevent potentially harmful drug interactions
5. Common Theme
• There is data and lots of it!
• Adding semantics to the data helps a lot
– Sometimes simple taxonomies, but other times,
complex ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
– Failures in applications, errors in decision making,
potential loss of revenue, security vulnerabilities, etc.
6. Data Validation
• Fundamental data management problem
– Verify data integrity and correctness
– Enforce validity of updates
• Relevant in many scenarios
– Storing data for stand-alone applications
– Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
– Harder to achieve as data semantics increase and/or
more expressive integrity conditions are required
7. Disclaimer
• Data validity is not important for every use case
– Invalid data may be fine for an application
– Invalidity may even be a requirement
• Focus of this talk is cases where data consistency
and integrity are crucial
8. Building Semantic Apps
• Represent data as RDF triples
– First step for accomplishing data integration and
analysis
• Enrich data with more semantics (RDFS, OWL)
– Infer implicit information from explicit assertions
• Ensure data validity
– Detect errors in the data
• Do something cool with the data
– Obviously...
11. Reasoning Example
• Input ontology
# Schema: every supervisor is an employee
Supervisor subClassOf Employee
# Instance data: Person085 is a supervisor
Person085 type Supervisor
• Output inferences
# Person085 is an employee
Person085 type Employee
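The inference on this slide can be reproduced with a few lines of forward chaining. The following is a minimal sketch, assuming triples are plain Python tuples and that only the subclass rule is applied; the function name `infer_types` is hypothetical and this is not how a full OWL reasoner such as Pellet works internally.

```python
def infer_types(triples):
    """Apply the subclass rule until nothing new is derived:
    (x type C) and (C subClassOf D) give (x type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == "subClassOf"]
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, "type", sup) not in inferred:
                    inferred.add((s, "type", sup))
                    changed = True
    # Return only the statements that were not asserted explicitly.
    return inferred - set(triples)


schema = {("Supervisor", "subClassOf", "Employee")}
data = {("Person085", "type", "Supervisor")}
new_facts = infer_types(schema | data)
print(new_facts)
```

The returned set contains only the derived statement, which mirrors the split between explicit assertions and output inferences on the slide.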
12. Validating RDF Data
• Common misunderstanding
– RDFS/OWL is to RDF what XML Schema is to XML
– Describe integrity conditions in RDFS or OWL
• Typing constraints - RDFS domain/range
• Participation constraints - OWL some values restrictions
• Uniqueness constraints - OWL cardinality restriction
– Use a reasoner to find inconsistencies
• Problem: Open World Assumption
13. Closed vs. Open World
• Two different views on truth:
– CWA: Any statement that is not known to be true is false
– OWA: A statement is false only if it is known to be false
• Used in different contexts
– Databases use CWA because (typically) they contain
complete information
– Ontologies use OWA because (typically) they don't...
that is, they contain incomplete information
• Data validation results differ significantly when
using CWA instead of OWA
14. Typing Constraint
• Only supervisors can supervise employees
• Input ontology
o supervises domain Supervisor
o Person085 supervises Person173
             OWA                         CWA
Consistent   true                        false
Reason       Infer that                  Assume that
             Person085 type Supervisor   Person085 type not Supervisor
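The two readings of the domain axiom can be contrasted in code. This is a rough sketch, assuming triples are Python tuples and that the data contains no other axioms that could entail the type; both function names are hypothetical.

```python
def owa_domain_inferences(triples, prop, cls):
    """Open-world reading: every subject of `prop` is inferred to be a `cls`."""
    return {(s, "type", cls) for s, p, o in triples if p == prop} - set(triples)


def cwa_domain_violations(triples, prop, cls):
    """Closed-world reading: a subject of `prop` that is not known
    to be a `cls` violates the constraint."""
    return {
        s for s, p, o in triples
        if p == prop and (s, "type", cls) not in triples
    }


data = {("Person085", "supervises", "Person173")}
inferred = owa_domain_inferences(data, "supervises", "Supervisor")
violations = cwa_domain_violations(data, "supervises", "Supervisor")
print(inferred)    # what a reasoner would add under OWA
print(violations)  # what a validator would report under CWA
```

The same input therefore yields a new fact under one reading and an error under the other, which is the contrast shown in the table.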
15. Participation Constraint
• Each supervisor must supervise at least
one employee
• Input axioms
o Supervisor subClassOf supervises some Employee
o Person085 type Supervisor
             OWA                        CWA
Consistent   true                       false
Reason       Infer that                 Assume that
             Person085 supervises _:b   Person085 supervises _:b
             _:b type Employee          does not exist
16. Uniqueness Constraint
• Employees can have at most one supervisor
• Input axioms
o supervises InverseFunctional
o Person085 supervises Person173
o Person632 supervises Person173
             OWA                          CWA
Consistent   true                         false
Reason       Infer that                   Assume that
             Person085 sameAs Person632   Person085 sameAs Person632
                                          does not hold
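A small sketch of the same contrast for the uniqueness constraint. It assumes tuple triples, treats two names as different unless an explicit `sameAs` triple links them, and ignores the transitivity of `sameAs`; the function names are hypothetical.

```python
from collections import defaultdict


def subjects_by_object(triples, prop):
    """Group the subjects of `prop` by the object they point to."""
    grouped = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            grouped[o].add(s)
    return grouped


def owa_same_as(triples, prop):
    """Open-world reading of an inverse-functional property:
    subjects that share an object are inferred to be the same individual."""
    grouped = subjects_by_object(triples, prop)
    return {frozenset(subs) for subs in grouped.values() if len(subs) > 1}


def cwa_uniqueness_violations(triples, prop):
    """Closed-world reading: two subjects sharing an object are a violation
    unless they are explicitly stated to be the same."""
    same = {frozenset((s, o)) for s, p, o in triples if p == "sameAs"}
    violations = set()
    for obj, subs in subjects_by_object(triples, prop).items():
        for a in subs:
            for b in subs:
                if a < b and frozenset((a, b)) not in same:
                    violations.add((obj, a, b))
    return violations


data = {
    ("Person085", "supervises", "Person173"),
    ("Person632", "supervises", "Person173"),
}
merged = owa_same_as(data, "supervises")
violations = cwa_uniqueness_violations(data, "supervises")
print(merged)
print(violations)
```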
17. Workarounds for CW
• Manually close the world
– Declare all individuals different from each other
– Count existing property values and add a max
cardinality restriction
– Make all disjointness statements explicit and add
negated types to individuals
• Drawbacks
– Can be computationally expensive
– Likely to be error-prone
18. Problem Summary
• Definitions in an OWL schema may have two
purposes
– Infer new statements
– Check if existing statements are valid
• Using OWA for validation is undesirable
– Not always but in many cases
• In a problem domain we may have:
– Complete knowledge about some parts of the domain
– Incomplete knowledge about the other parts
19. Integrity Constraint Solution
• We defined an alternative semantics for OWL
– Integrity Constraint (IC) semantics use CWA
– Can be combined with regular inference axioms
• Ontology developer chooses which axioms will
be interpreted with...
– OWA - regular OWL axiom, or
– CWA - integrity constraint
20. IC Extension
• Syntax specification
– How do we syntactically say an axiom is an IC and
not a regular OWL axiom?
• Semantics specification
– How do we exactly interpret an IC?
• Validation algorithm
– Given the semantics how do we check for IC
violations?
21. IC Syntax
• Similar approach to using owl:imports
• Define a new annotation property in a new
namespace
Ont1 owl:imports Ont2
Ont1 ic:imports IC1
• Backward compatible, requires minimum change
in tools
22. Use Case: SKOS
• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic
structure and content of concept schemes
– Thesauri, classification schemes, subject heading lists,
taxonomies, folksonomies, etc.
• SKOS data model specification
– Informal (Text): http://www.w3.org/TR/skos-reference/
– Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf
23. SKOS Example
# SKOS reference ontology that contains inference rules (skos-reference.ttl)
skos:broaderTransitive Transitive
skos:broader subPropertyOf skos:broaderTransitive
# NL constraints from SKOS specification expressed as ICs (skos-constraints.ttl)
skos:related propertyDisjointWith skos:broaderTransitive
# SKOS data that violates the SKOS data model (skos-invalid.ttl)
[] a owl:Ontology ; owl:imports skos-reference.ttl ;
ic:imports skos-constraints.ttl .
A skos:broader B ; skos:related C .
B skos:broader C .
IC validation requires OWL reasoning
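Why this check needs reasoning can be shown with a small sketch. It assumes tuple triples and hard-codes the two SKOS inference rules involved (skos:broader is a subproperty of skos:broaderTransitive, which is transitive); the helper names are hypothetical.

```python
def transitive_closure(pairs):
    """Naive transitive closure over a set of (subject, object) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def pairs_of(triples, prop):
    return {(s, o) for s, p, o in triples if p == prop}


def disjointness_violations(triples):
    """Pairs that are both skos:related and (inferred) skos:broaderTransitive."""
    # Every broader pair is a broaderTransitive pair, and the latter is transitive.
    broader_transitive = transitive_closure(pairs_of(triples, "skos:broader"))
    return pairs_of(triples, "skos:related") & broader_transitive


data = {
    ("A", "skos:broader", "B"),
    ("A", "skos:related", "C"),
    ("B", "skos:broader", "C"),
}
violations = disjointness_violations(data)
asserted_only = pairs_of(data, "skos:related") & pairs_of(data, "skos:broader")
print(violations)
print(asserted_only)
```

Comparing only asserted statements finds nothing, because the pair (A, C) becomes a broaderTransitive pair only after inference.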
24. Another SKOS Example
# SKOS-XL ontology with a cardinality restriction (skos-xl.ttl)
skosxl:Label subClassOf skosxl:literalForm cardinality 1
# SKOS data that violates the SKOS data model when
# SKOS ontology is imported as ICs as well (skos-invalid.ttl)
[] a owl:Ontology ; owl:imports skos-xl.ttl ;
ic:imports skos-xl.ttl .
A skosxl:labelRelation LabelA .
LabelA type skosxl:Label .
Same ontology can be both a regular OWL import and an IC import
25. IC Semantics
• OWL semantics based on model theory
– Similar to First Order Logic
– Formal, precise, and unambiguous
• IC semantics specification
– Extends OWL model theory
– Change a couple of basic definitions; everything else
follows
• Details published in technical papers
– We are submitting a W3C member submission soon
26. IC Interpretations
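• A regular OWL interpretation $I = (\Delta^I, \Delta^D, \cdot^C, \cdot^{OP}, \cdot^{DP}, \cdot^I, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is a 9-tuple
– $\cdot^C$ is the class interpretation function that assigns to each class $C \in V_C$ a subset $(C)^C \subseteq \Delta^I$
• An OWL IC interpretation $\Gamma = (\Delta^I, \Delta^D, I, U, \cdot^C, \cdot^{OP}, \cdot^{DP}, \cdot^I, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is an 11-tuple where $I$ and the elements of $U$ are regular OWL interpretations
– $(C)^C = \{ x^I \mid x \in V_I \text{ and for each } U_j \in U \text{ we have that } x^{I_{U_j}} \in (C)^{C_j} \}$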
• More details available in references at the end
27. Other Approaches
• IC semantics of Motik et al. [WWW2007]
• Epistemic DLs
• Epistemic QLs
• Rules with negation as failure operators
28. Validation Algorithm
• An automated translation algorithm
• Automatically maps an OWL IC to a SPARQL
query
– Query must be evaluated with OWL entailment regime
• ICs can be mapped to RIF rules too
– Use SPARQL and Datalog correspondence
– RIF defines Negation-as-Failure operator
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation
29. SPARQL Translation
Supervisor subClassOf supervises some Employee
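# Names abbreviated as elsewhere in the talk; SPARQL 1.1 writes the negation as FILTER NOT EXISTS
SELECT * {
  ?x type Supervisor .
  FILTER NOT EXISTS {
    ?x supervises ?y .
    ?y type Employee .
  }
}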
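The translation step can be sketched for this one axiom shape. The code below is an illustration only: it assumes tuple triples, uses the FILTER NOT EXISTS spelling from the SPARQL 1.1 recommendation, and evaluates the pattern over asserted triples, whereas the talk requires evaluation under the OWL entailment regime. Both function names are hypothetical and are not part of the Pellet IC validator.

```python
def translate_some_values_ic(sub_cls, prop, filler):
    """Build query text for the IC `sub_cls subClassOf prop some filler`.
    Each query answer is an individual that violates the constraint."""
    return (
        "SELECT * {\n"
        f"  ?x type {sub_cls} .\n"
        "  FILTER NOT EXISTS {\n"
        f"    ?x {prop} ?y .\n"
        f"    ?y type {filler} .\n"
        "  }\n"
        "}"
    )


def evaluate_some_values_ic(triples, sub_cls, prop, filler):
    """Evaluate the same pattern directly over a set of triples."""
    members = {s for s, p, o in triples if p == "type" and o == sub_cls}
    satisfied = {
        s for s, p, o in triples
        if p == prop and (o, "type", filler) in triples
    }
    return members - satisfied


query = translate_some_values_ic("Supervisor", "supervises", "Employee")
data = {("Person085", "type", "Supervisor")}
violations = evaluate_some_values_ic(data, "Supervisor", "supervises", "Employee")
print(query)
print(violations)
```

An empty result means the constraint holds; adding a supervised employee for Person085 makes the result empty.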
30. RIF Translation
Supervisor subClassOf supervises some Employee
Forall ?x ?y (
  invalid() :- And (
    ?x[type -> Supervisor]
    Naf And (
      ?x[supervises -> ?y]
      ?y[type -> Employee] )))
31. Solution Summary
• Separate ICs from regular OWL axioms
– No new syntax
– Import-based mechanism
• Alternative semantics for ICs
– Extends OWL model theory
– Provides the meanings of ICs formally
• Validation algorithm
– Translate ICs to another formalism
– SPARQL or RIF engines can be used
32. Explanations
• Explanations for positive atoms are well understood
– Smallest subset of the ontology that entails the atom
– Precise & laconic explanation
– Lemma generation
• Explanations for IC violations are tricky
– Need to explain negation (i.e. missing values)
– Lemma generation even more crucial
• Simple solution
– Explanation represented as a tree where each node
represents an existing or missing axiom
35. Explanation Example (1)
VIOLATION: A violates related propertyDisjointWith broaderTransitive
  INFERRED: A related C
    ASSERTED: A related C
  INFERRED: A broaderTransitive C
    ASSERTED: A broader B
    ASSERTED: B broader C
    ASSERTED: broader subPropertyOf broaderTransitive
    ASSERTED: broaderTransitive Transitive
36. Explanation Example (2)
VIOLATION: A violates Label subClassOf literalForm cardinality 1
  INFERRED: A type Label
  INFERRED: A labelRelation LabelA
  NOT INFERRED: LabelA literalForm
Missing values are represented as
37. Explanation Example (2)
VIOLATION: A violates Label subClassOf literalForm cardinality 1
  INFERRED: A type Label
    ASSERTED: A type Label
  INFERRED: A labelRelation LabelA
    ASSERTED: A labelRelation LabelA
  NOT INFERRED: LabelA literalForm
    NOT ASSERTED: LabelA literalForm
38. Performance
• Using ICs can improve performance!
• Expressive OWL reasoning is not easy
• Profiles of OWL defined for tractable reasoning
– OWL 2 QL, OWL 2 EL, OWL 2 RL
– Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce
the expressivity where OWL reasoning is used
39. Prototype
• Pellet IC validator
– Translates ICs into SPARQL queries automatically
– Executes SPARQL queries with Pellet
– Query results show constraint violations
– Automatically explain constraint violations
• Free download
– http://clarkparsia.com/pellet/icv
40. Code Example
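// Jena 2.x and Pellet 2.x imports; JenaICValidator and ConstraintViolation
// come from the Pellet IC validator download
import java.util.Iterator;
import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.reasoner.Reasoner;
import org.mindswap.pellet.jena.PelletReasonerFactory;

// create an inferencing model using Pellet reasoner
Reasoner r = PelletReasonerFactory.theInstance().create();
InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());
// load the schema and instance data to Pellet
dataModel.read("file:data.rdf");
dataModel.read("file:schema.owl");
// create the IC validator and associate it with the dataset
JenaICValidator validator = new JenaICValidator(dataModel);
// load the constraints into the IC validator
validator.getConstraints().read("file:constraints.owl");
// get the constraint violations
Iterator<ConstraintViolation> violations = validator.getViolations();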
41. Next Steps
• W3C Member submission for IC semantics
• Robust IC validator implementation
– Incremental validation
– Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
– Scalable reasoning + validation
42. References
• Evren Sirin, Michael Smith, Evan Wallace
Opening, Closing Worlds - On Integrity Constraints
OWL: Experiences and Directions Workshop
(OWLED '08), October 2008.
• Evren Sirin, Jiao Tao
Towards Integrity Constraints in OWL
OWL: Experiences and Directions Workshop
(OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness
Integrity Constraints in OWL
To appear, The 24th AAAI Conference on Artificial
Intelligence (AAAI '10), July 2010.
44. Other Approaches
• IC semantics of Motik et al. [WWW2007]
• Epistemic DLs
• Epistemic QLs
• Rules
45. IC Semantics of Motik et al.
• Same motivation and similar approach
– Separate ICs from regular OWL axioms
– Use ICs for validation only
• Semantics based on outer skolemization on first order
formula and entailment in minimal Herbrand models
• Several features of the semantics made it unsuitable
for us
– ICs can be satisfied by existential variables
– Disjunction can cause false positives
46. Epistemic DLs
• DLs extended with epistemic operator K
• ICs can be represented as epistemic queries over a
regular KB
– Aligns with Reiter’s original characterization of ICs
• Example:
– KSupervisor subClassOf Ksupervises some KEmployee
• Only major differences
– We are not using K operator explicitly
– No Unique Name Assumption in OWL ICs