RR2010 Keynote

Evren Sirin gave one of the RR2010 keynotes on Integrity Constraint Validation for Linked Data with OWL2 via SPARQL.

  1. 1. Data Validation with OWL Integrity Constraints Evren Sirin, CTO Clark & Parsia, LLC evren@clarkparsia.com 1 Wednesday, September 22, 2010
  2. 2. Who are we? • Clark & Parsia is a semantic software startup – HQ in Washington, DC & office in Boston • Provides software development and integration services • Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers http://clarkparsia.com/ Twitter: @candp
  3. 3. Overview • Data validation with OWL – Representation and validation of integrity constraints • Use cases – Examples, issues, workarounds • OWL Integrity Constraints – Syntax, semantics, validation • Comparison with other approaches – Epistemic DLs, Epistemic QLs, Rules • Implementation and performance
  4. 4. Some Applications • Customer and product data – Find which customer would be interested in buying a certain product • System and component descriptions – Configure components to build a desired system • Workforce and employee data – Locate employees with desired expertise • Patient history and drug data – Detect and prevent potentially harmful drug interactions
  5. 5. Common Theme • There is data and lots of it! • Adding semantics to the data helps a lot – Sometimes simple taxonomies, but other times, complex ontologies • We have complete knowledge about the domain • Errors in the data cause problems – Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc.
  6. 6. Data Validation • Fundamental data management problem – Verify data integrity and correctness – Enforce validity of updates • Relevant in many scenarios – Storing data for stand-alone applications – Exchanging data in distributed settings • Solved (to some degree) in RDBMSs – Harder to achieve as data semantics increase and/or more expressive integrity conditions are required
  7. 7. Disclaimer • Data validity not important for every use case – Invalid data may be fine for an application – Invalidity may even be a requirement • Focus of this talk is cases where data consistency and integrity are crucial
  8. 8. Building Semantic Apps • Represent data as RDF triples – First step for accomplishing data integration and analysis • Enrich data with more semantics (RDFS, OWL) – Infer implicit information from explicit assertions • Ensure data validity – Detect errors in the data • Do something cool with the data – Obviously...
  11. 11. Reasoning Example • Input ontology – Schema: # Every supervisor is an employee: Supervisor subClassOf Employee – Instance data: # Person085 is a supervisor: Person085 type Supervisor • Output inferences – # Person085 is an employee: Person085 type Employee
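The inference on this slide can be sketched in plain Java without any RDF library. This is only an illustration: the class name `SubclassInference`, the `String[]` triple encoding, and the `infer` helper are assumptions made for this example and are not part of Pellet or Jena. It applies a single rule (subclass membership), whereas a real OWL reasoner handles far more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: applies only the rule
//   (C subClassOf D) and (x type C)  =>  (x type D)
// repeatedly until no new triples appear.
final class SubclassInference {

    // Returns only the newly inferred triples, not the input.
    static List<String[]> infer(List<String[]> triples) {
        List<String[]> all = new ArrayList<>(triples);
        List<String[]> inferred = new ArrayList<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] sub : new ArrayList<>(all)) {
                if (!sub[1].equals("subClassOf")) continue;
                for (String[] t : new ArrayList<>(all)) {
                    if (t[1].equals("type") && t[2].equals(sub[0])
                            && !contains(all, t[0], "type", sub[2])) {
                        String[] n = {t[0], "type", sub[2]};
                        all.add(n);
                        inferred.add(n);
                        changed = true;
                    }
                }
            }
        }
        return inferred;
    }

    static boolean contains(List<String[]> triples, String s, String p, String o) {
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p) && t[2].equals(o)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> input = List.of(
            new String[] {"Supervisor", "subClassOf", "Employee"},  // schema
            new String[] {"Person085", "type", "Supervisor"});      // instance data
        for (String[] t : infer(input)) {
            System.out.println(String.join(" ", t));  // Person085 type Employee
        }
    }
}
```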
  12. 12. Validating RDF Data • Common misunderstanding – RDFS/OWL is to RDF what XML Schema is to XML – Describe integrity conditions in RDFS or OWL • Typing constraints - RDFS domain/range • Participation constraints - OWL some values restrictions • Uniqueness constraints - OWL cardinality restriction – Use a reasoner to find inconsistencies • Problem: Open World Assumption
  13. 13. Closed vs. Open World • Two different views on truth: – CWA: Any statement that is not known to be true is false – OWA: A statement is false only if it is known to be false • Used in different contexts – Databases use CWA because (typically) they contain complete information – Ontologies use OWA because (typically) they don't... that is, they contain incomplete information • Data validation results differ significantly when using CWA instead of OWA
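The two views on truth can be made concrete with a small sketch. The names `WorldAssumption`, `Truth`, `closedWorld`, and `openWorld` are hypothetical, and statements are reduced to plain strings for illustration. Under CWA a missing statement evaluates to false; under OWA it stays unknown unless it is explicitly known to be false.

```java
import java.util.Set;

// Hypothetical sketch contrasting the two assumptions for a single statement.
final class WorldAssumption {

    enum Truth { TRUE, FALSE, UNKNOWN }

    // CWA: any statement that is not known to be true is false.
    static Truth closedWorld(Set<String> knownTrue, String statement) {
        return knownTrue.contains(statement) ? Truth.TRUE : Truth.FALSE;
    }

    // OWA: a statement is false only if it is known to be false.
    static Truth openWorld(Set<String> knownTrue, Set<String> knownFalse, String statement) {
        if (knownTrue.contains(statement)) return Truth.TRUE;
        if (knownFalse.contains(statement)) return Truth.FALSE;
        return Truth.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> knownTrue = Set.of("Person085 supervises Person173");
        Set<String> knownFalse = Set.of();
        String query = "Person085 type Supervisor";
        System.out.println(closedWorld(knownTrue, query));            // FALSE
        System.out.println(openWorld(knownTrue, knownFalse, query));  // UNKNOWN
    }
}
```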
  14. 14. Typing Constraint • Only supervisors can supervise employees • Input ontology – supervises domain Supervisor – Person085 supervises Person173 • OWA: consistent = true; reason: infer that Person085 type Supervisor • CWA: consistent = false; reason: assume that Person085 type not Supervisor
  15. 15. Participation Constraint • Each supervisor must supervise at least one employee • Input axioms – Supervisor subClassOf supervises some Employee – Person085 type Supervisor • OWA: consistent = true; reason: infer that Person085 supervises _:b and _:b type Employee • CWA: consistent = false; reason: assume that Person085 supervises _:b does not exist
  16. 16. Uniqueness Constraint • Employees can have at most one supervisor • Input axioms – supervises InverseFunctional – Person085 supervises Person173 – Person632 supervises Person173 • OWA: consistent = true; reason: infer that Person085 sameAs Person632 • CWA: consistent = false; reason: assume that Person085 sameAs Person632 does not hold
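The closed-world reading of the typing, participation, and uniqueness constraints can be sketched as three checks over asserted triples. Everything here is an assumption made for illustration: the class `ClosedWorldChecks`, its method names, and the `String[]` triple encoding are hypothetical, no inference is performed, and two different names are treated as different individuals unless a `sameAs` triple is asserted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: closed-world checks over asserted triples only.
final class ClosedWorldChecks {

    static boolean has(List<String[]> data, String s, String p, String o) {
        for (String[] t : data) {
            if (t[0].equals(s) && t[1].equals(p) && t[2].equals(o)) return true;
        }
        return false;
    }

    // Typing: every subject of `property` must be asserted to have type `domain`.
    static List<String> typingViolations(List<String[]> data, String property, String domain) {
        List<String> out = new ArrayList<>();
        for (String[] t : data) {
            if (t[1].equals(property) && !has(data, t[0], "type", domain)
                    && !out.contains(t[0])) {
                out.add(t[0]);
            }
        }
        return out;
    }

    // Participation: every instance of `cls` needs a `property` value typed `filler`.
    static List<String> participationViolations(List<String[]> data, String cls,
                                                String property, String filler) {
        List<String> out = new ArrayList<>();
        for (String[] t : data) {
            if (!t[1].equals("type") || !t[2].equals(cls)) continue;
            boolean found = false;
            for (String[] u : data) {
                if (u[0].equals(t[0]) && u[1].equals(property)
                        && has(data, u[2], "type", filler)) {
                    found = true;
                }
            }
            if (!found && !out.contains(t[0])) out.add(t[0]);
        }
        return out;
    }

    // Uniqueness: an object of `property` may not have two subjects
    // unless those subjects are asserted to be sameAs each other.
    static List<String> uniquenessViolations(List<String[]> data, String property) {
        Map<String, List<String>> subjects = new LinkedHashMap<>();
        for (String[] t : data) {
            if (t[1].equals(property)) {
                subjects.computeIfAbsent(t[2], k -> new ArrayList<>()).add(t[0]);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : subjects.entrySet()) {
            List<String> s = e.getValue();
            boolean clash = false;
            for (int i = 0; i < s.size(); i++) {
                for (int j = i + 1; j < s.size(); j++) {
                    String a = s.get(i), b = s.get(j);
                    if (!a.equals(b) && !has(data, a, "sameAs", b)
                            && !has(data, b, "sameAs", a)) {
                        clash = true;
                    }
                }
            }
            if (clash) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> typing = List.of(
            new String[] {"Person085", "supervises", "Person173"});
        List<String[]> participation = List.of(
            new String[] {"Person085", "type", "Supervisor"});
        List<String[]> uniqueness = List.of(
            new String[] {"Person085", "supervises", "Person173"},
            new String[] {"Person632", "supervises", "Person173"});
        System.out.println(typingViolations(typing, "supervises", "Supervisor"));  // [Person085]
        System.out.println(participationViolations(
            participation, "Supervisor", "supervises", "Employee"));              // [Person085]
        System.out.println(uniquenessViolations(uniqueness, "supervises"));       // [Person173]
    }
}
```

Each check reports the individuals that make the data invalid under CWA; under OWA the same inputs would be consistent, as the three slides show.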
  17. 17. Workarounds for CW • Manually close the world – Declare all individuals different from each other – Count existing property values and add a max cardinality restriction – Make all disjointness statements explicit and add negated types to individuals • Drawbacks – Can be computationally expensive – Likely to be error-prone
  18. 18. Problem Summary • Definitions in an OWL schema may have two purposes – Infer new statements – Check if existing statements are valid • Using OWA for validation is undesirable – Not always but in many cases • In a problem domain we may have: – Complete knowledge about some parts of the domain – Incomplete knowledge about the other parts
  19. 19. Integrity Constraint Solution • We defined an alternative semantics for OWL – Integrity Constraint (IC) semantics use CWA – Can be combined with regular inference axioms • Ontology developer chooses which axioms will be interpreted with... – OWA - regular OWL axiom, or – CWA - integrity constraint
  20. 20. IC Extension • Syntax specification – How do we syntactically say an axiom is an IC and not a regular OWL axiom? • Semantics specification – How do we exactly interpret an IC? • Validation algorithm – Given the semantics how do we check for IC violations?
  21. 21. IC Syntax • Similar approach to using owl:imports • Define a new annotation property in a new namespace Ont1 owl:imports Ont2 Ont1 ic:imports IC1 • Backward compatible, requires minimum change in tools
  22. 22. Use Case: SKOS • Simple Knowledge Organization System (SKOS) • SKOS provides a model for expressing the basic structure and content of concept schemes – Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc. • SKOS data model specification – Informal (Text): http://www.w3.org/TR/skos-reference/ – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf
  23. 23. SKOS Example • skos-reference.ttl (SKOS reference ontology that contains inference rules): skos:broaderTransitive Transitive . skos:broader subPropertyOf skos:broaderTransitive • skos-constraints.ttl (NL constraints from the SKOS specification expressed as ICs): skos:related propertyDisjointWith skos:broaderTransitive • skos-invalid.ttl (SKOS data that violates the SKOS data model): [] a owl:Ontology ; owl:imports skos-reference.ttl ; ic:imports skos-constraints.ttl . A skos:broader B ; skos:related C . B skos:broader C . • IC validation requires OWL reasoning
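Why this check needs reasoning can be seen in a small sketch: the violating `skos:broaderTransitive` link between A and C is never asserted, so it must be inferred before the disjointness constraint is tested. The class `SkosDisjointnessCheck` and its methods are hypothetical, resource names are assumed to contain no spaces, and the symmetry of `skos:related` is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: infer skos:broaderTransitive from skos:broader,
// then test skos:related against it under a closed-world reading.
final class SkosDisjointnessCheck {

    // Transitive closure of skos:broader; each pair is encoded as "subject object".
    static Set<String> broaderTransitive(List<String[]> data) {
        Set<String> closure = new HashSet<>();
        for (String[] t : data) {
            if (t[1].equals("skos:broader")) closure.add(t[0] + " " + t[2]);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String ab : new ArrayList<>(closure)) {
                for (String bc : new ArrayList<>(closure)) {
                    String[] l = ab.split(" ");
                    String[] r = bc.split(" ");
                    if (l[1].equals(r[0]) && closure.add(l[0] + " " + r[1])) {
                        changed = true;
                    }
                }
            }
        }
        return closure;
    }

    // Pairs that are both skos:related and (inferred) skos:broaderTransitive.
    static Set<String> violations(List<String[]> data) {
        Set<String> inferred = broaderTransitive(data);
        Set<String> out = new TreeSet<>();
        for (String[] t : data) {
            String pair = t[0] + " " + t[2];
            if (t[1].equals("skos:related") && inferred.contains(pair)) out.add(pair);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[] {"A", "skos:broader", "B"},
            new String[] {"A", "skos:related", "C"},
            new String[] {"B", "skos:broader", "C"});
        System.out.println(violations(data));  // [A C]
    }
}
```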
  24. 24. Another SKOS Example • skos-xl.ttl (SKOS-XL ontology with a cardinality restriction): skosxl:Label subClassOf skosxl:literalForm cardinality 1 • skos-invalid.ttl (SKOS data that violates the SKOS data model when the SKOS ontology is imported as ICs as well): [] a owl:Ontology ; owl:imports skos-xl.ttl ; ic:imports skos-xl.ttl . A skosxl:labelRelation LabelA . LabelA type skosxl:Label . • The same ontology can be both a regular OWL import and an IC import
  25. 25. IC Semantics • OWL semantics based on model theory – Similar to First Order Logic – Formal, precise, and unambiguous • IC semantics specification – Extends OWL model theory – Change a couple of basic definitions, everything else follows • Details published in technical papers – We are submitting a W3C member submission soon
  26. 26. IC Interpretations • A regular OWL interpretation $I = (\Delta_I, \Delta_D, \cdot^{C}, \cdot^{OP}, \cdot^{DP}, \cdot^{I}, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is a 9-tuple – $\cdot^{C}$ is the class interpretation function that assigns to each class $C \in V_C$ a subset $(C)^{C} \subseteq \Delta_I$ • An OWL IC interpretation $\Gamma = (\Delta_I, \Delta_D, I, U, \cdot^{C}, \cdot^{OP}, \cdot^{DP}, \cdot^{I}, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is an 11-tuple where $I$ and the elements of $U$ are regular OWL interpretations – $(C)^{C} = \{ x^{I} \mid x \in V_I \text{ and for each } U_j \in U \text{ we have } x^{I_{U_j}} \in (C)^{C_j} \}$ • More details available in references at the end
  27. 27. Other Approaches • IC semantics of Motik et al. [WWW2007] • Epistemic DLs • Epistemic QLs • Rules with negation as failure operators
  28. 28. Validation Algorithm • An automated translation algorithm • Automatically maps an OWL IC to a SPARQL query – Query must be evaluated with OWL entailment regime • ICs can be mapped to RIF rules too – Use SPARQL and Datalog correspondence – RIF defines Negation-as-Failure operator • Many different implementation possibilities • Off-the-shelf tools can be used for IC validation
  29. 29. SPARQL Translation • IC: Supervisor subClassOf supervises some Employee • Query: SELECT * { ?x type Supervisor . FILTER NOT EXISTS { ?x supervises ?y . ?y type Employee . } }
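The shape of this translation can be sketched for the single axiom form shown on the slide, `C subClassOf p some D`. This is not the Pellet ICV translation, which covers many more axiom types; the class `SomeValuesFromToSparql`, its method, and the `http://example.org/` IRIs are hypothetical. The generated query uses the SPARQL 1.1 `FILTER NOT EXISTS` form and returns every instance of `C` that lacks a `p` value of type `D`.

```java
// Hypothetical sketch: build the violation query for "C subClassOf p some D".
final class SomeValuesFromToSparql {

    // Wraps an absolute IRI in angle brackets, as SPARQL requires.
    static String iri(String value) {
        return "<" + value + ">";
    }

    // Each solution for ?x is an individual that violates the constraint.
    static String violationQuery(String subClass, String property, String filler) {
        return "SELECT ?x WHERE {\n"
             + "  ?x a " + iri(subClass) + " .\n"
             + "  FILTER NOT EXISTS {\n"
             + "    ?x " + iri(property) + " ?y .\n"
             + "    ?y a " + iri(filler) + " .\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        String ns = "http://example.org/";  // assumed namespace, for illustration only
        System.out.println(
            violationQuery(ns + "Supervisor", ns + "supervises", ns + "Employee"));
    }
}
```

As the previous slide notes, the query only yields correct violations when it is evaluated under an OWL entailment regime, so that inferred types and property values are visible to the pattern matching.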
  30. 30. RIF Translation Supervisor subClassOf supervises some Employee Forall ?x ?y ( invalid() :- And ( ?x[type -> Supervisor] Naf And ( ?x[supervises -> ?y] ?y[type -> Employee] )))
  31. 31. Solution Summary • Separate ICs from regular OWL axioms – No new syntax – Import-based mechanism • Alternative semantics for ICs – Extends OWL model theory – Provides the meanings of ICs formally • Validation algorithm – Translate ICs to another formalism – SPARQL or RIF engines can be used
  32. 32. Explanations • Explanations for positive atoms are well understood – Smallest subset of the ontology that entails the atom – Precise & laconic explanation – Lemma generation • Explanations for IC violations are tricky – Need to explain negation (i.e. missing values) – Lemma generation even more crucial • Simple solution – Explanation represented as a tree where each node represents an existing or missing axiom
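The tree representation described here can be sketched as a small data structure. The class `ExplanationNode`, its `Kind` labels, and the two-space indentation are assumptions for illustration and do not reflect how Pellet stores explanations; the labels simply mirror the node types used in the explanation examples of this talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each node is an existing axiom (INFERRED, ASSERTED)
// or a missing one (NOT_INFERRED, NOT_ASSERTED); the root is the violation.
final class ExplanationNode {

    enum Kind { VIOLATION, INFERRED, ASSERTED, NOT_INFERRED, NOT_ASSERTED }

    final Kind kind;
    final String axiom;
    final List<ExplanationNode> children = new ArrayList<>();

    ExplanationNode(Kind kind, String axiom) {
        this.kind = kind;
        this.axiom = axiom;
    }

    // Adds a child and returns this node so calls can be chained.
    ExplanationNode add(ExplanationNode child) {
        children.add(child);
        return this;
    }

    // Renders the tree with two spaces of indentation per level.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(kind.name().replace('_', ' ')).append(": ").append(axiom).append('\n');
        for (ExplanationNode c : children) c.render(sb, depth + 1);
    }

    public static void main(String[] args) {
        ExplanationNode root = new ExplanationNode(
                Kind.VIOLATION, "A violates related propertyDisjointWith broaderTransitive")
            .add(new ExplanationNode(Kind.INFERRED, "A related C")
                .add(new ExplanationNode(Kind.ASSERTED, "A related C")))
            .add(new ExplanationNode(Kind.INFERRED, "A broaderTransitive C")
                .add(new ExplanationNode(Kind.ASSERTED, "A broader B"))
                .add(new ExplanationNode(Kind.ASSERTED, "B broader C")));
        System.out.print(root.render());
    }
}
```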
  35. 35. Explanation Example (1) VIOLATION: A violates related propertyDisjointWith broaderTransitive INFERRED: A related C ASSERTED: A related C INFERRED: A broaderTransitive C ASSERTED: A broader B ASSERTED: B broader C ASSERTED: broader subPropertyOf broaderTransitive ASSERTED: broaderTransitive Transitive
  36. 36. Explanation Example (2) VIOLATION: A violates Label subClassOf literalForm cardinality 1 INFERRED: A type Label INFERRED: A labelRelation LabelA NOT INFERRED: LabelA literalForm Missing values are represented as
  37. 37. Explanation Example (2) VIOLATION: A violates Label subClassOf literalForm cardinality 1 INFERRED: A type Label ASSERTED: A type Label INFERRED: A labelRelation LabelA ASSERTED: A labelRelation LabelA NOT INFERRED: LabelA literalForm NOT ASSERTED: LabelA literalForm
  38. 38. Performance • Using ICs can improve performance! • Expressive OWL reasoning is not easy • Profiles of OWL defined for tractable reasoning – OWL 2 QL, OWL 2 EL, OWL 2 RL – Less expressive but more efficient • Modeling some OWL axioms as ICs may reduce the expressivity where OWL reasoning is used
  39. 39. Prototype • Pellet IC validator – Translates ICs into SPARQL queries automatically – Executes SPARQL queries with Pellet – Query results show constraint violations – Automatically explain constraint violations • Free download – http://clarkparsia.com/pellet/icv
  40. 40. Code Example
      // create an inferencing model using Pellet reasoner
      InfModel dataModel = ModelFactory.createInfModel(r);
      // load the schema and instance data to Pellet
      dataModel.read( "file:data.rdf" );
      dataModel.read( "file:schema.owl" );
      // Create the IC validator and associate it with the dataset
      JenaICValidator validator = new JenaICValidator(dataModel);
      // Load the constraints into the IC validator
      validator.getConstraints().read("file:constraints.owl");
      // Get the constraint violations
      Iterator<ConstraintViolation> violations = validator.getViolations();
  41. 41. Next Steps • W3C Member submission for IC semantics • Robust IC validator implementation – Incremental validation – Multi-threaded validation • Support for IC editing • Integration with PelletDb – Scalable reasoning + validation
  42. 42. References • Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008. • Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009. • Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. To appear, the 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010.
  43. 43. Questions
  44. 44. Other Approaches • IC semantics of Motik et al. [WWW2007] • Epistemic DLs • Epistemic QLs • Rules
  45. 45. IC Semantics of Motik et al. • Same motivation and similar approach – Separate ICs from regular OWL axioms – Use ICs for validation only • Semantics based on outer skolemization of first-order formulas and entailment in minimal Herbrand models • Several features of the semantics made it unsuitable for us – ICs can be satisfied by existential variables – Disjunction can cause false positives
  46. 46. Epistemic DLs • DLs extended with epistemic operator K • ICs can be represented as epistemic queries over a regular KB – Aligns with Reiter’s original characterization of ICs • Example: – KSupervisor subClassOf Ksupervises some KEmployee • Only major differences – We are not using K operator explicitly – No Unique Name Assumption in OWL ICs
