Using OWL in Closed World Applications

         Evren Sirin, CTO
        Clark & Parsia, LLC
      evren@clarkparsia.com
Who are we?
• Clark & Parsia is a semantic software startup 
  – HQ in Washington, DC & office in Boston
• Provides software development and integration
  services
• Specializing in Semantic Web, web services, and
  advanced AI technologies for federal and
  enterprise customers 
                 http://clarkparsia.com/
                 Twitter: @candp
Some Applications
• Customer and product data
  – Find which customer would be interested in buying a
    certain product
• System and component descriptions
  – Configure components to build a desired system
• Workforce and employee data
  – Locate employees with desired expertise
• Patient history and drug data
  – Detect and prevent potentially harmful drug interactions

Common Theme
• There is data and lots of it!
• Adding semantics to the data helps a lot
  – Sometimes simple taxonomies, other times complex
    ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
  – Failures in applications, errors in decision making,
    potential loss of revenue, security vulnerabilities, etc.

Data Validation
• Fundamental data management problem
  – Verify data integrity and correctness 
  – Enforce validity of updates 
• Relevant in many scenarios
  – Storing data for stand-alone applications
  – Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
  – Harder to achieve as data semantics increase and/or
    more expressive integrity conditions are required
Disclaimer
• Data validity is not important for every use case
  – Invalid data may be fine for an application
  – Invalidity may even be a requirement
• Focus of this talk is cases where data consistency
  and integrity are crucial

Roadmap for an App
• How to build one of these applications?
  – Represent data as RDF triples
     • First step for accomplishing data integration and analysis
  – Enrich data with more semantics (RDFS, OWL)
     • Infer implicit information from explicit assertions
  – Ensure data validity
     • Detect errors in the data
  – Do something cool with the data
     • Obviously...

Reasoning Example
• Input ontology
      # Every manager is an employee              (schema)
      Manager subClassOf Employee
      # Person0853 is a manager                   (instance data)
      Person0853 type Manager
• Output inferences
      # Person0853 is an employee
      Person0853 type Employee
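The inference on this slide comes down to one forward-chaining rule applied until nothing new appears. The minimal Java sketch below assumes triples are plain strings in the slides' abbreviated notation. The `Triple` record and the `infer` method are illustrative names, not Pellet API. Pellet itself is a tableau reasoner and does not work this way internally.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SubClassInference {
    // A triple in the slides' abbreviated notation, e.g. ("Manager", "subClassOf", "Employee").
    record Triple(String s, String p, String o) {}

    // Adds "x type D" whenever "x type C" and "C subClassOf D" hold,
    // repeating until no new triple is produced (a fixpoint).
    static Set<Triple> infer(Set<Triple> input) {
        Set<Triple> closure = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Triple t : List.copyOf(closure)) {
                if (!t.p().equals("type")) continue;
                for (Triple axiom : List.copyOf(closure)) {
                    if (axiom.p().equals("subClassOf") && axiom.s().equals(t.o())) {
                        changed |= closure.add(new Triple(t.s(), "type", axiom.o()));
                    }
                }
            }
        }
        return closure;
    }

    public static void main(String[] args) {
        Set<Triple> ontology = Set.of(
            new Triple("Manager", "subClassOf", "Employee"),  // schema
            new Triple("Person0853", "type", "Manager"));     // instance data
        Set<Triple> result = infer(ontology);
        // prints true: the Employee type was inferred, not asserted
        System.out.println(result.contains(new Triple("Person0853", "type", "Employee")));
    }
}
```

The fixpoint loop also handles chains such as `A subClassOf B`, `B subClassOf C`.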
Validating RDF Data
• Common misunderstanding
  – RDFS/OWL is to RDF what XML Schema is to XML
  – Describe integrity conditions in RDFS or OWL
     • Typing constraints - RDFS domain/range
     • Participation constraints - OWL someValuesFrom restrictions
     • Uniqueness constraints - OWL cardinality restrictions
  – Use a reasoner to find inconsistencies
• Problem: Open World Assumption

Closed vs. Open World
• Two different views on truth:
   – CWA: Any statement that is not known to be true is false
   – OWA: A statement is false only if it is known to be false
• Used in different contexts
   – Databases use CWA because (typically) they contain 
     complete information
   – Ontologies use OWA because (typically) they don't...
     that is, they contain incomplete information
• Data validation results differ significantly when
  using CWA instead of OWA
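A small lookup function is enough to contrast the two readings of truth. This Java sketch treats statements as opaque strings and takes "known to be false" as an explicit set. An OWL reasoner would instead derive falsity through entailment, for example from disjointness axioms. The enum and method names are illustrative.

```java
import java.util.Set;

class WorldAssumptions {
    enum Truth { TRUE, FALSE, UNKNOWN }

    // CWA: any statement that is not known to be true is false.
    static Truth closedWorld(Set<String> knownTrue, String statement) {
        return knownTrue.contains(statement) ? Truth.TRUE : Truth.FALSE;
    }

    // OWA: a statement is false only if it is known to be false; otherwise it stays unknown.
    static Truth openWorld(Set<String> knownTrue, Set<String> knownFalse, String statement) {
        if (knownTrue.contains(statement)) return Truth.TRUE;
        if (knownFalse.contains(statement)) return Truth.FALSE;
        return Truth.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> knownTrue = Set.of("Person085 type Manager");
        Set<String> knownFalse = Set.of();
        String query = "Person173 type Manager";
        System.out.println(closedWorld(knownTrue, query));           // FALSE
        System.out.println(openWorld(knownTrue, knownFalse, query)); // UNKNOWN
    }
}
```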
Typing Constraint
 • Only managers can supervise employees
 • Input ontology
    o   supervises domain Manager
    o   Person085 supervises Person173


              OWA                        CWA
Consistent    true                       false
Reason        Infer that                 Assume that
              Person085 type Manager     Person085 type not Manager
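Under CWA, the domain axiom stops being an inference rule and becomes a check on the data. Below is a minimal Java sketch of that check. It uses the same string-triple notation as the slides, and all names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class DomainCheck {
    record Triple(String s, String p, String o) {}

    // Closed-world reading of "property domain domainClass": every subject of
    // `property` must already carry the type; a missing type triple is a violation.
    static Set<String> violations(List<Triple> data, String property, String domainClass) {
        return data.stream()
            .filter(t -> t.p().equals(property))
            .map(Triple::s)
            .filter(x -> !data.contains(new Triple(x, "type", domainClass)))
            .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(new Triple("Person085", "supervises", "Person173"));
        System.out.println(violations(data, "supervises", "Manager")); // [Person085]
    }
}
```

The check deliberately refuses to infer `Person085 type Manager`. That refusal is where the OWA and CWA columns differ.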
Participation Constraint
• Each supervisor must supervise at least
  one employee
• Input axioms
  o   Supervisor subClassOf supervises some Employee
  o   Person085 type Supervisor

              OWA                          CWA
Consistent    true                         false
Reason        Infer that                   Assume that
              Person085 supervises _:b     Person085 supervises _:b
              _:b type Employee            does not exist
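The OWA column describes a reasoner that satisfies the axiom by introducing an unnamed individual. The Java sketch below is a simplified version of that existential expansion for this one axiom. It assumes string triples and a hypothetical `_:bN` naming scheme for fresh individuals. A real tableau reasoner builds such individuals inside a candidate model rather than adding them to the data.

```java
import java.util.ArrayList;
import java.util.List;

class ExistentialWitness {
    record Triple(String s, String p, String o) {}

    // Simplified expansion for "Supervisor subClassOf supervises some Employee":
    // any Supervisor with no known supervised Employee gets a fresh anonymous one,
    // so this axiom alone never makes the ontology inconsistent under OWA.
    static List<Triple> expand(List<Triple> data) {
        List<Triple> out = new ArrayList<>(data);
        int fresh = 0;
        for (Triple t : data) {
            if (!t.p().equals("type") || !t.o().equals("Supervisor")) continue;
            boolean hasWitness = out.stream().anyMatch(u -> u.s().equals(t.s())
                && u.p().equals("supervises")
                && out.contains(new Triple(u.o(), "type", "Employee")));
            if (!hasWitness) {
                String b = "_:b" + (++fresh);
                out.add(new Triple(t.s(), "supervises", b));
                out.add(new Triple(b, "type", "Employee"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(new Triple("Person085", "type", "Supervisor"));
        expand(data).forEach(System.out::println);
    }
}
```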
Uniqueness Constraint
 • Employees can have at most one supervisor
 • Input axioms
    o   supervises InverseFunctional
    o   Person085 supervises Person173
    o   Person632 supervises Person173


              OWA                          CWA
Consistent    true                         false
Reason        Infer that                   Assume that
              Person085 sameAs Person632   Person085 sameAs Person632
                                           does not hold
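The CWA column assumes that distinct names denote distinct individuals. Under that assumption, an inverse functional property becomes a grouping check. Below is a hedged Java sketch of that check. It uses hypothetical names and string triples, and it ignores explicit sameAs assertions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class UniquenessCheck {
    record Triple(String s, String p, String o) {}

    // Closed-world reading of "property InverseFunctional": two subjects sharing
    // an object are reported instead of being inferred sameAs.
    static Map<String, Set<String>> violations(List<Triple> data, String property) {
        Map<String, Set<String>> subjectsByObject = new TreeMap<>();
        for (Triple t : data) {
            if (t.p().equals(property)) {
                subjectsByObject.computeIfAbsent(t.o(), k -> new TreeSet<>()).add(t.s());
            }
        }
        // keep only objects with more than one distinct subject
        subjectsByObject.values().removeIf(subjects -> subjects.size() < 2);
        return subjectsByObject;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("Person085", "supervises", "Person173"),
            new Triple("Person632", "supervises", "Person173"));
        System.out.println(violations(data, "supervises")); // {Person173=[Person085, Person632]}
    }
}
```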
Workarounds for CW
• Manually close the world
  – Declare all individuals different from each other
  – Count existing property values and add a max
    cardinality restriction
  – Make all disjointness statements explicit and add
    negated types to individuals
• Drawbacks
  – Can be computationally expensive
  – Likely to be error-prone
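The first two workarounds can be written as an axiom generator, which also makes their cost visible: n individuals need n(n-1)/2 differentFrom axioms. This Java sketch emits axioms as strings in the slides' abbreviated notation. The names and the output format are illustrative assumptions, and the sketch does not cover the third workaround (negated types).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CloseTheWorld {
    record Triple(String s, String p, String o) {}

    // Generates axioms that "close" one property by hand: pairwise differentFrom
    // for every individual the property mentions, plus a max cardinality equal
    // to the number of values each subject currently has.
    static List<String> closingAxioms(List<Triple> data, String property) {
        List<Triple> uses = data.stream().filter(t -> t.p().equals(property)).toList();
        List<String> individuals = uses.stream()
            .flatMap(t -> Stream.of(t.s(), t.o()))
            .distinct().sorted().toList();

        List<String> axioms = new ArrayList<>();
        // n individuals need n(n-1)/2 differentFrom axioms
        for (int i = 0; i < individuals.size(); i++) {
            for (int j = i + 1; j < individuals.size(); j++) {
                axioms.add(individuals.get(i) + " differentFrom " + individuals.get(j));
            }
        }
        Map<String, Long> counts = uses.stream()
            .collect(Collectors.groupingBy(Triple::s, TreeMap::new, Collectors.counting()));
        counts.forEach((s, n) -> axioms.add(s + " type (" + property + " max " + n + ")"));
        return axioms;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("Person085", "supervises", "Person173"),
            new Triple("Person632", "supervises", "Person173"));
        closingAxioms(data, "supervises").forEach(System.out::println);
    }
}
```

The generated axioms mirror the current data, so they must be regenerated after every update. That is one plausible source of the errors noted above.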
Problem Summary
• Definitions in an OWL schema may have two
  purposes
  – Infer new statements
  – Check if existing statements are valid
• Using OWA for validation is undesirable
  – Not always, but in many cases
• In a problem domain we may have:
  – Complete knowledge about some parts of the domain
  – Incomplete knowledge about the other parts
Integrity Constraint Solution
• We defined an alternative semantics for OWL
  – Integrity Constraint (IC) semantics use CWA
  – Can be combined with regular inference axioms
• Ontology developer chooses which axioms will
  be interpreted with...
  – OWA - regular OWL axiom, or
  – CWA - integrity constraint
IC Extension
• Syntax specification
  – How do we syntactically say an axiom is an IC and
    not a regular OWL axiom?
• Semantics specification
  – How exactly do we interpret an IC?
• Validation algorithm
  – Given the semantics, how do we check for IC
    violations?
IC Syntax
• Similar approach to using owl:imports
• Define a new annotation property in a new
  namespace

         Ont1 owl:imports Ont2
         Ont1 ic:imports IC1

• Backward compatible; requires minimal changes
  in tools
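A tool that supports the extension only has to sort an ontology's imports by annotation property. A tool that does not know `ic:imports` sees an ordinary annotation and ignores it. Below is a minimal Java sketch of the sorting step, with hypothetical record and method names.

```java
import java.util.ArrayList;
import java.util.List;

class ImportSplitter {
    record Triple(String s, String p, String o) {}
    record Imports(List<String> axioms, List<String> constraints) {}

    // Sorts an ontology's imports: owl:imports targets are read as regular
    // (open world) axioms, ic:imports targets as integrity constraints.
    static Imports split(List<Triple> header, String ontology) {
        List<String> axioms = new ArrayList<>();
        List<String> constraints = new ArrayList<>();
        for (Triple t : header) {
            if (!t.s().equals(ontology)) continue;
            switch (t.p()) {
                case "owl:imports" -> axioms.add(t.o());
                case "ic:imports" -> constraints.add(t.o());
                default -> { } // other annotations are left alone
            }
        }
        return new Imports(axioms, constraints);
    }

    public static void main(String[] args) {
        List<Triple> header = List.of(
            new Triple("Ont1", "owl:imports", "Ont2"),
            new Triple("Ont1", "ic:imports", "IC1"));
        System.out.println(split(header, "Ont1")); // Imports[axioms=[Ont2], constraints=[IC1]]
    }
}
```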
IC Semantics
• OWL semantics based on model theory
  – Similar to First Order Logic
  – Formal, precise, and unambiguous
• IC semantics specification
  – Extends OWL model theory
  – Changes a couple of basic definitions; everything else
    follows
• Details published in technical papers
  – A W3C Member Submission is planned
Use Case: SKOS
• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic
  structure and content of concept schemes
  – Thesauri, classification schemes, subject heading lists,
    taxonomies, folksonomies, etc.
• SKOS data model specification
  – Informal (Text): http://www.w3.org/TR/skos-reference/
  – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf

SKOS Example
# SKOS reference ontology that contains inference rules    (skos-reference.ttl)
skos:broaderTransitive Transitive
skos:broader subPropertyOf skos:broaderTransitive

# Constraints from the SKOS reference expressed as ICs     (skos-constraints.ttl)
skos:related propertyDisjointWith skos:broaderTransitive

# SKOS data that violates the SKOS data model              (skos-invalid.ttl)
[] a owl:Ontology ; owl:imports skos-reference.ttl ;
                     ic:imports skos-constraints.ttl .

A skos:broader B ; skos:related C .
B skos:broader C .
Explanation
VIOLATION: A violates related propertyDisjointWith broaderTransitive
   INFERRED: A related C
      ASSERTED: A related C
   INFERRED: A broaderTransitive C
      ASSERTED: A broader B
      ASSERTED: B broader C
      ASSERTED: broader subPropertyOf broaderTransitive
      ASSERTED: broaderTransitive Transitive
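The explanation above can be replayed in a few lines. First compute `broaderTransitive` from `broader`, then intersect the result with `related` under CWA. This Java sketch represents concept pairs as strings. It deliberately leaves out the symmetry of `skos:related` and the other SKOS inference rules, and all names are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SkosDisjointnessCheck {
    record Pair(String from, String to) {}

    // skos:broader is a subproperty of skos:broaderTransitive, which is
    // transitive, so broaderTransitive includes the transitive closure of broader.
    static Set<Pair> broaderTransitive(Set<Pair> broader) {
        Set<Pair> closure = new LinkedHashSet<>(broader);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Pair a : List.copyOf(closure)) {
                for (Pair b : List.copyOf(closure)) {
                    if (a.to().equals(b.from())) {
                        changed |= closure.add(new Pair(a.from(), b.to()));
                    }
                }
            }
        }
        return closure;
    }

    // IC "related propertyDisjointWith broaderTransitive": any pair linked by
    // both properties (after inference) is a violation.
    static Set<Pair> violations(Set<Pair> broader, Set<Pair> related) {
        Set<Pair> result = new LinkedHashSet<>(broaderTransitive(broader));
        result.retainAll(related);
        return result;
    }

    public static void main(String[] args) {
        Set<Pair> broader = Set.of(new Pair("A", "B"), new Pair("B", "C"));
        Set<Pair> related = Set.of(new Pair("A", "C"));
        System.out.println(violations(broader, related)); // [Pair[from=A, to=C]]
    }
}
```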

Another SKOS Example
# SKOS-XL ontology with a cardinality restriction          (skos-xl.ttl)
skosxl:Label subClassOf
                 skosxl:literalForm cardinality 1

# SKOS data that violates the SKOS data model              (skos-data.ttl)
[] a owl:Ontology ; owl:imports skos-xl.ttl .

A skosxl:labelRelation LabelA .
LabelA type skosxl:Label .


            Result: Consistent
Another SKOS Example
# SKOS-XL ontology with a cardinality restriction          (skos-xl.ttl)
skosxl:Label subClassOf
                 skosxl:literalForm cardinality 1

# SKOS data that violates the SKOS data model              (skos-data.ttl)
[] a owl:Ontology ; owl:imports skos-xl.ttl ;
                     ic:imports skos-xl.ttl .

A skosxl:labelRelation LabelA .
LabelA type skosxl:Label .


            Result: IC Violation
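Read as an IC, the cardinality restriction becomes a count over asserted values. This hedged Java sketch counts the distinct `skosxl:literalForm` values of each `skosxl:Label` instance and treats distinct strings as distinct values. The names are hypothetical, and the sketch considers only asserted type triples.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExactCardinalityCheck {
    record Triple(String s, String p, String o) {}

    // Closed-world reading of "cls subClassOf property cardinality n": counts the
    // distinct asserted values of `property` for every instance of `cls` and
    // reports the instances whose count is not exactly n.
    static Map<String, Long> violations(List<Triple> data, String cls, String property, long n) {
        Map<String, Long> result = new TreeMap<>();
        for (Triple t : data) {
            if (t.p().equals("type") && t.o().equals(cls)) {
                long count = data.stream()
                    .filter(u -> u.s().equals(t.s()) && u.p().equals(property))
                    .map(Triple::o)
                    .distinct()
                    .count();
                if (count != n) result.put(t.s(), count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("A", "skosxl:labelRelation", "LabelA"),
            new Triple("LabelA", "type", "skosxl:Label"));
        System.out.println(violations(data, "skosxl:Label", "skosxl:literalForm", 1)); // {LabelA=0}
    }
}
```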
Linked Data Application
• Large amounts of instance data
• Validate before publishing/consuming LOD
• Instance data + Inference axioms + Constraints
  – Infer new facts using inference axioms with OWA
  – Validate data using constraints with CWA
  – Inference axioms and constraints are both expressed
    in OWL
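Inference and validation interact: a constraint may be satisfied only after the OWA axioms have added implicit facts. The Java sketch below validates the same data twice, once before and once after a subClassOf inference step. It uses the slides' `Supervisor`/`Manager`/`Employee` vocabulary, and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class InferThenValidate {
    record Triple(String s, String p, String o) {}

    // OWA step: apply "x type C, C subClassOf D => x type D" until nothing new is added.
    static Set<Triple> infer(Set<Triple> kb) {
        Set<Triple> out = new HashSet<>(kb);
        boolean changed;
        do {
            changed = false;
            for (Triple t : List.copyOf(out)) {
                for (Triple ax : List.copyOf(out)) {
                    if (t.p().equals("type") && ax.p().equals("subClassOf") && ax.s().equals(t.o())) {
                        changed |= out.add(new Triple(t.s(), "type", ax.o()));
                    }
                }
            }
        } while (changed);
        return out;
    }

    // CWA step for the IC "Supervisor subClassOf supervises some Employee":
    // report Supervisors with no supervisee known to be an Employee.
    static Set<String> violations(Set<Triple> kb) {
        Set<String> bad = new TreeSet<>();
        for (Triple t : kb) {
            if (!t.p().equals("type") || !t.o().equals("Supervisor")) continue;
            boolean ok = kb.stream().anyMatch(u -> u.s().equals(t.s())
                && u.p().equals("supervises")
                && kb.contains(new Triple(u.o(), "type", "Employee")));
            if (!ok) bad.add(t.s());
        }
        return bad;
    }

    public static void main(String[] args) {
        Set<Triple> kb = Set.of(
            new Triple("Manager", "subClassOf", "Employee"),
            new Triple("Person085", "type", "Supervisor"),
            new Triple("Person085", "supervises", "Person173"),
            new Triple("Person173", "type", "Manager"));
        System.out.println(violations(kb));        // [Person085]
        System.out.println(violations(infer(kb))); // []
    }
}
```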

Validation Algorithm
• An automated translation algorithm
• Automatically maps an OWL IC to ...
  – A SPARQL query, or
  – A RIF rule
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation
SPARQL Translation
Supervisor subClassOf supervises some Employee



       SELECT ?x WHERE {
          ?x type Supervisor .
          FILTER NOT EXISTS {
             ?x supervises ?y .
             ?y type Employee .
          }
       }
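The translation can be automated one axiom pattern at a time. This Java sketch handles only the pattern `C subClassOf p some D` and emits SPARQL 1.1 with `FILTER NOT EXISTS`. The `:` prefix bound to `http://example.org/` is a placeholder assumption. The slides describe a general translation, and this sketch is not that algorithm.

```java
class SomeValuesToSparql {
    // Builds a violation query for an IC of the single form "C subClassOf p some D".
    // Names become prefixed names in a placeholder namespace (an assumption).
    static String translate(String c, String p, String d) {
        return String.join("\n",
            "PREFIX : <http://example.org/>",
            "SELECT ?x WHERE {",
            "  ?x a :" + c + " .",
            "  FILTER NOT EXISTS {",
            "    ?x :" + p + " ?y .",
            "    ?y a :" + d + " .",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(translate("Supervisor", "supervises", "Employee"));
    }
}
```

Each result row binds `?x` to an individual that violates the constraint, so an empty result means the data is valid with respect to this IC.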
RIF Translation
Supervisor subClassOf supervises some Employee



       Forall ?x (
         invalid() :- And (
            ?x[type -> Supervisor]
            Naf Exists ?y ( And (
               ?x[supervises -> ?y]
               ?y[type -> Employee] ))))
Solution Summary
• Separate ICs from regular OWL axioms
  – No new syntax
  – Import-based mechanism
• Alternative semantics for ICs
  – Extends OWL model theory
  – Provides the meanings of ICs formally
• Validation algorithm
  – Translate ICs to another formalism
  – SPARQL or RIF engines can be used
Performance
• Using ICs can improve performance!
• Reasoning with expressive OWL is computationally expensive
• Profiles of OWL defined for tractable reasoning
  – OWL 2 QL, OWL 2 EL, OWL 2 RL
  – Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce the
  expressivity of the remaining inference axioms,
  possibly enough to fit a tractable profile

Prototype
• Pellet IC validator
  – Translates ICs into SPARQL queries automatically
  – Executes SPARQL queries with Pellet
  – Query results show constraint violations
  – Automatically explains constraint violations
• Free download
  – http://clarkparsia.com/pellet/icv

Code Example
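// Jena 2.x and Pellet 2.x; JenaICValidator and ConstraintViolation come from
// the Pellet ICV download (their import is not shown here)
import java.util.Iterator;
import com.hp.hpl.jena.rdf.model.InfModel;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.reasoner.Reasoner;
import org.mindswap.pellet.jena.PelletReasonerFactory;

// create an inferencing model backed by the Pellet reasoner
Reasoner r = PelletReasonerFactory.theInstance().create();
InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());

// load the schema and instance data into Pellet
dataModel.read("file:data.rdf");
dataModel.read("file:schema.owl");

// create the IC validator and associate it with the dataset
JenaICValidator validator = new JenaICValidator(dataModel);

// load the constraints into the IC validator
validator.getConstraints().read("file:constraints.owl");

// get the constraint violations and print each one
Iterator<ConstraintViolation> violations = validator.getViolations();
while (violations.hasNext()) {
    System.out.println(violations.next());
}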
Next Steps
• W3C Member Submission for IC semantics
• Robust IC validator implementation
  – Incremental validation
  – Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
  – Scalable reasoning + validation

References
• Evren Sirin, Michael Smith, Evan Wallace
  Opening, Closing Worlds - On Integrity Constraints
  OWL: Experiences and Directions Workshop
  (OWLED '08), October 2008.
• Evren Sirin, Jiao Tao
  Towards Integrity Constraints in OWL
  OWL: Experiences and Directions Workshop
  (OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness
  Integrity Constraints in OWL
  To appear in the 24th AAAI Conference on Artificial
  Intelligence (AAAI '10), July 2010.
Questions
