A little more semantics goes a lot further!  Getting more out of Linked Data with OWL

This tutorial will provide detailed instruction to create and make use of formalized ontologies from linked open data for advanced knowledge discovery including consistency checking and answering sophisticated questions.

Automated reasoning in OWL offers the tantalizing possibility to undertake advanced knowledge discovery including verifying the consistency of conceptual schemata in information systems, verifying data integrity and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge so that the intended semantics become explicit and the reasoning is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, the use of less expressive OWL profiles enables efficient reasoning and supports different application scenarios. In this tutorial,
- we describe how to generate OWL ontologies from linked data
- check consistency of knowledge
- automatically transform ontologies into OWL profiles
- use this knowledge in applications to integrate data and answer sophisticated questions across domains.
- expressive ontologies enable data integration, verifying consistency of knowledge and answering questions
- formalization of linked data will create new opportunities for knowledge discovery
- OWL 2 profiles support more efficient reasoning and query answering procedures
- recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles
- OWL ontologies can dramatically extend the functionality of semantically-enabled web sites

Published in: Technology, Education

  1. 1. A little more semantics goes a lot further! Getting more out of Linked Data with OWL Dr. Michel Dumontier Dr. Robert Hoehndorf
  2. 2. Abstract This tutorial will provide detailed instruction to create and make use of formalized ontologies from linked open data for advanced knowledge discovery including consistency checking and answering sophisticated questions. Automated reasoning in OWL offers the tantalizing possibility to undertake advanced knowledge discovery including verifying the consistency of conceptual schemata in information systems, verifying data integrity and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge so that the intended semantics become explicit and the reasoning is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, the use of less expressive OWL profiles enables efficient reasoning and supports different application scenarios. In this tutorial, - we describe how to generate OWL ontologies from linked data - check consistency of knowledge - automatically transform ontologies into OWL profiles - use this knowledge in applications to integrate data and answer sophisticated questions across domains. - expressive ontologies enable data integration, verifying consistency of knowledge and answering questions - formalization of linked data will create new opportunities for knowledge discovery - OWL 2 profiles support more efficient reasoning and query answering procedures - recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles - OWL ontologies can dramatically extend the functionality of semantically-enabled web sites OWLED2011::Dumontier|Hoehndorf::Formalizing Linked Data Tutorial 2
  3. 3. skills obtained • understand the nature and capability of a formal ontology and information system • understand the subtle differences between OWL 2 and its profiles, including differences in constructs, when to apply these profiles and how to convert ontologies into them • understand the distinction between a class and an individual and their descriptions • understand how to convert RDF triples in Linked Data into axioms for an OWL ontology • understand how to execute standard reasoning services (classification, consistency checking, realization, query answering) on an OWL ontology using the OWL API and an OWL reasoner, with a focus on OWL EL ontologies and reasoners • understand how to identify inconsistencies and simple patterns to remove or repair them • understand how to convert large amounts of linked data into a large-scale OWL knowledge base and enable tractable reasoning over it
  4. 4. 90 min Outline 1. introduction (10min) • case study: SGD • linked data vs ontology • RDF vs OWL • Motivation: can we use some features of OWL to organize, verify and exploit Linked Data? 2. Formalization • OWL2 – elements, expressions and axioms • Triples to axioms • Role of top level ontologies (classes + relations) • Axiom patterns 3. Practical Reasoning • classification using CEL/CB/Pellet/HermiT/... • OWL profiles • Modularization (EL Vira) • Diagnosis and Repair • Explanations • Inference of new triples 4. Conclusion
  5. 5. Saccharomyces Genome Database A repository for all things yeast. Includes: • molecular entities, their parts o chromosomes; genes, open reading frames, etc. o RNA, proteins; domains • qualities, realizables (dispositions, functions) • interactions and their participants • complexes, their parts and their topology • pathways and their components • phenotypes and their basis
  6. 6. Hexokinase (HXK1) The HXK1 gene encodes the HXK1 protein, which is responsible for the conversion of glucose to glucose-6-phosphate in the first step of glycolysis. Gene: region of DNA. Protein: macromolecule.
  7. 7. Questions we may want to ask about HXK1: • What kind of thing is HXK1? • What are the implications of being a gene? o In which chromosome does it appear? o Which entities does it encode? • What are the implications of being a protein? o What is its function? o Where is it located in the cell? o If HXK1 participates in processes that involve other cellular components, where else must HXK1 be located? • Is the HXK1 annotation consistent? o Does the annotation contradict common biological knowledge? o Is it possible for HXK1 to have multiple locations when it can only be located on one chromosome?
  8. 8. SGD refers to other data sources • Gene Ontology: functions, locations, processes • Ascomycetes Phenotype Ontology: experiments, interactions and phenotypes • Pubmed: abstracts of published research articles + MeSH terms • over 40 references to other molecular/data entities for which the relation is unclear…
  9. 9. Bio2RDF’s RDFized data fits together: syntactic integration
  10. 10. SGD as RDF-based Linked Open Data
  11. 11. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ SGD is provided by Bio2RDF and forms part of the growing linked open data cloud
  12. 12. Semantic Integration • Requires a level of abstraction/generalization where the relationship between each resource is formalized – classes – relations • How do we ensure that our representation facilitates integration across datasets? • How can we get our formalization to interoperate with ontologies?
  13. 13. Early conceptualization
  14. 14. More advanced conceptualization
  15. 15. Semantic Technologies: RDF vs OWL RDF: simple triples, graph-based queries, supports very large amounts of data. OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but reasoning can be rather slow.
  16. 16. RDF-based Linked Data • Provides the basis for simple data syndication and syntactic data integration o IRIs o Statements (aka triples) take the form <subject> <predicate> <object> • Easy to implement o stand-alone datasets o logical layer over databases • Limited reasoning o class and property hierarchies o domain/range restrictions o can’t automatically discover inconsistency • Standardized Queries - SPARQL • Scalable - to billions of triples
  17. 17. OWL - The Web Ontology Language • Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values o quantifiers (existential, universal, cardinality restriction) o negation o disjunction o property characteristics o complex classes in domain and range restrictions o property chains • Advanced reasoning
  18. 18. Advanced Reasoning • Consistency: determines whether the ontology contains contradictions. • Satisfiability: determines whether classes can have instances. • Subsumption: is class C1 implicitly a subclass of C2? • Classification: repeated application of subsumption to discover implicit subclass links between named classes. • Realization: find the most specific class that an individual belongs to.
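To make the classification service above concrete, here is a minimal Python sketch. It assumes the ontology contains only atomic subClassOf axioms between named classes, so classification reduces to a transitive closure; a real OWL reasoner handles far richer axioms. The class names and the `classify` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def classify(asserted):
    """Return every (sub, super) pair implied by transitivity of subClassOf.

    `asserted` is a set of (subclass, superclass) pairs between named classes.
    """
    supers = defaultdict(set)
    for sub, sup in asserted:
        supers[sub].add(sup)

    closure = set()
    for start in list(supers):
        stack = list(supers[start])
        seen = set()
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue  # also protects against cyclic hierarchies
            seen.add(cls)
            stack.extend(supers.get(cls, ()))
        closure.update((start, sup) for sup in seen)
    return closure


# Hypothetical asserted hierarchy
asserted = {
    ("hexokinase", "protein"),
    ("protein", "molecule"),
    ("molecule", "material entity"),
}

# Implicit subclass links that were not asserted
inferred = classify(asserted) - asserted
for sub, sup in sorted(inferred):
    print(f"{sub} subClassOf {sup}")
```

The inferred pairs are the kind of "new triples" that could be written back into RDF after classification.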
  19. 19. OWL Challenges and Solutions Inconsistency: • needs to be resolved to ask any questions involving the ontology • Solution: explicitly accommodate multiple meanings, remove contradictory axioms Unsatisfiability (of a class): • may indicate a modelling error • needs to be resolved to ask meaningful questions about the class • Solution: explicitly accommodate multiple meanings, redefine class, remove contradicting class restrictions
  20. 20. OWL Challenges and Solutions Scalability: • answering OWL queries requires reasoning • inference in OWL is highly complex (worst case: 2NEXPTIME) • highly optimized reasoners are getting better and better, but can still be slow with large ontologies • tractable OWL profiles (EL, QL, RL) enable more efficient inference with guaranteed polynomial-time bounds • use ontology modularization approaches to increase performance
  21. 21. Linked data and OWL: Motivation • use OWL reasoning to identify mistakes in RDF data o incorrect content of assertions o incorrect use of relations o conflicting conceptualizations o incorrect same-as assertions • verify, fix and exploit Linked Data through expressive OWL reasoning • generate/infer new triples to write back into RDF and use for efficient retrieval Proposal: Convert RDF to OWL to perform inferences and represent inferences in RDF after classification.
  22. 22. OWL can help you create rich, machine-understandable descriptions! • transform our expert knowledge into axioms and expressions that can be automatically reasoned about o a transcription factor is a protein that binds to DNA and regulates the expression of a gene o can we mine 'omic datasets to discover which proteins are transcription factors? • create rich expressions from combinations of classes, relations and individuals • assert statements of truth using axioms
  23. 23. Elements of OWL 2.0 • The “ontology” of OWL 2 consists of: • Classes • Object properties • Data properties • Individuals • Expressions • Axioms • Plus RDF stuff (like datatypes)
  24. 24. Classes and class axioms • a class is a set of individuals that share one or more characteristics o a protein • classes can be organized in a hierarchy using subClassOf axioms o subClassOf (C1 C2) means that every member of C1 is a member of C2 o subClassOf (protein molecule) • special classes o owl:Thing is the superclass of all classes o owl:Nothing is the subclass of all classes and denotes the empty set • classes can be made disjoint from one another o i.e. there is no member of C1 that is also a member of C2 o disjointClasses (protein DNA) • classes can be said to be equivalent o i.e. all members of C1 are members of C2 and all members of C2 are members of C1 o equivalentClasses (Peptide Polypeptide)
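The interplay of subClassOf and disjointClasses axioms can be sketched in a few lines of Python. This is a simplified illustration, not how an OWL reasoner works internally: it assumes only atomic subclass and pairwise disjointness axioms, and flags a named class as unsatisfiable when it ends up below two disjoint classes. The class `chimera` and both helper functions are hypothetical.

```python
def superclasses(cls, sub_class_of):
    """All named superclasses of cls, including cls itself (reflexive, transitive)."""
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in sub_class_of:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def unsatisfiable(classes, sub_class_of, disjoint):
    """Classes that would have to be members of two disjoint classes at once."""
    bad = set()
    for cls in classes:
        sups = superclasses(cls, sub_class_of)
        if any(a in sups and b in sups for a, b in disjoint):
            bad.add(cls)
    return bad


# Hypothetical axioms: 'chimera' is asserted below both protein and DNA
sub_class_of = {
    ("protein", "molecule"),
    ("DNA", "molecule"),
    ("chimera", "protein"),
    ("chimera", "DNA"),
}
disjoint = {("protein", "DNA")}
classes = {"molecule", "protein", "DNA", "chimera"}

print(unsatisfiable(classes, sub_class_of, disjoint))
```

An unsatisfiable class is equivalent to owl:Nothing; it does not make the whole ontology inconsistent unless an individual is asserted to belong to it.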
  25. 25. Object Properties and axioms • an object property OP is a relation between two individuals o 'has part' is an object property that denotes the mereological relation between two individuals • OPs can be organized in a hierarchy o given OP1 and OP2, where OP2 is a subproperty of OP1: if an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y o subPropertyOf ('has proper part' 'has part') o owl:topObjectProperty, owl:bottomObjectProperty • We can restrict the domain and range to allowed values o ObjectPropertyDomain ('is participant in' 'physical entity') o ObjectPropertyRange ('is participant in' 'process') • We can also assert object properties to be disjoint or equivalent
  26. 26. description of object properties • Inverse o we say that 'has part' is an inverse of 'is part of' o we can also refer to this as inv('is part of') • Symmetric o applies to cases where the inverse relation is the very same relation o e.g. the inverse of 'is related to' is 'is related to' • Transitive o for a transitive relation, if an individual x is connected to an individual y that is in turn connected to an individual z, then x is also connected to z o e.g. 'has part' is transitive
  27. 27. description of object properties • Reflexive o a reflexive relation relates every individual to itself o e.g. 'has part' is reflexive because a protein has itself as a part • Functional o each individual can be related to at most one individual, so all values of the relation for a given individual must be the same o e.g. 'has unique identifier' • Inverse Functional o each individual can be the value of the relation for at most one individual, so all individuals that share a value must be the same o e.g. 'is unique identifier of'
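The effect of the inverse, symmetric and transitive characteristics described above can be illustrated with a small Python sketch that saturates a set of facts until nothing new follows. It is a naive fixpoint computation under the assumption that only these three characteristics are declared; the `saturate` function and the individual names are hypothetical.

```python
def saturate(facts, inverse_of=None, symmetric=(), transitive=()):
    """Apply inverse, symmetric and transitive property rules until a fixpoint.

    `facts` is an iterable of (subject, property, object) tuples.
    `inverse_of` maps a property to its inverse; the reverse direction is added here.
    """
    pairs = dict(inverse_of or {})
    pairs.update({inv: prop for prop, inv in list(pairs.items())})

    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in pairs:
                new.add((o, pairs[p], s))          # inverse rule
            if p in symmetric:
                new.add((o, p, s))                 # symmetric rule
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))        # transitive rule
        if new <= facts:
            return facts
        facts |= new


# Hypothetical individuals
facts = {
    ("cell", "has part", "nucleus"),
    ("nucleus", "has part", "chromosome"),
}
result = saturate(
    facts,
    inverse_of={"has part": "is part of"},
    transitive={"has part"},
)
for fact in sorted(result):
    print(fact)
```

Here transitivity adds that the cell has the chromosome as a part, and the inverse declaration adds the corresponding 'is part of' facts.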
  28. 28. Class Expressions Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, datatype properties, individuals) Protein subClassOf molecule and ‘has proper part’ min 2 ‘amino acid residues’ Combinations specified using logical operators • conjunction (and), disjunction (or), negation (not) Object or data property expressions provide a qualified cardinality over the relation o minimum: rel min # Y o maximum: rel max # Y o exact: rel exactly # Y (minimum + maximum) o some: rel min 1 Y
  29. 29. Class Expressions o The quantifications can be qualified by the object type o rel only Y – the only values allowed are of type Y • These form complex class expressions like o 'molecule' and not 'dna' o 'has part' min 2 'amino acid' o 'is located in' only ('nucleus' or 'cytoplasm') • which can be expressed as axioms in the ontology Protein subClassOf molecule and ‘has proper part’ min 2 ‘amino acid residues’ Transcription Factor equivalentClass ‘protein’ and ‘has disposition’ some ‘to bind to DNA’ and ‘has function’ some ‘to regulate gene expression’
  30. 30. Triples to axioms Convert RDF triples into OWL axioms. Triple in RDF: <Nucleus> <partOf> <Cell> • Nucleus and Cell are classes • partOf is a relation between 2 classes • intended meaning: every instance of Nucleus is partOf some instance of Cell • formalize as OWL axiom: Nucleus subClassOf partOf some Cell
  31. 31. Triples to axioms Triple in RDF: <Cytosol> <isLocationOf> <HXK1> • Cytosol and HXK1 are classes • isLocationOf is an axiom pattern involving 2 classes • intended meaning: every instance of HXK1 is located at some instance of Cytosol • not intended: for every instance of Cytosol, there is an instance of HXK1 located in it • formalize as: HXK1 subClassOf hasLocation some Cytosol, or, using the inverse property, HXK1 subClassOf inv(isLocationOf) some Cytosol
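The difference between the intended and the unintended reading can be checked on a tiny hand-built model. The following Python sketch is only an illustration over made-up instances (all names are hypothetical); it evaluates the "every X is related to some Y" reading in both directions.

```python
def every_has_some(sources, relation, targets):
    """True if every element of `sources` is related to at least one element of `targets`."""
    return all(any((s, t) in relation for t in targets) for s in sources)


# Hypothetical instances of the two classes
hxk1 = {"hxk1_a", "hxk1_b"}
cytosol = {"cytosol_1", "cytosol_2", "cytosol_3"}

# Instance-level facts: which HXK1 instance is located in which cytosol instance
has_location = {("hxk1_a", "cytosol_1"), ("hxk1_b", "cytosol_2")}
is_location_of = {(c, h) for h, c in has_location}

# Intended: HXK1 subClassOf hasLocation some Cytosol
intended = every_has_some(hxk1, has_location, cytosol)

# Not intended: Cytosol subClassOf isLocationOf some HXK1
not_intended = every_has_some(cytosol, is_location_of, hxk1)

print(intended, not_intended)
```

In this model the intended axiom holds, while the reverse reading fails because `cytosol_3` contains no HXK1 instance, which is why the direction of the axiom pattern matters.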
  32. 32. Triples to axioms Convert RDF triples into OWL axioms. Triple in RDF: <C1 R C2> • C1 and C2 are classes, R a relation between 2 classes • candidate intended meanings: o C1 SubClassOf: C2 o C1 SubClassOf: R some C2 o C1 SubClassOf: R only C2 o C2 SubClassOf: R some C1 o C1 SubClassOf: S some C2 o C1 DisjointFrom C2 o C1 and C2 SubClassOf: owl:Nothing o R some C1 DisjointFrom: R some C2 o C1 EquivalentClasses C2 o ... • in general: P(C1, C2), where P is an OWL axiom (template) Challenge: Formalizing data requires one to commit to a particular meaning – to make an ontological commitment
  33. 33. Triples to axioms Formalizing RDF triples in OWL often introduces new OWL object properties. Challenges: • Which object properties should be included? • What axioms hold for included object properties? • Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them?
  34. 34. Top level ontologies contain generalized (domain independent) classes and relations. They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms).
  35. 35. Basic classes in top-level ontologies • Material entity • Example: Apple, Human, Cell, Planet • Has mass as a quality • Located in space and time • Independent of other entities • Exists as a whole whenever it exists • Quality • Example: mass, color, concentration • Dependent: always the quality of some entity • Quality of object: size, shape, length • Quality of process: duration, rate • Quality of quality: shade (of color), intensity
  36. 36. Basic classes in top-level ontologies • Function • e.g. to bind, to catalyze (a reaction), to kill bacteria • Dependent: always the function of some thing • Similar to a property of an object • Represents the potential to do something (an action) in some process • capabilities, dispositions and tendencies • Process • Example: running a marathon, binding, cell division • Located in space and time • Independent of other entities • Temporally extended
  37. 37. Top-level ontologies make a commitment to these being different things: Material object, Process, Function and Quality are mutually disjoint.
  38. 38. Basic Relations in Top Level Ontologies • Mereological: parthood – ‘has part’, ‘has proper part’, ‘has component part’ • Participatory – ‘is participant in’, ‘is agent in’, ‘is target in’ • Topology – ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’ • Temporal – ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc. • Referential – ‘describes’, ‘references’, ‘represents’
  39. 39. Relations in top-level ontologies • relations (object properties) in OWL hold between instances • domain and range restrictions from top-level ontology can be applied for general relations, e.g.: o ‘has part’ can be restricted with "Material object" as both domain and range o ‘participates in’ can be restricted with a domain of "Material object" and a range of "Process" o re-use of relations enables inferences across resources
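A minimal Python sketch of how such domain and range restrictions, combined with the mutual disjointness of the top-level classes, can expose conflicting data. In OWL, domain and range axioms infer types rather than reject data, so the sketch first infers types and then looks for individuals typed with more than one top-level class. The individuals and the `find_conflicts` helper are hypothetical; the restrictions are the two examples given above.

```python
# Mutually disjoint top-level classes
TOP_LEVEL = {"Material object", "Process", "Function", "Quality"}

# Domain and range restrictions taken from the examples above
DOMAIN = {"has part": "Material object", "participates in": "Material object"}
RANGE = {"has part": "Material object", "participates in": "Process"}


def find_conflicts(types, facts):
    """Infer types from domain/range axioms and report disjointness clashes.

    `types` maps an individual to its asserted classes.
    `facts` is an iterable of (subject, property, object) tuples.
    """
    inferred = {ind: set(classes) for ind, classes in types.items()}
    for s, p, o in facts:
        if p in DOMAIN:
            inferred.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            inferred.setdefault(o, set()).add(RANGE[p])

    conflicts = {}
    for ind, classes in inferred.items():
        clash = classes & TOP_LEVEL
        if len(clash) > 1:
            conflicts[ind] = clash
    return conflicts


# Hypothetical data: a process is (wrongly) said to have a gene as a part
types = {"hxk1_gene": {"Material object"}, "glycolysis": {"Process"}}
facts = [
    ("hxk1_gene", "participates in", "glycolysis"),
    ("glycolysis", "has part", "hxk1_gene"),
]

conflicts = find_conflicts(types, facts)
print(conflicts)
```

The second fact forces `glycolysis` to be a Material object as well as a Process, which contradicts the disjointness commitment and would make an OWL ontology containing it inconsistent.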
  40. 40. Enforce ontological commitment by mapping to a top-level ontology Foundation of domain classes and relations in top-level ontology: • every domain class becomes a subclass of a class in the top-level ontology • every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology • assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate) o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, the additional constraint can be added
  41. 41. Top-level ontology Application of a top-level ontology: • can help to make the ontological commitment that is employed within an information system explicit, • can guarantee basic agreement about fundamental types, • can guarantee agreement about common relations, • provides common domain and range restrictions across multiple domains, and therefore • enables re-use of relations and types across data sources, domains, levels of granularity and information systems.
  42. 42. Formalization of SGD’s Linked Data SGD uses at least the following relations in RDF: • isPartOf • hasParticipant • isFunctionOf • isLocationOf Can we create axiom patterns from which linked data can be appropriately formalized into OWL axioms?
  43. 43. Formalization of SGD Linked Data ?X isPartOf ?Y Can be translated to axiom pattern ?X subClassOf: part-of some ?Y "part-of" is an object property contained in our top-level ontology. Example: HXK1 isPartOf chromosome6_Crick translated to HXK1 subClassOf: part-of some chromosome6_Crick
  44. 44. Formalization of SGD Linked Data ?X hasParticipant ?Y translated to axiom pattern ?Y subClassOf: participates-in some ?X "participates-in" is an object property contained in our top-level ontology. Example: GO:0005975 (carbohydrate metabolism) hasParticipant HXK1 translated to HXK1 subClassOf: participates-in some GO:0005975
  45. 45. Formalization of SGD Linked Data ?X isLocationOf ?Y translated to axiom pattern ?Y subClassOf: located-in some ?X Example: GO:0005737 (cytoplasm) isLocationOf HXK1 translated to HXK1 subClassOf: located-in some GO:0005737 What if "located-in" is not present in our top-level ontology…
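The three translations above can be sketched as a small lookup of axiom patterns in Python. This is an illustration only: it produces axiom strings in the notation used on the slides rather than real OWL objects, and the `PATTERNS` table and `triple_to_axiom` function are hypothetical names. The tutorial's own implementation is based on the OWL API.

```python
# Axiom patterns with two variables: {s} is the RDF subject, {o} the RDF object.
PATTERNS = {
    "isPartOf": "{s} subClassOf: part-of some {o}",
    "hasParticipant": "{o} subClassOf: participates-in some {s}",
    "isLocationOf": "{o} subClassOf: located-in some {s}",
}


def triple_to_axiom(subject, predicate, obj):
    """Expand an RDF triple into an axiom string using the pattern for its predicate."""
    template = PATTERNS.get(predicate)
    if template is None:
        raise KeyError(f"no axiom pattern defined for relation {predicate!r}")
    return template.format(s=subject, o=obj)


triples = [
    ("HXK1", "isPartOf", "chromosome6_Crick"),
    ("GO:0005975", "hasParticipant", "HXK1"),
    ("GO:0005737", "isLocationOf", "HXK1"),
]
axioms = [triple_to_axiom(*t) for t in triples]
for axiom in axioms:
    print(axiom)
```

Note that two of the patterns swap subject and object, so the resulting axiom is always stated about the class whose every instance must stand in the relation.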
  46. 46. Formalization of SGD Linked Data Top-level foundation for located-in relation: • declare located-in as sub-property of part-of o verify how located-in is used within SGD, i.e., does located-in imply part-of? o counter-example: misfolded protein located-in chaperone protein, but not misfolded protein part-of chaperone protein • create located-in as super-property of part-of in our top-level ontology: o does part-of imply located-in within SGD? o counter-example: cell body part-of cell, but not cell body located-in cell
  47. 47. Formalization of SGD Linked Data Top-level foundation for located-in relation: • add located-in to our top-level ontology o adding the new relation allows its reuse across multiple resources o inclusion may require addition of further classes (e.g., spatial regions) o relation to part-of must be clarified (and part-of may even be replaced by located-in) Establishing the relation between relations and classes depends on how the relations and classes are being applied.
  48. 48. Formalization of SGD Linked Data Top-level foundation: Translate HXK1 rdf:type OpenReadingFrame to HXK1 subClassOf: OpenReadingFrame OpenReadingFrame (Sequence Ontology) is a subclass of Sequence.
  50. 50. Formalization of SGD Linked Data Foundation for SGD classes in top-level ontology: • declare Sequence to be a subclass of Material object • import (owl:imports) Sequence Ontology • declare Biological Process (GO) subclass of Process • declare Molecular Function (GO) subclass of Function • import GO • ... to create a top-level foundation (i.e., super-class in top-level ontology for all classes) for SGD
  51. 51. Implementation • expand relations in RDF based on relational patterns • relational patterns are OWL axioms with 2 variables (which are filled by subject and object, respectively) • implementation based on OWL API • adopt implementation of relational patterns in OBO language (http://code.google.com/p/obo2owl/) Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED). paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441
  52. 52. Another way? • OPPL is an abstract formalism that allows for manipulating ontologies written in OWL. • Use OPPL to select triples and create the axioms
  53. 53. Operations on OWL ontologies • Consistency: determines whether the ontology contains contradictions. • Satisfiability: determines whether classes can have instances. • Subsumption: is class C1 implicitly a subclass of C2? • Classification: repeated application of subsumption to discover implicit subclass links between named classes. • Realization: find the most specific class that an individual belongs to. OWL reasoners can perform these operations and make the results accessible for further processing.
  54. 54. Practical reasoning with OWL ontologies • Ontology editors such as Protege interface with reasoners to perform consistency and class satisfiability checking, classification and realisation, and to provide explanations. • Some reasoners are set up to be used from the command line to execute requests, including SPARQL querying. • Programmatic use of reasoners via APIs: maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate
  55. 55. OWLAPI
  56. 56. Classifying the ontology
  57. 57. Classifying the ontology
  58. 58. OWL Reasoners OWL DL Reasoners • Pellet: Clark & Parsia, dual-licensed, Java. • Fact++: Manchester University, open-source, C++ with a Java API. • HermiT: Oxford University, open-source, Java. • Racer Pro: Racer Systems, commercial, Lisp with a Java API. OWL Profile/subset reasoners • Jena: Hewlett-Packard, open-source, Java. • OWLIM: Ontotext, dual-licensed, Java. • CB: • CEL: • JCEL (Pellet) • ELLY:
  59. 59. Automated reasoning over SGD • SGD in OWL contains more than 800,000 axioms • included ontologies contain several thousand axioms o GO has approx. 35,000 classes o ChEBI contains almost 100,000 classes o complex definitions of classes create links between large ontologies • Reasoning in OWL 2 DL is highly complex (worst-case 2NEXPTIME-complete). • Consequence: OWL reasoning can rarely be employed on a large scale. • Expressive OWL reasoners do not classify the formalized SGD repository.
  60. 60. OWL Profiles • OWL 2 defines three different tractable profiles: • EL o polynomial time reasoning for schema and data o Useful for ontologies with a large conceptual part • QL o fast (logspace) query answering using RDBMSs via SQL o Useful for large datasets already stored in RDBs • RL o fast (polynomial) query answering using rule-extended DBs o Useful for large datasets stored as RDF triples
  61. 61. OWL RL Features: • identity of classes, instances, properties • subproperties, subclasses, domains, ranges • union and intersection of classes (some restrictions) • property characterizations (functional, symmetric, etc) • property chains • keys • some property restrictions (but not all inferences are possible) Limitations: • not all datatypes are available • no datatype restrictions • no minimum or exact cardinality restrictions • maximum cardinality only with 0 and 1 • some consequences cannot be drawn
  62. 62. OWL EL Features • existential quantification to a class expression or data range • existential quantification to an individual or a literal • self-restriction • enumerations involving a single individual or a single literal • intersection of classes and data ranges • class axioms: subClassOf, equivalence, disjointness • property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys • assertions (sameAs, DifferentFrom, Class, Object Property, Data Property, Negative Object/Data Property) Not supported • universal quantification to a class expression or a data range • cardinality restrictions • disjunction (union) • class negation • enumerations involving more than one individual • object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional
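As a rough illustration of the feature lists above, the following Python sketch tests whether an axiom uses a class constructor that OWL EL does not support. It is deliberately simplified: axioms are encoded as nested tuples in a made-up format, and only the class constructors named above (universal quantification, cardinality, disjunction, negation) are checked, not property characteristics, enumerations or datatypes. All names are hypothetical.

```python
# Class constructors listed above as not supported in OWL EL
EL_UNSUPPORTED = {"only", "or", "not", "min", "max", "exactly"}


def constructors(expr):
    """Collect the constructor keywords used in a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return set()  # class names, property names and numbers are leaves
    head, *args = expr
    found = {head}
    for arg in args:
        found |= constructors(arg)
    return found


def in_el(axiom):
    """True if the axiom uses none of the unsupported class constructors."""
    return not (constructors(axiom) & EL_UNSUPPORTED)


# HXK1 subClassOf located-in some Cytosol
a1 = ("subClassOf", "HXK1", ("some", "located-in", "Cytosol"))

# Protein subClassOf molecule and 'has proper part' min 2 'amino acid residue'
a2 = ("subClassOf", "Protein",
      ("and", "molecule", ("min", 2, "has proper part", "amino acid residue")))

# 'is located in' only (nucleus or cytoplasm)
a3 = ("subClassOf", "X", ("only", "is located in", ("or", "nucleus", "cytoplasm")))

for name, axiom in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, "EL" if in_el(axiom) else "not EL")
```

A filter like this only selects asserted axioms; it does not recover EL consequences of the discarded axioms, which is the gap that modularization over the deductive closure addresses.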
  63. 63. Ontology modularization Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology? 1. D EquivalentTo: not A (not EL) 2. C EquivalentTo: not B (not EL) 3. B subClassOf: A (EL) Inference: • D subClassOf: C (EL) (Inference from (1)-(3)) EL module of (1)-(3): • {B subClassOf: A}, or • {B subClassOf: A, D subClassOf: C}
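The inference in this example can be sanity-checked by brute force. The Python sketch below enumerates every interpretation of A and B over small finite domains, defines D and C as the complements required by axioms (1) and (2), and checks that whenever axiom (3) holds, D is a subset of C. It is a finite check under the stated assumptions, not a proof procedure and not how EL Vira works; the function names are hypothetical.

```python
from itertools import combinations, product


def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def entailed_on_small_domains(max_size=3):
    """Check 'D subClassOf C' in every model of axioms (1)-(3) with up to max_size elements."""
    for n in range(1, max_size + 1):
        domain = frozenset(range(n))
        for a, b in product(subsets(domain), repeat=2):
            d = domain - a          # axiom 1: D EquivalentTo: not A
            c = domain - b          # axiom 2: C EquivalentTo: not B
            if b <= a and not d <= c:  # axiom 3 holds but D subClassOf C fails
                return False
    return True


print(entailed_on_small_domains())
```

The check finds no counter-model, in line with the slide: "D subClassOf: C" is an EL axiom that follows from (1)-(3) even though two of the premises are outside EL.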
  64. 64. EL Vira modularization • ontology modularization • identify EL, QL, RL axioms in deductive closure • retain signature of ontology • maximality is an open problem http://el-vira.googlecode.com
  65. Consistency repair
      • Unsatisfiable classes result from contradictory class definitions
      • Conflicts can arise in asserted axioms, in imported ontologies, or through a combination of both
      • Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc.
      • Conflicting axioms may be challenging to identify!
  66. Protege 4: Explanations
  67. Consistency repair
  68. Ontology repair and disambiguation
      • Ontological constraints may have been too strong
      • Complex relations (between classes) that are used with multiple meanings can be relaxed by explicitly introducing a disjunction that accommodates the different meanings, e.g.:
        o (1) Hxk1 part-of Chromosome6_Crick_strand
        o (2) Hxk1 part-of Hxk1_ATP_complex
        o (3) Hxk1 part-of Carbohydrate_metabolism
        o only (1) is consistent with the background knowledge that genes (as material objects) must be part of material objects (more specifically DNA), and that genes cannot be part of protein complexes
  69. Ontology repair and disambiguation
      1. Hxk1 part-of Chromosome6_Crick_strand
      2. Hxk1 part-of Hxk1_ATP_complex
      3. Hxk1 part-of Carbohydrate_metabolism
      part-of here means either
      • ?X subClassOf: part-of some ?Y, or
      • ?X subClassOf: encodes some (part-of some ?Y), or
      • ?X subClassOf: participates-in some ?Y, or
      • ?X subClassOf: encodes some (participates-in some ?Y)
  70. Ontology repair and disambiguation
      • ?X subClassOf: part-of some ?Y, or
      • ?X subClassOf: encodes some (part-of some ?Y), or
      • ?X subClassOf: participates-in some ?Y, or
      • ?X subClassOf: encodes some (participates-in some ?Y)
      All four interpretations are disjoint!
      Create a new interpretation for part-of:
      ?X subClassOf: part-of some ?Y or encodes some (part-of some ?Y) or participates-in some ?Y or encodes some (participates-in some ?Y)
  71. Inference of revised RDF representation
      • Query the OWL ontology for the relational patterns that were used in relation expansion
      • This generates the deductive closure of a set of RDF triples with respect to inferences in OWL
      • Naive implementation:
        o given a pattern P(?X, ?Y), substitute all combinations of named classes for ?X and ?Y
        o runtime: n*n reasoner queries for n named classes
        o a more efficient implementation is work in progress
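The naive implementation described above can be sketched as a loop over all ordered pairs of named classes. The entailment test below is a toy stand-in for a reasoner call; the class names, the asserted edges, the treatment of part-of as transitive, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy data: asserted part-of edges between named classes.
ASSERTED_PART_OF = {("Nucleus", "Cell"), ("Cell", "Organism")}


def toy_is_entailed(x, relation, y):
    """Stand-in for a reasoner query 'x subClassOf: relation some y'.

    Treats the relation as transitive and answers by reachability over
    the asserted edges; a real system would ask an OWL reasoner instead.
    """
    frontier, reached = [x], set()
    while frontier:
        current = frontier.pop()
        for source, target in ASSERTED_PART_OF:
            if source == current and target not in reached:
                reached.add(target)
                frontier.append(target)
    return y in reached


def naive_pattern_closure(named_classes, relation, is_entailed):
    """Substitute every ordered pair of named classes into P(?X, ?Y).

    Returns the entailed triples and the number of entailment queries,
    which is n*n for n named classes.
    """
    triples, queries = [], 0
    for x, y in product(named_classes, repeat=2):
        queries += 1
        if is_entailed(x, relation, y):
            triples.append((x, relation, y))
    return triples, queries


classes = ["Nucleus", "Cell", "Organism"]
triples, queries = naive_pattern_closure(classes, "part-of", toy_is_entailed)
print(queries)  # 9
for triple in triples:
    print(triple)
```

The quadratic query count is what makes this approach expensive on large ontologies, since each query may itself require non-trivial reasoning.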
  72. Inference of revised RDF representation
      In the definition:
      ?X subClassOf: part-of some ?Y or encodes some (part-of some ?Y) or participates-in some ?Y or encodes some (participates-in some ?Y)
      one or more of the classes in the disjunction may become unsatisfiable!
      • reasoner can be used to decide which interpretation is correct
      • eliminate remaining interpretations
      • useful to "split" relations in RDF that have multiple conflicting meanings
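The elimination step can be illustrated with a toy lookup that stands in for the reasoner's satisfiability check. Only the constraint that genes are part of DNA and not of protein complexes comes from the slides; the kind assigned to each class and the remaining constraints are hypothetical choices made so that the example runs.

```python
# Hypothetical kinds for the three classes that Hxk1 is "part-of".
KIND = {
    "Chromosome6_Crick_strand": "DNA",
    "Hxk1_ATP_complex": "ProteinComplex",
    "Carbohydrate_metabolism": "Process",
}

# For a gene ?X, the kind ?Y must have for each reading to be satisfiable.
# None marks a reading assumed to be unsatisfiable for genes in this toy.
REQUIRED_KIND_OF_Y = {
    "part-of some ?Y": "DNA",  # from the slides
    "encodes some (part-of some ?Y)": "ProteinComplex",    # assumption
    "participates-in some ?Y": None,                       # assumption
    "encodes some (participates-in some ?Y)": "Process",   # assumption
}


def satisfiable_readings(y):
    """Return the readings of 'Hxk1 part-of y' that remain satisfiable."""
    return [
        reading
        for reading, kind in REQUIRED_KIND_OF_Y.items()
        if kind == KIND[y]
    ]


for y in KIND:
    print(y, "->", satisfiable_readings(y))
```

In this toy each statement keeps exactly one reading, so the ambiguous part-of triple can be replaced by a triple with a specific meaning. With an actual ontology, the lookup table would be replaced by reasoner calls that test each disjunct for satisfiability.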
  73. Summary - RDF and OWL
      RDF provides
      • light-weight semantics
      • fast queries
      • highly scalable implementations
      • large volumes of data (e.g., DBpedia, other Linked Data repositories)
      OWL provides
      • constructs to formalize the intended semantics
      • the OWL API to develop, manage, and serialize OWL ontologies
      • efficient reasoners to get inferences, compute modules and get explanations
      • syntactic subsets (profiles) for better performance, although some inferences may be lost
  74. Summary - Reasoning in OWL
      • verification: reveal contradictory definitions of classes (unsatisfiable classes), conflicting conceptualizations and reveal hidden inferences (that may be considered invalid through manual verification)
      • repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced
      • more facts: OWL queries for relational patterns can be used to generate RDF triples that are closed against the constraints and axioms of an OWL knowledge base
      • powerful queries: queries in OWL can be made for instances and for classes satisfying complex expressions
  75. Conclusions
      • ontologies are tools for better knowledge management
      • ontology (philosophy) is a useful source of well-developed theories that can be applied to ontology design, but only when put into practice as a formalized ontology
      • formal ontologies can help in getting us closer to the goal of large-scale integration, verification and analysis of data across domains and levels of granularity
      • the combination of formal ontologies + scalable reasoning will be instrumental in making sense of the Semantic Web
