ISMB2011 Tutorial: Biomedical Ontologies for data integration and verification
Transcript

  • 1. Biomedical Ontologies for data integration and verification
    Michel Dumontier and Robert Hoehndorf
    Carleton University, University of Cambridge
    ISMB tutorial @ Vienna, July 16, 2011
    ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies 1
  • 2. Outline
    1. General background (10 min)
       o an introduction to the use case: systems biology, SBML and BioModels
    2. Ontological analysis (45 min)
       o how to express domain content as formal knowledge using the Web Ontology Language (OWL)
    3. Application of formal ontology to consistency and data verification (30 min)
       o how to use the OWL formalization to verify the accuracy of annotations, data and constraints in a domain
    4. Break (30 min)
    5. Mapping, repair and disambiguation using ontologies (30 min)
       o how to relax and disambiguate constraints on ontologies to obtain a consistent representation of domain content
    6. Knowledge discovery, retrieval and querying (15 min)
       o how to answer questions that require the inference of knowledge through automated reasoning
    7. Efficient implementation in software systems (15 min)
       o how to convert ontologies into efficient formal representations amenable to high-throughput analyses
    8. Applications in Bioinformatics (25 min)
       o how the formalized ontologies can be used to perform bioinformatics analyses
    – Discussion and questions (15 min)
  • 3. Systems Biology
    We create and simulate biological models to:
    • gain insight into the structure and function of biochemical networks
    • reveal metabolic and signalling capabilities so as to predict phenotypes
    • undertake metabolic engineering to maximize some desired product
    To do this, we need:
    • to integrate and manage our data and knowledge in a coherent, scalable and machine-understandable manner
    • efficient software to execute computationally demanding simulations
  • 4. Bio-ontologies
    • Provide rich human- and machine-understandable descriptions of the terms they purport to describe
    • Have value for the semantic annotation of data, which allows integration across domains (granularity, species, experimental methods)
    • Facilitate granular and cross-domain queries
    • Can be used to obtain explanations for inferences drawn
    • Can be efficiently processed by algorithms and software
  • 5. BioModels are semantically annotated SBML models
    • EBI-managed resource
    • 600+ models available as SBML
    • 300+ models are curated with GO process, function and component terms, and have links to protein databases
    • Possible to browse by GO terms: http://www.ebi.ac.uk/biomodels-main/
  • 6. Objective: Computational Knowledge Discovery
    • Terminological resources are increasingly being used to annotate SBML-based biomolecular models
      o This makes it easier to explore or find models
    • By converting models into formal representations of knowledge, we can:
      o validate the accuracy of the annotations
      o infer knowledge that is implicit in terminological resources
      o discover biological implications inherent in the models
  • 7. SBML
    An XML-based representation of biochemical models, their components (compartments, species, reactions, events) and descriptors (rules, constraints, functions, units).
    Consider the following enzymatic reaction:
  • 8. SBML captures reaction kinetics using an XML-based format
      <?xml version="1.0" encoding="UTF-8"?>
      <sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
        <model name="EnzymaticReaction">
          <listOfUnitDefinitions>
            <unitDefinition id="per_second">
              <listOfUnits>
                <unit kind="second" exponent="-1"/>
              </listOfUnits>
            </unitDefinition>
            <unitDefinition id="litre_per_mole_per_second">
              <listOfUnits>
                <unit kind="mole" exponent="-1"/>
                <unit kind="litre" exponent="1"/>
                <unit kind="second" exponent="-1"/>
              </listOfUnits>
            </unitDefinition>
          </listOfUnitDefinitions>
          <listOfCompartments>
            <compartment id="cytosol" size="1e-14"/>
          </listOfCompartments>
          <listOfSpecies>
            <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
            <species compartment="cytosol" id="P" initialAmount="0" name="P"/>
            <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/>
            <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/>
          </listOfSpecies>
  • 9.
          <listOfReactions>
            <reaction id="veq">
              <listOfReactants>
                <speciesReference species="E"/>
                <speciesReference species="S"/>
              </listOfReactants>
              <listOfProducts>
                <speciesReference species="ES"/>
              </listOfProducts>
              <kineticLaw>
                <math xmlns="http://www.w3.org/1998/Math/MathML">
                  <apply>
                    <times/>
                    <ci>cytosol</ci>
                    <apply>
                      <minus/>
                      <apply>
                        <times/>
                        <ci>kon</ci>
                        <ci>E</ci>
                        <ci>S</ci>
                      </apply>
                      <apply>
                        <times/>
                        <ci>koff</ci>
                        <ci>ES</ci>
                      </apply>
                    </apply>
                  </apply>
                </math>
                <listOfParameters>
                  <parameter id="kon" value="1000000" units="litre_per_mole_per_second"/>
                  <parameter id="koff" value="0.2" units="per_second"/>
                </listOfParameters>
              </kineticLaw>
            </reaction>
  • 10.
            <reaction id="vcat" reversible="false">
              <listOfReactants>
                <speciesReference species="ES"/>
              </listOfReactants>
              <listOfProducts>
                <speciesReference species="E"/>
                <speciesReference species="P"/>
              </listOfProducts>
              <kineticLaw>
                <math xmlns="http://www.w3.org/1998/Math/MathML">
                  <apply>
                    <times/>
                    <ci>cytosol</ci>
                    <ci>kcat</ci>
                    <ci>ES</ci>
                  </apply>
                </math>
                <listOfParameters>
                  <parameter id="kcat" value="0.1" units="per_second"/>
                </listOfParameters>
              </kineticLaw>
            </reaction>
          </listOfReactions>
        </model>
      </sbml>
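To see what the two kinetic laws encode, here is a minimal forward-Euler sketch in Python. It assumes SBML Level 2 semantics: a species symbol inside a kinetic law denotes its concentration (amount divided by compartment size), and each kinetic law returns a rate in amount per second. The step size and end time are arbitrary illustrative choices and do not come from the model.

```python
# Forward-Euler sketch of the EnzymaticReaction model (E + S <=> ES -> E + P).
# Assumption: species symbols in kinetic laws are concentrations (amount / size),
# and each kinetic law yields an amount rate (mol/s), as in SBML Level 2.

CYTOSOL = 1e-14                    # compartment size (litre)
KON, KOFF, KCAT = 1e6, 0.2, 0.1    # litre/mol/s, 1/s, 1/s


def rates(amounts):
    """Return (veq, vcat) in mol/s for the current species amounts."""
    conc = {k: v / CYTOSOL for k, v in amounts.items()}
    veq = CYTOSOL * (KON * conc["E"] * conc["S"] - KOFF * conc["ES"])
    vcat = CYTOSOL * KCAT * conc["ES"]
    return veq, vcat


def simulate(t_end=100.0, dt=0.01):
    """Integrate species amounts from the initialAmount values in the model."""
    amounts = {"E": 5e-21, "S": 1e-20, "ES": 0.0, "P": 0.0}
    for _ in range(int(round(t_end / dt))):
        veq, vcat = rates(amounts)
        # Stoichiometry: veq consumes E and S and produces ES;
        # vcat consumes ES and releases E and P.
        amounts["E"] += dt * (-veq + vcat)
        amounts["S"] += dt * (-veq)
        amounts["ES"] += dt * (veq - vcat)
        amounts["P"] += dt * vcat
    return amounts


final = simulate()
print({k: f"{v:.3e}" for k, v in final.items()})
```

Because each update is written directly from the stoichiometry, total enzyme (E + ES) and total substrate (S + ES + P) are conserved up to floating-point rounding. This is a quick way to check that the reactant and product lists match the intended mechanism.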
  • 11. SBML models may feature several components
  • 12. SBML specifies the number and kind of attributes that models and components can have
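Because SBML fixes which attributes each component may carry, a model can be read mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` to pull the compartments, species and reactions out of a trimmed copy of the EnzymaticReaction model above, with the unit definitions and kinetic laws omitted. It handles only this simple Level 2 shape. A real tool would use a dedicated SBML library rather than raw XML paths.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the EnzymaticReaction model (units and kinetic laws omitted).
SBML = """<sbml level="2" version="3" xmlns="http://www.sbml.org/sbml/level2/version3">
  <model name="EnzymaticReaction">
    <listOfCompartments>
      <compartment id="cytosol" size="1e-14"/>
    </listOfCompartments>
    <listOfSpecies>
      <species compartment="cytosol" id="ES" initialAmount="0" name="ES"/>
      <species compartment="cytosol" id="P" initialAmount="0" name="P"/>
      <species compartment="cytosol" id="S" initialAmount="1e-20" name="S"/>
      <species compartment="cytosol" id="E" initialAmount="5e-21" name="E"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="veq">
        <listOfReactants>
          <speciesReference species="E"/>
          <speciesReference species="S"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="ES"/>
        </listOfProducts>
      </reaction>
      <reaction id="vcat" reversible="false">
        <listOfReactants>
          <speciesReference species="ES"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="E"/>
          <speciesReference species="P"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version3"}


def summarize(sbml_text):
    """Return (model name, compartments, species, reactions) from an SBML L2 string."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    compartments = {c.get("id"): float(c.get("size"))
                    for c in model.iterfind("sbml:listOfCompartments/sbml:compartment", NS)}
    species = {s.get("id"): (s.get("compartment"), float(s.get("initialAmount")))
               for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)}
    reactions = {}
    for r in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        lhs = [ref.get("species")
               for ref in r.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        rhs = [ref.get("species")
               for ref in r.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        # In SBML Level 2, 'reversible' defaults to true when the attribute is absent.
        reversible = r.get("reversible", "true") == "true"
        reactions[r.get("id")] = (lhs, rhs, reversible)
    return model.get("name"), compartments, species, reactions


name, compartments, species, reactions = summarize(SBML)
for rid, (lhs, rhs, rev) in reactions.items():
    print(rid, " + ".join(lhs), "<=>" if rev else "->", " + ".join(rhs))
```

Note that nothing in the XML itself says what "S" or "cytosol" is in the world. That gap is what the annotations discussed next are meant to fill.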
  • 13. It’s up to the modeler to use those attributes in a meaningful way. What models have you produced?
  • 15. Energy (ATP) is produced from glycolysis (the breakdown of glucose) in a series of enzyme-catalyzed biochemical reactions. Fermentation regenerates NAD+ so it can be re-used to metabolize more glucose. Analysis and optimization of metabolic pathways is important for biotechnology.
  • 16. Gene Ontology
    • over 30,000 terms
    • covers
      o biological processes
      o molecular functions
      o cellular components
    • terms organized around an "is a" hierarchy
    • terms further described with has part/part of, regulates, positively regulates and negatively regulates
  • 17. Chemical Entities of Biological Interest (ChEBI)
    • recently refactored to be in line with formal (reasoning-capable) ontology
    • scope includes chemical entities (atoms, substances, groups, molecules), roles and subatomic particles
    • large numbers of curated molecules
  • 18. SBML annotations are captured using the Resource Description Framework (RDF)
      <species metaid="_525530" id="GLCi" compartment="cyto"
               initialConcentration="0.097652231064563">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:dcterms="http://purl.org/dc/terms/"
                   xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
                   xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#_525530">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
                  <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    o The XML attributes of <species> describe an implicit subject; the <annotation> element stores the RDF.
    o In the RDF, rdf:Description (via rdf:about) is the subject, bqbiol:is is the predicate, and the rdf:Bag of resources is the object.
    The intent is to express that the species represents a substance composed of glucose molecules.
    We also know from the SBML model that this substance is located in the cytosol, with an (initial) concentration of 0.09765 M.
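As a small illustration of reading such annotations, the following sketch extracts the resources attached through a given BioModels qualifier. It uses Python's standard ElementTree and `urllib.parse.unquote`. The helper name `qualifier_targets` is hypothetical, and the code only follows the Description/qualifier/Bag/li nesting shown above. The SBML namespace on `<species>` is added here so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

SPECIES = """<species xmlns="http://www.sbml.org/sbml/level2/version3"
         metaid="_525530" id="GLCi" compartment="cyto"
         initialConcentration="0.097652231064563">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#_525530">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
            <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"


def qualifier_targets(species_xml, qualifier="is"):
    """Return decoded resource URIs linked to this element by bqbiol:<qualifier>."""
    elem = ET.fromstring(species_xml)
    about = "#" + elem.get("metaid")   # the RDF subject points back via the metaid
    targets = []
    for desc in elem.iter(f"{{{RDF}}}Description"):
        if desc.get(f"{{{RDF}}}about") != about:
            continue
        path = f"{{{BQBIOL}}}{qualifier}/{{{RDF}}}Bag/{{{RDF}}}li"
        for li in desc.iterfind(path):
            # MIRIAM URNs may percent-encode identifiers (CHEBI%3A4167 -> CHEBI:4167).
            targets.append(unquote(li.get(f"{{{RDF}}}resource")))
    return targets


print(qualifier_targets(SPECIES))
```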
  • 19. Annotated models contain references to entities described elsewhere
    • PubMed: papers
    • ChEBI: chemicals
    • UniProt: proteins
    • KEGG: chemicals, reactions
    • E.C.: reactions
    • Gene Ontology: functions, reactions, compartments
    • Taxonomy: organism
      <model>
        <annotation>
          <bqmodel:isDescribedBy>
            <rdf:Bag>
              <rdf:li rdf:resource="urn:miriam:pubmed:17667951"/>
            </rdf:Bag>
          </bqmodel:isDescribedBy>
          <bqbiol:hasPart>
            <rdf:Bag>
              <rdf:li rdf:resource="urn:miriam:kegg.pathway:sce00010"/>
              <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0019642"/>
            </rdf:Bag>
          </bqbiol:hasPart>
          <bqmodel:is>
            <rdf:Bag>
              <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
            </rdf:Bag>
          </bqmodel:is>
  • 20. It looks like another XML syntax, but it has RDF semantics!
    What is the meaning of SBML’s RDF annotation?
      <rdf:Description rdf:about="#_551383">
        <bqmodel:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
          </rdf:Bag>
        </bqmodel:is>
      </rdf:Description>
    • The intent is to indicate that the model is a model of yeast.
    • RDF semantics: #_551383 is related by bqmodel:is to an anonymous collection (an rdf:Bag) whose single member is yeast (taxonomy:4932).
    • The RDF semantics does not match the intent!
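To make the mismatch concrete, the sketch below expands the annotation into the triples an RDF/XML parser would produce. Under the RDF/XML rules, an `rdf:Bag` node element becomes a blank node typed `rdf:Bag`, and each `rdf:li` becomes a numbered membership property `rdf:_1`, `rdf:_2`, and so on. This is a hand-rolled illustration that covers only this nesting, not a general RDF/XML parser. The relative reference `#_551383` is left unresolved because no base URI is given, and the blank-node labels `_:b1`, ... are arbitrary.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQMODEL = "http://biomodels.net/model-qualifiers/"

ANNOTATION = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bqmodel="{BQMODEL}">
  <rdf:Description rdf:about="#_551383">
    <bqmodel:is>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:taxonomy:4932"/>
      </rdf:Bag>
    </bqmodel:is>
  </rdf:Description>
</rdf:RDF>"""


def clark_to_iri(tag):
    """'{ns}local' -> 'nslocal': RDF/XML concatenates namespace and local name."""
    ns, local = tag[1:].split("}")
    return ns + local


def bag_triples(rdf_xml):
    """Expand Description/property/Bag/li nesting into (s, p, o) triples."""
    blank_ids = itertools.count(1)
    triples = []
    for desc in ET.fromstring(rdf_xml).iterfind(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            for bag in prop.iterfind(f"{{{RDF}}}Bag"):
                node = f"_:b{next(blank_ids)}"     # the Bag is an anonymous node
                triples.append((subject, clark_to_iri(prop.tag), node))
                triples.append((node, RDF + "type", RDF + "Bag"))
                for n, li in enumerate(bag.iterfind(f"{{{RDF}}}li"), start=1):
                    # rdf:li is shorthand for rdf:_1, rdf:_2, ...
                    triples.append((node, f"{RDF}_{n}", li.get(f"{{{RDF}}}resource")))
    return triples


for triple in bag_triples(ANNOTATION):
    print(triple)
```

The model is never linked directly to the taxon. It is linked to a container that happens to contain the taxon, which is exactly the gap between the RDF reading and the modeler's intent.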
  • 21. Can we formalize and automatically verify the intended meaning of the RDF annotation?
    BioModels.net biology qualifiers: is (identity)
    The biological entity represented by the model element has identity with the subject of the referenced resource (modeling object B). This relation might be used to link a reaction to its exact counterpart in a database, for instance.
  • 22. BioModels: Qualifiers
    Qualifiers for the biological object represented by the model component:
    • encodes / isEncodedBy
    • hasPart / isPartOf
    • hasProperty / isPropertyOf
    • hasVersion / isVersionOf
    • is
    • isDescribedBy
    • isHomologTo
    • occursIn
    http://www.ebi.ac.uk/miriam/main/qualifiers/
  • 23. In this tutorial
    You will learn how to create accurate knowledge representations of annotated SBML models.
    Features:
    • ontological commitment: terms in a vocabulary correspond to formally defined classes and relations, and expressions formulated using the Web Ontology Language (OWL) have an unambiguous interpretation
    • an upper-level ontology of types and relations to distinguish model entities and constrain them to the spatio-temporal entities they represent
    • reasoning to uncover inconsistencies, and how to repair them
    • advanced applications of OWL ontologies for answering questions and providing biological insight
  • 24. What is a model? How does it differ from the thing it is a model of?
  • 25. Conceptualization (SBML)
    • 2 kinds of entities:
      o in silico: model components
      o in vivo: the entities represented by a model
  • 26. Conceptualization
  • 27. SBML Conceptualization
    • Instances of SBML model entities are syntactic entities (in XML)
    • SBML models represent biological phenomena and structures (e.g., cell cycle processes, yeast cells, ...)
    • Here we focus on Model, Compartment, Species and Reaction
  • 28. Formalization
    • Formalization is the process by which we map a conceptualization into a logical representation, which has a particular interpretation.
    • We first express the basic nature of what the terms refer to by defining them using a formal language. Next, we can logically combine the terms to form expressions, which have an unambiguous interpretation and hence can be automatically reasoned about.
  • 29. Have you heard of the Semantic Web?
  • 30. The Semantic Web
    It is about standards for publishing, sharing and querying knowledge drawn from diverse sources.
    It enables answering sophisticated questions.
  • 31. The Semantic Web effort aims to develop an interoperable set of standards for knowledge representation and reasoning
  • 32. URI/IRI
    • Uniform Resource Identifiers (URIs) and Internationalized Resource Identifiers (IRIs) are identifiers for resources, given a particular protocol
    • We’re familiar with Uniform Resource Locators (URLs), which specify the use of the HTTP protocol to obtain a document with that identifier
      – http://dumontierlab.com
    • Internationalized Resource Identifiers (IRIs) include an expanded set of international characters
    • URIs/IRIs are the basis for naming resources on the Semantic Web
      – As names, they can also be used to identify non-information resources, like people and places
  • 33. Entity naming
    • Uniform Resource Identifiers (URIs) are identifiers for resources given a particular protocol. Internationalized Resource Identifiers (IRIs) include an expanded set of international characters.
    • URIs/IRIs can be used to name entities, both digital media and non-information entities like people and places.
    • Uniform Resource Name (URN): only a name
      o MIRIAM: Minimal Information Required In the Annotation of Models
        - data source and identifier combined in a single IRI: urn:miriam:source:identifier
        - e.g. urn:miriam:uniprot:P62158
        - ~ 40 sources defined at the EBI registry...
    • Uniform Resource Locator (URL): a resolvable name
      o Bio2RDF: makes life sciences data available on the Semantic Web
      o http://bio2rdf.org/uniprot:P62158
      o content-type negotiation and explicit URLs resolve to an HTML/RDF/etc. description of it
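The two naming schemes differ mainly in whether the name can be dereferenced. The sketch below splits a MIRIAM URN into its data source and identifier and then builds a Bio2RDF-style URL. The function names are hypothetical. The URL construction assumes that the MIRIAM source name equals the Bio2RDF namespace, which holds for the uniprot example above but not in general (for instance, `obo.chebi` would need an explicit mapping).

```python
from urllib.parse import unquote


def parse_miriam_urn(urn):
    """Split 'urn:miriam:<source>:<identifier>' into (source, identifier)."""
    prefix = "urn:miriam:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    source, sep, identifier = urn[len(prefix):].partition(":")
    if not sep or not identifier:
        raise ValueError(f"missing identifier in {urn!r}")
    # Identifiers may be percent-encoded, e.g. CHEBI%3A4167 -> CHEBI:4167.
    return source, unquote(identifier)


def to_bio2rdf(urn):
    """Hypothetical mapping: assumes the MIRIAM source name is the Bio2RDF namespace."""
    source, identifier = parse_miriam_urn(urn)
    return f"http://bio2rdf.org/{source}:{identifier}"


print(to_bio2rdf("urn:miriam:uniprot:P62158"))
```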
  • 34. Semantic Technologies: RDF vs OWL
    RDF: simple triples, graph-based queries, supports very large amounts of data
    OWL: significantly more expressive language, strong axioms, inference capabilities, consistency verification, but can be rather slow
  • 35. Resource Description Framework (RDF) allows one to talk about anything
    Uniform Resource Identifiers (URIs) can be used as entity names. Bio2RDF specifies its naming convention:
    • http://bio2rdf.org/uniprot:P05067: uniprot:P05067 is a name for Amyloid precursor protein
    • http://bio2rdf.org/omim:104300: omim:104300 is a name for Alzheimer disease
  • 36. Resource Description Framework (RDF) allows one to express statements
    An RDF statement consists of:
    – Subject: resource identified by a URI (e.g. uniprot:P05067)
    – Predicate: resource identified by a URI (e.g. rdf:type, rdfs:label)
    – Object: resource or literal (e.g. uniprot:Protein, “Amyloid precursor protein”)
  • 37. RDF has multiple serializations
    RDF/XML
      <?xml version="1.0"?>
      <!DOCTYPE rdf:RDF [<!ENTITY u "http://bio2rdf.org/uniprot:">]>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:u="http://bio2rdf.org/uniprot:">
        <rdf:Description rdf:about="&u;Q16665">
          <rdf:type rdf:resource="&u;Protein"/>
        </rdf:Description>
      </rdf:RDF>
    RDF/N3
      @prefix u: <http://bio2rdf.org/uniprot:> .
      u:Q16665 a u:Protein .
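Different serializations encode the same triples. As a rough check, the sketch below reads the RDF/XML form with Python's standard ElementTree and prints it as a line of N-Triples, which is the fully expanded form of the N3 example. It writes out full URIs instead of the `&u;` entity to stay independent of DTD handling. It also only understands `rdf:Description` elements whose properties carry `rdf:resource`, so it is a toy converter, not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about="http://bio2rdf.org/uniprot:Q16665">
    <rdf:type rdf:resource="http://bio2rdf.org/uniprot:Protein"/>
  </rdf:Description>
</rdf:RDF>"""


def to_ntriples(rdf_xml):
    """Toy RDF/XML -> N-Triples for Description elements with rdf:resource values."""
    lines = []
    for desc in ET.fromstring(rdf_xml).iterfind(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            ns, local = prop.tag[1:].split("}")   # property IRI = namespace + local name
            obj = prop.get(f"{{{RDF}}}resource")
            lines.append(f"<{subject}> <{ns}{local}> <{obj}> .")
    return "\n".join(lines)


print(to_ntriples(RDF_XML))
```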
  • 38. Multi-Source Data Integration
    Syntactic data integration depends on consistent naming.
    [Figure: statements about uniprot:P05067 from three sources: UniProt (uniprot:P05067 is a uniprot:Protein), Gene Ontology (uniprot:P05067 located in go:Membrane) and iRefIndex (uniprot:P05067 interacts with uniprot:P05067). Because all three use the same name, they combine into a unified view.]
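A minimal sketch of why consistent naming is enough for syntactic integration: if each source contributes a set of triples, the unified view is just their union grouped by subject. The predicate names `located_in` and `interacts_with` are hypothetical stand-ins for the relations in the figure.

```python
from collections import defaultdict

# Statements contributed by three sources (predicate names are illustrative).
uniprot = {("uniprot:P05067", "rdf:type", "uniprot:Protein")}
gene_ontology = {("uniprot:P05067", "located_in", "go:Membrane")}
irefindex = {("uniprot:P05067", "interacts_with", "uniprot:P05067")}


def unified_view(*sources):
    """Union the triple sets; statements join wherever sources reuse the same name."""
    merged = set().union(*sources)
    by_subject = defaultdict(set)
    for s, p, o in merged:
        by_subject[s].add((p, o))
    return by_subject


view = unified_view(uniprot, gene_ontology, irefindex)
for p, o in sorted(view["uniprot:P05067"]):
    print("uniprot:P05067", p, o)
```

If one source had written the protein as, say, `UniProtKB:P05067`, its statements would land under a different key and the join would silently fail. Coordinated naming schemes such as Bio2RDF's exist to avoid this.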
  • 39. Building statements creates knowledge
    [Figure: uniprot:P05067 (label “Amyloid precursor protein”, is a Protein) is involved in omim:104300 (label “Alzheimer Disease”, is a Disease).]
  • 40. Bio2RDF’s RDFized data fits together: syntactic integration
  • 41. SGD as RDF-based Linked Open Data
  • 42. Bio2RDF links and provisions 40 high-value datasets
  • 43. Bio2RDF is now serving over 40 billion triples of linked biological data
  • 44. SGD is provided by Bio2RDF and forms part of the growing Linked Open Data cloud
    Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 45. Semantic Integration
    • Requires a level of abstraction/generalization where the relationship between each resource is formalized
      – classes
      – relations
      – individuals
    • How do we ensure that our representation facilitates integration across datasets?
    • How can we get our formalization to interoperate with ontologies?
  • 46. RDF-based Linked Data
    • Provides the basis for simple data syndication and syntactic data integration
      o IRIs
      o Statements (aka triples) take the form <subject> <predicate> <object>
    • Easy to implement
      o stand-alone datasets
      o logical layer over databases
    • Limited reasoning
      o class and property hierarchies
      o domain/range restrictions
      o can’t automatically discover inconsistency
    • Standardized queries: SPARQL
    • Scalable to billions of triples
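The core of a SPARQL query is a basic graph pattern: a conjunction of triple patterns whose `?variables` must bind consistently. The following is a toy evaluator that illustrates that idea over an in-memory list of triples. It is not SPARQL and not any library's API. The predicate `is_involved_in` and the class `omim:Disease` are hypothetical names based on the earlier APP/Alzheimer example.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):              # variable position
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:                       # constant position must match exactly
            return None
    return new


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        extended_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


graph = [
    ("uniprot:P05067", "rdf:type", "uniprot:Protein"),
    ("uniprot:P05067", "rdfs:label", "Amyloid precursor protein"),
    ("uniprot:P05067", "is_involved_in", "omim:104300"),
    ("omim:104300", "rdf:type", "omim:Disease"),
    ("omim:104300", "rdfs:label", "Alzheimer disease"),
]
results = query(graph, [
    ("?protein", "rdf:type", "uniprot:Protein"),
    ("?protein", "is_involved_in", "?disease"),
    ("?disease", "rdfs:label", "?name"),
])
print(results)
```

Pattern matching like this only finds what is asserted. It cannot notice that two statements contradict each other, which is the "can’t automatically discover inconsistency" limitation above.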
  • 47. What do you know of OWL?
  • 48. The Web Ontology Language (OWL) has explicit semantics
    It can therefore be used to capture knowledge in a machine-understandable way.
  • 49. OWL: The Web Ontology Language
    • Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values
      o quantifiers (existential, universal, cardinality restriction)
      o negation
      o disjunction
      o property characteristics
      o complex classes in domain and range restrictions
      o property chains
    • Advanced reasoning
  • 50. Advanced Reasoning
    • Consistency: determines whether the ontology contains contradictions.
    • Satisfiability: determines whether classes can have instances.
    • Subsumption: is class C1 implicitly a subclass of C2?
    • Classification: repeated application of subsumption to discover implicit subclass links between named classes.
    • Realization: finds the most specific class that an individual belongs to.
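Classification can be pictured as computing every subclass link that follows from the axioms and reporting those that were not asserted. The sketch below does this only for the told SubClassOf hierarchy, by taking its transitive closure. A real OWL reasoner also derives subsumptions from defined classes, restrictions and other axioms, so treat this as an illustration of the output of classification, not of how reasoners compute it. The class names are illustrative.

```python
def superclasses(cls, asserted):
    """All transitive superclasses of cls under asserted SubClassOf edges."""
    seen, stack = set(), [cls]
    while stack:
        for sup in asserted.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def classify(asserted):
    """Return implicit (sub, super) links: entailed by transitivity but not asserted."""
    classes = set(asserted) | {s for sups in asserted.values() for s in sups}
    implicit = set()
    for c in classes:
        for sup in superclasses(c, asserted) - set(asserted.get(c, ())):
            implicit.add((c, sup))
    return implicit


asserted = {
    "transcription factor": {"protein"},
    "protein": {"macromolecule"},
    "macromolecule": {"molecule"},
}
print(sorted(classify(asserted)))
```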
  • 51. OWL Challenges and Solutions
    Inconsistency:
    • needs to be resolved to ask any questions involving the ontology
    • Solution: explicitly accommodate multiple meanings, remove contradictory axioms
    Unsatisfiability (of a class):
    • may indicate a modelling error
    • needs to be resolved to ask meaningful questions about the class
    • Solution: explicitly accommodate multiple meanings, redefine the class, remove contradicting class restrictions
  • 52. OWL Challenges and Solutions
    Scalability:
    • answering OWL queries requires reasoning
    • inference in OWL 2 DL is highly complex (worst case: 2NEXPTIME-complete)
    • highly optimized reasoners are getting better and better, but can still be slow with large ontologies
    • tractable OWL 2 profiles (EL, QL, RL) enable more efficient inference with guaranteed polynomial-time reasoning
    • ontology modularization approaches can be used to increase performance
  • 53. OWL can help you create rich, machine-understandable descriptions!
    • transform our expert knowledge into axioms and expressions that can be automatically reasoned about
      o a transcription factor is a protein that binds to DNA and regulates the expression of a gene
      o can we mine omic datasets to discover which proteins are transcription factors?
    • create rich expressions from combinations of classes, relations and individuals
    • assert statements of truth using axioms
  • 54. Linked data and OWL: Motivation
    • use OWL reasoning to identify mistakes in RDF data
      o incorrect content of assertions
      o incorrect use of relations
      o conflicting conceptualizations
      o incorrect same-as assertions
    • verify, fix and exploit Linked Data through expressive OWL reasoning
    • generate/infer new triples to write back into RDF and use for efficient retrieval
    Proposal: represent SBML BioModels in OWL using the implicit relations and explicit attributes in XML/RDF.
  • 55. Elements of OWL 2
    • The “ontology” of OWL 2 consists of:
      • Classes
      • Object properties
      • Data properties
      • Individuals
      • Expressions
      • Axioms
      • Plus RDF elements (like datatypes)
  • 56. Axiomatization
    • Axioms are statements that are assumed to be true in the domain.
    • Axioms formally interrelate the terms from the conceptualization step.
    Every statement can be reduced to an expression based only on primitive terms.
    Therefore, every axiom can be expressed using only primitive terms.
  • 57. Classes and class axioms
    • a class is a set of individuals that share one or more characteristics
      o e.g. protein
    • classes can be organized in a hierarchy using SubClassOf axioms
      o SubClassOf (C2 C1): every member of C2 is a member of C1
      o SubClassOf (protein molecule)
    • special classes
      o owl:Thing is the superclass of all classes
      o owl:Nothing is a subclass of all classes; it denotes the empty set
    • classes can be made disjoint from one another
      o i.e. there is no member of C1 that is also a member of C2
      o DisjointClasses (protein DNA)
    • classes can be said to be equivalent
      o i.e. all members of C1 are members of C2 and all members of C2 are members of C1
      o EquivalentClasses (Peptide Polypeptide)
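Disjointness axioms are what make modelling errors detectable: any class that inherits from two disjoint classes cannot have members, so it is unsatisfiable. The sketch below checks only this one pattern, using named SubClassOf edges and DisjointClasses pairs. It is far weaker than a DL reasoner, which also accounts for restrictions, negation and equivalences. The class "protein-DNA chimera" is a hypothetical, deliberately faulty example.

```python
def ancestors(cls, subclass_of):
    """cls together with all of its asserted superclasses (transitively)."""
    result, stack = {cls}, [cls]
    while stack:
        for sup in subclass_of.get(stack.pop(), ()):
            if sup not in result:
                result.add(sup)
                stack.append(sup)
    return result


def unsatisfiable(subclass_of, disjoint_pairs):
    """Named classes that fall under both members of some DisjointClasses pair."""
    classes = set(subclass_of) | {s for sups in subclass_of.values() for s in sups}
    bad = set()
    for c in classes:
        anc = ancestors(c, subclass_of)
        if any(a in anc and b in anc for a, b in disjoint_pairs):
            bad.add(c)
    return bad


subclass_of = {
    "protein": {"molecule"},
    "DNA": {"molecule"},
    "DNA-binding protein": {"protein"},
    "protein-DNA chimera": {"protein", "DNA"},   # hypothetical modelling error
}
disjoint = [("protein", "DNA")]
print(unsatisfiable(subclass_of, disjoint))
```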
  • 58. Object Properties and axioms
    • an object property OP is a relation between two individuals
      o has part is an object property that denotes the mereological relation between two individuals
    • OPs can be organized in a hierarchy
      o given OP1 and OP2, where OP2 is a subproperty of OP1: if an individual x is connected by OP2 to an individual y, then x is also connected by OP1 to y
      o SubObjectPropertyOf (has proper part, has part)
      o owl:topObjectProperty, owl:bottomObjectProperty
    • We can restrict the domain and range to allowed values
      o ObjectPropertyDomain (is participant in, physical entity)
      o ObjectPropertyRange (is participant in, process)
    • We can also assert object properties to be disjoint or equivalent
  • 59. Description of object properties
    • Inverse
      o we say that has part is an inverse of is part of
      o we can also refer to this as inv(is part of)
    • Symmetric
      o applies where the inverse relation is the very same relation
      o e.g. the inverse of is related to is is related to
    • Transitive
      o a relation R is transitive if, whenever an individual x is connected by R to an individual y that is connected by R to an individual z, x is also connected by R to z
      o e.g. has part is transitive
  • 60. Description of object properties
    • Reflexive
      o every individual is connected to itself by the relation
      o e.g. has part (including improper parts) is reflexive: a protein has itself as a part
    • Functional
      o each individual can be connected to at most one individual; if x is connected to both y and z, then y and z must be the same individual
      o e.g. has unique identifier
    • Inverse Functional
      o each individual can be reached from at most one individual; if both x and y are connected to z, then x and y must be the same individual
      o e.g. is unique identifier of
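Property characteristics license new triples. The sketch below forward-chains three of them (transitivity, symmetry and inverses) over individual-level facts until nothing new is derived. It is a rule-based illustration in the spirit of simple rule engines, and it ignores reflexivity, functionality and everything else a full OWL reasoner handles. The individuals and the "interacts with" relation are hypothetical examples.

```python
def saturate(triples, transitive=(), symmetric=(), inverses=()):
    """Forward-chain transitivity, symmetry and inverse axioms to a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for (x, p, y) in facts:
            if p in symmetric:
                new.add((y, p, x))
            for a, b in inverses:               # a and b are mutually inverse
                if p == a:
                    new.add((y, b, x))
                elif p == b:
                    new.add((y, a, x))
            if p in transitive:
                for (y2, p2, z) in facts:
                    if p2 == p and y2 == y:
                        new.add((x, p, z))
        if new <= facts:                        # nothing new: fixpoint reached
            return facts
        facts |= new


facts = saturate(
    {("cell", "has part", "nucleus"),
     ("nucleus", "has part", "chromosome"),
     ("protein A", "interacts with", "protein B")},
    transitive={"has part"},
    symmetric={"interacts with"},
    inverses=[("has part", "is part of")],
)
for fact in sorted(facts):
    print(fact)
```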
  • 61. Class Expressions
    Class expressions are rich descriptions of classes through the logical combination of ontological primitives (classes, object properties, datatype properties, individuals).
      Protein subClassOf molecule and ‘has proper part’ min 2 ‘amino acid residues’
    Combinations are specified using logical operators:
    • conjunction (and), disjunction (or), negation (not)
    Object or data property expressions provide a qualified cardinality over the relation:
      o minimum: rel min # Y
      o maximum: rel max # Y
      o exact: rel exactly # Y (minimum + maximum)
      o some: rel some Y (equivalent to rel min 1 Y)
  • 62. Class Expressions
    o The quantifications can be qualified by the object type
    o rel only Y: the only values allowed are of type Y
    • These let us form complex class expressions like
      o molecule and not dna
      o has part min 2 amino acid
      o is located in only (nucleus or cytoplasm)
    • and express them as axioms in the ontology:
      Protein subClassOf molecule and ‘has proper part’ min 2 ‘amino acid residues’
      Transcription Factor equivalentTo ‘protein’ and ‘has disposition’ some ‘to bind to DNA’ and ‘has function’ some ‘to regulate gene expression’
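To make the meaning of `min`, `some` and `only` concrete, the sketch below evaluates them over a small, fully known interpretation (a fixed set of individuals and edges). This is a model check, not OWL reasoning. OWL is open-world, so a reasoner would not conclude that p2 violates the axiom merely because a second residue is not recorded. The individuals and the relation name are hypothetical.

```python
# A small, fully specified interpretation (closed world, for illustration only).
types = {
    "p1": {"protein"}, "p2": {"protein"}, "d1": {"DNA"},
    "r1": {"amino acid"}, "r2": {"amino acid"}, "r3": {"amino acid"},
    "n1": {"nucleotide"},
}
edges = {
    ("p1", "has proper part", "r1"), ("p1", "has proper part", "r2"),
    ("p2", "has proper part", "r3"),
    ("d1", "has proper part", "n1"),
}


def fillers(x, rel):
    return [y for (s, r, y) in edges if s == x and r == rel]


def named(cls):
    return {x for x in types if cls in types[x]}


def min_card(rel, n, cls):
    """Individuals with at least n rel-successors of type cls."""
    return {x for x in types if sum(cls in types[y] for y in fillers(x, rel)) >= n}


def some(rel, cls):
    return min_card(rel, 1, cls)           # rel some Y == rel min 1 Y


def only(rel, cls):
    """Individuals all of whose rel-successors are of type cls (true if none)."""
    return {x for x in types if all(cls in types[y] for y in fillers(x, rel))}


print(sorted(min_card("has proper part", 2, "amino acid")))   # ['p1']
print(sorted(only("has proper part", "amino acid")))
# Check 'Protein subClassOf has proper part min 2 amino acid' in this interpretation:
violations = named("protein") - min_card("has proper part", 2, "amino acid")
print(sorted(violations))                                      # ['p2']
```

Note that `only` is vacuously satisfied by individuals with no parts at all (r1, n1, ...). This is why universal restrictions are usually paired with an existential one.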
  • 63. What do the following mean, and what biological thing might you annotate with it?
      C equivalentTo ‘has part’ exactly 2 polypeptide
      M subClassOf DNA and not molecule
  • 64. OWL has multiple syntaxes
    Functional-Style Syntax
      ClassAssertion( :Person :Robert )
    RDF Syntax
      RDF/XML:
        <Person rdf:about="Robert"/>
      RDF Turtle:
        :Robert rdf:type :Person .
    Manchester Syntax
      Individual: Robert
        Types: Person
    OWL/XML Syntax
      <ClassAssertion>
        <Class IRI="Person"/>
        <NamedIndividual IRI="Robert"/>
      </ClassAssertion>
  • 65. OWL Reasoners
    OWL DL Reasoners
    • Pellet: Clark & Parsia, dual-licensed, Java.
    • FaCT++: Manchester University, open-source, C++ with a Java API.
    • HermiT: Oxford University, open-source, Java.
    • RacerPro: Racer Systems, commercial, Lisp with a Java API.
    OWL Profile/subset reasoners
    • Jena: Hewlett-Packard, open-source, Java.
    • OWLIM: Ontotext, dual-licensed, Java.
    • CB:
    • CEL:
    • JCEL (Pellet)
    • ELLY:
  • 66. Formalization of XML/RDF using OWL
    • For every triple, we want to create an axiom that makes a commitment as to what the terms refer to and what their combination necessarily implies.
    • We will also commit to expressing our knowledge in a consistent manner, which will allow other information resources to be semantically integrated (the expressions are comparable and share the same semantics).
  • 67. Triples to axioms
    Convert RDF triples into OWL axioms.
    Triple in RDF: <nucleus> <part-of> <cell>
    • Nucleus and Cell are classes
    • part-of is a relation between 2 classes
    • intended meaning: every instance of Nucleus is part-of some instance of Cell
    • formalize as OWL axiom: Nucleus SubClassOf: part-of some Cell
  • 68. Triples to axioms: many possible formalizations. Knowledge of logics and domain expertise comes in handy here!
    Convert RDF triples into OWL axioms.
    Triple in RDF: <C1 R C2>
    • C1 and C2 are classes, R a relation between 2 classes
    • intended meaning, for example:
      o C1 SubClassOf: C2
      o C1 SubClassOf: R some C2
      o C1 SubClassOf: R only C2
      o C2 SubClassOf: R some C1
      o C1 SubClassOf: S some C2
      o C1 DisjointFrom: C2
      o C1 and C2 SubClassOf: owl:Nothing
      o R some C1 DisjointFrom: R some C2
      o C1 EquivalentClasses C2
      o ...
    • in general: P(C1, C2), where P is an OWL axiom (template)
    Challenge: formalizing data requires one to commit to a particular meaning, that is, to make an ontological commitment.
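The point that one triple admits many formalizations can be shown by treating each candidate as a template P(C1, C2) and instantiating all of them. The sketch below uses Manchester-syntax class frames (`SubClassOf:`, `DisjointWith:`, `EquivalentTo:`). The template names are hypothetical, and the set of templates is a subset of the list above. Choosing one of the outputs is the ontological commitment.

```python
# Candidate axiom templates P(C1, C2) for a triple <C1 R C2> (Manchester frames).
TEMPLATES = {
    "subclass": "Class: {c1} SubClassOf: {c2}",
    "existential": "Class: {c1} SubClassOf: {r} some {c2}",
    "universal": "Class: {c1} SubClassOf: {r} only {c2}",
    "inverse existential": "Class: {c2} SubClassOf: {r} some {c1}",
    "disjoint": "Class: {c1} DisjointWith: {c2}",
    "equivalent": "Class: {c1} EquivalentTo: {c2}",
}


def formalizations(c1, r, c2):
    """Instantiate every template for one triple; picking one is the commitment."""
    return {name: t.format(c1=c1, r=r, c2=c2) for name, t in TEMPLATES.items()}


for name, axiom in formalizations("Nucleus", "part-of", "Cell").items():
    print(f"{name:20} {axiom}")
```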
  • 69. Triples to axioms
    Triple in RDF: <Cytosol> <isLocationOf> <HXK1>
    • Cytosol and HXK1 are classes
    • isLocationOf is an axiom pattern involving 2 classes
    • intended meaning: every instance of HXK1 is located in some instance of Cytosol
    • not intended: for every instance of Cytosol, there is an instance of HXK1 located in it
    HXK1 SubClassOf: hasLocation some Cytosol
    (equivalently, HXK1 SubClassOf: inv(isLocationOf) some Cytosol)
  • 70. Triples to axioms
    Challenges
    Formalizing RDF triples in OWL may introduce new OWL object properties.
    • Which object properties should be included?
    • What axioms hold for the included object properties?
    • Can domain and range restrictions be generalized across multiple domains, i.e., reused across multiple linked data sources to ensure consistency between them?
    Integration of OWL ontologies requires a common semantic platform.
  • 71. Axiom patterns for triples
    <nucleus> <part-of> <cell> matches the triple pattern ?X part-of ?Y
    • translated to the axiom pattern ?X SubClassOf: part-of some ?Y
    • result: Nucleus SubClassOf: part-of some Cell
  • 72. Implementation
    • expand relations in RDF based on relational patterns
    • relational patterns are OWL axioms with two variables (filled by the subject and object of a triple, respectively)
    • implementation based on the OWL API
    • adopts the implementation of relational patterns in the OBO language (http://code.google.com/p/obo2owl/)
    Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich, and Rebholz-Schuhmann, Dietrich (2010). Relational patterns in OWL and their application to OBO. OWL: Experiences and Directions (OWLED).
    paper: http://www.webont.org/owled/2010/papers/owled2010_submission_3.pdf
    presentation: http://www.slideshare.net/micheldumontier/relational-patterns-in-owl-and-their-application-to-obo
    BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/11/441
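A minimal sketch of the relational-pattern idea, assuming a pattern is simply a Manchester-syntax template with the two variables ?X (subject) and ?Y (object). It uses plain Python string substitution rather than the OWL API named on the slide, and the pattern table (including the hasLocation property used to read isLocationOf backwards) is illustrative, not the obo2owl implementation.

```python
# Hypothetical pattern table: relation name -> axiom template over ?X and ?Y.
RELATIONAL_PATTERNS = {
    "part-of": "?X SubClassOf: part-of some ?Y",
    # isLocationOf is read "backwards": the object is located in the subject.
    "isLocationOf": "?Y SubClassOf: hasLocation some ?X",
}


def expand_triple(subject: str, relation: str, obj: str) -> str:
    """Instantiate the axiom pattern registered for `relation`."""
    if relation not in RELATIONAL_PATTERNS:
        raise ValueError(f"no axiom pattern registered for relation {relation!r}")
    template = RELATIONAL_PATTERNS[relation]
    # Plain string substitution; assumes class names never contain "?X" or "?Y".
    return template.replace("?X", subject).replace("?Y", obj)


for s, r, o in [("Nucleus", "part-of", "Cell"), ("Cytosol", "isLocationOf", "HXK1")]:
    print(expand_triple(s, r, o))
```

Keeping patterns as data rather than code means that changing the ontological commitment for a relation (any of the alternatives listed for <C1 R C2>) only touches the table.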
  • 73. Another way? http://oppl2.sourceforge.net/
    • OPPL is an abstract formalism for manipulating ontologies written in OWL.
    • Use OPPL to select triples and create the axioms.
  • 74. Which types and relations should we use for our axiom patterns?
  • 75. Top-level ontologies contain generalized (domain-independent) classes and relations
    They can be used to constrain what can be said about these entities (and hence will later be useful for checking the consistency of data annotated using these terms).
  • 76. Basic classes in top-level ontologies
    • Material entity
      o Example: Apple, Human, Cell, Planet
      o Has mass as a quality
      o Located in space and time
      o Independent of other entities
      o Exists in whole whenever it exists
    • Quality
      o Example: mass, color, concentration
      o Dependent: always the quality of some entity
      o Quality of object: size, shape, length
      o Quality of process: duration, rate
      o Quality of quality: shade (of color), intensity
  • 77. Basic classes in top-level ontologies
    • Function
      o Example: to bind, to catalyze (a reaction), to kill bacteria
      o Dependent: always the function of some thing
      o Similar to a property of an object
      o Represents the potential to do something (an action) in some process
      o capabilities, dispositions and tendencies
    • Process
      o Example: running a marathon, binding, cell division
      o Located in space and time
      o Independent of other entities
      o Temporally extended
  • 78. Top-level ontologies can make a commitment to these being disjoint
    Material object, Process, Function and Quality are mutually disjoint.
  • 79. Basic relations in top-level ontologies
    • relations (object properties) in OWL hold between instances
    • Mereological (parthood): ‘has part’, ‘has proper part’, ‘has component part’
    • Participatory: ‘is participant in’, ‘is agent in’, ‘is target in’
    • Spatial: ‘is connected to’, ‘located in’, ‘contains’, ‘is adjacent to’
    • Temporal: ‘derives from’, ‘precedes’, ‘meets’, ‘overlaps’, etc.
    • Referential: ‘describes’, ‘denotes’, ‘represents’
  • 80. Relations in top-level ontologies
    • domain and range restrictions from the top-level ontology can be applied to general relations, e.g.:
      o ‘has material part’ can be restricted with "Material object" as both domain and range
      o ‘participates in’ can be restricted with a domain of "Material object" and a range of "Process"
    • re-use of relations (between instances) enables inferences across resources
  • 81. Relations impose additional constraints, such that inconsistencies arise when they are used incorrectly
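To see how disjoint top-level categories turn a misused relation into an inconsistency, here is a toy check. It assumes that each domain class is aligned directly to exactly one of the mutually disjoint categories and that triples are expanded with the `S SubClassOf: R some O` pattern, so a domain or range clash makes S unsatisfiable. The alignment table and the hyphenated relation names are hypothetical.

```python
# Hypothetical alignment of domain classes to mutually disjoint top-level categories.
CATEGORY = {
    "Cell": "MaterialObject",
    "HXK1": "MaterialObject",
    "Glycolysis": "Process",
    "GTP binding": "Function",
}

# Domain and range restrictions, following the examples on the slide.
SIGNATURE = {
    "has-material-part": ("MaterialObject", "MaterialObject"),
    "participates-in": ("MaterialObject", "Process"),
}


def violations(triples):
    """Report each subject/object whose category contradicts the domain/range.
    Since the categories are disjoint, every report marks an unsatisfiable use."""
    problems = []
    for subj, rel, obj in triples:
        domain, range_ = SIGNATURE[rel]
        if CATEGORY[subj] != domain:
            problems.append(f"{subj} {rel} {obj}: {subj} is a {CATEGORY[subj]}, domain is {domain}")
        if CATEGORY[obj] != range_:
            problems.append(f"{subj} {rel} {obj}: {obj} is a {CATEGORY[obj]}, range is {range_}")
    return problems


print(violations([("HXK1", "participates-in", "Glycolysis"),
                  ("Cell", "participates-in", "GTP binding")]))
```

Only the second triple is reported: a function cannot be the target of ‘participates in’ once its range is restricted to processes.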
  • 82. Alignment with a top-level ontology
    Foundation of domain classes and relations in a top-level ontology:
    • every domain class becomes a subclass of a class in the top-level ontology
    • every object property used in OWL axioms becomes a sub-property of an object property in the top-level ontology
    • assert additional axioms to restrict domain classes and delimit them from other domains (where appropriate)
      o e.g., if a particular resource uses (in RDF) the relation part-of exclusively between processes, this additional constraint can be added to the relation
  • 83. What’s the role of top level ontologies?
  • 84. Top-level ontology
    Application of a top-level ontology:
    • can help to make explicit the ontological commitment employed within an information system,
    • can guarantee basic agreement about fundamental, common types,
    • can guarantee basic agreement about common relations,
    • provides common domain and range restrictions across multiple domains, and therefore
    • enables re-use of relations and types across data sources, domains, levels of granularity, and information systems.
  • 85. Formalization of SBML models
    • SBML models and model annotations are converted into OWL axioms by making SBML's ontological commitment explicit
    • Implementation as conversion patterns
    An explicit ontological commitment establishes and implements a one-to-one correspondence between SBML expressions and a formal interpretation within an ontology.
  • 86. Bridging the gap: combine in vivo entities and in silico entities in a common model (an ontology) defined with axioms
  • 87. Formalization
    Reaction: "A reaction represents some transformation, transport or binding process, typically a chemical reaction, that can change the amount of one or more species." (Hucka et al.)
    vs.
    a Model component that is part-of a Model and represents some Process
  • 88. Formalizing SBML models using OWL
    Model component(x): a model entity that is part of a model
    ‘model component’ EquivalentTo: ‘model entity’ and part-of some Model
  • 89. Assumption 1: Every model represents a material entity
    OWL axiom: Model SubClassOf: represents some MaterialEntity
    Conversion rule for a model M annotated with class C:
    • if C SubClassOf: MaterialEntity, then M SubClassOf: represents some C
    • if C SubClassOf: Function, then M SubClassOf: represents some (has-function some C)
    • if C SubClassOf: Process, then M SubClassOf: represents some (has-function some (realized-by only C))
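The conversion rule can be read as a dispatch on the top-level category of the annotation class. The sketch below assumes that category is already known (in practice it would follow from reasoning over the imported ontologies) and only emits the Manchester-syntax axiom as a string; `model_axiom` and `category_of` are hypothetical names.

```python
from typing import Callable


def model_axiom(model: str, annotation: str, category_of: Callable[[str], str]) -> str:
    """Apply the Assumption 1 conversion rule to one model annotation."""
    category = category_of(annotation)
    if category == "MaterialEntity":
        filler = annotation
    elif category == "Function":
        filler = f"(has-function some {annotation})"
    elif category == "Process":
        filler = f"(has-function some (realized-by only {annotation}))"
    else:
        raise ValueError(f"no conversion rule for category {category!r}")
    return f"{model} SubClassOf: represents some {filler}"


# Hypothetical category lookup for three GO classes used in these slides.
CATEGORIES = {"GO:0005623": "MaterialEntity", "GO:0005525": "Function", "GO:0031684": "Process"}
print(model_axiom("M", "GO:0031684", CATEGORIES.__getitem__))
```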
  • 90. BIOMODEL 82: converting the Model
    Annotated with heterotrimeric G-protein complex cycle (GO:0031684):
    • M represents an object O1
    • O1 has a function F1
    • F1 is realized by processes of the type heterotrimeric G-protein complex cycle
    • M SubClassOf: represents some O1
    • O1 SubClassOf: has-function some (realized-by only GO:0031684)
  • 91. Assumption 2: Every compartment represents a material object
    Compartment(x): a model component that represents a material object which is part of the object represented by the model to which the component belongs
    Compartment SubClassOf: ‘model component’ and represents some ‘material object’
    Conversion rule for a compartment C:
    • C represents an object O2
    • O2 is part of the object O1 represented by the model
    • the compartment's species represent objects that are located in O2
    • C SubClassOf: represents some O2
    • O2 SubClassOf: part-of some O1
  • 92. BIOMODEL 82: converting compartment “Cell”
    Annotated with Cell (GO:0005623):
    • C represents an object O2
    • O2 is a kind of Cell
    • O2 is a part of O1 (represented by BIOMODEL 82)
    • C SubClassOf: represents some O2
    • O2 SubClassOf: Cell and part-of some O1
  • 93. Assumption 3: Every species represents a material object
    Species(x): a model component that represents a material object which is part of the entity represented by the compartment of which the species is a part
    Species SubClassOf: ‘model component’ and represents some ‘material object’
    A species represents an object O3 which
    • can have functions
    • whose functions can be realized by processes
    • can have qualities (charge, amount, …)
    • is located in O2
  • 94. BIOMODEL 82: converting species “GTP”
    Annotated with GTP (CHEBI:15996):
    • S represents an object O3
    • O3 is a kind of GTP
    • O3 is located-in O2 (represented by the “Cell” compartment)
    • S SubClassOf: represents some O3
    • O3 SubClassOf: GTP and located-in some O2
    • expanding O2 and O1: O3 SubClassOf: GTP and located-in some (Cell and part-of some (has-function some (realized-by only GO:0031684)))
  • 95. Reactions as functions, not processes
    Reactions represent functions. Why not processes?
    - Functions are capabilities, while processes are manifestations of these capabilities
    - Processes have a duration, a time of occurrence, participants, etc.
    - Functions can be realized multiple times; processes occur only once
    - Processes may be represented by simulations
  • 96. Assumption 4: Every reaction represents a functional entity
    Reaction(x): a model component that can include reactants, products and modifiers and represents a functional entity
    Reaction SubClassOf: ‘model component’ and represents some (‘material entity’ and ‘has function’ some Function)
    ListOfReactions(x): a List that has only Reactions as members
    ListOfReactions EquivalentTo: List and has-member only Reaction
  • 97. BIOMODEL 82: converting reaction “GTP-binding”
    Annotated with GTP binding (GO:0005525):
    • R represents an object O4
    • O4 has a function F4
    • F4 is a kind of GTP binding
    • F4 is realized by P4
    • P4 has-input O3 (GTP)
    • R SubClassOf: represents some (has-function some F4)
    • F4 SubClassOf: ‘GTP binding’ and realized-by only P4
    • P4 SubClassOf: has-input some O3
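Putting Assumptions 1 to 4 together, the following sketch converts a toy, dictionary-based model into axioms of the shape shown for BIOMODEL 82. It is not libSBML or SBMLHarvester code: `ToyModel`, `convert`, and the naming scheme O_<id>, F_<id>, P_<id> for the fresh classes are assumptions for illustration, and the model-level annotation is assumed to be a process.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical, simplified stand-in for an annotated SBML model."""
    id: str
    process: str  # process class the model is annotated with (assumed)
    compartments: dict[str, str] = field(default_factory=dict)  # id -> annotation
    species: dict[str, tuple[str, str]] = field(default_factory=dict)  # id -> (annotation, compartment id)
    reactions: dict[str, tuple[str, list[str]]] = field(default_factory=dict)  # id -> (function, reactant ids)


def convert(model: ToyModel) -> list[str]:
    """Apply the conversion rules of Assumptions 1-4 to every component."""
    o_model = f"O_{model.id}"
    axioms = [
        f"{model.id} SubClassOf: represents some {o_model}",
        f"{o_model} SubClassOf: has-function some (realized-by only {model.process})",
    ]
    for cid, cls in model.compartments.items():
        axioms += [f"{cid} SubClassOf: represents some O_{cid}",
                   f"O_{cid} SubClassOf: {cls} and part-of some {o_model}"]
    for sid, (cls, cid) in model.species.items():
        axioms += [f"{sid} SubClassOf: represents some O_{sid}",
                   f"O_{sid} SubClassOf: {cls} and located-in some O_{cid}"]
    for rid, (function, reactants) in model.reactions.items():
        axioms += [f"{rid} SubClassOf: represents some (has-function some F_{rid})",
                   f"F_{rid} SubClassOf: {function} and realized-by only P_{rid}"]
        # One has-input axiom per reactant species.
        axioms += [f"P_{rid} SubClassOf: has-input some O_{sid}" for sid in reactants]
    return axioms


m82 = ToyModel("M82", "GO:0031684",
               compartments={"cell": "GO:0005623"},
               species={"gtp": ("CHEBI:15996", "cell")},
               reactions={"gtp_binding": ("GO:0005525", ["gtp"])})
for axiom in convert(m82):
    print(axiom)
```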
  • 99. How would you formalize a model annotated with:
    A) heart
    B) to pump blood
    C) heart palpitations
  • 100. SBML2OWL: implementation
    1. Read the model
       • libSBML: http://sbml.org/Software/libSBML
    2. Extract annotations from the model and its components
       • libSBML & Jena: http://jena.sourceforge.net
    3. Formalize each annotation according to the formalization rules
       • OWLAPI: http://owlapi.sourceforge.net/
    4. Integrate with external ontologies
       • OWLAPI
    5. Reasoning
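Step 2 of the pipeline (extracting annotations) is done with libSBML and Jena on the slide. As a self-contained stand-in, the sketch below uses Python's standard `xml.etree.ElementTree` to pull `bqbiol:is` annotations out of SBML Level 2 Version 4 markup. It assumes MIRIAM URNs such as `urn:miriam:obo.go:GO%3A0005623` (other URI styles would need different parsing), ignores all other qualifiers, and the example document is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
NS = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "bqbiol": "http://biomodels.net/biology-qualifiers/"}


def extract_annotations(sbml_text: str) -> dict[str, list[str]]:
    """Map compartment/species/reaction ids to the terms in their bqbiol:is annotations."""
    root = ET.fromstring(sbml_text)
    result: dict[str, list[str]] = {}
    for tag in ("compartment", "species", "reaction"):
        for elem in root.iter(f"{{{SBML_NS}}}{tag}"):
            terms = []
            for li in elem.findall(".//bqbiol:is/rdf:Bag/rdf:li", NS):
                urn = li.get(f"{{{NS['rdf']}}}resource", "")
                # "urn:miriam:obo.go:GO%3A0005623" -> "GO:0005623"
                terms.append(unquote(urn.rsplit(":", 1)[-1]))
            result[elem.get("id")] = terms
    return result


def _element(tag: str, id_: str, urn: str) -> str:
    """Build a minimal annotated SBML element (illustrative test data only)."""
    return (f'<{tag} id="{id_}" metaid="m_{id_}"><annotation>'
            f'<rdf:RDF xmlns:rdf="{NS["rdf"]}" xmlns:bqbiol="{NS["bqbiol"]}">'
            f'<rdf:Description rdf:about="#m_{id_}"><bqbiol:is><rdf:Bag>'
            f'<rdf:li rdf:resource="{urn}"/></rdf:Bag></bqbiol:is>'
            f'</rdf:Description></rdf:RDF></annotation></{tag}>')


EXAMPLE = (f'<sbml xmlns="{SBML_NS}" level="2" version="4"><model id="toy">'
           f'<listOfCompartments>{_element("compartment", "cell", "urn:miriam:obo.go:GO%3A0005623")}</listOfCompartments>'
           f'<listOfSpecies>{_element("species", "gtp", "urn:miriam:obo.chebi:CHEBI%3A15996")}</listOfSpecies>'
           '</model></sbml>')
print(extract_annotations(EXAMPLE))
```

The extracted identifiers are what step 3 then feeds into the formalization rules.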
  • 101. SBML2OWL: implementation
    Application to the BioModels repository yields:
    • an OWL ontology with
      o more than 300,000 classes
      o more than 800,000 axioms
      o 90,000 complex model annotations
    • includes all referenced ontologies
      o GO
      o ChEBI
      o Celltype
      o FMA
      o PATO
      o (KEGG, Reactome)
  • 102. SBML2OWL: implementation
    OWLAPI:
    • an ontology consists of
      o a signature (classes, object properties, individuals)
      o a set of axioms
  • 103. SBML2OWL: implementation
    Reference implementation: SBMLHarvester (http://code.google.com/p/sbmlharvester/)
  • 104. Verification, querying, integration
    What can we do with the combined knowledge base?
    1. Verification
    2. Querying
    3. Interoperability and knowledge integration
  • 105. Operations on OWL ontologies
    Consistency checking will identify contradictions in the stated and inferred knowledge. Consistency checking also helps to implement other reasoning tasks.
    • Satisfiability: determines whether classes can have instances.
    • Subsumption: is class C1 implicitly a subclass of C2? Check whether C1 and not C2 is unsatisfiable, i.e., there is no instance of C1 that is not also an instance of C2.
    • Classification: repeated application of subsumption to discover implicit subclass links between named classes.
    • Realization: find the most specific class that an individual belongs to. Does individual a classify into the class C?
  • 106. Practical reasoning with OWL ontologies
    • Ontology editors such as Protege interface with reasoners to perform consistency checking, class satisfiability, classification and realization, and to provide explanations.
    • Some reasoners can be used from the command line to execute requests, including SPARQL queries.
    • Programmatic use of reasoners via APIs gives maximal flexibility, e.g., one can request all subclasses of a given class, including implicit ones, or all entailed statements with a specified subject and predicate.
  • 107. Operations on OWL ontologies
    Consistency checking will identify contradictions in the stated and inferred knowledge. Consistency checking also helps to implement other reasoning tasks.
    • Satisfiability: determines whether classes can have instances.
    • Subsumption: is class C1 implicitly a subclass of C2? Check whether C1 and not C2 is unsatisfiable, i.e., there is no instance of C1 that is not also an instance of C2.
    • Classification: repeated application of subsumption to discover implicit subclass links between named classes.
    • Realization: find the most specific class that an individual belongs to. Does individual a classify into the class C? Check whether adding the assertion a : ¬C makes the ontology inconsistent; if it does, a is an instance of C.
  • 108. Classifying the ontology
  • 109. Classifying the ontology
  • 110. Classifying the ontology
  • 111. Verification
    • Use OWL reasoning for classification
    • Which classes are unsatisfiable?
    • Unsatisfiable classes are equivalent to owl:Nothing
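For the restricted case of told SubClassOf links between named classes plus DisjointWith axioms, unsatisfiability can be found without a full reasoner: a class is unsatisfiable exactly when its ancestors include two disjoint classes. The sketch below assumes that fragment only; anything richer (existential restrictions, domains and ranges, cardinality) needs a DL reasoner such as HermiT or Pellet. The class names, including the compartment object O_cell annotated with a process, are hypothetical.

```python
def ancestors(cls, told_parents):
    """Reflexive-transitive closure of told SubClassOf links between named classes."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(told_parents.get(current, ()))
    return seen


def unsatisfiable_classes(told_parents, disjoint_pairs):
    """Named classes below two disjoint classes, i.e. equivalent to owl:Nothing."""
    classes = set(told_parents) | {p for ps in told_parents.values() for p in ps}
    result = set()
    for c in classes:
        anc = ancestors(c, told_parents)
        if any(a in anc and b in anc for a, b in disjoint_pairs):
            result.add(c)
    return result


TOLD = {
    "Cell": ["MaterialObject"],
    "Glycolysis": ["Process"],
    # A compartment object that was (wrongly) annotated with a process.
    "O_cell": ["MaterialObject", "Glycolysis"],
}
print(unsatisfiable_classes(TOLD, [("MaterialObject", "Process")]))  # {'O_cell'}
```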
  • 112. Model verification
    After reasoning, we found 27 models to be inconsistent. Reasons:
    1. our representation: functions were sometimes found in place of physical entities (e.g., entities that secrete insulin); better to constrain these with appropriate relations
    2. SBML abused: species used as a measure of time
    3. constraints in the ontologies themselves mean that the annotation is simply not possible
  • 113. Compartments/species annotated with functions or processes
  • 114. Biological inconsistency: Biomodel 176
  • 115. Biological inconsistency: Biomodel 176
    [Term]
    id: GO:0016887
    name: ATPase activity
    is_a: GO:0017111
    intersection_of: GO:0003824 ! catalytic activity
    intersection_of: has_input CHEBI:15377 ! water
    intersection_of: has_input CHEBI:15422 ! ATP
    intersection_of: has_output CHEBI:16761 ! ADP
    intersection_of: has_output CHEBI:26020 ! phosphates
  • 116. Finding inconsistencies with axiomatically enhanced ontologies
    We add:
    • GO: ATP and water are the only inputs (cardinality restriction: exactly 2 inputs)
    • ChEBI: water, ATP and alpha-D-glucose 6-phosphate are all different (disjointness)
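Why the added axioms expose the problem can be sketched as counting: pairwise-disjoint input classes each need their own instance, so the largest set of pairwise-disjoint classes is a lower bound on the number of distinct inputs. The sketch assumes the annotated reaction requires water, ATP and alpha-D-glucose 6-phosphate as inputs, and uses the upper half of "exactly 2" as a maximum of 2; the brute-force search is only meant for a handful of classes.

```python
from itertools import combinations


def min_distinct_instances(classes, disjoint):
    """Size of the largest set of pairwise-disjoint classes (brute force).
    Each class in such a set needs its own instance: a lower bound on fillers."""
    classes = list(classes)
    for k in range(len(classes), 0, -1):
        for subset in combinations(classes, k):
            if all(frozenset(pair) in disjoint for pair in combinations(subset, 2)):
                return k
    return 0


def violates_max_inputs(required_inputs, disjoint, max_inputs):
    """True if the required inputs cannot fit into at most max_inputs instances."""
    return min_distinct_instances(required_inputs, disjoint) > max_inputs


INPUTS = ["water", "ATP", "alpha-D-glucose 6-phosphate"]
# The ChEBI extension from the slide: the three classes are pairwise disjoint.
DISJOINT = {frozenset(pair) for pair in combinations(INPUTS, 2)}
print(violates_max_inputs(INPUTS, DISJOINT, max_inputs=2))  # True
```

Without the disjointness axioms, two of the three inputs could denote the same instance and no contradiction would follow; both additions are needed.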
  • 117. Consistency repair
    • Unsatisfiable classes result from contradictory class definitions
    • Conflicts arise in asserted axioms, in imported ontologies, or through a combination of both
    • Conflicts can be hidden through domain/range restrictions, subclass relations, axioms for relations, etc.
    • Conflicting axioms may be challenging to identify!
  • 118. Consistency repair
  • 119. Protege 4: Explanation Workbench
  • 120. Ontology repair and disambiguation
    • The ontological commitment may have been too strong
    • Complex relations (between classes) can be relaxed by explicitly introducing a disjunction
    • Example:
      o Assumption 1: models represent material objects
      o a model is annotated with the process Glycolysis
      o process and material object are disjoint, therefore the KB will contain unsatisfiable classes
  • 121. Disambiguation pattern
    Disambiguation pattern: a model annotated with X represents
    • material objects X, or
    • material objects with function X, or
    • material objects with a function that is realized by X.
    Disambiguation patterns are applicable if the alternatives are mutually disjoint; automated reasoning will then eliminate all but one option.
  • 122. Disambiguation: model annotations
    Assertion:
    M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
    C SubClassOf: MaterialEntity
    Then:
    • represents some C is satisfiable
    • represents some (has-function some C) and represents some (has-function some (realized-by only C)) are unsatisfiable
  • 123. Disambiguation: model annotations
    Assertion:
    M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
    C SubClassOf: Function
    Then:
    • represents some (has-function some C) is satisfiable
    • represents some C and represents some (has-function some (realized-by only C)) are unsatisfiable
  • 124. Disambiguation: model annotations
    Assertion:
    M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))
    C SubClassOf: Process
    Then:
    • represents some (has-function some (realized-by only C)) is satisfiable
    • represents some C and represents some (has-function some C) are unsatisfiable
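The three disambiguation slides amount to a lookup from the top-level category of C to the one disjunct that survives. The sketch below hard-codes that outcome and does not run a DL reasoner; the category requirements are an assumption read off the slides, and `disambiguate` is a hypothetical helper. An annotation whose category matches no disjunct (for example a quality) leaves no satisfiable option, so the model class stays unsatisfiable.

```python
# Disjunct templates of the pattern, each paired with the top-level category
# that C must fall under for that disjunct to remain satisfiable.
# This mirrors the conclusions on the slides; it is not a reasoner.
DISJUNCTS = [
    ("represents some {C}", "MaterialEntity"),
    ("represents some (has-function some {C})", "Function"),
    ("represents some (has-function some (realized-by only {C}))", "Process"),
]


def disambiguate(model: str, annotation: str, category: str) -> list[str]:
    """Return the axiom(s) left once the mutually disjoint alternatives are eliminated."""
    return [f"{model} SubClassOf: " + template.format(C=annotation)
            for template, required in DISJUNCTS if required == category]


print(disambiguate("M", "Glycolysis", "Process"))
```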
  • 125. Aside from the disjunction pattern, what else could be used for consistency repair?
  • 126. Once consistent, we can query the ontology and infer new knowledge. What would YOU ask of your formalized knowledge base?
  • 127. Knowledge discovery and retrieval
    • All queries are of the form:
      o Query class: Y
      o List all subclasses (and descendant classes), equivalent classes, superclasses (and ancestor classes)
      o Some OWL reasoners perform only classification and output the classified taxonomy
  • 128. Knowledge discovery and retrieval
    • Query: list all models
    • Query type: subclasses
    • Query class: Model
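Answering a query of this form means reading the classified taxonomy. A common approach for complex query classes is to add a fresh named class EquivalentTo the query expression, classify, and then collect that class's descendants; the sketch below shows only the last step, assuming the reasoner's inferred direct superclasses are already available as a dictionary. The taxonomy content is hypothetical.

```python
def descendants(query, inferred_parents):
    """All classes that the classified taxonomy places below `query`."""
    children = {}
    for child, parents in inferred_parents.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    found, stack = set(), [query]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # guards against cycles from equivalent classes
                found.add(child)
                stack.append(child)
    return found


# Hypothetical inferred taxonomy: class -> direct superclasses after classification.
INFERRED = {
    "Model": {"ModelEntity"},
    "Reaction": {"ModelEntity"},
    "BIOMD0000000082": {"Model"},
    "BIOMD0000000169": {"Model"},
    "R_gtp_binding": {"Reaction"},
}
print(sorted(descendants("Model", INFERRED)))  # ['BIOMD0000000082', 'BIOMD0000000169']
```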
  • 129. Knowledge discovery and retrieval
    • Query: list all reactions that are part of BIOMD0000000169
    • Query type: subclasses
    • Query class: Reaction and part-of some BIOMD0000000169
  • 130. Knowledge discovery and retrieval
    • Query: list all models that represent Glycolysis
    • Query type: subclasses
    • Query class: Model and represents some (has-function some (realized-by only Glycolysis))
  • 131. Knowledge discovery and retrieval
    • Query: list all models that have a compartment that represents a part of a Cell in which a sugar is located
    • Query type: subclasses
    • Query class: Model and has-part some (Compartment and represents some (part-of some Cell and contains some Sugar))
  • 132. Knowledge discovery and retrieval
    • Query: list all model entities that represent catalytic activity involving sugar in the endocrine pancreas
    • Query type: subclasses
    • Query class: represents some (has-function some (‘catalytic activity’ and realized-by only (has-participant some (Sugar and contained-in some (part-of some ‘endocrine pancreas’)))))
  • 133. Knowledge discovery and retrieval
    • Query: list all model entities that represent mutagenic central nervous system drugs in the gastrointestinal system
    • Query type: subclasses
    • Query class: represents some (has-part some (has-role some ‘central nervous system drug’ and has-role some mutagen and part-of some ‘gastrointestinal system’))
  • 134. Answering questions
  • 135. Automated reasoning
    • more than 800,000 axioms
    • the included ontologies contain several thousand axioms
      o GO has approx. 35,000 classes
      o ChEBI contains almost 100,000 classes
      o complex class definitions create links between large ontologies
    • Reasoning in OWL 2 DL is highly complex (worst case: 2NEXPTIME-complete, roughly 2^(2^n), with n the number of operators used in the ontology)
    • Consequence: OWL reasoning can rarely be employed on a large scale.
    • Expressive OWL reasoners do not classify the formalized BioModels repository.
  • 136. OWL reasoners
    OWL DL reasoners
    • Pellet: Clark & Parsia, dual-licensed, Java.
    • FaCT++: Manchester University, open-source, C++ with a Java API.
    • HermiT: Oxford University, open-source, Java.
    • RacerPro: Racer Systems, commercial, Lisp with a Java API.
    OWL profile/subset reasoners
    • Jena: Hewlett-Packard, open-source, Java.
    • OWLIM: Ontotext, dual-licensed, Java.
    • CB:
    • CEL:
    • JCEL (Pellet)
    • ELLY:
  • 137. Implementation in information systems
    • Classification of the model ontology: 10 to 120 min
    • Answering complex queries: up to several hours
    • Consequence: OWL reasoning can rarely be employed on a large scale
    • Subsets of OWL allow tractable (polynomial-time) automated reasoning
    • OWL EL is suitable for ontologies with a large number of classes
    • Problem: convert ontologies into a tractable subset of OWL
  • 138. OWL profiles
    • OWL 2 defines three different tractable profiles:
      o EL: polynomial-time reasoning for schema and data; useful for ontologies with a large conceptual part
      o QL: fast (LOGSPACE) query answering using RDBMSs via SQL; useful for large datasets already stored in relational databases
      o RL: fast (polynomial) query answering using rule-extended databases; useful for large datasets stored as RDF triples
  • 139. OWL RL
    Features:
    • identity of classes, instances, properties
    • subproperties, subclasses, domains, ranges
    • union and intersection of classes (with some restrictions)
    • property characteristics (functional, symmetric, etc.)
    • property chains
    • keys
    • some property restrictions (but not all inferences are possible)
    Limitations:
    • not all datatypes are available
    • no datatype restrictions
    • no minimum or exact cardinality restrictions
    • maximum cardinality only with 0 and 1
    • some consequences cannot be drawn
  • 140. OWL EL
    Features:
    • existential quantification to a class expression or data range
    • existential quantification to an individual or a literal
    • self-restriction
    • enumerations involving a single individual or a single literal
    • intersection of classes and data ranges
    • class axioms: SubClassOf, equivalence, disjointness
    • property axioms: domain, range, equivalence, transitive, reflexive, inclusion with or without property chains; functional data properties; keys
    • assertions (SameAs, DifferentFrom, class, object property, data property, negative object/data property)
    Not supported:
    • universal quantification to a class expression or a data range
    • cardinality restrictions
    • disjunction (union)
    • class negation
    • enumerations involving more than one individual
    • object properties: disjoint, symmetric, asymmetric, irreflexive, inverse, functional and inverse-functional
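A small recursive check makes the EL restrictions concrete. The sketch assumes an ad hoc tuple encoding of class expressions and treats only named classes, intersection and existential restriction as allowed (nominals, self-restriction and data ranges are left out). It shows, for example, that the `realized-by only` expressions used in the SBML formalization fall outside OWL 2 EL.

```python
# Ad hoc encoding, an assumption of this sketch:
#   "Cell"                        named class
#   ("and", e1, e2, ...)          intersection
#   ("some", "part-of", e)        existential restriction
#   ("only", "realized-by", e)    universal restriction
#   ("or", e1, ...), ("not", e)   union, complement
EL_CONSTRUCTORS = {"and", "some"}


def first_non_el(expr):
    """Return the first constructor found outside the EL subset, or None."""
    if isinstance(expr, str):
        return None  # named classes are always allowed
    op, *args = expr
    if op not in EL_CONSTRUCTORS:
        return op
    # For "some", args = [property, filler]; only the filler is a class expression.
    operands = args[1:] if op == "some" else args
    for operand in operands:
        found = first_non_el(operand)
        if found is not None:
            return found
    return None


def is_el(expr) -> bool:
    return first_non_el(expr) is None


O1 = ("some", "has-function", ("only", "realized-by", "GO:0031684"))
O3 = ("and", "GTP", ("some", "located-in", "O2"))
print(first_non_el(O1), is_el(O3))  # only True
```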
  • 141. Ontology modularization
    Can we automatically extract a large (maximal) OWL (EL, QL, RL) module from an ontology?
    1. D EquivalentTo: not A (not EL)
    2. C EquivalentTo: not B (not EL)
    3. B SubClassOf: A (EL)
    Inference:
    • D SubClassOf: C (EL) (inferred from (1)-(3))
    EL modules of (1)-(3):
    • {B SubClassOf: A}, or
    • {B SubClassOf: A, D SubClassOf: C}
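Because axioms (1)-(3) mention no object properties, their entailments can be checked propositionally: a counter-example to a subclass axiom only ever needs a single individual, so it suffices to try every combination of class memberships. The following sketch assumes this property-free case (it is not a general DL procedure) and confirms that D SubClassOf: C follows from (1)-(3) but not from the EL axiom (3) alone, which is why the larger EL module keeps more inferences.

```python
from itertools import product

CLASSES = ["A", "B", "C", "D"]

# Axioms (1)-(3) as conditions on one individual's memberships v[class] -> bool.
AXIOMS = {
    "D EquivalentTo: not A": lambda v: v["D"] == (not v["A"]),
    "C EquivalentTo: not B": lambda v: v["C"] == (not v["B"]),
    "B SubClassOf: A": lambda v: (not v["B"]) or v["A"],
}


def entails_subclass(axioms, sub, sup):
    """True if every membership pattern satisfying the axioms puts members of sub in sup."""
    for bits in product([False, True], repeat=len(CLASSES)):
        v = dict(zip(CLASSES, bits))
        if all(axiom(v) for axiom in axioms) and v[sub] and not v[sup]:
            return False  # counter-example: an individual in sub but not in sup
    return True


print(entails_subclass(list(AXIOMS.values()), "D", "C"))        # True
print(entails_subclass([AXIOMS["B SubClassOf: A"]], "D", "C"))  # False
```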
  • 142. EL Vira modularization (http://el-vira.googlecode.com)
    • ontology modularization
    • identify EL, QL, RL axioms in the deductive closure
    • retain the signature of the ontology
    • maximality is an open problem
  • 143. Outcomes
    The SBML-derived ontologies can be used to
    i) check their consistency, thereby uncovering erroneous curations,
    ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models, and
    iii) answer sophisticated questions across a model knowledge base.
  • 144. Questions?
  • 145. Phenotypes
    Phenotypes are observable characteristics of an organism. Examples include:
    – Red hair
    – Heart rate of 120 bpm
    – Absent arm
    – Malfunctioning liver
    Phenotypes include comparisons such as increased heart rate.
  • 146. Phenotype and anatomy ontologies
    • anatomy ontologies: > 100,000 classes
      – FMA, MA, WA, ZFA, FA, GO-CC, ...
    • phenotype ontologies: > 20,000 classes
      – HPO, MP, WBPhenotype, FBcv, APO, ...
    • quality ontology: > 2,000 classes
      – PATO
    • process and function ontologies: > 25,000 classes
      – Gene Ontology, ...
    • alignments between anatomy ontologies
      – UBERON, various mappings
  • 147. Phenotype: example question
    Find all regions in the human, mouse, fish, fly, worm and yeast genomes that are associated with tetralogy of Fallot.
  • 148. Tetralogy of Fallot
  • 149. Tetralogy of Fallot
    – Overriding aorta (HP:0002623)
    – Ventricular septal defect (HP:0001629)
    – Pulmonic stenosis (HP:0001642)
    – Right ventricular hypertrophy (HP:0001667)
  • 150. Phenotype descriptions
    Overriding aorta (HP:0002623):
    – Q: overlap with (PATO:0001590)
    – E1: Aorta (FMA:3734)
    – E2: Membranous part of interventricular septum (FMA:7135)
    HP:0002623 EquivalentTo: phene-of some (has-part some (FMA:3734 and has-quality some (PATO:0001590 and towards some FMA:7135)))
  • 151. Human-mouse anatomy mappings
    Overriding aorta (HP:0002623):
    – Q: overlap with (PATO:0001590)
    – E1: Aorta (FMA:3734)
      • FMA:3734 EquivalentTo: MA:0000062
    – E2: Membranous part of interventricular septum (FMA:7135)
      • FMA:7135 EquivalentTo: MA:0002939
  • 152. Mouse phenotype
    Overriding aorta (MP:0000273):
    – Q: overlap with (PATO:0001590)
    – E1: Aorta (MA:0000062)
    – E2: Membranous interventricular septum (MA:0002939)
    MP:0000273 EquivalentTo: phene-of some (has-part some (MA:0000062 and has-quality some (PATO:0001590 and towards some MA:0002939)))
    Consequence: MP:0000273 EquivalentTo: HP:0002623
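The alignment rests on both definitions instantiating the same entity-quality pattern. The sketch below assumes phenotypes are stored as (E1, Q, E2) triples and replaces the OWL reasoning step by a structural comparison after applying the FMA-to-MA equivalences from the mapping slide; `eq_definition` and `same_phenotype` are hypothetical names.

```python
# Entity-quality triples (E1, Q, E2) as given on the slides.
HP_0002623 = ("FMA:3734", "PATO:0001590", "FMA:7135")      # overriding aorta (human)
MP_0000273 = ("MA:0000062", "PATO:0001590", "MA:0002939")  # overriding aorta (mouse)
FMA_TO_MA = {"FMA:3734": "MA:0000062", "FMA:7135": "MA:0002939"}


def eq_definition(entity: str, quality: str, towards: str) -> str:
    """Manchester-syntax definition following the phenotype pattern on the slides."""
    return (f"phene-of some (has-part some ({entity} and has-quality some "
            f"({quality} and towards some {towards})))")


def same_phenotype(a, b, equivalences) -> bool:
    """Structural comparison after rewriting classes to their mapped equivalents."""
    def normalize(eq):
        return tuple(equivalences.get(term, term) for term in eq)
    return normalize(a) == normalize(b)


print(eq_definition(*MP_0000273))
print(same_phenotype(HP_0002623, MP_0000273, FMA_TO_MA))  # True
```

A reasoner reaches the same conclusion by substituting equivalent classes inside the definitions; the structural shortcut only works because both phenotypes use an identical pattern.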
  • 153. Absence: absent appendix
    Absent appendix:
    – Q: lacks all parts of type (PATO:0002000)
    – E1: Human body (FMA:20394)
    – E2: Appendix (FMA:14542)
    Alternative formalizations:
    AbsentAppendix EquivalentTo: LacksParts and towards some Appendix and inheres-in some HumanBody
    AbsentAppendix EquivalentTo: LacksParts and towards some {Appendix} and inheres-in some HumanBody
    AbsentAppendix EquivalentTo: phene-of some (HumanBody and not has-part some Appendix)
  • 154. Absence and inconsistency
    AbsentAppendix SubClassOf: phene-of some (HumanBody and not has-part some Appendix)
    HumanBody SubClassOf: has-part some Appendix
    HumanBody(John). AbsentAppendix(x). has-phene(John, x).
  • 155. Inconsistency removal
    – Removal of conflicting axioms (has-part/part-of in anatomy)
    – Contextualize anatomy:
      • Normal and HumanBody SubClassOf: has-part some (Normal and Appendix)
    – Use of non-monotonic reasoning
  • 156. Ontology of phenotypes
    Different formal expressions for phenotypes based on
    – qualities,
    – anatomical parts,
    – functions,
    – processes
  • 157. Tetralogy of Fallot
  • 158. Mouse model
  • 159. Mouse model
  • 160. PhenomeBLAST
    – apply definition patterns to yeast, fly, worm, fish, mouse and human phenotypes and integrate them in a single ontology
    – phenotype alignment through OWL reasoning
    – more than 300,000 classes and 1,000,000 axioms
    – combination of HermiT (for EL Vira modularization), CB and CEL reasoners
    – classification time: 7 minutes
    http://phenomeblast.googlecode.org
  • 161. Phenotype alignments
  • 162. Comparison of phenotypes
    Direct comparison of phenotypes:
    – disease phenotypes, e.g., tetralogy of Fallot
    – phenotypes associated with genetic mutations (genotypes in mouse, fish, etc.)
  • 163. Comparison of phenotypes
    When the phenotype annotation of a genotype becomes a subclass of a disease phenotype, we can infer a gene-disease association if
    – disease phenotypes are sufficient for having the disease, and
    – mutation phenotypes are necessary for having a specific genotype.
    Inference over ontologies can establish a formal proof of a gene-disease association.
  • 164. Knowledge discovery
    Similarity-based comparison allows for incomplete and noisy information.
    – pairwise comparison of phenotypes
    – similarity: weighted Jaccard index
    – result: similarity matrix between phenotypes
    – (quantitative) evaluation based on predicting orthology, pathway, disease
    – identify novel gene-disease associations
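The slide does not spell out the weights. A common choice, assumed here, is information content (the negative log of a class's annotation frequency), computed over the sets of classes inferred for each phenotype. The counts and class names below are invented for illustration.

```python
import math


def information_content(term: str, counts: dict[str, int], total: int) -> float:
    """-log p(term), with p the fraction of annotated entities using the term (assumed weighting)."""
    return -math.log(counts[term] / total)


def weighted_jaccard(a: set[str], b: set[str], weight) -> float:
    """Sum of weights of shared classes divided by the sum of weights of all classes."""
    denominator = sum(weight(t) for t in a | b)
    if denominator == 0:
        return 0.0
    return sum(weight(t) for t in a & b) / denominator


# Hypothetical annotation counts out of 100 annotated genotypes.
COUNTS = {"phenotype": 100, "abnormal heart": 20, "overriding aorta": 2,
          "ventricular septal defect": 5}


def w(term: str) -> float:
    return information_content(term, COUNTS, total=100)


P1 = {"phenotype", "abnormal heart", "overriding aorta"}
P2 = {"phenotype", "abnormal heart", "ventricular septal defect"}
print(round(weighted_jaccard(P1, P2, w), 2))  # 0.19
```

The root class appears in every annotation, so its weight is zero and it contributes nothing; rare, specific classes dominate the score, which is the point of weighting the plain Jaccard index.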
  • 165. Evaluation
  • 166. http://PhenomeBrowser.net
  • 167. What does the future hold?
    • Better formalized ontologies
    • Dynamic generation of knowledge through semantic web services
    • …
  • 168. Summary - RDF and OWL
    RDF provides
    • light-weight semantics
    • fast queries
    • highly scalable implementations
    • large volumes of data (e.g., DBpedia, other Linked Data repositories)
    OWL provides
    • constructs to formalize the intended semantics
    • the OWLAPI to develop, manage, and serialize OWL ontologies
    • efficient reasoners to obtain inferences, compute modules and get explanations
    • syntactic subsets for better performance, albeit some inferences may be lost
  • 169. Summary - OWL & formal languages
    • Formal logic-based languages can be used to formalize the meaning of terms used in discourse. While they are normally restricted in what can be expressed, the statements formed can be automatically reasoned about.
    • OWL is based on description logics and formalizes the meaning of terms with axioms. Axioms can be used to characterize and distinguish classes, relations and individuals. Rich expressions can be crafted from logical combinations of language primitives, including conjunction, disjunction, negation and object/data property restrictions.
    • OWL reasoners provide a number of services, including computing subsumption, satisfiability, entailment, realization and query answering.
  • 170. Summary - Exploitation of ontologies • verification: automated reasoning can reveal contradictory class definitions (unsatisfiable classes) and instances that violate constraints in the ontology (often making the ontology inconsistent), and can expose hidden inferences (which manual verification may show to be invalid) • querying: ontologies define an explicit, formal language in which queries to a knowledge base can be expressed; queries can ask for instances and for classes satisfying complex conditions • repair: through explicit definitions using disjunction, constraints can be relaxed and contradictions reduced
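The verification point can be illustrated with the simplest kind of contradiction: a class whose superclasses include two classes declared disjoint can have no instances. The sketch below is a deliberately incomplete stand-in for an OWL reasoner. It only considers named subclass axioms and pairwise disjointness. A real reasoner also detects clashes arising from restrictions, negation and cardinality. The class names, including the over-annotated model entity `Species_S1`, are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def ancestors(cls: str, superclasses: Dict[str, Set[str]]) -> Set[str]:
    """Return cls together with all of its transitive superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in superclasses.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(superclasses: Dict[str, Set[str]],
                          disjoint: Set[FrozenSet[str]]) -> Set[str]:
    """Classes that inherit from two classes declared pairwise disjoint."""
    all_classes = set(superclasses).union(*superclasses.values())
    unsat = set()
    for cls in all_classes:
        anc = ancestors(cls, superclasses)
        if any(frozenset(pair) in disjoint for pair in combinations(anc, 2)):
            unsat.add(cls)
    return unsat


# Hypothetical ontology fragment: a model entity annotated as both a kinase
# (a protein) and ATP (a small molecule), which are declared disjoint.
SUPERCLASSES = {
    "Kinase": {"Protein"},
    "ATP": {"SmallMolecule"},
    "Species_S1": {"Kinase", "ATP"},
}
DISJOINT = {frozenset({"Protein", "SmallMolecule"})}

print(unsatisfiable_classes(SUPERCLASSES, DISJOINT))  # {'Species_S1'}
```

This toy checker has no notion of disjunction. Checking the repair strategy in the last point, which relaxes constraints with disjunctive definitions, therefore needs a full OWL reasoner.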
  • 171. Summary - Ontology Ontology is not philosophy! • an ontology is a specification of a conceptualization of a domain • a conceptualization is a system of categories accounting for a particular view of the world • ontologies are used to make explicit some aspects of the intended meaning of the terms in a vocabulary • ontologies (in computer science) may draw on philosophical theories • formalized ontologies can be used by humans and automated systems as a basis for communication and data exchange • ontologies are useful tools for translational research
  • 172. Summary - Implementation in information systems • The OWLAPI is a reference implementation of the OWL specification and facilitates the development, management and serialization of expressive OWL ontologies. It also supports modularization and the generation of explanations. • OWL 2 defines syntactic subsets of the language for efficient reasoning. These so-called OWL profiles (EL, RL, QL) have well-understood computational properties and can yield better performance, at the cost of losing some inferences. • Formal ontologies make it possible not only to retrieve data (as in a database) but also to query the concepts themselves.
  • 173. Summary - evaluation • ontologies are tools to support science • ontologies can provide insight into real biological/scientific problems • quantitative evaluation can be performed, e.g., based on precision/recall or ROC analysis • applications of ontologies may go beyond reasoning alone and use statistical analyses (enrichment), semantic similarity, graph algorithms, clustering, etc.
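The quantitative evaluation can be sketched as follows. Assume each candidate (for example a gene-disease pair) has a similarity score and a binary label saying whether the association is already known. The functions compute precision and recall at a chosen score threshold, and the area under the ROC curve. AUC is computed through its rank interpretation: the probability that a random positive outscores a random negative, with ties counting one half. The scores and labels are hypothetical, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores and known-association labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(precision_recall(scores, labels, threshold=0.5))  # (0.5, 0.5)
print(roc_auc(scores, labels))  # 0.75
```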
  • 174. Conclusions • Ontologies + the Semantic Web enable • Integration • Verification • Analysis • Discovery • Translational research
  • 175. Acknowledgements: George Gkoutos, Heinrich Herre, Janet Kelso, Dietrich Rebholz-Schuhmann, Anika Oellrich, Michael Ashburner, Dan Cook, John Gennari, Paul Schofield
  • 176. michel_dumontier@carleton.ca leechuck@leechuck.de
