Real World Applications of OWL




                             Michel Dumontier, Ph.D.

                        Associate Professor of Bioinformatics
    Department of Biology, School of Computer Science, Institute of Biochemistry,
                                 Carleton University
                         Ottawa Institute of Systems Biology
                Ottawa-Carleton Institute of Biomedical Engineering
                         Professeur Associé, Université Laval
                  Visiting Associate Professor, Stanford University

1                                                        Protege Short Course::Dumontier:March 2012
Ontologies in Use
    •   Knowledge Capture (Rightfield)
    •   Formalization and Verification (SNOMED-CT)
    •   Consistency Checking (SBML Harvester)
    •   Classification (Phosphatases, Compounds)
    •   Semantic Annotation (Array Express/ Gene Expression Atlas,
        Semantic Assistant)
    •   Query Formulation (Array Express/ Gene Expression Atlas)
    •   Query Answering (KUPD)
    •   Search & co-occurence (gopubmed)
    •   Semantic Assistant
    •   Hypothesis Testing (HyQue)
    •   Disease Similarity and Model Organism prediction
        (phenomeBLAST)
    •   Function Prediction (genemania)
2                                             Protege Short Course::Dumontier:March 2012
Knowledge Capture
        Rightfield


                              K.Wolstencroft, S.Owen,
                              M.Horridge, O.Krebs,
                              W.Mueller, JL. Snoep,
                              F.Preez, C.Goble
                              RightField: Embedding
                              ontology annotation in
                              spreadsheets.
                              Bioinformatics (2011),
                              May 2011




3              Protege Short Course::Dumontier:March 2012
Formalization
                         SNOMED-CT
    • SNOMED-CT (Clinical Terms)
      ontology
    • used in healthcare systems of
      more than 15 countries, including
      Australia, Canada, Denmark,
      Spain, Sweden and the UK
    • also used by major US providers,
      e.g., Kaiser Permanente
    • ontology provides common
      vocabulary for recording clinical
      data
    • 395036 classes




4                                         Protege Short Course::Dumontier:March 2012
SNOMED-CT




    • Pattern based knowledge capture
    • need training and an information system to
      implement

5                             Protege Short Course::Dumontier:March 2012
SNOMED - verification

    • Kaiser Permanente extending SNOMED to express,
      e.g.:
        – non-viral pneumonia (negation)
        – infectious pneumonia is caused by a virus or a bacterium
          (disjunction)
        – double pneumonia occurs in two lungs (cardinalities)
    • This is easy in SNOMED-OWL
        – but reasoner failed to find expected subsumptions, e.g., that
          bacterial pneumonia is a kind of non-viral pneumonia
    • Ontology highly under-constrained: need to add
      disjointness axioms (at least)
        – virus and bacterium must be disjoint

    - Ian Horrocks OWL2 tutorial
6                                                Protege Short Course::Dumontier:March 2012
SNOMED

     • Adding disjointness led to surprising results
         – many classes become inconsistent, e.g.,
           percutanious embolization of hepatic artery using
           fluoroscopy guidance
     • Cause of inconsistencies identified as class
       groin
         – groin asserted to be subclass of both abdomen and
           leg
         – abdomen and leg are disjoint
         – modelling of groin (and other similar “junction”
           regions) identified as incorrect
    - Ian Horrocks OWL2 tutorial
7                                        Protege Short Course::Dumontier:March 2012
Consistency Checking
          Formalization of SBML annotations into
                     OWL ontologies
     • Biomodels contains hundreds of quantitative
       models
     • SBML is an XML-based format for specifying
       models and their parameters
     • Models and their components are being
       semantically annotated
     • Use the ontologies to validate the assertions

    Integrating systems biology models and biomedical ontologies.
    Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV.
    BMC Syst Biol. 2011 Aug 11;5:124.


8                                                      Protege Short Course::Dumontier:March 2012
Additional annotations are specified using the
       Resource Description Framework (RDF)
                                                       <species metaid="_525530" id="GLCi"
     Implicit subject                                         compartment="cyto"
    and xml attributes                            initialConcentration="0.097652231064563">

                                                                      <annotation>
                                            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-
            The annotation                        ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"
                                                     xmlns:dcterms="http://purl.org/dc/terms/"
          element stores the                  xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
                                              xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
                 RDF                        xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
              subject                                     <rdf:Description rdf:about="#_525530">
                                                                          <bqbiol:is>
                                                                            <rdf:Bag>
                                                                               <rdf:li
                       predicate             rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/>
                                                 <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/>
                                                                           </rdf:Bag>
                                                                          </bqbiol:is>
                                                                                object
                                                                     </rdf:Description>
                                                                         </rdf:RDF>
                                                                      </annotation>
                                                                       </species>
The intent is to express that the species represents a substance composed of glucose molecules
We also know from the SBML model that this substance is located in the cytosol and with a (initial)
                                    concentration of 0.09765M
9
For each model annotation, we make a
   commitment to what it represents
                            OWL Axiom:
            M SubClassOf: represents some MaterialEntity

     Conversion rule: a Model annotated with class C represents:

               If C is a SubClassOf MaterialEntity then
                   M SubClassOf: represents some C

                If C is a SubClassOf Function then
        M SubClassOf: represents some (has-function some C)

                If C is a SubClassOf Process then
     M SubClassOf: represents some (has-function some (realized-
                             by only C))
10
11
Model verification

        After reasoning, we found 27 models to be inconsistent

                                  reasons
     1. our representation - functions sometimes found in the place
        of physical entities (e.g. entities that secrete insulin). better
                    to constrain with appropriate relations
       2. SBML abused – e.g. species used as a measure of time
          3. Incorrect annotations - constraints in the ontologies
        themselves mean that the annotation is simply not possible




12
Finding inconsistencies with
     axiomatically enhanced ontologies
     ATPase activity (GO:0004002) is a Catalytic activity that has
     Water and ATP as input, ADP and phosphate as output and is
     a part of an ATP catabolic process.
     To this, we add:
      • GO: ATP + Water the only inputs (universal quantification)
      • ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all
        different (disjointness)
      • “ATP” input to “ATPase” reaction, which is annotated with
        ATPase activity. The species “ATP”, however, is mis-
        annotated with Alpha-D-glucose 6-phosphate
        (CHEBI:17665), not with ATP.
      • Unsatisfiable -> curation error in BIOMD0000000176 and
        BIOMD0000000177 models of anaerobic glycolysis in
        yeast.
13
Classification:
                                       Phosphotases

     • Bioinformaticians use tools to identify functional
       domains (e.g., InterProScan)
     • Tools simply show the presence of domains -
       they do not classify proteins
     • Experts classify proteins according to domain
       arrangements - the presence and number of
       each domain is important

      PhosphaBase: an ontology-driven database resource for protein phosphatases.
      Wolstencroft KJ, Stevens R, Tabernero L, Brass A. Proteins. 2005 Feb 1;58(2):290-4.


14                                                                        Protege Short Course::Dumontier:March 2012
Phosphatase Functional Domains




15                      Protege Short Course::Dumontier:March 2012
Defining Protein Phosphatases

     • Necessary and sufficient conditions are
       stipulated using EquivalentClass axioms
     • A protein phosphatase is exactly a protein that
       consists of exactly one transmembrane domain
       and contains at least one phosphotase domain
     ProteinPhosphatase
     EquivalentTo:
      Protein
      AND hasDomain 1 transMembraneDomain
      AND hasDomain min 1 PhosphataseCatalyticDomain




16                                           Protege Short Course::Dumontier:March 2012
More precise class expressions can
         be formulated for subtypes
     Inclusion of universal quantifier now restricts the domains
     to only the types listed

     R2A EquivalentTo:
     Protein
     AND hasDomain 2 ProteinTyrosinePhosphataseDomain
     AND hasDomain 1 TransmembraneDomain
     AND hasDomain 4 FibronectinDomains
     AND hasDomain 1 ImmunoglobulinDomain
     AND hasDomain 1 MAMDomain
     AND hasDomain 1 Cadherin-LikeDomain
     AND hasDomain only (TyrosinePhosphataseDomain OR
     TransmembraneDomain OR FibronectinDomain OR
     ImnunoglobulinDomain OR Clathrin-LikeDomain OR ManDomain)



17                                        Protege Short Course::Dumontier:March 2012
Describing chemical functional groups in OWL-DL
        for the classification of chemical compounds

                                    methyl group
                                                                                  hydroxyl group




                                                                               Ethanol


        Knowledge of functional                                           Functional groups describe
         groups is important in                                          chemical reactivity in terms of
          chemical synthesis,                                          atoms and their connectivity, and
       pharmaceutical design and                                        exhibits characteristic chemical
           lead optimization.                                             behavior when present in a
                                                                                  compound.

 N Villanueva-Rosales, M Dumontier. 2007. OWLED, Innsbruck, Austria.
18                                                                        Protege Short Course::Dumontier:March 2012
Describing Functional Groups in DL
              R group



                                                           O
                                                 R                 H




      HydroxylGroup:
      CarbonGroup that (hasSingleBondWith      some      (OxygenAtom         that
      hasSingleBondWith some HydrogenAtom)




19                                           Protege Short Course::Dumontier:March 2012
Fully Classified Ontology




                                                     35 FG




20                   Protege Short Course::Dumontier:March 2012
And, we define certain compounds

     Alcohol:
     OrganicCompound that (hasPart some HydroxylGroup)




21                                   Protege Short Course::Dumontier:March 2012
Organic Compound Ontology




                                                    28 OC




22                  Protege Short Course::Dumontier:March 2012
Question Answering:
      Classes as self-contained queries


     • Query PubChem, DrugBank and dbPedia




23                          Protege Short Course::Dumontier:March 2012
Querying Kidney and Urinary
                            Knowledge Base and Ontology
                           Query: What are the genes involved in
              Proteins transport expressed in Proximal Tubule Epithelial Cell?
          Entre gene
                                                    KUPO Ontology
      Gene X                GO:0054426                                    PT epithelial cell
               go:biological_process                MA:00345
                                                                                     rdfs:label
       Gene Y
                                                            ro:part_of   kupo:002444
     Higgings Dataset          Proximal tubule
                                                                          DT epithelial cell
                                  MA:000345         MA:00456
     Gene X                                                                          rdfs:label
              kupo:expressed_in     Distal tubule
                                                            ro:part_of   kupo:004672
     Gene Y
                                  MA:00456
              kupo:expressed_in



24                                                    Protege Short Course::Dumontier:March 2012
Semantic Annotation and Query

          ArrayExpress

       Curation                                                     Curation




                      >250,000
                        Assays                                                            ATLAS
      AE/GEO acquire   >10,000                       Re-annotate & summarize
                     experiments




            Ontologically Modeling Sample Variables in Gene Expression Data
                                    malone@ebi.ac.uk
25                                                                Protege Short Course::Dumontier:March 2012
ontology-based data exploration
     Query for Cell adhesion genes in all „organism parts‟

                                                                   „View on EFO‟




               Ontologically Modeling Sample Variables in Gene Expression Data
                                       malone@ebi.ac.uk
26                                                                   Protege Short Course::Dumontier:March 2012
Ontology-based query expansion for ArrayExpress
          Archive @ www.ebi.ac.uk/arrayexpress




27                               Protege Short Course::Dumontier:March 2012
Search and Co-Occurrence




28                  Protege Short Course::Dumontier:March 2012
Semantic Assistant
       services relevant for the user's current task are offered directly within a desktop
       application. This approach relies on ontology-described semantic web services
       to provide external natural language processing (NLP) pipelines




       Leverage of OWL-DL axioms in a Contact Centre for Technical Product Support
     Alex Kouznetsov, Bradley Shoebottom, René Witte, Christopher JO Baker. OWLED 2010.



29                                                        Protege Short Course::Dumontier:March 2012
Plug-in for Open Office Client




30                     Protege Short Course::Dumontier:March 2012
• HyQue helps construct and evaluate
        (automatically obtain support for) hypotheses
        using formalized background knowledge and
        data using the Semantic Web
      • HyQue makes it possible to develop a reliability
        model around data based on our scientific
        expectations of corroborating evidence

     Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed
     Semantics. 2011 May 17;2 Suppl 2:S3.

     Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended
     Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted.


31                                                                          Protege Short Course::Dumontier:March 2012
Hypothesis

     h1:                                               • simple event-
     e1 (Gal4p induces expression of GAL1)               based
                                                         expression
     h2:
     e2 (Gal3p induces expression of GAL2              • conjunctive
                                                         hypothesis –
     e3 AND Gal4p induces expression of GAL7)            must satisfy
                                                         two
     h3:                                                 expressions
     e4 (Gal4p induces expression of GAL7
     e5 AND Gal80p inhibits production of Gal4p        • conjunctive
             when GAL3 is over-expressed                 hypothesis
                                                         with
     e6 AND Gal80p induces expression of GAL7)           conditional
                                                         expression



32                                           Protege Short Course::Dumontier:March 2012
HYQUE ARCHITECTURE




       Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web
                    technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.

   Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation.
   Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted.
33                                                                 Protege Short Course::Dumontier:March 2012
Rule-based assessment of evidence

     • „induce‟ rule (maximum score: 5):
        – Is event negated?                              GO:0010628

            • If yes, subtract 2
        – Is logical operator „induce‟?                                CHEBI:36080
            • If yes, add 1; if no, subtract 1
        – Is agent of type „protein‟ or „RNA‟?
            • If yes, add 1; if of type „gene‟, subtract 1
        – Is target of type „gene‟?                                        SO:0000236

            • If yes, add 1; if no, subtract 1
        – Does agent have known „transcription factor activity‟?
            • If yes, add 1                                                GO:0003700
        – Is event located in the „nucleus‟?
            • If yes, add 1; if no, subtract 1
                                                                  GO:0005634


34                                                    Protege Short Course::Dumontier:March 2012
Linked Open Results :
     from hypothesis to evidence




35                    Protege Short Course::Dumontier:March 2012
Literature-Based Enrichment Analysis




     • Enrichment analysis on terms extracted using a target
       ontology for associated articles.


       Enabling enrichment analysis with the Human Disease Ontology. Paea LePendu, , Mark A. Musen, Nigam H. Shah.
                Journal of Biomedical Informatics. Volume 44, Supplement 1, December 2011, Pages S31–S38
36                                                                         Protege Short Course::Dumontier:March 2012
37   Protege Short Course::Dumontier:March 2012
Phenotype-based predictions

      Phenotypes can be used as a substrate to
      cluster similar diseases, identify potential
      model systems, predict potential disease-
      treating drugs or their adverse events, drug
      repurposing, etc
     Robert Hoehndorf, Paul N. Schofield and Georgios V. Gkoutos. PhenomeNET: a whole-
     phenome approach to disease gene discovery. Nucleid Acids Research, 2011.

     Linking pharmgkb to phenotype studies and animal models of disease for drug repurposing.
     Hoehndorf R, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Pac Symp
     Biocomput. 2012:388-99.

     CK Chen, CJ Mungall, GV Gkoutos et al. MouseFinder: candidate disease genes from mouse
     phenotype data. Human Mutation 2012
38                                                        Protege Short Course::Dumontier:March 2012
Tetralogy of
                             Fallot

                       Phenotype ontologies should
                          contain descriptions of
                        morphological, behavioural,
                       physiological, developmental
                              characteristics


            Human Phenotype Ontology

     OMIM




39                   Protege Short Course::Dumontier:March 2012
Compare Diseases based on their
              Phenotypes




 Comparison using Weighted Jaccard – uses information content for a
 phenotype regarding genotype or disease



40                                           Protege Short Course::Dumontier:March 2012
Inferring equivalent phenotypes by
        reasoning over OWL ontologies
     human „overriding aorta [HP:0002623]‟ EquivalentTo:
     „phenotype of‟ some („has part‟ some („aorta [FMA:3734]‟ and „overlaps with‟ some
     „membranous part of interventricular septum [FMA:7135]‟)

     mouse „overriding aorta [MP:0000273 ]‟ EquivalentTo:
     „phenotype of‟ some („has part‟ some („aorta [MA:0000062]‟ and „overlaps with‟
     some „membranous interventricular septum [MA:0002939]‟

     Uberon super-anatomy ontology provides inter-species mappings
     „aorta [FMA:3734]‟ EquivalentTo: „aorta [MA:0002939]‟
     „membranous part of interventricular septum [FMA:3734]‟ EquivalentTo:
     „membranous interventricular septum [MA:0000062]

     Thus, „overriding aorta [HP:0002623] EquivalentTo:„overriding aorta[MP:0000273]‟



41                                                   Protege Short Course::Dumontier:March 2012
Identifying potential mouse models
             for human diseases




                     Quantitative ROC Analysis prediction
                     against curated models yields 0.89 AUC

                     Prediction of Tetralogy of Fallot added by
                     MGI




42                            Protege Short Course::Dumontier:March 2012
Conclusion

     • OWL has come of age and can be used in
       an increasing number of scientific
       investigations and applications
     • OWL applications cover knowledge
       capture, formalization, verification,
       classification, semantic annotation, query
       formulation, query answering, search,
       hypothesis testing and prediction


43                              Protege Short Course::Dumontier:March 2012

Real World Applications of OWL

  • 1.
    Real World Applicationsof OWL Michel Dumontier, Ph.D. Associate Professor of Bioinformatics Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University Ottawa Institute of Systems Biology Ottawa-Carleton Institute of Biomedical Engineering Professeur Associé, Université Laval Visiting Associate Professor, Stanford University 1 Protege Short Course::Dumontier:March 2012
  • 2.
    Ontologies in Use • Knowledge Capture (Rightfield) • Formalization and Verification (SNOMED-CT) • Consistency Checking (SBML Harvester) • Classification (Phosphatases, Compounds) • Semantic Annotation (Array Express/ Gene Expression Atlas, Semantic Assistant) • Query Formulation (Array Express/ Gene Expression Atlas) • Query Answering (KUPD) • Search & co-occurence (gopubmed) • Semantic Assistant • Hypothesis Testing (HyQue) • Disease Similarity and Model Organism prediction (phenomeBLAST) • Function Prediction (genemania) 2 Protege Short Course::Dumontier:March 2012
  • 3.
    Knowledge Capture Rightfield K.Wolstencroft, S.Owen, M.Horridge, O.Krebs, W.Mueller, JL. Snoep, F.Preez, C.Goble RightField: Embedding ontology annotation in spreadsheets. Bioinformatics (2011), May 2011 3 Protege Short Course::Dumontier:March 2012
  • 4.
    Formalization SNOMED-CT • SNOMED-CT (Clinical Terms) ontology • used in healthcare systems of more than 15 countries, including Australia, Canada, Denmark, Spain, Sweden and the UK • also used by major US providers, e.g., Kaiser Permanente • ontology provides common vocabulary for recording clinical data • 395036 classes 4 Protege Short Course::Dumontier:March 2012
  • 5.
    SNOMED-CT • Pattern based knowledge capture • need training and an information system to implement 5 Protege Short Course::Dumontier:March 2012
  • 6.
    SNOMED - verification • Kaiser Permanente extending SNOMED to express, e.g.: – non-viral pneumonia (negation) – infectious pneumonia is caused by a virus or a bacterium (disjunction) – double pneumonia occurs in two lungs (cardinalities) • This is easy in SNOMED-OWL – but reasoner failed to find expected subsumptions, e.g., that bacterial pneumonia is a kind of non-viral pneumonia • Ontology highly under-constrained: need to add disjointness axioms (at least) – virus and bacterium must be disjoint - Ian Horrocks OWL2 tutorial 6 Protege Short Course::Dumontier:March 2012
  • 7.
    SNOMED • Adding disjointness led to surprising results – many classes become inconsistent, e.g., percutanious embolization of hepatic artery using fluoroscopy guidance • Cause of inconsistencies identified as class groin – groin asserted to be subclass of both abdomen and leg – abdomen and leg are disjoint – modelling of groin (and other similar “junction” regions) identified as incorrect - Ian Horrocks OWL2 tutorial 7 Protege Short Course::Dumontier:March 2012
  • 8.
    Consistency Checking Formalization of SBML annotations into OWL ontologies • Biomodels contains hundreds of quantitative models • SBML is an XML-based format for specifying models and their parameters • Models and their components are being semantically annotated • Use the ontologies to validate the assertions Integrating systems biology models and biomedical ontologies. Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV. BMC Syst Biol. 2011 Aug 11;5:124. 8 Protege Short Course::Dumontier:March 2012
  • 9.
    Additional annotations arespecified using the Resource Description Framework (RDF) <species metaid="_525530" id="GLCi" Implicit subject compartment="cyto" and xml attributes initialConcentration="0.097652231064563"> <annotation> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax- The annotation ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" element stores the xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" RDF xmlns:bqmodel="http://biomodels.net/model-qualifiers/"> subject <rdf:Description rdf:about="#_525530"> <bqbiol:is> <rdf:Bag> <rdf:li predicate rdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/> <rdf:li rdf:resource="urn:miriam:kegg.compound:C00031"/> </rdf:Bag> </bqbiol:is> object </rdf:Description> </rdf:RDF> </annotation> </species> The intent is to express that the species represents a substance composed of glucose molecules We also know from the SBML model that this substance is located in the cytosol and with a (initial) concentration of 0.09765M 9
  • 10.
    For each modelannotation, we make a commitment to what it represents OWL Axiom: M SubClassOf: represents some MaterialEntity Conversion rule: a Model annotated with class C represents: If C is a SubClassOf MaterialEntity then M SubClassOf: represents some C If C is a SubClassOf Function then M SubClassOf: represents some (has-function some C) If C is a SubClassOf Process then M SubClassOf: represents some (has-function some (realized- by only C)) 10
  • 11.
  • 12.
    Model verification After reasoning, we found 27 models to be inconsistent reasons 1. our representation - functions sometimes found in the place of physical entities (e.g. entities that secrete insulin). better to constrain with appropriate relations 2. SBML abused – e.g. species used as a measure of time 3. Incorrect annotations - constraints in the ontologies themselves mean that the annotation is simply not possible 12
  • 13.
    Finding inconsistencies with axiomatically enhanced ontologies ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output and is a part of an ATP catabolic process. To this, we add: • GO: ATP + Water the only inputs (universal quantification) • ChEBI: Water, ATP, alpha-D-glucose 6-phosphate are all different (disjointness) • “ATP” input to “ATPase” reaction, which is annotated with ATPase activity. The species “ATP”, however, is mis- annotated with Alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP. • Unsatisfiable -> curation error in BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast. 13
  • 14.
    Classification: Phosphotases • Bioinformaticians use tools to identify functional domains (e.g., InterProScan) • Tools simply show the presence of domains - they do not classify proteins • Experts classify proteins according to domain arrangements - the presence and number of each domain is important PhosphaBase: an ontology-driven database resource for protein phosphatases. Wolstencroft KJ, Stevens R, Tabernero L, Brass A. Proteins. 2005 Feb 1;58(2):290-4. 14 Protege Short Course::Dumontier:March 2012
  • 15.
    Phosphatase Functional Domains 15 Protege Short Course::Dumontier:March 2012
  • 16.
    Defining Protein Phosphatases • Necessary and sufficient conditions are stipulated using EquivalentClass axioms • A protein phosphatase is exactly a protein that consists of exactly one transmembrane domain and contains at least one phosphotase domain ProteinPhosphatase EquivalentTo: Protein AND hasDomain 1 transMembraneDomain AND hasDomain min 1 PhosphataseCatalyticDomain 16 Protege Short Course::Dumontier:March 2012
  • 17.
    More precise classexpressions can be formulated for subtypes Inclusion of universal quantifier now restricts the domains to only the types listed R2A EquivalentTo: Protein AND hasDomain 2 ProteinTyrosinePhosphataseDomain AND hasDomain 1 TransmembraneDomain AND hasDomain 4 FibronectinDomains AND hasDomain 1 ImmunoglobulinDomain AND hasDomain 1 MAMDomain AND hasDomain 1 Cadherin-LikeDomain AND hasDomain only (TyrosinePhosphataseDomain OR TransmembraneDomain OR FibronectinDomain OR ImnunoglobulinDomain OR Clathrin-LikeDomain OR ManDomain) 17 Protege Short Course::Dumontier:March 2012
  • 18.
    Describing chemical functionalgroups in OWL-DL for the classification of chemical compounds methyl group hydroxyl group Ethanol Knowledge of functional Functional groups describe groups is important in chemical reactivity in terms of chemical synthesis, atoms and their connectivity, and pharmaceutical design and exhibits characteristic chemical lead optimization. behavior when present in a compound. N Villanueva-Rosales, M Dumontier. 2007. OWLED, Innsbruck, Austria. 18 Protege Short Course::Dumontier:March 2012
  • 19.
    Describing Functional Groupsin DL R group O R H HydroxylGroup: CarbonGroup that (hasSingleBondWith some (OxygenAtom that hasSingleBondWith some HydrogenAtom) 19 Protege Short Course::Dumontier:March 2012
  • 20.
    Fully Classified Ontology 35 FG 20 Protege Short Course::Dumontier:March 2012
  • 21.
    And, we definecertain compounds Alcohol: OrganicCompound that (hasPart some HydroxylGroup) 21 Protege Short Course::Dumontier:March 2012
  • 22.
    Organic Compound Ontology 28 OC 22 Protege Short Course::Dumontier:March 2012
  • 23.
    Question Answering: Classes as self-contained queries • Query PubChem, DrugBank and dbPedia 23 Protege Short Course::Dumontier:March 2012
  • 24.
    Querying Kidney andUrinary Knowledge Base and Ontology Query: What are the genes involved in Proteins transport expressed in Proximal Tubule Epithelial Cell? Entre gene KUPO Ontology Gene X GO:0054426 PT epithelial cell go:biological_process MA:00345 rdfs:label Gene Y ro:part_of kupo:002444 Higgings Dataset Proximal tubule DT epithelial cell MA:000345 MA:00456 Gene X rdfs:label kupo:expressed_in Distal tubule ro:part_of kupo:004672 Gene Y MA:00456 kupo:expressed_in 24 Protege Short Course::Dumontier:March 2012
  • 25.
    Semantic Annotation andQuery ArrayExpress Curation Curation >250,000 Assays ATLAS AE/GEO acquire >10,000 Re-annotate & summarize experiments Ontologically Modeling Sample Variables in Gene Expression Data malone@ebi.ac.uk 25 Protege Short Course::Dumontier:March 2012
  • 26.
    ontology-based data exploration Query for Cell adhesion genes in all „organism parts‟ „View on EFO‟ Ontologically Modeling Sample Variables in Gene Expression Data malone@ebi.ac.uk 26 Protege Short Course::Dumontier:March 2012
  • 27.
    Ontology-based query expansionfor ArrayExpress Archive @ www.ebi.ac.uk/arrayexpress 27 Protege Short Course::Dumontier:March 2012
  • 28.
    Search and Co-Occurrence 28 Protege Short Course::Dumontier:March 2012
  • 29.
    Semantic Assistant services relevant for the user's current task are offered directly within a desktop application. This approach relies on ontology-described semantic web services to provide external natural language processing (NLP) pipelines Leverage of OWL-DL axioms in a Contact Centre for Technical Product Support Alex Kouznetsov, Bradley Shoebottom, René Witte, Christopher JO Baker. OWLED 2010. 29 Protege Short Course::Dumontier:March 2012
  • 30.
    Plug-in for OpenOffice Client 30 Protege Short Course::Dumontier:March 2012
  • 31.
    • HyQue helpsconstruct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data using the Semantic Web • HyQue makes it possible to develop a reliability model around data based on our scientific expectations of corroborating evidence Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3. Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted. 31 Protege Short Course::Dumontier:March 2012
  • 32.
    Hypothesis h1: • simple event- e1 (Gal4p induces expression of GAL1) based expression h2: e2 (Gal3p induces expression of GAL2 • conjunctive hypothesis – e3 AND Gal4p induces expression of GAL7) must satisfy two h3: expressions e4 (Gal4p induces expression of GAL7 e5 AND Gal80p inhibits production of Gal4p • conjunctive when GAL3 is over-expressed hypothesis with e6 AND Gal80p induces expression of GAL7) conditional expression 32 Protege Short Course::Dumontier:March 2012
  • 33.
    HYQUE ARCHITECTURE Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3. Callahan A, Dumontier M. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012. Accepted. 33 Protege Short Course::Dumontier:March 2012
  • 34.
    Rule-based assessment ofevidence • „induce‟ rule (maximum score: 5): – Is event negated? GO:0010628 • If yes, subtract 2 – Is logical operator „induce‟? CHEBI:36080 • If yes, add 1; if no, subtract 1 – Is agent of type „protein‟ or „RNA‟? • If yes, add 1; if of type „gene‟, subtract 1 – Is target of type „gene‟? SO:0000236 • If yes, add 1; if no, subtract 1 – Does agent have known „transcription factor activity‟? • If yes, add 1 GO:0003700 – Is event located in the „nucleus‟? • If yes, add 1; if no, subtract 1 GO:0005634 34 Protege Short Course::Dumontier:March 2012
  • 35.
    Linked Open Results: from hypothesis to evidence 35 Protege Short Course::Dumontier:March 2012
  • 36.
    Literature-Based Enrichment Analysis • Enrichment analysis on terms extracted using a target ontology for associated articles. Enabling enrichment analysis with the Human Disease Ontology. Paea LePendu, , Mark A. Musen, Nigam H. Shah. Journal of Biomedical Informatics. Volume 44, Supplement 1, December 2011, Pages S31–S38 36 Protege Short Course::Dumontier:March 2012
  • 37.
    37 Protege Short Course::Dumontier:March 2012
  • 38.
    Phenotype-based predictions Phenotypes can be used as a substrate to cluster similar diseases, identify potential model systems, predict potential disease- treating drugs or their adverse events, drug repurposing, etc Robert Hoehndorf, Paul N. Schofield and Georgios V. Gkoutos. PhenomeNET: a whole- phenome approach to disease gene discovery. Nucleid Acids Research, 2011. Linking pharmgkb to phenotype studies and animal models of disease for drug repurposing. Hoehndorf R, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Pac Symp Biocomput. 2012:388-99. CK Chen, CJ Mungall, GV Gkoutos et al. MouseFinder: candidate disease genes from mouse phenotype data. Human Mutation 2012 38 Protege Short Course::Dumontier:March 2012
  • 39.
    Tetralogy of Fallot Phenotype ontologies should contain descriptions of morphological, behavioural, physiological, developmental characteristics Human Phenotype Ontology OMIM 39 Protege Short Course::Dumontier:March 2012
  • 40.
    Compare Diseases basedon their Phenotypes Comparison using Weighted Jaccard – uses information content for a phenotype regarding genotype or disease 40 Protege Short Course::Dumontier:March 2012
  • 41.
    Inferring equivalent phenotypesby reasoning over OWL ontologies human „overriding aorta [HP:0002623]‟ EquivalentTo: „phenotype of‟ some („has part‟ some („aorta [FMA:3734]‟ and „overlaps with‟ some „membranous part of interventricular septum [FMA:7135]‟) mouse „overriding aorta [MP:0000273 ]‟ EquivalentTo: „phenotype of‟ some („has part‟ some („aorta [MA:0000062]‟ and „overlaps with‟ some „membranous interventricular septum [MA:0002939]‟ Uberon super-anatomy ontology provides inter-species mappings „aorta [FMA:3734]‟ EquivalentTo: „aorta [MA:0002939]‟ „membranous part of interventricular septum [FMA:3734]‟ EquivalentTo: „membranous interventricular septum [MA:0000062] Thus, „overriding aorta [HP:0002623] EquivalentTo:„overriding aorta[MP:0000273]‟ 41 Protege Short Course::Dumontier:March 2012
  • 42.
    Identifying potential mousemodels for human diseases Quantitative ROC Analysis prediction against curated models yields 0.89 AUC Prediction of Tetralogy of Fallot added by MGI 42 Protege Short Course::Dumontier:March 2012
  • 43.
    Conclusion • OWL has come of age and can be used in an increasing number of scientific investigations and applications • OWL applications cover knowledge capture, formalization, verification, classification, semantic annotation, query formulation, query answering, search, hypothesis testing and prediction 43 Protege Short Course::Dumontier:March 2012

Editor's Notes

  • #26 So slide 1 is the workflow we have for getting data into the two repositories. We take publicly submitted data, often accompanying a publication, and we also have a pipeline importing data from GEO which we re-annotate. Data is ran through some scripts to check minimum QC then it is manually curated which then goes into ArrayExpress. The second part of that pipeline is that a subset of that data, selected based on array design type, is re-annotated against ontology terms. We use the gene annotations plus ontology terms to integrate and analyse on a per gene per condition level and summarise the diff expression for each gene vs condition - this is Atlas.Second slide is showing the gene expression atlas. you can explore the data using the ontology (far right), it expands based on the tree and on synonym (annotation properties). The view you see is based on the ontology tree.Third slide is ArrayExpress which is not annotated with EFO URIs but here we use the ontology to drive query expansion simply using it as a vocabulary. A search for cancer would expand on all subtypes and perform a Lucene query across the data for anything matching this text. This will change in near future as we are aiming to annotate both against the ontology.
  • #36 The RDF representing the evaluation of the input hypothesis is linked to both the hypothesis AND the data used to evaluate the hypothesis
  • #37  Workflow for generating background annotation sets for enrichment analysis: First, we start with a corpus of PubMed articles identified in manually curated GO annotations. These curated annotations provide gene-to-article associations. Next, we annotate the titles and abstracts of each article with ontology terms using the NCBO Annotator service. Terms associations can be expanded based on inferred hierarchical relationships. Finally, the gene-to-article associations are linked with the curated article-to-term associations to obtain a list of gene-to-term associations. The resulting term frequencies provide a background set for enrichment analysis