Formal representation of models in systems biology


Published on

Published in: Education
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Slick used car salesman
  • Formal representation of models in systems biology

    1. 1. Formal representation of models in systems biology<br />1<br />Michel Dumontier, Ph.D.<br />Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University<br />Professeur Associé, Département d’informatique et de génielogiciel, Université Laval<br />Ottawa Institute of Systems Biology<br />Ottawa-Carleton Institute of Biomedical Engineering<br />INRIA2011::Dumontier<br />
    2. 2. Systems Biology<br />We create and simulate biological models to :<br /><ul><li>gain insight into the structure and function of biochemical networks
    3. 3. reveal metabolic and signallingcapabilities so as to predict phenotypes
    4. 4. undertake metabolic engineering to maximize some desired product</li></ul>To do this, we need <br /><ul><li>to integrate & manage our data & knowledge in a coherent, scalable and machine understandable manner
    5. 5. efficient software to execute computationally demanding simulations</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />2<br />
    6. 6. Repressilator: A self-regulating system<br />3<br />INRIA2011::Dumontier<br />A synthetic oscillatory network of transcriptional regulators. Elowitz MB, Leibler S. (2000). Nature 403: 335-338.<br />
    7. 7. Biomodels are specified using SBML+ semantic annotation<br />4<br />INRIA2011::Dumontier<br />
    8. 8. Semantic annotations are primarily used for browse and search<br /><ul><li>EBI managed resource with 600+ SBML models
    9. 9. 300+ models are annotated with
    10. 10. Pubmed- papers
    11. 11. ChEBI - chemicals
    12. 12. UniProt - proteins
    13. 13. KEGG - chemicals, reactions
    14. 14. E.C. - reactions
    15. 15. Gene Ontology - functions, reactions, compartments
    16. 16. Taxonomy - organism</li></ul><br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />5<br />
    17. 17. Bio-ontologies<br /><ul><li>Provide rich human and machine understandable descriptionsof the terms they purport to describe
    18. 18. Are generally used for semantic annotation of data, which when reused, facilitate integration across domains (granularity, species, experimental methods)
    19. 19. Facilitate granular and cross-domain queries
    20. 20. Can be used to obtain explanations for inferencesdrawn
    21. 21. Can be efficiently processed by algorithms and software</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />6<br />
    22. 22. Additional annotations are specified using the Resource Description Framework (RDF)<br />Implicit subject and xml attributes<br />   <species metaid="_525530" id="GLCi" <br /> compartment="cyto" <br />initialConcentration="0.097652231064563"><br />      <br />     <annotation><br />        <rdf:RDFxmlns:rdf="" xmlns:dc="" xmlns:dcterms="" xmlns:vCard="" xmlns:bqbiol="" xmlns:bqmodel=""><br />          <rdf:Descriptionrdf:about="#_525530"><br />            <bqbiol:is><br />              <rdf:Bag><br />                <rdf:lirdf:resource="urn:miriam:obo.chebi:CHEBI%3A4167"/><br />                <rdf:lirdf:resource="urn:miriam:kegg.compound:C00031"/><br />              </rdf:Bag><br />            </bqbiol:is><br />          </rdf:Description><br />        </rdf:RDF><br />      </annotation><br />    </species><br />The annotation element stores the RDF<br />subject<br />predicate<br />object<br />The intent is to express that the species represents a substance composed of glucose molecules<br />We also know from the SBML model that this substance is located in the cytosol and with a (initial) concentration of 0.09765M<br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />7<br />
    23. 23. RDF-based Linked Data <br /><ul><li>Provides the basis for simple data syndication and syntactic data integration
    24. 24. IRIs
    25. 25. Statements (aka triples) take the form of 
    26. 26. <subject> <predicate> <object>
    27. 27. Easy to implement
    28. 28. stand-alone datasets
    29. 29. logical layer over databases
    30. 30. Limited reasoning
    31. 31. class and property hierarchies
    32. 32. domain/range restrictions
    33. 33. can’t automatically discover inconsistency
    34. 34. Standardized Queries - SPARQL
    35. 35. Scalable - to billions of triples</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />8<br />
    36. 36. Objective:Computational Knowledge Discovery<br /><ul><li>Terminological resources increasingly being used to annotate SBML-based biomolecular models 
    37. 37. Makes it easier to explore or find models
    38. 38. By converting models into formal representations of knowledge we get to:
    39. 39. validate the accuracy of the annotations 
    40. 40. infer knowledge explicit in terminological resources
    41. 41. discover biological implications inherent in the models and the results of simulations.</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />9<br />
    42. 42. Approach<br />We formally represent semantically annotated biomodelssuch that it becomes possible to:<br />Capture the semantics of models and the biological systems they represent<br />Check the consistency of biological knowledge represented through automated reasoning<br />query the results of simulations in the context of the biological knowledge<br />10<br />INRIA2011::Dumontier<br />
    43. 43. Have you heard of OWL?<br />COMBINE2011:Dumontier<br />11<br />
    44. 44. OWL - The Web Ontology Language<br />Enhanced vocabulary (strong axioms) to express knowledge relating to classes, properties, individuals and data values<br /><ul><li>quantifiers
    45. 45. existential, universal, cardinality restriction
    46. 46. negation
    47. 47. disjunction
    48. 48. property characteristics
    49. 49. transitive, functional, inverse functional, symmetric, antisymmetric, reflexive, irreflexive
    50. 50. complex classes in domain and range restrictions
    51. 51. property chains</li></ul>12<br />COMBINE2011:Dumontier<br />12<br />
    52. 52. Powerful Reasoning over OWL ontologies<br /><ul><li>Consistency: determines whether the ontology contains contradictions.
    53. 53. Satisfiability: determines whether classes can have instances.
    54. 54. Subsumption: is class C1 implicitly a subclass of C2?
    55. 55. Classification: repetitive application of subsumption to discover implicit subclass links between named classes
    56. 56. Realization: find the most specific class that an individual belongs to. </li></ul>13<br />COMBINE2011:Dumontier<br />13<br />
    57. 57. Triples to axioms: Many possible formalizations – knowledge of logics and domain expertise comes in handy here!<br />We make an ontological commitment by converting RDF triples into OWL axioms.<br /> <br />Triple in RDF:<br /><C1 R C2><br /><ul><li>C1 and C2 are classes, R a relation between 2 classes
    58. 58. intended meaning:
    59. 59. C1 SubClassOf: C2
    60. 60. C1 SubClassOf: R some C2
    61. 61. C1 SubClassOf: R only C2
    62. 62. C2 SubClassOf: R some C1
    63. 63. C1 SubClassOf: S some C2
    64. 64. C1 DisjointFrom: C2
    65. 65. C1 and C2 SubClassOf: owl:Nothing
    66. 66. R some C1 DisjointFrom: R some C2
    67. 67. C1 EquivalentClasses C2
    68. 68. ...
    69. 69. in general: P(C1, C2), where P is an OWL axiom (template)</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />14<br />
    70. 70. INRIA2011::Dumontier<br />15<br />Conceptualization:Models and their components represent physical entities (material entities, processes)<br />Formalization: <br />every element E of the SBML language represents a class Rep(E) and we assert that  E subClassOf: represents some Rep(E)<br />
    71. 71. Top-level ontologies can make additional commitment by enforcing disjointnessamong basic types<br />Material object, Process, Function and Quality are mutually disjoint.<br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />16<br />
    72. 72. Relations impose additional constraints, such that inconsistencies arise when incorrectly used<br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />17<br />
    73. 73. For each model annotation, we make a commitment to what it represents<br />OWL Axiom:<br />Model SubClassOf: represents some MaterialEntity<br />Conversion rule: a Model annotated with class C represents:<br />If C is a SubClassOfMaterialEntity then <br />M SubClassOf: represents some C<br />If C is a SubClassOfFunction then<br />M SubClassOf: represents some (has-function some C)<br />If C is a SubClassOfProcess then<br />M SubClassOf: represents some (has-function some (realized-by only C))<br />
    74. 74. BIOMODEL 82: Converting Model<br />Annotated with heterotrimeric G-protein complex cycle (GO:0031684):<br /><ul><li> represents an object O1
    75. 75. O1 has a function F1
    76. 76. F1 is realized by processes of the type heterotrimeric G-protein complex cycle</li></ul>M SubClassOf: represents some O1<br />O1 SubClassOf: (has-function some (realized-by only GO:0031684)<br />
    77. 77. Every compartment represents a material entity<br />Compartment(x): a model component that represents a material entity which is part of the object represented by the model to which the component belongs<br />   Compartment SubClassOf  'model component'  <br />     and represents some 'Material object'<br />Conversion rule:<br /><ul><li> represents an object O2
    78. 78. part of the object represented by the model
    79. 79. compartment’s species represent objects that are located in O2
    80. 80. C SubClassOf: represents some A2
    81. 81. A2 SubClassOf: located-in some A1</li></li></ul><li>Models and their components represent physical entities (material entities, processes)<br />INRIA2011::Dumontier<br />model-of<br />model<br />Physical system<br />has part<br />represents<br />species<br />Material entity<br />represents<br />compartments<br />Material entity<br />represents<br />SBMLHarvester<br />reactions<br />Material entity<br />Robert Hoehndorf, Michel Dumontier, John H Gennari, SaralaWimalaratne, Bernard de Bono, Daniel L Cook and Georgios V Gkoutos. Integrating systems biology models and biomedical ontologies. BMC Systems Biology 2011, 5:124.<br />has-function<br />Function<br />realized in only<br />Class<br />Process<br />Individual<br />Datatype<br />21<br />
    82. 82. ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />22<br />
    83. 83. SBML2OWL: Implementation<br />SBML Harvester<br /><ul><li>libSBMLto access model structure & extract RDF annotations
    84. 84. Jena RDF API to parse RDF annotations
    85. 85. OWLAPI to create OWL axioms & reason with top-level ontology</li></ul>Application to BioModels repository yields:<br /><ul><li>OWL ontology with more than 300,000 classes, 800,000 axioms
    86. 86. includes all referenced ontologies
    87. 87. GO (functions, compartments, processes)
    88. 88. ChEBI (molecules)
    89. 89. Celltype (cell types)
    90. 90. FMA (anatomy)
    91. 91. PATO (qualities)</li></li></ul><li>Answering questions<br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />24<br />
    92. 92. Model verification<br />After reasoning, we found 27 models to be inconsistent<br />reasons<br />our representation - functions sometimes found in the place of physical entities (e.g. entities that secrete insulin). better to constrain with appropriate relations<br />SBML abused – e.g. species used as a measure of time<br />constraints in the ontologies themselves mean that the annotation is simply not possible<br />ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />25<br />
    93. 93. Finding inconsistencies with axiomatically enhanced ontologies<br />recent work treats function as process and axioms state that an ATPase activity (GO:0004002) is a Catalytic activity that has Water and ATP as input, ADP and phosphate as output and is a part of an ATP catabolic process.<br />To this, we add:<br /><ul><li>GO: ATP + Water the only inputs (universal quantification)
    94. 94. ChEBI: Water, ATP, alpha-D-glucose 6-phosphate  are all different (disjointness)</li></ul>BIOMD0000000176 and BIOMD0000000177 models of anaerobic glycolysis in yeast.<br /><ul><li>“ATP” input to “ATPase” reaction, which is annotated with ATPase activity. The species “ATP”, however, is mis-annotated with Alpha-D-glucose 6-phosphate (CHEBI:17665), not with ATP.</li></li></ul><li>Disambiguation: Model annotations<br />Assertion:<br />M SubClassOf: represents some C or represents some (has-function some C) or represents some (has-function some (realized-by only C))<br />C SubClassOf: MaterialEntity<br />Then: <br /><ul><li>represents some C is satisfiable
    95. 95. represents some (has-function some C) and represents some (has-function some (realized-by only C)) are unsatisfiable</li></ul>ISMB2011::Dumontier|Hoehndorf::Formalizing Systems Biology with Biomedical Ontologies<br />27<br />
    96. 96. Species are further described with ‘modifiers’ in the context of a reaction<br />28<br />INRIA2011::Dumontier<br />essential activator<br /><listOfModifiers> <br /> <modifierSpeciesReferencesboTerm="SBO:0000461" species="X"/> </listOfModifiers><br />partial inhibitor<br /><listOfModifiers> <br /> <modifierSpeciesReferencesboTerm="SBO:0000536" species="PX"/> </listOfModifiers><br />
    97. 97. Roles are realized in the context of processes by material entities<br />INRIA2011::Dumontier<br />model<br />sio: has proper part<br />sio: represents<br />sio: has direct part<br />species<br />ChEBI:molecule<br />ChEBI:substance<br />sio: role of<br />SBO: participant role<br />sio: has participant<br />SBMLHarvester+<br />sio: realizes<br />sio: represents<br />GO:Process<br />reaction<br />Class<br />role chain: realizes o role of -> has participant<br />Individual<br />Datatype<br />29<br />
    98. 98. Semanticscience Integrated Ontology (SIO)<br />OWL2 ontology<br />1000+ classes covering basic types (physical, processual, abstract, informational) with an emphasis on biological entities<br />183 basic relations (mereological, participatory, attribute/quality, spatial, temporal and representational)<br />axioms can be used by reasoners to compute inferences for consistency checking, classification and answering questions about life science knowledge<br />embodies emerging ontology design patterns <br />specifies a data model<br />dereferenceable URIs<br />searchable in the NCBO bioportal<br />Available at<br />30<br />INRIA2011::Dumontier<br />
    99. 99. Examining Mathematical Expressions<br />31<br />INRIA2011::Dumontier<br /><assignmentRulemetaid="metaid_0400235" variable="k_tl"> <br /> <math><br /> <apply> <br /> <divide/> <br /> <ci> eff </ci> <br /> <ci> t_ave </ci> <br /> </apply> <br /> </math> <br /></assignmentRule><br /> <kineticLawsboTerm="SBO:0000049"> <br /> <math><br /> <apply> <br /> <times/> <br /> <ci> k_tl </ci> <br /> <ci> X </ci> <br /> </apply> <br /> </math> <br /> </kineticLaw><br /><parameter metaid="metaid_0000233" id="k_tl" name="k_tl" constant="false" sboTerm="SBO:0000016"> (unimolecularrate constant)<br /> <notes> <p xmlns="">Translation rate constant</p> </notes> <br /></parameter><br /> <parameter metaid="metaid_0000025" id="eff" name="translation efficiency" value="20"> <br /> <notes> <p xmlns="">Average number of proteins per transcript</p> </notes> <br /></parameter><br />
    100. 100. SBML Reactions may be specified by mathematical expressions, which contain quantitative variables that denote quantities<br />SBO:Reaction<br />Quantity<br />sio: is specified by<br />sio:denotes<br />sio:<br />has proper part<br />SBO: systems<br /> description<br /> parameter<br />SBO:mathematical<br />expression<br />sio: has value<br />Literal<br />Class<br />sio:derives from<br />SBMLFarmer<br />Individual<br />Datatype<br />INRIA2011::Dumontier<br />32<br />
    101. 101. When running a simulation, some attributes change with time<br />33<br />INRIA2011::Dumontier<br />species<br />double<br />sio:has attribute<br />sio: has value<br />sio: has unit<br />uo:unit<br />attribute<br />sio: measured at<br />sio: has value<br />datetime<br />time<br />sio: result of<br />sio: has agent<br />simulation<br />software<br />sio: conforms to<br />sio: has participant<br />KISAO:<br />algorithm<br />parameter<br />model<br />expression<br />Class<br />Individual<br />Datatype<br />
    102. 102. 34<br />INRIA2011::Dumontier<br />
    103. 103. Copasi output:not machine understandable<br />35<br />INRIA2011::Dumontier<br />
    104. 104. Query Answering over RDF/OWL<br />Find those concentration measurements for species that represent molecular entities that contain ribonucleotide residues<br />‘concentration’ <br />and (‘measured at’ some double[>20.0, <40.0])<br />and ‘is attribute of’ some (<br /> ‘species’<br /> and ‘represents’ some (‘has part’ some ‘ribonucleotide residue’)<br />)<br />36<br />INRIA2011::Dumontier<br />ChEBI ontology<br />
    105. 105. Curve Analysis: Elements of a Plot<br />37<br />Curve: Monotonicity<br />Curves Contain<br /><ul><li>Points
    106. 106. Line Segments
    107. 107. Curve Segments</li></ul>Curves Annotated As<br /><ul><li>Monotonic/Non-Monotonic
    108. 108. Overall Increasing/Decreasing</li></ul>Curve Segments<br /><ul><li>Contain line segments
    109. 109. Same properties as curve</li></ul>Line Segments<br /><ul><li>Contain points
    110. 110. Increasing/Decreasing</li></ul>Points<br /><ul><li>Have attributes and values
    111. 111. Can be local minima/maxima
    112. 112. Can be inflection points</li></ul>TEDDY_0000144<br />Point: Stationary Point <br />(Maximum)<br />B<br />A<br />C<br />Curve Segment<br />Overall Change (Slope)<br />Constituent Points<br />Concentration<br />strictly <br />increasing<br />TEDDY_0000008<br />strictly <br />decreasing<br />Curve: Overall Change<br />D<br />TEDDY_0000009<br />Time<br />INRIA2011::Dumontier<br />
    113. 113. Queries<br />‘local maximum’ <br />and ‘is attribute of’ some (<br /> species <br /> and represents some (<br /> ‘has function’ some ‘dna binding’<br />))<br />38<br />INRIA2011::Dumontier<br />Biomodel + UniProt + GO<br />
    114. 114. Get the non-monotonic curves for protein species<br />‘non-monotonic curve’<br />and ‘has part’ some (<br />‘concentration’<br /> and ‘is attribute of’ some (<br /> ‘species’ <br /> and ‘represents’ some ‘protein’))<br />)<br />39<br />INRIA2011::Dumontier<br />
    115. 115. Conclusion<br />The SBML-derived ontologies can be <br /> i) checked for their consistency, thereby uncovering erroneous curations<br /> ii) infer attributes and relations of the substances, compartments and reactions beyond what was originally described in the models<br /> iii) answer sophisticated questions across a model knowledge base <br />iv) extended with modifiers, mathematical expressions and parameters, simulation Results (from tab files) to answer questions about simulation results with reference to the semantic annotations (GO) in biomodels, UniProt<br />40<br />INRIA2011::Dumontier<br />
    116. 116. 41<br />Acknowledgements<br />Leonid Chepelev<br />Robert Hoehndorf <br />INRIA2011::Dumontier<br />
    117. 117. Michel Dumontier<br /><br />42<br />INRIA2011::Dumontier<br />Publications:<br />Presentations:<br />