Modelling metabolite concentrations in OWL using Pronto

OWLED 2011 presentation on the topic of modelling metabolite concentrations

  • While genomic and proteomic information describe the overall cellular machinery available to an organism, the metabolic profile of an individual at a given time provides a picture of the current physiological state. Concentration levels of relevant metabolites vary under different conditions, in particular in the presence or absence of different disorders.
  • In ChEBI, 780 chemical entities with chemical structures are associated with the role 'metabolite'. This information is somewhat useful for clustering molecules (at least it allows us to distinguish the molecules that can be metabolites from those that cannot), but it is far too general. We can do much better.
  • ChEBI roles represent active properties of chemical entities: what chemicals do in biological contexts. This information is enhanced by a specific representation of the context in which the chemicals are active. For metabolites, contextual information includes which organism (taxonomy), how much of the metabolite is usually (normally) present in different body fluids, and which disorders are associated with abnormal levels in different body fluids.
  • The HMDB is a database collecting together knowledge about all known human metabolites, including physicochemical, spectral, clinical, biochemical and genomic information. For each metabolite, HMDB includes measured concentration values taken from human samples of different biofluids (such as blood, urine and cerebrospinal fluid), from persons of different ages and with different underlying conditions.
  • In some cases the link between certain concentration values and the associated condition can be close to certain: consider pregnancy testing.
  • HMDB data is parsed from its MetaboCards download format. We extract metabolite concentrations from HMDB where there is both a normal and an abnormal (disease-associated) concentration level for an adult subject. The difference between the normal and abnormal concentration values indicates a threshold between these scenarios. We want to infer the likelihood that a disorder is present from the numeric concentration value being closer to the known disordered concentration than to the known normal concentration.
  • uM = micromolar (1e-6 M)
  • Problem: standard OWL inference has no way to handle uncertainty. Introduce probabilistic DLs.
  • Lukasiewicz, T.: Probabilistic description logics for the semantic web. TU Vienna infsys research report (2007)
  • We create classes for the categories of low, medium and high risk of having the given disorder. As a simplifying assumption, the variation of risk with concentration value can be thought of as a continuous function ranging over all possible concentration values. However, Pronto constraints take the form of intervals associated with classes (or instances), so to create a finite number of OWL classes and associate probability intervals with them, the probability function must be discretized into fixed ranges.
  • What is the risk that Barry has diabetes?
  • In the absence of a conflict, Pronto's strategy for combining two constraints resembles a data union operation. When multiple constraints conflict, Pronto prefers more specific statements to less specific ones. We evaluated this behaviour by changing the medium risk constraint to overlap with the high risk constraint, setting the upper bound for medium to 0.55 instead of 0.54. In this case, Pronto concludes that the probability of Barry having diabetes is [0.55;0.55], the most specific (narrowest) resolution. If the medium risk range extends to 0.6, Pronto entails the range [0.55;0.6] for Barry. Thus, at least for the two-axiom scenario tested here, the behaviour on conflict seems to resemble an intersection of the two underlying data ranges.
  • Neither of these results represents the intuitive requirement driven by the use case well: it would be better if the probabilistic combination of different types of evidence for the same conclusion increased the certainty of that conclusion. Pronto does, however, allow inherited constraints to be overridden in more specific subclasses. We can therefore specify a new risk subclass for Barry's combined risk categories and associate it with the disease through a new probability range (e.g. [0.54;0.85]). In general, however, this approach is cumbersome, as it requires adding many more classes and constraints to the knowledge base, one for every interesting combination of risk factors.
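
The two-step reasoning described in the notes on discretization and on slide 16 (first assign a concentration to a risk category, then read off the probability interval attached to that category) can be sketched in Python. The probability intervals are the ones on slide 16. The concentration boundaries between categories are a hypothetical choice for illustration only (at or below the normal value is low risk, between the normal value and the threshold is medium risk, at or above the threshold is high risk), and every class and function name here is an assumption. The sketch also assumes that the disorder raises the concentration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskCategory:
    """A discretized risk band with its probability interval for the disorder."""
    name: str
    lower: float  # lower bound l of the conditional constraint
    upper: float  # upper bound u of the conditional constraint


# Probability intervals as shown on slide 16.
LOW = RiskCategory("low risk", 0.00, 0.24)
MEDIUM = RiskCategory("medium risk", 0.25, 0.54)
HIGH = RiskCategory("high risk", 0.55, 1.00)


def risk_category(value_um: float, normal_um: float, threshold_um: float) -> RiskCategory:
    """Step 1: map a concentration (in uM) to a risk band.

    Hypothetical boundaries; assumes the abnormal level lies above the normal level.
    """
    if value_um <= normal_um:
        return LOW
    if value_um < threshold_um:
        return MEDIUM
    return HIGH


def disorder_probability(value_um: float, normal_um: float, threshold_um: float) -> tuple:
    """Step 2: return the (lower, upper) probability interval for the disorder."""
    category = risk_category(value_um, normal_um, threshold_um)
    return (category.lower, category.upper)


if __name__ == "__main__":
    # D-glucose in blood: normal 4440 uM (slide 10), threshold 5700 uM (slide 11).
    for value in (4000.0, 5000.0, 6500.0):
        cat = risk_category(value, 4440.0, 5700.0)
        print(value, cat.name, disorder_probability(value, 4440.0, 5700.0))
```

Attaching the probability intervals to a small, fixed set of categories, rather than to individual concentration values, mirrors the discretization the notes describe: only a finite number of classes carry probabilistic constraints.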
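
The combination results reported in the notes and on slides 19 and 20 can be reproduced with a small interval rule: if the two intervals overlap, take their intersection; if they are disjoint, take the smallest interval covering both (the "union" of slide 19, since the union of two disjoint intervals is not itself an interval). This is only a sketch that mimics the observed two-constraint results; it is not how Pronto computes entailments internally, and the function name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper) probability bounds


def combine_observed(a: Interval, b: Interval) -> Interval:
    """Mimic the results Pronto gave for two constraints on the same conclusion."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo <= hi:
        # Overlapping (conflicting) constraints: the intersected range is returned.
        return (lo, hi)
    # Disjoint constraints: return the smallest range covering both.
    return (min(a[0], b[0]), max(a[1], b[1]))


if __name__ == "__main__":
    high = (0.55, 1.00)
    print(combine_observed((0.25, 0.54), high))  # no conflict (slide 19)
    print(combine_observed((0.25, 0.55), high))  # conflict (slide 20)
    print(combine_observed((0.25, 0.60), high))  # conflict (slide 20)
```

In all three cases the combined lower bound never exceeds 0.55, the lower bound of the high-risk constraint on its own, which is exactly the limitation raised in the Discussion: two pieces of evidence do not raise the probability of the conclusion.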

    1. OWLED 2011
       Modelling threshold phenomena in OWL: Metabolite concentrations as evidence for disorders
       Janna Hastings (1, 2), Ludger Jansen (3, 4), Christoph Steinbeck (1), Stefan Schulz (5)
       1 Chemoinformatics and Metabolism, European Bioinformatics Institute, UK
       2 Swiss Centre for Affective Sciences, University of Geneva, Switzerland
       3 Department of Philosophy, University of Rostock, Germany
       4 Department of Philosophy, RWTH Aachen University, Germany
       5 Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria
    2. Motivation
       How do we link chemical entities to diseases?
       Chemicals can be used as drugs to treat diseases.
       But chemicals also occur throughout living organisms as metabolites: by-products of metabolic processes that indicate which of those processes have taken place.
       Wednesday, June 08, 2011
       Metabolite concentrations as evidence for disorders (OWLED 2011)
    3. ChEBI
       ChEBI is an ontology for chemicals that appear in a biological context.
       Chemical entities, such as molecules and ions, are classified structurally and assigned to one or more roles.
       Examples: antioxidant, analgesic drug, cyclooxygenase inhibitor, ... metabolite
    4. ChEBI Roles
    5. Metabolites in ChEBI
    6. Contextual information
       In which organism(s) is the molecule a metabolite?
       How much (what concentration) of this metabolite is normally present in the different biofluids of those organisms?
       Which disorders are associated with abnormal (increased or decreased) levels of this metabolite?
    7. Human Metabolome DB
       Database of human metabolites and associated contextual information
       Includes measured concentration values from different human samples under different conditions (specified as free text!)
       Wishart DS, Knox C, Guo AC, et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009;37(Database issue):D603-610.
    8. Metabolite concentrations and OWL
       Numeric data (OWL data ranges; DL concrete domains)
       The link between concentrations and disorders is not certain, but a concentration of some metabolite above a certain threshold is considered evidence for the presence of a disorder.
       The threshold between normal and abnormal levels is vague (no definite cut-off).
    9. Data extraction
       We extract metabolite concentration values
       for metabolites found in ChEBI,
       where both a normal and an abnormal value are present for an adult subject.
       The difference between the normal and abnormal concentrations indicates a threshold between these scenarios.
    10. Reasoning with OWL data ranges
        Can we use the ontology to automatically differentiate normal from abnormal concentrations?
        [Figure: metabolite concentration axis for D-glucose in blood, marking the measured normal value (4440 uM, normal adult), the measured abnormal value (7000 uM, adult with diabetes) and a threshold between them separating normal from abnormal concentrations.]
    11. Generated ontology
        `concentration of D-glucose in Blood associated with Diabetes mellitus type 2'
        equivalentTo (`concentration in blood'
          and (hasMetabolite some `portion of D-glucose')
          and (hasConcentrationValue some double[>= 5700.0]))
    12. Uncertainty
        Individual differences mean that we cannot straightforwardly associate an abnormal metabolite concentration with a disorder.
        Rather, we want to infer the likelihood (risk) that a patient has a given disorder, given their metabolite concentration value.
    13. Probabilistic DLs
        Probabilistic DLs extend traditional DLs with the ability to associate with each axiom in the ontology a probability value, which represents the degree of certainty of the axiom.
        Probabilistic knowledge consists of conditional constraints:
          (ψ | φ)[l, u]
        with l, u real numbers in the range [0, 1]. Such a constraint encodes that φ is a subclass of ψ with probability between l and u.
    14. PRONTO
        A probabilistic, non-monotonic extension to Pellet
        Accepts probabilistic axioms of the form
          X subClassOf Y [l, u]
        (expressed as annotations: pronto:certainty)
        We used version 0.2 with a slight modification: upgraded to the latest Pellet and OWL API releases
        Klinov, P.: Pronto: A Non-monotonic Probabilistic Description Logic Reasoner. Lecture Notes in Computer Science, vol. 5021, chap. 66, pp. 822-826.
    15. Discretization
        We assume disorder risk varies continuously with metabolite concentration.
        However, Pronto accepts only discrete ranges.
        [Figure: probability of the associated disorder (low to high) against metabolite concentration, with the measured normal value, the threshold and the measured abnormal value marked, divided into low-risk, medium-risk and high-risk bands.]
    16. Reasoning with probabilities
        Step 1: What risk category is this concentration? (reasoning based on data restrictions)
        Step 2: What is the likelihood that this person has this disorder? (reasoning based on probabilistic constraints)
        [Figure: concentration in blood, linked to a risk category, linked to the disorder; Low risk [0.00; 0.24], Medium risk [0.25; 0.54], High risk [0.55; 1.00].]
    17. Results
        …
    18. Combining different evidence
        Can we accumulate the evidence (i.e. increase the likelihood) for the presence of a given disorder if there are multiple metabolite concentration values pointing towards it?
        [Figure: Barry has a concentration of D-glucose in blood and a concentration of acetoacetic acid in blood, both linked to Diabetes.]
    19. Results: no conflict (Union)
        Pronto will combine the probabilistic constraints
          medium risk [0.25; 0.54]
          and
          high risk [0.55; 1.00]
        Barry's risk of having diabetes is in [0.25; 1.00]
    20. Results: conflict (Intersection)
        What happens if Pronto combines probabilistic constraints that overlap?
          medium risk [0.25; 0.55] and high risk [0.55; 1.00]: Barry's risk is [0.55; 0.55]
          medium risk [0.25; 0.60] and high risk [0.55; 1.00]: Barry's risk is [0.55; 0.60]
    21. Discussion
        Our intuitive requirement, that multiple forms of evidence for the same conclusion strengthen the likelihood of that conclusion, is not met.
        To address this, Pronto allows creating overriding constraints in subclasses.
        [Figure: Medium risk + High risk = Medium-high risk, e.g. [0.54; 0.85]]
    22. Limitations
        We did not attempt:
          combined reasoning with more than two conflicting or non-conflicting constraints;
          linking the generated ontology to the rest of ChEBI and to a relevant disease ontology;
          applying probabilistic constraints to all possible diseases and metabolites in the generated ontology; or
          a systematic performance evaluation of Pronto for this use case.
    23. OWL and uncertainty
        Accurately modelling the association between metabolites and disorders requires a semantics for uncertainty.
        Reasoner behaviour when combining different constraints is crucial for adequate applicability to different use cases.
        Future work will involve evaluating alternative probabilistic DLs based on Bayesian networks.
    24. Conclusion
        OWL 2 (with data properties and restrictions) and probabilistic DL (as implemented in Pronto) CAN be used to represent the chemical-disease association via metabolite concentration values.
        The ontology (META.owl) and software (META.zip) are available for download from http://www.ebi.ac.uk/~hastings/concentrations/.
    25. Acknowledgements
        Funding:
        BBSRC, grant agreement number BB/G022747/1 within the "Bioinformatics and biological resources" fund; and
        DFG, grant agreement numbers JA 1904/2-1 and SCHU 2515/1-1, GoodOD (Good Ontology Design).
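
The threshold idea from the data extraction and data range slides can be sketched as follows, assuming the threshold is the midpoint between the normal and abnormal measured values. This is equivalent to the notes' criterion that a value counts as abnormal when it lies closer to the disordered concentration than to the normal one. For the D-glucose example the midpoint is 5720 uM, whereas the generated ontology on slide 11 uses 5700.0, so the actual pipeline may round or derive the threshold differently. The function and constant names are hypothetical.

```python
UM_IN_MOLAR = 1e-6  # 1 uM (micromolar) = 1e-6 mol/L


def midpoint_threshold(normal_um: float, abnormal_um: float) -> float:
    """Assumed threshold: halfway between the normal and abnormal measurements."""
    return (normal_um + abnormal_um) / 2.0


def closer_to_abnormal(value_um: float, normal_um: float, abnormal_um: float) -> bool:
    """True if the value lies closer to the disordered level than to the normal level."""
    return abs(value_um - abnormal_um) < abs(value_um - normal_um)


if __name__ == "__main__":
    normal, diabetic = 4440.0, 7000.0  # D-glucose in blood (slide 10)
    t = midpoint_threshold(normal, diabetic)
    print(f"threshold = {t} uM = {t * UM_IN_MOLAR:.5f} M")
    for value in (5000.0, 6000.0):
        print(value, closer_to_abnormal(value, normal, diabetic))
```

Because closer_to_abnormal compares distances rather than assuming a direction, it also covers disorders that lower a metabolite's concentration.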
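
The generated class on slide 11 follows a fixed pattern, so one can imagine producing it from each extracted HMDB record with a string template. The helper below is a hypothetical sketch that reproduces the slide's notation, including the "double[>= ...]" data range, but uses single quotes around class names; a strict Manchester syntax parser may expect xsd:double and a typed literal instead. The "increased" flag, which switches to a "<=" restriction for disorders that lower a concentration, is an assumption not shown on the slides.

```python
def generated_class(metabolite: str, fluid: str, disorder: str,
                    threshold_um: float, increased: bool = True) -> str:
    """Render an equivalence axiom in the notation shown on slide 11 (hypothetical helper)."""
    facet = ">=" if increased else "<="  # assumption: "<=" for decreased levels
    defined = f"concentration of {metabolite} in {fluid} associated with {disorder}"
    return (
        f"'{defined}'\n"
        f"equivalentTo ('concentration in {fluid.lower()}'\n"
        f"  and (hasMetabolite some 'portion of {metabolite}')\n"
        f"  and (hasConcentrationValue some double[{facet} {threshold_um:.1f}]))"
    )


if __name__ == "__main__":
    print(generated_class("D-glucose", "Blood", "Diabetes mellitus type 2", 5700.0))
```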
