Integrative information management for systems biology



  1. 1. Integrative Information Management for Systems Biology <ul><li>Neil Swainston </li></ul><ul><li>Manchester Centre for Integrative Systems Biology </li></ul><ul><li>Data Integration in the Life Sciences, Gothenburg, Sweden </li></ul><ul><li>27 August 2010 </li></ul>
  2. 2. The MCISB <ul><li>Pioneer the development of new experimental and computational technologies in systems biology </li></ul><ul><li>Currently employs 9.5 multidisciplinary people </li></ul><ul><ul><li>Mathematicians, informaticians, experimentalists, etc. </li></ul></ul><ul><ul><li>All share the same office and lab </li></ul></ul><ul><li>Develop kinetic models of yeast metabolism </li></ul>
  3. 3. Metabolism
  4. 4. Models <ul><li>Genome-scale SBML model of yeast metabolism </li></ul><ul><li>Not kinetic / quantitative! </li></ul><ul><li>Annotated model </li></ul><ul><ul><li>All >2000 molecules have unique database references </li></ul></ul><ul><ul><li>MIRIAM standards have been followed (RDF) </li></ul></ul><ul><ul><li>Should be entirely unambiguous for third party users </li></ul></ul><ul><ul><li>Should be usable in third party tools </li></ul></ul><ul><ul><li>Should allow experimental data to be imported easily </li></ul></ul><ul><li>Herrgård MJ, Swainston N, et al. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat Biotechnol. 2008, 26, 1155-60. </li></ul>
  5. 5. Bottom-up systems biology <ul><li>Steps in kinetic modeling: </li></ul><ul><ul><li>Identify the pathway or portion of a network that is to be modeled </li></ul></ul><ul><ul><li>Associate the model with functions and parameter values that represent its dynamic behavior, either from databases or experimentation </li></ul></ul><ul><ul><li>Analyze and/or simulate the resulting model to understand its properties </li></ul></ul>
  6. 6. Bottom-up systems biology <ul><li>In common practice, model construction is a manual process, in which a modeler associates a model with experimental data for simulation </li></ul><ul><li>Such an approach can give rise to good quality models, but is more a cottage industry than a highly scalable production process </li></ul><ul><li>Can this be automated? </li></ul>
  7. 7. Automation of the process <ul><li>Experimental data is captured from instruments, and subjected to primary analyses </li></ul><ul><li>Experimental data and results of the primary analyses are archived in experimental data repositories </li></ul><ul><li>The information required for modeling is extracted from the experimental data resources and stored in a Key Results Database (KRDB) </li></ul><ul><li>A workflow obtains qualitative model information, represented using SBML, parameterizes this model with results in the KRDB, and analyses / simulates the resulting quantitative model </li></ul>
  8. 8. [Diagram: enzyme kinetics, quantitative metabolomics and quantitative proteomics data feeding an SBML model. Labels: parameters (K_M, k_cat); variables (metabolite and protein concentrations); PRIDE XML; MeMo; SABIO-RK web service; KRDB web service]
  9. 9. From instrument to result <ul><li>Raw data typically needs to be analysed before use </li></ul><ul><li>Experimental data is often managed in an ad hoc way </li></ul><ul><li>Experimentalists are not keen to spend time on data curation for archiving or sharing </li></ul><ul><li>Try to capture necessary metadata as part of primary data analysis </li></ul>
  10. 10. Requirements <ul><li>The experimental techniques share requirements: </li></ul><ul><ul><li>perform analyses on the raw experimental data to derive the secondary quantitative parameters required in the model </li></ul></ul><ul><ul><li>store the raw experimental data along with relevant metadata and the derived parameters, thus providing the facility to trace back and reanalyze raw data should this be required </li></ul></ul><ul><ul><li>Where possible, existing data standards and tools are reused, although in practice data standards tend to lag behind technique development, and tools tend to lag behind standards </li></ul></ul>
  11. 11. Data capture <ul><li>Software wizards have been developed that step experimentalists through the analysis of primary data </li></ul><ul><ul><li>QconCAT PrideWizard for proteomics </li></ul></ul><ul><ul><li>KineticsWizard for enzyme kinetics </li></ul></ul><ul><li>Metadata collected along the way, as unobtrusively as possible </li></ul><ul><ul><li>Heavily reliant on database web services </li></ul></ul>
  12. 12. KineticsWizard
  13. 13. QconCAT PrideWizard
  14. 14. QconCAT PrideWizard [Diagram labels: Mascot; mzData; PRIDE Converter; PRIDE XML; QconCAT PrideWizard steps: Identify, Quantify, Format, Upload; eXist database; web / web service; browser; PRIDE]
  15. 15. Web interfaces
  16. 16. From instrument to result <ul><li>All laboratories carry out primary analyses of experimental data </li></ul><ul><li>All laboratories carry out some form of secondary analyses based on primary results </li></ul><ul><li>Many laboratories struggle to manage the results of these processes in a systematic manner </li></ul><ul><li>We see the key to obtaining manageable results as being to integrate data capture and management with necessary analyses </li></ul>
  17. 17. But… <ul><li>… MCISB has to manage “only” three types of experiment </li></ul><ul><ul><li>Proteomics, metabolomics, enzyme kinetics </li></ul></ul><ul><li>Informatics team share office with experimentalists and modellers </li></ul><ul><li>We’ve been doing this for years… </li></ul><ul><ul><ul><li>Lots of time, lots of people, lots of resource </li></ul></ul></ul><ul><ul><ul><li>Infrastructure development is part of our remit </li></ul></ul></ul>
  18. 18. And… <ul><li>… many projects are far more diverse </li></ul><ul><li>Informatics team separated from experimentalists, who are separated from modellers </li></ul><ul><li>Less informatics resource </li></ul><ul><li>Heavyweight approach of MCISB (bespoke tools for each experiment) not always applicable… </li></ul>
  19. 19. So… <ul><li>… lightweight approach may be more suitable </li></ul><ul><li>Store only secondary data necessary for modelling </li></ul><ul><ul><ul><li>Not raw data </li></ul></ul></ul><ul><ul><li>Key Results Database (KRDB) </li></ul></ul><ul><ul><ul><li>More modeller-focussed </li></ul></ul></ul>
  20. 20. Key Results Database <ul><li>Who, what, some how and why? </li></ul><ul><li>Measure “something” under “some conditions” </li></ul><ul><li>Measurements are generally a number but may be some other artifact </li></ul><ul><li>Conditions may apply across entire experiment (Static Factors) </li></ul><ul><li>Conditions may change across measurements (Variable Factors) </li></ul><ul><li>Measurements may take place at a certain time </li></ul>
  21. 21. Key Results Database
  22. 22. KRDB structure
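The KRDB structure diagram itself did not survive in this transcript, but the ideas listed on the earlier Key Results Database slide (who, what, how, why; static and variable factors; optional time) can be sketched as a small data model. This is only an illustration: every class name, field name and example value below is hypothetical and is not taken from the actual KRDB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A value is usually a number, but may point to some other artifact.
Value = Union[float, str]


@dataclass
class Measurement:
    what: str                          # the "something" being measured
    value: Value
    unit: Optional[str] = None
    time: Optional[float] = None       # only set when the measurement is timed
    variable_factors: Dict[str, Value] = field(default_factory=dict)


@dataclass
class Experiment:
    who: str
    why: str
    how: str
    # Conditions that apply across the entire experiment.
    static_factors: Dict[str, Value] = field(default_factory=dict)
    measurements: List[Measurement] = field(default_factory=list)

    def conditions_for(self, m: Measurement) -> Dict[str, Value]:
        """Combine static factors with the measurement's variable factors."""
        return {**self.static_factors, **m.variable_factors}


# Hypothetical example record.
exp = Experiment(
    who="experimentalist A",
    why="parameterise a pathway model",
    how="LC-MS",
    static_factors={"strain": "hypothetical-strain", "temperature_C": 30.0},
)
exp.measurements.append(
    Measurement("glucose concentration", 1.2, "mM", time=0.0,
                variable_factors={"dilution_rate_per_h": 0.1})
)
print(exp.conditions_for(exp.measurements[0]))
```

Keeping static factors on the experiment and variable factors on each measurement avoids repeating shared conditions for every data point.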
  23. 23. KRDB web interface
  24. 24. KRDB web interface
  25. 25. Key Results Database <ul><li>Deployed in Liverpool, MCISB, UCD </li></ul><ul><li>Easily extensible interface </li></ul><ul><li>eXist “lets it all hang out” as RESTful web services </li></ul>
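As a rough illustration of what access through eXist's RESTful interface can look like, the sketch below only builds a query URL using eXist's `_query` and `_howmany` request parameters. The host, the `/db/krdb` collection path and the `//measurement` XPath are assumptions made for the example; the real KRDB collection layout is not given in the slides.

```python
from urllib.parse import urlencode


def exist_query_url(base_url: str, collection: str, xpath: str,
                    how_many: int = 10) -> str:
    """Build a GET URL for an XPath/XQuery against an eXist collection."""
    params = urlencode({"_query": xpath, "_howmany": how_many})
    return f"{base_url.rstrip('/')}/rest{collection}?{params}"


# Hypothetical server, collection and query.
url = exist_query_url("http://localhost:8080/exist", "/db/krdb",
                      "//measurement", how_many=5)
print(url)
```

The resulting URL could then be fetched with any HTTP client, which is what makes the stored results easy to reach from workflow tools.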
  26. 26. Modelling infrastructure
  27. 27. Taverna
  28. 28. Taverna
  29. 29. Modelling life-cycle workflows
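  30. 30. Qualitative model construction <ul><li>Input: list of ORFs </li></ul><ul><li>Output: SBML file </li></ul><ul><li>Steps: </li></ul><ul><ul><li>1. Get reaction info </li></ul></ul><ul><ul><li>2. Create compartments </li></ul></ul><ul><ul><li>3. Create species </li></ul></ul><ul><ul><li>4. Create reactions </li></ul></ul>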
  31. 31. Qualitative model construction
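The four construction steps (get reaction info, create compartments, create species, create reactions) can be sketched with the Python standard library alone. This is a minimal illustration, not the workflow shown on the slides: the `REACTION_INFO` lookup table stands in for the database query in step 1, all identifiers in it are hypothetical, and the output is an SBML-like skeleton that has not been validated against the SBML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for step 1; a real workflow would query a database.
REACTION_INFO = {
    "ORF_1": {"id": "r_example", "compartment": "cytoplasm",
              "reactants": ["s_A"], "products": ["s_B"]},
}


def build_qualitative_sbml(orfs):
    """Turn a list of ORFs into a skeleton SBML document (as a string)."""
    # 1. Get reaction info for the ORFs we know about.
    reactions = [REACTION_INFO[o] for o in orfs if o in REACTION_INFO]

    sbml = ET.Element("sbml", {
        "xmlns": "http://www.sbml.org/sbml/level2/version4",
        "level": "2", "version": "4",
    })
    model = ET.SubElement(sbml, "model", {"id": "qualitative_model"})

    # 2. Create compartments (one per distinct compartment name).
    comps = ET.SubElement(model, "listOfCompartments")
    for name in sorted({r["compartment"] for r in reactions}):
        ET.SubElement(comps, "compartment", {"id": name})

    # 3. Create species (each species once).
    species = ET.SubElement(model, "listOfSpecies")
    seen = set()
    for r in reactions:
        for s in r["reactants"] + r["products"]:
            if s not in seen:
                seen.add(s)
                ET.SubElement(species, "species",
                              {"id": s, "compartment": r["compartment"]})

    # 4. Create reactions with reactant and product references.
    rxns = ET.SubElement(model, "listOfReactions")
    for r in reactions:
        rx = ET.SubElement(rxns, "reaction", {"id": r["id"]})
        for tag, key in (("listOfReactants", "reactants"),
                         ("listOfProducts", "products")):
            refs = ET.SubElement(rx, tag)
            for s in r[key]:
                ET.SubElement(refs, "speciesReference", {"species": s})

    return ET.tostring(sbml, encoding="unicode")


xml_text = build_qualitative_sbml(["ORF_1", "ORF_unknown"])
print(xml_text)
```

ORFs with no known reaction are silently skipped here; a production workflow would more likely report them.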
  32. 32. Qual to quan: parameterisation <ul><li>Data requirements </li></ul><ul><ul><li>Qualitative SBML model </li></ul></ul><ul><ul><li>Starting concentrations for enzymes and source metabolites </li></ul></ul><ul><ul><ul><li>Key Results Database </li></ul></ul></ul><ul><ul><li>Enzyme kinetics data </li></ul></ul><ul><ul><ul><li>SABIO-RK database web service </li></ul></ul></ul>
  33. 33. Qual to quan: parameterisation
  34. 34. Model parameterisation
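Parameterisation amounts to joining the qualitative reactions with kinetic constants and starting concentrations. The sketch below assumes, purely for illustration, that the two data sources have already been fetched into plain dictionaries and that each reaction follows an irreversible Michaelis-Menten rate law; neither assumption comes from the slides, and all identifiers and numbers are made up.

```python
def parameterise(reactions, kinetics, concentrations):
    """Attach k_cat, K_M and enzyme concentration to each reaction.

    Returns (parameterised reactions, ids of reactions lacking data).
    """
    parameterised, missing = [], []
    for rxn in reactions:
        k = kinetics.get(rxn["id"])
        e0 = concentrations.get(rxn["enzyme"])
        if k is None or e0 is None:
            missing.append(rxn["id"])
            continue
        parameterised.append(
            {**rxn, "kcat": k["kcat"], "Km": k["Km"], "enzyme_conc": e0}
        )
    return parameterised, missing


def michaelis_menten_rate(rxn, substrate_conc):
    """v = k_cat * [E] * [S] / (K_M + [S]) (assumed rate law)."""
    return (rxn["kcat"] * rxn["enzyme_conc"] * substrate_conc
            / (rxn["Km"] + substrate_conc))


# Hypothetical inputs standing in for the model, kinetics and KRDB lookups.
reactions = [{"id": "r1", "enzyme": "E1"}, {"id": "r2", "enzyme": "E2"}]
kinetics = {"r1": {"kcat": 10.0, "Km": 2.0}}
concentrations = {"E1": 0.5, "E2": 0.1}

model, missing = parameterise(reactions, kinetics, concentrations)
print(michaelis_menten_rate(model[0], 2.0), missing)
```

Returning the list of reactions without data makes gaps in the experimental coverage visible instead of hiding them behind default values.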
  35. 35. Model calibration <ul><li>Optional modification of parameters in reaction kinetics until the output of the model produces results similar to those obtained from experimentation </li></ul><ul><li>Data requirements </li></ul><ul><ul><li>Parameterised SBML model </li></ul></ul><ul><ul><li>Experimental data </li></ul></ul><ul><ul><ul><li>Metabolite concentrations from KRDB </li></ul></ul></ul><ul><ul><li>Calibration by COPASI web service </li></ul></ul>
  36. 36. COPASI web service <ul><li>Dada JO, Mendes P. Design and Architecture of Web Services for Simulation of Biochemical Systems. Data Integration in the Life Sciences, Manchester, UK (2009). </li></ul>
  37. 37. Model calibration
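The idea of calibration, adjusting parameters until model output resembles the measurements, can be shown with a toy example. This is not the COPASI web service and makes no claim about its optimisation methods: it fits a single K_M by grid search over a sum of squared errors, assuming a Michaelis-Menten rate law and synthetic data generated from known values.

```python
def rate(vmax, km, s):
    """Michaelis-Menten rate (assumed rate law for this sketch)."""
    return vmax * s / (km + s)


def sum_squared_error(km, vmax, data):
    """Mismatch between model rates and observed (substrate, rate) pairs."""
    return sum((rate(vmax, km, s) - v) ** 2 for s, v in data)


def calibrate_km(vmax, data, lo=0.01, hi=10.0, steps=1000):
    """Pick the K_M on an evenly spaced grid that best reproduces the data."""
    candidates = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(candidates, key=lambda km: sum_squared_error(km, vmax, data))


# Synthetic "experimental" data generated with vmax = 5.0 and K_M = 2.0.
data = [(s, rate(5.0, 2.0, s)) for s in (0.5, 1.0, 2.0, 4.0, 8.0)]
km_fit = calibrate_km(5.0, data)
print(km_fit)
```

A grid search is easy to follow but scales poorly with the number of parameters, which is one reason to delegate real calibration to a dedicated optimiser.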
  38. 38. Model simulation <ul><li>The running of a parameterized (and calibrated?) model using a specified simulation operation </li></ul>
  39. 39. Model simulation
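As a minimal example of one simulation operation, the sketch below runs a time course for a single reaction S → P. It assumes a Michaelis-Menten rate law and a fixed-step forward Euler integrator; both are illustrative choices with made-up parameter values, not a description of the simulator used in the workflows.

```python
def simulate(s0, vmax, km, t_end, dt=0.01):
    """Forward-Euler time course of S -> P; returns a list of (t, S, P)."""
    s, p = s0, 0.0
    course = [(0.0, s, p)]
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        v = vmax * s / (km + s)      # current reaction rate
        s -= v * dt                  # substrate consumed
        p += v * dt                  # product formed
        course.append(((i + 1) * dt, s, p))
    return course


# Hypothetical parameter values.
course = simulate(s0=5.0, vmax=1.0, km=2.0, t_end=10.0)
t_final, s_final, p_final = course[-1]
print(t_final, s_final, p_final)
```

Because every step moves the same amount from S to P, the total S + P stays constant, which gives a quick sanity check on the output.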
  40. 40. SBRML <ul><li>Simulation results are data too, and are represented in our case in SBRML </li></ul><ul><ul><li>Systems Biology Results Markup Language </li></ul></ul><ul><ul><li>Developed by Joseph Dada, et al. (Manchester) </li></ul></ul><ul><li>Structured format for representing simulation results </li></ul><ul><ul><li>And experimental data? </li></ul></ul><ul><li>Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938. </li></ul>
  41. 41. Model simulation
  42. 42. Conclusion <ul><li>Classically, systems biology has been a cottage industry </li></ul><ul><ul><li>Experimental results are selected for use in modelling in an ad hoc manner </li></ul></ul><ul><ul><li>Modellers develop and refine models using a time-consuming and partially documented process </li></ul></ul>
  43. 43. Conclusion <ul><li>Large scale experimentation should lead to more systematic behaviour </li></ul><ul><ul><li>Data integration to support the construction and parameterisation of models </li></ul></ul><ul><ul><li>Large scale computational experimentation to support the comparison of models and their results </li></ul></ul>
  44. 44. Thanks…
  45. 45. Integrative Information Management for Systems Biology <ul><li>Neil Swainston </li></ul><ul><li>Manchester Centre for Integrative Systems Biology </li></ul><ul><li>Data Integration in the Life Sciences, Gothenburg, Sweden </li></ul><ul><li>27 August 2010 </li></ul>