
Data and model management in Systems Biology

Presentation at the 10th anniversary of SABIO-RK, Heidelberg 2016

Published in: Science

  1. Data and model management in Systems Biology. Dagmar Waltemath, University of Rostock, Germany. Kinetics on the move – Happy 10th anniversary to SABIO-RK! Heidelberg, 31 May 2016. http://www.slideshare.net/dagwa/data-and-model-management-in-systems-biology
  2. Junior research group: Management of simulation studies in systems biology. Tool development: SBGN-ED for the graphical representation of networks. Infrastructure: Data management for systems biology in Germany. Standards and tools for model management. www.sems.uni-rostock.de
  3. NBI-SysBio: Data management for systems biology in Germany ● Sustainable infrastructure for data management ● Access to documented and reproducible results ● Systems Biology standards ● Tool development ● Education. www.denbi.de (training, services, jobs). © 2009 UNIVERSITÄT ROSTOCK
  4. Photo: NY, http://nyphotographic.com (CC BY-SA 3.0); Photo: janneke staaks on flickr; Fig. courtesy doi:10.1371/journal.pbio.1001779
  5. Data management is … ● Data management describes the procedures and actions that help to store, preserve, organize and control the data generated during a (research) project. ● Aspects of data management include: – Data ownership; – Metadata compilation; – Data lifecycle control; – Data quality; – Data access and dissemination. Photo: NY, http://nyphotographic.com (CC BY-SA 3.0)
  6. Metadata ● Data about data ● Improved understanding of encoded data items ● Descriptive details ● Discovery and search for existing data, online browsing of data ● Standardized and structured information – Purpose, origin, time references, geographic location, creator, access conditions, and terms of use of your data collection ● Often encoded in ontologies. Source: https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html#data-management. 5/31/16
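The "standardized and structured information" on this slide can be captured as a typed record, which makes gaps easy to spot before data are shared. A minimal Python sketch, assuming a hypothetical in-house schema: the class `DatasetMetadata`, its field names and the helper `missing_fields` are illustrative only and do not come from any metadata standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical schema: field names mirror the slide, not a standard vocabulary."""
    purpose: Optional[str] = None
    origin: Optional[str] = None
    time_reference: Optional[str] = None
    geographic_location: Optional[str] = None
    creator: Optional[str] = None
    access_conditions: Optional[str] = None
    terms_of_use: Optional[str] = None


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of empty fields, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only
record = DatasetMetadata(
    purpose="Kinetic parameters for a signalling model",
    creator="Example Lab",
    terms_of_use="CC BY 4.0",
)
print(missing_fields(record))
# ['origin', 'time_reference', 'geographic_location', 'access_conditions']
```

In practice each field would be bound to a term from a controlled vocabulary or ontology rather than free text.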
  7. Ontologies ● Well-structured, controlled vocabularies ● Capture and convey commonly agreed definitions and concepts in a domain ● Enable communication across people and software tools ● Enable reuse of domain knowledge ● Make implicit domain knowledge explicit and queryable ● Bio-ontologies – Gene Ontology, ChEBI, UniProt – Systems Biology Ontology (concepts and terminology for modeling)
  8. Example: Definition of "cell growth" in the Gene Ontology
     id: GO:0016049
     name: cell growth
     namespace: biological_process
     def: "The process in which a cell irreversibly increases in size over time by accretion and biosynthetic production of matter similar to that already present."
     synonym: "cell expansion" RELATED []
     synonym: "cellular growth" EXACT []
     synonym: "growth of cell" EXACT []
     is_a: GO:0009987 ! cellular process
     is_a: GO:0040007 ! growth
     relationship: part_of GO:0008361 ! regulation of cell size
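The OBO stanza above is a plain `tag: value` format, which is what makes the definition machine-queryable. A minimal Python sketch that parses this one stanza and extracts its `is_a` parents; it is deliberately naive (assumption: `' ! '` never appears inside quoted text, and escapes, qualifiers and multi-term files are ignored), so a real project would use a full OBO parser.

```python
from collections import defaultdict
from typing import Dict, List

STANZA = """\
id: GO:0016049
name: cell growth
namespace: biological_process
def: "The process in which a cell irreversibly increases in size over time by accretion and biosynthetic production of matter similar to that already present."
synonym: "cell expansion" RELATED []
synonym: "cellular growth" EXACT []
synonym: "growth of cell" EXACT []
is_a: GO:0009987 ! cellular process
is_a: GO:0040007 ! growth
relationship: part_of GO:0008361 ! regulation of cell size
"""


def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Group the values of a single OBO term stanza by tag.

    Naive: splits each line at the first ': ', drops the '! comment'
    part, and assumes ' ! ' never occurs inside quoted text.
    """
    tags: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        tags[tag].append(value.split(" ! ")[0].strip())
    return dict(tags)


term = parse_stanza(STANZA)
print(term["id"][0], "->", term["is_a"])  # GO:0016049 -> ['GO:0009987', 'GO:0040007']
```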
  9. Advantages of careful, planned data management ● Increased confidence and trust in the data ● Better understanding of the data itself and of how to use it ● Better data quality ● Coherent data when standards are used ● Improved business processes (saving time, guaranteeing high quality) ● Improved access to data and improved reproducibility ● Better exploitation of data through easier data exchange and integration
  10. Advantages of standardised data ● Reusable ● Exchangeable ● Interoperable ● Available in the long term (in open repositories) ● Curatable ● Shareable
  11. Photo: janneke staaks on flickr
  12. Research data in the modeling life cycle ● Models: equations, parameters, data tables ● Ideas: text, drawings ● Experimental results: text, data tables ● Publications: text, figures ● Analyses: configuration files, data tables. Fig. courtesy Martin Scharm (adapted)
  13. Research data in the modeling life cycle ● Mathematical formulae ● Networks, diagrams ● Image data ● Publications ● Experiment descriptions ● Experimental results (both lab and simulation) ● Definitions of entities (e.g., gene functions, chemical structures). Figures top to bottom: (1) By Noah A. Rosenberg et al., slightly modified by User:Wobble. Public Library of Science, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=2839383; (2) By http://rsb.info.nih.gov/ij/images/, Public Domain, https://commons.wikimedia.org/w/index.php?curid=655748; (3) BIOM005, generated using CellDesigner 4; (4, 5) PMID:18669651
  14. Research data in the modeling life cycle ● Heterogeneous ● Highly connected ● Context-dependent ● Distributed ● Big. Figures as on slide 13.
  15. The model ● Mathematical equations ● Biological entities ● Kinetic information ● Encoding: SBML & semantic annotations
     <bqmodel:isDescribedBy>
       <rdf:Bag>
         <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
       </rdf:Bag>
     </bqmodel:isDescribedBy>
     <parameter id="parameter_49" name="L" metaid="metaid_0000078" value="20670"/>
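The RDF annotation in this snippet links the model to the publication that describes it through a `bqmodel:isDescribedBy` qualifier. A Python sketch that pulls such links out with the standard library only. Assumptions: the namespace URIs are the usual RDF and BioModels model-qualifier namespaces, and the wrapping `rdf:Description` element is added here only so that the fragment parses on its own (in a real SBML file it sits inside an `<annotation>` block).

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQMODEL = "http://biomodels.net/model-qualifiers/"

# The slide's fragment, wrapped in an rdf:Description carrying the
# namespace declarations (the wrapper is an assumption for this example).
ANNOTATION = f"""
<rdf:Description xmlns:rdf="{RDF}" xmlns:bqmodel="{BQMODEL}">
  <bqmodel:isDescribedBy>
    <rdf:Bag>
      <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
    </rdf:Bag>
  </bqmodel:isDescribedBy>
</rdf:Description>
"""


def described_by(xml_text: str) -> List[str]:
    """Return every resource URI listed under bqmodel:isDescribedBy."""
    root = ET.fromstring(xml_text.strip())
    path = f"{{{BQMODEL}}}isDescribedBy/{{{RDF}}}Bag/{{{RDF}}}li"
    return [li.get(f"{{{RDF}}}resource") for li in root.findall(path)]


print(described_by(ANNOTATION))  # ['http://identifiers.org/pubmed/18669651']
```

Because the link is an identifiers.org URI rather than free text, a tool can resolve it to the PubMed record without any model-specific knowledge.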
  16. SBML – Standard for model encoding ● Systems Biology Markup Language ● Community-driven de facto standard ● Free & open source: www.sbml.org ● Supported by many organizations and tools ● Encodes computational models of biological processes (compartments, species, reactions, parameters)
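The four building blocks named on this slide map onto SBML's `listOf…` elements. A minimal sketch that writes a toy SBML Level 3 Version 1 core skeleton with Python's standard library and counts its components. The model content (`toy`, `cell`, `A`, `B`, `k1`, `R1`) is hypothetical, the kinetic law (MathML) is omitted so the file shows structure only, and a real tool would use libSBML and validate the result.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
ET.register_namespace("", SBML_NS)  # serialise SBML elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the SBML Level 3 Version 1 core namespace."""
    return f"{{{SBML_NS}}}{tag}"


def toy_model() -> ET.Element:
    """Hypothetical model: one irreversible reaction A -> B in compartment 'cell'."""
    sbml = ET.Element(q("sbml"), level="3", version="1")
    model = ET.SubElement(sbml, q("model"), id="toy")

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), id="cell", size="1", constant="true")

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, amount in (("A", "10"), ("B", "0")):
        ET.SubElement(species, q("species"), id=sid, compartment="cell",
                      initialAmount=amount, hasOnlySubstanceUnits="true",
                      boundaryCondition="false", constant="false")

    parameters = ET.SubElement(model, q("listOfParameters"))
    ET.SubElement(parameters, q("parameter"), id="k1", value="0.1", constant="true")

    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"), id="R1",
                             reversible="false", fast="false")
    for list_tag, sid in (("listOfReactants", "A"), ("listOfProducts", "B")):
        refs = ET.SubElement(reaction, q(list_tag))
        ET.SubElement(refs, q("speciesReference"), species=sid,
                      stoichiometry="1", constant="true")
    return sbml


def count_components(sbml: ET.Element) -> Dict[str, int]:
    """Count the four core component types of the (single) model."""
    model = sbml.find(q("model"))
    kinds = {"compartments": ("listOfCompartments", "compartment"),
             "species": ("listOfSpecies", "species"),
             "parameters": ("listOfParameters", "parameter"),
             "reactions": ("listOfReactions", "reaction")}
    return {kind: len(model.findall(f"{q(lst)}/{q(item)}"))
            for kind, (lst, item) in kinds.items()}


doc = toy_model()
print(count_components(doc))
print(ET.tostring(doc, encoding="unicode"))
```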
  17. SBGN – Standard for visual representation ● Systems Biology Graphical Notation ● Standardised glyphs for biological entities ● Three languages: SBGN-AF, SBGN-ER, SBGN-PD ● Free & open source: www.sbgn.org ● Tool support ● Machine-readable exchange format: SBGN-ML. Fig.: http://sbgn.org
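SBGN-ML stores a map as glyphs (nodes, each carrying an SBGN class) and arcs between them. A sketch that tallies glyph and arc classes in a small SBGN-ML fragment. The fragment itself is hypothetical and stripped down (bounding boxes and ports are omitted), and the `libsbgn/0.2` namespace is an assumption; the `{*}` wildcard (Python 3.8+) keeps the code independent of the namespace version.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Tuple

# Hypothetical process-description fragment: hexokinase catalyses glucose -> G6P
SBGN_ML = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="g1" class="simple chemical"><label text="glucose"/></glyph>
    <glyph id="g2" class="simple chemical"><label text="G6P"/></glyph>
    <glyph id="p1" class="process"/>
    <glyph id="e1" class="macromolecule"><label text="hexokinase"/></glyph>
    <arc id="a1" class="consumption" source="g1" target="p1"/>
    <arc id="a2" class="production" source="p1" target="g2"/>
    <arc id="a3" class="catalysis" source="e1" target="p1"/>
  </map>
</sbgn>"""


def tally(xml_text: str) -> Tuple[Counter, Counter]:
    """Count glyph classes and arc classes, whatever the namespace version."""
    root = ET.fromstring(xml_text)
    glyphs = Counter(g.get("class") for g in root.iterfind(".//{*}glyph"))
    arcs = Counter(a.get("class") for a in root.iterfind(".//{*}arc"))
    return glyphs, arcs


glyphs, arcs = tally(SBGN_ML)
print(glyphs)
print(arcs)
```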
  18. SBGN – Standard for visual representation. Figs.: SBGN maps for BIOM183 and BIOM005, CellDesigner
  19. The analysis ● Reproduce the behaviour of the model ● Publish and share virtual experiments – Simulation setup / conditions – Pre- and post-processing – Observations ● Encoding: … & … & result data in Excel and CSV files
     <listOfSimulations>
       <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0" outputEndTime="100" numberOfPoints="100">
         <algorithm kisaoID="KISAO:0000019"/>
       </uniformTimeCourse>
     </listOfSimulations>
     Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/
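The `uniformTimeCourse` element fully determines the time grid at which results are reported. A sketch that derives that grid from the slide's snippet. It assumes the SED-ML convention that `numberOfPoints` counts the intervals after `outputStartTime`, so the output has `numberOfPoints + 1` rows; this is worth checking against the SED-ML level and version in use.

```python
import xml.etree.ElementTree as ET
from typing import List

SNIPPET = """<listOfSimulations>
  <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                     outputEndTime="100" numberOfPoints="100">
    <algorithm kisaoID="KISAO:0000019"/>
  </uniformTimeCourse>
</listOfSimulations>"""


def output_times(xml_text: str, sim_id: str) -> List[float]:
    """Return the reported time points of the named uniformTimeCourse."""
    root = ET.fromstring(xml_text)
    sim = root.find(f"uniformTimeCourse[@id='{sim_id}']")
    if sim is None:
        raise KeyError(sim_id)
    start = float(sim.get("outputStartTime"))
    end = float(sim.get("outputEndTime"))
    n = int(sim.get("numberOfPoints"))  # number of intervals (assumed convention)
    step = (end - start) / n
    return [start + i * step for i in range(n + 1)]


times = output_times(SNIPPET, "sim1")
print(len(times), times[0], times[-1])  # 101 0.0 100.0
```

Exchanging this description instead of a hand-written solver call is what lets a second tool reproduce the same grid.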
  20. SED-ML – Standard for model analysis ● Links to the models used in an analysis ● Pre- and post-processing of models ● Type of simulation ● Definition of outputs ● Free and open source: www.sed-ml.org ● Tool support → Showcase your tool support online ←
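A SED-ML file ties these pieces together by reference: tasks point to a model and a simulation, data generators point to tasks, and outputs point to data generators. A hypothetical, simplified in-memory sketch of that chain with a check that every reference resolves; the class and field names are illustrative and only loosely mirror SED-ML's `listOfModels`, `listOfSimulations`, `listOfTasks`, `listOfDataGenerators` and `listOfOutputs` (for instance, real data generators reference tasks through variables).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Experiment:
    """Hypothetical, simplified view of the references inside one SED-ML file."""
    models: Dict[str, str] = field(default_factory=dict)             # id -> model source
    simulations: Dict[str, str] = field(default_factory=dict)        # id -> simulation type
    tasks: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # id -> (model id, simulation id)
    data_generators: Dict[str, str] = field(default_factory=dict)    # id -> task id
    outputs: Dict[str, List[str]] = field(default_factory=dict)      # id -> data generator ids

    def dangling_references(self) -> List[str]:
        """List every reference that points to an id that does not exist."""
        problems = []
        for tid, (mid, sid) in self.tasks.items():
            if mid not in self.models:
                problems.append(f"task {tid}: unknown model {mid}")
            if sid not in self.simulations:
                problems.append(f"task {tid}: unknown simulation {sid}")
        for did, tid in self.data_generators.items():
            if tid not in self.tasks:
                problems.append(f"dataGenerator {did}: unknown task {tid}")
        for oid, dids in self.outputs.items():
            for did in dids:
                if did not in self.data_generators:
                    problems.append(f"output {oid}: unknown dataGenerator {did}")
        return problems


exp = Experiment(
    models={"model1": "model.xml"},
    simulations={"sim1": "uniformTimeCourse"},
    tasks={"task1": ("model1", "sim1")},
    data_generators={"time": "task1", "A": "task1"},
    outputs={"plot1": ["time", "A", "B"]},  # "B" is deliberately undefined
)
print(exp.dangling_references())  # ['output plot1: unknown dataGenerator B']
```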
  21. SED-ML – Standard for model analysis. Figure: Simulation of BIOM183 in SED-ML Web Tools without a simulation description. Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/
  22. Simulation | Guidelines | Ontologies ● Coordinate annual meetings: next HARMONY: Auckland, June 7-11, 2016; next COMBINE: Newcastle, Sep 19-23, 2016 ● Coordinate standards development: common procedures, interoperable software tools, discussion forums, mailing lists ● Represent the community towards funders and other communities ● Provide standards resources: single entry point, resolvable URIs, web infrastructure
  23. Standard-compliant software tools for modeling. The path2models project integrated data from different databases into more than 140,000 SBML models. Fig.: Büchel et al., BMC Syst Biol (2013), http://www.ebi.ac.uk/biomodels-main/path2models
  24. Standard-compliant software tools for modeling. The Systems Biology Workbench is a software framework that helps heterogeneous application components communicate with each other ● Modeling ● Editing ● Simulating ● Analysing. http://sbw.sourceforge.net
  25. The decision whether and how to share data often rests with researchers. Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
  26. COMBINE Archive ● Bundling files ● Shipping results ● Exchanging data ● Keeping provenance ● Encoding: zip-like file with a manifest (metadata) ● Generate, modify & share through WebCAT
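A COMBINE archive is a zip file whose `manifest.xml` lists every entry together with a format URI. A minimal Python sketch using only `zipfile` and `ElementTree`. The file names and contents are placeholders; the format URIs follow the `identifiers.org/combine.specifications/...` scheme used in the OMEX manifest examples, and dedicated tools such as WebCAT additionally handle metadata files and validation.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict

OMEX = "http://identifiers.org/combine.specifications/"
MANIFEST_NS = OMEX + "omex-manifest"
ET.register_namespace("", MANIFEST_NS)  # write the manifest without a prefix


def build_manifest(formats: Dict[str, str], master: str) -> str:
    """Return manifest.xml text; formats maps archive path -> format suffix."""
    root = ET.Element(f"{{{MANIFEST_NS}}}omexManifest")
    ET.SubElement(root, f"{{{MANIFEST_NS}}}content", location=".", format=OMEX + "omex")
    ET.SubElement(root, f"{{{MANIFEST_NS}}}content", location="./manifest.xml",
                  format=MANIFEST_NS)
    for path, fmt in formats.items():
        attrs = {"location": f"./{path}", "format": OMEX + fmt}
        if path == master:
            attrs["master"] = "true"  # the file a tool should open first
        ET.SubElement(root, f"{{{MANIFEST_NS}}}content", attrs)
    return ET.tostring(root, encoding="unicode")


def build_archive(files: Dict[str, bytes], formats: Dict[str, str], master: str) -> bytes:
    """Bundle the files plus a manifest into an in-memory zip (.omex) archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(formats, master))
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


omex = build_archive(
    files={"model.xml": b"<sbml/>", "simulation.sedml": b"<sedML/>"},  # placeholder contents
    formats={"model.xml": "sbml", "simulation.sedml": "sed-ml"},
    master="simulation.sedml",
)
with zipfile.ZipFile(io.BytesIO(omex)) as archive:
    print(sorted(archive.namelist()))  # ['manifest.xml', 'model.xml', 'simulation.sedml']
```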
  27. COMBINE Archive: original publication, SBGN map, SBML model versions, SED-ML files ● Open in WebCAT ● Open in SEEK
  28. Model curation & publication
  29. Model curation & publication
  30. Model curation, simulation & publication
  31. Introduction to SEEK & FAIRDOM by Olga Krebs.
  32. Thank you for your attention. http://www.denbi.de/ @SemsProject http://co.mbine.org
