FAIR data and model management for systems biology (and SOPs too!)

Written and presented by Carole Goble (University of Manchester) at Multiscale Biology Network Springboard meeting, Nottingham. June 1st 2015.

Published in: Science

  1. 1. FAIR Data and Model Management for Systems Biology (and SOPs too!). Prof Carole Goble, The University of Manchester; The Software Sustainability Institute; ELIXIR UK; SynBioChem Centre. carole.goble@manchester.ac.uk. MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
  2. 2. • Project-centric data and model management • Respects & expects other systems • Forged in the fire of national & international projects • PhDs/postgrads/PIs • Context • FAIRDOM Initiative • Challenges http://www.fair-dom.org http://www.fairdomhub.org
  3. 3. republic of science* regulation of science institutions libraries *Merton’s four norms of scientific behaviour (1942) public archives cloud services
  4. 4. Reproducibility Nature, April 2015
  5. 5. https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
  6. 6. Publishers • Reproducibility • New publishable assets • New business models and services Funders, Managers • Capitalising • Skills • Justification, Audit & Compliance
  7. 7. UK Funder Data Policies http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
  8. 8. Tools, Standards, Formats, Reporting, Policies, Practices, Initiatives
  9. 9. Data Models SOPs consistency, comparability Samples… ‘omics images, reaction kinetics, samples, specimens… Small: spreadsheets, files… Big: NGS, Mass Spec, specialist repositories… ODE, SBML, Native Matlab, PDE, Fortran, CellML… versioning, provenance tracking, parameter tracking, citation tracking, links to articles STANDARDS Asset Management
  10. 10. public archives cloud services Public-Centric Asset Management
  11. 11. public archives cloud services Public-Centric Asset Management
  12. 12. Challenge: Most quantitative databases provide kinetic constants for enzymes, sometimes binding constants… Little to help build quantitative descriptions, i.e. concentrations, sizes, diffusions… Exceptions: gene expression data, proteomics, metabolomics. Localisation: The average concentration of a protein in a piece of brain is of limited use (mix of tissues and subcellular compartments) [Nicolas Le Novère, 2015] Public-Centric Asset Management public archives
  13. 13. FAIR for the Researcher Collaborative, data/model-driven science Publication Local and Public Resources Skills and Productivity Compliance
  14. 14. Collaboration, asset management Pop-up projects Dynamic groups Internal / external visibility
  15. 15. Pop-up projects Dynamic groups Internal / external visibility Collaboration, asset management
  16. 16. 18
  17. 17. Project-Centric Asset Management Is this data available? What SOP was used for this sample? Where is the validation data for this model? • Retain results beyond a project / the PhD student • Exchange & find assets. • Share, disseminate and publish assets sensitively • Consistent reporting for interpretation, interop & comparison • Promote standardised metadata practices. • Organise and link assets • Reuse results
  18. 18. Find Data, models, protocols, projects, people Catalogued and linked assets Link studies to assets Control sharing, versioning, gateway to scattered public/local archives Access Interoperate Standards (SBML, SED-ML…), vocabs, formats, ids harvesting, export, API Reuse Download assets Run models with experimental data DOI citation
  19. 19. The Neylon Equation
  20. 20. FAIRDOM Provenance 2008 2010 2014 de.NBI 2019
  21. 21. SEEK: Science Commons Web-based Cataloguing and Rich web interface for describing, finding, linking and promoting ongoing research and outcomes. Small files, aggregates across data archives. openBIS: Scaled local LIMS and analytics Extract, Transform and Load tooling direct from the instrumentation, data analysis pipelines. Automatic archiving. Handles large data. FAIRDOM Suite
  22. 22. Personal Data Local Stores LIMS External Databases Articles Models Standards SOPs Aggregated Commons Infrastructure Über metadata, cataloguing Stores SOPs, Models, data files
  23. 23. NGS Proteomics LIMS iPortal BeeWM
  24. 24. https://doi.org/10.15490/seek.1.investigation.56
  25. 25. [Snoep, 2015] https://doi.org/10.15490/seek.1.investigation.56
  26. 26. Standard Operating Procedures Challenge: Machine-processable SOPs
  27. 27. Models simulate and annotate in browser
  28. 28. Metadata standards & templates to link studies and link assets Just Enough Results Model Describes common elements and relationships between things produced and used in experiments. Structured descriptions for consistency and comparison
  29. 29. NuML [Adapted, Le Novère]
  30. 30. FAIRDOM Suite Resource FAIRDOMHub Self-managed, customised local installation. Independent, self-managed private space on shared, hosted installation. Publisher Companion Site FAIRDOMHub.org
  31. 31. FAIRDOM Suite Resource FAIRDOMHub FAIRDOM Initiative Facilities Community Networks Forums Workshops Tools Standards Support Sustainability de.NBI
  32. 32. Sys Bio Developers Foundry, Oct 2014 Heidelberg, Germany EraSysAPP meeting, April 2015, Berlin, Germany
  33. 33. PALs
  34. 34. http://seek.virtual-liver.de/ • Navigation • Single standards at one scale • Multi-type hosting “To integrate the detailed knowledge that we have at the molecular level up to the functional level at tissue/organ/whole body level” Multi-scale? Multi-silos…
  35. 35. Handling/converting data of different levels of detail to make the model run. Representing in the SBML model the DNA bindings at the level of detail that had been measured in the experiments Whole Cell model by Jonathan Karr (Rostock Summer School, Dagmar Waltemath) Support for aggregating data to find the appropriate level of representation for a given model. Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.
  36. 36. Challenge: mismatches • Systems on different scales – incompatible time scales, data may be too sparse or need to be aggregated to work with another module • Different levels of complexity – comparing results from different modelling approaches. • Linking models needs thinking and standards – connecting the single standards – interfacing between the different scales – connecting (experimental/simulation) data to models
  37. 37. Challenge: model evolution BiVeS tool: diff in versions of computational models Provenance, Versioning, Parameter tracking Releasing updated versions into the literature Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems Scharm et al. 2015, under revision at Bioinformatics Haus et al., BMC Systems Biology, 2011, 5:10 Solvent production by Clostridium acetobutylicum [Martin Scharm]
  38. 38. F1000Research Living Figures, versioned articles, in-article data manipulation R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482 Simply data + code Can change the definition of a figure, and ultimately the journal article Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176 Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates. Data updates time-stamped. New conclusions added via versions.
  39. 39. Challenge: reproducibility bridging from research to FAIR publishing Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014) Describe Access Port
  40. 40. Challenge: reproducibility bridging from research to FAIR publishing Deposit Model simulation Differentiated data
  41. 41. Challenge: Samples Descriptions SOP-Centric
  42. 42. Challenge: Releasing
  43. 43. Challenge: Releasing SysMO Projects (2009-2014) me ME my team close colleagues • Self-publication & Journal companionship. • Staged & Selective Hugging & Flirting. Reciprocity. • Tribal & Trading behaviours • Forgetfulness, Embargoes • Resources, Benefit • Individuals more likely to share than consortia • Post-hoc rationalised Data/Model Cycles
  44. 44. Challenges: (meta)data wrangling Offsetting curation debt http://rightfield.org.uk
  45. 45. FAIRDOM Challenge: Sustainability Free. Like a Free Puppy.
  46. 46. Enabling multi-scale modelling in systems medicine 1. Exploit existing data for multi-scale modelling 2. Develop SOPs and quality standards for systematic collection of quantitative data and information. 3. Identify required standards and ontologies for models and data repositories in systems medicine. 4. Develop modelling workflows for the integration of data and models; support data management, model construction and analysis. 5. Develop mathematical formalism to analyze and compare multi-scale models (parameter estimation, sensitivity analysis, identifiability analysis and image analysis). Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)
  47. 47. Carole Goble, Stuart Owen, Finn Bacall, Jacky Snoep, Wolfgang Mueller, Olga Krebs, Quyen Nguyen, Natalie Stanford, Katy Wolstencroft, Peter Kunszt, Bernd Rinn fairdom@fair-dom.org fair-dom@fair-dom.org http://www.fair-dom.org http://www.fairdomhub.org http://seek4science.org http://www.rightfield.org.uk http://jjj.biochem.sun.ac.za http://sybit.net/software/openBIS Donal Fellows, Alan Williams, Rostyslav Kuzyakiv, Jakub Straszewski, Chandrasekhar Ramakrishnan, Caterina Barillari, Norman Morrison
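
Slide 36 notes that data on one time scale may need to be aggregated before it can feed another module. A minimal sketch of that idea, assuming simple window averaging is acceptable for the quantity in question; the function name `aggregate`, the sample values, and the 60-unit window are all hypothetical and not taken from the talk:

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window):
    """Average (time, value) samples into consecutive windows of length `window`.

    Windows that contain no samples are simply absent from the result,
    which makes sparse regions of the data visible to the caller.
    """
    bins = defaultdict(list)
    for t, value in samples:
        bins[int(t // window)].append(value)
    return [(index * window, mean(values)) for index, values in sorted(bins.items())]


# Hypothetical fine-grained measurements: (time in seconds, concentration).
fine = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (200, 9.0)]
coarse = aggregate(fine, window=60)
print(coarse)
```

The window starting at 120 is missing from the output because no sample falls in it. A coupled model would have to decide whether to interpolate, skip, or reject such gaps, which is the kind of mismatch the slide describes.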
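
Slide 37 lists parameter tracking across versions of XML-encoded models as a challenge. The sketch below is not BiVeS and does not reproduce its algorithm; it only compares `parameter` elements by `id` between two versions of a model, assuming an SBML-like layout. The two inline models and the helper names are hypothetical illustrations:

```python
from xml.etree import ElementTree as ET

# Two hypothetical versions of a tiny SBML-like model (illustration only).
V1 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="demo"><listOfParameters>
    <parameter id="k1" value="0.1"/>
    <parameter id="k2" value="2.0"/>
  </listOfParameters></model></sbml>"""

V2 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="demo"><listOfParameters>
    <parameter id="k1" value="0.15"/>
    <parameter id="k3" value="7.0"/>
  </listOfParameters></model></sbml>"""


def parameters(xml_text):
    """Map parameter id -> value string, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "parameter":
            found[element.get("id")] = element.get("value")
    return found


def diff_parameters(old_xml, new_xml):
    """Report parameters added, removed, or changed between two versions."""
    old, new = parameters(old_xml), parameters(new_xml)
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        # Values are compared as strings, so "0.10" and "0.1" count as a change.
        "changed": {k: (old[k], new[k])
                    for k in old.keys() & new.keys() if old[k] != new[k]},
    }


changes = diff_parameters(V1, V2)
print(changes)
```

A real model diff also has to handle structural changes such as reactions, rules, and annotations, which this parameter-only comparison ignores.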
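
Slide 39 cites the COMBINE archive specification as a bridge from research to publishing. As a rough sketch of what such an archive looks like: a ZIP file carrying a `manifest.xml` that lists each entry with a format URI. The format URIs below follow my reading of the specification and should be checked against it; the file names and payloads are placeholders, not real models:

```python
import io
import zipfile
from xml.etree import ElementTree as ET

# Assumed identifiers, to be verified against the COMBINE archive specification.
MANIFEST_NS = "http://identifiers.org/combine.specifications/omex-manifest"
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"
SEDML_FORMAT = "http://identifiers.org/combine.specifications/sed-ml"


def build_manifest(entries):
    """Return manifest.xml bytes listing the archive, the manifest, and each file."""
    ET.register_namespace("", MANIFEST_NS)
    root = ET.Element(f"{{{MANIFEST_NS}}}omexManifest")
    listed = [(".", OMEX_FORMAT), ("./manifest.xml", MANIFEST_NS)]
    listed += [(f"./{name}", fmt) for name, fmt in entries]
    for location, fmt in listed:
        ET.SubElement(root, f"{{{MANIFEST_NS}}}content",
                      {"location": location, "format": fmt})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_archive(files):
    """Pack {name: (format_uri, payload_bytes)} into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest = build_manifest([(name, fmt) for name, (fmt, _) in files.items()])
        archive.writestr("manifest.xml", manifest)
        for name, (_, payload) in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Placeholder payloads stand in for a real model and simulation description.
omex_bytes = build_archive({
    "model.xml": (SBML_FORMAT, b"<sbml/>"),
    "simulation.sedml": (SEDML_FORMAT, b"<sedML/>"),
})
print(len(omex_bytes), "bytes")
```

Writing `omex_bytes` to a file with an `.omex` extension would give a single shareable bundle. Whether a given tool accepts it depends on details of the specification that this sketch does not cover, such as metadata files.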
