FAIR Data and Model
Management for
Systems Biology
(and SOPs too!)
Prof Carole Goble
The University of Manchester
The Software Sustainability Institute
ELIXIR UK, SynBioChem Centre
carole.goble@manchester.ac.uk
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
• Project-centric data and
model management
• Respects & expects other systems
• Forged in fire of
national & international
projects
• PhDs/postgrads/PIs
• Context
• FAIRDOM Initiative
• Challenges
http://www.fair-dom.org
http://www.fairdomhub.org
republic of science*
regulation of science
institutions
libraries
*Merton’s four norms of scientific behaviour (1942)
public archives
cloud services
Reproducibility
Nature, April 2015
https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
Publishers
• Reproducibility
• New publishable assets
• New business models and services
Funders, Managers
• Capitalising
• Skills
• Justification, Audit & Compliance
UK Funder Data Policies
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Tools, Standards,
Formats, Reporting,
Policies, Practices,
Initiatives
Data
Models
SOPs
consistency,
comparability
Samples…
‘omics
images,
reaction
kinetics,
samples,
specimens…
Small: spreadsheets, files…
Big: NGS, Mass Spec, specialist repositories…
ODE, SBML, Native Matlab,
PDE, Fortran, CellML…
versioning,
provenance tracking,
parameter tracking,
citation tracking,
links to articles
STANDARDS
Asset Management
public archives
cloud services
Public-Centric Asset Management
Challenge: Most quantitative databases provide kinetic constants
for enzymes, sometimes binding constants….
Little to help building quantitative descriptions, i.e. concentrations,
sizes, diffusions….
Exceptions: gene expression data, proteomics, metabolomics.
Localisation: The average concentration of a protein in a piece of
brain is of limited use (mix of tissues and subcellular compartments)
[Nicolas Le Novere, 2015]
FAIR for the Researcher
Collaborative, data/model-driven
science
Publication
Local and Public Resources
Skills and Productivity
Compliance
Collaboration, asset management
Pop-up projects
Dynamic groups
Internal / external
visibility
Project-Centric Asset Management
Is this data available?
What SOP was used for
this sample?
Where is the validation data for
this model?
• Retain results beyond a
project / the PhD student
• Exchange & find assets.
• Share, disseminate and
publish assets sensitively
• Consistent reporting for
interpretation, interop &
comparison
• Promote standardised
metadata practices.
• Organise and link assets
• Reuse results
Find
Data, models, protocols,
projects, people
Catalogued and linked assets
Link studies to assets
Control sharing, versioning,
gateway to scattered
public/local archives
Access
Interoperate
Standards (SBML, SED-ML…),
vocabs, formats, ids
harvesting, export, API
Reuse
Download assets
Run models with exp’mtl data
DOI citation
The Neylon Equation
FAIRDOM Provenance
2008
2010
2014
de.NBI
2019
SEEK:
Science Commons
Web-based Cataloguing and Rich web
interface for describing, finding,
linking and promoting ongoing
research and outcomes. Small files,
aggregates across data archives.
openBIS:
Scaled local LIMS and analytics
Extract, Transform and Load tooling
direct from the instrumentation, data
analysis pipelines. Automatic
archiving. Handles large data.
FAIRDOM Suite
Personal Data
Local Stores
LIMS
External
Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Über metadata, cataloguing
Stores
SOPs,
Models,
data files
NGS
Proteomics
LIMS
iPortal
BeeWM
https://doi.org/10.15490/seek.1.investigation.56
[Snoep, 2015]
Standard Operating Procedures
Challenge: Machine
processable SOPs
Models
simulate and
annotate in
browser
Metadata standards & templates to
link studies and link assets
Just Enough
Results Model
Describes
common
elements and
relationships
between things
produced and
used in
experiments.
Structured
descriptions for
consistency and
comparison
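The idea of linking studies to the assets they produce and use can be sketched in a few lines. This is only an illustration, not the actual JERM schema: the `Asset` and `Study` classes, the `sops_for` helper and all titles are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record: kind is e.g. 'data', 'model' or 'sop'."""
    kind: str
    title: str


@dataclass
class Study:
    """Hypothetical study that links the assets it produced or used."""
    title: str
    assets: list = field(default_factory=list)

    def of_kind(self, kind):
        # All assets of one kind linked to this study
        return [a for a in self.assets if a.kind == kind]


def sops_for(studies, asset):
    """SOPs recorded in any study that also links `asset`."""
    return [sop for s in studies if asset in s.assets for sop in s.of_kind("sop")]


# Invented example records
timecourse = Asset("data", "glucose pulse timecourse")
protocol = Asset("sop", "sampling protocol v2")
model = Asset("model", "glycolysis ODE model")
pulse = Study("pulse experiment", [timecourse, protocol, model])
other = Study("unrelated study", [Asset("sop", "other protocol")])

print([s.title for s in sops_for([pulse, other], timecourse)])
```

With links recorded this way, a question such as "What SOP was used for this sample?" becomes a lookup rather than an email to a former PhD student.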
NuML
[Adapted, Le Novere]
FAIRDOM
Suite
Resource
FAIRDOMHub
Self-managed,
customised
local installation.
Independent, self-managed
private
space on shared,
hosted
installation.
Publisher
Companion Site
FAIRDOMHub.org
FAIRDOM Initiative
Facilities
Community
Networks
Forums
Workshops
Tools
Standards
Support
Sustainability
de.NBI
Sys Bio Developers Foundry, Oct
2014 Heidelberg, Germany
EraSysAPP meeting, April 2015,
Berlin, Germany
PALs
http://seek.virtual-liver.de/
• Navigation
• Single standards
at one scale
• Multi-type hosting
“To integrate the detailed
knowledge that we have at the
molecular level up to the
functional level at
tissue/organ/whole body level”
Multi-scale?
Multi-silos ….
Handling/converting data of
different levels of detail to
make the model run.
Representing in the SBML
model the DNA bindings at the
level of detail that had been
measured in the experiments
Whole Cell model by Jonathan Karr
(Rostock Summer School, Dagmar Waltemath)
Support for aggregating data to find the appropriate
level of representation for a given model.
Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype
from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.
Challenge: mismatches
• Systems on different scales
– incompatible time scales, data may be too sparse or
need to be aggregated to work with another module
• Different levels of complexity
– comparing results from different modelling
approaches.
• Linking models needs thinking and standards
– connecting the single standards
– interfacing between the different scales
– connecting (experimental/simulation) data to models
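The time-scale mismatch above can be illustrated with a minimal sketch that averages finely sampled values into the coarser steps another module expects. The function name and the sample values are invented for illustration, and a mean is only one of several possible aggregation choices.

```python
from statistics import mean


def aggregate_to_coarse_steps(samples, step):
    """Average (time, value) samples into bins of width `step`.

    Returns a sorted list of (bin_start, mean_value). Bins without
    samples are omitted, so sparse regions stay visible as gaps.
    """
    bins = {}
    for t, value in samples:
        bin_start = (t // step) * step
        bins.setdefault(bin_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


# Hypothetical fast-module output: (time, concentration)
fast = [(0, 1.0), (1, 3.0), (2, 2.0), (3, 4.0), (7, 10.0)]
print(aggregate_to_coarse_steps(fast, step=2))
# → [(0, 2.0), (2, 3.0), (6, 10.0)]
```

The missing bin starting at time 4 shows the sparseness problem: the coarser module has to decide whether to interpolate, reuse the last value, or refuse to run.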
Challenge: model evolution
BiVeS tool: diff in versions of computational models
Provenance, Versioning, Parameter tracking
Releasing updated versions into the literature
Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems. Scharm et al.
2015, under revision at Bioinformatics
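To give a feel for parameter tracking between model versions, here is a deliberately naive sketch that compares `parameter` elements in two XML documents. It is not the BiVeS algorithm, and the two model fragments are invented and far simpler than real SBML.

```python
import xml.etree.ElementTree as ET


def read_parameters(xml_text):
    """Map parameter id -> value string for every element with local name 'parameter'."""
    root = ET.fromstring(xml_text)
    params = {}
    for elem in root.iter():
        # Strip any XML namespace, e.g. '{ns}parameter' -> 'parameter'
        if elem.tag.rsplit("}", 1)[-1] == "parameter":
            params[elem.get("id")] = elem.get("value")
    return params


def diff_parameters(old_xml, new_xml):
    """Report parameters that were added, removed or changed in value."""
    old, new = read_parameters(old_xml), read_parameters(new_xml)
    shared = sorted(set(old) & set(new))
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {k: (old[k], new[k]) for k in shared if old[k] != new[k]},
    }


# Invented model fragments standing in for two versions of a model
V1 = """<model><listOfParameters>
  <parameter id="k1" value="0.1"/>
  <parameter id="k2" value="2.0"/>
</listOfParameters></model>"""
V2 = """<model><listOfParameters>
  <parameter id="k1" value="0.15"/>
  <parameter id="k3" value="5.0"/>
</listOfParameters></model>"""

print(diff_parameters(V1, V2))
```

A flat comparison like this misses structural changes (reactions, compartments, renamed ids), which is the harder part that dedicated model-diff tooling addresses.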
Haus et al, BMC
Systems Biology, 2011, 5:10
Solvent production by
Clostridium acetobutylicum
[Martin Scharm]
F1000Research Living Figures,
versioned articles, in-article data manipulation
R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of
a figure, and ultimately the
journal article
Colomb J and Brembs B.
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1;
ref status: indexed, http://f1000r.es/3is]
F1000Research 2014, 3:176
Other labs can replicate the study, or
contribute their data to a meta-analysis
or disease model - figure automatically
updates.
Data updates time-stamped.
New conclusions added via versions.
Challenge: reproducibility
bridging from research to FAIR publishing
Bergmann, Rodriguez, Le Novère. COMBINE archive specification.
<http://identifiers.org/combine.specifications/omex.version-1> (2014)
Describe
Access
Port
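A COMBINE archive is a zip file with a `manifest.xml` that lists its contents. The sketch below packs a placeholder model and simulation description using only the Python standard library. The namespace and format URIs follow the manifest layout as commonly described and should be checked against the specification. The file names and payloads are invented.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

SPEC = "http://identifiers.org/combine.specifications/"
MANIFEST_NS = SPEC + "omex-manifest"


def build_manifest(files):
    """Build manifest.xml text for `files`: name -> (format URI, payload bytes)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{MANIFEST_NS}">',
        # The archive itself and the manifest are listed as content entries
        f'  <content location="." format="{SPEC}omex"/>',
        f'  <content location="./manifest.xml" format="{MANIFEST_NS}"/>',
    ]
    for name, (fmt, _) in files.items():
        lines.append(
            f'  <content location={quoteattr("./" + name)} format={quoteattr(fmt)}/>'
        )
    lines.append("</omexManifest>")
    return "\n".join(lines)


def build_archive(files):
    """Return the bytes of a zip archive holding the manifest and all files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", build_manifest(files))
        for name, (_, payload) in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Placeholder payloads, not valid SBML or SED-ML
files = {
    "model.xml": (SPEC + "sbml", b"<sbml/>"),
    "simulation.sedml": (SPEC + "sed-ml", b"<sedML/>"),
}
omex_bytes = build_archive(files)
with zipfile.ZipFile(io.BytesIO(omex_bytes)) as archive:
    print(archive.namelist())
```

Writing `omex_bytes` to a file with an `.omex` extension gives a single object that can be deposited alongside a publication.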
Challenge: reproducibility
bridging from research to FAIR publishing
Deposit
Model simulation
Differentiated data
Challenge:
Samples
Descriptions
SOP-Centric
Challenge: Releasing
SysMO Projects
(2009-2014)
me
ME
my
team
close
colleagues
• Self-publication & Journal
companionship.
• Staged & Selective Hugging
& Flirting. Reciprocity.
• Tribal & Trading behaviours
• Forgetfulness, Embargos
• Resources, Benefit
• Individuals more likely to
share than consortia
• Post-hoc rationalised
Data/Model Cycles
Challenges: (meta)data wrangling
Offsetting curation debt
http://rightfield.org.uk
FAIRDOM Challenge: Sustainability
Free. Like a Free Puppy.
Enabling multi-scale modelling in systems medicine
1. Exploit existing data for multi-scale modelling
2. Develop SOPs and quality standards for systematic collection of quantitative
data and information.
3. Identify required standards and ontologies for models and data repositories in
systems medicine.
4. Develop modelling workflows for the integration of data and models; support
data management, model construction and analysis.
5. Develop mathematical formalism to analyze and compare multi-scale models
(parameter estimation, sensitivity analysis, identifiability analysis and image
analysis).
Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)
Carole Goble Stuart Owen
Finn
Bacall
Jacky Snoep
Wolfgang
Mueller
Olga Krebs Quyen Nguyen
Natalie Stanford
Katy Wolstencroft
Peter Kunszt Bernd Rinn
fairdom@fair-dom.org
fair-dom@fair-dom.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://www.rightfield.org.uk
http://jjj.biochem.sun.ac.za
http://sybit.net/software/openBIS
Donal Fellows Alan Williams
Rostyslav
Kuzyakiv
Jakub
Straszewski
Chandrasekhar
Ramakrishnan
Caterina
Barillari
Norman Morrison
