2019 07-04-model reuse-bonn

From paper-based model description to interactive simulation of disease progression
PROF. DR.-ING. DIPL.-INF. DAGMAR WALTEMATH
MEDICAL INFORMATICS | INSTITUTE FOR COMMUNITY MEDICINE
UNIVERSITY MEDICINE GREIFSWALD (GERMANY)
MODEL REUSE WITH JOY

How this talk is organised
THE HISTORY | THE SCIENCE
Disclaimer: All comic-style graphics in this presentation
were done either by Anna Zhukova or by Martin Peters.
Thank you very much! Images downloaded from Pixabay.
DAGMAR WALTEMATH | MODEL REUSE WITH JOY | 7/4/2019

Systems Biology is…
Systems biology is the science that studies
how biological function emerges from the
interactions between the components of living
systems.
… and how these emergent properties enable
or constrain the behavior of these
components.
(Slide adapted from: Olaf Wolkenhauer)

Simulation models can take many forms.
MATHEMATICAL MODELS FURTHER APPROACHES
Figs.: https://doi.org/10.1371/journal.pcbi.1002815, https://doi.org/10.1371/journal.pcbi.1004591

Simulation models can be complex.
WHOLE-CELL MODEL KEY FIGURES
First in silico whole-cell model
Genome (525 genes), transcriptome, proteome and metabolome incorporated
Describes the whole life cycle of a single cell at the molecular level,
predicts a wide range of cellular behaviors, and
accounts for the specific function of every annotated gene product
Based on 900 publications
Consists of 116 MATLAB files
Incorporates over 1,900 experimentally observed parameters
Fig.: Karr et al. (2012), https://doi.org/10.1016/j.cell.2012.05.044

Early simulation models
Tyson's Cell Cycle Model (1991) (BIOM5 and BIOM6)
history

Publishing the model
PAPER AVAILABLE INFORMATION
1) (textual) description of work and related
efforts (referencing other papers)
2) (textual and visual) description of
(biochemical) network
3) (printed) model parameters
4) (printed) mathematical equations
5) resulting plots
Fig.: http://doi.org/10.1073/pnas.88.16.7328

What can you do with this model?
STUDY THE PAPER, BELIEVE | RE-IMPLEMENT BASED ON THE PAPER

The year 2000
Novak's Cell Cycle model (1997) (BIOM7)
history

Publishing the model
PAPER AVAILABLE INFORMATION
1) (textual) description of work and related
efforts (referencing other papers)
2) (textual and visual) description of
(biochemical) network
3) (printed) model parameters
4) (printed) mathematical equations
5) resulting plots
Fig.: http://doi.org/10.1073/pnas.94.17.9147

Publishing the model code
SIMULATION MODEL AVAILABLE INFORMATION
1) Description of (biochemical) network in
computer-readable format (SBML)
2) Mathematical equations in computer-readable format (MathML)
3) Model parameters inside model code
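A minimal sketch of what "model parameters inside model code" gives a reader: species and parameters can be read programmatically instead of being copied from a printed table. It uses only Python's standard xml.etree.ElementTree on a tiny, hypothetical SBML Level 2 fragment (the ids echo the Tyson model, the values are made up); real projects would normally use a dedicated library such as libSBML.

```python
import xml.etree.ElementTree as ET

# Hypothetical SBML Level 2 fragment; ids echo the Tyson model, values are made up.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_cell_cycle">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="C2" compartment="cell" initialConcentration="0"/>
      <species id="CP" compartment="cell" initialConcentration="0.75"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.015"/>
    </listOfParameters>
  </model>
</sbml>"""

# Namespace of SBML Level 2 Version 1 documents.
NS = {"sbml": "http://www.sbml.org/sbml/level2"}


def summarise(sbml_text):
    """Return initial concentrations of all species and values of global parameters."""
    root = ET.fromstring(sbml_text)
    species = {
        s.get("id"): float(s.get("initialConcentration", "nan"))
        for s in root.iterfind(".//sbml:species", NS)
    }
    # Only model-level parameters, not local parameters inside kinetic laws.
    parameters = {
        p.get("id"): float(p.get("value", "nan"))
        for p in root.iterfind("sbml:model/sbml:listOfParameters/sbml:parameter", NS)
    }
    return {"species": species, "parameters": parameters}


if __name__ == "__main__":
    print(summarise(SBML_TEXT))
```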

What can you do with this data?
CHECK THE MODEL (REPRODUCIBILITY)
RE-USE THE CODE IN ANOTHER SOFTWARE
(INTEROPERABILITY)
Fig. (left) JWS Online, http://jjj.mib.ac.uk/models. Fig. (right) courtesy M. Hucka (2016),
https://www.slideshare.net/thehuck/recent-software-and-services-to-support-the-sbml-community

State-of-the-art (in theory)
Calzone's Cell Cycle model (BIOM144)
history

Publishing the model & code
PAPER SIMULATION MODEL
Fig.: https://doi.org/10.1038/msb4100171

Publishing the meta-data
on repository – model – and entity level
Harmonised meta-data for simulation models in computational biology: Neal et al. (2018), Briefings in Bioinformatics (https://doi.org/10.1093/bib/bby087)

Publishing the simulation setups
COMBINE ARCHIVE
manifest.xml | OMEX | Skeleton, automatically generated by WebCAT
metadata.rdf | OMEX | Skeleton, automatically generated by WebCAT
README.md | Markdown | Human-readable information for users stumbling upon the archive
model/
  BIOMD0000000144.xml | SBML L2V1 | origin: www.ebi.ac.uk/biomodels-main/download?mid=BIOMD0000000144
  calzone_2007.svg | SVG | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_2007.ai | Illustrator | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_2007.png | PNG | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  calzone_thieffry_tyson_novak_2007.cellml | CellML 1.0 | origin: models.cellml.org/workspace/calzone_thieffry_tyson_novak_2007
  sbgn/Calzone2007.gml | GML | SBGN-compliant figure generated using SBGN-ED
  sbgn/Calzone2007.graphml | GraphML | SBGN-compliant figure generated using SBGN-ED
  sbgn/Calzone2007.pdf | PDF | SBGN-compliant figure generated using SBGN-ED
  sbgn/Calzone2007.png | PNG | SBGN-compliant figure generated using SBGN-ED
  sbgn/Calzone2007.sbgn | SBGN-ML | SBGN-ML-encoded figure generated using SBGN-ED
experiment/
  Calzone2007-default-simulation.xml | SED-ML L1V1 | Simulation description generated using SED-ML Web Tools
  Calzone2007-simulation-figure-1B.xml | SED-ML L1V1 | Simulation description generated using SED-ML Web Tools based on Calzone2007-default-simulation.xml
documentation/
  Calzone2007.pdf | PDF | Scientific publication "Dynamical modeling of syncytial mitotic cycles in Drosophila embryos" obtained from msb.embopress.org/content/3/1/131
  Calzone2007-supplementary-material.pdf | PDF | Supplementary information for the publication obtained from msb.embopress.org/content/3/1/131
result/
  Fig1B-bottom-COPASI.svg | SVG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in COPASI
  Fig1B-top-COPASI.svg | SVG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in COPASI
  Fig1B-bottom-webtools.png | PNG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in SED-ML Web Tools
  Fig1B-top-webtools.png | PNG | Image generated by executing Calzone2007-simulation-figure-1B.xml on BIOMD0000000144.xml in SED-ML Web Tools
AVAILABLE INFORMATION
1) Paper and additional information
2) Meta-data
3) Graphical representation of model (SBGN)
4) Alternative parametrisations (SED-ML)
5) Model versions
6) Simulation experiments (SED-ML)
Example archive available from: https://github.com/SemsProject/CombineArchiveShowCase/
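The archive layout above can be sketched with Python's standard library alone: a COMBINE archive is a zip file whose manifest.xml lists every entry together with a format identifier. This is a minimal sketch, assuming the identifiers.org format URIs used in the OMEX specification examples (check them against the current specification); it writes no metadata.rdf, and the file names and contents are hypothetical placeholders. Dedicated libraries (e.g. libCOMBINE) cover the full specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMAT_PREFIX = "http://identifiers.org/combine.specifications/"
ET.register_namespace("", OMEX_NS)  # serialise manifest elements without a prefix


def build_archive(entries):
    """Pack files into an in-memory COMBINE archive (zip + manifest.xml).

    entries maps an archive path to (content_bytes, format_suffix),
    e.g. "model/model.xml" -> (b"...", "sbml").
    """
    manifest = ET.Element(f"{{{OMEX_NS}}}omexManifest")

    def add(location, fmt):
        ET.SubElement(manifest, f"{{{OMEX_NS}}}content",
                      location=location, format=FORMAT_PREFIX + fmt)

    add(".", "omex")  # the archive itself
    add("./manifest.xml", "omex-manifest")
    for path, (_, fmt) in entries.items():
        add(f"./{path}", fmt)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml",
                         ET.tostring(manifest, encoding="utf-8", xml_declaration=True))
        for path, (content, _) in entries.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def list_contents(archive_bytes):
    """Read manifest.xml back and return (location, format) pairs."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read("manifest.xml"))
    return [(c.get("location"), c.get("format"))
            for c in root.findall(f"{{{OMEX_NS}}}content")]


# Hypothetical file names and placeholder contents.
archive_bytes = build_archive({
    "model/model.xml": (b"<sbml/>", "sbml"),
    "experiment/simulation.xml": (b"<sedML/>", "sed-ml"),
})
for location, fmt in list_contents(archive_bytes):
    print(location, fmt)
```

Keeping the manifest as the single index of the archive is what lets a tool find the model and the simulation setup without guessing from file names.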

What can you do with an archive?
• Download archive
• Explore data and meta-data
• Identify data set of interest
• Run model online/offline
• Modify, merge, extend, combine...
• Save new versions and documentation in archive
• Re-publish

What can you do with an archive?
Example: Download archive from Github and run it in JWS Online

What does the (near) future bring?

Linking models and data simplifies the verification
of both models and experimental data sets.
Integrating disease maps and biomedical
data (e.g., https://pdmap.uni.lu/minerva/)
Linking models and experimental data sets
(e.g., JWS Online)

Connecting pathways, ontologies and datasets
leads to new means of data exploration.
Comprehensive knowledge of cancer signaling networks and linked data,
working with interactive Pathway Maps, https://acsn.curie.fr/ACSN2/ACSN2.html

Easy access to simulations of patient-specific liver disease
progression helps doctors choose a therapy.
Fig.: Koenig et al. (2016), ODLS, Halle (Saale), http://livermetabolism.com

The pillars of success
WHAT'S THE SECRET?

The research field develops and adheres to
FAIR standards for modeling and simulation.
Data formats | Recommendations | Semantics / Ontologies

Data formats are interoperable and are
being developed collaboratively.
Editorial Boards
Specifications
Software tool support
http://co.mbine.org/standards
Standard development meetings
Annual special issue with
list of latest specifications
and errata

The community builds, feeds & uses
open repositories for simulation studies.

The community actively develops open,
standard-compliant libraries & tools.
MODELING AND SIMULATION SOFTWARE | REPOSITORIES & MANAGEMENT TOOLS
…
Full list available at: http://sbml.org/SBML_Software_Guide/

The (data) Science
DEVELOPMENT OF MODEL MANAGEMENT STRATEGIES
BY SEMS & FRIENDS (2011-2019)

Characteristics of the data
Heterogeneous
Big
Distributed
Complex
Highly connected
But
Good standards available to represent the
data
Agreed-upon semantic annotation schemes &
ontologies to enrich the data
Open data movement
Community spirit

Issues that SEMS investigated 2012-17
Handling the steadily increasing size and number of models and studies (database performance)
Increasing the quality of published models (semantic annotations, reproducibility of results)
Keeping track of model changes and relations (comprehensibility)
Identifying and handling similarities in model representations (reuse)
~300,000 models in BioModels Database, on average 5 versions per model.
XML, RDF, OWL

A graph-based approach keeps storage
and retrieval efficient.
Fig.: Graph linking a SED-ML simulation description (model reference, task, simulation, output, data generators, variables), the SBML model "Tyson1991 - Cell Cycle 6 var" (species such as C2, CP and pM, reactions, compartments) and annotations (UniProt:P04551, GO:0005623, InterPro:IPR006670, PubMed:1831270, KEGG pathway sce04111, EC-Code 3.1.3.16, SBO terms); relation types include isVersionOf, hasPart, isDescribedBy, is_connected and is_mapped_to. Layers: Models | Simulation | Annotation.
Example: Tyson 1991 (BIOM5), Source: Waltemath & Henkel, Neo4j Life & Health Sciences Day - Berlin, 21st June, 2017,
adapted from Henkel et al. (2015) DATABASE (https://doi.org/10.1093/database/bau130)

The linking of data sets on graph-level
allows for complex queries.
Figure labels: 2 experiments, 3 model versions, changes, meta-data
Fig.: Martin Peters, SEMS
Fig. (right): Henkel et al. (2015) DATABASE, https://doi.org/10.1093/database/bau130
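The kind of query such a linked graph enables can be illustrated without a database. This sketch uses plain Python dictionaries instead of Neo4j; the relation name "references" and all node ids are hypothetical, while hasPart, isVersionOf and isDescribedBy echo relation labels used in the graph. It answers one question: which simulation setups use a model that contains an entity annotated with a given UniProt entry?

```python
from collections import defaultdict


class LinkedGraph:
    """Tiny directed graph with typed edges (a stand-in for a graph database)."""

    def __init__(self):
        self.edges = defaultdict(list)  # source node -> [(relation, target node)]

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))

    def sources(self, relation, target):
        """All nodes that point to `target` via `relation`."""
        return [s for s, out in self.edges.items()
                for rel, t in out if rel == relation and t == target]


def simulations_for_annotation(graph, uri):
    """Simulation setups whose model contains an entity annotated with `uri`."""
    entities = graph.sources("isVersionOf", uri)
    models = {m for e in entities for m in graph.sources("hasPart", e)}
    return sorted({sim for m in models for sim in graph.sources("references", m)})


# Hypothetical nodes: two simulation setups on one model, plus an unrelated model.
g = LinkedGraph()
g.add("sim:Tyson1991_default", "references", "model:Tyson1991")
g.add("sim:Tyson1991_fig", "references", "model:Tyson1991")
g.add("model:Tyson1991", "hasPart", "species:C2")
g.add("species:C2", "isVersionOf", "UniProt:P04551")
g.add("model:Tyson1991", "isDescribedBy", "PubMed:1831270")
g.add("sim:Other_default", "references", "model:Other")

print(simulations_for_annotation(g, "UniProt:P04551"))
```

The query walks from annotation to entity to model to simulation; in a graph database the same traversal is a single pattern match rather than several joins.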

Lucene-based indices incorporate all relevant
information for later search & comparison.
Fig.: Graph entities (Model, Publication, Annotation, Person, Simulation) and the example graph for Tyson1991 - Cell Cycle 6 var, with example index fields:
• Publication: Id, Name, Title, Journal, Abstract, Authors, …
• Model: Id, Name, Component, Variable, Species, Reaction, Compartment
• Person: First name, Last name, Organization, Email
• Annotation: URI, Description
Fig.: Henkel et al. (2015) DATABASE
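The idea of a per-field index can be sketched as a small inverted index keyed by (field, term). This is a hypothetical illustration in plain Python, not Lucene: the tokenizer is a crude stand-in for a Lucene analyzer, and the two model records are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens (a crude stand-in for a Lucene analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Inverted index keyed by (field, term), in the spirit of per-entity indices."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> ids of matching documents

    def add(self, doc_id, fields):
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Ids of documents whose `field` contains every query term."""
        hits = [self.postings.get((field, term), set()) for term in tokenize(query)]
        return set.intersection(*hits) if hits else set()


# Hypothetical model records.
index = FieldIndex()
index.add("doc1", {"name": "Tyson1991 - Cell Cycle 6 var", "authors": "Tyson"})
index.add("doc2", {"name": "Novak1997 - Cell Cycle", "authors": "Novak Tyson"})

print(index.search("name", "cell cycle"))
print(index.search("authors", "novak"))
```

Note that "Tyson1991" becomes the single token "tyson1991", so a name search for "tyson" finds nothing here; choosing the analyzer is part of designing such an index.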

A weighted ranked-retrieval method
returns the most relevant models first.
Example queries:
• Show me models by Tyson describing the cell cycle and having cdc2
  1. (0.859) Tyson1991 - Cell Cycle 6 var
  2. (0.854) Tyson2001_Cell_Cycle_Regulation
  3. (0.477) Chen2004 - Cell Cycle Regulation
• Which are the most frequently used GO annotations in my model set?
• Which models contain reactions with 'ATP' as reactant and 'ADP' as product?
• Find good candidates for features describing my model set.
• Which models are annotated with 'Ubiquitin'?
• Give me all the files I need to run this simulation study.
Fig.: Henkel et al. (2015) DATABASE
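A minimal sketch of weighted ranked retrieval, assuming a deliberately simple score: for each field, the share of query terms found there, multiplied by a hypothetical field weight, summed over fields. Lucene's own scoring (classic TF-IDF or BM25 with field boosts) is more elaborate, so the numbers below are not comparable to the scores on the slide, and the records are made up.

```python
import re

# Hypothetical field weights: an author match counts more than a name match.
WEIGHTS = {"name": 1.0, "authors": 2.0, "species": 1.5}


def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(query, docs, weights=WEIGHTS):
    """Score = sum over fields of weight * (share of query terms found in that field).

    Returns (doc_id, score) pairs with score > 0, best first.
    """
    q = terms(query)
    if not q:
        return []
    results = []
    for doc_id, fields in docs.items():
        score = sum(w * len(q & terms(fields.get(f, ""))) / len(q)
                    for f, w in weights.items())
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical records, not the BioModels entries behind the slide's scores.
docs = {
    "Tyson1991": {"name": "Tyson1991 - Cell Cycle 6 var", "authors": "Tyson",
                  "species": "cdc2 cyclin"},
    "Chen2004": {"name": "Chen2004 - Cell Cycle Regulation", "authors": "Chen Tyson",
                 "species": "cyclin"},
    "Glycolysis": {"name": "Glycolysis", "authors": "Smith", "species": "atp adp"},
}

for doc_id, score in rank("Tyson cell cycle cdc2", docs):
    print(f"{score:.3f}  {doc_id}")
```

Weighting fields lets an author or species match outweigh a generic word in a model name; documents with no match at all are not returned.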

A method to detect and track differences
in model versions ensures transparency.
How did my model change between version x and x+1?
"Sophisticated" XyDiff & change ontology
How often did this model change, when and why?
Give me all versions of this model.
Figs.: Waltemath et al. (2013) Bioinformatics (https://doi.org/10.1093/bioinformatics/btt018);
Implementation: M. Scharm, https://github.com/SemsProject/BiVeS
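A much simpler stand-in for difference detection can show what "inserted, deleted, updated" means for two model versions. This is not BiVeS: the slide refers to an XyDiff-based approach combined with a change ontology, whereas this sketch only matches XML elements by their id attribute and compares their attributes. Both SBML fragments are hypothetical toy versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy versions of one model.
OLD_VERSION = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfSpecies><species id="C2" compartment="cell"/></listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.015"/>
      <parameter id="k2" value="0"/>
    </listOfParameters>
  </model>
</sbml>"""

NEW_VERSION = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="C2" compartment="cell"/>
      <species id="CP" compartment="cell"/>
    </listOfSpecies>
    <listOfParameters><parameter id="k1" value="0.02"/></listOfParameters>
  </model>
</sbml>"""


def entities(xml_text):
    """Map (local tag name, id) -> attributes for every element carrying an id."""
    out = {}
    for el in ET.fromstring(xml_text).iter():
        if "id" in el.attrib:
            tag = el.tag.rsplit("}", 1)[-1]  # strip the namespace part
            out[(tag, el.attrib["id"])] = dict(el.attrib)
    return out


def diff(old_text, new_text):
    """Classify id-bearing elements as inserted, deleted or updated."""
    old, new = entities(old_text), entities(new_text)
    return {
        "inserted": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


print(diff(OLD_VERSION, NEW_VERSION))
```

Matching by id breaks down as soon as an element is renamed, which is one reason a tree-mapping approach is needed for real model histories.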

Identification of frequent patterns in network
graphs helps determine structural similarity.
Fig.: Size and number of reactions and participating species (left), and identified frequent patterns (right).
Implementation: Fabienne Lambusch. Figure: Lambusch et al. (2018) DATABASE (https://doi.org/10.1093/database/bay051)

Identification of frequent patterns in network
graphs helps determine structural similarity.
Fig.: Tyson BIOM5 (left), and identified patterns based on the (right).
Implementation: Fabienne Lambusch. Figure: Lambusch et al. (2018) DATABASE (https://doi.org/10.1093/database/bay051)
How similar are these two models
with respect to structure?
Give me all models with
this particular sub-structure.
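The two questions above can be illustrated with a deliberately simplified stand-in for frequent-pattern mining: each reaction is abstracted to its shape (numbers of reactants, products and modifiers), a shape's support is the number of models containing it, and two models are compared by the Jaccard similarity of their shape sets. The method in Lambusch et al. (2018) works on the network graphs themselves; this sketch only illustrates the idea, and the reaction lists are hypothetical.

```python
from collections import Counter


def reaction_shape(reactants, products, modifiers=()):
    """Abstract a reaction to (#reactants, #products, #modifiers)."""
    return (len(reactants), len(products), len(modifiers))


def shapes(reactions):
    return {reaction_shape(*r) for r in reactions}


def frequent_patterns(models, min_support):
    """Shapes occurring in at least `min_support` models (support counted per model)."""
    support = Counter(s for reactions in models.values() for s in shapes(reactions))
    return {s: n for s, n in support.items() if n >= min_support}


def structural_similarity(a, b):
    """Jaccard similarity of two models' reaction-shape sets."""
    sa, sb = shapes(a), shapes(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical reaction lists: (reactants, products[, modifiers]).
models = {
    "A": [(["C2"], ["CP"]), (["CP"], ["C2"], ["M"]), (["Y", "CP"], ["pM"])],
    "B": [(["X"], ["Y"]), (["Y"], ["X"], ["E"])],
    "C": [(["S1", "S2"], ["P1", "P2"])],
}

print(frequent_patterns(models, min_support=2))
print(round(structural_similarity(models["A"], models["B"]), 3))
```

Real sub-structure search would compare labelled subgraphs (species and reactions with their annotations) rather than counts, at a much higher computational cost.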

Example applications
MODEL RETRIEVAL | MODEL VERSION CONTROL & PROVENANCE |

Implementing model version control in the FAIRDOMHub
Internal use of BiVeS difference detection for SBML models

Change statistics for model versions
Internal use of BiVeS difference detection for SBML and CellML models, change ontology COMODI, SBGN visualisation tool DiViL;
https://most.bio.informatik.uni-rostock.de, Scharm et al. (2018), BMC SysBio (https://doi.org/10.1186/s12918-018-0553-2)
BIOM7

BIOM7

Ranked retrieval of reproducible simulation studies
Internal use of the COMBINEArchive-library, MORRE, MASYMOS, http://cellml.org/models
Internal use of the COMBINEArchive library, SEDMLlibrary, https://jjj.biochem.sun.ac.za/

If your work is standardised, documented, and open,
…we can help you manage it, so it can be retrieved and reused by others.

Summary

Standardisation and integration of data
improved model accessibility and reusability.
COPPIC FOREST (DECORTICATED)
Matlab logo: By Jarekt (Own work) [Public domain], via Wikimedia Commons; Python logo: By www.python.org [GPL], via Wikimedia Commons;
Java logo: By Cguevara94 (Own work) [CC BY-SA 4.0], via Wikimedia Commons, modified.
PATH (ACCESSIBLE)

Biological data is well integrated with simulation
models, but biomedical/clinical data lags behind.

Thank you for
your attention
Dagmar Waltemath
University Medicine Greifswald
@dagmarwaltemath
0000-0002-5886-5563
Contact me to adopt a SEMS –
work in Greifswald or clone a GitHub repository!
