Reproducibility, dissemination,
and management of modeling results

17 February 2014, Braunschweig

Dagmar Waltemath

http://sems.uni-rostock.de
Nature Blogs: Of Schemes and Dreams (2014) Nine Worrying Stats on the Effect of Poor Scientific Data Management


2
Nature Blogs: Of Schemes and Dreams (2014) Nine Worrying Stats on the Effect of Poor Scientific Data Management


3
“We’ve been hearing a common theme from
the academic community – researchers are
having difficulty managing and accessing their
data. It seems to be an ongoing problem for
research scientists, at any stage of their
careers.”
(Nature Blogs: Of Schemes and Dreams (2014) Nine Worrying Stats on the Effect of Poor Scientific Data
Management)


4
Outline

reproducibility

dissemination


management

5
Outline

reproducibility

dissemination

management

“People can’t share knowledge if they don’t
speak a common language”
Tom Davenport, Lawrence Prusak (2000) Working Knowledge


6
Reproducible modeling results :: Standards

Model: entities, network of reactions, math

Fig.: Goldbeter (1991), http://www.ncbi.nlm.nih.gov/pubmed/1833774

Annotations
Compartment: Cell GO:0005623
Publication: Goldbeter PMID:1833774
M = inactive CDC2 kinase: UniProt:CDK1A_XENLA
Fig.: BioModels Database
Behavior: Oscillation TEDDY_0000006
Algorithm: Gillespie KiSAO:000029

Protocols

Fig.: BioModels Database
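The annotations above attach ontology terms and database identifiers to model elements. A minimal sketch, assuming annotations are kept as simple (element, qualifier, resource) triples; the element labels and qualifiers are illustrative assumptions, only the identifiers are copied from the slide, and this is not how SBML actually serializes annotations:

```python
from collections import defaultdict

# Hypothetical annotation triples: (model element, qualifier, resource identifier).
# Identifiers come from the slide; element labels and qualifiers are illustrative.
ANNOTATIONS = [
    ("compartment:Cell", "is", "GO:0005623"),
    ("model", "isDescribedBy", "PMID:1833774"),
    ("model", "hasProperty", "TEDDY_0000006"),
    ("simulation", "is", "KiSAO:000029"),
]


def index_by_namespace(triples):
    """Group annotations by the namespace prefix of the resource (e.g. 'GO')."""
    index = defaultdict(list)
    for element, qualifier, resource in triples:
        # TEDDY_0000006 uses an underscore instead of a colon, so accept both.
        namespace = resource.replace("_", ":").split(":", 1)[0]
        index[namespace].append((element, qualifier, resource))
    return dict(index)


def elements_annotated_with(triples, resource):
    """Reverse lookup: which elements point at a given ontology term?"""
    return [element for element, _, res in triples if res == resource]


if __name__ == "__main__":
    for namespace, entries in sorted(index_by_namespace(ANNOTATIONS).items()):
        print(namespace, entries)
    print(elements_annotated_with(ANNOTATIONS, "GO:0005623"))  # ['compartment:Cell']
```

Keeping annotations as triples makes both directions cheap to query: from a model element to its terms, and from a term back to every element that uses it.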


7
Reproducible modeling results :: Towards publication

Following: Waltemath et al (2013) Reproducibility of model-based results in systems biology. Springer

8
Outline

reproducibility

dissemination

management

“[Quantitative] models will be only as useful as their access and reuse
is easy for all scientists.”
Nicolas Le Novère (2006) Model storage, exchange and integration. BMC Neuroscience

9
Dissemination :: Model curation and annotation

Fig.: Li et al (2010) BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic
models. BMC Systems Biology

10
Dissemination :: Public model repositories

1. Higher visibility of research
2. Long-term availability
3. Link to other resources
4. Quality checks

Fig.: Piwowar and Vision (2013) Data reuse and the open
data citation advantage. PeerJ


11
Dissemination :: Quality checks with functional curation

Fig.: Example of functional curation on a heart model, http://travis.cs.ox.ac.uk/FunctionalCuration/db.html

Fig.: Cooper et al (under review) Through models to knowledge with virtual experiments

Martin Scharm
12
Outline

reproducibility

dissemination

management

“And that’s why we need model management.”
Following: http://www.indiana.edu/~hperp200/images/WhyWeNeedComputer_thumb.png


13
Management :: Integration of model-related data
“Which models are annotated with ‘adenosine triphosphate’?”
“Which models contain reactions with ATP as reactant and ADP as product?”

• Relations between entities
• Links to concepts in bio-ontologies
• Graph store (Neo4j database)

[Figure: graph of model-related data. Nodes include Document, Tyson1991 Cell Cycle 6 var, Reaction3, C2, CP, pM, Cell, Pubmed:1831270, Kegg Pathway sce04111, EC-Code:3.1.3.16, Uniprot:P04551, Interpro:IPR006670 and GO:0005623; edges include isDescribedBy, hasPart, isContainedIn, asReactant, asProduct, is, isVersion and isVersionOf.]
Fig.: Henkel et al (2012) Considerations of graph-based
concepts to manage computational biology models and
associated simulations. INFORMATIK 2012, Braunschweig
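The second example query can be answered by following relations through the graph. The slide stores the graph in Neo4j; the sketch below instead uses a small in-memory edge list so it runs without a database. The node names (ModelA, R1, ATP, ADP, ...) and the edge directions are hypothetical assumptions that only borrow the relation names hasPart, asReactant and asProduct from the figure:

```python
# Hypothetical edge list: (source, relation, target).
# "ATP asReactant R1" reads: ATP takes part in reaction R1 as a reactant.
EDGES = [
    ("ModelA", "hasPart", "R1"),
    ("ATP", "asReactant", "R1"),
    ("ADP", "asProduct", "R1"),
    ("ModelB", "hasPart", "R2"),
    ("ADP", "asReactant", "R2"),
    ("ATP", "asProduct", "R2"),
]


def targets(edges, source, relation):
    """Nodes reached from `source` over one edge labelled `relation`."""
    return {t for s, r, t in edges if s == source and r == relation}


def sources(edges, relation, target):
    """Nodes that have an edge labelled `relation` pointing at `target`."""
    return {s for s, r, t in edges if r == relation and t == target}


def models_converting(edges, reactant, product):
    """Models containing a reaction with `reactant` as reactant and `product` as product."""
    reactions = targets(edges, reactant, "asReactant") & targets(edges, product, "asProduct")
    return {model for rxn in reactions for model in sources(edges, "hasPart", rxn)}


print(models_converting(EDGES, "ATP", "ADP"))  # {'ModelA'}
```

In a graph database the same two-hop pattern would be written declaratively as a single pattern query instead of explicit set intersections.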

Ron Henkel

14
Management :: Integration of model-related data
[Figure: graph of model-related data extended with a SED-ML simulation description (Document, Model reference, Simulation, Task, Data generator, Output, Variable) for the Tyson_1991 model and with the KiSAO and SBO ontologies. Nodes include C2, CP, time, Reaction3, Pubmed:1831270, Kegg Pathway sce04111, EC-Code:3.1.3.16, Uniprot:P04551, Interpro:IPR006670 and GO:0005623; edges include isDescribedBy, is_connected, is_mapped_to, isA, hasPart, isContainedIn, asReactant, asProduct and isVersionOf.]

Fig.: Henkel et al (in preparation)

15
Management :: Combination of methods
Keywords describing a model of interest.

[Figure: search, rank, retrieve, select simulation description, add simulation description to simulation software, simulate, compare with paper]

Ranked results:
1. Tyson’91
2. Novak’97
3. Maex’98

Retrieved model (Tyson’91):
ID: BIOMD000000005
Authors: Tyson JJ.
Date: 13 Sep 2005 12:31:08
Publication: pubmed:1831270
Species: cdc2k, cyclin …
Reaction: cyclin_cdc2k_dissociation, …

Selected simulation description:
Name: Tyson’91 ODE plot
Model: BIOMD000000005
Algorithm: ODE solver
Type: time course
Output: plot
Fig.: Following Waltemath et al (2013) Reproducibility of model-based results in systems biology. Springer.
Henkel et al (2010) Ranked retrieval of Computational Biology models. BMC Bioinformatics
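The simulation description on this slide lists a name, a model, an algorithm, a simulation type and an output. A minimal sketch, assuming a plain Python record rather than an actual SED-ML document; the class, the function and the second catalogue entry ("Other description", "OTHER_MODEL") are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationDescription:
    """Fields as listed on the slide; this is not the SED-ML schema."""
    name: str
    model: str      # identifier of the model the description applies to
    algorithm: str
    sim_type: str   # e.g. "time course"
    output: str     # e.g. "plot"


def descriptions_for(model_id, descriptions):
    """Select the simulation descriptions that reference a retrieved model."""
    return [d for d in descriptions if d.model == model_id]


CATALOGUE = [
    SimulationDescription("Tyson'91 ODE plot", "BIOMD000000005", "ODE solver", "time course", "plot"),
    # Hypothetical second entry, only to show that the filter discards it.
    SimulationDescription("Other description", "OTHER_MODEL", "ODE solver", "time course", "plot"),
]

selected = descriptions_for("BIOMD000000005", CATALOGUE)
print([d.name for d in selected])  # ["Tyson'91 ODE plot"]
```

Keeping the simulation setup in a record separate from the model is what allows it to be handed to different simulation software and rerun.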

Ron Henkel
16
Management :: Provenance
“Give me the best matching model published on the Cell Cycle
and considering cdk1.”

Lucene: species:cdk1, compartment:cell, …
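The query above combines field-restricted terms (species:cdk1, compartment:cell). Lucene's actual scoring is considerably more involved; the sketch below only illustrates fielded matching plus ranking, assuming each model is indexed as a dictionary of fields and scored by the number of query terms it matches. The model records are hypothetical:

```python
def parse_query(query):
    """Split 'field:term, field:term' into (field, term) pairs."""
    pairs = []
    for part in query.split(","):
        part = part.strip()
        if ":" in part:
            field, term = part.split(":", 1)
            pairs.append((field.strip(), term.strip().lower()))
    return pairs


def score(model, pairs):
    """Count how many query terms occur in the corresponding model field."""
    return sum(
        1 for field, term in pairs
        if term in (value.lower() for value in model.get(field, []))
    )


def rank(models, query):
    """Return ids of matching models, best match first (ties broken by id)."""
    pairs = parse_query(query)
    scored = [(score(m, pairs), m["id"]) for m in models]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [model_id for s, model_id in ordered if s > 0]


# Hypothetical index entries.
MODELS = [
    {"id": "model_1", "species": ["cdk1", "cyclin"], "compartment": ["cell"]},
    {"id": "model_2", "species": ["cyclin"], "compartment": ["cell"]},
    {"id": "model_3", "species": ["calcium"], "compartment": ["cytosol"]},
]

print(rank(MODELS, "species:cdk1, compartment:cell"))  # ['model_1', 'model_2']
```

Restricting terms to fields avoids matching "cell" in an unrelated field such as a free-text description; a real index would additionally weight rare terms higher.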

Fig.: Waltemath et al (2013) Improving the reuse of computational models through version control. Bioinformatics

17
Management :: Model version control

Fig.: courtesy Martin Scharm, BudHat, http://sems.uni-rostock.de/budhat
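Version control for models needs a way to report what changed between two versions. BudHat's own difference detection is not reproduced here; as a minimal sketch, assuming each model version is flattened into a dictionary of entities and their attributes (hypothetical identifiers and values), a diff can be reported as added, removed and modified entities:

```python
def diff_versions(old, new):
    """Compare two model versions given as {entity_id: {attribute: value}} dicts."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = {}
    for key in sorted(old.keys() & new.keys()):
        # Record (old value, new value) for every attribute that differs.
        changes = {
            attr: (old[key].get(attr), new[key].get(attr))
            for attr in old[key].keys() | new[key].keys()
            if old[key].get(attr) != new[key].get(attr)
        }
        if changes:
            modified[key] = changes
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical versions: parameter p2 changes, species s1 is replaced by s2.
V1 = {"p1": {"value": 0.1}, "p2": {"value": 0.0}, "s1": {"initial": 0.0}}
V2 = {"p1": {"value": 0.1}, "p2": {"value": 0.5}, "s2": {"initial": 1.0}}

print(diff_versions(V1, V2))
# {'added': ['s2'], 'removed': ['s1'], 'modified': {'p2': {'value': (0.0, 0.5)}}}
```

Reporting changes per entity, rather than as a line-based text diff, keeps the result meaningful when a model file is merely reordered.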

Martin Scharm
18
Summary :: SEMS projects & Contributions

foster dissemination
improve management
ensure reproducibility

[Figure: graph of model-related data around the Tyson1991 Cell Cycle 6 var model and its annotations]

19
Thank you for your attention.
Collaborators
Nicolas Le Novère

Christian Rosenke

David Nickerson

Wolfgang Müller

Jonathan Cooper

Falk Schreiber

Jon Olav Vik

SED-ML Editorial Board

Tommy Yu

SBML Editorial Board

HARMONY 2015
Wittenberg
HERMES-Forschungsförderung der Universität Rostock

@SemsProject

20
