Computational Approaches to Systems Biology

Michael Hucka, Ph.D.
Department of Computing + Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
The Kinghorn Cancer Centre, Australia, August 2013
Email: mhucka@caltech.edu Twitter: @mhucka

Outline
Background and introduction
The Systems Biology Markup Language (SBML)
Complementary efforts: MIRIAM and SED-ML
COMBINE: the Computational Modeling in Biology Network
Conclusion

Research today: experimentation, computation, cogitation

“The nature of systems biology”
Bruggeman & Westerhoff,
Trends Microbiol. 15 (2007).

Large-scale integrative models are growing

Is it enough to communicate the model in a paper?
Many models have traditionally been published this way
Problems:
• Errors in printing
• Missing information
• Dependencies on implementation
• Outright errors
• Can be a huge effort to recreate

Is it enough to make your (software X) code available?
It’s vital for good science:
• Someone with access to the same software can try to run it,
understand it, verify the computational results, build on them, etc.
• Opinion: you should always do this in any case

But it’s still not ideal for communication of scientific results:
• Doesn’t necessarily encode biological semantics of the model
• What if they don’t have access to the same software?
• What if they don’t want to use that software?
• What if they want to use a different conceptual framework?
• And how will people be able to relate the model to other work?

Different tools, different interfaces & languages

SBML: a lingua franca for software

SBML = Systems Biology Markup Language
Format for representing computational models of biological processes
• Data structures + usage principles + serialization to XML
• (Mostly) declarative, not procedural; not a scripting language
Neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
Important: software reads/writes SBML, not humans
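To make "data structures serialized to XML" concrete, here is a minimal sketch of what a tool does when it reads SBML. It uses only Python's standard XML parser rather than a real SBML library, so it performs no validation. The toy model and all of its ids (toy_model, cell, A, B, conversion) are invented for illustration, and the element and attribute names are written from memory of SBML Level 3 Version 1 Core, so treat them as assumptions to check against the specification.

```python
import xml.etree.ElementTree as ET

# XML namespace of SBML Level 3 Version 1 Core documents.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Hand-written toy model: one compartment, two species, one reaction A -> B.
# Every id below is invented for illustration.
TOY_SBML = f"""<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10"
               hasOnlySubstanceUnits="false" boundaryCondition="false"
               constant="false"/>
      <species id="B" compartment="cell" initialAmount="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false"
               constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="conversion" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize(sbml_text):
    """Return (species ids grouped by compartment, list of reactions).

    Each reaction is a tuple (id, reactant species ids, product species ids).
    """
    ns = {"s": SBML_NS}
    model = ET.fromstring(sbml_text).find("s:model", ns)

    species_by_compartment = {}
    for sp in model.findall("s:listOfSpecies/s:species", ns):
        species_by_compartment.setdefault(sp.get("compartment"), []).append(sp.get("id"))

    reactions = []
    for rx in model.findall("s:listOfReactions/s:reaction", ns):
        reactants = [r.get("species")
                     for r in rx.findall("s:listOfReactants/s:speciesReference", ns)]
        products = [p.get("species")
                    for p in rx.findall("s:listOfProducts/s:speciesReference", ns)]
        reactions.append((rx.get("id"), reactants, products))
    return species_by_compartment, reactions


species, reactions = summarize(TOY_SBML)
print(species)
print(reactions)
```

The model says what exists (compartments, species, reactions) but not how to simulate it, which is the declarative point made above.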

Core SBML concepts are fairly simple
The process is central
• Literally called a “reaction” in SBML
• Participants are pools of entities (biochemical species)
Models can further include:
• Compartments
• Other constants & variables
• Discontinuous events
• Other, explicit math
• Unit definitions
• Annotations

Some basics of SBML core model encoding
Well-stirred compartments
[Figure: two compartments, labeled c and n]

Species pools are located in compartments
[Figure: compartments c and n containing the species pools protein A, protein B, gene, mRNAn, and mRNAc]

Reactions can involve any species anywhere

Reactions can cross compartment boundaries

Reaction/process rates can be (almost) arbitrary formulas
[Figure: the same diagram, with rate formulas f1(x) to f5(x) attached to the reactions]
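One way to read "rates can be (almost) arbitrary formulas" is sketched below: each reaction carries its own rate function, and the change in every species is the sum of stoichiometry times rate over the reactions that touch it. The species names come from the diagram; the three reactions, their rate formulas, and the constants are invented for illustration, and the fixed-step Euler integrator stands in for whatever solver a real tool would use.

```python
# Each reaction: (reactants, products, rate function of the current state).
# Reactants and products map species name -> stoichiometry.
# All rate formulas and constants here are invented for illustration.
REACTIONS = [
    ({}, {"mRNAn": 1}, lambda s: 0.5 * s["gene"]),             # transcription
    ({"mRNAn": 1}, {"mRNAc": 1}, lambda s: 0.2 * s["mRNAn"]),  # export from n to c
    ({"mRNAc": 1}, {}, lambda s: 0.1 * s["mRNAc"]),            # degradation
]


def derivatives(state, reactions=REACTIONS):
    """Rate of change of every species: sum of stoichiometry * reaction rate."""
    change = {name: 0.0 for name in state}
    for reactants, products, rate in reactions:
        v = rate(state)  # any callable works, so the formula can be arbitrary
        for name, n in reactants.items():
            change[name] -= n * v
        for name, n in products.items():
            change[name] += n * v
    return change


def euler(state, dt, steps, reactions=REACTIONS):
    """Integrate with fixed-step forward Euler (a deliberately simple choice)."""
    state = dict(state)
    for _ in range(steps):
        change = derivatives(state, reactions)
        for name in state:
            state[name] += dt * change[name]
    return state


initial = {"gene": 1.0, "mRNAn": 0.0, "mRNAc": 0.0}
print(derivatives(initial))
print(euler(initial, dt=0.01, steps=20000))
```

Because the second reaction consumes a species in one compartment and produces one in another, the same bookkeeping covers reactions that cross compartment boundaries.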

“Rules”: equations expressing relationships in addition to the reaction system
[Figure: the same diagram with rate formulas f1(x) to f5(x), plus rule equations g1(x), g2(x), ...]

“Events”: discontinuous actions triggered by system conditions
Event1: when (...condition...), do (...assignments...)
...
[Figure: the same diagram with rate formulas f1(x) to f5(x) and rule equations g1(x), g2(x), ..., plus the event list]
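The "when (condition), do (assignments)" pattern can be sketched as a check inside a simulation loop. The example below is an assumption-laden toy: a single quantity decays continuously, and an event resets it whenever the trigger condition changes from false to true. The decay model, the threshold, and the reset value are invented, and real simulators locate the trigger time more precisely than this fixed-step loop does.

```python
def simulate_with_event(x0, decay, dt, steps, trigger, assignment):
    """Fixed-step decay of x, with one event applied on each false -> true trigger.

    Returns the final value of x and the step numbers at which the event fired.
    """
    x = x0
    was_true = trigger(x)
    fired_at = []
    for step in range(1, steps + 1):
        x += dt * (-decay * x)            # continuous part: first-order decay
        is_true = trigger(x)
        if is_true and not was_true:      # condition just became true
            x = assignment(x)             # discontinuous jump
            fired_at.append(step)
            is_true = trigger(x)          # re-evaluate after the jump
        was_true = is_true
    return x, fired_at


# Hypothetical event: when x drops below 1, reset x to 10.
final_x, fired = simulate_with_event(
    x0=10.0, decay=1.0, dt=0.01, steps=500,
    trigger=lambda x: x < 1.0,
    assignment=lambda x: 10.0,
)
print(final_x, fired)
```

Firing only on the false-to-true transition keeps the event from re-firing on every step while the condition stays true.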

Annotations: machine-readable semantics and links to other resources
Example annotations attached to parts of the diagram:
• “This event represents ...”
• “This is identified by GO id # ...”
• “This is an enzymatic reaction with EC # ...”
• “This is a transport into the nucleus ...”
• “This compartment represents the nucleus ...”

BioModels Database
http://biomodels.net/biomodels

Contents of BioModels Database
Contents today (database data from 2013):
• 142,000+ pathway models (converted from KEGG)
• 460+ hand-curated quantitative models
• 460+ non-curated quantitative models
[Pie chart of model categories: signal transduction, metabolic process, multicellular organismal process, rhythmic process, cell cycle, homeostatic process, response to stimulus, cell death, localization, others (e.g., developmental process). Slice labels as extracted: 8%, 2%, 3%, 6%, 6%, 7%, 8%, 9%, 24%, 27%]

Find software in the SBML Software Guide

Results of 2011 survey of SBML-compatible software
Out of 81 responses
Question: Which of the following categories best describe your software? (Check all that apply.)
• Simulation software
• Analysis s/w (in addition, or instead of, simulation)
• Creation/model development software
• Visualization/display/formatting software
• Utility software (e.g., format conversion)
• Data integration and management software
• Repository or database
• Framework or library (for use in developing s/w)
• S/w for interactive env. (e.g., MATLAB, R, ...)
• Annotation software
[Bar chart, axis 0 to 80. Bar values as extracted: 11, 13, 13, 14, 16, 23, 31, 31, 40, 42]

Some particularly full-featured, general simulation tools
COPASI: ODE & stochastic simulation, parameter scanning, plotting
Virtual Cell: web-based environment, spatial models
iBioSim: special features for genetic circuit models for synthetic biology
SBW (Systems Biology Workbench): component-based toolkit
SBMLsimulator: Java-based simulator, web-start or stand-alone
CellDesigner: graphical editing, SBGN support, SABIO-RK integration

Free software libraries – libSBML
Reads, writes, validates SBML
Can check & convert units
Written in portable C++
Runs on Linux, Mac, Windows
APIs for C, C++, C#, Java, Octave,
Perl, Python, R, Ruby, MATLAB
Well documented API
Open-source (LGPL)
http://sbml.org/Software/libSBML

Evolution of SBML continues
Today: SBML Level 3
• Level 3 Core provides framework for common models
• Level 3 packages add additional constructs to the Core

Level 3 package                  What it enables                                    Status
Hierarchical model composition   Models containing submodels                        ✔
Flux balance constraints         Constraint-based models                            ✔
Qualitative models               Petri net models, Boolean models                   ✔
Graph layout                     Diagrams of models                                 ✔
Multicomponent/state species     Entities w/ structure; also rule-based models      draft
Spatial                          Nonhomogeneous spatial models                      draft
Graph rendering                  Diagrams of models                                 draft
Groups                           Arbitrary grouping of components                   draft
Distributions                    Numerical values as statistical distributions      in dev
Arrays & sets                    Arrays or sets of entities                         in dev
Dynamic structures               Creation & destruction of components               in dev
Annotations                      Richer annotation syntax

SBML funding sources over the past 13+ years
National Institute of General Medical Sciences (USA)
European Molecular Biology Laboratory (EMBL)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)

Modelers want to use their own conventions

[Figure callouts: no standard identifiers; low info content]

Raw models alone are insufficient
Need standard schemes for machine-readable annotations
• Identify entities
• Mathematical semantics
• Links to other data resources
• Authorship & pub. info

MIRIAM (Minimum Information Requested In the Annotation of Models)
Addresses 2 general areas of annotation needs:
• Requirements for reference correspondence
• Scheme for encoding annotations
- Annotations for attributing model creators & sources
- Annotations for referring to external data resources
MIRIAM is not specific to SBML

Example of a problem that can be solved with annotations
[Figure: salicylic acid in ChEBI, http://www.ebi.ac.uk/chebi]
Known by different names: do you want to write all of them into your model?

MIRIAM annotations for external references
Goal: link model constituents to corresponding entities in
bioinformatics resources (e.g., databases, controlled vocabularies)
• Supports:
- Precise identification of model constituents
- Discovery of models that concern the same thing
- Comparison of model constituents between different models
MIRIAM approach avoids putting data content directly in the model
• Instead, it points at external resources that contain the data

How do we create globally unique identifiers consistently?
Long story short: developed by the Le Novère group at the EBI
• Resource identifiers (URIs) combine 2 parts:
- namespace: identifies a dataset
- entity identifier: identifies a datum within the dataset
• There’s a registry for namespaces: MIRIAM Registry
- Allows people & software to use same namespace identifiers
• There’s a URI resolution service: MIRIAM Resources & identifiers.org
- Allows people & software to take a given identifier and figure out what it points to
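The two-part structure can be sketched in a few lines. This is only an illustration of composing and splitting such URIs, not the resolution service itself. The helper names make_uri and split_uri are hypothetical, the "chebi" namespace string and the base-URL layout are assumptions about how identifiers.org URIs are formed, and CHEBI:16914 is used on the assumption that it is the ChEBI entry for salicylic acid (worth checking against ChEBI before reuse).

```python
BASE = "http://identifiers.org"  # base URL as written on the slides


def make_uri(namespace, entity_id, base=BASE):
    """Combine a namespace (the dataset) and an entity identifier (the datum)."""
    return f"{base}/{namespace}/{entity_id}"


def split_uri(uri, base=BASE):
    """Recover (namespace, entity identifier) from a URI built by make_uri."""
    prefix = base + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not under {base}: {uri}")
    namespace, _, entity_id = uri[len(prefix):].partition("/")
    if not namespace or not entity_id:
        raise ValueError(f"missing namespace or entity identifier: {uri}")
    return namespace, entity_id


# Assumed example: one stable reference instead of many synonyms in the model.
uri = make_uri("chebi", "CHEBI:16914")
print(uri)
print(split_uri(uri))
```

A model that stores this one reference does not need to carry every synonym of the compound, which is the point of annotating rather than embedding data.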

Another problem: software can’t read figure legends
BIOMD0000000319 in BioModels Database
Decroly & Goldbeter, PNAS, 1982

SED-ML = Simulation Experiment Description ML
Application-independent format
• Captures procedures, algorithms, parameter values
Can be used for
• Simulation experiments encoding parametrizations & perturbations
• Simulations using more than one model and/or method
• Data manipulations to produce plot(s)
http://sedml.org
[Diagram components: Model, Simulation, Task, Data generators, Reports]
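The five components can be pictured as records that refer to one another by id. The sketch below is not SED-ML syntax and not any library's API: the class and field names are invented, and it only illustrates the idea that a task ties a model to a simulation, data generators draw on tasks, and reports collect data generators. The model source string reuses the BioModels id from the previous slide purely as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    id: str
    source: str          # where the model comes from (placeholder string here)


@dataclass
class Simulation:
    id: str
    algorithm: str       # free-text label in this sketch
    start: float
    end: float
    points: int


@dataclass
class Task:
    id: str
    model: str           # id of a Model
    simulation: str      # id of a Simulation


@dataclass
class DataGenerator:
    id: str
    task: str            # id of the Task that produces the values
    variable: str        # name of the quantity to extract


@dataclass
class Report:
    id: str
    columns: list        # ids of DataGenerators


@dataclass
class Experiment:
    models: list = field(default_factory=list)
    simulations: list = field(default_factory=list)
    tasks: list = field(default_factory=list)
    data_generators: list = field(default_factory=list)
    reports: list = field(default_factory=list)

    def dangling_references(self):
        """List references that point at ids not defined in this experiment."""
        model_ids = {m.id for m in self.models}
        sim_ids = {s.id for s in self.simulations}
        task_ids = {t.id for t in self.tasks}
        gen_ids = {g.id for g in self.data_generators}
        problems = []
        for t in self.tasks:
            if t.model not in model_ids:
                problems.append(f"task {t.id} -> model {t.model}")
            if t.simulation not in sim_ids:
                problems.append(f"task {t.id} -> simulation {t.simulation}")
        for g in self.data_generators:
            if g.task not in task_ids:
                problems.append(f"data generator {g.id} -> task {g.task}")
        for r in self.reports:
            for column in r.columns:
                if column not in gen_ids:
                    problems.append(f"report {r.id} -> data generator {column}")
        return problems


experiment = Experiment(
    models=[Model("m1", "BIOMD0000000319")],
    simulations=[Simulation("sim1", "ode", 0.0, 100.0, 1000)],
    tasks=[Task("t1", "m1", "sim1")],
    data_generators=[DataGenerator("dg_time", "t1", "time"),
                     DataGenerator("dg_x", "t1", "x")],
    reports=[Report("r1", ["dg_time", "dg_x"])],
)
print(experiment.dangling_references())
```

Keeping the experiment description separate from the model is what lets the same model be reused under different simulation settings.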

Efforts like SED-ML improve reproducibility of publications
Waltemath et al.,
BMC Sys Bio 5, 2011.

Need interoperable formats, but developing them is not easy
Need people with a diverse set of knowledge & skills
• Scientific needs
• Technical implementation skills
• Practical experience
Need to manage multiple phases of a standardization effort
• Creation
• Evolution
• Support
This is just for the specification of the standards, to say nothing of the necessary software and other infrastructure!

Motivations for the creation of COMBINE
Realizations about the state of affairs in the late 2000s
• Many standardization efforts overlapped, but lacked coordination
• Efforts were inventing their own processes from scratch
• Many individual meetings meant more travel for many people
• Limited and fragile funding didn’t support solid, coherent base
COMBINE = Computational Modeling in Biology Network
• Coordinate standards development
• Develop common procedures & tools (but not impose them!)
• Coordinate meetings
• Provide a recognized voice

Standardization efforts represented in COMBINE today
[Figure: logos grouped as COMBINE Standards, Associated Standardization Efforts, and Related Standardization Efforts; legible labels include BioPAX, Qualifiers, GPML]

COMBINE formats cover many types of models
– from Nicolas Le Novère

Examples of community organization
Two main annual meetings, plus ad hoc workshops
• COMBINE meeting: status updates, presentations, outreach
- Next COMBINE: Paris, Sep 16–20, 2013
• HARMONY: Hackathon on Resources for Modeling in Biology
- Software development, interoperability hacking
[Photos: COMBINE 2012, Toronto; COMBINE 2011, Heidelberg]

COMBINE is open to all—and COMBINE needs you!
http://co.mbine.org
Current coordinators:
• Nicolas Le Novère, Mike Hucka, Falk Schreiber, Gary Bader

Some things we (maybe?) got right with SBML
Time it well
• Too early and too late are bad
Start with actual stakeholders
• Address real needs, not perceived ones
Start with small team of dedicated developers
• Can work faster, more focused; also avoids “designed-by-committee”
Engage people constantly, in many ways
• Electronic forums, email, electronic voting, surveys, hackathons
Make the results free and open-source
• Makes people comfortable knowing it will always be available
Be creative about seeking funding

Some things we certainly got wrong
Not waiting for implementations before freezing specifications
• Sometimes finalized specification before implementations tested it
- Especially bad when we failed to do a good job
‣ E.g., “forward thinking” features, or “elegant” designs
Not formalizing the development process sufficiently
• Especially early in the history, did not have a very open process
Not resolving intellectual property issues from the beginning
• Industrial users ask “who has the right to give any rights to this?”

This work was made possible thanks to a great community
Original PIs: John C. Doyle, Hiroaki Kitano
SBML Team: Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi, Akiya Juraku, Ben Kovitz
SBML Editors: Mike Hucka, Nicolas Le Novère, Sarah Keating, Frank Bergmann, Lucian Smith, Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, Darren Wilkinson
BioModels DB: Nicolas Le Novère, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler, Nico Rodriguez, Marco Donizelli, Viji Chelliah, Mélanie Courtot, Harish Dharuri
[Photo: attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010]
And a huge thanks to many others in the COMBINE community

URLs
SBML http://sbml.org
BioModels Database http://biomodels.net/biomodels
MIRIAM http://biomodels.net/miriam
identifiers.org http://identifiers.org
SED-ML http://biomodels.net/sed-ml
SBO http://biomodels.net/sbo
SBGN http://sbgn.org
COMBINE http://co.mbine.org

I’d like your feedback!
You can use this anonymous form:
http://tinyurl.com/mhuckafeedback
