
Computational Approaches to Systems Biology

Presentation given at the Sydney Computational Biologists meetup on 21 August 2013 (http://australianbioinformatics.net/past-events/2013/8/21/computational-approaches-to-systems-biology.html).


  1. Computational Approaches to Systems Biology
     Michael Hucka, Ph.D.
     Department of Computing + Mathematical Sciences
     California Institute of Technology, Pasadena, CA, USA
     The Kinghorn Cancer Centre, Australia, August 2013
     Email: mhucka@caltech.edu
     Twitter: @mhucka
  2. Outline
     • Background and introduction
     • The Systems Biology Markup Language (SBML)
     • Complementary efforts: MIRIAM and SED-ML
     • COMBINE: the Computational Modeling in Biology Network
     • Conclusion
  4. Research today: experimentation, computation, cogitation
  5. “The nature of systems biology.” Bruggeman & Westerhoff, Trends Microbiol. 15 (2007).
  6. Large-scale integrative models are growing
  7. Many models have traditionally been published this way
     Problems:
     • Errors in printing
     • Missing information
     • Dependencies on implementation
     • Outright errors
     • Can be a huge effort to recreate
     Is it enough to communicate the model in a paper?
  8. Is it enough to make your (software X) code available?
     It’s vital for good science:
     • Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc.
     • Opinion: you should always do this in any case
  9. Is it enough to make your (software X) code available?
     It’s vital for good science:
     • Someone with access to the same software can try to run it, understand it, build on it, etc.
     • Opinion: you should always do this in any case
     But it’s still not ideal for communication of scientific results:
     • Doesn’t necessarily encode biological semantics of the model
     • What if they don’t have access to the same software?
     • What if they don’t want to use that software?
     • What if they want to use a different conceptual framework?
     • And how will people be able to relate the model to other work?
  10. Different tools, different interfaces & languages
  11. Outline
      • Background and introduction
      • The Systems Biology Markup Language (SBML)
      • Complementary efforts: MIRIAM and SED-ML
      • COMBINE: the Computational Modeling in Biology Network
      • Conclusion
  12. SBML: a lingua franca for software
  13. SBML = Systems Biology Markup Language
      Format for representing computational models of biological processes
      • Data structures + usage principles + serialization to XML
      • (Mostly) declarative, not procedural; not a scripting language
      Neutral with respect to modeling framework
      • E.g., ODE, stochastic systems, etc.
      Important: software reads/writes SBML, not humans
  14. The raw SBML (as XML)
  15. Core SBML concepts are fairly simple
      The process is central
      • Literally called a “reaction” in SBML
      • Participants are pools of entities (biochemical species)
      Models can further include:
      • Compartments
      • Other constants & variables
      • Discontinuous events
      • Other, explicit math
      • Unit definitions
      • Annotations
  16. Some basics of SBML core model encoding
      Well-stirred compartments
      [Diagram: compartments c and n]
  17. Species pools are located in compartments
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc]
  18. Reactions can involve any species anywhere
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc]
  19. Reactions can cross compartment boundaries
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc]
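The encoding basics on slides 16 to 19 can be sketched with Python's standard library. The SBML fragment below is hand-written and simplified for illustration (it has not been validated against the specification); the model id `toy_export`, the reaction id `export`, and the helper functions are assumptions, and only the compartment and species names come from the slides.

```python
import xml.etree.ElementTree as ET

# SBML Level 3 Version 1 Core namespace.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
NS = {"sbml": SBML_NS}

# Hand-written, simplified fragment: two compartments (n, c), two species
# pools, and one reaction that moves mRNA from the nucleus to the cytoplasm.
SBML_TEXT = f"""<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy_export">
    <listOfCompartments>
      <compartment id="n" size="1" constant="true"/>
      <compartment id="c" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="mRNAn" compartment="n" initialAmount="10"
               hasOnlySubstanceUnits="true" boundaryCondition="false"
               constant="false"/>
      <species id="mRNAc" compartment="c" initialAmount="0"
               hasOnlySubstanceUnits="true" boundaryCondition="false"
               constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="export" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="mRNAn" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="mRNAc" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def species_by_compartment(root):
    """Group species ids by the compartment they are located in."""
    grouped = {}
    for sp in root.findall(".//sbml:species", NS):
        grouped.setdefault(sp.get("compartment"), []).append(sp.get("id"))
    return grouped


def crosses_boundary(root, reaction_id):
    """Return True if the reaction's participants span several compartments."""
    location = {sp.get("id"): sp.get("compartment")
                for sp in root.findall(".//sbml:species", NS)}
    for rxn in root.findall(".//sbml:reaction", NS):
        if rxn.get("id") == reaction_id:
            refs = rxn.findall(".//sbml:speciesReference", NS)
            return len({location[ref.get("species")] for ref in refs}) > 1
    raise KeyError(reaction_id)


root = ET.fromstring(SBML_TEXT)
print(species_by_compartment(root))
print(crosses_boundary(root, "export"))
```

In practice a dedicated library such as libSBML would be used to read and validate real files; plain XML parsing is shown here only to make the structure visible.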
  20. Reaction/process rates can be (almost) arbitrary formulas
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc; reactions labeled with rate formulas f1(x), f2(x), f3(x), f4(x), f5(x)]
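One way to read slide 20: each reaction carries a rate formula, and a simulator decides how to interpret it. The sketch below assumes an ODE interpretation with fixed-step Euler integration; the two rate laws, their parameter values, and all function names are hypothetical and not taken from the talk.

```python
import math


def mass_action(state, k=0.5):
    """Hypothetical rate law f(x) = k * mRNAn."""
    return k * state["mRNAn"]


def saturating(state, vmax=2.0, km=1.0):
    """Hypothetical Michaelis-Menten-style rate law, to show formulas can vary."""
    return vmax * state["mRNAn"] / (km + state["mRNAn"])


def simulate(state, rate_law, dt=0.001, t_end=2.0):
    """Integrate the export reaction mRNAn -> mRNAc with explicit Euler steps."""
    state = dict(state)
    for _ in range(round(t_end / dt)):
        flux = rate_law(state) * dt
        state["mRNAn"] -= flux   # reactant pool shrinks
        state["mRNAc"] += flux   # product pool grows by the same amount
    return state


initial = {"mRNAn": 10.0, "mRNAc": 0.0}
final = simulate(initial, mass_action)
print(final)
print(simulate(initial, saturating))
```

Swapping `mass_action` for `saturating` changes the dynamics without touching the integrator, which is the separation between model description and simulation method that the slides argue for.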
  21. “Rules”: equations expressing relationships in addition to the reaction system
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc; reaction rate formulas f1(x) to f5(x); rules g1(x), g2(x), ...]
  22. “Events”: discontinuous actions triggered by system conditions
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc; reaction rate formulas f1(x) to f5(x); rules g1(x), g2(x), ...]
      Event1: when (...condition...), do (...assignments...)
      Event2: when (...condition...), do (...assignments...)
      ...
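The "when (condition), do (assignments)" pattern on slide 22 can be sketched as follows. In SBML an event fires when its trigger condition changes from false to true, so the loop tracks the previous trigger value. The decaying variable, threshold, and reset value are assumptions chosen only to make the behavior visible.

```python
def run_with_event(x0, decay, threshold, reset_value, dt, steps):
    """Decay x continuously; when x drops below threshold, reset it.

    The event fires only on a false -> true transition of the trigger,
    not on every step where the condition happens to hold.
    """
    x = x0
    trigger_was_true = x < threshold
    fired_at = []
    for step in range(1, steps + 1):
        x -= decay * x * dt                   # continuous part (Euler step)
        trigger_is_true = x < threshold       # evaluate the event condition
        if trigger_is_true and not trigger_was_true:
            x = reset_value                   # the event's assignment
            fired_at.append(step)
            trigger_is_true = x < threshold   # re-evaluate after the assignment
        trigger_was_true = trigger_is_true
    return x, fired_at


x_final, fired = run_with_event(
    x0=10.0, decay=1.0, threshold=5.0, reset_value=10.0, dt=0.01, steps=200
)
print(x_final, fired)
```

A real simulator would also locate the trigger time between steps and handle delays and priorities; this sketch only checks the condition once per step.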
  23. Annotations: machine-readable semantics and links to other resources
      [Diagram: compartments c and n; species protein A, protein B, gene, mRNAn, mRNAc; reaction rate formulas f1(x) to f5(x); rules g1(x), g2(x), ...; events Event1, Event2, ...]
      Example annotations:
      • “This event represents ...”
      • “This is identified by GO id # ...”
      • “This is an enzymatic reaction with EC # ...”
      • “This is a transport into the nucleus ...”
      • “This compartment represents the nucleus ...”
  24. BioModels Database
      http://biomodels.net/biomodels
  25. Contents of BioModels Database
      Contents today (database data from 2013):
      • 142,000+ pathway models (converted from KEGG)
      • 460+ hand-curated quantitative models
      • 460+ non-curated quantitative models
      [Pie chart of model categories: signal transduction, metabolic process, multicellular organismal process, rhythmic process, cell cycle, homeostatic process, response to stimulus, cell death, localization, others (e.g., developmental process); slice labels as extracted: 8%, 2%, 3%, 6%, 6%, 7%, 8%, 9%, 24%, 27%]
  26. Find software in the SBML Software Guide
  27. Find software in the SBML Software Guide
      [Screenshot: “Find SBML software”]
  28. Results of 2011 survey of SBML-compatible software (out of 81 responses)
      Question: Which of the following categories best describe your software? (Check all that apply.)
      • Simulation software
      • Analysis s/w (in addition to, or instead of, simulation)
      • Creation/model development software
      • Visualization/display/formatting software
      • Utility software (e.g., format conversion)
      • Data integration and management software
      • Repository or database
      • Framework or library (for use in developing s/w)
      • S/w for interactive env. (e.g., MATLAB, R, ...)
      • Annotation software
      [Bar chart; axis from 0 to 80 responses; bar values as extracted: 11, 13, 13, 14, 16, 23, 31, 31, 40, 42]
  29. Some particularly full-featured, general simulation tools
      • COPASI: ODE & stochastic simulation, parameter scanning, plotting
      • Virtual Cell: web-based environment, spatial models
      • iBioSim: special features for genetic circuit models for synthetic biology
      • SBW (Systems Biology Workbench): component-based toolkit
      • SBMLsimulator: Java-based simulator, web-start or stand-alone
      • CellDesigner: graphical editing, SBGN support, SABIO-RK integration
  30. Free software libraries: libSBML
      http://sbml.org/Software/libSBML
      • Reads, writes, validates SBML
      • Can check & convert units
      • Written in portable C++
      • Runs on Linux, Mac, Windows
      • APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB
      • Well-documented API
      • Open-source (LGPL)
  31. Evolution of SBML continues
      Today: SBML Level 3
      • Level 3 Core provides framework for common models
      • Level 3 packages add additional constructs to the Core
  32. Level 3 package: what it enables (status)
      • Hierarchical model composition: models containing submodels (✔)
      • Flux balance constraints: constraint-based models (✔)
      • Qualitative models: Petri net models, Boolean models (✔)
      • Graph layout: diagrams of models (✔)
      • Multicomponent/state species: entities w/ structure; also rule-based models (draft)
      • Spatial: nonhomogeneous spatial models (draft)
      • Graph rendering: diagrams of models (draft)
      • Groups: arbitrary grouping of components (draft)
      • Distributions: numerical values as statistical distributions (in dev)
      • Arrays & sets: arrays or sets of entities (in dev)
      • Dynamic structures: creation & destruction of components (in dev)
      • Annotations: richer annotation syntax
  33. SBML funding sources over the past 13+ years
      • National Institute of General Medical Sciences (USA)
      • European Molecular Biology Laboratory (EMBL)
      • JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
      • JST ERATO-SORST Program (Japan)
      • ELIXIR (UK)
      • Beckman Institute, Caltech (USA)
      • Keio University (Japan)
      • International Joint Research Program of NEDO (Japan)
      • Japanese Ministry of Agriculture
      • Japanese Ministry of Educ., Culture, Sports, Science and Tech.
      • BBSRC (UK)
      • National Science Foundation (USA)
      • DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
      • Air Force Office of Scientific Research (USA)
      • STRI, University of Hertfordshire (UK)
      • Molecular Sciences Institute (USA)
  34. Outline
      • Background and introduction
      • The Systems Biology Markup Language (SBML)
      • Complementary efforts: MIRIAM and SED-ML
      • COMBINE: the Computational Modeling in Biology Network
      • Conclusion
  35. Modelers want to use their own conventions
  36. Modelers want to use their own conventions
      • No standard identifiers
  37. Modelers want to use their own conventions
      • Low info content
      • No standard identifiers
  38. Modelers want to use their own conventions
      • Low info content
      • No standard identifiers
      Raw models alone are insufficient
      Need standard schemes for machine-readable annotations
      • Identify entities
      • Mathematical semantics
      • Links to other data resources
      • Authorship & pub. info
  39. MIRIAM (Minimum Information Requested In the Annotation of Models)
      MIRIAM is not specific to SBML
      Addresses 2 general areas of annotation needs:
      • Requirements for reference correspondence
      • Scheme for encoding annotations
        - Annotations for attributing model creators & sources
        - Annotations for referring to external data resources
  40. MIRIAM (Minimum Information Requested In the Annotation of Models)
      MIRIAM is not specific to SBML
      Addresses 2 general areas of annotation needs:
      • Requirements for reference correspondence
      • Scheme for encoding annotations
        - Annotations for attributing model creators & sources
        - Annotations for referring to external data resources (highlighted)
  41. Example of a problem that can be solved with annotations
      http://www.ebi.ac.uk/chebi
      [Figure label: Low info content]
  42. Example of a problem that can be solved with annotations
      http://www.ebi.ac.uk/chebi
      [Figure label: Low info content]
      Example: salicylic acid
      Known by different names: do you want to write all of them into your model?
  43. MIRIAM annotations for external references
      Goal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies)
      • Supports:
        - Precise identification of model constituents
        - Discovery of models that concern the same thing
        - Comparison of model constituents between different models
      MIRIAM approach avoids putting data content directly in the model
      • Instead, it points at external resources that contain the data
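The comparison use case on slide 43 can be illustrated with a small sketch: two models name the same chemical differently, but share an annotation that points at the same external entry. The model contents and the helper function are hypothetical. The ChEBI identifiers are given as commonly cited (CHEBI:16914 for salicylic acid, CHEBI:15422 for ATP, CHEBI:17234 for glucose) and should be verified against ChEBI before reuse.

```python
# Each hypothetical model maps its own local species name to annotation URIs.
MODEL_A = {
    "SA": {"http://identifiers.org/chebi/CHEBI:16914"},       # salicylic acid
    "glc": {"http://identifiers.org/chebi/CHEBI:17234"},      # glucose
}
MODEL_B = {
    "salicylate_pool": {"http://identifiers.org/chebi/CHEBI:16914"},
    "ATP": {"http://identifiers.org/chebi/CHEBI:15422"},
}


def shared_entities(model_a, model_b):
    """Pair up species from two models that carry the same annotation URI."""
    # Index model B by URI so that each lookup is a dictionary access.
    by_uri = {}
    for name, uris in model_b.items():
        for uri in uris:
            by_uri.setdefault(uri, []).append(name)

    matches = []
    for name, uris in model_a.items():
        for uri in sorted(uris):
            for other in by_uri.get(uri, []):
                matches.append((name, other, uri))
    return matches


for a_name, b_name, uri in shared_entities(MODEL_A, MODEL_B):
    print(f"{a_name} (model A) and {b_name} (model B) both refer to {uri}")
```

Neither model has to list every synonym of the compound; the shared reference carries the identity.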
  44. How do we create globally unique identifiers consistently?
      Long story short: developed by the Le Novère group at the EBI
      • Resource identifiers (URIs) combine 2 parts:
        - namespace: identifies a dataset
        - entity identifier: identifies a datum within the dataset
      • There’s a registry for namespaces: MIRIAM Registry
        - Allows people & software to use same namespace identifiers
      • There’s a URI resolution service: MIRIAM Resources & identifiers.org
        - Allows people & software to take a given identifier and figure out what it points to
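The two-part structure described on slide 44 can be sketched as a pair of helper functions. The `http://identifiers.org/<namespace>/<entity identifier>` layout is assumed here as the URI form; the class and function names are hypothetical, and the example entry (ChEBI, CHEBI:16914) is illustrative.

```python
from typing import NamedTuple

BASE = "http://identifiers.org/"  # assumed resolver prefix


class ResourceRef(NamedTuple):
    namespace: str   # identifies a dataset, e.g. "chebi"
    entity_id: str   # identifies a datum within that dataset


def to_uri(ref: ResourceRef) -> str:
    """Combine namespace and entity identifier into one URI."""
    return f"{BASE}{ref.namespace}/{ref.entity_id}"


def from_uri(uri: str) -> ResourceRef:
    """Split a URI back into its namespace and entity identifier."""
    if not uri.startswith(BASE):
        raise ValueError(f"not an identifiers.org URI: {uri}")
    namespace, _, entity_id = uri[len(BASE):].partition("/")
    if not namespace or not entity_id:
        raise ValueError(f"missing namespace or entity identifier: {uri}")
    return ResourceRef(namespace, entity_id)


ref = ResourceRef("chebi", "CHEBI:16914")
uri = to_uri(ref)
print(uri)
print(from_uri(uri))
```

Keeping the namespace separate from the entity identifier is what lets a registry vouch for the first part while each database remains responsible for the second.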
  45. Another problem: software can’t read figure legends
      [Figure: BIOMD0000000319 in BioModels Database; Decroly & Goldbeter, PNAS, 1982]
  46. SED-ML = Simulation Experiment Description ML
      http://sedml.org
      Application-independent format
      • Captures procedures, algorithms, parameter values
      Can be used for
      • Simulation experiments encoding parametrizations & perturbations
      • Simulations using more than one model and/or method
      • Data manipulations to produce plot(s)
      [Diagram: Model, Simulation, Task, Data generators, Reports]
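The five building blocks shown on slide 46 can be sketched as plain data classes that refer to each other by id. This is a loose illustration of the idea, not the SED-ML schema: all class fields, the model source string, and the algorithm name are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    id: str
    source: str          # where to fetch the model (hypothetical value below)


@dataclass
class Simulation:
    id: str
    algorithm: str       # free-text algorithm name in this sketch
    start: float
    end: float
    points: int


@dataclass
class Task:
    id: str
    model_ref: str       # which model to run
    simulation_ref: str  # which simulation settings to apply


@dataclass
class DataGenerator:
    id: str
    task_ref: str        # which task produces the raw values
    variable: str        # which model variable to extract


@dataclass
class Report:
    id: str
    data_refs: List[str]  # data generators to include in the output


@dataclass
class Experiment:
    models: List[Model] = field(default_factory=list)
    simulations: List[Simulation] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)
    data_generators: List[DataGenerator] = field(default_factory=list)
    reports: List[Report] = field(default_factory=list)

    def dangling_references(self) -> List[str]:
        """List every reference that points at an id that is not defined."""
        model_ids = {m.id for m in self.models}
        sim_ids = {s.id for s in self.simulations}
        task_ids = {t.id for t in self.tasks}
        gen_ids = {g.id for g in self.data_generators}
        problems = []
        for t in self.tasks:
            if t.model_ref not in model_ids:
                problems.append(f"task {t.id}: unknown model {t.model_ref}")
            if t.simulation_ref not in sim_ids:
                problems.append(f"task {t.id}: unknown simulation {t.simulation_ref}")
        for g in self.data_generators:
            if g.task_ref not in task_ids:
                problems.append(f"data generator {g.id}: unknown task {g.task_ref}")
        for r in self.reports:
            for ref in r.data_refs:
                if ref not in gen_ids:
                    problems.append(f"report {r.id}: unknown data generator {ref}")
        return problems


experiment = Experiment(
    models=[Model("m1", "BIOMD0000000319")],
    simulations=[Simulation("s1", "deterministic ODE", 0.0, 100.0, 1000)],
    tasks=[Task("t1", "m1", "s1")],
    data_generators=[DataGenerator("d1", "t1", "time")],
    reports=[Report("r1", ["d1"])],
)
print(experiment.dangling_references())
```

Separating the task from both the model and the simulation settings is what allows one experiment to combine more than one model or method, as the slide notes.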
  47. Efforts like SED-ML improve reproducibility of publications
      Waltemath et al., BMC Sys Bio 5, 2011.
  48. Outline
      • Background and introduction
      • The Systems Biology Markup Language (SBML)
      • Complementary efforts: MIRIAM and SED-ML
      • COMBINE: the Computational Modeling in Biology Network
      • Conclusion
  49. Need interoperable formats, but developing them is not easy
      Need people with a diverse set of knowledge & skills
      • Scientific needs
      • Technical implementation skills
      • Practical experience
      Need to manage multiple phases of a standardization effort
      • Creation
      • Evolution
      • Support
  50. Need interoperable formats, but developing them is not easy
      Need people with a diverse set of knowledge & skills
      • Scientific needs
      • Technical implementation skills
      • Practical experience
      Need to manage multiple phases of a standardization effort
      • Creation
      • Evolution
      • Support
      This is just for the specification of the standards, to say nothing of the necessary software and other infrastructure!
  51. Motivations for the creation of COMBINE
      Realizations about the state of affairs in the late 2000s
      • Many standardization efforts overlapped, but lacked coordination
      • Efforts were inventing their own processes from scratch
      • Many individual meetings meant more travel for many people
      • Limited and fragile funding didn’t support a solid, coherent base
      COMBINE = Computational Modeling in Biology Network
      • Coordinate standards development
      • Develop common procedures & tools (but not impose them!)
      • Coordinate meetings
      • Provide a recognized voice
  52. Standardization efforts represented in COMBINE today
      [Logo chart grouped into: COMBINE Standards; Associated Standardization Efforts; Related Standardization Efforts. Text recovered from logos: BioPAX, Qualifiers, GPML]
  53. COMBINE formats cover many types of models
      [Figure from Nicolas Le Novère]
  54. Examples of community organization
      Two main annual meetings, plus ad hoc workshops
      • COMBINE meeting: status updates, presentations, outreach
        - Next COMBINE: Paris, Sep 16–20, 2013
      • HARMONY: Hackathon on Resources for Modeling in Biology
        - Software development, interoperability hacking
      [Photos: COMBINE 2012, Toronto; COMBINE 2011, Heidelberg]
  55. COMBINE is open to all, and COMBINE needs you!
      http://co.mbine.org
      Current coordinators:
      • Nicolas Le Novère, Mike Hucka, Falk Schreiber, Gary Bader
  56. Outline
      • Background and introduction
      • The Systems Biology Markup Language (SBML)
      • Complementary efforts: MIRIAM and SED-ML
      • COMBINE: the Computational Modeling in Biology Network
      • Conclusion
  57. Some things we (maybe?) got right with SBML
      Time it well
      • Too early and too late are bad
      Start with actual stakeholders
      • Address real needs, not perceived ones
      Start with small team of dedicated developers
      • Can work faster, more focused; also avoids “designed-by-committee”
      Engage people constantly, in many ways
      • Electronic forums, email, electronic voting, surveys, hackathons
      Make the results free and open-source
      • Makes people comfortable knowing it will always be available
      Be creative about seeking funding
  58. Some things we certainly got wrong
      Not waiting for implementations before freezing specifications
      • Sometimes finalized specification before implementations tested it
        - Especially bad when we failed to do a good job
          ‣ E.g., “forward thinking” features, or “elegant” designs
      Not formalizing the development process sufficiently
      • Especially early in the history, did not have a very open process
      Not resolving intellectual property issues from the beginning
      • Industrial users ask “who has the right to give any rights to this?”
  59. This work was made possible thanks to a great community
      Original PIs: John C. Doyle, Hiroaki Kitano
      SBML Team: Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi, Akiya Juraku, Ben Kovitz
      SBML Editors: Mike Hucka, Nicolas Le Novère, Sarah Keating, Frank Bergmann, Lucian Smith, Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, Darren Wilkinson
      BioModels DB: Nicolas Le Novère, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler, Nico Rodriguez, Marco Donizelli, Viji Chelliah, Mélanie Courtot, Harish Dharuri
      And a huge thanks to many others in the COMBINE community
      [Photo: Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010]
  60. URLs
      • SBML: http://sbml.org
      • BioModels Database: http://biomodels.net/biomodels
      • MIRIAM: http://biomodels.net/miriam
      • identifiers.org: http://identifiers.org
      • SED-ML: http://biomodels.net/sed-ml
      • SBO: http://biomodels.net/sbo
      • SBGN: http://sbgn.org
      • COMBINE: http://co.mbine.org
  61. I’d like your feedback!
      You can use this anonymous form: http://tinyurl.com/mhuckafeedback
