SBML and related resources 
and standardization efforts


Slides from presentation given on November 21, 2011, at the 4th Global COE International Symposium on Physiome and Systems Biology for Integrated Life Sciences and Predictive Medicine, in Osaka, Japan.

Published in: Technology, Health & Medicine

  1. 1. SBML and related resources and standardization efforts
        Michael Hucka, Member of the Professional Staff, Computing + Mathematical Sciences, California Institute of Technology
  2. 2. Research today: experimentation, modeling, cogitation
  3. 3.
  4. 4. Different tools, different interfaces & languages
  5. 5. SBML
  6. 6. SBML = Systems Biology Markup Language
        Format for representing computational models
          • Data structures + rules for their use + serialization to XML
        Neutral with respect to modeling framework
          • E.g., ODE, stochastic systems, etc.
        A lingua franca for software (not humans)
  7. 7. Basic SBML concepts are fairly simple
        The reaction is central: a process occurring at a given rate
          $n_a A + n_b B \xrightarrow{f([A],[B],[P],\ldots)} n_p P$
          $n_c C \xrightarrow{f(\ldots)} n_d D + n_e E + n_f F$
          ...
          • Participants are pools of entities (species)
        Models can further include:
          • Other constants & variables
          • Unit definitions
          • Compartments
          • Annotations
          • Explicit math
          • Discontinuous events
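As a minimal sketch of these concepts, the Python snippet below parses a small hand-written SBML Level 3 document with the standard library and lists each reaction's reactants and products with their stoichiometries. The model content (the ids `toy`, `cell`, `A`, `B`, `P`, `r1` and all values) is invented for illustration, and a real application would more likely use libSBML than raw XML parsing.

```python
import xml.etree.ElementTree as ET

# SBML Level 3 Version 1 Core namespace.
NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Hypothetical toy model: 2 A + B -> P inside one compartment.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10"
               hasOnlySubstanceUnits="true" boundaryCondition="false" constant="false"/>
      <species id="B" compartment="cell" initialAmount="5"
               hasOnlySubstanceUnits="true" boundaryCondition="false" constant="false"/>
      <species id="P" compartment="cell" initialAmount="0"
               hasOnlySubstanceUnits="true" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="2" constant="true"/>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="P" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize_reactions(sbml_text):
    """Map each reaction id to (reactants, products) as {species id: stoichiometry}."""
    root = ET.fromstring(sbml_text.strip())
    summary = {}
    for rxn in root.iterfind(".//sbml:reaction", NS):

        def side(list_tag):
            path = f"sbml:{list_tag}/sbml:speciesReference"
            return {
                ref.get("species"): float(ref.get("stoichiometry", "1"))
                for ref in rxn.iterfind(path, NS)
            }

        summary[rxn.get("id")] = (side("listOfReactants"), side("listOfProducts"))
    return summary


if __name__ == "__main__":
    for rid, (reactants, products) in summarize_reactions(SBML_TEXT).items():
        print(rid, reactants, "->", products)
```

The rate function (the kinetic law) is left out here to keep the sketch short; in SBML it would be written as a MathML expression attached to the reaction.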
  8. 8. Example of a common type of model
        Simulation output
        Tyson et al. (1991) PNAS 88(1):7328–32
  9. 9. Scope of SBML encompasses many types of models
        Signaling pathway models
        Fernandez et al. (2006) DARPP-32 Is a Robust Integrator of Dopamine and Glutamate Signals. PLoS Computational Biology. BioModels Database model #BIOMD0000000153
  10. 10. Scope of SBML encompasses many types of models
        Signaling pathway models
        Conductance-based models
          • “Rate rules” for temporal evolution of quantitative parameters
        Hodgkin & Huxley (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiology 117:500–544. BioModels Database model #BIOMD0000000020
  11. 11. Scope of SBML encompasses many types of models
        Signaling pathway models
        Conductance-based models
          • “Rate rules” for temporal evolution of quantitative parameters
        Neural models
          • “Events” for discontinuous changes in quantitative parameters
        Izhikevich EM. (2003) Simple model of spiking neurons. IEEE Trans Neural Net. BioModels Database model #BIOMD0000000127
  12. 12. Scope of SBML encompasses many types of models
        Signaling pathway models
        Conductance-based models
          • “Rate rules” for temporal evolution of quantitative parameters
        Neural models
          • “Events” for discontinuous changes in quantitative parameters
        Pharmacokinetic/dynamics models
          • “Species” is not required to be a biochemical entity
        Tham et al. (2008) A pharmacodynamic model for the time course of tumor shrinkage by gemcitabine + carboplatin in non-small cell lung cancer patients. Clin. Cancer Res. 14. BioModels Database model #BIOMD0000000234
  13. 13. Scope of SBML encompasses many types of models
        Signaling pathway models
        Conductance-based models
          • “Rate rules” for temporal evolution of quantitative parameters
        Neural models
          • “Events” for discontinuous changes in quantitative parameters
        Pharmacokinetic/dynamics models
          • “Species” is not required to be a biochemical entity
        Infectious diseases
        Munz et al. (2009) When zombies attack!: Mathematical modelling of an outbreak of zombie infection. Infectious Disease Modelling Research Progress, eds. Tchuenche et al., p. 133–150. BioModels Database model #MODEL1008060001
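To illustrate how "rate rules" and "events" combine, here is a rough Python sketch of the Izhikevich (2003) spiking-neuron equations integrated with a simple Euler scheme. The two derivative expressions play the role of rate rules, and the reset when `v` reaches 30 plays the role of an event. The parameter values are the regular-spiking defaults from the Izhikevich paper; whether the BioModels entry uses exactly these values, and the choice of time step and input current, are assumptions made for this example.

```python
def simulate_izhikevich(t_end_ms=200.0, dt=0.1, current=10.0,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Euler integration of the Izhikevich model; returns (spike times, final v)."""
    v = c          # membrane potential (mV)
    u = b * c      # recovery variable
    spike_times = []
    for step in range(int(round(t_end_ms / dt))):
        # "Rate rules": continuous evolution of the two variables.
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        # "Event": discontinuous reset once the trigger condition holds.
        if v >= 30.0:
            spike_times.append((step + 1) * dt)
            v = c
            u += d
    return spike_times, v


if __name__ == "__main__":
    spikes, _ = simulate_izhikevich()
    print(f"{len(spikes)} spikes in 200 ms")
```

In SBML terms, `dv` and `du` would be the math of two rate rules on parameters, and the `if` block would be an event with a trigger and two event assignments.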
  14. 14. SBML Level 1                          SBML Level 2                             SBML Level 3
        predefined math functions             user-defined functions                   user-defined functions
        text-string math notation             MathML subset                            MathML subset
        reserved namespaces for annotations   no reserved namespaces for annotations   no reserved namespaces for annotations
        no controlled annotation scheme       RDF-based controlled annotation scheme   RDF-based controlled annotation scheme
        no discrete events                    discrete events                          discrete events
        default values defined                default values defined                   no default values
        monolithic                            monolithic                               modular
  15. 15. SBML Level 3: Supporting more categories of models
        [Diagram: Package W, Package X, Package Y, Package Z on top of SBML Level 3 Core (dependencies)]
        A package adds constructs & capabilities
        Models declare which packages they use
          • Applications tell users which packages they support
        Package development can be decoupled
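A small sketch of what "models declare which packages they use" can look like in practice: in SBML Level 3, a package is declared by an XML namespace on the `<sbml>` element together with a `required` attribute in that namespace. The Python below reads those attributes with the standard library and reports required packages that a (hypothetical) application does not support. The example document and the `missing_required` helper are illustrative, and the package namespace URIs follow the pattern used by the released package specifications, which may differ from the drafts available at the time of the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical document header declaring two packages.
EXAMPLE = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1"
      xmlns:layout="http://www.sbml.org/sbml/level3/version1/layout/version1"
      level="3" version="1"
      comp:required="true" layout:required="false">
  <model id="m"/>
</sbml>
"""


def declared_packages(sbml_text):
    """Return {package namespace URI: required?} from the <sbml> element."""
    root = ET.fromstring(sbml_text.strip())
    packages = {}
    for attr, value in root.attrib.items():
        # ElementTree spells namespaced attributes as "{namespace-uri}localname".
        if attr.startswith("{"):
            ns_uri, _, local_name = attr[1:].partition("}")
            if local_name == "required":
                packages[ns_uri] = (value == "true")
    return packages


def missing_required(packages, supported):
    """List required package URIs that are not in the supported set."""
    return [uri for uri, required in packages.items()
            if required and uri not in supported]


if __name__ == "__main__":
    pkgs = declared_packages(EXAMPLE)
    print(missing_required(pkgs, supported=set()))
```

An application that finds a non-empty list here could warn the user that it cannot interpret the model faithfully, while packages marked `required="false"` can be ignored safely.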
  16. 16. Level 3 packages (slide columns: “Active?” and “Preliminary libSBML 5 plug-in available?”)
        Graph layout ✓
        Groups ✓
        Hierarchical composition ✓ (models composed of submodels)
        Flux balance constraints ✓
        Spatial ✓ (2-D and 3-D spatial geometries and spatial processes)
        Multicomponent species ✓
        Annotations ✓
        Graph rendering ✓
        Distribution & ranges ✓
        Qualitative models ✓
        Dynamic structures
        Arrays & sets
  19. 19. Hierarchical model composition
  20. 20. Goal of supporting model composition is not new
        CellML has always had capability
        Martin Ginkel & Jörg Stelling made proposals mid-2001, 2002
          • Influenced by ProMoT/DIVA
        Jonathan Webb also made a proposal in 2003
          • Context of Bio-SPICE project
        Andrew Finney made alternate proposal in 2003, kept up discussions through 2004
        SBML efforts stalled in ’05–’06 ...
        Lucian Smith & Mike Hucka restarted effort in ’10
        [Background images: Martin Ginkel, “Modular SBML” (Max-Planck-Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany, 5th July 2002), and a poster describing SBML Level 3 extension proposals]
  21. 21. Composition as it is currently envisioned
        Goals:
          • Separate concepts of model definition vs instantiation of the model
            - Can define single model definition & instantiate multiple copies
            - Can create model libraries
          • Selective replacement and/or deletion of entities
          • Optional explicit interfaces (“ports”)
        Latest proposal:
          •
          • Preliminary implementation for libSBML is nearly ready
  22. 22. Scenario #1: Single submodel template instantiated multiple times in the enclosing model
        File “X”
          <sbml>
            Model definition “A”
            <model>
              Submodel “B”: Pointer to def. “A”
              Submodel “C”: Pointer to def. “A”
  23. 23. Scenario #2: Arbitrary nesting: model instantiates another model definition that itself instantiates another model definition
        File “X”
          <sbml>
            Model definition “C”
            Model definition “B”
              Submodel “A”: Pointer to def. “C”
            <model>
              Submodel “Z”: Pointer to def. “B”
  24. 24. Scenario #3: Models in external files
        File “Y”
          <model>
        File “X”
          <sbml>
            External model definition “B”
            <model>
              Submodel “Z”: Pointer to def. “B”
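The separation between model definitions and their instantiations in Scenarios #1 and #2 can be sketched in a few lines of Python. This is only an illustration of the idea, not the data model of the composition proposal: the `ModelDefinition` class, the `flatten` function, the `__` separator, and the species ids are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelDefinition:
    """A reusable template: local species ids plus named submodel instances."""
    species: list = field(default_factory=list)
    submodels: dict = field(default_factory=dict)  # instance id -> definition name


def flatten(name, definitions, prefix=""):
    """Expand one instantiation of `name` into fully qualified species ids.

    Assumes the definitions do not reference each other cyclically.
    """
    definition = definitions[name]
    ids = [prefix + species_id for species_id in definition.species]
    for instance_id, referenced in definition.submodels.items():
        ids.extend(flatten(referenced, definitions, prefix + instance_id + "__"))
    return ids


# Scenario #1: definition "A" instantiated twice, as submodels "B" and "C".
scenario1 = {
    "A": ModelDefinition(species=["S1", "S2"]),
    "main": ModelDefinition(species=["X"], submodels={"B": "A", "C": "A"}),
}

# Scenario #2: "main" instantiates "B", which itself instantiates "C".
scenario2 = {
    "C": ModelDefinition(species=["S"]),
    "B": ModelDefinition(submodels={"A": "C"}),
    "main": ModelDefinition(submodels={"Z": "B"}),
}

if __name__ == "__main__":
    print(flatten("main", scenario1))
    print(flatten("main", scenario2))
```

Scenario #3 would only change where a definition is looked up (another file instead of the same dictionary); the expansion step stays the same.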
  25. 25. Links/references/replacements
        Model “outer”: S1, S2 in Compartment “c”
        Model “inner”: X1, X2 in Compartment “q”
        Implied model: Model “outer” with S1, S2, and X2 (from “inner”) in Compartment “c”
  26. 26. Interfaces/ports
        Model “outer”: S1, S2 in Compartment “c”
        Model “inner”: X1, X2 in Compartment “q”
  27. 27. Spatial geometry
  28. 28. The problem
        Core SBML only supports compartments containing well-stirred mixtures
          • Lacks support for defining geometric shape of compartments
          • Lacks support for nonuniform molecular distributions
          • Lacks support for expressing diffusion processes
        The only way to do it portably in SBML is to fake it
          • E.g., define a large number of small compartments...
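The "fake it" workaround can be made concrete with a short Python sketch: a 1-D region is cut into many small, equal, well-stirred compartments, and diffusion is approximated by first-order transfer between neighbours with rate constant D/h², where h is the compartment width. The function name, the explicit Euler integration, and the numbers in the usage example are choices made for illustration only.

```python
def diffuse_chain(amounts, diffusion_coeff, length, t_end, dt):
    """Approximate 1-D diffusion with a chain of small well-stirred compartments."""
    n = len(amounts)
    h = length / n                    # width of each small compartment
    k = diffusion_coeff / (h * h)     # transfer rate constant between neighbours
    if k * dt > 0.5:
        raise ValueError("time step too large for a stable explicit update")
    state = list(amounts)
    for _ in range(int(round(t_end / dt))):
        # Net flow from compartment i to i+1, like a reversible reaction
        # S_i <-> S_(i+1) with mass-action rate constant k in both directions.
        flows = [k * (state[i] - state[i + 1]) for i in range(n - 1)]
        for i, flow in enumerate(flows):
            state[i] -= dt * flow
            state[i + 1] += dt * flow
    return state


if __name__ == "__main__":
    # All molecules start in the left-most of five compartments.
    print(diffuse_chain([100.0, 0.0, 0.0, 0.0, 0.0],
                        diffusion_coeff=1.0, length=5.0, t_end=200.0, dt=0.1))
```

Even this toy case needs one species and one reaction per compartment boundary, which hints at why a dedicated spatial package is more practical than encoding geometry this way.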
  29. 29. The current proposal
        Main components:
          • Coordinate systems
          • Patches of spatial geometries, called domains
            - Domain = contiguous patch of volumetric space or surface patch
          • Mapping of SBML compartments, species, & parameters to domains
          • Molecular transport mechanisms (e.g., advection, diffusion)
          • Mapping of molecular transport mechanisms to domains
        Developed & implemented by Jim Schaff of the Virtual Cell group
          • (Incomplete) proposal doc at
          • Beta test implementation for libSBML available today
  30. 30. Supports multiple alternatives for defining geometries
        1. Analytic
        2. Sampled field
        3. Constructive solid geometry
        4. Parametric shapes
  31. 31. Additional extensions ...
  32. 32. Where to learn more
  33. 33. Where to learn more: the SBML portal
  34. 34. Where to learn more: the SBML portal
        Find SBML software
  35. 35. Where to find curated, ready-to-run models: BioModels Database
  36. 36. Features of BioModels Database
        Stores & serves quantitative models of biological interest
          • Free, public resource
          • Models must be described in peer-reviewed publication(s)
        All models are curated by hand to reproduce published results
        Imports & exports models in several formats
          • SBML, CellML, SciLab, XPP, BioPAX
        Today: 750+ models
        Developed by Nicolas Le Novère’s group (EBI), funded by EBI & NIH
  37. 37. There’s more to modeling than SBML
  38. 38. [Table: columns Model, Procedures, Results; rows Representation format, Minimal info requirements, Semantics (mathematical and biological annotations). Surviving cell entries: SBRML, ?]
  40. 40. Annotations add semantics and connections
        Annotations can answer questions:
          • “What other identities (synonyms) does this entity have?”
          • “What exactly is the process represented by equation ‘r17’?”
          • “What role does constant ‘k3’ play in equation ‘r17’?”
          • “What organism are we talking about?”
          • ... etc. ...
        Multiple annotations on same entity are common
  41. 41. Le Novère et al., Nature Biotech., 23(12), 2005.
  42. 42. MIRIAM cross-references are simple triples
        Element in the model → relationship qualifier (optional) → Entity elsewhere (e.g., in a database)
  43. 43. Annotations permit inter-database linking
  45. 45. Annotations permit other capabilities
  46. 46. MIRIAM cross-references are simple triples
        Element in the model → relationship qualifier (optional) → Entity elsewhere (e.g., in a database)
        { Data source identifier, Data item identifier, Annotation qualifier }
          • Data source identifier (required). Format: URI chosen from agreed-upon list
          • Data item identifier (required). Format: syntax & value space depends on data type
          • Annotation qualifier (optional). Format: controlled vocabulary term
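The triple structure above maps naturally onto a small record type. The Python sketch below is one possible in-memory representation, not an API from libSBML or the MIRIAM tooling: the class name, field names, and the element id `r17` are illustrative, and splitting `urn:miriam:ec-code:1.1.1` into a data source part and an item part, as well as the qualifier value `is`, are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiriamCrossReference:
    """One cross-reference: data source, data item, optional qualifier."""
    data_source: str                  # required: URI from an agreed-upon list
    data_item: str                    # required: syntax depends on the data source
    qualifier: Optional[str] = None   # optional: controlled-vocabulary term

    def __post_init__(self):
        if not self.data_source or not self.data_item:
            raise ValueError("data source and data item identifiers are required")


# Multiple annotations on the same model element are common, so each
# element id maps to a list of cross-references.
annotations = {
    "r17": [
        MiriamCrossReference("urn:miriam:ec-code", "1.1.1", qualifier="is"),
        MiriamCrossReference("urn:miriam:ec-code", "1.1.1"),  # unqualified
    ],
}

if __name__ == "__main__":
    for element_id, refs in annotations.items():
        for ref in refs:
            print(element_id, ref.qualifier, ref.data_source, ref.data_item)
```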
  47. 47. MIRIAM Registry provides URI dictionary & resolver
        Community-maintained
  49. 49. New development: identifiers.org
        Provides resolvable persistent URIs
          • Unlike URNs, you can type it in a web browser
        Implemented as additional layer on top of MIRIAM Registry
          • Provides persistent URLs to data sources
          • Reference data are kept in MIRIAM Registry
        Example:
          • EC Code entry #
            - MIRIAM URN: urn:miriam:ec-code:1.1.1
            - URI: -code/
        by Nicolas Le Novère, Camille Laibe, Nick Juty @ EBI
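A rough sketch of the URN-to-URL idea, under stated assumptions: the resolver base address is passed in as a parameter rather than hard-coded, and the `base/collection/item` URL layout is an assumption for illustration, since the example URI on the slide is truncated. The function name `urn_to_url` is hypothetical, and percent-encoded characters in the URN are left untouched.

```python
def urn_to_url(urn, resolver_base):
    """Turn 'urn:miriam:<collection>:<item>' into '<resolver_base>/<collection>/<item>'."""
    prefix = "urn:miriam:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split only at the first colon so that items containing colons stay whole.
    collection, separator, item = urn[len(prefix):].partition(":")
    if not separator or not collection or not item:
        raise ValueError(f"incomplete MIRIAM URN: {urn!r}")
    return f"{resolver_base.rstrip('/')}/{collection}/{item}"


if __name__ == "__main__":
    # "http://resolver.example" is a placeholder, not a real service.
    print(urn_to_url("urn:miriam:ec-code:1.1.1", "http://resolver.example"))
```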
  50. 50. [Diagram, concept due to Nicolas Le Novère. Axes: Model representation level (Graphical, Biological, Mathematical), Model type, Model life-cycle. Caption: Other forms of representation]
  51. 51. Graphical representation of models
        Today: broad variation in graphical notation used in biological diagrams
          • Between authors, between journals, even people in same group
        However, standard notations (as used in engineering) would offer benefits:
          • Consistency = easier to read diagrams with less ambiguity
          • Software support: verification of correctness, translation to math
  52. 52. SBGN = Systems Biology Graphical Notation
        Goal: standardize the graphical notation in diagrams of biological processes
          • Community-based development, à la SBML
        Many groups participating
          • Proceeding in “levels”
          • 23 software tools so far
  53. 53. Agencies to thank for supporting SBML & BioModels.net
          • National Institute of General Medical Sciences (USA)
          • European Molecular Biology Laboratory (EMBL)
          • ELIXIR (UK)
          • Beckman Institute, Caltech (USA)
          • Keio University (Japan)
          • JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
          • JST ERATO-SORST Program (Japan)
          • International Joint Research Program of NEDO (Japan)
          • Japanese Ministry of Agriculture
          • Japanese Ministry of Educ., Culture, Sports, Science and Tech.
          • BBSRC (UK)
          • National Science Foundation (USA)
          • DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
          • Air Force Office of Scientific Research (USA)
          • STRI, University of Hertfordshire (UK)
          • Molecular Sciences Institute (USA)
  54. 54. People on SBML Team & BioModels Team
        SBML Team: Michael Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Nicolas Rodriguez, Linda Taddeo, Akiya Joukarou, Akira Funahashi, Kimberley Begley, Bruce Shapiro, Andrew Finney, Ben Bornstein, Ben Kovitz, Hamid Bolouri, Herbert Sauro, Jo Matthews, Maria Schilstra
        BioModels Team: Nicolas Le Novère, Camille Laibe, Nicolas Rodriguez, Nick Juty, Vijayalakshmi Chelliah, Stuart Moodie, Sarah Keating, Maciej Swat, Lukas Endler, Chen Li, Harish Dharuri, Lu Li, Enuo He, Mélanie Courtot, Alexander Broicher, Arnaud Henry, Marco Donizelli
        Visionaries: Hiroaki Kitano, John Doyle
  55. 55. A huge thank you to the community
        Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010
  56. 56. URLs
        SBML  http://sbml.org
        BioModels Database
        MIRIAM
        MIASE
        SED-ML
        SBO
        KiSAO
        TEDDY
        SBRML
        SBGN
  59. 59. SED-ML = Simulation Experiment Description ML
        Application-independent format
        Captures procedures, algorithms, parameter values
          • Steps to go from model to output
        libSedML project developing API library
        <sbml ...>
          ...
          <listOfCompartments>
            <compartment id="cell" size="1e-15" />
          </listOfCompartments>
          <listOfSpecies>
            <species compartment="cell" id="S1" initialAmount="1000" />
            <species compartment="cell" id="S2" initialAmount="0" />
          </listOfSpecies>
          <listOfParameters>
            <parameter id="k" value="0.005" sboTerm="SBO:0000339" />
          </listOfParameters>
          <listOfReactions>
            <reaction id="r1" reversible="false">
              <listOfReactants>
                <speciesReference species="S1" stoichiometry="2" sboTerm="SBO:0000010" />
          ...