# Modelling and Visualising Biological Systems - Falk Schreiber

Two topics will be discussed in this tutorial: (1) constraint-based modelling of metabolic systems using Flux Balance Analysis (FBA) and (2) standardised visual representation of cellular processes and biological networks using the Systems Biology Graphical Notation (SBGN).

### Modelling and Visualising Biological Systems - Falk Schreiber

1. 1. Modelling and Visualising Biological Systems. Falk Schreiber; Institute of Computer Science, Martin Luther University Halle-Wittenberg; Bioinformatics, IPK Gatersleben
2. 2. IPK Gatersleben & MLU Halle-Wittenberg
3. 3. Outline 1. Modelling metabolism - Basics - Constraint-based modelling: FBA - Mathematical representation - Application of constraints - Example - Resources and tools 2. Visualising models and networks - Basics - Standard graphical representation - Process Description Language - Resources and tools
4. 4. Metabolic Modelling • Comprises the reconstruction, simulation, and analysis of metabolic models • Metabolic model: a list of reactions and associated properties, assumed to be present in the system under investigation, along with a description of the environment within which the biological system is assumed to reside • Provides a basis for system-level analysis of metabolism for different organisms. Source: http://www.hydroponicist.com/pages/p69-oxygen-air.htm
5. 5. Methods in Metabolic Modelling (figure axes: model size, model details) • Topology: only network structure • Petri nets: + stoichiometric constraints, + thermodynamics • Flux balance: + mass balance, + capacity constraints • Kinetic: + kinetic rate laws, + kinetic parameters, + metabolite concentrations
6. 6. Flux Balance Analysis • Constraint-based stoichiometric modelling approach to predict and analyse the metabolic steady-state conversion rates (fluxes) • Advantages: no kinetic parameters required; quantitative predictions; applicable to large systems • Applications: prediction of optimal metabolic yields and flux distributions; prediction of phenotype/viability of knockout mutants; prediction of pathway redundancies; …
7. 7. History of FBA
8. 8. Principles of Flux Balance Analysis
9. 9. Simulation: Oxygen level
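10. 10. Reaction Network Formalism (b: exchange fluxes; v: internal fluxes): $R_1: A \xrightarrow{v_1} B$; $R_2: A \xrightarrow{v_2} C$; $R_3: B \xrightarrow{v_3} C$; $R_4: A_{ext} \xrightarrow{b_1} A$; $R_5: C \xrightarrow{b_2} C_{plast}$; $R_6: B \xrightarrow{b_3} B_{plast}$
11. 11. Stoichiometric Matrix for the reactions $R_1: A \xrightarrow{v_1} B$; $R_2: A \xrightarrow{v_2} C$; $R_3: B \xrightarrow{v_3} C$; $R_4: A_{ext} \xrightarrow{b_1} A$; $R_5: C \xrightarrow{b_2} C_{plast}$; $R_6: B \xrightarrow{b_3} B_{plast}$ (rows: metabolites A, B, C; columns: R1 to R6; negative entries mark consumption): $S = \begin{pmatrix} -1 & -1 & 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 & -1 & 0 \end{pmatrix}$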
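The matrix on this slide can be derived mechanically from the reaction list. The following is a minimal sketch, assuming that the external species (Aext, Cplast, Bplast) lie outside the system boundary and therefore get no row; the dictionary layout and the function name are illustrative choices, not part of the original slides.

```python
# Toy network from the slides: each reaction maps to (substrates, products).
# External species (Aext, Cplast, Bplast) are assumed to lie outside the
# system boundary, so they are left out of the dictionaries.
reactions = {
    "R1": ({"A": 1}, {"B": 1}),   # A -> B       (v1)
    "R2": ({"A": 1}, {"C": 1}),   # A -> C       (v2)
    "R3": ({"B": 1}, {"C": 1}),   # B -> C       (v3)
    "R4": ({}, {"A": 1}),         # Aext -> A    (b1)
    "R5": ({"C": 1}, {}),         # C -> Cplast  (b2)
    "R6": ({"B": 1}, {}),         # B -> Bplast  (b3)
}
metabolites = ["A", "B", "C"]


def stoichiometric_matrix(reactions, metabolites):
    """Rows are metabolites, columns are reactions; consumption is negative."""
    matrix = []
    for m in metabolites:
        row = [products.get(m, 0) - substrates.get(m, 0)
               for substrates, products in reactions.values()]
        matrix.append(row)
    return matrix


S = stoichiometric_matrix(reactions, metabolites)
for name, row in zip(metabolites, S):
    print(name, row)
```

Each column describes one reaction, so adding a reaction to the model only appends a column and leaves the existing entries untouched.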
12. 12. Stoichiometric Matrix
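13. 13. Stoichiometric Matrix for the reactions $R_1: A \xrightarrow{v_1} B$; $R_2: A \xrightarrow{v_2} C$; $R_3: B \xrightarrow{v_3} C$; $R_4: A_{ext} \xrightarrow{b_1} A$; $R_5: C \xrightarrow{b_2} C_{plast}$; $R_6: B \xrightarrow{b_3} B_{plast}$ (rows: metabolites A, B, C; columns: R1 to R6; negative entries mark consumption): $S = \begin{pmatrix} -1 & -1 & 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 & -1 & 0 \end{pmatrix}$
14. 14. Dynamic Mass Balance (b: exchange fluxes; v: internal fluxes). Mass balance equations in matrix form: $\frac{dM}{dt} = S \cdot v$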
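Written out for the three-metabolite toy network of the preceding slides, the matrix form expands to one balance equation per metabolite. This is a worked expansion under the assumption that the flux vector is ordered $(v_1, v_2, v_3, b_1, b_2, b_3)$ and that consumption carries a negative sign:

```latex
\begin{aligned}
\frac{dA}{dt} &= -v_1 - v_2 + b_1 \\
\frac{dB}{dt} &= v_1 - v_3 - b_3 \\
\frac{dC}{dt} &= v_2 + v_3 - b_2
\end{aligned}
```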
15. 15. Steady State. Steady-state assumption: $\frac{dM}{dt} = 0$, thus the steady-state mass balance is $S \cdot v = 0$
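Whether a given flux vector satisfies the steady-state condition can be checked by multiplying it with the stoichiometric matrix. A minimal sketch for the toy network, assuming the flux order v1, v2, v3, b1, b2, b3; the flux values are made up for illustration.

```python
# Stoichiometric matrix of the toy network (rows A, B, C; columns R1..R6).
S = [
    [-1, -1,  0, 1,  0,  0],  # A
    [ 1,  0, -1, 0,  0, -1],  # B
    [ 0,  1,  1, 0, -1,  0],  # C
]


def mass_balance(S, v):
    """Return dM/dt = S . v as a list with one entry per metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if no metabolite accumulates or depletes (within tolerance)."""
    return all(abs(x) <= tol for x in mass_balance(S, v))


v_balanced = [4, 6, 1, 10, 7, 3]     # illustrative values only
v_unbalanced = [4, 6, 1, 10, 7, 0]   # export of B switched off

print(mass_balance(S, v_balanced), is_steady_state(S, v_balanced))
print(mass_balance(S, v_unbalanced), is_steady_state(S, v_unbalanced))
```

In the second vector metabolite B is produced faster than it is removed, so the balance for B is positive and the steady-state condition fails.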
16. 16. Metabolic Modelling: constraints defining the feasible solution space (figure axes: FluxB, FluxC) • Mass balance: $\frac{dM}{dt} = S \cdot v = 0$ • Thermodynamic (directionality of reaction): $0 \le v_i \le \infty$ • Capacity (enzymatic capacity, nutrient availability): $\alpha_i \le v_i \le \beta_i$
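The thermodynamic and capacity constraints amount to a lower and an upper bound per flux. The sketch below checks a flux vector against such bounds; the flux order (v1, v2, v3, b1, b2, b3) and the uptake cap of 10 are assumptions made for illustration and do not come from the slides.

```python
import math

# (lower, upper) bound per flux in the order v1, v2, v3, b1, b2, b3.
# Internal fluxes are taken as irreversible (0 <= v_i <= infinity); the
# uptake b1 is capped at an assumed maximum of 10 (illustrative value).
bounds = [(0, math.inf)] * 3 + [(0, 10), (0, math.inf), (0, math.inf)]


def within_bounds(v, bounds):
    """True if every flux lies inside its [lower, upper] interval."""
    return all(lo <= v_i <= hi for v_i, (lo, hi) in zip(v, bounds))


print(within_bounds([4, 6, 1, 10, 7, 3], bounds))   # True
print(within_bounds([4, 6, 1, 12, 9, 3], bounds))   # False: b1 exceeds its cap
print(within_bounds([-1, 6, 1, 5, 7, 0], bounds))   # False: v1 runs backwards
```

A flux vector belongs to the feasible solution space only if it passes this bounds check and also satisfies the steady-state mass balance.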
17. 17. Metabolic Modelling: Optimization • Optimization problem: maximize/minimize Z over the feasible solution space • Solved using linear programming (figure: feasible solution space in the FluxB/FluxC plane, direction of increasing Z, and the optimal solution)
18. 18. Example: system of two metabolites A and B • Production constraints: $0 \le A \le 60$ and $0 \le B \le 50$ • Capacity for simultaneous production: $A + 2B \le 120$ • Objective function: $Z = 20A + 30B$ • The optimal value lies within the feasible set (figure: feasible set in the FluxA/FluxB plane bounded by A = 60, B = 50 and A + 2B = 120, with objective contours Z = 1500 and Z = 2100)
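The two-metabolite example is small enough to solve without an LP library. The sketch below uses plain vertex enumeration (not the simplex method that real FBA tools typically rely on): it intersects every pair of constraint boundaries, keeps the feasible points, and picks the one with the largest Z. It assumes the inequalities are non-strict, since a linear objective attains its optimum on the boundary.

```python
from itertools import combinations

# Constraints written as a*A + b*B <= c.
constraints = [
    (1, 0, 60),    # A <= 60
    (0, 1, 50),    # B <= 50
    (1, 2, 120),   # A + 2B <= 120
    (-1, 0, 0),    # A >= 0
    (0, -1, 0),    # B >= 0
]


def objective(a, b):
    return 20 * a + 30 * b


def feasible(a, b, tol=1e-9):
    return all(ca * a + cb * b <= c + tol for ca, cb, c in constraints)


def vertices():
    """Intersect each pair of constraint boundaries; keep feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:                      # parallel boundaries
            continue
        a = (c1 * b2 - c2 * b1) / det     # Cramer's rule
        b = (a1 * c2 - a2 * c1) / det
        if feasible(a, b):
            points.append((a, b))
    return points


best = max(vertices(), key=lambda p: objective(*p))
print(best, objective(*best))   # (60.0, 30.0) 2100.0
```

The optimum A = 60, B = 30 sits where the bounds A = 60 and A + 2B = 120 meet, which agrees with the Z = 2100 contour on the slide; the corner A = 0, B = 50 gives Z = 1500.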
19. 19. Linear Programming: Types of Solutions
20. 20. Objective Function: How to identify plausible physiological states? Question and objective pairs: • What are the biochemical production capabilities? Objective: maximize metabolite product • What is the maximal growth rate and biomass yield? Objective: maximize growth rate • What is the trade-off between biomass production and metabolite overproduction? Objective: maximize biomass production for a given metabolite production • How energetically efficient can metabolism operate? Objective: minimize ATP production or minimize nutrient uptake
21. 21. Model Simulation and Analysis • Flux balance analysis: yield / flux predictions under varying environmental conditions (multi-parameter variation) • Knockout analysis: yield / flux predictions under varying genetic backgrounds (complete, specified) • Robustness analysis: objective function sensitivity to flux variation of a specific reaction (complete, specified) • Flux variability analysis: predictions of min/max flux values (complete)
22. 22. Objective Function: Growth Objective
23. 23. Objective Function: Growth Objective. Metabolic demands of precursors and cofactors required for 1 g of biomass of E. coli, as metabolite and demand (mmol): ATP 41.2570; NADH -3.5470; NADPH 18.2250; G6P 0.2050; F6P 0.0709; R5P 0.8977; E4P 0.3610; T3P 0.1290; 3PG 1.4960; PEP 0.5191; PYR 2.8328; AcCoA 3.7478; OAA 1.7867; AKG 1.0789. Growth objective: $Z = 41.2570\,v_{ATP} - 3.547\,v_{NADH} + 18.225\,v_{NADPH} + \ldots$
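The growth objective is a weighted sum of fluxes, with the demands from the table as weights. A minimal sketch of that sum follows; the flux values passed in are hypothetical and serve only to show the arithmetic.

```python
# Demands (mmol per g of biomass) as listed on the slide.
demand_mmol = {
    "ATP": 41.2570, "NADH": -3.5470, "NADPH": 18.2250, "G6P": 0.2050,
    "F6P": 0.0709, "R5P": 0.8977, "E4P": 0.3610, "T3P": 0.1290,
    "3PG": 1.4960, "PEP": 0.5191, "PYR": 2.8328, "AcCoA": 3.7478,
    "OAA": 1.7867, "AKG": 1.0789,
}


def growth_objective(flux):
    """Z = sum of demand * flux; metabolites without a flux count as 0."""
    return sum(d * flux.get(m, 0.0) for m, d in demand_mmol.items())


# Hypothetical flux values, for illustration only.
z = growth_objective({"ATP": 1.0, "NADH": 1.0, "NADPH": 2.0})
print(round(z, 4))
```

In an FBA run these coefficients form the objective vector handed to the linear programming solver, which then searches the feasible space for the flux distribution that maximizes Z.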
24. 24. Summary Flux Balance Analysis
25. 25. Metabolism in the Hordeum vulgare Seed FBA model of seed storage metabolism in developing endosperm of Hordeum vulgare
26. 26. Metabolism in the Hordeum vulgare Seed. FBA model of seed storage metabolism in developing endosperm of Hordeum vulgare • Size: 257 reactions, 234 metabolites • Pathways: Glyc, TCA, PPP, oxP, Ferm, Rubisco, AA, Starch, CW, and others
27. 27. Case Study (source of images: L. Borisjuk and H. Rolletschek, IPK) • Non-invasive imaging uncovers metabolic compartmentation in the endosperm • Primary site of alanine synthesis is the central endosperm • Alanine gradients reflect local oxygen state of the endosperm • 13C-Ala gradient can be used as in vivo marker for hypoxia
28. 28. Case Study (source of images: L. Borisjuk and H. Rolletschek, IPK) • Alanine-AT: critical branch point separating aerobic from anaerobic metabolism • Modelling purpose: to elucidate the role of alanine metabolism for seed tissues with varying oxygen supply
29. 29. Simulation of Region-specific Metabolism Central endosperm (hypoxic region) A B Peripheral endosperm (aerobic region)
30. 30. Regulation of Alanine-AT • Regulation of Alanine-AT in the endosperm in response to changing oxygen supply
31. 31. Current Research Directions • Model coupling (different organs) • Multiscale modelling (different modelling methods)
32. 32. Software Tools for FBA • CellNetAnalyzer (CNA) http://www.mpi-magdeburg.mpg.de/projects/cna/cna.html • COBRA Toolbox http://gcrg.ucsd.edu/downloads/COBRAToolbox • FBA-SimVis http://fbasimvis.ipk-gatersleben.de/
33. 33. Model Reconstruction: Metabolic Model. 1. Model definition • Organism, organ, dev. stage, pathways, model boundaries. 2. Model reconstruction & data retrieval • Top-down: metabolism, pathways, reactions • Integration of heterogeneous data types • Data types: biochemical, physiological, genomic data • Data basis: literature, databases • Missing data: use data referring to closely related species/organs/dev. stages • Inferred reactions: indirect, inferred from BM requirements • Unknown reaction directionality: assumed reversible; unknown compartment: assumed cytosol
34. 34. Additional Parameters • Maximum uptake/excretion rates: literature, experimental data, approximations (e.g. related taxa) • Growth objective: biomass composition; energy requirements (growth, maintenance); literature, experimental data, approximations (e.g. dev. stage)
35. 35. Outline 1. Modelling metabolism - Basics - Constraint-based modelling: FBA - Mathematical representation - Application of constraints - Example - Resources and tools 2. Visualising models and networks - Basics - Standard graphical representation - Process Description Language - Resources and tools
36. 36. Question 1: Can you Read this? A network with 10^2 nodes. Protein interaction network, source: Jeong et al. Nature, 2001
37. 37. Question 1: Can you Read this? A network with 10^3 nodes. Metabolic network, source: KEGG, 2012
38. 38. Question 1: Can you Read this? A network with 10^4 nodes. Protein interaction network, source: DIP, 2013
39. 39. Part 1: A network with 10^4 nodes. Protein interaction network, source: DIP, 2013 • Automatic layout of large networks and circuit boards
40. 40. Question 2 – Can you Understand this?
41. 41. Question 2 – Can you Understand this? Stimulates gene transcription? Associates into? Is degraded? Translocates? Reciprocal stimulation?
42. 42. Part 2 Stimulates gene transcription? Associates into? Is degraded? Translocates? Reciprocal stimulation? Standardisation of graphical representation
43. 43. Part 1: A network with 10^4 nodes. Protein interaction network, source: DIP, 2013 • Automatic layout of large networks and circuit boards
44. 44. Automatic Layout of Networks • Force-based approaches: simulate a system of physical forces (Eades. Congressus Numerantium, 1984; Fruchterman & Reingold. Software - Practice and Experience, 1991) • Layered approaches: decycling, layering, crossing reduction, coordinate assignment (Sugiyama et al. IEEE Transactions on Systems, Man and Cybernetics, 1981) • Orthogonal / grid-based approaches (Tamassia. SIAM Journal on Computing, 1987; Biedl et al. Graph Drawing, LNCS 1353, 1997)
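To make the force-based idea concrete, here is a simplified spring-embedder sketch in the spirit of Fruchterman & Reingold: all node pairs repel, connected nodes attract, and a cooling temperature caps the step size. It is an illustration only, not the published algorithm (for instance, there is no frame clamping); the parameter defaults and the example graph are assumptions.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return {node: [x, y]} after a simple force-directed relaxation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))      # ideal distance between nodes
    temperature = size / 10               # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes: k^2 / distance.
        for i, u in enumerate(nodes):
            for w in nodes[i + 1:]:
                dx = pos[u][0] - pos[w][0]
                dy = pos[u][1] - pos[w][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[w][0] -= dx / dist * force
                disp[w][1] -= dy / dist * force
        # Attractive force along every edge: distance^2 / k.
        for u, w in edges:
            dx = pos[u][0] - pos[w][0]
            dy = pos[u][1] - pos[w][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[w][0] += dx / dist * force
            disp[w][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95               # cool down
    return pos


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason why networks with 10^4 nodes need more specialised layout algorithms.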
45. 45. Many Special Layout Algorithms • Commonly extensions of the three classes of layout algorithms: force-based, layered, orthogonal / grid-based • Examples (sources: Karp & Paley. Conf. Bioinformatics and Genome Research, 1994; Becker & Rojas. Bioinformatics, 2001; Schreiber. In Silico Biology, 2002; Genc & Dogrusoz. Graph Drawing, LNCS 2912, 2004)
46. 46. Good Network Layout • Better layouts have: fewer edge crossings; large crossing angles; straighter edges; horizontal and vertical edges; symmetrical parts shown symmetrically; … • Special layout algorithms
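The first criterion, the number of edge crossings, is easy to measure for a straight-line drawing. The sketch below counts proper crossings with a standard orientation test; the data layout (a dict of coordinates plus an edge list) is an assumption made for the example.

```python
from itertools import combinations


def orientation(p, q, r):
    """+1 if p, q, r turn counter-clockwise, -1 if clockwise, 0 if collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd intersect in a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(pos, edges):
    """Number of edge pairs that cross; edges sharing a node never count."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    return crossings


# Four nodes on the corners of a unit square.
pos = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
print(count_crossings(pos, [("a", "c"), ("b", "d")]))   # 1 (the diagonals)
print(count_crossings(pos, [("a", "b"), ("c", "d")]))   # 0 (opposite sides)
```

A layout algorithm can use such a count as a quality score when comparing alternative drawings of the same network.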
47. 47. Part 2 Standardisation of graphical representation
48. 48. Ambiguity in Conventional Representation
49. 49. Standardised Symbols are Important (figure labels: most English-speaking countries, Singapore, Quebec, Iran, Norway, China, Poland, Israel, USA and Canada)
50. 50. Pathway Diagrams Have Been Used for a Long Time • A metabolic pathway diagram, from the wall chart of Biochemical Pathways created by Gerhard Michal (1968) • Electrical circuit diagram representing the cell membrane, from Hodgkin AL and Huxley AF (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117: 500-544.
51. 51. What is SBGN? • A way to unambiguously describe biochemical and cellular events in graphs • Limited number of symbols (~30) • Smooth learning curve • Can graphically represent quantitative models and biochemical pathways at different levels of granularity • Developed since 2006 by a growing community, part of COMBINE • Three languages: Process Descriptions (one state = one glyph); Entity Relationships (one entity = one glyph); Activity Flow (conceptual level)
52. 52. Graph Trinity: Three Languages in One • Process Description maps: unambiguous, mechanistic, sequential, subject to combinatorial explosion • Entity Relationship maps: unambiguous, mechanistic, non-sequential • Activity Flow maps: ambiguous, conceptual, sequential
53. 53. Graph Trinity: Three Languages in One Process Description Entity Relationships Activity Flow
54. 54. SBGN Process Description Language • A Process Description (PD) diagram represents all molecular processes and interactions occurring between various biochemical entities • It depicts how entities transition from one form to another as a result of biochemical reactions (including non-covalent modifications such as binding) • Most of the classic metabolic pathways (e.g., glycolysis and TCA cycle) in biochemistry textbooks are drawn using this approach • Though not the conventional approach for drawing signaling pathways, it captures the details of biochemical reactions within the pathway network and provides, in most cases, an unambiguous interpretation of pathway mechanisms
55. 55. Graph Trinity: Three Languages in One • Process Description maps: unambiguous, mechanistic, sequential, subject to combinatorial explosion • Entity Relationship maps: unambiguous, mechanistic, non-sequential • Activity Flow maps: ambiguous, conceptual, sequential
56. 56. SBGN Process Description L1 V1.2 Reference Card
57. 57. Pools of Entities • Collection of molecules indistinguishable in some sense • Non-overlapping • Characterized by concentration
58. 58. Entity Types • Unspecified entity • Simple chemical • Macromolecule • Nucleic acid feature
59. 59. Material Type • Unit of information • Controlled vocabulary (SBO) • Indicates the chemical structure (physical composition) of an entity • Label format: pre:label • Names and labels: non-macromolecular ion (mt:ion); non-macromolecular radical (mt:rad); ribonucleic acid (mt:rna); deoxyribonucleic acid (mt:dna); protein (mt:prot); polysaccharide (mt:psac) • Example: PhyA labelled mt:prot
60. 60. Conceptual Type • Unit of information • Controlled vocabulary (SBO) • Indicates the function of an entity within the context of a given PD map • Label format: pre:label • Names and labels: gene (ct:gene); transcription start site (ct:tss); gene coding region (ct:coding); gene regulatory region (ct:grr); messenger RNA (ct:mRNA) • Example: crp labelled ct:grr
61. 61. Macromolecular Pools: State Variables • A pool is a set of molecules that are indistinguishable in some sense • Molecules can be in different states: (non-)phosphorylated; open/closed channel; modified at some site (figure examples: channel Ch in closed and open state; receptor R with state variables P@237, P and 2P; kinase)
62. 62. Stateless and State-full Entity Types • Not all entities can have states • Stateless: simple chemicals, unspecified entity • State-full entities: macromolecule, nucleic acid feature, complex • A state is defined as a combination of state values • Once defined, a state variable should always be visible (figure example: PhyA, mt:prot, with state Pr/Prf)
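The rules on this slide translate directly into a small data model: a pool is identified by its label, entity type and full combination of state values, and only state-full types may carry state variables. The class and helper names below are hypothetical and are not taken from SBGN-ML or any SBGN library.

```python
from dataclasses import dataclass

# Entity types that may carry state variables, following the slide.
STATEFUL_TYPES = {"macromolecule", "nucleic acid feature", "complex"}


@dataclass(frozen=True)
class EntityPool:
    label: str
    entity_type: str
    state: frozenset = frozenset()   # set of (variable, value) pairs

    def __post_init__(self):
        # Stateless types (simple chemical, unspecified entity) reject state.
        if self.state and self.entity_type not in STATEFUL_TYPES:
            raise ValueError(f"{self.entity_type} cannot carry state variables")


def pool(label, entity_type, **state):
    """Convenience constructor: keyword arguments become state variables."""
    return EntityPool(label, entity_type, frozenset(state.items()))


r_plain = pool("R", "macromolecule")
r_phos = pool("R", "macromolecule", site237="P")
print(r_plain == r_phos)   # False: different states are different pools
```

Because the dataclass is frozen and compares by value, two glyphs with identical label, type and state denote the same pool, which is the situation the clone marker is meant to flag.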
63. 63. Example 1: LEC1/AFL-B3 Network Macromolecules: biochemical substances that are built up from the covalent linking of pseudo-identical units. Examples of macromolecules include proteins, nucleic acids (RNA, DNA), and polysaccharides (glycogen, cellulose, starch, etc.).
64. 64. Complex and Multimer • Represent complexes of molecules held together by non-covalent bonds • A multimer requires a cardinality (e.g. N:2, N:5) • Both can have state variables • In a multimer, a state variable means that all monomers have the same state • Use a complex if the states are not the same
65. 65. Key Concept: Process • Process: conversion of elements of one pool into another • Special cases: non-covalent binding (association, dissociation); incompleteness (uncertain process, omitted process) • Glyphs shown: process, association, dissociation, uncertain process (?), omitted process (//)
66. 66. LEC1/AFL-B3 Network Omitted processes are processes that are known to exist, but details are omitted from the map for the sake of clarity or parsimony. A single omitted process can represent any number of actual processes.
67. 67. Arcs • Use of pools by a process: consumption/production; stoichiometry (optional) • Regulation of process rate: stimulation, inhibition, catalysis • Requirement for a process: necessary stimulation • Arc glyphs shown: consumption, production, catalysis, stimulation, inhibition, necessary stimulation, modulation
68. 68. Laying out Process Arcs • Production arcs can represent consumption (reversible process) • Substrates and products should attach to opposite sides of the process shape (two connectors) • Regulatory arcs should attach to the other two sides of the process • If the forward and backward processes are regulated separately, the reversible process has to be split
69. 69. LEC1/AFL-B3 Network A stimulation affects positively the flux of a process represented by the target process.
70. 70. Sink/source: Creation and Destruction • Represents creation and destruction of entities • Shape to represent the source of materials and the sink of degraded entities
71. 71. LEC1/AFL-B3 Network. A submap is used to encapsulate processes (including all types of nodes and edges) within one glyph. The submap hides its content from the user and displays only input terminals (or ports).
72. 72. LEC1/AFL-B3 Factors and Maturation Gene Control
73. 73. Environmental Influence • External influences: perturbing agent (e.g. light, temperature change, mutation/disease) • System manifestation: phenotype (e.g. apoptosis)
74. 74. LEC1/AFL-B3 Factors and Maturation Gene Control The phenotype glyph represents biological processes or phenotypes that are affected or generated by a biochemical/regulatory network. Such processes can take place at different levels and are independent of the biochemical network itself.
75. 75. Clone Marker • Each entity pool should be represented only once on the map • This causes layout problems • The clone marker serves as a visual indicator of duplication • Stateless nodes carry an unnamed marker • State-full nodes carry a named marker to simplify recognition
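Deciding which glyphs need a clone marker is a simple duplicate count over the entity pool nodes on a map. The sketch below assumes that glyphs are identified by their label alone; the example labels are hypothetical.

```python
from collections import Counter


def clone_marked(glyph_labels):
    """Return the labels that occur more than once and need a clone marker."""
    counts = Counter(glyph_labels)
    return {label for label, n in counts.items() if n > 1}


# Hypothetical map in which ATP and ADP are drawn twice to ease the layout.
glyphs = ["ATP", "ADP", "glucose", "G6P", "ATP", "ADP", "F6P"]
print(sorted(clone_marked(glyphs)))   # ['ADP', 'ATP']
```

For state-full nodes the comparison would have to include the state variables as well, since pools in different states are different pools.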
76. 76. LEC1/AFL-B3 Factors and Maturation Gene Control If an EPN is duplicated on a map, it is necessary to indicate this fact by using the clone marker auxiliary unit. The purpose of this marker is to provide the reader with a visual indication that this node has been cloned, and that at least one other occurrence of the EPN can be found in the map.
77. 77. Discrimination Between Knowledge Levels (figure labels: transcription factor, target gene DNA, complex, transcription, translation)
78. 78. Discrimination Between Knowledge Levels Transcription factors and target gene DNA together stimulate transcription, translation Transcription factor stimulates the transcription of several putative target genes
79. 79. Compartments • Container to represent a physical or logical structure • Free form • Visually thicker line • The same entity pools in different compartments are different • Compartments are independent • Overlap does not imply containment
80. 80. Compartments Neuro-muscular junction
81. 81. Logical Gates • Encoding of network logic • To simplify the layout if there are many activators for a process • To include uncertain information: combination of TFs with unknown or combinatorial binding kinetics • Three main logic operations: AND (all inputs are required); OR (any input or combination of inputs suffices); NOT (prevents influence)
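The three operators behave like ordinary Boolean functions on the presence or activity of their inputs. A minimal sketch with a hypothetical regulation rule (the transcription factors TF1 to TF3 and the repressor are invented for the example):

```python
def and_gate(*inputs):
    """AND: all inputs are required."""
    return all(inputs)


def or_gate(*inputs):
    """OR: at least one input is sufficient."""
    return any(inputs)


def not_gate(value):
    """NOT: inverts the input, so an active input prevents the influence."""
    return not value


def transcription_stimulated(tf1, tf2, tf3, repressor):
    """Hypothetical rule: ((TF1 AND TF2) OR TF3) AND NOT repressor."""
    return and_gate(or_gate(and_gate(tf1, tf2), tf3), not_gate(repressor))


print(transcription_stimulated(True, True, False, False))   # True
print(transcription_stimulated(True, False, False, False))  # False
print(transcription_stimulated(False, False, True, True))   # False
```

On a PD map the same rule would be drawn as nested gate glyphs whose single output arc modulates the transcription process, instead of many separate regulatory arcs.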
82. 82. Strengths and Weaknesses of SBGN-PD. Strengths • Easy translation into a mathematical model • Natural mapping to SBML • A lot of information available in databases (KEGG, Panther) • Timeline is easily extractable. Weaknesses • Full explicit definition of state • Combinatorial complexity • Additional assumptions needed to include uncertain information • Laborious creation
83. 83. SBGN Process Description L1 V1.2 Reference Card
84. 84. SBGN Process Description - Entity Pool Nodes
85. 85. SBGN Process Description - Process Nodes
86. 86. SBGN Process Description - Connecting Arcs
87. 87. Software Tools for SBGN • SBGN http://www.sbgn.org • SBGN-ED http://www.sbgned.org
88. 88. Standards in Systems Biology. Source: Demir et al. Nature Biotechnology, 2012.
89. 89. High Throughput Modelling and Visualisation • Path2Models: a pipeline to produce models that combine data from different sources • 140,000 kinetic, logical and constraint-based models • Part of BioModelsDB. Path2Models team: F. Büchel, T. Czauderna, C. Chaouiya, A. Dräger, M. Glont, H. Hermjakob, M. Hucka, S. Keating, D.B. Kell, R. Keller, C. Laibe, N. Le Novère, P. Mendes, F. Mittag, M. Rall, N. Rodriguez, J. Saez-Rodriguez, F. Schreiber, M. Schubert, N. Swainston, M. van Iersel, C. Wrzodek, M. Wybrow, A. Zell