Areejit Samal Randomizing metabolic networks
Transcript

  • 1. Large-scale structure of metabolic networks: Role of Biochemical and Functional constraints. Areejit Samal, Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France, and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  • 2. Outline: (i) architectural features of metabolic networks; (ii) issues with existing null models for metabolic networks; (iii) our method to generate realistic randomized metabolic networks using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA); (iv) biological function is a main driver of large-scale structure in metabolic networks.
  • 3. Architectural features of real-world networks. Small-world (Watts and Strogatz, Nature (1998)): small average path length and high local clustering. Scale-free (Barabási and Albert, Science (1999)): power-law degree distribution. Scale-free graphs can be generated using a simple model incorporating growth and a preferential attachment scheme.
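The growth plus preferential attachment mechanism can be sketched in a few lines. This is a minimal illustration, not code from the talk; the function name and the choice of a complete seed graph on m + 1 nodes are assumptions made for the example.

```python
import random

def preferential_attachment_graph(n, m, seed=None):
    """Grow an undirected graph to n nodes, one node at a time; each new node
    links to m distinct existing nodes chosen with probability proportional
    to their current degree (growth + preferential attachment)."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        # Only after all targets are chosen does the new node join the pool.
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment_graph(1000, 2, seed=42)
```

In this Barabási-Albert style model the degree exponent tends to 3, so a log-log histogram of node degrees from `edges` shows the characteristic fat tail.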
  • 5. Large-scale structure of metabolic networks. Small-world, scale-free and hierarchical organization (Wagner & Fell (2001); Jeong et al. (2000); Ravasz et al. (2002)): small average path length, high local clustering, power-law degree distribution. Bow-tie architecture (Ma and Zeng, Bioinformatics (2003)): a directed graph with a giant component of strongly connected nodes along with associated IN and OUT components. The bow-tie architecture of metabolism is similar to that found for the WWW by Broder et al.
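The bow-tie decomposition consists of the largest strongly connected component (LSC), the IN set (nodes that can reach the LSC) and the OUT set (nodes reachable from it). A minimal pure-Python sketch with hypothetical function names; it finds strong components with a simple quadratic reachability method rather than Tarjan's algorithm, which is an illustrative choice and not necessarily what the cited studies used.

```python
from collections import defaultdict

def reachable(adj, start):
    """All nodes reachable from `start` (including itself) along edges in `adj`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bow_tie(edges):
    """Return (LSC, IN, OUT) for a directed graph given as (u, v) pairs."""
    adj, radj, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        adj[u].add(v)
        radj[v].add(u)
        nodes.update((u, v))
    if not nodes:
        return set(), set(), set()
    # Strong component of x = (nodes x can reach) & (nodes that can reach x).
    lsc, assigned = set(), set()
    for x in nodes:
        if x not in assigned:
            component = reachable(adj, x) & reachable(radj, x)
            assigned |= component
            if len(component) > len(lsc):
                lsc = component
    seed = next(iter(lsc))
    out_set = reachable(adj, seed) - lsc   # reachable from the LSC
    in_set = reachable(radj, seed) - lsc   # can reach the LSC
    return lsc, in_set, out_set
```

Nodes in neither IN, OUT nor the LSC (such as an isolated edge x -> y) are left out of all three sets.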
  • 7. Network motifs and evolutionary design. Network motifs are sub-graphs that are over-represented in real networks compared to randomized networks with the same degree sequence. Certain network motifs were shown to perform important information-processing tasks; for example, the coherent feed-forward loop (FFL) can be used to detect persistent signals. Shen-Orr et al., Nature Genetics (2002); Milo et al., Science (2002, 2004)
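Counting how often a motif occurs is the first half of a motif analysis; the second half compares that count with the same count in randomized networks. A minimal sketch for the feed-forward loop, with a hypothetical function name. It counts only the wiring pattern: edge signs are not modelled, so coherent and incoherent FFLs are not distinguished.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops (X->Y, Y->Z and X->Z, with X, Y, Z distinct)
    in a directed graph given as (u, v) pairs."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, x_targets in succ.items():
        for y in x_targets:
            if y == x:
                continue
            # Every successor z of y that x also regulates directly closes a loop.
            for z in succ.get(y, ()):
                if z != x and z != y and z in x_targets:
                    count += 1
    return count
```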
  • 9. “Null hypothesis” based approach for identifying design principles in real-world networks. Statistically significant network properties are deduced as follows: (1) measure the ‘chosen property’ (e.g., motif significance profile, average network clustering coefficient, etc.) in the investigated real network; (2) generate randomized networks with structure similar to the investigated real network using an appropriate null model; (3) use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value. [Figure: histogram of the chosen property (frequency versus chosen property) over randomized networks, with the p-value region marked.]
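Step (3) can be made concrete with an empirical p-value. A minimal sketch with hypothetical function names; the (k + 1)/(n + 1) convention and the accompanying z-score are common choices assumed here, not necessarily the statistics used in the talk.

```python
import statistics

def empirical_p_value(observed, null_values, larger_is_extreme=True):
    """One-sided empirical p-value of `observed` against values of the same
    property measured on randomized networks. The (k + 1) / (n + 1) form
    counts the real network as one more draw, so p is never exactly 0."""
    if larger_is_extreme:
        k = sum(1 for x in null_values if x >= observed)
    else:
        k = sum(1 for x in null_values if x <= observed)
    return (k + 1) / (len(null_values) + 1)

def z_score(observed, null_values):
    """Distance of `observed` from the null mean, in null standard deviations."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)
```

With few randomized networks the smallest attainable p-value is 1/(n + 1), which is one reason such studies generate large null ensembles.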
  • 10. Edge-exchange algorithm: the commonly used null model. Randomized networks preserve the degree of each node, and hence the degree sequence. [Figure: investigated network, after 1 exchange, after 2 exchanges.] References: Shen-Orr et al., Nature Genetics (2002); Milo et al., Science (2002, 2004); Maslov & Sneppen, Science (2002)
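A single edge exchange on a directed graph can be sketched as follows. This is a minimal illustration in the spirit of Maslov and Sneppen, not the code of any cited study; the function name, the retry cap and the rule of rejecting swaps that create self-loops or duplicate edges are assumptions.

```python
import random

def edge_exchange(edges, n_swaps, seed=None, max_tries_per_swap=100):
    """Degree-preserving randomization of a directed graph. Each step picks two
    edges a->b and c->d and rewires them to a->d and c->b, which keeps every
    node's in- and out-degree. Swaps creating self-loops or duplicate edges
    are rejected."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = tries = 0
    while done < n_swaps and tries < max_tries_per_swap * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

Applied to a bipartite reaction-metabolite graph, the same move swaps metabolites between reactions, which is how fictitious reactions arise in metabolic networks.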
  • 11. Carefully posed null models are required for deducing design principles with confidence. C. elegans neuronal network: nearby neurons have a greater chance of forming a connection than distant neurons. A null model incorporating this spatial constraint gives different results compared to the generic edge-exchange algorithm. Artzy-Randrup et al., Science (2004)
  • 13. Edge-exchange algorithm: the case of bipartite metabolic networks. [Figure: bipartite graph linking reactions ASPT and CITL to metabolites asp-L, cit, nh4, fum, ac and oaa.] ASPT: asp-L → fum + nh4; CITL: cit → oaa + ac
  • 14. Edge-exchange algorithm: the case of bipartite metabolic networks. [Figure: an edge exchange between ASPT and CITL yields the modified reactions ASPT* and CITL*.] ASPT: asp-L → fum + nh4 becomes ASPT*: asp-L → ac + nh4; CITL: cit → oaa + ac becomes CITL*: cit → oaa + fum. This null model has been used in many studies, including Guimerà & Amaral, Nature (2005) and Samal et al., BMC Bioinformatics (2006). It preserves the degree of each node in the network, but generates fictitious reactions that violate the mass, charge and atomic balance satisfied by real chemical reactions! Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.
  • 16. A common generic null model (the edge-exchange algorithm) is unsuitable for different types of real-world networks.
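A quick way to see the problem is to check the carbon balance of each reaction before and after the exchange. A minimal sketch using textbook carbon counts (aspartate 4, fumarate 4, citrate 6, oxaloacetate 4, acetate 2, ammonium 0); a full check would also balance the other elements and the charge.

```python
# Carbon atoms per metabolite: asp-L (aspartate), fum (fumarate), nh4 (ammonium),
# cit (citrate), oaa (oxaloacetate), ac (acetate).
CARBONS = {"asp-L": 4, "fum": 4, "nh4": 0, "cit": 6, "oaa": 4, "ac": 2}

REACTIONS = {
    "ASPT":  (["asp-L"], ["fum", "nh4"]),
    "CITL":  (["cit"], ["oaa", "ac"]),
    "ASPT*": (["asp-L"], ["ac", "nh4"]),   # after exchanging fum and ac
    "CITL*": (["cit"], ["oaa", "fum"]),
}

def carbon_balanced(substrates, products):
    """True if both sides of the reaction carry the same number of carbon atoms."""
    return sum(CARBONS[m] for m in substrates) == sum(CARBONS[m] for m in products)

for name, (substrates, products) in REACTIONS.items():
    print(name, carbon_balanced(substrates, products))
# prints, one per line: ASPT True, CITL True, ASPT* False, CITL* False
```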
  • 17. Null models for metabolic networks: Blind-watchmaker network
  • 19. Randomizing metabolic networks. We propose a new method using Markov Chain Monte Carlo (MCMC) sampling and Flux Balance Analysis (FBA) to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints. In increasing level of constraints: Constraint I, a given number of valid biochemical reactions (ensemble R); + Constraint II, fix the number of metabolites (ensemble Rm); + Constraint III, exclude blocked reactions (ensemble uRm); + Constraint IV, the functional constraint of growth on specified environments (ensemble uRm-V1).
  • 20. Comparison with E. coli: large-scale structural properties of interest. In this work, we generate randomized metabolic networks using only reactions in KEGG and compare the structural properties of the randomized networks with those of E. coli. The E. coli metabolic network has m (~750) metabolites and r (~900) reactions. We study the following large-scale structural properties: the metabolite degree distribution, clustering coefficient and average path length (scale-free and small-world; Jeong et al. (2000), Wagner & Fell (2001)); Pc, the probability that a path exists between two nodes in the directed graph; and the largest strong component (LSC) and the union of the LSC, IN and OUT components (bow-tie architecture of the Internet and metabolism; Broder et al.; Ma and Zeng, Bioinformatics (2003)).
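Pc and the average path length can both be computed from breadth-first searches on the directed metabolite graph. A minimal sketch with hypothetical function names; averaging path length only over pairs that are connected by a directed path is an assumption about how disconnected pairs are handled.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest directed path lengths from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(edges):
    """Return (Pc, <L>) for a directed graph given as (u, v) pairs.
    Pc: fraction of ordered pairs (i, j), i != j, joined by a directed path i -> j.
    <L>: mean shortest-path length over those connected pairs only."""
    adj, nodes = {}, set()
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        nodes.update((u, v))
    n = len(nodes)
    connected = total_length = 0
    for s in nodes:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                connected += 1
                total_length += d
    pc = connected / (n * (n - 1)) if n > 1 else 0.0
    mean_length = total_length / connected if connected else float("inf")
    return pc, mean_length
```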
  • 21. Different graph-theoretic representations of the metabolic network. Example reactions: HEX1: atp + glc-D → adp + g6p + h; PGI: g6p ↔ f6p; PFK: atp + f6p → adp + fdp + h. [Figure: (a) bipartite representation, with reaction nodes (irreversible and reversible reactions) linked to metabolite nodes (currency and other metabolites); (b) unipartite representation, a graph of metabolites only.]
  • 22. Different graph-theoretic representations of the metabolic network (continued). The figure of slide 21 is annotated with the network measures studied here: degree distribution, clustering coefficient, average path length and largest strong component.
  • 23. Different graph-theoretic representations of the metabolic network (continued). The values obtained for different network measures depend on the list of currency metabolites used to generate the unipartite graph of metabolites starting from the bipartite graph. In this work, we have used a well-defined, biochemically sound list of currency metabolites to construct the unipartite graph corresponding to each metabolic network.
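One common way to build the unipartite graph is to link each non-currency substrate to each non-currency product of a reaction, in both directions for reversible reactions. A minimal sketch under that assumption; the three-metabolite currency list is hypothetical and much shorter than the curated list used in the talk.

```python
# Hypothetical, abbreviated currency list for this toy example.
CURRENCY = {"atp", "adp", "h"}

# (substrates, products, reversible) for HEX1, PGI and PFK.
REACTIONS = {
    "HEX1": (["atp", "glc-D"], ["adp", "g6p", "h"], False),
    "PGI":  (["g6p"], ["f6p"], True),
    "PFK":  (["atp", "f6p"], ["adp", "fdp", "h"], False),
}

def unipartite_projection(reactions, currency):
    """Directed metabolite graph: substrate -> product for every pair of
    non-currency metabolites in a reaction (both directions if reversible)."""
    edges = set()
    for subs, prods, reversible in reactions.values():
        subs = [m for m in subs if m not in currency]
        prods = [m for m in prods if m not in currency]
        for s in subs:
            for p in prods:
                edges.add((s, p))
                if reversible:
                    edges.add((p, s))
    return edges
```

With the currency list, the projection is the pathway glc-D → g6p ↔ f6p → fdp; with an empty list, atp picks up links to every product of HEX1 and PFK, which illustrates why the choice of currency metabolites changes the measured network properties.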
  • 24. Constraint I: Given number of valid biochemical reactions. Instead of generating fictitious reactions using the edge-exchange mechanism, we can restrict ourselves to the >6000 valid reactions contained within the KEGG database. We use the database of 5870 mass-balanced reactions derived from KEGG by Rodrigues and Wagner. For simplicity, I will refer to this database in the talk as ‘KEGG’.
  • 25. Global reaction set = KEGG database + E. coli iJR904. Nutrient metabolites: the set of external metabolites in iJR904 was allowed for uptake/excretion in the global reaction set. Biomass composition: the E. coli biomass composition of iJR904 was used to determine the viability of genotypes. We can therefore implement Flux Balance Analysis (FBA) within this database.
  • 26. Genome-scale metabolic models: Flux Balance Analysis (FBA). Inputs: the list of metabolic reactions with stoichiometric coefficients, the biomass composition, and the set of nutrients in the environment. Outputs: the growth rate for the given medium (‘biomass yield’) and the fluxes of all reactions. Advantages: FBA does not require enzyme kinetic information, which is not known for most reactions. Disadvantages: FBA cannot predict internal metabolite concentrations and is restricted to steady states; basic models do not account for metabolic regulation. References: Varma and Palsson, Biotechnology (1994); Price et al., Nature Reviews Microbiology (2004)
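The optimization behind FBA is usually written as a linear program. The statement below is the standard formulation from the FBA literature (as reviewed, e.g., by Price et al.); the notation is introduced here, not taken from the slides: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, and v_biomass the flux through a biomass reaction that drains the biomass components in their measured proportions.

```latex
\begin{aligned}
\max_{v}\quad & v_{\text{biomass}} \\
\text{subject to}\quad & S\,v = 0 && \text{(steady state for every internal metabolite)} \\
& v_j^{\min} \le v_j \le v_j^{\max} && \text{for every reaction } j
\end{aligned}
```

Irreversible reactions have v_j^min = 0, and the nutrients available in the environment enter through the bounds on the exchange (uptake) reactions. Under this reading, a network counts as viable in an environment when the optimal biomass flux is positive, in practice above a small threshold.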
  • 27. Bit string representation of metabolic networks within the global reaction set. The N = 5870 reactions within KEGG (the global reaction set), R1: 3pg + nad → 3php + h + nadh, R2: 6pgc + nadp → co2 + nadph + ru5p-D, R3: 6pgc → 2ddg6p + h2o, ..., RN, are each marked 1 if present in a network and 0 if absent. The E. coli metabolic network, or any random network of reactions within KEGG, can thus be represented as a bit string of length N with exactly r entries equal to 1, where r is the number of reactions in the E. coli metabolic network. There are ~C(6000, 900) such bit strings.
  • 28. Ensemble R: random networks within KEGG with the same number of reactions as E. coli. To compare with E. coli, we generate an ensemble R of random networks as follows: pick random subsets of exactly r (~900) reactions within the KEGG database of N (~6000) reactions, where r is the number of reactions in E. coli. Each random network, or bit string, contains exactly r reactions (1's in the string), like E. coli. A huge number of networks, ~C(6000, 900), is possible within KEGG with exactly r reactions. Fixing the number of reactions is equivalent to fixing genome size.
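Drawing members of ensemble R amounts to choosing r positions uniformly out of N. A minimal sketch; `random_network` is a hypothetical helper, and r = 900 is the approximate E. coli value quoted in the talk rather than an exact count. The size of the space is computed here for N = 5870 instead of the rounded ~C(6000, 900).

```python
import math
import random

N = 5870   # reactions in the global reaction set ('KEGG')
r = 900    # approximate number of reactions in E. coli (assumed round value)

def random_network(n_total, n_present, rng):
    """One member of ensemble R: a bit string of length n_total with exactly
    n_present ones, chosen uniformly among all such strings."""
    present = set(rng.sample(range(n_total), n_present))
    return [1 if i in present else 0 for i in range(n_total)]

rng = random.Random(0)
ensemble_R = [random_network(N, r, rng) for _ in range(10)]

# Number of distinct networks with exactly r reactions, as a power of ten.
log10_count = math.log10(math.comb(N, r))
print(f"C({N}, {r}) ~ 10^{log10_count:.0f}")
```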
  • 29. Metabolite degree distribution. The degree of a metabolite is the number of reactions in which the metabolite participates in the network. [Figure: log-log plot of P(k) versus k for E. coli.] The distribution follows a power law, P(k) ~ k^(-γ): for E. coli, γ = 2.17; for most organisms, γ is 2.1-2.3. Reference: Jeong et al. (2000); Wagner and Fell (2001)
  • 30. Metabolite degree distribution (continued). [Figure: log-log plots of P(k) versus k for E. coli and for random networks.] E. coli: γ = 2.17; random networks in ensemble R: γ = 2.51.
  • 31. Metabolite degree distribution (continued). [Figure: log-log plots of P(k) versus k for E. coli, random networks, and the complete KEGG network.] E. coli: γ = 2.17; random networks in ensemble R: γ = 2.51; complete KEGG: γ = 2.31. Any (large enough) random subset of reactions in KEGG has a power-law degree distribution!
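The metabolite degree distribution follows directly from the definition of degree. A minimal sketch with hypothetical function names, using the three reactions HEX1, PGI and PFK as toy input. Estimating γ from P(k) is a separate step (for example a log-log fit or a maximum-likelihood estimator); the slides do not say which method was used, so it is left out here.

```python
from collections import Counter

def metabolite_degrees(reactions):
    """Degree of a metabolite = number of reactions in which it participates."""
    degree = Counter()
    for metabolites in reactions.values():
        degree.update(set(metabolites))   # each reaction counts once per metabolite
    return degree

def degree_distribution(degree):
    """P(k): fraction of metabolites that have degree k."""
    n = len(degree)
    return {k: c / n for k, c in sorted(Counter(degree.values()).items())}

reactions = {
    "HEX1": ["atp", "glc-D", "adp", "g6p", "h"],
    "PGI": ["g6p", "f6p"],
    "PFK": ["atp", "f6p", "adp", "fdp", "h"],
}
```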
  • 32. Reaction degree distribution. The degree of a reaction is the number of metabolites that participate in it. [Figure: frequency of reactions versus the number of substrates in a reaction, broken down by the number of currency metabolites (0 to 7) per reaction; inset: a reaction R converting A to B, coupled to atp → adp.] In each reaction, we have counted the number of currency and other metabolites, as shown in the figure. Most reactions involve 4 metabolites, of which 2 are currency metabolites and 2 are other metabolites. The reaction degree distribution is bell-shaped, with the typical reaction in the network involving 4 metabolites. This is very different from the metabolite degree distribution, which follows a power law.
  • 33. Scale-free versus scale-rich networks: origin of power laws in metabolic networks. Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles: (a) ‘carriers’ (very high degree), e.g., atp and adp; (b) ‘precursors’ (intermediate degree); (c) ‘others’ (low degree). [Figure: a reaction R converting A to B, coupled to atp → adp.] It has been argued that gene duplication and divergence at the level of protein interaction networks can lead to the observed power laws in metabolic networks. However, the presence of ubiquitous (high-degree) ‘currency’ metabolites along with low-degree ‘other’ metabolites in each reaction can explain the observed fat tail in the metabolite degree distribution. Reference: Tanaka (2005); Tanaka, Csete and Doyle (2008)
  • 34. Network Properties: Given number of valid biochemical reactions. [Figure: four panels comparing ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli: clustering coefficient <C>; average path length <L>; Pc, the probability that a path exists between two nodes; and the fraction of nodes in the largest strong component (LSC) and in LSC + IN + OUT.]
  • 36. Constraint II: Fix the number of metabolites. Direct sampling of randomized networks with the same number of reactions and metabolites as E. coli is not possible. Instead, we use Markov Chain Monte Carlo (MCMC) sampling of randomized networks within the KEGG reaction universe of N = 5870 reactions, starting from E. coli. Each step is a reaction swap: one reaction present in the network is exchanged for one that is absent, which keeps the number of reactions in the network constant. Step 1: perform 10^6 swaps to erase the memory of the initial network. Step 2: continue swapping, storing a network in ensemble RM every 10^3 swaps (saving frequency). Accept/reject criterion of a reaction swap: accept if the number of metabolites in the new network is less than or equal to that in E. coli; otherwise reject.
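The reaction-swap chain can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `mcmc_reaction_swaps` and its parameters are hypothetical names, each reaction is represented as a set of metabolite names, and the talk's settings would correspond to `burn_in=10**6` and `spacing=10**3`. Because a swap picks a present and an absent reaction uniformly, the proposal is symmetric, so accepting every swap that satisfies the constraint (and otherwise staying put) samples uniformly among the allowed networks, provided the starting network satisfies the constraint.

```python
import random
from collections import Counter

def mcmc_reaction_swaps(reactions, start, max_metabolites, burn_in, spacing,
                        n_samples, seed=None):
    """Sample networks (frozensets of reaction indices) with len(start) reactions
    and at most max_metabolites metabolites by reaction-swap MCMC.
    reactions: list of sets of metabolite names (the reaction universe)."""
    rng = random.Random(seed)
    present = list(start)
    in_net = set(present)
    absent = [i for i in range(len(reactions)) if i not in in_net]
    # How many present reactions use each metabolite (count 0 = not in network).
    usage = Counter(m for i in present for m in reactions[i])
    samples, step = [], 0
    while len(samples) < n_samples:
        step += 1
        a = rng.randrange(len(present))   # position of the reaction to remove
        b = rng.randrange(len(absent))    # position of the reaction to add
        old, new = present[a], absent[b]
        # Propose the swap and update the metabolite usage counts.
        usage.subtract(reactions[old])
        usage.update(reactions[new])
        if sum(1 for c in usage.values() if c > 0) <= max_metabolites:
            present[a], absent[b] = new, old   # accept
        else:
            usage.subtract(reactions[new])     # reject: undo the proposal
            usage.update(reactions[old])
        # After burn-in, store the current network every `spacing` steps.
        if step > burn_in and (step - burn_in) % spacing == 0:
            samples.append(frozenset(present))
    return samples
```

Constraint III corresponds to passing only the unblocked reactions as the universe, and Constraint IV adds a growth test to the acceptance condition.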
  • 37. Network Properties: After fixing the number of metabolites. [Figure: four panels comparing ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli: clustering coefficient <C>; average path length <L>; Pc, the probability that a path exists between two nodes; and the fraction of nodes in the LSC and in LSC + IN + OUT.]
  • 39. Constraint III: Exclude blocked reactions. Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to the growth of the organism. Blocked reactions have zero flux for every investigated chemical environment under any steady-state condition and do not contribute to the biomass growth flux. We next exclude blocked reactions from the biochemical universe used for sampling randomized networks. [Figure: unblocked KEGG as a subset of KEGG.] For blocked reactions, see Burgard et al., Genome Research (2004)
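The talk defines blocked reactions through flux (zero flux in every environment at steady state), which requires solving linear programs. As a self-contained stand-in, the sketch below uses a purely topological dead-end pruning. This is a different and weaker technique: every reaction it flags is indeed blocked at steady state, but it can miss reactions that are blocked only for flux-balance reasons. The function name and data layout are hypothetical.

```python
def topologically_blocked(reactions, external):
    """Iteratively flag reactions that must carry zero steady-state flux because
    they involve an internal metabolite that cannot be both produced and consumed.
    reactions: dict name -> (substrates, products, reversible).
    external: metabolites exchanged with the environment (never dead ends)."""
    active = set(reactions)
    changed = True
    while changed:
        changed = False
        producers, consumers = {}, {}
        for name in active:
            subs, prods, rev = reactions[name]
            for m in prods:
                producers.setdefault(m, set()).add(name)
                if rev:
                    consumers.setdefault(m, set()).add(name)
            for m in subs:
                consumers.setdefault(m, set()).add(name)
                if rev:
                    producers.setdefault(m, set()).add(name)
        for m in set(producers) | set(consumers):
            if m in external:
                continue
            prod, cons = producers.get(m, set()), consumers.get(m, set())
            # Balanced only if one reaction can make m and a different one can use it.
            if not any(p != c for p in prod for c in cons):
                dead = (prod | cons) & active
                if dead:
                    active -= dead
                    changed = True   # removals can create new dead ends
    return set(reactions) - active
```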
  • 40. MCMC sampling: exclusion of blocked reactions (ensemble uRM). The sampling scheme is as for Constraint II (Step 1: 10^6 swaps to erase the memory of the initial network; Step 2: store a network every 10^3 swaps), with a modified accept/reject criterion for a reaction swap: restrict the reaction swaps to the set of unblocked reactions within KEGG, and accept if the number of metabolites in the new network is less than or equal to that in E. coli.
  • 41. Network Properties: After excluding blocked reactions. [Figure: four panels comparing ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli: clustering coefficient <C>; average path length <L>; Pc, the probability that a path exists between two nodes; and the fraction of nodes in the LSC and in LSC + IN + OUT.]
  • 43. Constraint IV: Growth phenotype. Definition of phenotype: viability, i.e., the ability to synthesize all biomass components, determined by Flux Balance Analysis (FBA). [Figure: bit string and equivalent network representations of two genotypes. Genotype GA grows and is viable; genotype GB, which lacks a deleted reaction, does not grow and is unviable.] For FBA, see Price et al. (2004) for a review.
  • 44. MCMC sampling: growth phenotype in one environment (ensemble uRM-V1). The sampling scheme is as before (10^6 swaps to erase the memory of the initial network, then storing a network every 10^3 swaps). Accept/reject criterion of a reaction swap: restrict the reaction swaps to the set of unblocked reactions within KEGG, and accept if (a) the number of metabolites in the new network is less than or equal to that in E. coli, and (b) the new network is able to grow on the specified environment.
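Acceptance criteria (a) and (b) can be combined into a single test. In the talk, (b) is decided by FBA. To keep the sketch self-contained, (b) is replaced here by network expansion: starting from the nutrients, a reaction "fires" once all its substrates are available, and the network counts as viable if every biomass component becomes available. This is a different and cruder technique than FBA (it ignores stoichiometry, reversibility and cofactor regeneration), so treat it as a placeholder; the function names are hypothetical.

```python
def producible(reactions, network, nutrients):
    """Network expansion: metabolites reachable from the nutrients using only
    reactions in `network` (a set of reaction names)."""
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name in network - fired:
            subs, prods = reactions[name]
            if set(subs) <= available:
                available |= set(prods)
                fired.add(name)
                changed = True
    return available

def accept_swap(reactions, candidate, max_metabolites, nutrients, biomass):
    """Criterion (a): metabolite count; stand-in for (b): network expansion, NOT FBA."""
    metabolites = set()
    for name in candidate:
        subs, prods = reactions[name]
        metabolites |= set(subs) | set(prods)
    if len(metabolites) > max_metabolites:
        return False
    return set(biomass) <= producible(reactions, candidate, nutrients)
```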
  • 45. Network Properties: Growth phenotype in one environment. [Figure: four panels comparing ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli: clustering coefficient <C>; average path length <L>; Pc, the probability that a path exists between two nodes; and the fraction of nodes in the LSC and in LSC + IN + OUT.]
  • 47. Network Properties: Growth phenotype in multiple environments. [Figure: four panels comparing ensembles R, RM, uRM, uRM-V1, uRM-V5 and uRM-V10 with E. coli: clustering coefficient <C>; average path length <L>; Pc, the probability that a path exists between two nodes; and the fraction of nodes in the LSC and in LSC + IN + OUT.]
  • 49. Global structural properties of real metabolic networks: a consequence of simple biochemical and functional constraints. [Figure: network properties with increasing level of constraints.] Conjectured by A. Wagner (2007) and B. Papp et al. (2008). Samal and Martin, PLoS ONE (2011) 6(7): e22295
  • 51. High level of genetic diversity in our randomized ensembles. Any two random networks in our most constrained ensemble, uRM-V10, differ in ~60% of their reactions. [Figure: bit strings of E. coli and of random networks in ensemble uRM-V10, compared pairwise.] The Hamming distance between two networks is ~60% of the maximum possible between two bit strings. There is a set of ~100 reactions which are present in all of our random networks.
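The diversity figures above are Hamming distances between bit strings. A minimal sketch with hypothetical function names; it assumes "maximum possible" means 2r, the largest distance between two strings that each contain r ones (attainable whenever 2r ≤ N).

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def relative_hamming(a, b):
    """Hamming distance divided by 2r, its maximum for two strings with r ones each."""
    r = sum(a)
    if sum(b) != r:
        raise ValueError("both networks must contain the same number of reactions")
    return hamming(a, b) / (2 * r)

def core_reactions(networks):
    """Indices of reactions present in every network of an ensemble."""
    return set.intersection(*({i for i, bit in enumerate(net) if bit} for net in networks))
```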
  • 52. Necessity of MCMC sampling. The ensembles are embedded sets, like Russian dolls: uRm-V10 ⊂ uRm-V5 ⊂ uRm-V1 ⊂ uRm ⊂ Rm ⊂ R. [Figure: fraction of genotypes that are viable (log scale) versus the number of reactions in the genotype (2000-2800), for glucose, acetate and succinate, with an analytic prefactor.] Including the viability constraint for the first chemical environment reduces the ensemble by at least a factor of 10^22. Reference: Samal et al., BMC Systems Biology (2010)
  • 53. Summary. We have proposed a new method based on MCMC sampling and FBA to generate randomized metabolic networks by successively imposing a few macroscopic biochemical and functional constraints in our sampling criteria. By imposing these constraints, we were able to approach the properties of real metabolic networks. We conclude that biological function is a main driver of structure in metabolic networks.
  • 54. Limitations and future outlook. We can extend the study to many organisms using the SEED database, which provides automated reconstructions of metabolic models for >140 organisms on which FBA can be performed. The list of reactions in KEGG is incomplete, and its use may reflect evolutionary constraints associated with natural organisms. From a synthetic biology perspective, many more reactions that are not in KEGG could occur. An alternate approach proposed by Basler et al., Bioinformatics (2011), generates new mass-balanced reactions but does not impose a functionality constraint (no constraint of biomass production).
  • 55. Genotype networks: a many-to-one genotype-to-phenotype map. Genotype (RNA sequence, protein sequence, gene network) → folding → phenotype (secondary structure of pairings, three-dimensional structure, gene expression levels). References: Fontana & Schuster (1998); Lipman & Wilbur (1991); Ciliberti, Martin & Wagner (2007). A genotype network is the set of (many) genotypes that have a given phenotype, together with their “organization” in genotype space. In the context of RNA, genotype networks are commonly referred to as neutral networks.
  • 56. Properties of the genotype-to-phenotype map. Are there many genotypes that have a given phenotype? Do the genotypes with a given phenotype form a significant fraction of all possible genotypes? Are all the genotypes with a given phenotype rather similar? Are biological genotypes atypically robust compared to random genotypes with a similar phenotype? Are biological genotypes atypically innovative? Can genotypes change gradually in response to selective pressures? A. Wagner, Nat. Rev. Genet. (2008)
  • 57. Properties of the genotype-to-phenotype map (continued). We have considered these questions in the context of metabolic networks: genotype = list of metabolic enzymes or reactions; phenotype = growth or not in a given environment.
  • 58. Computational tool for Evolutionary Systems Biology
  • 59. Acknowledgement. Reference: Olivier C. Martin (Université Paris-Sud). Other work: Andreas Wagner and João Rodrigues (University of Zurich); Jürgen Jost (Max Planck Institute for Mathematics in the Sciences). Genotype networks in metabolic reaction spaces. Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*. BMC Systems Biology 2010, 4:30. Environmental versatility promotes modularity in genome-scale metabolic networks. Areejit Samal, Andreas Wagner* and Olivier C Martin*. BMC Systems Biology 2011, 5:135. Funding: CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a postdoctoral fellowship.