Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Areejit Samal Randomizing genome-scale metabolic networks
Architectural features of real-world networks Structure of real-world networks is very different from random graphs. Small-World: Small Average Path Length & High Local Clustering, Watts and Strogatz (1998). Scale-free and Hierarchical Organization: Power law degree distribution & Hierarchical modularity, Barabasi and Albert (1999); Ravasz et al (2002). Scale-free graphs can be generated using a simple model incorporating Growth & a Preferential attachment scheme.
Network motifs and evolved design Shen-Orr et al (2002); Milo et al (2002,2004) Sub-graphs that are over-represented in real networks compared to randomized networks with same degree sequence. Certain network motifs were shown to perform important information processing tasks. For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals. The preponderance of certain sub-graphs in a real network may reflect evolutionary design principles favored by selection.
“Null hypothesis” based approach for identifying design principles in real-world networks <ul><li>The procedure to deduce statistically significant network properties is: </li></ul><ul><li>Measure the ‘chosen property’ (e.g., motif significance profile, average clustering coefficient) in the investigated real network. </li></ul><ul><li>Generate randomized networks with structure similar to the investigated real network using an appropriate null model. </li></ul><ul><li>Use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value. </li></ul>[Figure labels: Investigated network; p-value]
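The last step of the procedure above can be sketched in a few lines. This is a minimal illustration, not the method used in the cited studies: the function names `empirical_p_value` and `z_score`, the one-sided test, the add-one correction, and the synthetic null values are all assumptions made for the example.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value: share of null samples >= observed.

    The +1 in numerator and denominator (an assumed convention) avoids
    reporting a p-value of exactly zero from a finite sample.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

def z_score(observed, null_values):
    """Distance of the observed value from the null mean, in standard deviations."""
    n = len(null_values)
    mean = sum(null_values) / n
    var = sum((v - mean) ** 2 for v in null_values) / (n - 1)
    return (observed - mean) / var ** 0.5

# Synthetic stand-in for a property (e.g. clustering coefficient) measured
# on 1000 randomized networks; the numbers are made up for illustration.
rng = random.Random(0)
null_values = [rng.gauss(0.10, 0.02) for _ in range(1000)]
observed = 0.25  # hypothetical value measured on the investigated network

p = empirical_p_value(observed, null_values)
z = z_score(observed, null_values)
print(f"p-value <= {p:.4f}, z-score = {z:.1f}")
```

The smallest p-value this estimate can report is 1/(number of randomized networks + 1), so the size of the randomized ensemble sets the resolution of the test.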
Edge-exchange algorithm: widely used null model Randomized networks preserve the degree of each node, and hence the degree sequence. [Figure panels: Investigated Network; After 1 exchange; After 2 exchanges] Reference: Shen-Orr et al (2002); Milo et al (2002,2004); Maslov & Sneppen (2002) …
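A minimal sketch of a degree-preserving edge exchange on a directed graph is given below. It assumes the input is a list of distinct (source, target) pairs without self-loops; the function names, the retry limit, and the toy graph are illustrative choices, not taken from the cited papers.

```python
import random
from collections import Counter

def edge_exchange(edges, n_swaps, seed=0):
    """Swap targets of random edge pairs: (a,b),(c,d) -> (a,d),(c,b).

    Every node keeps its in-degree and out-degree. Swaps that would
    create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumed safety limit for tiny graphs
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue  # rejected: self-loop or multi-edge
        present.difference_update([edges[i], edges[j]])
        present.update([new1, new2])
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

def degree_sequence(edges):
    """Return (out-degree, in-degree) counters for a directed edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)

original = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
shuffled = edge_exchange(original, n_swaps=10)
print(degree_sequence(shuffled) == degree_sequence(original))
```

The check at the end compares in- and out-degrees before and after, which is the invariant this null model is built to keep.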
Carefully posed null models are required for deducing design principles with confidence Artzy-Randrup et al Science (2004) In C. elegans , nearby neurons have a greater chance of forming a connection than distant neurons. A null model incorporating this spatial constraint gives different results compared to the generic edge-exchange algorithm. <ul><li>Misinterpretation can arise from the use of an incomplete null model (e.g. one ignoring important constraints)!! </li></ul><ul><li>A single generic null model (the edge-exchange algorithm) is not suitable for every type of real-world network. </li></ul>
Edge-exchange algorithm: case of bipartite metabolic networks ASPT: asp-L -> fum + nh4 CITL: cit -> oaa + ac ASPT*: asp-L -> ac + nh4 CITL*: cit -> oaa + fum The exchange preserves the degree of each node in the network, but generates fictitious reactions that violate the mass, charge and atomic balance satisfied by real chemical reactions!! Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown. Example: Guimera & Amaral (2005); Samal et al (2006) Biochemically meaningless randomization is inappropriate for metabolic networks [Figure: bipartite graphs of ASPT and CITL before the exchange and of ASPT* and CITL* after it, over the metabolites fum, asp-L, cit, ac, oaa, nh4]
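The imbalance of the exchanged reactions can be checked with a carbon count alone. The sketch below is only an illustration: it tracks carbon atoms and ignores the other elements and charge, and the dictionary and function names are made up for the example. The carbon counts for fum and ac are those stated above; the remaining ones are the standard values for these compounds.

```python
# Carbon atoms per metabolite (aspartate 4, fumarate 4, ammonium 0,
# citrate 6, oxaloacetate 4, acetate 2).
CARBONS = {"asp-L": 4, "fum": 4, "nh4": 0, "cit": 6, "oaa": 4, "ac": 2}

# Each reaction is (substrates, products), all with stoichiometry 1.
REACTIONS = {
    "ASPT":  (["asp-L"], ["fum", "nh4"]),
    "CITL":  (["cit"], ["oaa", "ac"]),
    "ASPT*": (["asp-L"], ["ac", "nh4"]),   # produced by the edge exchange
    "CITL*": (["cit"], ["oaa", "fum"]),    # produced by the edge exchange
}

def carbon_balanced(substrates, products):
    """True if both sides of the reaction carry the same number of carbons."""
    return sum(CARBONS[m] for m in substrates) == sum(CARBONS[m] for m in products)

balance = {name: carbon_balanced(*rxn) for name, rxn in REACTIONS.items()}
for name, ok in balance.items():
    print(f"{name}: {'balanced' if ok else 'NOT balanced'} in carbon")
```

ASPT* loses two carbons and CITL* gains two, so neither exchanged reaction can correspond to a real chemical transformation.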
Null models for metabolic networks: Blind-watchmaker network Blind to basic constraints satisfied by biochemical reactions!!
Motivation of this work We propose a new Markov Chain Monte Carlo (MCMC) based method to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints. Increasing level of constraints: Constraint I (Ensemble R): Given number of valid biochemical reactions. Constraint II (Ensemble Rm): Additionally, fix the no. of metabolites. Constraint III (Ensemble uRm): Additionally, exclude blocked reactions. Constraint IV (Ensemble uRm-V1): Additionally, impose the functional constraint of growth in a specified environment.
Constraint I: Given number of valid biochemical reactions <ul><li>Instead of generating fictitious reactions using edge exchange mechanism, we can restrict to >6000 valid reactions contained within KEGG database. </li></ul><ul><li>We use the database of 5870 mass balanced reactions derived from KEGG by Rodrigues and Wagner. </li></ul>
Global Reaction Set: KEGG Database + E. coli iJR904. We can implement Flux Balance Analysis (FBA) within this database. Biomass composition: The E. coli biomass composition of iJR904 was used to determine viability of genotypes. Nutrient metabolites: The set of external metabolites in iJR904 was allowed for uptake/excretion in the global reaction set. Reference: Rodrigues and Wagner (2009)
Comparison with E. coli : Structural Properties of Interest In this work, we generate randomized metabolic networks using only reactions in KEGG and compare their structural properties with those of E. coli . <ul><li>The E. coli metabolic network has m (~750) metabolites and r (~900) reactions. </li></ul>We study the following large-scale structural properties: <ul><li>Metabolite degree distribution </li></ul><ul><li>Clustering coefficient </li></ul><ul><li>Average path length </li></ul><ul><li>Pc : Probability that a path exists between two nodes in the graph </li></ul><ul><li>Largest strongly connected component (LSC) and the union of LSC, IN and OUT components </li></ul>Scale Free and Small World Ref: Jeong et al (2000); Wagner & Fell (2001) Bow-tie architecture of the Internet and metabolism Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
Different Graph-theoretic Representations of the Metabolic Network HEX1: atp + glc-D -> adp + g6p + h PGI: g6p ↔ f6p PFK: atp + f6p -> adp + fdp + h [Figure: (a) Bipartite Representation, (b) Unipartite Representation; legend: Currency Metabolite, Other Metabolite, Reversible Reaction, Irreversible Reaction; property labels: Degree Distribution, Clustering coefficient, Average Path Length, Largest Strong Component]
Ensemble R : Random networks within KEGG with same number of reactions as E. coli Bit string representation of metabolic networks: The E. coli metabolic network or any random network of reactions within KEGG can be represented as a bit string of length N (Present - 1, Absent - 0), with N = 5870 reactions within KEGG. KEGG Database: R 1 : 3pg + nad -> 3php + h + nadh; R 2 : 6pgc + nadp -> co2 + nadph + ru5p-D; R 3 : 6pgc -> 2ddg6p + h2o; . . . R N . E. coli bit string: 1 0 1 . . . . Ensemble R of random networks: To compare with E. coli , we generate an ensemble R of random networks: pick random subsets with exactly r (~900) reactions within the KEGG database of N (~6000) reactions, where r is the no. of reactions in E. coli (the 1’s in the bit string). Huge no. of networks possible within KEGG with exactly r reactions ~ Fixing the no. of reactions is equivalent to fixing genome size.
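Drawing a member of ensemble R and counting the possible networks can be sketched as follows. This is an illustration under stated assumptions: r = 900 is a round placeholder for the approximate value quoted above, and the function name `random_network` and the seed are arbitrary.

```python
import math
import random

N = 5870   # reactions in the KEGG-derived universe (from the slide)
r = 900    # ASSUMED round value; the slide only gives r as roughly 900

def random_network(n_universe, n_reactions, rng):
    """Return a bit string (list of 0/1) with exactly n_reactions ones."""
    chosen = set(rng.sample(range(n_universe), n_reactions))
    return [1 if i in chosen else 0 for i in range(n_universe)]

rng = random.Random(0)
genotype = random_network(N, r, rng)
print(sum(genotype), "reactions present out of", len(genotype))

# Number of distinct networks with exactly r reactions is the binomial
# coefficient C(N, r); report only its order of magnitude.
log10_count = math.log10(math.comb(N, r))
print(f"log10 of the number of possible networks: {log10_count:.0f}")
```

For these assumed values the count has more than a thousand decimal digits, which is why the ensemble can only be sampled, never enumerated.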
Metabolite degree distribution The degree of a metabolite is the number of reactions in which the metabolite participates in the network. Power law exponent γ of the degree distribution: Complete KEGG: γ = 2.31; E. coli : γ = 2.17; Most organisms: γ is 2.1-2.3 Reference: Jeong et al (2000); Wagner and Fell (2001) Random networks in ensemble R : γ = 2.51 Any (large enough) random subset of reactions in KEGG has a power law degree distribution!!
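The degree definitions translate directly into code. The sketch below uses the three example reactions R 1 to R 3 listed on the bit string slide; the function names are illustrative, and it also computes reaction degree (the number of metabolites taking part in a reaction).

```python
from collections import Counter

# (substrates, products) for the three example reactions from the deck.
REACTIONS = {
    "R1": (["3pg", "nad"], ["3php", "h", "nadh"]),
    "R2": (["6pgc", "nadp"], ["co2", "nadph", "ru5p-D"]),
    "R3": (["6pgc"], ["2ddg6p", "h2o"]),
}

def metabolite_degrees(reactions):
    """Number of reactions in which each metabolite participates."""
    degrees = Counter()
    for substrates, products in reactions.values():
        for met in set(substrates) | set(products):
            degrees[met] += 1
    return degrees

def reaction_degrees(reactions):
    """Number of distinct metabolites participating in each reaction."""
    return {name: len(set(s) | set(p)) for name, (s, p) in reactions.items()}

met_deg = metabolite_degrees(REACTIONS)
rxn_deg = reaction_degrees(REACTIONS)
print(met_deg.most_common(3))
print(rxn_deg)
```

On a full network, the histogram of `met_deg` values is the metabolite degree distribution whose tail is fitted to obtain the exponent.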
Reaction Degree Distribution <ul><li>Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles: </li></ul><ul><li>(a) ‘Carriers’ (very high degree) </li></ul><ul><li>(b) ‘Precursors’ (intermediate degree) </li></ul><ul><li>(c) ‘Others’ (low degree) </li></ul><ul><li>Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008) </li></ul>The degree of a reaction is the number of metabolites that participate in it. The presence of ubiquitous (high degree) ‘currency’ metabolites along with low degree ‘other’ metabolites within a given reaction explains the observed power laws in metabolism.
Network Properties: Given number of valid biochemical reactions No. of metabolites in E. coli < No. of metabolites in any random network
Constraint II: Fix the number of metabolites Direct sampling of randomized networks with the same no. of reactions and metabolites as in E. coli is not possible. We use a simple MCMC algorithm to sample networks within KEGG which have the same no. of reactions and metabolites as E. coli . A reaction swap keeps the number of reactions in the network constant. Accept/Reject criterion of reaction swap: accept if the no. of metabolites in the new network is less than or equal to that of E. coli . Step 1: starting from E. coli , perform 10^6 swaps to erase memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency) to build Ensemble Rm. [Figure: bit strings over the N = 5870 reactions of the KEGG reaction universe, showing accepted and rejected swaps]
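The swap-based sampler can be sketched on a toy reaction universe. This is a minimal illustration under stated assumptions, not the authors' implementation: the universe is random, the burn-in and saving frequency are tiny compared with the 10^6 and 10^3 swaps quoted in the deck, and all names (`mcmc_sample`, `metabolites_of`, `max_metabolites`) are hypothetical.

```python
import random

def metabolites_of(network, universe):
    """Union of the metabolites used by the reactions in the network."""
    mets = set()
    for rxn in network:
        mets |= universe[rxn]
    return mets

def mcmc_sample(universe, start, max_metabolites, burn_in, save_every,
                n_samples, seed=0):
    """Reaction-swap MCMC: each step removes one reaction and adds another.

    A proposal is accepted only if the new network uses at most
    max_metabolites metabolites; the reaction count never changes.
    """
    rng = random.Random(seed)
    current = set(start)
    samples = []
    for step in range(1, burn_in + save_every * n_samples + 1):
        removed = rng.choice(sorted(current))
        added = rng.choice(sorted(set(universe) - current))
        proposal = (current - {removed}) | {added}
        if len(metabolites_of(proposal, universe)) <= max_metabolites:
            current = proposal  # accept; otherwise keep the old network
        if step > burn_in and (step - burn_in) % save_every == 0:
            samples.append(frozenset(current))
    return samples

# Toy universe: 40 reactions, each touching 2-4 of 30 metabolites.
rng = random.Random(1)
universe = {f"R{i}": frozenset(rng.sample(range(30), rng.randint(2, 4)))
            for i in range(40)}
start = {f"R{i}" for i in range(10)}            # stand-in for E. coli
limit = len(metabolites_of(start, universe))    # its metabolite count

ensemble = mcmc_sample(universe, start, limit, burn_in=500,
                       save_every=50, n_samples=20)
print(len(ensemble), "networks sampled")
```

Restricting `universe` to unblocked reactions, or adding a growth test to the acceptance condition, would turn the same loop into a sampler for the more constrained ensembles.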
Network Properties: After fixing number of metabolites Randomized network properties become closer to E. coli
Constraint III: Exclude Blocked Reactions <ul><li>Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to growth of the organism. </li></ul><ul><li>Blocked reactions have zero flux for every investigated chemical environment, under any steady-state condition with nonzero biomass growth flux. </li></ul><ul><li>We next exclude blocked reactions from our biochemical universe used for sampling randomized networks. </li></ul>[Figure labels: KEGG; Unblocked KEGG]
MCMC Sampling: Exclusion of blocked reactions Restrict the reaction swaps to the set of unblocked reactions within KEGG. A reaction swap keeps the number of reactions in the network constant. Accept/Reject criterion of reaction swap: accept if the no. of metabolites in the new network is less than or equal to that of E. coli . Step 1: starting from E. coli , perform 10^6 swaps to erase memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency) to build Ensemble uRm. [Figure: bit strings over the N = 5870 reactions of the KEGG reaction universe, showing accepted and rejected swaps]
Network Properties: After excluding blocked reactions Randomized network properties become still closer to E. coli
Constraint IV: Growth Phenotype Genotypes: bit string and equivalent network representation of genotypes G A and G B (bit strings 0 1 1 . . . . and 0 0 1 . . . . ; x marks a deleted reaction). Definition of phenotype: viability, i.e. the ability to synthesize biomass components, determined using Flux Balance Analysis (FBA). Growth: viable genotype. No growth: unviable genotype. For FBA, see Kauffman et al (2003) for a review.
MCMC Sampling: Growth phenotype in one environment <ul><li>Restrict the reaction swaps to the set of unblocked reactions within KEGG. </li></ul><ul><li>Accept/Reject criterion of reaction swap: </li></ul><ul><li>Accept if both </li></ul><ul><li>the no. of metabolites in the new network is less than or equal to that of E. coli , and </li></ul><ul><li>the new network is able to grow in the specified environment </li></ul>A reaction swap keeps the number of reactions in the network constant. Step 1: starting from E. coli , perform 10^6 swaps to erase memory of the initial network. Step 2: store a network every 10^3 swaps (saving frequency) to build Ensemble uRm-V1. [Figure: bit strings over the N = 5870 reactions of the KEGG reaction universe, showing accepted and rejected swaps]
Network Properties: Growth phenotype in one environment Further improvement obtained by imposing phenotypic constraint on randomized networks
Network Properties: Growth phenotype in multiple environments Still further improvement obtained by increasing phenotypic constraints on randomized networks
Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints Increasing level of constraints Global network structure could be an indirect consequence of biochemical and functional constraints Samal and Martin, PLoS ONE (2011) 6(7): e22295 Conjectured by: A. Wagner (2007) B. Papp et al. (2008)
High level of genetic diversity in our randomized ensembles Any two random networks in our most constrained ensemble differ in ~ 60% of their reactions.
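One way to quantify how much two sampled networks differ is the fraction of reactions present in one but absent from the other. The sketch below assumes this definition, which the slide does not spell out, and assumes both networks contain the same number of reactions; the function name and the toy sets are illustrative.

```python
def fraction_differing(network_a, network_b):
    """Fraction of reactions in network_a that are absent from network_b.

    Assumes both networks have the same number of reactions, as is the
    case for networks produced by reaction swaps.
    """
    a, b = set(network_a), set(network_b)
    if len(a) != len(b):
        raise ValueError("networks must contain the same number of reactions")
    return len(a - b) / len(a)

# Toy example: two 5-reaction networks sharing 2 reactions.
net1 = {"R1", "R2", "R3", "R4", "R5"}
net2 = {"R1", "R2", "R6", "R7", "R8"}
diff = fraction_differing(net1, net2)
print(f"{diff:.0%} of reactions differ")
```

Averaging this quantity over many pairs drawn from an ensemble gives a single diversity figure for that ensemble.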
Necessity of MCMC sampling Constraint of having super-essential reactions The convex curve indicates that the effect on viability of successive reaction removals becomes more and more severe. Reference: Samal, Rodrigues, Jost, Martin, Wagner BMC Systems Biology (2010)
Summary <ul><li>We propose an MCMC-based method to generate randomized metabolic networks by successively imposing a few macroscopic biochemical and functional constraints in our sampling criteria. </li></ul><ul><li>With these few macroscopic constraints, the randomized networks approach the structural properties of real metabolic networks. </li></ul>
Concluding Remarks <ul><li>We can extend the study using the SEED database to other organisms. </li></ul><ul><li>The list of reactions in KEGG is incomplete and its use may reflect evolutionary constraints associated with natural organisms. </li></ul><ul><li>From a synthetic biology context, there could be many more reactions that can occur which are not in KEGG. </li></ul><ul><li>Alternate approach proposed by Basler, Ebenhöh, Selbig and Nikoloski, Bioinformatics, 2011 </li></ul>
Neutral networks: A many-to-one genotype to phenotype map Neutral network or genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space. Genotype -> Phenotype examples: RNA Sequence -> Secondary structure of pairings (folding); Protein Sequence -> Three dimensional structure; Gene network -> Gene expression levels. Reference: Fontana & Schuster (1998); Lipman & Wilbur (1991); Ciliberti, Martin & Wagner (2007)
Common approaches in the study of design principles of metabolic networks See also:
Acknowledgement Andreas Wagner and João Rodrigues (University of Zurich); Jürgen Jost (Max Planck Institute for Mathematics in the Sciences); Olivier C. Martin (Université Paris-Sud). Funding: CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship. Reference and Related Work: Genotype networks in metabolic reaction spaces, Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*, BMC Systems Biology 2010, 4: 30. Environmental versatility promotes modularity in genome-scale metabolic networks, Areejit Samal, Andreas Wagner* and Olivier C Martin*, BMC Systems Biology 2011, 5: 135. Thanks for your attention!! -- Questions??