Areejit SamalInternational Centre for TheoreticalPhysicsTrieste, ITALYasamal@ictp.itPhenotypic constraints drive thearchitecture of biological networksWork done @CNRS, LPTMS, Univ Paris-Sud 11, Orsay, France&Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Boehringer MannheimBiochemical Pathway ChartMetabolic networkMetabolic network consists the set of biochemical reactions that convert nutrientmolecules into key molecules required for growth and maintenance of the cell.
Large-scale structure of metabolic networks is similar toother real world networksMa and Zeng (2003);Csete and Doyle (2004)Wagner & Fell (2001) Jeong et al (2000) Ravasz et al (2002)Small-world, Scale-free and Hierarchical OrganizationBow-tie architectureBow-tie architecture of the metabolism is similar to thatfound in the WWW by Broder et al.Small Average Path Length, High Local Clustering, Power-law degree distributionDirected graph with a giant component of stronglyconnected nodes along with associated IN and OUTcomponent. Structure of real networks is very different from random(Erdős–Rényi) networks. Large-scale structure of biological networks is similar toother kinds of real networks (social and technologicalnetworks).
Reference: Shen-Orr et al (2002); Milo et al (2002,2004)Local structure of biological networks: Network motifsSub-graphs that are over-represented in realnetworks compared to randomized networks.Certain network motifs were shown to performimportant information processing tasks.E.g., the coherent Feed Forward Loop (FFL)can be used to detect persistent signals.The preponderance of certain sub-graphs in the real network may reflect evolutionarydesign principles favored by selection.
Are the observed structural features of a metabolicnetwork ‘unusual’ or ‘atypical’? Extremely popular to study network measures such as degreedistribution, clustering coefficient, assortativity, motifs, etc. in biologicalnetworks. But mere observation of scale-free, small-world or other strikingstructural features in biological networks do not mean these propertiesare non-obvious or atypical features of cellular networks. A distinction needs to be made between observation of these structuralproperties in real networks and our understanding of the generativeprinciples that may have led to these properties in real networks.
Questions of Interest Are the observed structural properties of biological networks frozenaccidents? What are the adaptive features of a network? Which features of abiological network are under selection and which are mere byproducts ofselection on other traits?To answer these questions adequately, one needs proper controls or nullmodels of biological networks. Current controls are not well posed for biologicalnetworks, especially, metabolic networks. Furthermore, proper controls should account for biological function.What are the non-obvious or atypical properties of metabolic networks oncebiological function is accounted in the control or null model?
Is the large-scale structure of E. coli metabolic networkatypical?• Metabolite Degree distribution• Clustering Coefficient• Average Path Length• Pc: Probability that a path exists between two nodes inthe directed graph• Largest strongly component (LSC) and the union ofLSC, IN and OUT componentsWe decided to compare the following structural properties of E. colimetabolic network with those in randomized metabolic networks:Bow-tie architecture of theInternet and metabolismRef: Broder et al (1999); Ma and Zeng (2003)Scale Free and Small WorldRef: Jeong et al (2000); Wagner & Fell (2001)The widely used null model to generaterandomized metabolic networks is not well posedto answer this question!
Chosen Property0.60 0.62 0.64 0.66 0.68 0.70Frequency050100150200250300Edge-randomization: Widely-used null model that does notaccount for biological function1) Measure the ‘chosen property’ in theinvestigated real network.2) Generate randomized networks withstructure similar to the investigated realnetwork using edge-randomization.3) Use the distribution of the ‘chosenproperty’ for the randomized networks toestimate a p-value.Investigatednetworkp-valueInvestigated Network After 1 exchange After 2 exchangesReference:Milo et al(2002,2004);Maslov &Sneppen (2002)BUT this null model does not account for phenotypic or functionalconstraints central to cellular networks!
Edge-randomization: Does not preserve mass balance constraintand unsuitable for metabolic networksASPT CITLfumasp-L citac oaanh4ASPT* CITL*fumasp-L citac oaanh4ASPT: asp-L → fum + nh4CITL: cit → oaa + acASPT*: asp-L → ac + nh4CITL*: cit → oaa + fumPreserves degree of each node in the network but generates fictitious reactionsthat violate mass, charge and atomic balance satisfied by real chemicalreactions!!Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.Null-model used inmany studiesincluding: Guimera &Amaral Nature (2005)Biochemically meaningless randomization inappropriate for metabolic networksWe decided to develop a proper null model for metabolic networks that accountsfor basic biochemical and functional constraints.
Global reaction set ensures mass balance constraintLimitation 1: Edge-randomization generatesrandomized reactions which violate mass and atomicbalance.Solution: We decided to overcome this problem bylimiting the set of reactions in random metabolicnetworks to those within KEGG database.KEGG Database + E. coli iJR904We have use a curateddatabase of 5870 massbalanced reactions derivedfrom KEGG by Rodrigues andWagner (2009).
Bit string representation of metabolic network6000900CKEGG Database E. coliR1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------101....The E. coli metabolic network or any random network of reactions within KEGGcan be represented as a bit string of length N = 5870 reactions with exactly nentries equal to 1 where n is the number of reactions in E. coli.Present - 1Absent - 0N = 5870 reactions within KEGG(or global reaction set)Contains n reactions(1‟s in the bit string)
Constraint-based Flux Balance Analysis (FBA): Determine thefunctional capability of metabolic networksList of metabolic reactions withstoichiometric coefficientsBiomass compositionSet of nutrients in theenvironmentFlux BalanceAnalysis (FBA)Growth rate forthe given medium‘Biomass Yield’Fluxes of allreactionsAdvantagesFBA does not require enzyme kinetic information which is not known for most reactions.DisadvantagesFBA cannot predict internal metabolite concentrations and is restricted to steady states.Basic models do not account for metabolic regulation.Reference: Varma and Palsson, Biotechnology (1994); Price et al (2004)InputGenotype: Set of Metabolic ReactionsEnvironment: Set of nutrients available for uptakeOutputPhenotype: Ability to grow
Metabolic Genotype to Phenotype Map011....001....GAGBxDeletedreactionFluxBalanceAnalysis(FBA)ViablegenotypeUnviablegenotypeBit string and equivalent networkrepresentation of genotypesMetabolic Genotypes Ability to synthesize Viabilitybiomass componentsGrowthNo GrowthLimitation 2: Edge-randomization does not account for biological function.Solution: Flux Balance Analysis can be used to determine functionality of metabolicnetworks.
Fraction of networks satisfying functional constraintsNumber of reactions in genotype2000 2200 2400 2600 2800Fractionofgenotypesthatareviable10010-510-1010-1510-20GlucoseAcetateSuccinateAnalytic prefactorAnswer:To estimate this fraction:• Generate random networks within KEGGwith exactly n reactions.• Determine the fraction of networks thatcan grow under glucose minimal mediausing FBA.Question: What fraction of possible metabolic networks within KEGG with exactly nreactions can grow under glucose minimal media like E. coli?Only a tiny fraction of possible networks satisfy functional constraints yet thenumber of functional networks is huge !Reference: Samal et al., BMC Systems Biology (2010)Space of possible networks with number of reactions equal to E. coli ~ 101113Space of possible networks with number of reactions equal to E. coli and desiredphenotype ~ 101050
Necessity for Markov Chain Monte Carlo (MCMC) SamplingSpace of possible networks ~ 101113Space of networks with desiredphenotype ~ 101050We wish to uniformly samplethe space of viable metabolicnetworks within KEGG havingthe same number of reactionsas E. coli.But one cannot sample theviable space by generatingrandom bit strings followed by atest for phenotype.We resort to Markov ChainMonte Carlo (MCMC) methodto sample the space ofnetworks satisfying functionalconstraints.
MCMC sampling of metabolic networks with growthphenotypeAccept/Reject Criterion of reaction swap:Accept if(a) No. of metabolites in the new network is less than or equal to E. coli(b) New network is able to grow on the specified environment(s)KEGG DatabaseR1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------10101..N = 5870 reactions within KEGGreaction universe11001..10110..11100..00101..E. coliAcceptRejectStep 1Swap SwapReject105 SwapsStep 2103 Swapsstore storeErasememory ofinitialnetworkSavingFrequencyA reaction swap can be decomposed into two mutations:a) First mutation leading to deletion of reactionb) Second mutation leading to addition of reactionA reaction swap preserves the number of reactions in the randomized network.
Randomizing metabolic networksWe have developed a new method using Markov Chain Monte Carlo (MCMC)sampling and Flux Balance Analysis (FBA) to generate meaningful randomizedensembles for metabolic networks by successively imposing constraints.Ensemble RGiven no. of validbiochemical reactionsEnsemble RMFix the no. of metabolitesEnsemble uRMExclude Blocked ReactionsEnsemble uRM-V1Functional constraint ofgrowth on specified environmentsIncreasinglevelofconstraintsConstraint IConstraint IIConstraint IIIConstraint IV+++Reference: Samal et al., BMC Systems Biology (2010); Samal and Martin, PLoS ONE (2011)
Metabolite degree distributionk1 10 100 1000P(k)10-610-510-410-310-210-1100E. coliRandomSampled random networks are scale-free like E. coli !Degree of a metabolite is the number of reactions in which the metaboliteparticipates in the network.
Clustering coefficient and Average Path LengthSampled random networks have small-worldand hierarchical architecture!R: Fixed no. of reactionsRM: Fixed no. of reactions andmetabolitesuRM: Fixed no. of unblocked reactionsand metabolitesuRM-V1: Fixed no. of unblocked reactionsand metabolites and viable in oneenvironmentuRM-V5: Fixed no. of unblocked reactionsand metabolites and viable in fiveenvironmentsuRM-V10: Fixed no. of unblocked reactionsand metabolites and viable in tenenvironments
Probability that a path exists between two nodes andSize of largest strong componentIn a directed network, the may exist path fromnode a to f but lack a path back from node f to a.Probability that a path exists between two nodes isan important quantity characterizing directednetworks.A strongly connected component is a maximal setof nodes such that for any pair of nodes a and b inthe set there is a directed path from a to b andfrom b to a.The size of largest strong component is animportant characteristic of directed networks.Sampled random networks have a bow-tie architecture similar to real network!
Global structural properties of real metabolic networks are aconsequence of simple biochemical and functional constraintsIncreasinglevelofconstraintsReference: Samal and Martin, PLoS ONE (2011)Conjectured by:A. Wagner (2007) andB. Papp et al. (2008)Biological function is the maindriver of large-scale structure inmetabolic networks.
Additional constraints reduce the space of possiblenetworksReference: Samal et al, BMC Systems Biology (2010)Ensemble REnsemble RmEnsemble uRmEnsemble uRm-V1Ensemble uRm-V5Ensemble uRm-V10Embedded sets like Russian dollsIncluding the viability constraint on the first chemical environment leads to areduction by at least a factor 1050
High level of genetic diversity in our randomized ensemblesAny two random networks in our most constrained ensemble uRM-v10 differ in~ 60% of their reactions.101....100....E. coli Random network inensemble uRM-V10versusHamming distance between the two networks is ~ 60% of the maximumpossible between two bit strings.101....100....Random network inensemble uRM-V10versusRandom network inensemble uRM-V10
Origins of Power law degree distribution: geneduplication and divergenceReference: Barabasi and Oltvai (2004)In Barabasi and Albert model, scale-free networksemerge through two basic mechanisms: growthand preferential attachment.It has been suggested that a combination of geneduplication and divergence can explain powerlaws observed in protein interaction networks andthis line of argument has been extended to explainpower laws in metabolic networks.
Reaction Degree DistributionNumber of substrates in a reaction0 2 4 6 8 10 12Frequency0500100015002000250030000 Currency1 Currency2 Currency3 Currency4 Currency5 Currency6 Currency7 CurrencyIn each reaction, we have counted the numberof currency and other metabolites.Most reactions involve 4 metabolites of which 2are currency metabolites and 2 are othermetabolites.The nature of reaction degree distribution is very different from the metabolitedegree distribution which follows a power law.The degree of a reaction is the number of metabolites that participate in it.The reaction degree distribution is bell shaped with typical reaction in the networkinvolving 4 metabolites.RatpAadpB
Scale-free versus Scale-rich network:Origin of Power laws in metabolic networksTanaka and Doyle have suggested a classification of metabolites into threecategories based on their biochemical roles(a) „Carriers‟ (very high degree)(b) „Precursors‟ (intermediate degree)(c) „Others‟ (low degree)Gene duplication and divergence at thelevel of protein interaction networks hasbeen suggested as an explanation forobserved power laws in metabolicnetworks.However, the presence of ubiquitous(high degree) „currency‟ metabolitesalong with low degree „other‟metabolites in each reaction canalternatively explain the observed powerlaws (or at least fat tail) in themetabolite degree distribution.Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)RatpAadpB
MCMC sampling method: A computational framework toaddress questions in evolutionary systems biologyReference: Papp et al (2008)
Genotype networks: A many-to-one genotype to phenotype mapGenotypePhenotypeRNA SequenceSecondary structureof pairingsfoldingReference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner (2007)In the context of RNA, genotype networks are commonly referred to as Neutral networks.We have studied the properties of the metabolic genotype-phenotype map using ourframework in detail.Protein SequenceThree dimensionalstructureGene networkGene expressionlevelsReference: Samal et al, BMC Systems Biology (2010)
Implications: Origins of Evolutionary Innovations “How do organisms maintain existing phenotypewhile exploring for new and better adaptedphenotypes?” Darwin‟s theory explains the survival of the fittestbut does not explain the arrival of the fittest. Our MCMC simulations show that: "Starting with thereference genotype, one can evolve to very differentgenotypes with gradual small changes whilemaintaining the existing phenotype“. Neighborhoods of different genotypes with a givenphenotype can have access to very different novelphenotypes.
Properties of Metabolic Genotype to Phenotype map Any one phenotype is adopted by a vast number of genotypes. These genotypes form large connected sets in genotype space and it ispossible to reach any genotype in such a connected set from any othergenotype by a series of small genotypic changes. Any one genotype network typically extends far through genotype space. Individual genotypes on any one genotype network typically havemultiple neighbors with the same phenotype. Put differently, phenotypesare to some extent robust to small changes in genotypes (such asmutations). Different neighborhoods of the same genotype network contain verydifferent novel phenotypes.
Floral Organ Specification (FOS) gene regulatory networkWe have developed a Markov Chain Monte Carlo (MCMC) method similar to thecase of metabolic networks to sample the space of functional gene regulatorynetworks.Phenotype: 10 steady-state attractors ofthe network whose expression statesspecify different cell types (shoot apicalmeristem, sepal, petal, stamen andcarpel).Genotype: Boolean Gene RegulatoryNetwork model for Arabidopsis FloralOrgan Specification network containing15 genes connected by 46 edges.Reference: Alvarez-Buylla et al, The Arabidopsis Book (2010)
Edge usage in real and sampled networks with functionalconstraintsReference: Henry, Moneger, Samal* and Martin*, Mol. Biosystems (2013)Arabidopsis network Sampled networks with phenotypeconstraints
Summary We have proposed a null model based on Markov Chain Monte Carlo (MCMC)sampling to generate benchmark ensembles with desired phenotype formetabolic networks. Our realistic benchmark ensembles can be used to distinguish between typicaland atypical properties of a network. We show that many large-scale structural properties of metabolic networks areby-products of functional or phenotypic constraints. Modularity in metabolic networks can arise due to phenotypic constraints ofgrowth in many different environments. Our framework can be used to address many questions in evolutionary systemsbiology.
AcknowledgementsAndreas Wagner João RodriguesJürgen JostOlivier Martin Adrien HenryFrançoise Monéger