Modularity in metabolic networks

Areejit Samal
LPTMS/CNRS, Univ Paris-Sud 11, Orsay, France
and
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Environmental versatility promotes modularity in
large-scale metabolic networks
International Conference on Mathematical and Theoretical Biology, Jan 23-27, 2012, Pune, India
Jan 24, 2012
In collaboration
with:
Andreas Wagner
University of Zurich
Olivier C. Martin
Université Paris-Sud

Modular Organization of Biological Systems
Spatial modularity:
Organisms are partitioned into organs
and tissues where cells of similar types
are in close proximity.
Topological modularity:
Modules represent subsets of nodes in a
network which are strongly connected to
each other but weakly connected to the
remaining network.
If nodes within a module tend to be involved in the same biological function, then both
spatial and topological modularity point to a general architectural principle:
The parts of an organism that perform specific tasks or functions are grouped
into modules that can function semi-autonomously.
For a review see: LH Hartwell et al, Nature (1999)

Motivation
 We ask if modularity in large-scale metabolic networks can arise as a by-
product of phenotypic constraints associated with the need to sustain life in
one or more chemical environments.
 An organism’s metabolism is highly versatile if it can sustain life in many
different chemical environments.
 Using recently developed techniques to randomly sample large numbers of
viable metabolic networks from a vast space of metabolic networks, we use
flux balance analysis to study insilico metabolic networks that differ in their
versatility.
 We then ask whether versatility affects the modularity of metabolic networks.

Outline
 Modeling Framework
 MCMC sampling of random viable genotypes
 Environmental versatility index and measures of modularity
 Results
 Conclusions and Outlook

Framework: Global Reaction Set
+
Biomass composition
The E. coli biomass
composition of iJR904
was used to determine
viability of genotypes.
Nutrient metabolites
The set of external
metabolites in iJR904
were allowed for
uptake/excretion in the
global reaction set.
KEGG Database + E. coli iJR904
5870 reactions
We can implement Flux
Balance Analysis (FBA)
within this database.

Genome-scale metabolic model reconstruction pipeline
Genome sequence with annotation
Biochemical Knowledge
Physiology
Genome-scale reconstructed metabolic network
AbbreviationOfficialName Equation (note [c] and [e] at the beginning refer to the compartment the reaction takes place in, cytosolic and extracellular respectively)Subsystem ProteinClassDescription
ALATA_L L-alanine transaminase [c]akg + ala-L <==> glu-L + pyr Alanine and aspartate metabolismEC-2.6.1.2
ALAR alanine racemase [c]ala-L <==> ala-D Alanine and aspartate metabolismEC-5.1.1.1
ASNN L-asparaginase [c]asn-L + h2o --> asp-L + nh4 Alanine and aspartate metabolismEC-3.5.1.1
ASNS2 asparagine synthetase [c]asp-L + atp + nh4 --> amp + asn-L + h + ppi Alanine and aspartate metabolismEC-6.3.1.1
ASNS1 asparagine synthase (glutamine-hydrolysing) [c]asp-L + atp + gln-L + h2o --> amp + asn-L + glu-L + h + ppiAlanine and aspartate metabolismEC-6.3.5.4
ASPT L-aspartase [c]asp-L --> fum + nh4 Alanine and aspartate metabolismEC-4.3.1.1
ASPTA aspartate transaminase [c]akg + asp-L <==> glu-L + oaa Alanine and aspartate metabolismEC-2.6.1.1
VPAMT Valine-pyruvate aminotransferase [c]3mob + ala-L --> pyr + val-L Alanine and aspartate metabolismEC-2.6.1.66
DAAD D-Amino acid dehydrogenase [c]ala-D + fad + h2o --> fadh2 + nh4 + pyr Alanine and aspartate metabolismEC-1.4.99.1
ALARi alanine racemase (irreversible) [c]ala-L --> ala-D Alanine and aspartate metabolismEC-5.1.1.1
FFSD beta-fructofuranosidase [c]h2o + suc6p --> fru + g6p Alternate Carbon Metabolism EC-3.2.1.26
A5PISO arabinose-5-phosphate isomerase [c]ru5p-D <==> ara5p Alternate Carbon Metabolism EC-5.3.1.13
MME methylmalonyl-CoA epimerase [c]mmcoa-R <==> mmcoa-S Alternate Carbon Metabolism EC-5.1.99.1
MICITD 2-methylisocitrate dehydratase [c]2mcacn + h2o --> micit Alternate Carbon Metabolism EC-4.2.1.99
ALCD19 alcohol dehydrogenase (glycerol) [c]glyald + h + nadh <==> glyc + nad Alternate Carbon Metabolism EC-1.1.1.1
LCADi lactaldehyde dehydrogenase [c]h2o + lald-L + nad --> (2) h + lac-L + nadh Alternate Carbon Metabolism EC-1.2.1.22
TGBPA Tagatose-bisphosphate aldolase [c]tagdp-D <==> dhap + g3p Alternate Carbon Metabolism EC-4.1.2.40
LCAD lactaldehyde dehydrogenase [c]h2o + lald-L + nad <==> (2) h + lac-L + nadh Alternate Carbon Metabolism EC-1.2.1.22
ALDD2x aldehyde dehydrogenase (acetaldehyde, NAD) [c]acald + h2o + nad --> ac + (2) h + nadh Alternate Carbon Metabolism EC-1.2.1.3
ARAI L-arabinose isomerase [c]arab-L <==> rbl-L Alternate Carbon Metabolism EC-5.3.1.4
RBK_L1 L-ribulokinase (L-ribulose) [c]atp + rbl-L --> adp + h + ru5p-L Alternate Carbon Metabolism EC-2.7.1.16
RBP4E L-ribulose-phosphate 4-epimerase [c]ru5p-L <==> xu5p-D Alternate Carbon Metabolism EC-5.1.3.4
ACACCT acetyl-CoA:acetoacetyl-CoA transferase [c]acac + accoa --> aacoa + ac Alternate Carbon Metabolism
BUTCT Acetyl-CoA:butyrate-CoA transferase [c]accoa + but --> ac + btcoa Alternate Carbon Metabolism EC-2.8.3.8
AB6PGH Arbutin 6-phosphate glucohydrolase [c]arbt6p + h2o --> g6p + hqn Alternate Carbon Metabolism EC-3.2.1.86
PMANM phosphomannomutase [c]man1p <==> man6p Alternate Carbon Metabolism EC-5.4.2.8
PPM2 phosphopentomutase 2 (deoxyribose) [c]2dr1p <==> 2dr5p Alternate Carbon Metabolism EC-5.4.2.7
List of metabolic reactions that can occur in an organism along with
gene-protein -reaction (GPR association)
Genome-scale reconstructed metabolic model
ACALDt acetaldehyde reversible transport acald[e] <==> acald[c] Transport, Extracellular
GUAt Guanine transport gua[e] <==> gua[c] Transport, Extracellular
HYXNt Hypoxanthine transport hxan[e] <==> hxan[c] Transport, Extracellular
XANt xanthine reversible transport xan[e] <==> xan[c] Transport, Extracellular
NACUP Nicotinic acid uptake nac[e] --> nac[c] Transport, Extracellular
ASNabc L-asparagine transport via ABC system asn-L[e] + atp[c] + h2o[c] --> adp[c] + asn-L[c] + h[c] + pi[c]Transport, Extracellular
ASNt2r L-asparagine reversible transport via proton symportasn-L[e] + h[e] <==> asn-L[c] + h[c] Transport, Extracellular
DAPabc M-diaminopimelic acid ABC transport 26dap-M[e] + atp[c] + h2o[c] --> 26dap-M[c] + adp[c] + h[c] + pi[c]Transport, Extracellular
CYSabc L-cysteine transport via ABC system atp[c] + cys-L[e] + h2o[c] --> adp[c] + cys-L[c] + h[c] + pi[c]Transport, Extracellular
ACt2r acetate reversible transport via proton symport ac[e] + h[e] <==> ac[c] + h[c] Transport, Extracellular
ETOHt2r ethanol reversible transport via proton symport etoh[e] + h[e] <==> etoh[c] + h[c] Transport, Extracellular
PYRt2r pyruvate reversible transport via proton symport h[e] + pyr[e] <==> h[c] + pyr[c] Transport, Extracellular
O2t o2 transport (diffusion) o2[e] <==> o2[c] Transport, Extracellular
CO2t CO2 transporter via diffusion co2[e] <==> co2[c] Transport, Extracellular
H2Ot H2O transport via diffusion h2o[e] <==> h2o[c] Transport, Extracellular
DHAt Dihydroxyacetone transport via facilitated diffusiondha[e] <==> dha[c] Transport, Extracellular
NH3t ammonia reversible transport nh4[e] <==> nh4[c] Transport, Extracellular
ARBt2r L-arabinose transport via proton symport arab-L[e] + h[e] <==> arab-L[c] + h[c] Transport, Extracellular
Exchange and transport reactions defining possible nutrients
Cellular Objective: Biomass Reaction
(0.05) 5mthf + (5.0E-5) accoa + (0.488) ala-L + (0.0010) amp + (0.281) arg-L + (0.229) asn-L + (0.229) asp-L + (45.7318) atp + (1.29E-4) clpn_EC + (6.0E-6) coa + (0.126) ctp + (0.087) cys-L + (0.0247)
datp + (0.0254) dctp + (0.0254) dgtp + (0.0247) dttp + (1.0E-5) fad + (0.25) gln-L + (0.25) glu-L + (0.582) gly + (0.154) glycogen + (0.203) gtp + (45.5608) h2o + (0.09) his-L + (0.276) ile-L + (0.428) leu-L +
(0.0084) lps_EC + (0.326) lys-L + (0.146) met-L + (0.00215) nad + (5.0E-5) nadh + (1.3E-4) nadp + (4.0E-4) nadph + (0.001935) pe_EC + (0.0276) peptido_EC + (4.64E-4) pg_EC + (0.176) phe-L + (0.21)
pro-L + (5.2E-5) ps_EC + (0.035) ptrc + (0.205) ser-L + (0.0070) spmd + (3.0E-6) succoa + (0.241) thr-L + (0.054) trp-L + (0.131) tyr-L + (0.0030) udpg + (0.136) utp + (0.402) val-L --> (45.5608) adp +
(45.56035) h + (45.5628) pi + (0.7302) ppi
Flux Balance Analysis (FBA) can be used to analyze the capabilities of these models.

Framework: Metabolic Genotype
Global Reaction Set List of catalyzed reactions
R1: 3pg + nad → 3php + h + nadh
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o
.
.
.
RN: ---------------------------------
1
0
1
.
.
.
.
1
1
0
.
.
.
.
1
1
1
.
.
.
.
G1 G2 G3
Genotypes G1 and G3 differ by a single reaction swap. This reaction swap can be decomposed into a
single reaction addition, which produces from G1 a new genotype G2, followed by a single reaction
deletion, which produces G3 from G2.
A metabolic genotype G is any subset of reactions taken from the global reaction
set of N reactions. Such a genotype G can be represented as a binary string of
length N with each reaction i either present (1) or absent (0).

Framework: Genotype-to-Phenotype Map
0
1
1
.
.
.
.
0
0
1
.
.
.
.
GA
GB
x
Deleted
reaction
Flux
Balance
Analysis
(FBA)
Viable
genotype
Unviable
genotype
Bit string and equivalent network
representation of genotypes
Genotypes Ability to synthesize Viability
biomass components
Growth
No Growth
For FBA, see Price et al, Nature Rev Micro (2004)

Definition: Genotype network
We denote by Ω(n) the vast space of metabolic genotypes with a given number n of
reactions. Even for modest n, Ω(n) is a huge space containing N!/[n! (N - n)!]
genotypes.
We refer to the set or space of viable genotypes within Ω(n) as V(n).
1
0
1
.
.
.
.
1
0
0
.
.
.
.
1
1
1
.
.
.
.
1
1
0
.
.
.
.
…………………………………………………………..…
Genotypes with n reactions:
Bit strings with exactly n entries 1.
Space of possible genotypes Ω(n)
Viable genotypes
V(n)
Unviable genotypes
The set of viable genotypes with fixed number of reactions n forms a genotype network or
neutral network

Viable genotypes occupy tiny fraction of the
genotype space
Number of reactions in genotype
2000 2200 2400 2600 2800
Fractionofgenotypesthatareviable
100
10-5
10-10
10-15
10-20
Glucose
Acetate
Succinate
Analytic prefactor
We tried to estimate the fraction of viable space by generating random genotypes with a
given number of reactions n and determining their phenotype.
Constraint of having super-
essential reactions
The convex curve indicates that the effect on
viability of successive reaction removals becomes
more and more severe.
The numerical estimation of the fraction
becomes extremely difficult for genotypes
with n<2100 reactions while typical
microbial metabolic networks contain 400-
1200 reactions.
1
0
1
.
.
.
.
1
0
0
.
.
.
.
1
1
1
.
.
.
.
1
1
0
.
.
.
.
……………………
Generate a sample of random bit strings with
exactly n entries 1.
Count the number of viable bit strings in the
sample.

Markov Chain Monte Carlo (MCMC) sampling of viable
genotypes
• MCMC allows one to uniformly
sample the tiny fraction of viable
space by generating a random walk
among viable genotypes, going from
one viable to another by performing a
reaction swap.
• If a swap leads to a non-viable
genotype, the walk stays at the
previous genotype for that step and
then the process is repeated.
G1 G3
G4
Viable genotype Unviable genotype
Random walkReaction swap
Genotype networks in metabolic reaction spaces
Areejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner*
BMC Systems Biology 2010, 4:30

MCMC sampling of viable genotypes
1
0
1
0
1
.
.
1
1
0
0
1
.
.
1
0
1
1
0
.
.
1
1
1
0
0
.
.
0
0
1
0
1
.
.
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store store
Erase
memory of
initial
network
Saving
Frequency
Accept/Reject Criterion of reaction swap:
New network is able to grow on the specified environment
Global Reaction Set
R1: 3pg + nad → 3php + h + nadh
R2: 6pgc + nadp → co2 + nadph + ru5p-D
R3: 6pgc → 2ddg6p + h2o
.
.
.
RN: ---------------------------------
The advantage of using reaction swaps in our approach is that it leaves the number of reactions constant
over time.

Environmental Versatility Index Venv
 MCMC sampling algorithm can be used to sample
the space of genotypes having a given phenotype.
 The desired phenotype is viability on a given set of
minimal environments.
 If the set of minimal environments, consists of Venv
environments, we designate the genotype’s
Environmental Versatility Index to Venv.
 Venv =1 refers to genotypes viable in one specific
environment; Venv =2 refers to genotypes viable in
two given environments; and so on.
 We have considered 89 minimal environments
whose sole carbon sources are their only
distinguishing feature.
Versatility
Venv
Number of
Sampled
genotypes
1 1000
2 1000
5 1000
10 1000
. .
. .
89 1000

Fully Coupled Set (FCS)
 A pair of reactions v1 and v2 are said to be fully coupled to each other if a nonzero flux
for v1 implies a proportionate (nonzero) flux for v2 in any steady state and vice versa.
 A fully coupled set (FCS) in a metabolic network is a maximal set of reactions that are
mutually fully coupled to each other. Such sets have been referred to as
reaction/enzyme subsets or correlated reaction sets in the literature.
 FCSs define functional modules in metabolic network which have been shown to be
biochemically sensible.
 Determining all FCSs in a large metabolic network can be done efficiently using linear
optimization.

Properties of Fully Coupled Sets (FCSs)
 In steady state, each FCS can be replaced by a single effective reaction. FCSs
can be used to coarse-grain metabolic networks.
 It has been shown that if one reaction in a FCS is essential, the complete FCS is
essential (Samal et al., BMC Bioinformatics (2006)).
 Since, FCSs guarantee flux proportionality among its constituent reactions, FCSs
are more relevant for unveiling functional coupling than graph-based measures
(Samal et al., BMC Bioinformatics (2006), Notebaart et al., PLoS Comp Bio
(2008)).
 FCS can be branched and contains cycles.
Example of a FCS in the E. coli metabolic
network that is branched and contains a
cycle: Green (red) edges represent
irreversible (reversible) reactions. The
reactions in this FCS are involved in cell
envelope biosynthesis.

Functional measures of modularity in metabolic networks
We defined two indices of metabolic
network modularity based on Fully
Coupled Sets (FCSs) of reactions in a
metabolic network.
 M : the number of reactions in a
metabolic network that belong to
different FCSs (modules).
 s : the number of FCSs (modules) in
a metabolic network.
Versatility
Venv
Number of
Sampled
genotypes
<M> <s>
1 1000 . .
2 1000
5 1000
10 1000
. .
. .
89 1000 . .

Modularity index M increases with versatility
Venv
1 5 10 20 30 40 50 70 90
<M>
240
250
260
270
280
290
300
310
320
Average over different nested sets
(a)
The data shown here are
based on MCMC sampled
genotypes with number
n=831 reactions (same as
in E. coli metabolic model.
The Environmental Versatility Index Venv denotes the number of distinct environments in
which a genotype is viable.
The modularity index M for a genotype gives the number of reactions contained in the FCSs
(modules) of that genotype.

Modularity index s with environmental versatility
Venv
12 5 10 20 30 40 50 70 89
<s>
70
75
80
85
90
95
100
Average over different nested sets
(b)
The data shown here are
based on MCMC sampled
genotypes with number
n=831 reactions (same as
in E. coli metabolic model.
The Environmental Versatility Index Venv denotes the number of distinct environments in
which a genotype is viable.
The modularity index s for a genotype gives the number of FCSs (modules) of that
genotype.

Environmental versatility enhances M and s regardless of
the value of number n of reactions in a genotype
Venv
1 5 10 20 30 40
<M>
190
200
210
220
230
240
250
260
270
Average over different orders
Venv
1 5 10 20 30 40
<M>
220
230
240
250
260
270
280
(a)
(b)
Venv
1 5 10 20 30 40
<s>
45
50
55
60
65
70
Venv
1 5 10 20 30 40
<s>
60
62
64
66
68
70
72
74
76
78
(a)
(b)
Modularity index M Modularity index s
n = 500
n = 700

Higher values of M and s are by-products of increasing versatility
We next wished to check if the additional
reactions accounting for the increase in
modularity would be closely linked to the
additional nutrients that metabolic networks
must utilize as their versatility increases.
These additional reactions and the modules
they reside could occur just downstream of
the nutrients.
The notion of downstream can be made
quantitative through the Scope or Network
expansion algorithm.

Network Expansion or Scope algorithm
Reference: Handorf, Ebenhöh and Heinrich J Mol Evol (2005)
This algorithm is based on principle of forward evolution
The scope distance of a reaction from the nutrient metabolites in the seed set is
defined as the iteration number of the step when that particular reaction is first
selected by the algorithm.
Distance 1 Distance 2 Distance 3Distance 0

The increase in modularity with environmental versatility can be
attributed to reactions that are close to nutrients
Global reaction set
Scope distance from nutrient metabolites
0 5 10 15 20 25
Frequency
0
100
200
300
400
500
(a) Reactions that account for increase in M
Scope distance from nutrient metabolites
0 5 10 15 20 25
Frequency
0
5
10
15
20
25
30
(b)
Distribution of scope distances from
the nutrients for all possible
reactions.
Distribution of scope distances from
nutrients for those reactions in additional
FCSs that differentiate the sampled
genotypes at Venv=89 from those at Venv =1.
Using Kolmogorov–Smirnov test (K–S test) we could reject the hypothesis that
the two distributions are same at a p-value < 0.00001.

Example of FCSs that account for the increase in
modularity with versatility
These FCSs metabolize the nutrients:
fucose, rhamnose and 3-hydroxycinnamic acid.

Distribution of M for genotypes in our most constrained
ensemble and comparison with E. coli
M
260 280 300 320 340 360 380
Frequency
0
50
100
150
200
250
300
E. coli
(a)
The histogram of modularity M for a sample of 1000 genotypes with Venv =89
different minimal environments. The estimate of M for E. coli is also displayed.
We can reject the hypothesis that the modularity index M of E. coli is drawn from
this same distribution.

Distribution of s for genotypes in our most constrained
ensemble and comparison with E. coli
The histogram of s for a sample of 1000 genotypes with Venv =89 different minimal
environments. The estimate of s for E. coli is also displayed.
We see that the number of FCSs in E. coli is just slightly above the position of the
distribution's peak in our ensemble. Thus, the atypically high M of the E. coli network
does not stem from its having more modules than typical networks allowing growth
on 89 carbon sources.
s
70 80 90 100 110 120
Frequency
0
50
100
150
200
250
300
350
E. coli
(b)

Atypically large FCSs near the output end of the E. coli
metabolic network
Samal et al, BMC Bioinformatics (2006); Private Gallery
Pink metabolites are part of
Biomass reaction
FCSs and operons are
positively associated
in E. coli.

Conclusions
• We find that more versatile networks are more modular and also have more
modules than less versatile networks.
• The reactions that form part of newly arising modules in highly versatile
networks tend to be close to reactions that process nutrients.
• E. coli metabolic network is significantly more modular than random networks
in our most constrained ensemble.
• Since versatility corresponds to viability in increasing numbers of
environments, it can be considered as a trait associated with fitness itself.
• Our work supports a scenario for the origin of modularity in which increasing
functional constraints can make modularity emerge.

Two scenarios for the evolution of modularity
• Modularity might result from directional selection favouring change in one trait
while stabilizing selection maintains other traits unchanged.
This scenario was shown to be true in model gene regulatory networks.
• Modular fluctuations in evolutionary goals can be sufficient to produce and
maintain modularity. In this scenario, modularity will be lost once there are no
fluctuations in the evolutionary goals.

Empirical studies show that generalist prokaryotes are more
modular than specialists
Empirical observations by groups of Uri Alon
(Parter et al, BMC Evol Bio (2009)) and Eytan
Ruppin (Kreimer et al, PNAS (2008)) where
generalists prokaryotes living in many different
environments are more modular than specialists
are fully consistent with our conclusion.
Note that the size of
the networks increases
with environmental
variability in these
studies.
Our samples of random viable networks have not
been subject to unknown selection pressures
unlike metabolism of real organisms.

Functional constraints lead to emergence of modularity
• Our results support the scenario proposed can explain the
emergence of modularity in both non-fluctuating and
fluctuating environment scenario.
• The main requirement for emergence is an increase in the
number of functions that a network performs.
The intersection of genotype
spaces for two different
environments contains
modular networks.
Modularity is ultimately a
property of genotype spaces
(or the genotype-phenotype
map).
Uri Alon’s PictureOur Picture

Global structural properties of real metabolic networks:
Consequence of simple biochemical and functional constraints
Increasinglevelofconstraints
Biological
function is a
main driver of
structure in
metabolic
networks
• Power law degree distribution
• Small Average Path Length
• High Clustering coefficient
• Bow-tie architecture

Acknowledgement
CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG
GDRE 513) for a Postdoctoral Fellowship.
Reference
Environmental versatility promotes modularity in genome-scale metabolic networks
Areejit Samal, Andreas Wagner* and Olivier C Martin*
Funding
Andreas Wagner
University of Zurich
Olivier C. Martin
Université Paris-Sud
Federation of European Biochemical Societies (FEBS) for a Fellowship.

Modularity in metabolic networks

Recommended

Recommended

More Related Content

What's hot

What's hot (6)

Similar to Modularity in metabolic networks

Similar to Modularity in metabolic networks (20)

More from Areejit Samal

More from Areejit Samal (7)

Recently uploaded

Recently uploaded (20)

Modularity in metabolic networks