DISCOVERY AND EXPLORATION
OF CONSERVED MODULES OF
CHEMICAL TRANSFORMATIONS
IN METABOLISM
MARIA SOROKINA
3 FEBRUARY 2016
Doctoral school (École doctorale)
Structure et Dynamique des Systèmes Vivants
What is metabolism?
Metabolism is the set of biochemical processes by which
living organisms stay alive, grow, reproduce and
interact with their environment
« Μεταβολή » (metabolē), Greek: change, transformation
Chemical transformations mainly concern small molecules,
the metabolites, which are modified by (bio)chemical reactions
Successive reactions leading to the production or degradation of a target metabolite are
described as metabolic pathways
Biochemical reactions are often catalysed by enzymes: proteins encoded in the
organism's genome that facilitate specific reactions
Diagram labels: Genome, Enzymes, Reactions, Transforming metabolites
Presentation Outline
Introduction: From Genome to Metabolism
Part I: Orphan Enzymes
Part II: Reaction Molecular Signature Network and Conserved Modules
Part III: Combining Genomic and Metabolic Contexts
Conclusions & Perspectives
From Genome To Metabolism
Steps: sequencing, finding CDS (protein-coding genes), functional annotation
Diagram labels: Genome, Enzymes, Reactions, Transforming metabolites
Functional Annotation
Assigning a biological function to a protein
«  Through experimentation (high confidence)
«  Homology detection through sequence similarity (BLAST, …)
«  Genomic context
«  Protein structural analysis
«  Rule-based annotation systems
«  Community annotation systems
From Genome To Metabolism
Steps: sequencing, finding CDS (protein-coding genes), functional annotation, metabolism reconstruction
Diagram labels: Genome, Enzymes, Reactions, Transforming metabolites
Representing The Metabolism
Models for structural analysis
«  Networks
Models for flow analyses in metabolism
«  Flux Balance Analysis
«  Capacitance analysis
Models for dynamic analysis
«  Models including reaction kinetics
Metabolic Networks
Toy example: part of the Escherichia coli metabolism
Bipartite network of metabolites and reactions
o  Nodes = metabolites and reactions
o  Metabolite nodes: Pyruvate, Formate, Acetyl-CoA, Acetaldehyde, Ethanol, Coenzyme A, NADH, NAD+
o  Reaction nodes: Reaction 2.3.1.54, Reaction 1.2.1.10, Reaction 1.1.1.1
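A minimal sketch of the bipartite representation for the toy example. The slide only names the nodes; the substrate and product lists below are an assumption (the three reactions written in the fermentative direction, protons omitted), and `REACTIONS` and `bipartite_edges` are illustrative names.

```python
# Toy bipartite network: metabolite nodes and reaction nodes.
# ASSUMPTION: substrates/products are written in the fermentative direction.
REACTIONS = {
    "2.3.1.54": (["Pyruvate", "Coenzyme A"], ["Acetyl-CoA", "Formate"]),
    "1.2.1.10": (["Acetyl-CoA", "NADH"], ["Acetaldehyde", "Coenzyme A", "NAD+"]),
    "1.1.1.1": (["Acetaldehyde", "NADH"], ["Ethanol", "NAD+"]),
}


def bipartite_edges(reactions):
    """Return directed edges: substrate -> reaction and reaction -> product."""
    edges = []
    for rxn, (substrates, products) in reactions.items():
        edges += [(s, rxn) for s in substrates]
        edges += [(rxn, p) for p in products]
    return edges


edges = bipartite_edges(REACTIONS)
# Every node that is not a reaction identifier is a metabolite node.
metabolites = {n for edge in edges for n in edge if n not in REACTIONS}
print(len(metabolites), "metabolite nodes,", len(REACTIONS), "reaction nodes")
```

In this representation an edge never links two metabolites or two reactions directly, which is what makes the network bipartite.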
Metabolic Networks
Metabolite network
o  Nodes = metabolites
o  Edge between two nodes if there is a reaction where one of the metabolites is a substrate and the other is a product
o  Nodes in the toy example: Pyruvate, Formate, Acetyl-CoA, Acetaldehyde, Ethanol, Coenzyme A, NADH, NAD+
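The metabolite network can be sketched as a projection of the reactions onto substrate/product pairs. As before, the stoichiometry of the toy reactions is an assumption, and `metabolite_network` is an illustrative name.

```python
from itertools import product

# ASSUMPTION: toy reactions written in the fermentative direction.
REACTIONS = {
    "2.3.1.54": (["Pyruvate", "Coenzyme A"], ["Acetyl-CoA", "Formate"]),
    "1.2.1.10": (["Acetyl-CoA", "NADH"], ["Acetaldehyde", "Coenzyme A", "NAD+"]),
    "1.1.1.1": (["Acetaldehyde", "NADH"], ["Ethanol", "NAD+"]),
}


def metabolite_network(reactions):
    """Link every substrate of a reaction to every product of that reaction."""
    edges = set()
    for substrates, products in reactions.values():
        edges.update(product(substrates, products))
    return edges


met_edges = metabolite_network(REACTIONS)
for source, target in sorted(met_edges):
    print(source, "->", target)
```

Cofactor pairs such as NADH and NAD+ end up linked by several reactions, which already hints at the ubiquitous compounds problem.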
Metabolic Networks
Reaction network
o  Nodes = reactions
o  Edge between two nodes if a metabolite produced by one reaction is a substrate of the other reaction
o  Nodes in the toy example: Reaction 2.3.1.54, Reaction 1.2.1.10, Reaction 1.1.1.1
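A small sketch of the reaction network for the same toy example: a directed edge goes from one reaction to another when a product of the first is a substrate of the second. The reaction directions are an assumption and `reaction_network` is an illustrative name.

```python
# ASSUMPTION: toy reactions written in the fermentative direction.
REACTIONS = {
    "2.3.1.54": (["Pyruvate", "Coenzyme A"], ["Acetyl-CoA", "Formate"]),
    "1.2.1.10": (["Acetyl-CoA", "NADH"], ["Acetaldehyde", "Coenzyme A", "NAD+"]),
    "1.1.1.1": (["Acetaldehyde", "NADH"], ["Ethanol", "NAD+"]),
}


def reaction_network(reactions):
    """Map each directed edge (r1, r2) to the metabolites that justify it."""
    edges = {}
    for r1, (_, products) in reactions.items():
        for r2, (substrates, _) in reactions.items():
            shared = set(products) & set(substrates)
            if r1 != r2 and shared:
                edges[(r1, r2)] = shared
    return edges


rxn_edges = reaction_network(REACTIONS)
for (r1, r2), shared in sorted(rxn_edges.items()):
    print(r1, "->", r2, "via", sorted(shared))
```

Keeping the shared metabolites on each edge makes it easy to see which links rest only on a cofactor.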
Metabolic Networks
Enzyme network
o  Nodes = enzymes
o  Edge between two nodes if a metabolite produced by one enzyme is a substrate of the other enzyme
o  Nodes in the toy example: Pyruvate formate lyase, Acetaldehyde dehydrogenase, Alcohol dehydrogenase
o  Limitations:
o  An enzyme can catalyse several reactions
o  A reaction can be catalysed by several enzymes
o  Incomplete knowledge of enzymes (orphan enzymes)
Metabolic Networks
Ubiquitous compounds problem
CO2, ATP/ADP, H2O, H+, NAD(P)+/NAD(P)H, …
They create large hubs in the metabolic network
→ need to take them into account!
“Primary” and “secondary” metabolites
in reactions in pathways
Ubiquitous: existing or being everywhere,
especially at the same time; omnipresent
Hub: a highly connected node in a graph
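One common way to take ubiquitous compounds into account is to ignore them when linking reactions. The sketch below is only an illustration of that idea, not necessarily the treatment used in this work. The `UBIQUITOUS` set is an assumption: it follows the compounds listed above and additionally includes Coenzyme A for the sake of the toy example.

```python
# ASSUMPTION: toy reactions written in the fermentative direction.
REACTIONS = {
    "2.3.1.54": (["Pyruvate", "Coenzyme A"], ["Acetyl-CoA", "Formate"]),
    "1.2.1.10": (["Acetyl-CoA", "NADH"], ["Acetaldehyde", "Coenzyme A", "NAD+"]),
    "1.1.1.1": (["Acetaldehyde", "NADH"], ["Ethanol", "NAD+"]),
}

# ASSUMPTION: hypothetical list; Coenzyme A is added for illustration only.
UBIQUITOUS = {"CO2", "ATP", "ADP", "H2O", "H+", "NAD+", "NADH", "Coenzyme A"}


def reaction_edges(reactions, ignore=frozenset()):
    """Directed reaction-to-reaction edges, skipping the ignored metabolites."""
    edges = set()
    for r1, (_, products) in reactions.items():
        for r2, (substrates, _) in reactions.items():
            shared = (set(products) & set(substrates)) - set(ignore)
            if r1 != r2 and shared:
                edges.add((r1, r2))
    return edges


all_edges = reaction_edges(REACTIONS)
filtered_edges = reaction_edges(REACTIONS, ignore=UBIQUITOUS)
print("edges supported only by ubiquitous compounds:", sorted(all_edges - filtered_edges))
```

Here the only edge that disappears is the one from Reaction 1.2.1.10 back to Reaction 2.3.1.54, which was supported by Coenzyme A alone.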
Main difficulties in metabolic network reconstruction
from whole genomes:
« Gene functional annotation issues
   >60% of functional annotations in UniProt may be erroneous (2009)
« Orphan enzymes
Part I
Orphan Enzymes
What is an orphan enzyme?
An “orphan enzyme activity” (or “orphan enzyme” for short) is a known
biochemical activity for which there is no associated sequence (yet)
Orphan enzymes
2004: Karp: Call for an enzyme genomics initiative. (38% of orphan enzymes)
2005: Lespinet & Labedan: Orphan enzymes? (42% of orphan enzymes)
2006: Lespinet & Labedan: ORENZA database. (36% of orphan enzymes)
2007: Chen & Vitkup: Distribution of orphan metabolic activities. (34% of orphan enzymes)
2007: Pouliot & Karp: A survey of orphan enzyme activities. (34% of orphan enzymes)
Orphan enzymes
>5,000 enzymatic activities (IUBMB EC numbers): 22% orphan enzymes, 78% with an associated sequence
Enzyme Commission (EC) number:
Official classification of enzyme activities
Four fields: reaction class . metabolite type . reaction nature . serial number
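As a small illustration of the four-field structure, an EC number can be split into its parts. The field names follow the labels on the slide; `ECNumber` and `parse_ec` are illustrative names, and the fields are kept as strings so that incomplete numbers such as `1.1.1.-` remain representable.

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    reaction_class: str
    metabolite_type: str
    reaction_nature: str
    serial_number: str


def parse_ec(ec: str) -> ECNumber:
    """Split an EC number such as '1.2.1.10' into its four fields."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return ECNumber(*parts)


ec = parse_ec("2.3.1.54")
print(ec.reaction_class, ec.serial_number)
```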
Enzyme activities and annotated proteins over the years
Limited number of recently discovered activities
[Figure: timeline labelled with protein sequencing, DNA sequencing, expression cloning, genomics]
Enzyme discovery and protein families
>14,000 protein families (Pfam): 23% of unknown function, 77% of known function
>5,000 enzymatic activities (IUBMB EC numbers): 22% orphan enzymes, 78% with an associated sequence
Newly discovered enzymatic activities are mostly associated with already
known enzyme families
Local Orphan Enzymes
Enzymatic activities that have been observed
in at least one organism of a given clade and
that have an associated sequence in another clade
but not in this one

Local orphan EC numbers                         Archaea   Bacteria   Eukaryotes
Total number of concerned EC numbers            79        133        299
% of EC retrieved with PRIAM                    30%       30%        59%
(significant hit with a detected protein)
Main difficulties in metabolic network reconstruction
from whole genomes:
« Gene functional annotation issues
« Orphan enzymes
« Lack of knowledge on organism metabolic diversity
Part II
Reaction Molecular Signature Networks and Conserved
Modules
Reaction network: 5,830 nodes, 11,197 edges
Reactions from model organisms: 57% of nodes, 83% of edges
✴  E. coli
✴  B. subtilis
✴  S. cerevisiae
✴  H. sapiens
✴  A. thaliana
✴  D. melanogaster
Without them: 57% of nodes and 83% of edges suppressed
Lack of knowledge about metabolic diversity in non-model organisms
What strategy can be adopted to counter this lack of knowledge?
All main hypotheses on metabolic pathway evolution agree on the
importance of enzyme promiscuity, i.e. the capacity of enzymes to catalyse
one or several reactions on more or less different substrates…
…so we should look at the conservation of chemical transformations in
pathways, and not only at the conservation of enzymatic reactions
Reactions and chemical transformation types
Dehydrogenation
How to represent molecules, reactions and
their chemical transformation types?
Representing Molecules
Need to be able to describe
molecular substructures and their
properties
Molecular Signatures
Molecular signature
set of sub-graphs of given diameter (height) centered on each atom of the
molecule
Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling
53(4), 887–97 (2013)
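A simplified sketch of the idea, not the stereo signature implementation cited above: each atom is described by the tree of its neighbours down to a given height, and the molecular signature counts these atom descriptions. Stereochemistry, bond orders and the canonical ordering of the real descriptor are ignored (neighbours are simply sorted alphabetically), and `atom_signature` and `molecular_signature` are illustrative names.

```python
from collections import Counter


def atom_signature(symbols, adjacency, atom, height, parent=None):
    """Nested string describing the neighbourhood of one atom up to `height`."""
    label = f"[{symbols[atom]}]"
    if height == 0:
        return label
    children = sorted(
        atom_signature(symbols, adjacency, nb, height - 1, parent=atom)
        for nb in adjacency[atom]
        if nb != parent  # do not walk straight back to the previous atom
    )
    return label + (f"({''.join(children)})" if children else "")


def molecular_signature(symbols, bonds, height):
    """Count the atom signatures of a molecule given as symbols and bonds."""
    adjacency = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return Counter(
        atom_signature(symbols, adjacency, i, height) for i in range(len(symbols))
    )


# Water: one oxygen bonded to two hydrogens.
water_symbols, water_bonds = ["O", "H", "H"], [(0, 1), (0, 2)]
for h in (0, 1, 2):
    print(h, dict(molecular_signature(water_symbols, water_bonds, h)))
```

At height 0 the signature is just the atom count; increasing the height makes each atom description more specific to its chemical environment.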
How To Represent Reactions And Their Chemical
Transformation Type?
Reaction molecular signature (RMS)
difference between the molecular signatures of the products and of the substrates of the
reaction
… specifically, only the substructures that change are kept, which gives a way to encode the chemical
transformation
Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling
53(4), 887–97 (2013)
Reaction molecular signature (RMS):
★  At height 0, the RMS is null (all atoms are subtracted)
★  Height 1 RMS:
-1.0*[O]([C][P])
1.0*[O]([H][C])
-1.0*[O]([H][H])
1.0*[O]([H][P])
0.0
★  Height 2 RMS:
1.0*[C@@]([H][C@@]([H][C][O])[C@@]([H][C@][O])[O]([H]))
1.0*[C@@]([H][C@]([H][C@@][O])[C@@]([H][C@@][O])[O]([H]))
1.0*[C@@]([H][C@]([H][C@@][O])[O]([H])[O]([C@@]))
-1.0*[C@@]([H][C@]([H][C@][O])[C@@]([H][C@][O])[O]([H]))
-1.0*[C@@]([H][C@]([H][O][O])[C@@]([H][C@][O])[O]([H]))
1.0*[C@@]([H][C]([H][H][O])[C@@]([H][C@@][O])[O]([C@@]))
1.0*[C@]([H][C@@]([H][C@@][O])[C@@]([H][O][O])[O]([H]))
-1.0*[C@]([H][C@@]([H][C@@][O])[O]([H])[O]([C@]))
-1.0*[C@]([H][C@]([H][C][O])[C@@]([H][C@@][O])[O]([H]))
-1.0*[C@]([H][C]([H][H][O])[C@]([H][C@@][O])[O]([C@]))
1.0*[C]([H][H][C@@]([H][C@@][O])[O]([H]))
-1.0*[C]([H][H][C@]([H][C@][O])[O]([P]))
1.0*[H]([C@@]([C@@][C@@][O]))
-1.0*[H]([C@@]([C@][C@@][O]))
1.0*[H]([C@@]([C@][O][O]))
1.0*[H]([C@@]([C][C@@][O]))
1.0*[H]([C@]([C@@][C@@][O]))
-1.0*[H]([C@]([C@@][O][O]))
-1.0*[H]([C@]([C@][C@@][O]))
-1.0*[H]([C@]([C][C@][O]))
2.0*[H]([C]([H][C@@][O]))
-2.0*[H]([C]([H][C@][O]))
1.0*[H]([O]([C@@]))
-1.0*[H]([O]([C@]))
1.0*[H]([O]([C]))
-2.0*[H]([O]([H]))
1.0*[H]([O]([P]))
1.0*[O]([C@@]([H][C][C@@])[C@@]([H][C@][O]))
-1.0*[O]([C@]([H][C][C@])[C@]([H][C@@][O]))
-1.0*[O]([C]([H][H][C@])[P]([O][O]=[O]))
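The subtraction can be sketched on a small stand-in reaction. The hydrolysis of methyl phosphate (methyl phosphate + water → methanol + phosphate) is an assumption chosen for illustration, not the reaction shown on the slide, and the signature code is a simplified version without stereochemistry or bond orders. Neighbours are sorted alphabetically, so `[O]([H][C])` on the slide appears here as `[O]([C][H])`.

```python
from collections import Counter


def atom_signature(symbols, adjacency, atom, height, parent=None):
    label = f"[{symbols[atom]}]"
    if height == 0:
        return label
    children = sorted(
        atom_signature(symbols, adjacency, nb, height - 1, parent=atom)
        for nb in adjacency[atom]
        if nb != parent
    )
    return label + (f"({''.join(children)})" if children else "")


def molecular_signature(symbols, bonds, height):
    adjacency = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return Counter(
        atom_signature(symbols, adjacency, i, height) for i in range(len(symbols))
    )


def reaction_signature(substrates, products, height):
    """Signatures of products minus signatures of substrates, zeros dropped."""
    rms = Counter()
    for symbols, bonds in products:
        rms.update(molecular_signature(symbols, bonds, height))
    for symbols, bonds in substrates:
        rms.subtract(molecular_signature(symbols, bonds, height))
    return {sig: n for sig, n in rms.items() if n != 0}


# ASSUMPTION: hypothetical example molecules as (atom symbols, bonds).
methyl_phosphate = (
    ["C", "H", "H", "H", "O", "P", "O", "O", "O", "H", "H"],
    [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6), (5, 7), (5, 8), (7, 9), (8, 10)],
)
water = (["O", "H", "H"], [(0, 1), (0, 2)])
methanol = (["C", "H", "H", "H", "O", "H"], [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)])
phosphate = (
    ["P", "O", "O", "O", "O", "H", "H", "H"],
    [(0, 1), (0, 2), (0, 3), (0, 4), (2, 5), (3, 6), (4, 7)],
)

rms0 = reaction_signature([methyl_phosphate, water], [methanol, phosphate], 0)
rms1 = reaction_signature([methyl_phosphate, water], [methanol, phosphate], 1)
print("height 0:", rms0)
print("height 1:", rms1)
```

At height 0 every atom cancels out, so the RMS is empty. At height 1 only the four oxygen environments touched by the hydrolysis remain, the same four terms as the height 1 RMS listed above.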
RMS group reactions according to the type of chemical
transformation they perform
Roadmap (figure labels): Reaction, Molecular signature, Reaction molecular signature, Reaction network
«  Nodes represent reactions
«  Two nodes are linked by a
directed edge if there is a
metabolite produced by the first
reaction that is consumed by the
second reaction
«  5,830 nodes
«  11,197 edges
Roadmap (figure labels): Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network
Transformation of a reaction network into an RMS network
Order-1 Markov chain transition probabilities between connected RMS (RMSi and RMSj)
Reaction network: 5,830 nodes, 11,197 edges
RMS network: 3,365 nodes, 8,721 edges
Node reduction rate: 0.57 (×1 → ×0.57)
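A hedged sketch of how a reaction network could be collapsed into an RMS network. The reactions, the RMS labels and the estimation of the order-1 transition probabilities from raw edge counts are all assumptions made for illustration; the slides do not spell out the exact procedure.

```python
from collections import Counter, defaultdict

# ASSUMPTION: hypothetical reaction edges and reaction-to-RMS assignment.
reaction_edges = [("R1", "R2"), ("R3", "R2"), ("R2", "R4"), ("R2", "R5"), ("R1", "R5")]
rms_of = {"R1": "RMS_A", "R3": "RMS_A", "R2": "RMS_B", "R4": "RMS_C", "R5": "RMS_C"}


def rms_network(edges, rms_of):
    """Merge reactions sharing an RMS and estimate P(RMSj | RMSi) per edge."""
    counts = Counter((rms_of[a], rms_of[b]) for a, b in edges)
    outgoing = defaultdict(int)
    for (source, _), n in counts.items():
        outgoing[source] += n
    return {
        (source, target): n / outgoing[source]
        for (source, target), n in counts.items()
    }


transitions = rms_network(reaction_edges, rms_of)
for (source, target), p in sorted(transitions.items()):
    print(f"{source} -> {target}: {p:.2f}")
```

In this toy case five reaction nodes collapse into three RMS nodes, and the outgoing probabilities of each RMS node sum to 1.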
Roadmap (figure labels): Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network, Search and analysis of conserved paths (path conservation metrics)
Pathway conservation index (PCI)
✴  Computed for each RMS path present in at least one known metabolic pathway
✴  Represents the number of corresponding reaction paths that are present in at
least one MetaCyc pathway
… captures the chemical redundancy across the known metabolism
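A minimal sketch of the PCI under stated assumptions: pathways are treated as linear lists of reactions, paths have 3 nodes (length 2), and the PCI of an RMS path is the number of distinct reaction paths from known pathways that map onto it. The pathway names, reactions and RMS labels are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical linear pathways and reaction-to-RMS assignment.
pathways = {
    "PWY-A": ["R1", "R2", "R3"],
    "PWY-B": ["R4", "R5", "R6"],
    "PWY-C": ["R1", "R2", "R7"],
}
rms_of = {
    "R1": "RMS_x", "R4": "RMS_x",
    "R2": "RMS_y", "R5": "RMS_y",
    "R3": "RMS_z", "R6": "RMS_z",
    "R7": "RMS_w",
}


def pathway_conservation_index(pathways, rms_of, n_nodes=3):
    """Count, for each RMS path, the distinct reaction paths that realise it."""
    reaction_paths = defaultdict(set)
    for steps in pathways.values():
        for i in range(len(steps) - n_nodes + 1):
            reaction_path = tuple(steps[i:i + n_nodes])
            rms_path = tuple(rms_of[r] for r in reaction_path)
            reaction_paths[rms_path].add(reaction_path)
    return {rms_path: len(paths) for rms_path, paths in reaction_paths.items()}


pci = pathway_conservation_index(pathways, rms_of)
for rms_path, value in sorted(pci.items()):
    print(" -> ".join(rms_path), "PCI =", value)
```

Here the path RMS_x → RMS_y → RMS_z is realised by two different reaction paths, so it would qualify as conserved under a threshold of PCI >= 2.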
Beta-oxidation module: PCI = 14
(conserved in 14 pathways)
Aldoxime biosynthesis: PCI = 7
(conserved in 7 pathways)
Pathway conservation index (PCI)
for all MetaCyc pathways
Paths of length 2 & PCI >= 2: 365 conserved modules
Previous study: Muto et al. (J. Chem. Inf. Model., 2013) identified 34 conserved modules

Pathway type     MetaCyc pathways with conserved modules
Biosynthesis     263 (42%)
Degradation      172 (47%)
Detox            3 (27%)
Energy           61 (78%)
Other            19 (33%)
All              518 (46%)
Conservation of all RMS paths in the network
Path enumeration in the RMS network
>72,000 paths of length 2 (2 edges and 3 nodes)
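The enumeration of paths of length 2 (2 edges and 3 nodes) in a directed network can be sketched as follows. The toy edges are hypothetical, and requiring the three nodes to be distinct is an assumption, since the slides do not say how paths that revisit a node are handled.

```python
def paths_of_length_two(edges):
    """List all directed paths a -> b -> c with three distinct nodes."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    return sorted(
        (a, b, c)
        for a, mids in successors.items()
        for b in mids
        for c in successors.get(b, ())
        if len({a, b, c}) == 3  # ASSUMPTION: skip paths that revisit a node
    )


# ASSUMPTION: hypothetical toy network.
toy_edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "A"), ("A", "C")]
toy_paths = paths_of_length_two(toy_edges)
for path in toy_paths:
    print(" -> ".join(path))
```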
RMS path scores
wRea: number of reactions described by an RMS
scoreRea: diversity of reactions performing the same chemical transformation
wPageRank: feedback centrality. The more neighbours a node has, the more central it is;
the more central a node is, the more central its neighbours are
scorePageRank: topological importance of the module in the network, highlighting
chemical hubs
wProt: estimated number of proteins associated with a given RMS
scoreProt: diversity of enzymes performing the same chemical transformation
30% of RMS have weightProt = 0
→ Link with orphan enzymes?
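The feedback centrality behind wPageRank can be illustrated with a plain power-iteration PageRank. This is a generic sketch: the damping factor of 0.85, the number of iterations, the handling of nodes without outgoing edges and the toy edges are assumptions, and the slides do not state which implementation or parameters were used.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for a, b in edges:
        out_links[a].append(b)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A node without outgoing edges spreads its rank over all nodes.
            targets = out_links[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# ASSUMPTION: hypothetical toy network in which node C receives two links.
ranks = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 3))
```

The node that receives links from the most, and the most central, neighbours ends up with the highest rank, which is the property the score relies on to highlight chemical hubs.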
Significant difference between the score distributions of known metabolic pathways and of all/random
paths in the RMS network
(validated with Kruskal-Wallis and Tukey HSD tests: p-value << 0.05)
Learning pathway types from known metabolic pathways using rules combining
scoreProt, scoreRea and scorePageRank
NNge algorithm
Pathway type prediction with an accuracy of 89% for RMS paths
5 metabolic pathway types:
✴  biosynthesis
✴  degradation
✴  detoxification
✴  energy creation
✴  other
Roadmap (figure labels): Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network, Search and analysis of conserved paths, Linking to genomic context
Part III
Combining Genomic and Metabolic Contexts
Metabolic context: RMS network
Genomic context: gene cluster
Gene Clusters: Operons
Operon: genomic unit containing a group of genes:
«  co-localised on the same strand
«  controlled by the same promoter
«  co-transcribed into a polycistronic mRNA
«  often associated with the same cellular function
Directons
used to predict operons
Directon: maximal set of adjacent CDS located on the same DNA strand and not interrupted by a
CDS on the opposite strand
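The directon definition translates directly into code: sort the CDS by position and cut a new group every time the strand changes. The gene names and coordinates are hypothetical, and the sketch assumes a single linear replicon (no wrap-around on circular chromosomes).

```python
from itertools import groupby

# ASSUMPTION: hypothetical CDS as (name, start position, strand).
cds_list = [
    ("geneA", 100, "+"),
    ("geneB", 900, "+"),
    ("geneC", 1800, "-"),
    ("geneD", 2500, "+"),
    ("geneE", 3300, "+"),
]


def find_directons(cds):
    """Group adjacent CDS lying on the same strand into directons."""
    ordered = sorted(cds, key=lambda gene: gene[1])
    return [
        [name for name, _, _ in group]
        for _, group in groupby(ordered, key=lambda gene: gene[2])
    ]


directons = find_directons(cds_list)
print(directons)
```

`geneC` on the opposite strand interrupts the run, so `geneA`/`geneB` and `geneD`/`geneE` fall into separate directons.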
Linking directon genes to RMS
[Figure labels: known enzymes, EC number, RMS, …]
A Pfam family is often associated with several RMS (RMS1, RMS2, RMS3, RMS4 in the figure)
→ A gene is therefore often associated with several RMS
Projection of directon RMS on the network
Extraction of selected nodes and all edges, then selection of
maximal connected components
Gene-based node colouring
Best paths selection for the directon
«  Max number of gene colours
«  High path scores (scoreRea,
scorePageRank, scoreProt)
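The projection and component-extraction steps can be sketched as follows. The RMS network edges and the set of RMS attached to the directon are hypothetical, and treating the components as weakly connected (edge direction ignored) is an assumption.

```python
def project_on_network(edges, selected):
    """Keep edges between selected nodes and return the connected components."""
    kept = [(a, b) for a, b in edges if a in selected and b in selected]
    neighbours = {n: set() for n in selected}
    for a, b in kept:
        neighbours[a].add(b)
        neighbours[b].add(a)  # ASSUMPTION: direction ignored for components
    components, seen = [], set()
    for start in sorted(selected):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return kept, components


# ASSUMPTION: hypothetical RMS network and directon RMS set.
network_edges = [("RMS1", "RMS2"), ("RMS2", "RMS3"), ("RMS3", "RMS4"), ("RMS5", "RMS6")]
directon_rms = {"RMS1", "RMS2", "RMS3", "RMS5"}
kept_edges, components = project_on_network(network_edges, directon_rms)
print(kept_edges)
print(components)
```

Paths would then be searched inside each component, favouring those that cover the most gene colours and carry high scores.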
Protein Family Case
Case study for the Baeyer-Villiger MonoOxygenases protein family
A protein family is a group of
proteins that share a common
evolutionary origin, reflected by
their related functions and
similarities in sequence or structure.
Baeyer-Villiger MonoOxygenases (BVMOs)
«  Flavoenzymes (FAD dependent)
«  Water soluble
«  Two classes: I and II
Reaction: linear or cyclic ketone → ester or lactone
★  All RMS catalysed by the protein family
★  All directons containing a member of the protein family
Directon clustering based on their RMS content
Network projection of common RMS from each directon cluster
Path selection: max colours, high scores
Baeyer-Villiger Monooxygenation
3 RMS describing this type of reaction
Directons containing a BVMO
«  814 BVMO sequences
«  812 directons
«  468 organisms, bacteria only
Clustering of BVMO-containing directons according to
their RMS content

Cluster      Directons   Common RMS
Cluster 1    251         0
Cluster 2    308         32
Cluster 3    125         10
Cluster 4    69          86
Cluster 5    59          5
Cluster projection on the RMS network
& selection of maximal connected components
Cluster 2
«  Pink nodes: RMS of BVMOs
«  Grey nodes: RMS known to be in the BVMO metabolic context
«  Blue nodes: RMS never seen in the BVMO metabolic context
«  Green edges: links between RMS from known metabolic paths in which BVMOs
are involved
Path selection

RMS path   scoreRea   scoreProt   scorePageRank
Path 1     7.9        2.5         5.0 × 10^-4
Path 2     7.6        2.5         4.8 × 10^-4
Path 3     5.5        0.5         3.6 × 10^-4
Path 4     5.2        0.4         3.4 × 10^-4

Difficulty in defining the exact location on the molecule
where the described reaction happens
To Conclude….
What has been done?
«  Orphan enzyme survey
«  Update of statistics
«  Protein families and local orphan enzymes
«  A new representation of metabolism using a network of chemical
transformations
«  Definition and detection of conserved modules
«  Rules for module type prediction
«  Network exploration using genomic and metabolic contexts
«  Definition of a strategy to explore the functional diversity of enzyme families
«  Application to the Baeyer-Villiger Monooxygenases
What’s next?
Method improvements
«  Detect branched and cyclic conserved modules
«  Determine specific domains/profiles for RMS: using PRIAM/MKDOM-like
methods
«  Improve gene cluster projection on the RMS network
Applications
«  RMS to classify enzyme activities
«  Assign sequences for orphan enzymes and reactions for orphan metabolites
«  Application on other protein families
«  A way to study biological systems from a chemical point of view
Acknowledgements
David Vallenet
Claudine Médigue
Systems biology team:
Karine Bastard
Mark Stam
Jonathan Mercier
Guillaume Reboul
And all LABGeM
Jean-Loup Faulon
Olivier Lespinet
Additional slides
Metabolic Networks
Metabolites hypergraph
o  Nodes = metabolites
o  Hyperedge linking all metabolites involved in a reaction
o  Nodes in the toy example: Pyruvate, Formate, Acetyl-CoA, Acetaldehyde, Ethanol, Coenzyme A, NADH, NAD+
EC numbers vs RMS

soutenance

  • 1.
    DÉCOUVERTE ET EXPLORATION DESMODULES CONSERVÉS DE TRANSFORMATIONS CHIMIQUES DANS LE MÉTABOLISME MARIA SOROKINA 3 FÉVRIER 2016 Ecole doctorale Structure et Dynamique des Systèmes Vivants
  • 2.
    What is themetabolism? 2
  • 3.
    What is themetabolism? Metabolism is the overall biochemical processes by which living organisms are maintained in life, grow, reproduce and interact with the environment 3
  • 4.
    What is themetabolism? Metabolism is the overall biochemical processes by which living organisms are maintained in life, grow, reproduce and interact with the environment « Μεταβολή » (metabôlé) – greek – change, transformation 4
  • 5.
    What is themetabolism? Metabolism is the overall biochemical processes by which living organisms are maintained in life, grow, reproduce and interact with the environment « Μεταβολή » (metabôlé) – greek – change, transformation Chemical transformations mainly concern small molecules – metabolites- which are modified by (bio)chemical reactions 5
  • 6.
    What is themetabolism? Metabolism is the overall biochemical processes by which living organisms are maintained in life, grow, reproduce and interact with the environment « Μεταβολή » (metabôlé) – greek – change, transformation Chemical transformations mainly concern small molecules – metabolites- which are modified by (bio)chemical reactions 6 Successive reactions aiming the production or degradation of a target metabolite are described in metabolic pathways
  • 7.
    What is themetabolism? Metabolism is the overall biochemical processes by which living organisms are maintained in life, grow, reproduce and interact with the environment « Μεταβολή » (metabôlé) – greek – change, transformation Chemical transformations mainly concern small molecules – metabolites- which are modified by (bio)chemical reactions Biochemical reactions are often catalysed by enzymes – proteins encoded in the organism genome and having the ability to facilitate specific reactions 7 Successive reactions aiming the production or degradation of a target metabolite are described in metabolic pathways
  • 8.
    Biochemical reactions areoften catalysed by enzymes – proteins encoded in the organism genome and having the ability to facilitate specific reactions 8 Enzymes Genome Reactions Transforming metabolites
  • 9.
    Presentation Outline Introduction: FromGenome to Metabolism Part I: Orphan Enzymes Part II: Reaction Molecular Signature Network and Conserved Modules Part III: Combining Genomic and Metabolic Contexts Conclusions & Perspectives 9
  • 10.
    From Genome ToMetabolism Enzymes Genome Reactions Transforming metabolites
  • 11.
    From Genome ToMetabolism Sequencing Enzymes Genome Reactions Transforming metabolites
  • 12.
    From Genome ToMetabolism Finding CDS (protein-coding genes) Sequencing Enzymes Genome Reactions Transforming metabolites
  • 13.
    From Genome ToMetabolism Sequencing Functional annotation Finding CDS (protein-coding genes) Enzymes Genome Reactions Transforming metabolites
  • 14.
    Functional Annotation Assigning abiological function to a protein «  Through experimentation (high confidence) «  Homology detection through sequence similarity (BLAST….) «  Genomic context «  Protein structural analysis «  Rules-based annotation systems «  Community annotation systems 14
  • 15.
    Enzymes Genome Reactions Transforming metabolites From GenomeTo Metabolism Sequencing Functional annotation Metabolism Reconstruction Finding CDS (protein-coding genes)
  • 16.
    Representing The Metabolism Modelsfor structural analysis «  Networks Models for flow analyses in metabolism «  Flux Balance Analysis «  Capacitance analysis Models for dynamic analysis «  Models including reaction kinetics 16
  • 17.
    Representing The Metabolism Modelsfor structural analysis «  Networks Models for flow analyses in metabolism «  Flux Balance Analysis «  Capacitance analysis Models for dynamic analysis «  Models including reaction kinetics 17
  • 18.
    Metabolic Networks 18 Toy example: partof the Escherichia coli metabolism
  • 19.
    Metabolic Networks Bipartite networkof metabolites and reactions o  Nodes = metabolites and reactions 19 Pyruvate Formate Acetyl-CoA Acetaldehyde Ethanol Coenzyme A NADH NAD+ Reaction 2.3.1.54 Reaction 1.2.1.10 Reaction 1.1.1.1
  • 20.
    Metabolic Networks Metabolite network o Nodes = metabolites o  Edge between two nodes if there is a reaction where one of the metabolites is the substrate and the other is the product 20 Pyruvate Formate Acetyl-CoA Acetaldehyde Ethanol Coenzyme A NADH NAD+
  • 21.
    Metabolic Networks Reaction network o Nodes = reactions o  Edge between two nodes if there is a metabolite produced by a reaction substrate of the other reaction 21 Reaction 2.3.1.54 Reaction 1.2.1.10 Reaction 1.1.1.1
  • 22.
    Metabolic Networks Enzyme network o Nodes = enzymes o  Edge between two nodes if there is a metabolite produced by an enzyme substrate of the other enzyme o  Limitations : o  An enzyme can catalyse several reactions o  A reaction can be catalysed by several enzymes o  Incomplete knowledge of enzymes (orphan enzymes) 22 Pyruvate formate lyase Acetaldehyde dehydrogenase Alcohol dehydrogenase
  • 23.
    Metabolic Networks Ubiquitous compoundsproblem CO2 ATP/ADP H2O H+ NAD(P)+/NAD(P)H …. Create important hubs in the metabolic network ! need to take them into account! “Primary” and “secondary” metabolites in reactions in pathways 23 Ubiquitous: existing or being everywhere, especially at the same time; omnipresent: Hub: A highly connected node in a graph
  • 24.
    Main difficulties inmetabolic network reconstruction from whole genomes: « Gene functional annotation issues 24
  • 25.
    Main difficulties inmetabolic network reconstruction from whole genomes: « Gene functional annotation issues 25 >60% of functional annotations in UniProt may be erroneous 2009
  • 26.
    Main difficulties inmetabolic network reconstruction from whole genomes: « Gene functional annotation issues « Orphan enzymes 26
  • 27.
  • 28.
    What is anorphan enzyme? An “orphan enzyme activity” (or “orphan enzyme” for short) is a known biochemical activity for which there is any associated sequence (yet) 28
  • 29.
    Orphan enzymes 2004: Karp:Call for an enzyme genomics initiative. (38% of orphan enzymes) 2005: Lespinet & Labedan: Orphan enzymes? (42% of orphan enzymes) 2006: Lespinet & Labedan: ORENZA database. (36% of orphan enzymes) 2007: Chen & Vitkup: Distribution of orphan metabolic activities. (34% of orphan enzymes) 2007: Pouliot & Karp: A survey of orphan enzyme activities. (34% of orphan enzymes) 29
  • 30.
    Orphan enzymes 22% 78% >5,000 enzymaticactivities IUBMB - EC numbers Orphan enzymes 30 Enzyme Commission (EC) number: Official classification of enzyme activities Reaction class Metabolite type Reaction nature Serial number
  • 31.
    Enzyme activities andannotated proteins over years Limited number of recently discovered activities Protein sequencing DNA sequencing Expression cloning Genomics 31
  • 32.
    Enzyme discovery andprotein families 23% 77% >14,000 protein families Pfam 22% 78% >5,000 enzymatic activities IUBMB - EC numbers Unknown functionOrphan enzymes 32
  • 33.
    Enzyme discovery andprotein families Newly discovered enzymatic activities are mostly associated with already known enzyme families 33
  • 34.
    Local Orphan Enzymes Enzymaticactivities that have been observed in at least one organism of a given clade and having a sequence associated in an other clade but not in this one 34 Local orphan EC numbers Achaea Bacteria Eukaryotes Total number of concerned EC numbers 79 133 299 % of EC retrieved with PRIAM (significant hit with a detected protein) 30% 30% 59%
  • 35.
    Main difficulties inmetabolic network reconstruction from whole genomes: « Gene functional annotation issues « Orphan enzymes « Lack of knowledge on organism metabolic diversity 35
  • 36.
    Part II Reaction MolecularSignature Networks and Conserved Modules
  • 37.
  • 38.
  • 39.
    39 57% of nodes 83%of edges Reactions from model organisms ✴  E. coli ✴  B. subtilis ✴  S. cerevisiae ✴  H. sapiens ✴  A. thaliana ✴  D. melanogaster
  • 40.
  • 41.
  • 42.
    42 Lack of knowledgeabout metabolism diversity in non-model organisms
  • 43.
    43 Lack of knowledgeabout metabolism diversity in non-model organisms What strategy can be adopted to counter this lack of knowledge?
  • 44.
    44 All main hypotheseson metabolic pathway evolution agree about the importance of enzyme promiscuity, i.e. the capacity of enzymes to catalyze one or several reactions on more or less different substrates… …we should look at the conservation of chemical transformations in pathways and not only the conservation of enzymatic reaction
  • 45.
    45 Reactions and chemicaltransformation types Dehydrogenation
  • 46.
    46 How to representmolecules, reactions and their chemical transformation types ?
  • 47.
  • 48.
    Representing Molecules 48 Need tobe able to describe molecular substructures and their proprieties
  • 49.
  • 50.
    50 Molecular signature set ofsub-graphs of given diameter (height) centered on each atom of the molecule Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling 53(4), 887–97 (2013)
  • 51.
    51 Molecular signature set ofsub-graphs of given diameter (height) centered on each atom of the molecule Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling 53(4), 887–97 (2013)
  • 52.
  • 53.
  • 54.
  • 55.
    How to Represent Reactions and Their Chemical Transformation Type?
  • 56.
    Reaction molecular signature (RMS): difference between the molecular signatures of the products and of the substrates of the reaction
    Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling 53(4), 887–97 (2013)
  • 57.
    Reaction molecular signature (RMS): difference between the molecular signatures of the products and of the substrates of the reaction
    … specifically, it keeps only the substructures that change, which is a way to encode the chemical transformation
    Carbonell, P., Carlsson, L., Faulon, J.-L.: Stereo signature molecular descriptor. Journal of Chemical Information and Modeling 53(4), 887–97 (2013)
  • 58.
  • 59.
  • 60.
    Reaction molecular signature (RMS)
    ★ At height 0, the RMS is null
  • 61.
    Reaction molecular signature (RMS)
    ★ At height 0, the RMS is null
    ★ Height 1 RMS:
      -1.0*[O]([C][P])
      1.0*[O]([H][C])
      -1.0*[O]([H][H])
      1.0*[O]([H][P])
      0.0
  • 62.
    Reaction molecular signature (RMS):
    ★ At height 0, the RMS is null (all atoms are subtracted)
    ★ Height 1 RMS:
      -1.0*[O]([C][P])
      1.0*[O]([H][C])
      -1.0*[O]([H][H])
      1.0*[O]([H][P])
      0.0
    ★ Height 2 RMS:
      1.0*[C@@]([H][C@@]([H][C][O])[C@@]([H][C@][O])[O]([H]))
      1.0*[C@@]([H][C@]([H][C@@][O])[C@@]([H][C@@][O])[O]([H]))
      1.0*[C@@]([H][C@]([H][C@@][O])[O]([H])[O]([C@@]))
      -1.0*[C@@]([H][C@]([H][C@][O])[C@@]([H][C@][O])[O]([H]))
      -1.0*[C@@]([H][C@]([H][O][O])[C@@]([H][C@][O])[O]([H]))
      1.0*[C@@]([H][C]([H][H][O])[C@@]([H][C@@][O])[O]([C@@]))
      1.0*[C@]([H][C@@]([H][C@@][O])[C@@]([H][O][O])[O]([H]))
      -1.0*[C@]([H][C@@]([H][C@@][O])[O]([H])[O]([C@]))
      -1.0*[C@]([H][C@]([H][C][O])[C@@]([H][C@@][O])[O]([H]))
      -1.0*[C@]([H][C]([H][H][O])[C@]([H][C@@][O])[O]([C@]))
      1.0*[C]([H][H][C@@]([H][C@@][O])[O]([H]))
      -1.0*[C]([H][H][C@]([H][C@][O])[O]([P]))
      1.0*[H]([C@@]([C@@][C@@][O]))
      -1.0*[H]([C@@]([C@][C@@][O]))
      1.0*[H]([C@@]([C@][O][O]))
      1.0*[H]([C@@]([C][C@@][O]))
      1.0*[H]([C@]([C@@][C@@][O]))
      -1.0*[H]([C@]([C@@][O][O]))
      -1.0*[H]([C@]([C@][C@@][O]))
      -1.0*[H]([C@]([C][C@][O]))
      2.0*[H]([C]([H][C@@][O]))
      -2.0*[H]([C]([H][C@][O]))
      1.0*[H]([O]([C@@]))
      -1.0*[H]([O]([C@]))
      1.0*[H]([O]([C]))
      -2.0*[H]([O]([H]))
      1.0*[H]([O]([P]))
      1.0*[O]([C@@]([H][C][C@@])[C@@]([H][C@][O]))
      -1.0*[O]([C@]([H][C][C@])[C@]([H][C@@][O]))
      -1.0*[O]([C]([H][H][C@])[P]([O][O]=[O]))
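A minimal sketch of how such an RMS could be obtained as "products minus substrates". The function name `reaction_signature` is hypothetical, and the input signatures are hand-written toy fragments chosen to reproduce the four height 1 terms listed above; they are not the full signatures of the molecules on the slide.

```python
from collections import Counter

def reaction_signature(substrate_sigs, product_sigs):
    """Sum product signatures, subtract substrate signatures, drop zero terms."""
    total = Counter()
    for sig in product_sigs:
        for key, count in sig.items():
            total[key] += count
    for sig in substrate_sigs:
        for key, count in sig.items():
            total[key] -= count
    # Only the substructures that change survive the subtraction.
    return {key: count for key, count in total.items() if count != 0}

# Toy height 1 fragments: a phosphoester plus water giving an alcohol plus a phosphate.
substrates = [
    Counter({"[O]([C][P])": 1, "[H]([C])": 2}),   # assumed fragment of the phosphoester
    Counter({"[O]([H][H])": 1}),                    # water
]
products = [
    Counter({"[O]([H][C])": 1, "[H]([C])": 2}),   # assumed fragment of the alcohol
    Counter({"[O]([H][P])": 1}),                    # assumed fragment of the phosphate
]
rms_h1 = reaction_signature(substrates, products)

# At height 0 only atom counts remain, so a balanced reaction gives an empty RMS.
rms_h0 = reaction_signature([Counter({"[O]": 2, "[H]": 4})], [Counter({"[O]": 2, "[H]": 4})])
```

The unchanged `[H]([C])` terms cancel out, which is why the RMS keeps only the changing substructures.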
  • 63.
    RMS group reactions on the basis of the type of chemical transformation performed
  • 64.
  • 65.
    [Workflow diagram: Reaction, Molecular signature, Reaction molecular signature, Reaction network]
  • 66.
    « Nodes represent reactions
    « Two nodes are linked by a directed edge if there is a metabolite produced by the first reaction that is consumed by the second reaction
    « 5,830 nodes
    « 11,197 edges
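The edge rule above can be sketched as follows. The function name `build_reaction_network`, the reaction identifiers and the metabolites are hypothetical; excluding self-loops is an assumption, and any filtering of ubiquitous cofactors is not covered here.

```python
from collections import defaultdict

def build_reaction_network(reactions):
    """reactions: id -> (set of substrates, set of products). Returns directed edges."""
    consumers = defaultdict(set)
    for rid, (substrates, _) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rid)
    edges = set()
    for rid, (_, products) in reactions.items():
        for metabolite in products:
            # Link this reaction to every reaction consuming one of its products.
            for nxt in consumers.get(metabolite, ()):
                if nxt != rid:  # assumption: no self-loops
                    edges.add((rid, nxt))
    return edges

toy_reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
}
toy_edges = build_reaction_network(toy_reactions)
```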
  • 67.
    [Workflow diagram: Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network]
  • 68.
    Transformation of a reaction network into an RMS network
  • 69.
  • 70.
  • 71.
    Transformation of a reaction network into an RMS network
    First-order Markov chain transition probabilities between connected RMS (RMSi and RMSj)
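A minimal sketch of this collapse, assuming each reaction maps to exactly one RMS and that the first-order transition probability from RMSi to RMSj is estimated as the share of reaction-level edges leaving RMSi that arrive in RMSj. This estimator and the name `rms_network` are assumptions; the slides do not detail the computation.

```python
from collections import Counter

def rms_network(reaction_edges, rms_of):
    """Merge reactions sharing an RMS and weight RMS edges by transition probability."""
    counts = Counter((rms_of[a], rms_of[b]) for a, b in reaction_edges)
    out_total = Counter()
    for (src, _), n in counts.items():
        out_total[src] += n
    # P(src -> dst) = edges from src to dst / all edges leaving src
    return {(src, dst): n / out_total[src] for (src, dst), n in counts.items()}

toy_reaction_edges = [("R1", "R2"), ("R1", "R3"), ("R4", "R2")]
toy_rms_of = {"R1": "RMS1", "R4": "RMS1", "R2": "RMS2", "R3": "RMS3"}
toy_probs = rms_network(toy_reaction_edges, toy_rms_of)
```

In this toy case four reaction nodes collapse into three RMS nodes, and the outgoing probabilities of RMS1 sum to 1.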
  • 72.
    Reaction network: 5,830 nodes, 11,197 edges (×1)
    RMS network: 3,365 nodes, 8,721 edges (×0.57)
    Node reduction rate: 0.57
  • 73.
    [Workflow diagram: Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network, Search and analysis of conserved paths]
  • 74.
  • 75.
    Pathway conservation index (PCI)
    ✴ Computed for each RMS path present in at least one known metabolic pathway
    ✴ Represents the number of corresponding reaction paths that are present in at least one MetaCyc pathway
    … captures the chemical redundancy across the known metabolism
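One way to read the PCI is as a count of the known pathways in which an RMS path occurs. The sketch below assumes pathways are linear chains of reaction identifiers and that each reaction maps to one RMS; the name `pathway_conservation_index` and the toy data are hypothetical, and branched pathways are not handled.

```python
def pathway_conservation_index(rms_path, pathways, rms_of):
    """Count pathways containing a run of consecutive reactions matching `rms_path`."""
    target = tuple(rms_path)
    n = len(target)
    count = 0
    for reaction_chain in pathways.values():
        translated = [rms_of[r] for r in reaction_chain]
        # Slide a window of length n over the pathway translated into RMS.
        if any(tuple(translated[i:i + n]) == target
               for i in range(len(translated) - n + 1)):
            count += 1
    return count

toy_pathways = {"P1": ["R1", "R2", "R3"], "P2": ["R4", "R5"], "P3": ["R6", "R7"]}
toy_rms = {"R1": "RMS1", "R2": "RMS2", "R3": "RMS3",
           "R4": "RMS1", "R5": "RMS2", "R6": "RMS2", "R7": "RMS1"}
pci = pathway_conservation_index(["RMS1", "RMS2"], toy_pathways, toy_rms)
```

Here the path RMS1 → RMS2 is found in P1 and P2 but not in P3, where the order is reversed.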
  • 76.
  • 77.
    Beta-oxidation module: PCI = 14 (conserved in 14 pathways)
  • 78.
    Aldoxime biosynthesis: PCI = 7 (conserved in 7 pathways)
  • 79.
    Pathway conservation index (PCI) for all MetaCyc pathways
    Paths of length 2 & PCI >= 2: 365 conserved modules
    Previous study: Muto et al. (J. Chem. Inf. Model., 2013) identified 34 conserved modules
    Pathway type    MetaCyc pathways with conserved modules
    Biosynthesis    263 (42%)
    Degradation     172 (47%)
    Detox           3 (27%)
    Energy          61 (78%)
    Other           19 (33%)
    All             518 (46%)
  • 80.
  • 81.
  • 82.
    Conservation of all RMS paths in the network
  • 83.
    Path enumeration in the RMS network
    >72,000 paths of length 2 (2 edges and 3 nodes)
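Enumerating paths of length 2 amounts to chaining every edge with the edges leaving its end node. A minimal sketch, assuming only paths with three distinct nodes are kept (this restriction and the name `paths_of_length_two` are assumptions):

```python
from collections import defaultdict

def paths_of_length_two(edges):
    """Return all directed paths a -> b -> c made of two edges and three distinct nodes."""
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)
    paths = set()
    for a, b in edges:
        for c in successors.get(b, ()):
            if len({a, b, c}) == 3:  # assumption: no node is visited twice
                paths.add((a, b, c))
    return sorted(paths)

toy_rms_edges = [("S1", "S2"), ("S2", "S3"), ("S2", "S4"), ("S3", "S1")]
toy_paths = paths_of_length_two(toy_rms_edges)
```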
  • 84.
  • 85.
  • 86.
    RMS path scores
    wRea: number of reactions described by an RMS
    scoreRea: diversity of reactions performing the same chemical transformation
  • 87.
    RMS path scores
    wPageRank: feedback centrality. The more neighbours a node has, the more central it is. The more central a node is, the more central its neighbours are
    scorePageRank: topological importance of the module in the network, highlighting chemical hubs
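For readers unfamiliar with this kind of feedback centrality, here is a minimal power-iteration sketch of PageRank on a directed graph. The damping factor 0.85, the iteration count and the uniform handling of dangling nodes are conventional assumptions, not settings taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank on a list of directed edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # Nodes without outgoing edges spread their rank over all nodes.
            targets = out[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

toy_graph = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
toy_rank = pagerank(toy_graph)
hub = max(toy_rank, key=toy_rank.get)
```

In the toy graph, node C receives edges from three nodes and ends up with the highest rank, which is the "chemical hub" behaviour described above.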
  • 88.
    RMS path scores
    wProt: estimation of the number of proteins associated with a given RMS
    scoreProt: diversity of enzymes performing the same chemical transformation
  • 89.
    RMS path scores
    wProt: estimation of the number of proteins associated with a given RMS
    scoreProt: diversity of enzymes performing the same chemical transformation
    30% of RMS with weightProt = 0
    Link with orphan enzymes?
  • 90.
    RMS path scores
    Significant difference between the score distributions of known metabolic pathways and of all/random paths in the RMS network (Kruskal-Wallis & Tukey HSD tests for validation: p-value << 0.05)
  • 91.
    RMS path scores
    Learning pathway types from known metabolic pathways using rules combining scoreProt, scoreRea and scorePageRank
    NNge algorithm
    Pathway type prediction with an accuracy of 89% for RMS paths
    5 metabolic pathway types:
    ✴ biosynthesis
    ✴ degradation
    ✴ detoxification
    ✴ energy creation
    ✴ other
  • 92.
  • 93.
    [Workflow diagram: Reaction, Molecular signature, Reaction molecular signature, Reaction network, RMS network, Search and analysis of conserved paths, Linking to genomic context]
  • 94.
    Part III Combining Genomic and Metabolic Contexts
  • 95.
    Metabolic context: RMS network
    Genomic context: gene cluster
  • 96.
    Gene Clusters: Operons
    Operon: genomic unit containing a group of genes:
    « co-localised on the same strand
    « controlled by the same promoter
    « co-transcribed into a polycistronic mRNA
    « often associated with the same cellular function
  • 97.
    Directons predicting operons
    Maximal set of adjacent CDS localised on the same DNA strand and not interrupted by a CDS on the opposite strand
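The directon definition translates directly into a single pass over the CDS in genome order. A minimal sketch, assuming the input is a list of `(gene_id, strand)` pairs for one replicon; the function name `directons` and the gene names are hypothetical, and wrap-around on circular chromosomes is ignored.

```python
def directons(cds_list):
    """Group consecutive CDS lying on the same strand into directons."""
    groups = []
    for gene, strand in cds_list:
        if groups and groups[-1][0] == strand:
            groups[-1][1].append(gene)      # same strand: extend the current directon
        else:
            groups.append((strand, [gene])) # strand switch: start a new directon
    return groups

toy_cds = [("g1", "+"), ("g2", "+"), ("g3", "-"),
           ("g4", "+"), ("g5", "+"), ("g6", "+")]
toy_directons = directons(toy_cds)
```

The single CDS g3 on the opposite strand splits the region into three directons.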
  • 98.
    Linking directon genes to RMS
    [Diagram labels: Known enzymes, EC number, RMS, …]
  • 99.
    Linking directon genes to RMS
    [Diagram labels: RMS1, RMS2, RMS3, RMS4]
    A Pfam is often associated with several RMS
    A gene is therefore often associated with several RMS
  • 100.
    Projection of directon RMS on the network
  • 101.
    Extraction of selected nodes and all edges – selection of maximal connected components
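This step can be sketched as taking the sub-network induced by the selected RMS and splitting it into connected components. Treating edges as undirected for connectivity and returning the components sorted by size are assumptions, as is the name `connected_components`.

```python
from collections import defaultdict, deque

def connected_components(edges, selected):
    """Components of the sub-network induced by `selected`, largest first."""
    selected = set(selected)
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in selected and b in selected:  # keep only edges between selected nodes
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(selected):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first traversal from `start`
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

toy_net = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S5", "S6")]
toy_components = connected_components(toy_net, {"S1", "S2", "S4", "S5", "S6"})
```

Because S3 is not among the selected nodes, S4 ends up isolated even though it is connected in the full network.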
  • 102.
  • 103.
    Best path selection for the directon
    « Max number of gene colours
    « High path scores (scoreRea, scorePageRank, scoreProt)
  • 104.
    Protein Family Case
    Case study for the Baeyer-Villiger MonoOxygenases protein family
    A protein family is a group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure.
  • 105.
    Baeyer-Villiger MonoOxygenases (BVMOs)
    « Flavoenzymes (FAD dependent)
    « Water soluble
    « Two classes: I and II
    [Reaction scheme: linear or cyclic ketone → ester or lactone]
  • 106.
    ★ All RMS catalysed by the protein family
    ★ All directons containing a member of the protein family
    Directon clustering based on their RMS content
    Network projection of common RMS from each directon cluster
    Path selection: max colours, high scores
  • 107.
    Baeyer-Villiger Monooxygenation
    3 RMS describing this type of reaction
  • 108.
  • 109.
    Directons containing a BVMO
    « 814 BVMO sequences
    « 812 directons
    « 468 organisms – only bacteria
  • 110.
  • 111.
    Clustering of BVMO-containing directons according to their RMS content
    Cluster 1 • 251 directons • 0 common RMS
    Cluster 2 • 308 directons • 32 common RMS
    Cluster 3 • 125 directons • 10 common RMS
    Cluster 4 • 69 directons • 86 common RMS
    Cluster 5 • 59 directons • 5 common RMS
  • 112.
  • 113.
    Cluster projection on the RMS network & selection of maximal connected components
    Cluster 2
    « Pink nodes: RMS of BVMOs
    « Grey nodes: RMS known to be in the metabolic context of BVMOs
    « Blue nodes: RMS never seen in the metabolic context of BVMOs
    « Green edges: links between RMS from known metabolic paths where BVMOs are involved
  • 114.
    Cluster projection on the RMS network & selection of maximal connected components
  • 115.
  • 116.
    Path selection
    RMS path    scoreRea    scoreProt    scorePageRank
    Path 1      7.9         2.5          5.0 × 10^-4
    Path 2      7.6         2.5          4.8 × 10^-4
    Path 3      5.5         0.5          3.6 × 10^-4
    Path 4      5.2         0.4          3.4 × 10^-4
  • 117.
    Difficulty in defining the exact location on the molecule where the described reaction happens
  • 118.
  • 119.
    What has been done?
    « Orphan enzyme survey
    « Update of statistics
    « Protein families and local orphan enzymes
    « A new representation of metabolism using a network of chemical transformations
    « Definition and detection of conserved modules
    « Rules for module type prediction
    « Network exploration using genomic and metabolic contexts
    « Definition of a strategy to explore the functional diversity of enzyme families
    « Application to the Baeyer-Villiger Monooxygenases
  • 120.
    What’s next?
    Method improvements
    « Detect branched and cyclic conserved modules
    « Determine specific domains/profiles for RMS: using PRIAM/MKDOM-like methods
    « Improve gene cluster projection on the RMS network
    Applications
    « RMS to classify enzyme activities
    « Assign sequences for orphan enzymes and reactions for orphan metabolites
    « Application on other protein families
    « A way to study biological systems from a chemical point of view
  • 121.
    Acknowledgements
    David Vallenet, Claudine Médigue
    Systems biology team: Karine Bastard, Mark Stam, Jonathan Mercier, Guillaume Reboul
    And all LABGeM
    Jean-Loup Faulon, Olivier Lespinet
  • 122.
  • 123.
  • 124.
    Metabolic Networks
    Metabolites hypergraph
    o Nodes = metabolites
    o Hyperedge linking all metabolites involved in the reaction
    [Example diagram nodes: Pyruvate, Formate, Acetyl-CoA, Acetaldehyde, Ethanol, Coenzyme A, NADH, NAD+]
  • 125.