Presentation by Eugeni Belda (LABGeM-Genoscope) at the Biocuration 2012 conference (Georgetown University, Washington, DC): From bacterial genome annotation to metabolic pathway curation
The document describes a study evaluating the Agilent Q-TOF 6520 LC/MS platform for proteomic analysis of brush border membranes from rat kidney proximal tubules. The study used two parallel workflows: 1) C18-NSI-MS2 on an LTQ mass spectrometer for preliminary analysis, and 2) C18-NSI-MS2 on an Agilent Q-TOF 6520 mass spectrometer. Peptide and spectral counts were higher for the Q-TOF data. Feature finding using retention times, mass accuracy, and MS2 identifications correlated identified peptides across sample arrays, validating the increased identification capabilities of the Q-TOF platform.
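The cross-run feature matching described here can be sketched in Python; the m/z values, retention times, and tolerance settings below are illustrative assumptions, not values from the study:

```python
# Hypothetical feature lists from two LC/MS runs: (m/z, retention time in minutes).
run_a = [(445.12, 23.4), (512.30, 31.0), (689.81, 47.2)]
run_b = [(445.13, 23.6), (512.31, 30.8), (900.00, 55.0)]

def match_features(a, b, ppm_tol=25.0, rt_tol=0.5):
    """Pair features across runs whose masses agree within ppm_tol
    parts-per-million and whose retention times agree within rt_tol minutes."""
    matches = []
    for mz1, rt1 in a:
        for mz2, rt2 in b:
            if abs(mz1 - mz2) / mz1 * 1e6 <= ppm_tol and abs(rt1 - rt2) <= rt_tol:
                matches.append(((mz1, rt1), (mz2, rt2)))
    return matches

matches = match_features(run_a, run_b)  # two of the three features pair up
```

In practice the MS2 identifications provide a third, independent check that matched features really are the same peptide.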
Multi-scale network biology model & the model library - laserxiong
This document discusses multi-scale network biology models and a network model library. It describes how the library would contain different types of nodes and edges to represent diverse biological interactions. The library would annotate pre-defined network models and integrate updated models. It also discusses multi-scale networks from the inter-cellular to inter-tissue levels. A case study on prioritizing pre-clinical drugs via prognosis-guided genetic interaction networks is mentioned. The document notes challenges in current disease models for drug development and proposes approaches like synergistic outcome determination and module-module cooperation networks to address them.
1) GenWiki is a wiki system that seamlessly integrates natural language processing (NLP) capabilities to help biocurators manually refine and update bioinformatics databases by extracting semantic entities and facts from literature.
2) The system uses the mycoMINE text mining pipeline based on GATE to perform NLP tasks like named entity recognition, fact extraction, and ontology population.
3) An evaluation showed GenWiki reduced the average curation time for selecting an abstract by 67% and for reviewing a full paper by 20% compared to having no semantic support.
"Biomolecular annotation prediction through information integration" - Davide Chicco
Talk by Davide Chicco, delivered at the 8th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2011), Gargnano sul Garda, Lombardy, July 2011
Integration of Bioinformatics Web Services through the Search Computing Techn... - Davide Chicco
Here are the key steps in Latent Semantic Indexing using SVD to measure semantic similarity between genes:
1. Build an annotation matrix with genes as rows and annotation terms as columns, with 1's indicating which genes are annotated to which terms.
2. Perform SVD on the annotation matrix to decompose it into three matrices: Uk, Σk, VTk.
3. Uk contains the vectors representing each gene in the reduced k-dimensional semantic space.
4. The similarity between two genes can be measured as the cosine similarity between their corresponding vectors in Uk. Genes with more similar vectors are considered more semantically similar based on their shared annotations.
So, in summary, LSI uses SVD to project genes into a reduced k-dimensional semantic space, where genes with similar annotation profiles lie close together and can be compared by cosine similarity.
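The four steps above can be sketched with NumPy; the binary annotation matrix is a made-up toy example:

```python
import numpy as np

# Hypothetical annotation matrix: rows = genes, columns = annotation terms,
# 1 where a gene is annotated to a term.
A = np.array([
    [1, 1, 0, 0],  # gene g1
    [1, 1, 1, 0],  # gene g2: shares two terms with g1
    [0, 0, 1, 1],  # gene g3: shares nothing with g1
])

# Truncated SVD: keep the k strongest latent "semantic" dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk = U[:, :k]  # each row is a gene vector in the reduced k-dimensional space

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Genes sharing annotations (g1, g2) should score higher than unrelated ones (g1, g3).
sim_12 = cosine(Uk[0], Uk[1])
sim_13 = cosine(Uk[0], Uk[2])
```

With this toy matrix, sim_12 is clearly larger than sim_13, reflecting the shared annotations of g1 and g2.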
Integrating Public and Private Data: Lessons Learned from Unison - Reece Hart
The document discusses lessons learned from integrating public and private data using the Unison platform. It describes the types of data that can be integrated, including genomics, proteomics, chemistry, networks, and clinical data. It outlines different types of integration like semantic and source integration. Challenges of integration include establishing relationships between data and handling frequent updates. Benefits include enabling analysis across diverse data types and centralizing data. Unison integrates sequences, annotations, auxiliary data and precomputed predictions from sources like UniProt and Ensembl to power applications, in-house tools and data mining projects.
Unison: Enabling easy, rapid, and comprehensive proteomic mining - Reece Hart
Unison is an online database and data integration platform that aggregates proteomic and genomic data from multiple sources and provides over 200 million precomputed predictions on protein sequences, domains, structures, and more. It aims to enable easy, rapid, and comprehensive proteomic mining through semantic integration of distinct data types and automated querying of predictions. Custom data mining projects using Unison have led to discoveries about proteins like Bcl-2 that regulate apoptosis.
This document discusses the process of analyzing sequencing data from the NA12878 reference sample. It describes the 3 phases required to turn raw sequencing reads into usable variant calls: 1) NGS data processing, 2) variant discovery and genotyping, and 3) integrative analysis. Phase 1 involves tasks like mapping, local realignment, and duplicate marking to produce analysis-ready reads. Phase 2 identifies SNPs, indels and structural variants. Phase 3 performs quality control and combines results with other data. The document emphasizes the extensive processing needed to produce reliable variant calls from raw sequencing data.
The GeneArt® Gene Synthesis service consists of chemical synthesis, cloning, and sequence verification of virtually any desired genetic sequence. You will receive a bacterial stab and/or purified plasmid containing your synthesized gene—ready for downstream applications.
Whether you have limited cloning experience or simply want to save time, the GeneArt® Gene Synthesis service helps you move your ideas from the planning stage to the laboratory more quickly. Benefit from our experience in successfully producing over 180,000 constructs for customers as diverse as large pharmaceutical companies, biotechnology start-ups, and basic research institutions. The comparison shown in the figure below highlights the time and effort saved compared to traditional cloning. For more information visit:
https://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/Cloning/gene-synthesis.html?CID=genesynthesis-SS-12312
Consortium to produce biofuels from Jatropha - ehiosa
This document summarizes a consortium project between institutions in Japan, Indonesia, and Botswana to develop Jatropha plants that can produce clean biofuel through molecular breeding. The goals are to increase Jatropha productivity and develop plants that absorb more carbon dioxide. Participating organizations will work on molecular breeding techniques, field testing in different environments, and evaluating fuel production from higher yielding Jatropha varieties. The end goal is to assist energy needs in Asia and Africa through a sustainable Jatropha biofuel production system.
Metin Bilgin is a molecular and cellular biologist with over 12 years of postgraduate research experience. He has expertise in proteomics, protein expression, and characterization. Some of his accomplishments include co-developing the first proteome chip and establishing HTP assay protocols for protein array technology. He has studied various topics like cell cycle regulation, cytochrome P450 metabolism, and nuclear hormone receptor regulated drug metabolism. Currently, he is a postdoctoral research associate studying regulation of cytochrome P450 activity by nuclear hormone receptor CAR. He aims to work for a leading life sciences research company focused on discovery and translational medicine.
This is the second presentation of the BITS training on 'Mass spec data processing'.
It reviews the methods for separating protein mixtures prior to further analysis.
Thanks to the Compomics Lab of the VIB for their contribution.
1) AbstractDB & ProteinComplexDB are databases that contain protein complexes extracted from PubMed abstracts along with the abstracts themselves.
2) The databases were developed using a Bayesian classifier to rank abstracts by their relevance to protein complexes based on the frequency of discriminatory words.
3) The databases allow users to validate extracted protein complexes by searching against known complex databases and enable scientists to evaluate and revise the data.
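The ranking idea in point 2 can be sketched as a naive Bayes log-odds score over word frequencies; the labelled abstracts below are invented toy data, not content from AbstractDB:

```python
import math
from collections import Counter

# Hypothetical training abstracts, labelled relevant (about protein complexes) or not.
relevant = ["complex subunit binds protein", "protein complex assembly interaction"]
irrelevant = ["gene expression microarray study", "sequence alignment genome"]

def word_counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

pos, neg = word_counts(relevant), word_counts(irrelevant)
vocab = set(pos) | set(neg)

def log_odds(abstract):
    """Naive Bayes log-odds that an abstract is relevant, with add-one smoothing
    so unseen words do not zero out the score."""
    score = 0.0
    for w in abstract.split():
        p = (pos[w] + 1) / (sum(pos.values()) + len(vocab))
        q = (neg[w] + 1) / (sum(neg.values()) + len(vocab))
        score += math.log(p / q)
    return score

# Abstracts using complex-related discriminatory words rank above unrelated ones.
ranked = sorted(["protein complex interaction", "genome alignment"],
                key=log_odds, reverse=True)
```

The real system would train on many curated abstracts; the principle of ranking by discriminatory word frequency is the same.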
Network cheminformatics: gap filling and identifying new reactions in metabol... - Neil Swainston
The number of published metabolic network reconstructions is increasing, as is the range of their applications. However, such reconstructions commonly include gaps (see Figure 1), caused by incomplete source databases or by holes in the biochemical knowledge reported in the literature. Gap filling has been aided by automated techniques that add candidate reactions from external resources such as KEGG.
The approach introduced here is to apply cheminformatics to determine and quantify chemical similarity across all metabolites in a metabolic network of S. cerevisiae. The hypothesis is that metabolite pairs of high chemical similarity are likely to form reaction pairs, in which one metabolite can be converted to the other by a single chemical reaction. High-scoring pairs that do not currently form a reaction pair in the network can then be analysed, either by comparison with existing data resources or through literature searches, to determine whether they take part in a metabolic reaction.
Following this approach, preliminary results have led to the discovery of missing information from KEGG, and the assignment of function and determination of kinetic constants to a gene of previously unknown function.
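A minimal sketch of the similarity scoring, assuming metabolite fingerprints are already available as sets of "on" bits (in a real pipeline these would come from a cheminformatics toolkit such as RDKit, and the fingerprints below are invented):

```python
# Hypothetical metabolite fingerprints as sets of "on" bits.
fingerprints = {
    "glucose":     {1, 4, 7, 9, 12},
    "glucose-6-P": {1, 4, 7, 9, 12, 15},  # one phosphorylation away from glucose
    "tryptophan":  {2, 3, 8, 20, 31},
}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    return len(a & b) / len(a | b)

def candidate_pairs(fps, threshold=0.7):
    """All metabolite pairs scoring above the threshold: candidate reaction
    pairs to check against the network and the literature."""
    names = sorted(fps)
    return [(x, y, tanimoto(fps[x], fps[y]))
            for i, x in enumerate(names) for y in names[i + 1:]
            if tanimoto(fps[x], fps[y]) >= threshold]

pairs = candidate_pairs(fingerprints)
```

Here only the glucose / glucose-6-phosphate pair clears the threshold, mirroring the intuition that a single reaction separates chemically similar metabolites.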
Structure generation, metabolite space, and metabolite likeness - VodafoneZiggo
The document discusses metabolite identification from mass spectrometry data. It describes how a structure generator works to generate candidate metabolite structures from an elemental composition. The generator adds bonds in all possible ways to create structures, then uses isomorphism and canonical labeling to remove duplicate structures within the same isomorphic class. This process generates a list of candidate metabolite structures for further analysis and filtering against experimental data.
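The duplicate-removal step can be sketched by brute-force canonical labelling of small bond graphs; real generators use efficient canonical labelling algorithms and track atom elements, while the structures below are toy graphs invented for illustration:

```python
from itertools import permutations

def canonical_form(n_atoms, bonds):
    """Smallest relabelling of the bond set over all atom permutations.
    Exponential in n_atoms, so only viable for tiny toy graphs; two
    isomorphic structures yield the same canonical form."""
    best = None
    for perm in permutations(range(n_atoms)):
        key = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds))
        if best is None or key < best:
            best = key
    return best

# Three generated candidate structures as undirected bond lists over 3 atoms:
s1 = [(0, 1), (1, 2)]            # a 3-atom chain
s2 = [(0, 2), (2, 1)]            # the same chain, atoms numbered differently
s3 = [(0, 1), (1, 2), (2, 0)]    # a 3-ring: genuinely different

# Keeping one structure per canonical form collapses s1 and s2 into one entry.
unique = {canonical_form(3, s) for s in (s1, s2, s3)}
```

This is the essence of removing duplicates "within the same isomorphic class": structures that differ only in atom numbering map to one canonical representative.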
Course: Bioinformatics for Biomedical Research (2014).
Session: 1.3- Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) of the Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Pathways and genomes databases in bioinformatics - sarwat bashir
The document discusses the PAGED database, which integrates various bioinformatics databases to enable molecular phenotype discovery. PAGED contains over 25,000 gene sets from sources like pathways, disease-gene associations, gene signatures, microRNA targets, and protein-protein interaction networks. It allows users to explore relationships between gene sets and identify pathways, signatures, and modules associated with specific human diseases. The database was designed to integrate data from several sources and allow comprehensive searches and analysis to further biological research.
The document discusses various bioinformatics databases that store different types of biological information such as DNA sequences, protein sequences, protein structures, gene expression data, and biomedical literature. It describes several major public primary databases like GenBank and PDB as well as derived databases like Swiss-Prot, UniGene, and RefSeq that compile or integrate data from primary sources. The databases are interlinked and can be accessed through search tools on sites like NCBI Entrez.
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis... - Surya Saha
CitrusCyc is a metabolic pathway database for the Citrus clementina and Citrus sinensis genomes. It was constructed using the Pathway Tools software and contains pathways, reactions, enzymes and genes derived from the annotated citrus genomes and the MetaCyc database. The database contains over 25,000 proteins and 40,000 transcripts with EC numbers for both citrus species. It provides visualizations of metabolic pathways and allows for overlay of RNA-seq expression data. Future work includes manual curation of pathways and development of a Meta-CitrusCyc database.
This document provides an outline for classroom content with a page title and two main items, each with sub-items. Item 1 has three sub-items labeled Sub 1a, Sub 1b, and Sub 1c. Item 2 has four sub-items labeled Sub 2a, Sub 2b, Sub 2c, and Sub 2d.
The document discusses various topics related to drug discovery through bioinformatics and computational approaches. It begins by discussing comparative genomics and using knowledge about model organisms to identify similar biological areas and pathways in other species. It also discusses topics like high-throughput screening of large libraries, the definitions of targets, hits and leads in drug discovery, and approaches like using RNAi and phenotypic screening in model organisms. Finally, it discusses computational methods that can be used throughout the drug discovery process, including for target identification and validation, virtual screening, assessing drug-likeness of compounds, and describing compounds using structural and physicochemical descriptors.
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek (Data Driven Innovation)
This document summarizes genomic big data management, integration and mining. It discusses the exponential growth of biological data due to advances in sequencing technologies. Next generation sequencing techniques generate large amounts of short DNA reads. Several public databases contain heterogeneous biological data sources. Effective data management and integration methods are needed to analyze these large and complex datasets. Supervised machine learning can be used to extract knowledge and classify samples. Tools like CAMUR apply rule-based classification to problems like analyzing gene expression from cancer datasets. Future work involves advanced integration systems and new big data approaches for biological data.
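The rule-based classification that tools like CAMUR perform can be sketched as follows; the genes, thresholds, and sample labels are invented for illustration, not CAMUR's actual rules or API:

```python
# Hypothetical expression profiles: gene -> normalized expression value, with a label.
samples = [
    ({"BRCA1": 0.2, "TP53": 0.9}, "tumor"),
    ({"BRCA1": 0.8, "TP53": 0.3}, "normal"),
    ({"BRCA1": 0.1, "TP53": 0.7}, "tumor"),
]

# Human-readable if-then rules of the kind rule-based classifiers emit.
rules = [
    (lambda s: s["BRCA1"] < 0.5 and s["TP53"] > 0.5, "tumor"),
    (lambda s: s["BRCA1"] >= 0.5, "normal"),
]

def classify(sample, rules, default="unknown"):
    """Return the label of the first rule whose condition the sample satisfies."""
    for condition, label in rules:
        if condition(sample):
            return label
    return default

accuracy = sum(classify(s, rules) == y for s, y in samples) / len(samples)
```

The appeal of this approach for cancer datasets is that each rule is directly interpretable as a statement about a small set of genes.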
The document discusses metabolic pathway engineering and metabolic engineering. It provides an overview of four commercially important fermentation products, including the microorganism used, annual production levels, and applications. It then discusses the core concepts of metabolic engineering, including manipulating enzymatic and regulatory functions using recombinant DNA to improve cellular activities. Examples of applications include strain improvement for biocatalysis and bioprocessing, increasing productivity, and developing novel biosynthetic routes.
Using ontologies to do integrative systems biology - Chris Evelo
The document discusses using ontologies to integrate systems biology data. It describes typical steps in systems biology studies such as finding studies, processing data, integrating data, and combining data from multiple sources. Ontologies can help link information from different analysis techniques and combine data from many studies by capturing study metadata. The document advocates using standards like ISA-TAB and MAGE-TAB to capture study data and proposes using a generic study capture framework with modular components to integrate different types of 'omics data. Ontologies are needed for collaboration and to provide controlled vocabularies for annotation.
Stephen Friend, Fanconi Anemia Research Fund, 2012-01-21 - Sage Base
This document summarizes Stephen Friend's presentation on using data intensive science and bionetworks to build better maps of human diseases. It discusses how collecting and integrating massive amounts of molecular and clinical data using open information systems and computing could enable the development of more comprehensive and probabilistic causal models of diseases. These evolving disease maps may help identify causal genes and pathways involved in various conditions. The presentation outlines Sage Bionetworks' mission to create a commons for scientists to collaborate on building and refining such integrative bionetworks to accelerate the elimination of human disease.
This lab aims to analyze gene expression data from a study on the response of human fibroblasts to serum. The study used cDNA microarrays to explore the temporal program of gene expression during this physiological response, identifying genes clustered by their expression patterns. Many features of the transcriptional program appeared related to wound repair processes, suggesting fibroblasts play a richer role than previously thought. The lab will introduce gene expression analysis, demonstrate basic Excel tools for working with microarray data, and use the GEPAS suite to apply the full microarray analysis process to the fibroblast dataset, including preprocessing, clustering, and identifying differentially expressed genes.
The document discusses Emerald Bio's approach to parallel protein purification at the milligram scale using automated multi-target parallel processing (MTPP). Key points include:
- MTPP has delivered over 100 protein structures from over 13 targets, with over 60 containing bound ligands.
- Producing hundreds of protein structures requires thousands of purified proteins, with Emerald Bio purifying over 220 different proteins totaling over 9 grams.
- Emerald Bio's Protein Maker enables high-throughput parallel protein purification of up to 24 samples in a single run from cell lysates or fractions as small as 1 milliliter.
This document proposes using data intensive science to build better models of disease. It notes that current disease models make simplistic assumptions and that personalized medicine requires better representations of overlapping pathways. It advocates adopting the "fourth paradigm" of data intensive science to generate massive datasets, ensure interoperability, create open information systems, and host evolving computational models. Six pilot projects are described that involve collaborative data sharing between industry, academia, and non-profits to build disease maps and models. These include initiatives like CTCAP to share clinical trial data, Arch2POCM to de-risk drug targets, and forming a federation to enhance interoperability. The document argues this approach could help address issues like a lack of standard
1) The document discusses performance analysis of DNA analysis using the Genome Analysis Toolkit (GATK).
2) GATK is a software tool used to analyze sequencing data that enables optimized use of CPU and memory for high-throughput and distributed/parallel processing of DNA data.
3) The document provides details on GATK architecture, how it distributes data into shards for scalable analysis, and how it allows merging of multiple data sources and parallelization of jobs.
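The sharding idea behind this kind of scalable analysis can be sketched in a few lines. This is an illustrative helper, not GATK's actual API: a chromosome is cut into fixed-size intervals that independent workers can then process in parallel.

```python
# Illustrative sketch of shard-based distribution (hypothetical helper,
# not GATK's actual API): cut a chromosome into fixed-size, half-open
# intervals that independent workers can process in parallel.
def make_shards(chrom, length, shard_size):
    """Return (chrom, start, end) intervals covering [0, length)."""
    return [(chrom, start, min(start + shard_size, length))
            for start in range(0, length, shard_size)]

# chr20 of GRCh37 is ~63 Mb; 10 Mb shards give 7 independent work units.
shards = make_shards("chr20", 63_025_520, 10_000_000)
```

Each shard can then be submitted as a separate job, and per-shard results merged afterwards.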
This document discusses the importance of open source software and open data in biomedical research. It notes that biological data is growing exponentially and highlights several open source bioinformatics tools like EMBOSS and web services provided by the EBI that enable researchers to access and analyze data. The document advocates for open standards to facilitate data integration and management across different omics domains.
The document discusses how integrative studies can provide insights through combining candidate genomic regions, mitochondrial proteomic data, and cancer expression compendiums to discover genes involved in diseases like Leigh Syndrome and cancers. It also highlights several other studies that have integrated data like DNA sequences, copy numbers, methylation, expression profiles, and pathways to characterize disease subtypes and improve risk stratification for conditions such as glioblastoma multiforme and medulloblastoma. The document presents an example of a translational research study that integrated multiple genomic data types and computational tools in 12 steps to analyze alterations in gene expression and identify potential transcription factor binding sites.
The document discusses the evolution of genomic resources at the National Center for Biotechnology Information (NCBI) over the past 22 years. It shows graphs of the growth in data volumes for resources like GenBank, users accessing services, and the number of human variations cataloged in dbSNP. Key resources highlighted include PubMed, BLAST, Entrez, GenBank, dbSNP, Reference Sequence (RefSeq), Genome Remapping Service, Sequence Read Archive, and more. The document outlines NCBI's role in organizing and providing access to genomic and biomedical literature data.
Summary: ENViz performs enrichment analysis for pathways and gene ontology (GO) terms in matched datasets of multiple data types (e.g. gene expression and metabolites or miRNA), then visualizes results as a Cytoscape network that can be navigated to show data overlaid on pathways and GO DAGs.
Background: Modern genomic, metabolomic, and proteomic assays produce multiplexed measurements that characterize molecular composition and biological activity from complementary angles. Integrative analysis of such measurements remains a challenge to life science and biomedical researchers. We present an enrichment network approach to jointly analyzing two types of sample matched datasets and systematic annotations, implemented as a plugin to the Cytoscape [1] network biology software platform.
Approach: ENViz analyses a primary dataset (e.g. gene expression) with respect to a ‘pivot’ dataset (e.g. miRNA expression, metabolomics or proteomics measurements) and primary data annotation (e.g. pathway or GO). For each pivot entity, we rank elements of the primary data based on their correlation to the pivot across all samples, and compute statistical enrichment of annotation sets in the top of this ranked list based on minimum hypergeometric statistics [2]. Significant results are represented as an enrichment network - a bipartite graph with nodes corresponding to pivot and annotation entities, and edges corresponding to pivot-annotation pairs with statistical enrichment scores above the user-defined threshold. Correlations of primary data and pivot data are visually overlaid on biological pathways for significant pivot-annotation pairs using the WikiPathways resource [3], and on gene ontology terms. Edges of the enrichment network may point to functionally relevant mechanisms. In [4], a significant association between miR-19a and the cell-cycle module was substantiated as an association to proliferation, validated using a high-throughput transfection assay. The figures below show a pathway enrichment network, with pathway nodes green and miRNAs gray (left), network view of the edge between Inflammatory Response Pathway and mir-337-5p (center), and GO enrichment network with red areas indicating high enrichment for immune response and metabolic processes (right).
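A simplified sketch of the enrichment step: here a fixed top-k cutoff is used, whereas ENViz itself uses the minimum-hypergeometric statistic, which scans all cutoffs of the ranked list. All function and variable names are illustrative.

```python
from math import comb

# Simplified sketch of ranked-list enrichment (fixed top-k cutoff; ENViz
# uses the minimum-hypergeometric statistic, which scans all cutoffs).
def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items from a population of N with K annotated."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

def enrichment_p(ranked, annotated, top_n):
    """Enrichment p-value of an annotation set in the top of a ranked list."""
    k = len(set(ranked[:top_n]) & annotated)   # annotated genes in the top
    K = len(annotated & set(ranked))           # annotated genes overall
    return hypergeom_tail(k, top_n, K, len(ranked))
```

For example, if all three members of an annotation set land in the top 3 of a 10-gene ranked list, the p-value is 1/C(10,3) = 1/120.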
This document discusses the marriage of translational medicine and big data. It notes that predicting treatment response to known oncogenes like EGFR is complex and requires detailed understanding of genetic backgrounds. Networks can identify genes causal for disease. The approach uses probabilistic causal network models, with over 80 publications validating the scientific approach. Sage Bionetworks is building disease maps and data repositories through collaborations with industry, foundations, government and academia. Fundamentally, biological science hasn't changed due to omics but iterative networked approaches are needed to generate, analyze and support new disease models.
This document summarizes research characterizing DNA methylation in the Pacific oyster Crassostrea gigas. High-throughput bisulfite sequencing was used to analyze DNA methylation patterns at high resolution. Several genes were found to have different levels and patterns of methylation across tissues and developmental stages. The results provide evidence that DNA methylation plays an important regulatory role and may be involved in environmental responses in C. gigas. Future work will investigate how epigenetic mechanisms are affected by environmental stressors.
The document discusses metagenomics analysis tools and challenges. It summarizes several metagenome analysis portals that provide computational analysis and public sample databases. It also discusses the rapid growth of metagenomic data being produced, challenges around quality control, feature identification, characterization and presentation of metagenomic data, and the need for standardized metadata and data formats. The future directions highlighted include studying strain variation, expanding metadata capture and standards, and developing improved assembly, binning and analysis methods.
This document outlines a talk on protein function and bioinformatics. It discusses why bioinformatics is needed due to the rapid increase in genomic data. It introduces various bioinformatics tools for tasks like sequence analysis, database searches, and structure prediction. As a case study, it examines the genome of the psychrophilic archaeon Methanococcoides burtonii, identifying cold-adaptation features like CSP-like proteins and modified tRNAs. It emphasizes that bioinformatics provides useful predictions but must be integrated with experimental data.
This document describes a new approach to predict protein function in humans by combining large-scale evolutionary analyses with multiple biological data sources. The approach uses 49,231 features derived from sources like sequence similarity, predicted structural characteristics, domain architectures, gene fusions, gene co-expression, and protein-protein interactions to compute a functional similarity score between proteins. This functional similarity score is then used to predict Gene Ontology terms and annotate unannotated human protein sequences. The approach was able to annotate 30% of previously unannotated human protein sequences.
Combining large-scale evolutionary analyses with multiple biological data sources to predict human protein function. The approach uses sequence and structural features, gene expression data, protein interactions, and domain architectures to compute a functional similarity score between proteins. This allows predicting functions for unannotated human proteins, including rare functions. The method was applied to predict Gene Ontology terms for over 20,000 unannotated human proteins, with 16% and 9% having exact matches for molecular function and biological process terms.
A significant amount of time could be saved by cutting out the unnecessary steps from traditional cloning and moving to gene synthesis. Gene synthesis has become a cost-effective, time- and resource-saving method for obtaining nearly any desired DNA construct with 100% accuracy. It outperforms conventional molecular biology techniques in terms of time and cost, while providing equivalent or better expression performance, construct stability, and quality. GeneArt® gene synthesis tools go beyond traditional synthesis and enable expression optimization and maximum performance. Watch this webinar with audio at: http://owl.li/jppYn
Stephen Friend Nature Genetics Colloquium 2012-03-24 (Sage Base)
This document proposes using data intensive science to build models of disease within a shared computing environment or "commons". It notes that current disease models often oversimplify complex conditions. Five pilot projects are described that could leverage shared clinical and genomic data as well as model building to better represent diseases: 1) sharing comparator arm data from clinical trials, 2) a federated aging analysis project, 3) portable legal consent, 4) a Sage Congress modeling competition, and 5) the BRIDGE initiative for democratizing medical research. The document argues this approach could accelerate disease understanding and new therapy development.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels (Northern Engraving)
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
The Microsoft 365 Migration Tutorial For Beginner.pptx (operationspcvita)
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
What is an RPA CoE? Session 1 – CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application... (Alex Pruden)
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..." (Fwdays)
Direct losses from downtime in 1 minute = $5-$10 thousand. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Biocuration2012 Eugeni Belda
1. Eugeni Belda
Laboratory of Bioinformatic Analysis in Genomics and Metabolism (LABGeM team)
CEA/DSV/IG/Genoscope & CNRS UMR8030
2. Introduction
Advances in sequencing technologies have allowed an exponential accumulation of complete genome sequences in public databases in recent years. However, a wide gap exists between the rapid advances in genome sequencing and the slow progress in the characterization of new protein functions: of 12,273 protein families (Pfam), about 26% are of unknown function, and of 4,712 enzymatic activities (EC numbers), about 25% are orphan reactions.
Genoscope (French National Sequencing Center) has as one fundamental research objective the extension of in silico sequence annotations with the experimental characterization of new enzymatic functions (Metabolic Genomics). Its laboratories include:
- Lab. of Genomics & Biochemistry of Metabolism (LGBM)
- Lab. of Organic Chemistry and Biocatalysis (LCOB)
- Lab. for Enzymatic Cloning and Screening (LCAB)
- Lab. of Bioinformatic Analysis in Genomics and Metabolism (LABGeM)
3. Three MicroScope components
- Process Management: primary databank updates, syntactic and functional/relational annotations, and more than 25 analysis methods integrated in a JBPM workflow with a job management system, giving full automatisation of genome annotation and keeping primary data up-to-date.
- Data Management: the PkGDB relational database (with release history) and MicroCyc, integrating primary databanks, internal genomic objects, computational results, and Pathway/Genome DataBases.
- MaGe Web Interface (login, tutorial): genome overview, genome browser, synteny maps (KEGG, MicroCyc, CGView, LinePlot, synton display), gene cards and gene editor, keyword search, Blast and pattern search, phylogenetic profiles, fusion/fission, tandem duplications, minimal gene set, RGPfinder, SNPs/InDels, metabolic profile, pathway/synteny views, and data export (Artemis).
References: Vallenet D. et al., "MicroScope - a platform for microbial genome annotation and comparative genomics", Database 2009; Vallenet D. et al., "MaGe - a microbial genome annotation system supported by synteny results", Nucleic Acids Research 2006.
4. Database Management
The relational database PkGDB (Prokaryotic Genome DataBase) stores EC/reaction correspondences. Experimentally elucidated metabolic pathways are taken from a reference resource of 1800 pathways from 2216 organisms (P. Karp, SRI, USA). Using Pathway Tools, a metabolic database is built for each annotated microbial genome: PGDB = Pathway/Genome Database (orgname_Cyc).
http://www.genoscope.cns.fr/agc/microcyc
Today: 1233 organisms (of which 676 public genomes). KEGG metabolic maps (http://www.kegg.jp/) are mapped on the PkGDB.
5. MicroScope Web site
More than 30 tools are made available to the community («guest» access). Since 2005, more than 50,000 expert annotations per year; > 1,000 users, 300 of them active.
www.genoscope.cns.fr/agc/microscope
6. Curation of metabolic data in MicroScope
CanOE (Candidate genes for Orphan Enzymes): a method for the automatic integration of genomic and metabolic contexts that assists expert functional annotation, especially in the case of orphan enzymes. It is based on the concept of the metabolon: “close” genes in the genome sequence associated with “close” metabolic reactions (Boyer et al., Bioinformatics 2005 Dec 1;21(23):4209-15). Genes on the genome are linked through their functional annotations to reactions and compounds in the metabolic network; gene gaps and reaction gaps (including orphan reactions) appear inside metabolons. The method provides candidate genes for global/local orphan enzymatic activities that are located in the “gaps” of metabolons.
https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php
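The metabolon idea can be illustrated with a toy sketch. This is not the actual CanOE algorithm, and all data structures are assumptions for illustration: an unannotated gene lying between genes whose reactions flank a one-reaction gap in a pathway is proposed as a candidate for that gap reaction.

```python
# Toy illustration of the metabolon concept (not the actual CanOE
# algorithm): an unannotated gene lying between genes whose reactions
# flank a one-reaction gap in a pathway is proposed as a candidate
# for the gap ("orphan") reaction.
def candidate_genes(gene_order, gene2rxn, pathway):
    """gene_order: genes in genome order; gene2rxn: known gene->reaction
    assignments; pathway: reactions in metabolic order.
    Returns {gap_reaction: [candidate genes]}."""
    candidates = {}
    for i, gene in enumerate(gene_order):
        if gene in gene2rxn:
            continue  # gene already has a reaction assigned
        if i == 0 or i + 1 == len(gene_order):
            continue  # need annotated neighbours on both sides
        left = gene2rxn.get(gene_order[i - 1])
        right = gene2rxn.get(gene_order[i + 1])
        if left in pathway and right in pathway:
            li, ri = pathway.index(left), pathway.index(right)
            if abs(li - ri) == 2:  # exactly one reaction missing in between
                gap = pathway[(li + ri) // 2]
                candidates.setdefault(gap, []).append(gene)
    return candidates
```

For instance, with gene order [b1, bX, b3], known assignments b1→R1 and b3→R3, and pathway R1→R2→R3, the unannotated gene bX becomes a candidate for the gap reaction R2.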
7. Curation of metabolic data in MicroScope
CanOE (Candidate genes for Orphan Enzymes) example: the allantoin degradation metabolon in E. coli K12. EC 2.1.3.5 is a global orphan reaction (not associated with any gene in any organism). Three candidate genes are proposed for the EC 2.1.3.5 reaction; none shares any significant similarity with known carbamoyltransferases. Protein expression and biochemical assays are under way.
Smith A.A.T., Belda E., Viari A., Médigue C., and Vallenet D., “The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision).
8. Curation of metabolic data in MicroScope
GPR curation interface: in the context of network reconstruction, the definition of Gene-Protein-Reaction (GPR) associations is essential, i.e. which genes encode the enzymes, complexes, or isozymes catalyzing a particular metabolic reaction (Thiele & Palsson, Nat Protoc. 2010;5(1):93-121).
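A GPR association is essentially a Boolean rule over genes: "and" for the subunits of a complex (all required), "or" for isozymes (any one suffices). A minimal sketch, with an assumed nested-tuple rule format:

```python
# Minimal sketch of GPR (Gene-Protein-Reaction) rule evaluation.
# "and" models a complex (all subunit genes required); "or" models
# isozymes (any one gene suffices). The rule format is an assumption.
def reaction_catalyzed(gpr, present):
    """gpr: gene name or nested ("and"/"or", sub-rule, ...) tuple;
    present: set of genes available in the strain."""
    if isinstance(gpr, str):
        return gpr in present
    op, *subrules = gpr
    results = [reaction_catalyzed(sub, present) for sub in subrules]
    return all(results) if op == "and" else any(results)

# A reaction catalyzed either by a two-subunit complex A+B or by isozyme C:
rule = ("or", ("and", "geneA", "geneB"), "geneC")
```

With this rule, the strain {geneA, geneB} or {geneC} can catalyze the reaction, while {geneA} alone cannot.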
9. Curation of metabolic data in MicroScope
GPR curation interface: the gene curation interface of MicroScope allows the validation of Gene-Reaction associations based on curated gene annotations. Two reference reaction resources are available: MetaCyc (functional) and RHEA (under development). MetaCyc/Rhea reactions are retrieved automatically from EC number annotations (e.g. 4.1.3.27, 2.4.2.18) or found by keyword search.
10. Curation of metabolic data in MicroScope
Pathway validation interface: validation/curation of automatically projected MetaCyc pathways, based on Gene-Reaction associations.
11. Microme project: www.microme.eu
A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics.
Purpose: develop bioinformatics infrastructures, together with a projection and curation process, in order to generate:
- complete metabolic pathways from genome annotations
- whole-cell metabolic models from pathway assemblies
Metabolic models are validated experimentally using growth phenotype data (e.g. BIOLOG experiments) generated within the project for a subset of selected species. Analytical tools are integrated for comparative and phylogenetic analysis based on projected pathways and metabolic models.
Partners: AMAbiotics, Centro Nacional de Biotecnología, CEA-Genoscope, European Bioinformatics Institute, Center for Research and Technology Hellas, German Collection of Microorganisms and Cell Cultures, ISTHMUS, Spanish National Cancer Centre, Molecular Networks, Tel-Aviv University, Université Libre de Bruxelles, Swiss Institute of Bioinformatics, Wageningen University, Wellcome Trust Sanger Institute.
12. Microme WP2: Objectives
Provide EU with a curated microbial metabolic resource
Implement a unique cyclic and colaborative curation process for metabolic data
Unification of existing metabolic resources:
Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)
Cross-references External resources (compounds, reactions, pathways):
KEGG, MetaCyc, Metabolic models
Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H.,
de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and
Steinbeck C. (2012) Rhea: a manually curated resource of biochemical reactions.
Nucleic Acids Research 40 (Database issue), D754-D760.
MicroScope and Microme
Use MicroScope as the reference resource of curated GPR (Gene-Protein-Reaction)
associations for the microbial genomes included in the Microme project
Development of novel interfaces for GPR curation in the MicroScope environment:
retrieval of MetaCyc and Rhea reactions for a particular gene object from its EC-number
annotations
13. MicroScope and Microme
Development of web services to provide Microme partners with curated Gene-Reaction
associations from the MicroScope platform.
[Diagram: the curation tool writes to PkGDB; each night, pathway reconstruction
updates MicroCyc; results are exposed through web services.]
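A client for such a web service might look like the sketch below. The endpoint path, host, and JSON payload shape are hypothetical illustrations, not the actual Microme/MicroScope API.

```python
import json

# Sketch of a client for a MicroScope-style web service returning curated
# Gene-Reaction associations. URL and payload schema are invented for
# illustration only.
BASE_URL = "https://example.org/microscope/ws"  # hypothetical host

def association_url(genome_id):
    """Build the (hypothetical) query URL for one genome's GPR associations."""
    return f"{BASE_URL}/genomes/{genome_id}/gene-reactions"

def parse_associations(payload):
    """Turn a JSON payload into (gene, reaction, status) tuples."""
    return [(a["gene"], a["reaction"], a["status"])
            for a in json.loads(payload)["associations"]]

# Example payload a partner site might receive and parse.
sample = json.dumps({"associations": [
    {"gene": "BSU30190", "reaction": "RXN-09989", "status": "curated"},
]})
rows = parse_associations(sample)
```

Separating URL construction from payload parsing keeps the nightly-synchronised consumer side testable without a live server.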
14. Test-case: Bacillus subtilis 168 re-annotation
Second most intensively studied bacterium after Escherichia coli, and a model
organism for Gram-positive bacteria.
Genome sequenced in 1997: 4.2 Mb, ~4,000 CDSs.
Nature 1997 Nov 20;390(6657):249-56
Re-sequencing and first re-annotation of the genome in 2009.
Microbiology (2009), 155, 1758-1775
Re-annotation of the genome in the context of the Microme project, with special
focus on the curation of Gene-Reaction associations using the MicroScope metabolic
tools and curation interface. Collaborative work between LABGeM (CEA), SIB and
AMAbiotics (Antoine Danchin).
15. Test-case: Bacillus subtilis 168 re-annotation
Starting data for the curation of Gene-Reaction associations:
- 909 CDSs with a predicted MetaCyc reaction:
  - 531 CDSs with a BBH relationship with E. coli CDSs
  - 378 CDSs without a BBH relationship with E. coli CDSs
- 310 CDSs with "enzyme" in the product-type annotation and no predicted MetaCyc
reaction
- 508 CDSs with "putative enzyme" in the product-type annotation and no predicted
MetaCyc reaction
16. Test-case: Bacillus subtilis 168 re-annotation
From the 909 CDSs with a predicted reaction:
- 531 with a BBH in E. coli:
  - 416 with the same GPR in B. subtilis and E. coli (EcoCyc): automatic
validation of the Gene-Reaction associations
  - 115 with a different GPR in B. subtilis and E. coli (EcoCyc): manual
curation of the Gene-Reaction associations in the MicroScope environment,
using sequence similarity profiles and genomic context conservation
- 378 without a BBH in E. coli:
  - 254 with a GPR predicted from the curated EC number
  - 124 with a GPR predicted from the "product" annotation
310 CDSs with an "enzyme" annotation and no predicted reaction: integration of
genomic and metabolic context (CanOE strategy), exploiting co-evolution patterns
of functionally related genes.
508 CDSs with a "putative enzyme" annotation and no predicted reaction: filtered
by the Catalytic activity field of SwissProt annotations (41 CDSs retained).
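The automatic-validation rule for BBH pairs can be sketched as a set comparison: a B. subtilis gene whose predicted reaction set equals that of its E. coli bidirectional best hit is validated automatically, and mismatches are routed to manual curation. Gene IDs and reaction sets below are illustrative, not real curation data.

```python
# Triage sketch: auto-validate Gene-Reaction associations when the B. subtilis
# GPR matches the E. coli BBH partner's GPR (EcoCyc); otherwise flag for
# manual curation. All identifiers here are made up for illustration.
def triage(bsu_gpr, eco_gpr, bbh):
    auto, manual = [], []
    for bsu_gene, eco_gene in bbh.items():
        if bsu_gpr.get(bsu_gene, set()) == eco_gpr.get(eco_gene, set()):
            auto.append(bsu_gene)      # identical reaction sets: auto-validate
        else:
            manual.append(bsu_gene)    # divergent GPR: send to a curator
    return auto, manual

bsu = {"bsuA": {"RXN-1"}, "bsuB": {"RXN-2", "RXN-3"}}
eco = {"ecoA": {"RXN-1"}, "ecoB": {"RXN-2"}}
auto, manual = triage(bsu, eco, {"bsuA": "ecoA", "bsuB": "ecoB"})
```

On the real data this split gave 416 automatically validated CDSs and 115 sent to manual curation.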
17. Test-case: Bacillus subtilis 168 re-annotation
Problems associated with the automatic prediction of Gene-Reaction associations.
Example: a generic EC number definition associated with multiple specific
reaction instances in MetaCyc.
- 17 reactions predicted from the EC 1.2.1.3 annotation alone, with no
experimental evidence of activity and only a generic product annotation.
- This is problematic for modelling purposes: without experimental evidence of
the specific substrates, only the generic reaction has been validated.
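The validation policy described above can be stated in a few lines: expand the generic EC number to its instance reactions only when substrate evidence exists, otherwise keep just the generic reaction. Reaction identifiers below are illustrative placeholders.

```python
# Sketch of the curation policy for generic EC numbers: EC 1.2.1.3 (aldehyde
# dehydrogenase) maps to many specific MetaCyc instance reactions, but without
# substrate evidence only the generic reaction is validated. IDs are
# placeholders, not real MetaCyc identifiers.
def validated_reactions(instances, has_substrate_evidence):
    if has_substrate_evidence:
        return instances["specific"]       # keep the specific instance reactions
    return [instances["generic"]]          # fall back to the generic reaction

ec_1_2_1_3 = {
    "generic": "GENERIC-ALDEHYDE-DH-RXN",
    "specific": [f"SPECIFIC-RXN-{i}" for i in range(17)],  # 17 predicted instances
}
kept = validated_reactions(ec_1_2_1_3, has_substrate_evidence=False)
```

This is why the curated association count can drop even as annotation quality improves: many over-specific predictions collapse into one validated generic reaction.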
18. Test-case: Bacillus subtilis 168 re-annotation
Statistics of Gene-Reaction association curation in MicroScope (manually curated
counts in parentheses):

                                 Initial predictions   Current associations
                                 (Pathway Tools)       (manually curated)
  Nº reactions                   1022                  985 (388)
  Nº CDSs                        901                   1006 (517)
  Nº Gene-Reaction associations  1549                  1406 (715)

- 105 CDSs had no automatically predicted reaction in the initial projections
- 147 new reactions were added (not originally predicted)
- 184 originally predicted reactions were removed
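The counts above are internally consistent, which a quick arithmetic check makes explicit: additions and removals account for the change in reaction count, and the newly associated CDSs account for the change in CDS count.

```python
# Consistency check on the curation statistics: current totals should follow
# from the initial Pathway Tools predictions plus curation changes.
initial_reactions, added, removed = 1022, 147, 184
current_reactions = initial_reactions + added - removed   # expect 985

initial_cds, cds_without_initial_prediction = 901, 105
current_cds = initial_cds + cds_without_initial_prediction  # expect 1006
```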
19. Test-case: Bacillus subtilis 168 re-annotation
17 possible updates of SwissProt annotations and 6 possible new EC numbers,
reported to the SwissProt/IUBMB curators.
13 possible new metabolic pathways/pathway variants not present in MetaCyc:
- New pathway variants: biotin biosynthesis, lipoate biosynthesis, myo-inositol
catabolism, rhamnogalacturonan type I degradation, acetoin dehydrogenase,
methionine salvage, aerobic respiration
- New metabolic pathways: bacillaene biosynthesis, aromatic polyketide
biosynthesis, 2-methylthio-N6-threonylcarbamoyladenosine biosynthesis,
bacilysocin biosynthesis, archaeal-type ether lipid biosynthesis,
methionine-cysteine interconversion
20. Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: update of the DAPA aminotransferase step
(EC 2.6.1.62).
In the KEGG (map00780) and MetaCyc (PWY-5005) pathways, S-adenosyl-L-methionine
is the amino-group donor; in the Bacillus subtilis BioA enzyme, L-lysine is used
instead of S-adenosyl-methionine as the amino-group donor.
21. Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: link with fatty acid metabolism, and
improvement of genome-scale metabolic models.
iBsu1103: the most up-to-date B. subtilis 168 metabolic model (SEED methodology;
1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL.
Genome Biol. 2009;10(6):R69
In the model, pimelate (EX_pimelate) is a dead-end metabolite, making the
biotin biosynthesis pathway auxotrophic, and biotin (EX_biotin) is not included
in the biomass equation.
FBA simulations with the iBsu1103 model (biomass production rate):
- iBsu1103: 122.97
- iBsu1103, biotin in the biomass equation: 0.00
- iBsu1103, external pimelate influx: 122.97
- iBsu1103, external biotin influx: 122.97
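The qualitative logic behind these FBA results can be shown with a toy producibility check: if biotin is required by biomass while its precursor pimelate is a dead end, no biomass is made; supplying pimelate (or biotin) externally restores growth. This is a reachability sketch, not the iBsu1103 model or a real FBA solver.

```python
# Toy metabolite-reachability check illustrating the biotin auxotrophy result.
# Reactions are (substrates, products) pairs; the network below is a two-step
# caricature, not the real iBsu1103 stoichiometry.
def producible(targets, reactions, seeds):
    """True if all target metabolites are reachable from the seed nutrients."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for subs, prods in reactions:
            if set(subs) <= have and not set(prods) <= have:
                have |= set(prods)
                changed = True
    return all(t in have for t in targets)

# pimelate is a dead end: no reaction in the network produces it
network = [(["pimelate"], ["biotin"]), (["glucose"], ["precursors"])]

grows_without_biotin = producible(["precursors"], network, {"glucose"})
grows_biotin_in_biomass = producible(["precursors", "biotin"], network, {"glucose"})
grows_with_pimelate_influx = producible(["precursors", "biotin"], network,
                                        {"glucose", "pimelate"})
```

Adding the newly curated BioI reaction (fatty acids to pimeloyl-ACP) plays the same role as the pimelate influx here: it connects the dead-end precursor to the rest of metabolism.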
22. Test-case: Bacillus subtilis 168 re-annotation
The BioI enzyme of B. subtilis 168 (BSU30190) is a cytochrome P450 protein that
catalyzes the oxidative cleavage of acyl-ACP / free fatty acid molecules
generated during fatty acid biosynthesis, yielding pimeloyl-ACP as the primary
product.
[Diagram: fatty acid metabolism provides an acyl-ACP (or a free fatty acid),
which BioI (BSU30190) cleaves to pimeloyl-ACP; BioF (BSU30220) then condenses
pimeloyl-ACP with L-alanine + H+, releasing CO2 + holo-ACP.]
23. Future work
Extension of the reference set of Microme species to:
- Acinetobacter sp. ADP1
- Pseudomonas putida KT2440
- Bacillus subtilis 168
Second version of the Gene-Reaction curation interface in the MicroScope
environment:
- Curation of protein complexes / isozyme sets
- Management of Rhea reactions in addition to MetaCyc reactions
Definition of strategies for vertical annotation and propagation of curated
GPRs across multiple microbial genomes.
Use of UniPathway as the reference resource of metabolic pathways in MicroScope;
species-specific pathway representations based on the combination of pathway
modules (http://www.unipathway.org).
24. Contributions
Claudine Médigue (Group Leader)
David Vallenet (Researcher)
Damien Monrico (Engineer)
François Lefèvre (Engineer)
Alexander T. Smith (PhD)
Eugeni Belda (Post doc)
IT team: Claude Scarpelli, Ludovic Fleury
External partners: Anne Morgat, Antoine Danchin
Funding
EU Framework Programme 7 Collaborative Project. Grant Agreement Number 222886-2