
Biocuration2012 Eugeni Belda


Presentation of Eugeni Belda (LABGeM-Genoscope) at the Biocuration 2012 conference (Georgetown University, Washington DC): From bacterial genome annotation to metabolic pathway curation

Published in: Technology

  1. Eugenio Belda. Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team), CEA/DSV/IG/Genoscope & CNRS UMR8030
  2. Introduction
     Advances in sequencing technologies have allowed an exponential accumulation of complete genome sequences in public databases in recent years. However, a wide gap exists between rapid advances in genome sequencing and slow progress in the characterization of new protein functions.
     [Figure: 12273 protein families (Pfam), 26% of unknown function; 4712 enzymatic activities (EC number), 25% orphan reactions.]
     Genoscope (French National Sequencing Center) has as one fundamental research objective the extension of in silico sequence annotations with experimental characterization of new enzymatic functions (Metabolic Genomics). Laboratories involved:
     • Lab. of Genomics & Biochemistry of Metabolism (LGBM)
     • Lab. of Organic Chemistry and Biocatalysis (LCOB)
     • Lab. for enzymatic cloning and screening (LCAB)
     • Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM)
  3. Three MicroScope components
     Process management: > 25 methods for syntactic annotations and functional/relational analyses, integrated in a JBPM workflow (job management system) => full automation of genome annotation and up-to-date primary data.
     Data management: PkGDB and MicroCyc databases (primary databanks, internal genomic objects, computational results, pathway/genome databases), with primary databank updates, database releases and annotation history.
     Visualization: MaGe web interface (tutorial, login, keyword search, Blast and pattern search, phylogenetic profile, fusion/fission, genome overview, tandem duplications, genome browser, minimal gene set, data export, RGPfinder, synteny maps, SNPs/InDels, Artemis, KEGG, MicroCyc, CGView, LinePlot, synton display, gene editor, gene card, metabolic profile, pathway/synteny).
     References: Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009. Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006.
  4. Database Management
     Relational database: PkGDB (Prokaryotic Genome DataBase).
     • EC / reaction correspondence
     • Experimentally elucidated metabolic pathways
     • 1800 pathways from 2216 organisms (P. Karp, SRI, USA)
     Pathway Tools: a metabolic database is built for each annotated microbial genome. PGDB = Pathway/Genome Database (orgname_Cyc). Today: 1233 organisms (of which 676 public genomes).
     Mapping on the PkGDB KEGG metabolic maps (
  5. MicroScope Web site
     • More than 30 tools are made available to the community
     • «guest» access
     • Since 2005, more than 50,000 expert annotations per year
     • > 1,000 users, 300 active
  6. Curation of metabolic data in MicroScope: CanOE (Candidate genes for Orphan Enzymes)
     Method for the automatic integration of genomic and metabolic contexts that assists expert functional annotation, especially in the case of orphan enzymes. Based on the concept of the metabolon ("close" genes in the genome sequence associated with "close" metabolic reactions): Boyer et al., Bioinformatics 2005 Dec 1;21(23):4209-15.
     The method provides candidate genes for global/local orphan enzymatic activities that are located in the "gaps" of metabolons.
     [Figure labels: genes on genome; gene gaps; functional annotations; reactions and compounds in metabolic network; reaction gap; orphan.]
  7. Curation of metabolic data in MicroScope: CanOE (Candidate genes for Orphan Enzymes)
     Example: allantoin degradation metabolon in E. coli K12, which contains a global orphan reaction (not associated with any gene in any organism).
     • Three candidate genes for EC: reaction
     • None shares any significant similarity with known carbamoyltransferases
     • Protein expression and biochemical assays under way
     Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. "The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes" (PLoS Computational Biology, in revision)
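The metabolon idea behind CanOE (slides 6 and 7) can be illustrated with a toy sketch. This is not the published CanOE algorithm: the function name, the fixed gene window and the data layout are all assumptions made for illustration. It only shows the core intuition that an unannotated gene sitting next to genes whose reactions neighbour an orphan reaction becomes a candidate for that orphan.

```python
from typing import Dict, List, Optional, Set


def candidate_genes_for_orphans(
    gene_order: List[str],
    gene_to_reaction: Dict[str, Optional[str]],
    reaction_neighbours: Dict[str, Set[str]],
    orphan_reactions: Set[str],
    window: int = 1,
) -> Dict[str, List[str]]:
    """Toy 'gap filling': for each orphan reaction, list the unannotated genes
    lying within `window` positions of a gene whose reaction is adjacent to
    the orphan reaction in the metabolic network (hypothetical simplification)."""
    candidates: Dict[str, List[str]] = {}
    for orphan in sorted(orphan_reactions):
        neighbours = reaction_neighbours.get(orphan, set())
        hits: List[str] = []
        for i, gene in enumerate(gene_order):
            if gene_to_reaction.get(gene) is not None:
                continue  # gene already has a reaction, so it is not a gap
            lo = max(0, i - window)
            hi = min(len(gene_order), i + window + 1)
            nearby = gene_order[lo:i] + gene_order[i + 1:hi]
            if any(gene_to_reaction.get(g) in neighbours for g in nearby):
                hits.append(gene)
        candidates[orphan] = hits
    return candidates


# Hypothetical genome segment: g2, g4 and g5 have no reaction assigned.
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_to_reaction = {"g1": "R1", "g2": None, "g3": "R3",
                    "g4": None, "g5": None, "g6": "R9"}
# Hypothetical network: orphan reaction R2 shares compounds with R1 and R3.
reaction_neighbours = {"R2": {"R1", "R3"}}

candidates = candidate_genes_for_orphans(
    gene_order, gene_to_reaction, reaction_neighbours, {"R2"})
print(candidates)
```

In this toy input, g2 and g4 are flagged because they sit beside genes catalysing R1 or R3, while g5 is not, since its neighbours carry no reaction close to R2.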
  8. Curation of metabolic data in MicroScope: GPR curation interface
     In the context of network reconstruction, it is essential to define Gene-Protein-Reaction (GPR) associations, i.e. the genes encoding the enzymes/complexes/isozymes that catalyze a particular metabolic reaction: Thiele & Palsson; Nat Protoc. 2010;5(1):93-121
  9. Curation of metabolic data in MicroScope: GPR curation interface
     The gene curation interface of MicroScope allows the validation of Gene-Reaction associations based on curated gene annotations. Two reference reaction resources are available, MetaCyc (functional) and Rhea (under development):
     • Automatic retrieval of MetaCyc/Rhea reactions based on EC number
     • Keyword search
  10. Curation of metabolic data in MicroScope: Pathway validation interface
     Validation/curation of automatically projected MetaCyc pathways based on Gene-Reaction associations.
  11. Microme project: A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics
     Purpose: develop bioinformatics infrastructures, together with a projection and curation process, in order to generate:
     • complete metabolic pathways from genome annotations
     • whole-cell metabolic models from pathway assemblies
     Experimental validation of metabolic models using growth phenotype data (i.e. BIOLOG experiments) generated within the project for a subset of selected species.
     Analytical tools are integrated for comparative and phylogenetic analysis based on projected pathways and metabolic models.
     Partners: AMAbiotics; Centro Nacional de Biotecnología; CEA-Genoscope; European Bioinformatics Institute; Center for Research and Technology Hellas; German Collection of Microorganisms and Cell Cultures; ISTHMUS; Spanish National Cancer Centre; Molecular Networks; Tel-Aviv University; Université Libre de Bruxelles; Swiss Institute of Bioinformatics; Wageningen University; Wellcome Trust Sanger Institute.
  12. Microme WP2: Objectives
     • Provide the EU with a curated microbial metabolic resource
     • Implement a unique cyclic and collaborative curation process for metabolic data
     • Unification of existing metabolic resources:
       - Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)
       - Cross-referenced external resources (compounds, reactions, pathways): KEGG, MetaCyc, metabolic models
     Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research 40, D754-D760, Database issue.
     MicroScope and Microme:
     • Use MicroScope as the reference resource of curated GPR (Gene-Protein-Reaction) associations for the microbial genomes included in the Microme project
     • Development of novel interfaces for GPR curation in the MicroScope environment: retrieval of MetaCyc and Rhea reactions for a particular gene object from EC number annotations
  13. MicroScope and Microme
     Development of web services to provide Microme partners with curated Gene-Reaction associations from the MicroScope platform.
     [Diagram labels: curation tool; PkGDB; MicroCyc reconstruction each night; web services.]
  14. Test-case: Bacillus subtilis 168 re-annotation
     • Second most intensively studied bacterium after Escherichia coli, and a model organism for Gram-positive bacteria
     • Genome sequenced in 1997: 4.214 megabases, 4000 CDSs. Nature 1997 Nov 20;390(6657):249-56
     • Re-sequencing and first re-annotation of the genome in 2009. Microbiology (2009), 155, 1758-1775
     • Re-annotation of the genome in the context of the Microme project, with special focus on the curation of Gene-Reaction associations using the MicroScope metabolic tools and curation interface. Collaborative work: LABGeM (CEA), SIB, AMAbiotics (Antoine Danchin)
  15. Test-case: Bacillus subtilis 168 re-annotation
     Starting data for the curation of Gene-Reaction associations:
     • 909 CDSs with a predicted MetaCyc reaction:
       - 531 CDSs with a BBH relationship with E. coli CDSs
       - 378 CDSs with no BBH relationship with E. coli CDSs
     • 310 CDSs and 508 CDSs with no predicted MetaCyc reaction, annotated as "Enzymes" or "Putative enzymes" in the Product type annotation
  16. Test-case: Bacillus subtilis 168 re-annotation
     From the 909 CDS with predicted reaction:
     • 531 with BBH in E. coli:
       - 416 with the same GPR in B. subtilis and E. coli (EcoCyc)
       - 115 CDS with different GPR in B. subtilis and E. coli (EcoCyc)
     • 378 without BBH in E. coli:
       - 254 with GPR predicted from the curated EC number
       - 124 with GPR predicted from the "product" annotation
     310 CDS with "enzyme" annotation and without predicted reaction.
     508 CDS with "enzyme" annotation and without predicted reaction: filter by the Catalytic activity field in SwissProt annotations (41 CDSs).
     Curation routes:
     • Automatic validation of Gene-Reaction associations
     • Manual curation of Gene-Reaction associations in the MicroScope environment, supported by:
       - Sequence similarity profiles
       - Genomic context conservation
       - Integration of genomic and metabolic context (CanOE strategy)
       - Co-evolution patterns of functionally related genes
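The triage in slide 16 can be sketched as a small routing function. This is an illustrative reading, not the MicroScope implementation: it assumes that CDSs whose predicted reactions match those of their E. coli BBH (EcoCyc) are validated automatically and that every other case goes to manual curation. The `Cds` class, the `triage` function and the reaction identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Cds:
    """Hypothetical record for one coding sequence."""
    locus: str
    predicted_reactions: FrozenSet[str]
    # Reactions of the E. coli best bidirectional hit; None when there is no BBH.
    ortholog_reactions: Optional[FrozenSet[str]]


def triage(cds: Cds) -> str:
    """Route a CDS to automatic validation or to a manual curation queue."""
    if not cds.predicted_reactions:
        return "manual: no predicted reaction"
    if cds.ortholog_reactions is None:
        return "manual: no BBH in E. coli"
    if cds.predicted_reactions == cds.ortholog_reactions:
        return "automatic validation"
    return "manual: GPR differs from E. coli"


examples = [
    Cds("cds_a", frozenset({"RXN-1"}), frozenset({"RXN-1"})),
    Cds("cds_b", frozenset({"RXN-1"}), frozenset({"RXN-2"})),
    Cds("cds_c", frozenset({"RXN-3"}), None),
    Cds("cds_d", frozenset(), None),
]
routes = {c.locus: triage(c) for c in examples}
print(routes)

# Consistency check on the counts given in the slide.
assert 416 + 115 == 531   # CDSs with a BBH in E. coli
assert 254 + 124 == 378   # CDSs without a BBH in E. coli
assert 531 + 378 == 909   # all CDSs with a predicted reaction
```

The final assertions only confirm that the slide's numbers add up; they say nothing about how the real pipeline compares GPRs.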
  17. Test-case: Bacillus subtilis 168 re-annotation
     Problems associated with automatic predictions of Gene-Reaction associations. Example: a generic EC number definition associated with multiple specific reaction instances in MetaCyc.
     • No experimental evidence of activity; generic product annotation
     • 17 predicted reactions based on EC: annotation. Problems in terms of modelling purposes
     • Without experimental evidence of specific substrates, only the generic reaction has been validated
  18. Test-case: Bacillus subtilis 168 re-annotation
     Stats of curation of Gene-Reaction associations in MicroScope. Initial Gene-Reaction predictions (Pathway Tools) versus current Gene-Reaction associations (manually curated in parentheses):
     • Nº reactions: 1022 initial; 985 (388) current
     • Nº CDS: 901 initial; 1006 (517) current
     • Nº Gene-Reaction associations: 1549 initial; 1406 (715) current
     Changes:
     • 105 CDS without automatically predicted reaction in initial projections
     • 147 new reactions added (not originally predicted)
     • 184 originally predicted reactions removed
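The curation statistics in slide 18 amount to a set comparison between the initial projection and the curated state. The sketch below uses hypothetical reaction identifiers and helper names, and assumes the chart reads as 1022 initial and 985 current reactions, and 901 initial and 1006 current CDSs. Under that reading the reported additions and removals are arithmetically consistent.

```python
from typing import Dict, Set


def compare_projections(initial: Set[str], curated: Set[str]) -> Dict[str, Set[str]]:
    """Split reactions into those added, removed and kept by curation."""
    return {
        "added": curated - initial,
        "removed": initial - curated,
        "kept": initial & curated,
    }


def expected_current(initial_count: int, added: int, removed: int) -> int:
    """Count after curation, given the initial count and the changes."""
    return initial_count + added - removed


# Hypothetical reaction identifiers, for illustration only.
initial = {"RXN-A", "RXN-B", "RXN-C"}
curated = {"RXN-A", "RXN-C", "RXN-D"}
diff = compare_projections(initial, curated)
print(sorted(diff["added"]), sorted(diff["removed"]))

# Figures from the slide: 147 reactions added, 184 removed, 105 new CDSs.
print(expected_current(1022, 147, 184))  # reactions after curation
print(expected_current(901, 105, 0))     # CDSs after curation
```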
  19. Test-case: Bacillus subtilis 168 re-annotation
     • 17 possible updates of SwissProt annotations (reported to SwissProt/IUBMB curators)
     • 6 possible new EC numbers (reported to SwissProt/IUBMB curators)
     • 13 possible new metabolic pathways/pathway variants not present in MetaCyc
     New pathway variants:
     • Biotin biosynthesis pathway variant
     • Lipoate biosynthesis pathway variant
     • Myo-inositol catabolism pathway variant
     • Rhamnogalacturonan type I degradation pathway variant
     • Acetoin dehydrogenase pathway variant
     • Methionine salvage pathway variant
     • Aerobic respiration pathway variants
     New metabolic pathways:
     • Bacillaene biosynthesis pathway
     • Aromatic polyketide biosynthesis pathway
     • 2-methylthio-N6-threocarbamoyladenosine biosynthesis
     • Bacilysocin biosynthesis
     • Archaeal-type ether lipid biosynthesis
     • Methionine-Cysteine interconversion
  20. Test-case: Bacillus subtilis 168 re-annotation
     Biotin biosynthesis pathway variant: update of the DAP aminotransferase pathway variant (EC: ).
     • Reference pathways: pathway (map00780); MetaCyc pathway (PWY-5005), with S-adenosyl-L-methionine as amino group donor
     • In the Bacillus subtilis BioA enzyme, L-lysine instead of S-adenosyl-methionine is the amino group donor
  21. Test-case: Bacillus subtilis 168 re-annotation
     Biotin biosynthesis pathway variant: link with fatty acid metabolism. Improvement of genome-scale metabolic models.
     iBsu1103: most up-to-date B. subtilis 168 metabolic model (SEED methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL. Genome Biol. 2009;10(6):R69
     Annotations on the iBsu1103 model: dead-end metabolite; auxotrophic for biotin biosynthesis; biotin not included in the biomass equation; exchange reactions EX_pimelate and EX_biotin.
     [Figure: FBA simulations of biomass production rate for four conditions (iBsu1103; iBsu1103 with biotin in biomass; iBsu1103 with external influx of pimelate; iBsu1103 with external influx of biotin). Plotted values: 122.97 for three conditions and 0.00 for one.]
  22. Test-case: Bacillus subtilis 168 re-annotation
     BioI enzyme of B. subtilis 168: cytochrome P450 protein that catalyzes the oxidative cleavage of acyl-ACP/free fatty acid molecules generated in the context of fatty acid biosynthesis, yielding pimeloyl-ACP as primary product.
     [Diagram labels: fatty acid metabolism; an acyl-ACP; a fatty acid; BioI (BSU30190); pimeloyl-ACP; L-alanine + H+; BioF (BSU30220); CO2 + holo-ACP.]
  23. Future work
     • Extension of the reference set of Microme species to:
       - Acinetobacter sp. ADP1
       - Pseudomonas putida KT2440
       - Bacillus subtilis 168
     • Second version of the Gene-Reaction curation interface in the MicroScope environment:
       - Curation of protein complexes / isozyme sets
       - Management of Rhea reactions in addition to MetaCyc reactions
     • Definition of strategies for vertical annotation and propagation of curated GPR across multiple microbial genomes
     • Use UniPathway as the reference resource of metabolic pathways in MicroScope; species-specific pathway representations based on combinations of pathway modules (
  24. Contributions
     LABGeM: Claudine Médigue (Group Leader), David Vallenet (Researcher), Damien Monrico (Engineer), François Lefèvre (Engineer), Alexander T. Smith (PhD), Eugeni Belda (Post doc)
     IT team: Claude Scarpelli, Ludovic Fleury
     External partners: Anne Morgat, Antoine Danchin
     Funding: EU Framework Programme 7 Collaborative Project. Grant Agreement Number 222886-2