This work has been supported by the BBSRC/EPSRC grant: the Manchester Centre for Integrative Systems Biology              ...

Network cheminformatics: gap filling and identifying new reactions in metabolic networks based on metabolite similarity

The number of published metabolic network reconstructions is increasing, as is the range of their applications. However, such reconstructions commonly include gaps (see Figure 1), which are due to incomplete source databases or holes in the biochemical knowledge reported in the literature. Automated techniques help to fill these gaps by adding reactions from external resources such as KEGG.

The approach introduced here is to apply cheminformatics to determine and quantify chemical similarity across all metabolites in a metabolic network of S. cerevisiae. The hypothesis is that metabolite pairs of high chemical similarity are likely to form reaction pairs, in which one metabolite can be converted to the other by a single chemical reaction. High-scoring pairs that do not currently form a reaction pair in the network can be analysed, either by comparison with existing data resources or by literature searches, to determine whether they take part in a metabolic reaction.

Following this approach, preliminary results have led to the discovery of information missing from KEGG, and to the assignment of function and determination of kinetic constants for a gene of previously unknown function.
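
As a rough illustration of the screening idea described above, the sketch below scores every metabolite pair and keeps the high-scoring pairs that are not already linked by a reaction. It is a minimal sketch under stated assumptions, not the authors' pipeline: the poster computes similarity with the CDK but does not name the measure, so the Tanimoto coefficient is assumed here; the fingerprints are toy sets of integers rather than real chemical fingerprints; and the names `tanimoto`, `candidate_pairs`, and the metabolite labels are hypothetical. The 0.7 cutoff is the one quoted in the poster's results.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits.

    Assumption: the poster does not state which similarity measure was used.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(fingerprints, known_pairs, threshold=0.7):
    """Return (m1, m2, score) for pairs above the threshold that are not known pairs."""
    known = {frozenset(pair) for pair in known_pairs}
    hits = []
    for m1, m2 in combinations(sorted(fingerprints), 2):
        if frozenset((m1, m2)) in known:
            continue  # already an actual reaction pair in the network
        score = tanimoto(fingerprints[m1], fingerprints[m2])
        if score > threshold:
            hits.append((m1, m2, score))
    # Highest-scoring candidates first
    return sorted(hits, key=lambda hit: -hit[2])


# Toy data: hypothetical metabolites with made-up fingerprint bits.
fingerprints = {
    "A": frozenset({1, 2, 3, 4, 5}),
    "B": frozenset({1, 2, 3, 4, 5, 6}),
    "C": frozenset({1, 2, 3, 4}),
    "D": frozenset({7, 8, 9}),
}
known_pairs = [("A", "B")]  # pairs already connected by a reaction

hits = candidate_pairs(fingerprints, known_pairs)
for m1, m2, score in hits:
    print(f"{m1} - {m2}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit applied to each metabolite's structure, and the surviving pairs would then be checked against KEGG or the literature as the abstract describes.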

Published in: Technology

Gap filling and identifying new reactions in metabolic networks based on metabolite similarity

Matthew G.S. Norris, Neil Swainston, Paul D. Dobson, Daniel Jameson, Evangelos Simeonidis, Kieran Smallbone, Naglis Malys

Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7ND, UK

This work has been supported by the BBSRC/EPSRC grant: the Manchester Centre for Integrative Systems Biology

Introduction

The number of published metabolic network reconstructions is increasing, as is the range of their applications. However, such reconstructions commonly include gaps (see Figure 1), which are due to incomplete source databases or holes in the biochemical knowledge reported in the literature. Automated techniques help to fill these gaps by adding reactions from external resources such as KEGG [1].

The approach introduced here is to apply cheminformatics to determine and quantify chemical similarity across all metabolites in a metabolic network of S. cerevisiae [2]. The hypothesis is that metabolite pairs of high chemical similarity are likely to form reaction pairs, in which one metabolite can be converted to the other by a single chemical reaction. High-scoring pairs that do not currently form a reaction pair in the network can be analysed, either by comparison with existing data resources or by literature searches, to determine whether they take part in a metabolic reaction.

Following this approach, preliminary results have led to the discovery of information missing from KEGG, and to the assignment of function and determination of kinetic constants for a gene of previously unknown function.

Figure 1: Gaps in metabolic networks. Unreachable metabolites are disconnected from the extracellular medium. "Blocked" reactions are incapable of carrying flux as they lead to dead-end metabolites (such as the metabolites f and j). Gap filling is required to reconcile both issues.

Method

Metabolites were extracted from a genome-scale metabolic network, and SMILES strings representing their chemical structure were acquired. The structures were compared in a pairwise manner using the Chemistry Development Kit (CDK) [3], to determine a chemical similarity score for each pair (see Figure 2).

Two chemical similarity distributions were generated, one for pairs of metabolites that form a reaction pair in the network and one for pairs that do not (plotted as actual and potential pairs in Figure 3). Mass differences were calculated, and potential pairs were only considered if they exhibited the mass difference of an actual pair, that is, one resulting from a known chemical transformation.

Figure 2: Example of chemical similarity score generated from SMILES strings using the CDK for the metabolite pair IMP and ITP.

Results

The majority of actual metabolite pairs have a chemical similarity score greater than 0.7. However, only 8.5% (557) of potential pairs exhibit such similarity. Of these 557, 99 were found to form a reaction pair in KEGG but were not present in the metabolic network.

Figure 3: Similarity score distribution of actual and potential metabolite pairs. (Histogram; x-axis: similarity score in bins of 0.1 from 0.0 to 1.0; y-axis: percentage of pairs, 0 to 40; series: actual pairs and potential pairs.)

From these 99 pairs, a number were selected for further evaluation, and three examples are provided in Table 1. The evaluation entailed:

  • extraction from KEGG of homologous protein sequences that catalyse these reactions;
  • BLAST searching these sequences against a S. cerevisiae protein database to identify candidate enzymes exhibiting this activity;
  • literature search and/or experimental validation of the activity of these candidates.

| KEGG reaction | Reaction | Similarity score | Gene id | KM / µM | kcat / s⁻¹ |
| R00585 | L-serine + pyruvate <=> hydroxypyruvate + L-alanine | 0.87 | YFL030W | Gene activity confirmed by literature search [4] | |
| R00720 | ITP + H2O <=> IMP + diphosphate | 0.78 | YJR069C | 2.33 | 0.14 |
| R01215 | L-valine + pyruvate <=> 3-methyl-2-oxobutanoic acid + L-alanine | 0.76 | YER152C | No experimental validation | |

Table 1: Reactions found for three highly similar scoring metabolite pairs that were not present in the metabolic reconstruction. Metabolites that form pairs are highlighted in bold on the poster. Kinetic constants were determined through protein expression, purification and absorbance assay (see Figure 4).

Figure 4: Confirmation of ITP pyrophosphohydrolase activity for YJR069C. A Malachite Green assay was performed to detect orthophosphate, indicating hydrolysis of ITP and release of pyrophosphate by YJR069C, which is further hydrolysed to orthophosphate by inorganic phosphatase (IP).

Further work

Future directions may include:

  • focussing on those metabolites that are known to be "dead-ends" or are disconnected from the core network, thus more closely integrating the method with network gap filling;
  • automating the bioinformatics aspect of the pipeline (BLAST searching, etc.) to speed up the identification of putative enzymes;
  • applying text mining to find potential reactions in the literature where reactions are not present in existing data resources such as KEGG;
  • applying the approach to metabolite identification in metabolomics experiments.

References

  1. KEGG: Kyoto encyclopedia of genes and genomes. Kanehisa M, et al. Nucleic Acids Res. 2000, 28, 27-30.
  2. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Herrgård MJ, et al. Nat Biotechnol. 2008, 26, 1155-60.
  3. Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. Steinbeck C, et al. Curr Pharm Des. 2006, 12, 2111-20.
  4. Crystal structure and confirmation of the alanine:glyoxylate aminotransferase activity of the YFL030w yeast protein. Meyer P, et al. Biochimie. 2005, 87, 1041-7.
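
The mass-difference filter described in the poster's method can be sketched as follows. This is only an illustration under stated assumptions: the poster does not say how masses were compared, so an absolute tolerance (`tol`) is assumed, and the function name `mass_difference_filter`, the metabolite labels, and the masses are all hypothetical toy values rather than data from the study.

```python
def mass_difference_filter(masses, actual_pairs, potential_pairs, tol=0.01):
    """Keep potential pairs whose mass difference matches that of some actual pair.

    Assumption: a match means the two differences agree within `tol` mass units.
    """
    # Mass differences observed for pairs already linked by a known reaction
    known_deltas = [abs(masses[a] - masses[b]) for a, b in actual_pairs]

    kept = []
    for a, b in potential_pairs:
        delta = abs(masses[a] - masses[b])
        if any(abs(delta - known) <= tol for known in known_deltas):
            kept.append((a, b))
    return kept


# Toy data: hypothetical metabolites and made-up masses.
masses = {"m1": 100.0, "m2": 116.0, "m3": 200.0, "m4": 216.0, "m5": 250.0}
actual_pairs = [("m1", "m2")]  # known transformation with a mass shift of 16
potential_pairs = [("m3", "m4"), ("m3", "m5"), ("m1", "m5")]

kept = mass_difference_filter(masses, actual_pairs, potential_pairs)
print(kept)
```

Only the pair whose mass shift matches a known transformation survives, which is the sense in which the filter restricts potential pairs to chemically plausible conversions before similarity scores are compared.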
