This work has been supported by the BBSRC/EPSRC grant: the Manchester Centre for Integrative Systems Biology



Subliminal: exploiting semantic annotations in the reconstruction of metabolic networks
Neil Swainston
Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7ND, UK

Introduction
The number of metabolic network reconstructions has increased in recent years. Reconstructions now cover a range of organisms and have been
applied to a number of research topics including metabolic engineering, genome annotation, evolutionary studies, network property analysis, and
interpretation of omics datasets [1].

The process of developing such reconstructions is now well defined and is recognised as being time-consuming [2]. While many of the steps
associated with generating a high-quality reconstruction require manual curation, some are amenable to automation, making it possible to
generate a draft reconstruction automatically for use in subsequent manual curation [3].

The importance of using standard representations such as SBML [4] and the MIRIAM standard [5] has been recognised [6]: reconstructions in
which all components are semantically annotated with unambiguous database identifiers are far easier for third parties to use.

However, to date, the use of semantic annotations has focused on the usability of the reconstruction after publication. Subliminal is a
toolbox that exploits semantic annotations during the reconstruction process. It uses libAnnotationSBML [7] and web service interfaces to
external databases such as ChEBI [8] and UniProt [9] to retrieve chemical and protein data, which can be used to automate chemical
protonation state determination, reaction mass / charge balancing and enzyme (and reaction) localisation.


Initial pre-draft pathways: KEGG2SBML and other sources


Initial pre-draft pathways for a given organism are generated with the existing KEGG2SBML [10] tool. KEGG2SBML
generates SBML files representing individual metabolic pathways, which are then enhanced by the addition of semantic
annotations such as references to ChEBI and UniProt identifiers for metabolites and enzymes respectively, and EC terms.

Subsequent work will focus on generating additional pathways from MetaCyc [11] and genome sequences.


Model merging: pre-draft reconstruction


As each of the initial pre-draft pathways, irrespective of its source, is semantically annotated with comparable terms,
the pathways can be merged automatically to generate a pre-draft reconstruction in which duplicate metabolites, enzymes and
reactions are removed.
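
The merging step can be illustrated with a minimal Python sketch. It is not the Subliminal implementation: the `Pathway` class and `merge_pathways` function are hypothetical names, the identifiers are illustrative, and the sketch ignores stoichiometry, reversibility and enzymes. The point is only that once every species carries a database identifier, duplicates can be detected by comparing identifiers rather than local names.

```python
from dataclasses import dataclass


@dataclass
class Pathway:
    """Toy stand-in for an annotated SBML pathway (hypothetical structure)."""
    species: dict    # local species id -> database identifier
    reactions: dict  # reaction id -> (reactant local ids, product local ids)


def merge_pathways(pathways):
    """Merge pathways, collapsing species and reactions with identical annotations."""
    species = set()
    reactions = set()
    for pathway in pathways:
        species.update(pathway.species.values())
        for reactants, products in pathway.reactions.values():
            # Describe each reaction by annotations, not by pathway-local ids,
            # so the same reaction from two sources yields the same key.
            reactions.add((
                frozenset(pathway.species[s] for s in reactants),
                frozenset(pathway.species[s] for s in products),
            ))
    return species, reactions


# Two pathways that name the same reaction differently (illustrative identifiers).
pathway_a = Pathway(
    species={"atp": "CHEBI:15422", "h2o": "CHEBI:15377",
             "adp": "CHEBI:16761", "pi": "CHEBI:43474"},
    reactions={"R1": (["atp", "h2o"], ["adp", "pi"])},
)
pathway_b = Pathway(
    species={"M1": "CHEBI:15422", "M2": "CHEBI:15377",
             "M3": "CHEBI:16761", "M4": "CHEBI:43474"},
    reactions={"rxn9": (["M2", "M1"], ["M4", "M3"])},
)
merged_species, merged_reactions = merge_pathways([pathway_a, pathway_b])
print(len(merged_species), len(merged_reactions))
```

Using sets of identifiers as the reaction key makes the comparison independent of the order in which reactants and products are listed in each source.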




Protonation state prediction


Automated acquisition from the ChEBI database of the InChI [12] (or SMILES) string representing each metabolite allows
the protonation state of the metabolite at a given pH to be predicted using cheminformatic resources such as the Chemistry
Development Kit (CDK) [13].
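
To make the idea of a pH-dependent protonation state concrete, the sketch below applies the Henderson-Hasselbalch relation to groups with known pKa values. This is a simplification swapped in for illustration, not the CDK-based approach described above: it assumes the pKa values are supplied by hand rather than predicted from a structure, and the function names are hypothetical.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of a group that has lost its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def dominant_charge(acid_pkas, base_pkas, ph):
    """Net charge of the dominant species, rounded to the nearest integer.

    acid_pkas: pKa values of groups that are neutral when protonated (HA -> A-).
    base_pkas: pKa values of conjugate acids that are positive when protonated (BH+ -> B).
    """
    charge = 0.0
    for pka in acid_pkas:
        charge -= deprotonated_fraction(pka, ph)
    for pka in base_pkas:
        charge += 1.0 - deprotonated_fraction(pka, ph)
    return round(charge)


# Glycine: carboxyl pKa about 2.34, amino pKa about 9.60.
print(dominant_charge([2.34], [9.60], ph=7.0))  # zwitterion, net charge 0
```

Treating each group independently is only a first approximation; a structure-based predictor accounts for interactions between neighbouring groups.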




Reaction mass / charge balancing

By acquiring the chemical formula and charge of each metabolite from the ChEBI database, each reaction can be
represented as a matrix, A, containing the elements and charge of each reactant and product. The vector, b, represents
the stoichiometric coefficients of each reactant and product. Mixed integer linear programming can be applied to solve

   Ab = 0

producing a vector of stoichiometric coefficients to be applied to each reactant and product. Commonly absent species,
such as water, protons and CO2, can also be considered, allowing previously unbalanceable reactions (for example, from
KEGG) to be balanced automatically.
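
The balancing problem can be sketched in a few lines of Python. To stay dependency-free, this sketch replaces mixed integer linear programming with an exhaustive search over small integer coefficients, which is only practical for tiny reactions. It also assumes a sign convention not stated above: reactant columns of A are positive and product columns negative, so that a balanced reaction gives Ab = 0 with positive coefficients. The function name `balance` and its parameters are hypothetical.

```python
from itertools import product


def balance(reactants, products, optional=(), max_coeff=4):
    """Find integer coefficients b with the smallest total that satisfy A b = 0.

    Each species is a dict mapping an element symbol (or "charge") to a count.
    Reactants and products must appear with a coefficient of at least 1.
    Optional species may be omitted (0), added as a reactant (positive) or
    added as a product (negative). Returns a list of coefficients ordered as
    reactants, products, optional, or None if no balance is found.
    """
    rows = sorted({key for s in [*reactants, *products, *optional] for key in s})
    # Columns of A: reactants count positively, products negatively.
    columns = [[s.get(r, 0) for r in rows] for s in reactants]
    columns += [[-s.get(r, 0) for r in rows] for s in products]
    columns += [[s.get(r, 0) for r in rows] for s in optional]

    n_fixed = len(reactants) + len(products)
    ranges = [range(1, max_coeff + 1)] * n_fixed
    ranges += [range(-max_coeff, max_coeff + 1)] * len(optional)

    best = None
    for b in product(*ranges):
        balanced = all(
            sum(col[i] * x for col, x in zip(columns, b)) == 0
            for i in range(len(rows))
        )
        if balanced:
            cost = sum(abs(x) for x in b)
            if best is None or cost < best[0]:
                best = (cost, list(b))
    return None if best is None else best[1]


# ATP hydrolysis written without water and the released proton.
atp = {"C": 10, "H": 12, "N": 5, "O": 13, "P": 3, "charge": -4}
adp = {"C": 10, "H": 12, "N": 5, "O": 10, "P": 2, "charge": -3}
phosphate = {"H": 1, "O": 4, "P": 1, "charge": -2}
water = {"H": 2, "O": 1}
proton = {"H": 1, "charge": 1}

print(balance([atp], [adp, phosphate], optional=[water, proton]))
# → [1, 1, 1, 1, -1]: one water is consumed and one proton is released
```

Minimising the sum of absolute coefficients plays the role of the objective function in a linear programming formulation: it selects the simplest balanced form rather than a multiple of it.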

Protein localisation


With each enzyme being annotated with UniProt terms, the UniProt web services can be queried to automatically acquire
each protein sequence. These sequences can be fed to subcellular localisation prediction algorithms such as PSORT [14] in order to
predict the subcellular location of the enzyme and, by implication, of the reaction(s) that it catalyses.
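
The sequence-retrieval step might look like the following sketch. It assumes the present-day UniProt REST endpoint pattern shown in `UNIPROT_FASTA_URL`, which may differ from the web services available when this work was done, and the function names are hypothetical. The call to a localisation predictor is left out, since its interface depends on the tool chosen.

```python
from urllib.request import urlopen

# Assumed URL pattern for retrieving one entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


def fetch_sequence(accession, timeout=30):
    """Download one UniProt entry and return its sequence (needs network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))[1]
```

The accession passed to `fetch_sequence` would be taken from the UniProt annotation on each enzyme in the model, and the returned sequence passed on to the chosen predictor.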




Future directions
While individual steps in the reconstruction process are amenable to automation, it is recognised that gap-filling, manual curation and validation
are essential steps in generating a high-quality reconstruction. Semantic annotations can further aid the validation process through automated
harvesting of chemical synonyms, which can be fed to text-mining tools such as PathText [15] in order to simplify the arduous, but necessary, task
of finding evidence for present (and missing) reactions in the literature.

[1] Applications of genome-scale metabolic reconstructions. Oberhardt MA, Palsson BØ, Papin JA. Mol Syst Biol. (2009) 5, 320.
[2] A protocol for generating a high-quality genome-scale metabolic reconstruction. Thiele I, Palsson BØ. Nat Protoc. (2010) 5, 93-121.
[3] High-throughput generation, optimization and analysis of genome-scale metabolic models. Henry CS, DeJongh M, et al. Nat Biotechnol. (2010) 28, 977-82.
[4] The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Hucka M, Finney A, et al. Bioinformatics. (2003) 19, 524-31.
[5] Minimum information requested in the annotation of biochemical models (MIRIAM). Le Novère N, Finney A, et al. Nat Biotechnol. (2005) 23, 1509-15.
[6] A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Herrgård MJ, Swainston N, et al. Nat Biotechnol. (2008) 26, 1155-60.
[7] libAnnotationSBML: a library for exploiting SBML annotations. Swainston N, Mendes P. Bioinformatics. (2009) 25, 2292-3.
[8] ChEBI: a database and ontology for chemical entities of biological interest. Degtyarenko K, de Matos P, et al. Nucleic Acids Res. (2008) 36, D344-50.
[9] The Universal Protein Resource (UniProt) in 2010. UniProt Consortium. Nucleic Acids Res. (2010) 38, D142-8.
[10] http://sbml.org/Software/KEGG2SBML/
[11] The EcoCyc and MetaCyc databases. Karp PD, Riley M, et al. Nucleic Acids Res. (2000) 28, 56-9.
[12] http://www.iupac.org/inchi/
[13] The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. Steinbeck C, Han Y, et al. J Chem Inf Comput Sci. (2003) 43, 493-500.
[14] PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Nakai K, Horton P. Trends Biochem Sci. (1999) 24, 34-6.
[15] PathText: a text mining integrator for biological pathway visualizations. Kemper B, Matsuzaki T, et al. Bioinformatics. (2010) 26, i374-81.
