Pipeline for automated structure-based classification in the ChEBI ontology



Presented at the ACS in Dallas: ChEBI is a database and ontology of chemical entities of biological interest, organised into a structure-based and role-based classification hierarchy. Each entry is extensively annotated with a name, definition and synonyms, other metadata such as cross-references, and chemical structure information where appropriate. In addition to the classification hierarchy, the ontology also contains diverse chemical and ontological relationships. While ChEBI is primarily manually maintained, recent developments have focused on improvements in curation through partial automation of common tasks. We will describe a pipeline we have developed for structure-based classification of chemicals into the ChEBI structural classification. The pipeline connects class-level structural knowledge encoded in Web Ontology Language (OWL) axioms as an extension to the ontology, and structural information specified in standard MOLfiles. We make use of the Chemistry Development Kit, the OWL API and the OWLTools library. Harnessing the pipeline, we are able to suggest the best structural classes for the classification of novel structures within the ChEBI ontology.



  1. Pipeline for automated structure-based classification in the ChEBI ontology. Janna Hastings, Coordinator, Cheminformatics and Metabolism. ACS Symposium on Chemical Ontologies, Taxonomies and Schemas, Dallas, 16 March 2014
  2. Chemical Entities of Biological Interest • Freely available online, available for download in full • Low molecular weight, i.e. no proteins • Definitions, relationships, hierarchy • E.g. metabolites, drugs, pesticides • 38,215 entries in the last release
  3. What does ChEBI provide? • Chemical structures and visualisations • Names and synonyms: caffeine, 1,3,7-trimethylxanthine, methyltheobromine • Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19 • Ontology classifications: metabolite, CNS stimulant, trimethylxanthines • Links to more information: MSDchem: CFF; KEGG DRUG: D00528; PubMed citations • Chemical informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
  4. Example ChEBI entry page
  5. Example entry page (continued)
  6. Example entry page (continued)
  7. Structure-based classification in ChEBI
  8. Challenges with manual classification • May be incomplete • May be inconsistent • Difficult to maintain (even with extensive use of computationally expensive automatic validations) • Blocks automatic loading of otherwise high-quality externally annotated chemical data into ChEBI (as no classification available)
  9. SOCO (SMARTS, OWL): Leonid Chepelev, Michel Dumontier and collaborators • Given a training set of classified molecules, examine structures for consensus features across all (using fragmentation and feature detection) • Capture features hierarchically • Use OWL to classify. Chepelev et al. BMC Bioinformatics 2012 13:3 doi:10.1186/1471-2105-13-3
  10. Limitations of SOCO • No support for negation • Only “min” (at least) counting supported, not max or exact. Thus, dicarboxylic acid is_a monocarboxylic acid (every two-legged human is also a one-legged human in the sense that they have at least one leg…) • SMARTS is powerful, but not very human-readable, and ChEBI is meant to be read by human biologists and chemists. E.g. SMARTS for the class of aliphatic amines: [$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])] Can we do better at making definitions accessible?
  11. A new pipeline for automated structure-based ontology classification in ChEBI • Definitions (OWL) and ChEBI structures => OWL Parser => logical cheminformatics definitions • Novel structure => Matching => Candidate classes => Ranking => Best classes: save is_a relations
  12. Human-readable definitions, mapped to structures in ChEBI knowledgebase • thiadiazoles: molecular_entity and has_part some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole or 1,2,5-thiadiazole or 1,3,4-thiadiazole ) • diterpenoid: organic_molecular_entity and has_part exactly 2 terpenoid • organic ion: organic_molecular_entity and ( has_charge some int[>0] or has_charge some int[<0] ) • monocyclic compound: molecular_entity and has_cycles value "1"^^int • Supported constructs: logical operators; counts (min, max and exact); properties; parts
  13. Planned integration into ChEBI tools • ChEBI internal data loader and bulk submissions • ChEBI online submission tool • Pre-population of matched classes
  14. Acknowledgements – Thanks! ChEBI team: Christoph Steinbeck, Gareth Owen, Adriano Dekker, Namrata Kale, Steve Turner, Venkatesh Muthukrishnan. Collaborators: Colin Batchelor, RSC; Lian Duan, ETH; Leonid Chepelev, Ottawa; Michel Dumontier, Stanford; Despoina Magka, Oxford; Ilinca Tudose and John May, EBI. Funding: BBSRC “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling communities” BB/K019783/1
  15. Questions? Thank you for listening!
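
The slides on SOCO's limitations and on the new pipeline contrast min-only counting with exact counts, and describe matching a novel structure against class definitions before ranking the candidates. The Python sketch below illustrates that idea only; it is not the ChEBI implementation, which works on OWL axioms and MOLfiles via the CDK, the OWL API and OWLTools. Everything in it is an assumption made for illustration: molecules are reduced to hand-written feature counts, the `DEFINITIONS` and `PARENT` tables and the helper names `count`, `match` and `rank` are hypothetical, and the ranking rule (discard any candidate that is an ancestor of another candidate) is one plausible reading of "best classes", not a rule stated in the talk.

```python
from typing import Callable, Dict, List, Set

# A molecule is reduced to hand-written feature counts (assumption; the real
# pipeline derives structural information from MOLfiles using the CDK).
Molecule = Dict[str, int]
Test = Callable[[Molecule], bool]


def count(feature: str, op: str, n: int) -> Test:
    """Build a cardinality test; op is 'min', 'max' or 'exactly'."""
    def test(mol: Molecule) -> bool:
        value = mol.get(feature, 0)
        if op == "min":
            return value >= n
        if op == "max":
            return value <= n
        if op == "exactly":
            return value == n
        raise ValueError(f"unknown operator: {op}")
    return test


# Hypothetical class definitions, loosely modelled on the slide examples.
DEFINITIONS: Dict[str, Test] = {
    "molecular entity": lambda mol: True,
    "carboxylic acid": count("carboxy_groups", "min", 1),
    "monocarboxylic acid": count("carboxy_groups", "exactly", 1),
    "dicarboxylic acid": count("carboxy_groups", "exactly", 2),
    "monocyclic compound": count("cycles", "exactly", 1),
    "ion": lambda mol: mol.get("charge", 0) != 0,
}

# Hypothetical is_a hierarchy: child -> parent.
PARENT: Dict[str, str] = {
    "carboxylic acid": "molecular entity",
    "monocarboxylic acid": "carboxylic acid",
    "dicarboxylic acid": "carboxylic acid",
    "monocyclic compound": "molecular entity",
    "ion": "molecular entity",
}


def ancestors(name: str) -> Set[str]:
    """All classes reachable from name by following is_a upwards."""
    result: Set[str] = set()
    while name in PARENT:
        name = PARENT[name]
        result.add(name)
    return result


def match(mol: Molecule) -> List[str]:
    """Matching step: every class whose definition the molecule satisfies."""
    return sorted(name for name, test in DEFINITIONS.items() if test(mol))


def rank(candidates: List[str]) -> List[str]:
    """Ranking step (assumed rule): keep only the most specific candidates."""
    covered: Set[str] = set()
    for name in candidates:
        covered |= ancestors(name)
    return [name for name in candidates if name not in covered]


# Succinic acid (butanedioic acid): two carboxy groups, acyclic, neutral.
succinic_acid: Molecule = {"carboxy_groups": 2, "cycles": 0, "charge": 0}

candidates = match(succinic_acid)
best = rank(candidates)

# With min-only counting, a "monocarboxylic acid" test also accepts the diacid.
min_only_mono = count("carboxy_groups", "min", 1)
```

Here `candidates` contains the general classes as well as `dicarboxylic acid`, and `rank` keeps only the latter. The `min_only_mono` test accepts succinic acid, which is the "dicarboxylic acid is_a monocarboxylic acid" problem from the SOCO slide; the `exactly` variant in `DEFINITIONS` rejects it.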