Automatic classification in ChEBI


Presented at the 2nd ChEBI User Group Workshop. Discusses some of the difficulties encountered in the project which aims to classify chemicals in the ChEBI ontology automatically based on their structures.

Published in: Technology

  • More notes in team discussion: prioritise standard InChIs; add batch (bulk) submissions; decide which additional properties to pre-calculate and make visible in ChEBI (team discussion needed; Lipinski?); improve the OWL to make SPARQL querying easier and improve the relationship patterns (not ALWAYS subClassOf exists some). This ties into the SADI-fying of ChEBI and should also involve thinking of and testing out specific use cases for *doing stuff with* the exported OWL file. There is a concern about a downgrade in quality caused by the increase in scale (quantity of compounds).
  • Automatic classification in ChEBI

    1. Automatic classification, logical definitions. Janna Hastings, EBI Cheminformatics and Metabolism. 2nd ChEBI User Group Workshop, 24 June 2010.
    2. Chemistry is a domain with a rich heritage of classification based on structural features. ChEBI ontology 20.10.10
    3. The ChEBI ontology contains a large asserted is-a hierarchy of chemical classes and compounds. Each chemical class is clearly defined in natural language.
    4. Why automatic classification?
        • A reasoner can help manage your complex hierarchy
        • Minimise curation overhead by harnessing the power of the knowledge already captured in the ontology and the structures already drawn for the new chemicals
        • Avoid redundancy and excessive pre-coordination of terms
    5. OWL 2
        • Provides a decidable set of constructors for defining classes
        • Constructors shown: oneOf, disjointWith, sameClassAs, rdfs:subClassOf, unionOf, intersectionOf, complementOf, minCardinality, maxCardinality, cardinality, inverseOf, TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty, allValuesFrom, someValuesFrom
    6. Necessary and sufficient conditions [Figure: neopentane]
        • Necessary conditions:
        • ‘hydrocarbon molecular entity’ has_atom some ‘carbon atom’
        • ‘hydrocarbon molecular entity’ has_atom some ‘hydrogen atom’
    7. Necessary and sufficient conditions [Figure: neopentane]
        • Sufficient conditions:
        • ‘hydrocarbon molecular entity’ has_atom only (‘carbon atom’ or ‘hydrogen atom’)
    8. OWL Reasoning
        • Asserted: neopentane has_atom some ‘carbon atom’; neopentane has_atom some ‘hydrogen atom’
        • Inferred: neopentane subClassOf (is_a) ‘hydrocarbon molecular entity’
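The neopentane example can be mimicked outside OWL with a few lines of Python. This is only a toy sketch: atoms are plain element symbols rather than ChEBI or OWL entities, the function names are hypothetical, and the check is closed-world (it assumes the atom list is complete), which is not how an OWL reasoner treats `has_atom only`.

```python
# Toy data: element symbols only, hydrogens included explicitly.
NEOPENTANE_ATOMS = ["C"] * 5 + ["H"] * 12   # C5H12
ETHANOL_ATOMS = ["C"] * 2 + ["H"] * 6 + ["O"]  # C2H6O


def meets_necessary_conditions(atoms):
    """has_atom some 'carbon atom' and has_atom some 'hydrogen atom'."""
    return "C" in atoms and "H" in atoms


def meets_sufficient_condition(atoms):
    """has_atom only ('carbon atom' or 'hydrogen atom')."""
    return all(symbol in ("C", "H") for symbol in atoms)


def is_hydrocarbon(atoms):
    # Both checks together approximate the closed class definition.
    return meets_necessary_conditions(atoms) and meets_sufficient_condition(atoms)
```

Ethanol passes the necessary conditions (it has carbon and hydrogen) but fails the `only` restriction because of its oxygen atom, so it is not classified as a hydrocarbon.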
    9. Parts and properties
        • Chemical ontology consists of chemical classes which can be defined by parts of structures and/or properties of structures
        • carboxylic acid: if molecule has part some carboxy group
        • cyclic molecule: if molecule has property cyclic, i.e. a self-connected cyclic path exists through the molecule’s atoms
    10. Pre-coordination vs. post-coordination
        • Given a set of properties that can be used in class definitions, you get an explosion of possible combinations
        • e.g. ‘cyclic’
    11. Pre-coordination vs. post-coordination
        • Other properties: saturated, radical, ion/anion, ...
    12. Pre-coordination vs. post-coordination
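To get a feel for the size of the explosion, the sketch below simply enumerates every non-empty combination of four of the properties mentioned on the slides. The property list and the function name are illustrative assumptions; real pre-coordinated ChEBI terms are not generated this way.

```python
from itertools import combinations

# Properties named on the slides; the selection is illustrative.
PROPERTIES = ["cyclic", "saturated", "radical", "ion"]


def precoordinated_terms(properties):
    """List every non-empty combination; each would need its own named class."""
    terms = []
    for size in range(1, len(properties) + 1):
        for combo in combinations(properties, size):
            terms.append(" ".join(combo))
    return terms


terms = precoordinated_terms(PROPERTIES)
print(len(terms))  # 2**4 - 1 combinations
```

With n independent properties there are 2^n - 1 non-empty combinations, so pre-coordinating a named class for each quickly becomes unmanageable, whereas post-coordination composes the properties only when needed.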
    13. Logically defining chemical classes
        • Goal: transform the textual definitions into logical definitions which are then accessible for automated reasoning
        • ‘carbonyl compound’ ↔ has_part some (‘carbonyl group’)
        • ‘carboxylic acid’ ↔ has_part some (‘carboxy group’)
        • ‘monocarboxylic acid’ ↔ has_part exactly 1 (‘carboxy group’)
        • ‘hydroxy monocarboxylic acid’ ↔ has_part exactly 1 (‘carboxy group’) and has_atom only some (‘hydrogen atom’ or ‘carbon atom’ or ‘oxygen atom’)
        • [Diagram: is_a arrows link the classes]
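A rough way to see how such logical definitions drive classification is to evaluate them against pre-computed group counts. The sketch below assumes the counts are supplied by hand in a dictionary; the `classify` function and the input format are hypothetical, and only the first three definitions from the slide are covered.

```python
def classify(group_counts):
    """Return the class names whose toy definitions the molecule satisfies.

    group_counts maps a group name to how often it occurs in the molecule.
    """
    classes = []
    carbonyl = group_counts.get("carbonyl group", 0)
    carboxy = group_counts.get("carboxy group", 0)
    if carbonyl >= 1:  # has_part some 'carbonyl group'
        classes.append("carbonyl compound")
    if carboxy >= 1:   # has_part some 'carboxy group'
        classes.append("carboxylic acid")
    if carboxy == 1:   # has_part exactly 1 'carboxy group'
        classes.append("monocarboxylic acid")
    return classes


# Hand-supplied counts (assumed): every carboxy group contains a carbonyl group.
benzoic_acid = {"carbonyl group": 1, "carboxy group": 1}
citric_acid = {"carbonyl group": 3, "carboxy group": 3}
```

Here benzoic acid satisfies all three definitions, while citric acid, with three carboxy groups, fails the `exactly 1` restriction and is not a monocarboxylic acid.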
    14. Foundational classification of molecules
    15. Foundational classification of molecules [Figure: has_part in ChEBI]
        • XXXX molecular entity ≝ ∃ has_atom some XXXX atom
        • carbon molecular entity ≝ ∃ has_atom some carbon atom
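The "XXXX molecular entity" pattern is a template instantiated once per element. As a minimal sketch, assuming a small hand-written symbol-to-name table (`ELEMENT_NAMES`, hypothetical and deliberately incomplete), the foundational classes of a molecule could be derived like this:

```python
# Hypothetical, deliberately incomplete lookup table.
ELEMENT_NAMES = {"C": "carbon", "H": "hydrogen", "O": "oxygen", "N": "nitrogen"}


def foundational_classes(atom_symbols):
    """Instantiate 'XXXX molecular entity' for each element present."""
    present = sorted({ELEMENT_NAMES[s] for s in atom_symbols if s in ELEMENT_NAMES})
    return [f"{name} molecular entity" for name in present]
```

For example, `foundational_classes(["C", "C", "H", "O"])` yields the carbon, hydrogen, and oxygen molecular entity classes.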
    16. Classification based on regularities in naming
        • name ends with -oic acid
        • is_a oxoacid (CHEBI:24833)
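The naming rule can be expressed as a simple suffix test. This sketch assumes names are plain strings and only implements the single rule from the slide; the function name is hypothetical.

```python
def classify_by_name(name):
    """Return the suggested parent class for names ending in '-oic acid', else None."""
    normalized = name.strip().lower()
    if normalized.endswith("oic acid"):
        return "oxoacid (CHEBI:24833)"
    return None
```

Such a rule is only a heuristic: it depends on systematic naming and says nothing about compounds with trivial names.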
    17. Classification based on chemical structure
        • Best would be to include the structure in the ontology
        • Without structure, all parts must be explicitly asserted (combinatorial explosion for larger molecules)
        • But the structure of complex molecules breaks the OWL Tree Model requirement: it does not have a model in the shape of a tree
    18. Recent work: description graphs
        • Description graphs are a recent extension to OWL 2 which allow graph structures to be captured at the class level
        • We generated these for chemicals in ChEBI
    19. Rules for properties [Figures: cyclobutane, tetrahedrane]
        molecule(?x), atom(?a1), atom(?a2), atom(?a3), atom(?a4),
        bond(?b1), bond(?b2), bond(?b3), bond(?b4),
        has_atom(?x, ?a1), has_atom(?x, ?a2), has_atom(?x, ?a3), has_atom(?x, ?a4),
        has_bond(?a1, ?b1), has_bond(?a1, ?b4), has_bond(?a2, ?b1), has_bond(?a2, ?b2),
        has_bond(?a3, ?b2), has_bond(?a3, ?b3), has_bond(?a4, ?b3), has_bond(?a4, ?b4)
        -> cyclic_entity(?x)
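The rule above matches a ring of exactly four atoms. For comparison, the "cyclic" property can also be tested for rings of any size with ordinary graph code. The sketch below swaps in a generic union-find cycle check, which is not the rule-based approach from the slide; it assumes atoms are numbered from 0, bonds are given as index pairs with at most one entry per atom pair, and only the carbon skeleton is encoded (hydrogens omitted).

```python
def is_cyclic(atom_count, bonds):
    """Return True if the bond graph contains a cycle (union-find check)."""
    parent = list(range(atom_count))

    def find(i):
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this bond closes a ring.
            return True
        parent[root_a] = root_b
    return False


# Carbon skeletons only; these encodings are illustrative assumptions.
CYCLOBUTANE = [(0, 1), (1, 2), (2, 3), (3, 0)]
BUTANE = [(0, 1), (1, 2), (2, 3)]
```

`is_cyclic(4, CYCLOBUTANE)` is true because the last bond joins two atoms that are already connected, while `is_cyclic(4, BUTANE)` is false.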
    20. Rules for classes defined by parts [Figures: carboxylic acid, benzoic acid]
        molecule(?y), atom(?a0), oxygen_atom(?a1), carbon_atom(?a2), oxygen_atom(?a3),
        has_atom(?y, ?a0), has_atom(?y, ?a1), has_atom(?y, ?a2), has_atom(?y, ?a3),
        double_bond(?b0), single_bond(?b1), single_bond(?b2),
        has_bond(?a0, ?b2), has_bond(?a1, ?b1), has_bond(?a2, ?b0),
        has_bond(?a2, ?b1), has_bond(?a2, ?b2), has_bond(?a3, ?b0)
        -> carboxylic_acid(?y)
        Benzoic acid has this part, so it is a carboxylic acid.
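The same pattern (a carbon with one double-bonded oxygen, one single-bonded oxygen, and one further single bond) can be checked procedurally. This is a minimal sketch under stated assumptions: the molecule is a hand-written list of element symbols plus `(i, j, order)` bond triples, hydrogens are omitted, aromatic rings are written in Kekulé form, and the function name is hypothetical. Like the rule, it does not check the hydrogen on the hydroxy oxygen, so esters would also match.

```python
def has_carboxy_pattern(atoms, bonds):
    """Return True if some carbon has =O, -O and one more single-bonded neighbour."""
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    for idx, symbol in enumerate(atoms):
        if symbol != "C":
            continue
        double_o = [j for j, order in neighbours[idx] if atoms[j] == "O" and order == 2]
        single_o = [j for j, order in neighbours[idx] if atoms[j] == "O" and order == 1]
        single_any = [j for j, order in neighbours[idx] if order == 1]
        # Need =O, -O, and a second single-bonded neighbour besides that oxygen.
        if double_o and single_o and len(single_any) >= 2:
            return True
    return False


# Benzoic acid heavy atoms: ring carbons 0-5, carboxy carbon 6, oxygens 7 and 8.
BENZOIC_ACID_ATOMS = ["C"] * 7 + ["O", "O"]
BENZOIC_ACID_BONDS = [
    (0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1),
    (0, 6, 1), (6, 7, 2), (6, 8, 1),
]

# Acetone heavy atoms: the carbonyl carbon has no single-bonded oxygen.
ACETONE_ATOMS = ["C", "C", "C", "O"]
ACETONE_BONDS = [(0, 1, 1), (1, 2, 1), (1, 3, 2)]
```

In practice a cheminformatics toolkit would perform this kind of substructure matching; the hand-rolled version is only meant to make the rule's logic explicit.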
    21. Testing the reasoning
        • Can we use a reasoner to deduce the classification hierarchy based on the graphs and rules?
        • No asserted hierarchy between test classes and molecules with generated graphs
    22. Results
        • Inferred hierarchy shows classified molecules
    23. That’s great, but...
    24. Simple substructure search
        • Can be done with cheminformatics software outside the ontology for a defined list of groups
        • Get a list of groups in ChEBI
    25. Substructure search [Figures: carboxylic acid, benzoic acid]
        • Benzoic acid has this part, so it is a carboxylic acid
    26. Goal
        • We extract features from the structural specifications of chemical compounds using standard cheminformatics techniques and use these to automatically classify compounds into defined classes
        • Diagram labels: CDK; 3β-hydroxy-4β-methyl-5α-cholest-7-ene-4α-carboxylic acid; has_part exactly 1 (‘carboxy group’); has_part some (‘cholesterol’); has_part only some (‘carbon atom’ or ‘oxygen atom’ or ‘hydrogen atom’); hydroxy monocarboxylic acid
    27. Elements of chemical class definitions
        • Composition and cardinality
            • "tricarboxylic acid" can be defined as a compound containing exactly three carboxy groups
        • Skeleton
            • "metalloporphyrins" can be defined as any compound containing a porphyrin skeleton and a metal atom
            • But beware! Skeleton is not always substructure
    28. Elements of chemical class definitions
        • Number and arrangement of rings in a ring system
            • bicyclic compound
            • polycyclic cage
        • Properties such as charge and unpaired electrons
            • ion, radical
        • Structural formula
            • alkane: acyclic branched or unbranched hydrocarbon having the general formula CₙH₂ₙ₊₂
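The general-formula criterion for alkanes reduces to a one-line arithmetic check on atom counts. The sketch below assumes the compound is already known to be a hydrocarbon and that carbon and hydrogen counts are supplied directly; the function name is hypothetical.

```python
def matches_alkane_formula(carbons, hydrogens):
    """Check the general formula C(n)H(2n+2) for a hydrocarbon with n >= 1."""
    return carbons >= 1 and hydrogens == 2 * carbons + 2
```

Neopentane (C5H12) satisfies the formula, while cyclobutane (C4H8) does not, since each ring removes two hydrogens relative to the acyclic saturated count.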
    29. ‘Features’ must be explicitly asserted
        • All properties and parts have to be explicitly associated with molecules in the ontology
        • e.g. has_part, has_charge, has_attribute (XXX which has_value YYYY), has_ring_count
        • => adding new relationships
    30. Conclusions
        • Chemical classes are defined based on features and parts of molecules
        • These class definitions can be captured explicitly in OWL as ‘necessary and sufficient conditions’
        • This allows automatic classification if the features are also asserted about the molecules
    31. Thank you for your attention
    32. 2nd ChEBI UGM: Closing remarks
        • Relationships: more, more specific
        • Natural products: flag them
        • Change of focus from OBO to OWL
        • Expose fingerprints?
        • Commitment to BFO? (general classes) What about DOLCE? (also GFO)
        • Scope: become clearer
        • Semantify the web offering more (SADI)
        • ChEBI as ‘glue’: keep the links coming
        • Mine ChEMBL bioactivity data for ChEBI role assertions (inhibitor etc.)
        • Harness literature, map to MeSH