Researchers in increasingly data-rich scientific domains face ever-growing difficulty in working their way through the mounds of data spread across different resources, interfaces, languages and databases.
We increasingly need computational tools to mediate between researchers and these mountains of distributed, heterogeneous data. To harmonise data across heterogeneous sources, we also need annotations to shared, controlled identifiers.
The human mind is an amazing thing. Most people can answer the following questions correctly and very quickly: Are there any footprints on the moon? (YES) Are there any purple dogs on the moon? (NO) (nor bats, nor dinosaurs, nor trees...) How do they do that? They are not taught inventories of what things are on the moon in high school. Rather, they are taught the simple fact that there is no life on the moon at all. From this fact they are able to infer that there are no purple dogs on the moon, because purple dogs are a kind of life form.
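The moon inference can be sketched in a few lines of Python. This is a toy illustration, not an ontology reasoner: the class names, the single-parent `IS_A` dictionary and the helper functions are hypothetical. Questions that cannot be answered from the recorded facts return `None` rather than `False`, in the spirit of OWL's open-world assumption, where missing information is not treated as false.

```python
# Toy sketch: one general fact plus an is-a hierarchy answers many questions.
# All class names and facts here are illustrative assumptions.
IS_A = {
    "purple dog": "dog",
    "dog": "animal",
    "bat": "animal",
    "dinosaur": "animal",
    "tree": "plant",
    "animal": "life form",
    "plant": "life form",
    "footprint": "trace",
}

ABSENT_FROM_MOON = {"life form"}  # the one general fact taught at school
PRESENT_ON_MOON = {"footprint"}   # a fact recorded explicitly


def ancestors(cls):
    """Return cls followed by every more general class above it."""
    chain = [cls]
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def on_the_moon(cls):
    """True or False if it can be inferred; None if it is simply unknown."""
    if cls in PRESENT_ON_MOON:
        return True
    if any(a in ABSENT_FROM_MOON for a in ancestors(cls)):
        return False
    return None


print(on_the_moon("footprint"))   # True
print(on_the_moon("purple dog"))  # False
print(on_the_moon("rock"))        # None
```

No fact about purple dogs, bats or trees is stored; the negative answers all follow from the single assertion about life forms.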
What is an ontology? It is at least all of these things: a community-wide standardised terminology and dictionary of terms in a particular domain; a hierarchically organised map of the entities in the domain; a logical model that allows a compact representation while supporting logical inference of additional implications; and a tool that supports multiple knowledge-based applications.
Ontologies are organised hierarchically, from a very general root term to the most specialised leaf terms (utility: grouping items at different levels). They gather together synonyms and other metadata (utility: ‘glue’ for data integration). They provide logical definitions that allow automatic inferences, thus providing a compact storage mechanism (utility: automated reasoning and query answering). They therefore provide a sophisticated medium for searching and organising data in multiple applications. And there is one standard format for ontology development, the Web Ontology Language (OWL), which is supported by many tools and resources.
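Grouping items at different levels comes almost for free from the hierarchy: each annotation counts towards its own term and towards every ancestor of that term. A minimal Python sketch, assuming a hypothetical, simplified single-parent fragment of a xanthine hierarchy (not ChEBI's actual classification):

```python
from collections import Counter

# Hypothetical, simplified is-a fragment (child -> parent); not actual ChEBI content.
PARENT = {
    "caffeine": "trimethylxanthine",
    "theobromine": "dimethylxanthine",
    "theophylline": "dimethylxanthine",
    "trimethylxanthine": "methylxanthine",
    "dimethylxanthine": "methylxanthine",
}


def ancestors(term):
    """The term itself followed by every more general term up to the root."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def roll_up(annotations):
    """Count annotations at every level of the hierarchy at once."""
    counts = Counter()
    for term in annotations:
        counts.update(ancestors(term))
    return counts


counts = roll_up(["caffeine", "caffeine", "theobromine", "theophylline"])
print(counts["dimethylxanthine"], counts["methylxanthine"])  # 2 4
```

The annotations only ever mention leaf terms; the counts for the more general classes are derived, not stored.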
This slide illustrates our chemical ontologies (currently in development at the EBI and with collaborators)
Software libraries implement algorithms. Algorithms calculate descriptors. Descriptors are about chemical entities of various sorts (molecules, substances, atoms...).
Now, because you have a single standard ontology on top of annotations spread across several databases, you can query across those databases for data related to the same thing. But that’s not all: not only can you query across several databases, but your query is semantic. It *knows* that leukemia is a kind of cancer, and you don’t have to implement a custom search solution in each database capable of inferring this, because the hierarchy and the synonyms live outside any one database, in the community-wide shared ontology. Image: different databases, literature resources. Organising ontology: semantic searching, multi-level aggregation.
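The semantic cross-database query can be sketched as three steps: resolve the user's text to a shared ontology ID via labels and synonyms, expand that ID to all its descendants, then look up records annotated with any of those IDs in every database. In this hypothetical Python sketch the ontology IDs (`D:1`...), the database names and the record identifiers are all invented for illustration.

```python
# Hypothetical shared ontology: ID -> label, synonyms and is-a parents.
ONTOLOGY = {
    "D:1": {"label": "cancer", "synonyms": {"malignant neoplasm"}, "parents": set()},
    "D:2": {"label": "leukemia", "synonyms": {"leukaemia"}, "parents": {"D:1"}},
    "D:3": {"label": "lung cancer", "synonyms": set(), "parents": {"D:1"}},
    "D:4": {"label": "influenza", "synonyms": {"flu"}, "parents": set()},
}

# Hypothetical databases whose records are annotated with the shared IDs.
DATABASES = {
    "literature": [("paper-1", "D:2"), ("paper-2", "D:4")],
    "trials": [("trial-1", "D:3"), ("trial-2", "D:2")],
}


def resolve(text):
    """Map a label or synonym to its shared ontology ID."""
    text = text.lower()
    for term_id, term in ONTOLOGY.items():
        if text == term["label"] or text in term["synonyms"]:
            return term_id
    raise KeyError(text)


def descendants(term_id):
    """The term and every term below it in the is-a hierarchy."""
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for tid, term in ONTOLOGY.items():
            if tid not in found and term["parents"] & found:
                found.add(tid)
                changed = True
    return found


def semantic_query(text):
    wanted = descendants(resolve(text))
    return {
        db: [record for record, annotation in records if annotation in wanted]
        for db, records in DATABASES.items()
    }


print(semantic_query("cancer"))     # leukemia and lung cancer records too
print(semantic_query("leukaemia"))  # synonym resolves to the same ID as "leukemia"
```

Neither database needs its own notion of "kind of cancer": the hierarchy and synonyms are consulted once, outside the databases.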
What are the challenges?
Many chemical classification systems (e.g. MeSH) do not differentiate between structure-based and role-based classification. They therefore say that caffeine IS A ‘CNS stimulant’ in exactly the same way that they say caffeine IS A ‘trimethylxanthine’. Humans can distinguish between these two types of classification and draw correct inferences, but mixing them leads to invalid inferences when computers are asked to reason over the classification: the terms on the left share structural features while those on the right do not, and the terms on the left are ‘timeless, condition-less’ properties of the chemical entities while the terms on the right describe context-specific behaviour of chemical entities. We therefore separated the structure-based and role-based classifications and introduced the has-role cross-ontology relationship. A term such as ‘antibiotic’ is ambiguous between meaning an activity (a role) and a particular chemical entity which may have that activity.
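Why the separation matters for a computer can be shown with a toy model: structural features are inherited only along is-a links, while roles sit on a separate has-role relation. Members of a structural class then share a structural feature, whereas bearers of a role need not. This Python sketch uses a hypothetical, simplified fragment (not actual ChEBI content); the feature labels and helper functions are invented for illustration.

```python
# Hypothetical, simplified fragment; not actual ChEBI content.
# Structure-based classification: is-a links between structural classes.
STRUCTURAL_PARENT = {
    "caffeine": "trimethylxanthine",
    "theobromine": "dimethylxanthine",
    "trimethylxanthine": "methylxanthine",
    "dimethylxanthine": "methylxanthine",
    "amphetamine": "amphetamines",
}
# Structural features asserted on structural classes, inherited via is-a.
FEATURES = {
    "methylxanthine": {"xanthine core"},
    "amphetamines": {"phenethylamine skeleton"},
}
# Role-based classification: a separate has-role relation, NOT an is-a link.
HAS_ROLE = {
    "caffeine": {"CNS stimulant"},
    "amphetamine": {"CNS stimulant"},
}
ENTITIES = ["caffeine", "theobromine", "amphetamine"]


def structural_ancestors(entity):
    chain = [entity]
    while entity in STRUCTURAL_PARENT:
        entity = STRUCTURAL_PARENT[entity]
        chain.append(entity)
    return chain


def features(entity):
    """Collect structural features along is-a links only."""
    result = set()
    for cls in structural_ancestors(entity):
        result |= FEATURES.get(cls, set())
    return result


def members(structural_class):
    return [e for e in ENTITIES if structural_class in structural_ancestors(e)]


def bearers(role):
    return [e for e in ENTITIES if role in HAS_ROLE.get(e, set())]


def shared_features(entities):
    feature_sets = [features(e) for e in entities]
    return set.intersection(*feature_sets) if feature_sets else set()


print(shared_features(members("methylxanthine")))  # {'xanthine core'}
print(shared_features(bearers("CNS stimulant")))    # set()
```

If ‘CNS stimulant’ were simply another is-a parent of caffeine, a reasoner treating it like ‘trimethylxanthine’ could wrongly expect all its members to share a common structure.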
In common language (particularly in the realm of databases), chemical ‘structure’ and chemical ‘entity’ are referred to synonymously. For example the GDB database refers to its total size in terms of ‘organic structures’ while calling itself a database of ‘molecules’. However, it is crucial to differentiate these senses in classification, since it is possible to have a chemical entity and not know its structure, or be mistaken about its structure (e.g. vancomycin).
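The entity/structure distinction can be made concrete as a data model in which a chemical entity keeps its identity while structure assertions about it are absent, added or revised. This is a hypothetical Python sketch: the class names, the `status` values, the identifier `EX:0001` and the placeholder structure strings are invented for illustration, and no real vancomycin structures are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructureAssertion:
    notation: str  # e.g. a SMILES string; placeholders are used below
    status: str    # hypothetical values: "current" or "superseded"


@dataclass
class ChemicalEntity:
    entity_id: str  # identity of the entity, independent of any structure
    name: str
    structures: List[StructureAssertion] = field(default_factory=list)

    def current_structure(self) -> Optional[str]:
        """Return the currently accepted structure, or None if unknown."""
        for assertion in self.structures:
            if assertion.status == "current":
                return assertion.notation
        return None

    def revise_structure(self, notation: str) -> None:
        """Supersede earlier structure assertions without changing the entity."""
        for assertion in self.structures:
            assertion.status = "superseded"
        self.structures.append(StructureAssertion(notation, "current"))


# A substance can be recorded before its structure is known...
entity = ChemicalEntity("EX:0001", "hypothetical natural product")
print(entity.current_structure())  # None
# ...and keeps its identifier when a proposed structure is later corrected.
entity.revise_structure("STRUCTURE-PLACEHOLDER-1")
entity.revise_structure("STRUCTURE-PLACEHOLDER-2")
print(entity.entity_id, entity.current_structure())
```

Treating structure as something asserted *about* the entity, rather than as the entity itself, is what lets a mistaken structure be corrected without creating a new chemical entity.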
If you pre-compute all parts and all properties of a molecule, you can make ontology definitions for classes that use those properties, BUT your ontology becomes very, very large in asserted parts and properties. It is better if, at least for simple properties and parts, the minimal information needed to deduce the relationship can be included in the ontology itself.
Research in our group is investigating the applicability of a new ontology extension, description graphs, for adding elements of chemical structure to the ontology, so that structure-based classification can be automated in the easier cases. The difficulty is that this appears to reinvent a wheel that the cheminformatics community has already invented well, and our challenge moving forward is to bring in the cheminformatics libraries and toolkits and integrate them with the ontology tools.
One of the challenges we are investigating is how to accurately include in the ontology model the relevant conditions under which bioactivity holds. These conditions might be the concentration of the active substance in the organism, or the organism itself. These conditions are often THRESHOLD phenomena with both a lower and an upper limit, so it is not sufficient merely to indicate a single fixed border at which an effect starts to take place.
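A minimal way to picture such conditions is a disposition that is only realised for a given organism and within a concentration window, with no effect below the window and additional effects above it (as in the aspirin and headache example). This Python sketch is hypothetical: the class name, its fields and especially the threshold numbers are placeholders in arbitrary units, not real pharmacological data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalDisposition:
    """Hypothetical model: a disposition realised only under given conditions."""
    substance: str
    effect: str
    organism: str
    lower: float  # below this, too few molecules: no effect
    upper: float  # at or above this, additional (unwanted) effects appear
    unit: str

    def outcome(self, organism: str, concentration: float) -> str:
        if organism != self.organism:
            return "unknown"  # no condition recorded for this organism
        if concentration < self.lower:
            return "no effect"
        if concentration < self.upper:
            return self.effect
        return self.effect + " plus additional effects"


# Placeholder thresholds in arbitrary units, NOT real pharmacological data.
aspirin = ConditionalDisposition("aspirin", "headache relief", "human", 1.0, 10.0, "a.u.")
for dose in (0.5, 5.0, 20.0):
    print(dose, aspirin.outcome("human", dose))
```

Two bounds rather than one are needed to distinguish "too little", "intended effect" and "too much".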
Chemical ontologies: what are they, what are they for, and what are the challenges
EBI is an Outstation of the European Molecular Biology Laboratory.
What are they?
What are they for?
What are the challenges?
Janna Hastings, EBI Chemoinformatics and Metabolism
German Conference on Chemoinformatics,
Goslar, 8 November 2010
How do we find
Multiple databases, heterogeneous data
Ambiguity, multiple synonyms
J. Hastings, Chemical Ontology, 30.01.15
Data lost in
I’ll show you
All men are mortal
Socrates is a man
Therefore, Socrates is mortal
finding the implications of what you know
Community terminological standardisation
Dictionary: synonyms, definitions
Logical model allowing computer inferences beyond what is explicitly encoded
Ontologies to filter and organise data
The Web Ontology Language (OWL)
Can be re-used in
Chemical entity Role
organic molecular entity
inorganic molecular entity
Biological role Application
Chemical information entity
Unified browsing and querying
Ontology representation in a complex domain
Sounds great, but...
What are the challenges?
Chemicals and roles
P. de Matos et al.: Chemical Entities of Biological Interest: an update. NAR Database issue 2010
Chemicals and structures
J. Hastings, C. Batchelor, C. Steinbeck, S. Schulz: What are chemical structures and their relations? FOIS 2010
What is the
Representing complex structures
Chemical classes can be defined by
parts of structures
properties of structures
if molecule has part some carboxy group
if molecule has property cyclic, i.e. a self-connected cyclic path exists through the molecule’s atoms
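Both kinds of definition can be checked directly against a molecular graph rather than being asserted for every molecule. The sketch below is a hypothetical pure-Python illustration, not how description graphs or any cheminformatics toolkit actually work: molecules are hydrogen-suppressed graphs given as element symbols plus (atom, atom, bond order) triples, the carboxy test ignores charges, and the class names and helper functions are invented for illustration. Cyclicity uses union-find: a bond joining two atoms that are already connected closes a ring.

```python
from collections import defaultdict


def neighbours(bonds):
    """Adjacency list: atom index -> [(neighbour index, bond order)]."""
    adj = defaultdict(list)
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


def has_carboxy_group(atoms, bonds):
    """Part-based test: some C has a =O and an -O whose only neighbour is that C (OH)."""
    adj = neighbours(bonds)
    for i, element in enumerate(atoms):
        if element != "C":
            continue
        double_o = any(atoms[j] == "O" and order == 2 for j, order in adj[i])
        hydroxy_o = any(atoms[j] == "O" and order == 1 and len(adj[j]) == 1
                        for j, order in adj[i])
        if double_o and hydroxy_o:
            return True
    return False


def is_cyclic(atoms, bonds):
    """Property-based test via union-find over the bonds."""
    parent = list(range(len(atoms)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in bonds:
        ra, rb = find(a), find(b)
        if ra == rb:
            return True  # this bond closes a cycle
        parent[ra] = rb
    return False


def classify(atoms, bonds):
    classes = set()
    if has_carboxy_group(atoms, bonds):
        classes.add("carboxylic acid")
    if is_cyclic(atoms, bonds):
        classes.add("cyclic compound")
    return classes


# Benzoic acid: a six-membered aromatic ring carrying a C(=O)OH group.
benzoic = (["C"] * 7 + ["O", "O"],
           [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1),
            (0, 6, 1), (6, 7, 2), (6, 8, 1)])
print(sorted(classify(*benzoic)))  # ['carboxylic acid', 'cyclic compound']
```

Only the molecular graph is stored; class membership is computed on demand, which is the kind of work a cheminformatics toolkit would normally do.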
all parts and
Integration of chemoinformatics and ontology tools
J. Hastings et al.: Representing chemicals using OWL, description graphs and rules. OWLED 2010
Purpose and mode of action
has role
C. Batchelor, J. Hastings, C. Steinbeck: Ontological dependence, dispositions and institutional reality in chemistry.
Bulk quantity of molecules
Depends on human intent
(e.g. license, prescription)
Conditions in bioactivity models
Consider aspirin as treatment for a headache
Too few individual molecules will have no effect
Too many tablets will have unpleasant additional effects
Image credit: tell.fll.purdue.edu
J. Hastings, C. Steinbeck, L. Jansen, S. Schulz: Substance concentrations as conditions for the realization of dispositions. ISMB Bio-Ontologies SIG 2010
Paula de Matos
Rafael Alcántara Martin
Colin Batchelor, RSC
Stefan Schulz, Freiburg
Egon Willighagen, Uppsala
Michel Dumontier, Carleton
Leonid Chepelev, Carleton