Ontology work at the Royal
Society of Chemistry

We provide an overview of the use we make of ontologies at the Royal Society of Chemistry. Our engagement with the ontology community began in 2006 with preparations for Project Prospect, which used ChEBI and other Open Biomedical Ontologies to mark up journal articles. Project Prospect has since evolved into DERA (Digitally Enhancing the RSC Archive), and we have developed further ontologies for text markup, covering analytical methods and name reactions. Most recently we have been contributing to CHEMINF, an open-source cheminformatics ontology, as part of our work on disseminating calculated physicochemical properties of molecules via Open PHACTS. We show how we represent these properties and how this representation can serve as a template for disseminating other sorts of chemical information.

Published in: Technology
Ontology work at the Royal Society of Chemistry

  1. 1. Ontology work at the Royal Society of Chemistry Antony J. Williams, Colin Batchelor, Peter Corbett, Jon Steele and Valery Tkachenko ACS Dallas March 16th 2014
  2. 2. Royal Society of Chemistry • You know us as a publisher and society but • We are a host of chemistry databases • We are a charity and community support • We are a provider of grant-based services • We are an innovator in cheminformatics
  3. 3. We have data to manage… • Compounds • Reactions • Spectra • Crystals • Materials • Assays • Algorithms • …
  4. 4. We have data to manage… • Compounds • Reactions • Spectra • Crystals • Materials • Assays • Algorithms • …
  5. 5. Properties - experimental
  6. 6. Physicochemical properties LONG LIST: log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP…
  7. 7. All are amenable to ontologies and should blend standards • Compounds and properties are handled (InChIs are important) • Reactions are covered (and RInChIs help) • Spectra (JCAMP, AnIML, NetCDF, mzML) • Crystals (CIFs) • Materials (MatML) • Assays (MIAME) • Algorithms • …
  8. 8. ChemSpider Reactions
  9. 9. ChemSpider Spectra
  10. 10. ChemSpider is 7 years old • When ChemSpider was developed, ontologies were not directly implemented • The ontologies and technologies have developed and become more widely accepted over those seven years • Some efforts have been made to include ontologies, such as a layer on MeSH. We support a lot of standards: InChI, RInChI, JCAMP, CIF • The ChemSpider architecture is being rebuilt, and new standards and ontologies are being considered
  11. 11. Some available ontologies… • RSC has built and opened in-house ontologies: • Chemical methods (CHMO) • Name reactions (RXNO) • Molecular processes (MOP), largely auto-generated from the corresponding ChEBI classes • We have contributed to external ontologies: • Small molecules (ChEBI) • Cheminformatics (CHEMINF)
  12. 12. Chemistry ontologies 1 ChEBI (molecules, families of molecules, parts of molecules, 32128 fully annotated classes) (http://www.ebi.ac.uk/chebi/) perylene (CHEBI:29861) a perylene (CHEBI:60201) perylene skeleton (CHEBI:60200)
  13. 13. ChEBI Ontology
  14. 14. RSC Ontologies
  15. 15. Chemistry ontologies 2 Chemical Methods Ontology (http://rsc-cmo.googlecode.com) 2745 classes. Describes methods used to: •collect data in chemical experiments, such as MS and NMR •prepare and separate material for further analysis, such as sample ionisation, chromatography, and electrophoresis •synthesise materials, such as continuous vapour deposition •It also describes the instruments used in these experiments, such as mass spectrometers and chromatography columns, and their outputs •It should be of value for chemical hazards and safety data
  16. 16. Chemistry ontologies 3 RSC Name Reaction Ontology (http://rxno.googlecode.com/) 421 classes Examples: Diels–Alder cyclization
  17. 17. Chemistry ontologies 4 CHEMINF (http://code.google.com/p/semanticchemistry/) 638 classes Describes cheminformatics methods. Not presently used in text mining (see Open PHACTS usage later). doi:10.1371/journal.pone.0025513
  18. 18. Limits of ontologies Chemical space is very big: “The ‘small molecule universe’ (SMU), the set of all synthetically feasible organic molecules of 500 Daltons molecular weight or less, is estimated to contain over 10^60 structures, making exhaustive searches for structures of interest impractical.” Virshup et al., J. Am. Chem. Soc., doi:10.1021/ja401184g
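To give a feel for why exhaustive search is called impractical here, the following back-of-the-envelope sketch converts the quoted 10^60 estimate into enumeration time. The throughput figure is a purely hypothetical assumption chosen for illustration; it does not come from the slides or from the cited paper.

```python
# Size of the "small molecule universe" as quoted on the slide (an estimate).
SMU_SIZE = 10**60

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(structures_per_second):
    """Years needed to visit every structure once at a fixed throughput."""
    seconds = SMU_SIZE / structures_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical throughput: one billion structures per second.
    print(f"{years_to_enumerate(1e9):.2e} years")
```

Even at that assumed rate the result is dozens of orders of magnitude longer than any realistic project, which is why ontologies describe classes of molecules rather than listing every member.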
  19. 19. Why a named reaction ontology? • Despite attempts to introduce systematic nomenclature for organic reactions, lots of chemists still prefer to attach human names.
  20. 20. A big challenge • Classification is based on what the experimenter intends • Build the ontology around the intended product molecules rather than what might be by-products • (Carbon dioxide, water, hydrolysed protecting groups, protons, etc.)
  21. 21. Defining the skeleton
  22. 22. Limits of reaction classification • Much of RXNO is still classified by hand • Example: we can’t just define a cyclization as a reaction where a cyclic compound is formed. The Friedel–Crafts acylation produces a cyclic compound but is not a cyclization!
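The Friedel–Crafts point can be made concrete with a small sketch. It is an illustration only, not how RXNO is built (the slide says much of it is classified by hand). Molecules are hypothetical heavy-atom graphs written as `(atom_count, bond_list)`, bond orders are ignored, and rings are counted with the cyclomatic number (bonds minus atoms plus connected components). Note that even the ring-count comparison shown here is only a rough heuristic.

```python
def ring_count(num_atoms, bonds):
    """Cyclomatic number of a molecular graph: bonds - atoms + components."""
    parent = list(range(num_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


def naive_is_cyclization(products):
    """The flawed rule from the slide: any cyclic product counts."""
    return any(ring_count(n, bonds) > 0 for n, bonds in products)


def forms_new_ring(reactants, products):
    """Slightly better heuristic: total ring count must increase."""
    before = sum(ring_count(n, bonds) for n, bonds in reactants)
    after = sum(ring_count(n, bonds) for n, bonds in products)
    return after > before


# Heavy-atom graphs (hydrogens omitted), written by hand for this sketch.
RING6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
benzene = (6, RING6)
acetyl_chloride = (4, [(0, 1), (1, 2), (1, 3)])          # CH3, C, O, Cl
acetophenone = (9, RING6 + [(0, 6), (6, 7), (6, 8)])     # ring, C, O, CH3
hydrogen_chloride = (1, [])                              # by-product
butadiene = (4, [(0, 1), (1, 2), (2, 3)])
ethene = (2, [(0, 1)])
cyclohexene = (6, RING6)

if __name__ == "__main__":
    friedel_crafts = ([benzene, acetyl_chloride], [acetophenone, hydrogen_chloride])
    diels_alder = ([butadiene, ethene], [cyclohexene])
    print(naive_is_cyclization(friedel_crafts[1]))  # naive rule on Friedel–Crafts
    print(forms_new_ring(*friedel_crafts))
    print(forms_new_ring(*diels_alder))
```

The naive rule flags the Friedel–Crafts acylation because acetophenone contains a ring, whereas comparing ring counts before and after does not, since the aromatic ring was already present in benzene.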
  23. 23. RXNO in the wild 510 classes in the RXNO namespace … and RXNO is built into NextMove Software’s reaction identification tool.
  24. 24. RXNO: next steps • More reactions! • More cross-references! • More example reactions! • Links to graphical versions! (All drawn, just awaiting uploading.) • More SMIRKS strings!
  25. 25. Using ontologies in text mining • To provide a controlled vocabulary of terms found in text and a common identifier. • Ideally this identifier is a resolvable HTTP URI (for example, for chemical compounds, http://purl.obolibrary.org/obo/CHEBI_36063), and the same applies to methods terminology
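A minimal sketch of the idea on this slide: a controlled vocabulary maps surface terms to ontology identifiers, and each identifier is expanded to an HTTP URI following the OBO PURL pattern shown above. The three vocabulary entries reuse labels and identifiers that appear on the co-occurrence slides; the dictionary, function names and matching strategy are illustrative assumptions, not the RSC text-mining pipeline.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Illustrative vocabulary: lower-cased label -> compact identifier (CURIE).
VOCABULARY = {
    "ascorbic acid": "CHEBI:22652",
    "vitamin c": "CHEBI:21241",
    "sonication": "CHMO:0001707",
}


def curie_to_uri(curie):
    """Expand e.g. 'CHEBI:36063' to an OBO PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def tag_terms(text):
    """Return (label, curie, uri) for every vocabulary label found in text."""
    lowered = text.lower()
    hits = []
    for label, curie in VOCABULARY.items():
        # Whole-word, case-insensitive match; real text mining needs far more.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            hits.append((label, curie, curie_to_uri(curie)))
    return hits


if __name__ == "__main__":
    sentence = "Samples were dispersed by sonication before ascorbic acid was added."
    for hit in tag_terms(sentence):
        print(hit)
```

Because every match carries the same identifier regardless of how the term was spelled in the text, downstream tools can link mentions across articles without string comparison.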
  26. 26. Ontologies as synonym sets for text-mining • We have text-mined the whole 21st century RSC archive with a myriad of ontologies. Results are on the publishing platform • We have looked for correlations between molecules and ontology terms. • Two examples follow…
  27. 27. Co-occurrences with ? alcohols (CHEBI:30879) solvents (CHEBI:46787) coproporphyrins (CHEBI:23388) 3D DOSY-TOCSY (CHMO:0001950) lipase activity (GO:0016298) solvolysis (MOP:0000620) wood (ENVO:00002040) aliphatic alcohol (CHEBI:2571) Raman circular dichroism spectroscopy (CHMO:0001160) propoxy group (CHEBI:46881) steam reforming (CHMO:0001450) hydrogenation (MOP:0000589) aqueous-phase reforming (CHMO:0001444) sonication (CHMO:0001707)
  28. 28. Co-occurrences with ? reducing agent (CHEBI:63247) ascorbic acid (CHEBI:22652) antioxidant (CHEBI:22586) reduction (MOP:0000569) electrode (CHMO:0002344) ascorbate (CHEBI:22651) modified residue (SO:0001089) phosphate buffer (CHMO:0001734) oxidation (MOP:0000568) nafion polymer (CHEBI:61428) vitamin C (CHEBI:21241) antioxidant activity (GO:0016209) atom-transfer radical polymerisation (MOP:0000684) detection of glucose (GO:0051594) reducing agent (CHEBI:63247) glucose (CHEBI:17234) graphene (CHEBI:36973)
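Term lists like the two above could come from a co-occurrence count over annotated articles. The sketch below assumes each article has already been reduced to a set of ontology identifiers, then counts which identifiers appear alongside a chosen target. The three "articles" are invented for illustration, and the slides do not say which statistic was actually used, so plain counting is an assumption.

```python
from collections import Counter


def cooccurring_terms(annotated_docs, target):
    """Count how often each other term appears in documents containing target."""
    counts = Counter()
    for terms in annotated_docs:
        if target in terms:
            counts.update(t for t in terms if t != target)
    return counts


# Hypothetical annotations: one set of ontology identifiers per article.
DOCS = [
    {"CHEBI:22652", "CHEBI:22586", "MOP:0000569"},
    {"CHEBI:22652", "CHMO:0002344", "MOP:0000569"},
    {"CHEBI:17234", "CHMO:0002344"},
]

if __name__ == "__main__":
    # Terms that co-occur with ascorbic acid (CHEBI:22652), most frequent first.
    for term, n in cooccurring_terms(DOCS, "CHEBI:22652").most_common():
        print(term, n)
```

Raw counts favour terms that are common everywhere, so a real analysis would probably normalise against each term's overall frequency.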
  29. 29. Projects and Ontologies • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharmas, Publishers… • To put medicines in the pipeline…
  30. 30. The Open PHACTS community ecosystem
  31. 31. Our RDF schema Two dozen calculated properties for >10^6 molecules •CHEMINF ontology for cheminformatics •QUDT for units and numeric values •ChemSpider IDs for molecules [Diagram: a calculation has_input a connection table, which is_about benzene, and has_output a calculated log P with has_unit dimensionless, has_value 2.177 and has standard uncertainty 0.234]
  32. 32. RSC data in Open PHACTS 1. Molecule synonyms and identifiers 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers 3. Molecule–molecule relations (“parent–child”) of interest for drug discovery 4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
  33. 33. Synonyms and identifiers Newly added to the CHEMINF ontology: •Validated ChemSpider synonyms •Unvalidated ChemSpider synonyms •Validated database identifiers •Unvalidated database identifiers •InChI, InChIKey, SMILES •Preferred ChemSpider name
  34. 34. Physicochemical properties log P log D (at pH 5.5, at pH 7.4) bioconcentration factor KOC (at pH 5.5, at pH 7.4) index of refraction polar surface area molar refractivity molar volume polarizability surface tension density at STP flash point at 1 atm boiling point at 1 atm enthalpy of vaporization at STP vapour pressure at STP
  35. 35. It is actually more complicated… [Diagram] Nodes: benzene (OPS); benzene’s connection table (rdf:type CHEMINF connection table); calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01); calculation result (rdf:type CHEMINF calculated log P); “2.17”^^xsd:float; “0.234”^^xsd:float; QUDT dimensionless quantity. Edges: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit
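One possible reading of this diagram, written out as subject–predicate–object triples in plain Python. The node names and the prefixed predicate strings are readable placeholders rather than real IRIs from IAO, OBI, QUDT or CHEMINF, and the assignment of each rdf:type to a node is inferred from the slide layout, so treat the whole structure as an assumption.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")


def log_p_triples(molecule, value, uncertainty):
    """Build placeholder triples for one calculated log P value."""
    table = f"{molecule}_connection_table"
    process = f"{molecule}_logp_calculation"
    result = f"{molecule}_logp_result"
    return [
        Triple(table, "rdf:type", "CHEMINF:connection_table"),
        Triple(table, "IAO:is_about", molecule),
        Triple(process, "rdf:type", "CHEMINF:execution_of_ACD_Labs_PhysChem_12.01"),
        Triple(process, "OBI:has_specified_input", table),
        Triple(process, "OBI:has_specified_output", result),
        Triple(result, "rdf:type", "CHEMINF:calculated_log_P"),
        Triple(result, "QUDT:has_value", value),
        Triple(result, "QUDT:has_standard_uncertainty", uncertainty),
        Triple(result, "QUDT:has_unit", "QUDT:dimensionless_quantity"),
    ]


def objects(triples, subject, predicate):
    """All objects for a given subject and predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    graph = log_p_triples("benzene", 2.17, 0.234)
    print(objects(graph, "benzene_logp_result", "QUDT:has_value"))
    print(objects(graph, "benzene_logp_calculation", "OBI:has_specified_input"))
```

Separating the calculation process from its result is what lets the same pattern serve as a template: another property would reuse the structure with a different result type and unit.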
  36. 36. What’s built on top of this?
  37. 37. Chemistry Data to manage… • Compounds • Reactions • Spectra • Crystals (in development) • Materials • Assays • Algorithms • …
  38. 38. Future Work • Extending use of ontologies across all of our work on databases and as an underpinning to the Chemical Data Repository • Adding ontologies to other grant-based projects such as PharmaSea • Continued collaborations with University of Southampton on Labtrove for Chemistry • RSC collaboration with Dr Stuart Chalk (UNF) on data standards and ontologies • Working with CHAS on hazard/safety data
  39. 39. Thank you •Email: williamsa@rsc.org •ORCID: 0000-0002-2668-4821 •Twitter: @ChemConnector •Personal Blog: www.chemconnector.com •SLIDES: www.slideshare.net/AntonyWilliams