Ontology work at the Royal Society of Chemistry
Antony J. Williams, Colin Batchelor, Peter Corbett, Jon Steele and Valery Tkachenko
ACS Dallas, March 16th 2014
Royal Society of Chemistry
• You know us as a publisher and society but
• We are a host of chemistry databases
• We are a charity and we support the community
• We are a provider of grant-based services
• We are an innovator in cheminformatics
We have data to manage…
• Compounds
• Reactions
• Spectra
• Crystals
• Materials
• Assays
• Algorithms
• …
Properties - experimental
Physicochemical properties
LONG LIST: log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP…
All are amenable to ontologies, which should be combined with existing standards
• Compounds and properties are handled (InChIs are important)
• Reactions are covered (and RInChIs help)
• Spectra (JCAMP, AnIML, NetCDF, mzML)
• Crystals (CIFs)
• Materials (MatML)
• Assays (MIAME)
• Algorithms
• …
ChemSpider Reactions
ChemSpider Spectra
ChemSpider is 7 years old
• When ChemSpider was developed, ontologies were not directly implemented
• Ontologies and the supporting technologies have matured and become more widely accepted over those seven years
• Some efforts have been made to include ontologies, e.g. a layer on MeSH. We support many standards: InChI, RInChI, JCAMP, CIF
• The ChemSpider architecture is being rebuilt, taking new standards and ontologies into account
Some available ontologies…
• RSC has built and opened in-house ontologies:
• Chemical methods (CHMO)
• Name reactions (RXNO)
• Molecular processes (MOP), largely auto-generated
from the corresponding ChEBI classes
• We have contributed to external ontologies:
• Small molecules (ChEBI)
• Cheminformatics (CHEMINF)
Chemistry ontologies 1
ChEBI (molecules, families of molecules, parts of molecules; 32128 fully annotated classes) (http://www.ebi.ac.uk/chebi/)
Examples: perylene (CHEBI:29861), a perylene (CHEBI:60201), perylene skeleton (CHEBI:60200)
ChEBI Ontology
RSC Ontologies
Chemistry ontologies 2
Chemical Methods Ontology (http://rsc-cmo.googlecode.com)
2745 classes; describes methods used to:
• collect data in chemical experiments, such as MS and NMR
• prepare and separate material for further analysis, such as sample ionisation, chromatography and electrophoresis
• synthesise materials, such as continuous vapour deposition
• also describes the instruments used in these experiments, such as mass spectrometers and chromatography columns, and their outputs
• Should be of value for chemical hazard and safety data
Chemistry ontologies 3
RSC Name Reaction Ontology
(http://rxno.googlecode.com/)
421 classes
Examples:
Diels–Alder cyclization
Chemistry ontologies 4
CHEMINF
(http://code.google.com/p/semanticchemistry/)
638 classes
Describes cheminformatics methods. Not
presently used in text mining (see Open
PHACTS usage later).
doi:10.1371/journal.pone.0025513
Limits of ontologies
Chemical space is very big:
“The ‘small molecule universe’ (SMU), the set of all synthetically feasible organic molecules of 500 Daltons molecular weight or less, is estimated to contain over 10⁶⁰ structures, making exhaustive searches for structures of interest impractical.”
Virshup et al., J. Am. Chem. Soc.,
doi:10.1021/ja401184g
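To make "impractical" concrete, a back-of-envelope calculation helps. The throughput of 10⁹ structures per second below is a hypothetical figure, not from the slide or the paper; at that rate, visiting each of 10⁶⁰ structures once would take on the order of 10⁴³ years.

```python
# Back-of-envelope check of why enumerating chemical space is hopeless.
SMU_SIZE = 10**60                      # lower bound quoted from Virshup et al.
RATE_PER_SECOND = 10**9                # hypothetical: one billion structures per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def years_to_enumerate(n_structures: int, per_second: int = RATE_PER_SECOND) -> float:
    """Time in years to visit every structure once at a constant rate."""
    return n_structures / per_second / SECONDS_PER_YEAR


print(f"{years_to_enumerate(SMU_SIZE):.1e} years")
```

Even a thousandfold faster rate only removes three orders of magnitude from a 44-digit number of years.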
Why a named reaction ontology?
• Despite attempts to introduce systematic nomenclature for organic reactions, many chemists still prefer to refer to reactions by the names of people.
A big challenge
• Classification is based on what the experimenter intends
• Build the ontology around the intended product molecules rather than what might be by-products
• (Carbon dioxide, water, hydrolysed protecting groups, protons, etc.)
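A minimal sketch of the "intended product" rule, under stated assumptions: the set of ignorable co-products is a hand-picked, illustrative list (including isobutene as debris from removing a Boc protecting group), and choosing the largest remaining species by heavy-atom count is a hypothetical tie-breaker, not how RXNO assigns classes. The Boc example shows why filtering matters: the wanted amine is smaller than both by-products.

```python
from dataclasses import dataclass

# Hypothetical, hand-picked co-products to ignore, following the slide's
# examples (carbon dioxide, water, protons, protecting-group debris).
IGNORABLE = {"carbon dioxide", "water", "proton", "isobutene"}


@dataclass(frozen=True)
class Species:
    name: str
    heavy_atoms: int  # non-hydrogen atom count, supplied by hand here


def intended_product(products):
    """Drop ignorable co-products, then pick the largest remaining species."""
    candidates = [p for p in products if p.name not in IGNORABLE]
    if not candidates:
        raise ValueError("every product was filtered out as a by-product")
    return max(candidates, key=lambda p: p.heavy_atoms)


# Fischer esterification: ethyl acetate + water
print(intended_product([Species("ethyl acetate", 6), Species("water", 1)]).name)
# Boc removal from N-Boc-methylamine: methylamine + carbon dioxide + isobutene
print(intended_product([Species("methylamine", 2),
                        Species("carbon dioxide", 3),
                        Species("isobutene", 4)]).name)
```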
Defining the skeleton
Limits of reaction classification
• Much of RXNO is still classified by hand
• Example: we can’t just define a cyclization as
a reaction where a cyclic compound is formed.
The Friedel–Crafts acylation produces a cyclic
compound but is not a cyclization!
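The Friedel–Crafts counterexample can be reproduced with ring counts alone. The sketch below contrasts the rejected definition ("the product is cyclic") with a stricter, purely illustrative heuristic that asks whether the product has more rings than all reactants combined. This heuristic is an assumption for demonstration, not RXNO's method, and the ring counts are supplied by hand.

```python
from typing import Sequence


def naive_is_cyclization(product_rings: int) -> bool:
    """The rejected definition: any reaction whose product is cyclic."""
    return product_rings > 0


def forms_new_ring(reactant_rings: Sequence[int], product_rings: int) -> bool:
    """Illustrative heuristic: the product has more rings than all reactants combined."""
    return product_rings > sum(reactant_rings)


# Ring counts supplied by hand for two textbook cases.
cases = {
    # benzene (1 ring) + acetyl chloride (0) -> acetophenone (1)
    "Friedel-Crafts acylation": ([1, 0], 1),
    # 1,3-butadiene (0) + ethene (0) -> cyclohexene (1)
    "Diels-Alder cyclization": ([0, 0], 1),
}
for name, (reactants, product) in cases.items():
    print(name, naive_is_cyclization(product), forms_new_ring(reactants, product))
```

Even the stricter check is only a sketch: counting rings says nothing about which bonds formed, so a reaction that opens one ring and closes another would slip through.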
RXNO in the wild
510 classes in the RXNO namespace
… and RXNO is built into NextMove Software’s reaction identification tool.
RXNO: next steps
• More reactions!
• More cross-references!
• More example reactions!
• Links to graphical versions! (All drawn, just
awaiting uploading.)
• More SMIRKS strings!
Using ontologies in text mining
• To provide a controlled vocabulary of terms found in text, each with a common identifier.
• Ideally this identifier is a resolvable HTTP URI, both for chemical compounds (for example, http://purl.obolibrary.org/obo/CHEBI_36063) and for methods terminology
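A minimal dictionary-based tagger illustrates the idea: each synonym maps to an ontology ID, and the ID is expanded into an OBO-style PURL following the pattern of the ChEBI example (prefix, underscore, local number). The synonym table reuses term/ID pairs from the co-occurrence lists in these slides; the greedy longest-match strategy and the function names are illustrative assumptions, not the RSC text-mining pipeline.

```python
import re

# Synonym -> ontology ID pairs taken from examples in these slides.
# A real dictionary would be generated from the ontology files themselves.
SYNONYMS = {
    "ascorbic acid": "CHEBI:22652",
    "vitamin c": "CHEBI:21241",
    "reducing agent": "CHEBI:63247",
    "graphene": "CHEBI:36973",
    "phosphate buffer": "CHMO:0001734",
    "sonication": "CHMO:0001707",
    "hydrogenation": "MOP:0000589",
}
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Turn an OBO-style CURIE such as CHEBI:36063 into a PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def tag(text: str, synonyms=SYNONYMS, max_words: int = 4):
    """Greedy longest-match tagging over lower-cased word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        # try the longest phrase starting at token i first
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in synonyms:
                hits.append((phrase, curie_to_purl(synonyms[phrase])))
                i += n
                break
        else:  # no phrase matched at this position
            i += 1
    return hits


print(tag("Graphene was dispersed by sonication and ascorbic acid acted as the reducing agent."))
```

Longest-match matters for multi-word terms: "ascorbic acid" should win over any shorter entry starting with "ascorbic".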
Ontologies as synonym sets for text-mining
• We have text-mined the entire 21st-century RSC archive with a wide range of ontologies. The results are available on the publishing platform
• We have looked for correlations between molecules and ontology terms.
• Two examples follow…
Co-occurrences with ?
alcohols (CHEBI:30879), solvents (CHEBI:46787), coproporphyrins (CHEBI:23388), 3D DOSY-TOCSY (CHMO:0001950), lipase activity (GO:0016298), solvolysis (MOP:0000620), wood (ENVO:00002040), aliphatic alcohol (CHEBI:2571), Raman circular dichroism spectroscopy (CHMO:0001160), propoxy group (CHEBI:46881), steam reforming (CHMO:0001450), hydrogenation (MOP:0000589), aqueous-phase reforming (CHMO:0001444), sonication (CHMO:0001707)
Co-occurrences with ?
reducing agent (CHEBI:63247), ascorbic acid (CHEBI:22652), antioxidant (CHEBI:22586), reduction (MOP:0000569), electrode (CHMO:0002344), ascorbate (CHEBI:22651), modified residue (SO:0001089), phosphate buffer (CHMO:0001734), oxidation (MOP:0000568), nafion polymer (CHEBI:61428), vitamin C (CHEBI:21241), antioxidant activity (GO:0016209), atom-transfer radical polymerisation (MOP:0000684), detection of glucose (GO:0051594), reducing agent (CHEBI:63247), glucose (CHEBI:17234), graphene (CHEBI:36973)
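The slides do not say how the molecule/term correlations were computed. One hypothetical way is sketched below: count co-occurring IDs per article, then compute lift, P(term | molecule) / P(term), so that terms which are common everywhere are not mistaken for associations. The four annotated "documents" are invented toy data; only the IDs come from the lists above.

```python
from collections import Counter

# Toy annotation sets: each set holds the ontology IDs found in one article.
# These documents are invented for illustration; only the IDs come from the slide.
DOCS = [
    {"CHEBI:22652", "CHEBI:22586", "CHMO:0002344"},
    {"CHEBI:22652", "CHEBI:22586"},
    {"CHEBI:17234", "CHMO:0002344"},
    {"CHEBI:46787"},
]


def cooccurrences(docs, molecule_id):
    """Count, for each other ID, the documents it shares with molecule_id."""
    counts = Counter()
    for annotations in docs:
        if molecule_id in annotations:
            counts.update(a for a in annotations if a != molecule_id)
    return counts


def lift(docs, molecule_id, term_id):
    """P(term | molecule) / P(term): values above 1 suggest an association."""
    with_molecule = [d for d in docs if molecule_id in d]
    p_term = sum(term_id in d for d in docs) / len(docs)
    if not with_molecule or p_term == 0:
        return 0.0
    p_term_given_molecule = sum(term_id in d for d in with_molecule) / len(with_molecule)
    return p_term_given_molecule / p_term


print(cooccurrences(DOCS, "CHEBI:22652"))
print(lift(DOCS, "CHEBI:22652", "CHEBI:22586"))   # antioxidant
print(lift(DOCS, "CHEBI:22652", "CHMO:0002344"))  # electrode
```

In this toy data antioxidant gets a lift of 2.0 with ascorbic acid, while electrode gets 1.0: it co-occurs, but no more often than its overall frequency predicts.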
Projects and Ontologies
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using
semantic web technologies
• Open source code, open data and open
standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
The Open PHACTS community ecosystem
Our RDF schema
Two dozen calculated properties for >10⁶ molecules
• CHEMINF ontology for cheminformatics
• QUDT for units and numeric values
• ChemSpider IDs for molecules
[Diagram: a calculation has_input a connection table, which is_about benzene; the calculation has_output a calculated log P, which has_value 2.177, has_unit dimensionless and has standard uncertainty 0.234]
RSC data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank
and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of
interest for drug discovery
4. Calculated physicochemical properties for
compounds (both molecular and macroscopic)
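A linkset is a set of identifier pairs asserting that records in two databases describe the same molecule. One simple, illustrative way to derive such pairs is to join records on a shared InChIKey; this is an assumption for demonstration, not a description of how the Open PHACTS linksets were produced. All identifiers and keys below are made-up placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder records: (source database, identifier, InChIKey stand-in).
RECORDS = [
    ("ChEBI", "CHEBI:EXAMPLE1", "KEY-A"),
    ("ChEMBL", "CHEMBL_EXAMPLE1", "KEY-A"),
    ("DrugBank", "DB_EXAMPLE1", "KEY-A"),
    ("ChEBI", "CHEBI:EXAMPLE2", "KEY-B"),
    ("ChEMBL", "CHEMBL_EXAMPLE2", "KEY-B"),
]


def build_linksets(records):
    """Group records by key, then emit cross-database identifier pairs.

    Returns a dict mapping (source_a, source_b) to a set of (id_a, id_b),
    with sources in sorted order so each database pair has one linkset.
    """
    by_key = defaultdict(list)
    for source, ident, key in records:
        by_key[key].append((source, ident))
    linksets = defaultdict(set)
    for members in by_key.values():
        for (src_a, id_a), (src_b, id_b) in combinations(sorted(members), 2):
            if src_a != src_b:  # skip duplicates within one database
                linksets[(src_a, src_b)].add((id_a, id_b))
    return dict(linksets)


for pair, links in build_linksets(RECORDS).items():
    print(pair, sorted(links))
```

Joining on InChIKey is convenient but not free of pitfalls: tautomers, salts and stereochemistry handling can make the "same" compound carry different keys in different databases.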
Synonyms and identifiers
Newly added to the CHEMINF ontology:
• Validated ChemSpider synonyms
• Unvalidated ChemSpider synonyms
• Validated database identifiers
• Unvalidated database identifiers
• InChI, InChIKey, SMILES
• Preferred ChemSpider name
Physicochemical properties
log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP
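It is actually more complicated…
[Diagram: the full RDF graph behind one calculated value, shown as subject → predicate → object]
• calculation process → rdf:type → CHEMINF execution of ACD/Labs PhysChem software library version 12.01
• calculation process → OBI has specified input → benzene’s connection table
• calculation process → OBI has specified output → calculation result
• benzene’s connection table → rdf:type → CHEMINF connection table
• benzene’s connection table → IAO is about → OPS benzene
• calculation result → rdf:type → CHEMINF calculated log P
• calculation result → QUDT has value → “2.17”^^xsd:float
• calculation result → QUDT has standard uncertainty → “0.234”^^xsd:float
• calculation result → QUDT has unit → QUDT dimensionless quantity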
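The graph on this slide can be written out as plain N-Triples. The sketch below does so with only the standard library. Every IRI in the hypothetical `http://example.org/` namespace is a placeholder standing in for the real CHEMINF, OBI, IAO and QUDT terms, whose actual identifiers are not given here; only the `rdf:type` and `xsd:float` IRIs are the real W3C ones.

```python
# Sketch: serialise the calculated-property pattern as N-Triples.
# All http://example.org/ IRIs are hypothetical placeholders, NOT real
# CHEMINF/OBI/IAO/QUDT identifiers.
EX = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_FLOAT = "<http://www.w3.org/2001/XMLSchema#float>"


def iri(local: str) -> str:
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def float_literal(value: float) -> str:
    """Typed literal such as "2.17"^^xsd:float."""
    return f'"{value}"^^{XSD_FLOAT}'


def calculated_property_triples(molecule: str, value: float, uncertainty: float):
    calc, table, result = f"{molecule}_calc", f"{molecule}_ct", f"{molecule}_logp"
    return [
        # the calculation process and what it consumed and produced
        (iri(calc), RDF_TYPE, iri("acdlabs_physchem_12_01_execution")),
        (iri(calc), iri("has_specified_input"), iri(table)),
        (iri(calc), iri("has_specified_output"), iri(result)),
        # the connection table is about the molecule
        (iri(table), RDF_TYPE, iri("connection_table")),
        (iri(table), iri("is_about"), iri(molecule)),
        # the result carries its value, uncertainty and unit
        (iri(result), RDF_TYPE, iri("calculated_logP")),
        (iri(result), iri("has_value"), float_literal(value)),
        (iri(result), iri("has_standard_uncertainty"), float_literal(uncertainty)),
        (iri(result), iri("has_unit"), iri("dimensionless")),
    ]


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(calculated_property_triples("benzene", 2.17, 0.234)))
```

Modelling the calculation as its own node is what lets provenance (software and version) and uncertainty travel with every value, at the cost of nine triples for a single number.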
What’s built on top of this?
Chemistry Data to manage…
• Compounds
• Reactions
• Spectra
• Crystals (in development)
• Materials
• Assays
• Algorithms
• …
Future Work
• Extending use of ontologies across all of our
work on databases and as an underpinning to
the Chemical Data Repository
• Adding ontologies to other grant-based projects
such as PharmaSea
• Continued collaborations with the University of Southampton on LabTrove for Chemistry
• RSC collaboration with Dr Stuart Chalk (UNF)
on data standards and ontologies
• Working with CHAS on hazard/safety data
Thank you
• Email: williamsa@rsc.org
• ORCID: 0000-0002-2668-4821
• Twitter: @ChemConnector
• Personal Blog: www.chemconnector.com
• SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

Similar to Ontology work at the Royal Society of Chemistry

Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Building a data repository to manage chemistry research data
Building a data repository to manage chemistry research dataBuilding a data repository to manage chemistry research data
Building a data repository to manage chemistry research data
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data DashboardsAccessing Environmental Chemistry Data via Data Dashboards
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archiveThe application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archive
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
Ken Karapetyan
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
Michel Dumontier
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshopSAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
Syed Asad Rahman
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
Ken Karapetyan
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
Ken Karapetyan
 
Experiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the CommunityExperiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the Community
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
A chemistry data repository to serve them all
A chemistry data repository to serve them allA chemistry data repository to serve them all
II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network" II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network"
Dr. Haxel Consult
 
Chemical similarity
Chemical similarityChemical similarity
Chemical similarity
Nina Jeliazkova
 
2013 s bio 101 chapter 2 basic chemistry
2013 s bio 101 chapter 2 basic chemistry2013 s bio 101 chapter 2 basic chemistry
2013 s bio 101 chapter 2 basic chemistry
germannajessica
 

Similar to Ontology work at the Royal Society of Chemistry (20)

Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
Building a data repository to manage chemistry research data
Building a data repository to manage chemistry research dataBuilding a data repository to manage chemistry research data
Building a data repository to manage chemistry research data
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data DashboardsAccessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
The application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archiveThe application of text and data mining to enhance the RSC publication archive
The application of text and data mining to enhance the RSC publication archive
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshopSAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
SAR_EMBL_EBI_EC_BLAST_NOV_2013_Industry_workshop
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Experiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the CommunityExperiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the Community
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
A chemistry data repository to serve them all
A chemistry data repository to serve them allA chemistry data repository to serve them all
A chemistry data repository to serve them all
 
II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network" II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network"
 
Chemical similarity
Chemical similarityChemical similarity
Chemical similarity
 
2013 s bio 101 chapter 2 basic chemistry
2013 s bio 101 chapter 2 basic chemistry2013 s bio 101 chapter 2 basic chemistry
2013 s bio 101 chapter 2 basic chemistry
 

Recently uploaded

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 

Recently uploaded (20)

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 

Ontology work at the Royal Society of Chemistry

  • 1. Ontology work at the Royal Society of Chemistry Antony J. Williams, Colin Batchelor, Peter Corbett, Jon Steele and Valery Tkachenko ACS Dallas March 16th 2014
  • 2. Royal Society of Chemistry • You know us as a publisher and society but • We are a host of chemistry databases • We are a charity and community support • We are a provider of grant-based services • We are an innovator in cheminformatics
  • 3. We have data to manage… • Compounds • Reactions • Spectra • Crystals • Materials • Assays • Algorithms • …
  • 4. We have data to manage… • Compounds • Reactions • Spectra • Crystals • Materials • Assays • Algorithms • …
  • 5.
  • 7. Physicochemical properties LONG LIST: log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP…
  • 8. All are amenable to ontologies and should blend standards • Compounds and properties are handled (InChIs are important) • Reactions are covered (and RInChIs help) • Spectra (JCAMP, AnIML, NetCDF, mzML) • Crystals (CIFs) • Materials (MatML) • Assays (MIAME) • Algorithms • …
  • 11. ChemSpider is 7 years old • When ChemSpider was developed ontologies were not directly implemented • The ontologies and technologies have developed and more accepted in seven years • Some efforts have been made to include ontologies – layer on MeSH. We support a lot of standards – InChI, RInChI, JCAMP, CIF • The ChemSpider architecture is being rebuilt and considering new standards and ontologies
  • 12. Some available ontologies… • RSC has built and opened in-house ontologies: • Chemical methods (CHMO) • Name reactions (RXNO) • Molecular processes (MOP), largely auto-generated from the corresponding ChEBI classes • We have contributed to external ontologies: • Small molecules (ChEBI) • Cheminformatics (CHEMINF)
  • 13. Chemistry ontologies 1 ChEBI (molecules, families of molecules, parts of molecules, 32128 fully annotated classes) (http://www.ebi.ac.uk/chebi/) perylene (CHEBI:29861) a perylene (CHEBI:60201) perylene skeleton (CHEBI:60200)
  • 16. Chemistry ontologies 2 Chemical Methods Ontology (http://rsc-cmo.googlecode.com) 2745 classes describes methods used to: •collect data in chemical experiments, such as MS and NMR •prepare and separate material for further analysis, such as sample ionisation, chromatography, and electrophoresis •synthesise materials, such as continuous vapour deposition •also describes the instruments used in these experiments, such as mass spectrometers and chromatography columns and their outputs •Should be of value to chemical hazards and safety data
  • 17. Chemistry ontologies 3 RSC Name Reaction Ontology (http://rxno.googlecode.com/) 421 classes Examples: Diels–Alder cyclization
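Ontology classes such as Diels–Alder cyclization sit in an is_a hierarchy, which lets a query for a general class also retrieve its more specific subclasses. A minimal Python sketch of that subsumption lookup follows; the hierarchy fragment is hypothetical and chosen for illustration, not copied from RXNO's actual parent classes.

```python
from collections import deque

# Hypothetical is_a fragment for illustration only; these parent links
# are NOT taken from RXNO.
IS_A = {
    "Diels-Alder cyclization": ["[4+2] cycloaddition"],
    "[4+2] cycloaddition": ["cycloaddition"],
    "cycloaddition": ["cyclization"],
    "cyclization": ["reaction"],
}

def ancestors(term):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen

def is_subclass_of(term, cls):
    """True if `term` is `cls` itself or one of its (transitive) subclasses."""
    return term == cls or cls in ancestors(term)

print(is_subclass_of("Diels-Alder cyclization", "cyclization"))  # True
```

With this fragment, a search for cyclizations would also return Diels–Alder examples. A real application would load the parent links from the published ontology file rather than hard-coding them.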
  • 18. Chemistry ontologies 4 CHEMINF (http://code.google.com/p/semanticchemistry/) 638 classes Describes cheminformatics methods. Not presently used in text mining (see Open PHACTS usage later). doi:10.1371/journal.pone.0025513
  • 19. Limits of ontologies Chemical space is very big: “The ‘small molecule universe’ (SMU), the set of all synthetically feasible organic molecules of 500 Daltons molecular weight or less, is estimated to contain over 10^60 structures, making exhaustive searches for structures of interest impractical.” Virshup et al., J. Am. Chem. Soc., doi:10.1021/ja401184g
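To put the quoted estimate in perspective, a back-of-the-envelope calculation helps. The screening rate below (10^9 structures per second) is a hypothetical assumption, not a figure from the cited paper.

```python
# Back-of-the-envelope estimate; RATE_PER_SECOND is a hypothetical assumption.
SMU_SIZE = 10**60                       # estimated structures (Virshup et al.)
RATE_PER_SECOND = 10**9                 # assumed screening throughput
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year: 31,557,600 s

seconds_needed = SMU_SIZE / RATE_PER_SECOND
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"{years_needed:.1e} years")  # prints "3.2e+43 years"
```

Even at that generous rate, the search would take around 3 × 10^43 years, vastly longer than the age of the universe (about 1.4 × 10^10 years), which illustrates the quoted point that exhaustive searches are impractical.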
  • 20. Why a named reaction ontology? • Despite attempts to introduce systematic nomenclature for organic reactions, lots of chemists still prefer to attach human names.
  • 21. A big challenge • Classification is based on what the experimenter intends • Build the ontology around the intended product molecules rather than what might be by-products • (Carbon dioxide, water, hydrolysed protecting groups, protons, etc.)
  • 24. Limits of reaction classification • Much of RXNO is still classified by hand • Example: we can’t just define a cyclization as a reaction where a cyclic compound is formed. The (intermolecular) Friedel–Crafts acylation produces a cyclic compound, an aryl ketone, but forms no new ring, so it is not a cyclization!
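The cyclization example can be made concrete with ring counts. The sketch below is a hypothetical heuristic, not how RXNO classifies reactions: it counts rings as the cyclomatic number (bonds − atoms + connected components) of hand-written heavy-atom graphs, ignores the HCl by-product in line with the intended-product principle, and compares the naive rule "a cyclic compound is formed" with a stricter "the product has more rings than the reactants combined".

```python
def ring_count(n_atoms, bonds):
    """Cyclomatic number E - V + C of a heavy-atom graph; each bond is one
    edge whatever its bond order."""
    parent = list(range(n_atoms))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - n_atoms + components

# Hand-written heavy-atom graphs (hydrogens omitted) as (atom count, bond list).
RING6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
BENZENE = (6, RING6)
ACETYL_CHLORIDE = (4, [(0, 1), (1, 2), (1, 3)])        # CH3-C(=O)-Cl
ACETOPHENONE = (9, RING6 + [(0, 6), (6, 7), (6, 8)])   # Ph-C(=O)-CH3
BUTADIENE = (4, [(0, 1), (1, 2), (2, 3)])
ETHYLENE = (2, [(0, 1)])
CYCLOHEXENE = (6, RING6)

def naive_is_cyclization(reactants, product):
    """Flawed rule: 'a cyclic compound is formed'."""
    return ring_count(*product) > 0

def forms_new_ring(reactants, product):
    """Stricter heuristic: the intended product has more rings than all
    reactants together."""
    return ring_count(*product) > sum(ring_count(*r) for r in reactants)

friedel_crafts = ([BENZENE, ACETYL_CHLORIDE], ACETOPHENONE)  # HCl by-product ignored
diels_alder = ([BUTADIENE, ETHYLENE], CYCLOHEXENE)
print(naive_is_cyclization(*friedel_crafts), forms_new_ring(*friedel_crafts))  # True False
print(naive_is_cyclization(*diels_alder), forms_new_ring(*diels_alder))        # True True
```

For Friedel–Crafts acylation of benzene the naive rule wrongly reports a cyclization while the stricter rule does not; both accept the Diels–Alder reaction. The stricter rule is still only a heuristic (a reaction that opens one ring and closes another would fool it), which is consistent with much of RXNO still being classified by hand.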
  • 25. RXNO in the wild 510 classes in the RXNO namespace … and RXNO is built into NextMove Software’s reaction identification tool.
  • 26. RXNO: next steps • More reactions! • More cross-references! • More example reactions! • Links to graphical versions! (All drawn, just awaiting uploading.) • More SMIRKS strings!
  • 27. Using ontologies in text mining • To provide a controlled vocabulary of terms found in text and a common identifier. • Ideally this identifier is a resolvable HTTP URI, both for chemical compounds (for example, http://purl.obolibrary.org/obo/CHEBI_36063) and for methods terminology
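The CHEBI_36063 example follows the OBO Foundry PURL pattern, in which the compact identifier PREFIX:ID becomes http://purl.obolibrary.org/obo/PREFIX_ID. A minimal sketch of converting between the two forms, assuming every prefix involved follows that same pattern (whether a given PURL actually resolves is not checked here):

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

def curie_to_purl(curie):
    """Turn a compact ID such as 'CHEBI:36063' into an OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not (sep and prefix and local_id):
        raise ValueError(f"not a PREFIX:ID identifier: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"

def purl_to_curie(uri):
    """Inverse of curie_to_purl for URIs under the OBO PURL base."""
    if not uri.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {uri!r}")
    prefix, sep, local_id = uri[len(OBO_PURL_BASE):].partition("_")
    if not (sep and prefix and local_id):
        raise ValueError(f"unexpected OBO PURL shape: {uri!r}")
    return f"{prefix}:{local_id}"

print(curie_to_purl("CHEBI:36063"))  # http://purl.obolibrary.org/obo/CHEBI_36063
print(purl_to_curie("http://purl.obolibrary.org/obo/CHMO_0001707"))  # CHMO:0001707
```

Keeping the compact form in annotations and expanding it only when publishing linked data is one common design choice; the two functions make that round trip explicit.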
  • 29. Ontologies as synonym sets for text-mining • We have text-mined the whole 21st-century RSC archive with a myriad of ontologies. Results are on the publishing platform • We have looked for correlations between molecules and ontology terms. • Two examples follow…
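Using an ontology as a synonym set amounts to dictionary-based entity recognition: every label or synonym maps to a class identifier, and text is scanned for those strings. The sketch below is a simplified illustration, not the RSC pipeline; its six-entry dictionary is hypothetical, although the identifiers are taken from the co-occurrence slides. Longer synonyms are tried first, so at a given position a multi-word term wins over a shorter term it starts with.

```python
import re

# Hypothetical synonym dictionary: lower-cased surface form -> ontology ID.
SYNONYMS = {
    "ascorbic acid": "CHEBI:22652",
    "vitamin c": "CHEBI:21241",
    "reducing agent": "CHEBI:63247",
    "phosphate buffer": "CHMO:0001734",
    "sonication": "CHMO:0001707",
    "hydrogenation": "MOP:0000589",
}

# Longest synonyms first so the alternation prefers longer matches;
# \b restricts matches to whole words.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def annotate(text):
    """Return (matched text, ontology ID) pairs in order of appearance."""
    return [(m.group(0), SYNONYMS[m.group(0).lower()]) for m in _PATTERN.finditer(text)]

print(annotate("Ascorbic acid acted as the reducing agent in phosphate buffer."))
# [('Ascorbic acid', 'CHEBI:22652'), ('reducing agent', 'CHEBI:63247'), ('phosphate buffer', 'CHMO:0001734')]
```

A production system would also need tokenisation rules, handling of chemical names containing brackets and locants, and disambiguation, none of which this sketch attempts.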
  • 30. Co-occurrences with ? alcohols (CHEBI:30879) solvents (CHEBI:46787) coproporphyrins (CHEBI:23388) 3D DOSY-TOCSY (CHMO:0001950) lipase activity (GO:0016298) solvolysis (MOP:0000620) wood (ENVO:00002040) aliphatic alcohol (CHEBI:2571) Raman circular dichroism spectroscopy (CHMO:0001160) propoxy group (CHEBI:46881) steam reforming (CHMO:0001450) hydrogenation (MOP:0000589) aqueous-phase reforming (CHMO:0001444) sonication (CHMO:0001707)
  • 31. Co-occurrences with ? reducing agent (CHEBI:63247) ascorbic acid (CHEBI:22652) antioxidant (CHEBI:22586) reduction (MOP:0000569) electrode (CHMO:0002344) ascorbate (CHEBI:22651) modified residue (SO:0001089) phosphate buffer (CHMO:0001734) oxidation (MOP:0000568) nafion polymer (CHEBI:61428) vitamin C (CHEBI:21241) antioxidant activity (GO:0016209) atom-transfer radical polymerisation (MOP:0000684) detection of glucose (GO:0051594) glucose (CHEBI:17234) graphene (CHEBI:36973)
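The co-occurrence lists on the two preceding slides can be produced by counting, over annotated documents, how often each ontology term appears in the same document as a term of interest. A minimal sketch, assuming each document has already been reduced to the set of ontology IDs found in it; the three toy documents are invented for illustration:

```python
from collections import Counter

def cooccurrence_counts(docs, focus):
    """For each term, count the documents that mention it together with `focus`."""
    counts = Counter()
    for terms in docs:
        if focus in terms:
            counts.update(terms - {focus})
    return counts

# Toy annotated documents (invented), each a set of ontology IDs.
docs = [
    {"CHEBI:22652", "CHEBI:63247", "CHMO:0001734"},
    {"CHEBI:22652", "CHEBI:63247", "CHEBI:17234"},
    {"CHEBI:17234", "CHMO:0001707"},
]
print(cooccurrence_counts(docs, "CHEBI:22652").most_common(1))  # [('CHEBI:63247', 2)]
```

Raw counts favour terms that are common everywhere; ranking by a normalised score (for example, pointwise mutual information) is one common refinement, though the slides do not say which measure was used.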
  • 32. Projects and Ontologies • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharmas, Publishers… • To put medicines in the pipeline…
  • 33. The Open PHACTS community ecosystem
  • 35. Our RDF schema Two dozen calculated properties, >10^6 molecules •CHEMINF ontology for cheminformatics •QUDT for units and numeric values •ChemSpider IDs for molecules [Diagram: a calculation has_input a connection table, which is_about benzene; the calculation has_output a calculated log P with has_unit dimensionless, has_value 2.177 and has standard uncertainty 0.234]
  • 36. RSC data in Open PHACTS 1. Molecule synonyms and identifiers 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers 3. Molecule–molecule relations (“parent–child”) of interest for drug discovery 4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
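Linksets are mappings between identifier schemes, and composing two of them gives a new mapping, for example from ChEBI to DrugBank via OPS identifiers. A minimal sketch; all identifiers below are made-up placeholders, not real ChEBI, OPS or DrugBank IDs:

```python
def compose(first, second):
    """Chain linksets A->B and B->C (dicts of ID -> set of IDs) into A->C."""
    result = {}
    for a, bs in first.items():
        cs = set().union(*(second.get(b, set()) for b in bs))
        if cs:  # drop IDs with no onward link
            result[a] = cs
    return result

# Placeholder identifiers, invented for illustration.
chebi_to_ops = {"chebi:EX1": {"ops:EX10"}, "chebi:EX2": {"ops:EX20"}}
ops_to_drugbank = {"ops:EX10": {"drugbank:EX100", "drugbank:EX101"}}

print(compose(chebi_to_ops, ops_to_drugbank))
```

Composition can both fan out (one source ID reaching several targets) and lose IDs that have no onward link, so composed linksets need checking before use.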
  • 37. Synonyms and identifiers Newly added to the CHEMINF ontology: •Validated ChemSpider synonyms •Unvalidated ChemSpider synonyms •Validated database identifiers •Unvalidated database identifiers •InChI, InChIKey, SMILES •Preferred ChemSpider name
  • 38. Physicochemical properties: log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP
  • 39. It is actually more complicated… [Diagram: benzene’s connection table (rdf:type CHEMINF connection table) is linked by IAO is about to OPS benzene; a calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01) is linked by OBI has specified input to benzene’s connection table and by OBI has specified output to a calculation result; the result (rdf:type CHEMINF calculated log P) has QUDT has value “2.17”^^xsd:float, QUDT has unit QUDT dimensionless quantity, and QUDT has standard uncertainty “0.234”^^xsd:float]
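The graph pattern on this slide can be walked from the molecule to its calculated value. The sketch below stores the slide's statements as plain Python triples and follows them by hand; the prefixed names are readable placeholders mirroring the slide's labels, not the real CHEMINF, OBI, IAO or QUDT IRIs, and a real deployment would keep the triples in an RDF store and query them with SPARQL.

```python
# Triples mirroring the slide; prefixed names are placeholders, not real IRIs.
TRIPLES = [
    ("ex:benzene_ct", "rdf:type", "cheminf:connection_table"),
    ("ex:benzene_ct", "iao:is_about", "ops:benzene"),
    ("ex:calc1", "rdf:type", "cheminf:acdlabs_physchem_12_01_execution"),
    ("ex:calc1", "obi:has_specified_input", "ex:benzene_ct"),
    ("ex:calc1", "obi:has_specified_output", "ex:result1"),
    ("ex:result1", "rdf:type", "cheminf:calculated_logP"),
    ("ex:result1", "qudt:has_value", 2.17),
    ("ex:result1", "qudt:has_unit", "qudt:dimensionless"),
    ("ex:result1", "qudt:has_standard_uncertainty", 0.234),
]

def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in TRIPLES."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]

def calculated_values(molecule, property_type):
    """Walk molecule <- is_about <- input of a calculation -> its output,
    keeping outputs of the requested type; return (value, uncertainty, unit)."""
    results = []
    for table in subjects("iao:is_about", molecule):
        for calc in subjects("obi:has_specified_input", table):
            for out in objects(calc, "obi:has_specified_output"):
                if property_type not in objects(out, "rdf:type"):
                    continue
                for value in objects(out, "qudt:has_value"):
                    for unc in objects(out, "qudt:has_standard_uncertainty"):
                        for unit in objects(out, "qudt:has_unit"):
                            results.append((value, unc, unit))
    return results

print(calculated_values("ops:benzene", "cheminf:calculated_logP"))
# [(2.17, 0.234, 'qudt:dimensionless')]
```

The extra hops through the connection table and the calculation process are what make the model "more complicated", but they also record which software version produced each value.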
  • 40. What’s built on top of this?
  • 41. Chemistry Data to manage… • Compounds • Reactions • Spectra • Crystals (in development) • Materials • Assays • Algorithms • …
  • 42. Future Work • Extending use of ontologies across all of our work on databases and as an underpinning to the Chemical Data Repository • Adding ontologies to other grant-based projects such as PharmaSea • Continued collaborations with University of Southampton on LabTrove for Chemistry • RSC collaboration with Dr Stuart Chalk (UNF) on data standards and ontologies • Working with CHAS on hazard/safety data
  • 43. Thank you •Email: williamsa@rsc.org •ORCID: 0000-0002-2668-4821 •Twitter: @ChemConnector •Personal Blog: www.chemconnector.com •SLIDES: www.slideshare.net/AntonyWilliams