The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms
Valery Tkachenko, Colin Batchelor,
Jon Steele and Antony Williams*
ACS Indianapolis
September 12th
2013
RSC Projects in Action
• Many RSC projects are underway, underpinned by ChemSpider and heavily dependent on standards
• ChemSpider
• ChemSpider Reactions
• Open PHACTS
• PharmaSea
• Chemical Database Service
• Open Source Drug Discovery
Open PHACTS
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, pharma companies, publishers…
• To put medicines in the pipeline…
Open Source Drug Discovery
Compound Data
• The main standards for handling chemical structures are molfile, SDfile, SMILES and InChI
• We depend primarily on molfiles and SD files for data deposition and interchange
• We use InChI extensively, especially for integrated searching across the web
• There ARE data interchange problems associated with structures…
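Because molfiles and SD files carry most deposition traffic, a minimal sketch of how an SD file breaks into records may help. It assumes the common V2000-style layout: each record is a molfile ending in `M  END`, followed by data items written as a `> <NAME>` header line, one or more value lines and a blank line, with `$$$$` closing the record. It uses only the Python standard library, the function names are hypothetical, and it ignores many edge cases a real cheminformatics toolkit handles.

```python
import re
from typing import Dict, List, Tuple

# Data item header, e.g. "> <INCHI>" or ">  <NAME> (1)"; captures the field name.
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")


def _parse_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split one SD record into its molfile block and its data fields."""
    end = None
    for i, line in enumerate(lines):
        if line.startswith("M  END"):
            end = i
            break
    if end is None:
        raise ValueError("record has no 'M  END' line")
    molfile = "\n".join(lines[: end + 1])  # keeps a blank title line intact
    fields: Dict[str, str] = {}
    name = None
    values: List[str] = []
    for line in lines[end + 1 :]:
        header = DATA_HEADER.match(line)
        if header:
            if name is not None:
                fields[name] = "\n".join(values)  # previous item lacked a blank line
            name, values = header.group(1), []
        elif name is not None and line.strip() == "":
            fields[name] = "\n".join(values)  # a blank line closes the data item
            name = None
        elif name is not None:
            values.append(line)
    if name is not None:
        fields[name] = "\n".join(values)  # last item without a trailing blank line
    return molfile, fields


def parse_sdf(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (molfile, data_fields) for every record in an SD file."""
    records = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(_parse_record(current))  # file without a final "$$$$"
    return records
```

Splitting line by line on `$$$$`, rather than stripping whole chunks, preserves a record's first line even when the molfile title is blank.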
ChemSpider
Exact Search
Skeleton Search
CVSP: chemical validation
Free chemistry validation platform that performs:
• Structure validation
• Atoms
• Bonds
• Valence
• Stereo
• If aromatic, check that the structure can be uniquely dearomatized
• In a partially ionized system, check that the strongest acid is ionized first
• Cross-matching of SDF fields
• Synonyms
• InChIs
• SMILES
Input formats supported:
CDX, Mol, SDF, Zip, Gz and tab-delimited text files
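The valence check listed above can be illustrated with a deliberately simplified sketch. It reads the atom and bond blocks of a V2000 molfile by their fixed column positions and flags atoms whose explicit bond-order sum exceeds a maximum valence. The `MAX_VALENCE` table and both function names are assumptions for illustration, not CVSP's actual rules; charges, radicals and aromatic or query bond types are not modelled.

```python
from typing import List, Tuple

# Hypothetical upper bounds on valence (ignores charges); real validators use richer rules.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1,
               "Br": 1, "I": 1, "P": 5, "S": 6}


def read_v2000(molfile: str) -> Tuple[List[str], List[Tuple[int, int, int]]]:
    """Return atom symbols and (atom1, atom2, bond_type) tuples from a V2000 molfile."""
    lines = molfile.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4 : 4 + n_atoms]
    bond_lines = lines[4 + n_atoms : 4 + n_atoms + n_bonds]
    symbols = [line[31:34].strip() for line in atom_lines]  # symbol sits in columns 32-34
    bonds = [(int(l[0:3]), int(l[3:6]), int(l[6:9])) for l in bond_lines]
    return symbols, bonds


def overvalent_atoms(molfile: str) -> List[Tuple[int, str, int]]:
    """List (1-based atom index, symbol, bond-order sum) for atoms above MAX_VALENCE."""
    symbols, bonds = read_v2000(molfile)
    totals = [0] * len(symbols)
    for a1, a2, bond_type in bonds:
        # Types 1-3 are single/double/triple; aromatic (4) and query types count as 1 here.
        order = bond_type if bond_type in (1, 2, 3) else 1
        totals[a1 - 1] += order
        totals[a2 - 1] += order
    problems = []
    for i, (symbol, total) in enumerate(zip(symbols, totals), start=1):
        limit = MAX_VALENCE.get(symbol)
        if limit is not None and total > limit:
            problems.append((i, symbol, total))
    return problems
```

Reading by column rather than splitting on whitespace matters because three-digit atom numbers in the bond block can run together.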
CVSP: standardization modules
• Custom processing lets the user assemble a workflow from a list of predefined standardization modules
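One way to model a user-assembled workflow is a registry of named modules, each a function from one structure representation to another, applied in the order the user picks. In this sketch the two modules are hypothetical, purely string-level stand-ins operating on SMILES (a real platform would use a cheminformatics toolkit): `keep_largest_fragment` keeps the longest dot-separated component, which only approximates salt stripping, and `strip_stereo` removes basic `@`, `/` and `\` stereo markers.

```python
from typing import Callable, Dict, List

Module = Callable[[str], str]


def keep_largest_fragment(smiles: str) -> str:
    # Crude salt stripping: keep the longest '.'-separated component (string length, not atom count).
    return max(smiles.split("."), key=len)


def strip_stereo(smiles: str) -> str:
    # Remove basic tetrahedral (@, @@) and double-bond (/ and \) stereo markers.
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


# Registry of predefined modules a user can choose from, keyed by name.
MODULES: Dict[str, Module] = {
    "keep_largest_fragment": keep_largest_fragment,
    "strip_stereo": strip_stereo,
}


def run_workflow(step_names: List[str], smiles: str) -> str:
    """Apply the named modules in the order the user chose."""
    for name in step_names:
        if name not in MODULES:
            raise KeyError(f"unknown standardization module: {name}")
        smiles = MODULES[name](smiles)
    return smiles
```

Referring to steps by name lets a saved workflow or web form be stored as a plain list of strings; in general the order matters because modules need not commute.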
Reaction Data
• ChemSpider is built for compounds, but how are they made?
• ChemSpider Reactions is our attempt to answer that question
• Integrating both commercial and open data
• RSC databases, data extracted from our publications as part of the DERA project, and open data sources of reactions
• Molfiles, CDX files, RXN files
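RXN files, one of the reaction formats listed above, bundle reactant and product molfiles in a single file. A minimal sketch, assuming the V2000 RXN layout (a `$RXN` line, three header lines, a counts line giving the numbers of reactants and products, then one `$MOL`-prefixed molfile per component), splits a file into its reactant and product molfiles. The function name is hypothetical; V3000 RXN files and agent counts are not handled.

```python
from typing import List, Tuple


def split_rxn(text: str) -> Tuple[List[str], List[str]]:
    """Return (reactant_molfiles, product_molfiles) from a V2000 RXN file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "$RXN":
        raise ValueError("not a V2000 RXN file")
    counts = lines[4]  # "$RXN", then name, program and comment lines, then counts
    n_reactants, n_products = int(counts[0:3]), int(counts[3:6])
    blocks: List[List[str]] = []
    for line in lines[5:]:
        if line.startswith("$MOL"):
            blocks.append([])  # each "$MOL" line starts a new component molfile
        elif blocks:
            blocks[-1].append(line)
    molfiles = ["\n".join(block) for block in blocks]
    if len(molfiles) != n_reactants + n_products:
        raise ValueError("number of $MOL blocks does not match the counts line")
    return molfiles[:n_reactants], molfiles[n_reactants:]
```

Each returned molfile could then go through the same structure validation as a single compound deposition.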
RSC and Chemical Reactions
RSC Journal Content
• Tens to hundreds of thousands of reactions are contained in our journals
• Electronic Supplementary Information (ESI) data contain many more
ChemSpider Reactions
ChemSpider SyntheticPages
Spectral Data
• ChemSpider requires spectral data to be deposited in standard formats: JCAMP or images
• All spectra are available at: http://www.chemspider.com/spectra.aspx
• Data are deposited on a regular basis by:
• Students
• Chemical vendors
• The collection is growing
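A JCAMP-DX file is a sequence of labelled data records of the form `##LABEL=value`, with `$$` starting a comment. A minimal sketch, assuming an uncompressed `##XYPOINTS=(XY..XY)` table of comma-separated X,Y pairs, reads the header labels and the data points. It deliberately does not decode the compressed `##XYDATA=(X++(Y..Y))` forms or multi-line header values, which a real reader must handle; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
PAIR = re.compile(rf"({NUMBER})\s*,\s*({NUMBER})")


def normalize_label(label: str) -> str:
    # Compare labels ignoring case, spaces, dashes, slashes and underscores.
    return re.sub(r"[\s\-/_]", "", label).upper()


def read_jcamp_xypoints(text: str) -> Tuple[Dict[str, str], List[Tuple[float, float]]]:
    """Return (header labels, XY points) from a simple XYPOINTS JCAMP-DX block."""
    header: Dict[str, str] = {}
    points: List[Tuple[float, float]] = []
    in_points = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # drop $$ comments
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            key = normalize_label(label)
            if key == "END":
                break
            header[key] = value.strip()
            in_points = key == "XYPOINTS"
            continue
        if in_points:
            # A line may hold several "x,y" pairs separated by ';' or spaces.
            points.extend((float(x), float(y)) for x, y in PAIR.findall(line))
    return header, points
```

Checking that the number of points read matches `##NPOINTS` is a cheap consistency test at deposition time.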
Student Submissions
JCAMP NMR Spectra
Data on ChemSpider
Data Interchange
JCAMP file downloads
• When NMR spectra are stored as JCAMP, they can be downloaded into offline packages such as MestreLabs and ACD/Labs software
• Open Data: download versus view
• Store spectra locally and reuse them
• Java is increasingly a pain!
• We need to move to HTML5 viewing on ChemSpider, especially for mobile viewing
Spectral Display in the hand
Challenges with Spectra
• JCAMP works well for many kinds of spectral data: IR, Raman, 1D NMR
• MS data are rarely made available in JCAMP
• We would love a ratified JCAMP 6.0 for 2D data exchange, which would allow third parties to build support for download
• ASSIGNED JCAMP spectra can be supported, but there are no real standards for them
…and images
DERA to digitize documents?
• We want to get data out of our historical archive
• What could we do?
• Find chemical names and generate structures
• Find chemical images and generate structures
• Find reactions – and make a database!
• Find data (MP, BP, LogP) and deposit them
• Find figures and database them
• Find spectra (and link to structures)
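For the "find data" step, a sketch of rule-based extraction: a regular expression that pulls melting-point statements such as `mp 145–147 °C` out of experimental text. The pattern is an assumption about a few common phrasings (mp, m.p., melting point) and is not the DERA project's actual extractor; real text mining must cope with many more variants, units and decomposition notes.

```python
import re
from typing import List, Optional, Tuple

MP_PATTERN = re.compile(
    r"\b(?:m\.?\s?p\.?|melting point)\s*[:=]?\s*"  # word boundary avoids matching inside "temp"
    r"(-?\d+(?:\.\d+)?)"                            # lower (or only) value
    r"(?:\s*[-–]\s*(-?\d+(?:\.\d+)?))?"             # optional upper value of a range
    r"\s*°\s*C",
    re.IGNORECASE,
)


def find_melting_points(text: str) -> List[Tuple[float, Optional[float]]]:
    """Return (low, high) pairs in °C; high is None for a single value."""
    return [(float(lo), float(hi) if hi else None) for lo, hi in MP_PATTERN.findall(text)]
```

Values found this way would still need linking to the structure they describe before deposition.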
Text-Mining
ESI – Text Spectra
Do we want to search text spectra?
What do we get when we search:
13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3),
30.11 (CH, benzylic methane), 30.77 (CH,
benzylic methane), 66.12 (CH2), 68.49 (CH2),
117.72, 118.19, 120.29, 122.67, 123.37, 125.69,
125.84, 129.03, 130.00, 130.53 (ArCH), 99.42,
123.60, 134.69, 139.23, 147.21, 147.61,
149.41, 152.62, 154.88 (ArC)
MestreLabs Mnova NMR Beta
1H NMR (CDCl3, 400 MHz):
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H),
4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd,
1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94
(m, 11H, ArH)
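To make text spectra like these searchable, the shift values first have to be pulled out of the prose. The sketch below is one assumed approach, not what ChemSpider or Mnova does: it reads the nucleus, solvent and frequency from the `13C NMR (CDCl3, 100 MHz)` style header, collects every decimal number after `δ` as a chemical shift, and scores a query list of shifts against it with a ppm tolerance. It ignores multiplicities, integrals and coupling constants, so it only suits 13C-style peak lists; on the 1H example it would wrongly pick up J values.

```python
import re
from typing import Dict, List

HEADER = re.compile(r"(\d+)([A-Z][a-z]?)\s+NMR\s*\(([^,]+),\s*(\d+)\s*MHz\)")
SHIFT = re.compile(r"\d+\.\d+")  # decimals only, so "CH3" or "100 MHz" are skipped


def parse_text_spectrum(text: str) -> Dict[str, object]:
    """Extract nucleus, solvent, frequency and sorted shifts from an ESI-style text spectrum."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no 'nnX NMR (solvent, nnn MHz)' header found")
    mass, element, solvent, mhz = header.groups()
    body = text.split("δ", 1)[-1]  # shifts follow the delta sign
    shifts = sorted(float(s) for s in SHIFT.findall(body))
    return {"nucleus": mass + element, "solvent": solvent.strip(),
            "frequency_mhz": int(mhz), "shifts": shifts}


def count_matches(query: List[float], shifts: List[float], tol: float = 0.5) -> int:
    """Number of query shifts with at least one observed shift within tol ppm."""
    return sum(any(abs(q - s) <= tol for s in shifts) for q in query)
```

The tolerance of 0.5 ppm is an arbitrary illustrative default; a shift search would normally tune it per nucleus.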
ESI – Text and Image Spectra
Extracted JCAMP Spectrum
Prepare CONSISTENT JCAMP
Data onto ChemSpider
It’s exactly the WRONG WAY!
• We should NOT be mining data out of future
publications
• Structures should be submitted “correctly”
• Spectra should be submitted in digital spectral formats, not as images
• ESI should be RICH and interactive
APIs and Standards
• Our APIs follow the conventions people expect for access: RESTful services, JSON handling, etc.
• Queries can be passed in as molfiles, SMILES, InChIs/InChIKeys, etc.
• Future versions will include JCAMP searching
• The APIs are used by MANY organizations and are of value to our Open PHACTS, PharmaSea and Chemical Database Service projects, as well as for mobile access
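Because queries can arrive as molfiles, SMILES, InChIs or InChIKeys, a service has to work out which one it was given before dispatching the search. The sketch below shows one simple way to classify a query string and wrap it in a JSON request body. The classification rules are heuristics (anything without whitespace falls through to SMILES), and the `query_type`/`query` field names are hypothetical, not the actual ChemSpider API schema.

```python
import json
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def classify_query(query: str) -> str:
    """Guess whether a query is a molfile, InChI, InChIKey or SMILES (heuristic)."""
    if "M  END" in query:  # molfile connection tables terminate with an "M  END" line
        return "molfile"
    text = query.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY.match(text):
        return "inchikey"
    if text and not any(ch.isspace() for ch in text):
        return "smiles"  # fallback: any single whitespace-free token
    raise ValueError("unrecognised query format")


def build_request(query: str) -> str:
    """Serialize a hypothetical JSON search request body."""
    kind = classify_query(query)
    # Molfiles keep their exact layout, since the title line may be blank.
    payload = query if kind == "molfile" else query.strip()
    return json.dumps({"query_type": kind, "query": payload})
```

A production service would validate a SMILES guess by actually parsing it rather than trusting the fallback.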
Conclusions
• Data interchange standards run through all of our projects!
• We are grateful to the companies, organizations and contributors who have helped define:
• Structure: Mol, SDF, InChI, etc.
• Spectra: JCAMP, SPC, NetCDF, etc.
• W3C standards
For the next ACS meeting, hopefully…
• Build out our ChemSpider Reactions collection
• Grab spectral data out of our ESI!
• Get more submissions in STANDARD formats
• Integrate with spectroscopy handling systems for deposition in JCAMP
• Push molfiles directly into ChemSpider with an improved deposition platform
• Build out the chemical data repository…
Thank You
Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 

The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms

  • 1. The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms Valery Tkachenko, Colin Batchelor, Jon Steele and Antony Williams* ACS Indianapolis September 12th 2013
  • 2. RSC Projects in Action • Many RSC projects underway, underpinned by ChemSpider, and very dependent on standards • ChemSpider • ChemSpider Reactions • Open PHACTS • PharmaSea • Chemical Database Service • Open Source Drug Discovery
  • 3. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharmas, Publishers… • To put medicines in the pipeline…
  • 6. Open Source Drug Discovery
  • 7. Compound Data • The standards of chemical structure handling are primarily molfile, SDfile, SMILES, InChI • We primarily depend on molfiles and SDF files for data deposition and interchange • We use InChI a lot – especially for integrated searching across the web • There ARE data interchange problems associated with structures….
  • 13. CVSP: chemical validation • A free chemistry validation platform that performs: • Structure validation • Atoms • Bonds • Valence • Stereo • If aromatic, check that the structure can be uniquely dearomatized • Flag cases where the strongest acid is not ionized first in a partially ionized system • Cross-matching of SDF fields • Synonyms • InChIs • SMILES
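The valence check in the list above can be illustrated with a minimal Python sketch. It is not CVSP's implementation: it assumes a V2000 molfile, ignores formal charges (M  CHG lines), radicals and hypervalent states, counts aromatic bonds (type 4) as 1.5, and uses a small hypothetical valence table. It sums bond orders per atom from the bond block and reports atoms that exceed their allowed valence.

```python
# Hypothetical table of maximum neutral valences for a few common elements;
# a real validator would also handle charges, radicals and hypervalent atoms.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}

# V2000 bond types 1-3 are single/double/triple; type 4 (aromatic) counted as 1.5.
BOND_ORDER = {1: 1.0, 2: 2.0, 3: 3.0, 4: 1.5}


def valence_problems(molblock: str) -> list[tuple[int, str, float]]:
    """Return (atom_number, symbol, bond_order_sum) for atoms over their valence.

    Assumes a V2000 molfile and ignores formal charges; atom numbers are
    1-based, matching the numbering used in the bond block.
    """
    lines = molblock.splitlines()
    counts = lines[3]  # the fourth line is the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # The atom symbol occupies columns 32-34 of each atom line.
    symbols = [line[31:34].strip() for line in atom_lines]
    totals = [0.0] * n_atoms
    for line in bond_lines:
        first, second, kind = int(line[0:3]), int(line[3:6]), int(line[6:9])
        order = BOND_ORDER.get(kind, 1.0)  # unknown bond types treated as single
        totals[first - 1] += order
        totals[second - 1] += order

    problems = []
    for number, (symbol, total) in enumerate(zip(symbols, totals), start=1):
        limit = MAX_VALENCE.get(symbol)
        if limit is not None and total > limit:
            problems.append((number, symbol, total))
    return problems
```

Fixed-column slicing is used rather than whitespace splitting because V2000 fields can run together once atom numbers reach three digits.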
  • 14. Input formats supported: CDX, Mol, SDF, Zip, Gz, tab-delimited text files
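As a rough sketch of how compressed and archived uploads might be unpacked before validation, the Python below reads SD records from a plain .sdf/.mol file, a gzipped file or a .zip archive, splitting records on the `$$$$` delimiter line. The function names are hypothetical, this is not CVSP's code, and CDX and tab-delimited input are not handled.

```python
import gzip
import zipfile
from pathlib import Path


def split_sdf(text: str) -> list[str]:
    """Split SD-file text into records on the "$$$$" delimiter lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(ln.strip() for ln in current):  # tolerate a missing final "$$$$"
        records.append("\n".join(current))
    return records


def read_records(path: str) -> list[str]:
    """Read SD records from a .sdf/.mol file, a gzipped file or a .zip archive."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".gz":
        with gzip.open(p, "rt", encoding="utf-8") as fh:
            return split_sdf(fh.read())
    if suffix == ".zip":
        records = []
        with zipfile.ZipFile(p) as zf:
            for name in zf.namelist():
                # Only SD/Mol members are read; other files are skipped.
                if name.lower().endswith((".sdf", ".mol")):
                    records.extend(split_sdf(zf.read(name).decode("utf-8")))
        return records
    return split_sdf(p.read_text(encoding="utf-8"))
```

A single molfile has no `$$$$` line, so it comes back as a one-record list.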
  • 15. CVSP: standardization modules • Custom processing lets users assemble a workflow from a predefined list of standardization modules
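A module-based workflow like the one described can be sketched as an ordered list of functions picked from a registry by name. Everything here is hypothetical rather than CVSP's module list: records are modelled as plain dicts of SDF fields, and `keep_largest_fragment` is a crude string heuristic (longest dot-separated SMILES component), not real salt stripping.

```python
from typing import Callable

# A record is modelled as a dict of SDF data fields (an assumption for this sketch).
Record = dict[str, str]
Module = Callable[[Record], Record]


def trim_fields(rec: Record) -> Record:
    """Strip surrounding whitespace from every field value."""
    return {k: v.strip() for k, v in rec.items()}


def drop_empty_fields(rec: Record) -> Record:
    """Remove fields whose value is empty."""
    return {k: v for k, v in rec.items() if v}


def keep_largest_fragment(rec: Record) -> Record:
    """Crude salt stripping: keep the longest dot-separated SMILES component."""
    smiles = rec.get("SMILES")
    if not smiles:
        return rec
    return {**rec, "SMILES": max(smiles.split("."), key=len)}


# Hypothetical registry of predefined modules a user can choose from.
MODULES: dict[str, Module] = {
    "trim_fields": trim_fields,
    "drop_empty_fields": drop_empty_fields,
    "keep_largest_fragment": keep_largest_fragment,
}


def build_workflow(names: list[str]) -> Module:
    """Compose the chosen modules, in the user's order, into one function."""
    steps = [MODULES[name] for name in names]  # KeyError for an unknown module

    def run(rec: Record) -> Record:
        for step in steps:
            rec = step(rec)
        return rec

    return run
```

Because each module maps a record to a new record, the user-chosen order is simply the order of application.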
  • 17. Reaction Data • ChemSpider is built for compounds – but how are they made? • ChemSpider Reactions is our attempt to answer that question • Integrating both commercial and open data • RSC databases, data extracted from our publications in the DERA project, and Open Data sources of reactions • Molfiles, CDX files, RXN files
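Since RXN files are one of the listed input formats, here is a minimal sketch of splitting a V2000 RXN file into reactant and product molblocks. It assumes the common layout (a `$RXN` line, three header lines, a counts line with reactant and product counts in fixed three-character fields, then a `$MOL` line before each molfile, reactants first) and ignores any further fields on the counts line. The function name is hypothetical.

```python
def split_rxn(text: str) -> tuple[list[str], list[str]]:
    """Split a V2000 RXN file into (reactant molblocks, product molblocks).

    Assumes "$RXN", three header lines, a counts line holding the reactant
    and product counts in columns 1-3 and 4-6, then a "$MOL" line before
    each molfile, reactants first. Extra counts fields are ignored.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not an RXN file")
    counts = lines[4]
    n_reactants, n_products = int(counts[0:3]), int(counts[3:6])

    blocks = []
    current = None  # lines of the molfile currently being collected
    for line in lines[5:]:
        if line.startswith("$MOL"):
            if current is not None:
                blocks.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append("\n".join(current))

    expected = n_reactants + n_products
    if len(blocks) != expected:
        raise ValueError(f"expected {expected} molecules, found {len(blocks)}")
    return blocks[:n_reactants], blocks[n_reactants:]
```

Checking the block count against the counts line catches truncated or mislabelled uploads early.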
  • 18. RSC and Chemical Reactions
  • 19. RSC and Chemical Reactions
  • 20. RSC Journal Content • Our journals contain many tens, even hundreds, of thousands of reactions • Electronic Supplementary Information (ESI) contains many more
  • 25. Spectral Data • ChemSpider requires spectral data to be deposited in standard formats – JCAMP or images • All spectra are available at: http://www.chemspider.com/spectra.aspx • Data are deposited on a regular basis by: • Students • Chemical vendors • The collection is growing
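To show why a standard format like JCAMP is easier to reuse than an image, here is a minimal Python sketch that reads JCAMP-DX labelled data records and uncompressed AFFN `(X++(Y..Y))` data into (x, y) points. It is a simplification: label normalization is reduced to upper-casing, and the compressed ASDF forms (SQZ, DIF, DUP) that many real files use are not handled.

```python
def parse_jcamp(text: str) -> tuple[dict[str, str], list[tuple[float, float]]]:
    """Parse labelled data records and AFFN (X++(Y..Y)) XYDATA into (x, y) points.

    Sketch only: compressed ASDF forms (SQZ, DIF, DUP) and multi-block files
    are not handled.
    """
    headers: dict[str, str] = {}
    points: list[tuple[float, float]] = []
    in_xy = False
    xfactor = yfactor = 1.0
    deltax = 0.0
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            headers[label] = value.strip()
            in_xy = label == "XYDATA"
            if in_xy:
                if headers[label] != "(X++(Y..Y))":
                    raise NotImplementedError("only (X++(Y..Y)) data is supported")
                xfactor = float(headers.get("XFACTOR", "1"))
                yfactor = float(headers.get("YFACTOR", "1"))
                first, last = float(headers["FIRSTX"]), float(headers["LASTX"])
                npoints = int(headers["NPOINTS"])
                deltax = (last - first) / (npoints - 1)  # spacing in real x units
            continue
        if in_xy:
            values = [float(v) for v in line.replace(",", " ").split()]
            x_start = values[0] * xfactor  # first value on a line is a scaled x
            for i, y in enumerate(values[1:]):
                points.append((x_start + i * deltax, y * yfactor))
    return headers, points
```

Each data line carries one x value followed by successive y values; intermediate x values are reconstructed from FIRSTX, LASTX and NPOINTS.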
  • 30. JCAMP file downloads • When NMR spectra are stored as JCAMP, they can be downloaded into offline packages – Mestrelab, ACD/Labs, etc. • Open Data – download versus view • Store spectra locally and reuse them • Java is increasingly a pain! • We need to move to HTML5 viewing on ChemSpider, especially for mobile viewing
  • 32. Challenges with Spectra • JCAMP is good for a lot of spectral data – IR, Raman, 1D NMR • MS data is rarely made available in JCAMP • We would love a ratified JCAMP 6.0 for 2D data exchange – it would allow third parties to build support for download • ASSIGNED JCAMP spectra can be supported, but there are no real standards here
  • 34. DERA to digitize documents? • We want to get data out of our historical archive • What could we do? • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions – and make a database! • Find data (MP, BP, LogP) and deposit • Find figures and database them • Find spectra (and link to structures)
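The "find data" step can be approached with pattern matching over text. This sketch only handles melting and boiling points written in forms such as "mp 123–125 °C" or "b.p. 56 °C"; the regular expression is an illustrative assumption, will miss many real-world variants, and does not cover LogP.

```python
import re

# Illustrative pattern: "mp"/"m.p."/"bp"/"b.p.", a value or range, then "°C".
PROPERTY_RE = re.compile(
    r"\b(?P<prop>m\.?p\.?|b\.?p\.?)\s*:?\s*"
    r"(?P<low>-?\d+(?:\.\d+)?)"
    r"(?:\s*[-–]\s*(?P<high>-?\d+(?:\.\d+)?))?"  # optional hyphen/en-dash range
    r"\s*°\s*C",
    re.IGNORECASE,
)


def extract_properties(text: str) -> list[tuple[str, float, float]]:
    """Return (property, low, high) tuples; single values have low == high."""
    results = []
    for m in PROPERTY_RE.finditer(text):
        prop = "mp" if m.group("prop").lower().startswith("m") else "bp"
        low = float(m.group("low"))
        high = float(m.group("high")) if m.group("high") else low
        results.append((prop, low, high))
    return results
```

Anything extracted this way would still need checking before deposition, since text patterns cannot tell which compound a value belongs to.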
  • 36. ESI – Text Spectra
  • 37. Do we want to search text spectra? What do we get when we search: 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
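One way to make such text searchable is to turn it back into numbers. This sketch assumes the ESI convention that a parenthesised label applies to every unlabelled shift since the previous label (so "117.72, …, 130.53 (ArCH)" are all ArCH), and uses an arbitrary 0.5 ppm tolerance for matching query shifts; both are assumptions, not a published method.

```python
import re
from typing import Optional

# A decimal shift optionally followed by a parenthesised label such as "(CH3)".
PEAK_RE = re.compile(r"(\d+\.\d+)\s*(?:\(([^)]*)\))?")


def parse_13c(text: str) -> list[tuple[float, Optional[str]]]:
    """Extract (shift, label) pairs from the text after the delta symbol."""
    _, _, tail = text.partition("δ")
    peaks, pending = [], []
    for m in PEAK_RE.finditer(tail):
        pending.append(float(m.group(1)))
        if m.group(2) is not None:  # a label closes the current run of shifts
            peaks.extend((shift, m.group(2)) for shift in pending)
            pending = []
    peaks.extend((shift, None) for shift in pending)
    return peaks


def matches_query(peaks, query: list[float], tol: float = 0.5) -> bool:
    """True if every query shift lies within tol ppm of some observed shift."""
    observed = [shift for shift, _ in peaks]
    return all(any(abs(q - s) <= tol for s in observed) for q in query)
```

Applied to the 13C text above, this yields 24 labelled shifts that can be compared numerically instead of by string search.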
  • 39. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
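1H text is harder because assignments such as "C(5a)H" nest parentheses inside each signal. A minimal sketch, assuming the "shift (multiplicity, nH, J = … Hz, assignment)" layout shown above, finds each shift or shift range, scans to the matching closing parenthesis, and splits the contents into fields. The `Signal` type and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# A shift or shift range (hyphen or en dash) followed by an opening parenthesis.
SHIFT_RE = re.compile(r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?\s*\(")
# Coupling constants such as "J = 4.8 Hz" or "Jb = 10.8 Hz".
J_RE = re.compile(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz")


@dataclass
class Signal:
    shift: tuple[float, float]  # (low, high); equal for a single shift
    multiplicity: str
    n_h: int
    couplings: list[float] = field(default_factory=list)
    assignment: str = ""


def _closing_paren(text: str, start: int) -> int:
    """Index of the ")" matching the "(" at text[start], allowing nesting."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def parse_1h(text: str) -> list[Signal]:
    _, _, tail = text.partition("δ")
    signals, pos = [], 0
    while (m := SHIFT_RE.search(tail, pos)) is not None:
        open_idx = m.end() - 1
        close_idx = _closing_paren(tail, open_idx)
        body = tail[open_idx + 1:close_idx]
        parts = [p.strip() for p in body.split(",")]
        low = float(m.group(1))
        high = float(m.group(2)) if m.group(2) else low
        signals.append(Signal(
            shift=(low, high),
            multiplicity=parts[0],
            n_h=int(parts[1].rstrip("H")),
            couplings=[float(j) for j in J_RE.findall(body)],
            assignment=", ".join(p for p in parts[2:] if not p.startswith("J")),
        ))
        pos = close_idx + 1  # resume after this signal's parentheses
    return signals
```

Summing `n_h` over the parsed signals gives a quick consistency check against the proton count of the proposed structure.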
  • 40. ESI – Text and Image Spectra
  • 46. It’s exactly the WRONG WAY! • We should NOT be mining data out of future publications • Structures should be submitted “correctly” • Spectra should be digital spectral formats, not images • ESI should be RICH and interactive
  • 47. APIs and Standards • Our APIs follow the conventions people expect: RESTful services, JSON handling, etc. • We allow people to pass in queries using molfiles, SMILES, InChI/InChIKeys, etc. • Future versions will include JCAMP searching • The APIs are used by MANY organizations and support our Open PHACTS, PharmaSea and Chemical Database Service projects, as well as mobile applications
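Accepting several structure formats means a service has to work out what kind of query it received. The sketch below guesses the type from well-known features (molfiles end with an "M  END" line, InChIs start with "InChI=", InChIKeys have a fixed 14-10-1 letter layout) and falls back to SMILES. The JSON field names are hypothetical and do not describe ChemSpider's actual API.

```python
import json
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def query_type(query: str) -> str:
    """Guess whether a structure query is a molfile, InChI, InChIKey or SMILES."""
    text = query.strip()
    if "M  END" in text:  # molfiles end with an "M  END" line
        return "molfile"
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.fullmatch(text):
        return "inchikey"
    return "smiles"  # fallback: treat anything else as SMILES


def build_request(query: str) -> str:
    """Serialize a search request as JSON; the field names are hypothetical."""
    kind = query_type(query)
    # Molfile whitespace is significant, so only non-molfile queries are stripped.
    payload = query if kind == "molfile" else query.strip()
    return json.dumps({"queryType": kind, "query": payload})
```

The SMILES fallback is a pragmatic choice; a production service would validate the string with a cheminformatics toolkit before searching.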
  • 48. Conclusions • Data interchange standards are all over our projects! • We are grateful to the companies, organizations and contributors who have helped define: • Structure – Mol, SDF, InChI, etc. • Spectra – JCAMP, SPC, NetCDF, etc. • W3C standards
  • 49. For the Next ACS hopefully… • Build out our ChemSpider Reaction collection • Grab spectral data out of our ESI! • Get more submissions in STANDARD formats • Integrate to spectroscopy handling systems for deposition in JCAMP • Push molfiles directly into ChemSpider with improved deposition platform • Build out the chemical data repository…
  • 50. Thank You Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams