The importance of standards for data exchange and interchange on the Royal Society of Chemistry e science platforms
The Royal Society of Chemistry provides access to a number of databases hosting chemical data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services, with queries expressed in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database integrates with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data and uses semantic web data standards including RDF. This presentation provides an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, in easing integration with a myriad of services, and in underpinning many of our future developments.

Published in: Technology
Transcript of "The importance of standards for data exchange and interchange on the Royal Society of Chemistry e science platforms"

  1. The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms
     Valery Tkachenko, Colin Batchelor, Jon Steele and Antony Williams*
     ACS Indianapolis, September 12th 2013
  2. RSC Projects in Action
     • Many RSC projects underway, underpinned by ChemSpider, and very dependent on standards
     • ChemSpider
     • ChemSpider Reactions
     • Open PHACTS
     • PharmaSea
     • Chemical Database Service
     • Open Source Drug Discovery
  3. • 3-year Innovative Medicines Initiative project
     • Integrating chemistry and biology data using semantic web technologies
     • Open source code, open data and open standards
     • Academics, Pharmas, Publishers…
     • To put medicines in the pipeline…
  4. Open Source Drug Discovery
  5. Compound Data
     • The standards of chemical structure handling are primarily molfile, SDfile, SMILES, InChI
     • We primarily depend on molfiles and SDF files for data deposition and interchange
     • We use InChI a lot – especially for integrated searching across the web
     • There ARE data interchange problems associated with structures…
  6. ChemSpider
  7. ChemSpider
  8. Exact Search
  9. Skeleton Search
  10. Compound Data
     • The standards of chemical structure handling are primarily molfile, SDfile, SMILES, InChI
     • We primarily depend on molfiles and SDF files for data deposition and interchange
     • We use InChI a lot – especially for integrated searching across the web
     • There ARE data interchange problems associated with structures…
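As an illustration of the SDfile-based interchange described on the Compound Data slides, here is a minimal sketch of splitting SDfile text into records and reading their data fields. It relies only on the general SDfile layout (a molfile block ending in `M  END`, `> <FIELD>` data items, and a `$$$$` record terminator). The function names and the two toy records are assumptions made for this example; they are not RSC code, and nothing here validates the chemistry.

```python
from typing import Dict, Iterator, List, Tuple

# Toy two-record SDfile (hypothetical data, for illustration only).
SAMPLE_SDF = "\n".join([
    "water", "  toy-example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "water", "",
    "> <InChI>", "InChI=1S/H2O/h1H2", "",
    "$$$$",
    "methane", "  toy-example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "methane", "",
    "$$$$", "",
])


def _split_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split one record into its molfile block and its data fields."""
    molfile: List[str] = []
    fields: Dict[str, str] = {}
    i = 0
    # The structure block runs up to and including the "M  END" line.
    while i < len(lines):
        molfile.append(lines[i])
        i += 1
        if molfile[-1].startswith("M  END"):
            break
    # Data items: a "> <FIELD>" header, value lines, then a blank line.
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.startswith(">"):
            continue
        start = line.find("<")
        end = line.find(">", start) if start != -1 else -1
        value: List[str] = []
        while i < len(lines) and lines[i].strip():
            value.append(lines[i])
            i += 1
        if start != -1 and end != -1:
            fields[line[start + 1:end]] = "\n".join(value)
    return "\n".join(molfile), fields


def iter_sdf_records(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (molfile_block, data_fields) for each $$$$-terminated record."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield _split_record(record)
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):  # tolerate a missing final $$$$
        yield _split_record(record)


if __name__ == "__main__":
    for molfile, fields in iter_sdf_records(SAMPLE_SDF):
        print(fields.get("NAME"), fields.get("InChI"))
```

Once the fields are available as a dictionary, checks such as comparing a deposited InChI against one regenerated from the structure become possible, although that step needs a cheminformatics toolkit and is outside this sketch.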
  11. CVSP: chemical validation
     Free chemistry validation platform that performs:
     • Structure validation
        • Atoms
        • Bonds
        • Valence
        • Stereo
        • If aromatic – check that uniquely dearomatized
        • Strongest acid not ionized first in partially-ionized system
     • Cross-matching of SDF fields
        • Synonyms
        • InChIs
        • SMILES
  12. Input formats supported: CDX, Mol, SDF, Zip, Gz, tab-delimited text files
  13. CVSP: standardization modules
     • Custom processing lets users put together a workflow from a list of pre-defined standardization modules
  14. Reaction Data
     • ChemSpider is built for compounds – but how are they made???
     • ChemSpider Reactions is our attempt to answer that question
     • Integrating both commercial and open data
     • RSC Databases, data extracted from our publications on the DERA project and Open Data sources of reactions
     • Molfiles, CDX files, RXN files
  15. RSC and Chemical Reactions
  16. RSC and Chemical Reactions
  17. RSC Journal Content
     • Many tens or hundreds of thousands of reactions are contained in our journals
     • Electronic Supplementary Information (ESI) contains many more
  18. ChemSpider Reactions
  19. ChemSpider Reactions
  20. ChemSpider SyntheticPages
  21. Spectral Data
     • ChemSpider requires spectral data to be deposited in standard formats – JCAMP or images
     • All spectra available at: http://www.chemspider.com/spectra.aspx
     • Data are deposited on a regular basis
     • Students
     • Chemical vendors
     • Growing collection now
  22. Student Submissions
  23. JCAMP NMR Spectra
  24. Data on ChemSpider
  25. Data Interchange
  26. JCAMP file downloads
     • When NMR spectra are stored as JCAMP then downloads into offline packages are feasible – MestreLabs, ACD/Labs etc.
     • Open Data – download versus view
     • Store spectra locally and reuse
     • Java is increasingly a pain!
     • Need to move to HTML5 viewing on ChemSpider, especially for Mobile Viewing
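To make the "store spectra locally and reuse" point concrete, the sketch below reads the labeled data records (the `##LABEL= value` lines) of a JCAMP-DX file into a dictionary. It assumes the JCAMP-DX conventions that `$$` starts a comment and that labels are compared after uppercasing and dropping spaces, hyphens, slashes and underscores. The function names and the toy file content are hypothetical, the sample is not a complete valid spectrum, and the data table is kept as raw text rather than decoded.

```python
import re
from typing import Dict, Optional

# Toy JCAMP-DX fragment (hypothetical values; not a complete valid file).
SAMPLE_JCAMP = "\n".join([
    "##TITLE= example spectrum",
    "##JCAMP-DX= 4.24",
    "##DATA TYPE= INFRARED SPECTRUM",
    "##XUNITS= 1/CM",
    "##YUNITS= TRANSMITTANCE",
    "##NPOINTS= 3 $$ toy value",
    "##XYDATA= (X++(Y..Y))",
    "4000 0.91 0.92 0.93",
    "##END=",
])


def normalize_label(label: str) -> str:
    """Uppercase a label and drop spaces, hyphens, slashes and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_jcamp_ldrs(text: str) -> Dict[str, str]:
    """Collect labeled data records; continuation lines are appended as text.

    Repeated labels (for example in compound files) overwrite earlier ones,
    which is a simplification of this sketch.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # strip inline comments
        if line.startswith("##"):
            label, sep, value = line[2:].partition("=")
            if not sep:
                current = None
                continue
            current = normalize_label(label)
            records[current] = value.strip()
        elif current is not None and line.strip():
            records[current] += "\n" + line.strip()
    return records


if __name__ == "__main__":
    ldrs = parse_jcamp_ldrs(SAMPLE_JCAMP)
    print(ldrs["TITLE"], ldrs["DATATYPE"], ldrs["XUNITS"])
```

Reading the header records alone is already enough to index a local collection by technique, units and title; decoding the compressed data tables is the part where a shared, ratified standard matters most.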
  27. Spectral Display in the hand
  28. Challenges with Spectra
     • JCAMP is good for a lot of spectral data – IR, Raman, 1D NMR
     • MS data is rarely made available in JCAMP
     • We would love a ratified JCAMP 6.0 for 2D data exchange – allows third parties to build support for download
     • ASSIGNED JCAMP spectra can be supported but no real standards here
  29. …and images
  30. DERA to digitize documents?
     • We want to get data out of our historical archive
     • What could we do?
     • Find chemical names and generate structures
     • Find chemical images and generate structures
     • Find reactions – and make a database!
     • Find data (MP, BP, LogP) and deposit
     • Find figures and database them
     • Find spectra (and link to structures)
  31. Text-Mining
  32. ESI – Text Spectra
  33. Do we want to search text spectra?
     What do we get when we search:
     13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
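One way to see why a text spectrum like the one on slide 33 is awkward to search: as a string it only supports literal matching, whereas a numeric question such as "which peaks fall between 60 and 70 ppm?" requires parsing first. The following is a rough sketch of that parsing step using regular expressions. The patterns and names are assumptions made for this example and are not the extraction pipeline used in the talk; in particular, the shift pattern would also pick up coupling constants in a 1H listing.

```python
import re
from typing import List, NamedTuple, Optional

SLIDE_33_TEXT = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), "
    "30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, "
    "120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), "
    "99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)"
)


class TextSpectrum(NamedTuple):
    nucleus: str
    solvent: str
    frequency_mhz: float
    shifts_ppm: List[float]


# Header such as "13C NMR (CDCl3, 100 MHz)".
HEADER = re.compile(
    r"(?P<nucleus>\d+[A-Z][a-z]?)\s+NMR\s*\(\s*(?P<solvent>[^,]+),"
    r"\s*(?P<freq>\d+(?:\.\d+)?)\s*MHz\s*\)"
)
# Any decimal number after the header is treated as a chemical shift.
SHIFT = re.compile(r"\d+\.\d+")


def parse_text_spectrum(text: str) -> Optional[TextSpectrum]:
    """Return the header details and shifts, or None if no header is found."""
    match = HEADER.search(text)
    if match is None:
        return None
    shifts = [float(s) for s in SHIFT.findall(text[match.end():])]
    return TextSpectrum(
        nucleus=match.group("nucleus"),
        solvent=match.group("solvent").strip(),
        frequency_mhz=float(match.group("freq")),
        shifts_ppm=shifts,
    )


def peaks_in_range(spectrum: TextSpectrum, low: float, high: float) -> List[float]:
    """Numeric range query that plain text search cannot express."""
    return [s for s in spectrum.shifts_ppm if low <= s <= high]


if __name__ == "__main__":
    spectrum = parse_text_spectrum(SLIDE_33_TEXT)
    if spectrum is not None:
        print(spectrum.nucleus, spectrum.solvent, len(spectrum.shifts_ppm))
        print(peaks_in_range(spectrum, 60.0, 70.0))
```

Even when the parse succeeds, the result is a peak list rather than a spectrum, which is part of the case for depositing digital spectral formats in the first place.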
  34. MestreLabs Mnova NMR Beta
  35. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  36. ESI – Text and Image Spectra
  37. Extracted JCAMP Spectrum
  38. Prepare CONSISTENT JCAMP
  39. Data onto ChemSpider
  40. It’s exactly the WRONG WAY!
     • We should NOT be mining data out of future publications
     • Structures should be submitted “correctly”
     • Spectra should be digital spectral formats, not images
     • ESI should be RICH and interactive
  41. APIs and Standards
     • We follow the standard expectations in terms of how people would want to access our APIs: RESTful services, JSON handling etc.
     • We allow people to pass in queries using molfiles, SMILES, InChI/Keys etc.
     • Future will include JCAMP searching
     • APIs in use by MANY organizations and of value to our Open PHACTS, PharmaSea, Chemical Database Service etc. Also Mobile
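Because queries can arrive as molfiles, SMILES, InChIs or InChIKeys, a client or service has to decide which identifier it has been handed. The sketch below is a purely syntactic guess based on well-known surface features: the `InChI=` prefix, the 14-10-1 uppercase block layout of a standard InChIKey, and the `M  END` line of a molfile. The function name is hypothetical, this is not the ChemSpider API, and the SMILES branch is only a fallback that validates nothing.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_query_format(query: str) -> str:
    """Guess the identifier type of a structure query from its surface syntax."""
    stripped = query.strip()
    if stripped.startswith("InChI="):
        return "inchi"
    if INCHIKEY.match(stripped):
        return "inchikey"
    if "M  END" in query:
        return "molfile"
    # Fallback only: the string is not checked to be valid SMILES.
    return "smiles"


if __name__ == "__main__":
    for example in (
        "InChI=1S/H2O/h1H2",
        "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
        "CCO",
    ):
        print(example, "->", guess_query_format(example))
```

A guess like this is only a routing aid; whether the identifier is chemically valid still has to be settled by whichever toolkit or service consumes it.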
  42. Conclusions
     • Data Interchange standards are all over our projects!
     • We are grateful to companies, organizations, contributors who have helped define:
        • Structure – Mol, SDF, InChI etc.
        • Spectra – JCAMP, SPC, NetCDF etc.
        • W3C standards
  43. For the Next ACS hopefully…
     • Build out our ChemSpider Reaction collection
     • Grab spectral data out of our ESI!
     • Get more submissions in STANDARD formats
     • Integrate to spectroscopy handling systems for deposition in JCAMP
     • Push molfiles directly into ChemSpider with improved deposition platform
     • Build out the chemical data repository…
  44. Thank You
     Email: williamsa@rsc.org
     Twitter: ChemConnector
     Personal Blog: www.chemconnector.com
     SLIDES: www.slideshare.net/AntonyWilliams