
Driving needs for analytical data exchange standards and the potential impacts on the chemical sciences


Analytical science underpins so many different types of chemistry that it is clearly indispensable. Nuclear magnetic resonance and infrared spectroscopy, mass spectrometry and chromatography, and a myriad of other analytical techniques are readily available to scientists today, commonly in open-access walk-up labs. While instrumentation is now compact and highly flexible, and the controlling software is both powerful and easy to use, significant challenges remain in managing and integrating the various forms of analytical data and, more importantly, in exchanging data between scientists. In general, data reported in peer-reviewed journals are limited to electronic supplementary information in the form of PDF files or, occasionally, webpages. Much of the value of analytical data lies in the ability to database diverse data types and interrogate them later by searching on metadata, spectral features and related chemical structure information. Exporting and converting the binary file formats produced by most analytical instruments into exchangeable formats therefore remains a major objective in the field. While file formats such as JCAMP and NetCDF have enabled data exchange for a number of years, the need for more advanced formats (such as AnIML and mzML) has persisted. This presentation will review existing activities in the development of exchangeable formats and progress in using existing formats to deliver reusable analytical data to the community.
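To make the export step concrete, here is a minimal sketch, assuming the spectrum has already been decoded from the vendor's binary file into two plain Python lists of x and y values. It writes the simple, uncompressed `##XYPOINTS=(XY..XY)` form of JCAMP-DX 4.24 (one "x, y" pair per line, scaling factors fixed at 1). The function name `to_jcamp_dx` and its defaults are hypothetical, and the exact set of required labels should be checked against the JCAMP-DX specification before real use.

```python
from typing import Sequence


def to_jcamp_dx(title: str, x: Sequence[float], y: Sequence[float],
                xunits: str = "1/CM", yunits: str = "ABSORBANCE",
                data_type: str = "INFRARED SPECTRUM") -> str:
    """Hypothetical helper: serialise an (x, y) spectrum as minimal JCAMP-DX text.

    Uses the ##XYPOINTS=(XY..XY) table with one "x, y" pair per line, so no
    SQZ/DIF compression is needed and XFACTOR/YFACTOR can both be 1.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and the same length")
    header = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        f"##FIRSTX={x[0]}",
        f"##LASTX={x[-1]}",
        "##XFACTOR=1",
        "##YFACTOR=1",
        f"##NPOINTS={len(x)}",
        "##XYPOINTS=(XY..XY)",
    ]
    # One data pair per line keeps the file human-readable and trivially parseable.
    body = [f"{xi}, {yi}" for xi, yi in zip(x, y)]
    return "\n".join(header + body + ["##END="]) + "\n"
```

For example, `to_jcamp_dx("Example IR", [4000.0, 3999.0], [0.01, 0.02])` yields a text block that ends with the lines `3999.0, 0.02` and `##END=`. Writing uncompressed pairs trades file size for simplicity; production exporters usually use the compressed `##XYDATA=(X++(Y..Y))` form instead.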

Published in: Science

  1. Driving needs for analytical data exchange standards and the potential impacts on the chemical sciences. Antony Williams, ORCID ID: 0000-0002-2668-4821
  2. A useful website if we had it… • All of the “public spectra” from scientific research articles were available on a website: NMR, MS, GC/LC-MS, IR, UV-Vis, Raman • The spectra were NOT pictures but live, interactive spectral data that could be searched • The site had programmatic interfaces that could integrate with instruments for real-time structure identification
  3. A useful website if we had it… • Structural integration with assigned data (vibrational bands, MS fragments, NMR assignments (1D and 2D)) would allow for the construction of predictive models • And if it all came together, we would be able to consider CASE (Computer-Assisted Structure Elucidation) online!
  4. And some of it is done…
  5. NIST WebBook
  6. mzCloud
  7.
  8. ACD/ILab
  9. MassBank
  10. MassBank
  11. SDBS
  12. ChemSpider
  13. ChemSpider
  14. 9,442 spectra and growing
  15. We have pieces… but much to do • To build the “spectral database” we really need certain things: • Adoption of a new community norm: “A commitment to share spectral data” • Education around existing standards: “yes madam, you can already generate JCAMP!” • “We need a CCDC (Cambridge Crystallographic Data Centre) for spectral data”
  16. So why do we need standards?
  17. So why do we need standards? • Well, that’s a dumb question! • In general: think character codes, HTML, CSV, W3C efforts • For our domain: the molfile, SDF file, InChI, CIF files, JCAMP • There are “standards by adoption” and “open standards”
  18. Mass Spectrometry Formats
  19. Analytical Data Standards
  20. Analytical Data Standards
  21. 2D NMR
  22. Progress in standards
  23. Progress in standards
  24. Standards without adoption are limited in value • If the instrument vendors don’t support or adopt the standards, success is limited • If scientists don’t know what the standards are and how to use them, then what?
  25. Publishers can push us for data
  26. RSC now loads Supp. Info data…
  27. Are there challenges? • JCAMP is good for a lot of spectral data: IR, Raman, 1D NMR • MS data is rarely made available in JCAMP • A ratified JCAMP 6.0 for 2D data exchange would allow third parties to build support • All other data standards (for NMR at least!) will take years to catch up • ASSIGNED JCAMP spectra ARE already supported!
  28. JCAMP-MOL
  29. Jmol - JSpecView
  30. ChemDoodle Components
  31. And even support for 2D NMR!
  32. A movie from the Denver meeting
  33. ESI – Text Spectra
  34. We want to find text spectra? • We can find and index text spectra: 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC) • What would be better are spectral figures, with assignments included where possible!
  35. Mestrelab Mnova NMR
  36. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  37. Developing Proof-of-Concept • Extract from 1976-2014 USPTO applications, NMR records by nucleus: H 975,543; C 56,536; unknown* 44,306; F 9,429; P 3,241; B 91; Si 62; Sn 22; Se 11; N 8 • *unknown: the record starts with “NMR:” followed by a peak list with no nucleus given
  38. ESI data also contains figures
  39. “Where is the real data, please?” FIGURE DATA
  40. Manual Curation Layer • ALL SPECTRA SHOULD BE JCAMP • ChemSpider has had manual curation for >8 years • Users already annotate data on ChemSpider • These data are intended to go into the developing RSC Data Repository architecture
  41. What should we be doing? • Settle on a short-term format: JCAMP-MOL? • Convince the instrument vendors to export in this format • Push-button depositions into “containers”: ChemSpider, NMRShiftDB, institutional repositories • Encourage format support (read and write) in software: Mestrelab, ACD/Labs, Bruker TopSpin, etc.
  42. Actions • Support and encourage new and EXISTING standards • In the meantime, reawaken and modernize the JCAMP standard • Encourage scientists to provide data • Support those who may have good solutions
  43. JCAMP-MOL
  44. ChAMP – Stuart Chalk
  45. Thank you Email: ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: SLIDES:
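Slide 2 imagines spectra published as live, searchable data rather than pictures. As a toy illustration of what searching such data could mean, here is a minimal sketch that ranks library entries by the fraction of query peaks (for example, 13C shifts in ppm) that pair with a distinct library peak within a tolerance. The greedy nearest-peak matching, the 1.0 ppm default tolerance and the names `match_score` and `search_library` are all assumptions for illustration; none of the databases on slides 5 to 14 is claimed to work this way.

```python
from typing import Dict, List, Tuple


def match_score(query: List[float], reference: List[float], tol: float = 1.0) -> float:
    """Fraction of query peaks that pair with a distinct reference peak within tol ppm."""
    if not query:
        return 0.0
    remaining = sorted(reference)
    matched = 0
    for q in sorted(query):
        # Greedily take the nearest reference peak that has not been used yet.
        nearest = min(remaining, key=lambda r: abs(r - q), default=None)
        if nearest is not None and abs(nearest - q) <= tol:
            remaining.remove(nearest)
            matched += 1
    return matched / len(query)


def search_library(query: List[float], library: Dict[str, List[float]],
                   tol: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank library entries by match_score, best first (ties keep library order)."""
    scored = [(name, match_score(query, peaks, tol)) for name, peaks in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

For instance, `search_library([14.0, 30.0, 66.5], {"compound A": [14.1, 30.1, 66.1], "compound B": [20.0, 40.0, 60.0]})` returns `[("compound A", 1.0), ("compound B", 0.0)]`. Greedy matching is simple but not optimal; a real service would more likely use an assignment algorithm or a spectral similarity measure.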
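Slide 27 argues that JCAMP already covers much spectral data, and reading it back is mostly a matter of splitting the file into labelled data records (`##LABEL=value`). A minimal sketch, assuming a single-block file: it strips `$$` comments, normalises labels the way JCAMP-DX compares them (case-insensitive, ignoring spaces, hyphens, slashes and underscores) and keeps continuation lines, such as the data table, under the preceding label. `parse_jcamp` and `normalise_label` are hypothetical names; compressed (SQZ/DIF) data is returned as raw text, not decoded.

```python
import re
from typing import Dict, List, Optional

LABEL_RE = re.compile(r"^##([^=]*)=(.*)$")


def normalise_label(label: str) -> str:
    # JCAMP-DX label comparison ignores case, spaces, hyphens, slashes and underscores.
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_jcamp(text: str) -> Dict[str, List[str]]:
    """Split a single-block JCAMP-DX file into {normalised label: [value, continuation lines...]}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # drop $$ comments
        if not line:
            continue
        m = LABEL_RE.match(line)
        if m:
            current = normalise_label(m.group(1))
            records[current] = [m.group(2).strip()]
        elif current is not None:
            # Lines without ## belong to the previous record (e.g. XYPOINTS data).
            records[current].append(line.strip())
    return records
```

Multi-block files (`##BLOCKS=`), such as those linking a structure to its spectra, repeat labels like `##TITLE=`; this sketch would overwrite them, so a fuller reader would start a new dictionary at each `##TITLE=`.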
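Slide 34 notes that text spectra in ESI can be found and indexed. A minimal regex sketch of that indexing step for a 13C line like the one quoted on the slide: it reads the nucleus, solvent and spectrometer frequency from the header, then every shift after "δ", attaching a parenthesised note to the shift it directly follows. The names `parse_text_spectrum`, `HEADER_RE` and `PEAK_RE` are hypothetical. One known simplification: in the slide's text, "(ArCH)" labels the whole preceding group of shifts, whereas this sketch attaches it to the last shift only, and nested parentheses in notes are not handled.

```python
import re
from typing import List, Optional, Tuple

# e.g. "13C NMR (CDCl3, 100 MHz)"
HEADER_RE = re.compile(
    r"(?P<mass>\d+)\s*(?P<element>[A-Z][a-z]?)\s+NMR\s*"
    r"\((?P<solvent>[^,]+),\s*(?P<freq>\d+(?:\.\d+)?)\s*MHz\)"
)
# A shift value, optionally followed by a parenthesised note such as "(CH3)".
PEAK_RE = re.compile(r"(?P<shift>-?\d+(?:\.\d+)?)\s*(?:\((?P<note>[^)]*)\))?")


def parse_text_spectrum(text: str) -> Optional[dict]:
    """Parse a one-line text spectrum into header fields and (shift, note) pairs."""
    header = HEADER_RE.search(text)
    delta = text.find("δ")
    if header is None or delta < 0:
        return None
    peak_text = text[delta + 1:].lstrip(" =:")
    peaks: List[Tuple[float, Optional[str]]] = [
        (float(m.group("shift")), m.group("note")) for m in PEAK_RE.finditer(peak_text)
    ]
    return {
        "nucleus": header.group("mass") + header.group("element"),
        "solvent": header.group("solvent").strip(),
        "freq_mhz": float(header.group("freq")),
        "peaks": peaks,
    }
```

Applied to the slide's full 13C line, this should yield 24 shifts from 14.12 to 154.88, which could then be indexed for the kind of peak search slide 2 describes.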
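The 1H line on slide 36 is harder to index than the 13C line because each multiplet carries several fields, and assignments such as "C(5a)H" contain their own parentheses. A minimal sketch, assuming the common "shift (multiplicity, nH, J = x Hz, assignment)" layout: it splits only on commas at parenthesis depth zero, then classifies each field. All function names are hypothetical; conventions such as "br s" or "J = 8.4, 2.1 Hz" are not handled.

```python
import re
from typing import Any, Dict, List


def split_top_level(s: str, sep: str = ",") -> List[str]:
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, buf = [], 0, []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def parse_multiplet(chunk: str) -> Dict[str, Any]:
    """Parse e.g. '4.24 (d, 1H, J = 4.8 Hz, C(11b)H)' or '7.18–7.94 (m, 11H, ArH)'."""
    if "(" in chunk:
        open_idx = chunk.index("(")
        shift_text = chunk[:open_idx].strip()
        fields = split_top_level(chunk[open_idx + 1:chunk.rindex(")")])
    else:
        shift_text, fields = chunk.strip(), []
    # A range such as "7.18–7.94" (en dash or hyphen) becomes a (low, high) tuple.
    bounds = tuple(float(v) for v in re.split(r"\s*[–-]\s*", shift_text))
    entry: Dict[str, Any] = {"shift": bounds[0] if len(bounds) == 1 else bounds,
                             "multiplicity": None, "n_h": None, "j_hz": [], "assignment": []}
    for i, field in enumerate(fields):
        n_h = re.fullmatch(r"(\d+)H", field)
        j = re.fullmatch(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz", field)
        if i == 0 and re.fullmatch(r"[a-z]+", field):
            entry["multiplicity"] = field  # s, d, t, dd, m, ...
        elif n_h:
            entry["n_h"] = int(n_h.group(1))
        elif j:
            entry["j_hz"].append(float(j.group(1)))
        else:
            entry["assignment"].append(field)
    entry["assignment"] = ", ".join(entry["assignment"]) or None
    return entry


def parse_1h(text: str) -> List[Dict[str, Any]]:
    """Parse every multiplet listed after 'δ' in a 1H text spectrum."""
    delta = text.find("δ")
    peak_text = text[delta + 1:].lstrip(" =:") if delta >= 0 else text
    return [parse_multiplet(c) for c in split_top_level(peak_text)]
```

A useful sanity check on extracted data is that the integrations sum to the proton count of the proposed structure; for the slide's line they sum to 21.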
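Slide 37 tallies NMR records by nucleus, with an "unknown" bucket for records that start with "NMR:" and give no nucleus. A minimal sketch of that classification step, assuming each record is a text snippet beginning with its header (for example "19F NMR" or "1H-NMR"). The regular expressions and the names `classify` and `count_by_nucleus` are assumptions; this is not the pipeline behind the counts on the slide, and headers such as "31P{1H} NMR" would fall through unclassified.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Optional mass number, element symbol, then "NMR" (e.g. "13C NMR", "1H-NMR", "29Si NMR").
NUCLEUS_RE = re.compile(r"^\s*(?:\d+\s*)?(?P<element>[A-Z][a-z]?)\s*[- ]?NMR\b")
# A record that starts with "NMR" and names no nucleus.
BARE_NMR_RE = re.compile(r"^\s*NMR\b")


def classify(snippet: str) -> Optional[str]:
    """Return the element symbol, 'unknown' for bare 'NMR:' records, or None."""
    m = NUCLEUS_RE.match(snippet)
    if m:
        return m.group("element")
    if BARE_NMR_RE.match(snippet):
        return "unknown"
    return None  # not recognised as an NMR record


def count_by_nucleus(snippets: Iterable[str]) -> Counter:
    """Tally records per nucleus, skipping anything classify() rejects."""
    return Counter(label for label in map(classify, snippets) if label is not None)
```

Keeping "unknown" as its own bucket, rather than guessing 1H, mirrors the slide's split and leaves those records for manual curation.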