
Overview of open resources to support automated structure verification and elucidation

Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There is an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches to support chemical structure verification and elucidation, highlight some of the known data-quality issues, and suggest approaches for resolving some of them. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.

Published in: Science

  1. Overview of open resources to support automated structure verification and elucidation
     Antony Williams¹ and Emma Schymanski²
     ¹ National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA
     ² Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg
     March 2018 ACS Spring Meeting, New Orleans
     http://www.orcid.org/0000-0002-2668-4821
     The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  2. Today’s Session
     • Our focus for the session:
       – Access to data to support automated structure verification and elucidation: NMR and MS
       – Data quality, curation and validation, and a call to action
       – OPENness is here: Open Access, Open Data, Open Source
       – Data standards: we already have them, and more are coming
       – Vendors and scientists providing and using available data
       – There are tools USING these data for structure elucidation
     • This cannot be an exhaustive review…but it is at least a good start
  3. An Ideal Scenario…
     • All published structures and spectra will be available from all published articles for repurposing and reuse in standard formats (preferably, but not necessarily, Open!)
     • Scientists are building open approaches:
       – MS fragmentation
       – NMR shift prediction
       – Structure generators
       – Computer-Assisted Structure Elucidation (CASE)
     • Are we there yet???
  4. Publishers sharing data
     • We have achieved the ideal scenario, right?
     • No: PDF figures in the Supplementary Information are still the default
     • There is a need for public databases of spectral data. There ARE some out there.
     • Analogous to Wikipedia, we are primarily consumers rather than contributors…
  5. Sites sharing data
     • There are many sites that “share” spectral data, generally in non-open formats
     • There are rich resources
     • However, they cannot easily be used to serve automated structure verification and elucidation
  6. PubChem – Spectral Links
  7. PubChem – Spectral Links
  8. Spectral Links to Partial Data
  9. SDBS – Free Not Open
  10. SDBS – Free Not Open
  11. SDBS – Free Not Open
  12. ChemSpider
  13. ChemSpider
  14. ChemSpider
  15. NIST WebBook https://webbook.nist.gov/chemistry/
  16. NIST WebBook https://webbook.nist.gov/chemistry/
  17. Focused Databases
     • Focused databases
       – Compiled focused databases of Open Data are preferable
       – Spectral data for structure verification and elucidation
       – Open Mass Spec data are especially useful (Emma’s talks!)
       – Data can be brought in-house and integrated
       – Algorithms can be derived, e.g. NMR shift prediction
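As one illustration of deriving an NMR shift predictor from a focused open database, the sketch below uses an approach in the spirit of HOSE-code lookup (the technique is an assumption here; the slides do not specify one). Each carbon's environment is described as a tuple of "spheres", innermost first; a query is matched against the training data at the most specific depth available, dropping outer spheres until a match is found, and the matching shifts are averaged. The environment strings and shift values are hypothetical toy data, not entries from NMRShiftDB or any other database.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical training rows: (environment, 13C shift in ppm). Each environment
# is a tuple of "spheres", innermost first. All values are made-up toy numbers.
TRAINING = [
    (("C", "CCO", "CH3"), 60.1),
    (("C", "CCO", "CH2"), 62.3),
    (("C", "CCC", "CH3"), 22.0),
    (("C", "CCC", "CH2"), 24.4),
]

def build_index(rows) -> Dict[Tuple[str, ...], List[float]]:
    """Index every prefix of each environment so that shorter (less specific)
    matches remain available as fallbacks."""
    index: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for env, shift in rows:
        for depth in range(1, len(env) + 1):
            index[tuple(env[:depth])].append(shift)
    return dict(index)

def predict_shift(env: Sequence[str], index) -> Optional[Tuple[float, int]]:
    """Return (mean shift, spheres matched) for the most specific match, else None."""
    for depth in range(len(env), 0, -1):  # try the most specific prefix first
        key = tuple(env[:depth])
        if key in index:
            return mean(index[key]), depth
    return None

index = build_index(TRAINING)
print(predict_shift(("C", "CCO", "CH3"), index))  # full three-sphere match
print(predict_shift(("C", "CCO", "CH"), index))   # falls back to two spheres
```

Returning the number of matched spheres alongside the prediction gives a crude confidence signal: a prediction backed only by the first sphere is far less reliable than a full-depth match.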
  18. NMRShiftDB https://nmrshiftdb.nmr.uni-koeln.de/
  19. NMRShiftDB https://nmrshiftdb.nmr.uni-koeln.de/
  20. NMRShiftDB https://nmrshiftdb.nmr.uni-koeln.de/
  21. Open Resources
     • Open Databases offer more value
       – Bring the data in-house, integrate, link
       – Ingest and train algorithms
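Bringing open data in-house usually means linking records from different sources on a shared structure identifier. A minimal sketch, assuming every record carries an InChIKey (the record fields, IDs and keys below are hypothetical placeholders, not real compounds): records are grouped by full InChIKey, and the first 14-character block, which hashes the connectivity layer, is used to flag distinct keys that share a skeleton and differ only in later layers such as stereochemistry.

```python
from collections import defaultdict
from typing import Dict, List

def merge_by_inchikey(*sources: List[dict]) -> Dict[str, List[dict]]:
    """Group records from any number of sources under their full InChIKey."""
    merged: Dict[str, List[dict]] = defaultdict(list)
    for records in sources:
        for record in records:
            merged[record["inchikey"]].append(record)
    return dict(merged)

def shared_skeletons(inchikeys) -> Dict[str, List[str]]:
    """Return first-block (connectivity) hashes shared by more than one distinct
    full InChIKey, e.g. stereoisomers of the same skeleton."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in sorted(set(inchikeys)):
        groups[key.split("-")[0]].append(key)
    return {block: keys for block, keys in groups.items() if len(keys) > 1}

# Placeholder keys and record IDs (hypothetical, not real database entries).
nmr_source = [
    {"inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "id": "nmr-1"},
    {"inchikey": "CCCCCCCCCCCCCC-UHFFFAOYSA-N", "id": "nmr-2"},
]
ms_source = [
    {"inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "id": "ms-7"},
    {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBBB-N", "id": "ms-8"},
]

merged = merge_by_inchikey(nmr_source, ms_source)
print({key: [r["id"] for r in recs] for key, recs in merged.items()})
print(shared_skeletons(merged))
```

Keeping exact-key merging separate from skeleton-level flagging avoids silently collapsing stereoisomers, which matters when the spectra themselves can distinguish them.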
  22. CSEARCH/NMRPREDICT http://nmrpredict.orc.univie.ac.at/
  23. MassBank https://massbank.eu/MassBank/
  24. MassBank https://massbank.eu/MassBank/
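Open spectral libraries such as MassBank are typically used for structure verification by scoring a query spectrum against library spectra. A minimal sketch of one common score, a cosine similarity over peaks paired within an m/z tolerance, is shown below; the greedy pairing, the 0.01 m/z default tolerance and the peak lists are assumptions chosen for illustration, not MassBank's own search algorithm.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def cosine_score(query: List[Peak], library: List[Peak], tol: float = 0.01) -> float:
    """Greedy cosine similarity: each query peak (strongest first) pairs with the
    most intense unused library peak within +/- tol m/z units."""
    used = set()
    dot = 0.0
    for mz_q, int_q in sorted(query, key=lambda p: -p[1]):
        best = None
        for j, (mz_l, int_l) in enumerate(library):
            if j in used or abs(mz_l - mz_q) > tol:
                continue
            if best is None or int_l > library[best][1]:
                best = j
        if best is not None:
            used.add(best)
            dot += int_q * library[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    if norm_q == 0 or norm_l == 0:
        return 0.0
    return dot / (norm_q * norm_l)

# Hypothetical peak lists for illustration only.
query = [(100.0, 3.0), (150.0, 4.0)]
library_entry = [(100.005, 3.0), (180.0, 4.0)]
print(round(cosine_score(query, library_entry), 3))
```

Only the 100 m/z peaks pair here, so the score is well below 1; in practice, intensity scaling (for example a square-root transform) and precursor-mass filtering are common refinements.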
  25. m/z CLOUD https://www.mzcloud.org/
  26. Integrating Data and Services
     • Integration:
       – Use simple URL linking for navigation
       – Provide simple services for real-time prediction
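Simple URL linking works because many open resources expose predictable URLs keyed by a structure identifier. A minimal sketch, assuming PubChem's PUG REST path pattern `/rest/pug/compound/inchikey/<key>/cids/<format>` and its compound page pattern `/compound/<cid>` (verify both against PubChem's current documentation); the code only builds strings and makes no network calls.

```python
from urllib.parse import quote

PUBCHEM = "https://pubchem.ncbi.nlm.nih.gov"

def pug_rest_cids_url(inchikey: str, fmt: str = "JSON") -> str:
    """URL asking PubChem PUG REST for the CIDs matching an InChIKey."""
    return f"{PUBCHEM}/rest/pug/compound/inchikey/{quote(inchikey)}/cids/{fmt}"

def compound_page_url(cid: int) -> str:
    """URL of the human-readable PubChem compound page for a CID."""
    return f"{PUBCHEM}/compound/{cid}"

# Caffeine: InChIKey RYYVLZVUVIJVGH-UHFFFAOYSA-N, PubChem CID 2519.
print(pug_rest_cids_url("RYYVLZVUVIJVGH-UHFFFAOYSA-N"))
print(compound_page_url(2519))
```

Because the URLs are derived from the identifier alone, any page that displays an InChIKey can link out to another resource without storing that resource's data.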
  27. Example Integration via the CompTox Chemistry Dashboard
  28. Link-Based Access
  29. Link Access
  30.
  31. Open Data for Bulk Predictions
     • Open Data for apps
       – Structures
       – CAS Registry Numbers
       – Names
       – Formulae
       – Mass
     • iOS app including predicted 13C NMR
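Formulae and masses are among the simplest bulk predictions to derive from open structure data. As a sketch, the monoisotopic mass of a neutral molecule can be computed from a flat molecular formula by summing the mass of the most abundant isotope of each element. The element table is a small subset chosen for this example (values taken from standard isotope-mass tables; check them against a current table before relying on them), and the parser deliberately rejects parentheses, hydrates and charges.

```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotope; a small subset only.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.99840322,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
    "Br": 78.9183371,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a flat formula such as 'C8H10N4O2' into element counts."""
    counts: Dict[str, int] = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            break  # unparsed characters sit between tokens
        element, digits = match.groups()
        counts[element] = counts.get(element, 0) + int(digits or 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts

def monoisotopic_mass(formula: str) -> float:
    """Sum of most-abundant-isotope masses for a neutral molecule."""
    total = 0.0
    for element, n in parse_formula(formula).items():
        if element not in MONOISOTOPIC:
            raise ValueError(f"no mass for element {element!r} in this table")
        total += MONOISOTOPIC[element] * n
    return total

print(f"{monoisotopic_mass('C8H10N4O2'):.4f}")  # caffeine: 194.0804
```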
  32. Mass Searching
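Mass searching typically returns every candidate whose monoisotopic mass lies within a tolerance of the measured value, with the tolerance usually given in parts per million: the window half-width is measured × ppm × 10⁻⁶. A minimal sketch using binary search over a sorted list, assuming the measured value has already been converted to a neutral mass (no adduct handling); the three-entry candidate list, with masses computed from the formulae and rounded to six decimals, stands in for a real database.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Toy candidate list: (monoisotopic mass, name, formula), sorted by mass.
CANDIDATES: List[Tuple[float, str, str]] = sorted([
    (180.063388, "glucose", "C6H12O6"),
    (180.064726, "theobromine", "C7H8N4O2"),
    (194.080376, "caffeine", "C8H10N4O2"),
])
MASSES = [c[0] for c in CANDIDATES]

def search_mass(measured: float, ppm: float = 5.0) -> List[Tuple[float, str, str]]:
    """Return candidates within +/- ppm of the measured neutral monoisotopic mass."""
    tol = measured * ppm * 1e-6
    lo = bisect_left(MASSES, measured - tol)
    hi = bisect_right(MASSES, measured + tol)
    return CANDIDATES[lo:hi]

print([name for _, name, _ in search_mass(180.0634, ppm=5.0)])
print([name for _, name, _ in search_mass(180.0640, ppm=10.0)])
```

Glucose and theobromine differ by about 7 ppm, so a 5 ppm window centred on glucose separates them while a 10 ppm window returns both; this is why mass alone often needs a second criterion such as NMR data.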
  33. Mass and CNMR Searching
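Combining mass and 13C NMR searching narrows candidates further: the mass filter keeps structures with a plausible formula, and predicted 13C shifts then rank them against the experimental spectrum. A minimal sketch, assuming the candidates have already passed a mass filter and that predicted shift lists are available for each; comparing sorted shift lists is a crude stand-in for real signal assignment, and requiring equal signal counts is a simplification, since molecular symmetry can merge signals. Candidate names and all shift values are made-up toy numbers.

```python
from typing import Dict, List, Tuple

def shift_deviation(experimental: List[float], predicted: List[float]) -> float:
    """Mean absolute deviation (ppm) between sorted shift lists; inf if the
    number of signals differs or the lists are empty."""
    if len(experimental) != len(predicted) or not experimental:
        return float("inf")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)

def rank_candidates(experimental: List[float],
                    predictions: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort candidate names by ascending shift deviation (best match first)."""
    scored = [(name, shift_deviation(experimental, shifts))
              for name, shifts in predictions.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical experimental shifts and predictions (toy values, in ppm).
experimental = [20.0, 45.0, 170.0]
predictions = {
    "candidate_A": [21.0, 44.0, 171.0],
    "candidate_B": [30.0, 60.0, 150.0],
    "candidate_C": [20.0, 45.0],
}
print(rank_candidates(experimental, predictions))
```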
  34. Important Standards in our efforts
     • Structures: Molfile, SDF file, InChIs (standard and non-standard)
     • NMR: JCAMP-DX and all its variants
     • MS: mz(X)ML, MSP (and all its variants), MGF, MassBank
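Of the MS formats listed, MGF (Mascot Generic Format) is simple enough to read in a few lines: each spectrum sits between `BEGIN IONS` and `END IONS`, header lines are `KEY=value` pairs such as `TITLE` or `PEPMASS`, and the remaining lines are whitespace-separated m/z and intensity values. The reader below is a minimal sketch assuming well-formed input; it skips lines starting with `#`, ignores global parameters outside spectra, and does not handle vendor extensions or charge annotations on peak lines. The sample spectrum is hypothetical.

```python
from typing import Dict, Iterable, List

def read_mgf(lines: Iterable[str]) -> List[Dict]:
    """Parse MGF text into [{'params': {...}, 'peaks': [(mz, intensity), ...]}, ...]."""
    spectra: List[Dict] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a spectrum are ignored here
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current["params"][key.strip().upper()] = value.strip()
        else:
            fields = line.split()
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            current["peaks"].append((float(fields[0]), intensity))
    return spectra

SAMPLE = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=200.1
CHARGE=1+
100.0 50
150.5 25.5
END IONS
"""
spectra = read_mgf(SAMPLE.splitlines())
print(spectra[0]["params"]["TITLE"], spectra[0]["peaks"])
```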
  35. There are more coming
  36. Conclusion
     • The abundance of online data continues to grow
     • There are “integrated data”, there are databases, there are online tools, there are mobile apps
     • Data quality is critical, and OPENness is enabling:
       – Open Data
       – Open Standards
       – Open Source
     • The rest of the day will expand on these efforts…
  37. Contact
     Antony Williams
     US EPA Office of Research and Development
     National Center for Computational Toxicology (NCCT)
     Williams.Antony@epa.gov
     ORCID: https://orcid.org/0000-0002-2668-4821
