
Using online chemistry databases to facilitate structure identification in mass spectral data

Online databases are increasingly used for structure identification. In many cases, a compound that is unknown to an investigator is already described in the chemical literature or in an online database, and these “known unknowns” are commonly available in aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases by elemental composition or monoisotopic mass. We will report on the search approaches we offer on the aggregated compound databases hosted by the Royal Society of Chemistry, and on how these resources can be used for structure identification.
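
The abstract describes two query types: by elemental composition and by monoisotopic mass. A minimal sketch of both follows. It assumes a tiny in-memory dictionary (`CANDIDATES`, hypothetical) standing in for a real aggregated database such as ChemSpider, simple formulas without brackets or charges, neutral molecules, and a ±5 ppm default tolerance chosen only for illustration. All function names are made up for this example and are not part of any ChemSpider API.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few common elements.
# Illustrative subset only; a real tool needs the full periodic table.
MONOISOTOPIC_MASS = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
}

# One element symbol followed by an optional count, e.g. "C22", "Cl", "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C22H29N3O' (no brackets, charges, or dots)."""
    counts: dict[str, int] = {}
    pos = 0
    for m in _TOKEN.finditer(formula):
        if m.start() != pos:  # characters left unparsed between tokens
            raise ValueError(f"cannot parse formula {formula!r}")
        element, n = m.group(1), int(m.group(2) or 1)
        counts[element] = counts.get(element, 0) + n
        pos = m.end()
    if not counts or pos != len(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic element masses for a neutral molecule."""
    total = 0.0
    for element, n in parse_formula(formula).items():
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"no mass tabulated for {element!r}")
        total += MONOISOTOPIC_MASS[element] * n
    return total


def search_by_formula(query: str, candidates: dict[str, str]) -> list[str]:
    """Names whose formula has exactly the same element counts as the query."""
    target = parse_formula(query)
    return [name for name, f in candidates.items() if parse_formula(f) == target]


def search_by_mass(mass: float, candidates: dict[str, str], ppm: float = 5.0):
    """(name, formula, mass, error in ppm) within +/- ppm, closest first."""
    hits = []
    for name, formula in candidates.items():
        calc = monoisotopic_mass(formula)
        error_ppm = (calc - mass) / mass * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, formula, round(calc, 4), round(error_ppm, 2)))
    return sorted(hits, key=lambda hit: abs(hit[3]))


# Hypothetical stand-in for an aggregated database: name -> formula.
CANDIDATES = {
    "caffeine": "C8H10N4O2",
    "cholesterol": "C27H46O",
    "Tinuvin 328": "C22H29N3O",
}

print(search_by_formula("H29C22N3O", CANDIDATES))  # ['Tinuvin 328']
print(search_by_mass(351.2311, CANDIDATES, ppm=5.0))
```

Comparing parsed element counts rather than raw strings lets `H29C22N3O` match `C22H29N3O`. The mass search compares against neutral monoisotopic masses; a measured ion such as [M+H]+ would first need the proton mass removed, and a production search would also handle charge states and other adducts.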

Published in: Science

  1. Using online chemistry databases to facilitate structure identification in mass spectral data
     Antony Williams, Valery Tkachenko, Alexey Pshenichnov
     ACS Denver, March 2015
  2. Free and Easy
      • Everything I will show in terms of ChemSpider is available for free online today
      • To make it easy to “take notes”, these slides will be available at: www.slideshare.net/AntonyWilliams/
  3. www.ChemSpider.com
  4. ChemSpider
  5. What will ChemSpider give us?
  6. What will ChemSpider give us?
  7. For Mass Spectrometrists
      • Valuable searches for mass spectrometry would be:
          • Search the database by mass or formula for structure identification
          • Search subsets of data, e.g. “metabolism”, pesticides, etc.
          • Link structure-based data across the internet
          • Provide “programming interfaces” for integration
      • Does ChemSpider provide value to mass spectrometrists?
  8. Pre-calculated data
  9. Data Source Selection
      • >34 million chemicals, including:
          • Vendor collections
          • Government databases
          • Individual/lab data
          • Publication data
      • All segregated by source, allowing data source selection
  10. Data Source Selection: Type
  11. Data Source Selection: Individual
  12. Mass Spec Analysis (Jim Little, Eastman Chemical)
  13. ChemSpider Interface
  14. 1287 Hits Ranked by Defect
  15. 1287 Hits Ranked by Number of References
  16. Top Ranked Hit
  17. Tinuvin 328
  18. What can I find on ChemSpider?
  19. What can I find?
  20. What can I find?
  21. Source and Purchase…
  22. What can I find on ChemSpider?
  23. External Calculation Engines
  24. What can I find on ChemSpider?
  25. …and in the RSC Databases
  26. Linked to the Publisher
  27. What can I find?
  28. And out to Google Patents
  29. What About the Entire Web?
  30. The InChI Identifier
  31. InChI Strings Hash to InChIKeys
  32. Searching the Internet by Structure
  33. Extended Study: Sorting by References
  34. Position Sorted by References
  35. Position 1 Only
  36. Web Services for Collaboration
      • Many instrument vendors are using or investigating our web-based services for compound lookup
      • Many academic sites are integrating directly: metabonomics, name lookup, mass-based searching
  37. Results of the ChemSpider Search in the MarkerLynx Worksheet
  38. Hit Details in ChemSpider
  39. “REAL Spectral Data”
      • Masses on ChemSpider are clearly valuable!
      • We’d like to host “spectral curves”
      • But we’re a publisher, so what can we do?
  40. Spectra: Cholesterol
  41. ChemSpider ID 24528095: 1H NMR
  42. ChemSpider ID 24528095: H,H-COSY
  43. Publications & “Real Spectra”
      • We are turning text into spectra
      • We are turning figures into spectra
  44. ESI (Electronic Supplementary Information): Text Spectra
  45. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  46. “Where is the real data, please?” (figure data)
  47. Future Developments
      • We have extracted hundreds of thousands of text strings from patents; next we go into our archive
      • We estimate there are many thousands of figures containing spectral data in our ESI and articles
      • We are aiming for a million spectra online…
      • But YOU can submit your data today and share it
  48. We want this…we need YOU!
  49. Data Mining: it’s mine, mine!
  50. New Repository Architecture (doi: 10.1007/s10822-014-9784-5)
  51. Acknowledgments
      • Jim Little, Eastman Chemical Company
      • Daniel Lowe, NextMove Software
      • Bill Brouwer, Plot2Txt Development
      • Carlos Cobas and Stan Sykora, MestreLabs
      • Patrick Wheeler, ACD/Labs
  52. Thank you
      • Email: williamsa@rsc.org
      • ORCID: 0000-0002-2668-4821
      • Twitter: @ChemConnector
      • Personal blog: www.chemconnector.com
      • Slides: www.slideshare.net/AntonyWilliams
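
Slides 14 and 15 show the same 1287 hits ordered two ways, by “defect” and by number of references, and slide 9 notes that hits can be restricted to chosen data sources. The sketch below illustrates those three operations on search results. It assumes a hypothetical `Hit` record, reads “defect” as the absolute ppm difference between candidate and measured mass (an interpretation the slides do not confirm), and uses invented toy values; it is not ChemSpider’s ranking code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one mass-search hit (field names are illustrative)."""
    name: str
    mass_error_ppm: float          # signed (candidate - measured) / measured * 1e6
    reference_count: int           # number of references linked to the record
    data_sources: frozenset[str]   # sources that deposited the record


def select_sources(hits: list[Hit], wanted: set[str]) -> list[Hit]:
    """Keep hits deposited by at least one of the wanted data sources."""
    return [h for h in hits if h.data_sources & wanted]


def rank_by_mass_error(hits: list[Hit]) -> list[Hit]:
    """Closest mass match first."""
    return sorted(hits, key=lambda h: abs(h.mass_error_ppm))


def rank_by_references(hits: list[Hit]) -> list[Hit]:
    """Most-referenced first; ties broken by the smaller absolute mass error."""
    return sorted(hits, key=lambda h: (-h.reference_count, abs(h.mass_error_ppm)))


# Toy values for illustration only; not real search results.
HITS = [
    Hit("candidate-A", 0.2, 3, frozenset({"vendor"})),
    Hit("candidate-B", -1.5, 120, frozenset({"government", "publication"})),
    Hit("candidate-C", 0.9, 40, frozenset({"vendor", "publication"})),
]

print([h.name for h in rank_by_mass_error(HITS)])   # ['candidate-A', 'candidate-C', 'candidate-B']
print([h.name for h in rank_by_references(HITS)])   # ['candidate-B', 'candidate-C', 'candidate-A']
print([h.name for h in rank_by_references(select_sources(HITS, {"vendor"}))])  # ['candidate-C', 'candidate-A']
```

Ranking by reference count pushes well-documented compounds, the likely “known unknowns”, towards the top; the tie-break on mass error is an arbitrary choice made for this sketch.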
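
Slides 30 to 32 rely on InChI strings being hashed into fixed-length InChIKeys that can be searched on the web. A standard InChIKey has 27 characters in three blocks: 14 letters hashing the connectivity (skeleton) layer; 8 letters hashing the remaining layers, followed by a standard flag (`S`) and a version letter (`A`); and one protonation letter. The sketch below only splits and groups existing keys; it does not compute the hash itself, which needs the InChI library. The helper names and the search-engine URL are illustrative assumptions.

```python
import re
from typing import NamedTuple
from urllib.parse import urlencode

# 14 letters, '-', 8 letters + standard/non-standard flag + version, '-', protonation.
_INCHIKEY = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    skeleton: str      # hash of the connectivity (skeleton) layer
    other_layers: str  # hash of remaining layers such as stereochemistry
    flag: str          # 'S' = standard InChIKey, 'N' = non-standard
    version: str       # 'A' = InChI version 1
    protonation: str   # 'N' = neutral


def split_inchikey(key: str) -> InChIKeyParts:
    m = _INCHIKEY.fullmatch(key.strip())
    if m is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    return InChIKeyParts(*m.groups())


def group_by_skeleton(keys: list[str]) -> dict[str, list[str]]:
    """Group keys sharing a first block, e.g. stereoisomers of one skeleton."""
    groups: dict[str, list[str]] = {}
    for key in keys:
        groups.setdefault(split_inchikey(key).skeleton, []).append(key)
    return groups


def search_url(key: str) -> str:
    """Exact-phrase web query for the key; the engine URL is just an example."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{key}"'})


KEYS = [
    "QNAYBMKLOCPYGJ-REOHCLBHSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereochemistry unspecified
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",  # benzene
]

print(split_inchikey(KEYS[2]))
print(group_by_skeleton(KEYS))
print(search_url(KEYS[2]))
```

Grouping on the first block is a quick way to find records that differ only in stereochemistry or other non-skeleton layers, which is useful when a web search by full key returns nothing.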
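
Slides 43 to 45 describe turning text into spectra, using the 1H NMR string on slide 45 as an example. A first step is parsing such a string into a peak list. The sketch below is a regular-expression parser for the pattern shown on that slide only (shift or shift range, then multiplicity and proton count, with optional J values). It is an illustrative assumption, not the text-mining pipeline used by RSC or its partners, and it will miss formats such as “br s” or comma-free variants.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# A shift or shift range followed by "(multiplicity, nH", e.g. "7.18–7.94 (m, 11H".
_PEAK = re.compile(
    r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?\s*\(\s*([a-z]+)\s*,\s*(\d+)\s*H\b"
)
# Coupling constants such as "J = 4.8 Hz" or "Jb = 10.8 Hz".
_COUPLING = re.compile(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz")


@dataclass
class Peak:
    shift: float                 # ppm (start of the range for reported ranges)
    shift_end: Optional[float]   # ppm, only when a range is reported
    multiplicity: str            # e.g. 'd', 't', 'dd', 'm'
    n_h: int                     # integral, in protons
    couplings: list[float] = field(default_factory=list)  # J values in Hz


def parse_1h_nmr(text: str) -> list[Peak]:
    matches = list(_PEAK.finditer(text))
    peaks = []
    for i, m in enumerate(matches):
        # J values and assignments sit between this peak and the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        details = text[m.end():end]
        peaks.append(Peak(
            shift=float(m.group(1)),
            shift_end=float(m.group(2)) if m.group(2) else None,
            multiplicity=m.group(3),
            n_h=int(m.group(4)),
            couplings=[float(j) for j in _COUPLING.findall(details)],
        ))
    return peaks


TEXT = (
    "1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), "
    "4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), "
    "4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), "
    "6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)

for peak in parse_1h_nmr(TEXT):
    print(peak)
```

A peak list like this can then be rendered as a stick or simulated spectrum; summing the proton counts (21 here) is a simple consistency check against the molecular formula.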
