Using an online database of chemical compounds for the purpose of structure identification

Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and it has proven to be a very effective platform for developing structure identification tools. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, and these “known unknowns” are now commonly available on aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for structure identification and review our progress in using the platform for natural product dereplication.
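The two query modes described in the abstract can be illustrated with a minimal Python sketch. It is not the ChemSpider implementation: the function names, the in-memory `EXAMPLE_RECORDS` list, and the small element table are assumptions made for illustration only. It shows that a monoisotopic mass follows directly from an elemental composition, and that isomers match both kinds of query, which is why the results still need filtering.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few elements.
# Only a small illustrative subset is included here.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.973762,
    "S": 31.972071,
}

# Hypothetical stand-in for a compound database (not real ChemSpider records).
EXAMPLE_RECORDS = [
    {"name": "ethanol", "formula": "C2H6O"},
    {"name": "dimethyl ether", "formula": "C2H6O"},
    {"name": "water", "formula": "H2O"},
]


def parse_formula(formula):
    """Turn a simple formula such as 'C22H29N3O' into {'C': 22, 'H': 29, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(records, target_mass, tolerance):
    """Return records whose monoisotopic mass lies within +/- tolerance (Da)."""
    return [
        r for r in records
        if abs(monoisotopic_mass(r["formula"]) - target_mass) <= tolerance
    ]


def search_by_formula(records, formula):
    """Return records with the same elemental composition, ignoring element order."""
    wanted = parse_formula(formula)
    return [r for r in records if parse_formula(r["formula"]) == wanted]
```

With the example records, `search_by_mass(EXAMPLE_RECORDS, 46.0419, 0.001)` and `search_by_formula(EXAMPLE_RECORDS, "C2H6O")` both return ethanol and dimethyl ether: neither query can separate isomers on its own.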

Published in: Science

  • MarinLit is “article-centric”, not compound-centric. Compounds are only indexed when they are newly discovered, revised, or new to marine.
    All compound records link to the paper in which they were first mentioned. They are not linked to subsequent articles that describe them.
  • Transcript

    • 1. Using an online database of chemical compounds for the purpose of structure identification Antony Williams, Valery Tkachenko and Alexey Pshenichnov ACS San Francisco August 2014
    • 2. Free and Easy • Everything I will show in terms of ChemSpider is available for free online today • To make it easy to “take notes” these slides are already available at:
    • 3. Mass Spectrometry for Structure ID • Many applications of mass spectrometry are the identification of “knowns” • Known structures, previously characterized, previously identified and, increasingly, online • Dereplication, identification of “other manufacturers’” materials, metabolites, lipids analysis – can be supported by existing databases • What large database could serve mass spec?
    • 4. • ~32 million chemicals and growing • Data sourced from >500 different sources • Crowd sourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure centric hub for web-searching • …and a really big dictionary!!!
    • 5. ChemSpider
    • 6. What will ChemSpider give us?
    • 7. What will ChemSpider give us?
    • 8. What will ChemSpider give us?
    • 9. What will ChemSpider give us?
    • 10. Spectra: e.g. Cholesterol
    • 11. Spectra
    • 12. For Mass Spectrometrists • Valuable searches for Mass Spec would be: • Search the database by mass or formula for structure identification • Search subsets of data – e.g. “metabolism”, pesticides etc • Link structure-based data across the internet • Provide “programming interfaces” to integrate • Does ChemSpider provide value to Mass Spectrometrists?
    • 13. Pre-calculated data
    • 14. Data Source Selection • >32 million chemicals include • Vendor collections • Government databases • Individual/Lab data • Publication data • All segregated allowing for data source selection
    • 15. Data Source Selection - Type
    • 16. Data Source Selection - Individual
    • 17. Mass Spec Analysis Jim Little, Eastman Chemical
    • 18. ChemSpider Interface
    • 19. 1287 Hits Ranked by Defect
    • 20. 1287 Hits Ranked by # of References
    • 21. Top Ranked Hit
    • 22. Tinuvin 328
    • 23. What can I find on ChemSpider?
    • 24. What can I find?
    • 25. What can I find?
    • 26. Source and Purchase…
    • 27. What can I find on ChemSpider?
    • 28. External Calculation Engines
    • 29. What can I find on ChemSpider?
    • 30. and in the RSC Databases..
    • 31. Linked to the Publisher
    • 32. What can I find?
    • 33. And out to Google Patents
    • 34. And What About the Entire Web?
    • 35. The InChI Identifier
    • 36. InChIStrings Hash to InChIKeys
    • 37. Searching Internet by Structure
    • 38. Extended Study Sorting by references
    • 39. Position sorted by references
    • 40. Position 1 only
    • 41. Searching by Monoisotopic Mass
    • 42. Improved Searches Substructure Search with Mass Filter 352.239 +/- 0.0018
    • 43. Identification of “Known Unknowns” • “Known Unknowns” can be identified by searching in ChemSpider • Searching of “segregated” datasets can be performed • Datasets can be expanded for specific projects – for example, natural products ID…
    • 44. We Are Doomed I Tell You!!!
    • 45.
    • 46. The PharmaSea Website
    • 47. What about ID’ing “Unknowns”? • Bring together various spectroscopic techniques for structure elucidation – primarily NMR and Mass Spectrometry • Work to identify substructural fragments • Use Computer-Assisted Structure Elucidation
    • 48. • Index literature related to marine natural products: 26K articles and growing • Structure searchable database • Data includes taxonomy, location and literature • “Spectral features” generated algorithmically • Utilize the spectral features for dereplication • Initially NMR and MS
    • 49. Web Services
    • 50. Web Services Open Up Collaboration • Agilent, Bruker, Waters and Thermo all using or investigating our web-based services for compound lookup • Many academic sites integrating directly – metabonomics, name lookup, mass-based searching
    • 51. Results of the ChemSpider Search in the MarkerLynx Worksheet
    • 52. Hit Details in ChemSpider
    • 53. Future Developments • Enhanced support for Multiple Substructures • Mass to formula conversion • Expand data sources with MS focus
    • 54. Acknowledgments • RSC Cheminformatics Team • James Little, Eastman Chemical Company • Depositors of data – there are many!
    • 55. Thank you Email: ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: SLIDES:
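
Slide 53 lists “Mass to formula conversion” as a future development. The sketch below shows one naive way such a conversion could work: a brute-force enumeration of C, H, N and O counts whose monoisotopic mass falls within a tolerance of the measured value. It is an illustration under stated assumptions, not the method used or planned by ChemSpider. The function names, the element limits, and the restriction to four elements are all hypothetical, and no chemical plausibility rules are applied.

```python
# Monoisotopic masses (Da) for the four elements considered in this sketch.
MASS_C = 12.0
MASS_H = 1.00782503
MASS_N = 14.00307401
MASS_O = 15.99491462


def format_formula(c, h, n, o):
    """Build a formula string such as 'C22H29N3O', skipping absent elements."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count > 0:
            parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)


def formulas_for_mass(target, tol, max_c=40, max_h=80, max_n=6, max_o=10):
    """Enumerate CHNO compositions with monoisotopic mass within +/- tol of target.

    Assumes tol is well below 0.5 Da, so that for fixed C, N and O counts
    at most one hydrogen count can match.
    """
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                partial = c * MASS_C + n * MASS_N + o * MASS_O
                remaining = target - partial
                if remaining < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(remaining / MASS_H)  # nearest hydrogen count
                if 0 <= h <= max_h:
                    mass = partial + h * MASS_H
                    if abs(mass - target) <= tol:
                        hits.append((format_formula(c, h, n, o), mass))
    return hits
```

Calling `formulas_for_mass(351.2311, 0.002)` yields a list that includes C22H29N3O alongside any other compositions that fit the window. More than one candidate is typical as mass increases, consistent with the abstract's point that a unique elemental composition is hard to determine for heavier compounds.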