Using an online database of chemical compounds for the purpose of structure identification



Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and it has been shown to be a very effective platform for developing structure identification tools. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in reference databases, and these “known unknowns” are now commonly available on aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for structure identification and review our progress in using the platform for natural product dereplication.
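The two query modes described above can be illustrated with a minimal sketch. It is not the ChemSpider implementation: the function names, the three-entry in-memory library, and the default tolerance are assumptions made for illustration, and the parser handles only simple formulas without brackets, charges, or isotope labels.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few common elements.
MONOISOTOPIC_MASS = {
    "H": 1.007825032,
    "C": 12.0,
    "N": 14.003074004,
    "O": 15.994914620,
    "F": 18.998403163,
    "P": 30.973761998,
    "S": 31.972071174,
    "Cl": 34.968852682,
}

# One element symbol followed by an optional count, e.g. "C8" or "Cl".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def monoisotopic_mass(formula: str) -> float:
    """Return the monoisotopic mass of a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    consumed = 0
    for match in FORMULA_TOKEN.finditer(formula):
        symbol, count = match.groups()
        if symbol not in MONOISOTOPIC_MASS:
            raise ValueError(f"unsupported element: {symbol}")
        total += MONOISOTOPIC_MASS[symbol] * (int(count) if count else 1)
        consumed += len(match.group(0))
    # Reject input containing characters the tokenizer skipped over.
    if not formula or consumed != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return total


def search_by_mass(observed_mass, library, tolerance_da=0.005):
    """Return (name, formula, mass) hits within the tolerance, closest first."""
    hits = []
    for name, formula in library:
        mass = monoisotopic_mass(formula)
        if abs(mass - observed_mass) <= tolerance_da:
            hits.append((name, formula, mass))
    return sorted(hits, key=lambda hit: abs(hit[2] - observed_mass))


def search_by_formula(formula, library):
    """Return the names of library entries whose formula matches exactly."""
    return [name for name, entry_formula in library if entry_formula == formula]


# Hypothetical three-compound library standing in for an aggregated database.
LIBRARY = [
    ("caffeine", "C8H10N4O2"),
    ("glucose", "C6H12O6"),
    ("theophylline", "C7H8N4O2"),
]

if __name__ == "__main__":
    for name, formula, mass in search_by_mass(180.0634, LIBRARY):
        print(f"{name:14s} {formula:12s} {mass:.4f}")
    print(search_by_formula("C6H12O6", LIBRARY))
```

In this toy library, glucose and theophylline differ by only about 0.0013 Da, so a 0.005 Da mass window returns both, whereas a formula query returns only one. That is the sense in which an elemental composition, when it can be determined, narrows the hit list more than a mass alone.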

Published in: Science

  • MarinLit is ‘article-centric’, not compound-centric. Compounds are indexed only when they are newly discovered, revised, or new to marine sources.
    All compound records link to the paper in which they were first mentioned. They are not linked to subsequent articles that describe them.

    1. Using an online database of chemical compounds for the purpose of structure identification Antony Williams, Valery Tkachenko and Alexey Pshenichnov ACS San Francisco August 2014
    2. Free and Easy • Everything I will show in terms of ChemSpider is available for free online today • To make it easy to “take notes” these slides are already available at:
    3. Mass Spectrometry for Structure ID • Many applications of mass spectrometry involve the identification of “knowns” • Known structures, previously characterized, previously identified and, increasingly, online • Dereplication, identification of other manufacturers’ materials, metabolites, lipids analysis – can be supported by existing databases • What large database could serve mass spec?
    4. • ~32 million chemicals and growing • Data sourced from >500 different sources • Crowd sourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure centric hub for web-searching • …and a really big dictionary!!!
    5. ChemSpider
    6. What will ChemSpider give us?
    7. What will ChemSpider give us?
    8. What will ChemSpider give us?
    9. What will ChemSpider give us?
    10. Spectra: e.g. Cholesterol
    11. Spectra
    12. For Mass Spectrometrists • Valuable searches for Mass Spec would be: • Search the database by mass or formula for structure identification • Search subsets of data – e.g. “metabolism”, pesticides, etc. • Link structure-based data across the internet • Provide “programming interfaces” to integrate • Does ChemSpider provide value to Mass Spectrometrists?
    13. Pre-calculated data
    14. Data Source Selection • >32 million chemicals include • Vendor collections • Government databases • Individual/Lab data • Publication data • All segregated allowing for data source selection
    15. Data Source Selection - Type
    16. Data Source Selection - Individual
    17. Mass Spec Analysis Jim Little, Eastman Chemical
    18. ChemSpider Interface
    19. 1287 Hits Ranked by Defect
    20. 1287 Hits Ranked by # of References
    21. Top Ranked Hit
    22. Tinuvin 328
    23. What can I find on ChemSpider?
    24. What can I find?
    25. What can I find?
    26. Source and Purchase…
    27. What can I find on ChemSpider?
    28. External Calculation Engines
    29. What can I find on ChemSpider?
    30. and in the RSC Databases..
    31. Linked to the Publisher
    32. What can I find?
    33. And out to Google Patents
    34. And What About the Entire Web?
    35. The InChI Identifier
    36. InChI Strings Hash to InChIKeys
    37. Searching Internet by Structure
    38. Extended Study Sorting by references
    39. Position sorted by references
    40. Position 1 only
    41. Searching by Monoisotopic Mass
    42. Improved Searches Substructure Search with Mass Filter 352.239 +/- 0.0018
    43. Identification of “Known Unknowns” • “Known Unknowns” can be identified by searching in ChemSpider • Searching of “segregated” datasets can be performed • Datasets can be expanded for specific projects – for example, natural products ID…
    44. We Are Doomed I Tell You!!!
    45.
    46. The PharmaSea Website
    47. What about ID’ing “Unknowns”? • Bring together various spectroscopic techniques for structure elucidation – primarily NMR and Mass Spectrometry • Work to identify substructural fragments • Use Computer-Assisted Structure Elucidation
    48. • Index literature related to marine natural products: 26K articles and growing • Structure searchable database • Data includes taxonomy, location and literature • “Spectral features” generated algorithmically • Utilize the spectral features for dereplication • Initially NMR and MS
    49. Web Services
    50. Web Services Open Up Collaboration • Agilent, Bruker, Waters and Thermo all using or investigating our web-based services for compound lookup • Many academic sites integrating directly – metabonomics, name lookup, mass-based searching
    51. Results of the ChemSpider Search in the MarkerLynx Worksheet
    52. Hit Details in ChemSpider
    53. Future Developments • Enhanced support for Multiple Substructures • Mass to formula conversion • Expand data sources with MS focus
    54. Acknowledgments • RSC Cheminformatics Team • James Little, Eastman Chemical Company • Depositors of data – there are many!
    55. Thank you Email: ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: SLIDES:
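
The mass filter quoted on slide 42 (352.239 +/- 0.0018) and the reference-count ranking named on slides 20 and 38 to 40 can be combined as in the sketch below. This is only an illustration of the idea under stated assumptions: the `Candidate` record, both function names, and every entry in `HITS` are hypothetical, and none of the masses or reference counts come from ChemSpider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical search hit: name, monoisotopic mass (Da), reference count."""
    name: str
    monoisotopic_mass: float
    reference_count: int


def filter_by_mass_window(candidates, target_mass, half_width):
    """Keep candidates whose mass lies within target_mass +/- half_width."""
    return [
        c for c in candidates
        if abs(c.monoisotopic_mass - target_mass) <= half_width
    ]


def rank_by_references(candidates):
    """Order candidates so the most frequently referenced one comes first."""
    return sorted(candidates, key=lambda c: c.reference_count, reverse=True)


# Invented hit list for illustration only.
HITS = [
    Candidate("candidate A", 352.2383, 12),
    Candidate("candidate B", 352.2402, 310),
    Candidate("candidate C", 352.2391, 45),
    Candidate("candidate D", 352.2450, 900),
]

if __name__ == "__main__":
    in_window = filter_by_mass_window(HITS, target_mass=352.239, half_width=0.0018)
    for hit in rank_by_references(in_window):
        print(f"{hit.name}: {hit.monoisotopic_mass:.4f} Da, {hit.reference_count} references")
```

Filtering before ranking matters in this sketch: candidate D has the most references but falls outside the mass window, so it never reaches the ranking step.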