Using an online database of chemical compounds for the purpose of structure identification
Online databases can be used for structure identification. The Royal Society of Chemistry provides access to an online database containing tens of millions of compounds, and it has proven to be a very effective platform for developing structure identification tools. In many cases a compound that is unknown to an investigator is already known in the chemical literature or in a reference database, and these “known unknowns” are now commonly available on aggregated internet resources. Such compounds in commercial, environmental, forensic, and natural product samples can be identified by searching these large aggregated databases, querying by either elemental composition or monoisotopic mass. Searching by elemental composition is the preferred approach, but it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results can be refined by appropriate filtering to identify the compounds. We will report on integrated filtering and search approaches on our aggregated compound database for the purpose of structure identification and review our progress in using the platform for natural product dereplication.
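
The link between the two query types mentioned above (elemental composition and monoisotopic mass) can be illustrated with a minimal Python sketch. It is not ChemSpider code: the function names `parse_formula` and `monoisotopic_mass` are hypothetical, the element table is deliberately small, and the isotope masses are standard values rounded to six decimals. Only flat formulas such as `C22H29N3O` are handled (no brackets, charges or isotope labels).

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope, rounded to 6 decimals.
# Assumption: this small table is enough for illustration; extend as needed.
MONOISOTOPIC_MASS = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
    "Cl": 34.968853,
}


def parse_formula(formula: str) -> dict:
    """Turn a flat formula like 'C22H29N3O' into {'C': 22, 'H': 29, 'N': 3, 'O': 1}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula syntax: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in a neutral formula."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in MONOISOTOPIC_MASS:
            raise ValueError(f"element not in table: {symbol}")
        total += MONOISOTOPIC_MASS[symbol] * count
    return total


if __name__ == "__main__":
    # C22H29N3O is used purely as an example formula (assumed, not taken from the slides).
    print(round(monoisotopic_mass("C22H29N3O"), 4))
```

Several different formulas can fall within the same narrow mass window, and this becomes more likely as the mass grows, which is one reason a unique elemental composition is hard to assign for larger compounds.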

Published in: Science
  • MarinLit is “article-centric”, not compound-centric. Compounds are only indexed when they are newly discovered, revised, or new to marine.
    All compound records link to the paper in which they were first mentioned. They are not linked to subsequent articles that describe them.
  • Using an online database of chemical compounds for the purpose of structure identification

    1. Using an online database of chemical compounds for the purpose of structure identification. Antony Williams, Valery Tkachenko and Alexey Pshenichnov. ACS San Francisco, August 2014
    2. Free and Easy
        • Everything I will show in terms of ChemSpider is available for free online today
        • To make it easy to “take notes” these slides are already available at: www.slideshare.net/AntonyWilliams/
    3. Mass Spectrometry for Structure ID
        • Many applications of mass spectrometry involve the identification of “knowns”
        • Known structures, previously characterized, previously identified and, increasingly, online
        • Dereplication, identification of “other manufacturers’” materials, metabolites and lipids analysis can all be supported by existing databases
        • What large database could serve mass spec?
    4. • ~32 million chemicals and growing
        • Data sourced from >500 different sources
        • Crowd-sourced curation and annotation
        • Ongoing deposition of data from our journals and our collaborators
        • Structure-centric hub for web-searching
        • …and a really big dictionary!!!
    5. ChemSpider
    6. What will ChemSpider give us?
    7. What will ChemSpider give us?
    8. What will ChemSpider give us?
    9. What will ChemSpider give us?
    10. Spectra: e.g. Cholesterol
    11. Spectra
    12. For Mass Spectrometrists
        • Valuable searches for Mass Spec would be:
        • Search the database by mass or formula for structure identification
        • Search subsets of data, e.g. “metabolism”, pesticides, etc.
        • Link structure-based data across the internet
        • Provide “programming interfaces” to integrate
        • Does ChemSpider provide value to Mass Spectrometrists?
    13. Pre-calculated data
    14. Data Source Selection
        • >32 million chemicals include
        • Vendor collections
        • Government databases
        • Individual/Lab data
        • Publication data
        • All segregated allowing for data source selection
    15. Data Source Selection - Type
    16. Data Source Selection - Individual
    17. Mass Spec Analysis (Jim Little, Eastman Chemical)
    18. ChemSpider Interface
    19. 1287 Hits Ranked by Defect
    20. 1287 Hits Ranked by # of References
    21. Top Ranked Hit
    22. Tinuvin 328
    23. What can I find on ChemSpider?
    24. What can I find?
    25. What can I find?
    26. Source and Purchase…
    27. What can I find on ChemSpider?
    28. External Calculation Engines
    29. What can I find on ChemSpider?
    30. and in the RSC Databases..
    31. Linked to the Publisher
    32. What can I find?
    33. And out to Google Patents
    34. And What About the Entire Web?
    35. The InChI Identifier
    36. InChIStrings Hash to InChIKeys
    37. Searching Internet by Structure
    38. Extended Study: Sorting by references
    39. Position sorted by references
    40. Position 1 only
    41. Searching by Monoisotopic Mass
    42. Improved Searches: Substructure Search with Mass Filter, 352.239 +/- 0.0018
    43. Identification of “Known Unknowns”
        • “Known Unknowns” can be identified by searching in ChemSpider
        • Searching of “segregated” datasets can be performed
        • Datasets can be expanded for specific projects, for example natural products ID…
    44. We Are Doomed I Tell You!!!
    45. http://www.pharma-sea.eu/
    46. The PharmaSea Website
    47. What about ID’ing “Unknowns”?
        • Bring together various spectroscopic techniques for structure elucidation, primarily NMR and Mass Spectrometry
        • Work to identify substructural fragments
        • Use Computer-Assisted Structure Elucidation
    48. • Index literature related to marine natural products: 26K articles and growing
        • Structure searchable database
        • Data includes taxonomy, location and literature
        • “Spectral features” generated algorithmically
        • Utilize the spectral features for dereplication
        • Initially NMR and MS
    49. Web Services
    50. Web Services Open Up Collaboration
        • Agilent, Bruker, Waters and Thermo all using or investigating our web-based services for compound lookup
        • Many academic sites integrating directly: metabonomics, name lookup, mass-based searching
    51. Results of the ChemSpider Search in the MarkerLynx Worksheet
    52. Hit Details in ChemSpider
    53. Future Developments
        • Enhanced support for Multiple Substructures
        • Mass to formula conversion
        • Expand data sources with MS focus
    54. Acknowledgments
        • RSC Cheminformatics Team
        • James Little, Eastman Chemical Company
        • Depositors of data – there are many!
    55. Thank you
        Email: williamsa@rsc.org
        ORCID: 0000-0002-2668-4821
        Twitter: @ChemConnector
        Personal Blog: www.chemconnector.com
        SLIDES: www.slideshare.net/AntonyWilliams
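
The workflow the slides walk through, a mass search with a tolerance window (slide 42 quotes 352.239 +/- 0.0018) followed by ranking the hits by number of references (slide 20), can be sketched in a few lines of Python. This is an illustration only, not the ChemSpider implementation or its web-service API: the `Hit` record, the function names, and every candidate in the example list are hypothetical placeholders. It also assumes the query mass and the stored masses refer to the same species (both neutral, or both the same ion).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str                 # hypothetical label, not a real database record
    monoisotopic_mass: float  # Da
    n_references: int         # number of literature/data-source references


def ppm_tolerance(target_mass: float, delta: float) -> float:
    """Express an absolute mass window (Da) as parts per million of the target mass."""
    return delta / target_mass * 1e6


def filter_by_mass(hits, target_mass: float, delta: float):
    """Keep hits whose mass lies within target_mass +/- delta."""
    return [h for h in hits if abs(h.monoisotopic_mass - target_mass) <= delta]


def rank_by_references(hits):
    """Most-referenced compounds first, on the assumption that well-known
    compounds are the most likely 'known unknowns'."""
    return sorted(hits, key=lambda h: h.n_references, reverse=True)


# Made-up candidates for illustration only.
CANDIDATES = [
    Hit("candidate A", 352.2383, 12),
    Hit("candidate B", 352.2400, 250),
    Hit("candidate C", 352.2450, 900),  # outside the window, so dropped
    Hit("candidate D", 352.2380, 3),
]

if __name__ == "__main__":
    window = filter_by_mass(CANDIDATES, 352.239, 0.0018)
    for hit in rank_by_references(window):
        print(hit.name, hit.monoisotopic_mass, hit.n_references)
    print(f"window is about {ppm_tolerance(352.239, 0.0018):.2f} ppm")
```

Filtering before ranking matters here: candidate C has the most references but falls outside the mass window, so it never reaches the ranked list.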