Utilizing Online Databases for the Purpose of Structure Identification – Approaches Utilizing the ChemSpider Resource

As scientists, we now have a number of domain-specific chemistry databases available online. While there are hundreds of these “compound databases” available for us to access, very few were developed with the concerns of the analytical scientist in mind. ChemSpider is a free resource from the Royal Society of Chemistry hosting over 28 million chemicals from over 400 data sources, and it is already used by the mass spectrometry community in particular to aid in the process of structure verification. This presentation will give an overview of how ChemSpider has become one of the internet’s primary resources for chemists, providing access to chemicals, experimental and predicted properties, patents, publications and analytical data, and ultimately acting as a structure-centric hub. The importance of programming interfaces for integration, and how the primary mass spectrometry vendors are already using ChemSpider, will be discussed. The progress towards a dereplication platform for natural products using a combination of mass spectrometry and NMR spectroscopy data will be outlined, and the path forward to a fully chemical structure-enabled internet, together with the importance of data interchange standards to enable it, will be discussed.

Published in: Technology

  1. Utilizing Online Databases for the Purpose of Structure Identification – Approaches Utilizing the ChemSpider Resource. Antony Williams, Triangle Chromatography Discussion Group, May 16th 2013
  2. Free and Easy
     • Everything I will show in terms of ChemSpider is available for free online today
     • To make it easy to “take notes” these slides will go online tonight for you to download: www.slideshare.net/AntonyWilliams/
  3. Mass Spectrometry for Structure ID
     • Many applications of mass spectrometry are the identification of “knowns”
     • Known structures, previously characterized, previously identified and, increasingly, online
     • Dereplication, identification of “other manufacturers” materials, metabolites, lipids analysis – can be supported by existing databases
     • What large database could serve mass spec.?
  4. ChemSpider
     • > 28 million chemicals with associated data
     • Linked out to 400 data sources…
  5. ChemSpider
  6. What will ChemSpider give us??
  7. What will ChemSpider give us??
  8. What will ChemSpider give us??
  9. What will ChemSpider give us??
  10. Spectra: e.g. Cholesterol
  11. Spectra
  12. For Mass Spectrometrists
     • Valuable searches for Mass Spec would be:
       – Search the database by mass or formula for structure identification
       – Search subsets of data – e.g. “metabolism”, pesticides etc
       – Link structure-based data across the internet
       – Provide “programming interfaces” to integrate
       – Does ChemSpider provide value to Mass Spectrometrists?
  13. Pre-calculated data
  14. Data Source Selection
     • >28 million chemicals include
       – Vendor collections
       – Government databases
       – Individual/Lab data
       – Publication data
       – All segregated allowing for data source selection
  15. Data Source Selection - Type
  16. Data Source Selection - Individual
  17. Mass Spec Analysis. Jim Little, Eastman Chemical
  18. ChemSpider Interface
  19. 1287 Hits Ranked by Defect
  20. 1287 Hits Ranked by # of References
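The two orderings named on slides 19 and 20 can be pictured as two sorts over the same hit list. The following is a minimal sketch, not ChemSpider's implementation: the `Hit` record, its field names and the sample values are hypothetical, and "defect" is assumed here to mean the absolute difference between a candidate's mass and the query mass.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str          # hypothetical label for a candidate structure
    mono_mass: float   # monoisotopic mass of the candidate, in Da
    n_references: int  # illustrative count of associated references


def rank_by_mass_difference(hits, query_mass):
    """Order hits so the smallest absolute mass difference comes first."""
    return sorted(hits, key=lambda h: abs(h.mono_mass - query_mass))


def rank_by_references(hits):
    """Order hits so the most-referenced candidate comes first."""
    return sorted(hits, key=lambda h: h.n_references, reverse=True)


# Made-up hits for illustration only.
hits = [
    Hit("candidate A", 351.2311, 3),
    Hit("candidate B", 351.2318, 250),
    Hit("candidate C", 351.2290, 12),
]

by_mass = rank_by_mass_difference(hits, query_mass=351.2311)
by_refs = rank_by_references(hits)
```

With many isobaric hits, the two sorts generally put different candidates on top, which is why the reference count is a useful second criterion when the mass difference alone cannot separate candidates.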
  21. Top Ranked Hit
  22. Tinuvin 328
  23. What can I find on ChemSpider?
  24. What can I find on ChemSpider?
  25. What can I find on ChemSpider?
  26. Source and Purchase…
  27. What can I find on ChemSpider?
  28. External Calculation Engines
  29. What can I find on ChemSpider?
  30. and in the RSC Databases..
  31. Linked to the Publisher
  32. What can I find on ChemSpider?
  33. And out to Google Patents
  34. And is it “Dangerous”
  35. And What About the Entire Web?
  36. The InChI Identifier
  37. InChI Strings Hash to InChIKeys
  38. Searching the Internet by Structure
  39. What can I find on ChemSpider?
     • Experimental properties
     • Predicted properties
     • Literature links
     • Book Links
     • Database links
     • Where to Buy
     • Patent links
     • Spectral data
     • Toxicity data
     • Virtual screening data
     • ……….and a hub for searching the entire internet!!!
  40. Extended Study: Sorting by references
  41. Position sorted by references
  42. Position 1 only
  43. Searching by Monoisotopic Mass
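A search by monoisotopic mass needs a mass computed from the most abundant isotope of each element. The sketch below shows that calculation for simple formulas. It makes several assumptions: the parser handles only flat formulas such as `C22H29N3O` (no brackets, charges or isotope labels), the element table is deliberately small, and the formula used for Tinuvin 328 comes from general chemical knowledge rather than from the slides.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, rounded to 6 decimals.
MONOISOTOPIC = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a flat formula like 'C22H29N3O'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula syntax: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"element not in table: {element}")
        # A missing count means one atom of that element.
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# Assumed formula for Tinuvin 328 (not stated in the slides).
tinuvin_328_mass = monoisotopic_mass("C22H29N3O")
```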
  44. Improved Searches
     • Substructure Search with Mass Filter 352.239 +/- 0.0018
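The mass filter on slide 44 is an absolute window around a target mass. As a rough illustration, the sketch below applies that window to a list of candidate masses and expresses the same tolerance in parts per million. The target and tolerance are the values from the slide; the candidate masses and function names are made up for the example.

```python
TARGET_MASS = 352.239  # Da, from the slide
TOLERANCE = 0.0018     # Da, from the slide


def within_window(mass, target=TARGET_MASS, tol=TOLERANCE):
    """True if the mass lies inside target +/- tol."""
    return abs(mass - target) <= tol


def tolerance_ppm(target, tol):
    """Express an absolute tolerance (Da) relative to the target, in ppm."""
    return tol / target * 1e6


# Hypothetical candidate masses, e.g. from substructure search hits.
candidates = [352.2384, 352.2420, 352.2401, 352.2350]
kept = [m for m in candidates if within_window(m)]
window_ppm = tolerance_ppm(TARGET_MASS, TOLERANCE)
```

At this target mass the window works out to roughly 5 ppm, a convenient way to compare the filter against an instrument's stated mass accuracy.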
  45. Identification of “Known Unknowns”
     • “Known Unknowns” can be identified by searching in ChemSpider
     • Searching of “segregated” datasets can be performed
     • Datasets can be expanded for specific projects – for example, natural products ID…
  46. FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline
  47. The PharmaSea Project
     • PharmaSea project for the identification of natural products – dereplication approaches
       – Use MS searches of natural product slices to identify
       – Natural product data include from RSC databases (NPU) and ChemSpider data sources
  48. What about ID’ing “Unknowns”?
     • Bring together various spectroscopic techniques for structure elucidation – primarily NMR and Mass Spectrometry
     • Work to identify substructural fragments
     • Use Computer-Assisted Structure Elucidation
  49. CASE Systems
  50. Blind Trials of CASE: http://www.jcheminf.com/content/4/1/5
  51. The PharmaSea Project
     • PharmaSea project for the identification of natural products – dereplication approaches
       – Use MS searches of natural product slices to identify
       – Natural product data include from RSC databases (NPU) and ChemSpider data sources
       – Pre-fragment compounds and develop searches
       – Dereplication using NMR data
         • NMR features
         • Predicted spectra and “Verification approaches”
         • CASE based approaches
  52. Web Services Open Up Collaboration
     • Agilent, Bruker, Waters and Thermo all using or investigating our web-based services for compound lookup
     • Many academic sites integrating directly – metabonomics, name lookup, mass-based searching
  53. Web Services
  54. Results of the ChemSpider Search in the MarkerLynx Worksheet
  55. Hit Details in ChemSpider
  56. Calculation of Elemental Composition & ChemSpider Search of Lipid Maps Database Performed via MarkerLynx
  57. Some usage statistics
     • ca. 200 visitors at any one time, ~30,000 visits per day
     • Mar 4-Apr 3, 2013
       – Visits = 731,656
       – Unique Visitors = 527,008
     • Independent servers to support other projects
  58. Crowdsourcing ChemSpider
     • ChemSpider is crowdsourced
     • Community deposition, annotation and curation
     • Anyone can “Leave Feedback”
     • Registered users can add data
  59. Future Developments
     • Support for Multiple Substructures
     • Mass to formula conversion
     • Expand data sources with MS focus
     • Hosting reference data for Metabonomics
     • Investigating how to serve chromatographers
     • What can we do for you???
     • Anybody in the audience teach spectroscopy??
  60. SpectraSchool http://spectraschool.rsc.org/
  61. Presently in Beta
     • Storage and display of ASSIGNED spectra – already started with NMR spectral assignment
  62. Coming Soon – NIST DB in ChemSpider
  63. How long until Mobile StructureID?
  64. Formula Generation
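Formula generation (the "mass to formula conversion" listed under future developments) can be illustrated with a brute-force enumeration. This is only a sketch under stated assumptions, not the method shown on the slide: it considers C, H, N and O only, caps the N and O counts with made-up limits, and uses a crude hydrogen ceiling (H ≤ 2C + N + 2) as its sole plausibility check. Production tools apply much richer chemical rules.

```python
# Monoisotopic masses (Da), rounded to 6 decimals.
MONO = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def candidate_formulas(target, tol, max_n=4, max_o=4):
    """Enumerate (C, H, N, O) counts whose mass lies within target +/- tol."""
    found = []
    max_c = int(target // MONO["C"])
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                # Only the nearest integer hydrogen count can fit a small tol.
                h = round(rest / MONO["H"])
                if h < 0 or h > 2 * c + n + 2:
                    continue  # crude valence ceiling on hydrogen
                mass = (c * MONO["C"] + h * MONO["H"]
                        + n * MONO["N"] + o * MONO["O"])
                if abs(mass - target) <= tol:
                    found.append(((c, h, n, o), mass))
    return found


# Example query: a neutral monoisotopic mass with a 2 mDa tolerance.
results = candidate_formulas(351.2311, 0.002)
formulas = [counts for counts, _ in results]
```

Even a narrow tolerance usually yields several candidate compositions, which is where a database lookup or additional spectral data helps to narrow the list.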
  65. NMR Prediction
  66. Acknowledgments
     • RSC eScience Team
     • James Little, Eastman Chemical Company
     • Alexey Pshenichnov, University of Leicester – SpectraSchool
     • ACD/Labs – Assigned Spectra Display Widget
     • Depositors of data – there are many!
  67. Thank you
     Email: williamsa@rsc.org
     Twitter: @ChemConnector
     Personal Blog: www.chemconnector.com
     SLIDES: www.slideshare.net/AntonyWilliams
