Navigating the Complex Web of Chemistry Using ChemSpider


Free and open access resources for scientists are increasingly available on the internet. Coupled with the growing availability of Open Source software tools, this puts us in the middle of a revolution in data availability and in the tools used to manipulate these data. ChemSpider is a free-access website built with the intention of providing a structure-centric community for chemists. As an aggregator of chemistry-related information from many sources (at present over 21.5 million unique chemical entities from over 200 separate data sources), ChemSpider has taken on the task of curating publicly available data sources both robotically and manually. This presentation provides an overview of the ChemSpider platform and how it is fast becoming the centralized hub for sourcing information about chemical entities.

Published in: Technology, Education

  1. 1. Navigating the Complex Web of Chemistry Using ChemSpider
  2. 2. Antony Williams vs Identifiers: Old Passport ID; Dad, Tony, others; SSN; Green Card; License; 5 email addresses; ChemSpiderman (blog, Twitter account, Facebook, Friendfeed); OpenID; …
  3. 3. Aspirin vs Chemical Identifiers
  4. 4. Aspirin names and synonyms
       • Text searches depend on correct association
       • 335 suggested identifiers for Aspirin just on PubChem!
       • Disambiguation dictionaries are necessary
  5. 5. Linked Data Cloud
  6. 6.
       • … the premium database producers are using some automatic tools to prepare a ‘first draft’ of a database record, to be refined by eye.
       • Coupled with the public internet as a distribution method of choice, it is becoming possible for the first time to create and distribute new structure based databases at much lower costs, or even free of charge.
  7. 9. The Final Search Strategy
  8. 10. All Those Names, One Structure
  9. 11. Content is King and Quality Costs
       • Chemistry “content” is big business. Not everyone can afford it.
           • Patent searching
           • Structures and properties
           • Drug databases
           • Literature databases
       • Chemical Abstracts Service (CAS), the “Gold Standard” in Chemistry related information
           • 101 years of content
           • $260 million revenue (2006)
           • >50 million substances
           • Proprietary platform
  10. 12. Searching Chemistry on the Internet
       • How complete a result set will we get if we search for “chemicals” by name?
       • Is there a better way to link chemistry databases? Linking by “names” is dangerous
       • Chemists want structure and SUBstructure searching
  11. 13. The InChI Identifier
  12. 14. Multiple Layers
  13. 15. InChIStrings Hash to InChIKeys
  14. 16. Oleoylethanolamine
  15. 17. InChIKey Searches Work
  16. 18. Search Engine Dependencies
  17. 19. Search Engine Dependencies
  18. 20. InChIs have traction…
  19. 21. RDF Linking of Structures
  20. 22. PubChem
  21. 23. The Simplest Organic Molecule
  22. 24. Question Everything online:
  23. 25. The Structure-Based Data Cloud
  24. 26. Vancomycin
  25. 28. Vancomycin
       • Who will curate?
       • How would you clean such a large dataset?
  26. 29. Vancomycin on ChemSpider
  27. 30. Vancomycin
  28. 31. Vancomycin: Search Molecular SKELETON; Search Full Molecule
  29. 32. Full Skeleton Search: 104 Hits
  30. 33. Full Molecule Search: 4 Hits
  31. 34. What is ChemSpider?
       • ChemSpider is:
           • Building a Structure Centric Community for Chemists
           • 22.2 million compounds, >200 data sources
           • A deposition and curation platform
           • A publishing platform for the community
           • Grows daily – more depositions, more links, more data sources
  32. 35. For Chemical Compounds
       • Vendor sites – Aldrich, Alfa Aesar, TCI and 100s of others
       • Government databases – PubChem, DSSTox, FDA databases, ChemIDPlus, …
       • Biological Databases – Protein Database, Stitch, KEGG, ChEBI, …
       • Analytical databases – NMRShiftDB, …
  33. 36. How Was ChemSpider Built?
       • ChemSpider was a “hobby project”
       • Housed in a basement and running off three servers – one bought, two built
       • May 2009
  34. 37.
       • 3 servers – 2 homebuilt
       • .NET architecture
       • SQL server
       • Homebuilt structure/substructure
       • Commercial components
       • Open Source Components
           • OpenBabel, Jmol, JSpecView, NCBI Toolkit, InChI Libraries
  35. 38. Search Cholesterol
  36. 39. Search Cholesterol
  37. 40. Search Cholesterol
  38. 41. Search Cholesterol
  39. 42. Linked across the internet
  40. 43. Kyoto Encyclopedia of Genes and Genomes
  41. 44. Links to Patents based on structure
  42. 46. Answering Questions for Chemists
       • Questions a chemist might ask…
           • What is the melting point of n-butanol?
           • What is the chemical structure of Xanax?
           • Chemically, what is phenolphthalein?
           • What are the stereocenters of cholesterol?
           • Where can I find publications about xylene?
           • What are the different trade names for Ketoconazole?
           • What is the NMR spectrum of Aspirin?
           • What are the safety handling issues for Thymol Blue?
  43. 47. Complex Data and Information
  44. 48. Remember – QUALITY ISSUES
  45. 49. The FDA’s DailyMed
  46. 50. Incorrect Structures
  47. 51. Does one stereocenter matter?
       • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  48. 52. Crowd-sourcing Chemistry Curation
  49. 53. We Need Recognition and Rewards
  50. 54. Master Curators, Curators, Depositors
  51. 55. Collaborating with Wikipedia
       • Long term project to curate chemical compounds
       • Robotically linking ChemSpider to Wikipedia at present
       • Will layer on InChI Strings and InChIKeys shortly and make Wikipedia structure searchable
  52. 56. Blogs need InChIs too!
  53. 57. Blogs need InChIs too!
  54. 58. Use Intelligent Structures : ChemSpider Embed Web Service
  55. 59. ChemSpider Web Services
  56. 60. Semantic Mark-up for Chemistry
       • Semantic mark-up for chemistry is here
           • RSC Project Prospect
           • Nature Publishing Group compound linking
           • ChemMantis
  57. 61. Nature Chemistry Compound Pages
  58. 62. Project Prospect
  59. 63. ChemMantis
  60. 64. Deposit Structures
  61. 65. Species – linked to Wikipedia
  62. 66. Semantic Linking of Structures
       • What would you want to link off a structure?
           • Chemical suppliers
           • Other publications
           • Analytical Data
           • Related Reactions
           • Wikipedia
           • Patents
           • “Everything”
  63. 67. The InChI “Resolver”
  64. 68. InChI Resolver to DOIs Structure Search the Web
  65. 70. Conclusions
       • Internet resources provide a collaborative community for chemistry
       • Crowdsourcing to expand, curate and integrate to the benefit of chemists
       • Searching the web for chemistry is arriving
       • InChIs are enabling chemistry on the internet
       • Question Quality!
  66. 71. [email_address] Twitter: ChemSpiderman
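
The contrast drawn in slides 4, 12 and 15 (linking by name needs a disambiguation dictionary, while an InChIKey gives every source the same fixed-format key) can be illustrated with a minimal Python sketch. The synonym table, the record list and all function names below are hypothetical and exist only for illustration; they are not part of ChemSpider or its web services. The pattern assumes the 27-character standard InChIKey layout, and the key shown is the standard InChIKey for aspirin.

```python
import re
from collections import defaultdict

# Assumed layout of a standard InChIKey: 14 letters (hash of the connectivity
# layers), 10 letters (hash of the remaining layers plus flag and version
# characters), and 1 letter for protonation.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # standard InChIKey of aspirin

# Hypothetical disambiguation dictionary: normalised name -> InChIKey.
SYNONYMS = {
    "aspirin": ASPIRIN_KEY,
    "acetylsalicylic acid": ASPIRIN_KEY,
    "2-acetoxybenzoic acid": ASPIRIN_KEY,
}

# Hypothetical records from three different sources, each using its own name.
RECORDS = [
    {"source": "vendor_catalogue", "name": "Aspirin"},
    {"source": "government_db", "name": "Acetylsalicylic  acid"},
    {"source": "blog_post", "name": "ASA"},  # ambiguous, not in the dictionary
]


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def is_inchikey(text: str) -> bool:
    """Return True if the text has the shape of a standard InChIKey."""
    return bool(INCHIKEY_PATTERN.match(text))


def connectivity_block(key: str) -> str:
    """Return the first block, which is derived from the molecular skeleton."""
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {key!r}")
    return key.split("-")[0]


def link_records(records):
    """Group records by InChIKey; names missing from the dictionary stay unresolved."""
    linked = defaultdict(list)
    unresolved = []
    for record in records:
        key = SYNONYMS.get(normalise(record["name"]))
        if key is None:
            unresolved.append(record)
        else:
            linked[key].append(record)
    return dict(linked), unresolved


if __name__ == "__main__":
    linked, unresolved = link_records(RECORDS)
    for key, group in linked.items():
        print(key, [r["source"] for r in group])
    print("unresolved:", [r["name"] for r in unresolved])
```

The unresolved record shows why name-based linking is fragile: any name missing from the dictionary is lost, whereas sources that publish the InChIKey itself can be joined directly. Comparing only the first block is one way to approximate the skeleton-versus-full-molecule distinction from the Vancomycin slides, since stereochemistry is hashed into the second block.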