Taming the Wild West of Internet-Based Chemistry: You Can Help


I am an adjunct professor at the University of North Carolina at Chapel Hill, so when I stopped by yesterday for a business meeting I was told that I had been lined up to give a talk to the students at 1 pm. I had 20 minutes to prepare and assembled a mishmash of information that might be of value to Citizen Chemists: those who might want to contribute to chemistry on the internet.

Published in: Technology, Education

  1. Taming the Wild, Wild West of Chemistry on the Internet. Maybe YOU Can Help?
  2. Citizen Scientists Enable the Web
      • Who is writing about chemical compounds on Wikipedia?
      • Who is writing critical reviews of chemistry online?
      • Who is blogging about chemistry on the web?
  3. For Synthesis: TotallySynthetic.com
  4. Org Prep Daily (Blog)
  5. Molbank (Open Access Journal)
  6. Synthetic Pages (Website)
  7. Encyclopedic Articles (Wikipedia)
  9. Chemistry Online: An Overview
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Property databases
      • Chemical synthesis procedures
      • Scientific publications
      • Chemical vendors
      • Blogs
      • Wikis
      • Open Notebook Science
  10. What and Who Do You Trust?
  11. Compounds and Identifiers
  12. What Is ChemSpider?
      • ChemSpider is:
          ◦ Building a Structure-Centric Community for Chemists
          ◦ >23 million compounds, ca. 250 data sources
          ◦ A deposition and curation platform
          ◦ A publishing platform for the community
          ◦ Growing daily: more depositions, more links, more data sources
  13. Search Cholesterol
  14. Search Cholesterol
  15. Search Cholesterol
  16. Search Cholesterol
  17. Search Cholesterol
  18. Linked Across the Internet
  19. Links Out from a Structure in ChemSpider
      • Chemical suppliers
      • Other publications
      • Analytical data
      • Related reactions
      • Wikipedia
      • Patents
      • “Everything”
  20. Linked to Millions of Articles
  21. Answering Questions for Chemists
      • Questions a chemist might ask…
          ◦ What is the melting point of n-butanol?
          ◦ What is the chemical structure of Xanax?
          ◦ Chemically, what is phenolphthalein?
          ◦ What are the stereocenters of cholesterol?
          ◦ Where can I find publications about xylene?
          ◦ What are the different trade names for ketoconazole?
          ◦ What is the NMR spectrum of aspirin?
          ◦ What are the safety handling issues for thymol blue?
  22. What Is the Structure of Flibanserin?
  23. What Is the Structure of Flibanserin?
  24. Complex Data and Information
  25. Various Searches
      • Structure searching
      • Substructure searching
      • Subset searching: choose from 200 data sources
      • Property searching
      • Searches are used in various ways by different types of chemists…
  26. ChemSpider Searches
  27. ChemSpider Searches
  28. Antony Williams vs. Identifiers
      • Passport ID
      • Dad, Tony, others
      • SSN
      • Green Card
      • License
      • 5 email addresses
      • ChemSpiderman (blog, Twitter account, Facebook, FriendFeed)
      • OpenID
      • …
  29. Aspirin vs. Chemical Identifiers
  30. Aspirin Names and Synonyms
      • Text searches depend on correctly associating names with structures
      • 335 suggested identifiers for aspirin on PubChem alone!
      • Disambiguation dictionaries are necessary
  34. The Final Search Strategy
  35. All Those Names, One Structure
  36. Connections Can Lead Anywhere
  37. The InChI Identifier
  38. Multiple Layers
  39. InChI Strings Hash to InChIKeys
  40. Oleoylethanolamine
  41. Search Engine Dependencies
  42. Search Engine Dependencies
  43. Vancomycin
  44. Vancomycin
      • Who will curate?
      • How would you clean such a large dataset?
  45. Chemistry on the Internet
      • Much of the information is based on assertions: user beware!
      • The quality of the available information varies widely, so how does the user know what is and is not “correct”?
  46. Caution! Question Everything!
  47. Question Everything Online: www.dhmo.org
  48. Vancomycin on ChemSpider
  49. Vancomycin
  50. Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
  51. Full Skeleton Search: 104 Hits
  52. Full Molecule Search: 4 Hits
  53. The EXPERTS Must Get It Right?!
  54. Wikipedia, C&E News, PubChem
      • C&E News (from ACS)
  55. “Lathosterol”
  56. “Lathosterol”
  57. “Lathosterol”
  58. “Lathosterol” Removed
  60. “Lathosterol” on PubChem
  61. Crowdsourcing Chemistry Curation
      • Crowdsourced curation: identify and tag errors, edit names and synonyms, identify records to deprecate
  62. Citizen Scientists
  63. Become a Data Source
  65. Synthesis Procedures
  66. Links to Data or Deposit Data
  67. Your Blog Posted Online?
  68. Upload Spectral Data: OPEN Data?
  69. Semantic Mark-up for Chemistry
      • Semantic mark-up for chemistry is here
          ◦ RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies), based on the OSCAR system
          ◦ ChemSpider Journal of Chemistry
          ◦ Nature Publishing Group compound linking
  70. ChemMantis and CJOC
  71. Name-Structure Pairs
  72. Deposit Structures
  73. Species: Linked to Wikipedia
  74. In Development: ChemSpider Synthesis
      • ChemSpider Synthesis will be a home for all things “synthetic”
      • An online resource for synthetic procedures from blogs, other online resources, RSC supplementary information, other publishers, etc.
      • Public peer review and feedback for synthetic procedures
  75. Online Journals and Live Data
  76. ChemSpider Everywhere: Embed
  77. ChemSpider Everywhere: Spectral Game
  78. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  79. ChemSpider Everywhere: ChemMobi
  80. ChemSpider Everywhere
      • Linked from Wikipedia
      • Linked from Open Notebook Science sites
      • Linked from blogs using structures and spectra
      • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw and open source applets
  81. Where Is ChemSpider Lacking?
      • ChemSpider is limited to “defined chemicals”. No support for:
          ◦ Polymers
          ◦ Minerals
          ◦ Markush structures
      • ChemSpider is very dependent on InChIs
          ◦ Stereochemistry around non-carbon centers
          ◦ Organometallics are not correctly represented
      • There are millions of errors on ChemSpider
  82. What’s Next?
      • Keep cleaning and depositing data
      • Enable discovery via the semantic web (RDF)
      • Integrate software: Symyx JDraw, NMRShiftDB
      • Integrate RSC content: a massive archive!
      • Integrate RSC publishing workflows and databases
  83. Project Focus
      • Continuing to build a community for chemistry
      • Building a public ADME/Tox database
      • Delivering ChemSpider Synthetic Pages
      • Delivering ChemSpider Analytical Data
      • Delivering ChemSpider Education
  84. People Make Change Happen: You Are Invited
      • Curate ChemSpider data and link to us
      • Deposit your data with us
          ◦ Structures
          ◦ Spectra
          ◦ Synthesis procedures
      • ChemSpider Synthesis is under development
  85. People Make Change Happen
      • ChemSpider was a “hobby project”
      • Housed in a basement and running off three servers: one bought, two built
      • Sensitive to weather and power stability
      • Went live at the ACS Spring 2007 meeting in Chicago
      • ca. 6,000 visitors a day, >50,000 transactions daily
  86. Organizations Scale Innovation
  87. Thank You
      • [email_address]
      • Twitter: ChemSpiderman
      • Blog: www.chemspider.com/blog
      • Slides: www.slideshare.net/AntonyWilliams
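Several slides in this talk make four related points. Text searches need disambiguation dictionaries. Many names should resolve to one structure. InChI strings are built from layers. A skeleton search returns far more hits than a full-molecule search (104 vs. 4 for vancomycin). The Python sketch below is a toy illustration of those ideas under stated assumptions, not how ChemSpider or PubChem actually implement search. Everything in it is an assumption: the synonym dictionary, the helper names (`normalize_name`, `name_to_inchi`, `skeleton`, `search`) and the record IDs are hypothetical. The InChI strings for aspirin and lactic acid were typed in by hand and should be regenerated with real InChI software before use. "Skeleton" is approximated by truncating a standard InChI at its first stereo (`/b`, `/t`, `/m`, `/s`) or isotopic (`/i`) layer.

```python
"""Toy sketch: many names -> one InChI -> full vs. skeleton matching.

All names, record IDs and helpers here are hypothetical illustrations.
The InChI strings were typed by hand; regenerate them with real InChI
software before relying on them.
"""
from typing import Dict, List, Optional

# Hand-typed standard InChIs (illustrative; verify before use).
ASPIRIN = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
LACTIC = "InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)"  # stereo unspecified
L_LACTIC = LACTIC + "/t2-/m0/s1"  # assumed (S)-enantiomer
D_LACTIC = LACTIC + "/t2-/m1/s1"  # assumed (R)-enantiomer

# Hypothetical disambiguation dictionary: normalized name -> one structure.
SYNONYMS: Dict[str, str] = {
    "aspirin": ASPIRIN,
    "acetylsalicylic acid": ASPIRIN,
    "2-acetoxybenzoic acid": ASPIRIN,
    "lactic acid": LACTIC,
    "l-lactic acid": L_LACTIC,
    "(s)-2-hydroxypropanoic acid": L_LACTIC,
    "d-lactic acid": D_LACTIC,
    "(r)-2-hydroxypropanoic acid": D_LACTIC,
}

# Hypothetical database records: record ID -> InChI.
RECORDS: Dict[str, str] = {
    "aspirin": ASPIRIN,
    "lactic": LACTIC,
    "l-lactic": L_LACTIC,
    "d-lactic": D_LACTIC,
}

# Standard InChI layer prefixes carrying stereo (b, t, m, s) or isotope (i) data.
STEREO_OR_ISOTOPE = {"b", "t", "m", "s", "i"}


def normalize_name(name: str) -> str:
    """Collapse whitespace and casefold. A simplification: case can carry
    meaning in chemical names (e.g. CIP 'R' vs. pseudoasymmetric 'r')."""
    return " ".join(name.split()).casefold()


def name_to_inchi(name: str) -> Optional[str]:
    """Resolve any known synonym to its single InChI, or None if unknown."""
    return SYNONYMS.get(normalize_name(name))


def skeleton(inchi: str) -> str:
    """Keep version, formula, connectivity, hydrogen and charge layers;
    cut at the first stereo or isotopic layer."""
    parts = inchi.split("/")
    kept = parts[:2]  # "InChI=1S" and the molecular formula
    for layer in parts[2:]:
        if layer[:1] in STEREO_OR_ISOTOPE:
            break
        kept.append(layer)
    return "/".join(kept)


def search(query: str, records: Dict[str, str], mode: str = "full") -> List[str]:
    """Return sorted record IDs whose InChI matches the query exactly
    ('full') or after stripping stereo/isotope layers ('skeleton')."""
    if mode not in ("full", "skeleton"):
        raise ValueError(f"unknown mode: {mode!r}")

    def key(inchi: str) -> str:
        return skeleton(inchi) if mode == "skeleton" else inchi

    target = key(query)
    return sorted(rid for rid, inchi in records.items() if key(inchi) == target)


if __name__ == "__main__":
    query = name_to_inchi("L-Lactic  Acid")
    print(search(query, RECORDS, "full"))      # ['l-lactic']
    print(search(query, RECORDS, "skeleton"))  # ['d-lactic', 'l-lactic', 'lactic']
```

The skeleton search matches both enantiomers and the stereo-unspecified record, while the full search matches only one. This mirrors, on a tiny scale, the 104-hit vs. 4-hit vancomycin comparison. A similar split underlies the InChIKey. Its first 14-character block is a hash of the main (connectivity) layer, so stereoisomers share that block and differ only in the later blocks.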