RSC ChemSpider – Building An Internet Based Community For Chemists

This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider. It gives a general overview of data quality online, crowdsourced deposition and curation, and our progress toward delivering a solution to the community for resourcing data.

Published in: Technology, Education
Transcript of "RSC ChemSpider – Building An Internet Based Community For Chemists"

  1. RSC ChemSpider – Building an Internet Based Community for Chemists
  2. Where is chemistry online?
     - Encyclopedic articles (Wikipedia)
     - Chemical vendor databases
     - Metabolic pathway databases
     - Property databases
     - Patents with chemical structures
     - Drug Discovery data
     - Scientific publications
     - Compound aggregators
     - Blogs/Wikis and Open Notebook Science
  3. Chemistry on the Internet TODAY
     - Chemistry searches are generally limited to text-based searches across the internet
     - Poor quality and little curation/validation work
     - Too many searches required to resource data
  4. What do humans want?
     - media.obsessable.com
     - As few interfaces as possible
  5. Chemistry on the Internet FUTURE
     - Search by chemical structure and substructure
     - Chemistry articles indexed and searchable
     - Reduced number of searches to find data
     - Data are integrated – compounds, vendors, syntheses, data, publications and patents
  6. For Synthesis…TotallySynthetic.com
  7. Org Prep Daily (Blog)
  8. Lots of “Public Compound” Databases
     - PubChem
     - DrugBank
     - ChEBI/ChEMBL
     - KEGG
     - LIPID MAPS
     - ChemIDplus
     - eMolecules
     - ZINC
     - Lots of chemical vendors
     - ChemSpider
  9. Where Would You Look? What Do You Trust?
  10. Linked Data on the Web
     - Taken from: Rafael Sidis’ Blog
  11. What is a compound?
  12. What is ChemSpider?
     - ChemSpider is:
     - Building a Structure Centric Community for Chemists
     - >23 million compounds, >300 data sources
     - A deposition and curation platform
     - A publishing platform for the community
     - Grows daily – more depositions, more links, more data sources
  13. How Was ChemSpider Built?
     - ChemSpider was a “hobby project”
     - Housed in a basement and running off three servers – one bought, two built
     - Sensitive to weather and power stability
     - Went live at ACS Spring 2007 in Chicago
  14. Search Cholesterol
  15. Search Cholesterol
  16. Search Cholesterol
  17. Search Cholesterol
  18. Search Cholesterol
  19. Linked across the internet
  20. Kyoto Encyclopedia of Genes and Genomes
  21. Link off a structure in ChemSpider
     - Chemical suppliers
     - Other publications
     - Analytical Data
     - Related Reactions
     - Wikipedia
     - Patents
     - “Everything”
  22. Links to Patents based on structure
  23. Clickthrough to Patents
  24. Articles Linked
  25. Answering Questions for Chemists
     - Questions a chemist might ask…
     - What is the melting point of n-butanol?
     - What is the chemical structure of Xanax?
     - Chemically, what is phenolphthalein?
     - What are the stereocenters of cholesterol?
     - Where can I find publications about xylene?
     - What are the different trade names for Ketoconazole?
     - What is the NMR spectrum of Aspirin?
     - What are the safety handling issues for Thymol Blue?
  26. Complex Data and Information
  27. ChemSpider is a structure-centric hub
     - ChemSpider aggregates and links out across the internet
     - Data aggregate based on “structures and links”
     - What defines a chemical compound?
  28. What is a compound?
  29. Question Everything online: www.dhmo.org
  30. Di-Hydrogen Monoxide
     - 2H
  31. Di-Hydrogen Monoxide
     - 2H + 1O
  32. Di-Hydrogen Monoxide
     - H2O
  33. Di-Hydrogen Monoxide
     - H2O
     - Water
  34. It’s all on Wikipedia…
  35. It’s all on Wikipedia…
  36. Chemistry on The Internet Is Messy
  37. It’s Methane…
  38. What’s Methane?
  39. What’s Methane?
  40. What ELSE is Methane???
  41. PubChem
  42. Truly “I Love You”
  43. Chemistry is REALLY Messy
  44. Vancomycin
     - Who will curate?
     - How would you clean such a large dataset?
     - Assertions!!!
  45. Vancomycin
     - Who will curate?
     - How would you clean such a large dataset?
  46. Vancomycin on ChemSpider: 1 compound – 3 days
  47. The EXPERTS must get it right?!
  48. Wikipedia, C&E News, PubChem
     - C&E News (from ACS)
  49. What About Digitonin?
  50. CAS as an authority
  51. The Blogging Community Participates
  52. The FDA’s DailyMed
  53. Structures on DailyMed
  54. Lack of Stereochemistry
  55. Incorrect Structures
  56. Wow!
  57. The InChI Identifier
  58. Multiple Layers
  59. InChI Strings Hash to InChIKeys
  60. InChIs for Taxol
  61. Back to Taxol
     - DrugBank: RCINICONZNJXQF-CLDWUXIMDD
     - ChEBI: RCINICONZNJXQF-GXKQXQCDDN
     - Wikipedia: RCINICONZNJXQF-MZXODVADBJ
     - Which one is correct???
  62. InChIKeys for Taxol
     - DrugBank: RCINICONZNJXQF-CLDWUXIMDD
     - ChEBI: RCINICONZNJXQF-GXKQXQCDDN
     - Wikipedia: RCINICONZNJXQF-MZXODVADBJ
     - ChEBI and Wikipedia are the SAME structure
     - DrugBank is a DIFFERENT structure – ONE stereocenter
  63. Does one stereocenter matter?
  64. Does one stereocenter matter?
     - Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  65. Does one stereocenter matter?
     - Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
     - Building a Structure Centric Community for Chemists
  66. Assertion and Chemical Entities
     - Who says what Taxol is?
     - What is the “timeline” for a molecule?
     - How do we clean up the Public data?
     - The Quality source is Chemical Abstracts Service…
  67. ChemSpider Searches
  68. ChemSpider Searches
  69. ChemSpider Complex Searches
  70. Vancomycin – Search the Internet
  71. Full Molecule Search: 4 Hits
  72. Full Skeleton Search: 104 Hits
  73. The InChI “Resolver”
  74. Citizen Scientists
  75. Crowd-sourcing Chemistry Curation
     - Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  76. Building a Structure Centric Community for Chemists
     - Multi-level Curation and Approval
  77. Citizens as Data Sources
  78. (no text)
  79. Entity-Extraction, Mark-up, Annotate
  80. Success Depends on Dictionaries
  81. Project Prospect
  82. ChemMantis and CJOC
  83. Name-Structure Pairs
  84. Species – linked to Wikipedia
  85. Semantic Linking of Structures
     - What would you want to link off a structure?
     - Chemical suppliers
     - Other publications
     - Analytical Data
     - Related Reactions
     - Wikipedia
     - Patents
     - “Everything”
  86. ChemSpider Everywhere
     - Linked from Wikipedia
     - Linked from Open Notebook Science sites using EMBED
     - Linked from Blogs using Structure/Spectra EMBED
     - Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
     - Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  87. ChemSpider Everywhere: Embed
  88. ChemSpider Everywhere: What do computers want?
     - Web services
     - flickr.com/photos/microcosmos
  89. ChemSpider Everywhere: Spectral Game
  90. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  91. ChemSpider Everywhere: ChemMobi
  92. There are always gaps...
     - What ChemSpider doesn’t deal with yet...
     - Markush structures and other “non-defineds”
     - Materials
     - Minerals
     - Polymers
     - Biological macromolecules
  93. What’s next?
     - Continue the curation effort and keep cleaning
     - Finish depositions – millions left to deposit
     - Layer on RDF to allow the semantic web to benefit from our efforts
     - Integrate RSC content – a massive archive!
     - Integrate RSC publishing workflows and databases
  94. Thank you
     - antony.williams@chemspider.com
     - Twitter: ChemSpiderman
     - www.chemspider.com/blog
     - SLIDES: www.slideshare.net/AntonyWilliams
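
The Taxol InChIKeys on slides 61 and 62 and the two search modes on slides 71 and 72 can be illustrated with a small sketch. The first block of an InChIKey is a hash of the formula and connectivity layers, so records that share it share the same skeleton; the remaining characters cover stereochemistry and other layers. The sketch below assumes that a "full molecule" search corresponds to matching the whole key and a "full skeleton" search to matching only the first block. That mapping, and the function names, are illustrative assumptions and not ChemSpider's actual implementation.

```python
# InChIKeys exactly as printed on the Taxol slides (source -> key).
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def skeleton_block(inchikey: str) -> str:
    """Return the first block of the key (hash of formula/connectivity)."""
    return inchikey.strip().upper().split("-", 1)[0]


def full_molecule_search(records: dict, query: str) -> list:
    """Hypothetical helper: sources whose whole key equals the query key."""
    wanted = query.strip().upper()
    return sorted(src for src, key in records.items()
                  if key.strip().upper() == wanted)


def skeleton_search(records: dict, query: str) -> list:
    """Hypothetical helper: sources sharing the query's first block."""
    wanted = skeleton_block(query)
    return sorted(src for src, key in records.items()
                  if skeleton_block(key) == wanted)


if __name__ == "__main__":
    query = TAXOL_KEYS["DrugBank"]
    print(full_molecule_search(TAXOL_KEYS, query))  # ['DrugBank']
    print(skeleton_search(TAXOL_KEYS, query))       # ['ChEBI', 'DrugBank', 'Wikipedia']
```

All three sources agree on the skeleton, which is why a skeleton search returns many more hits than a full molecule search. Exact key matching treats the three printed keys as distinct, so deciding which records depict the same structure still means inspecting the structures themselves.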