ChemSpider as a Platform for Crowd Participation in Curating Chemistry

This is a presentation I gave at the International Digital Curation Conference (IDCC) in Chicago on December 7, 2010 (#idcc10). It discusses data quality issues and the need for collective, crowdsourced efforts to improve the quality of chemistry-related data on the Internet.
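The "What’s Methane?" and "Search ‘Vitamin H’" slides, together with slide 38, argue that algorithms can detect some problems but human curation is still needed. The following is a minimal sketch of that split. It assumes each deposited record can be reduced to a (source, name, structure key) triple. Grouping the records by structure key gives the "structure-based hub" view from slide 11. Any name attached to more than one structure key is flagged for a human curator rather than fixed automatically. The record values and `KEY-…` strings are hypothetical placeholders for real structure identifiers such as InChIKeys, and this is not ChemSpider’s actual pipeline.

```python
from collections import defaultdict

# Hypothetical deposited records: (source, name, structure_key).
# The KEY-... strings are placeholders standing in for real structure
# identifiers (e.g. InChIKeys); they are not real identifiers.
RECORDS = [
    ("source_a", "water", "KEY-H2O"),
    ("source_b", "dihydrogen monoxide", "KEY-H2O"),
    ("source_a", "methane", "KEY-CH4"),
    ("source_c", "methane", "KEY-OTHER"),  # a conflicting assignment
]


def build_hub(records):
    """Group names and sources under each structure key (a structure-centric view)."""
    hub = defaultdict(lambda: {"names": set(), "sources": set()})
    for source, name, key in records:
        hub[key]["names"].add(name.lower())
        hub[key]["sources"].add(source)
    return hub


def flag_conflicts(records):
    """Return names that different records attach to more than one structure.

    The result is a review queue for human curators: the code can detect
    that sources disagree, but not which assignment is correct.
    """
    structures_by_name = defaultdict(set)
    for _source, name, key in records:
        structures_by_name[name.lower()].add(key)
    return {
        name: sorted(keys)
        for name, keys in structures_by_name.items()
        if len(keys) > 1
    }


if __name__ == "__main__":
    hub = build_hub(RECORDS)
    print(sorted(hub["KEY-H2O"]["names"]))
    print(flag_conflicts(RECORDS))
```

Running it prints `['dihydrogen monoxide', 'water']` and `{'methane': ['KEY-CH4', 'KEY-OTHER']}`. The flag only records a disagreement between sources. Deciding which structure "methane" really refers to is left to a curator.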

  1. ChemSpider as a Platform for Crowd Participation in Curating Chemistry. Antony Williams, IDCC, Chicago, December 2010
  2. WARNING: Chemistry is Dangerous
  3. Di-Hydrogen Monoxide
  4. Di-Hydrogen Monoxide
     - 2H
  5. Di-Hydrogen Monoxide
     - 2H + 1O
  6. Di-Hydrogen Monoxide
     - H2O
  7. Di-Hydrogen Monoxide
     - H2O
     - Water
  8. It’s all on Wikipedia…
  9. Chemistry on the Internet – Not All Bad
     - 100s of websites hosting chemistry-related data
     - Chemistry information is generally “compound-based”
       - Chemical “structures”
       - Identifiers, names and synonyms
       - Properties
       - Analytical data
       - How to synthesize
       - Articles, patents, safety information
     - Chemistry “language and dialects”
  10. Dialects describing chemicals
  11. A Pragmatic Vision: “Build a Structure-Centric Community”
     - Integrate chemistry across the Internet based on “chemical structure”
       - A “structure-based hub” to information and data
       - Let chemists contribute their own data
       - Allow the community to curate and annotate data
  12. www.chemspider.com
  13. Answering Questions for Chemists
     - Questions a chemist might ask…
       - What is the melting point of n-heptanol?
       - What is the chemical structure of Xanax?
       - Chemically, what is phenolphthalein?
       - What are the stereocenters of cholesterol?
       - Where can I find publications about xylene?
       - What are the different trade names for aspirin?
       - What is the NMR spectrum of benzoic acid?
       - What are the safety handling issues for toluene?
  14. Search for a Chemical… by name
  15. Available Information…
     - Linked to chemical vendors, safety data, toxicity, metabolism…
  16. Available Information…
  17. ChemSpider Today
     - Almost 25 million unique chemicals
     - Over 400 data sources
     - Grows daily through community and RSC depositions
     - Community annotation and curation
     - We curate, edit, change and enhance data daily
  18. Three Years of Experience
     - Internet-based chemistry is a mess!
     - Public compound databases are contaminated
     - Annotating and curating data online is difficult
     - Most database hosts are unresponsive to feedback: “We are a host/repository of data”
     - Who cares?
  19. Linked Data on the Web
  20. Where is chemistry online?
     - Encyclopedic articles (Wikipedia)
     - Chemical vendor databases
     - Metabolic pathway databases
     - Property databases
     - Patents with chemical structures
     - Drug discovery data
     - Scientific publications
     - Compound aggregators
     - Blogs/wikis and Open Notebook Science
  21. What is the Structure of Vitamin K?
  22. MeSH – Medical Subject Headings
     - Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
  23. What is the Structure of Vitamin K1?
  24. What is the Structure of Vitamin K1?
  25. Chemical Abstracts “Common Chemistry” Database
  26. Wikipedia
  28. Incorrect Structures
  29. Lack of Stereochemistry
  30. Does stereochemistry matter?
     - Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide
  32. PubChem
  35. What’s Methane?
  36. What’s Methane?
  37. What ELSE is Methane???
  38. Internet-Based Chemistry is a Mess
     - Algorithms can only get you so far
     - Human curation is necessary
     - Only the crowd can help with big data… ChemSpider is approaching 25 million compounds
  39. Search “Vitamin H”
  40. Search “Vitamin H”
  41. “Curate” Identifiers
  42. “Curate” Identifiers
  43. “Curate” Identifiers
  44. Crowdsourcing Chemistry Curation
     - Crowdsourced curation: identify/tag errors, edit names and synonyms, identify records to deprecate
  45. “Curate” Identifiers
     - General curation activities
       - Remove incorrect names
       - Correct spellings
       - Add multilingual names
       - Add alternative names
     - In 3 years, over 1 million structure-identifier relationships have been validated, both robotically and manually
     - 130 people have participated in validation or annotation. “Crowds” can be quite small!
  46. Crowdsourcing Works
     - The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation
     - Curators at different levels check each other’s work
     - Wikipedia is the primary modern example
     - Some curators are “madmen”…
  47. Crowdsourcing Works
     - The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation
     - Curators at different levels check each other’s work
     - Wikipedia is the primary modern example
     - Some curators are “madmen”…
     - The Oxford English Dictionary
  48. Vancomycin – Curate This!!!
  49. Vancomycin on ChemSpider: 1 compound, 3 days
  50. Crowdsourced “Annotations”
     - Users can add
       - Descriptions/syntheses/commentaries
       - Links to articles
       - Spectral data
       - Photos
       - MP3 files
       - Videos
  51. Multimedia Content Holder
  52. Gaming for Curation of Spectra
  53. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  54. Data Curation
  55. True Curation of Data
  56. ChemSpider SyntheticPages
  57.
     | Drug Name | Generic Name | ChEBI | ChemSpider | CAS Com. Chem | ChemIDPlus | DailyMed | DrugBank | PubChem | Wikipedia |
     |---|---|---|---|---|---|---|---|---|---|
     | Spiriva | Tiotropium Bromide | No Hits | | No Hits | | | | 4/0 | |
     | Depakote | Valproate semisodium | | | | | | | | No Structure |
     | Basen | Voglibose | | | No Hits | | No Hits | | 2/1 | |
     | Symbicort | 1) Budesonide | | | | | | | 8/1 | |
     | Symbicort | 2) Formoterol | WRONG | | No Hits | | | | 6/1 | |
     | Vytorin | 1) Ezetimibe | | | No Hits | | | | | |
     | Vytorin | 2) Simvastatin | | | | | | | 2/1 | |
     | Taxol | Paclitaxel | | | | | | | 44/1 | |
     | Thalidomid | Thalidomide | No Hits | | | | | | | |
     | Zocor | Simvastatin | | | | | | | 2/1 | |
     | Crestor | Rosuvastatin | | | No Hits | | | | 2/1 | |
  58. Sharing Our Activities
     - Presently defining approaches with other public compound databases to share the results of curation activities
     - Member of a large European project to link data from the life sciences; sharing the results of curation is essential
     - Making curation and contribution interfaces mobile
  59. Mobile ChemSpider
  60. First request to Database Hosts!
     - Every public compound database host should add ONE feature: “Leave Comments”
  61. Second request to Database Hosts! Show Comments
  62. Question Quality
  63. Thank you
     Email: williamsa@rsc.org
     Twitter: ChemConnector
     Blog: www.chemspider.com/blog
     Personal Blog: www.chemconnector.com
     SLIDES: www.slideshare.net/AntonyWilliams
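Slide 45 notes that over 1 million structure-identifier relationships were validated "robotically and manually". The talk does not say which automated checks were used. One purely mechanical check that a robot can run is on CAS Registry Numbers, which end in a check digit. To compute it, number the digits before the check digit 1, 2, 3, … starting from the right. Multiply each digit by its position, sum the products, and take the result modulo 10. The sketch below rejects malformed or mistyped numbers; the function name `is_valid_cas_rn` is hypothetical. A passing check only shows that the number is well-formed. It does not show that the number belongs to the structure it is attached to, which still needs a curator.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-R: two to seven digits,
# a hyphen, two digits, a hyphen, and a single check digit R.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if cas_rn is well-formed and its check digit is correct."""
    match = CAS_PATTERN.match(cas_rn.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    # Water, methane, thalidomide, and a mistyped water number.
    for candidate in ["7732-18-5", "74-82-8", "50-35-1", "7732-18-4"]:
        print(candidate, is_valid_cas_rn(candidate))
```

For water (7732-18-5), the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5, which matches the check digit. The mistyped 7732-18-4 is rejected.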