ChemSpider as a Platform for Crowd Participation in Curating Chemistry

This is a presentation I gave at the International Digital Curation Conference in Chicago on December 7th, 2010 (#idcc10). The presentation discusses data quality issues and the need for collective, crowdsourced efforts to improve the quality of chemistry-related data on the Internet.

Transcript

  • 1. ChemSpider as a Platform for Crowd Participation in Curating Chemistry. Antony Williams, IDCC, Chicago, December 2010
  • 2. WARNING: Chemistry is Dangerous
  • 3. Di-Hydrogen Monoxide
  • 4. Di-Hydrogen Monoxide
      ◦ 2H
  • 5. Di-Hydrogen Monoxide
      ◦ 2H + 1O
  • 6. Di-Hydrogen Monoxide
      ◦ H2O
  • 7. Di-Hydrogen Monoxide
      ◦ H2O
      ◦ Water
  • 8. It’s all on Wikipedia…
  • 9. Chemistry on the Internet – Not All Bad
      ◦ 100s of websites hosting chemistry-related data
      ◦ Chemistry information is generally “compound-based”
          ◦ Chemical “structures”
          ◦ Identifiers, names and synonyms
          ◦ Properties
          ◦ Analytical data
          ◦ How to synthesize
          ◦ Articles, patents, safety information
      ◦ Chemistry “language and dialects”
  • 10. Dialects describing chemicals
  • 11. A Pragmatic Vision: “Build a Structure-Centric Community”
      ◦ Integrate chemistry across the internet based on “chemical structure”
          ◦ A “structure-based hub” to information and data
          ◦ Let chemists contribute their own data
          ◦ Allow the community to curate & annotate data
  • 12. www.chemspider.com
  • 13. Answering Questions for Chemists
      ◦ Questions a chemist might ask…
          ◦ What is the melting point of n-heptanol?
          ◦ What is the chemical structure of Xanax?
          ◦ Chemically, what is phenolphthalein?
          ◦ What are the stereocenters of cholesterol?
          ◦ Where can I find publications about xylene?
          ◦ What are the different trade names for Aspirin?
          ◦ What is the NMR spectrum of Benzoic Acid?
          ◦ What are the safety handling issues for toluene?
  • 14. Search for a Chemical…by name
  • 15. Available Information…
      ◦ Linked to chemical vendors, safety data, toxicity, metabolism…
  • 16. Available Information….
  • 17. ChemSpider Today
      ◦ Almost 25 million unique chemicals
      ◦ Over 400 data sources
      ◦ Grows daily through community and RSC depositions
      ◦ Community annotation and curation
      ◦ We curate, edit, change and enhance data daily
  • 18. Three Years of Experience
      ◦ Internet-based chemistry is a mess!
      ◦ Public compound databases are contaminated
      ◦ The annotation/curation of data online is difficult
      ◦ Most database hosts are non-responsive to feedback: “We are a host/repository of data”
      ◦ Who cares?
  • 19. Linked Data on the Web
  • 20. Where is chemistry online?
      ◦ Encyclopedic articles (Wikipedia)
      ◦ Chemical vendor databases
      ◦ Metabolic pathway databases
      ◦ Property databases
      ◦ Patents with chemical structures
      ◦ Drug Discovery data
      ◦ Scientific publications
      ◦ Compound aggregators
      ◦ Blogs/Wikis and Open Notebook Science
  • 21. What is the Structure of Vitamin K?
  • 22. MeSH – Medical Subject Headings
      ◦ Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
  • 23. What is the Structure of Vitamin K1?
  • 24. What is the Structure of Vitamin K1?
  • 25. Chemical Abstracts “Common Chemistry” Database
  • 26. Wikipedia
  • 27.  
  • 28. Incorrect Structures
  • 29. Lack of Stereochemistry
  • 30. Does stereochemistry matter?
      ◦ Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide
  • 31.  
  • 32. PubChem
  • 33.  
  • 34.  
  • 35. What’s Methane?
  • 36. What’s Methane?
  • 37. What ELSE is Methane???
  • 38. Internet-Based Chemistry is a Mess
      ◦ Algorithms can get you so far
      ◦ Human curation is necessary
      ◦ Only the crowds can help with big data… ChemSpider is approaching 25 million compounds
  • 39. Search “Vitamin H”
  • 40. Search “Vitamin H”
  • 41. “Curate” Identifiers
  • 42. “Curate” Identifiers
  • 43. “Curate” Identifiers
  • 44. Crowd-sourcing Chemistry Curation
      ◦ Crowd-sourced curation: identify/tag errors, edit names and synonyms, identify records to deprecate
  • 45. “Curate” Identifiers
      ◦ General curation activities
          ◦ Remove incorrect names
          ◦ Correct spellings
          ◦ Add multilingual names
          ◦ Add alternative names
      ◦ In 3 years over 1 million structure-identifier relationships have been validated, both robotically and manually
      ◦ 130 people have participated in validation or annotation. “Crowds” can be quite small!
  • 46. Crowdsourcing Works
      ◦ The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation
      ◦ Curators at different levels check each other's work
      ◦ Wikipedia is the modern primary example
      ◦ Some curators are “madmen”…
  • 47. Crowdsourcing Works
      ◦ The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation
      ◦ Curators at different levels check each other's work
      ◦ Wikipedia is the modern primary example
      ◦ Some curators are “madmen”…
      ◦ The Oxford English Dictionary
  • 48. Vancomycin – Curate This!!!
  • 49. Vancomycin on ChemSpider 1 compound – 3 days
  • 50. Crowdsourced “Annotations”
      ◦ Users can add
          ◦ Descriptions/Syntheses/Commentaries
          ◦ Links to articles
          ◦ Spectral data
          ◦ Photos
          ◦ MP3 files
          ◦ Videos
  • 51. Multimedia Content Holder
  • 52. Gaming for Curation of Spectra
  • 53. ChemSpider Everywhere Crowdsourced Curation of Spectra
  • 54. Data Curation
  • 55. True Curation of Data
  • 56. ChemSpider SyntheticPages
  • 57. Drug Name / Generic Name, checked in ChEBI, ChemSpider, CAS Common Chemistry, ChemIDPlus, DailyMed, DrugBank, PubChem and Wikipedia (text entries in column order)
      ◦ Spiriva / Tiotropium Bromide: No Hits; No Hits; 4/0
      ◦ Depakote / Valproate semisodium: No Structure
      ◦ Basen / Voglibose: No Hits; No Hits; 2/1
      ◦ Symbicort / 1) Budesonide: 8/1
      ◦ Symbicort / 2) Formoterol: WRONG; No Hits; 6/1
      ◦ Vytorin / 1) Ezetimibe: No Hits
      ◦ Vytorin / 2) Simvastatin: 2/1
      ◦ Taxol / Paclitaxel: 44/1
      ◦ Thalidomid / Thalidomide: No Hits
      ◦ Zocor / Simvastatin: 2/1
      ◦ Crestor / Rosuvastatin: No Hits; 2/1
  • 58. Sharing Our Activities
      ◦ Presently defining approaches with other public compound databases to share results of curation activities
      ◦ Member of a large European project to link data from the Life Sciences. Sharing results of curation is essential
      ◦ Making curation and contribution interfaces mobile
  • 59. Mobile ChemSpider
  • 60. First request to Database Hosts!
      ◦ Every public compound database host should add ONE feature: “Leave Comments”
  • 61. Second request to Database Hosts!
      ◦ Show Comments
  • 62. Question Quality
  • 63. Thank you
      ◦ Email: williamsa@rsc.org
      ◦ Twitter: ChemConnector
      ◦ Blog: www.chemspider.com/blog
      ◦ Personal Blog: www.chemconnector.com
      ◦ SLIDES: www.slideshare.net/AntonyWilliams
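Slides 11 and 45 describe a structure-centric hub: names and identifiers hang off a chemical structure, and curators remove incorrect names, correct spellings and add alternative names. The Python sketch below is a minimal, hypothetical model of that idea, not ChemSpider's implementation or API. It assumes each structure is keyed by a standard InChIKey (the two keys shown are those of water and methane), and the class names, method names and source labels are illustrative inventions. As an example loosely in the spirit of the methane slides (35 to 37), a name linked to the wrong structure resolves ambiguously until a curator detaches it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """A structure-centric record: one structure key plus everything linked to it."""
    structure_key: str                                    # assumed: a standard InChIKey
    names: dict[str, bool] = field(default_factory=dict)  # name -> validated by a curator?
    sources: set[str] = field(default_factory=set)        # data sources that deposited links


class StructureHub:
    """Hypothetical hub that resolves every name to the structure(s) it is attached to."""

    def __init__(self) -> None:
        self.records: dict[str, CompoundRecord] = {}
        self.name_index: dict[str, set[str]] = {}  # lower-cased name -> structure keys

    def deposit(self, key: str, name: str, source: str) -> None:
        """Attach a name to a structure; deposited names start out unvalidated."""
        rec = self.records.setdefault(key, CompoundRecord(key))
        rec.names.setdefault(name, False)
        rec.sources.add(source)
        self.name_index.setdefault(name.lower(), set()).add(key)

    def validate(self, key: str, name: str) -> None:
        """Curation step: a curator confirms that the name belongs to the structure."""
        self.records[key].names[name] = True

    def remove_name(self, key: str, name: str) -> None:
        """Curation step: detach an incorrect name from a structure."""
        rec = self.records[key]
        rec.names.pop(name, None)
        # Drop the index entry only if no remaining name on this record matches it case-insensitively.
        if not any(n.lower() == name.lower() for n in rec.names):
            keys = self.name_index.get(name.lower(), set())
            keys.discard(key)
            if not keys:
                self.name_index.pop(name.lower(), None)

    def correct_spelling(self, key: str, wrong: str, right: str) -> None:
        """Curation step: replace a misspelled name, keeping any validation it had."""
        validated = self.records[key].names.get(wrong, False)
        self.remove_name(key, wrong)
        rec = self.records[key]
        rec.names[right] = rec.names.get(right, False) or validated
        self.name_index.setdefault(right.lower(), set()).add(key)

    def lookup(self, name: str) -> set[str]:
        """Return every structure key a name currently resolves to."""
        return set(self.name_index.get(name.lower(), set()))


WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
METHANE = "VNWKTOKETHGBQD-UHFFFAOYSA-N"

hub = StructureHub()
hub.deposit(WATER, "Water", "source-a")
hub.deposit(WATER, "Di-hydrogen monoxide", "source-b")
hub.deposit(METHANE, "Methane", "source-a")
hub.deposit(WATER, "Methane", "source-c")   # a name linked to the wrong structure
hub.deposit(METHANE, "Methan", "source-d")  # a misspelling

ambiguous = hub.lookup("methane")           # two structures before curation
hub.remove_name(WATER, "Methane")
hub.correct_spelling(METHANE, "Methan", "Methane")
hub.validate(METHANE, "Methane")
```

Keeping a per-name "validated" flag is one simple way to separate robotic deposition from the manual validation counted on slide 45; a production system would also need provenance and an audit trail for each change.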

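Slides 46 and 47 note that curators at different levels check each other's work. One way such a rule could be enforced is sketched below. This is purely illustrative: the tier names, the "one tier up" rule and the functions `required_level` and `review` are assumptions made for this example, not a description of how ChemSpider assigns or checks curator levels.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical curator tiers; a higher value means more trust."""
    CONTRIBUTOR = 1
    CURATOR = 2
    MASTER_CURATOR = 3


@dataclass
class ProposedEdit:
    author: str
    author_level: Level
    description: str
    approvals: list[str] = field(default_factory=list)
    applied: bool = False


def required_level(author_level: Level) -> Level:
    """Assumed rule: an edit is checked by someone one tier up, capped at the top tier."""
    return Level(min(author_level + 1, Level.MASTER_CURATOR))


def review(edit: ProposedEdit, reviewer: str, reviewer_level: Level) -> bool:
    """Record an approval if the reviewer may check this edit; return whether it is applied."""
    if reviewer == edit.author:
        return edit.applied  # nobody approves their own work
    if reviewer_level < required_level(edit.author_level):
        return edit.applied  # reviewer's tier is too low for this author
    edit.approvals.append(reviewer)
    edit.applied = True
    return edit.applied


edit = ProposedEdit("alice", Level.CONTRIBUTOR, "Correct the spelling of a synonym")
self_check = review(edit, "alice", Level.CONTRIBUTOR)  # rejected: self-approval
peer_check = review(edit, "bob", Level.CONTRIBUTOR)    # rejected: same tier
senior_check = review(edit, "carol", Level.CURATOR)    # accepted: one tier up

edit2 = ProposedEdit("carol", Level.CURATOR, "Deprecate a duplicate record")
curator_check = review(edit2, "bob", Level.CURATOR)
master_check = review(edit2, "dave", Level.MASTER_CURATOR)
```

Capping the requirement at the top tier means the most trusted curators still review each other rather than editing unchecked.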
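Slide 57 checks, drug by drug, what each public database returns for a name, recording outcomes such as "No Hits" and "WRONG", and slide 38 argues that algorithms only get you so far. The sketch below shows that kind of audit under stated assumptions: each source's answer is reduced to a structure key (or `None` for no hit), a curator has already fixed a reference key, and the source labels and keys are placeholders. It also shows why a naive majority vote is no substitute for human curation, since the most common answer can be the wrong one.

```python
from __future__ import annotations

from collections import Counter


def audit_name(reference_key: str, results: dict[str, str | None]) -> dict[str, list[str]]:
    """Classify each source's answer for one drug name against a curated reference structure.

    `results` maps a source name to the structure key it returned for the name,
    or None when the source had no hit.
    """
    report: dict[str, list[str]] = {"correct": [], "wrong": [], "no hits": []}
    for source, key in sorted(results.items()):
        if key is None:
            report["no hits"].append(source)
        elif key == reference_key:
            report["correct"].append(source)
        else:
            report["wrong"].append(source)
    return report


def consensus(results: dict[str, str | None]) -> tuple[str | None, int]:
    """Most common non-empty answer and how many sources gave it (a naive majority vote)."""
    counts = Counter(key for key in results.values() if key is not None)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


REFERENCE = "KEY-REF"  # placeholder for a curator-verified structure key
results = {
    "source-1": "KEY-REF",
    "source-2": "KEY-OTHER",  # e.g. a structure missing its stereochemistry
    "source-3": None,
    "source-4": "KEY-OTHER",
}
report = audit_name(REFERENCE, results)
top, votes = consensus(results)  # the majority here agrees on the wrong structure
```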