ChemSpider as a Platform for Crowd Participation in Curating Chemistry

This is a presentation I gave at the International Digital Curation Conference in Chicago, December 7th 2010, #idcc10. The presentation discusses the issues of data quality and the need for collective, crowdsourced efforts to improve the quality of chemistry-related data on the Internet.

Published in: Technology

Transcript

  • 1. ChemSpider as a Platform for Crowd Participation in Curating Chemistry Antony Williams IDCC, Chicago, December 2010
  • 2. WARNING: Chemistry is Dangerous
  • 3. Di-Hydrogen Monoxide
  • 4. Di-Hydrogen Monoxide
    • 2H
  • 5. Di-Hydrogen Monoxide
    • 2H + 1O
  • 6. Di-Hydrogen Monoxide
    • H2O
  • 7. Di-Hydrogen Monoxide
    • H2O
    • Water
  • 8. It’s all on Wikipedia…
  • 9. Chemistry on the Internet – Not All Bad
    • 100s of websites hosting chemistry-related data
    • Chemistry information is generally “compound-based”
      • Chemical “structures”
      • Identifiers, names and synonyms
      • Properties
      • Analytical data
      • How to synthesize
      • Articles, patents, safety information
    • Chemistry “language and dialects”
  • 10. Dialects describing chemicals
  • 11. A Pragmatic Vision
      • “Build a Structure Centric Community”
    • Integrate chemistry across the internet based on “chemical structure”
      • A “structure-based hub” to information and data
      • Let chemists contribute their own data
      • Allow the community to curate & annotate data
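The "structure-based hub" idea on slide 11 can be pictured as a lookup table keyed by chemical structure rather than by name. The following is a minimal sketch only, not ChemSpider's actual data model: the class names, fields, source labels, and example URLs are all hypothetical, and an InChI string is assumed as the structure key.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StructureRecord:
    """Everything attached to one chemical structure (hypothetical schema)."""
    structure_key: str  # assumed here to be an InChI string
    names: set = field(default_factory=set)
    links: dict = field(default_factory=lambda: defaultdict(list))
    annotations: list = field(default_factory=list)


class StructureHub:
    """Toy hub: many names and sources resolve to one record per structure."""

    def __init__(self):
        self._records = {}     # structure_key -> StructureRecord
        self._name_index = {}  # lower-cased name -> structure_key

    def deposit(self, structure_key, name, source, url):
        # Create the record on first sight of a structure, then merge data in.
        record = self._records.setdefault(
            structure_key, StructureRecord(structure_key)
        )
        record.names.add(name)
        record.links[source].append(url)
        self._name_index[name.lower()] = structure_key
        return record

    def annotate(self, structure_key, user, comment):
        # Community annotation: store who said what about this structure.
        self._records[structure_key].annotations.append((user, comment))

    def find_by_name(self, name):
        # Returns None when the name is unknown.
        key = self._name_index.get(name.lower())
        return self._records.get(key)


# Illustrative data only; the sources and URLs are placeholders.
WATER = "InChI=1S/H2O/h1H2"
hub = StructureHub()
hub.deposit(WATER, "Water", "example-encyclopedia", "https://example.org/water")
hub.deposit(WATER, "Dihydrogen monoxide", "example-vendor", "https://example.com/dhmo")
hub.annotate(WATER, "some_chemist", "Both names describe the same structure.")

record = hub.find_by_name("DIHYDROGEN MONOXIDE")
print(sorted(record.names))
```

Keying on structure means that two "dialects" for the same compound land on the same record, so contributed links and annotations accumulate in one place instead of being scattered across name-based entries.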
  • 12. www.chemspider.com
  • 13. Answering Questions for Chemists
    • Questions a chemist might ask…
      • What is the melting point of n-heptanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Aspirin?
      • What is the NMR spectrum of Benzoic Acid?
      • What are the safety handling issues for toluene?
  • 14. Search for a Chemical…by name
  • 15. Available Information…
    • Linked to chemical vendors, safety data, toxicity, metabolism…
  • 16. Available Information…
  • 17. ChemSpider Today
    • Almost 25 million unique chemicals
    • Over 400 data sources
    • Grows daily – community and RSC depositions
    • Community annotation and curation
    • We curate, edit, change, enhance data daily
  • 18. Three Years of Experience
    • Internet-based chemistry is a mess!
    • Public compound databases are contaminated
    • The annotation/curation of data online is difficult
    • Most database hosts are non-responsive to feedback – “We are a host/repository of data”
    • Who cares?
  • 19. Linked Data on the Web
  • 20. Where is chemistry online?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 21. What is the Structure of Vitamin K?
  • 22. MeSH – Medical Subject Headings
    • Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).
  • 23. What is the Structure of Vitamin K1?
  • 24. What is the Structure of Vitamin K1?
  • 25. Chemical Abstracts “Common Chemistry” Database
  • 26. Wikipedia
  • 27.  
  • 28. Incorrect Structures
  • 29. Lack of Stereochemistry
  • 30. Does stereochemistry matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide
  • 31.  
  • 32. PubChem
  • 33.  
  • 34.  
  • 35. What’s Methane?
  • 36. What’s Methane?
  • 37. What ELSE is Methane???
  • 38. Internet-Based Chemistry is a Mess
    • Algorithms can get you so far
    • Human curation is necessary
    • Only the crowds can help with big data… ChemSpider is approaching 25 million compounds
  • 39. Search “Vitamin H”
  • 40. Search “Vitamin H”
  • 41. “Curate” Identifiers
  • 42. “Curate” Identifiers
  • 43. “Curate” Identifiers
  • 44. Crowd-sourcing Chemistry Curation
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 45. “Curate” Identifiers
    • General curation activities
      • Remove incorrect names
      • Correct spellings
      • Add multilingual names
      • Add alternative names
    • In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually
    • 130 people have participated in validation or annotation. “Crowds” can be quite small!
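One way to picture the validation of structure-identifier relationships described on slide 45 is a per-name vote tally. This is a hedged sketch under stated assumptions, not ChemSpider's actual workflow: the `NameAssertion` class, the curator names, the placeholder structure keys, and the two-vote threshold are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NameAssertion:
    """A claim that `name` identifies the structure `structure_key` (hypothetical)."""
    structure_key: str
    name: str
    approvals: set = field(default_factory=set)
    rejections: set = field(default_factory=set)

    def vote(self, curator, is_correct):
        # A curator's latest vote replaces any earlier vote by the same curator.
        self.approvals.discard(curator)
        self.rejections.discard(curator)
        (self.approvals if is_correct else self.rejections).add(curator)

    def status(self, threshold=2):
        # Assumed rule: a side wins once it has `threshold` votes and a majority.
        yes, no = len(self.approvals), len(self.rejections)
        if yes >= threshold and yes > no:
            return "validated"
        if no >= threshold and no > yes:
            return "rejected"
        return "pending"


# Placeholder keys stand in for real structure identifiers.
vitamin_h = NameAssertion("biotin-structure-key", "Vitamin H")
vitamin_h.vote("curator_a", True)
vitamin_h.vote("curator_b", True)

bad_name = NameAssertion("methane-structure-key", "Carbon")
bad_name.vote("curator_a", False)
bad_name.vote("curator_c", False)

new_name = NameAssertion("methane-structure-key", "Marsh gas")
new_name.vote("curator_a", True)

print(vitamin_h.status(), bad_name.status(), new_name.status())
```

Recording votes per curator, rather than a bare counter, lets curators check each other's work and keeps one person from validating a name alone, which matters when the "crowd" is only a few dozen people.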
  • 46. Crowdsourcing Works
    • The “crowd” has deposited data (structures, spectra, etc) and participated in data curation
    • Curators at different levels check each other's work
    • Wikipedia is the modern primary example
    • Some curators are “madmen”…
  • 47. Crowdsourcing Works
    • The “crowd” has deposited data (structures, spectra, etc) and participated in data curation
    • Curators at different levels check each other's work
    • Wikipedia is the modern primary example
    • Some curators are “madmen”…
    • The Oxford English Dictionary
  • 48. Vancomycin – Curate This!!!
  • 49. Vancomycin on ChemSpider: 1 compound – 3 days
  • 50. Crowdsourced “Annotations”
    • Users can add
      • Descriptions/Syntheses/Commentaries
      • Links to articles
      • Spectral data
      • Photos
      • MP3 files
      • Videos
  • 51. Multimedia Content Holder
  • 52. Gaming for Curation of Spectra
  • 53. ChemSpider Everywhere Crowdsourced Curation of Spectra
  • 54. Data Curation
  • 55. True Curation of Data
  • 56. ChemSpider SyntheticPages
  • 57.
    | Drug Name | Generic Name | ChEBI | ChemSpider | CAS Common Chemistry | ChemIDPlus | DailyMed | DrugBank | PubChem | Wikipedia |
    | Spiriva | Tiotropium Bromide | No Hits | | No Hits | | | | 4/0 | |
    | Depakote | Valproate semisodium | | | | | | | | No Structure |
    | Basen | Voglibose | | | No Hits | | No Hits | | 2/1 | |
    | Symbicort | 1) Budesonide | | | | | | | 8/1 | |
    | Symbicort | 2) Formoterol | WRONG | | No Hits | | | | 6/1 | |
    | Vytorin | 1) Ezetimibe | | | No Hits | | | | | |
    | Vytorin | 2) Simvastatin | | | | | | | 2/1 | |
    | Taxol | Paclitaxel | | | | | | | 44/1 | |
    | Thalidomid | Thalidomide | No Hits | | | | | | | |
    | Zocor | Simvastatin | | | | | | | 2/1 | |
    | Crestor | Rosuvastatin | | | No Hits | | | | 2/1 | |
  • 58. Sharing Our Activities
    • Presently defining approaches with other public compound databases to share results of curation activities
    • Member of large European project to link data from the Life Sciences. Sharing results of curation is essential
    • Making curation and contribution interfaces Mobile
  • 59. Mobile ChemSpider
  • 60. First request to Database Hosts!
    • Every public compound database host should add ONE feature – “Leave Comments”
  • 61. Second request to Database Hosts! Show Comments
  • 62. Question Quality
  • 63. Thank you
    • Email: williamsa@rsc.org
    • Twitter: ChemConnector
    • Blog: www.chemspider.com/blog
    • Personal Blog: www.chemconnector.com
    • Slides: www.slideshare.net/AntonyWilliams
