Crowdsourcing Chemistry for the Community – 5 Years of Experiences. Antony Williams, NFAIS, February 28th, 2012

ChemSpider is one of the internet’s primary resources for chemists. It is a structure-centric platform that hosts over 26 million unique chemical entities sourced from over 400 different data sources and delivers information including commercial availability, associated publications, patents, analytical data, and experimental and predicted properties. ChemSpider plays a distinctive role in the community in that any chemist can deposit, curate and annotate data. In this way they can share their skills and data with every chemist who uses the system. A number of parallel projects have been developed from the initial platform, including ChemSpider SyntheticPages, a community-generated database of reaction syntheses, and the Learn Chemistry wiki, an educational wiki for secondary school students.

This presentation will provide an overview of the project in terms of our success in engaging scientists to contribute to crowdsourcing chemistry. We will also discuss some of our plans to encourage future participation and engagement in this and related projects.

Published in: Technology, Education

Crowdsourcing Chemistry for the Community – 5 Years of Experiences

  1. 1. Crowdsourcing Chemistry for the Community – 5 Years of Experiences. Antony Williams, NFAIS, February 28th, 2012
  2. 2. The World of Online Chemistry
        - Safety data
        - Toxicity data
        - Blogs and Wikis
        - Property databases
        - Experimental results
        - Scientific publications
        - Compound aggregators
        - Open Notebook Science
        - Metabolic pathway databases
        - Encyclopedic articles (Wikipedia)
  3. 3. If it was not just about me…
  4. 4. If it was not just about me…
        - We might have a community built encyclopedia
        - I might know where the best restaurants are
        - I might get good advice on books to read
        - I might know which movies to watch
        - I might know which plumber to call
        - Data might just be Open
  5. 5. If it was not just about me…
        - We might have a community built encyclopedia
        - I might know where the best restaurants are
        - I might get good advice on books to read
        - I might know which movies to watch
        - I might know which plumber to call
        - Data might just be Open
  6. 6. Collaborative Knowledge Management
  7. 7. QUESTION
        - Are you involved with assisting chemists, pharmaceutical scientists, etc. in sourcing information about Chemistry?
            1. Yes
            2. No
  8. 8. Chemistry Databases on the Internet
        - Public databases are “trusted” as primary sources
        - Trust is granted without investigation of the content
        - Online data vary dramatically in quality!
        - Examples…
  9. 9. With Great Fanfare…
  10. 10. NPC Browser http://tripod.nih.gov/npc/
  11. 12. NPC Browser http://tripod.nih.gov/npc/
  12. 13. How many contribute to clean-up?
        - Less than a dozen contributors to data
        - The majority are project members
        - The crowd is small…
  13. 14. What you might not know about Chemistry Databases on the Internet
        - Data-sharing between the databases is cyclic – proliferating errors – “Linked Data”
  14. 15. What is the Structure of Vitamin K?
  15. 16. MeSH
        - A lipid cofactor that is required for normal blood clotting.
        - Several forms of vitamin K have been identified:
            - VITAMIN K1 (phytomenadione) derived from plants,
            - VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins,
            - VITAMIN K3 (menadione).
  16. 17. What is the Structure of Vitamin K1?
  17. 18. QUESTION
        - Who has heard of ChemSpider as a chemistry database?
            1. Yes
            2. No
  18. 19. ChemSpider
  19. 20. We Want to Answer Questions
        - Questions a chemist might ask…
            - What is the melting point of n-heptanol?
            - What is the chemical structure of Xanax?
            - Chemically, what is phenolphthalein?
            - What are the stereocenters of cholesterol?
            - Where can I find publications about xylene?
            - What are the different trade names for Ketoconazole?
            - What is the NMR spectrum of Aspirin?
            - What are the safety handling issues for Thymol Blue?
  20. 21. Available Information…
        - Linked to vendors, safety data, toxicity, metabolism
  21. 22. Available Information….
  22. 23. Crowdsourced “Annotations”
        - Users can add
            - Descriptions/Syntheses/Commentaries
            - Links to PubMed articles
            - Links to articles via DOIs
            - Add spectral data
            - Add Crystallographic Information Files
            - Add photos
            - Add MP3 files
            - Add Videos
  23. 25. QUESTION
        - Did you know that ChemSpider was OWNED by the Royal Society of Chemistry?
            1. Yes
            2. No
  24. 26. Public Domain Databases
        - Our databases are a mess…
        - Non-curated databases are proliferating errors
        - We source and deposit data between databases
        - Original sources of errors hard to determine
        - Curation is time-consuming and challenging
  25. 27. Stop Whining – Fix it
  26. 28. Crowdsourced Curation
        - Crowdsourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  27. 29. Search “Vitamin H”
  28. 30. “Curate” Identifiers
  29. 31. “Curate” Identifiers
  30. 32. Validated Name-Structure Dictionaries
        - Chemical name dictionaries are used for:
            - Text-mining (publications, patents)
                - Used to index PubMed and link to Google Patents
            - Linking to other databases – think Biology!
                - When structures are not available, drug names link
            - Searching the web
                - Names link to structures link to InChIs
  31. 33. Why are Dictionaries important?
  32. 34. The Final Search Strategy
  33. 35. Many Names, One Structure
  34. 36. I want to know about “Vincristine”
  35. 37. Vincristine: Identifiers and Properties
  36. 38. Vincristine: Patents Linked by Name
  37. 39. Text-Mining Depends on Dictionaries
  38. 40. Curated Dictionaries Matter
  39. 41. Originally 15 compounds “called” Yohimbine; 54 Skeletons for Yohimbine
  40. 42. Sharing ChemSpider curation
  41. 43. Data Curation Sharing - Proof of Concept
  42. 44. Identifier Dictionaries
        - Reciprocal curation processes…share curation
        - A series of “added” and “removed” synonyms against structures for matching.
        - Announced 9 months ago – only one consumer
        - Who will participate???
  43. 45. Community Contribution to ChemSpider
  44. 46. www.SpectralGame.com http://www.jcheminf.com/content/1/1/9
  45. 47. Curation through “gaming”
  46. 48. Data Curation
  47. 49. Reversed Spectrum
  48. 50. True Curation of Data
  49. 51. ChemSpider SyntheticPages
  50. 52. ChemSpider SyntheticPages
  51. 53. Submission Process
        - Simple template-based submission process
        - Submissions reviewed by editorial board.
        - Online Peer Review process
        - Crowdsourced expansion?
            - A few regular dedicated authors only
            - Online peer review and feedback small but useful
  52. 54. Crowdsourcing – does it work?
        - 192 people EVER have deposited or curated data
        - ChemSpider SyntheticPages: small group of authors
        - Database hosts make the largest contributions
        - ChemSpider staff tend to do the most curation
  53. 55. Contributions
  54. 56. Curations
        - 2009 – 8255 curations by 43 people
        - 2010 – 10014 curations by 66 people
        - 2011 – 16025 curations by 116 people
        - “Crowdsourcing” – the crowd is small!
  55. 57. www.SciMobileApps.com
        - 8 contributors only…in 7 months
  56. 58. www.SciDBs.com
        - 7 contributors only…in 6 months
  57. 59. www.ScientistsDB.com
        - 38 contributors…in 6 weeks
  58. 60. What encourages participation?
        - “Interested” parties contribute
        - Marketing and self-promotion are primary reasons for participation
        - There are very few “selfless” participants
        - Relationships garner contributions…
  59. 61.
        - Crowdsourcing across drug discovery
        - Open PHACTS: partnership between European Community and European Pharma Companies
        - Freely accessible for knowledge discovery and verification.
            - Data on chemistry and biology
            - Pharmacological profiles
            - Proprietary and public data sources.
  60. 63. How will it improve?
        - Participation and contribution
  61. 64. Conclusions
        - For chemistry, crowdsourced deposition, annotation, and curation work, but engagement to date is low
        - Primary challenge – engaging the community to help create what they want. Rewards and recognition?
        - MORE collaboration can benefit us all
        - Indicators are good for small but continued growth
  62. 65. Thank you
        - Email: williamsa@rsc.org
        - Twitter: ChemConnector
        - Personal Blog: www.chemconnector.com
        - SLIDES: www.slideshare.net/AntonyWilliams
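
The "Identifier Dictionaries" slide (44) describes sharing curation as a series of "added" and "removed" synonyms recorded against structures. The slides do not give a data format, so the following is only a minimal sketch of how a consuming database might apply such a list to its own name dictionary. The names `SynonymDelta` and `apply_deltas`, the record ID `RECORD-1`, and the sample synonyms (including the deliberately wrong "Vitamin K" entry) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SynonymDelta:
    """Hypothetical exchange record: synonym changes for one structure."""
    structure_id: str  # e.g. an InChIKey or a database record ID (assumed)
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)


def apply_deltas(dictionary, deltas):
    """Return a new name dictionary with every delta applied.

    `dictionary` maps a structure identifier to its set of synonyms.
    Removals are applied before additions, and names are compared
    case-insensitively so "biotin" and "Biotin" count as the same synonym.
    """
    # Copy the sets so the caller's dictionary is left untouched.
    updated = {sid: set(names) for sid, names in dictionary.items()}
    for delta in deltas:
        names = updated.setdefault(delta.structure_id, set())
        drop = {n.casefold() for n in delta.removed}
        names -= {n for n in names if n.casefold() in drop}
        present = {n.casefold() for n in names}
        names |= {n for n in delta.added if n.casefold() not in present}
    return updated


# Illustrative data only: a local record carrying one erroneous synonym.
local = {"RECORD-1": {"Biotin", "Vitamin H", "Vitamin K"}}
deltas = [SynonymDelta("RECORD-1", added={"Coenzyme R"}, removed={"vitamin k"})]

merged = apply_deltas(local, deltas)
print(sorted(merged["RECORD-1"]))
```

Returning a new dictionary rather than editing in place is one possible design choice: it lets the consumer review the difference between `local` and `merged` before accepting another database's curation.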