ChemSpider - Does Community Engagement work to Build a Quality Online Resource for Chemists?
With an intention to provide a high quality free internet resource of chemistry related data for the community, ChemSpider has aggregated almost 25 million compounds linked out to over 400 data sources and provided a platform for the community to both deposit and curate data. This experiment in crowdsourcing for chemistry has now been running for over three years. This presentation will review a number of aspects of the project including (a) the level of community participation in depositing and curating data; (b) the nature of data and content supplied by the community; (c) how ChemSpider is used by the community; (d) using game-based systems to assist in data curation; (e) algorithmic-based approaches to data validation and filtering; and (f) sharing data curation efforts with other online databases.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
2,369
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. ChemSpider - Does Community Engagement work to Build a Quality Online Resource for Chemists? Antony Williams, ACS Denver, August 30th 2011
  • 2. What’s said on the web is true…
  • 3. What’s said on the web is true…
  • 4. What’s said on the web is true…
      • “We then established a collaboration with professor Sum Ting Wong, a fugitive from the North Korean University Hu Yu Hai Ding, currently in Rome (Italy).”
      • “This was identified as the new protein Wai So Dim (WSD).”
  • 5. Who is Sandy Lawson? Ask Google
  • 6. Who is Sandy… to me?
      • Mentor in computer-generated nomenclature
      • Educational Technologist
      • Innovator
      • Ethical
      • “Gentleman Sandy”
  • 7. What is the Structure of Vitamin K1?
  • 8. ChemSpider
      • The Free Chemical Database
      • A central hub for chemists to source information
          • >26 million unique chemical records
          • Aggregated from >400 data sources
          • Chemicals, spectra, CIF files, movies, images, podcasts, links to patents, publications, predictions
      • A central hub for chemists to deposit & curate data
  • 9. ChemSpider general statements
      • ChemSpider: one of many important resources
      • The “Google and Wikipedia of Chemistry”
      • A vision of “Linking all chemistry on the internet”
      • Most people in this room probably know about it
      • New people discover us regularly
      • Our distinct roles are:
          • Hosting and exposing data for the community
          • Curating and validating chemistry-related data
  • 10. I want to know about “Vincristine”
  • 11. I want to know about “Vincristine” If all algorithms work then everything on the page is correct by default except the name!
  • 12. Vincristine: Identifiers and Properties
  • 13. Vincristine: Identifiers and Properties
  • 14. Vincristine: Vendors and Sources
  • 15. Vincristine: Patents
  • 16. Vincristine: Articles
  • 17. Searches: The INTERNET All ChemSpider and Internet searches are “simply algorithms” but synonym searching is based on an assertion
  • 18. InChIs
  • 19. Validated Names for Searching…
  • 20. What you might not know about Chemistry Databases on the Internet
      • Data-sharing between the databases is cyclic – proliferating errors – “Linked Data”
  • 21. What you might not know about Chemistry Databases on the Internet
      • Some public databases are “trusted” as primary sources
      • Trust is granted without investigation or understanding of the content
  • 22. Consider searching each of these chemical databases by chemical name (systematic name, trade name or synonym). Please mark each online resource according to how much you generally trust the results.
  • 23. What you might not know about Chemistry Databases on the Internet
      • Some public databases are “trusted” as primary sources.
  • 24. What you might not know about Chemistry Databases on the Internet
      • Some public databases are “trusted” as primary sources
      • Trust is granted without investigation or understanding of the content
      • What do we know about some of the online resources?
  • 25. PHYSPROP Database
      • The freely downloadable database under the EPI Suite prediction software
      • Very basic filters suggest data quality issues
  • 26. The Stereochemistry challenge. 12500 chemicals with “missed” stereo
  • 27. NIST Webbook
  • 28. PubChem
  • 29. What you might not know about Chemistry Databases on the Internet
      • Make sure you blame the database hosts!!! (???)
      • Errors are primarily deposited and inherited by the data suppliers
      • Chemistry databases depend enormously on structure representations…
  • 30.  
  • 31.  
  • 32.  
  • 33. What you might not know about Chemistry Databases on the Internet
      • Despite all of the blog posts, lectures, presentations and pleas it’s not improving
  • 34. NPC Browser http://tripod.nih.gov/npc/
  • 35. NPC Browser http://tripod.nih.gov/npc/
  • 36. NPC Browser http://tripod.nih.gov/npc/
  • 37. NPC Browser http://tripod.nih.gov/npc/
  • 38. Patents
  • 39. Patents
  • 40. WYSIWYG compounds
  • 41. WYSIWYG compounds
  • 42. But ChemSpider is curated, right?
  • 43. Originally 15 compounds “called” Yohimbine 54 Skeletons for Yohimbine
  • 44. All aggregators suffer dilution!
  • 45. What is the structure of Discodermolide?
  • 46. How to distinguish…who’s wrong?
  • 47. Neither is wrong
  • 48. Data Curation…long torturous task
      • Data curation – JUST structure-name validation is a long, torturous, iterative task.
      • How about validating “data” – PhysChem data such as logP data, boiling points, melting points, spectra
  • 49. Curating Melting Point Data http://tinyurl.com/3e44vbx
  • 50. Melting Point Validation Work
  • 51. Some melting points can’t be resolved only with literature: 4-benzyltoluene
  • 52. Data Curation…long torturous task
      • Data curation – JUST structure-name validation is a long, torturous, iterative task.
      • How about validating “data” – PhysChem data such as logP data, boiling points, melting points (J.C. Bradley’s talk), spectra
      • The crowd in crowdsourcing is… generally small
      • Which of the large databases are doing careful curation? How can we share the workload? Hmm…
  • 53. ChemSpider can “do it” for us
      • ChemSpider provides a curation interface
      • All curation activities are available for review, online immediately, iteratively checked
      • Curators have different abilities based on their profile: There are only a few “Master Curators”.
      • Can we “share” the curation workload?
  • 54. Identifier Dictionaries
      • Reciprocal curation processes…share curation with each other.
      • If a database has a compound already then use InChIKeys to match “suggested” validation against the compound.
      • A series of “added” and “removed” synonyms against InChIKeys for matching.
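
The identifier-dictionary idea on slide 54 can be illustrated with a small sketch. It is not ChemSpider's implementation: the record layout `(inchikey, action, synonym)`, the function name `apply_synonym_feed`, and the placeholder keys are all assumptions made for illustration. The sketch applies a feed of “added” and “removed” synonyms to a local dictionary and only touches compounds the receiving database already holds.

```python
def apply_synonym_feed(dictionary, feed):
    """Apply shared curation records to a local synonym dictionary.

    dictionary: dict mapping an InChIKey string to a set of synonyms.
    feed: iterable of (inchikey, action, synonym) tuples, where action
          is "added" or "removed" (a hypothetical record layout).
    Returns the records that were skipped because the compound is not
    present locally.
    """
    skipped = []
    for inchikey, action, synonym in feed:
        if inchikey not in dictionary:
            # Only curate compounds the database already has.
            skipped.append((inchikey, action, synonym))
            continue
        if action == "added":
            dictionary[inchikey].add(synonym)
        elif action == "removed":
            dictionary[inchikey].discard(synonym)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return skipped


# Placeholder keys, not real InChIKeys.
local = {"KEY-A": {"aspirin", "wrongname"}}
feed = [
    ("KEY-A", "added", "acetylsalicylic acid"),
    ("KEY-A", "removed", "wrongname"),
    ("KEY-B", "added", "something else"),
]
skipped = apply_synonym_feed(local, feed)
```

Returning the skipped records rather than silently dropping them keeps the exchange auditable, which matches the slide's framing of these as “suggested” validations.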
  • 55. Proof of Concept Data Curation Sharing
  • 56. Structure Validation using feed
      • Look for approved synonyms
      • Compare feed InChIKey with database InChIKey
      • If different, flag for inspection
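
The three steps on slide 56 can be sketched as follows. This is a minimal illustration under stated assumptions, not the feed format or code used in the proof of concept: the feed is assumed to be a list of `(approved_synonym, inchikey)` pairs, the database is assumed to be a plain dict from lowercased name to InChIKey, and `flag_mismatches` is a hypothetical name.

```python
def flag_mismatches(feed, database):
    """Flag names whose database structure disagrees with a curated feed.

    feed: iterable of (approved_synonym, feed_inchikey) pairs.
    database: dict mapping a lowercased name to the InChIKey stored
              for that name in the receiving database.
    Returns a list of (synonym, feed_inchikey, database_inchikey)
    tuples that need manual inspection.
    """
    flagged = []
    for synonym, feed_key in feed:
        # Step 1: look for the approved synonym in the database.
        db_key = database.get(synonym.lower())
        if db_key is None:
            continue  # name not present, nothing to compare
        # Steps 2 and 3: compare InChIKeys, flag if they differ.
        if db_key != feed_key:
            flagged.append((synonym, feed_key, db_key))
    return flagged


# Placeholder keys, not real InChIKeys.
feed = [("Vincristine", "KEY-1"), ("Yohimbine", "KEY-2"), ("Unknownol", "KEY-3")]
database = {"vincristine": "KEY-1", "yohimbine": "KEY-9"}
flagged = flag_mismatches(feed, database)
```

The sketch only flags disagreements. Deciding which side is right is left to a curator, as the slide says.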
  • 57. Identifier Dictionaries
      • Reciprocal curation processes…share curation with each other.
      • If a database has a compound already then use InChIKeys to match “suggested” validation against the compound.
      • A series of “added” and “removed” synonyms against InChIKeys for matching.
      • Who will participate???
  • 58. Batch Validation Also Works!
      • Batch validation of name-structure relationships
      • “Background Processing framework”
      • Hexamethylchickenwire Chloride = C12H23O5
  • 59. Batch Validation Also Works!
      • Batch validation of name-structure relationships
      • “Background Processing framework”
      • Hexamethylchickenwire Chloride = C12H23O5
  • 60. Batch Validation Also Works!
      • Batch validation of name-structure relationships
      • “Background Processing framework”
      • Hexamethylchickenwire Chloride = C12H23O5
      • Define set of synonym filters and process the entire backfile. We will use synonym filters at deposition
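
One plausible kind of synonym filter is sketched below, using the slide's own example. The slides do not describe the actual filters, so the rule here (a name that implies a halogen must have that element in the formula), the `ELEMENT_WORDS` table, and the function names are assumptions for illustration only.

```python
import re

# Hypothetical, deliberately tiny table: a word in the name implies
# that the element symbol must appear in the molecular formula.
ELEMENT_WORDS = {
    "chloride": "Cl",
    "bromide": "Br",
    "fluoride": "F",
    "iodide": "I",
}


def formula_elements(formula):
    """Return the set of element symbols in a simple formula like C12H23O5."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def name_formula_conflicts(name, formula):
    """List element symbols implied by the name but absent from the formula."""
    present = formula_elements(formula)
    lowered = name.lower()
    return sorted(
        symbol
        for word, symbol in ELEMENT_WORDS.items()
        if word in lowered and symbol not in present
    )


conflicts = name_formula_conflicts("Hexamethylchickenwire Chloride", "C12H23O5")
clean = name_formula_conflicts("Sodium chloride", "NaCl")
```

A check like this is cheap enough to run over an entire backfile and again at deposition time, which is the workflow the slide proposes.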
  • 61. Community Contribution to ChemSpider
      • ChemSpider as a host for community contributions
          • Curation and validation input
          • Structures
          • Movies
          • Images
          • Analytical data – especially spectra
  • 62. Spectra
  • 63. www.SpectralGame.com http://www.jcheminf.com/content/1/1/9
  • 64. Spectral Game
  • 65. Data Curation
  • 66. Reversed Spectrum
  • 67. Download, reprocess, redeposit
  • 68. True Curation of Data
  • 69. Batch wise validation of NMR data
  • 70. Automated C13 Verification
  • 71. Mixture Identified
  • 72. NMR Verification
      • H1 NMR: 77% of spectra consistent
      • C13 NMR: 67% of spectra consistent
      • Algorithms NOT perfect but did identify:
          • Misreferenced data
          • Reversed spectra
          • 22 mixtures identified
          • Signal-to-noise was poor – missing peaks
      • What about 2D NMR verification?
  • 73. ChemSpider ID 24528095 HHCOSY
  • 74. ChemSpider ID 24528095 HSQC
  • 75. Crowdsourced Spectral Data
      • Spectral data available at http://www.chemspider.com/spectra.aspx
      • Regular data depositions
      • Generally licensed as Open Data
      • Chemical vendors now contributing spectral data – up to 800 spectra presently being acquired
      • All data welcomed – who will they benefit?
          • www.SpectralGame.com
          • http://spectraschool.rsc.org/
  • 76. SpectraSchool
  • 77.  
  • 78. Community Contribution to ChemSpider
      • ChemSpider as a host for community contributions
          • Curation and validation input
          • Analytical data – especially spectra
          • Movies, images
          • Is it just structures?
      • ChemSpider SyntheticPages as a host for reaction syntheses
  • 79. ChemSpider SyntheticPages
  • 80. ChemSpider SyntheticPages
  • 81. Submission Process
      • Simple template-based submission process
      • Submissions reviewed by editorial board. Published as is or comments sent to author
      • Online Peer Review process
      • Data supported include web movies, images, live spectra etc.
      • DOI issued to author
  • 82. Is it working?
      • Show of hands…
          • How many of you know CSSP?
          • Have any of you submitted to CSSP?
      • Low submissions but some dedicated authors
  • 83. Is it working?
      • Show of hands…
          • How many of you know CSSP?
          • Have any of you submitted to CSSP?
      • Low submissions but some dedicated authors
      • It is NOT a technology issue
          • Students need permission to publish
          • Publishing syntheses might prevent publication
          • CSSP would grow if we abstracted supp. info – templated supp. info submissions could help.
  • 84. Crowdsourcing – does it work?
      • 131 people have EVER either deposited or curated data on ChemSpider
      • ChemSpider SyntheticPages has a small group of dedicated authors
      • Database hosts and vendors make the largest contributions of data
      • ChemSpider staff do the most curation
  • 85. If it was not just about me…
      • We might have a community built encyclopedia
      • I might know where the best restaurants are
      • I might get good advice on books to read
      • I might know which movies to watch
      • I might know which plumber to call
      • Data might just be Open
  • 86. If it was not just about me…
      • We might have a community built encyclopedia
      • I might know where the best restaurants are
      • I might get good advice on books to read
      • I might know which movies to watch
      • I might know which plumber to call
      • Data might just be Open
  • 87. How will it improve?
      • Participation and contribution
  • 88. RSC’s LearnChemistry:Share
  • 89.
      • Improved Quality of data is essential
      • Open PHACTS: partnership between European Community and EFPIA
      • Freely accessible for knowledge discovery and verification.
          • Data on small molecules
          • Pharmacological profiles
          • ADMET data
          • Biological targets and pathways
          • Proprietary and public data sources.
  • 90. Conclusions
      • ChemSpider has an important role in quality data
      • Crowdsourced deposition, validation and curation works but low engagement to date
      • Primary challenge – engaging the community to help create what they want. Rewards and recognition?
      • MORE collaboration can benefit us all
      • All indicators are good for continued growth
  • 91. Acknowledgments
      • The ChemSpider team
      • Craig Knox, DrugBank
      • Our data providers, depositors, collaborators and curators
      • Software providers – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
  • 92. Thank you
      • Email: williamsa@rsc.org
      • Twitter: ChemConnector
      • Blog: www.chemspider.com/blog
      • Personal Blog: www.chemconnector.com
      • SLIDES: www.slideshare.net/AntonyWilliams
