ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Community

This is the presentation I gave at OpenSciNY 2010. It was a great gathering of librarians and people interested in Open Science. Sharing the stage with Beth Brown, Jean-Claude Bradley and Heather Joseph was, as usual, a good opportunity to discuss how openness and online data sharing are changing the way we access and share data. We live in interesting and exciting times.

Transcript

  • 1. ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Community. OpenSciNY, New York, May 2010
  • 2. Once Upon a Time Over a “Coffee”
  • 3. Which is better for Plants? Vodka, Sprite or Viagra?
  • 4. It Works – Viagra Wins the Day
  • 5. Now Which is Better?
      - Viagra or Cialis?
      - Images sourced from Wikipedia
  • 6. Cialis
      - I want…
          - The structure
          - Any patent information
          - Related publications
          - Where can I buy it?
          - Metabolic pathway info
          - What else is easy to find…
  • 7. Cialis on Google?
  • 8. What is Cialis?
  • 9. What is Cialis? Can we trust Wikipedia?
  • 10. What is Cialis? 6 hits on PubChem
  • 11. What is Cialis?
  • 12. Search by Trade Name
  • 13. Are there other names???
  • 14. Are there other names???
      - PubMed hits:
          - 736 Tadalafil
          - 744 Cialis
  • 15. Are there other names???
  • 16. Are There Other Names?
  • 17. IC351 on PubChem? 5 HITS for IC351, ZERO HITS for IC 351
  • 18. Chemistry on the Web
      - Text searching the web is far from optimal
      - The quality of data on the web is a problem
      - It may be hard to find, but it is “out there”
      - What was once locked up behind an expensive license can generally be found
      - Structure searching the web is already possible!
  • 19. Text Searching the Web
      - Text searching the web for chemical compounds is an enormous challenge
      - RSC has multiple databases, more than 500,000 articles and a lot of other resources. How do we do?
  • 20. The RSC Publishing Platform (Beta)
  • 21. 2+2 = 4 Articles?
  • 22. CAS Number Search
  • 23. Text Searching the Web
      - Disambiguation dictionaries of name-structure relationships would be very enabling.
          - IC351 = IC 351 = Tadalafil = Cialis = …
      - Creating validated dictionaries that cover chemistry is an enormous challenge
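
Slides 17 and 23 show that "IC351" and "IC 351" return different hits, and that a name-structure dictionary would tie all the synonyms together. Below is a minimal Python sketch of such a dictionary lookup. It assumes a tiny hand-built table (the synonyms come from the slides; the canonical record name `tadalafil` and the helpers `normalize` and `resolve` are hypothetical) and a naive normalization rule that casefolds and drops spaces and hyphens.

```python
import re
from typing import Optional

# Hypothetical mini-dictionary: every known synonym points at one canonical
# record. The synonyms come from the slides; the record name is an assumption.
SYNONYMS = {
    "IC351": "tadalafil",
    "Tadalafil": "tadalafil",
    "Cialis": "tadalafil",
}


def normalize(name: str) -> str:
    """Casefold and drop whitespace and hyphens so 'IC 351' matches 'IC351'."""
    return re.sub(r"[\s\-]+", "", name).casefold()


# Index the dictionary on normalized keys once, up front.
LOOKUP = {normalize(synonym): record for synonym, record in SYNONYMS.items()}


def resolve(name: str) -> Optional[str]:
    """Return the canonical record for a name, or None if it is not known."""
    return LOOKUP.get(normalize(name))


if __name__ == "__main__":
    for query in ["IC351", "IC 351", "ic-351", "CIALIS", "Viagra"]:
        print(f"{query!r} -> {resolve(query)}")
```

Stripping hyphens is only reasonable for short codes like IC351; in systematic names and CAS numbers the hyphens carry meaning, which is part of why validated dictionaries are hard to build.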
  • 24. CAS Registry – LOTS of Chemicals!
  • 25.  
  • 26.  
  • 27. The Final Search Strategy: A “Disambiguation Query”!
  • 28. All Those Names, One Structure: A problem to solve…
  • 29. ChemSpider - A Pragmatic Vision
      - “Build a Structure-Centric Community to Serve Chemists”
      - Aggregate and integrate chemical structure data on the web – names, structures, links
      - Create a “structure-based hub” to information, data and algorithmic predictions
      - Let chemists contribute their own data
      - Allow the community to curate/correct data
  • 30. What do humans want? (image: media.obsessable.com)
      - As few interfaces as possible
  • 31. Aggregating Data – Who to Trust???
      - Encyclopedic articles (Wikipedia)
      - Chemical vendor databases
      - Metabolic pathway databases
      - Property databases
      - Patents with chemical structures
      - Drug Discovery data
      - Scientific publications
      - Compound aggregators
      - Blogs/Wikis and Open Notebook Science
  • 32. Just “Public Compound” Databases
      - PubChem
      - DrugBank
      - ChEBI/ChEMBL
      - KEGG
      - LipidMAPs
      - ChemIDPlus
      - eMolecules
      - ZINC
      - Lots of chemical vendors
  • 33. Question Everything online: www.dhmo.org
  • 34. Di-Hydrogen Monoxide
      - 2H
  • 35. Di-Hydrogen Monoxide
      - 2H + 1O
  • 36. Di-Hydrogen Monoxide
      - H2O
  • 37. Di-Hydrogen Monoxide
      - H2O
      - Water
  • 38. It’s all on Wikipedia…
  • 39. What About Gases? Methane…
  • 40. What’s Methane?
  • 41. What’s Methane?
  • 42. What ELSE is Methane???
  • 43. Structural Data for Life Sciences: DailyMed
  • 44. Lack of Stereochemistry
  • 45. Incorrect Structures
  • 46. Pragmatic Vision Delivered…
      - Aggregate, integrate and link data from across the internet
      - Almost 25 million structures from more than 300 data sources
      - Linked to vendors, literature, online databases (open and commercial), open notebook science, patents and…
      - Robotic and crowdsourced curation
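
The "structure-based hub" idea of slides 29 and 46 amounts to merging records from many sources under one structure key. A minimal Python sketch, assuming the key is the Standard InChIKey and that each source delivers simple name/key records; the source names `source_A` to `source_C` and the `aggregate` helper are hypothetical. The two keys are the standard InChIKeys for methane and water.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical records as three placeholder sources might deliver them.
RECORDS: List[Dict[str, str]] = [
    {"source": "source_A", "name": "methane", "inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N"},
    {"source": "source_B", "name": "marsh gas", "inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N"},
    {"source": "source_A", "name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"source": "source_C", "name": "dihydrogen monoxide", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
]


def aggregate(records: List[Dict[str, str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Merge records that share an InChIKey into one structure-centred entry."""
    hub: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"names": set(), "sources": set()}
    )
    for rec in records:
        entry = hub[rec["inchikey"]]
        entry["names"].add(rec["name"])
        entry["sources"].add(rec["source"])
    return dict(hub)


if __name__ == "__main__":
    for key, entry in aggregate(RECORDS).items():
        print(key, sorted(entry["names"]), sorted(entry["sources"]))
```

Keying on the structure only works if every source drew the right structure; slides 44 and 45 show that this is not guaranteed, which is why the merged hub still needs robotic and crowdsourced curation.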
  • 47. Search “OEA”
  • 48. Search OEA
  • 49. Search OEA
  • 50. Search OEA
  • 51. Linked Patents for OEA
  • 52. Answering Questions…
      - Questions a student might ask…
          - What is the structure of levulinic acid?
          - Chemically, what is phenolphthalein?
          - What are the stereocenters of cholesterol?
          - Where can I find publications about xylene?
          - What are the different trade names for Ketoconazole?
          - What is the NMR spectrum of Aspirin?
          - How can I synthesize 2,4-dichlorophenol?
          - What are the safety handling issues for Thymol Blue?
  • 53. Back to Cialis…
  • 54. Cialis on ChemSpider: 1 hit
      - Chemicals are curated/validated on ChemSpider by ourselves and the community
      - Based on assertions from various sources. Iterative, time-consuming and exacting!
      - We believe we know the structure now
      - What is linked and available?
  • 55. Google Patents
  • 56. ChemSpider – Patents Linked (SureChem, Google Patents)
  • 57. Google Books
  • 58. Microsoft Academic Search
  • 59. Google Scholar – Articles were found by CAS Number!
  • 60. Identifiers for Tadalafil
  • 61. How Many Articles in RSC Journals?
      - Based on the CAS Number 171596-29-5, there are 13 articles in RSC journals
      - What if we VALIDATE identifiers?
  • 62. Validated Dictionaries Hit APIs This is data curation...
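
One cheap, mechanical step in validating identifiers is checking that a CAS Registry Number is well-formed. CAS numbers end in a check digit: weight the other digits 1, 2, 3, … from the right, sum them, and take the result modulo 10. A minimal Python sketch of that rule follows; the function name `is_valid_cas` is hypothetical.

```python
import re

# CAS Registry Numbers: 2-7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Check the layout and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["171596-29-5", "171596-29-4", "IC351"]:
        print(candidate, is_valid_cas(candidate))
```

For 171596-29-5 the weighted sum is 155, so the check digit 5 is consistent. A passing checksum only shows that the number is well-formed; whether it really belongs to tadalafil is still a curation question.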
  • 63. Does this generate more results?
  • 64. RSC Journals
  • 65. RSC Journals REMEMBER 2+2 = 4
  • 66. PubMed
  • 67. Google Scholar – Expanded Hit Set
  • 68. Microsoft Academic Search
  • 69. Microsoft Academic Search
      - Be careful! More mussels than drugs…
  • 70. Searching Chemistry on the Internet
      - Do we get a complete result set if we search for “chemicals” only by name?
      - Is there a better way to link chemistry databases? Linking by “names” is dangerous
      - Chemists want structure and SUBstructure searching
  • 71. Structure Searching the Web
      - We have resources about Tadalafil actively linked to ChemSpider
      - What about searching the web for Tadalafil by structure, not based on the various identifiers?
      - How?
  • 72. Link the Internet with InChIKeys! (taken from Rafael Sidis’ blog)
  • 73. The InChI Identifier
  • 74. Multiple Layers
  • 75. InChI Strings Hash to InChIKeys
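
Slides 73 to 75 describe the InChI as a layered string that is then hashed into a fixed-length InChIKey. The layers are separated by `/`: a version header, the formula, then layers that each start with a one-letter prefix (for example `c` for connections, `h` for hydrogens, `t`, `m` and `s` for tetrahedral stereo). Below is a minimal Python sketch that splits a standard InChI into those layers; the helper names are hypothetical, and the parser deliberately ignores fixed-H and reconnected sublayers, where a prefix letter can repeat.

```python
from typing import Dict

# Prefix letters of the layers that carry stereochemistry.
STEREO_PREFIXES = ("b", "t", "m", "s")


def inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI string into its version, formula and prefixed layers.

    Simplification: assumes a standard InChI without fixed-H or reconnected
    sublayers, where the same prefix letter can appear more than once.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        if part:
            # Each later layer starts with a one-letter prefix such as c, h or t.
            layers[part[0]] = part[1:]
    return layers


def has_stereo_layers(inchi: str) -> bool:
    """True if the InChI carries a double-bond or tetrahedral stereo layer."""
    layers = inchi_layers(inchi)
    return any(prefix in layers for prefix in STEREO_PREFIXES)


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    print(inchi_layers(ethanol))
    print(has_stereo_layers(ethanol))
```

A check like `has_stereo_layers` is one cheap way to flag records that may have lost their stereochemistry (slide 44), although many molecules, ethanol among them, legitimately have no stereo layers at all.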
  • 76. Cialis – Searching the Web by InChI: Search Molecular SKELETON or Search Full Molecule
  • 77. InChI Search the Web by Skeleton: 78 Hits
  • 78. InChI Search the Web, Exact Match: 32 Hits by InChIKey
  • 79. InChI Search the Web, Exact Match: 6 Hits by Standard InChIKey
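
The skeleton search and the exact search in slides 76 to 79 map onto the layout of an InChIKey: a 14-letter block hashed from the connectivity ("skeleton") layers, an 8-letter block hashed from stereo, isotopic and other layers followed by a standard/non-standard flag (`S` or `N`) and a version letter, and a final protonation letter. A minimal Python sketch of both search modes follows. The names `parse_key` and `search` are hypothetical, and the `AAAA…`-style keys are made-up placeholders, not real molecules.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    skeleton: str     # first block: hash of the connectivity ("skeleton") layers
    remainder: str    # hash of stereo, isotopic and other layers
    standard: bool    # 'S' flag = Standard InChIKey, 'N' = non-standard
    version: str      # 'A' for InChI version 1
    protonation: str  # 'N' means neutral, no protons added or removed


def parse_key(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks, rejecting malformed keys."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"malformed InChIKey: {key!r}")
    skeleton, remainder, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, remainder, flag == "S", version, protonation)


def search(query: str, index: Iterable[str], by_skeleton: bool = False) -> List[str]:
    """Return keys in the index that match the query exactly or by skeleton."""
    query_parts = parse_key(query)  # also validates the query key
    if not by_skeleton:
        return [key for key in index if key == query]
    return [key for key in index if parse_key(key).skeleton == query_parts.skeleton]


if __name__ == "__main__":
    # Made-up placeholder keys: the first two share a skeleton block.
    index = [
        "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
        "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
        "DDDDDDDDDDDDDD-BBBBBBBBSA-N",
    ]
    query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
    print("skeleton hits:", search(query, index, by_skeleton=True))
    print("exact hits:", search(query, index))
```

A skeleton match ignores everything outside the first block, so it groups all records with the same connectivity, whatever their stereochemistry, and always returns at least as many hits as an exact match.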
  • 80. InChIfying the Web
      - There are more than twice as many “skeleton” matches for Cialis as exact matches – different stereo? Mistakes?
      - Our judgment… MISTAKES
  • 81. Vancomycin – Search the Internet
  • 82. Full Molecule Search: 4 Hits
  • 83. Full Skeleton Search: 104 Hits
  • 84. InChIKeys
      - Make the internet searchable by adding InChIKeys
      - Publishers add InChIKeys to papers now…
      - But what is the structure???
  • 85. We need an InChI “Resolver”
  • 86. InChI Resolver to DOIs: Structure Search the Web
  • 87. Semantic Markup: Project Prospect
  • 88. Depends on Validated Dictionaries: Link to a Structure or the Right Structure?
  • 89. Name-Structure Pairs
  • 90. Semantic Linking of Structures
      - What would you want to link off a structure?
          - Chemical suppliers
          - Other publications
          - Analytical Data
          - Related Reactions
          - Wikipedia
          - Patents
          - “Everything”
          - Through ChemSpider!
  • 91. Unpublished Chemistry
      - Only a fraction of chemistry is published
      - Only a tiny fraction of chemistry is patented
      - What of the “Lost Chemistry”, never published and so never abstracted?
          - Reactions performed
          - Structures made and studied
          - Spectra acquired and then disposed of
          - Available chemicals never found
  • 92. Org Prep Daily (Blog)
  • 93. ChemSpider SyntheticPages
  • 94. Submission Process
      - Register as a user
      - Use the Submit button and fill in the fields…
  • 95. Submission Process
      - Submissions are reviewed by an editorial board
      - Published as is, or comments sent to the author
      - Online peer review process
      - Supported data include web movies, images, live spectra etc.
  • 96. Micro- and Nano-publications
      - Blogs, wiki entries and even Amazon book reviews are micro/nano-publications
      - ChemSpider SyntheticPages will be DOI’ed – students can add these “micro-publications” to their résumé
      - Structures and spectra are nano-publications – these can also be tracked and referenced (depositions, curations etc.). Students participate in building one of the premier sources of chemistry data.
  • 97. ChemSpider : Spectra Linked
  • 98. Spectra Linked
  • 99. Spectra Linked
  • 100. Not Just NMR Data
  • 101. www.SpectralGame.com http://www.jcheminf.com/content/1/1/9
  • 102. Spectral Game
  • 103. Increasing Complexity
  • 104. Spectral Game
  • 105. ChemSpider Content
      - ChemSpider is a container… it supports multimedia
          - Spectra
          - Crystal structures
          - Images
          - MP3s
          - Videos
  • 106. Roses’ Crystal Image Collection
  • 107. MP3s and Videos: Titanium
  • 108. Periodic Table Images
  • 109. How Can You Help ChemSpider?
      - Deposit your data and share with the community
          - Structures – one or many
          - Spectra
          - Links
          - Syntheses into SyntheticPages
      - Curate data – at the most basic level… just add comments
      - Spread the word – ChemSpider is an untapped resource
  • 110. Community Contribution
      - We can make a bigger contribution to the community if the community shares via ChemSpider
      - Don’t underestimate what others will find of value
      - ChemSpider wins “Community Contribution” best practice award
  • 111. Chemistry on the Internet: the FUTURE
      - The semantic web for chemistry is in place
      - Crowdsourced contributions are commonplace
      - Chemists will search by structure/substructure
      - Chemistry articles indexed and searchable
      - Reduced number of searches to find data
      - Data are integrated – compounds, vendors, syntheses, data, publications and patents
      - A world of Open Access and Open Data
  • 112. Thank you. Email: [email_address]; Twitter: ChemSpiderman; Blog: www.chemspider.com/blog; Slides: www.slideshare.net/AntonyWilliams
