RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn

These are the slides for the talk I am about to give at the Science Commons Symposium Pacific Northwest, on the Microsoft campus in Redmond, in about five minutes' time.

Published in: Technology, Education

Transcript

  • 1. ChemSpider: Collecting and Curating the World’s Chemistry with the Community
  • 2. A Pragmatic Vision: “Build a Structure Centric Community”
      • December 2006 – A hobby project initiated to connect chemistry on the web
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • 3. Where is chemistry online?
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Property databases
      • Patents with chemical structures
      • Drug Discovery data
      • Scientific publications
      • Compound aggregators
      • Blogs/Wikis and Open Notebook Science
  • 4. Chemistry on the Internet TODAY
      • Chemistry searches are generally limited to text-based searches across the internet
      • Data are dirty: sorting the wheat from the chaff. Who can you trust?
      • Too many searches required to resource data
  • 5. What do humans want? As few interfaces as possible (image: media.obsessable.com)
  • 6. Chemistry on the Internet FUTURE
      • The semantic web for chemistry is in place
      • Crowdsourced contributions are commonplace
      • Chemists will search by structure/substructure
      • Chemistry articles indexed and searchable
      • Reduced number of searches to find data
      • Data are integrated – compounds, vendors, syntheses, data, publications and patents
      • A world of Open Access and Open Data
      • Classical business models will have to morph
  • 7. Getting it done
      • March 2007 – A beta system opened online
      • One purchased computer, two home-built
      • Seeded with 10.5 million structures
      • Structure/substructure searching
      • June 2007
      • A curating layer to flag data
      • A deposition interface to add to the data
      • And so it continued….
  • 8. ChemSpider Searches
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Search Cholesterol
  • 12. Search Cholesterol
  • 13. Search Cholesterol
  • 14. Linked across the internet
  • 15. Kyoto Encyclopedia of Genes and Genomes
  • 16. Links to Patents based on structure
  • 17. Articles Linked
  • 18. ChemSpider Complex Searches
  • 19. Link off a structure in ChemSpider
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 20. Answering Questions for Chemists
      • Questions a chemist might ask…
      • What is the melting point of n-butanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • 21. What is a compound?
  • 22. ChemSpider is a structure-centric hub
      • ChemSpider aggregates and links out across the internet
      • Data aggregate based on “structures and links”
      • What defines a chemical compound?
  • 23. Linked Data on the Web Taken from: Rafael Sidis’ Blog
  • 24. Where Would You look? What Do You Trust?
  • 25. Question Everything online: www.dhmo.org
  • 26. Di-Hydrogen Monoxide 2H
  • 27. Di-Hydrogen Monoxide 2H + 1O
  • 28. Di-Hydrogen Monoxide H2O
  • 29. Di-Hydrogen Monoxide H2O Water
  • 30. It’s all on Wikipedia…
  • 31. Chemistry on The Internet Is Messy
  • 32. It’s Methane…
  • 33. What’s Methane?
  • 34. What’s Methane?
  • 35. What ELSE is Methane???
  • 36. PubChem
  • 37. Chemistry is REALLY Messy
  • 38. Vancomycin
      • Who will curate?
      • How would you clean such a large dataset?
      • Assertions!!!
  • 39. Vancomycin on ChemSpider: 1 compound – 3 days
  • 40. The EXPERTS must get it right?!
  • 41. Wikipedia, C&E News, PubChem C&E News (from ACS)
  • 42. The InChI Identifier
  • 43. Multiple Layers
  • 44. InChIStrings Hash to InChIKeys
  • 45. InChIs for Taxol
  • 46. InChIKeys for Taxol
      • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      • ChEBI and Wikipedia are the SAME structure
      • DrugBank is a DIFFERENT structure – ONE stereocenter
  • 47. Does one stereocenter matter?
  • 48. Does one stereocenter matter?
      • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 49. Does one stereocenter matter?
      • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 50. Assertion and Chemical Entities
      • Who says what Taxol is?
      • What is the “timeline” for a molecule?
      • How do we clean up the Public data?
      • The Quality source is Chemical Abstracts Service…
  • 51. Vancomycin – Search the Internet
  • 52. Full Molecule Search: 4 Hits
  • 53. Full Skeleton Search: 104 Hits
  • 54. The InChI “Resolver”
  • 55. Citizen Scientists
  • 56. Crowd-sourcing Chemistry Curation
      • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 57. Building a Structure Centric Multi-level Curation and Approval
  • 58. Citizens as Data Sources
  • 59. Semantic Markup: Project Prospect
  • 60. Entity-Extraction, Mark-up, Annotate
  • 61. Success Depends on Dictionaries
  • 62. ChemMantis and CJOC
  • 63. Name-Structure Pairs
  • 64. Species – linked to Wikipedia
  • 65. Semantic Linking of Structures
      • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 66. ChemSpider Everywhere: Embed
  • 67. ChemSpider Everywhere: Spectral Game
  • 68. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  • 69. ChemSpider Everywhere: What do computers want? Web services (image: flickr.com/photos/microcosmos)
  • 70. ChemSpider Everywhere
      • Linked from Wikipedia and many Public Databases
      • Linked from Open Notebook Science sites
      • Linked from Blogs using Structure/Spectra EMBED
      • Integrated into structure drawing packages
      • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  • 71. ChemSpider Everywhere: ChemMobi
  • 72. There will always be gaps...
      • What ChemSpider does not deal with, yet...
      • Materials
      • Minerals
      • Polymers
      • Biological macromolecules
  • 73. Open Source, Access and Data
      • ChemSpider is NOT Open Source but we do use Open Source components (OpenBabel, JSpecView, Jmol). Thanks Microsoft!
      • ChemSpider is not an “Open Access Database” – it’s a “free access” resource
      • We do not assume copyright. Rights to the data and the creative works remain with the depositor
      • Is ChemSpider “Open Data”?
  • 74. Open Data?
  • 75. Who declares data as Open?
      • Data licensing is very interesting and can spark “interesting” conversations. Opinions differ:
      • Are images data? Are assertions data?
      • What on a ChemSpider record is data?
      • Is PubChem or PubMed Open Data?
      • We allow people to declare their data as Open and add an Open Data button at upload
      • A lot of data on ChemSpider are free but not Open
      • Pragmatism: Our focus is a community resource
  • 76. Conclusions: ChemSpider Today
      • ChemSpider is an established community resource
      • >23 million compounds from >300 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
      • Web services provider
      • Linked to commercial and open source software
      • Supporting analytical companies: Agilent, Thermo, Waters, Bruker
      • Serving ONS, providing games to students, ChemSpidey robot
      • A publishing platform for the community
  • 77. ChemSpider Tomorrow
      • Continue the curation effort and keep cleaning
      • Finish depositions – millions left to deposit
      • Integrate RSC content – a massive archive!
      • Integrate RSC publishing workflows and databases
      • Enable the semantic web for chemistry
  • 78. Acknowledgments
      • Royal Society of Chemistry
      • Valery Tkachenko and Sergey Shevelev
      • Commercial Software: Microsoft, Advanced Chemistry Development, OpenEye and Symyx
      • Open Source Software: Jmol, OpenBabel, JSpecView
      • JC Bradley, Andrew Lang – The Spectral Game and Open Notebook Science integration
      • The “Crowd” of curators
      • 306 Data Source providers
      • SyntheticPages.org
  • 79. Thank you
      • antony.williams@chemspider.com
      • Twitter: ChemSpiderman
      • www.chemspider.com/blog
      • SLIDES: www.slideshare.net/AntonyWilliams
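
Slides 46, 52 and 53 contrast exact-structure matches with skeleton-level matches. The following is a minimal Python sketch of that distinction, using only the three Taxol InChIKeys listed on slide 46. It relies on the InChIKey convention that the block before the first hyphen is derived from the connectivity (skeleton) layers, while the rest of the key covers the remaining layers such as stereochemistry. The function names (split_key, full_match, skeleton_match) are hypothetical and chosen for illustration; this is not ChemSpider's search code.

```python
# InChIKeys for Taxol exactly as transcribed on slide 46.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Split a key into (first_block, remainder) at the first hyphen."""
    first_block, _, remainder = key.partition("-")
    return first_block, remainder


def full_match(records, query_key):
    """Return the sources whose key equals the query key exactly."""
    return sorted(src for src, key in records.items() if key == query_key)


def skeleton_match(records, query_key):
    """Return the sources whose key shares the query's first block."""
    query_block, _ = split_key(query_key)
    return sorted(
        src for src, key in records.items() if split_key(key)[0] == query_block
    )


if __name__ == "__main__":
    query = TAXOL_KEYS["DrugBank"]
    print("full match:    ", full_match(TAXOL_KEYS, query))
    print("skeleton match:", skeleton_match(TAXOL_KEYS, query))
```

Queried with the DrugBank key, the exact comparison returns only DrugBank, while the first-block comparison returns all three sources. This is the same pattern as the Vancomycin searches, where the skeleton search returns far more hits than the full-molecule search.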
