RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn

These are the slides I will be presenting at the Science Commons Symposium Pacific Northwest, at the Microsoft campus in Redmond, in about five minutes' time.

Transcript

  • 1. ChemSpider: Collecting and Curating the World’s Chemistry with the Community
  • 2. A Pragmatic Vision
    • “Build a Structure-Centric Community”
    • December 2006 – A hobby project initiated to connect chemistry on the web
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • 3. Where is chemistry online?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 4. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches are required to source data
  • 5. What do humans want?
    • As few interfaces as possible
    • Image: media.obsessable.com
  • 6. Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
    • Classical business models will have to morph
  • 7. Getting it done
    • March 2007 – A beta system opened online
      • One purchased computer, two home-built
      • Seeded with 10.5 million structures
      • Structure/substructure searching
    • June 2007
      • A curating layer to flag data
      • A deposition interface to add to the data
      • And so it continued…
  • 8. ChemSpider Searches
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Search Cholesterol
  • 12. Search Cholesterol
  • 13. Search Cholesterol
  • 14. Linked across the internet
  • 15. Kyoto Encyclopedia of Genes and Genomes
  • 16. Links to Patents based on structure
  • 17. Articles Linked
  • 18. ChemSpider Complex Searches
  • 19. Link off a structure in ChemSpider
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 20. Answering Questions for Chemists
    • Questions a chemist might ask…
      • What is the melting point of n-butanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • 21. What is a compound?
  • 22. ChemSpider is a structure-centric hub
    • ChemSpider aggregates and links out across the internet
    • Data aggregate based on “structures and links”
    • What defines a chemical compound?
  • 23. Linked Data on the Web (taken from Rafael Sidi’s blog)
  • 24. Where Would You Look? What Do You Trust?
  • 25. Question Everything online: www.dhmo.org
  • 26. Di-Hydrogen Monoxide
    • 2H
  • 27. Di-Hydrogen Monoxide
    • 2H + 1O
  • 28. Di-Hydrogen Monoxide
    • H2O
  • 29. Di-Hydrogen Monoxide
    • H2O
    • Water
  • 30. It’s all on Wikipedia…
  • 31. Chemistry on The Internet Is Messy
  • 32. It’s Methane…
  • 33. What’s Methane?
  • 34. What’s Methane?
  • 35. What ELSE is Methane???
  • 36. PubChem
  • 37. Chemistry is REALLY Messy
  • 38. Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • 39. Vancomycin on ChemSpider: 1 compound – 3 days
  • 40. The EXPERTS must get it right?!
  • 41. Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • 42. The InChI Identifier
  • 43. Multiple Layers
  • 44. InChI Strings Hash to InChIKeys
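
The title of slide 44 says that InChI strings are hashed to InChIKeys. A minimal Python sketch of that idea follows; it is not the real InChIKey algorithm. It assumes, for illustration only, that the first block is hashed from the formula, connectivity (/c) and hydrogen (/h) layers and the second block from all remaining layers, and it uses SHA-256 with a simple letter encoding. The names `letters` and `toy_key` and the alanine inputs are assumptions chosen for this sketch, not taken from the slides.

```python
import hashlib
import string

ALPHABET = string.ascii_uppercase

# Example inputs chosen for this sketch (not from the slides):
# InChI strings for L- and D-alanine, which differ only in a stereo layer.
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"


def letters(text: str, n: int) -> str:
    """Hash text with SHA-256 and render the first n digest bytes as A-Z."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(ALPHABET[b % 26] for b in digest[:n])


def toy_key(inchi: str) -> str:
    """Toy two-block key (NOT a real InChIKey).

    Block 1 hashes the formula plus the /c and /h layers.
    Block 2 hashes every other layer (stereo and so on).
    """
    body = inchi.split("/", 1)[1]  # drop the "InChI=1S" prefix
    layers = body.split("/")
    formula, rest = layers[0], layers[1:]
    skeleton = [formula] + [l for l in rest if l[:1] in ("c", "h")]
    other = [l for l in rest if l[:1] not in ("c", "h")]
    return letters("/".join(skeleton), 14) + "-" + letters("/".join(other), 8)


if __name__ == "__main__":
    for name, inchi in (("L-alanine", L_ALANINE), ("D-alanine", D_ALANINE)):
        print(name, toy_key(inchi))
```

Because the two inputs share the same formula, /c and /h layers, the toy keys share a first block; only the second block reflects the stereo difference. The fixed length is what makes such keys convenient for text-based web searches.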
  • 45. InChIs for Taxol
  • 46. InChIKeys for Taxol
    • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    • ChEBI and Wikipedia are the SAME structure
    • DrugBank is a DIFFERENT structure – it differs at ONE stereocenter
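
The three Taxol keys on slide 46 can be compared block by block. The sketch below, in Python, splits each key at the hyphen and groups the sources by first block, which is the part of an InChIKey derived from the connectivity (the skeleton). The keys are copied from the slide as transcribed; the names `split_key` and `group_by_skeleton` are assumptions made for this example, and the code only compares strings, it does not inspect any structure.

```python
from collections import defaultdict

# Keys exactly as listed on slide 46.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key: str) -> tuple:
    """Return (first block, remainder) of a hyphen-separated key."""
    first, _, rest = key.partition("-")
    return first, rest


def group_by_skeleton(keys: dict) -> dict:
    """Map each first block to the (source, remainder) pairs that share it."""
    groups = defaultdict(list)
    for source, key in keys.items():
        first, rest = split_key(key)
        groups[first].append((source, rest))
    return dict(groups)


if __name__ == "__main__":
    for first, members in group_by_skeleton(TAXOL_KEYS).items():
        print("skeleton block:", first)
        for source, rest in members:
            print("   ", source, rest)
```

All three sources fall into a single group because they share the first block, so a search on that block alone finds every record with the same skeleton. Deciding whether two records depict exactly the same structure still needs the full key or the InChI string itself.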
  • 47. Does one stereocenter matter?
  • 48. Does one stereocenter matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 49. Does one stereocenter matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 50. Assertion and Chemical Entities
    • Who says what Taxol is?
    • What is the “timeline” for a molecule?
    • How do we clean up the Public data?
    • The Quality source is Chemical Abstracts Service…
  • 51. Vancomycin – Search the Internet
  • 52. Full Molecule Search: 4 Hits
  • 53. Full Skeleton Search: 104 Hits
  • 54. The InChI “Resolver”
  • 55. Citizen Scientists
  • 56. Crowd-sourcing Chemistry Curation
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
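
The curation actions listed on slide 56 can be pictured as proposals that take effect only after review. The Python sketch below is purely hypothetical: the class names, fields and the single approval step are assumptions made for illustration and say nothing about how ChemSpider is actually implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Action(Enum):
    TAG_ERROR = "tag_error"
    REMOVE_SYNONYM = "remove_synonym"
    DEPRECATE_RECORD = "deprecate_record"


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Record:
    """Hypothetical compound record with only the fields this sketch needs."""
    record_id: int
    synonyms: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)
    deprecated: bool = False


@dataclass
class Proposal:
    """A curator's suggested change, pending review."""
    record_id: int
    action: Action
    curator: str
    detail: str = ""
    status: Status = Status.PROPOSED


def review(record: Record, proposal: Proposal, approve: bool) -> None:
    """Apply a proposal to a record only if a reviewer approves it."""
    if proposal.record_id != record.record_id:
        raise ValueError("proposal targets a different record")
    if not approve:
        proposal.status = Status.REJECTED
        return
    proposal.status = Status.APPROVED
    if proposal.action is Action.TAG_ERROR:
        record.flags.append(proposal.detail)
    elif proposal.action is Action.REMOVE_SYNONYM:
        if proposal.detail in record.synonyms:
            record.synonyms.remove(proposal.detail)
    elif proposal.action is Action.DEPRECATE_RECORD:
        record.deprecated = True


if __name__ == "__main__":
    record = Record(1, ["methane", "marsh gas", "placeholder wrong name"])
    fix = Proposal(1, Action.REMOVE_SYNONYM, "curator_a", "placeholder wrong name")
    review(record, fix, approve=True)
    print(record.synonyms, fix.status.value)
```

Keeping the proposal separate from the record means a rejected suggestion leaves the data untouched while still recording who proposed what.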
  • 57. Multi-level Curation and Approval: Building a Structure-Centric Community for Chemists
  • 58. Citizens as Data Sources
  • 59.  
  • 60. Semantic Markup: Project Prospect
  • 61. Entity-Extraction, Mark-up, Annotate
  • 62. Success Depends on Dictionaries
  • 63. ChemMantis and CJOC
  • 64. Name-Structure Pairs
  • 65. Species – linked to Wikipedia
  • 66. Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 67. ChemSpider Everywhere: Embed
  • 68. ChemSpider Everywhere: Spectral Game
  • 69. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  • 70. ChemSpider Everywhere: What do computers want?
    • Web services
    • Image: flickr.com/photos/microcosmos
  • 71. ChemSpider Everywhere
    • Linked from Wikipedia and many Public Databases
    • Linked from Open Notebook Science sites
    • Linked from Blogs using Structure/Spectra EMBED
    • Integrated into structure drawing packages
    • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  • 72. ChemSpider Everywhere: ChemMobi
  • 73. There will always be gaps...
    • What ChemSpider does not deal with, yet...
      • Materials
      • Minerals
      • Polymers
      • Biological macromolecules
  • 74. Open Source, Access and Data
    • ChemSpider is NOT Open Source but we do use Open Source components (OpenBabel, JSpecView, Jmol). Thanks Microsoft!
    • ChemSpider is not an “Open Access Database” – it’s a “free access” resource
    • We do not assume copyright. Rights to the data and the creative works remain with the depositor
    • Is ChemSpider “Open Data”?
  • 75. Open Data?
  • 76. Who declares data as Open?
    • Data licensing is very interesting and can spark “interesting” conversations. Opinions differ:
      • Are images data? Are assertions data?
      • What on a ChemSpider record is data?
      • Is PubChem or PubMed Open Data?
    • We allow people to declare their data as Open and add an Open Data button at upload
    • A lot of data on ChemSpider are free but not Open
    • Pragmatism: Our focus is a community resource
  • 77. Conclusions: ChemSpider Today
    • ChemSpider is an established community resource
      • >23 million compounds from >300 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
      • Web services provider
        • Linked to commercial and open source software
        • Supporting analytical companies: Agilent, Thermo, Waters, Bruker
        • Serving ONS, providing games to students, ChemSpidey robot
      • A publishing platform for the community
  • 78. ChemSpider Tomorrow
    • Continue the curation effort and keep cleaning
    • Finish depositions – millions left to deposit
    • Integrate RSC content – a massive archive!
    • Integrate RSC publishing workflows and databases
    • Enable the semantic web for chemistry
  • 79. Acknowledgments
    • Royal Society of Chemistry
    • Valery Tkachenko and Sergey Shevelev
    • Commercial Software: Microsoft, Advanced Chemistry Development, OpenEye and Symyx
    • Open Source Software: Jmol, OpenBabel, JSpecView
    • JC Bradley, Andrew Lang – The Spectral Game and Open Notebook Science integration
    • The “Crowd” of curators
    • 306 Data Source providers
    • SyntheticPages.org
  • 80. Thank you
    • [email_address]
    • Twitter: ChemSpiderman
    • www.chemspider.com/blog
    • SLIDES: www.slideshare.net/AntonyWilliams