RSC ChemSpider Science Commons Symposium Pacific Northwest #scspn

These are the slides I will be presenting at the Science Commons Symposium Pacific Northwest, on the Microsoft campus in Redmond, in about five minutes.

  • Doing great work cleaning up chemistry on the web.

Transcript

  • 1. ChemSpider: Collecting and Curating the World’s Chemistry with the Community
  • 2. A Pragmatic Vision
      • “Build a Structure Centric Community”
    • December 2006 – A hobby project initiated to connect chemistry on the web
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • 3. Where is chemistry online?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 4. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches are needed to find data
  • 5. What do humans want?
    • As few interfaces as possible
    • Image: media.obsessable.com
  • 6. Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
    • Classical business models will have to morph
  • 7. Getting it done
    • March 2007 – A beta system opened online
      • One purchased computer, two home-built
      • Seeded with 10.5 million structures
      • Structure/substructure searching
    • June 2007
      • A curating layer to flag data
      • A deposition interface to add to the data
      • And so it continued…
  • 8. ChemSpider Searches
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Search Cholesterol
  • 12. Search Cholesterol
  • 13. Search Cholesterol
  • 14. Linked across the internet
  • 15. Kyoto Encyclopedia of Genes and Genomes
  • 16. Links to Patents based on structure
  • 17. Articles Linked
  • 18. ChemSpider Complex Searches
  • 19. Link off a structure in ChemSpider
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 20. Answering Questions for Chemists
    • Questions a chemist might ask…
      • What is the melting point of n-butanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • 21. What is a compound?
  • 22. ChemSpider is a structure-centric hub
    • ChemSpider aggregates and links out across the internet
    • Data aggregate based on “structures and links”
    • What defines a chemical compound?
  • 23. Linked Data on the Web (taken from Rafael Sidi’s blog)
  • 24. Where Would You look? What Do You Trust?
  • 25. Question Everything online: www.dhmo.org
  • 26. Di-Hydrogen Monoxide
    • 2H
  • 27. Di-Hydrogen Monoxide
    • 2H + 1O
  • 28. Di-Hydrogen Monoxide
    • H2O
  • 29. Di-Hydrogen Monoxide
    • H2O
    • Water
  • 30. It’s all on Wikipedia…
  • 31. Chemistry on The Internet Is Messy
  • 32. It’s Methane…
  • 33. What’s Methane?
  • 34. What’s Methane?
  • 35. What ELSE is Methane???
  • 36. PubChem
  • 37. Chemistry is REALLY Messy
  • 38. Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • 39. Vancomycin on ChemSpider: 1 compound – 3 days
  • 40. The EXPERTS must get it right?!
  • 41. Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • 42. The InChI Identifier
  • 43. Multiple Layers
  • 44. InChI Strings Hash to InChIKeys
  • 45. InChIs for Taxol
  • 46. InChIKeys for Taxol
    • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    • ChEBI and Wikipedia are the SAME structure
    • DrugBank is a DIFFERENT structure – ONE stereocenter
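
The keys on slide 46 can be compared at the skeleton level with a few lines of code. This is a minimal sketch, not ChemSpider's implementation: it assumes that the first hyphen-separated block of an InChIKey hashes the connectivity (the "skeleton") and that the remaining characters hash the other layers, such as stereochemistry. The function names `skeleton_block` and `group_by_skeleton` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# InChIKeys copied from slide 46 (the older 25-character key format).
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def skeleton_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey.

    Assumption: this block is the hash of the connectivity layers.
    """
    return inchikey.split("-", 1)[0]


def group_by_skeleton(keys: dict[str, str]) -> dict[str, list[str]]:
    """Group source names by the skeleton block of their InChIKey."""
    groups: dict[str, list[str]] = defaultdict(list)
    for source, key in keys.items():
        groups[skeleton_block(key)].append(source)
    return dict(groups)


if __name__ == "__main__":
    for block, sources in group_by_skeleton(TAXOL_KEYS).items():
        print(block, sources)
```

Grouping on the first block is a skeleton-level match: records that land in the same group share connectivity but may still differ in the layers hashed into the rest of the key.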
  • 47. Does one stereocenter matter?
  • 48. Does one stereocenter matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 49. Does one stereocenter matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 50. Assertion and Chemical Entities
    • Who says what Taxol is?
    • What is the “timeline” for a molecule?
    • How do we clean up the Public data?
    • The Quality source is Chemical Abstracts Service…
  • 51. Vancomycin – Search the Internet
  • 52. Full Molecule Search: 4 Hits
  • 53. Full Skeleton Search: 104 Hits
  • 54. The InChI “Resolver”
  • 55. Citizen Scientists
  • 56. Crowd-sourcing Chemistry Curation
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 57. Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  • 58. Citizens as Data Sources
  • 59.  
  • 60. Semantic Markup: Project Prospect
  • 61. Entity-Extraction, Mark-up, Annotate
  • 62. Success Depends on Dictionaries
  • 63. ChemMantis and CJOC
  • 64. Name-Structure Pairs
  • 65. Species – linked to Wikipedia
  • 66. Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 67. ChemSpider Everywhere: Embed
  • 68. ChemSpider Everywhere: Spectral Game
  • 69. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  • 70. ChemSpider Everywhere: What do computers want?
    • Web services
    • Image: flickr.com/photos/microcosmos
  • 71. ChemSpider Everywhere
    • Linked from Wikipedia and many Public Databases
    • Linked from Open Notebook Science sites
    • Linked from Blogs using Structure/Spectra EMBED
    • Integrated into structure drawing packages
    • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  • 72. ChemSpider Everywhere: ChemMobi
  • 73. There will always be gaps...
    • What ChemSpider does not deal with, yet...
      • Materials
      • Minerals
      • Polymers
      • Biological macromolecules
  • 74. Open Source, Access and Data
    • ChemSpider is NOT Open Source but we do use Open Source components (OpenBabel, JSpecView, Jmol). Thanks Microsoft!
    • ChemSpider is not an “Open Access Database” – it’s a “free access” resource
    • We do not assume copyright. Rights to the data and the creative works remain with the depositor
    • Is ChemSpider “Open Data”?
  • 75. Open Data?
  • 76. Who declares data as Open?
    • Data licensing is very interesting and can spark “interesting” conversations. Opinions differ:
      • Are images data? Are assertions data?
      • What on a ChemSpider record is data?
      • Is PubChem or PubMed Open Data?
    • We allow people to declare their data as Open and add an Open Data button at upload
    • A lot of data on ChemSpider are free but not Open
    • Pragmatism: Our focus is a community resource
  • 77. Conclusions: ChemSpider Today
    • ChemSpider is an established community resource
      • >23 million compounds from >300 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
      • Web services provider
        • Linked to commercial and open source software
        • Supporting analytical companies: Agilent, Thermo, Waters, Bruker
        • Serving ONS, providing games to students, ChemSpidey robot
      • A publishing platform for the community
  • 78. ChemSpider Tomorrow
    • Continue the curation effort and keep cleaning
    • Finish depositions – millions left to deposit
    • Integrate RSC content – a massive archive!
    • Integrate RSC publishing workflows and databases
    • Enable the semantic web for chemistry
  • 79. Acknowledgments
    • Royal Society of Chemistry
    • Valery Tkachenko and Sergey Shevelev
    • Commercial Software: Microsoft, Advanced Chemistry Development, OpenEye and Symyx
    • Open Source Software: Jmol, OpenBabel, JSpecView
    • JC Bradley, Andrew Lang – The Spectral Game and Open Notebook Science integration
    • The “Crowd” of curators
    • 306 Data Source providers
    • SyntheticPages.org
  • 80. Thank you
    • [email_address]
    • Twitter: ChemSpiderman
    • www.chemspider.com/blog
    • SLIDES: www.slideshare.net/AntonyWilliams