RSC ChemSpider – Building An Internet Based Community For Chemists

This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider. It gives an overview of data quality online, crowdsourced deposition and curation, and our progress in delivering a solution to the community for sourcing data.

Usage Rights

© All Rights Reserved

RSC ChemSpider – Building An Internet Based Community For Chemists Presentation Transcript

  • 1. RSC ChemSpider – Building an Internet Based Community for Chemists
  • 2. Where is chemistry online?
    Encyclopedic articles (Wikipedia)
    Chemical vendor databases
    Metabolic pathway databases
    Property databases
    Patents with chemical structures
    Drug Discovery data
    Scientific publications
    Compound aggregators
    Blogs/Wikis and Open Notebook Science
  • 3. Chemistry on the Internet TODAY
    Chemistry searches are generally limited to text-based searches across the internet
    Poor quality and little curation/validation work
    Too many searches required to resource data
  • 4. What do humans want?
    media.obsessable.com
    As few interfaces as possible
  • 5. Chemistry on the Internet FUTURE
    Search by chemical structure and substructure
    Chemistry articles indexed and searchable
    Reduced number of searches to find data
    Data are integrated – compounds, vendors, syntheses, data, publications and patents
  • 6. For Synthesis… TotallySynthetic.com
  • 7. Org Prep Daily (Blog)
  • 8. Lots of “Public Compound” Databases
    PubChem
    DrugBank
    ChEBI/ChEMBL
    KEGG
    LIPID MAPS
    ChemIDplus
    eMolecules
    ZINC
    Lots of chemical vendors
    ChemSpider
  • 9. Where Would You look? What Do You Trust?
  • 10. Linked Data on the Web
    Taken from: Rafael Sidi’s Blog
  • 11. What is a compound?
  • 12. What is ChemSpider?
    ChemSpider is:
    Building a Structure Centric Community for Chemists
    >23 million compounds, >300 data sources
    A deposition and curation platform
    A publishing platform for the community
    Grows daily – more depositions, more links, more data sources
  • 13. How Was ChemSpider Built?
    ChemSpider was a “hobby project”
    Housed in a basement and running off three servers – one bought, two built
    Sensitive to weather and power stability
    Went live at ACS Spring 2007 in Chicago
  • 14. Search Cholesterol
  • 15. Search Cholesterol
  • 16. Search Cholesterol
  • 17. Search Cholesterol
  • 18. Search Cholesterol
  • 19. Linked across the internet
  • 20. Kyoto Encyclopedia of Genes and Genomes
  • 21. Link off a structure in ChemSpider
    Chemical suppliers
    Other publications
    Analytical Data
    Related Reactions
    Wikipedia
    Patents
    “Everything”
  • 22. Links to Patents based on structure
  • 23. Clickthrough to Patents
  • 24. Articles Linked
  • 25. Answering Questions for Chemists
    Questions a chemist might ask…
    What is the melting point of n-butanol?
    What is the chemical structure of Xanax?
    Chemically, what is phenolphthalein?
    What are the stereocenters of cholesterol?
    Where can I find publications about xylene?
    What are the different trade names for Ketoconazole?
    What is the NMR spectrum of Aspirin?
    What are the safety handling issues for Thymol Blue?
  • 26. Complex Data and Information
  • 27. ChemSpider is a structure-centric hub
    ChemSpider aggregates and links out across the internet
    Data aggregate based on “structures and links”
    What defines a chemical compound?
  • 28. What is a compound?
  • 29. Question Everything online: www.dhmo.org
  • 30. Di-Hydrogen Monoxide
    2H
  • 31. Di-Hydrogen Monoxide
    2H + 1O
  • 32. Di-Hydrogen Monoxide
    H2O
  • 33. Di-Hydrogen Monoxide
    H2O
    Water
  • 34. It’s all on Wikipedia…
  • 35. It’s all on Wikipedia…
  • 36. Chemistry on The Internet Is Messy
  • 37. It’s Methane…
  • 38. What’s Methane?
  • 39. What’s Methane?
  • 40. What ELSE is Methane???
  • 41. PubChem
  • 42. Truly “I Love You”
  • 43. Chemistry is REALLY Messy
  • 44. Vancomycin
    Who will curate?
    How would you clean such a large dataset?
    Assertions!!!
  • 45. Vancomycin
    Who will curate?
    How would you clean such a large dataset?
  • 46. Vancomycin on ChemSpider: 1 compound – 3 days
  • 47. The EXPERTS must get it right?!
  • 48. Wikipedia, C&E News, PubChem
    C&E News (from ACS)
  • 49. What About Digitonin?
  • 50. CAS as an authority
  • 51. The Blogging Community Participates
  • 52. The FDA’s DailyMed
  • 53. Structures on DailyMed
  • 54. Lack of Stereochemistry
  • 55. Incorrect Structures
  • 56. Wow!
  • 57. The InChI Identifier
  • 58. Multiple Layers
  • 59. InChI Strings Hash to InChIKeys
  • 60. InChIs for Taxol
  • 61. Back to Taxol
    DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    Which one is correct???
  • 62. InChIKeys for Taxol
    DrugBank: RCINICONZNJXQF-CLDWUXIMDD
    ChEBI: RCINICONZNJXQF-GXKQXQCDDN
    Wikipedia: RCINICONZNJXQF-MZXODVADBJ
    ChEBI and Wikipedia are the SAME structure
    DrugBank is a DIFFERENT structure – it differs at ONE stereocenter
  • 63. Does one stereocenter matter?
  • 64. Does one stereocenter matter?
    Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 65. Does one stereocenter matter?
    Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 66. Assertion and Chemical Entities
    Who says what Taxol is?
    What is the “timeline” for a molecule?
    How do we clean up the Public data?
    The Quality source is Chemical Abstracts Service…
  • 67. ChemSpider Searches
  • 68. ChemSpider Searches
  • 69. ChemSpider Complex Searches
  • 70. Vancomycin – Search the Internet
  • 71. Full Molecule Search: 4 Hits
  • 72. Full Skeleton Search: 104 Hits
  • 73. The InChI “Resolver”
  • 74. Citizen Scientists
  • 75. Crowd-sourcing Chemistry Curation
    Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 76. Multi-level Curation and Approval
  • 77. Citizens as Data Sources
  • 78.
  • 79. Entity-Extraction, Mark-up, Annotate
  • 80. Success Depends on Dictionaries
  • 81. Project Prospect
  • 82. ChemMantis and CJOC
  • 83. Name-Structure Pairs
  • 84. Species – linked to Wikipedia
  • 85. Semantic Linking of Structures
    What would you want to link off a structure?
    Chemical suppliers
    Other publications
    Analytical Data
    Related Reactions
    Wikipedia
    Patents
    “Everything”
  • 86. ChemSpider Everywhere
    Linked from Wikipedia
    Linked from Open Notebook Science sites using EMBED
    Linked from Blogs using Structure/Spectra EMBED
    Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, and Open Source applets
    Integrated into software offerings from Thermo, Waters, Agilent, Bruker
  • 87. ChemSpider Everywhere: Embed
  • 88. ChemSpider Everywhere: What do computers want?
    Web services
    flickr.com/photos/microcosmos
  • 89. ChemSpider Everywhere: Spectral Game
  • 90. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  • 91. ChemSpider Everywhere: ChemMobi
  • 92. There are always gaps...
    What ChemSpider doesn’t deal with yet...
    Markush structures and other “non-defineds”
    Materials
    Minerals
    Polymers
    Biological macromolecules
  • 93. What’s next?
    Continue the curation effort and keep cleaning
    Finish depositions – millions left to deposit
    Layer on RDF to allow the semantic web to benefit from our efforts
    Integrate RSC content – a massive archive!
    Integrate RSC publishing workflows and databases
  • 94. Thank you
    antony.williams@chemspider.com
    Twitter: ChemSpiderman
    www.chemspider.com/blog
    SLIDES: www.slideshare.net/AntonyWilliams
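
The InChIKey comparison on slides 61 and 62 can be sketched in a few lines of Python. This is only an illustration, not ChemSpider's own method. It assumes the usual InChIKey layout, in which the first 14-letter block is a hash of the molecular skeleton (connectivity) and the remainder covers stereochemistry and the other layers. The helper names `split_key`, `group_by_skeleton` and `group_by_full_key` are hypothetical. The keys are copied verbatim from the slides.

```python
from collections import defaultdict

# InChIKeys for Taxol exactly as listed on slides 61 and 62.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Split a key at the first hyphen into (first block, remainder).

    Assumption: the first block hashes the connectivity only.
    """
    first, _, rest = key.partition("-")
    return first, rest


def group_by_skeleton(keys):
    """Group source names by the first block of their InChIKey."""
    groups = defaultdict(list)
    for source, key in keys.items():
        groups[split_key(key)[0]].append(source)
    return dict(groups)


def group_by_full_key(keys):
    """Group source names by the complete InChIKey string."""
    groups = defaultdict(list)
    for source, key in keys.items():
        groups[key].append(source)
    return dict(groups)


if __name__ == "__main__":
    for block, sources in group_by_skeleton(TAXOL_KEYS).items():
        print("same first block", block, "->", ", ".join(sources))
    for key, sources in group_by_full_key(TAXOL_KEYS).items():
        print("full key", key, "->", ", ".join(sources))
```

Under that assumption, a shared first block means the three sources agree on which atoms are bonded to which, and any disagreement lies in the later layers such as stereochemistry. String comparison alone cannot say which record is correct, and it cannot reproduce the structure-level comparison summarized on slide 62. Both still require inspecting the underlying structures.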