• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
RSC ChemSpider – Building An Internet Based Community For Chemists
 


This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider. It gives a general overview of data quality online, crowdsourced deposition and curation, and our progress toward delivering a solution the community can use for resourcing data.


Usage Rights

© All Rights Reserved

    Presentation Transcript

    • RSC ChemSpider – Building an Internet Based Community for Chemists
    • Where is chemistry online?
      Encyclopedic articles (Wikipedia)
      Chemical vendor databases
      Metabolic pathway databases
      Property databases
      Patents with chemical structures
      Drug Discovery data
      Scientific publications
      Compound aggregators
      Blogs/Wikis and Open Notebook Science
    • Chemistry on the Internet TODAY
      Chemistry searches are generally limited to text-based searches across the internet
      Poor quality and little curation/validation work
      Too many searches required to resource data
    • What do humans want?
      media.obsessable.com
      As few interfaces as possible
    • Chemistry on the Internet FUTURE
      Search by chemical structure and substructure
      Chemistry articles indexed and searchable
      Reduced number of searches to find data
      Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • For Synthesis…TotallySynthetic.com
    • Org Prep Daily (Blog)
    • Lots of “Public Compound” Databases
      PubChem
      DrugBank
      ChEBI/ChEMBL
      KEGG
      LipidMAPs
      ChemIDPlus
      eMolecules
      ZINC
      Lots of chemical vendors
      ChemSpider
    • Where Would You Look? What Do You Trust?
    • Linked Data on the Web
      Taken from: Rafael Sidis’ Blog
    • What is a compound?
    • What is ChemSpider?
      ChemSpider is:
      Building a Structure Centric Community for Chemists
      >23 million compounds, >300 data sources
      A deposition and curation platform
      A publishing platform for the community
      Grows daily – more depositions, more links, more data sources
    • How Was ChemSpider Built?
      ChemSpider was a “hobby project”
      Housed in a basement and running off three servers – one bought, two built
      Sensitive to weather and power stability
      Went live at ACS Spring 2007 in Chicago
    • Search Cholesterol
    • Search Cholesterol
    • Search Cholesterol
    • Search Cholesterol
    • Search Cholesterol
    • Linked across the internet
    • Kyoto Encyclopedia of Genes and Genomes
    • Link off a structure in ChemSpider
      Chemical suppliers
      Other publications
      Analytical Data
      Related Reactions
      Wikipedia
      Patents
      “Everything”
    • Links to Patents based on structure
    • Clickthrough to Patents
    • Articles Linked
    • Answering Questions for Chemists
      Questions a chemist might ask…
      What is the melting point of n-butanol?
      What is the chemical structure of Xanax?
      Chemically, what is phenolphthalein?
      What are the stereocenters of cholesterol?
      Where can I find publications about xylene?
      What are the different trade names for Ketoconazole?
      What is the NMR spectrum of Aspirin?
      What are the safety handling issues for Thymol Blue?
    • Complex Data and Information
    • ChemSpider is a structure-centric hub
      ChemSpider aggregates and links out across the internet
      Data aggregate based on “structures and links”
      What defines a chemical compound?
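The hub idea on this slide, aggregating data around structures and linking out, can be sketched as an index keyed by a structure identifier. This is a minimal Python illustration assuming each deposited record already carries an InChIKey; the record layout, the helper `aggregate_by_structure`, and the source names and URLs are hypothetical and do not describe ChemSpider's internals.

```python
from collections import defaultdict

# Hypothetical deposited records: (source, InChIKey, link).
# Source names and URLs are illustrative placeholders; the keys are the
# standard InChIKeys for water (XLYOF...) and ethanol (LFQSC...).
RECORDS = [
    ("VendorA", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "https://vendor-a.example/water"),
    ("JournalB", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "https://journal-b.example/article/123"),
    ("PatentC", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "https://patent-c.example/ethanol"),
]


def aggregate_by_structure(records):
    """Group every (source, link) pair under the structure identifier it refers to."""
    hub = defaultdict(list)
    for source, inchikey, link in records:
        hub[inchikey].append((source, link))
    return dict(hub)


if __name__ == "__main__":
    for key, links in aggregate_by_structure(RECORDS).items():
        print(key, "->", [source for source, _ in links])
```

Because the structure identifier is the join key, a vendor page and a journal article about water end up on the same record without either source knowing about the other.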
    • What is a compound?
    • Question Everything online: www.dhmo.org
    • Di-Hydrogen Monoxide
      2H
    • Di-Hydrogen Monoxide
      2H + 1O
    • Di-Hydrogen Monoxide
      H2O
    • Di-Hydrogen Monoxide
      H2O
      Water
    • It’s all on Wikipedia…
    • It’s all on Wikipedia…
    • Chemistry on The Internet Is Messy
    • It’s Methane…
    • What’s Methane?
    • What’s Methane?
    • What ELSE is Methane???
    • PubChem
    • Truly “I Love You”
    • Chemistry is REALLY Messy
    • Vancomycin
      Who will curate?
      How would you clean such a large dataset?
      Assertions!!!
    • Vancomycin
      Who will curate?
      How would you clean such a large dataset?
    • Vancomycin on ChemSpider: 1 compound – 3 days
    • The EXPERTS must get it right?!
    • Wikipedia, C&E News, PubChem
      C&E News (from ACS)
    • What About Digitonin?
    • CAS as an authority
    • The Blogging Community Participate
    • The FDA’s DailyMed
    • Structures on DailyMed
    • Lack of Stereochemistry
    • Incorrect Structures
    • Wow!
    • The InChI Identifier
    • Multiple Layers
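The layered structure of an InChI string can be inspected with a few lines of Python. This sketch, built around a hypothetical helper `parse_inchi`, only splits a standard InChI on `/` into its version, formula and prefixed layers; it does not validate the chemistry. The L- and D-alanine strings in the usage block are given for illustration and should be confirmed against an InChI resolver before reuse.

```python
def parse_inchi(inchi):
    """Split a standard InChI into (version, formula, [(prefix, value), ...])."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]  # e.g. "1S" for a standard InChI
    formula = parts[1] if len(parts) > 1 else ""
    # Every later layer starts with a one-letter prefix such as c (connections),
    # h (hydrogens), t (tetrahedral stereo), m or s. A list keeps the original
    # order and tolerates a prefix that appears more than once.
    layers = [(part[0], part[1:]) for part in parts[2:] if part]
    return version, formula, layers


if __name__ == "__main__":
    l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
    _, _, l_layers = parse_inchi(l_ala)
    _, _, d_layers = parse_inchi(d_ala)
    # Report the layers that differ between the two enantiomers.
    print([(a, b) for a, b in zip(l_layers, d_layers) if a != b])
```

For the two enantiomers the formula, connectivity (/c) and hydrogen (/h) layers are identical; only the /m layer, which records whether the /t stereo parities are inverted, differs. A comparison that stops at the main layer therefore cannot see a single stereocenter.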
    • InChI Strings Hash to InChIKeys
    • InChIs for Taxol
    • Back to Taxol
      DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      Which one is correct???
    • InChIKeys for Taxol
      DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      ChEBI and Wikipedia are the SAME structure
      DrugBank is a DIFFERENT structure – it differs at ONE stereocenter
    • Does one stereocenter matter?
    • Does one stereocenter matter?
      Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
    • Does one stereocenter matter?
      Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
    • Building a Structure Centric Community for Chemists
    • Assertion and Chemical Entities
      Who says what Taxol is?
      What is the “timeline” for a molecule?
      How do we clean up the Public data?
      The Quality source is Chemical Abstracts Service…
    • ChemSpider Searches
    • ChemSpider Searches
    • ChemSpider Complex Searches
    • Vancomycin – Search the Internet
    • Full Molecule Search: 4 Hits
    • Full Skeleton Search: 104 Hits
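The gap between the full-molecule and full-skeleton hit counts on these two slides mirrors the block structure of an InChIKey. The slides do not say how ChemSpider runs these searches; the sketch below is one way to approximate the distinction, assuming each record is indexed by its InChIKey. It reuses the three Taxol keys quoted earlier in the talk: a full-key match finds only the exact record, while matching on the first block alone also returns records that share the same skeleton. The helper names and the extra `OtherRecord` entry are hypothetical.

```python
# The three Taxol InChIKeys quoted on the "Back to Taxol" slide, plus the
# standard InChIKey for water as an unrelated, hypothetical index entry.
INDEX = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
    "OtherRecord": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
}


def first_block(inchikey):
    """Return the 14-character first block, which hashes the InChI main (skeleton) layer."""
    return inchikey.split("-", 1)[0]


def full_molecule_search(query, index):
    """Records whose whole InChIKey matches, so the stereo and other layers must agree too."""
    return sorted(name for name, key in index.items() if key == query)


def skeleton_search(query, index):
    """Records that share only the first block, i.e. the same connectivity."""
    skeleton = first_block(query)
    return sorted(name for name, key in index.items() if first_block(key) == skeleton)


if __name__ == "__main__":
    query = INDEX["DrugBank"]
    print("full molecule:", full_molecule_search(query, INDEX))
    print("skeleton:", skeleton_search(query, INDEX))
```

The 4 and 104 hit counts on the slides come from ChemSpider itself, not from this sketch; the point is only that loosening the match from the whole key to the first block widens the result set.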
    • The InChI “Resolver”
    • Citizen Scientists
    • Crowd-sourcing Chemistry Curation
      Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
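One of the curation tasks on this slide, finding names and synonyms that point at the wrong structure, can be turned into a review queue. A minimal sketch, assuming name-structure assertions are available as (name, InChIKey, source) tuples: group by normalized name and flag any name that is asserted to be more than one structure. The sources and the wrong `SourceC` record are invented for illustration, and deciding which record to deprecate is left to human curators.

```python
from collections import defaultdict

# Hypothetical name-structure assertions: (name, InChIKey, source).
# SourceC's record is deliberately wrong: it pairs "methane" with water's key.
ASSERTIONS = [
    ("Methane", "VNWKTOKETHGBQD-UHFFFAOYSA-N", "SourceA"),
    ("methane", "VNWKTOKETHGBQD-UHFFFAOYSA-N", "SourceB"),
    ("methane", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "SourceC"),
    ("water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "SourceA"),
]


def conflicting_names(assertions):
    """Return {name: {inchikey: [sources]}} for names asserted to be more than one structure."""
    by_name = defaultdict(lambda: defaultdict(list))
    for name, inchikey, source in assertions:
        # Case-insensitive grouping so "Methane" and "methane" count as one name.
        by_name[name.lower()][inchikey].append(source)
    return {name: dict(keys) for name, keys in by_name.items() if len(keys) > 1}


if __name__ == "__main__":
    for name, keys in conflicting_names(ASSERTIONS).items():
        print(f"{name!r} needs review:", keys)
```

Keeping the list of sources per structure lets a curator see at a glance that two sources agree and one disagrees, which is the kind of evidence a multi-level approval step would need.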
    • Building a Structure Centric Community for Chemists
      Multi-level Curation and Approval
    • Citizens as Data Sources
    • Entity-Extraction, Mark-up, Annotate
    • Success Depends on Dictionaries
    • Project Prospect
    • ChemMantis and CJOC
    • Name-Structure Pairs
    • Species – linked to Wikipedia
    • Semantic Linking of Structures
      What would you want to link off a structure?
      Chemical suppliers
      Other publications
      Analytical Data
      Related Reactions
      Wikipedia
      Patents
      “Everything”
    • ChemSpider Everywhere
      Linked from Wikipedia
      Linked from Open Notebook Science sites using EMBED
      Linked from Blogs using Structure/Spectra EMBED
      Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
      Integrated into software offerings from Thermo, Waters, Agilent, Bruker
    • ChemSpider Everywhere: Embed
    • ChemSpider Everywhere: What do computers want?
      Web services
      flickr.com/photos/microcosmos
    • ChemSpider Everywhere: Spectral Game
    • ChemSpider Everywhere: Crowdsourced Curation of Spectra
    • ChemSpider Everywhere: ChemMobi
    • There are always gaps...
      What ChemSpider doesn’t deal with yet...
      Markush structures and other “non-defineds”
      Materials
      Minerals
      Polymers
      Biological macromolecules
    • What’s next?
      Continue the curation effort and keep cleaning
      Finish depositions – millions left to deposit
      Layer on RDF to allow the semantic web to benefit from our efforts
      Integrate RSC content – a massive archive!
      Integrate RSC publishing workflows and databases
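Layering RDF over the data, as proposed on this slide, means publishing statements as subject-predicate-object triples. As a hedged sketch of the serialization step only, the Python below writes literal-valued triples in the W3C N-Triples format. The example.org URIs and the name/inchikey predicates are hypothetical placeholders; the slides do not specify which vocabulary ChemSpider intended to use.

```python
def escape_literal(text):
    """Escape a string so it is a valid N-Triples quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject, properties):
    """Serialize (predicate URI, literal value) pairs about one subject as N-Triples lines."""
    return "\n".join(
        f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
        for predicate, value in properties
    )


if __name__ == "__main__":
    # Hypothetical URIs under example.org; not a real ChemSpider RDF vocabulary.
    print(to_ntriples(
        "http://example.org/compound/water",
        [
            ("http://example.org/vocab/name", "water"),
            ("http://example.org/vocab/inchikey", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
        ],
    ))
```

N-Triples is used here because each line is a complete statement, which keeps the sketch free of third-party RDF libraries; a production pipeline would more likely rely on an established RDF toolkit and a shared vocabulary.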
    • Thank you
      antony.williams@chemspider.com
      Twitter: ChemSpiderman
      www.chemspider.com/blog
      SLIDES: www.slideshare.net/AntonyWilliams