RSC ChemSpider -- Managing and Integrating Chemistry on the Internet to Build Community for Chemists

The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free access website for chemists built with the intention of building community for chemists (http://www.chemspider.com/).

ChemSpider is an aggregator of chemistry-related information: at present it holds over 20 million unique chemical entities linked out to over 300 separate data sources. ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data, including novel structures, analytical data and synthesis procedures, and host data associated with the growing activities of Open Notebook Science.

This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.

We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.

Presentation Transcript

  • Managing and Integrating Chemistry on the Internet to Build Community for Chemists. Lawrence Berkeley National Laboratory, March 2010
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • The Final Search Strategy
  • All Those Names, One Structure A problem to solve…
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • Trustworthy Chemistry?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • Where Would You look? What Do You Trust?
  • Question Everything online: www.dhmo.org
  • Di-Hydrogen Monoxide
    • 2H
  • Di-Hydrogen Monoxide
    • 2H + 1O
  • Di-Hydrogen Monoxide
    • H2O
  • Di-Hydrogen Monoxide
    • H2O
    • Water
  • It’s all on Wikipedia…
  • Chemistry on The Internet Is Messy
  • It’s Methane…
  • What’s Methane?
  • What ELSE is Methane???
  • Drugs are REALLY Messy
  • Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • The EXPERTS must get it right?!
  • Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • Feedback from C&E Senior Editor
    • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
    • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
  • Structural Data for Life Sciences: DailyMed
  • Lack of Stereochemistry
  • Incorrect Structures
  • Ugh…
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • Just “Public Compound” Databases
    • PubChem
    • DrugBank
    • ChEBI/ChEMBL
    • KEGG
    • LIPID MAPS
    • ChemIDplus
    • eMolecules
    • ZINC
    • Lots of chemical vendors
    • ChemSpider
  • What do humans want? (image: media.obsessable.com)
    • As few interfaces as possible
  • A Pragmatic Vision
      • “Build a Structure Centric Community to Serve Chemists”
    • December 2006 – A hobby project initiated to connect chemistry on the web
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • Answer Questions
    • Questions a student might ask…
      • What is the structure of levulinic acid?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • How can I synthesize 2,4-dichlorophenol?
      • What are the safety handling issues for Thymol Blue?
  • What is Levulinic Acid?
  • Basic Info
  • Wikipedia and External Links
  • External Links to Data
  • Linked across the internet
  • Kyoto Encyclopedia of Genes and Genomes
  • Google Patent Integration
  • Access to Articles
    • RSC Journals
    • RSC Books
    • PubMed
    • Google Scholar
    • Google Books
    • Microsoft Academic Search
  • Access to Articles
  • Google Scholar
  • Experimental and Predicted Properties
  • ChemSpider : Spectra Linked
  • Search “OEA”
  • Search OEA
  • Linked Patents for OEA
  • Statistics for Today
      • >25 million compounds from >300 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
  • Searching Chemistry on the Internet
    • How complete a result set will we get if we search for “chemicals” by name?
    • Is there a better way to link chemistry databases? Linking by “names” is dangerous
    • Chemists want structure and SUBstructure searching
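The danger of linking by name can be illustrated with a small sketch. The two record lists below are toy data invented for this example (they are not from the slides), and `link` is a hypothetical helper. Each list describes the same two compounds under different names, so a join on the name field finds nothing while a join on a structure-derived identifier finds both.

```python
# Two toy "databases" describing the same compounds under different names.
# The records are illustrative only; they are not taken from the slides.
vendor_db = [
    {"name": "dihydrogen monoxide", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"name": "methane", "inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N"},
]
property_db = [
    {"name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"name": "marsh gas", "inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N"},
]


def link(left, right, field):
    """Pair records from two lists that share the same value for `field`."""
    index = {}
    for rec in right:
        index.setdefault(rec[field].lower(), []).append(rec)
    return [(l, r) for l in left for r in index.get(l[field].lower(), [])]


by_name = link(vendor_db, property_db, "name")
by_key = link(vendor_db, property_db, "inchikey")
print(len(by_name), "links found by name")
print(len(by_key), "links found by structure identifier")
```

The reverse failure is just as easy to construct: one trivial name attached to two different structures would produce a false link by name.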
  • The InChI Identifier
  • Multiple Layers
  • InChI Strings Hash to InChIKeys
  • Link the Internet with InChIKeys! (Taken from Rafael Sidi’s blog)
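As a rough illustration of what an InChIKey looks like to software, the sketch below splits a standard 27-character key into its blocks. It only parses a key; it does not compute the hash from an InChI string, which requires the official InChI software. The names `split_inchikey` and `InChIKeyParts` are made up for this example.

```python
import re
from typing import NamedTuple

# Standard InChIKey layout: 14 letters, hyphen, 8 letters plus a
# standard/non-standard flag and a version letter, hyphen, 1 letter.
_INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    skeleton: str       # hash of the connectivity (molecular skeleton)
    details: str        # hash of the remaining layers (stereo, isotopes, ...)
    is_standard: bool   # True when the flag character is "S"
    version: str        # InChI version letter
    protonation: str    # protonation indicator ("N" for neutral)


def split_inchikey(key: str) -> InChIKeyParts:
    """Split a well-formed InChIKey into its blocks, or raise ValueError."""
    m = _INCHIKEY_RE.fullmatch(key.strip().upper())
    if m is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, details, flag, version, protonation = m.groups()
    return InChIKeyParts(skeleton, details, flag == "S", version, protonation)


parts = split_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin
print(parts)
```

Because the key has a fixed length and contains only letters and hyphens, it can be indexed by ordinary text search engines, which is what makes it usable for linking pages across the web.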
  • Vancomycin – Search the Internet
  • Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
  • Full Molecule Search: 4 Hits
  • Full Skeleton Search: 104 Hits
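The difference between the two hit counts can be sketched with InChIKeys: a full-molecule search compares the whole key, while a skeleton search compares only the first 14-character block, so records that differ in stereochemistry still match. The records below are illustrative (alanine rather than vancomycin, to keep the example small), and the function names are assumptions made for this sketch.

```python
# Illustrative records: three alanine entries that share one connectivity
# but differ in stereochemistry, plus aspirin as an unrelated compound.
# Check the keys against a structure database before reusing them.
records = [
    ("L-alanine", "QNAYBMKLOCPYGJ-REOHSDBASA-N"),
    ("D-alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("alanine (no stereo)", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("aspirin", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]


def full_search(query_key, records):
    """Return names whose complete InChIKey equals the query."""
    return [name for name, key in records if key == query_key]


def skeleton_search(query_key, records):
    """Return names whose first (connectivity) block equals the query's."""
    block = query_key.split("-")[0]
    return [name for name, key in records if key.split("-")[0] == block]


query = "QNAYBMKLOCPYGJ-REOHSDBASA-N"
print("full molecule:", full_search(query, records))
print("skeleton:", skeleton_search(query, records))
```

For a molecule with many stereocentres, the skeleton search will also surface records where the stereochemistry was drawn incompletely or incorrectly, which is one reason its hit count is so much larger.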
  • Vancomycin on ChemSpider: 1 compound – 3 days
  • InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now…
  • InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now… is what???
  • The InChI “Resolver”
  • InChI Resolver to DOIs: Structure Search the Web
  • Most Chemistry is NOT Published
    • Only a fraction of chemistry is published
    • Only a tiny fraction of chemistry is patented
    • What of the “Lost Chemistry” that is never published and cannot be abstracted?
      • Reactions performed
      • Structures made and studied
      • Spectra acquired and then disposed of
      • Available chemicals never found
  • The CAS Registry
  • CAS Registry
  • Crowd-sourcing Curation and Deposition
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  • Entity-Extraction, Mark-up, Annotate
  • Semantic Markup: Project Prospect
  • Success Depends on Dictionaries: Link to a Structure or the Right Structure?
  • Name-Structure Pairs
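A minimal sketch of dictionary-based mark-up shows why the quality of the name-structure pairs matters. The dictionary, the `CMPD-…` identifiers and the `annotate` function are all hypothetical; this is not how Project Prospect or ChemSpider implement entity extraction. Longer names are tried first so that “thymol blue” is not tagged as “thymol”, which would link to a structure but not the right one.

```python
import re

# Hypothetical name-to-identifier dictionary; the IDs are placeholders.
# Keys must be lower-case because lookups are done on lower-cased matches.
name_to_id = {
    "thymol": "CMPD-A",
    "thymol blue": "CMPD-B",
    "aspirin": "CMPD-C",
}


def annotate(text, dictionary):
    """Append [identifier] after every dictionary name found in `text`."""
    # Longest names first, so multi-word names win over their prefixes.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f"{m.group(0)}[{dictionary[m.group(0).lower()]}]", text
    )


print(annotate("Thymol Blue and thymol are different compounds.", name_to_id))
```

A wrong or ambiguous entry in the dictionary propagates to every document it is applied to, so curating the name-structure pairs is the step that decides whether the mark-up can be trusted.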
  • Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • Org Prep Daily (Blog)
  • Micro- and Nano-publications
    • Blogs, wiki entries and even Amazon book reviews are micro/nano-publications
    • ChemSpider SyntheticPages will be DOI’ed – students can add these “micro-publications” to their resume
    • Structures and spectra are nano-publications – these can be tracked and referenced also. (depositions, curations etc). Students participate in building one of the premier sources of chemistry data.
  • ChemSpider SyntheticPages
  • Submission process
    • Register as a user
    • Use the Submit button and fill in the fields…
  • Submission Process
    • Submissions reviewed by editorial board
    • Published as is or comments sent to author
    • Online Peer Review process
    • Data supported include web movies, images, live spectra etc.
  • ChemSpider : Spectra Linked
  • Spectra Linked
  • ChemSpider ID 24528095: 1H NMR
  • ChemSpider ID 24528095: 13C NMR
  • ChemSpider ID 24528095: H-H COSY
  • ChemSpider ID 24528095: HSQC
  • ChemSpider ID 24528095: HMBC
  • Full 13C assignment uploaded
  • Not Just NMR Data
  • Spectra on ChemSpider
  • Available Spectra http://www.chemspider.com/spectra.aspx
  • Sources of Spectra
    • Sourced from online sources with permission
    • Private collections
    • The MAJORITY deposited by ChemSpider users
  • How Could Students Help? Part 1
    • Students can help “curate” the data – check whether the spectra are consistent with the compound
    • If not then flag them, annotate them and provide feedback
    • OR…play the game
  • www.SpectralGame.com (see http://www.jcheminf.com/content/1/1/9)
  • Spectral Game
  • Increasing Complexity
  • Spectral Game
  • True Curation of Data
  • How Could Students Help? Part 2
    • Add their own data to the database!
    • Spectra from:
      • research projects
      • lab sessions
      • supplementary data sections in publications
  • Spectral Uploading
    • Locate the structure of interest and deposit spectrum
  • Spectral Uploading
    • Various types of NMR spectra supported
  • Deposit spectra against new structure
    • If a NEW compound has spectral data then deposit the structure onto ChemSpider first
  • How Else Can Students Help?
    • Students can deposit single structures or thousands of structures – UNIQUE chemistry can be added and “claimed”
    • Data can be curated/edited and annotated – simply register and request the rights
    • 25 million structures, >300 data sources…there are errors of course!
  • NMRShiftDB
  • NMRShiftDB: http://www.ebi.ac.uk/nmrshiftdb/
  •  
    • Flexible search capabilities
      • by chemical name
      • by structure
      • by spectral peaklist
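Searching by spectral peak list can be sketched as scoring each stored spectrum by how many query peaks it reproduces within a tolerance. This is a simplified illustration under stated assumptions, not NMRShiftDB's actual search algorithm; the library entries, peak values and the 0.5 ppm default tolerance are placeholders.

```python
def match_score(query_peaks, reference_peaks, tolerance=0.5):
    """Fraction of query peaks with a reference peak within `tolerance` ppm."""
    if not query_peaks:
        return 0.0
    hits = sum(
        1
        for q in query_peaks
        if any(abs(q - r) <= tolerance for r in reference_peaks)
    )
    return hits / len(query_peaks)


def search(query_peaks, library, tolerance=0.5):
    """Rank library entries by match score, best first."""
    scored = [
        (match_score(query_peaks, peaks, tolerance), name)
        for name, peaks in library.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical 13C peak lists in ppm; the values are placeholders.
library = {
    "compound A": [14.1, 22.7, 31.9, 63.0],
    "compound B": [21.0, 128.2, 129.0, 137.8],
}

results = search([21.3, 128.0, 137.5], library)
print(results)
```

A real search would also need to handle solvent effects and partial peak lists, but the ranking idea is the same.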
  • NMR Prediction
  • Multinuclear NMR Prediction
  • NMRShiftDB Data Review
    • High quality NMR shift set of ca. 100,000 shifts
    • Multiple outliers identified
    • Removed following publication
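The slides do not say how the outliers were found. One plausible approach, sketched here purely as an assumption, is to compare each assigned experimental shift with a predicted value and flag large deviations for manual review. The data, the field names and the 10 ppm threshold are all invented for illustration.

```python
def flag_outliers(assignments, threshold=10.0):
    """Return assignments whose experimental and predicted shifts differ
    by more than `threshold` ppm (a candidate list for human review)."""
    return [
        a
        for a in assignments
        if abs(a["experimental"] - a["predicted"]) > threshold
    ]


# Hypothetical 13C assignments in ppm; values are placeholders.
assignments = [
    {"atom": "C1", "experimental": 128.5, "predicted": 128.9},
    {"atom": "C2", "experimental": 17.2, "predicted": 171.0},  # possible typo
    {"atom": "C3", "experimental": 55.1, "predicted": 56.0},
]

suspects = flag_outliers(assignments)
print([a["atom"] for a in suspects])
```

Flagged entries are only candidates: a large deviation can also mean the prediction is poor, so a chemist still has to decide which side is wrong.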
  • ChemSpider Integrated NMR Prediction
    • Initial integration in place
  • A Game Through Embedding Data
  • Embedding Structures
  • Do you write Wikipedia Articles?
  • ChemSpider Web Services
  • How Can You Help ChemSpider?
    • Deposit your data and share with the community
      • Structures – one or many
      • Spectra
      • Links
      • Syntheses into SyntheticPages
    • Curate data – most basic level…just add comments
    • Spread the word – ChemSpider is an untapped resource
  • Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
    • Classical business models will have to morph
  • Thank you. [email_address]; Twitter: ChemSpiderman; www.chemspider.com/blog; SLIDES: www.slideshare.net/AntonyWilliams