ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Community

This is the presentation I gave at OpenSciNY 2010. It was a great gathering of librarians and people interested in Open Science. Sharing the stage with Beth Brown, Jean-Claude Bradley and Heather Joseph was, as usual, a good opportunity to discuss how openness and online data sharing are changing the way we access and share data. We live in interesting and exciting times.



Usage Rights

© All Rights Reserved

    Presentation Transcript

    • ChemSpider - Building a Crowdsourced Chemical Database for the Chemistry Community OpenSciNY, New York, May 2010
    • Once Upon a Time Over a “Coffee”
    • Which is better for Plants? Vodka, Sprite or Viagra?
    • It Works – Viagra Wins the Day
    • Now Which is Better?
      • Viagra or Cialis?
      • Images sourced from Wikipedia
    • Cialis
      • I want…
          • The structure
          • Any patent information
          • Related publications
          • Where can I buy it?
          • Metabolic pathway info
          • What else is easy to find…
    • Cialis on Google?
    • What is Cialis?
    • What is Cialis? Can we trust Wikipedia?
    • What is Cialis? 6 hits on PubChem
    • What is Cialis?
    • Search by Trade Name
    • Are there other names???
    • Are there other names???
      • PubMed hits:
        • 736 Tadalafil
        • 744 Cialis
    • Are there other names???
    • Are There Other Names?
    • IC351 on PubChem?
      • 5 HITS for IC351
      • ZERO HITS for IC 351
    • Chemistry on the Web
      • Text searching the web is far from optimal
      • The quality of data on the web is a problem
      • It may be hard to find but it is “out there”
      • What was once locked up behind an expensive license can generally be found
      • Structure searching the web is already possible!
    • Text Searching the Web
      • Text searching the web for chemical compounds is an enormous challenge
      • RSC has multiple databases, >500,000 articles and a lot of other resources. How do we do?
    • The RSC Publishing Platform (Beta)
    • 2+2 = 4 Articles?
    • CAS Number Search
    • Text Searching the Web
      • Disambiguation dictionaries of name-structure relationships would be very enabling.
        • IC351 = IC 351 = Tadalafil = Cialis = …
      • Creating validated dictionaries is an enormous challenge to cover chemistry
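
The name equivalence on the slide above (IC351 = IC 351 = Tadalafil = Cialis) can be pictured as a small lookup table. The following is only a minimal sketch, not ChemSpider's implementation: the record ID `tadalafil-record` and the helper names are hypothetical, and the normalization rule (drop spaces and hyphens, ignore case) is an assumption made for illustration.

```python
import re

def normalize(name):
    """Lowercase and drop whitespace and hyphens, so 'IC 351' matches 'IC351'."""
    return re.sub(r"[\s\-]+", "", name).lower()

# Hypothetical record ID; the synonyms are the ones named on the slide.
SYNONYMS = {
    "tadalafil-record": ["Tadalafil", "Cialis", "IC351"],
}

# Invert the dictionary: normalized name -> record ID.
LOOKUP = {
    normalize(name): record_id
    for record_id, names in SYNONYMS.items()
    for name in names
}

def resolve(name):
    """Return the record ID for a name, or None if the name is unknown."""
    return LOOKUP.get(normalize(name))
```

Dropping hyphens is far too aggressive for systematic chemical names, where punctuation carries meaning, which is part of why building validated dictionaries is hard.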
    • CAS Registry – LOTS of Chemicals!
    • The Final Search Strategy A “Disambiguation Query!”
    • All Those Names, One Structure A problem to solve…
    • ChemSpider - A Pragmatic Vision
        • “Build a Structure Centric Community to Serve Chemists”
        • Aggregate and integrate chemical structure data on the web – names, structures, links
        • Create a “structure-based hub” to information, data and algorithmic predictions
        • Let chemists contribute their own data
        • Allow the community to curate/correct data
    • What do humans want?
      • As few interfaces as possible
      • media.obsessable.com
    • Aggregating Data – Who to Trust???
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Property databases
      • Patents with chemical structures
      • Drug Discovery data
      • Scientific publications
      • Compound aggregators
      • Blogs/Wikis and Open Notebook Science
    • Just “Public Compound” Databases
      • PubChem
      • Drugbank
      • ChEBI/ChEMBL
      • KEGG
      • LipidMAPs
      • ChemIDPlus
      • eMolecules
      • ZINC
      • Lots of chemical vendors
    • Question Everything online: www.dhmo.org
    • Di-Hydrogen Monoxide
      • 2H
    • Di-Hydrogen Monoxide
      • 2H + 1O
    • Di-Hydrogen Monoxide
      • H2O
    • Di-Hydrogen Monoxide
      • H2O
      • Water
    • It’s all on Wikipedia…
    • What About Gases? Methane…
    • What’s Methane?
    • What’s Methane?
    • What ELSE is Methane???
    • Structural Data for Life Sciences DailyMed
    • Lack of Stereochemistry
    • Incorrect Structures
    • Pragmatic Vision Delivered…
      • Aggregate, integrate and link data from across the internet
      • Almost 25 million structures from > 300 data sources
      • Linked to vendors, literature, online databases (open and commercial), open notebook science, patents and….
      • Robotic and Crowdsourced Curation
    • Search “OEA”
    • Search OEA
    • Search OEA
    • Search OEA
    • Linked Patents for OEA
    • Answering Questions…
      • Questions a student might ask…
        • What is the structure of levulinic acid?
        • Chemically, what is phenolphthalein?
        • What are the stereocenters of cholesterol?
        • Where can I find publications about xylene?
        • What are the different trade names for Ketoconazole?
        • What is the NMR spectrum of Aspirin?
        • How can I synthesize 2,4-dichlorophenol?
        • What are the safety handling issues for Thymol Blue?
    • Back to Cialis…
    • Cialis on ChemSpider : 1 hit
      • Chemicals are curated/validated on ChemSpider by ourselves and the community
      • Based on assertions from various sources. Iterative, time-consuming and exacting!
      • We believe we know the structure now
      • What is linked and available?
    • Google Patents
    • ChemSpider – Patents Linked SURECHEM PATENTS GOOGLE
    • Google Books
    • Microsoft Academic Search
    • Google Scholar – Articles were found by CAS Number !
    • Identifiers for Tadalafil
    • How Many Articles in RSC Journals ?
      • Based on 171596-29-5 there are 13 articles in RSC journals
      • What about if we VALIDATE identifiers?
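
One narrow, mechanical form of identifier validation is the check digit built into every CAS Registry Number. The sketch below is an illustration only and the function name is hypothetical. It tests whether a number is internally consistent; it cannot tell whether the number has been attached to the right compound, which is the curation problem the slides are about.

```python
import re

def cas_check_digit_ok(cas):
    """Return True if a CAS Registry Number has a valid format and check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == int(match.group(3))
```

For the number quoted on the slide, `cas_check_digit_ok("171596-29-5")` passes the check.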
    • Validated Dictionaries Hit APIs This is data curation...
    • Does this generate more results?
    • RSC Journals
    • RSC Journals REMEMBER 2+2 = 4
    • PubMed
    • Google Scholar – Expanded Hit Set
    • Microsoft Academic Search
    • Microsoft Academic Search
      • Be careful! More mussels than drugs…
    • Searching Chemistry on the Internet
      • Do we get a complete result set if we search for “chemicals” only by name?
      • Is there a better way to link chemistry databases? Linking by “names” is dangerous
      • Chemists want structure and SUBstructure searching
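
The incompleteness of single-name searches can be illustrated with a toy example: run one query per validated synonym and take the union of the hits. This is a sketch under stated assumptions. The `search` callable stands in for whatever search interface is being queried, and the index and article IDs below are made up purely for illustration.

```python
def expanded_search(terms, search):
    """Run one search per term and return the de-duplicated union of hits."""
    hits = set()
    for term in terms:
        hits.update(search(term))
    return hits

# Toy index with hypothetical article IDs, standing in for a real search service.
TOY_INDEX = {
    "tadalafil": {"article-1", "article-2"},
    "cialis": {"article-2", "article-3"},
}

def toy_search(term):
    """Look a term up in the toy index, ignoring case."""
    return TOY_INDEX.get(term.lower(), set())

all_hits = expanded_search(["Tadalafil", "Cialis", "IC351"], toy_search)
```

Searching by either name alone misses one article in this toy index; the union finds all three, and the shared article is counted once.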
    • Structure Searching the Web
      • We have resources about Tadalafil actively linked to ChemSpider
      • What about searching the web for Tadalafil by structure…not based on the various identifiers
      • How?
    • Link the Internet with InChIKeys! Taken from: Rafael Sidis’ Blog
    • The InChI Identifier
    • Multiple Layers
    • InChIStrings Hash to InChIKeys
    • Cialis – Searching the Web by InChI Search Molecular SKELETON Search Full Molecule
    • InChI Search the Web by Skeleton 78 Hits by Skeleton
    • InChI Search the Web Exact Match 32 Hits by InChIKey
    • InChI Search the Web Exact Match 6 Hits by Standard InChIKey
    • InChifying the Web
      • There are more than twice as many “skeleton” hits for Cialis as exact matches – different stereo? Mistakes?
      • Our judgment…MISTAKES
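
The difference between a skeleton search and a full-molecule search follows from the layout of an InChIKey: a 14-letter first block hashed from the connectivity, then a second block hashed from the remaining layers (stereochemistry, isotopes and so on) followed by a standard/non-standard flag and a version letter, then a protonation letter. A minimal sketch of splitting and comparing keys follows. The helper names are hypothetical and the keys are placeholder strings in the right format, not real InChIKeys.

```python
import re

# 14 letters, hyphen, 8 letters + flag (S or N) + version letter, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")

def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if it is malformed."""
    match = INCHIKEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,          # hash of the connectivity
        "other_layers": other_layers,  # hash of stereo, isotopes, etc.
        "standard": flag == "S",       # S = Standard InChIKey, N = non-standard
        "version": version,
        "protonation": protonation,
    }

def same_skeleton(key_a, key_b):
    """True if two keys share the connectivity block (a 'skeleton' match)."""
    return split_inchikey(key_a)["skeleton"] == split_inchikey(key_b)["skeleton"]

# Placeholder keys (NOT real InChIKeys): same skeleton, different second block.
KEY_A = "ABCDEFGHIJKLMN-OPQRSTUVSA-N"
KEY_B = "ABCDEFGHIJKLMN-ZYXWVUTSSA-N"
KEY_NONSTANDARD = "ABCDEFGHIJKLMN-OPQRSTUVNA-N"
```

Here `KEY_A` and `KEY_B` match by skeleton but not exactly, which is the situation behind the larger skeleton hit counts on the slides.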
    • Vancomycin – Search the Internet
    • Full Molecule Search: 4 Hits
    • Full Skeleton Search: 104 Hits
    • InChIKeys
      • Make the internet searchable by adding InChIKeys
      • Publishers add InChIKeys to papers now…
      • But what is the structure???
    • We need an InChI “Resolver”
    • InChI Resolver to DOIs Structure Search the Web
    • Semantic Markup: Project Prospect
    • Depends on Validated Dictionaries Link to a Structure or the Right Structure?
    • Name-Structure Pairs
    • Semantic Linking of Structures
      • What would you want to link off a structure?
        • Chemical suppliers
        • Other publications
        • Analytical Data
        • Related Reactions
        • Wikipedia
        • Patents
        • “Everything”
        • Through ChemSpider!
    • Unpublished Chemistry
      • Only a fraction of chemistry is published
      • Only a tiny fraction of chemistry is patented
      • What of the “Lost Chemistry” that is never published and cannot be abstracted?
        • Reactions performed
        • Structures made and studied
        • Spectra acquired and then disposed of
        • Available chemicals never found
    • Org Prep Daily (Blog)
    • ChemSpider SyntheticPages
    • Submission process
      • Register as a user
      • Use the Submit button and fill in the fields…
    • Submission Process
      • Submissions reviewed by editorial board
      • Published as is or comments sent to author
      • Online Peer Review process
      • Supported data include web movies, images, live spectra, etc.
    • Micro- and Nano-publications
      • Blogs, wiki entries and even Amazon book reviews are micro/nano-publications
      • ChemSpider SyntheticPages will be DOI’ed – students can add these “micro-publications” to their resume
      • Structures and spectra are nano-publications – these can be tracked and referenced also. (depositions, curations etc). Students participate in building one of the premier sources of chemistry data.
    • ChemSpider : Spectra Linked
    • Spectra Linked
    • Spectra Linked
    • Not Just NMR Data
    • www.SpectralGame.com http://www.jcheminf.com/content/1/1/9
    • Spectral Game
    • Increasing Complexity
    • Spectral Game
    • ChemSpider Content
      • ChemSpider is a container…supports multimedia
        • Spectra
        • Crystal structures
        • Images
        • MP3s
        • Videos
    • Roses’ Crystal Image Collection
    • MP3s and Videos : Titanium
    • Periodic Table Images
    • How Can You Help ChemSpider?
      • Deposit your data and share with the community
        • Structures – one or many
        • Spectra
        • Links
        • Syntheses into SyntheticPages
      • Curate data – most basic level…just add comments
      • Spread the word – ChemSpider is an untapped resource
    • Community Contribution
      • We can make a bigger contribution to the community if the community shares via ChemSpider
      • Don’t underestimate what others will find of value
      • ChemSpider wins “Community contribution” best practice award
    • Chemistry on the Internet FUTURE
      • The semantic web for chemistry is in place
      • Crowdsourced contributions are commonplace
      • Chemists will search by structure/substructure
      • Chemistry articles indexed and searchable
      • Reduced number of searches to find data
      • Data are integrated – compounds, vendors, syntheses, data, publications and patents
      • A world of Open Access and Open Data
    • Thank you [email_address] Twitter: ChemSpiderman www.chemspider.com/blog SLIDES: www.slideshare.net/AntonyWilliams