ChemSpider – The Vision and Challenges Associated with Building a Free Online Community Resource for Chemists
 

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • ChemSpider – The Vision and Challenges Associated with Building a Free Online Community Resource for Chemists
      • Antony Williams, AZ, February 2011
    • What’s the Status of Chemistry online?
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Virtual Screening databases
      • Property databases
      • Screening assay results
      • Patents with chemical structures
      • ADME/Tox data
      • Scientific publications
      • Compound aggregators
      • Blogs/Wikis and Open Notebook Science
    • For Synthesis…TotallySynthetic.com
    • Org Prep Daily (Blog)
    • Molbank (Open Access Journal)
    • Lots of “Public Compound” Databases
      • PubChem
      • DrugBank
      • ChEBI/ChEMBL
      • KEGG
      • LipidMAPs
      • ChemIDPlus
      • eMolecules
      • ZINC
      • Lots of chemical vendors
      • ChemSpider
    • Where Would You Look? What Do You Trust?
    • Linked Data on the Web
    • What is a compound? “ARTAs”
    • Vision: Connect Chemistry on the Web
      • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
      • Chemistry articles are indexed and searchable by a free online service
      • The web is linked together through the “language of chemistry”
      • Publicly funded research data is linked
    • We Have Delivered the Vision
        • “Build a Structure-Centric Community to Serve Chemists”
        • Integrate chemical structure data on the web
        • Create a “structure-based hub” to information, data and algorithmic predictions
        • Let chemists contribute their own data
        • Allow the community to curate/correct data
    • How Was ChemSpider Built?
      • ChemSpider was a “hobby project”
      • Housed in a basement and running off three servers – one bought, two built
      • Sensitive to weather and power stability
      • Went live at ACS Spring 2007 in Chicago
    • How Did We Build It?
      • We deal in Molfiles or SDF files
      • We do rudimentary filtering – valence checking, charge imbalance – prior to deposition
      • We have our own “business logic” to standardize
      • Link out to external sites where possible using IDs
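The slides name the pre-deposition checks (valence checking, charge imbalance) but not how ChemSpider implements them. The sketch below is a hypothetical illustration only: it assumes a tiny in-memory molecule (atoms as element, formal charge and implicit hydrogen count; bonds as atom-index pairs with a bond order) and a deliberately small, made-up valence table. It flags atoms whose bond-order sum plus hydrogens falls outside that table, and molecules whose net charge is not zero.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical (element, formal charge) -> allowed valences table.
# Deliberately tiny for illustration; real toolkits use far richer rules.
ALLOWED_VALENCE = {
    ("H", 0): {1},
    ("C", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
    ("O", 0): {2},
    ("O", -1): {1},
    ("Cl", 0): {1},
    ("Cl", -1): {0},
    ("Na", 1): {0},
}

@dataclass
class Molecule:
    atoms: list  # list of (element, formal_charge, implicit_h)
    bonds: list = field(default_factory=list)  # list of (atom_i, atom_j, order)

def check_molecule(mol):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    bond_sum = defaultdict(int)
    for i, j, order in mol.bonds:
        bond_sum[i] += order
        bond_sum[j] += order
    for idx, (element, charge, implicit_h) in enumerate(mol.atoms):
        valence = bond_sum[idx] + implicit_h
        allowed = ALLOWED_VALENCE.get((element, charge))
        if allowed is None:
            problems.append(f"atom {idx}: no rule for {element} with charge {charge:+d}")
        elif valence not in allowed:
            problems.append(f"atom {idx}: {element} has valence {valence}, expected {sorted(allowed)}")
    # Charge imbalance: report a nonzero net formal charge.
    net_charge = sum(charge for _, charge, _ in mol.atoms)
    if net_charge != 0:
        problems.append(f"net charge {net_charge:+d} (possible charge imbalance)")
    return problems

methanol = Molecule(atoms=[("C", 0, 3), ("O", 0, 1)], bonds=[(0, 1, 1)])
print(check_molecule(methanol))  # []
```

The net-charge test is reported as a possible imbalance rather than a hard error, because a charged species may be deliberate.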
    • www.chemspider.com
    • We Want to Answer Questions
      • Questions a chemist might ask…
        • What is the melting point of n-heptanol?
        • What is the chemical structure of Xanax?
        • Chemically, what is phenolphthalein?
        • What are the stereocenters of cholesterol?
        • Where can I find publications about xylene?
        • What are the different trade names for Ketoconazole?
        • What is the NMR spectrum of Aspirin?
        • What are the safety handling issues for Thymol Blue?
    • Search for a Chemical…by name
    • Link off a structure in ChemSpider
        • Chemical suppliers
        • Other publications
        • Analytical Data
        • Related Reactions
        • Wikipedia
        • Patents
        • “Everything”
    • Available Information…
      • Linked to vendors, safety data, toxicity, metabolism
    • Available Information…
    • Clickthrough to Patent (SureChem)
    • Crowdsourced “Annotations”
      • Registered Users can add
        • Descriptions/Syntheses/Commentaries
        • Links to PubMed articles
        • Links to articles via DOIs
        • Add spectral data
        • Add Crystallographic Information Files
        • Add photos
        • Add MP3 files
        • Add Videos
    • Spectra Linked
    • Search for a chemical…by structure
      • Substructure search coming…
    • Inherited Errors
      • Inherited errors from every database… all public compound databases, including ours, have errors
      • “Incorrect” structures – assertions, timelines, etc.
      • “Incorrect” names associated with structures
      • ENORMOUS CHALLENGE
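One way to surface inherited naming errors is to collect (source, name, structure key) triples from several databases and flag any name attached to more than one structure. This is a minimal sketch; the database labels and keys are hypothetical placeholders (a real pipeline might use InChIKeys as the structure key). Flagging a conflict does not say which structure is right; that still needs a curator.

```python
from collections import defaultdict

def conflicting_names(records):
    """Return names (compared case-insensitively) that are attached to more than
    one structure key, with the keys and the sources that used the name."""
    structures = defaultdict(set)
    sources = defaultdict(set)
    for source, name, structure_key in records:
        normalized = name.strip().lower()
        structures[normalized].add(structure_key)
        sources[normalized].add(source)
    return {
        name: {"structures": sorted(keys), "sources": sorted(sources[name])}
        for name, keys in structures.items()
        if len(keys) > 1
    }

# Hypothetical records: (database, name, structure key). All values are placeholders.
records = [
    ("DB-A", "Formoterol", "KEY-1"),
    ("DB-B", "formoterol", "KEY-2"),
    ("DB-A", "Budesonide", "KEY-3"),
    ("DB-C", "Budesonide", "KEY-3"),
]
print(conflicting_names(records))
# {'formoterol': {'structures': ['KEY-1', 'KEY-2'], 'sources': ['DB-A', 'DB-B']}}
```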
    • What is the Structure of Vitamin K?
    • MeSH
      • A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione). Vitamin K 3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.
    • What is the Structure of Vitamin K1?
    • Vitamin K1
      • “2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”
      • Variants of systematic names on PubChem
        • 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
        • 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
        • 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
        • 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
        • 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
        • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
        • 2-methyl-3-(3,7,11,15-tetramethyl
        • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
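The PubChem name variants for vitamin K1 differ only in which stereo descriptors they state. A short sketch, assuming the descriptor positions seen in those names (the double bond, plus stereocenters at C7 and C11), enumerates every fully or partially specified descriptor set; each one is a distinct "name" that could end up attached to a distinct database record.

```python
from itertools import product

def descriptor_variants(double_bond=("E", "Z", None),
                        centers=(7, 11),
                        configs=("R", "S", None)):
    """Yield every stereo-descriptor prefix such as '(E,7R,11S)', including
    partially specified ones; None means 'left unspecified'."""
    for bond in double_bond:
        for assignment in product(configs, repeat=len(centers)):
            parts = [bond] if bond else []
            parts += [f"{pos}{cfg}" for pos, cfg in zip(centers, assignment) if cfg]
            yield f"({','.join(parts)})" if parts else ""

variants = list(descriptor_variants())
print(len(variants))  # 27
```

Only 8 of the 27 descriptor sets (E or Z, times R or S at each of two centers) specify the molecule completely; the rest leave at least one element of stereochemistry open.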
    • Question Everything online: www.dhmo.org
    • It’s all on Wikipedia…
    • Chemistry on The Internet Is Messy
    • It’s Methane…
    • What’s Methane?
    • What ELSE is Methane???
    • NLM’s DailyMed
    • Public Domain Chemistry Databases
      • Our databases are a mess…
      • Non-curated databases are proliferating errors
      • We source and deposit data between databases
      • Original sources of errors hard to determine
      • Curation is time-consuming, challenging and exacting
      • An examination of quality in databases – inter/intra lab comparison of processes for 150 drugs
    • Drug name searches compared across databases
      | Drug Name | Generic Name | ChEBI | ChemSpider | CAS Common Chemistry | ChemIDPlus | DailyMed | DrugBank | PubChem (hits/correct) | Wikipedia |
      |---|---|---|---|---|---|---|---|---|---|
      | Spiriva | Tiotropium Bromide | No Hits | | No Hits | | | | 4/0 | |
      | Depakote | Valproate semisodium | | | | | | | | No Structure |
      | Basen | Voglibose | | | No Hits | | No Hits | | 2/1 | |
      | Symbicort | 1) Budesonide | | | | | | | 8/1 | |
      | Symbicort | 2) Formoterol | WRONG | | No Hits | | | | 6/1 | |
      | Vytorin | 1) Ezetimibe | | | No Hits | | | | | |
      | Vytorin | 2) Simvastatin | | | | | | | 2/1 | |
      | Taxol | Paclitaxel | | | | | | | 44/1 | |
      | Thalidomid | Thalidomide | No Hits | | | | | | | |
      | Zocor | Simvastatin | | | | | | | 2/1 | |
      | Crestor | Rosuvastatin | | | No Hits | | | | 2/1 | |
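Reading the PubChem column of the comparison table as "structures returned / structures correct" (an assumption that matches the Budesonide and Formoterol counts given for PubChem on the Symbicort slide), the column can be totalled. Rows without a PubChem value are left out, and simvastatin is counted twice because it was searched under two trade names.

```python
# PubChem column of the comparison table, read as (structures returned, structures correct).
pubchem = {
    "Spiriva / Tiotropium Bromide": (4, 0),
    "Basen / Voglibose": (2, 1),
    "Symbicort / Budesonide": (8, 1),
    "Symbicort / Formoterol": (6, 1),
    "Vytorin / Simvastatin": (2, 1),
    "Taxol / Paclitaxel": (44, 1),
    "Zocor / Simvastatin": (2, 1),
    "Crestor / Rosuvastatin": (2, 1),
}

total_hits = sum(hits for hits, _ in pubchem.values())
total_correct = sum(correct for _, correct in pubchem.values())
print(total_hits, total_correct, f"{total_correct / total_hits:.0%}")  # 70 7 10%

# Per search, how many of the returned structures are not the correct one.
for search, (hits, correct) in pubchem.items():
    print(f"{search}: {hits - correct} of {hits} wrong")
```

Under that reading, roughly one structure in ten returned by these name searches is the intended one.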
    • Symbicort: Budesonide + Formoterol
    • Symbicort: Budesonide + Formoterol (ChemIDPlus, Wikipedia)
    • DrugBank: Search Symbicort…
    • Symbicort: Budesonide + Formoterol
      • PubChem
        • 8 structures called Budesonide. 1 “correct”
        • 6 structures called Formoterol. 1 “correct”
        • Search on “Symbicort” gives 1 structure.
    • Taxol: Paclitaxel 44 structures
    • Taxol: Paclitaxel Bioassay Data
      • Most bioassay data are associated with a structure that has one ambiguous stereocenter
    • Consider searching each of these chemical databases by chemical name (systematic name, trade name or synonym). Please mark each online resource according to how much you generally trust the results.
    • The Final Search Strategy
    • All Those Names, One Structure
    • Searching Chemistry on the Internet
      • How complete a result set will we get if we search for “chemicals” by name?
      • Is there a better way to link chemistry databases? Linking by “names” is dangerous
      • Chemists want structure and SUBstructure searching
    • The InChI Identifier
    • Multiple Layers
    • InChI Strings Hash to InChIKeys
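An InChIKey is a fixed-length (27-character) hash of an InChI string, split so that the first block reflects the molecular skeleton and the second block the remaining layers such as stereochemistry. The real algorithm specifies exactly which layers feed each block and how the hash is truncated and encoded; the sketch below is a simplified toy, assuming SHA-256 and a naive byte-to-letter mapping, so its output will not match real InChIKeys. It keeps the property that matters for searching: two stereoisomers share the first block. The input strings are modeled on the standard InChIs of L- and D-alanine.

```python
import hashlib

# Toy assumption: the formula plus the connectivity (/c) and hydrogen (/h)
# layers form the "skeleton"; every other layer (stereo /t /m /s, isotopes /i,
# ...) goes into the second block.
SKELETON_PREFIXES = ("c", "h")

def _letters(digest: bytes, n: int) -> str:
    """Map the first n bytes of a digest onto the letters A-Z (toy encoding)."""
    return "".join(chr(ord("A") + b % 26) for b in digest[:n])

def toy_inchikey(inchi: str) -> str:
    prefix = "InChI=1S/"
    if not inchi.startswith(prefix):
        raise ValueError("expected a standard InChI string")
    formula, *layers = inchi[len(prefix):].split("/")
    skeleton = [formula] + [lay for lay in layers if lay[:1] in SKELETON_PREFIXES]
    rest = [lay for lay in layers if lay[:1] not in SKELETON_PREFIXES]
    block1 = _letters(hashlib.sha256("/".join(skeleton).encode()).digest(), 14)
    block2 = _letters(hashlib.sha256("/".join(rest).encode()).digest(), 8)
    # "SA" and the trailing "N" only copy the shape of real standard keys.
    return f"{block1}-{block2}SA-N"

l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
k1, k2 = toy_inchikey(l_ala), toy_inchikey(d_ala)
print(k1.split("-")[0] == k2.split("-")[0])  # True
```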
    • Oleoylethanolamine
    • InChIs have traction…
    • Vancomycin
    • Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
    • Full Molecule Search: 4 Hits
    • Full Skeleton Search: 104 Hits
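The gap between 4 full-molecule hits and 104 skeleton hits follows from what each search compares. A minimal sketch, assuming InChIKey-shaped keys in which the first 14-character block stands for the skeleton and the rest carries stereo and other layers; the keys are made-up placeholders, and this is not necessarily how ChemSpider runs either search.

```python
def search(database_keys, query_key, skeleton_only=False):
    """Return matching keys: whole-key equality for a full-molecule search,
    first-block equality for a skeleton search."""
    if skeleton_only:
        skeleton = query_key.split("-")[0]
        return [k for k in database_keys if k.split("-")[0] == skeleton]
    return [k for k in database_keys if k == query_key]

# Hypothetical keys: three share a skeleton block but differ in the second block.
database_keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "AAAAAAAAAAAAAA-DDDDDDDDSA-N",
    "ZZZZZZZZZZZZZZ-BBBBBBBBSA-N",
]
query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
print(len(search(database_keys, query)))                      # 1
print(len(search(database_keys, query, skeleton_only=True)))  # 3
```

Every stereoisomer or partially specified variant of the same skeleton collapses onto one first block, which is why a skeleton search over a large collection returns many more records to review.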
    • Vancomycin
      • Who will curate?
      • How would you clean such a large dataset?
    • Vancomycin on ChemSpider
    • Name Searching is “Easier”
    • Content is King and Quality Costs
      • Curated Chemistry “content” is expensive to create
        • Patent searching
        • Structures and properties
        • Drug databases
        • Literature databases
      • Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information
        • 104 years of content
        • >50 million substances
        • Proprietary platform
    • The EXPERTS must get it right?!
    • Wikipedia, C&E News, PubChem
      • C&E News (from ACS)
    • Feedback from Steve Ritter
      • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
      • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
    • Search OEA
    • Semantic Mark-up for Chemistry
      • Semantic mark-up for chemistry is here
        • RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies)
        • Nature Publishing Group compound linking
    • Nature Chemistry Compound Pages
    • Project Prospect
    • Entity-Extraction, Mark-up, Annotate
    • And linked to STITCH…
    • Success Depends on Dictionaries
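Dictionary-driven entity extraction can be illustrated with a greedy longest-match lookup: scan the text word by word and, at each position, try the longest dictionary name first. This is a hypothetical sketch, not the extraction method used by Project Prospect or ChemSpider (the slides do not describe one), and the six-entry synonym table is illustrative. It also shows why success depends on the dictionary: a name that is missing, or mapped to the wrong record, is silently missed or mislinked.

```python
import re

# Illustrative synonym dictionary: lower-cased name -> canonical record name.
SYNONYMS = {
    "aspirin": "aspirin",
    "acetylsalicylic acid": "aspirin",
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "xanax": "alprazolam",
    "alprazolam": "alprazolam",
}
MAX_WORDS = max(len(name.split()) for name in SYNONYMS)

def extract_entities(text, dictionary=SYNONYMS, max_words=MAX_WORDS):
    """Return (matched text, canonical name) pairs using greedy longest match."""
    words = [m.group(0) for m in re.finditer(r"[\w\-]+", text)]
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in dictionary:
                found.append((candidate, dictionary[candidate.lower()]))
                i += n
                break
        else:
            i += 1  # no dictionary name starts here
    return found

print(extract_entities("Patients took Taxol and acetylsalicylic acid."))
# [('Taxol', 'paclitaxel'), ('acetylsalicylic acid', 'aspirin')]
```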
    • Online Curation
      • Online databases generally do NOT allow curation or annotation
      • If you find errors they stay there!
      • ChemSpider allows immediate curation
    • Search “Vitamin H”
    • “Curate” Identifiers
    • Crowd-sourcing Chemistry Curation
    • Crowdsourcing Works
      • >130 people have deposited data and participated in data curation
      • Curators at different levels check each other’s work
      • Wikipedia is the modern primary example
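The slides say that curators at different levels check each other but do not give the rules. One possible review model, sketched here with hypothetical user names, levels and record IDs: a proposed edit is applied only after a different user whose level is at least the author's approves it.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    record_id: str
    field_name: str
    new_value: str
    author: str
    approved_by: str = ""

class CurationQueue:
    """Hypothetical review model: an edit is applied only after a different user
    whose curation level is at least the author's approves it."""

    def __init__(self, levels):
        self.levels = levels  # user name -> integer level (hypothetical scale)
        self.records = {}     # record_id -> {field_name: value}
        self.pending = []     # edits waiting for review

    def propose(self, edit):
        self.pending.append(edit)

    def approve(self, edit, reviewer):
        if reviewer == edit.author:
            raise PermissionError("authors cannot approve their own edits")
        if self.levels[reviewer] < self.levels[edit.author]:
            raise PermissionError("reviewer's level is below the author's")
        edit.approved_by = reviewer
        self.records.setdefault(edit.record_id, {})[edit.field_name] = edit.new_value
        self.pending.remove(edit)

queue = CurationQueue({"alice": 1, "bob": 2})
fix = Edit("record-42", "name", "Formoterol", author="alice")
queue.propose(fix)
queue.approve(fix, "bob")
print(queue.records)  # {'record-42': {'name': 'Formoterol'}}
```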
    • ChemSpider and Publishing
      • The curation efforts on ChemSpider led to a set of validated dictionaries
      • Integrate best-in-class entity extraction with validated name dictionaries
      • Already text-mined the RSC archive and presently linking!
    • Crowdsourcing Synthesis ChemSpider SyntheticPages
    • ChemSpider Everywhere: What do computers want?
      • Web services
    • Web Services
    • ChemSpider Everywhere
      • Linked from Wikipedia and many Public Databases
      • Linked from Open Notebook Science sites
      • Linked from Blogs using Structure/Spectra EMBED
      • Integrated into structure drawing packages
      • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
    • ChemSpider Everywhere: Embed
    • ChemSpider Everywhere: Spectral Game
    • ChemSpider Everywhere: Crowdsourced Curation of Spectra
    • ChemSpider Everywhere: ChemMobi
    • Structure Database Lookup
    • Reaction Database Lookup
    • There will always be gaps...
      • What ChemSpider does not deal with, yet...
        • Materials
        • Minerals
        • Polymers
        • Biological macromolecules
    • Collaborative Data Curation
      • How can we COLLECTIVELY clean online data?
      • Developing ways to share curation actions back to original data sources
      • A mindset of bigger is better is problematic. How many “real chemicals” are in the public databases?
    • Future Work
      • Continue curation work
      • Extend search capabilities
      • Expand existing databases
      • Text-mine RSC archive and link chemistry
      • Project: pre-competitive data sharing and linking for Life Sciences
      • Integrate to metabolic pathways tools
    • The Future of Chemistry on the Web?
      • Public compound databases federate & build a linked environment of validated data!
      • Data validation needs are not ignored
      • Publishers layer on information to make publications discoverable
      • Public-Private databases can be linked
      • Open Data proliferate
      • The “Semantic Web” in action
    • It’s a long road ahead…
    • Thank you
      • Email: williamsa@rsc.org
      • Twitter: ChemConnector
      • Personal Blog: www.chemconnector.com
      • SLIDES: www.slideshare.net/AntonyWilliams