Integrating and curating internet based chemistry resources to serve life scientists

The internet now offers access to a myriad of online resources that can be of value to chemists working in the Life Sciences. While finding information online is, in many cases, a simple search away, the accuracy and validity of the associated data and information should be questioned. As more databases and resources are introduced online, often without integration with other resources, a scientist must perform multiple searches and then undertake the task of meshing and merging data. ChemSpider is a freely accessible online database that has taken on the challenge of meshing together distributed resources across the internet to provide a structure-based hub. It is a crowdsourcing environment hosting over 26 million unique compounds linked out to over 400 data sources. With well-defined programming interfaces for integration, ChemSpider has been integrated into many commercial and open software packages and is presently serving as the chemistry foundation for the IMI Open PHACTS project.

Usage Rights

CC Attribution-NonCommercial License

    Presentation Transcript

    • ChemSpider – Integrating and Curating Internet-Based Chemistry Resources to Serve Life Scientists. Antony Williams, PharmSciFair, July 2011
    • The Internet for Life Scientists
      • What resources are you using online?
      • How well are they working?
      • What problems exist, that you know of?
      • ChemSpider – “curating Chemistry with the world”
      • The benefits of crowdsourcing chemistry
      • A Semantic Web for the Life Sciences
      • An introduction to Open PHACTS
    • Where is chemistry online?
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Property databases
      • Patents with chemical structures
      • Drug Discovery data
      • Scientific publications
      • Compound aggregators
      • Blogs/Wikis and Open Notebook Science
    • Where can we find data online?
    • Life Scientists and Online Resources
      • Where do life scientists source information online?
        • PubChem
        • ChEBI/ChEMBL
        • Protein Data Bank (PDB)
        • DrugBank
        • Wikipedia
        • What else do you use??
        • What do you TRUST??
    • What is the Structure of Vitamin K1?
    • CAS’s Common Chemistry
    • Wikipedia
    • ChEBI – Manual Curation
    • PubChem
      • “2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”
      • Variants of systematic names on PubChem
      • 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
      • 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
      • 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
      • 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
      • 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
      • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
      • 2-methyl-3-(3,7,11,15-tetramethyl
      • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
    • Public Domain Chemistry Databases
      • Our databases are a mess…
      • Non-curated databases are proliferating errors
      • We source and deposit data between databases
      • Original sources of errors hard to determine
      • Curation is time-consuming, challenging and exacting
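
One way to make the "proliferating errors" point concrete is to compare what different databases return for the same compound name. The following is a minimal sketch, not a method described in the talk: the source labels, compound names and identifiers are all hypothetical placeholders, and a real comparison would use a structure identifier such as an InChIKey.

```python
from collections import defaultdict

def find_conflicts(records):
    """Return the names for which the sources report different structures.

    `records` is an iterable of (source, name, identifier) tuples.
    Names are compared case-insensitively.
    """
    by_name = defaultdict(dict)
    for source, name, identifier in records:
        by_name[name.lower()][source] = identifier
    return {
        name: sources
        for name, sources in by_name.items()
        if len(set(sources.values())) > 1
    }

# Hypothetical records; none of these values come from a real database.
records = [
    ("source_a", "Compound X", "ID-1"),
    ("source_b", "compound x", "ID-2"),
    ("source_a", "Compound Y", "ID-3"),
    ("source_b", "Compound Y", "ID-3"),
]

conflicts = find_conflicts(records)
print(conflicts)
```

A check like this only flags disagreement. Deciding which source is correct still needs curation.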
    • Lipitor
      • What are people ACTUALLY measuring BioAssays on?
      • Does stereochemistry matter?
    • The FDA’s DailyMed
    • Structures on DailyMed
    • Lack of Stereochemistry
    • Incorrect Structures
    • PDSP
      • The database has 55,440 Ki values for searching
    • PDSP Structures – Canonical SMILES: Is Stereochemistry Important?
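
Whether a SMILES string carries any stereochemistry can be screened with a simple text check, because SMILES marks tetrahedral centres with `@` and double-bond geometry with `/` and `\`. The sketch below assumes only that convention. It is a textual screen, not a chemical analysis: it cannot tell whether a molecule has stereocentres that were left unspecified.

```python
def has_stereo(smiles: str) -> bool:
    """True if the SMILES text contains any stereo marker (@, / or \\)."""
    return any(mark in smiles for mark in ("@", "/", "\\"))

examples = {
    "C[C@@H](N)C(=O)O": "alanine with a specified stereocentre",
    "CC(N)C(=O)O": "alanine with no stereochemistry",
    "C/C=C/C": "2-butene with double-bond geometry",
    "CC=CC": "2-butene with no geometry",
}

for smiles, label in examples.items():
    print(f"{smiles:20s} {has_stereo(smiles)!s:6s} {label}")
```

Running such a screen over a column of SMILES gives a quick count of how many records could distinguish stereoisomers at all.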
    • What’s Methane?
    • What ELSE is Methane???
    • Build Models with GOOD DATA!
    • So you want data on drugs???
      • Sourcing data based on drug names is difficult!
      • Where would you find the “correct chemical structures”?
      • What databases can you trust?
      • Consider searching each of these chemical databases by chemical name (systematic name, trade name or synonym). Please mark each online resource according to how much you generally trust the results.
    • Public Domain Chemistry Databases
      • An examination of quality in databases – inter/intra lab comparison of processes for 150 drugs
    • Vytorin: Ezetimibe/Simvastatin
    • Taxol: Paclitaxel – 44 structures
    • Drug Name | Generic Name | ChEBI | ChemSpider | CAS Com. Chem | ChemIDPlus | DailyMed | DrugBank | PubChem | Wikipedia
      Spiriva | Tiotropium Bromide | No Hits | | No Hits | | | | 4/0 |
      Depakote | Valproate semisodium | | | | | | | | No Structure
      Basen | Voglibose | | | No Hits | | No Hits | | 2/1 |
      Symbicort 1) | Budesonide | | | | | | | 8/1 |
      Symbicort 2) | Formoterol | WRONG | | No Hits | | | | 6/1 |
      Vytorin 1) | Ezetimibe | | | No Hits | | | | |
      Vytorin 2) | Simvastatin | | | | | | | 2/1 |
      Taxol | Paclitaxel | | | | | | | 44/1 |
      Thalidomid | Thalidomide | No Hits | | | | | | |
      Zocor | Simvastatin | | | | | | | 2/1 |
      Crestor | Rosuvastatin | | | No Hits | | | | 2/1 |
    • Vision: Connect Chemistry on the Web
      • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
      • Chemistry articles are indexed and searchable by a free online service
      • The web is linked together through the “language of chemistry”
      • Publicly funded research data is linked
    • We Have Delivered the Vision
        • “Build a Structure-Centric Community to Serve Chemists”
        • Integrate chemical structure data on the web
        • Create a “structure-based hub” to information, data and algorithmic predictions
        • Let chemists contribute their own data
        • Allow the community to curate/correct data
    • www.chemspider.com
    • We Want to Answer Questions
      • Questions a chemist might ask…
        • What is the melting point of n-heptanol?
        • What is the chemical structure of Xanax?
        • Chemically, what is phenolphthalein?
        • What are the stereocenters of cholesterol?
        • Where can I find publications about vancomycin?
        • What are the different trade names for Ketoconazole?
        • What is the NMR spectrum of Aspirin?
        • What are the safety handling issues for Thymol Blue?
    • Search for a Chemical…by name
    • Link off a structure in ChemSpider
        • Chemical suppliers
        • Other publications
        • Analytical Data
        • Related Reactions
        • Wikipedia
        • Patents
        • “Everything”
    • Available Information…
      • Linked to vendors, safety data, toxicity, metabolism
    • Available Information….
    • What else is available?
      • Links to patents – SureChem and Google Patents
      • Links to literature – PubMed, Google Scholar, RSC backfile and databases
      • Measured and experimental physchem data
      • Links to prediction algorithms
      • Links to suppliers
    • Structure and substructure searches
    • Crowdsourced “Annotations”
      • Users can add
        • Descriptions/Syntheses/Commentaries
        • Links to PubMed articles
        • Links to articles via DOIs
        • Add spectral data
        • Add Crystallographic Information Files
        • Add photos
        • Add MP3 files
        • Add Videos
    • Content is King and Quality Costs
      • Curated Chemistry “content” is expensive to create
        • Patent searching
        • Structures and properties
        • Drug databases
        • Literature databases
      • Chemical Abstracts Service (CAS), lauded as the “Gold Standard” in chemistry-related information
        • 104 years of content
        • >50 million substances
        • Proprietary platform
    • With Great Fanfare…also costs…
    • NPC Browser http://tripod.nih.gov/npc/
    • Curation required
    • My favorite
    • Neomycin
    • Inherited Errors
      • Inherited errors from every database… all public compound databases, including ours, have errors
      • “Incorrect” structures – assertions, timelines etc.
      • “Incorrect” names associated with structures
      • ENORMOUS CHALLENGE
    • Online Curation
      • Online databases generally do NOT allow curation or annotation
      • If you find errors they stay there!
      • ChemSpider allows immediate curation
    • Search “Vitamin H”
    • “Curate” Identifiers
    • Crowd-sourcing Chemistry Curation
    • Crowdsourcing Works
      • >130 people have deposited data and participated in data curation
      • Curators at different levels check each other's work
      • Wikipedia is the modern primary example
      • It ALSO works for crowdsourcing SYNTHESIS
    • ChemSpider SyntheticPages
    • Why Curated Dictionaries Matter
    • Nature Chemistry
    • Success Depends on Dictionaries
    • Validated Name-Structure Dictionaries
      • Chemical name dictionaries are used for:
          • Text-mining (publications, patents)
            • Used to index PubMed and link to Google Patents
          • Linking to other databases – think Biology!
            • When structures are not available drug names link
          • Searching the web
            • Names link to structures link to InChIs
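
The "names link to structures link to InChIs" chain can be illustrated with a tiny dictionary lookup over free text. This is a minimal sketch of dictionary-based tagging in general, not ChemSpider's text-mining pipeline: the function name `tag_names`, the three-entry dictionary and the sample sentence are assumptions made for illustration. The two InChIKeys are the standard keys for aspirin and caffeine, but should be checked against an authoritative source before reuse.

```python
import re

# Illustrative name-to-InChIKey dictionary; synonyms share one key.
NAME_TO_INCHIKEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "acetylsalicylic acid": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}

def tag_names(text, dictionary=NAME_TO_INCHIKEY):
    """Return (offset, matched text, InChIKey) for each dictionary name found."""
    hits = []
    for name, key in dictionary.items():
        # Whole-word, case-insensitive match of the dictionary name.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), key))
    return sorted(hits)

sentence = "Aspirin (acetylsalicylic acid) is often taken with caffeine."
for offset, found, key in tag_names(sentence):
    print(offset, found, key)
```

Because both synonyms resolve to the same key, a document that mentions only "acetylsalicylic acid" can still be linked to records filed under "aspirin". An incorrect dictionary entry propagates in the same way, which is why validation of the dictionary matters.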
    • The InChI Identifier
    • Multiple Layers
    • InChI Strings Hash to InChIKeys
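
The layered structure of an InChI string can be seen by splitting it on the `/` separator. The sketch below assumes a simple standard InChI (version prefix, one formula, then layers introduced by a lowercase letter) and uses the standard InChI for aspirin as input. The function name `split_inchi` is hypothetical. It does not compute an InChIKey; that is done by the InChI software, which hashes the string.

```python
def split_inchi(inchi: str) -> dict:
    """Split a simple InChI string into version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    result = {"version": parts[0], "formula": None, "layers": []}
    for part in parts[1:]:
        if not part:
            continue
        if part[0].islower():
            # Layers such as c (connections), h (hydrogens), t/m/s (stereo).
            result["layers"].append((part[0], part[1:]))
        else:
            # The formula layer starts with an element symbol.
            result["formula"] = part
    return result

aspirin = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
parsed = split_inchi(aspirin)
print(parsed["version"], parsed["formula"])
for prefix, body in parsed["layers"]:
    print(prefix, body)
```

Aspirin has no stereocentres, so only the connection and hydrogen layers appear. A compound with defined stereochemistry would carry additional layers after them.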
    • Vancomycin – Search the Internet
    • Full Skeleton Search: 104 Hits
    • Full Molecule Search: 4 Hits
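
The difference between a skeleton search and a full-molecule search can be sketched with InChIKeys, assuming that "skeleton" here refers to the first 14-character block of the key (which encodes connectivity) and "full molecule" to the complete key. That reading of the slide titles is an assumption, and the keys below are hypothetical placeholders with the right shape, not real compounds.

```python
def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks (14, 10 and 1 characters)."""
    blocks = key.split("-")
    if len(blocks) != 3 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey shape")
    return {
        "skeleton": blocks[0],      # hash of the connectivity layers
        "details": blocks[1][:8],   # hash of the remaining layers (e.g. stereo)
        "flag": blocks[1][8],       # S = standard InChI, N = non-standard
        "version": blocks[1][9],
        "protonation": blocks[2],
    }

def search(keys, query, mode="full"):
    """Return keys matching the query on the full key or on the skeleton only."""
    if mode == "skeleton":
        target = split_inchikey(query)["skeleton"]
        return [k for k in keys if split_inchikey(k)["skeleton"] == target]
    return [k for k in keys if k == query]

# Hypothetical keys: three share a skeleton, one does not.
KEYS = [
    "A" * 14 + "-" + "B" * 8 + "SA-N",
    "A" * 14 + "-" + "C" * 8 + "SA-N",
    "A" * 14 + "-" + "D" * 8 + "SA-N",
    "Z" * 14 + "-" + "B" * 8 + "SA-N",
]

print(len(search(KEYS, KEYS[0], mode="skeleton")))
print(len(search(KEYS, KEYS[0], mode="full")))
```

A skeleton match returns every record with the same connectivity regardless of stereochemistry, which is consistent with the skeleton search on the slides returning many more hits than the full-molecule search.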
    • OpenTox uses InChIs
    • There will always be gaps...
      • What ChemSpider does not deal with, yet...
        • Materials
        • Minerals
        • Polymers
        • Biological macromolecules
        • Mappings to diseases, targets etc. ONLY to chemicals in other databases
        • Daily updates to chemistry!
    • Continuous changes: June 2011 USANs
    • The Future of Chemistry on the Web?
      • Public compound databases federate & build a linked environment of validated data!
      • Data validation needs are not ignored
      • Publishers layer on information to make publications discoverable
      • Public-Private databases can be linked
      • Open Data proliferate
      • The “Semantic Web” in action
      • It will require COLLABORATION
    • The Future: Open PHACTS
    • Open PHACTS Overview
      • Develop a set of robust standards to enable:
        • Integration between data sources via semantic technologies
        • Development of high quality assertions
        • Workflows and analysis pipelines across resources
    • Open PHACTS Overview
      • Implement standards in a semantic integration hub (“Open Pharmacological Space”)
        • Develop an open, public domain infrastructure for drug discovery data integration
        • Development of open web-services for drug discovery
        • Development of a secure access model to enable queries with proprietary data
    • Open PHACTS Overview
      • Deliver services to support ongoing drug discovery programs in pharma and public domain
        • Align development of standards, vocabularies and data integration to selected drug discovery issues
      • Collaboration between pharmaceutical companies, medicinal chemists, cheminformaticians, semantic web scientists and publishers
      • Will include public-private data sharing
      • Open PHACTS Project Partners
    • Acknowledgments
      • RSC|ChemSpider team
      • The “Crowd” of curators
      • All Data Source providers
      • The Open PHACTS team – a large cast!!!
      • GGA Software Services
      • ACD/Labs
      • OpenEye
      • Accelrys
    • Thank you
      • Email: williamsa@rsc.org
      • Twitter: ChemConnector
      • Personal Blog: www.chemconnector.com
      • SLIDES: www.slideshare.net/AntonyWilliams