Building A Community Resource For The Life Sciences

This is a presentation given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21st 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and validate and curate data. We won the Bio-IT Best Practices Community Service Award that evening also.

Presentation Transcript

  • Building A Community Platform to Support Chemistry and the Life Sciences
  • Where Would You Look? What Do You Trust?
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches are required to find data
  • The Final Search Strategy
  • All Those Names, One Structure: A problem to solve…
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches are required to find data
  • Trustworthy Chemistry?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • Where Would You Look? What Do You Trust?
  • Structural Data for Life Sciences: DailyMed
  • Lack of Stereochemistry
  • Incorrect Structures
  • Ugh…
  • Drugs are REALLY Messy
  • Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • The EXPERTS must get it right?!
  • Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches are required to find data
  • Just “Public Compound” Databases
    • PubChem
    • Drugbank
    • ChEBI/ChEMBL
    • KEGG
    • LipidMAPs
    • ChemIDPlus
    • eMolecules
    • ZINC
    • Lots of chemical vendors
    • ChemSpider
  • What do humans want?
    • As few interfaces as possible
  • A Pragmatic Vision
      • “Build a Structure Centric Community to Serve Chemists”
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • Answer Questions
    • Questions a chemist might ask…
      • What is the melting point of n-heptanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • ChemSpider Searches
  • Search “OEA”
  • Link Farm Connections
  • Search OEA
  • Google Books
  • Google Scholar
  • Linked Patents for OEA
  • Google Patents
  • Microsoft Academic Search
  • RSC Journals
  • RSC Databases
  • Statistics for Today
      • Almost 25 million compounds from >350 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
  • Searching Chemistry on the Internet
    • How complete a result set will we get if we search for “chemicals” by name?
    • Is there a better way to link chemistry databases? Linking by “names” is dangerous
    • Chemists want structure and SUBstructure searching
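The risk of linking by name can be illustrated with a minimal sketch. It assumes three made-up sources that each list ethanol under a different name; the source labels and the `group_sources` helper are hypothetical and exist only for this illustration. The structure-derived key used is the InChIKey that the talk introduces on the following slides.

```python
from collections import defaultdict

# Standard InChIKey for ethanol; the three "sources" below are hypothetical.
ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

records = [
    {"source": "source_a", "name": "ethanol", "inchikey": ETHANOL_KEY},
    {"source": "source_b", "name": "Ethyl alcohol", "inchikey": ETHANOL_KEY},
    {"source": "source_c", "name": "EtOH", "inchikey": ETHANOL_KEY},
]


def group_sources(records, field):
    """Group source labels by the (case-insensitive) value of one field."""
    groups = defaultdict(set)
    for rec in records:
        groups[rec[field].strip().casefold()].add(rec["source"])
    return groups


by_name = group_sources(records, "name")
by_key = group_sources(records, "inchikey")

# Name-based linking splits one compound into several unrelated groups,
# while the structure-based key collapses them into a single group.
print(len(by_name), "groups when linking by name")
print(len(by_key), "group(s) when linking by InChIKey")
```

Name-based linking can also fail in the opposite direction, when one trade name is attached to different structures in different sources; a structure-derived key avoids both cases.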
  • The InChI Identifier
  • Multiple Layers
  • InChI Strings Hash to InChIKeys
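As a rough illustration of the "multiple layers" idea, the sketch below splits a standard InChI string into its slash-separated layers and an InChIKey into its hyphen-separated blocks, using ethanol as the example. The helper names `split_inchi` and `split_inchikey` are hypothetical. The sketch does not compute the key: the real InChIKey is derived by the InChI software from a truncated SHA-256 hash of the InChI string.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers.

    Simplification: layers are stored by their one-letter prefix, so an
    InChI that repeats a prefix (e.g. inside an isotopic layer) or has no
    formula layer is not handled here.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]  # 'c' = connections, 'h' = hydrogens, ...
    return layers


def split_inchikey(key: str) -> dict:
    """Split a 27-character InChIKey into its documented blocks."""
    first, second, protonation = key.split("-")
    if (len(first), len(second), len(protonation)) != (14, 10, 1):
        raise ValueError("unexpected InChIKey shape")
    return {
        "connectivity_hash": first,        # hash of the molecular skeleton
        "other_layers_hash": second[:8],   # stereo, isotopes and other layers
        "standard_flag": second[8],        # 'S' = standard InChI
        "version_flag": second[9],         # 'A' = InChI version 1
        "protonation": protonation,        # 'N' = neutral
    }


ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ethanol_key = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(split_inchi(ethanol_inchi))
print(split_inchikey(ethanol_key))
```

The fixed length of the key, unlike the variable-length InChI string, is what makes it practical as a search term for web search engines.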
  • Link the Internet with InChIKeys! (Taken from Rafael Sidi’s blog)
  • Vancomycin – Search the Internet
  • Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
  • Full Molecule Search: 4 Hits
  • Full Skeleton Search: 104 Hits
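The difference in hit counts between the two searches can be approximated with InChIKeys: a full-molecule search matches the whole key, while a skeleton search matches only the first 14-character block, which encodes connectivity without stereochemistry. The sketch below is an assumption about how such a search could work, not a description of the search shown on the slides. The function names are hypothetical, and the small key list (intended to be L-alanine, alanine with unspecified stereochemistry, and ethanol) is illustrative test data rather than a real index.

```python
def skeleton_block(inchikey: str) -> str:
    """Return the first InChIKey block (connectivity hash)."""
    return inchikey.split("-")[0]


def full_molecule_search(query: str, index: list) -> list:
    """Match only entries whose complete InChIKey equals the query."""
    return [key for key in index if key == query]


def skeleton_search(query: str, index: list) -> list:
    """Match every entry that shares the query's connectivity block."""
    wanted = skeleton_block(query)
    return [key for key in index if skeleton_block(key) == wanted]


# Illustrative index: two alanine variants that differ only in
# stereochemistry, plus ethanol as an unrelated compound.
L_ALANINE = "QNAYBMKLOCPYGJ-REOHSOBTSA-N"
ALANINE_NO_STEREO = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
index = [L_ALANINE, ALANINE_NO_STEREO, ETHANOL]

print("full molecule hits:", full_molecule_search(L_ALANINE, index))
print("skeleton hits:", skeleton_search(L_ALANINE, index))
```

A skeleton search therefore returns a superset of the full-molecule hits, which is consistent with the larger hit count reported for the skeleton search above.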
  • Vancomycin
  • Vancomycin on ChemSpider: 1 compound, 3 days
  • InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now…
  • InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now… but an InChIKey on its own is what???
  • The InChI “Resolver”
  • InChI Resolver to DOIs: Structure Search the Web
  • Most Chemistry is NOT Published
    • Only a fraction of chemistry is published
    • Only a tiny fraction of chemistry is patented
    • What of the “Lost Chemistry” that was never published and so cannot be abstracted?
      • Reactions performed
      • Structures made and studied
      • Spectra acquired and then disposed of
      • Available chemicals never found
  • Crowd-sourcing Curation and Deposition
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  • Semantic Markup: Project Prospect
  • Name-Structure Pairs
  • Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • Org Prep Daily (Blog)
  • ChemSpider SyntheticPages
  • Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
  • ChemSpider Web Services
  • Thank you
    • [email_address]
    • Twitter: ChemSpiderman
    • Blog: www.chemspider.com/blog
    • Slides: www.slideshare.net/AntonyWilliams