Building A Community Resource For The Life Sciences

This presentation was given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21, 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and to validate and curate data. We also won the Bio-IT Best Practices Community Service Award that evening.

Transcript

  • 1. Building A Community Platform to Support Chemistry and the Life Sciences
  • 2. Where Would You Look? What Do You Trust?
  • 3. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 4. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 5.  
  • 6.  
  • 7. The Final Search Strategy
  • 8. All Those Names, One Structure A problem to solve…
  • 9. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 10. Trustworthy Chemistry?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 11. Where Would You Look? What Do You Trust?
  • 12. Structural Data for Life Sciences: DailyMed
  • 13. Lack of Stereochemistry
  • 14. Incorrect Structures
  • 15. Ugh…
  • 16. Drugs are REALLY Messy
  • 17. Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • 18. The EXPERTS must get it right?!
  • 19. Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • 20. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 21. Just “Public Compound” Databases
    • PubChem
    • DrugBank
    • ChEBI/ChEMBL
    • KEGG
    • LIPID MAPS
    • ChemIDplus
    • eMolecules
    • ZINC
    • Lots of chemical vendors
    • ChemSpider
  • 22. What do humans want? (image: media.obsessable.com)
    • As few interfaces as possible
  • 23. A Pragmatic Vision
      • “Build a Structure Centric Community to Serve Chemists”
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • 24. Answer Questions
    • Questions a chemist might ask…
      • What is the melting point of n-heptanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • 25. ChemSpider Searches
  • 26.  
  • 27. Search “OEA”
  • 28. Search OEA
  • 29. Link Farm Connections
  • 30. Link Farm Connections
  • 31. Search OEA
  • 32. Search OEA
  • 33. Google Books
  • 34. Google Scholar
  • 35. Linked Patents for OEA
  • 36.  
  • 37. Google Patents
  • 38. Microsoft Academic Search
  • 39. RSC Journals
  • 40. RSC Databases
  • 41. Statistics for Today
      • Almost 25 million compounds from >350 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
  • 42. Searching Chemistry on the Internet
    • How complete a result set will we get if we search for “chemicals” by name?
    • Is there a better way to link chemistry databases? Linking by “names” is dangerous
    • Chemists want structure and SUBstructure searching
  • 43. The InChI Identifier
  • 44. Multiple Layers
  • 45. InChI Strings Hash to InChIKeys
  • 46. Link the Internet with InChIKeys! (Taken from: Rafael Sidi’s blog)
  • 47. Vancomycin – Search the Internet
  • 48. Vancomycin: Search Molecular SKELETON / Search Full Molecule
  • 49. Full Molecule Search: 4 Hits
  • 50. Full Skeleton Search: 104 Hits
  • 51.  
  • 52.  
  • 53.  
  • 54. Vancomycin
  • 55. Vancomycin on ChemSpider: 1 compound – 3 days
  • 56. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now…
  • 57. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now… is what???
  • 58. The InChI “Resolver”
  • 59. InChI Resolver to DOIs: Structure Search the Web
  • 60. Most Chemistry is NOT Published
    • Only a fraction of chemistry is published
    • Only a tiny fraction of chemistry is patented
    • What of the “Lost Chemistry” that is never published and cannot be abstracted?
      • Reactions performed
      • Structures made and studied
      • Spectra acquired and then disposed of
      • Available chemicals never found
  • 61. Crowd-sourcing Curation and Deposition
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 62. Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  • 63. Semantic Markup: Project Prospect
  • 64. Name-Structure Pairs
  • 65. Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 66. Org Prep Daily (Blog)
  • 67. ChemSpider SyntheticPages
  • 68. Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
  • 69. ChemSpider Web Services
  • 70.  
  • 71. Thank you
    • Email: [email_address]
    • Twitter: ChemSpiderman
    • Blog: www.chemspider.com/blog
    • Slides: www.slideshare.net/AntonyWilliams
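
Slides 43 to 50 describe the InChI as a layered identifier, the InChIKey as a hash of it, and the difference between a full-molecule and a skeleton search. The slides themselves carry no code, so the following is only a minimal Python sketch of those ideas, under stated assumptions. It does not compute InChIKeys (that requires the official InChI software); it only splits an existing InChI string into layers and compares existing keys. The function names and the small `RECORDS` list are hypothetical, and the alanine keys are quoted from memory as illustrative values, so check them against a trusted source before relying on them. The sketch assumes the standard key layout of a 14-character first block (formula and connectivity, the "skeleton"), a 10-character second block (stereochemistry and other layers, plus flag and version characters), and a final protonation character.

```python
from collections import defaultdict


def split_inchi_layers(inchi):
    """Split an InChI string into (label, value) pairs, one per layer.

    Simplification: a first layer that does not start with a lowercase
    letter is treated as the formula; every other layer is labelled by
    its one-letter prefix (c, h, t, m, s, ...).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    layers = [("version", parts[0])]
    for i, part in enumerate(parts[1:]):
        if i == 0 and not part[:1].islower():
            layers.append(("formula", part))
        else:
            layers.append((part[:1], part[1:]))
    return layers


def skeleton_block(inchikey):
    """Return the 14-character first block of an InChIKey."""
    blocks = inchikey.split("-")
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError("expected an InChIKey shaped like 14-10-1 characters")
    return blocks[0]


def full_molecule_search(records, query_key):
    """Names of records whose whole InChIKey equals the query key."""
    return [name for name, key in records if key == query_key]


def skeleton_search(records, query_key):
    """Names of records sharing the query's first block (same skeleton)."""
    groups = defaultdict(list)
    for name, key in records:
        groups[skeleton_block(key)].append(name)
    return groups[skeleton_block(query_key)]


# Hypothetical mini-database of (name, InChIKey) pairs; illustrative values.
RECORDS = [
    ("L-alanine", "QNAYBMKLOCPYGJ-REOHZKRFSA-N"),
    ("D-alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("alanine (no stereo)", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]

if __name__ == "__main__":
    print(split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"))
    query = "QNAYBMKLOCPYGJ-REOHZKRFSA-N"
    print("full molecule:", full_molecule_search(RECORDS, query))
    print("skeleton:", skeleton_search(RECORDS, query))
```

With these records, the full-molecule search for the L-alanine key returns one name and the skeleton search returns three, which mirrors on a small scale the 4 hits versus 104 hits reported for vancomycin on slides 49 and 50. Matching on the first block is also why a key is a safer join field than a chemical name when linking databases, as slide 42 argues.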