Building A Community Resource For The Life Sciences

This presentation was given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21st, 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and to validate and curate data. We also won the Bio-IT Best Practices Community Service Award that evening.

Transcript

  • 1. Building A Community Platform to Support Chemistry and the Life Sciences
  • 2. Where Would You Look? What Do You Trust?
  • 3. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 5.  
  • 6.  
  • 7. The Final Search Strategy
  • 8. All Those Names, One Structure: A problem to solve…
  • 9. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 10. Trustworthy Chemistry?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 11. Where Would You Look? What Do You Trust?
  • 12. Structural Data for Life Sciences: DailyMed
  • 13. Lack of Stereochemistry
  • 14. Incorrect Structures
  • 15. Ugh…
  • 16. Drugs are REALLY Messy
  • 17. Vancomycin
    • Who will curate?
    • How would you clean such a large dataset?
    • Assertions!!!
  • 18. The EXPERTS must get it right?!
  • 19. Wikipedia, C&E News, PubChem
    • C&E News (from ACS)
  • 20. Chemistry on the Internet TODAY
    • Chemistry searches are generally limited to text-based searches across the internet
    • Data are dirty: sorting the wheat from the chaff. Who can you trust?
    • Too many searches required to resource data
  • 21. Just “Public Compound” Databases
    • PubChem
    • Drugbank
    • ChEBI/ChEMBL
    • KEGG
    • LipidMAPs
    • ChemIDPlus
    • eMolecules
    • ZINC
    • Lots of chemical vendors
    • ChemSpider
  • 22. What do humans want?
    • As few interfaces as possible
    • (Image source: media.obsessable.com)
  • 23. A Pragmatic Vision
      • “Build a Structure Centric Community to Serve Chemists”
      • Integrate chemical structure data on the web
      • Create a “structure-based hub” to information and data
      • Provide access to structure-based “algorithms”
      • Let chemists contribute their own data
      • Allow the community to curate/correct data
  • 24. Answer Questions
    • Questions a chemist might ask…
      • What is the melting point of n-heptanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Ketoconazole?
      • What is the NMR spectrum of Aspirin?
      • What are the safety handling issues for Thymol Blue?
  • 25. ChemSpider Searches
  • 26.  
  • 27. Search “OEA”
  • 28. Search OEA
  • 29. Link Farm Connections
  • 30. Link Farm Connections
  • 31. Search OEA
  • 32. Search OEA
  • 33. Google Books
  • 34. Google Scholar
  • 35. Linked Patents for OEA
  • 36.  
  • 37. Google Patents
  • 38. Microsoft Academic Search
  • 39. RSC Journals
  • 40. RSC Databases
  • 41. Statistics for Today
      • Almost 25 million compounds from >350 data sources
      • About 7000 unique users per day and up to ½ million transactions per day
      • A crowdsourced deposition and curation platform
      • Grows daily – more depositions, more links, more data
  • 42. Searching Chemistry on the Internet
    • How complete a result set will we get if we search for “chemicals” by name?
    • Is there a better way to link chemistry databases? Linking by “names” is dangerous
    • Chemists want structure and SUBstructure searching
  • 43. The InChI Identifier
  • 44. Multiple Layers
  • 45. InChI Strings Hash to InChIKeys
  • 46. Link the Internet with InChIKeys! Taken from: Rafael Sidis’ Blog
  • 47. Vancomycin – Search the Internet
  • 48. Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
  • 49. Full Molecule Search: 4 Hits
  • 50. Full Skeleton Search: 104 Hits
  • 51.  
  • 52.  
  • 53.  
  • 54. Vancomycin
  • 55. Vancomycin on ChemSpider: 1 compound – 3 days
  • 56. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now…
  • 57. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now… is what???
  • 58. The InChI “Resolver”
  • 59. InChI Resolver to DOIs: Structure Search the Web
  • 60. Most Chemistry is NOT Published
    • Only a fraction of chemistry is published
    • Only a tiny fraction of chemistry is patented
    • What of the “Lost Chemistry” that is never published and cannot be abstracted?
      • Reactions performed
      • Structures made and studied
      • Spectra acquired and then disposed of
      • Available chemicals never found
  • 61. Crowd-sourcing Curation and Deposition
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 62. Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  • 63. Semantic Markup: Project Prospect
  • 64. Name-Structure Pairs
  • 65. Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “Everything”
  • 66. Org Prep Daily (Blog)
  • 67. ChemSpider SyntheticPages
  • 68. Chemistry on the Internet FUTURE
    • The semantic web for chemistry is in place
    • Crowdsourced contributions are commonplace
    • Chemists will search by structure/substructure
    • Chemistry articles indexed and searchable
    • Reduced number of searches to find data
    • Data are integrated – compounds, vendors, syntheses, data, publications and patents
    • A world of Open Access and Open Data
  • 69. ChemSpider Web Services
  • 70.  
  • 71. Thank you. Email: [email_address] | Twitter: ChemSpiderman | Blog: www.chemspider.com/blog | Slides: www.slideshare.net/AntonyWilliams
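
Slides 45 and 48 to 50 turn on one idea: an InChI string is hashed into a short fixed-length key, and the first block of that key depends only on the molecular skeleton, so a skeleton search finds more records than a full-molecule search. The sketch below illustrates that idea only. It is not the real InChIKey algorithm: `toy_key`, `split_layers`, and the rule for which layers count as skeleton are simplifying assumptions made for this example, and the three InChI strings are illustrative inputs. A real application should use an InChI toolkit to generate keys.

```python
import hashlib
import string

# Illustrative InChI strings (assumed inputs for this sketch).
L_ALA = "InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"


def _letters(text: str, n: int) -> str:
    """Map the first n bytes of a SHA-256 digest to uppercase letters."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(string.ascii_uppercase[b % 26] for b in digest[:n])


def split_layers(inchi: str) -> tuple[str, str]:
    """Split an InChI into (skeleton layers, remaining layers).

    Simplification: formula, /c and /h layers are treated as the skeleton;
    everything else (stereo and so on) is treated as the remainder.
    """
    _prefix, _, body = inchi.partition("/")
    layers = body.split("/")
    skeleton = [layers[0]] + [l for l in layers[1:] if l[:1] in ("c", "h")]
    rest = [l for l in layers[1:] if l[:1] not in ("c", "h")]
    return "/".join(skeleton), "/".join(rest)


def toy_key(inchi: str) -> str:
    """Return a two-block toy key: 14 skeleton letters, 8 remainder letters."""
    skeleton, rest = split_layers(inchi)
    return f"{_letters(skeleton, 14)}-{_letters(rest, 8)}"


# A tiny hypothetical index of records keyed by name.
RECORDS = {"L-alanine": L_ALA, "D-alanine": D_ALA, "ethanol": ETHANOL}
KEYS = {name: toy_key(inchi) for name, inchi in RECORDS.items()}


def full_search(query_inchi: str) -> list[str]:
    """Names whose whole key matches the query (full-molecule search)."""
    wanted = toy_key(query_inchi)
    return sorted(name for name, key in KEYS.items() if key == wanted)


def skeleton_search(query_inchi: str) -> list[str]:
    """Names whose first key block matches the query (skeleton search)."""
    wanted = toy_key(query_inchi).split("-")[0]
    return sorted(n for n, k in KEYS.items() if k.split("-")[0] == wanted)


if __name__ == "__main__":
    print("full:", full_search(L_ALA))
    print("skeleton:", skeleton_search(L_ALA))
```

Because the key is a plain text token, it can be embedded in web pages and found by ordinary text search engines, which is the mechanism behind "Link the Internet with InChIKeys". The skeleton search returns both alanine records while the full search returns one, mirroring at toy scale the gap between 4 and 104 hits reported for vancomycin.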