Building A Community Resource For The Life Sciences

This presentation was given in Track 4, Open Access and Cheminformatics, at the Bio-IT Meeting in Boston on April 21, 2010. It is a general overview of ChemSpider activities to link together the internet for chemists and to validate and curate data. We also won the Bio-IT Best Practices Community Service Award that evening.

Published in: Technology
Building A Community Resource For The Life Sciences

  1. Building A Community Platform to Support Chemistry and the Life Sciences
  2. Where Would You Look? What Do You Trust?
  3. Chemistry on the Internet TODAY
     • Chemistry searches are generally limited to text-based searches across the internet
     • Data are dirty: sorting the wheat from the chaff. Who can you trust?
     • Too many searches required to resource data
  4. Chemistry on the Internet TODAY
     • Chemistry searches are generally limited to text-based searches across the internet
     • Data are dirty: sorting the wheat from the chaff. Who can you trust?
     • Too many searches required to resource data
  7. The Final Search Strategy
  8. All Those Names, One Structure: A problem to solve…
  9. Chemistry on the Internet TODAY
     • Chemistry searches are generally limited to text-based searches across the internet
     • Data are dirty: sorting the wheat from the chaff. Who can you trust?
     • Too many searches required to resource data
  10. Trustworthy Chemistry?
     • Encyclopedic articles (Wikipedia)
     • Chemical vendor databases
     • Metabolic pathway databases
     • Property databases
     • Patents with chemical structures
     • Drug Discovery data
     • Scientific publications
     • Compound aggregators
     • Blogs/Wikis and Open Notebook Science
  11. Where Would You Look? What Do You Trust?
  12. Structural Data for Life Sciences: DailyMed
  13. Lack of Stereochemistry
  14. Incorrect Structures
  15. Ugh…
  16. Drugs are REALLY Messy
  17. Vancomycin
     • Who will curate?
     • How would you clean such a large dataset?
     • Assertions!!!
  18. The EXPERTS must get it right?!
  19. Wikipedia, C&E News, PubChem
     • C&E News (from ACS)
  20. Chemistry on the Internet TODAY
     • Chemistry searches are generally limited to text-based searches across the internet
     • Data are dirty: sorting the wheat from the chaff. Who can you trust?
     • Too many searches required to resource data
  21. Just “Public Compound” Databases
     • PubChem
     • DrugBank
     • ChEBI/ChEMBL
     • KEGG
     • LIPID MAPS
     • ChemIDplus
     • eMolecules
     • ZINC
     • Lots of chemical vendors
     • ChemSpider
  22. What do humans want? (image: media.obsessable.com)
     • As few interfaces as possible
  23. A Pragmatic Vision
     • “Build a Structure Centric Community to Serve Chemists”
     • Integrate chemical structure data on the web
     • Create a “structure-based hub” to information and data
     • Provide access to structure-based “algorithms”
     • Let chemists contribute their own data
     • Allow the community to curate/correct data
  24. Answer Questions
     • Questions a chemist might ask…
         • What is the melting point of n-heptanol?
         • What is the chemical structure of Xanax?
         • Chemically, what is phenolphthalein?
         • What are the stereocenters of cholesterol?
         • Where can I find publications about xylene?
         • What are the different trade names for Ketoconazole?
         • What is the NMR spectrum of Aspirin?
         • What are the safety handling issues for Thymol Blue?
  25. ChemSpider Searches
  27. Search “OEA”
  28. Search OEA
  29. Link Farm Connections
  30. Link Farm Connections
  31. Search OEA
  32. Search OEA
  33. Google Books
  34. Google Scholar
  35. Linked Patents for OEA
  37. Google Patents
  38. Microsoft Academic Search
  39. RSC Journals
  40. RSC Databases
  41. Statistics for Today
     • Almost 25 million compounds from >350 data sources
     • About 7000 unique users per day and up to ½ million transactions per day
     • A crowdsourced deposition and curation platform
     • Grows daily – more depositions, more links, more data
  42. Searching Chemistry on the Internet
     • How complete a result set will we get if we search for “chemicals” by name?
     • Is there a better way to link chemistry databases? Linking by “names” is dangerous
     • Chemists want structure and SUBstructure searching
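
The risk of linking by name raised on slide 42 can be illustrated with a small sketch. The two data sources, their field names and their record IDs below are hypothetical and are not taken from ChemSpider; only the InChIKey for aspirin is a real identifier. The point is simply that two records for the same compound can fail to join on name yet join cleanly on a structure-derived key.

```python
# Two hypothetical data sources that describe the same compound under different names.
# Record IDs and field names are invented for illustration only.
VENDOR_CATALOG = [
    {"name": "Acetylsalicylic acid",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
     "catalog_id": "V-001"},
]
PROPERTY_DB = [
    {"name": "Aspirin",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
     "record_id": "P-042"},
]


def link(left, right, field):
    """Pair up rows from two sources whose `field` values match (case-insensitive)."""
    index = {}
    for row in right:
        index.setdefault(row[field].lower(), []).append(row)
    return [(a, b) for a in left for b in index.get(a[field].lower(), [])]


# Joining on the name misses the match; joining on the InChIKey finds it.
name_links = link(VENDOR_CATALOG, PROPERTY_DB, "name")
key_links = link(VENDOR_CATALOG, PROPERTY_DB, "inchikey")
print(len(name_links), len(key_links))
```

A real name-based linker would normalise synonyms first, but any synonym table is itself a curated, error-prone resource, which is the concern the slide raises.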
  43. The InChI Identifier
  44. Multiple Layers
  45. InChIStrings Hash to InChIKeys
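
As a rough illustration of what the hashed identifier on slide 45 looks like, the sketch below splits a standard InChIKey into its parts. It does not compute the hash (that is done by the InChI software from the InChI string); it only checks the 27-character layout and names the blocks. The helper name `split_inchikey` and the `InChIKeyParts` type are invented for this example; the key shown is the one for aspirin.

```python
import re
from typing import NamedTuple

# Layout of an InChIKey: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton: str      # hash of the connectivity (molecular skeleton) information
    detail: str        # hash of the remaining layers (stereochemistry, isotopes, ...)
    is_standard: bool  # True when the flag character is "S" (standard InChI)
    version: str       # InChI version character ("A" for version 1)
    protonation: str   # protonation indicator ("N" means neutral)


def split_inchikey(key: str) -> InChIKeyParts:
    """Validate the layout of an InChIKey and return its blocks."""
    match = _INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, detail, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, detail, flag == "S", version, protonation)


if __name__ == "__main__":
    print(split_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Because the key is a fixed-length string of capital letters and hyphens, it can be indexed by ordinary text search engines, which is what makes it attractive for linking web pages.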
  46. Link the Internet with InChIKeys! (Taken from: Rafael Sidi’s Blog)
  47. Vancomycin – Search the Internet
  48. Vancomycin: Search Molecular SKELETON / Search Full Molecule
  49. Full Molecule Search: 4 Hits
  50. Full Skeleton Search: 104 Hits
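
The difference between the full-molecule and skeleton searches on slides 48 to 50 can be sketched as two lookups over the same index: one on the whole InChIKey, one on its first 14-character block only. The page list and URLs below are placeholders, not real search results; the keys are those of L-alanine, D-alanine, alanine without stereochemistry, and aspirin, chosen because the three alanine forms share a skeleton block.

```python
from collections import defaultdict

# Hypothetical index of web pages tagged with InChIKeys (URLs are placeholders).
PAGES = [
    ("https://example.org/l-alanine", "QNAYBMKLOCPYGJ-REOHZTLVSA-N"),
    ("https://example.org/d-alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("https://example.org/alanine-no-stereo", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("https://example.org/aspirin", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]


def build_indexes(pages):
    """Index pages by full InChIKey and by the first (skeleton) block."""
    by_full = defaultdict(list)
    by_skeleton = defaultdict(list)
    for url, key in pages:
        by_full[key].append(url)
        by_skeleton[key.split("-")[0]].append(url)
    return by_full, by_skeleton


def search(key, pages=PAGES):
    """Return (full-molecule hits, skeleton hits) for an InChIKey."""
    by_full, by_skeleton = build_indexes(pages)
    full_hits = by_full.get(key, [])
    skeleton_hits = by_skeleton.get(key.split("-")[0], [])
    return full_hits, skeleton_hits


if __name__ == "__main__":
    full_hits, skeleton_hits = search("QNAYBMKLOCPYGJ-REOHZTLVSA-N")
    print(len(full_hits), "full-molecule hit(s);", len(skeleton_hits), "skeleton hit(s)")
```

The skeleton search is the more forgiving of the two: it also returns records whose stereochemistry is missing or different, which is why it yields more hits for a stereochemically complex molecule such as vancomycin.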
  54. Vancomycin
  55. Vancomycin on ChemSpider: 1 compound – 3 days
  56. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now…
  57. InChIKeys: Make the internet searchable by adding InChIKeys. Publishers add InChIKeys to papers now… is what???
  58. The InChI “Resolver”
  59. InChI Resolver to DOIs: Structure Search the Web
  60. Most Chemistry is NOT Published
     • Only a fraction of chemistry is published
     • Only a tiny fraction of chemistry is patented
     • What of the “Lost Chemistry”: never published and cannot be abstracted
         • Reactions performed
         • Structures made and studied
         • Spectra acquired and then disposed of
         • Available chemicals never found
  61. Crowd-sourcing Curation and Deposition
     • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  62. Multi-level Curation and Approval: Building a Structure Centric Community for Chemists
  63. Semantic Markup: Project Prospect
  64. Name-Structure Pairs
  65. Semantic Linking of Structures
     • What would you want to link off a structure?
         • Chemical suppliers
         • Other publications
         • Analytical Data
         • Related Reactions
         • Wikipedia
         • Patents
         • “Everything”
  66. Org Prep Daily (Blog)
  67. ChemSpider SyntheticPages
  68. Chemistry on the Internet FUTURE
     • The semantic web for chemistry is in place
     • Crowdsourced contributions are commonplace
     • Chemists will search by structure/substructure
     • Chemistry articles indexed and searchable
     • Reduced number of searches to find data
     • Data are integrated – compounds, vendors, syntheses, data, publications and patents
     • A world of Open Access and Open Data
  69. ChemSpider Web Services
  71. Thank you
     • [email_address]
     • Twitter: ChemSpiderman
     • www.chemspider.com/blog
     • SLIDES: www.slideshare.net/AntonyWilliams