ChemSpider – A Community Platform for Chemistry and Resources Supporting the Life Sciences

ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are many tens of chemical structure databases covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and more, and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness are lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of over 20 million chemical substances integrated with over 300 disparate data sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry and to provide access to a set of online tools and services to support access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at RSC.

Published in: Technology, Education

  1. 1. ChemSpider – A Community Platform for Chemistry and Resources Supporting the Life Sciences
  2. 2. Chemistry on the Internet TODAY
        • Chemistry searches are generally limited to text-based searches across the internet
        • Data are dirty: sorting the wheat from the chaff. Who can you trust?
        • Too many searches required to resource data
  3. 3. Chemistry on the Internet TODAY
        • Chemistry searches are generally limited to text-based searches across the internet
        • Data are dirty: sorting the wheat from the chaff. Who can you trust?
        • Too many searches required to resource data
  4. 6. The Final Search Strategy
  5. 7. All Those Names, One Structure A problem to solve…
  6. 8. Chemistry on the Internet TODAY
        • Chemistry searches are generally limited to text-based searches across the internet
        • Data are dirty: sorting the wheat from the chaff. Who can you trust?
        • Too many searches required to resource data
  7. 9. Trustworthy Chemistry?
        • Encyclopedic articles (Wikipedia)
        • Chemical vendor databases
        • Metabolic pathway databases
        • Property databases
        • Patents with chemical structures
        • Drug Discovery data
        • Scientific publications
        • Compound aggregators
        • Blogs/Wikis and Open Notebook Science
  8. 10. Where Would You Look? What Do You Trust?
  9. 11. Question Everything online: www.dhmo.org
  10. 12. Di-Hydrogen Monoxide
        • 2H
  11. 13. Di-Hydrogen Monoxide
        • 2H + 1O
  12. 14. Di-Hydrogen Monoxide
        • H2O
  13. 15. Di-Hydrogen Monoxide
        • H2O
        • Water
  14. 16. It’s all on Wikipedia…
  15. 17. Chemistry on The Internet Is Messy
  16. 18. It’s Methane…
  17. 19. What’s Methane?
  18. 20. What’s Methane?
  19. 21. What ELSE is Methane???
  20. 22. Drugs are REALLY Messy
  21. 23. Vancomycin
        • Who will curate?
        • How would you clean such a large dataset?
        • Assertions!!!
  22. 24. The EXPERTS must get it right?!
  23. 25. Wikipedia, C&E News, PubChem
        • C&E News (from ACS)
  24. 26. Feedback from C&E Senior Editor
        • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
        • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
  25. 27. Structural Data for LifeSciences DailyMed
  26. 28. Lack of Stereochemistry
  27. 29. Incorrect Structures
  28. 30. Ugh…
  29. 31. Chemistry on the Internet TODAY
        • Chemistry searches are generally limited to text-based searches across the internet
        • Data are dirty: sorting the wheat from the chaff. Who can you trust?
        • Too many searches required to resource data
  30. 32. Just “Public Compound” Databases
        • PubChem
        • DrugBank
        • ChEBI/ChEMBL
        • KEGG
        • LIPID MAPS
        • ChemIDplus
        • eMolecules
        • ZINC
        • Lots of chemical vendors
        • ChemSpider
  31. 33. What do humans want? (image source: media.obsessable.com)
        • As few interfaces as possible
  32. 34. A Pragmatic Vision
        “Build a Structure Centric Community to Serve Chemists”
        • December 2006 – A hobby project initiated to connect chemistry on the web
            • Integrate chemical structure data on the web
            • Create a “structure-based hub” to information and data
            • Provide access to structure-based “algorithms”
            • Let chemists contribute their own data
            • Allow the community to curate/correct data
  33. 35. Answer Questions
        • Questions a chemist might ask…
            • What is the melting point of n-heptanol?
            • What is the chemical structure of Xanax?
            • Chemically, what is phenolphthalein?
            • What are the stereocenters of cholesterol?
            • Where can I find publications about xylene?
            • What are the different trade names for Ketoconazole?
            • What is the NMR spectrum of Aspirin?
            • What are the safety handling issues for Thymol Blue?
  34. 36. ChemSpider Searches
  35. 37. Search Cholesterol
  36. 38. Search Cholesterol
  37. 39. Search Cholesterol
  38. 40. Search Cholesterol
  39. 41. Search Cholesterol
  40. 42. A Link Farm to Content
  41. 43. Linked across the internet
  42. 44. Kyoto Encyclopedia of Genes and Genomes
  43. 45. Linking SMPDB
  44. 46. Links to Patents based on structure
  45. 47. Articles Linked
  46. 49. Search “OEA”
  47. 50. Search OEA
  48. 51. Search OEA
  49. 52. Search OEA
  50. 53. Linked Patents for OEA
  51. 55. Statistics for Today
        • >23 million compounds from >300 data sources
        • About 7000 unique users per day and up to ½ million transactions per day
        • A crowdsourced deposition and curation platform
        • Grows daily – more depositions, more links, more data
  52. 56. Searching Chemistry on the Internet
        • How complete a result set will we get if we search for “chemicals” by name?
        • Is there a better way to link chemistry databases? Linking by “names” is dangerous
        • Chemists want structure and SUBstructure searching
  53. 57. The InChI Identifier
  54. 58. Multiple Layers
  55. 59. InChIStrings Hash to InChIKeys
  56. 60. Link the Internet with InChIKeys! Taken from: Rafael Sidis’ Blog
  57. 61. Vancomycin – Search the Internet
  58. 62. Vancomycin Search Molecular SKELETON Search Full Molecule
  59. 63. Full Molecule Search: 4 Hits
  60. 64. Full Skeleton Search: 104 Hits
  61. 68. Vancomycin
  62. 69. Vancomycin on ChemSpider 1 compound – 3 days
  63. 70. InChIKeys Make the internet searchable by adding InChIKeys Publishers add InChIKeys to papers now…
  64. 71. InChIKeys Make the internet searchable by adding InChIKeys Publishers add InChIKeys to papers now… is what???
  65. 72. The InChI “Resolver”
  66. 73. InChI Resolver to DOIs Structure Search the Web
  67. 74. Most Chemistry is NOT Published
        • Only a fraction of chemistry is published
        • Only a tiny fraction of chemistry is patented
        • What of the “Lost Chemistry” that is never published and cannot be abstracted?
            • Reactions performed
            • Structures made and studied
            • Spectra acquired and then disposed of
            • Available chemicals never found
  68. 75. The CAS Registry
  69. 76. CAS Registry
  70. 77. Crowd-sourcing Curation and Deposition
        • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  71. 78. Multi-level Curation and Approval Building a Structure Centric Community for Chemists
  72. 79. Entity-Extraction, Mark-up, Annotate
  73. 80. Semantic Markup: Project Prospect
  74. 81. Success Depends on Dictionaries Link to a Structure or the Right Structure?
  75. 82. Name-Structure Pairs
  76. 83. Semantic Linking of Structures
        • What would you want to link off a structure?
            • Chemical suppliers
            • Other publications
            • Analytical Data
            • Related Reactions
            • Wikipedia
            • Patents
            • “Everything”
  77. 84. Org Prep Daily (Blog)
  78. 85. ChemSpider SyntheticPages
  79. 86. Chemistry on the Internet FUTURE
        • The semantic web for chemistry is in place
        • Crowdsourced contributions are commonplace
        • Chemists will search by structure/substructure
        • Chemistry articles indexed and searchable
        • Reduced number of searches to find data
        • Data are integrated – compounds, vendors, syntheses, data, publications and patents
        • A world of Open Access and Open Data
        • Classical business models will have to morph
  80. 87. ChemSpider Web Services
  81. 89. Thank you [email_address] Twitter: ChemSpiderman www.chemspider.com/blog SLIDES: www.slideshare.net/AntonyWilliams
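
Slides 56 to 64 argue that linking by name is fragile and that InChIKeys let the same structure be found across pages, either as the full molecule or by its molecular skeleton. The Python sketch below illustrates that idea under stated assumptions: the page index, the page identifiers and the function names are hypothetical and are not part of ChemSpider or of any search engine, and the InChIKeys are quoted as recalled from public records, so they should be verified before reuse. The only property it relies on is the standard InChIKey layout, in which the first 14-letter block is a hash of the connectivity (the skeleton) and the second block covers the remaining layers such as stereochemistry.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters
# (8 hash letters + flag + version), hyphen, 1 protonation letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def skeleton_block(inchikey):
    """Return the first (connectivity) block of a well-formed InChIKey."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return inchikey.split("-")[0]


# Hypothetical index of web pages: (page id, name used on the page, InChIKey).
# Keys are illustrative values quoted from memory; verify before relying on them.
PAGES = [
    ("page-1", "Water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
    ("page-2", "Dihydrogen monoxide", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
    ("page-3", "L-Alanine", "QNAYBMKLOCPYGJ-REOHZSJUSA-N"),
    ("page-4", "D-Alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("page-5", "Alanine (no stereo drawn)", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("page-6", "Methane", "VNWKTOKETHGBQD-UHFFFAOYSA-N"),
]


def search_by_name(term):
    """Text search: finds only pages whose name contains the term."""
    term = term.lower()
    return [page for page, name, _ in PAGES if term in name.lower()]


def search_full_molecule(inchikey):
    """Full-molecule search: the whole InChIKey must match."""
    skeleton_block(inchikey)  # validates the query key
    return [page for page, _, key in PAGES if key == inchikey]


def search_skeleton(inchikey):
    """Skeleton search: only the first 14-letter block must match."""
    block = skeleton_block(inchikey)
    return [page for page, _, key in PAGES if skeleton_block(key) == block]


if __name__ == "__main__":
    water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
    l_alanine = "QNAYBMKLOCPYGJ-REOHZSJUSA-N"
    print("name 'water':      ", search_by_name("water"))
    print("full key, water:   ", search_full_molecule(water))
    print("full key, L-Ala:   ", search_full_molecule(l_alanine))
    print("skeleton, L-Ala:   ", search_skeleton(l_alanine))
```

In this toy index the name search for "water" misses the page that calls the compound dihydrogen monoxide, while the full-key search finds both. The skeleton search widens the net in the same way the vancomycin slides show a rise from 4 hits to 104: it also returns records drawn with different or missing stereochemistry, which is useful for discovery but means the hits still need checking.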