How Internet Resources Are Providing a Collaborative Community for Chemistry



Online chemistry resources have expanded dramatically in the past few years, with PubChem, ChEBI, Wikipedia, ChemSpider and many others offering rich content to scientists seeking data and information. ChemSpider has become one of the primary chemistry portals, delivering a heterogeneous mix of Open and Closed data. It offers a structure-centric community for collaboration, enabling crowd-sourced deposition and validation of online chemistry data. ChemSpider has also been integrated into ChemMantis, the CHEMistry Markup And Nomenclature Transformation Integrated System. This platform extracts science-related terms from documents using both heuristics and highly curated dictionaries. The resulting documents are marked up so that chemical structures can be viewed and linked out to over 200 different data sources via the ChemSpider database.

Published in: Technology, Education

  1. 1. How Internet Resources Are Providing a Collaborative Community for Chemistry (60 slides in 20 minutes)
  2. 2. Imagine a time when…
     • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
     • Chemistry articles are indexed and searchable by a free online service
     • The web is linked together through the “language of chemistry”
  3. 3. It’s Coming…Linked Data Cloud
  4. 4. Thanks to the Organizers…
  5. 5. Antony Williams vs Identifiers
     • Passport ID
     • Dad, Tony, others
     • SSN
     • Green Card
     • License
     • 5 email addresses
     • ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)
     • OpenID
     • …
  6. 6. Aspirin vs Chemical Identifiers
  7. 7. Aspirin names and synonyms
     • Text searches depend on correct association
     • 335 suggested identifiers for Aspirin just on PubChem!
     • Disambiguation dictionaries are necessary
  8. 11. The Final Search Strategy
  9. 12. All Those Names, One Structure
  10. 13. Searching Chemistry on the Internet
     • How complete a result set will we get if we search for “chemicals” by name?
     • Is there a better way to link chemistry databases? Linking by “names” is dangerous
     • Chemists want structure and SUBstructure searching
  11. 14. The InChI Identifier
  12. 15. Multiple Layers
  13. 16. InChIStrings Hash to InChIKeys
  14. 17. Oleoylethanolamine
  15. 18. Search Engine Dependencies
  16. 19. Search Engine Dependencies
  17. 20. InChIs have traction…
  18. 21. RDF Linking of Structures
  19. 22. PubChem
  20. 23. The Simplest Organic Molecule
  21. 24. Vancomycin
  22. 26. Vancomycin
     • Who will curate?
     • How would you clean such a large dataset?
  23. 27. Vancomycin on ChemSpider
  24. 28. Vancomycin
  25. 29. Vancomycin: Search Molecular SKELETON, Search Full Molecule
  26. 30. Full Skeleton Search: 104 Hits
  27. 31. Full Molecule Search: 4 Hits
  28. 32. The InChI “Resolver”
  29. 33. Content is King and Quality Costs
     • Curated chemistry “content” is expensive to create
       • Patent searching
       • Structures and properties
       • Drug databases
       • Literature databases
     • Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information
       • 102 years of content
       • >50 million substances
       • Proprietary platform
  30. 34. The EXPERTS must get it right?!
  31. 35. Wikipedia, C&E News, PubChem
     • C&E News (from ACS)
  32. 36. Feedback from Steve Ritter
     • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
     • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
  33. 37. Maybe it will be ChemSpider?
     • What is ChemSpider?
       • A database of almost 23 million compounds, >200 data sources
       • A deposition and curation platform
       • A publishing platform for the community
       • Grows daily – more depositions, more links, more data sources
  34. 38. Search OEA
  35. 39. Search OEA
  36. 40. Search OEA
  37. 41. Search OEA
  38. 42. Linked Patents for OEA
  39. 44. Linked resources
     • Vendor sites – Aldrich, Alfa Aesar, TCI and 100s of others
     • Government databases – PubChem, DSSTox, FDA databases, ChemIDPlus, …
     • Biological databases – Protein Database, Stitch, KEGG, ChEBI, …
     • Analytical databases – NMRShiftDB, …
  40. 45. Linked across the internet
  41. 46. Kyoto Encyclopedia of Genes and Genomes
  42. 47. Complex Data and Information
  43. 48. Remember – QUALITY ISSUES
  44. 49. The FDA’s DailyMed
  45. 50. Incorrect Structures
  46. 51. Crowd-sourcing Chemistry Curation
  47. 52. The Currency of Recognition
     • We need to build a platform for recognition…
  48. 53. Chemistry – A Deposition Platform
     • CAS indexes published literature, patents and chemical vendors
     • CAS indexes ChemSpider – >303,000 records
     • “Lost Chemistry” – syntheses in theses, lab notebooks? Compounds in private collections?
     • ChemSpider accepts public depositions, linking to websites, hosting of details etc. Accepts structures, text, spectra, images.
  49. 54. Blogs should be searchable too…
  50. 55. Use Intelligent Structures : ChemSpider Embed Web Service
  51. 56. ChemSpider Web Services
  52. 57. Semantic Linking of Structures
     • What would you want to link off a structure?
       • Chemical suppliers
       • Other publications
       • Analytical Data
       • Related Reactions
       • Wikipedia
       • Patents
       • “Everything”
       • See Richard Kidd’s Talk
  53. 58. Conclusions
     • Internet resources provide a collaborative community for chemistry
     • Crowdsourcing to expand, curate and integrate to the benefit of chemists
     • Searching the web for chemistry is arriving
     • InChIs are enabling chemistry on the internet
     • Question Quality!
  54. 60. Acknowledgments
     • Valery Tkachenko and Sergey Golotvin
     • RSC infrastructure team
     • The ChemSpider advisory group
     • The Wikipedia Chemistry team
  55. 61. [email_address] Twitter: ChemSpiderman
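
Slides 7 and 12 to 16 argue that many names should resolve to one structure, and that the layered InChI string and its hashed InChIKey make that link possible. The Python sketch below is an illustration only, not ChemSpider's implementation. It uses the standard InChI and InChIKey for aspirin, a small hand-made synonym dictionary (hypothetical, for illustration), and a simplified layer splitter that assumes the second segment is a molecular formula and that no layer prefix repeats. The names `SYNONYMS`, `split_inchi_layers` and `resolve_name` are assumptions introduced here.

```python
import re

# Standard InChI and InChIKey for aspirin (acetylsalicylic acid).
ASPIRIN_INCHI = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

# Hypothetical, hand-made disambiguation dictionary: many names, one key.
SYNONYMS = {
    "aspirin": ASPIRIN_KEY,
    "acetylsalicylic acid": ASPIRIN_KEY,
    "2-acetoxybenzoic acid": ASPIRIN_KEY,
}

# An InChIKey is 27 characters: 14 letters, a hyphen, 10 letters, a hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_inchi_layers(inchi):
    """Split an InChI string into its version, formula, and prefixed layers.

    Simplification: assumes the second segment is a formula and that each
    layer prefix (such as 'c' or 'h') appears at most once.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        # Each later layer starts with a one-letter prefix.
        layers[part[0]] = part[1:]
    return layers


def resolve_name(name):
    """Look up a name case-insensitively; return its InChIKey or None."""
    return SYNONYMS.get(name.strip().lower())


if __name__ == "__main__":
    print(split_inchi_layers(ASPIRIN_INCHI))
    print(resolve_name("Acetylsalicylic acid") == resolve_name("aspirin"))
```

Because the key has a fixed length and contains no special characters, it works as an ordinary search-engine query term, whereas the full InChI string does not. A name-based dictionary like the one above still has to be curated by hand, which is the quality problem the later slides raise.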