How Internet Resources Are Providing a Collaborative Community for Chemistry

Online chemistry resources have expanded dramatically in the past few years, with PubChem, ChEBI, Wikipedia, ChemSpider and many others offering rich data and information to scientists. ChemSpider has become one of the primary chemistry portals, delivering a heterogeneous mix of Open and Closed data. It offers a structure-centric community for collaboration, enabling the crowd-sourced deposition and validation of online chemistry data. ChemSpider has also been integrated into ChemMantis, the CHEMistry Markup And Nomenclature Transformation Integrated System. This platform extracts science-related terms using both heuristics and highly curated dictionaries. The resulting documents are marked up so that chemical structures can be viewed and linked out to over 200 different data sources via the ChemSpider database.

Published in: Technology, Education

Transcript of "How Internet Resources Are Providing a Collaborative Community for Chemistry"

  1. How Internet Resources Are Providing a Collaborative Community for Chemistry: 60 slides in 20 minutes
  2. Imagine a time when…
       • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
       • Chemistry articles are indexed and searchable by a free online service
       • The web is linked together through the “language of chemistry”
  3. It’s Coming… Linked Data Cloud
  4. Thanks to the Organizers…
  5. Antony Williams vs Identifiers: Passport ID; Dad, Tony, others; SSN; Green Card; License; 5 email addresses; ChemSpiderman (blog, Twitter account, Facebook, Friendfeed); OpenID; …
  6. Aspirin vs Chemical Identifiers
  7. Aspirin names and synonyms
       • Text searches depend on correct association
       • 335 suggested identifiers for Aspirin just on PubChem!
       • Disambiguation dictionaries are necessary
  11. The Final Search Strategy
  12. All Those Names, One Structure
  13. Searching Chemistry on the Internet
       • How complete a result set will we get if we search for “chemicals” by name?
       • Is there a better way to link chemistry databases? Linking by “names” is dangerous
       • Chemists want structure and SUBstructure searching
  14. The InChI Identifier
  15. Multiple Layers
  16. InChI Strings Hash to InChIKeys
  17. Oleoylethanolamine
  18. Search Engine Dependencies
  19. Search Engine Dependencies
  20. InChIs have traction…
  21. RDF Linking of Structures
  22. PubChem
  23. The Simplest Organic Molecule
  24. Vancomycin
  26. Vancomycin
       • Who will curate?
       • How would you clean such a large dataset?
  27. Vancomycin on ChemSpider
  28. Vancomycin
  29. Vancomycin: Search Molecular SKELETON; Search Full Molecule
  30. Full Skeleton Search: 104 Hits
  31. Full Molecule Search: 4 Hits
  32. The InChI “Resolver”
  33. Content is King and Quality Costs
       • Curated Chemistry “content” is expensive to create
           • Patent searching
           • Structures and properties
           • Drug databases
           • Literature databases
       • Chemical Abstracts Service (CAS), the “Gold Standard” in Chemistry related information
           • 102 years of content
           • >50 million substances
           • Proprietary platform
  34. The EXPERTS must get it right?!
  35. Wikipedia, C&E News, PubChem
       • C&E News (from ACS)
  36. Feedback from Steve Ritter
       • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
       • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
  37. Maybe it will be ChemSpider?
       • What is ChemSpider?
           • A database of almost 23 million compounds, >200 data sources
           • A deposition and curation platform
           • A publishing platform for the community
           • Grows daily: more depositions, more links, more data sources
  38. Search OEA
  39. Search OEA
  40. Search OEA
  41. Search OEA
  42. Linked Patents for OEA
  44. Linked resources
       • Vendor sites: Aldrich, Alfa Aesar, TCI and 100s of others
       • Government databases: PubChem, DSSTox, FDA databases, ChemIDPlus, …
       • Biological databases: Protein Database, Stitch, KEGG, ChEBI, …
       • Analytical databases: NMRShiftDB, …
  45. Linked across the internet
  46. Kyoto Encyclopedia of Genes and Genomes
  47. Complex Data and Information
  48. Remember – QUALITY ISSUES
  49. The FDA’s DailyMed
  50. Incorrect Structures
  51. Crowd-sourcing Chemistry Curation
  52. The Currency of Recognition
       • We need to build a platform for recognition…
  53. Chemistry – A Deposition Platform
       • CAS indexes published literature, patents and chemical vendors
       • CAS indexes ChemSpider: >303,000 records
       • “Lost Chemistry”: syntheses in theses, lab notebooks? Compounds in private collections?
       • ChemSpider accepts public depositions, linking to websites, hosting of details etc. Accepts structures, text, spectra, images.
  54. Blogs should be searchable too…
  55. Use Intelligent Structures: ChemSpider Embed Web Service
  56. ChemSpider Web Services
  57. Semantic Linking of Structures
       • What would you want to link off a structure?
           • Chemical suppliers
           • Other publications
           • Analytical Data
           • Related Reactions
           • Wikipedia
           • Patents
           • “Everything”
           • See Richard Kidd’s Talk
  58. Conclusions
       • Internet resources provide a collaborative community for chemistry
       • Crowdsourcing to expand, curate and integrate to the benefit of chemists
       • Searching the web for chemistry is arriving
       • InChIs are enabling chemistry on the internet
       • Question Quality!
  60. Acknowledgments
       • Valery Tkachenko and Sergey Golotvin
       • RSC infrastructure team
       • The ChemSpider advisory group
       • The Wikipedia Chemistry team
  61. [email_address]; Twitter: ChemSpiderman; www.chemspider.com/blog
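
The warning on slide 13 that linking by “names” is dangerous, and the role the slides give to InChIKeys, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the two small "sources", the three-entry synonym dictionary, and the function names are hypothetical and are not ChemSpider's or PubChem's actual data or API. The only real-world value used is the standard InChIKey for aspirin.

```python
# Standard InChIKey for aspirin (acetylsalicylic acid).
ASPIRIN_INCHIKEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

# Hypothetical, tiny disambiguation dictionary: many names, one structure.
# A real dictionary would be curated and far larger.
SYNONYM_TO_INCHIKEY = {
    "aspirin": ASPIRIN_INCHIKEY,
    "acetylsalicylic acid": ASPIRIN_INCHIKEY,
    "2-acetoxybenzoic acid": ASPIRIN_INCHIKEY,
}

# Two hypothetical sources that label the same compound differently.
vendor_catalog = {"Acetylsalicylic acid": "catalog entry"}
drug_database = {"Aspirin": "drug record"}


def normalize(name):
    """Lower-case and trim a chemical name before comparison."""
    return name.strip().lower()


def link_by_name(source_a, source_b):
    """Link records only when their normalized names match exactly."""
    names_a = {normalize(n) for n in source_a}
    names_b = {normalize(n) for n in source_b}
    return sorted(names_a & names_b)


def link_by_structure(source_a, source_b, synonyms):
    """Link records through a shared structure identifier (InChIKey)."""
    keys_a = {synonyms[normalize(n)] for n in source_a if normalize(n) in synonyms}
    keys_b = {synonyms[normalize(n)] for n in source_b if normalize(n) in synonyms}
    return sorted(keys_a & keys_b)


if __name__ == "__main__":
    print("Linked by name:", link_by_name(vendor_catalog, drug_database))
    print("Linked by structure:",
          link_by_structure(vendor_catalog, drug_database, SYNONYM_TO_INCHIKEY))
```

In this sketch the name-based join finds nothing because the two sources use different synonyms, while the structure-based join links them through the shared InChIKey. The quality of the result still depends entirely on the synonym dictionary being correct, which is the curation problem the slides return to.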