Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

Finding chemical information online can be daunting, since even the most rudimentary Google query can return tens to hundreds of thousands of links to peruse. The number of online chemical structure databases has grown, but until now there has been no central online resource allowing integrated chemical structure searching across chemistry databases, chemistry articles, patents and web pages such as blogs and wikis. ChemSpider provides a significant knowledge base and resource for chemists working in different domains. From the perspective of the InChI identifiers, this project can be considered a success story, since ChemSpider has used them both to develop the database and to provide fast searching routines. ChemSpider has provided web services for both InChI generation and searching, leading to a proliferation of InChI in the web-based domain of chemistry. This talk will provide an update on ChemSpider's functionality.

Published in: Technology, Education

  1. Going a mile InChI by InChI: Enabling online chemistry at ChemSpider (Antony Williams)
  2. Languages of Chemistry
  3. ChemSpider 2009
     - “Building a Structure Centric Community for Chemists”
     - Hosting structures, spectra, images, documents, outlinks
     - Many web services for retrieval of data, conversion of files, generation of properties
     - Now a platform for:
       - data deposition, curation and annotation – remove the junk
       - Supporting Open Notebook Science efforts
       - chemistry document mark-up with ChemMantis
       - the online ChemSpider Journal of Chemistry
  4. Statistics and Connections
     - >6000 unique users per day on average
     - >40,000 transactions per day
     - >21.4 million compounds and growing daily
     - Advocate of InChIs for searching and integration
  5. Search Cholesterol
  6. Search Cholesterol
  7. Search Cholesterol
  8. Search Cholesterol
  9. Search Cholesterol
  10. Search Cholesterol
  11. Searching
      - Structure searching based on
        - SMILES
        - InChIString
        - InChIKey
        - StdInChI
        - StdInChIKey
        - molfile uploads
        - structures drawn in applet
      - Search across Google (to string limit for InChIString)
        - Skeleton search
        - Full structure search
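
The text-based query types on the "Searching" slide can often be told apart from the string alone. The following is a minimal sketch, not ChemSpider's own code: `classify_query` is a hypothetical helper, and it assumes the published identifier layouts (standard InChIs start with `InChI=1S/`, and a 27-character InChIKey carries an `S` as the ninth letter of its second block when it was derived from a standard InChI). Anything unrecognised is simply reported as a possible SMILES string.

```python
import re

# 27-character InChIKey layout: 14 letters, 10 letters, 1 letter.
KEY_27 = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# 25-character legacy layout, as seen in the Taxol keys later in this deck.
KEY_25 = re.compile(r"^[A-Z]{14}-[A-Z]{10}$")


def classify_query(text):
    """Guess the identifier type of a search string (illustrative only)."""
    text = text.strip()
    if text.startswith("InChI=1S/"):
        return "StdInChI"
    if text.startswith("InChI="):
        return "InChIString"
    if KEY_27.match(text):
        # Index 23 is the ninth letter of the second block (the standard flag).
        return "StdInChIKey" if text[23] == "S" else "InChIKey"
    if KEY_25.match(text):
        return "InChIKey"
    return "SMILES or other"


if __name__ == "__main__":
    for query in ("HVYWMOMLDIMFJA-DPAQBDIFSA-N",
                  "RCINICONZNJXQF-CLDWUXIMDD",
                  "InChI=1/C66H103N17O16S",
                  "CCO"):
        print(query, "->", classify_query(query))
```

A real search front end would still need to validate the identifier with a cheminformatics toolkit; this sketch only routes the query by its surface form.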
  12. InChIKey Searches Work
  13. Depositions
      - Depositions from users – single structures and SDFs
      - Depositions from databases/vendors – SDF files
      - And then came InChIs…
        - InChIs and InChIKeys are available on Blogs for harvesting
        - Publishers are making their structures available as InChIs for harvesting
        - InChIs are NOT ideal for building a database…some lessons
        - We want to link to publications especially…
  14. Chemistry Papers
      - Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.
  15. Structures in Chemistry Papers
  16. Aesthetics vs Machine Readable
      - Beautiful chemical structures submitted by authors can be beasts for machines
  17. InChI Representation
  18. InChI’fication of Articles
      - InChIs from publishers – a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry
      - An enormous editorial task with a massive benefit to the community
      - If the structures were correct…imagine a centralized DOI:InChI database
  19. Cleaning Structures
  20. Converting InChIs to Structures: Bacitracin A
      - InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1
      - InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
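
The two Bacitracin A strings differ only in how the stereo (`/t`) layer marks centres that have no assigned parity (`u` in one, `?` in the other). As a rough sketch of how such centres could be listed, the hypothetical helpers below parse just the `/t` layer body copied from the slide; the elided part of the InChI is not needed and is not reconstructed here.

```python
def parse_stereo_layer(layer):
    """Map atom number -> parity mark from a /t layer body such as '35u,40-'."""
    parities = {}
    for item in layer.split(","):
        item = item.strip()
        if not item:
            continue
        # The last character is the mark (+, -, u or ?); the rest is the atom number.
        atom, mark = item[:-1], item[-1]
        parities[int(atom)] = mark
    return parities


def centres_without_parity(layer):
    """Return the atom numbers marked 'u' or '?' in the layer."""
    marks = parse_stereo_layer(layer)
    return sorted(atom for atom, mark in marks.items() if mark in ("u", "?"))


# /t layer body from the first Bacitracin A string on the slide.
BACITRACIN_T = "35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-"

if __name__ == "__main__":
    print(len(parse_stereo_layer(BACITRACIN_T)), "stereocentres listed")
    print("without parity:", centres_without_parity(BACITRACIN_T))
```

For the slide's data this reports 15 listed centres, four of which (35, 36, 37 and 48) carry no parity, which is exactly the information a converter has to guess or drop when redrawing the structure.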
  22. Converting InChIs to Structures
      - What we want is a good layout, retention of stereochemistry labels and tautomers as drawn
  23. Auxinfo – Who Uses It? Who Converts It?
      - AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.5807,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
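
The AuxInfo string above appears to carry the original drawing coordinates in its `rC:` field as `x,y,z;` triples, which is what a converter would need to restore the author's layout. The sketch below rests on that assumption about the field layout; `parse_rc_coordinates` is a hypothetical helper, and the sample input is a shortened excerpt built from the first two triples on the slide.

```python
def parse_rc_coordinates(auxinfo):
    """Extract (x, y, z) tuples from the rC: field of an AuxInfo string.

    Assumes fields are separated by '/' and that rC: holds 'x,y,z;' triples.
    An empty number is read as 0.0; this is a guess, not documented behaviour.
    """
    for field in auxinfo.split("/"):
        if not field.startswith("rC:"):
            continue
        coords = []
        for triple in field[3:].split(";"):
            if not triple:
                continue  # skip the empty piece after the final ';'
            parts = (triple.split(",") + ["", "", ""])[:3]
            x, y, z = (float(p) if p else 0.0 for p in parts)
            coords.append((x, y, z))
        return coords
    return []  # no rC: field present


# Shortened excerpt: header, a cut-down N: field, and the first two rC: triples.
SAMPLE = "AuxInfo=1/1/N:52,24/rC:12.1656,-8.504,0;12.1656,-6.2884,0;"

if __name__ == "__main__":
    for atom_index, xyz in enumerate(parse_rc_coordinates(SAMPLE), start=1):
        print(atom_index, xyz)
```

Reading the coordinates is the easy part; pairing them with the `rA:` atoms and `rB:` bonds to rebuild a full drawing is the work the slide asks tool vendors to take on.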
  24. Who Has Responsibility?
      - Who will take responsibility for drawing/enumerating the structures?
      - Where can software contribute?
      - What Quality is “good enough”?
      - We MUST reduce rework!!!
  25. A Lot of Variability in InChIs
      - Source: Unofficial InChI FAQ page
  26. InChIs for Taxol
  27. Taxol
      - DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      - ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      - Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      - Which one is correct???
  28. InChIKeys for Taxol
      - DrugBank: RCINICONZNJXQF-CLDWUXIMDD
      - ChEBI: RCINICONZNJXQF-GXKQXQCDDN
      - Wikipedia: RCINICONZNJXQF-MZXODVADBJ
      - ChEBI and Wikipedia are the SAME structure
      - Drugbank is a DIFFERENT structure at ONE stereocenter
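
The three Taxol keys share their first block and differ only in the second. A small sketch of how that comparison could be automated follows; the helper names are hypothetical, and it assumes the usual reading of the InChIKey layout, in which the first block hashes the connectivity (the skeleton) and the second block hashes the remaining layers.

```python
from collections import defaultdict

# Keys exactly as listed on the slide.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def group_by_first_block(keys):
    """Group source names by the first (skeleton) block of their InChIKey."""
    groups = defaultdict(list)
    for source, key in keys.items():
        groups[key.split("-")[0]].append(source)
    return dict(groups)


def distinct_second_blocks(keys):
    """Collect the distinct second blocks found across all keys."""
    return {key.split("-")[1] for key in keys.values()}


if __name__ == "__main__":
    print(group_by_first_block(TAXOL_KEYS))
    print(len(distinct_second_blocks(TAXOL_KEYS)), "distinct second blocks")
```

All three sources land in one skeleton group, yet there are three different second blocks. Since the slide states that ChEBI and Wikipedia hold the same structure, a mismatch in the second block is a prompt to inspect the full InChIs and the options used to generate them, not proof of a different structure.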
  29. Does one stereocenter matter?
      - Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  30. InChIStrings Hash to InChIKeys
  31. The InChI Resolver
  32. HVYWMOMLDIMFJA-DPAQBDIFSA-N
  34. Resolve-It
  35. Resolve-It
  36. Pretty-It
  37. JMol-It, Download-It and Zoom-It
  38. Kind-of-Resolve-It
  39. Generate-It
  40. Draw-It: Thanks Symyx (Beta release)
  41. Generate-It
  42. All Flavors
  43. Serve Up Services
  44. And Once It’s Resolved…
  45. Out to ChemSpider…and its resources
  46. COMING: InChI Resolver to DOIs
  47. Full Text-Based Literature Searching to DOIs Including Citations Now
  48. When Structures are “Connected”
  49. When Structures are “Connected”
  50. ChemSpider Everywhere
      - Linked from Wikipedia
      - Linked from Open Notebook Science sites using EMBED
      - Linked from Blogs using Structure/Spectra EMBED
      - Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
      - Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  51. ChemSpider Everywhere: Embed Functionality (like YouTube)
  52. ChemSpider Everywhere: www.spectralgame.com
  53. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  54. ChemSpider Everywhere: RSC Compounds
  55. ChemSpider Everywhere: Nature Chemistry
      - Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.
  56. ChemSpider Everywhere: ChemMobi
  57. Structure RSS Feeds with InChIs
  58. InChIs are Incomplete
      - What is NOT supported, yet:
        - polymers
        - organometallics
        - Markush structures
        - 3-D structures
        - excited states
        - interlocking structures (e.g. rotaxanes)
        - host-guest complexes
  59. Progressing InChI
      - Highest priority for the InChI Team is communication with structure drawing package vendors – THE interfaces to the users
      - For the InChI Resolver: Delivery of services to allow publishers to deposit their structure collections with associated DOIs to ChemSpider
      - Not every structure is important…Discussions with Publishers to discern primary compounds
  60. Conclusions
      - InChIs and Internet Chemistry
      - http://inchis.chemspider.com
  61. Acknowledgments
      - Richard Kidd, Royal Society of Chemistry
      - Keith Taylor, Symyx
      - Chris Singleton, Steven Bachrach and Alan McNaught for feedback
      - “The InChI team”