Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

The task of finding chemical information online can be daunting, since even the most rudimentary query on Google can return tens to hundreds of thousands of links to peruse. While the number of online chemical structure databases has increased, until now there has been no central online resource allowing integrated chemical structure searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis. ChemSpider provides a significant knowledge base and resource for chemists working in different domains. From the perspective of the InChI identifiers, this project can be considered a success story, since ChemSpider has used them both for the development of the database and for the provision of fast searching routines. ChemSpider has provided web services for both InChI generation and searching, leading to a proliferation of InChI in the web-based domain of chemistry. This talk will provide an update on ChemSpider's functionality.

Published in: Technology, Education


  • 1. Going a mile InChI by InChI Enabling online chemistry at ChemSpider Antony Williams
  • 2. Languages of Chemistry
  • 3. ChemSpider 2009
    • “Building a Structure Centric Community for Chemists”
    • Hosting structures, spectra, images, documents, outlinks
    • Many web services for retrieval of data, conversion of files, generation of properties
    • Now a platform for:
      • data deposition, curation and annotation – remove the junk
      • Supporting Open Notebook Science efforts
      • chemistry document mark-up with ChemMantis
      • the online ChemSpider Journal of Chemistry
  • 4. Statistics and Connections
    • >6000 unique users per day on average
    • >40,000 transactions per day
    • >21.4 million compounds and growing daily
    • Advocate of InChIs for searching and integration
  • 5. Search Cholesterol
  • 6. Search Cholesterol
  • 7. Search Cholesterol
  • 8. Search Cholesterol
  • 9. Search Cholesterol
  • 10. Search Cholesterol
  • 11. Searching
    • Structure searching based on
      • SMILES
      • InChIString
      • InChIKey
      • StdInChI
      • StdInChIKey
      • molfile uploads
      • structures drawn in applet
    • Search across Google (up to Google's query length limit for an InChIString)
      • Skeleton search
      • Full structure search
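
The identifier types listed on this slide can be told apart by their textual shape alone. The following is a minimal Python sketch of that idea, not ChemSpider's actual search code: `classify_query` is a hypothetical helper, it assumes the 27-character InChIKey layout (14 letters, 10 letters, 1 letter), and it treats anything unrecognised as SMILES without validating it.

```python
import re

# Assumed InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def classify_query(query: str) -> str:
    """Guess which identifier type a search string is (hypothetical helper)."""
    q = query.strip()
    if q.startswith("InChI=1S/"):
        return "StdInChI"
    if q.startswith("InChI="):
        return "InChIString"
    if INCHIKEY_RE.match(q):
        # The ninth character of the second block flags a standard (S)
        # or non-standard (N) key.
        return "StdInChIKey" if q[23] == "S" else "InChIKey"
    # Fallback: assume SMILES; a real service would parse and validate it.
    return "SMILES"


if __name__ == "__main__":
    for example in ("InChI=1S/CH4/h1H4", "VNWKTOKETHGBQD-UHFFFAOYSA-N", "CCO"):
        print(example, "->", classify_query(example))
```

Molfile uploads and structures drawn in the applet arrive as structure data rather than text, so they would bypass a text classifier like this one.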
  • 12. InChIKey Searches Work
  • 13. Depositions
    • Depositions from users – single structures and SDFs
    • Depositions from databases/vendors – SDF files
    • And then came InChIs…
      • InChIs and InChIKeys are available on Blogs for harvesting
      • Publishers are making their structures available as InChIs for harvesting
      • InChIs are NOT ideal for building a database…some lessons
      • We want to link to publications especially…
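
Harvesting identifiers from blog posts or publisher pages can be approximated with pattern matching. The sketch below is an assumption about how such harvesting might look, not the method ChemSpider uses: `harvest_identifiers` is a hypothetical function, and the patterns assume that an InChI string runs to the next whitespace and that InChIKeys use the 27-character layout.

```python
import re

# An InChI string starts with "InChI=1/" or "InChI=1S/" and contains no whitespace.
INCHI_RE = re.compile(r"InChI=1S?/\S+")
# Assumed InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def harvest_identifiers(text: str) -> dict[str, list[str]]:
    """Collect InChI strings and InChIKeys found in free text (hypothetical helper)."""
    # Trailing sentence punctuation is trimmed. A closing bracket is kept,
    # because an InChI can legitimately end with ")".
    inchis = [match.rstrip(".,;") for match in INCHI_RE.findall(text)]
    inchikeys = INCHIKEY_RE.findall(text)
    return {"inchis": inchis, "inchikeys": inchikeys}


if __name__ == "__main__":
    post = "Methane: InChI=1S/CH4/h1H4 with key VNWKTOKETHGBQD-UHFFFAOYSA-N."
    print(harvest_identifiers(post))
```

A harvested string is only a candidate: it would still need to be converted back to a structure and checked before it is deposited.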
  • 14. Chemistry Papers
    • Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.
  • 15. Structures in Chemistry Papers
  • 16. Aesthetics vs Machine Readable
    • Beautiful chemical structures submitted by authors can be beasts for machines
  • 17. InChI Representation
  • 18. InChI’fication of Articles
    • InChIs from publishers – a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry
    • An enormous editorial task with a massive benefit to the community
    • If the structures were correct…imagine a centralized DOI:InChI database
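
The centralized DOI:InChI database imagined on this slide amounts to a two-way lookup between articles and structures. The sketch below is purely illustrative: the class `DoiInchiIndex`, its method names, and the DOIs and keys in the usage example are all made-up placeholders, and it indexes InChIKeys rather than full InChI strings as an assumption about what would make a convenient lookup key.

```python
from collections import defaultdict


class DoiInchiIndex:
    """Toy in-memory two-way lookup between article DOIs and InChIKeys."""

    def __init__(self) -> None:
        self._dois_by_key: dict[str, set[str]] = defaultdict(set)
        self._keys_by_doi: dict[str, set[str]] = defaultdict(set)

    def add(self, doi: str, inchikey: str) -> None:
        """Record that the article with this DOI mentions this structure."""
        self._dois_by_key[inchikey].add(doi)
        self._keys_by_doi[doi].add(inchikey)

    def articles_for(self, inchikey: str) -> list[str]:
        """Return the DOIs of all articles that mention the structure."""
        return sorted(self._dois_by_key.get(inchikey, set()))

    def structures_in(self, doi: str) -> list[str]:
        """Return the InChIKeys of all structures mentioned in the article."""
        return sorted(self._keys_by_doi.get(doi, set()))


if __name__ == "__main__":
    index = DoiInchiIndex()
    # Placeholder DOIs and keys, not real identifiers.
    index.add("10.9999/example.1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N")
    index.add("10.9999/example.2", "AAAAAAAAAAAAAA-BBBBBBBBSA-N")
    print(index.articles_for("AAAAAAAAAAAAAA-BBBBBBBBSA-N"))
```

Such an index is only as good as the structures behind it, which is why the slide conditions the idea on the structures being correct.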
  • 19. Cleaning Structures
  • 20. Converting InChIs to Structures Bacitracin A
    • InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1
    • InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
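
The two strings above differ only in how four stereocentres are flagged in the /t layer: `u` in one and `?` in the other. A small Python sketch can list the centres whose configuration is not specified. It assumes a single-component /t layer like the fragment on the slide, and `unspecified_centres` is a hypothetical helper, not part of the InChI software.

```python
def unspecified_centres(stereo_layer: str) -> list[int]:
    """Return atom numbers whose parity is flagged 'u' or '?' in a /t layer.

    Assumes a single-component layer such as "/t35u,36u,40-,41+".
    """
    body = stereo_layer.strip().lstrip("/")
    if body.startswith("t"):
        body = body[1:]
    centres = []
    for item in body.split(","):
        item = item.strip()
        # Each item is an atom number followed by one parity mark: + - u ?
        if item and item[-1] in ("u", "?"):
            centres.append(int(item[:-1]))
    return centres


if __name__ == "__main__":
    layer = "/t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-"
    print(unspecified_centres(layer))
```

Running the same check on the second string, with `?` in place of `u`, reports the same four atoms, which makes it easy to see that the two InChIs disagree only in how the missing stereochemistry is marked.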
  • 21.  
  • 22. Converting InChIs to Structures
    • What we want is a good layout, retention of stereochemistry labels and tautomers as drawn
  • 23. AuxInfo – Who Uses It? Who Converts It?
    • AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.5807,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
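
An AuxInfo string like the one above is a sequence of `/`-separated fields, each introduced by a short prefix and a colon (for example `N:`, `E:`, `rA:`, `rB:`, `rC:`). The sketch below shows one way to split it and read back the drawing coordinates from the `rC:` field. It assumes a single-component AuxInfo in which no prefix repeats, the function names are hypothetical, and the sample string is a shortened fragment assembled from the opening values on the slide, not a valid AuxInfo for any real structure.

```python
def parse_auxinfo(auxinfo: str) -> dict[str, str]:
    """Split an AuxInfo string into {prefix: value} fields.

    Segments without a colon (such as the leading "AuxInfo=1") are skipped.
    """
    fields = {}
    for segment in auxinfo.strip().split("/"):
        prefix, sep, value = segment.partition(":")
        if sep:
            fields[prefix] = value
    return fields


def parse_coordinates(rc_value: str) -> list[tuple[float, float, float]]:
    """Turn an rC value ("x,y,z;x,y,z;...") into a list of coordinate triples."""
    coords = []
    for triple in rc_value.split(";"):
        if triple:
            x, y, z = (float(v) for v in triple.split(","))
            coords.append((x, y, z))
    return coords


if __name__ == "__main__":
    # Shortened, illustrative fragment only.
    sample = (
        "AuxInfo=1/1/N:52,24,90/E:(4,5)(13,14)/it:im"
        "/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;"
    )
    fields = parse_auxinfo(sample)
    print(sorted(fields))
    print(parse_coordinates(fields["rC"]))
```

The coordinates recovered this way are what a converter would need in order to reproduce the original layout rather than generating a new one.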
  • 24. Who Has Responsibility?
    • Who will take responsibility for drawing/enumerating the structures?
    • Where can software contribute?
    • What Quality is “good enough”?
    • We MUST reduce rework!!!
  • 25. A Lot of Variability in InChIs
    • Source: Unofficial InChI FAQ page
  • 26. InChIs for Taxol
  • 27. Taxol
    • Which one is correct???
  • 28. InChIKeys for Taxol
    • ChEBI and Wikipedia are the SAME structure
    • Drugbank is a DIFFERENT structure at ONE stereocenter
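
Differences like this can be spotted from the keys alone, because an InChIKey is split into hyphen-separated blocks: the first block is derived from the connectivity (the skeleton), and the second from the remaining layers such as stereochemistry. The sketch below compares two keys block by block. `compare_inchikeys` is a hypothetical helper, and the keys in the usage example are placeholders, not the actual Taxol keys from ChEBI, Wikipedia or DrugBank.

```python
def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Describe how two InChIKeys relate, using their hyphen-separated blocks."""
    blocks_a = key_a.strip().split("-")
    blocks_b = key_b.strip().split("-")
    if blocks_a == blocks_b:
        return "identical"
    if blocks_a[0] == blocks_b[0]:
        # Same first block: same skeleton, but another layer
        # (for example stereochemistry) differs.
        return "same skeleton"
    return "different"


if __name__ == "__main__":
    # Placeholder keys for illustration only.
    source_1 = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
    source_2 = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
    source_3 = "AAAAAAAAAAAAAA-CCCCCCCCSA-N"
    print(compare_inchikeys(source_1, source_2))
    print(compare_inchikeys(source_1, source_3))
```

A "same skeleton" result says only that something beyond connectivity differs. Finding which stereocentre is involved requires comparing the full InChI strings.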
  • 29. Does one stereocenter matter?
    • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
  • 30. InChIStrings Hash to InChIKeys
  • 31. The InChI Resolver
  • 33.  
  • 34. Resolve-It
  • 35. Resolve-It
  • 36. Pretty-It
  • 37. JMol-It, Download-It and Zoom-It
  • 38. Kind-of-Resolve-It
  • 39. Generate-It
  • 40. Draw-It : Thanks Symyx (Beta release)
  • 41. Generate-It
  • 42. All Flavors
  • 43. Serve Up Services
  • 44. And Once It’s Resolved…
  • 45. Out to ChemSpider…and its resources
  • 46. COMING: InChI Resolver to DOIs
  • 47. Full Text-Based Literature Searching to DOIs Including Citations Now
  • 48. When Structures are “Connected”
  • 49. When Structures are “Connected”
  • 50. ChemSpider Everywhere
    • Linked from Wikipedia
    • Linked from Open Notebook Science sites using EMBED
    • Linked from Blogs using Structure/Spectra EMBED
    • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
    • Integrated to software offerings from Thermo, Waters, Agilent, Bruker
  • 51. ChemSpider Everywhere Embed Functionality (like YouTube)
  • 52. ChemSpider Everywhere
  • 53. ChemSpider Everywhere Crowdsourced Curation of Spectra
  • 54. ChemSpider Everywhere RSC Compounds
  • 55. ChemSpider Everywhere Nature Chemistry
    • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.
  • 56. ChemSpider Everywhere ChemMobi
  • 57. Structure RSS Feeds with InChIs
  • 58. InChIs are Incomplete
    • What is NOT supported, yet:
      • polymers
      • organometallics
      • Markush structures
      • 3-D structures
      • excited states
      • interlocking structures (e.g. rotaxanes)
      • host-guest complexes
  • 59. Progressing InChI
    • Highest priority for the InChI Team is communication with structure drawing package vendors – THE interfaces to the users
    • For the InChI Resolver : Delivery of services to allow publishers to deposit their structure collections with associated DOIs to ChemSpider
    • Not every structure is important…Discussions with Publishers to discern primary compounds
  • 60. Conclusions
    • InChIs and Internet Chemistry
  • 61. Acknowledgments
    • Richard Kidd, Royal Society of Chemistry
    • Keith Taylor, Symyx
    • Chris Singleton, Steven Bachrach and Alan McNaught for feedback
    • “The InChI team”