How can the international chemical identifier (InChI) be extended to non …
  • 1. How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
  • 2. What is InChI
  • 3. InChI Examples. Ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3. L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
  • 4. InChI Structure
  • 5. InChIKey The condensed, 27 character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm) Designed to allow for easy web searches of chemical compounds InChIKeys consist of  14 characters resulting from a hash of the connectivity information of the InChI  followed by 9 characters resulting from a hash of the remaining layers of the InChI  followed by a single character indication the version of InChI used  followed by single checksum character InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10- 11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 BQJCRHHNABKAKU-KBQPJGBKSA-N Unlike InChI, InChIKey  CT only by lookup
  • 6. Proliferation of InChI
  • 7. Search by InChI
  • 8. ChemSpider Google Search
  • 9. What’s the catch? InChI has limitations InChI is ideal for  Simple  Static  Well-defined graphs Real chemical substances can only be approximated by such graphs
  • 10. Limitations Non-trivial stereo (e.g. axial, planar) Non-trivial tautomers (e.g. ring-chain) Mixtures – full stereo is rarely known Polymers Markush structures Organometalics Inorganics Materials Reactions Etc
  • 11. Chemical data complexity
  • 12. Work in progress InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:  Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.  Polymers/Mixtures: The polymers/mixtures working group also has submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.  Markush: This project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out. But what do we do NOW???
  • 13. Data Validation Standardization FilteringComponentization Deposition Process Deduplication Mapping data Non- redundant
  • 14. ChemSpider Data Model
  • 15. Organometallics
  • 16. Mixtures or unknown stereo
  • 17. Accelrys Enhanced Stereo
  • 18. MOL V3000
  • 19. Enhanced stereo and InChI: unfortunately not supported. Is it important? Now for some real-world examples.
  • 20. FDA Substance Registration System
  • 21. Stoichiometric and non-stoichiometric mixtures. Moiety 1: Substance: Moiety 2:
  • 22. Substance: Moiety 1: Moiety 2: Moiety 3: Moiety 4:
  • 23. Substance: Moiety 1: Moiety 2: (undefined)
  • 24. Moiety 1: Substance: (A) Moiety 2: (B)
  • 25. D-glucose
  • 26. SRS standardization approach: substance description → standardization module → moieties generator → normalization → InChI[Key] generator → hash function f(InChIKeys, moieties) → unique ID and standard description
  • 27. SRS, still to be done: Markush structures, polymers, proteins, inorganics, materials
  • 28. OpenPHACTS. Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project. Its goals are to reduce the barriers to drug discovery in industry, academia and small businesses, and to build an open platform integrating chemistry and biology data from public-domain resources. It is a semantic web platform built on open standards, open data and open source.
  • 29. OpenPHACTS specifics: active/inactive ingredient; parent/child; sample/substance; misreferences (!!!)
  • 30. ChemSpider Reactions
  • 31. ChemSpider reaction challenges: deduplication, identification, deposition
  • 32. Conclusions: InChI is THE identifier. InChI has its limitations. InChI is a work in progress. InChI deficiencies can be hot-fixed.
  • 33. Acknowledgements: RSC Cheminformatics group; FDA SRS group; OpenPHACTS consortium. Software: InChI, GGA Software
  • 34. Thank you. Email: tkachenkov@rsc.org Blog:
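The layered layout behind the examples on slides 3 and 4 can be illustrated with a small splitter. This is a minimal sketch, not part of the InChI software: the function name `parse_inchi_layers` is hypothetical, and it assumes a standard InChI where everything after the formula is a "/"-separated layer with a one-letter prefix (c = connections, h = hydrogens, t/m/s = stereo). Isotopic and fixed-H sections, which reuse prefixes, are deliberately not distinguished.

```python
def parse_inchi_layers(inchi: str) -> dict[str, str]:
    """Split a standard InChI into version, formula and prefixed layers (simplified sketch)."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError(f"missing formula layer: {inchi!r}")
    # The first two fields carry no prefix: version (e.g. "1S") and molecular formula.
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        if not part:
            continue  # tolerate empty fields such as a trailing "/"
        # Every later layer starts with a one-letter prefix; keep only the first
        # occurrence so repeated prefixes from later sections do not overwrite it.
        layers.setdefault(part[0], part[1:])
    return layers
```

For ethanol from slide 3, `parse_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` yields the version, the formula C2H6O, the connection layer "1-2-3" and the hydrogen layer "3H,2H2,1H3". A real application should use the official InChI library rather than string splitting.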
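The 27-character InChIKey layout described on slide 5 can be checked mechanically. The sketch below only validates and splits an existing key; it does not compute the SHA-256-based hash itself. The names `split_inchikey` and `InChIKeyParts` are hypothetical, chosen for illustration.

```python
import re
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    connectivity: str   # 14 letters: hash of the connectivity (skeleton) information
    other_layers: str   # 8 letters: hash of the remaining layers (stereo, isotopes, ...)
    standard_flag: str  # "S" for a standard key, "N" for a non-standard one
    version: str        # "A" for InChI version 1
    protonation: str    # "N" means no protonation change


# 14 letters, hyphen, 8 letters + flag + version, hyphen, protonation letter = 27 characters
_KEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


def split_inchikey(key: str) -> InChIKeyParts:
    """Validate the shape of an InChIKey and return its components."""
    m = _KEY_RE.fullmatch(key)
    if m is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*m.groups())
```

Applied to the morphine key from slide 5, `split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")` separates the connectivity block BQJCRHHNABKAKU from the stereo/other-layer block KBQPJGBK, the standard flag S, the version A and the protonation flag N.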
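The deduplication and mapping steps of the deposition process on slide 13 can be sketched as grouping on the standard InChIKey computed after validation and standardization. This is an illustration under assumptions, not ChemSpider's implementation: the `Record` shape and the `deduplicate` function are hypothetical, and the first deposited record is arbitrarily taken as the canonical one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical deposited record: depositor's ID plus the standard InChIKey
    # computed after validation and standardization.
    source_id: str
    inchikey: str


def deduplicate(records: list[Record]) -> tuple[list[Record], dict[str, str]]:
    """Keep one record per InChIKey and map every source_id to the kept record's ID."""
    kept: dict[str, Record] = {}
    mapping: dict[str, str] = {}
    for rec in records:
        # The first record seen for a key becomes the canonical one.
        canonical = kept.setdefault(rec.inchikey, rec)
        mapping[rec.source_id] = canonical.source_id
    return list(kept.values()), mapping
```

The mapping is what keeps the result non-redundant without losing provenance: every depositor's identifier still resolves to exactly one stored compound. As the slides point out, this only works as well as the InChI does, so mixtures, unknown stereo or organometallics may need extra rules before the key is computed.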
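Slide 26 describes the SRS unique ID as a hash function f(InChIKeys, moieties) but does not give its definition. The sketch below shows one possible, order-independent choice, purely as an assumption: sort the (InChIKey, amount) pairs, join them into a canonical string and hash it with SHA-256. The function name `substance_id` and the free-text `amount` field (e.g. "1", "2" or "undefined", echoing slide 23) are hypothetical.

```python
import hashlib


def substance_id(moieties: list[tuple[str, str]]) -> str:
    """Hypothetical substance-level ID from (InChIKey, amount) pairs.

    Sorting makes the ID independent of the order in which moieties are listed;
    including the amount distinguishes e.g. a 1:1 from a 1:2 mixture.
    """
    canonical = "|".join(f"{key}:{amount}" for key, amount in sorted(moieties))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For example, a substance made of ethanol (LFQSCWFLJHTTHZ-UHFFFAOYSA-N, amount "1") and water (XLYOFNOQVPJJNP-UHFFFAOYSA-N, amount "2") gets the same ID whichever moiety is listed first, while changing either amount yields a different ID.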
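Slides 10 and 16 note that full stereo is rarely known for real samples. Because stereo and isotopic layers feed only the second block of an InChIKey, records that differ only in (or lack) stereo information share the first 14-character connectivity block. The sketch below groups keys on that block; the function name `group_by_connectivity` is hypothetical, and a shared block should be read as a strong hint rather than proof of identity, since it is a hash.

```python
from collections import defaultdict


def group_by_connectivity(keys: list[str]) -> dict[str, list[str]]:
    """Group InChIKeys by their first block (hash of the connectivity information)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        # Everything before the first hyphen is the 14-character connectivity hash.
        groups[key.split("-", 1)[0]].append(key)
    return dict(groups)
```

This is the same idea that makes a search on the first block of an InChIKey return stereoisomers together, which is useful when a deposited record has undefined or partially defined stereo.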