How can the international chemical identifier (InChI) be extended to non …



  1. How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
  2. What is InChI
  3. InChI Examples
     • ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
     • L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
  4. InChI Structure
  5. InChIKey: The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm), designed to allow easy web searches for chemical compounds. An InChIKey consists of three hyphen-separated blocks:
     • 14 characters resulting from a hash of the connectivity information of the InChI
     • 8 characters resulting from a hash of the remaining layers of the InChI, followed by a single character indicating the kind of key (S for standard, N for non-standard) and a single character indicating the version of InChI used (A for version 1)
     • a single final character indicating the protonation state (N for no protonation change)
     Example: InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 gives the InChIKey BQJCRHHNABKAKU-KBQPJGBKSA-N
     Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
  6. Proliferation of InChI
  7. Search by InChI
  8. ChemSpider Google Search
  9. What’s the catch? InChI has limitations. InChI is ideal for simple, static, well-defined graphs. Real chemical substances can only be approximated by such graphs.
  10. Limitations: non-trivial stereo (e.g. axial, planar); non-trivial tautomers (e.g. ring-chain); mixtures (full stereo is rarely known); polymers; Markush structures; organometallics; inorganics; materials; reactions; etc.
  11. Chemical data complexity
  12. Work in progress. InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
      • Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
      • Polymers/Mixtures: the polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
      • Markush: this project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
      But what do we do NOW???
  13. Deposition Process (diagram labels): Data; Validation; Standardization; Filtering; Componentization; Deduplication; Mapping; Non-redundant data
  14. ChemSpider Data Model
  15. Organometallics
  16. Mixtures or unknown stereo
  17. Accelrys Enhanced Stereo
  18. MOL V3000
  19. Enhanced stereo and InChI… Unfortunately not supported. Is it important? Now real-world examples…
  20. FDA Substance Registration System
  21. Stoichiometric and non-stoichiometric mixtures (figure labels: Substance; Moiety 1; Moiety 2)
  22. (figure labels: Substance; Moiety 1; Moiety 2; Moiety 3; Moiety 4)
  23. (figure labels: Substance; Moiety 1; Moiety 2 (undefined))
  24. (figure labels: Substance; Moiety 1 (A); Moiety 2 (B))
  25. D-glucose
  26. SRS standardization approach (diagram labels): Substance description; Standardization module; Moieties generator; Normalization; InChI[Key] generator; Hash function f(InChIKeys, moieties); Unique ID; Standard description
  27. SRS TBD: Markush; Polymers; Proteins; Inorganics; Materials
  28. OpenPHACTS: Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project. Its aims are to reduce the barriers to drug discovery in industry, academia and small businesses, and to build an open platform integrating chemistry and biology data from public domain resources. Semantic web platform. Open Standards, Open Data and Open Source.
  29. OpenPHACTS specifics: Active/inactive ingredient; Parent/child; Sample/substance; Misreferences (!!!)
  30. ChemSpider Reactions
  31. ChemSpider Reaction Challenges: Deduplication; Identification; Deposition
  32. Conclusions: InChI is The Identifier. InChI has its limitations. InChI is a work in progress. InChI deficiencies can be hot-fixed.
  33. Acknowledgements: RSC Cheminformatics group; FDA SRS group; OpenPHACTS consortium. Software: InChI, GGA Software
  34. Thank you. Email: tkachenkov@rsc.org. Blog:
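
Two ideas from the slides lend themselves to a short sketch: the block layout of an InChIKey (slide 5) and the step "Hash function f(InChIKeys, moieties)" that yields a unique ID for a multi-moiety substance (slide 26). The Python below is only an illustration under stated assumptions. The parser follows the published standard InChIKey layout (14 letters, then 8 letters plus a standard/non-standard flag and a version letter, then one protonation letter). The slides do not define f, so `substance_id` is a hypothetical stand-in that hashes the sorted moiety InChIKeys with SHA-256; the names `InChIKeyParts`, `parse_inchikey` and `substance_id` are invented for this example and are not part of the SRS or any InChI software.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Iterable

# Standard InChIKey layout: 14 letters, hyphen, 8 letters + flag + version,
# hyphen, 1 protonation letter (27 characters in total).
_INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


@dataclass(frozen=True)
class InChIKeyParts:
    connectivity: str  # 14-letter hash of the connectivity information
    other_layers: str  # 8-letter hash of the remaining layers
    is_standard: bool  # True for flag "S", False for flag "N"
    version: str       # "A" denotes InChI version 1
    protonation: str   # "N" denotes no protonation change


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks, or raise ValueError if malformed."""
    match = _INCHIKEY_RE.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    conn, rest, flag, version, protonation = match.groups()
    return InChIKeyParts(conn, rest, flag == "S", version, protonation)


def substance_id(moiety_keys: Iterable[str]) -> str:
    """Hypothetical f(InChIKeys, moieties): an order-independent digest.

    Assumption: a substance is described by the InChIKeys of its moieties.
    Sorting makes the ID independent of the order in which moieties are
    listed; repeated keys are kept, so a 2:1 mixture differs from 1:1.
    """
    keys = [k.strip() for k in moiety_keys]
    for k in keys:
        parse_inchikey(k)  # validate every moiety key before hashing
    canonical = "|".join(sorted(keys))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # InChIKey taken from slide 5.
    print(parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
    print(substance_id(["BQJCRHHNABKAKU-KBQPJGBKSA-N"]))
```

Sorting before hashing is one simple way to make the identifier independent of moiety order. A real registration system would also need to encode amounts, roles and undefined moieties, none of which this sketch attempts.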