How can the international chemical identifier (InChI) be extended to non …









How can the international chemical identifier (InChI) be extended to non … Presentation Transcript

  • 1. How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
  • 2. What is InChI
  • 3. InChI Examples
      • ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
      • L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
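To make the layered structure of these strings easier to see, here is a minimal Python sketch that splits an InChI into its parts. The function name `split_inchi_layers` is hypothetical and not part of any InChI software. The sketch assumes a simple InChI with a formula layer and no repeated layer prefixes (isotopic or fixed-H sections would need extra handling).

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Simplified, illustrative parser: assumes the second segment is the
    formula and that each later segment starts with a one-letter prefix
    (c = connectivity, h = hydrogens, t/m/s = stereo, ...).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:  # skip empty segments defensively
            layers[part[0]] = part[1:]
    return layers


# Ethanol, as given on the slide
ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

For ethanol this separates the version tag `1S`, the formula `C2H6O`, the connectivity layer `1-2-3`, and the hydrogen layer `3H,2H2,1H3`.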
  • 4. InChI Structure
  • 5. InChIKey
    The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm). It is designed to allow easy web searches for chemical compounds.
    InChIKeys consist of
      • 14 characters resulting from a hash of the connectivity information of the InChI
      • followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
      • followed by a single character indicating the kind of InChIKey (S for standard, N for non-standard) and a single character indicating the version of InChI used
      • followed by a hyphen and a single character indicating protonation
    Example:
    InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
    BQJCRHHNABKAKU-KBQPJGBKSA-N
    Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
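As an illustration of the key layout, the following Python sketch splits a standard InChIKey into its blocks. It assumes the 27-character layout as I understand it from the InChI documentation (14 letters, hyphen, 8 letters plus a kind flag and a version letter, hyphen, one protonation letter). The function name `parse_inchikey` and the dictionary field names are hypothetical.

```python
import re

# 14 letters, hyphen, 8 letters + kind flag + version letter, hyphen, 1 letter
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks (illustrative field names)."""
    match = _INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("not a 27-character InChIKey")
    connectivity, remaining, kind, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,  # hash of the connectivity information
        "remaining_hash": remaining,        # hash of the remaining layers
        "kind": kind,                       # 'S' = standard, 'N' = non-standard
        "version": version,                 # 'A' = InChI version 1
        "protonation": protonation,         # 'N' = neutral
    }


# The key shown on the slide
parts = parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
```

Because the first block depends only on connectivity, comparing `connectivity_hash` values is a cheap way to find records that share a skeleton but differ in stereochemistry or other layers.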
  • 6. Proliferation of InChI
  • 7. Search by InChI
  • 8. ChemSpider Google Search
  • 9. What’s the catch? InChI has limitations. InChI is ideal for simple, static, well-defined graphs. Real chemical substances can only be approximated by such graphs.
  • 10. Limitations
      • Non-trivial stereo (e.g. axial, planar)
      • Non-trivial tautomers (e.g. ring-chain)
      • Mixtures (full stereo is rarely known)
      • Polymers
      • Markush structures
      • Organometallics
      • Inorganics
      • Materials
      • Reactions
      • Etc.
  • 11. Chemical data complexity
  • 12. Work in progress
    InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
      • Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
      • Polymers/Mixtures: the polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
      • Markush: this project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
    But what do we do NOW???
  • 13. Deposition Process (diagram labels): Data, Validation, Standardization, Filtering, Componentization, Deduplication, Mapping data, Non-redundant
  • 14. ChemSpider Data Model
  • 15. Organometallics
  • 16. Mixtures or unknown stereo
  • 17. Accelrys Enhanced Stereo
  • 18. MOL V3000
  • 19. Enhanced stereo and InChI… Unfortunately, enhanced stereo is not supported by InChI. Is it important? Now for some real-world examples…
  • 20. FDA Substance Registration System
  • 21. Stoichiometric and non-stoichiometric mixtures. Substance: Moiety 1: Moiety 2:
  • 22. Substance: Moiety 1: Moiety 2: Moiety 3: Moiety 4:
  • 23. Substance: Moiety 1: Moiety 2: (undefined)
  • 24. Moiety 1: Substance: (A) Moiety 2: (B)
  • 25. D-glucose
  • 26. SRS standardization approach (flow diagram labels): Substance description; Standardization module; Moieties generator; Normalization; InChI[Key] generator; Hash function f(InChIKeys, moieties); Unique ID; Standard description
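The slide names a hash function f(InChIKeys, moieties) that yields a unique ID but does not define it. The Python sketch below is only one possible reading, not the SRS implementation. It assumes each moiety is an (InChIKey, amount) pair, where the amount string is a hypothetical stand-in for stoichiometry, and it uses a truncated SHA-256 digest as the ID. The name `substance_id` is invented for illustration.

```python
import hashlib


def substance_id(moieties) -> str:
    """Derive an order-independent ID from (inchikey, amount) pairs.

    Assumption: a substance is fully described by its moieties' InChIKeys
    and an amount string per moiety. This is an illustrative scheme only.
    """
    # Sort so that the same set of moieties always gives the same ID,
    # regardless of the order in which they were entered.
    canonical = sorted(f"{key}|{amount}" for key, amount in moieties)
    payload = "\n".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:32]


# Two moieties (ethanol and water InChIKeys) in a hypothetical 1:2 ratio
mixture_a = [("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "1"), ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "2")]
mixture_b = list(reversed(mixture_a))
same_id = substance_id(mixture_a) == substance_id(mixture_b)
```

Sorting before hashing is the key design choice here: it makes deduplication insensitive to moiety order, while any change in a moiety or its amount produces a different ID.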
  • 27. SRS TBD: Markush, Polymers, Proteins, Inorganics, Materials
  • 28. OpenPHACTS
      • Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project
      • To reduce the barriers to drug discovery in industry, academia and for small businesses
      • To build an open platform, integrating chemistry and biology data from public domain resources
      • Semantic web platform
      • Open Standards, Open Data and Open Source
  • 29. OpenPHACTS specifics: Active/inactive ingredient; Parent/child; Sample/substance; Misreferences (!!!)
  • 30. ChemSpider Reactions
  • 31. ChemSpider Reaction Challenges: Deduplication, Identification, Deposition
  • 32. Conclusions
      • InChI is The Identifier
      • InChI has its limitations
      • InChI is a work in progress
      • InChI deficiencies can be hot-fixed
  • 33. Acknowledgements
      • RSC Cheminformatics group
      • FDA SRS group
      • OpenPHACTS consortium
      • Software: InChI, GGA Software
  • 34. Thank you. Email: tkachenkov@rsc.org Blog: