Transcript of "How can the international chemical identifier (InChI) be extended to non …"

  1. How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
  2. What is InChI
  3. InChI Examples
     • ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
     • L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
  4. InChI Structure
  5. 5. InChIKey The condensed, 27 character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm) Designed to allow for easy web searches of chemical compounds InChIKeys consist of  14 characters resulting from a hash of the connectivity information of the InChI  followed by 9 characters resulting from a hash of the remaining layers of the InChI  followed by a single character indication the version of InChI used  followed by single checksum character InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10- 11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 BQJCRHHNABKAKU-KBQPJGBKSA-N Unlike InChI, InChIKey  CT only by lookup
  6. Proliferation of InChI
  7. Search by InChI
  8. ChemSpider Google Search: http://www.chemspider.com/google/
  9. What’s the catch?
     InChI has limitations. InChI is ideal for simple, static, well-defined graphs. Real chemical substances can only be approximated by such graphs.
  10. Limitations
      • Non-trivial stereo (e.g. axial, planar)
      • Non-trivial tautomers (e.g. ring-chain)
      • Mixtures (full stereo is rarely known)
      • Polymers
      • Markush structures
      • Organometallics
      • Inorganics
      • Materials
      • Reactions
      • Etc.
  11. Chemical data complexity
  12. Work in progress
      InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
      • Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
      • Polymers/Mixtures: the polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
      • Markush: this project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
      But what do we do NOW???
  13. Deposition Process (diagram labels): Data; Validation; Standardization; Filtering; Componentization; Deduplication; Mapping; Non-redundant data
  14. ChemSpider Data Model
  15. Organometallics
  16. Mixtures or unknown stereo
  17. Accelrys Enhanced Stereo
  18. MOL V3000
  19. Enhanced stereo and InChI… Unfortunately not supported. Is it important? Now real-world examples…
  20. FDA Substance Registration System
  21. Stoichiometric and non-stoichiometric mixtures (structure figure: Substance; Moiety 1; Moiety 2)
  22. (structure figure: Substance; Moiety 1; Moiety 2; Moiety 3; Moiety 4)
  23. (structure figure: Substance; Moiety 1; Moiety 2 (undefined))
  24. (structure figure: Substance; Moiety 1 (A); Moiety 2 (B))
  25. D-glucose
  26. SRS standardization approach
      • Substance description
      • Standardization module
      • Moieties generator
      • Normalization
      • InChI[Key] generator
      • Hash function f(InChIKeys, moieties)
      • Unique ID
      • Standard description
  27. SRS TBD: Markush, Polymers, Proteins, Inorganics, Materials
  28. 28. OpenPHACTS Open PHACTS is an Innovative Medicines Initiative (IMI) – 3 years project To reduce the barriers to drug discovery in industry, academia and for small businesses To build an open platform, integrating chemistry and biology data from public domain resources Semantic web platform Open Standards, Open Data and Open Source
  29. 29. OpenPHACTS specifics Active/inactive ingredient Parent/child Sample/substance Misreferences (!!!)
  30. ChemSpider Reactions
  31. ChemSpider Reaction Challenges: Deduplication, Identification, Deposition
  32. Conclusions
      • InChI is The Identifier
      • InChI has its limitations
      • InChI is work in progress
      • InChI deficiencies can be hot-fixed
  33. Acknowledgements: RSC Cheminformatics group; FDA SRS group; OpenPHACTS consortium; Software: InChI, GGA Software
  34. Thank you. Email: tkachenkov@rsc.org. Blog: www.chemspider.com/blog. Slides: http://www.slideshare.net/valerytkachenko16
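
The InChI strings on slide 3 are built from slash-separated layers, which is presumably what the "InChI Structure" slide illustrates. The following is a minimal sketch of pulling those layers apart with plain string handling. The function name `split_inchi` and the labels "version" and "formula" are assumptions made for this illustration; they are not part of the talk or of the official InChI software.

```python
from typing import List, Tuple


def split_inchi(inchi: str) -> List[Tuple[str, str]]:
    """Split an InChI string into (layer_name, layer_body) pairs, in order."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    segments = inchi[len(prefix):].split("/")
    # The first segment is the version tag, e.g. "1S" for standard InChI v1.
    layers = [("version", segments[0])]
    for segment in segments[1:]:
        if not segment:
            continue  # tolerate an accidental empty segment
        if segment[0].isupper() or segment[0].isdigit():
            # The formula layer carries no prefix letter (e.g. "C2H6O").
            layers.append(("formula", segment))
        else:
            # Every other layer starts with a lowercase prefix letter.
            layers.append((segment[0], segment[1:]))
    return layers


if __name__ == "__main__":
    # Ethanol, taken from slide 3.
    for name, body in split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
        print(name, body)
```

Pairs are returned as an ordered list rather than a dictionary because some layer letters can occur more than once in a full InChI string.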
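
Slide 5 describes the InChIKey as fixed-width blocks: 14 characters, then 9, then a version character, then one final character. A minimal sketch of checking that shape and splitting a key accordingly is shown here. The field names follow the slide's wording and are assumptions of this sketch; as far as I know, the current InChI documentation describes the last character of the 9-character run as a standard/non-standard flag and the final character as a protonation indicator rather than a checksum, so the names should be read loosely.

```python
import re
from typing import NamedTuple

# 14 letters, hyphen, 9 + 1 letters, hyphen, 1 letter: 27 characters in total.
_KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{9})([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    connectivity_hash: str   # 14 characters (slide: hash of connectivity)
    remaining_layers: str    # 9 characters (slide: hash of remaining layers)
    version: str             # 1 character (slide: InChI version)
    final_char: str          # 1 character (slide: "checksum")


def parse_inchikey(key: str) -> InChIKeyParts:
    """Validate the 27-character layout and return its blocks."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


if __name__ == "__main__":
    # The key shown on slide 5.
    print(parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

This only checks the layout. It cannot recover a structure from the key, which matches the slide's point that going from an InChIKey back to a connection table requires a lookup.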
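
Slide 26 mentions a hash function f(InChIKeys, moieties) that yields a unique ID for a multi-component substance. The talk does not say how that function is defined, so the following is only a hypothetical sketch of one way such an ID could be built: sort the moieties so the result does not depend on input order, then hash the combined text. The function name `substance_id`, the `(inchikey, amount)` pair format, and the choice of SHA-256 are all assumptions of this illustration, not the FDA SRS implementation.

```python
import hashlib
from typing import Iterable, Tuple


def substance_id(moieties: Iterable[Tuple[str, str]]) -> str:
    """Return an order-independent ID for a set of (inchikey, amount) pairs.

    `amount` is free text in this sketch, e.g. "1", "0.5" or "undefined".
    """
    # Sorting makes the ID independent of the order in which moieties arrive.
    canonical = sorted(f"{key}|{amount}" for key, amount in moieties)
    payload = "\n".join(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Illustrative input: the key from slide 5 plus the InChIKey of ethanol.
    mixture = [
        ("BQJCRHHNABKAKU-KBQPJGBKSA-N", "1"),
        ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "undefined"),
    ]
    print(substance_id(mixture))
```

Including the amount in the hashed text means a stoichiometric mixture and a non-stoichiometric mixture of the same components receive different IDs, which is the distinction slides 21 to 24 appear to draw.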