How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?

In recent years there has been a dramatic increase in the number of databases of chemical substances that have become available, especially online and in the public domain. While many of these databases contain small molecules that can be explicitly defined using molecular connection tables and InChIs, many of them also contain chemicals of biological interest such as synthetic polymers, polypeptides, and polynucleotides. A critical capability of any database is a unique identifier that allows for the de-duplication of entries, and InChI has become increasingly popular for this purpose. However, despite many impending developments for InChI (polymer InChIs, Reaction InChIs, etc.), the area of biological chemistry support using a standard approach remains a challenge. This presentation will analyze an approach to address this problem.

This presentation was delivered by Valery Tkachenko at the ACS Meeting in Philadelphia in Fall 2012

Usage Rights: CC Attribution-NonCommercial License

Presentation Transcript

    • How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
    • What is InChI
    • InChI Examples
      ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
      L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
    • InChI Structure
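
The layered structure of an InChI string can be illustrated with a small sketch. It is not part of the presentation: the function name `split_inchi_layers` is hypothetical, and the parsing is deliberately simplified. It assumes a formula layer is present and that each later layer is identified by its one-letter prefix, which does not cover every valid InChI.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into its version, formula and prefixed layers.

    Simplified sketch: assumes the second field is the chemical formula and
    that every later layer starts with a one-letter prefix (c, h, t, m, s...).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:  # skip empty fields
            layers[part[0]] = part[1:]
    return layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], ethanol["c"], ethanol["h"])
```

For ethanol this yields the version `1S`, the formula `C2H6O`, the connectivity layer `1-2-3` and the hydrogen layer `3H,2H2,1H3`. The L-ascorbic acid example additionally carries the stereo layers `t`, `m` and `s`.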
    • InChIKey The condensed, 27 character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm) Designed to allow for easy web searches of chemical compounds InChIKeys consist of  14 characters resulting from a hash of the connectivity information of the InChI  followed by 9 characters resulting from a hash of the remaining layers of the InChI  followed by a single character indication the version of InChI used  followed by single checksum character InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10- 11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 BQJCRHHNABKAKU-KBQPJGBKSA-N Unlike InChI, InChIKey  CT only by lookup
    • Proliferation of InChI
    • Search by InChI
    • ChemSpider Google Search: http://www.chemspider.com/google/
    • What’s the catch? InChI has limitations. InChI is ideal for simple, static, well-defined graphs. Real chemical substances can only be approximated by such graphs.
    • Limitations: non-trivial stereo (e.g. axial, planar); non-trivial tautomers (e.g. ring-chain); mixtures: full stereo is rarely known; polymers; Markush structures; organometallics; inorganics; materials; reactions; etc.
    • Chemical data complexity
    • Work in progress
      InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
      • Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
      • Polymers/Mixtures: the polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
      • Markush: this project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
      But what do we do NOW???
    • Deposition Process: data validation, standardization, filtering, componentization, deduplication, mapping; result: non-redundant data
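
The deduplication step can be sketched as grouping deposited records by their identifier. This is only an illustration under stated assumptions: the record fields `name` and `inchikey` and the function `deduplicate` are hypothetical and do not describe the actual ChemSpider implementation.

```python
from collections import defaultdict


def deduplicate(records):
    """Group deposited records by InChIKey so each key appears only once.

    `records` is an iterable of dicts with hypothetical fields
    'name' and 'inchikey'.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["inchikey"]].append(record["name"])
    return dict(groups)


deposited = [
    {"name": "ethanol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"name": "ethyl alcohol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
]
for key, names in deduplicate(deposited).items():
    print(key, names)
```

Two depositions with different names but the same InChIKey collapse into one entry, while the names are retained as synonyms.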
    • ChemSpider Data Model
    • Organometallics
    • Mixtures or unknown stereo
    • Accelrys Enhanced Stereo
    • MOL V3000
    • Enhanced stereo and InChI… Unfortunately not supported. Is it important? Now real-world examples…
    • FDA Substance Registration System
    • Stoichiometric and non-stoichiometric mixtures [structure diagrams: Substance, Moiety 1, Moiety 2]
    • [structure diagrams: Substance, Moiety 1, Moiety 2, Moiety 3, Moiety 4]
    • [structure diagrams: Substance, Moiety 1, Moiety 2 (undefined)]
    • [structure diagrams: Substance, Moiety 1 (A), Moiety 2 (B)]
    • D-glucose
    • SRS standardization approach: substance description; standardization module; moieties generator; normalization; InChI[Key] generator; hash function f(InChIKeys, moieties); unique ID; standard description
    • SRS TBD: Markush, polymers, proteins, inorganics, materials
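
The slide names a hash function f(InChIKeys, moieties) that yields a unique ID but does not define it. The sketch below is one possible reading, not the SRS method: it assumes each moiety is described by an InChIKey plus an amount string, sorts the moieties so that input order does not matter, and hashes the result with SHA-256. The function name `substance_id`, the `key:amount` encoding and the 32-character truncation are all assumptions made for illustration.

```python
import hashlib


def substance_id(moieties):
    """Derive an order-independent ID from (inchikey, amount) pairs.

    `amount` is a free-form string such as "1", "0.5" or "unknown";
    this encoding is an assumption, not the SRS definition.
    """
    canonical = sorted("%s:%s" % (key, amount) for key, amount in moieties)
    payload = "|".join(canonical).encode("ascii")
    return hashlib.sha256(payload).hexdigest()[:32]


mixture = [
    ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "1"),  # ethanol
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "1"),  # water
]
print(substance_id(mixture))
```

Including the amounts in the hashed payload means that two mixtures of the same components in different ratios receive different IDs, which a plain concatenation of InChIKeys would not achieve.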
    • OpenPHACTS Open PHACTS is an Innovative Medicines Initiative (IMI) – 3 years project To reduce the barriers to drug discovery in industry, academia and for small businesses To build an open platform, integrating chemistry and biology data from public domain resources Semantic web platform Open Standards, Open Data and Open Source
    • OpenPHACTS specifics Active/inactive ingredient Parent/child Sample/substance Misreferences (!!!)
    • ChemSpider Reactions
    • ChemSpider Reaction Challenges: deduplication, identification, deposition
    • Conclusions: InChI is The Identifier. InChI has its limitations. InChI is work in progress. InChI deficiencies can be hot-fixed.
    • Acknowledgements: RSC Cheminformatics group; FDA SRS group; OpenPHACTS consortium. Software: InChI, GGA Software
    • Thank you. Email: tkachenkov@rsc.org Blog: www.chemspider.com/blog Slides: http://www.slideshare.net/valerytkachenko16