How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?

V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan

ACS Philly August 2012
What is InChI
InChI Examples


CH3CH2OH (ethanol)
    InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

L-ascorbic acid
    InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
InChI Structure
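The layered structure can be illustrated with a minimal Python sketch that splits an InChI string on "/" into its version prefix, formula, and prefixed layers. The function name `split_inchi_layers` is hypothetical, and the sketch assumes each prefix letter occurs only once, which may not hold for every non-standard InChI.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into version, formula, and one-letter-prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    for index, part in enumerate(parts[1:]):
        if not part:
            continue
        if index == 0 and part[0].isupper():
            # The formula layer carries no prefix letter and starts with an element symbol.
            layers["formula"] = part
        else:
            # Other layers start with a one-letter prefix such as c, h, t, m, or s.
            layers[part[0]] = part[1:]
    return layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
ascorbic = split_inchi_layers(
    "InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"
)
print(ethanol)
print(ascorbic["t"], ascorbic["m"], ascorbic["s"])
```

For ethanol only the connectivity (c) and hydrogen (h) layers appear; the L-ascorbic acid string adds the stereo layers (t, m, s).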
InChIKey
- The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)
- Designed to allow for easy web searches of chemical compounds
- A standard InChIKey consists of
    - 14 characters resulting from a hash of the connectivity information of the InChI
    - a hyphen, followed by 8 characters resulting from a hash of the remaining layers of the InChI
    - a single character indicating the kind of key (S for standard, N for non-standard)
    - a single character indicating the version of InChI used (A for version 1)
    - a hyphen, followed by a single character indicating protonation (N for neutral)

- InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
- BQJCRHHNABKAKU-KBQPJGBKSA-N
- Unlike an InChI, an InChIKey cannot be converted back to a connection table (CT) algorithmically; the structure can be recovered only by lookup
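A minimal Python sketch of how a key can be taken apart, assuming the block layout documented for the standard InChIKey: 14 letters, a hyphen, 8 letters plus a standard/non-standard flag and a version letter, a hyphen, and one protonation letter. The function name `parse_inchikey` is hypothetical, and the check is purely a format check; it does not recompute any hash.

```python
import re

# 14 letters, hyphen, 8 letters + S/N flag + version letter, hyphen, 1 protonation letter
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if the format does not match."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("not a well-formed InChIKey: %r" % key)
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,
        "other_layers_hash": other_layers,
        "is_standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


parsed = parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
print(parsed)
```

Two keys that share the first 14-letter block but differ in the second block describe the same connectivity with different stereo or isotopic layers, which is useful when searching for "same skeleton" matches.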
Proliferation of InChI
Search by InChI
ChemSpider Google Search
http://www.chemspider.com/google/
What’s the catch?

- InChI has limitations
- InChI is ideal for
    - Simple
    - Static
    - Well-defined graphs
- Real chemical substances can only be approximated by such graphs
Limitations
- Non-trivial stereo (e.g. axial, planar)
- Non-trivial tautomers (e.g. ring-chain)
- Mixtures: full stereo is rarely known
- Polymers
- Markush structures
- Organometallics
- Inorganics
- Materials
- Reactions
- etc.
Chemical data complexity
Work in progress
- InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
    - Reaction InChI (RInChI): The reaction working group has completed its recommendations, and work is ready to begin.
    - Polymers/Mixtures: The polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
    - Markush: This project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
- But what do we do NOW???
Deposition Process (diagram)
Data → Validation → Standardization → Filtering → Componentization → Deduplication → Mapping → Non-redundant data
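The deduplication stage can be sketched in Python as grouping deposited records by their standard InChIKey. This is an illustrative assumption about how such a step could work, not a description of ChemSpider's actual pipeline; the record fields `source_id` and `inchikey` and the function name `deduplicate` are hypothetical.

```python
from collections import defaultdict


def deduplicate(records):
    """Group record source IDs by InChIKey so each structure is kept only once."""
    groups = defaultdict(list)
    for record in records:
        groups[record["inchikey"]].append(record["source_id"])
    return dict(groups)


# Hypothetical deposited records; the keys are those of ethanol and of the
# C17H19NO3 example used on the InChIKey slide.
records = [
    {"source_id": "dep-1", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"source_id": "dep-2", "inchikey": "BQJCRHHNABKAKU-KBQPJGBKSA-N"},
    {"source_id": "dep-3", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
]
unique = deduplicate(records)
print(len(unique), "unique structures from", len(records), "records")
```

Grouping rather than discarding keeps the mapping from each unique structure back to every depositor record, which the mapping stage needs.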
ChemSpider Data Model
Organometallics
Mixtures or unknown stereo
Accelrys Enhanced Stereo
MOL V3000
Enhanced stereo and InChI…
- Unfortunately not supported
- Is it important?
- Now real-world examples…
FDA Substance Registration System
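Stoichiometric and non-stoichiometric mixtures
[Figures: four example substances, each shown next to its component moieties]
- Substance with Moiety 1 and Moiety 2
- Substance with Moiety 1, Moiety 2, Moiety 3 and Moiety 4
- Substance with Moiety 1 and Moiety 2 (undefined)
- Substance with Moiety 1 (A) and Moiety 2 (B)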
D-glucose
SRS standardization approach
- Substance description
- Standardization module
- Moieties generator
- Normalization
- InChI[Key] generator
- Hash function f(InChIKeys, moieties)
- Unique ID
- Standard description
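One way to read "Hash function f(InChIKeys, moieties)" is as an order-independent digest over the moiety InChIKeys and their amounts. The Python sketch below is an assumption made for illustration, not the SRS implementation: the function name `substance_id`, the `(inchikey, amount)` pair format, the use of SHA-256, and the 16-character truncation are all hypothetical choices.

```python
import hashlib


def substance_id(moieties):
    """Derive an ID from (inchikey, amount) pairs, independent of moiety order.

    amount is a string such as "1" or "2", or "" when the ratio is undefined.
    """
    # Sorting makes the ID independent of the order in which moieties are listed.
    canonical = sorted("%s:%s" % (key, amount) for key, amount in moieties)
    digest = hashlib.sha256("|".join(canonical).encode("utf-8")).hexdigest()
    return digest[:16]


# Illustrative two-moiety substance built from the InChIKeys of ethanol and water.
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"

id_a = substance_id([(ETHANOL, "1"), (WATER, "2")])
id_b = substance_id([(WATER, "2"), (ETHANOL, "1")])
id_c = substance_id([(ETHANOL, "1"), (WATER, "")])
print(id_a, id_b, id_c)
```

Including the amount in the hashed string means a stoichiometric substance and a non-stoichiometric mixture of the same moieties receive different IDs.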
SRS TBD
- Markush
- Polymers
- Proteins
- Inorganics
- Materials
OpenPHACTS
- Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project
- To reduce the barriers to drug discovery in industry, academia and for small businesses
- To build an open platform, integrating chemistry and biology data from public domain resources
- Semantic web platform
- Open Standards, Open Data and Open Source
OpenPHACTS specifics
- Active/inactive ingredient
- Parent/child
- Sample/substance
- Misreferences (!!!)
ChemSpider Reactions
ChemSpider Reaction Challenges
- Deduplication
- Identification
- Deposition
Conclusions
- InChI is The Identifier
- InChI has its limitations
- InChI is a work in progress
- InChI deficiencies can be hot-fixed
Acknowledgements
- RSC Cheminformatics group
- FDA SRS group
- OpenPHACTS consortium
- Software: InChI, GGA Software
Thank you

Email: tkachenkov@rsc.org
Blog: www.chemspider.com/blog
SLIDES:
http://www.slideshare.net/valerytkachenko16
