ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

  1. InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison. Markus Sitzmann, Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
  2. The Adaption and Use of the IUPAC InChI/InChIKey. Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures. NCI/CADD Identifiers: FICTS, FICuS, uuuuu. InChI/InChIKey: Std. InChI/InChIKey.
  3. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
     • based on hashcodes calculated by the chemoinformatics toolkit CACTVS
     • CACTVS hashcodes:
         • represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned)
         • have a high sensitivity to structural features of a compound
         • change if connectivity changes
     • example: 9850FD9F9E2B4E25 [structure drawing]
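A 16-digit hexadecimal string and a 64-bit unsigned integer are two views of the same value. The sketch below only illustrates that equivalence. It does not compute a CACTVS hashcode (that requires CACTVS itself), and the helper names `hashcode_to_int` and `int_to_hashcode` are made up for this example. Upper-case hex digits are assumed because that is how the slides print them.

```python
import re

# Sixteen hexadecimal digits, upper case as printed on the slides (assumption).
HASHCODE_RE = re.compile(r"[0-9A-F]{16}")


def hashcode_to_int(hashcode: str) -> int:
    """Convert a 16-digit hex hashcode string to its 64-bit unsigned value."""
    if not HASHCODE_RE.fullmatch(hashcode):
        raise ValueError(f"not a 16-digit hexadecimal hashcode: {hashcode!r}")
    return int(hashcode, 16)


def int_to_hashcode(value: int) -> str:
    """Format a 64-bit unsigned integer as a zero-padded 16-digit hex string."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit into 64 unsigned bits")
    return f"{value:016X}"


example = "9850FD9F9E2B4E25"  # hashcode shown on slide 3
value = hashcode_to_int(example)
print(value < 2**64)                      # True
print(int_to_hashcode(value) == example)  # True
```

Storing the integer form can be convenient for database indexing, while the hex string is the form used in the identifiers.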
  4. Hashcodes of closely related forms of one compound [structure drawings]. Forms shown: charged form, tautomers, isotope, stereoisomers, salt, “errors”. Hashcodes shown: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 9850FD9F9E2B4E25, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9.
  5. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Workflow: input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) -> structure normalization -> parent structure (MDL SDF, SMILES, database) -> hashcode calculation (E_HASHISY) -> NCI/CADD Identifier
  6. NCI/CADD Structure Identifiers, Structure Normalization: adjustable levels of sensitivity [structure drawings]
     • Fragments: sensitive, or un-sensitive (keep only largest organic fragment)
     • Isotopes: sensitive, or un-sensitive (ignore isotope labels)
     • Charges: sensitive, or un-sensitive (uncharge)
     • Tautomers: sensitive, or un-sensitive (find canonical tautomer)
     • Stereochemistry: sensitive, or un-sensitive (discard stereo information)
  7. NCI/CADD Structure Identifiers, Structure Normalization: each of Fragments, Isotopes, Charges, Tautomers and Stereochemistry is either sensitive or un-sensitive [structure drawings]
  8. NCI/CADD Structure Identifiers, Structure Normalization. FICTS identifier: representation of the exact drawing. Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T) and Stereochemistry (S) [structure drawings, pairs marked = or ≠]
  9. NCI/CADD Structure Identifiers, Structure Normalization. FICuS identifier: comes closest to how a chemist perceives a compound. Sensitive to Fragments (F), Isotopes (I), Charges (C) and Stereochemistry (S); un-sensitive to Tautomers (u) [structure drawings, pairs marked = or ≠]
  10. NCI/CADD Structure Identifier, Structure Normalization. uuuuu identifier: closely related forms of the same compound. Un-sensitive (u) to Fragments, Isotopes, Charges, Tautomers and Stereochemistry [structure drawings, all pairs marked =]
  11. NCI/CADD Structure Identifier, Structure Normalization. Steps shown between input structure and parent structures [flow diagram, branches labeled n and d]:
      • correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
      • get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
      • define canonical resonance form/protonation state
      • discard isotope labels
      • define canonical tautomer
      • normalize or discard stereo information
      • parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
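The five-letter tags on slides 6 to 11 can be read position by position: a capital letter means the identifier is sensitive to that feature, and `u` means it is un-sensitive. A minimal sketch of that reading follows. The function name `decode_tag` and the dictionary keys are illustrative choices, not part of CACTVS or of any NCI/CADD tool.

```python
# Position-by-position meaning of a tag such as "FICuS" (from slides 6 to 10).
FEATURES = (
    ("F", "fragments"),
    ("I", "isotopes"),
    ("C", "charges"),
    ("T", "tautomers"),
    ("S", "stereochemistry"),
)


def decode_tag(tag: str) -> dict:
    """Map each feature to True (sensitive) or False (un-sensitive)."""
    if len(tag) != len(FEATURES):
        raise ValueError(f"tag must have {len(FEATURES)} characters: {tag!r}")
    result = {}
    for symbol, (letter, feature) in zip(tag, FEATURES):
        if symbol == letter:
            result[feature] = True       # capital letter: sensitive
        elif symbol == "u":
            result[feature] = False      # "u": un-sensitive
        else:
            raise ValueError(f"unexpected {symbol!r} at the {feature} position")
    return result


# The eight parent-structure variants listed on slide 11.
for tag in ("uuuuu", "uuuuS", "uuuTu", "uuuTS", "FICuu", "FICuS", "FICTS", "FICTu"):
    sensitive = [name for name, on in decode_tag(tag).items() if on]
    print(tag, "sensitive to:", ", ".join(sensitive) or "nothing")
```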
  12. NCI/CADD Structure Identifier format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>. Examples: 9850FD9F9E2B4E25-FICTS-01-57, 9850FD9F9E2B4E25-FICuS-01-78, 9850FD9F9E2B4E25-uuuuu-01-27 [structure drawing]
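The four-part layout on slide 12 lends itself to a small parser. The following is a sketch under assumptions: the field widths (two characters each for version and checksum) are inferred only from the three examples on the slide, and the checksum is not verified because the slides do not describe how it is computed. `StructureIdentifier` and `parse_identifier` are hypothetical names.

```python
import re
from typing import NamedTuple


class StructureIdentifier(NamedTuple):
    hashcode: str   # 16 hex digits (CACTVS E_HASHISY)
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str    # "01" in the slide examples
    checksum: str   # two characters in the slide examples; not verified here


# Field widths are assumptions based on the examples shown on slide 12.
IDENTIFIER_RE = re.compile(
    r"(?P<hashcode>[0-9A-F]{16})-"
    r"(?P<tag>[Fu][Iu][Cu][Tu][Su])-"
    r"(?P<version>[0-9]{2})-"
    r"(?P<checksum>[0-9A-Z]{2})"
)


def parse_identifier(text: str) -> StructureIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its four fields."""
    match = IDENTIFIER_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable structure identifier: {text!r}")
    return StructureIdentifier(**match.groupdict())


for raw in (
    "9850FD9F9E2B4E25-FICTS-01-57",
    "9850FD9F9E2B4E25-FICuS-01-78",
    "9850FD9F9E2B4E25-uuuuu-01-27",
):
    print(parse_identifier(raw))
```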
  13. FICTS identifiers for the related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”) [structure drawings]: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS
  14. FICuS identifiers for the related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”) [structure drawings]: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS
  15. uuuuu identifiers for the related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”) [structure drawings]: 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu
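One way to see the effect of the three sensitivity levels is to count how many distinct hashcodes each level assigns to the nine drawings on slides 13 to 15. The sketch below simply copies the hashcodes in the order they appear on those slides and counts unique values; the variable names are illustrative.

```python
# Hashcodes as listed on slide 13 (FICTS), slide 14 (FICuS) and slide 15 (uuuuu).
ficts = [
    "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
    "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
    "8A7AD1EB498CC76A", "6C16DE2351F9FF50", "9850FD9F9E2B4E25",
]
ficus = [
    "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
    "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
    "8A7AD1EB498CC76A", "9850FD9F9E2B4E25", "9850FD9F9E2B4E25",
]
uuuuu = ["9850FD9F9E2B4E25"] * 9

distinct = {
    "FICTS": len(set(ficts)),
    "FICuS": len(set(ficus)),
    "uuuuu": len(set(uuuuu)),
}
for level, count in distinct.items():
    print(f"{level}: {count} distinct hashcodes for {len(ficts)} drawings")
# FICTS: 7 distinct hashcodes for 9 drawings
# FICuS: 6 distinct hashcodes for 9 drawings
# uuuuu: 1 distinct hashcodes for 9 drawings
```

The less sensitive the identifier, the more drawings collapse onto one hashcode.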
  16. Std. InChIKeys for the related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”) [structure drawings]: HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M
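A Standard InChIKey consists of a 14-letter first block (a hash of the connectivity), a 10-letter second block (remaining layers plus flag characters) and a single final letter for the protonation state. Grouping the keys from slide 16 by first block shows which forms InChI regards as sharing one connectivity. This is a minimal sketch; `split_inchikey` is a hypothetical helper, and the regular expression checks only the overall shape of a key.

```python
import re
from collections import defaultdict

# Overall shape of an InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")


def split_inchikey(key: str) -> tuple:
    """Return (first block, second block, protonation character)."""
    match = INCHIKEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.groups()


# Keys as listed on slide 16.
keys = [
    "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "HNDVDQJCIGZPNO-CDYZYAPPSA-N",
    "HNDVDQJCIGZPNO-RXMQYKEDSA-N", "HNDVDQJCIGZPNO-YFKPBYRVSA-N",
    "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "HNDVDQJCIGZPNO-UHFFFAOYSA-N",
    "HNDVDQJCIGZPNO-UHFFFAOYSA-N", "UHPNKBYGGMJTIM-UHFFFAOYSA-M",
    "UHPNKBYGGMJTIM-UHFFFAOYSA-M",
]

by_first_block = defaultdict(set)
for key in keys:
    first_block, _second_block, _protonation = split_inchikey(key)
    by_first_block[first_block].add(key)

for first_block, full_keys in by_first_block.items():
    print(first_block, "->", len(full_keys), "distinct full key(s)")
```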
  17. Structure Normalization: Tautomers. Which is the canonical tautomer? [structure drawings]
  18. Structure Normalization: Tautomers
      • CACTVS: generation of all formal tautomers for a given organic compound (prototropic tautomerism)
      • rule set of 21 transforms encoded as (CACTVS-extended) SMIRKS
      • types of tautomerism covered:
          • 1.3, 1.5 keto/enol
          • imine/enamine
          • imine/amine
          • lactam/lactim
          • 1.3, 1.5, 1.7, 1.11 hydrogen atom shift on (aromatic) heteroatoms
          • ketene/ynol
          • nitro/aci-nitro
          • nitroso/oxime
      • special cases: cyanic/iso-cyanic acid, phosphonic acid, formamidinesulfonic acid, isocyanide, furanones and more …
  19. Structure Normalization: Tautomers. 21 SMIRKS transforms, examples:
      • transform: 1.3 keto-enol
        [O,S,Se,Te;X1:1]=[Cx1:2][CX4R{0-2}:3] [#1:4] >> [#1:4] [O,S,Se,Te;X2:1][Cx1,cx1:2]=[C,cx1,cx0:3]
      • transform: 1.3 heteroatom H shift
        [N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2] [N,n,S,O,Se,Te:3] [#1:4] >> [#1:4] [N,n,S,O,Se,Te:1] [NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
      • transform: 1.5 heteroatom H shift
        [nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2] [N,n,S,s,O,o,Se,Te:3] [#1:4] >> [#1:4] [N,n,S,O,Se,Te:1] [C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]
  20. Structure Normalization: Tautomers. Guanine [structure drawings of 15 tautomers]
      • FICTS, one identifier per tautomer: A6199E68A788F2F5-FICTS, 959B273B619C709F-FICTS, 61248C4A7D045A47-FICTS, 675R4FCC50F45026-FICTS, 0B345B47F6625113-FICTS, 181CA9BCE3EF47F4-FICTS, 1AD375920BE60DAD-FICTS, 67196F0B20B1D934-FICTS, BCCDA7D0CDACF120-FICTS, CE8F480C11DBFC4F-FICTS, D46A1E6500B06AB6-FICTS, D979CF9770AC0BA5-FICTS, 56FFE8B5619FB01-FICTS, F802E527EC5C61BF-FICTS, EF060DA9D97091DE-FICTS
      • FICuS: BCCDA7D0CDACF120-FICuS
      • Std. InChIKey: UYTPUPDQBNUYGX-UHFFFAOYSA-N
  21. Structure Normalization: Tautomerism & Stereochemistry. Methyl propenyl ketone: Z and E isomers [structure drawings]
  22. Structure Normalization: Tautomerism & Stereochemistry. Methyl propenyl ketone: Z and E isomers, each linked to a tautomer [structure drawings]
  23. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone, Z and E isomers and tautomers [structure drawings]: 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.
  24. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone, Z and E isomers and tautomers [structure drawings]: 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation. Std. InChI/InChIKey pairs shown:
      • InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+ with LABTWGUMFABVFG-ONEGZZNKSA-N
      • InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4- with LYGWZVOQSCPYDG-PLNGDYQASA-N
      • InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3- with LABTWGUMFABVFG-ARJAWSKDSA-N
      • InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3 with LABTWGUMFABVFG-UHFFFAOYSA-N
  25. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone: FICTS “sees” four different structures [structure drawings]: 821D8C17ACE5040E-FICTS, 6EB4AA2BAA11965F-FICTS, 1677645190718885-FICTS, 76D03F08ACDF6C0C-FICTS
  26. Structure Normalization: Charges in Resonance Systems. Canonical resonance structure? Problem: uncharge ≠ uncharge; different protonation states [structure drawings]. Hashcodes shown: F3A27F03AE77A722, F3A27F03AE77A722, 62FADCB01F197FC9, 2E011EE4519F7920
  27. Structure Normalization: Charges in Resonance Systems [structure drawings]
      • generation of all formal resonance structures for a given (charged) organic compound
      • rule set of 14 transforms encoded as (CACTVS-extended) SMIRKS
          • shifting of charges: 5 rules
          • recombination of charges: 5 rules
          • separation of charges: 4 rules
  28. Structure Normalization: Charges in Resonance Systems. Münchnones (no plausible unpolarized resonance structure can be drawn) [structure drawings]. Rule applications shown: 1.2 shift, 1.2 recombination, 1.2 recombination, separation (pentavalent N atom), 1.3 shift, 1.3 shift, 1.3 recombination, 1.3 shift, 1.3 shift, 1.3 shift, 1.3 shift. Identifiers: IUYUGWCTOLFFCL-UHFFFAOYSA-N, F68AC07DE0D3379F-FICuS
  29. InChI/InChIKey - NCI/CADD Identifier comparison: »Chemical Structure Lookup Service« Database. 74 million structure records (~46 million unique structures)
      • PubChem database (including Open NCI database, EPA DSSTox databases, NIAID HIV databases, NIST Webbook, NLM ChemIDplus, ChemSpider …): ~47%
      • ChemNavigator iResearch Library (compilation of commercially available screening compounds from ~250 international chemistry suppliers): ~43%
      • Commercial Sources / Others (Asinex, Comgenex, …): ~10%
  30. InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts
      • structure records registered in CSLS: 74.2 million
      • successful calculation of:
          • Standard InChI/InChIKey: 73.8 million records
          • NCI/CADD Structure Identifiers: 73.7 million records
      • compound sets (unique chemical structure sets):
          • Standard InChI/InChIKey: 48,027,940
          • FICTS Identifier: 48,023,835
          • FICuS Identifier: 46,715,521
          • Standard InChIKey (first block): 43,055,589
          • uuuuu Identifier: 41,671,010
      • Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records
  31. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Sets: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)
  32. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Sets: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique). Comparison 1: conflicts?
  33. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Sets: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique); Standard InChI/InChIKey calculated by CACTVS from FICuS compound structure. Comparison 1: conflicts? Comparison 2: same InChI/InChIKey?
  34. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1). No conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records):
      • all structure records: 73.7
      • FICuS linked to a single InChI/InChIKey: 62.3 (84.5%)
      • both linked to a single structure record: 34.4 (46.9%)
      • both linked to multiple structure records: 27.9 (38.0%)
  35. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1). Conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records):
      • all structure records: 73.7 (84.5% without conflicts, see slide 34)
      • FICuS is linked to multiple InChI/InChIKeys or vice versa: 10.4
      • one FICuS is linked to multiple InChI/InChIKeys: 3.6 (4.6%)
      • one InChI/InChIKey is linked to multiple FICuS: 6.8 (9.3%)
  36. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1). Conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records):
      • all structure records: 73.7 (84.5% without conflicts, see slide 34)
      • FICuS is linked to multiple InChI/InChIKeys or vice versa: 10.4
      • one FICuS is linked to multiple InChI/InChIKeys: 3.6 (4.6%)
      • one InChI/InChIKey is linked to multiple FICuS: 6.8 (9.3%)
      • number of InChIKeys first block: 0.9 (1.2%) and 2.3 (3.1%)
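The conflict categories on slides 35 and 36 can be thought of as a check on the mapping between the two identifier sets: does each FICuS go with exactly one InChIKey, and the reverse? The sketch below shows one possible way to sort records into these categories. It is not the procedure used for the talk, which the slides do not describe; the function name, the category labels as dictionary keys, and the toy identifiers `F1`, `K1` and so on are all invented for illustration. A record that falls into both conflict categories is counted under the first one here, which is an arbitrary choice.

```python
from collections import defaultdict


def classify_links(records):
    """Count structure records by conflict category.

    records: iterable of (ficus, inchikey) pairs, one pair per structure record.
    """
    records = list(records)
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, key in records:
        keys_per_ficus[ficus].add(key)
        ficus_per_key[key].add(ficus)

    counts = {
        "no conflict": 0,
        "one FICuS, multiple InChIKeys": 0,
        "one InChIKey, multiple FICuS": 0,
    }
    for ficus, key in records:
        if len(keys_per_ficus[ficus]) > 1:
            counts["one FICuS, multiple InChIKeys"] += 1
        elif len(ficus_per_key[key]) > 1:
            counts["one InChIKey, multiple FICuS"] += 1
        else:
            counts["no conflict"] += 1
    return counts


# Toy data with made-up identifiers.
toy_records = [
    ("F1", "K1"), ("F1", "K1"),   # consistent: F1 <-> K1
    ("F2", "K2"), ("F2", "K3"),   # one FICuS, two InChIKeys
    ("F3", "K4"), ("F4", "K4"),   # one InChIKey, two FICuS
]
print(classify_links(toy_records))
```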
  37. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (2). Same InChI/InChIKey? Cases where the InChI changes:
      • compounds (unique structures, million records): FICuS 6.4 of 46.7 (13.7%); FICTS 3.8 of 48.0 (7.9%); uuuuu 11.9 of 41.6 (28.6%)
      • structure records (million records, of 73.7 in total): FICuS 9.3 (12.7%); FICTS 4.6 (6.2%); uuuuu 21.9 (29.7%)
  38. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (2). Same InChI/InChIKey? Cases where the InChI changes:
      • compounds (unique structures, million records): FICuS 6.4 of 46.7 (13.7%); FICTS 3.8 of 48.0 (7.9%); uuuuu 11.9 of 41.6 (28.6%)
      • structure records (million records, of 73.7 in total): FICuS 9.3 (12.7%); FICTS 4.6 (6.2%); uuuuu 21.9 (29.7%)
      • vs. InChIKey first block: 3.2 (7.6%) and 6.3 (8.4%)
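The compound-level percentages on slides 37 and 38 follow directly from the counts next to them. The short check below recomputes them. The pairing of each count with an identifier is read off the flattened slide text and is therefore an assumption; `share_percent` is a hypothetical helper.

```python
def share_percent(part: float, total: float) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# (all compounds, compounds whose InChI changes), in million, as read from the slide.
compounds = {
    "FICuS": (46.7, 6.4),
    "FICTS": (48.0, 3.8),
    "uuuuu": (41.6, 11.9),
}

for identifier, (total, changed) in compounds.items():
    print(f"{identifier}: {changed} of {total} million = {share_percent(changed, total)}%")
# FICuS: 6.4 of 46.7 million = 13.7%
# FICTS: 3.8 of 48.0 million = 7.9%
# uuuuu: 11.9 of 41.6 million = 28.6%
```

Because the slide counts are rounded to one decimal place, recomputed percentages for other rows can differ from the printed ones in the last digit.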
  39. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Compound classification, occurrence in FICuS set vs. occurrence in FICuS subset (InChI changes) [bar chart]. Categories: (formal) tautomer count > 1, (formal) tautomer count > 3, (formal) tautomer count > 10, full stereo, contains metal atoms, metal complexes, salt, has resonance charges, inorganic. Values shown (pairing with categories not recoverable): 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%, 56.4%, 25.4%, 5.5%, 25.7%, 0.8%, 0.2%, 1.0%, 0.2%, 0.1%
  40. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215 [structure drawing]
      • FICuS: 12 different structure records are linked to this structure
      • Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block)
      • all 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including the connectivity layer/first block)
  41. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers [structure drawings]
  42. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; tautomer [structure drawings]
  43. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; tautomer: tautomeric interconversion? [structure drawings]
  44. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; S and R forms; tautomer: tautomeric interconversion? (asked twice) [structure drawings]
  45. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; S and R forms; tautomer: tautomeric interconversion? (asked twice) [structure drawings]
  46. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; S and R forms; tautomer: tautomeric interconversion? (asked twice) [structure drawings]. How many structures? Records: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 34903393, ChemNavigator 65635274
  47. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. Z and E isomers; S and R forms; tautomer: tautomeric interconversion? (asked twice) [structure drawings]. How many structures? InChIKey A, InChIKey B, InChIKey C (same connectivity layer/block); FICuS parent structure
  48. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine. Original structure [structure drawing]
  49. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine. Original structure; best representation [structure drawings]
  50. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine. Original structure; best representation; InChI; FICuS [structure drawings with Z/E double-bond labels]
  51. The Adaption and Use of the IUPAC InChI/InChIKey. Chemical Structure Lookup Service (http://cactus.nci.nih.gov/lookup): 74 million structure records, 46 million unique structures. NCI/CADD Identifiers: FICTS, FICuS, uuuuu. InChI/InChIKey: Std. InChI/InChIKey.
  52. Web Service: Chemical Structure REST Service (beta)
      • URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}
      • returns plain text or a GIF image; if the structure identifier is not resolvable: HTTP 404 status code
      • examples:
          • http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
          • http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
          • http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
          • http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
          • http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image
          • http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
          • http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey
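The URL scheme on slide 52 can be wrapped in a few lines of client code. The sketch below uses only the Python standard library and rests on assumptions: the service was labeled beta in this talk and may have changed since, `build_url` and `resolve` are hypothetical helper names, and `resolve` is meant for the plain-text methods only (not `image`). A 404 response is mapped to `None`, following the slide's statement about unresolvable identifiers.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given on slide 52.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier: str, method: str) -> str:
    """Build {base}/{identifier}/{method}, percent-encoding unsafe characters."""
    # "=" is kept unescaped so that "InChIKey=..." matches the slide examples.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"


def resolve(identifier: str, method: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a plain-text result; return None if the service answers 404."""
    try:
        with urlopen(build_url(identifier, method), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:
            return None  # identifier not resolvable, per the slide
        raise


# Building URLs needs no network access.
print(build_url("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "smiles"))
print(build_url("ethanol", "stdinchikey"))
print(build_url("64-17-5", "stdinchikey"))
```

A call such as `resolve("ethanol", "stdinchikey")` would perform the actual request; it is left out of the example run because it depends on the service being reachable.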
  53. Acknowledgments
      • ChemNavigator: Scott Hutton, Tad Hurst
      • CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov
      • CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt
      • Thanks to all database providers
      • Our web site: http://cactus.nci.nih.gov