ACS San Diego, March 2012, InChI Symposium

  • … but you can also do things independently of InChI. This is the general scheme: almost every identifier or representation can be converted to any other representation.
  • If I take these rules and any of these input structures here

    1. 1. Accessing NCI/CADD Web Resources by InChI. Markus Sitzmann, Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS
    2. 2. http://cactus.nci.nih.gov
    3. 3. Chemical Identifier Resolver (CIR): CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
    4. 4. Chemical Structure Representations (centered on the chemical structure): SYBYL Line Notation, SMILES, CAS Registry Number, chemical names, GIF image, ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number, MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID, Chemical Formula, PDB Ligand ID
    5. 5. Chemical Structure Representations (centered on InChI): SYBYL Line Notation, SMILES, CAS Registry Number, chemical names, GIF image, ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number, MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID, Chemical Formula, PDB Ligand ID
    6. 6. Chemical Structure Databases InChI many more …
    7. 7. Chemical Identifier Resolver (CIR): CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
    8. 8. Chemical Identifier Resolver (CIR): Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
        [Slide graphic: a structure drawn in the ChemWriter Editor, surrounded by example outputs: the formula C7H6O2, the InChIKey WPYMKLBDIGXBTP-FZOZFQFYNA-N, and an SD file (V2000 molfile with 15 atoms and 15 bonds, terminated by M END and $$$$)]
    9. 9. Chemical Identifier Resolver (CIR): Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
        [Slide graphic: a structure drawn in the ChemWriter Editor, surrounded by example names: benzoic acid; 65-85-0; WLN: QVR; WPYMKLBDIGXBTP-FZOZFQFYNA-N; Unisept BZA; AIDS018010; Salvo liquid; Benzoic acid-ring-UL-14C; ST5213864; Benzoesaeure; CHEBI:30746; NSC 149; benzenecarboxylic acid; phenylformic acid; Benzoic acid (JP15/USP); Benzoic acid (TN); 18102_RIEDEL; Aromatic hydroxy acid; Benzoic acid (7CI,8CI,9CI); Benzoic acid [USAN:JAN]; W213128_ALDRICH; 47849_SUPELCO; Acide benzoique [French]; Acido benzoico [Italian]; Benzoate (VAN); Benzoesaeure [German]; Benzoic acid (natural); Acide benzoique; Benzeneformic acid; Benzenemethanoic acid; Benzoesaeure GK; Benzoesaeure GV; Benzoic acid, tech.; Carboxybenzene; Kyselina benzoova; Phenylcarboxylic acid]
    10. 10. Chemical Identifier Resolver (CIR): Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
        [Slide graphic: a structure drawn in the ChemWriter Editor (WPYMKLBDIGXBTP-FZOZFQFYNA-N), with example outputs]
        • InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
        • InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
        • SMILES: C1=CC=C(C=C1)C(O)=O
    11. 11. Chemical Identifier Resolver (CIR)
        • programmatic URL API: http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
        • if a request is not successful: HTTP 404 status message
    12. 12. Chemical Identifier Resolver (CIR)
        • programmatic URL API: http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
        • if a request is not successful: HTTP 404 status message
        • examples:
            http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
            returns 204255-11-8 (MIME type: text/plain)
            http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/image
            returns a structure image (MIME type: image/gif)
    13. 13. Chemical Identifier Resolver (CIR)
        • access by programming libraries/languages (e.g. Python):
            from urllib.request import urlopen
            from urllib.error import HTTPError

            url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
            try:
                # urlopen itself raises HTTPError (e.g. 404) for an unsuccessful request
                response = urlopen(url).read().decode()
            except HTTPError:
                raise  # replace with your own error handling
            print(response)

            204255-11-8
        • access from Unix shell level (e.g., via wget):
            shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
            204255-11-8
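The URL scheme and the HTTP 404 convention described on the preceding slides can be wrapped in two small helpers. This is only a sketch: the function names `cir_url` and `resolve`, the timeout value, and the choice to percent-encode the identifier are assumptions for illustration, not part of CIR itself.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a request URL of the form BASE_URL/identifier/representation."""
    # quote() keeps "/" (needed for InChI strings) but escapes spaces and
    # characters such as "#" that would otherwise end the URL path.
    return "{}/{}/{}".format(BASE_URL, quote(identifier), representation)


def resolve(identifier, representation, timeout=30):
    """Return the response as a list of lines, or None on HTTP 404."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as reply:
            return reply.read().decode("utf-8").splitlines()
    except HTTPError as err:
        if err.code == 404:
            return None  # the identifier could not be resolved
        raise  # any other HTTP error is left to the caller
```

With these helpers, the slide's example corresponds to `resolve("tamiflu", "cas")`; returning a list of lines allows for responses that contain more than one result.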
    14. 14. Chemical Identifier Resolver: InChI/InChIKey can be linked to: database RegIDs (PubChem, ZINC, eMolecules, ChemSpider ID); structure images (GIF, PNG); (trivial) names; SMILES; InChI/InChIKey; IUPAC names (OPSIN); chemical properties (MW, formula, …); CAS Registry numbers; structure files (sdf, pdb, cdx, …)
    15. 15. Chemical Identifier Resolver (CIR): http://cactus.nci.nih.gov/chemical/structure
        • "identifier": chemical names, IUPAC names (OPSIN), CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ZINC Code, ChemSpider ID, ChemNavigator SID, eMolecule VID
        • "representation": /smiles, /names, /iupac_name, /cas, /inchi, /stdinchi, /inchikey, /stdinchikey, /ficts, /ficus, /uuuuu, /image, /file, /sdf, /mw, /monoisotopic_mass, /formula, /twirl, /urls, /chemspider_id, /pubchem_sid, /chemnavigator_sid
    16. 16. Chemical Identifier Resolver (CIR)
        [Flow diagram] HTTP request (identifier, representation) → detection of the identifier type (e.g. CAS number, structure, chemical name), then one of two branches:
        • identifier is a full structure representation (e.g. SMILES, InChI)
        • identifier is a hashed structure representation (e.g. InChIKey), chemical name etc.: database lookup (CSDB)
        → calculation of the requested structure representation → HTTP response (MIME type; e.g. InChI, GIF image)
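The "detection of the identifier type" step can be illustrated with a few pattern checks. The slide does not show how CIR actually does this, so the regular expressions, the type labels, and the function names below are assumptions; only the InChIKey layout and the CAS check-digit rule are general facts about those identifier formats.

```python
import re

# Illustrative patterns only; these are assumptions, not CIR's actual rules.
INCHIKEY = re.compile(r"(?:InChIKey=)?[A-Z]{14}-[A-Z]{8}([SN])A-[A-Z]")
INCHI = re.compile(r"InChI=1S?/.+")
CAS = re.compile(r"(\d{2,7})-(\d{2})-(\d)")
NCICADD = re.compile(r"[0-9A-F]{16}-(?:FICTS|FICuS|uuuuu)")


def cas_checksum_ok(identifier):
    """Validate the CAS check digit: weighted digit sum modulo 10."""
    match = CAS.fullmatch(identifier)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def detect_identifier_type(identifier):
    """Return a coarse type label for an identifier string."""
    identifier = identifier.strip()
    key = INCHIKEY.fullmatch(identifier)
    if key:
        # The flag character before the version letter is "S" for Standard InChIKeys.
        return "stdinchikey" if key.group(1) == "S" else "inchikey"
    if INCHI.fullmatch(identifier):
        return "inchi"
    if cas_checksum_ok(identifier):
        return "cas_number"
    if NCICADD.fullmatch(identifier):
        return "ncicadd_identifier"
    return "unknown"  # names and SMILES need more than a regular expression


def route(identifier):
    """Mirror the two branches of the diagram: calculate directly or look up."""
    kind = detect_identifier_type(identifier)
    if kind == "inchi":
        return "calculate"  # full structure representation
    if kind == "unknown":
        return "unknown"
    return "lookup"  # hashed representation or registry number
```

Distinguishing chemical names from SMILES strings is deliberately left out, since that requires actual parsing rather than pattern matching.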
    17. 17. Chemical Structure Database (CSDB)
        • ChemNavigator iResearch Library: compilation of commercially available screening compounds from ~300 international chemistry suppliers
        • PubChem database: including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
        • Commercial sources / others: Asinex, Comgenex, eMolecules, …
        [Pie chart with slices labeled PubChem, ChemNav. iResearch Lib., and others; values shown: ~38%, ~56%, ~6%]
        current status (as of March 2010): 140 chemical structure databases; 120 million structure records; 84.6 million unique structures by FICuS; 110 million Standard InChIKeys for lookup
    18. 18. Chemical Structure Database (Update 2012)
        • PubChem Substance & Compound as separate databases (both updated to 2012)
        • ChemNavigator iResearch Library: updated to 2012
        • new databases, e.g. Therapeutic Target Database (TTD), Human Metabolome Database (HMDB), DrugBank
        • "pull" download of databases also available in PubChem, e.g. DSSTox, ZINC 2012/01, ChEBI 2012/01, ChEMBL13, ChemIDplus 2012/01
        • to a limited extent, "historic versions" of databases are archived, e.g. a comparison of PubChem Substance 2007 vs 2012 will be possible
    19. 19. Chemical Structure Database (CSDB): Chemical Structure Normalization
        • calculation of a set of parent structures with different sensitivity to chemical features
        [Diagram: original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier → database]
        • both the original structure record & the normalized parent structures are archived in the database
    20. 20. Chemical Structure Database (CSDB): NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
        • based on CACTVS hashcodes (HASHISY)
        • 16-digit hexadecimal number (64-bit unsigned), e.g. 9850FD9F9E2B4E25
        • structure normalization, example histidine [structure drawings omitted; columns labeled histidine, tautomer, salt, R, S]:
            FICTS: 9850FD9F9E2B4E25-FICTS, 6C16DE2351F9FF50-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS
            FICuS: 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS
            uuuuu: 9850FD9F9E2B4E25-uuuuu
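Because an NCI/CADD Identifier is described as a 16-digit hexadecimal rendering of a 64-bit unsigned hashcode plus a type suffix, it can be split and re-assembled mechanically. The sketch below assumes exactly that textual layout; the names `NciCaddIdentifier`, `parse_identifier`, and `format_identifier` are invented for illustration and are not CACTVS functions.

```python
import re
from typing import NamedTuple

IDENTIFIER = re.compile(r"([0-9A-F]{16})-(FICTS|FICuS|uuuuu)")


class NciCaddIdentifier(NamedTuple):
    hashcode: int  # the 64-bit unsigned hashcode
    kind: str      # "FICTS", "FICuS" or "uuuuu"


def parse_identifier(text):
    """Split e.g. '9850FD9F9E2B4E25-FICTS' into hashcode and identifier type."""
    match = IDENTIFIER.fullmatch(text.strip())
    if match is None:
        raise ValueError("not an NCI/CADD identifier: {!r}".format(text))
    return NciCaddIdentifier(int(match.group(1), 16), match.group(2))


def format_identifier(hashcode, kind):
    """Render a hashcode as 16 upper-case hex digits plus the type suffix."""
    if not 0 <= hashcode < 2 ** 64:
        raise ValueError("hashcode does not fit into 64 bits")
    if kind not in ("FICTS", "FICuS", "uuuuu"):
        raise ValueError("unknown identifier type: {!r}".format(kind))
    return "{:016X}-{}".format(hashcode, kind)
```

Comparing the `hashcode` fields of two parsed identifiers of the same kind is then enough to decide whether two records share a parent structure at that sensitivity level.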
    21. 21. Chemical Structure Database (Update 2012)
        • unique structure count (based on CACTVS hashcodes, HASHISY): FICTS ~118 million; FICuS ~115 million; uuuuu ~100 million
        • 231 small-molecule databases
        • 367 database releases (full, incremental, "historic versions")
        • 324 million original database records
        [Background: the histidine normalization example from slide 20]
    22. 22. Chemical Structure Database (Update 2012): InChI/InChIKey
        InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:
        • CACTVS Standard: Add H, Standard InChIKey
        • Set 1: Add H, DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
        • Set 2: Add H, DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
        • Set 3: Add H, DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
        Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
        Set 3: addition of hydrogen atoms by the InChI library
    23. 23. Chemical Structure Database (Update 2012): InChI/InChIKey
        • calculation of InChI/InChIKey Standard set, Set 1, Set 2 & Set 3 for all original structure records and normalized parent structures
        [Diagram: original structure record → structure normalization → parent structure (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier; InChI/InChIKey Standard, Set 1, Set 2, Set 3]
    24. 24. Using CIR with InChI/InChIKey
    25. 25. Using CIR with InChI/InChIKey: (Partial) InChIKey Lookup
        • resolve Standard InChIKey into full structure representation (example: Ethanol):
            http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
            CCO
            http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles
            CCO
            CC[OH2+]
            http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles
            C(C(O)([2H])[2H])[2H]
            CC(O)([2H])[2H]
            C(CO)([2H])([2H])[2H]
            CC[17OH]
            C(CO)[2H]
            [14CH3]CO
            CCO
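The three lookups on this slide differ only in how many hyphen-separated blocks of the InChIKey are sent. A small helper can generate all three request URLs from one full key; the function name `partial_inchikey_urls` is hypothetical, and the sketch only builds the URLs without contacting the server.

```python
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def partial_inchikey_urls(inchikey, representation="smiles"):
    """Return request URLs for the full key, the first two blocks, and the first block."""
    key = inchikey.strip()
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    blocks = key.split("-")
    if len(blocks) != 3:
        raise ValueError("expected an InChIKey with three blocks: {!r}".format(inchikey))
    # Fewer blocks make the lookup less specific: dropping the last block
    # ignores the protonation flag, keeping only the first block matches
    # on the connectivity hash alone.
    variants = ["-".join(blocks[:count]) for count in (3, 2, 1)]
    return ["{}/{}/{}".format(BASE_URL, v, representation) for v in variants]
```

For the ethanol key this reproduces the three URLs shown on the slide, in order of decreasing specificity.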
    26. 26. Using CIR with InChI/InChIKey: Chemical File Representation
        • example (Aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/file?format=sdf
        • available file format representations:
            alc: Alchemy format
            cdxml: CambridgeSoft ChemDraw XML format
            cerius: MSI Cerius II format
            charmm: Chemistry at HARvard Macromolecular Mechanics file format
            cif: Crystallographic Information File
            cml: Chemical Markup Language
            gjf: Gaussian input data file
            gromacs: GROMACS file format
            hyperchem: HyperChem file format
            jme: Java Molecule Editor format
            maestro: Schroedinger MacroModel structure file format
            mol: Symyx molecule file
            sybyl2/mol2: Tripos Sybyl MOL2 format
            mrv: ChemAxon MRV format
            pdb: Protein Data Bank
            sdf: Symyx Structure Data Format
            sdf3000: Symyx Structure Data Format 3000
            sln: SYBYL Line Notation
            smiles: SMILES
            xyz: xyz file format
    27. 27. Using CIR with InChI/InChIKey: Chemical Structure Images (GIF, PNG)
        • Buckyball: http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white
        • Aspirin: http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
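Image requests like these are an ordinary URL plus a query string, so the standard library can assemble them. In this sketch the helper name `image_url` is an assumption; the option names (`height`, `width`, `bgcolor`, `bondcolor`) are the ones that appear on the slide, and no claim is made about which other options the service accepts.

```python
from urllib.parse import urlencode

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier, **options):
    """Build an /image request URL; keyword arguments become query parameters."""
    url = "{}/{}/image".format(BASE_URL, identifier)
    if options:
        # urlencode keeps the keyword order and escapes values where needed.
        url += "?" + urlencode(options)
    return url
```

Calling `image_url("XMWRBQBLMFGWIX-UHFFFAOYSA-N", height=300, width=300, bgcolor="black", bondcolor="white")` yields the Buckyball URL from the slide.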
    28. 28. Using CIR with InChI/InChIKey: 3D Chemical Structure Visualization (TwirlyMol)
        • simple JavaScript that allows you to render a rotatable/zoomable 3D representation of a molecule in your web browser
        • implemented by Noel O'Boyle (University College Cork, Ireland)
        • no plugin is needed, only a modern browser: Chrome Safari FF3.6+ IE9 IE8 IE7 IE6
    29. 29. Using CIR with InChI/InChIKey: 3D Chemical Structure Visualization (TwirlyMol)
        • simple viewer (Restasis): http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl
        • embedded into a web page:
            <div id="canvas" height="400" width="400"></div>
            <script src="http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl_cached/canvas"></script>
    30. 30. Using CIR with InChI/InChIKey: 3D Chemical Structure Visualization (TwirlyMol): http://baoilleach.blogspot.com/ http://www.coronene.com/blog/ http://chemical-quantum-images.blogspot.com
    31. 31. Using CIR with InChI/InChIKey: Chemical Database URLs
        • request database URLs (Restasis): http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/urls/xml
            <?xml version="1.0" encoding="UTF-8" ?>
            <request string="DDPJWUQJQMKQIF-XPNZOOHZSA-N" representation="urls">
              <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
                <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider"> http://chemspider.com/structure.4939506 </item>
                <item id="2" classification="exact" database="ChemSpider" publisher="PubChem"> http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058 </item>
                <item id="3" classification="exact" database="NLM ChemIDplus" publisher="NLM"> http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?result=advanced&regno=059865133
                […]
              </data>
            </request>
    32. 32. Using CIR with InChI/InChIKey: Chemical Name Lookup
        • request (alternative) names (Aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/names/xml
            <?xml version="1.0" encoding="UTF-8" ?>
            <request string="BSYNRYMUTXBXSQ-UHFFFAOYSA-N" representation="names">
              <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
                <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
                <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
                <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
                <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
                <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
                <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
                <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
                <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
                <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
                <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
                <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
                <item id="12" classification="pubchem_substance_synonym">D00109</item>
                […]
    33. 33. Using CIR with InChI/InChIKey: Chemical Properties
        • request molecular weight (Aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight
            180.1598 (MIME type: text/plain)
        • available properties:
            /mw: molecular weight
            /formula: formula
            /monoisotopic_mass: monoisotopic mass
            /h_bond_donor_count: H bond donor count
            /h_bond_acceptor_count: H bond acceptor count
            /h_bond_center_count: H bond center count
            /rotor_count: number of rotatable bonds
            /effective_rotor_count: number of effectively rotatable bonds
            /rule_of_5_violation_count: number of Rule-of-5 violations
            /xlogp2: octanol-water partition coefficient XLOGP2
            /aromatic: compound is aromatic
            /macrocyclic: compound is macrocyclic
            /heteroatom_count: heteroatom count
            /hydrogen_atom_count: H atom count
            /heavy_atom_count: heavy atom count
            /deprotonable_group_count: number of deprotonable groups
            /protonable_group_count: number of protonable groups
            /ring_count: number of rings
            /ringsys_count: number of ringsystems
    34. 34. Using CIR with InChI/InChIKey: Chemical Name Pattern Search
        • Google-like searches on CIR's name index (approx. 70 million names)
        • example: all chemical names that contain the words "morphine" and "methyl" (name pattern: '+morphine +methyl'):
            http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
        • based on the open source full text search server Sphinx (http://sphinxsearch.com)
    35. 35. Search name pattern '+morphine +methyl': 7 matching names
            <request string="+morphine +methyl" representation="stdinchikey">
              <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether"> <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item> </data>
              <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine"> <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item> </data>
              <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-"> <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item> </data>
              <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER"> <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item> </data>
              <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether"> <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item> </data>
              <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride"> <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item> </data>
              <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-"> <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item> </data>
            </request>
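An XML response of this shape can be turned into (name, InChIKey) pairs with Python's standard `xml.etree.ElementTree` module. The sketch below parses a shortened copy of the response shown on the slide rather than calling the service; the function name `parse_name_pattern_response` and the `SAMPLE` constant are illustrative, and the element and attribute names are taken from the slide.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown on the slide (first two hits only).
SAMPLE = """<request string="+morphine +methyl" representation="stdinchikey">
  <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
    <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
  </data>
  <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
    <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
  </data>
</request>"""


def parse_name_pattern_response(xml_text):
    """Return a list of (matched name, item text) tuples from a CIR XML reply."""
    root = ET.fromstring(xml_text)
    hits = []
    for data in root.iter("data"):
        name = data.get("notation")  # the chemical name that matched the pattern
        for item in data.iter("item"):
            hits.append((name, (item.text or "").strip()))
    return hits
```

Each `<data>` element may in principle carry several `<item>` children, which is why the result is a flat list of pairs rather than a dictionary keyed by name.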
    36. 36. Using CIR with InChI/InChIKey: Chemical Name Pattern Search
        • example: chemical names that contain the words "morphine" and "methyl" but not "hydroxyl" (name pattern: '+morphine +methyl -hydroxyl'):
            http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
            6 matching names
        • example: chemical names that contain the substring "morphine" somewhere in the name (name pattern: '*morphine*'):
            http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
            45 matching names
        • example: chemical names that contain a single character "m" and the word "benzene" within a maximum distance of 3 words (finds meta-substituted aromatic compounds, name pattern: '"m benzene"~3'):
            http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
            22 matching names
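The pattern URLs above contain spaces and operator characters, which most HTTP clients will not send unescaped. One way to handle this, assuming the server applies standard URL decoding to the path (the slides do not say so), is to percent-encode the pattern before inserting it. The helper name `name_pattern_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build a name-pattern search URL with the pattern percent-encoded."""
    # safe="" escapes every reserved character, including "+", "*" and spaces,
    # so the pattern stays a single path segment.
    path = quote(pattern, safe="")
    query = urlencode({"resolver": "name_pattern"})
    return "{}/{}/{}/xml?{}".format(BASE_URL, path, representation, query)
```

Encoding "+" as `%2B` matters here because a literal "+" is sometimes decoded as a space, which would silently change the search operators.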
    37. 37. Structure Normalization (Tautomerism)
    38. 38. Structure Normalization: Tautomerism
        21 SMIRKS transform rules:
            rule 1: 1.3 (thio)keto/(thio)enol
            rule 2: 1.5 (thio)keto/(thio)enol
            rule 3: simple (aliphatic) imine
            rule 4: special imine
            rule 5: 1.3 aromatic heteroatom H shift
            rule 6: 1.3 heteroatom H shift
            rule 7: 1.5 (aromatic) heteroatom H shift (1)
            rule 8: 1.5 aromatic heteroatom H shift (2)
            rule 9: 1.7 (aromatic) heteroatom H shift
            rule 10: 1.9 (aromatic) heteroatom H shift
            rule 11: 1.11 (aromatic) heteroatom H shift
            rule 12: furanones
            rule 13: ketene/ynol exchange
            rule 14: ionic nitro/aci-nitro
            rule 15: pentavalent nitro/aci-nitro
            rule 16: oxime/nitroso
            rule 17: oxime/nitroso via phenol
            rule 18: cyanic/iso-cyanic acids
            rule 19: formamidinesulfinic acids
            rule 20: isocyanides
            rule 21: phosphonic acids
    39. 39. Structure Normalization: Tautomerism
        • rule 1: 1.3 (thio)keto/(thio)enol [reaction scheme with atoms numbered 1 to 4 omitted]
            [O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
        • rule 6: 1.3 heteroatom H shift [reaction scheme with atoms numbered 1 to 4 omitted]
            [N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
    40. 40. Structure Normalization: Warfarin Tautomers
        [Figure: structure drawings of warfarin tautomers; label: prototropic tautomerism]
    41. 41. Structure Normalization: Warfarin Tautomers
        [Figure: structure drawings of warfarin tautomers; label: prototropic tautomerism]
        http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/representation
    42. 42. Structure Normalization: Warfarin, FICuS Identifier
        [Figure: nine warfarin structures related by prototropic tautomerism, each labeled D76B88C0354759F1-FICuS]
        http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
    43. 43. Structure Normalization: Warfarin, FICuS Identifier
        [Figure: twelve warfarin structures in a 3 × 4 grid; regions labeled prototropic tautomerism and ring-chain tautomerism]
        • nine structures labeled D76B88C0354759F1-FICuS
        • two structures labeled 09BB2FAADA1508A7-FICuS
        • one structure labeled 2F505A3FCA434B3C-FICuS
        http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
    44. 44. Structure Normalization: Warfarin, Standard InChIKey
        [Figure: the same twelve warfarin structures (prototropic and ring-chain tautomerism), each with its own Standard InChIKey]
            row 1: QTXVAVXCBMYBJW-UHFFFAOYSA-N, VWSXIGYSLWNCBN-VAWYXSNFSA-N, GRAAPKVUSREWIL-UHFFFAOYSA-N, LSCYDZJASSKSMJ-UHFFFAOYSA-N
            row 2: FQEPJUOLUDFINX-UHFFFAOYSA-N, UCKRWKACBKRIKB-VAWYXSNFSA-N, NNLYDNMZCAHUOV-UHFFFAOYSA-N, XGIOTBZTMHLTRL-UHFFFAOYSA-N
            row 3: PJVWKTKQMONHTI-UHFFFAOYSA-N, FVSFCRPKSVCTBA-VAWYXSNFSA-N, BBOSKMPTDUUMKL-UHFFFAOYSA-N, QUJJIKXCACZKKD-UHFFFAOYSA-N
        http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/stdinchikey
    45. 45. Structure Normalization: Warfarin, InChIKey
        [Figure: the same twelve warfarin structures (prototropic and ring-chain tautomerism), labeled with non-standard InChIKeys]
            row 1: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, LSCYDZJASSKSMJ-UHFFFAOYNA-N
            row 2: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, FQOKLKCGRHFANU-UHFFFAOYNA-N
            row 3: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, FQOKLKCGRHFANU-UHFFFAOYNA-N
        InChIKey (W0 RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T)
    46. 46. Structure Normalization: Warfarin
        • "normalize" Standard InChIKey by NCI/CADD's business rules:
            http://cactus.nci.nih.gov/chemical/structure/normalize:QTXVAVXCBMYBJW-UHFFFAOYSA-N/stdinchikey
            InChIKey=FQEPJUOLUDFINX-UHFFFAOYSA-N (MIME type: text/plain)
        [Figure: structure QTXVAVXCBMYBJW-UHFFFAOYSA-N is normalized to structure FQEPJUOLUDFINX-UHFFFAOYSA-N]
    47. 47. Structure Normalization: Chemical Operators
        • available operators: add_hydrogens, remove_hydrogens, normalize, ficts, ficus, uuuuu, scaffold_sequence, nostereo, stereoisomers, tautomers
        • example: http://cactus.nci.nih.gov/chemical/structure/scaffold_sequence:FQEPJUOLUDFINX-UHFFFAOYSA-N/stdinchikey
            [Figure: three scaffold structures] XVYBSGQBRUYLNK-UHFFFAOYSA-N, BQLSCAPEANVCOG-UHFFFAOYSA-N, MERGMNQXULKBCH-UHFFFAOYSA-N
        Schuffenhauer et al., J. Chem. Inf. Model. 2007, 47, 47-58
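In the URLs on these slides, an operator is applied by prefixing the identifier with `operator:`. A small helper can build such URLs and reject operator names that are not on the slide's list. The function name `operator_url` is an assumption, and the operator set is copied from the slide (reading `add_hyrogens` as `add_hydrogens`); it may not match what the live service accepts.

```python
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

# Operator names as listed on the slide.
OPERATORS = frozenset([
    "add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus",
    "uuuuu", "scaffold_sequence", "nostereo", "stereoisomers", "tautomers",
])


def operator_url(operator, identifier, representation):
    """Build a URL of the form BASE_URL/operator:identifier/representation."""
    if operator not in OPERATORS:
        raise ValueError("unknown operator: {!r}".format(operator))
    return "{}/{}:{}/{}".format(BASE_URL, operator, quote(identifier), representation)
```

For example, `operator_url("tautomers", "warfarin", "ficus")` reproduces the warfarin URL used on the tautomer slides.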
    48. 48. Soon: Chemical File Resolver (CFR)
    49. 49. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • allows conversion of many chemical file formats into another format or other representations
        • will have a programmatic URL API & an HTML Web interface
        • url'izes all elements of the original file, i.e. provides access to each specific record, field, and any metadata (size, record count, etc.) of the posted file by URLs
        • release: Q2/2012 (hopefully)
    50. 50. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • HTTP: post a file (e.g. with curl), CFR replies with an MD5 hash key:
            curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/chemical/file
            >d85b396ed6ced6348a5b402eb8fcfe8b
        • accepted formats:
            • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
            • text files with a list of identifiers …
    51. 51. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • post a plain text file, e.g.:
            ethanol
            aspirin
            InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3
            CCOCC
            InChIKey=RCINICONZNJXQF-MZXODVADSA-N
            InChIKey=QTXVAVXCBMYBJW-UHFFFAOYSA-N
            204255-11-8
            tautomers:guanine
            ChemSpider_ID=1234
            Pubchem_SID=456
        • after posting a file, CFR replies with an MD5 hash sum:
            curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/TEST/chemical/file
            >d85b396ed6ced6348a5b402eb8fcfe8b
        • accepted formats:
            • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
            • text files with a list of identifiers
    52. 52. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • request new file format using the obtained MD5 hash key (d85b396ed6ced6348a5b402eb8fcfe8b):
            curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?format={sdf, smi, pdb, cml, …}
    53. 53. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • request records 2 and 5 as SMILES strings (key: d85b396ed6ced6348a5b402eb8fcfe8b):
            curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?records=2,5&format=smiles
    54. 54. Chemical File Resolver (CFR)
        [Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]
        • get field names:
            curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/fields
        • get a specific field value from record n:
            curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/n/{field_name}
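The CFR URL patterns on the last three slides can be collected into a few builder functions. This is purely a sketch of the scheme as shown: the slides describe a pre-release test endpoint, so the base URL, the parameter names, and whether the service exists in this form are all assumptions taken from the slide text, and the function names are invented.

```python
import re
from urllib.parse import quote, urlencode

# Test endpoint as printed on the slides; treated here as an assumption.
CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"
MD5_KEY = re.compile(r"[0-9a-f]{32}")


def _check_key(key):
    """Ensure the key looks like the MD5 hex digest returned after posting."""
    if MD5_KEY.fullmatch(key) is None:
        raise ValueError("not an MD5 hash key: {!r}".format(key))
    return key


def converted_file_url(key, fmt, records=None):
    """URL for the posted file in another format, optionally for selected records."""
    params = {}
    if records:
        params["records"] = ",".join(str(r) for r in records)
    params["format"] = fmt
    # safe="," keeps the record list readable, as on the slide (records=2,5).
    return "{}/{}?{}".format(CFR_BASE, _check_key(key), urlencode(params, safe=","))


def fields_url(key):
    """URL listing the field names of the posted file."""
    return "{}/{}/fields".format(CFR_BASE, _check_key(key))


def field_value_url(key, record, field_name):
    """URL for one field value of record number `record`."""
    return "{}/{}/{}/{}".format(CFR_BASE, _check_key(key), int(record), quote(field_name))
```

The key check is only a client-side sanity test; it says nothing about whether the server still holds a file for that key.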
    55. 55. Chemical Structure Web API
        [Architecture diagram, labels: external web services; Chemical Identifier Resolver; Chemical File Resolver; NCI/CADD web service; http; Chemical Structure Web API; CACTVS; OPSIN; other software packages; NCI/CADD Chemical Structure Database (CSDB)]
    56. 56. IUPAC InChI/InChIKey Resolver
        • (hopefully) there will be many resolvers from different providers with different backgrounds:
            • publishers
            • commercial databases
            • free sources and databases: ChemSpider, PubChem, ChEBI, …
        • InChI/InChIKey is the perfect tool to interlink the resolvers
        • ChemSpider, PubChem and NCI/CADD are working on a test protocol for a federated InChI/InChIKey resolver
    57. 57. IUPAC InChI/InChIKey Resolver
        [Diagram labels: Clients; IUPAC Root Resolver; Resolver 1; Resolver 2; Resolver 3; Resolver 3.1; Resolver 3.2; Resolver 3.3; CIR]
    58. 58. http://cactus.nci.nih.gov
    59. 59. http://cactus.nci.nih.gov/blog
    60. 60. Acknowledgments
        • The InChI Team
        • NCI/CADD Team: Igor Filippov, Marc Nicklaus
        • University of Cambridge, UK: Daniel Lowe
        • Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt
        • University College Cork, Ireland: Noel O'Boyle
        • All database providers
        • ChemNavigator: Scott Hutton, Tad Hurst
    61. 61. Acknowledgments - Software CACTVS Python Web Framework ChemWriter Python SQL Library Peter Ertl (Novartis) Javascript library Fulltext Search Engine