1. Accessing NCI/CADD Web Resources by InChI
Markus Sitzmann
Computer-Aided Drug Design Group, Chemical Biology Laboratory,
Frederick National Laboratory for Cancer Research, NIH, DHHS
3. Chemical Identifier Resolver (CIR)
CIR works as a resolver for different
chemical structure identifiers or
representations.
It allows one to convert a given
structure identifier into another
representation or structure
identifier.
http://cactus.nci.nih.gov/chemical/structure
4. Chemical Structure Representations
(diagram: a chemical structure at the center, surrounded by the following identifiers and representations)
SYBYL Line Notation
SMILES
CAS Registry Number
chemical names
GIF image
ChemNavigator SID
SD File
CML
FDA UNII
NCI/CADD Identifiers
NSC number
MRV
InChI/InChIKey
PubChem SID/CID
ChemSpider ID
ChEBI ID
Chemical Formula
PDB Ligand ID
5. Chemical Structure Representations
(diagram: the same set of identifiers and representations as on slide 4, now with InChI at the center)
9. Chemical Identifier Resolver (CIR)
Works as a resolver for different chemical structure identifiers.
Allows one to convert a given structure identifier into another representation or structure identifier.
http://cactus.nci.nih.gov/chemical/structure
Input: ChemWriter Editor, names
Example identifiers and names (benzoic acid):
benzoic acid
65-85-0
WLN: QVR
WPYMKLBDIGXBTP-FZOZFQFYNA-N
Unisept BZA
AIDS018010
Salvo liquid
Benzoic acid-ring-UL-14C
ST5213864
Benzoesaeure
CHEBI:30746
NSC 149
benzenecarboxylic acid
phenylformic acid
Benzoic acid (JP15/USP)
Benzoic acid (TN)
18102_RIEDEL
Aromatic hydroxy acid
Benzoic acid (7CI,8CI,9CI)
Benzoic acid [USAN:JAN]
W213128_ALDRICH
47849_SUPELCO
Acide benzoique [French]
Acido benzoico [Italian]
Benzoate (VAN)
Benzoesaeure [German]
Benzoic acid (natural)
Acide benzoique
Benzeneformic acid
Benzenemethanoic acid
Benzoesaeure GK
Benzoesaeure GV
Benzoic acid, tech.
Carboxybenzene
Kyselina benzoova
Phenylcarboxylic acid
10. Chemical Identifier Resolver (CIR)
Works as a resolver for different chemical structure identifiers.
Allows one to convert a given structure identifier into another representation or structure identifier.
http://cactus.nci.nih.gov/chemical/structure
Input: ChemWriter Editor
Example identifiers (benzoic acid):
InChIKey: WPYMKLBDIGXBTP-FZOZFQFYNA-N
InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
SMILES: C1=CC=C(C=C1)C(O)=O
11. Chemical Identifier Resolver (CIR)
programmatic URL API:
http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
if a request is not successful: HTTP 404 status message
12. Chemical Identifier Resolver (CIR)
programmatic URL API:
http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
if a request is not successful: HTTP 404 status message
examples:
http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
204255-11-8 (MIME type: text/plain)
http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/image
(MIME type: image/gif)
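The URL pattern and the 404 convention above can be wrapped in a small helper. This is a minimal Python 3 sketch, not taken from the slides: the names `cir_url` and `resolve` are made up for illustration, and percent-encoding the identifier is an assumption about what the server accepts.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR request URL (hypothetical helper).

    The identifier is percent-encoded so that spaces in chemical names
    survive the trip; whether every identifier type tolerates this
    encoding is an assumption.
    """
    return f"{BASE}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation, timeout=30):
    """Return (mime_type, body_bytes), or None if CIR answers HTTP 404."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as resp:
            return resp.headers.get_content_type(), resp.read()
    except HTTPError as err:
        if err.code == 404:
            # the slides state that unsuccessful requests return HTTP 404
            return None
        raise
```

A call such as `resolve("PGZUMBJQJWIWGJ-ONAKXNSWSA-N", "cas")` would then be expected to yield a `text/plain` body, and the `image` representation an `image/gif` body, as listed on the slide.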
13. Chemical Identifier Resolver (CIR)
• access by programming libraries/languages (e.g. Python):
from urllib.request import urlopen
from urllib.error import HTTPError
url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
try:
    # urlopen itself raises HTTPError (e.g. 404) when CIR cannot resolve the request
    response = urlopen(url).read().decode("utf-8")
except HTTPError:
    raise  # put your own error handling here
print(response)
204255-11-8
• access from Unix shell level (e.g., via wget):
shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
204255-11-8
15. Chemical Identifier Resolver (CIR)
http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
accepted "identifier" types:
chemical names
IUPAC names (OPSIN)
CAS numbers
SMILES strings
IUPAC InChI/InChIKeys
NCI/CADD Identifiers
CACTVS HASHISY
NSC number
PubChem SID
ZINC Code
ChemSpider ID
ChemNavigator SID
eMolecule VID
available "representation" values:
/smiles
/names, /iupac_name
/cas
/inchi, /stdinchi
/inchikey, /stdinchikey
/ficts, /ficus, /uuuuu
/image
/file, /sdf
/mw, /monoisotopic_mass
/formula
/twirl
/urls
/chemspider_id
/pubchem_sid
/chemnavigator_sid
16. Chemical Identifier Resolver (CIR)
(flow diagram of a CIR request)
• http request: "identifier" and "representation"
• detection of the identifier type
• identifier is a full structure representation (e.g. SMILES, InChI)
• identifier is a hashed structure representation (e.g. InChIKey), chemical name etc.: database lookup in CSDB
• calculation of the requested structure representation (e.g. InChI, GIF image) or lookup (e.g. CAS number, chemical name)
• http response with MIME type
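The "detection of the identifier type" step can be pictured with a few pattern checks. The sketch below is purely illustrative and is not CIR's actual detection logic, which the slides do not show; the function name, the category labels, and the regular expressions are all assumptions, and SMILES detection is deliberately left out.

```python
import re

# 14 letters, 10 letters, 1 letter, optionally prefixed with "InChIKey="
INCHIKEY = re.compile(r"^(InChIKey=)?[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit
CAS_NUMBER = re.compile(r"^\d{2,7}-\d{2}-\d$")


def classify_identifier(identifier):
    """Very rough, hypothetical classification of an input identifier."""
    if identifier.startswith("InChI="):
        # full structure representation: no database lookup needed
        return "full structure"
    if INCHIKEY.match(identifier):
        # hashed representation: structure must come from a database lookup
        return "hashed structure"
    if CAS_NUMBER.match(identifier):
        return "registry number"
    # chemical names, SMILES, database IDs etc. are not distinguished here
    return "other"
```

In this simplified picture, only the "full structure" case could be answered by calculation alone; everything else would go through the database.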
17. Chemical Structure Database (CSDB)
• ChemNavigator iResearch Library
compilation of commercially available screening compounds from ~300 international chemistry suppliers
• PubChem database
including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
• Commercial Sources / others
Asinex, Comgenex, eMolecules, …
(pie chart; labels: PubChem, ChemNav. iResearch Lib., others; values: ~38%, ~56%, ~6%)
current status (as of March 2010):
140 chemical structure databases
120 million structure records
84.6 million unique structures by FICuS
110 million Standard InChIKeys for lookup
18. Chemical Structure Database (Update 2012)
• PubChem Substance & Compound as separate databases
(both updated to 2012)
• ChemNavigator iResearch Library: updated to 2012
• new databases, e.g.
• Therapeutic Target Database (TTD)
• Human Metabolome Database (HMDB)
• DrugBank
• “pull” download of databases also available in PubChem, e.g.
• DSSTox, ZINC 2012/01, ChEBI 2012/01, ChEMBL13,
ChemIDplus 2012/01
• to a limited extent, “historic versions” of databases are archived,
e.g. comparison of PubChem Substance 2007 vs 2012 will be
possible
19. Chemical Structure Database (CSDB)
Chemical Structure Normalization
• calculation of a set of parent structures with different sensitivity to chemical features:
(workflow diagram) original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier (FICTS, FICuS, uuuuu) → database
both the original structure record & the normalized parent structures are archived in the database
20. Chemical Structure Database (CSDB)
NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
based on CACTVS hashcodes (HASHISY)
16-digit hexadecimal number (64-bit unsigned), e.g. 9850FD9F9E2B4E25
structure normalization (figure: histidine, a tautomer, a sodium salt, and the R and S stereoisomers):
FICTS: histidine 9850FD9F9E2B4E25-FICTS, tautomer 6C16DE2351F9FF50-FICTS, salt E5F83F10C5DB080A-FICTS, R E92E4BA2869F3611-FICTS, S 8A7AD1EB498CC76A-FICTS
FICuS: histidine and tautomer 9850FD9F9E2B4E25-FICuS, salt E5F83F10C5DB080A-FICuS, R E92E4BA2869F3611-FICuS, S 8A7AD1EB498CC76A-FICuS
uuuuu: all five structures 9850FD9F9E2B4E25-uuuuu
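Since an NCI/CADD Identifier is a 16-digit hexadecimal hashcode followed by its type suffix, it can be split and validated mechanically. A minimal sketch, assuming the textual layout shown on the slide; the function name and the regular expression are illustrative and not part of any NCI/CADD library.

```python
import re

# 16 uppercase hex digits, a dash, and one of the three identifier types
NCICADD_ID = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)$")


def parse_ncicadd_identifier(text):
    """Split an identifier such as '9850FD9F9E2B4E25-FICuS'.

    Returns (hashcode_as_int, identifier_type); the integer always fits
    into an unsigned 64-bit value because it has exactly 16 hex digits.
    """
    match = NCICADD_ID.match(text)
    if match is None:
        raise ValueError(f"not an NCI/CADD identifier: {text!r}")
    return int(match.group(1), 16), match.group(2)
```

Comparing the integer parts of two identifiers of the same type then answers whether two records share a parent structure at that level of sensitivity.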
21. Chemical Structure Database (Update 2012)
Unique structure count:
based on CACTVS hashcodes (HASHISY)
16-digit hexadecimal number (64-bit unsigned)
FICTS ~118 million
FICuS ~115 million
uuuuu ~100 million
231 small-molecule databases
367 database releases (full, incremental, “historic versions”)
324 million original database records
22. Chemical Structure Database (Update 2012)
InChI/InChIKey
InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:
CACTVS
Standard : Add H Standard InChIKey
Set 1 : Add H DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
Set 2 : Add H DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
Set 3 : Add H DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T
Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
Set 3: addition of hydrogen atoms by the InChI library
23. Chemical Structure Database (Update 2012)
InChI/InChIKey
• calculation of InChI/InChIKey Standard set, Set 1, Set 2 & Set 3 for all original structure records and normalized parent structures:
(workflow diagram) original structure record → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier (FICTS, FICuS, uuuuu); InChI/InChIKey (Standard, Set 1, Set 2, Set 3) is attached to the original record and to each parent structure
25. Using CIR with InChI/InChIKey
(Partial) InChIKey Lookup
• resolve Standard InChIKey into full structure representation (example: Ethanol):
full InChIKey:
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
CCO
first two blocks of the InChIKey:
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles
CCO
CC[OH2+]
first block of the InChIKey only:
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles
C(C(O)([2H])[2H])[2H]
CC(O)([2H])[2H]
C(CO)([2H])([2H])[2H]
CC[17OH]
C(CO)[2H]
[14CH3]CO
CCO
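The three request URLs in the ethanol example differ only in how many dash-separated blocks of the InChIKey are kept. A short sketch of that truncation, with a made-up helper name; it only builds the URLs and does not contact the server.

```python
BASE = "http://cactus.nci.nih.gov/chemical/structure"


def partial_inchikey_urls(inchikey, representation="smiles"):
    """Return lookup URLs for the full key, its first two blocks, and its first block."""
    key = inchikey
    if key.startswith("InChIKey="):
        key = key[len("InChIKey="):]
    blocks = key.split("-")
    if len(blocks) != 3:
        raise ValueError(f"expected a three-block InChIKey, got {inchikey!r}")
    # progressively drop trailing blocks: 3 blocks, then 2, then 1
    partial_keys = ["-".join(blocks[:count]) for count in (3, 2, 1)]
    return [f"{BASE}/{partial}/{representation}" for partial in partial_keys]
```

As the ethanol example indicates, the shorter the key, the more structures (charged forms, isotopologues) the lookup can return.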
26. Using CIR with InChI/InChIKey
Chemical File Representation
• available file format representations:
Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/file?format=sdf
alc: Alchemy format
cdxml: CambridgeSoft ChemDraw XML format
cerius: MSI Cerius II format
charmm: Chemistry at HARvard Macromolecular Mechanics file format
cif: Crystallographic Information File
cml: Chemical Markup Language
gjf: Gaussian input data file
gromacs: GROMACS file format
hyperchem: HyperChem file format
jme: Java Molecule Editor format
maestro: Schroedinger MacroModel structure file format
mol: Symyx molecule file
sybyl2/mol2: Tripos Sybyl MOL2 format
mrv: ChemAxon MRV format
pdb: Protein Data Bank
sdf: Symyx Structure Data Format
sdf3000: Symyx Structure Data Format 3000
sln: SYBYL Line Notation
smiles: SMILES
xyz: xyz file format
27. Using CIR with InChI/InChIKey
Chemical Structure Images (GIF, PNG)
Buckyball
http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white
Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
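Image options are passed as ordinary query parameters, so they can be assembled from a dictionary. A small sketch using only the parameter names that appear in the two example URLs (`height`, `width`, `bgcolor`, `bondcolor`); the helper name is hypothetical and other parameters are not assumed.

```python
from urllib.parse import urlencode

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier, **options):
    """Build a CIR image URL; keyword arguments become query parameters."""
    url = f"{BASE}/{identifier}/image"
    if options:
        # urlencode keeps the keyword order and escapes values where needed
        url += "?" + urlencode(options)
    return url


buckyball = image_url(
    "XMWRBQBLMFGWIX-UHFFFAOYSA-N",
    height=300, width=300, bgcolor="black", bondcolor="white",
)
```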
28. Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)
simple javascript that allows you to render a rotatable/zoomable
3D representation of a molecule in your web browser
implemented by Noel O'Boyle (University College Cork, Ireland)
no plugin is needed, only a modern browser:
Chrome Safari FF3.6+ IE9 IE8 IE7 IE6
29. Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)
simple viewer: Restasis
http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl
embedded into a web page:
<div id="canvas" height="400" width="400"></div>
<script src="http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl_cached/canvas"></script>
30. Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)
http://baoilleach.blogspot.com/
http://www.coronene.com/blog/
http://chemical-quantum-images.blogspot.com
33. Using CIR with InChI/InChIKey
Chemical Properties
• request molecular weight:
Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight
180.1598 MIME type: text/plain
/mw: molecular weight
/formula: formula
/monoisotopic_mass: monoisotopic mass
/h_bond_donor_count: H bond donor count
/h_bond_acceptor_count: H bond acceptor count
/h_bond_center_count: H bond center count
/rotor_count: number of rotatable bonds
/effective_rotor_count: number of effectively rotatable bonds
/rule_of_5_violation_count: number of Rule-of-5 violations
/xlogp2: octanol−water partition coefficient XLOGP2
/aromatic: compound is aromatic
/macrocyclic: compound is macrocyclic
/heteroatom_count: heteroatom count
/hydrogen_atom_count: H atom count
/heavy_atom_count: heavy atom count
/deprotonable_group_count: number of deprotonable groups
/protonable_group_count: number of protonable groups
/ring_count: number of rings
/ringsys_count: number of ringsystems
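Because every property is just another "representation" in the URL, several of them can be collected in one loop. The following is a sketch under the assumption that each property request returns a short plain-text body; `fetch_text` and `property_table` are illustrative names, and the fetch function is injectable so the logic can be tried without network access.

```python
from urllib.request import urlopen

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def fetch_text(url, timeout=30):
    """Fetch a URL and return its body as stripped text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def property_table(identifier, properties, fetch=fetch_text):
    """Request each property for one identifier and return {property: raw text}."""
    return {name: fetch(f"{BASE}/{identifier}/{name}") for name in properties}
```

For example, `property_table("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", ["mw", "formula", "ring_count"])` would issue three requests for aspirin; values are kept as raw text because the response format of each property is not specified on the slide.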
34. Using CIR with InChI/InChIKey
Chemical Name Pattern Search
• Google-like searches on CIR’s name index (approx. 70 million names)
example: all chemical names that contain the words “morphine” and “methyl”
(name pattern: '+morphine +methyl'):
http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
based on the open source
full text search server Sphinx
(http://sphinxsearch.com)
36. Using CIR with InChI/InChIKey
Chemical Name Pattern Search
example: chemical names that contain the words “morphine” and “methyl”
but not “hydroxyl” (name pattern: '+morphine +methyl -hydroxyl'):
http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
6 matching names
example: chemical names that contain the substring “morphine”
somewhere in the name (name pattern: '*morphine*'):
http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
45 matching names
example: chemical names that contain a single character “m” and the word
“benzene” in a maximum distance of 3 words (finds meta-substituted aromatic
compounds, name pattern: '"m benzene"~3'):
http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
22 matching names
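Name patterns contain spaces, plus signs, and asterisks, which are awkward in a URL when sent from a program. One cautious way to build such requests is to percent-encode the pattern; this is a sketch with a hypothetical helper name, and it assumes the server decodes percent-encoded path segments in the usual way.

```python
from urllib.parse import quote, urlencode

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build a name-pattern search URL with XML output."""
    # safe='' also encodes '+', which would otherwise be ambiguous with a space
    encoded_pattern = quote(pattern, safe="")
    query = urlencode({"resolver": "name_pattern"})
    return f"{BASE}/{encoded_pattern}/{representation}/xml?{query}"
```

For the pattern `+morphine +methyl -hydroxyl`, the helper produces a path segment in which `+` becomes `%2B` and each space becomes `%20`.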
39. Structure Normalization
Tautomerism
rule 1: 1.3 (thio)keto/(thio)enol
(reaction scheme: 1.3 keto/enol, atoms numbered 1 to 4)
[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>
[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
rule 6: 1.3 heteroatom H shift
(reaction scheme: 1.3 heteroatom H shift, atoms numbered 1 to 4)
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>
[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
41. Structure Normalization
Warfarin - Tautomers
(figure: structures of the prototropic tautomers of warfarin)
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/representation
prototropic tautomerism
42. Structure Normalization
Warfarin – FICuS Identifier
(figure: nine prototropic tautomers of warfarin, each labeled with its FICuS identifier)
prototropic tautomerism: all nine structures share D76B88C0354759F1-FICuS
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
43. Structure Normalization
Warfarin – FICuS Identifier
(figure: prototropic and ring-chain tautomers of warfarin, each labeled with its FICuS identifier)
prototropic tautomerism: all nine structures share D76B88C0354759F1-FICuS
ring-chain tautomerism: three further structures with 09BB2FAADA1508A7-FICuS, 09BB2FAADA1508A7-FICuS and 2F505A3FCA434B3C-FICuS
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
44. Structure Normalization
Warfarin – Standard InChIKey
(figure: prototropic and ring-chain tautomers of warfarin, each labeled with its Standard InChIKey; one line per row of the figure)
QTXVAVXCBMYBJW-UHFFFAOYSA-N VWSXIGYSLWNCBN-VAWYXSNFSA-N GRAAPKVUSREWIL-UHFFFAOYSA-N LSCYDZJASSKSMJ-UHFFFAOYSA-N
FQEPJUOLUDFINX-UHFFFAOYSA-N UCKRWKACBKRIKB-VAWYXSNFSA-N NNLYDNMZCAHUOV-UHFFFAOYSA-N XGIOTBZTMHLTRL-UHFFFAOYSA-N
PJVWKTKQMONHTI-UHFFFAOYSA-N FVSFCRPKSVCTBA-VAWYXSNFSA-N BBOSKMPTDUUMKL-UHFFFAOYSA-N QUJJIKXCACZKKD-UHFFFAOYSA-N
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/stdinchikey
figure labels: prototropic tautomerism, ring-chain tautomerism
45. Structure Normalization
Warfarin – InChIKey
(figure: prototropic and ring-chain tautomers of warfarin, each labeled with its InChIKey; one line per row of the figure)
SAYISSDYYDIVTP-UHFFFAOYNA-N SAYISSDYYDIVTP-UHFFFAOYNA-N PMOPDASZKFXBOL-UHFFFAOYNA-N LSCYDZJASSKSMJ-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N SAYISSDYYDIVTP-UHFFFAOYNA-N PMOPDASZKFXBOL-UHFFFAOYNA-N FQOKLKCGRHFANU-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N SAYISSDYYDIVTP-UHFFFAOYNA-N PMOPDASZKFXBOL-UHFFFAOYNA-N FQOKLKCGRHFANU-UHFFFAOYNA-N
InChIKey (W0 RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T)
figure labels: prototropic tautomerism, ring-chain tautomerism
46. Structure Normalization
Warfarin
• “normalize” Standard InChIKey by NCI/CADD’s business rules:
http://cactus.nci.nih.gov/chemical/structure/normalize:QTXVAVXCBMYBJW-UHFFFAOYSA-N/stdinchikey
InChIKey=FQEPJUOLUDFINX-UHFFFAOYSA-N MIME type: text/plain
(figure: input structure and normalized structure)
QTXVAVXCBMYBJW-UHFFFAOYSA-N → FQEPJUOLUDFINX-UHFFFAOYSA-N
47. Structure Normalization
Chemical Operators
• available operators:
add_hydrogens, remove_hydrogens, normalize, ficts, ficus, uuuuu,
scaffold_sequence, nostereo, stereoisomers, tautomers
example:
http://cactus.nci.nih.gov/chemical/structure/scaffold_sequence:FQEPJUOLUDFINX-UHFFFAOYSA-N/stdinchikey
(figure: three scaffold structures)
XVYBSGQBRUYLNK-UHFFFAOYSA-N
BQLSCAPEANVCOG-UHFFFAOYSA-N
MERGMNQXULKBCH-UHFFFAOYSA-N
Schuffenhauer et al., J. Chem. Inf. Model. 2007, 47, 47-58
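An operator is applied by prefixing the identifier with `operator:` in the URL, as in the `tautomers:`, `normalize:` and `scaffold_sequence:` examples. A minimal sketch of that convention; the helper name is made up, and the operator spelled `add_hydrogens` here is an assumption about the intended name.

```python
BASE = "http://cactus.nci.nih.gov/chemical/structure"

# operator names as listed on the slide (add_hydrogens spelling assumed)
OPERATORS = {
    "add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus",
    "uuuuu", "scaffold_sequence", "nostereo", "stereoisomers", "tautomers",
}


def operator_url(operator, identifier, representation):
    """Build a URL that applies a chemical operator before resolving."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return f"{BASE}/{operator}:{identifier}/{representation}"
```

`operator_url("tautomers", "warfarin", "ficus")` reproduces the request used on the warfarin slides.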
49. Chemical File Resolver (CFR)
(diagram) chemical file → HTTP Post → CFR → HTTP Get → chemical file
• allows conversion of many chemical file formats into another
format or other representations
• will have a programmatic URL API & an HTML Web interface
• url’izes all elements of the original file, i.e. provides access to each
specific record, field, and any metadata (size, record count, etc.) of
the posted file by URLs
• release: Q2/2012 (hopefully)
50. Chemical File Resolver (CFR)
• HTTP: post a file (e.g. with curl), CFR replies with an MD5 hash key:
curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/chemical/file
>d85b396ed6ced6348a5b402eb8fcfe8b
• accepted formats:
• chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme,
maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
• text files with a list of identifiers …
51. Post a plain text file, e.g.:
ethanol
aspirin
InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3
CCOCC
InChIKey=RCINICONZNJXQF-MZXODVADSA-N
InChIKey=QTXVAVXCBMYBJW-UHFFFAOYSA-N
204255-11-8
tautomers:guanine
ChemSpider_ID=1234
Pubchem_SID=456
• after posting a file, CFR replies with an MD5 hash sum:
curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/TEST/chemical/file
>d85b396ed6ced6348a5b402eb8fcfe8b
• accepted formats:
• chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme,
maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
• text files with a list of identifiers
52. Chemical File Resolver (CFR)
• request new file format using the obtained MD5 hash key:
d85b396ed6ced6348a5b402eb8fcfe8b
curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?format={sdf, smi, pdb, cml, …}
53. Chemical File Resolver (CFR)
• request record 2 and 5 as SMILES string:
d85b396ed6ced6348a5b402eb8fcfe8b
curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?records=2,5&format=smiles
54. Chemical File Resolver (CFR)
• get field names:
curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/fields
• get a specific field value from record n:
curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/n/{field_name}
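The CFR requests shown with curl all follow from the MD5 key returned after posting a file. The sketch below only assembles those URLs; it assumes the TEST endpoints exactly as they appear on the slides (the service was announced as unreleased, so they may not be reachable), and the helper names are illustrative.

```python
from urllib.parse import quote, urlencode

CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"


def cfr_convert_url(key, fmt, records=None):
    """URL for converting a posted file (or selected records) to another format."""
    params = {}
    if records:
        params["records"] = ",".join(str(number) for number in records)
    params["format"] = fmt
    # keep the comma readable, as in the slide's 'records=2,5'
    return f"{CFR_BASE}/{key}?{urlencode(params, safe=',')}"


def cfr_fields_url(key):
    """URL listing the field names of the posted file."""
    return f"{CFR_BASE}/{key}/fields"


def cfr_field_value_url(key, record, field_name):
    """URL for one field value of record number `record`."""
    return f"{CFR_BASE}/{key}/{record}/{quote(field_name, safe='')}"
```

With the key from the slides, `cfr_convert_url("d85b396ed6ced6348a5b402eb8fcfe8b", "smiles", records=[2, 5])` mirrors the "record 2 and 5 as SMILES" request.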
55. Chemical Structure Web API
(architecture diagram)
external web services ↔ http ↔ NCI/CADD web services: Chemical Identifier Resolver, Chemical File Resolver
Chemical Structure Web API
CACTVS, OPSIN, other software packages
NCI/CADD Chemical Structure Database (CSDB)
56. IUPAC InChI/InChIKey Resolver
• (hopefully) there will be many resolvers from different
providers with different background:
• publishers
• commercial databases
• free sources and databases: ChemSpider, PubChem, ChEBI, …
• InChI/InChIKey is the perfect tool to interlink the resolvers
• ChemSpider, PubChem and NCI/CADD are working on a test
protocol for a federated InChI/InChIKey resolver
60. Acknowledgments
The InChI Team
NCI/CADD Team: Igor Filippov, Marc Nicklaus
University of Cambridge, UK: Daniel Lowe
Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt
University College Cork, Ireland: Noel O'Boyle
ChemNavigator: Scott Hutton, Tad Hurst
All Database providers
61. Acknowledgments - Software
CACTVS
Python Web Framework
ChemWriter
Python SQL Library
Peter Ertl (Novartis)
Javascript library
Fulltext Search Engine
Editor's Notes
… but you can also do things independently of InChI. This is the general scheme: almost every identifier or representation can be converted to any other representation.
If I take these rules and any of these input structures here