Nikolay T. Kochev1, Vesselina H. Paskaleva1, Nina Jeli...

OpenTox Euro 2013 poster

6. Ambit-TAUTOMER – a Software Tool for Automatic Tautomer Generation, Nina Jeliazkova (Ideaconsult Ltd)
Ambit-Tautomer [1] is an open source Java library for automatic generation of all tautomers of a given chemical compound. It is implemented on top of the Chemistry Development Kit (CDK) [2]. The library provides three main algorithms: a pure combinatorial method, an improved combinatorial method and an incremental algorithm. The tautomer generator uses a set of predefined but customizable rules. The rules are defined in Daylight SMILES/SMARTS line notation and cover the basic types of tautomerism (1-3, 1-5 and 1-7 proton shifts).

The pure combinatorial method generates all tautomeric forms by considering every possible combination of the matched rule states. The improved combinatorial method uses sub-combinations based on rule clustering. The incremental algorithm applies depth-first search to handle sophisticated cases of overlapping rules. In addition, rule pre-filtering and tautomer post-filtering are applied to fine-tune the generation process. The generator also ranks tautomers using empirical rules defined in terms of relative energy differences.

The Ambit-Tautomer library is used to improve the storage of chemical structures in the Ambit database and, accordingly, to implement search procedures that take tautomerism into account. The tautomer sets are also used to calculate modified values of the original molecular descriptors in order to improve existing QSAR/QSPR models.

The Ambit-Tautomer module is implemented as an open source Java package within the Ambit open source software for chemoinformatics and data management [3,4], and is available as a Java library, a command line application [5] and an OpenTox Algorithm API compatible web service [6]. The Ambit package is available as online web services and as a downloadable application. A web page providing online tautomer generation by Ambit-Tautomer and several other software packages is also available online.
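The pure combinatorial method described above can be illustrated with a minimal sketch. It assumes that each matched rule has exactly two states (0 and 1) and that the rules do not overlap, so every combination of states corresponds to one candidate tautomer. The rule names, the atom indices and the function `enumerate_rule_states` are hypothetical and are not part of the Ambit-Tautomer API; no chemistry is performed here, only the enumeration of state combinations.

```python
from itertools import product


def enumerate_rule_states(matched_rules):
    """Yield every assignment of state 0 or 1 to each matched rule.

    For n non-overlapping rules this produces 2**n combinations,
    each of which would correspond to one candidate tautomer.
    """
    for states in product((0, 1), repeat=len(matched_rules)):
        yield dict(zip(matched_rules, states))


# Hypothetical rule matches: (rule label, matched atom indices).
# These labels are illustrative only, not Ambit-Tautomer's data model.
matched = [("keto-enol", (0, 1, 3)), ("amine-imine", (4, 3, 1))]

combos = list(enumerate_rule_states(matched))

# Encode each combination as a binary string, one digit per rule.
codes = ["".join(str(combo[rule]) for rule in matched) for combo in combos]
print(codes)
```

With two matched rules the sketch yields four binary codes. When rules overlap, switching one rule can invalidate another, so this plain enumeration is no longer sufficient; that is the case the incremental depth-first algorithm is meant to handle.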

[1] Kochev, N. T., Paskaleva, V. H. and Jeliazkova, N., Ambit-Tautomer: An Open Source Tool for Tautomer Generation. Mol. Inf., 32: 481–504, 2013
[2] Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E. and Willighagen, E., The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci., 43: 493–500, 2003
[3] Jeliazkova, N. and Jeliazkov, V., AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of Cheminformatics, 3: 18, 2011, doi:10.1186/1758-2946-3-18

ALGORITHMS FOR AUTOMATIC TAUTOMER GENERATION AND THEIR APPLICATIONS

Nikolay T. Kochev1, Vesselina H. Paskaleva1, Nina Jeliazkova2
1University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry; 2Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria

Ambit-Tautomer [1] is part of the Ambit2 software package [2], distributed under the LGPL license and using the Chemistry Development Kit (CDK) library [3] for basic chemoinformatics functionality. Ambit-Tautomer utilizes a depth-first search algorithm, combined with a set of rules for tautomeric transformations. The Ambit implementation of OpenTox web services [4] for predictive toxicology is being extended to include the tautomer generation algorithm. A web page providing online tautomer generation by several different algorithms, including Ambit-Tautomer, is available online.

Figure 1. AMBIT2 Tautomer generation test page

Ambit-Tautomer Basic Features

Software characteristics
  • CDK-based structure representation, input, output and info processing
  • Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML
  • Exhaustive tautomer generation
  • Customizable set of rules and post-generation filters
  • Set of predefined rules
  • Tautomer ranking based on simple empirical rules

Customizable set of rules
  • Basic set of 1-3 and 1-5 proton shift rules
  • Additional rules: 1-7 proton shifts, chlorine atom shifts
  • Rule description based on SMARTS

Tautomer generation algorithms
  • Pure combinatorial algorithm
  • Incremental approach (based on depth-first search algorithm) for rule combination with local rule corrections and refinement on the way

Tautomer Generation Flow Chart

[Flow chart: Structure input, e.g. OC(O)=C(N)C (CDK representation) → Substructure search → Initial rule list → Generation of all possible combinations of the rule states based on depth-first search with refinement of the rule list at each step → Post-generation filtering (duplicates, topological equivalency, allene atoms, incorrect structures, …) → Ranking → Result output]

Combinations of non-overlapping rules
Each rule has two states (0 ↔ 1), so each tautomer is described as a binary combination of rule states: 00, 01, 10, 11.

Overlapping rules
  • Simple combinations do not work
  • Rule conflicts are possible
  • Some tautomers might be omitted
  • A more sophisticated approach is needed

[Worked example panels: at each depth-first search step the rule instances are divided into used rules and unused rules, and a marker indicates the current rule used to generate two possible states. Rule instances shown include OC=C at 213, O=CC at 013, OC=C at 013, NC=C at 431 and N=CC at 435.]

QSAR/QSPR Cheminfo Processing Flow Chart

[Flow chart, illustrated with methimazole: Structure input: C1=CN(C(N1)=S)C /SMILES, InChI, *.mol, CML/ → CDK representation, Connection Table (CDK container) → generate tautomers → generate 2D, generate 3D models → Calculate 1D, 2D, 3D molecular descriptors (NA = 13, NH = 6, MW = 114.03, …; Z = 32, W = 40, ATSc1 = 0.14, …); Group counts, additive schemes; Calculate fingerprints (bit-vectors): hashed fingerprint, key-based fingerprint → QSAR, QSPR, Similarity search, Chemical Data base]

  • QSAR: models of biological activities: ADME, Toxicity, Mutagenicity, Biodegradation, …
  • QSPR: models of physicochemical properties: LogP, BP, MP, MR, …
  • Similarity search: list of most similar structures

The structural information was processed according to the presented flow chart. We studied the influence of tautomer information on various processing stages: descriptor calculation (Table 3), similarity searching (see Table 1) and QSAR/QSPR modeling of Ames mutagenicity and LogP (see Figure 2 and Table 2).

Table 1. The similarity search results for the three tautomers of methimazole. Each column contains the five most similar structures to the tautomer. Similarity search is performed in a data base with 553477 compounds (subset of PubChem data base).

      Similarity     Similarity     Similarity
      (tautomer 1)   (tautomer 2)   (tautomer 3)
  1.  0.62           0.71           0.47
  2.  0.6            0.71           0.45
  3.  0.59           0.64           0.44
  4.  0.58           0.57           0.44
  5.  0.54           0.57           0.43

[Structure drawings of the tautomers and of the retrieved compounds are not reproduced.]

Table 2. The values of the Ames mutagenicity model and the XLogP model for all tautomers of violuric acid.

  Violuric acid tautomers        Ames Mutagenicity   XLogP
  /SMILES notations/             (model)
  O=C1NC(=O)C(=NO)C(=O)N1        1                    0.135
  O=C1N=C(O)N=C(O)C1(=NO)        0                   -0.086
  O=C1N=C(O)C(=NO)C(O)=N1        0                    0.267
  O=C1N=C(O)C(=NO)C(=O)N1        1                    0.041
  O=C1N=C(O)NC(=O)C1(=NO)        1                    0.361
  O=NC1=C(O)N=C(O)N=C1(O)        1                   -0.102
  O=NC=1C(=O)NC(O)=NC=1(O)       1                   -0.084
  O=NC=1C(=O)N=C(O)NC=1(O)       1                    1.230
  O=NC=1C(O)=NC(=O)NC=1(O)       1                    0.698
  O=NC=1C(=O)NC(=O)NC=1(O)       1                    0.363
  O=NC1C(O)=NC(=O)N=C1(O)        1                   -0.277
  O=NC1C(=O)N=C(O)N=C1(O)        1                   -1.056
  O=NC1C(=O)NC(=O)N=C1(O)        1                   -0.932
  O=NC1C(=O)N=C(O)NC1(=O)        1                   -1.038
  O=NC1C(=O)NC(=O)NC1(=O)        1                   -1.267

Table 3. The number of descriptors (out of total 863) which exhibit relative standard deviation (RSD due to the tautomerism) larger than particular thresholds: 0.1, 0.3, 0.5, 1.0

  Number of PaDEL descriptors that have RSD > RSD threshold
  RSD threshold   0.1    0.3    0.5    1.0
                  180    124    99     71
                  217    151    108    80
                  239    168    138    113

[Each row corresponds to one of the structures shown on the poster (methimazole, violuric acid, pemoline); the row-to-structure assignment is not recoverable from the text.]

Figure 2. The mean absolute errors for the XLogP model compared with the errors obtained from the averaged model values calculated for all tautomers for each testing structure. The statistics are calculated for 8327 test structures.

[Figure 2 plot: mean error (axis from 0.70 to 2.10) against number of tautomers per structure (bins 2 ÷ 10, 11 ÷ 30, 31 ÷ 50, 52 ÷ 100, 102 ÷ 192, 204 ÷ 292, 302 ÷ 1318), with two series: XLogP (no tautomers) and XLogP (all tautomers).]

References
[1] Kochev, N. T., Paskaleva, V. H. and Jeliazkova, N., Ambit-Tautomer: An Open Source Tool for Tautomer Generation. Mol. Inf., 32: 481–504, 2013
[2] AMBIT project
[3] Steinbeck, C., Hoppe, C., Kuhn, S., Guha, R. and Willighagen, E. L., Recent Developments of the Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bioinformatics. Curr. Pharm. Des., 12(17): 2111–2120, 2006, doi:10.2174/138161206777585274
[4] Jeliazkova, N. and Jeliazkov, V., AMBIT RESTful web services: an implementation of the OpenTox application programming interface. Journal of Cheminformatics, 3: 18, 2011, doi:10.1186/1758-2946-3-18
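
The descriptor counts reported in Table 3 rest on a simple statistic: for each descriptor, the relative standard deviation (RSD) of its values across all tautomers of one structure, compared against a threshold. The following is a minimal sketch of that counting step. It assumes RSD is defined as the population standard deviation divided by the absolute mean; the poster does not state which estimator was used. The function names and the descriptor values are hypothetical and purely illustrative, not data from the study.

```python
from statistics import mean, pstdev


def relative_std_dev(values):
    """Return the population standard deviation divided by |mean| (assumed definition)."""
    m = mean(values)
    s = pstdev(values)
    if m == 0:
        # RSD is undefined for a zero mean; treat any spread as unbounded.
        return float("inf") if s > 0 else 0.0
    return s / abs(m)


def count_above(descriptor_table, thresholds=(0.1, 0.3, 0.5, 1.0)):
    """Count descriptors whose RSD across tautomers exceeds each threshold."""
    rsd = {name: relative_std_dev(vals) for name, vals in descriptor_table.items()}
    return {t: sum(1 for r in rsd.values() if r > t) for t in thresholds}


# Illustrative values only: one list per descriptor, one entry per tautomer.
table = {
    "d1": [1.0, 1.0, 1.0],   # identical for all tautomers
    "d2": [1.0, 2.0, 3.0],   # moderate variation
    "d3": [-1.0, 0.0, 4.0],  # strong variation
}

counts = count_above(table)
print(counts)
```

A descriptor that does not change between tautomers contributes to none of the counts, while a strongly varying one is counted at every threshold, which is why the counts in each row of Table 3 decrease as the threshold grows.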