AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION
Nikolay Kochev (a*), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, BG
b) Ideaconsult Ltd, Sofia, BG
We present a new open source tool for automatic generation of all tautomeric forms of a given organic compound. Ambit-Tautomer is part of the open source software package Ambit2. It implements three tautomer generation algorithms: a combinatorial method, an improved combinatorial method and an incremental depth-first search algorithm. All algorithms utilize a set of fully customizable rules for tautomeric transformations. The predefined knowledge base covers 1–3, 1–5 and 1–7 proton tautomeric shifts. Some typical supported tautomerism rules are keto-enol, imine-amine, nitroso-oxime, azo-hydrazone, thioketo-thioenol, thionitroso-thiooxime, amidine-imidine, diazoamino-diazoamino, thioamide-iminothiol and nitrosamine-diazohydroxide. Ambit-Tautomer uses a simple energy-based system for tautomer ranking, implemented by a set of empirically derived rules. Fine-grained output control is achieved by a set of post-generation filters. We performed an exhaustive comparison of the Ambit-Tautomer Incremental algorithm against several other software packages which offer tautomer generation: ChemAxon Marvin, Molecular Networks MN.TAUTOMER, ACD Labs, CACTVS and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI. According to the presented test results, Ambit-Tautomer's performance is either comparable to or better than that of the competing algorithms.
The Ambit-Tautomer module is available for download as a Java library.
Maven repository: http://ambit.uni-plovdiv.bg:8083/nexus/index.html#nexus-search;quick~ambit2-tautomer
Command line application: https://github.com/ideaconsult/examples-ambit/tree/master/tautomers-example
Demo web page: http://apps.ideaconsult.net:8080/ambit2/depict/tautomer
Publication: http://onlinelibrary.wiley.com/doi/10.1002/minf.201200133/abstract
Nikolay Kochev (a), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, 24, Tzar Assen Str., Plovdiv 4000, Bulgaria
b) Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria, jeliazkova.nina@gmail.com
Ambit-Tautomer Basic Features

Software characteristics
• CDK (cdk.sf.net) based structure representation, input, output and info processing
• Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML (CDK representation)
• Exhaustive tautomer generation
• Customizable set of rules and post-generation filters
• Set of predefined rules
• Tautomer ranking based on simple empirical rules

Tautomer generation algorithms
• Pure combinatorial algorithm
• Incremental approach (based on depth-first search algorithm) for rule combination with local rule corrections and refinement on the way

Customizable set of transformation rules
• Basic set of 1-3 and 1-5 proton shift rules
• Additional rules: 1-7 proton shifts, chlorine atom shifts, ring-chain rules
• Rule description based on SMARTS

Combinations of non-overlapping rules
Each rule has two states (0 ↔ 1) and each tautomer is described as a binary combination of the rule states (00, 01, 10, 11 for two rules).

Overlapping rules
- simple combinations do not work
- rule conflicts are possible
- some tautomers might be omitted
- a more sophisticated approach is needed, based on depth-first search with refinement of the rule list at each step

Tautomer generation flow chart
Structure input → Substructure search → Initial rules list → Generation of all possible combinations of the rule states → Post-generation filtering (duplicates, topological equivalency, allene atoms, incorrect structures, …) → Ranking → Result

[Figure: depth-first search for the input structure OC(O)=C(N)C, atoms numbered 0 to 5. Each node shows the lists of used and unused rule instances (OC=C at 013, OC=C at 213, NC=C at 431, NC=C at 435 and their counterparts O=CC at 013, O=CC at 213, N=CC at 431, N=CC at 435). A mark indicates the current rule, which is used to generate two possible states.]
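To make the depth-first idea concrete, the following is a minimal sketch in plain Python. It is not the Ambit-Tautomer API and not the authors' implementation: it assumes a toy molecule model (heavy atoms, implicit hydrogen counts, bond orders), a single generic 1,3 proton shift H-X-Y=Z ↔ X=Y-Z-H in which X or Z must be a heteroatom, and a depth-first search over whole structures instead of Ambit's bookkeeping of used and unused rule instances. All names (`find_shifts`, `apply_shift`, `enumerate_tautomers`, `freeze`) are hypothetical. The input is OC(O)=C(N)C with the atom numbering 0 to 5 used in the figure.

```python
# Toy model of OC(O)=C(N)C; atom indices follow the poster's numbering (0..5).
ATOMS = ("O", "C", "O", "C", "N", "C")
H_COUNTS = (1, 0, 1, 0, 2, 3)  # implicit hydrogens per heavy atom
BONDS = {(0, 1): 1, (1, 2): 1, (1, 3): 2, (3, 4): 1, (3, 5): 1}
HETERO = {"O", "N", "S"}  # assumption: at least one end atom must be a heteroatom


def bond_key(a, b):
    """Return the canonical (low, high) key of a bond."""
    return (a, b) if a < b else (b, a)


def freeze(h, bonds):
    """Hashable snapshot of a structure (hydrogen counts plus bond orders)."""
    return tuple(h), tuple(sorted(bonds.items()))


def find_shifts(atoms, h, bonds):
    """Yield (x, y, z) for every H-X-Y=Z pattern where X or Z is a heteroatom."""
    for (a, b), order in bonds.items():
        if order != 2:
            continue
        for y, z in ((a, b), (b, a)):
            for (c, d), other in bonds.items():
                if other != 1 or y not in (c, d):
                    continue
                x = d if c == y else c
                if h[x] > 0 and (atoms[x] in HETERO or atoms[z] in HETERO):
                    yield x, y, z


def apply_shift(h, bonds, x, y, z):
    """Move one H from x to z and the double bond from y=z to x=y."""
    new_h = list(h)
    new_h[x] -= 1
    new_h[z] += 1
    new_bonds = dict(bonds)
    new_bonds[bond_key(x, y)] = 2
    new_bonds[bond_key(y, z)] = 1
    return tuple(new_h), new_bonds


def enumerate_tautomers(atoms, h, bonds):
    """Depth-first search over all structures reachable by 1,3 shifts."""
    start = freeze(h, bonds)
    seen = {start}
    stack = [start]  # a LIFO stack gives depth-first order
    while stack:
        cur_h, cur_bonds = stack.pop()
        bond_map = dict(cur_bonds)
        for x, y, z in find_shifts(atoms, cur_h, bond_map):
            state = freeze(*apply_shift(cur_h, bond_map, x, y, z))
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return seen


if __name__ == "__main__":
    for h, bonds in sorted(enumerate_tautomers(ATOMS, H_COUNTS, BONDS)):
        double = [pair for pair, order in bonds if order == 2]
        print("H counts:", h, "double bond:", double)
```

Because every state is stored with fixed atom indices, this sketch reports the two forms obtained from "OC=C at 013" and "OC=C at 213" as separate states even though they are topologically equivalent. Removing such duplicates, rejecting allene-like or otherwise incorrect structures, and ranking the results correspond to the post-generation steps of the flow chart and are deliberately left out here.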
Software Comparison
An exhaustive comparison of the Ambit-Tautomer Incremental algorithm against several other software packages for tautomer generation is performed: ChemAxon Marvin, MN.TAUTOMER (Molecular Networks), ACD Labs, CACTVS, and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI.
According to the test results, Ambit-Tautomer's performance is comparable to or better than the Marvin, CACTVS and MN.TAUTOMER algorithms, and it generates considerably more tautomeric structures than the ACD Labs and InChI-based algorithms.

Violuric Acid Tautomer Generation
[Figure: target structure (violuric acid) and the 15 tautomers generated by Ambit, each labeled with the other packages that also generate it.]
Number of tautomers generated by: CACTVS 15, Marvin 15, MN_Tautomer 15, ACD Labs 2, InChI/CDK 5, Daylight 10.

Pemoline Tautomer Generation
[Figure: pemoline tautomers generated by Ambit, Marvin (ChemAxon), CACTVS, Mn_Tautomer, ACD Labs, InChI/CDK and Daylight. Each structure is labeled with its rank and with the packages that generate it. The InChI/CDK and Daylight entries are annotated "canonical" and "3 taut. forms /2D structures not available/".]

Abbreviations:
M - Marvin (ChemAxon)
C - CACTVS
T - Mn_Tautomer
A - ACD Labs
I - InChI/CDK

Number of tautomers generated for a set of test compounds by AMBIT, Marvin, CACTVS, MN.TAUTOMER, ACD Labs and InChI/CDK (table columns, in this order):
Name                          Formula       AMBIT  Marvin  CACTVS  MN.TAUTOMER  ACD Labs  InChI/CDK
Divicine                      C4H6N4O2         28      31      27           27         3          9
Iodothiouracil                C4H3IN2OS        10       9       9            9         2          6
Thioguanine                   C5H5N5S          35      43      26           28         4         15
Mercaptopurine                C5H4N4S          19      23       8            8         4          8
Allopurinol                   C5H4N4O          22      22      13           17         5          9
2-Mercaptobenzothiazole       C7H5NS2           6       5       2            2         2          2
Amitrole                      C2H4N4            8       8       5            7         3          5
Thiotetronic acid             C4H4O2S           5       5       5            5         3          0
2-thiouracil                  C4H4N2OS         10       9       9            9         2          3
Flucytosine                   C4H4FN3O         10       9       5            9         2          6
Citrazinic acid               C6H5NO4           5       5       5            5         2          2
Tenoxicam                     C13H11N3O4S2     15      14       8           11         3          2
Mitoguazone                   C5H12N8          31      35      12           28         4          4
Methimazole                   C4H6N2S           3       3       2            3         2          3
Ciclopirox                    C12H17NO2         8       3       3            2         2          0
Dithranol                     C14H10O3         52      60      48           48         2          0
Enprofylline                  C8H10N4O2        14      17      11           11         3          8
Thymine                       C5H6N2O2         10       9       9            9         2          6
Abscisic acid                 C15H20O4         30      30       5           10         2          0
4-acetamidobenzaldehyde       C9H9NO2           9       9       9            9         2          2
Acetoacetanilide              C10H11NO2        21      21       8            8         2          2
Acetobromoglucose             C14H19BrO9       16      16       0            0         2          0
4-acetoxy-benzaldehyde        C9H8O3            2       2       0            0         2          0
3-acetoxy-2-cyclohexen-1-one  C8H10O3          10      10       5            5         2          0
2-acetyl-butyrolactone        C6H8O3            5       5       5            5         2          0
N-Acetyl-L-cysteine           C5H9NO3S         14      20       2            3         2          2
2-aminobenzamide              C7H8N2O          11       9       7            5         2          2
2-aminobenzonitrile           C7H6N2            5       5       0            0         2          0
2-aminodiphenylamine          C12H12N2         21      20       0            0         2          0
4-aminohippuric acid          C9H10N2O3        18      24       5            2         2          2
Mesalazine                    C7H7NO3          13      12       9            6         2          0
2-amino-3-hydroxypyridine     C5H6N2O          11      10       4            8         2          2
2-(aminomethyl)pyridine       C6H8N2           18      23       0            0         2          0
2-aminophenol                 C6H7NO           11      10       7            0         2          0
3-aminophenol                 C6H7NO           10       9       9            0         2          0
4-aminophenol                 C6H7NO            6       6       4            0         2          0
benzhydrazide                 C6H5CONHNH2       3       6       2            3         2          2
Carbazole                     C12H9N           12       7       0            0         2          0

Ambit-Tautomer is part of the Ambit2 software package, distributed under the LGPL license [1] and using The Chemistry Development Kit (CDK) library [2] for basic cheminformatics functionality. Ambit-Tautomer utilizes a depth-first search algorithm, combined with a set of rules for tautomeric transformation.

The Ambit implementation of OpenTox Web services [3] for predictive toxicology is being extended to include the tautomer generation algorithm. A web page providing online tautomer generation by several different algorithms, including Ambit-Tautomer, is available at:
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer

References
[1] AMBIT project, http://ambit.sourceforge.net
[2] Steinbeck C., Hoppe C., Kuhn S., Floris M., Guha R., Willighagen E.L., "Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics", Curr. Pharm. Des. 2006; 12(17):2111-2120, doi:10.2174/138161206777585274
[3] Jeliazkova N., Jeliazkov V., "AMBIT RESTful web services: an implementation of the OpenTox application programming interface", Journal of Cheminformatics 2011, 3:18, doi:10.1186/1758-2946-3-18; http://www.jcheminf.com/content/3/1/18