AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION
 

Nikolay Kochev (a*), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, BG
b) Ideaconsult Ltd, Sofia, BG

We present a new open source tool for automatic generation of all tautomeric forms of a given organic compound. Ambit-Tautomer is part of the open source software package Ambit2. It implements three tautomer generation algorithms: a combinatorial method, an improved combinatorial method and an incremental depth-first search algorithm. All algorithms use a set of fully customizable rules for tautomeric transformations. The predefined knowledge base covers 1–3, 1–5 and 1–7 proton tautomeric shifts. Typical supported tautomerism rules include keto-enol, imine-amine, nitroso-oxime, azo-hydrazone, thioketo-thioenol, thionitroso-thiooxime, amidine-imidine, diazoamino-diazoamino, thioamide-iminothiol and nitrosamine-diazohydroxide. Ambit-Tautomer ranks tautomers with a simple energy-based system implemented as a set of empirically derived rules. Fine-grained output control is achieved by a set of post-generation filters. We performed an exhaustive comparison of the Ambit-Tautomer incremental algorithm against several other software packages that offer tautomer generation: ChemAxon Marvin, Molecular Networks MN.TAUTOMER, ACD Labs, CACTVS and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI. According to the presented test results, Ambit-Tautomer's performance is either comparable to or better than that of the competing algorithms.
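
The difference between a pure combinatorial enumeration and a depth-first enumeration that avoids conflicting rule states can be illustrated with a toy model. The sketch below is not the Ambit-Tautomer implementation: the `RULES` list, the `overlaps` conflict test (two rule instances conflict when they share any atom) and the function names are assumptions made for illustration, and the real algorithm also re-detects rules on each intermediate structure, which is not modelled here. The rule instances reuse the atom indices shown on the poster for the structure OC(O)=C(N)C.

```python
from itertools import product

# Hypothetical rule instances: (rule pattern, matched atom indices).
# Illustrative only; this is not the Ambit-Tautomer data model.
RULES = [
    ("OC=C", (0, 1, 3)),
    ("OC=C", (2, 1, 3)),
    ("NC=C", (4, 3, 1)),
]


def combinatorial(rules):
    """Enumerate every binary vector of rule states (0 = initial, 1 = shifted)."""
    return list(product((0, 1), repeat=len(rules)))


def overlaps(a, b):
    """Assumed conflict test: two rule instances share at least one atom."""
    return bool(set(a[1]) & set(b[1]))


def incremental(rules, i=0, states=()):
    """Depth-first search over rule states, pruning conflicting shifts."""
    if i == len(rules):
        return [states]
    # Branch 1: keep rule i in its initial state.
    results = incremental(rules, i + 1, states + (0,))
    # Branch 2: shift rule i only if no already shifted rule overlaps it.
    shifted = [rules[j] for j, s in enumerate(states) if s == 1]
    if not any(overlaps(rules[i], r) for r in shifted):
        results += incremental(rules, i + 1, states + (1,))
    return results


if __name__ == "__main__":
    print(len(combinatorial(RULES)), "combinatorial state vectors")
    print(incremental(RULES))
```

Because the three rule instances all share atoms 1 and 3, the combinatorial enumeration produces 8 state vectors, while the pruned search keeps only the 4 in which at most one rule is shifted.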

The Ambit-Tautomer module is available for download as a Java library (Maven repository)
http://ambit.uni-plovdiv.bg:8083/nexus/index.html#nexus-search;quick~ambit2-tautomer

Command line application
https://github.com/ideaconsult/examples-ambit/tree/master/tautomers-example

Demo web page
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer

Publication
http://onlinelibrary.wiley.com/doi/10.1002/minf.201200133/abstract

Document Transcript

Nikolay Kochev (a), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, 24, Tzar Assen Str., Plovdiv 4000, Bulgaria
b) Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria, jeliazkova.nina@gmail.com

Ambit-Tautomer Basic Features

Software characteristics
  • CDK.sf.net based structure representation, input, output and info processing
  • Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML (CDK representation)
  • Exhaustive tautomer generation
  • Customizable set of rules and post-generation filters
  • Set of predefined rules
  • Tautomer ranking based on simple empirical rules

Tautomer generation algorithms
  • Pure combinatorial algorithm
  • Incremental approach (based on depth-first search algorithm) for rule combination with local rule corrections and refinement on the way

Customizable set of transformation rules
  • Basic set of 1-3 and 1-5 proton shift rules
  • Additional rules: 1-7 proton shifts, chlorine atom shifts, ring-chain rules
  • Rule description based on SMARTS

Tautomer generation flow chart

[Figure: flow chart for the input structure OC(O)=C(N)C. Stages: structure input, substructure search, initial rules list, generation of all possible combinations of the rule states based on depth-first search with refinement of the rule list at each step, post-generation filtering, ranking, result. Rule instances shown include OC=C at 0 1 3, OC=C at 2 1 3, NC=C at 4 3 1 and NC=C at 4 3 5, each listed as used or unused; a marker indicates the current rule used to generate two possible states.]

Combinations of non-overlapping rules
  • Each tautomer is described as a binary combination of rule states (00, 01, 10, 11)

Overlapping rules
  • Simple combinations do not work
  • Rule conflicts are possible
  • Some tautomers might be omitted
  • A more sophisticated approach is needed

Post-generation filtering
  • Duplicates, topological equivalency, allene atoms, incorrect structures, …

Software Comparison

An exhaustive comparison of the algorithm against several other software packages for tautomer generation is performed: ChemAxon Marvin, MN.TAUTOMER (Molecular Networks), ACD Labs, CACTVS, and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI. According to the test results, Ambit-Tautomer's performance is comparable to or better than the Marvin, CACTVS and MN.TAUTOMER algorithms, and it generates considerably more tautomeric structures than ACD Labs and the InChI-based algorithms.

Violuric Acid Tautomer Generation

[Figure: violuric acid tautomers generated by Ambit (15 tautomers generated). Number of tautomers generated by other packages, as shown in the figure: CACTVS 15, Marvin 15, MN_Tautomer 15, ACD Labs 2, InChI/CDK 5, Daylight 10. The figure also carries the note "canonical 3 taut. forms /2D structures not available/".]

Pemoline Tautomer Generation

[Figure: pemoline tautomers, each labelled with the packages that generate it and with its rank in AMBIT and in Marvin.]

Abbreviations: M - Marvin (ChemAxon), A - ACD Labs, C - CACTVS, I - InChI/CDK, T - Mn_Tautomer

Number of tautomers generated per compound. The column labels in the transcript are MN.TAUTOMER, InChI/CDK, ACD Labs, CACTVS, Marvin and AMBIT; their mapping to the six count columns is not preserved.

Name                          Formula
Divicine                      C4H6N4O2       28  31  27  27   3   9
Iodothiouracil                C4H3IN2OS      10   9   9   9   2   6
Thioguanine                   C5H5N5S        35  43  26  28   4  15
Mercaptopurine                C5H4N4S        19  23   8   8   4   8
Allopurinol                   C5H4N4O        22  22  13  17   5   9
2-Mercaptobenzothiazole       C7H5NS2         6   5   2   2   2   2
Amitrole                      C2H4N4          8   8   5   7   3   5
Thiotetronic acid             C4H4O2S         5   5   5   5   3   0
2-thiouracil                  C4H4N2OS       10   9   9   9   2   3
Flucytosine                   C4H4FN3O       10   9   5   9   2   6
Citrazinic acid               C6H5NO4         5   5   5   5   2   2
Tenoxicam                     C13H11N3O4S2   15  14   8  11   3   2
Mitoguazone                   C5H12N8        31  35  12  28   4   4
Methimazole                   C4H6N2S         3   3   2   3   2   3
Ciclopirox                    C12H17NO2       8   3   3   2   2   0
Dithranol                     C14H10O3       52  60  48  48   2   0
Enprofylline                  C8H10N4O2      14  17  11  11   3   8
Thymine                       C5H6N2O2       10   9   9   9   2   6
Abscisic acid                 C15H20O4       30  30   5  10   2   0
4-acetamidobenzaldehyde       C9H9NO2         9   9   9   9   2   2
Acetoacetanilide              C10H11NO2      21  21   8   8   2   2
Acetobromoglucose             C14H19BrO9     16  16   0   0   2   0
4-acetoxy-benzaldehyde        C9H8O3          2   2   0   0   2   0
3-acetoxy-2-cyclohexen-1-one  C8H10O3        10  10   5   5   2   0
2-acetyl-butyrolactone        C6H8O3          5   5   5   5   2   0
N-Acetyl-L-cysteine           C5H9NO3S       14  20   2   3   2   2
2-aminobenzamide              C7H8N2O        11   9   7   5   2   2
2-aminobenzonitrile           C7H6N2          5   5   0   0   2   0
2-aminodiphenylamine          C12H12N2       21  20   0   0   2   0
4-aminohippuric acid          C9H10N2O3      18  24   5   2   2   2
Mesalazine                    C7H7NO3        13  12   9   6   2   0
2-amino-3-hydroxypyridine     C5H6N2O        11  10   4   8   2   2
2-(aminomethyl)pyridine       C6H8N2         18  23   0   0   2   0
2-aminophenol                 C6H7NO         11  10   7   0   2   0
3-aminophenol                 C6H7NO         10   9   9   0   2   0
4-aminophenol                 C6H7NO          6   6   4   0   2   0
benzhydrazide                 C6H5CONHNH2     3   6   2   3   2   2
Carbazole                     C12H9N         12   7   0   0   2   0

Ambit-Tautomer is part of the Ambit2 software package, distributed under the LGPL license [1] and using the Chemistry Development Kit (CDK) library [2] for basic cheminformatics functionality. Ambit-Tautomer uses a depth-first search algorithm combined with a set of rules for tautomeric transformation.

The Ambit implementation of the OpenTox web services [3] for predictive toxicology is being extended to include the tautomer generation algorithm. A web page providing online tautomer generation by several different algorithms, including Ambit-Tautomer, is available at:
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer

References
[1] AMBIT project, http://ambit.sourceforge.net
[2] Steinbeck C., Hoppe C., Kuhn S., Floris M., Guha R., Willighagen E.L., "Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics". Curr. Pharm. Des. 2006; 12(17):2111-2120 (DOI: 10.2174/138161206777585274)
[3] Jeliazkova N., Jeliazkov V., "AMBIT RESTful web services: an implementation of the OpenTox application programming interface". Journal of Cheminformatics 2011, 3:18 (DOI: 10.1186/1758-2946-3-18); http://www.jcheminf.com/content/3/1/18