AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION

Nina Jeliazkova (Ideaconsult Ltd.)

Nikolay Kochev (a*), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, BG
b) Ideaconsult Ltd, Sofia, BG

We present a new open source tool for the automatic generation of all tautomeric forms of a given organic compound. Ambit-Tautomer is part of the open source software package Ambit2. It implements three tautomer generation algorithms: a combinatorial method, an improved combinatorial method, and an incremental depth-first search algorithm. All algorithms use a set of fully customizable rules for tautomeric transformations. The predefined knowledge base covers 1–3, 1–5 and 1–7 proton tautomeric shifts. Typical supported tautomerism rules include keto-enol, imine-amine, nitroso-oxime, azo-hydrazone, thioketo-thioenol, thionitroso-thiooxime, amidine-imidine, diazoamino-diazoamino, thioamide-iminothiol and nitrosamine-diazohydroxide. Ambit-Tautomer ranks tautomers with a simple energy-based system implemented as a set of empirically derived rules. Fine-grained output control is achieved by a set of post-generation filters.

We performed an exhaustive comparison of the Ambit-Tautomer incremental algorithm against several other software packages that offer tautomer generation: ChemAxon Marvin, Molecular Networks MN.TAUTOMER, ACDLabs, CACTVS, and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI. According to the presented test results, Ambit-Tautomer’s performance is either comparable to or better than that of the competing algorithms.

The Ambit-Tautomer module is available for download:
• Java library (Maven repository): http://ambit.uni-plovdiv.bg:8083/nexus/index.html#nexus-search;quick~ambit2-tautomer
• Command line application: https://github.com/ideaconsult/examples-ambit/tree/master/tautomers-example
• Demo web page: http://apps.ideaconsult.net:8080/ambit2/depict/tautomer
Publication http://onlinelibrary.wiley.com/doi/10.1002/minf.201200133/abstract
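
The ranking and post-generation filtering mentioned in the abstract can be pictured with a small sketch. It is not the Ambit-Tautomer implementation: the `Tautomer` class, the `rank_and_filter` function and the penalty values in `PENALTY` are all hypothetical, chosen only to show how per-rule-state energy increments could be summed into a ranking score and how a filter could restrict the output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical energy increments per (rule, state), in arbitrary units.
# Placeholders for illustration only; NOT the empirical values used by
# Ambit-Tautomer.
PENALTY = {
    ("keto-enol", "keto"): 0.0,
    ("keto-enol", "enol"): 1.0,
    ("nitroso-oxime", "oxime"): 0.0,
    ("nitroso-oxime", "nitroso"): 0.8,
}


@dataclass(frozen=True)
class Tautomer:
    """A tautomer described by the state of every rule applied to it."""
    name: str
    states: Tuple[Tuple[str, str], ...]


def score(t: Tautomer) -> float:
    """Sum the increments of all rule states; lower means ranked higher."""
    return sum(PENALTY[s] for s in t.states)


def rank_and_filter(
    tautomers: List[Tautomer],
    keep: Callable[[Tautomer], bool] = lambda t: True,
    max_score: Optional[float] = None,
) -> List[Tautomer]:
    """Apply the post-generation filters, then sort by ascending score."""
    kept = [t for t in tautomers if keep(t)]
    if max_score is not None:
        kept = [t for t in kept if score(t) <= max_score]
    return sorted(kept, key=score)


if __name__ == "__main__":
    forms = [
        Tautomer("C", (("keto-enol", "enol"), ("nitroso-oxime", "nitroso"))),
        Tautomer("B", (("keto-enol", "enol"), ("nitroso-oxime", "oxime"))),
        Tautomer("A", (("keto-enol", "keto"), ("nitroso-oxime", "oxime"))),
    ]
    for t in rank_and_filter(forms, max_score=1.0):
        print(t.name, score(t))
```

Keeping the filter as a plain predicate separates the decision of what to output from the generation step, which is the role the abstract assigns to post-generation filters.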

AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION
Nikolay Kochev (a), Vesselina Paskaleva (a), Nina Jeliazkova (b)

a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, 24, Tzar Assen Str., Plovdiv 4000, Bulgaria
b) Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria, jeliazkova.nina@gmail.com

Ambit-Tautomer Basic Features

Software characteristics
• CDK (cdk.sf.net) based structure representation, input, output and info processing
• Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML
• Exhaustive tautomer generation
• Customizable set of rules and post-generation filters
• Set of predefined rules
• Tautomer ranking based on simple empirical rules

Tautomer generation algorithms
• Pure combinatorial algorithm
• Incremental approach (based on depth first search algorithm) for rule combination with local rule corrections and refinement on the way

Customizable set of transformation rules
• Basic set of 1-3 and 1-5 proton shift rules
• Additional rules: 1-7 proton shifts, chlorine atom shifts, ring-chain rules
• Rule description based on SMARTS

Tautomer generation flow chart
[Figure: flow chart for the example structure OC(O)=C(N)C. Stages shown: Structure input (CDK representation), then Substructure search. Atoms are numbered 0 (HO), 1 (C), 2 (HO), 3 (C), 4 (NH2), 5 (CH3). At each step the rule instances are listed as used rules or unused rules, for example OC=C at 0,1,3; OC=C at 2,1,3; NC=C at 4,3,1; N=CC at 4,3,1; NC=C at 4,3,5. A marker indicates the current rule used to generate two possible states.]
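
The incremental, depth-first idea can be tried out on the poster's example structure OC(O)=C(N)C with a toy enumerator. This is a simplified sketch, not Ambit-Tautomer's algorithm: it walks over whole tautomer states instead of managing used and unused rule instances, it knows only one generic 1-3 proton shift, X(H)-Y=Z to X=Y-Z(H), and it requires at least one end atom to be N, O or S. The molecule encoding (element list, hydrogen counts, bond-order dictionary) and the function names are assumptions made for this illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

HETERO = {"N", "O", "S"}
Bonds = Dict[Tuple[int, int], int]          # (i, j) with i < j -> bond order
State = Tuple[Tuple[int, ...], Tuple[Tuple[Tuple[int, int], int], ...]]


def key(i: int, j: int) -> Tuple[int, int]:
    """Order-independent bond key."""
    return (i, j) if i < j else (j, i)


def shifts_1_3(elements: List[str], h: Tuple[int, ...],
               bonds: Bonds) -> Iterator[Tuple[Tuple[int, ...], Bonds]]:
    """Yield every state reachable by one shift X(H)-Y=Z -> X=Y-Z(H)."""
    adj: Dict[int, List[int]] = {}
    for i, j in bonds:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    for y, nbrs in adj.items():
        for x in nbrs:
            for z in nbrs:
                if x == z or h[x] < 1:
                    continue
                if bonds[key(x, y)] != 1 or bonds[key(y, z)] != 2:
                    continue
                # Toy restriction: donor or acceptor must be a heteroatom.
                if elements[x] not in HETERO and elements[z] not in HETERO:
                    continue
                new_h = list(h)
                new_h[x] -= 1                # proton leaves the donor
                new_h[z] += 1                # and arrives at the acceptor
                new_bonds = dict(bonds)
                new_bonds[key(x, y)] = 2     # the double bond moves over
                new_bonds[key(y, z)] = 1
                yield tuple(new_h), new_bonds


def enumerate_tautomers(elements: List[str], h: List[int],
                        bonds: Bonds) -> Set[State]:
    """Depth-first search over all states reachable by 1-3 shifts."""
    start: State = (tuple(h), tuple(sorted(bonds.items())))
    seen = {start}
    stack = [start]                          # explicit LIFO stack = DFS
    while stack:
        h_now, bonds_now = stack.pop()
        for new_h, new_bonds in shifts_1_3(elements, h_now, dict(bonds_now)):
            state = (new_h, tuple(sorted(new_bonds.items())))
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return seen


# OC(O)=C(N)C, atoms numbered in SMILES order as in the flow chart.
ELEMENTS = ["O", "C", "O", "C", "N", "C"]
H_COUNTS = [1, 0, 1, 0, 2, 3]
BONDS = {(0, 1): 1, (1, 2): 1, (1, 3): 2, (3, 4): 1, (3, 5): 1}

if __name__ == "__main__":
    for h_counts, bond_orders in sorted(enumerate_tautomers(ELEMENTS, H_COUNTS, BONDS)):
        print(h_counts, dict(bond_orders))
```

For this input the sketch reaches five labelled states, including the starting one. Two of them (the proton taken from O at atom 0 or from O at atom 2) are the same structure by symmetry; the sketch does not canonicalise structures, so it reports both.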
Combinations of non-overlapping rules
Each tautomer is described as a binary combination of rule states.
[Figure: a structure with two non-overlapping rule sites, each switching between states 0 and 1 (0 ↔ 1). The H2N/OH form is labelled 00 and the HN/O form is labelled 11.]

Overlapping rules
[Figure: example structure with HO, NH2, HO and CH3 groups.]
- simple combinations do not work
- rule conflicts are possible
- some tautomers might be omitted
- more sophisticated approach is needed

Tautomer generation flow chart, remaining stages: Initial rules list; Generation of all possible combinations of the rule states; Post-generation filtering; Result.
[Figure: the rule instances listed as used rules in these stages include NC=C at 4,3,1; N=CC at 4,3,1; NC=C at 4,3,5; N=CC at 4,3,5; OC=C at 0,1,3; O=CC at 0,1,3; OC=C at 2,1,3; O=CC at 2,1,3.]
                                                                                                                                                                                                                                                                                                                                                                                                                                                         HO                NH2                  O                         NH2
                                                                                                                                                                                                                                                     based on Depth –
          HN                                                                                          OH                                                                       O                       NH2 HO                NH2   HO        NH                                                                           duplicaties, topological
                                                                                                                                             10                                                                                                      first search with
                                                                                                                                                                                                                                                                                                                          equivalency, allene atoms,                                                                                                     HO                CH3                HO                          CH3
                                                                                                                                                                                                                                                     refinement of the rule
                                                                                                                                                                                                                                                                                                                          incorrect structures, …    Ranking                                                                                             HO                    NH2             HO                         NH
                                                                                                                                                                                                                                                     list at each step.
          H2 N                                                                                       O                                                                                                                             HO        CH3
                                                                                                                                             01                               HO                       CH3 HO                CH2
                                                                                                                                                                                                                                                                                                                                                                                                                                                         HO                    CH2             HO                         CH3
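The binary description of tautomers can be illustrated with a minimal sketch. It is not the Ambit-Tautomer implementation: the rule instance names and state labels below are hypothetical, and the code only enumerates the bit patterns, leaving out any chemistry.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical rule instances found in a target structure.
# Each instance has exactly two states, encoded as bit 0 and bit 1.
RULE_INSTANCES: List[Tuple[str, Tuple[str, str]]] = [
    ("rule instance A", ("state A0", "state A1")),
    ("rule instance B", ("state B0", "state B1")),
]


def enumerate_combinations(
    rule_instances: List[Tuple[str, Tuple[str, str]]]
) -> List[Tuple[str, Dict[str, str]]]:
    """Return every binary code together with the state chosen for each rule instance."""
    results = []
    for bits in product((0, 1), repeat=len(rule_instances)):
        code = "".join(str(bit) for bit in bits)
        chosen = {
            name: states[bit]
            for (name, states), bit in zip(rule_instances, bits)
        }
        results.append((code, chosen))
    return results


combinations = enumerate_combinations(RULE_INSTANCES)
codes = [code for code, _ in combinations]
```

With n rule instances this yields 2^n candidate codes. A fixed list of rule instances cannot describe tautomers that only become reachable after a transformation creates a new rule instance, which is one way tautomers can be omitted.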

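Depth-first generation followed by filtering can be sketched on a toy model. Everything here is an assumption made for illustration: the five-atom chain O0-C1-C2-C3-O4, the single 1-3 shift rule, and the reading of "refinement of the rule list" as re-detecting rule instances in every newly generated structure. It is not the Ambit-Tautomer code.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[int, int]
# A structure: (mobile H count per atom, set of double bonds)
State = Tuple[Tuple[int, ...], FrozenSet[Bond]]

# Toy skeleton O0-C1-C2-C3-O4 (hypothetical 1,3-diketone core)
NEIGHBOURS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def bond(a: int, b: int) -> Bond:
    """Normalise a bond so that (a, b) and (b, a) compare equal."""
    return (a, b) if a < b else (b, a)


def find_shifts(state: State) -> List[Tuple[int, int, int]]:
    """Detect 1-3 shift instances H-a-b=c in the current structure."""
    h, double = state
    found = []
    for a in NEIGHBOURS:
        if h[a] == 0:
            continue
        for b in NEIGHBOURS[a]:
            if bond(a, b) in double:
                continue
            for c in NEIGHBOURS[b]:
                if c != a and bond(b, c) in double:
                    found.append((a, b, c))
    return found


def apply_shift(state: State, shift: Tuple[int, int, int]) -> State:
    """Move one H from a to c and the double bond from b=c to a=b."""
    h, double = state
    a, b, c = shift
    new_h = list(h)
    new_h[a] -= 1
    new_h[c] += 1
    new_double = (double - {bond(b, c)}) | {bond(a, b)}
    return (tuple(new_h), frozenset(new_double))


def generate(start: State) -> Set[State]:
    """Depth-first search; rule instances are re-detected for each structure."""
    seen = {start}          # also acts as the duplicate filter
    stack = [start]
    while stack:
        current = stack.pop()
        for shift in find_shifts(current):
            nxt = apply_shift(current, shift)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def has_allene_atom(state: State) -> bool:
    """True if any atom takes part in two double bonds (C=C=C style)."""
    counts: Dict[int, int] = {}
    for a, b in state[1]:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return any(n > 1 for n in counts.values())


diketo: State = ((0, 0, 2, 0, 0), frozenset({(0, 1), (3, 4)}))
all_states = generate(diketo)
tautomers = [s for s in all_states if not has_allene_atom(s)]
```

In this toy case the search reaches four structures, and the allene filter discards the one with two double bonds on the central carbon. A topological equivalency filter, not implemented here, would additionally merge the two mirror-image mono-enol forms.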



Violuric Acid Tautomer Generation
Software Comparison

[Figure: violuric acid target structure and the tautomers generated by Ambit (15 tautomers generated), each annotated with letter codes (combinations of C, M, T and I); structure drawings omitted]

Number of tautomers generated by:

    Software       Tautomers
    CACTVS         15
    Marvin         15
    MN_Tautomer    15
    ACD Labs       2
    InChI/CDK      5
    Daylight       10

An exhaustive comparison of the algorithm against several other software packages for tautomer generation was performed: ChemAxon Marvin, MN.TAUTOMER (Molecular Networks), ACD Labs, CACTVS, and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI.

According to the test results, Ambit-Tautomer performs comparably to or better than the Marvin, CACTVS and MN.TAUTOMER algorithms and generates considerably more tautomeric structures than the ACD Labs and InChI-based algorithms.

Pemoline Tautomer Generation
Software Comparison

[Figure: pemoline target structure (annotated "rank: AMBIT 3; M 1") and the tautomers produced by Ambit, Marvin (ChemAxon), CACTVS, Mn_Tautomer, ACD Labs, InChI/CDK (labelled "canonical") and Daylight (labelled "3 taut. forms / 2D structures"). The Ambit structures carry rank labels and letter codes, e.g. "M, C, A rank: 1", "M, C rank: 4" and "M, C, T, A rank: 7"; structure drawings omitted]
                                                                                                                                                                                                                                                                                                  NH2                             NH2
                                                                                                                                                                                                                                                                                 O                                O


                                                                                                                                                                                                                      Abbreviations:                                                         N                            N
                                                                                                                                                                                                                                                                                                                                                            O
                                                                                                                                                                                                                                                                                                                                                                        NH
                                                                                                                                                                                                                                                                                                                                                                                                                                                                           O
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    NH2             not available/
                                                                                                                                                                                                                      M - Marvin (ChemAxon) A - ACD Labs                        O                                O
                                                                                                                                                                                                                                                                                                                                                                NH                                                                                                              N


                                                                                                                                                                                                                      C - CACTVS            I - InChI/CDK
                                                                                                                                                                             MN.TAUTO




                                                                                                                                                                                                        InChI/CDK
                                                                                                                                                                                         ACD Labs




                                                                                                                                                                                                                                                                                                                                                        O                                                                                                              O
                                                                                                                                                                    CACTVS




                                                                                                                                                                                                                                                                     C, T, A,        rank: 2                 M    rank: 5
                                                                                                                                                    Marvin
                                                                                                                                        AMBIT




                                                                                                                                                                                                                      T - Mn_Tautomer
                                                                                                                                                                               MER




                                                                                                                                                                                                                                                                        I
                                Name                                                     Formula


    Divicine                                                                            C4H6N4O2                                        28          31              27         27            3             9        Ambit-Tautomer is part of the Ambit2
    Iodothiouracil                                                                      C4H3IN2OS                                       10           9               9          9            2             6
    Thioguanine                                                                          C5H5N5S                                        35          43              26         28            4             15       software package, distributed under
    Mercaptopurine                                                                       C5H4N4S                                        19          23               8          8            4             8        LGPL license [1] and using The
    Allopurinol                                                                          C5H4N4O                                        22          22              13         17            5             9
    2-Mercaptobenzothiazol                                                               C7H5NS2                                        6           5               2          2             2             2        Chemistry Development Kit (CDK)
    Amitrole                                                                        C2H4N4                                               8           8               5          7            3              5
                                                                                                                                                                                                                    library [2] for basic cheminformatics
    Thiotetronic acid                                                              C4H4O2S                                               5           5               5          5            3              0       functionality. Ambit-Tautomer utilizes a
    2-thiouracil                                                                  C4H4N2OS                                              10           9               9          9            2              3       depth-first search algorithm, combined
    Flucytosine                                                                   C4H4FN3O                                              10           9               5          9            2              6
    Citrazinic acid                                                                C6H5NO4                                               5           5               5          5            2              2       with a set of rules for tautomeric
    Tenoxicam                                                                    C13H11N3O4S2                                           15          14               8         11            3              2       transformation.
    Mitoguazone                                                                     C5H12N8                                             31          35              12         28            4              4
    Methimazole                                                                    C4H6N2S                                               3           3               2          3            2              3       The Ambit implementation of OpenTox
    Ciclopirox                                                                    C12H17NO2                                              8           3               3          2            2              0
    Dithranal                                                                      C14H10O3                                             52          60              48         48            2              0       Web [3] services for predictive
    Enprofylline                                                                  C8H10N4O2                                             14          17              11         11            3              8       toxicology is being extended to include
    Thymine                                                                       C5H6N2O2                                              10           9              9           9            2              6
    Abscinic avid                                                                  C15H20O4                                             30          30              5          10            2              0
                                                                                                                                                                                                                    the tautomer generation algorithm. A
    4-acetamidobenzaldehyde                                                        C9H9NO2                                               9           9              9          9             2              2       web page, providing online tautomer
    Acetoacetanilide                                                              C10H11NO2                                             21          21              8          8             2              2       generation     by    several    different
    Acetobromoglucose                                                             C14H19BrO9                                            16          16              0          0             2              0
    4-acetoxy-benzaldehyde                                                          C9H8O3                                               2           2              0          0             2              0       algorithms, including Ambit-Tautomer, is
    3-acetoxy-2-cyclohexen-1-one                                                   C8H10O3                                              10          10              5          5             2              0       available at:
    2-acetyl-butyrolactone                                                          C6H8O3                                               5           5              5          5             2              0
    N-Acetyl-L-cysteine                                                           C5H9NO3S                                              14          20              2          3             2              2
    2-aminobenzamide                                                               C7H8N2O                                              11           9              7          5             2              2                                   http://apps.ideaconsult.net:8080/ambit2/depict/tautomer
    2-aminobenzonitrile                                                             C7H6N2                                               5           5              0          0             2              0
    2-aminodiphenylamine                                                           C12H12N2                                             21          20              0          0             2              0
                                                                                                                                                                                                                    References
    4-aminohippuric acid                                                          C9H10N2O3                                             18          24              5          2             2              2       [1] AMBIT project, http://ambit.sourceforge.net
    Mesalazine                                                                     C7H7NO3                                              13          12              9          6             2              0
    2-amino-3-hydroxypyridine                                                      C5H6N2O                                              11          10              4          8             2              2       [2] Steinbeck C., Hoppe C., Kuhn S., Floris M., Guha R., Willighagen E.L., ,Recent Developments of the
    2-(aminomethyl)pyridine                                                         C6H8N2                                              18          23              0          0             2              0       Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics". Curr.
    2-aminophenol                                                                  C6H7NO                                               11          10              7          0             2              0
    3-aminophenol                                                                  C6H7NO                                               10          9               9          0             2              0
                                                                                                                                                                                                                    Pharm. Des. 2006; 12(17):2111-2120 (DOI: 10.2174/138161206777585274)
    4-aminophenol                                                                  C6H7NO                                                6          6               4          0             2              0       [3] Jeliazkova N., Jeliazkov V. AMBIT RESTful web services: an implementation of the OpenTox application
    benzhydrazide                                                               C6H5CONHNH2                                              3          6               2          3             2              2
    Carbazole                                                                       C12H9N                                              12          7               0          0             2              0
                                                                                                                                                                                                                    programming interface, Journal of Cheminformatics 2011, 3:18, doi:10.1186/1758-2946-3-18.;
                                                                                                                                                                                                                    http://www.jcheminf.com/content/3/1/18
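
Illustrative sketch (not the Ambit-Tautomer source code). The poster describes each tautomer as a binary combination of rule states and generates the combinations by depth-first search, refining the list of unused rules at each step because overlapping rules can conflict. The toy Java model below shows that idea only. The class names RuleInstance and RuleStateDfs, the representation of a rule as a name plus matched atom indices, and the "drop every unused rule that shares an atom" refinement are all assumptions made for illustration; the real tool works on CDK structures and SMARTS-based rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy model: a rule instance is a name plus the atom indices it matched. */
final class RuleInstance {
    final String name;
    final Set<Integer> atoms = new TreeSet<>();

    RuleInstance(String name, int... atomIndices) {
        this.name = name;
        for (int a : atomIndices) {
            atoms.add(a);
        }
    }

    /** Two rule instances conflict (in this toy model) if they share any atom. */
    boolean overlaps(RuleInstance other) {
        for (Integer a : atoms) {
            if (other.atoms.contains(a)) return true;
        }
        return false;
    }
}

final class RuleStateDfs {
    /** Returns each reachable combination as the '+'-joined names of the shifted rules. */
    static Set<String> enumerate(List<RuleInstance> rules) {
        Set<String> results = new TreeSet<>();
        dfs(rules, new ArrayList<>(), results);
        return results;
    }

    private static void dfs(List<RuleInstance> unused, List<String> shifted, Set<String> results) {
        if (unused.isEmpty()) {
            // All rules have a state: record this combination.
            results.add(shifted.isEmpty() ? "input" : String.join("+", shifted));
            return;
        }
        RuleInstance current = unused.get(0);
        List<RuleInstance> rest = unused.subList(1, unused.size());

        // Branch 1: keep the current rule in its original state.
        dfs(rest, shifted, results);

        // Branch 2: apply the shift, then refine the unused list by dropping
        // rules whose matched atoms overlap the rule just applied.
        List<RuleInstance> refined = new ArrayList<>();
        for (RuleInstance r : rest) {
            if (!current.overlaps(r)) refined.add(r);
        }
        shifted.add(current.name);
        dfs(refined, shifted, results);
        shifted.remove(shifted.size() - 1); // backtrack
    }

    public static void main(String[] args) {
        // Rules A and B share atoms 1 and 3, so they are never shifted together.
        List<RuleInstance> rules = List.of(
            new RuleInstance("A", 0, 1, 3),
            new RuleInstance("B", 2, 1, 3),
            new RuleInstance("C", 4, 5, 6));
        System.out.println(enumerate(rules));
    }
}
```

With three rules a pure combinatorial enumeration would give 2^3 = 8 state combinations. In this sketch the two combinations that shift both overlapping rules A and B are never visited, which leaves six.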

AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION

  • 1. AMBIT-TAUTOMER: AN OPEN SOURCE TOOL FOR TAUTOMER GENERATION

Nikolay Kochev (a), Vesselina Paskaleva (a), Nina Jeliazkova (b)
a) University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, 24, Tzar Assen Str., Plovdiv 4000, Bulgaria
b) Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria, jeliazkova.nina@gmail.com

Ambit-Tautomer Basic Features

Software characteristics
• CDK (cdk.sf.net) based structure representation, input, output and info processing
• Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML (CDK representation)
• Exhaustive tautomer generation
• Customizable set of rules and post-generation filters
• Set of predefined rules
• Tautomer ranking based on simple empirical rules

Tautomer generation algorithms
• Pure combinatorial algorithm
• Incremental approach (based on a depth-first search algorithm) for rule combination, with local rule corrections and refinement on the way

Customizable set of transformation rules
• Basic set of 1-3 and 1-5 proton shift rules
• Additional rules: 1-7 proton shifts, chlorine atom shifts, ring-chain rules
• Rule description based on SMARTS

Tautomer generation flow chart
• Structure input
• Substructure search
• Initial rules list
• Generation of all possible combinations of the rule states, based on depth-first search with refinement of the rule list at each step
• Post-generation filtering: duplicates, topological equivalency, allene atoms, incorrect structures, …
• Ranking
• Result

Combinations of non-overlapping rules
Each tautomer is described as a binary combination of the rule states (00, 01, 10, 11).

Overlapping rules
- simple combinations do not work
- rule conflicts are possible
- some tautomers might be omitted
- a more sophisticated approach is needed

[Figure: depth-first search tree for the input structure OC(O)=C(N)C (atoms numbered 0 to 5). The initial rules list is NC=C at 431, OC=C at 013, OC=C at 213. Each node shows the current structure with its lists of used and unused rules (OC=C at 013, OC=C at 213, NC=C at 431, NC=C at 435 and their shifted forms O=CC and N=CC); a marker indicates the current rule used to generate two possible states.]

Software Comparison

An exhaustive comparison of the algorithm against several other software packages for tautomer generation was performed: ChemAxon Marvin, MN.TAUTOMER (Molecular Networks), ACD Labs, CACTVS, and the CDK implementation of the algorithm based on the mobile H atoms listed in the InChI.

According to the test results, Ambit-Tautomer's performance is comparable to or better than that of the Marvin, CACTVS and MN.TAUTOMER algorithms, and it generates considerably more tautomeric structures than the ACD Labs and InChI-based algorithms.

[Figures: "Violuric Acid Tautomer Generation, Software Comparison" and "Pemoline Tautomer Generation, Software Comparison". Recoverable panel labels: "Target structure", "Ambit (15 tautomers generated)", "Number of tautomers generated by: CACTVS, Marvin, MN_Tautomer, ACD Labs, InChI/CDK, Daylight", plus a rank and a list of tool abbreviations next to each structure.]

Abbreviations:
M - Marvin (ChemAxon)
A - ACD Labs
C - CACTVS
I - InChI/CDK
T - Mn_Tautomer

Number of tautomers generated for each test compound by the compared tools (AMBIT, Marvin, CACTVS, MN.TAUTOMER, InChI/CDK, ACD Labs). The six count columns follow the poster's column order; the tool heading of each column is not legible in this transcript.

Name                           Formula        Tautomer counts
Divicine                       C4H6N4O2       28  31  27  27   3   9
Iodothiouracil                 C4H3IN2OS      10   9   9   9   2   6
Thioguanine                    C5H5N5S        35  43  26  28   4  15
Mercaptopurine                 C5H4N4S        19  23   8   8   4   8
Allopurinol                    C5H4N4O        22  22  13  17   5   9
2-Mercaptobenzothiazole        C7H5NS2         6   5   2   2   2   2
Amitrole                       C2H4N4          8   8   5   7   3   5
Thiotetronic acid              C4H4O2S         5   5   5   5   3   0
2-thiouracil                   C4H4N2OS       10   9   9   9   2   3
Flucytosine                    C4H4FN3O       10   9   5   9   2   6
Citrazinic acid                C6H5NO4         5   5   5   5   2   2
Tenoxicam                      C13H11N3O4S2   15  14   8  11   3   2
Mitoguazone                    C5H12N8        31  35  12  28   4   4
Methimazole                    C4H6N2S         3   3   2   3   2   3
Ciclopirox                     C12H17NO2       8   3   3   2   2   0
Dithranol                      C14H10O3       52  60  48  48   2   0
Enprofylline                   C8H10N4O2      14  17  11  11   3   8
Thymine                        C5H6N2O2       10   9   9   9   2   6
Abscisic acid                  C15H20O4       30  30   5  10   2   0
4-acetamidobenzaldehyde        C9H9NO2         9   9   9   9   2   2
Acetoacetanilide               C10H11NO2      21  21   8   8   2   2
Acetobromoglucose              C14H19BrO9     16  16   0   0   2   0
4-acetoxy-benzaldehyde         C9H8O3          2   2   0   0   2   0
3-acetoxy-2-cyclohexen-1-one   C8H10O3        10  10   5   5   2   0
2-acetyl-butyrolactone         C6H8O3          5   5   5   5   2   0
N-Acetyl-L-cysteine            C5H9NO3S       14  20   2   3   2   2
2-aminobenzamide               C7H8N2O        11   9   7   5   2   2
2-aminobenzonitrile            C7H6N2          5   5   0   0   2   0
2-aminodiphenylamine           C12H12N2       21  20   0   0   2   0
4-aminohippuric acid           C9H10N2O3      18  24   5   2   2   2
Mesalazine                     C7H7NO3        13  12   9   6   2   0
2-amino-3-hydroxypyridine      C5H6N2O        11  10   4   8   2   2
2-(aminomethyl)pyridine        C6H8N2         18  23   0   0   2   0
2-aminophenol                  C6H7NO         11  10   7   0   2   0
3-aminophenol                  C6H7NO         10   9   9   0   2   0
4-aminophenol                  C6H7NO          6   6   4   0   2   0
benzhydrazide                  C6H5CONHNH2     3   6   2   3   2   2
Carbazole                      C12H9N         12   7   0   0   2   0

Ambit-Tautomer is part of the Ambit2 software package, distributed under the LGPL license [1] and using the Chemistry Development Kit (CDK) library [2] for basic cheminformatics functionality. Ambit-Tautomer utilizes a depth-first search algorithm, combined with a set of rules for tautomeric transformation.

The Ambit implementation of the OpenTox web services [3] for predictive toxicology is being extended to include the tautomer generation algorithm. A web page providing online tautomer generation by several different algorithms, including Ambit-Tautomer, is available at:
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer

References
[1] AMBIT project, http://ambit.sourceforge.net
[2] Steinbeck C., Hoppe C., Kuhn S., Floris M., Guha R., Willighagen E.L. "Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics". Curr. Pharm. Des. 2006; 12(17):2111-2120 (DOI: 10.2174/138161206777585274)
[3] Jeliazkova N., Jeliazkov V. "AMBIT RESTful web services: an implementation of the OpenTox application programming interface". Journal of Cheminformatics 2011, 3:18, doi:10.1186/1758-2946-3-18; http://www.jcheminf.com/content/3/1/18
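
The binary rule-state description used in the poster can be illustrated with a short, self-contained sketch. It is not the Ambit-Tautomer code: `RuleInstance`, `binary_combinations` and `dfs_rule_states` are hypothetical names introduced here, rule instances are reduced to a pattern string plus the atom indices they match, and a "conflict" is assumed to mean that two rule instances share an atom. Unlike the incremental algorithm described on the poster, the sketch does not re-detect rules after each transformation; it only shows why plain 2^n enumeration is adequate for non-overlapping rules and how a depth-first traversal can skip conflicting combinations.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Tuple


class RuleInstance(NamedTuple):
    """Hypothetical record: a rule pattern and the atom indices it matched."""
    pattern: str
    atoms: FrozenSet[int]


def binary_combinations(n_rules: int) -> List[Tuple[int, ...]]:
    """All 2**n state vectors: 0 = rule in its initial state, 1 = rule shifted.

    Only valid when the rule instances do not overlap.
    """
    return list(product((0, 1), repeat=n_rules))


def dfs_rule_states(rules: List[RuleInstance]) -> List[Tuple[int, ...]]:
    """Depth-first enumeration of state vectors with no two shifted rules sharing an atom."""
    results: List[Tuple[int, ...]] = []

    def visit(i: int, used: List[RuleInstance], state: List[int]) -> None:
        if i == len(rules):
            results.append(tuple(state))
            return
        # Branch 1: leave rule i in its initial state.
        visit(i + 1, used, state + [0])
        # Branch 2: shift rule i, but only if it conflicts with no rule already used.
        if all(not (rules[i].atoms & r.atoms) for r in used):
            visit(i + 1, used + [rules[i]], state + [1])

    visit(0, [], [])
    return results


# Initial rules list from the poster example OC(O)=C(N)C (atom indices 0..5).
poster_rules = [
    RuleInstance("NC=C", frozenset({4, 3, 1})),
    RuleInstance("OC=C", frozenset({0, 1, 3})),
    RuleInstance("OC=C", frozenset({2, 1, 3})),
]
overlapping_states = dfs_rule_states(poster_rules)
```

Because the three rule instances of the poster example all share atoms 1 and 3, the traversal keeps only four of the eight possible state vectors (none shifted, or exactly one shifted). For rule instances with disjoint atom sets, `dfs_rule_states` returns the same vectors as `binary_combinations`.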