Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms

Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK

ACS National Meeting, Philadelphia, USA, 20th August 2012

What is Atom-Mapping?




[Figure: mapping algorithm]

Why Perform Atom-Mapping?

• Assigning roles to reagents

• Normalization of reactions for registration




Why Perform Atom-Mapping?

• More precise database searches
  – Solvents/catalysts can be distinguished from
    reactants
  – Allows the relationship between the reactant
    atoms and product atoms to be made explicit
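As a rough illustration of how explicit atom maps support role assignment, the sketch below reads the map numbers out of a mapped reaction SMILES and labels each left-hand-side molecule by whether any of its mapped atoms reach the products. The function names and the example reaction are hypothetical, the parsing is a plain regular expression rather than a cheminformatics toolkit, and it assumes every molecule is written to the left of the first `>`.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. the 1 in [CH3:1].
MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES fragment."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)}


def assign_roles(reaction_smiles):
    """Label each left-hand-side molecule as 'reactant' or 'agent'.

    Assumption for this sketch: a molecule is a reactant if at least one
    of its mapped atoms also appears in the products; anything else is
    treated as an agent (solvent, catalyst and so on).
    """
    left, _middle, right = reaction_smiles.split(">")
    product_maps = map_numbers(right)
    roles = {}
    for molecule in left.split("."):
        contributes = bool(map_numbers(molecule) & product_maps)
        roles[molecule] = "reactant" if contributes else "agent"
    return roles


# Hypothetical mapped esterification with dichloromethane left unmapped.
example = (
    "[CH3:1][C:2](=[O:3])O.[CH3:4][CH2:5][OH:6].ClCCl"
    ">>[CH3:1][C:2](=[O:3])[O:6][CH2:5][CH3:4]"
)
roles = assign_roles(example)
```

Splitting on `.` treats each ion of a salt as its own molecule, so a real pipeline would probably want to group components before assigning roles.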




Example
• I want to find reactions converting an alkene
  to a cyclopropane so I search for C=C>>C1CC1




Why Perform Atom-Mapping?

• Identifying suspect reactions:




Qualities to look for in an atom mapping algorithm
• Chemically plausible atom mappings
• Ability to distinguish genuine reactants from
  solvents/catalysts
• Support for unbalanced reactions
  – Side product not specified
  – Reactant stoichiometry > 1
• Fast run-time


Algorithms Evaluated

Vendor        Program          Version
ChemAxon      Marvin           5.10.1
GGA           Indigo           1.1
InfoChem      ICMAP            5.10
PerkinElmer   ChemDraw Ultra   12.0




Methodology

Test set                                                         Reactions
Pharmaceutical ELN subset                                        18,244
ChemReact68 database                                             67,926
SPRESI database subset                                           5,230
Reactions extracted from 2008-2011 USPTO patent applications*    562,872



* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature.
243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.

Methodology (cont.)

• Reaction SMILES were used as input and
  output for all algorithms except ICMAP
• Input and output were converted to and from
  RDF for use with ICMAP
• Indigo was run with its default configuration
  and more lenient settings for matching
  valences, charges and bond orders
• Marvin was configured to use its best
  quality mapping strategy

Ability to map all product atoms
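
One way a measure like this could be computed from mapped reaction SMILES output is sketched below. This is not the authors' evaluation code: the tokenizer is a simplified regular expression that recognises bracket atoms and bare organic-subset atoms, the function names are hypothetical, and explicit `[H]` atoms are counted like any other atom.

```python
import re

# One token per atom: a bracket atom, or a bare organic-subset atom.
# Two-letter symbols (Cl, Br) come first so they are not split in two.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# A bracket atom is considered mapped if it ends in ":<digits>]".
MAPPED = re.compile(r":\d+\]$")


def product_atom_coverage(reaction_smiles):
    """Return (mapped, total) atom counts for the product side."""
    products = reaction_smiles.split(">")[2]
    atoms = ATOM.findall(products)
    mapped = sum(1 for atom in atoms if MAPPED.search(atom))
    return mapped, len(atoms)


def fully_mapped(reaction_smiles):
    """True if the reaction has products and every product atom is mapped."""
    mapped, total = product_atom_coverage(reaction_smiles)
    return total > 0 and mapped == total


def fraction_fully_mapped(reactions):
    """Fraction of reactions in which all product atoms carry a map number."""
    reactions = list(reactions)
    if not reactions:
        return 0.0
    return sum(fully_mapped(r) for r in reactions) / len(reactions)
```

Calling `fraction_fully_mapped` on the output of each program for a given test set would give one number per program and test set, which is the kind of comparison this slide's title suggests.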




C-C bonds broken




Speed Comparison




Average reagents per reaction     1.7     3.6     1.6     4.0

Simple mappings




[Figure: Marvin/ChemDraw/Indigo/ICMAP]

Simple mappings




[Figure: Marvin/ChemDraw/Indigo/ICMAP]

More complicated mappings




[Figure: Marvin]

[Figure: ChemDraw]

More complicated mappings



[Figure: Indigo]

[Figure: ICMAP]

Reuse of reactants




Reuse of reactants




[Figure: Marvin]

Reuse of reactants




[Figure: ChemDraw]

Reuse of reactants




[Figure: Indigo]

Reuse of reactants




[Figure: ICMAP]

Single Atom Mapping




[Figure: ICMAP/Marvin]

[Figure: ChemDraw/Indigo]

Bugs and quirks

• Marvin
  – Two unsuccessful mappings threw unchecked
    exceptions rather than checked exceptions
• ChemDraw
  – Hydrogens on aromatic atoms are missing from
    the SMILES output
• Indigo
  – Calculation of valency fails for aromatic sulfur

Bugs and quirks

• ICMAP
  – Single atom products are interpreted as empty
    molecules or occasionally replaced by a product
    from a previous reaction (bug reported)
  – Input files must be < 2 GB and use DOS line endings
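
The input constraints in the last point could be handled with a small preprocessing step along the following lines. This is only a sketch: the helper names are hypothetical, "2 GB" is assumed to mean 2 × 1024³ bytes, and splitting an oversized file at record boundaries is left out.

```python
import os

TWO_GB = 2 * 1024 ** 3  # assumed interpretation of the 2 GB limit


def write_with_dos_line_endings(src_path, dst_path):
    """Copy a text file, normalising LF and CRLF line endings to CRLF.

    A final line that lacks a newline gains one. Returns the size of the
    output file in bytes.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line.rstrip(b"\r\n") + b"\r\n")
    return os.path.getsize(dst_path)


def within_size_limit(path, limit=TWO_GB):
    """True if the file is strictly smaller than the limit."""
    return os.path.getsize(path) < limit
```

Because converting LF to CRLF adds one byte per line, the size check is best run on the converted file rather than on the original.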




Conclusions

• ICMAP produced the best quality mappings on
  the tested sets

• Atom mapping isn’t as simple as finding a
  maximum common subgraph mapping

• In all the algorithms there were aspects that
  could be improved to yield appreciable
  benefits

Acknowledgements

• Ed Griffen and Nick Tomkinson, AstraZeneca
• Andrew Wooster, GSK
• Hans Kraut, InfoChem


• Thank you for your time.



