Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms

Presented in the Open Notebook Science/Open Chemistry/Electronic Lab Notebook symposium. 20th August 2012, Philadelphia ACS

  1. Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms
     Daniel Lowe and Roger Sayle, NextMove Software, Cambridge, UK
     ACS National Meeting, Philadelphia, USA, 20th August 2012
  2. What is Atom-Mapping?
     Mapping algorithm
  3. Why Perform Atom-Mapping?
     • Assigning roles to reagents
     • Normalization of reactions for registration
  4. Why Perform Atom-Mapping?
     • More precise database searches
       – Solvents/catalysts can be distinguished from reactants
       – Allows the relationship between the reactant atoms and product atoms to be made explicit
  5. Example
     • I want to find reactions converting an alkene to a cyclopropane, so I search for C=C>>C1CC1
  6. Why Perform Atom-Mapping?
     • Identifying suspect reactions:
  7. Qualities to look for in an atom mapping algorithm
     • Chemically plausible atom mappings
     • Ability to distinguish genuine reactants from solvents/catalysts
     • Support for unbalanced reactions
       – Side product not specified
       – Reactant stoichiometry > 1
     • Fast run-time
  8. Algorithms Evaluated
     Vendor:Program                 Version
     ChemAxon:Marvin                5.10.1
     GGA:Indigo                     1.1
     InfoChem:ICMAP                 5.10
     PerkinElmer:ChemDraw Ultra     12.0
  9. Methodology
     Test set                                                         Reactions
     Pharmaceutical ELN subset                                        18,244
     ChemReact68 database                                             67,926
     SPRESI database subset                                           5,230
     Reactions extracted from 2008-2011 USPTO patent applications*    562,872
     * Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.
  10. Methodology, cont.
      • Reaction SMILES were used as input and output for all algorithms bar ICMAP
      • Input and output were converted to and from RDF for use with ICMAP
      • Indigo was run with its default configuration and more lenient settings for matching valences, charges and bond orders
      • Marvin was configured to use its best quality mapping strategy
  11. Ability to map all product atoms
  12. C-C bonds broken
  13. Speed Comparison
      Average reagents per reaction: 1.7, 3.6, 1.6, 4.0
  14. Simple mappings
      Marvin/ChemDraw/Indigo/ICMAP
  15. Simple mappings
      Marvin/ChemDraw/Indigo/ICMAP
  16. More complicated mappings
      Marvin, ChemDraw
  17. More complicated mappings
      Indigo, ICMAP
  18. Reuse of reactants
  19. Reuse of reactants
      Marvin
  20. Reuse of reactants
      ChemDraw
  21. Reuse of reactants
      Indigo
  22. Reuse of reactants
      ICMAP
  23. Single Atom Mapping
      ICMAP/Marvin, ChemDraw/Indigo
  24. Bugs and quirks
      • Marvin
        – 2 unsuccessful mappings produced unchecked exceptions rather than checked exceptions
      • ChemDraw
        – Hydrogen on aromatic atoms missing in SMILES output
      • Indigo
        – Calculation of valency fails for aromatic sulfur
  25. Bugs and quirks
      • ICMAP
        – Single atom products are interpreted as empty molecules or occasionally replaced by a product from a previous reaction (bug reported)
        – Input files must be < 2 GB and use DOS line endings
  26. Conclusions
      • ICMAP produced the best quality mappings on the tested sets
      • Atom mapping isn’t as simple as finding a maximum common subgraph mapping
      • In all the algorithms there were aspects that could be improved to yield appreciable benefits
  27. Acknowledgements
      • Ed Griffen and Nick Tomkinson, AstraZeneca
      • Andrew Wooster, GSK
      • Hans Kraut, InfoChem
      • Thank you for your time.
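
Slides 3 and 4 name role assignment as a reason to atom-map: once a reaction is mapped, solvents and catalysts can be told apart from genuine reactants. The following is a minimal sketch of that idea, not the method used by any of the evaluated programs. It assumes the input is an already-mapped reaction SMILES in the usual `reactants>agents>products` layout, that molecules are separated by `.` (so the components of a salt are treated separately), and that map numbers appear as `:n` inside bracket atoms. The function names and the example reaction are hypothetical and were written by hand for illustration.

```python
import re

# Atom-map class inside a bracket atom, e.g. the 1 in [CH2:1].
MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of non-zero atom-map numbers found in a SMILES string."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)} - {0}


def assign_roles(mapped_reaction):
    """Split left-hand-side molecules into reactants and spectators.

    A molecule is called a reactant if at least one of its map numbers
    also occurs on the product side; everything else (solvent, catalyst,
    unmapped reagent) is called a spectator.
    """
    left, agents, right = mapped_reaction.split(">")
    product_maps = map_numbers(right)
    roles = {"reactant": [], "spectator": []}
    for mol in filter(None, left.split(".") + agents.split(".")):
        role = "reactant" if map_numbers(mol) & product_maps else "spectator"
        roles[role].append(mol)
    return roles


# Hand-mapped illustration: an alkene, a one-carbon donor and a solvent
# on the left, a cyclopropane on the right.
example = "[CH2:1]=[CH2:2].[CH2:3](I)I.CCOCC>>[CH2:1]1[CH2:2][CH2:3]1"
print(assign_roles(example))
# {'reactant': ['[CH2:1]=[CH2:2]', '[CH2:3](I)I'], 'spectator': ['CCOCC']}
```

The result is only as good as the mapping it is given: a molecule whose atoms the mapper failed to map would be labelled a spectator here even if it contributes atoms to the product.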

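Slide 11 compares the programs on their ability to map all product atoms. One rough way to measure this on mapped reaction SMILES output is sketched below; it is an assumption about how such a count could be made, not the procedure used in the talk. The tokenizer is a simplified regular expression that recognises bracket atoms and the organic-subset atoms, counts explicit `[H]` atoms like any other bracket atom, and treats a map number of 0 as unmapped. The function name and example reactions are hypothetical.

```python
import re

# Bracket atoms first, then two-letter organic-subset atoms, then
# one-letter aliphatic and aromatic atoms. Bonds, ring-closure digits
# and branch parentheses are not atoms and are skipped.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")

# A bracket atom that ends in a non-zero map number, e.g. [CH2:3].
MAPPED = re.compile(r":[1-9]\d*\]$")


def product_coverage(mapped_reaction):
    """Return (mapped_atoms, total_atoms) for the product side."""
    products = mapped_reaction.split(">")[2]
    atoms = ATOM.findall(products)
    mapped = sum(1 for atom in atoms if MAPPED.search(atom))
    return mapped, len(atoms)


# Every product atom carries a map number.
print(product_coverage("[CH2:1]=[CH2:2].[CH2:3](I)I>>[CH2:1]1[CH2:2][CH2:3]1"))
# (3, 3)

# The two atoms of the methoxy group were left unmapped.
print(product_coverage("[CH3:1][C:2](=[O:3])[OH:4].CO>>[CH3:1][C:2](=[O:3])OC"))
# (3, 5)
```

A reaction counts as fully mapped when the two numbers are equal. Unbalanced reactions, which slide 7 lists as a case a mapper should support, do not affect this count because only the product side is inspected.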