Automatic atom mapping attempts to determine the correspondence between the atoms of the reactants and products of a chemical reaction. Such mappings are useful for allowing greater specificity in queries of reaction databases. Recently there has been increased interest in their use to assist in the validation and standardisation of reactions in pharmaceutical ELNs (electronic lab notebooks). Atom mappings can, for example, detect if a reactant is missing or if a reactant does not contribute atoms to the product and hence may be better stored as an agent.
We have evaluated the performance of the new atom mapping algorithm introduced with Marvin v6 compared to the prior version on a publicly available dataset extracted from the patent literature and on reactions from multiple pharmaceutical ELNs. Dramatic improvements are observed in all cases, both in the percentage of reactions that can be successfully atom-mapped and in the quality of the mappings produced.
Finally, we examine the difficulties that remain in validating reactions for which a complete atom mapping is not possible, such as “routine” reactions where the reactant that was added is missing.
1. ChemAxon UGM, San Diego, USA 25th September 2013
Recent improvements in Marvin v6: Reaction Atom Mapping and its Application to Reaction Validation in Pharmaceutical ELNs
Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK
What is Atom-Mapping?
Mapping algorithm
Why Perform Atom-Mapping?
• Assigning roles to reagents
• Normalization of reactions for registration
Why Perform Atom-Mapping?
• More precise database searches
– Solvents/catalysts can be distinguished from reactants
– Allows the relationship between the reactant atoms and product atoms to be made explicit
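As a rough illustration of role assignment, the sketch below reads the atom-map numbers from an already mapped reaction SMILES and treats any left-hand component that contributes no mapped atom to the product as an agent. It is not Marvin's implementation: the function names `map_numbers` and `assign_roles` are hypothetical, the example reaction is invented for illustration, and splitting on "." assumes each dot-separated component is a separate molecule (which is not true for salts).

```python
import re

# Captures the atom-map number inside a bracket atom, e.g. "[CH3:7]" -> "7".
MAP_RE = re.compile(r"\[[^\]]*:(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES string."""
    return {int(n) for n in MAP_RE.findall(smiles)}


def assign_roles(mapped_rxn):
    """Split a mapped reaction SMILES into (reactants, agents).

    Assumption: a left-hand component is a reactant only if at least one
    of its map numbers also appears on the product side; anything else is
    moved to the agent list.
    """
    lhs, middle, rhs = mapped_rxn.split(">")
    product_maps = map_numbers(rhs)
    reactants = []
    agents = [a for a in middle.split(".") if a]
    for component in lhs.split("."):
        if not component:
            continue
        if map_numbers(component) & product_maps:
            reactants.append(component)
        else:
            agents.append(component)
    return reactants, agents


if __name__ == "__main__":
    # Illustrative esterification with sulfuric acid drawn as a reactant.
    rxn = ("[CH3:1][C:2](=[O:3])O.[CH3:5][CH2:6][OH:7].OS(=O)(=O)O"
           ">>[CH3:1][C:2](=[O:3])[O:7][CH2:6][CH3:5]")
    reactants, agents = assign_roles(rxn)
    print("reactants:", reactants)
    print("agents:", agents)
```

The same check run in reverse (product atoms with no map number) points to a possibly missing reactant.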
Example
• I want to find reactions converting an alkene to a cyclopropane, so I search for C=C>>C1CC1
Why Perform Atom-Mapping?
• Identifying suspect reactions:
ChemAxon atom mapping
Atom mapping modes
• Complete
• Changing
• Matching
Methodology
Test set                                                         Reactions
Pharmaceutical ELN subset                                        18,244
ChemReact68 database                                             67,926
SPRESI database subset                                           5,230
Reactions extracted from 2008-2011 USPTO patent applications*    562,872
* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.
Metrics used
• Were all product atoms mapped?
– Measures recall
• How many C-C bonds were broken?
– Measures precision
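The first metric can be approximated directly on mapped reaction SMILES. The sketch below is a simplification under stated assumptions, not the evaluation code used for the talk: it tokenises product atoms with a regular expression (bracket atoms plus the organic-subset symbols), counts explicit `[H]` atoms as atoms, and uses hypothetical function names and invented example reactions.

```python
import re

# Heavy-atom tokens in SMILES: bracket atoms first, then two-letter
# halogens, then the remaining organic-subset and aromatic symbols.
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom that carries an atom-map number, e.g. "[CH2:6]".
MAPPED_RE = re.compile(r"\[[^\]]*:\d+\]")


def product_atoms(rxn_smiles):
    """Return the atom tokens of the product side of a reaction SMILES."""
    products = rxn_smiles.split(">")[2]
    return ATOM_RE.findall(products)


def all_product_atoms_mapped(rxn_smiles):
    """True if every product atom token carries an atom-map number."""
    atoms = product_atoms(rxn_smiles)
    return bool(atoms) and all(MAPPED_RE.fullmatch(a) for a in atoms)


def percent_fully_mapped(reactions):
    """Percentage of reactions in which all product atoms are mapped."""
    if not reactions:
        return 0.0
    hits = sum(all_product_atoms_mapped(r) for r in reactions)
    return 100.0 * hits / len(reactions)


if __name__ == "__main__":
    reactions = [
        # Every product atom has a map number.
        "[CH3:1][C:2](=[O:3])O.[CH3:5][CH2:6][OH:7]"
        ">>[CH3:1][C:2](=[O:3])[O:7][CH2:6][CH3:5]",
        # The methoxy group has no source on the left-hand side, so its
        # atoms stay unmapped (for example, a missing reactant).
        "[CH3:1][CH2:2]Br>>[CH3:1][CH2:2]OC",
    ]
    print(percent_fully_mapped(reactions))
```

Counting broken C-C bonds would additionally require parsing bonds on both sides of the reaction, which is better left to a cheminformatics toolkit than to a regular expression.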
Ability to map all product atoms
Figure (bar chart): percent of reactions with all product atoms mapped (axis 0 to 80) for the PharmaELN, ChemReact68, SPRESI and USPTO test sets; series: Marvin 5.10, Marvin 6.0, ChemDraw 12.
Figure panels: Marvin 5.10, ChemDraw 12, Marvin 6.0
Speed Comparison
Figure (bar chart): reactions mapped per second (axis 0 to 350) for Marvin 5.12, Marvin 6.0 and Marvin 6.0 (multithreaded).
* Comparison performed on the PharmaELN dataset on an i7-2600
Beyond atom mapping
• Missing reactants (often for routine reactions)
Beyond atom mapping
• Change of stereoisomer or chiral resolution
(E)-3-{8-[2-(4-Isopropyl-1,3-thiazol-2-yl)ethyl]-2-methoxy-4-oxo-4H-pyrido[1,2-a]pyrimidin-3-yl}-2-propenoic acid (1 mg) was dissolved in CDCl3 (0.5 ml) and irradiated with light from a fluorescent lamp for 19 hours. The solvent was evaporated to obtain the title compound (1 mg).
Atom mapping + classification
Figure (bar chart): percent of reactions with all product atoms mapped (axis 0 to 100), for atom mapping algorithms alone and combined with NameRXN; series: Marvin 6.0, ChemDraw 12, Consensus. Annotation: result verified / recognised by NameRXN (71%).
Conclusions
• Marvin v6’s atom mapping algorithm provides large improvements in recall, precision and speed over v5
• Atom mapping in some cases isn’t as simple as finding a maximum common subgraph mapping
• Classification algorithms can be useful for the validation of some reactions
Acknowledgements
• Zsolt Mohacsi and Istvan Rabel, ChemAxon
• Ed Griffen and Nick Tomkinson, AstraZeneca
• Andrew Wooster, GSK
• Hans Kraut, InfoChem
• Thank you for your time.