EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert system for predicting reaction conditions: the Michael reaction case

Prediction of Reaction Conditions
for Michael Additions
G. Marcou, J. Aires de Sousa, A. de Luca, D. Latino, V. Rietsch,
D. Horvath, A. Varnek
ChemAxon UGM, Budapest 19-20 May 2015

• Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE

Chemical reactions are difficult objects
- many species;
- two types of species: reactants and products;
- multi-step reactions,
- dependent on experimental conditions

• How can I synthesize this structure?
• How can I estimate the yield of a given reaction, and its
kinetic and thermodynamic parameters?
• Which reaction conditions should I choose in order
to obtain the desired product selectively?
Chemical reactions in Chemoinformatics

Property = f(Descriptors)
Descriptors: parameters directly derived from molecular structure
• substructural fragments
• topological indices
• physico-chemical parameters
• etc.
f: mathematical relationship established with machine learning methods
• Support Vector Machine (SVM)
• Multi-Linear Regression (MLR)
• Artificial Neural Networks
• etc.
Property = f(structure)
Quantitative Structure-Activity/Property Relationship (QSAR/QSPR)

• Reactivity = f(Reactants structure)
• Reactivity = f(Products structure)
• Reactivity = f(Reaction structure)
Quantitative Structure-Reactivity Relationship (QSRR)

• Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE

Conventional bonds:
single, double, aromatic, …
Dynamic bonds:
created single, broken single, …
A CGR can be viewed as a pseudo-molecule representing a given reaction
Condensed Graph of Reaction

Condensed Graph of Reaction
• A CGR condenses the structural information about reactants
and products: several graphs are merged into a single graph
• This simplified representation makes it possible to
apply to CGRs the methods developed in
chemoinformatics for individual molecules
A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Computer-Aided Molecular Design, 2005, 19, 693-703
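
The condensation step can be illustrated with a minimal sketch. It assumes atom-mapped reactants and products given as plain bond tables; the function names, the heavy-atom numbering, and the generic Nu + C=C-C=O example are illustrative assumptions, not the ISIDA implementation.

```python
def condensed_graph(reactant_bonds, product_bonds):
    """Merge two atom-mapped bond tables into one CGR edge table.

    Each input maps a pair of atom-map numbers to a bond order.
    The result maps each pair to (order_before, order_after); 0 means no bond.
    """
    def normalise(bonds):
        return {tuple(sorted(pair)): order for pair, order in bonds.items()}

    before, after = normalise(reactant_bonds), normalise(product_bonds)
    return {pair: (before.get(pair, 0), after.get(pair, 0))
            for pair in sorted(set(before) | set(after))}


def dynamic_bonds(cgr):
    """Keep only the edges whose bond order changes during the reaction."""
    return {pair: change for pair, change in cgr.items() if change[0] != change[1]}


# Hypothetical mapping (hydrogens omitted): 1 = Nu, 2 = C(beta), 3 = C(alpha),
# 4 = carbonyl C, 5 = O.
reactants = {(2, 3): 2, (3, 4): 1, (4, 5): 2}
products = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 2}

cgr = condensed_graph(reactants, products)
changed = dynamic_bonds(cgr)
print(changed)  # edge (1, 2) is a created single bond, edge (2, 3) goes double to single
```

Edges with identical orders before and after are the conventional bonds; the remaining edges are the dynamic ones.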

ISIDA fragment descriptors
A reaction can be encoded by a descriptor vector,
which can be used in structure-reactivity modeling
Condensed graph of reaction → vector of fragment counts (2 1 2 …)
ISIDA/CGR fragment descriptors
A. Varnek In: "Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
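
A rough sketch of how fragment counts could be collected from a CGR. The labels for dynamic bonds ("0>1", "2>1"), the string form of the fragments, and the restriction to linear sequences of 2 to 3 atoms are assumptions made for illustration; the actual ISIDA fragmentation schemes are richer.

```python
from collections import Counter

# Hypothetical CGR of a Michael addition (hydrogens omitted).
atoms = {1: "N", 2: "C", 3: "C", 4: "C", 5: "O"}
bonds = {(1, 2): "0>1", (2, 3): "2>1", (3, 4): "1", (4, 5): "2"}


def neighbours(atom):
    """Yield (neighbour, bond label) pairs for one atom."""
    for (a, b), label in bonds.items():
        if a == atom:
            yield b, label
        elif b == atom:
            yield a, label


def fragment_counts(max_atoms=3):
    """Count linear atom/bond sequences containing 2..max_atoms atoms."""
    counts = Counter()

    def walk(path, tokens):
        # Count each undirected path once, under a direction-independent name.
        if len(path) >= 2 and path[0] < path[-1]:
            counts["".join(min(tokens, tokens[::-1]))] += 1
        if len(path) == max_atoms:
            return
        for nxt, label in neighbours(path[-1]):
            if nxt not in path:
                walk(path + [nxt], tokens + (f"({label})", atoms[nxt]))

    for start in atoms:
        walk([start], (atoms[start],))
    return counts


counts = fragment_counts()
for name in sorted(counts):
    print(name, counts[name])
```

Sorting the fragment names gives a fixed column order, so the counts of every reaction in a data set can be written as one row of a descriptor matrix.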

Sums of property-dependent contributions P(i) from remote atoms i, at topological distance τ_iK
from the ‘reactive’ center K, modulated by various working hypotheses:
\[
\mathrm{EED}_{p,e,o,w,c}(K) = \sum_{i=1}^{N} \delta_c(i,K) \times P_p^{e}(i) \times \exp\left[-\frac{1}{w}\left(\tau_{iK} - o\right)^{2}\right]
\]
\[
\tau_{iK} =
\begin{cases}
0.5 & \text{if } i = K \\
\text{shortest-path topological distance}(i,K) & \text{otherwise}
\end{cases}
\]
\[
\delta_c(i,K) =
\begin{cases}
1 & \text{if } c = 0 \text{ OR (path between } i \text{ and } K \text{ contains only unsaturated atoms)} \\
0 & \text{otherwise}
\end{cases}
\]
Property type p = 1..11
Property power e = 1..2
Neighborhood control o = 1..4
Neighborhood width w = 2..5
Conjugation toggle c = 0, 1
11 × 2 × 4 × 4 × 2 = 704 EED terms…
Electron Effect Descriptors
M. Elhabiri, E. Davioud-Charvet, D. Horvath, A. Varnek et al., Chemistry Eur. J., 2014, 20, 1–11
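
A minimal sketch of one EED term, under stated assumptions: the exponent is read as −(τ_iK − o)²/w, the conjugation test is taken to mean that i can be reached from K through unsaturated atoms only (both ends included), and the four-atom chain with unit property values is a made-up example rather than one of the 11 tracked properties.

```python
import math
from collections import deque
from itertools import product


def bfs_distances(adj, start, allowed=None):
    """Shortest-path distances from start; optionally visit only allowed atoms."""
    if allowed is not None and start not in allowed:
        return {}
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist and (allowed is None or b in allowed):
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def eed(adj, prop, unsaturated, K, e, o, w, c):
    """One EED term for reactive center K (property values given in prop)."""
    dist = bfs_distances(adj, K)
    conjugated = bfs_distances(adj, K, allowed=unsaturated)
    total = 0.0
    for i, d in dist.items():
        if c == 1 and i not in conjugated:
            continue  # delta_c(i, K) = 0
        tau = 0.5 if i == K else d
        total += prop[i] ** e * math.exp(-((tau - o) ** 2) / w)
    return total


# Hypothetical chain 1-2-3-4; atoms 1 and 2 are unsaturated.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
prop = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}  # placeholder property values
unsaturated = {1, 2}

all_atoms = eed(adj, prop, unsaturated, K=1, e=1, o=1, w=2, c=0)
conj_only = eed(adj, prop, unsaturated, K=1, e=1, o=1, w=2, c=1)

# Parameter grid from the slide: p, e, o, w, c
grid = list(product(range(1, 12), (1, 2), range(1, 5), range(2, 6), (0, 1)))
print(len(grid), all_atoms, conj_only)
```

With c = 1 only atoms 1 and 2 contribute, so the conjugation-restricted term is smaller than the unrestricted one.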

Atom Properties Tracked in EED

• Solvent
- Hydrophobic (kerosene, benzene, …)
- Polar aprotic (THF, DMSO, …)
- Polar protic (water, acetic acid, …)
- No solvent: the reaction occurs in the neat reagents
• Catalyst
- Brønsted acids (hydrochloric acid, …)
- Lewis acids (transition metal ions, …)
- Basic (pyridine, sodium ethanolate, …)
- No catalyst: a catalyst is not needed,
or autocatalysis takes place
The reaction occurs under different conditions, characterized by solvent and catalyst
Michael reaction
Michael donor
Michael acceptor

For a given Nu and R1–R4, which conditions
(catalyst, solvent) lead to a high reaction yield?
Michael reaction

• To build QSRR models predicting optimal reaction
conditions for a query reaction.
Each model answers one specific question, such as “Is this process
feasible with Brønsted acid catalysts?”, “Is this process feasible in aprotic
polar solvents?”, etc.
• To develop a public predictive tool adapted to the
Michael-type reaction case
Michael reaction: goals of the modeling

The data set consists of 222 reactions:
• 53 polar aprotic solvent
• 52 no solvent
• 103 polar protic solvent
• 93 Lewis acid catalyst
• 61 no catalyst
• 40 hydrophobic solvent
• 57 Brønsted acid catalyst
• 45 basic catalyst
• 24 decoys (Michael reactions that are not observed)
Note that the same reaction may proceed under
different conditions!
Michael reaction: data

Non-occurring reaction
Occurring reaction:
condensation of hydroxylamine and aldehyde
Decoy Michael Addition

The compatibility of each Michael reaction in the database with each of the condition
classes is encoded by the condition bitvector.
Condition class              Bit
Solvent: Hydrophobic         0
Solvent: Polar Aprotic       1
Solvent: Polar Protic        1
No Solvent                   0
Catalyst: Brønsted Acid      0
Catalyst: Lewis Acid         0
Catalyst: Basic              1
No Catalyst                  0
Not observed                 0
A bit is set if the reaction was observed under this condition.
NOTE: if a bit is off, we do not know whether the reaction is feasible under that condition!
An additional ‘no-go’ bit is set if the given Michael addition should never be
observed (because of competing carbonyl addition).
Bitvector of reaction conditions
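
The encoding can be sketched in a few lines. The bit order is assumed to follow the left-to-right order of the labels on the slide, and the label strings and function names are illustrative only.

```python
# Condition classes in the assumed slide order (left to right).
CONDITIONS = [
    "Solvent: hydrophobic",
    "Solvent: polar aprotic",
    "Solvent: polar protic",
    "No solvent",
    "Catalyst: Bronsted acid",
    "Catalyst: Lewis acid",
    "Catalyst: basic",
    "No catalyst",
    "Not observed",
]


def encode(observed):
    """Turn a set of observed condition classes into a 9-bit list."""
    unknown = set(observed) - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown condition classes: {sorted(unknown)}")
    return [1 if name in observed else 0 for name in CONDITIONS]


def decode(bits):
    """List the condition classes whose bit is set."""
    return [name for name, bit in zip(CONDITIONS, bits) if bit]


bits = encode({"Solvent: polar aprotic", "Solvent: polar protic", "Catalyst: basic"})
print(bits)
print(decode(bits))
```

Each column of such vectors is the label of one two-class model; a 0 in the first eight columns means "not reported", not "infeasible".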

Modeling setup
Goal: preparation of 9 two-class classification models:
- 8 models of reaction feasibility, one for each catalyst or solvent type,
- 1 model “Michael / non-Michael”
Descriptors: ISIDA/CGR, EED … and also MOLMAP, CDK
Scenarios: reagent-based, product-based and reaction-based
Performance assessment: ROC AUC in 3-fold cross-validation
Machine-learning methods: Random Forest, SVM, Naïve Bayes
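
For reference, ROC AUC can be computed directly from its pairwise definition: the fraction of (positive, negative) pairs that the model ranks correctly, ties counting one half. The sketch below is plain Python with made-up labels and scores; the interleaved, unstratified fold assignment is an assumption, since the slide does not say how the three folds were drawn.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_indices(n_items, n_folds=3):
    """Assign items to folds in an interleaved way (item i goes to fold i % n_folds)."""
    return [list(range(fold, n_items, n_folds)) for fold in range(n_folds)]


# Hypothetical labels (1 = feasible under the condition) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.5, 0.2]

auc = roc_auc(labels, scores)
folds = fold_indices(7)
print(auc, folds)
```

In cross-validation, each fold is held out once, the model is trained on the other two, and the AUC is computed on the held-out predictions.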

CGR for Michael reaction
Dynamic bonds: created single; double to single

Reaction descriptors
[Figure: ROC AUC (axis from 0.5 to 1) of the EED and ISIDA models for each condition class: Solv:A, Solv:NA, Solv:P, Cat:LA, Cat:NA, Solv:H, Cat:BA, Cat:B, Cat:NO]
ROC AUC = 1 corresponds to an ideal model
G. Marcou, J. Aires de Sousa, D. Latino, A. de Luca, D. Horvath, V. Rietsch, and A. Varnek, J. Chem. Inf. Model., 2015, 55, 239−250.
Michael reaction: models performance

Reagent descriptors
[Figure: performance of the EED and ISIDA models built on reagent descriptors]
Michael reaction: models performance
G. Marcou, J. Aires de Sousa, D. Latino, A. de Luca, D. Horvath, V. Rietsch, and A. Varnek, J. Chem. Inf. Model., 2015, 55, 239−250.

http://infochim.u-strasbg.fr/webserv/VSEngine.html
Web-based expert system predicting
Michael addition feasibility

Model validation on a set of 52 reactions extracted from the literature
Model                   # in the set   # correctly predicted
Catalyst and solvent    52             8
Aprotic solvent         12             2
Protic solvent          21             21
No catalyst             26             14
Base-catalyzed          19             8
Brønsted acids          2              1
Model failure, or incomplete data for the modeling?
Michael reaction: external validation
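
The counts above translate into per-model success rates with a short calculation. The numbers are those of the validation table; the variable names are illustrative.

```python
# (number of reactions in the set, number correctly predicted), from the slide
results = {
    "Catalyst and solvent": (52, 8),
    "Aprotic solvent": (12, 2),
    "Protic solvent": (21, 21),
    "No catalyst": (26, 14),
    "Base-catalyzed": (19, 8),
    "Bronsted acids": (2, 1),
}

fractions = {model: correct / total for model, (total, correct) in results.items()}

for model, fraction in fractions.items():
    print(f"{model:22s} {fraction:6.1%}")
```

Only the protic-solvent model is right on every reaction, while the strictest criterion (both catalyst and solvent correct) succeeds for 8 of 52 reactions.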

Both ISIDA/CGR and EED descriptors yield
acceptable models in cross-validation
Reagent, Product and Reaction scenarios perform
similarly
The models cross-validate very well, but external testing
is inconclusive: more data are needed!
Conclusions

J. Chem. Inf. Model., 2015, 55, 239−250

• Campus France for a Franco-Portuguese grant
“PESSOA”
• ChemAxon for the license
Thanks
