A generic chemical transformation can often be achieved under various synthetic conditions. However, for any specific set of reagents, only one or a few of the reported synthetic protocols may be successful. For example, Michael beta-addition reactions may proceed under different choices of solvent (e.g., hydrophobic, polar aprotic, protic) and catalyst (e.g., Brønsted acid, Lewis acid, Lewis base). Chemoinformatics methods can be used to establish a relationship between reagent structures and the required reaction conditions, so that synthetic chemists waste less time and fewer resources trying out various protocols in search of the appropriate one. To address this problem, a number of two-class classification models have been built on a set of 198 Michael reactions retrieved from the literature. The trained models discriminate between processes that are compatible with a specific reaction condition option and processes that are not (feasible or not with a Lewis acid catalyst, feasible or not in a hydrophobic solvent, etc.). Eight distinct models were built to decide the compatibility of a Michael addition with each considered reaction condition option, while a ninth model aims to predict whether the assumed Michael addition is feasible at all. Different machine-learning methods (SVM, Naive Bayes and Random Forest) were used in combination with different types of descriptors. In particular, the ChemAxon toolkit was used to develop Electronic Effect Descriptors (EED), which are particularly well suited in this context. The resulting models show reasonable predictive performance in three repetitions of 3-fold cross-validation: balanced accuracy ranges from 0.7 to 1.
The developed models are made available to users. Finally, the models were challenged to predict feasible conditions for ~50 novel Michael reactions from the eNovalys database (originally extracted from the patent literature).
EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert system for predicting reaction conditions: the Michael reaction case
1. Prediction of Reaction Conditions
for Michael Additions
G. Marcou, J. Aires de Sousa, A. de Luca, D. Latino, V. Rietsch,
D. Horvath, A. Varnek
ChemAxon UGM, Budapest 19-20 May 2015
2. • Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE
3. Chemical reactions are difficult objects
- many species;
- two types of species: reactants and products;
- multi-step reactions;
- dependence on experimental conditions.
4. • How can I synthesize this structure?
• How can I estimate the yield of a given reaction, or its
kinetic and thermodynamic parameters?
• Which reaction conditions should I choose in order
to obtain the desired product selectively?
Chemical reactions in Chemoinformatics
7. • Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE
8.
Conventional bonds:
single, double, aromatic, …
Dynamical bonds:
created single, broken single, …
CGR could be viewed as a pseudo-molecule representing a given reaction
Condensed Graph of Reaction
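To make the idea of condensing reactants and products into one graph concrete, here is a minimal Python sketch. It assumes atom-mapped, hydrogen-suppressed reactants and products whose bonds are stored as tables keyed by pairs of atom-map numbers; the helper names (`condense`, `dynamic_bonds`) and the (before, after) bond-order labelling are illustrative assumptions, not the ISIDA implementation. The toy data is a simplified skeleton of a Michael addition: donor carbon 1 adds to the beta-carbon 3 of an acceptor C3=C2-C4=O5.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Bond tables map an unordered pair of atom-map numbers to a bond order
# (1 = single, 2 = double). Hydrogens are not represented.
BondTable = Dict[FrozenSet[int], int]
CGR = Dict[FrozenSet[int], Tuple[int, int]]


def bond_key(a: int, b: int) -> FrozenSet[int]:
    return frozenset((a, b))


def condense(reactants: BondTable, products: BondTable) -> CGR:
    """Merge reactant and product bonds into a single graph.

    Every edge stores (order in reactants, order in products); 0 means "no bond".
    """
    keys = set(reactants) | set(products)
    return {k: (reactants.get(k, 0), products.get(k, 0)) for k in keys}


def dynamic_bonds(cgr: CGR) -> Set[FrozenSet[int]]:
    """Edges whose order changes during the reaction (created, broken or modified)."""
    return {k for k, (before, after) in cgr.items() if before != after}


# Hypothetical, simplified Michael addition skeleton (atom-map numbers 1-5):
# donor carbon 1 adds to beta-carbon 3 of the acceptor C3=C2-C4=O5.
REACTANTS: BondTable = {bond_key(2, 3): 2, bond_key(2, 4): 1, bond_key(4, 5): 2}
PRODUCTS: BondTable = {bond_key(1, 3): 1, bond_key(2, 3): 1,
                       bond_key(2, 4): 1, bond_key(4, 5): 2}

if __name__ == "__main__":
    cgr = condense(REACTANTS, PRODUCTS)
    for key in sorted(cgr, key=sorted):
        print(sorted(key), cgr[key])
```

In this toy CGR the edge 1-3 carries (0, 1), a created single bond, and the edge 2-3 carries (2, 1), a double bond that becomes single; the remaining edges are conventional bonds whose order does not change.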
9. Condensed Graph of Reaction
• CGR condenses the structural information about reactants
and products: several graphs are merged into a single graph
• This simplified representation makes it possible to
apply to CGRs the methods developed in
chemoinformatics for individual molecules
A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Computer-Aided Molecular Design, 2005, 19, 693-703
10. ISIDA fragment descriptors
A reaction can be encoded by a descriptor vector,
which can be used in structure-reactivity modeling
[Figure: condensed graph of reaction mapped to a fragment count vector (2 1 2 …)]
ISIDA/CGR fragment descriptors
A. Varnek In: "Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
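A simplified Python sketch of how a CGR can be turned into a fragment count vector. It assumes the CGR is stored as atom symbols plus edge labels such as "0>1" (created single bond) or "2>1" (double bond becoming single), and it counts only linear atom/bond sequences of 2 to `max_atoms` atoms. The real ISIDA fragmentor supports more fragment types and options; the function names and the label syntax here are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Atoms = Dict[int, str]               # atom index -> symbol
Bonds = Dict[FrozenSet[int], str]    # unordered atom pair -> CGR bond label


def _label(path: List[int], atoms: Atoms, bonds: Bonds) -> str:
    """Alternate atom symbols and bracketed bond labels along a path."""
    parts = [atoms[path[0]]]
    for a, b in zip(path, path[1:]):
        parts.append(f"[{bonds[frozenset((a, b))]}]")
        parts.append(atoms[b])
    return "".join(parts)


def sequence_fragments(atoms: Atoms, bonds: Bonds,
                       min_atoms: int = 2, max_atoms: int = 3) -> Counter:
    """Count linear sequences of min_atoms..max_atoms atoms in the graph."""
    adj: Dict[int, List[int]] = {a: [] for a in atoms}
    for key in bonds:
        a, b = tuple(key)
        adj[a].append(b)
        adj[b].append(a)
    counts: Counter = Counter()

    def extend(path: List[int]) -> None:
        # Each undirected path is met from both ends; keep it once (lower end first)
        # and name it by the smaller of its two readings so the label is canonical.
        if len(path) >= min_atoms and path[0] < path[-1]:
            fwd = _label(path, atoms, bonds)
            rev = _label(path[::-1], atoms, bonds)
            counts[min(fwd, rev)] += 1
        if len(path) == max_atoms:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in atoms:
        extend([start])
    return counts


def to_vector(counts: Counter, vocabulary: List[str]) -> List[int]:
    """Descriptor vector: one column per fragment in a fixed vocabulary."""
    return [counts.get(fragment, 0) for fragment in vocabulary]
```

The fixed vocabulary plays the role of the descriptor columns: every reaction in a data set is fragmented the same way and its counts are read off in the same order.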
11. Sums of property-dependent contributions P(i) from remote atoms i, at topological distance τ_iK
from the ‘reactive’ center K, modulated by various working hypotheses:
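$$EED_{p,e,o,w,c}(K) = \sum_{i=1}^{N} \delta_c(i,K) \times P_p(i)^{e} \times \exp\left[-\frac{1}{w}\left(\tau_{iK} - o\right)^{2}\right]$$
$$\tau_{iK} = \begin{cases} 0.5 & \text{if } i = K \\ \text{shortest-path topological distance}(i,K) & \text{otherwise} \end{cases}$$
$$\delta_c(i,K) = \begin{cases} 1 & \text{if } c = 0 \text{ OR the path between } i \text{ and } K \text{ contains only unsaturated atoms} \\ 0 & \text{otherwise} \end{cases}$$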
Property type p=1..11
Property power e=1..2
Neighborhood control o=1..4
Neighborhood width w=2..5
Conjugation toggle c=0,1
11 × 2 × 4 × 4 × 2 = 704 EED terms…
Electron Effect Descriptors
M. Elhabiri, E. Davioud-Charvet, D. Horvath, A. Varnek et al., Chem. Eur. J., 2014, 20, 1–11
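The EED sum above can be sketched in a few lines of Python for a single property table. Several points are assumptions rather than details from the slides: the molecule is a hydrogen-suppressed adjacency list, `prop` holds the values of one property p per atom, atoms not connected to K contribute nothing, and for c = 1 an atom i is counted only if it can be reached from K through unsaturated atoms alone (one reading of the conjugation toggle). All names are hypothetical.

```python
import itertools
import math
from collections import deque
from typing import Dict, List, Optional, Set

Adjacency = Dict[int, List[int]]


def bfs_distances(adj: Adjacency, source: int,
                  allowed: Optional[Set[int]] = None) -> Dict[int, int]:
    """Bond-count distances from `source`, optionally walking only through `allowed` atoms."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eed(adj: Adjacency, prop: Dict[int, float], center: int,
        e: int, o: int, w: int, c: int, unsaturated: Set[int]) -> float:
    """EED_{p,e,o,w,c}(K) for one property table `prop` and reactive center K = `center`."""
    dist = bfs_distances(adj, center)
    if c == 0:
        counted = set(dist)  # delta_c = 1 for every atom connected to K
    elif center in unsaturated:
        # Assumed reading: delta_c = 1 only for atoms reachable via unsaturated atoms
        counted = set(bfs_distances(adj, center, allowed=unsaturated))
    else:
        counted = set()
    total = 0.0
    for i, d in dist.items():
        if i not in counted:
            continue
        tau = 0.5 if i == center else d  # tau_iK as defined on the slide
        total += prop[i] ** e * math.exp(-((tau - o) ** 2) / w)
    return total


# Parameter grid from the slide: p = 1..11, e = 1..2, o = 1..4, w = 2..5, c = 0, 1
PARAM_GRID = list(itertools.product(range(1, 12), (1, 2), range(1, 5), range(2, 6), (0, 1)))
```

Evaluating `eed` once per (p, e, o, w, c) combination in `PARAM_GRID`, with the property table selected by p, yields the 704 descriptor values for one reactive center. The Gaussian-like factor peaks at τ_iK = o, so o shifts the "neighborhood" of atoms that matter most and w controls how wide it is.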
13. • Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE
14. Solvent
Hydrophobic (kerosene, benzene, …)
Polar Aprotic (THF, DMSO, …)
Polar Protic (water, acetic acid, …)
No solvent: the reaction occurs in the neat
reagents
Catalyst
Brønsted acids (hydrochloric acid, …)
Lewis acids (transition metal ions, …)
Basic (pyridine, sodium ethanolate, …)
No catalyst: a catalyst is not needed,
or autocatalysis takes place
The reaction can occur under different conditions, characterized by solvent and catalyst
Michael reaction
Michael donor
Michael acceptor
15. For a given Nu and R1–R4, which conditions
(catalyst, solvent) lead to a high reaction yield?
Michael reaction
16. • to build QSRR models predicting optimal reaction
conditions for a query reaction.
Each model will answer a specific question such as “Is this process
feasible with Brønsted acid catalysts?”, “Is this process feasible in aprotic
polar solvents?”, etc.
• to develop a public predictive tool adapted to the
Michael-type reaction case
Michael reaction: goals of the modeling
17. • 53 polar aprotic solvent
• 52 no solvent
• 103 polar protic solvent
• 93 Lewis acid catalyst
• 61 no catalyst
• 40 hydrophobic solvent
• 57 Brønsted acid catalyst
• 45 basic catalyst
• 24 decoys (Michael reactions that are not observed)
Note that the same reaction may proceed under
different conditions!
Michael reaction: data
The data set consists of 222 reactions:
19. The compatibility of each Michael reaction in the database with each of the condition
classes is encoded by a condition bitvector. Example bitvector 0 1 1 0 0 0 1 0 0, bit by bit:
Solvent: hydrophobic        0
Solvent: polar aprotic      1
Solvent: polar protic       1
No solvent                  0
Catalyst: Brønsted acid     0
Catalyst: Lewis acid        0
Catalyst: basic             1
No catalyst                 0
Not observed                0
A bit is set (1) if the reaction was observed to proceed under this condition.
NOTE: if a bit is off (0), it means we do not know whether the reaction is feasible under that condition!
An additional ‘no-go’ bit is set if the given Michael addition should never be
observed (because of competing carbonyl addition).
Bitvector of reaction conditions
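A minimal Python sketch of the condition bitvector, assuming the bit order listed on this slide; the condition names are hypothetical identifiers. Decoding returns only the conditions whose bit is set, because an unset bit means "unknown", not "infeasible". Summing bits over a data set gives per-condition counts like those on the data slide, which can add up to more than the number of reactions since one reaction may carry several bits.

```python
from typing import Dict, Iterable, List, Set

# Bit order as listed on the slide (identifiers are illustrative).
CONDITIONS: List[str] = [
    "hydrophobic solvent",
    "polar aprotic solvent",
    "polar protic solvent",
    "no solvent",
    "bronsted acid catalyst",
    "lewis acid catalyst",
    "basic catalyst",
    "no catalyst",
    "not observed",  # the 'no-go' bit
]


def encode(observed: Set[str]) -> List[int]:
    """Set a bit for every condition under which the reaction was observed."""
    unknown = observed - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return [1 if name in observed else 0 for name in CONDITIONS]


def decode(bits: List[int]) -> Set[str]:
    """Return the observed conditions; unset bits carry no feasibility information."""
    if len(bits) != len(CONDITIONS):
        raise ValueError(f"expected {len(CONDITIONS)} bits, got {len(bits)}")
    return {name for name, bit in zip(CONDITIONS, bits) if bit}


def condition_counts(bitvectors: Iterable[List[int]]) -> Dict[str, int]:
    """Number of reactions observed under each condition."""
    counts = [0] * len(CONDITIONS)
    for bits in bitvectors:
        for j, bit in enumerate(bits):
            counts[j] += bit
    return dict(zip(CONDITIONS, counts))
```

For instance, `encode({"polar aprotic solvent", "polar protic solvent", "basic catalyst"})` reproduces the example vector 0 1 1 0 0 0 1 0 0.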
20. Modeling setup
Goal: preparation of 9 two-class classification models:
- 8 models for reaction feasibility, one for each catalyst or solvent type,
- 1 model « Michael / non-Michael »
Descriptors: ISIDA/CGR, EED … and also MOLMAP, CDK
Scenarios: reagent-based, product-based and reaction-based
Performance assessment: ROC AUC in 3-fold cross-validation
Machine-learning methods: Random Forest, SVM, Naïve Bayes
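The evaluation side of this setup can be sketched with the Python standard library alone: repeated shuffled k-fold splits (3 repeats of 3 folds, as in the abstract), ROC AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, and balanced accuracy as the mean of sensitivity and specificity. These helpers are illustrative assumptions, not the study's own code; the splits are plain (not stratified) k-fold, and any of the nine binary models would be trained on `train` and scored on `test`.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(n: int, k: int = 3, repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a positive > score of a negative); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("balanced accuracy needs both classes")
    return 0.5 * (tp / n_pos + tn / n_neg)
```

Balanced accuracy is a natural choice here because several condition classes are imbalanced (for example 40 hydrophobic-solvent reactions against 103 in polar protic solvents), and plain accuracy would reward always predicting the majority class.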
23. [Figure: performance of models built on reagent descriptors, EED and ISIDA panels]
Michael reaction: model performance
G. Marcou, J. Aires de Sousa, D. Latino, A. Deluca, D. Horvath, V. Rietsch, and A. Varnek, J. Chem. Inf. Model., 2015, 55, 239–250.
25. Model validation on the set of 52 reactions extracted from the literature
Model                  | Reactions in the set | Correctly predicted
Catalyst and solvent   | 52                   | 8
Aprotic solvent        | 12                   | 2
Protic solvent         | 21                   | 21
No catalyst            | 26                   | 14
Base-catalyzed         | 19                   | 8
Brønsted acids         | 2                    | 1
Model failure or incomplete data for the modeling?
Michael reaction: external validation
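Turning the counts on this slide into per-model hit rates makes the spread easier to see. The short Python sketch below only divides the two columns, assuming that each "correctly predicted" count is out of the reactions listed for that model; the rates range from about 0.15 (8 of 52) to 1.0 (21 of 21), and the Brønsted acid row rests on only 2 reactions.

```python
from typing import Dict, List, Tuple

# (model, reactions in the external set, correctly predicted), as listed on slide 25
EXTERNAL_VALIDATION: List[Tuple[str, int, int]] = [
    ("Catalyst and solvent", 52, 8),
    ("Aprotic solvent", 12, 2),
    ("Protic solvent", 21, 21),
    ("No catalyst", 26, 14),
    ("Base-catalyzed", 19, 8),
    ("Brønsted acids", 2, 1),
]


def hit_rates(rows: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Fraction of correctly predicted reactions for each model."""
    return {name: correct / total for name, total, correct in rows}


if __name__ == "__main__":
    for name, rate in hit_rates(EXTERNAL_VALIDATION).items():
        print(f"{name:22s} {rate:.2f}")
```

Such small denominators make the individual rates fragile, which is consistent with the slide's open question of whether the weak results reflect model failure or incomplete training data.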
26. Both ISIDA/CGR and EED descriptors provide
acceptable models in cross-validation
Reagent, Product and Reaction scenarios perform
similarly
The models cross-validate very well, but external testing
is inconclusive: more data are needed!
Conclusions