(Journal Club) ICReDD Seminar, Apr 27 2020
Institute for Chemical Reaction Design and Discovery (ICReDD)
Hokkaido University
Sapporo, JAPAN
https://www.icredd.hokudai.ac.jp
1. How to use data to design and optimize reactions?
A quick introduction to work from the Sigman lab
Speaker: Kosuke Higashida (Sawamura-G)
Slides: Ichigaku Takigawa (Takigawa-G)
ICReDD Fusion Research Seminar, Apr 27 2020.
2. Today's talk
Introduce papers from Matthew S. Sigman's group that
consider how to use data to design and optimize catalysts.
Matthew S Sigman
Dept. Chemistry
The University of Utah
Organic Synthesis &
Asymmetric Catalysis
MS Sigman, KC Harper, EN Bess, A Milo
The development of multidimensional analysis tools
for asymmetric catalysis and beyond.
Accounts of Chemical Research 2016 49 (6), 1292-1301.
summarized the following studies from the lab.
1. Harper & Sigman, PNAS 2011
2. Harper & Sigman, Science 2011
• Werner, Mei, Burckle & Sigman, Science 2012
3. Milo, Bess & Sigman, Nature 2014
• Harper, Vilardi & Sigman, JACS 2013
• Bess, Bischoff, Sigman, PNAS 2014
4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016;
Milo, Neel, Toste & Sigman, Science 2015
3. Our consistent theme today
• Chemically, it's about how to rationally design and improve
asymmetric catalysis (asymmetric synthesis).
• Chemoinformatically, it's about QSAR (Quantitative Structure-
Activity Relationship) and descriptors.
• Statistically/Mathematically, it's about an application of linear
regression (design of experiments).
How to grasp "empirical" trends in experiments, and how
to effectively use them to develop new catalytic reactions?
4. Outline
1. Harper & Sigman, PNAS. 2011
Proof of Concept (PoC)
Consider "Design of Experiments (DoE)" (Information Science)
2. Harper & Sigman, Science. 2011
De novo Applications of 1
• Werner, Mei, Burckle & Sigman, Science 2012
3. Milo, Bess & Sigman, Nature. 2014
"More sophisticated descriptors" (Computational Science)
• Harper, Vilardi & Sigman, JACS, 2013
• Bess, Bischoff, Sigman, PNAS, 2014
4. Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016;
Milo, Neel, Toste & Sigman, Science 2015
Establish a data-intensive workflow for "the study of reaction
mechanisms"
0. Quick Introduction to Asymmetric Synthesis
5. Asymmetric synthesis (Enantioselective synthesis)
• A molecule is called "chiral" if it cannot be
superposed on its mirror image.
• A chiral molecule exists in two stereoisomers
that are mirror images of each other, called
"enantiomers".
• They have the same chemical properties,
except when reacting with other chiral
compounds.
• They also have the same physical
properties, except that they often have
opposite optical activities.
• A chiral molecule typically has at least one
"chiral center" or "stereocenter" (axial and planar
chirality are exceptions).
Asymmetric synthesis: a reaction that produces stereoisomeric
(enantiomeric or diastereoisomeric) products in unequal amounts.
A key process in modern chemistry, particularly for pharmaceuticals!
6. A starting example
0.1 equiv. CrCl3
0.1 equiv. Chiral Ligand
2 equiv. Mn(0)
0.1 equiv. Triethylamine
2 equiv. SiMe3Cl
Nozaki-Hiyama-Kishi (NHK) allylation reaction
THF, RT, 20h
7. How to read this?
Nozaki-Hiyama-Kishi (NHK) allylation reaction
• Substrates (reactants): the starting material and its reaction partner
• Catalyst: 0.1 equiv. CrCl3
• Ligand (binds to Cr): 0.1 equiv. Chiral Ligand
• Reagents: 2 equiv. Mn(0), 0.1 equiv. Triethylamine, 2 equiv. SiMe3Cl
• Solvent: THF
• Reaction temperature: RT (= room temperature)
• Reaction time: 20h
• Product
8. A starting example
Nozaki-Hiyama-Kishi (NHK) allylation reaction (same conditions as above)
• An activated allyl group is added to the carbonyl group.
• Figure labels: phenyl group, asymmetric center (R).
9. A starting example
Nozaki-Hiyama-Kishi (NHK) allylation reaction (same conditions as above)
• An activated allyl group is added to the carbonyl group.
• The product forms as the R or the S enantiomer.
• Enantiomer ratio (e.r.): A_R : A_S
• Enantiomeric excess (e.e.)
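The two selectivity measures named on this slide are related by simple arithmetic. A minimal sketch, assuming the e.r. is given as the amounts of the R and S enantiomers and using the usual definition e.e. = |A_R - A_S| / (A_R + A_S); the function names are illustrative and not from the slides.

```python
def enantiomeric_excess(a_r: float, a_s: float) -> float:
    """Return e.e. in percent from the amounts of the R and S enantiomers."""
    total = a_r + a_s
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    return 100.0 * abs(a_r - a_s) / total


def er_from_ee(ee_percent: float) -> tuple:
    """Return the e.r. as (major, minor) percentages for a given e.e. in percent."""
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major


if __name__ == "__main__":
    print(enantiomeric_excess(95.0, 5.0))  # 90.0
    print(er_from_ee(50.0))                # (75.0, 25.0)
```

For example, the 50% ee ceiling mentioned later in the talk corresponds to a 75:25 e.r.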
10. Our problem for now
What "substituent" on the ligand can improve the enantioselectivity,
preferably for a wide range of substrates?
"Ligands": shown with iPr and Bn groups
"Substituents" (candidate groups):
• Ph
• tBu
• CF3
• CH(CH3)2
• OiPr
• Cl
• CH3
• CHO
• CO2Me
• COMe
• COOH
• OEt
• NMe2
• OMe
• NH2
• NO2
• OH
• F
• H
Abbreviations:
iPr = isopropyl group
Bn = benzyl group
tBu = tertiary butyl group
Me = methyl group
Et = ethyl group
11. Sigman's approach
Two common tactics of many labs (including Sigman's lab)
1. Use high-throughput assays to rapidly evaluate
many reaction conditions and catalyst structures.
2. Investigate the reaction mechanism to inform the
design of better-performing catalysts
(by computing various transition-state structures).
The newly proposed approach of Sigman's lab
Simultaneously optimize reaction performance and
interrogate mechanistic features of reactions by
relating empirical results of carefully designed datasets
to physical organic descriptors.
12. Backstory before the new approach
Sigman & Miller, J. Org. Chem. 2009, 74, 20, 7633-7643
They observed
1. A linear free energy relationship (LFER):
the enantioselectivity and the size of the proline
substituent G (measured by Taft/Charton steric
parameters) were positively correlated.
2. A break in the LFER when larger G groups were
employed.
Note: [ ]‡ denotes a transition state.
(Figure annotations: "unclear", "multistep reactions")
15. Their two assumptions for the break of LFER
Their focus on outliers: "results that appear at first glance to be
peculiar or poor are considerably more interesting than ones
that follow obvious or intuitive trends."
1. The poor extrapolation to larger substituents may be due to
a change in the mechanism of asymmetric catalysis.
2. The Taft/Charton parameters implemented do not precisely
describe the observed steric effect.
These two assumptions for the break of LFER are the starting
point of their newly developed approach.
Design of experiments? Data-intensive methods to probe it?
More sophisticated parameters? Computational parameters?
18. First study 1/4 Harper & Sigman, PNAS 2011.
Initially 25 ligands were synthesized, chosen according to
commercial availability and ease of synthesis.
LFER development by fitting a linear regression model:
$\Delta\Delta G^{\ddagger} = -RT \ln(\mathrm{e.r.})$
where $\Delta\Delta G^{\ddagger}$ is the enantioselectivity, e.r. the enantiomeric
ratio, $R$ the gas constant, and $T$ the reaction temperature.
... resulted in only moderate predictive power and did little to
inform catalyst design.
"How many and what distribution of data points would provide a
meaningful model for prediction?" (from the initial review of the work)
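The regression response in this kind of LFER is a free-energy difference computed from the measured enantiomeric ratio, ΔΔG‡ = -RT ln(e.r.). A minimal sketch of that conversion follows. The units (kcal/mol), the default temperature of 298.15 K for "RT", and the function names are assumptions for illustration; the sign depends on which enantiomer is placed in the numerator of the e.r.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_er(er: float, temperature_k: float = 298.15) -> float:
    """Return ddG (kcal/mol) for an enantiomeric ratio er = major/minor."""
    if er <= 0:
        raise ValueError("e.r. must be positive")
    return -R_KCAL * temperature_k * math.log(er)


def er_from_ddg(ddg: float, temperature_k: float = 298.15) -> float:
    """Invert the relation: e.r. = exp(-ddG / RT)."""
    return math.exp(-ddg / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # A 95:5 e.r. at the assumed room temperature
    print(round(ddg_from_er(95 / 5), 2))  # -1.74
```

Working on the ΔΔG‡ scale rather than on e.e. is what makes a linear model in the descriptors plausible, since free energies are what LFERs relate.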
19. First study 2/4
Apply the principles of "Design of Experiments (DoE)" using physical organic
parameters.
"Briefly, the precepts of DoE involve evenly spreading experiments to
the sensitivity limits of a system using continuous variables (e.g.
temperature and concentration)."
Harper & Sigman, PNAS 2011.
20. Technical note 1/3: Design of experiments (DoE)
We have many variations, but in short...
(1) Define coordinates with great care.
(2) Examine/consider the limit of possible ranges.
(3) Cover the region evenly with interacting points.
Example: "3^2 factorial design"
(full factorial design with 3 levels of 2 factors)
Consult a statistician when descriptors have
strong correlation or clear interdependency.
If you have more variables, you can also
consider a "fractional" design
(a carefully chosen subset of experiments).
Other designs shown: simplex lattice design, central composite design,
Box-Behnken design, face-centered central composite design,
Latin hypercube design, sphere packing design, uniform design,
maximum entropy design.
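A full factorial design is simply every combination of the chosen levels. The sketch below enumerates a 3^2 design in coded units (-1, 0, 1) and maps the coded levels onto real ranges. The temperature and concentration ranges are hypothetical placeholders, and the helper names are illustrative.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return every combination of the given factor levels (a full factorial design)."""
    return list(product(*levels_per_factor))


def scale(coded, low, high):
    """Map a coded level in [-1, 1] onto the real range [low, high]."""
    return low + (coded + 1) * (high - low) / 2


if __name__ == "__main__":
    coded_levels = [(-1, 0, 1), (-1, 0, 1)]  # 3 levels of 2 factors
    design = full_factorial(coded_levels)
    print(len(design))  # 9
    # Hypothetical ranges: temperature 0-40 degC, concentration 0.1-0.5 M
    for t, c in design:
        print(scale(t, 0.0, 40.0), scale(c, 0.1, 0.5))
```

The number of runs grows as levels^factors, which is why fractional designs become attractive once more variables are involved.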
21. Technical note 2/3: Least-squares regression
"Predictors" (descriptors) and "Response" (target)
Regression equation for prediction
that minimizes the sum of (error-bar height)^2
Computed by simple matrix multiplication
Fitting a plane!
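Fitting a plane by least squares can be sketched in a few lines. This example assumes two predictors plus an intercept and solves the normal equations X^T X b = X^T y with a small Gaussian-elimination helper, so that it runs with the standard library only. The data are hypothetical and lie exactly on a known plane so the result is easy to check.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return coefficients minimizing sum((y - X b)^2) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical data lying exactly on the plane y = 1 + 2*x1 - 3*x2
    rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
    print([round(b, 6) for b in fit_least_squares(rows, y)])
```

In practice a numerical library would normally be used instead of a hand-written solver; the point here is only that the fit reduces to linear algebra on the predictor matrix.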
22. Technical note 3/3: Least-squares regression
Regression equation for prediction: fitting a surface!
If we set composite variables, all calculated from X and Y,
it is exactly the same calculation!
These are called "response surface" models (RSMs) in DoE.
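The "same calculation" point can be made concrete: once X and Y are expanded into composite variables, a curved surface is still linear in its coefficients. The sketch below assumes a second-order surface (terms 1, X, Y, X^2, Y^2, XY) purely for illustration; the polynomial order used in the paper is not stated on this slide. The data are hypothetical values on a 3x3 coded grid.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def quadratic_terms(x, y):
    """Expand (X, Y) into composite variables: 1, X, Y, X^2, Y^2, XY."""
    return [1.0, x, y, x * x, y * y, x * y]


def fit_surface(points, z):
    """Least-squares fit of a second-order response surface z = f(X, Y)."""
    rows = [quadratic_terms(x, y) for x, y in points]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * zi for r, zi in zip(rows, z)) for j in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    grid = list(product((-1, 0, 1), repeat=2))  # 3x3 design in coded units
    true = [2.0, 0.5, -1.0, 0.3, -0.2, 0.1]     # hypothetical coefficients
    z = [sum(c * t for c, t in zip(true, quadratic_terms(x, y))) for x, y in grid]
    print([round(c, 6) for c in fit_surface(grid, z)])
```

A 3x3 grid gives nine points for six coefficients, so the second-order model is determined with a few degrees of freedom to spare.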
24. First study 3/4
Harper & Sigman, PNAS 2011.
Their two key tenets
1. Extrapolation is subject to greater uncertainty and error than interpolation.
2. The predictive power of any statistical model is dependent upon the
variation in the distribution of data points used to develop the model.
Carefully designed and evenly spaced ligand space according to the
range of substituent Charton values that are synthetically accessible!
A 3 x 3 library defined
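25. First study 3/4 Harper & Sigman, PNAS 2011.
The response surface model fitted
to the 3x3 design by linear regression
(2 replicates for each point in the 3x3 design)
[Figure: fitted response surface over the two ligand substituents; axis labels H, Me, iPr, tBu and Me, tBu, CEt3; a "not reproducible" area is marked]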
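26. First study 3/4 Harper & Sigman, PNAS 2011.
The response surface model fitted
to the 3x3 design by linear regression
(2 replicates for each point in the 3x3 design)
[Figure: fitted response surface over the two ligand substituents; axis labels H, Me, iPr, tBu and Me, tBu, CEt3; Et is additionally labeled]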
29. Second study 1/4
They admitted that ...
they were aware of the likely best catalyst structure through extensive
trial-and-error screening in the development of these systems, and the
identity of the optimal predicted catalyst(s) was not surprising.
De novo application of this approach
They applied this to an NHK-type propargylation
reaction that had been abandoned years earlier in
the group because of their inability, for more than 2
years, to achieve effective asymmetric catalysis....
It turned out
the best predicted one was
limited to 50% ee...
Harper & Sigman, Science 2011.
31. Second study 2/4
By screening only 9 ligands, they could confidently reach an informed
decision that a change in catalyst structure was necessary to proceed
with catalyst optimization.
Several observations for redesigning the catalyst:
1. The oxazoline unit did not bias the
reaction to a great extent.
2. A Hammett correlation between the
enantioselectivity and the substrate
structure was observed (a significant
electronic contribution from the substrate
impacts the enantioselectivity).
Harper & Sigman, Science 2011.
33. Second study 3/4
These two observations implied ...
The new catalyst design should incorporate a readily modifiable group
whose electronics could be systematically perturbed to override the
innate electronic bias of the substrate.
A 3 x 3 library, replacing the oxazoline with a quinoline, with one dimension
manipulated electronically (E) and the other sterically (S).
[Figure labels: oxazoline, quinoline, pyridine]
Harper & Sigman, Science 2011.
36. Side note 1/2: LFERs and Hammett equation
"The Hammett equation (and its extended forms) has been one of the
most widely used means for the study and interpretation of organic
reactions and their mechanisms." (Hansch, Leo & Taft, Chem Rev 1991)
For any two reactions with two aromatic reactants differing only in the
type of substituent X, the change in free energy of activation is
proportional to the change in Gibbs free energy (i.e. a "linear" relationship):
$\log_{10}(K_X / K_H) = \sigma_X \rho$
where $K_X$ is the equilibrium constant with substituent X, $K_H$ that with
X = H (reference), $\sigma_X$ the substituent constant (depends only on X),
and $\rho$ the reaction constant (depends on the reaction but is independent
of X). Tabulated "parameter" values are available for $\sigma$.
• This is an empirical trend that does not follow from any chemical theory
(and has thus long been criticized from the theoretical side?).
• A surprise was that this equation also holds for reaction rates.
• This triggered many other useful variations of linear free-energy
relationships (LFERs), as well as nonlinear ones such as quadratic.
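Given tabulated substituent constants, the reaction constant ρ of a Hammett relationship log10(K_X/K_H) = σ_X ρ is the slope of a line through the origin. A minimal sketch follows, assuming commonly tabulated σ_para values for a few substituents. The log-ratio "measurements" are hypothetical numbers invented for illustration, and `fit_rho` is an illustrative name.

```python
def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log10(K_X/K_H) = rho * sigma_X."""
    num = sum(s * r for s, r in zip(sigmas, log_ratios))
    den = sum(s * s for s in sigmas)
    if den == 0:
        raise ValueError("need at least one substituent with nonzero sigma")
    return num / den


if __name__ == "__main__":
    # Commonly tabulated sigma_para values (H is the reference, sigma = 0)
    sigma_para = {"NO2": 0.78, "Cl": 0.23, "H": 0.00, "CH3": -0.17, "OMe": -0.27}
    # Hypothetical log10(K_X / K_H) values, not experimental data
    observed = {"NO2": 1.20, "Cl": 0.33, "H": 0.00, "CH3": -0.27, "OMe": -0.39}
    names = list(sigma_para)
    rho = fit_rho([sigma_para[n] for n in names], [observed[n] for n in names])
    print(round(rho, 2))
```

A positive ρ means that electron-withdrawing substituents favor the process; the magnitude indicates how sensitive it is to substituent electronics.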
37. Side note 2/2: LFER is generalized to QSAR/QSPR
LFERs or prediction by empirical trends in data also yielded
a long history of QSAR/QSPR in Chemoinformatics
since Taft/Hammett and Hansch-Fujita, etc....
: Hansch (2014)
38. Second study 4/4
This allowed them to confidently identify a catalyst that performed well
for a wide range of substrates with differential electronic features.
The same approach was also applied to identify highly enantioselective
catalysts for the Pd-catalyzed redox-relay Heck reactions of alkenols
(Werner, Mei, Burckle & Sigman, Science 2012, 338, 1455-1458).
Hammett parameters (σ): electronic effect (E)
Charton parameters (ν): steric effect (S)
Harper & Sigman, Science 2011.
39. Toward parameter (re-)considerations
It is critical to use appropriate parameters (descriptors) for characteristically
describing the properties of substituents...
For example, Taft/Charton values have been historically controversial
because of the limited generality... (They arose from a particular mechanism)
42. Third study 1/2 Milo, Bess & Sigman, Nature 2014.
New computationally derived parameters:
Use IR vibrations as parameters to describe simultaneous
steric and electronic perturbations to structure.
To this point, they had avoided a key problem in the history of LFERs:
the inability to analyze substituents with changes to both size
and electronic nature.
(Use computational science!)
43. Third study 2/2
Milo, Bess & Sigman, Nature 2014.
To this point, they had avoided a key problem in the history of LFERs:
the inability to analyze substituents with changes to both size
and electronic nature.
New computationally derived parameters:
IR (infrared)-spectroscopy-based parameters to describe
simultaneous steric and electronic perturbations to structure.
1. Perform a geometry minimization and frequency calculation of starting
materials.
2. Compute vibrations as well as other parameters, including Sterimol
values, NBO charges, and geometric factors such as torsion angles.
3. Abridge the parameter set to a statistically manageable number.
4. Perform and evaluate a systematic iterative process to establish and
improve statistical predictive models.
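One simple way to abridge a parameter set is to drop descriptors that are nearly collinear with ones already kept. The sketch below shows such a correlation filter; it is an assumption for illustration and not necessarily the procedure used in the paper. The descriptor names and values are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def drop_correlated(descriptors, threshold=0.95):
    """Keep descriptors in order, skipping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical descriptor table: one value per compound
    descriptors = {
        "ir_stretch": [1.0, 2.0, 3.0, 4.0, 5.0],
        "ir_stretch_scaled": [2.1, 4.0, 6.2, 8.0, 10.1],  # nearly collinear
        "sterimol_B1": [1.5, 1.9, 1.6, 2.4, 1.7],
    }
    print(drop_correlated(descriptors))  # ['ir_stretch', 'sterimol_B1']
```

Removing near-duplicates first keeps the later regression better conditioned and makes the surviving coefficients easier to interpret.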
44. Applications of this to evaluate the substrate scope
This approach is intended for use in catalyst optimization, but it can
also be applied to different challenges such as "substrate scope".
1. Harper, Vilardi & Sigman, JACS 2013
The catalyst and substrate structures were jointly analyzed to predict
the "best" catalyst/substrate combinations.
2. Bess, Bischoff, Sigman, PNAS 2014
A modest-sized library of substrates was designed to build a model for
predicting the performance of even unreported substrates in a
Rh-catalyzed asymmetric transfer hydrogenation of ketones.
45. Returned to their original pursuit: reaction mechanisms
They hypothesized
their predictive models (multidimensional correlations) would
encompass mechanism-rich information.
However, inferring mechanistic information becomes a daunting task
because
1. the models become increasingly complex.
2. the models contain compounded parameters that are not
representative of one distinct structural characteristic.
A data-intensive approach to mechanistic elucidation requires
philosophical changes: one has to overcome the entrenched
dogma that low-enantioselectivity measurements are "bad results".
Every value obtained in the course of data collection is useful and
can serve to reveal structural information regarding the mechanism!
46. 4th/5th studies
Niemeyer, Milo, Hickey & Sigman, Nat Chem 2016
Milo, Neel, Toste & Sigman, Science 2015
1. Niemeyer, Milo, Hickey & Sigman, Nat Chem, 2016
Parameter selection is not intuitive and an effective parameter set is
statistically obtained for phosphine ligands that served to distinguish
between two limiting mechanistic regimes in the site-selective oxidative
addition in a Suzuki reaction.
2. Milo, Neel, Toste & Sigman, Science 2015
Synthetically reasonable and mechanistically informative catalysts,
substrates, or reagents are virtually predicted, and then prepared and
tested experimentally in the context of chiral phase-transfer catalysis.
This tactic requires the design and preparation of a systematic and
synthetically modular library of catalysts, substrates, and reagents.
The experimental design defines the effects that can be explained
by the produced models.
47. Collaborative and further studies
1. Neel, Milo, Sigman, Toste, JACS, 2016
Chiral phosphoric acid-catalyzed fluorination of allylic alcohols
2. Bess, DeLuca, Tindall, Oderinde, Roizen, Du Bois & Sigman,
JACS, 2014
Rh-catalyzed C-H amination reaction
3. Bess, Guptill, Davies & Sigman, Chem Sci, 2015
a carbene C-H insertion reaction
4. Mougel, Santiago, Zhizhko, Bess, Varga, Frater, Sigman, Copéret,
JACS, 2015
Turnover frequency (TOF) of solid-supported W-based catalysts for
olefin metathesis
5. Zhang, Santiago, Crawford & Sigman, JACS, 2015
ligand optimization for Pd-catalyzed enantioselective redox-relay Heck
reactions
6. Zhang, Santiago, Kou & Sigman, JACS, 2015
evaluation of solvent effects for Pd-catalyzed enantioselective redox-
relay Heck reactions
48. Lessons we learned
1. The design of experiments (DoE) is crucial to obtain causal
insights on the target reactions.
2. We can computationally derive the physical organic chemical
parameters to describe the mechanistic features of reactions.
3. "In brief, one needs a reliable and accurate assay for the
output of interest, a reasonable spread of such outputs to
facilitate statistical analysis, and a reasonable level of
modularity in the process under study."
Deconstructing rather complex relationships still requires
profound physical organic chemistry expertise, the use
of various classical tools in this field, and creativity and
intuition to understand sophisticated processes.
50. 1 page summary of the 2019 Nature paper
Method: forward stepwise linear regression from the Matlab statistics toolbox (since Nature 2014?)
Give up DoE and go more towards an "ML/AI"-like approach for out-of-sample prediction.
Reaction scheme: imine starting material + nucleophile → amine product,
with a chiral phosphoric acid catalyst
• 367 reactions
(manually curated from 17 papers)
• 313 parameters from every component
(steric, DFT-derived, 2D chemoinformatics
descriptors, conditions)
Proof of concept with a reaction class with different
catalyst chemotypes and substrate structural types.
Note: the out-of-sample prediction plot shows predicted
vs measured values (not a fitted line).
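Forward stepwise regression starts from an intercept-only model and greedily adds the descriptor that most improves the fit. The sketch below is a simplified stand-in: it adds terms by reduction in the residual sum of squares and stops at a fixed gain threshold, whereas toolbox implementations typically use significance tests for entry and removal. The feature names, data, and thresholds are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an intercept + columns least-squares fit."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def forward_stepwise(features, y, max_terms=3, min_gain=1e-6):
    """Greedily add the feature that lowers the RSS most; stop when the gain is negligible."""
    selected = []
    best = rss([], y)
    while len(selected) < max_terms:
        candidates = [n for n in features if n not in selected]
        if not candidates:
            break
        scores = {n: rss([features[s] for s in selected + [n]], y) for n in candidates}
        name = min(scores, key=scores.get)
        if best - scores[name] < min_gain:
            break
        selected.append(name)
        best = scores[name]
    return selected


if __name__ == "__main__":
    # Hypothetical descriptors; the response depends on "a" and "b" only
    features = {"a": [0, 1, 2, 3, 4, 5], "b": [1, 0, 1, 0, 1, 0], "c": [3, 1, 4, 1, 5, 9]}
    y = [3 * a - 2 * b for a, b in zip(features["a"], features["b"])]
    print(forward_stepwise(features, y))  # ['a', 'b']
```

Because selection is driven by in-sample fit, an out-of-sample check such as the predicted-vs-measured plot described on this slide is what shows whether the selected terms generalize.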