Computational Modeling in Chemical Engineering
.edu/comocheng
Finding Transition States Algorithmically for
Automatic Reaction Mechanism Generation
Pierre L. Bhoorasingh
Richard H. West
1
Can you predict TS geometries
from molecular groups alone?
2
(this would be great)
Can you predict TS geometries
from molecular groups alone?
[Table: length of the bond being broken at the TS for hydrogen abstraction, for each radical and molecule pair, in Å with M06-2X/6-31+G(d,p)]
3
You can predict TS geometries
from molecular groups alone!
6
But...
... you gave me a distance, not a geometry.
... I gave you 15 numbers then asked you for 1.
Automatic Transition State Theory (TST)
would be a game-changer.
•Insight and predictions require detailed kinetic models.
•Error-free detailed models require automatic generation.
•Automatic generation requires reasonable estimates
of millions of reaction rates.
•Current estimates are often unreasonable
due to scarcity of data.
7
Automatic TS searches remain an
important energy research goal
“An accurate description of the often
intricate mechanisms of large-molecule
reactions requires a characterization of
all relevant transition states...
Development of automatic means to
search for chemically relevant
configurations is the computational-
kinetics equivalent of improved
electronic structure methods.”
- Basic Research Needs for Clean and
Efficient Combustion of 21st Century
Transportation Fuels.
US Dept of Energy (2006)
8
Automatic TS searches remain an
important energy research goal
“...transformation from by-
hand calculations of single
reactions to automated
calculations of millions of
reactions would be a game-
changer for the field of
chemistry, and would be a
good ‘Grand Challenge’
target...”
- Combustion Energy Frontier
Research Center (2010)
9
First Annual Conference of the
Combustion Energy Frontier Research
Center (CEFRC)
September 23-24, 2010
Princeton
An introduction to
Reaction Mechanism Generator
Automatically builds detailed kinetic models
facebook.com/rmg.mit
rmg.sourceforge.net
10
⇌RMG
Molecules are represented as graphs
[Figure: CH3CH2. drawn as a graph of two carbon and five hydrogen atoms, with the radical site marked C*]
11
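A minimal sketch of the graph idea, not RMG's actual data structures: the ethyl radical is stored as a list of atoms, each holding a map from neighbour index to bond order. The `Atom` class, its `radical_electrons` field, and the helper names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One graph vertex: an element symbol plus its unpaired-electron count."""
    element: str
    radical_electrons: int = 0
    # Maps a neighbouring atom's index to the bond order of the connecting edge.
    bonds: dict = field(default_factory=dict)


def add_bond(atoms, i, j, order=1):
    """Add an undirected edge between atoms i and j."""
    atoms[i].bonds[j] = order
    atoms[j].bonds[i] = order


def ethyl_radical():
    """Build CH3CH2. as a graph: atom 0 is the methyl carbon, atom 1 is C*."""
    atoms = [Atom("C"), Atom("C", radical_electrons=1)]
    atoms += [Atom("H") for _ in range(5)]
    add_bond(atoms, 0, 1)
    for h in (2, 3, 4):  # three hydrogens on the methyl carbon
        add_bond(atoms, 0, h)
    for h in (5, 6):     # two hydrogens on the radical carbon
        add_bond(atoms, 1, h)
    return atoms


def radical_sites(atoms):
    """Return the indices of atoms that carry unpaired electrons."""
    return [i for i, atom in enumerate(atoms) if atom.radical_electrons > 0]


mol = ethyl_radical()
print(radical_sites(mol))
```

Storing the radical count on the atom is what lets a reaction-family template later ask for "a carbon with one unpaired electron" as a graph pattern.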
Thermochemistry is often estimated
by Benson group contributions
C-(C)(H)3
C-(C)2(H)2
Cb-(H)
C-(C)(Cb)(O)(H)
12
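As a rough illustration of group additivity, the sketch below sums per-group enthalpy contributions for propane, which consists of two C-(C)(H)3 groups and one C-(C)2(H)2 group. The numeric values are approximate textbook Benson values in kcal/mol and are included only as an example; symmetry and other corrections are ignored, and the function name is hypothetical.

```python
# Approximate Benson group values for the enthalpy of formation at 298 K,
# in kcal/mol. Illustrative only; a curated database should be used in practice.
GROUP_ENTHALPY = {
    "C-(C)(H)3": -10.20,
    "C-(C)2(H)2": -4.93,
}


def estimate_enthalpy(group_counts):
    """Sum the group contributions, weighted by how often each group occurs."""
    total = 0.0
    for group, count in group_counts.items():
        total += count * GROUP_ENTHALPY[group]
    return total


# Propane, CH3-CH2-CH3: two methyl groups and one methylene group.
propane = {"C-(C)(H)3": 2, "C-(C)2(H)2": 1}
print(round(estimate_enthalpy(propane), 2))
```

The same lookup-and-sum pattern is what the talk later reuses for distances at the transition state instead of enthalpies.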
Reaction families propose all possible
reactions with given species
bond breaking and
hydrogen abstraction
intramolecular
H-abstraction
13
•Template for recognizing reactive sites
•Recipe for changing the bonding at the site
•Rules for estimating the rate
Octane autoxidation has many pathways
15
•Some pathways go further than others.
16
Faster pathways are explored further
[Figure: two views of the reaction network, one with species A to H and one with species A to F]
17
Edge requires many reaction rates
100 species
1,000 reactions
15,000 species
180,000 reactions
18
Rate estimates are based on the local
structure of the reacting sites.
•Hydrogen abstraction: XH + Y. → X. + YH
•Rate depends on X and Y.
19
20
Rate estimation rules are organized in a tree
Part of the tree for X
Part of the tree for Y
21
Ideal tree: lots of data
22
Typical tree: sparse data
23
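One way to picture how a sparse tree still yields an estimate is to walk from the most specific matching node towards the root until a node with data is found. The sketch below assumes that fallback behaviour; the node names and the rate values are hypothetical placeholders, not entries from the RMG database.

```python
class RuleNode:
    """A node in a rate-rule tree; `rate` is None when no data exists for it."""

    def __init__(self, name, parent=None, rate=None):
        self.name = name
        self.parent = parent
        self.rate = rate


def nearest_rule(node):
    """Walk up from `node` until a node that carries rate data is found."""
    while node is not None:
        if node.rate is not None:
            return node
        node = node.parent
    raise LookupError("no rate data anywhere on the path to the root")


# Hypothetical three-level branch of the tree for X; the numbers are placeholders.
root = RuleNode("any X-H", rate=1.0)
c_h = RuleNode("C-H", parent=root, rate=2.0)
primary = RuleNode("primary C-H", parent=c_h)  # sparse: no data at this level

print(nearest_rule(primary).name)
```

In an ideal tree the most specific node has its own data; in a typical tree the estimate comes from a more general ancestor, which is why sparse data leads to rougher estimates.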
24
So that was RMG...
...but what about
TS geometries?
Single method not feasible
for all reaction types
Intra-H migration
Intra-OH migration
Birad recombination
Intra R addition exocyclic
Intra R addition endocyclic
1,2 birad to alkene
Beta scission
Diels-Alder
Radical recombination
Radical addition
Peroxyradical HO2 elimination
1+2/2+2 cycloaddition
Cyclic ether formation
1,2 insertion
1,3 insertion CO2/ROR
Radical addition CO
O radical recombination
H abstraction
Disproportionation
25
But a single method can apply to multiple
reaction types
A ⇌ B    A ⇌ B + C    A + B ⇌ C + D
Intra-H migration
Intra-OH migration
Birad recombination
Intra R addition exocyclic
Intra R addition endocyclic
1,2 birad to alkene
Beta scission
Diels-Alder
Radical recombination
Radical addition
1+2/2+2 cycloaddition
Cyclic ether formation
1,2 insertion
1,3 insertion CO2/ROR
Radical addition CO
O radical recombination
H abstraction
Disproportionation
Peroxyradical HO2 elimination
26
Want robust and user-friendly
3D representation
•Internal coordinates
•Alter distances and angles
•Cartesian coordinates
•Translate, rotate atoms
•Distance geometry
•Alter only distances
Atom X Y Z
1 x1 y1 z1
2 x2 y2 z2
3 x3 y3 z3
4 x4 y4 z4
27
Use RDKit’s geometry editing tools
for atom positioning
RMG molecule connectivity → Generate bounds matrix (upper limits, lower limits) → Edit bounds matrix → Embed in 3D → 3D structure
28
Edit multiple distances to precisely
position atoms involved in reactions
Bounds matrix (upper limits above the diagonal, lower limits below):
      C      H      H      H      H      O      O      H
C     0      1.12   1.12   1.12   1.12   1000   1000   1000
H     1.1    0      1.86   1.86   1.86   1000   1000   1000
H     1.1    1.78   0      1.86   1.86   1000   1000   1000
H     1.1    1.78   1.78   0      1.86   1000   1000   1000
H     1.1    1.78   1.78   1.78   0      1000   1000   1000
O     3.65   2.9    2.9    2.9    2.9    0      1.33   1.04
O     3.65   2.9    2.9    2.9    2.9    1.31   0      1.97
H     3.15   2.4    2.4    2.4    2.4    1.02   1.89   0
Edited entries shown on the slide: 2.0, 2.1, 2.5, 2.6
29
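The editing step can be sketched without any chemistry library: a bounds matrix keeps the upper limit for a pair above the diagonal and the lower limit below it, so editing a distance means writing two entries. The matrix values below are the ones on the slide; which atom pairs receive the edited limits (2.0 to 2.1 and 2.5 to 2.6) is not recoverable from the slide text, so the pairs chosen here are assumptions, as are the helper names.

```python
ATOMS = ["C", "H", "H", "H", "H", "O", "O", "H"]

# Upper limits above the diagonal, lower limits below, as on the slide.
BOUNDS = [
    [0,    1.12, 1.12, 1.12, 1.12, 1000, 1000, 1000],
    [1.1,  0,    1.86, 1.86, 1.86, 1000, 1000, 1000],
    [1.1,  1.78, 0,    1.86, 1.86, 1000, 1000, 1000],
    [1.1,  1.78, 1.78, 0,    1.86, 1000, 1000, 1000],
    [1.1,  1.78, 1.78, 1.78, 0,    1000, 1000, 1000],
    [3.65, 2.9,  2.9,  2.9,  2.9,  0,    1.33, 1.04],
    [3.65, 2.9,  2.9,  2.9,  2.9,  1.31, 0,    1.97],
    [3.15, 2.4,  2.4,  2.4,  2.4,  1.02, 1.89, 0],
]


def get_bounds(bounds, i, j):
    """Return (lower, upper) distance limits for the atom pair i, j."""
    a, b = min(i, j), max(i, j)
    return bounds[b][a], bounds[a][b]


def set_bounds(bounds, i, j, lower, upper):
    """Write new limits for the pair i, j into both triangles of the matrix."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    a, b = min(i, j), max(i, j)
    bounds[b][a] = lower
    bounds[a][b] = upper


edited = [row[:] for row in BOUNDS]  # keep the original matrix untouched
# Assumed pairs: one methane hydrogen (index 4) and the carbon (index 0),
# each constrained relative to the terminal oxygen (index 6).
set_bounds(edited, 4, 6, 2.0, 2.1)
set_bounds(edited, 0, 6, 2.5, 2.6)
print(get_bounds(edited, 4, 6), get_bounds(edited, 0, 6))
```

Before the edit these pairs only had a van der Waals lower limit and an effectively unlimited upper limit of 1000, so the two molecules could be embedded anywhere relative to each other; narrowing the window is what positions them for the reaction.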
Double-ended algorithms find
transition state estimates
Reactants
Products
30
R
P
Position molecules for
double-ended searches
31
R
P
Best guess: just either side of TS
32
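To show why two endpoint geometries are enough to start a double-ended search, the sketch below linearly interpolates Cartesian coordinates between a reactant-side and a product-side geometry. This is a deliberately simple stand-in, not the QST2 or SADDLE algorithm, and the three-atom collinear coordinates are hypothetical.

```python
def interpolate(reactant, product, fraction):
    """Return the geometry a given fraction of the way from reactant to product.

    Both geometries are lists of (x, y, z) tuples with atoms in the same order.
    """
    if len(reactant) != len(product):
        raise ValueError("geometries must contain the same atoms in the same order")
    return [
        tuple(r + fraction * (p - r) for r, p in zip(r_atom, p_atom))
        for r_atom, p_atom in zip(reactant, product)
    ]


# Hypothetical collinear X...H...Y arrangement, coordinates in Å.
# The hydrogen starts near X on the reactant side and ends near Y.
reactant_side = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
product_side = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

midpoint = interpolate(reactant_side, product_side, 0.5)
print(midpoint[1])
```

The closer the two endpoints sit to either side of the TS, the shorter the path that has to be searched, which is the motivation for the "best guess" above.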
Method tested with
semi-empirical calculations
•Two double-ended algorithms tested
•QST2 at PM6 in Gaussian09
•SADDLE at PM7 in MOPAC2012
•Reaction path analysis validated the saddle points
Workflow:
Reaction from RMG
→ Reactants: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Products: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Double-ended Search
→ Optimize TS geometry
→ IRC Calculation
33
Path analysis algorithms descend
to find the reactants and products
R
P
34
A closer look at the automatic TS search
process for H abstraction
338 reactions from the NIST Database
→ Get bounds matrix
→ Embed geometry either side of TS
→ TS search and refinement
→ Reaction path analysis
→ Compare to desired reactants & products
→ Succeed or Fail
Failures grouped by abstracting radical: H., .OH, other radical
Failure causes: VdW collisions; no TS at this ES level; bond perception
35
Species matching returned false negatives
due to incorrect bond order perception.
Connect the dots → Perceive bond order → Check valencies
[Figure: observed and expected structures for R and P, with CH4 as the example]
36
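The "connect the dots" and "check valencies" steps can be illustrated with a distance-based sketch: two atoms are joined when they are closer than the sum of their covalent radii plus a tolerance, and an atom with too many connections is flagged. This is an assumed, simplified version of bond perception (it counts connections and does not assign bond orders); the tolerance of 0.45 Å and the function names are assumptions.

```python
import math

COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "O": 0.66}  # Å
MAX_VALENCE = {"H": 1, "C": 4, "O": 2}
TOLERANCE = 0.45  # Å, assumed slack added to the sum of the radii


def connect_the_dots(symbols, coords):
    """Return index pairs of atoms close enough to be considered bonded."""
    bonds = []
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            cutoff = COVALENT_RADIUS[symbols[i]] + COVALENT_RADIUS[symbols[j]] + TOLERANCE
            if math.dist(coords[i], coords[j]) <= cutoff:
                bonds.append((i, j))
    return bonds


def valence_violations(symbols, bonds):
    """Return indices of atoms with more connections than their usual valence."""
    counts = [0] * len(symbols)
    for i, j in bonds:
        counts[i] += 1
        counts[j] += 1
    return [i for i, s in enumerate(symbols) if counts[i] > MAX_VALENCE[s]]


# Methane with C-H distances of about 1.09 Å.
symbols = ["C", "H", "H", "H", "H"]
coords = [
    (0.0, 0.0, 0.0),
    (0.629, 0.629, 0.629),
    (-0.629, -0.629, 0.629),
    (-0.629, 0.629, -0.629),
    (0.629, -0.629, -0.629),
]
bonds = connect_the_dots(symbols, coords)
print(bonds, valence_violations(symbols, bonds))

# A TS-like geometry: a fifth hydrogen 1.4 Å from the carbon is still within
# the cutoff, so the carbon ends up with five connections and is flagged.
crowded_symbols = symbols + ["H"]
crowded_coords = coords + [(0.0, 0.0, -1.4)]
crowded_bonds = connect_the_dots(crowded_symbols, crowded_coords)
print(valence_violations(crowded_symbols, crowded_bonds))
```

Stretched bonds near a saddle point sit close to the cutoff, which is one way a purely geometric perception step can disagree with the expected species.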
Most failures involve reactions with
small molecules
[Flowchart: the automatic TS search process, with failures grouped by abstracting radical (H., .OH, other radical) and by cause (VdW collisions, no TS at this ES level, bond perception)]
37
Small radicals need to be closer to the
molecule they are abstracting from
38
•All abstractions by H. failed
•Many with other small radicals (e.g. .OH) also failed
Learn from the successful saddle points
to improve automatic searches
[Flowchart: the automatic TS search process and its failure causes (VdW collisions, no TS at this ES level, bond perception)]
39
Semi-empirical estimates used for
DFT calculations
40
•Check semi-empirical geometry validity
•Use geometry as input to DFT calculations
•Check DFT geometry validity
Workflow:
Reaction from RMG
→ Reactants: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Products: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Double-ended Search
→ Optimize TS geometry
→ IRC Calculation
→ Optimize TS geometry at DFT
→ IRC Calculation at DFT
Trends observed in DFT
saddle point geometries
41
Structure method: M06-2X
Basis set: 6-31+G(d,p)
[Figure: saddle point geometry trends for X, H and Y•]
Estimate geometry directly via
group additive distance estimates
42
Double-ended route:
Reaction from RMG
→ Reactants: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Products: Generate Bounds Matrix → Edit Bounds Matrix close to TS → Embed Matrix in 3D
→ Double-ended Search
→ Optimize TS geometry
→ IRC Calculation
Direct route:
Generate Bounds Matrix → Edit Bounds Matrix for TS → Embed Matrix in 3D
•Database arranged in tree structure as for kinetics
•Trained on successfully optimized transition states
•Direct guess much faster than double ended search
•Success depends on training data
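A group-additive distance estimate could look like the sketch below: start from a base distance and add one correction for the X group and one for the Y group, each taken from the most specific group that has a trained value. Every number and group name here is a hypothetical placeholder, not a trained value from the talk, and the additive form itself is an assumption about how such estimates might be combined.

```python
# All values are hypothetical placeholders in Å, not trained group values.
BASE_DISTANCE = 1.30
X_CORRECTIONS = {"C-H": 0.00, "primary C-H": -0.02}
Y_CORRECTIONS = {"O radical": 0.05}


def most_specific(corrections, path):
    """Return the correction for the first group in `path` that has a value.

    `path` lists groups from most specific to most general; 0.0 is used
    when no group on the path has been trained.
    """
    for group in path:
        if group in corrections:
            return corrections[group]
    return 0.0


def estimate_distance(x_path, y_path):
    """Base distance plus one correction per reacting site."""
    return (
        BASE_DISTANCE
        + most_specific(X_CORRECTIONS, x_path)
        + most_specific(Y_CORRECTIONS, y_path)
    )


x_path = ["ethane C-H", "primary C-H", "C-H"]
y_path = ["hydroperoxyl O", "O radical"]
print(round(estimate_distance(x_path, y_path), 2))
```

Groups with no trained value fall back to a more general ancestor, so the quality of the direct guess tracks how much of the tree has been covered by training data.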
Comparison of the developed methods
43
                          Double-Ended Searches            Direct Estimates
Input requirements        2 rough estimates                1 good estimate
Distance specifications   One rule for all                 Group based estimates
Optimization methods      QST2, SADDLE, Surface Walking    Surface Walking
Computational speed       Slower                           Faster
Small radical reactions   Problematic                      Better
Multiple conformers       Problematic                      Possible
Contributions
•Explained Reaction Mechanism Generator (RMG).
•Created framework to find TS geometries
using RMG and RDKit for distance geometry.
•Categorized reaction families,
and chose H-abstraction as first target.
•Implemented double-ended TS searches
that work with no training data.
•Identified trends in functional group contributions
to TS geometries.
•Implemented direct guesses based on group additive
estimates, and started to train group values.
44
Department of Chemical Engineering
