This document summarizes Rayner Mendes' honors thesis on simulating Stokes shifts of boron difluoride (BF2) formazanate complexes using density functional theory. The study aims to model the absorption, emission, and Stokes shifts of BF2 complexes and analyze how structural tuning affects their spectroscopic properties. A training set of naphthalene and fluorene was used to optimize the computational methodology. Density functional theory with time-dependent DFT and a polarizable continuum model was employed to simulate the complexes in solvent and calculate vertical excitation energies, optimized geometries, and spectroscopic properties. A quantitative structure-property relationship analysis was also conducted to determine electronic and steric descriptors that influence absorption and emission.
Simulation of Stokes Shifts in BF2 Formazanate Complexes
Simulation of Stokes Shifts and Analysis of
Substituent Effects in Boron Difluoride
Formazanate Complexes
By:
Rayner B. Mendes
Supervisor:
Viktor N. Staroverov
CHEM 4491E Thesis
submitted in partial fulfillment of the requirements
for the degree: Honours Bachelor of Science (Chemistry)
Department of Chemistry
The University of Western Ontario
London, Ontario, Canada
2016
Thesis Examiner 1: __________________ Styliani Constas
Thesis Examiner 2: __________________ J. Clara Wren
Supervisor: __________________ Viktor N. Staroverov
Abstract
Simulation of spectroscopic properties such as absorption, emission, and the Stokes shifts is of
great interest for the tuning of functional materials. Structural tuning of boron difluoride (BF2)
complexes has been shown to affect these spectroscopic properties. This study attempts to simulate
the Stokes shifts of BF2 formazanate complexes using density-functional theory. Additionally, it
analyzes the effect of structural tuning through a quantitative structure-property relationship study
using multivariable regression of quantum-chemical descriptors. The proposed methodology is
able to reproduce experimental data and provide insights into trends in substituent effects. The
PBE0/6-311+G(d,p) level of theory is optimal in simulating absorption, emission, and Stokes
shifts. This level of theory has the lowest mean absolute error, 0.0048 eV, among the functionals
and basis sets tested across 60 unique trials. For small π-conjugated molecules the simulated
absorption, emission, and Stokes shift had errors under 0.5%, 0.4%, and 1.4%, respectively. BF2
formazanate complexes had absorption and emission errors under 3% and 7%. Electronic trends
from previous research were reproduced: electron-donating groups caused a red shift while
electron-withdrawing groups caused a blue shift in the R3 position. The quantitative
structure-property relationship model suggests that this trend is reversed when substituents are placed in the
R1 and R2 positions. The multivariable regression analysis found that the absorption energy could
be described by the dipole moment, excited-state dipole moment, dipole moment derivatives,
and the HOMO and excited-state HOMO energies. The regression equation for a training series had a
6.0 nm mean absolute error, an adjusted r² of 0.9949, and an F-value under 0.04%; all descriptors
used had a p-value under 1% and are therefore statistically significant at the 99% confidence level.
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Stokes shifts and structural tuning
1.2 Objectives
1.3 Simulation of Stokes shifts
1.4 Quantitative structure-property relationship (QSPR)
2 Experimental
2.1 Simulation of Stokes shifts
2.1.1 Defining a molecular training set
2.1.2 Computation details
2.1.3 Choice of functionals and basis sets
2.2 QSPR model
2.2.1 QSPR training series
2.2.2 Electronic and steric descriptors
2.2.3 Regression methodology
2.2.4 Optimizing QSPR model
3 Results and Discussion
3.1 Stokes shifts
3.2 Effect of structural tuning on spectroscopic properties
3.2.1 Steric and electronic influence on Stokes shift in the R3 position
3.2.2 Electrostatics and electronic influence on Stokes shift in the R1 and R2 positions
3.2.3 Multivariable regression of hydrogen training series
4 Conclusion
4.1 Further Research
References
Acknowledgments
I would like to thank Dr. Viktor Staroverov whose supervision, guidance and input have allowed
me to gain a better grasp of computational chemistry. I would also like to thank Dr. Joe Gilroy for
helping with the analysis of his group's experimental data. Finally, I am grateful to Mr. Sviataslau
Kohut, Dr. Rogelio Cuevas, Ms. Darya Komsa, and Mr. Hanqing Zhao for the insights they
provided through conversations and discussions.
List of Abbreviations
B3LYP Becke, three-parameter, Lee-Yang-Parr
BF2 boron difluoride
DCM dichloromethane
DFT density-functional theory
μ dipole moment
HF Hartree-Fock
HOMO highest occupied molecular orbital
PBE Perdew–Burke–Ernzerhof
PCM polarizable continuum model (excited-state energy)
LUMO lowest unoccupied molecular orbital
M06 Minnesota 06
MV molar volume
SCF self-consistent field (ground-state energy)
SCRF self-consistent reaction field
SS Stokes shift
TD-DFT time-dependent density-functional theory
THF tetrahydrofuran
QSPR quantitative structure-property relationship
VWN Vosko-Wilk-Nusair
XC exchange-correlation
1 Introduction
1.1 Stokes shifts and structural tuning
Spectroscopic properties such as the Stokes shift (SS), expressed as a wavelength in nm,
$$\lambda_{\mathrm{SS}} = \lambda_{\mathrm{em}} - \lambda_{\mathrm{abs}} \tag{1}$$
are of particular interest for molecules which exhibit extended π-conjugation. These spectroscopic properties allow for characterization of such molecules with regard to structural elucidation. The tuning of such properties through structural variation allows π-conjugated functional materials to be used in photovoltaic cells,1 luminescent materials,2 and field-effect transistors.3 It has been shown that increasing the π-conjugation of fused aromatic rings impacts their spectroscopic properties. Aromatic molecules exhibit an increase in absorption wavelength as π-conjugation is increased; benzene has a maximum absorption at 260 nm, naphthalene at 310 nm, and anthracene at 375 nm.4
The wavelength of emission differs from that of absorption because of the change in the potential energy curve for a molecule's excited state. Figure 1 illustrates the Franck-Condon principle, which states that an electronic transition is more likely to occur if the vibrational wavefunctions of the two states overlap. A molecule absorbs a photon as it is promoted to its excited state; the geometry then distorts, and as the molecule relaxes to the ground state a photon is released (emission).
Figure 1: Absorption and emission due to fluorescence.
Boron difluoride (BF2) functional materials have been shown to have tunable spectroscopic properties which vary with ligand type, position of substitution, and extent of π-conjugation. Fu and co-workers5 analyzed two naphthyridine BF2 complexes (1 and 2 in Figure 2) and observed that extension of the π-conjugation of the system results in a red shift of 20 nm. Piers et al.6 modified the structure of anilido-pyridine ligands (3 and 4 in Figure 2) to increase the degree of π-conjugation, which red-shifted the absorption and emission by 50 nm.
Figure 2: BF2 complexes synthesized by the Fu5 and Piers6 groups.
Functional materials such as the BF2 formazanate complexes synthesized in the Gilroy group are of particular interest.7 These complexes have numerous applications as dyes8 or indicators of cell activity.9 The Gilroy group has shown10 that BF2 complexes derived from formazans have desirable spectroscopic properties which can be tuned through structural variation, as shown in Figure 3. It was found that electron-withdrawing groups on the complex blue-shift the maximum absorption and emission wavelengths.11 Electron-donating groups cause a red shift, even compared to their phenyl-substituted counterparts.11 Because these trends have been observed experimentally, BF2 formazanate complexes are an ideal candidate for a computational study.
Figure 3: Synthesized boron difluoride formazanate complexes from the Gilroy group.12
1.2 Objectives
The objectives of this thesis are to:
1) Model the Stokes shifts of BF2 formazanate complexes in commonly used solvents such as dichloromethane (DCM) and tetrahydrofuran (THF).
2) Conduct a quantitative structure-property relationship (QSPR) study to determine whether the above model can identify which electronic and steric descriptors affect the absorption, emission, and Stokes shifts, and predict the influence of structural variation on spectroscopic properties.
1.3 Simulation of Stokes shifts
The above objectives will be realized through density-functional theory (DFT) calculations using the GAUSSIAN 09 program.13 Calculations will utilize protocols such as the self-consistent reaction field (SCRF)14 to model solvation, and time-dependent DFT (TD-DFT) to model excited states.15
DFT is a computational quantum-mechanical method used in a variety of fields, from physics to materials science and chemistry. DFT investigates the electronic structure of many-body systems such as atoms and molecules16 through the use of functionals of the electron density. Various properties can be computed to a high degree of accuracy without the need for experiments.17 These properties include excitation energies,12 ground- and excited-state geometries,11 and various spectroscopic properties.18 The distortion of geometry from the ground state to the first excited state results in the difference in absorption and emission wavelengths.19
1.4 Quantitative structure-property relationship (QSPR)
As mentioned, a QSPR study will be performed on the absorption and emission of BF2 formazanate complexes through the regression of various electronic and steric descriptors. QSPRs are primarily used in pharmaceutical design and medicinal drug synthesis,20 where a dependent variable such as binding affinity is related to an independent variable such as hydrophobicity.21 QSPR studies transformed the search for compounds with desired properties from one based on chemical intuition into a mathematically quantified form.22 These studies can be conducted both experimentally23 and computationally.24 Constructing a QSPR model for the BF2 formazanate complexes requires a multivariable regression model.25 Multivariable regressions quantitatively relate a property such as absorption or emission to a block of predictor variables, such as quantum-chemical descriptors, in the form of an equation.26 To adequately study the QSPRs of BF2 formazanate complexes, the electronic and steric descriptors need to be quantified. These descriptors should provide insight into the chemical nature of the property under consideration.22
As with QSPRs of biological systems, the substituted ligands in the molecules need to be varied so that the data are meaningful.27 For this purpose, the skeletal BF2 formazanate structure will be substituted in the R1-R3 positions with the various substituents shown in Figure 4. The substituents will vary in their position, in their sterics, from low ("H") to high ("naphthyl"), and in their electronic properties, from electron-withdrawing groups such as "NO2" to electron-donating groups such as "methyl".
Figure 4: Boron difluoride skeletal structure with positions of substitution
The results from these computations will allow analysis of how a variety of substituents affect the spectroscopic properties of BF2 formazanate complexes. The results of this study can be used to develop a predictive model based on electronic and steric descriptors, adapt the QSPR methodology outside biological systems, and estimate the Stokes shift in a time-efficient manner through the derived regression equation.
2 Experimental
2.1 Simulation of Stokes shifts
Developing a model for the Stokes shifts of BF2 formazanate complexes requires three components: a molecular training set, protocols for the self-consistent reaction field and time-dependent DFT, and the optimal level of theory for the calculations.
2.1.1 Defining a molecular training set
Applying the methodology directly to larger molecules such as BF2 formazanate complexes is computationally expensive, often taking in excess of 150 hours. Using molecules which exhibit similar extended π-conjugation allows the model to be optimized in a time-efficient manner. Naphthalene and fluorene (Figure 5) were chosen as training molecules for Stokes shift simulations due to their similar π-conjugation and π → π* transitions. The absorption and emission of the training molecules are simulated in both DCM and THF solvents.
Figure 5: Structure of training set molecules.
2.1.2 Computation details
To simulate the Stokes shifts of the training molecules in solvents like DCM and THF, a solvation protocol must be employed. The SCRF protocol in GAUSSIAN 09 will be used to simulate solvent interactions. It uses a polarizable continuum model (PCM) with the integral equation formalism, in which the molecule of interest is placed within a cavity formed by a set of overlapping spheres,28 so that the solvent's interactions with the molecule are reproduced. For excited-state calculations in solution, there is a distinction between equilibrium and non-equilibrium calculations. The solvent responds to changes in the state of the solute in two ways: it polarizes its electron distribution, which is a rapid process, and the solvent molecules reorient themselves, which is a slower process. An equilibrium calculation describes a situation in which the solvent has had time to fully respond to the solute in both of these ways. A non-equilibrium calculation is appropriate for processes which are too rapid for the solvent to fully respond, such as vertical electronic excitation.
TD-DFT is used for calculations which require non-equilibrium solvation or excited-state geometries. TD-DFT can be described in the following way. For a given interaction potential, the Runge-Gross (RG) theorem29 shows that the external potential uniquely determines the density. The Kohn-Sham approach chooses a non-interacting system, for which the interaction potential is zero, whose density is equal to that of the interacting system. The wave function of a non-interacting system can be represented as a Slater determinant of single-particle orbitals. This defines a potential which can be used to determine a non-interacting Hamiltonian Hs,
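$$\hat{H}_s(t) = \hat{T} + \hat{V}_s(t) \tag{2}$$
which determines a determinantal wave function
$$\hat{H}_s(t)\,|\Phi(t)\rangle = i\,\frac{\partial}{\partial t}\,|\Phi(t)\rangle \tag{3}$$
and generates a time-dependent density
$$\rho_s(\mathbf{r},t) = \sum_{i=1}^{N} |\phi_i(\mathbf{r},t)|^2 \tag{4}$$
such that ρs is equal to the density of the interacting system at all times. In this way, if the potential can be determined, the original Schrödinger equation, a single partial differential equation in 3N variables, is replaced by N differential equations in 3 dimensions.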
Absorption and emission calculations
Simulations of Stokes shifts require the following calculations, shown in Figure 6: ground-state geometry optimization, non-equilibrium solvation, absorption calculation, single-point TD-DFT calculation, excited-state geometry optimization, and emission calculation.
Figure 6: Calculations required to simulate absorption, emission, and Stokes shifts.
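The sequence of calculations above can be sketched as a list of Gaussian route sections. This is only an illustrative sketch, not the input used in the thesis: the keyword combinations (`NonEquilibrium=Save`/`Read`, `ExternalIteration`, `TD=(Read,...)`, `Geom=Check`) are assumed from the state-specific solvation example in the Gaussian SCRF documentation, `PBE1PBE` is assumed as Gaussian's keyword for PBE0, and the helper names `route` and `stokes_shift_routes` are hypothetical. Every route line should be checked against the manual before a job is run.

```python
# Sketch only: keyword combinations are assumptions modelled on the
# state-specific solvation example in the Gaussian SCRF documentation,
# not the thesis's actual input files.

def route(level, *keywords):
    """Join a level of theory and its keywords into one route line."""
    return "# " + " ".join((level,) + keywords)


def stokes_shift_routes(solvent="Dichloromethane", nstates=3, root=1,
                        opt_level="B3LYP/6-31G(d)",
                        level="PBE1PBE/6-311+G(d,p)"):
    """Return (step, route) pairs in the order listed for Figure 6."""
    grid = "Int=(Grid=UltraFine)"
    opt = "Opt=(MaxCycle=100)"
    geom = "Geom=Check"  # reuse the geometry stored by the previous step
    td_all = f"TD=(NStates={nstates})"
    td_root = f"TD=(NStates={nstates},Root={root})"
    td_read = f"TD=(Read,NStates={nstates},Root={root})"

    def scrf(*options):
        return "SCRF=(" + ",".join((f"Solvent={solvent}",) + options) + ")"

    return [
        ("ground-state geometry optimization",
         route(opt_level, opt, scrf(), grid)),
        ("non-equilibrium solvation",
         route(level, scrf("NonEquilibrium=Save"), geom, grid)),
        ("absorption calculation",
         route(level, td_root,
               scrf("ExternalIteration", "NonEquilibrium=Read"), geom, grid)),
        ("single-point TD-DFT calculation",
         route(level, td_all, scrf(), geom, grid)),
        ("excited-state geometry optimization",
         route(level, opt, td_read, scrf(), geom, grid)),
        ("emission calculation, step 1",
         route(level, td_read,
               scrf("ExternalIteration", "NonEquilibrium=Save"), geom, grid)),
        ("emission calculation, step 2",
         route(level, scrf("NonEquilibrium=Read"), geom, grid)),
    ]


if __name__ == "__main__":
    for step, line in stokes_shift_routes(solvent="Tetrahydrofuran"):
        print(f"{step}:\n  {line}")
```

Keeping the routes as plain strings makes it easy to compare them side by side with the manual's example before any input file is written.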
Ground-state geometry optimization
The ground-state geometry optimization yields the optimized geometry and the energy of the molecule in solution.
Non-equilibrium solvation
This calculation stores the information about the non-equilibrium solvation based on the ground state. It yields the ground-state SCF energy.
Absorption calculation
The actual state-specific calculation is then performed, reading in the required information for non-equilibrium solvation. The energy of the first excited state is calculated at the ground-state optimized geometry. This calculation yields the PCM (excited-state) energy at the ground-state geometry.
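The maximum absorption wavelength is then calculated by:
$$E_{\mathrm{abs}} = E_{\mathrm{XS}} - E_{\mathrm{GS}} \tag{5}$$
$$\lambda_{\mathrm{abs}} = \frac{hc}{E_{\mathrm{abs}}} \tag{6}$$
where E_XS is the PCM (excited-state) energy and E_GS is the SCF (ground-state) energy.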
Single-point TD-DFT calculation
This TD-DFT calculation determines the vertical excitation energy based on a linear response from
the ground-state to the first allowed excited-state.
Excited-state geometry optimization
Using TD-DFT, the force constants from the single-point calculation are read and the geometry is
optimized in equilibrium solvation.
Emission calculation
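The first step of this calculation writes the solvation data for the state-specific equilibrium solvation of the excited state at its equilibrium geometry. This calculation yields the excited-state PCM energy.
The second step of this procedure reads the solvation data and computes the ground-state energy at the excited-state geometry, with non-equilibrium solvation based on the static solvation of the first excited state. This calculation yields the excited-state SCF energy.
The emission energy and wavelength are then calculated by:
$$E_{\mathrm{em}} = E^{*}_{\mathrm{XS}} - E^{*}_{\mathrm{GS}} \tag{7}$$
$$\lambda_{\mathrm{em}} = \frac{hc}{E_{\mathrm{em}}} \tag{8}$$
where the asterisk denotes energies evaluated at the excited-state optimized geometry.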
Sample outputs of the above calculations are in Supplemental Section A.1.
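The conversion from state energies to wavelengths and a Stokes shift (Equations 1 and 5 to 8) can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis's own script: the function names are hypothetical, the physical constants are standard rounded values, and the two transition energies are back-calculated from the calculated naphthalene/DCM wavelengths reported in Table 4 (276.8 nm and 321.4 nm) instead of being read from Gaussian output.

```python
HC_EV_NM = 1239.841984     # Planck constant times speed of light, in eV*nm
HARTREE_TO_EV = 27.211386  # Gaussian reports energies in Hartree


def transition_energy_ev(e_xs_hartree, e_gs_hartree):
    """Equations 5 and 7: excited-state minus ground-state energy, in eV."""
    return (e_xs_hartree - e_gs_hartree) * HARTREE_TO_EV


def wavelength_nm(energy_ev):
    """Equations 6 and 8: wavelength = h*c / E."""
    return HC_EV_NM / energy_ev


def stokes_shift_nm(e_abs_ev, e_em_ev):
    """Equation 1: emission wavelength minus absorption wavelength."""
    return wavelength_nm(e_em_ev) - wavelength_nm(e_abs_ev)


def percent_error(calculated, experimental):
    """Absolute percent error relative to experiment."""
    return abs(calculated - experimental) / experimental * 100.0


# Illustrative energies back-calculated from the Table 4 wavelengths
# for naphthalene in DCM; they are not taken from output files.
e_abs = HC_EV_NM / 276.8
e_em = HC_EV_NM / 321.4
shift = stokes_shift_nm(e_abs, e_em)

if __name__ == "__main__":
    print(f"Stokes shift: {shift:.1f} nm")
    print(f"Error vs. experiment (44 nm): {percent_error(shift, 44.0):.2f}%")
```

With these inputs the sketch returns the 44.6 nm Stokes shift and 1.36% error listed for naphthalene in DCM in Table 4.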
Calculations were performed using the protocol outlined in the Gaussian 09 user manual,30 with the following additional keywords:
Int=(grid=UltraFine): ultrafine integration grid for DFT calculations
Opt=(MaxCycle=100): increased number of cycles for convergence to a minimum in ground- and excited-state geometry optimizations
TD=(…, NStates=3, …) and TD=(…, NStates=3, Root=x): calculations were specified for three excited states to model allowed singlet transitions, with the primary excited state of interest, x, being the first.
2.1.3 Choice of functionals and basis sets
The optimal combination of functional and basis set, or "level of theory", needs to be chosen for the training set and protocols outlined. Calculations other than ground-state geometry optimizations were conducted using combinations of various functionals, basis sets, and solvents. Ground-state geometries were calculated using the B3LYP functional and the 6-31G(d) basis set.
Functionals
The functionals chosen for modeling absorption and emission are B3LYP, PBE0, and M06-2X. These three functionals are well known for their accuracy in spectroscopy calculations.31 B3LYP and PBE0 are hybrid exchange-correlation functionals, constructed as a linear combination of the Hartree-Fock (HF) exact exchange functional and other exchange and correlation density functionals. The parameters of the functionals are fitted based on the functionals' predictions of experimental or calculated thermochemical data.32 Since the functionals being tested have an HF component, the self-interaction error is reduced, leading to good performance in TD-DFT calculations.31 The equation for the HF exact exchange energy is
$$E_x^{\mathrm{HF}} = -\frac{1}{2}\sum_{i,j}\iint \phi_i^{*}(\mathbf{r}_1)\,\phi_j^{*}(\mathbf{r}_2)\,\frac{1}{r_{12}}\,\phi_j(\mathbf{r}_1)\,\phi_i(\mathbf{r}_2)\,d\mathbf{r}_1\,d\mathbf{r}_2 \tag{9}$$
The B3LYP33 functional combines the Becke 88 exchange functional,34 a generalized-gradient approximation (GGA), with the VWN local-density approximation (LDA),35 and is given by
$$E_{xc}^{\mathrm{B3LYP}} = E_x^{\mathrm{LDA}} + E_c^{\mathrm{LDA}} + 0.20\,(E_x^{\mathrm{HF}} - E_x^{\mathrm{LDA}}) + 0.72\,(E_x^{\mathrm{GGA}} - E_x^{\mathrm{LDA}}) + 0.81\,(E_c^{\mathrm{GGA}} - E_c^{\mathrm{LDA}}) \tag{10}$$
The PBE0 functional32 mixes the PBE exchange energy and HF exchange energy according to the equation
$$E_{xc}^{\mathrm{PBE0}} = \frac{1}{4}E_x^{\mathrm{HF}} + \frac{3}{4}E_x^{\mathrm{PBE}} + E_c^{\mathrm{PBE}} \tag{11}$$
The M06-2X functional36 is a global hybrid functional with 54% HF exchange energy. The M06 suite of functionals is constructed by empirically fitting the parameters while constraining them to the uniform electron gas.31
Basis sets
A basis set is a set of functions that are combined in linear combinations to create molecular orbitals. Pople basis sets are typically denoted X-YZG.37 X represents the number of primitive Gaussians composing each core atomic orbital basis function; Y and Z indicate that the valence orbitals are composed of linear combinations of Y and Z primitive Gaussian functions, respectively. The "*" adds polarization functions of the p, d, and f types, and the "+" adds diffuse functions.38
The basis sets tested were 6-31G(d), 6-31G(d,p), 6-31+G(d), 6-31+G(d,p), 6-311G(d), 6-311G(d,p), 6-311+G(d), 6-311+G(d,p), 6-311+G(2d,p), and 6-311+G(2df,2p), chosen to test a variety of polarization and diffuse functions on the Pople basis sets.
Table 1: Summary of functionals and basis sets tested
Functionals tested    Basis sets tested
B3LYP                 6-31G(d)
PBE0                  6-31G(d,p)
M06-2X                6-31+G(d)
                      6-31+G(d,p)
                      6-311G(d)
                      6-311G(d,p)
                      6-311+G(d)
                      6-311+G(d,p)
                      6-311+G(2d,p)
                      6-311+G(2df,2p)
Across the two training molecules, two solvents, three unique calculations, three functionals, and 10 basis sets, there are a total of 360 unique calculations, equivalent to decades of computation time. The level of theory which yields the smallest mean absolute error (MAE) across the training molecules and solvents compared to experimental data will be chosen.
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} |e_t| \tag{12}$$
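The size of the test grid and the MAE of Equation 12 can be illustrated with a short Python sketch. The functional and basis-set names come from Table 1; the function names are hypothetical, and the four calculated/experimental energies are made-up placeholder values that only demonstrate the arithmetic.

```python
from itertools import product

MOLECULES = ("naphthalene", "fluorene")
SOLVENTS = ("DCM", "THF")
FUNCTIONALS = ("B3LYP", "PBE0", "M06-2X")
BASIS_SETS = (
    "6-31G(d)", "6-31G(d,p)", "6-31+G(d)", "6-31+G(d,p)",
    "6-311G(d)", "6-311G(d,p)", "6-311+G(d)", "6-311+G(d,p)",
    "6-311+G(2d,p)", "6-311+G(2df,2p)",
)
CALCULATIONS_PER_TRIAL = 3  # the "three unique calculations" in the text


def levels_of_theory():
    """Every functional/basis-set combination from Table 1."""
    return [f"{f}/{b}" for f, b in product(FUNCTIONALS, BASIS_SETS)]


def total_calculations():
    """Molecules x solvents x calculations x levels of theory."""
    return (len(MOLECULES) * len(SOLVENTS) * CALCULATIONS_PER_TRIAL
            * len(levels_of_theory()))


def mean_absolute_error(calculated, experimental):
    """Equation 12, with e_t = calculated_t - experimental_t."""
    errors = [c - e for c, e in zip(calculated, experimental)]
    return sum(abs(err) for err in errors) / len(errors)


# Placeholder absorption energies in eV (hypothetical, for illustration only).
calc = [4.48, 4.47, 4.69, 4.68]
expt = [4.47, 4.49, 4.68, 4.70]

if __name__ == "__main__":
    print(len(levels_of_theory()), "levels of theory")
    print(total_calculations(), "calculations in total")
    print(f"MAE = {mean_absolute_error(calc, expt):.3f} eV")
```

The grid count reproduces the 360 calculations quoted above: 2 molecules, 2 solvents, 3 calculations, and 30 levels of theory.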
Calculating the emission at every level of theory under the same methodology would be far too computationally expensive, as a single excited-state geometry optimization can take in excess of two weeks. Instead, the optimal level of theory will be used to calculate the Stokes shifts for naphthalene, fluorene, and BF2 formazanate complexes to determine whether the model can reproduce emission energies and Stokes shifts.
2.2 QSPR model
The most accurate model for simulating the Stokes shift of BF2 formazanate complexes using the
methodology above will be used in the subsequent QSPR study. The model will simulate the
spectroscopic properties of the structurally tuned variations of the BF2 formazanate skeletal
structure. The following are required to conduct the QSPR study: a representative training series to
simulate, electronic and steric descriptors to be used as independent regression variables, a
regression methodology, and a means of optimization.
2.2.1 QSPR training series
The training series for the QSPR needs to be more robust than the model for the Stokes shift. A
large training set is required to accomplish the goal of creating a predictive model. The training
set should contain substituents which vary in their sterics and electronics, and should be able to
model currently synthesized complexes from the Gilroy group. Table 2 lists five training series
which meet these criteria. Each series can be used independently or in combination to build
a QSPR model.
Table 2: Training series for QSPR study
Hydrogen series Phenyl series Naphthyl series Varied series Equivalent series
R1 R2 R3 R1 R2 R3 R1 R2 R3 R1 R2 R3 R1 R2 R3
H H H Ph Ph H Nh Nh H Me Me H Me Me Me
H H Me Ph Ph Me Nh Nh Me Cl Cl H Cl Cl Cl
H H Cl Ph Ph Cl Nh Nh Cl CO2H CO2H H CO2H CO2H CO2H
H H CO2H Ph Ph CO2H Nh Nh CO2H OH OH H OH OH OH
H H OH Ph Ph OH Nh Nh OH NMe2 NMe2 H NMe2 NMe2 NMe2
H H NMe2 Ph Ph NMe2 Nh Nh NMe2 NO2 NO2 H NO2 NO2 NO2
H H NO2 Ph Ph NO2 Nh Nh NO2 CN CN H CN CN CN
H H CN Ph Ph CN Nh Nh CN CO CO H CO CO CO
H H CO Ph Ph CO Nh Nh CO
H H Ph Ph Ph Ph Nh Nh Ph
H H Nh Ph Ph Nh Nh Nh Nh
The substituents in each series vary in their sterics from low to high and in their electronics from electron-donating to electron-withdrawing. The hydrogen series was created to draw inferences on the effect of substituents in the R3 position, the varied series on the R1 and R2 positions, and the equivalent series on all three positions. The phenyl and naphthyl series attempt to do the same, but represent complexes which have already been synthesized in the Gilroy group.12
2.2.2 Electronic and steric descriptors
The dependent variables in the regression model will be the absorption and emission wavelengths. The independent variables, or "descriptors", need to have a theoretical basis in other computational QSPR studies; quantum-chemical descriptors are used because they lack the inherent error normally associated with experimental measurements. Systematic error may exist in the simulation model; however, this error is considered to be applied evenly through the series analyzed and thus does not influence modeled trends.22
To determine the influence of structural tuning on the absorption and emission, the following descriptors will be calculated for both the ground and excited states: highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies,39 the HOMO-LUMO (HL) gap,39 the dipole moment (μ),40 and the square root of μ (√μ).40 Additionally, molar volume (MV)41 will be used to describe the steric influence of the substituents. All descriptors, with the exception of molar volume, are contained within the output files of the ground- and excited-state geometry optimizations.
2.2.3 Regression methodology
The linear regression model for multiple independent variables can be described through the following equations.42 Each set of values of the independent variables ($x$) is associated with a value of the dependent variable ($y$).
For $p$ independent variables $x_1, x_2, \ldots, x_p$ the mean ($\mu_y$) or the "fit" is
$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \tag{13}$$
The observed values for $y$ vary about $\mu_y$ and are assumed to have the same standard deviation. The fitted values $b_0, b_1, \ldots, b_p$ estimate the parameters $\beta_0, \beta_1, \ldots, \beta_p$ of the population. The regression model includes a model deviation term ($\varepsilon$) which represents the deviations of observed $y$ from $\mu_y$, normally distributed with a mean of 0. The model for multiple linear regression for $n$ observations, where $i = 1, 2, \ldots, n$, is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \tag{14}$$
The least-squares method finds the line of best fit by minimizing the sum of squares of the residuals ($e_i$), or vertical deviations from the line. A vertical deviation equal to 0 represents a point which lies exactly on the line. The residuals ($e_i$) are given by
$$e_i = y_i - \hat{y}_i \tag{15}$$
where $\hat{y}_i$ represents the values fitted by the equation
$$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip} \tag{16}$$
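The least-squares fit of Equations 13 to 16 can be sketched in plain Python by solving the normal equations. This is an illustrative stand-in, not the tool used in the thesis (the next section names Microsoft Excel for the regressions); the function names are hypothetical, and the two-descriptor data set is synthetic, generated from known coefficients so the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_least_squares(xs, ys):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 gives b0
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Equation 16: fitted value for one set of descriptor values."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Synthetic data: y = 1 + 2*x1 + 3*x2 exactly (hypothetical descriptors).
xs = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0), (4.0, 2.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in xs]
coeffs = fit_least_squares(xs, ys)
residuals = [y - predict(coeffs, row) for row, y in zip(xs, ys)]  # Eq. 15

if __name__ == "__main__":
    print("coefficients:", [round(b, 6) for b in coeffs])
    print("largest residual:", max(abs(e) for e in residuals))
```

Because the synthetic response is exactly linear in the two descriptors, the fitted coefficients recover the generating values and all residuals vanish to within rounding.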
2.2.4 Optimizing QSPR model
The multivariable linear regression tools from the Analysis ToolPak in Microsoft Excel will be used. The optimized model will ideally have a low standard error (SE), a low MAE for the residuals, a high adjusted r² value to ensure accuracy with multiple descriptors, and p- and F-values under 5% to ensure the underlying model is statistically sound at the 95% confidence level.
Where $N$ is the sample size and $D$ is the number of descriptors,
$$\mathrm{SE} = \frac{\sigma_x}{\sqrt{N}} \tag{17}$$
$$r^2 = \frac{\sum (\hat{y}_i - \mu_y)^2}{\sum (y_i - \mu_y)^2} \tag{18}$$
$$\mathrm{Adj.}\ r^2 = 1 - \frac{(1 - r^2)(N - 1)}{N - D - 1} \tag{19}$$
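Equations 17 to 19 translate directly into code. The sketch below uses hypothetical function names and follows the equations exactly as written above, including the explained-over-total form of r² in Equation 18; the sample numbers in the usage lines are arbitrary.

```python
from math import sqrt


def standard_error(sigma_x, n):
    """Equation 17: standard deviation divided by the root of sample size."""
    return sigma_x / sqrt(n)


def r_squared(observed, fitted):
    """Equation 18: explained sum of squares over total sum of squares."""
    mean_y = sum(observed) / len(observed)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((y - mean_y) ** 2 for y in observed)
    return explained / total


def adjusted_r_squared(r2, n, d):
    """Equation 19: penalize r^2 for the number of descriptors d."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - d - 1)


if __name__ == "__main__":
    # Arbitrary illustration: r^2 = 0.9 from 11 molecules and 2 descriptors.
    print(adjusted_r_squared(0.9, n=11, d=2))
    print(standard_error(2.0, 4))
```

The adjustment matters for small training series: each added descriptor lowers the adjusted value unless it improves r² enough to compensate, which is why it is preferred over plain r² when comparing models with different numbers of descriptors.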
A p-value is defined as the probability, under the assumption of a hypothesis (H), of obtaining a result equal to or more extreme than that observed in a normal distribution.
Figure 7: Visual representation of a p-value.
Below is the equation for the two-tailed p-value in a Gaussian distribution,
$$p_{\mathrm{value}} = 2\,\min\{\Pr(X \le x \mid H),\ \Pr(X \ge x \mid H)\} \tag{20}$$
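For a Gaussian distribution, Equation 20 can be evaluated with the error function from Python's standard library. This is a minimal sketch with hypothetical function names; it assumes the test statistic follows a normal distribution, as in the text, which may differ from the distribution a spreadsheet regression tool uses for its reported p-values.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Pr(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def two_tailed_p_value(x, mean=0.0, sd=1.0):
    """Equation 20: twice the smaller of the two tail probabilities."""
    lower = normal_cdf(x, mean, sd)  # Pr(X <= x | H)
    upper = 1.0 - lower              # Pr(X >= x | H)
    return 2.0 * min(lower, upper)


if __name__ == "__main__":
    for z in (0.0, 1.0, 1.959964, 3.0):
        print(f"z = {z:.3f}  p = {two_tailed_p_value(z):.4f}")
```

A standardized value of about 1.96 gives a p-value of 5%, which corresponds to the 95% confidence threshold used above.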
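F-tests analyze the variance of a quantifiable variable in pre-defined groups. This can be used to make sure that the groups of descriptors are significant to the regression equation.
Where $K$ is the number of groups, $N$ is the overall sample size, $n_i$ is the number of observations in the $i$th group, $\bar{y}_i$ is the mean of the $i$th group, and $y_{ij}$ is the $j$th observation in the $i$th out of $K$ groups,
$$F = \frac{\text{explained variance}}{\text{unexplained variance}} = \frac{\sum_{i} n_i(\bar{y}_i - \mu_y)^2 \times (N - K)}{\sum_{i,j} (y_{ij} - \bar{y}_i)^2 \times (K - 1)} \tag{21}$$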
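Equation 21 can be checked numerically with a small Python function. The function name is hypothetical and the three groups are toy values chosen so the result can be verified by hand; they are not data from the thesis.

```python
def f_statistic(groups):
    """Equation 21: explained variance over unexplained variance."""
    k = len(groups)                           # number of groups, K
    n_total = sum(len(g) for g in groups)     # overall sample size, N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: n_i * (group mean - grand mean)^2
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, group_means))
    # Within-group sum of squares: (y_ij - group mean)^2
    within = sum((y - m) ** 2
                 for g, m in zip(groups, group_means) for y in g)
    return (between * (n_total - k)) / (within * (k - 1))


# Toy groups (hypothetical): between = 6, within = 6, N - K = 6, K - 1 = 2.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]

if __name__ == "__main__":
    print("F =", f_statistic(toy_groups))
```

For the toy groups the between-group and within-group sums of squares are both 6, so F = (6 × 6) / (6 × 2) = 3.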
Descriptors will be analyzed both individually and in conjunction with each other to determine
how they affect the above metrics. The goal is to develop a model with the fewest descriptors that
predicts the spectroscopic properties of interest.
3 Results and Discussion
3.1 Stokes shifts
The PBE0 functional consistently outperforms both the B3LYP and M06-2X functionals across basis
sets; B3LYP consistently underestimates while M06-2X overestimates the absorption energy, as shown
in Figures 7 and 8.
Figure 7: Calculated absorption energy (eV) for naphthalene in DCM and THF compared to
experimental results (dashed line).
Figure 8: Calculated absorption energy (eV) for fluorene in DCM and THF compared to
experimental results (dashed line).
Table 3: MAE of calculated absorption energy (eV) compared to experimental data for all calculated levels of theory
                   MAE (eV)
Basis set          B3LYP     PBE0      M06-2X
6-31G(d)           0.098     0.0939    0.2942
6-31G(d,p)         0.085     0.0863    0.2889
6-31+G(d)          0.128     0.0359    0.2238
6-31+G(d,p)        0.137     0.0459    0.2157
6-311G(d)          0.083     0.0253    0.2610
6-311G(d,p)        0.090     0.0355    0.2542
6-311+G(d)         0.077     0.0052    0.1879
6-311+G(d,p)       0.165     0.0048    0.1017
6-311+G(2d,p)      0.189     0.0951    0.0686
6-311+G(2df,2p)    0.183     0.1100    0.0659
Minimum MAE        0.0767    0.0048    0.0669
The minimum MAE for calculated absorption energy across training molecules and solvents is found at the PBE0/6-311+G(d,p) level of theory.
Table 4: Percent error in absorption, emission, and Stokes shifts (nm) for naphthalene, fluorene, and BF2 formazanate complexes12 in solution using PBE0/6-311+G(d,p).
           Absorption (nm)             Emission (nm)               Stokes shift (nm)
Solvent    Calcd   Expt   Error (%)    Calcd   Expt    Error (%)   Calcd   Expt   Error (%)
Naphthalene
DCM        276.8   277    0.07%        321.4   321     0.12%       44.6    44     1.36%
THF        277.2   276    0.43%        321.7   320.5   0.40%       44.6    44.5   0.22%
Fluorene
DCM        264.6   265    0.15%        311     311.5   0.16%       46.9    46.5   0.86%
THF        264.4   264    0.15%        310.5   309.5   0.32%       46.1    45.5   1.32%
BF2 Complex (R1 = Ph, R2 = Ph, R3 = CN)
DCM        477     491    2.85%        550     584     5.82%       73      93     21.5%
THF        488.4   489    0.12%        550     585     6.98%       61.6    96     35.8%
BF2 Complex (R1 = Ph, R2 = Ph, R3 = NO2)
DCM        482     491    1.83%        552     587     5.96%       70      96     27.1%
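The selection of the level of theory from Table 3 amounts to finding the smallest entry in the MAE table. The sketch below transcribes the per-basis-set values from Table 3 and uses a hypothetical helper name; it illustrates the selection step only and does not recompute the MAEs themselves.

```python
BASIS_SETS = [
    "6-31G(d)", "6-31G(d,p)", "6-31+G(d)", "6-31+G(d,p)",
    "6-311G(d)", "6-311G(d,p)", "6-311+G(d)", "6-311+G(d,p)",
    "6-311+G(2d,p)", "6-311+G(2df,2p)",
]

# MAE of the calculated absorption energy in eV, transcribed from Table 3.
MAE_EV = {
    "B3LYP": [0.098, 0.085, 0.128, 0.137, 0.083,
              0.090, 0.077, 0.165, 0.189, 0.183],
    "PBE0": [0.0939, 0.0863, 0.0359, 0.0459, 0.0253,
             0.0355, 0.0052, 0.0048, 0.0951, 0.1100],
    "M06-2X": [0.2942, 0.2889, 0.2238, 0.2157, 0.2610,
               0.2542, 0.1879, 0.1017, 0.0686, 0.0659],
}


def best_level_of_theory(mae_table, basis_sets):
    """Return (MAE, 'functional/basis set') for the smallest MAE."""
    candidates = [
        (mae, f"{functional}/{basis}")
        for functional, column in mae_table.items()
        for mae, basis in zip(column, basis_sets)
    ]
    return min(candidates)


best = best_level_of_theory(MAE_EV, BASIS_SETS)

if __name__ == "__main__":
    print(f"Lowest MAE: {best[0]} eV at {best[1]}")
```

Run on the transcribed values, the search returns PBE0/6-311+G(d,p) with an MAE of 0.0048 eV, in agreement with the text.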
The chosen level of theory gives good results for absorption across all molecules, including
the BF2 formazanate test set, with errors under 5%. The emission wavelengths and Stokes
shifts for the training molecules also agree with experiment to within 5%. For the BF2
formazanate complexes, however, the emission wavelengths and the Stokes shifts do not show
good agreement. Given the agreement for emission in the training set, it is likely that the
methodology does not scale with molecular size or π-conjugation. A possible reason for the high
error is that excited-state calculations which utilize the SCRF protocol are unable to model
bulk solution effects such as π-stacking. The emission calculations systematically
underestimate the emission wavelength, that is, they overestimate the emission energy. A
systematic error is expected to remain roughly constant throughout a training series and so
should not distort the trends captured by a regression model; therefore the PBE0/6-311+G(d,p)
level of theory will be utilized for modeling all spectroscopic properties in the QSPR study.
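The error pattern described above can be checked directly from the Table 4 entries. The following is a minimal sketch, assuming the Stokes shift is taken as the emission wavelength minus the absorption wavelength and that the tabulated errors are unsigned percent errors relative to experiment; the function name percent_error is introduced here for illustration only.

```python
# Calculated and experimental wavelengths (nm) for the BF2 complex with
# R1 = R2 = Ph, R3 = CN in DCM, copied from Table 4.
calc = {"absorption": 477.0, "emission": 550.0}
expt = {"absorption": 491.0, "emission": 584.0}


def percent_error(calculated: float, experimental: float) -> float:
    """Unsigned percent error of a calculated value relative to experiment."""
    return abs(calculated - experimental) / experimental * 100.0


# Assumption: Stokes shift = emission wavelength - absorption wavelength.
stokes_calc = calc["emission"] - calc["absorption"]
stokes_expt = expt["emission"] - expt["absorption"]

errors = {
    "absorption": percent_error(calc["absorption"], expt["absorption"]),
    "emission": percent_error(calc["emission"], expt["emission"]),
    "stokes_shift": percent_error(stokes_calc, stokes_expt),
}

for name, value in errors.items():
    print(f"{name}: {value:.2f}%")
```

Because the Stokes shift is a small difference between two large wavelengths, absorption and emission errors of a few percent are enough to produce a Stokes shift error above 20%.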
3.2 Effect of structural tuning on spectroscopic properties
Given the computational cost of simulating the spectroscopic properties and descriptors of
interest for 49 training molecules, not all of the data required for a thorough analysis could
be obtained. See Supplemental Information A.2 for more information on computation
time. However, enough data were acquired to carry out a partial QSPR study and analyze trends.
Table 5: Compilation of absorption, emission, and descriptor data acquired from training series.
All values are in nm (Abs = absorption, Emis = emission, SS = Stokes shift).

Series      Hydrogen Series            Phenyl Series              Varied Series              Equivalent Series
Position    R3 (a)                     R3                         R1, R2                     R1, R2, R3
Calc. (b)   Abs      Emis     SS       Abs      Emis     SS       Abs      Emis     SS       Abs      Emis     SS
H           392.3    493.2    101.0    331.1    543.9    212.8    392.3    493.2    101.0    392.3    493.2    101.0
Me          385.2    565.2    180.0    423.3    555.4    132.1    379.2    468.1    88.9     362.5    ND       ND
Cl          388.1    485.5    97.4     506.4    577.3    70.9     331.6    429.4    97.8     377.6    429.3    51.6
CO2H        393.8    485.4    91.6     468.1    539.9    71.8     492.7    634.8    142.1    497.2    624.5    127.3
OH          401.8    542.5    140.7    433.0    566.7    133.8    319.2    357.9    38.8     321.6    366.5    44.9
NMe2        727.8    ND       ND       932.0    2210.7   1278.7   372.3    ND       ND       439.7    ND       ND
NO2         385.3    463.3    78.0     481.7    552.3    70.6     423.5    551.0    127.5    418.9    ND       ND
CN          379.9    467.8    87.9     ND       ND       ND       454.0    609.7    155.8    435.4    ND       ND
CO          960.3    1522.2   561.8    ND       ND       ND       493.9    ND       ND       ND       ND       ND
Ph          492.1    1120.4   628.3    ND       ND       ND       ND       ND       ND       ND       ND       ND
Nh          734.4    2958.7   2224.3   ND       ND       ND       ND       ND       ND       ND       ND       ND

(a) If the substituent position is not specified, then Rn is Ph for the Phenyl Series and H for all others.
(b) ND = no data. No data were obtained for the naphthyl series due to the computation time required.
3.2.1 Steric and electronic influence on Stokes shift in the R3 position
Steric influence
The influence of a substituent's steric bulk is unclear; however, there seems to be a trend in which
bulkier substituents disproportionately increase the emission wavelength compared to
absorption. The emission for "naphthyl" in the hydrogen series is 2959 nm, over 4 times its
absorption, while "phenyl" in the same series has an emission of 1120 nm, about 2 times its
absorption. Substituents with much less steric bulk have much smaller absorption-to-emission
ratios, 1:1.25 and 1:1.47 for "chloride" and "methyl" respectively.
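The ratios quoted in this paragraph follow from the hydrogen series columns of Table 5. A short sketch of that arithmetic, using only the tabulated wavelengths (the dictionary layout is illustrative):

```python
# Absorption and emission wavelengths (nm) for R3 substituents in the
# hydrogen series, copied from Table 5.
hydrogen_series = {
    "Nh (naphthyl)": (734.4, 2958.7),
    "Ph (phenyl)": (492.1, 1120.4),
    "Cl": (388.1, 485.5),
    "Me": (385.2, 565.2),
}

# Emission-to-absorption wavelength ratio for each substituent.
ratios = {name: emis / absb for name, (absb, emis) in hydrogen_series.items()}

for name, ratio in ratios.items():
    print(f"{name}: emission/absorption = {ratio:.2f}")
```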
Electronic influence
Using the hydrogen series baseline (R1 = R2 = R3 = H), one can see that electron-donating groups
such as "methyl" in the hydrogen series red-shift the Stokes shift by 79 nm, while strongly
electron-withdrawing groups such as "NO2" blue-shift it by 23 nm. In the phenyl series
(R1 = R2 = Ph, R3 = H), the shifts relative to the series baseline are larger in magnitude:
80.7 nm for "methyl" and 142.2 nm for "NO2". One can conclude that the ability of
electron-donating and electron-withdrawing groups to shift the Stokes shift increases when
large π-conjugated substituents are in the R1 and R2 positions, in agreement with previous
research.11
3.2.2 Steric and electronic influence on Stokes shift in the R1 and R2 positions
Steric influence
The influence of a substituent's steric bulk is unclear from the data calculated for the R1 and R2
positions.
Electronic influence
Compared to the R3 position, the trends in electronic influence on absorption and emission are
reversed for substituents in the R1 and R2 positions. Relative to the hydrogen series baseline,
the electron-donating "methyl" substituents in the varied series produce a blue shift of 12 nm
in the Stokes shift, while electron-withdrawing groups like "NO2" in the varied series cause a
red shift of 27 nm.
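The reversal between the R3 position and the R1/R2 positions can be reproduced from the Stokes shifts in Table 5. The sketch below subtracts the hydrogen baseline (101.0 nm) from each tabulated Stokes shift; labeling a positive change as a red shift is the convention assumed here.

```python
# Stokes shifts (nm) copied from Table 5.
BASELINE = 101.0  # R1 = R2 = R3 = H

stokes = {
    ("hydrogen series, R3", "Me"): 180.0,
    ("hydrogen series, R3", "NO2"): 78.0,
    ("varied series, R1/R2", "Me"): 88.9,
    ("varied series, R1/R2", "NO2"): 127.5,
}

# Change in Stokes shift relative to the unsubstituted baseline.
changes = {key: round(ss - BASELINE, 1) for key, ss in stokes.items()}

for (series, group), delta in changes.items():
    direction = "red shift" if delta > 0 else "blue shift"
    print(f"{series}, {group}: {delta:+.1f} nm ({direction})")
```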
3.2.3 Multivariable regression of hydrogen training series
Series without sufficient emission data must be excluded from the QSPR analysis because the
chosen excited-state quantum-chemical descriptors are taken from the output of the emission
calculations. The hydrogen series is therefore the only series for which multivariable regression
analysis can be conducted.
Table 6: Tests to optimize multivariable regression model for absorption (nm) of hydrogen series
Hydrogen Series - Absorption Regression Model
Trial #   Descriptors    r2       Adj. r2   F-value   MAE (nm)   Standard Error (nm)
1         All            NA       NA        NA        NA         NA
2         Less MV        0.9998   -0.0016   NA        2.0        7.9
3         Less HL gap    0.9998   0.9984    2.93%     2.0        7.9
4         Less ELUMO     0.9995   0.9979    0.16%     2.8        9.1
5 (a)     Less E*LUMO    0.9983   0.9949    0.03%     6.0        14.1

(a) Supplemental Information A.3 contains the output of trial 5.
The goals for the optimized regression model were a low standard error, a low MAE, a high
adjusted r2, and an F-value (significance F) below 5%, to ensure a statistically sound model at
the 95% confidence level; these goals were realized in five trials. Descriptors were eliminated
for the following reasons: in trial 1, the MV had a regression coefficient of 0; in trial 2, the
HL gap caused overfitting, as indicated by the negative adjusted r2 value. Trials 3 through 5
removed the statistically insignificant LUMO and excited-state LUMO energy descriptors, which
carried p-values of 42.4% and 14.9% respectively.
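\[
\lambda_{\mathrm{abs}}\,(\mathrm{nm}) = 5071 - 6897\,E_{\mathrm{HOMO}} + 8142\,E_{\mathrm{HOMO}}^{*} - 1214\,\mu + 2619\,\mu^{*} + 4110\,\mathrm{RDM} - 9058\,\mathrm{RDM}^{*} \tag{22}
\]
Here an asterisk marks an excited-state descriptor, and RDM is the dipole-moment-derived
descriptor listed under that label in Supplemental Information A.3.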
The final regression model has an MAE of 6.0 nm and a standard error of 14.1 nm. The
large coefficients in the equation arise because the descriptor values are orders of magnitude
smaller than the wavelengths in nm.
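The adjusted r2 values in Table 6 can be approximately recovered from the reported r2 values. The sketch below uses the standard adjusted r2 formula and rests on two inferences that the text does not state outright: ten observations (as listed in Supplemental Information A.3), and ten starting descriptors with one dropped per trial (inferred from the six descriptors left in trial 5). Small differences from the table are expected because the tabulated r2 values are rounded.

```python
def adjusted_r2(r2: float, n_obs: int, n_desc: int) -> float:
    """Adjusted r2 for a linear model with an intercept and n_desc descriptors."""
    dof = n_obs - n_desc - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("no residual degrees of freedom; the fit is saturated")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


N_OBS = 10  # number of observations in the A.3 regression output

# (trial, descriptors remaining, r2 from Table 6, adjusted r2 from Table 6).
# Descriptor counts are inferred, not stated in the text.
trials = [
    (3, 8, 0.9998, 0.9984),
    (4, 7, 0.9995, 0.9979),
    (5, 6, 0.9983, 0.9949),
]

for trial, n_desc, r2, reported in trials:
    computed = adjusted_r2(r2, N_OBS, n_desc)
    print(f"trial {trial}: computed {computed:.4f}, reported {reported:.4f}")
```

Under the same assumptions, trial 2 would retain nine descriptors for ten observations, leaving no residual degrees of freedom. That is consistent with the overfitting described for that trial.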
Table 7: Statistical output of multivariable linear regression on emission (nm) for hydrogen
training series.
Hydrogen Series - Emission Regression Model
Trial #   Descriptors    r2       Adj. r2   F-value   MAE (nm)   Standard Error (nm)
1         All            NA       NA        NA        NA         NA
2         Less MV        0.9905   -0.0855   NA        58.3       234.7
3         Less HL gap    0.9905   0.9145    21.12%    58.3       234.7
4         Less ELUMO     0.9840   0.9281    5.48%     89.0       215.3
5         Less EHOMO     0.9833   0.9500    0.92%     85.7       179.4
6         Less E*HOMO    0.9826   0.9608    0.13%     82.5       158.8
7         Less E*LUMO    0.9767   0.9581    0.03%     72.8       164.3
The emission regression model took seven trials to optimize. Descriptors were eliminated for the
following reasons: trials 1 through 3 followed the same logic as the absorption model, and
trials 4 through 7 eliminated the LUMO energy, HOMO energy, excited-state HOMO energy, and
excited-state LUMO energy descriptors, which had p-values of 56.0%, 79.9%, 73.7% and
30.9% respectively. This suggests that these descriptors are unimportant in emission simulation.
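\[
\lambda_{\mathrm{em}}\,(\mathrm{nm}) = 22048 - 10187\,\mu + 17285\,\mu^{*} + 31918\,\mathrm{RDM} - 57244\,\mathrm{RDM}^{*} \tag{23}
\]
Here an asterisk marks an excited-state descriptor, and RDM is the dipole-moment-derived
descriptor listed under that label in Supplemental Information A.3.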
Examining the above equation might explain why the MAE and standard error are higher for this
model, at 72.8 nm and 164.3 nm respectively. The intercept is more than four times that of the
equation for absorption (22048 vs. 5071), so the coefficients of the independent variables must
be correspondingly larger in magnitude to compensate (-10187 vs. -1214 for μ). Subtle variations
in the descriptor values therefore produce larger errors in the predicted wavelength. It should
be noted that even though the adjusted r2 value is lower and the errors are much higher for this
model, it is still statistically significant, with an F-value of 0.03%. It is likely that other
descriptors, which were not tested, are needed to predict emission behavior; one would still
expect the trends reproduced by this model to be accurate.
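The sensitivity argument can be made concrete with the μ coefficients of the two regression equations. In a linear model, a change δ in one descriptor changes the prediction by the coefficient times δ. The perturbation size used below (0.01 descriptor units) is a hypothetical value chosen only for illustration.

```python
# Regression coefficients for the ground-state dipole moment descriptor,
# taken from Eq. 22 (absorption) and Eq. 23 (emission).
mu_coefficient = {"absorption": -1214.0, "emission": -10187.0}

# Hypothetical perturbation of the descriptor; not a value from the thesis.
delta_mu = 0.01

# Change in the predicted wavelength (nm) caused by the perturbation.
shift_nm = {model: abs(coef) * delta_mu for model, coef in mu_coefficient.items()}
ratio = shift_nm["emission"] / shift_nm["absorption"]

for model, shift in shift_nm.items():
    print(f"{model}: {shift:.2f} nm per {delta_mu} change in the descriptor")
print(f"emission/absorption sensitivity ratio: {ratio:.2f}")
```

For the same descriptor uncertainty, the emission prediction moves roughly eight times further than the absorption prediction.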
4 Conclusion
The proposed methodology for simulating the Stokes shift, and the subsequent QSPR study of
substituent effects in BF2 formazanate complexes, reproduces experimental data and provides a
foundation for future research into substituent analysis. It has been shown that the
PBE0/6-311+G(d,p) level of theory is optimal for simulating absorption, emission, and Stokes
shifts, with an MAE of 0.0048 eV, compared to the other functionals and basis sets tested across
60 unique trials. For the π-conjugated molecules in the training set, errors compared to
experiment in the absorption, emission and Stokes shift were under 0.5%, 0.4% and 1.4%
respectively. The BF2 formazanate complexes tested at the same level of theory had errors under
3% for absorption and 7% for emission. The increased error in emission energy is likely due to
intermolecular effects not modeled in the system. Over 45 training molecules were structurally
tuned and analyzed based on their electronic and steric features. Electronic trends observed in
previous research were reproduced.11 Specifically, electron-donating groups in the R3 position
caused a red shift of the Stokes shift, while electron-withdrawing groups caused a blue shift.
The QSPR training model further suggests that this trend is reversed when substituents are
placed in the R1 and R2 positions, with electron-donating groups causing a blue shift and
electron-withdrawing groups a red shift. When bulky substituents were added in the R3 position
of the BF2 formazanate skeletal structure, the ratio of emission to absorption wavelength
increased, reaching about 4:1 for "naphthyl" versus about 1.25:1 for "chloride". Finally, a
multivariable regression analysis found that the absorption wavelength could be described by the
dipole moment, excited-state dipole moment, dipole moment derivatives, and the HOMO and
excited-state HOMO energies. The hydrogen series regression equation has a 14.1 nm standard
error, a 6.0 nm mean absolute error, an adjusted r2 of 0.9949, and an F-value under 0.04%; all
descriptors had a p-value under 1%, so the model is statistically significant at the 99%
confidence level.
4.1 Further Research
Further research could extend the analysis to all the training series using the methodology
outlined herein. The results of the regression analysis from this and future research could be
applied to a test set to determine whether the regression equations can reproduce experimental
spectroscopic properties. Various other descriptors could be examined to identify their
relationships to properties such as emission wavelengths. Databases of various molecules could
be created, analyzed, and used for optimizing substituents for strategic synthesis in functional
materials.
References
1. J. Roncali, P. Leriche, and P. Blanchard, Adv. Mater. 26, 3821 (2014).
2. D. Frath, J. Massue, G. Ulrich, and R. Ziessel, Angew. Chem. Int. Ed. Engl. 53, 2290 (2014).
3. W. Wu, Y. Liu, and D. Zhu, Chem. Soc. Rev. 39, 1489 (2010).
4. M. Montalti, A. Credi, L. Prodi, and M.T. Gandolfi, Handbook of Photochemistry (2006).
5. L. Quan, Y. Chen, X.-J. Lv, and W.-F. Fu, Chem. Eur. J. 18, 14599 (2012).
6. J.F. Araneda, W.E. Piers, B. Heyne, M. Parvez, and R. McDonald, Angew. Chem. Int. Ed. Engl. 50, 12214 (2011).
7. S.M. Barbon, V.N. Staroverov, P.D. Boyle, and J.B. Gilroy, Dalton Trans. 43, 240 (2014).
8. M. Szymczyk, A. El-Shafei, and H.S. Freeman, Dye. Pigment. 72, 8 (2007).
9. W.M. Frederiks, J. van Marle, C. van Oven, B. Comin-Anduix, and M. Cascante, J. Histochem. Cytochem. 54, 47 (2006).
10. M. Hesari, S.M. Barbon, V.N. Staroverov, Z. Ding, and J.B. Gilroy, Chem. Commun. 51, 3766 (2015).
11. S.M. Barbon, P.A. Reinkeluers, J.T. Price, V.N. Staroverov, and J.B. Gilroy, Chem. Eur. J. 20, 11340 (2014).
12. S.M. Barbon, V.N. Staroverov, and J.B. Gilroy, J. Org. Chem. 80, 5226 (2015).
13. M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.A. Cheeseman, G. Scalmani, V. Barone, B. Mennuci, G.A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H.P. Hratchian, A.F. Izmaylov, J. Bloino, G. Zheng, J.L. Sonnenberg, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J.A. Montgomery, J.E. Peralta, F. Ogliaro, M.J. Bearpark, J.J. Heyd, E. Brothers, K.N. Kudin, V.N. Staroverov, T. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J.C. Burant, S.S. Iyengar, J. Tomasi, M. Cossi, N. Rega, J.M. Millam, M. Klene, J.E. Knox, J.B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, R.L. Martin, K. Morokuma, V.G. Zakrezewski, G.A. Voth, P. Salvador, J.J. Dannenberg, S. Dapprich, P. V. Parandekar, N.J. Mayhall, A.D. Daniels, O. Farkas, J.B. Foresman, J. V. Ortiz, J. Cioslowski, and D.J. Fox, Gaussian Dev. Version, Revis. H. 32 Gaussian (2010).
14. T. Mineva, N. Russo, and M. Toscano, Int. J. Quantum Chem. 56, 663 (1995).
15. E.K.U. Gross and W. Kohn, Adv. Quantum Chem (1990).
16. P. Hohenberg, Phys. Rev. 136, B864 (1964).
17. W. Kohn and L.J. Sham, Phys. Rev. 140, A1133 (1965).
18. M. Bourass, A. Touimi Benjelloun, M. Hamidi, M. Benzakour, M. Mcharfi, M. Sfaira, F. Serein-Spirau, J.-P. Lère-Porte, J.-M. Sotiropoulos, S.M. Bouzzine, and M. Bouachrine, J. Saudi Chem. Soc. (2013).
19. F. Cervantes-Navarro and D. Glossman-Mitnik, Chem. Cent. J. 6, 70 (2012).
20. G.-F. Yang and X. Huang, Curr. Pharm. Des. 12, 4601 (2006).
21. J. Verma, V.M. Khedkar, and E.C. Coutinho, Curr. Top. Med. Chem. 10, 95 (2010).
22. M. Karelson, V.S. Lobanov, and A.R. Katritzky, Chem. Rev. 96, 1027 (1996).
23. T.W. Schultz, M.T.D. Cronin, J.D. Walker, and A.O. Aptula, J. Mol. Struct. THEOCHEM 622, 1 (2003).
24. A.E. Soffers, M.G. Boersma, W.H. Vaes, J. Vervoort, B. Tyrakowska, J.L. Hermens, and I.M. Rietjens, Toxicol. In Vitro 15, 539.
25. M.M.C. Ferreira, J. Braz. Chem. Soc. 13, 742 (2002).
26. R. Kiralj and M.M.C. Ferreira, J. Braz. Chem. Soc. 20, 770 (2009).
27. M.O. Taha, A.M. Qandil, D.D. Zaki, and M.A. AlDamen, Eur. J. Med. Chem. 40, 701 (2005).
28. S. Miertuš, E. Scrocco, and J. Tomasi, Chem. Phys. 55, 117 (1981).
29. E. Runge and E.K.U. Gross, Phys. Rev. Lett. 52, 997 (1984).
30. 1 (2015).
31. Y. Zhao and D.G. Truhlar, J. Phys. Chem. A 110, 13126 (2006).
32. J.P. Perdew, M. Ernzerhof, and K. Burke, J. Chem. Phys. 105, 9982 (1996).
33. K. Kim and K.D. Jordan, J. Phys. Chem. 98, 10089 (1994).
34. A.D. Becke, Phys. Rev. A 38, 3098 (1988).
35. S.H. Vosko, L. Wilk, and M. Nusair, Can. J. Phys. 58, 1200 (1980).
36. Y. Zhao and D.G. Truhlar, Theor. Chem. Acc. 120, 215 (2007).
37. R. Ditchfield, J. Chem. Phys. 54, 724 (1971).
38. J.A. Montgomery, M.J. Frisch, J.W. Ochterski, and G.A. Petersson, J. Chem. Phys. 110, 2822 (1999).
39. A.R. Katritzky, V.S. Lobanov, and M. Karelson, Chem. Soc. Rev. 24, 279 (1995).
40. L. Buydens, D.L. Massart, and P. Geerlings, Anal. Chem. 55, 738 (1983).
41. D.F. V. Lewis, C. Ioannides, and D. V. Parke, Xenobiotica 24, 401 (2008).
42. J.O. Rawlings, S.G. Pantula, and D.A. Dickey, editors, Applied Regression Analysis (Springer-Verlag, New York, 1998).
A Supplemental Information
A.1 Sample output for simulation calculations using acetaldehyde
Ground-state geometry optimization
SCF Done: E(RB3LYP) = -153.851761719 A.U. after 1 cycles
Non-equilibrium solvation
No output for interpretation
Absorption calculation
After PCM corrections, the energy is -153.687679826 a.u.
Single-point TD-DFT calculation
No output for interpretation
Excited-state geometry optimization
Total Energy, E(TD-HF/TD-KS) = -153.705918726
Emission calculation
After PCM corrections, the energy is -153.707148980 a.u.
SCF Done: E(RB3LYP) = -153.822024722 A.U. after 10 cycles
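The energies in this sample output can be turned into absorption and emission wavelengths. The sketch below assumes a particular pairing of the four energies: the PCM-corrected excited-state energy from the absorption step against the optimized ground-state energy, and the PCM-corrected excited-state energy from the emission step against the ground-state SCF energy printed in the same step. That pairing is an interpretation of the output and is not stated in this appendix. The conversion constants are standard values, and the function name wavelength_nm is illustrative.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree in eV
EV_NM = 1239.84193         # h*c in eV*nm

# Energies (hartree) copied from the acetaldehyde sample output.
e_ground_opt = -153.851761719         # ground-state geometry optimization
e_excited_vertical = -153.687679826   # absorption step, after PCM corrections
e_excited_opt = -153.707148980        # emission step, after PCM corrections
e_ground_at_excited = -153.822024722  # emission step, ground-state SCF energy


def wavelength_nm(e_upper: float, e_lower: float) -> float:
    """Wavelength (nm) of a transition between two energies given in hartree."""
    gap_ev = (e_upper - e_lower) * HARTREE_TO_EV
    return EV_NM / gap_ev


absorption_nm = wavelength_nm(e_excited_vertical, e_ground_opt)
emission_nm = wavelength_nm(e_excited_opt, e_ground_at_excited)
stokes_shift_nm = emission_nm - absorption_nm

print(f"absorption: {absorption_nm:.1f} nm")
print(f"emission: {emission_nm:.1f} nm")
print(f"Stokes shift: {stokes_shift_nm:.1f} nm")
```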
A.2 Total computation time utilized on SHARCNET for calculations
A.3 Sample output of regression trial
Final regression output for optimized hydrogen series absorption
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.999150538
R Square             0.998301798
Adjusted R Square    0.994905395
Standard Error       14.1149689
Observations         10

ANOVA
              df   SS            MS         F          Significance F
Regression    6    351361.0959   58560.18   293.9291   0.000305546
Residual      3    597.6970412   199.2323
Total         9    351958.7929

             Coefficients    Standard Error   t Stat     P-value   Lower 95%       Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    5071.433211     310.4157034      16.33755   0.0%      4083.551902     6059.315    4083.552      6059.314519
HOMO         -6896.995793    1150.644052      -5.99403   0.9%      -10558.85871    -3235.13    -10558.9      -3235.13288
DM           -1214.378491    117.1538478      -10.3657   0.2%      -1587.214321    -841.543    -1587.21      -841.5426608
RDM          4109.608812     402.0016826      10.22286   0.2%      2830.260042     5388.958    2830.26       5388.957582
ExHOMO       8141.875651     1263.507913      6.443866   0.8%      4120.829561     12162.92    4120.83       12162.92174
ExDM         2619.329506     213.6784301      12.25828   0.1%      1939.309376     3299.35     1939.309      3299.349637
ExRDM        -9058.120425    746.8661692      -12.1282   0.1%      -11434.98191    -6681.26    -11435        -6681.258944

RESIDUAL OUTPUT
Observation   Predicted Abs   Residuals       Abs. Error
1             377.5660811     14.72161536     14.72162
2             393.3729351     -8.212519227    8.212519
3             389.9451678     -1.893151418    1.893151
4             396.5349594     -2.730898376    2.730898
5             388.9116662     12.85973835     12.85974
6             389.2860815     -3.975836674    3.975837
7             382.7882077     -2.88556638     2.885566
8             957.9521865     2.373056185     2.373056
9             502.4186819     -10.36025899    10.36026
10            734.260612      0.103821168     0.103821
MAE                                           6.011646
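The MAE and standard error reported for this trial can be recomputed from the residuals listed above. A minimal sketch, assuming the standard error is the square root of the residual sum of squares divided by the residual degrees of freedom (observations minus descriptors minus one for the intercept):

```python
import math

# Residuals (nm) copied from the RESIDUAL OUTPUT table.
residuals = [
    14.72161536, -8.212519227, -1.893151418, -2.730898376, 12.85973835,
    -3.975836674, -2.88556638, 2.373056185, -10.36025899, 0.103821168,
]
N_DESCRIPTORS = 6  # regression degrees of freedom in the ANOVA table

# Mean absolute error of the fitted absorption wavelengths.
mae = sum(abs(r) for r in residuals) / len(residuals)

# Residual sum of squares and the standard error of the regression.
ss_residual = sum(r * r for r in residuals)
dof = len(residuals) - N_DESCRIPTORS - 1
standard_error = math.sqrt(ss_residual / dof)

print(f"MAE = {mae:.3f} nm")
print(f"residual SS = {ss_residual:.3f}")
print(f"standard error = {standard_error:.3f} nm")
```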