Quantitative Structure Activity Relationship.pptx

Quantitative Structure Activity
Relationship (QSAR): Statistical
method and Product concept
Presented by:
Radha Sureshrao Chafle
F. Y. M. Pharm. (Semester II)
(Pharmacology)
Guided by:
Dr. Mrs. Vandana S. Nikam
HOD, Associate Professor in Pharmacology

What is QSAR?
 QSAR is a mathematical relationship
that describes the structural dependence
of biological activities, either by
physicochemical parameters, by indicator
variables encoding different structural
features, or by three-dimensional
molecular property profiles of the
compounds.

Contd.
Drugs that exert their biological effects by interaction
with a specific target must have:
 a three-dimensional structure which, in the arrangement
of its functional groups and in its surface properties, is
more or less complementary to a binding site.
 The better the steric fit and the complementarity of the
surface properties of a drug to its binding site, the
higher its affinity will be and the higher its
biological activity may be.

Classical QSAR analyses
 consider only 2D structures
 main field of application is in substituent
variation of a common scaffold.
3D QSAR
• has a much broader scope.
• starts from 3D structures and
correlates biological activities with
3D-property fields.

History & Development of QSAR
 1868, Crum-Brown and Fraser: Published an equation Φ =
f(C) i.e. assumption of physiological activity Φ as function of
the chemical structure C.
 1900, H. H. Meyer and C. E. Overton: lipoid theory of
narcosis
 1930s, L. Hammett: electronic sigma constants
 1964, C. Hansch and T. Fujita: QSAR
 1984, P. Andrews: affinity contributions of functional groups
 1985, P. Goodford: GRID (hot spots at protein surface)
 1988, R. Cramer: 3D QSAR
 1992, H.-J. Bohm: LUDI interaction sites, docking, scoring
 1997, C. Lipinski: bioavailability rule of five
 1998, Ajay, W. P. Walters and M. A. Murcko; J. Sadowski
and H. Kubinyi: drug-like character

Basic Requirements in QSAR
Studies
 all analogs belong to a congeneric series
 all analogs exert the same mechanism of
action
 all analogs bind in a comparable manner
 the effects of isosteric replacement can be
predicted
 binding affinity is correlated to interaction
energies
 biological activities are correlated to binding
affinity

Molecular Properties and Their
Parameters
Molecular Property | Corresponding Interaction | Parameters
Lipophilicity | hydrophobic interactions | log P, π, f, RM, χ
Polarizability | van der Waals interactions | MR, parachor, MV
Electron density | ionic bonds, dipole-dipole interactions, hydrogen bonds, charge transfer interactions | σ, R, F, κ, quantum chemical indices
Topology | steric hindrance, geometric fit | Es, rv, L, B, distances, volumes

QSAR models
 Hansch model (property-property relationship):
 Definition of the lipophilicity parameter π
π_X = log P_RX - log P_RH
where P_RX is the n-octanol/water partition coefficient of the
substituted compound R-X and P_RH that of the parent compound R-H.
 Linear Hansch model
log 1/C = a log P + b σ + c MR + ... + k
 Nonlinear Hansch models
log 1/C = a (log P)^2 + b log P + c σ + ... + k
log 1/C = a π^2 + b π + c σ + ... + k
log 1/C = a log P - b log (βP + 1) + c σ + ... + k
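A minimal sketch of the π definition and the parabolic Hansch model. The log P values used for π (2.84 for chlorobenzene, 2.13 for benzene) are commonly quoted literature values and are an assumption here, not taken from the slides; the coefficients a, b and k are hypothetical, and the electronic term is left out to keep the example short.

```python
def pi_constant(log_p_rx, log_p_rh):
    """Substituent constant: pi_X = log P_RX - log P_RH."""
    return log_p_rx - log_p_rh

def parabolic_hansch(log_p, a, b, k):
    """log 1/C = a (log P)^2 + b log P + k (sigma term omitted)."""
    return a * log_p ** 2 + b * log_p + k

def optimum_log_p(a, b):
    """Vertex of the parabola; it is a maximum only when a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for an activity maximum")
    return -b / (2 * a)

# Assumed literature log P values: chlorobenzene vs. benzene.
pi_cl = pi_constant(2.84, 2.13)

# Hypothetical regression coefficients, for illustration only.
a, b, k = -0.25, 1.0, 2.0
log_p_opt = optimum_log_p(a, b)
best_activity = parabolic_hansch(log_p_opt, a, b, k)
print(round(pi_cl, 2), log_p_opt, best_activity)
```

The optimum follows from setting the derivative 2a log P + b to zero, which is why a parabolic model implies an ideal lipophilicity for the series.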

Contd.
 Free-Wilson model (structure-property relationship)
log 1/C = Σ a_i + µ
a_i = substituent group contributions
µ = activity contribution of reference compound
 Mixed Hansch/Free-Wilson model
log 1/C = a (log P)^2 + b log P + c σ + ... + Σ a_i + k
log 1/C = a log P - b log (βP + 1) + c σ + ... + Σ a_i + k
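The additive Free-Wilson form can be sketched as a lookup and a sum. All group contributions and the reference value µ below are hypothetical numbers chosen for illustration; in practice they come out of a regression on indicator variables.

```python
def free_wilson(substituents, contributions, mu):
    """log 1/C = sum of group contributions a_i + mu."""
    return mu + sum(contributions[s] for s in substituents)

# Hypothetical contributions keyed by (position, substituent).
contributions = {
    ("R1", "Cl"): 0.30,
    ("R1", "CH3"): -0.10,
    ("R2", "OH"): 0.45,
}
mu = 5.00  # hypothetical activity of the reference compound

predicted = free_wilson([("R1", "Cl"), ("R2", "OH")], contributions, mu)
print(round(predicted, 2))
```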

Lipophilicity
Hydrophobic interaction between a drug
and a binding site at a receptor
Definition of Partition Coefficients
P = c_org / c_aq (n-octanol/water system)

n-Octanol/Water as a Standard
System
 membrane analogous structure
 hydrogen bond donor and acceptor
 practically insoluble in water
 no desolvation on transfer into organic
phase
 very low vapor pressure
 transparent in the UV region
 large data base of log P values

 Additivity Principle of π Values (C. Hansch, 1964)
π_X = log P_RX - log P_RH
The lipophilicity parameter π is an additive,
constitutive molecular parameter;
compare the Hammett equation:
ρσ_X = log K_RX - log K_RH

 Non-additivity of π Values
- intramolecular hydrogen bonds
- ortho effects (phenols)
- polysubstituted aromatic compounds
- conjugation (push-pull effect)
- heterocyclic compounds
- cyclophanes

Hydrophobic Fragmental Constants f
 The hydrophobic fragmental constant f of a substituent
or molecular fragment represents the lipophilicity
contribution of that molecular fragment.
(R. Rekker, The Hydrophobic Fragmental Constant,
Elsevier, Amsterdam 1977; R. Rekker, Eur. J. Med. Chem.
14, 479 (1979))
log P = Σ a_i f_i (R. Rekker, 1973)
where a_i is the number of occurrences of fragment i and
f_i is its fragmental constant.
 Experimental Determination of Log P Values
- Shake flask method
- Reversed phase thin layer chromatography
- High performance liquid chromatography (HPLC)

Polarizability Parameters
 Molar volume, Molar Refractivity, Parachor
MV = MW / d
MR = [(n^2 - 1) / (n^2 + 2)] × (MW / d)
PA = γ^(1/4) × (MW / d)
MW = molecular weight; d = density; n = refractive index;
γ = surface tension
(MR is most often scaled by a factor of 0.1)
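The three polarizability parameters can be evaluated directly from bulk properties. The sketch below uses approximate handbook values for benzene at about 20 °C (MW 78.11 g/mol, d 0.8765 g/cm^3, n 1.5011, γ 28.88 dyn/cm); these inputs are assumptions for illustration and do not come from the slides.

```python
def molar_volume(mw, d):
    """MV = MW / d, in cm^3/mol when d is in g/cm^3."""
    return mw / d

def molar_refractivity(mw, d, n):
    """Lorentz-Lorenz form: MR = (n^2 - 1) / (n^2 + 2) * MW / d."""
    return (n ** 2 - 1) / (n ** 2 + 2) * mw / d

def parachor(mw, d, gamma):
    """PA = gamma^(1/4) * MW / d, gamma being the surface tension."""
    return gamma ** 0.25 * mw / d

# Approximate values assumed for benzene.
mw, d, n, gamma = 78.11, 0.8765, 1.5011, 28.88

mv = molar_volume(mw, d)
mr = molar_refractivity(mw, d, n)
pa = parachor(mw, d, gamma)
print(round(mv, 1), round(mr, 1), round(pa, 1))
print("MR scaled by 0.1:", round(0.1 * mr, 2))
```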

Electronic Parameters
 Hammett Equation
ρσ = log K_RX - log K_RH
 Calculation of pKa Values
pKa(R-X) = pKa(R-H) - ρσ
pKa value of 3,5-dinitro-4-methylbenzoic acid
(pKa of benzoic acid = 4.20)
experimental value = 2.97
calculated value = 4.20 - (0.71 - 0.17 + 0.71) = 2.95
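The worked pKa estimate above can be reproduced in a few lines. The sketch assumes ρ = 1, which is the reference value for the ionization of benzoic acids in water and is what the slide's arithmetic implies; the σ values are the three used on the slide.

```python
def hammett_pka(pka_parent, sigmas, rho=1.0):
    """pKa(R-X) = pKa(R-H) - rho * sum(sigma), assuming additive sigma values."""
    return pka_parent - rho * sum(sigmas)

# Two meta-NO2 groups (0.71 each) and one para-CH3 group (-0.17).
sigmas = [0.71, -0.17, 0.71]
pka_calc = hammett_pka(4.20, sigmas)

print(round(pka_calc, 2))          # calculated value
print(round(pka_calc - 2.97, 2))   # deviation from the experimental 2.97
```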

Quantum Mechanical Descriptors
 Atom partial charges:
Mulliken population analysis (orbital population)
ESP charges (mapping EP to atom locations)
 Dipole moment:
strength and orientation behavior of a molecule in an
electrostatic field
 HOMO / LUMO (“frontier orbital theory“):
HOMO = energy of highest occupied molecular orbital,
“nucleophilicity’’
LUMO = energy of lowest unoccupied molecular orbital,
“electrophilicity”
 Superdelocalizability:
estimate for the reactivity of positions in aromatic hydrocarbons

3D QSAR
 3D QSAR is an extension of classical QSAR which
exploits the three-dimensional properties of the ligands
to predict their biological activity using robust
statistical analysis such as PLS, G/PLS, ANN etc.
 3D QSAR uses probe-based sampling within a
molecular lattice to determine three-dimensional
properties of molecules and then correlates these
3D descriptors with biological activity.
 Some of the major factors which contribute to the
overall free energy of binding, such as desolvation
energetics, temperature, diffusion, transport, pH and
salt concentration, are difficult to handle and are
thus usually ignored.

Classification of 3D QSAR
On the basis of intermolecular bonding:
- Ligand-based 3D QSAR, e.g. CoMFA, CoMSIA, COMPASS, CoMMA, SoMFA
- Receptor-based 3D QSAR, e.g. COMBINE, AFMoC, HIFA, CoRIA
On the basis of alignment criterion:
- Alignment-dependent 3D QSAR, e.g. CoMFA, CoMSIA, GERM, COMBINE, AFMoC, HIFA, CoRIA
- Alignment-independent 3D QSAR, e.g. COMPASS, CoMMA, HQSAR, WHIM, EVA/CoSA, GRIND
On the basis of chemometric techniques used:
- Linear 3D QSAR, e.g. CoMFA, CoMSIA, AFMoC, GERM, CoMMA, SoMFA

Comparative Molecular Field
Analysis (CoMFA)
 Cramer developed the predecessor of 3D approaches,
the Dynamic Lattice Oriented Molecular Modeling System
(DYLOMMS), in 1987. It uses PCA to extract vectors
from the molecular interaction fields, which are then
correlated with biological activities.
 CoMFA, a powerful 3D QSAR methodology, is a
combination of GRID and PLS.

Protocol for CoMFA
 Determination of the bioactive conformations of the
molecules.
 Superimposition or alignment of the molecules
using either manual or automated methods, in a
manner defined by the supposed mode of
interaction with the receptor.
 The steric and electrostatic fields are calculated
around the molecules with different probe groups
positioned at all intersections of the lattice.
 The overlaid molecules are placed in the center of
a lattice grid with a spacing of 2 Å.
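The field calculation in the protocol above can be sketched as follows. This is an illustration under stated assumptions, not the CoMFA implementation: a single Lennard-Jones parameter set (eps, r_min) is used for every atom, the probe carries a +1 charge, energies are truncated at 30 kcal/mol (a commonly used cut-off), and the atom list is hypothetical.

```python
import math
from itertools import product

COULOMB_K = 332.06  # kcal*Å/(mol*e^2), converts q1*q2/r to kcal/mol

def grid_points(lo, hi, spacing=2.0):
    """Lattice points of a cubic box from lo to hi (Å), inclusive."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    return list(product(axis, repeat=3))

def probe_fields(point, atoms, probe_charge=1.0, eps=0.1, r_min=3.4, cutoff=30.0):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) energy at one grid point.

    atoms: iterable of (x, y, z, partial_charge).
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, q in atoms:
        r = max(math.dist(point, (x, y, z)), 1e-6)  # avoid division by zero
        ratio = r_min / r
        steric += eps * (ratio ** 12 - 2 * ratio ** 6)
        electrostatic += COULOMB_K * probe_charge * q / r
    # Truncate the steep parts of both potentials at the cut-off.
    steric = min(steric, cutoff)
    electrostatic = max(-cutoff, min(electrostatic, cutoff))
    return steric, electrostatic

atoms = [(0.0, 0.0, 0.0, -0.5)]          # hypothetical one-atom "molecule"
lattice = grid_points(-2.0, 2.0)          # 2 Å spacing as in the protocol
fields = [probe_fields(p, atoms) for p in lattice]
print(len(lattice), fields[0])
```

Each molecule then contributes one row of the descriptor table, with two columns (steric, electrostatic) per lattice point.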

Contd.
 The PLS technique is used to correlate the
interaction energy or field values with the
biological activity, by which the quantitative
influence of specific chemical features of
molecules on their biological activity can be
identified and extracted.
 The results are expressed as correlation
equations with a number of latent variable
terms, each of which is a linear combination
of the original independent lattice descriptors.

Steps in CoMFA:
 A set of molecules is first selected.
 All molecules have to interact with the same kind
of receptor (or enzyme, ion channel, transporter)
in the same manner, i.e., with identical binding
sites in the same relative geometry.
 A subgroup of molecules is selected as the
training set from which the CoMFA model is
derived.
 The remaining molecules form a test set, which
independently checks the validity of the derived
model(s).

 Atomic partial charges are calculated and
(several) low energy conformations are
generated. A pharmacophore hypothesis is
derived to orient the superposition of all
individual molecules and to afford a rational
and consistent alignment.
 A sufficiently large box is positioned around
the molecules and a grid distance is defined.
 The field values calculated at the grid points
are correlated with the biological activities;
PLS analysis is the most appropriate method
for this purpose. Normally, cross-validation is
used to check the internal predictivity of the
derived model.
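Cross-validation can be illustrated with a leave-one-out loop that reports q^2 = 1 - PRESS / SS, where PRESS is the sum of squared prediction errors for the left-out compounds and SS is the total sum of squares. For brevity the sketch refits a one-descriptor least-squares line in place of a PLS model, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2  # error on the left-out point
    my = sum(ys) / len(ys)
    ss = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss

xs = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical descriptor values
q2_exact = loo_q2(xs, [2.0, 4.0, 6.0, 8.0, 10.0])
q2_noisy = loo_q2(xs, [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(q2_exact, 3), round(q2_noisy, 3))
```

Unlike the fitted r^2, q^2 can become negative when the model predicts left-out compounds worse than the mean activity does.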

Pharmacophore hypotheses and
alignment:

Drawbacks of CoMFA:
 Too many adjustable parameters
 Uncertainty in selection of compounds and
variables.
 Fragmented contour maps with variable
selection procedures.
 Hydrophobicity not well quantified
 Cut-off limits used.
 Low signal to noise ratio due to many useless
field variables.
 Imperfections in potential energy functions.
 Applicable only to in vitro data.

Comparative Molecular Similarity
Indices Analysis (CoMSIA)
 Molecular similarity indices, calculated
from modified SEAL similarity fields, are
employed as descriptors to simultaneously
consider steric, electrostatic, hydrophobic
and hydrogen bonding properties.
 These indices are estimated indirectly by
comparing the similarity of each molecule in
the dataset with a common probe atom
(having a radius of 1 Å, charge of +1 and
hydrophobicity of +1) positioned at the
intersections of a surrounding grid/lattice.

 For computing similarity at all grid points, the
mutual distances between the probe atom and the
atoms of the molecules in the aligned dataset are
also taken into account.
 To describe this distance dependence and
calculate the molecular properties, Gaussian type
functions are employed.
 Since the underlying Gaussian type functional
forms are 'smooth' with no singularities, their
slopes are not as steep as those of the Coulombic and
Lennard-Jones potentials in CoMFA; therefore no
arbitrary cut-off limits need to be defined.
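A sketch of the Gaussian distance dependence described above, using the commonly quoted form A = -Σ w_probe · w_i · exp(-α r_i^2). The attenuation factor α = 0.3 and the atom property values are assumptions for illustration and are not given on the slides.

```python
import math

def similarity_index(point, atoms, probe_value=1.0, alpha=0.3):
    """Gaussian-weighted similarity index at one grid point.

    atoms: iterable of ((x, y, z), property_value) for one field type,
    e.g. partial charges for the electrostatic field.
    """
    total = 0.0
    for position, w in atoms:
        r2 = sum((p - a) ** 2 for p, a in zip(point, position))
        total += probe_value * w * math.exp(-alpha * r2)
    return -total

atoms = [((0.0, 0.0, 0.0), 0.5)]  # hypothetical single atom

on_atom = similarity_index((0.0, 0.0, 0.0), atoms)   # finite even at r = 0
near = similarity_index((1.0, 0.0, 0.0), atoms)
far = similarity_index((10.0, 0.0, 0.0), atoms)
print(on_atom, round(near, 4), far)
```

Because the index stays finite at zero distance, grid points inside the molecules can be kept and no cut-off is needed.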

Comparison between CoMFA and
CoMSIA
Feature | CoMFA | CoMSIA
Function type | Lennard-Jones potential, Coulomb potential | Gaussian
Descriptors | Interaction energies | Similarity indices
Cut-off | Required | Not required
Field | Steric, electrostatic | Steric, electrostatic, hydrophobic, hydrogen bond donor and hydrogen bond acceptor
Contour map | Often not contiguous | Contiguous
Model reproducibility | Poor | Good
*CoMSIA is provided by TRIPOS Inc. in the Sybyl software, along
with CoMFA.

Statistical methods used in QSAR
Linear Regression Analysis (LRA) | Multivariate Data Analysis | Pattern Recognition
Simple linear regression | Principal component analysis (PCA) | Cluster analysis
Multiple linear regression (MLR) | Principal component regression (PCR) | Artificial neural networks (ANNs)
Stepwise multiple linear regression | Partial least square analysis (PLS) | k-nearest neighbor (kNN)
- | Genetic function approximation (GFA) | -
- | Genetic partial least squares (G/PLS) | -

Linear Regression Analysis (LRA)
 Linear regression analyses are easily
interpretable methods for QSAR analysis.
 These techniques construct a statistical
model to represent the correlation of one
or more independent (explanatory) variables (x)
with a dependent variable (y).
 The model can be utilized to predict y
from the knowledge of the x variables, which
may be either quantitative or qualitative.

a. Simple Linear regression
method
 Standard linear regression calculation to
generate a set of QSAR equations that
include a single independent descriptor x
and dependent variable y.
 A one term linear equation is produced
separately for each independent variable
from the descriptor set.
𝑦 = 𝑎 + 𝑏𝑥
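A minimal sketch of this procedure: one least-squares line y = a + bx is fitted separately for each descriptor, together with its r^2. The descriptor names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Return intercept a, slope b and r^2 for y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

# Hypothetical descriptor table and activities (log 1/C).
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "MR": [1.0, 3.0, 2.0, 4.0],
}
activity = [3.1, 3.9, 5.1, 5.9]

# One single-term equation per descriptor.
equations = {name: fit_line(values, activity) for name, values in descriptors.items()}
for name, (a, b, r2) in equations.items():
    print(f"y = {a:.2f} + {b:.2f} * {name}   (r^2 = {r2:.3f})")
```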

b. Multiple Linear regression (MLR)
 Also referred to as the linear free energy
relationship (LFER) method.
 Generates QSAR equations by performing
standard multivariable regression
calculations to identify the dependence of
a drug property on any or all of the
descriptors under investigation.
 Involves more than one independent variable.
y = b0 + b1 x1 + b2 x2 + ... + bm xm + e
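The coefficients b0 ... bm can be obtained from the normal equations (X^T X) b = X^T y. The sketch below solves them with plain Gaussian elimination to stay dependency-free; the data are hypothetical and were constructed so that y = 1 + 2 x1 - 0.5 x2 holds exactly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(X, y):
    """Return [b0, b1, ..., bm] minimizing the squared error."""
    Xd = [[1.0] + list(row) for row in X]  # leading 1 for the intercept
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]   # hypothetical descriptors
y = [2.0, 4.5, 5.0, 7.5, 8.0]                   # y = 1 + 2*x1 - 0.5*x2
coefficients = fit_mlr(X, y)
print([round(c, 3) for c in coefficients])
```

The normal equations become ill-conditioned when descriptors are strongly intercorrelated, which is one motivation for the latent-variable methods that follow.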

c. Stepwise multiple linear
regression
 Commonly used variant of MLR.
 Creates a multiple-term linear equation, but not
all the independent variables are used.
 Each independent variable is sequentially
added to the equation and a new regression is
performed every time.
 The new term is retained only if the model
passes a test for significance.
 Is useful when the number of descriptors is
large and the key descriptors are unknown.

Multivariate Data Analysis
 It replaced LRA.
 It explains an extended set of
variables by means of a reduced number
of new latent variables possessing the
maximum amount of information relevant
to the problem.

a. Principal component analysis
(PCA)
 Data reduction technique that does not
itself generate a QSAR model.
 Creates a new set of orthogonal descriptors,
i.e. principal components, which describe
most of the information contained in the
independent variables.
 Reduces dimensionality of a multivariate
data set of descriptors to the actual
amount of data available.
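A small sketch of how the first principal component can be extracted: the descriptors are mean-centered, the covariance matrix is formed, and power iteration finds its dominant eigenvector. Power iteration is chosen here only to avoid external libraries, and the two-descriptor data set is hypothetical (the second column is exactly twice the first).

```python
import math

def first_principal_component(X, iterations=200):
    """Return (loading vector, scores, explained variance fraction) of PC1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # start vector; must not be orthogonal to PC1
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    explained = eigenvalue / sum(cov[i][i] for i in range(p))
    return v, scores, explained

X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # perfectly correlated
loading, scores, explained = first_principal_component(X)
print([round(x, 3) for x in loading], round(explained, 3))
```

Here a single component carries all the variance, so two correlated descriptors collapse into one orthogonal score per compound.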

b. Principal component regression
(PCR)
 When principal components are employed
as the independent variables to perform a
linear regression, the method is termed as
Principal component regression.
 PCR applies scores from PCA
decomposition as regressors in the QSAR
model, to generate multiple term linear
equation.

c. Partial least square analysis (PLS)
 An iterative regression procedure that
produces its solution based on linear
transformation of large number of original
descriptors to a small number of new
orthogonal terms called latent variables.
 PLS is able to analyze complex SAR data
in a more realistic way.
 Is able to interpret the influence of
molecular structure on biological activity
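The iterative extraction of latent variables can be sketched with the NIPALS scheme for a single response (often called PLS1). This is a simplified illustration that returns only the fitted training values and the score vectors; the data are hypothetical.

```python
import math

def pls1_fit(X, y, n_components):
    """NIPALS PLS1: returns (fitted y values, list of score vectors)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # X residuals
    f = [yi - y_mean for yi in y]                          # y residuals
    fitted = [y_mean] * n
    scores = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this latent variable explains.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
        scores.append(t)
    return fitted, scores

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]   # hypothetical descriptors
y = [2.0, 4.5, 5.0, 7.5, 8.0]
fitted, scores = pls1_fit(X, y, n_components=2)
print([round(v, 3) for v in fitted])
```

With as many components as the rank of X, the fit coincides with ordinary least squares; the practical benefit comes from stopping earlier, typically at the number of components that maximizes the cross-validated q^2.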

d. Genetic function approximation
(GFA)
 Serves as an alternative to standard
regression analysis for building QSAR
equations.
 Employs natural principles of evolution of
species which leads to improvements by
recombination
 Suitable for obtaining QSAR equations when
dealing with a larger number of independent
variables.
 Produces multiple models by evolving a population of
initial models with a genetic algorithm.

e. Genetic partial least squares
(G/PLS)
 It is a valuable analytical tool that has
evolved by combining the best features of
GFA and PLS.

Pattern Recognition
 The method is based on the principle of
analogy.
 The method is used for the detection of
distance or closeness within a large
amount of multivariate data.

a. Cluster analysis
 Statistical pattern recognition method used to
investigate the relationship between
observations associated with several
properties and to partition the data set into
categories consisting of similar elements.
 Allows for the consideration of the inactive
compounds in the analysis.
 Can be used to study a large set of
substituents to identify subsets which share
similar physical properties.

b. Artificial neural networks (ANNs)
 The technique is inspired by the real
neurons present in an animal brain.
 ANNs are parallel computational systems
consisting of groups of highly
interconnected processing elements called
neurons, which are arranged in a series of
layers:
a) First layer: Input layer
b) Subsequent layer(s): Hidden layer(s)
c) Last layer: Output layer

Contd.
 Each layer does its computations independently
and passes the results to the next one.
 The weighted inputs are summed up and
supplied to the hidden layers, where a
nonlinear transfer function does the required
processing.
 The results of the transfer function are
communicated to the neurons in the output
layer, where the results are interpreted and
finally presented to the user.
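The flow of information through the layers can be sketched as a forward pass of a network with one hidden layer. The sigmoid transfer function, the linear output neuron and all weights are assumptions for illustration; a trained network would obtain its weights from the training set.

```python
import math

def sigmoid(z):
    """A common nonlinear transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer (sigmoid) -> single linear output neuron."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical network: 2 descriptors, 2 hidden neurons, 1 output.
w_hidden = [[0.5, -0.3], [0.8, 0.1]]
b_hidden = [0.0, -0.5]
w_out = [1.2, 0.7]
b_out = 3.0

predicted_activity = forward([2.0, 1.0], w_hidden, b_hidden, w_out, b_out)
print(round(predicted_activity, 3))
```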

c. k-nearest neighbor (kNN)
 One of the simplest machine learning algorithms.
 Most commonly used for classifying a new
pattern (e.g. a molecule).
 The technique is based on a simple distance
learning approach, where an unknown/new
molecule is classified according to the
majority of its k-nearest neighbors in the
training set.
 The nearness is determined by a Euclidean
distance metric (e.g. a similarity measure
computed using the structural descriptors of
the molecules).
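A minimal sketch of the majority vote, assuming each molecule is represented by a short vector of descriptors. The training set, labels and query points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Assign the majority class among the k nearest training molecules.

    training: list of (descriptor_vector, class_label).
    """
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set in a two-descriptor space.
training = [
    ((1.0, 1.0), "active"),
    ((1.2, 0.9), "active"),
    ((0.9, 1.1), "active"),
    ((5.0, 5.0), "inactive"),
    ((5.2, 4.8), "inactive"),
]

print(knn_classify((1.1, 1.0), training))
print(knn_classify((5.1, 5.0), training))
```

Descriptors on very different scales should be standardized first, since the Euclidean distance is otherwise dominated by the largest-valued descriptor.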

Importance of statistical parameter
 Equations generated/established in QSAR
studies are Linear Regression equations.
 A number of equations may be
generated/established for one
problem/case under study.
 Statistics helps in selecting the one
best-fit equation out of them.

Contd.
 This may be done by checking the standard
deviation/variance and other related
parameters for the data set used for QSAR
studies.
 Correlation coefficient computed for the
data set under study also helps in selecting
appropriate QSAR equation.
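The statistics used to compare candidate equations can be computed from observed and calculated activities. The sketch below returns r^2, the standard deviation s with n - k - 1 degrees of freedom, and the F value for an equation with k descriptor terms; it assumes the calculated values come from a least-squares fit, and the numbers are hypothetical.

```python
import math

def regression_statistics(y_obs, y_calc, n_terms):
    """Return (r^2, s, F) for a fitted equation with n_terms descriptors."""
    n = len(y_obs)
    mean = sum(y_obs) / n
    ss_res = sum((o - c) ** 2 for o, c in zip(y_obs, y_calc))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    dof = n - n_terms - 1
    r2 = 1.0 - ss_res / ss_tot
    s = math.sqrt(ss_res / dof)
    f_value = (r2 / n_terms) / ((1.0 - r2) / dof)  # undefined for a perfect fit
    return r2, s, f_value

y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical observed log 1/C
y_calc = [1.1, 1.9, 3.0, 4.1, 4.9]   # hypothetical calculated log 1/C

r2, s, f_value = regression_statistics(y_obs, y_calc, n_terms=1)
print(round(r2, 3), round(s, 3), round(f_value, 1))
```

Among equations for the same data set, a higher r^2 and F together with a lower s indicate the better fit, provided the number of terms stays small relative to the number of compounds.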

Reference:
 Kubinyi H., Introduction, In: Mannhold R., Larsen P.,
Timmerman H. QSAR: Hansch Analysis and related
approaches. New York, VCH Publishers, 1993.
p. 4-8, 27-54, 57-68, 159.
