QSRR

SHIKHA D. POPALI
HARSHPAL SINGH WAHI
M.Pharm.
Department of Pharmaceutical Chemistry
Gurunanak College of Pharmacy, Nagpur
Quantitative Structure-Activity Relationship
QUANTITATIVE STRUCTURE
RETENTION RELATIONSHIP

Crude drug → Structure → Ligand binding → Receptor → Biological response

Drug discovery & development-timeline

DRUG DISCOVERY
DRUG DISCOVERY METHODS:
Random Screening
Molecular Manipulation
Molecular Designing
Drug Metabolites
Serendipity

Target Selection
• Cellular and Genetic Targets
• Genomics
• Proteomics
• Bioinformatics
Lead Discovery
• Synthesis and Isolation
• Combinatorial Chemistry
• Assay Development
• High-Throughput Screening
Medicinal Chemistry
• Library Development
• SAR Studies
• In Silico Screening
• Chemical Synthesis
In Vitro Studies
• Drug Affinity and Selectivity
• Cell Disease Models
• MOA
• Lead Candidate Refinement
In Vivo Studies
• Animal Models of Disease States
• Behavioural Studies
• Functional Imaging
• Ex-Vivo Studies
Clinical Trials and Therapeutics

TARGET SELECTION
Target Selection → Lead Discovery → Medicinal Chemistry → In Vitro Studies → In Vivo Studies → Clinical Trials
Target Selection covers: Cellular & Genetic Targets, Genomics, Proteomics, Bioinformatics
• Target selection in drug discovery is the decision to focus on
finding an agent with a particular biological action that is
anticipated to have therapeutic utility. It is influenced by a
complex balance of scientific, medical and strategic
considerations.
• Target identification: to identify molecular
targets that are involved in disease progression.
• Target validation: to prove that manipulating the
molecular target can provide therapeutic benefit
for patients.

TARGET SELECTION
Biochemical Classes of Drug Targets
• G-protein coupled receptors - 45%
• Enzymes - 28%
• Hormones and factors - 11%
• Ion channels - 5%
• Nuclear receptors - 2%

CELLULAR & GENETIC TARGETS:
Involves the identification of the function of a potential
therapeutic drug target and its role in the disease process.
For small-molecule drugs, this step in the process involves
identification of the target receptors or enzymes whereas
for some biologic approaches the focus is at the gene or
transcription level.
Drugs usually act on either cellular or genetic chemicals in
the body, known as targets, which are believed to be
associated with disease.

CELLULAR & GENETIC TARGETS:
Scientists use a variety of techniques to
identify and isolate individual targets to
learn more about their functions and how
they influence disease.
Compounds are then identified that have
various interactions with the drug targets
that might be helpful in treatment of a
specific disease.

GENOMICS:
Genomics is the study of genes and their function. It aims to
understand the structure of the genome, including mapping
genes and sequencing DNA.
It seeks to exploit the findings from the sequencing of the
human and other genomes to find new drug targets.
The human genome consists of a sequence of around 3 billion
nucleotides (the A, C, G, T bases), which early estimates suggested
encode 35,000 to 50,000 genes (later estimates are closer to
20,000 protein-coding genes).

GENOMICS:
Drews estimates that the number of genes implicated
in disease, both those due to defects in single genes
and those arising from combinations of genes, is about
1,000.
Based on 5 or 10 linked proteins per gene, he proposes
that the number of potential drug targets may lie
between 5,000 and 10,000.
Single Nucleotide Polymorphism (SNP) libraries are
used to compare the genomes of healthy and
sick people and to identify where their genomes vary.

PROTEOMICS:
Proteomics is the study of the proteome, the complete set of
proteins produced by a species, using the technologies of
large-scale protein separation and identification.
It is becoming increasingly evident that the complexity of
biological systems lies at the level of the proteins, and that
genomics alone will not suffice to understand these systems.
It is also at the protein level that disease processes become
manifest, and at which most (91%) drugs act.
Therefore, the analysis of proteins (including protein-protein,
protein-nucleic acid, and protein-ligand interactions) will be
of utmost importance to target discovery.

PROTEOMICS:
Proteomics is the systematic high-throughput
separation and characterization of proteins
within biological systems.
Target identification with proteomics is
performed by comparing the protein expression
levels in normal and diseased tissues.
2D PAGE is used to separate the proteins, which
are subsequently identified and fully
characterized with LC-MS/MS.

BIOINFORMATICS:
Bioinformatics is a branch of molecular biology that involves
extensive analysis of biological data using computers, for the
purpose of enhancing biological research.
It plays a key role in various stages of the drug discovery
process, including
• target identification
• computer screening of chemical compounds
• pharmacogenomics

BIOINFORMATICS:
Bioinformatics methods are used to transform the
raw sequence into meaningful information (e.g.
genes and their encoded proteins) and to compare
whole genomes (diseased vs. healthy).
They can compare the entire genomes of pathogenic and
non-pathogenic strains of a microbe and identify
genes/proteins associated with pathogenicity.
Using gene expression microarrays and gene chip
technologies, a single device can be used to evaluate
and compare the expression of up to 20,000 genes of
healthy and diseased individuals at once.

LEAD DISCOVERY:
Lead Discovery covers: Synthesis and Isolation, Combinatorial Chemistry, Assay Development, High-Throughput Screening
• Identification of small molecule modulators of
protein function
• The process of transforming these into high-
content lead series.

CHEMICAL STRUCTURE:
A chemical structure includes molecular geometry,
electronic structure, and crystal structure of a molecule.
BIOLOGICAL ACTIVITY:
Biological activity is an expression describing the beneficial
or adverse effects of a drug on living matter.

STRUCTURE ACTIVITY RELATIONSHIP:
The structure–activity relationship (SAR) is the
relationship between the chemical or 3D structure of a
molecule and its biological activity.
• The analysis of SAR enables the determination of the
chemical groups responsible for evoking a target
biological effect in the organism.

WHY SAR EXISTS ?
The interaction of the drug molecule with a protein
depends on its chemical structure.
NEED OF SAR STUDY:
• A structure–activity relationship study is
mainly carried out on a lead molecule.
• It is used to determine the parts of the structure
of the lead compound that are responsible for its
beneficial biological activity (its pharmacophore)
and for its unwanted side effects.
• This knowledge is used to develop a new drug that has
increased activity.
• To obtain a different activity from an
existing drug, with fewer unwanted side effects.
• To learn how pharmacological properties change when
minor changes are made to the drug molecule.

EFFECT OF CHEMICAL STRUCTURE
ON PHARMACOLOGICAL ACTION
Drug action follows the principle proposed by EHRLICH,
i.e. Corpora non agunt nisi fixata (a drug will not work unless it
is bound).
• It means the drug should be non-uniformly distributed in the
body, i.e. directed towards a targeted portion, to elicit the
pharmacological action.
• In other words, the drug molecule must come close to the
cellular molecules and bind to them in order to alter their activity.

A drug or ligand binds to a receptor and
thereby mediates its pharmacological action.
Drug + Receptor → Pharmacological action

Processes such as
• Passage of the drug through biologic membranes
• Penetration to sites of action
• Binding to the target proteins
• Therapeutic response
• Elimination from the body
are major phenomena to be considered in a drug’s action.

[Figure: Dosage → Plasma concentration → Site of action → Effects, with the pharmacokinetic and pharmacodynamic phases marked]
The above processes can be explained by
pharmacokinetic and pharmacodynamic studies.

EFFECT OF CHEMICAL STRUCTURE ON
PHARMACOKINETIC PROPERTIES OF DRUG
Pharmacokinetics describes what the body does to drug
molecules.
It includes the following stages:
ABSORPTION
DISTRIBUTION
METABOLISM
ELIMINATION
Pharmacokinetic properties can be well explained by bioavailability studies.

QSAR and Drug Design
Compounds + biological activity → QSAR → New compounds with improved biological activity
QSAR: correlate chemical structure with activity using a statistical approach

QSAR?
A QSAR is a mathematical relationship between the
biological activity of a molecular system and its
geometric and chemical characteristics.
QSAR attempts to find a consistent relationship
between biological activity and molecular properties,
so that these “rules” can be used to evaluate the
activity of new compounds.

Why QSAR?
• The number of compounds that would have to be synthesised in
order to place 10 different groups in 4 positions of a
benzene ring is 10^4 (10,000).
• Solution: synthesise a small number of compounds and
from their data derive rules to predict the biological
activity of other compounds.
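The figure of 10^4 follows from choosing one of 10 groups independently at each of 4 ring positions. A minimal Python sketch of that count, assuming the 4 positions are treated as distinct and ring symmetry is ignored (the group labels G1 to G10 are hypothetical placeholders):

```python
from itertools import product

# Ten hypothetical substituent labels; any ten distinct groups would do.
groups = [f"G{i}" for i in range(1, 11)]
positions = 4  # number of substituted positions on the benzene ring

# Each analogue is one choice of group per position.
analogues = list(product(groups, repeat=positions))

print(len(analogues))  # 10 ** 4 analogues
```

Enumerating the full set this way makes the motivation concrete: synthesising every member is impractical, so a model fitted to a small subset is used to rank the rest.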

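QSAR: Quantitative Structure-Activity Relationship
QSPR: Quantitative Structure-Property Relationship
QSBR: Quantitative Structure-Binding Relationship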

[Figure: Molecular structure → (representation) → Descriptors → (feature selection & mapping) → Activities]
Quantitative structure-activity relationships correlate, within
congeneric series of compounds, their chemical or biological
activities, either with certain structural features or with atomic,
group or molecular descriptors.
Quantitative Structure Activity Relationship (QSAR)
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 24, 279-287

Rationale for QSAR studies
• In drug design, in-vitro potency addresses only part of the need; a
successful drug must also be able to reach its target in the body
while still in its active form.
• The in-vivo activity of a substance is a composite of many factors,
including the intrinsic reactivity of the drug, its solubility in water, its
ability to pass the blood-brain barrier, its non-reactivity with
non-target molecules that it encounters on its way to the target, and
others.
• A quantitative structure-activity relationship (QSAR) correlates
measurable or calculable physical or molecular properties to some
specific biological activity in terms of an equation.
• Once a valid QSAR has been determined, it should be possible to
predict the biological activity of related drug candidates before they
are put through expensive and time-consuming biological testing. In
some cases, only computed values need to be known to make an
assessment.

Advantages of QSAR:
• Quantifying the relationship between structure and activity
provides an understanding of the effect of structure on
activity, which may not be straightforward when large amounts of
data are generated.
• There is also the potential to make predictions leading to the
synthesis of novel analogues. Interpolation is readily justified, but
great care must be taken not to use extrapolation outside the
range of the data set.
• The results can be used to help understand interactions between
functional groups in the molecules of greatest activity, with those of
their target. To do this it is important to interpret any derived QSAR
in terms of the fundamental chemistry of the set of analogues,
including any outliers

Disadvantages of QSAR:
1. False correlations may arise through too heavy a reliance
being placed on biological data, which, by its nature, is
subject to considerable experimental error.
2. Frequently, experiments upon which QSAR analyses depend,
lack design in the strict sense of experimental design.
Therefore the data collected may not reflect the complete
property space. Consequently, many QSAR results cannot be
used to confidently predict the most likely compounds of
best activity.
3. Various physicochemical parameters are known to be cross-
correlated. Therefore only variables or their combinations
that have little covariance should be used in a QSAR analysis;
similar considerations apply when correlations are sought for
different sets of biological data

DMITRI MENDELEEV
(1834 – 1907)
Russian chemist who arranged the 63 known elements into a
periodic table based on atomic mass, which he published in
Principles of Chemistry in 1869. Mendeleev left space for new
elements, and predicted three yet-to-be-discovered elements:
Ga (1875), Sc (1879) and Ge (1886).
Discoverer of the Periodic Table and an early “Chemoinformatician”

HISTORY OF QSAR
1869, D. Mendeleev – The Periodic Table of Elements
1868, A. Crum-Brown and T.R. Fraser – formulated a suggestion that
physiological activity of molecules depends on their constitution:
Activity = F(structure)
They studied a series of quaternized strychnine derivatives, some of
which possess activity similar to curare in paralyzing muscle.
1869, B.J. Richardson – narcotic effect of primary alcohols varies in
proportion to their molecular weights.

HISTORY OF QSAR
1893, C. Richet has shown that toxicities of some simple
organic compounds (ethers, alcohols, ketones) were
inversely related to their solubility in water.
1899, H. Meyer and 1901, E. Overton have found variation
of the potencies of narcotic compounds with LogP.
1904, J. Traube found a linear relation between narcosis
and surface tension.

HISTORY OF QSAR
1937, L.P. Hammett studied chemical reactivity of
substituted benzenes:
Hammett equation,
Linear Free Energy Relationship (LFER)
1939, J. Ferguson formulated a concept linking narcotic
activity, log P and thermodynamics.
1952- 1956, R.W. Taft devised a procedure for
separating polar, steric and resonance effects.

HISTORY OF QSAR
1964, C. Hansch and T. Fujita: the biologist’s Hammett
equation.
1964, Free and Wilson, QSAR on fragments.
1970s – 1980s – development of 2D QSAR (descriptors,
mathematical formalism).
1980s – 1990s, development of 3D QSAR
(pharmacophores, CoMFA, docking).
1990s – present, virtual screening.

Molecular descriptors used in QSAR
Type: Hydrophobic parameters
• Partition coefficient; log P
• Hansch’s substitution constant; π
• Hydrophobic fragmental constant; f, f’
• Distribution coefficient; log D
• Apparent log P
• Capacity factor in HPLC; log k’, log k’w
• Solubility parameter; log S

Type: Electronic parameters
• Hammett constant; σ, σ+, σ-
• Taft’s inductive (polar) constant; σ*
• Swain and Lupton field parameter
• Ionization constant; pKa, ΔpKa
• Chemical shifts: IR, NMR

Type: Steric parameters
• Taft’s steric parameter; Es
• Molar volume; MV
• Van der Waals radius
• Van der Waals volume
• Molar refractivity; MR
• Parachor
• Sterimol

Type: Quantum chemical descriptors
• Atomic net charge; Qσ, Qπ
• Superdelocalizability
• Energy of highest occupied molecular orbital; EHOMO
• Energy of lowest unoccupied molecular orbital; ELUMO

Type: Spatial descriptors
• Jurs descriptors
• Shadow indices
• Radius of gyration
• Principal moments of inertia

Classification of Descriptors Based on the
Dimensionality of their Molecular Representation

Molecular representation | Descriptor | Example
0D | Atom counts, bond counts, molecular weight, sum of atomic properties | Molecular weight, average molecular weight, number of: atoms, hydrogen atoms, carbon atoms, hetero-atoms, non-hydrogen atoms, double bonds, triple bonds, aromatic bonds, rotatable bonds, rings, 3-membered rings, 4-membered rings, 5-membered rings, 6-membered rings
1D | Fragment counts | Number of: primary C, secondary C, tertiary C, quaternary C, secondary carbon in ring, tertiary carbon in ring, quaternary carbon in ring, unsubstituted aromatic carbon, substituted carbon, H-bond donor atoms, H-bond acceptor atoms; unsaturation index, hydrophilic factor, molecular refractivity
2D | Topological descriptors | Zagreb index, Wiener index, Balaban index, connectivity indices chi (χ), kappa (κ) shape indices
3D | Geometrical descriptors | Radius of gyration, E-state parameters, 3D Wiener index, 3D Balaban index

Various stages of QSAR model development

Hydrophobicity of the Molecule
Partition coefficient: $$P = \frac{[\text{Drug in octanol}]}{[\text{Drug in water}]}$$
High P = high hydrophobicity

•Activity of drugs is often related to P
e.g. binding of drugs to serum albumin (straight line - limited range of log P)
•Binding increases as log P increases
•Binding is greater for hydrophobic drugs
$$\log\left(\frac{1}{C}\right) = 0.75\,\log P + 2.30$$
[Figure: straight-line plot of log (1/C) against log P; log P axis marked at 0.78 and 3.82]

Example 2 General anaesthetic activity of ethers
(parabolic curve - larger range of log P values)
Optimum value of log P for anaesthetic activity = log Po
$$\log\left(\frac{1}{C}\right) = -0.22\,(\log P)^2 + 1.04\,\log P + 2.16$$
[Figure: parabolic plot of log (1/C) against log P, with the maximum at log Po]
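The optimum log Po of the parabolic ether equation, log(1/C) = -0.22 (log P)^2 + 1.04 log P + 2.16, can be located by setting its derivative to zero. The sketch below assumes only the three coefficients quoted on the slide; the function names are illustrative.

```python
def log_inv_c(log_p, a=-0.22, b=1.04, c=2.16):
    """Parabolic model: log(1/C) = a*(log P)^2 + b*log P + c."""
    return a * log_p ** 2 + b * log_p + c


def optimum_log_p(a=-0.22, b=1.04):
    """Vertex of the parabola, from d/d(log P) = 2*a*log P + b = 0."""
    return -b / (2 * a)


log_p0 = optimum_log_p()
print(round(log_p0, 2))             # optimum log P, close to the quoted ca. 2.3
print(round(log_inv_c(log_p0), 2))  # predicted activity at the optimum
```

The vertex lies near 2.36, in line with the slide's remark that log Po is about 2.3 for anaesthetics.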

Notes:
• QSAR equations are only applicable to compounds in the same structural
class (e.g. ethers)
• However, log Po is similar for anaesthetics of different structural classes
(ca. 2.3)
• Structures with log P ca. 2.3 enter the CNS easily
(e.g. potent barbiturates have a log P of approximately 2.0)
• The log P value of a drug can be moved away from 2.0 to avoid CNS side effects

Hydrophobicity of Substituents
- the substituent hydrophobicity constant (π)
Notes:
•A measure of a substituent’s hydrophobicity relative to hydrogen
•Tabulated values exist for aliphatic and aromatic substituents
•Measured experimentally by comparison of log P values with log P of
parent structure
•Positive values imply substituents are more hydrophobic than H
•Negative values imply substituents are less hydrophobic than H
Example:
Benzene (log P = 2.13)
Chlorobenzene (log P = 2.84)
Benzamide (log P = 0.64)
π(Cl) = 2.84 - 2.13 = 0.71
π(CONH2) = 0.64 - 2.13 = -1.49

Hydrophobicity of Substituents
- the substituent hydrophobicity constant (π)
Notes:
•The value of π is only valid for parent structures
•It is possible to calculate log P using π values
•A QSAR equation may include both P and π
•P measures the importance of a molecule’s overall hydrophobicity
(relevant to absorption, binding etc)
•π identifies specific regions of the molecule which might interact
with hydrophobic regions in the binding site
Example: meta-Chlorobenzamide
Log P(theory) = log P(benzene) + π(Cl) + π(CONH2)
= 2.13 + 0.71 - 1.49
= 1.35
Log P(observed) = 1.51
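The π arithmetic for meta-chlorobenzamide can be reproduced in a few lines. This sketch uses only the log P values quoted on the slides (benzene 2.13, chlorobenzene 2.84, benzamide 0.64) and assumes the usual definition of π as the difference between the log P of the substituted and the parent compound; the names are illustrative.

```python
# log P values quoted in the slides
LOG_P = {"benzene": 2.13, "chlorobenzene": 2.84, "benzamide": 0.64}


def pi_constant(log_p_substituted, log_p_parent):
    """Substituent hydrophobicity constant relative to the parent (H)."""
    return log_p_substituted - log_p_parent


pi_cl = pi_constant(LOG_P["chlorobenzene"], LOG_P["benzene"])
pi_conh2 = pi_constant(LOG_P["benzamide"], LOG_P["benzene"])

# Additive estimate for meta-chlorobenzamide
log_p_theory = LOG_P["benzene"] + pi_cl + pi_conh2

print(round(pi_cl, 2), round(pi_conh2, 2), round(log_p_theory, 2))
```

The additive estimate (1.35) falls a little short of the observed 1.51, which illustrates that π values are strictly valid only for the parent system they were measured on.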

Electronic Effects
Hammett Substituent Constant (σ)
Notes:
•The constant (σ) is a measure of the e-withdrawing or e-donating
influence of substituents
•It can be measured experimentally and tabulated
(e.g. σ for aromatic substituents is measured by comparing the
dissociation constants of substituted benzoic acids with benzoic acid)
Equilibrium: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+
For X = H: $$K_H = \text{dissociation constant} = \frac{[\text{PhCO}_2^-][\text{H}^+]}{[\text{PhCO}_2\text{H}]}$$
X = electron withdrawing group (e.g. NO2)
[Scheme: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+]
$$\sigma_X = \log\frac{K_X}{K_H} = \log K_X - \log K_H$$
Charge is stabilised by X
Equilibrium shifts to right
KX > KH
Positive value

X = electron donating group (e.g. CH3)
[Scheme: X-C6H4-CO2H ⇌ X-C6H4-CO2- + H+]
$$\sigma_X = \log\frac{K_X}{K_H} = \log K_X - \log K_H$$
Charge destabilised
Equilibrium shifts to left
KX < KH
Negative value

•σ value depends on inductive and resonance effects
•σ value depends on whether the substituent is meta or para
•ortho values are invalid due to steric factors
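The definition σX = log KX - log KH can be checked numerically. The dissociation constants below are made-up illustrative numbers, not measured data, and the helper names are hypothetical; the pKa form follows directly from pKa = -log K.

```python
import math


def hammett_sigma(k_x, k_h):
    """sigma_X = log10(K_X / K_H) for substituted vs. unsubstituted benzoic acid."""
    return math.log10(k_x / k_h)


def sigma_from_pka(pka_x, pka_h):
    """Equivalent form: sigma_X = pKa(H) - pKa(X)."""
    return pka_h - pka_x


# Illustrative (invented) dissociation constants
k_h = 1.0e-4
sigma_ewg = hammett_sigma(5.0e-4, k_h)  # stronger acid, K_X > K_H
sigma_edg = hammett_sigma(0.5e-4, k_h)  # weaker acid,   K_X < K_H

print(sigma_ewg > 0, sigma_edg < 0)
```

An electron-withdrawing group stabilises the carboxylate, raises K, and gives a positive σ; an electron-donating group does the opposite.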

EXAMPLES: σp (NO2) and σm (NO2)
meta-Substitution: e-withdrawing (inductive effect only)
para-Substitution: e-withdrawing (inductive + resonance effects)
[Figure: resonance structures of meta- and para-nitro substituted DRUG]

EXAMPLES: σm (OH) and σp (OH)
meta-Substitution: e-withdrawing (inductive effect only)
para-Substitution: e-donating by resonance, which is more important than the inductive effect
[Figure: resonance structures of meta- and para-hydroxy substituted DRUG]

QSAR Equation:
Diethylphenylphosphates (Insecticides)
[Structure: X-substituted phenyl-O-P(=O)(OEt)2]
$$\log\left(\frac{1}{C}\right) = 2.282\,\sigma - 0.348$$
Conclusion: e-withdrawing substituents increase activity
Electronic Factors R & F
•R - Quantifies a substituent’s resonance effects
•F - Quantifies a substituent’s inductive effects

Aliphatic electronic substituents
•Defined by σI
•Purely inductive effects
•Obtained experimentally by measuring the rates of hydrolysis of aliphatic
esters
•Hydrolysis rates measured under basic and acidic conditions
[Scheme: X-CH2-CO2Me → (hydrolysis) → X-CH2-CO2H + HOMe]
X = electron donating: rate decreases, σI = -ve
X = electron withdrawing: rate increases, σI = +ve
Basic conditions: Rate affected by steric + electronic factors
Gives σI after correction for steric effect
Acidic conditions: Rate affected by steric factors only (see Es)

Steric effects
Taft quantified the steric (spatial) effects using the hydrolysis of esters.
Here, the size of R affects the rate of reaction by blocking nucleophilic attack by water.
In this case, the steric effects were quantified by the Taft parameter Es, where
k is the rate constant for ester hydrolysis. This expression is analogous to the Hammett equation.

Es Values for Various Substituents
Substituent | H | Me | Pr | t-Bu | F | Cl | Br | OH | SH | NO2 | C6H5 | CN | NH2
Es | 0.0 | -1.24 | -1.60 | -2.78 | -0.46 | -0.97 | -1.16 | -0.55 | -1.07 | -2.52 | -3.82 | -0.51 | -0.61
Compare some extreme values:
H 0.00: the reference substituent in the Taft equation
Me -1.24: little steric resistance to hydrolysis
t-Bu -2.78: large resistance to hydrolysis
Note: H is usually used as the reference substituent (Es = 0), but when another
group, such as methyl (Me), is used as the reference, as in the chemical equation above, the
value for H becomes 1.24.

Steric Factors
Taft’s Steric Factor (Es)
•Measured by comparing the rates of hydrolysis of substituted aliphatic
esters against a standard ester under acidic conditions
Es = log kx - log ko
kx represents the rate of hydrolysis of a substituted ester
ko represents the rate of hydrolysis of the parent ester
•Limited to substituents which interact sterically with the tetrahedral
transition state for the reaction
•Cannot be used for substituents which interact with the transition state by
resonance or hydrogen bonding
•May undervalue the steric effect of groups in an intermolecular process
(i.e. a drug binding to a receptor)

Steric Factors
Molar Refractivity (MR) - a measure of a substituent’s volume
$$MR = \frac{n^2 - 1}{n^2 + 2} \times \frac{\text{mol. wt.}}{\text{density}}$$
•(n^2 - 1)/(n^2 + 2): correction factor for polarisation (n = index of refraction)
•mol. wt./density: defines volume
Steric Factors
Verloop Steric Parameter
- calculated by software (STERIMOL)
- gives dimensions of a substituent
- can be used for any substituent
Example - Carboxylic acid
[Figure: STERIMOL dimensions L, B1, B2, B3 and B4 drawn on a carboxylic acid group]

2D QSAR Methods
1. Free energy models
a) Hansch analysis (Linear Free Energy Relationship, LFER)
2. Mathematical models
a) Free Wilson analysis
b) Fujita-Ban modification
3. Other statistical methods
a) Discriminant Analysis (DA)
b) Principal Component Analysis (PCA)
c) Cluster Analysis (CA)
d) Combine Multivariate Analysis (CMA)
e) Factor Analysis (FA)
4. Pattern recognition
5. Topological methods
6. Quantum mechanical methods

Hansch Analysis
• In 1964, Corwin Hansch extended the concept of linear free energy
relationships (LFER) to describe the effectiveness of a biologically
active molecule.
• It is one of the most promising approaches to quantifying
the interaction of drug molecules with biological systems.
• It is also known as the linear free energy relationship (LFER) or
extrathermodynamic method, which assumes that the electronic, steric,
hydrophobic, and dispersion contributions of substituents are additive
in the non-covalent interaction of a drug with biomacromolecules.

• This method relates the biological activity within a homologous series
of compounds to a set of theoretical molecular parameters which
describe essential properties of the drug molecules.
• Hansch proposed that the action of a drug depends on two processes:
1. The journey from the point of entry in the body to the site of action, which
involves passage through a series of membranes and is therefore related
to the partition coefficient log P (lipophilicity); it can be explained by
random walk theory.
2. Interaction with the receptor site, which in turn depends on
a) Bulk of substituent groups (steric)
b) Electron density on the attachment group (electronic)
He suggested linear and non-linear dependence of biological activity on
different parameters.
Linear: $$\log(1/C) = a\,\log P + b\,\sigma + c\,E_s + d$$
Nonlinear: $$\log(1/C) = a\,(\log P)^2 + b\,\log P + c\,\sigma + d\,E_s + e$$

Hansch Equation
•A QSAR equation relating various physicochemical properties to
the biological activity of a series of compounds
•Usually includes log P, electronic and steric factors
•Start with simple equations and elaborate as more structures are
synthesised
•Typical equation for a wide range of log P is parabolic
$$\log\left(\frac{1}{C}\right) = -k_1(\log P)^2 + k_2\log P + k_3\sigma + k_4 E_s + k_5$$

HANSCH ANALYSIS
Physicochemical properties
can be broadly classified into
three general types:
Electronic
Steric
Hydrophobic
Biological Activity = f (Physicochemical properties ) + constant

Hansch Equation
Example: Adrenergic blocking activity of β-halo-β-arylamines
[Structure: aryl ring (substituent Y) attached to CH(X)-CH2-NRR']
$$\log\left(\frac{1}{C}\right) = 1.22\,\pi - 1.59\,\sigma + 7.89$$
Conclusions:
•Activity increases if π is +ve (i.e. hydrophobic substituents)
•Activity increases if σ is negative (i.e. e-donating substituents)

Hansch Equation
Example: Antimalarial activity of phenanthrene aminocarbinols
[Structure: phenanthrene with substituents X and Y on the outer rings and a CH(OH)-CH2NHR'R" side chain]
$$\log\left(\frac{1}{C}\right) = -0.015\,(\log P)^2 + 0.14\,\log P + 0.27\,\Sigma\pi_X + 0.40\,\Sigma\pi_Y + 0.65\,\Sigma\sigma_X + 0.88\,\Sigma\sigma_Y + 2.34$$
Conclusions:
•Activity increases slightly as log P (hydrophobicity) increases (note
that the constant is only 0.14)
•Parabolic equation implies an optimum log Po value for activity
•Activity increases for hydrophobic substituents (esp. ring Y)
•Activity increases for e-withdrawing substituents (esp. ring Y)
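The antimalarial Hansch equation can be evaluated directly. The sketch below assumes the coefficients on the slide attach to Σπ and Σσ for rings X and Y (the Greek symbols were lost in extraction), and the function and argument names are illustrative.

```python
def log_inv_c(log_p, sum_pi_x, sum_pi_y, sum_sigma_x, sum_sigma_y):
    """Hansch equation for the phenanthrene aminocarbinols, as read from the slide."""
    return (-0.015 * log_p ** 2 + 0.14 * log_p
            + 0.27 * sum_pi_x + 0.40 * sum_pi_y
            + 0.65 * sum_sigma_x + 0.88 * sum_sigma_y
            + 2.34)


# Optimum log P from the parabolic terms: d/d(log P) = -0.03*log P + 0.14 = 0
log_p_optimum = 0.14 / (2 * 0.015)
print(round(log_p_optimum, 2))

# The same increment in sigma counts for more on ring Y than on ring X
gain_x = log_inv_c(3.0, 0, 0, 0.5, 0) - log_inv_c(3.0, 0, 0, 0, 0)
gain_y = log_inv_c(3.0, 0, 0, 0, 0.5) - log_inv_c(3.0, 0, 0, 0, 0)
print(gain_y > gain_x)
```

Comparing coefficients this way is how the slide's conclusions are read off: the larger weights on the ring Y terms indicate where hydrophobic and electron-withdrawing substituents pay off most.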

Hansch Equation
Choosing suitable substituents
Substituents must be chosen to satisfy the following criteria:
•A range of values for each physicochemical property studied
•Values must not be correlated for different properties (i.e. they must be
orthogonal in value)
•At least 5 structures are required for each parameter studied

Correlated values. Are any differences due to π or MR?
Substituent | H | Me | Et | n-Pr | n-Bu
π | 0.00 | 0.56 | 1.02 | 1.50 | 2.13
MR | 0.10 | 0.56 | 1.03 | 1.55 | 1.96

No correlation in values. Valid for analysing effects of π and MR.
Substituent | H | Me | OMe | NHCONH2 | I | CN
π | 0.00 | 0.56 | -0.02 | -1.30 | 1.12 | -0.57
MR | 0.10 | 0.56 | 0.79 | 1.37 | 1.39 | 0.63
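Whether two descriptors are correlated across a substituent set can be checked with a Pearson coefficient. The sketch below applies a hand-written Pearson r (a choice made here to stay within the standard library) to the two π/MR tables from the slide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Alkyl series: H, Me, Et, n-Pr, n-Bu
pi_alkyl = [0.00, 0.56, 1.02, 1.50, 2.13]
mr_alkyl = [0.10, 0.56, 1.03, 1.55, 1.96]

# Mixed series: H, Me, OMe, NHCONH2, I, CN
pi_mixed = [0.00, 0.56, -0.02, -1.30, 1.12, -0.57]
mr_mixed = [0.10, 0.56, 0.79, 1.37, 1.39, 0.63]

r_alkyl = pearson_r(pi_alkyl, mr_alkyl)
r_mixed = pearson_r(pi_mixed, mr_mixed)
print(round(r_alkyl, 3), round(r_mixed, 3))
```

In the alkyl series π and MR rise together almost perfectly, so their effects cannot be separated; in the mixed series the coefficient is close to zero, which is what makes that set usable for a two-parameter analysis.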

Free-Wilson equation:
• Free and Wilson reasoned that the biological activity for a set of
analogues could be described by
the contributions that substituents or structural elements make to
the activity of a parent structure.
• Free-Wilson equations do not require the use of substituent
constants such as π, σm, σp, F, R, Es, and MR, and are represented
by the general equation:
$$BA_i = \sum_j a_j X_{ij} + \mu$$
• where aj is the group contribution of the structural feature X in
position j in molecule i and μ is the theoretical biological activity
value of a reference or parent compound.
• The presence and absence of structural elements is indicated by the
values 1 and 0, respectively. It is possible to add indicator
variables to the Hansch equation as a combination of Free-Wilson
analysis and Hansch analysis. The indicator variables can be used
to describe a variety of structural features, such as substructures,
chiral centers and special substituents.

Free-Wilson Approach
Method
•The biological activity of the parent structure is measured and compared
with the activity of analogues bearing different substituents
•An equation is derived relating biological activity to the presence or
absence of particular substituents
Activity = k1X1 + k2X2 + … + knXn + Z
•Xn is an indicator variable which is given the value 0 or 1 depending on
whether the substituent (n) is present or not
•The contribution of each substituent (n) to activity is determined by the
value of kn
•Z is a constant representing the overall activity of the structures studied
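The Free-Wilson form Activity = k1X1 + k2X2 + … + knXn + Z is a sum over indicator variables, which the sketch below evaluates. The substituent labels, the contribution values k and the constant Z are all hypothetical numbers chosen for illustration; in practice they would come from a regression over measured analogues.

```python
def free_wilson_activity(indicators, contributions, z):
    """Activity = sum(k_n * X_n) + Z, where each X_n is 0 (absent) or 1 (present)."""
    return sum(contributions[name] * present
               for name, present in indicators.items()) + z


# Hypothetical group contributions k_n and constant Z (not fitted values)
contributions = {"4-Cl": 0.50, "3-Me": -0.20, "4-OMe": 0.10}
z = 1.00

# An analogue carrying 4-Cl and 3-Me but not 4-OMe
analogue = {"4-Cl": 1, "3-Me": 1, "4-OMe": 0}
activity = free_wilson_activity(analogue, contributions, z)
print(round(activity, 2))
```

Because every term is either switched on or off, the model needs no physicochemical constants, but it also cannot predict a substituent that never appeared in the training set.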

Free-Wilson Approach
Advantages
•No need for physicochemical constants or tables
•Useful for structures with unusual substituents
•Useful for quantifying the biological effects of molecular features that
cannot be quantified or tabulated by the Hansch method
Disadvantages
•A large number of analogues need to be synthesised to represent each
different substituent and each different position of a substituent
•It is difficult to rationalise why specific substituents are good or bad for
activity
•The effects of different substituents may not be additive
(e.g. intramolecular interactions)

Free-Wilson / Hansch Approach
•It is possible to use indicator variables as part of a Hansch equation - see the
Case Study that follows

Craig Plot
Craig plot shows values for 2 different physicochemical properties for various substituents
•Allows an easy identification of suitable substituents for a QSAR analysis
which includes both relevant properties
•Choose a substituent from each quadrant to ensure orthogonality
•Choose substituents with a range of values for each property
Example:
[Figure: Craig plot of σ (vertical axis, -1.0 to +1.0) against π (horizontal axis, -2.0 to +2.0), divided into four quadrants (-π +σ, +π +σ, -π -σ, +π -σ). Substituents plotted: CF3SO2, CF3, Me, Cl, Br, I, OCF3, F, NMe2, OCH3, OH, NH2, CH3CONH, CO2H, CH3CO, CN, NO2, CH3SO2, CONH2, SO2NH2, Et, t-Butyl, SF5]
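Picking one substituent per quadrant of a Craig plot amounts to classifying each substituent by the signs of its π and σ values. In the sketch below the four (π, σp) pairs are approximate, commonly tabulated values quoted from memory for illustration and should be checked against a reference table; the function name is hypothetical.

```python
# Approximate (pi, sigma_para) pairs, assumed for illustration only
SUBSTITUENTS = {
    "Cl": (0.71, 0.23),
    "CH3": (0.56, -0.17),
    "OCH3": (-0.02, -0.27),
    "NO2": (-0.28, 0.78),
}


def quadrant(pi, sigma):
    """Label the Craig-plot quadrant by the signs of pi and sigma."""
    pi_sign = "+" if pi >= 0 else "-"
    sigma_sign = "+" if sigma >= 0 else "-"
    return f"{pi_sign}pi {sigma_sign}sigma"


by_quadrant = {}
for name, (pi, sigma) in SUBSTITUENTS.items():
    by_quadrant.setdefault(quadrant(pi, sigma), []).append(name)

for label, names in sorted(by_quadrant.items()):
    print(label, names)
```

With one member in each quadrant, π and σ vary independently across the set, which is the orthogonality the slide asks for.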

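Topliss Scheme
Used to decide which substituents to use if optimising compounds
one by one (where synthesis is complex and slow)
Example: Aromatic substituents
[Figure: Topliss decision tree for aromatic substituents. Starting from H, the 4-Cl analogue is made first; each analogue is rated L (less), E (equal) or M (more) active than the previous one, and the rating selects the next substituent. Substituents appearing in the tree: 4-Cl, 4-CH3, 4-OMe, 3,4-Cl2, 4-t-Bu, 3-CF3-4-Cl, 3-Cl, 4-CF3, 2,4-Cl2, 4-NO2, 3-NMe2, 3-CF3-4-NO2, 3-CH3, 2-Cl, 3-CF3, 3,5-Cl2, 3-NO2, 4-F, 4-NMe2, 3-Me-4-NMe2, 4-NH2]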

Topliss Scheme
Rationale
•Replace H with para-Cl (+π and +σ)
•Activity increases: +π and/or +σ advantageous.
Add second Cl to increase π and σ further
•Activity shows little change: favourable π, unfavourable σ.
Replace with Me (+π and -σ)
•Activity decreases: +π and/or +σ disadvantageous.
Replace with OMe (-π and -σ)
•Further changes suggested based on arguments of π, σ and
steric strain
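The first branch of the aromatic Topliss scheme is a three-way lookup on how the 4-Cl analogue compares with the unsubstituted parent. The sketch below encodes only that first decision, as described in the rationale (M: add a second Cl; E: replace with Me; L: replace with OMe); deeper branches are deliberately left out, and the names are illustrative.

```python
# Next analogue to make after testing 4-Cl against the parent (H)
NEXT_AFTER_4_CL = {
    "M": "3,4-Cl2",  # more active: push pi and sigma further
    "E": "4-CH3",    # equal: keep +pi, reverse sigma
    "L": "4-OMe",    # less active: reverse both pi and sigma
}


def next_substituent(activity_vs_parent):
    """Return the next substituent for a rating of 'M', 'E' or 'L'."""
    rating = activity_vs_parent.upper()
    if rating not in NEXT_AFTER_4_CL:
        raise ValueError("rating must be 'M', 'E' or 'L'")
    return NEXT_AFTER_4_CL[rating]


print(next_substituent("M"))
```

Each step needs only one new compound and one activity comparison, which is why the scheme suits cases where synthesis is slow.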

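Topliss Scheme
Aliphatic substituents
[Figure: Topliss decision tree for aliphatic substituents, branching on L (less), E (equal) or M (more) activity. Substituents appearing in the tree: CH3; i-Pr; H, CH2OCH3, CH2SO2CH3; Et; cyclopentyl; cyclohexyl; CHCl2, CF3, CH2CF3, CH2SCH3; Ph, CH2Ph; CH2Ph; CH2CH2Ph; cyclobutyl, cyclopropyl; t-Bu; END]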

Topliss Scheme
Example
[Structure: aromatic ring bearing R and SO2NH2]
Order of Synthesis | R | Biological Activity
1 | H | -
2 | 4-Cl | M
3 | 3,4-Cl2 | L
4 | 4-Br | E
5 | 4-NO2 | M
M = More Activity
L = Less Activity
E = Equal Activity
* = High Potency

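Topliss Scheme
Example
[Structure: R-substituted aryl ring attached to a nitrogen heterocycle bearing a CH2CH2CO2H side chain]
Order of Synthesis | R | Biological Activity
1 | H | -
2 | 4-Cl | L
3 | 4-MeO | L
4 | 3-Cl | M
5 | 3-CF3 | L
6 | 3-Br | M
7 | 3-I | L
8 | 3,5-Cl2 | M
M = More Activity
L = Less Activity
E = Equal Activity
* = High Potency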

Bio-isosteres
•Choose substituents with similar physicochemical properties (e.g. CN, NO2 and
COMe could be bio-isosteres)
•Choose bio-isosteres based on most important physicochemical
property
(e.g. COMe & SOMe are similar in σp; SOMe and SO2Me are similar in π)
Substituent | COCH3 | C(CH3)=C(CN)2 | SOCH3 | SO2CH3 | SO2NHCH3 | CONMe2
π | -0.55 | 0.40 | -1.58 | -1.63 | -1.82 | -1.51
σp | 0.50 | 0.84 | 0.49 | 0.72 | 0.57 | 0.36
σm | 0.38 | 0.66 | 0.52 | 0.60 | 0.46 | 0.35
MR | 11.2 | 21.5 | 13.7 | 13.5 | 16.9 | 19.2

Case Study
QSAR analysis of pyranenamines (SK & F)
(Anti-allergy compounds)
[Structure: pyranenamine core linked through NH to an aromatic ring bearing substituents X, Y and Z at positions 3, 4 and 5]

Stage 1: 19 structures were synthesised to study π and σ

$$\log\left(\frac{1}{C}\right) = -0.14\,\Sigma\pi - 1.35\,(\Sigma\sigma)^2 - 0.72$$

Σπ and Σσ = total values of π and σ for all substituents
Conclusions:
•Activity drops as π increases
•Hydrophobic substituents are bad for activity - unusual
•Any non-zero value of σ results in a drop in activity
•Substituents should not be e-donating or e-withdrawing (activity falls if σ is +ve or -ve)
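The Stage 1 conclusions can be checked numerically. The sketch below takes the Stage 1 equation as log(1/C) = -0.14Σπ - 1.35(Σσ)² - 0.72 and evaluates it for a few hypothetical substituent totals; the input values are invented purely to show the trends.

```python
def stage1_log_inv_c(sum_pi, sum_sigma):
    """Stage 1 pyranenamine equation: log(1/C) = -0.14*sum_pi - 1.35*sum_sigma**2 - 0.72."""
    return -0.14 * sum_pi - 1.35 * sum_sigma ** 2 - 0.72


# Hypothetical (sum_pi, sum_sigma) totals, chosen only to illustrate the trends.
examples = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
for sum_pi, sum_sigma in examples:
    print(sum_pi, sum_sigma, round(stage1_log_inv_c(sum_pi, sum_sigma), 3))
```

Because σ enters as a squared term with a negative coefficient, electron-donating and electron-withdrawing substituents of equal magnitude lower the predicted activity by the same amount.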

Stage 2 61 structures were synthesised, concentrating on hydrophilic substituents to test
the first equation
Anomalies
a) 3-NHCOMe, 3-NHCOEt, 3-NHCOPr.
Activity should drop as alkyl group becomes bigger and more
hydrophobic, but the activity is similar for all three substituents
b) OH, SH, NH2 and NHCOR at position 5 : Activity is greater than expected
c) NHSO2R : Activity is worse than expected
d) 3,5-(CF3)2 and 3,5-(NHMe)2 : Activity is greater than expected
e) 4-Acyloxy : Activity is 5 times greater than expected

Theories
a) 3-NHCOMe, 3-NHCOEt, 3-NHCOPr.
Possible steric factor at work. Increasing the size of R may be good for activity and balances out the detrimental effect of increasing hydrophobicity
b) OH, SH, NH2, and NHCOR at position 5
Possibly involved in H-bonding
c) NHSO2R
Exception to H-bonding theory - perhaps bad for steric or electronic reasons
d) 3,5-(CF3)2 and 3,5-(NHMe)2
The only disubstituted structures where a substituent at position 5 was electron withdrawing
e) 4-Acyloxy
Presumably acts as a prodrug allowing easier crossing of cell membranes.
The group is hydrolysed once across the membrane.

Stage 3 Alter the QSAR equation to take account of new results
$$\log\left(\frac{1}{C}\right) = -0.30\,\Sigma\pi - 1.35\,(\Sigma\sigma)^2 + 2.0\,(\text{F-5}) + 0.39\,(\text{345-HBD}) - 0.63\,(\text{NHSO}_2) + 0.78\,(\text{M-V}) + 0.72\,(\text{4-OCO}) - 0.75$$
Conclusions
(F-5) Electron-withdrawing group at position 5 increases activity
(based on only 2 compounds though)
(3,4,5-HBD) HBD at positions 3, 4,or 5 is good for activity
Term = 1 if a HBD group is at any of these positions
Term = 2 if HBD groups are at two of these positions
Term = 0 if no HBD group is present at these positions
Each HBD group increases activity by 0.39
(NHSO2) Equals 1 if NHSO2 is present (bad for activity by -0.63).
Equals zero if group is absent.
(M-V) Volume of any meta substituent. Large substituents at meta
position increase activity
4-O-CO Equals 1 if acyloxy group is present (activity increases by 0.72).
Equals 0 if group absent
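The way indicator variables enter the Stage 3 equation can be seen by coding it directly. In this sketch the parameter names are illustrative stand-ins for the terms of the equation, and the inputs in the example are hypothetical.

```python
def stage3_log_inv_c(sum_pi=0.0, sum_sigma=0.0, f5=0.0, hbd_345=0,
                     nhso2=0, m_v=0.0, acyloxy_4=0):
    """Stage 3 equation. hbd_345 counts H-bond donors at positions 3, 4 and 5;
    nhso2 and acyloxy_4 are 0/1 indicator variables."""
    return (-0.30 * sum_pi
            - 1.35 * sum_sigma ** 2
            + 2.0 * f5
            + 0.39 * hbd_345
            - 0.63 * nhso2
            + 0.78 * m_v
            + 0.72 * acyloxy_4
            - 0.75)


baseline = stage3_log_inv_c()
# Hypothetical analogue with two H-bond donor groups and no other changes.
two_donors = stage3_log_inv_c(hbd_345=2)
print(round(baseline, 2), round(two_donors, 2))
```

Each indicator simply switches a fixed contribution on or off, so two hydrogen bond donors add 2 x 0.39 to the predicted log(1/C).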
Note
The terms (3,4,5-HBD), (NHSO2), and (4-O-CO) are examples of indicator variables, as used in the Free-Wilson approach, included here in a Hansch equation

Stage 4
37 Structures were synthesised to test steric and F-5 parameters, as well as the effects of
hydrophilic, H-bonding groups
Anomalies
Two H-bonding groups are bad if they are ortho to each other
Explanation
Possibly groups at the ortho position bond with each other rather than with the receptor -
an intramolecular interaction

Stage 5 Revise Equation

$$\log\left(\frac{1}{C}\right) = -0.034\,(\Sigma\pi)^2 - 0.33\,\Sigma\pi + 4.3\,(\text{F-5}) + 1.3\,(\text{R-5}) - 1.7\,(\Sigma\sigma)^2 + 0.73\,(\text{345-HBD}) - 0.86\,(\text{HB-INTRA}) - 0.69\,(\text{NHSO}_2) + 0.72\,(\text{4-OCO}) - 0.59$$

NOTES
a) Increasing the hydrophilicity of substituents allows the identification of an optimum value for π (Σπ = -5). The equation is now parabolic (-0.034 (Σπ)²)
b) The optimum value of π is very low and implies a hydrophilic binding site
c) R-5 implies that resonance effects are important at position 5
d) HB-INTRA equals 1 for H-bonding groups ortho to each other (activity drops by 0.86) and equals 0 if H-bonding groups are not ortho to each other
e) The steric parameter is no longer significant and is not present
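The optimum quoted in note (a) can be checked from the two Σπ terms of the Stage 5 equation, -0.034(Σπ)² - 0.33Σπ, holding all other terms fixed. A short worked derivation:

```latex
\frac{\partial}{\partial(\Sigma\pi)}\left[-0.034\,(\Sigma\pi)^2 - 0.33\,\Sigma\pi\right]
  = -0.068\,\Sigma\pi - 0.33 = 0
\quad\Longrightarrow\quad
\Sigma\pi = -\frac{0.33}{0.068} \approx -4.9
```

The second derivative, -0.068, is negative, so this stationary point is a maximum, and it rounds to the Σπ = -5 given in the notes.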

Stage 6 Optimum Structure and binding theory
[Figure: optimum structure bearing NHCOCH(OH)CH2OH groups at positions 3 and 5, drawn with its proposed interactions with binding-site groups (X, XH, NH3, RHN)]

NOTES on the optimum structure
•It has unusual NHCOCH(OH)CH2OH groups at positions 3 and 5
•It is 1000 times more active than the lead compound
•The substituents at positions 3 and 5 are highly polar, are capable of hydrogen bonding, are at the meta positions and not ortho to each other, and allow a favourable F-5 parameter for the substituent at position 5
•The structure has a negligible (Σσ)² value

3D-QSAR
Notes
•Physical properties are measured for the molecule as a whole
•Properties are calculated using computer software
•No experimental constants or measurements are involved
•Properties are known as ‘Fields’
•Steric field - defines the size and shape of the molecule
•Electrostatic field - defines electron rich/poor regions of molecule
•Hydrophobic properties are relatively unimportant
Advantages over QSAR
•No reliance on experimental values
•Can be applied to molecules with unusual substituents
•Not restricted to molecules of the same structural class
•Predictive capability

3D-QSAR
Method
•Comparative molecular field analysis (CoMFA) - Tripos
•Build each molecule using modelling software
•Identify the active conformation for each molecule
•Identify the pharmacophore
[Figure: example molecule (NHCH3 group and three OH groups) taken through three steps: build 3D model, identify active conformation, define pharmacophore]

3D-QSAR
Method
•Place the pharmacophore into a lattice of grid points
•Each grid point defines a point in space
•Position molecule to match the pharmacophore
[Figure: molecule positioned inside a lattice of grid points]

3D-QSAR
Method
•A probe atom is placed at each grid point in turn
•Probe atom = a proton or sp3 hybridised carbocation
•Measure the steric or electrostatic interaction of the probe atom with the molecule at each grid point
[Figure: probe atom at a grid point of the lattice surrounding the molecule]

3D-QSAR
Method
•The closer the probe atom to the molecule, the higher the steric energy
•Define the shape of the molecule by identifying grid points of equal steric energy (contour line)
•Favorable electrostatic interactions with the positively charged probe indicate molecular regions which are negative in nature
•Unfavorable electrostatic interactions with the positively charged probe indicate molecular regions which are positive in nature
•Define electrostatic fields by identifying grid points of equal energy (contour line)
•Repeat the procedure for each molecule in turn
•Compare the fields of each molecule with their biological activity
•Identify steric and electrostatic fields which are favorable or unfavorable for activity
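The probe-and-grid procedure can be sketched in a few lines. The slides do not state the energy functions, so this example assumes a Lennard-Jones 12-6 term for the steric field and a Coulomb term for the electrostatic field, with an energy cutoff near the atoms; the two-atom "molecule", its charges and all parameter values are hypothetical.

```python
import itertools
import math

# Hypothetical toy molecule: (x, y, z) in angstroms and partial charge in e.
ATOMS = [
    ((0.0, 0.0, 0.0), -0.4),
    ((1.2, 0.0, 0.0), +0.4),
]
PROBE_CHARGE = 1.0         # positively charged probe, as in the method above
EPSILON, R_MIN = 0.1, 3.4  # assumed Lennard-Jones well depth and optimum distance
COULOMB = 332.0            # approx. kcal*angstrom/(mol*e^2) conversion factor
CUTOFF = 30.0              # assumed cap on energies close to the atoms


def probe_fields(point):
    """Return (steric, electrostatic) energy of the probe at one grid point."""
    steric = electrostatic = 0.0
    for position, charge in ATOMS:
        r = max(math.dist(point, position), 1e-6)  # avoid division by zero
        steric += EPSILON * ((R_MIN / r) ** 12 - 2 * (R_MIN / r) ** 6)
        electrostatic += COULOMB * PROBE_CHARGE * charge / r
    steric = min(steric, CUTOFF)
    electrostatic = max(min(electrostatic, CUTOFF), -CUTOFF)
    return steric, electrostatic


# Place the probe at every point of a small cubic lattice in turn.
axis = [-4.0, -2.0, 0.0, 2.0, 4.0]
fields = {p: probe_fields(p) for p in itertools.product(axis, repeat=3)}
print(len(fields), fields[(-2.0, 0.0, 0.0)], fields[(4.0, 0.0, 0.0)])
```

Each molecule in the series yields one such set of values, which becomes one row of the table analysed by PLS.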

3D-QSAR
Method
•Tabulate fields for each compound at each grid point

Compound   Biological activity   Steric fields (S) at grid points (001-998)   Electrostatic fields (E) at grid points (001-998)
                                 S001 S002 S003 S004 S005 etc                 E001 E002 E003 E004 E005 etc
1          5.1
2          6.8
3          5.3
4          6.4
5          6.1

•Partial least squares analysis (PLS) gives the QSAR equation:
Activity = aS001 + bS002 +……..mS998 + nE001 +…….+yE998 + z
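PLS copes with this table even though there are far more field columns than compounds. Below is a minimal single-response PLS sketch (the NIPALS PLS1 variant, chosen here as an assumption since the slides do not name an algorithm) in plain Python. The five activities are those in the table; the six field columns are hypothetical stand-ins for the S and E values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, [(weights, loadings, q), ...])."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred fields
    f = [v - y_mean for v in y]                                # centred activity
    components = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # latent variable scores
        tt = dot(t, t)
        load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate: remove what this latent variable explains.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    x_mean, y_mean, components = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


# Hypothetical field values (rows = compounds 1-5); activities from the table.
X = [
    [0.2, 1.1, 0.5, -0.3, 0.9, 0.1],
    [1.5, 0.3, 0.8, 0.4, 0.2, 0.7],
    [0.4, 0.9, 0.1, -0.1, 1.0, 0.3],
    [1.1, 0.5, 0.9, 0.2, 0.4, 0.8],
    [0.9, 0.6, 0.4, 0.6, 0.5, 0.2],
]
y = [5.1, 6.8, 5.3, 6.4, 6.1]


def training_sse(n_components):
    model = pls1_fit(X, y, n_components)
    return sum((pls1_predict(model, row) - target) ** 2 for row, target in zip(X, y))


for a in (1, 2):
    print(a, round(training_sse(a), 4))
```

In practice the number of latent variables is chosen by cross-validation rather than by the training error shown here.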

3D-QSAR
Method
•Define fields using contour maps around a representative molecule

3D-QSAR CASE STUDY
Tacrine
Anticholinesterase used in the treatment of Alzheimer’s disease
[Structure: tacrine]

3D-QSAR CASE STUDY
Conventional QSAR Study
12 analogues were synthesised to relate their activity with the hydrophobic, steric and electronic properties of substituents at positions 6 and 7
[Structure: tacrine analogue with substituents R1 and R2 at positions 6 and 7]
Substituents: CH3, Cl, NO2, OCH3, NH2, F
(Spread of values with no correlation)

$$\log\left(\frac{1}{C}\right) = \mathrm{pIC}_{50} = -3.09\,\mathrm{MR}(\mathrm{R1}) + 1.43\,F(\mathrm{R1},\mathrm{R2}) + 7.00$$

Conclusions
Large groups at position 7 are detrimental
Groups at positions 6 & 7 should be electron-withdrawing
No hydrophobic effect

3D-QSAR CASE STUDY
CoMFA Study
Analysis includes tetracyclic anticholinesterase inhibitors (II)
[Structure II: tetracyclic tacrine-type inhibitor with substituents R1 to R5; ring positions 1, 2, 3, 7 and 8 are labelled]
•Not possible to include above structures in a conventional QSAR analysis
since they are a different structural class
•Molecules belonging to different structural classes must be aligned
properly according to a shared pharmacophore

3D-QSAR CASE STUDY
Possible Alignment
Overlay
Good overlay but assumes similar binding modes

3D-QSAR CASE STUDY
X-Ray Crystallography
A tacrine / enzyme complex was crystallised and analysed
Results revealed the mode of binding for tacrine
Molecular modelling was used to modify tacrine to structure (II) while still bound to the binding site (in silico)
The complex was minimized to find the most stable binding mode for structure II
The binding mode for (II) proved to be different from tacrine

3D-QSAR CASE STUDY
Alignment
Analogues of each type of structure were aligned according to the parent structure
Analysis shows the steric factor is solely responsible for activity
[Figure: CoMFA steric contour map around tacrine, with positions 6 and 7 labelled]
Blue areas - addition of steric bulk increases activity
Red areas - addition of steric bulk decreases activity

3D-QSAR CASE STUDY
Prediction
6-Bromo analogue of tacrine predicted to be active (pIC50 = 7.40)
Actual pIC50 = 7.18
[Structure: 6-bromo analogue of tacrine]

Advances in 3DQSAR
1. Molecular shape analysis (MSA)
In molecular shape analysis, matrices that include the common overlap steric
volume and potential energy fields between pairs of superimposed molecules are
correlated with the activity of a series of compounds. MSA using common
volumes also provides some insight into the shape and size of the receptor-binding site.
2. Minimal topological difference (MTD)
Simon and his coworkers developed a quantitative 3D approach, the minimal
steric (topologic) difference approach. MTD uses a ‘hypermolecule’ concept for
molecular alignment, correlating vertices (atoms) in the hypermolecule (a superposed
set of molecules having common vertices) with activity differences in the series.

Advances in 3DQSAR
3. Comparative molecular moment analysis (CoMMA)
CoMMA is an alignment-independent approach. This 3D-QSAR analysis uses a
succinct set of descriptors that characterize the three-dimensional information
contained in the moments of molecular mass and charge, up to and inclusive of
second order.
4. Hypothetical Active Site Lattice (HASL)
HASL is an inverse grid-based methodology, developed in 1986-88, that allows the
mathematical construction of a hypothetical active site lattice which can model
enzyme-inhibitor interactions and provide predictive structure-activity relationships
for a set of competitive inhibitors. It relies on computer-assisted molecule-to-molecule
matching using a multidimensional representation of the inhibitor molecules. The
results of this matching are used to construct the lattice of points that models the
enzyme-inhibitor interactions.

Advances in 3DQSAR
5. Self Organizing Molecular Field Analysis (SOMFA)
SOMFA uses mean-centred activity, which divides the molecule set into actives (+)
and inactives (-), and a grid probe process that penetrates the overlaid molecules. The
resulting steric and electrostatic potentials are mapped onto the grid points and
correlated with activity using linear regression.
6. Comparative Molecular Field Analysis (CoMFA)
CoMFA, introduced in 1988, is a grid-based technique and the most widely used tool
for three-dimensional structure-activity relationship studies. It is based on the
assumption that, since drug-receptor interactions are noncovalent in most cases,
changes in the biological activities or binding affinities of sample compounds
correlate with changes in the steric and electrostatic fields of these molecules. The field
values are correlated with biological activities by partial least squares (PLS) analysis.
7. Comparative Molecular Similarity Indices Analysis (CoMSIA)
CoMSIA is an extension of the CoMFA methodology in which molecular similarity
indices serve as the set of field descriptors for the 3D-QSAR analysis.

Advances in QSAR
3D Pharmacophore modeling
Pharmacophore modeling is a powerful method for identifying new potential drugs.
Pharmacophore models are hypotheses on the 3D arrangement of structural properties,
such as hydrogen bond donor and acceptor properties, hydrophobic groups and aromatic
rings, of compounds that bind to the biological target. The pharmacophore concept
assumes that structurally diverse molecules bind to their receptor site in a similar way,
with their pharmacophoric elements interacting with the same functional groups of the
receptor.

4D-QSAR
4D-QSAR analysis incorporates conformational and alignment freedom into
the development of 3D-QSAR models for training sets of structure-activity
data by performing ensemble averaging, the fourth "dimension". The fourth
dimension in 4D-QSAR is the possibility of representing each molecule by an
ensemble of conformations, orientations, and protonation states, thereby
significantly reducing the bias associated with the choice of the ligand
alignment. The most likely bioactive conformation/alignment is identified by
the genetic algorithm.
5D-QSAR
The fifth dimension in 5D-QSAR is the possibility of representing an ensemble of
up to six different induced-fit models. The model yielding the most
predictive surrogates is selected during the simulated evolution.
6D-QSAR
6D-QSAR allows for the simultaneous evaluation of different solvation
models. The software BiografX, for the Unix platform, combines the
multi-dimensional QSAR tools Quasar, Raptor and Symposar under a single
user interface. The Macintosh version was released on March 15, 2007 and
the PC/Linux version on September 15, 2007.

Application
Chemical
• One of the first historical QSAR applications was to predict boiling
points.
• It is well known that within a particular family of chemical
compounds, especially in organic chemistry, there are strong
correlations between structure and observed properties.
• For example, boiling point rises steadily with the number of
carbons, and this trend serves as a means of predicting the boiling
points of higher alkanes.
• Other long-standing applications are the Hammett equation, the Taft
equation and pKa prediction methods.

Application
Biological:
• The biological activity of molecules is usually measured in assays to
establish the level of inhibition of particular signal
transduction or metabolic pathways.
• Drug discovery often involves the use of QSAR to identify chemical
structures that could have good inhibitory effects on specific targets and
have low toxicity (non-specific activity). Of special interest is the
prediction of partition coefficient log P, which is an important measure
used in identifying "druglikeness" according to Lipinski's Rule of Five.
• QSAR can also be used to study the interactions between the structural
domains of proteins. Protein-protein interactions can be quantitatively
analyzed for structural variations resulting from site-directed mutagenesis.

Applications
• Prediction of activity
• Diagnosis of mechanism of drug action
• Lead compound optimization
• Modeling of ADME parameters
• Ecotoxicological modeling
• Property prediction

Success stories of CADD
• Atypical antipsychotics
• Cimetidine
• Selective COX-2 inhibitor NSAIDs
• Enfuvirtide, a peptide HIV entry inhibitor
• Nonbenzodiazepines like zolpidem and zopiclone
• Raltegravir, an HIV integrase inhibitor
• SSRIs (selective serotonin reuptake inhibitors), a
class of antidepressants
• Zanamivir, an antiviral drug

Validation of QSAR Models - Strategies and Importance
• Validation methods are needed to establish the predictiveness of a model on
unseen data and to help determine the complexity of equation that the amount
of data justifies.
• A QSAR model can be validated using the data that created the model (an
internal method) or using a separate data set (an external method).
• Internal methods of validating a model include the least squares fit (R²),
cross-validation (Q²), adjusted R² (R²adj), the chi-squared test (χ²),
root-mean-squared error (RMSE), bootstrapping and scrambling (Y-randomization).
• The best method of validating a model is an external method, such as evaluating
the QSAR model on a test set of compounds.
• These statistical methodologies are used to ensure that the models created are
sound and unbiased (a “good model”).
• A poor model can do more harm than good, so confirming that a model is a “good
model” is of utmost importance.
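Three of the internal statistics listed above can be computed in a few lines. The sketch below fits a one-descriptor least-squares line and reports R², RMSE and the leave-one-out Q², taking Q² as 1 - PRESS/SS_total (one common definition, assumed here); the descriptor and activity values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def total_ss(ys):
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def r_squared(xs, ys):
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_ss(ys)


def rmse(xs, ys):
    a, b = fit_line(xs, ys)
    return (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def q_squared_loo(xs, ys):
    """Leave-one-out cross-validation: refit without each compound, then predict it."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1 - press / total_ss(ys)


# Hypothetical descriptor values and activities.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
activity = [5.0, 5.6, 5.9, 6.6, 6.9, 7.6]

print(round(r_squared(descriptor, activity), 3),
      round(q_squared_loo(descriptor, activity), 3),
      round(rmse(descriptor, activity), 3))
```

Q² is never larger than R² for a least-squares fit, and a large gap between the two is a warning sign of overfitting.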

Statistical Methods for building QSAR models
• Statistical or chemometric techniques form the mathematical foundation
for building a QSAR model.
• A brief history of QSAR analysis and some of the methods are described
below.
• Among the various statistical methods for QSAR, linear regression
analysis is the most easily interpretable.
• These regressions directly correlate independent variables (x) with a
dependent variable (y).
• The resulting model can be used to predict y from the x variables.
• The data can be either qualitative or quantitative.
• Variants include SLR, MLR, and stepwise MLR, each explained briefly
below.

Linear regression analysis (RA)
• Simple linear regression (SLR)
• Multiple linear regression (MLR)
• Stepwise multiple linear regression
Multivariate data analysis
• Principal component analysis (PCA)
• Principal components regression (PCR)
• Partial least square analysis (PLS)
• Genetic function approximation (GFA)
• Genetic partial least squares (G/PLS)
Pattern recognition
• Cluster analysis
• Artificial neural networks (ANNs)
• k-Nearest neighbor (kNN)

Simple linear regression (SLR)
• This method performs a standard linear regression calculation to
generate a QSAR model as an equation with a single independent
descriptor x and a dependent variable y.
• The technique is useful for exploring structure-activity relationships
one descriptor at a time, identifying the most important descriptors
governing the activity,
• although interactions between multiple descriptors are neglected.
• The simple linear regression can be expressed by the following
equation (Eq. 1):
$$y = a + bx \qquad (1)$$
• where y is the dependent variable, x is the independent variable, a is
the constant, and b is the regression coefficient.

Multiple linear regression (MLR)
• This method is the extension of SLR to more than one dimension.
• Standard multivariable regression calculations are performed, and the drug
property is modelled using all of the descriptors under investigation.
• Goodness of fit is judged by the multiple correlation coefficient (r), its
square (r2) and the t-values of the coefficients.
• Predictive ability is judged by q2, the cross-validated correlation
coefficient, usually obtained by the leave-one-out method.
• This method is also known as the linear free-energy relationship (LFER)
method. The relationship is expressed as a single multiple-term linear
equation (Eq. 2) as follows:
$$y = a + b_1x_1 + b_2x_2 + \dots + b_nx_n \qquad (2)$$
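A multiple-term linear equation of this kind can be fitted by solving the normal equations. The sketch below does so in plain Python for two descriptors; the descriptor values are hypothetical, and the activities are generated from a made-up equation (y = 5.0 + 0.8x1 - 1.2x2) so that the fit can be checked against known coefficients.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(X, y):
    """Return [a, b1, ..., bn] minimising the squared error of y = a + sum(bi * xi)."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 carries the constant a
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical descriptors (x1, x2) for six compounds.
X = [(0.0, 0.0), (0.71, 0.23), (0.56, -0.17), (-0.02, -0.27), (-0.67, -0.37), (-0.28, 0.78)]
# Synthetic activities from a made-up equation, so the true coefficients are known.
y = [5.0 + 0.8 * x1 - 1.2 * x2 for x1, x2 in X]

coefficients = fit_mlr(X, y)
print([round(c, 3) for c in coefficients])
```

With real data the residuals are not zero, and the fit would then be assessed with r2 and q2 as described above.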

Stepwise multiple linear regression
• This commonly used variant of MLR creates a multiple-term linear
equation, but not all the independent variables are used.
• It is useful when the number of descriptors is large and the main
descriptors are unknown. Moreover, orthogonal latent variables
can be used in MLR (Berk, 2003b).

Partial least squares (PLS)
• PLS gives a statistically robust solution even when the independent
variables are highly interrelated among themselves, or when the
number of independent variables exceeds the number of observations.
• PLS is an iterative regression method that produces its solutions based on
linear transformation of a large number of original descriptors to a small
number of new orthogonal terms called latent variables (Wold et al., 1993).
• For these reasons, PLS is regarded as a standard statistical method in QSAR.

Principal components analysis (PCA)
• This method creates a new set of orthogonal descriptors, referred to as
principal components (PCs), which describe most of the information
contained in the independent variables in order of decreasing variance.
• PCA does not itself generate a QSAR model, but it reveals relationships
among the different variables.
• PCA reduces the dimensionality of the descriptor data set to the actual
amount of information it contains.
• To generate a multiple-term linear equation, principal components
regression (PCR) uses the scores from the PCA decomposition as
regressors in the QSAR model.
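To make the idea of a principal component concrete, the sketch below extracts the first PC of a small descriptor table by power iteration on the covariance matrix. Power iteration is used here only because it fits in a few lines of plain Python; the descriptor values are hypothetical, with the second column roughly twice the first.

```python
def first_principal_component(rows, iterations=100):
    """Return (loadings, fraction of variance explained, scores) for the first PC."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    total = sum(cov[a][a] for a in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, variance / total, scores


# Hypothetical descriptors: column 2 is about twice column 1, column 3 is noise.
descriptors = [
    (1.0, 2.1, 0.3),
    (2.0, 3.9, 0.1),
    (3.0, 6.2, 0.4),
    (4.0, 7.8, 0.2),
    (5.0, 10.1, 0.3),
]

loadings, explained, scores = first_principal_component(descriptors)
print([round(x, 3) for x in loadings], round(explained, 4))
```

The scores are the values that PCR would use as regressors in place of the original correlated descriptors.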

Genetic function approximation (GFA)
• This method serves as an alternative to standard regression analysis
for building QSAR equations (Rogers and Hopfinger, 1994).
• It can build linear as well as higher-order non-linear equations. Genetic
partial least squares (G/PLS or GA-PLS) is an important tool that
combines the best features of GFA and PLS.
• The method is widely used by researchers.

Cluster analysis
• Cluster analysis is a pattern recognition method used to investigate
the relationships between observations described by many properties
and to partition the data set into categories of similar elements.
• It also identifies which subsets share similar physical properties.

Artificial neural networks (ANNs)
• ANNs are useful tools in QSAR/QSPR studies, particularly in cases where
it is difficult to specify an exact mathematical model for describing a given
structure-property relationship.
• Most of these studies used neural networks based on the back-propagation
learning algorithm, which has some disadvantages such as local minima,
slow convergence, time-consuming non-linear iterative optimization, and
difficulty in determining the optimum network configuration (Walczak and
Massart, 2000).
• The method of artificial neural networks was inspired by the real neurons
present in an animal brain. ANNs are parallel computational systems
consisting of groups of highly interconnected processing elements called
neurons.

• The input layer is the first layer; each of its neurons receives data from the
user, corresponding to one of the independent variables used as inputs in
QSAR.
• After the input layer come one or more layers of neurons, collectively known
as the hidden layers.
• The last layer is the output layer, and its neurons handle the output from the
network.
• Each layer may make its own computations and pass the results on to
another layer.
• The result of the transfer function is communicated to the neurons in the
output layer, where the results are interpreted and finally presented.
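The flow of data through the layers can be illustrated with a forward pass through a tiny network: two input descriptors, one hidden layer of three neurons with a sigmoid transfer function, and one linear output neuron. All weights and biases below are hypothetical; in practice they would be learned, for example by back-propagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """One layer: each neuron applies the activation to its weighted input sum."""
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
HIDDEN_W = [[0.5, -0.3], [-0.8, 0.2], [0.1, 0.9]]
HIDDEN_B = [0.0, 0.1, -0.2]
OUTPUT_W = [[1.2, -0.7, 0.4]]
OUTPUT_B = [5.0]


def predict(descriptors):
    """Input layer -> hidden layer (sigmoid) -> output layer (linear)."""
    hidden = layer(descriptors, HIDDEN_W, HIDDEN_B, sigmoid)
    return layer(hidden, OUTPUT_W, OUTPUT_B, lambda z: z)[0]


print(round(predict([0.71, 0.23]), 3))
```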

k-Nearest neighbor
• kNN approaches are based on the distances between an unknown object and
all the objects in the training set.
• Based on these distances, the training-set objects most similar to the
unknown object are selected.
• Finally, the optimum k value is selected by optimizing the categorization of
a test set of samples.
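The three steps above can be sketched as follows. This version predicts a numerical activity as the mean over the k nearest training compounds (a classification variant would take a majority vote instead) and picks k by the squared error on a small test set; the Euclidean distance, the descriptor values and the activities are all assumptions made for illustration.

```python
import math


def knn_predict(training, query, k):
    """Mean activity of the k training compounds closest to the query descriptors."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return sum(activity for _, activity in nearest) / k


def choose_k(training, test, candidates):
    """Pick the k with the lowest squared prediction error on the test set."""
    def squared_error(k):
        return sum((knn_predict(training, x, k) - y) ** 2 for x, y in test)
    return min(candidates, key=squared_error)


# Hypothetical (descriptors, activity) pairs.
training = [
    ((0.0, 0.0), 5.0),
    ((0.7, 0.2), 5.9),
    ((0.6, -0.2), 5.7),
    ((-0.7, -0.4), 4.2),
    ((-0.3, 0.8), 4.6),
]
test = [((0.5, 0.0), 5.6), ((-0.5, -0.2), 4.4)]

best_k = choose_k(training, test, candidates=[1, 2, 3])
print(best_k, round(knn_predict(training, (0.5, 0.0), best_k), 2))
```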
