The screening of chemical libraries with traditional methods, such as high-throughput screening (HTS), is expensive and time-consuming. Quantitative structure–activity relationship (QSAR) modeling is an alternative method that can assist in the selection of lead molecules by using the information from
reference active and inactive compounds. This approach requires good molecular descriptors that are representative of the molecular features responsible for the relevant molecular activity.
2. QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR)
Quantitative structure-activity relationships correlate, within congeneric series of compounds, chemical or
biological activities with either certain structural features or atomic, group, or molecular descriptors.
[Figure: QSAR workflow diagram with the labels Molecular Structure, Representation, Descriptors, Feature Selection & Mapping, and Activities.]
3. DEFINITION OF DESCRIPTOR
The descriptor is the final result of a logical and mathematical procedure which transforms
chemical information encoded within a symbolic representation of a molecule into a useful
number, or the result of some standardized experiment.
Descriptor families
1. Topological
2. Fragments
3. Receptor surface
4. Structural
5. Information-content
6. Spatial
7. Electronic
8. Thermodynamic
9. Conformational
10. Quantum mechanical
4. Topological Descriptors
Topological indices are a suitable method of translating chemical constitution
into numerical values that can be used for correlations with physical properties
or indeed for QSAR studies.
Topological indices (TIs) represent molecules as molecular graphs. In a
molecular graph the atoms are represented as dots (vertices), which are
connected to each other by lines (edges) representing the chemical bonds.
From the molecular graph, paths of a certain length can be calculated.
A path of a certain length (m) represents two atoms that are connected by m
bonds along the shortest pathway between them.
5. Continued
For example, a path of length one represents two atoms connected by a bond,
and a path of length two indicates two bonds between the atoms. Another term is
a walk of a certain length.
A walk of length one for a given atom is equal to the number of atoms to which
it is connected.
A walk of length m for a given atom is equal to the sum of walks of length m-1
of all its neighboring atoms (Morgan’s summation procedure).
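The summation procedure for walks can be sketched in a few lines of Python. This is only an illustration: the function name `walk_counts` and the dictionary-based adjacency list are assumptions made for the example, and the molecule is the hydrogen-suppressed graph of 2-methylpropane.

```python
def walk_counts(adjacency, max_length):
    """Return {m: [walks of length m for each atom]} for m = 1..max_length.

    adjacency maps atom index (0..n-1) to the list of bonded atom indices.
    """
    n = len(adjacency)
    # Walks of length one: the number of neighbours of each atom.
    current = [len(adjacency[i]) for i in range(n)]
    result = {1: current}
    for m in range(2, max_length + 1):
        # Walks of length m: sum the length-(m-1) walks of all neighbours.
        current = [sum(current[j] for j in adjacency[i]) for i in range(n)]
        result[m] = current
    return result


# Hydrogen-suppressed graph of 2-methylpropane; atom 0 is the central carbon.
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
walks = walk_counts(isobutane, 3)
print(walks[1])  # [3, 1, 1, 1]
print(walks[2])  # [3, 3, 3, 3]
print(walks[3])  # [9, 3, 3, 3]
```

Each round reuses only the previous round's values, so the cost grows linearly with the requested walk length.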
TIs are usually calculated on a hydrogen-suppressed molecular graph, i.e. a
molecular graph from which the hydrogen atoms are excluded.
Numerous topological indices have been created and used in QSAR studies.
Their calculation is easy and based on the molecular 2-D structure only, thus not
requiring conformational analysis or 3-D optimization of the structure. The
main drawback of TIs is their complex and difficult interpretation.
6. Wiener number
The Wiener number was introduced in 1947 and is named after its inventor, Harry Wiener
(1924-1998). His life and his achievements, first in chemistry and later in
medicine, have been well presented by Rouvray.
It is the total distance between all carbon atoms (the sum of the distances between each
pair of carbon atoms in the molecule, in terms of carbon-carbon bonds).
1. The smaller this number, the more compact the molecule.
2. Method of calculation: multiply the number of carbon atoms on one side of any
bond by the number on the other side; W is the sum of these products over all bonds (this shortcut applies to acyclic structures).
3. W can also be obtained by simply adding all the elements of the graph distance
matrix above the main diagonal.
4. The Hosoya topological index Z is obtained by counting the sets of k mutually disjoint edges in a graph
(for k = 0, 1, 2, 3, ...).
5. Z counts all sets of non-adjacent bonds in a structure.
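The two routes to W and the counting definition of Z can be checked against each other with a small sketch. The function names and the adjacency-list input are assumptions for illustration; the bond-product routine is valid for acyclic graphs only, and the Hosoya routine enumerates every bond subset, so it is suitable for small molecules only.

```python
from collections import deque
from itertools import combinations


def distances_from(adjacency, start):
    """Breadth-first search: number of bonds from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adjacency):
    """Sum of the distance-matrix elements above the main diagonal."""
    total = 0
    for u in adjacency:
        dist = distances_from(adjacency, u)
        total += sum(d for v, d in dist.items() if v > u)
    return total


def wiener_by_bond_products(adjacency):
    """Acyclic graphs only: atoms on one side of each bond times atoms on the other."""
    n = len(adjacency)
    total = 0
    for u in adjacency:
        for v in adjacency[u]:
            if u < v:
                # Collect the atoms reachable from u without crossing bond u-v.
                seen = {u}
                stack = [u]
                while stack:
                    a = stack.pop()
                    for b in adjacency[a]:
                        if b not in seen and not (a == u and b == v):
                            seen.add(b)
                            stack.append(b)
                total += len(seen) * (n - len(seen))
    return total


def hosoya_index(adjacency):
    """Count all sets of mutually non-adjacent bonds, including the empty set."""
    edges = [(u, v) for u in adjacency for v in adjacency[u] if u < v]
    count = 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            atoms = [a for edge in subset for a in edge]
            if len(atoms) == len(set(atoms)):  # no atom shared by two bonds
                count += 1
    return count


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(n_butane), wiener_by_bond_products(n_butane))  # 10 10
print(wiener_index(isobutane))                                    # 9
print(hosoya_index(n_butane), hosoya_index(isobutane))            # 5 4
```

The branched isomer gives the smaller W, in line with the compactness remark in item 1.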
7. Balaban number
The Balaban index, proposed by A.T. Balaban, is also called the
average distance-sum connectivity or J index. It appears to be a very useful
molecular descriptor with attractive properties.
The Balaban index J is an index defined for a graph G with vertex set V(G) and
edge set E(G). Here $d_G(u,v)$ denotes the distance between vertices u and v in G, and
$D_G(u) = \sum_{v \in V(G)} d_G(u,v)$ is the distance sum of vertex u in G, i.e., the row
sum of the distance matrix of G corresponding to u. The Balaban index of G is defined
as
$$J(G) = \frac{m}{\mu + 1} \sum_{uv \in E(G)} \frac{1}{\sqrt{D_G(u)\, D_G(v)}}$$
where m is the number of edges and μ is the cyclomatic number of G.
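A minimal sketch of this definition follows, assuming a connected hydrogen-suppressed graph given as an adjacency list. The function names are illustrative, and the cyclomatic number is taken as m - n + 1, which holds for connected graphs.

```python
from collections import deque
from math import sqrt


def distance_sums(adjacency):
    """Row sums of the distance matrix, one per vertex."""
    sums = {}
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sums[start] = sum(dist.values())
    return sums


def balaban_j(adjacency):
    """Balaban J for a connected graph (assumption: no disconnected fragments)."""
    n = len(adjacency)
    edges = [(u, v) for u in adjacency for v in adjacency[u] if u < v]
    m = len(edges)
    mu = m - n + 1  # cyclomatic number of a connected graph
    d = distance_sums(adjacency)
    return m / (mu + 1) * sum(1 / sqrt(d[u] * d[v]) for u, v in edges)


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(round(balaban_j(n_butane), 4))  # 1.9747
```

For n-butane the distance sums are 6, 4, 4, and 6, so the three bonds contribute 1/√24, 1/√16, and 1/√24 before the factor m/(μ + 1) = 3 is applied.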
8. Randic number
The Molecular Connectivity (Randic) Index (χ) was introduced by Randic for
the characterization of molecular branching. It is based on the concept of the
degree Di of the vertex i in the hydrogen-suppressed molecular graph. Di is
equal to the number of bonds from the atom (vertex) i to non-hydrogen atoms.
The term valency of the vertex is also often used.
9. Connectivity indices
The connectivity index of order k may be derived from the adjacency matrix and is
normally written as $^k\chi_t$. The order k is between 0 and 4 and is the number of
bonds connecting the non-hydrogen atoms which appear in a given sub-structure.
$$^k\chi_t = \sum_{j=1}^{^k n_t} \Big( \prod_{i \in S_j} \delta_i \Big)^{-1/2}$$
In the above equation, $\delta_i$ is the number of simple (i.e., sigma) bonds of atom i to
non-hydrogen atoms, $S_j$ represents the jth sub-structure of order k and type t, and
$^k n_t$ is the total number of sub-graphs of order k and type t that can be identified
in the molecular structure. The types used are path (p), cluster (c), and
path-cluster (pc).
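For the path type, the formula can be sketched as below. The function names are assumptions for illustration, only path sub-structures are handled (cluster and path-cluster types are not), and the order k is treated as the number of bonds in the path, so order 1 corresponds to the Randic index.

```python
from math import prod


def paths_of_length(adjacency, k):
    """All simple paths with k bonds, each reported once as a tuple of atoms."""
    found = set()

    def extend(path):
        if len(path) == k + 1:
            # Keep one canonical orientation so each path is counted once.
            found.add(min(path, tuple(reversed(path))))
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:
                extend(path + (nxt,))

    for start in adjacency:
        extend((start,))
    return sorted(found)


def path_connectivity_index(adjacency, k):
    """Order-k path index: sum over paths of (product of vertex degrees)^(-1/2)."""
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    return sum(
        prod(degree[a] for a in path) ** -0.5
        for path in paths_of_length(adjacency, k)
    )


n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(round(path_connectivity_index(n_butane, 1), 4))   # 1.9142
print(round(path_connectivity_index(isobutane, 1), 4))  # 1.7321
print(round(path_connectivity_index(n_butane, 2), 4))   # 1.0
```

The branched isomer has the lower first-order value, which is the branching sensitivity the index was designed for.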
10. Their numerous successful applications in various areas of physics, chemistry,
biology, pharmacology (drug design), and environmental sciences outnumber those of all
other existing topological indices, whose number is approaching one hundred.
There are two major reasons for this: first, these indices are based on
chemical, structural (topological and geometrical), and mathematical grounds,
and second, they were developed with the idea to parallel important physicochemical
properties like boiling points, mobility on chromatography columns,
enthalpies of formation, and total molecular surface areas.
11. Molecular shape indices (Kappa)
The molecular shape indices are the basis of a method of molecular structure
quantitation in which attributes of molecular shape are encoded into three
indices (kappa values). These kappa values are derived from counts of one-bond,
two-bond, and three-bond fragments, each count being made
relative to fragment counts in reference structures which possess a maximum
and minimum value for that number of atoms.
The calculation of the indices begins with the reduction of the molecule to the
hydrogen-suppressed skeleton.
The counts of one-, two-, and three-bond fragments, $^1P$, $^2P$, and $^3P$, respectively, are
then made.
These values are used in calculating the kappa indices $^1\kappa$, $^2\kappa$, and $^3\kappa$. The
calculation of each index is made using the values of the $^mP_{min}$ and $^mP_{max}$ counts
for graphs with the same number of atoms, A.
12. Continued
These latter counts are the minimum and maximum numbers of paths of that
order (m) that can be found in a real or hypothetical structure. Specifically, the
$^mP_{min}$ for all orders of kappa is the count of paths of length m in the linear
skeleton.
Kappa 1 shows the degree of complexity of the bonding pattern.
Kappa 2 indicates the degree of linearity of the bonding pattern.
Kappa 3 indicates the degree of branching at the center of a molecule.
Kappa value is directly proportional to biological activity.
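The slides do not spell out the closed-form expressions. The sketch below assumes the commonly quoted Kier forms, in which the minimum and maximum path counts have already been substituted: 1κ = A(A-1)²/(1P)², 2κ = (A-1)(A-2)²/(2P)², and 3κ = (A-1)(A-3)²/(3P)² for odd A or (A-3)(A-2)²/(3P)² for even A. The function names are illustrative, and the examples are acyclic skeletons.

```python
def count_paths(adjacency, m):
    """Number of simple paths with m bonds (m >= 1), each counted once."""

    def extend(path):
        if len(path) == m + 1:
            return 1
        return sum(
            extend(path + [nxt])
            for nxt in adjacency[path[-1]]
            if nxt not in path
        )

    # Every path is found once from each of its two ends.
    return sum(extend([start]) for start in adjacency) // 2


def kappa_indices(adjacency):
    """Return (kappa1, kappa2, kappa3); None where the path count is zero."""
    a = len(adjacency)  # number of non-hydrogen atoms, A
    p1, p2, p3 = (count_paths(adjacency, m) for m in (1, 2, 3))
    k1 = a * (a - 1) ** 2 / p1 ** 2 if p1 else None
    k2 = (a - 1) * (a - 2) ** 2 / p2 ** 2 if p2 else None
    if not p3:
        k3 = None
    elif a % 2:  # odd number of atoms
        k3 = (a - 1) * (a - 3) ** 2 / p3 ** 2
    else:        # even number of atoms
        k3 = (a - 3) * (a - 2) ** 2 / p3 ** 2
    return k1, k2, k3


n_pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(kappa_indices(n_pentane))   # (5.0, 4.0, 4.0)
print(kappa_indices(neopentane))  # (5.0, 1.0, None)
```

Neopentane has no three-bond paths, so the third index is left undefined here rather than forced to a number.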
13. Molecular shape indices (Kalpha)
These indices are refinements of the shape indices that take into consideration the
contribution that covalent radii and hybridization states make to the shape of the
molecule. The indices Kαn are defined as follows:
Kalpha 1: the descriptor K1 encodes the counts of atoms and the presence of
cycles relative to the minimal and maximal graphs.
Kalpha 2: K2 encodes the branching; P, Pmin, and Pmax now denote the number
of paths of length 2 in the corresponding graph.
Kalpha 3: K3 is based on the counts of paths of length 3.
14. Total dipole moment (TDM)
It is a partial-charge-dependent parameter calculated using the center of
charge over the substitution as the origin.
Total lipole moment (TLM)
The lipole of a molecule is a measure of its lipophilic distribution. It is
calculated from the sum of atomic logP values. This property has been
calculated for the amino acid chain using TSAR 3.3.
15. Dipole moment X component
The dipole moment descriptor is a 3D electronic descriptor that indicates the
strength and orientation behavior of a molecule in an electrostatic field.
It describes the polarity of the molecule and is estimated from partial
atomic charges and atomic coordinates.
The dipole X component describes the moment using the substituent's point of
attachment as the origin, with this bond placed along the X axis.
It belongs to the electrostatic descriptors, which explain the charge distribution in a
molecule.
It has been used to model polar interactions that contribute to the determination
of compound lipophilicity and that play an important role in drug-receptor
interactions due to electrostatic effects.
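As a rough illustration of how a dipole vector and its X component follow from point charges, the sketch below uses the classical expression μ = Σ qᵢ(rᵢ − origin). This is an assumption made for the example and not necessarily the procedure used by TSAR or VAMP; the charges and coordinates are hypothetical, and the function names are illustrative.

```python
from math import sqrt

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e·Å expressed in debye


def dipole_vector(charges, coords, origin=(0.0, 0.0, 0.0)):
    """Point-charge dipole, sum of q_i * (r_i - origin), in e·Å."""
    return tuple(
        sum(q * (r[axis] - origin[axis]) for q, r in zip(charges, coords))
        for axis in range(3)
    )


def dipole_magnitude_debye(vector):
    """Length of the dipole vector converted from e·Å to debye."""
    return E_ANGSTROM_TO_DEBYE * sqrt(sum(c * c for c in vector))


# Hypothetical two-atom fragment: attachment atom at the origin,
# bond placed along the X axis, charges in units of e, distances in Å.
charges = [0.3, -0.3]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

mu = dipole_vector(charges, coords)
print(mu[0])                       # X component: -0.3 e·Å
print(dipole_magnitude_debye(mu))  # about 1.44 D
```

With this sign convention the vector points from negative towards positive charge. For a neutral fragment the result does not depend on the chosen origin, whereas for a charged one it does, which is why the point of attachment is fixed as the origin.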
16. Lipole X component
The lipole X component is a measure of the lipophilic distribution, using the substituent's point of
attachment as the origin with this bond placed along the X axis.
The lipole X component correlates positively with the biological activity.
Lipole Y component
The hydrophobic descriptor lipole Y component (whole molecule), which describes
the cell permeability of the drug molecule, makes a negative contribution.
It is a directional component of lipophilicity.
The negative contribution of this hydrophobic parameter shows that
decreasing the hydrophobicity of the molecule as a whole, by substituting groups that
lower it, will account for an increase
in the biological activity.
17. Lipole Z component
It is clear from the developed models that inhibitory activity improves with
an increase in the lipole Z component, which is a measure of the lipophilic
distribution.
Lipophilicity plays a vital role in determining drug distribution inside the body
after absorption, and it also affects how quickly drugs are metabolized and excreted.
Lipophilicity has emerged as the chief driving force for the binding of drugs
to their receptor targets. Lipole serves as a quantitative descriptor of molecular
lipophilicity.
The lipole Z component is a directional component of lipophilicity.
A positive contribution of the lipole Z component towards biological activity explains
the better inhibitory activity of compounds with more bulky lipophilic groups
in the whole compound.
18. Log P (Whole molecules)
Hydrophobicity plays a fundamental role in biochemical processes such as
penetration, distribution, metabolism, and clearance, and affects the activity of a
molecule in the binding-site environment.
“Pioneering work by Hansch and Leo had led to the use of logP in QSAR methods as a
general descriptor of cell permeability”.
The negative coefficient of LogP in the QSPR models indicates a negative
contribution of hydrophobicity towards the permeability of the selected set of
compounds.
The relative proportionality of the Log P values of the whole molecule was
observed in both active and inactive compounds.
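How the sign of a coefficient is read from a fitted model can be shown with a one-descriptor least-squares fit. The numbers below are made up purely for illustration and do not come from any study; `fit_line` is an illustrative helper, and real QSAR/QSPR models usually involve several descriptors.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values (assumption): descriptor and response for four compounds.
log_p = [1.0, 2.0, 3.0, 4.0]
permeability = [4.1, 3.0, 2.2, 0.9]

slope, intercept = fit_line(log_p, permeability)
print(slope < 0)  # True: the response falls as LogP rises
```

A negative slope means the modeled response decreases as the descriptor increases, which is the sense in which a descriptor is said to contribute negatively.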
19. Molecular refractivity (MR)
The MR index of a molecule is a combined measure of its size and
polarizability.
This fragment-constant thermodynamic descriptor relates the effect of
substituents on a reaction center from one type of process to another.
The basic idea behind the use of such a descriptor is that similar changes in
structure are likely to produce similar changes in reactivity, ionization, and
binding. It can be experimentally determined or theoretically calculated using
empirical rules.
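For the experimental route, molar refractivity is commonly obtained from the Lorentz-Lorenz relation, MR = (n² − 1)/(n² + 2) · M/d. The relation is not stated in the slides and is brought in here as an assumption; the function name is illustrative, and the water values are approximate room-temperature figures.

```python
def molar_refractivity(refractive_index, molar_mass, density):
    """Lorentz-Lorenz molar refractivity in cm^3/mol.

    molar_mass in g/mol, density in g/cm^3.
    """
    n2 = refractive_index ** 2
    return (n2 - 1) / (n2 + 2) * molar_mass / density


# Water: n about 1.333, M = 18.015 g/mol, d about 0.997 g/cm^3.
mr_water = molar_refractivity(1.333, 18.015, 0.997)
print(round(mr_water, 2))  # 3.72
```

The molar-volume factor M/d carries the size information and the refractive-index factor carries the polarizability, which is why MR is described as a combined measure of both.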
20. Heat of formation
The enthalpy for forming a molecule from its constituent atoms, a measure
of the relative thermal stability of a molecule. This descriptor is calculated
using the MNDO semi-empirical molecular orbital method of Dewar.
MNDO is the most rigorous quantum chemical technique available for use in
QSAR and has a wide range of applicability in conformational analysis,
intermolecular modelling, and chemical reaction modelling.
The limit of MNDO is 300 atoms or 300 atomic orbitals per molecule
(whichever is less).
21. VAMP LUMO (Whole molecules)
VAMP is used to calculate electrostatic properties, and LUMO is the lowest
unoccupied molecular orbital. Molecular orbital (MO) surfaces represent the
various stable electronic distributions of a molecule.
According to frontier orbital theory, the highest
occupied and lowest unoccupied molecular orbitals (HOMO and LUMO) are
crucial in predicting the reactivity of a species.
The VAMP LUMO energy is related to electron affinity, and many chemical
reactions are governed by this descriptor.
A high value of the LUMO energy contributes negatively to the activity. An
electron-donating substituent, such as a hydroxy or methoxy group on the ring,
increases the energy of the LUMO. Electron-withdrawing substituents
such as halogens lower the energy of the LUMO.
22. VAMP polarization YY
VAMP polarization YY is a spatial descriptor which calculates the electronic
properties of a compound and projects the polarization along the Y direction.
There is a direct relation between polarizability and the number of valence
electrons on every atom.
A positive correlation of VAMP polarization YY with the biological activity
reveals a direct link between this chemical reactivity index and the biological activity.
VAMP dipole Y component
The VAMP dipole Y component is an electronic parameter that arises from the
degree of charge separation in a molecule. It is described using the substituent's point of
attachment as the origin, with the bond placed along the Y axis.
A positive correlation of VAMP dipole Y with the biological activity reveals
a direct link between this chemical reactivity index and the biological activity.
23. VAMP dipole X component
VAMP is a semi-empirical molecular orbital package used to perform
structure optimization and to determine electrostatic properties
such as total energy, electronic energy, nuclear
repulsion energy, accessible surface area, atomic charge, mean polarizability,
heat of formation, HOMO and LUMO eigenvalues, ionization potential, total
dipole, polarizability, and dipole components.
The positive coefficient of this term in the proposed model indicates that
the higher the value, the better the biological
activity.
24. Electrotopological state (E-state) indices
The atom-level topological index called the electrotopological state index
(E-state) was introduced in 1990. It is computed as a graph invariant for
each atom in the molecular graph.
The E-state index is a descriptor that represents the electron density and the
accessibility of those electrons to participate in noncovalent intermolecular
interactions. The index also takes into account the structural configuration of
the nearest neighbors surrounding the atom.
The E-state indices are defined in terms of the $\delta_i$ and $\delta^v_i$ values of an atom i,
similarly to the connectivity indices.
The E-state index for an atom i ($S_i$, also called the atom-level E-state index) is
composed of an intrinsic state term ($I_i$) plus a sum of perturbations ($\Delta I_{ij}$)
from all other atoms in the molecule:
25. $S_i = I_i + \sum_j \Delta I_{ij}$
where the summation is over the remaining atoms in the molecule. The intrinsic
state ($I_i$) and the perturbation ($\Delta I_{ij}$) terms are calculated as follows:
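The formulas themselves do not survive in this transcript. The sketch below assumes the commonly cited Kier-Hall forms, $I_i = ((2/N_i)^2 \delta^v_i + 1)/\delta_i$ and $\Delta I_{ij} = (I_i - I_j)/r_{ij}^2$, where $N_i$ is the principal quantum number of atom i and $r_{ij}$ is the number of bonds between atoms i and j plus one. The function name and the per-atom input dictionaries are illustrative, and the graph is assumed to be connected.

```python
from collections import deque


def estate_indices(adjacency, delta, delta_v, period):
    """Return (intrinsic states, E-state values) per atom.

    delta:   simple connectivity (number of non-hydrogen neighbours)
    delta_v: valence connectivity (valence electrons minus attached hydrogens)
    period:  principal quantum number of the valence shell
    """
    intrinsic = {
        i: ((2.0 / period[i]) ** 2 * delta_v[i] + 1.0) / delta[i]
        for i in adjacency
    }
    state = {}
    for i in adjacency:
        # Breadth-first search for bond counts from atom i.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        perturbation = sum(
            (intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2
            for j in adjacency
            if j != i
        )
        state[i] = intrinsic[i] + perturbation
    return intrinsic, state


# Ethanol, hydrogen-suppressed: atom 0 = CH3, atom 1 = CH2, atom 2 = OH.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
delta = {0: 1, 1: 2, 2: 1}
delta_v = {0: 1, 1: 2, 2: 5}
period = {0: 2, 1: 2, 2: 2}

intrinsic, state = estate_indices(ethanol, delta, delta_v, period)
print({i: round(s, 3) for i, s in state.items()})  # {0: 1.681, 1: 0.25, 2: 7.569}
```

Because every perturbation term appears once with each sign, the E-state values sum to the same total as the intrinsic states (9.5 for ethanol), which is a convenient sanity check.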
The E-state indices can be calculated for each atom (such as >C<, >N-, =O, -Cl)
in a molecule, as well as for each hydride group (such as -CH3, >NH, -OH).
The atom-type E-state indices have recently been introduced as an extension
of the E-state indices.
These indices are defined as the sum of the individual E-state values for a
particular atom type.
Hydrogen atoms can also be included in the calculation of E-state values for
deriving hydrogen atom-type E-state indices.