INTRODUCTION
Statistical methods are mathematical formulas, models and techniques used in the statistical analysis of research data.
A QSAR model is a mathematical equation that correlates the response of chemicals (an activity or property) with their structural and physicochemical information, expressed as numerical quantities, i.e. descriptors.
Regression-based approaches are employed when the response data of the chemicals are entirely numerical, i.e. quantitative. Qualitative or semi-quantitative chemical responses are modelled using classification techniques.
Developed QSAR models are subjected to several validation tests to check the reliability of the developed correlation.
After its development, a QSAR model is usually verified by multiple statistical validation strategies that estimate its predictivity and stability.
Statistical tools are used for data pre-treatment, feature selection, model development and validation of QSAR models.
Machine-learning-based computational methods are also useful in developing QSAR models.
METHODS
1) Chemometric tools:
Various chemometric tools in QSAR
Pre-treatment of the data table
Feature selection
Multiple linear regression
Partial least squares
Cluster analysis
2) Quality metrics:
Importance of metrics for determining the quality of QSAR models
Types of validation
Validation metrics for regression-based QSAR models
Validation metrics employed in classification-based QSAR models
Parameters for receiver operating characteristic (ROC) analysis
1) Chemometric tools
Various chemometric tools used in QSAR
1) Regression-based approaches
a) Multiple Linear Regression (MLR)
b) Partial Least Squares (PLS)
2) Classification-based approaches
a) Linear Discriminant Analysis (LDA)
b) Cluster Analysis (CA)
Pre-treatment of data table
Molecular structures are drawn correctly.
Biological or other activity data are taken from authentic sources.
Descriptor values are computed using validated software.
Response data for QSAR modelling should follow a normal distribution pattern.
Care should also be taken to avoid duplicates in the data set.
Before 3D descriptors are computed, geometry optimization of the structures is carried out.
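Some of these pre-treatment checks can be automated. The sketch below is a minimal illustration, assuming each compound is stored as a dictionary with a hypothetical "structure" key (standing in for a canonical structure string), a "response" value and a "descriptors" mapping. It removes duplicate structures, drops descriptors that never vary, and reports the skewness of the response as a rough normality check. All values are toy numbers, not real measurements.

```python
from statistics import mean, pstdev

# Toy records (hypothetical values): "structure" stands in for a canonical structure key.
data = [
    {"structure": "CCO", "response": 1.2, "descriptors": {"logP": -0.31, "nRing": 0}},
    {"structure": "CCCO", "response": 1.8, "descriptors": {"logP": 0.25, "nRing": 0}},
    {"structure": "CCO", "response": 1.2, "descriptors": {"logP": -0.31, "nRing": 0}},
    {"structure": "CCCCO", "response": 2.4, "descriptors": {"logP": 0.88, "nRing": 0}},
]


def remove_duplicates(records):
    """Keep only the first record seen for each structure key."""
    seen = set()
    unique = []
    for rec in records:
        if rec["structure"] not in seen:
            seen.add(rec["structure"])
            unique.append(rec)
    return unique


def drop_constant_descriptors(records):
    """Remove descriptors whose value is identical for every compound (zero variance)."""
    names = list(records[0]["descriptors"])
    keep = [n for n in names if pstdev([r["descriptors"][n] for r in records]) > 0]
    return [dict(r, descriptors={n: r["descriptors"][n] for n in keep}) for r in records]


def skewness(values):
    """Population skewness; values near 0 indicate a roughly symmetric response distribution."""
    m, s = mean(values), pstdev(values)
    return mean([(v - m) ** 3 for v in values]) / s ** 3


clean = drop_constant_descriptors(remove_duplicates(data))
print(len(clean))                     # 3 (one duplicate removed)
print(list(clean[0]["descriptors"]))  # ['logP'] ("nRing" is constant in this toy set)
print(skewness([r["response"] for r in clean]))
```

Skewness is only a quick screen. A formal normality test or a histogram of the response would normally accompany it.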
Feature selection:
• Selecting appropriate descriptors for model development from a pool of a large number of descriptors is an important step in QSAR modelling.
• Selection can be done in a variety of ways, for example:
Stepwise selection – uses the partial F-statistic, with an "F-to-enter" threshold for inclusion and an "F-to-remove" threshold for exclusion of descriptors.
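The inclusion half of stepwise selection can be sketched as follows. This is forward selection only: at each step it adds the descriptor with the largest partial F, computed as (RSS_old − RSS_new) / (RSS_new / dof), as long as that value exceeds an F-to-enter threshold. The threshold of 4.0 and the toy descriptors x1 and x2 are assumptions for illustration. A full stepwise procedure would also re-test already included descriptors against an F-to-remove threshold after each addition.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares of an ordinary least-squares fit of y on columns plus an intercept."""
    cols = [[1.0] * len(y)] + columns
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    coef = solve(A, rhs)
    fitted = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(descriptors, y, f_to_enter=4.0):
    """Add, one at a time, the descriptor with the largest partial F while it exceeds f_to_enter."""
    selected, remaining, n = [], list(descriptors), len(y)
    while remaining:
        dof = n - len(selected) - 2  # residual degrees of freedom of the enlarged model
        if dof <= 0:
            break
        rss_old = rss([descriptors[d] for d in selected], y)
        best, best_f = None, 0.0
        for d in remaining:
            rss_new = rss([descriptors[s] for s in selected + [d]], y)
            f = float("inf") if rss_new == 0 else (rss_old - rss_new) / (rss_new / dof)
            if f > best_f:
                best, best_f = d, f
        if best is None or best_f < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): y depends on x1 only; x2 carries no extra information.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [6, 4, 4, 6, 4, 6, 6, 4]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2 * a + 1 + e for a, e in zip(x1, noise)]
print(forward_selection({"x1": x1, "x2": x2}, y))  # ['x1']
```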
Multiple Linear Regression:
It is used in QSAR because of its simplicity, transparency, reproducibility and interpretability.
Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + \dots + a_n X_n
Where, Y - response (dependent variable)
a_0 - constant term
X_1, X_2, …, X_n - descriptors (independent variables)
a_1, a_2, …, a_n - regression coefficients
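The coefficients of the MLR equation above are usually estimated by ordinary least squares. A minimal sketch of that estimation follows. It solves the normal equations (XᵀX)a = Xᵀy with a small Gauss-Jordan solver. The toy data are generated without noise from a hypothetical equation Y = 1 + 2·X1 − 0.5·X2, so the fit should recover those coefficients.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Least-squares estimates [a0, a1, ..., an] for Y = a0 + a1*X1 + ... + an*Xn."""
    rows = [[1.0] + list(x) for x in X]  # leading 1.0 multiplies the constant term a0
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)  # normal equations (X^T X) a = X^T y


def predict(coef, x):
    """Evaluate the fitted equation for one compound's descriptor values."""
    return coef[0] + sum(a * xi for a, xi in zip(coef[1:], x))


# Toy data (hypothetical, noise-free) generated from Y = 1 + 2*X1 - 0.5*X2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in X]
coef = fit_mlr(X, y)
print([round(c, 6) for c in coef])  # approximately a0 = 1, a1 = 2, a2 = -0.5
print(predict(coef, (6, 1)))
```

With real data, the number of compounds should comfortably exceed the number of descriptors, otherwise XᵀX becomes singular or the fit becomes unstable.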
Partial Least Squares:
• It is often a better choice than MLR, since PLS is a generalization of MLR that can cope with many, mutually correlated descriptors.
• It is used for predicting pharmacokinetic, pharmacodynamic and toxicological properties from structure-derived physicochemical and structural features.
• PLS models are developed by regression analysis on a small number of latent variables (components) extracted from the descriptors.
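To show how the latent variables are extracted and regressed, here is a minimal PLS1 (single-response) sketch using the NIPALS algorithm on mean-centred data. It is an illustration under those assumptions, not a production implementation; in practice a library routine such as scikit-learn's PLSRegression would normally be used. Prediction projects a new compound onto each component in turn, mirroring the deflation steps. With as many components as independent descriptors, PLS reproduces the MLR solution, which the toy example checks.

```python
from statistics import mean


def pls1_fit(X, y, n_components):
    """PLS1 (single response) by the NIPALS algorithm on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # descriptor residuals
    f = [yi - y_mean for yi in y]                              # response residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # weights ~ E^T f
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # latent-variable scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                           # y loading
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]   # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                                # deflate y
        W.append(w)
        P.append(load)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Project a new compound onto each latent variable in turn, mirroring the deflation."""
    x_mean, y_mean, W, P, Q = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, load)]
    return y_hat


# Toy data (hypothetical, noise-free) generated from Y = 1 + 2*X1 - 0.5*X2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in X]
one = pls1_fit(X, y, n_components=1)
two = pls1_fit(X, y, n_components=2)
print(pls1_predict(one, (6, 1)), pls1_predict(two, (6, 1)))
```

With real descriptor sets, the number of components is usually chosen by cross-validation rather than set equal to the number of descriptors.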
Linear Discriminant Analysis
• LDA separates two or more classes of objects and is used for classification problems.
• LDA shows the difference between classes of data; predicted class membership is calculated by computing a discriminant function (DF) score.
• A compound whose DF value is smaller than the cutoff value is assigned to one class, and otherwise to the other class.
DF = C_1 X_1 + C_2 X_2 + \dots + C_m X_m + a
Where, DF - discriminant function
C - discriminant coefficients
X - respondent's score for each variable
a - constant
m - number of predictor variables
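For two classes, one common way to obtain the coefficients of the DF equation is Fisher's linear discriminant, C = S_w⁻¹(mean_A − mean_B), where S_w is the pooled within-class scatter matrix. The sketch below assumes that choice. It sets the constant a so that the cutoff is 0 (midway between the two class means), and it labels class A "active" and class B "inactive" purely for illustration. The two-descriptor toy data are made up.

```python
from statistics import mean


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_lda(class_a, class_b):
    """Fisher LDA: returns coefficients C and constant a so that DF = sum(C_j * X_j) + a."""
    m = len(class_a[0])
    mean_a = [mean([x[j] for x in class_a]) for j in range(m)]
    mean_b = [mean([x[j] for x in class_b]) for j in range(m)]
    Sw = [[0.0] * m for _ in range(m)]  # pooled within-class scatter matrix
    for data, mu in ((class_a, mean_a), (class_b, mean_b)):
        for x in data:
            d = [x[j] - mu[j] for j in range(m)]
            for i in range(m):
                for j in range(m):
                    Sw[i][j] += d[i] * d[j]
    C = solve(Sw, [ma - mb for ma, mb in zip(mean_a, mean_b)])
    a = -sum(c * (ma + mb) / 2 for c, ma, mb in zip(C, mean_a, mean_b))  # cutoff at 0
    return C, a


def df_score(C, a, x):
    return sum(c * xj for c, xj in zip(C, x)) + a


def classify(C, a, x, cutoff=0.0):
    """DF below the cutoff -> class B ("inactive"), otherwise class A ("active")."""
    return "inactive" if df_score(C, a, x) < cutoff else "active"


# Toy two-descriptor data (hypothetical)
actives = [(2, 3), (3, 3), (3, 4), (4, 5)]
inactives = [(6, 1), (7, 2), (7, 0), (8, 1)]
C, a = fit_lda(actives, inactives)
print(classify(C, a, (3, 4)), classify(C, a, (7, 1)))  # active inactive
```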
Cluster Analysis:
• Clusters are defined through analysis of the data.
• Cluster analysis maximizes the similarity of cases within each cluster
• and maximizes the dissimilarity between groups, which are not known in advance.
• Hierarchical (agglomerative) clustering starts with each case as a separate cluster and then combines the clusters sequentially, reducing the number of clusters at each step until only one cluster is left.
[Figure: dendrogram showing Cluster 1, Cluster 2 and Cluster 3]
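The agglomerative procedure described above can be sketched in a few lines. This sketch assumes Euclidean distance between descriptor vectors and single linkage, where the distance between two clusters is the smallest distance between their members. Passing max instead of min gives complete linkage. The returned merge history (members and merge distance) is exactly what a dendrogram plots. The points are toy 2-D descriptor values.

```python
from math import dist


def agglomerative(points, linkage=min):
    """Start with every case as its own cluster and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]
    history = []  # (members of the new cluster, merge distance) for each step
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Toy 2-D descriptor values for five compounds (hypothetical)
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for members, d in agglomerative(points):
    print(members, round(d, 3))
```

Each line of output corresponds to one merge, so five compounds produce four merges and a single final cluster.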
2) Quality metrics
Importance of metrics for determining the quality of QSAR models
• Advances in fast and economical computational resources make it feasible to compute a large number of descriptors using various software tools.
• With so many descriptors available, chance correlations become possible, so metrics are needed to check a QSAR model's predictivity for new, untested molecules.
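Types of validation
• OECD principles for the validation of QSAR models:
Principle 1 - a defined endpoint
Principle 2 - an unambiguous algorithm
Principle 3 - a defined domain of applicability
Principle 4 - appropriate measures of goodness-of-fit, robustness and predictivity
Principle 5 - a mechanistic interpretation, if possible
• Internal validation
• External validation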
Validation Metrics for Regression-Based QSAR Models
1) Metrics for internal validation:
• Leave-one-out (LOO) cross-validation
• Leave-many-out (LMO) cross-validation
2) Metrics for external validation
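A common internal metric is the cross-validated Q² = 1 − PRESS / Σ(y − ȳ)², where PRESS sums the squared errors of compounds predicted by a model refitted without them. A common external metric is R²pred, computed on a separate test set using the training-set mean in the denominator. The sketch below uses a one-descriptor least-squares line to keep it short. Leave-many-out is illustrated by leaving out consecutive blocks of k compounds; real LMO schemes usually draw the left-out groups at random. All data are toy values.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = a0 + a1*x (a single descriptor keeps the sketch short)."""
    mx, my = mean(x), mean(y)
    a1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - a1 * mx, a1


def q2_cv(x, y, k=1):
    """Cross-validated Q^2 = 1 - PRESS / SS_total; k=1 is LOO, k>1 leaves out blocks of k."""
    n = len(y)
    press = 0.0
    for start in range(0, n, k):
        out = set(range(start, min(start + k, n)))
        a0, a1 = fit_line([x[i] for i in range(n) if i not in out],
                          [y[i] for i in range(n) if i not in out])
        press += sum((y[i] - (a0 + a1 * x[i])) ** 2 for i in out)
    my = mean(y)
    return 1 - press / sum((yi - my) ** 2 for yi in y)


def r2_pred(x_train, y_train, x_test, y_test):
    """External predictive R^2, with the training-set mean response in the denominator."""
    a0, a1 = fit_line(x_train, y_train)
    press = sum((yt - (a0 + a1 * xt)) ** 2 for xt, yt in zip(x_test, y_test))
    my_train = mean(y_train)
    return 1 - press / sum((yt - my_train) ** 2 for yt in y_test)


# Toy data (hypothetical): six training compounds and two test compounds
x_train = [1, 2, 3, 4, 5, 6]
y_train = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0]
x_test, y_test = [7, 8], [15.2, 16.9]
print("Q2 (LOO):", q2_cv(x_train, y_train))
print("Q2 (LMO, k=2):", q2_cv(x_train, y_train, k=2))
print("R2pred:", r2_pred(x_train, y_train, x_test, y_test))
```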
Validation Metrics Employed in Classification-Based QSAR
Validation metrics assess the performance of classification-based models in terms of correct qualitative (class) prediction of the dependent variable.
Parameters for:
1) Goodness-of-fit quality determination
2) Model performance, computed from the counts of
a) True Positives (TP)
b) False Negatives (FN)
c) False Positives (FP)
d) True Negatives (TN)
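The four counts above form the confusion matrix from which the usual performance parameters are derived. The sketch below shows a few common ones: sensitivity, specificity, precision, accuracy and the Matthews correlation coefficient (MCC). It assumes binary labels with 1 for active and 0 for inactive. The label vectors are toy data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def performance(tp, fn, fp, tn):
    """Common classification metrics derived from the four confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy observed and predicted classes for ten compounds (hypothetical)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 5)
print(performance(*counts))
```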
Parameters for Receiver Operating Characteristic (ROC) Analysis
1) ROC curve
TP rate - true positive rate, plotted on the Y-axis
FP rate - false positive rate, plotted on the X-axis
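An ROC curve can be traced by lowering a decision threshold through every distinct classifier score and recording the (FP rate, TP rate) pair at each step. The area under the curve (AUC) then follows from the trapezoidal rule. The sketch assumes higher scores mean "more likely active" and labels of 1 for active and 0 for inactive. The scores are toy values.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy scores for four compounds (hypothetical): 1 = active, 0 = inactive
scores = [0.9, 0.7, 0.6, 0.4]
labels = [1, 0, 1, 0]
pts = roc_points(scores, labels)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

An AUC of 1.0 indicates perfect separation of actives from inactives, while 0.5 corresponds to random ranking.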
2) Metrics for the Pharmacological Distribution Diagram (PDD)
a) Activity expectancy
b) Inactivity expectancy
Activity expectancy: E_a = \frac{\%\ \text{of actives}}{\%\ \text{of inactives} + 100}
Inactivity expectancy: E_i = \frac{\%\ \text{of inactives}}{\%\ \text{of actives} + 100}
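The two expectancy formulas can be evaluated per interval of DF values when building a PDD. This sketch rests on our own reading that "% of actives" means the share of all actives falling in a given interval, and likewise for inactives. The counts are hypothetical.

```python
def expectancies(actives_in_bin, inactives_in_bin, total_actives, total_inactives):
    """Activity (Ea) and inactivity (Ei) expectancy for one DF interval of a PDD."""
    pct_act = 100 * actives_in_bin / total_actives        # % of all actives in this interval
    pct_inact = 100 * inactives_in_bin / total_inactives  # % of all inactives in this interval
    ea = pct_act / (pct_inact + 100)
    ei = pct_inact / (pct_act + 100)
    return ea, ei


# Hypothetical DF interval holding 20 of 40 actives and 5 of 50 inactives
ea, ei = expectancies(20, 5, 40, 50)
print(round(ea, 3), round(ei, 3))  # 0.455 0.067
```

Plotting Ea and Ei against the DF intervals gives the diagram. Intervals where Ea clearly exceeds Ei are the ones in which a compound is more likely to be active.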
IMPORTANCE
It is used in computational chemistry to represent molecular structures as numerical models and to simulate their behaviour with the help of quantum mechanics.
It can compute energy-related properties, such as electronic and spectroscopic properties, of molecules.
It is used for computing constitutional descriptors (molecular weight, counts of atoms, bonds and rings) and topological descriptors (connectivity of the molecule).
One of the most significant and widely used approaches is the use of software-computed descriptors in QSAR techniques.
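As an illustration of the simplest constitutional descriptors, the sketch below computes molecular weight and counts of atoms, bonds and rings. It assumes a hypothetical input format: a list of heavy-atom element symbols, the number of attached hydrogens on each heavy atom, and a list of heavy-atom bonds. Ring count is taken as the cyclomatic number, bonds − atoms + connected fragments, which equals the number of rings in the smallest set of smallest rings. Atomic masses are standard atomic weights for a few elements only. Real descriptor software parses structure files instead.

```python
# Standard atomic weights for a few elements (abridged table)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def constitutional(atoms, h_counts, bonds):
    """Molecular weight and simple counts from a hydrogen-suppressed molecular graph."""
    n_atoms = len(atoms)
    mw = sum(ATOMIC_MASS[a] for a in atoms) + ATOMIC_MASS["H"] * sum(h_counts)

    parent = list(range(n_atoms))  # union-find to count connected fragments

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in bonds:
        parent[find(i)] = find(j)
    fragments = len({find(i) for i in range(n_atoms)})
    return {
        "MW": mw,
        "nHeavyAtoms": n_atoms,
        "nH": sum(h_counts),
        "nBonds": len(bonds),  # heavy-atom bonds, bond order ignored
        "nRings": len(bonds) - n_atoms + fragments,  # cyclomatic number
    }


benzene = constitutional(["C"] * 6, [1] * 6, [(i, (i + 1) % 6) for i in range(6)])
ethanol = constitutional(["C", "C", "O"], [3, 2, 1], [(0, 1), (1, 2)])
print(round(benzene["MW"], 3), benzene["nRings"])  # 78.114 1
print(round(ethanol["MW"], 3), ethanol["nRings"])  # 46.069 0
```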
Application of Statistics
Equations generated or established in QSAR studies are commonly linear regression equations.
A number of equations may be generated or established for one problem under study; statistics also helps in selecting the single best-fit equation among them.
This may be done by checking the standard deviation or variance and other related statistical parameters for the data set (the series of compounds) used in the QSAR study.
The correlation coefficient computed for the data set under study also helps in selecting the appropriate QSAR equation.
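The comparison of candidate equations can be sketched as follows. For each equation, the code computes the correlation coefficient r between observed and fitted values, the standard error of estimate s (the square root of the residual variance, with n − k − 1 degrees of freedom for k descriptors) and the F statistic. It then picks the equation with the smallest s. Choosing by s alone is an assumption made for brevity; in practice r, s and F are weighed together. The fitted values are toy numbers, not output from real regressions.

```python
from math import sqrt
from statistics import mean


def fit_statistics(y_obs, y_fit, n_descriptors):
    """Correlation coefficient r, standard error of estimate s and F for one fitted equation."""
    n = len(y_obs)
    mo, mf = mean(y_obs), mean(y_fit)
    sxy = sum((o - mo) * (f - mf) for o, f in zip(y_obs, y_fit))
    r = sxy / sqrt(sum((o - mo) ** 2 for o in y_obs) * sum((f - mf) ** 2 for f in y_fit))
    dof = n - n_descriptors - 1
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    s = sqrt(ss_res / dof)  # square root of the residual variance
    F = float("inf") if r * r == 1 else (r * r / n_descriptors) / ((1 - r * r) / dof)
    return {"r": r, "s": s, "F": F}


def best_equation(y_obs, candidates):
    """candidates maps a name to (fitted values, number of descriptors); the smallest s wins."""
    stats = {name: fit_statistics(y_obs, fit, k) for name, (fit, k) in candidates.items()}
    return min(stats, key=lambda name: stats[name]["s"]), stats


# Toy observed responses and fitted values from two hypothetical equations
y_obs = [1, 2, 3, 4, 5, 6]
candidates = {
    "eq1": ([1.1, 1.9, 3.2, 3.8, 5.1, 5.9], 1),
    "eq2": ([1.5, 1.5, 3.5, 3.5, 5.5, 5.5], 2),
}
best, stats = best_equation(y_obs, candidates)
print(best)  # eq1
for name, st in stats.items():
    print(name, {key: round(val, 3) for key, val in st.items()})
```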