SIMPLE DISCRIMINANT ANALYSIS
Business Research Methods Project
Contents
Definition
Objective
Purpose
Situations for its use
Application of discriminant analysis
Assumptions
Terminology Variables in the analysis
Steps in Analysis
Problem
Interpretation of Output
Summary of canonical discriminant functions
Problem
Interpretation of Output
Summary of canonical discriminant functions
Problem
Interpretation of Output
Problem
Interpretation of Output
Summary of canonical discriminant functions
Problem
Interpretation of Output
Summary of Canonical Discriminant Functions
Problem
Interpretation of Output
Summary of Canonical Discriminant Functions
Managerial Implications
Definition:
Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into
predefined groups.
Objective:
To understand group differences and to predict the likelihood that a particular entity will belong to a particular
class or group, based on the independent variables.
Purpose:
The main purpose is to classify a subject into one of the two groups on the basis of some independent traits.
A second purpose of the discriminant analysis is to study the relationship between group membership and the
variables used to predict the group membership.
Situations for its use:
When the dependent variable is dichotomous or multichotomous.
When the independent variables are metric, i.e. interval or ratio scaled.
Application of discriminant analysis:
To identify the characteristics on the basis of which one can classify an individual as:
A basketball or volleyball player, on the basis of anthropometric variables.
A high or low performer, on the basis of skill.
A junior or a senior, on the basis of maturity parameters.
What we do in discriminant analysis:
It is also known as discriminant function analysis. In discriminant analysis, the dependent variable is
categorical, whereas the independent variables are metric. After the discriminant model has been developed, the
discriminant function Z is computed for each new observation, and the subject/object is assigned to the first
group if the value of Z is less than 0 and to the second group if it is more than 0. This criterion holds true if an
equal number of observations from both groups is used in developing the discriminant function.
Assumptions:
Sample size: The group sizes of the dependent variable should not be grossly different (e.g. 80:20); in that
case logistic regression may be preferred. The sample size should be at least five times the number of
independent variables.
Normal distribution: Each of the independent variables is normally distributed.
Homogeneity of variances/covariances: All variables have linear and homoscedastic relationships.
Outliers: Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
Non-multicollinearity: There should not be any substantial correlation among the independent variables.
Mutually exclusive: The groups must be mutually exclusive, with every subject or case belonging to
only one group.
Classification: Each case is correctly allocated to its category of the dependent variable in the initial
classification.
Variability: No independent variable should have zero variability in any of the groups formed by the
dependent variable.
Terminology Variables in the analysis:
Discriminant function: A discriminant function is a latent variable which is constructed as a linear
combination of independent variables, such that
Z = c + b1X1 + b2X2 + … + bnXn
where c is a constant and b1, …, bn are the discriminant coefficients of the independent variables X1, …, Xn.
The discriminant function is also known as the canonical root. It is used to classify the subjects/cases into
one of the two groups on the basis of the observed values of the predictor variables.
Classification matrix: In DA, it serves as a yardstick for measuring the accuracy of a model in classifying an
individual/case into one of the two groups. It is also known as the confusion matrix, assignment matrix, or
prediction matrix. It tells us what percentage of the existing data points are correctly classified by the model
developed in DA.
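As a minimal sketch of how a classification matrix is tallied, the Python snippet below counts actual versus predicted group labels and reports the percentage correctly classified. The label lists are hypothetical toy data, and the function name `classification_matrix` is an assumption made for illustration; it is not SPSS output.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count (actual, predicted) pairs and return the counts with the hit ratio in percent."""
    counts = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in counts.items() if a == p)
    hit_ratio = 100.0 * correct / len(actual)
    return counts, hit_ratio

# Hypothetical labels for eight cases (1 = first group, 2 = second group).
actual = [1, 1, 1, 1, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 1, 2]

counts, hit_ratio = classification_matrix(actual, predicted)
for a in (1, 2):
    # One row per actual group, one column per predicted group.
    print(a, [counts[(a, p)] for p in (1, 2)])
print(f"{hit_ratio:.1f}% correctly classified")
```

The diagonal cells hold the correctly classified cases, so the hit ratio is the diagonal total divided by the number of cases.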
Stepwise Method of Discriminant Analysis: The discriminant function can be developed either by entering all
independent variables together or stepwise, depending upon whether the study is confirmatory or exploratory.
Power of Discriminatory Variables: After developing the model in the discriminant analysis based on the
selected independent variables, it is important to know the relative importance of the variables so selected.
Box's M Test: Using Box's M test, we test the null hypothesis that the covariance matrices do not differ
between the groups formed by the dependent variable. If Box's M test is insignificant, it indicates that the
equal-covariance assumption required for DA holds true.
Eigenvalues: The eigenvalue is an index of overall fit.
Wilks' lambda: It measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables.
Canonical correlation: The canonical correlation is the multiple correlation between the predictors and the
discriminant function. With only one function, it provides an index of overall model fit, and its square is
interpreted as the proportion of variance explained (R2).
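The three fit indices are linked, and a short Python sketch may make that concrete. It assumes the standard relationships: canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)), and Wilks' lambda = the product of 1 / (1 + eigenvalue) over the functions being tested. The function names are hypothetical; the eigenvalues are the ones reported in the SPSS outputs of this document.

```python
import math

def canonical_correlation(eigenvalue):
    """Canonical correlation implied by the eigenvalue of one discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

def wilks_lambda(eigenvalues):
    """Wilks' lambda for a set of discriminant functions, given their eigenvalues."""
    result = 1.0
    for ev in eigenvalues:
        result /= (1.0 + ev)
    return result

# One-function case: eigenvalue .541 (bank risk problem).
r = canonical_correlation(0.541)
lam = wilks_lambda([0.541])
print(f"canonical correlation ~ {r:.3f}, Wilks' lambda ~ {lam:.3f}")
print(f"1 - r^2 ~ {1 - r * r:.3f}")  # equals Wilks' lambda when there is one function

# Two-function case: eigenvalues 1.150 and .336 (air carrier problem).
lam2 = wilks_lambda([1.150, 0.336])
print(f"Wilks' lambda for functions 1 through 2 ~ {lam2:.3f}")
```

Small differences from the printed SPSS values are expected, because the eigenvalues used here are already rounded to three decimals.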
Steps in Analysis:
1. In step one, the independent variables which have discriminating power are chosen.
2. In step two, a discriminant function model is developed by using the coefficients of the independent variables.
3. In step three, Wilks' lambda is computed for testing the significance of the discriminant function.
4. In step four, the independent variables which are important in discriminating between the groups are
identified.
5. In step five, the subjects are classified into their respective groups.
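Steps two and five can be sketched in plain Python for the simplest case of two groups and two predictors. This is an illustrative sketch using Fisher's linear discriminant with a pooled covariance matrix, not the SPSS procedure itself: the data are hypothetical, the function names are assumptions, and the coefficients are scaled differently from SPSS's canonical coefficients. Variable selection and the Wilks' lambda test (steps one, three and four) are left out.

```python
def group_stats(rows):
    """Return the mean vector and the sums of squares/cross-products of one group."""
    n = len(rows)
    m = [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]
    sscp = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                sscp[i][j] += d[i] * d[j]
    return m, sscp

def fit_two_group(group1, group2):
    """Fit Z = c + b1*X1 + b2*X2 so that Z < 0 points to group 1 and Z > 0 to group 2."""
    m1, s1 = group_stats(group1)
    m2, s2 = group_stats(group2)
    dof = len(group1) + len(group2) - 2
    # Pooled within-groups covariance matrix and its inverse (2 x 2 case).
    s = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    b = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Constant chosen so that Z = 0 at the midpoint of the two group means.
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    c = -(b[0] * mid[0] + b[1] * mid[1])
    return c, b

def classify(c, b, x):
    """Assign a case to group 1 if Z < 0, otherwise to group 2."""
    z = c + b[0] * x[0] + b[1] * x[1]
    return 1 if z < 0 else 2

# Hypothetical toy data: (age in years, salary in lakhs), equal group sizes.
group1 = [(25, 2.8), (27, 3.0), (26, 3.4), (29, 3.1)]
group2 = [(31, 4.2), (33, 3.9), (30, 4.5), (34, 4.0)]

c, b = fit_two_group(group1, group2)
predicted1 = [classify(c, b, x) for x in group1]
predicted2 = [classify(c, b, x) for x in group2]
print(predicted1, predicted2)
```

Because the two toy groups are of equal size, the zero cut-off on Z sits midway between the group means, which matches the classification criterion described earlier.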
Problem:
Analysis of risk by a bank manager for giving a loan, with the variables age, salary and number of years of
marriage.
The data available for analysis is as follows:-
Next we save the data and determine the output of analysis performed.
Interpretation of Output
Group Statistics
RISK        Variable      Mean      Std. Deviation   Valid N (listwise)
                                                     Unweighted   Weighted
LOW RISK    YEARS         30.3750   3.15945          8            8.000
            LAKHS          4.0125    .90307          8            8.000
            NO OF YEARS    4.5000   2.26779          8            8.000
HIGH RISK   YEARS         27.5000   3.58569          8            8.000
            LAKHS          3.1250    .82937          8            8.000
            NO OF YEARS    4.0000   2.26779          8            8.000
Total       YEARS         28.9375   3.58643          16           16.000
            LAKHS          3.5688    .95479          16           16.000
            NO OF YEARS    4.2500   2.20605          16           16.000
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it
indicates that the equal-covariance assumption required for DA holds true, which is not the case here.
Log Determinants
RISK                   Rank   Log Determinant
LOW RISK               3      3.289
HIGH RISK              3      3.385
Pooled within-groups   3      3.734
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Summary of canonical discriminant functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit, and its square is interpreted as the proportion of
variance explained.
Standardized Canonical Discriminant Function Coefficients
              Function 1
YEARS          .670
LAKHS          .793
NO OF YEARS   -.027
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .541a        100.0           100.0          .592
a. First 1 canonical discriminant function was used in the analysis.
The eigenvalue is an index of overall fit. Here it is .541, with a canonical correlation of .592.
Wilks' Lambda
Test of Function(s) Wilks' Lambda Chi-square df Sig.
1 0.649 5.404 3 0.145
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables. Here it is
0.649 (64.9% unexplained), which is moderate, and the function is not significant at the 5% level (Sig. = 0.145).
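The chi-square in the Wilks' lambda table can be reproduced with a few lines of Python. The sketch assumes Bartlett's approximation, chi-square = -(N - 1 - (p + g) / 2) ln(lambda) with p(g - 1) degrees of freedom, where N is the number of cases, p the number of predictors and g the number of groups. The function name is hypothetical; the inputs are the values reported above (lambda = 0.649, 16 cases, 3 predictors, 2 groups).

```python
import math

def wilks_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda, with its degrees of freedom."""
    factor = n_cases - 1 - (n_predictors + n_groups) / 2.0
    chi_square = -factor * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df

chi_square, df = wilks_chi_square(0.649, n_cases=16, n_predictors=3, n_groups=2)
print(f"chi-square ~ {chi_square:.2f} with {df} df")
```

The result agrees with the table (5.404 on 3 degrees of freedom) up to rounding of lambda.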
Problem:
To identify the public response when choosing a mobile phone.
The data related to the choice is shown as follows in the SPSS file:
These are the various variables related to the problem.
The data related to these 121 responses is shown as follows:
The stepwise method of doing discriminant analysis through SPSS is shown as:
1.
2. Now we have to define the variables as grouping and independent variables.
3. Now we will change the statistical values and classify our variables:
Now we will save our data and our output results will be displayed.
Interpretation of Output:
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it
indicates that the equal-covariance assumption required for DA holds true, which is not the case here.
Summary of canonical discriminant functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit, and its square is interpreted as the proportion of
variance explained.
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables, which is
high in this case.
Problem:
Is the person who applied for a loan eligible or not?
These are the various variables related to the problem:
The data related to these people is shown as follows:
The stepwise method of doing discriminant analysis through SPSS is shown as:
Now we will save our data and our output results will be displayed.
Interpretation of Output:
Group Statistics
previously defaulted   Variable                       Mean      Std. Deviation   Valid N (listwise)
                                                                                 Unweighted   Weighted
1.00                   other debt in thousand          3.7120     .77516         5            5.000
                       credit card debt in thousands   3.6500    4.37863         5            5.000
                       debt to income ratio           15.2000    6.49731         5            5.000
                       household income in thousand   62.6000   64.43834         5            5.000
                       years at current address        8.8000    6.41872         5            5.000
                       education in years              2.0000     .70711         5            5.000
                       age in years                   32.2000    7.75887         5            5.000
2.00                   other debt in thousand          2.9158    3.60602         19           19.000
                       credit card debt in thousands   1.3789    1.33380         19           19.000
                       debt to income ratio            9.2474    7.31759         19           19.000
                       household income in thousand   47.3684   28.08560         19           19.000
                       years at current address       11.1579    8.42129         19           19.000
                       education in years              1.1579     .50146         19           19.000
                       age in years                   38.0000    7.66667         19           19.000
Total                  other debt in thousand          3.0817    3.22338         24           24.000
                       credit card debt in thousands   1.8521    2.36944         24           24.000
                       debt to income ratio           10.4875    7.43951         24           24.000
                       household income in thousand   50.5417   37.14013         24           24.000
                       years at current address       10.6667    7.97641         24           24.000
                       education in years              1.3333     .63702         24           24.000
                       age in years                   36.7917    7.89044         24           24.000
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it
indicates that the equal-covariance assumption required for DA holds true, which is not the case here.
Summary of canonical discriminant functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit, and its square is interpreted as the proportion of
variance explained.
Variables in the Analysis
Step   Variable               Tolerance   F to Remove   Wilks' Lambda
1      education in years     1.000        9.462
2      education in years      .884       12.708        .890
       debt to income ratio    .884        5.489        .699
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .804a        100.0           100.0          .668
a. First 1 canonical discriminant functions were used in the analysis.
The eigenvalue is an index of overall fit. Here it is .804, with a canonical correlation of .668.
Wilks' Lambda
Test of Function(s) Wilks' Lambda Chi-square df Sig.
1 .554 12.389 2 .002
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables. Here it is
.554 (55.4% unexplained), which is not high, and the function is significant (Sig. = .002).
Classification Results:
                                  Predicted Group Membership
           previously defaulted   1.00     2.00     Total
Original   Count   1.00           5        0        5
                   2.00           3        16       19
           %       1.00           100.0    .0       100.0
                   2.00           15.8     84.2     100.0
Functions at Group Centroids
previously defaulted   Function 1
1.00                   1.673
2.00                   -.440
Unstandardized canonical discriminant functions evaluated at group means
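The centroids can be sanity-checked with a short Python sketch. It relies on the fact that discriminant scores are centred on the grand mean, so the centroids weighted by group size should average to about zero. The function name is hypothetical; the centroids (1.673 and -.440) and group sizes (5 and 19) are those reported above.

```python
def weighted_centroid_mean(centroids, sizes):
    """Size-weighted average of the group centroids on one discriminant function."""
    total = sum(sizes)
    return sum(c * n for c, n in zip(centroids, sizes)) / total

centroids = [1.673, -0.440]  # groups 1.00 and 2.00
sizes = [5, 19]

grand_mean = weighted_centroid_mean(centroids, sizes)
midpoint = sum(centroids) / 2
print(f"weighted mean of centroids ~ {grand_mean:.3f}")
print(f"midpoint between centroids ~ {midpoint:.3f}")
```

With unequal group sizes the midpoint between the centroids is well away from zero, so the simple zero cut-off described for equal groups should be applied with care here.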
Problem:
A large international air carrier has collected data on employees in three different job classifications: 1)
customer service personnel, 2) mechanics, and 3) dispatchers. The director of Human Resources wants to know if
these three job classifications appeal to different personality types. Each employee is administered a battery of
psychological tests, which include measures of interest in outdoor activity, sociability and conservativeness.
The data related to the choice is shown as follows in the SPSS file:
These are the variables related to the problem:
The data related to these 244 responses is shown as follows:
The stepwise method of doing discriminant analysis through SPSS is shown as:
1.
2. Now we have to define the variables as grouping and independent variables.
3. Now we will change the statistical values and classify our variables:
Now we will save our data and our output results will be displayed.
Interpretation of Output:
Group Statistics
job                Variable       Mean      Std. Deviation   Valid N (listwise)
                                                             Unweighted   Weighted
customer service   outdoor        12.5176    4.64863         85           85.000
                   social         24.2235    4.33528         85           85.000
                   conservative    9.0235    3.14331         85           85.000
                   jid            43.0000   24.68130         85           85.000
mechanic           outdoor        18.5376    3.56480         93           93.000
                   social         21.1398    4.55066         93           93.000
                   conservative   10.1398    3.24235         93           93.000
                   jid            47.0000   26.99074         93           93.000
dispatch           outdoor        15.5758    4.11025         66           66.000
                   social         15.4545    3.76699         66           66.000
                   conservative   13.2424    3.69224         66           66.000
                   jid            33.5000   19.19635         66           66.000
Total              outdoor        15.6393    4.83993         244          244.000
                   social         20.6762    5.47926         244          244.000
                   conservative   10.5902    3.72679         244          244.000
                   jid            41.9549   24.78903         244          244.000
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it
indicates that the equal-covariance assumption required for DA holds true. Here the test is significant
(Sig. = .008), so the assumption does not hold in this case.
Log Determinants
job                    Rank   Log Determinant
customer service       4      14.521
mechanic               4      14.394
dispatch               4      13.983
Pooled within-groups   4      14.491
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box's M           39.442
F    Approx.      1.924
     df1          20
     df2          176082.775
     Sig.         .008
Tests null hypothesis of equal population covariance matrices.
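Box's M can be recovered from the log determinants, which a brief Python sketch illustrates. It assumes the usual definition M = (N - g) ln|S_pooled| - sum over groups of (n_i - 1) ln|S_i|, with N cases and g groups. The function name is hypothetical; the group sizes (85, 93, 66) and log determinants are those reported above.

```python
def box_m(group_sizes, group_log_dets, pooled_log_det):
    """Box's M statistic computed from group sizes and log determinants."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    m = (n_total - n_groups) * pooled_log_det
    for n, log_det in zip(group_sizes, group_log_dets):
        m -= (n - 1) * log_det
    return m

# Order: customer service, mechanic, dispatch.
m = box_m([85, 93, 66], [14.521, 14.394, 13.983], 14.491)
print(f"Box's M ~ {m:.2f}")
```

The result is close to the reported 39.442; the small gap comes from the log determinants being rounded to three decimals.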
Summary of canonical discriminant functions:
The canonical correlation is the multiple correlation between the predictors and a discriminant function. Here
there are two functions, and the square of each canonical correlation is interpreted as the proportion of variance
explained by that function.
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.150a       77.4            77.4           .731
2          .336a        22.6            100.0          .502
a. First 2 canonical discriminant functions were used in the analysis.
Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2           .348            252.757      8    .000
2                     .748            69.446       3    .000
Wilks' lambda measures the efficiency of the discriminant functions in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables. Here it is
.348 for functions 1 through 2 (34.8% unexplained), and both tests are significant (Sig. = .000).
Standardized Canonical Discriminant Function Coefficients
               Function
               1        2
outdoor        -.374    .908
social          .836    .168
conservative   -.504   -.242
jid             .251    .210
Structure Matrix
               Function
               1        2
social          .747*    .202
conservative   -.459*   -.217
outdoor        -.292     .938*
jid             .137     .293*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant
functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function
Canonical Discriminant Function Coefficients
               Function
               1         2
outdoor        -.091      .221
social          .196      .039
conservative   -.151     -.073
jid             .010      .009
(Constant)     -1.455    -3.854
Unstandardized coefficients
Functions at Group Centroids
job                Function
                   1         2
customer service   1.225     -.427
mechanic           -.053      .734
dispatch           -1.504    -.484
Unstandardized canonical discriminant functions evaluated at group means
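The group centroids are simply the unstandardized functions evaluated at the group means, which the Python sketch below illustrates. The names `FUNCTIONS`, `GROUP_MEANS` and `discriminant_scores` are hypothetical; the coefficients and means are copied from the tables above, so the results match the reported centroids only up to rounding.

```python
# Unstandardized coefficients in the order (outdoor, social, conservative, jid),
# followed by the constant of each function.
FUNCTIONS = [
    ((-0.091, 0.196, -0.151, 0.010), -1.455),  # function 1
    ((0.221, 0.039, -0.073, 0.009), -3.854),   # function 2
]

# Group means from the Group Statistics table, in the same variable order.
GROUP_MEANS = {
    "customer service": (12.5176, 24.2235, 9.0235, 43.0),
    "mechanic": (18.5376, 21.1398, 10.1398, 47.0),
    "dispatch": (15.5758, 15.4545, 13.2424, 33.5),
}

def discriminant_scores(x):
    """Return the score of one case on each discriminant function."""
    return [const + sum(b * v for b, v in zip(coefs, x)) for coefs, const in FUNCTIONS]

centroids = {job: discriminant_scores(means) for job, means in GROUP_MEANS.items()}
for job, scores in centroids.items():
    print(job, [round(s, 2) for s in scores])
```

The same `discriminant_scores` call can be applied to the measurements of a single employee to place that person relative to the three centroids.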
Classification Processing Summary
Processed                                                  244
Excluded   Missing or out-of-range group codes             0
           At least one missing discriminating variable    0
Used in Output                                             244
Prior Probabilities for Groups
job                Prior   Cases Used in Analysis
                           Unweighted   Weighted
customer service   .333    85           85.000
mechanic           .333    93           93.000
dispatch           .333    66           66.000
Total              1.000   244          244.000
Classification Function Coefficients
               job
               customer service   mechanic   dispatch
outdoor        .568               .940       .803
social         1.294              1.090      .758
conservative   .695               .804       1.111
jid            .087               .084       .059
(Constant)     -25.342            -27.385    -21.556
Fisher's linear discriminant functions
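A case is assigned to the group whose classification function gives the highest score. The Python sketch below applies the coefficients from the table above under the equal prior probabilities shown (.333 each). The names `CLASSIFICATION_FUNCTIONS` and `classify` are hypothetical, and the three test cases are the group means from the Group Statistics table rather than individual employees.

```python
# Fisher's classification function coefficients in the order
# (outdoor, social, conservative, jid), followed by the constant.
CLASSIFICATION_FUNCTIONS = {
    "customer service": ((0.568, 1.294, 0.695, 0.087), -25.342),
    "mechanic": ((0.940, 1.090, 0.804, 0.084), -27.385),
    "dispatch": ((0.803, 0.758, 1.111, 0.059), -21.556),
}

def classify(x):
    """Return the job group with the highest classification score for case x."""
    scores = {
        job: const + sum(b * v for b, v in zip(coefs, x))
        for job, (coefs, const) in CLASSIFICATION_FUNCTIONS.items()
    }
    return max(scores, key=scores.get)

# Group means used as illustrative cases.
print(classify((12.5176, 24.2235, 9.0235, 43.0)))
print(classify((18.5376, 21.1398, 10.1398, 47.0)))
print(classify((15.5758, 15.4545, 13.2424, 33.5)))
```

Each group mean is assigned back to its own group, which is what one would expect of a typical member of that group.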
Problem:
A national retail chain desires to build a discriminant function that would enable the firm to distinguish
between normal customers and loyal customers.
The data available for analysis is as follows:-
The variables chosen for analysis
Now the stepwise method of performing Discriminant Analysis:-
Now defining variables as Grouping and Independent variables:-
Now determining the Statistical values and Classification:-
Next we save the data and determine the output of analysis performed.
Interpretation of Output:
Group Statistics
Loyalty            Variable           Mean       Std. Deviation   Valid N (listwise)
                                                                  Unweighted   Weighted
Normal Customers   Frequency          20.80      12.599           15           15.000
                   Average_Purchase   24677.87   12889.131        15           15.000
                   Years              3.87       1.995            15           15.000
Loyal Customers    Frequency          30.40      7.129            15           15.000
                   Average_Purchase   30694.53   17381.766        15           15.000
                   Years              6.20       1.656            15           15.000
Total              Frequency          25.60      11.181           30           30.000
                   Average_Purchase   27686.20   15343.289        30           30.000
                   Years              5.03       2.157            30           30.000
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it
indicates that the equal-covariance assumption required for DA holds true, which is not the case here.
Log Determinants
Loyalty                Rank   Log Determinant
Normal Customers       3      24.853
Loyal Customers        3      24.379
Pooled within-groups   3      25.073
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Summary of Canonical Discriminant Functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant
function. With only one function, it provides an index of overall model fit, and its square is interpreted as
the proportion of variance explained.
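Canonical Discriminant Function Coefficients
                   Function 1
Frequency          .061
Average_Purchase   .000
Years              .400
(Constant)         -4.173
Z = 0.061X1 + 0.000X2 + 0.400X3 - 4.173
where X1 is Frequency, X2 is Average_Purchase and X3 is Years. The coefficient of Average_Purchase appears as
.000 only because of rounding to three decimals; since Average_Purchase is measured in tens of thousands, the
coefficient is not exactly zero and the variable still contributes to Z.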
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .730a        100.0           100.0          .649
a. First 1 canonical discriminant functions were used in the analysis.
Wilks' Lambda
Test of Function(s) Wilks' Lambda Chi-square df Sig.
1 .578 14.518 3 .002
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables. Here it is
.578 (57.8% unexplained), which is not high, and the function is significant (Sig. = .002).
Classification Results (a, c)
                                                 Predicted Group Membership
                      Loyalty                    Normal Customers   Loyal Customers   Total
Original              Count   Normal Customers   12                 3                 15
                              Loyal Customers    1                  14                15
                      %       Normal Customers   80.0               20.0              100.0
                              Loyal Customers    6.7                93.3              100.0
Cross-validated (b)   Count   Normal Customers   12                 3                 15
                              Loyal Customers    2                  13                15
                      %       Normal Customers   80.0               20.0              100.0
                              Loyal Customers    13.3               86.7              100.0
a. 86.7% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is
classified by the functions derived from all cases other than that case.
c. 83.3% of cross-validated grouped cases correctly classified.
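The leave-one-out idea behind the cross-validated rows can be sketched in a few lines of Python. To keep the example short, a one-variable nearest-group-mean rule stands in for the discriminant function; this is a swapped-in simplification, not the SPSS computation. The data and function names are hypothetical.

```python
def nearest_mean_label(train, x):
    """Assign x to the group whose training mean is closest (one-variable rule)."""
    means = {}
    for label in set(lbl for _, lbl in train):
        values = [v for v, lbl in train if lbl == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))

def leave_one_out_hit_ratio(data):
    """Classify each case with a rule built from all the other cases; return % correct."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every case except case i
        if nearest_mean_label(train, x) == label:
            correct += 1
    return 100.0 * correct / len(data)

# Hypothetical data: (years as a customer, group label).
data = [(2, "normal"), (3, "normal"), (4, "normal"), (6, "normal"),
        (5, "loyal"), (6, "loyal"), (7, "loyal"), (8, "loyal")]

loo = leave_one_out_hit_ratio(data)
print(f"{loo:.1f}% of cross-validated cases correctly classified")
```

Because each case is held out of the rule that classifies it, the cross-validated percentage is usually equal to or lower than the percentage for the original grouped cases.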
Functions at Group Centroids
Loyalty            Function 1
Normal Customers   -.825
Loyal Customers    .825
Unstandardized canonical discriminant functions evaluated at group means
Problem:
To classify the players into different categories for the selection process.
The data related to the players is shown as follows in the SPSS file:
These are the various variables related to the problem.
The data related to these 20 players is shown as follows:
The stepwise method of doing discriminant analysis through SPSS is shown as:
1.
2. Now we have to define the variables as grouping and independent variables.
3. Now we will change the statistical values and classify our variables:
Now we will save our data and our output results will be displayed.
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the
covariance matrices do not differ between the groups formed by the dependent variable. If Box's M test
is insignificant, it indicates that the equal-covariance assumption required for DA holds true. In this case
the group covariance matrices are singular (too few cases), so their log determinants cannot be computed.
Log Determinants
player                 Rank   Log Determinant
1.00                   .a     .b
2.00                   .c     .b
Pooled within-groups   10     -1.480
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
a. Rank < 10
b. Too few cases to be non-singular
c. Rank < 9
Summary of Canonical Discriminant Functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant
function. With only one function, it provides an index of overall model fit, and its square is interpreted as
the proportion of variance explained.
Canonical Discriminant Function Coefficients
                    Function 1
height              .346
arm length          -.115
leg length          .251
palm length         -1.033
shoulder strength   -.135
reaction time       9.029
back exposive       -.459
speed               -1.553
judgement           -.140
patience            .251
(Constant)          -41.099
Unstandardized coefficients
Z = -41.099 + 0.346x1 - 0.115x2 + 0.251x3 - 1.033x4 - 0.135x5 + …
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          3.662a       100.0           100.0          .886
a. First 1 canonical discriminant functions were used in the analysis.
The eigenvalue is an index of overall fit. Here it is 3.662, with a high canonical correlation of .886.
Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .214            18.474       10   .047
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables.
Here it is .214 (21.4% unexplained), which is not high, and the function is significant at the 5% level
(Sig. = .047).
Classification Results:
                                     Predicted Group Membership
                      player         1.00     2.00     Total
Original              Count   1.00   9        1        10
                              2.00   0        9        9
                      %       1.00   90.0     10.0     100.0
                              2.00   .0       100.0    100.0
Cross-validated (a)   Count   1.00   8        2        10
                              2.00   4        5        9
                      %       1.00   80.0     20.0     100.0
                              2.00   44.4     55.6     100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified
by the functions derived from all cases other than that case.
b. 94.7% of original grouped cases correctly classified.
c. 68.4% of cross-validated grouped cases correctly classified.
Functions at Group Centroids
player   Function 1
1.00     -1.717
2.00     1.908
Unstandardized canonical discriminant functions evaluated at group means.
Managerial Implications:
After getting to know the technical aspects of this useful concept, we can conclude that DA has the following
applications in the field of marketing:
Discriminant analysis, a multivariate technique used for market segmentation and for predicting group
membership, is often used for this type of problem because of its ability to classify individuals or
experimental units into two or more uniquely defined populations.
Product research – Distinguish between heavy, medium, and light users of a product in terms of their
consumption habits and lifestyles.
Perception/Image research – Distinguish between customers who exhibit favorable perceptions of a store
or company and those who do not.
Advertising research – Identify how market segments differ in media consumption habits.
Direct marketing – Identify the characteristics of consumers who will respond to a direct marketing
campaign and those who will not.
Sources
Textbook: Business Research Methods: A South-Asian Perspective, by William G. Zikmund
http://www.bluefinik.com/discriminant-analysis-example/
https://en.wikipedia.org/wiki/Discriminant_function_analysis
http://stats.idre.ucla.edu/spss/dae/discriminant-function-analysis/
http://www.cs.uu.nl/docs/vakken/arm/SPSS/spss6.pdf
https://www.slideshare.net/amritashishbagchi/discriminant-analysis-30449666