simple discriminant

SIMPLE DISCRIMINANT ANALYSIS
Business Research Methods Project
Contents
Definition
Objective
Purpose
Situations for its use
Application of discriminant analysis
Assumptions
Terminology Variables in the analysis
Steps in Analysis
Problem
Interpretation of Output
Summary of canonical discriminant functions
Problem
Interpretation of Output
Summary of canonical discriminant functions
Problem
Problem
Summary of canonical discriminant functions
Problem
Summary of Canonical Discriminant Functions
Problem
Summary of Canonical Discriminant Functions
Managerial Implications
Definition:
Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into
predefined groups.
Objective:
To understand group differences and to predict the likelihood that a particular entity belongs to a particular
class or group, based on the independent variables.
Purpose:
The main purpose is to classify a subject into one of the two groups on the basis of some independent traits.
A second purpose of the discriminant analysis is to study the relationship between group membership and the
variables used to predict the group membership.

Situations for its use:
The dependent variable is dichotomous or multichotomous.
The independent variables are metric, i.e. interval or ratio scaled.
Application of discriminant analysis:
To identify the characteristics on the basis of which an individual can be classified as:
• A basketball or volleyball player, on the basis of anthropometric variables.
• A high or low performer, on the basis of skill.
• Belonging to the junior or senior category, on the basis of maturity parameters.
What we do in discriminant analysis:
Discriminant analysis is also known as discriminant function analysis. The dependent variable is a
categorical variable, whereas the independent variables are metric. After the discriminant model has been
developed, the discriminant function Z is computed for each new observation, and the subject/object is assigned
to the first group if the value of Z is less than 0 and to the second group if it is greater than 0. This criterion
holds only if the two groups contained an equal number of observations when the discriminant function was developed.
Assumptions:
• Sample size: The group sizes of the dependent variable should not be grossly different (e.g. 80:20); where
they are, logistic regression may be preferred. The sample size should also be at least five times the number
of independent variables.
• Normal distribution: Each of the independent variables is normally distributed.
• Homogeneity of variances/covariances: All variables have linear and homoscedastic relationships.
• Outliers: Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
• Non-multicollinearity: The independent variables should not be highly correlated with one another.
• Mutually exclusive: The groups must be mutually exclusive, with every subject or case belonging to
only one group.
• Classification: Each of the allocations for the dependent categories in the initial classification is correctly
classified.
• Variability: No independent variable should have zero variability in either of the groups formed by the
dependent variable.
Terminology Variables in the analysis:
Discriminant function: A discriminant function is a latent variable which is constructed as a linear
combination of independent variables, such that
Z = c + b_1 X_1 + b_2 X_2 + … + b_n X_n
where c is a constant, b_1 … b_n are the discriminant coefficients and X_1 … X_n are the independent variables.
The discriminant function is also known as the canonical root. It is used to classify the
subjects/cases into one of the two groups on the basis of the observed values of the predictor variables.
Classification matrix: In DA, it serves as a yardstick for measuring the accuracy of a model in classifying an
individual/case into one of the two groups. It is also known as the confusion matrix, assignment matrix, or
prediction matrix. It tells us what percentage of the existing data points are correctly classified by the model
developed in DA.
Stepwise Method of Discriminant Analysis: The discriminant function can be developed either by entering all
independent variables together or stepwise, depending on whether the study is confirmatory or exploratory.
Power of Discriminatory Variables: After developing the model in the discriminant analysis based on the
selected independent variables, it is important to know the relative importance of the variables so selected.
Box's M Test: Using Box's M test, we test the null hypothesis that the covariance matrices do not differ
between the groups formed by the dependent variable. If Box's M test is not significant, the assumption of
equal covariance matrices required for DA holds.
Eigenvalue: The eigenvalue is an index of overall fit. It is the ratio of the between-groups to the within-groups
sum of squares of the discriminant scores, so larger values indicate better discrimination.
Wilks' lambda: It measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables, so smaller
values are better.
Canonical correlation: The canonical correlation is the multiple correlation between the predictors and the
discriminant function. With only one function, it provides an index of overall model fit; its square is
interpreted as the proportion of variance explained (R²).
Steps in Analysis:
1. The independent variables that have discriminating power are chosen.
2. A discriminant function model is developed using the coefficients of the independent variables.
3. Wilks' lambda is computed to test the significance of the discriminant function.
4. The independent variables that are most important in discriminating between the groups are identified.
5. The subjects are classified into their respective groups.
Problem:
Analysis of risk by a bank manager for granting a loan, using the variables age, salary and number of years of
marriage.
The data available for analysis are as follows:

The variables chosen for the analysis:

The stepwise method of performing the discriminant analysis:
Defining the grouping and independent variables:

Selecting the statistics and classification options:

Next, we save the data and examine the output of the analysis.

Interpretation of Output
Group Statistics
RISK       Variable     Mean     Std. Deviation  Valid N (listwise)
                                                 Unweighted  Weighted
LOW RISK   YEARS        30.3750  3.15945         8           8.000
           LAKHS        4.0125   .90307          8           8.000
           NO OF YEARS  4.5000   2.26779         8           8.000
HIGH RISK  YEARS        27.5000  3.58569         8           8.000
           LAKHS        3.1250   .82937          8           8.000
           NO OF YEARS  4.0000   2.26779         8           8.000
Total      YEARS        28.9375  3.58643         16          16.000
           LAKHS        3.5688   .95479          16          16.000
           NO OF YEARS  4.2500   2.20605         16          16.000
(The labels YEARS, LAKHS and NO OF YEARS appear to correspond to age, salary and number of years of marriage,
the three variables named in the problem statement.)
Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the covariance
matrices do not differ between the groups formed by the dependent variable. If Box's M test is not significant, the
assumption required for DA holds; in this case it does not.
Log Determinants
RISK                  Rank  Log Determinant
LOW RISK              3     3.289
HIGH RISK             3     3.385
Pooled within-groups  3     3.734
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Summary of canonical discriminant functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit; its square is interpreted as the proportion of
variance explained.
Standardized Canonical Discriminant Function Coefficients
             Function 1
YEARS        .670
LAKHS        .793
NO OF YEARS  -.027
Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .541a       100.0          100.0         .592
a. First 1 canonical discriminant function was used in the analysis.
The eigenvalue is the index of overall fit. Here the canonical correlation is .592, a moderate value.
Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    0.649          5.404       3   0.145
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables; here it is
0.649 (about 65%), which is moderate. With Sig. = 0.145, the function is not statistically significant at the 5% level.
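For a single discriminant function, the summary statistics reported for this problem are linked by standard identities: Wilks' lambda equals 1 / (1 + eigenvalue), the canonical correlation equals the square root of eigenvalue / (1 + eigenvalue), and the chi-square follows Bartlett's approximation -(N - 1 - (p + g) / 2) ln(lambda) with p(g - 1) degrees of freedom. The Python sketch below checks the reported figures against these identities. The function name is illustrative and not part of SPSS, and small differences are expected because the eigenvalue is rounded to three decimals.

```python
import math

def single_function_summary(eigenvalue, n_cases, n_predictors, n_groups):
    """Derive Wilks' lambda, the canonical correlation and Bartlett's
    chi-square from the eigenvalue of a single discriminant function."""
    wilks_lambda = 1.0 / (1.0 + eigenvalue)
    canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return wilks_lambda, canonical_corr, chi_square, df

# Loan-risk problem: eigenvalue .541, 16 cases, 3 predictors, 2 groups
lam, r, chi2, df = single_function_summary(0.541, 16, 3, 2)
print(f"lambda={lam:.3f}  canonical r={r:.3f}  chi-square={chi2:.3f}  df={df}")
```

The results agree with the SPSS tables (.649, .592, 5.404, df 3) to within rounding.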

Classification Results (a)
                            Predicted Group Membership
                 RISK       LOW RISK   HIGH RISK   Total
Original  Count  LOW RISK   5          3           8
                 HIGH RISK  2          6           8
          %      LOW RISK   62.5       37.5        100.0
                 HIGH RISK  25.0       75.0        100.0
a. 68.8% of original grouped cases correctly classified.
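The percentage of correctly classified cases is simply the share of cases on the diagonal of the classification matrix. A minimal Python sketch of that arithmetic, using the counts from the table for this problem (the function name is illustrative):

```python
def hit_ratio(matrix):
    """Percentage of cases on the diagonal of a square classification matrix.
    Rows are actual groups, columns are predicted groups."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

# Counts from the classification table: rows are LOW RISK, HIGH RISK
loan_counts = [[5, 3],
               [2, 6]]

overall = hit_ratio(loan_counts)  # 11 of 16 cases are on the diagonal
per_group = [100.0 * row[i] / sum(row) for i, row in enumerate(loan_counts)]
print(overall, per_group)  # 68.75 [62.5, 75.0]
```

The overall figure of 68.75% is what SPSS rounds to 68.8%, and the per-group figures match the percentage rows of the table.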

Problem:
To identify the public response when choosing a mobile phone.
The data related to the choice are shown in the SPSS file as follows:
These are the variables related to the problem.

The data for these 121 responses are shown as follows:
The stepwise method of performing discriminant analysis in SPSS is as follows:
1.

2. Now we define the grouping and independent variables.
3. Now we select the statistics and classification options:

Now we save the data, and the output is displayed.

Interpretation of Output:
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit; its square is interpreted as the proportion of
variance explained:

Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables, which is
high in this case.

Problem:
Is the person who applied for a loan eligible or not?
These are the variables related to the problem:
The data for these people are shown as follows:

Now we save the data, and the output is displayed.
Group Statistics
previously  Variable                       Mean     Std. Deviation  Valid N (listwise)
defaulted                                                           Unweighted  Weighted
1.00        other debt in thousand         3.7120   .77516          5           5.000
            credit card debt in thousands  3.6500   4.37863         5           5.000
            debt to income ratio           15.2000  6.49731         5           5.000
            household income in thousand   62.6000  64.43834        5           5.000
            years at current address       8.8000   6.41872         5           5.000
            education in years             2.0000   .70711          5           5.000
            age in years                   32.2000  7.75887         5           5.000
2.00        other debt in thousand         2.9158   3.60602         19          19.000
            credit card debt in thousands  1.3789   1.33380         19          19.000
            household income in thousand   47.3684  28.08560        19          19.000
            age in years                   38.0000  7.66667         19          19.000
Total       other debt in thousand         3.0817   3.22338         24          24.000
            credit card debt in thousands  1.8521   2.36944         24          24.000
            household income in thousand   50.5417  37.14013        24          24.000
            age in years                   36.7917  7.89044         24          24.000
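The canonical correlation is the multiple correlation between the predictors and the discriminant function. With
only one function, it provides an index of overall model fit; its square is interpreted as the proportion of
variance explained.
Variables in the Analysis
Step  Variable              Tolerance  F to Remove  Wilks' Lambda
1     education in years    1.000      9.462
2     education in years    .884       12.708       .890
      debt to income ratio  .884       5.489        .699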

Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .804a       100.0          100.0         .668
a. First 1 canonical discriminant function was used in the analysis.
The eigenvalue is the index of overall fit. Here the canonical correlation is .668, a fairly high value.
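Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .554           12.389      2   .002
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables; here it is
.554, which is not high. With Sig. = .002, the function is statistically significant.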
Classification Results:
                                        Predicted Group Membership
                 previously defaulted   1.00    2.00    Total
Original  Count  1.00                   5       0       5
                 2.00                   3       16      19
          %      1.00                   100.0   .0      100.0
                 2.00                   15.8    84.2    100.0
Functions at Group Centroids
previously defaulted  Function 1
1.00                  1.673
2.00                  -.440
Unstandardized canonical discriminant functions evaluated at group means
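The rule of assigning a case by the sign of Z was stated earlier for groups of equal size. In this problem the groups contain 5 and 19 cases, so a cut-off of zero is not appropriate. One textbook convention for unequal groups is a weighted cutting score, in which each centroid is weighted by the size of the other group. The Python sketch below applies that convention to the centroids above. It is offered as an assumption for illustration: SPSS itself classifies cases through posterior probabilities, so its assignments need not coincide with this cut-off.

```python
def cutting_score(n_a, centroid_a, n_b, centroid_b):
    """Weighted cutting score between two group centroids.

    Each centroid is weighted by the size of the *other* group, so the
    cut-off moves towards the centroid of the smaller group. With equal
    group sizes it reduces to the simple midpoint of the centroids.
    """
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)

# Group 1.00: 5 cases, centroid 1.673; group 2.00: 19 cases, centroid -.440
cut = cutting_score(5, 1.673, 19, -0.440)
midpoint = cutting_score(1, 1.673, 1, -0.440)
print(f"weighted cut-off={cut:.3f}  unweighted midpoint={midpoint:.4f}")
```

Under this convention a case would be assigned to group 1.00 only if its Z exceeds roughly 1.23, rather than the unweighted midpoint of about 0.62.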

Problem:
A large international air carrier has collected data on employees in three different job classifications: 1)
customer service personnel, 2) mechanics, and 3) dispatchers. The director of Human Resources wants to know if
these three job classifications appeal to different personality types. Each employee is administered a battery of
psychological tests, which include measures of interest in outdoor activity, sociability and conservativeness.
The data related to the choice are shown in the SPSS file as follows:
These are the variables related to the problem:

The data for these 244 responses are shown as follows:
1.

2. Now we define the grouping and independent variables.
3. Now we select the statistics and classification options:

Group Statistics
job               Variable      Mean     Std. Deviation  Valid N (listwise)
                                                         Unweighted  Weighted
customer service  outdoor       12.5176  4.64863         85          85.000
                  social        24.2235  4.33528         85          85.000
                  conservative  9.0235   3.14331         85          85.000
                  jid           43.0000  24.68130        85          85.000
mechanic          outdoor       18.5376  3.56480         93          93.000
                  social        21.1398  4.55066         93          93.000
                  conservative  10.1398  3.24235         93          93.000
                  jid           47.0000  26.99074        93          93.000
dispatch          outdoor       15.5758  4.11025         66          66.000
                  social        15.4545  3.76699         66          66.000
                  conservative  13.2424  3.69224         66          66.000
                  jid           33.5000  19.19635        66          66.000
Total             outdoor       15.6393  4.83993         244         244.000
                  social        20.6762  5.47926         244         244.000
                  conservative  10.5902  3.72679         244         244.000
                  jid           41.9549  24.78903        244         244.000

Log Determinants
job                   Rank  Log Determinant
customer service      4     14.521
mechanic              4     14.394
dispatch              4     13.983
Pooled within-groups  4     14.491
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box's M         39.442
F    Approx.    1.924
     df1        20
     df2        176082.775
     Sig.       .008
Tests null hypothesis of equal population covariance matrices.
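Box's M can be recovered from the log determinants: it is (N - g) times the pooled log determinant, minus the sum over groups of (n_i - 1) times that group's log determinant, where N is the total number of cases and g the number of groups. The Python sketch below applies this to the figures for this problem. The function name is illustrative, and the result differs slightly from the SPSS value because the log determinants are rounded to three decimals.

```python
def box_m(group_sizes, group_log_dets, pooled_log_det):
    """Box's M statistic from the log determinants of the group covariance
    matrices and of the pooled within-groups covariance matrix."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    pooled_term = (n_total - n_groups) * pooled_log_det
    group_term = sum((n - 1) * log_det
                     for n, log_det in zip(group_sizes, group_log_dets))
    return pooled_term - group_term

# customer service, mechanic, dispatch
m = box_m([85, 93, 66], [14.521, 14.394, 13.983], 14.491)
print(f"Box's M is approximately {m:.2f}")
```

The value comes out close to the reported 39.442. Since Sig. = .008, the null hypothesis of equal covariance matrices is rejected for this data set.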
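The canonical correlation is the multiple correlation between the predictors and the discriminant function; its
square is interpreted as the proportion of variance explained.
Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         1.150a      77.4           77.4          .731
2         .336a       22.6           100.0         .502
a. First 2 canonical discriminant functions were used in the analysis.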

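Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .348           252.757     8   .000
2                    .748           69.446      3   .000
Wilks' lambda measures the efficiency of the discriminant functions in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables; for
functions 1 through 2 it is .348 (about 35%), which is not high in this case. Both tests are significant (Sig. = .000).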
Standardized Canonical Discriminant Function Coefficients
              Function
              1       2
outdoor       -.374   .908
social        .836    .168
conservative  -.504   -.242
jid           .251    .210
Structure Matrix
              Function
              1       2
social        .747*   .202
conservative  -.459*  -.217
outdoor       -.292   .938*
jid           .137    .293*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant
functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function

Canonical Discriminant Function Coefficients
              Function
              1       2
outdoor       -.091   .221
social        .196    .039
conservative  -.151   -.073
jid           .010    .009
(Constant)    -1.455  -3.854
Unstandardized coefficients
Functions at Group Centroids
job               Function
                  1       2
customer service  1.225   -.427
mechanic          -.053   .734
dispatch          -1.504  -.484
Unstandardized canonical discriminant functions evaluated at group means
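The group centroids are the unstandardized discriminant functions evaluated at each group's mean values. The Python sketch below recomputes them from the coefficient table and the group means in the Group Statistics table for this problem. The dictionary layout and function name are illustrative choices, and the results differ from the SPSS centroids in the second decimal because the printed coefficients are rounded to three decimals.

```python
# Unstandardized coefficients of the two discriminant functions
FUNCTIONS = {
    1: {"outdoor": -0.091, "social": 0.196, "conservative": -0.151,
        "jid": 0.010, "constant": -1.455},
    2: {"outdoor": 0.221, "social": 0.039, "conservative": -0.073,
        "jid": 0.009, "constant": -3.854},
}

# Group means from the Group Statistics table
GROUP_MEANS = {
    "customer service": {"outdoor": 12.5176, "social": 24.2235,
                         "conservative": 9.0235, "jid": 43.0},
    "mechanic": {"outdoor": 18.5376, "social": 21.1398,
                 "conservative": 10.1398, "jid": 47.0},
    "dispatch": {"outdoor": 15.5758, "social": 15.4545,
                 "conservative": 13.2424, "jid": 33.5},
}

# Centroids as printed by SPSS (function 1, function 2)
REPORTED = {
    "customer service": (1.225, -0.427),
    "mechanic": (-0.053, 0.734),
    "dispatch": (-1.504, -0.484),
}

def discriminant_score(values, coef):
    """Z = constant + sum of coefficient * value over the predictors."""
    return coef["constant"] + sum(coef[name] * x for name, x in values.items())

centroids = {
    group: tuple(discriminant_score(means, FUNCTIONS[f]) for f in (1, 2))
    for group, means in GROUP_MEANS.items()
}

for group, (z1, z2) in centroids.items():
    print(f"{group}: computed ({z1:.3f}, {z2:.3f}), reported {REPORTED[group]}")
```

The same `discriminant_score` function gives the pair of scores for any individual employee once that employee's four test values are supplied.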
Classification Processing Summary
Processed                                                     244
Excluded        Missing or out-of-range group codes           0
                At least one missing discriminating variable  0
Used in Output                                                244

Prior Probabilities for Groups
job               Prior   Cases Used in Analysis
                          Unweighted  Weighted
customer service  .333    85          85.000
mechanic          .333    93          93.000
dispatch          .333    66          66.000
Total             1.000   244         244.000
Classification Function Coefficients
              job
              customer service  mechanic  dispatch
outdoor       .568              .940      .803
social        1.294             1.090     .758
conservative  .695              .804      1.111
jid           .087              .084      .059
(Constant)    -25.342           -27.385   -21.556
Fisher's linear discriminant functions
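Fisher's classification functions are used by computing one score per group and assigning the case to the group with the highest score. With the equal prior probabilities shown for this problem, no further adjustment is needed. The Python sketch below applies the coefficients to the three group mean profiles from the Group Statistics table as a sanity check; the names used are illustrative, and each mean profile should be assigned to its own group.

```python
# Fisher's classification function coefficients, one function per job group
CLASSIFICATION_FUNCTIONS = {
    "customer service": {"outdoor": 0.568, "social": 1.294,
                         "conservative": 0.695, "jid": 0.087,
                         "constant": -25.342},
    "mechanic": {"outdoor": 0.940, "social": 1.090,
                 "conservative": 0.804, "jid": 0.084,
                 "constant": -27.385},
    "dispatch": {"outdoor": 0.803, "social": 0.758,
                 "conservative": 1.111, "jid": 0.059,
                 "constant": -21.556},
}

def classify(case):
    """Return the group whose classification score is highest,
    together with all group scores."""
    scores = {
        group: coef["constant"] + sum(coef[name] * x for name, x in case.items())
        for group, coef in CLASSIFICATION_FUNCTIONS.items()
    }
    return max(scores, key=scores.get), scores

# Mean profiles of the three groups, taken from the Group Statistics table
profiles = {
    "customer service": {"outdoor": 12.5176, "social": 24.2235,
                         "conservative": 9.0235, "jid": 43.0},
    "mechanic": {"outdoor": 18.5376, "social": 21.1398,
                 "conservative": 10.1398, "jid": 47.0},
    "dispatch": {"outdoor": 15.5758, "social": 15.4545,
                 "conservative": 13.2424, "jid": 33.5},
}

assigned = {name: classify(profile)[0] for name, profile in profiles.items()}
print(assigned)
```

To classify a new employee, pass a dictionary with that employee's `outdoor`, `social`, `conservative` and `jid` values to `classify`.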

Problem:
A national retail chain wants to build a discriminant function that would enable the firm to distinguish
between normal customers and loyal customers.
The data available for analysis are as follows:
The variables chosen for the analysis:

Selecting the statistics and classification options:
Next, we save the data and examine the output of the analysis.

Group Statistics
Loyalty           Variable          Mean      Std. Deviation  Valid N (listwise)
                                                              Unweighted  Weighted
Normal Customers  Frequency         20.80     12.599          15          15.000
                  Average_Purchase  24677.87  12889.131       15          15.000
                  Years             3.87      1.995           15          15.000
Loyal Customers   Frequency         30.40     7.129           15          15.000
                  Years             6.20      1.656           15          15.000
Total             Frequency         25.60     11.181          30          30.000
                  Years             5.03      2.157           30          30.000
Log Determinants
Loyalty               Rank  Log Determinant
Normal Customers      3     24.853
Loyal Customers       3     24.379
Pooled within-groups  3     25.073
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Summary of Canonical Discriminant Functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant
function. With only one function, it provides an index of overall model fit; its square is interpreted as
the proportion of variance explained.

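Canonical Discriminant Function Coefficients
                  Function 1
Frequency         .061
Average_Purchase  .000
Years             .400
(Constant)        -4.173
Z = 0.061X1 + 0.000X2 + 0.400X3 - 4.173
where X1 is Frequency, X2 is Average_Purchase and X3 is Years. The Average_Purchase coefficient is printed as
.000 because the output is rounded to three decimals; it is very small per unit of purchase amount rather than
exactly zero.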
Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .730a       100.0          100.0         .649
a. First 1 canonical discriminant function was used in the analysis.
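Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .578           14.518      3   .002
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables; here it is
.578, which is not high. With Sig. = .002, the function is statistically significant.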
Classification Results (a,c)
                                               Predicted Group Membership
                             Loyalty           Normal Customers  Loyal Customers  Total
Original             Count   Normal Customers  12                3                15
                             Loyal Customers   1                 14               15
                     %       Normal Customers  80.0              20.0             100.0
                             Loyal Customers   6.7               93.3             100.0
Cross-validated (b)  Count   Normal Customers  12                3                15
                             Loyal Customers   2                 13               15
                     %       Normal Customers  80.0              20.0             100.0
                             Loyal Customers   13.3              86.7             100.0
a. 86.7% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is
classified by the functions derived from all cases other than that case.
c. 83.3% of cross-validated grouped cases correctly classified.
Functions at Group Centroids
Loyalty           Function 1
Normal Customers  -.825
Loyal Customers   .825
Unstandardized canonical discriminant functions evaluated at group means

Problem:
To classify the players into different categories for the selection process.
The data related to the players are shown in the SPSS file as follows:
These are the variables related to the problem.

The data for these 20 players are shown as follows:
1.

2. Now we define the grouping and independent variables.
3. Now we select the statistics and classification options:

Now we save the data, and the output is displayed.

Group Statistics
player  Variable           Mean      Std. Deviation  Valid N (listwise)
                                                     Unweighted  Weighted
1.00    height             178.0000  4.32049         10          10.000
        arm length         80.7000   1.82878         10          10.000
        leg length         90.3000   3.52924         10          10.000
        palm length        20.0000   .66667          10          10.000
        shoulder strength  49.0000   6.99206         10          10.000
        reaction time      .4800     .14757          10          10.000
        back exposive      11.6000   2.17051         10          10.000
        speed              7.1000    .32998          10          10.000
        judgement          12.5000   2.22361         10          10.000
        patience           21.9000   3.98469         10          10.000
2.00    height             184.1111  2.89156         9           9.000
        arm length         84.7778   2.86259         9           9.000
        leg length         97.3333   4.18330         9           9.000
        palm length        20.8889   .78174          9           9.000
        reaction time      .7111     .09280          9           9.000
        back exposive      12.3333   1.41421         9           9.000
        speed              7.3889    .41366          9           9.000
        judgement          11.3333   1.11803         9           9.000
        patience           20.2222   1.98606         9           9.000
Total   height             180.8947  4.78301         19          19.000
        arm length         82.6316   3.11289         19          19.000
        leg length         93.6316   5.19840         19          19.000
        palm length        20.4211   .83771          19          19.000
        reaction time      .5895     .16962          19          19.000
        back exposive      11.9474   1.84010         19          19.000
        speed              7.2368    .39046          19          19.000
        judgement          11.9474   1.84010         19          19.000
        patience           21.1053   3.22998         19          19.000

Box's test of equality of covariance matrices: Using Box's M test, we test the null hypothesis that the
covariance matrices do not differ between the groups formed by the dependent variable. If Box's M test
is not significant, the assumption required for DA holds. In this case the assumption cannot be confirmed: as
the notes to the following table state, there are too few cases for the group covariance matrices to be non-singular.
Log Determinants
player                Rank  Log Determinant
1.00                  .a    .b
2.00                  .c    .b
Pooled within-groups  10    -1.480
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
a. Rank < 10
b. Too few cases to be non-singular
c. Rank < 9
Summary of Canonical Discriminant Functions:
The canonical correlation is the multiple correlation between the predictors and the discriminant
function. With only one function, it provides an index of overall model fit; its square is interpreted as
the proportion of variance explained.
Canonical Discriminant Function Coefficients
                   Function 1
height             .346
arm length         -.115
leg length         .251
palm length        -1.033
shoulder strength  -.135
reaction time      9.029
back exposive      -.459
speed              -1.553
judgement          -.140
patience           .251
(Constant)         -41.099

Unstandardized coefficients
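Z = -41.099 + .346x1 - .115x2 + .251x3 - 1.033x4 - .135x5 …
where x1, x2, x3, … are the variables in the order listed in the coefficient table (height, arm length, leg length, palm length, shoulder strength, …).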
Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         3.662a      100.0          100.0         .886
a. First 1 canonical discriminant function was used in the analysis.
The eigenvalue is the index of overall fit. Here the canonical correlation is .886, a high value.
Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .214           18.474      10  .047
Wilks' lambda measures the efficiency of the discriminant function in the model. Its value shows what
proportion of the variability in the dependent variable is not explained by the independent variables; here it
is .214, which is not high. With Sig. = .047, the function is statistically significant at the 5% level.

Classification Results (b,c)
                                     Predicted Group Membership
                             player  1.00    2.00    Total
Original             Count   1.00    9       1       10
                             2.00    0       9       9
                     %       1.00    90.0    10.0    100.0
                             2.00    .0      100.0   100.0
Cross-validated (a)  Count   1.00    8       2       10
                             2.00    4       5       9
                     %       1.00    80.0    20.0    100.0
                             2.00    44.4    55.6    100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified
by the functions derived from all cases other than that case.
b. 94.7% of original grouped cases correctly classified.
c. 68.4% of cross-validated grouped cases correctly classified.
(The original count row for group 1.00 is derived from the reported percentages of 90.0 and 10.0 and the group size of 10.)
Functions at Group Centroids
player  Function 1
1.00    -1.717
2.00    1.908
Unstandardized canonical discriminant functions evaluated at group means.

Managerial Implications:
Having covered the technical aspects of this useful concept, we can conclude that DA has the following
applications in the field of marketing:
• Market segmentation – Discriminant analysis, a multivariate technique for predicting group membership, is
often used for this type of problem because of its ability to classify individuals or experimental units into
two or more uniquely defined populations.
• Product research – Distinguish between heavy, medium, and light users of a product in terms of their
consumption habits and lifestyles.
• Perception/Image research – Distinguish between customers who exhibit favorable perceptions of a store
or company and those who do not.
• Advertising research – Identify how market segments differ in media consumption habits.
• Direct marketing – Identify the characteristics of consumers who will respond to a direct marketing
campaign and those who will not.

Sources
Textbook: Business Research Methods, a South-Asian Perspective by William G. Zikmund
http://www.bluefinik.com/discriminant-analysis-example/
https://en.wikipedia.org/wiki/Discriminant_function_analysis
http://stats.idre.ucla.edu/spss/dae/discriminant-function-analysis/
http://www.cs.uu.nl/docs/vakken/arm/SPSS/spss6.pdf
https://www.slideshare.net/amritashishbagchi/discriminant-analysis-30449666
