2. Introduction
• Discriminant analysis builds a predictive model for group
membership.
• The model is composed of a discriminant function (or, for more
than two groups, a set of discriminant functions) based on linear
combinations of the predictor variables that provide the best
discrimination between the groups.
• The functions are generated from a sample of cases for which
group membership is known; the functions can then be applied
to new cases that have measurements for the predictor variables
but have unknown group membership.
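The fit-then-apply idea above can be sketched for the simplest case: two groups and two predictors. This is a minimal illustration in plain Python, not a definitive implementation. It assumes equal prior probabilities and equal misclassification costs, and the data values and the function names (fit_two_group, classify) are invented for the example.

```python
# Two-group linear discriminant with two predictors, plain Python.
# All data values are invented for illustration.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def fit_two_group(group0, group1):
    """Return (weights, cutoff) estimated from cases with known membership."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s0, s1 = scatter_matrix(group0, m0), scatter_matrix(group1, m1)
    dof = len(group0) + len(group1) - 2
    # Pooled within-group covariance matrix (2 x 2)
    sw = [[(s0[a][b] + s1[a][b]) / dof for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Weights of the linear combination: inverse(pooled cov) * (m1 - m0)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: discriminant score of the midpoint between the two group means
    cutoff = sum(w[j] * (m0[j] + m1[j]) / 2 for j in range(2))
    return w, cutoff

def classify(case, w, cutoff):
    """Assign a new case to group 1 if its score exceeds the cutoff, else group 0."""
    score = sum(wj * xj for wj, xj in zip(w, case))
    return 1 if score > cutoff else 0

# Cases with known group membership (hypothetical measurements)
group0 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group1 = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]

w, cutoff = fit_two_group(group0, group1)

# New cases with measurements but unknown membership
print(classify((2.5, 2.5), w, cutoff))
print(classify((6.5, 7.5), w, cutoff))
```

With more than two groups or more predictors, the same idea needs general matrix routines, which statistical packages provide.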
3. History
• Fisher formulated the linear discriminant for two classes in 1936,
and C. R. Rao generalized it to multiple classes in 1948.
• In his paper, Fisher used a discriminant function to classify between
two plant species, Iris setosa and Iris versicolor.
• Fisher’s (1936) classic example of discriminant analysis involved two
varieties of iris and four predictor variables (petal width, petal length,
sepal width, and sepal length). Fisher not only wanted to determine
if the varieties differed significantly on the four continuous variables,
but he was also interested in predicting variety classification for
unknown individual plants.
4. Introduction
• Discriminant analysis is a technique for analyzing research data
when the criterion (dependent) variable is categorical and the
predictor (independent) variables are interval-scaled.
• The term categorical variable means that the dependent variable is
divided into a number of categories. For example, three brands of
computers, Computer A, Computer B and Computer C can be the
categorical dependent variable.
5. Assumptions
• Cases should be independent.
• Predictor variables should have a multivariate normal
distribution, and within-group variance-covariance matrices
should be equal across groups.
• Group membership is assumed to be mutually exclusive (that is,
no case belongs to more than one group) and collectively
exhaustive (that is, all cases are members of a group).
• The procedure is most effective when group membership is a
truly categorical variable; if group membership is based on
values of a continuous variable (for example, high IQ versus low
IQ), consider using linear regression to take advantage of the
richer information that is offered by the continuous variable
itself.
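One informal way to look at the equal-covariance assumption above is to compute each group's sample variance-covariance matrix and compare them side by side. The sketch below does this in plain Python for two hypothetical groups with invented values. The helper name covariance_matrix and the max_gap summary are illustrative assumptions, not a formal test; statistical packages typically offer a formal test such as Box's M for this purpose.

```python
# Compare within-group variance-covariance matrices (informal check).
# Group labels and data values are invented for illustration.

def covariance_matrix(rows):
    """Sample variance-covariance matrix (divisor n - 1) of a list of cases."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)]
            for a in range(p)]

groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)],
    "B": [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],
}

cov_by_group = {name: covariance_matrix(rows) for name, rows in groups.items()}

# Largest absolute element-wise difference between the two matrices
cov_a, cov_b = cov_by_group["A"], cov_by_group["B"]
max_gap = max(abs(cov_a[i][j] - cov_b[i][j])
              for i in range(2) for j in range(2))

for name, cov in cov_by_group.items():
    print(name, cov)
print("largest element-wise gap:", max_gap)
```

A large gap relative to the sizes of the variances suggests that the groups may not share a common covariance matrix, in which case the assumption deserves a closer look.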
6. Application of DA
• Consider a loan officer at a bank who wishes to decide whether to
approve an applicant's automobile loan. This decision is made by
determining whether the applicant's characteristics are more similar
to those persons who in the past repaid loans successfully or to those
persons who defaulted.
• Information on these two groups, available from past records, would
include factors such as age, income, marital status, outstanding debt,
and home ownership.
7. Application of DA
• A large international air carrier has collected data on employees
in three different job classifications:
• 1) customer service personnel,
• 2) mechanics and
• 3) dispatchers.
• The director of Human Resources wants to know if these three
job classifications appeal to different personality types. Each
employee is administered a battery of psychological tests, which
includes measures of interest in outdoor activity, sociability and
conservativeness.
8. Caselet for discussion
• Aruna Kumari (Aruna), the founder of Aruna Beauty Salons (ABS), a beauty care salon,
had decided to introduce a loyalty program by distributing privilege cards to her loyal
customers.
• Based on her experience, she classified a sample of customers as loyal or disloyal
based on the amount and frequency of their purchases at the salon.
• However, she was unsure whether her classification was correct. When she requested
an analysis to rule out any doubt, Ram Kumar (Ram), a family friend, research
manager and freelance consultant, agreed to perform the required analysis and
provide a report. Ram assigned the task to his subordinate, Varun Kumar
(Varun), to be completed in the minimum possible time.
• Varun was in a dilemma as to which technique/tool would give him the best possible
analysis in the shortest time.
9. SPSS Practice
• Let’s pursue the air carrier example from above.
• The data file is discrim.sav. The dataset has 244 observations
on four variables.
• The psychological variables are outdoor
interests, social and conservative.
• The categorical variable is job type, with three levels: 1)
customer service, 2) mechanic, and 3) dispatcher.
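As a small worked step for this dataset: the number of discriminant functions that can be extracted is at most the smaller of (number of groups minus 1) and the number of predictors. The helper name below is a hypothetical choice for illustration.

```python
# Upper bound on the number of discriminant functions.

def max_discriminant_functions(n_groups, n_predictors):
    """At most min(groups - 1, predictors) discriminant functions exist."""
    return min(n_groups - 1, n_predictors)

# Three job types, three psychological predictors
n_functions = max_discriminant_functions(n_groups=3, n_predictors=3)
print(n_functions)  # 2
```

So for three job types and three psychological variables, an analysis can report at most two discriminant functions.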