Discriminant Analysis
Presented by Farzan Madadizadeh,
Ph.D. student of Biostatistics, Tehran University of Medical
Sciences, Tehran, Iran.
GROUP
SEPARATION
There are two major objectives in separation of groups:
 Description of group separation (discriminant analysis)
 discriminant functions
 Prediction or allocation of observations to groups
(classification analysis)
 classification functions
Unfortunately, there is no general agreement on the usage of the terms "discriminant analysis" and
"discriminant functions." Many writers, perhaps the majority, use the term "discriminant analysis" in
connection with the second objective, prediction or allocation.
Basic Concept & Applications of DA
• 1920s: Karl Pearson; 1936: Fisher and other English
statisticians
• Discriminant analysis is a technique for analyzing data
when the criterion or dependent variable is categorical and
the predictor or independent variables are interval in nature.
• The total sample can be divided into groups based on a
qualitative dependent variable.
DA Model
Discriminant functions are linear combinations of variables
that best separate groups.
D = b0 + b1X1 + b2X2 + b3X3 + . . . + bk Xk
where
b's = discriminant coefficients or weights
X's = predictor or independent variables
THE DISCRIMINANT FUNCTION FOR TWO
GROUPS
 We assume that the two populations to be compared have the
same covariance matrix but distinct mean vectors.
 Two samples: $y_{11}, \dots, y_{1n_1}$ from the first group and $y_{21}, \dots, y_{2n_2}$ from the second.
 A linear combination $z = a'y$ transforms each observation vector $y$ to a scalar $z$.
 The discriminant function maximizes the distance between the two (transformed) group mean vectors.
THE DISCRIMINANT FUNCTION FOR TWO
GROUPS
The samples $y_{11}, \dots, y_{1n_1}$ and $y_{21}, \dots, y_{2n_2}$ are transformed to $z_{11}, \dots, z_{1n_1}$ and $z_{21}, \dots, z_{2n_2}$ by $z = a'y$. The coefficient vector $a$ is chosen to maximize the standardized squared distance between the transformed group means:

$$\frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = \frac{[a'(\bar{y}_1 - \bar{y}_2)]^2}{a' S_{pl} a},$$

which is maximized by

$$a = S_{pl}^{-1}(\bar{y}_1 - \bar{y}_2),$$

where $S_{pl}$ is the pooled within-group covariance matrix.
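The two-group computation above can be sketched in Python with NumPy (a minimal illustration on simulated data; the group means and sample sizes are hypothetical):

```python
import numpy as np

# Hypothetical two-group data: rows are observations, columns are p = 2 variables.
rng = np.random.default_rng(0)
y1 = rng.normal(loc=[0, 0], scale=1.0, size=(20, 2))  # group 1
y2 = rng.normal(loc=[2, 1], scale=1.0, size=(25, 2))  # group 2

n1, n2 = len(y1), len(y2)
ybar1, ybar2 = y1.mean(axis=0), y2.mean(axis=0)

# Pooled within-group covariance matrix S_pl
S_pl = ((n1 - 1) * np.cov(y1, rowvar=False)
        + (n2 - 1) * np.cov(y2, rowvar=False)) / (n1 + n2 - 2)

# Discriminant coefficient vector a = S_pl^{-1} (ybar1 - ybar2)
a = np.linalg.solve(S_pl, ybar1 - ybar2)

# z = a'y projects each observation to a scalar
z1, z2 = y1 @ a, y2 @ a
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting $S_{pl}$.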
TWO GROUPS DA
DISCRIMINANT ANALYSIS FOR SEVERAL
GROUPS
 Discriminant Functions
With $k$ groups having mean vectors $\bar{y}_1, \dots, \bar{y}_k$, let $H$ be the between-group and $E$ the within-group sum-of-squares matrix. For $z = a'y$,

$$(\bar{z}_1 - \bar{z}_2)^2 = [a'(\bar{y}_1 - \bar{y}_2)]^2, \qquad s_z^2 = a' S_{pl} a,$$

$$a'Ha = SSH(z), \qquad a'Ea = SSE(z).$$

The coefficient vectors that maximize $a'Ha / a'Ea$ are the eigenvectors of $E^{-1}H$; the nonzero eigenvalues $\lambda_1 \ge \dots \ge \lambda_s$, with $s = \min(k-1, p)$, correspond to the discriminant functions. For $k = 2$, $a = S_{pl}^{-1}(\bar{y}_1 - \bar{y}_2)$ is proportional to the single eigenvector of $E^{-1}H$.
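A minimal NumPy sketch of the several-group case: with hypothetical three-group data, it forms $H$ and $E$, extracts the eigenvalues of $E^{-1}H$, and computes each function's share of the total:

```python
import numpy as np

# Hypothetical data: k = 3 groups, p = 2 variables.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=1.0, size=(15, 2))
          for m in ([0, 0], [2, 0], [0, 2])]

grand_mean = np.vstack(groups).mean(axis=0)

# Between-group matrix H and within-group matrix E
H = sum(len(g) * np.outer(g.mean(0) - grand_mean, g.mean(0) - grand_mean)
        for g in groups)
E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)

# Discriminant functions are the eigenvectors of E^{-1} H
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(E) @ H)
order = np.argsort(eigvals.real)[::-1]
lam = eigvals.real[order]

# Relative importance of each function: lambda_i / sum_j lambda_j
importance = lam / lam.sum()
```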
 The relative importance of each discriminant function $z_i$ is measured by its eigenvalue's share of the total:

$$\frac{\lambda_i}{\sum_{j=1}^{s} \lambda_j}$$
STANDARDIZED DISCRIMINANT FUNCTIONS
 Standardized coefficients correctly reflect the joint contribution of the variables to the
discriminant function $z$ as it maximally separates the groups.
TESTS OF SIGNIFICANCE
 In order to test hypotheses, we need the assumption of
multivariate normality
 For two groups: $H_0\colon \mu_1 = \mu_2$, tested with Hotelling's $T^2$ statistic.
 For several groups: Wilks' lambda test (a MANOVA test statistic),
which is approximately chi-square with $p(k - 1)$ degrees of freedom.
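Wilks' lambda and its chi-square (Bartlett) approximation can be sketched as follows (simulated three-group data; $H$ and $E$ are the between- and within-group matrices from the several-group setup):

```python
import numpy as np

# Simulated data: k = 3 groups, p = 2 variables, 20 observations each (N = 60).
rng = np.random.default_rng(4)
groups = [rng.normal(loc=m, scale=1.0, size=(20, 2))
          for m in ([0, 0], [1, 0], [0, 1])]
N, k, p = 60, 3, 2

grand = np.vstack(groups).mean(axis=0)
H = sum(len(g) * np.outer(g.mean(0) - grand, g.mean(0) - grand) for g in groups)
E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)

# Wilks' lambda: det(E) / det(E + H), between 0 and 1
lam = np.linalg.det(E) / np.linalg.det(E + H)

# Bartlett's approximation: -(N - 1 - (p + k)/2) * ln(lambda)
# is approximately chi-square with p(k - 1) degrees of freedom.
chi2_stat = -(N - 1 - (p + k) / 2) * np.log(lam)
df = p * (k - 1)
```

Small values of $\Lambda$ (large chi-square statistic) indicate separation among the group mean vectors.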
INTERPRETATION OF DISCRIMINANT
FUNCTIONS
 In interpretation, the signs of the coefficients are taken into
account; in ascertaining the contribution, the signs are
ignored and the coefficients are ranked in absolute value.
 The discriminant functions are uncorrelated but not
orthogonal.
 A common approach to assessing the contribution of each
variable (in the presence of the other variables) to
separating the groups is the correlations between variables
and discriminant functions.
Limitations of DA
 the coefficient for a variable may change notably if variables
are added or deleted
 the coefficients may not be stable from sample to sample
unless the sample size is large relative to the number of
variables.
CLASSIFICATION ANALYSIS:
 In classification, a sampling unit (subject or object) whose
group membership is unknown is assigned to a group on the
basis of the vector of p measured values, y, associated with the
unit.
CLASSIFICATION INTO TWO GROUPS
 Fisher's (1936) linear classification procedure
 two populations have the same covariance matrix
 Normality is not required.
 computes $z = a'y$ and assigns $y$ to $G_1$ if $z$ is closer to $\bar{z}_1$ than to $\bar{z}_2$.
Fisher's procedure for classification into two
groups.
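Fisher's two-group rule can be sketched as follows (simulated data; since $\bar{z}_1 > \bar{z}_2$ by construction of $a$, "closer to $\bar{z}_1$" reduces to $z$ exceeding the midpoint $(\bar{z}_1 + \bar{z}_2)/2$):

```python
import numpy as np

# Simulated training data for two groups (p = 2 variables).
rng = np.random.default_rng(2)
y1 = rng.normal([0, 0], 1.0, (30, 2))
y2 = rng.normal([3, 1], 1.0, (30, 2))

ybar1, ybar2 = y1.mean(0), y2.mean(0)
S_pl = (((len(y1) - 1) * np.cov(y1, rowvar=False)
         + (len(y2) - 1) * np.cov(y2, rowvar=False))
        / (len(y1) + len(y2) - 2))
a = np.linalg.solve(S_pl, ybar1 - ybar2)

def classify(y):
    # Assign to G1 when z = a'y is closer to zbar1 than to zbar2;
    # since zbar1 > zbar2, that means z lies above the midpoint.
    z = a @ y
    midpoint = (a @ ybar1 + a @ ybar2) / 2
    return 1 if z > midpoint else 2
```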
CLASSIFICATION INTO SEVERAL GROUPS
 Equal Population Covariance Matrices: Linear Classification
Functions
 Unequal Population Covariance Matrices:
Quadratic Classification Functions
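Under equal covariance matrices, the linear classification functions $L_i(y) = \bar{y}_i' S_{pl}^{-1} y - \tfrac{1}{2}\bar{y}_i' S_{pl}^{-1}\bar{y}_i$ assign $y$ to the group with the largest $L_i(y)$. A sketch with hypothetical three-group data:

```python
import numpy as np

# Hypothetical data: k = 3 groups sharing a common covariance structure.
rng = np.random.default_rng(5)
groups = [rng.normal(loc=m, scale=1.0, size=(25, 2))
          for m in ([0, 0], [3, 0], [0, 3])]

n = sum(len(g) for g in groups)
S_pl = sum((len(g) - 1) * np.cov(g, rowvar=False)
           for g in groups) / (n - len(groups))
means = [g.mean(0) for g in groups]
Sinv = np.linalg.inv(S_pl)

def classify(y):
    # Linear classification function for each group i:
    # L_i(y) = ybar_i' Sinv y - 0.5 * ybar_i' Sinv ybar_i
    scores = [m @ Sinv @ y - 0.5 * m @ Sinv @ m for m in means]
    return int(np.argmax(scores)) + 1  # groups numbered 1..k
```

With unequal covariance matrices, each group's own $S_i$ replaces $S_{pl}$ and the functions become quadratic in $y$.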
ESTIMATING MISCLASSIFICATION RATES
 Classification table or Confusion matrix
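A classification table can be tallied directly from true and predicted group labels (hypothetical labels shown); the apparent error rate is the off-diagonal proportion:

```python
import numpy as np

# Hypothetical true and predicted group labels for 10 sampling units.
true = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
pred = np.array([1, 1, 2, 1, 1, 2, 2, 1, 2, 2])

# Classification table (confusion matrix): rows = actual group, cols = predicted.
labels = [1, 2]
table = np.array([[np.sum((true == i) & (pred == j)) for j in labels]
                  for i in labels])

# Apparent error rate: proportion of misclassified units (off-diagonal total).
error_rate = 1 - np.trace(table) / table.sum()
```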
Nearest Neighbor Classification Rule
 Fix and Hodges (1951)
 Also known as the k nearest neighbor rule
 To classify $y_i$ into one of two groups, the $k$ points nearest to $y_i$
are examined; if the majority of the $k$ points belong to $G_1$,
assign $y_i$ to $G_1$.
 distance function: e.g., Euclidean or Mahalanobis distance
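A sketch of the k-nearest-neighbor rule with Euclidean distance and k = 3 (hypothetical labeled points):

```python
import numpy as np

# Hypothetical labeled training points in two groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.1, 1.9], [1.8, 2.2]])
g = np.array([1, 1, 1, 2, 2, 2])

def knn_classify(y, k=3):
    # Euclidean distance from y to every labeled point
    d = np.linalg.norm(X - y, axis=1)
    nearest = g[np.argsort(d)[:k]]
    # Majority vote among the k nearest neighbors
    return 1 if np.sum(nearest == 1) > k / 2 else 2
```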
LOGISTIC REGRESSION CLASSIFIER
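As a sketch, a logistic regression classifier can be fit by gradient descent on the log-loss (simulated two-group data; in practice a library routine would be used):

```python
import numpy as np

# Simulated two-group data; labels 0 (G1) and 1 (G2).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 1.0, (40, 2)),
               rng.normal([2, 2], 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))        # predicted P(G2 | x)
    w -= 0.1 * Xb.T @ (p - y) / len(y)   # gradient step on the log-loss

def predict(x):
    # Classify as G2 when the fitted probability exceeds 0.5
    return int(1 / (1 + np.exp(-(w @ np.concatenate([[1.0], x])))) > 0.5)
```

Unlike linear discriminant analysis, logistic regression does not assume normality of the predictors.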