Linear Discriminant Analysis: An Overview
Alaa Tharwat
Alaa Tharwat, Tarek Gaber, Abdelhameed Ibrahim and Aboul Ella Hassanien, "Linear discriminant analysis: A detailed tutorial", AI Communications 30 (2017) 169-190
Email: engalaatharwat@hotmail.com
December 30, 2017
Alaa Tharwat December 30, 2017 1 / 66
Agenda
Introduction to Linear Discriminant Analysis (LDA).
Theoretical Background to LDA.
Definition of LDA.
Calculating the Between-Class Variance (SB).
Calculating the Within-Class Variance (SW ).
Constructing the Lower Dimensional Space.
Projection onto the LDA space.
Example of Two LDA Subspaces.
Computational Complexity of LDA.
Class-Dependent vs. Class-Independent Methods.
Numerical Example.
Main Problems of LDA.
Linearity problem.
Small Sample Size Problem.
Conclusions.
Alaa Tharwat December 30, 2017 2 / 66
Introduction to Linear Discriminant Analysis (LDA)
The goal of any dimensionality reduction method is to reduce the dimensions of the original data for different purposes, such as visualization or decreasing CPU time.
Dimensionality reduction techniques are important in many applications related to machine learning, data mining, bioinformatics, biometrics, and information retrieval.
There are two types of dimensionality reduction methods, namely,
supervised and unsupervised.
Supervised (e.g. LDA).
Unsupervised (e.g. PCA).
Alaa Tharwat December 30, 2017 4 / 66
Introduction to Linear Discriminant Analysis (LDA)
The Linear Discriminant Analysis (LDA) technique is developed to
transform the features into a lower dimensional space, which
maximizes the ratio of the between-class variance to the within-class
variance, thereby guaranteeing maximum class separability.
J(W) = \frac{W^T S_B W}{W^T S_W W}    (1)
There are two types of LDA technique to deal with classes:
class-dependent and class-independent.
In the class-dependent LDA, a separate lower dimensional space is calculated for each class, and the data of that class are projected onto it.
In the class-independent LDA, each class is considered against all of the other classes together, and there is only one lower dimensional space onto which all classes project their data.
Alaa Tharwat December 30, 2017 5 / 66
Theoretical background to LDA Definition of LDA
Given the original data matrix X = {x1, x2, . . . , xN }, where xi
represents the ith sample, pattern, or observation and N is the total
number of samples.
Each sample is represented by M features (xi ∈ RM ). In other
words, each sample is represented as a point in M-dimensional space.
The data matrix is partitioned into c classes as follows, X = [ω1, ω2, . . . , ωc], where each class ωi has ni samples.
The total number of samples (N) is calculated as follows: N = \sum_{i=1}^{c} n_i.
Alaa Tharwat December 30, 2017 7 / 66
Theoretical background to LDA Definition of LDA
The goal of the LDA technique is to project the original data matrix onto a lower dimensional space. To achieve this goal, three steps need to be performed.
1 The first step is to calculate the separability between different classes
(i.e. the distance between the means of different classes), which is
called the between-class variance or between-class matrix.
2 The second step is to calculate the distance between the mean and
the samples of each class, which is called the within-class variance or
within-class matrix.
3 The third step is to construct the lower dimensional space which
maximizes the between-class variance and minimizes the within-class
variance.
Alaa Tharwat December 30, 2017 8 / 66
Theoretical background to LDA Calculating the Between-Class variance (SB)
To calculate the between-class variance (SB), the separation distance between the different classes, denoted by (mi − m), is calculated as follows:

(m_i − m)^2 = (W^T µ_i − W^T µ)^2 = W^T (µ_i − µ)(µ_i − µ)^T W    (2)

where,
m_i represents the projection of the mean of the ith class and is calculated as follows: m_i = W^T µ_i,
m is the projection of the total mean of all classes and is calculated as follows: m = W^T µ,
W represents the transformation matrix of LDA,
µ_i (1 × M) represents the mean of the ith class (µ_i = \frac{1}{n_i} \sum_{x_j ∈ ω_i} x_j), and
µ (1 × M) is the total mean of all classes (µ = \frac{1}{N} \sum_{i=1}^{N} x_i = \sum_{j=1}^{c} \frac{n_j}{N} µ_j), where c represents the total number of classes.
Alaa Tharwat December 30, 2017 10 / 66
Theoretical background to LDA Calculating the Between-Class variance (SB)
The term (µi − µ)(µi − µ)T in Equation (2) represents the
separation distance between the mean of the ith class (µi) and the
total mean (µ), or simply it represents the between-class variance of
the ith class (SBi ).
Substitute SBi into Equation (2) as follows:

(m_i − m)^2 = W^T S_{B_i} W    (3)

The total between-class variance is calculated as follows: S_B = \sum_{i=1}^{c} n_i S_{B_i}.
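As a concrete illustration of Equations (2) and (3), the following is a minimal NumPy sketch (not part of the original slides; the function name and arguments are illustrative) that accumulates the total between-class variance SB from labelled data:

    import numpy as np

    def between_class_scatter(X, y):
        """SB = sum_i n_i (mu_i - mu)(mu_i - mu)^T over all classes."""
        mu = X.mean(axis=0)                      # total mean of all samples (1 x M)
        M = X.shape[1]
        S_B = np.zeros((M, M))
        for label in np.unique(y):
            X_i = X[y == label]                  # samples of the i-th class (n_i x M)
            mu_i = X_i.mean(axis=0)              # mean of the i-th class
            diff = (mu_i - mu).reshape(-1, 1)    # column vector (M x 1)
            S_B += X_i.shape[0] * diff @ diff.T  # n_i * S_Bi
        return S_B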
Alaa Tharwat December 30, 2017 11 / 66
Theoretical background to LDA Calculating the Between-Class variance (SB)
[Figure: the between-class variances SB1, SB2, and SB3 of three classes, i.e. the distances between the class means (µ1, µ2, µ3) and the total mean (µ) in the original space (x1, x2), together with the projected means (m1, m2, m3) and the projected distance m1 − m2 on the first eigenvector (v1) and the second eigenvector (v2).]
Alaa Tharwat December 30, 2017 12 / 66
Theoretical background to LDA Calculating the Within-Class variance (SW )
The within-class variance of the ith class (SWi ) represents the
difference between the mean and the samples of that class.
The LDA technique searches for a lower-dimensional space, which is used to minimize the difference between the projected mean (mi) and the projected samples of each class (W^T xi), or simply minimizes the within-class variance.

\sum_{x_i ∈ ω_j} (W^T x_i − m_j)^2 = \sum_{x_i ∈ ω_j} (W^T x_i − W^T µ_j)^2 = \sum_{x_i ∈ ω_j} W^T (x_i − µ_j)(x_i − µ_j)^T W = W^T S_{W_j} W    (4)
Alaa Tharwat December 30, 2017 14 / 66
Theoretical background to LDA Calculating the Within-Class variance (SW )
From Equation (4), the within-class variance for each class can be calculated as follows: S_{W_j} = d_j^T d_j = \sum_{i=1}^{n_j} (x_{ij} − µ_j)(x_{ij} − µ_j)^T, where x_{ij} represents the ith sample in the jth class, as shown in Fig. (2, steps (E, F)), and d_j is the centered data of the jth class, i.e. d_j = ω_j − µ_j = \{x_i\}_{i=1}^{n_j} − µ_j.
Step (F) in the figure illustrates how the within-class variance of the first class (SW1) in our example is calculated.
The total within-class variance is the sum of the within-class matrices of all classes (see Fig. (2, step (F))), and it can be calculated as in Equation (5).

S_W = \sum_{i=1}^{c} S_{W_i} = \sum_{x_i ∈ ω_1} (x_i − µ_1)(x_i − µ_1)^T + \sum_{x_i ∈ ω_2} (x_i − µ_2)(x_i − µ_2)^T + · · · + \sum_{x_i ∈ ω_c} (x_i − µ_c)(x_i − µ_c)^T    (5)
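Following Equations (4) and (5), a matching NumPy sketch (again illustrative, assuming the same data matrix X and class labels y) computes SW by summing SWj = dj^T dj over the classes:

    import numpy as np

    def within_class_scatter(X, y):
        """SW = sum_j d_j^T d_j, where d_j is the mean-centered data of class j."""
        M = X.shape[1]
        S_W = np.zeros((M, M))
        for label in np.unique(y):
            X_j = X[y == label]           # samples of the j-th class (n_j x M)
            d_j = X_j - X_j.mean(axis=0)  # centered data of the j-th class
            S_W += d_j.T @ d_j            # S_Wj
        return S_W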
Alaa Tharwat December 30, 2017 15 / 66
Theoretical background to LDA Calculating the Within-Class variance (SW )
[Figure: the within-class variances SW1, SW2, and SW3 of three classes, i.e. the variance of each class around its own mean (µ1, µ2, µ3) in the original space (x1, x2), together with the projected means (m1, m2, m3) and the projected within-class variance (e.g. SW1) on the first eigenvector (v1) and the second eigenvector (v2).]
Alaa Tharwat December 30, 2017 16 / 66
Theoretical background to LDA Constructing the lower dimensional space
After calculating the between-class variance (SB) and within-class
variance (SW ), the transformation matrix (W) of the LDA technique
can be calculated as in Equation (1), which can be reformulated as
an optimization problem as in Equation (6).
S_B W = λ S_W W    (6)

where λ represents the eigenvalues of the transformation matrix (W).
The solution of this problem can be obtained by calculating the eigenvalues (λ) and eigenvectors (V = {v_1, v_2, . . . , v_M}) of W = S_W^{-1} S_B, if SW is non-singular.
Alaa Tharwat December 30, 2017 18 / 66
Theoretical background to LDA Constructing the lower dimensional space
The eigenvalues are scalar values, while the eigenvectors are non-zero vectors that provide the information about the LDA space.
The eigenvectors represent the directions of the new space, and the corresponding eigenvalues represent the scaling factor, length, or magnitude of the eigenvectors.
Thus, each eigenvector represents one axis of the LDA space, and the associated eigenvalue represents the robustness of this eigenvector.
The robustness of an eigenvector reflects its ability to discriminate between different classes and to decrease the within-class variance of each class; hence, it meets the LDA goal.
Thus, the eigenvectors with the k highest eigenvalues are used to construct the lower dimensional space (Vk), while the other eigenvectors ({v_{k+1}, v_{k+2}, . . . , v_M}) are neglected.
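A minimal sketch of this step (assuming SW is non-singular and reusing the illustrative scatter functions sketched on the previous slides):

    import numpy as np

    def lda_subspace(S_B, S_W, k):
        """Return Vk: the k eigenvectors of W = SW^{-1} SB with the largest eigenvalues."""
        W = np.linalg.inv(S_W) @ S_B             # assumes SW is invertible (no SSS problem)
        eigvals, eigvecs = np.linalg.eig(W)
        order = np.argsort(eigvals.real)[::-1]   # sort eigenvalues in decreasing order
        return eigvecs[:, order[:k]].real        # Vk (M x k); the rest are neglected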
Alaa Tharwat December 30, 2017 19 / 66
Theoretical background to LDA Constructing the lower dimensional space
[Figure: Visualized steps to calculate a lower dimensional subspace of the LDA technique: the data matrix X (N × M) with classes ω1, ω2, ω3; the mean of each class (µi) and the total mean (µ); the between-class matrix SB = SB1 + SB2 + SB3; the mean-centering of the data (Di = ωi − µi, steps (E, F)) and the within-class matrix SW = SW1 + SW2 + SW3; and the sorted eigenvalues (λ1, . . . , λM) and eigenvectors (v1, . . . , vM) of arg max W^T SB W / W^T SW W, of which the k eigenvectors with the largest eigenvalues form the lower dimensional space Vk ∈ R^{M×k}.]
Alaa Tharwat December 30, 2017 20 / 66
Theoretical background to LDA Projection onto the LDA space
The dimension of the original data matrix (X ∈ R^{N×M}) is reduced by projecting it onto the lower dimensional space of LDA (V_k ∈ R^{M×k}), as denoted in Equation (7).
The dimension of the data after projection is k; hence, M − k features are ignored or deleted from each sample.
Each sample (xi), which was represented as a point in an M-dimensional space, will be represented in a k-dimensional space by projecting it onto the lower dimensional space (Vk) as follows: yi = xi Vk.
Y = XVk (7)
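In code, Equation (7) is a single matrix product (a sketch assuming X is the N × M data matrix and V_k is the M × k space returned by the previous sketch):

    Y = X @ V_k   # projected data (N x k); the remaining M - k dimensions are discarded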
Alaa Tharwat December 30, 2017 22 / 66
Theoretical background to LDA Projection onto the LDA space
[Figure: the data matrix X (N × M), with classes ω1, ω2, ω3, is multiplied by the lower dimensional space Vk ∈ R^{M×k} to give the projected data Y = XVk ∈ R^{N×k}.]
Figure: Projection of the original samples (i.e. data matrix) on the lower
dimensional space of LDA (Vk).
Alaa Tharwat December 30, 2017 23 / 66
Theoretical background to LDA Example of two LDA subspaces
[Figure: three classes with means µ1, µ2, µ3 and total mean µ in the original space (x1, x2); the between-class distances (SB1, SB2, SB3), the within-class variances (SW1, SW2, SW3), and the projections (m1, m2, m3, m1 − m2, SW1) onto the first eigenvector (v1) and the second eigenvector (v2).]
Figure: A visualized comparison between the two lower-dimensional sub-spaces
which are calculated using three different classes.
Alaa Tharwat December 30, 2017 25 / 66
Theoretical background to LDA Example of two LDA subspaces
The above Figure shows a comparison between two
lower-dimensional sub-spaces.
Each class has five samples, and each sample is represented by only two features (xi ∈ R^2) so that it can be visualized. Thus, each sample is represented as a point in a two-dimensional space.
The transformation matrix (W(2 × 2)) is calculated as mentioned
before.
The eigenvalues (λ1 and λ2) and eigenvectors (i.e. sub-spaces)
(V = {v1, v2}) of W are then calculated. Thus, there are two
eigenvectors or sub-spaces.
Alaa Tharwat December 30, 2017 26 / 66
Theoretical background to LDA Example of two LDA subspaces
A comparison between the two lower-dimensional sub-spaces reveals the following:
First, the separation distance between different classes when the data
are projected on the first eigenvector (v1) is much greater than when
the data are projected on the second eigenvector (v2). As shown in
the figure, the three classes are efficiently discriminated when the data
are projected on v1. Moreover, the distance between the means of the
first and second classes (m1 − m2) when the original data are
projected on v1 is much greater than when the data are projected on
v2, which reflects that the first eigenvector discriminates the three
classes better than the second one.
Second, the within-class variance when the data are projected on v1 is much smaller than when they are projected on v2. For example, SW1 when the data are projected on v1 is much smaller than when the data are projected on v2. Thus, projecting the data on v1 minimizes the within-class variance much better than v2.
From these two notes, we conclude that the first eigenvector meets the goal of the lower-dimensional space of the LDA technique better than the second eigenvector; hence, it is selected to construct the lower-dimensional space.
Alaa Tharwat December 30, 2017 27 / 66
Theoretical background to LDA Computational complexity of LDA
The computational complexity of the first four steps, which are common to the class-dependent and class-independent methods, is computed as follows.
As illustrated in Algorithm (1) (see the paper), in step (2), to calculate the mean of the ith class, there are n_i M additions and M divisions, i.e., in total, there are (NM + cM) operations.
In step (3), there are NM additions and M divisions, i.e., there are (NM + M) operations.
The computational complexity of the fourth step is c(M + M^2 + M^2), where M is for (µ_i − µ), M^2 is for (µ_i − µ)(µ_i − µ)^T, and the last M^2 is for the multiplication between n_i and the matrix (µ_i − µ)(µ_i − µ)^T.
In the fifth step, there are N(M + M^2) operations, where M is for (x_{ij} − µ_j) and M^2 is for (x_{ij} − µ_j)(x_{ij} − µ_j)^T.
In the sixth step, there are M^3 operations to calculate S_W^{-1}, M^3 for the multiplication between S_W^{-1} and S_B, and M^3 to calculate the eigenvalues and eigenvectors.
Alaa Tharwat December 30, 2017 29 / 66
Theoretical background to LDA Computational complexity of LDA
In the class-independent method, the computational complexity is O(NM^2) if N > M; otherwise, the complexity is O(M^3).
In the class-dependent algorithm, the number of operations to calculate the within-class variance of each class S_{W_j} in the sixth step is n_j(M + M^2), and to calculate SW, N(M + M^2) operations are needed. Hence, calculating the within-class variance is the same for both LDA methods. In the seventh and eighth steps, there are M^3 operations for the inverse, M^3 for the multiplication S_{W_i}^{-1} S_B, and M^3 for calculating the eigenvalues and eigenvectors. These two steps are repeated for each class, which increases the complexity of the class-dependent algorithm. In total, the computational complexity of the class-dependent algorithm is O(NM^2) if N > M; otherwise, the complexity is O(cM^3). Hence, the class-dependent method needs more computations than the class-independent method.
For example, given 40 classes, each with ten samples, where each sample is represented by 4096 features (M > N), the computational complexity of the class-independent method is O(M^3) = 4096^3, while the class-dependent method needs O(cM^3) = 40 × 4096^3.
Alaa Tharwat December 30, 2017 30 / 66
Theoretical background to LDA Class-Dependent vs. Class-Independent methods
The aim of the two methods (class-dependent vs. class-independent)
of the LDA is to calculate the LDA space.
In the class-dependent LDA, a separate lower dimensional space is calculated for each class as follows: W_i = S_{W_i}^{-1} S_B, where W_i represents the transformation matrix of the ith class. Thus, eigenvalues and eigenvectors are calculated for each transformation matrix separately. Hence, the samples of each class are projected on their corresponding eigenvectors.
In the class-independent method, one lower dimensional space is
calculated for all classes. Thus, the transformation matrix is
calculated for all classes, and the samples of all classes are projected
on the selected eigenvectors.
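A rough sketch of the class-dependent variant under the same assumptions (the between_class_scatter helper is the illustrative sketch from an earlier slide; each class gets its own Wi and its own projection):

    import numpy as np

    def class_dependent_lda(X, y, k):
        """Project each class onto its own subspace built from Wi = SWi^{-1} SB."""
        S_B = between_class_scatter(X, y)
        projections = {}
        for label in np.unique(y):
            X_i = X[y == label]
            d_i = X_i - X_i.mean(axis=0)
            S_Wi = d_i.T @ d_i                     # within-class matrix of class i only
            W_i = np.linalg.inv(S_Wi) @ S_B        # assumes SWi is non-singular
            eigvals, eigvecs = np.linalg.eig(W_i)
            order = np.argsort(eigvals.real)[::-1]
            V_k = eigvecs[:, order[:k]].real
            projections[label] = X_i @ V_k         # samples of class i in their own space
        return projections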
Alaa Tharwat December 30, 2017 32 / 66
Numerical example
Given two different classes, ω1 (5 × 2) and ω2 (6 × 2), with n1 = 5 and n2 = 6 samples, respectively. Each sample in both classes is represented by two features (i.e. M = 2) as follows:
ω1 = [1.00 2.00; 2.00 3.00; 3.00 3.00; 4.00 5.00; 5.00 5.00] and
ω2 = [4.00 2.00; 5.00 0.00; 5.00 2.00; 3.00 2.00; 5.00 3.00; 6.00 3.00]    (8)

µ1 = [3.00 3.60], µ2 = [4.67 2.00], and    (9)

µ = (5/11) µ1 + (6/11) µ2 = [3.91 2.73]    (10)
Alaa Tharwat December 30, 2017 34 / 66
Numerical example
To calculate SB1:

S_{B_1} = n_1 (µ_1 − µ)^T (µ_1 − µ) = 5 [−0.91 0.87]^T [−0.91 0.87] = [4.13 −3.97; −3.97 3.81]    (11)

Similarly, SB2:

S_{B_2} = [3.44 −3.31; −3.31 3.17]    (12)

The total between-class variance is calculated as follows:

S_B = S_{B_1} + S_{B_2} = [4.13 −3.97; −3.97 3.81] + [3.44 −3.31; −3.31 3.17] = [7.58 −7.27; −7.27 6.98]    (13)
Alaa Tharwat December 30, 2017 35 / 66
Numerical example
To calculate SW, first calculate the mean-centered data d1 and d2:

d_1 = [−2.00 −1.60; −1.00 −0.60; 0.00 −0.60; 1.00 1.40; 2.00 1.40] and
d_2 = [−0.67 0.00; 0.33 −2.00; 0.33 0.00; −1.67 0.00; 0.33 1.00; 1.33 1.00]    (14)
Alaa Tharwat December 30, 2017 36 / 66
Numerical example
After centering the data, in the class-independent method, the within-class variance of each class (S_{W_i}, 2 × 2) is calculated as follows: S_{W_j} = d_j^T d_j = \sum_{i=1}^{n_j} (x_{ij} − µ_j)^T (x_{ij} − µ_j), where x_{ij} represents the ith sample in the jth class.
The total within-class matrix (S_W, 2 × 2) is then calculated as follows: S_W = \sum_{i=1}^{c} S_{W_i}.

S_{W_1} = [10.00 8.00; 8.00 7.20], S_{W_2} = [5.33 1.00; 1.00 6.00], S_W = [15.33 9.00; 9.00 13.20]    (15)
Alaa Tharwat December 30, 2017 37 / 66
Numerical example
The transformation matrix (W) can be obtained as follows, W = S_W^{-1} S_B, and the values of S_W^{-1} and W are as follows:

S_W^{-1} = [0.11 −0.07; −0.07 0.13] and W = [1.37 −1.32; −1.49 1.43]    (16)

The eigenvalues (λ, 2 × 2) and eigenvectors (V, 2 × 2) of W are then calculated as follows:

λ = [0.00 0.00; 0.00 2.81] and V = [−0.69 0.68; −0.72 −0.74]    (17)

The second eigenvector (V2) has a larger corresponding eigenvalue than the first one (V1), which reflects that the second eigenvector is more robust than the first; hence, it is selected to construct the lower dimensional space.
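This class-independent calculation can be checked with a short NumPy script (a sketch; up to rounding and an arbitrary sign of the eigenvectors it reproduces Equations (15)-(17)):

    import numpy as np

    w1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
    w2 = np.array([[4, 2], [5, 0], [5, 2], [3, 2], [5, 3], [6, 3]], dtype=float)

    mu1, mu2 = w1.mean(axis=0), w2.mean(axis=0)
    mu = (5 * mu1 + 6 * mu2) / 11                      # weighted total mean, Eq. (10)

    S_B = 5 * np.outer(mu1 - mu, mu1 - mu) + 6 * np.outer(mu2 - mu, mu2 - mu)
    S_W = (w1 - mu1).T @ (w1 - mu1) + (w2 - mu2).T @ (w2 - mu2)

    W = np.linalg.inv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(W)
    idx = int(np.argmax(eigvals.real))                 # eigenvector with the largest eigenvalue
    print(eigvals)                                     # one eigenvalue is ~0, the other ~2.8
    print(w1 @ eigvecs[:, idx], w2 @ eigvecs[:, idx])  # y1 and y2, cf. Eqs. (18) and (19)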
Alaa Tharwat December 30, 2017 38 / 66
Numerical example
The original data are projected onto the lower dimensional space as follows: y_i = ω_i V_2, where y_i (n_i × 1) represents the data of the ith class after projection, and its values will be as follows:

y_1 = ω_1 V_2 = [1.00 2.00; 2.00 3.00; 3.00 3.00; 4.00 5.00; 5.00 5.00] [0.68; −0.74] = [−0.79; −0.85; −0.18; −0.97; −0.29]    (18)

Similarly, y_2 is as follows:

y_2 = ω_2 V_2 = [1.24; 3.39; 1.92; 0.56; 1.18; 1.86]    (19)
Alaa Tharwat December 30, 2017 39 / 66
Numerical example
The data of the two classes are better discriminated when they are projected on the second eigenvector (see Fig. (b)) than on the first one (see Fig. (a)).
The within-class variance (i.e. the variance between the samples of the same class) of each of the two classes is minimized when the data are projected on the second eigenvector; the within-class variance of the first class, for example, is small compared with its value when the data are projected on the first eigenvector, as shown in Fig. (a).
Figure: Probability density function of the projected data of the first example,
(a) the projected data on V1, (b) the projected data on V2.
Alaa Tharwat December 30, 2017 40 / 66
Numerical example
For the class-dependent method, the aim is to calculate a separate transformation matrix (Wi) for each class.
The within-class variance of each class (S_{W_i}, 2 × 2) is calculated as in the class-independent method.
The transformation matrix (Wi) for each class is then calculated as follows: W_i = S_{W_i}^{-1} S_B. The values of the two transformation matrices (W1 and W2) will be as follows:

W_1 = S_{W_1}^{-1} S_B = [10.00 8.00; 8.00 7.20]^{-1} [7.58 −7.27; −7.27 6.98] = [0.90 −1.00; −1.00 1.25] [7.58 −7.27; −7.27 6.98] = [14.09 −13.53; −16.67 16.00]    (20)
Alaa Tharwat December 30, 2017 41 / 66
Numerical example
Similarly, W2 is calculated as follows:
W_2 = [1.70 −1.63; −1.50 1.44]    (21)
The eigenvalues (λi) and eigenvectors (Vi) for each transformation
matrix (Wi) are calculated, and the values of the eigenvalues and
eigenvectors are shown below.
λ_{ω1} = [0.00 0.00; 0.00 30.01] and V_{ω1} = [−0.69 0.65; −0.72 −0.76]    (22)

λ_{ω2} = [3.14 0.00; 0.00 0.00] and V_{ω2} = [0.75 0.69; −0.66 0.72]    (23)
where λωi and Vωi represent the eigenvalues and eigenvectors of the
ith class, respectively.
Alaa Tharwat December 30, 2017 42 / 66
Numerical example
From the results shown above, it can be seen that the second eigenvector of the first class (V_{ω1}^{(2)}) has a larger corresponding eigenvalue than the first one; thus, the second eigenvector is used as a lower dimensional space for the first class as follows: y_1 = ω_1 V_{ω1}^{(2)}, where y_1 represents the projection of the samples of the first class.
The first eigenvector of the second class (V_{ω2}^{(1)}) has a larger corresponding eigenvalue than the second one. Thus, V_{ω2}^{(1)} is used to project the data of the second class as follows: y_2 = ω_2 V_{ω2}^{(1)}, where y_2 represents the projection of the samples of the second class.
The values of y1 and y2 will be as follows:
y_1 = [−0.88; −1.00; −0.35; −1.24; −0.59] and y_2 = [1.68; 3.76; 2.43; 0.93; 1.77; 2.53]    (24)
Alaa Tharwat December 30, 2017 43 / 66
Numerical example
The figure below shows a pdf graph of the projected data (i.e. y_1 and y_2) on the two eigenvectors (V_{ω1}^{(2)} and V_{ω2}^{(1)}), and it reveals the following findings:
First, the projection data of the two classes are efficiently
discriminated.
Second, the within-class variance of the projected samples is lower
than the within-class variance of the original samples.
Figure: Probability density function (pdf) of the projected data using the class-dependent method; the first class is projected on V_{ω1}^{(2)}, while the second class is projected on V_{ω2}^{(1)}.
Alaa Tharwat December 30, 2017 44 / 66
Numerical example
Figure: Illustration of the two LDA methods on the example data. The blue and red lines represent the first and second eigenvectors of the class-dependent approach, respectively, while the solid and dotted black lines represent the second and first eigenvectors of the class-independent approach, respectively.
Alaa Tharwat December 30, 2017 45 / 66
Numerical example
The figure above gives a further explanation of the two methods:
Class-Independent: As shown in the figure, there are two eigenvectors, V1 (dotted black line) and V2 (solid black line). The differences between the two eigenvectors are as follows:
The projected data on the second eigenvector (V2) which has the
highest corresponding eigenvalue will discriminate the data of the two
classes better than the first eigenvector. As shown in the figure, the
distance between the projected means (m1 − m2), which represents SB, is larger when the data are projected on V2 than on V1.
The second eigenvector decreases the within-class variance much
better than the first eigenvector. The above figure illustrates that the
within-class variance of the first class (SW1 ) was much smaller when it
was projected on V2 than V1.
As a result of the above two findings, V2 is used to construct the LDA
space.
Alaa Tharwat December 30, 2017 46 / 66
Numerical example
The figure above gives a further explanation of the two methods:
Class-Dependent: As shown in the figure, there are two eigenvectors, V_{ω1}^{(2)} (red line) and V_{ω2}^{(1)} (blue line), which correspond to the first and second classes, respectively. The differences between the two eigenvectors are as follows:
Projecting the original data on the two eigenvectors discriminates
between the two classes. As shown in the figure, the distance between
the projected means m1 − m2 is larger than the distance between the
original means µ1 − µ2.
The within-class variance of each class is decreased. For example, the
within-class variance of the first class (SW1 ) is decreased when it is
projected on its corresponding eigenvector.
As a result of the above two findings, V_{ω1}^{(2)} and V_{ω2}^{(1)} are used to construct the LDA space.
Alaa Tharwat December 30, 2017 47 / 66
Numerical example
The figure above gives a further explanation of the two methods:
Class-Dependent vs. Class-Independent: The two LDA methods are used to calculate the LDA space, but the class-dependent method calculates a separate lower dimensional space for each class, which has two main limitations: (1) it needs more CPU time and calculations than the class-independent method; (2) it may lead to the SSS problem, because the number of samples in each class affects the singularity of S_{W_i}.
These findings explain why the standard LDA technique uses the class-independent method rather than the class-dependent method.
Alaa Tharwat December 30, 2017 48 / 66
Main problems of LDA
Although LDA is one of the most common data reduction
techniques, it suffers from two main problems:
Small Sample Size (SSS) and
linearity problems.
Alaa Tharwat December 30, 2017 50 / 66
Main problems of LDA Linearity problem
The LDA technique is used to find a linear transformation that discriminates between different classes.
If the classes are non-linearly separable, LDA cannot find a lower dimensional space. In other words, LDA fails to find the LDA space when the discriminatory information is not in the means of the classes.
The Figure (below) shows how the discriminatory information does
not exist in the mean, but in the variance of the data. This is
because the means of the two classes are equal.
The mathematical interpretation of this problem is as follows: if the means of the classes are approximately equal, then SB and W will be zero; hence, the LDA space cannot be calculated.
Alaa Tharwat December 30, 2017 52 / 66
Main problems of LDA Linearity problem
Figure: Two examples of two non-linearly separable classes, top panel shows how
the two classes are non-separable, while the bottom shows how the
transformation solves this problem and the two classes are linearly separable.
Alaa Tharwat December 30, 2017 53 / 66
Main problems of LDA Linearity problem
One of the solutions to the linearity problem is based on the transformation concept, which is known as the kernel method (kernel functions).
The kernel idea is applied in Support Vector Machine (SVM),
Support Vector Regression (SVR), PCA, and LDA.
The previous figure illustrates how the transformation is used to map
the original data into a higher dimensional space; hence, the data will
be linearly separable, and the LDA technique can find the lower
dimensional space in the new space.
The figure (next slide) graphically and mathematically shows how
two non-separable classes in one-dimensional space are transformed
into a two-dimensional space (i.e. higher dimensional space); thus,
allowing linear separation.
Alaa Tharwat December 30, 2017 54 / 66
Main problems of LDA Linearity problem
[Figure: the samples in the one-dimensional space X are mapped using Φ(x) = (x, x^2) = (z1, z2) into the two-dimensional space Z, e.g. (1) → (1, 1), (−1) → (−1, 1), (2) → (2, 4), (−2) → (−2, 4); the samples of the two classes are non-linearly separable in X and linearly separable in Z.]
Figure: Example of a kernel function; the samples in the top panel (X), which lie on a line (i.e. a one-dimensional space), are non-linearly separable, while the samples in the bottom panel (Z), which are generated by mapping the samples of the top space, are linearly separable.
Alaa Tharwat December 30, 2017 55 / 66
Main problems of LDA Linearity problem
Let φ represent a nonlinear mapping to the new feature space Z.
The transformation matrix (W) in the new feature space (Z) is calculated as follows:

F(W) = max_W \frac{W^T S_B^φ W}{W^T S_W^φ W}    (25)
where W is a transformation matrix and Z is the new feature space.
The between-class matrix (S_B^φ) and the within-class matrix (S_W^φ) are defined as follows:

S_B^φ = \sum_{i=1}^{c} n_i (µ_i^φ − µ^φ)(µ_i^φ − µ^φ)^T    (26)

S_W^φ = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (φ(x_{ij}) − µ_j^φ)(φ(x_{ij}) − µ_j^φ)^T    (27)

where µ_i^φ = \frac{1}{n_i} \sum_{x_j ∈ ω_i} φ(x_j) and µ^φ = \frac{1}{N} \sum_{i=1}^{N} φ(x_i) = \sum_{i=1}^{c} \frac{n_i}{N} µ_i^φ.
Alaa Tharwat December 30, 2017 56 / 66
Main problems of LDA Linearity problem
Thus, in kernel LDA, all samples are transformed non-linearly into a
new space Z using the function φ.
In other words, the φ function is used to map the original features into the Z space by creating a nonlinear combination of the original samples using dot-products of them.
There are many types of kernel functions to achieve this aim. Examples of these functions include the Gaussian or Radial Basis Function (RBF) kernel, K(x_i, x_j) = exp(−||x_i − x_j||^2 / 2σ^2), where σ is a positive parameter, and the polynomial kernel of degree d, K(x_i, x_j) = (⟨x_i, x_j⟩ + c)^d.
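A minimal sketch of these two kernel functions (plain NumPy; the function names and default parameter values are illustrative):

    import numpy as np

    def rbf_kernel(x_i, x_j, sigma=1.0):
        """Gaussian (RBF) kernel: exp(-||x_i - x_j||^2 / (2 sigma^2))."""
        return np.exp(-np.linalg.norm(x_i - x_j) ** 2 / (2.0 * sigma ** 2))

    def polynomial_kernel(x_i, x_j, c=1.0, d=2):
        """Polynomial kernel of degree d: (<x_i, x_j> + c)^d."""
        return (np.dot(x_i, x_j) + c) ** d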
Alaa Tharwat December 30, 2017 57 / 66
Main problems of LDA Small Sample Size (SSS) problem
Problem Definition
Singularity, Small Sample Size (SSS), or under-sampled problem is one
of the big problems of LDA technique.
This problem results from high-dimensional pattern classification tasks
or a low number of training samples available for each class compared
with the dimensionality of the sample space.
The SSS problem occurs when SW is singular¹.
The upper bound of the rank² of SW is N − c, while the dimension of SW is M × M.
Thus, in most cases M ≫ N − c, which leads to the SSS problem.
For example, in face recognition applications, the size of the face image may reach 100 × 100 = 10000 pixels, which represents high-dimensional features and leads to a singularity problem.
¹ A matrix is singular if it is square and does not have an inverse, i.e. its determinant is zero; hence, not all of its columns and rows are independent.
² The rank of a matrix is the number of linearly independent rows or columns.
Alaa Tharwat December 30, 2017 59 / 66
Main problems of LDA Small Sample Size (SSS) problem
Common Solutions to SSS Problem.
Regularization (RLDA):
In the regularization method, the identity matrix is scaled by a regularization parameter (η > 0) and added to the within-class matrix to make it non-singular. Thus, the diagonal components of the within-class matrix are biased as follows: SW = SW + ηI.
However, choosing the value of the regularization parameter requires tuning, and a poor choice of this parameter can degrade the performance of the method.
The parameter η is added only to make SW invertible and has no clear mathematical interpretation.
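A one-line sketch of this fix (assuming SB and SW have already been computed as on the earlier slides; the value of η is illustrative and must be tuned):

    import numpy as np

    eta = 1e-3                                   # regularization parameter (eta > 0, tuned by the user)
    S_W_reg = S_W + eta * np.eye(S_W.shape[0])   # bias the diagonal so that SW becomes non-singular
    W = np.linalg.inv(S_W_reg) @ S_B             # proceed as in standard LDA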
Alaa Tharwat December 30, 2017 60 / 66
Main problems of LDA Small Sample Size (SSS) problem
Common Solutions to SSS Problem.
Sub-space:
In this method, a non-singular intermediate space is obtained by reducing the dimension of the original data to the rank of SW; hence, SW becomes full-rank³ and can then be inverted.
For example, Belhumeur et al. used PCA to reduce the dimensions of the original space to N − c (i.e. the upper bound of the rank of SW).
However, losing some discriminant information is a common drawback
associated with the use of this method.
Null Space:
There are many studies proposed to remove the null space of SW to
make SW full-rank; hence, invertible.
The drawback of this method is that more discriminant information is
lost when the null space of SW is removed, which has a negative
impact on how the lower dimensional space satisfies the LDA goal.
³ A is a full-rank matrix if all of its columns and rows are independent, i.e. rank(A) = #rows = #cols.
Alaa Tharwat December 30, 2017 61 / 66
Main problems of LDA Small Sample Size (SSS) problem
Four different variants of the LDA technique that are used to solve
the SSS problem are introduced as follows:
PCA + LDA technique:
In this technique, the original d-dimensional features are first reduced to an h-dimensional feature space using PCA, and then LDA is used to further reduce the features to k dimensions.
The PCA is used in this technique to reduce the dimensions so that the rank of SW becomes N − c; hence, the SSS problem is addressed (a minimal sketch of this variant is given at the end of this slide).
However, the PCA step neglects some discriminant information, which may reduce the classification performance.
Direct LDA technique
Direct LDA (DLDA) is one of the well-known techniques that are used
to solve the SSS problem.
This technique has two main steps.
In the first step, the transformation matrix, W, is computed to
transform the training data to the range space of SB.
In the second step, the dimensionality of the transformed data is
further transformed using some regulating matrices.
The benefit of DLDA is that no discriminative features are neglected, as happens in the PCA+LDA technique.
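The PCA+LDA variant mentioned above can be sketched as follows (assuming scikit-learn is available; the function name and the choice of the intermediate dimension follow the description above and are illustrative):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def pca_then_lda(X, y, k):
        """Reduce X with PCA to (at most) N - c dimensions, then with LDA to k dimensions."""
        n_samples, n_features = X.shape
        n_classes = len(np.unique(y))
        h = min(n_samples - n_classes, n_features)   # upper bound of the rank of SW
        X_pca = PCA(n_components=h).fit_transform(X)
        return LinearDiscriminantAnalysis(n_components=k).fit_transform(X_pca, y)  # k <= c - 1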
Alaa Tharwat December 30, 2017 62 / 66
Main problems of LDA Small Sample Size (SSS) problem
Four different variants of the LDA technique that are used to solve
the SSS problem are introduced as follows:
Regularized LDA technique:
In the Regularized LDA (RLDA), a small perturbation is added to the SW matrix to make it non-singular. This regularization can be applied as follows:

(S_W + ηI)^{-1} S_B w_i = λ_i w_i    (28)

where η represents a regularization parameter. The diagonal components of SW are biased by adding this small perturbation.
However, the regularization parameter needs to be tuned, and a poor choice of it can degrade the generalization performance.

W = arg max_{|W^T S_W W| = 0} |W^T S_B W|    (29)
Alaa Tharwat December 30, 2017 63 / 66
Main problems of LDA Small Sample Size (SSS) problem
Four different variants of the LDA technique that are used to solve
the SSS problem are introduced as follows:
Null LDA technique
The aim of NLDA is to find the orientation matrix W.
Firstly, the range space of the SW is neglected, and the data are
projected only on the null space of SW as follows, SW W = 0.
Secondly, the aim is to search for W that satisfies SB W ≠ 0 and maximizes |W^T S_B W|.
The higher dimensionality of the feature space may lead to computational problems. This problem can be solved by (1) using the PCA technique as a pre-processing step, i.e. before applying the NLDA technique, to reduce the dimension of the feature space to N − 1 by removing the null space of S_T = S_B + S_W, or (2) using the PCA technique before the second step of the NLDA technique.
Mathematically, in the Null LDA (NLDA) technique, the h column vectors of the transformation matrix W = [w_1, w_2, . . . , w_h] are taken to be in the null space of SW, i.e. w_i^T S_W w_i = 0, ∀i = 1 . . . h, where w_i^T S_B w_i ≠ 0. Hence, M − (N − c) linearly independent vectors are used to form a new orientation matrix, which is used to maximize |W^T S_B W| subject to the constraint |W^T S_W W| = 0, as follows: W = arg max_{|W^T S_W W| = 0} |W^T S_B W|.
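A rough NumPy sketch of the core null-space step (illustrative only; it extracts the null space of SW via an SVD and keeps the directions inside it that maximize the between-class scatter, assuming SB and SW are already computed):

    import numpy as np

    def null_lda(S_B, S_W, k, tol=1e-10):
        """Orientation matrix W whose columns satisfy w^T SW w = 0 while maximizing |W^T SB W|."""
        U, s, _ = np.linalg.svd(S_W)
        Q = U[:, s <= tol]                   # orthonormal basis of the null space of SW
        S_B_null = Q.T @ S_B @ Q             # between-class scatter restricted to that null space
        eigvals, eigvecs = np.linalg.eigh(S_B_null)
        order = np.argsort(eigvals)[::-1]    # largest between-class scatter first
        return Q @ eigvecs[:, order[:k]]     # W (M x k)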
Alaa Tharwat December 30, 2017 64 / 66
Conclusions
LDA is an easy-to-implement dimensionality reduction method.
The goal of LDA is to increase the between-class variance and decrease the within-class variance.
The value of an eigenvalue reflects the robustness of the corresponding eigenvector.
For more details, read the original paper: Alaa Tharwat, Tarek Gaber, Abdelhameed Ibrahim and Aboul Ella Hassanien, "Linear discriminant analysis: A detailed tutorial", AI Communications 30 (2017) 169-190.
For more questions, send to engalaatharwat@hotmail.com
Alaa Tharwat December 30, 2017 66 / 66

 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 

Linear discriminant analysis: an overview

  • 2. Linear Discriminant Analysis: An Overview Alaa Tharwat Alaa Tharwat, Tarek Gaber, Abdelhameed Ibrahim and Aboul Ella Hassanien. ”Linear discriminant analysis: A detailed tutorial” AI Communications 30 (2017) 169-190 Email: engalaatharwat@hotmail.com December 30, 2017 Alaa Tharwat December 30, 2017 1 / 66
  • 3. Agenda Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 2 / 66
  • 4. Agenda Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 3 / 66
  • 5. Introduction to Linear Discriminant Analysis (LDA) The goal for any dimensional reduction method is to reduce the dimensions of the original data for different purposes such as visualization, decrease CPU time, ..etc.. Dimensionality reduction techniques are important in many applications related to machine learning, data mining, Bioinformatics, biometric and information retrieval. There are two types of dimensionality reduction methods, namely, supervised and unsupervised. Supervised (e.g. LDA). Unsupervised (e.g. PCA). Alaa Tharwat December 30, 2017 4 / 66
  • 6. Introduction to Linear Discriminant Analysis (LDA) The Linear Discriminant Analysis (LDA) technique is developed to transform the features into a lower dimensional space, which maximizes the ratio of the between-class variance to the within-class variance, thereby guaranteeing maximum class separability. J(W) = WT SB W WT SW W (1) There are two types of LDA technique to deal with classes: class-dependent and class-independent. In the class-dependent LDA, one separate lower dimensional space is calculated for each class to project its data on it, In the class-independent LDA, each class will be considered as a separate class against the other classes. In this type, there is just one lower dimensional space for all classes to project their data on it. Alaa Tharwat December 30, 2017 5 / 66
  • 7. Introduction to Linear Discriminant Analysis (LDA) Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 6 / 66
  • 8. Theoretical background to LDA Definition of LDA Given the original data matrix X = {x_1, x_2, . . . , x_N}, where x_i represents the ith sample, pattern, or observation and N is the total number of samples. Each sample is represented by M features (x_i ∈ R^M); in other words, each sample is represented as a point in an M-dimensional space. The data matrix is partitioned into c classes as follows, X = [ω_1, ω_2, . . . , ω_c]. Each class has n_i samples, and the total number of samples is N = \sum_{i=1}^{c} n_i. Alaa Tharwat December 30, 2017 7 / 66
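A minimal NumPy sketch of this setup (the names X, y, mu_i and mu are illustrative, and the toy data reuse the two-class numbers from the numerical example later in the deck):

import numpy as np

# N = 11 samples, M = 2 features, c = 2 classes.
X = np.array([[1., 2.], [2., 3.], [3., 3.], [4., 5.], [5., 5.],              # class omega_1
              [4., 2.], [5., 0.], [5., 2.], [3., 2.], [5., 3.], [6., 3.]])   # class omega_2
y = np.array([0] * 5 + [1] * 6)                             # class label of each sample

classes = np.unique(y)
n_i = np.array([np.sum(y == c) for c in classes])           # n_i samples per class
mu_i = np.array([X[y == c].mean(axis=0) for c in classes])  # class means (c x M)
mu = X.mean(axis=0)                                         # total mean (1 x M)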
  • 9. Theoretical background to LDA Definition of LDA The goal of the LDA technique is to project the original data matrix onto a lower dimensional space. To achieve this goal, three steps need to be performed. 1 The first step is to calculate the separability between different classes (i.e. the distance between the means of different classes), which is called the between-class variance or between-class matrix. 2 The second step is to calculate the distance between the mean and the samples of each class, which is called the within-class variance or within-class matrix. 3 The third step is to construct the lower dimensional space which maximizes the between-class variance and minimizes the within-class variance. Alaa Tharwat December 30, 2017 8 / 66
  • 10. Theoretical background to LDA Definition of LDA Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 9 / 66
  • 11. Theoretical background to LDA Calculating the Between-Class variance (SB) To calculate the between-class variance (S_B), the separation distance between different classes, denoted by (m_i − m), is calculated as follows: (m_i − m)^2 = (W^T \mu_i − W^T \mu)^2 = W^T (\mu_i − \mu)(\mu_i − \mu)^T W (2), where m_i = W^T \mu_i represents the projection of the mean of the ith class, m = W^T \mu is the projection of the total mean of all classes, W represents the transformation matrix of LDA, \mu_i (1 × M) represents the mean of the ith class (\mu_i = \frac{1}{n_i} \sum_{x_k \in \omega_i} x_k), and \mu (1 × M) is the total mean of all classes (\mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \sum_{j=1}^{c} \frac{n_j}{N} \mu_j), where c represents the total number of classes. Alaa Tharwat December 30, 2017 10 / 66
  • 12. Theoretical background to LDA Calculating the Between-Class variance (SB) The term (\mu_i − \mu)(\mu_i − \mu)^T in Equation (2) represents the separation distance between the mean of the ith class (\mu_i) and the total mean (\mu), or simply it represents the between-class variance of the ith class (S_{B_i}). Substituting S_{B_i} into Equation (2) gives (m_i − m)^2 = W^T S_{B_i} W (3). The total between-class variance is calculated as S_B = \sum_{i=1}^{c} n_i S_{B_i}. Alaa Tharwat December 30, 2017 11 / 66
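The between-class matrix can be computed directly from these definitions; a hedged NumPy sketch (the function name between_class_scatter is illustrative, not from the paper):

import numpy as np

def between_class_scatter(X, y):
    """S_B = sum_i n_i (mu_i - mu)(mu_i - mu)^T, following Equations (2)-(3)."""
    mu = X.mean(axis=0)
    M = X.shape[1]
    S_B = np.zeros((M, M))
    for c in np.unique(y):
        X_c = X[y == c]
        diff = (X_c.mean(axis=0) - mu).reshape(-1, 1)    # (mu_i - mu) as an M x 1 column
        S_B += X_c.shape[0] * (diff @ diff.T)            # n_i times the outer product
    return S_B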
  • 13. Theoretical background to LDA Calculating the Between-Class variance (SB) Figure: the class means \mu_1, \mu_2, \mu_3 and the total mean \mu in the original space (x_1, x_2), the between-class distances S_{B_1}, S_{B_2}, S_{B_3} (the distance between classes), and the projected means m_1, m_2, m_3 on the first and second eigenvectors (v_1, v_2). Alaa Tharwat December 30, 2017 12 / 66
  • 14. Theoretical background to LDA Calculating the Between-Class variance (SB) Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 13 / 66
  • 15. Theoretical background to LDA Calculating the Within-Class variance (SW ) The within-class variance of the ith class (S_{W_i}) represents the difference between the mean and the samples of that class. The LDA technique searches for a lower-dimensional space which minimizes the difference between the projected mean (m_i) and the projected samples of each class (W^T x_i), i.e. minimizes the within-class variance: \sum_{x_i \in \omega_j} (W^T x_i − m_j)^2 = \sum_{x_i \in \omega_j} (W^T x_i − W^T \mu_j)^2 = \sum_{x_i \in \omega_j} W^T (x_i − \mu_j)(x_i − \mu_j)^T W = W^T S_{W_j} W (4) Alaa Tharwat December 30, 2017 14 / 66
  • 16. Theoretical background to LDA Calculating the Within-Class variance (SW ) From Equation (4), the within-class variance of each class can be calculated as S_{W_j} = d_j^T d_j = \sum_{i=1}^{n_j} (x_{ij} − \mu_j)(x_{ij} − \mu_j)^T, where x_{ij} represents the ith sample in the jth class as shown in Fig. (2, step (E, F)), and d_j is the mean-centred data of the jth class, i.e. d_j = \omega_j − \mu_j = \{x_i\}_{i=1}^{n_j} − \mu_j. Step (F) in the figure illustrates how the within-class variance of the first class (S_{W_1}) in our example is calculated. The total within-class variance represents the sum of all within-class matrices of all classes (see Fig. (2, step (F))), and it can be calculated as in Equation (5): S_W = \sum_{i=1}^{c} S_{W_i} = \sum_{x_i \in \omega_1} (x_i − \mu_1)(x_i − \mu_1)^T + \sum_{x_i \in \omega_2} (x_i − \mu_2)(x_i − \mu_2)^T + · · · + \sum_{x_i \in \omega_c} (x_i − \mu_c)(x_i − \mu_c)^T (5) Alaa Tharwat December 30, 2017 15 / 66
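A matching NumPy sketch for Equation (5) (again, within_class_scatter is an illustrative name, not from the paper):

import numpy as np

def within_class_scatter(X, y):
    """S_W = sum_j d_j^T d_j, where d_j is the mean-centred data of class j (Equation (5))."""
    M = X.shape[1]
    S_W = np.zeros((M, M))
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)   # d_j = omega_j - mu_j
        S_W += d.T @ d
    return S_W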
  • 17. Theoretical background to LDA Calculating the Within-Class variance (SW ) Figure: the class means \mu_1, \mu_2, \mu_3 and the total mean \mu in the original space (x_1, x_2), the within-class variances S_{W_1}, S_{W_2}, S_{W_3} (the variance within each class), and the projected means m_1, m_2, m_3 on the first and second eigenvectors (v_1, v_2). Alaa Tharwat December 30, 2017 16 / 66
  • 18. Theoretical background to LDA Calculating the Within-Class variance (SW ) Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 17 / 66
  • 19. Theoretical background to LDA Constructing the lower dimensional space After calculating the between-class variance (S_B) and the within-class variance (S_W), the transformation matrix (W) of the LDA technique can be calculated as in Equation (1), which can be reformulated as the generalized eigenvalue problem in Equation (6): S_B W = \lambda S_W W (6), where \lambda represents the eigenvalues of the transformation matrix (W). The solution of this problem is obtained by calculating the eigenvalues (\lambda) and eigenvectors (V = \{v_1, v_2, . . . , v_M\}) of W = S_W^{-1} S_B, if S_W is non-singular. Alaa Tharwat December 30, 2017 18 / 66
  • 20. Theoretical background to LDA Constructing the lower dimensional space The eigenvalues are scalar values, while the eigenvectors are non-zero vectors that provide the information about the LDA space. The eigenvectors represent the directions of the new space, and the corresponding eigenvalues represent the scaling factor, length, or magnitude of the eigenvectors. Thus, each eigenvector represents one axis of the LDA space, and the associated eigenvalue represents the robustness of this eigenvector. The robustness of an eigenvector reflects its ability to discriminate between different classes and to decrease the within-class variance of each class, and hence it meets the LDA goal. Thus, the eigenvectors with the k highest eigenvalues are used to construct the lower dimensional space (V_k), while the remaining eigenvectors (\{v_{k+1}, v_{k+2}, . . . , v_M\}) are neglected. Alaa Tharwat December 30, 2017 19 / 66
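Assuming S_W is non-singular, this step amounts to an eigendecomposition of S_W^{-1} S_B followed by sorting; a minimal sketch (lda_subspace is an illustrative name):

import numpy as np

def lda_subspace(S_W, S_B, k):
    """Eigendecompose S_W^{-1} S_B and keep the k eigenvectors with the largest eigenvalues."""
    evals, evecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(evals.real)[::-1]     # sort by decreasing eigenvalue
    return evecs.real[:, order[:k]]          # lower dimensional space V_k (M x k)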
  • 21. Theoretical background to LDA Constructing the lower dimensional space Figure: Visualized steps to calculate a lower dimensional subspace of the LDA technique: (A) the data matrix X (N × M) partitioned into classes \omega_1, \omega_2, \omega_3; (B) the mean of each class (\mu_i) and the total mean (\mu); (C) the between-class matrix S_B = \sum S_{B_i}; (D, E) mean-centring the data (D_i = \omega_i − \mu_i); (F) the within-class matrix S_W = \sum S_{W_i}; (G) solving arg max W^T S_B W / W^T S_W W, sorting the eigenvalues \lambda_1, . . . , \lambda_M, and keeping the k eigenvectors with the largest eigenvalues to form the lower dimensional space V_k ∈ R^{M×k}. Alaa Tharwat December 30, 2017 20 / 66
  • 22. Theoretical background to LDA Constructing the lower dimensional space Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 21 / 66
  • 23. Theoretical background to LDA Projection onto the LDA space The dimension of the original data matrix (X ∈ R^{N×M}) is reduced by projecting it onto the lower dimensional space of LDA (V_k ∈ R^{M×k}) as denoted in Equation (7). The dimension of the data after projection is k; hence, M − k features are ignored or deleted from each sample. Each sample (x_i), which was represented as a point in an M-dimensional space, is represented in a k-dimensional space by projecting it onto the lower dimensional space (V_k) as follows, y_i = x_i V_k. Y = X V_k (7) Alaa Tharwat December 30, 2017 22 / 66
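A short usage sketch of Equation (7), reusing the X, y and helper functions from the earlier sketches (all names illustrative):

S_B = between_class_scatter(X, y)
S_W = within_class_scatter(X, y)
V_k = lda_subspace(S_W, S_B, k=1)    # (M x k) lower dimensional space
Y = X @ V_k                          # Equation (7): (N x k) data after projection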
  • 24. Theoretical background to LDA Projection onto the LDA space Figure: Projection of the original samples (i.e. the data matrix X ∈ R^{N×M}) onto the lower dimensional space of LDA (V_k ∈ R^{M×k}), giving the projected data Y = X V_k ∈ R^{N×k}. Alaa Tharwat December 30, 2017 23 / 66
  • 25. Theoretical background to LDA Projection onto the LDA space Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 24 / 66
  • 26. Theoretical background to LDA Example of two LDA subspaces Figure: A visualized comparison between the two lower-dimensional sub-spaces (the first and second eigenvectors, v_1 and v_2) which are calculated using three different classes, showing the distance between classes (S_{B_i}), the variance within each class (S_{W_i}), and the projected means m_1, m_2, m_3. Alaa Tharwat December 30, 2017 25 / 66
  • 27. Theoretical background to LDA Example of two LDA subspaces The above Figure shows a comparison between two lower-dimensional sub-spaces. Each class has five samples, and all samples are represented by two features only (xi ∈ R2) to be visualized. Thus, each sample is represented as a point in two-dimensional space. The transformation matrix (W(2 × 2)) is calculated as mentioned before. The eigenvalues (λ1 and λ2) and eigenvectors (i.e. sub-spaces) (V = {v1, v2}) of W are then calculated. Thus, there are two eigenvectors or sub-spaces. Alaa Tharwat December 30, 2017 26 / 66
  • 28. Theoretical background to LDA Example of two LDA subspaces A comparison between the two lower-dimensional sub-spaces shows the following observations: First, the separation distance between different classes when the data are projected on the first eigenvector (v_1) is much greater than when the data are projected on the second eigenvector (v_2). As shown in the figure, the three classes are efficiently discriminated when the data are projected on v_1. Moreover, the distance between the means of the first and second classes (m_1 − m_2) when the original data are projected on v_1 is much greater than when the data are projected on v_2, which reflects that the first eigenvector discriminates the three classes better than the second one. Second, the within-class variance when the data are projected on v_1 is much smaller than when they are projected on v_2. For example, S_{W_1} when the data are projected on v_1 is much smaller than when the data are projected on v_2. Thus, projecting the data on v_1 minimizes the within-class variance much better than v_2. From these two observations, we conclude that the first eigenvector meets the goal of the lower-dimensional space of the LDA technique better than the second eigenvector; hence, it is selected to construct the lower-dimensional space. Alaa Tharwat December 30, 2017 27 / 66
  • 29. Theoretical background to LDA Example of two LDA subspaces Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 28 / 66
  • 30. Theoretical background to LDA Computational complexity of LDA The computational complexity of the first four steps, which are common to both class-dependent and class-independent methods, is computed as follows. As illustrated in Algorithm (1) (see the paper), in step (2), calculating the mean of the ith class requires n_i M additions and M divisions, i.e., in total, (NM + cM) operations. In step (3), there are NM additions and M divisions, i.e., (NM + M) operations. The computational complexity of the fourth step is c(M + M^2 + M^2), where M is for \mu_i − \mu, M^2 is for (\mu_i − \mu)(\mu_i − \mu)^T, and the last M^2 is for the multiplication between n_i and the matrix (\mu_i − \mu)(\mu_i − \mu)^T. In the fifth step, there are N(M + M^2) operations, where M is for (x_{ij} − \mu_j) and M^2 is for (x_{ij} − \mu_j)(x_{ij} − \mu_j)^T. In the sixth step, there are M^3 operations to calculate S_W^{-1}, M^3 for the multiplication between S_W^{-1} and S_B, and M^3 to calculate the eigenvalues and eigenvectors. Alaa Tharwat December 30, 2017 29 / 66
  • 31. Theoretical background to LDA Computational complexity of LDA In the class-independent method, the computational complexity is O(NM^2) if N > M; otherwise, the complexity is O(M^3). In the class-dependent algorithm, the number of operations to calculate the within-class variance of each class S_{W_j} in the sixth step is n_j(M + M^2), and calculating S_W needs N(M + M^2) operations; hence, calculating the within-class variance is the same for both LDA methods. In the seventh and eighth steps, there are M^3 operations for the inverse, M^3 for the multiplication S_{W_i}^{-1} S_B, and M^3 for calculating the eigenvalues and eigenvectors. These two steps are repeated for each class, which increases the complexity of the class-dependent algorithm. In total, the computational complexity of the class-dependent algorithm is O(NM^2) if N > M; otherwise, the complexity is O(cM^3). Hence, the class-dependent method needs more computations than the class-independent method. For example, given 40 classes, each with ten samples, and each sample represented by 4096 features (M > N), the computational complexity of the class-independent method is O(M^3) = 4096^3, while the class-dependent method needs O(cM^3) = 40 × 4096^3. Alaa Tharwat December 30, 2017 30 / 66
  • 32. Theoretical background to LDA Computational complexity of LDA Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 31 / 66
  • 33. Theoretical background to LDA Class-Dependent vs. Class-Independent methods The aim of the two methods (class-dependent vs. class-independent) of the LDA is to calculate the LDA space. In the class-dependent LDA, one separate lower dimensional space is calculated for each class as follows, Wi = S−1 Wi SB, where Wi represents the transformation matrix for the ith class. Thus, eigenvalues and eigenvectors are calculated for each transformation matrix separately. Hence, the samples of each class are projected on their corresponding eigenvectors. In the class-independent method, one lower dimensional space is calculated for all classes. Thus, the transformation matrix is calculated for all classes, and the samples of all classes are projected on the selected eigenvectors. Alaa Tharwat December 30, 2017 32 / 66
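A hedged sketch of the class-dependent variant, computing one dominant projection direction per class from W_i = S_{W_i}^{-1} S_B (class_dependent_spaces is an illustrative name; S_B is assumed to be computed as in the earlier sketch):

import numpy as np

def class_dependent_spaces(X, y, S_B):
    """One dominant projection direction per class from W_i = S_Wi^{-1} S_B."""
    spaces = {}
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)           # mean-centred data of class c
        S_Wi = d.T @ d                                   # within-class scatter of class c
        evals, evecs = np.linalg.eig(np.linalg.inv(S_Wi) @ S_B)
        top = np.argmax(evals.real)
        spaces[c] = evecs.real[:, [top]]                 # (M x 1) space for this class
    return spaces                                        # each class is projected on its own space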
  • 34. Theoretical background to LDA Class-Dependent vs. Class-Independent methods Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 33 / 66
  • 35. Numerical example Given two different classes, ω_1 (5 × 2) and ω_2 (6 × 2), with n_1 = 5 and n_2 = 6 samples, respectively. Each sample in both classes is represented by two features (i.e. M = 2) as follows: ω_1 = [1.00 2.00; 2.00 3.00; 3.00 3.00; 4.00 5.00; 5.00 5.00] and ω_2 = [4.00 2.00; 5.00 0.00; 5.00 2.00; 3.00 2.00; 5.00 3.00; 6.00 3.00] (8), µ_1 = [3.00 3.60], µ_2 = [4.67 2.00] (9), and µ = (5/11) µ_1 + (6/11) µ_2 = [3.91 2.73] (10). Alaa Tharwat December 30, 2017 34 / 66
  • 36. Numerical example To calculate S_{B_1}: S_{B_1} = n_1 (µ_1 − µ)^T (µ_1 − µ) = 5 [−0.91 0.87]^T [−0.91 0.87] = [4.13 −3.97; −3.97 3.81] (11). Similarly, S_{B_2} = [3.44 −3.31; −3.31 3.17] (12). The total between-class variance is calculated as follows: S_B = S_{B_1} + S_{B_2} = [4.13 −3.97; −3.97 3.81] + [3.44 −3.31; −3.31 3.17] = [7.58 −7.27; −7.27 6.98] (13). Alaa Tharwat December 30, 2017 35 / 66
  • 37. Numerical example To calculate S_W, first calculate the mean-centred data: d_1 = [−2.00 −1.60; −1.00 −0.60; 0.00 −0.60; 1.00 1.40; 2.00 1.40] and d_2 = [−0.67 0.00; 0.33 −2.00; 0.33 0.00; −1.67 0.00; 0.33 1.00; 1.33 1.00] (14). Alaa Tharwat December 30, 2017 36 / 66
  • 38. Numerical example After centring the data, in the class-independent method, the within-class variance of each class (S_{W_i} (2 × 2)) is calculated as S_{W_j} = d_j^T d_j = \sum_{i=1}^{n_j} (x_{ij} − µ_j)^T (x_{ij} − µ_j), where x_{ij} represents the ith sample in the jth class. The total within-class matrix (S_W (2 × 2)) is then calculated as S_W = \sum_{i=1}^{c} S_{W_i}: S_{W_1} = [10.00 8.00; 8.00 7.20], S_{W_2} = [5.33 1.00; 1.00 6.00], S_W = [15.33 9.00; 9.00 13.20] (15). Alaa Tharwat December 30, 2017 37 / 66
  • 39. Numerical example The transformation matrix (W) can be obtained as W = S_W^{-1} S_B, and the values of S_W^{-1} and W are: S_W^{-1} = [0.11 −0.07; −0.07 0.13] and W = [1.37 −1.32; −1.49 1.43] (16). The eigenvalues (λ (2 × 2)) and eigenvectors (V (2 × 2)) of W are then calculated as: λ = [0.00 0.00; 0.00 2.81] and V = [−0.69 0.68; −0.72 −0.74] (17). The second eigenvector (V_2) has a larger corresponding eigenvalue than the first one (V_1), which reflects that the second eigenvector is more robust than the first one; hence, it is selected to construct the lower dimensional space. Alaa Tharwat December 30, 2017 38 / 66
  • 40. Numerical example The original data are projected on the lower dimensional space as y_i = ω_i V_2, where y_i (n_i × 1) represents the data of the ith class after projection, and its values are: y_1 = ω_1 V_2 = [1.00 2.00; 2.00 3.00; 3.00 3.00; 4.00 5.00; 5.00 5.00] [0.68; −0.74] = [−0.79; −0.85; −0.18; −0.97; −0.29] (18). Similarly, y_2 = ω_2 V_2 = [1.24; 3.39; 1.92; 0.56; 1.18; 1.86] (19). Alaa Tharwat December 30, 2017 39 / 66
  • 41. Numerical example The data of each class are more completely discriminated when they are projected on the second eigenvector (see Fig. (b)) than on the first one (see Fig. (a)). The within-class variance (i.e. the variance between the samples of the same class) of the two classes is minimized when the data are projected on the second eigenvector; for example, the within-class variance of the first class is smaller in Fig. (b) than in Fig. (a). Figure: Probability density function of the projected data of the first example, (a) the projected data on V_1, (b) the projected data on V_2. Alaa Tharwat December 30, 2017 40 / 66
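The class-independent computation above can be reproduced numerically; a NumPy sketch (eigenvectors are only defined up to sign and rounding, so the values may differ in sign from Equations (16)-(19)):

import numpy as np

w1 = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
w2 = np.array([[4, 2], [5, 0], [5, 2], [3, 2], [5, 3], [6, 3]], dtype=float)
mu1, mu2 = w1.mean(axis=0), w2.mean(axis=0)
mu = (5 * mu1 + 6 * mu2) / 11

S_B = 5 * np.outer(mu1 - mu, mu1 - mu) + 6 * np.outer(mu2 - mu, mu2 - mu)
S_W = (w1 - mu1).T @ (w1 - mu1) + (w2 - mu2).T @ (w2 - mu2)

evals, evecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
v2 = evecs[:, np.argmax(evals.real)].real    # dominant eigenvector, eigenvalue ~ 2.81
y1, y2 = w1 @ v2, w2 @ v2                    # projected classes, Equations (18)-(19)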
  • 42. Numerical example For the class-dependent method, the aim is to calculate a separate transformation matrix (W_i) for each class. The within-class variance of each class (S_{W_i} (2 × 2)) is calculated as in the class-independent method. The transformation matrix (W_i) for each class is then calculated as W_i = S_{W_i}^{-1} S_B. The values of the two transformation matrices (W_1 and W_2) are as follows: W_1 = S_{W_1}^{-1} S_B = [10.00 8.00; 8.00 7.20]^{-1} [7.58 −7.27; −7.27 6.98] = [0.90 −1.00; −1.00 1.25] [7.58 −7.27; −7.27 6.98] = [14.09 −13.53; −16.67 16.00] (20). Alaa Tharwat December 30, 2017 41 / 66
  • 43. Numerical example Similarly, W_2 is calculated as follows: W_2 = [1.70 −1.63; −1.50 1.44] (21). The eigenvalues (λ_i) and eigenvectors (V_i) of each transformation matrix (W_i) are calculated, and their values are shown below: λ_{ω_1} = [0.00 0.00; 0.00 30.01] and V_{ω_1} = [−0.69 0.65; −0.72 −0.76] (22), λ_{ω_2} = [3.14 0.00; 0.00 0.00] and V_{ω_2} = [0.75 0.69; −0.66 0.72] (23), where λ_{ω_i} and V_{ω_i} represent the eigenvalues and eigenvectors of the ith class, respectively. Alaa Tharwat December 30, 2017 42 / 66
  • 44. Numerical example From the results shown above, it can be seen that the second eigenvector of the first class (V_{ω_1}^{(2)}) has a larger corresponding eigenvalue than the first one; thus, the second eigenvector is used as the lower dimensional space for the first class as follows, y_1 = ω_1 V_{ω_1}^{(2)}, where y_1 represents the projection of the samples of the first class. The first eigenvector of the second class (V_{ω_2}^{(1)}) has a larger corresponding eigenvalue than the second one; thus, V_{ω_2}^{(1)} is used to project the data of the second class as follows, y_2 = ω_2 V_{ω_2}^{(1)}, where y_2 represents the projection of the samples of the second class. The values of y_1 and y_2 are: y_1 = [−0.88; −1.00; −0.35; −1.24; −0.59] and y_2 = [1.68; 3.76; 2.43; 0.93; 1.77; 2.53] (24). Alaa Tharwat December 30, 2017 43 / 66
  • 45. Numerical example The Figure (below) shows a pdf graph of the projected data (i.e. y_1 and y_2) on the two eigenvectors (V_{ω_1}^{(2)} and V_{ω_2}^{(1)}), and it reveals the following findings: First, the projected data of the two classes are efficiently discriminated. Second, the within-class variance of the projected samples is lower than the within-class variance of the original samples. Figure: Probability density function (pdf) of the projected data using the class-dependent method; the first class is projected on V_{ω_1}^{(2)}, while the second class is projected on V_{ω_2}^{(1)}. Alaa Tharwat December 30, 2017 44 / 66
  • 46. Numerical example Figure: Illustration of the example of the two different LDA methods, showing the samples and means of the two classes, the class-dependent and class-independent LDA subspaces, and the projections onto them. The blue and red lines represent the first and second eigenvectors of the class-dependent approach, respectively, while the solid and dotted black lines represent the second and first eigenvectors of the class-independent approach, respectively. Alaa Tharwat December 30, 2017 45 / 66
  • 47. Numerical example The Figure (above) gives a further explanation of the two methods. Class-Independent: As shown in the figure, there are two eigenvectors, V_1 (dotted black line) and V_2 (solid black line). The differences between the two eigenvectors are as follows: The projected data on the second eigenvector (V_2), which has the highest corresponding eigenvalue, discriminate the data of the two classes better than the first eigenvector; as shown in the figure, the distance between the projected means (m_1 − m_2), which represents S_B, is larger when the data are projected on V_2 than on V_1. The second eigenvector also decreases the within-class variance much better than the first eigenvector; the figure illustrates that the within-class variance of the first class (S_{W_1}) is much smaller when the data are projected on V_2 than on V_1. As a result of these two findings, V_2 is used to construct the LDA space. Alaa Tharwat December 30, 2017 46 / 66
  • 48. Numerical example The Figure (above) gives a further explanation of the two methods. Class-Dependent: As shown in the figure, there are two eigenvectors, V_{ω_1}^{(2)} (red line) and V_{ω_2}^{(1)} (blue line), which represent the first and second classes, respectively. The differences between the two eigenvectors are as follows: Projecting the original data on the two eigenvectors discriminates between the two classes; as shown in the figure, the distance between the projected means (m_1 − m_2) is larger than the distance between the original means (µ_1 − µ_2). The within-class variance of each class is decreased; for example, the within-class variance of the first class (S_{W_1}) is decreased when it is projected on its corresponding eigenvector. As a result of these two findings, V_{ω_1}^{(2)} and V_{ω_2}^{(1)} are used to construct the LDA space. Alaa Tharwat December 30, 2017 47 / 66
  • 49. Numerical example The Figure (above) gives a further explanation of the two methods. Class-Dependent vs. Class-Independent: The two LDA methods are used to calculate the LDA space, but the class-dependent method calculates a separate lower dimensional space for each class, which has two main limitations: (1) it needs more CPU time and more calculations than the class-independent method; (2) it may lead to the SSS problem, because the number of samples in each class affects the singularity of S_{W_i}. For these reasons, the standard LDA technique uses the class-independent method rather than the class-dependent method. Alaa Tharwat December 30, 2017 48 / 66
  • 50. Numerical example Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 49 / 66
  • 51. Main problems of LDA Although LDA is one of the most common data reduction techniques, it suffers from two main problems: Small Sample Size (SSS) and linearity problems. Alaa Tharwat December 30, 2017 50 / 66
  • 52. Main problems of LDA Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 51 / 66
  • 53. Main problems of LDA Linearity problem The LDA technique is used to find a linear transformation that discriminates between different classes. If the classes are non-linearly separable, LDA cannot find a lower dimensional space; in other words, LDA fails to find the LDA space when the discriminatory information is not in the means of the classes. The Figure (below) shows how the discriminatory information may lie not in the mean but in the variance of the data, because the means of the two classes are equal. The mathematical interpretation of this problem is as follows: if the means of the classes are approximately equal, then S_B, and hence W, will be close to zero, and the LDA space cannot be calculated. Alaa Tharwat December 30, 2017 52 / 66
  • 54. Main problems of LDA Linearity problem Figure: Two examples of two non-linearly separable classes (ω_1 and ω_2 with equal means, µ_1 = µ_2); the top panel shows how the two classes are non-separable, while the bottom panel shows how a mapping or transformation solves this problem so that the two classes become linearly separable. Alaa Tharwat December 30, 2017 53 / 66
  • 55. Main problems of LDA Linearity problem One of the solutions to the linearity problem is based on the transformation concept, which is known as the kernel method (kernel functions). The kernel idea is applied in Support Vector Machines (SVM), Support Vector Regression (SVR), PCA, and LDA. The previous figure illustrates how the transformation is used to map the original data into a higher dimensional space; hence, the data become linearly separable, and the LDA technique can find the lower dimensional space in the new space. The figure (next slide) graphically and mathematically shows how two non-separable classes in a one-dimensional space are transformed into a two-dimensional space (i.e. a higher dimensional space), thus allowing linear separation. Alaa Tharwat December 30, 2017 54 / 66
  • 56. Main problems of LDA Linearity problem Figure: Example of kernel functions. The samples in the top panel (X), which lie on a line (i.e. a one-dimensional space), are non-linearly separable; the samples in the bottom panel (Z), generated by mapping the samples of the top space via Φ(x) = (x, x^2) = (z_1, z_2), e.g. (1) → (1, 1), (−1) → (−1, 1), (2) → (2, 4), (−2) → (−2, 4), are linearly separable. Alaa Tharwat December 30, 2017 55 / 66
  • 57. Main problems of LDA Linearity problem Let φ represent a nonlinear mapping to the new feature space Z. The transformation matrix (W) in the new feature space (Z) is calculated as follows: F(W) = max_W \frac{W^T S_B^φ W}{W^T S_W^φ W} (25), where W is a transformation matrix and Z is the new feature space. The between-class matrix (S_B^φ) and the within-class matrix (S_W^φ) are defined as follows: S_B^φ = \sum_{i=1}^{c} n_i (µ_i^φ − µ^φ)(µ_i^φ − µ^φ)^T (26) and S_W^φ = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (φ(x_{ij}) − µ_j^φ)(φ(x_{ij}) − µ_j^φ)^T (27), where µ_i^φ = \frac{1}{n_i} \sum_{k=1}^{n_i} φ(x_k) and µ^φ = \frac{1}{N} \sum_{i=1}^{N} φ(x_i) = \sum_{i=1}^{c} \frac{n_i}{N} µ_i^φ. Alaa Tharwat December 30, 2017 56 / 66
  • 58. Main problems of LDA Linearity problem Thus, in kernel LDA, all samples are transformed non-linearly into a new space Z using the function φ. In other words, the φ function is used to map the original features into the Z space by creating a nonlinear combination of the original samples using dot-products. There are many types of kernel functions to achieve this aim. Examples include the Gaussian or Radial Basis Function (RBF) kernel, K(x_i, x_j) = exp(−||x_i − x_j||^2 / 2σ^2), where σ is a positive parameter, and the polynomial kernel of degree d, K(x_i, x_j) = (⟨x_i, x_j⟩ + c)^d. Alaa Tharwat December 30, 2017 57 / 66
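A small sketch of the two kernels mentioned above; in kernel LDA these kernel evaluations stand in for the dot products when forming S_B^φ and S_W^φ (the full kernel-LDA machinery is not shown, and the function names are illustrative):

import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian / RBF kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)   # squared pairwise distances
    return np.exp(-d2 / (2 * sigma ** 2))

def poly_kernel(X, c=1.0, d=2):
    """Polynomial kernel of degree d: K[i, j] = (<x_i, x_j> + c)^d."""
    return (X @ X.T + c) ** d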
  • 59. Main problems of LDA Linearity problem Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 58 / 66
  • 60. Main problems of LDA Small Sample Size (SSS) problem Problem Definition Singularity, Small Sample Size (SSS), or the under-sampled problem is one of the big problems of the LDA technique. It results from high-dimensional pattern classification tasks or a low number of training samples available for each class compared with the dimensionality of the sample space. The SSS problem occurs when S_W is singular(1). The upper bound of the rank(2) of S_W is N − c, while the dimension of S_W is M × M. Thus, in most cases M >> N − c, which leads to the SSS problem. For example, in face recognition applications, the size of the face image may reach 100 × 100 = 10000 pixels, which represents high-dimensional features and leads to a singularity problem. (1) A matrix is singular if it is square and does not have a matrix inverse, i.e. its determinant is zero; hence, not all of its columns and rows are independent. (2) The rank of a matrix is the number of its linearly independent rows or columns. Alaa Tharwat December 30, 2017 59 / 66
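A quick illustrative check of the SSS condition: when M > N − c, the rank of S_W is at most N − c, so S_W cannot be inverted (the sizes below are arbitrary small values chosen only for the sketch):

import numpy as np

rng = np.random.default_rng(0)
N, M, c = 30, 100, 3                      # N - c = 27 < M = 100
X = rng.normal(size=(N, M))
y = np.repeat(np.arange(c), N // c)
S_W = sum((X[y == k] - X[y == k].mean(axis=0)).T @ (X[y == k] - X[y == k].mean(axis=0))
          for k in range(c))
print(np.linalg.matrix_rank(S_W), "<=", N - c, "< M =", M)   # S_W is singular, cannot be inverted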
  • 61. Main problems of LDA Small Sample Size (SSS) problem Common Solutions to SSS Problem. Regularization (RLDA): In regularization method, the identity matrix is scaled by multiplying it by a regularization parameter (η > 0) and adding it to the within-class matrix to make it non-singular. Thus, the diagonal components of the within-class matrix are biased as follows, SW = SW + ηI. However, choosing the value of the regularization parameter requires more tuning and a poor choice for this parameter can degrade the performance of the method. The parameter η is just added to perform the inverse of SW and has no clear mathematical interpretation. Alaa Tharwat December 30, 2017 60 / 66
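A minimal sketch of this regularization, assuming S_B and S_W are already computed (regularized_lda_space and the default eta are illustrative choices, not values from the paper):

import numpy as np

def regularized_lda_space(S_W, S_B, k, eta=1e-3):
    """RLDA sketch: bias the diagonal of S_W with eta*I so that it becomes invertible."""
    M = S_W.shape[0]
    evals, evecs = np.linalg.eig(np.linalg.inv(S_W + eta * np.eye(M)) @ S_B)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:k]]          # k leading eigenvectors; eta must be tuned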
  • 62. Main problems of LDA Small Sample Size (SSS) problem Common Solutions to the SSS Problem. Sub-space: In this method, a non-singular intermediate space is obtained to reduce the dimension of the original data to be equal to the rank of S_W; hence, S_W becomes full-rank(3) and can then be inverted. For example, Belhumeur et al. used PCA to reduce the dimensions of the original space to N − c (i.e. the upper bound of the rank of S_W). However, losing some discriminant information is a common drawback associated with this method. Null Space: Many studies have proposed removing the null space of S_W to make S_W full-rank and hence invertible. The drawback of this method is that more discriminant information is lost when the null space of S_W is removed, which has a negative impact on how well the lower dimensional space satisfies the LDA goal. (3) A is a full-rank matrix if all of its columns and rows are independent, i.e. rank(A) = #rows = #cols. Alaa Tharwat December 30, 2017 61 / 66
  • 63. Main problems of LDA Small Sample Size (SSS) problem Four different variants of the LDA technique that are used to solve the SSS problem are introduced as follows: PCA + LDA technique: In this technique, the original d-dimensional features are first reduced to an h-dimensional feature space using PCA, and then LDA is used to further reduce the features to k dimensions. PCA is used here to reduce the dimensions so that the rank of S_W becomes N − c; hence, the SSS problem is addressed. However, PCA neglects some discriminant information, which may reduce the classification performance. Direct LDA technique: Direct LDA (DLDA) is one of the well-known techniques used to solve the SSS problem. This technique has two main steps. In the first step, the transformation matrix W is computed to transform the training data to the range space of S_B. In the second step, the dimensionality of the transformed data is further reduced using some regulating matrices. The benefit of DLDA is that no discriminative features are neglected, as happens in the PCA+LDA technique. Alaa Tharwat December 30, 2017 62 / 66
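One possible way to realize the PCA + LDA idea is with scikit-learn's PCA and LinearDiscriminantAnalysis transformers; the sketch below is an assumption about how one might wire them together, not the procedure used in the paper (keeping 95% of the variance is an arbitrary choice; a Fisherfaces-style PCA+LDA would keep N − c components instead):

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# PCA first shrinks the feature space so that S_W becomes full-rank,
# then LDA projects onto at most c - 1 discriminant directions.
# X (N x M) and y (N,) are assumed to be defined elsewhere.
pca_lda = make_pipeline(PCA(n_components=0.95),      # keep 95% of the variance (arbitrary)
                        LinearDiscriminantAnalysis())
# Y = pca_lda.fit_transform(X, y)                    # lower-dimensional projected data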
  • 64. Main problems of LDA Small Sample Size (SSS) problem Four different variants of the LDA technique that are used to solve the SSS problem are introduced as follows: Regularized LDA technique: In Regularized LDA (RLDA), a small perturbation is added to the S_W matrix to make it non-singular. This regularization can be applied as follows: (S_W + ηI)^{-1} S_B w_i = λ_i w_i (28), where η represents a regularization parameter. The diagonal components of S_W are biased by adding this small perturbation. However, the regularization parameter needs to be tuned, and a poor choice of it can degrade the generalization performance. W = arg max_{|W^T S_W W| = 0} |W^T S_B W| (29) Alaa Tharwat December 30, 2017 63 / 66
  • 65. Main problems of LDA Small Sample Size (SSS) problem Four different variants of the LDA technique that are used to solve the SSS problem are introduced as follows: Null LDA technique: The aim of NLDA is to find the orientation matrix W. Firstly, the range space of S_W is neglected, and the data are projected only on the null space of S_W, i.e. S_W W = 0. Secondly, the aim is to search for W that satisfies S_B W ≠ 0 and maximizes |W^T S_B W|. The high dimensionality of the feature space may lead to computational problems, which can be solved by (1) using the PCA technique as a pre-processing step, i.e. before applying the NLDA technique, to reduce the dimension of the feature space to N − 1 by removing the null space of S_T = S_B + S_W, or (2) using the PCA technique before the second step of the NLDA technique. Mathematically, in the Null LDA (NLDA) technique, the h column vectors of the transformation matrix W = [w_1, w_2, . . . , w_h] are taken to be in the null space of S_W, i.e. w_i^T S_W w_i = 0, ∀i = 1 . . . h, with w_i^T S_B w_i ≠ 0. Hence, M − (N − c) linearly independent vectors are used to form a new orientation matrix, which is used to maximize |W^T S_B W| subject to the constraint |W^T S_W W| = 0, as follows: W = arg max_{|W^T S_W W| = 0} |W^T S_B W|. Alaa Tharwat December 30, 2017 64 / 66
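A simplified sketch of the null-space idea only: keep directions in the null space of S_W and rank them by the between-class scatter they capture (the full NLDA algorithm adds the PCA pre-processing and the constrained maximization described above; the function name and tolerance are illustrative):

import numpy as np

def null_lda_space(S_W, S_B, k, tol=1e-10):
    """Keep directions w with w^T S_W w = 0 and rank them by w^T S_B w (simplified)."""
    evals, evecs = np.linalg.eigh(S_W)               # S_W is symmetric positive semi-definite
    null_basis = evecs[:, evals < tol]               # basis of the null space of S_W
    scores = np.einsum('ij,jk,ki->i', null_basis.T, S_B, null_basis)  # w^T S_B w per direction
    order = np.argsort(scores)[::-1]
    return null_basis[:, order[:k]]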
  • 66. Main problems of LDA Small Sample Size (SSS) problem Introduction to Linear Discriminant Analysis (LDA). Theoretical Background to LDA. Definition of LDA. Calculating the Between-Class Variance (SB). Calculating the Within-Class Variance (SW ). Constructing the Lower Dimensional Space. Projection onto the LDA space. Example of Two LDA Subspaces. Computational Complexity of LDA. Class-Dependent vs. Class-Independent Methods. Numerical Example. Main Problems of LDA. Linearity problem. Small Sample Size Problem. Conclusions. Alaa Tharwat December 30, 2017 65 / 66
  • 67. Conclusions LDA is an easy-to-implement dimensionality reduction method. The goal of LDA is to increase the between-class variance and decrease the within-class variance. The values of the eigenvalues reflect the robustness of the corresponding eigenvectors. For more details, read the original paper: Alaa Tharwat, Tarek Gaber, Abdelhameed Ibrahim and Aboul Ella Hassanien, ”Linear discriminant analysis: A detailed tutorial”, AI Communications 30 (2017) 169-190. For more questions, send to engalaatharwat@hotmail.com. Alaa Tharwat December 30, 2017 66 / 66