A Tutorial on Data
Reduction
Linear Discriminant
Analysis (LDA)
Shireen Elhabian and Aly A. Farag
University of Louisville, CVIP Lab
September 2009
Outline
• LDA objective
• Recall … PCA
• Now … LDA
• LDA … Two Classes
– Counter example
• LDA … C Classes
– Illustrative Example
• LDA vs PCA Example
• Limitations of LDA
LDA Objective
• The objective of LDA is to perform
dimensionality reduction …
– So what? PCA already does this …
• However, we want to preserve as much of the
class discriminatory information as possible.
– OK, that’s new, let’s delve deeper …
Recall … PCA
• In PCA, the main idea is to re-express the available dataset so as to
extract the relevant information by reducing the redundancy and
minimizing the noise.
• We didn’t care whether this dataset represents features from one
or more classes, i.e. the discrimination power was not taken into
consideration while we were talking about PCA.
• In PCA, we had a dataset matrix X with dimensions m×n, where the
columns represent different data samples.
• We first started by subtracting the mean to obtain a zero-mean dataset,
then we computed the covariance matrix S_x = XX^T.
• Eigen values and eigen vectors were then computed for S_x. The new basis
vectors are the eigen vectors with the highest eigen values, where the
number of those vectors was our choice.
• Thus, using the new basis, we can project the dataset onto a
lower-dimensional space with a more powerful data representation.
(X is the data matrix: n feature vectors (data samples) stored as columns, each an m-dimensional data vector.)
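As a quick reference, here is a minimal NumPy sketch of the PCA recipe just described (samples as columns of an m×n matrix X; the function and variable names are illustrative, not from the original tutorial):

```python
import numpy as np

def pca(X, k):
    """PCA on an m x n data matrix X whose columns are data samples.
    Returns the top-k basis vectors and the projected data."""
    mu = X.mean(axis=1, keepdims=True)     # m x 1 mean vector
    Xc = X - mu                            # zero-mean dataset
    Sx = Xc @ Xc.T                         # m x m (unnormalized) covariance matrix
    evals, evecs = np.linalg.eigh(Sx)      # eigendecomposition of the symmetric matrix
    order = np.argsort(evals)[::-1]        # sort by decreasing eigen value
    basis = evecs[:, order[:k]]            # new basis: top-k eigen vectors
    return basis, basis.T @ Xc             # basis (m x k) and projected data (k x n)
```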
Now … LDA
• Consider a pattern classification problem, where we have C-
classes, e.g. seabass, tuna, salmon …
• Each class has Ni m-dimensional samples, where i = 1,2, …, C.
• Hence we have a set of m-dimensional samples {x1, x2,…, xNi}
belonging to class ωi.
• We stack these samples from different classes into one big
matrix X such that each column represents one sample.
• We seek to obtain a transformation of X to Y through
projecting the samples in X onto a hyperplane with
dimension C-1.
• Let’s see what this means.
LDA … Two Classes
• Assume we have m-dimensional samples {x1,
x2,…, xN}, N1 of which belong to ω1 and
N2 belong to ω2.
• We seek to obtain a scalar y by projecting
the samples x onto a line (C-1 space, C = 2).
– where w is the projection vector used to project x to y:

$$ y = w^T x, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \quad w = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix} $$

• Of all the possible lines we would like to
select the one that maximizes the
separability of the scalars.
The two classes are not well
separated when projected onto
this line
This line succeeded in separating
the two classes and in the
meantime reducing the
dimensionality of our problem
from two features (x1,x2) to only a
scalar value y.
LDA … Two Classes
• In order to find a good projection vector, we need to define a
measure of separation between the projections.
• The mean vector of each class in x and y feature space is:

$$ \mu_i = \frac{1}{N_i}\sum_{x \in \omega_i} x \qquad \text{and} \qquad \tilde{\mu}_i = \frac{1}{N_i}\sum_{y \in \omega_i} y = \frac{1}{N_i}\sum_{x \in \omega_i} w^T x = w^T \mu_i $$

– i.e. projecting x to y will lead to projecting the mean of x to the mean of y.
• We could then choose the distance between the projected means
as our objective function:

$$ J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T \mu_1 - w^T \mu_2| = |w^T(\mu_1 - \mu_2)| $$
LDA … Two Classes
• However, the distance between the projected means is not a very
good measure since it does not take into account the standard
deviation within the classes.
This axis has a larger distance between means
This axis yields better class separability
LDA … Two Classes
• The solution proposed by Fisher is to maximize a function that
represents the difference between the means, normalized by a
measure of the within-class variability, or the so-called scatter.
• For each class we define the scatter, an equivalent of the variance, as
the sum of squared differences between the projected samples and their class mean:

$$ \tilde{s}_i^{\,2} = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2 $$

• \tilde{s}_i^{\,2} measures the variability within class ωi after projecting it onto
the y-space.
• Thus \tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} measures the variability within the two
classes at hand after projection; hence it is called the within-class scatter
of the projected samples.
LDA … Two Classes
• The Fisher linear discriminant is defined as
the linear function w^T x that maximizes the
criterion function (the distance between the
projected means normalized by the within-class
scatter of the projected samples):

$$ J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}} $$

• Therefore, we will be looking for a projection
where examples from the same class are
projected very close to each other and, at the
same time, the projected means are as far
apart as possible.
LDA … Two Classes
• In order to find the optimum projection w*, we need to express
J(w) as an explicit function of w.
• We will define a measure of the scatter in the multivariate feature
space x, given by the following scatter matrices:

$$ S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T, \qquad S_W = S_1 + S_2 $$

• where S_i is the covariance matrix of class ωi, and S_W is called the
within-class scatter matrix.
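A small NumPy sketch of these two-class scatter matrices (a sketch only; it assumes each class is given as an array with one sample per row and uses the raw sums defined above):

```python
import numpy as np

def class_scatter(Xi):
    """S_i = sum over the class samples of (x - mu_i)(x - mu_i)^T."""
    mu = Xi.mean(axis=0)
    D = Xi - mu                      # deviations from the class mean
    return D.T @ D                   # m x m scatter matrix

def within_class_scatter(X1, X2):
    """S_W = S_1 + S_2 for the two-class case."""
    return class_scatter(X1) + class_scatter(X2)
```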
LDA … Two Classes
• Now, the scatter of the projection y can be expressed as a function of
the scatter matrices in feature space x:

$$ \tilde{s}_i^{\,2} = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2 = \sum_{x \in \omega_i} (w^T x - w^T \mu_i)^2 = \sum_{x \in \omega_i} w^T (x - \mu_i)(x - \mu_i)^T w = w^T S_i\, w $$

$$ \tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} = w^T S_1 w + w^T S_2 w = w^T (S_1 + S_2)\, w = w^T S_W\, w = \tilde{S}_W $$

where \tilde{S}_W is the within-class scatter matrix of the projected samples y.
LDA … Two Classes
• Similarly, the difference between the projected means (in y-space) can be
expressed in terms of the means in the original feature space (x-space):

$$ (\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (w^T\mu_1 - w^T\mu_2)^2 = w^T \underbrace{(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T}_{S_B}\, w = w^T S_B\, w = \tilde{S}_B $$

• The matrix S_B is called the between-class scatter of the original samples/feature
vectors, while \tilde{S}_B is the between-class scatter of the projected samples y.
• Since S_B is the outer product of two vectors, its rank is at most one.
LDA … Two Classes
• We can finally express the Fisher criterion in terms of S_W and S_B
as:

$$ J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}} = \frac{w^T S_B\, w}{w^T S_W\, w} $$

• Hence J(w) is a measure of the difference between class
means (encoded in the between-class scatter matrix)
normalized by a measure of the within-class scatter matrix.
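For concreteness, evaluating this criterion for a candidate direction w is a one-liner (a sketch; Sb and Sw are assumed to be the scatter matrices defined above):

```python
def fisher_criterion(w, Sb, Sw):
    """J(w) = (w^T S_B w) / (w^T S_W w) for a 1-D direction vector w."""
    return (w @ Sb @ w) / (w @ Sw @ w)
```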
LDA … Two Classes
• To find the maximum of J(w), we differentiate and equate to
zero.
$$ \frac{d}{dw} J(w) = \frac{d}{dw}\left( \frac{w^T S_B\, w}{w^T S_W\, w} \right) = 0 $$

$$ \Rightarrow \;(w^T S_W w)\,\frac{d}{dw}\!\left(w^T S_B w\right) - (w^T S_B w)\,\frac{d}{dw}\!\left(w^T S_W w\right) = 0 $$

$$ \Rightarrow \;(w^T S_W w)\, 2 S_B w - (w^T S_B w)\, 2 S_W w = 0 $$

Dividing by 2\, w^T S_W w:

$$ \Rightarrow \;\left( \frac{w^T S_W w}{w^T S_W w} \right) S_B w - \left( \frac{w^T S_B w}{w^T S_W w} \right) S_W w = 0 $$

$$ \Rightarrow \; S_B w - J(w)\, S_W w = 0 $$

$$ \Rightarrow \; S_W^{-1} S_B\, w - J(w)\, w = 0 $$
LDA … Two Classes
• Solving the generalized eigen value problem

$$ S_W^{-1} S_B\, w = \lambda w \qquad \text{where } \lambda = J(w) = \text{scalar} $$

yields

$$ w^* = \arg\max_w \left( \frac{w^T S_B\, w}{w^T S_W\, w} \right) = S_W^{-1}(\mu_1 - \mu_2) $$

• This is known as Fisher’s Linear Discriminant, although it is not a
discriminant but rather a specific choice of direction for the projection
of the data down to one dimension.
• Using the same notation as PCA, the solution will be the eigen
vector(s) of

$$ S_X = S_W^{-1} S_B $$
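A minimal sketch of this closed-form solution for two classes (samples as rows; it assumes S_W is invertible, and the helper name is illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """w* = S_W^{-1} (mu_1 - mu_2), the Fisher discriminant direction."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, mu1 - mu2)   # solve S_W w = (mu_1 - mu_2)
    return w / np.linalg.norm(w)         # unit-length projection vector
```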
LDA … Two Classes - Example
• Compute the Linear Discriminant projection for the following two-
dimensional dataset.
– Samples for class ω1 : X1=(x1,x2)={(4,2),(2,4),(2,3),(3,6),(4,4)}
– Samples for class ω2 : X2=(x1,x2)={(9,10),(6,8),(9,5),(8,7),(10,8)}
(Figure: scatter plot of the two classes in the (x1, x2) plane.)
LDA … Two Classes - Example
• The class means are:

$$ \mu_1 = \frac{1}{N_1}\sum_{x \in \omega_1} x = \frac{1}{5}\left( \begin{bmatrix}4\\2\end{bmatrix} + \begin{bmatrix}2\\4\end{bmatrix} + \begin{bmatrix}2\\3\end{bmatrix} + \begin{bmatrix}3\\6\end{bmatrix} + \begin{bmatrix}4\\4\end{bmatrix} \right) = \begin{bmatrix}3\\3.8\end{bmatrix} $$

$$ \mu_2 = \frac{1}{N_2}\sum_{x \in \omega_2} x = \frac{1}{5}\left( \begin{bmatrix}9\\10\end{bmatrix} + \begin{bmatrix}6\\8\end{bmatrix} + \begin{bmatrix}9\\5\end{bmatrix} + \begin{bmatrix}8\\7\end{bmatrix} + \begin{bmatrix}10\\8\end{bmatrix} \right) = \begin{bmatrix}8.4\\7.6\end{bmatrix} $$
LDA … Two Classes - Example
• Covariance matrix of the first class (the numbers below use the sample-covariance normalization 1/(N_1 − 1), which matches the values used in this example):

$$ S_1 = \frac{1}{N_1 - 1}\sum_{x \in \omega_1} (x - \mu_1)(x - \mu_1)^T = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix} $$
LDA … Two Classes - Example
• Covariance matrix of the second class:

$$ S_2 = \frac{1}{N_2 - 1}\sum_{x \in \omega_2} (x - \mu_2)(x - \mu_2)^T = \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix} $$
LDA … Two Classes - Example
• Within-class scatter matrix:
$$ S_W = S_1 + S_2 = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix} + \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix} = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix} $$
LDA … Two Classes - Example
• Between-class scatter matrix:
$$ S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T = \left( \begin{bmatrix}3\\3.8\end{bmatrix} - \begin{bmatrix}8.4\\7.6\end{bmatrix} \right)\left( \begin{bmatrix}3\\3.8\end{bmatrix} - \begin{bmatrix}8.4\\7.6\end{bmatrix} \right)^T = \begin{bmatrix}-5.4\\-3.8\end{bmatrix}\begin{bmatrix}-5.4 & -3.8\end{bmatrix} = \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} $$
LDA … Two Classes - Example
• The LDA projection is then obtained as the solution of the generalized eigen
value problem
$$ S_W^{-1} S_B\, w = \lambda w \;\Rightarrow\; \left| S_W^{-1} S_B - \lambda I \right| = 0 $$

$$ \Rightarrow \left| \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0 $$

$$ \Rightarrow \left| \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0 $$

$$ \Rightarrow \left| \begin{bmatrix} 9.2213 - \lambda & 6.489 \\ 4.2339 & 2.9794 - \lambda \end{bmatrix} \right| = (9.2213 - \lambda)(2.9794 - \lambda) - 6.489 \times 4.2339 = 0 $$

$$ \Rightarrow \lambda^2 - 12.2007\,\lambda = 0 \;\Rightarrow\; \lambda(\lambda - 12.2007) = 0 \;\Rightarrow\; \lambda_1 = 0, \quad \lambda_2 = 12.2007 $$
LDA … Two Classes - Example
• Hence, solving for the eigen vectors of S_W^{-1} S_B:

$$ \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix} w_1 = \lambda_1 w_1 = 0 \cdot w_1 \;\Rightarrow\; w_1 = \begin{bmatrix} -0.5755 \\ 0.8178 \end{bmatrix} $$

$$ \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix} w_2 = \lambda_2 w_2 = 12.2007\, w_2 \;\Rightarrow\; w_2 = \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix} $$

• The optimal projection is the one that gives the maximum λ = J(w); thus w* = w_2.
LDA … Two Classes - Example
• Or directly:

$$ w^* = S_W^{-1}(\mu_1 - \mu_2) = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \left( \begin{bmatrix} 3 \\ 3.8 \end{bmatrix} - \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix} \right) = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \;\propto\; \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix} $$

(after normalizing to unit length and flipping the sign).
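These numbers can be checked with a few lines of NumPy (a verification sketch for this example; np.cov uses the N−1 normalization, which matches the per-class covariances above):

```python
import numpy as np

X1 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)        # (3, 3.8) and (8.4, 7.6)
S1, S2 = np.cov(X1.T), np.cov(X2.T)                # per-class covariances (N-1 norm.)
Sw = S1 + S2                                       # [[3.3, -0.3], [-0.3, 5.5]]
Sb = np.outer(mu1 - mu2, mu1 - mu2)                # [[29.16, 20.52], [20.52, 14.44]]

evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
print(evals)                                       # ~ [12.2007, 0]
print(evecs[:, np.argmax(evals)])                  # ~ [0.9088, 0.4173] (up to sign)
print(np.linalg.solve(Sw, mu1 - mu2))              # same direction, up to sign and scale
```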
LDA - Projection
(Figure: class-conditional PDFs p(y|ωi) of the projected data and the original data in the (x1, x2) plane, using the LDA projection vector with the other eigen value = 8.8818e-016.)
The projection vector corresponding to the smallest eigen value: using this vector leads to bad separability between the two classes.
LDA - Projection
(Figure: class-conditional PDFs p(y|ωi) of the projected data and the original data in the (x1, x2) plane, using the LDA projection vector with the highest eigen value = 12.2007.)
The projection vector corresponding to the highest eigen value: using this vector leads to good separability between the two classes.
LDA … C-Classes
• Now, we have C-classes instead of just two.
• We are now seeking (C-1) projections [y1, y2, …, yC-1] by means
of (C-1) projection vectors wi.
• wi can be arranged by columns into a projection matrix W =
[w1|w2|…|wC-1] such that:
$$ y_i = w_i^T x \;\Rightarrow\; y = W^T x $$

where

$$ x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}_{m \times 1}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_{C-1} \end{bmatrix}_{(C-1) \times 1}, \qquad W = \left[\, w_1 \,\big|\, w_2 \,\big|\, \dots \,\big|\, w_{C-1} \,\right]_{m \times (C-1)} $$
LDA … C-Classes
• If we have n-feature vectors, we can stack them into one matrix
as follows;
$$ Y = W^T X $$

where

$$ X = \begin{bmatrix} x_1^1 & \dots & x_1^n \\ \vdots & & \vdots \\ x_m^1 & \dots & x_m^n \end{bmatrix}_{m \times n}, \qquad Y = \begin{bmatrix} y_1^1 & \dots & y_1^n \\ \vdots & & \vdots \\ y_{C-1}^1 & \dots & y_{C-1}^n \end{bmatrix}_{(C-1) \times n}, \qquad W = \left[\, w_1 \,\big|\, w_2 \,\big|\, \dots \,\big|\, w_{C-1} \,\right]_{m \times (C-1)} $$
LDA – C-Classes
• Recall that in the two-class case the within-class scatter was computed as S_W = S_1 + S_2.
• This can be generalized to the C-class case as:

$$ S_W = \sum_{i=1}^{C} S_i, \qquad \text{where} \quad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T \quad \text{and} \quad \mu_i = \frac{1}{N_i}\sum_{x \in \omega_i} x $$

(Figure: example of two-dimensional features (m = 2) with three classes (C = 3), showing the class means μ1, μ2, μ3 and the per-class scatter. Ni is the number of data samples in class ωi.)
LDA – C-Classes
• Recall that in the two-class case the between-class scatter was computed as S_B = (μ1 − μ2)(μ1 − μ2)^T.
• For the C-class case, we measure the between-class scatter with respect to the
mean of all classes as follows:

$$ S_B = \sum_{i=1}^{C} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T, \qquad \text{where} \quad \mu = \frac{1}{N}\sum_{\forall x} x = \frac{1}{N}\sum_{i=1}^{C} N_i\,\mu_i \quad \text{and} \quad \mu_i = \frac{1}{N_i}\sum_{x \in \omega_i} x $$

(Figure: the same three-class example (m = 2, C = 3), now also showing the overall mean μ. Ni is the number of data samples in class ωi and N is the total number of samples.)
LDA – C-Classes
• Similarly,
– We can define the mean vectors for the projected samples y as:

$$ \tilde{\mu}_i = \frac{1}{N_i}\sum_{y \in \omega_i} y \qquad \text{and} \qquad \tilde{\mu} = \frac{1}{N}\sum_{\forall y} y $$

– While the scatter matrices for the projected samples y will be:

$$ \tilde{S}_W = \sum_{i=1}^{C} \tilde{S}_i = \sum_{i=1}^{C} \sum_{y \in \omega_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^T \qquad \text{and} \qquad \tilde{S}_B = \sum_{i=1}^{C} N_i\,(\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^T $$
LDA – C-Classes
• Recall that in the two-class case we expressed the scatter matrices of the
projected samples in terms of those of the original samples as:

$$ \tilde{S}_W = W^T S_W\, W \qquad \text{and} \qquad \tilde{S}_B = W^T S_B\, W $$

This still holds in the C-class case.
• Recall that we are looking for a projection that maximizes the ratio of
between-class to within-class scatter.
• Since the projection is no longer a scalar (it has C-1 dimensions), we then use
the determinant of the scatter matrices to obtain a scalar objective function:

$$ J(W) = \frac{\left|\tilde{S}_B\right|}{\left|\tilde{S}_W\right|} = \frac{\left|W^T S_B\, W\right|}{\left|W^T S_W\, W\right|} $$

• And we will seek the projection W* that maximizes this ratio.
LDA – C-Classes
• To find the maximum of J(W), we differentiate with respect to W and equate
to zero.
• Recall that in the two-class case we solved the eigen value problem S_W^{-1} S_B w = λw.
• For the C-class case, we have C-1 projection vectors, hence the eigen value
problem can be generalized to the C-class case as:
• Thus, it can be shown that the optimal projection matrix W* is the one whose
columns are the eigenvectors corresponding to the largest eigen values of the
following generalized eigen value problem:
$$ S_W^{-1} S_B\, w_i = \lambda_i\, w_i \qquad \text{where } \lambda_i = J(w_i) = \text{scalar}, \quad i = 1, 2, \dots, C-1 $$

$$ S_W^{-1} S_B\, W^* = \lambda\, W^* \qquad \text{where } \lambda = J(W^*) = \text{scalar} \quad \text{and} \quad W^* = \left[\, w_1^* \,\big|\, w_2^* \,\big|\, \dots \,\big|\, w_{C-1}^* \,\right] $$
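A compact sketch of the C-class procedure under these formulas (assuming the classes are given as arrays with samples in rows and that S_W is invertible; the function name and signature are illustrative):

```python
import numpy as np

def lda_projection(classes, n_components=None):
    """Return the LDA projection matrix W whose columns are the top
    eigen vectors of S_W^{-1} S_B."""
    X = np.vstack(classes)
    mu = X.mean(axis=0)                                  # overall mean
    m = X.shape[1]
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for Xi in classes:
        mui = Xi.mean(axis=0)
        Sw += (Xi - mui).T @ (Xi - mui)                  # within-class scatter
        Sb += len(Xi) * np.outer(mui - mu, mui - mu)     # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    k = n_components or (len(classes) - 1)               # at most C-1 useful directions
    return evecs[:, order[:k]].real                      # m x k projection matrix W

# Projected data: Y = X @ W   (equivalently y = W^T x per sample)
```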
Illustration – 3 Classes
• Let’s generate a dataset for each
class to simulate the three classes
shown
• For each class do the following,
– Use the random number generator
to generate a uniform stream of
500 samples that follows U(0,1).
– Using the Box-Muller approach,
convert the generated uniform
stream to N(0,1).
– Then use the method of eigen values
and eigen vectors to manipulate the
standard normal samples so that they have the
required mean vector and covariance matrix.
– Estimate the mean and covariance
matrix of the resulting dataset.
• By visual inspection of the figure,
the class parameters (means and
covariance matrices) can be given as
follows:
Dataset Generation
(Class parameters: μ1 = (5, 5)^T, with μ2 and μ3 given as offsets from μ1, the overall mean μ, and the class covariance matrices Σ1, Σ2, Σ3 chosen with zero, positive and negative off-diagonal terms respectively, as noted below.)
Zero covariance leads to data samples distributed horizontally.
Positive covariance leads to data samples distributed along the y = x line.
Negative covariance leads to data samples distributed along the y = -x line.
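A rough NumPy sketch of the generation procedure described above (Box-Muller followed by an eigen-decomposition reshaping); the class parameters passed at the bottom are illustrative placeholders, not necessarily the exact values used in the original slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_class(mu, Sigma, n=500):
    """Generate n samples ~ N(mu, Sigma) via Box-Muller plus an eigen reshaping."""
    # Box-Muller: turn two U(0,1) streams into standard normal N(0,1) samples.
    u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
    z1 = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
    z2 = np.sqrt(-2.0 * np.log(u1)) * np.sin(2.0 * np.pi * u2)
    Z = np.column_stack([z1, z2])                 # n x 2 standard normal samples
    # Reshape with the eigen-decomposition of Sigma, then shift by the mean.
    evals, evecs = np.linalg.eigh(Sigma)
    A = evecs @ np.diag(np.sqrt(evals))           # so that A @ A.T == Sigma
    return Z @ A.T + mu

# Illustrative (placeholder) class parameters:
X1 = generate_class(np.array([5.0, 5.0]),  np.array([[4.0, 0.0], [0.0, 4.0]]))
X2 = generate_class(np.array([8.0, 12.0]), np.array([[3.0, 1.5], [1.5, 3.0]]))
X3 = generate_class(np.array([12.0, 2.0]), np.array([[3.5, -1.5], [-1.5, 3.5]]))
```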
In Matlab
It’s Working …
(Figure: scatter plot of the generated three-class dataset, with X1, the first feature, on the horizontal axis and X2, the second feature, on the vertical axis; the class means μ1, μ2, μ3 and the overall mean μ are marked.)
Computing LDA Projection Vectors
Recall:

$$ S_W = \sum_{i=1}^{C} S_i, \qquad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T, \qquad \mu_i = \frac{1}{N_i}\sum_{x \in \omega_i} x $$

$$ S_B = \sum_{i=1}^{C} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T, \qquad \mu = \frac{1}{N}\sum_{\forall x} x = \frac{1}{N}\sum_{i=1}^{C} N_i\,\mu_i $$

The LDA projection vectors are then the eigen vectors of S_W^{-1} S_B.
Let’s visualize the projection vectors W
(Figure: the two projection vectors plotted over the three-class data, with X1, the first feature, on the horizontal axis and X2, the second feature, on the vertical axis.)
Projection … y = W^T x
Along the first projection vector:
(Figure: class-conditional PDFs p(y|ωi) of the projected data, using the first projection vector with eigen value = 4508.2089.)
Projection … y = W^T x
Along the second projection vector:
(Figure: class-conditional PDFs p(y|ωi) of the projected data, using the second projection vector with eigen value = 1878.8511.)
Which is Better?!!!
• Apparently, the projection vector that has the highest eigen
value provides higher discrimination power between the classes.
(Figure: the class PDFs along the first projection vector (eigen value = 4508.2089) and along the second projection vector (eigen value = 1878.8511), shown side by side.)
PCA vs LDA
Limitations of LDA
• LDA produces at most C-1 feature projections
– If the classification error estimates establish that more features are needed, some other method must be
employed to provide those additional features
• LDA is a parametric method since it assumes unimodal Gaussian likelihoods
– If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any
complex structure of the data, which may be needed for classification.
Limitations of LDA
• LDA will fail when the discriminatory information is not in the mean but
rather in the variance of the data
Thank You