What do we want?
What
Given a set of measurements, the goal is to discover compact and informative representations of the obtained data.
Our Approach
We want to “squeeze” the data into a relatively small number of features, leading to a reduction of the necessary feature-space dimension.
Properties
This removes information redundancies, which are usually produced by the measurement process.
Which methods will we see?
Fisher Linear Discriminant
1 Squeezing to the maximum.
2 From many dimensions to one.
Principal Component Analysis
1 Not so much squeezing.
2 You are willing to lose some information.
Singular Value Decomposition
1 Decompose an m × n data matrix A as A = USV^T, where U and V are orthonormal matrices and S contains the singular values.
2 You can read more about it in “Singular Value Decomposition - Dimensionality Reduction.”
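A minimal sketch of this factorization in NumPy; the matrix A below is made-up illustrative data, not taken from the lecture:

```python
# Sketch of the SVD factorization A = U S V^T using NumPy.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])                        # an m x n data matrix (m=3, n=2)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: A = U S V^T
S = np.diag(s)                                    # singular values on the diagonal

print(np.allclose(A, U @ S @ Vt))                 # True: the factors reconstruct A
```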
Rotation
Projecting
Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes, and thus produces poor recognition performance.
Something Notable
However, moving and rotating the line around might result in an orientation for which the projected samples are well separated.
Fisher Linear Discriminant (FLD)
It is a discriminant analysis that seeks directions that are efficient for discriminating between the classes in a binary classification problem.
This is actually coming from...
A Classifier as
A machine for dimensionality reduction.
Initial Setup
We have:
N d-dimensional samples x_1, x_2, ..., x_N.
N_i is the number of samples in class C_i, for i = 1, 2.
Then, we ask for the projection of each x_i onto the line by means of

y_i = w^T x_i    (1)
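A small sketch of Eq. (1) in NumPy; the data and the direction w are arbitrary illustrations, not values from the lecture:

```python
# Each sample x_i in R^d is mapped to the scalar y_i = w^T x_i.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))    # N=10 samples, d=3
w = np.array([1.0, 0.5, -0.2])  # an arbitrary projection direction

y = X @ w                       # y_i = w^T x_i for every sample at once
print(y.shape)                  # (10,) -- one scalar per sample
```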
Use the Mean of each Class
Then
Select w such that class separation is maximized.
We then define the mean sample for each class:
1 C_1 \Rightarrow m_1 = \frac{1}{N_1} \sum_{i=1}^{N_1} x_i
2 C_2 \Rightarrow m_2 = \frac{1}{N_2} \sum_{i=1}^{N_2} x_i
Ok!!! This is giving us a measure of distance
Thus, we want to maximize the distance between the projected means:

\tilde{m}_1 - \tilde{m}_2 = w^T (m_1 - m_2)    (2)

where \tilde{m}_k = w^T m_k for k = 1, 2.
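A sketch of Eq. (2) in NumPy, using synthetic class data; it checks that the separation of the projected means equals the projection of the mean difference:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(20, 2))   # class C1 samples (synthetic)
X2 = rng.normal(loc=3.0, size=(25, 2))   # class C2 samples (synthetic)
w = np.array([1.0, 1.0])

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)      # class means
m1_t, m2_t = w @ m1, w @ m2                    # projected means
print(np.isclose(m1_t - m2_t, w @ (m1 - m2)))  # True, as in Eq. (2)
```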
Fixing the Problem
To obtain good separation of the projected data
The difference between the means should be large relative to some measure of the standard deviation of each class.
We define a SCATTER measure (based on the sample variance)

s_k^2 = \sum_{x_i \in C_k} \left( w^T x_i - \tilde{m}_k \right)^2 = \sum_{y_i = w^T x_i \in C_k} (y_i - \tilde{m}_k)^2    (3)

We then define the within-class variance for the whole data set as

s_1^2 + s_2^2    (4)
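A sketch of the scatter in Eq. (3) for one class, again on made-up data: the sum of squared deviations of the projected samples around their projected mean.

```python
import numpy as np

def scatter(Xk, w):
    """s_k^2 = sum over class k of (w^T x_i - m~_k)^2, as in Eq. (3)."""
    y = Xk @ w                      # projected samples of class k
    return np.sum((y - y.mean())**2)

rng = np.random.default_rng(2)
X1 = rng.normal(size=(20, 2))       # synthetic class C1
w = np.array([1.0, 0.0])
print(scatter(X1, w))               # within-class scatter of C1 along w
```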
Finally, a Cost Function
The between-class variance

(\tilde{m}_1 - \tilde{m}_2)^2    (5)

The Fisher criterion

\frac{\text{between-class variance}}{\text{within-class variance}}    (6)

Finally

J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{s_1^2 + s_2^2}    (7)
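A sketch of the Fisher criterion of Eq. (7), combining the projected-mean separation with the within-class scatters; the data is synthetic:

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """J(w) = (m~_1 - m~_2)^2 / (s_1^2 + s_2^2), Eq. (7)."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean())**2                            # Eq. (5)
    within = np.sum((y1 - y1.mean())**2) + np.sum((y2 - y2.mean())**2)
    return between / within

rng = np.random.default_rng(3)
X1 = rng.normal(loc=0.0, size=(20, 2))
X2 = rng.normal(loc=3.0, size=(25, 2))
print(fisher_criterion(X1, X2, np.array([1.0, 1.0])))
```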
We use a transformation to simplify our life
First

J(w) = \frac{\left( w^T m_1 - w^T m_2 \right)^2}{\sum_{y_i = w^T x_i \in C_1} (y_i - \tilde{m}_1)^2 + \sum_{y_i = w^T x_i \in C_2} (y_i - \tilde{m}_2)^2}    (8)

Second

J(w) = \frac{\left( w^T m_1 - w^T m_2 \right) \left( w^T m_1 - w^T m_2 \right)^T}{\sum_{x_i \in C_1} \left( w^T x_i - \tilde{m}_1 \right) \left( w^T x_i - \tilde{m}_1 \right)^T + \sum_{x_i \in C_2} \left( w^T x_i - \tilde{m}_2 \right) \left( w^T x_i - \tilde{m}_2 \right)^T}    (9)

Third

J(w) = \frac{w^T (m_1 - m_2) \left( w^T (m_1 - m_2) \right)^T}{\sum_{x_i \in C_1} w^T (x_i - m_1) \left( w^T (x_i - m_1) \right)^T + \sum_{x_i \in C_2} w^T (x_i - m_2) \left( w^T (x_i - m_2) \right)^T}    (10)
Transformation
Fourth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{\sum_{x_i \in C_1} w^T (x_i - m_1)(x_i - m_1)^T w + \sum_{x_i \in C_2} w^T (x_i - m_2)(x_i - m_2)^T w}    (11)

Fifth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{w^T \left[ \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T \right] w}    (12)

Now Rename

J(w) = \frac{w^T S_B w}{w^T S_W w}    (13)

where S_B = (m_1 - m_2)(m_1 - m_2)^T is the between-class scatter matrix and S_W is the bracketed sum in Eq. (12), the within-class scatter matrix.
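A sketch of the renaming in Eq. (13): build S_B and S_W from synthetic class samples and evaluate J(w) in its ratio-of-quadratic-forms shape.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """S_B = (m1-m2)(m1-m2)^T and S_W = sum of class scatter matrices."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = m1 - m2
    S_B = np.outer(d, d)                                      # between-class
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class
    return S_B, S_W

rng = np.random.default_rng(4)
X1 = rng.normal(loc=0.0, size=(20, 2))
X2 = rng.normal(loc=3.0, size=(25, 2))
S_B, S_W = scatter_matrices(X1, X2)
w = np.array([1.0, 1.0])
print((w @ S_B @ w) / (w @ S_W @ w))    # J(w) in the form of Eq. (13)
```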
Derive with respect to w
Thus

\frac{dJ(w)}{dw} = \frac{d \left[ \left( w^T S_B w \right) \left( w^T S_W w \right)^{-1} \right]}{dw} = 0    (14)

Then

\frac{dJ(w)}{dw} = \left( S_B w + S_B^T w \right) \left( w^T S_W w \right)^{-1} - \left( w^T S_B w \right) \left( w^T S_W w \right)^{-2} \left( S_W w + S_W^T w \right) = 0    (15)

Now, because of the symmetry of S_B and S_W,

\frac{dJ(w)}{dw} = \frac{S_B w}{w^T S_W w} - \frac{\left( w^T S_B w \right) S_W w}{\left( w^T S_W w \right)^2} = 0    (16)
Now, Several Tricks!!!
First

S_B w = (m_1 - m_2)(m_1 - m_2)^T w = \alpha (m_1 - m_2)    (19)

where \alpha = (m_1 - m_2)^T w is a simple scalar constant.
It means that S_B w always lies in the direction of m_1 - m_2!!!
In addition
w^T S_W w and w^T S_B w are scalar constants.
Now, Several Tricks!!!
Finally

S_W w \propto (m_1 - m_2) \Rightarrow w \propto S_W^{-1} (m_1 - m_2)    (20)

Once the data is transformed into y_i
Use a threshold y_0 ⇒ x ∈ C_1 iff y(x) ≥ y_0, or x ∈ C_2 iff y(x) < y_0.
Alternatively, maximum likelihood with a Gaussian can be used to classify the transformed data with Naive Bayes (by the Central Limit Theorem, y = w^T x is a sum of random variables and is thus approximately Gaussian).
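A minimal end-to-end sketch of Eq. (20) on synthetic Gaussian classes. The midpoint of the projected means is used as the threshold y_0; that is a simple assumed choice for illustration, not a rule derived in the lecture.

```python
import numpy as np

rng = np.random.default_rng(5)
X1 = rng.normal(loc=0.0, size=(50, 2))   # class C1 (synthetic)
X2 = rng.normal(loc=3.0, size=(50, 2))   # class C2 (synthetic)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_W, m1 - m2)        # w ∝ S_W^{-1}(m1 - m2), Eq. (20)

y0 = 0.5 * (w @ m1 + w @ m2)             # assumed threshold: midpoint of projected means
x_new = np.array([0.5, 0.2])
print("C1" if w @ x_new >= y0 else "C2") # classify with the threshold rule above
```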
Also Known as the Karhunen-Loève Transform
Setup
Consider a data set of observations {x_n} with n = 1, 2, ..., N and x_n ∈ R^d.
Goal
Project the data onto a space with dimensionality m < d (we assume m is given).
Use the 1-Dimensional Variance
Remember the sample variance

VAR(X) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})}{N - 1}    (21)

You can do the same in the case of two variables X and Y

COV(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}    (22)
Now, Define
Given the data

x_1, x_2, ..., x_N    (23)

where each x_i is a column vector.
Construct the sample mean

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (24)

Build the new, centered data

x_1 - \bar{x}, x_2 - \bar{x}, ..., x_N - \bar{x}    (25)
Build the Sample Covariance
The Covariance Matrix

S = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T    (26)

Properties
1 The ij-th entry of S is the covariance σ_ij between features i and j.
2 The ii-th entry of S is the variance σ_i^2 of feature i.
3 What else? Look at the data in a plane: centering and rotating!!!
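A sketch of Eq. (26) on made-up data; note that NumPy's np.cov with rowvar=False uses the same 1/(N-1) normalization, so the two agree.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))            # N=100 synthetic samples in R^3

Xc = X - X.mean(axis=0)                  # centered data, as in Eq. (25)
S = (Xc.T @ Xc) / (X.shape[0] - 1)       # sample covariance, Eq. (26)

print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```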
Using S to Project Data
As in Fisher
We want to project the data onto a line...
For this we use a direction u_1 with u_1^T u_1 = 1.
Question
What is the sample variance of the projected data?
Thus we have
The variance of the projected data

\frac{1}{N - 1} \sum_{i=1}^{N} \left( u_1^T x_i - u_1^T \bar{x} \right)^2 = u_1^T S u_1    (27)

Use Lagrange multipliers to maximize

u_1^T S u_1 + \lambda_1 \left( 1 - u_1^T u_1 \right)    (28)
Derive with respect to u_1
We get

S u_1 = \lambda_1 u_1    (29)

Then
u_1 is an eigenvector of S.
If we left-multiply by u_1^T,

u_1^T S u_1 = \lambda_1    (30)
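A quick numeric check of Eqs. (29)-(30) on synthetic data: for an eigenvector u_1 of S, the projected variance u_1^T S u_1 equals its eigenvalue λ_1.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
S = np.cov(X, rowvar=False)              # sample covariance of made-up data

eigvals, eigvecs = np.linalg.eigh(S)     # eigh: S is symmetric, eigenvalues ascending
u1, lam1 = eigvecs[:, -1], eigvals[-1]   # top eigenpair

print(np.isclose(u1 @ S @ u1, lam1))     # True, matching Eq. (30)
```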
Thus
The variance is maximized when

u_1^T S u_1 = \lambda_1    (31)

is set to the largest eigenvalue; u_1 is also known as the First Principal Component.
By Induction
For a projection onto an M-dimensional space, it is possible to define M eigenvectors u_1, u_2, ..., u_M of the data covariance S, corresponding to the M largest eigenvalues λ_1, λ_2, ..., λ_M, that maximize the variance of the projected data.
Computational Cost
1 Full eigenvector decomposition: O(d^3)
2 Power method: O(Md^2) (Golub and Van Loan, 1996)
3 Use the Expectation Maximization algorithm.
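A sketch contrasting the first two options on made-up data: PCA by full eigendecomposition of S, next to the power method for just the leading principal component.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
S = np.cov(X, rowvar=False)

# Full decomposition, O(d^3): keep the eigenvectors of the M largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
M = 2
U = eigvecs[:, ::-1][:, :M]               # top-M principal directions
Z = (X - X.mean(axis=0)) @ U              # projected data, shape (N, M)

# Power method for the first component: repeated multiplication by S
# converges to the eigenvector of the largest eigenvalue.
u = rng.normal(size=S.shape[0])
for _ in range(200):
    u = S @ u
    u /= np.linalg.norm(u)

print(np.isclose(abs(u @ U[:, 0]), 1.0))  # aligned with the first component
```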