Machine Learning for Data Mining
Feature Generation
Andres Mendez-Vazquez
July 16, 2015
Outline
1 Introduction
    What do we want?
2 Fisher Linear Discriminant
    The Rotation Idea
    Solution
    Scatter measure
    The Cost Function
3 Principal Component Analysis
What do we want?
What
Given a set of measurements, the goal is to discover compact and
informative representations of the obtained data.
Our Approach
We want to “squeeze” the data into a relatively small number of features,
leading to a reduction of the necessary feature space dimension.
Properties
This removes information redundancies, which are usually produced by the
measurement process itself.
What Methods will we see?
Fisher Linear Discriminant
1 Squeezing to the maximum.
2 From Many to One Dimension.
Principal Component Analysis
1 Not so much squeezing.
2 You are willing to lose some information.
Singular Value Decomposition
1 Decompose an m × n data matrix A into A = USV^T, with U and V
orthonormal matrices and S a diagonal matrix containing the singular values.
2 You can read more about it in “Singular Value Decomposition -
Dimensionality Reduction”.
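For intuition on the SVD option, here is a minimal sketch (a NumPy illustration, not part of the original slides; the data matrix A and the target dimension m are made-up placeholders) that truncates A = USV^T to its largest singular values:

```python
import numpy as np

# Hypothetical data matrix: N = 100 samples (rows), d = 10 features (columns)
A = np.random.randn(100, 10)
m = 2  # target dimensionality (illustrative choice)

# Thin SVD: A = U S V^T, singular values returned in descending order
U, S, Vt = np.linalg.svd(A, full_matrices=False)

A_reduced = U[:, :m] * S[:m]        # coordinates of the samples in the reduced space
A_approx = A_reduced @ Vt[:m, :]    # best rank-m approximation of A

print(A_reduced.shape, A_approx.shape)  # (100, 2) (100, 10)
```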
Rotation
Projecting
Projecting well-separated samples onto an arbitrary line usually produces a
confused mixture of samples from all of the classes and thus produces poor
recognition performance.
Something Notable
However, moving and rotating the line around might result in an
orientation for which the projected samples are well separated.
Fisher linear discriminant (FLD)
It is a discriminant analysis that seeks directions which are efficient for
discriminating between the classes in a binary classification problem.
Example
Example - From Left to Right the Improvement
This is actually coming from...
Classifier as
A machine for dimensionality reduction.
Initial Setup
We have:
N d-dimensional samples x_1, x_2, ..., x_N
N_i is the number of samples in class C_i for i = 1, 2.
Then, we ask for the projection of each x_i onto the line by means of

y_i = w^T x_i    (1)
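As a small illustration of Equation (1) (a NumPy sketch with made-up data, not from the slides), projecting every sample onto a direction w collapses the d-dimensional data to a single number per sample:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # N = 100 samples, d = 5 features (made-up data)
w = rng.normal(size=5)          # an arbitrary projection direction

y = X @ w                       # y_i = w^T x_i for every sample; shape (100,)
print(y.shape)
```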
Use the mean of each Class
Then
Select w such that class separation is maximized.
We then define the mean sample for each class
1 C_1 ⇒ m_1 = (1/N_1) \sum_{i=1}^{N_1} x_i
2 C_2 ⇒ m_2 = (1/N_2) \sum_{i=1}^{N_2} x_i
Ok!!! This is giving us a measure of distance
Thus, we want to maximize the distance between the projected means:

\tilde{m}_1 - \tilde{m}_2 = w^T (m_1 - m_2)    (2)

where \tilde{m}_k = w^T m_k for k = 1, 2.
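A hedged NumPy sketch of Equation (2), computing the class means and the distance between the projected means for a candidate direction w (X1, X2 and w are made-up illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(40, 2))   # samples of class C1 (made up)
X2 = rng.normal(loc=2.0, size=(60, 2))   # samples of class C2 (made up)
w = np.array([1.0, 1.0])                 # a candidate direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)   # class means in feature space
m1_tilde, m2_tilde = w @ m1, w @ m2         # projected means

# Equation (2): both ways of computing the projected mean distance agree
print(m1_tilde - m2_tilde, w @ (m1 - m2))
```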
However
We could simply seek

max_w  w^T (m_1 - m_2)
s.t.  \sum_{i=1}^{d} w_i = 1

After all
We do not care about the magnitude of w.
Example
Here, we have the problem
Fixing the Problem
To obtain good separation of the projected data
The difference between the means should be large relative to some
measure of the standard deviations for each class.
We define a SCATTER measure (based on the sample variance)

s_k^2 = \sum_{x_i \in C_k} (w^T x_i - \tilde{m}_k)^2 = \sum_{y_i = w^T x_i \in C_k} (y_i - \tilde{m}_k)^2    (3)

We then define the within-class variance for the whole data as

s_1^2 + s_2^2    (4)
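Equation (3) in code form: a small sketch (reusing the same kind of made-up two-class data as before) of the projected within-class scatter s_1^2 + s_2^2:

```python
import numpy as np

def projected_scatter(X, w):
    """Scatter s_k^2 of one class after projection onto w (Equation 3)."""
    y = X @ w              # projected samples of this class
    m_tilde = y.mean()     # projected class mean
    return np.sum((y - m_tilde) ** 2)

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(40, 2))   # class C1 (made up)
X2 = rng.normal(loc=2.0, size=(60, 2))   # class C2 (made up)
w = np.array([1.0, 1.0])

within_class = projected_scatter(X1, w) + projected_scatter(X2, w)   # Equation (4)
print(within_class)
```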
Finally, a Cost Function
The between-class variance

(\tilde{m}_1 - \tilde{m}_2)^2    (5)

The Fisher criterion

J = \frac{\text{between-class variance}}{\text{within-class variance}}    (6)

Finally

J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{s_1^2 + s_2^2}    (7)
We use a transformation to simplify our life
First

J(w) = \frac{(w^T m_1 - w^T m_2)^2}{\sum_{y_i = w^T x_i \in C_1} (y_i - \tilde{m}_1)^2 + \sum_{y_i = w^T x_i \in C_2} (y_i - \tilde{m}_2)^2}    (8)

Second

J(w) = \frac{(w^T m_1 - w^T m_2)(w^T m_1 - w^T m_2)^T}{\sum_{y_i = w^T x_i \in C_1} (w^T x_i - \tilde{m}_1)(w^T x_i - \tilde{m}_1)^T + \sum_{y_i = w^T x_i \in C_2} (w^T x_i - \tilde{m}_2)(w^T x_i - \tilde{m}_2)^T}    (9)

Third

J(w) = \frac{[w^T (m_1 - m_2)][w^T (m_1 - m_2)]^T}{\sum_{x_i \in C_1} [w^T (x_i - m_1)][w^T (x_i - m_1)]^T + \sum_{x_i \in C_2} [w^T (x_i - m_2)][w^T (x_i - m_2)]^T}    (10)
Transformation
Fourth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{\sum_{x_i \in C_1} w^T (x_i - m_1)(x_i - m_1)^T w + \sum_{x_i \in C_2} w^T (x_i - m_2)(x_i - m_2)^T w}    (11)

Fifth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{w^T \left[ \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T \right] w}    (12)

Now Rename

J(w) = \frac{w^T S_B w}{w^T S_W w}    (13)
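The renaming in Equation (13) translates directly into code. A minimal sketch (NumPy, made-up data, not the slides' own implementation) that builds S_B and S_W and evaluates J(w) for an arbitrary direction:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(40, 2))   # class C1 (made up)
X2 = rng.normal(loc=2.0, size=(60, 2))   # class C2 (made up)
w = np.array([1.0, 1.0])                 # an arbitrary direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
d = m1 - m2

S_B = np.outer(d, d)                                        # between-class scatter matrix
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)     # within-class scatter matrix

J = (w @ S_B @ w) / (w @ S_W @ w)                           # Equation (13)
print(J)
```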
Derive with respect to w
Thus

\frac{dJ(w)}{dw} = \frac{d}{dw} \left[ (w^T S_B w)(w^T S_W w)^{-1} \right] = 0    (14)

Then

\frac{dJ(w)}{dw} = (S_B w + S_B^T w)(w^T S_W w)^{-1} - (w^T S_B w)(w^T S_W w)^{-2}(S_W w + S_W^T w) = 0    (15)

Now, because of the symmetry of S_B and S_W,

\frac{dJ(w)}{dw} = \frac{S_B w}{w^T S_W w} - \frac{(w^T S_B w) S_W w}{(w^T S_W w)^2} = 0    (16)
Derive with respect to w
Thus

\frac{dJ(w)}{dw} = \frac{S_B w}{w^T S_W w} - \frac{(w^T S_B w) S_W w}{(w^T S_W w)^2} = 0    (17)

Then

(w^T S_W w) S_B w = (w^T S_B w) S_W w    (18)
Now, Several Tricks!!!
First

S_B w = (m_1 - m_2)(m_1 - m_2)^T w = α (m_1 - m_2)    (19)

where α = (m_1 - m_2)^T w is a simple scalar constant.
It means that S_B w is always in the direction of m_1 - m_2!!!
In addition
w^T S_W w and w^T S_B w are scalar constants.
Now, Several Tricks!!!
Finally

S_W w ∝ (m_1 - m_2) ⇒ w ∝ S_W^{-1} (m_1 - m_2)    (20)

Once the data is transformed into y_i
Use a threshold y_0 ⇒ x ∈ C_1 iff y(x) ≥ y_0, and x ∈ C_2 iff y(x) < y_0.
Or maximum likelihood with a Gaussian can be used to classify the new
transformed data with Naive Bayes (by the Central Limit Theorem, y = w^T x
is a sum of random variables).
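Putting Equation (20) to work: a hedged sketch (NumPy; the data and the midpoint threshold are illustrative choices, not prescribed by the slides) of the Fisher direction together with a simple threshold classifier:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(loc=0.0, size=(40, 2))   # class C1 (made up)
X2 = rng.normal(loc=2.0, size=(60, 2))   # class C2 (made up)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Equation (20): w ∝ S_W^{-1} (m1 - m2); solve a linear system instead of inverting
w = np.linalg.solve(S_W, m1 - m2)

# One simple (illustrative) threshold: the midpoint of the projected class means
y0 = 0.5 * (w @ m1 + w @ m2)

def classify(x):
    """Assign C1 if y(x) = w^T x >= y0, otherwise C2 (threshold rule from the slide)."""
    return "C1" if w @ x >= y0 else "C2"

print(classify(X1[0]), classify(X2[0]))
```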
Please
Your reading material on the multiclass case:
Section 4.1.6, “Fisher’s discriminant for multiple classes,” in “Pattern
Recognition and Machine Learning” by Bishop.
Principal Component Analysis: Also Known as the Karhunen-Loève Transform
Setup
Consider a data set of observations {x_n} with n = 1, 2, ..., N and
x_n ∈ R^d.
Goal
Project the data onto a space with dimensionality m < d (we assume m is
given).
Use the 1-Dimensional Variance
Remember the Sample Variance

VAR(X) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})}{N - 1}    (21)

You can do the same in the case of two variables X and Y

COV(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}    (22)
Now, Define
Given the data

x_1, x_2, ..., x_N    (23)

where each x_i is a column vector.
Construct the sample mean

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i    (24)

Build new (centered) data

x_1 - \bar{x}, x_2 - \bar{x}, ..., x_N - \bar{x}    (25)
Build the Sample Covariance Matrix
The Covariance Matrix

S = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T    (26)

Properties
1 The ij-th entry of S is the covariance σ_ij between features i and j.
2 The ii-th entry of S is the variance σ_i^2 of feature i.
3 What else? Look at a plane: centering and rotating!!!
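Equation (26) in NumPy, as a quick sanity check against np.cov (a sketch with made-up data, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))           # N = 100 observations, d = 3 features (made up)

x_bar = X.mean(axis=0)                  # sample mean, Equation (24)
Xc = X - x_bar                          # centered data, Equation (25)
S = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance, Equation (26)

print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```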
Using S to Project Data
As in Fisher
We want to project the data onto a line...
For this we use a direction u_1 with u_1^T u_1 = 1.
Question
What is the sample variance of the projected data?
Thus we have
Variance of the projected data

\frac{1}{N - 1} \sum_{i=1}^{N} (u_1^T x_i - u_1^T \bar{x})^2 = u_1^T S u_1    (27)

Use Lagrange multipliers to maximize

u_1^T S u_1 + λ_1 (1 - u_1^T u_1)    (28)
Derive by u_1
We get

S u_1 = λ_1 u_1    (29)

Then
u_1 is an eigenvector of S.
If we left-multiply by u_1^T

u_1^T S u_1 = λ_1    (30)
Thus
Variance will be maximal when

u_1^T S u_1 = λ_1    (31)

is set to the largest eigenvalue; the corresponding direction u_1 is known as
the First Principal Component.
By Induction
For an M-dimensional projection space it is possible to define M eigenvectors
u_1, u_2, ..., u_M of the data covariance S, corresponding to the M largest
eigenvalues λ_1, λ_2, ..., λ_M, that maximize the variance of the projected data.
Computational Cost
1 Full eigenvector decomposition: O(d^3)
2 Power Method: O(M d^2) (Golub and Van Loan, 1996)
3 Use the Expectation Maximization algorithm
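A minimal PCA sketch tying the pieces together (NumPy; the data and the choice M = 2 are illustrative assumptions): take the eigenvectors of S with the largest eigenvalues and project the centered data onto them:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))            # made-up data: N = 100, d = 5
M = 2                                    # target dimensionality (illustrative)

x_bar = X.mean(axis=0)
Xc = X - x_bar
S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance, Equation (26)

# eigh returns eigenvalues in ascending order for the symmetric matrix S
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # sort descending
U = eigvecs[:, order[:M]]                # top-M principal directions u_1, ..., u_M

Z = Xc @ U                               # projected data with maximal retained variance
print(eigvals[order[:M]], Z.shape)       # top eigenvalues = variances along u_1, u_2
```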
Example
Figures from Bishop.
More Related Content

What's hot

18 Machine Learning Radial Basis Function Networks Forward Heuristics
18 Machine Learning Radial Basis Function Networks Forward Heuristics18 Machine Learning Radial Basis Function Networks Forward Heuristics
18 Machine Learning Radial Basis Function Networks Forward HeuristicsAndres Mendez-Vazquez
 
17 Machine Learning Radial Basis Functions
17 Machine Learning Radial Basis Functions17 Machine Learning Radial Basis Functions
17 Machine Learning Radial Basis FunctionsAndres Mendez-Vazquez
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector MachinesEdgar Marca
 
Machine learning in science and industry — day 3
Machine learning in science and industry — day 3Machine learning in science and industry — day 3
Machine learning in science and industry — day 3arogozhnikov
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackarogozhnikov
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2arogozhnikov
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1arogozhnikov
 
Machine learning in science and industry — day 4
Machine learning in science and industry — day 4Machine learning in science and industry — day 4
Machine learning in science and industry — day 4arogozhnikov
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...Masahiro Suzuki
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine LearningVARUN KUMAR
 
The Kernel Trick
The Kernel TrickThe Kernel Trick
The Kernel TrickEdgar Marca
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector MachineLucas Xu
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural NetworksMasahiro Suzuki
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic netVivian S. Zhang
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machinesUjjawal
 
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...Andres Mendez-Vazquez
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeGilles Louppe
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector MachinesDongseo University
 

What's hot (20)

18 Machine Learning Radial Basis Function Networks Forward Heuristics
18 Machine Learning Radial Basis Function Networks Forward Heuristics18 Machine Learning Radial Basis Function Networks Forward Heuristics
18 Machine Learning Radial Basis Function Networks Forward Heuristics
 
17 Machine Learning Radial Basis Functions
17 Machine Learning Radial Basis Functions17 Machine Learning Radial Basis Functions
17 Machine Learning Radial Basis Functions
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
Machine learning in science and industry — day 3
Machine learning in science and industry — day 3Machine learning in science and industry — day 3
Machine learning in science and industry — day 3
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic track
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Machine learning in science and industry — day 4
Machine learning in science and industry — day 4Machine learning in science and industry — day 4
Machine learning in science and industry — day 4
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
 
The Kernel Trick
The Kernel TrickThe Kernel Trick
The Kernel Trick
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic net
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
Svm V SVC
Svm V SVCSvm V SVC
Svm V SVC
 
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
Artificial Intelligence 06.3 Bayesian Networks - Belief Propagation - Junctio...
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
 

Similar to 23 Machine Learning Feature Generation

05 Machine Learning - Supervised Fisher Classifier
05 Machine Learning - Supervised Fisher Classifier 05 Machine Learning - Supervised Fisher Classifier
05 Machine Learning - Supervised Fisher Classifier Andres Mendez-Vazquez
 
Preparation Data Structures 11 graphs
Preparation Data Structures 11 graphsPreparation Data Structures 11 graphs
Preparation Data Structures 11 graphsAndres Mendez-Vazquez
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variationsAndres Mendez-Vazquez
 
Standard deviation quartile deviation
Standard deviation  quartile deviationStandard deviation  quartile deviation
Standard deviation quartile deviationRekha Yadav
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx36rajneekant
 
LESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfLESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfICOMICOM4
 
Statistics-Measures of dispersions
Statistics-Measures of dispersionsStatistics-Measures of dispersions
Statistics-Measures of dispersionsCapricorn
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNNLin JiaMing
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
Introduction to Machine Learning Lectures
Introduction to Machine Learning LecturesIntroduction to Machine Learning Lectures
Introduction to Machine Learning Lecturesssuserfece35
 
26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-MeansAndres Mendez-Vazquez
 
Pattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifierPattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifierNayem Nayem
 
Unit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptxUnit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptxavinashBajpayee1
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...IRJET Journal
 
Multidimension Scaling and Isomap
Multidimension Scaling and IsomapMultidimension Scaling and Isomap
Multidimension Scaling and IsomapCheng-Shiang Li
 

Similar to 23 Machine Learning Feature Generation (20)

05 Machine Learning - Supervised Fisher Classifier
05 Machine Learning - Supervised Fisher Classifier 05 Machine Learning - Supervised Fisher Classifier
05 Machine Learning - Supervised Fisher Classifier
 
Preparation Data Structures 11 graphs
Preparation Data Structures 11 graphsPreparation Data Structures 11 graphs
Preparation Data Structures 11 graphs
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
Standard deviation quartile deviation
Standard deviation  quartile deviationStandard deviation  quartile deviation
Standard deviation quartile deviation
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx
 
Lesson 3
Lesson 3Lesson 3
Lesson 3
 
LESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfLESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdf
 
Statistics-Measures of dispersions
Statistics-Measures of dispersionsStatistics-Measures of dispersions
Statistics-Measures of dispersions
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNN
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Teknik Simulasi
Teknik SimulasiTeknik Simulasi
Teknik Simulasi
 
Input analysis
Input analysisInput analysis
Input analysis
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Introduction to Machine Learning Lectures
Introduction to Machine Learning LecturesIntroduction to Machine Learning Lectures
Introduction to Machine Learning Lectures
 
26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means
 
Pattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifierPattern Recognition - Designing a minimum distance class mean classifier
Pattern Recognition - Designing a minimum distance class mean classifier
 
Unit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptxUnit-1 Introduction and Mathematical Preliminaries.pptx
Unit-1 Introduction and Mathematical Preliminaries.pptx
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
 
Multidimension Scaling and Isomap
Multidimension Scaling and IsomapMultidimension Scaling and Isomap
Multidimension Scaling and Isomap
 

More from Andres Mendez-Vazquez

01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectorsAndres Mendez-Vazquez
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issuesAndres Mendez-Vazquez
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiationAndres Mendez-Vazquez
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep LearningAndres Mendez-Vazquez
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusAndres Mendez-Vazquez
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusAndres Mendez-Vazquez
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesAndres Mendez-Vazquez
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusAndres Mendez-Vazquez
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning SyllabusAndres Mendez-Vazquez
 

More from Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems Syllabus
 
Introduction Machine Learning Syllabus
Introduction Machine Learning SyllabusIntroduction Machine Learning Syllabus
Introduction Machine Learning Syllabus
 

Recently uploaded

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 

Recently uploaded (20)

Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 

23 Machine Learning Feature Generation

  • 1. Machine Learning for Data Mining Feature Generation Andres Mendez-Vazquez July 16, 2015 1 / 35
  • 2. Images/cinvestav- Outline 1 Introduction What do we want? 2 Fisher Linear Discriminant The Rotation Idea Solution Scatter measure The Cost Function 3 Principal Component Analysis 2 / 35
  • 3. Images/cinvestav- Outline 1 Introduction What do we want? 2 Fisher Linear Discriminant The Rotation Idea Solution Scatter measure The Cost Function 3 Principal Component Analysis 3 / 35
  • 4. Images/cinvestav- What do we want? What Given a set of measurements, the goal is to discover compact and informative representations of the obtained data. Our Approach We want to “squeeze” in a relatively small number of features, leading to a reduction of the necessary feature space dimension. Properties Thus removing information redundancies - Usually produced and the measurement. 4 / 35
  • 5. Images/cinvestav- What do we want? What Given a set of measurements, the goal is to discover compact and informative representations of the obtained data. Our Approach We want to “squeeze” in a relatively small number of features, leading to a reduction of the necessary feature space dimension. Properties Thus removing information redundancies - Usually produced and the measurement. 4 / 35
  • 6. Images/cinvestav- What do we want? What Given a set of measurements, the goal is to discover compact and informative representations of the obtained data. Our Approach We want to “squeeze” in a relatively small number of features, leading to a reduction of the necessary feature space dimension. Properties Thus removing information redundancies - Usually produced and the measurement. 4 / 35
  • 7. Images/cinvestav- What Methods we will see? Fisher Linear Discriminant 1 Squeezing to the maximum. 2 From Many to One Dimension Principal Component Analysis 1 Not so much squeezing 2 You are willing to lose some information Singular Value Decomposition 1 Decompose a m × n data matrix A into A = USV T , U and V orthonormal matrices and S contains the eigenvalues. 2 You can read more of it on “Singular Value Decomposition - Dimensionality Reduction” 5 / 35
  • 8. Images/cinvestav- What Methods we will see? Fisher Linear Discriminant 1 Squeezing to the maximum. 2 From Many to One Dimension Principal Component Analysis 1 Not so much squeezing 2 You are willing to lose some information Singular Value Decomposition 1 Decompose a m × n data matrix A into A = USV T , U and V orthonormal matrices and S contains the eigenvalues. 2 You can read more of it on “Singular Value Decomposition - Dimensionality Reduction” 5 / 35
  • 9. Images/cinvestav- What Methods we will see? Fisher Linear Discriminant 1 Squeezing to the maximum. 2 From Many to One Dimension Principal Component Analysis 1 Not so much squeezing 2 You are willing to lose some information Singular Value Decomposition 1 Decompose a m × n data matrix A into A = USV T , U and V orthonormal matrices and S contains the eigenvalues. 2 You can read more of it on “Singular Value Decomposition - Dimensionality Reduction” 5 / 35
  • 10. Images/cinvestav- Outline 1 Introduction What do we want? 2 Fisher Linear Discriminant The Rotation Idea Solution Scatter measure The Cost Function 3 Principal Component Analysis 6 / 35
  • 11. Images/cinvestav- Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 7 / 35
  • 12. Images/cinvestav- Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 7 / 35
  • 13. Images/cinvestav- Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 7 / 35
  • 14. Images/cinvestav- Example Example - From Left to Right the Improvement 8 / 35
  • 15. Images/cinvestav- This is actually comming from... Classifier as A machine for dimensionality reduction. Initial Setup We have: N d-dimensional samples x1, x2, ..., xN Ni is the number of samples in class Ci for i=1,2. Then, we ask for the projection of each xi into the line by means of yi = wT xi (1) 9 / 35
  • 16. Images/cinvestav- This is actually comming from... Classifier as A machine for dimensionality reduction. Initial Setup We have: N d-dimensional samples x1, x2, ..., xN Ni is the number of samples in class Ci for i=1,2. Then, we ask for the projection of each xi into the line by means of yi = wT xi (1) 9 / 35
  • 17. Images/cinvestav- This is actually comming from... Classifier as A machine for dimensionality reduction. Initial Setup We have: N d-dimensional samples x1, x2, ..., xN Ni is the number of samples in class Ci for i=1,2. Then, we ask for the projection of each xi into the line by means of yi = wT xi (1) 9 / 35
  • 18. Images/cinvestav- Outline 1 Introduction What do we want? 2 Fisher Linear Discriminant The Rotation Idea Solution Scatter measure The Cost Function 3 Principal Component Analysis 10 / 35
  • 19. Images/cinvestav- Use the mean of each Class Then Select w such that class separation is maximized We then define the mean sample for ecah class 1 C1 ⇒ m1 = 1 N1 N1 i=1 xi 2 C2 ⇒ m2 = 1 N2 N2 i=1 xi Ok!!! This is giving us a measure of distance Thus, we want to maximize the distance the projected means: m1 − m2 = wT (m1 − m2) (2) where mk = wT mk for k = 1, 2. 11 / 35
  • 20. Images/cinvestav- Use the mean of each Class Then Select w such that class separation is maximized We then define the mean sample for ecah class 1 C1 ⇒ m1 = 1 N1 N1 i=1 xi 2 C2 ⇒ m2 = 1 N2 N2 i=1 xi Ok!!! This is giving us a measure of distance Thus, we want to maximize the distance the projected means: m1 − m2 = wT (m1 − m2) (2) where mk = wT mk for k = 1, 2. 11 / 35
  • 21. Images/cinvestav- Use the mean of each Class Then Select w such that class separation is maximized We then define the mean sample for ecah class 1 C1 ⇒ m1 = 1 N1 N1 i=1 xi 2 C2 ⇒ m2 = 1 N2 N2 i=1 xi Ok!!! This is giving us a measure of distance Thus, we want to maximize the distance the projected means: m1 − m2 = wT (m1 − m2) (2) where mk = wT mk for k = 1, 2. 11 / 35
  • 22. Images/cinvestav- However We could simply seek maxwT (m1 − m2) s.t. d i=1 wi = 1 After all We do not care about the magnitude of w. 12 / 35
  • 23. Images/cinvestav- However We could simply seek maxwT (m1 − m2) s.t. d i=1 wi = 1 After all We do not care about the magnitude of w. 12 / 35
  • 25. Images/cinvestav- Outline 1 Introduction What do we want? 2 Fisher Linear Discriminant The Rotation Idea Solution Scatter measure The Cost Function 3 Principal Component Analysis 14 / 35
  • 26. Images/cinvestav- Fixing the Problem To obtain good separation of the projected data The difference between the means should be large relative to some measure of the standard deviations for each class. We define a SCATTER measure (Based in the Sample Variance) s2 k = xi∈Ck wT xi − mk 2 = yi=wT xi∈Ck (yi − mk)2 (3) We define then within-class variance for the whole data s2 1 + s2 2 (4) 15 / 35
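Under the same assumptions, Equations (3) and (4) translate into a short scatter computation; note that the mean of the projected class samples equals the projected class mean:

```python
import numpy as np

def within_class_scatter(X1, X2, w):
    """Equations (3)-(4): per-class scatter of the projections and their sum."""
    scatters = []
    for Xk in (X1, X2):
        yk = Xk @ w                        # projected samples of class k
        mk_tilde = yk.mean()               # projected class mean
        scatters.append(np.sum((yk - mk_tilde) ** 2))
    s1_sq, s2_sq = scatters
    return s1_sq, s2_sq, s1_sq + s2_sq     # s1^2, s2^2, within-class variance
```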
  • 29. Outline: 1 Introduction (What do we want?); 2 Fisher Linear Discriminant (The Rotation Idea, Solution, Scatter measure, The Cost Function); 3 Principal Component Analysis.
  • 30. Finally, a Cost Function. The between-class variance: $(\tilde{m}_1 - \tilde{m}_2)^2$ (5). The Fisher criterion: $\frac{\text{between-class variance}}{\text{within-class variance}}$ (6). Finally: $J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{s_1^2 + s_2^2}$ (7).
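Equation (7) then becomes a single score that can be compared across candidate directions w; a sketch under the same assumptions (the helper name is hypothetical):

```python
import numpy as np

def fisher_criterion(w, X1, X2):
    """Equation (7): J(w) = (m1~ - m2~)^2 / (s1^2 + s2^2) for direction w."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean()) ** 2
    within = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    return between / within
```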
  • 33. We use a transformation to simplify our life. First: $J(w) = \frac{(w^T m_1 - w^T m_2)^2}{\sum_{y_i = w^T x_i \in C_1} (y_i - \tilde{m}_1)^2 + \sum_{y_i = w^T x_i \in C_2} (y_i - \tilde{m}_2)^2}$ (8). Second: $J(w) = \frac{(w^T m_1 - w^T m_2)(w^T m_1 - w^T m_2)^T}{\sum_{x_i \in C_1} (w^T x_i - \tilde{m}_1)(w^T x_i - \tilde{m}_1)^T + \sum_{x_i \in C_2} (w^T x_i - \tilde{m}_2)(w^T x_i - \tilde{m}_2)^T}$ (9). Third: $J(w) = \frac{[w^T (m_1 - m_2)][w^T (m_1 - m_2)]^T}{\sum_{x_i \in C_1} [w^T (x_i - m_1)][w^T (x_i - m_1)]^T + \sum_{x_i \in C_2} [w^T (x_i - m_2)][w^T (x_i - m_2)]^T}$ (10).
  • 36. Transformation. Fourth: $J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{\sum_{x_i \in C_1} w^T (x_i - m_1)(x_i - m_1)^T w + \sum_{x_i \in C_2} w^T (x_i - m_2)(x_i - m_2)^T w}$ (11). Fifth: $J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{w^T \left[ \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T \right] w}$ (12). Now rename: $J(w) = \frac{w^T S_B w}{w^T S_w w}$ (13).
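The renamed matrices in Equation (13) can be formed explicitly from the class samples; a sketch, again with samples as rows and names of my own choosing:

```python
import numpy as np

def scatter_matrices(X1, X2):
    """S_B and S_w as they appear in Equation (13)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    diff = (m1 - m2).reshape(-1, 1)
    S_B = diff @ diff.T                                        # between-class scatter
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)    # within-class scatter
    return S_B, S_w

def J(w, S_B, S_w):
    """Equation (13): generalized Rayleigh quotient."""
    return (w @ S_B @ w) / (w @ S_w @ w)
```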
  • 39. Derive with respect to w. Thus $\frac{dJ(w)}{dw} = \frac{d\left[w^T S_B w \, (w^T S_w w)^{-1}\right]}{dw} = 0$ (14). Then $\frac{dJ(w)}{dw} = (S_B w + S_B^T w)(w^T S_w w)^{-1} - w^T S_B w \, (w^T S_w w)^{-2}(S_w w + S_w^T w) = 0$ (15). Now, because of the symmetry of $S_B$ and $S_w$: $\frac{dJ(w)}{dw} = \frac{S_B w \, (w^T S_w w) - (w^T S_B w) \, S_w w}{(w^T S_w w)^2} = 0$ (16).
  • 42. Derive with respect to w. Thus $\frac{dJ(w)}{dw} = \frac{S_B w \, (w^T S_w w) - (w^T S_B w) \, S_w w}{(w^T S_w w)^2} = 0$ (17). Then $(w^T S_w w) \, S_B w = (w^T S_B w) \, S_w w$ (18).
  • 44. Now, Several Tricks!!! First: $S_B w = (m_1 - m_2)(m_1 - m_2)^T w = \alpha (m_1 - m_2)$ (19), where $\alpha = (m_1 - m_2)^T w$ is a simple scalar constant. It means that $S_B w$ always points in the direction $m_1 - m_2$!!! In addition, $w^T S_w w$ and $w^T S_B w$ are scalar constants.
  • 47. Now, Several Tricks!!! Finally: $S_w w \propto (m_1 - m_2) \Rightarrow w \propto S_w^{-1}(m_1 - m_2)$ (20). Once the data is transformed into the $y_i$: use a threshold $y_0 \Rightarrow x \in C_1$ iff $y(x) \geq y_0$, or $x \in C_2$ iff $y(x) < y_0$. Alternatively, maximum likelihood (ML) with a Gaussian can be used to classify the transformed data via Naive Bayes (by the Central Limit Theorem, $y = w^T x$ is a sum of random variables and thus approximately Gaussian).
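Putting it together, a minimal end-to-end sketch of Equation (20) plus a threshold rule; the slide only says "use a threshold y0", so taking y0 at the midpoint of the projected means is an assumption made for illustration:

```python
import numpy as np

def fld_fit(X1, X2):
    """Fisher direction w proportional to S_w^{-1}(m1 - m2), Equation (20)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)     # solve S_w w = (m1 - m2), no explicit inverse
    y0 = 0.5 * (w @ m1 + w @ m2)          # assumed midpoint threshold
    return w, y0

def fld_predict(x, w, y0):
    """x in C1 iff y(x) = w^T x >= y0, else C2 (the rule on this slide)."""
    return 1 if w @ x >= y0 else 2
```

With the toy arrays X1, X2 from the first sketch, calling fld_predict on a point near the C1 mean should return 1, since the Fisher direction guarantees that C1's projected mean lies above the midpoint threshold.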
  • 50. Please see your reading material on the multiclass case: Section 4.1.6, "Fisher's discriminant for multiple classes", in "Pattern Recognition" by Bishop.
  • 51. Principal Component Analysis, also known as the Karhunen-Loeve Transform. Setup: consider a data set of observations $\{x_n\}$, $n = 1, 2, ..., N$, with $x_n \in \mathbb{R}^d$. Goal: project the data onto a space of dimensionality $m < d$ (we assume m is given).
  • 53. Use the 1-Dimensional Variance. Remember the sample variance: $VAR(X) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})}{N - 1}$ (21). You can do the same in the case of two variables X and Y: $COV(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$ (22).
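Equations (21) and (22) can be checked against NumPy's built-in routines, which also use the N - 1 denominator by default; the small vectors here are made up for the check:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

var_x  = np.sum((x - x.mean()) ** 2) / (len(x) - 1)              # Equation (21)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)  # Equation (22)

assert np.isclose(var_x, np.var(x, ddof=1))
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
```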
  • 55. Now, Define. Given the data $x_1, x_2, ..., x_N$ (23), where each $x_i$ is a column vector, construct the sample mean $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ (24), and build the centered data $x_1 - \bar{x}, x_2 - \bar{x}, ..., x_N - \bar{x}$ (25).
  • 58. Build the Sample Covariance. The covariance matrix: $S = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$ (26). Properties: 1 the ij-th entry of S is $\sigma_{ij}^2$, the covariance between features i and j; 2 the ii-th entry of S is $\sigma_{ii}^2$, the variance of feature i; 3 what else? Look at a plane: centering and rotating!!!
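A sketch of Equation (26) built from the centered data of (25); it assumes the observations are stacked as the rows of an (N, d) array:

```python
import numpy as np

def sample_covariance(X):
    """Equation (26): S = 1/(N-1) * sum of (x_i - x_bar)(x_i - x_bar)^T."""
    x_bar = X.mean(axis=0)             # Equation (24)
    Xc = X - x_bar                     # centered data, Equation (25)
    return (Xc.T @ Xc) / (X.shape[0] - 1)

# Sanity check against NumPy's own convention (rows = observations)
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
assert np.allclose(sample_covariance(X_toy), np.cov(X_toy, rowvar=False))
```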
  • 60. Using S to Project Data. As in Fisher, we want to project the data onto a line... For this we use a direction $u_1$ with $u_1^T u_1 = 1$. Question: what is the sample variance of the projected data?
  • 63. Thus we have the variance of the projected data: $\frac{1}{N-1}\sum_{i=1}^{N} \left(u_1^T x_i - u_1^T \bar{x}\right)^2 = u_1^T S u_1$ (27). Use Lagrange multipliers to maximize $u_1^T S u_1 + \lambda_1 \left(1 - u_1^T u_1\right)$ (28).
  • 65. Differentiate with respect to $u_1$. We get $S u_1 = \lambda_1 u_1$ (29); then $u_1$ is an eigenvector of S. If we left-multiply by $u_1^T$: $u_1^T S u_1 = \lambda_1$ (30).
  • 68. Thus the variance $u_1^T S u_1 = \lambda_1$ (31) is maximal when $\lambda_1$ is the largest eigenvalue of S; the corresponding $u_1$ is also known as the first principal component. By induction, for an M-dimensional projection space it is possible to define M eigenvectors $u_1, u_2, ..., u_M$ of the data covariance S, corresponding to $\lambda_1, \lambda_2, ..., \lambda_M$, that maximize the variance of the projected data. Computational cost: 1 full eigenvector decomposition, $O(d^3)$; 2 the power method, $O(Md^2)$ (Golub and Van Loan, 1996); 3 use of the Expectation-Maximization algorithm.
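Finally, a minimal sketch of the full-decomposition route: eigendecompose S, keep the M eigenvectors with the largest eigenvalues, and project the centered data onto them. Function and variable names are my own; the power-method and EM alternatives mentioned on the slide are not shown here:

```python
import numpy as np

def pca_fit(X, M):
    """Top-M principal components of X (rows = observations); a sketch."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = (Xc.T @ Xc) / (X.shape[0] - 1)         # sample covariance, Equation (26)
    eigvals, eigvecs = np.linalg.eigh(S)       # S is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:M]      # indices of the M largest eigenvalues
    return x_bar, eigvecs[:, order], eigvals[order]

# Usage: project onto the first principal component (M = 1)
X_toy = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.8], [4.0, 4.2]])
x_bar, U, lam = pca_fit(X_toy, M=1)
Z = (X_toy - x_bar) @ U                        # projected data, shape (4, 1)
```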