Machine Learning for Data Mining
Fisher Linear Discriminant
Andres Mendez-Vazquez
May 26, 2015
Outline

1 Introduction
    History
    The Rotation Idea
    Classifiers as Machines for dimensionality reduction
2 Solution
    Use the mean of each Class
    Scatter measure
    The Cost Function
    A Transformation for simplification and defining the cost function
3 Where is this used?
    Applications
4 Relation with Least Squared Error
    What?
Rotation

Projecting
Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes, and thus produces poor recognition performance.

Something Notable
However, moving and rotating the line around might result in an orientation for which the projected samples are well separated.

Fisher linear discriminant (FLD)
It is a discriminant analysis that seeks directions that are efficient for discriminating between the two classes of a binary classification problem.
Example
Example - From Left to Right the Improvement
This is actually coming from...

Classifier as
A machine for dimensionality reduction.

Initial Setup
We have:
N d-dimensional samples x_1, x_2, ..., x_N.
N_i is the number of samples in class C_i, for i = 1, 2.

Then, we ask for the projection of each x_i onto the line by means of

y_i = w^T x_i    (1)
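Not part of the original slides: a minimal NumPy sketch of the projection in Eq. (1). The two-class toy data and the direction w are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic two-class data: N = 200 samples in d = 2 dimensions
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))   # samples of class C1
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))   # samples of class C2
X = np.vstack([X1, X2])                           # all N samples, shape (N, d)

# Eq. (1): project every sample onto the line defined by w, y_i = w^T x_i
w = np.array([1.0, 0.5])   # an arbitrary direction; FLD will later choose a better one
y = X @ w                  # shape (N,): one scalar per sample, i.e. d dimensions -> 1
```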
Use the mean of each Class

Then
Select w such that class separation is maximized.

We then define the mean sample for each class:

1 C_1 ⇒ m_1 = \frac{1}{N_1} \sum_{x_i \in C_1} x_i
2 C_2 ⇒ m_2 = \frac{1}{N_2} \sum_{x_i \in C_2} x_i

Ok!!! This is giving us a measure of distance
Thus, we want to maximize the distance between the projected means:

\tilde{m}_1 - \tilde{m}_2 = w^T (m_1 - m_2)    (2)

where \tilde{m}_k = w^T m_k for k = 1, 2.
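Continuing the same assumed toy data, a short sketch of the class means and of the identity in Eq. (2):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))   # class C1
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))   # class C2
w = np.array([1.0, 0.5])

# Mean sample of each class: m_k = (1/N_k) * sum of the x_i in C_k
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Eq. (2): the distance between the projected means equals w^T (m1 - m2)
proj_mean_1 = w @ m1   # m~_1 = w^T m_1
proj_mean_2 = w @ m2   # m~_2 = w^T m_2
assert np.isclose(proj_mean_1 - proj_mean_2, w @ (m1 - m2))
```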
However

We could simply seek

\max_{w} \; w^T (m_1 - m_2)
\text{s.t.} \; \sum_{i=1}^{d} w_i = 1

After all
We do not care about the magnitude of w.
Example
Here, we have the problem... The Scattering!!!
Fixing the Problem

To obtain good separation of the projected data
The difference between the means should be large relative to some measure of the standard deviations for each class.

We define a SCATTER measure (based on the sample variance)

s_k^2 = \sum_{x_i \in C_k} \left( w^T x_i - \tilde{m}_k \right)^2 = \sum_{y_i = w^T x_i \in C_k} \left( y_i - \tilde{m}_k \right)^2    (3)

We then define the within-class variance for the whole data:

s_1^2 + s_2^2    (4)
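A sketch of the scatter measure of Eq. (3) on the same assumed toy data; the within-class variance of Eq. (4) is just the sum of the two scatters:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))
w = np.array([1.0, 0.5])

def scatter(Xk: np.ndarray, w: np.ndarray) -> float:
    """Eq. (3): s_k^2 = sum over C_k of (w^T x_i - m~_k)^2, an unnormalized variance."""
    y = Xk @ w                      # projected samples of class k
    return float(np.sum((y - y.mean()) ** 2))

s1_sq, s2_sq = scatter(X1, w), scatter(X2, w)
within_class = s1_sq + s2_sq        # Eq. (4): within-class variance of the projections
```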
Finally, a Cost Function

The between-class variance

\left( \tilde{m}_1 - \tilde{m}_2 \right)^2    (5)

The Fisher criterion

\frac{\text{between-class variance}}{\text{within-class variance}}    (6)

Finally

J(w) = \frac{\left( \tilde{m}_1 - \tilde{m}_2 \right)^2}{s_1^2 + s_2^2}    (7)
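Putting Eqs. (5)-(7) together, a sketch that evaluates the Fisher criterion for two candidate directions; both directions are arbitrary assumptions, and J(w) should be larger for the better-separating one:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))

def fisher_criterion(w: np.ndarray) -> float:
    """Eq. (7): J(w) = (m~1 - m~2)^2 / (s1^2 + s2^2), computed on the projections."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean()) ** 2                              # Eq. (5)
    within = np.sum((y1 - y1.mean())**2) + np.sum((y2 - y2.mean())**2)  # Eq. (4)
    return float(between / within)

# A direction roughly along m1 - m2 separates better than an orthogonal-ish one
print(fisher_criterion(np.array([1.0, 0.5])))
print(fisher_criterion(np.array([0.0, 1.0])))
```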
We use a transformation to simplify our life

First

J(w) = \frac{\left( w^T m_1 - w^T m_2 \right)^2}{\sum_{y_i = w^T x_i \in C_1} \left( y_i - \tilde{m}_1 \right)^2 + \sum_{y_i = w^T x_i \in C_2} \left( y_i - \tilde{m}_2 \right)^2}    (8)

Second

J(w) = \frac{\left( w^T m_1 - w^T m_2 \right) \left( w^T m_1 - w^T m_2 \right)^T}{\sum_{y_i = w^T x_i \in C_1} \left( w^T x_i - \tilde{m}_1 \right) \left( w^T x_i - \tilde{m}_1 \right)^T + \sum_{y_i = w^T x_i \in C_2} \left( w^T x_i - \tilde{m}_2 \right) \left( w^T x_i - \tilde{m}_2 \right)^T}    (9)

Third

J(w) = \frac{\left[ w^T (m_1 - m_2) \right] \left[ w^T (m_1 - m_2) \right]^T}{\sum_{y_i = w^T x_i \in C_1} \left[ w^T (x_i - m_1) \right] \left[ w^T (x_i - m_1) \right]^T + \sum_{y_i = w^T x_i \in C_2} \left[ w^T (x_i - m_2) \right] \left[ w^T (x_i - m_2) \right]^T}    (10)
Transformation

Fourth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{\sum_{x_i \in C_1} w^T (x_i - m_1)(x_i - m_1)^T w + \sum_{x_i \in C_2} w^T (x_i - m_2)(x_i - m_2)^T w}    (11)

Fifth

J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{w^T \left[ \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T \right] w}    (12)

Now Rename

J(w) = \frac{w^T S_B w}{w^T S_W w}    (13)
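The same criterion in the matrix form of Eq. (13); a sketch building S_B and S_W from the assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Between-class scatter matrix: S_B = (m1 - m2)(m1 - m2)^T
S_B = np.outer(m1 - m2, m1 - m2)

# Within-class scatter matrix: S_W = sum_k sum_{x in C_k} (x - m_k)(x - m_k)^T
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w = np.array([1.0, 0.5])
J = (w @ S_B @ w) / (w @ S_W @ w)   # Eq. (13): Rayleigh-quotient form of J(w)
print(J)
```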
Derive with respect to w

Thus

\frac{dJ(w)}{dw} = \frac{d \left[ \left( w^T S_B w \right) \left( w^T S_W w \right)^{-1} \right]}{dw} = 0    (14)

Then

\frac{dJ(w)}{dw} = \left( S_B w + S_B^T w \right) \left( w^T S_W w \right)^{-1} - \left( w^T S_B w \right) \left( w^T S_W w \right)^{-2} \left( S_W w + S_W^T w \right) = 0    (15)

Now, using the symmetry of S_B and S_W (and dropping the common factor of 2):

\frac{dJ(w)}{dw} = \frac{S_B w}{w^T S_W w} - \frac{\left( w^T S_B w \right) S_W w}{\left( w^T S_W w \right)^2} = 0    (16)
Derive with respect to w

Thus

\frac{dJ(w)}{dw} = \frac{S_B w}{w^T S_W w} - \frac{\left( w^T S_B w \right) S_W w}{\left( w^T S_W w \right)^2} = 0    (17)

Then, multiplying through by \left( w^T S_W w \right)^2:

\left( w^T S_W w \right) S_B w = \left( w^T S_B w \right) S_W w    (18)
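A quick numerical check, on the assumed toy data, that the candidate w = S_W^{-1}(m_1 - m_2) from the upcoming Eq. (20) satisfies the stationarity condition of Eq. (18):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_B = np.outer(m1 - m2, m1 - m2)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Candidate optimum: w proportional to S_W^{-1} (m1 - m2)
w = np.linalg.solve(S_W, m1 - m2)

# Eq. (18): (w^T S_W w) S_B w must equal (w^T S_B w) S_W w
lhs = (w @ S_W @ w) * (S_B @ w)
rhs = (w @ S_B @ w) * (S_W @ w)
assert np.allclose(lhs, rhs)
```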
Now, Several Tricks!!!

First

S_B w = (m_1 - m_2)(m_1 - m_2)^T w = \alpha (m_1 - m_2)    (19)

where \alpha = (m_1 - m_2)^T w is a simple scalar constant.

It means that S_B w is always in the direction of m_1 - m_2!!!

In addition
w^T S_W w and w^T S_B w are scalar constants.
Now, Several Tricks!!!

Finally

S_W w \propto (m_1 - m_2) \Rightarrow w \propto S_W^{-1} (m_1 - m_2)    (20)

Once the data is transformed into y_i
Use a threshold y_0 ⇒ x ∈ C_1 iff y(x) ≥ y_0, or x ∈ C_2 iff y(x) < y_0 (see the sketch below).
Or maximum likelihood with a Gaussian can be used to classify the new transformed data using Naive Bayes (by the Central Limit Theorem, y = w^T x is a sum of random variables).
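Putting everything together, a minimal end-to-end sketch of the two-class FLD of Eq. (20); the midpoint threshold y_0 used here is one simple assumed choice among those mentioned above, not prescribed by the slides:

```python
import numpy as np

def fld_fit(X1: np.ndarray, X2: np.ndarray):
    """Fit a two-class Fisher linear discriminant: w proportional to S_W^{-1}(m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)      # Eq. (20)
    y0 = 0.5 * (w @ m1 + w @ m2)           # midpoint of the projected means (an assumption)
    return w, y0

def fld_predict(X: np.ndarray, w: np.ndarray, y0: float) -> np.ndarray:
    """x in C1 iff y(x) = w^T x >= y0, else x in C2."""
    return np.where(X @ w >= y0, 1, 2)

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))
w, y0 = fld_fit(X1, X2)
# Fraction of each class recovered correctly on the training data
print((fld_predict(X1, w, y0) == 1).mean(), (fld_predict(X2, w, y0) == 2).mean())
```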
Applications
Something Notable
Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial
variables, linear discriminant analysis was the first statistical method
applied to systematically explain which firms entered bankruptcy vs.
survived.
Face recognition
In computerized face recognition, each face is represented by a large
number of pixel values.
The linear combinations obtained using Fisher’s linear discriminant are
called Fisher faces.
Marketing
In marketing, discriminant analysis was once often used to determine
the factors which distinguish different types of customers and/or
products on the basis of surveys or other forms of collected data.
Applications

Something Notable
Biomedical studies
The main application of discriminant analysis in medicine is the assessment of the severity of a patient's condition and the prognosis of the disease outcome.
Please

Your Reading Material: it is about the multi-class case
Section 4.1.6, "Fisher's discriminant for multiple classes," in "Pattern Recognition and Machine Learning" by Bishop.
Relation with Least Squared Error
First
The least-squares approach to the determination of a linear discriminant
was based on the goal of making the model predictions as close as
possible to a set of target values.
Second
The Fisher criterion was derived by requiring maximum class separation in
the output space.
How do we do this?

First
We have N samples.
We have N_1 samples for class C_1.
We have N_2 samples for class C_2.

So, we decide the following for the targets of each class:
The target for class C_1 is N / N_1.
The target for class C_2 is -N / N_2.
Thus

The new cost function

E = \frac{1}{2} \sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right)^2    (21)

Deriving with respect to w:

\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) x_n = 0    (22)

Deriving with respect to w_0:

\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) = 0    (23)
Then

We have that

\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right) - \sum_{n=1}^{N} t_n
                                                  = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right) - \left( N_1 \frac{N}{N_1} - N_2 \frac{N}{N_2} \right)
                                                  = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right)

Then

\sum_{n=1}^{N} w^T x_n + N w_0 = 0
Then

We have that

w_0 = -w^T \left( \frac{1}{N} \sum_{n=1}^{N} x_n \right)

We rename \frac{1}{N} \sum_{n=1}^{N} x_n = m:

m = \frac{1}{N} \sum_{n=1}^{N} x_n = \frac{1}{N} \left[ N_1 m_1 + N_2 m_2 \right]

Finally

w_0 = -w^T m
Now

In a similar way,

\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - \sum_{n=1}^{N} t_n x_n = 0
Thus, we have

Something Notable

\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - \left[ \frac{N}{N_1} \sum_{x_n \in C_1} x_n - \frac{N}{N_2} \sum_{x_n \in C_2} x_n \right] = 0

Thus

\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - N \left( m_1 - m_2 \right) = 0
Next

Then, substituting w_0 = -w^T m into the previous equation:

\sum_{n=1}^{N} \left( w^T x_n - w^T m \right) x_n = N \left( m_1 - m_2 \right)

Now
What? Please finish this derivation as take-home work for the next class.
Now, Do you have the solution?

You have a version in Duda and Hart, Section 5.8:

\hat{w} = \left( X^T X \right)^{-1} X^T y

Thus

X^T X \hat{w} = X^T y

Now, we rewrite the data matrix using the normalization discussed earlier:

X = \begin{bmatrix} \mathbf{1}_1 & X_1 \\ -\mathbf{1}_2 & -X_2 \end{bmatrix}
In addition

Our old augmented w:

w = \begin{bmatrix} w_0 \\ w \end{bmatrix}

And our new y:

y = \begin{bmatrix} \frac{N}{N_1} \mathbf{1}_1 \\ \frac{N}{N_2} \mathbf{1}_2 \end{bmatrix}    (24)
Thus, we have

Something Notable

\begin{bmatrix} \mathbf{1}_1^T & -\mathbf{1}_2^T \\ X_1^T & -X_2^T \end{bmatrix} \begin{bmatrix} \mathbf{1}_1 & X_1 \\ -\mathbf{1}_2 & -X_2 \end{bmatrix} \begin{bmatrix} w_0 \\ w \end{bmatrix} = \begin{bmatrix} \mathbf{1}_1^T & -\mathbf{1}_2^T \\ X_1^T & -X_2^T \end{bmatrix} \begin{bmatrix} \frac{N}{N_1} \mathbf{1}_1 \\ \frac{N}{N_2} \mathbf{1}_2 \end{bmatrix}

Thus, if we use the following definitions:

m_i = \frac{1}{N_i} \sum_{x \in C_i} x

S_W = \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T
Then

We have the following:

\begin{bmatrix} N & (N_1 m_1 + N_2 m_2)^T \\ (N_1 m_1 + N_2 m_2) & S_W + N_1 m_1 m_1^T + N_2 m_2 m_2^T \end{bmatrix} \begin{bmatrix} w_0 \\ w \end{bmatrix} = \begin{bmatrix} 0 \\ N (m_1 - m_2) \end{bmatrix}

Then

\begin{bmatrix} N w_0 + (N_1 m_1 + N_2 m_2)^T w \\ (N_1 m_1 + N_2 m_2) w_0 + \left( S_W + N_1 m_1 m_1^T + N_2 m_2 m_2^T \right) w \end{bmatrix} = \begin{bmatrix} 0 \\ N (m_1 - m_2) \end{bmatrix}
Thus

We have that

w_0 = -w^T m

\left[ \frac{1}{N} S_W + \frac{N_1 N_2}{N^2} (m_1 - m_2)(m_1 - m_2)^T \right] w = m_1 - m_2

Thus
Since the vector (m_1 - m_2)(m_1 - m_2)^T w is in the direction of m_1 - m_2, we have that

\frac{N_1 N_2}{N^2} (m_1 - m_2)(m_1 - m_2)^T w = (1 - \alpha)(m_1 - m_2)

for some scalar \alpha.
Finally

We have that

w = \alpha N S_W^{-1} (m_1 - m_2)    (25)

That is, the least-squares solution with these targets points in the same direction as the Fisher linear discriminant.
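As a closing check (again on assumed toy data): a sketch that solves the least-squares problem with the targets N/N_1 and -N/N_2 and verifies that the resulting w is parallel to S_W^{-1}(m_1 - m_2), as Eq. (25) states, with w_0 = -w^T m:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.6, size=(100, 2))   # class C1, N1 = 100
X2 = rng.normal([2.0, 1.5], 0.6, size=(100, 2))   # class C2, N2 = 100
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Augmented design matrix with rows [1, x_n^T] and the special targets t_n
A = np.hstack([np.ones((N, 1)), np.vstack([X1, X2])])
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Least-squares solution of Eq. (21)
sol, *_ = np.linalg.lstsq(A, t, rcond=None)
w0, w_ls = sol[0], sol[1:]

# Fisher direction: S_W^{-1} (m1 - m2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fld = np.linalg.solve(S_W, m1 - m2)

# The two directions agree up to scale (Eq. 25): cosine similarity ~ 1
cos = (w_ls @ w_fld) / (np.linalg.norm(w_ls) * np.linalg.norm(w_fld))
print(cos)

# And the bias matches w_0 = -w^T m, where m is the overall sample mean
m = np.vstack([X1, X2]).mean(axis=0)
print(np.isclose(w0, -w_ls @ m))
```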
Exercises

Bishop, Chapter 4: exercises 4.4, 4.5, 4.6.

More Related Content

What's hot

Designing a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean ClassifierDesigning a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean ClassifierDipesh Shome
 
Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2TOMMYLINK1
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3TOMMYLINK1
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revisedKrish_ver2
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...butest
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansWrishin Bhattacharya
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector MachineLucas Xu
 
A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...eSAT Journals
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabCloudxLab
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaMacha Pujitha
 

What's hot (20)

Designing a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean ClassifierDesigning a Minimum Distance classifier to Class Mean Classifier
Designing a Minimum Distance classifier to Class Mean Classifier
 
Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2Rabbit challenge 3 DNN Day2
Rabbit challenge 3 DNN Day2
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
End1
End1End1
End1
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3Rabbit challenge 5_dnn3
Rabbit challenge 5_dnn3
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised2.6 support vector machines and associative classifiers revised
2.6 support vector machines and associative classifiers revised
 
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
CC282 Unsupervised Learning (Clustering) Lecture 7 slides for ...
 
Svm
SvmSvm
Svm
 
Intro
IntroIntro
Intro
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c means
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...
 
Lesson 36
Lesson 36Lesson 36
Lesson 36
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
 
Lesson 33
Lesson 33Lesson 33
Lesson 33
 
Lesson 39
Lesson 39Lesson 39
Lesson 39
 
Text categorization
Text categorizationText categorization
Text categorization
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
 

Similar to 05 Machine Learning - Supervised Fisher Classifier

23 Machine Learning Feature Generation
23 Machine Learning Feature Generation23 Machine Learning Feature Generation
23 Machine Learning Feature GenerationAndres Mendez-Vazquez
 
Implementing Minimum Error Rate Classifier
Implementing Minimum Error Rate ClassifierImplementing Minimum Error Rate Classifier
Implementing Minimum Error Rate ClassifierDipesh Shome
 
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINESAPPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINEScseij
 
Application of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linesApplication of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linescseij
 
Wiener Filter Hardware Realization
Wiener Filter Hardware RealizationWiener Filter Hardware Realization
Wiener Filter Hardware RealizationSayan Chaudhuri
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
Classification of voltage disturbance using machine learning
Classification of voltage disturbance using machine learning Classification of voltage disturbance using machine learning
Classification of voltage disturbance using machine learning Mohan Kashyap
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine LearningPavithra Thippanaik
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...IRJET Journal
 
HOME ASSIGNMENT omar ali.pptx
HOME ASSIGNMENT omar ali.pptxHOME ASSIGNMENT omar ali.pptx
HOME ASSIGNMENT omar ali.pptxSayedulHassan1
 
Paper id 252014114
Paper id 252014114Paper id 252014114
Paper id 252014114IJRAT
 
HOME ASSIGNMENT (0).pptx
HOME ASSIGNMENT (0).pptxHOME ASSIGNMENT (0).pptx
HOME ASSIGNMENT (0).pptxSayedulHassan1
 

Similar to 05 Machine Learning - Supervised Fisher Classifier (20)

23 Machine Learning Feature Generation
23 Machine Learning Feature Generation23 Machine Learning Feature Generation
23 Machine Learning Feature Generation
 
Report
ReportReport
Report
 
Implementing Minimum Error Rate Classifier
Implementing Minimum Error Rate ClassifierImplementing Minimum Error Rate Classifier
Implementing Minimum Error Rate Classifier
 
G1804014348
G1804014348G1804014348
G1804014348
 
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINESAPPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
APPLICATION OF PARTICLE SWARM OPTIMIZATION TO MICROWAVE TAPERED MICROSTRIP LINES
 
Application of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip linesApplication of particle swarm optimization to microwave tapered microstrip lines
Application of particle swarm optimization to microwave tapered microstrip lines
 
Wiener Filter Hardware Realization
Wiener Filter Hardware RealizationWiener Filter Hardware Realization
Wiener Filter Hardware Realization
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Respose surface methods
Respose surface methodsRespose surface methods
Respose surface methods
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
Classification of voltage disturbance using machine learning
Classification of voltage disturbance using machine learning Classification of voltage disturbance using machine learning
Classification of voltage disturbance using machine learning
 
ICPR 2016
ICPR 2016ICPR 2016
ICPR 2016
 
Feature Selection
Feature Selection Feature Selection
Feature Selection
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
 
HOME ASSIGNMENT omar ali.pptx
HOME ASSIGNMENT omar ali.pptxHOME ASSIGNMENT omar ali.pptx
HOME ASSIGNMENT omar ali.pptx
 
Paper id 252014114
Paper id 252014114Paper id 252014114
Paper id 252014114
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
HOME ASSIGNMENT (0).pptx
HOME ASSIGNMENT (0).pptxHOME ASSIGNMENT (0).pptx
HOME ASSIGNMENT (0).pptx
 

More from Andres Mendez-Vazquez

01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectorsAndres Mendez-Vazquez
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issuesAndres Mendez-Vazquez
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiationAndres Mendez-Vazquez
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep LearningAndres Mendez-Vazquez
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusAndres Mendez-Vazquez
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusAndres Mendez-Vazquez
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesAndres Mendez-Vazquez
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variationsAndres Mendez-Vazquez
 

More from Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 

05 Machine Learning - Supervised Fisher Classifier

  • 1. Machine Learning for Data Mining Fisher Linear Discriminant Andres Mendez-Vazquez May 26, 2015 1 / 43
  • 2. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 2 / 43
  • 3. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 3 / 43
  • 4. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 4 / 43
  • 5. Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 5 / 43
  • 6. Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 5 / 43
  • 7. Rotation Projecting Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes and thus produces poor recognition performance. Something Notable However, moving and rotating the line around might result in an orientation for which the projected samples are well separated. Fisher linear discriminant (FLD) It is a discriminant analysis seeking directions that are efficient for discriminating binary classification problem. 5 / 43
  • 8. Example Example - From Left to Right the Improvement 6 / 43
  • 9. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 7 / 43
  • 10. This is actually coming from... Classifier as A machine for dimensionality reduction. Initial Setup We have: N d-dimensional samples x_1, x_2, ..., x_N. N_i is the number of samples in class C_i for i = 1, 2. Then, we ask for the projection of each x_i onto the line by means of $y_i = w^T x_i$ (1) 8 / 43
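As a quick illustration of Eq. (1), the sketch below projects some made-up two-dimensional samples onto a line; the data, the class layout, and the direction w are all hypothetical, chosen only to make the projection step concrete:

```python
import numpy as np

# Hypothetical two-class data (illustrative only, not from the slides).
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))  # 30 samples of class C1
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))  # 40 samples of class C2
X = np.vstack([X1, X2])                          # N = 70 d-dimensional samples

# An arbitrary direction w; Eq. (1): y_i = w^T x_i maps each sample to a scalar.
w = np.array([1.0, 1.0])
y = X @ w  # the N projections onto the line spanned by w
```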
  • 15. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 9 / 43
  • 16. Use the mean of each Class Then Select w such that class separation is maximized. We then define the mean sample for each class:
    1. $C_1 \Rightarrow m_1 = \frac{1}{N_1} \sum_{x_i \in C_1} x_i$
    2. $C_2 \Rightarrow m_2 = \frac{1}{N_2} \sum_{x_i \in C_2} x_i$
    Ok!!! This is giving us a measure of distance. Thus, we want to maximize the distance between the projected means: $\tilde{m}_1 - \tilde{m}_2 = w^T (m_1 - m_2)$ (2) where $\tilde{m}_k = w^T m_k$ for k = 1, 2. 10 / 43
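Continuing the hypothetical example above, the class means and Eq. (2) can be checked in a few lines; by linearity, the difference of the projected means equals the projection of the difference of the means:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))  # class C1 (illustrative)
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))  # class C2 (illustrative)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)  # class means m_1, m_2

w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)  # the magnitude of w is irrelevant here

# Eq. (2): m~_1 - m~_2 = w^T (m_1 - m_2)
print(w @ m1 - w @ m2)  # difference of projected means
print(w @ (m1 - m2))    # same number, by linearity
```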
  • 21. However We could simply seek $\max_w \; w^T (m_1 - m_2)$ subject to $\sum_{i=1}^{d} w_i = 1$ After all We do not care about the magnitude of w. 11 / 43
  • 23. Example Here, we have the problem... The Scattering!!! 12 / 43
  • 24. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 13 / 43
  • 25. Fixing the Problem To obtain good separation of the projected data The difference between the means should be large relative to some measure of the standard deviations for each class. We define a SCATTER measure (based on the sample variance): $s_k^2 = \sum_{x_i \in C_k} \left( w^T x_i - \tilde{m}_k \right)^2 = \sum_{y_i \in C_k} \left( y_i - \tilde{m}_k \right)^2$ (3) We then define the within-class variance for the whole data set: $s_1^2 + s_2^2$ (4) 14 / 43
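A minimal sketch of Eqs. (3) and (4), again on the hypothetical data used above; note that the projected mean of a class is just the mean of its projections:

```python
import numpy as np

def scatter(X, w):
    """Projected scatter s_k^2 of one class: sum of (w^T x_i - m~_k)^2, Eq. (3)."""
    y = X @ w  # projections of the class samples
    return np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))  # illustrative class C1
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))  # illustrative class C2
w = np.array([1.0, 1.0])

within_class = scatter(X1, w) + scatter(X2, w)  # s_1^2 + s_2^2, Eq. (4)
```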
  • 28. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 15 / 43
  • 29. Finally, a Cost Function The between-class variance $(\tilde{m}_1 - \tilde{m}_2)^2$ (5) The Fisher criterion $\frac{\text{between-class variance}}{\text{within-class variance}}$ (6) Finally $J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{s_1^2 + s_2^2}$ (7) 16 / 43
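Putting Eqs. (5)-(7) together gives a function that can be evaluated for any candidate direction; the sketch below (same hypothetical data) shows that different directions score very differently under J(w):

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """J(w) = (m~_1 - m~_2)^2 / (s_1^2 + s_2^2), Eq. (7)."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean()) ** 2
    within = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    return between / within

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))
print(fisher_criterion(X1, X2, np.array([0.0, 1.0])))  # one candidate direction
print(fisher_criterion(X1, X2, np.array([1.0, 1.0])))  # another candidate direction
```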
  • 32. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 17 / 43
  • 33. We use a transformation to simplify our life
    First $J(w) = \frac{\left( w^T m_1 - w^T m_2 \right)^2}{\sum_{y_i \in C_1} \left( y_i - \tilde{m}_1 \right)^2 + \sum_{y_i \in C_2} \left( y_i - \tilde{m}_2 \right)^2}$ (8)
    Second $J(w) = \frac{\left( w^T m_1 - w^T m_2 \right) \left( w^T m_1 - w^T m_2 \right)^T}{\sum_{x_i \in C_1} \left( w^T x_i - \tilde{m}_1 \right) \left( w^T x_i - \tilde{m}_1 \right)^T + \sum_{x_i \in C_2} \left( w^T x_i - \tilde{m}_2 \right) \left( w^T x_i - \tilde{m}_2 \right)^T}$ (9)
    Third $J(w) = \frac{w^T (m_1 - m_2) \left( w^T (m_1 - m_2) \right)^T}{\sum_{x_i \in C_1} w^T (x_i - m_1) \left( w^T (x_i - m_1) \right)^T + \sum_{x_i \in C_2} w^T (x_i - m_2) \left( w^T (x_i - m_2) \right)^T}$ (10)
    18 / 43
  • 36. Transformation
    Fourth $J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{\sum_{x_i \in C_1} w^T (x_i - m_1)(x_i - m_1)^T w + \sum_{x_i \in C_2} w^T (x_i - m_2)(x_i - m_2)^T w}$ (11)
    Fifth $J(w) = \frac{w^T (m_1 - m_2)(m_1 - m_2)^T w}{w^T \left[ \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T \right] w}$ (12)
    Now Rename $J(w) = \frac{w^T S_B w}{w^T S_W w}$ (13)
    19 / 43
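The renaming in Eq. (13) is easy to make concrete: S_B is a rank-one outer product of the mean difference, and S_W is the sum of the two class scatter matrices. A minimal sketch on the hypothetical data:

```python
import numpy as np

def scatter_matrices(X1, X2):
    """S_B = (m1 - m2)(m1 - m2)^T and S_W as in Eqs. (12)-(13)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    diff = m1 - m2
    S_B = np.outer(diff, diff)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return S_B, S_W

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))
S_B, S_W = scatter_matrices(X1, X2)

w = np.array([1.0, 1.0])
J = (w @ S_B @ w) / (w @ S_W @ w)  # Rayleigh-quotient form of J(w), Eq. (13)
```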
  • 39. Derive with respect to w
    Thus $\frac{dJ(w)}{dw} = \frac{d}{dw} \left[ w^T S_B w \left( w^T S_W w \right)^{-1} \right] = 0$ (14)
    Then $\frac{dJ(w)}{dw} = \left( S_B w + S_B^T w \right) \left( w^T S_W w \right)^{-1} - w^T S_B w \left( w^T S_W w \right)^{-2} \left( S_W w + S_W^T w \right) = 0$ (15)
    Now, because of the symmetry of $S_B$ and $S_W$: $\frac{dJ(w)}{dw} = \frac{S_B w \left( w^T S_W w \right) - \left( w^T S_B w \right) S_W w}{\left( w^T S_W w \right)^2} = 0$ (16)
    20 / 43
  • 42. Derive with respect to w
    Thus $\frac{dJ(w)}{dw} = \frac{S_B w \left( w^T S_W w \right) - \left( w^T S_B w \right) S_W w}{\left( w^T S_W w \right)^2} = 0$ (17)
    Then $\left( w^T S_W w \right) S_B w = \left( w^T S_B w \right) S_W w$ (18)
    21 / 43
  • 44. Now, Several Tricks!!! First $S_B w = (m_1 - m_2)(m_1 - m_2)^T w = \alpha (m_1 - m_2)$ (19) where $\alpha = (m_1 - m_2)^T w$ is a simple (scalar) constant. It means that $S_B w$ is always in the direction of $m_1 - m_2$!!! In addition, $w^T S_W w$ and $w^T S_B w$ are constants (scalars). 22 / 43
  • 47. Now, Several Tricks!!! Finally $S_W w \propto (m_1 - m_2) \Rightarrow w \propto S_W^{-1} (m_1 - m_2)$ (20) Once the data is transformed into $y_i$ Use a threshold $y_0$: $x \in C_1$ iff $y(x) \geq y_0$, or $x \in C_2$ iff $y(x) < y_0$. Or maximum likelihood with a Gaussian can be used to classify the new transformed data using Naive Bayes (Central Limit Theorem, since $y = w^T x$ is a sum of random variables). 23 / 43
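Eq. (20) plus a threshold is the whole method. Here is a compact end-to-end sketch; the data are hypothetical, and the midpoint between the projected class means is only one assumed choice of y_0 (the slides leave y_0 open):

```python
import numpy as np

def fld_fit(X1, X2):
    """Fisher direction w proportional to S_W^{-1}(m_1 - m_2), Eq. (20),
    plus an assumed midpoint threshold y0 between the projected class means."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)  # solve instead of forming S_W^{-1}
    y0 = 0.5 * (w @ m1 + w @ m2)       # assumed threshold choice
    return w, y0

def fld_predict(X, w, y0):
    """x in C1 iff y(x) = w^T x >= y0, else C2."""
    return np.where(X @ w >= y0, 1, 2)

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))
w, y0 = fld_fit(X1, X2)
labels = fld_predict(np.vstack([X1, X2]), w, y0)
```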
  • 50. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 24 / 43
  • 51. Applications Something Notable Bankruptcy prediction In bankruptcy prediction based on accounting ratios and other financial variables, linear discriminant analysis was the first statistical method applied to systematically explain which firms entered bankruptcy vs. survived. Face recognition In computerized face recognition, each face is represented by a large number of pixel values. The linear combinations obtained using Fisher’s linear discriminant are called Fisher faces. Marketing In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. 25 / 43
  • 59. Applications Something Notable Biomedical studies The main application of discriminant analysis in medicine is the assessment of severity state of a patient and prognosis of disease outcome. 26 / 43
  • 61. Please Your Reading Material: it is about the multi-class case. Section 4.1.6, "Fisher's discriminant for multiple classes," in "Pattern Recognition and Machine Learning" by Bishop. 27 / 43
  • 62. Outline 1 Introduction History The Rotation Idea Classifiers as Machines for dimensionality reduction 2 Solution Use the mean of each Class Scatter measure The Cost Function A Transformation for simplification and defining the cost function 3 Where is this used? Applications 4 Relation with Least Squared Error What? 28 / 43
  • 63. Relation with Least Squared Error First The least-squares approach to the determination of a linear discriminant was based on the goal of making the model predictions as close as possible to a set of target values. Second The Fisher criterion was derived by requiring maximum class separation in the output space. 29 / 43
  • 65. How do we do this? First We have N samples. We have N_1 samples for class C_1. We have N_2 samples for class C_2. So, we decide the following for the targets on each class: the target for class $C_1$ is $\frac{N}{N_1}$; the target for class $C_2$ is $-\frac{N}{N_2}$. 30 / 43
  • 69. Thus The new cost function $E = \frac{1}{2} \sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right)^2$ (21) Deriving with respect to w: $\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) x_n = 0$ (22) Deriving with respect to $w_0$: $\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) = 0$ (23) 31 / 43
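The special targets in Eq. (21) can be set up directly; the sketch below (hypothetical data again) solves the augmented least-squares problem in one call, so both w_0 and w come out together:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Targets from the slides: N/N1 for class C1 and -N/N2 for class C2.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Augment with a bias column so [w0, w] is found by one least-squares solve.
X = np.vstack([X1, X2])
A = np.hstack([np.ones((N, 1)), X])
w0_w = np.linalg.lstsq(A, t, rcond=None)[0]
w0, w = w0_w[0], w0_w[1:]
```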
  • 72. Then We have that
    $\sum_{n=1}^{N} \left( w^T x_n + w_0 - t_n \right) = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right) - \sum_{n=1}^{N} t_n = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right) - \left( N_1 \frac{N}{N_1} - N_2 \frac{N}{N_2} \right) = \sum_{n=1}^{N} \left( w^T x_n + w_0 \right)$
    Then $\sum_{n=1}^{N} w^T x_n + N w_0 = 0$
    32 / 43
  • 74. Then We have that $w_0 = -w^T \left( \frac{1}{N} \sum_{n=1}^{N} x_n \right)$ We rename $\frac{1}{N} \sum_{n=1}^{N} x_n = m$, and note that $m = \frac{1}{N} \sum_{n=1}^{N} x_n = \frac{1}{N} \left[ N_1 m_1 + N_2 m_2 \right]$ Finally $w_0 = -w^T m$ 33 / 43
  • 77. Now In a similar way $\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - \sum_{n=1}^{N} t_n x_n = 0$ 34 / 43
  • 78. Thus, we have Something Notable $\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - \left( \frac{N}{N_1} \sum_{x_n \in C_1} x_n - \frac{N}{N_2} \sum_{x_n \in C_2} x_n \right) = 0$ Thus $\sum_{n=1}^{N} \left( w^T x_n + w_0 \right) x_n - N (m_1 - m_2) = 0$ 35 / 43
  • 80. Next Then, substituting $w_0 = -w^T m$: $\sum_{n=1}^{N} \left( w^T x_n - w^T m \right) x_n = N (m_1 - m_2)$ Now What? Please do this as a take-home exercise for the next class. 36 / 43
  • 83. Now, Do you have the solution? You have a version in Duda and Hart, Section 5.8: $\hat{w} = \left( X^T X \right)^{-1} X^T y$ Thus $\left( X^T X \right) \hat{w} = X^T y$ Now, we rewrite the data matrix using the normalization discussed earlier: $X = \begin{pmatrix} \mathbf{1}_1 & X_1 \\ -\mathbf{1}_2 & -X_2 \end{pmatrix}$ 37 / 43
  • 86. In addition Our old augmented w: $w = \begin{pmatrix} w_0 \\ w \end{pmatrix}$ And our new y: $y = \begin{pmatrix} \frac{N}{N_1} \mathbf{1}_1 \\ \frac{N}{N_2} \mathbf{1}_2 \end{pmatrix}$ (24) 38 / 43
  • 88. Thus, we have Something Notable
    $\begin{pmatrix} \mathbf{1}_1^T & -\mathbf{1}_2^T \\ X_1^T & -X_2^T \end{pmatrix} \begin{pmatrix} \mathbf{1}_1 & X_1 \\ -\mathbf{1}_2 & -X_2 \end{pmatrix} \begin{pmatrix} w_0 \\ w \end{pmatrix} = \begin{pmatrix} \mathbf{1}_1^T & -\mathbf{1}_2^T \\ X_1^T & -X_2^T \end{pmatrix} \begin{pmatrix} \frac{N}{N_1} \mathbf{1}_1 \\ \frac{N}{N_2} \mathbf{1}_2 \end{pmatrix}$
    Thus, if we use the following definitions: $m_i = \frac{1}{N_i} \sum_{x \in C_i} x$ and $S_W = \sum_{x_i \in C_1} (x_i - m_1)(x_i - m_1)^T + \sum_{x_i \in C_2} (x_i - m_2)(x_i - m_2)^T$
    39 / 43
  • 92. Then We have the following
    $\begin{pmatrix} n & (n_1 m_1 + n_2 m_2)^T \\ (n_1 m_1 + n_2 m_2) & S_W + n_1 m_1 m_1^T + n_2 m_2 m_2^T \end{pmatrix} \begin{pmatrix} w_0 \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ n (m_1 - m_2) \end{pmatrix}$
    Then, writing out the two block rows:
    $n w_0 + (n_1 m_1 + n_2 m_2)^T w = 0$
    $(n_1 m_1 + n_2 m_2) w_0 + \left[ S_W + n_1 m_1 m_1^T + n_2 m_2 m_2^T \right] w = n (m_1 - m_2)$
    40 / 43
  • 94. Thus We have that $w_0 = -w^T m$ and $\left[ \frac{1}{n} S_W + \frac{n_1 n_2}{n^2} (m_1 - m_2)(m_1 - m_2)^T \right] w = m_1 - m_2$ Thus, since the vector $(m_1 - m_2)(m_1 - m_2)^T w$ is in the direction of $m_1 - m_2$, we can write $\frac{n_1 n_2}{n^2} (m_1 - m_2)(m_1 - m_2)^T w = (1 - \alpha)(m_1 - m_2)$ for some scalar $\alpha$. 41 / 43
  • 97. Finally We have that $w = \alpha n S_W^{-1} (m_1 - m_2)$ (25) 42 / 43
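Eq. (25) says the least-squares solution points in the same direction as the Fisher solution. A quick numerical check on the hypothetical data, comparing the two directions entrywise (the ratio should be a constant positive scalar):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
X2 = rng.normal([2.0, 1.5], 0.5, size=(40, 2))
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Least squares with the special targets N/N1 and -N/N2.
X = np.vstack([X1, X2])
A = np.hstack([np.ones((N, 1)), X])
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])
w_ls = np.linalg.lstsq(A, t, rcond=None)[0][1:]  # drop the bias w0

# Fisher direction S_W^{-1}(m1 - m2), Eq. (20).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fld = np.linalg.solve(S_W, m1 - m2)

print(w_ls / w_fld)  # entries should be (approximately) the same constant
```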