Machine Learning for Data Mining
Cluster Validity
Andres Mendez-Vazquez
July 30, 2015
Outline
1 Introduction
    What is a Good Clustering?
2 Cluster Validity
    The Process
    Hypothesis Testing
    Monte Carlo techniques
    Bootstrapping Techniques
    Which Hypothesis?
    Hypothesis Testing in Cluster Validity
    External Criteria
    Relative Criteria
    Hard Clustering
What is a Good Clustering?
Internal criterion
A good clustering produces high-quality clusters in which:
The intra-class (that is, intra-cluster) similarity is high.
The inter-class similarity is low.
The measured quality of a clustering depends on both the document representation and the similarity measure used.
Problem
Many of the Clustering Algorithms
They impose a clustering structure on the data, even though the data may
not possess any.
This is why
Cluster analysis is not a panacea.
Therefore
It is necessary to have an indication that the vectors of X form clusters
before we apply a clustering algorithm.
Then
Verifying whether X possesses a clustering structure, without identifying
it explicitly, is a problem known as clustering tendency.
What can happen if X possesses a cluster structure?
A different kind of problem is encountered now
All the previous algorithms require knowledge of the values of specific
parameters.
Some of them impose restrictions on the shape of the clusters.
Moreover
Poor estimation of these parameters and inappropriate restrictions on the
shape of the clusters may lead to incorrect conclusions about the
clustering structure of X.
Thus, it is necessary to discuss
Methods suitable for quantitative evaluation of the results of a
clustering algorithm.
This task is known as cluster validity.
The Cluster Flowchart
Flowchart of the validity paradigm for clustering structures
Hypothesis Testing Revisited
Hypothesis
H1 : θ ≠ θ0
H0 : θ = θ0
In addition
Also let Dρ be the critical interval corresponding to significance level ρ of
a test statistic q.
Now
Let Θ1 be the set of all values that θ may take under hypothesis H1.
Power Function
Definition (Power Function)
W (θ) = P (q ∈ Dρ | θ ∈ Θ1) (1)
Meaning
W (θ) is the probability that q lies in the critical region when the value of
the parameter vector is θ.
Thus
The power function can be used for the comparison of two different
statistical tests.
The test whose power under the alternative hypothesis is greater is
always preferred.
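To make the definition concrete, the power function of a simple test can be estimated by simulation. The sketch below is a hypothetical one-sample z-test on Gaussian data (not taken from the slides): it estimates W (θ) as the fraction of simulated statistics that land in the two-tailed critical region.

```python
import numpy as np

rng = np.random.default_rng(0)

def power(theta, theta0=0.0, sigma=1.0, n=30, z_crit=1.96, trials=5000):
    # Estimate W(theta) = P(q in D_rho | theta): draw `trials` samples of
    # size n from N(theta, sigma^2), compute the standardized statistic q
    # under H0: theta = theta0, and count how often q falls in the
    # two-tailed critical region |q| > z_crit (here rho = 0.05).
    samples = rng.normal(theta, sigma, size=(trials, n))
    q = (samples.mean(axis=1) - theta0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(q) > z_crit)

print(power(0.0))  # close to rho = 0.05 when H0 holds
print(power(0.5))  # the power grows as theta moves away from theta0
```

Comparing two candidate tests then amounts to comparing such curves over Θ1 and preferring the test with the larger power.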
There are two types of errors associated with a statistical test
Suppose that H0 is true
If q (x) ∈ Dρ, H0 will be rejected even if it is true.
It is called a type I error.
The probability of such error is ρ.
The probability of accepting H0 when it is true is 1 − ρ.
Suppose that H0 is false
If q (x) ∉ Dρ, H0 will be accepted even if it is false.
It is called a type II error.
The probability of such error is 1 − W (θ).
This depends on the specific value of θ.
Thus
Something Notable
The probability density function (pdf) of the statistic q under H0,
for most of the statistics used in practice, has a single maximum.
In addition, the region Dρ is either a half-line or the union of two
half-lines.
Example
Dρ is the union of two half-lines (A two-tailed statistical test)
Example
A right-tailed test
Example
A left-tailed test
Problem
In many practical cases
The exact form of the pdf of a statistic q, under a given hypothesis, is
often not available and is difficult to obtain.
However, we can do the following
Monte Carlo techniques.
Bootstrapping techniques.
Monte Carlo techniques
What do they do?
They rely on simulating the process at hand using a sufficient number of
computer-generated data.
Given enough data, we can try to learn the pdf for q.
Then, using that pdf we simulate samples of q.
Thus
For each of the, say, r data sets Xi, we compute the value of q, denoted
by qi.
Then
We construct the corresponding histogram of these values
Using this approximation
Assume
q corresponds to a right-tailed statistical test.
A histogram is constructed using r values of q corresponding to the r
data sets.
Using this approximation
A right-tailed test
Thus
Then, acceptance or rejection may be based on the rules
Reject H0, if q is greater than (1 − ρ) r of the qi values.
Accept H0, if q is smaller than (1 − ρ) r of the qi values.
Next
For a left-tailed test
Next
For a left-tailed test, rejection or acceptance of the null hypothesis is
decided as follows:
1 Reject H0, if q is smaller than ρr of the qi values.
2 Accept H0, if q is greater than ρr of the qi values
Now, for two-tailed statistical test
A two-tailed statistical test
Thus
For a two-tailed test we have
Accept H0, if q is greater than (ρ/2) r of the qi values and less than
(1 − ρ/2) r of the qi values.
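These rank-based rules can be written down directly. In the sketch below the reference values qi come from a hypothetical standard-normal population, standing in for the statistic computed on the r computer-generated data sets:

```python
import numpy as np

rng = np.random.default_rng(1)

def monte_carlo_test(q, q_ref, rho=0.05, tail="right"):
    # q_ref holds the r values q_1, ..., q_r of the statistic computed
    # on data sets generated under H0.
    r = len(q_ref)
    greater = np.sum(q > np.asarray(q_ref))  # how many q_i the observed q exceeds
    if tail == "right":
        # Reject H0 if q is greater than (1 - rho) r of the q_i values.
        return "reject" if greater > (1 - rho) * r else "accept"
    # Two-tailed: accept H0 if q is greater than (rho / 2) r of the q_i
    # values and less than (1 - rho / 2) r of them.
    return "accept" if (rho / 2) * r < greater < (1 - rho / 2) * r else "reject"

q_ref = rng.normal(size=200)  # hypothetical reference population, r = 200
print(monte_carlo_test(3.5, q_ref, tail="right"))
print(monte_carlo_test(0.1, q_ref, tail="right"))
```

The left-tailed rule follows the same ranking idea with the roles of the tails swapped.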
Bootstrapping Techniques
Why?
They constitute an alternative way to cope with a limited amount of data.
Then
The idea here is to parameterize the unknown pdf in terms of an unknown
parameter.
How
Resampling the available data copes with the limited amount of data and
improves the accuracy of the estimate of the unknown pdf parameter.
The Process
For this, we create
Several “fake” data sets X1, ..., Xr are created by sampling X with
replacement.
Thus
By using this sample, we estimate the desired pdf for q.
Then
Typically, good estimates are obtained if r is between 100 and 200.
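A minimal sketch of this resampling loop, assuming a generic statistic function (here, as a hypothetical example, the mean pairwise distance of the data):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_statistic(X, stat, r=200):
    # Create r "fake" data sets X_1, ..., X_r by sampling X with
    # replacement and return the r resulting values of the statistic;
    # their histogram approximates the desired pdf of the statistic.
    n = len(X)
    return np.array([stat(X[rng.integers(0, n, size=n)]) for _ in range(r)])

def mean_pairwise_distance(X):
    # Hypothetical statistic: average Euclidean distance between pairs.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return d[np.triu_indices(len(X), k=1)].mean()

X = rng.normal(size=(50, 2))                       # limited data set
q_boot = bootstrap_statistic(X, mean_pairwise_distance, r=200)
print(q_boot.mean(), q_boot.std())                 # summary of the estimated pdf
```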
Which Hypothesis?
Random position hypothesis
H0: All the locations of N data points in some specific region of a
d-dimensional space are equally likely.
Random graph hypothesis
H0: All N × N rank order proximity matrices are equally likely.
Random label hypothesis
H0: All permutations of the labels on N data objects are equally likely.
Thus
We must define an appropriate statistic
Whose values are indicative of the structure of a data set, and compare
the value that results from our data set X against the value obtained from
the reference (random) population.
Random Population
In order to obtain the baseline distribution under the null hypothesis,
statistical sampling techniques like Monte Carlo analysis and
bootstrapping are used (Jain and Dubes, 1988).
For example
Random position hypothesis - appropriate for ratio data
All the arrangements of N vectors in a specific region of the d-dimensional
space are equally likely to occur.
How?
One way to produce such an arrangement is to insert each point randomly
in this region of the d-dimensional space, according to a uniform
distribution.
The random position hypothesis
It can be used with either an external or an internal criterion.
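A minimal sketch of generating one such reference arrangement, assuming the region is the bounding hyperrectangle of the original data (an assumption for illustration; any fixed region of interest works):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_position_sample(N, low, high):
    # Insert each of the N points independently and uniformly into the
    # hyperrectangle [low, high] of the d-dimensional space.
    low, high = np.asarray(low, float), np.asarray(high, float)
    return rng.uniform(low, high, size=(N, len(low)))

X = rng.normal(size=(100, 2))        # hypothetical original data set
r = 100                              # reference population of r data sets
reference = [random_position_sample(len(X), X.min(axis=0), X.max(axis=0))
             for _ in range(r)]
```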
Internal criteria
What do we do?
In this case, the statistic q is defined so as to measure the degree to which
a clustering structure, produced by a clustering algorithm, matches the
proximity matrix of the corresponding data set.
We apply our clustering algorithm to the following data set
1 Let Xi be a set of N vectors generated according to the random
position hypothesis.
2 Let Pi be the corresponding proximity matrix.
3 Let Ci be the corresponding clustering.
Then, we apply our algorithm over the real data X to obtain C
Now, we compute the statistic q for each clustering structure.
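One concrete choice of such a statistic q is the normalized Hubert Γ (an illustrative pick from Theodoridis's Chapter 16, not the only option): the correlation between the proximity matrix and a cluster-membership indicator matrix.

```python
import numpy as np

def hubert_gamma(P, labels):
    # Normalized Hubert Gamma: correlation, over the upper triangle,
    # between the proximity (distance) matrix P and the indicator
    # Y[i, j] = 1 iff points i and j lie in different clusters.
    labels = np.asarray(labels)
    Y = (labels[:, None] != labels[None, :]).astype(float)
    iu = np.triu_indices(len(labels), k=1)
    return np.corrcoef(P[iu], Y[iu])[0, 1]

rng = np.random.default_rng(4)
# Hypothetical data: two well-separated blobs and their cluster labels.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
P = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
labels = np.array([0] * 20 + [1] * 20)
print(hubert_gamma(P, labels))  # near 1: the clustering matches P well
```

The same function evaluated on the reference data sets yields the qi values against which this observed value is ranked.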
Then, we use our hypothesis H0
The random hypothesis H0
It is rejected if the value q, resulting from X lies in the critical interval Dρ
of the statistic pdf of the reference random population.
Meaning
If q is unusually small or large.
Also a External Criteria can be used
Definition
The statistic q is defined to measure the degree of correspondence
between a prespecified structure P imposed on X and the clustering
that results after the application of a specific clustering algorithm.
Then
Then, the value of q corresponding to the clustering C resulting from
the data set X is tested against the qi’s.
These qi’s correspond to the clusterings resulting from the reference
population generated under the random position hypothesis.
Again
The random hypothesis is rejected if q is unusually large or small.
Nevertheless
There are more examples
In Chapter 16 of the book by Theodoridis.
External Criteria Usage
First
For the comparison of a clustering structure C, produced by a clustering
algorithm, with a partition P of X drawn independently of C.
Second
For measuring the degree of agreement between a predetermined partition
P and the proximity matrix P of X.
Comparison of P with a Clustering C
First
In this case, C may be either a specific hierarchy of clusterings or a specific
clustering.
However
The problem with hierarchical clustering is cutting the dendrogram at the
correct level.
Then
We will concentrate on clusterings that do not come from a hierarchical
clustering.
Thus
Setup
Consider a clustering C given by a specific clustering algorithm, together
with an independently drawn partition P of X.
Thus, we obtain the following sets
C = {C1, C2, ..., Cm}
P = {P1, P2, ..., Ps}
Note that the number of clusters m in C need not be the same as the
number of groups s in P.
Now
Consider the following
Let nij denote the number of vectors that belong to Ci and Pj
simultaneously.
Also let n_i^C = Σ_{j=1..s} nij, the number of vectors that belong to Ci.
Similarly, let n_j^P = Σ_{i=1..m} nij, the number of vectors that belong to Pj.
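The counts nij and their marginals can be tabulated directly. A small sketch, with hypothetical labelings C_labels and P_labels:

```python
from collections import Counter

# Hypothetical cluster labels (C) and group labels (P) for 8 vectors.
C_labels = [0, 0, 0, 1, 1, 1, 2, 2]
P_labels = ['a', 'a', 'b', 'b', 'b', 'a', 'c', 'c']

# nij: number of vectors in cluster Ci and group Pj simultaneously.
n = Counter(zip(C_labels, P_labels))

# Marginals: n_i^C (row sums over j) and n_j^P (column sums over i).
nC = Counter(C_labels)
nP = Counter(P_labels)

# The nij entries sum to the total number of vectors.
assert sum(n.values()) == len(C_labels)
```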
Consider a pair of vectors (xv, xu)
Thus, we have the following cases
Case 1: Both vectors belong to the same cluster in C and to the
same group in P.
Case 2: Both vectors belong to the same cluster in C and to
different groups in P.
Case 3: Both vectors belong to different clusters in C and to the
same group in P.
Case 4: Both vectors belong to different clusters in C and to
different groups in P.
Example
The numbers of pairs of points for the four cases are denoted a, b,
c, and d, respectively.
Thus
The total number of pairs of points, denoted M, is N(N − 1)/2, so
a + b + c + d = M (2)
Now
We can give some commonly used external indices for measuring the
match between C and P.
Commonly used external indices
Rand index (Rand, 1971)
R = (a + d)/M (3)
Jaccard coefficient
J = a/(a + b + c) (4)
Fowlkes and Mallows index (Fowlkes and Mallows, 1983)
FM = √(a/(a + b) × a/(a + c)) (5)
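A minimal sketch of computing a, b, c, d and the three indices from a pair of labelings; the labelings themselves are hypothetical:

```python
from itertools import combinations

def pair_counts(C_labels, P_labels):
    """Count the four pair cases between a clustering C and a partition P."""
    a = b = c = d = 0
    for u, v in combinations(range(len(C_labels)), 2):
        same_C = C_labels[u] == C_labels[v]
        same_P = P_labels[u] == P_labels[v]
        if same_C and same_P:
            a += 1          # case 1: same cluster, same group
        elif same_C:
            b += 1          # case 2: same cluster, different groups
        elif same_P:
            c += 1          # case 3: different clusters, same group
        else:
            d += 1          # case 4: different clusters, different groups
    return a, b, c, d

C_labels = [0, 0, 0, 1, 1, 1]
P_labels = [0, 0, 1, 1, 1, 0]
a, b, c, d = pair_counts(C_labels, P_labels)
M = a + b + c + d                          # M = N(N - 1)/2
R = (a + d) / M                            # Rand index, Eq. (3)
J = a / (a + b + c)                        # Jaccard coefficient, Eq. (4)
FM = (a / (a + b) * a / (a + c)) ** 0.5    # Fowlkes-Mallows index, Eq. (5)
```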
Explanation
Given a + d
The Rand statistic measures the fraction of the total number of pairs that
fall in either case 1 or case 4.
Something Notable
The Jaccard coefficient follows the same philosophy as the Rand index,
except that it excludes case 4.
Properties
The values of these two statistics lie between 0 and 1.
Commonly used external indices
Hubert's Γ statistic
Γ = (Ma − m1m2) / √(m1m2 (M − m1)(M − m2)), with m1 = a + b, m2 = a + c (6)
Property
Unusually large absolute values of Γ suggest that C and P agree with each
other.
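Hubert's normalized Γ can be sketched from the same pair counts a, b, c, d, taking m1 = a + b and m2 = a + c as the numbers of same-cluster and same-group pairs; the numeric examples are hypothetical:

```python
def hubert_gamma(a, b, c, d):
    # m1: pairs in the same cluster of C; m2: pairs in the same group of P.
    M = a + b + c + d
    m1, m2 = a + b, a + c
    return (M * a - m1 * m2) / (m1 * m2 * (M - m1) * (M - m2)) ** 0.5

# Perfect agreement between C and P attains the maximum value 1.
gamma = hubert_gamma(a=4, b=0, c=0, d=6)
```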
How we use all this?
Assuming a Random Hypothesis
The distribution of these indices can be estimated using a Monte Carlo
sampling method.
Example: Gibbs Sampling
1 Initialize x to some value.
2 For each variable xi of the feature vector x, resample
xi ∼ P(xi | xj, j ≠ i)
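A minimal Gibbs-sampling sketch for a toy target whose full conditionals are known in closed form: a standard bivariate normal with correlation ρ, where x1 | x2 ~ N(ρ·x2, 1 − ρ²) and symmetrically for x2 | x1. The value ρ = 0.8 and the iteration counts are illustrative choices:

```python
import random

def gibbs_bivariate_normal(rho, n_samples=5000, burn_in=500, seed=0):
    # Alternately resample each coordinate from its full conditional.
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    sd = (1 - rho ** 2) ** 0.5       # conditional standard deviation
    samples = []
    for t in range(n_samples + burn_in):
        x1 = rng.gauss(rho * x2, sd)  # resample x1 given x2
        x2 = rng.gauss(rho * x1, sd)  # resample x2 given x1
        if t >= burn_in:              # discard the burn-in prefix
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
mean1 = sum(s[0] for s in samples) / len(samples)  # should be near 0
```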
Example
A trace of sampling for a single variable
Algorithm and Example
Please
Go and read Section 16.3 in Theodoridis' book, page 871, for more.
Outline
1 Introduction
What is a Good Clustering?
2 Cluster Validity
The Process
Hypothesis Testing
Monte Carlo techniques
Bootstrapping Techniques
Which Hypothesis?
Hypothesis Testing in Cluster Validity
External Criteria
Relative Criteria
Hard Clustering
Hard Clustering
The Dunn and Dunn-like indices
Given a dissimilarity function between clusters:
d(Ci, Cj) = min_{x∈Ci, y∈Cj} d(x, y) (7)
Then it is possible to define the diameter of a cluster:
diam(C) = max_{x,y∈C} d(x, y) (8)
Then the Dunn index is
Dm = min_{i=1,...,m} min_{j=i+1,...,m} [d(Ci, Cj) / max_{k=1,...,m} diam(Ck)] (9)
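The three definitions above can be sketched directly; the two-cluster data set below is hypothetical:

```python
def dunn_index(clusters):
    """Dunn index for a hard clustering given as a list of lists of points."""
    def dist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    # Largest cluster diameter: max intra-cluster distance (Eq. 8).
    diam = max(
        max(dist(x, y) for x in C for y in C)
        for C in clusters
    )
    # Smallest between-cluster dissimilarity (Eq. 7).
    min_sep = min(
        min(dist(x, y) for x in clusters[i] for y in clusters[j])
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_sep / diam   # Eq. (9)

# Two compact, well-separated clusters -> a large Dunn index.
clusters = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
D = dunn_index(clusters)
```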
Thus
It is possible to prove that
If X contains compact and well-separated clusters, Dunn's index will be
large.
Example
Although there are more
Please
Look at Chapter 16 of Theodoridis' book for more examples.

History of Indian Railways - the story of Growth & ModernizationEmaan Sharma
 
Databricks Generative AI Fundamentals .pdf
Databricks Generative AI Fundamentals  .pdfDatabricks Generative AI Fundamentals  .pdf
Databricks Generative AI Fundamentals .pdfVinayVadlagattu
 
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...Christo Ananth
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxMustafa Ahmed
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailingAshishSingh1301
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...AshwaniAnuragi1
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
Databricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdfDatabricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdfVinayVadlagattu
 

Recently uploaded (20)

Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 
Signal Processing and Linear System Analysis
Signal Processing and Linear System AnalysisSignal Processing and Linear System Analysis
Signal Processing and Linear System Analysis
 
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdfInvolute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
Involute of a circle,Square, pentagon,HexagonInvolute_Engineering Drawing.pdf
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdf
 
Circuit Breakers for Engineering Students
Circuit Breakers for Engineering StudentsCircuit Breakers for Engineering Students
Circuit Breakers for Engineering Students
 
Working Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdfWorking Principle of Echo Sounder and Doppler Effect.pdf
Working Principle of Echo Sounder and Doppler Effect.pdf
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...
 
Independent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging StationIndependent Solar-Powered Electric Vehicle Charging Station
Independent Solar-Powered Electric Vehicle Charging Station
 
一比一原版(NEU毕业证书)东北大学毕业证成绩单原件一模一样
一比一原版(NEU毕业证书)东北大学毕业证成绩单原件一模一样一比一原版(NEU毕业证书)东北大学毕业证成绩单原件一模一样
一比一原版(NEU毕业证书)东北大学毕业证成绩单原件一模一样
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
Databricks Generative AI Fundamentals .pdf
Databricks Generative AI Fundamentals  .pdfDatabricks Generative AI Fundamentals  .pdf
Databricks Generative AI Fundamentals .pdf
 
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
Call for Papers - Journal of Electrical Systems (JES), E-ISSN: 1112-5209, ind...
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
01-vogelsanger-stanag-4178-ed-2-the-new-nato-standard-for-nitrocellulose-test...
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Databricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdfDatabricks Generative AI FoundationCertified.pdf
Databricks Generative AI FoundationCertified.pdf
 

31 Machine Learning Unsupervised Cluster Validity

  • 1. Machine Learning for Data Mining Cluster Validity Andres Mendez-Vazquez July 30, 2015 1 / 55
  • 8. Problem Many of the Clustering Algorithms They impose a clustering structure on the data, even though the data may not possess any. This is why Cluster analysis is not a panacea. Therefore It is necessary to have an indication that the vectors of X form clusters before we apply a clustering algorithm. 5 / 55
  • 11. Then The problem of verifying whether X possesses a clustering structure, without identifying it explicitly, is known as clustering tendency. 6 / 55
  • 12. What can happen if X possesses a cluster structure? A different kind of problem is encountered now All the previous algorithms require knowledge of the values of specific parameters. Some of them impose restrictions on the shape of the clusters. It is more Poor estimation of these parameters and inappropriate restrictions on the shape of the clusters may lead to incorrect conclusions about the clustering structure of X. Thus, it is necessary to discuss Methods suitable for quantitative evaluation of the results of a clustering algorithm. This task is known as cluster validity. 7 / 55
  • 18. The Cluster Flowchart Flowchart of the validity paradigm for clustering structures 9 / 55
  • 20. Hypothesis Testing Revisited Hypothesis H0 : θ = θ0 H1 : θ ≠ θ0 In addition Also let Dρ be the critical interval corresponding to significance level ρ of a test statistic q. Now Let Θ1 be the set of all values that θ may take under hypothesis H1. 11 / 55
  • 23. Power Function Definition (Power Function) W(θ) = P(q ∈ Dρ | θ ∈ Θ1) (1) Meaning W(θ) is the probability that q lies in the critical region when the true value of the parameter vector is θ ∈ Θ1. Thus The power function can be used for the comparison of two different statistical tests: the test whose power under the alternative hypotheses is greater is always preferred. 12 / 55
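The power function is easy to estimate by simulation. A minimal sketch, assuming an illustrative right-tailed z-test of H0: θ = 0 with known unit variance (the test, sample size, and significance level are example choices, not from the slides):

```python
import random

def estimated_power(theta, rho=0.05, n=30, trials=2000, seed=0):
    """Estimate W(theta): the probability that the statistic q lands in the
    critical region D_rho when the data are generated with true mean theta.
    Here q is the standardized mean of n draws from N(theta, 1), and
    D_rho = (1.645, inf) is the right-tailed critical region for rho = 0.05."""
    rng = random.Random(seed)
    z_crit = 1.645  # (1 - rho) quantile of N(0, 1) for rho = 0.05
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        q = xbar * n ** 0.5  # standardized statistic, sigma = 1 assumed
        if q > z_crit:
            hits += 1
    return hits / trials

print(estimated_power(0.0))  # roughly rho: the Type I error probability
print(estimated_power(0.5))  # much larger: H0 is false and usually rejected
```

Comparing two tests at the same θ ∈ Θ1 then amounts to comparing these estimates; the test with the larger power is preferred.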
  • 27. There are two types of errors associated with a statistical test Suppose that H0 is true If q(x) ∈ Dρ, H0 will be rejected even though it is true. This is called a Type I error. The probability of such an error is ρ. The probability of accepting H0 when it is true is 1 − ρ. Suppose that H0 is false If q(x) ∉ Dρ, H0 will be accepted even though it is false. This is called a Type II error. The probability of such an error is 1 − W(θ), which depends on the specific value of θ. 13 / 55
  • 35. Thus Something Notable The probability density function (pdf) of the statistic q, under H0, for most of the statistics used in practice has a single maximum. In addition, the region Dρ is either a half-line or the union of two half-lines. 14 / 55
  • 36. Example Dρ is the union of two half-lines (a two-tailed statistical test) 15 / 55
  • 39. Problem In many practical cases The exact form of the pdf of a statistic q, under a given hypothesis, is not available and it is difficult to obtain. However, we can do the following Monte Carlo techniques. Bootstrapping techniques. 18 / 55
  • 42. Monte Carlo techniques What do they do? They rely on simulating the process at hand using a sufficient number of computer-generated data sets. Given enough data, we can try to learn the pdf of q and then use that pdf to simulate samples of q. Thus For each of the, say, r data sets Xi, we compute the value of q, denoted by qi. Then We construct the corresponding histogram of these values 19 / 55
  • 47. Using this approximation Assume q corresponds to a right-tailed statistical test. A histogram is constructed using r values of q corresponding to the r data sets. 20 / 55
  • 49. Using this approximation A right-tailed test 21 / 55
  • 50. Thus Then, acceptance or rejection may be based on the rules Reject H0, if q is greater than (1 − ρ) r of the qi values. Accept H0, if q is smaller than (1 − ρ) r of the qi values. 22 / 55
  • 53. Next For a left-tailed test, rejection or acceptance of the null hypothesis is done on the basis 1 Reject H0, if q is smaller than ρr of the qi values. 2 Accept H0, if q is greater than ρr of the qi values 24 / 55
  • 54. Now, for two-tailed statistical test A two-tailed statistical test 25 / 55
  • 55. Thus For a two-tailed test we have Accept H0, if q is greater than (ρ/2)r of the qi values and less than (1 − ρ/2)r of the qi values. 26 / 55
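These ranking rules translate directly into code. A minimal sketch, assuming the r reference values qi have already been computed on Monte Carlo data sets (the statistic itself is left abstract):

```python
def monte_carlo_decision(q, q_ref, rho=0.05, tail="right"):
    """Accept or reject H0 by ranking the observed statistic q against the
    r values q_i computed on data sets generated under H0."""
    r = len(q_ref)
    below = sum(1 for qi in q_ref if qi < q)  # how many q_i the observed q exceeds
    if tail == "right":
        return "reject H0" if below > (1 - rho) * r else "accept H0"
    if tail == "left":
        return "reject H0" if below < rho * r else "accept H0"
    # two-tailed: accept when q sits between the (rho/2)r and (1 - rho/2)r ranks
    inside = (rho / 2) * r < below < (1 - rho / 2) * r
    return "accept H0" if inside else "reject H0"
```

For example, with r = 100 reference values and ρ = 0.05, the right-tailed rule rejects H0 when q exceeds more than 95 of the qi.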
  • 56. Bootstrapping Techniques Why? They constitute an alternative way to cope with a limited amount of data. Then The idea here is to parameterize the unknown pdf in terms of an unknown parameter. How To cope with the limited amount of data and in order to improve the accuracy of the estimate of the unknown pdf parameter. 27 / 55
  • 59. The Process For this, we create Several “fake” data sets X1, ..., Xr are created by sampling X with replacement. Thus By using these samples, we estimate the desired pdf of q. Then Typically, good estimates are obtained if r is between 100 and 200. 28 / 55
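The resampling step above can be sketched as follows, assuming X is a list of scalar observations and `statistic` is any function of a sample (both illustrative choices):

```python
import random

def bootstrap_distribution(X, statistic, r=200, seed=0):
    """Build r "fake" data sets X_1, ..., X_r by sampling X with replacement
    and return the r resulting values of the statistic."""
    rng = random.Random(seed)
    n = len(X)
    values = []
    for _ in range(r):
        fake = [X[rng.randrange(n)] for _ in range(n)]  # sample with replacement
        values.append(statistic(fake))
    return values

X = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = bootstrap_distribution(X, lambda s: sum(s) / len(s), r=200)
# The histogram of these 200 values approximates the pdf of the sample mean.
```

With r = 200 the function stays within the 100–200 range the slides cite as typically sufficient.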
  • 63. Which Hypothesis? Random position hypothesis H0: All the locations of N data points in some specific region of a d-dimensional space are equally likely. Random graph hypothesis H0: All N × N rank order proximity matrices are equally likely. Random label hypothesis H0: All permutations of the labels on N data objects are equally likely. 30 / 55
  • 66. Thus We must define an appropriate statistic Whose values are indicative of the structure of a data set, and compare the value that results from our data set X against the value obtained from the reference (random) population. Random Population In order to obtain the baseline distribution under the null hypothesis, statistical sampling techniques like Monte Carlo analysis and bootstrapping are used (Jain and Dubes, 1988). 31 / 55
  • 68. For example Random position hypothesis - appropriate for ratio data All the arrangements of N vectors in a specific region of the d-dimensional space are equally likely to occur. How? One way to produce such an arrangement is to insert each point randomly in this region of the d-dimensional space, according to a uniform distribution. The random position hypothesis It can be used with either an external or an internal criterion. 32 / 55
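The uniform insertion just described can be sketched in a few lines; this is a minimal illustration, not from the slides, and the function name and the choice of NumPy are my own:

```python
import numpy as np

def random_position_sample(N, low, high, rng):
    """Draw N points uniformly in the hyper-rectangle [low, high]^d,
    as prescribed by the random position hypothesis."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return rng.uniform(low, high, size=(N, low.size))

# One reference data set of 100 points in the unit square
rng = np.random.default_rng(0)
X_ref = random_position_sample(100, [0.0, 0.0], [1.0, 1.0], rng)
```

Repeating this r times and clustering each `X_ref` yields the reference population of statistic values.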
  • 72. Internal criteria What do we do? In this case, the statistic q is defined so as to measure the degree to which a clustering structure, produced by a clustering algorithm, matches the proximity matrix of the corresponding data set. We apply our clustering algorithm to the following data sets 1 Let Xi be a set of N vectors generated according to the random position hypothesis. 2 Let Pi be the corresponding proximity matrix. 3 Let Ci be the corresponding clustering. Then, we apply our algorithm over the real data X to obtain C Now, we compute the statistic q for each clustering structure. 34 / 55
  • 77. Then, we use our hypothesis H0 The random hypothesis H0 It is rejected if the value q resulting from X lies in the critical interval Dρ of the statistic's pdf under the reference random population. Meaning, if q is unusually small or large. 35 / 55
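As a sketch of this rejection rule, the critical interval Dρ can be approximated from the reference values q1, ..., qr produced by the Monte Carlo runs; the function name and the quantile-based cut are illustrative assumptions, not from the slides:

```python
import numpy as np

def reject_h0(q_obs, q_ref, rho=0.05):
    """Two-sided Monte Carlo test: reject H0 when q_obs lands in the
    rho-tails (the critical interval) of the reference distribution."""
    lo = np.quantile(q_ref, rho / 2)        # lower critical value
    hi = np.quantile(q_ref, 1 - rho / 2)    # upper critical value
    return q_obs < lo or q_obs > hi
```

Here `q_ref` would hold the statistic values computed on the reference (random) clusterings, and `q_obs` the value on the real data set X.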
  • 79. An External Criterion can also be used Definition The statistic q is defined to measure the degree of correspondence between a prespecified structure P imposed on X and the clustering that results after the application of a specific clustering algorithm. Then The value of q corresponding to the clustering C resulting from the data set X is tested against the qi's. These qi's correspond to the clusterings resulting from the reference population generated under the random position hypothesis. Again The random hypothesis is rejected if q is unusually large or small. 36 / 55
  • 84. Nevertheless There are more examples in Chapter 16 of Theodoridis' book. 37 / 55
  • 86. External Criteria Usage First For the comparison of a clustering structure C, produced by a clustering algorithm, with a partition P of X drawn independently of C. Second For measuring the degree of agreement between a predetermined partition P and the proximity matrix of X, P. 39 / 55
  • 88. Comparison of P with a Clustering C First In this case, C may be either a specific hierarchy of clusterings or a specific clustering. However The problem with hierarchical clustering is cutting the dendrogram at the correct level. Then We will concentrate on clusterings that do not imply hierarchical clustering. 40 / 55
  • 91. Thus Setup Consider a clustering C given by a specific clustering algorithm. This clustering is compared against an independently drawn partition P. Thus, we obtain the following sets C = {C1, C2, ..., Cm} P = {P1, P2, ..., Ps} Note that the number of clusters in C need not be the same as the number of groups in P. 41 / 55
  • 97. Now Consider the following Let nij denote the number of vectors that belong to Ci and Pj simultaneously. Also n_i^C = Σ_{j=1}^{s} nij. It is the number of vectors that belong to Ci. Similarly n_j^P = Σ_{i=1}^{m} nij. The number of vectors that belong to Pj. 42 / 55
  • 100. Consider a pair of vectors (xv, xu) Thus, we have the following cases Case 1: both vectors belong to the same cluster in C and to the same group in P. Case 2: both vectors belong to the same cluster in C and to different groups in P. Case 3: both vectors belong to different clusters in C and to the same group in P. Case 4: both vectors belong to different clusters in C and to different groups in P. 43 / 55
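A brute-force count of the four cases over all pairs can be sketched as follows; the helper name is mine, not from the slides:

```python
from itertools import combinations

def pair_counts(C_labels, P_labels):
    """Count the four pair cases between clustering C and partition P:
    a: same cluster in C, same group in P       (case 1)
    b: same cluster in C, different groups in P (case 2)
    c: different clusters in C, same group in P (case 3)
    d: different in both C and P                (case 4)"""
    a = b = c = d = 0
    for i, j in combinations(range(len(C_labels)), 2):
        same_C = C_labels[i] == C_labels[j]
        same_P = P_labels[i] == P_labels[j]
        if same_C and same_P:
            a += 1
        elif same_C:
            b += 1
        elif same_P:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

For N points there are N(N − 1)/2 pairs, so a + b + c + d always equals M.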
  • 104. Example The numbers of pairs of points for the four cases are denoted as a, b, c, and d 44 / 55
  • 105. Thus The total number of pairs of points is N(N − 1)/2, denoted as M a + b + c + d = M (2) Now We can give some commonly used external indices for measuring the match between C and P. 45 / 55
  • 107. Commonly used external indices Rand index (Rand, 1971) R = (a + d)/M (3) Jaccard coefficient J = a/(a + b + c) (4) Fowlkes and Mallows index (Fowlkes and Mallows, 1983) FM = √( (a/(a + b)) × (a/(a + c)) ) = a/√((a + b)(a + c)) (5) 46 / 55
  • 110. Explanation Given a + d The Rand statistic measures the fraction of the total number of pairs that fall in either case 1 or case 4. Something Notable The Jaccard coefficient follows the same philosophy as the Rand index, except that it excludes case 4. Properties The values of these two statistics are between 0 and 1. 47 / 55
  • 113. Commonly used external indices Hubert's Γ statistic, with m1 = a + b and m2 = a + c Γ = (Ma − m1m2)/√(m1m2(M − m1)(M − m2)) (6) Property Unusually large absolute values of Γ suggest that C and P agree with each other. 48 / 55
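All four indices can be computed directly from the pair counts a, b, c, d; a minimal sketch, with m1 = a + b and m2 = a + c, and with no guards against degenerate zero denominators:

```python
import math

def external_indices(a, b, c, d):
    """Rand, Jaccard, Fowlkes-Mallows and Hubert's Gamma
    from the pair counts; M = a + b + c + d."""
    M = a + b + c + d
    m1, m2 = a + b, a + c
    rand = (a + d) / M
    jaccard = a / (a + b + c)
    fm = a / math.sqrt(m1 * m2)
    gamma = (M * a - m1 * m2) / math.sqrt(m1 * m2 * (M - m1) * (M - m2))
    return rand, jaccard, fm, gamma
```

When C and P coincide, b = c = 0 and all four indices reach their maximum agreement values.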
  • 115. How do we use all this? Assuming a Random Hypothesis It is possible using a Monte Carlo method of sampling Example: Gibbs Sampling 1 Initialize x to some value 2 For each variable in the feature vector x, resample xi ∼ P(xi | xj, j ≠ i) 49 / 55
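The slides do not fix a target distribution, so as a hedged toy illustration only, here is a Gibbs sampler for a zero-mean bivariate normal with correlation rho, where each coordinate is resampled from its exact conditional given the other; the distribution choice and names are my own:

```python
import numpy as np

def gibbs_bivariate_normal(n_steps, rho=0.8, seed=0):
    """Toy Gibbs sampler: alternately resample each coordinate
    from its conditional x_i ~ p(x_i | x_j, j != i)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    samples = np.empty((n_steps, 2))
    s = np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    for t in range(n_steps):
        x[0] = rng.normal(rho * x[1], s)
        x[1] = rng.normal(rho * x[0], s)
        samples[t] = x
    return samples
```

After enough steps, the empirical correlation of the samples approaches rho, which is the sense in which the chain reproduces the target distribution.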
  • 117. Example A trace of sampling for a single variable 50 / 55
  • 118. Algorithm and Example Please go and read Section 16.3, page 871, of Theodoridis' book for more. 51 / 55
  • 120. Hard Clustering The Dunn and Dunn-like indices Given a dissimilarity function: d(Ci, Cj) = min_{x∈Ci, y∈Cj} d(x, y) (7) Then it is possible to define the diameter of a cluster diam(C) = max_{x,y∈C} d(x, y) (8) Then the Dunn Index Dm = min_{i=1,...,m} min_{j=i+1,...,m} [ d(Ci, Cj) / max_{k=1,...,m} diam(Ck) ] (9) 53 / 55
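A direct computation of the Dunn index for Euclidean data, following equations (7)-(9); the helper names are mine:

```python
import numpy as np

def dunn_index(clusters):
    """Dunn index: smallest inter-cluster distance divided by the
    largest cluster diameter; clusters are given as (n_k, d) arrays."""
    def min_dist(A, B):
        # Smallest pairwise Euclidean distance between two clusters
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).min()

    def diam(A):
        # Largest pairwise Euclidean distance within one cluster
        return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1).max()

    m = len(clusters)
    inter = min(min_dist(clusters[i], clusters[j])
                for i in range(m) for j in range(i + 1, m))
    return inter / max(diam(A) for A in clusters)
```

Two tight, well-separated clusters produce a large value, matching the claim that compact, well-separated structure drives the index up.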
  • 123. Thus It is possible to prove that If X contains compact and well-separated clusters, Dunn's index will be large Example 54 / 55
  • 125. Although there are more Please look at Chapter 16 of Theodoridis' book for more examples 55 / 55