Introduction to Machine Learning
K-Means, K-Medoids, K-Centers and Variations
Andres Mendez-Vazquez
August 4, 2018
Outline
1 K-Means Clustering
The NP-Hard Problem
K-Means Clustering Heuristic
Convergence Criterion
The Distance Function
Example
Properties of K-Means
K-Means and Principal Component Analysis
2 K-Medoids
Introduction
The Algorithm
Complexity
3 The K-Center Criterion Clustering
Introduction
Re-Stating the K-center as a Clustering Problem
Comparison with K-means
The Greedy K-Center Algorithm
Pseudo-Code
The K-Center Algorithm
Notes on Implementation
Examples
K-Center Algorithm Properties
K-Center Algorithm proof of correctness
4 Variations
Fuzzy Clustering
Rethinking K-Means Cost Function
Using the Lagrange Multipliers
Examples
Pros and Cons of FCM
What can we do? Possibilistic Clustering
Cost Function
The Hardness of K-means clustering
Definition
Given a multiset S ⊆ R^d, an integer k, and L ∈ R, is there a subset T ⊂ R^d with |T| = k such that
$$\sum_{x \in S} \min_{t \in T} \|x - t\|^2 \le L?$$
Theorem
The k-means clustering problem is NP-complete even for d = 2.
Reduction
The reduction is from a known NP-Complete problem:
The Exact Cover by 3-Sets problem
Definition
Given a finite set U containing exactly 3n elements and a collection C = {S1, S2, ..., Sl} of subsets of U, each of which contains exactly 3 elements, are there n sets in C such that their union is U?
However
There are efficient heuristics and approximation algorithms
which can handle this problem in practice
K-Means - Stuart Lloyd (Circa 1957)
History
Invented by Stuart Lloyd at Bell Labs to obtain the best quantization of a signal data set.
Something Notable
The paper was not published until 1982.
Basically, given N vectors x1, ..., xN ∈ R^d
It tries to find K points µ1, ..., µK ∈ R^d that minimize the expression (i.e., over partitions S of the vector points):
$$\sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} \|x_i - \mu_k\|^2 = \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k)$$
K-means clustering
K-means
It is a partitional clustering algorithm.
Definition
Let the set of data points (or instances) D be {x1, · · · , xn}, where $x_i = (x_{i1}, \cdots, x_{ir})^T$:
The K-means algorithm partitions the given data into K clusters.
Each cluster has a cluster center, called centroid.
K is specified by the user.
K-means algorithm
The K-means algorithm works as follows
Given K as the desired number of clusters:
1 Randomly choose K data points (seeds) to be the initial centroids (cluster centers), {v1, · · · , vK}.
2 Assign each data point to the closest centroid:
$$c_i = \arg\min_{j} \; dist(x_i, v_j)$$
3 Re-compute the centroids using the current cluster memberships:
$$v_j = \frac{\sum_{i=1}^{n} I(c_i = j)\, x_i}{\sum_{i=1}^{n} I(c_i = j)}$$
4 If a convergence criterion is not met, go to 2 (a sketch in code follows).
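A minimal NumPy sketch of this loop (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignments and centroid updates."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points as the initial centroids.
    V = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members.
        V_new = np.array([X[c == j].mean(axis=0) if np.any(c == j) else V[j]
                          for j in range(k)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(V_new, V):
            break
        V = V_new
    return c, V
```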
What is the code trying to do?
It is trying to find a partition S
K-means tries to find a partition S such that it minimizes the cost function:
$$\min_{S} \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (1)$$
Where µk is the centroid for cluster Ck:
$$\mu_k = \frac{1}{N_k} \sum_{i:\, x_i \in C_k} x_i \quad (2)$$
Where Nk is the number of samples in the cluster Ck.
What Stopping/convergence criterion should we use?
First
No (or minimum) re-assignments of data points to different clusters.
Second
No (or minimum) change of centroids.
Third
Minimum decrease in the sum of squared error (SSE), where Ck is cluster k and vk is the centroid of cluster Ck:
$$SSE = \sum_{k=1}^{K} \sum_{x \in C_k} dist(x, v_k)^2$$
The distance function
Actually, we have the following distance functions:
Euclidean
$$dist(x, y) = \|x - y\| = \sqrt{(x - y)^T (x - y)}$$
Manhattan
$$dist(x, y) = \|x - y\|_1 = \sum_{i=1}^{n} |x_i - y_i|$$
Mahalanobis
$$dist(x, y) = \|x - y\|_A = \sqrt{(x - y)^T A (x - y)}$$
A sketch in code follows.
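A small NumPy sketch of the three distances (illustrative names; for the Mahalanobis case, A would typically be an inverse covariance matrix):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt((x - y) @ (x - y))

def manhattan(x, y):
    return np.abs(x - y).sum()

def mahalanobis(x, y, A):
    # A is a positive-definite matrix, e.g. an inverse covariance matrix.
    d = x - y
    return np.sqrt(d @ A @ d)
```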
An example
Dropping two possible centroids
An example
Calculate the memberships
An example
We re-calculate centroids
An example
We re-calculate memberships
An example
We re-calculate centroids and keep going
Strengths of K-means
Strengths
Simple: easy to understand and to implement
Efficient: time complexity O(tKN), where N is the number of data points, K is the number of clusters, and t is the number of iterations. Since both K and t are usually small, K-means is considered a linear algorithm.
Popularity
K-means is the most popular clustering algorithm.
Note that
It terminates at a local optimum if SSE is used. The global optimum is
hard to find due to complexity.
Weaknesses of K-means
Important
The algorithm is only applicable if the mean is defined.
For categorical data, K-modes is used instead: the centroid is represented by the most frequent values.
In addition
The user needs to specify K.
Outliers
The algorithm is sensitive to outliers.
Outliers are data points that are very far away from other data points.
Outliers could be errors in the data recording or some special data
points with very different values.
Weaknesses of K-means: Problems with outliers
A series of outliers
Weaknesses of K-means: Problems with outliers
Nevertheless, if you have more dense clusters
Weaknesses of K-means: How to deal with outliers
One method
To remove some data points in the clustering process that are much
further away from the centroids than other data points.
To be safe, we may want to monitor these possible outliers over a few
iterations and then decide to remove them.
Another method
To perform random sampling.
Since in sampling we only choose a small subset of the data points,
the chance of selecting an outlier is very small.
Assign the rest of the data points to the clusters by distance or
similarity comparison, or classification.
Weaknesses of K-means (cont...)
The algorithm is sensitive to initial seeds
Weaknesses of K-means : Different Densities
We have three clusters; nevertheless, the different densities cause problems
Weaknesses of K-means: Non-globular Shapes
Here, we notice that K-means may only detect globular shapes
Weaknesses of K-means: Non-globular Shapes
However, it sometimes works better than expected
Consider the following
Theorem
Every matrix A ∈ Rm×n has an SVD.
Frobenius Matrix Norm
$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2} = \sqrt{\mathrm{trace}(A^T A)}$$
Then, you have the Eckart-Young Theorem
Theorem
Let A be a real m × n matrix. Then for any k ∈ N and any m × m orthogonal projection matrix P of rank k, we have
$$\|A - P_k A\|_F \le \|A - P A\|_F$$
with $P_k = \sum_{i=1}^{k} u_i u_i^T$, where the $u_i$ are the top k left singular vectors of A.
Thus
We have the Covariance matrix
$$S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$$
Therefore, we have the following decomposition
$$S = U \Sigma U^T$$
Where $U U^T = I$ and U is a d × d matrix. A sketch in code follows.
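A small NumPy sketch of this decomposition and the induced rank-k projection (illustrative names; assumes the rows of X are the data points):

```python
import numpy as np

def pca_projection(X, k):
    """Project centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)            # center the data
    S = Xc.T @ Xc / (len(X) - 1)       # covariance matrix
    U, Sigma, _ = np.linalg.svd(S)     # S = U diag(Sigma) U^T
    P_k = U[:, :k] @ U[:, :k].T        # rank-k orthogonal projection P_k
    return Xc @ P_k, U[:, :k]
```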
Orthogonal Projection
Therefore, we have that U defines an orthogonal projection
Given that $U U^T = I$ and $\|U x\| = \|x\|$.
Now, we can re-write k-means:
$$f_{k\text{-means}} = \min_{\mu_1, \dots, \mu_k} \sum_{i \in [n]} \min_{j \in [k]} \|x_i - \mu_j\|^2$$
Then
PCA can also re-write the cost function:
$$f_{PCA} = \min_{P_k} \sum_{i \in [n]} \|x_i - P_k x_i\|^2 = \min_{P_k} \sum_{i \in [n]} \min_{y_i \in P_k} \|x_i - y_i\|^2$$
Where
Pk is a projection onto a subspace of dimension k, and $y \in P_k$ means that $P_k y = y$.
Furthermore
$$\arg\min_{y \in P} \|x - y\| = P x$$
Thus, using the Eckart-Young Theorem
Assume $P_k^*$ contains the k optimal centers
Given that $\mu_j^* \in P_k^*$:
$$f_{k\text{-means}} = \sum_{i \in [n]} \min_{j \in [k]} \|x_i - \mu_j^*\|^2 \ge \sum_{i \in [n]} \min_{y_i \in P_k^*} \|x_i - y_i\|^2 \ge \min_{P_k} \sum_{i \in [n]} \min_{y_i \in P_k} \|x_i - y_i\|^2 = \min_{P_k} \sum_{i \in [n]} \|x_i - P_k x_i\|^2 = f_{PCA}$$
Therefore
Now, consider solving k-means on the points yi instead
They are embedded into dimension exactly k by the projection Pk.
Therefore, given $y_i = P x_i$ and $\mu_j' = P \mu_j$,
where $S^*$ and $\mu^*$ are the optimal assignments and centers of the projected points yi:
$$\sum_{j \in [k]} \sum_{i \in S_j} \|x_i - \mu_j\|^2 \ge \sum_{j \in [k]} \sum_{i \in S_j} \|P x_i - P \mu_j\|^2 = \sum_{j \in [k]} \sum_{i \in S_j} \|y_i - \mu_j'\|^2 \ge \sum_{j \in [k]} \sum_{i \in S_j^*} \|y_i - \mu_j^*\|^2 = f^*_{k\text{-means}}$$
Therefore, your best bet
Steps (a sketch in code follows the list)
1 Compute the PCA of the points xi into dimension k.
2 Solve k-means on the points yi in dimension k.
3 Output the resulting clusters and centers.
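A minimal sketch of this pipeline, reusing the k_means and pca_projection sketches above (illustrative, not the slides' code):

```python
def pca_then_kmeans(X, k):
    # 1. Project the points into dimension k with PCA.
    Y, U_k = pca_projection(X, k)   # Y: centered X projected onto top-k PCs
    # 2. Solve k-means on the projected points.
    labels, centers = k_means(Y, k)
    # 3. Output the resulting clusters and centers.
    return labels, centers
```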
Given that
We have that
$$f_{new} = \sum_{j \in [k]} \sum_{i \in S_j^*} \|x_i - \mu_j^*\|^2 = (*)$$
Therefore, by the fact that $x_i - y_i$ and $y_i - \mu_j^*$ are perpendicular:
$$(*) = \sum_{j \in [k]} \sum_{i \in S_j^*} \left[ \|x_i - y_i\|^2 + \|y_i - \mu_j^*\|^2 \right] = (**)$$
Finally
$$(**) = \sum_{i \in [n]} \|x_i - y_i\|^2 + \sum_{j \in [k]} \sum_{i \in S_j^*} \|y_i - \mu_j^*\|^2$$
Therefore, we have
Something Notable
$$f_{PCA} + f^*_{k\text{-means}} \le 2 f_{k\text{-means}}$$
Until now, we have assumed a Euclidean metric space
Important step
The cluster representatives m1, ..., mK are taken to be the means of the currently assigned clusters.
We can generalize this by using a dissimilarity $D(x_i, x_{i'})$
and an explicit optimization with respect to m1, ..., mK.
Algorithm K-medoids
Step 1
For a given cluster assignment C, find the observation in the cluster minimizing the total distance to the other points in that cluster:
$$i_k^* = \arg\min_{\{i \,:\, C(i) = k\}} \sum_{i':\, C(i') = k} D(x_i, x_{i'})$$
Then $m_k = x_{i_k^*}$, $k = 1, \dots, K$, are the current estimates of the cluster centers.
Now
Step 2
Given a current set of cluster centers m1, ..., mK, minimize the total error by assigning each observation to the closest (current) cluster center:
$$C(i) = \arg\min_{1 \le k \le K} D(x_i, m_k)$$
Iterate over steps 1 and 2
Until the assignments do not change (a sketch in code follows).
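A compact NumPy sketch of these two alternating steps over a precomputed dissimilarity matrix D (illustrative names; this is the alternating scheme above, not PAM with swap moves):

```python
import numpy as np

def k_medoids(D, k, max_iter=100, seed=0):
    """Alternate: best medoid per cluster (Step 1), re-assign points (Step 2).

    D is an (N, N) matrix of pairwise dissimilarities.
    """
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    labels = D[:, medoids].argmin(axis=1)
    for _ in range(max_iter):
        # Step 1: within each cluster, pick the point minimizing the
        # total dissimilarity to the rest of the cluster.
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):
                costs = D[np.ix_(members, members)].sum(axis=1)
                medoids[j] = members[costs.argmin()]
        # Step 2: assign each observation to its closest medoid.
        new_labels = D[:, medoids].argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, medoids
```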
Complexity
Problem: solving the first step has, for each k = 1, ..., K, a complexity of
$$O(N_k^2)$$
Given a set of cluster “centers,” {i1, i2, ..., iK}, computing the new assignments
$$C(i) = \arg\min_{1 \le k \le K} D(x_i, m_k)$$
requires a complexity of O(KN), as before.
Therefore
We have that
K-medoids is more computationally intensive than K-means.
The K-center Problem
The input
It is a set of points with distances represented by a weighted graph
G = (V, V × V ).
Output
Select K centroids in G
such that the maximum distance of every point to its nearest centroid is minimized.
Theorem (In the general case for any distance)
It is NP-hard to approximate the general K-center problem within any
factor α.
Therefore
We change the distance to be constrained by
The Triangle Inequality
Given x, y and z
$$L(x, z) \le L(x, y) + L(y, z)$$
We have a new criterion
Instead of using the K-means criterion
We use the K-center criterion under the triangle inequality.
New Criterion
The K-center criterion partitions the points into K clusters so as to
minimize the maximum distance of any point to its cluster center.
Explanation
Setup
Suppose we have a data set A that contains N objects.
We want the following
We want to partition A into K sets labeled C1, C2, ..., CK.
Now
Define the cluster size for any cluster Ck as follows:
The smallest value D for which all points in this cluster Ck are:
Within distance D of each other,
or within distance D/2 of some point called the cluster center.
Another Way
We have
Another way to define the cluster size:
D is the maximum pairwise distance between an arbitrary pair of
points in the cluster.
Or twice the maximum distance between a data point and a chosen
centroid.
Thus
Denoting the cluster size of Ck by Dk, we define the cluster size of the partition (the way the points are grouped) S by:
$$D = \max_{k=1,\dots,K} D_k \quad (3)$$
In other words
The cluster size of a partition composed of multiple clusters is the maximum size of these clusters.
Comparison with K-means
We use the following distance for comparison
In order to compare these methods, we use Euclidean distance.
In K-means we assume the distance between vectors is the squared
Euclidean distance
K-means tries to find a partition S that minimizes:
$$\min_{S} \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (4)$$
Where µk is the centroid for cluster Ck:
$$\mu_k = \frac{1}{N_k} \sum_{i:\, x_i \in C_k} x_i \quad (5)$$
Now, what about the K-centers
K-center, on the other hand, minimizes the worst-case distance to these centroids:
$$\min_{S} \max_{k=1,\dots,K} \; \max_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (6)$$
Where
µk is called the “centroid”, but may not be the mean vector.
Properties
The above objective function shows that for each cluster, only the
worst scenario matters, that is, the farthest data point to the centroid.
Moreover, among the clusters, only the worst cluster matters: the one whose farthest data point yields the maximum distance to the centroid compared with the farthest data points of the other clusters.
In other words
First
This minimax type of problem is much harder to solve than solving the
objective function of k-means.
Another formulation of K-center minimizes the worst-case pairwise distance instead of using centroids:
$$\min_{S} \max_{k=1,\dots,K} \; \max_{i,j:\, x_i, x_j \in C_k} L(x_i, x_j) \quad (7)$$
With
$L(x_i, x_j)$ denoting any distance between a pair of objects in the same cluster.
Greedy Algorithm for K-Center
Main Idea
The idea behind the Greedy Algorithm is to choose a subset H from the
original dataset S consisting of K points that are farthest apart from each
other.
Intuition
Since the points in set H are far apart, the worst-case scenario has been taken care of, and hopefully the cluster size for the partition is small.
Thus
Each point hk ∈ H represents one cluster or subset of points Ck.
Then
Something Notable
We can think of it as a centroid.
However
Technically it is not a centroid because it tends to be at the boundary of a
cluster, but conceptually we can think of it as a centroid.
Important
The way that we partition these points given the centroids is the same as
in K-means, that is, the nearest-neighbor rule.
Specifically
We do the following
For every point xi, in order to see which cluster Ck it is partitioned into,
we compute its distance to each cluster centroid as follows, and find out
which centroid is the closest:
$$L(x_i, h_k) = \min_{k'=1,\dots,K} L(x_i, h_{k'}) \quad (8)$$
Thus
The centroid with the minimum distance is selected as the cluster for xi.
Important
For K-center clustering
We only need pairwise distance L (xi, xj) for any xi, xj ∈ S.
Where
xi can be a non-vector representation of the objects.
As long as we can calculate
$L(x_i, x_j)$, which makes K-center more general than K-means.
Properties
Something Notable
The greedy algorithm achieves an approximation factor of 2 when the distance measure L satisfies the triangle inequality.
Thus, we have that
$$D^* = \min_{S} \max_{k=1,\dots,K} \; \max_{i,j:\, x_i, x_j \in C_k} L(x_i, x_j) \quad (9)$$
Then, we have the following guarantee for the greedy algorithm:
$$D \le 2 D^* \quad (10)$$
Nevertheless
We have that
K-center does not provide a locally optimal solution.
We only get a solution
guaranteed to be within a certain performance range of the theoretical optimal solution.
Setup
First
Set H denotes the set of cluster centroids or cluster representative objects {h1, ..., hK} ⊂ S.
Second
Let cluster(xi) be the identity of the cluster that xi ∈ S belongs to.
Third
The distance dist(xi) is the distance between xi and its closest cluster representative object (centroid):
$$dist(x_i) = \min_{h_j \in H} L(x_i, h_j) \quad (11)$$
The Main Idea
Something Notable
We always assign xi to the closest centroid. Therefore dist (xi) is the
minimum distance between xi and any centroid.
The Algorithm is Iterative
We generate one cluster centroid first and then add others one by one
until we get K clusters.
The set of centroids H starts with only a single centroid, h1.
Algorithm
Step 1
Randomly select an object xj from S; let h1 = xj, H = {h1}.
It does not matter how xj is selected.
Step 2
For i = 1 to N:
1 dist(xi) = L(xi, h1).
2 cluster(xi) = 1.
In other words
dist(xi) is the computed distance between xi and h1.
Because so far we only have one cluster, we assign cluster label 1 to every xi.
Algorithm
Step 3 (a sketch in code follows)
For i = 2 to K:
1 D = max over xj ∈ S − H of dist(xj)
2 Choose hi ∈ S − H such that dist(hi) == D
3 H = H ∪ {hi}
4 For j = 1 to N:
5   If L(xj, hi) ≤ dist(xj):
6     dist(xj) = L(xj, hi)
7     cluster(xj) = i
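A direct NumPy sketch of Steps 1-3, assuming a precomputed pairwise-distance matrix L (illustrative names, not the slides' code):

```python
import numpy as np

def greedy_k_center(L, K, seed=0):
    """Greedy 2-approximation: repeatedly add the farthest point as a center.

    L is an (N, N) matrix of pairwise distances satisfying the
    triangle inequality.
    """
    rng = np.random.default_rng(seed)
    N = len(L)
    # Step 1: pick the first center arbitrarily.
    centers = [int(rng.integers(N))]
    # Step 2: distances and labels w.r.t. the single center.
    dist = L[:, centers[0]].copy()
    cluster = np.zeros(N, dtype=int)
    # Step 3: repeatedly add the current farthest point as a center.
    for i in range(1, K):
        h = int(dist.argmax())
        centers.append(h)
        closer = L[:, h] <= dist
        dist[closer] = L[closer, h]
        cluster[closer] = i
    return np.array(centers), cluster, dist
```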
Thus
As the algorithm progresses
We gradually add more and more cluster centroids, beginning with 2
until we get to K.
At each iteration, we find, among all of the points not yet included in the set, a worst point:
Worst in the sense that this point has maximum distance to its corresponding centroid.
This worst point
is added to the set H.
To stress again: points already included in H are not under consideration.
Implementation
We can use the following for implementation
The disjoint-set data structure with the following operations:
MakeSet
Find
Union
iterating over the disjoint trees.
Of course, it is necessary to have extra fields for the necessary distances
Thus, each node-element of the sets must have a field dist(xj) such that
$$dist(x_j) = L(x_j, \mathrm{Find}(x_j)) \quad (12)$$
Although
Other things need to be taken into consideration
I will leave them for you to think about
Example
Running the K-center and K-means algorithms shows that, for different densities, K-center is more robust
Example
Decreasing the density of one of the clusters, we see a degradation of the clusters
Using Centroids of the K-center to initialize K-means
Thus, we can use the centroids of K-center to try to improve upon K-means to a certain degree (a sketch follows)
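A sketch of this initialization, reusing greedy_k_center above together with scikit-learn's KMeans (which accepts an explicit init array); the names are illustrative:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def kcenter_init_kmeans(X, k):
    # Use the greedy K-center centroids as the initial K-means seeds.
    L = cdist(X, X)                        # pairwise Euclidean distances
    centers, _, _ = greedy_k_center(L, k)
    km = KMeans(n_clusters=k, init=X[centers], n_init=1)
    return km.fit_predict(X)
```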
The Running Time
We have that
The running time of the algorithm is O (KN), where K is the number of
clusters generated and N is the size of the data set.
Why?
Because K-center only requires pairwise distance between any point and
the centroids.
Main Bound of the K-center algorithm
Lemma
Given a distance measure L satisfying the triangle inequality:
if the partition obtained by the greedy algorithm is S and the optimal partition is S∗, with cluster size D for S and D∗ for S∗, then
$$D \le 2 D^* \quad (13)$$
Proof
If we look only at the first j centroids
They generate a partition with size Dj, where:
The cluster size is the size of the biggest cluster in the current partition.
The size of every cluster is defined as the maximum distance between a point and the cluster centroid.
Thus
$$D_1 \ge D_2 \ge D_3 \ge \dots \quad (14)$$
Why?
Because $D = \max_{x_j \in S - H} dist(x_j)$, together with lines 5-7 in Step 3 and induction!!!
You can prove that part by yourselves.
Graphically
It is easy to see this when the inner loop changes the distances.
Now
It is necessary to prove
$$\forall i < j, \quad L(h_i, h_j) \ge D_{j-1} \quad (15)$$
Thus
$D_{j-1}$ is a lower bound for the distance between $h_i$ and $h_j$.
Proof of the Previous Statement
It is possible to see that
$$L(h_{j-2}, h_j) \ge L(h_{j-1}, h_j) \quad (16)$$
How?
Assume it is not true. Then $L(h_{j-2}, h_j) < L(h_{j-1}, h_j)$.
Then
We have that, given the partition S generated by the greedy algorithm,
$h_{j-2} \in C_{j-2}$ and $h_{j-2} \in C_j$ for $C_{j-2}, C_j \in S$.
This is a contradiction, because $h_{j-2}$ is generated by the algorithm in such a way that it cannot be in any other cluster!!!
Thus, iteratively:
$$L(h_1, h_j) \ge L(h_2, h_j) \ge L(h_3, h_j) \ge \dots \ge L(h_{j-1}, h_j) = D_{j-1} \quad (17)$$
Not only that
Not only that:
$$\forall j, \; \exists i < j : \; L(h_i, h_j) = D_{j-1} \quad (18)$$
Therefore
$D_{j-1}$ is not only a lower bound for the distance between $h_i$ and $h_j$; for a specific i, it is attained exactly.
If we continue with the proof
Now
Let us consider the optimal partition S∗ with K clusters and its size D∗.
Suppose the greedy algorithm generates the centroids H = {h1, h2, ..., hK}.
For the proof, we add one more, hK+1. This can be done without loss of generality.
By the pigeonhole principle, at least two of the centroids among {h1, h2, ..., hK, hK+1} will fall into one cluster k of the partition S∗.
Thus, assume
$1 \le i < k < j \le K + 1$ ⇒ using the triangle inequality:
$$L(h_i, h_j) \le L(h_i, h_k) + L(h_k, h_j) \le D^* + D^* = 2 D^* \quad (19)$$
Then
In addition
Also $L(h_i, h_j) \ge D_{j-1} \ge D_k$, so $D_k \le 2 D^*$.
Given S, the partition generated by the greedy algorithm
We define ∆ as
$$\Delta = \max_{x_j \in S - H} \; \min_{h_k \in H} L(x_j, h_k) \quad (20)$$
Basically
The maximum, over all points that are not centroids, of the minimum distance to some centroid in the partition generated by the greedy algorithm.
Now, we are ready for the final part
Let hK+1 be an element in S − H
Such that
$$\min_{h_k \in H} L(h_{K+1}, h_k) = \Delta \quad (21)$$
By definition
$$L(h_{K+1}, h_k) \ge \Delta, \quad \forall k = 1, \dots, K \quad (22)$$
Thus, we have the following sets
Let $H_k = \{h_1, \dots, h_k\}$ with $k = 1, 2, \dots, K$.
Now
Consider the distance between hi and hj for i < j ≤ K
According to the greedy algorithm:
$$\min_{h_k \in H_{j-1}} L(h_j, h_k) \ge \min_{h_k \in H_{j-1}} L(x_l, h_k) \quad \text{for any } x_l \in S - H_j$$
Basically, remember that the hi are obtained by finding the farthest points.
Now, we have
Since $h_{K+1} \in S - H$ and $S - H \subset S - H_j$:
$$L(h_j, h_i) \ge \min_{h_k \in H_{j-1}} L(h_j, h_k) \ge \min_{h_k \in H_{j-1}} L(h_{K+1}, h_k) \ge \min_{h_k \in H} L(h_{K+1}, h_k) = \Delta$$
We have shown that for any $i < j \le K + 1$:
$$L(h_j, h_i) \ge \Delta \quad (23)$$
Now
Consider the optimal partition $S^* = \{C_1^*, C_2^*, \dots, C_K^*\}$
Thus, at least 2 of the K + 1 elements h1, h2, ..., hK+1 will be covered by one cluster.
Assume that
hi and hj belong to the same cluster in S∗. Then $L(h_i, h_j) \le D^*$.
In addition
Since $L(h_i, h_j) \ge \Delta$, we have $\Delta \le D^*$.
In addition
Consider elements xm and xn in any cluster represented by hk:
$$L(x_m, h_k) \le \Delta \;\text{ and }\; L(x_n, h_k) \le \Delta \quad (24)$$
By the Triangle Inequality:
$$L(x_m, x_n) \le L(x_m, h_k) + L(x_n, h_k) \le 2\Delta \quad (25)$$
Finally, there are two elements xm and xn in a cluster such that D = L(xm, xn), so:
$$D = \max_{k} D_k \le 2\Delta \le 2 D^* \quad (26)$$
Some of the Fuzzy Clustering Models
Fuzzy Clustering Model
Bezdek, 1981
Possibilistic Clustering Model
Krishnapuram - Keller, 1993
Fuzzy Possibilistic Clustering Model
N. Pal - K. Pal - Bezdek, 1997
Fuzzy C-Means Clustering
The input: an unlabeled data set
X = {x1, x2, x3, ..., xN}, with xk ∈ R^p.
Output
A partition S of X as a matrix U of size C × N.
A set of cluster centers V = {v1, v2, ..., vC} ⊂ R^p.
What we want
Creation of the Cost Function
First:
We can use a distance defined as
$$\|x_k - v_i\| = \sqrt{(x_k - v_i)^T (x_k - v_i)} \quad (27)$$
the Euclidean distance from a point k to a centroid i.
NOTE: other distances, such as the Mahalanobis distance, can be taken into consideration.
Do you remember the cost function for K-means?
Finding a partition S that minimizes the following function:
$$\min_{S} \sum_{i=1}^{C} \sum_{k:\, x_k \in C_i} \|x_k - v_i\|^2 \quad (28)$$
Where $v_i = \frac{1}{N_i} \sum_{x_k \in C_i} x_k$.
We can rewrite the previous equation as
$$\min_{S} \sum_{k=1}^{N} \sum_{i=1}^{C} I(x_k \in C_i) \, \|x_k - v_i\|^2 \quad (29)$$
In addition
Did you notice that the membership is always one or zero?
$$\min_{S} \sum_{k=1}^{N} \sum_{i=1}^{C} \underbrace{I(x_k \in C_i)}_{\text{Membership}} \, \|x_k - v_i\|^2 \quad (30)$$
Thus, we can rethink the membership using something “Fuzzy”
What if we modify the cost function to something like this?
$$\min_{S} \sum_{k=1}^{N} \sum_{i=1}^{C} \underbrace{\text{Fuzzy Value}}_{\text{Membership}} \, \|x_k - v_i\|^2 \quad (31)$$
This means that we think of each cluster Ci as “Fuzzy”
We can assume a fuzzy set for the cluster Ci with membership function
$$A_i : \mathbb{R}^p \to [0, 1] \quad (32)$$
such that we can tune it by using a power, i.e. decreasing it by raising it to a power m.
Under the following constraints
First
$$A_i(x_k) \in [0, 1] \quad \forall i, k \quad (33)$$
Second
$$0 < \sum_{k=1}^{N} A_i(x_k) < N \quad \forall i \quad (34)$$
Third
$$\sum_{i=1}^{C} A_i(x_k) = 1 \quad \forall k \quad (35)$$
Final Cost Function
Properties
$$J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \, \|x_k - v_i\|^2 \quad (36)$$
Under the constraints
$A_i(x_k) \in [0, 1]$, for $1 \le k \le N$ and $1 \le i \le C$.
$\sum_{i=1}^{C} A_i(x_k) = 1$, for $1 \le k \le N$.
$0 < \sum_{k=1}^{N} A_i(x_k) < N$, for $1 \le i \le C$.
$m > 1$.
Using the Lagrange Multipliers
New cost function
$$\bar{J}_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \, \|x_k - v_i\|^2 - \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{C} A_i(x_k) - 1 \right) \quad (37)$$
Derive with respect to $A_i(x_k)$:
$$\frac{\partial \bar{J}_m(S)}{\partial A_i(x_k)} = m \, A_i(x_k)^{m-1} \, \|x_k - v_i\|^2 - \lambda_k = 0 \quad (38)$$
Thus
$$A_i(x_k) = \left( \frac{\lambda_k}{m \, \|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} \quad (39)$$
Using the Lagrange Multipliers
Sum over all i’s, using the constraint that the memberships add up to one:
$$\sum_{i=1}^{C} A_i(x_k) = \left( \frac{\lambda_k}{m} \right)^{\frac{1}{m-1}} \sum_{i=1}^{C} \left( \frac{1}{\|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} = 1 \quad (40)$$
Thus
$$\lambda_k = \frac{m}{\left[ \sum_{i=1}^{C} \left( \frac{1}{\|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} \right]^{m-1}} \quad (41)$$
Plugging back into equation 38, using j instead of i:
$$\frac{m}{\left[ \sum_{j=1}^{C} \left( \frac{1}{\|x_k - v_j\|^2} \right)^{\frac{1}{m-1}} \right]^{m-1}} = m \, A_i(x_k)^{m-1} \, \|x_k - v_i\|^2 \quad (42)$$
Finally
We have that
$$A_i(x_k) = \frac{1}{\sum_{j=1}^{C} \left( \frac{\|x_k - v_i\|^2}{\|x_k - v_j\|^2} \right)^{\frac{1}{m-1}}} \quad (43)$$
In a similar way, we have
$$v_i = \frac{\sum_{k=1}^{N} A_i(x_k)^m \, x_k}{\sum_{k=1}^{N} A_i(x_k)^m} \quad (44)$$
Final Algorithm
Fuzzy c-means
1 Let t = 0. Select an initial fuzzy pseudo-partition.
2 Calculate the C cluster centers using
$$v_i^{(t)} = \frac{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m \, x_k}{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m}.$$
3 Update for each xk the membership function:
Case I: if $\|x_k - v_i^{(t)}\|^2 > 0$ for all $i \in \{1, 2, \dots, C\}$, then
$$A_i^{(t+1)}(x_k) = \frac{1}{\sum_{j=1}^{C} \left( \frac{\|x_k - v_i^{(t)}\|^2}{\|x_k - v_j^{(t)}\|^2} \right)^{\frac{1}{m-1}}}$$
Case II: if $\|x_k - v_i^{(t)}\|^2 = 0$ for some $i \in I \subseteq \{1, 2, \dots, C\}$, then define $A_i^{(t+1)}(x_k)$ by any nonnegative numbers such that $\sum_{i \in I} A_i^{(t+1)}(x_k) = 1$ and $A_i^{(t+1)}(x_k) = 0$ for $i \notin I$.
4 If $\max_{i,k} \left| A_i^{(t+1)}(x_k) - A_i^{(t)}(x_k) \right| \le \epsilon$ for a chosen threshold $\epsilon$, stop; otherwise increase t and go to step 2.
A sketch in code follows.
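A compact NumPy sketch of this loop (illustrative names; Case II is handled by a small epsilon guard rather than the explicit case split):

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, tol=1e-4, max_iter=300, seed=0):
    """Alternate the centroid update (44) and the membership update (43)."""
    rng = np.random.default_rng(seed)
    U = rng.random((C, len(X)))
    U /= U.sum(axis=0)                      # each column sums to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)            # eq. (44)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # guard for Case II
        U_new = 1.0 / (d2 ** (1.0 / (m - 1)))
        U_new /= U_new.sum(axis=0)                            # eq. (43)
        if np.abs(U_new - U).max() <= tol:
            return U_new, V
        U = U_new
    return U, V
```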
Final Output
The Matrix U
The elements of U are Uik = Ai (xk).
The centroids
V = {v1, v2, ..., vC}
Example
Here: the clustering of two Gaussian clusters with $\mu_1 = (4, 0)^T$, $\mu_2 = (10, 0)^T$ and variance 1.0.
[Scatter plot, points colored by membership label from 0.3 to 1.1]
Example
Here the clustering of two Gaussian Clusters
Example
Here the clustering of two Gaussian Clusters
Example
Here the clustering of two Gaussian Clusters
Pros and Cons of Fuzzy C-Means
Advantages
Unsupervised
Always converges
Disadvantages
Long computational time
Sensitivity to the initial guess (speed, local minima)
Sensitivity to noise
One expects low (or even no) membership degree for outliers (noisy
points)
Outliers, Disadvantage of FCM
After running without outliers
Outliers, Disadvantage of FCM
Now add outliers (shown as blue ×'s) and their high memberships
Outliers, Disadvantage of FCM
Now add outliers (shown as blue ×'s) and their high memberships, from another angle
Krishnapuram and Keller
Following Zadeh
They took into consideration each class prototype as defining an elastic constraint.
What?
Giving $t_i(x_k)$ as the degree of compatibility of sample xk with cluster Ci.
We do the following
We consider the Ci as fuzzy sets over the set of samples X = {x1, x2, ..., xN}.
Here is the Catch!!!
We should not use the old membership constraint
$$\sum_{i=1}^{C} A_i(x_k) = 1 \quad (45)$$
Because
this is quite probabilistic... which is not what we want!!!
Thus
We only ask for the membership, now using the possibilistic notation $t_i(x_k)$ (known as the typicality value), to be in the interval [0, 1].
New Constraints
First
$$t_i(x_k) \in [0, 1] \quad \forall i, k \quad (46)$$
Second
$$0 < \sum_{k=1}^{N} t_i(x_k) < N \quad \forall i \quad (47)$$
Third
$$\max_{i} t_i(x_k) > 0 \quad \forall k \quad (48)$$
We have the following cost function
Cost Function
$$\sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \, \|x_k - v_i\|^2 \quad (49)$$
Problem
Unconstrained optimization of this first term leads to the trivial solution $t_i(x_k) = 0$ for all i, k.
Thus, we introduce a pressure toward
$$t_i(x_k) \to 1 \quad (50)$$
Roughly, it means making the typicality values as large as possible.
We can try to control this tendency
By putting them all together in
$$\sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (51)$$
with m controlling the tendency of $t_i(x_k) \to 1$.
We can also apply this tendency over all the clusters using a suitable weight $w_i > 0$ per cluster:
$$\sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (52)$$
Possibilistic C-Means Clustering (PCM)
The final Cost Function
$$J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \, \|x_k - v_i\|^2 + \sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (53)$$
Where
$t_i(x_k)$ are typicality values.
$w_i$ are cluster weights.
Explanation
First Term
$$\sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \, \|x_k - v_i\|^2 \quad (54)$$
It demands that the distances from the feature vectors to the prototypes be as small as possible!!!
Second Term
$$\sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (55)$$
It forces the typicality values $t_i(x_k)$ to be as large as possible.
Final Updating Equations
Typicality Values
$$t_i(x_k) = \frac{1}{1 + \left( \frac{\|x_k - v_i\|^2}{w_i} \right)^{\frac{1}{m-1}}}, \quad \forall i, k \quad (56)$$
Cluster Centers
$$v_i = \frac{\sum_{k=1}^{N} t_i(x_k)^m \, x_k}{\sum_{k=1}^{N} t_i(x_k)^m} \quad (57)$$
Final Updating Equations
Weights
$$w_i = M \, \frac{\sum_{k=1}^{N} [t_i(x_k)]^m \, \|x_k - v_i\|^2}{\sum_{k=1}^{N} [t_i(x_k)]^m}, \quad (58)$$
with M > 0. A sketch in code follows.
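A minimal NumPy sketch of updates (56)-(58) (illustrative names; the typicalities are seeded from the fuzzy_c_means sketch above since, as noted later, PCM is very sensitive to initialization):

```python
import numpy as np

def possibilistic_c_means(X, C, m=2.0, M=1.0, tol=1e-4, max_iter=300):
    """Iterate the PCM updates (56)-(58), seeded by fuzzy C-means."""
    T, V = fuzzy_c_means(X, C, m=m)   # FCM memberships as initial typicalities
    for _ in range(max_iter):
        Tm = T ** m
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        # Eq. (58): per-cluster weights from the current typicalities.
        w = M * (Tm * d2).sum(axis=1) / Tm.sum(axis=1)
        # Eq. (56): typicality update.
        T_new = 1.0 / (1.0 + (d2 / w[:, None]) ** (1.0 / (m - 1)))
        # Eq. (57): centroid update.
        Tm = T_new ** m
        V = (Tm @ X) / Tm.sum(axis=1, keepdims=True)
        delta = np.abs(T_new - T).max()
        T = T_new
        if delta <= tol:
            break
    return T, V, w
```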
Possibilistic can deal with outliers
Two Gaussian clusters with $\mu_1 = (4, 0)^T$, $\mu_2 = (10, 0)^T$ and $\sigma^2 = 1.0$
Possibilistic can deal with outliers
Now add outliers
Possibilistic can deal with outliers
Another Angle
Possibilistic can deal with outliers
Another Angle
Pros and Cons of Possibilistic C-Means
Advantages
Clustering noisy data samples.
Disadvantages
Very sensitive to good initialization.
For example, I needed to run fuzzy C-means to obtain good initial
typicalities
In Between!!!
Coincident clusters may result.
Because the columns and rows of the typicality matrix are
independent of each other.
This could be advantageous (start with a large value of C and get fewer distinct clusters).
Nevertheless
There are more advanced clustering methods based on the
possibilistic and fuzzy idea
Pal, N.R., Pal, K., Keller, J.M., and Bezdek, J.C., “A Possibilistic Fuzzy c-Means Clustering Algorithm,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 517-530, Aug. 2005.
  • 9. Images/cinvestav- What is the code trying to do? It is trying to find a partition S K-means tries to find a partition S such that it minimizes the cost function: min S N k=1 i:xi∈Ck (xi − µk)T (xi − µk) (1) Where µk is the centroid for cluster Ck µk = 1 Nk i:xi∈Ck xi (2) Where Nk is the number of samples in the cluster Ck. 11 / 142
  • 10. Images/cinvestav- What Stopping/convergence criterion should we use? First No (or minimum) re-assignments of data points to different clusters. Second No (or minimum) change of centroids. Third Minimum decrease in the sum of squared error (SSE), Ck is cluster k. vk is the centroid of cluster Ck. SSE = K k=1 x∈ck dist (x, vk)2 13 / 142
• 11. The distance function
Actually, we have the following distance functions:
Euclidean: $dist(x, y) = \|x - y\| = \sqrt{(x - y)^T (x - y)}$
Manhattan: $dist(x, y) = \|x - y\|_1 = \sum_{i=1}^{n} |x_i - y_i|$
Mahalanobis: $dist(x, y) = \|x - y\|_A = \sqrt{(x - y)^T A (x - y)}$
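As a quick illustration, here is a minimal NumPy sketch of the three distances. The matrix A in the Mahalanobis case is an assumption for the example (any symmetric positive-definite matrix works; the inverse sample covariance is a common choice).

import numpy as np

def euclidean(x, y):
    # ||x - y|| = sqrt((x - y)^T (x - y))
    return np.sqrt((x - y) @ (x - y))

def manhattan(x, y):
    # ||x - y||_1 = sum_i |x_i - y_i|
    return np.sum(np.abs(x - y))

def mahalanobis(x, y, A):
    # ||x - y||_A = sqrt((x - y)^T A (x - y)), A symmetric positive-definite
    d = x - y
    return np.sqrt(d @ A @ d)

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
# Illustrative A: inverse covariance of a random 2-d sample
A = np.linalg.inv(np.cov(np.random.default_rng(0).normal(size=(2, 100))))
print(euclidean(x, y), manhattan(x, y), mahalanobis(x, y, A))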
• 12. An example
Dropping two possible centroids. [Figure: initial centroid placement.]
• 16. An example
We re-calculate the centroids and keep going. [Figure: updated centroids after reassignment.]
• 17. Strengths of K-means
Simple: easy to understand and to implement.
Efficient: time complexity O(tKN), where N is the number of data points, K is the number of clusters, and t is the number of iterations. Since both K and t are usually small, K-means is considered a linear algorithm.
Popularity: K-means is the most popular clustering algorithm.
Note that it terminates at a local optimum if SSE is used; the global optimum is hard to find due to the complexity of the problem.
• 18. Weaknesses of K-means
Important: the algorithm is only applicable if the mean is defined. For categorical data, use K-mode, where the centroid is represented by the most frequent values.
In addition, the user needs to specify K.
Outliers: the algorithm is sensitive to outliers. Outliers are data points that are very far away from other data points; they could be errors in the data recording or special data points with very different values.
• 19. Weaknesses of K-means: problems with outliers
[Figure: a dataset with a series of outliers and the resulting distorted clustering.]
• 20. Weaknesses of K-means: problems with outliers
Nevertheless, the effect changes if you have denser clusters. [Figure: the same outliers with denser clusters.]
• 21. Weaknesses of K-means: how to deal with outliers
One method: remove data points during the clustering process that are much further away from the centroids than the other data points. To be safe, we may want to monitor these possible outliers over a few iterations and only then decide to remove them. See the sketch below.
Another method: perform random sampling. Since sampling only chooses a small subset of the data points, the chance of selecting an outlier is very small. Assign the rest of the data points to the clusters by distance or similarity comparison, or by classification.
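A rough sketch of the first method, under the assumption that "much further away" means more than some multiple of the median point-to-centroid distance; the factor 3.0 below is an illustrative choice, not from the slides.

import numpy as np

def flag_outliers(X, centroids, labels, factor=3.0):
    # Distance of every point to its assigned centroid
    d = np.linalg.norm(X - centroids[labels], axis=1)
    # Flag points much further away than the typical (median) distance
    return d > factor * np.median(d)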
• 22. Weaknesses of K-means (cont.)
The algorithm is sensitive to the initial seeds. [Figure: two different seedings leading to different partitions.]
• 23. Weaknesses of K-means: different densities
We have three clusters; nevertheless, K-means partitions them poorly when their densities differ. [Figure: three clusters of different densities.]
• 24. Weaknesses of K-means: non-globular shapes
Here, we notice that K-means may only detect globular shapes. [Figure: interleaved non-globular clusters mis-clustered by K-means.]
• 25. Weaknesses of K-means: non-globular shapes
However, it sometimes works better than expected. [Figure: a non-globular dataset where K-means does reasonably well.]
• 26. Consider the following
Theorem. Every matrix $A \in R^{m \times n}$ has an SVD.
Frobenius matrix norm:
$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2} = \sqrt{trace(A^T A)}$
• 27. Then, you have the Eckart–Young Theorem
Theorem. Let A be a real m × n matrix. Then for any $k \in N$ and any m × m orthogonal projection matrix P of rank k, we have
$\|A - P_k A\|_F \le \|A - P A\|_F$, with $P_k = \sum_{i=1}^{k} u_i u_i^T$
• 28. Thus
We have the covariance matrix
$S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$
Therefore, we have the decomposition $S = U \Sigma U^T$, where $U U^T = I$ and U is a d × d matrix.
• 29. Orthogonal projection
Therefore, we have that U is an orthogonal matrix, given that $U U^T = I$ and $\|Ux\| = \|x\|$.
Now, we can re-write k-means as
$f_{k\text{-means}} = \min_{\mu_1, ..., \mu_k} \sum_{i \in [n]} \min_{j \in [k]} \|x_i - \mu_j\|^2$
• 30. Then, PCA can also re-write the cost function
$f_{PCA} = \min_{P_k} \sum_{i \in [n]} \|x_i - P_k x_i\|^2 = \min_{P_k} \sum_{i \in [n]} \min_{y_i \in P_k} \|x_i - y_i\|^2$
where $P_k$ is a projection into dimension k and $y \in P_k$ means that $P_k y = y$. Furthermore,
$\arg\min_{y \in P} \|x - y\| = Px$
• 31. Thus, using the Eckart–Young Theorem
Assume $P_k^*$ is the projection that contains the k optimal centers, i.e., $\mu_j^* \in P_k^*$. Then
$f_{k\text{-means}} = \sum_{i \in [n]} \min_{j \in [k]} \|x_i - \mu_j^*\|^2 \ge \sum_{i \in [n]} \min_{y_i \in P_k^*} \|x_i - y_i\|^2 \ge \min_{P_k} \sum_{i \in [n]} \min_{y_i \in P_k} \|x_i - y_i\|^2 = \min_{P_k} \sum_{i \in [n]} \|x_i - P_k x_i\|^2 = f_{PCA}$
• 32. Therefore
Now, consider solving k-means on the points $y_i = P_k x_i$ instead; they are embedded into dimension exactly k by the projection $P_k$. With $\mu_j' = P_k \mu_j$, for any partition S with centers $\mu_j$:
$\sum_{j \in [k]} \sum_{i \in S_j} \|x_i - \mu_j\|^2 \ge \sum_{j \in [k]} \sum_{i \in S_j} \|P_k x_i - P_k \mu_j\|^2 = \sum_{j \in [k]} \sum_{i \in S_j} \|y_i - \mu_j'\|^2 \ge f^*_{k\text{-means}}$
where $f^*_{k\text{-means}}$ is the optimal k-means cost on the projected points $y_i$.
• 33. Therefore, your best bet
Steps:
1. Compute the PCA of the points $x_i$ into dimension k.
2. Solve k-means on the points $y_i$ in dimension k.
3. Output the resulting clusters and centers.
A sketch of this pipeline appears below.
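A minimal sketch of these steps, assuming scikit-learn is available; its PCA and KMeans estimators are used only for convenience, and a plain SVD plus Lloyd's iterations would serve equally well. The synthetic data is illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic clusters in dimension 10 (illustrative data)
X = np.vstack([rng.normal(m, 1.0, size=(100, 10)) for m in (0.0, 5.0, 10.0)])

k = 3
Y = PCA(n_components=k).fit_transform(X)    # Step 1: project into dimension k
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)  # Step 2
print(np.bincount(labels))                  # Step 3: clusters carry back to the x_i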
• 34. Given that
We have that
$f_{new} = \sum_{j \in [k]} \sum_{i \in S_j^*} \|x_i - \mu_j^*\|^2 = (*)$
Therefore, by the fact that $x_i - y_i$ and $y_i - \mu_j^*$ are perpendicular,
$(*) = \sum_{j \in [k]} \sum_{i \in S_j^*} \left( \|x_i - y_i\|^2 + \|y_i - \mu_j^*\|^2 \right) = (**)$
Finally,
$(**) = \sum_{i \in [n]} \|x_i - y_i\|^2 + \sum_{j \in [k]} \sum_{i \in S_j^*} \|y_i - \mu_j^*\|^2$
• 35. Therefore, we have
Something notable:
$f_{PCA} + f^*_{k\text{-means}} \le 2 f_{k\text{-means}}$
• 36. Until now, we have assumed a Euclidean metric space
Important step: the cluster representatives $m_1, ..., m_K$ are taken to be the means of the currently assigned clusters. We can generalize this by using a dissimilarity $D(x_i, x_{i'})$ and an explicit optimization with respect to $m_1, ..., m_K$.
• 37. Algorithm: K-medoids
Step 1. For a given cluster assignment C, find the observation in each cluster minimizing the total distance to the other points in that cluster:
$i_k^* = \arg\min_{\{i | C(i) = k\}} \sum_{C(i') = k} D(x_i, x_{i'})$
Then $m_k = x_{i_k^*}$, k = 1, ..., K, are the current estimates of the cluster centers.
• 38. Now
Step 2. Given a current set of cluster centers $m_1, ..., m_K$, minimize the total error by assigning each observation to the closest (current) cluster center:
$C(i) = \arg\min_{1 \le k \le K} D(x_i, m_k)$
Iterate over steps 1 and 2 until the assignments do not change.
• 39. Complexity
Problem: solving the first step has a complexity of $O(N_k^2)$ for each k = 1, ..., K.
Given a set of cluster “centers” $\{i_1, i_2, ..., i_K\}$, computing the new assignments $C(i) = \arg\min_{1 \le k \le K} D(x_i, m_k)$ requires a complexity of O(KN), as before.
• 40. Therefore
We have that K-medoids is more computationally intensive than K-means. A minimal sketch of the algorithm appears below.
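A minimal NumPy sketch of the two alternating steps, using the Euclidean distance as the dissimilarity D purely for illustration (any pairwise dissimilarity matrix would do).

import numpy as np

def k_medoids(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    # Precompute all pairwise dissimilarities D(x_i, x_i')
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(N, size=K, replace=False)
    for _ in range(iters):
        # Step 2: assign every observation to the closest medoid
        C = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for k in range(K):
            members = np.where(C == k)[0]
            if len(members) > 0:
                # Step 1: O(N_k^2) search for the point minimizing
                # total distance to the other points of the cluster
                new[k] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, C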
• 41. The K-center problem
Input: a set of points with distances, represented by a weighted complete graph G = (V, V × V).
Output: select K centroids in G such that the maximum distance of every point to its closest centroid is minimized.
Theorem (in the general case, for any distance). It is NP-hard to approximate the general K-center problem within any factor α.
• 42. Therefore
We constrain the distance to satisfy the triangle inequality: given x, y and z,
$L(x, z) \le L(x, y) + L(y, z)$
• 43. We have a new criterion
Instead of using the K-means criterion, we use the K-center criterion under the triangle inequality.
New criterion: the K-center criterion partitions the points into K clusters so as to minimize the maximum distance of any point to its cluster center.
• 44. Explanation
Setup: suppose we have a data set A that contains N objects. We want to partition A into K sets labeled $C_1, C_2, ..., C_K$.
Now define a cluster size for any cluster $C_k$: the smallest value D for which all points in $C_k$ are
within distance D of each other, or
within distance D/2 of some point called the cluster center.
• 45. Another way
Another way to define the cluster size: D is the maximum pairwise distance between an arbitrary pair of points in the cluster, or twice the maximum distance between a data point and a chosen centroid.
Thus, denoting the cluster size of $C_k$ by $D_k$, the cluster size of a partition S (the way the points are grouped) is
$D = \max_{k=1,...,K} D_k \quad (3)$
In other words, the cluster size of a partition composed of multiple clusters is the maximum size of these clusters.
• 46. Comparison with K-means
In order to compare these methods, we use the Euclidean distance. In K-means we assume the distance between vectors is the squared Euclidean distance; K-means tries to find a partition S that minimizes
$\min_S \sum_{k=1}^{K} \sum_{i: x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (4)$
where $\mu_k$ is the centroid for cluster $C_k$:
$\mu_k = \frac{1}{N_k} \sum_{i: x_i \in C_k} x_i \quad (5)$
• 47. Now, what about the K-centers?
K-center, on the other hand, minimizes the worst-case distance to these centroids:
$\min_S \max_{k=1,...,K} \max_{i: x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k) \quad (6)$
where $\mu_k$ is called the “centroid,” but may not be the mean vector.
Properties: this objective function shows that for each cluster, only the worst scenario matters, that is, the farthest data point from the centroid. Moreover, among the clusters, only the worst cluster matters: the one whose farthest data point yields the maximum distance to its centroid compared with the farthest data points of the other clusters.
• 48. In other words
First, this minimax type of problem is much harder to solve than the objective function of K-means.
Another formulation of K-center minimizes the worst-case pairwise distance instead of using centroids:
$\min_S \max_{k=1,...,K} \max_{i,j: x_i, x_j \in C_k} L(x_i, x_j) \quad (7)$
where $L(x_i, x_j)$ denotes any distance between a pair of objects in the same cluster. The sketch below evaluates both objectives for a given partition.
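For concreteness, a small sketch, assuming Euclidean distance, that evaluates the K-means objective (4) and the K-center objective (6) for a given partition. Since squaring is monotone, the maximum distance and the maximum squared distance have the same minimizer.

import numpy as np

def kmeans_cost(X, labels, centers):
    # Eq. (4): total squared distance to the assigned centroids
    return sum(((X[labels == k] - c) ** 2).sum() for k, c in enumerate(centers))

def kcenter_cost(X, labels, centers):
    # Eq. (6), up to squaring: worst-case distance to the assigned centroid
    return max(np.linalg.norm(X[labels == k] - c, axis=1).max()
               for k, c in enumerate(centers) if np.any(labels == k))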
• 49. Greedy algorithm for K-center
Main idea: choose a subset H of the original dataset S consisting of K points that are farthest apart from each other.
Intuition: since the points in H are far apart, the worst-case scenario has been taken care of, and hopefully the cluster size of the partition is small.
Thus, each point $h_k \in H$ represents one cluster or subset of points $C_k$.
• 50. Then
Something notable: we can think of $h_k$ as a centroid.
However, technically it is not a centroid, because it tends to lie at the boundary of a cluster; conceptually, though, we can treat it as one.
Important: the way we partition the points given the centroids is the same as in K-means, that is, the nearest-neighbor rule.
• 51. Specifically
For every point $x_i$, in order to see which cluster $C_k$ it is partitioned into, we compute its distance to each cluster centroid and find the closest one:
$L(x_i, h_k) = \min_{k'=1,...,K} L(x_i, h_{k'}) \quad (8)$
Whichever centroid attains the minimum distance determines the cluster for $x_i$.
• 52. Important
For K-center clustering, we only need the pairwise distance $L(x_i, x_j)$ for any $x_i, x_j \in S$, where $x_i$ can be a non-vector representation of the objects. As long as we can calculate $L(x_i, x_j)$, the method applies, which makes K-center more general than K-means.
• 53. Properties
Something notable: the greedy algorithm achieves an approximation factor of 2 when the distance measure L satisfies the triangle inequality. Thus, with
$D^* = \min_S \max_{k=1,...,K} \max_{i,j: x_i, x_j \in C_k} L(x_i, x_j) \quad (9)$
we have the following guarantee for the greedy algorithm:
$D \le 2D^* \quad (10)$
• 54. Nevertheless
K-center does not provide a locally optimal solution. We can only get a solution guaranteed to be within a certain performance range of the theoretical optimal solution.
• 55. Setup
First, the set H denotes the set of cluster centroids, or cluster representative objects, $\{h_1, ..., h_K\} \subset S$.
Second, let cluster($x_i$) be the identity of the cluster that $x_i \in S$ belongs to.
Third, the distance dist($x_i$) is the distance between $x_i$ and its closest cluster representative object (centroid):
$dist(x_i) = \min_{h_j \in H} L(x_i, h_j) \quad (11)$
• 56. The main idea
Something notable: we always assign $x_i$ to the closest centroid; therefore, dist($x_i$) is the minimum distance between $x_i$ and any centroid.
The algorithm is iterative: we generate one cluster centroid first and then add others one by one until we get K clusters. The set of centroids H starts with only a single centroid, $h_1$.
• 57. Algorithm
Step 1. Randomly select an object $x_j$ from S, let $h_1 = x_j$, $H = \{h_1\}$. It does not matter how $x_j$ is selected.
Step 2. For j = 1 to N:
1. $dist(x_j) = L(x_j, h_1)$
2. $cluster(x_j) = 1$
In other words, dist($x_j$) is the computed distance between $x_j$ and $h_1$. Because so far we only have one cluster, we assign cluster label 1 to every $x_j$.
• 58. Algorithm
Step 3. For i = 2 to K:
1. $D = \max_{x_j \in S - H} dist(x_j)$
2. Choose $h_i \in S - H$ such that $dist(h_i) == D$
3. $H = H \cup \{h_i\}$
4. For j = 1 to N:
5.   If $L(x_j, h_i) \le dist(x_j)$:
6.     $dist(x_j) = L(x_j, h_i)$
7.     $cluster(x_j) = i$
• 59. Thus
As the algorithm progresses, we gradually add more and more cluster centroids, beginning with 2 until we get to K. At each iteration, we find, among all the points not yet included in H, a worst point: worst in the sense that it has the maximum distance to its corresponding centroid.
This worst point is added to the set H. To stress again, points already included in H are not under consideration. A sketch of the whole algorithm follows.
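A minimal NumPy sketch of the greedy algorithm with Euclidean L; Steps 1–3 above map directly onto the code.

import numpy as np

def greedy_k_center(X, K, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    # Step 1: pick the first centroid at random
    centers = [int(rng.integers(N))]
    # Step 2: distance and cluster label of every point w.r.t. h_1
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    cluster = np.zeros(N, dtype=int)
    # Step 3: repeatedly promote the current worst point to a centroid
    for i in range(1, K):
        h = int(np.argmax(dist))           # D = max_{x_j in S-H} dist(x_j)
        centers.append(h)
        d_new = np.linalg.norm(X - X[h], axis=1)
        closer = d_new <= dist             # lines 5-7 of Step 3
        dist[closer] = d_new[closer]
        cluster[closer] = i
    return np.array(centers), cluster, dist.max()  # dist.max() bounds the radius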
• 60. Implementation
We can use the disjoint-set data structure with the following operations: MakeSet, Find, Union, and Remove (iterating through the disjoint trees). Of course, it is necessary to have extra fields for the required distances; thus, each node-element of the sets must have a field dist($x_j$) such that
$dist(x_j) = L(x_j, Find(x_j)) \quad (12)$
• 61. Although
Other details need to be taken into consideration; I will let you think about them.
• 62. Example
Running the K-center and K-means algorithms shows that, for clusters of different densities, K-center is more robust. [Figure: side-by-side clusterings on clusters of different densities.]
• 63. Example
Decreasing the density of one of the clusters, we see a degradation of the clusters. [Figure: the clusterings after the density decrease.]
• 64. Using the centroids of K-center to initialize K-means
Thus, we can use the centroids of K-center to try to improve K-means to a certain degree. [Figure: K-means initialized from the K-center centroids.]
• 65. The running time
The running time of the algorithm is O(KN), where K is the number of clusters generated and N is the size of the data set.
Why? Because K-center only requires the pairwise distances between the points and the centroids.
• 66. Main bound of the K-center algorithm
Lemma. Assume the distance measure L satisfies the triangle inequality. If the partition obtained by the greedy algorithm is S with cluster size D, and the optimal partition is $S^*$ with cluster size $D^*$, then
$D \le 2D^* \quad (13)$
• 67. Proof
If we look only at the first j centroids, they generate a partition $S_j$ with size $D_j$. Recall:
The cluster size is the size of the biggest cluster in the current partition.
The size of every cluster is the maximum distance between a point and the cluster centroid.
Thus
$D_1 \ge D_2 \ge D_3 \ge ... \quad (14)$
Why? Because $D = \max_{x_j \in S - H} dist(x_j)$, together with lines 5–7 of Step 3 and induction. You can prove that part by yourselves.
• 68. Graphically
It is easy to see what happens when the inner loop changes the distances. [Figure: distance updates after adding a new centroid.]
• 69. Now
It is necessary to prove that
$\forall i < j, \quad L(h_i, h_j) \ge D_{j-1} \quad (15)$
Thus, $D_{j-1}$ is a lower bound for the distance between $h_i$ and $h_j$.
• 70. Proof of the previous statement
It is possible to see that
$L(h_{j-2}, h_j) \ge L(h_{j-1}, h_j) \quad (16)$
How? Assume it is not true; then $L(h_{j-2}, h_j) < L(h_{j-1}, h_j)$.
• 71. Then
Given the partition S generated by the greedy algorithm, we would have $h_{j-2} \in C_{j-2}$ and $h_{j-2} \in C_j$ for $C_{j-2}, C_j \in S$. This is a contradiction, because $h_{j-2}$ is generated by the algorithm in such a way that it cannot be in any other cluster.
Thus, iteratively,
$L(h_1, h_j) \ge L(h_2, h_j) \ge L(h_3, h_j) \ge ... \ge L(h_{j-1}, h_j) = D_{j-1} \quad (17)$
• 72. Not only that
$\forall j, \exists i < j: \quad L(h_i, h_j) = D_{j-1} \quad (18)$
Therefore, $D_{j-1}$ is not only a lower bound for the distance between $h_i$ and $h_j$; it is also the exact bound for a specific i.
• 73. If we continue with the proof
Now let us consider the optimal partition $S^*$ with K clusters and its size $D^*$. Suppose the greedy algorithm generates the centroids $H = \{h_1, h_2, ..., h_K\}$; for the proof, we add one more, $h_{K+1}$, which can be done without loss of generality. By the pigeonhole principle, at least two of the centroids among $\{h_1, h_2, ..., h_K, h_{K+1}\}$ fall into the same cluster k of the partition $S^*$. Thus, assuming $1 \le i < k < j \le K + 1$ and using the triangle inequality:
$L(h_i, h_j) \le L(h_i, h_k) + L(h_k, h_j) \le D^* + D^* = 2D^* \quad (19)$
• 74. Then
In addition, $L(h_i, h_j) \ge D_{j-1} \ge D_K$, hence $D_K \le 2D^*$.
Given S, the partition generated by the greedy algorithm, we define $\Delta$ as
$\Delta = \max_{x_j \in S - H} \min_{h_k \in H} L(x_j, h_k) \quad (20)$
Basically, $\Delta$ is the maximum, over all points that are not centroids, of the distance to the closest centroid in the partition generated by the greedy algorithm.
• 75. Now we are ready for the final part
Let $h_{K+1}$ be an element in S − H such that
$\min_{h_k \in H} L(h_{K+1}, h_k) = \Delta \quad (21)$
By definition,
$L(h_{K+1}, h_k) \ge \Delta, \quad \forall k = 1, ..., K \quad (22)$
Thus, we have the following sets: let $H_k = \{h_1, ..., h_k\}$ with k = 1, 2, ..., K.
• 76. Now
Consider the distance between $h_i$ and $h_j$ for $i < j \le K$. According to the greedy algorithm,
$\min_{h_k \in H_{j-1}} L(h_j, h_k) \ge \min_{h_k \in H_{j-1}} L(x_l, h_k)$ for any $x_l \in S - H_j$
Basically, remember that the $h_i$ are obtained by finding the farthest points.
• 77. Now we have
Since $h_{K+1} \in S - H$ and $S - H \subset S - H_j$:
$L(h_j, h_i) \ge \min_{h_k \in H_{j-1}} L(h_j, h_k) \ge \min_{h_k \in H_{j-1}} L(h_{K+1}, h_k) \ge \min_{h_k \in H} L(h_{K+1}, h_k) = \Delta$
We have shown that, for any $i < j \le K + 1$,
$L(h_j, h_i) \ge \Delta \quad (23)$
• 78. Now
Consider the optimal partition $S^* = \{C_1^*, C_2^*, ..., C_K^*\}$. At least two of the K + 1 elements $h_1, h_2, ..., h_{K+1}$ are covered by one cluster. Assume that $h_i$ and $h_j$ belong to the same cluster in $S^*$; then $L(h_i, h_j) \le D^*$.
In addition, since $L(h_i, h_j) \ge \Delta$, we get $\Delta \le D^*$.
• 79. In addition
Consider elements $x_m$ and $x_n$ in any cluster represented by $h_k$:
$L(x_m, h_k) \le \Delta$ and $L(x_n, h_k) \le \Delta \quad (24)$
By the triangle inequality,
$L(x_m, x_n) \le L(x_m, h_k) + L(x_n, h_k) \le 2\Delta \quad (25)$
Finally, there are two elements $x_m$ and $x_n$ in some cluster such that $D = L(x_m, x_n)$, so
$D = \max_k D_k \le 2\Delta \le 2D^* \quad (26)$
• 80. Some of the fuzzy clustering models
Fuzzy Clustering Model: Bezdek, 1981.
Possibilistic Clustering Model: Krishnapuram and Keller, 1993.
Fuzzy Possibilistic Clustering Model: N. Pal, K. Pal and Bezdek, 1997.
• 81. Fuzzy C-Means clustering
Input: an unlabeled data set $X = \{x_1, x_2, x_3, ..., x_N\}$, with $x_k \in R^p$.
Output: a partition S of X as a matrix U of size C × N, and a set of cluster centers $V = \{v_1, v_2, ..., v_C\} \subset R^p$.
• 82. What we want
Creation of the cost function. First, we can use a distance defined as
$\|x_k - v_i\| = \sqrt{(x_k - v_i)^T (x_k - v_i)} \quad (27)$
the Euclidean distance from a point k to a centroid i. NOTE: other distances, such as the Mahalanobis distance, can be taken into consideration.
• 83. Do you remember the cost function for K-means?
Finding a partition S that minimizes the following function:
$\min_S \sum_{i=1}^{C} \sum_{k: x_k \in C_i} \|x_k - v_i\|^2 \quad (28)$
where $v_i = \frac{1}{N_i} \sum_{x_k \in C_i} x_k$. We can rewrite the previous equation as
$\min_S \sum_{k=1}^{N} \sum_{i=1}^{C} I(x_k \in C_i) \|x_k - v_i\|^2 \quad (29)$
• 84. In addition
Did you notice that the membership is always one or zero?
$\min_S \sum_{k=1}^{N} \sum_{i=1}^{C} \underbrace{I(x_k \in C_i)}_{\text{Membership}} \|x_k - v_i\|^2 \quad (30)$
• 85. Thus, we can rethink the membership using something “fuzzy”
What if we modify the cost function to something like this:
$\min_S \sum_{k=1}^{N} \sum_{i=1}^{C} \underbrace{[\;\cdot\;]}_{\text{Fuzzy membership value}} \|x_k - v_i\|^2 \quad (31)$
This means that we think of each cluster $C_i$ as “fuzzy”: we can assume a fuzzy set for the cluster $C_i$ with membership function
$A_i : R^p \to [0, 1] \quad (32)$
which we can tune by raising it to a power m.
• 86. Under the following constraints
First: $A_i(x_k) \in [0, 1] \quad \forall i, k \quad (33)$
Second: $0 < \sum_{k=1}^{N} A_i(x_k) < N \quad \forall i \quad (34)$
Third: $\sum_{i=1}^{C} A_i(x_k) = 1 \quad \forall k \quad (35)$
• 87. Final cost function
$J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \|x_k - v_i\|^2 \quad (36)$
Under the constraints:
$A_i(x_k) \in [0, 1]$, for $1 \le k \le N$ and $1 \le i \le C$.
$\sum_{i=1}^{C} A_i(x_k) = 1$, for $1 \le k \le N$.
$0 < \sum_{k=1}^{N} A_i(x_k) < N$, for $1 \le i \le C$.
$m > 1$.
• 88. Using the Lagrange multipliers
New cost function:
$\bar{J}_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [A_i(x_k)]^m \|x_k - v_i\|^2 - \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{C} A_i(x_k) - 1 \right) \quad (37)$
Deriving with respect to $A_i(x_k)$:
$\frac{\partial \bar{J}_m(S)}{\partial A_i(x_k)} = m A_i(x_k)^{m-1} \|x_k - v_i\|^2 - \lambda_k = 0 \quad (38)$
Thus
$A_i(x_k) = \left( \frac{\lambda_k}{m \|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} \quad (39)$
• 89. Using the Lagrange multipliers
Summing over all i's:
$\sum_{i=1}^{C} A_i(x_k) = \frac{\lambda_k^{\frac{1}{m-1}}}{m^{\frac{1}{m-1}}} \sum_{i=1}^{C} \left( \frac{1}{\|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} = 1 \quad (40)$
Thus
$\lambda_k = \frac{m}{\left[ \sum_{i=1}^{C} \left( \frac{1}{\|x_k - v_i\|^2} \right)^{\frac{1}{m-1}} \right]^{m-1}} \quad (41)$
Plugging back into equation (38), using j instead of i for the sum:
$\frac{m}{\left[ \sum_{j=1}^{C} \left( \frac{1}{\|x_k - v_j\|^2} \right)^{\frac{1}{m-1}} \right]^{m-1}} = m A_i(x_k)^{m-1} \|x_k - v_i\|^2 \quad (42)$
• 90. Finally
We have that
$A_i(x_k) = \frac{1}{\sum_{j=1}^{C} \left( \frac{\|x_k - v_i\|^2}{\|x_k - v_j\|^2} \right)^{\frac{1}{m-1}}} \quad (43)$
In a similar way, we have
$v_i = \frac{\sum_{k=1}^{N} A_i(x_k)^m x_k}{\sum_{k=1}^{N} A_i(x_k)^m} \quad (44)$
• 91. Final algorithm: Fuzzy C-Means
1. Let t = 0. Select an initial fuzzy pseudo-partition.
2. Calculate the C cluster centers using $v_i^{(t)} = \frac{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m x_k}{\sum_{k=1}^{N} A_i^{(t)}(x_k)^m}$.
3. Update, for each $x_k$, the membership function:
Case I: if $\|x_k - v_i^{(t)}\|^2 > 0$ for all $i \in \{1, 2, ..., C\}$, then
$A_i^{(t+1)}(x_k) = \frac{1}{\sum_{j=1}^{C} \left( \frac{\|x_k - v_i^{(t)}\|^2}{\|x_k - v_j^{(t)}\|^2} \right)^{\frac{1}{m-1}}}$
Case II: if $\|x_k - v_i^{(t)}\|^2 = 0$ for some $i \in I \subseteq \{1, 2, ..., C\}$, then define the $A_i^{(t+1)}(x_k)$, $i \in I$, as any nonnegative numbers such that $\sum_{i \in I} A_i^{(t+1)}(x_k) = 1$, and set $A_i^{(t+1)}(x_k) = 0$ for $i \notin I$.
4. If $\max_{i,k} |A_i^{(t+1)}(x_k) - A_i^{(t)}(x_k)| \le \epsilon$ (a small threshold), stop; otherwise increase t and go to step 2.
A sketch of this loop appears below.
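A compact NumPy sketch of the loop. Case II is handled here by clipping the squared distances away from zero, an implementation shortcut rather than the exact rule in the slides.

import numpy as np

def fuzzy_c_means(X, C, m=2.0, eps=1e-5, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    # Initial fuzzy pseudo-partition: columns normalized so sum_i A_i(x_k) = 1
    A = rng.random((C, N))
    A /= A.sum(axis=0)
    for _ in range(iters):
        Am = A ** m
        V = (Am @ X) / Am.sum(axis=1, keepdims=True)           # Eq. (44): centers
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)    # ||x_k - v_i||^2, C x N
        inv = (1.0 / np.fmax(d2, 1e-12)) ** (1.0 / (m - 1.0))
        A_new = inv / inv.sum(axis=0)                          # Eq. (43): memberships
        if np.abs(A_new - A).max() <= eps:                     # stopping criterion
            return A_new, V
        A = A_new
    return A, V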
• 92. Final output
The matrix U, whose elements are $U_{ik} = A_i(x_k)$, and the centroids $V = \{v_1, v_2, ..., v_C\}$.
• 93. Example
Here, the clustering of two Gaussian clusters with $\mu_1 = (4, 0)^T$, $\mu_2 = (10, 0)^T$ and variance 1.0. [Figure: membership values over the data.]
• 94–96. Example
Further views of the clustering of the two Gaussian clusters. [Figures: the same membership surface from different angles.]
• 97. Pros and cons of Fuzzy C-Means
Advantages: unsupervised; always converges.
Disadvantages: long computational time; sensitivity to the initial guess (speed, local minima); sensitivity to noise. One expects a low (or even zero) membership degree for outliers (noisy points).
• 98. Outliers: a disadvantage of FCM
After running without outliers. [Figure: memberships for the clean data.]
• 99–100. Outliers: a disadvantage of FCM
Now add outliers (shown as blue ×'s): note their high memberships. [Figures: memberships with outliers, from two angles.]
• 101. Krishnapuram and Keller
Following Zadeh, they considered each class prototype as defining an elastic constraint.
What? Giving $t_i(x_k)$ as the degree of compatibility of sample $x_k$ with cluster $C_i$.
We do the following: we consider the $C_i$ as fuzzy sets over the set of samples $X = \{x_1, x_2, ..., x_N\}$.
• 102. Here is the catch!!!
We should not use the old membership constraint
$\sum_{i=1}^{C} A_i(x_k) = 1 \quad (45)$
because this is quite probabilistic... which is not what we want. Thus, we only ask the membership, now using the possibilistic notation $t_i(x_k)$ (known as the typicality value), to be in the interval [0, 1].
• 103. New constraints
First: $t_i(x_k) \in [0, 1] \quad \forall i, k \quad (46)$
Second: $0 < \sum_{k=1}^{N} t_i(x_k) < N \quad \forall i \quad (47)$
Third: $\max_i t_i(x_k) > 0 \quad \forall k \quad (48)$
• 104. We have the following cost function
$\sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 \quad (49)$
Problem: unconstrained optimization of this term alone leads to the trivial solution $t_i(x_k) = 0$ for all i, k. Thus, we introduce the tendency
$t_i(x_k) \to 1 \quad (50)$
Roughly, it means making the typicality values as large as possible.
• 105. We can try to control this tendency
By putting it all together in
$\sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (51)$
with m controlling the tendency of $t_i(x_k) \to 1$. We can also weigh this tendency over all the clusters using a suitable $w_i > 0$ per cluster:
$\sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (52)$
• 106. Possibilistic C-Means clustering (PCM)
The final cost function:
$J_m(S) = \sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 + \sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (53)$
where the $t_i(x_k)$ are typicality values and the $w_i$ are cluster weights.
• 107. Explanation
First term:
$\sum_{k=1}^{N} \sum_{i=1}^{C} [t_i(x_k)]^m \|x_k - v_i\|^2 \quad (54)$
It demands that the distances from the feature vectors to the prototypes be as small as possible.
Second term:
$\sum_{i=1}^{C} w_i \sum_{k=1}^{N} (1 - t_i(x_k))^m \quad (55)$
It forces the typicality values $t_i(x_k)$ to be as large as possible.
• 108. Final updating equations
Typicality values:
$t_i(x_k) = \frac{1}{1 + \left( \frac{\|x_k - v_i\|^2}{w_i} \right)^{\frac{1}{m-1}}}, \quad \forall i, k \quad (56)$
Cluster centers:
$v_i = \frac{\sum_{k=1}^{N} t_i(x_k)^m x_k}{\sum_{k=1}^{N} t_i(x_k)^m} \quad (57)$
• 109. Final updating equations
Weights:
$w_i = M \frac{\sum_{k=1}^{N} [t_i(x_k)]^m \|x_k - v_i\|^2}{\sum_{k=1}^{N} [t_i(x_k)]^m} \quad (58)$
with M > 0. A sketch of these updates follows.
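A compact sketch of the PCM updates in NumPy. As the slides note, the initial centers V0 would typically come from a Fuzzy C-Means run; the initial typicalities below and the update order are illustrative choices.

import numpy as np

def possibilistic_c_means(X, V0, m=2.0, M=1.0, iters=100):
    V = V0.copy()                                          # e.g. centers from FCM
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)    # ||x_k - v_i||^2, C x N
    T = 1.0 / (1.0 + d2)                                   # rough initial typicalities
    for _ in range(iters):
        Tm = T ** m
        w = M * (Tm * d2).sum(axis=1) / Tm.sum(axis=1)     # Eq. (58): weights w_i
        T = 1.0 / (1.0 + (d2 / w[:, None]) ** (1.0 / (m - 1.0)))  # Eq. (56)
        Tm = T ** m
        V = (Tm @ X) / Tm.sum(axis=1, keepdims=True)       # Eq. (57): centers
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)
    return T, V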
• 110. Possibilistic clustering can deal with outliers
Two Gaussian clusters with $\mu_1 = (4, 0)^T$, $\mu_2 = (10, 0)^T$ and $\sigma^2 = 1.0$. [Figure: typicality values over the clean data.]
• 111–113. Possibilistic clustering can deal with outliers
Now add outliers. [Figures: the typicality surface with outliers, shown from several angles; the outliers receive low typicalities.]
• 114. Pros and cons of Possibilistic C-Means
Advantages: clustering noisy data samples.
Disadvantages: very sensitive to a good initialization; for example, I needed to run Fuzzy C-Means to obtain good initial typicalities.
In between: coincident clusters may result, because the columns and rows of the typicality matrix are independent of each other. This could be advantageous (start with a large value of C and get fewer distinct clusters).
• 115. Nevertheless
There are more advanced clustering methods based on the possibilistic and fuzzy ideas:
Pal, N.R.; Pal, K.; Keller, J.M.; Bezdek, J.C., “A Possibilistic Fuzzy c-Means Clustering Algorithm,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 517–530, Aug. 2005.