Clustering, Mixture Models, and EM Algorithm
Man-Wai MAK
Dept. of Electronic and Information Engineering,
The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk
http://www.eie.polyu.edu.hk/∼mwmak
References:
C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. (Chapter 9)
S.Y. Kung, M.W. Mak and S.H. Lin, Biometric Authentication: A Machine Learning
Approach, Prentice Hall, 2005. (Chapter 3)
M.W. Mak and J.T. Chien, Machine Learning for Speaker Recognition, Cambridge
University Press, 2020. (Chapters 2 and 3)
November 1, 2020
Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 1 / 32
Overview
1 Motivations
2 Clustering
K-means
Gaussian Mixture Models
3 The EM Algorithm
Motivations
Clustering is a kind of unsupervised learning, which has been used in
many disciplines.
Power Electronics: “Genetic k-means algorithm based RBF network
for photovoltaic MPP prediction.” Energy, 35.2 (2010): 529-536.
Telecommunication: “An energy efficient hierarchical clustering
algorithm for wireless sensor networks.” INFOCOM 2003, Vol. 3. IEEE,
2003.
Photonics: “Contiguity-enhanced k-means clustering algorithm for
unsupervised multispectral image segmentation.” Optical Science,
Engineering and Instrumentation’97, International Society for Optics
and Photonics, 1997.
Multimedia: “Normalized cuts and image segmentation,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no.
8, pp. 888-905, Aug 2000.
K-means
Divide a data set X = {xt; t = 1, . . . , T} into K groups, each
represented by its centroid denoted by µk, k = 1, . . . , K.
The task is
1 to determine the K centroids {µ1, . . . , µK} and
2 to assign each pattern xt to one of the centroids.
Mathematically, we denote the centroid associated with xt
as ct, where ct ∈ {µ1, . . . , µK}.
Then the objective of the K-means algorithm is to minimize the sum
of squared errors:
E(X) = ∑_{t=1}^{T} ‖xt − ct‖² = ∑_{t=1}^{T} (xt − ct)^T (xt − ct).   (1)
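The objective in Eq. (1) can be sketched in NumPy as follows; the function and array names are illustrative, not part of the original slides:

```python
import numpy as np

def sum_squared_error(X, centroids, assignments):
    """E(X) of Eq. (1): each row of X is a pattern x_t, and
    assignments[t] indexes the centroid c_t associated with x_t."""
    diffs = X - centroids[assignments]   # x_t - c_t for every t
    return float(np.sum(diffs ** 2))     # sum_t ||x_t - c_t||^2
```

For example, two patterns placed symmetrically about a single centroid each contribute their squared distance to it.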
K-means
Let Xk denote the set of data vectors associated with the k-th cluster,
whose centroid is µk, and let Nk denote the number of vectors in Xk.
The learning rule of the K-means algorithm consists of:
1 Determine the membership of a data vector:
x ∈ Xk if ‖x − µk‖ < ‖x − µj‖ ∀j ≠ k. (2)
2 Update the representation of the cluster: The centroid is updated
based on the new membership:
µk = (1/Nk) ∑_{x ∈ Xk} x,   k = 1, . . . , K. (3)
K-means
K-means procedure:
K-means
K-means procedure:
1 Randomly pick K samples from the training data and consider them
as the centroids. In the example on the previous page, K = 3.
2 Assign each training sample to the nearest centroid. In this
example, samples are assigned to the green, red, or blue diamond.
3 For each cluster (green, red, or blue), re-compute the cluster mean.
Then repeat Steps 2 and 3 until the centroids no longer change.
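The three steps above can be sketched in NumPy as follows. This is a minimal illustration (the function and variable names are mine, and it assumes no cluster goes empty, which holds for well-separated data):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: random K samples as initial centroids (Step 1),
    nearest-centroid assignment (Step 2, Eq. 2), mean update (Step 3, Eq. 3)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster
        # (assumes every cluster keeps at least one member).
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

On two well-separated groups of points, the returned labels separate the groups regardless of which samples were drawn as initial centroids.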
Example Applications of K-means
Suppose that we are given some iris flowers:
Setosa Versicolor Virginica
Four attributes (features): (1) sepal length, (2) sepal width, (3) petal
length, and (4) petal width
We only know that there are 3 types of iris flowers; no labels are
available in the dataset.
We may apply K-means to divide the 4-dimensional vectors into 3
clusters.
But we still do not know which cluster belongs to which iris type.
Example Applications of K-means
Results of K-means clustering:
Example Applications of K-means
K-means clustering of handwritten digits with K = 10
Example Applications of K-means
K-means clustering of handwritten digits with K = 4
Gaussian Mixture Models (GMM)
Gaussian Mixture Models
A Gaussian mixture model (GMM) is a linear weighted sum of K
Gaussian densities:
p(x) = ∑_{k=1}^{K} wk N(x|µk, Σk),

where wk ≡ Pr(mix = k) is the k-th mixture coefficient and

N(x|µk, Σk) = 1 / ((2π)^{D/2} |Σk|^{1/2}) exp{ −(1/2)(x − µk)^T Σk^{−1} (x − µk) }

is the k-th Gaussian density with mean µk and covariance matrix Σk.
Note that ∑_{k=1}^{K} wk = 1.
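The density above can be evaluated directly from the definitions; a minimal NumPy sketch, with illustrative function names:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """N(x | mu, Sigma) for a D-dimensional vector x."""
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_pdf(x, weights, means, covs):
    """p(x) = sum_k w_k N(x | mu_k, Sigma_k); the weights must sum to 1."""
    return sum(w * gaussian_pdf(x, m, S)
               for w, m, S in zip(weights, means, covs))
```

With a single standard Gaussian this reduces to the familiar value 1/√(2π) at x = 0.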
Gaussian Mixture Models
GMM with 3 mixture components (K = 3):
Bishop, 2006
Gaussian Mixture Models
GMM clustering:
K = 1 and K = 2
Training of GMM by Maximum Likelihood
Given a set of N independent and identically distributed (iid) vectors
X = {xn; n = 1, . . . , N}, the log of the likelihood function is given by

ln p(X|θ) = ln { ∏_{n=1}^{N} ∑_{k=1}^{K} wk N(xn|µk, Σk) }
          = ∑_{n=1}^{N} ln { ∑_{k=1}^{K} wk N(xn|µk, Σk) }

To find the parameters θ = {wk, µk, Σk}, k = 1, . . . , K, that maximize
ln p(X|θ), we may set ∂ ln p(X|θ)/∂θ = 0 and solve for θ.
But this method will not give a closed-form solution for θ.
The trouble is that the summation appears inside the logarithm.
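For reference, the log-likelihood itself is easy to evaluate. In practice it is computed with the log-sum-exp trick to avoid underflow; that trick is an implementation detail, not something on the slide. A 1-D sketch:

```python
import numpy as np

def log_gauss_1d(x, mu, var):
    """log N(x | mu, sigma^2) for scalar x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def gmm_log_likelihood(X, weights, means, variances):
    """ln p(X|theta) = sum_n ln { sum_k w_k N(x_n|mu_k, sigma_k^2) },
    computed with log-sum-exp for numerical stability (1-D sketch)."""
    total = 0.0
    for x in X:
        log_terms = [np.log(w) + log_gauss_1d(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
        m_ = max(log_terms)                               # factor out the largest term
        total += m_ + np.log(sum(np.exp(t - m_) for t in log_terms))
    return total
```

With K = 1, w1 = 1, µ1 = 0, σ1² = 1, and a single observation x = 0, this returns ln(1/√(2π)).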
Training of GMM by Maximum Likelihood
An elegant method for finding maximum-likelihood solutions for
models with latent variables is the expectation-maximization (EM)
algorithm.
In a GMM, for each data point xn, we do not know which Gaussian
generated it. So, the latent information is the Gaussian ID of each
xn.
Define Z = {znk; n = 1, . . . , N; k = 1, . . . , K} as the set of latent
variables, where znk = 1 if xn is generated by the k-th Gaussian;
otherwise znk = 0.
{X, Z} is called the complete data set, and X is the incomplete data
set.
In most cases, including GMM, maximizing log p(X, Z|θ) with
respect to θ is straightforward.
Fig. 9.5(a) [next page] shows the distribution p(x, z) of the complete
data, whereas Fig. 9.5(b) shows the distribution p(x) of the
incomplete data.
GMM Joint vs Marginal Distributions
Source: C.M. Bishop (2006)
EM Algorithm for GMM
However, we do not actually know Z, so we cannot compute
ln p(Z, X|θ).
Fortunately, we know its posterior distribution P(Z|X, θ)
through Bayes' theorem:1

P(z|x) = P(z) p(x|z) / p(x)

In the context of GMM, we compute the posterior probability for each
xn:

γ(znk) ≡ P(znk = 1|xn, θ) = wk N(xn|µk, Σk) / ∑_{j=1}^{K} wj N(xn|µj, Σj)   (4)

Eq. 4 constitutes the E-step of the EM algorithm.
1
We denote probabilities and probability mass functions of discrete random
variables using the capital letter P.
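Eq. (4) can be sketched for 1-D data as follows (a vectorized illustration; the helper name is mine):

```python
import numpy as np

def e_step(X, weights, means, variances):
    """Responsibilities gamma(z_nk) of Eq. (4) for 1-D data; each row of
    the returned (N, K) array sums to 1 over the K Gaussians."""
    X = np.asarray(X, float)[:, None]                 # shape (N, 1)
    means = np.asarray(means, float)[None, :]         # shape (1, K)
    variances = np.asarray(variances, float)[None, :]
    dens = (np.exp(-0.5 * (X - means) ** 2 / variances)
            / np.sqrt(2 * np.pi * variances))         # N(x_n | mu_k, sigma_k^2)
    num = np.asarray(weights) * dens                  # numerator of Eq. (4)
    return num / num.sum(axis=1, keepdims=True)
```

For two well-separated Gaussians, each point's responsibility concentrates almost entirely on the nearer component.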
EM Algorithm for GMM
Computing the posteriors of the latent variables can be viewed as an
alignment.
The posterior probabilities indicate the closeness of xn to individual
Gaussians in the Mahalanobis sense.
[Figure: a vector ot is aligned with three Gaussians with means µ1, µ2,
and µ3 through the posteriors γ(zt1), γ(zt2), and γ(zt3).]

The Mahalanobis distance between x and y is

Dmah(x, y) = sqrt{ (x − y)^T Σ^{−1} (x − y) }
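A minimal sketch of the Mahalanobis distance (the function name is illustrative):

```python
import numpy as np

def mahalanobis(x, y, Sigma):
    """D_mah(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y))."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))
```

With Σ = I it reduces to the Euclidean distance; a larger variance along a dimension shrinks that dimension's contribution to the distance.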
EM Algorithm for GMM
So, given the current estimate of the model parameters θold, we can
find its new estimate θ by computing the expected value of
ln p(Z, X|θ) under the posterior distribution of Z:
Q(θ|θ^old) = E_Z{ ln p(Z, X|θ) | X, θ^old }
           = E_{z∼P(z|x)}{ ln p(Z, X|θ) | X, θ^old }
           = ∑_{n=1}^{N} ∑_{k=1}^{K} P(znk = 1|xn, θ^old) ln p(xn, znk = 1|θ)
           = ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln p(xn, znk = 1|θ)
           = ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln [ p(xn|znk = 1, θ) P(znk = 1|θ) ]
           = ∑_{n=1}^{N} ∑_{k=1}^{K} γ(znk) ln [ N(xn|µk, Σk) wk ]   (5)
EM Algorithm for GMM
Then, we maximize Q(θ|θ^old) with respect to θ by setting
∂Q(θ|θ^old)/∂θ = 0 to obtain (see Tutorial):

µk = ∑_{n=1}^{N} γ(znk) xn / ∑_{n=1}^{N} γ(znk)

Σk = ∑_{n=1}^{N} γ(znk)(xn − µk)(xn − µk)^T / ∑_{n=1}^{N} γ(znk)

wk = (1/N) ∑_{n=1}^{N} γ(znk)
This constitutes the M-step.
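These update formulas can be sketched for 1-D data (so that Σk reduces to a scalar variance σk²); the function name is illustrative:

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate mu_k, sigma_k^2, and w_k from the responsibilities
    gamma (an N x K array produced by the E-step); 1-D data for brevity."""
    X = np.asarray(X, float)
    Nk = gamma.sum(axis=0)                               # effective counts per Gaussian
    means = (gamma * X[:, None]).sum(axis=0) / Nk
    variances = (gamma * (X[:, None] - means) ** 2).sum(axis=0) / Nk
    weights = Nk / len(X)
    return weights, means, variances
```

With hard (0/1) responsibilities, the formulas reduce to the per-cluster sample mean and variance, as expected.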
EM Algorithm for GMM
In practice, we compute the following sufficient statistics:
0th-order: Nk = ∑_{n=1}^{N} γ(znk)   (6)

1st-order: fk = ∑_{n=1}^{N} γ(znk) xn   (7)

2nd-order: Sk = ∑_{n=1}^{N} γ(znk) xn xn^T,   (8)

where k = 1, . . . , K.
EM Algorithm for GMM
The model parameters are then updated as follows:
µk = fk / Nk   (9)

Σk = Sk / Nk − µk µk^T   (10)

wk = Nk / N,   (11)
where k = 1, . . . , K.
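The sufficient-statistics form can be sketched similarly for 1-D data. It yields the same estimates as the direct M-step formulas, but the statistics can be accumulated in a single pass over the responsibilities:

```python
import numpy as np

def m_step_from_stats(X, gamma):
    """Accumulate the statistics of Eqs. (6)-(8) for 1-D data, then apply
    the updates of Eqs. (9)-(11)."""
    X = np.asarray(X, float)
    Nk = gamma.sum(axis=0)                      # Eq. (6): 0th-order
    fk = (gamma * X[:, None]).sum(axis=0)       # Eq. (7): 1st-order
    Sk = (gamma * X[:, None] ** 2).sum(axis=0)  # Eq. (8): 2nd-order
    means = fk / Nk                             # Eq. (9)
    variances = Sk / Nk - means ** 2            # Eq. (10)
    weights = Nk / len(X)                       # Eq. (11)
    return weights, means, variances
```

With hard (0/1) responsibilities this again reduces to per-cluster means and variances.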
EM Algorithm for GMM
In summary, the EM algorithm iteratively performs the following:
Initialization: Randomly select K samples from X and assign them to
{µk}, k = 1, . . . , K. Set wk = 1/K and Σk = I for k = 1, . . . , K.
E-Step: Find the posterior distribution of the latent (unobserved)
variables, given the observed data and the current estimate of the
parameters.
M-Step: Re-estimate the parameters to maximize the likelihood of
the observed data, under the assumption that the distribution found in
the E-step is correct.
The iterative process is guaranteed to increase the true likelihood or
leave it unchanged (if a local maximum has already been reached).
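The whole loop can be sketched for a 1-D GMM as follows. For reproducibility, this sketch deviates from the summary in one respect: it initializes the means at evenly spaced data quantiles rather than K random samples; wk = 1/K and σk² = 1 as above:

```python
import numpy as np

def em_gmm_1d(X, K, n_iter=50):
    """EM for a 1-D GMM: alternate the E-step (Eq. 4) and the M-step.
    Means initialized at evenly spaced quantiles for reproducibility."""
    X = np.asarray(X, float)
    means = np.quantile(X, (np.arange(K) + 0.5) / K)
    variances = np.ones(K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma(z_nk).
        dens = (np.exp(-0.5 * (X[:, None] - means) ** 2 / variances)
                / np.sqrt(2 * np.pi * variances))
        num = weights * dens
        gamma = num / num.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters.
        Nk = gamma.sum(axis=0)
        means = (gamma * X[:, None]).sum(axis=0) / Nk
        variances = (gamma * (X[:, None] - means) ** 2).sum(axis=0) / Nk
        weights = Nk / len(X)
    return weights, means, variances
```

Run on the data of the numerical example in a later slide (x = 1, 2, 3, 4, 6, 7, 8 with K = 2), it converges to means near 2.52 and 7.00.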
The EM Algorithm
The EM algorithm is an ideal candidate for determining the
parameters of a GMM.
EM is applicable to problems where the observable data provide
only partial information or where some data are “missing”.
Each EM iteration is composed of two steps, Expectation (E) and
Maximization (M). The M-step maximizes a likelihood function that
is further refined in each iteration by the E-step.
Animations:
https://www.youtube.com/watch?v=v-pq8VCQk4M
GMM: A Numerical Example
This example uses the following data as the observed data.
[Figure: one-dimensional hidden-state example with observed data
x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 6, x6 = 7, x7 = 8.]
Assume that when EM begins,
θ^old = {w1, {µ1, σ1}, w2, {µ2, σ2}} = {0.5, {0, 1}, 0.5, {9, 1}}.

Therefore, one has

γ(zn1) = (w1/σ1) e^{−(xn−µ1)²/(2σ1²)} / ∑_{k=1}^{2} (wk/σk) e^{−(xn−µk)²/(2σk²)}
       = e^{−xn²/2} / ( e^{−xn²/2} + e^{−(xn−9)²/2} )   (12)
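Eq. (12) is easy to check numerically; since w1 = w2 and σ1 = σ2, the mixture weights and standard deviations cancel, leaving only the exponentials:

```python
import numpy as np

def gamma1(x):
    """gamma(z_n1) of Eq. (12) for the initial model
    theta_old = {0.5, {0, 1}, 0.5, {9, 1}}."""
    a = np.exp(-0.5 * x ** 2)           # e^{-x^2/2}, from Gaussian 1
    b = np.exp(-0.5 * (x - 9) ** 2)     # e^{-(x-9)^2/2}, from Gaussian 2
    return a / (a + b)
```

This reproduces the alignment table on the next slide: patterns 1-4 go almost entirely to Gaussian 1 and patterns 6-8 to Gaussian 2 (the table's 1s and 0s are rounded).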
GMM: A Numerical Example
Pattern Index (n)   Pattern (xn)   γ(zn1)   γ(zn2)
1                   1              1        0
2                   2              1        0
3                   3              1        0
4                   4              1        0
5                   6              0        1
6                   7              0        1
7                   8              0        1

Iteration   Q(θ|θ^old)   µ1     σ²1    µ2     σ²2
0           −∞           0      1      9      1
1           −43.71       2.50   1.25   6.99   0.70
2           −25.11       2.51   1.29   7.00   0.68
3           −25.11       2.51   1.30   7.00   0.67
4           −25.10       2.52   1.30   7.00   0.67
5           −25.10       2.52   1.30   7.00   0.67
The E- and M-Steps
Figure: The flow of the EM algorithm. Set n = 0, initialize θ0, and set
Q(θ0|θ−1) = −∞. E-step: compute Q(θ|θn). M-step: compute
θ* = arg max_θ Q(θ|θn) and set θn+1 = θ*. If Q(θn+1|θn) − Q(θn|θn−1) ≤ ξ,
where ξ is the termination threshold, terminate with the
maximum-likelihood estimate θML = θn+1; otherwise set n = n + 1 and
repeat from the E-step.
Example Applications of GMM
GMM clustering of handwritten digits with K = 10
Example Applications of GMM
GMM clustering of handwritten digits with K = 4
Example Applications of Clustering
DNN for Face Clustering
https://github.com/durgeshtrivedi/imagecluster
More Related Content

Similar to Clustering-beamer.pdf

machinelearning project
machinelearning projectmachinelearning project
machinelearning projectLianli Liu
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetAlaaZ
 
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...Ashley Smith
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptSubrata Kumer Paul
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion PresentationMichael Hankin
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsAlexander Litvinenko
 
Cheatsheet unsupervised-learning
Cheatsheet unsupervised-learningCheatsheet unsupervised-learning
Cheatsheet unsupervised-learningSteve Nouri
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...CSCJournals
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...Waqas Tariq
 
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...Alexander Litvinenko
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applicationsFrank Nielsen
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMCPierre Jacob
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012Zheng Mengdi
 
Paper id 21201488
Paper id 21201488Paper id 21201488
Paper id 21201488IJRAT
 
11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.pptSueMiu
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 

Similar to Clustering-beamer.pdf (20)

Lecture9 xing
Lecture9 xingLecture9 xing
Lecture9 xing
 
machinelearning project
machinelearning projectmachinelearning project
machinelearning project
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...
An Acceleration Scheme For Solving Convex Feasibility Problems Using Incomple...
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.ppt
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion Presentation
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEs
 
Cheatsheet unsupervised-learning
Cheatsheet unsupervised-learningCheatsheet unsupervised-learning
Cheatsheet unsupervised-learning
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
 
Clustering lect
Clustering lectClustering lect
Clustering lect
 
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...
Computation of Electromagnetic Fields Scattered from Dielectric Objects of Un...
 
Information-theoretic clustering with applications
Information-theoretic clustering  with applicationsInformation-theoretic clustering  with applications
Information-theoretic clustering with applications
 
Lecture12 xing
Lecture12 xingLecture12 xing
Lecture12 xing
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 
11 clusadvanced
11 clusadvanced11 clusadvanced
11 clusadvanced
 
Paper id 21201488
Paper id 21201488Paper id 21201488
Paper id 21201488
 
11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.ppt
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 

Recently uploaded

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 

Recently uploaded (20)

CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 

Clustering-beamer.pdf

  • 1. Clustering, Mixture Models, and EM Algorithm Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/∼mwmak References: C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. (Chapter 9) S.Y. Kung, M.W. Mak and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, 2005. (Chapter 3) M.W. Mak and J.T. Chien, Machine Learning for Speaker Recognition, Cambridge University Press, 2020. (Chapters 2 and 3) November 1, 2020 Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 1 / 32
  • 2. Overview 1 Motivations 2 Clustering K-means Gaussian Mixture Models 3 The EM Algorithm Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 2 / 32
  • 3. Motivations Clustering is a kind of unsupervised learning, which has been used in many disciplines. Power Electronics: “Genetic k-means algorithm based RBF network for photovoltaic MPP prediction.” Energy, 35.2 (2010): 529-536. Telecommunication: “An energy efficient hierarchical clustering algorithm for wireless sensor networks.” INFOCOM 2003, Vol. 3. IEEE, 2003. Photonics: “Contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation.” Optical Science, Engineering and Instrumentation’97, International Society for Optics and Photonics, 1997. Multimedia: “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug 2000. Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 3 / 32
  • 4. K-means Divide a data set X = {xt; t = 1, . . . , T} into K groups, each represented by its centroid denoted by µk, k = 1, . . . , K. The task is 1 to determine the K centroids {µ1, . . . , µK} and 2 to assign each pattern xt to one of the centroids. Mathematically speaking, one denotes the centroid associated with xt as ct, where ct ∈ {µ1, . . . , µK}. Then the objective of the K-means algorithm is to minimize the sum of squared errors: E(X) = T X t=1 kxt − ctk2 = T X t=1 (xt − ct)T (xt − ct). (1) Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 4 / 32
  • 5. K-means Let Xk denotes the set of data vectors associated with the k-th cluster with the centroid µk and Nk denotes the number of vectors in it. The learning rule of the K-means algorithm consists of: 1 Determine the membership of a data vector: x ∈ Xk if kx − µkk < kx − µjk ∀j 6= k. (2) 2 Update the representation of the cluster: The centroid is updated based on the new membership: µk = 1 Nk X x∈Xk x, k = 1, . . . , K. (3) Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 5 / 32
  • 6. K-means K-means procedure: ! ! ! Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 6 / 32
  • 7. K-means K-means procedure: 1 Randomly picks K samples from the training data and consider them as the centroids. In the example on previous page, K = 3. 2 For each training sample, assign it to the nearest centroid. In this example, samples are assigned to either green, red or blue diamond. 3 For each cluster (green, red, or blue), re-compute the cluster means. Then, repeat step 2 until no change in the centroids. Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 7 / 32
  • 8. Example Applications of K-means Assume that we got some iris flowers Setosa Versicolor Virginica Four attributes (features): (1) sepal length, (2) sepal width, (3) petal length, and (4) petal width We only know there are 3 types of iris flowers but no labels are available in the dataset. We may apply K-means to divide the 4-dimensional vectors into 3 clusters. But we still do not know which cluster belongs to which iris type. Man-Wai MAK (EIE6207) Clustering and EM November 1, 2020 8 / 32
• 9. Example Applications of K-means
Results of K-means clustering on the iris data:
• 10. Example Applications of K-means
K-means clustering of handwritten digits with K = 10
• 11. Example Applications of K-means
K-means clustering of handwritten digits with K = 4
• 12. Gaussian Mixture Models (GMM)
• 13. Gaussian Mixture Models
A Gaussian mixture model (GMM) is a linear weighted sum of K Gaussian densities:
p(x) = \sum_{k=1}^{K} w_k N(x|\mu_k, \Sigma_k),
where w_k ≡ Pr(mix = k) is the k-th mixture coefficient and
N(x|\mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right\}
is the k-th Gaussian density with mean µ_k and covariance matrix Σ_k. Note that \sum_{k=1}^{K} w_k = 1.
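The mixture density above can be evaluated directly in NumPy. The helper functions below are our own illustrative sketch (names `gaussian_pdf` and `gmm_pdf` are not from the slides), assuming full covariance matrices:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """N(x | mu, Sigma) for a single D-dimensional point x."""
    D = len(mu)
    diff = x - mu
    coef = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(Sigma) ** 0.5)
    return coef * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

def gmm_pdf(x, w, mus, Sigmas):
    """p(x) = sum_k w_k N(x | mu_k, Sigma_k); the weights w must sum to 1."""
    return sum(wk * gaussian_pdf(x, mk, Sk) for wk, mk, Sk in zip(w, mus, Sigmas))
```

For example, a single standard 2-D Gaussian evaluated at the origin gives 1/(2π).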
• 14. Gaussian Mixture Models
GMM with 3 mixtures (K = 3). [Bishop, 2006]
• 15. Gaussian Mixture Models
GMM clustering with K = 1 vs. K = 2.
• 16. Training of GMM by Maximum Likelihood
Given a set of N independent and identically distributed (iid) vectors X = {x_n; n = 1, . . . , N}, the log-likelihood function is given by
\log p(X|\theta) = \log \prod_{n=1}^{N} \sum_{k=1}^{K} w_k N(x_n|\mu_k, \Sigma_k) = \sum_{n=1}^{N} \log \left\{ \sum_{k=1}^{K} w_k N(x_n|\mu_k, \Sigma_k) \right\}
To find the parameters θ = {w_k, µ_k, Σ_k}_{k=1}^{K} that maximize log p(X|θ), we may set ∂ log p(X|θ)/∂θ = 0 and solve for θ. But this method does not give a closed-form solution for θ. The trouble is that the summation appears inside the logarithm.
• 17. Training of GMM by Maximum Likelihood
An elegant method for finding maximum-likelihood solutions for models with latent variables is the expectation-maximization (EM) algorithm.
In GMM, for each data point x_n, we do not know which Gaussian generated it. So the latent information is the Gaussian ID for each x_n.
Define Z = {z_nk; n = 1, . . . , N; k = 1, . . . , K} as the set of latent variables, where z_nk = 1 if x_n is generated by the k-th Gaussian; otherwise z_nk = 0.
{X, Z} is called the complete data set, and X is the incomplete data set. In most cases, including GMM, maximizing log p(X, Z|θ) with respect to θ is straightforward.
Fig. 9.5(a) [next page] shows the distribution p(x, z) of the complete data, whereas Fig. 9.5(b) shows the distribution p(x) of the incomplete data.
• 18. GMM Joint vs Marginal Distributions
Source: C.M. Bishop (2006)
• 19. EM Algorithm for GMM
However, we do not actually know Z, so we cannot compute log p(Z, X|θ). Fortunately, we know the posterior distribution of Z, i.e., P(Z|X, θ), through Bayes' theorem:¹
P(z|x) = \frac{P(z) p(x|z)}{p(x)}
In the context of GMM, we compute the posterior probability for each x_n:
\gamma(z_{nk}) \equiv P(z_{nk} = 1 | x_n, \theta) = \frac{w_k N(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j N(x_n|\mu_j, \Sigma_j)}   (4)
Eq. 4 constitutes the E-step of the EM algorithm.
¹ We denote probabilities and probability mass functions of discrete random variables using the capital letter P.
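Eq. 4 vectorizes naturally over the whole data set. Below is our own sketch of such an E-step (the helper name `e_step` is illustrative), again assuming full covariance matrices:

```python
import numpy as np

def e_step(X, w, mus, Sigmas):
    """Eq. 4: gamma[n, k] = w_k N(x_n|mu_k, Sigma_k) / sum_j w_j N(x_n|mu_j, Sigma_j)."""
    N, D = X.shape
    K = len(w)
    joint = np.zeros((N, K))      # joint[n, k] = w_k N(x_n | mu_k, Sigma_k)
    for k in range(K):
        diff = X - mus[k]
        inv = np.linalg.inv(Sigmas[k])
        coef = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigmas[k]) ** -0.5
        # quad[n] = (x_n - mu_k)^T Sigma_k^{-1} (x_n - mu_k)
        quad = np.einsum('nd,de,ne->n', diff, inv, diff)
        joint[:, k] = w[k] * coef * np.exp(-0.5 * quad)
    return joint / joint.sum(axis=1, keepdims=True)   # normalize over k
```

A point midway between two identical Gaussians gets responsibility 0.5 from each; a point nearer one Gaussian gets a correspondingly larger posterior from it.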
• 20. EM Algorithm for GMM
Computing the posteriors of the latent variables can be considered as alignment: the posterior probabilities indicate the closeness of x_n to the individual Gaussians in the Mahalanobis sense.
[figure: a data point aligned against three Gaussians µ_1, µ_2, µ_3 with posteriors γ(z_t1), γ(z_t2), γ(z_t3)]
The Mahalanobis distance between x and y is
D_{mah}(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}
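As a quick illustration of the distance above (a minimal helper of our own):

```python
import numpy as np

def mahalanobis(x, y, Sigma):
    """D_mah(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))
```

With Σ = I this reduces to the Euclidean distance; a large variance along a dimension shrinks that dimension's contribution to the distance.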
• 21. EM Algorithm for GMM
So, given the current estimate of the model parameters θ^old, we can find its new estimate θ by computing the expected value of log p(Z, X|θ) under the posterior distribution of Z:
Q(\theta|\theta^{old}) = E_Z\{\log p(Z, X|\theta) \,|\, X, \theta^{old}\}
= \sum_{n=1}^{N} \sum_{k=1}^{K} P(z_{nk} = 1|x_n, \theta^{old}) \log p(x_n, z_{nk} = 1|\theta)
= \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \log p(x_n, z_{nk} = 1|\theta)
= \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \log\{ p(x_n|z_{nk} = 1, \theta) P(z_{nk} = 1|\theta) \}
= \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \log\{ N(x_n|\mu_k, \Sigma_k) w_k \}   (5)
• 22. EM Algorithm for GMM
Then, we maximize Q(θ|θ^old) with respect to θ by setting ∂Q(θ|θ^old)/∂θ = 0 to obtain (see Tutorial):
\mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}
\Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma(z_{nk})}
w_k = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})
This constitutes the M-step.
• 23. EM Algorithm for GMM
In practice, we compute the following sufficient statistics:
0th-order: N_k = \sum_{n=1}^{N} \gamma(z_{nk})   (6)
1st-order: f_k = \sum_{n=1}^{N} \gamma(z_{nk}) x_n   (7)
2nd-order: S_k = \sum_{n=1}^{N} \gamma(z_{nk}) x_n x_n^T,   (8)
where k = 1, . . . , K.
• 24. EM Algorithm for GMM
The model parameters are then updated as follows:
\mu_k = \frac{1}{N_k} f_k   (9)
\Sigma_k = \frac{1}{N_k} S_k - \mu_k \mu_k^T   (10)
w_k = \frac{1}{N} N_k,   (11)
where k = 1, . . . , K.
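Eqs. (6)-(11) translate directly into NumPy. The function below is our own illustrative sketch (its name and interface are not from the slides); it takes the responsibilities γ produced by the E-step:

```python
import numpy as np

def m_step(X, gamma):
    """M-step via sufficient statistics: X is (N, D), gamma is (N, K).
    Returns the updated (w, mus, Sigmas) of Eqs. (9)-(11)."""
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                  # Eq. (6): 0th-order statistics
    fk = gamma.T @ X                        # Eq. (7): 1st-order, shape (K, D)
    mus = fk / Nk[:, None]                  # Eq. (9)
    Sigmas = []
    for k in range(K):
        Sk = (gamma[:, k, None] * X).T @ X  # Eq. (8): 2nd-order, shape (D, D)
        Sigmas.append(Sk / Nk[k] - np.outer(mus[k], mus[k]))  # Eq. (10)
    w = Nk / N                              # Eq. (11)
    return w, mus, Sigmas
```

With hard (0/1) responsibilities, the updates reduce to per-cluster sample means and covariances, as one would expect.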
• 25. EM Algorithm for GMM
In summary, the EM algorithm iteratively performs the following:
Initialization: Randomly select K samples from X and assign them to {µ_k}_{k=1}^{K}. Set w_k = 1/K and Σ_k = I, where k = 1, . . . , K.
E-Step: Find the posterior distribution of the latent (unobserved) variables, given the observed data and the current estimate of the parameters.
M-Step: Re-estimate the parameters to maximize the likelihood of the observed data, under the assumption that the distribution found in the E-step is correct.
Each iteration is guaranteed to increase the true likelihood or leave it unchanged (if a local maximum has already been reached).
• 26. The EM Algorithm
The EM algorithm is an ideal candidate for determining the parameters of a GMM.
EM is applicable to problems where the observable data provide only partial information or where some data are "missing".
Each EM iteration is composed of two steps: Expectation (E) and Maximization (M). The M-step maximizes a likelihood function that is further refined in each iteration by the E-step.
Animations: https://www.youtube.com/watch?v=v-pq8VCQk4M
• 27. GMM: A Numerical Example
This example uses the following one-dimensional data as the observed data:
x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 4, x_5 = 6, x_6 = 7, x_7 = 8
Assume that when EM begins,
θ^old = {w_1, {µ_1, σ_1}, w_2, {µ_2, σ_2}} = {0.5, {0, 1}, 0.5, {9, 1}}.
Therefore, one has
\gamma(z_{n1}) = \frac{\frac{w_1}{\sigma_1} e^{-\frac{1}{2}(x_n - \mu_1)^2/\sigma_1^2}}{\sum_{k=1}^{2} \frac{w_k}{\sigma_k} e^{-\frac{1}{2}(x_n - \mu_k)^2/\sigma_k^2}} = \frac{e^{-\frac{1}{2}x_n^2}}{e^{-\frac{1}{2}x_n^2} + e^{-\frac{1}{2}(x_n - 9)^2}}   (12)
• 28. GMM: A Numerical Example

Pattern index (n)   Pattern (x_n)   γ(z_n1)   γ(z_n2)
1                   1               1         0
2                   2               1         0
3                   3               1         0
4                   4               1         0
5                   6               0         1
6                   7               0         1
7                   8               0         1

Iteration   Q(θ|θ^old)   µ_1    σ_1²   µ_2    σ_2²
0           -∞           0      1      9      1
1           -43.71       2.50   1.25   6.99   0.70
2           -25.11       2.51   1.29   7.00   0.68
3           -25.11       2.51   1.30   7.00   0.67
4           -25.10       2.52   1.30   7.00   0.67
5           -25.10       2.52   1.30   7.00   0.67
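The converged column of the table can be reproduced with a few lines of NumPy. This is our own sketch of the 1-D, two-component EM loop for this example; after a handful of iterations the parameters should settle near µ_1 ≈ 2.52, σ_1² ≈ 1.30, µ_2 ≈ 7.00, σ_2² ≈ 0.67, as in the table:

```python
import numpy as np

# Observed data and initial parameters from the slide.
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
w = np.array([0.5, 0.5])      # mixture weights w_1, w_2
mu = np.array([0.0, 9.0])     # means mu_1, mu_2
var = np.array([1.0, 1.0])    # variances sigma_1^2, sigma_2^2

for _ in range(10):
    # E-step (Eq. 12): posterior of each Gaussian given each x_n.
    lik = w / np.sqrt(2 * np.pi * var) * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
    gamma = lik / lik.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances.
    Nk = gamma.sum(axis=0)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    w = Nk / len(x)
```

Note how quickly EM converges here: the two Gaussians are well separated, so the responsibilities are nearly hard assignments after the first E-step.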
• 29. The E- and M-Steps
Figure: The flow of the EM algorithm. Set n = 0, initialize θ_0, and set Q(θ_0|θ_{-1}) = -∞. E-step: compute Q(θ|θ_n). M-step: θ_{n+1} = θ* = arg max_θ Q(θ|θ_n). If Q(θ_{n+1}|θ_n) - Q(θ_n|θ_{n-1}) ≤ ξ, terminate with θ_ML = θ_{n+1}; otherwise set n = n + 1 and repeat. (ξ: termination threshold; θ_ML: maximum-likelihood estimate.)
• 30. Example Applications of GMM
GMM clustering of handwritten digits with K = 10
• 31. Example Applications of GMM
GMM clustering of handwritten digits with K = 4
• 32. Example Applications of Clustering
DNN for Face Clustering: https://github.com/durgeshtrivedi/imagecluster