Clustering
Shadi Albarqouni, M.Sc.
Graduate Research Assistant | PhD Candidate
shadi.albarqouni@tum.de
Computer Aided Medical Procedures (CAMP) | TU München (TUM)
Machine Learning in Medical Imaging (MLMI)
Winter Semester 15/16
BioMedical Computing (BMC) Master Program
Outline
1 Introduction
2 Parametric, cost-based clustering
1. K-Means
2. K-Medoids
3. Kernel K-Means
4. Spectral Clustering
5. Extensions
6. Comparison
3 Parametric, model-based clustering
1. Mixture Models
4 Non-parametric, model-based clustering
1. Mean-shift
What is clustering?
Definition (Clustering)
Given n unlabelled data points, separate them into K clusters.
Dilemma! [6]
• What is a Cluster?
(Compact vs. Connected)
• How many clusters K?
(Parametric vs. Non-parametric)
• Soft vs. Hard clustering.
(Model vs. Cost based)
• Data representation.
(Vector vs. Similarities)
• Classification vs. Clustering.
• Stability [7].
Applications
• Image Retrieval
• Image Compression
• Image Segmentation
• Pattern Recognition
Notation
• D = {x1, x2, ..., xn} ∈ R^{m×n} is the data set.
• m is the feature dimension of xi.
• n is the number of instances.
• K is the number of clusters.
• C = {C1, C2, ..., CK} is a partition of D into clusters Ck.
• c(xi) is the label/cluster of instance xi.
• rnk is the hard assignment indicator, where n indexes the instance and k the cluster.
Objective
Find the partition C minimizing the cost function L(C).
Parametric, cost-based clustering
Parametric: K is specified in advance.
Cost-based: hard clustering obtained by minimizing a cost function.
Selected Algorithms:
• K-Means [8].
• K-Medoids [11].
• Kernel K-Means [12].
• Spectral Clustering [10].
K-Means
• K-Means algorithm:
1. Initialize: Pick K random samples from the dataset D as the cluster
centroids {µ1, µ2, ..., µK}.
2. Assign points to the clusters: Partition the data points D into K
clusters C = {C1, C2, ..., CK} based on the Euclidean distance
between the points and the centroids (each point goes to its closest centroid).
3. Centroid update: Based on the points assigned to each cluster,
compute a new centroid µk.
4. Repeat: Do steps 2 and 3 until convergence.
5. Convergence: when the cluster centroids barely change, or we have
compact and/or isolated clusters. Mathematically, when the cost
(distortion) function
L(C) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|^2
is minimal.
• Practical issues: a) the initialization; b) pre-processing.
K-Means – Algorithm
input : Data points D = {x1, x2, ..., xn}, number of clusters K
output: Clusters C = {C1, C2, ..., CK}
Pick K random samples as the cluster centroids µk.
repeat
    for i = 1 to n do
        c(x_i) = \arg\min_k \|x_i - \mu_k\|_2^2   % Assign points to clusters
    end
    for k = 1 to K do
        \mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i   % Update the cluster centroid
    end
until convergence;
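To make the loop concrete, here is a minimal NumPy sketch of Lloyd's algorithm; the function name, tolerance, and iteration cap are illustrative assumptions rather than part of the slides:

import numpy as np

def kmeans(X, K, n_iter=100, tol=1e-6, seed=0):
    # Lloyd's K-Means on X of shape (n, m): n instances, m features.
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # K random samples as centroids
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (n, K)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        mu_new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(K)])
        if np.linalg.norm(mu_new - mu) < tol:  # centroids barely change
            return labels, mu_new
        mu = mu_new
    return labels, mu

The later sketches (spectral clustering, the elbow method) reuse this kmeans function.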
K-Medoids (1)
• K-Medoids algorithm:
1. Initialize: Pick K random samples from the dataset D as the medoids
{µ1, µ2, ..., µK}.
2. Assign points to the clusters: Partition the data points D into K
clusters C = {C1, C2, ..., CK} based on the dissimilarity (Manhattan)
distance between the points and the medoids (each point goes to the medoid
with minimal dissimilarity).
3. Medoid update: Based on the points assigned to each cluster, swap
the medoid with a new data point and compute the cost (undo the
swap if the cost increases).
4. Repeat: Do steps 2 and 3 until convergence.
5. Convergence: when the cluster medoids barely change. Mathematically,
when the cost (distortion) function
L(C) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|
is minimal.
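A compact NumPy sketch of this alternating scheme, in the spirit of the simple algorithm of Park and Jun [11]; as an assumed simplification, the swap step just picks, per cluster, the member with minimal total in-cluster dissimilarity, which never increases the cost:

import numpy as np

def kmedoids(X, K, n_iter=100, seed=0):
    # K-Medoids with Manhattan (L1) dissimilarity; medoids are data points.
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)  # (n, n) dissimilarities
    medoids = rng.choice(n, size=K, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)      # assign to the closest medoid
        new_medoids = medoids.copy()
        for k in range(K):                         # update: cheapest member becomes medoid
            members = np.where(labels == k)[0]
            if len(members) > 0:
                cost = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[k] = members[cost.argmin()]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break                                  # medoids barely change: converged
        medoids = new_medoids
    return labels, medoids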
K-Medoids (2)
Figure : K-Means vs. K-Medoids
Kernel K-Means (1)
Definition
It is a generalization of the
standard K-Means algorithm.
• What happens if the clusters
are not linearly separable?
• Euclidean distance vs. Geodesic
distance.
Figure : Spiral and Jain datasets
Kernel K-Means (2)
• K-Means cannot be applied right away.
• Map the data points xi ∈ D into a high-dimensional feature space
M using a nonlinear function φ(xi).
• Assume the clusters in the high-dimensional feature space (an RKHS)
are linearly separable; hence K-Means can be applied there.
• The cost function becomes
L_K(C) = \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - \phi(\mu_k)\|^2,
where
\|\phi(x_i) - \phi(\mu_k)\|^2 = \phi(x_i)^T \phi(x_i) - 2\,\phi(x_i)^T \phi(\mu_k) + \phi(\mu_k)^T \phi(\mu_k).
Kernel K-Means (3)
• Using the kernel trick, K_{ij} = \phi(x_i)^T \phi(x_j), the Euclidean distance
in L_K(C) can be computed easily with any kernel function K_{ij}
without explicitly knowing the nonlinear transformation φ(xi).
• Examples of (positive semidefinite) kernel functions:
1. Homogeneous polynomial kernel: K_{ij} = (x_i^T x_j)^\delta
2. Inhomogeneous polynomial kernel: K_{ij} = (x_i^T x_j + \gamma)^\delta
3. Gaussian kernel: K_{ij} = e^{-\|x_i - x_j\|^2 / 2\sigma^2}
4. Laplacian kernel: K_{ij} = e^{-\|x_i - x_j\| / \sigma}
5. Sigmoid kernel: K_{ij} = \tanh(\gamma (x_i^T x_j) + \theta)
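As a small illustration, a Gaussian kernel matrix can be built in a few lines; sigma is a free bandwidth parameter chosen by the user (the default here is only a placeholder):

import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for all pairs of rows of X.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))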
Kernel K-Means – Algorithm
input : Data points D = {x1, x2, ..., xn}, kernel matrix K_{ij}, number
of clusters K
output: Clusters C = {C1, C2, ..., CK}
Pick K random samples as the cluster centroids µk.
repeat
    for i = 1 to n do
        for k = 1 to K do
            Compute \|\phi(x_i) - \phi(\mu_k)\|^2 using K_{ij}
        end
        c(x_i) = \arg\min_k \|\phi(x_i) - \phi(\mu_k)\|^2
    end
    for k = 1 to K do
        \phi(\mu_k) = \frac{1}{|C_k|} \sum_{i \in C_k} \phi(x_i)   % implicit centroid; distances use K_{ij} only
    end
until convergence;
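A NumPy sketch of the procedure; as an assumed simplification it starts from a random assignment instead of K random samples, and it expands the implicit feature-space centroid through the kernel trick, so only the kernel matrix is ever touched:

import numpy as np

def kernel_kmeans(Kmat, K, n_iter=100, seed=0):
    # Kernel K-Means on an (n, n) kernel matrix; centroids stay implicit.
    rng = np.random.default_rng(seed)
    n = Kmat.shape[0]
    labels = rng.integers(0, K, size=n)  # random initial assignment (assumption)
    for _ in range(n_iter):
        d2 = np.zeros((n, K))
        for k in range(K):
            idx = np.where(labels == k)[0]
            if len(idx) == 0:
                d2[:, k] = np.inf  # an empty cluster never wins the argmin
                continue
            # ||phi(x_i) - phi(mu_k)||^2 = K_ii - 2 mean_j K_ij + mean_{j,l} K_jl, j,l in C_k
            d2[:, k] = (np.diag(Kmat)
                        - 2.0 * Kmat[:, idx].mean(axis=1)
                        + Kmat[np.ix_(idx, idx)].mean())
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels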
Spectral Clustering
Graph - Overview
• A fully connected, undirected, and weighted graph with N vertices.
• The graph is represented by G = {ν, ε, ω}, where ν is the set of N
vertices, ε is the set of edges, and ω is the set of edge weights,
assigned with a heat kernel to build the adjacency matrix W:
W_{ij} = \begin{cases} e^{-\|x_i - x_j\|_2^2 / \sigma^2} & e_{ij} \in \varepsilon \\ 0 & \text{else} \end{cases}
• The degree matrix D is diagonal with elements D_{ii} = \sum_j W_{ij}.
• Compute the normalized graph Laplacian matrix
\tilde{L} = I - D^{-1/2} W D^{-1/2}
Spectral Clustering – Algorithm
input : Normalized Laplacian matrix \tilde{L}, number of clusters K
output: Clusters C = {C1, C2, ..., CK}
Compute the first K eigenvectors U = {u1, u2, ..., uK} ∈ R^{n×K} of \tilde{L}.
Compute \tilde{U} by normalizing the rows of U to norm 1.
Run K-Means on \tilde{U} ∈ R^{n×K}, treating the rows (K-dimensional
vectors) as the data points, or simply: D ← \tilde{U}^T.
Pick K random samples as the cluster centroids µk.
repeat
    for i = 1 to n do
        c(x_i) = \arg\min_k \|x_i - \mu_k\|_2^2   % Assign points to clusters
    end
    for k = 1 to K do
        \mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i   % Update the cluster centroid
    end
until convergence;
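Putting the graph construction and the algorithm together, a sketch in the style of Ng, Jordan, and Weiss [10], reusing the kmeans sketch from the K-Means slide; sigma is again a user-chosen bandwidth:

import numpy as np

def spectral_clustering(X, K, sigma=1.0):
    # Heat-kernel affinities on a fully connected graph.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(X)) - d_inv_sqrt @ W @ d_inv_sqrt
    # First K eigenvectors (smallest eigenvalues; eigh returns ascending order).
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :K]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize to norm 1
    labels, _ = kmeans(U, K)  # rows of U are the new K-dimensional data points
    return labels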
Extensions (1)
• Alternative cost (distortion) functions, starting from the decomposition
\sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|^2 = \underbrace{\sum_{k=1}^{K} \sum_{i,j \in C_k} \|x_i - x_j\|^2}_{\text{intracluster distance}} + \underbrace{\sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j \notin C_k} \|x_i - x_j\|^2}_{\text{intercluster distance}}
Since the left-hand side is constant for a given data set, minimizing the
intracluster term is equivalent to maximizing the intercluster term
(a numerical check follows this list).
1. Intracluster distance:
L(C) = \sum_{k=1}^{K} \sum_{i,j \in C_k} \|x_i - x_j\|^2 + \text{constant}
2. Intercluster distance:
L(C) = -\sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j \notin C_k} \|x_i - x_j\|^2 + \text{constant}
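The decomposition is easy to verify numerically for an arbitrary partition; the double sums run over ordered pairs, matching the formulas above (data and partition here are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # arbitrary data
labels = rng.integers(0, 3, size=50)    # arbitrary partition into 3 clusters

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
same = labels[:, None] == labels[None, :]
total = d2.sum()                        # constant for a given data set
intra = d2[same].sum()                  # intracluster distance
inter = d2[~same].sum()                 # intercluster distance
assert np.isclose(total, intra + inter)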
Extensions (2)
3. K-Median:
L(C) = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|
• Alternative initialization:
1. K-Means++ [1]
2. Global Kernel K-Means [13]
• On selecting K¹:
1. Rule of thumb: K ≈ \sqrt{n/2}
2. Elbow method (see the sketch after this list)
3. Silhouette
• Soft clustering: Fuzzy C-Means [2]
• Variant: Spectral Clustering [14]
• Hierarchical Clustering

¹ https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
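A sketch of the elbow method, reusing the kmeans function from the K-Means slide; the data set and the scanned range of K are arbitrary placeholders. One runs K-Means for increasing K and looks for the bend where the distortion stops dropping sharply:

import numpy as np

X = np.random.default_rng(0).normal(size=(200, 2))
for K in range(1, 10):
    labels, mu = kmeans(X, K)
    distortion = ((X - mu[labels]) ** 2).sum()  # L(C) for this K
    print(K, float(distortion))  # in practice one would plot K vs. distortion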
Comparison
Algorithm           | Data Rep.  | Comp. | Out. | Cent.
--------------------|------------|-------|------|------
K-Means             | Vectors    | Low   | No   | ∉ D
K-Medians           | Vectors    | High  | No   | ∉ D
K-Medoids           | Similarity | High  | Yes  | ∈ D
Kernel K-Means      | Kernel     | High  | N/A  | ∉ D
Spectral Clustering | Similarity | High  | N/A  | ∉ D

Data Rep.: data representation; Comp.: computational cost; Out.: handles
outliers; Cent.: whether the centroids are members of D.
Parametric, model-based clustering
Parametric: K and the density function are defined (e.g., Gaussian).
Model-based: soft clustering based on the mixture density f(x):
f(x) = \sum_{k=1}^{K} \pi_k f_k(x), \quad \text{s.t.} \quad \pi_k \geq 0, \quad \sum_{k=1}^{K} \pi_k = 1,
where f_k(x) is the k-th mixture component. f(x) is a Gaussian Mixture
Model (GMM) when f_k(x) = N(x; \mu_k, \sigma_k^2).
Degree of membership:
\gamma_{ki} = P[x_i \in C_k] = \frac{\pi_k f_k(x_i)}{f(x_i)}
GMM parameters: \theta = \{\pi_{1:K}, \mu_{1:K}, \sigma_{1:K}\}.
Selected algorithm to estimate the parameters:
• EM-Algorithm [5].
Expectation-Maximization (EM) Algorithm
• Given data points D sampled i.i.d. from an unknown distribution f.
• We model the distribution using the Maximum Likelihood (ML)
principle (log-likelihood):
l(\theta) = \ln f_\theta(D) = \sum_{i=1}^{n} \ln f_\theta(x_i) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} \pi_k f_k(x_i)
The objective: \theta_{ML} = \arg\max_\theta l(\theta)
Figure : GMM Clustering
EM – Algorithm
input : data points D, number of clusters K
output: Parameters \theta_{ML} = \{\pi_{1:K}, \mu_{1:K}, \sigma_{1:K}\}
Initialize the parameters θ at random.
repeat
    for i = 1 to n do
        for k = 1 to K do
            \gamma_{ik} = \frac{\pi_k f_k(x_i)}{f(x_i)}   % E-Step
        end
    end
    for k = 1 to K do
        \pi_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik}   % M-Step
        \mu_k = \frac{1}{n\pi_k} \sum_{i=1}^{n} \gamma_{ik} x_i
        \sigma_k = \frac{1}{n\pi_k} \sum_{i=1}^{n} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T
    end
until convergence;
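A NumPy/SciPy sketch of these E- and M-steps; it uses full covariance matrices where the slide writes σk, and the small ridge added to the covariance is a numerical safeguard, not part of the algorithm above:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    pi = np.full(K, 1.0 / K)                          # uniform initial weights
    mu = X[rng.choice(n, size=K, replace=False)]      # random initial means
    Sigma = np.stack([np.eye(m) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik = pi_k f_k(x_i) / f(x_i).
        gamma = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                          for k in range(K)], axis=1)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        Nk = gamma.sum(axis=0)                        # effective cluster sizes, n * pi_k
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]
            Sigma[k] += 1e-6 * np.eye(m)              # ridge for numerical stability
    return pi, mu, Sigma, gamma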
Non-parametric, model-based clustering
Idea: group the points by the peaks of the data density.
Parameters: the shape and the number of clusters K are determined by the
algorithm; however, you must define:
1. the smoothness of the density estimate (h)³
2. what counts as a peak
Selected algorithm:
• Mean-shift [4].

³ Rule of thumb: h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5}, where \hat{\sigma} is the standard deviation of the samples.
Mean-shift Algorithm
• Given data points D sampled i.i.d. from an unknown density f.
• We define the shape of the density using the Kernel Density
Estimation (KDE) principle:
f_h(x) = \frac{1}{nh^m} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
where K(·) is a kernel function that must be positive, symmetric,
and differentiable, e.g., the Gaussian kernel
K(z) = \frac{1}{(2\pi)^{m/2}} e^{-\|z\|^2 / 2}.
• The objective: find the peaks of f_h(x) by solving \nabla f_h(x) = 0.
• For the Gaussian kernel this yields the fixed-point update
x = \frac{\sum_{i=1}^{n} x_i K\!\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)},
whose right-hand side minus x is the mean-shift vector.
Figure : Mean-shift Clustering
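A sketch of the procedure with a Gaussian kernel: every point is shifted to the weighted mean of its neighbours until it settles at a peak, and points that land on the same peak form one cluster. Grouping converged modes by rounding onto a grid of width h/2 is a crude heuristic assumed here; practical implementations merge modes within a small radius instead:

import numpy as np

def mean_shift(X, h, n_iter=50, tol=1e-5):
    modes = X.copy()
    for _ in range(n_iter):
        shifted = np.empty_like(modes)
        for j, x in enumerate(modes):
            # Gaussian kernel weights of all samples around the current estimate.
            w = np.exp(-((X - x) ** 2).sum(axis=1) / (2.0 * h ** 2))
            shifted[j] = (w[:, None] * X).sum(axis=0) / w.sum()  # mean-shift update
        if np.abs(shifted - modes).max() < tol:
            modes = shifted
            break
        modes = shifted
    # Points whose modes coincide (here: same grid cell) share a cluster label.
    _, labels = np.unique(np.round(modes / (0.5 * h)), axis=0, return_inverse=True)
    return labels, modes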
Summary
Acknowledgment
This tutorial was prepared with the help of
• Bishop's book [3],
• Meila's slides at MLSS 2011 [9], and
• Lichao's slides from the previous semester (SS15).
References (1)
[1] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
[2] James C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer Science & Business Media, 2013.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[4] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
[5] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[6] Anil K. Jain and Martin H.C. Law. Data clustering: A user's dilemma. In Pattern Recognition and Machine Intelligence, pages 1–10. Springer, 2005.
References (2)
[7] Tilman Lange, Volker Roth, Mikio L. Braun, and Joachim M. Buhmann. Stability-based validation of clustering solutions. Neural Computation, 16(6):1299–1323, 2004.
[8] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, March 1982.
[9] Marina Meila. Classic and modern data clustering. Machine Learning Summer School (MLSS), 2011.
[10] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.
[11] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.
[12] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
References (3)
[13] Grigorios F. Tzortzis and Aristidis C. Likas. The global kernel k-means algorithm for clustering in feature space. IEEE Transactions on Neural Networks, 20(7):1181–1194, 2009.
[14] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.