K-Means Clustering Simply

Machine Learning
Computer Science Department, Faculty of Computer and Information System, Islamic University of Madinah, Madinah, KSA
Computer Science Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt.
Dr. Emad Nabil
The Lloyd Algorithm
Faculty of computers and information systems
Slides are compiled from many resources; thanks to those who made their slides available online.
Most of the slides are by Andrew Ng.

Andrew Ng
Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$

Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

Applications of clustering
- Organize computing clusters
- Social network analysis
- Astronomical data analysis (image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison))
- Market segmentation

Clustering: K-means algorithm

Random initialization of clusters
Step 1: Cluster assignment step
Step 2: Move centroid step
Step 2 again: Move centroid step. No improvement, so stop.

K-means algorithm
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, with $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

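Data set description example
Example | f1 f2 … fn | Cluster index | $c^{(i)}$ | $\mu_{c^{(i)}}$
1 | … | 1 | $c^{(1)} = 1$ | $(v_1, v_2, \ldots, v_n)$
2 | … | 1 | $c^{(2)} = 1$ | $(v_1, v_2, \ldots, v_n)$
3 | … | 2 | $c^{(3)} = 2$ |
4 | … | 1 | $c^{(4)} = 1$ | $(v_1, v_2, \ldots, v_n)$
… | … | 2 | … | …
m | | | |
$1 \le c^{(i)} \le K$, where $c^{(i)}$ is the index of the centroid assigned to data example $x^{(i)}$.
Say $K = 5$; then we have 5 centroids.
$\mu_{c^{(i)}}$ where $c^{(i)} = 1$: $\mu_1 = \frac{x^{(1)} + x^{(2)} + x^{(4)}}{3} = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n$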
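The centroid of cluster 1 in the example above is just the coordinate-wise mean of the examples whose cluster index is 1. A minimal sketch, assuming made-up 2-D points and a hypothetical helper named `centroid` (neither comes from the slides):

```python
# Hypothetical 2-D points standing in for x^(1)..x^(4); the values are made up.
points = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (5.0, 0.0)]
# Cluster index c^(i) of each example, as in the table: 1, 1, 2, 1.
assignments = [1, 1, 2, 1]


def centroid(points, assignments, cluster):
    """Return the coordinate-wise mean of the points assigned to `cluster`."""
    members = [p for p, c in zip(points, assignments) if c == cluster]
    n_dims = len(members[0])  # assumes the cluster has at least one member
    return tuple(sum(p[d] for p in members) / len(members) for d in range(n_dims))


mu_1 = centroid(points, assignments, 1)  # (x^(1) + x^(2) + x^(4)) / 3
print(mu_1)  # (3.0, 2.0)
```

The result is itself a point in the same n-dimensional space as the data, which is why a centroid can be compared directly with any example.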

Clustering: Optimization objective

K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
Step 1 (cluster assignment step): a for loop over the $K$ centroids finds the nearest centroid to $x^{(i)}$. Many distance measures may be used; here we use the squared Euclidean distance, $c^{(i)} = \arg\min_k \|x^{(i)} - \mu_k\|^2$.
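The cluster assignment step can be sketched as follows. This is an illustration only: the function names `squared_distance` and `assign_clusters` and the sample data are assumptions, and ties are broken in favour of the lowest centroid index.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def assign_clusters(points, centroids):
    """Return the 1-based index of the closest centroid for every point."""
    assignments = []
    for x in points:
        # Loop over the K centroids and keep the index of the nearest one.
        distances = [squared_distance(x, mu) for mu in centroids]
        assignments.append(distances.index(min(distances)) + 1)
    return assignments


# Made-up data: two points near the first centroid, two near the second.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centroids = [(0.0, 1.0), (9.0, 10.0)]
c = assign_clusters(points, centroids)
print(c)  # [1, 1, 2, 2]
```

Swapping `squared_distance` for another distance function changes the notion of "closest" without touching the rest of the step.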

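K-means algorithm
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
Step 2 (move centroid step): the second for loop moves each centroid $\mu_k$ to the mean of the points assigned to it.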
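Putting the two steps together gives the full loop. The sketch below is one possible pure-Python rendering, not the slides' implementation: the name `k_means`, the 0-based cluster labels, the stopping rule (stop when assignments no longer change), and the handling of empty clusters are all assumptions.

```python
import random


def squared_distance(x, mu):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def k_means(points, k, max_iters=100, seed=0):
    """Run the two-step loop; returns (assignments, centroids) with 0-based labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K training examples as initial centroids
    assignments = None
    for _ in range(max_iters):
        # Step 1: cluster assignment step.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        if new_assignments == assignments:
            break  # assignments unchanged, so the centroids would not move either
        assignments = new_assignments
        # Step 2: move centroid step.
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Made-up 1-D data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
assignments, centroids = k_means(points, k=2)
print(assignments, centroids)
```

On this tiny data set every choice of initial centroids leads to the same two groups, with centroids at 0.5 and 10.5; only the label order depends on the seed.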

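K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \|x^{(i)} - \mu_{c^{(i)}}\|^2$
$\min_{c^{(1)}, \ldots, c^{(m)},\, \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$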
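The objective (distortion) is the mean squared distance between each example and the centroid of the cluster it is assigned to. A small sketch of how it could be computed, with the function name `distortion` and the sample data chosen here purely for illustration:

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def distortion(points, assignments, centroids):
    """J = (1/m) * sum over i of ||x^(i) - mu_{c^(i)}||^2, with 1-based c^(i)."""
    m = len(points)
    total = sum(
        squared_distance(x, centroids[c - 1])  # c - 1: lists are 0-based
        for x, c in zip(points, assignments)
    )
    return total / m


# Made-up data: every point is exactly 1 unit away from its centroid.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
assignments = [1, 1, 2, 2]
centroids = [(1.0, 0.0), (11.0, 0.0)]
J = distortion(points, assignments, centroids)
print(J)  # 1.0
```

Both steps of K-means can only lower or keep this value: the assignment step minimises it over the cluster indices, and the move centroid step minimises it over the centroids.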

Clustering: Random initialization

Random initialization
Should have $K < m$.
Randomly pick $K$ training examples.
Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
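A minimal sketch of this initialization, using `random.sample` from the Python standard library to pick K distinct training examples. The function name `init_centroids`, the `seed` parameter, and the data are assumptions made for the example.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick K distinct training examples and use them as the initial centroids."""
    if not k < len(points):
        raise ValueError("random initialization should have K < m")
    rng = random.Random(seed)
    return rng.sample(points, k)  # sampling without replacement


# Made-up training set with m = 5 examples.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0), (9.0, 0.0)]
initial = init_centroids(points, k=2, seed=0)
print(initial)
```

Because the centroids start on top of actual data points, every initial centroid is guaranteed to lie inside the region the data occupies.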

Local optima example 1
Optimal clustering

Local optima: K-means can get stuck in a local optimum, with no improvement in the objective function over further iterations.

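Random initialization solution: run K-means many times and pick the clustering that gave the lowest cost.
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$.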
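The restart loop above might look like this in Python. It is a sketch under assumptions: `k_means`, `distortion`, and `best_of_restarts` are hypothetical names, labels are 0-based, and a single seeded random generator is shared across restarts so each run starts from a different draw.

```python
import random


def squared_distance(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def k_means(points, k, rng, max_iters=100):
    """One K-means run from a random initialization; 0-based labels."""
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


def distortion(points, assignments, centroids):
    """Mean squared distance between each point and its assigned centroid."""
    total = sum(squared_distance(x, centroids[c]) for x, c in zip(points, assignments))
    return total / len(points)


def best_of_restarts(points, k, restarts=100, seed=0):
    """Run K-means `restarts` times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        assignments, centroids = k_means(points, k, rng)
        cost = distortion(points, assignments, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignments, centroids)
    return best


points = [(0.0,), (1.0,), (10.0,), (11.0,)]  # made-up 1-D data
best_cost, best_assignments, best_centroids = best_of_restarts(points, k=2)
print(best_cost, best_centroids)
```

Keeping only the best run costs nothing extra in memory, and the number of restarts (100 here, as on the slide) trades running time against the chance of escaping a poor local optimum.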

Clustering: Choosing the number of clusters

What is the right value of K?

Choosing the value of K
Elbow method: plot the cost function $J$ against $K$ (no. of clusters) and look for the "elbow" in the curve.
Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for
how well it performs for that later purpose.
E.g. T-shirt sizing (plot axes: Height, Weight):
K=3: Small, Medium, Large
K=5: S, M, L, XL, XXL

Some distance measures

Complexity
Running a fixed number t of iterations of the standard algorithm takes only O(t*k*m*n), where:
• t is the number of iterations,
• m is the number of examples (data points),
• n is the dimension of the points (n-dimensional points),
• k is the number of centroids (or clusters).
This is what practical implementations do (often with random restarts between the iterations).
Where the factors come from:
• t iterations of the outer Repeat loop,
• m iterations over the examples,
• k iterations over the centroids,
• n operations to find the distance between a centroid and any $x^{(i)}$.
Efficient algorithm!
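The product t*k*m*n can be checked by counting the work in the nested loops directly. The function below is only a counting illustration with made-up sizes, not a K-means implementation; its name and parameters are assumptions.

```python
def count_coordinate_operations(t, m, k, n):
    """Count per-coordinate distance operations done by the assignment steps."""
    ops = 0
    for _ in range(t):          # t iterations of the outer Repeat loop
        for _ in range(m):      # m examples
            for _ in range(k):  # k centroids compared against each example
                ops += n        # one distance costs n coordinate operations
    return ops


ops = count_coordinate_operations(t=3, m=4, k=2, n=5)
print(ops)  # 120
```

The move centroid step adds only about m*n operations per iteration, so the assignment step dominates the total.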

K-Means Visualization
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
