This document discusses machine learning and K-means clustering. It provides an overview of the K-means algorithm, including random initialization of clusters, cluster assignment and moving centroid steps. It also discusses choosing the number of clusters, evaluating and visualizing K-means clustering, and some applications of clustering like image analysis and market segmentation. The document is attributed to Andrew Ng and references his lecture slides on machine learning and K-means clustering.
Machine Learning K-Means Clustering
1. Machine Learning
Computer Science Department, Faculty of Computer and Information System, Islamic University of Madinah, Madinah, KSA
Computer Science Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt.
K-Means Clustering Simply
Dr. Emad Nabil
The Lloyd Algorithm
Slides are compiled from many resources, thanks to those who made their slides available online.
Most of the slides are by Andrew Ng.
5. Andrew Ng
Applications of clustering
• Organize computing clusters
• Social network analysis
• Astronomical data analysis (Image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
• Market segmentation
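K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}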
1. Cluster assignment step: a for loop over the $K$ centroids finds the nearest centroid to $x^{(i)}$. Many distance measures may be used; here we use the squared Euclidean distance.
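The cluster assignment step can be sketched in a few lines of plain Python. This is a minimal illustration, not the slides' own code: the function names `squared_euclidean` and `assign_clusters` and the sample data are assumptions, and cluster indices start at 0 rather than 1.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """For each point, return the (0-based) index of its nearest centroid."""
    assignments = []
    for x in points:
        # Loop over the K centroids and keep the closest one.
        distances = [squared_euclidean(x, mu) for mu in centroids]
        assignments.append(distances.index(min(distances)))
    return assignments


# Hypothetical data: two points near each centroid.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1]
```

Ties are broken in favor of the lower index here, which is one of several reasonable choices.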
2. Move centroid step: each centroid $\mu_k$ moves to the average (mean) of the points assigned to cluster $k$.
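A minimal sketch of the move centroid step follows. The function name `move_centroids` and the sample data are assumptions made for illustration. The handling of an empty cluster (its centroid is left where it is) is also an assumption, since the slides do not say what to do in that case.

```python
def move_centroids(points, assignments, centroids):
    """Return new centroids: the mean of the points assigned to each cluster."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
            continue
        # Average each coordinate over the cluster's members.
        mean = tuple(sum(col) / len(members) for col in zip(*members))
        new_centroids.append(mean)
    return new_centroids


# Hypothetical data: four points, already assigned to clusters 0 and 1.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
assignments = [0, 0, 1, 1]
centroids = [(1.0, 1.5), (8.5, 9.0)]
print(move_centroids(points, assignments, centroids))
# [(1.25, 1.5), (8.5, 8.75)]
```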
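K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2$$
$$\min_{c^{(1)}, \dots, c^{(m)},\; \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$$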
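To make the objective concrete, the sketch below computes the cost (distortion) as the mean squared Euclidean distance between each example and the centroid of its assigned cluster. The function name `distortion` and the three sample points are assumptions for illustration only.

```python
def distortion(points, assignments, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    total = 0.0
    for x, c in zip(points, assignments):
        mu = centroids[c]
        total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
    return total / len(points)


# Hypothetical data: two points share centroid 0, one sits on centroid 1.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
assignments = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
cost = distortion(points, assignments, centroids)
print(cost)  # (1 + 1 + 0) / 3, roughly 0.667
```

Both K-means steps can only lower this value or leave it unchanged, which is why it is a useful quantity to monitor between iterations.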
Random initialization
Solution: run K-means many times and pick the clustering that gave the lowest cost.
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
}
Pick clustering that gave lowest cost $J$.
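The restart procedure can be sketched as below, using only the Python standard library. Several details are assumptions rather than something the slides specify: the names `run_kmeans` and `best_of_restarts`, initializing centroids to K distinct randomly chosen examples, stopping a run when assignments no longer change, and leaving an empty cluster's centroid in place.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random start; returns (cost, labels, centroids)."""
    # Assumption: start from k distinct examples picked at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Cluster assignment step.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move centroid step.
        for j in range(k):
            members = [x for x, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(x, centroids[c]) for x, c in zip(points, labels)) / len(points)
    return cost, labels, centroids


def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = (run_kmeans(points, k, rng) for _ in range(n_restarts))
    return min(runs, key=lambda run: run[0])


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cost, labels, centroids = best_of_restarts(points, k=2, n_restarts=20)
print(cost, labels)
```

On data this cleanly separated every run tends to find the same clustering; restarts matter more when K is small and the data admits several local optima.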
Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for
how well it performs for that later purpose.
E.g. T-shirt sizing
[Figure: two scatter plots of the same data, axes Height and Weight]
K = 3: Small, Medium, Large
K = 5: S, M, L, XL, XXL
Complexity
Complexity = O(t*k*m*n), where:
• t is the number of iterations,
• m is the number of examples (data points),
• n is the dimension of the (n-dimensional) points,
• k is the number of centroids (or clusters).
Running a fixed number t of iterations of the standard algorithm takes only O(t*k*m*n). This is what practical implementations do (often with random restarts between the iterations).
Where each factor comes from:
• t iterations of the outer Repeat loop,
• m iterations over the examples,
• k iterations over the centroids,
• n operations to find the distance between a centroid and any $x^{(i)}$.
Efficient algorithm!
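As a rough check of the O(t*k*m*n) count, the sketch below mirrors the loop structure of the cluster assignment step and tallies the per-coordinate operations. The function name `count_coordinate_ops` and the sample sizes are assumptions. It ignores the move centroid step, which adds only on the order of m*n operations per iteration.

```python
def count_coordinate_ops(t, k, m, n):
    """Count per-coordinate distance operations over t iterations."""
    ops = 0
    for _ in range(t):          # t iterations of the outer loop
        for _ in range(m):      # m examples
            for _ in range(k):  # k centroids
                ops += n        # n operations per distance computation
    return ops


# Hypothetical sizes: 10 iterations, 3 clusters, 1000 points in 2 dimensions.
print(count_coordinate_ops(t=10, k=3, m=1000, n=2))  # 60000
```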