K-means Clustering Explained

K-means Clustering
-Vinit Dantkale

Information:
What is Clustering?
• Clustering is also called "grouping".
• It organizes data into classes such that there is:
  - High intra-class similarity
  - Low inter-class similarity
Clustering Algorithm:
• Assigns the same label to data points that are close to each other.
• Relies on a distance metric between data points.

Information:
(contd.)
Types of Clustering:
1. Hierarchical: [find successive clusters]
  - Agglomerative (bottom-up)
  - Divisive (top-down)
2. Partitional: [construct various partitions and evaluate them]
  - K-means clustering
  - Fuzzy c-means
  - QT clustering
In k-means, the centroid is (typically) the mean of the points in the cluster.
  - Similarity is measured by a distance such as Euclidean distance or Manhattan distance.
NOTE: Used when the data is numeric, not when it is categorical or boolean.
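
The two distance measures named above can be sketched in a few lines of Python. The function names `euclidean` and `manhattan` and the sample points are illustrative choices, not part of the original slides.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points, chosen so the results are round numbers.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

Both functions rely on arithmetic on coordinates, which is why the note above restricts the method to numeric data.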

PseudoCode:
Input: K, set of points X1 ... Xn
Place centroids C1 ... Ck at random locations
Repeat until convergence:
  - for each point Xi:
      Find the nearest centroid Cj (by Euclidean distance)
      Assign the point Xi to cluster j
  - for each cluster j = 1 ... k:
      New centroid Cj = mean of all points Xi assigned to cluster j in the previous step
Stop when none of the cluster assignments change
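
The pseudocode could be turned into runnable Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the choice to start from k randomly picked input points (one common reading of "random locations") are all assumptions added here.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: "random locations" is read as k distinct input points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no cluster assignment changed since the previous pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical data: two visibly separated groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.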

Working:
(contd.)
Iteration 1 ends

Working:
(contd.) As no points change cluster, the algorithm stops.
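
The iterations can also be followed on a tiny data set. The sketch below uses hypothetical 1D points and deliberately poor starting centroids (both are assumptions made only for illustration) and prints the state at the end of each iteration until no point changes cluster.

```python
# Hypothetical 1-D data and starting centroids, chosen only for illustration.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [1.0, 2.0]
labels = None
iteration = 0

while True:
    # Assign every point to its nearest centroid.
    new_labels = [
        min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        for x in points
    ]
    if new_labels == labels:
        print("no point changed cluster: stop")
        break
    labels = new_labels
    iteration += 1
    # Recompute each centroid as the mean of its assigned points.
    for j in range(len(centroids)):
        members = [x for x, lab in zip(points, labels) if lab == j]
        if members:
            centroids[j] = sum(members) / len(members)
    print(f"iteration {iteration} ends: labels={labels} centroids={centroids}")

# iteration 1 ends: labels=[0, 1, 1, 1, 1, 1] centroids=[1.0, 7.6]
# iteration 2 ends: labels=[0, 0, 0, 1, 1, 1] centroids=[2.0, 11.0]
# no point changed cluster: stop
```

The final pass only confirms that the assignments are unchanged, so it is not counted as an iteration in this trace.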
Facts
  - K-means is fast compared to many other clustering algorithms.
  - The algorithm also works on 3D data (and, more generally, on numeric data of any dimension).

Application:
Real Life:
1. Marketing: Helps marketers discover distinct groups in their customer bases.
2. Insurance: Identifies groups of motor insurance policy holders with a high average claim cost.
Application Domain:
1. Vector quantization: Used for color quantization, which reduces the color palette of an image to a fixed number of colors k.
2. Image segmentation: The process of partitioning a digital image into multiple segments.
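
The color quantization step can be sketched as follows: once k palette colors are known, every pixel is replaced by the nearest one. The function name `quantize`, the palette, and the pixel values are hypothetical. In practice the palette would be the k centroids that k-means finds on the image's pixels.

```python
def quantize(pixels, palette):
    """Replace each RGB pixel with the nearest palette color (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(palette, key=lambda c: sq_dist(p, c)) for p in pixels]

# Hypothetical palette of k = 3 colors: black, white, red.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
# Hypothetical pixels: near-black, near-white, and a dull red.
pixels = [(10, 5, 0), (250, 240, 245), (200, 30, 20)]
print(quantize(pixels, palette))
# [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
```

Squared distance is used here because it gives the same nearest color as the Euclidean distance while skipping the square root.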

Complexity:
O(t*k*n)
where n is the number of data points, k is the number of clusters, and
t is the number of iterations.
Normally, k, t << n.
Strengths:
Relatively efficient and fast.
Often terminates early.
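
As a rough worked example of the O(t*k*n) bound, the sketch below counts point-to-centroid distance computations. The helper name `distance_evaluations` and the sample values of n, k, and t are assumptions picked for illustration.

```python
def distance_evaluations(n, k, t):
    """Point-to-centroid distances computed over t iterations: n * k per iteration."""
    return n * k * t

# Hypothetical sizes: 100,000 points, 5 clusters, 20 iterations.
print(distance_evaluations(n=100_000, k=5, t=20))  # 10000000

# For comparison, looking at every pair of points once costs about n * n.
print(100_000 * 100_000)  # 10000000000
```

With k and t much smaller than n, the work grows roughly linearly in the number of points.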

