2. Information:
What is Clustering?
• Clustering is also called “grouping”.
• Organizing data into classes such that there is:
High intra-class similarity
Low inter-class similarity
Clustering Algorithm:
• Assigning the same label to data points that are close to each other.
• Clustering algorithms rely on a distance metric between data points.
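In distance terms, "high intra-class similarity, low inter-class similarity" means that points in the same cluster should, on average, be closer to each other than to points in other clusters. The minimal sketch below checks this on a hand-labelled toy dataset; it assumes Euclidean distance, and the function name `intra_inter` and the data are illustrative only.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def intra_inter(points, labels):
    """Return (mean distance within clusters, mean distance across clusters)."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        # Same label -> intra-cluster pair, different label -> inter-cluster pair.
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Hypothetical toy data: two well-separated groups labelled 0 and 1.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = [0, 0, 0, 1, 1, 1]
print(intra_inter(points, labels))
```

A good clustering keeps the first number (within clusters) small relative to the second (across clusters).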
3. Information:
(contd.)
Types of Clustering:
1. Hierarchical: [find successive clusters]
Agglomerative (bottom-up)
Divisive (top-down)
2. Partitional: [construct various partitions and evaluate them]
K-means Clustering
Fuzzy c-means
QT clustering
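To make "agglomerative (bottom-up)" concrete, here is a naive sketch: every point starts as its own cluster and the two closest clusters are merged repeatedly until k remain. It assumes single linkage (cluster distance = distance of the closest pair of points), which the slide does not specify, and it is deliberately unoptimized; the names `agglomerative` and `single_linkage` are hypothetical.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    """Bottom-up clustering: start from singletons, merge the closest pair until k remain."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i is unaffected
    return clusters

print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], k=3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Divisive (top-down) clustering runs in the opposite direction, starting from one cluster and splitting it; it is not shown here.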
The centroid is (typically) the mean of the points in the cluster.
Similarity is measured by a distance such as Euclidean distance or Manhattan distance.
NOTE: Used when data is numeric, not when it is categorical or boolean.
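The centroid and both distances need subtraction and averaging over coordinates, which is why the method is restricted to numeric data. A minimal sketch, assuming points are tuples of numbers (the helper names are illustrative):

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of numeric points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

print(euclidean((0, 0), (3, 4)))            # 5.0
print(manhattan((0, 0), (3, 4)))            # 7
print(centroid([(0, 0), (2, 0), (1, 3)]))   # (1.0, 1.0)
```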
4. Pseudocode:
Input: k, set of points X1, ..., Xn
Place centroids C1, ..., Ck at random locations
Repeat until convergence:
  - for each point Xi:
      find the nearest centroid Cj
      assign the point Xi to cluster j
  - for each cluster j = 1, ..., k:
      new centroid Cj = mean of all points Xi assigned to cluster j in the previous step
Stop when none of the cluster assignments change
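The pseudocode above can be sketched in Python as follows. This is a minimal version under stated assumptions: points are tuples of numbers, distance is Euclidean (`math.dist`), the "random locations" are k distinct data points chosen at random, and `max_iter` is an added safety cap. The function name and parameters are illustrative, not a particular library's API.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """K-means following the pseudocode: assign points, then recompute centroids."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = rng.sample(sorted(set(points)), k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # no assignment changed: converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The cluster numbers (0 or 1) depend on the random start, but on well-separated data like this the first three points and the last three end up in different clusters.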
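Euclidean Distance:
$$d(X_i, C_j) = \sqrt{\sum_{m=1}^{p} (X_{i,m} - C_{j,m})^2}$$
where p is the number of features (coordinates) of each point.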
7. Working:
(contd.) Once no point changes cluster, the algorithm stops.
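A small traced run illustrates this stopping rule: the loop ends on the first iteration in which no point changes cluster. The 1-D values and the starting centroids below are hypothetical and chosen by hand so the trace is short.

```python
def kmeans_1d_trace(values, centroids):
    """1-D K-means from given starting centroids; prints each iteration, stops when nothing moves."""
    centroids = list(centroids)
    assignment = None
    iteration = 0
    while True:
        iteration += 1
        # Assign each value to its nearest centroid (ties go to the lower index).
        new_assignment = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        print(f"iteration {iteration}: centroids={centroids} assignment={new_assignment}")
        if new_assignment == assignment:
            return centroids, assignment  # no point changed cluster: the algorithm stops
        assignment = new_assignment
        # Recompute each centroid as the mean of its values (unchanged if the cluster is empty).
        for j in range(len(centroids)):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)

kmeans_1d_trace([1, 2, 3, 10, 11, 12], [1.0, 2.0])
```

Here the centroids move from 1.0 and 2.0 to 1.0 and 7.6, then to 2.0 and 11.0; on the third iteration no assignment changes, so the run stops.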
Facts
K-means is very fast compared with many other clustering algorithms, since each iteration is linear in the number of points.
The algorithm can also be used to form clusters in 3D.
8. Applications:
Real Life:
1. Marketing: Helps marketers discover distinct groups in their customer bases.
2. Insurance: Identifying groups of motor insurance policyholders with a high average claim cost.
Application Domain:
1. Vector quantization: e.g., color quantization, which reduces the color palette of an image to a fixed number of colors k.
2. Image segmentation: the process of partitioning a digital image into multiple segments.
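Both application-domain uses can be sketched on top of the same plain K-means loop. This is a minimal illustration, not a production image pipeline: it assumes an image is given as a list of (R, G, B) tuples for color quantization and as a list of rows of 0-255 intensities for segmentation, and it requires k to be at most the number of distinct values. All function names are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on equal-length numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def quantize_colors(pixels, k):
    """Vector quantization: replace each (R, G, B) pixel by its cluster's mean color."""
    centroids, labels = kmeans(pixels, k)
    palette = [tuple(round(c) for c in color) for color in centroids]  # back to integer colors
    return [palette[lab] for lab in labels]

def segment_grayscale(image, k):
    """Segmentation: label each pixel of a 2-D grayscale image by its intensity cluster."""
    width = len(image[0])
    flat = [(v,) for row in image for v in row]  # one 1-D point per pixel
    _, labels = kmeans(flat, k)
    return [labels[r * width:(r + 1) * width] for r in range(len(image))]

pixels = [(250, 10, 10), (240, 20, 15), (10, 10, 245), (20, 5, 250), (245, 15, 5), (15, 20, 240)]
print(quantize_colors(pixels, 2))

image = [[12, 15, 200, 210], [10, 20, 205, 215], [14, 18, 198, 220]]
print(segment_grayscale(image, 2))
```

The six colors collapse to two palette entries (one red, one blue), and the grayscale mask separates the dark left half from the bright right half; which segment gets label 0 depends on the random start.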
9. Complexity:
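O(t*k*n)
where n is the number of data points, k is the number of clusters, and t is the number of iterations.
(For d-dimensional points each distance costs O(d), giving O(t*k*n*d).)
Normally, k, t << n.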
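As a rough worked instance with hypothetical sizes n = 100,000 points, k = 10 clusters and t = 20 iterations, the assignment steps perform one distance computation per (point, centroid) pair per iteration:

$$t \cdot k \cdot n = 20 \times 10 \times 100{,}000 = 2 \times 10^{7} \text{ distance computations}$$

The count grows linearly in n, which is why k, t << n keeps the algorithm fast.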
Strengths:
Relatively efficient and fast.
Often terminates early.