2. K-means clustering is a simple unsupervised learning algorithm
that is used to solve clustering problems. It partitions a given data
set into a number of clusters, k, which is fixed beforehand. Each
cluster is represented by a centroid. Every observation is assigned
to the cluster whose centroid is nearest. Each centroid is then
recomputed as the mean of the points assigned to it. The process
repeats with the new centroids until the assignments stop changing
or another stopping criterion is reached.
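The loop described above can be sketched in plain Python. This is a minimal sketch of the standard batch form (often called Lloyd's algorithm). It assumes 2-D points and initial centroids supplied by the caller. The function name `kmeans` and its parameters are illustrative and are not taken from any library.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Batch k-means (Lloyd's algorithm) on 2-D points; illustrative sketch.

    points: list of (x, y) tuples
    centroids: list of k initial (x, y) centroids, fixed in advance
    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Tiny example with two obvious groups.
labels, cents = kmeans([(1, 1), (1.5, 2), (8, 8), (9, 8.5)], [(1, 1), (8, 8)])
print(labels)  # [0, 0, 1, 1]
print(cents)   # [(1.25, 1.5), (8.5, 8.25)]
```

The function uses `math.dist`, which is available from Python 3.8. A production version would usually vectorise the distance computation, for example with NumPy.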
3. This is a versatile algorithm that can be applied to many kinds of
grouping problems. Some example use cases are:
Behavioural segmentation:
Segment by purchase history
Segment by activities on application, website, or platform
Define personas based on interests
Create profiles based on activity monitoring
Inventory categorization:
Group inventory by sales activity
Group inventory by manufacturing metrics
4. Sorting sensor measurements:
Detect activity types in motion sensors
Group images
Separate audio
Identify groups in health monitoring
Detecting bots or anomalies:
Separate valid activity groups from bots
Group valid activity to clean up outlier detection
In addition, monitoring if a tracked data point switches
between groups over time can be used to detect
meaningful changes in the data.
5. Suppose we have the following data on the height and weight
of 12 students:
No.   Height (cm)   Weight (kg)
1     185           72
2     170           56
3     168           60
4     179           68
5     182           72
6     188           77
7     180           71
8     180           70
9     183           84
10    180           88
11    180           67
12    177           76
[Figure: scatter plot of Weight (kg) against Height (cm) for the 12 students]
6. Now I need to classify the data points into 2 clusters, named K1
and K2, using the K-means algorithm.
Here I am using the centroid concept, i.e. every cluster has an
associated centroid value.
The centroid is the reference point of a cluster. Each remaining data
point is compared with the centroids and placed in the cluster whose
centroid is nearest.
Then we calculate the distance of each data point from each
centroid.
Here the distance is the Euclidean distance:
$$ED = \sqrt{(X_o - X_C)^2 + (Y_o - Y_C)^2}$$
X_o & Y_o - observed values
X_C & Y_C - centroid values
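The formula above translates directly into a small helper. This is an illustrative sketch. The name `euclidean_distance` is hypothetical, and the standard library's `math.dist` computes the same value.

```python
import math

def euclidean_distance(observed, centroid):
    """ED between an observed point (X_o, Y_o) and a centroid (X_C, Y_C)."""
    (xo, yo), (xc, yc) = observed, centroid
    return math.sqrt((xo - xc) ** 2 + (yo - yc) ** 2)

# Row 3 of the table, (168, 60), against the two starting centroids.
print(round(euclidean_distance((168, 60), (185, 72)), 2))  # 20.81
print(round(euclidean_distance((168, 60), (170, 56)), 2))  # 4.47
```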
7. I have taken the 1st row as the centroid of K1, i.e.
(185, 72), and the 2nd row as the centroid of K2, i.e.
(170, 56).
Now we cluster the data into two clusters by
measuring the Euclidean distance.
Now, the ED for the 3rd row (168, 60) is:
K1: $\sqrt{(168 - 185)^2 + (60 - 72)^2} = \sqrt{433} \approx 20.81$
K2: $\sqrt{(168 - 170)^2 + (60 - 56)^2} = \sqrt{20} \approx 4.47$
As the 3rd row is nearer to K2 [ED for K2 < ED for K1],
the 3rd row goes into K2.
So our new clusters are:
K1 - 1st row
K2 - 2nd and 3rd rows
Centroids: K1 = (185, 72), K2 = (170, 56)
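The assignment step for a single row can be sketched as "pick the cluster whose centroid is closest". This is a minimal illustration. The dictionary layout and the helper name `nearest_cluster` are hypothetical.

```python
import math

# Starting centroids: row 1 seeds K1 and row 2 seeds K2.
centroids = {"K1": (185, 72), "K2": (170, 56)}

def nearest_cluster(point, centroids):
    """Return the name of the cluster whose centroid is closest to point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

print(nearest_cluster((168, 60), centroids))  # K2
```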
8. Now we need to recalculate the
centroid of K2 [as the 3rd row has
moved into K2].
So the new centroid of K2 is:
$$\left(\frac{170 + 168}{2}, \frac{60 + 56}{2}\right) = (169, 58)$$
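The centroid update is simply the mean of the cluster's points, taken coordinate by coordinate. Here is a minimal sketch, where `centroid` is a hypothetical helper name.

```python
def centroid(points):
    """Mean of a list of (x, y) points, i.e. the cluster's new centroid."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# K2 now holds rows 2 and 3.
print(centroid([(170, 56), (168, 60)]))  # (169.0, 58.0)
```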
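Next, we calculate the ED of the 4th row
from both centroids (using the updated K2
centroid) as before. We assign the row to the
nearer cluster, update that cluster's centroid,
and repeat for every remaining row.
Thus, we get the final K1 and K2
clusters as:
K1: {1, 4, 5, 6, 7, 8, 9, 10, 11, 12}
K2: {2, 3}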
Final centroids: K1 = (181.4, 74.5), K2 = (169, 58)
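The whole walkthrough can be replayed end to end. The sketch below follows the procedure from the slides. Rows 1 and 2 seed K1 and K2. Each remaining row is then taken in order, assigned to the nearer centroid, and that cluster's mean is recomputed immediately. This incremental update differs from the batch version, which reassigns every point before moving any centroid. The variable names are illustrative.

```python
import math

# Height (cm) and weight (kg) of the 12 students, in table order.
data = [
    (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77),
    (180, 71), (180, 70), (183, 84), (180, 88), (180, 67), (177, 76),
]

def mean(points):
    """Coordinate-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Rows 1 and 2 seed K1 and K2 (row numbers are 1-based, as in the table).
clusters = {"K1": [1], "K2": [2]}
centroids = {"K1": data[0], "K2": data[1]}

# Process the remaining rows one at a time: assign each row to the
# nearest centroid, then recompute that cluster's centroid.
for row in range(3, len(data) + 1):
    point = data[row - 1]
    k = min(centroids, key=lambda name: math.dist(point, centroids[name]))
    clusters[k].append(row)
    centroids[k] = mean([data[r - 1] for r in clusters[k]])

print(clusters)   # {'K1': [1, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'K2': [2, 3]}
print(centroids)  # {'K1': (181.4, 74.5), 'K2': (169.0, 58.0)}
```

On this data, the batch version started from the same two centroids ends with the same two clusters.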