2. Iteration in K-means Clustering
In each iteration of the K-means algorithm, every data point is assigned to its nearest centroid, and each
centroid is then moved to the mean of the points assigned to it. The process repeats, refining the clusters with
each pass, until it converges. Stopping conditions, such as a maximum number of iterations or a minimum change
in centroid positions between iterations, ensure that the algorithm halts once the clusters have stabilized.
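The two stopping conditions described above can be expressed as a small check. This is a minimal Python/NumPy sketch, assuming centroids are stored as a (K, n_features) array; the function name `should_stop` and the defaults `max_iter=300` and `tol=1e-4` are illustrative assumptions, not values from these slides.

```python
import numpy as np

def should_stop(old_centroids, new_centroids, iteration, max_iter=300, tol=1e-4):
    """Return True once either stopping condition is met.

    Hypothetical helper: max_iter and tol are illustrative defaults.
    """
    # Condition 1: the iteration budget is used up.
    if iteration >= max_iter:
        return True
    # Condition 2: no centroid moved farther than tol (Euclidean distance).
    shift = np.linalg.norm(new_centroids - old_centroids, axis=1)
    return bool(np.all(shift <= tol))
```

In practice the check runs once per iteration, right after the update step, comparing the centroids before and after that update.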
3. Introduction to Clustering
What is Clustering?
Clustering is a technique used to group similar data points together based on their features or attributes.
It is a fundamental concept in unsupervised learning and plays a central role in identifying patterns within
datasets.
Purpose of Grouping Data
The main purpose of clustering is to discover inherent structures within the data, helping to organize and
understand complex datasets. By identifying similarities among data points, clustering can provide valuable
insights for various applications.
4. What is K-means Clustering?
1 Definition
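K-means clustering is a popular unsupervised learning algorithm that partitions data into K clusters, where K is
chosen in advance. It aims to minimize the variance within each cluster, measured as the total squared distance
between each point and its cluster's centroid. Because the cost of each iteration grows only linearly with the
number of data points, it is well suited to large datasets and to exploratory data analysis.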
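The within-cluster variance that K-means minimizes is usually written as the within-cluster sum of squares (often called inertia). Writing C_k for the set of points in cluster k and μ_k for its centroid (notation introduced here for illustration), the objective is:

```latex
J = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```

Neither the assignment step nor the update step can increase J, which is why the iteration converges. It reaches a local minimum that depends on the initial centroids, not necessarily the global one.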
2 Unsupervised Learning
As an unsupervised learning algorithm, K-means clustering does not require labeled output or
target variables. It automatically identifies patterns in the input data and assigns data points to
clusters without external guidance.
5. How Does K-means Work?
Initialization
K-means clustering begins by randomly selecting K centroids, often K distinct data points, which serve as the
initial cluster centers. This choice matters, because different starting centroids can lead to different final
clusters.
Assignment & Update
Once the centroids are initialized, the algorithm alternates between two steps: an assignment step, where each
data point is assigned to the nearest centroid (typically by Euclidean distance), and an update step, where each
centroid is recalculated as the mean of the data points in its cluster.
Iteration
The assignment and update steps are repeated until convergence, refining the clusters with each iteration.
Convergence is reached when the centroids no longer change significantly between iterations.
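Putting initialization, assignment, update and the convergence check together gives the following minimal Python/NumPy sketch. It is not a reference implementation: the names `kmeans` and `_assign`, the parameters `max_iter`, `tol` and `seed`, and the rule that an empty cluster keeps its previous centroid are all assumptions made for illustration.

```python
import numpy as np

def _assign(X, centroids):
    # Index of the nearest centroid for each point (squared Euclidean distance).
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal K-means sketch for an (n_samples, n_features) array X."""
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points become the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step.
        labels = _assign(X, centroids)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid (one simple convention).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence check: largest distance any centroid moved this iteration.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift <= tol:
            break
    # Label points against the final centroids before returning.
    return centroids, _assign(X, centroids)
```

Because the result depends on the random starting centroids, a common practice is to run the algorithm several times with different seeds and keep the run with the lowest within-cluster sum of squares.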
6. Example Dataset
1 Characteristics of Dataset
The example dataset consists of data points with (x, y) coordinates plotted on a two-dimensional graph. The
points fall into visible groupings, which makes the dataset suitable for demonstrating the K-means clustering
algorithm.
7. Step 1 - Initialization
Random Centroid Selection
The first step in K-means clustering is to randomly select K initial centroids, shown as red dots on the graph.
These centroids serve as the starting points for the clustering process.
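Random initialization can be sketched as drawing K distinct rows of the data array. `init_centroids` is a hypothetical helper, not from the slides, and it assumes the data is a NumPy array with one point per row. Libraries often use smarter seeding instead; for example, k-means++ is the default `init` of scikit-learn's `KMeans`.

```python
import numpy as np

def init_centroids(X, k, seed=None):
    """Random initialization: choose k distinct data points as starting centroids."""
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of data points")
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)  # k distinct row indices
    return X[idx].astype(float)  # copy, so later updates do not modify X
```

Passing a fixed `seed` makes the starting centroids, and therefore the whole run, reproducible.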
8. Step 2 - Assignment
Assigning Data Points
Each data point is assigned to its nearest centroid, forming K distinct clusters. Points are therefore grouped by
their proximity to the current centroids, which serves as the measure of similarity.
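The assignment step amounts to computing every point-to-centroid distance and taking the index of the smallest. The sketch below assumes NumPy, uses a hypothetical helper `assign_points`, and runs on made-up toy points rather than the slide's dataset. Comparing squared distances picks the same nearest centroid as comparing Euclidean distances while avoiding a square root.

```python
import numpy as np

def assign_points(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # Broadcast to an (n_points, k) matrix of squared Euclidean distances.
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

# Made-up toy data: five 2-D points and two centroids.
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.5, 4.5], [5.0, 7.0], [3.5, 5.0]])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])
labels = assign_points(X, centroids)
print(labels)  # [0 0 1 1 1]
```

For example, the point (3.5, 4.5) is at squared distance 18.5 from (1, 1) but only 8.5 from (5, 7), so it joins cluster 1.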
9. Step 3 - Update Centroids
Recalculating Centroids
After the data points have been assigned to clusters, each centroid is recalculated as the mean of the data points
in its cluster. This moves each centroid to the center of its cluster: the mean is the point with the smallest
total squared distance to the cluster's members.
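The update step replaces each centroid with the mean of its assigned points. This hypothetical `update_centroids` helper is a NumPy sketch applied to five made-up points and their cluster labels; letting an empty cluster keep its old centroid is an assumed convention, since the slides do not cover that case.

```python
import numpy as np

def update_centroids(X, labels, old_centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new = old_centroids.astype(float)  # astype returns a copy
    for j in range(len(old_centroids)):
        members = X[labels == j]
        if len(members) > 0:  # assumed convention: an empty cluster keeps its old centroid
            new[j] = members.mean(axis=0)
    return new

# Made-up toy data with labels from a previous assignment step.
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.5, 4.5], [5.0, 7.0], [3.5, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
old = np.array([[1.0, 1.0], [5.0, 7.0]])
new_centroids = update_centroids(X, labels, old)
# centroid 0 -> (1.25, 1.5), centroid 1 -> (4.0, 5.5)
```

Centroid 1 moves from (5, 7) to (4.0, 5.5), the mean of its three members, which pulls it toward the middle of its cluster.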