This document discusses K-means clustering, an unsupervised machine learning algorithm. It begins with an introduction to clustering and describes K-means clustering as assigning data points to K centroids, or cluster centers. The document outlines the K-means procedure, which iteratively assigns each data point to the closest centroid and recomputes the centroids until they no longer change. Advantages include faster computation than hierarchical clustering on large datasets; disadvantages include the difficulty of selecting the optimal value of K. Applications include wireless sensor networks, city planning, search engines, and email filtering.
Presentation by Deepak Verma and Ajay introduces K-means clustering and gives an overview of the topic.
Defines clusters as groups of similar objects and introduces hard and soft clustering with an example of colored balls.
Describes exclusive clustering (e.g., K-means) vs. overlapping clustering, and details hierarchical clustering as nested clusters.
Discusses how clustering algorithms identify groups based on similarity, focusing on the iterative K-means algorithm.
Flow chart and steps involved in K-means clustering: selecting centroids, assigning points, and recomputing cluster means. Advantages: speed and tighter clusters. Disadvantages: difficulty in determining the value of k and poor performance with clusters of varied sizes.
INTRODUCTION
A cluster is a group of similar objects.
Clustering refers to a method by which large sets of data are grouped into clusters of smaller sets of similar data.
Let us consider an example: we want to cluster balls of three different colors into three different groups.
The balls of the same color are clustered into one group.
Types of Clustering:-
Hard clustering
Soft clustering
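The following small Python sketch illustrates the difference between the two types. It is not from the original slides. In hard clustering, each point gets exactly one cluster label. In soft clustering, each point gets a degree of membership in every cluster. The 1-D points, the two fixed centers, and the inverse-squared-distance membership rule are all assumptions chosen for illustration.

```python
# Hypothetical 1-D example: two fixed cluster centers and a few points.
centers = [0.0, 10.0]
points = [1.0, 4.0, 5.0, 9.0]


def hard_assign(x, centers):
    """Hard clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def soft_assign(x, centers):
    """Soft clustering: return a membership degree for every center.

    Assumption: degrees are inverse squared distances normalized to sum
    to 1 (one common choice, not the only one).
    """
    eps = 1e-12  # avoids division by zero when x sits exactly on a center
    weights = [1.0 / ((x - c) ** 2 + eps) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


for x in points:
    memberships = [round(m, 2) for m in soft_assign(x, centers)]
    print(x, hard_assign(x, centers), memberships)
```

The point halfway between the centers (5.0) gets equal soft memberships. The hard rule must still pick one cluster, and `min` breaks the tie by taking the first center.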
Types of Clusters
Exclusive Clustering:
Data is grouped in an exclusive way, so that if a certain datum belongs to a definite cluster, it cannot be included in another cluster.
E.g. K-means
Overlapping Clustering:
Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
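The slides do not name a specific fuzzy algorithm. Fuzzy c-means is one well-known example, and the formula below assumes it. With cluster centers c_1 to c_K and a fuzzifier m > 1, the degree of membership u_{ij} of point x_i in cluster j is given below. The symbols u, c, and m are introduced here and do not appear in the slides.

```latex
u_{ij} = \left[ \sum_{l=1}^{K} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{\frac{2}{m-1}} \right]^{-1},
\qquad \sum_{j=1}^{K} u_{ij} = 1
```

Each point's memberships sum to 1, so a point can belong partly to two or more clusters. As m approaches 1, the memberships approach hard 0/1 assignments.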
Hierarchical clustering:
“A set of nested clusters organized as a hierarchical tree”
• The hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively nested in a larger cluster until only one cluster remains.
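Agglomerative (bottom-up) hierarchical clustering illustrates the nesting described above. The Python sketch below is an illustration under stated assumptions, not the slides' own method. It uses 1-D points and single linkage, where the distance between two clusters is the distance between their closest pair of points. It records each merge until only one cluster remains.

```python
def single_linkage(a, b):
    """Distance between two clusters = smallest pairwise point distance."""
    return min(abs(p - q) for p in a for q in b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        history.append(sorted(merged))
        # Delete the larger index first so index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history


# Hypothetical 1-D data.
for step in agglomerate([1.0, 2.0, 6.0, 7.0, 20.0]):
    print(step)
```

Each printed step contains the previous ones, which is the "progressively nested" structure of the hierarchical tree.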
Clustering Algorithms:-
A clustering algorithm attempts to find natural groups of components (or data) based on some similarity.
The clustering algorithm finds the centroid of each group of data.
Most algorithms evaluate the distance between a point and the cluster centroids.
Figure: Raw data → Clustering algorithm → Clusters of data
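The two operations mentioned above can be sketched in Python: finding a group's centroid, and measuring a point's distance to each centroid. Euclidean distance and the sample 2-D points are assumptions, because the slides do not name a distance measure.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def euclidean(p, q):
    """Straight-line distance between two points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical groups of 2-D data.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0)]
group_b = [(8.0, 8.0), (9.0, 9.0)]
centroids = [centroid(group_a), centroid(group_b)]

point = (2.0, 2.0)
distances = [euclidean(point, c) for c in centroids]
print(centroids)
print(distances)
```

The point lies much closer to the first centroid. Assigning it to the nearest centroid is the core step of K-means.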
K-Means Algorithm:-
It is a distance-based, partitional clustering algorithm.
“K” stands for the number of clusters; it is a user input to the algorithm.
It is an unsupervised algorithm.
Each cluster is associated with a centroid.
Each point is assigned to the cluster with the closest centroid.
This algorithm is iterative in nature.
Procedure:-
1) Select K points as the initial centroids.
2) Repeat:
3) Form K clusters by assigning every point to its closest centroid.
4) Recompute the centroid of each cluster.
5) Until the centroids don't change.
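The procedure above can be sketched in Python. This is a minimal illustration, not a reference implementation. It makes three assumptions:

- distance is Euclidean, computed with `math.dist` (Python 3.8+);
- the initial K points are chosen at random with a fixed seed;
- an empty cluster keeps its previous centroid.

The comments map each part to the numbered steps.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means on a list of coordinate tuples.

    Assumptions: Euclidean distance, seeded random choice of the initial
    centroids, and empty clusters keep their old centroid.
    """
    rng = random.Random(seed)
    # Step 1: select K points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):  # Step 2: repeat
        # Step 3: form K clusters by assigning every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Step 4: recompute the centroid (mean) of each cluster.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        # Step 5: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # If max_iter is reached first, `clusters` reflect the assignment made
    # before the last centroid update.
    return centroids, clusters


# Hypothetical data with two well-separated groups.
data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(data, k=2)
print(sorted(centroids))
```

The result depends on which K points are chosen first. Practical implementations therefore often run the algorithm several times from different starting points and keep the best result.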
ADVANTAGES
1. If the number of variables is large, K-Means is most of the time computationally faster than hierarchical clustering, provided k is kept small.
2. K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.
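The slides do not define "tighter clusters." One common way to make it precise is the within-cluster sum of squares, which is the quantity K-Means tries to minimize. With clusters C_1 to C_K and centroids \mu_1 to \mu_K:

```latex
W = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2
```

For example, a 1-D cluster {1, 3} has centroid 2 and contributes (1 - 2)^2 + (3 - 2)^2 = 2 to W. Neither the assignment step nor the centroid-update step of the procedure can increase W. This is why the procedure eventually stops.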
DISADVANTAGES
1. Difficult to predict the k-value.
2. It does not work well with clusters of different sizes and different densities.
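For the first disadvantage, choosing k, one widely used heuristic is the elbow method. It is not part of the original slides. The method runs K-Means for several values of k and computes the within-cluster sum of squares for each. You then look for the k after which further increases give only small improvements. The Python sketch below assumes 1-D data and a deterministic initialization so that the output is reproducible. `kmeans_1d` is a hypothetical helper written only for this example.

```python
def kmeans_1d(points, k, max_iter=100):
    """Hypothetical minimal 1-D K-means returning the within-cluster sum of squares.

    Assumption: deterministic initialization from the middle of k
    equal-size chunks of the sorted data (chosen only for reproducibility).
    """
    pts = sorted(points)
    n = len(pts)
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for x in pts:
            j = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[j].append(x)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    # Sum of squared distances from each point to its cluster's centroid.
    return sum((x - centroids[i]) ** 2 for i, c in enumerate(clusters) for x in c)


data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical data with three groups
inertias = {k: kmeans_1d(data, k) for k in range(1, 5)}
for k, w in inertias.items():
    print(k, round(w, 2))
```

This made-up data has three well-separated groups. Its sum of squares drops sharply up to k = 3 and only slightly after that. On real data the bend is often much less clear, which is why choosing k is listed as a disadvantage.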