K-MEANS CLUSTERING
PRESENTED BY:
DEEPAK VERMA (14052140019)
AJAY (1405214003)
OVERVIEW
• INTRODUCTION
• TYPES OF CLUSTERING
• IMPLEMENTATION DETAILS
• FLOW CHART
• ADVANTAGES AND DISADVANTAGES OF CLUSTERING
• APPLICATIONS
• REFERENCES
INTRODUCTION
• A cluster is a group of similar objects.
• Clustering is a method by which a large set of data is grouped into clusters, i.e. smaller subsets of similar data.
Let us consider an example:-
Suppose we want to sort balls of three different colors into three different groups.
• The balls of the same color are clustered into one group, as shown in the figure.
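Types of Clustering:-
• Hard clustering: each data point belongs to exactly one cluster.
• Soft clustering: each data point can belong to more than one cluster, with a degree of membership in each.
Types of Clusters:-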
Exclusive Clustering:
Data is grouped exclusively: if a datum belongs to one cluster, it cannot be included in any other cluster.
E.g. K-means
Overlapping Clustering:
Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
Hierarchical clustering:
“A set of nested clusters organized as a
hierarchical tree”
• Hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively merged into a larger cluster until only one cluster remains.
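The bottom-up (agglomerative) reading of this description can be sketched in a few lines of Python. This is an illustration under assumptions: Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members) are choices made here, not taken from the slides, and `agglomerative` is a hypothetical name.

```python
import math

def agglomerative(points):
    """Agglomerative hierarchical clustering with single linkage (an assumption).
    Every point starts in its own cluster; the two closest clusters are merged
    repeatedly until only one cluster remains. Returns the merges as
    (cluster_a, cluster_b, distance), with clusters as tuples of point indices."""
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        # Replace the two clusters with their union: the larger, nested cluster.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(tuple(sorted(a + b)))
    return merges
```

For the points (0, 0), (0, 1), (5, 0), (5, 2) the merges are points 0 and 1 at distance 1, then points 2 and 3 at distance 2, then the two pairs at distance 5, leaving a single cluster. The merge list is the hierarchical tree in flat form.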
Clustering Algorithms:-
• A clustering algorithm attempts to find natural groups of components (or data) based on some measure of similarity.
• Centroid-based clustering algorithms find the centroid of each group of data.
• Most such algorithms evaluate the distance between a point and the cluster centroids.
[Figure: Raw data → Clustering algorithm → Clusters of data]
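The two operations mentioned above, finding a centroid and measuring a point's distance to each centroid, can be sketched in Python as follows; the last helper uses those distances to assign each point to its nearest centroid. Euclidean distance is an assumption, and all helper names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; Euclidean distance is assumed.

def centroid(points):
    """Centroid of a group of points: the mean of each coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def distances_to_centroids(point, centroids):
    """Euclidean distance from one point to every cluster centroid."""
    return [math.dist(point, c) for c in centroids]


def assign_to_nearest(points, centroids):
    """Index of the closest centroid for each point (ties go to the lower index)."""
    labels = []
    for p in points:
        d = distances_to_centroids(p, centroids)
        labels.append(d.index(min(d)))
    return labels


print(centroid([(0, 0), (2, 0), (1, 3)]))                # (1.0, 1.0)
print(distances_to_centroids((1, 1), [(1, 1), (4, 5)]))  # [0.0, 5.0]
print(assign_to_nearest([(0, 0), (1, 0), (9, 9), (10, 10)],
                        [(0, 0), (10, 10)]))             # [0, 0, 1, 1]
```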
K-Means Algorithm:-
• It is a distance-based, partitional clustering algorithm.
• “K” stands for the number of clusters; it is supplied by the user as input to the algorithm.
• It is an unsupervised algorithm.
• Each cluster is associated with a centroid.
• Each point is assigned to the cluster with the closest centroid.
• The algorithm is iterative in nature.
Procedure:-
1) Select K points as the initial centroids.
2) Repeat:
3) Form K clusters by assigning every point to its closest centroid.
4) Re-compute the centroid of each cluster.
5) Until the centroids do not change.
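The procedure above can be sketched in plain Python. This is a minimal version of the usual Lloyd-style iteration, assuming Euclidean distance and random selection of K distinct data points as the initial centroids; `kmeans`, `max_iter`, and `seed` are hypothetical names, and keeping the old centroid when a cluster becomes empty is one simple choice among several.

```python
import math
import random


def _assign(points, centroids):
    # Index of the closest centroid for each point (Euclidean distance).
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=None):
    """Minimal K-means sketch. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # 1) Select K distinct data points as the initial centroids.
    centroids = [tuple(map(float, p)) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):  # 2) Repeat ...
        # 3) Form K clusters by assigning every point to its closest centroid.
        labels = _assign(points, centroids)
        # 4) Re-compute the centroid (mean) of each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # 5) ... until the centroids do not change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, _assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2, seed=0)
print(labels)  # the first three points share one label, the last three the other
```

Because the result depends on the initial centroids, different `seed` values can give different clusterings on data that is less well separated than this example.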
FLOW CHART:
START
→ Number of clusters K
→ Centroid
→ Distance of objects to centroids
→ Grouping based on minimum distance
→ Did any object move to a different group?
   • No: END
   • Yes: go back to “Centroid”
Example (k = 3):-
[Figures: scatter plots on X and Y axes showing the data points and the cluster centers k1, k2, k3 at each step]
STEP 1: Pick k = 3 initial cluster centers (randomly).
STEP 2: Assign each point to the closest cluster center.
STEP 3: Move each cluster center to the mean of its cluster.
STEP 4: Reassign the points that are now closest to a different (new) cluster center.
Q. Which points are reassigned?
A. Three points (shown with animation).
STEP 5: Re-compute the cluster means.
STEP 6: Move the cluster centers to the cluster means.
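The step-by-step example can be traced in code. The sketch below starts from fixed initial centers (rather than random ones, so the run is reproducible) and records which points move to a different cluster on each reassignment pass, which is the question asked at step 4. The function name and the data are hypothetical, and distance is assumed to be Euclidean.

```python
import math


def kmeans_with_trace(points, initial_centers, max_iter=100):
    """K-means from fixed initial centers. Records, for each reassignment pass
    after the first, the indices of points that moved to a different cluster.
    Stops when no point changes cluster. Returns (centers, labels, history)."""
    centers = [tuple(map(float, c)) for c in initial_centers]
    labels = None
    history = []
    for _ in range(max_iter):
        # Assign each point to the closest cluster center.
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if labels is not None:
            moved = [i for i, (a, b) in enumerate(zip(labels, new_labels)) if a != b]
            if not moved:
                break  # no object moved group: END
            history.append(moved)
        labels = new_labels
        # Move each cluster center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels, history


pts = [(0, 0), (1, 0), (5, 0), (6, 0), (10, 0)]
centers, labels, history = kmeans_with_trace(pts, [(0, 0), (1, 0), (10, 0)])
print(history)  # [[1], [3]]
print(labels)   # [0, 0, 1, 1, 2]
```

Here point 1 moves on the second pass and point 3 on the third; on the fourth pass nothing moves, which is the flow chart's stopping condition.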
ADVANTAGES
1. With a large number of variables, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small.
2. K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.
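One common way to quantify “tighter clusters” is the within-cluster sum of squared errors (SSE): the sum of squared distances from each point to the centroid of its own cluster, a quantity that K-means iterations never increase. A minimal Python sketch with a hypothetical function name; it measures tightness only and does not by itself compare K-means with hierarchical clustering.

```python
def within_cluster_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster's
    centroid. Smaller values mean tighter clusters."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(within_cluster_sse(pts, [0, 0, 1, 1], [(1, 0), (11, 0)]))  # 4: tight clusters
print(within_cluster_sse(pts, [0, 1, 0, 1], [(5, 0), (7, 0)]))   # 100: loose clusters
```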
DISADVANTAGES
1. It is difficult to predict the right value of k.
2. It does not work well when the clusters differ in size and density.
Applications
• Wireless sensor networks
• City planning
• Search engines
• Email filtering
K-Means clustering @jax
