OVERVIEW :- Introduction, Literature Survey, Implementation Details, Future Scope
INTRODUCTION
A cluster is a group of similar data objects. Clustering is a method by which a large data set is partitioned into smaller groups (clusters) of similar data. There are several types of clustering, including :- Hierarchical clustering, Partitional clustering, Density-based clustering, Distance-based clustering.
One of the algorithms for clustering is the K-means algorithm. As the name suggests, it divides the data set into K clusters, where K is a positive integer. First, K initial centroids are chosen. Then each data point is assigned to the cluster whose centroid is closest to it, and the centroid of each cluster is recomputed as the mean of the points assigned to it. This process repeats iteratively until the centroids no longer change, at which point the data is divided into K clusters.
LITERATURE SURVEY
Clustering :- Let us consider an example :- sorting balls of three different colours into three different groups.
The balls of the same colour are clustered into one group.
Types of Clustering :- Hard clustering, Soft clustering
Clustering Algorithms :- A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. Many clustering algorithms also find the centroid of each group of data points, and most of these evaluate the distance between a point and the cluster centroids to decide where the point belongs.
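The distance evaluation mentioned above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and 2-D points; the function name `nearest_centroid` and the sample coordinates are hypothetical and not part of the original slides.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D data: three centroids and one point to place.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
point = (4.0, 4.5)

# The point lies nearest to (5.0, 5.0), so the index printed is 1.
print(nearest_centroid(point, centroids))
```

Other distance measures (for example Manhattan distance) could be substituted for `math.dist` without changing the structure of the function.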
K-Means Algorithm :- It is a distance-based, partitional clustering algorithm. “K” stands for the number of clusters and is a user input to the algorithm. It is an unsupervised algorithm. Each cluster is associated with a centroid. Each point is assigned to the cluster with the closest centroid. The algorithm is iterative in nature.
1) Select K points as the initial centroids.
2) repeat
3)     form K clusters by assigning all points to the closest centroid.
4)     recompute the centroid of each cluster.
5) until the centroids don’t change.
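The five steps above can be sketched in Python roughly as follows. This is only an illustrative sketch under several assumptions that the slides do not state: points are tuples of floats, distance is Euclidean, the initial centroids are sampled at random from the data, an empty cluster keeps its previous centroid, and a `max_iter` cap guards against very long runs. The names `k_means`, `mean_point`, `max_iter` and `seed` are hypothetical.

```python
import math
import random

def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: select K points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):  # Step 2: repeat
        # Step 3: form K clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Step 4: recompute the centroid of each cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Step 5: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
        (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(data, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

With data this well separated, the two groups are recovered regardless of which points are picked initially; on less clean data the result can depend on the initial centroids, which is why the seed is exposed as a parameter here.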
K-means example, step 1 :- Pick k = 3 initial cluster centers k1, k2, k3 (randomly).
K-means example, step 2 :- Assign each point to the closest cluster center.
K-means example, step 3 :- Move each cluster center to the mean of each cluster.
K-means example, step 4 :- Reassign points that are now closest to a different new cluster center. Q: Which points are reassigned?
K-means example, step 4 (continued) :- A: three points.
K-means example, step 5 :- Re-compute the cluster means.
K-means example, step 6 :- Move the cluster centers to the cluster means.
[Figures: X-Y scatter plots showing the data points and the cluster centers k1, k2, k3 at each step]
Advantages : Simple and easy to understand. Items are automatically assigned to clusters.
Disadvantages : The number of clusters, K, must be determined beforehand. We never know which attribute contributes more to the grouping process, since each attribute is assumed to have the same weight. Sensitive to outliers, because each centroid is a mean and a single extreme point can shift it.
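The sensitivity to outliers can be seen from the centroid computation alone. The short sketch below uses hypothetical 1-D values (not taken from the slides) to show how far a single extreme point moves a cluster mean.

```python
from statistics import mean

# Hypothetical 1-D cluster values.
cluster = [2.0, 3.0, 4.0]
# The same cluster with one extreme value added.
with_outlier = cluster + [51.0]

print(mean(cluster))       # 3.0
print(mean(with_outlier))  # 15.0
```

One outlier moves the centroid from 3.0 to 15.0, far from every ordinary member of the cluster, which in turn can change which points are assigned to it in the next iteration.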
Applications of K-means : Unsupervised learning of neural networks. Pattern recognition. Classification analysis. Artificial intelligence. Image processing. Machine vision. Email filtering. Web page classification.