Improving the accuracy of the K-means clustering algorithm
Kasun Ranga Wijeweera (firstname.lastname@example.org)
This presentation is based on the following research paper:
K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm," Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1 – 3, 2009, London, U.K.
Consider a set of n data points, X, and a set of k clusters.
Algorithm: k-means
1. Randomly choose k data items from X as the initial centroids.
2. Repeat:
   Assign each data point to the cluster that has the closest centroid.
   Calculate the new cluster centroids.
   Until the convergence criterion is met.
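The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation from the paper. It assumes Euclidean distance, treats "assignments no longer change" as the convergence criterion, and leaves the centroid of an empty cluster unchanged. The names k_means, closest_centroid, mean_point, max_iter and seed are hypothetical.

```python
import math
import random


def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean_point(points):
    """Component-wise arithmetic mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(X, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length tuples X.

    Assumed convergence criterion: the cluster assignments stop changing
    (or max_iter passes have been made).
    """
    rng = random.Random(seed)
    centroids = rng.sample(X, k)  # step 1: k random data items as centroids
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign each point to the cluster with the closest centroid.
        new_labels = [closest_centroid(x, centroids) for x in X]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Step 2b: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, labels
```

Because step 1 is random, different seeds can give different final clusters on less well-separated data, which is the weakness that the selection of initial centroids below is meant to address.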
Algorithm: selection of initial centroids
1. Set m = 1;
2. Compute the distance between each data point and all other data points in the set X;
3. Find the closest pair of data points in X and form a data point set A[m] (1 <= m <= k) that contains these two data points. Delete these two data points from X;
4. Find the data point in X that is closest to the data point set A[m]. Add it to A[m] and delete it from X;
5. Repeat step 4 until the number of data points in A[m] reaches 0.75*(n/k);
Algorithm: selection of initial centroids (continued)
6. If m < k, then set m = m + 1, find the pair of data points in X with the shortest distance between them, form another data point set A[m] from this pair, and delete the two points from X. Go to step 4;
7. For each data point set A[m] (1 <= m <= k), find the arithmetic mean of the vectors of the data points in A[m]. These means will be the initial centroids.
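Steps 1 to 7 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code. The slides do not say how the distance from a point to the set A[m] is measured, so the sketch assumes it is the smallest Euclidean distance to any member of A[m]. Distances are computed on demand instead of being stored up front as in step 2. The names initial_centroids and fraction are hypothetical, and the data set is assumed to be large enough that at least two points remain whenever a new set is started.

```python
import math
from itertools import combinations


def initial_centroids(X, k, fraction=0.75):
    """Derive k initial centroids from a list of equal-length tuples X."""
    remaining = list(X)          # working copy; points are deleted as they are used
    n = len(remaining)
    target = fraction * (n / k)  # step 5: size each set A[m] must reach
    centroids = []
    for _ in range(k):           # steps 1 and 6: m = 1, ..., k
        if len(remaining) < 2:
            raise ValueError("not enough data points left to form a pair")
        # Steps 3 and 6: closest pair among the remaining points.
        a, b = min(
            combinations(range(len(remaining)), 2),
            key=lambda ij: math.dist(remaining[ij[0]], remaining[ij[1]]),
        )
        group = [remaining[a], remaining[b]]
        del remaining[b]         # b > a, so delete b first to keep index a valid
        del remaining[a]
        # Steps 4 and 5: grow A[m] with the nearest remaining point.
        while len(group) < target and remaining:
            i = min(
                range(len(remaining)),
                # Assumption: distance to a set = distance to its nearest member.
                key=lambda idx: min(math.dist(remaining[idx], g) for g in group),
            )
            group.append(remaining.pop(i))
        # Step 7: the arithmetic mean of A[m] is an initial centroid.
        centroids.append(tuple(sum(c) / len(group) for c in zip(*group)))
    return centroids
```

The returned centroids could then replace the random choice in step 1 of k-means. Note that the closest-pair search compares every pair of remaining points, so this sketch is quadratic in n for each set.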