Kmeans

Gaurav R. Handa
TE CMPN-A, 40
Under the guidance of
Ms. Veena Kulkarni
Thakur College of Engineering and Technology, Shyamnarayan Marg, Thakur Village,
Kandivli (E), Mumbai-101. Year 2013-2014

This presentation describes an implementation of the k-means
algorithm. Along with a brief description of the algorithm,
it provides graphs and worked arithmetic problems to aid
understanding.
It shows how the k-means algorithm can be implemented
efficiently and discusses the drawbacks of the algorithm.

• Business Intelligence is a more advanced form of data
mining and database analysis.
• Business Intelligence enables a business to make
intelligent, fact-based decisions.
• Its analysis tasks are divided into Association Analysis,
Classification, Clustering and Regression.
• Data clustering is a method that groups objects with
similar characteristics into clusters.
• Clustering is further divided into hierarchical,
partitional and density-based methods. K-means is a
partitional clustering algorithm.

• The knowledge discovery process of analyzing large
volumes of data from various perspectives and organizing
them into useful information.
• The search for valuable information in large volumes of
data, and the identification of hidden structures in that data.

The k-means algorithm is a centroid-based technique in
which each cluster is represented by the centre of the
cluster (the mean of its points).
The algorithm aims to minimize an objective function,
specifically the sum of squared errors between each point
and the centre of its cluster.
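
The centroid-based iteration described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the implementation used in the presentation. The function names (`squared_distance`, `mean_point`, `kmeans`), the choice of random data points as initial centres, and the sample data are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Alternate between assigning points to the nearest centre and
    recomputing each centre as the mean of its assigned points."""
    rng = random.Random(seed)
    # Assumption: initial centres are k data points drawn at random.
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = mean_point(members)
    # Objective value: sum of squared errors to the assigned centres.
    sse = sum(squared_distance(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, sse

# Hypothetical sample data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels, sse = kmeans(data, k=2)
print(centres, labels, sse)
```

Each pass can only lower (or keep) the squared error, which is why the loop stops once the assignments no longer change. The result is a local optimum and in general depends on the initial centres.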

Papers on K-Means
• “The Uniqueness of a Good Optimum for K-Means”, Marina Meila,
Proceedings of the 23rd International Conference on Machine Learning,
2006. By augmenting k-means with a simple, randomized seeding
technique, one obtains an algorithm that is O(log k)-competitive with
the optimal clustering and improves both speed and accuracy (this is
the k-means++ result of Arthur and Vassilvitskii, SODA 2007).
• “The Effectiveness of Lloyd-Type Methods for the k-Means Problem”,
Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya
Swamy, SODA, 2007. Polynomial-time approximation schemes (PTASs)
have been obtained for the k-means clustering problem.
• “Improved Smoothed Analysis of the k-Means Method”, Bodo Manthey
and Heiko Röglin, preprint, 2008. One of the distinguishing features of
the k-means method is its speed in practice. Its worst-case running time,
however, is exponential, leaving a gap between practical and theoretical
performance. This technical paper aims at closing this gap.

1. Archaeology
The objective here is to cluster the locations of
archaeological sites and to make inferences about political
history based on the clusters.
The clusters suggest hypotheses, which can then be tested
by actually visiting the sites.

2. Computational Biology
Here, carp were exposed to different levels of cold, and genes
were clustered based on their response in different tissues.
Green indicates that a gene is under-expressed,
whereas red indicates that it is over-expressed.
The figure shows that there are patterns across the
different tissues.
Clustering is thus a useful tool for representing a large
amount of information in a single plot.

3. Education
This example is taken from the paper “Teachers as Sources of
Middle School Students’ Motivational Identity: Variable Centered
and Person Centered Analytic Approaches”.
In this paper, the survey results of 206 students are clustered.
The clusters are used to identify groups that support an
analysis of what affects motivation.
The number of clusters was chosen so as to yield a useful
hypothesis, which can then be verified.

• K, the number of clusters, must be specified in advance.
• Unable to handle noisy data and outliers (addressed by the
K-Medoids algorithm).
• Not suitable for discovering clusters with non-convex
shapes.
• Applicable only when a mean is defined (the K-Modes
algorithm handles categorical data).
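
The outlier sensitivity listed above can be illustrated with a small sketch. It compares the mean of a one-dimensional cluster with its medoid (the data point with the smallest total distance to all other points), before and after adding one extreme value. The helper names and the numbers are hypothetical and chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean, the cluster representative used by k-means."""
    return sum(values) / len(values)

def medoid(values):
    """Data point with the smallest total absolute distance to all points,
    the kind of representative used by k-medoids."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

# Hypothetical one-dimensional cluster, then the same cluster plus an outlier.
cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]

print(mean(cluster), medoid(cluster))
print(mean(with_outlier), medoid(with_outlier))
```

The single outlier drags the mean far away from the bulk of the data, while the medoid stays at one of the original points. This is the intuition behind using K-Medoids on noisy data.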

• The k-means algorithm is a simple yet popular method for
cluster analysis. Its performance depends on the
initialisation and on the choice of an appropriate distance
measure. There are several variants of k-means that address
its weaknesses:
– K-Medoids: resistance to noise and/or outliers
– K-Modes: extension to categorical data clustering
analysis
– CLARA: dealing with large data sets
– Mixture models (EM algorithm): handling uncertainty
of clusters

