2. Introduction
Centroid: the center of a cluster. A centroid can be either a real point in the data set or an imaginary one.
Objective function: measures the quality of a clustering (a small value is desirable). It is calculated by summing the squared distances of each point from the centroid of its cluster.
Two types of clustering are:
  k-means clustering
  Hierarchical clustering
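As a rough illustration of the objective function described above, the sketch below sums squared Euclidean distances to each cluster's centroid. The helper names (`centroid`, `objective`) and the toy points are assumptions made for this example, not part of the original slides.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def objective(clusters):
    """Sum of squared Euclidean distances from each point to its own cluster's centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total


# Hypothetical toy data: the same four points grouped two different ways.
good = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 9.0), (9.0, 11.0)]]
bad = [[(1.0, 1.0), (9.0, 9.0)], [(1.0, 3.0), (9.0, 11.0)]]

print(centroid(good[0]))  # (1.0, 2.0), an imaginary point that is not in the data
print(objective(good))    # 4.0
print(objective(bad))     # 128.0
```

The grouping that keeps nearby points together gives the smaller value, which is the sense in which a small objective value is desirable.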
3. k-means Clustering
k-means is an exclusive clustering algorithm: each object is assigned to exactly one cluster.
Algorithm:
  1. Select a value for ‘k’, the number of clusters.
  2. Select ‘k’ objects in an arbitrary fashion and use them as the initial set of k centroids.
  3. Assign each object to the cluster whose centroid is nearest to it.
  4. Recalculate the centroids.
  5. Repeat steps 3 & 4 until the centroids no longer move.
The algorithm may not find the best set of clusters, but it will always terminate.
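The k-means steps above can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation: the function name `k_means`, the `seed` and `max_iter` parameters, the choice of squared Euclidean distance, and the rule of keeping the old centroid when a cluster becomes empty are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k arbitrary objects as initial centroids
    clusters = []
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recalculate centroids (assumption: an empty cluster keeps its old centroid).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(data, k=2)
for c, members in zip(centroids, clusters):
    print(c, sorted(members))
```

Because the initial centroids are chosen arbitrarily, different seeds can lead to different final clusters on less clearly separated data, which is why the result is not guaranteed to be the best one.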
4. Agglomerative Hierarchical Clustering
Algorithm:
  1. Assign each object to its own single-object cluster.
  2. Calculate the distance between each pair of clusters (the distance matrix).
  3. Select the closest pair of clusters and merge them.
  4. Calculate the distance between the new cluster and each of the other clusters.
  5. Repeat steps 3 & 4 until all objects are in a single cluster.
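The agglomerative procedure above might look like the following sketch. For brevity it recomputes cluster distances on each pass instead of maintaining a distance matrix, and it assumes the shortest-member-distance rule as the distance between clusters. The names `agglomerate` and `closest_members` and the 1-D toy data are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def closest_members(a, b):
    """Assumed cluster distance: shortest distance between any two members."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, cluster_distance=closest_members):
    """Merge clusters pairwise until one remains; return the merge history."""
    clusters = [[p] for p in points]  # step 1: one single-object cluster per object
    merges = []
    while len(clusters) > 1:
        # Steps 2-3: find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        d = cluster_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Step 4: distances to the new cluster are recomputed on the next pass.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Hypothetical toy data: four 1-D points written as 1-tuples.
history = agglomerate([(0.0,), (1.0,), (5.0,), (7.0,)])
for left, right, d in history:
    print(left, "+", right, "at distance", d)
```

With n objects there are always n - 1 merges, and the recorded merge order is exactly the information needed to describe the full hierarchy.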
6. Example
Hierarchical clustering gives the entire hierarchy of clusters.
Dendrogram (a binary tree): the end result of hierarchical clustering.
7. Distance Measure
Three ways of calculating the distance between two clusters:
  Single-link clustering: the shortest distance from any member of one cluster to any member of the other cluster.
  Complete-link clustering: the longest distance from any member of one cluster to any member of the other cluster.
  Average-link clustering: the average of the distances from each member of one cluster to each member of the other cluster.
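The three measures can be compared side by side with a short sketch. Euclidean distance between members is assumed here, and the function names and sample clusters are made up for illustration.

```python
from itertools import product
from math import dist


def pair_distances(a, b):
    """Euclidean distances between every member of a and every member of b."""
    return [dist(p, q) for p, q in product(a, b)]


def single_link(a, b):
    return min(pair_distances(a, b))


def complete_link(a, b):
    return max(pair_distances(a, b))


def average_link(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)


# Hypothetical clusters of 1-D points written as 1-tuples.
a = [(0.0,), (1.0,)]
b = [(4.0,), (6.0,)]
print(single_link(a, b))    # 3.0
print(complete_link(a, b))  # 6.0
print(average_link(a, b))   # 4.5
```

The average-link value always lies between the single-link and complete-link values, since it averages the same set of member-to-member distances.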