CLUSTERING
Introduction
  • Centroid: center of a cluster. A centroid could be either a real point or an imaginary one.
  • Objective function: measures the quality of clustering (a small value is desirable). It is calculated by summing the squares of the distances of each point from the centroid of its cluster.
  • Two types of clustering are:
      • k-means clustering
      • Hierarchical clustering
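The objective function described above can be sketched in a few lines of Python. This is a minimal illustration, assuming points are coordinate tuples and distance is Euclidean; the helper names `centroid` and `objective` are hypothetical and do not come from the slides.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of the points; it need not be one of the points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def objective(clusters):
    """Sum of squared distances from each point to its own cluster's centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(dist(p, c) ** 2 for p in points)
    return total


# Two ways of splitting the same four points into two clusters.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

print(objective(tight), objective(loose))
```

The first grouping keeps nearby points together, so its objective value is smaller. Note that the centroid of (0, 0) and (0, 2) is (0, 1), an "imaginary" point that is not in the data.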
k-means Clustering
It is an exclusive clustering algorithm: each object belongs to exactly one cluster.
Algorithm:
  1. Select a value for k.
  2. Select k objects in an arbitrary fashion and use them as the initial set of k centroids.
  3. Assign each object to the cluster whose centroid is nearest to it.
  4. Recalculate the centroids.
  5. Repeat steps 3 and 4 until the centroids no longer move.
It may not find the best set of clusters, but it will always terminate.
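The steps above can be sketched as follows. This is a rough illustration, not a reference implementation: it assumes Euclidean distance and points stored as tuples in a list, and the function name `k_means` and the `seed` and `max_iter` parameters are hypothetical additions (the iteration cap is only a safety guard).

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, seed=0, max_iter=100):
    """Return (clusters, centroids) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: arbitrary initial centroids
    clusters = []
    for _ in range(max_iter):
        # Step 3: assign each object to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recalculate each centroid as the mean of its members.
        # An empty cluster keeps its previous centroid (an assumption).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
clusters, centroids = k_means(points, k=2)
print(clusters)
print(centroids)
```

Because the starting centroids are chosen arbitrarily, a different `seed` can lead to a different final clustering on less cleanly separated data, which is why the result is not guaranteed to be the best one.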
Agglomerative Hierarchical Clustering
Algorithm:
  1. Assign each object to its own single-object cluster, and calculate the distance between each pair of clusters (the distance matrix).
  2. Select the closest pair of clusters and merge them.
  3. Calculate the distance between this new cluster and each of the other clusters.
  4. Repeat steps 2 and 3 until all objects are in a single cluster.
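A minimal sketch of the merging loop above, under stated assumptions: Euclidean distance between points, and the shortest member-to-member distance as the distance between clusters (one possible choice, not prescribed by the algorithm). The names `single_link` and `agglomerate` are hypothetical. For simplicity the sketch recomputes cluster distances on each pass instead of maintaining a distance matrix.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    """Shortest distance between any member of a and any member of b."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, linkage=single_link):
    """Merge clusters pairwise; return a list of (left, right, distance) merges."""
    clusters = [(p,) for p in points]  # step 1: one cluster per object
    merges = []
    while len(clusters) > 1:
        # Step 2: select the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        merges.append((left, right, linkage(left, right)))
        # Step 3: the merged cluster replaces the pair; its distances to
        # the other clusters are evaluated on the next pass.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(left + right)
    return merges


points = [(0, 0), (0, 1), (5, 0), (5, 2), (20, 0)]
merges = agglomerate(points)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

Each merge joins exactly two clusters, so the recorded merge list describes a binary tree over the objects.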
Example
[Figure: before clustering; after two passes]
Example
  • Hierarchical clustering gives the entire hierarchy of clusters.
  • Dendrogram (a binary tree): the end result of hierarchical clustering.
Distance Measure
Three ways of calculating the distance between two clusters:
  • Single-link clustering: the shortest distance from any member of one cluster to any member of the other cluster.
  • Complete-link clustering: the longest distance from any member of one cluster to any member of the other cluster.
  • Average-link clustering: the average distance from any member of one cluster to any member of the other cluster.
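The three measures differ only in how they summarize the member-to-member distances. A small sketch, assuming Euclidean distance between points; the function names are hypothetical:

```python
from math import dist  # Euclidean distance, Python 3.8+


def pairwise(a, b):
    """All distances between a member of cluster a and a member of cluster b."""
    return [dist(p, q) for p in a for q in b]


def single_link(a, b):
    return min(pairwise(a, b))  # shortest member-to-member distance


def complete_link(a, b):
    return max(pairwise(a, b))  # longest member-to-member distance


def average_link(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)  # average member-to-member distance


a = [(0, 0), (1, 0)]
b = [(4, 0), (6, 0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))
```

For these two clusters the member-to-member distances are 4, 6, 3 and 5, so the three measures give 3, 6 and 4.5 respectively.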
Visit more self-help tutorials
  • Pick a tutorial of your choice and browse through it at your own pace.
  • The tutorials section is free, self-guiding and will not involve any additional support.
  • Visit us at www.dataminingtools.net
