CLUSTERING
WHAT IS CLUSTERING?
1. Clustering is the grouping of objects based on homogeneous (shared) features.
2. Example: In a class of students, boys and girls are separated into two groups.
3. Each group can be divided further into subgroups based on some other
homogeneous feature.
4. Example: Continuing the example in point 2, the groups of boys and girls
can each be subdivided by height for
seating arrangements in the class.
OBJECTIVE OF CLUSTER ANALYSIS
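• Intra-cluster distance is the sum of the distances between objects in the same cluster. A good clustering keeps it small.
• Inter-cluster distance is the distance between objects in different clusters. A good clustering keeps it large.
[Figure: intra-cluster and inter-cluster distances]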
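The two quantities can be computed directly for a small example. This is a minimal sketch, not a definitive definition: the toy points are hypothetical, distances are taken to be Euclidean, and the inter-cluster distance is assumed to be the smallest distance between a point in one cluster and a point in the other (the slide does not say which pair is meant).

```python
from itertools import combinations, product
from math import dist


def intra_cluster_distance(cluster):
    """Sum of pairwise Euclidean distances between points in one cluster."""
    return sum(dist(p, q) for p, q in combinations(cluster, 2))


def inter_cluster_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between a point in A and a point in B.

    Taking the minimum is an assumption made for this sketch.
    """
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical toy data: two well-separated groups of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0)]

intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print("intra-cluster distance of A:", intra_a)
print("inter-cluster distance A-B:", inter_ab)
```

For these points the intra-cluster value is small compared with the inter-cluster value, which is what a good clustering aims for.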
CLUSTERING A SET OF DATAPOINTS
TYPES OF CLUSTERING
1. Hierarchical clustering:
• No need to decide the number of clusters in advance for
‘n’ observations.
• It builds a nested sequence of clusterings, ranging from a single
cluster holding all n observations to n clusters of one observation each.
• The result is a family tree of clusters, better
known as a “dendrogram”.
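• Two approaches to hierarchical clustering:
• Agglomerative approach (“bottom-up”): start with n single-observation clusters and repeatedly merge the two closest clusters.
• Divisive approach (“top-down”): start with one cluster holding all observations and repeatedly split it.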
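The bottom-up approach can be sketched in a few lines. This is an illustrative sketch only: the function name `agglomerative`, the 1-D height data, and the choice of the smallest gap between members as the cluster distance are all assumptions. The returned list of merges is a flat record of what a dendrogram would draw.

```python
def agglomerative(points):
    """Bottom-up clustering of 1-D points; returns the list of merges."""
    clusters = [[p] for p in points]  # start with n single-point clusters
    merges = []
    while len(clusters) > 1:
        # Find the two clusters with the smallest gap between any two members
        # (this distance rule is an assumption made for the sketch).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical student heights in centimetres.
heights = [150, 152, 160, 175, 178]
merges = agglomerative(heights)
for left, right, d in merges:
    print(left, "+", right, "merged at distance", d)
```

With n observations the loop always performs n - 1 merges, ending with one cluster that holds everything.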
TYPES OF CLUSTERING
2. K-means clustering or
partitioning clustering:
• Decide on k clusters for n
observations, where k < n.
• For example, with n = 10 and k = 2, the
10 observations are split into 2 clusters. The clusters need
not be the same size (5 and 5 is only one possible outcome).
• Each observation is assigned to the
cluster with the nearest centre (mean), and the
centres are then recomputed. These iterations are repeated
until no observation changes cluster.
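The iteration described above can be sketched as follows for n = 10 and k = 2. This is a minimal sketch under stated assumptions: the data points are hypothetical, the first k points are used as starting centres (real implementations usually pick them at random), and the function name `k_means` is illustrative.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Plain k-means; the first k points serve as initial centres (an assumption)."""
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no observation changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical data: n = 10 points, seven near (2, 2) and three near (9, 9).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (2, 3), (3, 2),
          (9, 9), (9, 10), (10, 9)]
assignment, centroids = k_means(points, k=2)
print(assignment)
print(centroids)
```

On this data the two clusters end up with 7 and 3 observations, which illustrates that k-means does not force equal cluster sizes.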
DISTANCE COMPUTATION OF COORDINATE POINTS
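• Euclidean distance: d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2), the straight-line distance between two points.
• Manhattan distance: d(x, y) = |x1 - y1| + ... + |xn - yn|, the sum of the absolute coordinate differences.
• Mahalanobis distance: d(x, y) = sqrt((x - y)^T S^(-1) (x - y)), where S is the covariance matrix of the data. It rescales the differences by the variances and correlations of the features, and it reduces to the Euclidean distance when S is the identity matrix.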
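A short sketch of the three distances for 2-D points, using only the standard library. The function names and sample points are illustrative, and the Mahalanobis version is restricted to two dimensions so that the inverse covariance matrix can be written out by hand; the covariance matrix is supplied by the caller rather than estimated from data.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for 2-D points, given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form (x - y)^T * inv * (x - y).
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return sqrt(quad)


p1, p2 = (1.0, 2.0), (4.0, 6.0)
identity = ((1.0, 0.0), (0.0, 1.0))
stretched = ((4.0, 0.0), (0.0, 1.0))  # hypothetical: first feature has variance 4

print(euclidean(p1, p2))                  # 5.0
print(manhattan(p1, p2))                  # 7.0
print(mahalanobis_2d(p1, p2, identity))   # 5.0, same as Euclidean
print(mahalanobis_2d(p1, p2, stretched))
```

With the identity covariance the Mahalanobis distance matches the Euclidean one; with a larger variance on the first feature, differences along that feature count for less.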
METHODS OF CLUSTERING
1. Single linkage method: The distance between two clusters is the minimum distance
between a data point in one cluster and a data point in the other.
METHODS OF CLUSTERING
2. Complete linkage method: The distance between two clusters is the maximum distance
between a data point in one cluster and a data point in the other.
METHODS OF CLUSTERING
3. Average linkage method: The distance between two clusters is the average of all the
distances between a data point in one cluster and a data point in the other.
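The three linkage methods differ only in how they summarise the point-to-point distances between two clusters. A minimal sketch, assuming Euclidean distance and two hypothetical clusters of 1-D points (stored as one-element tuples); the function names are illustrative.

```python
from itertools import product
from math import dist


def pair_distances(cluster_a, cluster_b):
    """All distances between a point in A and a point in B."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pair_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    return max(pair_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    distances = pair_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters on a line.
cluster_a = [(0.0,), (1.0,)]
cluster_b = [(4.0,), (6.0,)]

print(single_linkage(cluster_a, cluster_b))    # 3.0 (between 1 and 4)
print(complete_linkage(cluster_a, cluster_b))  # 6.0 (between 0 and 6)
print(average_linkage(cluster_a, cluster_b))   # 4.5 (mean of 4, 6, 3, 5)
```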
