2. Introduction
• The goal of clustering is to
– group data points that are close (or similar) to each other
– identify such groupings (or clusters) in an unsupervised manner
• Unsupervised: no information is provided to the algorithm on which
data points belong to which clusters
• Example
[scatter plot of unlabeled data points, each marked ×]
What should the clusters
be for these data points?
3. What is Clustering?
• Clustering can be considered the most important unsupervised learning
problem; like every other problem of this kind, it deals with finding a
structure in a collection of unlabeled data.
• A loose definition of clustering could be “the process of organizing
objects into groups whose members are similar in some way”.
• A cluster is therefore a collection of objects which are “similar”
between them and are “dissimilar” to the objects belonging to other
clusters.
4. Clustering Algorithms
A clustering algorithm attempts to find natural groups of components (or data)
based on some similarity
The clustering algorithm also finds the centroid of each group of data points
To determine cluster membership, most algorithms evaluate the distance
between a point and the cluster centroids
The output from a clustering algorithm is basically a statistical description of
the cluster centroids with the number of components in each cluster.
5. • Simple graphical example:
In this case we easily identify the 4 clusters into which the data can be
divided; the similarity criterion is distance: two or more objects belong to
the same cluster if they are “close” according to a given distance. This is
called distance-based clustering.
Another kind of clustering is conceptual clustering: two or more objects
belong to the same cluster if together they define a concept common to all of
them.
In other words, objects are grouped according to their fit to descriptive
concepts, not according to simple similarity measures.
6. Examples of Clustering Applications
Marketing: Help marketers discover distinct groups in their customer bases,
and then use this knowledge to develop targeted marketing programs
Land use: Identification of areas of similar land use in an earth observation
database
Insurance: Identifying groups of motor insurance policy holders with a
high average claim cost
City-planning: Identifying groups of houses according to their house type,
value, and geographical location
Earthquake studies: Observed earthquake epicenters should be clustered
along continental faults
7. Quality: What Is Good Clustering?
A good clustering method will produce high quality clusters with
– high intra-class similarity
– low inter-class similarity
The quality of a clustering result depends on both the similarity measure
used by the method and its implementation
The quality of a clustering method is also measured by its ability to
discover some or all of the hidden patterns
8. Measure the Quality of Clustering
• Dissimilarity/Similarity metric: Similarity is expressed in terms of a distance
function, which is typically metric: d(i, j)
• There is a separate “quality” function that measures the “goodness” of a
cluster.
• The definitions of distance functions are usually very different for interval-
scaled, boolean, categorical, ordinal and ratio variables.
• Weights should be associated with different variables based on applications
and data semantics.
• It is hard to define “similar enough” or “good enough”
– the answer is typically highly subjective.
9. Requirements of Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Ability to handle dynamic data
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input
parameters
Able to deal with noise and outliers
Insensitive to order of input records
High dimensionality
Incorporation of user-specified constraints
Interpretability and usability
10. Major Clustering Approaches
• Partitioning: Construct various partitions and then evaluate them by some
criterion
• Hierarchical: Create a hierarchical decomposition of the set of objects using
some criterion
• Model-based: Hypothesize a model for each cluster and find best fit of
models to data
• Density-based: Guided by connectivity and density functions
11. Typical Alternatives to Calculate the Distance
between Clusters
Single link: smallest distance between an element in one cluster and an
element in the other, i.e., dis(Ki, Kj) = min{d(tip, tjq)}
Complete link: largest distance between an element in one cluster and an
element in the other, i.e., dis(Ki, Kj) = max{d(tip, tjq)}
Average: average distance between an element in one cluster and an element in
the other, i.e., dis(Ki, Kj) = avg{d(tip, tjq)}
Centroid: distance between the centroids of two clusters, i.e., dis(Ki, Kj) =
dis(Ci, Cj)
Medoid: distance between the medoids of two clusters, i.e., dis(Ki, Kj) =
dis(Mi, Mj)
– Medoid: one chosen, centrally located object in the cluster
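The single-link, complete-link, average, and centroid alternatives above can be sketched as short Python functions (a minimal illustration assuming points are numeric tuples; the two sample clusters are made up):

```python
import math
from itertools import product

def single_link(Ki, Kj):
    """Smallest distance between an element of Ki and an element of Kj."""
    return min(math.dist(p, q) for p, q in product(Ki, Kj))

def complete_link(Ki, Kj):
    """Largest distance between an element of Ki and an element of Kj."""
    return max(math.dist(p, q) for p, q in product(Ki, Kj))

def average_link(Ki, Kj):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(Ki, Kj)) / (len(Ki) * len(Kj))

def centroid_link(Ki, Kj):
    """Distance between the centroids (component-wise means) of the clusters."""
    ci = tuple(sum(x) / len(Ki) for x in zip(*Ki))
    cj = tuple(sum(x) / len(Kj) for x in zip(*Kj))
    return math.dist(ci, cj)

Ki = [(0.0, 0.0), (1.0, 0.0)]
Kj = [(4.0, 0.0), (5.0, 0.0)]
print(single_link(Ki, Kj))    # 3.0
print(complete_link(Ki, Kj))  # 5.0
print(average_link(Ki, Kj))   # 4.0
print(centroid_link(Ki, Kj))  # 4.0
```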
12. Centroid, Radius and Diameter of a
Cluster (for numerical data sets)
• Centroid: the “middle” of a cluster
• Radius: square root of average distance from any point of the cluster to its
centroid
• Diameter: square root of average mean squared distance between all pairs
of points in the cluster
Centroid: Cm = ( Σ_{i=1..N} t_ip ) / N
Radius: Rm = √( Σ_{i=1..N} (t_ip − Cm)² / N )
Diameter: Dm = √( Σ_{i=1..N} Σ_{j=1..N} (t_ip − t_jq)² / (N · (N − 1)) )
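The three quantities can be computed directly from their definitions (a small self-contained sketch; the square-shaped data set is made up for illustration):

```python
import math

def centroid(points):
    """Component-wise mean of the cluster's points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def radius(points):
    """Square root of the average squared distance to the centroid."""
    c = centroid(points)
    n = len(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / n)

def diameter(points):
    """Square root of the average squared distance over all ordered pairs."""
    n = len(points)
    total = sum(math.dist(p, q) ** 2 for p in points for q in points)
    return math.sqrt(total / (n * (n - 1)))

pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(centroid(pts))  # (1.0, 1.0)
print(radius(pts))    # √2 ≈ 1.414
print(diameter(pts))  # √(16/3) ≈ 2.309
```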
13. Partitioning Algorithms
• Partitioning method: Construct a partition of a database D of n objects into a
set of k clusters
• Given a k, find a partition of k clusters that optimizes the chosen partitioning
criterion
– Global optimal: exhaustively enumerate all partitions
– Heuristic methods: k-means and k-medoids algorithms
– k-means (MacQueen, 1967): Each cluster is represented by the center of
the cluster
– k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw,
1987): Each cluster is represented by one of the objects in the cluster
Typical partitioning criterion (sum of squared errors):
E = Σ_{m=1..k} Σ_{t ∈ Km} (t − Cm)²
14. The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps:
– Partition objects into k nonempty subsets
– Compute seed points as the centroids of the clusters of the current
partition (the centroid is the center, i.e., mean point, of the cluster)
– Assign each object to the cluster with the nearest seed point
– Go back to Step 2; stop when assignments no longer change
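The four steps above can be sketched in Python (a minimal batch k-means, not a production implementation; the optional `init` parameter and the sample data are assumptions for illustration):

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Batch k-means: assign each object to the nearest centroid,
    recompute centroids, repeat until nothing moves."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # assign each object to the cluster with the nearest seed point
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # recompute each centroid as the mean point of its cluster
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # no centroid moved: assignments are stable
            break
        centroids = new
    return centroids, clusters

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, groups = kmeans(pts, 2, init=[(0.0, 0.0), (10.0, 10.0)])
print(cents)  # two centroids, one near each blob
```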
16. Comments on the K-Means Method
• Strength: Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t
is # iterations. Normally, k, t << n.
• Comparing: PAM: O(k(n−k)²), CLARA: O(ks² + k(n−k))
• Comment: Often terminates at a local optimum. The global optimum may be
found using techniques such as: deterministic annealing and genetic
algorithms
• Weakness
– Applicable only when mean is defined, then what about categorical data?
– Need to specify k, the number of clusters, in advance
– Unable to handle noisy data and outliers
– Not suitable to discover clusters with non-convex shapes
17. Variations of the K-Means Method
• A few variants of the k-means which differ in
– Selection of the initial k means
– Dissimilarity calculations
– Strategies to calculate cluster means
• Handling categorical data: k-modes (Huang’98)
– Replacing means of clusters with modes
– Using new dissimilarity measures to deal with categorical objects
– Using a frequency-based method to update modes of clusters
– A mixture of categorical and numerical data: k-prototype method
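The mode-replacement idea behind k-modes can be sketched as follows (a minimal illustration of the dissimilarity and the mode update only, not Huang's full algorithm; the color/size attribute values are made up):

```python
from collections import Counter

def hamming(x, y):
    """k-modes-style dissimilarity: number of attributes on which
    two categorical objects differ."""
    return sum(a != b for a, b in zip(x, y))

def mode_of(cluster):
    """Per-attribute most frequent value: the cluster 'mode' that
    replaces the mean used in k-means."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_of(cluster))                               # ('red', 'small')
print(hamming(("red", "large"), ("blue", "small")))   # 2
```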
18. What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers !
– Since an object with an extremely large value may substantially distort the
distribution of the data.
• K-Medoids: Instead of taking the mean value of the object in a cluster as a
reference point, medoids can be used, which is the most centrally located
object in a cluster.
19. Example
• Height and weight information are given. Using these two variables,
we need to group the objects based on height and weight information.
21. Step 1: Input
Dataset, Clustering Variables and Maximum Number of Clusters (K in
Means Clustering)
In this dataset, only two variables – height and weight – are considered for
clustering.
Height Weight
185 72
170 56
168 60
179 68
182 72
188 77
180 71
180 70
183 84
180 88
180 67
177 76
22. Step 2: Initialize cluster centroid
In this example, value of K is considered as 2. Cluster centroids are
initialized with first 2 observations.
Initial Centroid
Cluster Height Weight
K1 185 72
K2 170 56
23. Step 3: Calculate Euclidean Distance
Euclidean is one of the distance measures used on K Means algorithm.
Euclidean distance between of a observation and initial cluster centroids 1
and 2 is calculated.
Based on euclidean distance each observation is assigned to one of the
clusters - based on minimum distance.
Euclidean Distance
24. First two observations
Height Weight
185 72
170 56
Now initial cluster centroids are :
Updated Centroid
Cluster Height Weight
K1 185 72
K2 170 56
The Euclidean distance from each of the clusters is calculated:
Euclidean Distance from Cluster 1 | Euclidean Distance from Cluster 2 | Assignment
√((185−185)² + (72−72)²) = 0 | √((185−170)² + (72−56)²) = 21.93 | 1
√((170−185)² + (56−72)²) = 21.93 | √((170−170)² + (56−56)²) = 0 | 2
We have considered two observations for assignment only because we knew the
assignment. And there is no change in Centroids as these two observations were
only considered as initial centroids.
25. Step 4: Move on to next observation and calculate Euclidean Distance
Height Weight
168 60
Euclidean Distance from Cluster 1 | Euclidean Distance from Cluster 2 | Assignment
√((168−185)² + (60−72)²) = 20.809 | √((168−170)² + (60−56)²) = 4.472 | 2
Since distance is minimum from cluster 2, so the observation is assigned to
cluster 2.
Now revise the cluster centroids – the mean values of Height and Weight
become the cluster centroids. The addition was only to cluster 2, so only
the centroid of cluster 2 is updated
Updated cluster centroids
Updated Centroid
Cluster Height Weight
K=1 185 72
K=2 (170+168)/2 = 169 (56+60)/2 = 58
26. Step 5: Calculate Euclidean Distance for the next observation, assign next
observation based on minimum euclidean distance and update the cluster
centroids.
Next Observation.
Height Weight
179 68
Euclidean Distance Calculation and Assignment
Euclidean Distance from Cluster 1 | Euclidean Distance from Cluster 2 | Assignment
√((179−185)² + (68−72)²) = 7.211 | √((179−169)² + (68−58)²) = 14.142 | 1
Update Cluster Centroid
Updated Centroid
Cluster Height Weight
K=1 (185+179)/2 = 182 (72+68)/2 = 70
K=2 169 58
Continue the steps until all observations are assigned
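The sequential assignment walked through above can be reproduced in a short script (a sketch that follows the deck's procedure: seed the two centroids with the first two rows, then assign each remaining observation in dataset order and update the receiving cluster's centroid):

```python
import math

# the 12 height/weight observations from Step 1
data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77),
        (180, 71), (180, 70), (183, 84), (180, 88), (180, 67), (177, 76)]

def euclid(p, q):
    """Euclidean distance between two (height, weight) observations."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Step 2: initialize the two centroids with the first two observations
centroids = [data[0], data[1]]
members = [[data[0]], [data[1]]]

# Steps 3-5: assign each remaining observation to the nearer centroid,
# then recompute that centroid as the mean of its members
for obs in data[2:]:
    j = 0 if euclid(obs, centroids[0]) <= euclid(obs, centroids[1]) else 1
    members[j].append(obs)
    n = len(members[j])
    centroids[j] = (sum(h for h, w in members[j]) / n,
                    sum(w for h, w in members[j]) / n)

print(round(euclid(data[2], data[0]), 3))  # 20.809, obs (168, 60) vs cluster 1
print(round(euclid(data[2], data[1]), 3))  # 4.472, obs (168, 60) vs cluster 2
print(centroids)
```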
28. This is what was expected initially, based on the two-dimensional plot.
29. A few important considerations in K-Means
• The scale of measurement influences Euclidean distance, so variable
standardisation may become necessary
• Depending on expectations, outlier treatment may be required
• K-Means clustering can be biased by the initial centroids, called cluster
seeds
• The maximum number of clusters is typically an input and can also impact
the clusters that are created
30. The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Partitioning Around Medoids, 1987)
– starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of
the resulting clustering
– PAM works effectively for small data sets, but does not scale well for
large data sets
• CLARA (Kaufmann & Rousseeuw, 1990)
• CLARANS (Ng & Han, 1994): Randomized sampling
• Focusing + spatial data structure (Ester et al., 1995)
31. A Typical K-Medoids Algorithm (PAM)
[Illustration on a 0–10 grid, K = 2]
– Arbitrarily choose k objects as the initial medoids
– Assign each remaining object to the nearest medoid (total cost = 20)
– Randomly select a non-medoid object, O_random
– Compute the total cost of swapping a medoid O with O_random (here, total
cost = 26)
– If the quality is improved, swap O and O_random
– Repeat the loop until no change
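The swap loop above can be sketched compactly (a simplified greedy version of PAM, assuming 2-D points; the sample data and the `init` seeds are made up):

```python
import math

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k, init):
    """PAM sketch: keep swapping a medoid for a non-medoid object
    while the total cost of the clustering improves."""
    medoids = list(init)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                if total_cost(points, trial) < total_cost(points, medoids):
                    medoids, improved = trial, True
    return medoids

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# even with both seeds in the left group, the swaps separate the two blobs
print(pam(pts, 2, init=[(0.0, 0.0), (0.0, 1.0)]))
```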
32. Hierarchical Clustering
• Clusters are created in levels, producing a set of clusters at each
level.
• Agglomerative
– Initially each item in its own cluster
– Iteratively clusters are merged together
– Bottom Up
• Divisive
– Initially all items in one cluster
– Large clusters are successively divided
– Top Down
33. Hierarchical Clustering
Use distance matrix as clustering criteria. This method does not require
the number of clusters k as an input, but needs a termination condition
Illustrative Example:
Agglomerative and divisive clustering on the data set {a, b, c, d, e}
The method needs a cluster distance measure and a termination condition.
– Agglomerative (steps 0 → 4): {a} {b} {c} {d} {e} → {a, b} → {d, e} →
{c, d, e} → {a, b, c, d, e}
– Divisive (steps 4 → 0): the same hierarchy, built top-down
34. Hierarchical Agglomerative Clustering
(HAC)
• Starts with each doc in a separate cluster
– then repeatedly joins the closest pair of clusters, until there is only one
cluster.
• The history of merging forms a binary tree or hierarchy.
How do we measure the distance between clusters?
35. Closest pair of clusters
Many variants to defining closest pair of clusters
• Single-link
– Distance of the “closest” points (single-link)
• Complete-link
– Distance of the “farthest” points
• Centroid
– Distance of the centroids (centers of gravity)
• (Average-link)
– Average distance between pairs of elements
36. Cluster Distance Measures
• Single link (min): smallest distance between an element in one cluster
and an element in the other, i.e., d(Ci, Cj) = min{d(xip, xjq)}
• Complete link (max): largest distance between an element in one cluster
and an element in the other, i.e., d(Ci, Cj) = max{d(xip, xjq)}
• Average: average distance between elements in one cluster and elements
in the other, i.e., d(Ci, Cj) = avg{d(xip, xjq)}
• By convention, d(C, C) = 0
37. Dendrogram
• Dendrogram: a tree data
structure which illustrates
hierarchical clustering
techniques.
• Each level shows clusters for
that level.
– Leaf – individual clusters
– Root – one cluster
• A cluster at level i is the union of
its children clusters at level i+1.
38. Cluster Distance Measures
Example: Given a data set of five objects characterized by a single continuous feature,
assume that there are two clusters: C1: {a, b} and C2: {c, d, e}.
1.Calculate the distance matrix.
2.Calculate three cluster distances between C1 and C2.
        a  b  c  d  e
Feature 1  2  4  5  6

Distance matrix:
   a  b  c  d  e
a  0  1  3  4  5
b  1  0  2  3  4
c  3  2  0  1  2
d  4  3  1  0  1
e  5  4  2  1  0
Single link: dist(C1, C2) = min{d(a, c), d(a, d), d(a, e), d(b, c), d(b, d), d(b, e)}
                          = min{3, 4, 5, 2, 3, 4} = 2
Complete link: dist(C1, C2) = max{d(a, c), d(a, d), d(a, e), d(b, c), d(b, d), d(b, e)}
                            = max{3, 4, 5, 2, 3, 4} = 5
Average: dist(C1, C2) = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21 / 6 = 3.5
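The worked example can be checked with a few lines of Python (a self-contained sketch that rebuilds the distance matrix from the single feature and evaluates the three cluster distances):

```python
# single continuous feature for the five objects (from the example)
feature = {"a": 1, "b": 2, "c": 4, "d": 5, "e": 6}
C1, C2 = ["a", "b"], ["c", "d", "e"]

# distance matrix: absolute difference of the feature values
d = {(i, j): abs(feature[i] - feature[j]) for i in feature for j in feature}

pairs = [(i, j) for i in C1 for j in C2]
single = min(d[p] for p in pairs)                 # 2
complete = max(d[p] for p in pairs)               # 5
average = sum(d[p] for p in pairs) / len(pairs)   # 3.5
print(single, complete, average)
```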
39. Agglomerative Algorithm
• The Agglomerative algorithm is carried out in three steps:
1) Convert all object features into a
distance matrix
2) Set each object as a cluster (thus if
we have N objects, we will have N
clusters at the beginning)
3) Repeat until the number of clusters is
one (or a known number of clusters):
Merge two closest clusters
Update “distance matrix”
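The three steps can be sketched with the single-link distance (a minimal illustration; `dmat` is assumed to be a precomputed distance matrix as a dict of dicts, like the one in the example above):

```python
def single_link_merges(dmat, labels):
    """Agglomerative clustering with single-link distance;
    returns the merge history as (cluster, cluster, distance) triples."""
    clusters = [frozenset([l]) for l in labels]   # step 2: one cluster per object
    merges = []
    while len(clusters) > 1:                      # step 3: repeat until one cluster
        # find the pair of clusters with the smallest single-link distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dij = min(dmat[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        dij, i, j = best
        merges.append((set(clusters[i]), set(clusters[j]), dij))
        # merge the two closest clusters and "update" the cluster list
        clusters = ([c for t, c in enumerate(clusters) if t not in (i, j)]
                    + [clusters[i] | clusters[j]])
    return merges

dmat = {"a": {"a": 0, "b": 1, "c": 3, "d": 4, "e": 5},
        "b": {"a": 1, "b": 0, "c": 2, "d": 3, "e": 4},
        "c": {"a": 3, "b": 2, "c": 0, "d": 1, "e": 2},
        "d": {"a": 4, "b": 3, "c": 1, "d": 0, "e": 1},
        "e": {"a": 5, "b": 4, "c": 2, "d": 1, "e": 0}}
for m in single_link_merges(dmat, "abcde"):
    print(m)
```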
40. • Problem: clustering analysis with agglomerative algorithm
Example
data matrix
distance matrix
Euclidean distance
41. • Merge two closest clusters (iteration 1)
Example
48. • Dendrogram tree representation
Example
1. There are 6 clusters: A, B, C, D, E and
F
2. Merge clusters D and F into cluster (D,
F) at distance 0.50
3. Merge cluster A and cluster B into (A,
B) at distance 0.71
4. Merge clusters E and (D, F) into ((D,
F), E) at distance 1.00
5. Merge clusters ((D, F), E) and C into
(((D, F), E), C) at distance 1.41
6. Merge clusters (((D, F), E), C) and (A,
B) into ((((D, F), E), C), (A, B))
at distance 2.50
7. The last cluster contains all the objects,
which concludes the computation
[dendrogram with the objects on the horizontal axis and the merge distance
(cluster lifetime) on the vertical axis]
49. Exercise
Given a data set of five objects characterised by a single continuous feature:
Apply the agglomerative algorithm with single-link, complete-link and averaging cluster
distance measures to produce three dendrogram trees, respectively.
        a  b  c  d  e
Feature 1  2  4  5  6

Distance matrix:
   a  b  c  d  e
a  0  1  3  4  5
b  1  0  2  3  4
c  3  2  0  1  2
d  4  3  1  0  1
e  5  4  2  1  0
51. Density-Based Clustering
• Clustering based on density (local cluster criterion), such as density-
connected points or based on an explicitly constructed density
function
• A cluster is a connected dense component, which can grow in any
direction that density leads.
• Density, connectivity and boundary
• Arbitrary shaped clusters and good scalability
• Each cluster has a considerably higher density of points than the area
outside of the cluster
52. Major Features
• Major features:
– Discover clusters of arbitrary shape
– Handle noise
– One scan
– Need density parameters
53. Two Major Types of Density-Based
Clustering Algorithms
• Connectivity based:
– DBSCAN: Ester, et al. (KDD’96)
– OPTICS: Ankerst, et al (SIGMOD’99).
– CLIQUE: Agrawal, et al. (SIGMOD’98)
• Density function based:
- DENCLUE: Hinneburg & D. Keim (KDD’98/2006)
54. Density Based Clustering: Basic Concept
• Intuition for the formalization of the basic idea
– For any point in a cluster, the local point density around that point has to
exceed some threshold
– The set of points from one cluster is connected
• Local point density at a point p defined by two parameters
– ε – radius of the neighborhood of point p:
Nε(p) := {q in data set D | dist(p, q) ≤ ε}
– MinPts – minimum number of points in the given neighbourhood Nε(p)
55. ε-Neighborhood
• ε-Neighborhood – objects within a radius of ε from an object.
• “High density” – the ε-Neighborhood of an object contains at least MinPts
objects.
[figure: two circles of radius ε showing the ε-Neighborhood of p and the
ε-Neighborhood of q]
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)
Nε(p) := { q | d(p, q) ≤ ε }
56. Core, Border & Outlier
Given ε and MinPts, categorize the objects into three exclusive groups.
A point is a core point if it has at least MinPts points within Eps –
these are points at the interior of a cluster.
A border point has fewer than
MinPts within Eps, but is in the
neighborhood of a core point.
A noise point is any point that is
neither a core point nor a border point.
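The core/border/noise categorization can be written directly from the definitions (a small sketch assuming 2-D points as tuples; the sample data is made up):

```python
import math

def classify(points, eps, min_pts):
    """Label each point core, border, or noise from the size of its
    eps-neighborhood (which, as usual, includes the point itself)."""
    def neighbors(p):
        return [q for q in points if math.dist(p, q) <= eps]
    core = {p for p in points if len(neighbors(p)) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbors(p)):
            labels[p] = "border"   # not dense itself, but near a core point
        else:
            labels[p] = "noise"
    return labels

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
       (2.0, 0.0), (10.0, 10.0)]
print(classify(pts, eps=1.5, min_pts=4))
```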
57. Example
• M, P, O, and R are core objects since each is in an Eps neighborhood
containing at least 3 points
Minpts = 3
Eps=radius
of the circles
58. Density-Reachability
Directly density-reachable
An object q is directly density-reachable from object p if p is a
core object and q is in p’s ε-neighborhood.
q is directly density-reachable from p
p is not directly density-reachable from q
Density-reachability is asymmetric.
q
p
MinPts = 5
Eps = 1 cm
59. Density-Reachability
• Density-Reachable (directly and indirectly):
– A point p is directly density-reachable from p1;
– p1 is directly density-reachable from q;
– p, p1, q form a chain.
• p is (indirectly) density-reachable from q
• q is not density-reachable from p
• Density-connected
– A point p is density-connected to a point q wrt.
Eps, MinPts if there is a point o such that both,
p and q are density-reachable from o wrt. Eps
and MinPts.
60. Formal Description of Cluster
• Given a data set D, radius ε and threshold MinPts.
• A cluster C is a subset of objects satisfying two criteria:
– Connected: ∀ p, q ∈ C: p and q are density-connected.
– Maximal: ∀ p, q: if p ∈ C and q is density-reachable from p, then q ∈ C.
(avoids redundancy)
61. Review of Concepts
Are objects p and q in the
same cluster?
Are p and q density-
connected?
Are p and q density-
reachable by some object o?
Directly density-
reachable
Indirectly density-reachable
through a chain
Is an object o in a cluster or
an outlier?
Is o a core object?
Is o density-reachable by
some core object?
62. DBSCAN Algorithm
Input: the data set D
Parameters: ε, MinPts
For each object p in D
  if p is a core object and not processed then
    C = retrieve all objects density-reachable from p
    mark all objects in C as processed
    report C as a cluster
  else mark p as outlier
  end if
End For
63. DBSCAN: The Algorithm
– Arbitrarily select a point p
– Retrieve all points density-reachable from p wrt Eps and MinPts.
– If p is a core point, a cluster is formed.
– If p is a border point, no points are density-reachable from p and
DBSCAN visits the next point of the database.
– Continue the process until all of the points have been processed.
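The procedure above can be sketched in Python (a minimal DBSCAN that grows each cluster by breadth-first expansion from core points; a sketch, not an optimized implementation, and the sample data is made up):

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """DBSCAN sketch: grow a cluster from each unclassified core point.
    Returns {point: cluster id}, with -1 meaning noise."""
    labels = {p: None for p in points}   # None = unclassified
    cid = 0
    for p in points:
        if labels[p] is not None:
            continue
        neigh = [q for q in points if math.dist(p, q) <= eps]
        if len(neigh) < min_pts:
            labels[p] = -1               # noise (may later become a border point)
            continue
        cid += 1                         # p is a core point: start a new cluster
        labels[p] = cid
        queue = deque(neigh)
        while queue:
            q = queue.popleft()
            if labels[q] == -1:
                labels[q] = cid          # noise reclassified as border point
            if labels[q] is not None:
                continue
            labels[q] = cid
            q_neigh = [r for r in points if math.dist(q, r) <= eps]
            if len(q_neigh) >= min_pts:  # q is also core: keep expanding
                queue.extend(q_neigh)
    return labels

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
       (2.0, 0.0), (10.0, 10.0)]
print(dbscan(pts, eps=1.5, min_pts=4))
```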
64. DBSCAN Algorithm: Example
• Parameters
• ε = 2 cm
• MinPts = 3
for each o ∈ D do
  if o is not yet classified then
    if o is a core object then
      collect all objects density-reachable from o
      and assign them to a new cluster
    else
      assign o to NOISE
67. DBSCAN Algorithm: Advantages
• DBSCAN does not require to specify the number of clusters in the data
apriori, as opposed to k-means.
• DBSCAN can find arbitrarily shaped clusters. It can even find a cluster
completely surrounded by (but not connected to) a different cluster. Due to the
MinPts parameter, the so-called single-link effect (different clusters being
connected by a thin line of points) is reduced.
• DBSCAN has a notion of noise, and is robust to outliers.
• DBSCAN requires just two parameters and is mostly insensitive to the
ordering of the points in the database. (However, points sitting on the edge of
two different clusters might swap cluster membership if the ordering of the
points is changed, and the cluster assignment is unique only up to
isomorphism.)
• The parameters minPts and ε can be set by a domain expert, if the data is well
understood.
68. DBSCAN Algorithm: Disadvantages
• DBSCAN is not entirely deterministic: border points that are reachable from
more than one cluster can be part of either cluster, depending on the order the
data is processed. Fortunately, this situation does not arise often, and has little
impact on the clustering result: both on core points and noise points,
DBSCAN is deterministic.
• The quality of DBSCAN depends on the distance measure used in the function
regionQuery (P, ε). The most common distance metric used is Euclidean
distance. Especially for high-dimensional data, this metric can be rendered
almost useless due to the so-called "Curse of dimensionality", making it
difficult to find an appropriate value for ε. This effect, however, is also present
in any other algorithm based on Euclidean distance.
• DBSCAN cannot cluster data sets well with large differences in densities,
since the minPts-ε combination cannot then be chosen appropriately for all
clusters.
• If the data and scale are not well understood, choosing a meaningful distance
threshold ε can be difficult.
69. Steps of Grid-based Clustering
Algorithms
Basic Grid-based Algorithm
1. Define a set of grid-cells
2. Assign objects to the appropriate grid cell and compute the density of
each cell.
3. Eliminate cells whose density is below a certain threshold t.
4. Form clusters from contiguous (adjacent) groups of dense cells
(usually minimizing a given objective function)
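The four steps can be sketched directly (a minimal illustration assuming 2-D points, square cells, and 4-neighbor adjacency between cells; the objective-function refinement is omitted):

```python
from collections import Counter, deque

def grid_cluster(points, cell_size, threshold):
    """Basic grid-based clustering: bin points into cells, drop sparse
    cells, and join contiguous dense cells into clusters of cells."""
    # steps 1-2: assign each point to a grid cell and count cell densities
    density = Counter((int(x // cell_size), int(y // cell_size))
                      for x, y in points)
    # step 3: keep only cells at or above the density threshold
    dense = {c for c, n in density.items() if n >= threshold}
    # step 4: flood-fill over adjacent dense cells to form clusters
    clusters, seen = [], set()
    for cell in dense:
        if cell in seen:
            continue
        comp, queue = set(), deque([cell])
        seen.add(cell)
        while queue:
            cx, cy = queue.popleft()
            comp.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(comp)
    return clusters

pts = [(0.5, 0.5), (0.6, 0.5), (0.7, 0.5), (1.5, 0.5), (1.6, 0.5),
       (1.7, 0.5), (5.5, 5.5), (5.6, 5.5), (5.7, 5.5)]
print(grid_cluster(pts, cell_size=1.0, threshold=3))
```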
70. Advantages of Grid-based Clustering Algorithms
• fast:
– No distance computations
– Clustering is performed on summaries and not individual objects;
complexity is usually O(#-populated-grid-cells) and not O(#objects)
– Easy to determine which clusters are neighboring
• Shapes are limited to union of grid-cells
71. Grid-Based Clustering Methods
• Grid-based methods quantize the object space into a finite number of cells
that form a grid structure (using a multi-resolution grid data structure).
• All the clustering operations are performed on the grid structure.
• Clustering complexity depends on the number of populated grid cells and
not on the number of objects in the dataset
• Several interesting methods (in addition to the basic grid-based algorithm)
– STING (a STatistical INformation Grid approach) by Wang, Yang and
Muntz (1997)
– CLIQUE: Agrawal, et al. (SIGMOD’98)
72. STING: A Statistical Information Grid
Approach
• Wang, Yang and Muntz (VLDB’97)
• The spatial area is divided into rectangular cells
• There are several levels of cells corresponding to different levels of
resolution
73. STING: A Statistical Information Grid
Approach (2)
– Each cell at a high level is partitioned into a number of smaller cells in the
next lower level
– Statistical info of each cell is calculated and stored beforehand and is used
to answer queries
– Parameters of higher level cells can be easily calculated from parameters
of lower level cell
• count, mean, s, min, max
• type of distribution—normal, uniform, etc.
– Use a top-down approach to answer spatial data queries
74. STING: Query Processing(3)
Used a top-down approach to answer spatial data queries
1. Start from a pre-selected layer—typically with a small number of cells
2. From the pre-selected layer until you reach the bottom layer do the
following:
• For each cell in the current level, compute the confidence interval
indicating the cell’s relevance to the given query:
– If it is relevant, include the cell in a cluster
– If it is irrelevant, remove the cell from further consideration
– Otherwise, look for relevant cells at the next lower layer
3. Combine relevant cells into relevant regions (based on grid-neighborhood)
and return the so obtained clusters as your answers.
75. STING: A Statistical Information Grid
Approach (3)
– Advantages:
• Query-independent, easy to parallelize, incremental update
• O(K), where K is the number of grid cells at the lowest level
– Disadvantages:
• All the cluster boundaries are either horizontal or vertical, and no
diagonal boundary is detected