Clustering Analysis
1
What is Clustering in Data Mining?
• Cluster : a collection of data objects
– objects in the same cluster are similar to one another (Similarity)
– objects in different clusters are dissimilar from one another (Dissimilarity or
Distance)
• Cluster Analysis
– the process of grouping a set of data objects into clusters
• Clustering
– unlike classification (Classification), clustering has no predefined classes;
it is an unsupervised classification
2
Cluster Analysis
How many clusters?
(figure: the same points grouped as Two Clusters, Four Clusters, or Six Clusters)
3
What is Good Clustering?
• A good clustering minimizes intra-cluster distances (Minimize Intra-Cluster
Distances) and maximizes inter-cluster distances (Maximize Inter-Cluster
Distances).
(figure: inter-cluster distances are maximized; intra-cluster distances are
minimized)
4
Types of Clustering
• Partitional Clustering
– A division of data objects into non-overlapping subsets
(clusters) such that each data object is in exactly one subset
Original Points A Partitional Clustering
5
Types of Clustering
• Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
(figures: two hierarchical clusterings of points p1–p4, Hierarchical Clustering #1
and #2, with their Traditional Dendrograms 1 and 2)
6
Types of Clustering
• Exclusive versus non-exclusive
– In non-exclusive clusterings, points may belong to multiple
clusters.
– Can represent multiple classes or 'border' points
• Fuzzy versus non-fuzzy
– In fuzzy clustering, a point belongs to every cluster with
some weight between 0 and 1
– Weights must sum to 1
– Probabilistic clustering has similar characteristics
• Partial versus complete
– In some cases, we only want to cluster some of the data
• Heterogeneous versus homogeneous
– Clusters of widely different sizes, shapes, and densities
7
Characteristics of Cluster
• Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is
closer (or more similar) to every other point in the cluster than
to any point not in the cluster.
3 well-separated clusters
8
Characteristics of Cluster
• Center-based
– A cluster is a set of objects such that an object in a cluster
is closer (more similar) to the “center” of a cluster, than to
the center of any other cluster.
– The center of a cluster is often a centroid, the average of all
the points in the cluster, or a medoid, the most
“representative” point of a cluster.
4 center-based clusters
9
Characteristics of Cluster
• Density-based
– A cluster is a dense region of points that is separated from other
regions of high density by low-density regions.
– Used when the clusters are irregular, and when noise and
outliers are present.
6 density-based clusters
10
Characteristics of Cluster
• Shared Property or Conceptual Clusters
– Finds clusters that share some common property
or represent a particular concept.
2 Overlapping Circles
11
Clustering Algorithms
• K-means clustering
• Hierarchical clustering
12
K-means Clustering
• Partitions the n objects of a database D into k clusters
(the number of clusters k is given as input).
• k-Means represents each of the k clusters by the mean value
of the objects in that cluster.
13
K-means Clustering Algorithm
Algorithm: The k-Means algorithm partitions objects based
on the mean value of the objects in each cluster.
Input: The number of clusters k and a database containing
n objects.
Output: A set of k clusters that minimizes the squared-
error criterion.
14
K-means Clustering Algorithm
Method
1) Randomly choose k objects as the initial cluster centers
(centroids);
2) Repeat
3) (re)assign each object to the cluster to which the object
is the most similar, based on the mean value of the
objects in the cluster;
4) Update the cluster mean
calculate the mean value of the objects for each
cluster;
5) Until the centroids (center points) no longer change;
15
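The method above can be sketched in a few lines of Python/NumPy. This is a hedged illustration rather than code from the slides; the function name kmeans and the use of Manhattan distance (matching the worked example that follows) are assumptions.

```python
import numpy as np

def kmeans(points, k, initial_centroids, max_iter=100):
    """Minimal k-Means following steps 1-5 above (assumes no cluster goes empty)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(max_iter):
        # Step 3: (re)assign each object to the nearest centroid
        # (Manhattan distance, as in the worked example below)
        dists = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: update each cluster mean
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 5: stop when the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# The eight points and initial centers used in the example slides:
pts = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
labels, centers = kmeans(pts, k=3, initial_centroids=[(2, 10), (5, 8), (1, 2)])
```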
Example: K-Mean Clustering
• Problem: Cluster the following eight points (with (x,
y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4),
A7(1, 2), A8(4, 9).
(figure: scatter plot of the eight points A1–A8)
16
Example: K-Mean Clustering
• Randomly choose k objects as the initial cluster
centers;
• k = 3; c1(2, 10), c2(5, 8) and c3(1, 2).
(figure: the eight points with the initial centroids c1, c2, c3 marked with +)
17
Example: K-Mean Clustering
• The distance function between two points a=(x1, y1)
and b=(x2, y2) is defined as:
distance(a, b) = |x2 – x1| + |y2 – y1|
Means: c1 (2, 10), c2 (5, 8), c3 (1, 2)
Point Dist Mean 1 Dist Mean 2 Dist Mean 3 Cluster
A1 (2, 10)
A2 (2, 5)
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)
18
Example: K-Mean Clustering
• Step 2: Calculate the distance from each point to each mean using the
distance function.
point (x1, y1) = (2, 10); mean1 (x2, y2) = (2, 10)
distance(point, mean1) = |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
= 0 + 0 = 0
point (x1, y1) = (2, 10); mean2 (x2, y2) = (5, 8)
distance(point, mean2) = |x2 – x1| + |y2 – y1|
= |5 – 2| + |8 – 10|
= 3 + 2 = 5
point (x1, y1) = (2, 10); mean3 (x2, y2) = (1, 2)
distance(point, mean3) = |x2 – x1| + |y2 – y1|
= |1 – 2| + |2 – 10|
= 1 + 8 = 9
19
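The same Step-2 arithmetic can be checked with a tiny Python helper; manhattan() is an illustrative name, not something defined in the slides.

```python
def manhattan(a, b):
    """distance(a, b) = |x2 - x1| + |y2 - y1|, as defined on the slide."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

a1 = (2, 10)
for name, mean in [("mean1", (2, 10)), ("mean2", (5, 8)), ("mean3", (1, 2))]:
    print(name, manhattan(a1, mean))   # 0, 5, 9 -> A1 is assigned to Cluster 1
```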
Example: K-Mean Clustering
Means: c1 (2, 10), c2 (5, 8), c3 (1, 2)
Point Dist Mean 1 Dist Mean 2 Dist Mean 3 Cluster
A1 (2, 10) 0 5 9 1
A2 (2, 5)
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)
20
Example: K-Mean Clustering
• Calculate distance by using the distance function
point (x1, y1) = (2, 5); mean1 (x2, y2) = (2, 10)
distance(point, mean1) = |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 5|
= 0 + 5 = 5
point (x1, y1) = (2, 5); mean2 (x2, y2) = (5, 8)
distance(point, mean2) = |x2 – x1| + |y2 – y1|
= |5 – 2| + |8 – 5|
= 3 + 3 = 6
point (x1, y1) = (2, 5); mean3 (x2, y2) = (1, 2)
distance(point, mean3) = |x2 – x1| + |y2 – y1|
= |1 – 2| + |2 – 5|
= 1 + 3 = 4
21
Example: K-Mean Clustering
Means: c1 (2, 10), c2 (5, 8), c3 (1, 2)
Point Dist Mean 1 Dist Mean 2 Dist Mean 3 Cluster
A1 (2, 10) 0 5 9 1
A2 (2, 5) 5 6 4 3
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)
22
Example: K-Mean Clustering
• Iteration#1
Means: c1 (2, 10), c2 (5, 8), c3 (1, 2)
Point Dist Mean 1 Dist Mean 2 Dist Mean 3 Cluster
A1 (2, 10) 0 5 9 1
A2 (2, 5) 5 6 4 3
A3 (8, 4) 12 7 9 2
A4 (5, 8) 5 0 10 2
A5 (7, 5) 10 5 9 2
A6 (6, 4) 10 5 7 2
A7 (1, 2) 9 10 0 3
A8 (4, 9) 3 2 10 2
23
Example: K-Mean Clustering
Cluster 1: A1(2, 10)
Cluster 2: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)
Cluster 3: A2(2, 5), A7(1, 2)
(figure: the three clusters after Iteration #1, with the centroids c1, c2, c3
marked with +)
24
Example: K-Mean Clustering
• Re-compute the new cluster centers (means). We do
so by taking the mean of all points in each cluster.
• For Cluster 1, we only have one point
A1(2, 10), which was the old mean, so the cluster
center remains the same.
• For Cluster 2, we have (
(8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• For Cluster 3, we have ( (2+1)/2, (5+2)/2 ) =
(1.5, 3.5)
25
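The new centers can be reproduced directly; the cluster lists and the helper mean_point() below are illustrative names.

```python
def mean_point(cluster):
    """Mean of a list of (x, y) points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

cluster2 = [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)]
cluster3 = [(2, 5), (1, 2)]
print(mean_point(cluster2))   # (6.0, 6.0)
print(mean_point(cluster3))   # (1.5, 3.5)
```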
Example: K-Mean Clustering
(figure: the eight points with the updated centroids c1 (2, 10), c2 (6, 6),
c3 (1.5, 3.5) marked with +)
26
Example: K-Mean Clustering
• Iteration#2
Means: c1 (2, 10), c2 (6, 6), c3 (1.5, 3.5)
Point Dist Mean 1 Dist Mean 2 Dist Mean 3 Cluster
A1 (2, 10)
A2 (2, 5)
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)
27
Example: K-Mean Clustering
(Iteration#2)
Cluster 1? Cluster 2? Cluster 3?
(figure: the cluster assignment after Iteration #2)
Re-compute the new cluster centers (means):
C1 = ( (2+4)/2, (10+9)/2 ) = (3, 9.5)
C2 = (6.5, 5.25)
C3 = (1.5, 3.5)
28
Example: K-Mean Clustering
Iteration#3
Cluster 1? Cluster 2? Cluster 3?
(figure: the cluster assignment after Iteration #3)
re-compute the new
cluster centers (means)??
29
Distance functions
• Minkowski distance:
d(i, j) = ( |xi1 – xj1|^q + |xi2 – xj2|^q + … + |xip – xjp|^q )^(1/q)
• With q = 1, d is the Manhattan distance:
d(i, j) = |xi1 – xj1| + |xi2 – xj2| + … + |xip – xjp|
• With q = 2, d is the Euclidean distance:
d(i, j) = sqrt( (xi1 – xj1)^2 + (xi2 – xj2)^2 + … + (xip – xjp)^2 )
30
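A small sketch of the Minkowski family for illustration; minkowski() is an assumed helper name, not from the slides.

```python
def minkowski(a, b, q):
    """( sum |ai - bi|^q )^(1/q); q=1 is Manhattan, q=2 is Euclidean."""
    return sum(abs(ai - bi) ** q for ai, bi in zip(a, b)) ** (1.0 / q)

print(minkowski((2, 10), (5, 8), q=1))   # 5.0  (Manhattan)
print(minkowski((2, 10), (5, 8), q=2))   # ~3.606 (Euclidean)
```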
Evaluating K-means Clusters
• Most common measure is Sum of Squared
Error (SSE)
– For each point, the error is the distance to
the nearest cluster
– To get SSE, we square these errors and
sum them.
SSE = sum over clusters i = 1..K of sum over points x in Ci of dist(mi, x)^2
where,
– x is a data point in cluster Ci
– mi is the centroid point for cluster Ci
• One can show that mi corresponds to the mean (center) of the cluster.
31
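A hedged sketch of the SSE computation for 2-D points, using squared Euclidean distance; sse() and its arguments are illustrative names.

```python
def sse(clusters, centroids):
    """Sum of squared (Euclidean) distances of each point to its cluster centroid."""
    total = 0.0
    for points, m in zip(clusters, centroids):
        for x in points:
            total += (x[0] - m[0]) ** 2 + (x[1] - m[1]) ** 2
    return total
```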
Limitations of K-Mean
• K-means has difficulty when the natural clusters differ in:
– Size
– Density
– Shape (non-globular)
32
Limitations of K-means: Differing
Sizes
• K-means has problems when the natural clusters are of differing sizes
Original Points K-means (3 Clusters)
33
Limitations of K-means: Differing
Density
• K-means has problems when the natural clusters are of differing densities
Original Points K-means (3 Clusters)
34
Limitations of K-means: Non-
globular Shapes
• K-means has problems when the natural clusters have non-globular shapes
Original Points K-means (2 Clusters)
35
Overcoming K-means Limitations
Original Points K-means Clusters
One solution is to use many clusters:
find parts of clusters, which then need to be put back together. 36
Overcoming K-means Limitations
Original Points K-means Clusters
Overcoming K-means Limitations
Original Points K-means Clusters
38
Hierarchical Clustering
• Produces a set of nested clusters that can be visualized as a
Dendrogram.
• A Dendrogram is a tree diagram that records how each cluster is
built up from its subclusters, down to the individual points.
(figure: a nested clustering of points 1–6 and the corresponding Dendrogram)
39
Hierarchical Clustering
Two main approaches:
1. Agglomerative (bottom-up): start with each point as its own cluster and,
at each step, merge the closest pair of clusters until only one cluster
(or k clusters) remains.
2. Divisive (top-down): the reverse of Agglomerative; start with one
all-inclusive cluster and, at each step, split a cluster until each cluster
contains a single point (singleton cluster).
40
Agglomerative Clustering Algorithm
Basic algorithm is straightforward
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
41
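As a hedged illustration of this basic algorithm, SciPy's hierarchy module maintains the proximity matrix and performs the merges; the random six points below are only placeholders for the example data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

points = np.random.rand(6, 2)                 # six 2-D points, as in the example
Z = linkage(pdist(points), method='single')   # 'single' = MIN, 'complete' = MAX,
                                              # 'average' = group average
dendrogram(Z)                                 # plotting requires matplotlib
```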
Example:
42
Example:
43
Example:
(figure: six sample points; Euclidean distance is used to build the proximity matrix)
44
How to Define Inter-Cluster
Similarity
(figure: points p1–p5 and their proximity matrix)
Similarity?
 MIN
 MAX
 Group Average
 Ward’s Method uses squared
error
45
How to Define Inter-Cluster
Similarity
 MIN
 MAX
 Group Average
 Ward’s Method uses squared
error
(figure: points p1–p5 and their proximity matrix)
46
How to Define Inter-Cluster Similarity
(figure: points p1–p5 and their proximity matrix)
 MIN
 MAX
 Group Average
 Ward’s Method uses squared error
47
How to Define Inter-Cluster
Similarity
(figure: points p1–p5 and their proximity matrix)
 MIN
 MAX
 Group Average
 Ward’s Method uses squared error
48
Cluster Similarity: MIN or Single
Link
• Single link (MIN): the proximity of two clusters is defined by the
distance between their two closest points, i.e., the minimum pairwise
distance between the two clusters.
49
Cluster Similarity: MIN or Single
Link
(figure: points 1–6; the first merge joins points 3 and 6 at distance 0.11)
50
Cluster Similarity: MIN or Single
Link
(figure: points 1–6 with {3, 6} merged; partial dendrogram)
Dist({3,6},{2}) = min(dist(3,2), dist(6,2)) = min(0.15, 0.25) = 0.15
Dist({3,6}, {5}) = min(dist(3,5), dist(6,5)) =0.28
Dist ({3,6}, {4}) = min(dist(3,4), dist(6,4)) = 0.15
Dist({3,6}, {1}) = min(dist(3,1), dist(6,1)) = 0.22
51
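These cluster-to-cluster distances are just minima over the pairwise point distances; a tiny illustrative helper (the dist dictionary and single_link() are assumptions, not from the slides):

```python
def single_link(c1, c2, dist):
    """MIN linkage: smallest pairwise distance between the two clusters."""
    return min(dist[(i, j)] for i in c1 for j in c2)

dist = {(3, 2): 0.15, (6, 2): 0.25}
print(single_link({3, 6}, {2}, dist))   # 0.15, matching Dist({3,6},{2})
```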
Cluster Similarity: MIN or Single Link
(figure: {3, 6} merged and {2, 5} about to merge; partial dendrogram)
Dist({3,6},{2}) = min(dist(3,2), dist(6,2)) = min(0.15, 0.25) = 0.15
Dist ({3,6}, {4}) = min(dist(3,4), dist(6,4)) = 0.15
Dist({3,6},{2}) and Dist({3,6},{4}) are both greater than dist(5,2) = 0.14,
so points 2 and 5 are merged next, at height 0.14.
52
Cluster Similarity: MIN or Single
Link
(figure: clusters {3, 6} and {2, 5} formed; partial dendrogram)
Dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
= min(0.15, 0.25, 0.28, 0.39)
=0.15
Dist ({3,6},{1}) = min(dist(3,1), dist(6,1))
= min(0.22, 0.23) = 0.22
Dist ({3,6},{4}) = min (dist(3,4), dist(6,4))
= min(0.15, 0.22) = 0.15
53
Cluster Similarity: MIN or Single
Link
(figure: cluster {3, 6, 2, 5} formed; partial dendrogram)
Dist({3,6,2,5}, {1}) = min(dist(3,1), dist(6,1), dist(2,1), dist(5,1))
= min(0.22, 0.23, 0.24, 0.34) = 0.22
Dist(({3,6,2,5}, {4}) = min(dist(3,4), dist(6,4), dist(2,4), dist(5,4))
= min(0.15, 0.22, 0.20, 0.29) = 0.15
Cluster Similarity: MIN or Single
Link
(figure: cluster {3, 6, 2, 5, 4} formed; only point 1 remains; partial dendrogram)
Dist(({3,6,2,5,4}, {1}) = min(dist(3,1), dist(6,1),
dist(2,1), dist(5,1), dist(4,1))
= min(0.22, 0.23, 0.24, 0.34, 0.37)
= 0.22
55
Strength of MIN
Original Points Two Clusters
• Can handle non-elliptical shapes
56
Limitations of MIN
Original Points Two Clusters
• Sensitive to noise and outliers
57
Cluster Similarity: MAX or
Complete Linkage
• Complete link (MAX): the proximity of two clusters is defined by the
distance between their two farthest points, i.e., the maximum pairwise
distance between the two clusters.
58
Cluster Similarity: MAX or Complete Linkage
(figure: points 1–6; the first merge again joins points 3 and 6 at 0.11)
59
Cluster Similarity: MAX or
Complete Linkage
(figure: points 1–6 with {3, 6} merged; partial complete-link dendrogram)
Dist({3,6},{1}) = max(dist(3,1), dist(6,1))= 0.23
Dist({3,6},{2}) = max(dist(3,2), dist(6,2))= 0.25
Dist({3,6},{4}) = max(dist(3,4), dist(6,4))= 0.22**
Dist({3,6},{5}) = max(dist(3,5), dist(6,5)) = 0.39
60
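The complete-link computations mirror the single-link helper shown earlier, with max in place of min (again an illustrative sketch with an assumed dist dictionary):

```python
def complete_link(c1, c2, dist):
    """MAX linkage: largest pairwise distance between the two clusters."""
    return max(dist[(i, j)] for i in c1 for j in c2)

dist = {(3, 4): 0.15, (6, 4): 0.22}
print(complete_link({3, 6}, {4}, dist))   # 0.22, matching Dist({3,6},{4})
```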
Cluster Similarity: MAX or
Complete Linkage
(figure: {3, 6} and {2, 5} formed; partial complete-link dendrogram)
Dist({3,6},{4}) = max(dist(3,4), dist(6,4)) = 0.22 > Dist({2},{5}) = 0.14,
so points 2 and 5 are merged next, at height 0.14.
61
Cluster Similarity: MAX or
Complete Linkage
(figure: clusters {3, 6} and {2, 5} formed; partial complete-link dendrogram)
Dist({3,6},{1}) = max(dist(3,1), dist(6,1))= 0.23
Dist({3,6},{2,5}) = max(dist(3,2), dist(3,5), dist(6,2), dist(6,5))
= 0.39
Dist({3,6},{4}) = max(dist(3,4), dist(6,4))= 0.22**
Dist({2,5},{4}) = max(dist(2,4), dist(5,4)) = 0.29
Dist({2,5}, {1}) = max(dist(2,1), dist(5,1)) = 0.34
Clusters {3, 6} and {4} are merged next, at height 0.22.
62
Cluster Similarity: MAX or
Complete Linkage
(figure: clusters {3, 6, 4} and {2, 5} formed; partial complete-link dendrogram)
Dist({2,5}, {1}) = max(dist(2,1), dist(5,1))= 0.34**
Dist({2,5}, {3,6,4}) = max(dist(2,3), dist(2,6), dist(2,4)),
dist(5,3), dist(5,6), dist(5,4))
= max(0.15, 0.25, 0.20, 0.28, 0.39, 0.29)
= 0.39
Clusters {2, 5} and {1} are merged next, at height 0.34.
63
Cluster Similarity: MAX or Complete Linkage
(figure: clusters {2, 5, 1} and {3, 6, 4} remain; complete-link dendrogram)
Dist({2,5,1},{3,6,4}) = max(dist(2,3), dist(2,6), dist(2,4)
dist(5,3), dist(5, 6), dist(5,4)
dist(1,3), dist(1,6), dist(1,4))
= max(0.15, 0.25, 0.20, 0.28, 0.39,
0.29, 0.22, 0.23,0.37)
=0.39
The final merge joins {2, 5, 1} and {3, 6, 4} at height 0.39.
64
Strength of MAX
Original Points Two Clusters
• Less susceptible to noise and outliers
65
Limitations of MAX
Original Points Two Clusters
•Tends to break large clusters
•Biased towards globular clusters
66
Cluster Similarity: Group Average
• Group average: the proximity of two clusters is the average of all
pairwise distances between points in the two clusters.
• It is a compromise between single link and complete link.
67
Cluster Similarity: Group Average
Linkage
(figure: points 1–6; the first merge joins points 3 and 6 at 0.11)
68
Cluster Similarity: Group Average
(figure: points 1–6 with {3, 6} merged)
Dist({3,6},{1}) = avg(dist(3,1), dist(6,1)) = (0.22+0.23)/(2*1) = 0.225
Dist ({3,6},{2}) = avg(dist(3,2), dist(6,2)) = (0.15+0.25)/(2*1) = 0.20
Dist({3,6},{4}) = avg(dist(3,4), dist(6,4)) = (0.15+0.22)/(2*1) = 0.185**
Dist({3,6}, {5}) = avg(dist(3,5), dist(6,5)) = (0.28+0.39)/(2*1) = 0.335
(partial group-average dendrogram: 3 and 6 merged at 0.11)
69
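The group-average numbers above divide by |C1| * |C2|, the number of cross-cluster pairs; a small illustrative helper (again with an assumed dist dictionary):

```python
def group_average(c1, c2, dist):
    """Average of all pairwise distances between the two clusters."""
    return sum(dist[(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

dist = {(3, 1): 0.22, (6, 1): 0.23}
print(group_average({3, 6}, {1}, dist))   # 0.225, matching Dist({3,6},{1})
```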
Cluster Similarity: Group Average
(figure: clusters {3, 6} and {2, 5} forming)
Dist({3,6},{4}) = avg(dist(3,4), dist(6,4)) = (0.15+0.22)/(2*1) = 0.185** >
Dist({2},{5}) = 0.14, so points 2 and 5 are merged next.
(partial group-average dendrogram: 2 and 5 merged at 0.14)
70
Cluster Similarity: Group Average
(figure: clusters {3, 6} and {2, 5} formed; partial group-average dendrogram)
Dist({3,6},{1}) = avg(dist(3,1), dist(6,1)) = (0.22+0.23)/(2*1) = 0.225
Dist ({3,6},{2}) = avg(dist(3,2), dist(6,2)) = (0.15+0.25)/(2*1) = 0.20
Dist({3,6},{4}) = avg(dist(3,4), dist(6,4)) = (0.15+0.22)/(2*1) = 0.185**
Dist({3,6}, {5}) = avg(dist(3,5), dist(6,5)) = (0.28+0.39)/(2*1) = 0.335
Clusters {3, 6} and {4} are merged next, at height 0.185.
71
Cluster Similarity: Group Average
(figure: clusters {3, 6, 4} and {2, 5} formed; partial group-average dendrogram)
Dist({3,6, 4},{1}) = avg(dist(3,1), dist(6,1), dist(4,1))
= (0.22+0.23+0.37)/(3*1) = 0.273
Dist ({3,6,4},{2,5}) = avg(dist(3,2), dist(3,5), dist(6,2), dist(6,5),
dist(4,2), dist(4,5))
= (0.15+0.28+0.25+0.39+0.20+0.29)/(3*2)
= 0.26
Clusters {3, 6, 4} and {2, 5} are merged next, at height 0.26.
Cluster Similarity: Group Average
(figure: clusters {3, 6, 4, 2, 5} and {1} remain; group-average dendrogram)
Dist ({3,6,4,2,5}, {1}) = avg(dist(3,1), dist(6,1), dist(4,1), dist(2,1),dist(5,1))
= (0.22+0.23+0.37+0.24+0.34)/(5*1)
= 0.28
The final merge joins {3, 6, 4, 2, 5} and {1} at height 0.28.
73
Hierarchical Clustering: Group
Average
• Compromise between Single and Complete
Link
• Strengths
– Less susceptible to noise and outliers
• Limitations
– Biased towards globular clusters
74
Hierarchical Clustering:
Comparison
(figure: the nested clusterings of points 1–6 produced by MIN, MAX, and Group
Average)
75
Hierarchical Clustering:
Comparison
(figure: the corresponding dendrograms for MIN, MAX, and Group Average)
76
Internal Measures: Cohesion and Separation (graph-based clusters)
• A graph-based cluster approach can be evaluated by
cohesion and separation measures.
– Cluster cohesion is the sum of the weight of all links within a
cluster.
– Cluster separation is the sum of the weights between nodes in the
cluster and nodes outside the cluster.
(figure: within-cluster links illustrate cohesion; between-cluster links illustrate
separation)
77
Cohesion and Separation (Central-
based clusters)
• A central-based cluster approach can be
evaluated by cohesion and separation
measures.
78
Cohesion and Separation (Central-
based clustering)
• Cluster Cohesion: Measures how closely related are
objects in a cluster
– Cohesion is measured by the within cluster
sum of squares (SSE)
• Cluster Separation: Measures how distinct or well-
separated a cluster is from other clusters
– Separation is measured by the between cluster
sum of squares
WSS = sum over clusters i of sum over points x in Ci of (x – mi)^2
BSS = sum over clusters i of |Ci| * (m – mi)^2
where |Ci| is the size of cluster i, mi is the centroid of cluster i, and m is
the overall mean of the data
79
Example: Cohesion and Separation
 Example: WSS + BSS = Total SSE (constant)
Data points 1, 2, 4, 5 on a line; overall mean m = 3.
K=1 cluster (centroid m = 3):
WSS = (1–3)^2 + (2–3)^2 + (4–3)^2 + (5–3)^2 = 10
BSS = 4 × (3–3)^2 = 0
Total = 10 + 0 = 10
K=2 clusters, {1, 2} with m1 = 1.5 and {4, 5} with m2 = 4.5:
WSS = (1–1.5)^2 + (2–1.5)^2 + (4–4.5)^2 + (5–4.5)^2 = 1
BSS = 2 × (3–1.5)^2 + 2 × (3–4.5)^2 = 9
Total = 1 + 9 = 10
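A hedged one-dimensional sketch of these two measures; wss_bss() is an illustrative name.

```python
def wss_bss(clusters):
    """Return (WSS, BSS) for a list of 1-D clusters."""
    all_points = [x for c in clusters for x in c]
    m = sum(all_points) / len(all_points)          # overall mean
    wss = bss = 0.0
    for c in clusters:
        mi = sum(c) / len(c)                       # cluster centroid
        wss += sum((x - mi) ** 2 for x in c)       # cohesion
        bss += len(c) * (m - mi) ** 2              # separation
    return wss, bss

print(wss_bss([[1, 2, 4, 5]]))      # K=1: (10.0, 0.0) -> total 10
print(wss_bss([[1, 2], [4, 5]]))    # K=2: (1.0, 9.0)  -> total 10
```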
HW#8
81
• Database Segmentation: apply K-means clustering with K = 3 and
initial cluster centers C1 = (1, 5), C2 = (3, 12), C3 = (2, 13) to the
data table given below, and show the resulting clusters (Pattern)
produced by K-means.
HW#8
82
• What is a cluster?
• What is Good Clustering?
• How many types of clustering?
• How many Characteristics of Cluster?
• What is K-means Clustering?
• What are limitations of K-Mean?
• Please explain the methods of Hierarchical
Clustering.
ID X Y
A1 1 5
A2 4 9
A3 8 15
A4 6 2
A5 3 12
A6 10 7
A7 7 7
A8 11 4
A9 13 10
A10 2 13
LAB 8
84
• Use the Weka program to construct a clustering model from the
given file.
• Weka Explorer → Open file → bank.arff
• Cluster → Choose button →
SimpleKMeans → Next, click on the text
box to the right of the "Choose" button to
get the pop-up window
