The document describes and compares different hierarchical clustering algorithms:
1) Single-link clustering connects clusters based on the closest pair of patterns, forming elongated clusters. Complete-link connects based on the furthest pair, forming more compact clusters.
2) Complete-link is more useful than single-link for most applications as it produces more interpretable hierarchies. However, single-link can extract certain cluster types that complete-link cannot, like concentric clusters.
3) Average group linkage connects clusters based on the average distance between all pairs of patterns in the two clusters. It provides a balance between single and complete link.
7. Single-link clustering
• Given a distance d(x, y) between patterns x and y, the distance between clusters Ci, Cj is that of the closest cross-cluster pair:
Dmin(Ci, Cj) = min { d(x, y) | x ∈ Ci, y ∈ Cj }
1. Place each pattern in its own cluster.
2. Merge the pair of clusters with the smallest Dmin into one.
3. Repeat step 2 until a single cluster remains.
(A one-line sketch of Dmin follows below.)
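As a minimal sketch of the definition above (Python; the function name and the distance argument d are mine, not the slides'):

```python
def d_min(Ci, Cj, d):
    """Single-link cluster distance: the closest cross-cluster pair of patterns."""
    return min(d(x, y) for x in Ci for y in Cj)
```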
8. [Figure: single-link merging on the example points A-G. Panels: (A) B,C and D,F are merged; (B) A and E join; (C) the remaining merges; (D) the resulting dendrogram over the leaf order A B C E D F G, with the merge steps numbered 1-6.]
9. Agglomerative clustering algorithm (1/2)
• Algorithm 4.5 (p. 59):
(1) Given N patterns x1, ..., xN, place each pattern in its own cluster, C1, ..., CN.
(2) Set n = N (the current number of clusters).
(3) Repeat until n = 1:
(a) Among C1, ..., Cn, find the closest pair of clusters Ci, Cj (i < j).
(b) Merge Cj into Ci.
(c) Record the merge and its distance (one level of the dendrogram).
(d) Set Cj = Cn and n = n − 1.
(See Figure 4.8; a runnable sketch follows below.)
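The steps above transcribe almost directly into Python. This is a deliberately naive sketch (names are mine; dist is any cluster distance such as d_min from slide 7): rescanning all pairs in step (a) makes it cubic, and slide 12 shows how the O(N^2) version avoids that.

```python
def agglomerate(patterns, dist):
    """Steps (1)-(3): repeatedly merge the closest pair of clusters."""
    clusters = [[x] for x in patterns]            # (1) singletons C1..CN
    merges = []                                   # dendrogram record
    while len(clusters) > 1:                      # (3) until n = 1
        i, j = min(((a, b) for a in range(len(clusters))
                           for b in range(a + 1, len(clusters))),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))  # (a)
        merges.append((list(clusters[i]), list(clusters[j]),
                       dist(clusters[i], clusters[j])))                 # (c)
        clusters[i] = clusters[i] + clusters[j]   # (b) merge Cj into Ci
        clusters[j] = clusters[-1]                # (d) Cj = Cn
        clusters.pop()                            #     n = n - 1
    return merges
```

Running agglomerate with d_min reproduces the merge sequence traced on slides 8 and 11.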
10. Agglomerative clustering algorithm (2/2)
(1) Given N patterns x1, ..., xN, place each pattern in its own cluster, C1, ..., CN.
(2) Set n = N.
(3) Repeat until n = 1:
(a) Among C1, ..., Cn, find the closest pair of clusters Ci, Cj (i < j).
(b) Merge Cj into Ci.
(c) Record the merge and its distance.
(d) Set Cj = Cn and n = n − 1.
[Figure 4.8: panels (a), (b), (d) trace the merges on the example points A-G; the dendrogram axis runs 1-7.]
11. Single-link example: updating the distance matrix
(A) After merging {B,C} and {D,F} (nearest cluster and its distance at right):

      A    B,C  D,F  E    G   | nearest
A     -    1.2  2.3  1.9  4.1 | B,C  1.2
B,C   1.2  -    3.2  2.0  4.0 | A    1.2
D,F   2.3  3.2  -    2.2  3.5 | E    2.2
E     1.9  2.0  2.2  -    2.5 | A    1.9
G     4.1  4.0  3.5  2.5  -   | E    2.5

(B) [Figure: the clusters at this stage.] The minimum entry is 1.2, so A and {B,C} merge next.
(C) After merging A into {B,C}:

        A,B,C  D,F  E    G   | nearest
A,B,C   -     2.3  1.9  4.0 | E     1.9
D,F     2.3   -    2.2  3.5 | E     2.2
E       1.9   2.2  -    2.5 | A,B,C 1.9
G       4.0   3.5  2.5  -   | E     2.5

(D) [Figure: the clusters at this stage.] The minimum entry is now 1.9, so E merges with {A,B,C}. (A SciPy check of the full merge sequence follows below.)
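The merge heights can be checked mechanically with SciPy. An assumption of mine: the pairwise distances that appear in full on slide 30 define this example; scipy.cluster.hierarchy.linkage and squareform are the standard SciPy calls.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

labels = list("ABCDEFG")
D = np.zeros((7, 7))
pairs = {("A","B"):1.2, ("A","C"):1.3, ("A","D"):3.0, ("A","E"):1.9,
         ("A","F"):2.3, ("A","G"):4.1, ("B","C"):1.0, ("B","D"):4.0,
         ("B","E"):2.0, ("B","F"):3.2, ("B","G"):4.0, ("C","D"):4.1,
         ("C","E"):2.5, ("C","F"):3.4, ("C","G"):4.5, ("D","E"):2.3,
         ("D","F"):1.1, ("D","G"):3.5, ("E","F"):2.2, ("E","G"):2.5,
         ("F","G"):4.0}
for (a, b), w in pairs.items():
    i, j = labels.index(a), labels.index(b)
    D[i, j] = D[j, i] = w

Z = linkage(squareform(D), method="single")
print(Z[:, 2])  # merge heights: [1.0, 1.1, 1.2, 1.9, 2.2, 2.5]
```

The printed heights 1.0, 1.1, 1.2, 1.9, 2.2, 2.5 match the nearest-cluster columns in the tables above.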
12. Complexity of single-link clustering
• When Ci and Cj are merged into Ci', the distance from Ci' to any other cluster Ck needs no recomputation over patterns:
• Dmin(Ci', Ck) = min { Dmin(Ci, Ck), Dmin(Cj, Ck) }
• Computing the initial distance matrix for N patterns takes O(N^2) time and space.
• After each merge, updating the row for Ci' and moving Cj's row is O(N).
• With N − 1 merges at O(N) each (plus nearest-neighbor bookkeeping per step), the merging phase is O(N^2).
• Total: O(N^2) time, O(N^2) space. (The row update is sketched below.)
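In code, the O(N) update is one vectorized line per merge (a numpy sketch; the function name is mine):

```python
import numpy as np

def merge_single_link(D, i, j):
    """Merge cluster j into cluster i in the single-link distance matrix D.
    The new row/column is the elementwise minimum of the old ones: O(N)."""
    D[i, :] = np.minimum(D[i, :], D[j, :])
    D[:, i] = D[i, :]
    D[i, i] = 0.0
    # drop row/column j: the number of clusters shrinks by one
    return np.delete(np.delete(D, j, axis=0), j, axis=1)
```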
13. Single link and minimum spanning trees (pp. 35, 48)
• Single-link clustering is closely related to Kruskal's Algorithm for building a Minimum Spanning Tree.
• Definition 4.2: for a graph G = (V, E) (V the vertex set, E the edge set), a subgraph T ⊆ G that is a tree containing every vertex of G is a Spanning Tree of G. Given edge weights w(u, v) for (u, v) ∈ E, the weight of T is Σ(u,v)∈T w(u, v), and a spanning tree of G with minimum weight is a Minimum Spanning Tree of G.
• [Figures 4.13(A)/(B): the minimum spanning tree of the example graph of Figure 4.9 (p. 59); deleting the heaviest tree edges yields the single-link clusters.]
15. Kruskal's algorithm (p. 72)
(1) Input: a weighted graph G = (V, E) with vertex set V and edge set E.
(2) Initialize the output edge set A to {}.
(3) Place each vertex of V in its own singleton set.
(4) For each edge (u, v) ∈ E, in order of increasing weight:
(a) If u and v belong to different sets, set A = A ∪ {(u, v)}.
(b) Merge the sets containing u and v.
(5) Output A.
[Figure 4.14: three stages on the example points A-G:
A = {}, with sets {A}, {B}, {C}, {D}, {E}, {F}, {G};
A = {(B,C), (D,F)}, with sets {A}, {B,C}, {D,F}, {E}, {G};
A = {(B,C), (D,F), (A,B)}, with sets {A,B,C}, {D,F}, {E}, {G}.]
(A union-find implementation follows below.)
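A compact union-find implementation of steps (1)-(5) (Python; the edge list below is a representative subset of the example graph's pairwise distances, since the slide's exact edge set is not recoverable):

```python
def kruskal(vertices, edges):
    parent = {v: v for v in vertices}          # step (3): singleton sets

    def find(v):                               # representative of v's set
        while parent[v] != v:
            parent[v] = parent[parent[v]]      # path halving
            v = parent[v]
        return v

    A = []                                     # step (2)
    for w, u, v in sorted(edges):              # step (4): increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                           # (4a): different sets
            A.append((u, v))
            parent[ru] = rv                    # (4b): merge the sets
    return A                                   # step (5)

edges = [(1.2,"A","B"), (1.3,"A","C"), (1.0,"B","C"), (3.0,"A","D"),
         (1.1,"D","F"), (1.9,"A","E"), (2.2,"E","F"), (2.5,"E","G"),
         (2.3,"D","E"), (3.5,"D","G")]
print(kruskal("ABCDEFG", edges))
# [('B','C'), ('D','F'), ('A','B'), ('A','E'), ('E','F'), ('E','G')]
```

The first three accepted edges are (B,C), (D,F), (A,B), matching the A sets shown in Figure 4.14.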
16. [Figure: Kruskal's algorithm continued on the example points A-G. Panels (A) and (B) show the remaining edges (numbered 1-6) being added; panel (C) is the resulting dendrogram over the leaf order A B C E D F G, identical to the single-link dendrogram.]
17. [Figure: an alternative construction that grows the spanning tree T outward from a start vertex (here A), maintaining a queue Q of candidate edges; panels (A)-(E) show successive edges being added.] With a Fibonacci-heap priority queue this runs in O(E + V log V). (A heap-based sketch follows below.)
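If this slide is the tree-growing (Prim-style) construction that the quoted O(E + V log V) bound suggests, a binary-heap sketch looks like the following; note that Python's heapq gives O(E log V), and the quoted bound needs a Fibonacci heap. The graph encoding as an adjacency dict is my assumption.

```python
import heapq

def prim(adj, start):
    """Grow a spanning tree from `start`; adj[u] maps neighbor -> weight."""
    tree, seen = [], {start}
    Q = [(w, start, v) for v, w in adj[start].items()]  # candidate edges
    heapq.heapify(Q)
    while Q:
        w, u, v = heapq.heappop(Q)       # lightest edge leaving the tree
        if v in seen:
            continue
        seen.add(v)
        tree.append((u, v))
        for x, wx in adj[v].items():     # new candidate edges from v
            if x not in seen:
                heapq.heappush(Q, (wx, v, x))
    return tree
```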
18. Complete-link clustering
• Given a distance d(x, y) between patterns x and y, the distance between clusters Ci, Cj is that of the farthest cross-cluster pair:
Dmax(Ci, Cj) = max { d(x, y) | x ∈ Ci, y ∈ Cj }
1. Place each pattern in its own cluster.
2. Merge the pair of clusters with the smallest Dmax into one.
3. Repeat step 2 until a single cluster remains.
(Only the cluster distance changes from slide 7; see the sketch below.)
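The complete-link counterpart of the earlier d_min sketch differs only in the aggregation (again, the name is mine):

```python
def d_max(Ci, Cj, d):
    """Complete-link cluster distance: the farthest cross-cluster pair of patterns."""
    return max(d(x, y) for x in Ci for y in Cj)
```

Passing d_max instead of d_min to the agglomerate sketch from slide 9 yields complete-link clustering.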
19. [Figure: complete-link merging on the example points A-G, merge steps numbered 1-5. Panels (A)-(C) show the successive merges, starting with B,C and D,F, then A and E joining; panel (D) is the resulting dendrogram over the leaf order A B C D F E G.]
20. Complete-link example: updating the distance matrix
(A) After merging {B,C} and {D,F} (nearest cluster and its distance at right):

      A    B,C  D,F  E    G   | nearest
A     -    1.3  3.0  1.9  4.1 | B,C  1.3
B,C   1.3  -    4.1  2.5  4.5 | A    1.3
D,F   3.0  4.1  -    2.3  4.0 | E    2.3
E     1.9  2.5  2.3  -    2.5 | A    1.9
G     4.1  4.5  4.0  2.5  -   | E    2.5

(B) [Figure: the clusters at this stage.] The minimum entry is 1.3, so A and {B,C} merge next.
(C) After merging A into {B,C}:

        A,B,C  D,F  E    G   | nearest
A,B,C   -     4.1  2.5  4.5 | E    2.5
D,F     4.1   -    2.3  4.0 | E    2.3
E       2.5   2.3  -    2.5 | D,F  2.3
G       4.5   4.0  2.5  -   | E    2.5

(D) [Figure: the clusters at this stage.] The minimum entry is now 2.3, so E merges with {D,F}. (A SciPy check follows below.)
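The same SciPy check as after slide 11, with method="complete" (reusing the matrix D built there):

```python
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

Z = linkage(squareform(D), method="complete")
print(Z[:, 2])  # merge heights: [1.0, 1.1, 1.3, 2.3, 4.0, 4.5]
```

The heights 1.0, 1.1, 1.3, 2.3, 4.0, 4.5 match the tables above and the slide-19 dendrogram.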
21. [Figure: the remaining complete-link merges on the example points A-G: E joins {D,F}, then G, and finally {A,B,C}, completing the dendrogram.] A naive implementation that recomputes cluster distances at every step takes O(N^3) time with O(N^2) space.
23. From Jain, Murty, and Flynn, "Data Clustering: A Review" (p. 277):

"…the single-link algorithm produces the clusters shown in Figure 12, whereas the complete-link algorithm obtains the clustering shown in Figure 13. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm; the cluster labeled 1 obtained using the single-link algorithm is elongated because of the noisy patterns labeled '*'. The single-link algorithm is more versatile than the complete-link algorithm, otherwise. For example, the single-link algorithm can extract the concentric clusters shown in Figure 11, but the complete-link algorithm cannot. However, from a pragmatic viewpoint, it has been observed that the complete-link algorithm produces more useful hierarchies in many applications than the single-link algorithm [Jain and Dubes 1988].

Agglomerative Single-Link Clustering Algorithm
(1) Place each pattern in its own cluster. Construct a list of interpattern distances for all distinct unordered pairs of patterns, and sort this list in ascending order. …
(3) The output of the algorithm is a nested hierarchy of graphs which can be cut at a desired dissimilarity level forming a partition (clustering) identified by simply connected components in the corresponding graph.

Agglomerative Complete-Link Clustering Algorithm …

…works well on data sets containing well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as the k-means algorithm works well only on data sets having isotropic clusters [Nagy 1968]. On the other hand, the time and space complexities [Day 1992] of the partitional algorithms are typically lower…"

[Figure 10: the dendrogram obtained using the single-link algorithm (y-axis "Similarity", leaves A-G plus noise points "*"). Figure 12: a single-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). Figure 13: a complete-link clustering of the same pattern set.]
24. Other cluster distances
• Average Group Linkage: the average of the pattern distance over all cross-cluster pairs:
D(Ci, Cj) = (1 / (|Ci| |Cj|)) Σx1∈Ci Σx2∈Cj D(x1, x2)
• Ward's Method: the increase in within-cluster squared error caused by the merge:
D(Ci, Cj) = E(Ci ∪ Cj) − E(Ci) − E(Cj)
where E(Ci) = Σx∈Ci (d(x, ci))^2 and ci = (1/|Ci|) Σx∈Ci x is the centroid of Ci.
[Figure: dendrograms for the example under Average Group Linkage and Ward's Method. Sketches of both criteria follow below.]
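Both criteria transcribe directly (a numpy sketch; the function names are mine, and Ward's method needs the raw pattern vectors, not just pairwise distances):

```python
import numpy as np

def average_group_linkage(Ci, Cj, d):
    """D(Ci, Cj): the average of d over all cross-cluster pattern pairs."""
    return sum(d(x1, x2) for x1 in Ci for x2 in Cj) / (len(Ci) * len(Cj))

def squared_error(C):
    """E(Ci): summed squared distance of each pattern to the centroid ci."""
    C = np.asarray(C, dtype=float)
    ci = C.mean(axis=0)                 # centroid of the cluster
    return float(((C - ci) ** 2).sum())

def ward_distance(Ci, Cj):
    """D(Ci, Cj) = E(Ci ∪ Cj) - E(Ci) - E(Cj): the merge's error increase."""
    return squared_error(list(Ci) + list(Cj)) - squared_error(Ci) - squared_error(Cj)
```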
29. DIANA (1): the splitting score
• DIANA splits a cluster V by growing a splinter group S, scored by V(i, S):
V: the cluster being split
S (⊂ V): the splinter group
d(i, j): the distance between patterns i and j
• For i ∈ V − S:
V(i, S) = (1/(|V|−1)) Σj∈V−{i} d(i, j)                                  if S = ∅
V(i, S) = (1/(|V−S|−1)) Σj∉S∪{i} d(i, j) − (1/|S|) Σj∈S d(i, j)        if S ≠ ∅
• V(i, S) compares i's average distance to the rest of V − S against its average distance to S: a positive score means i sits closer to the splinter group. (A direct transcription follows below.)
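A direct transcription in Python (names are mine; d is a pattern-distance lookup such as a dict of dicts; the example values are checked after the next slide):

```python
def V(i, S, cluster, d):
    """V(i, S): average distance from i to the rest of cluster - S,
    minus the average distance from i to the splinter group S
    (plain average distance to everything else when S is empty)."""
    rest = [j for j in cluster if j not in S and j != i]
    if not S:
        return sum(d[i][j] for j in rest) / (len(cluster) - 1)
    to_rest = sum(d[i][j] for j in rest) / (len(cluster) - len(S) - 1)
    to_splinter = sum(d[i][j] for j in S) / len(S)
    return to_rest - to_splinter
```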
30. DIANA (2): example data and splitting procedure
[Figure: the example points A-G and their distance matrix (lower triangle):]

    A    B    C    D    E    F
B  1.2
C  1.3  1.0
D  3.0  4.0  4.1
E  1.9  2.0  2.5  2.3
F  2.3  3.2  3.4  1.1  2.2
G  4.1  4.0  4.5  3.5  2.5  4.0

Splitting one cluster V (Figure 4.24):
(1) Start with an empty splinter group S = {}.
(2) Compute V(i, S) for every i ∈ V − S.
(3) If the largest V(i, S) is greater than 0, move that i into S and return to (2).
(4) When V(i, S) ≤ 0 for every remaining i, stop growing S.
(5) Split V into S and V − S.
(A sketch of this loop, with the example data, follows below.)
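Putting the matrix and the loop together (Python; diana_split and the pair list are my encoding of the slide data, and V is the function sketched after slide 29):

```python
# Pairwise distances from the matrix above, as a dict of dicts.
pairs = [("A","B",1.2), ("A","C",1.3), ("A","D",3.0), ("A","E",1.9),
         ("A","F",2.3), ("A","G",4.1), ("B","C",1.0), ("B","D",4.0),
         ("B","E",2.0), ("B","F",3.2), ("B","G",4.0), ("C","D",4.1),
         ("C","E",2.5), ("C","F",3.4), ("C","G",4.5), ("D","E",2.3),
         ("D","F",1.1), ("D","G",3.5), ("E","F",2.2), ("E","G",2.5),
         ("F","G",4.0)]
d = {c: {} for c in "ABCDEFG"}
for a, b, w in pairs:
    d[a][b] = d[b][a] = w

def diana_split(cluster, d):
    """Steps (1)-(5): grow the splinter group S, then split the cluster."""
    S = []                                                 # (1) S = {}
    while True:
        rest = [i for i in cluster if i not in S]
        if len(rest) <= 1:                                 # nothing left to move
            break
        best = max(rest, key=lambda i: V(i, S, cluster, d))        # (2)
        if S and V(best, S, cluster, d) <= 0:              # (4) no positive score
            break                                          # (the seed always moves)
        S.append(best)                                     # (3) move i into S
    return S, [i for i in cluster if i not in S]           # (5)

print(diana_split(list("ABCDEFG"), d))  # (['G'], ['A','B','C','D','E','F'])
print(diana_split(list("ABCDEF"), d))   # (['D','F'], ['A','B','C','E'])
```

The two printed splits are exactly the first and second splits worked through on slides 31-38.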
31. DIANA (3): first split, step 1 (1)
[Figure: the full cluster {A, ..., G} and its distance matrix, as on the previous slide.]
With S = {}, the first case of V(i, S) applies:
V(G, {}) = 1/6 (d(G, A) + d(G, B) + d(G, C) + d(G, D) + d(G, E) + d(G, F))
         = 1/6 (4.1 + 4.0 + 4.5 + 3.5 + 2.5 + 4.0) = 3.77
32. DIANA (4): first split, step 1 (2)
Likewise for E:
V(E, {}) = 1/6 (d(E, A) + d(E, B) + d(E, C) + d(E, D) + d(E, F) + d(E, G))
         = 1/6 (1.9 + 2.0 + 2.5 + 2.3 + 2.2 + 2.5) = 2.23
33. DIANA (5): first split, step 1 (3)
• Computing V(i, {}) for every pattern:
V(A, {}) = 13.8/6 = 2.3,  V(B, {}) = 15.4/6 = 2.57,
V(C, {}) = 16.8/6 = 2.8,  V(D, {}) = 18.0/6 = 3.0,
V(E, {}) = 2.23,          V(F, {}) = 16.2/6 = 2.7,
V(G, {}) = 3.77
• V(G, {}) is the largest value.
• Since V(G, {}) > 0, G moves into the splinter group S.
• S = {G}
• Continue growing S.
34. DIANA (6): first split, step 2 (1)
With S = {G}, the second case of V(i, S) applies:
V(A, {G}) = 1/5 (d(A, B) + d(A, C) + d(A, D) + d(A, E) + d(A, F)) − 1/1 · d(A, G)
          = 1/5 (1.2 + 1.3 + 3.0 + 1.9 + 2.3) − 4.1 = −2.16
35. DIANA (7): first split, step 2 (2)
• The remaining scores against S = {G}:
V(B, {G}) = 1/5 (1.2 + 1.0 + 4.0 + 2.0 + 3.2) − 4.0 = −1.72
V(C, {G}) = 1/5 (1.3 + 1.0 + 4.1 + 2.5 + 3.4) − 4.5 = −2.04
V(D, {G}) = 1/5 (3.0 + 4.0 + 4.1 + 2.3 + 1.1) − 3.5 = −0.6
V(E, {G}) = 1/5 (1.9 + 2.0 + 2.5 + 2.3 + 2.2) − 2.5 = −0.32
V(F, {G}) = 1/5 (2.3 + 3.2 + 3.4 + 1.1 + 2.2) − 4.0 = −1.56
• V(E, {G}) is the largest.
• Since V(E, {G}) ≤ 0, no further pattern moves into S: the cluster splits into {G} and {A, B, C, D, E, F}.
[Figure: the split {A, ..., F} | {G}.]
36. DIANA (8): second split (1)
• The cluster {A,B,C,D,E,F,G} has been split into {A,B,C,D,E,F} and {G}.
• {G} is a singleton and cannot be split further.
• Next, split V = {A,B,C,D,E,F}.
• Seeding with V(i, {}) inside the new V:
V(A, {}) = 1/5 (1.2 + 1.3 + 3.0 + 1.9 + 2.3) = 1.94
V(B, {}) = 1/5 (1.2 + 1.0 + 4.0 + 2.0 + 3.2) = 2.28
V(C, {}) = 1/5 (1.3 + 1.0 + 4.1 + 2.5 + 3.4) = 2.46
V(D, {}) = 1/5 (3.0 + 4.0 + 4.1 + 2.3 + 1.1) = 2.9
V(E, {}) = 1/5 (1.9 + 2.0 + 2.5 + 2.3 + 2.2) = 2.18
V(F, {}) = 1/5 (2.3 + 3.2 + 3.4 + 1.1 + 2.2) = 2.44
• V(D, {}) is the largest, so S = {D}, and S keeps growing.
37. DIANA (9): second split (2)
• With V = {A,B,C,D,E,F} and S = {D}, compute V(i, S):
V(A, {D}) = 1/4 (1.2 + 1.3 + 1.9 + 2.3) − 3.0 = −1.325
V(B, {D}) = 1/4 (1.2 + 1.0 + 2.0 + 3.2) − 4.0 = −2.15
V(C, {D}) = 1/4 (1.3 + 1.0 + 2.5 + 3.4) − 4.1 = −2.05
V(E, {D}) = 1/4 (1.9 + 2.0 + 2.5 + 2.2) − 2.3 = −0.15
V(F, {D}) = 1/4 (2.3 + 3.2 + 3.4 + 2.2) − 1.1 = 1.675
• V(F, {D}) is the largest.
• Since V(F, {D}) > 0, F moves into S.
[Figure: the current grouping, with D and F in the splinter group.]
38. DIANA (10): second split (3)
• With V = {A,B,C,D,E,F} and S = {D, F}:
V(A, {D, F}) = 1/3 (1.2 + 1.3 + 1.9) − 1/2 (3.0 + 2.3) = −1.183
V(B, {D, F}) = 1/3 (1.2 + 1.0 + 2.0) − 1/2 (4.0 + 3.2) = −2.2
V(C, {D, F}) = 1/3 (1.3 + 1.0 + 2.5) − 1/2 (4.1 + 3.4) = −2.15
V(E, {D, F}) = 1/3 (1.9 + 2.0 + 2.5) − 1/2 (2.3 + 2.2) = −0.117
• All scores are ≤ 0, so S stops growing: {A,B,C,D,E,F} splits into {D, F} and {A, B, C, E}.
[Figure: the resulting divisive dendrogram, leaf order G, D, F, E, A, B, C.]
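Recursively splitting every resulting cluster with diana_split yields the full divisive hierarchy shown in the dendrogram (a sketch reusing d and diana_split from the earlier snippet):

```python
def diana(cluster, d):
    """Full divisive hierarchy: split until every cluster is a single pattern."""
    if len(cluster) == 1:
        return cluster[0]
    S, rest = diana_split(cluster, d)
    return (diana(S, d), diana(rest, d))

print(diana(list("ABCDEFG"), d))
# ('G', (('D', 'F'), ('E', ('A', ('B', 'C')))))  -- leaf order G D F E A B C
```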