Presentation slides
1. Drawing Large Graphs Using Divisive Hierarchical k-means
Barbara Ikica
22. 11. 2012
10. Abstract
Idea behind the algorithm
Implementation
Hierarchical clustering (Divisive Hierarchical k-means)
Graph representation
Identifying connected components
Diffusion kernels on graphs
Drawing
Time complexity
Random projections
13. Idea behind the algorithm
[Figure: a graph on the vertices 0–8 shown at successive levels of the hierarchy; at the coarsest level the vertices are grouped into the clusters {0, 1, 2}, {3, 4, 5, 6} and {7, 8}.]
16. Idea behind the algorithm
Objective: Arrangement of the (sets of the) vertices of a graph $G = (V(G), E(G))$ at the level of hierarchy $n$
1. Determining the partition of $V(G)$: $P_n = \{M_i\}_{i=1}^{m}$
2. Determining the drawing area for each $M_i \in P_n$
18. Implementation – Hierarchical clustering
k-means clustering
$M := \{v_i\}_{i=1}^{n}$, $v_i \in \mathbb{R}^d$ for all $i$, $1 \le i \le n$.
We aim to partition the set $M$ into $k$ clusters $P = \{S_1, S_2, \dots, S_k\}$ so as to minimize the within-cluster sum of squares
$$\min_{P} \sum_{i=1}^{k} \sum_{v_j \in S_i} \|v_j - \mu_i\|^2,$$
where $\mu_i := \frac{1}{|S_i|} \sum_{v_j \in S_i} v_j$ is the centroid vector of the cluster $S_i$.
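To make the objective concrete, the following minimal sketch evaluates the within-cluster sum of squares for two candidate partitions of four points in the plane. The function names (`centroid`, `sq_dist`, `wcss`) and the sample points are illustrative assumptions and do not come from the slides.

```python
def centroid(cluster):
    """Centroid (component-wise mean) of a non-empty list of vectors."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def wcss(partition):
    """Within-cluster sum of squares of a partition (list of clusters)."""
    total = 0.0
    for cluster in partition:
        mu = centroid(cluster)
        total += sum(sq_dist(v, mu) for v in cluster)
    return total


# Two ways of splitting the same four points into k = 2 clusters.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(wcss(good), wcss(bad))  # the first partition has the smaller objective
```

The partition that groups nearby points together yields the smaller value, which is what the minimization over $P$ rewards.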
20. Implementation – Hierarchical clustering
k-means clustering
NP-hard problem
If $k$ and $d$ are fixed, the problem can be solved exactly in $O(n^{dk+1} \log n)$ steps.
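22. Implementation – Hierarchical clustering
k-means clustering
Lemma 1
Choose an arbitrary $S \subset \mathbb{R}^d$ and $z \in \mathbb{R}^d$. Then
$$\sum_{v \in S} \|v - z\|^2 = \sum_{v \in S} \|v - \mu\|^2 + |S| \, \|z - \mu\|^2,$$
where $\mu$ is the centroid vector of the cluster $S$, i.e. $\mu = \frac{1}{|S|} \sum_{v \in S} v$.
Corollary 1
$$\min_{z \in \mathbb{R}^d} \sum_{v \in S} \|v - z\|^2 = \sum_{v \in S} \|v - \mu\|^2$$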
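One way to see why Lemma 1 holds is the standard expansion around the centroid sketched below. The slides state the lemma without proof, so this derivation is an addition and assumes $S$ is finite and non-empty.

```latex
\begin{align*}
\sum_{v \in S} \|v - z\|^2
  &= \sum_{v \in S} \|(v - \mu) + (\mu - z)\|^2 \\
  &= \sum_{v \in S} \|v - \mu\|^2
     + 2 \Big\langle \sum_{v \in S} (v - \mu),\; \mu - z \Big\rangle
     + |S| \, \|\mu - z\|^2 \\
  &= \sum_{v \in S} \|v - \mu\|^2 + |S| \, \|z - \mu\|^2 ,
\end{align*}
% The cross term vanishes because
% \sum_{v \in S} (v - \mu) = \sum_{v \in S} v - |S| \mu = 0.
```

Corollary 1 then follows because the term $|S| \, \|z - \mu\|^2$ is non-negative and equals zero exactly when $z = \mu$.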
23. Implementation – Hierarchical clustering
k-means clustering
Lemma 2
Let $X$ denote an arbitrary random variable with values in $\mathbb{R}^d$. For any $z \in \mathbb{R}^d$ the following holds:
$$E(\|X - z\|^2) = E(\|X - E(X)\|^2) + \|z - E(X)\|^2.$$
27. Implementation – Hierarchical clustering
k-means clustering – iterative algorithm
Randomly pick $k$ vectors from $M = \{v_1, v_2, \dots, v_n\}$ as the centroids $\mu_i^{(0)}$ for each $i = 1, 2, \dots, k$.
For $t = 0, 1, 2, \dots$ repeat (for each $i = 1, 2, \dots, k$)
1. $S_i^{(t)} = \{ v_p : \|v_p - \mu_i^{(t)}\|^2 \le \|v_p - \mu_j^{(t)}\|^2 \ \ \forall j : 1 \le j \le k \}$ and
2. $\mu_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{v_j \in S_i^{(t)}} v_j$
until there is no further change in the assignments of the vectors to the clusters $S_i^{(t)}$ (for each $i$) in the partition $V$.
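The two alternating steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the talk: the optional `init` argument, the tie-breaking rule (lowest cluster index wins), the handling of empty clusters and the `max_iter` cap are all assumptions added here. The function also records the objective after each assignment step.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(points, k, seed=0, init=None, max_iter=100):
    """Iterative k-means; returns (assignment, centroids, objective history)."""
    rng = random.Random(seed)
    # Initial centroids: k random input vectors unless `init` is supplied.
    centroids = list(init) if init is not None else rng.sample(points, k)
    assignment, history = None, []
    for _ in range(max_iter):
        # Step 1: assign every vector to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        history.append(
            sum(sq_dist(p, centroids[a]) for p, a in zip(points, new_assignment))
        )
        if new_assignment == assignment:
            break  # no vector changed its cluster
        assignment = new_assignment
        # Step 2: move every centroid to the mean of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[i] = mean(members)
    return assignment, centroids, history


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers, objective = kmeans(points, 2, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels, objective)
```

Like any run of this iteration, the result depends on the initial centroids and is only guaranteed to be a local optimum of the within-cluster sum of squares.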
36. Implementation – Hierarchical clustering
k-means clustering – iterative algorithm
Lemma 3
The value of the expression $\sum_{i=1}^{k} \sum_{v_j \in S_i^{(t)}} \|v_j - \mu_i^{(t)}\|^2$ decreases monotonically during iteration.
Proof.
$$\sum_{i=1}^{k} \sum_{v_j \in S_i^{(t+1)}} \|v_j - \mu_i^{(t+1)}\|^2 \le \sum_{i=1}^{k} \sum_{v_j \in S_i^{(t)}} \|v_j - \mu_i^{(t+1)}\|^2, \qquad (1)$$
since $S_i^{(t+1)} = \{ v_p : \|v_p - \mu_i^{(t+1)}\|^2 \le \|v_p - \mu_j^{(t+1)}\|^2 \ \ \forall j : 1 \le j \le k \}$.
$$\sum_{i=1}^{k} \sum_{v_j \in S_i^{(t)}} \|v_j - \mu_i^{(t+1)}\|^2 \le \sum_{i=1}^{k} \sum_{v_j \in S_i^{(t)}} \|v_j - \mu_i^{(t)}\|^2, \qquad (2)$$
since $\mu_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{v_j \in S_i^{(t)}} v_j$ and $\min_{z \in \mathbb{R}^d} \sum_{v \in S} \|v - z\|^2 = \sum_{v \in S} \|v - \mu\|^2$.
Thus
$$\sum_{i=1}^{k} \sum_{v_j \in S_i^{(t+1)}} \|v_j - \mu_i^{(t+1)}\|^2 \le \sum_{i=1}^{k} \sum_{v_j \in S_i^{(t)}} \|v_j - \mu_i^{(t)}\|^2.$$
50. Implementation – Hierarchical clustering
Hierarchical clustering
Figure: Schematic representation of the hierarchical partition of the set M (tree: M → A1, A2; A1 → A11, A12; A11 → A111, A112; A12 is repeated on the lowest level; A2 → A21, A22; A21 → A211, A212; A22 → A221, A222).
79. Implementation – Diffusion kernels on graphs
[Figure: a graph on the vertices 0, 1, 2, 3 and its clustered drawing.]
Hierarchical clustering on the row vectors of the adjacency matrix $M_G$:
$S_1 = \{1, 2\}$
$S_2 = \{0, 3\}$
80. Implementation – Diffusion kernels on graphs
[Figure: the same graph on the vertices 0, 1, 2, 3 and its clustered drawing.]
Hierarchical clustering on the row vectors of ?:
$S_1 = \{0, 1, 2\}$
$S_2 = \{3\}$
82. Implementation – Diffusion kernels on graphs
The adjacency matrix $M_G$ has the following property:
$[(M_G)^k]_{ij}$ = number of walks of length $k$ between vertices $i$ and $j$
We modify the algorithm by replacing $M_G$ with the kernel matrix $K_G$:
$$K_G := \sum_{k=0}^{\infty} \frac{\alpha^k (M_G)^k}{k!} = \exp(\alpha M_G).$$
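The walk-counting property and the series for $K_G$ can be checked on tiny graphs. The sketch below is illustrative only: it uses dense lists of rows, truncates the infinite series after a fixed number of terms (the `terms` parameter is an assumption), and all function names are hypothetical. For a graph consisting of a single edge, $\exp(\alpha M_G)$ has the closed form $\cosh(\alpha) I + \sinh(\alpha) M_G$, which serves as a reference value.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def diffusion_kernel(adj, alpha, terms=30):
    """Truncated series sum_{k < terms} alpha^k adj^k / k!."""
    n = len(adj)
    kernel = identity(n)  # the k = 0 term
    term = identity(n)
    for k in range(1, terms):
        # term_k = (alpha / k) * term_{k-1} * adj = alpha^k adj^k / k!
        term = [[alpha / k * x for x in row] for row in mat_mul(term, adj)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Path graph 0 - 1 - 2: entry (i, j) of the square counts walks of length 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
walks2 = mat_mul(path, path)
print(walks2)

# Single edge 0 - 1: compare with cosh(alpha) and sinh(alpha).
K = diffusion_kernel([[0, 1], [1, 0]], 0.5)
print(K[0][0], math.cosh(0.5), K[0][1], math.sinh(0.5))
```

Forming $K_G$ as a dense matrix like this is only practical for small graphs.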
85. Implementation – Diffusion kernels on graphs
$k : \Omega \times \Omega \to \mathbb{R}$ is a similarity measure if:
$k(x, y)$ characterizes the similarity of $x, y \in \Omega$.
$k : \Omega \times \Omega \to \mathbb{R}$ is a kernel if:
1. $k(x, y) = k(y, x)$ for all $x, y \in \Omega$,
2. $k$ is positive semidefinite: the kernel matrix $K \in \mathbb{R}^{n \times n}$, $[K]_{ij} = k(x_i, x_j)$, is positive semidefinite for all $x_1, x_2, \dots, x_n \in \Omega$.
Given a kernel $k$, there exist a Hilbert space $H_k$ and a map $\varphi : \Omega \to H_k$ such that
$\langle \varphi(x), \varphi(y) \rangle_{H_k} = k(x, y)$ for all $x, y \in \Omega$.
90. Implementation – Diffusion kernels on graphs
$K_G := \sum_{k=0}^{\infty} \frac{\alpha^k (M_G)^k}{k!} = \exp(\alpha M_G)$ indeed is a kernel matrix, since
$$K_G = \exp(\alpha M_G) = \exp(\alpha U D U^T) = U \exp(\alpha D) U^T,$$
where $M_G = U D U^T$ is the eigendecomposition of the symmetric matrix $M_G$ ($U$ orthogonal, $D$ real diagonal), so all eigenvalues of $K_G$, the diagonal entries of $\exp(\alpha D)$, are positive
$\implies$ $K_G$ is a symmetric positive semidefinite matrix
95. Implementation – Hierarchical clustering on $K_G$
Example
[Figure: drawing of an example graph on the vertices 0–11 with the cluster labels "1#1", "2#11", "1#2", "2#211", "3#11", "2#12", "3#2", "3#12", "2#221", "2#212", "2#2221", "2#2222".]
111. Implementation – Drawing
Figure: Determining the drawing area for the sets in the partitions P0, P1, P2 and P3, following the hierarchy of V(G) (V(G) → A1, A2; A1 → A11, A12; A11 → A111, A112; A2 → A21, A22; A21 → A211, A212; A22 → A221, A222). At the finest level shown, the drawing areas are A111, A112, A12, A211, A212, A221 and A222.
114. Implementation – Time complexity
Efficient computation of $K_G = \exp(\alpha M_G)$
Computation of $K_G$ is needed in the algorithm:
to determine the centroids $\mu_j$,
to minimize $\|v_i - \mu_j\|^2$:
$$\|v_i - \mu_j\|^2 = \langle v_i - \mu_j, v_i - \mu_j \rangle = \langle v_i, v_i \rangle - 2 \langle v_i, \mu_j \rangle + \langle \mu_j, \mu_j \rangle.$$
The question is thus:
How to efficiently multiply the matrix $K_G$ with an arbitrary vector and how to efficiently compute the inner products $\langle v_i, v_i \rangle = \|v_i\|^2$?
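The expansion of the squared distance means that, once the squared norms are known, each distance costs only one further inner product. A small sketch of that bookkeeping follows; the function names and the idea of passing precomputed norms as arguments are illustrative assumptions, not details from the slides.

```python
def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist_from_inner_products(v_norm_sq, mu_norm_sq, v_dot_mu):
    """||v - mu||^2 = <v, v> - 2 <v, mu> + <mu, mu>."""
    return v_norm_sq - 2 * v_dot_mu + mu_norm_sq


v = (1, 2)
mu = (3, 5)
# The squared norms could be computed once and reused for many distances.
d_expanded = sq_dist_from_inner_products(dot(v, v), dot(mu, mu), dot(v, mu))
d_direct = sum((a - b) ** 2 for a, b in zip(v, mu))
print(d_expanded, d_direct)
```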
121. Implementation – Time complexity
Multiplying the matrix $K_G$ with an arbitrary vector $v$
$$K_G v = I v + \frac{\alpha}{1} M_G v + \frac{\alpha^2}{2!} (M_G)^2 v + \frac{\alpha^3}{3!} (M_G)^3 v + \dots =$$
$$= v + \frac{\alpha}{1} (M_G v) + \frac{\alpha}{2} M_G \left( \frac{\alpha}{1} (M_G v) \right) + \frac{\alpha}{3} M_G \left( \frac{\alpha}{2} M_G \left( \frac{\alpha}{1} (M_G v) \right) \right) + \dots$$
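The nested form suggests computing each term from the previous one with a single multiplication by $M_G$, so $K_G$ itself never has to be formed. The sketch below follows that idea under some assumptions: the graph is stored as adjacency lists (`neighbors`), the series is cut off after `terms` terms, and the function names are hypothetical. The single-edge graph again provides a closed-form reference, since there $K_G (1, 0)^T = (\cosh \alpha, \sinh \alpha)^T$.

```python
import math


def adj_matvec(neighbors, v):
    """Compute M_G v for a graph given as adjacency lists."""
    return [sum(v[j] for j in neighbors[i]) for i in range(len(v))]


def kernel_matvec(neighbors, v, alpha, terms=40):
    """Approximate exp(alpha * M_G) v by a truncated series."""
    result = list(v)  # the k = 0 term, I v
    term = list(v)
    for k in range(1, terms):
        # term_k = (alpha / k) * M_G * term_{k-1}
        term = [alpha / k * x for x in adj_matvec(neighbors, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


neighbors = [[1], [0]]  # a single edge between vertices 0 and 1
alpha = 0.5
w = kernel_matvec(neighbors, [1.0, 0.0], alpha)
print(w, [math.cosh(alpha), math.sinh(alpha)])
```

Each term costs one pass over the adjacency lists, so the work per product is proportional to the number of edges times the number of terms kept.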
123. Implementation – Random projections
Random projections
Corollary of the Johnson–Lindenstrauss lemma:
Theorem 1
Let $P$ be an arbitrary set of $n$ points in $\mathbb{R}^d$, represented as an $n \times d$ matrix $A \in \mathbb{R}^{n \times d}$. Given $\varepsilon > 0$, $\beta > 0$ let
$$k_0 = \frac{4 + 2\beta}{\varepsilon^2/2 - \varepsilon^3/3} \log n.$$
For integer $k \ge k_0$, let $R \in \mathbb{R}^{d \times k}$ be a random matrix with elements $r_{ij}$, where $r_{ij}$ are independent random variables from either one of the following two probability distributions:
$$r_{ij} = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2, \end{cases} \qquad r_{ij} = \sqrt{3} \cdot \begin{cases} +1 & \text{with probability } 1/6, \\ 0 & \text{with probability } 2/3, \\ -1 & \text{with probability } 1/6. \end{cases}$$
Let $E = \frac{1}{\sqrt{k}} A R$ and let $f : \mathbb{R}^d \to \mathbb{R}^k$ map the $i$-th row of $A$ to the $i$-th row of $E$.
Then
$$(1 - \varepsilon) \|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \varepsilon) \|u - v\|^2$$
holds for all $u, v \in P$ with probability at least $1 - n^{-\beta}$.
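A minimal sketch of the projection $E = \frac{1}{\sqrt{k}} A R$, using the first of the two distributions (entries $\pm 1$ with probability $1/2$ each), is given below. The dimensions, the Gaussian test data and the function names are illustrative assumptions; in particular, $k$ is picked by hand here rather than computed from $k_0$.

```python
import math
import random


def random_sign_matrix(d, k, rng):
    """d x k matrix with independent entries +1 or -1, each with probability 1/2."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(k)] for _ in range(d)]


def project(rows, R):
    """Map every row x of A to (1 / sqrt(k)) * x R."""
    k = len(R[0])
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(x * R[i][j] for i, x in enumerate(row)) for j in range(k)]
        for row in rows
    ]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
n, d, k = 5, 500, 200
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = random_sign_matrix(d, k, rng)
E = project(A, R)

# Ratio of projected to original squared distance for one pair of points.
ratio = sq_dist(E[0], E[1]) / sq_dist(A[0], A[1])
print(ratio)
```

A ratio close to 1 indicates that the pairwise squared distance is approximately preserved, although the vectors now have $k$ instead of $d$ coordinates.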