"SSumM: Sparse Summarization of Massive Graphs", KDD 2020

K
SSumM : Sparse Summarization
of Massive Graphs
Kyuhan Lee* Hyeonsoo Jo* Jihoon Ko Sungsu Lim Kijung Shin
Graphs are Everywhere
Citation networksSubway networks Internet topologies
Introduction Algorithms Experiments ConclusionProblem
Designed by Freepik from Flaticon
Massive Graphs Appeared
Social networks
2.49 Billion active users
Purchase histories
0.5 Billion products
World Wide Web
5.49 Billion web pages
Introduction Algorithms Experiments ConclusionProblem
Difficulties in Analyzing Massive graphs
Computational cost
(number of nodes & edges)
Introduction Algorithms Experiments ConclusionProblem
Difficulties in Analyzing Massive graphs
>
Cannot fit
Introduction Algorithms Experiments ConclusionProblem
Input Graph
Solution: Graph Summarization
>
Introduction Algorithms Experiments ConclusionProblem
Can fit
Summary Graph
Advantages of Graph Summarization
Introduction Algorithms Experiments ConclusionProblem
• Many graph compression techniques are available
• TheWebGraph Framework [BV04]
• BFS encoding [AD09]
• SlashBurn [KF11]
• VoG [KKVF14]
• Graph summarization stands out because
• Elastic: reduce size of outputs as much as we want
• Analyzable: existing graph analysis and tools can be applied
• Combinable for Additional Compression: can be further compressed
Example of Graph Summarization
1
2
3 4
5
6
7
8
9
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Adjacency Matrix
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Introduction Algorithms Experiments ConclusionProblem
Input Graph
Example of Graph Summarization
1
2
3 4
5
6
7
8
9
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Adjacency Matrix
Input Graph
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
Introduction Algorithms Experiments ConclusionProblem
Subnode
Subedge
Supernode
Example of Graph Summarization
1
2
3 4
5
6
7
8
9
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Adjacency Matrix
Input Graph
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3
Introduction Algorithms Experiments ConclusionProblem
Subnode
Subedge
Superedge
Supernode
Example of Graph Summarization
1
2
3 4
5
6
7
8
9
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Adjacency Matrix
Input Graph
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3
𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟕𝟕} =2
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1
𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3
𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5
Introduction Algorithms Experiments ConclusionProblem
Example of Graph Summarization
1
2
3 4
5
6
7
8
9
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Adjacency Matrix
Input Graph
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1
𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3
𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5
Introduction Algorithms Experiments ConclusionProblem
Example of Graph Summarization
Introduction Algorithms Experiments ConclusionProblem
1 2 3 4 5 6 7 8 9
1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3
2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3
3 3/8 3/8 0 5/6 5/6 5/6 0 0 0
4 3/8 3/8 5/6 0 5/6 5/6 0 0 0
5 3/8 3/8 5/6 5/6 0 5/6 0 0 0
6 3/8 3/8 5/6 5/6 5/6 0 0 0 0
7 1/3 1/3 0 0 0 0 0 1 1
8 1/3 1/3 0 0 0 0 1 0 1
9 1/3 1/3 0 0 0 0 1 1 0
Reconstructed Adjacency Matrix
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1
𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3
𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5
Example of Graph Summarization
𝑽𝑽𝟏𝟏={1,2}
𝑽𝑽𝟕𝟕={7,8,9}
𝑽𝑽𝟑𝟑={3,4,5,6}
Summary Graph
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1
𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3
𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5
Introduction Algorithms Experiments ConclusionProblem
1 2 3 4 5 6 7 8 9
1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3
2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3
3 3/8 3/8 0 5/6 5/6 5/6 0 0 0
4 3/8 3/8 5/6 0 5/6 5/6 0 0 0
5 3/8 3/8 5/6 5/6 0 5/6 0 0 0
6 3/8 3/8 5/6 5/6 5/6 0 0 0 0
7 1/3 1/3 0 0 0 0 0 1 1
8 1/3 1/3 0 0 0 0 1 0 1
9 1/3 1/3 0 0 0 0 1 1 0
Reconstructed Adjacency Matrix
𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} = 3
𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀 𝑀𝑀𝑀𝑀𝑀𝑀 𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛 𝑜𝑜𝑜𝑜 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 = 8
Road Map
• Introduction
• Problem <<
• Proposed Algorithm: SSumM
• Experimental Results
• Conclusions
Problem Definition: Graph Summarization
𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮:
a graph 𝑮𝑮 and the target number of node 𝑲𝑲
𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭:
a summary graph �𝑮𝑮
𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴:
the difference between graph 𝑮𝑮and the restored graph �𝑮𝑮
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕:
the number of supernodes in �𝑮𝑮 ≤ 𝑲𝑲
Introduction Algorithms Experiments ConclusionProblem
Problem Definition: Graph Summarization
𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮:
a graph 𝑮𝑮 and the target number of node 𝑲𝑲
𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭:
a summary graph �𝑮𝑮
𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴:
the difference between graph 𝑮𝑮and the restored graph �𝑮𝑮
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕:
the number of supernodes in �𝑮𝑮 ≤ 𝑲𝑲
Shouldn’t we
consider sizes?
Introduction Algorithms Experiments ConclusionProblem
Problem Definition: Graph Summarization
𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮:
a graph 𝑮𝑮 and the desired size 𝑲𝑲 (in bits)
𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭:
a summary graph �𝑮𝑮
𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴:
the difference with graph graph 𝑮𝑮and the restored graph �𝑮𝑮
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕:
size of �𝑮𝑮 in bits ≤ 𝑲𝑲
Introduction Algorithms Experiments ConclusionProblem
Details: Size in Bits of a Graph
𝐸𝐸 : set of edges
𝑉𝑉 : set of nodes
Encoded using log2|𝑉𝑉| bits
Introduction Algorithms Experiments ConclusionProblem
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 2 𝐸𝐸 log2 𝑉𝑉
Input graph 𝑮𝑮
Details: Size in Bits of a Summary Graph
4
5
1
4
1
5
Introduction Algorithms Experiments ConclusionProblem
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆
𝑆𝑆 : set of supernodes
𝑃𝑃 : set of superedges
𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight
Summary graph �𝑮𝑮
Details: Size in Bits of a Summary Graph
4
5
1
4
1
5
Introduction Algorithms Experiments ConclusionProblem
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆
𝑆𝑆 : set of supernodes
𝑃𝑃 : set of superedges
𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight
Summary graph �𝑮𝑮
Details: Size in Bits of a Summary Graph
4
5
1
4
1
5
Introduction Algorithms Experiments ConclusionProblem
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆
𝑆𝑆 : set of supernodes
𝑃𝑃 : set of superedges
𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight
Summary graph �𝑮𝑮
Details: Size in Bits of a Summary Graph
4
5
1
4
1
5
Introduction Algorithms Experiments ConclusionProblem
𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆
𝑆𝑆 : set of supernodes
𝑃𝑃 : set of superedges
𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight
Summary graph �𝑮𝑮
Details: Error Measurement
Introduction Algorithms Experiments ConclusionProblem
1 2 3 4 5 6 7 8 9
1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3
2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3
3 3/8 3/8 0 5/6 5/6 5/6 0 0 0
4 3/8 3/8 5/6 0 5/6 5/6 0 0 0
5 3/8 3/8 5/6 5/6 0 5/6 0 0 0
6 3/8 3/8 5/6 5/6 5/6 0 0 0 0
7 1/3 1/3 0 0 0 0 0 1 1
8 1/3 1/3 0 0 0 0 1 0 1
9 1/3 1/3 0 0 0 0 1 1 0
1 2 3 4 5 6 7 8 9
1 0 1 0 0 0 1 0 0 1
2 1 0 1 0 0 1 1 0 0
3 0 1 0 1 1 1 0 0 1
4 0 0 1 0 1 0 0 0 0
5 0 0 1 1 0 1 1 0 0
6 1 1 1 0 1 0 0 0 0
7 0 1 0 0 1 0 0 1 1
8 0 0 0 0 0 0 1 0 1
9 1 0 1 0 0 0 1 1 0
Reconstructed Adjacency Matrix �𝑨𝑨Reconstructed Adjacency Matrix 𝑨𝑨
𝑅𝑅𝐸𝐸𝑝𝑝(𝑨𝑨, �𝑨𝑨) = �
𝑖𝑖=1
𝑉𝑉
�
𝑗𝑗=1
𝑉𝑉
𝐴𝐴 𝑖𝑖, 𝑗𝑗 − ̂𝐴𝐴 𝑖𝑖, 𝑗𝑗
𝑝𝑝
𝟏𝟏
𝒑𝒑
Road Map
• Introduction
• Problem
• Proposed Algorithm: SSumM <<
• Experimental Results
• Conclusions
Main ideas of SSumM
Introduction Algorithms Experiments ConclusionProblem
Combines node grouping and edge sparsification
Prunes search space
Balances error and size of the summary graph using MDL principle
• Practical graph summarization problem
◦ Given: a graph 𝑮𝑮
◦ Find: a summary graph �𝑮𝑮
◦ To minimize: the difference between 𝑮𝑮and the restored graph �𝑮𝑮
◦ Subject to: 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑜𝑜𝑜𝑜 �𝑮𝑮 in bits ≤ 𝑲𝑲
Main Idea: Combining Two Strategies
Introduction Algorithms Experiments ConclusionProblem
Node Grouping Sparsification
Main Idea: Combining Two Strategies
Introduction Algorithms Experiments ConclusionProblem
Node Grouping Sparsification
Main Idea: Combining Two Strategies
Introduction Algorithms Experiments ConclusionProblem
Node Grouping Sparsification
Main Idea: Combining Two Strategies
Introduction Algorithms Experiments ConclusionProblem
Node Grouping Sparsification
Main Idea: MDL Principle
Introduction Algorithms Experiments ConclusionProblem
1
2
3
4
5
6
Merge (5, 6)
1
2
3
4
Merge (1, 2)
3
4
5
6
5
6
1
2
{1,2}
{5,6}Merge (1, {5,6})
Merge (1, 2)
Merge (1, 3)
Merge (1, 3)
How to choose a next action?
Main Idea: MDL Principle
Introduction Algorithms Experiments ConclusionProblem
1
2
3
4
5
6
Merge (5, 6)
1
2
3
4
Merge (1, 2)
3
4
5
6
5
6
1
2
{1,2}
{5,6}Merge (1, {5,6})
Merge (1, 2)
Merge (1, 3)
Merge (1, 3)
Graph Summarization is
A Search Problem
How to choose a next action?
Main Idea: MDL Principle
Introduction Algorithms Experiments ConclusionProblem
1
2
3
4
5
6
Merge (5, 6)
1
2
3
4
Merge (1, 2)
3
4
5
6
5
6
1
2
{1,2}
{5,6}
Summary graph size + Information loss
Merge (1, {5,6})
Merge (1, 2)
Merge (1, 3)
Merge (1, 3)
Graph Summarization is
A Search Problem
How to choose a next action?
Main Idea: MDL Principle
Introduction Algorithms Experiments ConclusionProblem
1
2
3
4
5
6
Merge (5, 6)
1
2
3
4
Merge (1, 2)
3
4
5
6
5
6
1
2
{1,2}
{5,6}
Summary graph size + Information loss
MDL Principle
Merge (1, {5,6})
Merge (1, 2)
Merge (1, 3)
Merge (1, 3)
Graph Summarization is
A Search Problem
How to choose a next action?
arg min 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶 �𝑮𝑮 + 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶(𝑮𝑮|�𝑮𝑮)
�𝑮𝑮
# 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑓𝑓𝑓𝑓𝑓𝑓 �𝑮𝑮 # 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑓𝑓𝑓𝑓𝑓𝑓 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑮𝑮
𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢 �𝑮𝑮
Overview: SSumM
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
Introduction Algorithms Experiments ConclusionProblem
Procedure
• Given:
◦ (1) An input graph 𝑮𝑮, (2) the desired size 𝑲𝑲, (3) the number 𝑻𝑻 of iterations
• Outputs:
◦ Summary graph �𝑮𝑮
Input graph 𝑮𝑮 Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase <<
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
𝑎𝑎
𝑏𝑏𝑐𝑐
𝑑𝑑
𝑒𝑒
𝑓𝑓
𝑔𝑔
ℎ
𝑖𝑖
𝐴𝐴 = {𝑎𝑎}
𝐵𝐵 = {𝑏𝑏}C = {𝑐𝑐}
𝐷𝐷 = {𝑑𝑑}
𝐸𝐸 = {𝑒𝑒}
𝐹𝐹 = {𝑓𝑓}
𝐺𝐺 = {𝑔𝑔}
𝐻𝐻 = {ℎ}
𝐼𝐼 = {𝑖𝑖}
Initialization Phase
Candidate Generation Phase
𝐵𝐵 = {𝑏𝑏}
𝐶𝐶 = {𝑐𝑐}
D = {𝑑𝑑}
𝐸𝐸 = {𝑒𝑒}
𝐴𝐴 = {𝑎𝑎}
𝐹𝐹 = {𝑓𝑓}
𝐺𝐺 = {𝑔𝑔}
𝐻𝐻 = {ℎ}
𝐼𝐼 = {𝑖𝑖}
Input graph 𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase <<
 Merge and sparsification phase
 Further sparsification phase
𝑎𝑎
𝑏𝑏𝑐𝑐
𝑑𝑑
𝑒𝑒
𝑓𝑓
𝑔𝑔
ℎ
𝑖𝑖
Merging and Sparsification Phase
For each candidate set 𝑪𝑪 Among possible candidate pairs
Introduction Algorithms Experiments ConclusionProblem
(A, B) (B, C) (C, D) (D, E)
(A, C) (B, D) (C, E)
(A, D) (B, E)
(A, E)
𝐵𝐵 = {𝑏𝑏}
𝐶𝐶 = {𝑐𝑐}
D = {𝑑𝑑}
𝐴𝐴 = {𝑎𝑎}
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
𝐸𝐸 = {𝑒𝑒}
Merging and Sparsification Phase
For each candidate set 𝑪𝑪 Among possible candidate pairs
Introduction Algorithms Experiments ConclusionProblem
(A, B) (B, C) (C, D) (D, E)
(A, C) (B, D) (C, E)
(A, D) (B, E)
(A, E)
𝐵𝐵 = {𝑏𝑏}
𝐶𝐶 = {𝑐𝑐}
D = {𝑑𝑑}
𝐴𝐴 = {𝑎𝑎}
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
Sample log2 |𝑪𝑪| pairs𝐸𝐸 = {𝑒𝑒}
Merging and Sparsification Phase
Select the pair with
the greatest (relative) reduction
in the cost function
𝒊𝒊𝒊𝒊 reduction(C, D) > 𝜽𝜽:
merge(C, D)
𝒆𝒆𝒆𝒆𝒆𝒆𝒆𝒆
sample log2 |𝑪𝑪| pairs again
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
(A, B) (A, D) (C, D)
Merging and Sparsification Phase
Select the pair with
the greatest (relative) reduction
in the cost function
𝒊𝒊𝒊𝒊 reduction(C, D) > 𝜽𝜽:
merge(C, D)
𝒆𝒆𝒆𝒆𝒆𝒆𝒆𝒆
sample log2 |𝑪𝑪| pairs again
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
(A, B) (A, D) (C, D)
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}𝐶𝐶 = {𝑐𝑐}
𝐷𝐷 = {𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
C = {𝑐𝑐, 𝑑𝑑}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
C = {𝑐𝑐, 𝑑𝑑}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
Sparsify or not according
to total description cost
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
C = {𝑐𝑐, 𝑑𝑑}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
Sparsify or not according
to total description cost
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
C = {𝑐𝑐, 𝑑𝑑}
Merging and Sparsification Phase (cont.)
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase <<
 Further sparsification phase
Sparsify or not according
to total description cost
{𝑎𝑎}
{𝑏𝑏}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
C = {𝑐𝑐, 𝑑𝑑}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {ℎ}
{𝑔𝑔, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {ℎ}
{𝑔𝑔, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {ℎ}
{𝑔𝑔, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔, ℎ, 𝑖𝑖}
Summary graph �𝑮𝑮
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑓𝑓}
{𝑔𝑔, ℎ, 𝑖𝑖}
Summary graph �𝑮𝑮
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
{𝑎𝑎, 𝑒𝑒}
Repetition
Summary graph �𝑮𝑮
Introduction Algorithms Experiments ConclusionProblem
Summary graph �𝑮𝑮
Different
candidate sets
and decreasing
threshold 𝜃𝜃
over iteration
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑓𝑓}
{𝑔𝑔, ℎ, 𝑖𝑖}
Summary graph �𝑮𝑮
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖}
{𝑎𝑎}
{𝑏𝑏}
{𝑐𝑐, 𝑑𝑑}
{𝑒𝑒}
{𝑓𝑓}
{𝑔𝑔}
{ℎ}
{𝑖𝑖}
{𝑎𝑎, 𝑒𝑒}
Introduction Algorithms Experiments ConclusionProblem
Further Sparsification Phase
Summary graph �𝑮𝑮
𝐴𝐴 𝐴𝐴
𝐵𝐵 𝐴𝐴
𝐹𝐹 𝐴𝐴
𝐺𝐺 𝐺𝐺
Superedges sorted by ∆𝑅𝑅𝐸𝐸𝑝𝑝
𝐶𝐶 𝐶𝐶
Size of �𝑮𝑮 in bits ≤ 𝑲𝑲
Procedure
 Initialization phase
 𝑡𝑡 = 1
 While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits
 Candidate generation phase
 Merge and sparsification phase
 Further sparsification phase <<
C = {𝑐𝑐, 𝑑𝑑}
𝐴𝐴 = {𝑎𝑎, 𝑒𝑒}
𝐵𝐵 = {𝑏𝑏}
𝐹𝐹 = {𝑓𝑓}
𝐺𝐺 = {𝑔𝑔, ℎ, 𝑖𝑖}
Road Map
• Introduction
• Problem
• Proposed Algorithm: SSumM
• Experimental Results <<
• Conclusions
• 10 datasets from 6 domains (up to 0.8B edges)
• Three competitors for graph summarization
◦ k-Gs [LT10]
◦ S2L [RSB17]
◦ SAA-Gs [BAZK18]
Introduction Algorithms Experiments ConclusionProblem
Social Internet Email Co-purchase Collaboration Hyperlinks
Experiments Settings
Email-Enron Caida Ego-Facebook
Web-UK-05 Web-UK-02 LiveJournal
DBLP Amazon-0302Skitter
k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample)
Introduction Algorithms Experiments ConclusionProblem
o.o.t. >12hours
o.o.m. >64GB
SSumM Gives Concise and Accurate Summary
Email-Enron Caida Ego-Facebook
Web-UK-05 Web-UK-02 LiveJournal
DBLP Amazon-0302Skitter
k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample)
Introduction Algorithms Experiments ConclusionProblem
o.o.t. >12hours
o.o.m. >64GB
SSumM Gives Concise and Accurate Summary
Email-Enron Caida Ego-Facebook
Web-UK-05 Web-UK-02 LiveJournal
Amazon-0601 DBLPSkitter
k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample)
Introduction Algorithms Experiments ConclusionProblem
SSumM is Fast
Email-Enron Caida Ego-Facebook
Web-UK-05 Web-UK-02 LiveJournal
Amazon-0601 DBLPSkitter
k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample)
Introduction Algorithms Experiments ConclusionProblem
SSumM is Fast
Introduction Algorithms Experiments ConclusionProblem
SSumM is Scalable
Introduction Algorithms Experiments ConclusionProblem
SSumM Converges Fast
Road Map
• Introduction
• Problem
• Proposed Algorithm: SSumM
• Experimental Results
• Conclusions <<
Code available at https://github.com/KyuhanLee/SSumM
Concise & Accurate Fast Scalable
Introduction Algorithms Experiments ConclusionProblem
Practical Problem Formulation
Extensive Experiments on 10 real world graphs
Scalable and Effective Algorithm Design
Conclusions
SSumM : Sparse Summarization
of Massive Graphs
Kyuhan Lee* Hyeonsoo Jo* Jihoon Ko Sungsu Lim Kijung Shin
1 of 71

Recommended

"Incremental Lossless Graph Summarization", KDD 2020 by
"Incremental Lossless Graph Summarization", KDD 2020"Incremental Lossless Graph Summarization", KDD 2020
"Incremental Lossless Graph Summarization", KDD 2020지훈 고
550 views113 slides
Hmotif vldb2020 slide by
Hmotif vldb2020 slideHmotif vldb2020 slide
Hmotif vldb2020 slideGeon Lee
357 views35 slides
GraphBLAS: A linear algebraic approach for high-performance graph queries by
GraphBLAS: A linear algebraic approach for high-performance graph queriesGraphBLAS: A linear algebraic approach for high-performance graph queries
GraphBLAS: A linear algebraic approach for high-performance graph queriesGábor Szárnyas
274 views48 slides
Identification of unknown parameters and prediction of missing values. Compar... by
Identification of unknown parameters and prediction of missing values. Compar...Identification of unknown parameters and prediction of missing values. Compar...
Identification of unknown parameters and prediction of missing values. Compar...Alexander Litvinenko
159 views39 slides
Fixed point theorems for random variables in complete metric spaces by
Fixed point theorems for random variables in complete metric spacesFixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spacesAlexander Decker
193 views8 slides
Application of parallel hierarchical matrices for parameter inference and pre... by
Application of parallel hierarchical matrices for parameter inference and pre...Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Alexander Litvinenko
114 views28 slides

More Related Content

What's hot

Igraph by
IgraphIgraph
IgraphAnu Radha
1.1K views63 slides
Application of parallel hierarchical matrices and low-rank tensors in spatial... by
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Alexander Litvinenko
116 views38 slides
Trial terengganu 2014 spm add math k2 skema by
Trial terengganu 2014 spm add math k2 skemaTrial terengganu 2014 spm add math k2 skema
Trial terengganu 2014 spm add math k2 skemaCikgu Pejal
3.2K views10 slides
Low-rank matrix approximations in Python by Christian Thurau PyData 2014 by
Low-rank matrix approximations in Python by Christian Thurau PyData 2014Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014PyData
9.3K views25 slides
Add Maths 1 by
Add Maths 1Add Maths 1
Add Maths 1morabisma
3.1K views5 slides
Hideitsu Hino by
Hideitsu HinoHideitsu Hino
Hideitsu HinoSuurist
1.3K views96 slides

What's hot(20)

Igraph by Anu Radha
IgraphIgraph
Igraph
Anu Radha1.1K views
Application of parallel hierarchical matrices and low-rank tensors in spatial... by Alexander Litvinenko
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Trial terengganu 2014 spm add math k2 skema by Cikgu Pejal
Trial terengganu 2014 spm add math k2 skemaTrial terengganu 2014 spm add math k2 skema
Trial terengganu 2014 spm add math k2 skema
Cikgu Pejal3.2K views
Low-rank matrix approximations in Python by Christian Thurau PyData 2014 by PyData
Low-rank matrix approximations in Python by Christian Thurau PyData 2014Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014
PyData9.3K views
Add Maths 1 by morabisma
Add Maths 1Add Maths 1
Add Maths 1
morabisma3.1K views
Hideitsu Hino by Suurist
Hideitsu HinoHideitsu Hino
Hideitsu Hino
Suurist1.3K views
Hiroyuki Sato by Suurist
Hiroyuki SatoHiroyuki Sato
Hiroyuki Sato
Suurist8.5K views
Oct8 - 131 slid by Tak Lee
Oct8 - 131 slidOct8 - 131 slid
Oct8 - 131 slid
Tak Lee617 views
Skema SPM SBP Add Maths Paper 2012 by Tuisyen Geliga
Skema SPM SBP Add Maths Paper 2012Skema SPM SBP Add Maths Paper 2012
Skema SPM SBP Add Maths Paper 2012
Tuisyen Geliga8.4K views
Tetsunao Matsuta by Suurist
Tetsunao MatsutaTetsunao Matsuta
Tetsunao Matsuta
Suurist1.4K views
Deep genenergyprobdoc by Masato Nakai
Deep genenergyprobdocDeep genenergyprobdoc
Deep genenergyprobdoc
Masato Nakai2.4K views
Hiroaki Shiokawa by Suurist
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
Suurist1.1K views
Lecture 5: Stochastic Hydrology by Amro Elfeki
Lecture 5: Stochastic Hydrology Lecture 5: Stochastic Hydrology
Lecture 5: Stochastic Hydrology
Amro Elfeki216 views
Solution Manual : Chapter - 01 Functions by Hareem Aslam
Solution Manual : Chapter - 01 FunctionsSolution Manual : Chapter - 01 Functions
Solution Manual : Chapter - 01 Functions
Hareem Aslam1.5K views
Identification of unknown parameters and prediction with hierarchical matrice... by Alexander Litvinenko
Identification of unknown parameters and prediction with hierarchical matrice...Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...
Sbe final exam jan17 - solved-converted by cairo university
Sbe final exam jan17 - solved-convertedSbe final exam jan17 - solved-converted
Sbe final exam jan17 - solved-converted
cairo university37 views
Solution Manual : Chapter - 05 Integration by Hareem Aslam
Solution Manual : Chapter - 05 IntegrationSolution Manual : Chapter - 05 Integration
Solution Manual : Chapter - 05 Integration
Hareem Aslam3.7K views
Solution Manual : Chapter - 06 Application of the Definite Integral in Geomet... by Hareem Aslam
Solution Manual : Chapter - 06 Application of the Definite Integral in Geomet...Solution Manual : Chapter - 06 Application of the Definite Integral in Geomet...
Solution Manual : Chapter - 06 Application of the Definite Integral in Geomet...
Hareem Aslam4.9K views

Similar to "SSumM: Sparse Summarization of Massive Graphs", KDD 2020

Control charts by
Control chartsControl charts
Control chartsWaqaruddin Siddiqui, MBA
1.7K views39 slides
A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For... by
A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...
A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...Whitney Anderson
3 views19 slides
Brief instruction on backprop by
Brief instruction on backpropBrief instruction on backprop
Brief instruction on backpropYasutoTamura1
1.1K views17 slides
Data Analysis Assignment Help by
Data Analysis Assignment HelpData Analysis Assignment Help
Data Analysis Assignment HelpMatlab Assignment Experts
54 views17 slides
icdm06-tong.ppt by
icdm06-tong.ppticdm06-tong.ppt
icdm06-tong.pptAbhishekSingh43430
4 views46 slides
Pratt truss optimization using by
Pratt truss optimization usingPratt truss optimization using
Pratt truss optimization usingHarish Kant Soni
2.1K views19 slides

Similar to "SSumM: Sparse Summarization of Massive Graphs", KDD 2020(20)

A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For... by Whitney Anderson
A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...
A Step By Step Guide On Derivation And Analysis Of A New Numerical Method For...
Brief instruction on backprop by YasutoTamura1
Brief instruction on backpropBrief instruction on backprop
Brief instruction on backprop
YasutoTamura11.1K views
Enumerating cycles in bipartite graph using matrix approach by Usatyuk Vasiliy
Enumerating cycles in bipartite graph using matrix approachEnumerating cycles in bipartite graph using matrix approach
Enumerating cycles in bipartite graph using matrix approach
Usatyuk Vasiliy411 views
Nelson maple pdf by NelsonP23
Nelson maple pdfNelson maple pdf
Nelson maple pdf
NelsonP23206 views
Sample Data for Problem 1aA =[1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0.docx by anhlodge
Sample Data for Problem 1aA =[1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0.docxSample Data for Problem 1aA =[1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0.docx
Sample Data for Problem 1aA =[1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0.docx
anhlodge6 views
Overview of sparse and low-rank matrix / tensor techniques by Alexander Litvinenko
Overview of sparse and low-rank matrix / tensor techniques Overview of sparse and low-rank matrix / tensor techniques
Overview of sparse and low-rank matrix / tensor techniques
My presentation at University of Nottingham "Fast low-rank methods for solvin... by Alexander Litvinenko
My presentation at University of Nottingham "Fast low-rank methods for solvin...My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach by ahmet furkan emrehan
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial ApproachPREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
PREDICTION MODELS BASED ON MAX-STEMS Episode Two: Combinatorial Approach
Computer Vision: Correlation, Convolution, and Gradient by Ahmed Gad
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and Gradient
Ahmed Gad7.5K views
Statistical quality control, sampling by Sana Fatima
Statistical quality control, samplingStatistical quality control, sampling
Statistical quality control, sampling
Sana Fatima259 views
Shmoo Quantify by Dan Shu
Shmoo QuantifyShmoo Quantify
Shmoo Quantify
Dan Shu1.5K views
newmicrosoftofficepowerpointpresentation-150826055944-lva1-app6891.pdf by BiswajitPalei2
newmicrosoftofficepowerpointpresentation-150826055944-lva1-app6891.pdfnewmicrosoftofficepowerpointpresentation-150826055944-lva1-app6891.pdf
newmicrosoftofficepowerpointpresentation-150826055944-lva1-app6891.pdf
BiswajitPalei24 views

Recently uploaded

3196 The Case of The East River by
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East RiverErickANDRADE90
17 views4 slides
CRM stick or twist workshop by
CRM stick or twist workshopCRM stick or twist workshop
CRM stick or twist workshopinfo828217
11 views16 slides
Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptxDataScienceConferenc1
5 views12 slides
MOSORE_BRESCIA by
MOSORE_BRESCIAMOSORE_BRESCIA
MOSORE_BRESCIAFederico Karagulian
5 views8 slides
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...DataScienceConferenc1
5 views19 slides

Recently uploaded(20)

3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9017 views
CRM stick or twist workshop by info828217
CRM stick or twist workshopCRM stick or twist workshop
CRM stick or twist workshop
info82821711 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ... by DataScienceConferenc1
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
[DSC Europe 23] Danijela Horak - The Innovator’s Dilemma: to Build or Not to ...
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by DataScienceConferenc1
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1007 views
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx by DataScienceConferenc1
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx
[DSC Europe 23] Stefan Mrsic_Goran Savic - Evolving Technology Excellence.pptx
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra17 views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
Data about the sector workshop by info828217
Data about the sector workshopData about the sector workshop
Data about the sector workshop
info82821715 views
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862013 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
CRM stick or twist.pptx by info828217
CRM stick or twist.pptxCRM stick or twist.pptx
CRM stick or twist.pptx
info82821711 views

"SSumM: Sparse Summarization of Massive Graphs", KDD 2020

  • 1. SSumM : Sparse Summarization of Massive Graphs Kyuhan Lee* Hyeonsoo Jo* Jihoon Ko Sungsu Lim Kijung Shin
  • 2. Graphs are Everywhere Citation networksSubway networks Internet topologies Introduction Algorithms Experiments ConclusionProblem Designed by Freepik from Flaticon
  • 3. Massive Graphs Appeared Social networks 2.49 Billion active users Purchase histories 0.5 Billion products World Wide Web 5.49 Billion web pages Introduction Algorithms Experiments ConclusionProblem
  • 4. Difficulties in Analyzing Massive graphs Computational cost (number of nodes & edges) Introduction Algorithms Experiments ConclusionProblem
  • 5. Difficulties in Analyzing Massive graphs > Cannot fit Introduction Algorithms Experiments ConclusionProblem Input Graph
  • 6. Solution: Graph Summarization > Introduction Algorithms Experiments ConclusionProblem Can fit Summary Graph
  • 7. Advantages of Graph Summarization Introduction Algorithms Experiments ConclusionProblem • Many graph compression techniques are available • TheWebGraph Framework [BV04] • BFS encoding [AD09] • SlashBurn [KF11] • VoG [KKVF14] • Graph summarization stands out because • Elastic: reduce size of outputs as much as we want • Analyzable: existing graph analysis and tools can be applied • Combinable for Additional Compression: can be further compressed
  • 8. Example of Graph Summarization 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Adjacency Matrix 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Introduction Algorithms Experiments ConclusionProblem Input Graph
  • 9. Example of Graph Summarization 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Adjacency Matrix Input Graph 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph Introduction Algorithms Experiments ConclusionProblem Subnode Subedge Supernode
  • 10. Example of Graph Summarization 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Adjacency Matrix Input Graph 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3 Introduction Algorithms Experiments ConclusionProblem Subnode Subedge Superedge Supernode
  • 11. Example of Graph Summarization 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Adjacency Matrix Input Graph 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3 𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟕𝟕} =2 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1 𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3 𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5 Introduction Algorithms Experiments ConclusionProblem
  • 12. Example of Graph Summarization 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Adjacency Matrix Input Graph 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1 𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3 𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5 Introduction Algorithms Experiments ConclusionProblem
  • 13. Example of Graph Summarization Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 7 8 9 1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3 2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3 3 3/8 3/8 0 5/6 5/6 5/6 0 0 0 4 3/8 3/8 5/6 0 5/6 5/6 0 0 0 5 3/8 3/8 5/6 5/6 0 5/6 0 0 0 6 3/8 3/8 5/6 5/6 5/6 0 0 0 0 7 1/3 1/3 0 0 0 0 0 1 1 8 1/3 1/3 0 0 0 0 1 0 1 9 1/3 1/3 0 0 0 0 1 1 0 Reconstructed Adjacency Matrix 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1 𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3 𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5
  • 14. Example of Graph Summarization 𝑽𝑽𝟏𝟏={1,2} 𝑽𝑽𝟕𝟕={7,8,9} 𝑽𝑽𝟑𝟑={3,4,5,6} Summary Graph 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟕𝟕} =2 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} =3𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟏𝟏} =1 𝝎𝝎 {𝑽𝑽𝟕𝟕, 𝑽𝑽𝟕𝟕} =3 𝝎𝝎 {𝑽𝑽𝟑𝟑, 𝑽𝑽𝟑𝟑} =5 Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 7 8 9 1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3 2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3 3 3/8 3/8 0 5/6 5/6 5/6 0 0 0 4 3/8 3/8 5/6 0 5/6 5/6 0 0 0 5 3/8 3/8 5/6 5/6 0 5/6 0 0 0 6 3/8 3/8 5/6 5/6 5/6 0 0 0 0 7 1/3 1/3 0 0 0 0 0 1 1 8 1/3 1/3 0 0 0 0 1 0 1 9 1/3 1/3 0 0 0 0 1 1 0 Reconstructed Adjacency Matrix 𝝎𝝎 {𝑽𝑽𝟏𝟏, 𝑽𝑽𝟑𝟑} = 3 𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀 𝑀𝑀𝑀𝑀𝑀𝑀 𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛 𝑜𝑜𝑜𝑜 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 = 8
  • 15. Road Map • Introduction • Problem << • Proposed Algorithm: SSumM • Experimental Results • Conclusions
  • 16. Problem Definition: Graph Summarization 𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮: a graph 𝑮𝑮 and the target number of node 𝑲𝑲 𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭: a summary graph �𝑮𝑮 𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴: the difference between graph 𝑮𝑮and the restored graph �𝑮𝑮 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕: the number of supernodes in �𝑮𝑮 ≤ 𝑲𝑲 Introduction Algorithms Experiments ConclusionProblem
  • 17. Problem Definition: Graph Summarization 𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮: a graph 𝑮𝑮 and the target number of node 𝑲𝑲 𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭: a summary graph �𝑮𝑮 𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴: the difference between graph 𝑮𝑮and the restored graph �𝑮𝑮 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕: the number of supernodes in �𝑮𝑮 ≤ 𝑲𝑲 Shouldn’t we consider sizes? Introduction Algorithms Experiments ConclusionProblem
  • 18. Problem Definition: Graph Summarization 𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮𝑮: a graph 𝑮𝑮 and the desired size 𝑲𝑲 (in bits) 𝑭𝑭𝑭𝑭𝑭𝑭𝑭𝑭: a summary graph �𝑮𝑮 𝑻𝑻𝑻𝑻 𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴𝑴: the difference with graph graph 𝑮𝑮and the restored graph �𝑮𝑮 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒕𝒕𝒕𝒕: size of �𝑮𝑮 in bits ≤ 𝑲𝑲 Introduction Algorithms Experiments ConclusionProblem
  • 19. Details: Size in Bits of a Graph 𝐸𝐸 : set of edges 𝑉𝑉 : set of nodes Encoded using log2|𝑉𝑉| bits Introduction Algorithms Experiments ConclusionProblem 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 2 𝐸𝐸 log2 𝑉𝑉 Input graph 𝑮𝑮
  • 20. Details: Size in Bits of a Summary Graph 4 5 1 4 1 5 Introduction Algorithms Experiments ConclusionProblem 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆 𝑆𝑆 : set of supernodes 𝑃𝑃 : set of superedges 𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight Summary graph �𝑮𝑮
  • 21. Details: Size in Bits of a Summary Graph 4 5 1 4 1 5 Introduction Algorithms Experiments ConclusionProblem 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆 𝑆𝑆 : set of supernodes 𝑃𝑃 : set of superedges 𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight Summary graph �𝑮𝑮
  • 22. Details: Size in Bits of a Summary Graph 4 5 1 4 1 5 Introduction Algorithms Experiments ConclusionProblem 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆 𝑆𝑆 : set of supernodes 𝑃𝑃 : set of superedges 𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight Summary graph �𝑮𝑮
  • 23. Details: Size in Bits of a Summary Graph 4 5 1 4 1 5 Introduction Algorithms Experiments ConclusionProblem 𝑺𝑺𝑺𝑺𝑺𝑺𝑺𝑺 𝒐𝒐𝒐𝒐 𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔𝒔 𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈𝒈: 𝑃𝑃 2 log2 𝑆𝑆 + log2 𝜔𝜔𝑚𝑚𝑚𝑚𝑚𝑚 + 𝑉𝑉 log2 𝑆𝑆 𝑆𝑆 : set of supernodes 𝑃𝑃 : set of superedges 𝑤𝑤𝑚𝑚𝑚𝑚𝑚𝑚 : maximum superedge weight Summary graph �𝑮𝑮
  • 24. Details: Error Measurement Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 7 8 9 1 0 1 3/8 3/8 3/8 3/8 1/3 1/3 1/3 2 1 0 3/8 3/8 3/8 3/8 1/3 1/3 1/3 3 3/8 3/8 0 5/6 5/6 5/6 0 0 0 4 3/8 3/8 5/6 0 5/6 5/6 0 0 0 5 3/8 3/8 5/6 5/6 0 5/6 0 0 0 6 3/8 3/8 5/6 5/6 5/6 0 0 0 0 7 1/3 1/3 0 0 0 0 0 1 1 8 1/3 1/3 0 0 0 0 1 0 1 9 1/3 1/3 0 0 0 0 1 1 0 1 2 3 4 5 6 7 8 9 1 0 1 0 0 0 1 0 0 1 2 1 0 1 0 0 1 1 0 0 3 0 1 0 1 1 1 0 0 1 4 0 0 1 0 1 0 0 0 0 5 0 0 1 1 0 1 1 0 0 6 1 1 1 0 1 0 0 0 0 7 0 1 0 0 1 0 0 1 1 8 0 0 0 0 0 0 1 0 1 9 1 0 1 0 0 0 1 1 0 Reconstructed Adjacency Matrix �𝑨𝑨Reconstructed Adjacency Matrix 𝑨𝑨 𝑅𝑅𝐸𝐸𝑝𝑝(𝑨𝑨, �𝑨𝑨) = � 𝑖𝑖=1 𝑉𝑉 � 𝑗𝑗=1 𝑉𝑉 𝐴𝐴 𝑖𝑖, 𝑗𝑗 − ̂𝐴𝐴 𝑖𝑖, 𝑗𝑗 𝑝𝑝 𝟏𝟏 𝒑𝒑
  • 25. Road Map • Introduction • Problem • Proposed Algorithm: SSumM << • Experimental Results • Conclusions
  • 26. Main ideas of SSumM Introduction Algorithms Experiments ConclusionProblem Combines node grouping and edge sparsification Prunes search space Balances error and size of the summary graph using MDL principle • Practical graph summarization problem ◦ Given: a graph 𝑮𝑮 ◦ Find: a summary graph �𝑮𝑮 ◦ To minimize: the difference between 𝑮𝑮and the restored graph �𝑮𝑮 ◦ Subject to: 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑜𝑜𝑜𝑜 �𝑮𝑮 in bits ≤ 𝑲𝑲
  • 27. Main Idea: Combining Two Strategies Introduction Algorithms Experiments ConclusionProblem Node Grouping Sparsification
  • 28. Main Idea: Combining Two Strategies Introduction Algorithms Experiments ConclusionProblem Node Grouping Sparsification
  • 29. Main Idea: Combining Two Strategies Introduction Algorithms Experiments ConclusionProblem Node Grouping Sparsification
  • 30. Main Idea: Combining Two Strategies Introduction Algorithms Experiments ConclusionProblem Node Grouping Sparsification
  • 31. Main Idea: MDL Principle Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 Merge (5, 6) 1 2 3 4 Merge (1, 2) 3 4 5 6 5 6 1 2 {1,2} {5,6}Merge (1, {5,6}) Merge (1, 2) Merge (1, 3) Merge (1, 3) How to choose a next action?
  • 32. Main Idea: MDL Principle Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 Merge (5, 6) 1 2 3 4 Merge (1, 2) 3 4 5 6 5 6 1 2 {1,2} {5,6}Merge (1, {5,6}) Merge (1, 2) Merge (1, 3) Merge (1, 3) Graph Summarization is A Search Problem How to choose a next action?
  • 33. Main Idea: MDL Principle Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 Merge (5, 6) 1 2 3 4 Merge (1, 2) 3 4 5 6 5 6 1 2 {1,2} {5,6} Summary graph size + Information loss Merge (1, {5,6}) Merge (1, 2) Merge (1, 3) Merge (1, 3) Graph Summarization is A Search Problem How to choose a next action?
  • 34. Main Idea: MDL Principle Introduction Algorithms Experiments ConclusionProblem 1 2 3 4 5 6 Merge (5, 6) 1 2 3 4 Merge (1, 2) 3 4 5 6 5 6 1 2 {1,2} {5,6} Summary graph size + Information loss MDL Principle Merge (1, {5,6}) Merge (1, 2) Merge (1, 3) Merge (1, 3) Graph Summarization is A Search Problem How to choose a next action? arg min 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶 �𝑮𝑮 + 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶(𝑮𝑮|�𝑮𝑮) �𝑮𝑮 # 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑓𝑓𝑓𝑓𝑓𝑓 �𝑮𝑮 # 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏 𝑓𝑓𝑓𝑓𝑓𝑓 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑮𝑮 𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢𝑢 �𝑮𝑮
  • 35. Overview: SSumM  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase Introduction Algorithms Experiments ConclusionProblem Procedure • Given: ◦ (1) An input graph 𝑮𝑮, (2) the desired size 𝑲𝑲, (3) the number 𝑻𝑻 of iterations • Outputs: ◦ Summary graph �𝑮𝑮
  • 36. Input graph 𝑮𝑮 Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase <<  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase 𝑎𝑎 𝑏𝑏𝑐𝑐 𝑑𝑑 𝑒𝑒 𝑓𝑓 𝑔𝑔 ℎ 𝑖𝑖 𝐴𝐴 = {𝑎𝑎} 𝐵𝐵 = {𝑏𝑏}C = {𝑐𝑐} 𝐷𝐷 = {𝑑𝑑} 𝐸𝐸 = {𝑒𝑒} 𝐹𝐹 = {𝑓𝑓} 𝐺𝐺 = {𝑔𝑔} 𝐻𝐻 = {ℎ} 𝐼𝐼 = {𝑖𝑖} Initialization Phase
  • 37. Candidate Generation Phase 𝐵𝐵 = {𝑏𝑏} 𝐶𝐶 = {𝑐𝑐} D = {𝑑𝑑} 𝐸𝐸 = {𝑒𝑒} 𝐴𝐴 = {𝑎𝑎} 𝐹𝐹 = {𝑓𝑓} 𝐺𝐺 = {𝑔𝑔} 𝐻𝐻 = {ℎ} 𝐼𝐼 = {𝑖𝑖} Input graph 𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase <<  Merge and sparsification phase  Further sparsification phase 𝑎𝑎 𝑏𝑏𝑐𝑐 𝑑𝑑 𝑒𝑒 𝑓𝑓 𝑔𝑔 ℎ 𝑖𝑖
  • 38. Merging and Sparsification Phase For each candidate set 𝑪𝑪 Among possible candidate pairs Introduction Algorithms Experiments ConclusionProblem (A, B) (B, C) (C, D) (D, E) (A, C) (B, D) (C, E) (A, D) (B, E) (A, E) 𝐵𝐵 = {𝑏𝑏} 𝐶𝐶 = {𝑐𝑐} D = {𝑑𝑑} 𝐴𝐴 = {𝑎𝑎} Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase 𝐸𝐸 = {𝑒𝑒}
  • 39. Merging and Sparsification Phase For each candidate set 𝑪𝑪 Among possible candidate pairs Introduction Algorithms Experiments ConclusionProblem (A, B) (B, C) (C, D) (D, E) (A, C) (B, D) (C, E) (A, D) (B, E) (A, E) 𝐵𝐵 = {𝑏𝑏} 𝐶𝐶 = {𝑐𝑐} D = {𝑑𝑑} 𝐴𝐴 = {𝑎𝑎} Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase Sample log2 |𝑪𝑪| pairs𝐸𝐸 = {𝑒𝑒}
  • 40. Merging and Sparsification Phase Select the pair with the greatest (relative) reduction in the cost function 𝒊𝒊𝒊𝒊 reduction(C, D) > 𝜽𝜽: merge(C, D) 𝒆𝒆𝒆𝒆𝒆𝒆𝒆𝒆 sample log2 |𝑪𝑪| pairs again Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase (A, B) (A, D) (C, D)
  • 41. Merging and Sparsification Phase Select the pair with the greatest (relative) reduction in the cost function 𝒊𝒊𝒊𝒊 reduction(C, D) > 𝜽𝜽: merge(C, D) 𝒆𝒆𝒆𝒆𝒆𝒆𝒆𝒆 sample log2 |𝑪𝑪| pairs again Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase (A, B) (A, D) (C, D)
  • 42. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase {𝑎𝑎} {𝑏𝑏}𝐶𝐶 = {𝑐𝑐} 𝐷𝐷 = {𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 43. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 44. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} C = {𝑐𝑐, 𝑑𝑑}
  • 45. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} C = {𝑐𝑐, 𝑑𝑑}
  • 46. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase Sparsify or not according to total description cost {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} C = {𝑐𝑐, 𝑑𝑑}
  • 47. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase Sparsify or not according to total description cost {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} C = {𝑐𝑐, 𝑑𝑑}
  • 48. Merging and Sparsification Phase (cont.) Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase <<  Further sparsification phase Sparsify or not according to total description cost {𝑎𝑎} {𝑏𝑏} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} C = {𝑐𝑐, 𝑑𝑑}
  • 49. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 50. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 51. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 52. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {ℎ} {𝑔𝑔, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 53. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {ℎ} {𝑔𝑔, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 54. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {ℎ} {𝑔𝑔, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 55. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 56. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 57. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} Summary graph �𝑮𝑮 {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖}
  • 58. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} Summary graph �𝑮𝑮 {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} {𝑎𝑎, 𝑒𝑒}
  • 59. Repetition Summary graph �𝑮𝑮 Introduction Algorithms Experiments ConclusionProblem Summary graph �𝑮𝑮 Different candidate sets and decreasing threshold 𝜃𝜃 over iteration Procedure  Initialization phase  𝑡𝑡 = 1  While 𝒕𝒕++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} Summary graph �𝑮𝑮 {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔, ℎ, 𝑖𝑖} {𝑎𝑎} {𝑏𝑏} {𝑐𝑐, 𝑑𝑑} {𝑒𝑒} {𝑓𝑓} {𝑔𝑔} {ℎ} {𝑖𝑖} {𝑎𝑎, 𝑒𝑒}
  • 60. Introduction Algorithms Experiments ConclusionProblem Further Sparsification Phase Summary graph �𝑮𝑮 𝐴𝐴 𝐴𝐴 𝐵𝐵 𝐴𝐴 𝐹𝐹 𝐴𝐴 𝐺𝐺 𝐺𝐺 Superedges sorted by ∆𝑅𝑅𝐸𝐸𝑝𝑝 𝐶𝐶 𝐶𝐶 Size of �𝑮𝑮 in bits ≤ 𝑲𝑲 Procedure  Initialization phase  𝑡𝑡 = 1  While 𝑡𝑡++ ≤ 𝑻𝑻 and 𝑲𝑲 < size of �𝑮𝑮 in bits  Candidate generation phase  Merge and sparsification phase  Further sparsification phase << C = {𝑐𝑐, 𝑑𝑑} 𝐴𝐴 = {𝑎𝑎, 𝑒𝑒} 𝐵𝐵 = {𝑏𝑏} 𝐹𝐹 = {𝑓𝑓} 𝐺𝐺 = {𝑔𝑔, ℎ, 𝑖𝑖}
  • 61. Road Map • Introduction • Problem • Proposed Algorithm: SSumM • Experimental Results << • Conclusions
  • 62. • 10 datasets from 6 domains (up to 0.8B edges) • Three competitors for graph summarization ◦ k-Gs [LT10] ◦ S2L [RSB17] ◦ SAA-Gs [BAZK18] Introduction Algorithms Experiments ConclusionProblem Social Internet Email Co-purchase Collaboration Hyperlinks Experiments Settings
  • 63. Email-Enron Caida Ego-Facebook Web-UK-05 Web-UK-02 LiveJournal DBLP Amazon-0302Skitter k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample) Introduction Algorithms Experiments ConclusionProblem o.o.t. >12hours o.o.m. >64GB SSumM Gives Concise and Accurate Summary
  • 64. Email-Enron Caida Ego-Facebook Web-UK-05 Web-UK-02 LiveJournal DBLP Amazon-0302Skitter k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample) Introduction Algorithms Experiments ConclusionProblem o.o.t. >12hours o.o.m. >64GB SSumM Gives Concise and Accurate Summary
  • 65. Email-Enron Caida Ego-Facebook Web-UK-05 Web-UK-02 LiveJournal Amazon-0601 DBLPSkitter k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample) Introduction Algorithms Experiments ConclusionProblem SSumM is Fast
  • 66. Email-Enron Caida Ego-Facebook Web-UK-05 Web-UK-02 LiveJournal Amazon-0601 DBLPSkitter k-GsSSumM S2LSAA-Gs SAA-Gs (linear sample) Introduction Algorithms Experiments ConclusionProblem SSumM is Fast
  • 67. Introduction Algorithms Experiments ConclusionProblem SSumM is Scalable
  • 68. Introduction Algorithms Experiments ConclusionProblem SSumM Converges Fast
  • 69. Road Map • Introduction • Problem • Proposed Algorithm: SSumM • Experimental Results • Conclusions <<
  • 70. Code available at https://github.com/KyuhanLee/SSumM Concise & Accurate Fast Scalable Introduction Algorithms Experiments ConclusionProblem Practical Problem Formulation Extensive Experiments on 10 real world graphs Scalable and Effective Algorithm Design Conclusions
  • 71. SSumM : Sparse Summarization of Massive Graphs Kyuhan Lee* Hyeonsoo Jo* Jihoon Ko Sungsu Lim Kijung Shin