1. Introduction Preliminaries MapReduce GPAbstraction Experiments
Distributed Algorithms for k-Truss
Decomposition
Pei-Ling Chen (1)   Ming-Syan Chen (2)
(1,2) Department of Electrical Engineering, National Taiwan University
(2) Research Center of Information Technology Innovation, Academia Sinica
July 17, 2014
Distributed Algorithms for k-Truss Decomposition July 17, 2014 1 / 37
Outline
1 Introduction
2 Preliminaries
3 Distributed k-Truss Decomposition in MapReduce Framework
4 Distributed k-Truss Decomposition in Graph Parallel Abstractions
5 Experimental Analysis
Outline
1 Introduction
  Motivation
  Related Work and Our Contribution
2 Preliminaries
3 Distributed k-Truss Decomposition in MapReduce Framework
4 Distributed k-Truss Decomposition in Graph Parallel Abstractions
5 Experimental Analysis
Motivation
1 k-truss is one of the graph measures for describing the characteristics of a vertex or capturing the structure of a network;
2 Graph measures have several applications, such as marketing and group formation;
3 With the emergence of large online networks, e.g., Facebook, computing graph measures on a single machine becomes difficult due to long running time and limited memory;
4 Designing algorithms based on cloud computing is therefore an important issue.
Related Work and Our Contribution
For k-truss in large graphs:
• Wang and Cheng [8] propose I/O-efficient algorithms for k-truss decomposition. They break a graph into several partitions and use a sequential processing method to cope with the limited memory of a single machine;
• A heuristic distributed k-truss decomposition in the MapReduce framework is mentioned in [1] without experiments; since MapReduce is not designed for iterative algorithms, that algorithm suffers from I/O waiting time between MapReduce jobs.
We adopt the most recent graph computing model, graph parallel abstractions, and provide a rigorous theoretical basis to propose an efficient and scalable k-truss decomposition algorithm.
Outline
1 Introduction
2 Preliminaries
  Definition
  Traditional k-Truss Decomposition
3 Distributed k-Truss Decomposition in MapReduce Framework
4 Distributed k-Truss Decomposition in Graph Parallel Abstractions
5 Experimental Analysis
Definition
(Figure: an example graph on vertices A–F; the highlighted edge has sup = 2.)
Definition (Support)
The support of an edge e = (u, v) ∈ EG, denoted by sup(e, G), is defined as |nb(u) ∩ nb(v)|, where nb(u) and nb(v) are the sets of neighbors of u and v, respectively. When G is obvious from context, we write sup(e) for sup(e, G).
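The support computation can be sketched directly in Python (a minimal illustration on a toy graph of my own, not the authors' code):

```python
# Toy example (edge list is illustrative): compute
# sup(e) = |nb(u) ∩ nb(v)| for every edge of an undirected graph.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]

# Build the neighbor sets nb(v).
nb = {}
for u, v in edges:
    nb.setdefault(u, set()).add(v)
    nb.setdefault(v, set()).add(u)

def support(u, v):
    """sup((u, v)) = number of common neighbors of u and v."""
    return len(nb[u] & nb[v])

for u, v in edges:
    print((u, v), support(u, v))  # e.g., sup(B, C) = 2: triangles ABC and BCD
```

Note that sup(e) is exactly the number of triangles containing e, which is why it reappears throughout the decomposition.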
Definition
(Figure: the example graph on vertices A–F again; this is a 4-truss.)
Definition (k-Truss)
A k-truss Rk of G, where k ≥ 2, is defined as a connected subgraph such that sup(e, Rk) ≥ k − 2 for every edge e of Rk.
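The definition can be checked directly; a small sketch (my own, with connectivity assumed and to be verified separately):

```python
def is_k_truss(edges, k):
    """True iff every edge of the (assumed connected) subgraph given by
    `edges` has support >= k - 2 within that subgraph."""
    nb = {}
    for u, v in edges:
        nb.setdefault(u, set()).add(v)
        nb.setdefault(v, set()).add(u)
    return all(len(nb[u] & nb[v]) >= k - 2 for u, v in edges)

# K4 (complete graph on 4 vertices): every edge lies in 2 triangles,
# so K4 is a 4-truss but not a 5-truss.
k4 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(is_k_truss(k4, 4), is_k_truss(k4, 5))  # True False
```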
Definition
We further define Tk as the union of all k-trusses, that is, Tk = ∪i R^i_k, where R^i_k is the i-th k-truss in G.
(Figure: an example graph on vertices A–J, each edge labeled with its trussness (3, 4, or 5); the subgraphs T5, T4, and T3 are highlighted in turn.)
Definition (Trussness)
The trussness of an edge e in G, denoted by φ(e) = k, is the maximal k such that e is contained in ETk.
Traditional k-Truss Decomposition
1 Start from the graph to be processed by k-truss decomposition.
2 Compute the support of each edge.
3 For k = 4, remove the edges with support < 4 − 2 = 2, then update the supports of the remaining edges.
4 For k = 5, remove the edges with support < 5 − 2 = 3.
5 The final result.
(Figure: an example graph on vertices A–E with edge supports labeled; edges are peeled step by step, and the final result labels each surviving edge with φ = 3 or φ = 4.)
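The peeling procedure above can be written as a sequential Python baseline (my own sketch of the standard batch algorithm, not the authors' implementation):

```python
def truss_decomposition(edges):
    """Return {edge: phi(edge)} by iteratively removing, for increasing k,
    every edge whose support falls below k - 2."""
    nb = {}
    for u, v in edges:
        nb.setdefault(u, set()).add(v)
        nb.setdefault(v, set()).add(u)
    alive = {tuple(sorted(e)) for e in edges}
    phi = {}
    k = 3
    while alive:
        while True:  # repeat until no edge fails the level-k threshold
            weak = [(u, v) for u, v in alive if len(nb[u] & nb[v]) < k - 2]
            if not weak:
                break
            for u, v in weak:
                alive.discard((u, v))
                nb[u].discard(v)
                nb[v].discard(u)
                phi[(u, v)] = k - 1  # removed at level k => trussness k - 1
        k += 1
    return phi

# K4 plus a pendant edge: the pendant edge has phi = 2, the K4 edges phi = 4.
g = [("A", "B"), ("A", "C"), ("A", "D"),
     ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(truss_decomposition(g))
```

The inner loop handles cascades: removing an edge can push a neighboring edge's support below the threshold at the same level k.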
MRTruss
A heuristic algorithm for distributed k-truss decomposition under the MapReduce framework is proposed in [1]. We abbreviate this method as MRTruss.
1 For each pair of edges sharing a common vertex, i.e., an open triad, generate a record whose value is the triad and whose key is the potential closure: the edge that would close this triad into a triangle (whether this edge exists is unknown in this task);
2 Check whether the closure specified in each key exists, and output the existing triangles;
3 Count sup(e) for each edge e, and delete the edges with the smallest support.
The procedure of MRTruss follows the traditional batch k-truss decomposition.
MRTruss
However, MRTruss has several problems.
1 The main issue is that the edge-triangle relationships of the input graph are not preserved across iterations, so they must be recomputed.
2 The three jobs required in each iteration cause many unnecessary disk I/O operations.
3 It produces too many intermediate outputs.
Therefore, we propose an improved version as follows.
i-MRTruss
Algorithm 1 i-MRTruss
Input: G = (V, E)
Output: records with (e, φ(e))
1: run Procedure 1: Triangle Finding
2: t ← 2
3: repeat
4:   run Procedure 2: Trussness Counting
5: until ∀e ∈ E, sup(e) + 2 ≥ t
6: t ← t + 1
7: goto step 4
1 We sacrifice memory usage to condense the three tasks of MRTruss into one, which decreases the number of disk I/O operations and speeds up the running time.
i-MRTruss
Triangle Finding
(Figure: Triangle Finding on the example graph A–D. The map phase emits each edge under its endpoints; reduce phase 1 produces the adjacency lists A: B D, B: A C D, D: A B C; reduce phase 2 outputs one record per edge, e.g., AB 3 AD BD and AD 3 AB BD.)
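A toy, in-memory analogue of Triangle Finding (the real procedure runs as MapReduce phases; this sketch only mimics their outputs, and all names are illustrative):

```python
from collections import defaultdict

def find_triangle_edges(edges):
    """For each edge, list the edges that form triangles with it,
    mirroring the output of the second reduce phase."""
    adj = defaultdict(set)        # reduce phase 1: adjacency lists
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    out = {}                      # reduce phase 2: triangle edges per edge
    for u, v in edges:
        out[(u, v)] = sorted(
            tuple(sorted(p)) for w in adj[u] & adj[v] for p in ((u, w), (v, w))
        )
    return out

g = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
print(find_triangle_edges(g)[("A", "B")])  # AD and BD close a triangle with AB
```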
i-MRTruss
Trussness Counting
(Figure: one round of Trussness Counting with c = 4 on the example graph A–D. Records have the form ⟨edge, count, triangle edges⟩: the map phase processes BD 4 AB AD BC CD, and the reduce phase outputs BD 3 AB AD BC CD together with AB 3 AD BD, AD 3 AB BD, BC 3 CD BD, and CD 3 BC BD.)
Outline
1 Introduction
2 Preliminaries
3 Distributed k-Truss Decomposition in MapReduce Framework
4 Distributed k-Truss Decomposition in Graph Parallel Abstractions
  Graph Parallel Abstractions
  Definitions and Theorems
  Algorithm and Illustrated Example
5 Experimental Analysis
Graph Parallel Abstractions
• A graph-parallel abstraction comprises a graph and a vertex-program executed in parallel on every vertex of the graph.
• A vertex-program can interact with the neighbors of its vertex.
• Pregel [5] and GraphLab [4] are two well-known graph parallel abstractions.
(Figure: two adjacent vertices A and B, each running its own vertex-program Compute{· · · }.)
Graph Parallel Abstractions
• Pregel is a well-known abstraction based on the BSP model, in which a vertex-program passes messages to its neighbors in a sequence of supersteps. Barrier synchronization separates the supersteps and ensures consistency. Both Apache Hama [6] and Apache Giraph are open source counterparts to Pregel.
(Figure: two vertex-programs exchanging messages; an active vertex votes to halt and becomes inactive, and becomes active again when a message is received.)
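The BSP execution model can be simulated compactly (illustrative only; Pregel's actual API differs): active vertices run a compute function each superstep, vote to halt, and are reactivated by incoming messages.

```python
from collections import defaultdict

def run_bsp(vertices, compute, max_supersteps=30):
    """Synchronous superstep loop: the barrier is implicit in
    processing one superstep at a time."""
    inbox = defaultdict(list)
    active = set(vertices)
    for _ in range(max_supersteps):
        if not active:
            break
        outbox = defaultdict(list)
        for v in list(active):
            if compute(v, inbox[v], outbox):  # True = vote to halt
                active.discard(v)
        inbox = outbox
        active |= set(outbox)  # a received message reactivates a vertex

# Example vertex-program: propagate the maximum initial value.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
value = {"A": 1, "B": 5, "C": 2}
first = {v: True for v in graph}

def compute(v, messages, outbox):
    old = value[v]
    if messages:
        value[v] = max(value[v], max(messages))
    if value[v] != old or first[v]:  # changed (or first superstep): notify
        first[v] = False
        for w in graph[v]:
            outbox[w].append(value[v])
    return True  # always vote to halt; a message will wake us up

run_bsp(graph, compute)
print(value)  # {'A': 5, 'B': 5, 'C': 5}
```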
Definitions and Theorems
We want to solve k-truss decomposition from a new perspective:
• The trussness φ(e) of an edge e ∈ EG can be decided by the trussnesses of a subset of edges in the graph G.
This idea provides a computation logic different from that of the traditional batch algorithms.
Therefore, we first derive a theorem proving a locality property of k-truss, which determines a sufficient range of edges in the graph for computing the trussness φ(e) of an edge e.
Definitions and Theorems
(Figure: an example graph on vertices A–D; every edge, including BD, has trussness φ = 3.)
1 Since φ(BD) = 3, we can find at least 2(3 − 2) = 2 edges forming 1 triangle with BD, both with φ = 3 ≥ 3, but no 2(4 − 2) = 4 edges forming 2 triangles with it have φ ≥ 4.
Theorem (Locality)
∀e ∈ EG: φ(e) = k if and only if
1 there exists a subset Ek ⊆ enb(e) such that |Ek| = 2(k − 2), the edges in Ek form a total of (k − 2) triangles with e, and φ(e′) ≥ k for each edge e′ ∈ Ek;
2 there is no subset Ek+1 ⊆ enb(e) such that |Ek+1| = 2(k − 1), the edges in Ek+1 form a total of (k − 1) triangles with e, and φ(e′) ≥ k + 1 for each edge e′ ∈ Ek+1.
Definitions and Theorems
2 If φ(BD) = k is unknown, since sup(BD) = 2, let us start by assuming k = 4.
Definitions and Theorems
3 Then we would need at least 2(4 − 2) = 4 edges forming 2 triangles with BD, but all of these edges have φ = 3 < 4, so φ(BD) ≠ 4.
Definitions and Theorems
4 Assuming k = 3, there are at least 2(3 − 2) = 2 edges forming 1 triangle with BD, and both of these edges have φ = 3 ≥ 3, so φ(BD) = 3.
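The feasibility check carried out in steps 2–4 can be written down directly (my own encoding of the condition, not the paper's code): level k is feasible for an edge e when at least k − 2 of its triangles have both side edges at trussness ≥ k.

```python
def supports_level(triangles, k):
    """triangles: one (phi(e1), phi(e2)) pair per triangle containing e.
    Return True iff at least k - 2 triangles have both sides at >= k."""
    good = sum(1 for p1, p2 in triangles if p1 >= k and p2 >= k)
    return good >= k - 2

# Edge BD from the example lies in two triangles (ABD and BCD), and every
# side edge has trussness 3.
tris_bd = [(3, 3), (3, 3)]
print(supports_level(tris_bd, 4))  # False: phi(BD) != 4
print(supports_level(tris_bd, 3))  # True:  phi(BD) = 3
```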
Definitions and Theorems
(Figure: a graph G on vertices A–E and its line graph L(G), whose vertices are the edges AB, BC, CD, DE, AE, AC of G.)
Definition (Line Graph)
Given a simple graph G = (VG, EG), its line graph L(G) = (VL(G), EL(G)) is a graph where each vertex v ∈ VL(G) represents an edge e ∈ EG (a one-to-one correspondence), and two vertices in VL(G) are adjacent if and only if their corresponding edges in EG share a common endpoint.
Definitions and Theorems
(Figure: the graph G, its line graph L(G), and its pruned line graph PL(G), all on the vertices AB, BC, CD, DE, AE, AC.)
Definition (Pruned Line Graph)
Given a simple graph G = (VG, EG) and its line graph L(G) = (VL(G), EL(G)), the pruned line graph of G is PL(G) = (VPL(G), EPL(G)), where VPL(G) is the same as VL(G), but EPL(G) is reduced by the constraint: two vertices in VPL(G) are adjacent if and only if their corresponding edges e1, e2 ∈ EG form a triangle with a third edge of EG.
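A sketch of constructing PL(G) (identifiers are mine): enumerate the triangles of G and connect the three corresponding line-graph vertices, so edges that share only an endpoint, such as AB and BC below, stay non-adjacent.

```python
from itertools import combinations

def pruned_line_graph(edges):
    """PL(G) as an adjacency map over the (sorted) edges of G."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    pl = {tuple(sorted(e)): set() for e in edges}
    # Connect the three edges of every triangle of G.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            tri = [tuple(sorted(p)) for p in ((u, v), (u, w), (v, w))]
            for e1, e2 in combinations(tri, 2):
                pl[e1].add(e2)
                pl[e2].add(e1)
    return pl

g = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
pl = pruned_line_graph(g)
print(pl[("A", "B")])  # adjacent to AD and BD only; BC is pruned away
```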
Algorithm and Illustrated Example
Trussness-Parallel Computing
1 The pruned line graph PL(G) is constructed by Trussness-Parallel Computing from the output of Triangle Finding on graph G (dashed lines represent edges pruned from the original line graph).
(Figure: G on vertices A–D and PL(G) with vertices AB, 3; BC, 3; CD, 3; AD, 3; BD, 4.)
Algorithm and Illustrated Example
Trussness-Parallel Computing
2 The first superstep in Trussness-Parallel Computing: every vertex of PL(G) sends msg = (ID, potential trussness) to its neighbors.
(Figure: the vertices of PL(G) exchanging messages; AB's list starts with AB : 3 and BD's list with BD : 4.)
Algorithm and Illustrated Example
3 The second superstep in Trussness-Parallel Computing: each vertex merges the received msg = (ID, potential trussness) pairs into its list, groups them in a table M by the third vertex of each triangle, and counts how many triangles support each level φ ≥ k.
AB's list: AB : 3, AD : 3, BD : 4; AB's table M: D : (A, 3) (B, 4); AB's counter: φ ≥ 2 : 1, φ ≥ 3 : 1.
BD's list: BD : 4, AB : 3, BC : 3, CD : 3, AD : 3; BD's table M: A : (B, 3) (D, 3), C : (B, 3) (D, 3); BD's counter: φ ≥ 2 : 2, φ ≥ 3 : 2, φ ≥ 4 : 0.
Since no triangle supports φ ≥ 4, BD lowers its potential trussness from 4 to 3 and notifies its neighbors.
Algorithm and Illustrated Example
4 The third superstep in Trussness-Parallel Computing: the neighbors of BD update their lists and tables with BD's new potential trussness 3; e.g., AB's list now holds AB : 3, BD : 3, and AB's table M entry for D becomes (A, 3) (B, 3).
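The supersteps above can be simulated compactly as a synchronous fixpoint (a sketch of the idea, not the actual vertex-program): every edge starts at sup(e) + 2 and repeatedly lowers its estimate to the highest level its triangles still support.

```python
def trussness_parallel(edges, triangles_of):
    """triangles_of[e]: list of (e1, e2) edge pairs forming a triangle with e."""
    phi = {e: len(triangles_of[e]) + 2 for e in edges}  # start at sup(e) + 2
    changed = True
    while changed:
        changed = False
        new = {}
        for e in edges:
            k = phi[e]
            # Lower k until at least k - 2 triangles have both sides at >= k.
            while k > 2:
                good = sum(1 for e1, e2 in triangles_of[e]
                           if phi[e1] >= k and phi[e2] >= k)
                if good >= k - 2:
                    break
                k -= 1
            new[e] = k
            changed |= (k != phi[e])
        phi = new
    return phi

# The running example: BD starts at 4 and settles at 3 after one round.
tri = {
    ("A", "B"): [(("A", "D"), ("B", "D"))],
    ("A", "D"): [(("A", "B"), ("B", "D"))],
    ("B", "C"): [(("B", "D"), ("C", "D"))],
    ("C", "D"): [(("B", "C"), ("B", "D"))],
    ("B", "D"): [(("A", "B"), ("A", "D")), (("B", "C"), ("C", "D"))],
}
print(trussness_parallel(list(tri), tri))  # every edge ends with phi = 3
```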
Outline
1 Introduction
2 Preliminaries
3 Distributed k-Truss Decomposition in MapReduce Framework
4 Distributed k-Truss Decomposition in Graph Parallel Abstractions
5 Experimental Analysis
  Synthetic Data
  Real Data
  Conclusion
Synthetic Data
Table: Statistics of Synthetic Graph Datasets
Scale  i   V         E         dmax  davg      supmax  supavg  kmax
10^3   10  1024      1368      44    2.6718    13      0.147   4
10^4   14  16384     25514     110   3.1145    11      0.052   4
10^5   18  262144    465679    363   3.5528    6       0.017   4
10^6   20  1048576   1986937   648   3.7898    7       0.009   4
10^7   24  16777216  36146725  2164  4.308711  29      0.003   4
1 These five datasets share the same Kronecker matrix setting: {0.999 0.327; 0.348 0.391}.
2 This ensures similar properties across the datasets at different scales.
3 All graphs are simple graphs (undirected, unweighted, no self loops or multiple edges).
Synthetic Data
(Figure: running time (10^3 sec.) versus node-number scale, 10^3 to 10^7, for GPTruss, i-MRTruss, and MRTruss.)
1 The running time of all three methods grows as the scale of the dataset increases.
2 The running time of GPTruss is always about half that of i-MRTruss.
3 The running time of MRTruss grows significantly when the scale reaches 10^7.
Synthetic Data
(Figure: number of jobs and number of iterations versus node-number scale for GPTruss, i-MRTruss, and MRTruss.)
1 Since the range of k for k-trusses in these datasets is narrow, the numbers of iterations required by the three methods differ little.
2 However, the numbers of required jobs differ greatly among the three methods.
Real Data
Table: Statistics of Real World Network Datasets
Name         V        E        dmax   davg   supmax  supavg  kmax
com-Youtube  1134890  2987624  28754  5.265  4034    3.069   19
loc-Gowalla  196591   950327   14730  9.668  1297    7.176   29
roadNet-TX   1379917  1921660  12     2.785  3       0.129   4
com-DBLP     317080   1049866  343    6.622  312     6.356   114
1 Among these four datasets, com-Youtube, loc-Gowalla, and com-DBLP are dense, with high average degree.
2 Datasets like roadNet-TX are considered large, with over one million vertices.
3 Since roadNet-TX has davg < 3, it is viewed as a sparse dataset.
4 All datasets are preprocessed into simple graphs (undirected, unweighted, no self loops or multiple edges).
Real Data
(Figure: running time (sec.) versus number of reducers (4–12) for GPTruss and i-MRTruss on com-Youtube, loc-Gowalla, com-DBLP, and roadNet-TX.)
1 For i-MRTruss, since the number of iterations cannot be decreased by using more reducers, the running time is eventually bounded by the time consumed by disk I/O operations.
2 GPTruss is shown to be much more efficient than i-MRTruss on this kind of large, dense dataset.
3 For a sparse dataset like roadNet-TX, the performance difference is relatively smaller, but the running time of GPTruss is still less than half that of i-MRTruss.
Real Data
(Figure: running time (sec.) versus number of reducers (4–12) for the Trussness-Parallel Computing and Triangle Finding phases of GPTruss on com-Youtube, loc-Gowalla, com-DBLP, and roadNet-TX.)
1 The running times of the two phases are roughly similar on loc-Gowalla, com-DBLP, and roadNet-TX, which have similar numbers of edges.
2 On com-Youtube, the running time of Triangle Finding is much longer than that of Trussness-Parallel Computing.
3 These results indicate that when a graph has a large number of edges, the running time of Triangle Finding dominates the performance of GPTruss.
Real Data
(Figure: memory usage (MB) of GPTruss and i-MRTruss on Youtube, Gowalla, DBLP, and RoadNet.)
1 All methods are tested using 4 slaves.
2 The memory usage of i-MRTruss is the average memory used by one job (iteration) on a single machine.
3 The memory usage of GPTruss is the average memory used on a single machine.
4 Since GPTruss needs a line graph transformation, the line graph of the original dataset becomes larger, with a new average degree roughly equal to 2 × davg − 2. Therefore, its memory usage is higher than that of i-MRTruss.
Real Data
(Figure: disk usage (MB) and number of iterations for GPTruss and i-MRTruss on Youtube, Gowalla, DBLP, and roadNet.)
1 For i-MRTruss, the disk usage is roughly correlated with the number of iterations the dataset needs and the dataset size.
2 For GPTruss, since only 2 jobs (disk I/O rounds) are needed in total, the disk usage is always lower than that of i-MRTruss.
3 For the datasets with dense vertices and edges, the difference in iteration numbers between the two methods is much more pronounced.
Conclusion
• We provide an improved MapReduce version, i-MRTruss, based on an existing distributed k-truss decomposition;
• We prove the locality property of k-truss and design a distributed k-truss decomposition based on this property under graph-parallel abstractions, which efficiently improves the performance;
• As future work, it is worth studying how to process the pruned line graph efficiently when a graph has many edges, as pointed out in the experimental analysis of GPTruss.
References
[1] Jonathan Cohen. Graph twiddling in a MapReduce world. Computing in Science & Engineering, 11(4):29–41, 2009.
[2] Jonathan D. Cohen. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report, 2008.
[3] Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.
[4] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In VLDB, 2012.
[5] Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010.
[6] Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, and Seungryoul Maeng. HAMA: An efficient matrix computation with the MapReduce framework. In CloudCom, 2010.
[7] Johan Ugander, Lars Backstrom, Cameron Marlow, and Jon Kleinberg. Structural diversity in social contagion. In PNAS, 2012.
[8] Jia Wang and James Cheng. Truss decomposition in massive networks. In VLDB, 2012.
[9] De-Nian Yang, Yi-Ling Chen, Wang-Chien Lee, and Ming-Syan Chen. On social-temporal group query with acquaintance constraint. In VLDB, 2011.
[10] De-Nian Yang, Chih-Ya Shen, Wang-Chien Lee, and Ming-Syan Chen. On socio-spatial group query for location-based social networks. In SIGKDD, 2012.