1. Boosting Vertex-Cut Partitioning for Streaming Graphs
Hooman Peiro Sajjad*, Amir H. Payberah†, Fatemeh Rahimian†, Vladimir Vlassov*, Seif Haridi†
* KTH Royal Institute of Technology † SICS Swedish ICT
5th IEEE International Congress on Big Data
10. A Good Vertex-Cut Partitioning
• Low replication factor
• Balanced partitions with respect to the number of edges
11. Streaming Graph Partitioning
• Graph elements are assigned to partitions as they are being streamed
• No global knowledge
[Diagram: streaming edges → Partitioner → partitions P1, P2, …, Pp]
16. State-of-the-Art Partitioners
• Centralized partitioner: slow partitioning time, low replication factor
• Single-threaded partitioner
• Multi-threaded partitioner: each thread partitions a subset of the graph and shares the state information
• Distributed partitioner: fast partitioning time, high replication factor
• Oblivious partitioners: several independent partitioners
19. Partitioning Time vs. Partition Quality
• Centralized partitioner: slow partitioning time, low replication factor
• Distributed partitioner: fast partitioning time, high replication factor
• HoVerCut: aims to combine fast partitioning time with a low replication factor
23. Architecture: Input
• Input graphs are streamed by their edges
• Each subpartitioner receives an exclusive subset of the edges
[Diagram: the edge stream is split across subpartitioners 1…n; each has a tumbling window, a partitioning policy, a core, and a local state, and updates the shared state asynchronously]
24. Architecture: Configurable Window
• Subpartitioners collect incoming edges in a tumbling window of a configurable size
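The tumbling-window step above can be sketched in a few lines. This is a minimal illustration, not the paper's code; `tumbling_windows` is a hypothetical helper name.

```python
# Sketch of a tumbling window over an edge stream: edges are buffered
# until the window is full, then the whole batch is emitted at once.
def tumbling_windows(edge_stream, window_size):
    """Yield consecutive, non-overlapping batches of edges."""
    window = []
    for edge in edge_stream:
        window.append(edge)
        if len(window) == window_size:
            yield window
            window = []
    if window:  # flush the last, possibly smaller, window
        yield window

edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
batches = list(tumbling_windows(edges, window_size=2))
# batches == [[(1, 2), (2, 3)], [(3, 4), (4, 5)], [(5, 6)]]
```

Because windows never overlap, each edge is processed exactly once by exactly one subpartitioner.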
25. Architecture: Partitioning Policy
• Each subpartitioner assigns the edges to the partitions based on a given policy
26. Architecture: Local State
• Each subpartitioner has a local state, which includes information about the edges processed locally:
• partial degree of each vertex
• partitions of each vertex
• number of edges in each partition
28. Architecture: Shared State
• The shared state is the global state accessible by all subpartitioners, via getState and putState

Vertex Table:
ID   Partial degree   Partitions
v1   12               p1
v2   50               p1, p2

Partition Table:
ID   Num. of edges
p1   5000
p2   6500
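The two shared tables can be modelled as dictionaries with delta-based updates. This is a minimal sketch under assumed names (`get_state`, `put_state`), not the paper's API.

```python
# Sketch of the shared state: a vertex table and a partition table that
# subpartitioners read in bulk and update by sending deltas.
class SharedState:
    def __init__(self, num_partitions):
        self.vertex_table = {}  # vid -> {"degree": int, "partitions": set}
        self.partition_table = {p: 0 for p in range(num_partitions)}  # pid -> edge count

    def get_state(self, vids):
        """Return a copy of the vertex subtable for vids plus the partition table."""
        vt = {}
        for v in vids:
            entry = self.vertex_table.get(v, {"degree": 0, "partitions": set()})
            # copy so local updates do not leak into the shared state
            vt[v] = {"degree": entry["degree"], "partitions": set(entry["partitions"])}
        return vt, dict(self.partition_table)

    def put_state(self, vt_delta, pt_delta):
        """Merge deltas: degree increments, newly used partitions, edge counts."""
        for v, d in vt_delta.items():
            entry = self.vertex_table.setdefault(v, {"degree": 0, "partitions": set()})
            entry["degree"] += d["degree"]
            entry["partitions"] |= d["partitions"]
        for p, n in pt_delta.items():
            self.partition_table[p] += n

state = SharedState(num_partitions=2)
state.put_state({"v1": {"degree": 4, "partitions": {0}},
                 "v2": {"degree": 2, "partitions": {1}}},
                {0: 3, 1: 1})
# state.vertex_table["v1"] == {"degree": 4, "partitions": {0}}
# state.partition_table == {0: 3, 1: 1}
```

Sending deltas rather than full tables keeps the asynchronous traffic between subpartitioners and the shared state proportional to the window, not to the whole graph.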
29. Architecture: Core
• The core is HoVerCut’s main algorithm, parametrised with the partitioning policy and the window size
33. Vertex-Cut Partitioning Heuristics
• For an edge with end-vertices u and v, and for every partition p, choose the partition that maximizes the score:
Score = ReplicationScore + LoadBalanceScore
• State-of-the-art heuristics: Greedy, HDRF
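The score rule can be made concrete in the style of HDRF. The sketch below follows the published HDRF formulation; the balance parameters `lam` and `eps` are assumptions, not values from these slides, and real implementations differ in details.

```python
# HDRF-style Score = ReplicationScore + LoadBalanceScore for edge (u, v)
# and candidate partition p, given vertex table vt and partition table pt.
def hdrf_score(u, v, p, vt, pt, lam=1.0, eps=1.0):
    du, dv = vt[u]["degree"], vt[v]["degree"]
    theta_u = du / (du + dv)      # normalised degree of u
    theta_v = 1.0 - theta_u

    def g(vertex, theta):
        # reward partitions that already hold a replica, favouring the
        # lower-degree vertex (so the high-degree one gets replicated)
        return 1.0 + (1.0 - theta) if p in vt[vertex]["partitions"] else 0.0

    replication = g(u, theta_u) + g(v, theta_v)
    max_size, min_size = max(pt.values()), min(pt.values())
    balance = lam * (max_size - pt[p]) / (eps + max_size - min_size)
    return replication + balance

vt = {"u": {"degree": 5, "partitions": {0}},
      "v": {"degree": 1, "partitions": set()}}
pt = {0: 10, 1: 10}
best = max(pt, key=lambda p: hdrf_score("u", "v", p, vt, pt))
# partition 0 already holds a replica of u, so best == 0
```

With equal loads the balance term is zero for every partition, so the replication term decides; the partition that already holds a replica wins.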
40. Greedy vs. HDRF
• Greedy: places end-vertices u and v of an edge in a partition that already has a replica of u or v
• HDRF (High-Degree Replicated First): replicates the higher-degree end-vertex
[Diagram: example placements of an edge (u, v) across partitions P1 and P2 under Greedy and under HDRF]
45. Partitioning a Window of Edges
vids: the set of vertex ids in the current window
edges: the set of edges in the current window
pt = get the partition table
vt = get the vertex table restricted to vids
for each e ∊ edges:
    u = e.src, v = e.dst
    increment vt(u).degree and vt(v).degree
    select p based on vt(u), vt(v), and pt, according to the partitioning policy
    add p to vt(u).partitions and vt(v).partitions
    increment pt(p).size
end
update the shared state by sending vt and pt represented as deltas, e.g.:

vt delta:                     pt delta:
ID   Degree   Partitions      ID   Size
v1   +4       +p1             p1   +3
v2   +2       +p2             p2   +1
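The per-window pseudocode above can be sketched as a runnable function. Names (`partition_window`, `least_loaded`) and the toy policy are assumptions for illustration, not the paper's implementation.

```python
# Executable sketch of the per-window algorithm: fetch state for the
# window's vertices, place each edge with the policy, and record the
# updates as deltas destined for the shared state.
def partition_window(edges, get_state, policy):
    vids = {v for e in edges for v in e}
    vt, pt = get_state(vids)          # vertex subtable + partition table
    vt_delta = {v: {"degree": 0, "partitions": set()} for v in vids}
    pt_delta = {p: 0 for p in pt}
    for u, v in edges:
        vt[u]["degree"] += 1; vt_delta[u]["degree"] += 1
        vt[v]["degree"] += 1; vt_delta[v]["degree"] += 1
        p = policy(u, v, vt, pt)      # e.g. a greedy or HDRF score
        for x in (u, v):
            vt[x]["partitions"].add(p)
            vt_delta[x]["partitions"].add(p)
        pt[p] += 1
        pt_delta[p] += 1
    return vt_delta, pt_delta         # sent asynchronously to the shared state

# toy policy and state for a demonstration run
def least_loaded(u, v, vt, pt):
    return min(pt, key=pt.get)        # always pick the least-loaded partition

def get_state(vids):
    return {v: {"degree": 0, "partitions": set()} for v in vids}, {0: 0, 1: 0}

vt_d, pt_d = partition_window([("a", "b"), ("b", "c")], get_state, least_loaded)
# pt_d == {0: 1, 1: 1}; vt_d["b"]["degree"] == 2
```

Note that the policy sees the locally updated `vt` and `pt` within the window, so later edges in the same batch benefit from earlier placements before any round-trip to the shared state.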
47. Datasets

Dataset                            |V|     |E|
Autonomous systems (AS)            1.7M    11M
Pokec social network (PSN)         1.6M    22M
LiveJournal social network (LSN)   4.8M    48M
Orkut social network (OSN)         3.1M    117M

Partitions: 16
48. Evaluation Metrics
• Replication Factor (RF): the average number of replicas per vertex
• Load Relative Standard Deviation (LRSD): the relative standard deviation of the number of edges in each partition (LRSD = 0 indicates equal-size partitions)
• Partitioning time: the time it takes to partition a graph
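The two quality metrics are simple to compute from the final state. A minimal sketch, with illustrative values rather than the paper's results:

```python
# Replication factor: average number of replicas per vertex.
# LRSD: relative standard deviation of the per-partition edge counts.
from statistics import mean, pstdev

def replication_factor(vertex_partitions):
    """vertex_partitions: vid -> set of partitions holding a replica."""
    return sum(len(ps) for ps in vertex_partitions.values()) / len(vertex_partitions)

def lrsd(partition_sizes):
    """0 means all partitions hold the same number of edges."""
    return pstdev(partition_sizes) / mean(partition_sizes)

rf = replication_factor({"v1": {0}, "v2": {0, 1}})   # (1 + 2) / 2 == 1.5
balance = lrsd([5000, 5000])                         # equal sizes -> 0.0
```

A perfect vertex-cut partitioning would have RF = 1 (no vertex replicated) and LRSD = 0, so lower is better for both.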
53. Conclusion
• We presented HoVerCut, a parallel and distributed partitioner
• HoVerCut can employ different partitioning policies in a scalable fashion
• HoVerCut scales to partition larger graphs without degrading the quality of the partitions