SlideShare a Scribd company logo
HDRF: Stream-Based Partitioning
for Power-Law Graphs.
travel grant from
Fabio Petroni, L. Querzoni, K. Daudjee, S. Kamali, G. Iacoboni
Graphs
I large amount of real data can be represented as a graph.
Vertices
Edges
G(V,E)
I the problem of optimally partitioning a graph while
maintaining load balance is important in several contexts.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 2 of 22
Distributed Graph Computing Frameworks
I support e cient computation on really large graphs.
B graph analytics; data mining and machine learning tasks.
I input data partitioning can have a significant impact on the
performance of the graph computation.
B a↵ects network usage, memory occupation, synchronization.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 3 of 22
Balanced Graph Partitioning
vertex-cutedge-cut
I partition G into smaller components of (ideally) equal size
edge-cut
vertex disjoint partitions
vertex-cut
edge disjoint partitions
I a vertex can be cut in multiple ways and span several
partitions while a cut edge connects only two partitions.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
Balanced Graph Partitioning
vertex-cutedge-cut
I partition G into smaller components of (ideally) equal size
edge-cut
vertex disjoint partitions
vertex-cut
edge disjoint partitions
I a vertex can be cut in multiple ways and span several
partitions while a cut edge connects only two partitions.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
Vertex-Cut
computation steps are
associated with edges
v-cut perform better on
power law graphs
create less storage and
network overhead
(Gonzalez et al., 2012)
I modern distributed graph computing
frameworks use vertex-cut.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 5 of 22
Power-Law Graphs
I characteristic of real graphs: power-law degree distribution.
B most vertices have few connections while a few have many.
10
0
101
10
2
10
3
10
4
10
5
10
6
10
0
10
1
10
2
10
3
10
4
10
5
numberofvertices
degree
crawl
Twitter
connection
2010
log-log scale
I the probability that a vertex has degree d is P(d) / d ↵
.
I ↵ controls the “skewness” of the degree distribution.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 6 of 22
Balanced Vertex-Cut Graph Partitioning
I v 2 V vertex; e 2 E edge; p 2 P partition.
I A(v) set of partitions where vertex v is replicated.
I 1 tolerance to load imbalance.
I the size |p| of partition p is its edge cardinality.
minimize replicas
reduce (1) bandwidth, (2) memory
usage and (3) synchronization
balance the load
efficient usage of available
computing resources
min
1
|V |
X
v2V
|A(v)| s.t. max
p2P
|p| <
|E|
|P|
I the objective function is the replication factor (RF).
B average number of replicas per vertex.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 7 of 22
Streaming Setting
I input data is a list of edges, consumed in streaming fashion,
requiring only a single pass.
partitioning
algorithm
stream of edges
graph
3 handle graphs that don’t fit in the main memory.
3 impose minimum overhead in time.
3 scalable, easy parallel implementations.
7 assignment decision taken cannot be later changed.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 8 of 22
Algorithms Taxonomy
Grid PDS HDRFDBH Greedy
hashing
stream-based
vertex-cut graph
partitioning
Hashing
constrained
partitioning
(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)
history
aware
(Jain et al., 2013)(Gonzalez et al., 2012)
history
agnostic
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
Algorithms Taxonomy
Grid PDS HDRFDBH Greedy
hashing
stream-based
vertex-cut graph
partitioning
Hashing
constrained
partitioning
(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)
history
aware
(Jain et al., 2013)(Gonzalez et al., 2012)
history
agnostic
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
HDRF: High Degree are Replicated First
I favor the replication of high-degree vertices.
I the number of high-degree vertices in power-law graphs is
very low.
I overall reduction of the replication factor.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
HDRF: High Degree are Replicated First
I in the context of robustness to network failure.
I if few high-degree vertices are removed from a power-law
graph then it is turned into a set of isolated clusters.
I focus on the locality of low-degree vertices.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
The HDRF Algorithm
vertex without replicas
vertex with replicas
case 1
vertices not assigned to partitions
incoming edge
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
The HDRF Algorithm
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
The HDRF Algorithm
case 2
only one vertex has been assigned
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
case 3
vertices assigned, common partition
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
case 3
place e in the intersection
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
Create Replicas
case 4
empty intersection
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
Create Replicas
case 4
least loaded partition in the union
case 4
empty intersection
e
e
standard Greedy solution
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
Create Replicas
case 4
least loaded partition in the union
case 4
empty intersection
e
case 4
replicate vertex with highest degree
e
δ(v1) > δ(v2)
e
standard Greedy solution
HDRF
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
Equivalent Formulation: Maximize The Score
I computing a score C(e, p) for all partitions p 2 P.
I assigns e to the partition that maximizes C(e, p).
C(e, p) = CREP(e, p) + CBAL(p)
I the balance term breaks ties in replication term .
I this may not be enough to ensure load balance.
control the importance of the balance term.
balanced partitions even when classical greedy fails.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
Equivalent Formulation: Maximize The Score
I computing a score C(e, p) for all partitions p 2 P.
I assigns e to the partition that maximizes C(e, p).
C(e, p) = CREP(e, p) + · CBAL(p)
the balance term breaks ties in replication term.
this may not be enough to ensure load balance.
I controls the importance of the balance term .
I balanced partitions even when classical greedy fails.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Ordered Stream Of Edges
I without
single partition
all the graph in a
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Ordered Stream Of Edges
I with
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
Experiments - Settings
I standalone partitioner.
B VGP, a software package for one-pass vertex-cut balanced
graph partitioning.
B measure the performance: replication and balancing.
I GraphLab.
B HDRF has been integrated in GraphLab PowerGraph 2.2.
B measure the impact on the execution time of graph
computation in a distributed graph computing frameworks.
I stream of edges in random order.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 15 of 22
Experiments - Datasets
I real-word
graphs.
I synthetic
graphs.
1M vertices
60M to 3M edges
Dataset |V | |E|
MovieLens 10M 80.6K 10M
Netflix 497.9K 100.4M
Tencent Weibo 1.4M 140M
twitter-2010 41.7M 1.47B
10
0
10
1
10
2
10
3
104
105
101
102
103
104
105
numberofvertices
degree
α = 1.8
α = 2.2
α = 2.6
α = 3.0
α = 3.4
α = 4.0
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 16 of 22
Results - Synthetic Graphs Replication Factor
I 128 partitions
4
8
1.8 2.0 2.4 2.8 3.2 3.6 4.0
replicationfactor
alpha
HDRF
PDS |P|=133
grid |P|=121
greedy
DBH
skewed homogeneous
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
Results - Synthetic Graphs Replication Factor
I 128 partitions
4
8
1.8 2.0 2.4 2.8 3.2 3.6 4.0
replicationfactor
alpha
HDRF
PDS |P|=133
grid |P|=121
greedy
DBH
skewed homogeneous
less dense
less edges
easyer to partition
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
Results - Synthetic Graphs Replication Factor
I 128 partitions
4
8
1.8 2.0 2.4 2.8 3.2 3.6 4.0
replicationfactor
alpha
HDRF
PDS |P|=133
grid |P|=121
greedy
DBH
skewed homogeneous
less dense
less edges
easier to partition
more dense
more edges
difficult to partition
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
Results - Synthetic Graphs Replication Factor
I 128 partitions
4
8
1.8 2.0 2.4 2.8 3.2 3.6 4.0
replicationfactor
alpha
HDRF
PDS |P|=133
grid |P|=121
greedy
DBH
skewed homogeneous
less dense
less edges
easier to partition
more dense
more edges
difficult to partition
exploit
power-law
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
Results - Synthetic Graphs Replication Factor
I 128 partitions
4
8
1.8 2.0 2.4 2.8 3.2 3.6 4.0
replicationfactor
alpha
HDRF
PDS |P|=133
grid |P|=121
greedy
DBH
skewed homogeneous
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
Results - Real-Word Graphs Replication Factor
I 133 partitions
1
2
4
8
16
Tencent Weibo Netflix MovieLens 10M twitter-2010
replicationfactor
PDS
7.9
11.3 11.5
8.0
greedy
2.8
8.1
10.7
5.9
DBH
1.5
7.1
12.9
6.8
HDRF
1.3
4.9
9.1
4.8
1
2
4
8
16
Tencent Weibo Netflix MovieLens 10M twitter-2010
replicationfactor
PDS
greedy
DBH
HDRF
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22
Results - Real-Word Graphs Replication Factor
I 133 partitions
1
2
4
8
16
Tencent Weibo Netflix MovieLens 10M twitter-2010
replicationfactor
PDS
7.9
11.3 11.5
8.0
greedy
2.8
8.1
10.7
5.9
DBH
1.5
7.1
12.9
6.8
HDRF
1.3
4.9
9.1
4.8
1
2
4
8
16
Tencent Weibo Netflix MovieLens 10M twitter-2010
replicationfactor
PDS
greedy
DBH
HDRF
HDRF achieves a replication factor
about 40% smaller than DBH,
more than 50% smaller than Greedy,
almost 3× smaller than PDS,
more than 4× smaller than Grid and
almost 14× smaller than Hashing.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22
Results - Load Relative Standard Deviation
I MovieLens 10M
0
1
2
3
4
5
6
7
8 16 32 64 128 256
loadrelativestandarddeviation(%)
partitions
PDS
grid
greedy
DBH
HDRF
hashing
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 19 of 22
Results - Graph Algorithm Runtime Speedup
I SGD algorithm for collaborative filtering on Tencent Weibo.
1
1.5
2
2.5
3
32 64 128
speedup(×)
partitions
greedy
PDS
1
1.5
2
2.5
3
32 64 128
speedup(×)
partitions
greedy
PDS |P|=31,57,133
I the speedup is proportional to both:
B the advantage in replication factor.
B the actual network usage of the algorithm.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 20 of 22
Conclusion
I HDRF is a one-pass vertex-cut graph partitioning algorithm.
I based on a greedy vertex-cut approach that leverages
information on vertex degrees.
I we provide a theoretical analysis of HDRF with an
average-case upper bound for the vertex replication factor.
I experimental study shows that:
B HDRF provides the smallest replication factor with close to
optimal load balance.
B HDRF significantly reduces the time needed to perform
computation on graphs.
I the stand-alone software package for one-pass v-cut balanced
partitioning at https://github.com/fabiopetroni/VGP.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 21 of 22
Thank you!
Questions?
Fabio Petroni
Sapienza University of Rome, Italy
Current position:
PhD Student in Engineering in Computer Science
Research Interests:
data mining, machine learning, big data
petroni@dis.uniroma1.it
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 22 of 22

More Related Content

Similar to HDRF: Stream-Based Partitioning for Power-Law Graphs

03 hdrf presentation
03 hdrf presentation03 hdrf presentation
03 hdrf presentation
AndreaCingolani
 
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
Fabio Petroni, PhD
 
IRJET- Design and Implementation of Combinational Circuits using Reversible G...
IRJET- Design and Implementation of Combinational Circuits using Reversible G...IRJET- Design and Implementation of Combinational Circuits using Reversible G...
IRJET- Design and Implementation of Combinational Circuits using Reversible G...
IRJET Journal
 
IRJET- Design and Implementation of Combinational Circuits using Reversib...
IRJET-  	  Design and Implementation of Combinational Circuits using Reversib...IRJET-  	  Design and Implementation of Combinational Circuits using Reversib...
IRJET- Design and Implementation of Combinational Circuits using Reversib...
IRJET Journal
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
IRJET Journal
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
Ben Mabey
 
Reducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology MappingReducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology Mappingsatrajit
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderMalik Tauqir Hasan
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
IRJET Journal
 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
Artem Lutov
 
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI ArchitectureA Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
IRJET Journal
 
Technical aptitude questions_e_book1
Technical aptitude questions_e_book1Technical aptitude questions_e_book1
Technical aptitude questions_e_book1
Sateesh Allu
 
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
IRJET -  	  Realization of Power Optimised Carry Skip Adder using AOI LogicIRJET -  	  Realization of Power Optimised Carry Skip Adder using AOI Logic
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
IRJET Journal
 
IRJET- Color Image Compression using Canonic Signed Digit and Block based...
IRJET-  	  Color Image Compression using Canonic Signed Digit and Block based...IRJET-  	  Color Image Compression using Canonic Signed Digit and Block based...
IRJET- Color Image Compression using Canonic Signed Digit and Block based...
IRJET Journal
 
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
IRJET Journal
 
Area, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder TopologiesArea, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder Topologies
VLSICS Design
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET Journal
 
H010414651
H010414651H010414651
H010414651
IOSR Journals
 

Similar to HDRF: Stream-Based Partitioning for Power-Law Graphs (20)

03 hdrf presentation
03 hdrf presentation03 hdrf presentation
03 hdrf presentation
 
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
 
IRJET- Design and Implementation of Combinational Circuits using Reversible G...
IRJET- Design and Implementation of Combinational Circuits using Reversible G...IRJET- Design and Implementation of Combinational Circuits using Reversible G...
IRJET- Design and Implementation of Combinational Circuits using Reversible G...
 
IRJET- Design and Implementation of Combinational Circuits using Reversib...
IRJET-  	  Design and Implementation of Combinational Circuits using Reversib...IRJET-  	  Design and Implementation of Combinational Circuits using Reversib...
IRJET- Design and Implementation of Combinational Circuits using Reversib...
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
 
Reducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology MappingReducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology Mapping
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
 
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...
 
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI ArchitectureA Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
A Configurable and Low Power Hard-Decision Viterbi Decoder in VLSI Architecture
 
Technical aptitude questions_e_book1
Technical aptitude questions_e_book1Technical aptitude questions_e_book1
Technical aptitude questions_e_book1
 
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
IRJET -  	  Realization of Power Optimised Carry Skip Adder using AOI LogicIRJET -  	  Realization of Power Optimised Carry Skip Adder using AOI Logic
IRJET - Realization of Power Optimised Carry Skip Adder using AOI Logic
 
IRJET- Color Image Compression using Canonic Signed Digit and Block based...
IRJET-  	  Color Image Compression using Canonic Signed Digit and Block based...IRJET-  	  Color Image Compression using Canonic Signed Digit and Block based...
IRJET- Color Image Compression using Canonic Signed Digit and Block based...
 
Masters Thesis
Masters ThesisMasters Thesis
Masters Thesis
 
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
IRJET- High Speed Multi-Rate Approach based Adaptive Filter using Multiplier-...
 
Area, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder TopologiesArea, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder Topologies
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
H010414651
H010414651H010414651
H010414651
 

Recently uploaded

6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdfThe Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
Nettur Technical Training Foundation
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Steel & Timber Design according to British Standard
Steel & Timber Design according to British StandardSteel & Timber Design according to British Standard
Steel & Timber Design according to British Standard
AkolbilaEmmanuel1
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 

Recently uploaded (20)

6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdfThe Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Steel & Timber Design according to British Standard
Steel & Timber Design according to British StandardSteel & Timber Design according to British Standard
Steel & Timber Design according to British Standard
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 

HDRF: Stream-Based Partitioning for Power-Law Graphs

  • 1. HDRF: Stream-Based Partitioning for Power-Law Graphs. travel grant from Fabio Petroni, L. Querzoni, K. Daudjee, S. Kamali, G. Iacoboni
  • 2. Graphs I large amount of real data can be represented as a graph. Vertices Edges G(V,E) I the problem of optimally partitioning a graph while maintaining load balance is important in several contexts. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 2 of 22
  • 3. Distributed Graph Computing Frameworks I support e cient computation on really large graphs. B graph analytics; data mining and machine learning tasks. I input data partitioning can have a significant impact on the performance of the graph computation. B a↵ects network usage, memory occupation, synchronization. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 3 of 22
  • 4. Balanced Graph Partitioning vertex-cutedge-cut I partition G into smaller components of (ideally) equal size edge-cut vertex disjoint partitions vertex-cut edge disjoint partitions I a vertex can be cut in multiple ways and span several partitions while a cut edge connects only two partitions. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
  • 5. Balanced Graph Partitioning vertex-cutedge-cut I partition G into smaller components of (ideally) equal size edge-cut vertex disjoint partitions vertex-cut edge disjoint partitions I a vertex can be cut in multiple ways and span several partitions while a cut edge connects only two partitions. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
  • 6. Vertex-Cut computation steps are associated with edges v-cut perform better on power law graphs create less storage and network overhead (Gonzalez et al., 2012) I modern distributed graph computing frameworks use vertex-cut. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 5 of 22
  • 7. Power-Law Graphs I characteristic of real graphs: power-law degree distribution. B most vertices have few connections while a few have many. 10 0 101 10 2 10 3 10 4 10 5 10 6 10 0 10 1 10 2 10 3 10 4 10 5 numberofvertices degree crawl Twitter connection 2010 log-log scale I the probability that a vertex has degree d is P(d) / d ↵ . I ↵ controls the “skewness” of the degree distribution. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 6 of 22
  • 8. Balanced Vertex-Cut Graph Partitioning I v 2 V vertex; e 2 E edge; p 2 P partition. I A(v) set of partitions where vertex v is replicated. I 1 tolerance to load imbalance. I the size |p| of partition p is its edge cardinality. minimize replicas reduce (1) bandwidth, (2) memory usage and (3) synchronization balance the load efficient usage of available computing resources min 1 |V | X v2V |A(v)| s.t. max p2P |p| < |E| |P| I the objective function is the replication factor (RF). B average number of replicas per vertex. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 7 of 22
  • 9. Streaming Setting I input data is a list of edges, consumed in streaming fashion, requiring only a single pass. partitioning algorithm stream of edges graph 3 handle graphs that don’t fit in the main memory. 3 impose minimum overhead in time. 3 scalable, easy parallel implementations. 7 assignment decision taken cannot be later changed. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 8 of 22
  • 10. Algorithms Taxonomy Grid PDS HDRFDBH Greedy hashing stream-based vertex-cut graph partitioning Hashing constrained partitioning (Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014) history aware (Jain et al., 2013)(Gonzalez et al., 2012) history agnostic CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
  • 11. Algorithms Taxonomy Grid PDS HDRFDBH Greedy hashing stream-based vertex-cut graph partitioning Hashing constrained partitioning (Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014) history aware (Jain et al., 2013)(Gonzalez et al., 2012) history agnostic CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
  • 12. HDRF: High Degree are Replicated First I favor the replication of high-degree vertices. I the number of high-degree vertices in power-law graphs is very low. I overall reduction of the replication factor. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
  • 13. HDRF: High Degree are Replicated First I in the context of robustness to network failure. I if few high-degree vertices are removed from a power-law graph then it is turned into a set of isolated clusters. I focus on the locality of low-degree vertices. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
  • 14. The HDRF Algorithm vertex without replicas vertex with replicas case 1 vertices not assigned to partitions incoming edge e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 15. The HDRF Algorithm vertex without replicas vertex with replicas case 1 place e in the least loaded partition incoming edge e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 16. The HDRF Algorithm case 2 only one vertex has been assigned vertex without replicas vertex with replicas case 1 place e in the least loaded partition incoming edge e e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 17. The HDRF Algorithm case 2 place e in the partition vertex without replicas vertex with replicas case 1 place e in the least loaded partition incoming edge e e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 18. The HDRF Algorithm case 2 place e in the partition vertex without replicas vertex with replicas case 1 place e in the least loaded partition incoming edge e e case 3 vertices assigned, common partition e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 19. The HDRF Algorithm case 2 place e in the partition vertex without replicas vertex with replicas case 1 place e in the least loaded partition incoming edge e e case 3 place e in the intersection e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
  • 20. Create Replicas case 4 empty intersection e CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
  • 21. Create Replicas case 4 least loaded partition in the union case 4 empty intersection e e standard Greedy solution CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
  • 22. Create Replicas case 4 least loaded partition in the union case 4 empty intersection e case 4 replicate vertex with highest degree e δ(v1) > δ(v2) e standard Greedy solution HDRF CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
  • 23. Equivalent Formulation: Maximize The Score I computing a score C(e, p) for all partitions p 2 P. I assigns e to the partition that maximizes C(e, p). C(e, p) = CREP(e, p) + CBAL(p) I the balance term breaks ties in replication term . I this may not be enough to ensure load balance. control the importance of the balance term. balanced partitions even when classical greedy fails. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
  • 24. Equivalent Formulation: Maximize The Score I computing a score C(e, p) for all partitions p 2 P. I assigns e to the partition that maximizes C(e, p). C(e, p) = CREP(e, p) + · CBAL(p) the balance term breaks ties in replication term. this may not be enough to ensure load balance. I controls the importance of the balance term . I balanced partitions even when classical greedy fails. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
  • 25. Ordered Stream Of Edges I without CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 26. Ordered Stream Of Edges I without CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 27. Ordered Stream Of Edges I without CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 28. Ordered Stream Of Edges I without CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 29. Ordered Stream Of Edges I without single partition all the graph in a CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 30. Ordered Stream Of Edges I with CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
  • 31. Experiments - Settings I standalone partitioner. B VGP, a software package for one-pass vertex-cut balanced graph partitioning. B measure the performance: replication and balancing. I GraphLab. B HDRF has been integrated in GraphLab PowerGraph 2.2. B measure the impact on the execution time of graph computation in a distributed graph computing frameworks. I stream of edges in random order. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 15 of 22
  • 32. Experiments - Datasets I real-word graphs. I synthetic graphs. 1M vertices 60M to 3M edges Dataset |V | |E| MovieLens 10M 80.6K 10M Netflix 497.9K 100.4M Tencent Weibo 1.4M 140M twitter-2010 41.7M 1.47B 10 0 10 1 10 2 10 3 104 105 101 102 103 104 105 numberofvertices degree α = 1.8 α = 2.2 α = 2.6 α = 3.0 α = 3.4 α = 4.0 CIKM 2015. October 19-23, 2015. Melbourne, Australia. 16 of 22
  • 33. Results - Synthetic Graphs Replication Factor I 128 partitions 4 8 1.8 2.0 2.4 2.8 3.2 3.6 4.0 replicationfactor alpha HDRF PDS |P|=133 grid |P|=121 greedy DBH skewed homogeneous CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
  • 34. Results - Synthetic Graphs Replication Factor I 128 partitions 4 8 1.8 2.0 2.4 2.8 3.2 3.6 4.0 replicationfactor alpha HDRF PDS |P|=133 grid |P|=121 greedy DBH skewed homogeneous less dense less edges easyer to partition CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
  • 35. Results - Synthetic Graphs Replication Factor I 128 partitions 4 8 1.8 2.0 2.4 2.8 3.2 3.6 4.0 replicationfactor alpha HDRF PDS |P|=133 grid |P|=121 greedy DBH skewed homogeneous less dense less edges easier to partition more dense more edges difficult to partition CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
  • 36. Results - Synthetic Graphs Replication Factor I 128 partitions 4 8 1.8 2.0 2.4 2.8 3.2 3.6 4.0 replicationfactor alpha HDRF PDS |P|=133 grid |P|=121 greedy DBH skewed homogeneous less dense less edges easier to partition more dense more edges difficult to partition exploit power-law CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
  • 37. Results - Synthetic Graphs Replication Factor I 128 partitions 4 8 1.8 2.0 2.4 2.8 3.2 3.6 4.0 replicationfactor alpha HDRF PDS |P|=133 grid |P|=121 greedy DBH skewed homogeneous CIKM 2015. October 19-23, 2015. Melbourne, Australia. 17 of 22
  • 38. Results - Real-Word Graphs Replication Factor I 133 partitions 1 2 4 8 16 Tencent Weibo Netflix MovieLens 10M twitter-2010 replicationfactor PDS 7.9 11.3 11.5 8.0 greedy 2.8 8.1 10.7 5.9 DBH 1.5 7.1 12.9 6.8 HDRF 1.3 4.9 9.1 4.8 1 2 4 8 16 Tencent Weibo Netflix MovieLens 10M twitter-2010 replicationfactor PDS greedy DBH HDRF CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22
  • 39. Results - Real-Word Graphs Replication Factor I 133 partitions 1 2 4 8 16 Tencent Weibo Netflix MovieLens 10M twitter-2010 replicationfactor PDS 7.9 11.3 11.5 8.0 greedy 2.8 8.1 10.7 5.9 DBH 1.5 7.1 12.9 6.8 HDRF 1.3 4.9 9.1 4.8 1 2 4 8 16 Tencent Weibo Netflix MovieLens 10M twitter-2010 replicationfactor PDS greedy DBH HDRF HDRF achieves a replication factor about 40% smaller than DBH, more than 50% smaller than Greedy, almost 3× smaller than PDS, more than 4× smaller than Grid and almost 14× smaller than Hashing. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 18 of 22
  • 40. Results - Load Relative Standard Deviation I MovieLens 10M 0 1 2 3 4 5 6 7 8 16 32 64 128 256 loadrelativestandarddeviation(%) partitions PDS grid greedy DBH HDRF hashing CIKM 2015. October 19-23, 2015. Melbourne, Australia. 19 of 22
  • 41. Results - Graph Algorithm Runtime Speedup I SGD algorithm for collaborative filtering on Tencent Weibo. 1 1.5 2 2.5 3 32 64 128 speedup(×) partitions greedy PDS 1 1.5 2 2.5 3 32 64 128 speedup(×) partitions greedy PDS |P|=31,57,133 I the speedup is proportional to both: B the advantage in replication factor. B the actual network usage of the algorithm. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 20 of 22
  • 42. Conclusion I HDRF is a one-pass vertex-cut graph partitioning algorithm. I based on a greedy vertex-cut approach that leverages information on vertex degrees. I we provide a theoretical analysis of HDRF with an average-case upper bound for the vertex replication factor. I experimental study shows that: B HDRF provides the smallest replication factor with close to optimal load balance. B HDRF significantly reduces the time needed to perform computation on graphs. I the stand-alone software package for one-pass v-cut balanced partitioning at https://github.com/fabiopetroni/VGP. CIKM 2015. October 19-23, 2015. Melbourne, Australia. 21 of 22
  • 43. Thank you! Questions? Fabio Petroni Sapienza University of Rome, Italy Current position: PhD Student in Engineering in Computer Science Research Interests: data mining, machine learning, big data petroni@dis.uniroma1.it CIKM 2015. October 19-23, 2015. Melbourne, Australia. 22 of 22