F. Petroni, L. Querzoni, K. Daudjee, S. Kamali and G. Iacoboni:
"HDRF: Stream-Based Partitioning for Power-Law Graphs."
In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), 2015.
Abstract: "Balanced graph partitioning is a fundamental problem that is receiving growing attention with the emergence of distributed graph-computing (DGC) frameworks. In these frameworks, the partitioning strategy plays an important role since it drives the communication cost and the workload balance among computing nodes, thereby affecting system performance.
However, existing solutions only partially exploit a key characteristic of natural graphs commonly found in the
real-world: their highly skewed power-law degree distributions.
In this paper, we propose High-Degree (are) Replicated First (HDRF), a novel streaming vertex-cut graph partitioning algorithm that effectively exploits skewed degree distributions by explicitly taking into account vertex degree in the placement decision. We analytically and experimentally evaluate HDRF on both synthetic and real-world graphs and show that it outperforms all existing algorithms in partitioning quality."
Aligning supply chain strategies with product uncertainty cmr spring 2002-k...NathanTse
These slides are a sumup of a paper about supply chain strategy. If you wanna understand more, you can search the original paper on internet. Title is Aligning supply chain strategies with product uncertainty, Hau L. Lee, California Management Review, VOL.44, NO.3, Spring 2002. which given many case examples.
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...StreamNative
Intuit operates a highly distributed, multi-cluster, multi-region messaging platform to serve the queuing use-cases of its applications and services. In this session we will talk about the journey of our messaging platform in the world of Apache Pulsar and share the experiences and learnings gained so far. As we adopted Pulsar for our next generation platform and adapted it for Intuit specific requirements, we faced and solved some intrinsic challenges that we would be happy to share and get feedback. This journey has just begun and we would like to learn and absorb recommended best practices and guidelines.
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...HostedbyConfluent
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kaundinya | Current 2022
When Apache Kafka® was first created, one of the hallmarks was its native replication protocol, which provided built-in resiliency in the system. As a business scales, there’s a need to have this fault-tolerance transcend beyond the local data center, and a multi-geographic deployment becomes critical. Traditionally, Kafka Connect based solutions have tried their hand at enabling these types of deployments. However, this presents its own set of operational challenges that can be quite costly.
In this talk, we will go over how you can use the existing replication protocol across clusters. You will learn how to use Cluster Linking to run a multi-region data streaming deployment without the burden and operational overhead of running yet another data system. We will discuss:.
* Automation options for creating mirror topics
* Failover processes and caveats to consider
* Handling ACL replication and consumer offset synchronization
* And more!
So, join us on this intergalactic journey to discover how you can use Cluster Linking to decrease your operational overhead, maintain a multi-geographic deployment, and perhaps even reach infinity (and beyond)!
Peer to peer transactions of tokens representing generative photo voltaic capacity of kWh production at the facility level. An Ethereum enabled glimpse into the community energy sharing economy of the future
Aligning supply chain strategies with product uncertainty cmr spring 2002-k...NathanTse
These slides are a sumup of a paper about supply chain strategy. If you wanna understand more, you can search the original paper on internet. Title is Aligning supply chain strategies with product uncertainty, Hau L. Lee, California Management Review, VOL.44, NO.3, Spring 2002. which given many case examples.
Building the Next-Generation Messaging Platform on Pulsar at Intuit - Pulsar ...StreamNative
Intuit operates a highly distributed, multi-cluster, multi-region messaging platform to serve the queuing use-cases of its applications and services. In this session we will talk about the journey of our messaging platform in the world of Apache Pulsar and share the experiences and learnings gained so far. As we adopted Pulsar for our next generation platform and adapted it for Intuit specific requirements, we faced and solved some intrinsic challenges that we would be happy to share and get feedback. This journey has just begun and we would like to learn and absorb recommended best practices and guidelines.
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...HostedbyConfluent
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kaundinya | Current 2022
When Apache Kafka® was first created, one of the hallmarks was its native replication protocol, which provided built-in resiliency in the system. As a business scales, there’s a need to have this fault-tolerance transcend beyond the local data center, and a multi-geographic deployment becomes critical. Traditionally, Kafka Connect based solutions have tried their hand at enabling these types of deployments. However, this presents its own set of operational challenges that can be quite costly.
In this talk, we will go over how you can use the existing replication protocol across clusters. You will learn how to use Cluster Linking to run a multi-region data streaming deployment without the burden and operational overhead of running yet another data system. We will discuss:.
* Automation options for creating mirror topics
* Failover processes and caveats to consider
* Handling ACL replication and consumer offset synchronization
* And more!
So, join us on this intergalactic journey to discover how you can use Cluster Linking to decrease your operational overhead, maintain a multi-geographic deployment, and perhaps even reach infinity (and beyond)!
Peer to peer transactions of tokens representing generative photo voltaic capacity of kWh production at the facility level. An Ethereum enabled glimpse into the community energy sharing economy of the future
F. Petroni and L. Querzoni:
"GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning."
In: Proceedings of the 8th ACM Conference on Recommender Systems (RecSys), 2014.
Abstract: "Matrix completion latent factors models are known to be an effective method to build recommender systems. Currently,
stochastic gradient descent (SGD) is considered one of the best latent factor-based algorithm for matrix completion. In this paper we discuss GASGD, a distributed asynchronous variant of SGD for large-scale matrix completion, that (i) leverages data partitioning schemes based on graph partitioning techniques, (ii) exploits specific characteristics of the input data and (iii) introduces an explicit parameter to tune synchronization frequency among the computing nodes. We empirically show how, thanks to these features, GASGD achieves a fast convergence rate incurring in smaller communication cost with respect to current asynchronous distributed SGD implementations."
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction) performed in near-linear time is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, computed (i.e., learned) by orders of magnitude faster and perform competitively to the manually tuned best state-of-the-art embedding techniques evaluated on diverse tasks of graph analysis.
Area, Delay and Power Comparison of Adder TopologiesVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. This paper presents the pertinent choice for selecting the adder topology with the tradeoff between delay, power consumption and area. The adder topology used in this work are ripple carry adder, carry lookahead adder, carry skip adder, carry select adder, carry increment adder, carry save adder and carry bypass adder. The module functionality and performance issues like area, power dissipation and propagation delay are analyzed at 0.12µm 6metal layer CMOS technology using microwind tool.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IOT. Let's dig in!!!!
For more such content visit: https://nttftrg.com/
More Related Content
Similar to HDRF: Stream-Based Partitioning for Power-Law Graphs
F. Petroni and L. Querzoni:
"GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning."
In: Proceedings of the 8th ACM Conference on Recommender Systems (RecSys), 2014.
Abstract: "Matrix completion latent factors models are known to be an effective method to build recommender systems. Currently,
stochastic gradient descent (SGD) is considered one of the best latent factor-based algorithm for matrix completion. In this paper we discuss GASGD, a distributed asynchronous variant of SGD for large-scale matrix completion, that (i) leverages data partitioning schemes based on graph partitioning techniques, (ii) exploits specific characteristics of the input data and (iii) introduces an explicit parameter to tune synchronization frequency among the computing nodes. We empirically show how, thanks to these features, GASGD achieves a fast convergence rate incurring in smaller communication cost with respect to current asynchronous distributed SGD implementations."
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction) performed in near-linear time is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, computed (i.e., learned) by orders of magnitude faster and perform competitively to the manually tuned best state-of-the-art embedding techniques evaluated on diverse tasks of graph analysis.
Area, Delay and Power Comparison of Adder TopologiesVLSICS Design
Adders form an almost obligatory component of every contemporary integrated circuit. The prerequisite of the adder is that it is primarily fast and secondarily efficient in terms of power consumption and chip area. This paper presents the pertinent choice for selecting the adder topology with the tradeoff between delay, power consumption and area. The adder topology used in this work are ripple carry adder, carry lookahead adder, carry skip adder, carry select adder, carry increment adder, carry save adder and carry bypass adder. The module functionality and performance issues like area, power dissipation and propagation delay are analyzed at 0.12µm 6metal layer CMOS technology using microwind tool.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IOT. Let's dig in!!!!
For more such content visit: https://nttftrg.com/
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...ssuser7dcef0
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical way to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
2. Graphs
I large amount of real data can be represented as a graph.
Vertices
Edges
G(V,E)
I the problem of optimally partitioning a graph while
maintaining load balance is important in several contexts.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 2 of 22
3. Distributed Graph Computing Frameworks
I support e cient computation on really large graphs.
B graph analytics; data mining and machine learning tasks.
I input data partitioning can have a significant impact on the
performance of the graph computation.
B a↵ects network usage, memory occupation, synchronization.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 3 of 22
4. Balanced Graph Partitioning
vertex-cutedge-cut
I partition G into smaller components of (ideally) equal size
edge-cut
vertex disjoint partitions
vertex-cut
edge disjoint partitions
I a vertex can be cut in multiple ways and span several
partitions while a cut edge connects only two partitions.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
5. Balanced Graph Partitioning
vertex-cutedge-cut
I partition G into smaller components of (ideally) equal size
edge-cut
vertex disjoint partitions
vertex-cut
edge disjoint partitions
I a vertex can be cut in multiple ways and span several
partitions while a cut edge connects only two partitions.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 4 of 22
6. Vertex-Cut
computation steps are
associated with edges
v-cut perform better on
power law graphs
create less storage and
network overhead
(Gonzalez et al., 2012)
I modern distributed graph computing
frameworks use vertex-cut.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 5 of 22
7. Power-Law Graphs
I characteristic of real graphs: power-law degree distribution.
B most vertices have few connections while a few have many.
10
0
101
10
2
10
3
10
4
10
5
10
6
10
0
10
1
10
2
10
3
10
4
10
5
numberofvertices
degree
crawl
Twitter
connection
2010
log-log scale
I the probability that a vertex has degree d is P(d) / d ↵
.
I ↵ controls the “skewness” of the degree distribution.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 6 of 22
8. Balanced Vertex-Cut Graph Partitioning
I v 2 V vertex; e 2 E edge; p 2 P partition.
I A(v) set of partitions where vertex v is replicated.
I 1 tolerance to load imbalance.
I the size |p| of partition p is its edge cardinality.
minimize replicas
reduce (1) bandwidth, (2) memory
usage and (3) synchronization
balance the load
efficient usage of available
computing resources
min
1
|V |
X
v2V
|A(v)| s.t. max
p2P
|p| <
|E|
|P|
I the objective function is the replication factor (RF).
B average number of replicas per vertex.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 7 of 22
9. Streaming Setting
I input data is a list of edges, consumed in streaming fashion,
requiring only a single pass.
partitioning
algorithm
stream of edges
graph
3 handle graphs that don’t fit in the main memory.
3 impose minimum overhead in time.
3 scalable, easy parallel implementations.
7 assignment decision taken cannot be later changed.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 8 of 22
10. Algorithms Taxonomy
Grid PDS HDRFDBH Greedy
hashing
stream-based
vertex-cut graph
partitioning
Hashing
constrained
partitioning
(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)
history
aware
(Jain et al., 2013)(Gonzalez et al., 2012)
history
agnostic
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
11. Algorithms Taxonomy
Grid PDS HDRFDBH Greedy
hashing
stream-based
vertex-cut graph
partitioning
Hashing
constrained
partitioning
(Gonzalez et al., 2012)(Jain et al., 2013) (Petroni et al., 2015)(Xie et al., 2014)
history
aware
(Jain et al., 2013)(Gonzalez et al., 2012)
history
agnostic
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 9 of 22
12. HDRF: High Degree are Replicated First
I favor the replication of high-degree vertices.
I the number of high-degree vertices in power-law graphs is
very low.
I overall reduction of the replication factor.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
13. HDRF: High Degree are Replicated First
I in the context of robustness to network failure.
I if few high-degree vertices are removed from a power-law
graph then it is turned into a set of isolated clusters.
I focus on the locality of low-degree vertices.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 10 of 22
14. The HDRF Algorithm
vertex without replicas
vertex with replicas
case 1
vertices not assigned to partitions
incoming edge
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
15. The HDRF Algorithm
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
16. The HDRF Algorithm
case 2
only one vertex has been assigned
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
17. The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
18. The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
case 3
vertices assigned, common partition
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
19. The HDRF Algorithm
case 2
place e in the partition
vertex without replicas
vertex with replicas
case 1
place e in the least loaded partition
incoming edge
e
e
case 3
place e in the intersection
e
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 11 of 22
21. Create Replicas
case 4
least loaded partition in the union
case 4
empty intersection
e
e
standard Greedy solution
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
22. Create Replicas
case 4
least loaded partition in the union
case 4
empty intersection
e
case 4
replicate vertex with highest degree
e
δ(v1) > δ(v2)
e
standard Greedy solution
HDRF
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 12 of 22
23. Equivalent Formulation: Maximize The Score
I computing a score C(e, p) for all partitions p 2 P.
I assigns e to the partition that maximizes C(e, p).
C(e, p) = CREP(e, p) + CBAL(p)
I the balance term breaks ties in replication term .
I this may not be enough to ensure load balance.
control the importance of the balance term.
balanced partitions even when classical greedy fails.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
24. Equivalent Formulation: Maximize The Score
I computing a score C(e, p) for all partitions p 2 P.
I assigns e to the partition that maximizes C(e, p).
C(e, p) = CREP(e, p) + · CBAL(p)
the balance term breaks ties in replication term.
this may not be enough to ensure load balance.
I controls the importance of the balance term .
I balanced partitions even when classical greedy fails.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 13 of 22
25. Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
26. Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
27. Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
28. Ordered Stream Of Edges
I without
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
29. Ordered Stream Of Edges
I without
single partition
all the graph in a
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
30. Ordered Stream Of Edges
I with
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 14 of 22
31. Experiments - Settings
I standalone partitioner.
B VGP, a software package for one-pass vertex-cut balanced
graph partitioning.
B measure the performance: replication and balancing.
I GraphLab.
B HDRF has been integrated in GraphLab PowerGraph 2.2.
B measure the impact on the execution time of graph
computation in a distributed graph computing frameworks.
I stream of edges in random order.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 15 of 22
41. Results - Graph Algorithm Runtime Speedup
I SGD algorithm for collaborative filtering on Tencent Weibo.
1
1.5
2
2.5
3
32 64 128
speedup(×)
partitions
greedy
PDS
1
1.5
2
2.5
3
32 64 128
speedup(×)
partitions
greedy
PDS |P|=31,57,133
I the speedup is proportional to both:
B the advantage in replication factor.
B the actual network usage of the algorithm.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 20 of 22
42. Conclusion
I HDRF is a one-pass vertex-cut graph partitioning algorithm.
I based on a greedy vertex-cut approach that leverages
information on vertex degrees.
I we provide a theoretical analysis of HDRF with an
average-case upper bound for the vertex replication factor.
I experimental study shows that:
B HDRF provides the smallest replication factor with close to
optimal load balance.
B HDRF significantly reduces the time needed to perform
computation on graphs.
I the stand-alone software package for one-pass v-cut balanced
partitioning at https://github.com/fabiopetroni/VGP.
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 21 of 22
43. Thank you!
Questions?
Fabio Petroni
Sapienza University of Rome, Italy
Current position:
PhD Student in Engineering in Computer Science
Research Interests:
data mining, machine learning, big data
petroni@dis.uniroma1.it
CIKM 2015. October 19-23, 2015. Melbourne, Australia. 22 of 22