Parallel BFS on Distributed Memory Systems
Aydin Buluc and Kamesh Madduri
Sapta
DC reading group
September 29, 2016
Outline
Introduction
Shared Memory BFS
Model
Contributions
Serial BFS overview
Another paper: Parallel BFS using 2 queues
This paper: Hybrid Parallel BFS using 2 stacks
Experimental Results
Conclusion
Introduction
BFS is important.
BFS is often a building block of more complex graph
algorithms.
Now that we have BIG graphs, parallelizing it is very
important.
Shared Memory BFS involves: (1) communication between
processors and (2) distribution of the graph(vertices) among
processors
Model
Graph G(V, E) with |V| = n and |E| = m, where m is O(n),
i.e. sparse graphs.
All edge weights are 1.
Contributions
Traditional representation: 1-dimensional BFS (1D adjacency
arrays).
Sparse matrix representation: 2D partitioning of the graph
(not discussed).
Serial BFS overview
Sequential BFS uses a queue data structure.
BFS requirement:
all vertices at distance k from the source must be “visited”
before any vertex at distance k + 1.
Explanation?
Level-synchronous BFS is a key concept in correct shared
memory BFS.
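A minimal sketch of the queue-based sequential BFS described above. The function name and the dictionary-of-lists graph format are illustrative assumptions, not taken from the paper. Because the queue is first-in first-out, every vertex at distance k is dequeued before any vertex at distance k + 1.

```python
from collections import deque

def serial_bfs(adj, source):
    """Return {vertex: distance from source} for all reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()              # FIFO order preserves the level order
        for v in adj[u]:
            if v not in dist:            # first visit fixes the BFS distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Small undirected example graph (hypothetical data).
example_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
example_dist = serial_bfs(example_adj, 0)
```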
Modified BFS : Use 2 stacks
Can be parallelized as is: perform lines 6-7 in parallel,
lines 8-10 are atomic
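The pseudocode this slide refers to is not reproduced in the text, so the following is only a sketch of a two-stack, level-synchronous BFS under assumed names (FS for the current frontier, NS for the next one). Mapping the outer loops to "lines 6-7" and the check-and-push to "lines 8-10" is an assumption. The code is sequential; the comments mark where a parallel version would run concurrently and where it would need atomicity.

```python
def two_stack_bfs(adj, source):
    """Level-synchronous BFS using a current stack (FS) and a next stack (NS)."""
    level = {source: 0}
    fs = [source]                 # FS: all vertices of the current level
    current = 0
    while fs:
        ns = []                   # NS: vertices discovered for the next level
        for u in fs:              # iterations are independent: parallelizable
            for v in adj[u]:
                # In a parallel version this check-and-set plus push
                # must be performed atomically.
                if v not in level:
                    level[v] = current + 1
                    ns.append(v)
        fs = ns                   # swap stacks only after the whole level is done
        current += 1
    return level

# Small undirected example graph (hypothetical data).
two_stack_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
two_stack_levels = two_stack_bfs(two_stack_adj, 0)
```

The order in which vertices are popped within a level does not matter, which is why stacks work as well as queues here: correctness comes from the barrier between levels, not from FIFO order.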
Related Work: Level Synchronous Parallel BFS using 2
queues by Agarwal et al SC’10 [1]
Hybrid 1D Parallel BFS Algorithm
One of the main areas for optimization of this basic parallel
algorithm is
load balancing: ensuring that the parallelized edge-visit
steps are load-balanced.
1D partitioning: if there are p processors in the system, give
ownership of n/p vertices to each processor.
Random shuffling of the vertex identifiers prior to
partitioning, so that all processors get roughly the same number of
vertices (n/p) and edges (m/p).
Use of local stacks NSi for pushes, followed by a global
union (overhead < 3% of execution time).
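The shuffling and 1D partitioning steps above can be sketched as follows. All names (`shuffle_and_partition`, `owner`, `load_per_processor`) and the contiguous-block ownership rule are assumptions for illustration; the paper's implementation may differ.

```python
import random

def shuffle_and_partition(n, p, seed=0):
    """Randomly relabel vertices 0..n-1, then split the new labels into
    p contiguous blocks of about n/p vertices each."""
    rng = random.Random(seed)
    new_id = list(range(n))
    rng.shuffle(new_id)               # new_id[old_label] = new_label
    block = -(-n // p)                # ceil(n / p)

    def owner(v_new):
        return v_new // block         # processor that owns relabeled vertex v_new

    return new_id, owner

def load_per_processor(adj, new_id, owner, p):
    """Count owned vertices and outgoing edges per processor."""
    vertices = [0] * p
    edges = [0] * p
    for old, neighbours in enumerate(adj):
        rank = owner(new_id[old])
        vertices[rank] += 1
        edges[rank] += len(neighbours)
    return vertices, edges

# Hypothetical example: a ring of 1000 vertices split over 4 processors.
ring = [[(i - 1) % 1000, (i + 1) % 1000] for i in range(1000)]
ring_ids, ring_owner = shuffle_and_partition(len(ring), 4)
ring_vertices, ring_edges = load_per_processor(ring, ring_ids, ring_owner, 4)
```

On a graph with a skewed degree distribution, comparing `edges` with and without the shuffle is a quick way to see why relabeling helps balance the edge counts.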
1D BFS
1D BFS contd..
1D BFS errors
The value of level is not incremented
The Next Stack NSi data structure should be emptied before
traversing next level.
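A single-process simulation of a 1D level-synchronous BFS that applies both corrections noted above: the level counter is incremented once per iteration, and each NSi starts empty at every level. This is a sketch under assumptions: the p "processors" are plain list indices, the all-to-all exchange is modeled with nested lists instead of MPI, and all names are hypothetical.

```python
def bfs_1d_simulated(adj, source, p):
    """Simulate 1D-partitioned BFS on p ranks; adj is a list of neighbour lists."""
    n = len(adj)
    block = -(-n // p)                          # ceil(n / p) vertices per rank

    def owner(v):
        return v // block

    level = [-1] * n                            # -1 marks "not yet visited"
    level[source] = 0
    fs = [[] for _ in range(p)]                 # FS_i: local frontier of rank i
    fs[owner(source)].append(source)
    current = 0
    while any(fs):
        # Each rank scans its frontier and buckets neighbours by owning rank.
        outbox = [[[] for _ in range(p)] for _ in range(p)]
        for rank in range(p):
            for u in fs[rank]:
                for v in adj[u]:
                    outbox[rank][owner(v)].append(v)
        # Simulated all-to-all: every rank receives the vertices it owns.
        ns = [[] for _ in range(p)]             # fix 2: NS_i is emptied each level
        for dst in range(p):
            for src in range(p):
                for v in outbox[src][dst]:
                    if level[v] == -1:
                        level[v] = current + 1
                        ns[dst].append(v)
        fs = ns
        current += 1                            # fix 1: increment the level
    return level

# Small undirected example graph (hypothetical data).
sim_adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
sim_levels = bfs_1d_simulated(sim_adj, 0, 2)
```

Without the increment every discovered vertex would be assigned level 1, and without clearing NSi vertices from earlier levels would be traversed again.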
Experiments
1D Flat MPI: one process per core.
1D Hybrid: one or more MPI processes within a node.
Graphs: synthetic graphs based on the R-MAT random graph
model (default m : n = 16), and a web crawl of the UK domain (133
million vertices and 5.5 billion edges).
Systems: Hopper (6392-node Cray XE6) and Franklin
(9660-node Cray XT4).
Experimental Results
[Figure: strong scaling on Franklin, in GTEPS (Giga Traversed Edges per Second); higher is better]
[Figure: strong scaling on Franklin; lower is better]
[Figure: weak scaling on Franklin; lower is better]
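For reference, the GTEPS metric used in the scaling plots is the number of edges traversed divided by the BFS running time, expressed in billions. A minimal sketch follows; the helper name and the timing value are hypothetical, not measurements from the paper.

```python
def gteps(edges_traversed, seconds):
    """Giga traversed edges per second."""
    return edges_traversed / seconds / 1e9

# Hypothetical example: traversing 5.5 billion edges in 2 seconds.
example_rate = gteps(5.5e9, 2.0)
```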
Experiments
Flat 1D algorithms are about 1.5 to 1.8 times faster than the
2D algorithms.
The 1D hybrid algorithm, although slower than the flat 1D
algorithm at smaller concurrencies, starts to perform
significantly faster at larger concurrencies.
Conclusion
Conjecture: Level synchronous BFS can be implemented
without any error with relaxed queues
Question: Can the error be bounded if we don’t have a level
synchronous algorithm?
[1] V. Agarwal, F. Petrini, D. Pasetto, and D.A. Bader. Scalable
graph exploration on multicore processors. In Proc. ACM/IEEE
Conference on Supercomputing (SC ’10), November 2010.
[2] A. Buluc and K. Madduri. Parallel breadth-first search on
distributed memory systems. In Proceedings of 2011
International Conference for High Performance Computing,
Networking, Storage and Analysis, SC ’11, pages 65:1–65:12,
New York, NY, USA, 2011. ACM.
[3] C.E. Leiserson and T.B. Schardl. A work-efficient parallel
breadth-first search algorithm (or how to cope with the
nondeterminism of reducers). In Proc. 22nd ACM Symp. on
Parallelism in Algorithms and Architectures (SPAA ’10), pages
303–314, June 2010.
Thank You :)
