SlideShare a Scribd company logo
1 of 47
Download to read offline
Fast Billion-scale Graph Computation Using a Bimodal
Block Processing Model
Hugo Gualdron1, Robson Cordeiro1, Jose Rodrigues-Jr1,
Duen Horng (Polo) Chau 2, Minsuk Kahng2, U Kang3
1University of Sao Paulo, Brazil
2Georgia Institute of Technology, Atlanta, USA
3Seoul National University, Republic of Korea
{gualdron,robson,junio}@icmc.usp.br, {polo,kahng}@gatech.edu, ukang@snu.ac.kr
Sept/2016
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 1 / 28
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 2 / 28
Introduction
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 3 / 28
Introduction
Introduction
Frameworks for large-graph processing based on secondary memory
have shown better or equal performance than distributed frameworks.
They are better to solve problems such as PageRank, Connected
Components and Triangle Counting.
Large-graph processing by minimizing computational resources is very
important for the industry.
“You can have a second computer once you have shown you know
how to use the first one.”
Paul Barham
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 4 / 28
Introduction
Contributions
M-Flash, the fastest graph computation framework to date.
A Bimodal Block Processing Model, an innovation that is able to
boost the graph computation by minimizing the I/O cost even further.
A flexible and simple programming model to easily implement
popular and essential graph algorithm including the first
single-machine billion-scale eigensolver.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 5 / 28
Graph Organization
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 6 / 28
Graph Organization
Graph Organization – Facts
Edges and vertex values do not fit in main memory.
Sequential operations on disk must be maximized, improving the
performance on HDDs and SSDs.
Real graphs have a varying density of edges (sparse and dense
regions) and these regions can be processed in different ways to
achieve superior performance.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 7 / 28
Graph Organization
Graph Organization
Each vertex has a set of attributes γ
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 8 / 28
Graph Organization
Graph Organization
Vertices are divided into intervals.
Intersections between intervals are called blocks
Edges are divided into blocks
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 8 / 28
Graph Organization
Graph Organization
M-Flash loads in memory some vertex intervals and edges are
processed using streaming.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 8 / 28
Graph Organization
Graph Organization
Edges create a source-partition (SP) when they are grouped by source interval.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 8 / 28
Graph Organization
Graph Organization
Edges create a destination-partition (DP) when they are grouped by destination interval.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 8 / 28
Graph Organization
Number of intervals to process a graph
β =
φ(T + 1) |V |
M
M = RAM size
|V | = Number of vertices of the graph
φ = Size of data per vertex
T = Number of threads
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 9 / 28
Graph Organization
Number of intervals to process a graph
β =
φ(T + 1) |V |
M
M = RAM size
|V | = Number of vertices of the graph
φ = Size of data per vertex
T = Number of threads
For example, 4 bytes of data per node, 2 threads, a graph with 2 billion
nodes, and for 1 GB RAM:
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 9 / 28
Graph Organization
Number of intervals to process a graph
β =
φ(T + 1) |V |
M
M = RAM size
|V | = Number of vertices of the graph
φ = Size of data per vertex
T = Number of threads
For example, 4 bytes of data per node, 2 threads, a graph with 2 billion
nodes, and for 1 GB RAM:
β = 23 intervals
529 blocks
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 9 / 28
Processing Model
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 10 / 28
Processing Model
Processing Model
Dense Block Processing model DBP
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 11 / 28
Processing Model
Processing Model
Dense Block Processing model DBP
An efficient model to process blocks with higher density, i.e., blocks
with high number of edges.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 11 / 28
Processing Model
Processing Model
Streaming Partition Processing Model SPP
SP = Source-partition, DP = Destination-partition
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 12 / 28
Processing Model
Processing Model
Streaming Partition Processing Model SPP
SP = Source-partition, DP = Destination-partition
An efficient model for processing sparse blocks.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 12 / 28
Processing Model
Processing Model
Bimodal Block Processing Model BBP
I/O cost as a metric of efficiency
DBP = Dense Block Processing, SSP = Streaming Partition Processing
O (DBP (G)) = O
(β + 1) |V | + |E|
B
+ β2
O (SPP (G)) = O


2 |V | + |E| + 2 ˆE
B
+ β


β = Number of intervals
V = Number of Vertices
E = Number of Edges
ˆE = Number of edges with extended size
B = Block size for I/O operations on disk
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 13 / 28
Processing Model
Processing Model
Bimodal Block Processing Model BBP
I/O cost per block G(p,q)
O DBP G(p,q)
= O
ϑφ (1 + 1/β) + ξψ
B
O SPP G(p,q)
= O
2ϑφ/β + 2ξ(φ + ψ) + ξψ
B
DBP = Dense Block Processing, SSP = Streaming Partition Processing
ξ = Number of edges withing the block G(p,q)
ϑ = Number of vertices within the interval
φ = Vertex size in bytes
ψ = Edge size in bytes
B = Block size for I/O operations on disk.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 14 / 28
Processing Model
Processing Model
Bimodal Block Processing Model BBP
Ratio SPP/DBP
O
SPP
DBP
= O
1
β
+
2ξ
ϑ
1 +
ψ
φ
BlockType G(p,q)
=
sparse, if O SPP
DBP
< 1
dense, otherwise
DBP = Dense Block Processing, SSP = Streaming Partition Processing
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 15 / 28
Processing Model
Processing Model — Preprocessing 1
Edges are divided in source intervals.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model — Preprocessing 1
The number of edges is measured at the same time to classify the block (sparse or dense).
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model — Preprocessing 2
Source Partitions are rewritten and we store only edges of sparse blocks.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Destination-partitions are generated since the source-partitions.
For processing a source-partition, vertex values of the source interval are loaded in
memory.
The edges are rewritten and divided by destination.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Destination-partitions are generated since the source-partitions.
For processing a source-partition, vertex values of the source interval are loaded in
memory.
The edges are rewritten and divided by destination.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Destination-partitions are generated since the source-partitions.
For processing a source-partition, vertex values of the source interval are loaded in
memory.
The edges are rewritten and divided by destination.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
The graph processing starts over the destination-partitions and the dense blocks.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Vertex values of the destination interval are initialized.
Next, edges of the destination partition are processed.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Next, dense blocks of the interval are processed. To do the processing, vertex values
associated with the source interval are loaded in memory.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Vertex values of the destination interval are initialized.
Next, edges of the destination partition are processed.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Next, dense blocks of the interval are processed. To do the processing, vertex values
associated with the source interval are loaded in memory.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Processing Model - an iteration
Vertex values of the destination interval are initialized.
Next, edges of the destination partition are processed.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 16 / 28
Processing Model
Programming Model
MAlgorithm: Algorithm Interface
initialize (Vertex v);
gather (Vertex u, Vertex v, EdgeData data);
process (Accum v 1, Accum v 2, Accum v out);
apply (Vertex v);
PageRank
degree(v) = out degree for Vertex v;
initialize (v): v.value = 0
gather (u, v, data): v.value += u.value / degree(u)
process (v 1, v 2, Accum v out): v out = v 1 + v 2
apply (Vertex v): v.value = 0.15 + 0.85 * v.value
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 17 / 28
Programming Model
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 18 / 28
Programming Model
Programming Model
Algorithm for Connected Components
initialize (v): v.value = v.id
gather (u, v, data): v.value = min (u.value, v.value)
process: (v 1, v 2, Accum v out): v out = min (v 1, v 2)
Algorithm for Sparse Matrix-Vector Multiplication SpMV
initialize (v): v.value = 0
gather (u, v, data): v.value += u.value * data
process: (v 1, v 2, Accum v out): v out = v 1 + v 2
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 19 / 28
Results
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 20 / 28
Results
Results
Datasets
Graph Nodes Edges Size
LiveJournal 4,847,571 68,993,773 Small
Twitter 41,652,230 1,468,365,182 Medium
YahooWeb 1,413,511,391 6,636,600,779 Large
R-Mat (Synthetic graph) 4,000,000,000 12,000,000,000 Large
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 21 / 28
Results
Results
State of the art approaches that are compared with
the experiments
GraphChi (2012)
X-Stream (2013)
TurboGraph (2013)
MMap (2014)
GridGraph (2015)
M-Flash
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 22 / 28
Results
Preprocessing time (seconds)
LiveJournal Twitter YahooWeb R-Mat
GraphChi 23 511 2,781 7,440
X-Stream 5 131 865 2,553
TurboGraph 18 582 4,694 -
MMap 17 372 636 -
M-Flash 10 206 1,265 4,837
In the worst case, M-Flash is twice slower than the best framework
(MMAP)
It reads and writes two times the entire graph on disk, which is the
third best performance, after MMap and X-Stream.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 23 / 28
Results
Runtime (in seconds) with 8GB of RAM
The symbol “-” indicates that the corresponding system failed to process
the graph or the information is not available in the respective papers.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 24 / 28
Results
Effect of Memory Size
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 25 / 28
Conclusions
Summary
1 Introduction
2 Graph Organization
3 Processing Model
4 Programming Model
5 Results
6 Conclusions
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 26 / 28
Conclusions
Conclusions
M-Flash uses an innovative design that considers the varying density
of the graph;
By using an adaptive engineering, it benefits from the best of both
worlds: stream processing and dense-block processing;
Its programming model supports classical and new algorithms
straightly adapting to the model of other similar frameworks – among
its processing possibilities: PageRank, Connected Components,
matrix-vector multiplication, eigensolver, clustering coefficient, to
name few;
To date, M-Flash is the fastest single-node graph processing
framework, beating competitors GraphChi, X-Stream, TurboGraph
and MMAP.
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 27 / 28
Conclusions
Acknowledgments
CNPq (grant 444985/2014-0)
Fapesp (grants 2016/02557-0, 2014/21483-2),
Capes
NSF (grants IIS-1563816, TWC-1526254, IIS-1217559)
GRFP (grant DGE-1148903)
Korean (MSIP) agency IITP (grant R0190-15-2012)
Thanks
Gualdron et al. (1
University of Sao Paulo, Brazil 2
Georgia Institute of Technology, Atlanta, USA 3
Seoul National University, Republic ofSept/2016 28 / 28

More Related Content

Viewers also liked

Unveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUnveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUniversidade de São Paulo
 
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagioUniversidade de São Paulo
 
Visualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisionsVisualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisionsUniversidade de São Paulo
 
Frequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data explorationFrequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data explorationUniversidade de São Paulo
 
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...Universidade de São Paulo
 
Reviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical StudyReviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical StudyUniversidade de São Paulo
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...Universidade de São Paulo
 
StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...Universidade de São Paulo
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkUniversidade de São Paulo
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Universidade de São Paulo
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesUniversidade de São Paulo
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talkjasonj383
 
Efficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingEfficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingSamantha Luber
 
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Universidade de São Paulo
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBPZihui Li
 
02 probabilistic inference in graphical models
02 probabilistic inference in graphical models02 probabilistic inference in graphical models
02 probabilistic inference in graphical modelszukun
 

Viewers also liked (20)

SuperGraph visualization
SuperGraph visualizationSuperGraph visualization
SuperGraph visualization
 
Unveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUnveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approach
 
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
 
Apresentacao vldb
Apresentacao vldbApresentacao vldb
Apresentacao vldb
 
Visualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisionsVisualization tree multiple linked analytical decisions
Visualization tree multiple linked analytical decisions
 
Frequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data explorationFrequency plot and relevance plot to enhance visual data exploration
Frequency plot and relevance plot to enhance visual data exploration
 
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...
 
Reviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical StudyReviewing Data Visualization: an Analytical Taxonomical Study
Reviewing Data Visualization: an Analytical Taxonomical Study
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...
 
StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...StructMatrix: large-scale visualization of graphs by means of structure detec...
StructMatrix: large-scale visualization of graphs by means of structure detec...
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring network
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
 
Physics of Algorithms Talk
Physics of Algorithms TalkPhysics of Algorithms Talk
Physics of Algorithms Talk
 
Efficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth FindingEfficient Belief Propagation in Depth Finding
Efficient Belief Propagation in Depth Finding
 
C04922125
C04922125C04922125
C04922125
 
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
 
Graph-based Relational Data Visualization
Graph-based RelationalData VisualizationGraph-based RelationalData Visualization
Graph-based Relational Data Visualization
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBP
 
02 probabilistic inference in graphical models
02 probabilistic inference in graphical models02 probabilistic inference in graphical models
02 probabilistic inference in graphical models
 

Similar to Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model

Parallel Biological Sequence Comparison in GPU Platforms
Parallel Biological Sequence Comparison in GPU PlatformsParallel Biological Sequence Comparison in GPU Platforms
Parallel Biological Sequence Comparison in GPU PlatformsGanesan Narayanasamy
 
Educational data mining using jmp
Educational data mining using jmpEducational data mining using jmp
Educational data mining using jmpijcsit
 
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...NAIST Machine Translation Study Group
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2izahn
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2gregoryg
 
Presentation Elpub 2013
Presentation   Elpub 2013Presentation   Elpub 2013
Presentation Elpub 2013Helder Firmino
 
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...Aboul Ella Hassanien
 
Box Plots vs Data Plots
Box Plots vs Data PlotsBox Plots vs Data Plots
Box Plots vs Data PlotsJuran Global
 
A genetic algorithm for a university weekly courses timetabling problem
A genetic algorithm for a university weekly courses timetabling problemA genetic algorithm for a university weekly courses timetabling problem
A genetic algorithm for a university weekly courses timetabling problemMotasem Smadi
 
Naturnet newsletter05 1
Naturnet newsletter05 1Naturnet newsletter05 1
Naturnet newsletter05 1WirelessInfo
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible researchYannick Wurm
 
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...IOSR Journals
 
Studio 4 - workshop introduction
Studio 4 - workshop introductionStudio 4 - workshop introduction
Studio 4 - workshop introductionDanil Nagy
 
MS Presentation
MS PresentationMS Presentation
MS Presentationrajeeja
 
Julia for R programmers
Julia for R programmersJulia for R programmers
Julia for R programmersNaren Arya
 
130321 zephyrin soh - on the effect of exploration strategies on maintenanc...
130321   zephyrin soh - on the effect of exploration strategies on maintenanc...130321   zephyrin soh - on the effect of exploration strategies on maintenanc...
130321 zephyrin soh - on the effect of exploration strategies on maintenanc...Ptidej Team
 

Similar to Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model (20)

Parallel Biological Sequence Comparison in GPU Platforms
Parallel Biological Sequence Comparison in GPU PlatformsParallel Biological Sequence Comparison in GPU Platforms
Parallel Biological Sequence Comparison in GPU Platforms
 
Educational data mining using jmp
Educational data mining using jmpEducational data mining using jmp
Educational data mining using jmp
 
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
[Paper Introduction] A Context-Aware Topic Model for Statistical Machine Tran...
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
 
Presentation Elpub 2013
Presentation   Elpub 2013Presentation   Elpub 2013
Presentation Elpub 2013
 
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...
Ns ws-liver-paper-icA Hybrid Segmentation Approach Based on Neutrosophic Sets...
 
Box Plots vs Data Plots
Box Plots vs Data PlotsBox Plots vs Data Plots
Box Plots vs Data Plots
 
A genetic algorithm for a university weekly courses timetabling problem
A genetic algorithm for a university weekly courses timetabling problemA genetic algorithm for a university weekly courses timetabling problem
A genetic algorithm for a university weekly courses timetabling problem
 
Naturnet newsletter05 1
Naturnet newsletter05 1Naturnet newsletter05 1
Naturnet newsletter05 1
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research
 
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...
Bayesian Estimation of Above-Average Performance in Tertiary Institutions: A ...
 
Studio 4 - workshop introduction
Studio 4 - workshop introductionStudio 4 - workshop introduction
Studio 4 - workshop introduction
 
Ijsea04031014
Ijsea04031014Ijsea04031014
Ijsea04031014
 
MS Presentation
MS PresentationMS Presentation
MS Presentation
 
Resume 2016-10-13
Resume 2016-10-13Resume 2016-10-13
Resume 2016-10-13
 
Julia for R programmers
Julia for R programmersJulia for R programmers
Julia for R programmers
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
 
130321 zephyrin soh - on the effect of exploration strategies on maintenanc...
130321   zephyrin soh - on the effect of exploration strategies on maintenanc...130321   zephyrin soh - on the effect of exploration strategies on maintenanc...
130321 zephyrin soh - on the effect of exploration strategies on maintenanc...
 
Jk2416381644
Jk2416381644Jk2416381644
Jk2416381644
 

More from Universidade de São Paulo

More from Universidade de São Paulo (13)

A gentle introduction to Deep Learning
A gentle introduction to Deep LearningA gentle introduction to Deep Learning
A gentle introduction to Deep Learning
 
Computação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalhoComputação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalho
 
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema HadoopIntrodução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
 
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
 
Dawarehouse e OLAP
Dawarehouse e OLAPDawarehouse e OLAP
Dawarehouse e OLAP
 
Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
 
Java generics-basics
Java generics-basicsJava generics-basics
Java generics-basics
 
Java collections-basic
Java collections-basicJava collections-basic
Java collections-basic
 
Java network-sockets-etc
Java network-sockets-etcJava network-sockets-etc
Java network-sockets-etc
 
Java streams
Java streamsJava streams
Java streams
 
Infovis tutorial
Infovis tutorialInfovis tutorial
Infovis tutorial
 
Java platform
Java platformJava platform
Java platform
 

Recently uploaded

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 

Recently uploaded (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 

Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model

  • 1. Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model Hugo Gualdron1, Robson Cordeiro1, Jose Rodrigues-Jr1, Duen Horng (Polo) Chau 2, Minsuk Kahng2, U Kang3 1University of Sao Paulo, Brazil 2Georgia Institute of Technology, Atlanta, USA 3Seoul National University, Republic of Korea {gualdron,robson,junio}@icmc.usp.br, {polo,kahng}@gatech.edu, ukang@snu.ac.kr Sept/2016 Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 1 / 28
  • 2. Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 2 / 28
  • 3. Introduction Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 3 / 28
  • 4. Introduction Introduction Frameworks for large-graph processing based on secondary memory have shown better or equal performance than distributed frameworks. They are better to solve problems such as PageRank, Connected Components and Triangle Counting. Large-graph processing by minimizing computational resources is very important for the industry. “You can have a second computer once you have shown you know how to use the first one.” Paul Barham Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 4 / 28
  • 5. Introduction Contributions M-Flash, the fastest graph computation framework to date. A Bimodal Block Processing Model, an innovation that is able to boost the graph computation by minimizing the I/O cost even further. A flexible and simple programming model to easily implement popular and essential graph algorithm including the first single-machine billion-scale eigensolver. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 5 / 28
  • 6. Graph Organization Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 6 / 28
  • 7. Graph Organization Graph Organization – Facts Edges and vertex values do not fit in main memory. Sequential operations on disk must be maximized, improving the performance on HDDs and SSDs. Real graphs have a varying density of edges (sparse and dense regions) and these regions can be processed in different ways to achieve superior performance. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 7 / 28
  • 8. Graph Organization Graph Organization Each vertex has a set of attributes γ Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 8 / 28
  • 9. Graph Organization Graph Organization Vertices are divided into intervals. Intersections between intervals are called blocks Edges are divided into blocks Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 8 / 28
  • 10. Graph Organization Graph Organization M-Flash loads in memory some vertex intervals and edges are processed using streaming. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 8 / 28
  • 11. Graph Organization Graph Organization Edges create a source-partition (SP) when they are grouped by source interval. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 8 / 28
  • 12. Graph Organization Graph Organization Edges create a destination-partition (DP) when they are grouped by destination interval. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 8 / 28
  • 13. Graph Organization Number of intervals to process a graph β = φ(T + 1) |V | M M = RAM size |V | = Number of vertices of the graph φ = Size of data per vertex T = Number of threads Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 9 / 28
  • 14. Graph Organization Number of intervals to process a graph β = φ(T + 1) |V | M M = RAM size |V | = Number of vertices of the graph φ = Size of data per vertex T = Number of threads For example, 4 bytes of data per node, 2 threads, a graph with 2 billion nodes, and for 1 GB RAM: Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 9 / 28
  • 15. Graph Organization Number of intervals to process a graph β = φ(T + 1) |V | M M = RAM size |V | = Number of vertices of the graph φ = Size of data per vertex T = Number of threads For example, 4 bytes of data per node, 2 threads, a graph with 2 billion nodes, and for 1 GB RAM: β = 23 intervals 529 blocks Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 9 / 28
  • 16. Processing Model Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 10 / 28
  • 17. Processing Model Processing Model Dense Block Processing model DBP Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 11 / 28
  • 18. Processing Model Processing Model Dense Block Processing model DBP An efficient model to process blocks with higher density, i.e., blocks with high number of edges. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 11 / 28
  • 19. Processing Model Processing Model Streaming Partition Processing Model SPP SP = Source-partition, DP = Destination-partition Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 12 / 28
  • 20. Processing Model Processing Model Streaming Partition Processing Model SPP SP = Source-partition, DP = Destination-partition An efficient model for processing sparse blocks. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 12 / 28
  • 21. Processing Model Processing Model Bimodal Block Processing Model BBP I/O cost as a metric of efficiency DBP = Dense Block Processing, SSP = Streaming Partition Processing O (DBP (G)) = O (β + 1) |V | + |E| B + β2 O (SPP (G)) = O   2 |V | + |E| + 2 ˆE B + β   β = Number of intervals V = Number of Vertices E = Number of Edges ˆE = Number of edges with extended size B = Block size for I/O operations on disk Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 13 / 28
  • 22. Processing Model Processing Model Bimodal Block Processing Model BBP I/O cost per block G(p,q) O DBP G(p,q) = O ϑφ (1 + 1/β) + ξψ B O SPP G(p,q) = O 2ϑφ/β + 2ξ(φ + ψ) + ξψ B DBP = Dense Block Processing, SSP = Streaming Partition Processing ξ = Number of edges withing the block G(p,q) ϑ = Number of vertices within the interval φ = Vertex size in bytes ψ = Edge size in bytes B = Block size for I/O operations on disk. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 14 / 28
  • 23. Processing Model Processing Model Bimodal Block Processing Model BBP Ratio SPP/DBP O SPP DBP = O 1 β + 2ξ ϑ 1 + ψ φ BlockType G(p,q) = sparse, if O SPP DBP < 1 dense, otherwise DBP = Dense Block Processing, SSP = Streaming Partition Processing Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 15 / 28
  • 24. Processing Model Processing Model — Preprocessing 1 Edges are divided in source intervals. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 25. Processing Model Processing Model — Preprocessing 1 The number of edges is measured at the same time to classify the block (sparse or dense). Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 26. Processing Model Processing Model — Preprocessing 2 Source Partitions are rewritten and we store only edges of sparse blocks. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 27. Processing Model Processing Model - an iteration Destination-partitions are generated since the source-partitions. For processing a source-partition, vertex values of the source interval are loaded in memory. The edges are rewritten and divided by destination. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 28. Processing Model Processing Model - an iteration Destination-partitions are generated since the source-partitions. For processing a source-partition, vertex values of the source interval are loaded in memory. The edges are rewritten and divided by destination. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 29. Processing Model Processing Model - an iteration Destination-partitions are generated since the source-partitions. For processing a source-partition, vertex values of the source interval are loaded in memory. The edges are rewritten and divided by destination. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 30. Processing Model Processing Model - an iteration The graph processing starts over the destination-partitions and the dense blocks. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 31. Processing Model Processing Model - an iteration Vertex values of the destination interval are initialized. Next, edges of the destination partition are processed. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 32. Processing Model Processing Model - an iteration Next, dense blocks of the interval are processed. To do the processing, vertex values associated with the source interval are loaded in memory. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 33. Processing Model Processing Model - an iteration Vertex values of the destination interval are initialized. Next, edges of the destination partition are processed. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 34. Processing Model Processing Model - an iteration Next, dense blocks of the interval are processed. To do the processing, vertex values associated with the source interval are loaded in memory. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 35. Processing Model Processing Model - an iteration Vertex values of the destination interval are initialized. Next, edges of the destination partition are processed. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 16 / 28
  • 36. Processing Model Programming Model MAlgorithm: Algorithm Interface initialize (Vertex v); gather (Vertex u, Vertex v, EdgeData data); process (Accum v 1, Accum v 2, Accum v out); apply (Vertex v); PageRank degree(v) = out degree for Vertex v; initialize (v): v.value = 0 gather (u, v, data): v.value += u.value / degree(u) process (v 1, v 2, Accum v out): v out = v 1 + v 2 apply (Vertex v): v.value = 0.15 + 0.85 * v.value Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 17 / 28
  • 37. Programming Model Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 18 / 28
  • 38. Programming Model Programming Model Algorithm for Connected Components initialize (v): v.value = v.id gather (u, v, data): v.value = min (u.value, v.value) process: (v 1, v 2, Accum v out): v out = min (v 1, v 2) Algorithm for Sparse Matrix-Vector Multiplication SpMV initialize (v): v.value = 0 gather (u, v, data): v.value += u.value * data process: (v 1, v 2, Accum v out): v out = v 1 + v 2 Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 19 / 28
  • 39. Results Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 20 / 28
  • 40. Results Results Datasets Graph Nodes Edges Size LiveJournal 4,847,571 68,993,773 Small Twitter 41,652,230 1,468,365,182 Medium YahooWeb 1,413,511,391 6,636,600,779 Large R-Mat (Synthetic graph) 4,000,000,000 12,000,000,000 Large Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 21 / 28
  • 41. Results Results State of the art approaches that are compared with the experiments GraphChi (2012) X-Stream (2013) TurboGraph (2013) MMap (2014) GridGraph (2015) M-Flash Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 22 / 28
  • 42. Results Preprocessing time (seconds) LiveJournal Twitter YahooWeb R-Mat GraphChi 23 511 2,781 7,440 X-Stream 5 131 865 2,553 TurboGraph 18 582 4,694 - MMap 17 372 636 - M-Flash 10 206 1,265 4,837 In the worst case, M-Flash is twice slower than the best framework (MMAP) It reads and writes two times the entire graph on disk, which is the third best performance, after MMap and X-Stream. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 23 / 28
  • 43. Results Runtime (in seconds) with 8GB of RAM The symbol “-” indicates that the corresponding system failed to process the graph or the information is not available in the respective papers. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 24 / 28
  • 44. Results Effect of Memory Size Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 25 / 28
  • 45. Conclusions Summary 1 Introduction 2 Graph Organization 3 Processing Model 4 Programming Model 5 Results 6 Conclusions Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 26 / 28
  • 46. Conclusions Conclusions M-Flash uses an innovative design that considers the varying density of the graph; By using an adaptive engineering, it benefits from the best of both worlds: stream processing and dense-block processing; Its programming model supports classical and new algorithms straightly adapting to the model of other similar frameworks – among its processing possibilities: PageRank, Connected Components, matrix-vector multiplication, eigensolver, clustering coefficient, to name few; To date, M-Flash is the fastest single-node graph processing framework, beating competitors GraphChi, X-Stream, TurboGraph and MMAP. Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 27 / 28
  • 47. Conclusions Acknowledgments CNPq (grant 444985/2014-0) Fapesp (grants 2016/02557-0, 2014/21483-2), Capes NSF (grants IIS-1563816, TWC-1526254, IIS-1217559) GRFP (grant DGE-1148903) Korean (MSIP) agency IITP (grant R0190-15-2012) Thanks Gualdron et al. (1 University of Sao Paulo, Brazil 2 Georgia Institute of Technology, Atlanta, USA 3 Seoul National University, Republic ofSept/2016 28 / 28