SlideShare a Scribd company logo
Pregel: A System for Large-Scale Graph Processing
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 1 / 40
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 2 / 40
Introduction
Graphs provide a flexible abstraction for describing relationships be-
tween discrete objects.
Many problems can be modeled by graphs and solved with appro-
priate graph algorithms.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 3 / 40
Large Graph
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 4 / 40
Large-Scale Graph Processing
Large graphs need large-scale processing.
A large graph either cannot fit into memory of single computer or
it fits with huge cost.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 5 / 40
Question
Can we use platforms like MapReduce or Spark, which are based on data-parallel
model, for large-scale graph proceeding?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 6 / 40
Data-Parallel Model for Large-Scale Graph Processing
The platforms that have worked well for developing parallel applica-
tions are not necessarily effective for large-scale graph problems.
Why?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 7 / 40
Graph Algorithms Characteristics (1/2)
Unstructured problems
• Difficult to extract parallelism based on partitioning of the data: the
irregular structure of graphs.
• Limited scalability: unbalanced computational loads resulting from
poorly partitioned data.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
Graph Algorithms Characteristics (1/2)
Unstructured problems
• Difficult to extract parallelism based on partitioning of the data: the
irregular structure of graphs.
• Limited scalability: unbalanced computational loads resulting from
poorly partitioned data.
Data-driven computations
• Difficult to express parallelism based on partitioning of computation:
the structure of computations in the algorithm is not known a priori.
• The computations are dictated by nodes and links of the graph.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
Graph Algorithms Characteristics (2/2)
Poor data locality
• The computations and data access patterns do not have much local-
ity: the irregular structure of graphs.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
Graph Algorithms Characteristics (2/2)
Poor data locality
• The computations and data access patterns do not have much local-
ity: the irregular structure of graphs.
High data access to computation ratio
• Graph algorithms are often based on exploring the structure of a
graph to perform computations on the graph data.
• Runtime can be dominated by waiting memory fetches: low locality.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
Proposed Solution
Graph-Parallel Processing
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
Proposed Solution
Graph-Parallel Processing
Computation typically depends on the neighbors.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
Graph-Parallel Processing
Restricts the types of computation.
New techniques to partition and distribute graphs.
Exploit graph structure.
Executes graph algorithms orders-of-magnitude faster than more
general data-parallel systems.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 11 / 40
Data-Parallel vs. Graph-Parallel Computation
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 12 / 40
Data-Parallel vs. Graph-Parallel Computation
Data-parallel computation
• Record-centric view of data.
• Parallelism: processing independent data on separate resources.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
Data-Parallel vs. Graph-Parallel Computation
Data-parallel computation
• Record-centric view of data.
• Parallelism: processing independent data on separate resources.
Graph-parallel computation
• Vertex-centric view of graphs.
• Parallelism: partitioning graph (dependent) data across processing
resources, and resolving dependencies (along edges) through
iterative computation and communication.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 14 / 40
Seven Bridges of K¨onigsberg
Finding a walk through the city that would cross each bridge once
and only once.
Euler proved that the problem has no solution.
Map of K¨onigsberg in Euler’s time, highlighting the river Pregel and the bridges.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 15 / 40
Pregel
Large-scale graph-parallel processing platform developed at Google.
Inspired by bulk synchronous parallel (BSP) model.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 16 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
• A mechanism for the efficient barrier synchronization for all or
a subset of the processes.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
• A mechanism for the efficient barrier synchronization for all or
a subset of the processes.
• There are no special combining, replicating, or broadcasting fa-
cilities.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (2/2)
All vertices update in parallel (at the same time).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 18 / 40
Vertex-Centric Programs
Think as a vertex.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Vertex-Centric Programs
Think as a vertex.
Each vertex computes individually its value: in parallel
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Vertex-Centric Programs
Think as a vertex.
Each vertex computes individually its value: in parallel
Each vertex can see its local context, and updates its value accord-
ingly.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Data Model
A directed graph that stores the program state, e.g., the current
value.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 20 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Vertices communicate directly with one another by sending mes-
sages.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
A halted vertex can be active if it receives a message.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
A halted vertex can be active if it receives a message.
The whole algorithm terminates when:
• All vertices are simultaneously inactive.
• There are no messages in transit.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Runs after each superstep.
Each vertex can provide a value to an aggregator in superstep S.
The system combines those values and the resulting value is made
available to all vertices in superstep S + 1.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Runs after each superstep.
Each vertex can provide a value to an aggregator in superstep S.
The system combines those values and the resulting value is made
available to all vertices in superstep S + 1.
A number of predefined aggregators, e.g., min, max, sum.
Aggregation operators should be commutative and associative.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Example: Max Value (1/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 24 / 40
Example: Max Value (2/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 25 / 40
Example: Max Value (3/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 26 / 40
Example: Max Value (4/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 27 / 40
Example: PageRank
Update ranks in parallel.
Iterate until convergence.
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 28 / 40
Example: PageRank
Pregel_PageRank(i, messages):
// receive all the messages
total = 0
foreach(msg in messages):
total = total + msg
// update the rank of this vertex
R[i] = 0.15 + total
// send new messages to neighbors
foreach(j in out_neighbors[i]):
sendmsg(R[i] * wij) to vertex j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 29 / 40
Partitioning the Graph
The pregel library divides a graph into a number of partitions.
Each consisting of a set of vertices and all of those vertices’ outgoing
edges.
Vertices are assigned to partitions based on their vertex-ID (e.g.,
hash(ID)).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 30 / 40
Implementation (1/4)
Master-worker model.
User programs are copied on machines.
One copy becomes the master.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 31 / 40
Implementation (2/4)
The master is responsible for
• Coordinating workers activity.
• Determining the number of partitions.
Each worker is responsible for
• Maintaining the state of its partitions.
• Executing the user’s Compute() method on its vertices.
• Managing messages to and from other workers.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 32 / 40
Implementation (3/4)
The master assigns one or more partitions to each worker.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
Implementation (3/4)
The master assigns one or more partitions to each worker.
The master assigns a portion of user input to each worker.
• Set of records containing an arbitrary number of vertices and edges.
• If a worker loads a vertex that belongs to that worker’s partitions,
the appropriate data structures are immediately updated.
• Otherwise the worker enqueues a message to the remote peer that
owns the vertex.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
Implementation (4/4)
After the input has finished loading, all vertices are marked as active.
The master instructs each worker to perform a superstep.
After the computation halts, the master may instruct each worker
to save its portion of the graph.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 34 / 40
Combiner
Sending a message between workers incurs some overhead: use com-
biner.
This can be reduced in some cases: sometimes vertices only care
about a summary value for the messages it is sent (e.g., min, max,
sum, avg).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 35 / 40
Fault Tolerance (1/2)
Fault tolerance is achieved through checkpointing.
At start of each superstep, master tells workers to save their state:
• Vertex values, edge values, incoming messages
• Saved to persistent storage
Master saves aggregator values (if any).
This is not necessarily done at every superstep: costly
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 36 / 40
Fault Tolerance (2/2)
When master detects one or more worker failures:
• All workers revert to last checkpoint.
• Continue from there.
• That is a lot of repeated work.
• At least it is better than redoing the whole job.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 37 / 40
Pregel Limitations
Inefficient if different regions of the graph converge at different
speed.
Can suffer if one task is more expensive than the others.
Runtime of each phase is determined by the slowest machine.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 38 / 40
Pregel Summary
Bulk Synchronous Parallel model
Vertex-centric
Superstep: sequence of iterations
Master-worker model
Communication: message passing
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 39 / 40
Questions?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 40 / 40

More Related Content

What's hot

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Samy Dindane
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
DataWorks Summit
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Mo Patel
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
Databricks
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Databricks
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
Databricks
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
Databricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Databricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
Databricks
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
Databricks
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdf
Rinat Abdullin
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 

What's hot (20)

Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdf
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 

Viewers also liked

Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
Amir Payberah
 
MapReduce
MapReduceMapReduce
MapReduce
Amir Payberah
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processing
Amir Payberah
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
Amir Payberah
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
Amir Payberah
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
Amir Payberah
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and Spanner
Amir Payberah
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
Amir Payberah
 
Threads
ThreadsThreads
Threads
Amir Payberah
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
Amir Payberah
 
Mesos and YARN
Mesos and YARNMesos and YARN
Mesos and YARN
Amir Payberah
 
Main Memory - Part2
Main Memory - Part2Main Memory - Part2
Main Memory - Part2
Amir Payberah
 
IO Systems
IO SystemsIO Systems
IO Systems
Amir Payberah
 
Security
SecuritySecurity
Security
Amir Payberah
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Amir Payberah
 
Protection
ProtectionProtection
Protection
Amir Payberah
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
Amir Payberah
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
Amir Payberah
 
Storage
StorageStorage
Storage
Amir Payberah
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
Amir Payberah
 

Viewers also liked (20)

Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
 
MapReduce
MapReduceMapReduce
MapReduce
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processing
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and Spanner
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
 
Threads
ThreadsThreads
Threads
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
 
Mesos and YARN
Mesos and YARNMesos and YARN
Mesos and YARN
 
Main Memory - Part2
Main Memory - Part2Main Memory - Part2
Main Memory - Part2
 
IO Systems
IO SystemsIO Systems
IO Systems
 
Security
SecuritySecurity
Security
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Protection
ProtectionProtection
Protection
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
 
Storage
StorageStorage
Storage
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
 

Similar to Graph processing - Pregel

Spark
SparkSpark
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
Amir Payberah
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
Amir Payberah
 
Scala
ScalaScala
Chubby
ChubbyChubby
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1
Amir Payberah
 
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
FogGuru MSCA Project
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2
Amir Payberah
 
Bigtable
BigtableBigtable
Bigtable
Amir Payberah
 
Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...
Ahmed Koriem
 
Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1
Amir Payberah
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
gregoryg
 
MHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.pptMHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.ppt
BiHongPhc
 
Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2
Amir Payberah
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
Amir Payberah
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_tworamikaurraminder
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
Po-Chuan Chen
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
Amir Payberah
 
Transformer and BERT
Transformer and BERTTransformer and BERT
Transformer and BERT
Hao(Robin) Dong
 
PLA Minimization -Testing
PLA Minimization -TestingPLA Minimization -Testing
PLA Minimization -Testing
Dr.YNM
 

Similar to Graph processing - Pregel (20)

Spark
SparkSpark
Spark
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
 
Scala
ScalaScala
Scala
 
Chubby
ChubbyChubby
Chubby
 
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1
 
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2
 
Bigtable
BigtableBigtable
Bigtable
 
Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...
 
Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
 
MHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.pptMHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.ppt
 
Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_two
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
 
Transformer and BERT
Transformer and BERTTransformer and BERT
Transformer and BERT
 
PLA Minimization -Testing
PLA Minimization -TestingPLA Minimization -Testing
PLA Minimization -Testing
 

More from Amir Payberah

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
Amir Payberah
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
Amir Payberah
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
Amir Payberah
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
Amir Payberah
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2
Amir Payberah
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
Amir Payberah
 
File System Interface
File System InterfaceFile System Interface
File System Interface
Amir Payberah
 
Main Memory - Part1
Main Memory - Part1Main Memory - Part1
Main Memory - Part1
Amir Payberah
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
Amir Payberah
 

More from Amir Payberah (9)

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
 
File System Interface
File System InterfaceFile System Interface
File System Interface
 
Main Memory - Part1
Main Memory - Part1Main Memory - Part1
Main Memory - Part1
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
 

Recently uploaded

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 

Graph processing - Pregel

  • 1. Pregel: A System for Large-Scale Graph Processing Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 1 / 40
  • 2. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 2 / 40
  • 3. Introduction Graphs provide a flexible abstraction for describing relationships be- tween discrete objects. Many problems can be modeled by graphs and solved with appro- priate graph algorithms. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 3 / 40
  • 4. Large Graph Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 4 / 40
  • 5. Large-Scale Graph Processing Large graphs need large-scale processing. A large graph either cannot fit into memory of single computer or it fits with huge cost. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 5 / 40
  • 6. Question Can we use platforms like MapReduce or Spark, which are based on data-parallel model, for large-scale graph proceeding? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 6 / 40
  • 7. Data-Parallel Model for Large-Scale Graph Processing The platforms that have worked well for developing parallel applica- tions are not necessarily effective for large-scale graph problems. Why? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 7 / 40
  • 8. Graph Algorithms Characteristics (1/2) Unstructured problems • Difficult to extract parallelism based on partitioning of the data: the irregular structure of graphs. • Limited scalability: unbalanced computational loads resulting from poorly partitioned data. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
  • 9. Graph Algorithms Characteristics (1/2) Unstructured problems • Difficult to extract parallelism based on partitioning of the data: the irregular structure of graphs. • Limited scalability: unbalanced computational loads resulting from poorly partitioned data. Data-driven computations • Difficult to express parallelism based on partitioning of computation: the structure of computations in the algorithm is not known a priori. • The computations are dictated by nodes and links of the graph. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
  • 10. Graph Algorithms Characteristics (2/2) Poor data locality • The computations and data access patterns do not have much local- ity: the irregular structure of graphs. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
  • 11. Graph Algorithms Characteristics (2/2) Poor data locality • The computations and data access patterns do not have much local- ity: the irregular structure of graphs. High data access to computation ratio • Graph algorithms are often based on exploring the structure of a graph to perform computations on the graph data. • Runtime can be dominated by waiting memory fetches: low locality. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
  • 12. Proposed Solution Graph-Parallel Processing Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
  • 13. Proposed Solution Graph-Parallel Processing Computation typically depends on the neighbors. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
  • 14. Graph-Parallel Processing Restricts the types of computation. New techniques to partition and distribute graphs. Exploit graph structure. Executes graph algorithms orders-of-magnitude faster than more general data-parallel systems. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 11 / 40
  • 15. Data-Parallel vs. Graph-Parallel Computation Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 12 / 40
  • 16. Data-Parallel vs. Graph-Parallel Computation Data-parallel computation • Record-centric view of data. • Parallelism: processing independent data on separate resources. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
  • 17. Data-Parallel vs. Graph-Parallel Computation Data-parallel computation • Record-centric view of data. • Parallelism: processing independent data on separate resources. Graph-parallel computation • Vertex-centric view of graphs. • Parallelism: partitioning graph (dependent) data across processing resources, and resolving dependencies (along edges) through iterative computation and communication. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
  • 18. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 14 / 40
  • 19. Seven Bridges of K¨onigsberg Finding a walk through the city that would cross each bridge once and only once. Euler proved that the problem has no solution. Map of K¨onigsberg in Euler’s time, highlighting the river Pregel and the bridges. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 15 / 40
  • 20. Pregel Large-scale graph-parallel processing platform developed at Google. Inspired by bulk synchronous parallel (BSP) model. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 16 / 40
  • 21. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 22. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 23. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 24. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. • A mechanism for the efficient barrier synchronization for all or a subset of the processes. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 25. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. • A mechanism for the efficient barrier synchronization for all or a subset of the processes. • There are no special combining, replicating, or broadcasting fa- cilities. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 26. Bulk Synchronous Parallel (2/2) All vertices update in parallel (at the same time). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 18 / 40
  • 27. Vertex-Centric Programs Think as a vertex. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 28. Vertex-Centric Programs Think as a vertex. Each vertex computes individually its value: in parallel Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 29. Vertex-Centric Programs Think as a vertex. Each vertex computes individually its value: in parallel Each vertex can see its local context, and updates its value accord- ingly. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 30. Data Model A directed graph that stores the program state, e.g., the current value. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 20 / 40
  • 31. Execution Model (1/3) Applications run in sequence of iterations: supersteps Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 32. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 33. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 34. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Vertices communicate directly with one another by sending mes- sages. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 35. Execution Model (2/3) Superstep 0: all vertices are in the active state. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 36. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 37. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. A halted vertex can be active if it receives a message. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 38. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. A halted vertex can be active if it receives a message. The whole algorithm terminates when: • All vertices are simultaneously inactive. • There are no messages in transit. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 39. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 40. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Runs after each superstep. Each vertex can provide a value to an aggregator in superstep S. The system combines those values and the resulting value is made available to all vertices in superstep S + 1. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 41. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Runs after each superstep. Each vertex can provide a value to an aggregator in superstep S. The system combines those values and the resulting value is made available to all vertices in superstep S + 1. A number of predefined aggregators, e.g., min, max, sum. Aggregation operators should be commutative and associative. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 42. Example: Max Value (1/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 24 / 40
  • 43. Example: Max Value (2/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 25 / 40
  • 44. Example: Max Value (3/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 26 / 40
  • 45. Example: Max Value (4/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 27 / 40
  • 46. Example: PageRank Update ranks in parallel. Iterate until convergence. R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 28 / 40
  • 47. Example: PageRank Pregel_PageRank(i, messages): // receive all the messages total = 0 foreach(msg in messages): total = total + msg // update the rank of this vertex R[i] = 0.15 + total // send new messages to neighbors foreach(j in out_neighbors[i]): sendmsg(R[i] * wij) to vertex j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 29 / 40
  • 48. Partitioning the Graph The pregel library divides a graph into a number of partitions. Each consisting of a set of vertices and all of those vertices’ outgoing edges. Vertices are assigned to partitions based on their vertex-ID (e.g., hash(ID)). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 30 / 40
  • 49. Implementation (1/4) Master-worker model. User programs are copied on machines. One copy becomes the master. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 31 / 40
  • 50. Implementation (2/4) The master is responsible for • Coordinating workers activity. • Determining the number of partitions. Each worker is responsible for • Maintaining the state of its partitions. • Executing the user’s Compute() method on its vertices. • Managing messages to and from other workers. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 32 / 40
  • 51. Implementation (3/4) The master assigns one or more partitions to each worker. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
  • 52. Implementation (3/4) The master assigns one or more partitions to each worker. The master assigns a portion of user input to each worker. • Set of records containing an arbitrary number of vertices and edges. • If a worker loads a vertex that belongs to that worker’s partitions, the appropriate data structures are immediately updated. • Otherwise the worker enqueues a message to the remote peer that owns the vertex. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
  • 53. Implementation (4/4) After the input has finished loading, all vertices are marked as active. The master instructs each worker to perform a superstep. After the computation halts, the master may instruct each worker to save its portion of the graph. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 34 / 40
  • 54. Combiner Sending a message between workers incurs some overhead: use com- biner. This can be reduced in some cases: sometimes vertices only care about a summary value for the messages it is sent (e.g., min, max, sum, avg). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 35 / 40
  • 55. Fault Tolerance (1/2) Fault tolerance is achieved through checkpointing. At start of each superstep, master tells workers to save their state: • Vertex values, edge values, incoming messages • Saved to persistent storage Master saves aggregator values (if any). This is not necessarily done at every superstep: costly Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 36 / 40
  • 56. Fault Tolerance (2/2) When master detects one or more worker failures: • All workers revert to last checkpoint. • Continue from there. • That is a lot of repeated work. • At least it is better than redoing the whole job. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 37 / 40
  • 57. Pregel Limitations Inefficient if different regions of the graph converge at different speed. Can suffer if one task is more expensive than the others. Runtime of each phase is determined by the slowest machine. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 38 / 40
  • 58. Pregel Summary Bulk Synchronous Parallel model Vertex-centric Superstep: sequence of iterations Master-worker model Communication: message passing Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 39 / 40
  • 59. Questions? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 40 / 40