SlideShare a Scribd company logo
1 of 49
Download to read offline
A Lightweight Infrastructure for
Graph Analytics
Donald Nguyen
Andrew Lenharth and Keshav Pingali
The University of Texas at Austin
What is Graph Analytics?
• Algorithms to compute properties of
graphs
– Connected components, shortest
paths, centrality measures,
diameter, PageRank, …
• Many applications
– Google, path routing, friend
recommendations, network analysis
• Difficult to implement on a large
scale
– Data sets are large, data accesses
are irregular
– Need parallelismand efficient
runtimes
Bridges of Konigsberg
The Social Network
Contributions
• A lightweight infrastructure for graph
analytics based on improvements to the
Galois system
– Example: scalable priority scheduling
• Orders of magnitude faster than third-party
graph DSLs for many applications and inputs
– Galois permits better algorithms to be written
– Galois infrastructure more scalable
3
Outline
1. Abstraction for graph analytics applications
and implementation in Galois system
2. One piece of infrastructure: priority
scheduling
3. Evaluation of Galois system versus graph
analytics DSLs
4
Example: SSSP
• Find the shortest distance from source
node to all other nodes in a graph
– Label nodes with tentativedistance
– Assume non-negativeedge weights
• Algorithms
– Chaotic relaxation O(2V)
– Bellman-Ford O(VE)
– Dijkstra’s algorithm O(E log V)
• Uses priority queue
– Δ-stepping
• Uses sequence of bags to prioritize work
• Δ=1, O(E log V)
• Δ=∞, O(VE)
• Different algorithms are different
schedules for applying relaxations
– SSSP needs priority scheduling for work
efficiency
1
9
2
3
1∞
∞
0
1 9
4
3
1
3
Operator
Edge relaxation
∞
2
5
Example: SSSP
• Find the shortest distance from source
node to all other nodes in a graph
– Label nodes with tentativedistance
– Assume non-negativeedge weights
• Algorithms
– Chaotic relaxation O(2V)
– Bellman-Ford O(VE)
– Dijkstra’s algorithm O(E log V)
• Uses priority queue
– Δ-stepping
• Uses sequence of bags to prioritize work
• Δ=1, O(E log V)
• Δ=∞, O(VE)
• Different algorithms are different
schedules for applying relaxations
– SSSP needs priority scheduling for work
efficiency
1
9
2
3
1∞
∞
0
Active edge
1 9
4
3
1
3
Operator
Edge relaxation
∞
2
6
Example: SSSP
• Find the shortest distance from source
node to all other nodes in a graph
– Label nodes with tentativedistance
– Assume non-negativeedge weights
• Algorithms
– Chaotic relaxation O(2V)
– Bellman-Ford O(VE)
– Dijkstra’s algorithm O(E log V)
• Uses priority queue
– Δ-stepping
• Uses sequence of bags to prioritize work
• Δ=1, O(E log V)
• Δ=∞, O(VE)
• Different algorithms are different
schedules for applying relaxations
– SSSP needs priority scheduling for work
efficiency
1
9
2
3
1∞
∞
0
Active edge
Neighborhood
Activity
1 9
4
3
1
3
Operator
Edge relaxation
∞
2
7
Parallel Program = Operator + Schedule + Parallel Data Structure
Algorithm
8
Parallel Program = Operator + Schedule + Parallel Data Structure
• What is the operator?
– Ligra, PowerGraph:only vertex programs
– Galois: Unrestricted, may even morph graph by
adding/removing nodes and edges
• Where/When does it execute?
– Autonomous scheduling: activities execute
transactionally
– Coordinated scheduling: activities execute in rounds
• Read values refer to previous rounds
• Multiple updates to the same location are resolved with
reduction, etc.
Algorithm
9
Parallel Program = Operator + Schedule + Parallel Data Structure
• What is the operator?
– Ligra, PowerGraph:only vertex programs
– Galois: Unrestricted, may even morph graph by
adding/removing nodes and edges
• Where/When does it execute?
– Autonomous scheduling: activities execute
transactionally
– Coordinated scheduling: activities execute in rounds
• Read values refer to previous rounds
• Multiple updates to the same location are resolved with
reduction, etc.
Algorithm
10
Parallel Program = Operator + Schedule + Parallel Data Structure
• What is the operator?
– Ligra, PowerGraph:only vertex programs
– Galois: Unrestricted, may even morph graph by
adding/removing nodes and edges
• Where/When does it execute?
– Autonomous scheduling: activities execute
transactionally
– Coordinated scheduling: activities execute in rounds
• Read values refer to previous rounds
• Multiple updates to the same location are resolved with
reduction, etc.
Algorithm
11
Parallel Program = Operator + Schedule + Parallel Data Structure
• What is the operator?
– Ligra, PowerGraph:only vertex programs
– Galois: Unrestricted, may even morph graph by
adding/removing nodes and edges
• Where/When does it execute?
– Autonomous scheduling: activities execute
transactionally
– Coordinated scheduling: activities execute in rounds
• Read values refer to previous rounds
• Multiple updates to the same location are resolved with
reduction, etc.
Algorithm
ba
c
12
Galois System
Parallel Program = Operator + Schedule + Parallel Data Structure
Operators Schedules Data StructureAPI Calls
Schedulers Data Structures
AllocatorThread Primitives
Multicore
User Program
Galois System
13
Outline
1. Abstraction for graph analytics applications
and implementation in Galois system
2. One piece of infrastructure: priority
scheduling
3. Evaluation of Galois system versus DSLs
14
Galois Infrastructure for Scheduling
• Library of different schedulers
– FIFO, LIFO, Priority, …
– DSL for specifying scheduling policies
• Nguyen and Pingali (ASPLOS ’11)
15
Priority Scheduling
• Concurrent priority queue
– Implemented in TBB, java.concurrent
– Does not scale: our tasks are very fine-grain
• Sequence of bags
– One bag per priority level
– High overhead for finding non-empty, urgent-
priority bags
16
Ordered by Metric (OBIM)
• Sparse sequence of Bags
– One Bag per priority level
– To reduce synchronization
• Sequence is replicated among cores
• Bag is distributed between cores
– To reduce overhead of finding work
• A reduction tree over machine topology to get
estimate of current urgent priority
17
Scheduling Bag
c1 c2
Head
Package 1
18
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
19
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
20
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
21
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
22
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
23
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
24
Bag
c1 c2
Head
Package 2
Scheduling Bag
c1 c2
Head
Package 1
25
Bag
c1 c2
Head
Package 2
Priority Map
Global Map
Scheduling
Bags
1 30
26
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
27
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
28
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
13
29
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
13
30
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
13
3
31
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
1
3
32
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
1
3
15
33
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
1
3
15
34
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
1
3
15
5
5
35
Priority Map
0 1
Global Map
Local Maps
Core 1 Core 2
Scheduling
Bags
1 3
1 30
1
3
1
5
5
36
Outline
1. Abstraction for graph analytics applications
and implementation in Galois system
2. One piece of infrastructure: priority
scheduling
3. Evaluation of Galois system versus graph
analytics DSLs
37
Graph Analytics DSLs
• Easy to implement their APIs on top of Galois
system
– Galois implementations called PowerGraph-g,
Ligra-g, etc.
– About 200-300 lines of code each
38
Ÿ GraphLab Low et al. (UAI ’10)
Ÿ PowerGraph Gonzalez et al. (OSDI ’12)
Ÿ GraphChi Kyrola et al. (OSDI ’12)
Ÿ Ligra Shun and Blelloch (PPoPP ’13)
Ÿ Pregel Malewicz et al. (SIGMOD ‘10)
Ÿ …
Evaluation
• Platform
– 40-core system
• 4 socket, Xeon E7-4860
(Westmere)
– 128 GB RAM
• Applications
– Breadth-first search (bfs)
– Connected components (cc)
– Approximate diameter (dia)
– PageRank (pr)
– Single-source shortest paths
(sssp)
• Inputs
– twitter50 (50 M nodes, 2 B
edges, low-diameter)
– road (20 M nodes, 60 M
edges, high-diameter)
• Comparison with
– Ligra (shared memory)
– PowerGraph (distributed)
• Runtimes with 64 16-core
machines (1024 cores) does not
beat one 40-core machine using
Galois
39
40
Ligra runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
Ligra runtime
Ligra⌫g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
Range
<= 1
> 1
41
Ligra runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
Ligra runtime
Ligra⌫g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
Range
<= 1
> 1
42
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
43
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
44
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
• The best algorithm may
require application-
specific scheduling
– Priority scheduling for
SSSP
• The best algorithm may
not be expressible as a
vertex program
– Connected components
with union-find
• Autonomous scheduling
required for high-
diameter graphs
– Coordinated scheduling
uses many rounds and
has too much overhead
45
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
• The best algorithm may
require application-
specific scheduling
– Priority scheduling for
SSSP
• The best algorithm may
not be expressible as a
vertex program
– Connected components
with union-find
• Autonomous scheduling
required for high-
diameter graphs
– Coordinated scheduling
uses many rounds and
has too much overhead
46
PowerGraph runtime
Galois runtime
1
10
100
1000
10000
1
10
100
1000
10000
bfs cc dia pr sssp
PowerGraph runtime
PowerGraph⇠g runtime
1
2
1
2
4
6
bfs cc dia pr sssp
• The best algorithm may
require application-
specific scheduling
– Priority scheduling for
SSSP
• The best algorithm may
not be expressible as a
vertex program
– Connected components
with union-find
• Autonomous scheduling
required for high-
diameter graphs
– Coordinated scheduling
uses many rounds and
has too much overhead
47
See Paper For
• More inputs, applications
• Detailed discussion of performance results
• More infrastructure: non-priority scheduling,
memory management
• Results for out-of-core DSLs
48
Summary
• Galois programming model is general
– Permits efficient algorithms to be written
• Galois infrastructure is lightweight
• Simpler graph DSLs can be layered on top
– Can perform better than existing systems
• Download at http://iss.ices.utexas.edu/galois
49

More Related Content

What's hot

Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...Databricks
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Yuanyuan Tian
 
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...Databricks
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CARobert Metzger
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015Asaf Ben Gal
 
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Databricks
 
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Databricks
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenDatabricks
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processingsscdotopen
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 

What's hot (20)

Reading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDMReading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDM
 
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather...
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
 
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...
Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Co...
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015
 
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
 
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 

Similar to A Lightweight Infrastructure for Graph Analytics

Galois: A System for Parallel Execution of Irregular Algorithms
Galois: A System for Parallel Execution of Irregular AlgorithmsGalois: A System for Parallel Execution of Irregular Algorithms
Galois: A System for Parallel Execution of Irregular AlgorithmsDonald Nguyen
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Lecutre-6 Datapath Design.ppt
Lecutre-6 Datapath Design.pptLecutre-6 Datapath Design.ppt
Lecutre-6 Datapath Design.pptRaJibRaju3
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAnubhav Jain
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Globus
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1Stefanie Zhao
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsYu Liu
 

Similar to A Lightweight Infrastructure for Graph Analytics (20)

Galois: A System for Parallel Execution of Irregular Algorithms
Galois: A System for Parallel Execution of Irregular AlgorithmsGalois: A System for Parallel Execution of Irregular Algorithms
Galois: A System for Parallel Execution of Irregular Algorithms
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Lecutre-6 Datapath Design.ppt
Lecutre-6 Datapath Design.pptLecutre-6 Datapath Design.ppt
Lecutre-6 Datapath Design.ppt
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
MapReduce with Hadoop
MapReduce with HadoopMapReduce with Hadoop
MapReduce with Hadoop
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
Pig Experience
Pig ExperiencePig Experience
Pig Experience
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
week15a.pdf
week15a.pdfweek15a.pdf
week15a.pdf
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
chapter3part1.ppt
chapter3part1.pptchapter3part1.ppt
chapter3part1.ppt
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
 

Recently uploaded

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 

Recently uploaded (20)

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

A Lightweight Infrastructure for Graph Analytics

  • 1. A Lightweight Infrastructure for Graph Analytics Donald Nguyen Andrew Lenharth and Keshav Pingali The University of Texas at Austin
  • 2. What is Graph Analytics? • Algorithms to compute properties of graphs – Connected components, shortest paths, centrality measures, diameter, PageRank, … • Many applications – Google, path routing, friend recommendations, network analysis • Difficult to implement on a large scale – Data sets are large, data accesses are irregular – Need parallelismand efficient runtimes Bridges of Konigsberg The Social Network
  • 3. Contributions • A lightweight infrastructure for graph analytics based on improvements to the Galois system – Example: scalable priority scheduling • Orders of magnitude faster than third-party graph DSLs for many applications and inputs – Galois permits better algorithms to be written – Galois infrastructure more scalable 3
  • 4. Outline 1. Abstraction for graph analytics applications and implementation in Galois system 2. One piece of infrastructure: priority scheduling 3. Evaluation of Galois system versus graph analytics DSLs 4
  • 5. Example: SSSP • Find the shortest distance from source node to all other nodes in a graph – Label nodes with tentativedistance – Assume non-negativeedge weights • Algorithms – Chaotic relaxation O(2V) – Bellman-Ford O(VE) – Dijkstra’s algorithm O(E log V) • Uses priority queue – Δ-stepping • Uses sequence of bags to prioritize work • Δ=1, O(E log V) • Δ=∞, O(VE) • Different algorithms are different schedules for applying relaxations – SSSP needs priority scheduling for work efficiency 1 9 2 3 1∞ ∞ 0 1 9 4 3 1 3 Operator Edge relaxation ∞ 2 5
  • 6. Example: SSSP • Find the shortest distance from source node to all other nodes in a graph – Label nodes with tentativedistance – Assume non-negativeedge weights • Algorithms – Chaotic relaxation O(2V) – Bellman-Ford O(VE) – Dijkstra’s algorithm O(E log V) • Uses priority queue – Δ-stepping • Uses sequence of bags to prioritize work • Δ=1, O(E log V) • Δ=∞, O(VE) • Different algorithms are different schedules for applying relaxations – SSSP needs priority scheduling for work efficiency 1 9 2 3 1∞ ∞ 0 Active edge 1 9 4 3 1 3 Operator Edge relaxation ∞ 2 6
  • 7. Example: SSSP • Find the shortest distance from source node to all other nodes in a graph – Label nodes with tentativedistance – Assume non-negativeedge weights • Algorithms – Chaotic relaxation O(2V) – Bellman-Ford O(VE) – Dijkstra’s algorithm O(E log V) • Uses priority queue – Δ-stepping • Uses sequence of bags to prioritize work • Δ=1, O(E log V) • Δ=∞, O(VE) • Different algorithms are different schedules for applying relaxations – SSSP needs priority scheduling for work efficiency 1 9 2 3 1∞ ∞ 0 Active edge Neighborhood Activity 1 9 4 3 1 3 Operator Edge relaxation ∞ 2 7
  • 8. Parallel Program = Operator + Schedule + Parallel Data Structure Algorithm 8
  • 9. Parallel Program = Operator + Schedule + Parallel Data Structure • What is the operator? – Ligra, PowerGraph:only vertex programs – Galois: Unrestricted, may even morph graph by adding/removing nodes and edges • Where/When does it execute? – Autonomous scheduling: activities execute transactionally – Coordinated scheduling: activities execute in rounds • Read values refer to previous rounds • Multiple updates to the same location are resolved with reduction, etc. Algorithm 9
  • 10. Parallel Program = Operator + Schedule + Parallel Data Structure • What is the operator? – Ligra, PowerGraph:only vertex programs – Galois: Unrestricted, may even morph graph by adding/removing nodes and edges • Where/When does it execute? – Autonomous scheduling: activities execute transactionally – Coordinated scheduling: activities execute in rounds • Read values refer to previous rounds • Multiple updates to the same location are resolved with reduction, etc. Algorithm 10
  • 11. Parallel Program = Operator + Schedule + Parallel Data Structure • What is the operator? – Ligra, PowerGraph:only vertex programs – Galois: Unrestricted, may even morph graph by adding/removing nodes and edges • Where/When does it execute? – Autonomous scheduling: activities execute transactionally – Coordinated scheduling: activities execute in rounds • Read values refer to previous rounds • Multiple updates to the same location are resolved with reduction, etc. Algorithm 11
  • 12. Parallel Program = Operator + Schedule + Parallel Data Structure • What is the operator? – Ligra, PowerGraph:only vertex programs – Galois: Unrestricted, may even morph graph by adding/removing nodes and edges • Where/When does it execute? – Autonomous scheduling: activities execute transactionally – Coordinated scheduling: activities execute in rounds • Read values refer to previous rounds • Multiple updates to the same location are resolved with reduction, etc. Algorithm ba c 12
  • 13. Galois System Parallel Program = Operator + Schedule + Parallel Data Structure Operators Schedules Data StructureAPI Calls Schedulers Data Structures AllocatorThread Primitives Multicore User Program Galois System 13
  • 14. Outline 1. Abstraction for graph analytics applications and implementation in Galois system 2. One piece of infrastructure: priority scheduling 3. Evaluation of Galois system versus DSLs 14
  • 15. Galois Infrastructure for Scheduling • Library of different schedulers – FIFO, LIFO, Priority, … – DSL for specifying scheduling policies • Nguyen and Pingali (ASPLOS ’11) 15
  • 16. Priority Scheduling • Concurrent priority queue – Implemented in TBB, java.concurrent – Does not scale: our tasks are very fine-grain • Sequence of bags – One bag per priority level – High overhead for finding non-empty, urgent- priority bags 16
  • 17. Ordered by Metric (OBIM) • Sparse sequence of Bags – One Bag per priority level – To reduce synchronization • Sequence is replicated among cores • Bag is distributed between cores – To reduce overhead of finding work • A reduction tree over machine topology to get estimate of current urgent priority 17
  • 18. Scheduling Bag c1 c2 Head Package 1 18 Bag c1 c2 Head Package 2
  • 19. Scheduling Bag c1 c2 Head Package 1 19 Bag c1 c2 Head Package 2
  • 20. Scheduling Bag c1 c2 Head Package 1 20 Bag c1 c2 Head Package 2
  • 21. Scheduling Bag c1 c2 Head Package 1 21 Bag c1 c2 Head Package 2
  • 22. Scheduling Bag c1 c2 Head Package 1 22 Bag c1 c2 Head Package 2
  • 23. Scheduling Bag c1 c2 Head Package 1 23 Bag c1 c2 Head Package 2
  • 24. Scheduling Bag c1 c2 Head Package 1 24 Bag c1 c2 Head Package 2
  • 25. Scheduling Bag c1 c2 Head Package 1 25 Bag c1 c2 Head Package 2
  • 27. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 27
  • 28. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 28
  • 29. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 13 29
  • 30. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 13 30
  • 31. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 13 3 31
  • 32. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 1 3 32
  • 33. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 1 3 15 33
  • 34. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 1 3 15 34
  • 35. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 1 3 15 5 5 35
  • 36. Priority Map 0 1 Global Map Local Maps Core 1 Core 2 Scheduling Bags 1 3 1 30 1 3 1 5 5 36
  • 37. Outline 1. Abstraction for graph analytics applications and implementation in Galois system 2. One piece of infrastructure: priority scheduling 3. Evaluation of Galois system versus graph analytics DSLs 37
  • 38. Graph Analytics DSLs • Easy to implement their APIs on top of Galois system – Galois implementations called PowerGraph-g, Ligra-g, etc. – About 200-300 lines of code each 38 Ÿ GraphLab Low et al. (UAI ’10) Ÿ PowerGraph Gonzalez et al. (OSDI ’12) Ÿ GraphChi Kyrola et al. (OSDI ’12) Ÿ Ligra Shun and Blelloch (PPoPP ’13) Ÿ Pregel Malewicz et al. (SIGMOD ‘10) Ÿ …
  • 39. Evaluation • Platform – 40-core system • 4 socket, Xeon E7-4860 (Westmere) – 128 GB RAM • Applications – Breadth-first search (bfs) – Connected components (cc) – Approximate diameter (dia) – PageRank (pr) – Single-source shortest paths (sssp) • Inputs – twitter50 (50 M nodes, 2 B edges, low-diameter) – road (20 M nodes, 60 M edges, high-diameter) • Comparison with – Ligra (shared memory) – PowerGraph (distributed) • Runtimes with 64 16-core machines (1024 cores) does not beat one 40-core machine using Galois 39
  • 40. 40 Ligra runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp Ligra runtime Ligra⌫g runtime 1 2 1 2 4 6 bfs cc dia pr sssp Range <= 1 > 1
  • 41. 41 Ligra runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp Ligra runtime Ligra⌫g runtime 1 2 1 2 4 6 bfs cc dia pr sssp Range <= 1 > 1
  • 42. 42 PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp
  • 43. 43 PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp
  • 44. PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp 44
  • 45. PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp • The best algorithm may require application- specific scheduling – Priority scheduling for SSSP • The best algorithm may not be expressible as a vertex program – Connected components with union-find • Autonomous scheduling required for high- diameter graphs – Coordinated scheduling uses many rounds and has too much overhead 45
  • 46. PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp • The best algorithm may require application- specific scheduling – Priority scheduling for SSSP • The best algorithm may not be expressible as a vertex program – Connected components with union-find • Autonomous scheduling required for high- diameter graphs – Coordinated scheduling uses many rounds and has too much overhead 46
  • 47. PowerGraph runtime Galois runtime 1 10 100 1000 10000 1 10 100 1000 10000 bfs cc dia pr sssp PowerGraph runtime PowerGraph⇠g runtime 1 2 1 2 4 6 bfs cc dia pr sssp • The best algorithm may require application- specific scheduling – Priority scheduling for SSSP • The best algorithm may not be expressible as a vertex program – Connected components with union-find • Autonomous scheduling required for high- diameter graphs – Coordinated scheduling uses many rounds and has too much overhead 47
  • 48. See Paper For • More inputs, applications • Detailed discussion of performance results • More infrastructure: non-priority scheduling, memory management • Results for out-of-core DSLs 48
  • 49. Summary • Galois programming model is general – Permits efficient algorithms to be written • Galois infrastructure is lightweight • Simpler graph DSLs can be layered on top – Can perform better than existing systems • Download at http://iss.ices.utexas.edu/galois 49