Modeling Orderings in Asynchronous Distributed-Memory
Parallel Graph Algorithms
Abstract Graph Machine
Final Dissertation Defense
Thejaka Amila Kanewala
November 9, 2018
Agenda
• Preliminaries
• Motivation
• Abstract Graph Machine (AGM)
• Extended Abstract Graph Machine (EAGM)
• Engineering an AGM
• Engineering an EAGM
• Results
• Final Remarks
Abstract Graph Machine
Graphs
• A graph is an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges.
Our Focus.
Graph Applications (Graph Problems)
• Single Source Shortest Paths
• Breadth First Search
• Connected Components
• Maximal Independent Set
• Minimum Spanning Tree
• Graph Coloring
• And many more …
Multiple algorithms to solve a
graph problem.
E.g.,
- Dijkstra’s SSSP
- Bellman-Ford
- Etc.
Why Is Performance Challenging in Parallel Graph Algorithms?
• Low compute/communication ratio (caused by irregularity)
• Synchronization overhead
• Higher amounts of work
• Many dependencies
- Input graph
- The algorithm logic
- The underlying runtime
- Etc.
Regular vs. Irregular
Communication only takes place at the data boundary.
Communication takes place everywhere.
Low communication/computation ratio.
High communication/computation ratio.
Synchronization Overhead
• Many existing parallel graph algorithms are designed with shared-memory parallel architectures in mind.
• When we "extend" these algorithms for distributed-memory execution, they end up with many synchronization phases.
→ Synchronization overhead is significant when processing a large graph on many distributed nodes.
Shared-Memory Parallel Graph Algorithms
• Shared-memory parallel algorithms can be described abstractly using the Parallel Random Access Machine (PRAM) model.
Independent processors, each with its own memory (e.g., registers)
Shared memory
Shared-Memory Parallel Graph Algorithms
.
.
.
.
.
Consists of
multiple phases
like these
Synchronize
Synchronize
These algorithms can be naturally extended to distributed-memory using
Bulk Synchronous Parallel (BSP).
Shared-Memory to Distributed-Memory
.
.
.
.
.
.
.
.
.
. Global Barrier
Global Barrier
Executes many global barriers. Synchronization overhead is
significant, especially when executing the program on many nodes.
Asynchronous Graph Algorithms
Level synchronous
BFS, using “S” as
the source.
Asynchronous Graph Algorithms
Level synchronous
BFS, using “S” as
the source.
Synchronize
Asynchronous Graph Algorithms
Level synchronous
BFS, using “S” as
the source.
Synchronize
Once a label is set, no corrections → no additional work.
Asynchronous Graph Algorithms
Asynchronous
BFS, using “S” as
the source.
Asynchronous Graph Algorithms
Asynchronous
BFS, using “S” as
the source. Label correction
Need to correct
these as well.
Generates a lot of redundant work.
Asynchronous Graph Algorithms
• Asynchronous graph algorithms are better in the sense that they avoid synchronization overhead.
• BUT they tend to generate a lot of redundant work, leading to high runtimes.
How much Synchronization?
Low
diameter
Have enough
parallel work in
each level
Fewer barriers
(diameter ~ barriers)
E.g., Twitter, ER.
Level synchronous execution is
better.
.
.
.
High
diameter
Not enough
parallel work in a
level
E.g., Road NW.
Asynchronous execution
is better.
We need a way to control the amount of synchronization a graph algorithm needs → the Abstract Graph Machine (AGM) is a model that can control the level of synchronization of a graph algorithm using an ordering.
SSSP as an Example
• In an undirected weighted graph G = (V, E), the Single Source Shortest Paths problem is to find a path from the source to every vertex such that the sum of edge weights along that path is minimized.
Algorithms
void Chaotic(Graph g, Vertex source) {
  for each Vertex v in g {
    state[v] <- INFINITY;
  }
  Relax(source, 0);
}
void Relax(Vertex v, Distance d) {
  if (d < state[v]) {
    state[v] <- d;
    for each edge e of v {
      Vertex u = target_vertex(e);
      Relax(u, d + weight[e]);
    }
  }
}

void Dijkstra(Graph g, Vertex source) {
  for each Vertex v in g {
    state[v] <- INFINITY;
  }
  pq.insert(<source, 0>);
  while pq is not empty {
    <v, d> = vertex-distance pair with minimum distance in pq;
    Relax(v, d);
  }
}
void Relax(Vertex v, Distance d) {
  if (d < state[v]) {
    state[v] <- d;
    for each edge e of v {
      Vertex u = target_vertex(e);
      pq.insert(<u, d + weight[e]>);
    }
  }
}

void DeltaStepping(Graph g, Vertex source) {
  for each Vertex v in g {
    state[v] <- INFINITY;
  }
  insert <source, 0> into the appropriate bucket;
  while not all buckets are empty {
    bucket = smallest non-empty bucket;
    for each <v, d> in bucket {
      Relax(v, d);
    }
  }
}
void Relax(Vertex v, Distance d) {
  if (d < state[v]) {
    state[v] <- d;
    for each edge e of v {
      Vertex u = target_vertex(e);
      insert <u, d + weight[e]> into the appropriate bucket;
    }
  }
}
Relax
Relax
Relax
Process a random vertex
Put vertex in a priority queue
Put vertex in a ∆ bucket
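The three variants above share the same Relax step and differ only in how the worklist is ordered. As an illustrative sketch (not the dissertation's framework code), the Dijkstra variant can be written in C++ with a `std::priority_queue` standing in for the ordering; the graph representation and the names `WorkItem` and `dijkstra` are assumptions made here:

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Vertex = int;
using Distance = int;
using Edge = std::pair<Vertex, Distance>;      // (target, weight)
using WorkItem = std::pair<Distance, Vertex>;  // distance first, so the queue orders by it

constexpr Distance INF = std::numeric_limits<Distance>::max();

// Dijkstra expressed as the generic relax step plus a min-ordered worklist.
std::vector<Distance> dijkstra(const std::vector<std::vector<Edge>>& g,
                               Vertex source) {
  std::vector<Distance> state(g.size(), INF);
  std::priority_queue<WorkItem, std::vector<WorkItem>,
                      std::greater<WorkItem>> pq;
  pq.push({0, source});
  while (!pq.empty()) {
    auto [d, v] = pq.top();
    pq.pop();
    if (d < state[v]) {          // the relax test
      state[v] = d;              // state update
      for (auto [u, w] : g[v])   // work generation
        pq.push({d + w, u});
    }
  }
  return state;
}
```

Swapping the priority queue for an unordered bag gives the Chaotic algorithm; grouping workitems into buckets of width ∆ gives ∆-Stepping. The relax logic itself never changes.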
SSSP Algorithms Processing Work
Chaotic: Vertex–distance pairs generated by the Relax function are fed back to it, unordered.
∆-Stepping: Vertex–distance pairs generated by the Relax function are ordered based on ∆. Each generated pair is inserted into the appropriate bucket, and pairs in the first bucket are processed first.
KLA: Vertex–distance pairs are ordered based on the level from the source vertex. Buckets are created based on the level interval k.
Dijkstra's: Vertex–distance pairs are ordered based on the distance. The pair with the smallest distance is processed first.
Relax (state update): same across all four algorithms.
Ordering of updates: different across algorithms.
Ordering of SSSP Algorithms
Chaotic: Generated vertex–distance pairs are selected randomly and not ordered → a single partition.
∆-Stepping: Generated vertex–distance pairs are separated into several partitions based on a relation defined on the distance and ∆ → multiple partitions based on ∆.
KLA: Generated vertex–distance pairs are separated into partitions based on the level and k → multiple partitions based on the level.
Dijkstra: Generated vertex–distance pairs are separated into partitions based on distinct distance values → multiple partitions based on the distance.
[Figure: partition diagrams. Chaotic: one unordered partition of all workitems. ∆-Stepping: workitems unordered within a ∆ bucket, ordered between buckets. KLA: workitems unordered within k levels, ordered between them. Dijkstra: fully ordered, one workitem per partition.]
Distributed Memory Implementations
Chaotic | ∆-Stepping | KLA | Dijkstra's
Barriers are executed at
the beginning of the
algorithm and at the end
of the algorithm.
A barrier is executed after
processing a ∆ bucket.
A barrier is executed after
processing k levels.
A barrier is executed after processing the vertex–distance pairs at each distinct distance value.
[Figure: workitem distribution across two ranks R0 and R1 for each algorithm, with barrier points marked.]
• Two ranks (R0 and R1)
• Vertices are distributed (e.g., Even vertices in R0 and odd vertices in R1).
• Barrier execution points are marked in the figure.
Abstract Graph Machine for SSSP
A strict weak ordering relation is defined on work units.
The strict weak ordering relation creates equivalence classes and induces an ordering on the generated equivalence classes.
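For concreteness, here is a minimal C++ sketch of a ∆-Stepping-style strict weak ordering; the names `DeltaOrdering` and `equivalent` are illustrative assumptions, not the framework's API. Two workitems are equivalent, i.e. fall into the same equivalence class (bucket), exactly when neither precedes the other:

```cpp
#include <cassert>
#include <utility>

using WorkItem = std::pair<int, int>;  // (vertex, distance)

// Delta-Stepping strict weak ordering: w1 precedes w2 when w1's distance
// falls in an earlier bucket of width delta. Workitems in the same bucket
// are mutually incomparable, so they form one equivalence class.
struct DeltaOrdering {
  int delta;
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return a.second / delta < b.second / delta;
  }
};

// Two workitems are equivalent iff neither precedes the other.
bool equivalent(const DeltaOrdering& lt, const WorkItem& a, const WorkItem& b) {
  return !lt(a, b) && !lt(b, a);
}
```

With delta = 5, workitems at distances 2 and 4 land in the same class, while a workitem at distance 7 belongs to a strictly later class.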
Abstract Graph Machine
• An abstract model that represents graph algorithms as a processing function and a strict weak ordering.
• In the AGM for SSSP, the (vertex, distance) pair is called a workitem.
• An AGM consists of:
  - A definition of a graph (vertices, edges, vertex attributes, edge attributes)
  - A set of states (of the computation, e.g., distances)
  - A definition of workitems
  - A processing function
  - A strict weak ordering relation
  - An initial workitem set (to start the algorithm)
The Abstract Graph Machine (AGM)
Definition of
graph
Definition of
workitems
Set of states
Processing
function
Strict weak
ordering
Initial set of
workitems
Including vertex and
edge attributes
SSSP: distance
from source
SSSP: relax
SSSP: edge weight
SSSP: starting vertex
SSSP: depends on the algorithm!
SSSP Algorithms in AGM
• In general, an SSSP algorithm's workitem consists of a vertex and a distance. The set of workitems for the AGM is defined accordingly.
• A workitem is a pair (e.g., w = <v, d>). The "[]" operator retrieves values associated with w; e.g., w[0] returns the vertex associated with the workitem and w[1] returns the distance.
• The state of the algorithm (the shortest distance calculated at a point in time) is stored in a state we call "distance".
The Processing Function for SSSP
Input workitem. The workitem is a pair: w[0] returns the vertex associated with the workitem and w[1] the distance.
1. Check this condition: if the input workitem's distance is better than the stored distance, go to 2.
2. Update the distance and go to 3.
3. Generate new workitems.
relax() in AGM notation
State Update & Work Generation
State update logic.
Work generation
logic.
Strict weak ordering relations (SSSP)
• Chaotic SSSP strict weak ordering relation ( )
• Dijkstra’s SSSP strict weak ordering relation ( )
• ∆-Stepping SSSP strict weak ordering relation ( )
• A similar ordering relation can be derived for KLA.
Algorithm Family
• All algorithms share the same processing function but with different
orderings.
• The collection of algorithms that share the same processing function
is called an algorithm family.
Distributed ∆–Stepping Algorithm
Partitions are created by the global ordering relation.
First bucket created
by delta-stepping
algorithm
Work within a bucket
is not ordered
Partitions are ordered
∆–Stepping Algorithm
on two ranks (R0, R1)
Huge opportunity for
acceleration
Extended AGM (EAGM): ∆–Stepping + thread level ordering
For a specific algorithm, work
within a bucket is not ordered
AGM creates buckets with the global ordering relation.
And we can do
these in parallel
Each rank can process its
bucket with a better ordering!
But why be
unordered?
Extended AGM (EAGM): Hybrid Hierarchical Ordering
As we distribute/parallelize, we
create different spatial domains
for processing
Extended AGM: Use
different orderings in each
domain
Extended AGM in General
AGM takes a
processing function
and a strict weak
ordering relation
EAGM takes a processing function and a hierarchy of strict weak ordering relations, one attached to each spatial domain.
Global
Ordering
Node
Ordering
NUMA
Ordering
Thread
Ordering
Extended AGM (EAGM): E.g., ∆–Stepping with NUMA level
ordering.
• Work within each bucket is further ordered using NUMA local priority
queues.
ordering for B1
bucket and NUMA domain
1 in Rank 0
Extended AGM (EAGM) - ∆–Stepping with node level
ordering.
• Work within each bucket is further ordered using node local priority
queues.
ordering for B1
bucket in Rank 0
Engineering an AGM
template<typename buckets>
void PF(const WorkItem& wi,
int tid,
buckets& outset) {
Vertex v = std::get<0>(wi);
Distance d = std::get<1>(wi);
Distance old_dist = vdistance[v];
while (d < old_dist) {
Distance din = CAS (d, &vdistance[v]);
if (din == old_dist) {
FORALL_OUTEDGES_T(v, e, g, Graph) {
Vertex u = boost::target(e, g);
Weight we = boost::get(vweight, e);
WorkItem wigen(u, (d+we));
outset.push(wigen, tid);
}
break;
} else {
old_dist = din;
}
}
}
AGM Graph Processing Framework
//================== Dijkstra Ordering =====================
template<int index>
struct dijkstra : public base_ordering {
public:
static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_dijkstra;
template <typename T>
bool operator()(T i, T j) {
return (std::get<index>(i) < std::get<index>(j));
}
eagm_ordering name() {
return ORDERING_NAME;
}
};
The strict weak ordering partitions WorkItems; in this case the ordering is based on the distance.
The processing
function for
SSSP.
For SSSP a
”WorkItem” is a
vertex and distance.
CAS = Atomic
compare & swap
After executing wi, the processing function generates new work items, which are pushed for ordering.
Invoking the Framework
// SSSP (AGM) algorithm
typedef agm<Graph,
WorkItem,
ProcessingFunction,
StrictWeakOrdering,
RuntimeModelGen> sssp_agm_t;
sssp_agm_t ssspalgo(rtmodelgen,
ordering,
pf,
initial);
Creating the
AGM for SSSP.
The WorkItem type.
The processing function
(preorder) type.
The strict weak ordering
relation type.
The underlying runtime.
(AGM framework abstracts
out the runtime).
The underlying
runtime.
The specific ordering
function (e.g., Delta
Ordering).
The processing
function instance.
The initial work item
set (e.g., source
vertex).
Mapping AGM to an Implementation
Abstract Graph Machine 40
Q2. How should we arrange the processing function and the ordering?
Q1. How should we do the ordering?
Abstract
Actual implementation
Communication method
Ordering Implementation
• Every compute node keeps a set of equivalence classes locally in a data structure.
• Equivalence classes are ordered according to the strict weak ordering (swo).
• Every equivalence class has:
  - A representative work item
  - An append buffer that holds work items for the equivalence class.
Representation
Shared-memory
parallel thread
Holds all the work
items that are not
comparable to the
representative.
Strict weak ordering
is transitive.
Will discuss later.
An equivalence
class.
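A sequential sketch of this per-node structure follows, ignoring the concurrency handling discussed later; `EqClass`, `insert`, and the list layout are illustrative assumptions. An incoming workitem joins the class whose representative it is incomparable with, or founds a new class at the right position in the ordered list:

```cpp
#include <cassert>
#include <list>
#include <utility>
#include <vector>

using WorkItem = std::pair<int, int>;  // (vertex, distance)

struct EqClass {
  WorkItem representative;       // first workitem seen for this class
  std::vector<WorkItem> buffer;  // append buffer of equivalent workitems
};

// Classes are kept sorted by the strict weak ordering. Scan until either
// (a) w precedes a representative (found the sorted position for a new
// class) or (b) w is incomparable with a representative (same class).
template <typename Swo>
void insert(std::list<EqClass>& classes, const WorkItem& w, Swo lt) {
  for (auto it = classes.begin(); it != classes.end(); ++it) {
    if (lt(w, it->representative)) {   // w belongs before this class
      classes.insert(it, {w, {w}});
      return;
    }
    if (!lt(it->representative, w)) {  // incomparable: same class
      it->buffer.push_back(w);
      return;
    }
  }
  classes.push_back({w, {w}});         // w belongs after every class
}
```

Incomparability (`!lt(a, b) && !lt(b, a)`) is transitive for a strict weak ordering, which is what makes the single representative per class sufficient.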
Ordering Implementation: Inserting into an Equivalence Class
• Find the equivalence class by comparing with the representative work items.
• If there is no matching equivalence class, create one with the inserted work item as its representative.
  - Must deal with concurrency.
The inserted element is not comparable with the representative element.
Ordering Implementation: Processing an Equivalence
Class
• Find the equivalence class with the minimum representative workitem.
  - Every rank sends the representative work item of the first equivalence class in its list (an all-reduce communicates the minimum work items).
  - The smallest equivalence class is selected.
Equivalence
classes are not
distributed
uniformly
If a particular node does not have an equivalence class matching the global minimum representative, one is inserted.
Ordering Implementation: Termination of an Equivalence
Class
• Termination of a single equivalence class:
  - While processing an equivalence class, more work can be added to the same equivalence class.
  - Every node keeps track of the number of work items pushed into the equivalence class and the number of work items processed from it.
  - The difference between those two numbers is globally reduced (sum); when the sum is zero, the framework decides that the equivalence class is fully processed.
Counts are reduced after making sure all messages have been exchanged (i.e., after an epoch).
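The counting scheme can be sketched as follows, with `std::accumulate` standing in for the MPI all-reduce; `ClassCounters` and `terminated` are illustrative names, not the framework's API:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-rank counters for one equivalence class.
struct ClassCounters {
  long pushed = 0;     // workitems inserted into the class on this rank
  long processed = 0;  // workitems taken out and executed on this rank
};

// Stand-in for the all-reduce: sum (pushed - processed) over all ranks.
// The class has terminated exactly when the global sum is zero, i.e.
// every workitem ever pushed has also been processed on some rank.
bool terminated(const std::vector<ClassCounters>& ranks) {
  long diff = std::accumulate(
      ranks.begin(), ranks.end(), 0L,
      [](long acc, const ClassCounters& c) {
        return acc + (c.pushed - c.processed);
      });
  return diff == 0;
}
```

Note that an individual rank's difference may be negative (it processed items pushed on another rank); only the global sum is meaningful, which is why the reduction must run after an epoch, once all in-flight messages have been delivered.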
AGM Implementation: Data Structure
• A data structure is needed to hold the equivalence classes.
  - Later this data structure will be further extended for use at EAGM levels.
• Requirements of the data structure:
  - Should order nodes by representative work items
  - Fast lookups are important
  - Ability to concurrently insert/delete and lookup/search data
Sounds like a
Dictionary ADT.
Data Structure
• Binary Search Trees (BST), with Locks
• Linked List, with Locks and Atomics
• Concurrent SkipList
• Partitioning Scheme
E.g., red-black trees. Rebalancing often requires locking the whole data structure for insert/delete.
Not much difference between the locked and lock-free versions. No need to lock the whole data structure (only the node being modified), but lookup time is linear.
An alternative probabilistic data structure that gives the same complexity guarantees as a BST and avoids rebalancing, but contention is much higher.
Explained in the next slide.
Data Structure: Partitioning Scheme
Every node maintains a set of vectors, one per thread.
Every thread maintains the
min work item seen so far.
Push back - O(1)
(assuming no resize)
Find the global min
representative
Partition by the global min using the strict weak ordering relation and move the partitioned work to a new vector (O(n)).
Thread buckets is a data structure with one vector per thread; it represents the next equivalence class to be processed.
Data Structure: Partitioning Scheme
• Processing thread buckets avoids load-imbalance issues.
• But work items arriving for the same bucket cannot be inserted (we cannot afford vector resizes; if needed, locks would be required).
• Work items pushed for the same bucket are sent directly instead.
[Figure: per-thread bucket vectors T0, T1, T2, …]
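One partitioning round can be sketched with `std::partition`; `take_bucket` and the ∆-based comparator are illustrative assumptions, not the framework's API. Every workitem equivalent to the global minimum representative moves into the thread's processing bucket, leaving later work in place:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using WorkItem = std::pair<int, int>;  // (vertex, distance)

// One partitioning round on a thread-local vector: keep at the front
// every workitem NOT preceded by the global minimum (i.e. equivalent to
// it under the swo), move it out as the next bucket. O(n), no per-item
// locking, push_back into `local` stays O(1) between rounds.
template <typename Swo>
std::vector<WorkItem> take_bucket(std::vector<WorkItem>& local,
                                  const WorkItem& global_min, Swo lt) {
  auto mid = std::partition(local.begin(), local.end(),
                            [&](const WorkItem& w) {
                              return !lt(global_min, w);  // w in min's class
                            });
  std::vector<WorkItem> bucket(local.begin(), mid);
  local.erase(local.begin(), mid);
  return bucket;
}
```

Only `!lt(global_min, w)` needs testing: since `global_min` is the global minimum, no workitem precedes it, so "not preceded by the minimum" already means "equivalent to the minimum".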
Data Structure: Summary
Data Structure | Execution Time (sec.)
Linked List (Atomics) | ~61
Binary Search Tree with Locks | ~82
Concurrent SkipList | ~89
Partitioning Scheme | ~44
• Pre-order processing function
• SSSP ∆ ordering
• Two nodes, 16 shared-memory parallel threads, Graph500 scale 24.
Mapping AGM to an Implementation
Q2. How should we arrange the processing function and the ordering?
Q1. How should we do the ordering?
Abstract
Actual implementation
Communication method
Communication Paradigms
[Figure: three communication patterns among ranks R0–R3: push, pull, and gather/apply/scatter (GAS).]
Push communication: R1 pushes workitems to R0. Pull communication: R0 pulls workitems from R1. GAS communication: gather/apply/scatter.
Placing Processing Function: Pre-Order & Post-Order
Pre-Order: after receiving a workitem, the processing function is executed and the generated workitems are inserted into the data structure.
Post-Order: after receiving a workitem, it is inserted into the data structure for ordering (and processed later).
Placing Processing Function: Split-Order
• The processing function contains logic for updating state(s) and for generating new work.
  - We split the processing function into two functions: a state update function and a work-generation function.
workitem
Insert the
workitem for
ordering if state
is updated
Generate new
work only if state is
not changed
Allows us to
prune work
further
Placing Processing Function: Summary
Ordering | Run Time (sec.) | Total Inserts | Additional WorkItems Processed
Split-Order | 14.54 | 17,139,284 | 21,334
Pre-Order | 62.55 | 409,388,039 | 2,469,025
Post-Order | 44.09 | 398,344,581 | 2,284,918
Split-Order has the best timing: the total work items pushed into the data structure are far fewer than in the other two implementations, and the additional relaxes are much fewer because of pruning. Post-Order is slightly better than Pre-Order because of reduced contention.
Engineering an EAGM
EAGM Spatial Levels
T0 T1 T2 T3 T4 T5
N0
Global
Spatial Level
Node
Spatial Level
NUMA
Spatial Level
Thread
Spatial Level
numa0 numa1
T0 T1 T2 T3 T4 T5
N1
numa0 numa1
T0 T1 T2 T3 T4 T5
N2
numa0 numa1
EAGM Execution
// SSSP (AGM) algorithm
typedef agm<Graph,
WorkItem,
ProcessingFunction,
StrictWeakOrdering,
RuntimeModelGen> sssp_agm_t;
sssp_agm_t ssspalgo(rtmodelgen,
ordering,
pf,
initial);
// SSSP (EAGM) algorithm
typedef eagm<Graph,
WorkItem,
ProcessingFunction,
EAGMConfig,
RuntimeModelGen> sssp_eagm_t;
sssp_eagm_t ssspalgo(rtmodelgen,
config,
pf,
initial);
AGM takes a strict weak ordering relation.
EAGM takes a hierarchy of orderings; the hierarchy corresponds to the memory hierarchy.
Globally: ∆ (=5) ordering
Node level: chaotic ordering
NUMA level: chaotic ordering
Thread level: chaotic ordering
EAGMConfig
EAGMConfig
// strict weak orderings
//================== Chaotic =====================
struct chaotic : public base_ordering {
public:
static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_chaotic;
template <typename T>
bool operator()(T i, T j) {
return false;
}
eagm_ordering name() {
return ORDERING_NAME;
}
};
No two workitems are comparable to each other.
CHAOTIC_ORDERING_T ch;
DELTA_ORDERING_T delta(agm_params.delta);
auto config = boost::graph::agm::create_eagm_config(delta,
ch,
ch,
ch);
Globally: ∆ ordering
Node level: chaotic ordering
NUMA level: chaotic ordering
Thread level: chaotic ordering
EAGM Execution:
T0 T1 T2 T3 T4 T5
N0
numa0 numa1
T0 T1 T2 T3 T4 T5
N1
numa0 numa1
T0 T1 T2 T3 T4 T5
N2
numa0 numa1
Time
An equivalence
class
Barrier
synchronization
Execution same
as AGM
Execution is
globally
synchronous
Execution is asynchronous at the process, NUMA, and thread levels.
EAGM Execution:
Execution is globally asynchronous → a single global equivalence class.
Execution is node-level synchronous: equivalence classes at the process level.
Execution is asynchronous at the NUMA and thread levels.
Local thread
barriers
EAGM Execution:
T0 T1 T2 T3 T4 T5
N0
numa0 numa1
T0 T1 T2 T3 T4 T5
N1
numa0 numa1
T0 T1 T2 T3 T4 T5
N2
numa0 numa1
Time
Global
Equivalence
Class
Global
Barrier
Node
Equivalence
Class
NUMA
Equivalence
Class
Thread
Equivalence
Class
NUMA
Barrier
Thread
Barrier
Spatial ordering vs. temporal ordering: temporal ordering arises if we have "non-chaotic" orderings defined for every memory level.
EAGM Implementation
• The heart of the EAGM implementation is the nested data structure that holds
equivalence classes.
[Figure: nested data structure; global-level classes contain node-level classes, which contain NUMA-level classes, ordered over time.]
EAGM Implementation: Data Structure
E.g., nested orderings: order by ∆=10, order by k=2, order by ∆=5.
EAGM Implementation: Static Optimizations
Chaotic ordering always creates a single equivalence class. Therefore, we exclude levels with chaotic orderings.
Optimized data structure.
Optimizations are performed statically at compile time with the help of template metaprogramming and template specialization.
Remove nested levels
for chaotic orderings.
EAGM Implementation: Static Optimizations
Remove nested levels for chaotic orderings. This is really an AGM.
EAGM Implementation: Static Optimizations
Results
Single Source Shortest Paths: Pre/Post/Split Orderings
Weak scaling results carried out on two Broadwell 16-core Intel Xeon processors.
2^30 vertices & 2^34 edges.
Shared-memory
execution.
Split-order is ~5 times faster than pre-order and post-order.
Single Source Shortest Paths: Weak Scaling
Graph500 graph.
Graph500
proposed SSSP
graph.
Erdős–Rényi graph.
Framework algorithms show better scaling compared to PowerGraph. Thread-level Dijkstra ordering shows better scaling behavior for power-law graphs. Globally synchronous delta-stepping scales better for ER graphs.
Single Source Shortest Paths: Strong Scaling (BRII+)
Relative speedup = (time for the fastest sequential algorithm) / (parallel execution time on n PEs).
Synchronization overhead becomes significant when executing on a larger number of nodes.
Just ordering by level does not reduce the work, but if we apply Dijkstra ordering at the thread level we see better performance.
SSSP More Results: Weak Scaling (BR II)
Neither PowerGraph nor PBGL-1 scales well at larger scales.
Breadth First Search: Weak Scaling
K(2) global ordering and level global ordering reduce more redundant work compared to thread-level ordering.
BFS: In-node Performance (Weak Scaling)
Global level
synchronous
Node level
synchronous
Within a node, both configurations execute in the same way; therefore their performance is similar.
BFS: Strong Scaling
Globally level-synchronous versions show better speed-up than thread-level synchronous versions.
BFS: Road Networks
Strong scaling with road networks. Road networks have a high diameter (~850–900); therefore, globally level-synchronous versions show poor scaling because of synchronization overhead.
Global level
synchronous
Globally
asynchronous
and thread level
synchronous
Other Graph Applications
• Connected Components
• Maximal Independent Set
• Triangle Counting
Conclusions
• Using the AGM abstraction, we showed that we can generate families of algorithms by keeping the processing function intact and changing the ordering.
• With the EAGM we can achieve spatial and temporal orderings.
• We discussed the challenges in mapping the AGM framework to an implementation and described how we addressed them.
• We developed an efficient implementation of the AGM and EAGM frameworks.
  - Weak and strong scaling results show that AGM/EAGM algorithms outperform graph algorithms in other frameworks.
Thank You!
 

Similar to ABSTRACT GRAPH MACHINE: MODELING ORDERINGS IN ASYNCHRONOUS DISTRIBUTED-MEMORY PARALLEL GRAPH ALGORITHMS

141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
graphulo
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
MIT
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
Sara Asher
 
Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....
Raffi Krikorian
 
Twopi.1
Twopi.1Twopi.1
Twopi.1
fhhangtuah
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Spark Summit
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
Mark Kilgard
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Flink Forward
 
An Introduction to NV_path_rendering
An Introduction to NV_path_renderingAn Introduction to NV_path_rendering
An Introduction to NV_path_rendering
Mark Kilgard
 
MATLABgraphPlotting.pptx
MATLABgraphPlotting.pptxMATLABgraphPlotting.pptx
MATLABgraphPlotting.pptx
PrabhakarSingh646829
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
Ppt 1
Ppt 1Ppt 1
Topological Sort
Topological SortTopological Sort
Topological Sort
Dr Sandeep Kumar Poonia
 
Computation Assignment Help
Computation Assignment Help Computation Assignment Help
Computation Assignment Help
Programming Homework Help
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
CheeWeiTan10
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
VidhyaSenthil
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
Kyong-Ha Lee
 
Computer graphics
Computer graphicsComputer graphics
Computer graphics
amitsarda3
 
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonGraph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Christopher Conlan
 

Similar to ABSTRACT GRAPH MACHINE: MODELING ORDERINGS IN ASYNCHRONOUS DISTRIBUTED-MEMORY PARALLEL GRAPH ALGORITHMS (20)

141205 graphulo ingraphblas
141205 graphulo ingraphblas141205 graphulo ingraphblas
141205 graphulo ingraphblas
 
141222 graphulo ingraphblas
141222 graphulo ingraphblas141222 graphulo ingraphblas
141222 graphulo ingraphblas
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....
 
Twopi.1
Twopi.1Twopi.1
Twopi.1
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
An Introduction to NV_path_rendering
An Introduction to NV_path_renderingAn Introduction to NV_path_rendering
An Introduction to NV_path_rendering
 
MATLABgraphPlotting.pptx
MATLABgraphPlotting.pptxMATLABgraphPlotting.pptx
MATLABgraphPlotting.pptx
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Ppt 1
Ppt 1Ppt 1
Ppt 1
 
Topological Sort
Topological SortTopological Sort
Topological Sort
 
Computation Assignment Help
Computation Assignment Help Computation Assignment Help
Computation Assignment Help
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
Computer graphics
Computer graphicsComputer graphics
Computer graphics
 
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonGraph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
 

Recently uploaded

molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 

Recently uploaded (20)

molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 

ABSTRACT GRAPH MACHINE: MODELING ORDERINGS IN ASYNCHRONOUS DISTRIBUTED-MEMORY PARALLEL GRAPH ALGORITHMS

  • 9. Shared-Memory Parallel Graph Algorithms [Diagram: the computation consists of multiple parallel phases, each followed by a synchronize step.] These algorithms can be naturally extended to distributed memory using Bulk Synchronous Parallel (BSP). 9Abstract Graph Machine
  • 10. Shared-Memory to Distributed-Memory [Diagram: BSP execution with phases separated by global barriers.] Executes many global barriers; synchronization overhead is significant, especially when executing the program on many nodes. 10Abstract Graph Machine
  • 11. Asynchronous Graph Algorithms Level synchronous BFS, using “S” as the source. 11Abstract Graph Machine
  • 12. Asynchronous Graph Algorithms Level synchronous BFS, using “S” as the source. 12Abstract Graph Machine Synchronize
  • 13. Asynchronous Graph Algorithms Level synchronous BFS, using “S” as the source. Synchronize Once a label is set, no corrections → 0 additional work. 13Abstract Graph Machine
  • 14. Asynchronous Graph Algorithms Asynchronous BFS, using “S” as the source. 14Abstract Graph Machine
  • 15. Asynchronous Graph Algorithms Asynchronous BFS, using “S” as the source. Label correction Need to correct these as well. Generates a lot of redundant work. 15Abstract Graph Machine
  • 16. Asynchronous Graph Algorithms • Asynchronous graph algorithms are better in the sense that they avoid the overhead of synchronization. • BUT, they tend to generate a lot of redundant work → high runtime. 16Abstract Graph Machine
  • 17. How much Synchronization? Low diameter: enough parallel work in each level, fewer barriers (diameter ~ barriers), e.g., Twitter, ER → level synchronous execution is better. High diameter: not enough parallel work in a level, e.g., Road NW → asynchronous execution is better. We need a way to control the amount of synchronization needed by a graph algorithm → the Abstract Graph Machine (AGM) is a model that can control the level of synchronization of a graph algorithm using an ordering. 17Abstract Graph Machine
  • 18. SSSP as an Example • In an undirected weighted graph (G = (V, E)), the Single Source Shortest Path problem is to find a path from the source to every vertex such that the sum of edge weights along that path is minimized. 18Abstract Graph Machine
  • 19. Algorithms

void Chaotic(Graph g, Vertex source) {
  For each Vertex v in G { state[v] <- INFINITY; }
  Relax(source, 0);   // process vertices in arbitrary (random) order
}
void Relax(Vertex v, Distance d) {
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      Relax(u, d + weight[e]);
    }
  }
}

void Dijkstra(Graph g, Vertex source) {
  For each Vertex v in G { state[v] <- INFINITY; }
  pq.insert(<source, 0>);   // put vertex in priority queue
  While pq is not empty {
    <v, d> = vertex-distance pair with minimum distance in pq;
    Relax(v, d);
  }
}
void Relax(Vertex v, Distance d) {
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      pq.insert(<u, d + weight[e]>);
    }
  }
}

void ∆-Stepping(Graph g, Vertex source) {
  For each Vertex v in G { state[v] <- INFINITY; }
  insert <source, 0> into the appropriate bucket;   // put vertex in a ∆ bucket
  While some bucket is non-empty {
    bucket = smallest non-empty bucket in buckets;
    For each <v, d> in bucket { Relax(v, d); }
  }
}
void Relax(Vertex v, Distance d) {
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      insert <u, d + weight[e]> into the appropriate bucket;
    }
  }
}

19Abstract Graph Machine
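The three variants above differ only in how generated workitems are ordered. As a concrete illustration, here is a minimal sequential C++ sketch of the priority-queue (Dijkstra-style) variant; the Graph and WorkItem types here are our own simplifications, not the dissertation's framework types:

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Vertex = int;
using Distance = int;
// Adjacency list: g[v] is a vector of (target, weight) pairs.
using Graph = std::vector<std::vector<std::pair<Vertex, Distance>>>;

constexpr Distance INF = std::numeric_limits<Distance>::max();

// Dijkstra-style SSSP: relax the smallest-distance (vertex, distance)
// pair first, so each state update is final (no label corrections).
std::vector<Distance> dijkstra(const Graph& g, Vertex source) {
  std::vector<Distance> state(g.size(), INF);
  using WorkItem = std::pair<Distance, Vertex>;  // ordered by distance
  std::priority_queue<WorkItem, std::vector<WorkItem>, std::greater<>> pq;
  pq.push({0, source});
  while (!pq.empty()) {
    auto [d, v] = pq.top();
    pq.pop();
    if (d < state[v]) {           // the relax condition
      state[v] = d;               // state update
      for (auto [u, w] : g[v])    // work generation
        pq.push({d + w, u});
    }
  }
  return state;
}
```

Replacing the priority queue with an unordered bag gives the chaotic variant, and replacing it with buckets of width ∆ gives ∆-Stepping — which is exactly the ordering distinction the AGM formalizes.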
  • 20. SSSP Algorithms: Processing vs. Ordering of Work All four algorithms share the same Relax (state update) logic; only the ordering of updates differs. - Chaotic: vertex-distance pairs generated by the Relax function are fed back to it, unordered. - ∆-Stepping: generated vertex-distance pairs are ordered based on ∆; each pair is inserted into the appropriate bucket, and pairs in the first bucket are processed first. - KLA: vertex-distance pairs are ordered based on the level from the source vertex; buckets are created based on the level interval k. - Dijkstra’s: vertex-distance pairs are ordered based on the distance; the smallest-distance pairs are processed first. 20Abstract Graph Machine
  • 21. Ordering of SSSP Algorithms - Chaotic: generated vertex-distance pairs are selected randomly and not ordered → a single partition, unordered. - ∆-Stepping: generated pairs are separated into partitions based on a relation defined on the distance and ∆ → multiple partitions, unordered within a ∆ partition, ordered between ∆ partitions. - KLA: generated pairs are separated into partitions based on the level and k → multiple partitions, unordered within k, ordered between k. - Dijkstra: generated pairs are separated into partitions based on the distinct distance values → multiple partitions, fully ordered by distance. 21Abstract Graph Machine
  • 22. Distributed-Memory Implementations - Chaotic: barriers are executed only at the beginning and at the end of the algorithm. - ∆-Stepping: a barrier is executed after processing each ∆ bucket. - KLA: a barrier is executed after processing k levels. - Dijkstra’s: a barrier is executed after processing the vertex-distance pairs of each distinct distance. Setup: two ranks (R0 and R1); vertices are distributed (e.g., even vertices in R0 and odd vertices in R1). 22Abstract Graph Machine
  • 23. Abstract Graph Machine for SSSP Strict weak ordering relation defined on work units. The strict weak ordering relation creates equivalence classes and induces an ordering on generated equivalence classes. 23Abstract Graph Machine
  • 24. Abstract Graph Machine • Abstract model that represents graph algorithms as a processing function and strict weak ordering • In AGM for SSSP, the (vertex, distance) pair is called a workitem • An AGM consists of: - A definition of a graph (vertices, edges, vertex attributes, edge attributes) - A set of states (of the computation, e.g., distances) - A definition of workitems - A processing function - A strict weak ordering relation - An initial workitem set (to start the algorithm) 24Abstract Graph Machine
  • 25. The Abstract Graph Machine (AGM) • Definition of graph, including vertex and edge attributes (SSSP: edge weight) • Definition of workitems • Set of states (SSSP: distance from source) • Processing function (SSSP: relax) • Strict weak ordering (SSSP: depends on the algorithm!) • Initial set of workitems (SSSP: starting vertex) 25Abstract Graph Machine
  • 26. SSSP Algorithms in AGM • In general, an SSSP algorithm’s workitem consists of a vertex and a distance. Therefore, the set of workitems for the AGM is defined as follows: • is a pair (e.g., w = <v, d>). The “[]” operator retrieves values associated with “w”; e.g., w[0] returns the vertex associated with the workitem and w[1] returns the distance. • The state of the algorithm (the shortest distance calculated at a point in time) is stored in a state we call “distance”. 26Abstract Graph Machine
  • 27. The Processing Function for SSSP Input workitem. The workitem is a pair: w[0] returns the vertex associated with the workitem and w[1] returns the distance. 1. Check this condition: if the input workitem’s distance is better than the stored distance, go to 2. 2. Update the distance and go to 3. 3. Generate new workitems. relax() in AGM notation 27Abstract Graph Machine
  • 28. State Update & Work Generation 28Abstract Graph Machine State update logic. Work generation logic.
  • 29. Strict weak ordering relations (SSSP) • Chaotic SSSP strict weak ordering relation ( ) • Dijkstra’s SSSP strict weak ordering relation ( ) • ∆-Stepping SSSP strict weak ordering relation ( ) • A similar ordering relation can be derived for KLA. 29Abstract Graph Machine
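The relation symbols were dropped from the slide, but the three orderings can be reconstructed as ordinary C++ comparators over (vertex, distance) workitems; the struct names and the ∆ = 10 parameter below are illustrative, not the framework's:

```cpp
#include <cassert>
#include <tuple>

using WorkItem = std::tuple<int, int>;  // (vertex, distance)

// Chaotic: no workitem precedes any other -> one big equivalence class.
struct ChaoticOrdering {
  bool operator()(const WorkItem&, const WorkItem&) const { return false; }
};

// Dijkstra: strictly smaller distance comes first -> one equivalence
// class per distinct distance value.
struct DijkstraOrdering {
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) < std::get<1>(b);
  }
};

// Delta-Stepping: compare bucket indices floor(d / delta) -> workitems
// in the same bucket are incomparable (unordered within a bucket).
struct DeltaOrdering {
  int delta;
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) / delta < std::get<1>(b) / delta;
  }
};
```

Two workitems fall into the same equivalence class exactly when neither precedes the other under the relation; with DeltaOrdering that happens precisely when they land in the same ∆ bucket.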
  • 30. Algorithm Family • All algorithms share the same processing function but with different orderings. • The collection of algorithms that share the same processing function is called an algorithm family. 30Abstract Graph Machine
  • 31. Distributed ∆–Stepping Algorithm Partitions created by the global ordering, based on the ordering relation. First bucket created by the delta-stepping algorithm. Work within a bucket is not ordered; partitions are ordered. ∆–Stepping algorithm on two ranks (R0, R1). Huge opportunity for acceleration. 31Abstract Graph Machine
  • 32. Extended AGM (EAGM): ∆–Stepping + thread-level ordering For a specific algorithm, work within a bucket is not ordered. The AGM creates buckets with the global ordering, based on the relation, and we can process these buckets in parallel. But why stay unordered? Each rank can process its bucket with a better ordering! 32Abstract Graph Machine E.g., .
  • 33. Extended AGM (EAGM): Hybrid Hierarchical Ordering As we distribute/parallelize, we create different spatial domains for processing Extended AGM: Use different orderings in each domain 33 E.g., Abstract Graph Machine
  • 34. Extended AGM in General Abstract Graph Machine 34 AGM takes a processing function and a strict weak ordering relation EAGM takes a processing function and a hierarchy of strict weak ordering relations attached to each spatial domain. Global Ordering Node Ordering NUMA Ordering Thread Ordering
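The hierarchy of orderings can be pictured as comparator refinement: an inner (e.g., thread-level) relation applies only where the outer (e.g., global) relation leaves two workitems in the same equivalence class. The types below are our own illustrative sketch, not the framework's API:

```cpp
#include <cassert>
#include <tuple>

using WorkItem = std::tuple<int, int>;  // (vertex, distance)

// Refines an outer strict weak ordering with an inner one: the inner
// ordering applies only where the outer treats two workitems as
// equivalent -- the essence of an EAGM ordering hierarchy.
template <typename Outer, typename Inner>
struct Refine {
  Outer outer;
  Inner inner;
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    if (outer(a, b)) return true;
    if (outer(b, a)) return false;
    return inner(a, b);  // outer sees them as equivalent
  }
};

// Illustrative level orderings (names are ours, not the framework's).
struct GlobalDelta {     // global level: Delta buckets of width 10
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) / 10 < std::get<1>(b) / 10;
  }
};
struct ThreadDijkstra {  // thread level: order by exact distance
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) < std::get<1>(b);
  }
};

// A thread's effective order under "Delta globally, Dijkstra per thread".
using Hierarchy = Refine<GlobalDelta, ThreadDijkstra>;
```

Chaining Refine for node and NUMA levels would model the full global → node → NUMA → thread hierarchy in the same way.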
  • 35. Extended AGM (EAGM): E.g., ∆–Stepping with NUMA level ordering. • Work within each bucket is further ordered using NUMA local priority queues. 35Abstract Graph Machine ordering for B1 bucket and NUMA domain 1 in Rank 0
  • 36. Extended AGM (EAGM) - ∆–Stepping with node level ordering. • Work within each bucket is further ordered using node local priority queues. 36Abstract Graph Machine ordering for B1 bucket in Rank 0
  • 37. Engineering an AGM Abstract Graph Machine 37
  • 38. AGM Graph Processing Framework Abstract Graph Machine 38

// The processing function for SSSP. For SSSP a WorkItem is a vertex
// and a distance. CAS = atomic compare & swap. After executing wi, the
// processing function generates new workitems and pushes them for ordering.
template<typename buckets>
void PF(const WorkItem& wi, int tid, buckets& outset) {
  Vertex v = std::get<0>(wi);
  Distance d = std::get<1>(wi);
  Distance old_dist = vdistance[v];
  while (d < old_dist) {
    Distance din = CAS(d, &vdistance[v]);
    if (din == old_dist) {
      FORALL_OUTEDGES_T(v, e, g, Graph) {
        Vertex u = boost::target(e, g);
        Weight we = boost::get(vweight, e);
        WorkItem wigen(u, (d + we));
        outset.push(wigen, tid);
      }
      break;
    } else {
      old_dist = din;
    }
  }
}

//================== Dijkstra Ordering =====================
// The strict weak ordering partitions WorkItems; in this case the
// ordering is based on the distance.
template<int index>
struct dijkstra : public base_ordering {
public:
  static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_dijkstra;
  template <typename T>
  bool operator()(T i, T j) {
    return (std::get<index>(i) < std::get<index>(j));
  }
  eagm_ordering name() { return ORDERING_NAME; }
};
  • 39. Invoking the Framework Abstract Graph Machine 39

// Creating the AGM for SSSP. Template parameters: the WorkItem type,
// the processing function type, the strict weak ordering relation type,
// and the underlying runtime (the AGM framework abstracts out the runtime).
typedef agm<Graph, WorkItem, ProcessingFunction,
            StrictWeakOrdering, RuntimeModelGen> sssp_agm_t;
// Constructor arguments: the runtime instance, the specific ordering
// function (e.g., Delta ordering), the processing function instance,
// and the initial workitem set (e.g., the source vertex).
sssp_agm_t ssspalgo(rtmodelgen, ordering, pf, initial);
  • 40. Mapping AGM to an Implementation Abstract Graph Machine 40 Q1. How should we do ordering? Q2. How should we arrange the processing function and the ordering? Abstract → actual implementation (communication method).
  • 41. Ordering Implementation • Every compute node keeps a set of equivalence classes locally in a data structure. • Equivalence classes are ordered according to the strict weak ordering (swo), which is transitive. • Every equivalence class has: - A representative work-item - An append buffer that holds the work items for the equivalence class, i.e., all work items that are not comparable to the representative. Each buffer is processed by shared-memory parallel threads; details discussed later. 41Abstract Graph Machine
• 42. Ordering Implementation: Inserts to an Equivalence Class • Find the equivalence class by comparing the inserting work item with the representative work items (the inserting element is not comparable with the representative of its class). • If there is no such equivalence class, create one with the inserting work item as the representative work item. • Concurrency must be handled during insertion.
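The find-or-create insert described above can be sketched as follows. This is a minimal, single-threaded illustration (names such as DeltaOrdering and EquivalenceClass are illustrative, not the framework's actual types); work items are (vertex, distance) pairs and the strict weak ordering is a delta bucketing, so two items are equivalent when neither orders before the other.

```cpp
#include <list>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the framework's types.
using WorkItem = std::pair<int, int>; // (vertex, distance)

struct DeltaOrdering {
  int delta;
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return (a.second / delta) < (b.second / delta);
  }
};

struct EquivalenceClass {
  WorkItem representative;
  std::vector<WorkItem> buffer; // append buffer of equivalent work items
};

// Walk the ordered list of classes; insert into the first class whose
// representative is equivalent to wi (neither swo(wi, rep) nor swo(rep, wi)
// holds), or create a new class at the correct ordered position.
void insert(std::list<EquivalenceClass>& classes, const WorkItem& wi,
            const DeltaOrdering& swo) {
  auto it = classes.begin();
  for (; it != classes.end(); ++it) {
    if (swo(wi, it->representative)) break; // wi orders before this class
    if (!swo(it->representative, wi)) {     // not comparable: same class
      it->buffer.push_back(wi);
      return;
    }
  }
  classes.insert(it, EquivalenceClass{wi, {wi}}); // wi becomes representative
}
```

A real implementation must additionally deal with concurrent inserts, as the slide notes; this sketch only shows the ordering logic.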
• 43. Ordering Implementation: Processing an Equivalence Class • Find the equivalence class with the minimum representative work item. Equivalence classes are not distributed uniformly, so: every rank sends the representative work item of the first equivalence class in its list (an all-reduce communicates the minimum work items), and the globally smallest representative is selected. If a particular node does not have an equivalence class matching the global minimum representative, it inserts one.
• 44. Ordering Implementation: Termination of an Equivalence Class • While processing an equivalence class, more work can be added to the same equivalence class. • Every node keeps track of the number of work items pushed into the equivalence class and the number of work items processed in it. • The difference between those two numbers is globally reduced (sum); when the sum is zero, the framework decides the equivalence class is fully processed. Counts are reduced only after making sure all messages have been exchanged (i.e., after an epoch).
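The counting argument above can be sketched as a small simulation. This is an illustrative sketch only (no MPI here): the per-rank counters are modeled with a vector, and all_reduce_sum stands in for the global sum reduction that would run after an epoch.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Per-rank counters for one equivalence class (illustrative names).
struct RankCounters {
  std::int64_t pushed = 0;     // work items inserted into the class
  std::int64_t processed = 0;  // work items taken out and executed
};

// Stand-in for a global sum reduction (e.g., an allreduce) over the
// per-rank difference pushed - processed.
std::int64_t all_reduce_sum(const std::vector<RankCounters>& ranks) {
  return std::accumulate(ranks.begin(), ranks.end(), std::int64_t{0},
                         [](std::int64_t acc, const RankCounters& r) {
                           return acc + (r.pushed - r.processed);
                         });
}

// The equivalence class is done only when, after all in-flight messages
// have been delivered, the global pushed/processed difference is zero.
bool equivalence_class_done(const std::vector<RankCounters>& ranks) {
  return all_reduce_sum(ranks) == 0;
}
```

Reducing only after an epoch matters: a zero sum observed while messages are still in flight could be transient.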
• 45. AGM Implementation: Data Structure • A data structure is needed to hold the equivalence classes; later this data structure will be further extended for use at EAGM levels. • Requirements: it should order nodes by representative work items, fast lookups are important, and it must support concurrent insert/delete and lookup/search. • Sounds like a Dictionary ADT.
• 46. Data Structure • Binary search trees (BST) with locks (e.g., RB-trees): inserts/deletes often rebalance the tree, so the whole data structure must be locked. • Linked list with locks and atomics: no need to lock the whole data structure (only the node being modified), but lookup time is linear; there is not much difference between the locked and lock-free versions. • Concurrent SkipList: a probabilistic alternative with the same complexity guarantees as a BST that avoids rebalancing, but contention is much higher. • Partitioning scheme: explained in the next slide.
• 47. Data Structure: Partitioning Scheme • Every node maintains a set of vectors, one per thread; "thread buckets" is a data structure with a vector per thread that represents the next equivalence class to process (e.g., the ordering for bucket B1 in rank 0). • Every thread maintains the minimum work item it has seen so far; push back is O(1) (assuming no resize). • Find the global minimum representative, then partition by the global minimum with the strict weak ordering relation and move the partitioned work to a new vector (O(n)).
• 48. Data Structure: Partitioning Scheme • Thread buckets are processed in round-robin order (T0, T1, T2, T0, T1, T2, T0, …), which avoids load-imbalance issues. • But work items arriving for the bucket currently being processed cannot be inserted (we cannot afford a vector resize; supporting that would require locks). • Therefore, work items pushed for the current bucket are sent directly.
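The partition step of the scheme can be sketched as below. This is an illustrative, single-threaded sketch (names are not the framework's): plain ints stand in for work items, plain vectors for the per-thread buffers, and a delta bucketing for the strict weak ordering. Items equivalent to the global minimum move into fresh per-thread vectors that form the next equivalence class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative delta strict weak ordering over distances.
struct DeltaSWO {
  int delta;
  bool operator()(int a, int b) const { return (a / delta) < (b / delta); }
};

// Move every item equivalent to global_min (same bucket under the swo)
// out of the per-thread buffers into new per-thread vectors; items for
// later buckets stay behind. O(n) over all buffered items.
std::vector<std::vector<int>>
partition_next_class(std::vector<std::vector<int>>& per_thread,
                     int global_min, const DeltaSWO& swo) {
  std::vector<std::vector<int>> next(per_thread.size());
  for (std::size_t t = 0; t < per_thread.size(); ++t) {
    auto& buf = per_thread[t];
    // Predicate true: wi is strictly after the minimum bucket, keep it.
    // Predicate false: wi is equivalent to the minimum, move it out.
    auto mid = std::partition(buf.begin(), buf.end(), [&](int wi) {
      return swo(global_min, wi);
    });
    next[t].assign(mid, buf.end());
    buf.erase(mid, buf.end());
  }
  return next;
}
```

Since global_min is the minimum, nothing orders strictly before it, so the two-way partition is enough.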
• 49. Data Structure: Summary (pre-order processing function, SSSP delta ordering, two nodes, 16 shared-memory parallel threads, Graph500 scale 24)

  Data Structure                  | Execution Time (sec.)
  Linked List (Atomics)           | ~61
  Binary Search Trees with Locks  | ~82
  Concurrent SkipList             | ~89
  Partitioning Scheme             | ~44
• 50. Mapping AGM to an Implementation. Q1. How should we do the ordering? Q2. How should we arrange the processing function and the ordering? (Abstract model → actual implementation, via a communication method.)
• 51. Communication Paradigms • Push communication: ranks push work items to their owners (e.g., R1 pushes work items to R0). • Pull communication: ranks pull work items from other ranks (e.g., R0 pulls work items from R1). • GAS (Gather-Apply-Scatter) communication: each rank gathers, applies, and then scatters.
• 52. Placing the Processing Function: Pre-Order & Post-Order • Pre-order: after receiving a work item, the processing function is executed and the generated work items are inserted into the data structure for ordering. • Post-order: after receiving a work item, it is first inserted into the data structure for ordering and processed later.
• 53. Placing the Processing Function: Split-Order • The processing function contains logic for updating state and for generating new work. • We split the processing function into two functions: a state-update function and a work-generation function. • On receiving a work item, insert it for ordering only if the state was updated; generate new work only if the state has not changed since. • This allows us to prune work further.
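The split-order idea can be sketched for SSSP as below. This is a minimal illustration, not the framework's actual code (the struct and method names are invented): the state-update part runs when a work item arrives, and the work-generation part runs when the item is popped from the ordering data structure.

```cpp
#include <utility>
#include <vector>

// Illustrative split-order processing function for SSSP.
using WorkItem = std::pair<int, int>; // (vertex, distance)

struct SplitSSSP {
  std::vector<int> dist; // per-vertex tentative distances

  // Runs on receipt: returns true (i.e., insert the item for ordering)
  // only if it improved the distance. Stale work items are pruned
  // before they are even ordered.
  bool update_state(const WorkItem& wi) {
    if (wi.second < dist[wi.first]) {
      dist[wi.first] = wi.second;
      return true;
    }
    return false;
  }

  // Runs when the item is popped: generate new work only if the stored
  // state still matches the work item, i.e., no better distance arrived
  // while the item was waiting in the ordering data structure.
  bool should_generate(const WorkItem& wi) const {
    return dist[wi.first] == wi.second;
  }
};
```

The pruning happens twice: once on arrival (no insert for stale items) and once on pop (no edge relaxations for superseded items), which is why split-order inserts and processes far fewer work items in the summary table that follows.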
• 54. Placing the Processing Function: Summary

  Ordering    | Run Time | Total Inserts | Additional WorkItems Processed
  Split-order | 14.54    | 17139284      | 21334
  Pre-order   | 62.55    | 409388039     | 2469025
  Post-order  | 44.09    | 398344581     | 2284918

  Split-order has the best timing: the total number of work items pushed into the data structure is much smaller than in the other two implementations, and the additional relaxations are far fewer because of pruning. Post-order is slightly better than pre-order because of reduced contention.
• 55. Engineering an EAGM
• 56. EAGM Spatial Levels [Diagram: memory hierarchy with four spatial levels -- Global, Node (N0-N2), NUMA (numa0/numa1 per node), and Thread (T0-T5 per node).]
• 57. EAGM Execution

    // SSSP (AGM) algorithm
    typedef agm<Graph, WorkItem, ProcessingFunction,
                StrictWeakOrdering, RuntimeModelGen> sssp_agm_t;
    sssp_agm_t ssspalgo(rtmodelgen, ordering, pf, initial);

    // SSSP (EAGM) algorithm
    typedef eagm<Graph, WorkItem, ProcessingFunction,
                 EAGMConfig, RuntimeModelGen> sssp_eagm_t;
    sssp_eagm_t ssspalgo(rtmodelgen, config, pf, initial);

  The AGM takes a single strict weak ordering relation; the EAGM takes a hierarchy of orderings (EAGMConfig) that corresponds to the memory hierarchy -- e.g., global Delta (=5) ordering, node-level chaotic ordering, NUMA-level chaotic ordering, and thread-level chaotic ordering.
• 58. EAGMConfig

    // strict weak orderings
    //================== Chaotic =====================
    struct chaotic : public base_ordering {
    public:
      static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_chaotic;
      template <typename T>
      bool operator()(T i, T j) { return false; }
      eagm_ordering name() { return ORDERING_NAME; }
    };

  Under the chaotic ordering, no two work items are comparable to each other.

    CHAOTIC_ORDERING_T ch;
    DELTA_ORDERING_T delta(agm_params.delta);
    auto config = boost::graph::agm::create_eagm_config(delta, ch, ch, ch);

  The arguments are, in order: the global delta ordering, the node-level chaotic ordering, the NUMA-level chaotic ordering, and the thread-level chaotic ordering.
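Why chaotic means "no ordering at all" follows directly from the definition of equivalence under a strict weak ordering: two items are equivalent when neither orders before the other, and chaotic's comparator returns false for every pair. A quick check of that property, with integers standing in for work items:

```cpp
// Chaotic comparator as on the slide: every comparison is false.
struct Chaotic {
  template <typename T>
  bool operator()(T, T) const { return false; }
};

// Equivalence under a strict weak ordering: neither orders before the other.
template <typename SWO, typename T>
bool equivalent(const SWO& swo, const T& a, const T& b) {
  return !swo(a, b) && !swo(b, a);
}
```

Since every pair is equivalent, a chaotic level always collapses into a single equivalence class, which is exactly what the static optimizations later exploit.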
• 59. EAGM Execution [Diagram: execution timeline over nodes N0-N2 (each with numa0/numa1 and threads T0-T5), showing equivalence classes separated by barrier synchronizations.] With this configuration, execution is the same as the AGM: globally synchronous, and asynchronous at the process, NUMA, and thread levels.
• 60. EAGM Execution • Globally asynchronous execution: a single equivalence class at the global level. • Node-level synchronous execution: equivalence classes at the process level, with local thread barriers. • Execution is asynchronous at the NUMA and thread levels.
• 61. EAGM Execution [Diagram: timeline over nodes N0-N2 showing nested global, node, NUMA, and thread equivalence classes with global, NUMA, and thread barriers; spatial ordering on one axis, temporal ordering on the other.] • If we have non-chaotic orderings defined for every memory level, we obtain both a spatial ordering and a temporal ordering.
• 62. EAGM Implementation • The heart of the EAGM implementation is a nested data structure that holds equivalence classes: global-level classes contain node-level classes, which in turn contain NUMA-level classes.
  • 63. . . . . . . . . . . . . . . . . . . Time EAGM Implementation: Data Structure Abstract Graph Machine 64 Order by Delta=10 Order by K=2 Order by Delta=5 E.g.,
• 64. EAGM Implementation: Static Optimizations • A chaotic ordering always creates a single equivalence class; therefore we remove the nested levels for chaotic orderings, yielding an optimized data structure. • These optimizations are performed statically at compile time with the help of template meta-programming and template specialization.
• 65. EAGM Implementation: Static Optimizations • Remove nested levels for chaotic orderings. (After this reduction, the configuration is really an AGM.)
• 66. EAGM Implementation: Static Optimizations
• 68. Single Source Shortest Paths: Pre/Post/Split Orderings • Weak scaling results carried out on 2 Broadwell 16-core Intel Xeon processors; 2^30 vertices and 2^34 edges; shared-memory execution. • Split-order is ~5 times faster than pre-order and post-order.
• 69. Single Source Shortest Paths: Weak Scaling • Inputs: a Graph500 graph, the Graph500-proposed SSSP graph, and an Erdos-Renyi graph. • Framework algorithms show better scaling than PowerGraph. • Thread-level Dijkstra ordering shows better scaling behavior for power-law graphs, while globally synchronous delta-stepping scales better for ER graphs.
• 70. Single Source Shortest Paths: Strong Scaling (BR II+) • Relative speedup = time for the fastest sequential algorithm / parallel execution time on n PEs. • Synchronization overhead becomes significant when executing on a higher number of nodes. • Just ordering by level does not reduce the work, but applying Dijkstra ordering at the thread level gives better performance.
• 71. SSSP More Results: Weak Scaling (BR II) • Both PowerGraph and PBGL-1 fail to scale well at higher scales.
• 72. Breadth First Search: Weak Scaling • K(2) global ordering and level global ordering reduce more redundant work compared to thread-level ordering.
• 73. BFS: In-node Performance (Weak Scaling) • Comparing global-level synchronous vs. node-level synchronous: within a node both configurations execute in the same way, so their performance is similar.
• 74. BFS: Strong Scaling • Global-level synchronous versions show better speedup than thread-level synchronous versions.
• 75. BFS: Road Networks • Strong scaling with road networks. Road networks have a high diameter (~850-900); therefore, the global-level synchronous version shows poor scaling because of synchronization overhead, compared to the globally asynchronous, thread-level synchronous version.
• 76. Other Graph Applications • Connected Components • Maximal Independent Set • Triangle Counting
• 77. Conclusions • Using the AGM abstraction, we showed that we can generate families of algorithms by keeping the processing function intact and changing only the ordering. • With the EAGM we can achieve both spatial and temporal orderings. • We discussed the challenges in mapping the AGM framework to an implementation and described how to avoid them. • We produced an efficient implementation of the AGM and EAGM frameworks; weak and strong scaling results show that AGM/EAGM algorithms outperform graph algorithms in other frameworks.
• 78. Thank You!

Editor's Notes

  1. Work within each bucket is further ordered using thread local priority queues (threadq).
  2. Interesting duality: expressing the algorithm with the AGM lets us pick an ordering which exposes parallelizability, but once parallelized, we can order in different ways. Usually what makes sense is to start as loosely as possible and then add more ordering.