This document summarizes a lecture on challenges and opportunities in parallel graph processing for big data. Graphs are ubiquitous, but processing large graphs at scale is difficult because of their huge and growing size, the complex correlations between data entities, and skewed degree distributions. Current computation models suffer from ghost vertices, excessive interaction between partitions, and poor support for iterative graph algorithms; new frameworks are needed that scale with a low memory footprint and balanced computation and communication.
1. Ling Liu, School of Computer Science, College of Computing
Part II: Distributed Graph Processing
2. Big Data Trends
Big Data: Volume, Velocity, Variety
- 1 zettabyte = a trillion gigabytes (10^21 bytes) (CISCO, 2012)
- 500 million Tweets per day
- 100 hours of video are uploaded every minute
3. Why Graphs?
Graphs are everywhere: social network graphs, road networks, national security, business analytics, biological graphs.
- Friendship Graph (Facebook Engineering, 2010)
- Brain Network (The Journal of Neuroscience, 2011)
- US Road Network (www.pavementinteractive.org)
- Web Security Graph (McAfee, 2013)
- Intelligence Data Model (NSA, 2013)
4. How Big?
Social scale (Twitter graph from the Gephi dataset, http://www.gephi.org)
- 1 billion vertices, 100 billion edges
- 111 PB adjacency matrix
- 2.92 TB adjacency list
- 2.92 TB edge list
Web scale (Internet graph from the Opte Project, http://www.opte.org/maps; web graph from the SNAP database, http://snap.stanford.edu/data)
- 50 billion vertices, 1 trillion edges
- 271 EB adjacency matrix
- 29.5 TB adjacency list
- 29.1 TB edge list
Brain scale (human connectome; Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011)
- 100 billion vertices, 100 trillion edges
- 1.1 ZB (2.08 mNA bytes, "molar bytes") adjacency matrix
- 2.84 PB adjacency list
- 2.84 PB edge list
Source: Paul Burkhardt and Chris Waring, An NSA Big Graph experiment (NSA-RD-2013-056001v1)
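The adjacency-matrix figures follow directly from one bit per vertex pair. A minimal sketch reproducing them (binary-prefix units assumed; the adjacency-list and edge-list sizes depend on id width and per-vertex overhead, so they are not recomputed here):

```python
# Back-of-the-envelope adjacency-matrix sizes for the three scales above,
# assuming a dense bit matrix: one bit per (u, v) vertex pair.
def adjacency_matrix_bytes(num_vertices: int) -> float:
    return num_vertices ** 2 / 8

for name, v in [("social", 10**9), ("web", 50 * 10**9), ("brain", 100 * 10**9)]:
    size = adjacency_matrix_bytes(v)
    print(f"{name}: {size / 2**50:,.0f} PiB ({size / 2**60:,.2f} EiB)")
# social: 111 PiB; web: ~271 EiB; brain: ~1,084 EiB (~1.1 ZiB) -- matching the slide
```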
5. Big Graph Data: Technical Challenges
Huge and growing size
- Requires massive storage capacities
- Graph analytics usually requires much bigger computing and storage resources
Complicated correlations among data entities (vertices)
- Make it hard to parallelize graph processing (hard to partition)
- Most existing big data systems are not designed to handle such complexity
Skewed distribution (i.e., high-degree vertices)
- Makes it hard to ensure load balancing
6. Parallel Graph Processing: Challenges
- Structure-driven computation: storage and data transfer issues
- Irregular graph structure and computation model: storage and data/computation partitioning issues; partitioning vs. load/resource balancing
8. Build New Graph Frameworks: Key Requirements/Challenges
- Less pre-processing
- Low and load-balanced computation
- Low and load-balanced communication
- Low memory footprint
- Scalable with respect to cluster size and graph size
- General graph processing framework for large collections of graph computation algorithms and applications
9. Graph Operations: Two Distinct Classes
Iterative graph algorithms
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the execution
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
Graph pattern queries
- Subgraph matching problem
- Requires fast query response time
- Explores a small fraction of the entire graph
- Examples: friends-of-friends, triangle patterns
- Systems: RDF-3X, TripleBit, SHAPE (VLDB 2014, IEEE SC 2015)
11. What Are Iterative Graph Algorithms?
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the operations
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Google's Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
[Figures: SSSP, Connected Components; source: AMPLab]
12. Why Is Iterative Graph Processing So Difficult?
Huge and growing size of graph data
- Makes it hard to store and handle the data on a single machine
Poor locality (many random accesses)
- Each vertex depends on its neighboring vertices, recursively
Huge size of intermediate data for each iteration
- Requires additional computing and storage resources
Heterogeneous graph algorithms
- Different algorithms have different computation and access patterns
High-degree vertices
- Make it hard to ensure load balancing
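The "poor locality" point is easiest to see in code. Below is a minimal vertex-centric PageRank step in Python (the graph representation and damping factor are illustrative assumptions, not from the slides): every update reads an arbitrary set of neighbor values, so at scale these accesses scatter across memory or disk, and each iteration also materializes a full copy of the rank vector as intermediate data.

```python
# Minimal vertex-centric PageRank step (illustrative sketch).
# 'graph' maps each vertex to its out-neighbors; 'rank' is the mutable state.
def pagerank_step(graph, rank, damping=0.85):
    n = len(graph)
    incoming = {v: 0.0 for v in graph}
    for u, neighbors in graph.items():
        share = rank[u] / len(neighbors) if neighbors else 0.0
        for v in neighbors:       # neighbor ids are arbitrary, so these
            incoming[v] += share  # writes jump all over memory/disk
    return {v: (1 - damping) / n + damping * incoming[v] for v in graph}

graph = {1: [2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3]}
rank = {v: 1 / len(graph) for v in graph}
for _ in range(10):                        # ten iterations, ten full passes
    rank = pagerank_step(graph, rank)
```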
14. The Problems of Current Computation Models
- "Ghost" vertices maintain adjacency structure and replicate remote data
- Too much interaction among partitions
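A sketch of the ghost-vertex pattern the slide criticizes (class and method names are illustrative assumptions): each partition caches read-only copies of remote neighbors, and those copies must be re-synchronized every iteration, which is exactly the cross-partition interaction that grows with the number of cut edges.

```python
# Illustrative ghost-vertex bookkeeping in one partition (hypothetical names).
class Partition:
    def __init__(self, local_edges, local_values):
        self.edges = local_edges    # adjacency of locally owned vertices
        self.values = local_values  # mutable state of owned vertices
        self.ghosts = {}            # cached values of remote neighbors

    def remote_neighbors(self):
        owned = self.values.keys()
        return {v for nbrs in self.edges.values() for v in nbrs if v not in owned}

    def refresh_ghosts(self, fetch):
        # 'fetch' stands in for messages to the owning partitions; with many
        # cut edges this synchronization dominates each iteration
        self.ghosts = {v: fetch(v) for v in self.remote_neighbors()}

p = Partition(local_edges={1: [2, 7]}, local_values={1: 0.5, 2: 0.3})
p.refresh_ghosts(fetch=lambda v: 0.0)  # vertex 7 is remote, so it becomes a ghost
```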
16. Why Don't We Use MapReduce?
Of course, we can use MapReduce! The first iteration of Connected Components for this graph would be:

Map (each vertex sends its current label across every incident edge):
(2,1); (1,2), (3,2), (4,2); (2,4), (3,4); (2,3), (4,3)

Reduce (each vertex keeps the minimum of its own label and the received values):

K | Values  | Min
1 | 2       | 1
2 | 1, 3, 4 | 1
3 | 2, 4    | 2
4 | 2, 3    | 2
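A runnable sketch of this map/reduce round in Python (the grouping helper is hypothetical; the slide's graph has edges 1-2, 2-3, 2-4, 3-4):

```python
from collections import defaultdict

# One MapReduce-style round of min-label connected components (sketch).
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
label = {v: v for v in {u for e in edges for u in e}}  # initial label = own id

def map_phase(edges, label):
    for u, v in edges:        # each vertex sends its label across the edge
        yield v, label[u]
        yield u, label[v]

def reduce_phase(pairs, label):
    grouped = defaultdict(list)            # the shuffle: group values by key
    for k, val in pairs:
        grouped[k].append(val)
    return {k: min([label[k]] + vals) for k, vals in grouped.items()}

label = reduce_phase(map_phase(edges, label), label)
print(label)  # {1: 1, 2: 1, 3: 2, 4: 2} -- the slide's first-iteration result
```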
17. Why We Shouldn't Use MapReduce
But... in a typical MapReduce job, disk I/Os are performed in four places (figure source: http://arasan-blog.blogspot.com/). So 10 iterations mean disk I/Os in 40 places.
18. Related Work
Distributed memory-based systems
- Messaging-based: Google Pregel, Apache Giraph, Apache Hama
- Vertex mirroring: GraphLab, PowerGraph, GraphX
- Dynamic load balancing: Mizan, GPS
- Graph-centric view: Giraph++
Disk-based systems using a single machine
- Vertex-centric model: GraphChi
- Edge-centric model: X-Stream
- Vertex-edge centric: GraphLego
With external memory
- Out-of-core capabilities (Apache Giraph, Apache Hama, GraphX)
- Not optimized for graph computations
- Users need to configure several parameters
19. Two Research Directions: Iterative Graph Processing Systems
Disk-based systems on a single machine
- Load a part of the input graph in memory
- Include a set of data structures and techniques to efficiently load graph data from disk
- GraphChi, X-Stream, ...
- Disadvantages: 1) relatively slow, 2) resource limitations of a single machine
Distributed memory-based systems on a cluster
- Load the whole input graph in memory
- Load all intermediate results and messages in memory
- Pregel, Giraph, Hama, GraphLab, GraphX, ...
- Disadvantages: 1) very high memory requirement, 2) coordination of distributed machines
20. Main Features
Develop GraphMap
- Distributed iterative graph computation framework that effectively utilizes secondary storage
- Goal: reduce the memory requirement of iterative graph computations while ensuring competitive (or better) performance
Main contributions
- Clear separation between mutable and read-only data
- Two-level partitioning technique for locality-optimized data placement
- Dynamic access methods based on the workloads of the current iteration
21. Clear Data Separation
Graph data
- Vertices and their data (mutable)
- Edges and their data (read-only)
Read edge data for each iteration!
22
Locality-‐Based
Data
Placement
on
Disk
Edge
Access
Locality
! All
edges
(out-‐edges,
in-‐edges
or
bi-‐edges)
of
a
vertex
are
accessed
together
to
update
its
vertex
value
è
We
place
all
connected
edges
of
a
vertex
together
on
disk
Vertex
Access
Locality
! All
ver2ces
in
a
par22on
are
accessed
by
the
same
worker
(processor)
in
every
itera2on
è
We
store
all
ver2ces,
in
a
par22on,
and
their
edges
into
con2guous
disk
blocks
to
u2lize
sequenEal
disk
accesses
How
can
you
access
disk
efficiently
for
each
iteraEon?
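Since the GraphMap prototype stores graph data in HBase (slide 24), one way to get both locality properties is a composite row key sorted by partition first, then vertex, so a worker's whole partition sits in contiguous blocks. This layout is an assumption for illustration, not the paper's exact schema:

```python
# Locality-preserving key layout (an assumed sketch, not GraphMap's schema):
# sorting rows by (partition, vertex) keeps each partition's vertices and
# their edge lists in contiguous disk blocks, enabling sequential scans.
def row_key(partition_id: int, vertex_id: int) -> bytes:
    # fixed-width big-endian encoding so byte order matches numeric order
    return partition_id.to_bytes(4, "big") + vertex_id.to_bytes(8, "big")

rows = {row_key(p, v): nbrs
        for (p, v), nbrs in {(0, 1): [2], (0, 2): [1, 3], (1, 3): [2]}.items()}
for key in sorted(rows):        # partition 0's vertices come out together
    print(key.hex(), rows[key])
```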
23. Dynamic Access Methods
Various workloads: is the current workload larger than the threshold?
- YES → sequential disk accesses
- NO → random disk accesses
The threshold is dynamically configured based on actual access times for each iteration and for each worker.
[Figure: number of active vertices (x1000) per iteration for PageRank, CC and SSSP]
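A sketch of that choice (the break-even rule is an illustrative assumption consistent with the slide, not the paper's exact formula): scan the whole partition sequentially when many vertices are active, otherwise seek to just the active ones.

```python
# Illustrative dynamic access-method selection.
def choose_access_method(num_active: int, threshold: float) -> str:
    return "sequential" if num_active > threshold else "random"

def update_threshold(seq_time_per_vertex, rand_time_per_vertex, partition_size):
    # break-even point: random seeks for the active set cost as much as one
    # full sequential scan; times are measured per iteration and per worker
    return partition_size * seq_time_per_vertex / rand_time_per_vertex

threshold = update_threshold(0.001, 0.01, 50_000)   # -> 5,000 active vertices
print(choose_access_method(12_000, threshold))      # "sequential"
```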
24. Experiments
First prototype of GraphMap
- BSP engine & messaging engine: utilize Apache Hama
- Disk storage: utilize Apache HBase (a two-dimensional key-value store)
Settings
- Cluster of 21 machines on Emulab
- 12GB RAM, Xeon E5530, 500GB and 250GB SATA disks
- Connected via a 1 GigE network
- HBase (ver. 0.96) on HDFS of Hadoop (ver. 1.0.4)
- Hama (ver. 0.6.3)
Iterative graph algorithms
- 1) PageRank (10 iterations), 2) SSSP, 3) CC
25
ExecuEon
Time
Analysis
! Hama
fails
for
large
graphs
with
more
than
900M
edges
while
GraphMap
s2ll
works
! Note
that,
in
all
the
cases,
GraphMap
is
faster
(up
to
6
2mes)
than
Hama,
which
is
the
in-‐memory
system
26. 26
26
Breakdown
of
GraphMap
ExecuEon
Time
PageRank on uk-2005 SSSP on uk-2005
CC on uk-2005
Analysis
! For
PageRank,
all
itera2ons
have
similar
results
except
the
first
and
last
! For
SSSP,
itera2on
5
–
15
u2lize
sequen2al
disk
accesses
based
on
our
dynamic
selec2on
! For
CC,
random
disk
accesses
are
selected
from
itera2on
24
27. 27
27
Effects
of
Dynamic
Access
Methods
Analysis
! GraphMap
chooses
the
op2mal
access
method
in
most
of
the
itera2ons
! Possible
further
improvement
through
fine-‐tuning
in
itera2ons
5
and
15
! For
cit-‐Patents,
GraphMap
always
chooses
random
accesses
because
only
3.3%
ver2ces
are
reachable
from
the
start
vertex
and
thus
the
number
of
ac2ve
ver2ces
is
always
small
0
5
10
15
20
25
30
35
0 5 10 15 20 25 30 35 40
ComputationTime(sec)
Iteration
Sequential
Random
Dynamic
28. Comparing GraphMap with Other Systems
- Existing distributed graph systems are all in-memory systems. In addition to Hama, we give a relative comparison with a few other representative systems.
- GraphMap: 12GB DRAM per node on a cluster of 21 nodes, i.e., 252GB of distributed shared memory; the compared systems use up to 5x DRAM per node
29. Social: LiveJournal (LJ) Graph Dataset
- Vertices: members; edges: friendship
Graph datasets (stored in HDFS)
- cit-Patents (raw size: 268MB): 3.8M vertices, 16.5M edges
- soc-LiveJournal1 (raw size: 1.1GB): 4.8M vertices, 69M edges
30. Our Initial Experience with Spark / GraphX
Cluster setting
- 6 machines (1 master & 5 slaves)
Spark setting
- Spark shell (i.e., did NOT implement any Spark application yet)
- Built-in PageRank function of GraphX
- All 40 cores (= 8 cores x 5 slaves)
- Portion of memory for RDD storage: 0.52 (by default)
- If we assign 512MB for each executor, about 265MB is dedicated for RDD storage
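A quick check of the slide's memory arithmetic (the 0.52 fraction is the deck's number; the exact accounting of storage memory differs across Spark versions):

```python
# RDD storage memory per executor, using the fraction quoted on the slide.
executor_memory_mb = 512
rdd_storage_fraction = 0.52
print(executor_memory_mb * rdd_storage_fraction)  # ~266 MB, the slide's "about 265MB"
```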
32. Spark/GraphX Experience and Messaging Cost
Our initial experience with Spark
- Spark performs well with large per-node memory (>= 68GB DRAM), as reported in the Spark/GraphX paper
- It does not perform well for clusters whose nodes have smaller DRAM
Messaging overheads
- Distributed graph processing systems do not scale as the number of nodes increases, due to the messaging cost among compute nodes in the cluster to synchronize the computation in each iteration round
33. Summary
GraphMap
- Distributed iterative graph computation framework that effectively utilizes secondary storage
- Clear separation between mutable and read-only data
- Locality-based data placement on disk
- Dynamic access methods based on the workloads of the current iteration
Ongoing research
- Disk and worker colocation to improve the disk access performance
- Efficient and lightweight partitioning techniques, incorporating our work on GraphLego for single-PC graph processing [ACM HPDC 2015]
- Comparing with Spark/GraphX on a larger-DRAM cluster
34. General Purpose Distributed Graph System
Existing state of the art
- Separate efforts for the two representative graph operations
- Separate efforts for the scale-up and scale-out systems
Challenges for developing a general purpose graph processing system
- Different data access patterns / graph computation models
- Different inter-node communication effects
Possible directions
- Graph summarization techniques
- Lightweight graph partitioning techniques
- Optimized data storage systems and access methods