A System for Large-Scale Graph Processing
Pregel, GoldenOrb, Giraph
2012-07-18
Andrew Yongjoon Kong
sstrato.open@gmail.com
Contents
• Introduction
• Model of Computation
• Pregel Architecture
• GoldenOrb
• Implementation
• Future work
Introduction
Introduction
• Today, many practical computing problems concern large graphs
• Applied algorithms
  - Shortest-path computation
  - PageRank
  - Clustering techniques
• MapReduce is ill-suited for graph processing
  - Many iterations are needed for parallel graph processing
  - Materializing intermediate results at every MapReduce iteration harms performance
Introduction
• Hadoop is well-suited for non-iterative, data-parallel processing
  (Smith-Waterman is a non-iterative case and of course runs fine)
Introduction
[Figure: k-means as an iterative MapReduce job. Map tasks compute the distance from each data point to each cluster center and assign points to centers; reduce tasks compute new cluster centers; the user program drives the map/reduce rounds repeatedly.]
Iterative?
• Such a framework should handle iterative processing, e.g., PDE (Partial Differential Equation) solvers
• http://www.iterativemapreduce.org/
Graph-based Computation
• Pregel
  – Google's large-scale graph processing system
• GoldenOrb
• Giraph
  – Yahoo!-backed platform
• Hama
  – Apache project
• Pegasus
  – Carnegie Mellon University
Single Source Shortest Path (SSSP)
• Problem
  – Find the shortest path from a source node to all target nodes
• Solution
  – MapReduce
  – Pregel
Example: SSSP using MapReduce
• A Map task receives
  – Key: node n
  – Value: D(n) (distance from the start node) and the list of nodes reachable from n
• For each neighbor m of n, the Map task emits the candidate distance D(n) + w(n, m)
• The Reduce task gathers the candidate distances for each node and selects the minimum one
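To make one iteration concrete, here is a minimal Hadoop sketch. The record layout (nodeID<TAB>dist|adjList, with "inf" for unreached nodes), the NODE/DIST tags, and the class names are assumptions for illustration, not from the original slides:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One SSSP iteration. Hypothetical input line: "A<TAB>0|B:10,D:5".
public class SsspIteration {

  public static class SsspMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] kv = line.toString().split("\t");        // [nodeID, "dist|adjList"]
      String[] rec = kv[1].split("\\|", -1);            // [dist, adjList]
      // Pass the node record through so the reducer can rebuild the graph.
      ctx.write(new Text(kv[0]), new Text("NODE|" + kv[1]));
      if (rec[0].equals("inf") || rec[1].isEmpty()) return;
      long dist = Long.parseLong(rec[0]);
      for (String edge : rec[1].split(",")) {           // edge = "B:10"
        String[] e = edge.split(":");
        // Candidate distance to neighbor e[0] via this node.
        ctx.write(new Text(e[0]), new Text("DIST|" + (dist + Long.parseLong(e[1]))));
      }
    }
  }

  public static class SsspReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text nodeId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      long best = Long.MAX_VALUE;
      String adjList = "";
      for (Text v : values) {
        String[] parts = v.toString().split("\\|", -1); // ["NODE",dist,adj] or ["DIST",d]
        if (parts[0].equals("NODE")) {
          adjList = parts[2];
          if (!parts[1].equals("inf")) best = Math.min(best, Long.parseLong(parts[1]));
        } else {
          best = Math.min(best, Long.parseLong(parts[1]));
        }
      }
      String dist = (best == Long.MAX_VALUE) ? "inf" : Long.toString(best);
      ctx.write(nodeId, new Text(dist + "|" + adjList)); // map input for the next iteration
    }
  }
}

A driver would run this job repeatedly, feeding each iteration's output back in as input, until an iteration changes no distance.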
Example: SSSP using MapReduce
• Adjacency matrix
      A   B   C   D   E
  A   0  10   0   5   0
  B   0   0   1   2   0
  C   0   0   0   0   4
  D   0   3   9   0   2
  E   7   0   6   0   0
• Adjacency list
  A: (B, 10), (D, 5)
  B: (C, 1), (D, 2)
  C: (E, 4)
  D: (B, 3), (C, 9), (E, 2)
  E: (A, 7), (C, 6)
[Figure: the example graph. A=0; B, C, D, E=∞. Directed edges: A→B (10), A→D (5), B→C (1), B→D (2), C→E (4), D→B (3), D→C (9), D→E (2), E→A (7), E→C (6).]
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <inf, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, inf> <D, inf>
<E, inf>
<B, inf> <C, inf> <E, inf>
<A, inf> <C, inf>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; current distances A=0, B=∞, C=∞, D=∞, E=∞.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <inf, <(C, 1), (D, 2)>>> <B, 10> <B, inf>
<C, <inf, <(E, 4)>>> <C, inf> <C, inf> <C, inf>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, inf>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, inf>
[Figure: the graph; distances before this reduce: A=0, B=∞, C=∞, D=∞, E=∞.]
Select the minimum feasible value and update the previous iteration's result.
Example: SSSP using MapReduce
• Reduce output: <node ID, <dist, adj list>>
  = Map input for the next iteration
<A, <0, <(B, 10), (D, 5)>>>
<B, <10, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 11> <D, 12>
<E, inf>
<B, 8> <C, 14> <E, 7>
<A, inf> <C, inf>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances after iteration 1: A=0, B=10, D=5; C, E=∞.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <10, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <inf, <(E, 4)>>> <C, 11> <C, 14> <C, inf>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 12>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, 7>
[Figure: the graph with the current distance estimates.]
Select the minimum feasible value and update the previous iteration's result.
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <8, <(C, 1), (D, 2)>>>
<C, <11, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <7, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 9> <D, 10>
<E, 15>
<B, 8> <C, 14> <E, 7>
<A, 14> <C, 13>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances after iteration 2: A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, 14>
<B, <8, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <11, <(E, 4)>>> <C, 9> <C, 14> <C, 13>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 10>
<E, <7, <(A, 7), (C, 6)>>> <E, 15> <E, 7>
Select the minimum feasible value and update the previous iteration's result.
[Figure: the graph; distances after iteration 3: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <8, <(C, 1), (D, 2)>>>
<C, <9, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <7, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 9> <D, 10>
<E, 13>
<B, 8> <C, 14> <E, 7>
<A, 14> <C, 13>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances unchanged: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, 14>
<B, <8, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <9, <(E, 4)>>> <C, 9> <C, 14> <C, 13>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 10>
<E, <7, <(A, 7), (C, 6)>>> <E, 13> <E, 7>
No changes: quit the process!
[Figure: the final graph; distances A=0, B=8, C=9, D=5, E=7.]
• MapReduce stores the nodes and adjacency distances as key/value pairs; it is better suited to processing huge flat datasets than large-scale graphs.
Here, we introduce a new system: Pregel!
Model of Computation
Model of Pregel Computation
Supersteps:
• A sequence of iterations
• Vertices compute in parallel
Input: a directed graph
• Vertex: a vertex ID and a modifiable, user-defined value
• Edges: a target vertex and a modifiable, user-defined value, associated with their source vertices
Output: a directed graph
• The set of values explicitly output by the vertices
• Vertices and edges can be added and removed
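A toy, single-machine sketch of the superstep contract follows (an illustration of the model, not Pregel's implementation; the SketchVertex interface is invented for the example). In superstep S every active vertex consumes the messages sent to it in superstep S-1, updates its state, and sends messages to be delivered in superstep S+1; the job ends when all vertices have voted to halt and no messages are in flight:

import java.util.*;

// Hypothetical minimal vertex interface for this sketch.
interface SketchVertex {
  String id();
  // Consume last superstep's messages; may add messages to the outbox.
  void compute(List<Long> messages, Map<String, List<Long>> outbox);
  boolean isHalted();
}

class BspDriver {
  static void run(Map<String, SketchVertex> vertices) {
    // Messages delivered at the start of the current superstep, keyed by vertex ID.
    Map<String, List<Long>> inbox = new HashMap<>();
    while (true) {
      Map<String, List<Long>> outbox = new HashMap<>();
      boolean anyActive = false;
      for (SketchVertex v : vertices.values()) {
        List<Long> msgs = inbox.getOrDefault(v.id(), Collections.emptyList());
        // A halted vertex is reactivated only if it received messages.
        if (!v.isHalted() || !msgs.isEmpty()) {
          v.compute(msgs, outbox);
          anyActive = true;
        }
      }
      // Barrier: messages sent in superstep S become visible in superstep S+1.
      inbox = outbox;
      if (!anyActive && inbox.isEmpty()) break;  // all halted, nothing in flight
    }
  }
}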
Maximum Value Example
• Propagate the largest value to every vertex
[Figure: four vertices A, B, C, D exchanging values over supersteps until all hold the maximum.]
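In vertex-centric terms the algorithm is a few lines. A minimal sketch, assuming a Pregel-style API (getValue/setValue, superStep, sendMessage; the Message type carrying a destination ID is invented here):

// Each vertex keeps the largest value it has seen; it re-broadcasts only
// when it learns a larger value, then votes to halt.
public void compute(Collection<Message> messages) {
  long max = getValue();
  for (Message m : messages) max = Math.max(max, m.value());
  if (superStep() == 0 || max > getValue()) {
    setValue(max);
    for (Edge e : getEdges()) sendMessage(new Message(e.target(), max));
  }
  voteToHalt();  // reactivated automatically when a new message arrives
}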
Single Source Shortest Path (SSSP)
• Problem
  – Find the shortest path from a source node to all target nodes
• Solution
  – MapReduce
  – Pregel
Example: SSSP using Pregel
[Figure: superstep 0. Source A=0; B, C, D, E=∞.]
Example: SSSP using Pregel
[Figure: the vertices send candidate distances along their out-edges.]
Example: SSSP using Pregel
[Figure: after superstep 1: A=0, B=10, D=5; C, E=∞.]
Example: SSSP using Pregel
[Figure: the vertices send candidate distances along their out-edges.]
Example: SSSP using Pregel
[Figure: after superstep 2: A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using Pregel
[Figure: candidate distances 9, 10, 13, 14, 15 in flight; current values A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using Pregel
[Figure: after superstep 3: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using Pregel
[Figure: one last candidate distance (13) in flight; values unchanged.]
Example: SSSP using Pregel
[Figure: final state: A=0, B=8, C=9, D=5, E=7. No more messages; all vertices halt.]
Pregel vs. MapReduce
• Pregel
  – Keeps vertices and edges on the machine that performs the computation
  – Uses the network only for messages
  – Sufficiently expressive; no need for remote reads
• MapReduce
  – Requires much more communication and associated overhead
  – Coordinating the steps of a chained MapReduce adds programming complexity
Pregel Architecture
System Architecture
• The Pregel system uses the master/worker model
  – Master
     Coordinates worker activity
     Determines the number of partitions and assigns them to workers
     Recovers from worker faults (via "ping" messages)
     Maintains statistics about the progress of the computation and the state of the graph
  – Worker
     Maintains the state of its portion of the graph in memory
     Executes the Compute() method on each vertex
     Communicates with the other workers
[Figure: master/worker interaction.
• Master: assigns a portion of the input to each worker and instructs each worker to perform a superstep.
• Worker: calls Compute() for each vertex, updates its data structures, receives/sends messages, and responds to the master when finished.
• Client: controls the number of partitions in the graph and notifies the master to start processing.]
Fault Tolerance
• Checkpointing
  – The master periodically instructs the workers to save the state of their partitions to persistent storage
     e.g., vertex values, edge values, incoming messages
• Failure detection
  – Uses regular "ping" messages
• Recovery
  – The master reassigns graph partitions to the currently available workers
  – The workers all reload their partition state from the most recent available checkpoint
GoldenOrb
GoldenOrb
• Open-source version of Google's Pregel
• Implemented in Java
• Version 0.1.1
• Requirements
  - Hadoop file system (HDFS)
  - ZooKeeper for communication
[Figure: GoldenOrb architecture. A ZooKeeper tree (OrbCluster) tracks the OrbTrackers, a LeaderGroup, a JobQueue, and JobsInProgress (job IDs, messages, heartbeats). One Orb-Tracker is the leader and the others are standbys; each runs a job manager and a partition manager and reacts to watcher events (LeadershipChange, LostMember, NewMember, JobStart/Death/Complete). Partitions (one master, several slaves) each maintain inbound, outbound, and current message queues, synchronize on barriers (startLoadVerticesBarrier, superstep-start barrier, doneComputingVerticesBarrier, doneSendingMessagesBarrier), and read/write HDFS.]
Message Exchange
• Messages are exchanged between supersteps
• The outbound messages of superstep [S-1] are the inbound messages of superstep [S]
• When the outbound queue is full, its messages are sent and queuing resumes
• A partition that receives a message mid-superstep stores it in the inbound queue and keeps it until the next superstep
• The messages to be used in the current superstep are copied into the current message queue
• If the inbound queue exceeds the memory size of the system or the JVM, an overflow occurs
Memory management
• Outbound Message Queue
  - Fixed size; when full, the messages are sent immediately
• Inbound Message Queue
  - Used in the next superstep
  - Can overflow when the message volume grows
• Current Message Queue
  - Same size as the inbound queue
  - Since the current superstep copies the inbound queue into the current queue, the combined currentQueue + inboundQueue memory use can overflow
⇒ The inbound queue needs to be reimplemented on file-based local storage
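As a sketch of the fixed-size outbound queue behavior described above (my illustration, not GoldenOrb's actual classes):

import java.util.ArrayList;
import java.util.List;

// Toy fixed-capacity outbound queue: buffers messages and flushes (sends)
// the whole batch as soon as the capacity is reached.
class OutboundQueue<M> {

  interface Sender<T> { void send(List<T> batch); }

  private final int capacity;
  private final List<M> buffer = new ArrayList<>();
  private final Sender<M> sender;

  OutboundQueue(int capacity, Sender<M> sender) {
    this.capacity = capacity;
    this.sender = sender;
  }

  void enqueue(M message) {
    buffer.add(message);
    if (buffer.size() >= capacity) flush();   // full: send now, then keep queuing
  }

  void flush() {                              // also invoked at the superstep barrier
    if (buffer.isEmpty()) return;
    sender.send(new ArrayList<>(buffer));
    buffer.clear();
  }
}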
API
• Subclass the predefined classes
  – Reader / Writer / Vertex / Message
// API summary (method bodies omitted)
abstract class Vertex<VV, EV, MV> {
  public Vertex(Class<VV> vertexValue, Class<EV> edgeValue, Class<MV> messageValue);
  String vertexID();
  abstract void compute(Collection<MV> messages);
  long superStep();
  void setValue(VV value);
  VV getValue();
  Collection<Edge<EV>> getEdges();
  void sendMessage(MV message);
  void voteToHalt();
}
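As a usage sketch, the earlier SSSP example maps onto this API roughly as follows (the IntMessage type carrying a destination ID, the Edge accessors, and the use of Hadoop's IntWritable are assumptions; GoldenOrb's real signatures may differ):

import java.util.Collection;
import org.apache.hadoop.io.IntWritable;

// Sketch: each vertex holds its best-known distance (Integer.MAX_VALUE when
// unreached) and relaxes its out-edges whenever the distance improves.
public class SsspVertex extends Vertex<IntWritable, IntWritable, IntMessage> {

  public SsspVertex() {
    super(IntWritable.class, IntWritable.class, IntMessage.class);
  }

  @Override
  public void compute(Collection<IntMessage> messages) {
    int best = getValue().get();
    for (IntMessage m : messages) best = Math.min(best, m.get());
    boolean improved = best < getValue().get();
    if (improved) setValue(new IntWritable(best));
    // At superstep 0 only the source (distance 0) has anything to send.
    if (best < Integer.MAX_VALUE && (improved || superStep() == 0)) {
      for (Edge<IntWritable> e : getEdges()) {
        // Offer each neighbor the distance through this vertex.
        sendMessage(new IntMessage(e.getTarget(), best + e.getValue().get()));
      }
    }
    voteToHalt();  // reactivated when a shorter-distance message arrives
  }
}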
Not yet implemented
• Aggregator
  – A mechanism for global communication, monitoring, and data
• Combiner
  – Reduces the number of messages
  – E.g., if compute() sums the messages' values, a combiner can compute and transmit a single message (the sum)
• Topology mutation
  – Removing or adding vertices/edges
• Fault recovery
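For illustration, a combiner for the summing example could look like this (a hypothetical interface; nothing like it exists in GoldenOrb 0.1.1, which is the point of this slide):

// Hypothetical combiner: collapses all messages headed to the same vertex
// into one message holding their sum, cutting network traffic.
public class SumCombiner {
  public IntMessage combine(String destVertexID, Collection<IntMessage> messages) {
    int sum = 0;
    for (IntMessage m : messages) sum += m.get();
    return new IntMessage(destVertexID, sum);
  }
}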
Implementation
Implemented algorithms
• Maximum Value
• Single Source Shortest Path
• PageRank
• K-means
• Mean-shift
Maximum Value
[Figure: MaximumValue example run.]
PageRank
• PageRank is Google's way of deciding a page's importance
• An important page is linked to by many pages with high PageRank
• PR(A) = PR(v1)/L(v1) + … + PR(vn)/L(vn), where v1…vn are the pages linking to A and L(v) is the number of outbound links of v
• Add a damping factor d:
  PR(A) = (1 - d) + d · Σ PR(v)/L(v)
• Repeat until converged
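A sketch of the corresponding vertex logic (an analogous placeholder API to the earlier sketches, with a double vertex value; a fixed superstep budget stands in for a real convergence test):

// PageRank in the vertex model: d = 0.85, 30 supersteps.
public void compute(Collection<Message> messages) {
  if (superStep() >= 1) {
    double sum = 0;
    for (Message m : messages) sum += m.value();
    setValue(0.15 + 0.85 * sum);   // PR(A) = (1 - d) + d * sum(PR(v)/L(v))
  }
  if (superStep() < 30) {
    // Split this vertex's PageRank evenly over its out-links.
    double share = getValue() / getEdges().size();
    for (Edge e : getEdges()) sendMessage(new Message(e.target(), share));
  } else {
    voteToHalt();
  }
}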
PageRank
[Figure: a six-vertex example graph (A–F) with its input file and output file.]
K-means
• N observations are partitioned into k clusters
• Each observation belongs to the cluster with the nearest mean
[Figure: k-means flowchart. Start with the number of clusters K; calculate the centroids; compute the distance from each object to the centroids; group objects by minimum distance; if no object changed its group, end, otherwise repeat.]
K-means
• A message includes a cluster ID and a value
• Every superstep, each vertex sends a message to all vertices (see the sketch below)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102; A (seed1) and B (seed2) are the initial seeds.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
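A sketch of the per-vertex logic (same placeholder style as the earlier sketches; ClusterMessage, clusterID(), and broadcast() are invented names, and the superstep-0 seeding corner cases are glossed over). The broadcast at the end is what produces the N×N message volume criticized later:

// Recompute the centroids from the previous superstep's (clusterID, value)
// broadcasts, adopt the nearest cluster, and broadcast our own assignment.
public void compute(Collection<ClusterMessage> messages) {
  Map<Integer, Double> sum = new HashMap<>();
  Map<Integer, Integer> count = new HashMap<>();
  for (ClusterMessage m : messages) {                 // centroid = mean per cluster
    sum.merge(m.clusterID(), (double) m.value(), Double::sum);
    count.merge(m.clusterID(), 1, Integer::sum);
  }
  int best = clusterID();                             // keep current assignment by default
  double bestDist = Double.MAX_VALUE;
  for (int c : sum.keySet()) {
    double centroid = sum.get(c) / count.get(c);
    double dist = Math.abs(getValue() - centroid);
    if (dist < bestDist) { bestDist = dist; best = c; }
  }
  setClusterID(best);
  broadcast(new ClusterMessage(best, getValue()));    // to ALL vertices: N×N messages
}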
K-means
Centroid1 = Value(A)
Centroid2 = Value(B)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102; seeds A and B.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
K-means
Centroid1 = Mean(A, C, D, E, F)
Centroid2 = Mean(B)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
S2    C2  C2  C2  C1  C1  C1
K-means
Centroid1 = Mean(D, E, F)
Centroid2 = Mean(A, B, C)
If the centroids have converged, quit the process!
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
S2    C2  C2  C2  C1  C1  C1
S3    C2  C2  C2  C1  C1  C1
K-means
[Figure: input file and output file of the K-means run.]
N : number of vertices
Every superstep, N×N messages are exchanged.
⇒ O(N²): needs far too much memory!
Giraph
Giraph
• The ASF (Apache Software Foundation)'s open-source version of Google's Pregel
• Implemented in Java
• Apache incubator project
• Requirements
  - Hadoop 0.20.203 or a higher version
    : runs as a map-only job in Hadoop
  - ZooKeeper
    : if not present, Giraph uses the Hadoop file system instead of ZooKeeper
Giraph – vertex distribution
[Figure: how vertices are distributed across workers.]
Giraph - usage
• Users can set the checkpoint frequency
  – GiraphJob.getConfiguration().set("giraph.checkpointFrequency", 0)
    // 0 means no checkpoints
• Users should set the ZooKeeper configuration
  – GiraphJob.setZookeeperConfiguration("zk-server-list");
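Putting the two calls together (a fragment, with job setup and imports elided; the constructor form and setInt usage are assumptions about the incubator-era API, only the two calls above come from the slides):

// Sketch of configuring checkpoints and ZooKeeper on a GiraphJob.
GiraphJob job = new GiraphJob("sssp-example");                 // constructor form assumed
// Checkpoint every 2 supersteps; 0 would disable checkpointing entirely.
job.getConfiguration().setInt("giraph.checkpointFrequency", 2);
// Comma-separated ZooKeeper quorum; hostnames here are placeholders.
job.setZookeeperConfiguration("zk1:2181,zk2:2181,zk3:2181");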
Giraph - characteristics
• Fault tolerance
  – If the master dies, a new one automatically takes over
  – If a worker dies, the application is rolled back to a previously checkpointed superstep
  – If a ZooKeeper server dies, the application can proceed as long as a quorum remains
  – But the Hadoop SPOF still exists
• Combiner/Aggregator
• JSON input/output format
• Easy job status monitoring (HTTP)
Experiments
Experiments
• 3 servers
• nPartition = nMapper = 9
• MR vs. GoldenOrb vs. Giraph
  – PageRank
  – K-means (Mahout)
  – Measured: elapsed time, CPU, memory, disk, network
Experiments - PageRank
• Number of vertices ≈ 220,000
• Fixed iterations = 100

            Elapsed  CPU    Memory     Network (bytes)   Disk (sec/s)
            time     (%)    (KB)       Rcv.     Trans.   read   write
GoldenOrb   1m 56s   14.53  3,745,376  19,437   12,845   777    606
Giraph      3m 31s   8.77   1,244,000  11,374   914      0      326
MapReduce   34m 51s  3.75   3,091,239  13,514   867      0      4101
Experiments - K-means
• Number of vertices = 100,000
• Number of clusters (K) = 10

            Elapsed  CPU    Memory     Network (bytes)   Disk (sec/s)
            time     (%)    (KB)       Rcv.     Trans.   read   write
GoldenOrb   3m 19s   13.32  3,857,892  11,634   27,086   128    1151
Giraph      1m 49s   6.36   1,245,000  7,810    1,999    0      536
MapReduce   11m 28s  1.48   2,645,517  13,528   1,005    0      7104
Experiments
[Figure: elapsed time (s) compared across the three systems.]
Installation
Install GoldenOrb (1)
• Requirements
  - hadoop-0.20.2
  - zookeeper-3.3.3
• Download & unzip
  – org.goldenorb.core-0.1.1-SNAPSHOT-distribution.zip
Install GoldenOrb (2)
• Set the configuration
  ① ORB_HOME environment variable
     > export ORB_HOME=/usr/local/goldenorb
  ② conf/orbServers
     > localhost:/usr/local/goldenorb
  ③ conf/orb-site.xml
     > cp orb-site.sample.xml orb-site.xml
     > vi orb-site.xml
  ④ If in distributed mode, copy the configuration to all servers

<property>
  <name>goldenOrb.zookeeper.quorum</name>
  <value>localhost</value>   <!-- the target ZooKeeper server IP -->
  <description>The server running zookeeper</description>
</property>
……
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Install GoldenOrb (3)
• Set up the running environment
  ① Start Hadoop
     > $HADOOP_HOME/bin/start-dfs.sh
  ② Start ZooKeeper
     > $ZK_HOME/bin/zkServer.sh start
  ③ Start the orb-tracker
     > $ORB_HOME/bin/orb-tracker.sh start
  ④ Check the logs
     > cat $ORB_HOME/logs/xxx.log
Install GoldenOrb (4)
• Make the input
  - e.g., maximum value
  <vertex-id> <value> <outgoing-edge-list>

A 0 B D
B 8 C D
C 11 E
D 5 B C E
E 7 A C

[Figure: the corresponding graph with vertex values A=0, B=8, C=11, D=5, E=7.]
Install GoldenOrb (5)
• Upload the input files
  > hadoop fs -put maxvalue.txt /test/
• Run
  > java -cp conf/.:org.goldenorb.core-0.1.1-SNAPSHOT.jar:lib/*:yourjar.jar
    org.goldenOrb.algorithms.YourAlgorithm
    goldenOrb.orb.localFilesToDistribute=/home/user/yourjar.jar
    mapred.input.dir=/test/maxvaluetxt/ mapred.output.dir=/test/output
    goldenOrb.orb.requestedPartitions=3 goldenOrb.orb.reservedPartitions=0
    goldenOrb.orb.classpaths=yourjar.jar
• Result
  > hadoop fs -ls /test/output
  > hadoop fs -cat /test/output/*
Install Giraph (1)
• Requirements
  - hadoop-0.20.203
  - zookeeper-3.3.3
  - maven 3.0.3
• Download
  > svn checkout http://svn.apache.org/repos/asf/incubator/giraph/trunk giraph
• Compile
  > mvn compile
  – Generates target/giraph-{version}-jar-with-dependencies.jar
Install Giraph (2)
• Set up the running environment
  ① > $HADOOP_HOME/bin/start-all.sh
  ② > $ZK_HOME/zkServer.sh start
• Upload the input file to HDFS
  > hadoop fs -put test.grf /giraph/test/input/
* Input format (JSON):
[0,0,[[1,0]]]
[1,0,[[2,100]]]
[2,100,[[3,200]]]
[3,300,[[4,300]]]
[4,600,[[5,400]]]
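Each line is one vertex: [vertexID, vertexValue, [[targetID, edgeValue], ...]]. A throwaway generator for the sample above (the file name and the chain structure are just the slide's example):

import java.io.PrintWriter;

// Writes the sample chain graph shown above to test.grf, one JSON array per line.
public class MakeTestGraph {
  public static void main(String[] args) throws Exception {
    int[] values = {0, 0, 100, 300, 600};    // vertex values from the slide
    try (PrintWriter out = new PrintWriter("test.grf")) {
      for (int id = 0; id < values.length; id++) {
        int edgeValue = id * 100;            // edge weights 0, 100, 200, 300, 400
        out.printf("[%d,%d,[[%d,%d]]]%n", id, values[id], id + 1, edgeValue);
      }
    }
  }
}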
Install Giraph (3)
• Run
  - e.g., the shortest-path algorithm
  > hadoop jar giraph-{version}-jar-with-dependencies.jar
    org.apache.giraph.examples.SimpleShortestPathsVertex <input-path>
    <output-path> <source-node> <number-of-workers>
• Running status
  – http://localhost:50030
• Result
  > hadoop fs -cat <output-path>/part-*
GoldenOrb vs. Giraph

                                          GoldenOrb   Giraph
Status monitoring                         log files   easy (web UI)
Fault tolerance                           X           O
Vertex mutation, Combiner, Aggregator…    X           O
Development environment                   X           O
I/O format                                O           X
Updates                                   X           O
Thank you!