A System for Large-Scale Graph Processing
Pregel, GoldenOrb, Giraph
2012-07-18
Andrew Yongjoon Kong
sstrato.open@gmail.com
Contents
• Introduction
• Model of Computation
• Pregel Architecture
• GoldenOrb
• Implementation
• Future work
Introduction
Introduction
• Today, many practical computing problems concern large graphs
• Applied algorithms
  - Shortest-path computation
  - PageRank
  - Clustering techniques
• MapReduce is ill-suited for graph processing
  - Many iterations are needed for parallel graph processing
  - Materializing intermediate results at every MapReduce iteration harms performance
Introduction
• Hadoop is well-suited for non-iterative, data-parallel processing
  (Smith-Waterman is a non-iterative case and of course runs fine)
Introduction
[Figure: k-means as an iterative MapReduce job. Map tasks compute the distance from each data point to each cluster center and assign points to centers; reduce tasks compute new cluster centers; the user program drives the map/reduce rounds repeatedly.]
Iterative?
• Such a framework should handle iterative processing, e.g., PDE (Partial Differential Equation) solvers
• http://www.iterativemapreduce.org/
Graph-based Computation
• Pregel
  – Google's large-scale graph processing system
• GoldenOrb
• Giraph
  – Yahoo!-backed platform
• Hama
  – Apache project
• Pegasus
  – Carnegie Mellon University
Single Source Shortest Path (SSSP)
• Problem
  – Find the shortest path from a source node to all target nodes
• Solution
  – MapReduce
  – Pregel
Example: SSSP using MapReduce
• A Map task receives
  – Key: node n
  – Value: D(n) (distance from the start node) and the list of nodes reachable from n
• For each neighbor m of n, the Map task emits the candidate distance D(n) + w(n, m)
• The Reduce task gathers the candidate distances for each node and selects the minimum one
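To make one iteration concrete, here is a minimal Hadoop sketch. The record layout (nodeID<TAB>dist|adjList, with "inf" for unreached nodes), the NODE/DIST tags, and the class names are assumptions for illustration, not from the original slides:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One SSSP iteration. Hypothetical input line: "A<TAB>0|B:10,D:5".
public class SsspIteration {

  public static class SsspMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] kv = line.toString().split("\t");        // [nodeID, "dist|adjList"]
      String[] rec = kv[1].split("\\|", -1);            // [dist, adjList]
      // Pass the node record through so the reducer can rebuild the graph.
      ctx.write(new Text(kv[0]), new Text("NODE|" + kv[1]));
      if (rec[0].equals("inf") || rec[1].isEmpty()) return;
      long dist = Long.parseLong(rec[0]);
      for (String edge : rec[1].split(",")) {           // edge = "B:10"
        String[] e = edge.split(":");
        // Candidate distance to neighbor e[0] via this node.
        ctx.write(new Text(e[0]), new Text("DIST|" + (dist + Long.parseLong(e[1]))));
      }
    }
  }

  public static class SsspReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text nodeId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      long best = Long.MAX_VALUE;
      String adjList = "";
      for (Text v : values) {
        String[] parts = v.toString().split("\\|", -1); // ["NODE",dist,adj] or ["DIST",d]
        if (parts[0].equals("NODE")) {
          adjList = parts[2];
          if (!parts[1].equals("inf")) best = Math.min(best, Long.parseLong(parts[1]));
        } else {
          best = Math.min(best, Long.parseLong(parts[1]));
        }
      }
      String dist = (best == Long.MAX_VALUE) ? "inf" : Long.toString(best);
      ctx.write(nodeId, new Text(dist + "|" + adjList)); // map input for the next iteration
    }
  }
}

A driver would run this job repeatedly, feeding each iteration's output back in as input, until an iteration changes no distance.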
Example: SSSP using MapReduce
• Adjacency matrix
      A   B   C   D   E
  A   0  10   0   5   0
  B   0   0   1   2   0
  C   0   0   0   0   4
  D   0   3   9   0   2
  E   7   0   6   0   0
• Adjacency list
  A: (B, 10), (D, 5)
  B: (C, 1), (D, 2)
  C: (E, 4)
  D: (B, 3), (C, 9), (E, 2)
  E: (A, 7), (C, 6)
[Figure: the example graph. A=0; B, C, D, E=∞. Directed edges: A→B (10), A→D (5), B→C (1), B→D (2), C→E (4), D→B (3), D→C (9), D→E (2), E→A (7), E→C (6).]
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <inf, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, inf> <D, inf>
<E, inf>
<B, inf> <C, inf> <E, inf>
<A, inf> <C, inf>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; current distances A=0, B=∞, C=∞, D=∞, E=∞.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <inf, <(C, 1), (D, 2)>>> <B, 10> <B, inf>
<C, <inf, <(E, 4)>>> <C, inf> <C, inf> <C, inf>
<D, <inf, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, inf>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, inf>
[Figure: the graph; distances before this reduce: A=0, B=∞, C=∞, D=∞, E=∞.]
Select the minimum feasible value and update the previous iteration's result.
Example: SSSP using MapReduce
• Reduce output: <node ID, <dist, adj list>>
  = Map input for the next iteration
<A, <0, <(B, 10), (D, 5)>>>
<B, <10, <(C, 1), (D, 2)>>>
<C, <inf, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <inf, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 11> <D, 12>
<E, inf>
<B, 8> <C, 14> <E, 7>
<A, inf> <C, inf>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances after iteration 1: A=0, B=10, D=5; C, E=∞.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, inf>
<B, <10, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <inf, <(E, 4)>>> <C, 11> <C, 14> <C, inf>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 12>
<E, <inf, <(A, 7), (C, 6)>>> <E, inf> <E, 7>
[Figure: the graph with the current distance estimates.]
Select the minimum feasible value and update the previous iteration's result.
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <8, <(C, 1), (D, 2)>>>
<C, <11, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <7, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 9> <D, 10>
<E, 15>
<B, 8> <C, 14> <E, 7>
<A, 14> <C, 13>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances after iteration 2: A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, 14>
<B, <8, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <11, <(E, 4)>>> <C, 9> <C, 14> <C, 13>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 10>
<E, <7, <(A, 7), (C, 6)>>> <E, 15> <E, 7>
Select the minimum feasible value and update the previous iteration's result.
[Figure: the graph; distances after iteration 3: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using MapReduce
• Map input: <node ID, <dist, adj list>>
<A, <0, <(B, 10), (D, 5)>>>
<B, <8, <(C, 1), (D, 2)>>>
<C, <9, <(E, 4)>>>
<D, <5, <(B, 3), (C, 9), (E, 2)>>>
<E, <7, <(A, 7), (C, 6)>>>
• Map output: <dest node ID, dist>
<B, 10> <D, 5>
<C, 9> <D, 10>
<E, 13>
<B, 8> <C, 14> <E, 7>
<A, 14> <C, 13>
(plus the five node records above, passed through unchanged)
Flush to local disk!
[Figure: the graph; distances unchanged: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using MapReduce
• Reduce input: <node ID, dist>
<A, <0, <(B, 10), (D, 5)>>> <A, 14>
<B, <8, <(C, 1), (D, 2)>>> <B, 10> <B, 8>
<C, <9, <(E, 4)>>> <C, 9> <C, 14> <C, 13>
<D, <5, <(B, 3), (C, 9), (E, 2)>>> <D, 5> <D, 10>
<E, <7, <(A, 7), (C, 6)>>> <E, 13> <E, 7>
No changes: quit the process!
[Figure: the final graph; distances A=0, B=8, C=9, D=5, E=7.]
• MapReduce stores the nodes and adjacency distances as key/value pairs; it is better suited to processing huge flat datasets than large-scale graphs.
Here, we introduce a new system: Pregel!
Model of Computation
Model of Pregel Computation
Supersteps:
• A sequence of iterations
• Vertices compute in parallel
Input: a directed graph
• Vertex: a vertex ID and a modifiable, user-defined value
• Edges: a target vertex and a modifiable, user-defined value, associated with their source vertices
Output: a directed graph
• The set of values explicitly output by the vertices
• Vertices and edges can be added and removed
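A toy, single-machine sketch of the superstep contract follows (an illustration of the model, not Pregel's implementation; the SketchVertex interface is invented for the example). In superstep S every active vertex consumes the messages sent to it in superstep S-1, updates its state, and sends messages to be delivered in superstep S+1; the job ends when all vertices have voted to halt and no messages are in flight:

import java.util.*;

// Hypothetical minimal vertex interface for this sketch.
interface SketchVertex {
  String id();
  // Consume last superstep's messages; may add messages to the outbox.
  void compute(List<Long> messages, Map<String, List<Long>> outbox);
  boolean isHalted();
}

class BspDriver {
  static void run(Map<String, SketchVertex> vertices) {
    // Messages delivered at the start of the current superstep, keyed by vertex ID.
    Map<String, List<Long>> inbox = new HashMap<>();
    while (true) {
      Map<String, List<Long>> outbox = new HashMap<>();
      boolean anyActive = false;
      for (SketchVertex v : vertices.values()) {
        List<Long> msgs = inbox.getOrDefault(v.id(), Collections.emptyList());
        // A halted vertex is reactivated only if it received messages.
        if (!v.isHalted() || !msgs.isEmpty()) {
          v.compute(msgs, outbox);
          anyActive = true;
        }
      }
      // Barrier: messages sent in superstep S become visible in superstep S+1.
      inbox = outbox;
      if (!anyActive && inbox.isEmpty()) break;  // all halted, nothing in flight
    }
  }
}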
Maximum Value Example
• Propagate the largest value to every vertex
[Figure: four vertices A, B, C, D exchanging values over supersteps until all hold the maximum.]
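In vertex-centric terms the algorithm is a few lines. A minimal sketch, assuming a Pregel-style API (getValue/setValue, superStep, sendMessage; the Message type carrying a destination ID is invented here):

// Each vertex keeps the largest value it has seen; it re-broadcasts only
// when it learns a larger value, then votes to halt.
public void compute(Collection<Message> messages) {
  long max = getValue();
  for (Message m : messages) max = Math.max(max, m.value());
  if (superStep() == 0 || max > getValue()) {
    setValue(max);
    for (Edge e : getEdges()) sendMessage(new Message(e.target(), max));
  }
  voteToHalt();  // reactivated automatically when a new message arrives
}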
Single Source Shortest Path (SSSP)
• Problem
  – Find the shortest path from a source node to all target nodes
• Solution
  – MapReduce
  – Pregel
Example: SSSP using Pregel
[Figure: superstep 0. Source A=0; B, C, D, E=∞.]
Example: SSSP using Pregel
[Figure: the vertices send candidate distances along their out-edges.]
Example: SSSP using Pregel
[Figure: after superstep 1: A=0, B=10, D=5; C, E=∞.]
Example: SSSP using Pregel
[Figure: the vertices send candidate distances along their out-edges.]
Example: SSSP using Pregel
[Figure: after superstep 2: A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using Pregel
[Figure: candidate distances 9, 10, 13, 14, 15 in flight; current values A=0, B=8, C=11, D=5, E=7.]
Example: SSSP using Pregel
[Figure: after superstep 3: A=0, B=8, C=9, D=5, E=7.]
Example: SSSP using Pregel
[Figure: one last candidate distance (13) in flight; values unchanged.]
Example: SSSP using Pregel
[Figure: final state: A=0, B=8, C=9, D=5, E=7. No more messages; all vertices halt.]
Pregel vs. MapReduce
• Pregel
  – Keeps vertices and edges on the machine that performs the computation
  – Uses the network only for messages
  – Sufficiently expressive; no need for remote reads
• MapReduce
  – Requires much more communication and associated overhead
  – Coordinating the steps of a chained MapReduce adds programming complexity
Pregel Architecture
System Architecture
• The Pregel system uses the master/worker model
  – Master
     Coordinates worker activity
     Determines the number of partitions and assigns them to workers
     Recovers from worker faults (via "ping" messages)
     Maintains statistics about the progress of the computation and the state of the graph
  – Worker
     Maintains the state of its portion of the graph in memory
     Executes the Compute() method on each vertex
     Communicates with the other workers
[Figure: master/worker interaction.
• Master: assigns a portion of the input to each worker and instructs each worker to perform a superstep.
• Worker: calls Compute() for each vertex, updates its data structures, receives/sends messages, and responds to the master when finished.
• Client: controls the number of partitions in the graph and notifies the master to start processing.]
Fault Tolerance
• Checkpointing
  – The master periodically instructs the workers to save the state of their partitions to persistent storage
     e.g., vertex values, edge values, incoming messages
• Failure detection
  – Uses regular "ping" messages
• Recovery
  – The master reassigns graph partitions to the currently available workers
  – The workers all reload their partition state from the most recent available checkpoint
GoldenOrb
GoldenOrb
• Open-source version of Google's Pregel
• Implemented in Java
• Version 0.1.1
• Requirements
  - Hadoop file system (HDFS)
  - ZooKeeper for communication
[Figure: GoldenOrb architecture. A ZooKeeper tree (OrbCluster) tracks the OrbTrackers, a LeaderGroup, a JobQueue, and JobsInProgress (job IDs, messages, heartbeats). One Orb-Tracker is the leader and the others are standbys; each runs a job manager and a partition manager and reacts to watcher events (LeadershipChange, LostMember, NewMember, JobStart/Death/Complete). Partitions (one master, several slaves) each maintain inbound, outbound, and current message queues, synchronize on barriers (startLoadVerticesBarrier, superstep-start barrier, doneComputingVerticesBarrier, doneSendingMessagesBarrier), and read/write HDFS.]
Message Exchange
• Messages are exchanged between supersteps
• The outbound messages of superstep [S-1] are the inbound messages of superstep [S]
• When the outbound queue is full, its messages are sent and queuing resumes
• A partition that receives a message mid-superstep stores it in the inbound queue and keeps it until the next superstep
• The messages to be used in the current superstep are copied into the current message queue
• If the inbound queue exceeds the memory size of the system or the JVM, an overflow occurs
Memory management
• Outbound Message Queue
  - Fixed size; when full, the messages are sent immediately
• Inbound Message Queue
  - Used in the next superstep
  - Can overflow when the message volume grows
• Current Message Queue
  - Same size as the inbound queue
  - Since the current superstep copies the inbound queue into the current queue, the combined currentQueue + inboundQueue memory use can overflow
⇒ The inbound queue needs to be reimplemented on file-based local storage
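As a sketch of the fixed-size outbound queue behavior described above (my illustration, not GoldenOrb's actual classes):

import java.util.ArrayList;
import java.util.List;

// Toy fixed-capacity outbound queue: buffers messages and flushes (sends)
// the whole batch as soon as the capacity is reached.
class OutboundQueue<M> {

  interface Sender<T> { void send(List<T> batch); }

  private final int capacity;
  private final List<M> buffer = new ArrayList<>();
  private final Sender<M> sender;

  OutboundQueue(int capacity, Sender<M> sender) {
    this.capacity = capacity;
    this.sender = sender;
  }

  void enqueue(M message) {
    buffer.add(message);
    if (buffer.size() >= capacity) flush();   // full: send now, then keep queuing
  }

  void flush() {                              // also invoked at the superstep barrier
    if (buffer.isEmpty()) return;
    sender.send(new ArrayList<>(buffer));
    buffer.clear();
  }
}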
API
• Subclass the predefined classes
  – Reader / Writer / Vertex / Message
// API summary (method bodies omitted)
abstract class Vertex<VV, EV, MV> {
  public Vertex(Class<VV> vertexValue, Class<EV> edgeValue, Class<MV> messageValue);
  String vertexID();
  abstract void compute(Collection<MV> messages);
  long superStep();
  void setValue(VV value);
  VV getValue();
  Collection<Edge<EV>> getEdges();
  void sendMessage(MV message);
  void voteToHalt();
}
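As a usage sketch, the earlier SSSP example maps onto this API roughly as follows (the IntMessage type carrying a destination ID, the Edge accessors, and the use of Hadoop's IntWritable are assumptions; GoldenOrb's real signatures may differ):

import java.util.Collection;
import org.apache.hadoop.io.IntWritable;

// Sketch: each vertex holds its best-known distance (Integer.MAX_VALUE when
// unreached) and relaxes its out-edges whenever the distance improves.
public class SsspVertex extends Vertex<IntWritable, IntWritable, IntMessage> {

  public SsspVertex() {
    super(IntWritable.class, IntWritable.class, IntMessage.class);
  }

  @Override
  public void compute(Collection<IntMessage> messages) {
    int best = getValue().get();
    for (IntMessage m : messages) best = Math.min(best, m.get());
    boolean improved = best < getValue().get();
    if (improved) setValue(new IntWritable(best));
    // At superstep 0 only the source (distance 0) has anything to send.
    if (best < Integer.MAX_VALUE && (improved || superStep() == 0)) {
      for (Edge<IntWritable> e : getEdges()) {
        // Offer each neighbor the distance through this vertex.
        sendMessage(new IntMessage(e.getTarget(), best + e.getValue().get()));
      }
    }
    voteToHalt();  // reactivated when a shorter-distance message arrives
  }
}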
Not yet implemented
• Aggregator
  – A mechanism for global communication, monitoring, and data
• Combiner
  – Reduces the number of messages
  – E.g., if compute() sums the messages' values, a combiner can compute and transmit a single message (the sum)
• Topology mutation
  – Removing or adding vertices/edges
• Fault recovery
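For illustration, a combiner for the summing example could look like this (a hypothetical interface; nothing like it exists in GoldenOrb 0.1.1, which is the point of this slide):

// Hypothetical combiner: collapses all messages headed to the same vertex
// into one message holding their sum, cutting network traffic.
public class SumCombiner {
  public IntMessage combine(String destVertexID, Collection<IntMessage> messages) {
    int sum = 0;
    for (IntMessage m : messages) sum += m.get();
    return new IntMessage(destVertexID, sum);
  }
}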
Implementation
Implemented algorithms
• Maximum Value
• Single Source Shortest Path
• PageRank
• K-means
• Mean-shift
Maximum Value
[Figure: MaximumValue example run.]
PageRank
• PageRank is Google's way of deciding a page's importance
• An important page is linked to by many pages with high PageRank
• PR(A) = PR(v1)/L(v1) + … + PR(vn)/L(vn), where v1…vn are the pages linking to A and L(v) is the number of outbound links of v
• Add a damping factor d:
  PR(A) = (1 - d) + d · Σ PR(v)/L(v)
• Repeat until converged
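A sketch of the corresponding vertex logic (an analogous placeholder API to the earlier sketches, with a double vertex value; a fixed superstep budget stands in for a real convergence test):

// PageRank in the vertex model: d = 0.85, 30 supersteps.
public void compute(Collection<Message> messages) {
  if (superStep() >= 1) {
    double sum = 0;
    for (Message m : messages) sum += m.value();
    setValue(0.15 + 0.85 * sum);   // PR(A) = (1 - d) + d * sum(PR(v)/L(v))
  }
  if (superStep() < 30) {
    // Split this vertex's PageRank evenly over its out-links.
    double share = getValue() / getEdges().size();
    for (Edge e : getEdges()) sendMessage(new Message(e.target(), share));
  } else {
    voteToHalt();
  }
}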
PageRank
[Figure: a six-vertex example graph (A–F) with its input file and output file.]
K-means
• N observations are partitioned into k clusters
• Each observation belongs to the cluster with the nearest mean
[Figure: k-means flowchart. Start with the number of clusters K; calculate the centroids; compute the distance from each object to the centroids; group objects by minimum distance; if no object changed its group, end, otherwise repeat.]
K-means
• A message includes a cluster ID and a value
• Every superstep, each vertex sends a message to all vertices (see the sketch below)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102; A (seed1) and B (seed2) are the initial seeds.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
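A sketch of the per-vertex logic (same placeholder style as the earlier sketches; ClusterMessage, clusterID(), and broadcast() are invented names, and the superstep-0 seeding corner cases are glossed over). The broadcast at the end is what produces the N×N message volume criticized later:

// Recompute the centroids from the previous superstep's (clusterID, value)
// broadcasts, adopt the nearest cluster, and broadcast our own assignment.
public void compute(Collection<ClusterMessage> messages) {
  Map<Integer, Double> sum = new HashMap<>();
  Map<Integer, Integer> count = new HashMap<>();
  for (ClusterMessage m : messages) {                 // centroid = mean per cluster
    sum.merge(m.clusterID(), (double) m.value(), Double::sum);
    count.merge(m.clusterID(), 1, Integer::sum);
  }
  int best = clusterID();                             // keep current assignment by default
  double bestDist = Double.MAX_VALUE;
  for (int c : sum.keySet()) {
    double centroid = sum.get(c) / count.get(c);
    double dist = Math.abs(getValue() - centroid);
    if (dist < bestDist) { bestDist = dist; best = c; }
  }
  setClusterID(best);
  broadcast(new ClusterMessage(best, getValue()));    // to ALL vertices: N×N messages
}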
K-means
Centroid1 = Value(A)
Centroid2 = Value(B)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102; seeds A and B.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
K-means
Centroid1 = Mean(A, C, D, E, F)
Centroid2 = Mean(B)
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
S2    C2  C2  C2  C1  C1  C1
K-means
Centroid1 = Mean(D, E, F)
Centroid2 = Mean(A, B, C)
If the centroids have converged, quit the process!
[Figure: vertices A–F with values 1, 2, 3, 100, 101, 102.]

Step  A   B   C   D   E   F
S0    C1  C2  -   -   -   -
S1    C1  C2  C1  C1  C1  C1
S2    C2  C2  C2  C1  C1  C1
S3    C2  C2  C2  C1  C1  C1
K-means
[Figure: input file and output file of the K-means run.]
N : number of vertices
Every superstep, N×N messages are exchanged.
⇒ O(N²): needs far too much memory!
Giraph
Giraph
• The ASF (Apache Software Foundation)'s open-source version of Google's Pregel
• Implemented in Java
• Apache incubator project
• Requirements
  - Hadoop 0.20.203 or a higher version
    : runs as a map-only job in Hadoop
  - ZooKeeper
    : if not present, Giraph uses the Hadoop file system instead of ZooKeeper
Giraph – vertex distribution
[Figure: how vertices are distributed across workers.]
Giraph - usage
• Users can set the checkpoint frequency
  – GiraphJob.getConfiguration().set("giraph.checkpointFrequency", 0)
    // 0 means no checkpoints
• Users should set the ZooKeeper configuration
  – GiraphJob.setZookeeperConfiguration("zk-server-list");
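Putting the two calls together (a fragment, with job setup and imports elided; the constructor form and setInt usage are assumptions about the incubator-era API, only the two calls above come from the slides):

// Sketch of configuring checkpoints and ZooKeeper on a GiraphJob.
GiraphJob job = new GiraphJob("sssp-example");                 // constructor form assumed
// Checkpoint every 2 supersteps; 0 would disable checkpointing entirely.
job.getConfiguration().setInt("giraph.checkpointFrequency", 2);
// Comma-separated ZooKeeper quorum; hostnames here are placeholders.
job.setZookeeperConfiguration("zk1:2181,zk2:2181,zk3:2181");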
Giraph - characteristics
• Fault tolerance
  – If the master dies, a new one automatically takes over
  – If a worker dies, the application is rolled back to a previously checkpointed superstep
  – If a ZooKeeper server dies, the application can proceed as long as a quorum remains
  – But the Hadoop SPOF still exists
• Combiner/Aggregator
• JSON input/output format
• Easy job status monitoring (HTTP)
Experiments
Experiments
• 3 servers
• nPartition = nMapper = 9
• MR vs. GoldenOrb vs. Giraph
  – PageRank
  – K-means (Mahout)
  – Measured: elapsed time, CPU, memory, disk, network
Experiments - PageRank
• Number of vertices ≈ 220,000
• Fixed iterations = 100

            Elapsed  CPU    Memory     Network (bytes)   Disk (sec/s)
            time     (%)    (KB)       Rcv.     Trans.   read   write
GoldenOrb   1m 56s   14.53  3,745,376  19,437   12,845   777    606
Giraph      3m 31s   8.77   1,244,000  11,374   914      0      326
MapReduce   34m 51s  3.75   3,091,239  13,514   867      0      4101
Experiments - K-means
• Number of vertices = 100,000
• Number of clusters (K) = 10

            Elapsed  CPU    Memory     Network (bytes)   Disk (sec/s)
            time     (%)    (KB)       Rcv.     Trans.   read   write
GoldenOrb   3m 19s   13.32  3,857,892  11,634   27,086   128    1151
Giraph      1m 49s   6.36   1,245,000  7,810    1,999    0      536
MapReduce   11m 28s  1.48   2,645,517  13,528   1,005    0      7104
Experiments
[Figure: elapsed time (s) compared across the three systems.]
Installation
Install GoldenOrb (1)
• Requirements
  - hadoop-0.20.2
  - zookeeper-3.3.3
• Download & unzip
  – org.goldenorb.core-0.1.1-SNAPSHOT-distribution.zip
Install GoldenOrb (2)
• Set the configuration
  ① ORB_HOME environment variable
     > export ORB_HOME=/usr/local/goldenorb
  ② conf/orbServers
     > localhost:/usr/local/goldenorb
  ③ conf/orb-site.xml
     > cp orb-site.sample.xml orb-site.xml
     > vi orb-site.xml
  ④ If in distributed mode, copy the configuration to all servers

<property>
  <name>goldenOrb.zookeeper.quorum</name>
  <value>localhost</value>   <!-- the target ZooKeeper server IP -->
  <description>The server running zookeeper</description>
</property>
……
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Install GoldenOrb (3)
• Set up the running environment
  ① Start Hadoop
     > $HADOOP_HOME/bin/start-dfs.sh
  ② Start ZooKeeper
     > $ZK_HOME/bin/zkServer.sh start
  ③ Start the orb-tracker
     > $ORB_HOME/bin/orb-tracker.sh start
  ④ Check the logs
     > cat $ORB_HOME/logs/xxx.log
Install GoldenOrb (4)
• Make the input
  - e.g., maximum value
  <vertex-id> <value> <outgoing-edge-list>

A 0 B D
B 8 C D
C 11 E
D 5 B C E
E 7 A C

[Figure: the corresponding graph with vertex values A=0, B=8, C=11, D=5, E=7.]
Install GoldenOrb (5)
• Upload the input files
  > hadoop fs -put maxvalue.txt /test/
• Run
  > java -cp conf/.:org.goldenorb.core-0.1.1-SNAPSHOT.jar:lib/*:yourjar.jar
    org.goldenOrb.algorithms.YourAlgorithm
    goldenOrb.orb.localFilesToDistribute=/home/user/yourjar.jar
    mapred.input.dir=/test/maxvaluetxt/ mapred.output.dir=/test/output
    goldenOrb.orb.requestedPartitions=3 goldenOrb.orb.reservedPartitions=0
    goldenOrb.orb.classpaths=yourjar.jar
• Result
  > hadoop fs -ls /test/output
  > hadoop fs -cat /test/output/*
Install Giraph (1)
• Requirements
  - hadoop-0.20.203
  - zookeeper-3.3.3
  - maven 3.0.3
• Download
  > svn checkout http://svn.apache.org/repos/asf/incubator/giraph/trunk giraph
• Compile
  > mvn compile
  – Generates target/giraph-{version}-jar-with-dependencies.jar
Install Giraph (2)
• Set up the running environment
  ① > $HADOOP_HOME/bin/start-all.sh
  ② > $ZK_HOME/zkServer.sh start
• Upload the input file to HDFS
  > hadoop fs -put test.grf /giraph/test/input/
* Input format (JSON):
[0,0,[[1,0]]]
[1,0,[[2,100]]]
[2,100,[[3,200]]]
[3,300,[[4,300]]]
[4,600,[[5,400]]]
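Each line is one vertex: [vertexID, vertexValue, [[targetID, edgeValue], ...]]. A throwaway generator for the sample above (the file name and the chain structure are just the slide's example):

import java.io.PrintWriter;

// Writes the sample chain graph shown above to test.grf, one JSON array per line.
public class MakeTestGraph {
  public static void main(String[] args) throws Exception {
    int[] values = {0, 0, 100, 300, 600};    // vertex values from the slide
    try (PrintWriter out = new PrintWriter("test.grf")) {
      for (int id = 0; id < values.length; id++) {
        int edgeValue = id * 100;            // edge weights 0, 100, 200, 300, 400
        out.printf("[%d,%d,[[%d,%d]]]%n", id, values[id], id + 1, edgeValue);
      }
    }
  }
}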
Install Giraph (3)
• Run
  - e.g., the shortest-path algorithm
  > hadoop jar giraph-{version}-jar-with-dependencies.jar
    org.apache.giraph.examples.SimpleShortestPathsVertex <input-path>
    <output-path> <source-node> <number-of-workers>
• Running status
  – http://localhost:50030
• Result
  > hadoop fs -cat <output-path>/part-*
GoldenOrb vs. Giraph

                                          GoldenOrb   Giraph
Status monitoring                         log files   easy (web UI)
Fault tolerance                           X           O
Vertex mutation, Combiner, Aggregator…    X           O
Development environment                   X           O
I/O format                                O           X
Updates                                   X           O
Thank you!