PROCESSING LARGE-SCALE GRAPHS WITH GOOGLE(TM) PREGEL 
MICHAEL HACKSTEIN 
FRONT END AND GRAPH SPECIALIST ARANGODB
Processing large-scale graphs with Google(TM) Pregel
November 17th 
Michael Hackstein 
@mchacki 
www.arangodb.com
Michael Hackstein 
ArangoDB Core Team 
Web Frontend 
Graph visualisation 
Graph features 
Host of cologne.js 
Master’s Degree (spec. Databases and Information Systems)
Graph Algorithms
Pattern matching
  Search through the entire graph
  Identify similar components
  => Touch all vertices and their neighbourhoods
Traversals
  Define a specific start point
  Iteratively explore the graph
  => History of steps is known
Global measurements
  Compute one value for the graph, based on all its vertices or edges
  Compute one value for each vertex or edge
  => Often require a global view on the graph
Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
  Uses same phases
  Has several iterations
Aims at:
  Operate all servers at full capacity
  Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
Example – Connected Components
[Figure: animated example graph with seven vertices (1 to 7), shown over several supersteps. Legend: active vertex, inactive vertex, forward message, backward message.]
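The animation above can be approximated in a few lines of plain JavaScript. This is a minimal single-process sketch, assuming the common minimum-label approach (every vertex adopts the smallest vertex id it has heard of) and messages sent both forward and backward along each edge. The function name `connectedComponents` and the edge list are hypothetical; the topology of the graph on the slide is not recoverable from the transcript.

```javascript
// Minimal single-process sketch of Pregel-style connected components.
// Assumption: each vertex adopts the smallest vertex id it has heard of.
function connectedComponents(vertexIds, edges) {
  // Messages travel forward and backward along each directed edge,
  // so components are found regardless of edge direction.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }

  const label = new Map(vertexIds.map((id) => [id, id]));
  const send = (box, target, value) => {
    // MIN combiner: keep only the smallest message per target.
    if (!box.has(target) || value < box.get(target)) box.set(target, value);
  };

  // Superstep 0: every vertex is active and announces its own id.
  let inbox = new Map();
  for (const id of vertexIds) {
    for (const n of neighbours.get(id)) send(inbox, n, id);
  }

  let supersteps = 1;
  // Vertices deactivate after each step; only a message reactivates them.
  while (inbox.size > 0) {
    const outbox = new Map();
    for (const [id, candidate] of inbox) {
      if (candidate < label.get(id)) {
        label.set(id, candidate);
        for (const n of neighbours.get(id)) send(outbox, n, candidate);
      }
    }
    inbox = outbox;
    supersteps += 1;
  }
  return { label, supersteps };
}

// Hypothetical graph: two chains, 1-2-3-4 and 5-6-7.
const ccResult = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [2, 3], [3, 4], [5, 6], [6, 7]]
);
console.log([...ccResult.label.entries()]);
```

In this sketch the loop stops as soon as a superstep produces no messages, which is when every vertex already holds the smallest id of its component.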
Pregel – Sequence
[Figure: Pregel execution sequence diagram]
Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
  The current vertex and its outbound edges
  All incoming messages
  Global values
Allow modifications on the vertex:
  Attach a result to this vertex and its outgoing edges
  Delete the vertex and its outgoing edges
  Deactivate the vertex
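The worker contract listed above could look roughly like the following sketch. The signature `(vertex, incoming, global, send)`, the helper `mapStep`, and the vertex fields `value`, `result`, `outEdges` and `active` are assumptions made for illustration; they are not the API of ArangoDB or of any other Pregel implementation.

```javascript
// Hypothetical worker: propagate the largest value seen so far.
// It reads the vertex, its incoming messages and the global values,
// and may send messages, attach a result, or deactivate the vertex.
function maxValueWorker(vertex, incoming, global, send) {
  const best = Math.max(vertex.value, ...incoming);
  const changed = best > vertex.value;
  vertex.value = best;
  vertex.result = best; // attach a result to the vertex
  if (global.step === 0 || changed) {
    for (const target of vertex.outEdges) send(target, best);
  } else {
    vertex.active = false; // deactivate: nothing new to report
  }
}

// "Map" step: apply the worker to every vertex that is active or has mail.
function mapStep(vertices, inbox, global, worker) {
  const outbox = new Map();
  const send = (target, data) => {
    if (!outbox.has(target)) outbox.set(target, []);
    outbox.get(target).push(data);
  };
  for (const vertex of vertices.values()) {
    const incoming = inbox.get(vertex.id) || [];
    if (!vertex.active && incoming.length === 0) continue;
    vertex.active = true;
    worker(vertex, incoming, global, send);
  }
  return outbox; // the set of messages to other vertices
}

const demoVertices = new Map([
  ["a", { id: "a", value: 3, outEdges: ["b"], active: true }],
  ["b", { id: "b", value: 9, outEdges: ["c"], active: true }],
  ["c", { id: "c", value: 1, outEdges: ["a"], active: true }],
]);
const demoOutbox = mapStep(demoVertices, new Map(), { step: 0 }, maxValueWorker);
console.log(demoOutbox);
```

Deleting a vertex is left out of the sketch; it would amount to removing the entry from `demoVertices` inside the worker.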
Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
  One new message for a vertex
  The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
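A short sketch of why running the combiner on both sides reduces traffic. The combiner takes the two parameters named above (one new message, the stored aggregate); `combineMessages` and the message shape `{ target, data }` are hypothetical names chosen for this example. The approach relies on the combiner being associative and commutative, which holds for SUM, MIN and MAX.

```javascript
// Hypothetical combiner signature: (newMessage, storedAggregate) -> aggregate.
const minCombiner = (message, aggregate) => Math.min(message, aggregate);

// Fold a list of { target, data } messages into one aggregate per target.
function combineMessages(messages, combiner) {
  const aggregates = new Map();
  for (const { target, data } of messages) {
    aggregates.set(
      target,
      aggregates.has(target) ? combiner(data, aggregates.get(target)) : data
    );
  }
  return aggregates;
}

// Sender side: each server combines locally before anything is transmitted.
const fromServerA = combineMessages(
  [
    { target: "v1", data: 4 },
    { target: "v1", data: 2 },
    { target: "v2", data: 7 },
  ],
  minCombiner
);
const fromServerB = combineMessages(
  [
    { target: "v1", data: 3 },
    { target: "v2", data: 5 },
  ],
  minCombiner
);

// Only the pre-aggregated messages cross the network.
const wire = [...fromServerA, ...fromServerB].map(([target, data]) => ({
  target,
  data,
}));

// Receiver side: the same combiner merges what arrives from all senders.
const received = combineMessages(wire, minCombiner);
console.log(wire.length, [...received]);
```

Here five generated messages shrink to four on the wire and to one aggregate per vertex at the receiver; with more senders per target the saving grows accordingly.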
Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
  At least one vertex is active
  At least one message is sent
Terminate if neither a vertex is active nor messages were sent
Store all non-deleted vertices and edges as resulting graph
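The termination rule above can be sketched as a driver loop. This is an illustration under assumptions, not the implementation of any particular system: `runSupersteps`, the worker signature `(vertex, incoming, step, send)` and the `maxSteps` safety limit are hypothetical, and storing the resulting graph is omitted.

```javascript
// Runs supersteps until no vertex is active and no message was sent.
function runSupersteps(vertices, worker, maxSteps = 100) {
  let inbox = new Map();
  let step = 0;
  while (step < maxSteps) {
    const outbox = new Map();
    let sent = 0;
    const send = (target, data) => {
      if (!outbox.has(target)) outbox.set(target, []);
      outbox.get(target).push(data);
      sent += 1;
    };
    for (const vertex of vertices) {
      const incoming = inbox.get(vertex.id) || [];
      if (incoming.length > 0) vertex.active = true; // a message reactivates
      if (vertex.active) worker(vertex, incoming, step, send);
    }
    step += 1;
    const active = vertices.filter((v) => v.active).length;
    if (active === 0 && sent === 0) break; // termination condition
    inbox = outbox;
  }
  return step; // number of supersteps executed
}

// Example worker: pass a countdown token around a ring of three vertices.
function countdownWorker(vertex, incoming, step, send) {
  if (step === 0 && vertex.id === 0) send(vertex.next, 3); // seed one token
  for (const token of incoming) {
    vertex.seen += 1;
    if (token > 1) send(vertex.next, token - 1);
  }
  vertex.active = false; // sleep until a message arrives
}

const ring = [0, 1, 2].map((id) => ({
  id,
  next: (id + 1) % 3,
  seen: 0,
  active: true,
}));
const executedSteps = runSupersteps(ring, countdownWorker);
console.log(executedSteps, ring.map((v) => v.seen));
```

Although every vertex deactivates itself in each superstep, the run continues while messages are still in flight; it stops only in the superstep where the last token is consumed and nothing new is sent.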
Pregel at ArangoDB
Started as a side project in free hack time
Experimental on operational database
Implemented as an alternative to traversals
Make use of the flexibility of JavaScript:
  No strict type system
  No pre-compilation, on-the-fly queries
  Native JSON documents
  Really fast development
Pagerank for Giraph

    public class SimplePageRankComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      public static final int MAX_SUPERSTEPS = 30;

      @Override
      public void compute(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) throws IOException {
        if (getSuperstep() >= 1) {
          double sum = 0;
          for (DoubleWritable message : messages) {
            sum += message.get();
          }
          DoubleWritable vertexValue = new DoubleWritable(
              (0.15f / getTotalNumVertices()) + 0.85f * sum);
          vertex.setValue(vertexValue);
        }
        if (getSuperstep() < MAX_SUPERSTEPS) {
          long edges = vertex.getNumEdges();
          sendMessageToAllEdges(vertex,
              new DoubleWritable(vertex.getValue().get() / edges));
        } else {
          vertex.voteToHalt();
        }
      }

      public static class SimplePageRankWorkerContext extends WorkerContext {
        @Override
        public void preApplication()
            throws InstantiationException, IllegalAccessException { }
        @Override
        public void postApplication() { }
        @Override
        public void preSuperstep() { }
        @Override
        public void postSuperstep() { }
      }

      public static class SimplePageRankMasterCompute extends DefaultMasterCompute {
        @Override
        public void initialize()
            throws InstantiationException, IllegalAccessException {
        }
      }

      public static class SimplePageRankVertexReader extends
          GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
        @Override
        public boolean nextVertex() {
          return totalRecords > recordsRead;
        }

        @Override
        public Vertex<LongWritable, DoubleWritable, FloatWritable>
            getCurrentVertex() throws IOException {
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex =
              getConf().createVertex();
          LongWritable vertexId = new LongWritable(
              (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
          DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
          long targetVertexId = (vertexId.get() + 1) %
              (inputSplit.getNumSplits() * totalRecords);
          float edgeValue = vertexId.get() * 100f;
          List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
          edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
              new FloatWritable(edgeValue)));
          vertex.initialize(vertexId, vertexValue, edges);
          ++recordsRead;
          return vertex;
        }
      }

      public static class SimplePageRankVertexInputFormat extends
          GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
        @Override
        public VertexReader<LongWritable, DoubleWritable, FloatWritable>
            createVertexReader(InputSplit split, TaskAttemptContext context)
            throws IOException {
          return new SimplePageRankVertexReader();
        }
      }

      public static class SimplePageRankVertexOutputFormat extends
          TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
        @Override
        public TextVertexWriter createVertexWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
          return new SimplePageRankVertexWriter();
        }

        public class SimplePageRankVertexWriter extends TextVertexWriter {
          @Override
          public void writeVertex(
              Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
              throws IOException, InterruptedException {
            getRecordWriter().write(
                new Text(vertex.getId().toString()),
                new Text(vertex.getValue().toString()));
          }
        }
      }
    }
Pagerank for TinkerPop3

    public class PageRankVertexProgram implements VertexProgram<Double> {
      private MessageType.Local messageType = MessageType.Local.of(
          () -> GraphTraversal.<Vertex>of().outE());
      public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
      public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
      private static final String VERTEX_COUNT =
          "gremlin.pageRankVertexProgram.vertexCount";
      private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
      private static final String TOTAL_ITERATIONS =
          "gremlin.pageRankVertexProgram.totalIterations";
      private static final String INCIDENT_TRAVERSAL =
          "gremlin.pageRankVertexProgram.incidentTraversal";
      private double vertexCountAsDouble = 1;
      private double alpha = 0.85d;
      private int totalIterations = 30;
      private static final Set<String> COMPUTE_KEYS =
          new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

      private PageRankVertexProgram() {}

      @Override
      public void loadState(final Configuration configuration) {
        this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
        this.alpha = configuration.getDouble(ALPHA, 0.85d);
        this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
        try {
          if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
            final SSupplier<Traversal> traversalSupplier =
                VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
            VertexProgramHelper.verifyReversibility(traversalSupplier.get());
            this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
          }
        } catch (final Exception e) {
          throw new IllegalStateException(e.getMessage(), e);
        }
      }

      @Override
      public void storeState(final Configuration configuration) {
        configuration.setProperty(GraphComputer.VERTEX_PROGRAM,
            PageRankVertexProgram.class.getName());
        configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
        configuration.setProperty(ALPHA, this.alpha);
        configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
        try {
          VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(),
              configuration, INCIDENT_TRAVERSAL);
        } catch (final Exception e) {
          throw new IllegalStateException(e.getMessage(), e);
        }
      }

      @Override
      public Set<String> getElementComputeKeys() {
        return COMPUTE_KEYS;
      }

      @Override
      public void setup(final Memory memory) {

      }

      @Override
      public void execute(final Vertex vertex, Messenger<Double> messenger,
          final Memory memory) {
        if (memory.isInitialIteration()) {
          double initialPageRank = 1.0d / this.vertexCountAsDouble;
          double edgeCount = Double.valueOf(
              (Long) this.messageType.edges(vertex).count().next());
          vertex.singleProperty(PAGE_RANK, initialPageRank);
          vertex.singleProperty(EDGE_COUNT, edgeCount);
          messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
        } else {
          double newPageRank = StreamFactory.stream(
              messenger.receiveMessages(this.messageType))
              .reduce(0.0d, (a, b) -> a + b);
          newPageRank = (this.alpha * newPageRank)
              + ((1.0d - this.alpha) / this.vertexCountAsDouble);
          vertex.singleProperty(PAGE_RANK, newPageRank);
          messenger.sendMessage(this.messageType,
              newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
        }
      }

      @Override
      public boolean terminate(final Memory memory) {
        return memory.getIteration() >= this.totalIterations;
      }
    }
Pagerank for ArangoDB

    // Worker: invoked for a vertex in every superstep.
    var pageRank = function (vertex, message, global) {
      var total, rank, edgeCount, send, edge, alpha, sum;
      total = global.vertexCount;
      edgeCount = vertex._outEdges.length;
      alpha = global.alpha;
      sum = 0;
      if (global.step > 0) {
        // Add up the rank shares received from inbound neighbours.
        while (message.hasNext()) {
          sum += message.next().data;
        }
        rank = alpha * sum + (1 - alpha) / total;
      } else {
        // First superstep: start with a uniform rank.
        rank = 1 / total;
      }
      vertex._setResult(rank);
      if (global.step < global.MAX_STEPS) {
        // Split the rank evenly across all outgoing edges.
        send = rank / edgeCount;
        while (vertex._outEdges.hasNext()) {
          edge = vertex._outEdges.next();
          message.sendTo(edge._getTarget(), send);
        }
      } else {
        vertex._deactivate();
      }
    };

    // Combiner: SUM of the message values.
    var combiner = function (message, oldMessage) {
      return message + oldMessage;
    };

    var Runner = require("org/arangodb/pregelRunner").Runner;
    var runner = new Runner();
    runner.setWorker(pageRank);
    runner.setCombiner(combiner);
    runner.start("myGraph");
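To try the arithmetic of the worker without a database, the same update rule can be simulated in plain JavaScript. This is a sketch under assumptions: `simulatePageRank` and the three-vertex graph are made up for illustration, and the experimental `pregelRunner` module is not used. The defaults `alpha = 0.85` and `maxSteps = 30` mirror the constants in the Giraph and TinkerPop3 listings.

```javascript
// Standalone simulation of the PageRank worker logic; no ArangoDB required.
// outEdges maps each vertex id to the list of its target vertex ids.
function simulatePageRank(outEdges, alpha = 0.85, maxSteps = 30) {
  const ids = Object.keys(outEdges);
  const total = ids.length;
  const rank = {};
  let inbox = {}; // SUM-combined messages per target vertex
  for (let step = 0; step <= maxSteps; step += 1) {
    const outbox = {};
    for (const id of ids) {
      // Same rule as the worker: uniform start, then damped sum of messages.
      rank[id] =
        step === 0
          ? 1 / total
          : alpha * (inbox[id] || 0) + (1 - alpha) / total;
      if (step < maxSteps) {
        const share = rank[id] / outEdges[id].length;
        for (const target of outEdges[id]) {
          outbox[target] = (outbox[target] || 0) + share; // combiner: SUM
        }
      }
    }
    inbox = outbox;
  }
  return rank;
}

// Hypothetical graph: a -> b, a -> c, b -> c, c -> a.
const ranks = simulatePageRank({ a: ["b", "c"], b: ["c"], c: ["a"] });
const rankSum = Object.values(ranks).reduce((x, y) => x + y, 0);
console.log(ranks, rankSum);
```

Because every vertex in this example has at least one outgoing edge, the ranks keep summing to 1 in every superstep. A vertex without outgoing edges would send nothing, so its share of the rank would drop out of the total; the sketch, like the worker it mirrors, does not redistribute it.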
Thank you 
Further Questions? 
Follow me on twitter/github: @mchacki 
Write me a mail: mchacki@arangodb.com 
Follow @arangodb on Twitter 
Join our google group: 
https://groups.google.com/forum/#!forum/arangodb 
Visit our blog https://www.arangodb.com/blog 
Slides available at https://www.slideshare.net/arangodb 
17TH ~ 18th NOV 2014 
MADRID (SPAIN)

Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at Big Data Spain 2014
