# Processing graph/relational data with Map-Reduce and Bulk Synchronous Parallel

## by chodakowski on Feb 25, 2011


## Presentation Transcript

• Processing graph/relational data with Map-Reduce and Bulk Synchronous Parallel, v. 1.1 – Tomasz Chodakowski, 1st Bristol Hadoop Workshop, 08-11-2010
• Irregular Algorithms
  ● Map-reduce – a simplified model for "embarrassingly parallel" problems
    – Easily separable into independent tasks
    – Captured by a static dependence graph
  ● Most graph algorithms are irregular, i.e.:
    – Dependencies between tasks arise during execution
    – "Don't care" non-determinism – tasks can be executed in arbitrary order yet still yield correct results
• Irregular Algorithms
  ● Often operate on data structures with complex topologies:
    – Graphs, trees, grids, ...
    – where "data elements" are connected by "relations"
  ● Computations on such structures depend strongly on the relations between data elements – the primary source of dependencies between tasks
  More in [ADP], "Amorphous Data-parallelism in Irregular Algorithms"
• Relational Data
  ● Example relations between elements:
    – social interactions (co-authorship, friendship)
    – web links, document references
    – linked data or semantic network relations
    – geo-spatial relations
    – ...
  ● Different from the relational model in that relations are arbitrary
• Graph Algorithms – Rough Classification
  ● Aggregation, feature extraction
    – Not leveraging latent relations
  ● Network analysis (matrix-based, single-relational)
    – Geodesic (radius, diameter, etc.)
    – Spectral (eigenvector-based, centrality)
  ● Algorithmic/node-based algorithms
    – Recommender systems, belief/label propagation
    – Traversal, path detection, interaction networks, etc.
• Iterative Vertex-based Graph Algorithms
  ● Iteratively:
    – Compute a local function of a vertex that depends on the vertex state and the local graph structure (neighbourhood)
    – and/or modify local state
    – and/or modify local topology
    – pass messages to neighbouring nodes
  ● -> "vertex-based computation" (sketched as an interface below)
  Amorphous Data-Parallelism [ADP] operator formulation: "repeated application of neighbourhood operators in a specific order"
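A minimal sketch of that vertex-based contract as a Java interface; the names (`Vertex`, `Context`, `voteToHalt`) are illustrative assumptions, not taken from any particular framework:

```java
// Hypothetical vertex-program contract: one compute() call per active
// vertex per iteration (per superstep, in BSP terms).
interface Vertex<S, M> {
    // Read incoming messages, update local state, signal neighbours.
    void compute(S state, Iterable<M> incoming, Context<M> ctx);
}

// Hypothetical runtime services available to a vertex program.
interface Context<M> {
    Iterable<Long> neighbourIds();          // local graph structure
    void sendMessage(long targetId, M msg); // delivered in the next iteration
    void voteToHalt();                      // deactivate until a message arrives
}
```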
• Recent applications/developments
  ● Google's work on graph-based YouTube recommendations:
    – Leveraging latent information
    – Diffusing interest in sparsely labelled video clips
  ● User profiling, sentiment analysis
    – Facebook likes, Hunch, Gravity, MusicMetric, ...
• Single Source Shortest Path
  [Figure series: a directed graph labelled with positive integer edge weights, split into two partitions (P1, P2), next to a time-space view of the computation showing workload and communication between the partitions. Turquoise rectangles show the computational workload of a partition (work).]
  – Active vertices are shown in turquoise; signals passed along relations are in light green; thick green lines show costly inter-partition communications.
  – The vertical grey line is a barrier synchronisation to avoid race conditions.
  – Work, comm and barrier together form a BSP superstep; vertices become active upon receiving a signal in the previous superstep.
  – After performing local computation, active vertices send signals to their neighbouring vertices.
  – Computation ends when there are no active vertices left.
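The loop structure behind those figures can be sketched in a few lines of Java. This is a single-machine illustration under assumed types (`Partition` and its three methods are hypothetical), not a distributed implementation:

```java
import java.util.List;

// Hypothetical per-machine slice of the graph (e.g. P1 or P2 above).
interface Partition {
    void computeActiveVertices();  // run local vertex updates
    void flushOutgoingSignals();   // deliver signals across partition borders
    boolean hasActiveVertices();
}

class BspDriver {
    static void run(List<Partition> partitions) {
        boolean anyActive = true;
        while (anyActive) {                        // one iteration = one superstep
            for (Partition p : partitions) p.computeActiveVertices(); // work
            for (Partition p : partitions) p.flushOutgoingSignals();  // comm
            // barrier: nothing below runs until every partition finishes above
            anyActive = partitions.stream().anyMatch(Partition::hasActiveVertices);
        }
    }
}
```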
• Bulk Synchronous Parallel
  [Figure: a time-space diagram of supersteps 0, 1, 2, 3, ... across partitions P1, P2, ..., Pn, with superstep n contributing terms wn, hn, ln.]
  ● Cost of superstep n = wn + hn + ln, where:
    – wn – time to finish work on the slowest partition
    – hn – cost of bulk communication
    – ln – barrier synchronization time
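Summing over supersteps gives the total cost of a run. In Valiant's formulation [BSP] the communication term is commonly written as g·hn, with hn the maximal number of messages a partition sends or receives and g the machine's per-message cost; a sketch of the total under those standard definitions:

```latex
T_{\text{total}} = \sum_{n=0}^{S-1} \left( w_n + g \cdot h_n + l_n \right)
```

with S the total number of supersteps.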
• Bulk Synchronous Parallel
  ● Advantages
    – Simple and portable execution model
    – Clear cost model
    – No concurrency control, no data races, deadlocks, etc.
  ● Disadvantages
    – Coarse-grained
      ● depends on a large "parallel slack"
    – Requires a well-partitioned problem space for efficiency (well-balanced partitions)
  More in [BSP], "A bridging model for parallel computation"
• Bulk Synchronous Parallel – extensions
  ● Combiners – minimizing inter-node communication (the h factor); see the sketch below
  ● Aggregators – computing global state (e.g. map/reduce)
  And other extensions...
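A minimal sketch of a message combiner in the Pregel/BSP sense, assuming a hypothetical `Combiner<M>` interface (the name and shape are illustrative): messages bound for the same vertex are merged before crossing a partition boundary, so fewer of them count towards the h factor.

```java
// Hypothetical contract: merge two messages addressed to the same vertex.
interface Combiner<M> {
    M combine(M a, M b);
}

// For SSSP only the smallest proposed distance matters, so any number of
// distance messages to one vertex collapses into a single value.
class MinDistanceCombiner implements Combiner<Integer> {
    public Integer combine(Integer a, Integer b) {
        return Math.min(a, b);
    }
}
```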
• Sample code

```java
public void superStep() {
    // Choose the minimum proposed distance among incoming messages
    int minDist = this.isStartingElement() ? 0 : Integer.MAX_VALUE;
    for (DistanceMessage msg : messages()) {
        minDist = Math.min(minDist, msg.getDistance());
    }
    // If it improves the path, store it and propagate to neighbours
    if (minDist < this.getCurrentDistance()) {
        this.setCurrentDistance(minDist);
        IVertex v = this.getElement();
        for (IEdge r : v.getOutgoingEdges(DemoRelationshipTypes.KNOWS)) {
            IElement recipient = r.getOtherElement(v);
            int rDist = this.getLengthOf(r);
            this.sendMessage(new DistanceMessage(minDist + rDist, recipient.getId()));
        }
    }
}
```
• SSSP – Map-Reduce Naive
  ● Idea [DPMR]:
    – In the map phase:
      ● emit both signals and the local vertex structure and state
    – In the reduce phase:
      ● gather signal and local-vertex-structure messages
      ● reconstruct vertex structure and state
• SSSP – Map-Reduce Naive

```
def map(Id nId, Node N):
    // emit state and structure
    emit(nId, N.graphStateAndStruct)
    if (N.isActive)
        for (nbr : N.adjacencyL)
            // local computation
            dist := N.currDist + DistToNbr
            // emit signals
            emit(nbr.id, dist)

def reduce(Id rId, {m1, m2, ..}):
    new M; M.deActivate
    minDist = MAX_VALUE
    for (m in {m1, m2, ..})
        if (m is Node) M := m           // state
        else if (m is Distance)         // signals
            minDist = min(minDist, m)
    if (M.currDist > minDist)
        M.currDist := minDist
        M.activate
    emit(rId, M)
```
• SSSP – Map-Reduce Naive – issues
  ● Cost associated with marshaling intermediate <k,v> pairs for combiners (which are optional)
    – -> in-line combiner
  ● Need to pass the whole graph state and structure around
    – -> "Shimmy trick": pin down the structure
  ● Partitions vertices without regard to graph topology
    – -> cluster highly connected components together
• Inline Combiners
  ● In the job's configure:
    – Initialize a map<NodeId, Distance>
  ● In the job's map operation:
    – Do not emit intermediate pairs ( emit(nbr.id, dist) )
    – Store them in the local map instead
    – Combine values landing in the same slot
  ● In the job's close:
    – Emit the value of each slot in the map to the corresponding neighbour
      ● emit(nbr.id, map[nbr.id])
  (A Hadoop-flavoured sketch of this pattern follows below.)
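A minimal Hadoop-flavoured sketch of the in-line (in-mapper) combiner above. The text input layout (`nodeId <TAB> currDist <TAB> nbr:weight ...`) and the class name are illustrative assumptions; only the setup/map/cleanup lifecycle and the slot map correspond to the slide.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SsspInMapperCombiner
        extends Mapper<LongWritable, Text, LongWritable, IntWritable> {

    // One slot per neighbour id: the minimum distance proposed so far.
    private Map<Long, Integer> slots;

    @Override                                        // "in job configure"
    protected void setup(Context context) {
        slots = new HashMap<>();
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context) {
        // Assumed line layout: nodeId <TAB> currDist <TAB> nbr:w nbr:w ...
        String[] fields = line.toString().split("\t");
        int currDist = Integer.parseInt(fields[1]);
        if (currDist == Integer.MAX_VALUE) return;   // inactive vertex
        for (String edge : fields[2].split(" ")) {
            String[] e = edge.split(":");
            long nbrId = Long.parseLong(e[0]);
            int dist = currDist + Integer.parseInt(e[1]);
            // Instead of emit(nbr.id, dist): combine into the local slot.
            slots.merge(nbrId, dist, Math::min);
        }
        // (The naive scheme would also re-emit the vertex structure here.)
    }

    @Override                                        // "in job close"
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        for (Map.Entry<Long, Integer> slot : slots.entrySet()) {
            context.write(new LongWritable(slot.getKey()),
                          new IntWritable(slot.getValue()));
        }
    }
}
```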
• “Shimmy trick”
  ● Store the graph structure in a file system (no shuffle)
  ● Inspired by a parallel merge join
  [Figure: matching partition pairs p1, p2, p3 on each side, one side sorted by the join key, the other sorted and partitioned by the join key.]
• “Shimmy trick”
  ● Assume:
    – The graph G representation is sorted by node ids
    – G is partitioned into n parts: G1, G2, ..., Gn
    – Use the same partitioner as in MR (see the sketch below)
    – Set the number of reducers to n
  ● The above gives us:
    – Reducer Ri receives the same intermediate keys as those in the Gi graph partition (in sorted order)
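A minimal sketch of that "same partitioner" requirement in Hadoop terms: the rule routing intermediate keys to reducers must be the same deterministic rule that produced G1..Gn. The modulo rule and the class name here are illustrative assumptions.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class GraphAlignedPartitioner
        extends Partitioner<LongWritable, IntWritable> {
    @Override
    public int getPartition(LongWritable nodeId, IntWritable dist, int numReducers) {
        // If G1..Gn were produced by this same rule, reducer i sees exactly
        // the node ids stored in graph partition Gi. Non-negative modulo,
        // in the style of Hadoop's HashPartitioner.
        return (int) ((nodeId.get() & Long.MAX_VALUE) % numReducers);
    }
}
```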
• “Shimmy trick”

```
def configure():
    P.openGraphPartition()

def reduce(Id rId, {m1, m2, ..}):
    repeat:
        (id nId, node N) <- P.read()
        if (nId != rId): N.deact; emit(nId, N)
    until: nId == rId
    minDist = MAX_VALUE
    for (m in {m1, m2, ..}):
        minDist = min(minDist, m)
    if (N.currDist > minDist)
        N.currDist := minDist
        N.activate
    emit(rId, N)

def close():
    repeat:
        (id nId, node N) <- P.read()
        N.deactivate
        emit(nId, N)
```
• “Shimmy trick”
  ● Improvements:
    – Files containing the graph structure reside on DFS, and reducers are arbitrarily assigned to cluster machines
      ● -> remote reads
    – -> change the scheduler to assign key ranges to the same machines consistently
• Topology-aware Partitioner
  ● Choose a partitioner that:
    – minimizes inter-block traffic;
    – maximizes intra-block traffic;
    – places adjacent nodes in the same block
  ● Difficult to achieve, particularly with many real-world datasets:
    – Power-law degree distributions
    – State-of-the-art partitioners (e.g. ParMETIS) are reported to fail for such cases (???)
• MR Graph Processing Design Pattern
  ● [DPMR] reports a 60%-70% improvement over the naive implementation
  ● The solution closely resembles the BSP model
• BSP (inspired) implementations
  ● Google Pregel
    – classic BSP, C++, in production
  ● CMU GraphLab
    – inspired by BSP, Java, multi-core
    – consistency models, custom schedulers
  ● Apache Hama
    – scientific computation package that runs on top of Hadoop; BSP; MS Dryad (?)
  ● Signal/Collect (University of Zurich)
    – Scala, not yet distributed
  ● ...
• Open questions
  ● What problems are particularly suitable for MR and which for BSP – where are the boundaries?
    – Topology-based centrality algorithms (PageRank):
      ● algebraic, matrix-based methods vs. vertex-based ones?
  ● When considering graph algorithms:
    – MR user base vs. BSP ergonomics?
    – Performance overheads?
  ● Relaxing the BSP synchronous schedule -> "amorphous data parallelism"
• POC, Sample Code
  ● Project Masuria (early stages, 2011-02)
    – http://masuria-project.org/
    – As much a POC of a BSP framework as it is a (distributed) OSGi playground
  ● Sample code, based on Jimmy Lin and Michael Schatz's Cloud9 library:
    – https://github.com/tch/Cloud9 *
    – git@git.assembla.com:tch_sandbox.git
    – RunSSSPNaive.java
    – RunSSSPShimmy.java
  * – expect (my) bugs
• References
  ● [ADP] "Amorphous Data-parallelism in Irregular Algorithms", Keshav Pingali et al.
  ● [BSP] "A bridging model for parallel computation", Leslie G. Valiant
  ● [DPMR] "Design Patterns for Efficient Graph Algorithms in MapReduce", Jimmy Lin and Michael Schatz