Map reduce programming model to solve graph problems

Published in: Education, Technology

  1. MapReduce Programming Model To Solve Graph Problems
     Presented By: Nishant Gandhi, M.Tech. - CSE 1st Year, 1311CS05
     Guided By: Dr. Rajiv Misra
  2. Seminar Overview
     • Introduction to MapReduce
     • MapReduce Programming Model
       – Word Count problem
     • Graph Problems & MapReduce
       – Breadth-First Search
       – Augmenting Edges with Degree
       – Enumerating Triangles from Graph
  3. Introduction to MapReduce
     • History of Computing
       – Moore's Law
         • Has not held for the last few years
         • Memory is still the bottleneck for high-GHz processors
       – Distributed Problems
         • Indexing the Web, Simulating an Internet-Sized Network, Speeding Up Content Delivery, Rendering Multiple Frames
       – Parallel Computing (1975-1985)
         • Synchronization Problems
         • Very Costly Supercomputers
       – Distributed Computing (1995-Today)
         • Cost-Effective Solution
         • Uses Commodity Hardware
         • Google has no Supercomputer
  4. Introduction to MapReduce
     • History of MapReduce at Google
       – Problem at Google
         • Computing on Large Amounts of Data on DS
         • Parallelize Computing, Distribute Data, Handle Failure
       – One Solution
         • New abstraction that allows simple computation and hides all the other mess
         • Automatic Parallelization, Distribution, Fault Handling
         • MapReduce Paper 2004
  5. MapReduce Programming Model
     • Motivation
       – Automatic Parallelization & Distribution
       – Fault Tolerance
       – Provides Status & Monitoring Tools
       – Clean Abstraction for the Programmer
  6. MapReduce Programming Model
     • Programming Model
       – Borrows from Functional Programming
       – User implements an interface of two functions
         • Map & Reduce
         • map (in_key, in_value) --> (out_key, intermediate_value) list
         • reduce (out_key, intermediate_value list) --> out_value list
  7. MapReduce Programming Model
     map: (K1,V1) → list (K2,V2)
     reduce: (K2,list(V2)) → list (K3,V3)
     1. Map function is applied to every input key-value pair
     2. Map function generates intermediate key-value pairs
     3. Intermediate key-values are sorted and grouped by key
     4. Reduce is applied to sorted and grouped intermediate key-values
     5. Reduce emits result key-values
  8. MapReduce Programming Model
  9. MapReduce Programming Model Example: WordCount
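
The WordCount slide carries no text in this transcript, so the following is only a minimal single-process sketch of the five model steps applied to word counting. The driver `run_mapreduce`, the functions `wc_map` and `wc_reduce`, and the two sample documents are hypothetical names and data chosen for illustration; a real framework would distribute the map and reduce calls across machines.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy in-memory driver (assumption: everything fits on one machine)."""
    groups = defaultdict(list)
    # Steps 1-2: apply map to every input pair and collect intermediate pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    # Step 3: sort and group by intermediate key.
    for out_key in sorted(groups):
        # Steps 4-5: apply reduce to each group and emit result pairs.
        output.extend(reducer(out_key, groups[out_key]))
    return output

def wc_map(doc_id, text):
    # Emit (word, 1) for every word; the input key is ignored.
    for word in text.lower().split():
        yield word, 1

def wc_reduce(word, counts):
    # Sum all the 1s emitted for this word.
    yield word, sum(counts)

# Hypothetical input documents.
docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
word_counts = dict(run_mapreduce(docs, wc_map, wc_reduce))
```

Here `word_counts["the"]` is 3, because the mapper emits three `("the", 1)` pairs that land in the same group.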
  10. Graph Problems
      Graphs are ubiquitous in modern society. Some examples:
      • The hyperlink structure of the web
      • Social networks: social networking sites like Facebook, IMDB, email, text messages, and tweet flows (like Twitter)
      • Transportation networks (roads, trains, flights, etc.)
      • The human body can be seen as a graph of genes, proteins, cells, etc.
  11. Graph Problems & MapReduce
      • Performing computation on a graph data structure requires processing at each node
      • Each node contains node-specific data as well as links (edges) to other nodes
      • Computation must traverse the graph and perform the computation step
      • How do we traverse a graph in MapReduce? How do we represent the graph for this?
  12. Breadth-First Search & MapReduce
      Problem: Breadth-first search is iterative, so it does not fit into a single MapReduce pass
      Solution: Iterated passes through MapReduce: map some nodes; the result includes additional nodes, which are fed into successive MapReduce passes
  13. Breadth-First Search & MapReduce Example
      Representation as adjacency list:
        ID EDGES|DISTANCE_FROM_SOURCE|COLOR|
      • Input to Map
        1 2,5|0|GRAY|
        2 1,3,4,5|Integer.MAX_VALUE|WHITE|
        3 2,4|Integer.MAX_VALUE|WHITE|
        4 2,3,5|Integer.MAX_VALUE|WHITE|
        5 1,2,4|Integer.MAX_VALUE|WHITE|
  14. Breadth-First Search & MapReduce Example
      • 1st iteration of Map
        1 2,5|0|BLACK|
        2 NULL|1|GRAY|
        5 NULL|1|GRAY|
        2 1,3,4,5|Integer.MAX_VALUE|WHITE|
        3 2,4|Integer.MAX_VALUE|WHITE|
        4 2,3,5|Integer.MAX_VALUE|WHITE|
        5 1,2,4|Integer.MAX_VALUE|WHITE|
      • 1st iteration of Reduce (result only for node 2)
        2 NULL|1|GRAY|
        2 1,3,4,5|Integer.MAX_VALUE|WHITE|
      The reducer's job is to take all this data and construct a new node using:
        – the non-null list of edges
        – the minimum distance
        – the darkest color
  15. Breadth-First Search & MapReduce Example
      • Output of 1st iteration
        1 2,5,|0|BLACK
        2 1,3,4,5,|1|GRAY
        3 2,4,|Integer.MAX_VALUE|WHITE
        4 2,3,5,|Integer.MAX_VALUE|WHITE
        5 1,2,4,|1|GRAY
      • Output of 2nd iteration
        1 2,5,|0|BLACK
        2 1,3,4,5,|1|BLACK
        3 2,4,|2|GRAY
        4 2,3,5,|2|GRAY
        5 1,2,4,|1|BLACK
  16. Breadth-First Search & MapReduce Example
      • Output of 3rd iteration
        1 2,5,|0|BLACK
        2 1,3,4,5,|1|BLACK
        3 2,4,|2|BLACK
        4 2,3,5,|2|BLACK
        5 1,2,4,|1|BLACK
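
The iterations shown in the BFS example can be reproduced with a small sketch. It assumes the five-node graph from the slides, represents each node as an `(edges, distance, color)` tuple instead of the pipe-delimited text records, and uses `float("inf")` in place of `Integer.MAX_VALUE`. The function names (`bfs_map`, `bfs_reduce`, `bfs_iteration`) are hypothetical, and the loop runs in one process rather than as chained cluster jobs.

```python
from collections import defaultdict

INF = float("inf")  # stands in for Integer.MAX_VALUE in the slides
RANK = {"WHITE": 0, "GRAY": 1, "BLACK": 2}  # larger rank = darker color

def bfs_map(node_id, node):
    edges, dist, color = node
    if color == "GRAY":
        # Expand the frontier: each neighbor gets a tentative distance.
        for nbr in edges:
            yield nbr, (None, dist + 1, "GRAY")
        color = "BLACK"  # this node is now fully processed
    # Always re-emit the node itself so its edge list survives the pass.
    yield node_id, (edges, dist, color)

def bfs_reduce(node_id, values):
    # Combine: the non-null edge list, the minimum distance, the darkest color.
    edges = next(e for e, _, _ in values if e is not None)
    dist = min(d for _, d, _ in values)
    color = max((c for _, _, c in values), key=RANK.get)
    return node_id, (edges, dist, color)

def bfs_iteration(graph):
    groups = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in bfs_map(node_id, node):
            groups[key].append(value)
    return dict(bfs_reduce(k, v) for k, v in groups.items())

graph = {
    1: ([2, 5], 0, "GRAY"),
    2: ([1, 3, 4, 5], INF, "WHITE"),
    3: ([2, 4], INF, "WHITE"),
    4: ([2, 3, 5], INF, "WHITE"),
    5: ([1, 2, 4], INF, "WHITE"),
}

iterations = 0
# Keep launching passes while any node is still on the frontier.
while any(color == "GRAY" for _, _, color in graph.values()):
    graph = bfs_iteration(graph)
    iterations += 1
```

With this input the loop stops after three passes, with every node BLACK and distances 0, 1, 2, 2, 1 for nodes 1 to 5, matching the outputs listed in the slides. The termination test (no GRAY node left) is one possible choice; the slides do not say how the driver decides to stop.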
  17. Augmenting Edges with Degrees & MapReduce
      Problem: This does not fit into a single MapReduce pass
      Solution: Requires two MapReduce jobs, i.e. two map steps and two reduce steps; one of the map steps is the identity map.
  18. Augmenting Edges with Degrees & MapReduce Example
      Mapper: For each input record, the map creates two output records, one keyed under each vertex in the edge.
      Reducer: The reduce takes all edges mapped to a single vertex (“Fred” here), counts them to obtain the degree, and emits a record for each input record, each keyed under the edge it represents.
  19. Augmenting Edges with Degrees & MapReduce Example
      Mapper: The identity mapper preserves the records unchanged, so the records are binned by the edges they represent.
      Reducer: The reducer combines the partial-degree information to produce a complete record, which it exports.
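
The two jobs described above can be sketched as follows. The figure data from the slides is not in this transcript, so the edge list (including the names other than “Fred”) is hypothetical, as are the helper names `run_job`, `map1`, `reduce1`, `map2`, and `reduce2`. The sketch assumes a simple undirected graph with no duplicate edges or self-loops.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Toy in-memory MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    out = []
    for k in sorted(groups):
        out.extend(reducer(k, groups[k]))
    return out

def map1(_, edge):
    # Emit the edge once under each of its two vertices.
    u, v = edge
    yield u, edge
    yield v, edge

def reduce1(vertex, edges):
    # Every edge binned under this vertex contributes 1 to its degree.
    degree = len(edges)
    for edge in edges:
        yield edge, (vertex, degree)  # partial record, keyed by the edge

def map2(edge, partial):
    # Identity map: records stay binned by the edge they represent.
    yield edge, partial

def reduce2(edge, partials):
    # Two partial records per edge, one for each endpoint.
    degrees = dict(partials)
    u, v = edge
    yield edge, (degrees[u], degrees[v])

# Hypothetical edge list; input keys are irrelevant, so None is used.
edges = [("Fred", "Ann"), ("Fred", "Bob"), ("Fred", "Cy"), ("Ann", "Bob")]
records = [(None, e) for e in edges]
augmented = dict(run_job(run_job(records, map1, reduce1), map2, reduce2))
```

In this example `augmented[("Fred", "Ann")]` is `(3, 2)`: Fred appears in three edges and Ann in two.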
  20. Enumerating Triangles & MapReduce Example
      • Problem: Enumerating 3-cycle subgraphs from a given graph
      • Solution:
        – augmenting the edge records with vertex valence
        – two MapReduce jobs
  21. Enumerating Triangles & MapReduce Example
      • In the first map operation for enumerating triangles, the mapper records each edge under the vertex with the lowest degree.
      • The incoming records’ key doesn’t matter.
  22. Enumerating Triangles & MapReduce Example
      • In the first map operation for enumerating triangles, the mapper records each edge under the vertex with the lowest degree.
      • The incoming records’ key doesn’t matter.
  23. Enumerating Triangles & MapReduce Example
      • The second map for enumerating triangles brings together the edge and open triad records.
      • In the process, it rekeys the edge records so that both record types are binned under the vertices they connect.
  24. Enumerating Triangles & MapReduce Example
      • In the second reduce, each bin contains at most one edge record and some number of triad records (perhaps none).
      • For every combination of edge record and triad record in a bin, the reduce emits a triangle record. The output key isn’t significant.
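
A rough sketch of the two triangle-enumeration jobs follows. Several details are assumptions rather than something the slides state: the first reduce is taken to emit one open triad for every pair of edges binned under a vertex, degree ties are broken by vertex name, and the five-edge graph with its precomputed degrees is hypothetical. The helper names (`run_job`, `other_end`, `map1`, and so on) are likewise illustrative.

```python
from collections import defaultdict
from itertools import combinations

def run_job(records, mapper, reducer):
    """Toy in-memory MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    out = []
    for k in sorted(groups):
        out.extend(reducer(k, groups[k]))
    return out

def other_end(edge, vertex):
    u, v = edge
    return v if u == vertex else u

def map1(edge, degrees):
    # Bin each edge under its lower-degree vertex (ties broken by name).
    (u, v), (du, dv) = edge, degrees
    yield (u if (du, u) <= (dv, v) else v), edge

def reduce1(vertex, edges):
    # Assumed step: each pair of edges sharing this vertex is an open triad.
    for e1, e2 in combinations(sorted(edges), 2):
        yield "triad", (vertex, e1, e2)

def map2(tag, payload):
    if tag == "edge":
        # Rekey the edge under the pair of vertices it connects.
        yield tuple(sorted(payload)), ("edge", payload)
    else:
        apex, e1, e2 = payload
        ends = (other_end(e1, apex), other_end(e2, apex))
        # Key the triad under the pair of vertices that would close it.
        yield tuple(sorted(ends)), ("triad", apex)

def reduce2(pair, values):
    # A triad becomes a triangle only if its closing edge is in the same bin.
    if any(tag == "edge" for tag, _ in values):
        for tag, apex in values:
            if tag == "triad":
                yield None, tuple(sorted((apex, *pair)))

# Hypothetical degree-augmented edges: (edge, (degree of u, degree of v)).
augmented = [
    (("A", "B"), (2, 3)), (("A", "C"), (2, 3)), (("B", "C"), (3, 3)),
    (("C", "D"), (3, 2)), (("B", "D"), (3, 2)),
]
triads = run_job(augmented, map1, reduce1)
job2_input = [("edge", edge) for edge, _ in augmented] + triads
triangles = [t for _, t in run_job(job2_input, map2, reduce2)]
```

For this graph the result contains the triangles A-B-C and B-C-D. Because each edge is binned under only one of its endpoints, each triangle is found from exactly one apex (its lowest-ranked vertex) and is emitted once.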
  25. Bibliography
      1. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Comm. ACM, vol. 51, no. 1, 2008, pp. 107–112.
      2. GoogleDevelopers, “Lecture 5: Parallel Graph Algorithms with MapReduce,” 28 Aug. 2007; http://youtube.com/watch?v=BT-piFBP4fE.
      3. J. Cohen, “Graph Twiddling in a MapReduce World,” Comp. in Science & Engineering, July/Aug. 2009, pp. 29–41.
  26. Thank You
