MapReduce Programming Model
To Solve Graph Problems
Presented By:
Nishant Gandhi
M.Tech. - CSE 1st Year
1311CS05
Guided By:
Dr. Rajiv Misra
Seminar Overview
• Introduction to MapReduce
• MapReduce Programming Model
– Word Count problem
• Graph Problems & MapReduce
– Breadth-First Search
– Augmenting Edges with Degree
– Enumerating Triangles from Graph
Introduction to MapReduce
• History of Computing
– Moore’s Law
• Has not held for the last few years
• Memory is still the bottleneck for high-GHz processors
– Distributed Problems
• Indexing The Web, Simulating Internet Sized Network, Speeding Up
Content Delivery, Rendering Multiple Frames
– Parallel Computing (1975-1985)
• Synchronization Problems
• Very Costly Super Computers
– Distributed Computing (1995-Today)
• Cost Effective Solution
• Use Commodity Hardware
• Google has no Super Computer
Introduction to MapReduce
• History of MapReduce at Google
– Problem at Google
• Computing over Large Amounts of Data on Distributed Systems (DS)
• Parallelize Computing, Distribute Data, Handle Failure
– One Solution
• New Abstraction that allows simple computation & hides
all the other mess
• Automatic Parallelization, Distribution, Fault Handling
• MapReduce Paper 2004
MapReduce Programming Model
• Motivation
– Automatic Parallelization & Distribution
– Fault Tolerance
– Provides Status & Monitoring Tools
– Clean Abstraction for the Programmer
MapReduce Programming Model
• Programming Model
– Borrows From Functional Programming
– User implements an interface of two functions
• Map & Reduce
• map (in_key, in_value) --> (out_key, intermediate_value) list
• reduce (out_key, intermediate_value list) --> out_value list
MapReduce Programming Model
map: (K1,V1) → list (K2,V2)
reduce: (K2,list(V2)) → list (K3,V3)
1. Map function is applied to every input key-value pair
2. Map function generates intermediate key-value pairs
3. Intermediate key-values are sorted and grouped by key
4. Reduce is applied to sorted and grouped intermediate
key-values
5. Reduce emits result key-values
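The five steps above can be sketched as a tiny in-memory driver. This is only an illustration of the data flow, not how a real MapReduce runtime is built: there is no distribution or fault handling, and the names `run_mapreduce`, `identity_map` and `max_reduce` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Run one MapReduce pass in memory over (key, value) records."""
    # Steps 1-2: apply map to every input pair; each call may emit
    # any number of intermediate (key, value) pairs.
    intermediate = chain.from_iterable(mapper(k, v) for k, v in records)

    # Step 3: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Steps 4-5: apply reduce to each group in sorted key order and
    # collect the result pairs it emits.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def identity_map(key, value):
    yield key, value


def max_reduce(key, values):
    yield key, max(values)


# Largest value seen for each key.
result = run_mapreduce([("a", 3), ("b", 1), ("a", 7)], identity_map, max_reduce)
```

Here `map` has the shape (K1,V1) → list(K2,V2) and `reduce` the shape (K2,list(V2)) → list(K3,V3), matching the signatures on this slide.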
MapReduce Programming Model
Example: WordCount
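A minimal sketch of WordCount in the map/reduce style, run in memory. The function names (`wc_map`, `wc_reduce`, `word_count`) and the sample documents are assumptions made for illustration; splitting on whitespace is a simplification.

```python
from collections import defaultdict


def wc_map(doc_id, text):
    # Emit (word, 1) for every word in the document; doc_id is unused.
    for word in text.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    # All the 1s emitted for this word arrive together; add them up.
    yield word, sum(counts)


def word_count(documents):
    # Shuffle phase: group the intermediate values by word.
    groups = defaultdict(list)
    for doc_id, text in documents:
        for word, one in wc_map(doc_id, text):
            groups[word].append(one)
    # Reduce phase: one call per distinct word.
    result = {}
    for word in sorted(groups):
        for key, total in wc_reduce(word, groups[word]):
            result[key] = total
    return result


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
counts = word_count(docs)
```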
Graph Problems
Graphs are ubiquitous in modern society. Some
examples:
• The hyperlink structure of the web
• Social networks (Facebook, IMDB) and flows of email,
text messages and tweets (Twitter)
• Transportation networks (roads, trains, flights, etc.)
• The human body can be seen as a graph of genes,
proteins, cells, etc.
Graph Problems & MapReduce
• Performing Computation on a graph data
structure requires processing at each node
• Each node contains node-specific data as well
as links (edges) to other nodes
• Computation must traverse the graph and
perform the computation step
• How do we traverse a graph in MapReduce?
How do we represent the graph for this?
Breadth-First Search & MapReduce
Problem:
Breadth-first traversal is iterative, so it does
not fit into a single MapReduce pass
Solution:
Iterated passes through MapReduce:
each pass maps some nodes, and its
result includes additional nodes,
which are fed into successive
MapReduce passes
Breadth-First Search & MapReduce
Example
Representation as an adjacency list
ID EDGES|DISTANCE_FROM_SOURCE|COLOR|
• Input to MAP
1 2,5|0|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
3 2,4|Integer.MAX_VALUE|WHITE|
4 2,3,5|Integer.MAX_VALUE|WHITE|
5 1,2,4|Integer.MAX_VALUE|WHITE|
Breadth-First Search & MapReduce
Example
• 1st iteration of Map
1 2,5|0|BLACK|
2 NULL|1|GRAY|
5 NULL|1|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
3 2,4|Integer.MAX_VALUE|WHITE|
4 2,3,5|Integer.MAX_VALUE|WHITE|
5 1,2,4|Integer.MAX_VALUE|WHITE|
• 1st iteration of Reduce (records shown only for node 2)
2 NULL|1|GRAY|
2 1,3,4,5|Integer.MAX_VALUE|WHITE|
The reducer's job is to take all this data and
construct a new node using
– the non-null list of edges
– the minimum distance
– the darkest color
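The merge rule described above could look like the following sketch. The tuple layout `(edges, distance, color)`, the `DARKNESS` ranking and the name `bfs_reduce` are assumptions; `INF` stands in for the `Integer.MAX_VALUE` shown on the slides.

```python
INF = 2**31 - 1  # stands in for Integer.MAX_VALUE on the slides
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}


def bfs_reduce(node_id, records):
    """Merge every record emitted for one node into a single node."""
    edges, distance, color = None, INF, "WHITE"
    for rec_edges, rec_distance, rec_color in records:
        if rec_edges is not None:
            edges = rec_edges                      # the non-null edge list
        distance = min(distance, rec_distance)     # the minimum distance
        if DARKNESS[rec_color] > DARKNESS[color]:
            color = rec_color                      # the darkest color
    return node_id, (edges, distance, color)


# The two records that reach the reducer for node 2 in the 1st iteration.
node2 = bfs_reduce(2, [(None, 1, "GRAY"), ([1, 3, 4, 5], INF, "WHITE")])
```

For node 2 this yields the edge list 1,3,4,5 with distance 1 and color GRAY.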
Breadth-First Search & MapReduce
Example
• Output of 1st iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|GRAY
3 2,4,|Integer.MAX_VALUE|WHITE
4 2,3,5,|Integer.MAX_VALUE|WHITE
5 1,2,4,|1|GRAY
• Output of 2nd iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|GRAY
4 2,3,5,|2|GRAY
5 1,2,4,|1|BLACK
Breadth-First Search & MapReduce
Example
• Output of 3rd iteration
1 2,5,|0|BLACK
2 1,3,4,5,|1|BLACK
3 2,4,|2|BLACK
4 2,3,5,|2|BLACK
5 1,2,4,|1|BLACK
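The whole example can be replayed with a short in-memory sketch that repeats map and reduce until no GRAY node is left. The function names and the tuple layout `(edges, distance, color)` are assumptions for illustration; in a real deployment each iteration would be a separate MapReduce job.

```python
from collections import defaultdict

INF = 2**31 - 1  # stands in for Integer.MAX_VALUE on the slides
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}


def bfs_map(node_id, node):
    edges, distance, color = node
    if color == "GRAY":
        # Expand the frontier: tell each neighbor it is one step further.
        for neighbor in edges:
            yield neighbor, (None, distance + 1, "GRAY")
        color = "BLACK"  # this node is now fully processed
    # Always re-emit the node itself so its edge list is not lost.
    yield node_id, (edges, distance, color)


def bfs_reduce(node_id, records):
    edges, distance, color = None, INF, "WHITE"
    for rec_edges, rec_distance, rec_color in records:
        if rec_edges is not None:
            edges = rec_edges
        distance = min(distance, rec_distance)
        if DARKNESS[rec_color] > DARKNESS[color]:
            color = rec_color
    return edges, distance, color


def bfs_iteration(graph):
    # One map pass, one grouping by node id, one reduce pass.
    groups = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in bfs_map(node_id, node):
            groups[key].append(value)
    return {nid: bfs_reduce(nid, recs) for nid, recs in groups.items()}


graph = {
    1: ([2, 5], 0, "GRAY"),
    2: ([1, 3, 4, 5], INF, "WHITE"),
    3: ([2, 4], INF, "WHITE"),
    4: ([2, 3, 5], INF, "WHITE"),
    5: ([1, 2, 4], INF, "WHITE"),
}

iterations = 0
while any(color == "GRAY" for _, _, color in graph.values()):
    graph = bfs_iteration(graph)
    iterations += 1

distances = {nid: dist for nid, (_, dist, _) in graph.items()}
```

The loop stops once every reachable node is BLACK; on this graph that takes three passes, as in the slides.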
Augmenting Edges with Degrees &
MapReduce
Problem:
Attach to each edge the degrees of both of its
vertices; this does not fit into a single
MapReduce pass
Solution:
Requires two MapReduce
jobs (two map steps and two
reduce steps);
one of the map steps is the identity map.
Augmenting Edges with Degrees &
MapReduce Example
Mapper:
For each input record (an edge), the map creates two
output records, one keyed under each
vertex in the edge.
Reducer:
The reduce takes all edges mapped to a
single vertex (“Fred” here), counts them to
obtain the degree, and emits a record for
each input record, each keyed under the
edge it represents.
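A sketch of this first job, run in memory. The edge list is made up for illustration (only "Fred" comes from the slide), and `degree_map` / `degree_reduce` are hypothetical names.

```python
from collections import defaultdict


def degree_map(_key, edge):
    # One output record per endpoint, keyed under that vertex.
    u, v = edge
    yield u, edge
    yield v, edge


def degree_reduce(vertex, edges):
    # Every edge touching this vertex is in the bin, so the bin size
    # is the vertex degree. Emit one record per edge, keyed by the edge.
    degree = len(edges)
    for edge in edges:
        yield edge, (vertex, degree)


# Hypothetical input graph.
edges = [("Fred", "Ethel"), ("Fred", "Lucy"), ("Lucy", "Ricky")]

groups = defaultdict(list)
for edge in edges:
    for vertex, value in degree_map(None, edge):
        groups[vertex].append(value)

partial = [kv for vertex in sorted(groups)
           for kv in degree_reduce(vertex, groups[vertex])]
```

Each edge now appears twice in `partial`, once with the degree of each endpoint, which is why a second job is needed to bring the two halves together.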
Augmenting Edges with Degrees &
MapReduce Example
Mapper:
The identity mapper preserves the records
unchanged, so the records are binned by
the edges they represent.
Reducer:
The reducer combines the partial-degree
information to produce a complete record,
which it exports.
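The second job might be sketched as follows. The input `partial_records` is a hand-written example of what the first job could emit (edge as key, one endpoint's degree as value); the record layout and function names are assumptions.

```python
from collections import defaultdict


def identity_map(edge, partial):
    # Pass the record through unchanged; the shuffle bins it by edge.
    yield edge, partial


def combine_reduce(edge, partials):
    # Each bin holds two partial records, one per endpoint.
    degrees = dict(partials)  # {vertex: degree}
    u, v = edge
    yield edge, (degrees[u], degrees[v])


# Hypothetical output of the first job: (edge, (vertex, degree)).
partial_records = [
    (("Fred", "Ethel"), ("Ethel", 1)),
    (("Fred", "Ethel"), ("Fred", 2)),
    (("Fred", "Lucy"), ("Fred", 2)),
    (("Fred", "Lucy"), ("Lucy", 2)),
    (("Lucy", "Ricky"), ("Lucy", 2)),
    (("Lucy", "Ricky"), ("Ricky", 1)),
]

groups = defaultdict(list)
for edge, partial in partial_records:
    for key, value in identity_map(edge, partial):
        groups[key].append(value)

augmented = {}
for edge in sorted(groups):
    for key, value in combine_reduce(edge, groups[edge]):
        augmented[key] = value
```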
Enumerating Triangles & MapReduce
Example
• Problem:
Enumerating the 3-cycle subgraphs (triangles)
of a given graph
• Solution:
– augment the edge records
with vertex valence (degree)
– two MapReduce jobs
Enumerating Triangles & MapReduce
Example
• In the first map operation for enumerating triangles, the
mapper records each edge under whichever of its two vertices
has the lower degree.
• The incoming records’ key doesn’t matter.
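A sketch of this first map step. It assumes each input record already carries the degrees of both endpoints, in the form `((u, v), (deg_u, deg_v))`; breaking ties by vertex name is also an assumption, as are the function name and the sample graph.

```python
def lowest_degree_map(_key, record):
    # The incoming key is ignored, as the slide notes.
    (u, v), (deg_u, deg_v) = record
    # Pick the endpoint with the lower degree; ties go to the smaller name.
    low = u if (deg_u, u) <= (deg_v, v) else v
    yield low, (u, v)


# Hypothetical graph: triangle A-B-C plus a pendant edge C-D.
records = [
    (("A", "B"), (2, 2)),
    (("A", "C"), (2, 3)),
    (("B", "C"), (2, 3)),
    (("C", "D"), (3, 1)),
]

binned = [kv for record in records for kv in lowest_degree_map(None, record)]
```

Binning under the low-degree endpoint keeps the bins of high-degree vertices small, since the pairing of edges within a bin grows quadratically with the bin size.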
Enumerating Triangles & MapReduce
Example
• The second map for enumerating triangles brings together
the edge and open triad records.
• In the process, it rekeys the edge records so that both record
types are binned under the vertices they connect.
Enumerating Triangles & MapReduce
Example
• In the second reduce, each bin contains at most one edge record
and some number of triad records (perhaps none).
• For every combination of edge record and triad record in a bin, the
reduce emits a triangle record. The output key isn’t significant.
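The two jobs can be put together in a small in-memory sketch. The first reduce, which pairs up the edges in each bin into open triads keyed by the edge that would close them, is an assumption about a step these slides do not spell out; the record layouts, tie-breaking by vertex name, and all function names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group(pairs):
    """Shuffle step: bin values by key."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return bins


def edge_key(a, b):
    return (a, b) if a <= b else (b, a)


def map1(_key, record):
    # Record each edge under its lower-degree vertex (ties: smaller name).
    (u, v), (deg_u, deg_v) = record
    low = u if (deg_u, u) <= (deg_v, v) else v
    yield low, edge_key(u, v)


def reduce1(vertex, edges):
    # Assumed step: every pair of edges in a bin is an open triad,
    # keyed by the edge that would close it into a triangle.
    for e1, e2 in combinations(sorted(edges), 2):
        a = e1[1] if e1[0] == vertex else e1[0]
        b = e2[1] if e2[0] == vertex else e2[0]
        yield edge_key(a, b), ("triad", vertex)


def map2(key, value):
    if value[0] == "triad":
        yield key, value                     # already keyed by closing edge
    else:
        u, v = value[1]
        yield edge_key(u, v), ("edge", None)  # rekey edge by its endpoints


def reduce2(pair, records):
    # At most one edge record per bin; each triad it meets is a triangle.
    if any(kind == "edge" for kind, _ in records):
        for kind, apex in records:
            if kind == "triad":
                yield None, tuple(sorted((apex, *pair)))


def enumerate_triangles(records):
    bins = group(kv for rec in records for kv in map1(None, rec))
    triads = [kv for vertex, edges in bins.items()
              for kv in reduce1(vertex, edges)]
    edge_records = [(None, ("edge", edge)) for edge, _ in records]
    bins = group(kv for key, value in triads + edge_records
                 for kv in map2(key, value))
    return sorted(tri for pair, recs in bins.items()
                  for _, tri in reduce2(pair, recs))


# Hypothetical graph: triangle A-B-C plus a pendant edge C-D.
records = [
    (("A", "B"), (2, 2)),
    (("A", "C"), (2, 3)),
    (("B", "C"), (2, 3)),
    (("C", "D"), (3, 1)),
]
triangles = enumerate_triangles(records)
```

Because each edge is binned under exactly one endpoint, a triangle produces a triad only in the bin of its lowest-ranked vertex, so it is reported once.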
Bibliography
1. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on
Large Clusters,” Comm. ACM, vol. 51, no. 1, 2008, pp. 107–112.
2. GoogleDevelopers, “Lecture 5: Parallel Graph Algorithms with
MapReduce,” 28 Aug. 2007; http://youtube.com/watch?v=BT-piFBP4fE.
3. J. Cohen, “Graph Twiddling in a MapReduce World,” Computing in
Science & Engineering, July/Aug. 2009, pp. 29–41.
Thank You
