Join Algorithm in MapReduce
Vettal Wu
Agenda
• Introduction
• Join Operation
• MapReduce
• Problem Statement
• Join Algorithms
• Repartition Join
• Improved Repartition Join
• Broadcast Join
• Semi Join
• Per-split Semi Join
• Conclusion
Introduction
• What is a join?
• A SQL join clause combines columns from one or more tables in a
relational database.
• Join Types
• Equi join between two tables*
• Theta join between two tables
• Equi join among multiple tables (star join, chain join)
• Theta join among multiple tables
• …
• * the case considered in this talk
Introduction
• MapReduce
• a programming model and an associated implementation for processing and
generating big data sets with a parallel, distributed algorithm on a cluster.
• "Map" phase: Each worker node applies the "map()" function to the local data, and writes the
output to a temporary storage. A master node ensures that only one copy of redundant input
data is processed.
• "Shuffle" phase: Worker nodes redistribute data based on the output keys (produced by the
"map()" function), such that all data belonging to one key is located on the same worker node.
• "Reduce" phase: Worker nodes now process each group of output data, per key, in parallel.
Introduction
• MapReduce in brief
• Map phase
• map(in_key, in_value) -> list(out_key, intermediate_value)
• processes an input key/value pair
• produces a set of intermediate (key, value) pairs
• Reduce phase
• reduce(out_key, list(intermediate_value)) -> list(out_value)
• combines all intermediate values for a particular key
• produces a set of merged output values
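Below is a minimal single-process Python sketch of these two signatures (illustrative only: the word-count data, function names, and the in-memory shuffle are stand-ins, not the Hadoop API):

    from collections import defaultdict

    def map_fn(in_key, in_value):
        # process one input key/value pair; emit intermediate (word, 1) pairs
        return [(word, 1) for word in in_value.split()]

    def reduce_fn(out_key, intermediate_values):
        # combine all intermediate values for one particular key
        return [(out_key, sum(intermediate_values))]

    # simulate the framework: map -> shuffle (group by out_key) -> reduce
    inputs = [(0, "a b a"), (1, "b c")]
    groups = defaultdict(list)
    for k, v in inputs:
        for out_key, inter_value in map_fn(k, v):
            groups[out_key].append(inter_value)
    results = [r for k in sorted(groups) for r in reduce_fn(k, groups[k])]
    print(results)  # [('a', 2), ('b', 2), ('c', 1)]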
Problem Statement
• We consider an equi-join between table L and table R on a
single column (e.g., SELECT * FROM L JOIN R ON L.B = R.B).
• L, R, and the join result are stored in the DFS.
• WHERE-clause predicates are not considered here.
L (A, B):
  A    B
  a0   b0
  a1   b1
  a2   b2
  …    …

R (B, C):
  B    C
  b0   c0
  b0   c1
  b1   c2
  …    …

Result L ⨝ R (A, B, C):
  A    B    C
  a0   b0   c0
  a0   b0   c1
  a1   b1   c2
  …    …    …
Repartition Join
• Reduce-side Join
• One MapReduce Job
• Map: read records from the two tables and send each record to a
reducer according to its join key
• Reduce: join the records that share the same join key
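A minimal single-process Python sketch of this dataflow, following the schema of the example on the next slide (R(A, B), L(B, C)); the hard-coded tables and the dictionary standing in for the shuffle are illustrative only:

    from collections import defaultdict

    R = [("a0", "b0"), ("a1", "b1"), ("a2", "b2")]   # R(A, B)
    L = [("b0", "c0"), ("b0", "c1"), ("b1", "c2")]   # L(B, C)

    def map_fn(tag, record):
        # emit (join_key, tagged_record); the tag records which table the record came from
        join_key = record[1] if tag == "R" else record[0]
        return [(join_key, (tag, record))]

    def reduce_fn(join_key, tagged_records):
        # buffer ALL records for this key, split them by table, then form the cross product
        r_rows = [rec for tag, rec in tagged_records if tag == "R"]
        l_rows = [rec for tag, rec in tagged_records if tag == "L"]
        return [(a, b, c) for (a, b) in r_rows for (_b, c) in l_rows]

    groups = defaultdict(list)                        # simulated shuffle: group by join key
    for tag, table in (("R", R), ("L", L)):
        for record in table:
            for k, v in map_fn(tag, record):
                groups[k].append(v)

    result = [row for k in groups for row in reduce_fn(k, groups[k])]
    print(result)  # [('a0', 'b0', 'c0'), ('a0', 'b0', 'c1'), ('a1', 'b1', 'c2')]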
Repartition Join
Example: R(A, B) ⨝ L(B, C)

Input table R (A, B):
  A    B
  a0   b0
  a1   b1
  a2   b2
  …    …

Input table L (B, C):
  B    C
  b0   c0
  b0   c1
  b1   c2
  …    …

Map output / reduce input (each record tagged with its source table, partitioned by join key):
  Reducer for key b0:
    K    V
    b0   R:(a0, b0)
    b0   L:(b0, c0)
    b0   L:(b0, c1)
    …    …
  Reducer for key b1:
    K    V
    b1   R:(a1, b1)
    b1   L:(b1, c2)
    …    …

Final output (A, B, C):
  A    B    C
  a0   b0   c0
  a0   b0   c1
  a1   b1   c2
  …    …    …
Repartition Join
• Problems
• All records for a given join key have to be buffered in the reducer
• They may not fit in memory (out-of-memory errors), for example when
• the data is highly skewed, or
• the key cardinality is small
Improved Repartition Join
• Output key: composite of the join key and table tag
• Customized partitioning: hash function of join key
• Records from the smaller table R are guaranteed to arrive at the
reducer ahead of those from L for a given key
• Only records in R are buffered and records in L are
streamed to generate the join output
Example (map output for the same R record):
  Basic repartition join:     K = b0     V = R:(a0, b0)
  Improved repartition join:  K = R:b0   V = R:(a0, b0)
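A minimal Python sketch of the idea (the composite-key encoding, sort order, and two-reducer setup below are illustrative assumptions, not Hadoop's secondary-sort API):

    from collections import defaultdict

    R = [("a0", "b0"), ("a1", "b1")]                  # R(A, B), the smaller table
    L = [("b0", "c0"), ("b0", "c1"), ("b1", "c2")]    # L(B, C)

    def partition(composite_key, num_reducers):
        # partition on the join key ONLY, so R:b0 and L:b0 reach the same reducer
        _tag, join_key = composite_key
        return hash(join_key) % num_reducers

    pairs = [(("R", b), (a, b)) for a, b in R] + [(("L", b), (b, c)) for b, c in L]
    buckets = defaultdict(list)
    for ck, rec in pairs:
        buckets[partition(ck, 2)].append((ck, rec))

    result = []
    for bucket in buckets.values():
        # sort by (join key, tag) so that, per key, R records come before L records
        bucket.sort(key=lambda kv: (kv[0][1], kv[0][0] != "R"))
        buffered_r = {}                               # only R records are buffered
        for (tag, key), rec in bucket:
            if tag == "R":
                buffered_r.setdefault(key, []).append(rec)
            else:                                     # L records are streamed past the buffer
                for (a, _b) in buffered_r.get(key, []):
                    result.append((a, key, rec[1]))
    print(result)  # (a0,b0,c0), (a0,b0,c1), (a1,b1,c2) -- order may vary across runs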
Broadcast Join
• Map-side Join
• In some applications, |R| << |L|
• Avoid sending both R and L across the network
• Broadcast the smaller table R to every node to avoid the shuffle overhead
• A map-only job (no reduce phase)
• Each map task builds a main-memory hash table from either R or its split of L
(whichever is smaller)
• key: join key
• value: list of records
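A minimal Python sketch of the map-only job (hard-coded splits of L and an ordinary dict stand in for input splits and the distributed cache that would hold the broadcast copy of R):

    R = [("b1", "c0"), ("b2", "c1")]                  # R(B, C), small enough to broadcast
    L_splits = [[("a1", "b0"), ("a1", "b1")],         # L(A, B), one list per input split
                [("a2", "b1"), ("a2", "b2")],
                [("a3", "b2"), ("a3", "b3")]]

    # build the main-memory hash table on the smaller side (here R), keyed by the join key B
    hash_R = {}
    for b, c in R:
        hash_R.setdefault(b, []).append(c)

    def map_only_join(l_split):
        # probe the hash table for every L record in this split; no reduce phase is needed
        return [(a, b, c) for a, b in l_split for c in hash_R.get(b, [])]

    result = [row for split in L_splits for row in map_only_join(split)]
    print(result)  # [('a1','b1','c0'), ('a2','b1','c0'), ('a2','b2','c1'), ('a3','b2','c1')]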
Broadcast Join
Splits of L (A, B):
  Split 1:      Split 2:      Split 3:
  A   B         A   B         A   B
  a1  b0        a2  b1        a3  b2
  a1  b1        a2  b2        a3  b3

R (B, C), broadcast to every map task:
  B   C
  b1  c0
  b2  c1

Per-split join output (A, B, C):
  Split 1:  a1  b1  c0
  Split 2:  a2  b1  c0
            a2  b2  c1
  Split 3:  a3  b2  c1
Broadcast Join
• The smaller of R and the split of L is chosen to build the hash table
Broadcast Join
• Partition R on the join key and store the partitions in the local file system
• This way, not all partitions of R have to be loaded in memory during the join
Semi Join
• Motivation
• Often, when R is large, many records in R are never actually referenced by
any record in table L.
• Avoid sending the records of R that will not join with L over the network
L (A, B):
  A     B
  a0    b0
  a1    b1
  a2    b2
  …     …
  a50   b50

R (B, C):
  B     C
  b0    c0
  …     …
  b90   c90
  b91   c91

Result L ⨝ R (A, B, C):
  A    B    C
  a0   b0   c0
  a0   b0   c1
  a1   b1   c2
  …    …    …
Semi Join
• Phase 1: Preprocessing (extract the unique join keys of L and use them to filter R)
• Phase 2: Join (join the filtered R with L)
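A minimal Python sketch of this pipeline, assuming the usual three-job structure (extract the unique join keys of L, filter R with them, then broadcast-join the filtered R with L); the single-process code below stands in for three MapReduce jobs:

    L = [("a0", "b0"), ("a1", "b1")]                                  # L(A, B)
    R = [("b0", "c0"), ("b1", "c1"), ("b2", "c2"), ("b3", "c3")]      # R(B, C), mostly unreferenced

    # Job 1 (preprocessing): extract the unique join keys that actually occur in L
    unique_keys = {b for _a, b in L}                                  # {'b0', 'b1'}

    # Job 2 (preprocessing): map-only filter of R against those keys
    R_filtered = [(b, c) for b, c in R if b in unique_keys]

    # Job 3 (join): broadcast-join the much smaller filtered R with L
    hash_R = {}
    for b, c in R_filtered:
        hash_R.setdefault(b, []).append(c)
    result = [(a, b, c) for a, b in L for c in hash_R.get(b, [])]
    print(result)  # [('a0', 'b0', 'c0'), ('a1', 'b1', 'c1')]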
Per-split Semi Join
• Motivation
• The problem with the semi join: not every record of the filtered R will join with a
particular split Li of L, so each map task still loads R records it does not need
L1 (first split of L; columns A, B):
  A    B
  a0   b0
  a1   b1

L2 (second split of L; columns A, B):
  A    B
  a2   b2
  a3   b3

… (further splits of L)

R (B, C):
  B    C
  b0   c0
  b1   c1
  b2   c2
  b3   c3

Result L ⨝ R (A, B, C):
  A    B    C
  a0   b0   c0
  a1   b1   c1
  a2   b2   c2
  a3   b3   c3
Per-split Semi Join
• Filter R for each split Li
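A minimal Python sketch of the per-split variant (same assumptions as the semi-join sketch above, but R is filtered once per split Li so that each map task loads only the records it can actually join):

    L_splits = {"L1": [("a0", "b0"), ("a1", "b1")],                   # L(A, B), split by split
                "L2": [("a2", "b2"), ("a3", "b3")]}
    R = [("b0", "c0"), ("b1", "c1"), ("b2", "c2"), ("b3", "c3"), ("b9", "c9")]   # R(B, C)

    # Preprocessing: for each split Li, keep only the R records whose join key occurs in Li
    R_per_split = {}
    for name, split in L_splits.items():
        keys_in_split = {b for _a, b in split}
        R_per_split[name] = [(b, c) for b, c in R if b in keys_in_split]

    # Join: the map task reading split Li loads only its own filtered copy R_i
    result = []
    for name, split in L_splits.items():
        hash_Ri = {}
        for b, c in R_per_split[name]:
            hash_Ri.setdefault(b, []).append(c)
        result += [(a, b, c) for a, b in split for c in hash_Ri.get(b, [])]
    print(result)  # [('a0','b0','c0'), ('a1','b1','c1'), ('a2','b2','c2'), ('a3','b3','c3')]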
Conclusion
Reduce-side join
• Repartition Join
  • Pros: one MapReduce job; suitable for any case
  • Cons: can run out of memory in the reducer; sends both tables across the network
• Improved Repartition Join
  • Pros: avoids the out-of-memory problem; suitable for any case
  • Cons: still sends both tables across the network; needs a customized partitioner

Map-side join
• Broadcast Join
  • Pros: only the smaller table is sent across the network
  • Cons: only works when one table is small enough; records of R that will not join are still broadcast
• Semi Join
  • Pros: filters out records of R that will not join before sending R across the network
  • Cons: extra preprocessing step; coarse-grained filtering
• Per-split Semi Join
  • Pros: fine-grained filtering of the records sent across the network
  • Cons: extra preprocessing step; each split of L needs its own filtered copy of R
Q&A
