These slides explain the theta-join algorithms (1-Bucket-Theta and M-Bucket-I) proposed in Okcan and Riedewald, "Processing Theta-Joins using MapReduce" (SIGMOD 2011).
Paper link: http://www.ccs.neu.edu/home/mirek/papers/2011-SIGMOD-ParallelJoins.pdf
2. Processing pipeline at a reducer
Goal: We want to minimize job completion time. Since completion time is a function of both input and output, we need a way to model both the input to and the output from a reducer.
[ Pipeline diagram ] Receive mapper output → sort input by key → read input → run join algorithm → send join output. The first three stages take time as a function of the reducer's input size (the mapper output); the last two take time as a function of its output size (the join output).
3. Theta Join Model
Dataset S            Dataset T
S_id  Value          T_id  Value
1     5              1     5
2     6              2     5
3     6              3     6
4     8              4     8
5     8              5     8
6     10             6     10
Assuming join condition: S.value = T.value
4. Theta Join Model
[ Join Matrix M ]  X = tuple pair satisfying the join condition S.value = T.value

T →   5  5  6  8  8 10
S  5  X  X  .  .  .  .
   6  .  .  X  .  .  .
   6  .  .  X  .  .  .
   8  .  .  .  X  X  .
   8  .  .  .  X  X  .
  10  .  .  .  .  .  X
5. Theta Join Model (Examples)
The same datasets under three join conditions, and the matrix cells they mark:

Join condition: S.value <= T.value
T →   5  5  6  8  8 10
S  5  X  X  X  X  X  X
   6  .  .  X  X  X  X
   6  .  .  X  X  X  X
   8  .  .  .  X  X  X
   8  .  .  .  X  X  X
  10  .  .  .  .  .  X

Join condition: abs(S.value - T.value) < 2
T →   5  5  6  8  8 10
S  5  X  X  X  .  .  .
   6  X  X  X  .  .  .
   6  X  X  X  .  .  .
   8  .  .  .  X  X  .
   8  .  .  .  X  X  .
  10  .  .  .  .  .  X

Join condition: S.value = T.value
T →   5  5  6  8  8 10
S  5  X  X  .  .  .  .
   6  .  .  X  .  .  .
   6  .  .  X  .  .  .
   8  .  .  .  X  X  .
   8  .  .  .  X  X  .
  10  .  .  .  .  .  X
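The matrix itself is easy to materialize for small examples. A minimal sketch (our own Python, not from the paper) that builds the join matrix for the toy datasets and any of the conditions above:

# Build the boolean join matrix M for the toy datasets above, given an
# arbitrary theta predicate. (Illustrative sketch; names are ours.)
S = [5, 6, 6, 8, 8, 10]   # S.value column
T = [5, 5, 6, 8, 8, 10]   # T.value column

def join_matrix(S, T, theta):
    """M[i][j] is True iff the pair (S[i], T[j]) satisfies theta."""
    return [[theta(s, t) for t in T] for s in S]

M_leq  = join_matrix(S, T, lambda s, t: s <= t)
M_band = join_matrix(S, T, lambda s, t: abs(s - t) < 2)
M_eq   = join_matrix(S, T, lambda s, t: s == t)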
8. Goal Revisited
• We want to minimize job completion time.
• We need to assign every true cell of the join matrix to exactly one reducer (i.e., find a mapping from M to the set of reducers R).
• Goal: Find a mapping from the join matrix M to reducers that minimizes job completion time.
16. Mappings from join matrix to reducers
• There are many possible mappings from the join matrix to reducers.
• We will see which mapping is (close to) optimal in different cases, and algorithms to compute such a mapping.
17. Lemma
We will use the following lemma repeatedly to show how close to optimal each mapping is.

[ LEMMA 1 ] A reducer that is assigned c cells of the join matrix M will receive at least $2\sqrt{c}$ input tuples.

[ Proof ] Consider a reducer r that receives m records from T and n records from S. The c cells assigned to r must lie inside the m x n sub-rectangle spanned by those records, so $mn \ge c$. By the AM-GM inequality, the input size satisfies $m + n \ge 2\sqrt{mn} \ge 2\sqrt{c}$.
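Lemma 1 can also be sanity-checked by brute force. A tiny check of our own (not from the paper): for every rectangle shape (m, n) large enough to hold c cells, the input size m + n meets the bound.

# For every m x n rectangle with m * n >= c, verify m + n >= 2 * sqrt(c).
from math import sqrt

for c in range(1, 101):
    for m in range(1, c + 1):
        for n in range(1, c + 1):
            if m * n >= c:
                assert m + n >= 2 * sqrt(c) - 1e-9  # tolerance for float error
print("Lemma 1 bound holds for all c <= 100")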
19. Cross Product
• We first consider the cross product, where every pair of tuples from the two datasets satisfies the join condition. The join matrix is completely filled:

Join condition: S × T (cross product)
T →   5  5  6  8  8 10
S  5  X  X  X  X  X  X
   6  X  X  X  X  X  X
   6  X  X  X  X  X  X
   8  X  X  X  X  X  X
   8  X  X  X  X  X  X
  10  X  X  X  X  X  X
21. Cross Product
• Since all entries of the join matrix are true, the maximum-reducer-output (MRO) must satisfy $\mathrm{MRO} \ge |S||T|/r$. (Otherwise, some output tuples would not be mapped to any reducer.)
• Along with Lemma 1, this gives a lower bound for the maximum-reducer-input (MRI): $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$.

[ LEMMA 1 ] A reducer that is assigned c cells of the join matrix M will receive at least $2\sqrt{c}$ input tuples.
23. Cross Product
• We will refer back to these two lower bounds frequently to assess the quality of join mappings:
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$
24. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

Case 1: Suppose |S| and |T| are multiples of $\sqrt{|S||T|/r}$, namely $|S| = c_S\sqrt{|S||T|/r}$ and $|T| = c_T\sqrt{|S||T|/r}$. Then partitioning the join matrix into squares of side $\sqrt{|S||T|/r}$ is an optimal mapping.

[ Proof ] Each region mapped to a reducer has output size $|S||T|/r$ and input size $2\sqrt{|S||T|/r}$, exactly matching the lower bounds above.
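A minimal sketch of the Case 1 square cover (our own Python, not the paper's code):

# Partition an |S| x |T| join matrix into squares of side sqrt(|S||T|/r)
# and report each region's input and output size. Assumes Case 1: the
# side length is an integer that divides both |S| and |T| exactly.
from math import isqrt

def square_cover(size_S, size_T, r):
    side = isqrt(size_S * size_T // r)
    regions = []
    for row in range(0, size_S, side):
        for col in range(0, size_T, side):
            regions.append({
                "rows": (row, row + side),   # which S-tuples this reducer needs
                "cols": (col, col + side),   # which T-tuples this reducer needs
                "input": 2 * side,           # side rows + side columns
                "output": side * side,       # every cell is true (cross product)
            })
    return regions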
26. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

Suppose |S| = |T| = 6 and r = 9. The side length is $\sqrt{36/9} = 2$, so the 6 x 6 matrix is partitioned into nine 2 x 2 squares, one per reducer.

MRO = 4 = $|S||T|/r$
MRI = 4 = $2\sqrt{|S||T|/r}$

Both lower bounds are met, so the mapping is optimal.
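Running the square_cover sketch from the Case 1 slide on this instance reproduces these numbers (square_cover is the hypothetical helper defined in our earlier sketch):

# |S| = |T| = 6, r = 9: nine 2 x 2 regions, each with input 4 and output 4.
regions = square_cover(6, 6, 9)
assert len(regions) == 9
assert max(reg["input"] for reg in regions) == 4    # MRI
assert max(reg["output"] for reg in regions) == 4   # MRO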
30. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

Case 2: Suppose the cardinality of one dataset is significantly greater than that of the other (WLOG, assume $|S| < |T|/r$). Then covering the matrix with r rectangles of size $|S| \times |T|/r$ is the optimal mapping.
(e.g., |S| = 3, |T| = 20, r = 5: each reducer gets a 3 x 4 rectangle, with output 12 = |S||T|/r and input 3 + 4 = 7.)
32. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

Case 3: The remaining case, where $|T|/r \le |S| \le |T|$.
Let $c_S = \lfloor |S| / \sqrt{|S||T|/r} \rfloor$ and $c_T = \lfloor |T| / \sqrt{|S||T|/r} \rfloor$.
Then covering M with $\sqrt{|S||T|/r} \times \sqrt{|S||T|/r}$ squares is a mapping worse than an optimal mapping by a factor no greater than 4.
33. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

If |S| and/or |T| is not a multiple of $\sqrt{|S||T|/r}$, stretch each square by a factor of $(1 + 1/c_S)$ vertically and/or $(1 + 1/c_T)$ horizontally so that the squares still cover M. Given $|T|/r \le |S| \le |T|$, we have $c_S \ge 1$ and $c_T \ge 1$, so each side of a region is at most $(1 + 1/c_S)\sqrt{|S||T|/r} \le 2\sqrt{|S||T|/r}$ (and likewise with $c_T$).
34. Cross Product
Properties: $\mathrm{MRO} \ge |S||T|/r$ and $\mathrm{MRI} \ge 2\sqrt{|S||T|/r}$

Hence $\mathrm{MRO} \le 4|S||T|/r$ and $\mathrm{MRI} \le 4\sqrt{|S||T|/r}$.
Comparing these with the lower bounds above, the MRO and MRI produced by this mapping are at most 4 times (twice, for MRI) their lower bounds.
35. Implementation
• Now that we know how to (nearly) optimally partition the join matrix, let's run it!
• However, when a mapper is given a record (from either S or T), it does NOT have enough information to know exactly where in the matrix (which row/column) the record belongs.
• We could run another preprocessing pass to gather that information, but this can be avoided with a randomized algorithm!
38. Mapping & Randomized Algorithm
Algorithm 1: Map (Theta-Join)
Input: input tuple x ∈ S ∪ T
1: if x ∈ S then
2:   matrixRow = random(1, |S|)
3:   for all regionID in lookup.getRegions(matrixRow) do
4:     Output(regionID, (x, "S"))
5: else
6:   matrixCol = random(1, |T|)
7:   for all regionID in lookup.getRegions(matrixCol) do
8:     Output(regionID, (x, "T"))

In words, for a record x ∈ S (WLOG; the T case is symmetric):
1. Given the record x,
2. pick a matrix row uniformly at random,
3. find all regions intersecting that row, and output (regionID, (x, "S")) for each.
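A minimal runnable sketch of this mapper (our own Python, not the paper's Hadoop implementation; the region list stands in for the lookup table shipped to mappers):

# Emit (regionID, (x, origin)) pairs for one input tuple. `regions` is a
# hypothetical list of (region_id, (row_lo, row_hi), (col_lo, col_hi))
# entries with 1-based, half-open ranges from the matrix-to-reducer mapping.
import random

def map_tuple(x, origin, size_S, size_T, regions):
    if origin == "S":
        row = random.randint(1, size_S)          # pick a matrix row at random
        return [(rid, (x, "S"))
                for rid, (rlo, rhi), _ in regions if rlo <= row < rhi]
    else:
        col = random.randint(1, size_T)          # pick a matrix column at random
        return [(rid, (x, "T"))
                for rid, _, (clo, chi) in regions if clo <= col < chi]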
41. Cross Product… NOT!
• We have verified that the 1-Bucket-Theta (1BT) algorithm is close to optimal when the join condition is a cross product.
• How does 1-Bucket-Theta perform when the join condition is NOT a cross product?
• We will compare the quality of 1-Bucket-Theta against ANY join algorithm.
44. 1BT vs ANY join algorithm
Let $1 \ge x > 0$. Consider any matrix-to-reducer mapping that has to cover at least $x|S||T|$ of the $|S||T|$ cells of the join matrix. Some reducer must then be assigned at least $x|S||T|/r$ cells, so by Lemma 1 it has $\mathrm{MRI} \ge 2\sqrt{x|S||T|/r}$.

[ LEMMA 1 ] A reducer that is assigned c cells of the join matrix M will receive at least $2\sqrt{c}$ input tuples.

As we have seen, 1BT guarantees $\mathrm{MRI} \le 4\sqrt{|S||T|/r}$. Hence,
$\frac{\mathrm{MRI}_{1BT}}{\mathrm{MRI}_{AnyJoinAlg}} \le \frac{4\sqrt{|S||T|/r}}{2\sqrt{x|S||T|/r}} = \frac{2}{\sqrt{x}}$
46. 1BT vs ANY join algorithm
When $x = 0.5$, the ratio is $2/\sqrt{0.5} \approx 2.83 < 3$.
Hence, compared to ANY join algorithm that assigns more than 50% of the matrix cells to reducers, the MRI of 1BT is at most 3 times the MRI of that algorithm.
48. M-Bucket-I
• From the previous slide, we see that instead of covering the entire matrix, mapping only smaller regions of it would yield a better MRI.
• Ideally, we would map only the cells satisfying the join condition, but that cannot be done without knowing the input statistics and/or the join condition.
• M-Bucket-I exploits input statistics to improve over the 1-Bucket-Theta algorithm.
51. M-Bucket-I
[ Step 1 ] Approximate Equi-Depth Histograms (a sketch follows below)
1) With probability n/|S|, sample approximately n records from S
2) Build k-quantiles (k buckets) from the sample, where k < n
3) Iterate through S and count the number of records in each bucket
4) Do the same for T, and build the join matrix accordingly
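A minimal sketch of this step (our own Python, not the paper's implementation; assumes the sample is non-empty):

# Sample the dataset, take sample quantiles as bucket boundaries, then
# count how many records of the full dataset fall into each bucket.
import bisect
import random

def equi_depth_histogram(records, n_sample, k):
    p = n_sample / len(records)
    sample = sorted(x for x in records if random.random() < p)
    # bucket boundaries = k-quantiles of the sample (k - 1 cut points)
    boundaries = [sample[(i * len(sample)) // k] for i in range(1, k)]
    counts = [0] * k
    for x in records:                    # one full pass over the data
        counts[bisect.bisect_right(boundaries, x)] += 1
    return boundaries, counts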
57. M-Bucket-I
[ Step 1 ] Approximate Equi-Depth Histograms
[ Figure: a 10 x 10 join matrix (S on the columns, T on the rows) for the join condition S.value = T.value. The histogram bucket boundaries fall after positions 2, 3, and 9 on the S axis and 1, 5, and 8 on the T axis, carving the matrix into buckets. ]
59. M-Bucket-I
[ Step 1 ] Approximate Equi-Depth Histograms
Only the matrix buckets whose S and T value ranges overlap can contain matching tuples. We now have candidate cells. How do we map these cells to reducers?
60. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
Algorithm: M-Bucket-I
Input: maxInput, r, M
1: row = 0
2: while row < M.noOfRows do
3:   (row, r) = CoverSubMatrix(row, maxInput, r, M)
4:   if r < 0 then
5:     return false
6: return true
61. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
Algorithm: CoverSubMatrix
Input: row_s, maxInput, r, M
1: maxScore = -1, rUsed = 0
2: for i = 1 to maxInput - 1 do
3:   R_i = CoverRows(row_s, row_s + i, maxInput, M)
4:   area = totalCandidateArea(row_s, row_s + i, M)
5:   score = area / R_i.size
6:   if score >= maxScore then
7:     bestRow = row_s + i
8:     rUsed = R_i.size
9: r = r - rUsed
10: return (bestRow + 1, r)
62. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
Algorithm: CoverRows
Input: row_f, row_l, maxInput, M
1: Regions = ∅; r = newRegion()
2: for all c_i in M.getColumns do
3:   if r.cap < c_i.candidateInputCosts then
4:     Regions = Regions ∪ r
5:     r = newRegion()
6:   r.Cells = r.Cells ∪ c_i.candidateCells
7: return Regions
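A simplified, runnable sketch of the row-covering idea (our own Python; it replaces the per-column candidateInputCosts bookkeeping with a plain "h rows + w columns" input-cost model):

# Pack the candidate cells of rows [row_f, row_l) into regions, scanning
# columns left to right and closing a region whenever adding the next
# column would exceed max_input. Assumes max_input > (row_l - row_f) + 1.
def cover_rows(M, row_f, row_l, max_input):
    h = row_l - row_f
    regions, cur_cols = [], []
    for col in range(len(M[0])):
        if not any(M[row][col] for row in range(row_f, row_l)):
            continue                       # no candidate cells in this column
        if h + len(cur_cols) + 1 > max_input and cur_cols:
            regions.append(cur_cols)       # region is full; start a new one
            cur_cols = []
        cur_cols.append(col)
    if cur_cols:
        regions.append(cur_cols)
    return regions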
68. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
Run the algorithm with r = 6 and maxInput = 5. Scores (candidate area / number of regions) for covers ending at each row:
cover rows 0..0: score = 4/1 = 4
cover rows 0..1: score = 13/3 = 4.33
cover rows 0..2: score = 22/4 = 5.5
cover rows 0..3: score = 31/7 = 4.43
We choose the cover with the highest score, here the one ending at row 2.
[ Figure: the four candidate covers (1)-(4) ]
69. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
The next pass starts at row 3 (score = 3), and so on until all candidate rows are covered.
[ Figure: the cover after the second pass ]
70. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
Run the algorithm with r = 6 and maxInput = 5. Final mapping!
[ Figure: the final cover uses 13 regions, labeled (1) through (13) ]
71. M-Bucket-I
[ Step 2 ] M-Bucket-I Algorithm
However, with maxInput = 5 we have mapped the candidate cells to 13 regions, i.e., more than r = 6 reducers.
We binary-search on maxInput until we find a mapping that uses at most r reducers.
[ Figure: the 13-region cover from the previous slide ]
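A minimal sketch of that outer search (our own Python; `cover` is a hypothetical helper standing in for the M-Bucket-I pass above, and we assume its region count is non-increasing in maxInput):

# Binary-search the smallest maxInput whose cover fits into r regions.
def find_max_input(M, r, cover):
    lo, hi = 2, len(M) + len(M[0])       # one region never needs more than |S| + |T| inputs
    while lo < hi:
        mid = (lo + hi) // 2
        if len(cover(M, mid)) <= r:      # fits: try a smaller input budget
            hi = mid
        else:                            # too many regions: raise the budget
            lo = mid + 1
    return lo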