Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework
IEEE Access (Volume: 6), January 2018
Navid Kalaei
Shiraz University of Technology
Content
• Top-k Dominances
• Definition
• Q, P, and nonD
• Bitmap
• P and Q
• Algorithm
• Evaluation
• References
Top-k Dominances
What
• The k most powerful (most dominating) objects in the data
• Data may have missing values
How
• Skyband Based Algorithm
• Upper Bound Based Algorithm
• Bitmap Index Guided Algorithm
Why
• Find nominees
• Estimate incomplete data
• Recommender systems
Definition
      d1  d2  d3  d4
m1     -   1   2   -
m2     1   -   3   2
m3     3   1   -   -
m4     -   -   -   1
m5     -   2   1   -
m1 dominates m2 if:
“All of m1’s dimensions are bigger than m2’s, excluding the missing dimensions”
      d1  d2  d3  d4
m6     -   1   2   -
m7     1   -   3   2
m8     3   1   -   -
m9     -   -   -   1
m10    -   2   1   -
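The definition above can be sketched in a few lines of Python. This is only an illustration of the slide's wording (strictly bigger on every dimension that both objects have), not the paper's implementation; the function name `dominates` and the use of `None` for a missing value are assumptions made for the example.

```python
def dominates(a, b):
    """Return True if `a` dominates `b` under the slide's definition.

    Only dimensions observed in BOTH objects are compared (None = missing).
    Two objects with no shared dimension are not comparable, so the result
    is False in that case.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return bool(shared) and all(x > y for x, y in shared)


# Objects from the first table on this slide.
m1 = (None, 1, 2, None)
m2 = (1, None, 3, 2)
m4 = (None, None, None, 1)
m5 = (None, 2, 1, None)

m2_beats_m5 = dominates(m2, m5)   # shared dimension d3: 3 > 1
m1_beats_m5 = dominates(m1, m5)   # d2: 1 < 2, d3: 2 > 1, so neither dominates
m1_beats_m4 = dominates(m1, m4)   # no shared dimension
```

With the table's values, m2 dominates m5, m1 and m5 do not dominate each other, and m1 and m4 share no dimension at all.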
Q, P, and nonD
• Q: objects not better than m
• nonD: objects not dominated by m
• P: objects strictly worse than m
• Ø: objects not comparable to m
[Figure: set diagram with the regions Q, nonD, Ø, and P]
Bitmap
      d1   -  1  2  3    d2   -  1  2  3    d3   -  1  2  3
m1     -   0  0  0  0     1   0  1  1  1     2   0  0  1  1
m2     1   0  1  1  1     -   0  0  0  0     3   0  0  0  1
m3     3   0  0  0  1     1   0  1  1  1     -   0  0  0  0
m4     -   0  0  0  0     2   0  0  1  1     1   0  1  1  1
m5     1   0  1  1  1     1   0  1  1  1     -   0  0  0  0
m6     -   0  0  0  0     3   0  0  0  1     2   0  0  1  1
m7     2   0  0  1  1     -   0  0  0  0     2   0  0  1  1
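The encoding in this table can be reproduced with a short Python sketch. It follows the rule visible in the rows: the bit positions are the missing symbol followed by the sorted unique values, a missing value becomes all zeros, and an observed value becomes ones from its own position to the end. The function name `encode_dimension` and the use of `None` for a missing value are assumptions made for the example.

```python
def encode_dimension(values):
    """Encode one dimension (one column of the data) as bitmap rows.

    Returns (domain, rows): `domain` is the sorted list of observed values,
    and each row has len(domain) + 1 bits, with slot 0 reserved for the
    missing symbol.
    """
    domain = sorted({v for v in values if v is not None})
    width = len(domain) + 1
    rows = []
    for v in values:
        if v is None:
            rows.append([0] * width)                 # missing: all zeros
        else:
            pos = domain.index(v) + 1                # skip the missing slot
            rows.append([0] * pos + [1] * (width - pos))
    return domain, rows


# Column d2 of the table, for m1..m7.
d2 = [1, None, 1, 2, 1, 3, None]
domain, rows = encode_dimension(d2)
```

Because each dimension is encoded independently, one call per dimension can run in a separate mapper.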
P and Q
      d1   -  1  2  3    d2   -  1  2  3    d3   -  1  2  3
m1     -   0  0  0  0     1   0  1  1  1     2   0  0  1  1
m2     1   0  1  1  1     -   0  0  0  0     3   0  0  0  1
m3     3   0  0  0  1     1   0  1  1  1     -   0  0  0  0
m3 P1 Q1 P2 Q2
m1 0 0 0 0 0 0 0 1
m2 0 1 1 1 1 1 0 0
Algorithm
1) MapReduce splits the data and sends each dimension to a mapper
2) The mapper maps each dimension to its equivalent bitmap
3) The mapper computes the sets P, Q, and nonD from the bitmaps
4) MapReduce pipes the Ps, Qs, and nonDs to the reducers
5) The reducer assigns the bitwise AND of the Ps to P*
6) The reducer assigns the bitwise AND of the Qs to Q*
7) MapReduce computes and stores each element’s score
8) MapReduce sorts the scores
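Steps 3, 5, and 6 can be sketched in plain Python, without a real MapReduce runtime. This is a rough illustration under stated assumptions, not the paper's code: the per-dimension sets are computed by direct comparison instead of bitmap lookups, `map_dimension` and `reduce_target` are hypothetical names, and an object whose value is missing in a dimension is given bit 1 in both P and Q so that it never constrains the AND (the slides do not spell this case out). The score in step 7 is left out because its formula is not given here.

```python
from functools import reduce


def map_dimension(column, target):
    """Mapper sketch: per-dimension (P_i, Q_i) bit lists for one target object.

    P_i marks objects strictly worse than the target in this dimension,
    Q_i marks objects not better than it. The target itself is skipped,
    so each list has len(column) - 1 bits.
    """
    t = column[target]
    p_bits, q_bits = [], []
    for idx, v in enumerate(column):
        if idx == target:
            continue
        if v is None:                      # assumption: missing never constrains
            p_bits.append(1)
            q_bits.append(1)
        else:
            p_bits.append(int(v < t))
            q_bits.append(int(v <= t))
    return p_bits, q_bits


def bitwise_and(a, b):
    return [x & y for x, y in zip(a, b)]


def reduce_target(columns, target):
    """Reducer sketch: AND the per-dimension lists into P* and Q*."""
    ones = [1] * (len(columns[0]) - 1)
    # Dimensions where the target itself is missing are skipped.
    pairs = [map_dimension(c, target) for c in columns if c[target] is not None]
    p_star = reduce(bitwise_and, (p for p, _ in pairs), ones)
    q_star = reduce(bitwise_and, (q for _, q in pairs), ones)
    return p_star, q_star


# Columns d1, d2, d3 for m1, m2, m3 from the "P and Q" slide.
columns = [[None, 1, 3], [1, None, 1], [2, 3, None]]
p_star, q_star = reduce_target(columns, target=2)   # target m3
```

For m3, dimension d3 is skipped because m3 has no value there, and each result has one bit per other object (m1, m2).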
Pseudo Code
Overview
Evaluation
Name   No. of Users   No. of Movies   BIG (min)   MRBIG (min)   PR
100K          1,000           1,700        0.42          0.43   0.97
1M            6,040           3,706        13.4            15   0.89
10M          71,000          11,000        1540          1440   1.07
20M         138,000          26,000       18500         15500   1.19
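The PR column is not defined on the slide. The numbers are consistent with PR being the BIG running time divided by the MRBIG running time, so that a value above 1 means MRBIG was faster. The short check below assumes that reading; the name `performance_ratio` is made up for the example.

```python
def performance_ratio(big_minutes, mrbig_minutes):
    """Assumed meaning of PR: BIG time divided by MRBIG time."""
    return big_minutes / mrbig_minutes


# (dataset, BIG minutes, MRBIG minutes, PR as reported on the slide)
results = [
    ("100K", 0.42, 0.43, 0.97),
    ("1M", 13.4, 15, 0.89),
    ("10M", 1540, 1440, 1.07),
    ("20M", 18500, 15500, 1.19),
]

# Largest gap between the recomputed ratio and the reported PR.
max_gap = max(abs(performance_ratio(b, m) - pr) for _, b, m, pr in results)
```

Under this reading, every recomputed ratio is within 0.01 of the reported value, and MRBIG only overtakes BIG on the two larger datasets.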
Evaluation
References
• Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework [link]
• Top-k dominating queries on incomplete data [link]


Editor's Notes

  • #4 The power is defined by a function. Missing values are random.
  • #5 Row = movie, column = user, value = rating. M2 > M5, M1 = M5, M1 ? M4.
  • #6 Q -> nonD -> P -> Ø. Pure score = Q - nonD - Ø.
  • #7 Find the unique values of each dimension. Sort them, including the missing symbol. For a missing value: all zeros; for an observed value: ones from its position to the end. Dimensions are independent and can be computed in parallel. Mapper 1 = flat map.
  • #8 The object itself is not included. D3 is not included since it is missing. Len(Ps and Qs) = len(objects) - 1.
  • #13 A: dimension = 3500. B: object = 6000.