Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework

IEEE Access (Volume: 6), January 2018
Navid Kalaei
Shiraz University of Technology

Content
• Top-k Dominances
• Definition
• Q, P, and nonD
• Bitmap
• P and Q
• Algorithm
• Evaluation
• References

Top-k Dominances
What
• The k most powerful (most dominant) data objects
• Data may have missing values
How
• Skyband Based Algorithm
• Upper Bound Based Algorithm
• Bitmap Index Guided Algorithm
Why
• Find nominees
• Estimate incomplete data
• Recommender systems

Definition
     d1  d2  d3  d4
m1   -   1   2   -
m2   1   -   3   2
m3   3   1   -   -
m4   -   -   -   1
m5   -   2   1   -
m1 dominates m2 if:
“Every dimension of m1 is bigger than the corresponding dimension of m2, excluding the missing dimensions”
     d1  d2  d3  d4
m6   -   1   2   -
m7   1   -   3   2
m8   3   1   -   -
m9   -   -   -   1
m10  -   2   1   -
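The rule above can be sketched in a few lines of Python. This is a minimal illustration that takes the slide's wording literally (strictly bigger on every dimension observed in both objects). It assumes that `None` stands for a missing value and that two objects with no shared dimension are not comparable. The names `dominates` and `top_k_dominating` are hypothetical, and the scoring is plain brute force, not the bitmap method.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[Optional[int]]  # None marks a missing value

def dominates(a: Row, b: Row) -> bool:
    """True if a is bigger than b on every dimension observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    # Assumption: rows with no shared dimension are not comparable.
    return bool(shared) and all(x > y for x, y in shared)

def top_k_dominating(data: Dict[str, Row], k: int) -> List[Tuple[str, int]]:
    """Rank objects by how many other objects they dominate (brute force)."""
    scores = {
        name: sum(dominates(row, other) for o, other in data.items() if o != name)
        for name, row in data.items()
    }
    # Highest score first; ties are broken by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

# The five objects from the first table above.
movies = {
    "m1": (None, 1, 2, None),
    "m2": (1, None, 3, 2),
    "m3": (3, 1, None, None),
    "m4": (None, None, None, 1),
    "m5": (None, 2, 1, None),
}
print(top_k_dominating(movies, 2))
```

Under this reading, m2 dominates m5 (d3), m5 dominates m3 (d2), and m3 dominates m2 (d1), so dominance is not transitive once values are missing.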

Q, P, and nonD
• Q: not better than m
• nonD: not dominated by m
• P: strictly worse than m
• Ø: not comparable to m
[Figure: diagram of the sets Q, nonD, Ø, and P]

Bitmap
     d1  -  1  2  3   d2  -  1  2  3   d3  -  1  2  3
m1   -   0  0  0  0   1   0  1  1  1   2   0  0  1  1
m2   1   0  1  1  1   -   0  0  0  0   3   0  0  0  1
m3   3   0  0  0  1   1   0  1  1  1   -   0  0  0  0
m4   -   0  0  0  0   2   0  0  1  1   1   0  1  1  1
m5   1   0  1  1  1   1   0  1  1  1   -   0  0  0  0
m6   -   0  0  0  0   3   0  0  0  1   2   0  0  1  1
m    2   0  0  1  1   -   0  0  0  0   2   0  0  1  1
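The slide does not state the encoding rule, but the rows suggest one: each value gets a leading "-" bit that is 0 in every row shown, followed by one bit per domain value that is 1 when the object's value is less than or equal to that domain value. A missing value is all zeros. A small sketch under that reading (the function name `encode` is hypothetical):

```python
from typing import List, Optional

def encode(value: Optional[int], domain: List[int]) -> List[int]:
    """Bitmap row for one value: a leading '-' bit, then one bit per domain value."""
    if value is None:
        # Missing values appear as all zeros in the table.
        return [0] * (len(domain) + 1)
    # The '-' column is 0 in every row shown; bit j is 1 when value <= domain[j].
    return [0] + [1 if value <= v else 0 for v in domain]

domain = [1, 2, 3]
m1 = (None, 1, 2)  # first row of the table
for dim, value in enumerate(m1, start=1):
    print(f"d{dim}", encode(value, domain))
```

Run on m1, this reproduces the three bit groups of the first table row.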

P and Q
     d1  -  1  2  3   d2  -  1  2  3   d3  -  1  2  3
m1   -   0  0  0  0   1   0  1  1  1   2   0  0  1  1
m2   1   0  1  1  1   -   0  0  0  0   3   0  0  0  1
m3   3   0  0  0  1   1   0  1  1  1   -   0  0  0  0
m3 P1 Q1 P2 Q2
m1 0 0 0 0 0 0 0 1
m2 0 1 1 1 1 1 0 0
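One way to read P and Q off the bitmap, inferred from the tables and not stated on the slide: for object m and dimension i, Q_i is the bit column at m's own value (objects not better than m), and P_i is the column immediately to its left (objects strictly worse than m). The sketch below assumes that reading and the value-less-or-equal encoding. The names `encode` and `p_and_q` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

DOMAIN = [1, 2, 3]

def encode(value: Optional[int]) -> List[int]:
    """Bitmap row: leading '-' bit, then 1 where value <= domain value."""
    if value is None:
        return [0] * (len(DOMAIN) + 1)
    return [0] + [1 if value <= v else 0 for v in DOMAIN]

def p_and_q(target: str, dim: int, data: Dict[str, tuple]) -> Tuple[dict, dict]:
    """P_dim and Q_dim of `target` as {object: bit}, read from bitmap columns."""
    value = data[target][dim]
    if value is None:
        raise ValueError("target has no value in this dimension")
    q_col = DOMAIN.index(value) + 1   # column of the target's own value
    p_col = q_col - 1                 # column just to its left
    others = {name: row for name, row in data.items() if name != target}
    p = {name: encode(row[dim])[p_col] for name, row in others.items()}
    q = {name: encode(row[dim])[q_col] for name, row in others.items()}
    return p, q

data = {"m1": (None, 1, 2), "m2": (1, None, 3), "m3": (3, 1, None)}
for dim in (0, 1):
    print(f"d{dim + 1}", p_and_q("m3", dim, data))
```

For m3 this gives P1 Q1 P2 Q2 = 0 0 0 1 for m1 and 1 1 0 0 for m2, which matches the trailing four entries of the m1 and m2 rows in the table above.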

Algorithm
1) MapReduce splits the data and sends each dimension to the mappers
2) Each mapper maps its dimension to the equivalent Bitmap
3) Each mapper computes the sets P, Q, and nonD from the Bitmaps
4) MapReduce pipes the Ps, Qs, and nonDs to the reducers
5) Each reducer assigns the bitwise AND of the Ps to P*
6) Each reducer assigns the bitwise AND of the Qs to Q*
7) MapReduce computes and stores each element's score
8) MapReduce sorts the scores
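The steps above can be simulated in a single Python process to show how the pieces fit together. This is only a sketch under several assumptions, not the paper's implementation:

- Direct comparisons stand in for the bitmap column lookups.
- A value missing on either side leaves the bit at 1, so that dimension does not constrain the AND.
- The slide gives no score formula. The score here counts the members of P* that share at least one observed dimension with the object, which corresponds to the strict rule on the Definition slide.
- Q* is computed as in step 6 but is not used by this simplified score.
- All function names are hypothetical.

```python
from collections import defaultdict

# Objects from the Definition slide; None marks a missing value.
DATA = {
    "m1": (None, 1, 2, None),
    "m2": (1, None, 3, 2),
    "m3": (3, 1, None, None),
    "m4": (None, None, None, 1),
    "m5": (None, 2, 1, None),
}
NAMES = list(DATA)
NUM_DIMS = 4

def mapper(dim):
    """Steps 2-3: for one dimension, emit (object, (P_i, Q_i)) bit vectors."""
    column = [DATA[name][dim] for name in NAMES]
    for name, v in zip(NAMES, column):
        # Assumption: a value missing on either side leaves the bit at 1.
        p = [int(v is None or o is None or o < v) for o in column]
        q = [int(v is None or o is None or o <= v) for o in column]
        yield name, (p, q)

def shares_dimension(a, b):
    """True if both rows have an observed value in at least one dimension."""
    return any(x is not None and y is not None for x, y in zip(a, b))

def reducer(name, pairs):
    """Steps 5-7: AND the vectors into P* and Q*, then score the object."""
    p_star = [int(all(bits)) for bits in zip(*(p for p, _ in pairs))]
    q_star = [int(all(bits)) for bits in zip(*(q for _, q in pairs))]
    # Assumed score: members of P* that share a dimension with this object.
    score = sum(
        bit
        for other, bit in zip(NAMES, p_star)
        if other != name and shares_dimension(DATA[name], DATA[other])
    )
    return name, score, p_star, q_star

def top_k(k):
    grouped = defaultdict(list)
    for dim in range(NUM_DIMS):            # step 1: one mapper call per dimension
        for name, pair in mapper(dim):
            grouped[name].append(pair)     # step 4: shuffle by object
    scored = [reducer(name, pairs)[:2] for name, pairs in grouped.items()]
    return sorted(scored, key=lambda s: (-s[1], s[0]))[:k]   # step 8: sort

print(top_k(2))
```

In a real MapReduce job the grouping in `top_k` would be done by the framework's shuffle phase. The dictionary here only imitates it.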

Evaluation
Name   No. of Users   No. of Movies   BIG (min)   MRBIG (min)   PR
100K   1,000          1,700           0.42        0.43          0.97
1M     6,040          3,706           13.4        15            0.89
10M    71,000         11,000          1540        1440          1.07
20M    138,000        26,000          18500       15500         1.19
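The slide does not define PR. The reported values are close to the ratio of the BIG running time to the MRBIG running time, so one plausible reading (an assumption, not confirmed by the slide) is that PR measures the speed-up of MRBIG over BIG. A quick check of that reading:

```python
# (BIG minutes, MRBIG minutes, reported PR) copied from the table above.
rows = {
    "100K": (0.42, 0.43, 0.97),
    "1M": (13.4, 15, 0.89),
    "10M": (1540, 1440, 1.07),
    "20M": (18500, 15500, 1.19),
}

for name, (big, mrbig, pr) in rows.items():
    ratio = big / mrbig  # assumed meaning of PR
    print(f"{name}: BIG/MRBIG = {ratio:.3f}, reported PR = {pr}")
```

Under this reading, MRBIG is slightly slower than BIG on the two smaller datasets and faster on the two larger ones.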

References
• Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework [link]
• Top-k dominating queries on incomplete data [link]
