Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework

Incomplete data is a major kind of multi-dimensional dataset in which values are missing at random across the dimensions. Retrieving information from such a dataset becomes very difficult as it grows large, and finding the top-k dominant values in it is a challenging procedure. This presentation introduces a novel approach based on MapReduce and a bitmap indexing algorithm, proposed by students of the University of Nevada. Paper: https://ieeexplore.ieee.org/abstract/document/8267252

Finding Top-k Dominance on
Incomplete Big Data Using MapReduce
Framework
IEEE Access (Volume: 6), January 2018
Navid Kalaei
Shiraz University of Technology

Content
• Top-k Dominances
• Definition
• Q, P, and nonD
• Bitmap
• P and Q
• Algorithm
• Evaluation
• References

Top-k Dominances
What
• The k most dominant ("most powerful") objects in the data
• Data may have missing values
How
• Skyband-based algorithm
• Upper-bound-based algorithm
• Bitmap-index-guided algorithm
Why
• Find nominees
• Estimate incomplete data
• Recommender systems

Definition
      d1  d2  d3  d4
m1    -   1   2   -
m2    1   -   3   2
m3    3   1   -   -
m4    -   -   -   1
m5    -   2   1   -
m1 dominates m2 if:
“All of m1’s dimensions are greater than m2’s, excluding the missing dimensions”
      d1  d2  d3  d4
m6    -   1   2   -
m7    1   -   3   2
m8    3   1   -   -
m9    -   -   -   1
m10   -   2   1   -
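
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the name `dominates`, the use of `None` for a missing value, and the rule that two objects with no shared dimension are not comparable are all assumptions of the sketch. It follows the slide's wording literally (strictly greater on every dimension observed in both objects).

```python
from typing import Optional, Sequence

Row = Sequence[Optional[int]]

# First table from the slide; None stands for a missing value ("-").
MOVIES = {
    "m1": (None, 1, 2, None),
    "m2": (1, None, 3, 2),
    "m3": (3, 1, None, None),
    "m4": (None, None, None, 1),
    "m5": (None, 2, 1, None),
}


def dominates(a: Row, b: Row) -> bool:
    """True if a is strictly greater than b on every dimension observed in both."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    # Assumption: objects that share no observed dimension are not comparable.
    return bool(shared) and all(x > y for x, y in shared)


print(dominates(MOVIES["m2"], MOVIES["m5"]))  # only d3 is shared: 3 > 1
print(dominates(MOVIES["m1"], MOVIES["m5"]))  # d2: 1 < 2, so no dominance
print(dominates(MOVIES["m1"], MOVIES["m4"]))  # no shared dimension
```

Under these assumptions m2 dominates m5, neither of m1 and m5 dominates the other, and m1 and m4 cannot be compared, which agrees with the relations listed in the editor's notes.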

Q, P, and nonD
• Q: not better than m
• nonD: not dominated by m
• P: strictly worse than m
• Ø: not comparable to m
[Diagram: the sets Q, nonD, Ø, and P]
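
As a rough illustration of these four sets, the sketch below classifies every other object relative to a target m by direct comparison. The function name `classify`, the key `incomparable` (standing for Ø), and the treatment of missing values are assumptions: an object counts as "not better" or "strictly worse" only on the dimensions it shares with m, so an object sharing no dimension with m lands in Q and P as well as in Ø, and nonD is taken to be the part of Q that is not in P.

```python
from typing import Dict, Optional, Set, Tuple

Row = Tuple[Optional[int], ...]

MOVIES: Dict[str, Row] = {
    "m1": (None, 1, 2, None),
    "m2": (1, None, 3, 2),
    "m3": (3, 1, None, None),
    "m4": (None, None, None, 1),
    "m5": (None, 2, 1, None),
}


def classify(target: str, data: Dict[str, Row]) -> Dict[str, Set[str]]:
    """Split the other objects into Q, P, nonD and the incomparable set."""
    m = data[target]
    sets: Dict[str, Set[str]] = {"Q": set(), "P": set(), "nonD": set(), "incomparable": set()}
    for name, row in data.items():
        if name == target:
            continue  # the object itself is not included
        shared = [(a, b) for a, b in zip(m, row) if a is not None and b is not None]
        if not shared:
            sets["incomparable"].add(name)
        if all(b <= a for a, b in shared):      # not better than m
            sets["Q"].add(name)
            if all(b < a for a, b in shared):   # strictly worse than m
                sets["P"].add(name)
            else:                               # tied somewhere: not dominated
                sets["nonD"].add(name)
    return sets


s = classify("m3", MOVIES)
score = len(s["Q"]) - len(s["nonD"]) - len(s["incomparable"])
print(s, score)
```

For m3 this gives Q = {m1, m2, m4}, nonD = {m1}, Ø = {m4}, so the score formula from the editor's notes (Q minus nonD minus Ø) yields 1: m3 dominates only m2.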

Bitmap
      d1  -  1  2  3    d2  -  1  2  3    d3  -  1  2  3
m1    -   0  0  0  0    1   0  1  1  1    2   0  0  1  1
m2    1   0  1  1  1    -   0  0  0  0    3   0  0  0  1
m3    3   0  0  0  1    1   0  1  1  1    -   0  0  0  0
m4    -   0  0  0  0    2   0  0  1  1    1   0  1  1  1
m5    1   0  1  1  1    1   0  1  1  1    -   0  0  0  0
m6    -   0  0  0  0    3   0  0  0  1    2   0  0  1  1
m7    2   0  0  1  1    -   0  0  0  0    2   0  0  1  1
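
The encoding in this table can be reproduced with a short sketch. It follows the rule given in the editor's notes (sort the distinct values with the missing symbol first; a missing value becomes all zeros; an observed value becomes ones from its own column to the end). The function name `encode_dimension` and the use of `None` for "-" are assumptions of the sketch.

```python
from typing import List, Optional, Tuple


def encode_dimension(
    column: List[Optional[int]],
) -> Tuple[List[Optional[int]], List[List[int]]]:
    """Return the column header and one bit row per object for a single dimension."""
    values = sorted({v for v in column if v is not None})
    header: List[Optional[int]] = [None] + values  # None is the missing symbol "-"
    rows = []
    for v in column:
        if v is None:
            rows.append([0] * len(header))  # missing value: all zeros
        else:
            # 0 under "-", then 1 for every column value u with v <= u
            rows.append([0] + [1 if v <= u else 0 for u in values])
    return header, rows


# Column d1 of the slide, objects m1 to m7.
d1 = [None, 1, 3, None, 1, None, 2]
header, rows = encode_dimension(d1)
for bits in rows:
    print(bits)
```

Because each dimension is encoded independently, one call per dimension can run in parallel, which is what makes the step a natural fit for a mapper.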

P and Q
      d1  -  1  2  3    d2  -  1  2  3    d3  -  1  2  3
m1    -   0  0  0  0    1   0  1  1  1    2   0  0  1  1
m2    1   0  1  1  1    -   0  0  0  0    3   0  0  0  1
m3    3   0  0  0  1    1   0  1  1  1    -   0  0  0  0
m3 P1 Q1 P2 Q2
m1 0 0 0 0 0 0 0 1
m2 0 1 1 1 1 1 0 0
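
One way to read this slide is that, for the target m3, each Q_i is the bitmap column under m3's own value in dimension i, and each P_i is the column immediately to its left. The sketch below applies that reading to the three-object bitmap; the reading itself, the function name `p_and_q`, and the dictionary layout are assumptions, not something the slide states. Dimension d3 is skipped because m3 has no value there.

```python
from typing import Dict, List, Tuple

HEADER = ["-", 1, 2, 3]

# Bit rows copied from the slide for dimensions d1 and d2.
BITMAP: Dict[str, Dict[str, List[int]]] = {
    "d1": {"m1": [0, 0, 0, 0], "m2": [0, 1, 1, 1], "m3": [0, 0, 0, 1]},
    "d2": {"m1": [0, 1, 1, 1], "m2": [0, 0, 0, 0], "m3": [0, 1, 1, 1]},
}

# Observed values of the target m3 (d3 is missing and therefore left out).
M3_VALUES = {"d1": 3, "d2": 1}


def p_and_q(dim: str, value: int, target: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Read P_i and Q_i for one dimension as two adjacent bitmap columns."""
    col = HEADER.index(value)
    others = {name: bits for name, bits in BITMAP[dim].items() if name != target}
    q = {name: bits[col] for name, bits in others.items()}      # value <= target's
    p = {name: bits[col - 1] for name, bits in others.items()}  # value < target's
    return p, q


for dim, value in M3_VALUES.items():
    p, q = p_and_q(dim, value, "m3")
    print(dim, "P:", p, "Q:", q)
```

Under this reading the values (P1, Q1, P2, Q2) come out as (0, 0, 0, 1) for m1 and (1, 1, 0, 0) for m2, which agrees with the last four values of the m1 and m2 rows in the small table above. The all-zero "-" column is what makes P empty when the target holds the smallest value of a dimension.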

Algorithm
1) MapReduce splits the data and sends each dimension to the mappers
2) Each mapper maps its dimension to the equivalent bitmap
3) Each mapper computes the sets P, Q, and nonD from the bitmaps
4) MapReduce pipes the Ps, Qs, and nonDs to the reducers
5) Each reducer assigns the bitwise AND of the Ps to P*
6) Each reducer assigns the bitwise AND of the Qs to Q*
7) MapReduce computes and stores each element’s score
8) MapReduce sorts the scores
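
The eight steps can be simulated in a single Python process. This is a sketch under stated assumptions, not the paper's implementation: `mapper`, `reducer`, and `rank` are hypothetical names, the shuffle is a plain dictionary, and P_i and Q_i are computed by direct comparison rather than by reading bitmap columns. Two further assumptions matter for the result: objects missing a dimension stay set in that dimension's masks (so the AND in steps 5 and 6 does not discard them), and the score is |Q*| - |nonD| - |Ø| with nonD taken as Q* without P*, following the formula in the editor's notes.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Row = Sequence[Optional[int]]

MOVIES: List[Row] = [
    (None, 1, 2, None),     # m1
    (1, None, 3, 2),        # m2
    (3, 1, None, None),     # m3
    (None, None, None, 1),  # m4
    (None, 2, 1, None),     # m5
]


def popcount(mask: int) -> int:
    return bin(mask).count("1")


def mapper(column: List[Optional[int]]) -> Iterator[Tuple[int, Tuple[int, int]]]:
    """Steps 2-3: for one dimension, emit (object index, (P_i, Q_i)) as bitmasks."""
    # Assumption: objects missing this dimension stay set in both masks.
    missing = sum(1 << j for j, w in enumerate(column) if w is None)
    for o, v in enumerate(column):
        if v is None:
            continue  # a dimension missing in o contributes nothing for o
        p = sum(1 << j for j, w in enumerate(column) if w is not None and w < v)
        q = sum(1 << j for j, w in enumerate(column) if w is not None and w <= v)
        yield o, (p | missing, q | missing)


def reducer(o: int, n: int, pairs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Steps 5-6: AND the per-dimension masks into P* and Q*."""
    p_star = q_star = ((1 << n) - 1) & ~(1 << o)  # the object itself is excluded
    for p, q in pairs:
        p_star &= p
        q_star &= q
    return p_star, q_star


def rank(rows: List[Row]) -> List[Tuple[int, int]]:
    """Return (object index, score) pairs sorted by descending score."""
    n = len(rows)
    shuffled: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for d in range(len(rows[0])):                       # step 1: split by dimension
        for o, pq in mapper([row[d] for row in rows]):
            shuffled[o].append(pq)                      # step 4: group by object
    scores = []
    for o in range(n):
        p_star, q_star = reducer(o, n, shuffled[o])
        non_d = q_star & ~p_star
        incomparable = sum(
            1 << j
            for j in range(n)
            if j != o
            and not any(a is not None and b is not None for a, b in zip(rows[o], rows[j]))
        )
        # step 7: score = |Q*| - |nonD| - |Ø|
        scores.append((o, popcount(q_star) - popcount(non_d) - popcount(incomparable)))
    return sorted(scores, key=lambda s: -s[1])          # step 8


print(rank(MOVIES)[:2])  # top-2 objects as (index, score)
```

On the five-movie table from the Definition slide this ranks m2 first with a score of 3, followed by m3 and m5 with a score of 1 each. In a real MapReduce job the `shuffled` dictionary would be replaced by the framework's shuffle phase, with the object index as the key.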

Evaluation
Name   No. of Users   No. of Movies   BIG (min)   MRBIG (min)   PR
100K   1,000          1,700           0.42        0.43          0.97
1M     6,040          3,706           13.4        15            0.89
10M    71,000         11,000          1540        1440          1.07
20M    138,000        26,000          18500       15500         1.19
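
The slide does not define the PR column. Assuming it is the BIG running time divided by the MRBIG running time, the check below recomputes it from the two time columns; the name `speed_ratio` and that interpretation are assumptions, not statements from the slides.

```python
# (dataset, BIG minutes, MRBIG minutes, PR as printed in the table)
RUNS = [
    ("100K", 0.42, 0.43, 0.97),
    ("1M", 13.4, 15.0, 0.89),
    ("10M", 1540.0, 1440.0, 1.07),
    ("20M", 18500.0, 15500.0, 1.19),
]


def speed_ratio(big_minutes: float, mrbig_minutes: float) -> float:
    """Assumed meaning of PR: values above 1 mean MRBIG finished faster than BIG."""
    return big_minutes / mrbig_minutes


for name, big, mrbig, pr in RUNS:
    print(f"{name}: BIG/MRBIG = {speed_ratio(big, mrbig):.3f} (table PR = {pr})")
```

Under that assumption the recomputed ratios stay within 0.01 of the printed PR values, and MRBIG only overtakes BIG on the two largest datasets.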

References
• Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework [link]
• Top-k dominating queries on incomplete data [link]

Editor's Notes

The power is defined by a function. Missing values are random.
Row = movie, column = user, value = rating. M2 > M5, M1 = M5, M1 ? M4.
Q -> nonD -> P -> Ø. Pure score = Q - nonD - Ø.
Find the unique values of each dimension. Sort them, including the missing symbol. For a missing value: all zeros; for an observed value: ones from the value's position to the end. Dimensions are independent and can be computed in parallel. Mapper 1 = flat map.
The object itself is not included. D3 is not included since it is missing. Len(Ps and Qs) = len(objects) - 1.
A: dimension = 3500. B: object = 6000.
