Three-way join in one round on Hadoop
COMP 6231 
GROUP 7 
IRAJ HEDAYATISOMARIN, ZAKARIA NASERELDINE, J INYANG DU
Problem statement 
R ⋈ S ⋈ T
In this part of the second project, we aimed to compute a three-way join in a single round of MapReduce.
[Figure: relations R, S, and T, with the result labelled "R join S join T"]
Algorithm Overview 
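First relation: R(a, b)
Second relation: S(b, c)
Third relation: T(c, d)

The mapper reads each tuple together with its relation name, that is R,(a,b), S,(b,c), or T,(c,d), and computes h(b)=x and h(c)=y.
It emits <KEY, VALUE> = <(x, y), (relation_name, tuple)>, where (x, y) is the coordinate of a reducer in an imagined matrix of reducers.
Each reducer then performs an in-memory join on the tuples it receives.

[Figure: 4 × 4 matrix of reducers numbered 0 to 15, with axes labelled x and y]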
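Mapping and Hashing
<KEY, VALUE> = <(x, y), (relation_name, tuple)>
relation_name: fetched from the input file name
tuple: the input tuple, exactly the same as in the input

With h(b)=x and h(c)=y, and coordinates running from 1 to 11, each input tuple is sent to the following reducers:
First relation, R: (h(b),1), (h(b),2), …, (h(b),11)
Second relation, S: (h(b),h(c))
Third relation, T: (1,h(c)), (2,h(c)), …, (11,h(c))

Reducer # = (x − 1) × (number of reducers per dimension) + y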
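The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's Hadoop code: the hash function `h` (a CRC32 reduced to the range 1..11), the function names `map_tuple` and `reducer_number`, and the constant `GRID` are all hypothetical, and the reducer number uses the reducers-per-dimension reading of the formula.

```python
import zlib
from typing import Iterator, Tuple

GRID = 11  # assumed reducers per dimension (11 x 11 = 121 reducers)


def h(value, grid: int = GRID) -> int:
    """Hypothetical hash: map a join-attribute value to a coordinate in 1..grid."""
    return zlib.crc32(str(value).encode("utf-8")) % grid + 1


def map_tuple(relation: str, tup: Tuple, grid: int = GRID) -> Iterator[tuple]:
    """Yield ((x, y), (relation, tup)) pairs for one input tuple."""
    if relation == "R":    # R(a, b): x = h(b) is fixed, replicate over every y
        x = h(tup[1], grid)
        for y in range(1, grid + 1):
            yield (x, y), (relation, tup)
    elif relation == "S":  # S(b, c): both coordinates are known, one reducer
        yield (h(tup[0], grid), h(tup[1], grid)), (relation, tup)
    elif relation == "T":  # T(c, d): y = h(c) is fixed, replicate over every x
        y = h(tup[0], grid)
        for x in range(1, grid + 1):
            yield (x, y), (relation, tup)
    else:
        raise ValueError(f"unknown relation: {relation}")


def reducer_number(x: int, y: int, grid: int = GRID) -> int:
    """Flatten a 1-based (x, y) coordinate into a reducer number in 1..grid*grid."""
    return (x - 1) * grid + y


if __name__ == "__main__":
    for key, value in map_tuple("S", (2, 3)):
        print(key, reducer_number(*key), value)
```

An R tuple and a T tuple that join with the same S tuple always meet it at the single reducer (h(b), h(c)), which is why one round is enough.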
In-memory join algorithm 
NESTED LOOP JOIN: O(n³)
For each tuple in R
    For each tuple in S
        If R.b == S.b then
            For each tuple in T
                If S.c == T.c then
                    Print (R.a, S.b, S.c, T.d)

SORT-BASED JOIN ALGORITHM
1. Divide the input list into three sorted lists, one per relation, using the binary search algorithm: O(n log n)
2. Execute the in-memory join algorithm:
• WHILE R and S are both non-empty DO
  • IF the first items in both lists are equal THEN
    • join all the tuples that share this value, then remove them from the lists
  • ELSE
    • take the list whose first item is smaller and remove items from it until reaching an item equal to or greater than the first item of the other list

Complexity of the sort-based join:
1. Divide list: O(n log n)
2. In-memory join:
   1. R ⋈ S = O(n)
   2. RS ⋈ T = O(n)
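The two in-memory strategies can be compared with a small Python sketch. It is a minimal illustration rather than the reducer code from the project: the names `merge_join`, `three_way_join`, and `nested_loop_join` are hypothetical, tuples are assumed to be Python tuples, and the lists are sorted with the built-in `sorted` instead of the binary-search insertion mentioned on the slide.

```python
def merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1          # skip left items smaller than the right front item
        elif lk > rk:
            j += 1          # skip right items smaller than the left front item
        else:
            # Find the run of tuples sharing this key on each side.
            i_end = i
            while i_end < len(left) and left_key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Join every pair in the two runs, then move past both runs.
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append(l + r)
            i, j = i_end, j_end
    return out


def three_way_join(R, S, T):
    """R(a, b) join S(b, c) join T(c, d) via two merge joins."""
    rs = merge_join(R, S, lambda r: r[1], lambda s: s[0])    # (a, b, b, c)
    rst = merge_join(rs, T, lambda x: x[3], lambda t: t[0])  # (a, b, b, c, c, d)
    return [(a, b, c, d) for (a, b, _, c, _, d) in rst]


def nested_loop_join(R, S, T):
    """Reference implementation of the nested loop join."""
    return [(a, b, c, d)
            for (a, b) in R
            for (b2, c) in S if b == b2
            for (c2, d) in T if c == c2]


if __name__ == "__main__":
    R = [(1, 10), (2, 10), (3, 20)]
    S = [(10, 100), (20, 200), (30, 300)]
    T = [(100, "x"), (100, "y"), (200, "z")]
    print(sorted(three_way_join(R, S, T)))
```

Both functions return the same set of result tuples; the merge-based version avoids scanning all of S for every tuple of R and all of T for every matching pair.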
Number of reducers 
We decided to arrange the reducers in a square matrix. This choice constrains the number of reducers that can be used: in our case, 128 reducers were available, but we actually used only 121 of them (11 × 11).
On the other hand, selecting a different number of reducers in each dimension leads to data replication and inefficiency problems.
Number of reducers (example 1, replication problem)
[Figure: non-square matrix of reducers with columns numbered 1 to 16]
# of reducers=128 
Assumption: R>>T 
Both of them have uniform distribution 
T(R) = 1,000,000 
T(T) = 1,000 
For square matrix: 
Replicated data=1,000,000*11+1,000*11=11,011,000 
For above matrix: 
Replicated data=1,000,000*16+1,000*16=16,016,000
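The arithmetic on this slide can be reproduced with a short Python sketch. The function name `replicated_tuples` and its parameters are hypothetical; the replication factors (11 and 16) are taken directly from the slide, and the two factors are kept as separate parameters only so that other matrix shapes can be tried.

```python
def replicated_tuples(size_r: int, size_t: int, copies_r: int, copies_t: int) -> int:
    """Total tuples shipped when each R tuple is sent to copies_r reducers
    and each T tuple is sent to copies_t reducers."""
    return size_r * copies_r + size_t * copies_t


# Sizes from the slide: T(R) = 1,000,000 and T(T) = 1,000.
square = replicated_tuples(1_000_000, 1_000, 11, 11)  # square matrix
wide = replicated_tuples(1_000_000, 1_000, 16, 16)    # matrix shown on the slide

if __name__ == "__main__":
    print(f"square matrix: {square:,}")
    print(f"slide matrix:  {wide:,}")
    print(f"extra tuples:  {wide - square:,}")
```

Under these figures the non-square layout ships about five million more tuples than the square one, almost all of them copies of the larger relation R.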
Number of reducers (example 1, inefficiency problem)
[Figure: non-square matrix of reducers with columns numbered 1 to 16; some reducers are marked IDLE and others FULL]
# of reducers=128 
Assumption: T>>R 
T is not uniformly distributed 
T(R) = 1,000 
T(T) = 1,000,000 
When the range of the hash function is reduced, it is more likely that two values hash to the same location.
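The effect of a smaller hash range can be illustrated with a toy Python example. Everything in it is an assumption made for illustration: the modulo hash, the function name `bucket_loads`, and the skewed input (44 values that are all multiples of 4) are not taken from the project's data.

```python
from collections import Counter


def bucket_loads(values, buckets: int):
    """Count how many values land in each bucket under a simple modulo hash."""
    counts = Counter(v % buckets for v in values)
    return [counts.get(b, 0) for b in range(buckets)]


# Hypothetical skewed join-attribute values: every value is a multiple of 4.
values = [4 * k for k in range(44)]

narrow = bucket_loads(values, 4)   # small range: one bucket is FULL, three are IDLE
wide = bucket_loads(values, 11)    # larger range: the load spreads evenly

if __name__ == "__main__":
    print("range 4: ", narrow)
    print("range 11:", wide)
```

With a range of 4, all 44 values collide in a single bucket while the other three stay idle; with a range of 11, every bucket receives 4 values.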
Experimental results 
37 seconds
Any questions?
