Three-way join in one round on Hadoop
COMP 6231 
GROUP 7 
IRAJ HEDAYATISOMARIN, ZAKARIA NASERELDINE, J INYANG DU
Problem statement 
R ⋈ S ⋈ T
In this part of the second project, we aimed to compute a three-way join in a single MapReduce round.
[Figure: diagram of relations R, S and T, labelled "R join S join T"]
Algorithm Overview 
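First relation: R(a, b)
Second relation: S(b, c)
Third relation: T(c, d)
Reducers are arranged in an imagined matrix (4 × 4 in this illustration):
 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15
Mapper: reads the tagged tuples R,(a,b), S,(b,c), T,(c,d) and computes h(b)=x and h(c)=y
<KEY, VALUE>=<(X,Y), (relation_name, tuple)>
(X,Y) is the coordinate of a reducer in the imagined matrix of reducers
Reducer: in-memory join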
Mapping and Hashing 
<KEY, VALUE>=<(X,Y), (relation_name, tuple)>
relation_name: fetched from the input file name
tuple: exactly the same as the input tuple
h(b)=x
h(c)=y
Input tuple             Key(s) emitted
First relation: R       (h(b),1), (h(b),2), …, (h(b),11)
Second relation: S      (h(b),h(c))
Third relation: T       (1,h(c)), (2,h(c)), …, (11,h(c))
Reducer # = (x − 1) × (# of reducers per row) + y
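The key assignment on this slide can be sketched in Python. This is a minimal illustration, not the group's Hadoop code: the names `h`, `map_tuple` and `reducer_number`, the constant `K = 11`, and the use of CRC32 as the hash function are all assumptions. It also assumes that the multiplier in the reducer-number formula is the number of reducers per row of the matrix, so that coordinates in 1..11 map to reducers 1..121.

```python
import zlib

K = 11  # assumed number of reducers per dimension (11 x 11 = 121)

def h(value, k=K):
    """Hypothetical hash: map a join-attribute value to a bucket in 1..k."""
    return zlib.crc32(str(value).encode("utf-8")) % k + 1

def map_tuple(relation, tup, k=K):
    """Return the ((x, y), (relation, tuple)) pairs emitted for one input tuple."""
    if relation == "R":      # R(a, b): x is known, replicate over every y
        _, b = tup
        keys = [(h(b, k), y) for y in range(1, k + 1)]
    elif relation == "S":    # S(b, c): both coordinates are known, one copy
        b, c = tup
        keys = [(h(b, k), h(c, k))]
    elif relation == "T":    # T(c, d): y is known, replicate over every x
        c, _ = tup
        keys = [(x, h(c, k)) for x in range(1, k + 1)]
    else:
        raise ValueError(f"unknown relation: {relation}")
    return [(key, (relation, tup)) for key in keys]

def reducer_number(x, y, k=K):
    """Linearise the (x, y) coordinate into a reducer index in 1..k*k."""
    return (x - 1) * k + y
```

With this scheme an S tuple is sent to exactly one reducer, and every R or T tuple that can join with it is guaranteed to have a copy at that same reducer.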
In-memory join algorithm 
NESTED LOOP JOIN
For each tuple in R
    For each tuple in S
        If R.b==S.b then
            For each tuple in T
                If S.c==T.c then
                    Print (R.a, S.b, S.c, T.d)
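The nested loop join can be written as a small runnable sketch. The function name `nested_loop_join` and the sample relations are assumptions made for illustration; tuples are represented as plain Python tuples `(a, b)`, `(b, c)` and `(c, d)`.

```python
def nested_loop_join(R, S, T):
    """Join R(a, b), S(b, c) and T(c, d) with three nested loops."""
    out = []
    for a, b_r in R:
        for b_s, c_s in S:
            if b_r == b_s:              # R.b == S.b
                for c_t, d in T:
                    if c_s == c_t:      # S.c == T.c
                        out.append((a, b_s, c_s, d))
    return out

# Illustrative data only
R = [(1, 10), (2, 20)]
S = [(10, 100), (20, 200), (30, 300)]
T = [(100, "x"), (200, "y"), (200, "z")]
print(nested_loop_join(R, S, T))
# [(1, 10, 100, 'x'), (2, 20, 200, 'y'), (2, 20, 200, 'z')]
```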
SORT-BASED JOIN ALGORITHM
1. Divide the input list into three sorted lists (one per relation) using binary search: O(n log n)
2. Execute the in-memory join algorithm:
• WHILE R and S are both non-empty DO
  • IF the first items in both lists are equal THEN
    • join all the tuples that share this value, then remove them from the lists
  • ELSE
    • take the list whose first item is smaller and remove items from it until its first item is equal to or greater than the first item of the other list
Complexity
Nested loop join: O(n^3)
Sort-based join:
1. Divide list: O(n log n)
2. In-memory join:
   1. R ⋈ S = O(n)
   2. RS ⋈ T = O(n)
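The sort-based join on this slide can be sketched as two merge joins in sequence. This is an assumption-laden illustration rather than the group's implementation: it uses Python's built-in `sorted` in place of the binary-search insertion described on the slide, advances indices instead of removing items from the lists, and the names `merge_join` and `three_way_join` are hypothetical.

```python
def merge_join(left, right, left_key, right_key):
    """Sort-merge join of two lists of tuples; returns matching (l, r) pairs."""
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1                      # skip the smaller front item
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and join all of them.
            i_end = i
            while i_end < len(left) and left_key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append((l, r))
            i, j = i_end, j_end
    return out

def three_way_join(R, S, T):
    """R(a, b) join S(b, c) on b, then join the result with T(c, d) on c."""
    rs = [(a, b, c)
          for (a, b), (_, c) in merge_join(R, S, lambda r: r[1], lambda s: s[0])]
    return [(a, b, c, d)
            for (a, b, c), (_, d) in merge_join(rs, T, lambda x: x[2], lambda t: t[0])]
```

Each merge pass is linear in the input size only when few tuples share a join value; a run of equal keys on both sides still produces every pairing within that run.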
Number of reducers 
We decided to arrange the reducers as a square matrix. This choice constrains the number of reducers we can use: in our case 128 reducers were available, but we used only 121 (11 × 11) of them.
On the other hand, choosing a different number of reducers in each dimension leads to extra data replication and inefficiency.
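As a small worked check of the 121-out-of-128 figure, the largest square matrix that fits in a given number of reducers can be computed as follows. The helper name `square_grid` is an assumption made for illustration.

```python
import math

def square_grid(available):
    """Return (side, used, idle) for the largest square grid that fits."""
    side = math.isqrt(available)   # largest k with k * k <= available
    used = side * side
    return side, used, available - used

print(square_grid(128))  # (11, 121, 7)
```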
Number of reducers (example 1, replication problem)
[Figure: rectangular reducer matrix, columns labelled 1 to 16, rows labelled 1 to 4]
# of reducers = 128
Assumption: R >> T
Both relations are uniformly distributed
T(R) = 1,000,000
T(T) = 1,000
For the square matrix (11 × 11):
Replicated data = 1,000,000 × 11 + 1,000 × 11 = 11,011,000
For the matrix above:
Replicated data = 1,000,000 × 16 + 1,000 × 16 = 16,016,000
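The replication figures on this slide can be reproduced with a few lines of arithmetic. The function name `replicated_tuples` and its parameters are assumptions made for illustration; the replication factors (11 and 11 for the square matrix, 16 and 16 for the rectangular one) are taken directly from the slide.

```python
def replicated_tuples(size_r, size_t, copies_r, copies_t):
    """Total tuples shipped when R is copied copies_r times and T copies_t times."""
    return size_r * copies_r + size_t * copies_t

square = replicated_tuples(1_000_000, 1_000, 11, 11)
rectangular = replicated_tuples(1_000_000, 1_000, 16, 16)
print(square, rectangular, rectangular - square)
# 11011000 16016000 5005000
```

Under these numbers the rectangular layout ships about five million more tuples than the square one.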
Number of reducers (example 1, inefficiency problem)
[Figure: rectangular reducer matrix, columns labelled 1 to 16, rows labelled 1 to 4, with reducers marked IDLE, FULL, IDLE, FULL]
# of reducers = 128
Assumption: T >> R
T is not uniformly distributed
T(R) = 1,000
T(T) = 1,000,000
When the hash range is reduced, it is more likely that two values hash to the same location.
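The effect of a smaller hash range can be illustrated with a toy example. This is only a sketch: `value % buckets` stands in for the real hash function, and the 16 input values and the bucket counts 16 and 4 are assumptions chosen to keep the arithmetic obvious.

```python
from collections import Counter

def bucket_loads(values, buckets):
    """Count how many values land in each bucket under a modulo hash."""
    return Counter(v % buckets for v in values)

values = range(16)
wide = bucket_loads(values, 16)    # one value per bucket
narrow = bucket_loads(values, 4)   # four values collide in each bucket
print(max(wide.values()), max(narrow.values()))  # 1 4
```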
Experimental results 
37 seconds
Any questions?
