28th International Symposium on Distributed Computing (DISC 2014) 
Austin, Texas, USA (12-15 October 2014) 
Assignment of Different-Sized Inputs in MapReduce
Shantanu Sharma²
joint work with Foto N. Afrati¹, Shlomi Dolev², Ephraim Korach², and Jeffrey D. Ullman³
¹ National Technical University of Athens, Greece
² Ben-Gurion University of the Negev, Israel
³ Stanford University, USA
Introduction 
• Cluster Computing 
– Terabytes or petabytes of data cannot be
processed on a single computer
– Cluster of computers 
– How to mask failures, e.g., hardware failures 
• MapReduce is a programming model used for 
parallel processing over large-scale data 
Introduction
MapReduce job: Map Phase and Reduce Phase
• Map Phase: applies a user-defined Map function
• Reduce Phase: applies a user-defined Reduce function
[Figure: execution of a MapReduce job. A master process forks workers. Map workers read the input data (Chunk 0, Chunk 1, Chunk 2) and write their results locally; reduce workers read them remotely, sort them, and write Output File 0 and Output File 1.]
MapReduce working example – Word Count
[Figure: two mappers process the text "I like apple. Apple is fruit. I like banana." and emit per-word counts (I 1, like 1, apple 2, is 1, fruit 1; I 1, like 1, banana 1). One reducer per word (I, like, apple, is, fruit, banana) produces the outputs (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1).]
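The word-count flow can be sketched in a few lines of plain Python. This is only an illustration, not the code of any MapReduce framework: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, the split of the text into two mapper inputs is an assumption, keys are lowercased so that "Apple" and "apple" meet at the same reducer, and mappers emit one pair per occurrence rather than pre-aggregated counts.

```python
import re
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word occurrence in one input split."""
    for word in re.findall(r"[A-Za-z]+", document):
        yield word.lower(), 1

def shuffle(pairs):
    """Group emitted values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts that reached the reducer for this key."""
    return key, sum(values)

# Assumed split of the example text across two mappers.
splits = ["I like apple. Apple is fruit.", "I like banana."]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Each key ends up at exactly one reducer, which is why the number of reducers here equals the number of distinct words.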
Inputs and outputs in our context
[Figure: the word-count data flow (mappers, one reducer per word, and the resulting counts) annotated with the labels "Inputs" and "Outputs".]
Reducer Capacity 
• Each value provided by a mapper has a size
(the input size)
• Reducer capacity (denoted q): an upper bound on the sum of the
sizes of the values that are assigned to the reducer
• Example: the reducer capacity may be the size of the main
memory of the processors on which the reducers run
We consider two special matching problems 
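A minimal sketch of the capacity constraint, assuming input sizes are given as plain numbers and an assignment is a list of sets of input identifiers (one set per reducer). The names `reducer_load` and `respects_capacity` and the sample sizes are hypothetical.

```python
def reducer_load(sizes, assigned):
    """Total size of the inputs assigned to one reducer."""
    return sum(sizes[i] for i in assigned)

def respects_capacity(sizes, assignment, q):
    """True if no reducer receives inputs whose sizes sum to more than q."""
    return all(reducer_load(sizes, inputs) <= q for inputs in assignment)

# Hypothetical input sizes and reducer capacity.
sizes = {"a": 3, "b": 4, "c": 2}
q = 7
ok = respects_capacity(sizes, [{"a", "b"}, {"b", "c"}], q)   # loads 7 and 6
too_big = respects_capacity(sizes, [{"a", "b", "c"}], q)     # load 9
print(ok, too_big)
```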
State-of-the-Art 
• F. Afrati, A.D. Sarma, S. Salihoglu, and J.D. Ullman,
“Upper and Lower Bounds on the Cost of a Map-Reduce
Computation,” PVLDB, 2013.
• Unit input size 
• Reducer Size 
– Maximum number of inputs that a given reducer 
can have. 
Problem Statement 
• Communication cost between the map and the 
reduce phases is a significant factor 
• How can we reduce the communication cost?
– Fewer reducers, and hence a smaller
communication cost
– How to minimize the total number of reducers 
while respecting their limited capacity? 
• Not an easy task 
– All-to-All mapping schema problem 
– X-to-Y mapping schema problem 
[Figure: two ways to assign three inputs, each produced by its own mapper. First: three reducers, where the reducer for k1 receives inputs (1, 2), the reducer for k2 receives (1, 3), and the reducer for k3 receives (2, 3), so every input is sent to two reducers. Second: a single reducer for k1 receives all three inputs (1, 2, 3), so every input is sent once. Notation: ki: key.]
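The effect of the number of reducers on communication can be made concrete with a small calculation. The sketch below assumes that the communication cost is the total size of all input copies sent from mappers to reducers; the function name `communication_cost` and the unit sizes are hypothetical.

```python
def communication_cost(sizes, assignment):
    """Total size of all input copies sent to reducers (assumed cost model)."""
    return sum(sizes[i] for reducer in assignment for i in reducer)

# Three inputs of hypothetical size 2 each.
sizes = {1: 2, 2: 2, 3: 2}

# One reducer per pair of inputs: every input is copied twice.
pairwise = [{1, 2}, {1, 3}, {2, 3}]
# A single reducer that holds all three inputs: every input is copied once.
single = [{1, 2, 3}]

cost_pairwise = communication_cost(sizes, pairwise)
cost_single = communication_cost(sizes, single)
print(cost_pairwise, cost_single)
```

The single-reducer assignment halves the cost, but it is only allowed when the reducer capacity is at least the total size of the three inputs.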
A2A Mapping Schema Problem 
• A set of inputs is given 
• Each pair of inputs corresponds to one output 
• Example 
– Computing common friends 
• Lists of friends of m persons are given
• Find the common friends of every pair of the given m persons
• Every two friend lists must be assigned to at least one
common reducer
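As an illustration of why every pair of inputs yields one output, here is a sequential sketch of the common-friends computation. The function name `common_friends` and the sample friend lists are hypothetical; in a MapReduce setting each pair would be handled by a reducer that holds both friend lists.

```python
from itertools import combinations

def common_friends(friend_lists):
    """Return the common friends of every pair of persons.

    friend_lists maps a person to the set of that person's friends.
    """
    return {
        (a, b): friend_lists[a] & friend_lists[b]
        for a, b in combinations(sorted(friend_lists), 2)
    }

# Hypothetical friend lists for m = 3 persons.
friend_lists = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
}
result = common_friends(friend_lists)
for pair, shared in sorted(result.items()):
    print(pair, sorted(shared))
```

For m persons there are m(m-1)/2 pairs, so m = 3 gives three outputs.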
A2A Mapping Schema Problem
[Figure: four mappers, one per friend, emit the friend lists fl1 to fl4. The reducer for k1 receives (1, 2, 3), the reducer for k2 receives (1, 2, 4), and the reducer for k3 receives (3, 4), which together cover the pairs (1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4). Reducer capacity is enough to hold some of the friend lists together. Notations: ki: key; fli: ith friend list.]
A2A Mapping Schema Problem
[Figure: four mappers, one per friend, emit the friend lists fl1 to fl4, all with key k1. The single reducer for k1 receives (1, 2, 3, 4) and covers the pairs (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4). Reducer capacity is enough to hold all the friend lists together. Notations: ki: key; fli: ith friend list.]
A2A Mapping Schema Problem 
• What to do? 
– Assign the given m inputs to the given number of
reducers, without exceeding the reducer capacity q, so that
every input shares at least one reducer with every
other input
• Polynomial time solution for one and two 
reducers 
• NP-hard for z > 2 reducers 
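The two requirements of an A2A mapping schema (capacity and pair coverage) are easy to verify for a given assignment, even though finding a good assignment is hard. The checker below is a sketch; the name `is_valid_a2a`, the unit sizes, and the capacity values are assumptions. The sample assignment uses three reducers holding friend lists (1, 2, 3), (1, 2, 4), and (3, 4).

```python
from itertools import combinations

def is_valid_a2a(sizes, assignment, q):
    """Check an A2A assignment: every reducer within capacity q,
    and every pair of inputs together in at least one reducer."""
    within_capacity = all(
        sum(sizes[i] for i in reducer) <= q for reducer in assignment
    )
    all_pairs_covered = all(
        any(a in reducer and b in reducer for reducer in assignment)
        for a, b in combinations(sizes, 2)
    )
    return within_capacity and all_pairs_covered

# Four inputs of hypothetical unit size.
sizes = {1: 1, 2: 1, 3: 1, 4: 1}

three_reducers = is_valid_a2a(sizes, [{1, 2, 3}, {1, 2, 4}, {3, 4}], q=3)
missing_pair = is_valid_a2a(sizes, [{1, 2, 3}, {1, 2, 4}], q=3)   # (3, 4) uncovered
one_reducer_q3 = is_valid_a2a(sizes, [{1, 2, 3, 4}], q=3)         # load 4 > 3
one_reducer_q4 = is_valid_a2a(sizes, [{1, 2, 3, 4}], q=4)
print(three_reducers, missing_pair, one_reducer_q3, one_reducer_q4)
```

The one-reducer case reduces to a single comparison: the sum of all input sizes against q.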
Heuristics for A2A Mapping 
Schema Problem 
• Based on 
– First-Fit Decreasing (FFD) or Best-Fit Decreasing 
(BFD) bin-packing algorithm 
– Pseudo-polynomial bin-packing algorithm* 
– 2-step Algorithms 
– The selection of a prime number p 
• A fixed reducer capacity is given 
*D. R. Karger and J. Scott. Efficient algorithms for fixed-precision instances of bin
packing and Euclidean TSP. In APPROX-RANDOM, pages 104–117, 2008.
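The slides do not spell out how bin packing leads to a mapping schema, so the following is only one plausible sketch. It assumes every input has size at most q/2, packs the inputs into bins of capacity q/2 with First-Fit Decreasing, and then gives each pair of bins its own reducer. The function names and sample sizes are hypothetical.

```python
from itertools import combinations

def first_fit_decreasing(sizes, bin_capacity):
    """Pack items into bins: largest first, each into the first bin that fits."""
    bins = []  # each entry is [remaining capacity, list of item ids]
    for item in sorted(sizes, key=sizes.get, reverse=True):
        if sizes[item] > bin_capacity:
            raise ValueError(f"input {item!r} does not fit into a bin")
        for b in bins:
            if b[0] >= sizes[item]:
                b[0] -= sizes[item]
                b[1].append(item)
                break
        else:
            bins.append([bin_capacity - sizes[item], [item]])
    return [items for _, items in bins]

def a2a_by_bin_pairs(sizes, q):
    """Assumed construction: bins of capacity q/2, one reducer per pair of bins."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [set(bins[0])]
    return [set(a) | set(b) for a, b in combinations(bins, 2)]

# Hypothetical input sizes and reducer capacity.
sizes = {"a": 4, "b": 3, "c": 3, "d": 2, "e": 2, "f": 1}
q = 10
reducers = a2a_by_bin_pairs(sizes, q)
for r in reducers:
    print(sorted(r), sum(sizes[i] for i in r))
```

Two bins of capacity q/2 always fit into one reducer, and any two inputs either share a bin or sit in two bins that are paired somewhere, so the result respects q and covers every pair. With x bins the sketch uses x(x-1)/2 reducers.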
X2Y Mapping Schema Problem 
• Two disjoint sets X and Y are given 
• Each pair of elements (xi, yj), where xi ∈ X and
yj ∈ Y for all i, j, corresponds to
one output
• Example 
– Skew Join 
• Two relations X(A, B) and Y(B, C) are given, where many
tuples share a common “b” value
• Every pair of tuples, one from X and one from Y, with an identical
“b” value must be assigned to at least one common reducer
X2Y Mapping Schema Problem 
• What to do? 
– Assign each input of the set X together with each input
of the set Y to at least one common reducer,
without exceeding the reducer capacity q
• Polynomial for one reducer
– Can we assign all the inputs of the sets X and Y to
a single reducer?
• NP-hard for z > 1 reducers 
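As with A2A, checking a given X2Y assignment is straightforward. The sketch below assumes the identifiers of X and Y are disjoint; the names `is_valid_x2y` and `fits_one_reducer` and the sample sizes are hypothetical.

```python
def is_valid_x2y(x_sizes, y_sizes, assignment, q):
    """Check an X2Y assignment: every reducer within capacity q, and every
    (x, y) pair with x in X and y in Y together in at least one reducer."""
    sizes = {**x_sizes, **y_sizes}
    within_capacity = all(
        sum(sizes[i] for i in reducer) <= q for reducer in assignment
    )
    all_pairs_covered = all(
        any(x in reducer and y in reducer for reducer in assignment)
        for x in x_sizes
        for y in y_sizes
    )
    return within_capacity and all_pairs_covered

def fits_one_reducer(x_sizes, y_sizes, q):
    """The one-reducer case: do all inputs of X and Y fit together?"""
    return sum(x_sizes.values()) + sum(y_sizes.values()) <= q

# Hypothetical sizes and capacity.
x_sizes = {"x1": 2, "x2": 2}
y_sizes = {"y1": 1, "y2": 1}
q = 4

single = fits_one_reducer(x_sizes, y_sizes, q)                       # total 6 > 4
good = is_valid_x2y(x_sizes, y_sizes,
                    [{"x1", "y1", "y2"}, {"x2", "y1", "y2"}], q)
bad = is_valid_x2y(x_sizes, y_sizes, [{"x1", "y1"}, {"x2", "y2"}], q)  # (x1, y2) uncovered
print(single, good, bad)
```

Unlike A2A, two inputs from the same set never need to meet, which is why the second assignment above can keep x1 and x2 apart.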
Heuristics for X2Y Mapping 
Schema Problem 
• Based on 
– First-Fit Decreasing (FFD) or Best-Fit Decreasing 
(BFD) bin-packing algorithm 
• A fixed reducer capacity is given 
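A possible bin-packing construction for X2Y, given here as a sketch under stated assumptions rather than as the authors' algorithm: every input has size at most q/2, X and Y are packed separately into bins of capacity q/2 with First-Fit Decreasing, and each (X-bin, Y-bin) pair gets one reducer. All names and sizes are hypothetical.

```python
def first_fit_decreasing(sizes, bin_capacity):
    """Pack items into bins: largest first, each into the first bin that fits."""
    bins = []  # each entry is [remaining capacity, list of item ids]
    for item in sorted(sizes, key=sizes.get, reverse=True):
        if sizes[item] > bin_capacity:
            raise ValueError(f"input {item!r} does not fit into a bin")
        for b in bins:
            if b[0] >= sizes[item]:
                b[0] -= sizes[item]
                b[1].append(item)
                break
        else:
            bins.append([bin_capacity - sizes[item], [item]])
    return [items for _, items in bins]

def x2y_by_bin_pairs(x_sizes, y_sizes, q):
    """Assumed construction: one reducer per (X-bin, Y-bin) pair."""
    x_bins = first_fit_decreasing(x_sizes, q / 2)
    y_bins = first_fit_decreasing(y_sizes, q / 2)
    return [set(xb) | set(yb) for xb in x_bins for yb in y_bins]

# Hypothetical sizes and capacity.
x_sizes = {"x1": 3, "x2": 2, "x3": 2}
y_sizes = {"y1": 4, "y2": 1, "y3": 1}
q = 8
reducers = x2y_by_bin_pairs(x_sizes, y_sizes, q)
sizes = {**x_sizes, **y_sizes}
for r in reducers:
    print(sorted(r), sum(sizes[i] for i in r))
```

Here X packs into two bins and Y into two bins, so the sketch uses four reducers. Other splits of q between the two sets are conceivable and may use fewer reducers when one set has much smaller inputs than the other.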
Conclusion 
• Reducer capacity 
– An important parameter to be considered in all MapReduce 
algorithms 
– The capacity is measured in terms of the (not necessarily identical)
memory sizes of the inputs, including any auxiliary size added to the
index of the data item(s)
• Two assignment schemas of MapReduce are given 
– All-to-All (A2A) mapping schema problem 
– X-to-Y (X2Y) mapping schema problem 
• Several heuristics for A2A and X2Y mapping schema 
problems are provided 
Presentation is available at 
http://www.cs.bgu.ac.il/~sharmas/publication.html 
Foto Afrati¹, Shlomi Dolev², Ephraim Korach³,
Shantanu Sharma², and Jeffrey D. Ullman⁴
¹ School of Electrical and Computing Engineering, National Technical
University of Athens, Greece
afrati@softlab.ece.ntua.gr
² Department of Computer Science, Ben-Gurion University of the
Negev, Israel
{dolev,sharmas}@cs.bgu.ac.il
³ Department of Industrial Engineering and Management, Ben-Gurion
University of the Negev, Israel
korach@bgu.ac.il
⁴ Department of Computer Science, Stanford University, USA
ullman@cs.stanford.edu

More Related Content

What's hot

Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advancedChirag Ahuja
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Daniel Lemire
 
Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimizationSally Salem
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintijccsa
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local namesVarsha Kumar
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithmiosrjce
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsQian Lin
 

What's hot (20)

Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
main
mainmain
main
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
 
Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimization
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local names
 
3D-DRESD Polaris
3D-DRESD Polaris3D-DRESD Polaris
3D-DRESD Polaris
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithm
 
Run time administration
Run time administrationRun time administration
Run time administration
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
 
Chapter 7 Run Time Environment
Chapter 7   Run Time EnvironmentChapter 7   Run Time Environment
Chapter 7 Run Time Environment
 

Similar to Assignment of Different-Sized Inputs in MapReduce

MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
BDC-presentation
BDC-presentationBDC-presentation
BDC-presentationPavel Popa
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceLeonidas Akritidis
 
MapReduce
MapReduceMapReduce
MapReduceKavyaGo
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce AnandMHadoop
 
DFA minimization algorithms in map reduce
DFA minimization algorithms in map reduceDFA minimization algorithms in map reduce
DFA minimization algorithms in map reduceIraj Hedayati
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화NAVER Engineering
 
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)Matthew Lease
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsShantanu Sharma
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 

Similar to Assignment of Different-Sized Inputs in MapReduce (20)

MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
BDC-presentation
BDC-presentationBDC-presentation
BDC-presentation
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
MapReduce
MapReduceMapReduce
MapReduce
 
poster
posterposter
poster
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Intro to Map Reduce
Intro to Map ReduceIntro to Map Reduce
Intro to Map Reduce
 
MapReduce
MapReduceMapReduce
MapReduce
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
 
DFA minimization algorithms in map reduce
DFA minimization algorithms in map reduceDFA minimization algorithms in map reduce
DFA minimization algorithms in map reduce
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 

More from Shantanu Sharma

Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingShantanu Sharma
 
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation QueriesOBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation QueriesShantanu Sharma
 
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)Shantanu Sharma
 
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...Shantanu Sharma
 
Private and secure secret shared map reduce
Private and secure secret shared map reducePrivate and secure secret shared map reduce
Private and secure secret shared map reduceShantanu Sharma
 
A Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile CommunicationA Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile CommunicationShantanu Sharma
 
On Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio NetworksOn Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio NetworksShantanu Sharma
 
Bounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduceBounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduceShantanu Sharma
 
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...Shantanu Sharma
 

More from Shantanu Sharma (9)

Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
 
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation QueriesOBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
 
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
 
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
 
Private and secure secret shared map reduce
Private and secure secret shared map reducePrivate and secure secret shared map reduce
Private and secure secret shared map reduce
 
A Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile CommunicationA Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile Communication
 
On Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio NetworksOn Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio Networks
 
Bounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduceBounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduce
 
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
 

Recently uploaded

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Assignment of Different-Sized Inputs in MapReduce

  • 1. 28th International Symposium on Distributed Computing (DISC 2014) Austin, Texas, USA (12-15 October 2014) Assignment of Different-Sized Inputs in MapReduce Shantanu Sharma2 joint work with Foto N. Afrati1, Shlomi Dolev2, Ephraim Korach2, and Jeffrey D. Ullman3 1 National Technical University of Athens, Greece 2 Ben-Gurion University of the Negev, Israel 3 Stanford University, USA
  • 2. Introduction • Cluster Computing – Terabytes or Petabytes amount of data cannot be processed on a single computer – Cluster of computers – How to mask failures, e.g., hardware failures • MapReduce is a programming model used for parallel processing over large-scale data 2
  • 3. Introduction 3 MapReduce job: Map Phase and Reduce Phase Worker Worker Master process Map Phase: applies a user-defined Map function Worker Worker Worker fork Read Local write Remote read, sort Output File 0 Output File 1 Write Chunk 0 Chunk 1 Chunk 2 Input Data Reduce Phase: applies a user-defined Reduce function
  • 4. MapReduce working example – Word Count Mapper 1 Reducer for I Mapper 2 Introduction I 1 like 1 apple 2 Reducer for like Reducer for apple Reducer for is Reducer for fruit Reducer for banana (I, 2) (like, 2) (apple, 2) (is, 1) (fruit, 1) (banana, 1) I like apple. Apple is fruit. I like banana. is 1 fruit 1 I 1 like 1 banana 1
  • 5. Introduction: Inputs and outputs in our context
      • [Figure: the word-count diagram of slide 4 (two mappers, one reducer per word, and the reducer results), annotated with the labels "Inputs" and "Outputs".]
  • 6. Reducer Capacity
      • Values, provided by each mapper, have some sizes (input size)
      • Reducer capacity: an upper bound on the sum of the sizes of the values that are assigned to the reducer
      • Example: the reducer capacity may be the size of the main memory of the processors on which the reducers run
      • We consider two special matching problems
  • 7. State-of-the-Art
      • F. Afrati, A.D. Sarma, S. Salihoglu, and J.D. Ullman, “Upper and Lower Bounds on the Cost of a Map-Reduce Computation,” PVLDB, 2013.
      • Unit input size
      • Reducer Size
        – Maximum number of inputs that a given reducer can have.
  • 8. Problem Statement
      • Communication cost between the map and the reduce phases is a significant factor
      • How can we reduce the communication cost?
        – Fewer reducers, and hence, a smaller communication cost
        – How to minimize the total number of reducers while respecting their limited capacity?
      • Not an easy task
        – All-to-All mapping schema problem
        – X-to-Y mapping schema problem
      • Notation: ki: key
      • [Figure, first case: three mappers, one per input, emit input1 → k1, k2; input2 → k1, k3; input3 → k2, k3. Reducer for k1 receives (1, 2), reducer for k2 receives (1, 3), and reducer for k3 receives (2, 3).]
      • [Figure, second case: the three mappers all emit to k1, and the single reducer for k1 receives (1, 2, 3).]
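The two cases on this slide can be compared numerically. The sketch below assumes that communication cost means the total size of all input copies sent from mappers to reducers, and that each of the three inputs has unit size; both are assumptions for illustration, and `communication_cost` is a hypothetical helper name.

```python
def communication_cost(schema, sizes):
    """Total size of all input copies sent to reducers (assumed cost model)."""
    return sum(sizes[i] for reducer in schema for i in reducer)


# Unit-size inputs 1, 2, 3 (an assumption; the slide does not give sizes).
sizes = {1: 1, 2: 1, 3: 1}

three_reducers = [{1, 2}, {1, 3}, {2, 3}]  # each pair meets at its own reducer
one_reducer = [{1, 2, 3}]                  # all inputs meet at a single reducer

cost_three = communication_cost(three_reducers, sizes)
cost_one = communication_cost(one_reducer, sizes)
print(cost_three, cost_one)  # 6 3
```

With three reducers every input is copied twice; with one reducer every input is sent once, which is possible only if the reducer capacity can hold all three inputs.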
  • 9. A2A Mapping Schema Problem
      • A set of inputs is given
      • Each pair of inputs corresponds to one output
      • Example – Computing common friends
        – Lists of friends of m persons are given
        – Find the common friends of every two of the given m persons
        – Every two friend lists must be assigned to at least one common reducer
  • 10. A2A Mapping Schema Problem
      • Reducer capacity is enough to hold some of the friend lists together
      • Notations: ki: key; fli: ith friend list
      • Pairs of friend lists to be covered: (1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)
      • [Figure: four mappers, one per friend, emit fl1 → k1, k2; fl2 → k1, k2; fl3 → k1, k3; fl4 → k2, k3. Reducer for k1 receives (1, 2, 3), reducer for k2 receives (1, 2, 4), and reducer for k3 receives (3, 4).]
  • 11. A2A Mapping Schema Problem
      • Reducer capacity is enough to hold all the friend lists together
      • Notations: ki: key; fli: ith friend list
      • Pairs of friend lists to be covered: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
      • [Figure: four mappers, one per friend, emit fl1, fl2, fl3, and fl4 to k1. The single reducer for k1 receives (1, 2, 3, 4).]
  • 12. A2A Mapping Schema Problem
      • What to do?
        – Assign the given m inputs to the given number of reducers, without exceeding the reducer capacity q, such that every input is coupled with every other input in at least one common reducer
      • Polynomial-time solution for one and two reducers
      • NP-hard for z > 2 reducers
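The two A2A requirements (capacity q respected, every pair of inputs sharing a reducer) can be written as a small checker. This is a sketch under assumptions: a schema is represented as a list of sets of input ids, `sizes` maps each id to its size, and `is_valid_a2a` is a hypothetical name. The example reuses the reducer contents (1, 2, 3), (1, 2, 4), (3, 4) from slide 10 with unit sizes, which the slides do not specify.

```python
from itertools import combinations


def is_valid_a2a(schema, sizes, q):
    """Check capacity and all-pairs coverage for an A2A mapping schema."""
    # Capacity: the inputs assigned to one reducer must fit within q.
    for reducer in schema:
        if sum(sizes[i] for i in reducer) > q:
            return False
    # Coverage: every pair of inputs must share at least one reducer.
    covered = set()
    for reducer in schema:
        covered.update(combinations(sorted(reducer), 2))
    required = set(combinations(sorted(sizes), 2))
    return required <= covered


sizes = {1: 1, 2: 1, 3: 1, 4: 1}        # unit sizes, assumed
schema = [{1, 2, 3}, {1, 2, 4}, {3, 4}]  # reducers k1, k2, k3 from slide 10

print(is_valid_a2a(schema, sizes, q=3))      # True
print(is_valid_a2a(schema[:2], sizes, q=3))  # False: pair (3, 4) never meets
print(is_valid_a2a(schema, sizes, q=2))      # False: capacity exceeded
```

Checking a given schema is easy; the hard part stated on the slide is finding a schema with few reducers.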
  • 13. Heuristics for A2A Mapping Schema Problem
      • Based on
        – First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithm
        – Pseudo-polynomial bin-packing algorithm*
        – 2-step Algorithms
        – The selection of a prime number p
      • A fixed reducer capacity is given
      * D. R. Karger and J. Scott. Efficient algorithms for fixed-precision instances of bin packing and Euclidean TSP. In APPROX-RANDOM, pages 104–117, 2008.
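The slide names FFD bin packing but does not spell out how it is used. One plausible construction, sketched here as an assumption rather than as the authors' exact algorithm, packs the inputs into bins of capacity q/2 and then gives every pair of bins its own reducer, so any two inputs meet either inside a bin or in the reducer for their two bins. It requires every input size to be at most q/2; the function names are hypothetical.

```python
from itertools import combinations


def first_fit_decreasing(sizes, bin_capacity):
    """Pack items into bins with First-Fit Decreasing; returns lists of item ids."""
    bins, loads = [], []
    for item in sorted(sizes, key=sizes.get, reverse=True):
        if sizes[item] > bin_capacity:
            raise ValueError(f"input {item!r} does not fit in a bin")
        for b, load in enumerate(loads):
            if load + sizes[item] <= bin_capacity:
                bins[b].append(item)
                loads[b] += sizes[item]
                break
        else:  # no existing bin had room: open a new one
            bins.append([item])
            loads.append(sizes[item])
    return bins


def a2a_schema_by_bin_pairing(sizes, q):
    """Assumed heuristic: bins of capacity q/2, one reducer per pair of bins."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [set(bins[0])]
    return [set(a) | set(b) for a, b in combinations(bins, 2)]


sizes = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 1, "f": 1}  # made-up input sizes
schema = a2a_schema_by_bin_pairing(sizes, q=8)
print(schema)
```

Two bins of load at most q/2 always fit together in one reducer of capacity q, which is why the bin capacity is halved.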
  • 14. X2Y Mapping Schema Problem
      • Two disjoint sets X and Y are given
      • Each pair of elements ⟨xi, yj⟩ (where xi ∈ X, yj ∈ Y, ∀i, j) of the sets X and Y corresponds to one output
      • Example – Skew Join
        – Two relations X(A, B) and Y(B, C) are given, where many tuples have a common “b” value
        – Every tuple of X with a given “b” value must be assigned, together with every tuple of Y with the same “b” value, to at least one common reducer
  • 15. X2Y Mapping Schema Problem
      • What to do?
        – Assign each input of the set X with each input of the set Y to at least one common reducer, without exceeding the reducer capacity q
      • Polynomial for one reducer
        – Can we assign all the inputs of the sets X and Y to a single reducer?
      • NP-hard for z > 1 reducers
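The X2Y requirement differs from A2A only in which pairs must meet: each input of X with each input of Y, but not two inputs of the same set. A sketch of a checker, assuming a schema is a list of sets of input ids, that ids of X and Y are distinct, and using the hypothetical name `is_valid_x2y`:

```python
def is_valid_x2y(schema, x_sizes, y_sizes, q):
    """Check capacity and X-to-Y pair coverage for an X2Y mapping schema."""
    sizes = {**x_sizes, **y_sizes}  # assumes X and Y use distinct ids
    covered = set()
    for reducer in schema:
        # Capacity: the inputs assigned to one reducer must fit within q.
        if sum(sizes[i] for i in reducer) > q:
            return False
        xs = [i for i in reducer if i in x_sizes]
        ys = [i for i in reducer if i in y_sizes]
        covered.update((x, y) for x in xs for y in ys)
    # Coverage: every (x, y) pair must share at least one reducer.
    return all((x, y) in covered for x in x_sizes for y in y_sizes)


x_sizes = {"x1": 2, "x2": 2}  # made-up sizes
y_sizes = {"y1": 1, "y2": 1}
schema = [{"x1", "y1", "y2"}, {"x2", "y1", "y2"}]

print(is_valid_x2y(schema, x_sizes, y_sizes, q=4))  # True
print(is_valid_x2y(schema, x_sizes, y_sizes, q=3))  # False: capacity exceeded
```

The single-reducer question on the slide reduces to checking whether the total size of X and Y is at most q.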
  • 16. Heuristics for X2Y Mapping Schema Problem
      • Based on
        – First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithm
      • A fixed reducer capacity is given
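As with A2A, the slide gives only the building block (FFD or BFD bin packing). A possible bin-packing-based construction, offered as an assumption and not as the authors' stated algorithm, packs X into bins of capacity q/2 and Y into bins of capacity q/2, then assigns one reducer to each (X-bin, Y-bin) combination. The names and the `x_share` parameter are hypothetical.

```python
from itertools import product


def ffd(sizes, capacity):
    """First-Fit Decreasing bin packing; returns a list of bins of item ids."""
    bins, loads = [], []
    for item in sorted(sizes, key=sizes.get, reverse=True):
        if sizes[item] > capacity:
            raise ValueError(f"input {item!r} does not fit in a bin")
        for b, load in enumerate(loads):
            if load + sizes[item] <= capacity:
                bins[b].append(item)
                loads[b] += sizes[item]
                break
        else:  # no existing bin had room: open a new one
            bins.append([item])
            loads.append(sizes[item])
    return bins


def x2y_schema_by_bin_pairing(x_sizes, y_sizes, q, x_share=0.5):
    """Assumed heuristic: one reducer per (X-bin, Y-bin) combination."""
    x_bins = ffd(x_sizes, q * x_share)      # part of the capacity kept for X
    y_bins = ffd(y_sizes, q - q * x_share)  # the rest is kept for Y
    return [set(xb) | set(yb) for xb, yb in product(x_bins, y_bins)]


x_sizes = {"x1": 2, "x2": 2, "x3": 1}  # made-up sizes
y_sizes = {"y1": 2, "y2": 1, "y3": 1}
schema = x2y_schema_by_bin_pairing(x_sizes, y_sizes, q=6)
print(schema)
```

Changing `x_share` shifts capacity between the two sets, which may help when the inputs of one set are much larger than those of the other.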
  • 17. Conclusion
      • Reducer capacity
        – An important parameter to be considered in all MapReduce algorithms
        – The capacity is in terms of, not necessarily identical, memory auxiliary size, augmented and added to the index of the data item(s)
      • Two assignment schemas of MapReduce are given
        – All-to-All (A2A) mapping schema problem
        – X-to-Y (X2Y) mapping schema problem
      • Several heuristics for A2A and X2Y mapping schema problems are provided
  • 18. Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html
      Foto Afrati (1), Shlomi Dolev (2), Ephraim Korach (3), Shantanu Sharma (2), and Jeffrey D. Ullman (4)
      (1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece, afrati@softlab.ece.ntua.gr
      (2) Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il
      (3) Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Israel, korach@bgu.ac.il
      (4) Department of Computer Science, Stanford University, USA, ullman@cs.stanford.edu