28th International Symposium on Distributed Computing (DISC 2014) 
Austin, Texas, USA (12-15 October 2014) 
Assignment of Different-Sized 
Inputs in MapReduce 
Shantanu Sharma2 
joint work with 
Foto N. Afrati1, Shlomi Dolev2, Ephraim Korach2, and 
Jeffrey D. Ullman3 
1 National Technical University of Athens, Greece 
2 Ben-Gurion University of the Negev, Israel 
3 Stanford University, USA
Introduction
• Cluster computing
– Terabytes or petabytes of data cannot be processed on a single computer
– A cluster of computers is used instead
– Challenge: how to mask failures, e.g., hardware failures
• MapReduce is a programming model for parallel processing of large-scale data
Introduction
MapReduce job: Map Phase and Reduce Phase
[Figure: a master process forks worker processes. Map Phase: each map worker reads one chunk of the input data (Chunk 0, Chunk 1, Chunk 2), applies a user-defined Map function, and writes its output locally. Reduce Phase: each reduce worker remotely reads and sorts the intermediate data, applies a user-defined Reduce function, and writes an output file (Output File 0, Output File 1).]
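The job flow in the figure can be simulated in a few lines. This is a minimal single-process sketch, not Hadoop's or Google's implementation: `run_job`, `partition`, and the choice of CRC32 hash partitioning are illustrative assumptions, used to show how intermediate keys are routed to a fixed number of reducers (two here, matching the two output files).

```python
import zlib
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

KV = Tuple[str, Any]


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical deterministic hash partitioner; Python's built-in hash()
    # of str is randomized per process, so CRC32 is used for reproducibility.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(
    chunks: Iterable[str],
    map_fn: Callable[[str], Iterable[KV]],
    reduce_fn: Callable[[str, List[Any]], Any],
    num_reducers: int,
) -> List[List[Any]]:
    # Map phase: each chunk is mapped independently; the intermediate pairs
    # are grouped into one bucket per reducer ("local write").
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for chunk in chunks:
        for key, value in map_fn(chunk):
            buckets[partition(key, num_reducers)][key].append(value)
    # Reduce phase: each reducer takes its bucket ("remote read"), sorts the
    # keys, and applies reduce_fn to each key group; one output per reducer.
    return [
        [reduce_fn(key, bucket[key]) for key in sorted(bucket)]
        for bucket in buckets
    ]
```

For example, `run_job(["a b", "b c", "a a"], lambda c: [(w, 1) for w in c.split()], lambda k, vs: (k, sum(vs)), 2)` returns two output lists that together contain `("a", 3)`, `("b", 2)`, and `("c", 1)`; which reducer receives which key depends on the hash.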
MapReduce working example – Word Count
• Input: "I like apple. Apple is fruit." is read by Mapper 1, and "I like banana." by Mapper 2
• Mapper 1 emits (I, 1), (like, 1), (apple, 2), (is, 1), (fruit, 1)
• Mapper 2 emits (I, 1), (like, 1), (banana, 1)
• One reducer per word (I, like, apple, is, fruit, banana) sums the counts and outputs (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1)
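The word-count example can be reproduced with a small sketch. Two assumptions are made here and are not taken from the slides: words are case-folded, so "Apple" and "apple" share a key (as the slide's (apple, 2) implies), and "I" therefore becomes "i". Each mapper also combines its counts locally before emitting them.

```python
import re
from collections import Counter, defaultdict


def word_count_mapper(text: str) -> list[tuple[str, int]]:
    # Tokenize, case-fold, and combine counts locally within this mapper.
    words = re.findall(r"[a-z]+", text.lower())
    return list(Counter(words).items())


def word_count_reducer(word: str, counts: list[int]) -> tuple[str, int]:
    # Each reducer handles a single word and sums its partial counts.
    return word, sum(counts)


def word_count(mapper_inputs: list[str]) -> list[tuple[str, int]]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for text in mapper_inputs:
        for word, count in word_count_mapper(text):
            grouped[word].append(count)
    return [word_count_reducer(w, c) for w, c in grouped.items()]
```

With the slide's inputs, `word_count_mapper("I like apple. Apple is fruit.")` yields `[("i", 1), ("like", 1), ("apple", 2), ("is", 1), ("fruit", 1)]`, and `word_count(["I like apple. Apple is fruit.", "I like banana."])` yields the six reducer outputs shown above (with "i" in lowercase).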
Inputs and outputs in our context
[Figure: the same word-count example, annotated to show that the key-value pairs emitted by the mappers are the inputs (to the reducers) and the pairs produced by the reducers are the outputs.]
Reducer Capacity
• Each value provided by a mapper has a size (the input size)
• Reducer capacity: an upper bound on the sum of the sizes of the values assigned to a reducer
• Example: the reducer capacity may be the size of the main memory of the processor on which the reducer runs
We consider two special matching problems
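The capacity constraint amounts to a per-reducer sum check. In this sketch, the function names, input names, sizes, and the capacity `q = 8` are all hypothetical and chosen only for illustration.

```python
def reducer_load(sizes: dict[str, int], inputs: list[str]) -> int:
    # Total size of the values assigned to one reducer.
    return sum(sizes[i] for i in inputs)


def respects_capacity(sizes: dict[str, int], reducers: list[list[str]], q: int) -> bool:
    # Every reducer's load must be at most the reducer capacity q.
    return all(reducer_load(sizes, r) <= q for r in reducers)


sizes = {"in1": 3, "in2": 4, "in3": 5}  # hypothetical input sizes
print(respects_capacity(sizes, [["in1", "in2"], ["in1", "in3"]], 8))  # True: loads 7 and 8
print(respects_capacity(sizes, [["in2", "in3"]], 8))  # False: load 9
```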
State-of-the-Art
• F. Afrati, A. D. Sarma, S. Salihoglu, and J. D. Ullman, "Upper and Lower Bounds on the Cost of a Map-Reduce Computation," PVLDB, 2013.
• Unit input size
• Reducer size
– The maximum number of inputs that a given reducer can have
Problem Statement
• The communication cost between the map and the reduce phases is a significant factor
• How can we reduce the communication cost?
– Use fewer reducers, so that inputs are replicated to fewer reducers and the communication cost is smaller
– How do we minimize the total number of reducers while respecting their limited capacity?
• Not an easy task
– All-to-All mapping schema problem
– X-to-Y mapping schema problem
[Figure: three inputs, one mapper per input, where every pair of inputs must meet at a reducer.
– Left: three reducers, k1 for (1, 2), k2 for (1, 3), and k3 for (2, 3); input1 is sent with keys k1 and k2, input2 with k1 and k3, and input3 with k2 and k3.
– Right: a single reducer k1 for (1, 2, 3); input1, input2, and input3 are all sent with key k1.]
Notation
ki: key
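The two schemas in the figure can be compared by counting what the mappers ship. This sketch assumes that the communication cost is the total size of all input copies sent to reducers, and it assumes unit input sizes; the names are hypothetical.

```python
def communication_cost(sizes: dict[str, int], schema: dict[str, list[str]]) -> int:
    # Each input is transmitted once to every reducer (key) it is assigned to,
    # so the cost is the total size of all transmitted copies.
    return sum(sizes[i] for inputs in schema.values() for i in inputs)


unit = {"input1": 1, "input2": 1, "input3": 1}  # assumed unit sizes
three_reducers = {
    "k1": ["input1", "input2"],
    "k2": ["input1", "input3"],
    "k3": ["input2", "input3"],
}
one_reducer = {"k1": ["input1", "input2", "input3"]}

print(communication_cost(unit, three_reducers))  # 6
print(communication_cost(unit, one_reducer))  # 3
```

Under these assumptions, the single reducer halves the traffic, but that is possible only when its capacity can hold all three inputs.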
A2A Mapping Schema Problem
• A set of inputs is given
• Each pair of inputs corresponds to one output
• Example
– Computing common friends
• Lists of friends of m persons are given
• Find the common friends of every pair of the given m persons
• Every two friend lists must be assigned to at least one common reducer
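Once two friend lists reach a common reducer, computing the output for that pair is a set intersection. This is a minimal sketch of the work one reducer might do over whatever friend lists it received; the function name and the persons and friends in the example are hypothetical.

```python
from itertools import combinations


def reducer_common_friends(received: dict[str, list[str]]) -> dict[tuple[str, str], list[str]]:
    # For every pair of persons whose friend lists reached this reducer,
    # output the sorted intersection of their two friend lists.
    return {
        (p, r): sorted(set(received[p]) & set(received[r]))
        for p, r in combinations(sorted(received), 2)
    }


received = {"p1": ["x", "y", "z"], "p2": ["y", "z"], "p3": ["z", "w"]}
print(reducer_common_friends(received))
```

A reducer holding k friend lists produces outputs for k(k-1)/2 pairs. The A2A problem asks that every pair is produced by at least one reducer.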
A2A Mapping Schema Problem
[Figure: four friend lists fl1 to fl4, one mapper per friend list; the reducer capacity is enough to hold some of the friend lists together.
– Reducer for k1 receives fl1, fl2, fl3 and covers the pairs (1, 2), (1, 3), (2, 3)
– Reducer for k2 receives fl1, fl2, fl4 and covers the pairs (1, 2), (1, 4), (2, 4)
– Reducer for k3 receives fl3, fl4 and covers the pair (3, 4)]
Notations
ki: key
fli: ith friend list
A2A Mapping Schema Problem
[Figure: the same four friend lists, but the reducer capacity is enough to hold all the friend lists together. fl1, fl2, fl3, and fl4 are all sent with key k1 to a single reducer, which covers all six pairs (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4).]
Notations
ki: key
fli: ith friend list
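Whether an assignment like the two above is a valid A2A mapping schema can be checked mechanically: every reducer must be within capacity, and every pair of inputs must share a reducer. In this sketch, the unit sizes and the capacities `q = 3` and `q = 4` are assumptions chosen to match the figures' captions ("some" vs. "all" friend lists fit together).

```python
from itertools import combinations


def is_valid_a2a(sizes: dict[str, int], reducers: list[list[str]], q: int) -> bool:
    # Capacity: the total size of the inputs at each reducer must not exceed q.
    if any(sum(sizes[i] for i in r) > q for r in reducers):
        return False
    # Coverage: every pair of inputs must appear together in some reducer.
    covered: set[tuple[str, str]] = set()
    for r in reducers:
        covered.update(combinations(sorted(r), 2))
    required = set(combinations(sorted(sizes), 2))
    return required <= covered


unit = {f"fl{i}": 1 for i in range(1, 5)}  # assumed unit-size friend lists
three_reducers = [["fl1", "fl2", "fl3"], ["fl1", "fl2", "fl4"], ["fl3", "fl4"]]
one_reducer = [["fl1", "fl2", "fl3", "fl4"]]

print(is_valid_a2a(unit, three_reducers, q=3))  # True
print(is_valid_a2a(unit, one_reducer, q=4))  # True
print(is_valid_a2a(unit, one_reducer, q=3))  # False: capacity exceeded
```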
A2A Mapping Schema Problem
• What to do?
– Assign the given m inputs to the given number z of reducers, without exceeding the reducer capacity q, so that every input shares at least one reducer with every other input
• Polynomial-time solution for one and two reducers
• NP-hard for z > 2 reducers
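The problem statement above can be written compactly. The notation w_i (size of input i) and R_k (set of inputs assigned to reducer k) is introduced here for illustration and is not taken from the slides:

```latex
\begin{aligned}
&\text{Given sizes } w_1, \dots, w_m,\ \text{capacity } q,\ \text{and } z \text{ reducers, find } R_1, \dots, R_z \subseteq \{1, \dots, m\} \text{ such that}\\
&\sum_{i \in R_k} w_i \le q \quad (1 \le k \le z), \qquad
\forall\, 1 \le i < j \le m \;\; \exists\, k : \{i, j\} \subseteq R_k .
\end{aligned}
```

For z = 1, this reduces to checking whether w_1 + ... + w_m ≤ q.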
Heuristics for A2A Mapping Schema Problem
• Based on
– First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithms
– A pseudo-polynomial bin-packing algorithm*
– 2-step algorithms
– The selection of a prime number p
• A fixed reducer capacity is given
*D. R. Karger and J. Scott. Efficient algorithms for fixed-precision instances of bin packing and Euclidean TSP. In APPROX-RANDOM, pages 104–117, 2008.
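The slides do not spell out how bin packing yields an A2A schema. One simple construction is sketched below, and it should be read as an illustration rather than as the authors' exact algorithm. First, pack the inputs with First-Fit Decreasing into bins of size q/2. Second, create one reducer per pair of bins. Any two inputs then share a reducer, and each reducer holds at most q/2 + q/2 = q. The sketch assumes that every input has size at most q/2.

```python
from itertools import combinations


def first_fit_decreasing(sizes: dict[str, float], bin_capacity: float) -> list[list[str]]:
    # Place inputs in decreasing size order into the first bin with room.
    bins: list[list[str]] = []
    loads: list[float] = []
    for name in sorted(sizes, key=lambda n: sizes[n], reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[name] <= bin_capacity:
                bins[b].append(name)
                loads[b] += sizes[name]
                break
        else:
            bins.append([name])
            loads.append(sizes[name])
    return bins


def a2a_ffd(sizes: dict[str, float], q: float) -> list[list[str]]:
    if sum(sizes.values()) <= q:
        return [list(sizes)]  # a single reducer holds everything
    if max(sizes.values()) > q / 2:
        raise ValueError("this sketch assumes every input has size at most q/2")
    bins = first_fit_decreasing(sizes, q / 2)
    # One reducer per pair of bins covers every pair of inputs.
    return [b1 + b2 for b1, b2 in combinations(bins, 2)]
```

With x bins, this construction uses x(x-1)/2 reducers. For hypothetical sizes `{"a": 4, "b": 3, "c": 3, "d": 2, "e": 2}` and `q = 8`, FFD produces the four bins `[a], [b], [c], [d, e]` and therefore six reducers.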
X2Y Mapping Schema Problem
• Two disjoint sets X and Y are given
• Each pair of elements ⟨xi, yj⟩ (where xi ∈ X, yj ∈ Y, ∀i, j) of the sets X and Y corresponds to one output
• Example
– Skew join
• Two relations X(A, B) and Y(B, C) are given, where many tuples have a common "b" value
• Every pair of tuples (one from X, one from Y) with an identical "b" value must be assigned to at least one common reducer
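The skew-join example becomes an X2Y instance once the tuples are grouped by their join value. Here is a minimal sketch, with hypothetical function names and data: for each "b" value present in both relations, the X-tuples and Y-tuples sharing it form the two sets, and each of the |X_b| · |Y_b| pairs yields one join output.

```python
from collections import defaultdict


def x2y_instances(x_rel: list[tuple], y_rel: list[tuple]) -> dict:
    # Group X(A, B) and Y(B, C) tuples by their B value; each B value that
    # occurs in both relations defines one X2Y instance.
    xs: dict = defaultdict(list)
    ys: dict = defaultdict(list)
    for a, b in x_rel:
        xs[b].append((a, b))
    for b, c in y_rel:
        ys[b].append((b, c))
    return {b: (xs[b], ys[b]) for b in xs.keys() & ys.keys()}


def join_outputs(instance: tuple) -> list[tuple]:
    # Every (x, y) pair of the instance produces one joined tuple (a, b, c).
    x_side, y_side = instance
    return [(a, b, c) for (a, b) in x_side for (_, c) in y_side]


X = [(1, "b1"), (2, "b1"), (3, "b2")]
Y = [("b1", "u"), ("b1", "v"), ("b1", "w"), ("b3", "z")]
instances = x2y_instances(X, Y)
print(len(join_outputs(instances["b1"])))  # 6 = 2 X-tuples * 3 Y-tuples
```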
X2Y Mapping Schema Problem
• What to do?
– Assign each input of the set X together with each input of the set Y to at least one common reducer, without exceeding q
• Polynomial for one reducer
– Can all the inputs of the sets X and Y be assigned to a single reducer?
• NP-hard for z > 1 reducers
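Analogously to A2A, the X2Y problem can be stated formally. The notation w(v) (size of input v) and R_k (set of inputs at reducer k) is introduced here for illustration:

```latex
\begin{aligned}
&\text{Given sizes } w(v) \text{ for } v \in X \cup Y,\ \text{capacity } q,\ \text{and } z \text{ reducers, find } R_1, \dots, R_z \subseteq X \cup Y \text{ such that}\\
&\sum_{v \in R_k} w(v) \le q \quad (1 \le k \le z), \qquad
\forall\, x \in X,\ y \in Y \;\; \exists\, k : \{x, y\} \subseteq R_k .
\end{aligned}
```

For z = 1, the single-reducer question on the slide is whether the sum of w(v) over all v in X ∪ Y is at most q.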
Heuristics for X2Y Mapping Schema Problem
• Based on
– First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithms
• A fixed reducer capacity is given
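One plausible way to apply FFD to X2Y is sketched below as an illustration, not as the authors' exact heuristic. Pack X and Y separately into bins of size q/2, then create one reducer for each (X-bin, Y-bin) pair. Every x then meets every y, and no reducer exceeds q. The sketch assumes that every input has size at most q/2.

```python
from itertools import product


def first_fit_decreasing(sizes: dict[str, float], bin_capacity: float) -> list[list[str]]:
    # Place inputs in decreasing size order into the first bin with room.
    bins: list[list[str]] = []
    loads: list[float] = []
    for name in sorted(sizes, key=lambda n: sizes[n], reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[name] <= bin_capacity:
                bins[b].append(name)
                loads[b] += sizes[name]
                break
        else:
            bins.append([name])
            loads.append(sizes[name])
    return bins


def x2y_ffd(x_sizes: dict[str, float], y_sizes: dict[str, float], q: float) -> list[list[str]]:
    all_sizes = list(x_sizes.values()) + list(y_sizes.values())
    if sum(all_sizes) <= q:
        return [list(x_sizes) + list(y_sizes)]  # a single reducer suffices
    if max(all_sizes) > q / 2:
        raise ValueError("this sketch assumes every input has size at most q/2")
    x_bins = first_fit_decreasing(x_sizes, q / 2)
    y_bins = first_fit_decreasing(y_sizes, q / 2)
    # One reducer per (X-bin, Y-bin) pair covers every (x, y) pair.
    return [xb + yb for xb, yb in product(x_bins, y_bins)]
```

For hypothetical sizes X = `{"x1": 3, "x2": 2, "x3": 2}`, Y = `{"y1": 4, "y2": 1, "y3": 1}` and `q = 8`, each side packs into two bins, which gives 2 × 2 = 4 reducers.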
Conclusion
• Reducer capacity
– An important parameter to be considered in all MapReduce algorithms
– The capacity is expressed in terms of memory size; the inputs are not necessarily of identical size, and their sizes may include auxiliary data augmented and added to the index of the data item(s)
• Two assignment schemas of MapReduce are considered
– All-to-All (A2A) mapping schema problem
– X-to-Y (X2Y) mapping schema problem
• Several heuristics for the A2A and X2Y mapping schema problems are provided
Presentation is available at 
http://www.cs.bgu.ac.il/~sharmas/publication.html 
Foto Afrati1, Shlomi Dolev2, Ephraim Korach3, 
Shantanu Sharma2, and Jeffrey D. Ullman4 
1 School of Electrical and Computing Engineering, National Technical 
University of Athens, Greece 
afrati@softlab.ece.ntua.gr 
2 Department of Computer Science, Ben-Gurion University of the 
Negev, Israel 
{dolev,sharmas}@cs.bgu.ac.il 
3 Department of Industrial Engineering and Management, Ben-Gurion 
University of the Negev, Israel 
korach@bgu.ac.il 
4 Department of Computer Science, Stanford University, USA 
ullman@cs.stanford.edu

More Related Content

What's hot

Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
Chirag Ahuja
 
main
mainmain
Main map reduce
Main map reduceMain map reduce
Main map reduce
Masoumeh Rezaei Jam
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Daniel Lemire
 
Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimization
Sally Salem
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
ijccsa
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local names
Varsha Kumar
 
3D-DRESD Polaris
3D-DRESD Polaris3D-DRESD Polaris
3D-DRESD Polaris
Marco Santambrogio
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
NECST Lab @ Politecnico di Milano
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
Zubair Nabi
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
Gopi Saiteja
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
Subhas Kumar Ghosh
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
VNIT-ACM Student Chapter
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithm
iosrjce
 
Run time administration
Run time administrationRun time administration
Run time administration
Arjun Srivastava
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
Qian Lin
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
Tien-Yang (Aiden) Wu
 
Chapter 7 Run Time Environment
Chapter 7   Run Time EnvironmentChapter 7   Run Time Environment
Chapter 7 Run Time Environment
Radhakrishnan Chinnusamy
 

What's hot (20)

Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
main
mainmain
main
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...
 
Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimization
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local names
 
3D-DRESD Polaris
3D-DRESD Polaris3D-DRESD Polaris
3D-DRESD Polaris
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithm
 
Run time administration
Run time administrationRun time administration
Run time administration
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
 
Chapter 7 Run Time Environment
Chapter 7   Run Time EnvironmentChapter 7   Run Time Environment
Chapter 7 Run Time Environment
 

Similar to Assignment of Different-Sized Inputs in MapReduce

MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
BDC-presentation
BDC-presentationBDC-presentation
BDC-presentation
Pavel Popa
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Leonidas Akritidis
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
Romain Jacotin
 
MapReduce
MapReduceMapReduce
MapReduce
KavyaGo
 
poster
posterposter
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
Vibrant Technologies & Computers
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Apex
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
Sandeep Deshmukh
 
Intro to Map Reduce
Intro to Map ReduceIntro to Map Reduce
Intro to Map Reduce
Doron Vainrub
 
MapReduce
MapReduceMapReduce
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
AnandMHadoop
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
Francisco Pérez-Sorrosal
 
DFA minimization algorithms in map reduce
DFA minimization algorithms in map reduceDFA minimization algorithms in map reduce
DFA minimization algorithms in map reduce
Iraj Hedayati
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
NAVER Engineering
 
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Matthew Lease
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Shantanu Sharma
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Mohamed Ali Mahmoud khouder
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
GiannisPagges
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
CheeWeiTan10
 

Similar to Assignment of Different-Sized Inputs in MapReduce (20)

MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
BDC-presentation
BDC-presentationBDC-presentation
BDC-presentation
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
MapReduce
MapReduceMapReduce
MapReduce
 
poster
posterposter
poster
 
Hadoop classes in mumbai
Hadoop classes in mumbaiHadoop classes in mumbai
Hadoop classes in mumbai
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Intro to Map Reduce
Intro to Map ReduceIntro to Map Reduce
Intro to Map Reduce
 
MapReduce
MapReduceMapReduce
MapReduce
 
Session 19 - MapReduce
Session 19  - MapReduce Session 19  - MapReduce
Session 19 - MapReduce
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
 
DFA minimization algorithms in map reduce
DFA minimization algorithms in map reduceDFA minimization algorithms in map reduce
DFA minimization algorithms in map reduce
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
Lecture 3: Data-Intensive Computing for Text Analysis (Fall 2011)
 
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce ComputationsMeta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
Meta-MapReduce- A Technique for Reducing Communication in MapReduce Computations
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 

More from Shantanu Sharma

Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
Shantanu Sharma
 
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation QueriesOBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
Shantanu Sharma
 
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Shantanu Sharma
 
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Shantanu Sharma
 
Private and secure secret shared map reduce
Private and secure secret shared map reducePrivate and secure secret shared map reduce
Private and secure secret shared map reduce
Shantanu Sharma
 
A Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile CommunicationA Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile Communication
Shantanu Sharma
 
On Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio NetworksOn Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio Networks
Shantanu Sharma
 
Bounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduceBounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduce
Shantanu Sharma
 
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Shantanu Sharma
 

More from Shantanu Sharma (9)

Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
 
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation QueriesOBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
OBSCURE: Information Theoretic Oblivious and Verifiable Aggregation Queries
 
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
Verifiable Round-Robin Scheme for Smart Homes (CODASPY 2019)
 
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- I...
 
Private and secure secret shared map reduce
Private and secure secret shared map reducePrivate and secure secret shared map reduce
Private and secure secret shared map reduce
 
A Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile CommunicationA Survey on 5G: The Next Generation of Mobile Communication
A Survey on 5G: The Next Generation of Mobile Communication
 
On Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio NetworksOn Detecting Termination in Cognitive Radio Networks
On Detecting Termination in Cognitive Radio Networks
 
Bounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduceBounds for overlapping interval join on MapReduce
Bounds for overlapping interval join on MapReduce
 
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
Self-Stabilizing End-to-End Communication in Bounded Capacity, Omitting, D...
 

Recently uploaded

Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
PsychoTech Services
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
Vineet
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
zoykygu
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
Vineet
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
frp60658
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 

Recently uploaded (20)

Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
一比一原版(heriotwatt学位证书)英国赫瑞瓦特大学毕业证如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 

Assignment of Different-Sized Inputs in MapReduce

  • 1. 28th International Symposium on Distributed Computing (DISC 2014) Austin, Texas, USA (12-15 October 2014) Assignment of Different-Sized Inputs in MapReduce Shantanu Sharma2 joint work with Foto N. Afrati1, Shlomi Dolev2, Ephraim Korach2, and Jeffrey D. Ullman3 1 National Technical University of Athens, Greece 2 Ben-Gurion University of the Negev, Israel 3 Stanford University, USA
  • 2. Introduction • Cluster Computing – Terabytes or Petabytes amount of data cannot be processed on a single computer – Cluster of computers – How to mask failures, e.g., hardware failures • MapReduce is a programming model used for parallel processing over large-scale data 2
  • 3. Introduction: a MapReduce job consists of a Map Phase and a Reduce Phase. [Figure: the master process forks worker processes. In the Map Phase, workers read the input data chunks (Chunk 0, Chunk 1, Chunk 2) and write their intermediate results locally. In the Reduce Phase, workers remotely read and sort these results and write Output File 0 and Output File 1.] Map Phase: applies a user-defined Map function. Reduce Phase: applies a user-defined Reduce function.
  • 4. Introduction: MapReduce working example – Word Count. The input “I like apple. Apple is fruit.” is processed by Mapper 1, which emits (I, 1), (like, 1), (apple, 2), (is, 1), (fruit, 1); the input “I like banana.” is processed by Mapper 2, which emits (I, 1), (like, 1), (banana, 1). There is one reducer per word (I, like, apple, is, fruit, banana), and the reducers output (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1).
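The word-count example on slide 4 can be simulated in a few lines of Python. This is a minimal sketch, not the presenters' code: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, words are lowercased (so the key I appears as `i`), and each mapper sums repeated words locally, which mirrors Mapper 1 emitting (apple, 2).

```python
from collections import defaultdict


def map_phase(split):
    """Emit (word, count) pairs for one input split.

    Repeated words within the split are summed locally (a combiner step),
    which is why Mapper 1 emits (apple, 2).
    """
    local = defaultdict(int)
    for token in split.split():
        word = token.strip(".,").lower()  # "Apple" and "apple." -> "apple"
        if word:
            local[word] += 1
    return list(local.items())


def shuffle(mapper_outputs):
    """Group values by key so that each reducer receives all values for one key."""
    groups = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """The reducer for `key` sums the partial counts it receives."""
    return key, sum(values)


splits = ["I like apple. Apple is fruit.", "I like banana."]  # Mapper 1, Mapper 2
mapper_outputs = [map_phase(s) for s in splits]
result = dict(reduce_phase(k, vs) for k, vs in shuffle(mapper_outputs).items())
print(result)
# prints {'i': 2, 'like': 2, 'apple': 2, 'is': 1, 'fruit': 1, 'banana': 1}
```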
  • 5. Inputs and outputs in our context: in the same word-count example, the inputs are the key-value pairs that the mappers send to the reducers (e.g., (apple, 2) from Mapper 1), and the outputs are the pairs that the reducers produce (e.g., (I, 2)).
  • 6. Reducer Capacity • The values provided by each mapper have sizes (input sizes) • Reducer capacity: an upper bound on the sum of the sizes of the values that are assigned to a reducer • Example: the reducer capacity may be the size of the main memory of the processors on which the reducers run • We consider two special matching problems
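To make the capacity constraint concrete, the following sketch checks whether a given assignment of inputs to reducers respects a capacity q. The helper `respects_capacity` and the sizes used below are hypothetical; sizes can be in any unit (for example, megabytes) as long as q uses the same unit.

```python
def respects_capacity(assignment, sizes, q):
    """Return True if no reducer's assigned inputs exceed capacity q.

    assignment: list of reducers, each a collection of input indices
    sizes: sizes[i] is the size of input i
    q: reducer capacity, in the same unit as sizes
    """
    return all(sum(sizes[i] for i in reducer) <= q for reducer in assignment)


sizes = [3, 2, 2, 1]
print(respects_capacity([{0, 1}, {2, 3}], sizes, q=5))  # True: loads 5 and 3
print(respects_capacity([{0, 1, 2}], sizes, q=5))       # False: load 7 > 5
```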
  • 7. State-of-the-Art • F. Afrati, A.D. Sarma, S. Salihoglu, and J.D. Ullman, “Upper and Lower Bounds on the Cost of a Map-Reduce Computation,” PVLDB, 2013. • Assumes unit input size • Reducer size – the maximum number of inputs that a given reducer can have
  • 8. Problem Statement • The communication cost between the map and the reduce phases is a significant factor • How can we reduce the communication cost? – Fewer reducers, and hence a smaller communication cost – How can we minimize the total number of reducers while respecting their limited capacity? • Not an easy task – All-to-All mapping schema problem – X-to-Y mapping schema problem [Figure, notation ki: key. Left: three inputs; the mapper for the 1st input sends it to keys k1 and k2, the mapper for the 2nd input to k1 and k3, and the mapper for the 3rd input to k2 and k3, so the reducers for k1, k2, and k3 receive the pairs (1, 2), (1, 3), and (2, 3). Right: every mapper sends its input to k1 only, and the reducer for k1 receives (1, 2, 3).]
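The two schemas in the slide 8 figure can be compared with a simple cost model in which an input sent to r reducers is transmitted r times. This sketch assumes that all three inputs have unit size, which the slide does not state, and `communication_cost` is a hypothetical helper.

```python
def communication_cost(assignment, sizes):
    """Total size of all inputs sent from the map phase to the reducers.

    An input assigned to r reducers is transmitted r times, so it is
    counted once per reducer that receives it.
    """
    return sum(sizes[i] for reducer in assignment for i in reducer)


sizes = {1: 1, 2: 1, 3: 1}                   # three unit-size inputs (assumed)
three_reducers = [{1, 2}, {1, 3}, {2, 3}]    # keys k1, k2, k3 (left of figure)
one_reducer = [{1, 2, 3}]                    # single key k1 (right of figure)

print(communication_cost(three_reducers, sizes))  # 6
print(communication_cost(one_reducer, sizes))     # 3
```

The single-reducer schema halves the cost here, but it is only possible when one reducer can hold all three inputs, which is exactly the tension the mapping schema problems address.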
  • 9. A2A Mapping Schema Problem • A set of inputs is given • Each pair of inputs corresponds to one output • Example – Computing common friends • Lists of friends of m persons are given • Find the common friends of every pair of the m persons • Every two friend lists must be assigned to at least one common reducer
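In the common-friends example, each reducer receives some friend lists and must report, for every pair of lists it holds, the friends they share. A minimal sketch of that reducer logic, assuming friend lists are Python sets keyed by a hypothetical person id (the names and data below are made up):

```python
from itertools import combinations


def reducer_common_friends(friend_lists):
    """Reducer logic: for every pair of friend lists it holds, output the common friends.

    friend_lists: dict mapping person id -> set of that person's friends
    """
    return {
        (a, b): friend_lists[a] & friend_lists[b]
        for a, b in combinations(sorted(friend_lists), 2)
    }


lists = {1: {"ann", "bob", "cy"}, 2: {"bob", "cy", "dee"}, 3: {"cy", "eve"}}
for pair, common in reducer_common_friends(lists).items():
    print(pair, sorted(common))
# (1, 2) ['bob', 'cy']
# (1, 3) ['cy']
# (2, 3) ['cy']
```

If two lists meet in more than one reducer, each of those reducers emits the pair; removing such duplicates is not shown.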
  • 10. A2A Mapping Schema Problem [Figure, notations ki: key, fli: ith friend list. The reducer capacity is enough to hold some of the friend lists together. The mapper for the 1st friend sends fl1 to k1 and k2, the mapper for the 2nd friend sends fl2 to k1 and k2, the mapper for the 3rd friend sends fl3 to k1 and k3, and the mapper for the 4th friend sends fl4 to k2 and k3. The reducers for k1, k2, and k3 receive the lists (1, 2, 3), (1, 2, 4), and (3, 4), covering all required pairs: 1, 2; 1, 3; 2, 3; 1, 4; 2, 4; 3, 4.]
  • 11. A2A Mapping Schema Problem [Figure, same notations. The reducer capacity is enough to hold all the friend lists together: the mappers for the 1st to 4th friends send fl1, fl2, fl3, and fl4 to k1, and the single reducer for k1 receives (1, 2, 3, 4), covering all pairs: 1, 2; 1, 3; 1, 4; 2, 3; 2, 4; 3, 4.]
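Whether an assignment is a valid A2A mapping schema can be checked by collecting the pairs each reducer covers. The sketch below, with the hypothetical helper `covers_all_pairs`, verifies the two schemas from slides 10 and 11 and one schema that fails. Reducer capacity is not checked here.

```python
from itertools import combinations


def covers_all_pairs(assignment, inputs):
    """Check the A2A condition: every pair of inputs shares at least one reducer."""
    covered = set()
    for reducer in assignment:
        covered.update(combinations(sorted(reducer), 2))
    return all(pair in covered for pair in combinations(sorted(inputs), 2))


friend_lists = [1, 2, 3, 4]
some_fit = [{1, 2, 3}, {1, 2, 4}, {3, 4}]  # keys k1, k2, k3 (slide 10)
all_fit = [{1, 2, 3, 4}]                   # key k1 (slide 11)

print(covers_all_pairs(some_fit, friend_lists))             # True
print(covers_all_pairs(all_fit, friend_lists))              # True
print(covers_all_pairs([{1, 2, 3}, {3, 4}], friend_lists))  # False: (1, 4), (2, 4) missing
```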
  • 12. A2A Mapping Schema Problem • What to do? – Assign the m given inputs to the given number of reducers, without exceeding the reducer capacity q, so that every input shares at least one reducer with every other input • Polynomial-time solution for one and two reducers • NP-hard for z > 2 reducers
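One way to see why one and two reducers are easy (our own short argument, not taken from the slides): with two reducers R1 and R2, an input placed only in R1 and an input placed only in R2 never meet, so at least one reducer must hold every input. Two reducers therefore help no more than one, and both cases reduce to checking whether all inputs fit into a single reducer of capacity q. The function name below is hypothetical.

```python
def a2a_feasible_with_at_most_two_reducers(sizes, q):
    """Decide whether an A2A schema exists with one or two reducers of capacity q.

    With two reducers, inputs found only in R1 and inputs found only in R2
    would never meet, so one of the reducers must contain every input.
    Hence feasibility for z = 1 and z = 2 is the same test: do all inputs fit?
    """
    return sum(sizes) <= q


print(a2a_feasible_with_at_most_two_reducers([2, 2, 1], 5))  # True
print(a2a_feasible_with_at_most_two_reducers([3, 3], 5))     # False
```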
  • 13. Heuristics for A2A Mapping Schema Problem • Based on – the First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithm – a pseudo-polynomial bin-packing algorithm* – 2-step algorithms – the selection of a prime number p • A fixed reducer capacity is given *D. R. Karger and J. Scott. Efficient algorithms for fixed-precision instances of bin packing and Euclidean TSP. In APPROX-RANDOM, pages 104–117, 2008.
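The slides do not spell out the bin-packing heuristics. The sketch below shows one natural way to build an A2A schema from First-Fit Decreasing, under the assumption that every input size is at most q/2: pack the inputs into bins of capacity q/2, then give every pair of bins its own reducer. It is an illustration, not necessarily the authors' exact algorithm, and all names are hypothetical; replacing the first-fit rule with best-fit would give a BFD variant.

```python
from itertools import combinations


def first_fit_decreasing(sizes, capacity):
    """Pack input indices into bins of the given capacity, largest input first."""
    bins, loads = [], []
    for i in sorted(range(len(sizes)), key=sizes.__getitem__, reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:  # first bin with enough room
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:  # no existing bin fits: open a new one
            bins.append([i])
            loads.append(sizes[i])
    return bins


def a2a_via_bin_pairs(sizes, q):
    """Sketch of an FFD-based A2A schema, assuming every size is at most q/2."""
    if sum(sizes) <= q:
        return [set(range(len(sizes)))]  # everything fits in one reducer
    if max(sizes) > q / 2:
        raise ValueError("this sketch assumes every input size is at most q/2")
    bins = first_fit_decreasing(sizes, q / 2)
    # Two bins of load <= q/2 fit together in one reducer of capacity q,
    # and every two inputs lie in some pair of bins (or in the same bin).
    return [set(a) | set(b) for a, b in combinations(bins, 2)]


schema = a2a_via_bin_pairs([4, 3, 3, 2, 2, 1], q=10)
print([sorted(r) for r in schema])  # [[0, 1, 3, 5], [0, 2, 4, 5], [1, 2, 3, 4]]
```

With x bins, this construction uses x(x - 1)/2 reducers, and each reducer's load is at most q/2 + q/2 = q.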
  • 14. X2Y Mapping Schema Problem • Two disjoint sets X and Y are given • Each pair of elements xi, yj (where xi ∈ X, yj ∈ Y, for all i, j) corresponds to one output • Example – Skew join • Two relations X(A, B) and Y(B, C) are given, where many tuples share a common B value b • Every pair of tuples, one from X and one from Y, with an identical b value must be assigned to at least one common reducer
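A skew join becomes an X2Y instance for each heavily shared B value: every X tuple with that value must meet every Y tuple with that value. The sketch below uses made-up relations and a hypothetical helper `x2y_pairs_per_b` to list, per B value, the tuple pairs that must share a reducer.

```python
from collections import defaultdict


def x2y_pairs_per_b(X, Y):
    """For each B value, list the (x, y) tuple pairs that must share a reducer.

    X holds (a, b) tuples of X(A, B); Y holds (b, c) tuples of Y(B, C).
    Every such pair produces exactly one joined output (a, b, c).
    """
    xs, ys = defaultdict(list), defaultdict(list)
    for a, b in X:
        xs[b].append((a, b))
    for b, c in Y:
        ys[b].append((b, c))
    return {b: [(x, y) for x in xs[b] for y in ys[b]] for b in xs if b in ys}


X = [("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b2")]
Y = [("b1", "c1"), ("b1", "c2"), ("b2", "c3")]
pairs = x2y_pairs_per_b(X, Y)
print({b: len(p) for b, p in pairs.items()})  # {'b1': 6, 'b2': 1}
```

Here b1 is the skewed value: its 3 X tuples and 2 Y tuples give 6 pairs, and if those 5 tuples do not fit into one reducer, they must be spread over several reducers so that every pair still meets.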
  • 15. X2Y Mapping Schema Problem • What to do? – Assign each input of the set X together with each input of the set Y to at least one common reducer, without exceeding q • Polynomial for one reducer – Can all the inputs of the sets X and Y be assigned to a single reducer? • NP-hard for z > 1 reducers
  • 16. Heuristics for X2Y Mapping Schema Problem • Based on – the First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithm • A fixed reducer capacity is given
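A bin-packing heuristic for X2Y can follow the same idea as for A2A: pack X and Y separately, then pair every X-bin with every Y-bin. This sketch assumes an even split of the capacity (X-bins and Y-bins of capacity q/2 each) and that every input size is at most q/2; the slides do not state these details, and all names are hypothetical. It first tries the single-reducer case from slide 15.

```python
def first_fit_decreasing(items, capacity):
    """Pack (name, size) items into bins of the given capacity, largest first."""
    bins, loads = [], []
    for name, size in sorted(items, key=lambda item: item[1], reverse=True):
        for b in range(len(bins)):
            if loads[b] + size <= capacity:  # first bin with enough room
                bins[b].append(name)
                loads[b] += size
                break
        else:  # no existing bin fits: open a new one
            bins.append([name])
            loads.append(size)
    return bins


def x2y_via_bins(x_sizes, y_sizes, q):
    """Sketch of a bin-packing-based X2Y schema, assuming every size is at most q/2.

    x_sizes, y_sizes: dicts mapping input name -> size.
    Returns a list of reducers, each a set of input names.
    """
    if sum(x_sizes.values()) + sum(y_sizes.values()) <= q:
        return [set(x_sizes) | set(y_sizes)]  # everything fits in one reducer
    if max(list(x_sizes.values()) + list(y_sizes.values())) > q / 2:
        raise ValueError("this sketch assumes every input size is at most q/2")
    x_bins = first_fit_decreasing(x_sizes.items(), q / 2)
    y_bins = first_fit_decreasing(y_sizes.items(), q / 2)
    # One reducer per (X-bin, Y-bin) pair: its load is at most q/2 + q/2 = q,
    # and every x meets every y in the reducer for their two bins.
    return [set(xb) | set(yb) for xb in x_bins for yb in y_bins]


schema = x2y_via_bins({"x1": 3, "x2": 2, "x3": 2}, {"y1": 3, "y2": 1}, q=8)
print([sorted(r) for r in schema])  # [['x1', 'y1', 'y2'], ['x2', 'x3', 'y1', 'y2']]
```

An uneven split, such as X-bins of capacity b and Y-bins of capacity q - b, may suit instances in which one set has much larger inputs than the other.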
  • 17. Conclusion • Reducer capacity – An important parameter to be considered in all MapReduce algorithms – The capacity is expressed in terms of memory size; the inputs are not necessarily of identical size, and each input's auxiliary size information is attached to the index of its data item(s) • Two assignment schemas of MapReduce are given – All-to-All (A2A) mapping schema problem – X-to-Y (X2Y) mapping schema problem • Several heuristics for the A2A and X2Y mapping schema problems are provided
  • 18. The presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html. Foto Afrati¹, Shlomi Dolev², Ephraim Korach³, Shantanu Sharma², and Jeffrey D. Ullman⁴. ¹ School of Electrical and Computing Engineering, National Technical University of Athens, Greece, afrati@softlab.ece.ntua.gr; ² Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il; ³ Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Israel, korach@bgu.ac.il; ⁴ Department of Computer Science, Stanford University, USA, ullman@cs.stanford.edu