Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu
Distributed Systems 1
CUCS Course 4113
https://systems.cs.columbia.edu/ds1-class/
Instructor: Roxana Geambasu
MapReduce
Outline
• Today: MapReduce
– Analytics programming interface
– Word count example
– Chaining
– Reading/writing from/to HDFS
– Dealing with failures
• In future: Spark
– Motivation
– Resilient Distributed Datasets (RDDs)
– Programming interface
– Transformations and actions
Large-scale Analytics
• Compute the frequency of words in a corpus of documents.
• Count how many times users have clicked on each of a (large) set of ads.
• PageRank: Compute the “importance” of a web page based on the “importances” of the pages that link to it.
• …
Option 1: SQL
• Before MapReduce, analytics mostly done in SQL, or manually.
• Example: Count word appearances in a corpus of documents.
• With SQL, the rough query might be:
• Very expressive, convenient to program, but no one knew how
to scale SQL execution!
SELECT word, COUNT(*)
FROM (
  SELECT UNNEST(string_to_array(doc_content, ' ')) AS word
  FROM Corpus
) AS words
GROUP BY word;
Option 2: Manual
• Example: Count word appearances in a corpus of documents.
• Phase 1: Assign documents to different machines/nodes.
– Each computes a dictionary: {word: local_freq}.
• Phase 2: Nodes exchange dictionaries (how?) to aggregate local_freq’s.
– But how to make this scale??
[Figure: documents assigned across machines M1…Mn connected by a network.]
Option 2: Manual (cont’d)
• Phase 2, Option a: Send all {word: local_freq} dictionaries to one node, which aggregates.
– But what if it’s too much data for one node?
• Phase 2, Option b: Each node sends (word, local_freq) to a designated node, e.g., the node with ID hash(word) % N.
We’ve roughly discovered an app-specific version of MapReduce!
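Option b’s routing rule can be sketched in a few lines of Python. This is an illustrative sketch only: the node count N and the sample dictionary are made up, and zlib.crc32 stands in for a stable hash (Python’s built-in hash() is randomized across runs, so it would not work here).

```python
from zlib import crc32

N = 4  # number of nodes (illustrative)

def owner(word):
    # Every node computes the same owner for the same word,
    # so all partial counts for a word converge on one node.
    return crc32(word.encode()) % N

# One node's local dictionary from Phase 1:
local = {"the": 3, "store": 2, "opens": 1}

# Messages this node would send: (destination node, word, local_freq).
messages = [(owner(w), w, f) for w, f in local.items()]
```

Because the hash is deterministic, no coordination is needed to agree on which node aggregates which word.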
Option 2: Challenges
• How to generalize to other applications?
• How to deal with failures?
• How to deal with slow nodes?
• How to deal with load balancing (some docs are very large, others small)?
• How to deal with skew (some words are very frequent, so nodes designated to aggregate them will be pounded)?
• …
MapReduce
• Parallelizable programming model:
– Applies to a broad class of analytics applications.
– Isn’t as expressive as SQL but it is easier to scale.
– Consists of two phases, each intrinsically parallelizable:
• Map: processes input elements independently to emit relevant (key, value) pairs from each.
• Transparently, the system groups all the values for each key together: (key, [list of values]).
• Reduce: aggregates all the values for each key to emit a global value for each key.
• Scalable, efficient, fault-tolerant runtime system.
– Addresses the preceding challenges (and more).
• Technical paper: please read it!
How it works
• Input: a collection of elements of (key, value) pair type.
• Programmer defines two functions:
– Map(key, value) → 0, 1, or more (key’, value’) pairs
– Reduce(key, value-list) → output
• Execution:
– Apply Map to each input key-value pair, in parallel for different keys.
– Sort emitted (key’, value’) pairs to produce (key’, value-list) pairs.
– Apply Reduce to each (key’, value-list) pair, in parallel for different keys.
• Output is the union of all Reduce invocations’ outputs.
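The execution steps above can be simulated in a few lines of Python. This is a single-process toy, not the real distributed runtime; the function run_mapreduce and the ad-click example are illustrative, not from the paper.

```python
from itertools import groupby

def run_mapreduce(map_fn, reduce_fn, inputs):
    # Map: apply map_fn to every input (key, value) pair.
    intermediate = []
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    # Shuffle/sort: group all values emitted for the same key.
    intermediate.sort(key=lambda kv: kv[0])
    grouped = [(k, [v for _, v in grp])
               for k, grp in groupby(intermediate, key=lambda kv: kv[0])]
    # Reduce: aggregate the value list for each key
    # (done in parallel across keys in a real system).
    return [reduce_fn(k, vs) for k, vs in grouped]

# Example: count clicks per ad (one of the analytics tasks above).
clicks = [("log1", ["adA", "adB", "adA"]), ("log2", ["adB", "adA"])]
map_fn = lambda log_id, ads: [(ad, 1) for ad in ads]
reduce_fn = lambda ad, counts: (ad, sum(counts))
print(run_mapreduce(map_fn, reduce_fn, clicks))
# [('adA', 3), ('adB', 2)]
```

Note that the parallelism is implicit in the structure: Map calls touch one input pair each, and Reduce calls touch one key’s values each.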
Workflow
[Figure: Input data (Splits 0–2, in a DFS) is read by three Mapper workers. Map phase: extract something you care about from each record. Mappers write intermediate data to local disk (Tmp data, local FS); the system does remote reads and sorts the data. Reduce phase: aggregate. Two Reducer workers write Output Files 0 and 1 (Output data, in a DFS).]
Example: Word count
• We have a directory, which contains many documents.
• The documents contain words separated by whitespace and punctuation.
• Goal: Count the number of times each distinct word appears across the documents.
Word count with MapReduce
Map(key, value):
// key: document ID; value: document content
FOR (each word w IN value)
emit(w, 1);
Reduce(key, value-list):
// key: a word; value-list: a list of integers
result = 0;
FOR (each integer v IN value-list)
result += v;
emit(key, result);
Note: expect the value-list entries to be all 1’s here, but “combiners” allow local summing of integers with the same key before passing them to reducers.
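A combiner can be sketched as a local reduce run on each mapper’s output before the shuffle. This is an illustrative Python sketch, not any framework’s API; the sample mapper output is made up.

```python
from collections import defaultdict

def combine(pairs):
    # Locally sum the counts for each word on the mapper, so the shuffle
    # sends one (word, partial_sum) instead of many (word, 1) pairs.
    sums = defaultdict(int)
    for word, count in pairs:
        sums[word] += count
    return sorted(sums.items())

# One mapper's raw output for "the store the store":
raw = [("the", 1), ("store", 1), ("the", 1), ("store", 1)]
combined = combine(raw)  # [("store", 2), ("the", 2)]
# Reducers then sum these partial counts exactly as before.
```

This works because summing is associative: reducing pre-summed partial counts gives the same totals as reducing the raw 1’s, while shuffling far less data.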
Map Phase
• Mapper is given key: document ID; value: document content, say:
(D1, “The teacher went to the store. The store was closed; the store opens in the morning. The store opens at 9am.”)
• It will emit the following pairs:
<The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the, 1> <store, 1> <opens, 1> <at, 1> <9am, 1>
• Normally, there will be many documents, hence many Mappers that emit such pairs in parallel, but for simplicity, let’s say that these are all the pairs emitted from the Map phase.
Intermediary Phase
• System sorts emitted (key, value) pairs by key:
<9am, 1>
<at, 1>
<closed, 1>
<in, 1>
<morning, 1>
<opens, 1> <opens, 1>
<store, 1> <store, 1> <store, 1> <store, 1>
<teacher, 1>
<The, 1>
<the, 1> <the, 1> <the, 1> <the, 1> <the, 1>
<to, 1>
<was, 1>
<went, 1>
Intermediary Phase (cont’d)
• The sorted (key, value) groups are then partitioned by key across the Reducers, so that all pairs for a given key go to the same one (Reducer 1 or Reducer 2).
Reduce Phase
• For each unique key emitted from the Map Phase, function Reduce(key, value-list) is invoked on Reducer 1 or Reducer 2.
• Across their invocations, these Reducers will emit:
<9am, 1> <at, 1> <closed, 1> <in, 1> <morning, 1> <opens, 2> <store, 4> <teacher, 1> <The, 1> <the, 5> <to, 1> <was, 1> <went, 1>
• Together, the emissions of Reducer 1 and Reducer 2 are the output of the entire program.
Chaining MapReduce
• The programming model seems pretty restrictive.
• But quite a few analytics applications can be written with it, especially with a technique called chaining.
• If the reducers’ output consists of (key, value) pairs, it can be passed on to further Map/Reduce jobs.
• This chaining can support a variety of analytics (though certainly not all types, e.g., no ML because there are no loops).
Example: Word Frequency
• Suppose instead of word count, we wanted to compute word
frequency: the probability that a word would appear in a document.
• This means computing the fraction of times a word appears, out of
the total number of words in the corpus.
Solution: Chain two MapReduce jobs
• First Map/Reduce: Word Count (like before)
– Map: process documents and output <word, 1> pairs.
– Multiple Reducers: emit <word, word_count> for each word.
• Second Map/Reduce:
– Map: processes <word, word_count> and outputs <1, (word, word_count)>.
– A single Reducer does two passes over its value list:
• First pass: sums up all word_count’s to calculate overall_count.
• Second pass: calculates fractions, emitting <word, word_count/overall_count> for each word.
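The two chained jobs can be sketched as follows. This is a single-process Python sketch under stated assumptions: the sample word counts are made up, and the first job’s output is modeled as a dict for brevity.

```python
# Output of the first job (word count), as a dict for convenience:
word_counts = {"the": 5, "store": 4, "opens": 2}

# Second job, Map: funnel every (word, count) pair under the single key 1,
# so one reducer sees the whole list.
pairs = [(1, (w, c)) for w, c in word_counts.items()]

# Second job, Reduce: the one reducer makes two passes over its value list.
values = [wc for _, wc in pairs]
overall_count = sum(c for _, c in values)           # pass 1: total words (11)
freqs = {w: c / overall_count for w, c in values}   # pass 2: per-word fraction
# freqs["the"] == 5/11
```

Funneling everything to key 1 makes the second job a bottleneck by design; that is the price of needing a global total before any fraction can be computed.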
Option 2: Challenges (recall)
• How to generalize to other applications?
– See original MapReduce paper for more examples.
• How to deal with failures?
• How to deal with slow nodes?
• How to deal with load balancing?
• How to deal with skew?
• The MapReduce runtime tries to hide these challenges as
much as possible. We’ll talk about a few of the performance
and fault tolerance challenges here.
Architecture
[Figure: the same workflow as before, plus the control plane. A Master assigns map tasks to Mappers and reduce tasks to Reducers. Input splits and output files live on DFS Data Nodes, tracked by a DFS Name Server; intermediate data stays on the mappers’ local FS.]
Data locality for performance
• Master scheduling policy:
– Asks the DFS for the locations of replicas of the input file blocks.
– Map tasks are scheduled so that a DFS input block replica is on the same machine or on the same rack.
• Effect: Thousands of machines read input at local disk speed.
– No need to transfer input data all over the cluster over the network: eliminates the network bottleneck!
Heartbeats, replication for fault tolerance
• Failures are the norm in data centers.
• Worker failure:
– Master detects failed workers by periodically pinging them (this is called a “heartbeat”).
– Re-executes their in-progress map/reduce tasks.
• Master failure:
– Initially, the master was a single point of failure: on restart, it resumed from an execution log. Subsequent versions used replication and consensus.
• Effect: From Google’s paper: lost 1600 of 1800 machines once, but finished fine.
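Heartbeat-based failure detection can be sketched as follows. This is an illustrative sketch, not the paper’s implementation: the timeout value, worker names, and sample timestamps are all made up.

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; an illustrative value

# Master's view: worker id -> time of the last heartbeat received.
last_heartbeat = {"w1": time.time(), "w2": time.time() - 60.0}

def failed_workers(now):
    # A worker that has not pinged within the timeout is presumed dead;
    # the master then reschedules its in-progress map/reduce tasks
    # on other workers.
    return [w for w, t in last_heartbeat.items()
            if now - t > HEARTBEAT_TIMEOUT]

# Here w2 last pinged 60s ago, so failed_workers(time.time()) reports it.
```

Note the inherent ambiguity: a timed-out worker may merely be slow or partitioned, which is why re-executed tasks must be safe to run twice (map/reduce tasks are, since they are deterministic and write to fresh output files).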
Redundant execution for performance
• Slow workers (“stragglers”) significantly lengthen completion time.
• The slowest worker can determine the total latency!
– This is why many systems measure 99th-percentile latency.
• Causes of stragglers:
– Other jobs consuming resources on the machine.
– Bad disks with errors that transfer data very slowly.
• Solution: spawn backup copies of tasks.
– Whichever copy finishes first “wins.”
– I.e., treat slow executions as failed executions!
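The “first copy wins” idea can be sketched with threads. This is a toy sketch under stated assumptions: a real scheduler runs copies on different machines and only spawns backups for tasks it detects straggling near the end of a job; run_with_backup is an illustrative name.

```python
import threading

def run_with_backup(task, *args):
    # Spawn two copies of the task; whichever finishes first "wins"
    # and its result is used.
    result, lock, done = [], threading.Lock(), threading.Event()

    def attempt():
        r = task(*args)
        with lock:
            if not result:      # only the first finisher records a result
                result.append(r)
        done.set()

    for _ in range(2):
        threading.Thread(target=attempt).start()
    done.wait()                 # return as soon as one copy completes
    return result[0]

# run_with_backup(some_task, arg) returns the first copy's result;
# the losing copy's result is simply discarded.
```

This relies on tasks being deterministic and side-effect-free (or writing to separate temporary outputs), so discarding the loser is always safe.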
Take-Aways
• MapReduce is a programming model and runtime for scalable, fault-tolerant, and efficient analytics.
– Well, some types of analytics; the model is not very expressive.
• It hides common at-scale challenges under convenient abstractions.
– Programmers need only implement Map and Reduce; the system takes care of running them at scale on lots of data.
– Failures raise semantic and performance challenges, which MapReduce typically handles through redundancy.
• There exist open-source implementations, including Hadoop and Flink.
• Spark, a popular data analytics framework, supports MapReduce as well as a wider range of programming models.
Acknowledgement
• Slides prepared based on Asaf Cidon’s 2020 DS-1
invited lecture.
32

More Related Content

Similar to 02-map-reduce.pdf

Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
Sitamarhi Institute of Technology
 
Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
Sitamarhi Institute of Technology
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
Harisankar H
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
Pallav Jha
 
Big data shim
Big data shimBig data shim
Big data shim
tistrue
 
Evaluating Cues for Resuming Interrupted Programming TAsks
Evaluating Cues for Resuming Interrupted Programming TAsksEvaluating Cues for Resuming Interrupted Programming TAsks
Evaluating Cues for Resuming Interrupted Programming TAsks
Chris Parnin
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
FEG
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Leonidas Akritidis
 
L19CloudMapReduce introduction for cloud computing .ppt
L19CloudMapReduce introduction for cloud computing .pptL19CloudMapReduce introduction for cloud computing .ppt
L19CloudMapReduce introduction for cloud computing .ppt
MaruthiPrasad96
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Neelabha Pant
 
The Legion Programming Model for HPC
The Legion Programming Model for HPCThe Legion Programming Model for HPC
The Legion Programming Model for HPC
inside-BigData.com
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
Ram Sriharsha
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
Subhas Kumar Ghosh
 
Map reduce part1
Map reduce part1Map reduce part1
Map reduce part1
AllsoftSolutions
 
Enar short course
Enar short courseEnar short course
Enar short course
Deepak Agarwal
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
Haripritha
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
ShimoFcis
 
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
Luigi Dell'Aquila
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
GiannisPagges
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
farazahmad005
 

Similar to 02-map-reduce.pdf (20)

Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
 
Unit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdfUnit-2 Hadoop Framework.pdf
Unit-2 Hadoop Framework.pdf
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Big data shim
Big data shimBig data shim
Big data shim
 
Evaluating Cues for Resuming Interrupted Programming TAsks
Evaluating Cues for Resuming Interrupted Programming TAsksEvaluating Cues for Resuming Interrupted Programming TAsks
Evaluating Cues for Resuming Interrupted Programming TAsks
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
 
L19CloudMapReduce introduction for cloud computing .ppt
L19CloudMapReduce introduction for cloud computing .pptL19CloudMapReduce introduction for cloud computing .ppt
L19CloudMapReduce introduction for cloud computing .ppt
 
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
Hyperoptimized Machine Learning and Deep Learning Methods For Geospatial and ...
 
The Legion Programming Model for HPC
The Legion Programming Model for HPCThe Legion Programming Model for HPC
The Legion Programming Model for HPC
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
Map reduce part1
Map reduce part1Map reduce part1
Map reduce part1
 
Enar short course
Enar short courseEnar short course
Enar short course
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
 
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 

Recently uploaded

Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
Madan Karki
 

Recently uploaded (20)

Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
 

02-map-reduce.pdf

  • 1. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Distributed Systems 1 CUCS Course 4113 https://systems.cs.columbia.edu/ds1-class/ Instructor: Roxana Geambasu 1
  • 2. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu MapReduce 2
  • 3. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Outline • Today: MapReduce – Analytics programming interface – Word count example – Chaining – Reading/write from/to HDFS – Dealing with failures • In future: Spark – Motivation – Resilient Distributed Datasets (RDDs) – Programming interface – Transformations and actions 3
  • 4. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Large-scale Analytics • Compute the frequency of words in a corpus of documents. • Count how many times users have clicked on each of a (large) set of ads. • PageRank: Compute the “importance” of a web page based on the “importances” of the pages that link to it. • …. 4
  • 5. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 1: SQL 5 • Before MapReduce, analytics mostly done in SQL, or manually. • Example: Count word appearances in a corpus of documents. • With SQL, the rough query might be: • Very expressive, convenient to program, but no one knew how to scale SQL execution! SELECT COUNT(*) FROM ( SELECT UNNEST(string_to_array(doc_content, ‘ ’)) as word FROM Corpus ) GROUP BY word
  • 6. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual 6 • Example: Count word appearances in a corpus of documents. Docs net- work M1 M2 M5 M4 M3 Mn
  • 7. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual 7 • Example: Count word appearances in a corpus of documents. • Phase 1: Assign documents to different machines/nodes. – Each computes a dictionary: {word: local_freq}. net- work M1 M2 M5 M4 M3 Mn Docs
  • 8. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual 8 • Example: Count word appearances in a corpus of documents. • Phase 1: Assign documents to different machines/nodes. – Each computes a dictionary: {word: local_freq}. • Phase 2: Nodes exchange dictionaries (how?) to aggregate local_freq’s. Docs net- work M1 M2 M5 M4 M3 Mn
  • 9. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual 9 • Example: Count word appearances in a corpus of documents. • Phase 1: Assign documents to different machines/nodes. – Each computes a dictionary: {word: local_freq}. • Phase 2: Nodes exchange dictionaries (how?) to aggregate local_freq’s. – But how to make this scale?? Docs net- work M1 M2 M5 M4 M3 Mn
  • 10. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual (cont’ed) 10 • Phase 2, Option a: Send all {word: local_freq} dictionaries to one node, who aggregates. – But what if it’s too much data for one node? • Phase 2, Option b: Each node sends (word, local_freq) to a designated node, e.g., node with ID hash(word) % N. net- work M1 M2 M5 M4 M3 Mn Docs
  • 11. Distributed Systems 1, Columbia Course 4113, Instructor: Roxana Geambasu Option 2: Manual (cont’ed) 11 • Phase 2, Option a: Send all {word: local_freq} dictionaries to one node, who aggregates. – But what if it’s too much data for one node? • Phase 2, Option b: Each node sends (word, local_freq) to a designated node, e.g., node with ID hash(word) % N. net- work M1 M2 M5 M4 M3 Mn Docs We’ve roughly discovered an app-specific version of MapReduce!
• 12. Option 2: Challenges
  • How to generalize to other applications?
  • How to deal with failures?
  • How to deal with slow nodes?
  • How to deal with load balancing (some docs are very large, others small)?
  • How to deal with skew (some words are very frequent, so the nodes designated to aggregate them will be pounded)?
  • …
  [Diagram: documents distributed over the network to machines M1–Mn]
• 13. MapReduce
  • Parallelizable programming model:
    – Applies to a broad class of analytics applications.
    – Isn’t as expressive as SQL, but is much easier to scale.
    – Consists of two phases, each intrinsically parallelizable:
      • Map: processes input elements independently, emitting relevant (key, value) pairs from each.
      • Transparently, the system groups all the values for each key together: (key, [list of values]).
      • Reduce: aggregates all the values for each key to emit a global value per key.
  • Scalable, efficient, fault-tolerant runtime system.
    – Addresses the preceding challenges (and more).
  • Technical paper: please read it!
• 14. How it works
  • Input: a collection of elements of (key, value) pair type.
  • Programmer defines two functions:
    – Map(key, value) → 0, 1, or more (key’, value’) pairs
    – Reduce(key’, value-list) → output
  • Execution:
    – Apply Map to each input key-value pair, in parallel for different keys.
    – Sort the emitted (key’, value’) pairs to produce (key’, value-list) pairs.
    – Apply Reduce to each (key’, value-list) pair, in parallel for different keys.
  • Output is the union of all Reduce invocations’ outputs.
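The execution steps above can be mimicked by a tiny single-process simulator (purely illustrative; `map_reduce` is a hypothetical helper, and both user functions are assumed to return lists of emitted pairs):

```python
from itertools import groupby

def map_reduce(inputs, map_fn, reduce_fn):
    """Single-process sketch of the MapReduce execution model."""
    # Map phase: apply map_fn to each input (key, value) pair.
    emitted = []
    for k, v in inputs:
        emitted.extend(map_fn(k, v))
    # Shuffle/sort: group the emitted pairs by key.
    emitted.sort(key=lambda kv: kv[0])
    # Reduce phase: aggregate each key's value list.
    output = []
    for key, group in groupby(emitted, key=lambda kv: kv[0]):
        output.extend(reduce_fn(key, [v for _, v in group]))
    return output
```

Word count then falls out directly: the mapper emits `(word, 1)` per word and the reducer sums each word's list, as the next slides show.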
• 15. Workflow
  [Diagram: Input data (Splits 0–2) lives in a DFS; Mapper workers read their splits and locally write temporary data to the local FS; the system does a remote read and sort; Reducer workers write Output Files 0 and 1 back to the DFS.]
  • Map phase: extract something you care about from each record.
  • Reduce phase: aggregate.
• 16. Example: Word count
  • We have a directory, which contains many documents.
  • The documents contain words separated by whitespace and punctuation.
  • Goal: Count the number of times each distinct word appears in the corpus.
• 17. Word count with MapReduce
  Map(key, value):
    // key: document ID; value: document content
    FOR (each word w IN value)
      emit(w, 1);

  Reduce(key, value-list):
    // key: a word; value-list: a list of integers
    result = 0;
    FOR (each integer v IN value-list)
      result += v;
    emit(key, result);
• 18. Word count with MapReduce
  • Same Map and Reduce as on the previous slide.
  • The values reaching each Reduce are expected to be all 1’s, but “combiners” allow local summing of integers with the same key before passing them to the reducers.
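The effect of a combiner can be sketched by comparing a plain mapper with one that pre-aggregates locally (illustrative helpers, not part of any real MapReduce API):

```python
from collections import Counter

def map_word_count(doc_id, content):
    """Plain mapper: emit (word, 1) for every word occurrence."""
    return [(w, 1) for w in content.split()]

def map_with_combiner(doc_id, content):
    """Mapper followed by a local combiner: sum the 1's per word before
    the shuffle, so far fewer pairs cross the network to the reducers."""
    return list(Counter(content.split()).items())
```

Both variants produce the same final word counts; the combiner only shrinks the intermediate data.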
• 19. Map Phase
  • A Mapper is given key: document ID and value: document content, say:
    (D1, “The teacher went to the store. The store was closed; the store opens in the morning. The store opens at 9am.”)
  • It will emit the following pairs:
    <The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the, 1> <store, 1> <opens, 1> <at, 1> <9am, 1>
  • Normally, there will be many documents, hence many Mappers emitting such pairs in parallel; but for simplicity, let’s say these are all the pairs emitted from the Map phase.
• 20. Intermediary Phase
  • The system sorts the emitted (key, value) pairs by key:
    <9am, 1> <at, 1> <closed, 1> <in, 1> <morning, 1> <opens, 1> <opens, 1> <store, 1> <store, 1> <store, 1> <store, 1> <teacher, 1> <the, 1> <the, 1> <the, 1> <the, 1> <the, 1> <to, 1> <was, 1> <went, 1> <The, 1>
• 21. Intermediary Phase
  • The system sorts the emitted (key, value) pairs by key:
    <9am, 1> <at, 1> <closed, 1> <in, 1> <morning, 1> <opens, 1> <opens, 1> <store, 1> <store, 1> <store, 1> <store, 1> <teacher, 1> <the, 1> <the, 1> <the, 1> <the, 1> <the, 1> <to, 1> <was, 1> <went, 1> <The, 1>
  • The sorted runs of pairs are then partitioned across Reducer 1 and Reducer 2.
• 22. Reduce Phase
  • For each unique key emitted from the Map Phase, the function Reduce(key, value-list) is invoked on Reducer 1 or Reducer 2.
  • Across their invocations, these Reducers will emit the output of the entire program:
    <9am, 1> <at, 1> <closed, 1> <in, 1> <morning, 1> <opens, 2> <store, 4> <teacher, 1> <the, 5> <to, 1> <was, 1> <went, 1> <The, 1>
• 23. Chaining MapReduce
  • The programming model seems pretty restrictive.
  • But quite a few analytics applications can be written with it, especially with a technique called chaining.
  • If the output of the reducers consists of (key, value) pairs, then it can be passed on to other Map/Reduce processes.
  • Chaining can support a variety of analytics (though certainly not all types, e.g., no ML, because there are no loops).
• 24. Example: Word Frequency
  • Suppose that instead of word count, we wanted to compute word frequency: the probability that a given word appears in the corpus.
  • This means computing the fraction of times each word appears, out of the total number of words in the corpus.
• 25. Solution: Chain two MapReduce’s
  • First MapReduce: Word Count (as before).
    – Map: process documents and output <word, 1> pairs.
    – Multiple Reducers: emit <word, word_count> for each word.
  • Second MapReduce:
    – Map: processes <word, word_count> and outputs <1, (word, word_count)>.
    – A single Reducer does two passes:
      • In the first pass, it sums up all the word_count’s to calculate overall_count.
      • In the second pass, it calculates the fractions, emitting multiple <word, word_count/overall_count> pairs.
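The two chained jobs can be condensed into a single-process sketch (illustrative only; `word_frequency` is a hypothetical helper, and Python's `Counter` stands in for the first job's map/reduce pipeline):

```python
from collections import Counter

def word_frequency(docs):
    """Two chained jobs: (1) word count; (2) a single reducer that sums
    all counts in a first pass, then emits each word's fraction."""
    # Job 1: word count.
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    # Job 2: one reducer receives every (word, count) pair under key 1.
    overall_count = sum(counts.values())                    # first pass
    return {w: c / overall_count for w, c in counts.items()}  # second pass
```

Emitting everything under the single key 1 is what forces all pairs to the same reducer, which is exactly what the second job needs in order to see the global total.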
• 26. Option 2: Challenges (recall)
  • How to generalize to other applications?
    – See the original MapReduce paper for more examples.
  • How to deal with failures?
  • How to deal with slow nodes?
  • How to deal with load balancing?
  • How to deal with skew?
  • The MapReduce runtime tries to hide these challenges as much as possible. We’ll talk about a few of the performance and fault-tolerance challenges here.
• 27. Architecture
  [Diagram: the Workflow picture from slide 15, extended with a Master that assigns map and reduce tasks to workers; the input splits and output files live on DFS Data Nodes, whose locations the Master obtains from the DFS Name Server; temporary data stays on each worker’s local FS.]
• 28. Data locality for performance
  • Master scheduling policy:
    – Asks the DFS for the locations of the replicas of the input file’s blocks.
    – Schedules map tasks so that a replica of each task’s DFS input block is on the same machine, or at least on the same rack.
  • Effect: Thousands of machines read input at local-disk speed.
    – No need to transfer input data all over the cluster: this eliminates the network bottleneck!
• 29. Heartbeats, replication for fault tolerance
  • Failures are the norm in data centers.
  • Worker failure:
    – The Master detects failed workers by periodically pinging them (a “heartbeat”).
    – In-progress map/reduce tasks on failed workers are re-executed.
  • Master failure:
    – Initially a single point of failure, recovering by resuming from an execution log; subsequent versions used replication and consensus.
  • Effect: Google’s paper reports once losing 1600 of 1800 machines, yet the job finished fine.
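Heartbeat-based failure detection can be sketched as follows (a toy model, not Google's implementation; the class name and the timeout value are made up for illustration):

```python
import time

class Master:
    """Sketch of heartbeat-based worker failure detection."""

    def __init__(self, workers, timeout=10.0, now=None):
        now = time.monotonic() if now is None else now
        self.last_seen = {w: now for w in workers}  # last heartbeat per worker
        self.timeout = timeout

    def heartbeat(self, worker, now=None):
        # A worker ping arrived; record when we last heard from it.
        self.last_seen[worker] = time.monotonic() if now is None else now

    def failed_workers(self, now=None):
        # Any worker silent for longer than the timeout is presumed failed;
        # its in-progress map/reduce tasks would be re-executed elsewhere.
        now = time.monotonic() if now is None else now
        return [w for w, t in self.last_seen.items() if now - t > self.timeout]
```

The optional `now` parameters exist only to make the sketch testable with simulated clocks; a real master would use the wall clock and also track which tasks each failed worker held.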
• 30. Redundant execution for performance
  • Slow workers, or stragglers, significantly lengthen completion time: the slowest worker can determine the total latency!
    – This is why many systems measure 99th-percentile latency.
  • Common causes of stragglers:
    – Other jobs consuming resources on the machine.
    – Bad disks with errors that transfer data very slowly.
  • Solution: spawn backup copies of tasks.
    – Whichever copy finishes first “wins.”
    – I.e., treat slow executions as failed executions!
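The backup-copy idea can be sketched with a thread pool (a toy illustration; `run_with_backup` is a made-up helper, and unlike a real runtime this simple version still waits for the slow copies at pool shutdown rather than cancelling them):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_with_backup(task, arg, copies=2):
    """Spawn redundant copies of a task and take the result of whichever
    finishes first; the results of the slower copies are ignored."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(task, arg) for _ in range(copies)]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

This mirrors the "first one wins" policy: a straggler copy can no longer determine the job's latency, because its twin finishes first.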
• 31. Take-Aways
  • MapReduce is a programming model and runtime for scalable, fault-tolerant, and efficient analytics.
    – Well, some types of analytics; it’s not very expressive.
  • It hides common at-scale challenges under convenient abstractions.
    – Programmers need only implement Map and Reduce, and the system takes care of running them at scale on lots of data.
    – Failures raise semantic and performance challenges for MapReduce, which it typically handles through redundancy.
  • Open-source implementations exist, including Hadoop and Flink.
  • Spark, a popular data analytics framework, supports MapReduce but also a wider range of programming models.
• 32. Acknowledgement
  • Slides prepared based on Asaf Cidon’s 2020 DS-1 invited lecture.