Outline
ØToday: MapReduce
§ Analytics programming interface
§ Word count example
§ Chaining
§ Reading/writing from/to HDFS
§ Dealing with failures
ØIn future: Spark
§ Motivation
§ Resilient Distributed Datasets (RDDs)
§ Programming interface
§ Transformations and actions
04/03/2023
Principles of Cloud Computing--S.A.Javadi 2
Large-scale Analytics
ØCompute the frequency of words in a corpus of documents.
ØCount how many times users have clicked on each of a (large) set of ads.
ØPageRank: Compute the “importance” of a web page based on the “importances” of the pages that link to it.
Option 1: SQL
Ø Before MapReduce, analytics were mostly done in SQL or manually.
Ø Example: Count word appearances in a corpus of documents.
Ø With SQL, the rough query would group words and count each group.
Ø Very expressive, convenient to program, but no one knew how to scale SQL execution!
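A word count like this hinges on SQL’s GROUP BY. A minimal sketch via Python’s sqlite3, using an illustrative `words` table with one row per word occurrence (the table name and schema are assumptions, not from the slide):

```python
# Hypothetical SQL word count: one row per word occurrence, then GROUP BY.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (doc_id INTEGER, word TEXT)")
rows = [(1, w) for w in "the teacher went to the store".split()]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

# The GROUP BY does the aggregation the manual approach must build by hand.
counts = dict(conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word"))
print(counts)  # e.g., {'store': 1, 'teacher': 1, 'the': 2, ...}
```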
Option 2: Manual
ØExample: Count word appearances in a corpus of documents.
Option 2: Manual
ØExample: Count word appearances in a corpus of documents.
ØPhase 1: Assign documents to different machines/nodes.
§ Each computes a dictionary: {word: local_freq}.
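Phase 1 can be sketched with `collections.Counter`; the document contents below are illustrative:

```python
# Phase 1 sketch: each node computes {word: local_freq} over its documents.
from collections import Counter

def local_word_freq(documents):
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())  # tally each word in this document
    return dict(counts)

# One node's share of the corpus (made-up contents):
node1 = local_word_freq(["the store opens", "the store was closed"])
# node1 == {'the': 2, 'store': 2, 'opens': 1, 'was': 1, 'closed': 1}
```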
Option 2: Manual
ØExample: Count word appearances in a corpus of documents.
ØPhase 1: Assign documents to different machines/nodes.
§ Each computes a dictionary: {word: local_freq}.
ØPhase 2: Nodes exchange dictionaries (how?) to aggregate local_freq’s.
ØBut how to make this scale??
Option 2: Manual
ØPhase 2, Option a: Send all {word: local_freq} dictionaries to one node, who aggregates.
§ But what if it’s too much data for one node?
ØPhase 2, Option b: Each node sends (word, local_freq) to a designated node, e.g., the node with ID hash(word) % N.
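Option b can be sketched as follows. A deterministic byte-sum hash stands in for a real hash function here (Python’s built-in `hash` on strings is salted per process, so it can’t route consistently across nodes); `N = 4` is an illustrative cluster size:

```python
# Phase 2, Option b sketch: route each (word, local_freq) pair to the
# node with ID hash(word) % N, so all partial counts for a word meet
# at the same aggregator.
N = 4  # number of nodes (assumption)

def node_for(word):
    return sum(word.encode()) % N  # toy deterministic hash

local_freqs = {"the": 5, "store": 4, "opens": 2}
outboxes = {i: [] for i in range(N)}
for word, freq in local_freqs.items():
    outboxes[node_for(word)].append((word, freq))
# Every node routes "the" to the same destination, so one node sees
# every partial count for "the" and can sum them.
```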
We’ve roughly discovered an app-specific version of MapReduce!
Option 2: Challenges
ØHow to generalize to other applications?
ØHow to deal with failures?
ØHow to deal with slow nodes?
ØHow to deal with load balancing (some docs are very large, others small)?
ØHow to deal with skew (some words are very frequent, so nodes designated to aggregate them will be pounded)?
MapReduce
ØParallelizable programming model:
§ Applies to a broad class of analytics applications.
§ It isn’t as expressive as SQL, but it is easier to scale.
MapReduce
ØConsists of two phases, each intrinsically parallelizable:
§ Map: processes input elements independently to emit relevant (key, value) pairs from each.
• Transparently, the system groups all the values for each key together: (key, [list of values]).
§ Reduce: aggregates all the values for each key to emit a global value for each key.
How it works
ØInput: a collection of elements of (key, value) pair type.
ØProgrammer defines two functions:
§ Map(key, value) → 0, 1, or more (key’, value’) pairs
§ Reduce(key, value-list) → output
How it works
ØExecution:
§ Apply Map to each input key-value pair, in parallel for different keys.
§ Group and sort the emitted (key’, value’) pairs to produce (key’, value-list) pairs.
§ Apply Reduce to each (key’, value-list) pair, in parallel for different keys.
ØOutput is the union of all Reduce invocations’ outputs.
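The execution steps above can be condensed into a single-process sketch (a toy model of the model, not the distributed runtime):

```python
# Toy MapReduce engine: Map every input pair, group emitted pairs by
# key, then Reduce each (key, value-list).
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):       # Map phase
            groups[k2].append(v2)               # group by emitted key
    return {k: reduce_fn(k, vs)                 # Reduce phase
            for k, vs in sorted(groups.items())}

# Word count expressed in this model:
def wc_map(doc_id, content):
    return [(w, 1) for w in content.split()]

def wc_reduce(word, counts):
    return sum(counts)

result = map_reduce([("D1", "the store opens the store")], wc_map, wc_reduce)
# result == {'opens': 1, 'store': 2, 'the': 2}
```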
Example: Word count
ØWe have a directory, which contains many documents.
ØThe documents contain words separated by whitespace and
punctuation.
ØGoal: Count the number of times each distinct word appears in the corpus.
Word count with MapReduce
Map(key, value):
// key: document ID; value: document content
FOR (each word w IN value)
emit(w, 1);

Reduce(key, value-list):
// key: a word; value-list: a list of integers
result = 0;
FOR (each integer v IN value-list)
result += v;
emit(key, result);
The emitted values are expected to all be 1’s, but “combiners” allow local summing of values with the same key before they are passed to reducers.
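The combiner optimization just mentioned can be sketched as follows: locally sum pairs with the same key before they cross the network, shrinking the shuffle.

```python
# Combiner sketch: (word, 1), (word, 1), ... -> (word, local_sum).
from collections import Counter

def combine(pairs):
    combined = Counter()
    for word, count in pairs:
        combined[word] += count  # sum locally, per key
    return list(combined.items())

mapper_output = [("the", 1), ("store", 1), ("the", 1), ("the", 1)]
print(combine(mapper_output))  # [('the', 3), ('store', 1)]
```

Four pairs shrink to two; the reducer’s logic is unchanged because it already sums whatever integers arrive.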
Map Phase
ØMapper is given key: document ID; value: document content, say:
(D1, “The teacher went to the store. The store was closed; the store opens in the morning. The store opens at 9am.”)
ØIt will emit the following pairs:
<The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store,1>
<the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store,1>
<opens, 1> <in, 1> <the, 1> <morning, 1> <the, 1> <store, 1>
<opens, 1> <at, 1> <9am, 1>
Map Phase
ØNormally, there will be many documents, hence many Mappers emitting such pairs in parallel; but for simplicity, let’s say that these are all the pairs emitted from the Map phase.
Reduce Phase
ØFor each unique key emitted from the Map Phase, function Reduce(key, value-list) is invoked on Reducer 1 or Reducer 2.
ØAcross their invocations, these Reducers will emit one (word, total_count) pair per distinct word.
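Using exactly the pairs listed in the Map-phase example, the shuffle-then-reduce output can be sketched as:

```python
# Group the Map-phase example's pairs by key, then sum each group.
from collections import defaultdict

pairs = [("The", 1), ("teacher", 1), ("went", 1), ("to", 1), ("the", 1),
         ("store", 1), ("the", 1), ("store", 1), ("was", 1), ("closed", 1),
         ("the", 1), ("store", 1), ("opens", 1), ("in", 1), ("the", 1),
         ("morning", 1), ("the", 1), ("store", 1), ("opens", 1),
         ("at", 1), ("9am", 1)]

groups = defaultdict(list)
for word, one in pairs:
    groups[word].append(one)  # shuffle: (word, [1, 1, ...])

totals = {word: sum(ones) for word, ones in groups.items()}  # reduce
# totals['the'] == 5, totals['store'] == 4, totals['opens'] == 2
```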
Chaining MapReduce
ØThe programming model seems pretty restrictive.
ØBut quite a few analytics applications can be written with it,
especially with a technique called chaining.
ØIf the output of reducers is (key, value) pairs, then their output
can be passed onto other Map/Reduce processes.
ØThis chaining can support a variety of analytics (though certainly not all types; e.g., no iterative ML, because there are no loops).
Example: Word Frequency
ØSuppose instead of word count, we wanted to compute word
frequency: the probability that a word would appear in a
document.
ØThis means computing the fraction of times a word appears, out
of the total number of words in the corpus.
Solution: Chain two MapReduce’s
ØFirst Map/Reduce: Word Count (like before)
§ Map: process documents and output <word, 1> pairs.
§ Multiple Reducers: emit <word, word_count> for each word.
ØSecond MapReduce: ?
Solution: Chain two MapReduce’s
ØSecond MapReduce:
§ Map: processes <word, word_count> and outputs (1, (word, word_count)).
§ Reducer: does two passes:
• In the first pass, it sums up all word_count’s to calculate overall_count.
• In the second pass, it emits <word, word_count/overall_count> pairs.
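The second stage can be sketched as follows (toy single-process version):

```python
# Second MapReduce sketch: funnel all <word, word_count> pairs to the
# single key 1, so one reducer sees everything and can do two passes.
def freq_map(word, word_count):
    return [(1, (word, word_count))]

def freq_reduce(_key, pairs):
    overall = sum(count for _word, count in pairs)              # pass 1
    return [(word, count / overall) for word, count in pairs]   # pass 2

word_counts = [("the", 5), ("store", 4), ("opens", 1)]
values = [v for wc in word_counts for _k, v in freq_map(*wc)]
fractions = freq_reduce(1, values)
print(fractions)  # [('the', 0.5), ('store', 0.4), ('opens', 0.1)]
```

Note the design trade-off: routing everything to key 1 makes the lone reducer a bottleneck; it is tolerable here only because the <word, word_count> data is far smaller than the original corpus.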
Option 2: Challenges (recall)
ØHow to generalize to other applications?
§ See original MapReduce paper for more examples.
ØHow to deal with failures?
ØHow to deal with slow nodes?
ØHow to deal with load balancing?
ØHow to deal with skew?
ØThe MapReduce runtime tries to hide these challenges as much as
possible. We’ll talk about a few of the performance and fault
tolerance challenges here.
High-performance distributed file systems
and storage clouds
ØLustre
ØGoogle File System (GFS)
ØHadoop Distributed File System (HDFS)
ØAmazon Simple Storage Service (S3)
ØAnd many more
Hadoop Distributed File System (HDFS)
ØHDFS stores very large files (many terabytes and petabytes).
ØIt could store tens of millions of files.
ØIt can run on hundreds or thousands of commodity servers.
Hadoop Distributed File System (HDFS)
ØA general assumption in HDFS is that hardware failure is the norm, not the exception.
ØIt is well suited to big-data analytics:
§ It is optimized to write very big files once and to read them many times.
§ It is not suitable for random reads/writes.
Hadoop Distributed File System (HDFS)
ØAn HDFS file is a sequence of blocks stored across a cluster of multiple servers.
ØFault tolerance is at the block level.
ØHDFS blocks are big – 128 MB by default.
ØHDFS is designed for applications that stream through data sets sequentially; it is not suitable for small files or for random reads and writes.
ØHadoop moves the computation to the storage nodes:
§ This is the best approach when the computing programs are relatively small and the stored data is large.
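With 128 MB blocks, a file’s block count is just a ceiling division; a quick illustrative calculation:

```python
# How many 128 MB HDFS blocks does a file occupy?
BLOCK = 128 * 1024 * 1024  # default HDFS block size, in bytes

def num_blocks(file_size_bytes):
    return -(-file_size_bytes // BLOCK)  # ceiling division

print(num_blocks(1 * 1024**4))  # a 1 TB file -> 8192 blocks
```

This is why HDFS handles “very large files” well but “tens of millions of small files” poorly: each file costs at least one block’s worth of namenode metadata.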
HDFS Structure
ØAn HDFS cluster consists of namenode and datanode servers.
ØThe namenode holds the metadata: files, directories, file block locations, and so forth.
ØThe datanodes store the data blocks.
04/03/2023
Principles of Cloud Computing--S.A.Javadi 34
HDFS Architecture
[Figure: a Master Node runs the NameNode process holding the file-system metadata; Nodes 1..N each run a DataNode process that stores HDFS data blocks (B1, B2, …) on the local file system.]
Anatomy of File Read in HDFS
https://www.geeksforgeeks.org/anatomy-of-file-read-and-write-in-hdfs/
Anatomy of File Write in HDFS
https://www.geeksforgeeks.org/anatomy-of-file-read-and-write-in-hdfs/
HDFS Structure
ØThe client opens a file or directory using metadata from the namenode; after that, operations go directly to the datanodes.
ØReads access the datanodes directly, in sequential-access mode.
ØIf a read fails, the client falls back to another replica of the block.
ØWhen it has read all the data in a block, it picks a datanode holding a replica of the next block.
Data locality for performance
ØMaster scheduling policy:
§ Asks the DFS for the locations of the replicas of the input file’s blocks.
§ Schedules map tasks so that a replica of the input block is on the same machine, or at least on the same rack.
ØEffect: Thousands of machines read input at local-disk speed.
§ No need to transfer input data all over the cluster over the network: the network bottleneck is eliminated!
Data locality in Hadoop
ØData-local data locality in Hadoop
§ When the data is located on the same node as the mapper working on it, this is known as data-local data locality.
§ In this case, the computation runs right next to the data.
§ This is the most preferred scenario.
Ø…
https://data-flair.training/blogs/data-locality-in-hadoop-mapreduce/
Data locality in Hadoop
ØData-local data locality in Hadoop
ØIntra-rack data locality in Hadoop
§ It is not always possible to execute the mapper on the same datanode, due to resource constraints.
§ In such a case, it is preferred to run the mapper on a different node on the same rack.
Ø…
Data locality in Hadoop
ØData-local data locality in Hadoop
ØIntra-rack data locality in Hadoop
ØInter-rack data locality in Hadoop
§ Sometimes it is not possible to execute the mapper on a different node in the same rack, due to resource constraints.
§ In such a case, we execute the mapper on nodes on different racks.
§ This is the least preferred scenario.
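The three preference tiers above can be sketched as a scheduling function. The names and data structures are illustrative, not Hadoop’s actual scheduler API:

```python
# Three-tier placement preference: data-local > intra-rack > inter-rack.
def schedule(task_replicas, free_nodes, rack_of):
    replica_racks = {rack_of[n] for n in task_replicas}
    for node in free_nodes:
        if node in task_replicas:           # node holds a replica
            return node, "data-local"
    for node in free_nodes:
        if rack_of[node] in replica_racks:  # same rack as a replica
            return node, "intra-rack"
    return free_nodes[0], "inter-rack"      # last resort: any free node

rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
# Replica lives on n1, but n1 is busy; n2 shares n1's rack:
print(schedule(["n1"], ["n2", "n3"], rack_of))  # ('n2', 'intra-rack')
```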
Heartbeats, replication for fault tolerance
ØFailures are the norm in data centers.
ØWorker failure:
§ The master detects failed workers by periodically pinging them (this is called a “heartbeat”).
§ It re-executes their in-progress map/reduce tasks.
Heartbeats, replication for fault tolerance
ØMaster failure:
§ Initially, the master was a single point of failure; recovery meant resuming from an execution log.
§ Subsequent versions used replication and consensus.
ØEffect: Google’s paper reports a run that once lost 1600 of 1800 machines, yet still finished fine.
Redundant execution for performance
ØSlow workers, or stragglers, significantly lengthen completion time; the slowest worker can determine the total latency!
§ This is why many systems measure 99th-percentile latency.
§ Causes: other jobs consuming resources on the machine, or bad disks with errors that transfer data very slowly.
ØSolution: spawn backup copies of tasks.
§ Whichever one finishes first “wins.”
§ I.e., treat slow executions as failed executions!
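The “backup copies, first one wins” idea can be sketched with `concurrent.futures`; the sleep times stand in for a fast worker and a straggler:

```python
# Speculative execution sketch: run the same task twice, keep whichever
# copy finishes first, and treat the slow copy as if it had failed.
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def task(delay, label):
    time.sleep(delay)  # simulated work; delay models worker speed
    return label

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(task, 0.01, "fast copy"),
               pool.submit(task, 1.0, "straggler copy")]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    winner = done.pop().result()
    # Note: this toy still lets the straggler run to completion on pool
    # shutdown; a real runtime would cancel or ignore it.
print(winner)  # 'fast copy'
```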
Take-Aways
ØMapReduce is a programming model and runtime for scalable, fault-tolerant, and efficient analytics.
§ Well, some types of analytics; it is not very expressive.
ØIt hides common at-scale challenges under convenient
abstractions.
ØProgrammers need only implement Map and Reduce and the
system takes care of running those at scale on lots of data.
Take-Aways
ØFailures raise semantic/performance challenges for MapReduce,
which it typically handles through redundancy.
ØThere exist open-source implementations, including Hadoop and Flink.
ØSpark, a popular data analytics framework, supports MapReduce as well as a wider range of programming models.
Acknowledgement
ØSlides prepared based on:
§ Dr Geambasu’s lecture slides
• https://systems.cs.columbia.edu/ds1-class/
• https://systems.cs.columbia.edu/ds2-class/
§ Dr Mittal’s lecture slides
• https://courses.grainger.illinois.edu/ece428/sp2022/index.html
§ Dr Gupta’s lecture slides
• http://courses.engr.illinois.edu/cs425/