The document discusses MapReduce, a programming model for processing large datasets in a distributed manner. It describes how MapReduce works by mapping data to transform it and then reducing the data to aggregate it. An example application given is counting the number of movies each user rated using a movie ratings dataset from MovieLens. The mapping and reducing functions are defined to extract user IDs and movie IDs from the raw data, count the occurrences of each user ID, and output the final counts.
2. MapReduce Definition
MapReduce is a programming model that simplifies the implementation
of many data-parallel applications for processing and generating
large datasets.
10/1/2019 8:11:02 AM RAJAB SSEMWOGERERE
3. How MapReduce operates
MapReduce maps data and then reduces it. The mapper transforms
data as it comes in, one line at a time (in this example, every input
line produces one output from the mapper).
The reducer then aggregates the mapped data.
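The two phases above can be sketched in plain Python, without any MapReduce framework. This is only an illustration under stated assumptions: `run_map_reduce`, `mapper`, and `reducer` are hypothetical names, the input lines are made up, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict

def run_map_reduce(lines, mapper, reducer):
    """Map every line, group the values by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        # The mapper may yield key/value pairs; here it yields one per line.
        for key, value in mapper(line):
            grouped[key].append(value)
    # Call the reducer once per unique key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}

def mapper(line):
    # Hypothetical mapper: the first word of the line is the key.
    yield line.split()[0], 1

def reducer(key, values):
    # Hypothetical reducer: add up the values collected for this key.
    return sum(values)

result = run_map_reduce(["a x", "b y", "a z"], mapper, reducer)
print(result)  # {'a': 2, 'b': 1}
```

A real framework distributes the map and reduce calls across many machines; the single-process loop here only shows the order of the steps.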
4. Example of where MapReduce can be applied
How many movies has each user rated on MovieLens?
MovieLens is a web-based recommender system and virtual community
that recommends movies for its users to watch, based on their
preferences, using collaborative filtering of members’ movie
ratings and movie reviews.
6. The mapper function
The mapper function converts raw data into key/value pairs. The
key is the userID. Each line also carries a movieID, but because we
only count ratings, the value emitted is 1 for every movie rated.
We discard the rating and the timestamp, which keeps the data
passed to the reducer small.
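def mapper_get_userID(self, _, line):
    # Each input line is tab-separated: userID, movieID, rating, timestamp.
    (userID, movieID, rating, timestamp) = line.split('\t')
    # Emit one count per rating, keyed by the user.
    yield userID, 1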
7. The mapper function (continued)
The mapper function converts the raw data into key/value
pairs:
196:1 186:1 196:1 244:1 166:1 186:1 186:1
By the time the mapper function finishes, our data is extracted
and organized for the reducer function to aggregate.
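As a rough check of the mapper step, the sketch below applies a standalone version of the mapper to seven sample lines. It assumes tab-separated lines of userID, movieID, rating, and timestamp; `map_line` is a hypothetical free function (not the class method from the slide), and the movieID, rating, and timestamp values are placeholders chosen only for illustration.

```python
def map_line(line):
    """Yield (userID, 1) for one tab-separated ratings line."""
    user_id, movie_id, rating, timestamp = line.split("\t")
    # Only the user ID matters for the count; the other fields are dropped.
    yield user_id, 1

# Placeholder lines: only the user IDs follow the example in the slides.
sample_lines = [
    "196\t1\t3\t0",
    "186\t2\t3\t0",
    "196\t3\t4\t0",
    "244\t4\t2\t0",
    "166\t5\t1\t0",
    "186\t6\t5\t0",
    "186\t7\t4\t0",
]

mapped = [pair for line in sample_lines for pair in map_line(line)]
print(" ".join(f"{key}:{value}" for key, value in mapped))
# 196:1 186:1 196:1 244:1 166:1 186:1 186:1
```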
8. Shuffle and Sort
MapReduce then sorts and groups the mapped data (“shuffle and
sort”). At this point it gathers the values for each unique key.
Mapper output: 196:1 186:1 196:1 244:1 166:1 186:1 186:1
After shuffle and sort: 166:1 186:1,1,1 196:1,1 244:1
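The grouping performed by shuffle and sort can be imitated locally with the standard library. This is a single-process sketch, not how a distributed framework moves data between machines; `shuffle_and_sort` is a hypothetical helper name, and the keys are kept as strings, so they sort lexicographically.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output from the example: (userID, 1) pairs in arrival order.
mapped = [("196", 1), ("186", 1), ("196", 1), ("244", 1),
          ("166", 1), ("186", 1), ("186", 1)]

def shuffle_and_sort(pairs):
    """Sort pairs by key, then collect the values of each key into a list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

grouped = shuffle_and_sort(mapped)
for key, values in grouped:
    print(f"{key}:{','.join(str(v) for v in values)}")
# 166:1
# 186:1,1,1
# 196:1,1
# 244:1
```

Sorting first is required because `groupby` only merges adjacent items with the same key.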
9. The reducer function
The reducer takes the output of shuffle and sort.
It is called once for each unique key, performs its computation
on that key's values, and then produces the output.
def reducer_count_ratings(self, key, values):
    # Sum the 1s emitted for this user to get the number of ratings.
    yield key, sum(values)
10. The reducer function (continued)
Input: 166:1 186:1,1,1 196:1,1 244:1
Output: 166:1 186:3 196:2 244:1
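The reducer step on this grouped data can be reproduced with a few lines of plain Python. `reduce_counts` is a hypothetical standalone stand-in for the reducer method, and the grouped input is written out by hand from the example above.

```python
def reduce_counts(key, values):
    """Return the key together with the sum of its grouped values."""
    return key, sum(values)

# Grouped data as produced by shuffle and sort in the example.
grouped = [("166", [1]), ("186", [1, 1, 1]), ("196", [1, 1]), ("244", [1])]

# The reducer is called once per unique key.
counts = [reduce_counts(key, values) for key, values in grouped]
print(" ".join(f"{key}:{total}" for key, total in counts))
# 166:1 186:3 196:2 244:1
```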
11. THANK YOU FOR YOUR
ATTENTION
END