MAPREDUCE
MAKERERE UNIVERSITY MS-CS
RAJAB SSEMWOGERERE
2019/HD03/29911U
MapReduce Definition
MapReduce is a programming model that simplifies the implementation
of many data-parallel applications for processing and generating
large datasets.
10/1/2019 8:11:02 AM RAJAB SSEMWOGERERE
2/11
How MapReduce operates
A MapReduce job maps data and then reduces it. The mapper transforms
the data one input line at a time (in the example used here, every
input line produces one output pair from the mapper).
The reducer then aggregates the mapped data.
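The map-then-reduce flow described above can be sketched in plain Python. This is a toy, single-process illustration rather than a distributed framework; the helper name `run_map_reduce` and the toy input lines are assumptions made for this sketch.

```python
from collections import defaultdict


def run_map_reduce(lines, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        # The mapper yields (key, value) pairs for each input line.
        for key, value in mapper(line):
            grouped[key].append(value)
    # The reducer is called once per unique key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


def first_field_mapper(line):
    # Use the first whitespace-separated field as the key and emit a 1 for it.
    yield line.split()[0], 1


def count_reducer(key, values):
    # Aggregate all values collected for this key.
    return sum(values)


result = run_map_reduce(["a x", "b y", "a z"], first_field_mapper, count_reducer)
print(result)  # {'a': 2, 'b': 1}
```

A real MapReduce system runs many mappers and reducers in parallel on different machines; the sketch only shows the order of the steps.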
Example of where MapReduce can be applied
Counting how many movies each user rated on MovieLens.
MovieLens is a web-based recommender system and virtual community
that recommends movies for its users to watch, based on their
preferences, using collaborative filtering of members’ movie ratings
and movie reviews.
Sample MovieLens dataset
UserID  MovieID  Rating  Timestamp
196     242      3       881250949
186     302      3       891717742
196     377      1       878887116
244     51       2       880606923
166     346      1       886397596
186     474      4       884182806
186     265      2       881171488
The mapper function
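The mapper function converts raw data into key/value pairs. The
key is the userID and the value is 1, so each pair stands for one
rated movie.
The movieID, rating and timestamp are not needed for the count, so
the mapper drops them, which also keeps its output small.
    def mapper_get_userID(self, _, line):
        # Each input line has four tab-separated fields.
        (userID, movieID, rating, timestamp) = line.split('\t')
        # Emit one (userID, 1) pair per rating.
        yield userID, 1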
The mapper function (continued)
Applied to the sample dataset, the mapper produces these key/value
pairs:
196:1 186:1 196:1 244:1 166:1 186:1 186:1
By the time the mapper finishes, the data has been extracted and
organized for the reducer to aggregate.
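As a rough check of the mapper output listed above, the mapping step can be run on the seven sample rows as a standalone Python function. The name `map_user_to_one` and the list `SAMPLE_ROWS` are made up for this sketch, and the rows are assumed to be tab-separated, as in the raw MovieLens ratings file.

```python
# The seven sample rows from the dataset slide, assumed tab-separated.
SAMPLE_ROWS = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
    "186\t474\t4\t884182806",
    "186\t265\t2\t881171488",
]


def map_user_to_one(line):
    """Hypothetical standalone mapper: emit (userID, 1) for one rating line."""
    user_id, _movie_id, _rating, _timestamp = line.split("\t")
    yield user_id, 1


# Apply the mapper to every row, keeping the input order.
mapped = [pair for row in SAMPLE_ROWS for pair in map_user_to_one(row)]
print(" ".join(f"{user}:{count}" for user, count in mapped))
# 196:1 186:1 196:1 244:1 166:1 186:1 186:1
```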
Shuffle and Sort
MapReduce sorts and groups the mapped data (“Shuffle and Sort”). At
this point it collects the values for each unique key into one list.
Before: 196:1 186:1 196:1 244:1 166:1 186:1 186:1
After:  166:1 186:1,1,1 196:1,1 244:1
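The grouping step can be imitated in plain Python with `sorted` and `itertools.groupby`. This is only a sketch of the idea: a real framework does the shuffle across machines, and the function name `shuffle_and_sort` is an assumption of this example.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output for the sample dataset, in input order.
mapped = [("196", 1), ("186", 1), ("196", 1), ("244", 1),
          ("166", 1), ("186", 1), ("186", 1)]


def shuffle_and_sort(pairs):
    """Hypothetical helper: sort pairs by key, then collect values per key."""
    # groupby only merges adjacent items, so the pairs must be sorted first.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


grouped = shuffle_and_sort(mapped)
print(grouped)
# [('166', [1]), ('186', [1, 1, 1]), ('196', [1, 1]), ('244', [1])]
```

The keys here are strings, so they are sorted lexicographically; that matches numeric order only because every sample userID has three digits.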
The reducer function
The reducer receives the output of shuffle and sort. It is called
once for each unique key, performs its computation on that key's
values, and produces the output.
    def reducer_count_ratings(self, key, values):
        # Sum the 1s emitted for this user to get the rating count.
        yield key, sum(values)
The reducer function (continued)
Input:  166:1 186:1,1,1 196:1,1 244:1
Output: 166:1 186:3 196:2 244:1
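The reduce step on the grouped data can be reproduced with a standalone version of the reducer. The function name `reduce_count_ratings` and the hard-coded `grouped` list are assumptions for this sketch; in a real job the framework supplies the grouped input.

```python
# Output of shuffle and sort for the sample dataset: (userID, list of 1s).
grouped = [("166", [1]), ("186", [1, 1, 1]), ("196", [1, 1]), ("244", [1])]


def reduce_count_ratings(key, values):
    """Hypothetical standalone reducer: total the 1s for one userID."""
    yield key, sum(values)


# Call the reducer once per unique key and collect what it yields.
output = [pair for key, values in grouped
          for pair in reduce_count_ratings(key, values)]
print(" ".join(f"{user}:{count}" for user, count in output))
# 166:1 186:3 196:2 244:1
```

Each count is the number of movies that user rated in the sample, for example three for user 186.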
THANK YOU FOR YOUR
ATTENTION
END
