MapReduce presentation


The document discusses MapReduce, a programming model for processing large datasets in a distributed manner. It describes how MapReduce works by mapping data to transform it and then reducing the data to aggregate it. An example application given is counting the number of movies each user rated using a movie ratings dataset from MovieLens. The mapping and reducing functions are defined to extract user IDs and movie IDs from the raw data, count the occurrences of each user ID, and output the final counts.

MAPREDUCE
MAKERERE UNIVERSITY MS-CS
RAJAB SSEMWOGERERE
2019/HD03/29911U

MapReduce Definition
MapReduce is a programming model that simplifies the implementation
of many data-parallel applications for processing and generating
large datasets.
10/1/2019 8:11:02 AM RAJAB SSEMWOGERERE
2/11

How MapReduce operates
A MapReduce job maps data and then reduces it. Mapping transforms
the data as it comes in, one line at a time (in the example here,
every input line produces one output from the mapper).
The reducer then aggregates the mapped data.
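The two phases can be sketched in plain Python, without any MapReduce framework. This is only a single-machine illustration: the helper name run_map_reduce, the toy mapper and reducer, and the input words are assumptions made for this sketch and do not come from the slides.

```python
from collections import defaultdict


def run_map_reduce(lines, mapper, reducer):
    """Hypothetical helper: map every line, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        # A mapper may yield any number of (key, value) pairs per line.
        for key, value in mapper(line):
            grouped[key].append(value)
    # The reducer is called once per unique key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


def length_mapper(line):
    # Toy mapper: the key is the first character, the value is the line length.
    yield line[0], len(line)


def total_reducer(key, values):
    # Toy reducer: add up all values collected for this key.
    return sum(values)


result = run_map_reduce(["apple", "avocado", "banana"], length_mapper, total_reducer)
print(result)
```

A real framework distributes the map and reduce calls across machines; the control flow above only mirrors the order of the phases.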

Example of where MapReduce can be applied
Counting how many movies each user rated on MovieLens.
MovieLens is a web-based recommender system and virtual community
that recommends movies for its users to watch, based on their
preferences, using collaborative filtering of members’ movie
ratings and movie reviews.

Sample of the MovieLens dataset
USERID MOVIEID RATING TIMESTAMP
196 242 3 881250949
186 302 3 891717742
196 377 1 878887116
244 51 2 880606923
166 346 1 886397596
186 474 4 884182806
186 265 2 881171488

The mapper function
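The mapper function converts raw data into key/value pairs. The
key is the userID and the value is 1, marking one rated movie for
that user. The movieID, rating and timestamp are not needed for
this count, so the mapper drops them, which keeps the data passed
on to the reducer small.
def mapper_get_userID(self, _, line):
    # Each input line holds four tab-separated fields.
    (userID, movieID, rating, timestamp) = line.split('\t')
    # Emit one count per rating, keyed by user.
    yield userID, 1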

The mapper function (continued)
Applied to the sample dataset, the mapper produces these
key/value pairs:
196:1 186:1 196:1 244:1 166:1 186:1 186:1
Once the mapper finishes, the data has been extracted into
key/value pairs that are ready for the reducer to aggregate.
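The mapping step can be reproduced on the sample rows with a standalone function. This is a sketch under assumptions: the name map_user_to_one is hypothetical, the rows are taken from the sample dataset slide, and the fields are assumed to be tab-separated in the raw file.

```python
# Sample rows from the dataset slide, assumed tab-separated in the raw file.
SAMPLE_LINES = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
    "186\t474\t4\t884182806",
    "186\t265\t2\t881171488",
]


def map_user_to_one(line):
    """Hypothetical standalone mapper: keep the user ID, emit a count of 1."""
    user_id, _movie_id, _rating, _timestamp = line.split("\t")
    return user_id, 1


pairs = [map_user_to_one(line) for line in SAMPLE_LINES]
print(" ".join(f"{user}:{count}" for user, count in pairs))
# prints: 196:1 186:1 196:1 244:1 166:1 186:1 186:1
```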

Shuffle and Sort
MapReduce then sorts and groups the mapped data (“shuffle and
sort”). At this point it collects the values for each unique key
into one list.
Mapper output: 196:1 186:1 196:1 244:1 166:1 186:1 186:1
After shuffle and sort: 166:1 186:1,1,1 196:1,1 244:1
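The grouping can be imitated on one machine by sorting the pairs by key and collecting the values of equal keys. The function name shuffle_and_sort is an assumption for this sketch; a real framework performs this step itself, across machines.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output from the slides, as (userID, 1) pairs.
mapped = [("196", 1), ("186", 1), ("196", 1), ("244", 1),
          ("166", 1), ("186", 1), ("186", 1)]


def shuffle_and_sort(pairs):
    """Hypothetical helper: sort by key, then gather each key's values."""
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby only merges adjacent equal keys, so the sort must come first.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


grouped = shuffle_and_sort(mapped)
print(grouped)
# prints: [('166', [1]), ('186', [1, 1, 1]), ('196', [1, 1]), ('244', [1])]
```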

The reducer function
The reducer takes the output of shuffle and sort. It is called
once for each unique key, performs the computation on that key's
values, and produces the output.
def reducer_count_ratings(self, key, values):
    # Sum the 1s emitted for this user to get the number of ratings.
    yield key, sum(values)

The reducer function (continued)
Input: 166:1 186:1,1,1 196:1,1 244:1
Output: 166:1 186:3 196:2 244:1
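The reduction can be checked with a standalone function applied to the grouped data. The name reduce_count_ratings and the list-of-tuples input shape are assumptions for this sketch; the summing logic follows the reducer shown in the slides.

```python
# Grouped data as it leaves shuffle and sort: one list of 1s per user.
grouped = [("166", [1]), ("186", [1, 1, 1]), ("196", [1, 1]), ("244", [1])]


def reduce_count_ratings(key, values):
    """Hypothetical standalone reducer: total the 1s for one user."""
    return key, sum(values)


counts = [reduce_count_ratings(key, values) for key, values in grouped]
print(" ".join(f"{user}:{total}" for user, total in counts))
# prints: 166:1 186:3 196:2 244:1
```

Each user's total equals the number of rows that user has in the sample dataset, for example three rows for user 186.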

THANK YOU FOR YOUR
ATTENTION
END
