MapReduce

•Download as PPTX, PDF•

0 likes•73 views

MostafaAliAbbas

MapReduce in Distributed Operating System

Software

MapReduce
Mostafa Abbas
American University of Culture and
Education (AUCE)
Nabatieh Campus
Distributed Operating Systems
CSI510
Dr. Omar Falou

Problem Statement / Motivation
01
MapReduce definition
02
A Typical MapReduce Program
03
MapReduce workflow
04
Outline
MapReduce Examples
05
Conclusion
07
GFS
06

1. Why MapReduce?
 Considerations
 Threading is hard!
 How do you scale to more machines?
 How do you handle machine failures?
 How do you facilitate communication between nodes?
 Does your solution scale?
 Before MapReduce
 Large Concurrent Systems
 Grid Computing
 Rolling Your Own Solution
 Scale out, not up!
Map Reduce
1

2. What is MapReduce?
“MapReduce is a programming model and an associated
implementation for processing and generating large data sets”
 Programming model
 Abstractions to express simple computations
 Library
 Takes care of the gory stuff: Parallelization, Fault Tolerance,
Data Distribution and Load Balancing
Map Reduce
2

3. A Typical MapReduce Program
Map Reduce
3
1. Read a lot of data.
2. Map: extract something you
care about from each record.
3. Shuffle and Sort.
4. Reduce: aggregate,
summarize, filter, or transform.
5. Write the results.

4. MapReduce workflow 4
Map Reduce
Worker
Worker
Worker
Worker
Worker
read
local
write
remote
read,
sort
Output
File 0
Output
File 1
write
Split 0
Split 1
Split 2
Input Data Output Data
Map Reduce

5. MapReduce Examples
 Mapper
 Reads in input pair <K,V> Outputs a pair <K’, V’>
 Ex. For our Word Count example, with the following input: “The teacher went to
the store. The store was closed; the store opens in the morning. The store opens
at 9am.”
The output would be:
 <The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store,
1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1>
<morning, 1> <the 1> <store, 1> <opens, 1> <at, 1> <9am, 1>
Map Reduce
5

 Reducer
 Accepts the Mapper output, and collects values on the key
 For our example, the reducer input would be:
<The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1>
<the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1>
<opens, 1> <in, 1> <the, 1> <morning, 1> <the 1> <store, 1>
<opens, 1> <at, 1> <9am, 1>
The output would be:
• <The, 6> <teacher, 1> <went, 1> <to, 1> <store, 3> <was, 1> <closed, 1>
<opens, 1> <morning, 1> <at, 1> <9am, 1>
Map Reduce
6

6. GFS Distributed File System
HOW DATA IS SAVED?!
Each file is split into contiguous 64 Mbyte chunks
GFS cluster is made up of two types of nodes:
1. Large number of Chunkservers
2. One Master node
Link GFS client library into applications
Map Reduce
7

7. Conclusion
 MapReduce provides a simple
way to scale your application.
8
Map Reduce
 Scales out to more machines,
rather than scaling up.
 Effortlessly scale from a single machine
to thousands.
 Fault tolerant & High performance.
 If you can fit your use case to its
paradigm, scaling is handled by the
framework.

Similar to MapReduce

Map-Reduce for Machine Learning on Multicoreillidan2004

Developing a Map Reduce ApplicationDr. C.V. Suresh Babu

Cloud computing_processing frameworksReem Abdel-Rahman

Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil

Architecting and productionising data science applications at scalesamthemonad

Big Data, a space adventure - Mario Cartia - Codemotion Rome 2015Codemotion

ENAR short courseDeepak Agarwal

Hadoop interview questionpappupassindia

Challenges in Large Scale Machine LearningSudarsun Santhiappan

Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil

Seminar Presentation HadoopVarun Narang

Ke yi small summaries for big datajins0618

Hadoop on-mesosHenry Cai 蔡明航

Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computi...Sugree Phatanapherom

Size-Based Scheduling: From Theory To Practice, And BackMatteo Dell'Amico

$IEEE CLOUD \'11$ $IEEE CLOUD \'11$

IEEE CLOUD \'11David Ribeiro Alves

Intermachine ParallelismSri Prasanna

MapReduce on Zero VM Joy Rahman

Hadoop interview questions - Softwarequery.comsoftwarequery

Predict saturated thickness using tensor board visualizationVinh Nguyen

Similar to MapReduce (20)

Map-Reduce for Machine Learning on Multicore

Developing a Map Reduce Application

Cloud computing_processing frameworks

Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)

Architecting and productionising data science applications at scale

Big Data, a space adventure - Mario Cartia - Codemotion Rome 2015

ENAR short course

Hadoop interview question

Challenges in Large Scale Machine Learning

Provisioning and Capacity Planning (Travel Meets Big Data)

Seminar Presentation Hadoop

Ke yi small summaries for big data

Hadoop on-mesos

Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computi...

Size-Based Scheduling: From Theory To Practice, And Back

$IEEE CLOUD \'11$ $IEEE CLOUD \'11$

IEEE CLOUD \'11

Intermachine Parallelism

MapReduce on Zero VM

Hadoop interview questions - Softwarequery.com

Predict saturated thickness using tensor board visualization

Recently uploaded

The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss

Optimizing AI for immediate response in Smart CCTVshikhaohhpro

TECUNIQUE: Success Stories: IT Service providermohitmore19

Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd

8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82

Define the academic and professional writing..pdfPearlKirahMaeRagusta1

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit

Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812

%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls

HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171

%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba

The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171

%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba

Direct Style Effect Systems -The Print[A] Example- A Comprehension AidPhilip Schwarz

%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba

10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems

Recently uploaded (20)

The Top App Development Trends Shaping the Industry in 2024-25 .pdf

Optimizing AI for immediate response in Smart CCTV

TECUNIQUE: Success Stories: IT Service provider

Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...

8257 interfacing 2 in microprocessor for btech students

Define the academic and professional writing..pdf

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...

Unlocking the Future of AI Agents with Large Language Models

%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified

HR Software Buyers Guide in 2024 - HRSoftware.com

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf

%in kempton park+277-882-255-28 abortion pills for sale in kempton park

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain

The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...

%in ivory park+277-882-255-28 abortion pills for sale in ivory park

Direct Style Effect Systems -The Print[A] Example- A Comprehension Aid

%in Midrand+277-882-255-28 abortion pills for sale in midrand

10 Trends Likely to Shape Enterprise Technology in 2024

MapReduce

1. MapReduce Mostafa Abbas American University of Culture and Education (AUCE) Nabatieh Campus Distributed Operating Systems CSI510 Dr. Omar Falou

2. Problem Statement / Motivation 01 MapReduce definition 02 A Typical MapReduce Program 03 MapReduce workflow 04 Outline MapReduce Examples 05 Conclusion 07 GFS 06

3. 1. Why MapReduce?  Considerations  Threading is hard!  How do you scale to more machines?  How do you handle machine failures?  How do you facilitate communication between nodes?  Does your solution scale?  Before MapReduce  Large Concurrent Systems  Grid Computing  Rolling Your Own Solution  Scale out, not up! Map Reduce 1

4. 2. What is MapReduce? “MapReduce is a programming model and an associated implementation for processing and generating large data sets”  Programming model  Abstractions to express simple computations  Library  Takes care of the gory stuff: Parallelization, Fault Tolerance, Data Distribution and Load Balancing Map Reduce 2

5. 3. A Typical MapReduce Program Map Reduce 3 1. Read a lot of data. 2. Map: extract something you care about from each record. 3. Shuffle and Sort. 4. Reduce: aggregate, summarize, filter, or transform. 5. Write the results.

6. 4. MapReduce workflow 4 Map Reduce Worker Worker Worker Worker Worker read local write remote read, sort Output File 0 Output File 1 write Split 0 Split 1 Split 2 Input Data Output Data Map Reduce

7. 5. MapReduce Examples  Mapper  Reads in input pair <K,V> Outputs a pair <K’, V’>  Ex. For our Word Count example, with the following input: “The teacher went to the store. The store was closed; the store opens in the morning. The store opens at 9am.” The output would be:  <The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the 1> <store, 1> <opens, 1> <at, 1> <9am, 1> Map Reduce 5

8.  Reducer  Accepts the Mapper output, and collects values on the key  For our example, the reducer input would be: <The, 1> <teacher, 1> <went, 1> <to, 1> <the, 1> <store, 1> <the, 1> <store, 1> <was, 1> <closed, 1> <the, 1> <store, 1> <opens, 1> <in, 1> <the, 1> <morning, 1> <the 1> <store, 1> <opens, 1> <at, 1> <9am, 1> The output would be: • <The, 6> <teacher, 1> <went, 1> <to, 1> <store, 3> <was, 1> <closed, 1> <opens, 1> <morning, 1> <at, 1> <9am, 1> Map Reduce 6

9. 6. GFS Distributed File System HOW DATA IS SAVED?! Each file is split into contiguous 64 Mbyte chunks GFS cluster is made up of two types of nodes: 1. Large number of Chunkservers 2. One Master node Link GFS client library into applications Map Reduce 7

10. 7. Conclusion  MapReduce provides a simple way to scale your application. 8 Map Reduce  Scales out to more machines, rather than scaling up.  Effortlessly scale from a single machine to thousands.  Fault tolerant & High performance.  If you can fit your use case to its paradigm, scaling is handled by the framework.

MapReduce

Recommended

Recommended

More Related Content

Similar to MapReduce

Similar to MapReduce (20)

Recently uploaded

Recently uploaded (20)

MapReduce