MapReduce: Simplified Data Processing on Large Clusters
Rob Keisler, CSCI 638, Summer 2011
Transcript of "MapReduce"

  1. MapReduce: Simplified Data Processing on Large Clusters
     Rob Keisler, CSCI 638, Summer 2011
  2. Outline
     ● Background
     ● Model
     ● Examples
     ● Execution
     ● Conclusions
  3. Background
     ● Transformation operations are conceptually straightforward
       ○ Until the data is large and the computation must be distributed over hundreds or thousands of machines
     ● So, Google created MapReduce
     ● MapReduce is a programming abstraction
       ○ Expresses simple computations
       ○ Hides the complexity of distributing them
  4. Model
     ● Utilizes the higher-order functions Map and Reduce to take a set of input key/value pairs and produce a set of output key/value pairs
     ● Map
       ○ Takes an input key/value pair and produces a set of intermediate key/value pairs
     ● Reduce
       ○ Accepts an intermediate key I and a set of values for that key, and merges those values to form a possibly smaller set of values
  5. Examples
     ● Distributed Grep
     ● Count of URL Access Frequency
     ● Reverse Web-Link Graph
     ● Term-Vector per Host
     ● Inverted Index
     ● Distributed Sort
  6. Execution Overview
  7. Conclusions
     ● The MapReduce programming model proved to be a useful abstraction for many different purposes
       ○ Easy to use
         ■ even for programmers without experience with parallel and distributed systems
       ○ A large variety of problems are easily expressible as MapReduce computations
       ○ The implementation scales to large clusters of machines
     ● Greatly simplifies large-scale computations at Google
  8. Questions?
     http://labs.google.com/papers/mapreduce.html
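
The Model slide describes Map and Reduce only in words, so a small sketch may help make the data flow concrete. The Python below is a single-process illustration, not the distributed implementation the slides refer to: `run_mapreduce`, the helper function names, and the sample inputs are all assumptions made up for this example. It applies the model to two items from the Examples slide, Count of URL Access Frequency and Inverted Index.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(
    inputs: Iterable[Pair],
    map_fn: Callable[[Any, Any], Iterable[Pair]],
    reduce_fn: Callable[[Any, List[Any]], Iterable[Any]],
) -> List[Pair]:
    """Hypothetical in-memory driver: map, group by intermediate key, reduce."""
    # Map phase: each input pair yields zero or more intermediate pairs,
    # which are grouped by intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)

    # Reduce phase: merge the values collected for each intermediate key.
    # Keys are visited in sorted order so the output is deterministic.
    output = []
    for ikey in sorted(groups):
        for result in reduce_fn(ikey, groups[ikey]):
            output.append((ikey, result))
    return output


# Count of URL Access Frequency: map emits (url, 1), reduce sums the ones.
def url_map(log_name, log_text):
    for line in log_text.splitlines():
        url = line.strip()
        if url:
            yield url, 1


def url_reduce(url, counts):
    yield sum(counts)


# Inverted Index: map emits (word, doc_id), reduce collects the sorted doc ids.
def index_map(doc_id, text):
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(word, doc_ids):
    yield sorted(doc_ids)


# Made-up sample inputs, for illustration only.
logs = [("log-a", "/home\n/about\n/home"), ("log-b", "/home\n/contact")]
url_counts = run_mapreduce(logs, url_map, url_reduce)

docs = [("d1", "map reduce"), ("d2", "reduce sort")]
inverted_index = run_mapreduce(docs, index_map, index_reduce)

print(url_counts)
print(inverted_index)
```

In both cases the user writes only the map and reduce functions, and the grouping step in between is generic. A real MapReduce system would handle that grouping, along with the distribution of work across machines, on the user's behalf.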