MapReduce

Published in: Technology, News & Politics

  1. MapReduce: Simplified Data Processing on Large Clusters
     Rob Keisler, CSCI 638, Summer 2011
  2. Outline
     ● Background
     ● Model
     ● Examples
     ● Execution
     ● Conclusions
  3. Background
     ● Transformation operations are conceptually straightforward
       ○ Until the data is large and the computation must be distributed over hundreds or thousands of machines
     ● So, Google created MapReduce
     ● MapReduce is a programming abstraction
       ○ Expresses simple computations
       ○ Hides complexity details
  4. Model
     ● Uses the higher-order functions Map and Reduce to take a set of input key/value pairs and produce a set of output key/value pairs
     ● Map
       ○ Takes an input key/value pair and produces a set of intermediate key/value pairs
     ● Reduce
       ○ Accepts an intermediate key I and a set of values for that key, and merges those values to form a possibly smaller set of values
  5. Examples
     ● Distributed Grep
     ● Count of URL Access Frequency
     ● Reverse Web-Link Graph
     ● Term-Vector per Host
     ● Inverted Index
     ● Distributed Sort
  6. Execution Overview
  7. Conclusions
     ● The MapReduce programming model proved to be a useful abstraction for many different purposes
       ○ Easy to use
         ■ Even for programmers without experience with parallel and distributed systems
       ○ A large variety of problems are easily expressible as MapReduce computations
       ○ The implementation scales to large clusters of machines
     ● Greatly simplifies large-scale computations at Google
  8. Questions?
     http://labs.google.com/papers/mapreduce.html
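
The Model slide can be made concrete with a small single-process sketch. This is not Google's implementation and it distributes nothing: it only illustrates the Map, group-by-key, Reduce flow described on the slide. The driver `map_reduce` and the helpers `url_mapper`, `count_reducer`, `index_mapper`, and `index_reducer` are hypothetical names chosen for illustration. They cover two items from the Examples slide (Count of URL Access Frequency and Inverted Index), under the assumption that each log record holds one URL per line.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Toy in-memory driver (hypothetical): runs Map, groups by key, runs Reduce."""
    # Map phase: each input key/value pair yields intermediate key/value pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)

    # Reduce phase: each intermediate key and its list of values are merged
    # into a possibly smaller list of output values.
    return {ikey: list(reducer(ikey, intermediate[ikey]))
            for ikey in sorted(intermediate)}


# Count of URL Access Frequency (assumes one requested URL per log line).
def url_mapper(log_name, log_text):
    for line in log_text.splitlines():
        url = line.strip()
        if url:
            yield url, 1


def count_reducer(url, counts):
    yield sum(counts)


# Inverted Index: word -> sorted list of IDs of documents containing it.
def index_mapper(doc_id, text):
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word, doc_ids):
    yield from sorted(set(doc_ids))


if __name__ == "__main__":
    logs = [("log1", "/a\n/b\n/a"), ("log2", "/b\n/a")]
    print(map_reduce(logs, url_mapper, count_reducer))

    docs = [("d1", "map reduce"), ("d2", "reduce Sort")]
    print(map_reduce(docs, index_mapper, index_reducer))
```

Only the mapper and reducer change between the two examples; the driver stays the same. That separation is the point of the abstraction: in a real system the driver would also handle partitioning, scheduling, and failures across machines.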
