Introducing Map Reduce: CloudCamp St. Louis 2009

This introduction to Map/Reduce was presented by Michael Groner of Appistry at CloudCamp St. Louis on December 10, 2009.

Published in: Business, Technology

  1. Introducing MapReduce
     • Michael Groner
     • CloudCamp St. Louis 2009
  2. Description
     • MapReduce: a software framework to support processing of massive data sets across distributed computers
     • Simple, powerful programming model
     • Language independent
     • Break down the processing problem into embarrassingly parallel atomic operations
  3. MapReduce installations
     • Google – Index building
     • Visa – Transaction processing
     • Facebook – Facebook Lexicon
     • Intelligence community
     • Yahoo/Google – Terabyte Sort (10 billion 100-byte records)
       • Yahoo: 910 nodes, 206 seconds
       • Google: ~1,000 nodes, 68 seconds
  4. Algorithm
     • Map phase: raw data is analyzed and converted to name/value pairs
     • Shuffle phase: all name/value pairs are sorted and grouped by their keys
     • Reduce phase: all values associated with a key are processed to produce results
  5. MapReduce benefits
     • Scale: processing speed increases with the number of machines involved
     • Reliable: loss of any one machine doesn’t stop processing
     • Cost: often built from heterogeneous, commodity-grade computers
  6. More information
     • MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat)
     • Slides: tech.michaelgroner.com
  7. Thank You
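
The three phases on the Algorithm slide can be illustrated with a small word count. This is a minimal single-process sketch, not taken from the slides: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) and the word-count task are assumptions chosen for illustration. A real MapReduce framework would run the map and reduce calls on many machines in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: convert one raw record (a line of text) into name/value pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: process all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in order on an iterable of raw records."""
    # Each map call is independent of the others, which is what makes
    # the map phase embarrassingly parallel in a distributed setting.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    # Each key is reduced independently of every other key.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(lines))
```

Because each record is mapped on its own and each key is reduced on its own, both phases can be spread across machines; only the shuffle needs to move data between them.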
