Map Reduce: An Example (James Grant at Big Data Brighton)

Presentation by Brandwatch developer James Grant at the second Big Data Brighton meetup, hosted by Brandwatch: www.brandwatch.com

Transcript of "Map Reduce: An Example (James Grant at Big Data Brighton)"

  1. Map Reduce: An Example
  2. Who am I?
     My name is James Grant (james@queeg.org). I'm a developer here at Brandwatch. For the last three years I've been a Data Engineer at Last.fm and the maintainer of their Hadoop cluster.
  3. Coming up…
     ● What happens during MapReduce?
     ● Plays and Reach from music listening data
     ● The Mapper pseudo code
     ● The Reducer pseudo code
     ● The result
     ● What if…?
  4. What happens during MapReduce?
     [Diagram: Input Data Fragments → Mappers → Map Data Fragments → Sort → Reduce Data → Reducers → Output Fragments]
  5. Plays and Reach from music listening data
     ● Plays - The number of times that song has been played
     ● Reach - The number of unique listeners to that song
     ● Similar to hits and uniques for web properties
     ● Input data has columns for user id and song id (amongst others)
  6. The Mapper
         function map(Integer user, Integer song):
           emit(song, user);
  7. The Reducer
         function reduce(Integer song, Iterator users):
           Integer plays = 0;
           Set uniqueUsers = [];
           foreach user in users:
             increment plays;
             if user not within uniqueUsers:
               uniqueUsers.add(user);
           result.plays = plays;
           result.reach = uniqueUsers.cardinality();
           emit(song, result);
  8. What if…?
     You often hear that for nearly all cases you should use a higher-level tool like Pig or Hive to solve problems. So what does the Pig script look like for this problem?
  9. Using Pig
         -- Load (user, song) pairs; PigStorage() splits on tabs by default.
         subs = LOAD 'submissions.tsv' USING PigStorage() AS (user:int, song:int);
         songs = GROUP subs BY song;
         songs = FOREACH songs {
             -- plays counts every row for the song; reach counts distinct users.
             listeners = DISTINCT subs.user;
             GENERATE group AS song, COUNT(subs) AS plays, COUNT(listeners) AS reach;
         };
         STORE songs INTO 'playsreach.tsv';
  10. Questions?
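The flow on slide 4, combined with the mapper and reducer pseudocode from slides 6 and 7, can be simulated in a single Python process to see how plays and reach fall out of the sort step. This is a minimal sketch, not Hadoop: the round-robin split into fragments, the in-memory sort standing in for the shuffle, and the names `map_fn`, `reduce_fn` and `run_mapreduce` are illustrative assumptions, and the listening records are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(user, song):
    # Slide 6: re-key each listen by song so that all of a song's listeners
    # arrive at the same reducer call.
    yield song, user


def reduce_fn(song, users):
    # Slide 7: plays counts every listen, reach counts distinct listeners.
    plays = 0
    unique_users = set()
    for user in users:
        plays += 1
        unique_users.add(user)  # adding an existing member is a no-op
    yield song, {"plays": plays, "reach": len(unique_users)}


def run_mapreduce(records, num_fragments=2):
    # Hypothetical split: deal records round-robin into input fragments,
    # one per simulated mapper.
    fragments = [records[i::num_fragments] for i in range(num_fragments)]

    # Map phase: each fragment is processed independently.
    mapped = []
    for fragment in fragments:
        for user, song in fragment:
            mapped.extend(map_fn(user, song))

    # Sort phase: an in-memory sort stands in for Hadoop's shuffle and sort,
    # placing all pairs with the same key next to each other.
    mapped.sort(key=itemgetter(0))

    # Reduce phase: one call per distinct key, given an iterator of its values.
    results = {}
    for song, pairs in groupby(mapped, key=itemgetter(0)):
        for key, value in reduce_fn(song, (user for _, user in pairs)):
            results[key] = value
    return results


# Made-up sample data: (user id, song id) pairs.
listens = [(1, 100), (2, 100), (1, 100), (3, 200)]
print(run_mapreduce(listens))
# {100: {'plays': 3, 'reach': 2}, 200: {'plays': 1, 'reach': 1}}
```

User 1 played song 100 twice, so it adds two plays but only one unit of reach. Reach can be computed exactly in the reducer only because the sort step delivers every listener of a song to the same call.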
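For comparison with the slide 6 pseudocode, the mapper can be sketched as a Hadoop Streaming mapper in Python, which reads records as text lines on stdin and writes tab-separated key/value lines to stdout. The slides do not give the file layout, so the tab separator and the column positions `USER_COL` and `SONG_COL` are assumptions, and `mapper` and `main` are illustrative names.

```python
import sys

# Hypothetical column layout: the slides only say the input has user id and
# song id columns "amongst others", so these positions are assumptions.
USER_COL = 0
SONG_COL = 1


def mapper(lines):
    """Yield one 'song<TAB>user' line per listening record (slide 6)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(USER_COL, SONG_COL):
            continue  # skip malformed records instead of failing the task
        user, song = fields[USER_COL], fields[SONG_COL]
        # Hadoop Streaming treats the text before the first tab as the key,
        # so the song id becomes the key and the user id the value.
        yield f"{song}\t{user}\n"


def main():
    # Streaming mappers read records on stdin and write key/value lines to stdout.
    sys.stdout.writelines(mapper(sys.stdin))
```

Saved as a hypothetical `mapper.py` ending with `if __name__ == "__main__": main()`, the script could be handed to Hadoop Streaming through its `-mapper` option. Keeping `mapper` a plain generator over lines means it can be tested without a cluster.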
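The slide 7 reducer can likewise be sketched as a Hadoop Streaming reducer in Python. Unlike the pseudocode, which receives one song and an iterator of its users, a Streaming reducer receives a single sorted stream of `song<TAB>user` lines covering many songs and has to detect where one song ends and the next begins. The output format `song<TAB>plays<TAB>reach` and the names `parse`, `reducer` and `main` are assumptions for illustration.

```python
import sys
from itertools import groupby


def parse(lines):
    # Each line is 'song<TAB>user'; the framework has already sorted by song.
    for line in lines:
        song, user = line.rstrip("\n").split("\t", 1)
        yield song, user


def reducer(lines):
    """Yield 'song<TAB>plays<TAB>reach' for each song (slide 7)."""
    # groupby finds the boundaries between consecutive runs of the same song.
    for song, pairs in groupby(parse(lines), key=lambda pair: pair[0]):
        plays = 0
        unique_users = set()
        for _, user in pairs:
            plays += 1
            unique_users.add(user)  # repeats are ignored by the set
        yield f"{song}\t{plays}\t{len(unique_users)}\n"


def main():
    # Streaming reducers read sorted key/value lines on stdin.
    sys.stdout.writelines(reducer(sys.stdin))
```

Paired with any Streaming mapper that emits `song<TAB>user` lines, the job could be tried locally as `cat submissions.tsv | python mapper.py | LC_ALL=C sort | python reducer.py` (script names hypothetical), where `sort` plays the part of the shuffle and `LC_ALL=C` keeps the ordering byte-wise so each song's lines stay contiguous. The set holds every distinct listener of one song at a time, so memory grows with a song's reach; the slide's `Set` makes the same trade-off.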