Map/Confused? A practical approach to Map/Reduce with MongoDB

Talk given at MongoDB Munich on 16.10.2012 about the different approaches to the Map/Reduce algorithm in MongoDB. The talk compares the performance of the built-in MongoDB Map/Reduce, group(), aggregate(), find() and the MongoDB-Hadoop Adapter using a practical use case.

Transcript

  • 4.
      {
        "_id" : ObjectId("4fb9fb91d066d657de8d6f36"),
        "text" : "MongoDB uses Map/Reduce #epic #win",
        …
        "user" : {
          "friends_count" : 73,
          …
          "followers_count" : 102,
          "id" : 53507833,
        },
        …
      }
  • 5.
      mongod --rest --shardsvr --port 27017 --dbpath /tmp/shard1/ --smallfiles
      mongod --rest --shardsvr --port 27017 --dbpath /tmp/shard1/ --smallfiles
      mongod --configsvr --port 10000 --dbpath /tmp/config/ --smallfiles
      mongos --port 22222 --configdb localhost:10000
      1. db.tweets.mapReduce()
      2. db.tweets.group()
      3. db.tweets.aggregate()
      4. MongoDB-Hadoop Adapter
      5. db.tweets.find()
  • 6.
      var measure = function(c) {
        var a = Date.now();          // start time in milliseconds
        var results = c.apply();     // run the measured function
        var d = Date.now() - a;      // elapsed milliseconds
        return { results: results, duration: d };
      };
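A minimal sketch of how a helper like the one on slide 6 can be used, written so that it runs in plain Node.js without a database. The function `sumTo` is a hypothetical stand-in for a query and is not part of the talk.

```javascript
// Timing helper in the spirit of slide 6; `task` is any zero-argument function.
const measure = (task) => {
  const start = Date.now();
  const results = task();
  return { results, duration: Date.now() - start };
};

// Hypothetical workload standing in for a database call.
const sumTo = (n) => {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
};

// Wrap the call in a function so that measure() controls when it starts.
const timed = measure(() => sumTo(1000));
console.log(timed.results, timed.duration + " ms");
```

In the mongo shell, find() returns a lazy cursor, so a measured function would probably need to exhaust it (for example with toArray()) for the duration to include the actual query work.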
  • 7.
      function() {
        if (this.user != null) {
          emit("user", { userName: this.user.name, followers: this.user.followers_count });
        }
      }
  • 8.
      function(key, values) {
        var result = null;
        values.forEach(function(value) {
          if (result == null || result.followers < value.followers) {
            result = value;
          }
        });
        return result;
      }
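To make the interplay of the map and reduce functions from slides 7 and 8 easier to follow, here is a small in-memory emulation that runs in plain Node.js. It is only a sketch: `runMapReduce` and the sample documents are made up for illustration, and `emit` is passed as a parameter here, whereas MongoDB provides it as a global inside the map function.

```javascript
// Made-up sample documents shaped like the tweet on slide 4.
const tweets = [
  { text: "a", user: { name: "alice", followers_count: 102 } },
  { text: "b", user: { name: "bob", followers_count: 4711 } },
  { text: "c", user: null },
  { text: "d", user: { name: "carol", followers_count: 73 } },
];

// Map step from slide 7; `this` is the current document.
function map(emit) {
  if (this.user != null) {
    emit("user", { userName: this.user.name, followers: this.user.followers_count });
  }
}

// Reduce step from slide 8: keep the value with the most followers.
function reduce(key, values) {
  let result = null;
  values.forEach((value) => {
    if (result == null || result.followers < value.followers) result = value;
  });
  return result;
}

// Hypothetical driver: collect emitted values per key, then reduce each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc, emit));
  const out = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has only one value.
    out.push({ _id: key, value: values.length > 1 ? reduceFn(key, values) : values[0] });
  }
  return out;
}

const mrResult = runMapReduce(tweets, map, reduce);
console.log(JSON.stringify(mrResult));
```

Because every document is emitted under the same key "user", the whole collection ends up in a single group, and the reduce step amounts to a maximum search over all tweets.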
  • 9.
      db.tweets.group({
        key: {},
        initial: { name: "", followers_count: 0 },
        reduce: function(obj, prev) {
          if (obj.user != null && prev.followers_count < obj.user.followers_count) {
            prev.name = obj.user.name;
            prev.followers_count = obj.user.followers_count;
          }
        }
      })
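The group() call on slide 9 can be read as a fold: the reduce function receives each document together with a running aggregate that starts as a copy of `initial`. The following sketch emulates that in plain Node.js, assuming made-up sample documents; `groupDocs`, `groupReduce` and `top` are illustrative names only.

```javascript
// Made-up sample documents; only the fields used on slide 9 are present.
const groupDocs = [
  { user: { name: "alice", followers_count: 102 } },
  { user: null },
  { user: { name: "bob", followers_count: 4711 } },
];

const initial = { name: "", followers_count: 0 };

// Same body as the slide's reduce: mutates `prev` in place, returns nothing.
function groupReduce(obj, prev) {
  if (obj.user != null && prev.followers_count < obj.user.followers_count) {
    prev.name = obj.user.name;
    prev.followers_count = obj.user.followers_count;
  }
}

// With an empty key there is a single group, so the call is one fold
// over all documents, starting from a copy of `initial`.
const top = groupDocs.reduce((prev, obj) => {
  groupReduce(obj, prev);
  return prev;
}, { ...initial });

console.log(top);
```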
  • 10.
      db.tweets.aggregate(
        { $group: {
            _id: { user_name: "$user.name" },
            followers_count: { $max: "$user.followers_count" }
        }},
        { $sort: { "followers_count": -1 } },
        { $limit: 1 },
        { $project: {
            _id: 0,
            user_name: "$_id.user_name",
            followers_count: "$followers_count"
        }}
      )
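As a rough illustration of what the four stages on slide 10 compute, the sketch below performs the same steps on an in-memory array in plain Node.js. The sample data and the names `aggDocs`, `maxByUser` and `topUsers` are assumptions for illustration. Skipping tweets without a user is a simplification and may differ from how the aggregation framework treats missing fields.

```javascript
// Made-up sample documents with several tweets per user.
const aggDocs = [
  { user: { name: "alice", followers_count: 100 } },
  { user: { name: "bob", followers_count: 4700 } },
  { user: { name: "alice", followers_count: 102 } },
  { user: { name: "bob", followers_count: 4711 } },
];

// $group: one entry per user name, keeping the maximum followers_count seen.
const maxByUser = new Map();
for (const doc of aggDocs) {
  if (doc.user == null) continue; // simplification: skip tweets without a user
  const { name, followers_count } = doc.user;
  const seen = maxByUser.get(name);
  if (seen === undefined || seen < followers_count) {
    maxByUser.set(name, followers_count);
  }
}

// $sort descending, $limit 1, then $project into the flat output shape.
const topUsers = [...maxByUser.entries()]
  .sort((a, b) => b[1] - a[1])
  .slice(0, 1)
  .map(([user_name, followers_count]) => ({ user_name, followers_count }));

console.log(topUsers);
```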
  • 11.
      #!/usr/bin/env python
      # encoding: utf-8
      import sys
      sys.path.append(".")
      from pymongo_hadoop import BSONMapper
      def mapper(documents):
          # Emit one record per tweet that carries user data.
          for doc in documents:
              if doc["user"] is not None:
                  yield {"_id": doc["user"]["name"].encode("utf-8"),
                         "followers": doc["user"]["followers_count"]}
      BSONMapper(mapper)
      print >> sys.stderr, "Done Mapping!"
  • 12.
      #!/usr/bin/env python
      # encoding: utf-8
      import sys
      sys.path.append(".")
      from pymongo_hadoop import BSONReducer
      def reducer(key, values):
          print >> sys.stderr, "Processing key %s" % key.encode("utf-8")
          # Keep the highest follower count seen for this key.
          _count = 0
          for v in values:
              if _count < v["followers"]:
                  _count = v["followers"]
          return {"_id": key.encode("utf-8"), "count": _count}
      BSONReducer(reducer)
      print >> sys.stderr, "Done Reducing!"
  • 13.
      hadoop jar /usr/lib/hadoop/lib/mongo-hadoop-streaming-assembly-1.1.0-SNAPSHOT.jar \
        -files mapper.py,reducer.py \
        -inputURI mongodb://localhost:27017/twitter.tweets \
        -outputURI mongodb://localhost:27017/twitter.top_user \
        -mapper mapper.py \
        -reducer reducer.py
  • 14. db.tweets.find().sort( {"user.followers_count": -1} ).limit(1)
  • 15-19. db.tweets.mapReduce(), db.tweets.group(), db.tweets.aggregate(), MongoDB-Hadoop Adapter, db.tweets.find()