MongoDB & Hadoop: Flexible Hourly Batch Processing Model

Published in: Technology


  1. {
       "_id" : ObjectId("4dcd3ebc9278000000005158"),
       "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
       "binary" : BinData(0,""),
       "string" : "abc",
       "number" : 3,
       "subobj" : {"subA": 1, "subB": 2 },
       "array" : [1, 2, 3],
       "dbref" : [_id1, _id2, _id3]
       (padding)
     }
  2. db.coll.find({"string": "abc"});
     db.coll.find({"string": /^a.*$/i});
     db.coll.find({"subobj.subA": 1});
     db.coll.find({"subobj.subB": {$exists: true}});
     db.coll.find({"number": 3});
     db.coll.find({"number": {$gt: 1}});
     db.coll.find({"array": {$all: [1, 2]}});
     db.coll.find({"array": {$in: [2, 4, 6]}});
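The queries on slide 2 exercise exact matches, a case-insensitive regex, dotted paths into an embedded document, `$exists`, `$gt`, `$all` and `$in`. To make their semantics concrete, here is a minimal Python sketch, assuming a plain dict stands in for the BSON document (ObjectId, ISODate, BinData and the DBRef array are left out). `get_path` and `matches` are hypothetical helpers that cover only these operators; they are not how the MongoDB server evaluates queries.

```python
import re

# Plain-dict stand-in for the sample document; BSON-only values
# (ObjectId, ISODate, BinData, the DBRef array) are omitted.
doc = {
    "string": "abc",
    "number": 3,
    "subobj": {"subA": 1, "subB": 2},
    "array": [1, 2, 3],
}

_MISSING = object()  # sentinel for "field not present"


def get_path(d, path):
    """Follow a dotted path such as "subobj.subA"; return _MISSING if absent."""
    for part in path.split("."):
        if not isinstance(d, dict) or part not in d:
            return _MISSING
        d = d[part]
    return d


def matches(d, query):
    """Toy evaluator for the operators on the slide (hypothetical, not MongoDB's)."""
    for path, cond in query.items():
        value = get_path(d, path)
        if isinstance(cond, re.Pattern):
            if not isinstance(value, str) or not cond.search(value):
                return False
        elif isinstance(cond, dict):  # operator document such as {"$gt": 1}
            for op, arg in cond.items():
                if op == "$exists":
                    ok = (value is not _MISSING) == arg
                elif op == "$gt":
                    ok = value is not _MISSING and value > arg
                elif op == "$all":
                    ok = isinstance(value, list) and all(x in value for x in arg)
                elif op == "$in":
                    candidates = value if isinstance(value, list) else [value]
                    ok = any(x in arg for x in candidates)
                else:
                    raise ValueError("unsupported operator: " + op)
                if not ok:
                    return False
        else:
            # Plain equality; an array field also matches if it contains the value.
            if value != cond and not (isinstance(value, list) and cond in value):
                return False
    return True


queries = [
    {"string": "abc"},
    {"string": re.compile(r"^a.*$", re.IGNORECASE)},
    {"subobj.subA": 1},
    {"subobj.subB": {"$exists": True}},
    {"number": 3},
    {"number": {"$gt": 1}},
    {"array": {"$all": [1, 2]}},
    {"array": {"$in": [2, 4, 6]}},
]
print([matches(doc, q) for q in queries])
```

Every query on the slide matches the sample document, while changing an argument (for example `{"$gt": 5}`) makes it fail.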
  3. { $set : {"string": "def"} }
     { $inc : {"number": 1} }
     { $pull : {"subobj": {"subB": 2 } } }
     { $addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } }
     { $set : {"newkey": "In-place"} }

     Resulting document:
     {
       "_id" : ObjectId("4dcd3ebc9278000000005158"),
       "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
       "binary" : BinData(0,""),
       "string" : "def",
       "number" : 4,
       "subobj" : {"subA": 1, "subB": 2 },
       "array" : [1, 2, 3, 4, 5, 6],
       "dbref" : [_id1, _id2, _id3],
       "newkey" : "In-place"
     }
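Slide 3 applies update operators to the same document. The sketch below models `$set`, `$inc` and `$addToSet` with `$each` on a plain Python dict, assuming only top-level fields; `apply_update` is a hypothetical helper, not a MongoDB API, and `$pull` is left out. Applying the slide's updates should reproduce the resulting `string`, `number`, `array` and `newkey` values.

```python
import copy


def apply_update(doc, update):
    """Apply $set, $inc and $addToSet (with optional $each) to a copy of doc.

    Hypothetical toy model of the operators on the slide; handles
    top-level field names only.
    """
    out = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                out[field] = arg
            elif op == "$inc":
                # A missing field is treated as 0 before incrementing.
                out[field] = out.get(field, 0) + arg
            elif op == "$addToSet":
                items = arg["$each"] if isinstance(arg, dict) and "$each" in arg else [arg]
                target = out.setdefault(field, [])
                for item in items:
                    if item not in target:  # set semantics: skip duplicates
                        target.append(item)
            else:
                raise ValueError("unsupported operator: " + op)
    return out


doc = {"string": "abc", "number": 3, "array": [1, 2, 3]}
updated = apply_update(doc, {
    "$set": {"string": "def", "newkey": "In-place"},
    "$inc": {"number": 1},
    "$addToSet": {"array": {"$each": [4, 5, 6]}},
})
print(updated)
```

Because `$addToSet` skips values already present, re-running the same update leaves `array` unchanged.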
  4. ScientificPython
  5. def mapper(key, value):
         for word in value.split():
             yield word, 1

     def reducer(key, values):
         yield key, sum(values)

     if __name__ == "__main__":
         import dumbo
         dumbo.run(mapper, reducer)

     dumbo start wordcount.py -hadoop /path/to/hadoop -input wc_input.txt -output wc_output
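A Dumbo job hands `mapper` and `reducer` to Hadoop Streaming. To see what the framework does between the two without a cluster, the following sketch runs the same functions in a single Python process: map every line, sort the pairs by key (the shuffle), then call the reducer once per key. `run_locally` is a hypothetical helper, and the line index stands in for the byte offset that Hadoop's text input would pass as the key.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    for word in value.split():
        yield word, 1


def reducer(key, values):
    yield key, sum(values)


def run_locally(mapper, reducer, lines):
    """Simulate map -> shuffle/sort -> reduce in one process (no Hadoop)."""
    # Map phase: the line index replaces Hadoop's byte-offset key;
    # this mapper ignores the key anyway.
    pairs = [kv for offset, line in enumerate(lines) for kv in mapper(offset, line)]
    # Shuffle/sort: order by key so equal keys are adjacent.
    pairs.sort(key=itemgetter(0))
    results = []
    # Reduce phase: one reducer call per distinct key.
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(key, (v for _, v in group)))
    return results


print(run_locally(mapper, reducer, ["a rose is a rose", "is a rose"]))
```

Because the reducer only sees an iterator of values per key, the same functions work unchanged whether the grouping is done here or by Hadoop.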
  6. [2011-07-01 12:01:48,447]
  7. db.collection.insert({hour: 0, userId: "1234", actionType: "login"});
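Slides 6 and 7 pair a log timestamp with a document carrying an `hour` field, the bucket an hourly batch would work on. A minimal sketch of turning one into the other, assuming `hour` means the hour of day of the log entry and that every timestamp has the same layout as the one on slide 6; `to_hourly_doc` and `LOG_TS_FORMAT` are hypothetical names.

```python
from datetime import datetime

# Layout of the bracketed timestamp on slide 6, e.g. "[2011-07-01 12:01:48,447]";
# %f accepts the three millisecond digits after the comma.
LOG_TS_FORMAT = "[%Y-%m-%d %H:%M:%S,%f]"


def to_hourly_doc(ts_text, user_id, action_type):
    """Build a document shaped like the slide-7 insert, bucketed by hour of day.

    Assumption: "hour" is the hour of day taken from the log timestamp.
    """
    ts = datetime.strptime(ts_text, LOG_TS_FORMAT)
    return {"hour": ts.hour, "userId": user_id, "actionType": action_type}


print(to_hourly_doc("[2011-07-01 12:01:48,447]", "1234", "login"))
```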
  8. m = function() {
         this.tags.forEach(function(z) {
             emit(z, {count: 1});
         });
     };

     r = function(key, values) {
         var total = 0;
         for (var i = 0; i < values.length; i++)
             total += values[i].count;
         return {count: total};
     };

     // The output collection name "tag_counts" is illustrative.
     res = db.things.mapReduce(m, r, {out: "tag_counts"});
     // finalize
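MongoDB may call the reduce function more than once for the same key, feeding earlier reduce output back in, so `r` has to return the same shape that `m` emits (`{count: n}`). This in-memory Python translation mirrors the JavaScript on slide 8 and makes that re-reduce property explicit; `map_reduce` is a hypothetical helper, not a driver API.

```python
from collections import defaultdict


def m(doc):
    # Mirrors the JavaScript map: emit {count: 1} once per tag.
    for tag in doc.get("tags", []):
        yield tag, {"count": 1}


def r(key, values):
    # Returns the same shape it receives, so it can be applied
    # again to its own partial results (re-reduce).
    return {"count": sum(v["count"] for v in values)}


def map_reduce(docs, mapper, reducer):
    """Group emitted values by key, then reduce each group once."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


things = [{"tags": ["dog", "cat"]}, {"tags": ["cat"]}, {"tags": []}]
print(map_reduce(things, m, r))

# Re-reduce: reducing a partial result together with a fresh value
# gives the same total as reducing everything at once.
partial = r("cat", [{"count": 1}])
print(r("cat", [partial, {"count": 1}]))
```

The same property is what allows the results of consecutive hourly runs to be merged: MongoDB's `out: {reduce: ...}` output mode combines new results with an existing collection by running the reduce function again.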
  9. Summary of Features

                     Streaming (Dumbo)   Jython (Happy)   Pydoop
     C/C++ Ext       Yes                 No               Yes
     Standard Lib    Full                Partial          Full
     MR API          No*                 Full             Partial
     Java-like FW    No                  Yes              Yes
     HDFS            No                  Yes              Yes

     (*) You can only write the map and reduce parts as executable scripts.

     - Hadoop-based: same limitations as Streaming and Jython, except for ease of use
     - Other implementations: good if you have your own cluster
     - Hadoop is the most widespread implementation

     Leo, Zanetti: Pydoop: a Python MapReduce and HDFS API for Hadoop
  10. Hadoop Pipes
      - Communication with the Java framework via persistent sockets
      - The C++ app provides a factory used by the framework to create MR components
      - Providing a Mapper and a Reducer is mandatory
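In Pipes the framework never constructs user classes directly; it asks the application's factory for them. The single-process Python sketch below illustrates only that division of labour under stated assumptions: the real factory lives in C++ and is driven over a socket, and every class and method name here (`Mapper`, `Reducer`, `Factory`, `run_task`, `create_mapper`, `create_reducer`) is hypothetical, not the Hadoop Pipes or Pydoop API.

```python
class Mapper:
    def map(self, key, value, emit):
        raise NotImplementedError


class Reducer:
    def reduce(self, key, values, emit):
        raise NotImplementedError


class Factory:
    """Hands the framework fresh Mapper/Reducer instances on demand."""

    def __init__(self, mapper_cls, reducer_cls):
        # Both components are required, mirroring the Pipes rule.
        if mapper_cls is None or reducer_cls is None:
            raise ValueError("a Mapper and a Reducer are mandatory")
        self._mapper_cls = mapper_cls
        self._reducer_cls = reducer_cls

    def create_mapper(self):
        return self._mapper_cls()

    def create_reducer(self):
        return self._reducer_cls()


class WordCountMapper(Mapper):
    def map(self, key, value, emit):
        for word in value.split():
            emit(word, 1)


class WordCountReducer(Reducer):
    def reduce(self, key, values, emit):
        emit(key, sum(values))


def run_task(factory, records):
    """Toy framework side: it only ever talks to the factory."""
    intermediate = {}
    mapper = factory.create_mapper()
    for key, value in records:
        mapper.map(key, value, lambda k, v: intermediate.setdefault(k, []).append(v))
    output = []
    reducer = factory.create_reducer()
    for key in sorted(intermediate):
        reducer.reduce(key, intermediate[key], lambda k, v: output.append((k, v)))
    return output


print(run_task(Factory(WordCountMapper, WordCountReducer), [(0, "b a b")]))
```

Keeping construction behind the factory is what lets the framework decide when and how many components to create while user code supplies only the classes.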
  11. Integration of Pydoop with C++
      - Integration with Pipes: method calls flow from the framework through the C++ layer and the Pydoop API, ultimately reaching user-defined methods; results are wrapped by Boost and returned to the framework
      - Integration with HDFS: function calls are initiated by Pydoop; results are wrapped and returned to the app as Python objects
  12. gawk '
        BEGIN { reducenum = $REDUCE_NUM; }
        { userid = $7; key = $8; }
        key ~ /a{GetLoginBonus}/ { incrby(userid, key, $9, a); next; }
        key ~ /a{SideJob}/ { incrby(userid, key, $11, a); next; }
        key ~ /a{CleanMyShop}/ { hincr(userid, key, $9, a); next; }
        key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/ { incrbydiff(userid, key, $9, a); next; }
        ...
      ' $IN

      # for reducer 1 (such as "userid % reducenum == 0")
      # command userid key value
      MULTI
      HINCRBY 1111 a{ChangeGreed} 3
      HINCRBY 1111 a{GianEvent} 7
      HINCRBY 1111 a{TeamChallenge} 5
      HINCRBY 2222 a{Battle} 3
      HINCRBY 2222 a{ChangeMoney} 3
      ...
      EXEC
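The gawk step on slide 12 aggregates per-user counters and emits Redis commands, split so that each reducer handles its own share of users. The Python sketch below reproduces only the `HINCRBY` path under stated assumptions: whitespace-separated fields with userid, key and value in awk's `$7`, `$8` and `$9`, integer userids, and reducer `i` owning the users with `userid % reduce_num == i`. `build_batches` and the sample lines are hypothetical.

```python
from collections import defaultdict

# 0-based indices of awk's $7 (userid), $8 (key) and $9 (value).
USERID_FIELD, KEY_FIELD, VALUE_FIELD = 6, 7, 8


def build_batches(lines, reduce_num):
    """Sum values per (userid, key) and build one MULTI ... EXEC batch per reducer.

    Hypothetical helper; only the HINCRBY case from the slide is modelled.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        userid, key = int(fields[USERID_FIELD]), fields[KEY_FIELD]
        totals[(userid, key)] += int(fields[VALUE_FIELD])

    batches = {i: ["MULTI"] for i in range(reduce_num)}
    for (userid, key), total in sorted(totals.items()):
        batches[userid % reduce_num].append(f"HINCRBY {userid} {key} {total}")
    for commands in batches.values():
        commands.append("EXEC")
    return batches


# Made-up log lines: six placeholder fields, then userid, key, value.
sample = [
    "- - - - - - 1111 a{TeamChallenge} 2",
    "- - - - - - 1111 a{TeamChallenge} 3",
    "- - - - - - 2222 a{Battle} 3",
]
for reducer, commands in build_batches(sample, 2).items():
    print(reducer, commands)
```

Pre-summing per (userid, key) keeps each batch to one command per counter, and wrapping a reducer's commands in `MULTI`/`EXEC` has Redis execute them as a single transaction.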
