MongoDB & Hadoop: Flexible Hourly Batch Processing Model
Presentation Transcript

  • { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), "binary" : BinData(0,""), "string" : "abc", "number" : 3, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3], "dbref" : [_id1, _id2, _id3] padding}
  • Query examples against the same document:

    db.coll.find({ "string": "abc" });
    db.coll.find({ "string": /^a.*$/i });
    db.coll.find({ "subobj.subA": 1 });
    db.coll.find({ "subobj.subB": { $exists: true } });
    db.coll.find({ "number": 3 });
    db.coll.find({ "number": { $gt: 1 } });
    db.coll.find({ "array": { $all: [1, 2] } });
    db.coll.find({ "array": { $in: [2, 4, 6] } });
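
    The same queries expressed with pymongo, as a sketch under the same assumed URI and
    names; regular expressions go through Python's re module:

        import re
        from pymongo import MongoClient

        coll = MongoClient("mongodb://localhost:27017")["test"]["coll"]

        coll.find({"string": "abc"})
        coll.find({"string": re.compile(r"^a.*$", re.IGNORECASE)})
        coll.find({"subobj.subA": 1})
        coll.find({"subobj.subB": {"$exists": True}})
        coll.find({"number": 3})
        coll.find({"number": {"$gt": 1}})
        coll.find({"array": {"$all": [1, 2]}})
        coll.find({"array": {"$in": [2, 4, 6]}})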
  • { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), { $set : {"string": "def"} } "binary" : BinData(0,""), { $inc : {"number": 1} } "string" : "def", { $pull : {"subobj": {"subB": 2 } } } "number" : 4, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3, 4, 5, 6], "dbref"$addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } } { : [_id1, _id2, _id3] "newkey" : "In-place"} { $set : {"newkey": "In-place"} }
  • ScientificPython
  • Word count with Dumbo:

    def mapper(key, value):
        for word in value.split():
            yield word, 1

    def reducer(key, values):
        yield key, sum(values)

    if __name__ == "__main__":
        import dumbo
        dumbo.run(mapper, reducer)

    Run with:

    dumbo start wordcount.py -hadoop /path/to/hadoop -input wc_input.txt -output wc_output
  • [2011-07-01 12:01:48,447]
  • db.collection.insert({ hour: 0, userId: "1234", actionType: "login" });
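
    A short sketch of how such hourly event documents could be written and then picked up
    by a batch step; the log_action helper and the connection details are hypothetical:

        from datetime import datetime, timezone
        from pymongo import MongoClient

        events = MongoClient("mongodb://localhost:27017")["test"]["collection"]

        def log_action(user_id, action_type):
            """Record one user action, bucketed by hour as on the slide."""
            now = datetime.now(timezone.utc)
            events.insert_one({"hour": now.hour,
                               "userId": user_id,
                               "actionType": action_type})

        log_action("1234", "login")
        # a later hourly batch step can then process one bucket at a time
        logins_h0 = events.count_documents({"hour": 0, "actionType": "login"})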
  • MapReduce in the mongo shell:

    m = function() {
        this.tags.forEach(function(z) {
            emit(z, { count: 1 });
        });
    };

    r = function(key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++)
            total += values[i].count;
        return { count: total };
    };

    res = db.things.mapReduce(m, r);  // an optional finalize function may also be supplied
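
    The same job can also be driven from Python through the mapReduce server command,
    sketched below; note the command is deprecated in recent MongoDB releases in favor of
    the aggregation pipeline:

        from bson.code import Code
        from pymongo import MongoClient

        db = MongoClient("mongodb://localhost:27017")["test"]

        m = Code("function() {"
                 "  this.tags.forEach(function(z) { emit(z, { count: 1 }); });"
                 "}")
        r = Code("function(key, values) {"
                 "  var total = 0;"
                 "  for (var i = 0; i < values.length; i++) total += values[i].count;"
                 "  return { count: total };"
                 "}")

        # 'inline' returns the results in the command reply instead of a collection
        res = db.command("mapReduce", "things", map=m, reduce=r, out={"inline": 1})
        for doc in res["results"]:
            print(doc["_id"], doc["value"]["count"])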
  • Summary of features (Leo, Zanetti, "Pydoop: a Python MapReduce and HDFS API for Hadoop"):

                     Streaming (Dumbo)   Jython (Happy)   Pydoop
    C/C++ Ext        Yes                 No               Yes
    Standard Lib     Full                Partial          Full
    MR API           No*                 Full             Partial
    Java-like FW     No                  Yes              Yes
    HDFS             No                  Yes              Yes

    (*) You can only write the map and reduce parts as executable scripts.

    Hadoop-based solutions share the same limitations as Streaming and Jython, except for
    ease of use; other implementations are good if you have your own cluster, but Hadoop
    is the most widespread implementation.
  • Hadoop Pipes:

    Communication with the Java framework happens via persistent sockets. The C++
    application provides a factory used by the framework to create MapReduce components;
    providing a Mapper and a Reducer is mandatory.
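
    A minimal word-count sketch against the 2011-era pydoop.pipes API, showing the factory
    pattern the slide describes; treat the exact class and method names as belonging to
    that old API version:

        import pydoop.pipes as pp

        class Mapper(pp.Mapper):
            def map(self, context):
                for word in context.getInputValue().split():
                    context.emit(word, "1")

        class Reducer(pp.Reducer):
            def reduce(self, context):
                total = 0
                while context.nextValue():
                    total += int(context.getInputValue())
                context.emit(context.getInputKey(), str(total))

        if __name__ == "__main__":
            # the Factory hands Mapper/Reducer instances to the C++ Pipes runtime
            pp.runTask(pp.Factory(Mapper, Reducer))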
  • Integration of Pydoop with C++:

    With Pipes, method calls flow from the framework through the C++ and Pydoop APIs,
    ultimately reaching user-defined methods; results are wrapped by Boost and returned to
    the framework. With HDFS, function calls are initiated by Pydoop, and results are
    wrapped and returned as Python objects to the application.
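
    A short sketch of the HDFS side; the path is hypothetical, and dump, load, and ls are
    pydoop.hdfs convenience functions:

        import pydoop.hdfs as hdfs

        hdfs.dump("hello from pydoop\n", "/user/test/hello.txt")  # write a file
        print(hdfs.load("/user/test/hello.txt"))                  # read it back
        print(hdfs.ls("/user/test"))                              # list the directory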
  • gawk '
    BEGIN { reducenum = $REDUCE_NUM; }
    { userid = $7; key = $8; }
    key ~ /a{GetLoginBonus}/ { incrby(userid, key, $9, a); next; }
    key ~ /a{SideJob}/ { incrby(userid, key, $11, a); next; }
    key ~ /a{CleanMyShop}/ { hincr(userid, key, $9, a); next; }
    key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/ { incrbydiff(userid, key, $9, a); next; }
    ...
    ' $IN

    # for reducer1 (such as "userid % reducenum == 0")
    # command   userid   key                value
    MULTI
    HINCRBY 1111 a{ChangeGreed} 3
    HINCRBY 1111 a{GianEvent} 7
    HINCRBY 1111 a{TeamChallenge} 5
    HINCRBY 2222 a{Battle} 3
    HINCRBY 2222 a{ChangeMoney} 3
    ...
    EXEC
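
    The reducer's MULTI ... EXEC block maps naturally onto a redis-py pipeline; a sketch,
    with the connection details assumed and the sample keys taken from the slide:

        import redis

        r = redis.Redis(host="localhost", port=6379, db=0)
        pipe = r.pipeline(transaction=True)   # commands are wrapped in MULTI/EXEC
        pipe.hincrby("1111", "a{ChangeGreed}", 3)
        pipe.hincrby("1111", "a{GianEvent}", 7)
        pipe.hincrby("1111", "a{TeamChallenge}", 5)
        pipe.hincrby("2222", "a{Battle}", 3)
        pipe.hincrby("2222", "a{ChangeMoney}", 3)
        pipe.execute()                        # EXEC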