MongoDB Aggregation Framework

Published in: Technology, Business
  • Distinct can be achieved using pipes in Aggregate framework...Sample code snippet at http://www.techiesinfo.com/code-snippet analysing clickstream


  1. MongoDB’s New Aggregation Framework
     Tyler Brock
  2. Available now in MongoDB 2.1 (unstable)
  3. Map Reduce
     • Map/Reduce is a big hammer
     • Used to perform complex analytics tasks on massive amounts of data
     • Users are currently using it for aggregation…
       • totaling, averaging, etc.
  4. Problem
     • It should be easier to do simple aggregations
     • Shouldn’t need to write JavaScript
     • Avoid the overhead of the JavaScript engine
  5. New Aggregation Framework
     • Declarative
     • No JavaScript required
     • C++ implementation
     • Higher performance than JavaScript
     • Expression evaluation
     • Return computed values
     • Framework: we can add new operations easily
  6. Pipeline
     • Series of operations
     • Members of a collection are passed through a pipeline to produce a result
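The pipeline idea can be pictured with a few lines of plain JavaScript. This is only a rough in-memory sketch: `runPipeline`, `onlyBob` and `titlesOnly` are hypothetical names invented for illustration, and the stages are ordinary functions, not MongoDB operators.

```javascript
// Hypothetical helper: pass the documents through each stage function in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two illustrative stages written as plain functions (not MongoDB operators).
const onlyBob = (docs) => docs.filter((d) => d.author === "bob");
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

// Made-up sample data.
const articles = [
  { title: "this is my title", author: "bob", pageViews: 5 },
  { title: "another title", author: "jane", pageViews: 6 },
];

// The output of one stage becomes the input of the next.
const pipelineResult = runPipeline(articles, [onlyBob, titlesOnly]);
console.log(pipelineResult);
```

Each stage sees only what the previous stage produced, which is why the order of stages matters.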
  7. The Aggregation Command
     • Takes two arguments
       • aggregate -- name of collection
       • pipeline -- array of pipeline operators
         db.runCommand( {
             aggregate : "article",
             pipeline : [ { $op1 }, { $op2 }, ... ]
         } );
  8. Aggregation helper
         db.article.aggregate(
             { $pipeline_op1 },
             { $pipeline_op2 },
             { $pipeline_op3 },
             { $pipeline_op4 },
             ...
         );
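The relationship between the helper and the command can be sketched as follows. It assumes the helper does little more than collect its arguments into the `pipeline` array, which the slides suggest but do not state; `buildAggregateCommand` is a hypothetical name, not part of any driver or shell.

```javascript
// Hypothetical helper: turn helper-style arguments into the command document
// with the two fields named on the slide (aggregate and pipeline).
function buildAggregateCommand(collectionName, ...stages) {
  return { aggregate: collectionName, pipeline: stages };
}

const command = buildAggregateCommand(
  "article",
  { $match: { author: "bob" } },
  { $limit: 5 }
);

// Prints the document that would be handed to db.runCommand(...).
console.log(JSON.stringify(command));
```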
  9. Pipeline Operators
     • Old Faves: $match, $sort, $limit, $skip
     • New Hotness: $project, $unwind, $group
  10. $match
      • Uses a query predicate (like .find({…})) as a filter
      Sample document:
          {
              title : "this is my title" ,
              author : "bob" ,
              posted : new Date(1079895594000) ,
              pageViews : 5 ,
              tags : [ "fun" , "good" , "fun" ]
          }
      Example stages:
          { $match : { author : "bob" } }
          { $match : { pgv : { $gt : 50, $lte : 90 } } }
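What the two predicates select can be checked with a small in-memory sketch. The `matches` function is hypothetical and covers only equality, `$gt` and `$lte` on top-level scalar fields; real query predicates support far more (array fields, for instance, are not handled here). The sample documents are made up.

```javascript
// Hypothetical matcher supporting equality plus $gt and $lte only.
function matches(doc, predicate) {
  return Object.entries(predicate).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator document such as { $gt : 50, $lte : 90 }: all operators must hold.
      return Object.entries(condition).every(([op, bound]) => {
        if (op === "$gt") return value > bound;
        if (op === "$lte") return value <= bound;
        throw new Error("unsupported operator in this sketch: " + op);
      });
    }
    return value === condition; // plain equality
  });
}

// Made-up sample data using the field names from the slide.
const matchInput = [
  { author: "bob", pgv: 5 },
  { author: "jane", pgv: 60 },
  { author: "dave", pgv: 90 },
  { author: "bob", pgv: 95 },
];

const byAuthor = matchInput.filter((d) => matches(d, { author: "bob" }));
const byRange = matchInput.filter((d) => matches(d, { pgv: { $gt: 50, $lte: 90 } }));
console.log(byAuthor, byRange);
```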
  11. $sort
      • Sorts input documents
      • Requires sort key -- specified like index keys
          { $sort : { name : 1, age : -1 } }
  12. $limit
      • Limits the number of JSON documents
          { $limit : 5 }
      $skip
      • Skips a number of JSON documents
          { $skip : 5 }
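A compound sort key followed by skip and limit behaves roughly like the plain-JavaScript sketch below. `compareBy` is a hypothetical helper and the sample data is made up; the sketch only handles top-level fields whose values compare with `<` and `>`.

```javascript
// Hypothetical comparator built from a sort spec such as { name: 1, age: -1 }.
// 1 means ascending and -1 means descending; later keys break ties.
function compareBy(sortSpec) {
  const keys = Object.entries(sortSpec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

// Made-up sample data.
const people = [
  { name: "bob", age: 30 },
  { name: "amy", age: 25 },
  { name: "bob", age: 41 },
  { name: "cat", age: 19 },
];

// Equivalent in spirit to { $sort : { name : 1, age : -1 } }.
const sorted = [...people].sort(compareBy({ name: 1, age: -1 }));

// Equivalent in spirit to { $skip : 1 } followed by { $limit : 2 }.
const page = sorted.slice(1).slice(0, 2);
console.log(page);
```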
  13. $project
      • Project can reshape a document
        • add, remove, rename, move
      • Similar to .find()’s field selection syntax
        • But much more powerful
      • Can generate computed values
  14. $project (include and exclude fields)
          { $project : {
              title : 1 ,              /* include this field, if it exists */
              author : 1 ,             /* include this field, if it exists */
              "comments.author" : 1
          }}
          { $project : {
              title : 0 ,              /* exclude this field */
              author : 0               /* exclude this field */
          }}
  15. $project (computed fields)
          { $project : {
              title : 1 ,              /* include this field if it exists */
              doctoredPageViews : { $add : [ "$pageViews", 10 ] }
          }}
  16. Computed Fields
      • Prefix expression language
        • Add two fields
          • { $add : [ "$field1", "$field2" ] }
        • Provide a value for a missing field
          • { $ifNull : [ "$field1", "$field2" ] }
        • Nesting
          • { $add : [ "$field1", { $ifNull : [ "$field2", "$field3" ] } ] }
        • Date field extraction
          • Get year, month, day, hour, etc., from Date
        • Date arithmetic
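How a prefix expression is evaluated against a document can be sketched with a tiny recursive evaluator. `evaluate` is hypothetical and covers only `"$field"` paths, `$add` and `$ifNull`; the `bonus` field is a made-up name used to show a missing value.

```javascript
// Hypothetical evaluator for a tiny subset: "$field" paths, $add and $ifNull.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    // Resolve a (possibly dotted) field path; missing fields yield undefined.
    return expr
      .slice(1)
      .split(".")
      .reduce((value, key) => (value == null ? undefined : value[key]), doc);
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    // Prefix form: { $operator : [ argument, argument, ... ] }
    const [op, args] = Object.entries(expr)[0];
    const values = args.map((arg) => evaluate(arg, doc));
    if (op === "$add") return values.reduce((sum, v) => sum + v, 0);
    if (op === "$ifNull") return values[0] == null ? values[1] : values[0];
    throw new Error("unsupported operator in this sketch: " + op);
  }
  return expr; // literal value such as 10
}

const sampleDoc = { pageViews: 5, other: { foo: 5 } };

// Computed field from the $project slide: pageViews + 10.
const doctored = evaluate({ $add: ["$pageViews", 10] }, sampleDoc);

// Nested expression: "bonus" is missing, so $ifNull falls back to other.foo.
const nested = evaluate(
  { $add: ["$pageViews", { $ifNull: ["$bonus", "$other.foo"] }] },
  sampleDoc
);
console.log(doctored, nested);
```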
  17. $project (rename and pull fields up)
          { $project : {
              title : 1 ,
              page_views : "$pageViews" ,   /* rename this field */
              upgrade : "$other.foo"        /* move to top level */
          }}
  18. $project (push fields down)
          { $project : {
              title : 1 ,
              stats : {
                  pv : "$pageViews"         /* rename this from the top-level */
              }
          }}
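The reshaping rules from the last few slides (include, rename, pull up, push down) can be imitated in memory. The `project` and `getPath` functions are hypothetical and cover only those four cases; exclusion, dotted include keys such as "comments.author", and `_id` handling are left out of the sketch.

```javascript
// Hypothetical helper: read a dotted path such as "other.foo" from a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Hypothetical projection supporting 1 (include), "$path" (copy a field under
// a new name) and nested specs (build a sub-document from top-level fields).
function project(doc, spec) {
  const out = {};
  for (const [name, rule] of Object.entries(spec)) {
    let value;
    if (rule === 1) {
      value = doc[name];
    } else if (typeof rule === "string" && rule.startsWith("$")) {
      value = getPath(doc, rule.slice(1));
    } else if (rule !== null && typeof rule === "object") {
      value = project(doc, rule);
    }
    if (value !== undefined) out[name] = value; // fields that do not exist are skipped
  }
  return out;
}

const article = { title: "this is my title", pageViews: 5, other: { foo: 5 } };

const reshaped = project(article, {
  title: 1,                     // include
  page_views: "$pageViews",     // rename
  upgrade: "$other.foo",        // pull up to the top level
  stats: { pv: "$pageViews" },  // push down into a sub-document
});
console.log(reshaped);
```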
  19. $unwind
      • Produces one output document for each element of an array field; in each output document the array is replaced by that single element
          {
              title : "this is my title" ,
              author : "bob" ,
              posted : new Date(1079895594000) ,
              pageViews : 5 ,
              tags : [ "fun" , "good" , "awesome" ] ,
              comments : [
                  { author : "joe" , text : "this is cool" } ,
                  { author : "sam" , text : "this is bad" }
              ] ,
              other : { foo : 5 }
          }
  20.     { ... tags : "fun" ... },
          { ... tags : "good" ... },
          { ... tags : "awesome" ... }
  21. $unwind
          db.article.aggregate(
              { $project : {
                  author : 1 ,   /* include this field */
                  title : 1 ,    /* include this field */
                  tags : 1       /* include this field */
              }},
              { $unwind : "$tags" }
          );
  22.     {
              "result" : [
                  {
                      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                      "title" : "this is my title",
                      "author" : "bob",
                      "tags" : "fun"
                  },
                  {
                      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                      "title" : "this is my title",
                      "author" : "bob",
                      "tags" : "good"
                  },
                  {
                      "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                      "title" : "this is my title",
                      "author" : "bob",
                      "tags" : "fun"
                  }
              ],
              "ok" : 1
          }
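The unwinding step can be imitated in plain JavaScript. `unwind` is a hypothetical function; this sketch simply drops documents whose field is not an array, a case the slides do not cover.

```javascript
// Hypothetical helper: emit one copy of each document per array element,
// with the array field replaced by that single element.
function unwind(docs, fieldPath) {
  const field = fieldPath.slice(1); // drop the leading "$"
  return docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((element) => ({
      ...doc,
      [field]: element,
    }))
  );
}

const unwound = unwind(
  [{ title: "this is my title", author: "bob", tags: ["fun", "good", "awesome"] }],
  "$tags"
);
console.log(unwound);
```

Every output document keeps the other fields of its source document unchanged, which is why the same title and author repeat in the result.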
  23. Grouping
      • $group aggregation expressions
        • Total of column values: $sum
        • Average of column values: $avg
        • Collect column values in an array: $push
          { $group : {
              _id : "$author",
              fieldname : { $aggfunc : "$field" }
          }}
  24. $group example
          db.article.aggregate(
              { $group : {
                  _id : "$author",
                  viewsPerAuthor : { $sum : "$pageViews" }
              }}
          );
  25.     {
              "result" : [
                  { "_id" : "jane", "viewsPerAuthor" : 6 },
                  { "_id" : "dave", "viewsPerAuthor" : 7 },
                  { "_id" : "bob", "viewsPerAuthor" : 5 }
              ],
              "ok" : 1
          }
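Grouping by author and summing page views amounts to keeping one running total per key. The sketch below uses a hypothetical `groupSum` function and made-up input documents chosen so that the totals match the result above; the slides do not show the actual collection contents.

```javascript
// Hypothetical helper: group by keyField and total sumField into outField.
function groupSum(docs, keyField, sumField, outField) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  // One output document per distinct key, in first-seen order.
  return [...totals].map(([key, total]) => ({ _id: key, [outField]: total }));
}

// Made-up input documents (assumption, not taken from the slides).
const viewDocs = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 2 },
  { author: "dave", pageViews: 7 },
  { author: "jane", pageViews: 4 },
];

const viewsPerAuthor = groupSum(viewDocs, "author", "pageViews", "viewsPerAuthor");
console.log(viewsPerAuthor);
```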
  26. Group Aggregation Functions
      • $min, $avg, $push, $sum
      • $addToSet, $first, $last, $max
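As a rough guide to what each function computes for one group, the listed names can be mapped onto small reducers over the group's values. This is an in-memory approximation for numeric values only; the `accumulators` table is hypothetical, and details such as the element order produced by `$addToSet` on a real server are not specified by the slides.

```javascript
// Hypothetical table: each accumulator as a function over one group's values,
// taken in the order the documents arrive.
const accumulators = {
  $sum: (values) => values.reduce((a, b) => a + b, 0),
  $avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
  $min: (values) => Math.min(...values),
  $max: (values) => Math.max(...values),
  $first: (values) => values[0],
  $last: (values) => values[values.length - 1],
  $push: (values) => [...values],              // keeps duplicates
  $addToSet: (values) => [...new Set(values)], // drops duplicates
};

// Made-up values for a single group.
const groupValues = [5, 7, 5, 3];

const summary = Object.fromEntries(
  Object.entries(accumulators).map(([name, fn]) => [name, fn(groupValues)])
);
console.log(summary);
```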
  27. Pulling it all together
      Input document:
          {
              title : "this is my title" ,
              author : "bob" ,
              posted : new Date(1079895594000) ,
              pageViews : 5 ,
              tags : [ "fun" , "good" , "fun" ]
          }
      Desired output:
          { tag : "fun" , authors : [ ..., ..., ... ] },
          { tag : "good" , authors : [ ..., ..., ... ] }
  28.     db.article.aggregate(
              { $project : {
                  author : 1,
                  tags : 1
              }},
              { $unwind : "$tags" },
              { $group : {
                  _id : "$tags",
                  authors : { $addToSet : "$author" }
              }}
          );
  29.     "result" : [
              { "_id" : { "tags" : "cool" }, "authors" : [ "jane", "dave" ] },
              { "_id" : { "tags" : "fun" }, "authors" : [ "dave", "bob" ] },
              { "_id" : { "tags" : "good" }, "authors" : [ "bob" ] },
              { "_id" : { "tags" : "awful" }, "authors" : [ "jane" ] }
          ]
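The unwind-then-group combination can be traced in plain JavaScript. The input documents below are an assumption, made up so that the tag-to-author mapping matches the result above; in this sketch `_id` is the plain tag value and the order of groups and authors follows first appearance.

```javascript
// Made-up input documents (assumption, not taken from the slides).
const taggedArticles = [
  { author: "bob", tags: ["fun", "good", "fun"] },
  { author: "dave", tags: ["fun", "cool"] },
  { author: "jane", tags: ["cool", "awful"] },
];

const authorsByTag = new Map();
for (const { author, tags } of taggedArticles) {
  for (const tag of tags) {
    // The inner loop plays the role of $unwind: one step per tag.
    if (!authorsByTag.has(tag)) authorsByTag.set(tag, new Set());
    // Grouping by tag with a Set plays the role of $group plus $addToSet.
    authorsByTag.get(tag).add(author);
  }
}

const tagResult = [...authorsByTag].map(([tag, authors]) => ({
  _id: tag,
  authors: [...authors],
}));
console.log(tagResult);
```

Because a set is used, "bob" appears only once under "fun" even though that tag occurs twice in his document.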
  30. Usage Tips
      • Use $match in a pipeline as early as possible
        • The query optimizer can then be used to choose an index and avoid scanning the entire collection
  31. Driver Support
      • Initial version is a command
        • For any language, build a JSON database object, and execute the command
        • { aggregate : <collection>, pipeline : [ ] }
      • Beware of command result size limit
  32. Sharding support
      • Initial release will support sharding
      • mongos analyzes the pipeline and forwards operations up to the first $group or $sort to the shards; it then combines the shard server results and continues
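The split described above can be sketched as a function that cuts the pipeline at the first `$group` or `$sort`. This is an illustration only, not how mongos is implemented: `splitForShards` is a hypothetical name, and the sketch assumes "up to" is exclusive, so the `$group` or `$sort` stage itself stays on the merging side. The slides do not say whether that stage also runs partly on the shards.

```javascript
// Hypothetical helper: split a pipeline at the first $group or $sort stage.
// Assumption: the split stage itself is kept in the merge part.
function splitForShards(pipeline) {
  const index = pipeline.findIndex(
    (stage) => "$group" in stage || "$sort" in stage
  );
  if (index === -1) {
    // No $group or $sort: everything can be forwarded.
    return { shardPart: pipeline.slice(), mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, index),
    mergePart: pipeline.slice(index),
  };
}

const shardedPlan = splitForShards([
  { $match: { author: "bob" } },
  { $project: { author: 1, tags: 1 } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", authors: { $addToSet: "$author" } } },
  { $sort: { _id: 1 } },
]);
console.log(shardedPlan.shardPart.length, shardedPlan.mergePart.length);
```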
  33. Common SQL
      • Distinct
        • aggregate({ $group: { _id: "$author" }})
      • Count
        • aggregate({ $group: { _id: null, count: { $sum: 1 }}})
      • Sum
        • aggregate({ $group: { _id: null, total: { $sum: "$price" }}})
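For comparison, the same three results computed over an in-memory array in plain JavaScript. The `orders` data is made up; grouping on `_id: null` corresponds to reducing the whole array into a single value.

```javascript
// Made-up sample data with the field names used on the slide.
const orders = [
  { author: "bob", price: 10 },
  { author: "jane", price: 5 },
  { author: "bob", price: 7 },
];

// Distinct: one entry per distinct author (grouping on "$author").
const distinctAuthors = [...new Set(orders.map((o) => o.author))];

// Count: add 1 per document (the { $sum : 1 } idiom).
const rowCount = orders.reduce((n) => n + 1, 0);

// Sum: add up one field across all documents.
const totalPrice = orders.reduce((sum, o) => sum + o.price, 0);

console.log(distinctAuthors, rowCount, totalPrice);
```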