MongoDB Aggregation Framework
  • Distinct can be achieved using pipes in Aggregate framework...Sample code snippet at http://www.techiesinfo.com/code-snippet analysing clickstream
  • Well, why do we need a new aggregation framework?
  • But...
  • It works by creating a pipeline.
  • The way you create this pipeline is through the aggregation command.
  • $match should be placed as early in the aggregation pipeline as possible. This minimizes the number of documents that reach the later stages, and so minimizes later processing. Placing a $match at the very beginning of a pipeline lets it take advantage of indexes in exactly the same way as a regular query (find()/findOne()).
  • _id is included by default in inclusion mode. The user can specify _id: 0, but no other fields can be excluded.
  • Respects ordering.
  • Doctored page views. The BSON specification states that field order matters and is to be preserved. A projection honors that: fields are output in the same order as they are input, regardless of the order of any inclusion or exclusion specifications. New computed fields added via a projection always follow all fields from the original source, and appear in the order they appear in the projection specification.
  • $unwind is most useful when combined with $group or $filter. The effects of an unwind can be undone with the $push $group aggregation function.
      • If the target field does not exist within an input document, the document is passed through unchanged.
      • If the target field within an input document is not an array, an error is generated.
      • If the target field within an input document is an empty array ("[]"), the document is passed through unchanged.
  • _id can be a dotted field path reference (prefixed with a dollar sign, '$'), a braced document expression containing multiple fields (an order-preserving concatenated key), or a single constant. Using a constant creates a single bucket, which can be used to count documents or to add up all the values for a field across all the documents in a collection. If you need the output to have a different name, it can easily be renamed using a simple $project after the $group.

MongoDB Aggregation Framework Presentation Transcript

  • MongoDB’s New Aggregation Framework Tyler Brock
  • 2.1 available now (unstable)
  • Map Reduce
      • Map/Reduce is a big hammer
      • Used to perform complex analytics tasks on massive amounts of data
      • Users are currently using it for aggregation…
          • totaling, averaging, etc.
  • Problem
      • It should be easier to do simple aggregations
      • Shouldn’t need to write JavaScript
      • Avoid the overhead of the JavaScript engine
  • New Aggregation Framework
      • Declarative
          • No JavaScript required
      • C++ implementation
          • Higher performance than JavaScript
      • Expression evaluation
          • Return computed values
      • Framework: we can add new operations easily
  • Pipeline
      • Series of operations
      • Members of a collection are passed through a pipeline to produce a result
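The pipeline idea can be sketched in plain JavaScript, without a server. This is only an illustration of "documents flow through a series of operations", not how MongoDB implements it; `runPipeline`, the sample documents, and the two stage functions are made up for the example.

```javascript
// Hypothetical helper: feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample documents.
const articles = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 6 },
  { author: "dave", pageViews: 7 },
];

const result = runPipeline(articles, [
  (docs) => docs.filter((d) => d.pageViews > 5),     // plays the role of $match
  (docs) => docs.map((d) => ({ author: d.author })), // plays the role of $project
]);

console.log(result);
```

Each stage only sees what the previous stage produced, which is why filtering early keeps the later stages cheap.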
  • The Aggregation Command
      • Takes two arguments
          • aggregate -- name of the collection
          • pipeline -- array of pipeline operators

        db.runCommand({
            aggregate : "article",
            pipeline : [ { $op1 }, { $op2 }, ... ]
        });

  • Aggregation helper

        db.article.aggregate(
            { $pipeline_op1 },
            { $pipeline_op2 },
            { $pipeline_op3 },
            { $pipeline_op4 },
            ...
        );
  • Pipeline Operators
      • Old Faves: $match, $sort, $limit, $skip
      • New Hotness: $project, $unwind, $group
  • $match
      • Uses a query predicate (like .find({…})) as a filter

        {
            title : "this is my title",
            author : "bob",
            posted : new Date(1079895594000),
            pageViews : 5,
            tags : [ "fun", "good", "fun" ]
        }

        { $match : { author : "bob" } }

        { $match : { pgv : { $gt : 50, $lte : 90 } } }
  • $sort
      • Sorts input documents
      • Requires sort key -- specified like index keys

        { $sort : { name : 1, age : -1 } }

  • $limit
      • Limits the number of JSON documents

        { $limit : 5 }

  • $skip
      • Skips a number of JSON documents

        { $skip : 5 }
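What these three stages do to a list of documents can be approximated with array methods. The sketch below is plain JavaScript rather than MongoDB code; `sortBy` and the `people` data are hypothetical, and the key spec follows the `{ name : 1, age : -1 }` convention from the slide (1 ascending, -1 descending).

```javascript
// Hypothetical stand-in for $sort: compare on each key in the order given.
function sortBy(docs, spec) {
  const keys = Object.keys(spec);
  return [...docs].sort((a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key];
      if (a[key] > b[key]) return spec[key];
    }
    return 0;
  });
}

// Made-up sample documents.
const people = [
  { name: "bob", age: 30 },
  { name: "al", age: 25 },
  { name: "bob", age: 41 },
  { name: "cy", age: 19 },
];

const sorted = sortBy(people, { name: 1, age: -1 });

// { $skip : 1 } followed by { $limit : 2 }
const page = sorted.slice(1).slice(0, 2);

console.log(page);
```

Order matters here: skipping before limiting gives "the second and third documents", while limiting first and then skipping would give a shorter page.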
  • $project
      • Project can reshape a document
          • add, remove, rename, move
      • Similar to .find()’s field selection syntax
          • But much more powerful
      • Can generate computed values

  • $project (include and exclude fields)

        { $project : {
            title : 1,              /* include this field, if it exists */
            author : 1,             /* include this field, if it exists */
            "comments.author" : 1
        }}

        { $project : {
            title : 0,              /* exclude this field */
            author : 0              /* exclude this field */
        }}

  • $project (computed fields)

        { $project : {
            title : 1,              /* include this field if it exists */
            doctoredPageViews : { $add : [ "$pageViews", 10 ] }
        }}
  • Computed Fields
      • Prefix expression language
      • Add two fields
          • $add : [ "$field1", "$field2" ]
      • Provide a value for a missing field
          • $ifNull : [ "$field1", "$field2" ]
      • Nesting
          • $add : [ "$field1", { $ifNull : [ "$field2", "$field3" ] } ]
      • Date field extraction
          • Get year, month, day, hour, etc., from Date
      • Date arithmetic
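To make the prefix style concrete, here is a very small evaluator written in plain JavaScript. It is a sketch under stated assumptions, not the server's expression engine: `evaluate` is a hypothetical function, it supports only `$add` and `$ifNull`, and it reads top-level fields only.

```javascript
// Hypothetical mini-evaluator for prefix expressions such as
// { $add: ["$a", { $ifNull: ["$b", 1] }] }.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // "$field" reads a top-level field
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr); // the operator comes first (prefix form)
    const args = expr[op].map((arg) => evaluate(arg, doc));
    if (op === "$add") return args.reduce((sum, n) => sum + n, 0);
    if (op === "$ifNull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + op);
  }
  return expr; // anything else is treated as a literal
}

const doc = { pageViews: 5 }; // made-up document with no "bonus" field

const doctored = evaluate({ $add: ["$pageViews", 10] }, doc);
const withDefault = evaluate(
  { $add: ["$pageViews", { $ifNull: ["$bonus", 1] }] },
  doc
);

console.log(doctored, withDefault);
```

Nesting works because arguments are evaluated recursively before the operator is applied.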
  • $project (rename and pull fields up)

        { $project : {
            title : 1,
            page_views : "$pageViews",      /* rename this field */
            upgrade : "$other.foo"          /* move to top level */
        }}

  • $project (push fields down)

        { $project : {
            title : 1,
            stats : {
                pv : "$pageViews"           /* rename this from the top-level */
            }
        }}
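The reshaping done by these two projections can be mimicked by hand in plain JavaScript. In this sketch `getPath` is a hypothetical helper for dotted paths and `article` is a cut-down version of the sample document used elsewhere in the slides; it shows the shape of the output, not MongoDB's implementation.

```javascript
// Hypothetical helper: follow a dotted path such as "other.foo".
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const article = { title: "this is my title", pageViews: 5, other: { foo: 5 } };

const reshaped = {
  title: article.title,                        // title : 1
  page_views: getPath(article, "pageViews"),   // rename
  upgrade: getPath(article, "other.foo"),      // pull a nested field up
  stats: { pv: getPath(article, "pageViews") } // push a field down
};

console.log(reshaped);
```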
  • $unwind
      • Produces one document for each value in an array; in each output document the array field holds a single array element

        {
            title : "this is my title",
            author : "bob",
            posted : new Date(1079895594000),
            pageViews : 5,
            tags : [ "fun", "good", "awesome" ],
            comments : [
                { author : "joe", text : "this is cool" },
                { author : "sam", text : "this is bad" }
            ],
            other : { foo : 5 }
        }

        { ... tags : "fun" ... },
        { ... tags : "good" ... },
        { ... tags : "awesome" ... }
  • $unwind

        db.article.aggregate(
            { $project : {
                author : 1,     /* include this field */
                title : 1,      /* include this field */
                tags : 1        /* include this field */
            }},
            { $unwind : "$tags" }
        );

        {
            "result" : [
                {
                    "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                    "title" : "this is my title",
                    "author" : "bob",
                    "tags" : "fun"
                },
                {
                    "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                    "title" : "this is my title",
                    "author" : "bob",
                    "tags" : "good"
                },
                {
                    "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                    "title" : "this is my title",
                    "author" : "bob",
                    "tags" : "fun"
                }
            ],
            "ok" : 1
        }
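The unwind step itself is easy to imitate in plain JavaScript. This sketch assumes every document has the field and that it is a non-empty array; it deliberately ignores the missing-field, non-array, and empty-array cases. `unwind` is a hypothetical function, not part of any driver.

```javascript
// Hypothetical stand-in for { $unwind : "$tags" }:
// emit one copy of the document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((value) => ({ ...doc, [field]: value }))
  );
}

const projected = [
  { title: "this is my title", author: "bob", tags: ["fun", "good", "fun"] },
];

const unwound = unwind(projected, "tags");

console.log(unwound);
```

Every output document keeps the other fields of its source document, which is why the same title and author repeat in the result.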
  • Grouping
      • $group aggregation expressions
          • Total of column values: $sum
          • Average of column values: $avg
          • Collect column values in an array: $push

        { $group : {
            _id : "$author",
            fieldname : { $aggfunc : "$field" }
        }}
  • $group example

        db.article.aggregate(
            { $group : {
                _id : "$author",
                viewsPerAuthor : { $sum : "$pageViews" }
            }}
        );

        {
            "result" : [
                { "_id" : "jane", "viewsPerAuthor" : 6 },
                { "_id" : "dave", "viewsPerAuthor" : 7 },
                { "_id" : "bob", "viewsPerAuthor" : 5 }
            ],
            "ok" : 1
        }
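Grouping with `$sum` amounts to keeping one running total per key. A minimal plain-JavaScript sketch follows; `sumByAuthor` is a hypothetical function and the input documents are invented so that the totals match the result shown above (the slides do not show the underlying articles).

```javascript
// Hypothetical stand-in for
// { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } }.
function sumByAuthor(docs) {
  const totals = new Map(); // one bucket per distinct author
  for (const doc of docs) {
    totals.set(doc.author, (totals.get(doc.author) || 0) + doc.pageViews);
  }
  return [...totals].map(([author, total]) => ({
    _id: author,
    viewsPerAuthor: total,
  }));
}

// Invented input documents.
const articles = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 2 },
  { author: "jane", pageViews: 4 },
  { author: "dave", pageViews: 7 },
];

const grouped = sumByAuthor(articles);

console.log(grouped);
```

The order of the buckets here follows first appearance in the input; the ordering of real `$group` output should not be relied on without a `$sort`.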
  • Group Aggregation Functions
      • $min
      • $max
      • $avg
      • $sum
      • $push
      • $addToSet
      • $first
      • $last
  • Pulling it all together

        {
            title : "this is my title",
            author : "bob",
            posted : new Date(1079895594000),
            pageViews : 5,
            tags : [ "fun", "good", "fun" ]
        }

        {
            tag : "fun",
            authors : [ ..., ..., ... ]
        },
        {
            tag : "good",
            authors : [ ..., ..., ... ]
        }
  • db.article.aggregate(
        { $project : {
            author : 1,
            tags : 1
        }},
        { $unwind : "$tags" },
        { $group : {
            _id : "$tags",
            authors : { $addToSet : "$author" }
        }}
    );

  • "result" : [
        { "_id" : { "tags" : "cool" }, "authors" : [ "jane", "dave" ] },
        { "_id" : { "tags" : "fun" }, "authors" : [ "dave", "bob" ] },
        { "_id" : { "tags" : "good" }, "authors" : [ "bob" ] },
        { "_id" : { "tags" : "awful" }, "authors" : [ "jane" ] }
    ]
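The same unwind-then-group idea can be traced in plain JavaScript. This is a sketch, not the server's algorithm: `authorsByTag` is a hypothetical function, the three input articles are invented to be consistent with the result above, and each `_id` is emitted as the plain tag string.

```javascript
// Hypothetical stand-in for $unwind on tags followed by
// $group with $addToSet on author.
function authorsByTag(articles) {
  const byTag = new Map();
  for (const { author, tags } of articles) {
    for (const tag of tags) {                 // one step per array element ($unwind)
      if (!byTag.has(tag)) byTag.set(tag, new Set());
      byTag.get(tag).add(author);             // a Set ignores duplicates ($addToSet)
    }
  }
  return [...byTag].map(([tag, authors]) => ({
    _id: tag,
    authors: [...authors],
  }));
}

// Invented input documents.
const articles = [
  { author: "bob", tags: ["fun", "good", "fun"] },
  { author: "dave", tags: ["fun", "cool"] },
  { author: "jane", tags: ["cool", "awful"] },
];

const tagged = authorsByTag(articles);

console.log(tagged);
```

Note that bob appears only once under "fun" even though his article carries that tag twice, which is the point of `$addToSet` over `$push`.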
  • Usage Tips
      • Use $match in a pipeline as early as possible
      • The query optimizer can then be used to choose an index and avoid scanning the entire collection
  • Driver Support
      • Initial version is a command
      • For any language, build a JSON database object, and execute the command
          • { aggregate : <collection>, pipeline : [ ] }
      • Beware of command result size limit
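Building that command object is just a matter of assembling a document with the two keys named on the slide. A minimal sketch in plain JavaScript, where `buildAggregateCommand` is a hypothetical helper and how the object is then sent depends on the driver in use:

```javascript
// Hypothetical helper: assemble the { aggregate, pipeline } command document.
function buildAggregateCommand(collection, pipeline) {
  return { aggregate: collection, pipeline: pipeline };
}

// Count all documents in "article" using a single constant bucket.
const command = buildAggregateCommand("article", [
  { $group: { _id: null, count: { $sum: 1 } } },
]);

console.log(JSON.stringify(command));
```

In the mongo shell, an object of this shape is what would be passed to db.runCommand().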
  • Sharding support
      • Initial release will support sharding
      • Mongos analyzes pipeline, and forwards operations up to first $group or $sort to shards; combines shard server results and continues
  • Common SQL
      • Distinct
          • aggregate({ $group : { _id : "$author" } })
      • Count
          • aggregate({ $group : { _id : null, count : { $sum : 1 } } })
      • Sum
          • aggregate({ $group : { _id : null, total : { $sum : "$price" } } })