MongoDB Aggregation Framework

These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems.

Mike O’Brian from 10gen introduced the syntax and usage patterns for a new aggregation system in MongoDB and gave some demonstrations of aggregation using the new system. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities.

Published in: Technology, Business


  • 1. Aggregation Framework
  • 2. Quick Overview of MongoDB
  • 3. Quick Overview of MongoDB
      - Document-oriented
      - Schemaless
      - JSON-style documents
      - Rich queries
      - Scales horizontally
      MongoDB: db.users.find({ last_name : "Smith", age : { $gt : 10 } });
      SQL:     SELECT * FROM users WHERE last_name = 'Smith' AND age > 10;
  • 4. Computing Aggregations in Databases
      SQL-based RDBMS: JOIN, GROUP BY, AVG(), COUNT(), SUM(), FIRST(), LAST(), etc.
      MongoDB 2.0: MapReduce
      MongoDB 2.2+: MapReduce, Aggregation Framework
  • 5. MapReduce
      var map = function () { ... emit(key, val); }
      var reduce = function (key, vals) { ... return resultVal; }
      Data → map() → emit(k, v) → sort(k) → group(k) → reduce(k, values) → (k, v) → finalize(k, v) → (k, v)
      In MongoDB:
        - map iterates over documents; the current document is available as this
        - one document at a time per shard
        - reduce's output must match its input, because reduce can run multiple times
  • 6. What’s wrong with just using MapReduce?
      - Map/Reduce is very powerful, but often overkill
      - Lots of users rely on it for simple aggregation tasks
  • 7. What’s wrong with just using MapReduce?
      - Easy to screw up JavaScript
      - Debugging a M/R job sucks
      - Writing more JS for simple tasks should not be necessary
      (ಠ︿ಠ)
  • 8. Aggregation Framework
      - Declarative (no need to write JS)
      - Implemented directly in C++
      - Expression evaluation
      - Returns computed values
      - Framework: we can extend it with new ops
  • 9. Input data (collection) → Filter → Project → Unwind → Group → Sort → Limit → Result (document)
  • 10. An aggregation command looks like:
      db.article.aggregate(
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
      );
  • 11. db.article.aggregate(
          { $project : { author : 1, tags : 1 } },
          { $unwind : "$tags" },
          { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
        );
      New helper method: .aggregate()
      Its arguments form the operator pipeline
      Equivalent command: db.runCommand({ aggregate : "article", pipeline : [ { $op1 : ... }, { $op2 : ... }, ... ] })
  • 12. Output document looks like this:
      {
        "result" : [
          { "_id" : "art", "authors" : [ "bill", "bob" ] },
          { "_id" : "sports", "authors" : [ "jane", "bob" ] },
          { "_id" : "food", "authors" : [ "jane", "bob" ] },
          { "_id" : "science", "authors" : [ "jane", "bill", "bob" ] }
        ],
        "ok" : 1
      }
      result: array of pipeline output
      ok: 1 for success, 0 otherwise
  • 13. Pipeline
      - Input to the start of the pipeline is a collection
      - Series of operators: each one filters or transforms its input
      - Passes output data to the next operator in the pipeline
      - Output of the pipeline is the result document
      Kind of like UNIX: ps -ax | tee processes.txt | more
  • 14. Let’s do:
      1. A tour of the pipeline operators: $match, $unwind, $group, $project, $skip, $limit, $sort
      2. A couple of examples based on common SQL aggregation tasks
  • 15. $match: filters documents from the pipeline with a query predicate
      Input:
        { author : "bob",  pageViews : 5,  title : "Lorem Ipsum..." }
        { author : "bill", pageViews : 3,  title : "dolor sit amet..." }
        { author : "joe",  pageViews : 52, title : "consectetur adipiscing..." }
        { author : "jane", pageViews : 51, title : "sed diam..." }
        { author : "bob",  pageViews : 14, title : "magna aliquam..." }
        { author : "bob",  pageViews : 53, title : "claritas est..." }
      Filtered with { $match : { author : "bob" } }:
        { author : "bob", pageViews : 5,  title : "Lorem Ipsum..." }
        { author : "bob", pageViews : 14, title : "magna aliquam..." }
        { author : "bob", pageViews : 53, title : "claritas est..." }
      Filtered with { $match : { pageViews : { $gt : 50 } } }:
        { author : "joe",  pageViews : 52, title : "consectetur adipiscing..." }
        { author : "jane", pageViews : 51, title : "sed diam..." }
        { author : "bob",  pageViews : 53, title : "claritas est..." }
  • 16. $unwind: produce a new document for each value in an input array
      Input:
        { "_id" : ObjectId("4f...146"), "author" : "bob", "tags" : [ "fun", "good", "awesome" ] }
      Explode the "tags" array with { $unwind : "$tags" }, which produces:
        { _id : ObjectId("4f...146"), author : "bob", tags : "fun" }
        { _id : ObjectId("4f...146"), author : "bob", tags : "good" }
        { _id : ObjectId("4f...146"), author : "bob", tags : "awesome" }
  • 17. $group: bucket a subset of docs together and calculate an aggregated output doc from the bucket
      Output calculation operators: $sum, $max, $min, $avg, $first, $last, $addToSet, $push
      db.article.aggregate(
        { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } }
      );
  • 18. $group (cont’d)
      db.article.aggregate(
        { $group : {
            _id : "$author",
            viewsPerAuthor : { $sum : "$pageViews" }
        }}
      );
      _id: selects a field to use as the bucket key for grouping. Also allowed here:
        - dot notation (nested fields)
        - a constant
        - a multi-key expression inside {...}
      viewsPerAuthor: the output field name
      { $sum : ... }: the operation used to calculate the output value ($sum, $max, $avg, etc.)
  • 19. An example with $match and $group
      English: Find the sum of all prices of the orders placed by customer #4
      SQL: SELECT SUM(price) FROM orders WHERE customer_id = 4;
      MongoDB: db.orders.aggregate(
                 { $match : { customer_id : 4 } },
                 { $group : { _id : null, total : { $sum : "$price" } } }
               );
  • 20. An example with $unwind and $group
      English: For all tags used in blog posts, produce a list of authors that have posted under each tag
      SQL: SELECT tag, author FROM post_tags LEFT JOIN posts ON post_tags.post_id = posts.id GROUP BY tag, author;
      MongoDB: db.posts.aggregate(
                 { $unwind : "$tags" },
                 { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
               );
  • 21. More operators: controlling pipeline input
      $skip, $limit, $sort
      Similar to .skip(), .limit(), .sort() in a regular Mongo query
  • 22. $sort: order input documents
      Specified the same way as index keys: { $sort : { name : 1, age : -1 } }
      Must be used in order to take advantage of $first/$last with $group
  • 23. $limit: limit the number of input documents
        { $limit : 5 }
      $skip: skip over documents
        { $skip : 5 }
  • 24. $project
      Use for:
        - Adding, removing, pulling up, pushing down, and renaming fields
        - Building computed fields
        - Reshaping a document
  • 25. $project (cont’d): include or exclude fields
      { $project : { title : 1, author : 1 } }   only pass on the fields "title" and "author"
      { $project : { comments : 0 } }            exclude the "comments" field, keep everything else
  • 26. $project (cont’d): moving + renaming fields
      { $project : {
          page_views : "$pageViews",
          catName : "$",
          info : {
            published : "$ctime",
            update : "$mtime"
          }
      }}
      page_views : "$pageViews" renames pageViews to page_views
      catName takes a nested field and moves it into a top-level field called "catName"
      info populates a new sub-document in the output
  • 27. $project (cont’d): building a computed field
      db.article.aggregate(
        { $project : {
            name : 1,
            age_fixed : { $add : [ "$age", 2 ] }
        }}
      );
      age_fixed is the output (computed) field, $add is the expression, and [ "$age", 2 ] are its operands
  • 28. $project (cont’d): lots of available expressions
      Numeric: $add $sub $mod $divide $multiply
      Logical: $eq $lte/$lt $gte/$gt $and $not $or
      Dates: $dayOfMonth $dayOfYear $dayOfWeek $second $minute $hour $week $month $isoDate
      Strings: $substr $add $toLower $toUpper $strcasecmp
  • 29. Example: $sort → $limit → $project → $group
      English: Of the most recent 1000 blog posts, how many were posted within each calendar year?
      SQL: SELECT YEAR(pub_time) AS pub_year, COUNT(*)
           FROM (SELECT pub_time FROM posts ORDER BY pub_time DESC LIMIT 1000) AS recent
           GROUP BY pub_year;
      MongoDB: db.posts.aggregate(
                 { $sort : { pub_time : -1 } },
                 { $limit : 1000 },
                 { $project : { pub_year : { $year : [ "$pub_time" ] } } },
                 { $group : { _id : "$pub_year", num_year : { $sum : 1 } } }
               );
  • 30. Some Usage Notes
      - In BSON, order matters, so computed fields always show up after regular fields
      - We use $ in front of field names to distinguish fields from string literals in expressions: "$name" vs. "name"
  • 31. Some Usage Notes
      - Use $match, $sort and $limit first in the pipeline if possible
      - Cumulative operators ($group): be aware of memory usage
      - Use $project to discard unneeded fields
      - Remember the 16MB output limit
  • 32. Aggregation Framework vs. MapReduce
      - The framework is geared towards counting/accumulating
      - If you need something more exotic, use MapReduce
      - No 16MB constraint on output size with MapReduce
      - JS in M/R is not limited to any fixed set of expressions
  • 33. thanks! ✌(-‿-)✌questions?$$$ BTW: we are hiring! $$$ me up:
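Slide 5 outlines the MapReduce phases only as a diagram. The following is a minimal in-memory JavaScript sketch of that flow, assuming a plain array of documents instead of a collection. `runMapReduce` and the sample data are hypothetical, and unlike the real server (where `emit` is a global inside `map`), this sketch passes `emit` to `map` as an argument. It keeps the two points the slide makes: `map` sees the current document as `this`, and `reduce` returns the same kind of value it receives.

```javascript
// Hypothetical in-memory emulation of the map/reduce flow; not MongoDB's engine.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // key -> values emitted for that key
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: call map() once per document with `this` bound to that document.
  for (const doc of docs) map.call(doc, emit);

  // Sort the keys, then reduce each key's values to one result (optionally finalized).
  return [...groups.keys()].sort().map((key) => {
    let value = reduce(key, groups.get(key));
    if (finalize) value = finalize(key, value);
    return { _id: key, value };
  });
}

const articles = [
  { author: "bob", pageViews: 5 },
  { author: "bill", pageViews: 3 },
  { author: "bob", pageViews: 14 },
];

// map emits (author, pageViews). reduce returns the same kind of value it receives
// (a number), so it would stay correct if re-run on partially reduced output.
const viewsByAuthor = runMapReduce(
  articles,
  function (emit) { emit(this.author, this.pageViews); },
  (key, vals) => vals.reduce((a, b) => a + b, 0)
);
// viewsByAuthor: [{ _id: "bill", value: 3 }, { _id: "bob", value: 19 }]
```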
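The UNIX-pipe comparison on slide 13 can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array, and composing the stages left to right. `runPipeline` and the inline stages are hypothetical stand-ins for $match and $project, not the server's API.

```javascript
// Hypothetical: each stage maps an array of documents to a new array of documents.
function runPipeline(collection, stages) {
  // The output of each stage becomes the input of the next, like `a | b | c`.
  return stages.reduce((docs, stage) => stage(docs), collection);
}

const posts = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 51 },
  { author: "bob", pageViews: 53 },
];

const pipelineResult = runPipeline(posts, [
  (docs) => docs.filter((d) => d.author === "bob"),        // filter, like $match
  (docs) => docs.map((d) => ({ pageViews: d.pageViews })), // reshape, like $project
]);
// pipelineResult: [{ pageViews: 5 }, { pageViews: 53 }]
```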
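For the $match slide (slide 15), a hypothetical in-memory filter reproduces both examples on the same six input documents (titles omitted). It assumes only two kinds of predicate, plain equality and `{ $gt: n }`; real query predicates support many more operators.

```javascript
// Hypothetical in-memory $match supporting only equality and { $gt: n } predicates.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[field] > cond.$gt;
    }
    return doc[field] === cond; // a plain value means equality
  });
}

function matchStage(docs, query) {
  return docs.filter((doc) => matches(doc, query));
}

const input = [
  { author: "bob", pageViews: 5 },
  { author: "bill", pageViews: 3 },
  { author: "joe", pageViews: 52 },
  { author: "jane", pageViews: 51 },
  { author: "bob", pageViews: 14 },
  { author: "bob", pageViews: 53 },
];

const byBob = matchStage(input, { author: "bob" });            // pageViews 5, 14, 53
const popular = matchStage(input, { pageViews: { $gt: 50 } }); // joe, jane, bob (53)
```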
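The $unwind behavior on slide 16 amounts to a flat-map over the array field. This sketch uses a hypothetical `unwindStage` helper that handles top-level fields only, with a simplified numeric `_id` in place of the slide's `ObjectId`.

```javascript
// Hypothetical in-memory $unwind: one output document per element of an array field.
function unwindStage(docs, path) {
  const field = path.replace(/^\$/, ""); // "$tags" -> "tags"; top-level fields only
  return docs.flatMap((doc) =>
    Array.isArray(doc[field])
      ? doc[field].map((value) => ({ ...doc, [field]: value })) // copy doc, replace array
      : [] // this sketch drops documents whose field is not an array
  );
}

const unwound = unwindStage(
  [{ _id: 146, author: "bob", tags: ["fun", "good", "awesome"] }],
  "$tags"
);
// unwound: three documents with _id 146 and author "bob", tags "fun", "good", "awesome"
```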
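To show how $group buckets documents by `_id` and accumulates into output fields (slides 17 and 18), here is a hypothetical in-memory `groupStage` that implements only `$sum`, with either a `"$field"` reference or a constant as its operand. The sample articles are made up.

```javascript
// Hypothetical in-memory $group: buckets by the _id expression and applies $sum.
// Only "$field"/constant keys and { $sum: "$field" | number } accumulators are handled.
function resolve(doc, expr) {
  // "$name" refers to a top-level field; anything else is a literal constant.
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

function groupStage(docs, spec) {
  const { _id, ...accumulators } = spec;
  const buckets = new Map(); // bucket key -> output document
  for (const doc of docs) {
    const key = resolve(doc, _id);
    if (!buckets.has(key)) {
      const init = { _id: key };
      for (const name of Object.keys(accumulators)) init[name] = 0;
      buckets.set(key, init);
    }
    const out = buckets.get(key);
    for (const [name, acc] of Object.entries(accumulators)) {
      out[name] += resolve(doc, acc.$sum);
    }
  }
  return [...buckets.values()]; // buckets in first-seen order
}

const articles = [
  { author: "bob", pageViews: 5 },
  { author: "bill", pageViews: 3 },
  { author: "bob", pageViews: 14 },
];
const viewsPerAuthor = groupStage(articles, {
  _id: "$author",
  viewsPerAuthor: { $sum: "$pageViews" },
});
// viewsPerAuthor: [{ _id: "bob", viewsPerAuthor: 19 }, { _id: "bill", viewsPerAuthor: 3 }]
```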
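The customer #4 example on slide 19 can be checked by hand with a plain-JavaScript equivalent; the `orders` data is hypothetical. Note the asymmetry the pipeline relies on: the `$match` query names the field without a `$` (`customer_id`), while the `$sum` operand refers to the field's value with one (`"$price"`).

```javascript
// Hypothetical orders; plain-JS equivalent of
// [{ $match: { customer_id: 4 } }, { $group: { _id: null, total: { $sum: "$price" } } }]
const orders = [
  { customer_id: 4, price: 10 },
  { customer_id: 7, price: 99 },
  { customer_id: 4, price: 2.5 },
];

// $match: keep only customer #4's orders.
const matched = orders.filter((o) => o.customer_id === 4);

// $group with _id: null puts every remaining document into a single bucket.
const totals = [{ _id: null, total: matched.reduce((sum, o) => sum + o.price, 0) }];
// totals: [{ _id: null, total: 12.5 }]
```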
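Slide 20's tag-to-authors query can be emulated in memory: iterating over each post's `tags` array plays the role of $unwind, and a `Set` per tag plays the role of `$addToSet`. The `posts` data is hypothetical, and the author order inside each set here is just insertion order; MongoDB does not promise any particular order for `$addToSet` results.

```javascript
// Hypothetical posts; plain-JS equivalent of
// [{ $unwind: "$tags" }, { $group: { _id: "$tags", authors: { $addToSet: "$author" } } }]
const posts = [
  { author: "bob", tags: ["art", "sports"] },
  { author: "jane", tags: ["sports", "food"] },
  { author: "bob", tags: ["food", "art"] },
];

const authorsByTag = new Map(); // tag -> Set of authors (a Set ignores duplicates)
for (const post of posts) {
  for (const tag of post.tags) {            // $unwind: one pass per array element
    if (!authorsByTag.has(tag)) authorsByTag.set(tag, new Set());
    authorsByTag.get(tag).add(post.author); // $addToSet
  }
}

const grouped = [...authorsByTag].map(([tag, authors]) => ({ _id: tag, authors: [...authors] }));
// grouped: art -> [bob], sports -> [bob, jane], food -> [jane, bob]
```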
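Slide 22 says $sort is specified like an index key. A hypothetical `sortStage` helper can turn such a spec into a comparator: keys are compared in the order they appear in the spec, `1` sorts ascending and `-1` descending. It compares only same-typed values with `<` and `>`; BSON's cross-type ordering is not modeled.

```javascript
// Hypothetical: build a comparator from an index-style sort spec such as { name: 1, age: -1 }.
function sortStage(docs, spec) {
  const keys = Object.entries(spec); // key order in the spec decides precedence
  return [...docs].sort((a, b) => {
    for (const [field, dir] of keys) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // equal on every key
  });
}

const people = [
  { name: "bob", age: 30 },
  { name: "ann", age: 25 },
  { name: "bob", age: 41 },
];
const sorted = sortStage(people, { name: 1, age: -1 });
// sorted: ann/25, bob/41, bob/30
```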
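Because stages run in pipeline order (slides 13 and 23), `$skip` followed by `$limit` is not the same as `$limit` followed by `$skip`. The hypothetical `skipStage` and `limitStage` helpers below show the difference on ten numbered documents.

```javascript
// Hypothetical in-memory $skip and $limit stages over an array of documents.
const skipStage = (docs, n) => docs.slice(n);
const limitStage = (docs, n) => docs.slice(0, n);

const docs = Array.from({ length: 10 }, (_, i) => ({ n: i })); // n = 0..9

// Stages apply in pipeline order, so these two pipelines differ:
const skipThenLimit = limitStage(skipStage(docs, 5), 3); // n = 5, 6, 7
const limitThenSkip = skipStage(limitStage(docs, 3), 5); // empty: only 3 docs reach $skip
```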
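The include/exclude forms of $project on slide 25 can be sketched with a hypothetical `projectFlags` helper. It assumes the spec contains only `1` and `0` flags and, as $project does, keeps `_id` in inclusion mode unless the spec sets `_id : 0`. The sample article is made up.

```javascript
// Hypothetical $project sketch that understands only 1 (include) and 0 (exclude) flags.
function projectFlags(doc, spec) {
  if (Object.values(spec).every((flag) => flag === 0)) {
    // Exclusion mode: copy the document, then drop each listed field.
    const out = { ...doc };
    for (const field of Object.keys(spec)) delete out[field];
    return out;
  }
  // Inclusion mode: keep _id (unless the spec sets _id: 0) plus the fields flagged 1.
  const out = {};
  if ("_id" in doc && spec._id !== 0) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const article = { _id: 1, title: "Lorem Ipsum", author: "bob", comments: ["nice"] };
const included = projectFlags(article, { title: 1, author: 1 }); // _id, title, author
const excluded = projectFlags(article, { comments: 0 });         // _id, title, author
```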
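Slide 26's renaming and sub-document patterns can be sketched with a hypothetical `projectExpr` evaluator: a string starting with `$` is read as a (possibly dotted) field path, and a nested object builds a sub-document. The article and its nested `author.name` field are hypothetical, chosen to show pulling a nested field up to the top level; this sketch does not handle the `1`/`0` flags.

```javascript
// Hypothetical $project evaluator: "$a.b" strings are field paths, nested objects
// become sub-documents, and any other value is copied through as a literal.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function projectExpr(doc, spec) {
  const out = {};
  for (const [name, expr] of Object.entries(spec)) {
    if (typeof expr === "string" && expr.startsWith("$")) {
      out[name] = getPath(doc, expr.slice(1)); // rename, or pull a nested field up
    } else if (expr !== null && typeof expr === "object") {
      out[name] = projectExpr(doc, expr);      // push fields down into a sub-document
    } else {
      out[name] = expr;                        // literal value
    }
  }
  return out;
}

const article = {
  pageViews: 14,
  author: { name: "bob" }, // hypothetical nested field
  ctime: "2012-04-01",
  mtime: "2012-04-03",
};

const reshaped = projectExpr(article, {
  page_views: "$pageViews",
  author_name: "$author.name",
  info: { published: "$ctime", update: "$mtime" },
});
// reshaped: { page_views: 14, author_name: "bob", info: { published: "2012-04-01", update: "2012-04-03" } }
```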
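Slides 27 and 30 rest on the same rule: inside an expression, `"$age"` means the value of the `age` field, while a string without `$` is a literal. The hypothetical `evalExpr` below implements just that rule plus `$add`, which is enough to reproduce the `age_fixed` computed field on a made-up document.

```javascript
// Hypothetical expression evaluator: "$field" reads a field, { $add: [...] } sums
// its evaluated operands, and anything else (including plain strings) is a literal.
function evalExpr(doc, expr) {
  if (typeof expr === "string" && expr.startsWith("$")) return doc[expr.slice(1)];
  if (expr !== null && typeof expr === "object" && Array.isArray(expr.$add)) {
    return expr.$add.reduce((sum, operand) => sum + evalExpr(doc, operand), 0);
  }
  return expr;
}

const person = { _id: 1, name: "ann", age: 25 };

// Mirrors { $project : { name : 1, age_fixed : { $add : [ "$age", 2 ] } } }.
const projected = {
  _id: person._id,
  name: person.name,
  age_fixed: evalExpr(person, { $add: ["$age", 2] }),
};
// projected: { _id: 1, name: "ann", age_fixed: 27 }

const fieldValue = evalExpr(person, "$name"); // "ann": the value of the name field
const literal = evalExpr(person, "name");     // "name": just the string itself
```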
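The year-count example on slide 29 can be mirrored in plain JavaScript to make each stage's job explicit. `postsPerYear` and the three posts are hypothetical, and the limit is a parameter so the effect of `$limit` shows up on a tiny data set. `getUTCFullYear()` is used because BSON dates are stored in UTC.

```javascript
// Hypothetical plain-JS equivalent of
// $sort { pub_time: -1 } -> $limit -> $project { pub_year: { $year: ... } } -> $group { $sum: 1 }
function postsPerYear(posts, limit) {
  const recent = [...posts]
    .sort((a, b) => b.pub_time - a.pub_time) // $sort: newest first
    .slice(0, limit);                         // $limit
  const counts = new Map(); // year -> number of posts
  for (const { pub_time } of recent) {
    const year = pub_time.getUTCFullYear();        // $project with $year
    counts.set(year, (counts.get(year) || 0) + 1); // $group with { $sum: 1 }
  }
  return [...counts].map(([year, n]) => ({ _id: year, num_year: n }));
}

const posts = [
  { pub_time: new Date(Date.UTC(2011, 11, 31)) }, // 31 Dec 2011 (months are 0-based)
  { pub_time: new Date(Date.UTC(2012, 2, 1)) },   // 1 Mar 2012
  { pub_time: new Date(Date.UTC(2012, 3, 15)) },  // 15 Apr 2012
];
const perYear = postsPerYear(posts, 2);
// perYear: [{ _id: 2012, num_year: 2 }] (the 2011 post is not among the 2 most recent)
```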