mongodb-aggregation-may-2012

Slide deck for my presentation at MongoSF 2012 in May: http://www.10gen.com/presentations/mongosf-2012/mongodb-new-aggregation-framework .

Published in: Technology, Business

Transcript of "mongodb-aggregation-may-2012"

  2. What problem are we solving?
     • Map/Reduce can be used for aggregation…
       • Currently being used for totaling, averaging, etc.
     • Map/Reduce is a big hammer
       • Simpler tasks should be easier
       • Shouldn’t need to write JavaScript
       • Avoid the overhead of the JavaScript engine
     • We’re seeing requests for help in handling complex documents
       • Select only matching subdocuments or arrays
  3. How will we solve the problem?
     • Our new aggregation framework
       • Declarative framework
         • No JavaScript required
       • Describe a chain of operations to apply
       • Expression evaluation
         • Return computed values
       • Framework: we can add new operations easily
       • C++ implementation
         • Higher performance than JavaScript
  4. Aggregation - Pipelines
     • Aggregation requests specify a pipeline
     • A pipeline is a series of operations
     • Conceptually, the members of a collection are passed through a pipeline to produce a result
       • Similar to a command-line pipe
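The command-line-pipe analogy can be sketched in plain JavaScript. This is only an in-memory illustration: the helpers `matchStage`, `limitStage`, and `runPipeline` and the sample documents are hypothetical, and the real framework runs declarative stages inside the server rather than JavaScript callbacks.

```javascript
// Hypothetical in-memory stages: each takes an array of documents
// and returns a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Pass the documents through each stage in order, like `cmd1 | cmd2`.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample data for illustration.
const sampleDocs = [
  { author: "ann", pageViews: 5 },
  { author: "bob", pageViews: 12 },
  { author: "cat", pageViews: 20 },
];

const pipelineResult = runPipeline(sampleDocs, [
  matchStage((doc) => doc.pageViews > 10),
  limitStage(1),
]);
console.log(pipelineResult);
```

The output of each stage is the input of the next, so reordering the stages can change the result.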
  5. Pipeline Operations
     • $match
       • Uses a query predicate (like .find({…})) as a filter
     • $project
       • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
       • This can include computed values
     • $unwind
       • Hands out array elements one at a time
     • $group
       • Aggregates items into buckets defined by a key
  6. Pipeline Operations (continued)
     • $sort
       • Sort documents
     • $limit
       • Only allow the specified number of documents to pass
     • $skip
       • Skip over the specified number of documents
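As a rough sketch, the seven operations could be combined into one pipeline as below. The field names (`published`, `author`, `tags`, `pageViews`) are assumptions made up for illustration, not part of the deck.

```javascript
// Hypothetical articles with `published`, `author`, `tags`, and `pageViews` fields.
const articlePipeline = [
  { $match: { published: true } },                    // filter, like .find({...})
  { $project: { author: 1, tags: 1, pageViews: 1 } }, // shape the result
  { $unwind: "$tags" },                               // one document per tag
  { $group: { _id: "$tags", views: { $sum: "$pageViews" } } }, // bucket by tag
  { $sort: { views: -1 } },                           // largest total first
  { $skip: 5 },                                       // skip the first five buckets
  { $limit: 10 },                                     // pass at most ten documents
];
console.log(JSON.stringify(articlePipeline, null, 2));
```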
  7. Projections
     • $project can reshape results
       • Include or exclude fields
       • Computed fields
         • Arithmetic expressions, including built-in functions
       • Pull fields from nested documents to the top
       • Push fields from the top down into new virtual documents
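A minimal sketch of a `$project` stage covering each kind of reshaping listed above. All field names (`title`, `pageViews`, `feedViews`, `address.city`, `likes`) are hypothetical.

```javascript
// Hypothetical field names throughout; only the stage shape matters here.
const reshapeStage = {
  $project: {
    _id: 0,                                             // exclude a field
    title: 1,                                           // include a field
    totalViews: { $add: ["$pageViews", "$feedViews"] }, // computed field
    city: "$address.city",                              // pull a nested field to the top
    stats: { views: "$pageViews", likes: "$likes" },    // push top-level fields into a new subdocument
  },
};
console.log(JSON.stringify(reshapeStage, null, 2));
```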
  8. Unwinding
     • $unwind can “stream” arrays
       • Array values are doled out one at a time in the context of their surrounding documents
       • Makes it possible to filter out elements before returning
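The effect of unwinding can be sketched in memory. The `unwind` helper and the sample documents are hypothetical, and the sketch simply drops documents whose field is missing, which is a simplification rather than a statement about server behavior.

```javascript
// Hypothetical helper that mimics $unwind on an in-memory array of documents:
// each array element is emitted inside a copy of its surrounding document.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Made-up sample data for illustration.
const posts = [
  { title: "intro", tags: ["mongodb", "aggregation"] },
  { title: "tips", tags: ["mongodb"] },
];

const unwound = unwind(posts, "tags");

// Each element now has its own document, so elements can be filtered individually.
const onlyAggregation = unwound.filter((doc) => doc.tags === "aggregation");
console.log(unwound.length, onlyAggregation);
```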
  9. Grouping
     • $group aggregation expressions
       • Define a grouping key as the _id of the result
       • Total grouped column values: $sum
       • Average grouped column values: $avg
       • Collect grouped column values in an array or set: $push, $addToSet
       • Other functions
         • $min, $max, $first, $last
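A sketch of a `$group` stage using these accumulators, followed by a hypothetical in-memory equivalent of the `$sum` and `$avg` parts. The field names and the helper `groupViewsByAuthor` are assumptions for illustration only.

```javascript
// Hypothetical field names; the grouping key becomes the _id of each result.
const groupStageSketch = {
  $group: {
    _id: "$author",
    totalViews: { $sum: "$pageViews" },
    avgViews: { $avg: "$pageViews" },
    allTags: { $push: "$tag" },
    distinctTags: { $addToSet: "$tag" },
    maxViews: { $max: "$pageViews" },
    firstTitle: { $first: "$title" },
  },
};

// Hypothetical in-memory equivalent of the $sum and $avg accumulators above.
function groupViewsByAuthor(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const bucket = buckets.get(doc.author) || { _id: doc.author, totalViews: 0, count: 0 };
    bucket.totalViews += doc.pageViews;
    bucket.count += 1;
    buckets.set(doc.author, bucket);
  }
  return [...buckets.values()].map(({ _id, totalViews, count }) => ({
    _id,
    totalViews,
    avgViews: totalViews / count,
  }));
}

const grouped = groupViewsByAuthor([
  { author: "ann", pageViews: 10 },
  { author: "bob", pageViews: 4 },
  { author: "ann", pageViews: 20 },
]);
console.log(grouped);
```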
  10. Sorting
      • $sort can sort documents
        • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
  11. Computed Expressions
      • Available in $project operations
      • Prefix expression language
        • Add two fields: {$add:["$field1", "$field2"]}
        • Provide a value for a missing field: {$ifNull:["$field1", "$field2"]}
        • Nesting: {$add:["$field1", {$ifNull:["$field2", "$field3"]}]}
        • Other functions…
          • And we can easily add more as required
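To show how a prefix expression nests, here is a hypothetical evaluator for a tiny subset of the language (`$add` and `$ifNull` only, top-level field references only). It is a sketch for intuition, not the server's C++ implementation.

```javascript
// Hypothetical evaluator: supports "$field" references, literals, $add, and $ifNull.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // field reference such as "$field1"
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [operator] = Object.keys(expr);
    // Operands are evaluated first, so expressions can nest to any depth.
    const args = expr[operator].map((arg) => evaluate(arg, doc));
    if (operator === "$add") return args.reduce((sum, value) => sum + value, 0);
    if (operator === "$ifNull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + operator);
  }
  return expr; // literal value
}

const nestedExpr = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };

// field2 is missing, so $ifNull falls back to field3: 1 + 5.
console.log(evaluate(nestedExpr, { field1: 1, field3: 5 })); // prints 6
```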
  12. Computed Expressions (continued)
      • String functions
        • toUpper, toLower, substr
      • Date field extraction
        • Get year, month, day, hour, etc., from an ISODate
      • Date arithmetic
      • Null value substitution (like MySQL ifnull(), Oracle nvl())
      • Ternary conditional
        • Return one of two values based on a predicate
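A sketch of how these functions might appear together in one `$project` stage. The field names (`author`, `title`, `posted`, `nickname`, `pageViews`) and the threshold of 100 are assumptions for illustration.

```javascript
// Hypothetical field names; each value is one prefix expression.
const computedStage = {
  $project: {
    authorUpper: { $toUpper: "$author" },               // string function
    shortTitle: { $substr: ["$title", 0, 10] },         // first ten characters
    yearPosted: { $year: "$posted" },                   // date field extraction
    displayName: { $ifNull: ["$nickname", "$author"] }, // null value substitution
    popularity: {
      // ternary conditional: [predicate, value if true, value if false]
      $cond: [{ $gt: ["$pageViews", 100] }, "popular", "normal"],
    },
  },
};
console.log(JSON.stringify(computedStage, null, 2));
```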
  13. Demo
      • Demo files are at https://gist.github.com/1401585
  14. Usage Tips
      • Use $match in a pipeline as early as possible
        • The query optimizer can then choose to scan an index and avoid scanning the entire collection
      • Use $sort in a pipeline as early as possible
        • The query optimizer can then be used to choose an index to scan instead of sorting the result
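The first tip can be illustrated with two pipelines that produce the same result. The field names and the index on `author` are assumptions; the rewrite is valid here only because the filter is on the grouping key.

```javascript
// Filters after grouping: every document is read and grouped first.
const lateMatch = [
  { $group: { _id: "$author", views: { $sum: "$pageViews" } } },
  { $match: { _id: "ann" } },
];

// Same result with the filter first, so an index on `author`
// (assumed to exist) can be used to avoid a full collection scan.
const earlyMatch = [
  { $match: { author: "ann" } },
  { $group: { _id: "$author", views: { $sum: "$pageViews" } } },
];

console.log(Object.keys(lateMatch[0])[0], Object.keys(earlyMatch[0])[0]);
```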
  15. Driver Support
      • Initial version is a command
        • For any language, build a JSON database object, and execute the command
        • In the shell: db.runCommand({ aggregate : <collection-name>, pipeline : […] });
        • Beware of command result size limit
          • Document size limit is 16MB
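A sketch of the command document a driver would build. The collection name `"articles"` and the field names are hypothetical; the pipeline is written as an array of stages.

```javascript
// "articles" is a hypothetical collection name.
const aggregateCommand = {
  aggregate: "articles",
  pipeline: [
    { $match: { author: "ann" } },
    { $group: { _id: "$author", views: { $sum: "$pageViews" } } },
  ],
};

// In the mongo shell this document would be passed to db.runCommand(aggregateCommand).
console.log(JSON.stringify(aggregateCommand));
```

Because the result comes back as a single command reply, a `$limit` or a selective `$match` helps keep it under the document size limit.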
  16. Sharding support
      • Initial release will support sharding
      • Mongos analyzes pipeline, and forwards operations up to $group or $sort to shards; combines shard server results and returns them
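The pipeline analysis can be sketched as a split at the first `$group` or `$sort`. The helper `splitForShards` is hypothetical, and it assumes the split is exclusive (the `$group` or `$sort` itself stays on the merging side); the slide does not say whether that stage also runs on the shards.

```javascript
// Hypothetical sketch: stages before the first $group or $sort go to the shards,
// the rest are applied when combining the shard results.
function splitForShards(pipeline) {
  const splitAt = pipeline.findIndex(
    (stage) => "$group" in stage || "$sort" in stage
  );
  if (splitAt === -1) {
    return { shardPart: pipeline, mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, splitAt),
    mergePart: pipeline.slice(splitAt),
  };
}

// Made-up pipeline for illustration.
const shardedSplit = splitForShards([
  { $match: { published: true } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", views: { $sum: "$pageViews" } } },
  { $sort: { views: -1 } },
]);
console.log(shardedSplit.shardPart.length, shardedSplit.mergePart.length);
```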
  17. When is this being released?
      • In final development now
        • Adding an explain facility
      • Expect to see this in the near future
  18. Future Plans
      • More optimizations
      • $out pipeline operation
        • Saves the document stream to a collection
        • Similar to M/R $out, but with sharded output
        • Functions like a tee, so that intermediate results can be saved
