MongoDB Aggregation MongoSF May 2011



Chris Westin's talk from MongoSF (May 2011) on MongoDB's coming aggregation framework.


  1. MongoDB’s New Aggregation Features
     Chris Westin
     © Copyright 2010 10gen Inc.
  2. What problem are we solving?
     Map/Reduce can be used for aggregation…
       Currently being used for totaling, averaging, etc.
     Map/Reduce is a big hammer
       Simpler tasks should be easier
       Shouldn’t need to write JavaScript
       Avoid the overhead of the JavaScript engine
     We’re seeing requests for help in handling complex documents
       Select only subdocuments or arrays
  3. How will we solve the problem?
     Our new aggregation framework
       Declarative framework
       No JavaScript required
       Describe a chain of operations to apply
     Expression evaluation
       Return computed values
     Framework: we can add new operations easily
     C++ implementation
       Higher performance than JavaScript
  4. Aggregation - Pipelines
     Aggregation requests specify a pipeline
     A pipeline is a series of operations
     Conceptually, the members of a collection are passed through a pipeline to produce a result
     Similar to a command-line pipe
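The command-line-pipe analogy on slide 4 can be modelled in a few lines of plain Python. This is a conceptual sketch, not MongoDB code: each stage is assumed to be a function from a stream of documents (dicts) to a new stream, and the stage names `only_active` and `drop_active_flag` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# A "stage" takes a stream of documents and yields a new stream.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def run_pipeline(docs: Iterable[dict], stages: List[Stage]) -> List[dict]:
    """Feed the documents through each stage in order, like `a | b | c` in a shell."""
    stream: Iterable[dict] = docs
    for stage in stages:
        stream = stage(stream)  # each stage consumes the previous stage's output
    return list(stream)


# Two hypothetical stages, written as generators so documents flow one at a time.
def only_active(docs: Iterable[dict]) -> Iterator[dict]:
    for d in docs:
        if d.get("active"):
            yield d


def drop_active_flag(docs: Iterable[dict]) -> Iterator[dict]:
    for d in docs:
        yield {k: v for k, v in d.items() if k != "active"}


collection = [{"name": "a", "active": True}, {"name": "b", "active": False}]
print(run_pipeline(collection, [only_active, drop_active_flag]))  # [{'name': 'a'}]
```

Using generators keeps the "one document at a time" flavour of a pipe: no stage needs the whole intermediate result in memory.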
  5. Pipeline Operations
     $match
       Uses a query predicate (like .find({…})) as a filter
     $project
       Uses a sample document to determine the shape of the result (similar to the optional second argument of .find())
       This can include computed values
     $group
       Aggregates items into buckets defined by a key
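As a rough illustration of slide 5, here are plain-Python stand-ins for `$match` and inclusion-style `$project`. They are simplified models, not the server's behaviour: only field equality is matched, and the `orders` data and field names are hypothetical.

```python
from typing import Iterable, Iterator


def match(predicate: dict):
    """Rough model of $match: keep documents whose fields equal the predicate's values.
    Only plain equality is modelled; query operators such as $gt are not."""
    def stage(docs: Iterable[dict]) -> Iterator[dict]:
        for d in docs:
            if all(d.get(field) == value for field, value in predicate.items()):
                yield d
    return stage


def project(spec: dict):
    """Rough model of $project inclusion: keep only the fields marked with 1.
    (Real MongoDB also keeps _id unless it is explicitly excluded.)"""
    def stage(docs: Iterable[dict]) -> Iterator[dict]:
        for d in docs:
            yield {field: d[field] for field, keep in spec.items() if keep == 1 and field in d}
    return stage


orders = [
    {"_id": 1, "status": "shipped", "amount": 10, "note": "x"},
    {"_id": 2, "status": "pending", "amount": 7, "note": "y"},
]
filtered = match({"status": "shipped"})(orders)
print(list(project({"status": 1, "amount": 1})(filtered)))
# [{'status': 'shipped', 'amount': 10}]
```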
  6. Computed Expressions
     Available in $project operations
     Prefix expression language
       Add two fields: { $add: ["$field1", "$field2"] }
       Provide a value for a missing field: { $ifNull: ["$field1", "$field2"] }
       Nesting: { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] }
       Other functions…
     And we can easily add more as required
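To make the prefix notation on slide 6 concrete, here is a tiny recursive evaluator for just `$add` and `$ifNull`. It is a sketch under simplifying assumptions, not the server's expression engine: a missing field and a null field are both treated as `None`, and any other operator raises an error.

```python
from typing import Any


def evaluate(expr: Any, doc: dict) -> Any:
    """Evaluate a tiny subset of the prefix expression language against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        # "$field" is a field reference; a missing field evaluates to None here.
        return doc.get(expr[1:])
    if isinstance(expr, dict) and len(expr) == 1:
        op, args = next(iter(expr.items()))
        values = [evaluate(arg, doc) for arg in args]  # operands are evaluated first (prefix form)
        if op == "$add":
            return sum(values)
        if op == "$ifNull":
            return values[0] if values[0] is not None else values[1]
        raise ValueError(f"operator not modelled in this sketch: {op}")
    return expr  # anything else is a literal constant


doc = {"field1": 2, "field3": 5}  # field2 is missing
nested = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
print(evaluate(nested, doc))  # 7
```

The recursion is what makes nesting work: an operand can itself be an operator document, exactly as in the slide's nesting example.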
  7. Projections
     $project can reshape results
       $unwind expression doles out array values one at a time, producing one document per array element
       Pull fields from nested documents to the top
       Push fields from the top down into new virtual documents
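The "one at a time" behaviour of `$unwind` on slide 7 can be pictured with this plain-Python model. It is an illustration only; the `posts` data is hypothetical, and the choice that documents lacking the field produce no output is an assumption of the sketch.

```python
from typing import Iterable, Iterator


def unwind(docs: Iterable[dict], field: str) -> Iterator[dict]:
    """Rough model of $unwind: one output document per element of doc[field]."""
    for d in docs:
        for element in d.get(field, []):  # documents without the field produce nothing here
            out = dict(d)          # shallow copy so the input document is untouched
            out[field] = element   # the array is replaced by a single element
            yield out


posts = [{"_id": 1, "tags": ["mongodb", "aggregation"]}]
print(list(unwind(posts, "tags")))
# [{'_id': 1, 'tags': 'mongodb'}, {'_id': 1, 'tags': 'aggregation'}]
```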
  8. Grouping
     $group aggregation expressions
       Total of column values: $sum
       Average of column values: $avg
       Collect column values in an array: $push
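The bucketing described on slides 5 and 8 can be sketched in plain Python as a dictionary of lists, with `$sum`, `$avg` and `$push` computed per bucket. This is a model of the idea, not MongoDB's implementation; the `sales` data and the output field names `total`, `average` and `amounts` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group(docs: Iterable[dict], key: str, value: str) -> List[Dict[str, Any]]:
    """Rough model of $group: bucket by doc[key], then apply $sum, $avg and $push to doc[value]."""
    buckets: Dict[Any, list] = defaultdict(list)
    for d in docs:
        buckets[d.get(key)].append(d[value])
    return [
        {
            "_id": k,                          # the grouping key becomes _id
            "total": sum(vals),                # like $sum
            "average": sum(vals) / len(vals),  # like $avg
            "amounts": vals,                   # like $push
        }
        for k, vals in buckets.items()
    ]


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
]
for row in group(sales, "region", "amount"):
    print(row)
# {'_id': 'east', 'total': 30, 'average': 15.0, 'amounts': [10, 20]}
# {'_id': 'west', 'total': 5, 'average': 5.0, 'amounts': [5]}
```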
  9. Demo
     (See script at
  10. Usage Tips
      Use $match in a pipeline as early as possible
        The query optimizer can then choose an index and avoid scanning the entire collection
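Index selection cannot be shown without a server, but the other half of slide 10's advice can: filtering first means later stages touch fewer documents. This plain-Python sketch counts calls to a hypothetical per-document `reshape` step (a stand-in for something like a computed `$project`) with the filter placed late versus early.

```python
from typing import Dict, List


def reshape(doc: dict, counter: Dict[str, int]) -> dict:
    """Stand-in for a per-document stage such as a $project with computed values."""
    counter["calls"] += 1
    return {**doc, "doubled": doc["x"] * 2}


docs: List[dict] = [{"x": i, "keep": i % 10 == 0} for i in range(1000)]

# Filter late: every document is reshaped, then most are thrown away.
late = {"calls": 0}
late_result = [r for r in (reshape(d, late) for d in docs) if r["keep"]]

# Filter early: only the documents that survive the filter are reshaped.
early = {"calls": 0}
early_result = [reshape(d, early) for d in docs if d["keep"]]

print(late["calls"], early["calls"])  # 1000 100
```

Both orders return the same documents; only the amount of work differs.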
  11. Driver Support
      Initial version is a command
      For any language, build the command as a JSON-style document and execute it
        { aggregate : <collection>, pipeline : [ … ] }
      Beware of the command result size limit: the whole result comes back in a single document, so it must fit within the maximum BSON document size
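Because the initial interface is a plain command, any driver only has to build a document like the one on slide 11. The sketch below builds it as a Python dict; the collection name `orders` and the fields `status`, `customer` and `amount` are hypothetical examples.

```python
import json


def aggregate_command(collection: str, pipeline: list) -> dict:
    """Build {aggregate: <collection>, pipeline: [...]} as described on the slide."""
    # The command name has to be the first key; Python dicts preserve insertion order.
    return {"aggregate": collection, "pipeline": pipeline}


cmd = aggregate_command("orders", [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
])
print(json.dumps(cmd))
```

With PyMongo, a document like this can be passed to `Database.command`. Note that later servers (3.6 and newer) also require a `cursor` option on `aggregate`, and drivers now offer an `aggregate()` helper, so treat this as a picture of the original command shape rather than current usage.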
  12. When is this being released?
      In final development now
      Expect to see this in the near future
  13. Sharding support
      Initial release will support sharding
      Mongos analyzes the pipeline and forwards the operations up to $group to the shards; it then combines the shard servers’ results and continues with the rest of the pipeline
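Splitting a pipeline at `$group` only works if each shard returns something mergeable. The generic technique, sketched here in plain Python, is for shards to send partial sums and counts rather than finished averages, and for the router to combine them. This is an illustration of that technique under assumed data, not a description of how mongos is actually implemented; `shard_partial`, `merge_partials` and the sample shards are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Partial = Dict[str, Tuple[float, int]]  # key -> (running sum, count)


def shard_partial(docs: Iterable[dict], key: str, value: str) -> Partial:
    """Runs on each shard: per-key sums and counts, not averages."""
    partial: Partial = {}
    for d in docs:
        s, c = partial.get(d[key], (0, 0))
        partial[d[key]] = (s + d[value], c + 1)
    return partial


def merge_partials(partials: List[Partial]) -> Dict[str, float]:
    """Runs on the router: add up the shard partials, then finish the average."""
    totals: Partial = {}
    for p in partials:
        for k, (s, c) in p.items():
            ts, tc = totals.get(k, (0, 0))
            totals[k] = (ts + s, tc + c)
    return {k: s / c for k, (s, c) in totals.items()}


shard_a = [{"k": "x", "v": 1}, {"k": "x", "v": 3}]
shard_b = [{"k": "x", "v": 8}]
print(merge_partials([shard_partial(shard_a, "k", "v"), shard_partial(shard_b, "k", "v")]))
# {'x': 4.0}; averaging the shard averages, (2 + 8) / 2, would wrongly give 5.0
```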
  14. Pipeline Operations – Future Plans
      $sort
        Sorts the document stream according to a key
      $out
        Saves the document stream to a collection
        Similar to M/R $out, but with sharded output
  15. Expressions – Future Plans
      Date field extraction
        Get year, month, day, hour, etc., from a Date
      Date arithmetic