MongoDB's New Aggregation framework

My talk on the new MongoDB aggregation framework, from MongoSeattle, December 1, 2011.


Presentation Transcript

  • Chris Westin, Software Engineer, 10gen (© Copyright 2010 10gen Inc.)
  • What problem are we solving?
    • Map/Reduce can be used for aggregation…
      • Currently being used for totaling, averaging, etc.
    • Map/Reduce is a big hammer
      • Simpler tasks should be easier
      • Shouldn’t need to write JavaScript
      • Avoid the overhead of the JavaScript engine
    • We’re seeing requests for help in handling complex documents
      • Select only matching subdocuments or arrays
  • How will we solve the problem?
    • Our new aggregation framework
      • Declarative framework
      • No JavaScript required
      • Describe a chain of operations to apply
      • Expression evaluation
        • Return computed values
      • Framework: we can add new operations easily
      • C++ implementation
        • Higher performance than JavaScript
  • Aggregation - Pipelines
    • Aggregation requests specify a pipeline
    • A pipeline is a series of operations
    • Conceptually, the members of a collection are passed through a pipeline to produce a result
      • Similar to a command-line pipe
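As a rough illustration of the pipe analogy, here is a toy in-memory model in Python. It is not MongoDB's C++ implementation or any driver API. The names `run_pipeline`, `keep_active` and `names_only` are hypothetical, and each stage is assumed to be a function that takes a stream of documents (plain dicts) and returns a new stream.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

Doc = Dict[str, object]
Stage = Callable[[Iterable[Doc]], Iterator[Doc]]


def run_pipeline(docs: Iterable[Doc], stages: Sequence[Stage]) -> list:
    """Pass documents through each stage in order, like `cat file | grep x | sort`."""
    stream: Iterable[Doc] = docs
    for stage in stages:
        stream = stage(stream)  # each stage lazily consumes the previous stage's output
    return list(stream)


# Hypothetical example stages, written by hand for this sketch.
def keep_active(stream: Iterable[Doc]) -> Iterator[Doc]:
    return (d for d in stream if d.get("active"))


def names_only(stream: Iterable[Doc]) -> Iterator[Doc]:
    return ({"name": d["name"]} for d in stream)


collection = [
    {"name": "ann", "active": True},
    {"name": "bob", "active": False},
    {"name": "cy", "active": True},
]
result = run_pipeline(collection, [keep_active, names_only])
print(result)  # [{'name': 'ann'}, {'name': 'cy'}]
```

Each stage sees only the output of the stage before it. As a result, the order of the stages determines both the result and how much data later stages must handle.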
  • Pipeline Operations
    • $match
      • Uses a query predicate (like .find({…})) as a filter
    • $project
      • Uses a sample document to determine the shape of the result (similar to the optional projection argument of .find())
      • This can include computed values
    • $unwind
      • Hands out array elements one at a time
    • $group
      • Aggregates items into buckets defined by a key
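To show what "a query predicate as a filter" means for $match, here is a minimal toy model in Python. It is an assumption-laden sketch and not MongoDB's matcher. It supports only equality plus `$gt`, `$lt` and `$in` on top-level fields. Real query semantics also cover dotted paths, array matching and cross-type ordering. The helper names `matches` and `match_stage` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

Doc = Dict[str, Any]


def matches(doc: Doc, predicate: Doc) -> bool:
    """Tiny subset of query semantics: equality plus $gt, $lt and $in on top-level fields."""
    for field, cond in predicate.items():
        value = doc.get(field)
        if not isinstance(cond, dict):  # a plain value means equality
            if value != cond:
                return False
            continue
        for op, arg in cond.items():  # operator form, e.g. {"$gt": 100}
            if op == "$gt":
                ok = value is not None and value > arg
            elif op == "$lt":
                ok = value is not None and value < arg
            elif op == "$in":
                ok = value in arg
            else:
                raise ValueError(f"{op} is not supported in this sketch")
            if not ok:
                return False
    return True


def match_stage(stream: Iterable[Doc], predicate: Doc) -> Iterator[Doc]:
    """Toy $match: pass through only the documents the predicate accepts."""
    return (d for d in stream if matches(d, predicate))


orders = [
    {"status": "A", "amount": 50},
    {"status": "B", "amount": 200},
    {"status": "A", "amount": 300},
]
big_a = list(match_stage(orders, {"status": "A", "amount": {"$gt": 100}}))
print(big_a)  # [{'status': 'A', 'amount': 300}]
```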
  • Pipeline Operations (continued)
    • $sort
      • Sort documents
    • $limit
      • Only allow the specified number of documents to pass
    • $skip
      • Skip over the specified number of documents
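The behavior of $skip and $limit can be modeled with Python's `itertools.islice`. This is a toy sketch only, and `skip_stage`, `limit_stage` and `page` are hypothetical helpers. Because stages run in order, $skip followed by $limit gives paging, while $limit followed by $skip gives a different result.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def skip_stage(stream: Iterable[Doc], n: int) -> Iterator[Doc]:
    """Toy $skip: discard the first n documents and pass the rest through."""
    return islice(stream, n, None)


def limit_stage(stream: Iterable[Doc], n: int) -> Iterator[Doc]:
    """Toy $limit: pass at most n documents, then stop reading upstream."""
    return islice(stream, n)


def page(docs: Iterable[Doc], number: int, size: int) -> List[Doc]:
    # $skip followed by $limit gives classic paging (page numbers start at 0 here).
    return list(limit_stage(skip_stage(docs, number * size), size))


docs = [{"i": i} for i in range(10)]
print(page(docs, 1, 3))                           # [{'i': 3}, {'i': 4}, {'i': 5}]
print(list(skip_stage(limit_stage(docs, 3), 1)))  # [{'i': 1}, {'i': 2}]
```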
  • Computed Expressions
    • Available in $project operations
    • Prefix expression language
      • Add two fields: $add: ["$field1", "$field2"]
      • Provide a value for a missing field: $ifNull: ["$field1", "$field2"]
      • Nesting: $add: ["$field1", {$ifNull: ["$field2", "$field3"]}]
      • Other functions…
    • And we can easily add more as required
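The prefix notation is easiest to understand as a recursive evaluation: an operator's operands can themselves be expressions. The toy Python evaluator below is a hypothetical sketch, not the server's implementation. It assumes that `"$name"` strings are references to top-level fields and that every other non-dict value is a literal. It supports only `$add` and `$ifNull`. In this sketch, `$add` returns `None` when any operand is missing.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def evaluate(expr: Any, doc: Doc) -> Any:
    """Toy evaluator for prefix expressions such as {"$add": ["$a", "$b"]}."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])  # "$field" refers to a top-level field; missing -> None
    if isinstance(expr, dict):
        (op, args), = expr.items()  # one operator per expression object
        values = [evaluate(a, doc) for a in args]  # operands may themselves be expressions
        if op == "$add":
            return None if any(v is None for v in values) else sum(values)
        if op == "$ifNull":
            first, replacement = values
            return replacement if first is None else first
        raise ValueError(f"{op} is not supported in this sketch")
    return expr  # numbers and other literals evaluate to themselves


nested = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
print(evaluate(nested, {"field1": 10, "field3": 5}))               # 15
print(evaluate(nested, {"field1": 10, "field2": 2, "field3": 5}))  # 12
```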
  • Computed Expressions (continued)
    • String functions
      • $toUpper, $toLower, $substr
    • Date field extraction
      • Get year, month, day, hour, etc., from an ISODate (e.g., $year, $month, $dayOfMonth, $hour)
    • Date arithmetic
    • Null value substitution with $ifNull (like MySQL IFNULL(), Oracle NVL())
    • Ternary conditional ($cond)
      • Return one of two values based on a predicate
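The next sketch extends the toy-evaluator idea to a few of the string, date and conditional operators listed above. It is a hypothetical Python model that makes several assumptions. Python `datetime` values stand in for ISODate. String operators assume that their argument is a string. `$substr` is treated as a simple `[string, start, length]` slice. Date arithmetic is not covered.

```python
from datetime import datetime
from typing import Any, Dict

Doc = Dict[str, Any]
DATE_PARTS = {"$year": "year", "$month": "month", "$dayOfMonth": "day", "$hour": "hour"}


def evaluate(expr: Any, doc: Doc) -> Any:
    """Toy evaluator for a few string, date and conditional operators."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])  # field reference
    if not isinstance(expr, dict):
        return expr  # literal
    (op, arg), = expr.items()
    if op == "$toUpper":
        return evaluate(arg, doc).upper()  # assumes the value is a string
    if op == "$toLower":
        return evaluate(arg, doc).lower()
    if op == "$substr":  # [string, start, length]
        s, start, length = (evaluate(a, doc) for a in arg)
        return s[start:start + length]
    if op in DATE_PARTS:  # pull one component out of a datetime
        return getattr(evaluate(arg, doc), DATE_PARTS[op])
    if op == "$gte":
        left, right = (evaluate(a, doc) for a in arg)
        return left >= right
    if op == "$cond":  # [predicate, value_if_true, value_if_false]
        pred, if_true, if_false = arg
        return evaluate(if_true if evaluate(pred, doc) else if_false, doc)
    raise ValueError(f"{op} is not supported in this sketch")


order = {"customer": "acme corp", "placed": datetime(2011, 12, 1, 14, 30), "total": 250}
print(evaluate({"$toUpper": "$customer"}, order))                            # ACME CORP
print(evaluate({"$substr": ["$customer", 0, 4]}, order))                     # acme
print(evaluate({"$month": "$placed"}, order))                                # 12
print(evaluate({"$cond": [{"$gte": ["$total", 100]}, "large", "small"]}, order))  # large
```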
  • Projections
    • $project can reshape results
      • Include or exclude fields
      • Computed fields
        • Arithmetic expressions, including built-in functions
      • Pull fields from nested documents to the top
      • Push fields from the top down into new virtual documents
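The reshaping rules can be modeled in a few lines. In this hypothetical Python sketch, a value of `1` includes a top-level field, a `"$a.b"` reference pulls a nested value up, and a nested dict builds a new virtual subdocument from field references. Exclusion, computed expressions and the automatic inclusion of `_id` are deliberately left out. The names `get_path` and `project` are illustrative only.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def get_path(doc: Doc, path: str) -> Any:
    """Follow a dotted path such as "address.city"; return None if any step is missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def project(doc: Doc, spec: Doc, top_level: bool = True) -> Doc:
    """Toy $project supporting inclusion, pulling values up, and pushing them down."""
    out: Doc = {}
    for name, rule in spec.items():
        if top_level and rule == 1:  # include an existing field unchanged
            if name in doc:
                out[name] = doc[name]
        elif isinstance(rule, str) and rule.startswith("$"):
            out[name] = get_path(doc, rule[1:])  # pull a (possibly nested) value to this level
        elif isinstance(rule, dict):
            out[name] = project(doc, rule, top_level=False)  # build a new virtual subdocument
        else:
            raise ValueError(f"unsupported projection for {name!r} in this sketch")
    return out


person = {"name": "Ann", "email": "ann@example.com", "address": {"city": "Seattle", "zip": "98101"}}
print(project(person, {"name": 1, "city": "$address.city", "contact": {"mail": "$email"}}))
# {'name': 'Ann', 'city': 'Seattle', 'contact': {'mail': 'ann@example.com'}}
```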
  • Unwinding
    • $unwind can “stream” arrays
      • Array values are doled out one at a time in the context of their surrounding documents
      • Makes it possible to filter out elements before returning
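The following toy Python model shows $unwind. Each array element is emitted in a copy of its surrounding document. A later filter can then drop individual elements. The `unwind` helper is hypothetical. It simply skips documents whose field is missing or is not a list, and makes no claim about how the server treats those cases.

```python
from typing import Any, Dict, Iterable, Iterator

Doc = Dict[str, Any]


def unwind(stream: Iterable[Doc], field: str) -> Iterator[Doc]:
    """Emit one copy of each document per element of doc[field], with that element in place of the array."""
    for doc in stream:
        values = doc.get(field)
        if not isinstance(values, list):
            continue  # this sketch skips documents without an array in `field`
        for value in values:
            yield {**doc, field: value}  # shallow copy keeps the surrounding fields


post = {"title": "intro", "tags": ["db", "nosql", "draft"]}
kept = [d for d in unwind([post], "tags") if d["tags"] != "draft"]
print(kept)  # [{'title': 'intro', 'tags': 'db'}, {'title': 'intro', 'tags': 'nosql'}]
```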
  • Grouping
    • $group aggregation expressions
      • Define a grouping key as the _id of the result
      • Total grouped column values: $sum
      • Average grouped column values: $avg
      • Collect grouped column values in an array or set: $push, $addToSet
      • Other functions
        • $min, $max, $first, $last
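The next sketch is a toy Python model of $group. Documents are bucketed by a key. The key becomes the `_id` of each output row, and each accumulator reduces the values gathered for that bucket. The `group` function and its `outputs` format, name to (accumulator, source field), are hypothetical simplifications. The sketch groups by a single field, assumes numeric values for `$sum`/`$avg`, and assumes hashable values for `$addToSet`. `$first` and `$last` depend on input order, so they are most meaningful after a $sort.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]

# Accumulator name -> function over the list of values gathered for one group.
ACCUMULATORS: Dict[str, Callable[[List[Any]], Any]] = {
    "$sum": sum,
    "$avg": lambda vs: sum(vs) / len(vs),
    "$min": min,
    "$max": max,
    "$first": lambda vs: vs[0],
    "$last": lambda vs: vs[-1],
    "$push": list,
    "$addToSet": lambda vs: list(dict.fromkeys(vs)),  # drops duplicates
}


def group(stream: Iterable[Doc], key_field: str,
          outputs: Dict[str, Tuple[str, str]]) -> List[Doc]:
    """Toy $group: bucket by doc[key_field]; outputs maps name -> (accumulator, source field)."""
    buckets: Dict[Any, List[Doc]] = {}
    for doc in stream:
        buckets.setdefault(doc.get(key_field), []).append(doc)  # first-seen key order
    results = []
    for key, docs in buckets.items():
        row: Doc = {"_id": key}  # the grouping key becomes the _id of the result
        for name, (acc, field) in outputs.items():
            row[name] = ACCUMULATORS[acc]([d.get(field) for d in docs])
        results.append(row)
    return results


sales = [
    {"region": "west", "item": "pen", "qty": 5},
    {"region": "east", "item": "ink", "qty": 2},
    {"region": "west", "item": "ink", "qty": 3},
    {"region": "west", "item": "pen", "qty": 1},
]
summary = group(sales, "region", {"total": ("$sum", "qty"), "avg": ("$avg", "qty"),
                                  "items": ("$addToSet", "item")})
print(summary)
```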
  • Sorting
    • $sort can sort documents
      • Sort specifications use the same format as .sort() today, e.g., $sort: {key1: 1, key2: -1, …}
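A multi-key sort specification such as `{key1: 1, key2: -1}` orders by the first key and breaks ties with the next. A simple way to model this in Python is to use the stability of `list.sort`. The sketch sorts by the least significant key first and by the most significant key last. This is a hypothetical in-memory sketch, not how the server sorts. It assumes that every document has every sort key and that the values are mutually comparable.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def sort_stage(docs: List[Doc], spec: Dict[str, int]) -> List[Doc]:
    """Toy $sort: spec maps field -> 1 (ascending) or -1 (descending), most significant first."""
    result = list(docs)
    # list.sort is stable (even with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives a multi-key order.
    for key, direction in reversed(list(spec.items())):
        result.sort(key=lambda d: d[key], reverse=(direction == -1))
    return result


people = [{"name": "cy", "age": 30}, {"name": "ann", "age": 25}, {"name": "bob", "age": 30}]
print(sort_stage(people, {"age": -1, "name": 1}))
# [{'name': 'bob', 'age': 30}, {'name': 'cy', 'age': 30}, {'name': 'ann', 'age': 25}]
```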
  • Demo
    • Demo files are at https://gist.github.com/1401585
  • Usage Tips
    • Use $match in a pipeline as early as possible
      • The query optimizer can then be used to choose an index and avoid scanning the entire collection
    • Use $sort in a pipeline as early as possible
      • The query optimizer can sometimes be used to choose an index to scan instead of sorting the result
  • Driver Support
    • Initial version is a command
      • For any language, build a command document and execute the command
      • { aggregate : <collection>, pipeline : [ … ] }
    • Beware of the command result size limit
      • The result comes back as a single document, and the document size limit is 16MB
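A minimal sketch of building that command document in Python follows. `aggregate_command` is a hypothetical helper, and the collection and field names are made up. Two points matter. The pipeline is a list of stage documents, one per stage. The command name must be the first key, and Python dicts (3.7+) preserve insertion order. With a driver such as PyMongo, a document like this could be passed to `Database.command`. Note that MongoDB 3.6 and later also require a `cursor` option (for example `"cursor": {}`) on the aggregate command, which this 2011-era form predates.

```python
import json
from typing import Any, Dict, List

Doc = Dict[str, Any]


def aggregate_command(collection: str, pipeline: List[Doc]) -> Doc:
    """Build {aggregate: <collection>, pipeline: [...]} with the command name as the first key."""
    if not isinstance(pipeline, list):
        raise TypeError("pipeline must be a list of stage documents")
    return {"aggregate": collection, "pipeline": pipeline}  # dict order is preserved (3.7+)


cmd = aggregate_command("orders", [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
])
print(json.dumps(cmd))
```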
  • When is this being released?
    • In final development now
    • Expect to see this in the near future
  • Sharding support
    • Initial release will support sharding
    • Mongos analyzes the pipeline and forwards operations up to $group or $sort to the shards; it then combines the shard servers’ results and returns them
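To make the split concrete, here is a hypothetical Python sketch that divides a pipeline at the first $group or $sort. In this simplified model, the stages before that point are the ones each shard can run independently. The $group or $sort and everything after it run where the shard results are combined. This is only an illustration of the idea described on the slide. The real router may also push part of a $group or $sort down to the shards and merge the partial results, which this sketch does not model. `split_for_shards` is an invented name.

```python
from typing import Any, Dict, List, Tuple

Stage = Dict[str, Any]


def split_for_shards(pipeline: List[Stage]) -> Tuple[List[Stage], List[Stage]]:
    """Return (shard_part, merge_part), splitting before the first $group or $sort."""
    for i, stage in enumerate(pipeline):
        (name,) = stage.keys()  # each stage document has exactly one operator
        if name in ("$group", "$sort"):
            return pipeline[:i], pipeline[i:]
    return list(pipeline), []  # no $group/$sort: everything can run on the shards


pipeline = [
    {"$match": {"status": "A"}},
    {"$unwind": "$items"},
    {"$group": {"_id": "$cust_id", "n": {"$sum": 1}}},
    {"$sort": {"n": -1}},
]
shard_part, merge_part = split_for_shards(pipeline)
print(shard_part)  # [{'$match': {'status': 'A'}}, {'$unwind': '$items'}]
```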
  • Pipeline Operations – Future Plans
    • $out
      • Saves the document stream to a collection
      • Similar to M/R $out, but with sharded output
      • Functions like a tee, so that intermediate results can be saved