MongoDB Aggregation MongoSF May 2011

Chris Westin's talk from MongoSF (May 2011) on MongoDB's coming aggregation framework.

Transcript

  • 1. MongoDB’s New Aggregation Features
    Chris Westin
    © Copyright 2010 10gen Inc.
  • 2. What problem are we solving?
    Map/Reduce can be used for aggregation…
    Currently being used for totaling, averaging, etc.
    Map/Reduce is a big hammer
    Simpler tasks should be easier
    Shouldn’t need to write JavaScript
    Avoid the overhead of JavaScript engine
    We’re seeing requests for help in handling complex documents
    Select only subdocuments or arrays
  • 3. How will we solve the problem?
    Our new aggregation framework
    Declarative framework
    No JavaScript required
    Describe a chain of operations to apply
    Expression evaluation
    Return computed values
    Framework: we can add new operations easily
    C++ implementation
    Higher performance than JavaScript
  • 4. Aggregation - Pipelines
    Aggregation requests specify a pipeline
    A pipeline is a series of operations
    Conceptually, the members of a collection are passed through a pipeline to produce a result
    Similar to a command-line pipe
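The pipeline idea on this slide can be modeled in a few lines of plain JavaScript. This is only an in-memory sketch, not how the server implements it: `runPipeline`, `onlyPublished`, `titlesOnly`, and the sample documents are all made up for illustration.

```javascript
// Hypothetical in-memory model of a pipeline: each stage is a function
// that takes an array of documents and returns a new array.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next, like a shell pipe.
  return stages.reduce((stream, stage) => stage(stream), docs);
}

// Illustrative stages (names invented for this sketch).
const onlyPublished = (docs) => docs.filter((d) => d.published);
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

const articles = [
  { title: "Intro", published: true, views: 10 },
  { title: "Draft", published: false, views: 0 },
  { title: "Pipelines", published: true, views: 25 },
];

const pipelineResult = runPipeline(articles, [onlyPublished, titlesOnly]);
console.log(pipelineResult);
```

The collection members flow through the stages in order, so the second stage only ever sees what the first one let through.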
  • 5. Pipeline Operations
    $match
    Uses a query predicate (like .find({…})) as a filter
    $project
    Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
    This can include computed values
    $group
    Aggregates items into buckets defined by a key
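As a rough illustration, the three operations might be combined into one pipeline as below. The stage syntax follows what later shipped in MongoDB releases and may differ from the pre-release syntax demonstrated in the talk; the field names (`author`, `pageViews`, `score`, `bonus`) are assumptions.

```javascript
// A pipeline is an ordered list of stage documents.
const pipeline = [
  // $match: same predicate syntax as find().
  { $match: { author: "dave" } },
  // $project: keep two fields and add a computed one.
  {
    $project: {
      author: 1,
      pageViews: 1,
      totalScore: { $add: ["$score", "$bonus"] },
    },
  },
  // $group: one bucket per author, summing page views.
  { $group: { _id: "$author", totalViews: { $sum: "$pageViews" } } },
];

// In a mongo shell connected to a server that supports aggregation,
// this could be run with: db.articles.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```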
  • 6. Computed Expressions
    Available in $project operations
    Prefix expression language
    Add two fields: { $add: ["$field1", "$field2"] }
    Provide a value for a missing field: { $ifnull: ["$field1", "$field2"] }
    Nesting: { $add: ["$field1", { $ifnull: ["$field2", "$field3"] }] }
    Other functions…
    And we can easily add more as required
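To show how a prefix expression language like this can be evaluated, here is a small stand-alone sketch. The `evaluate` function is hypothetical and only handles top-level field references and the two operators from the slide. The operator is spelled `$ifnull` as on the slide; released versions of MongoDB spell it `$ifNull`.

```javascript
// Hypothetical evaluator for the prefix expressions on the slide.
function evaluate(expr, doc) {
  // A string starting with "$" is a reference to a top-level field.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];
  }
  // { $op: [args...] } applies an operator to evaluated sub-expressions.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return args.reduce((sum, v) => sum + v, 0);
      case "$ifnull": {
        const [value, fallback] = args;
        return value === undefined || value === null ? fallback : value;
      }
      default:
        throw new Error("unsupported operator: " + op);
    }
  }
  // Anything else is treated as a literal.
  return expr;
}

const order = { price: 10, tax: 2 }; // no "shipping" field
const orderTotal = evaluate(
  { $add: ["$price", { $ifnull: ["$shipping", "$tax"] }] },
  order
);
console.log(orderTotal);
```

Because operators are just keys in a lookup, supporting a new function in this sketch means adding one more `case`.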
  • 7. Projections
    $project can reshape results
    $unwind expression doles out array values one at a time
    Pull fields from nested documents to the top
    Push fields from the top down into new virtual documents
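The effect of $unwind can be pictured with an in-memory sketch. `unwind` here is a hypothetical helper, and the choice to emit nothing for documents whose field is not an array is an assumption of this sketch, not something stated in the talk.

```javascript
// Emit one output document per array element, copying the other fields.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    // Assumption: documents without an array in `field` produce no output.
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const posts = [
  { title: "A", tags: ["x", "y"] },
  { title: "B", tags: ["z"] },
];

const unwound = unwind(posts, "tags");
console.log(unwound);
```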
  • 8. Grouping
    $group aggregation expressions
    Total of column values: $sum
    Average of column values: $avg
    Collect column values in an array: $push
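A minimal in-memory sketch of what these three aggregation expressions compute per bucket follows. The `group` helper and its `[operator, field]` argument format are invented for this illustration and are not the framework's syntax.

```javascript
// Bucket documents by keyField, then apply each accumulator per bucket.
function group(docs, keyField, accumulators) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc);
  }

  const results = [];
  for (const [key, members] of buckets) {
    const row = { _id: key };
    for (const [name, [op, field]] of Object.entries(accumulators)) {
      const values = members.map((m) => m[field]);
      const sum = values.reduce((a, b) => a + b, 0);
      if (op === "$sum") row[name] = sum;
      else if (op === "$avg") row[name] = sum / values.length;
      else if (op === "$push") row[name] = values;
      else throw new Error("unsupported accumulator: " + op);
    }
    results.push(row);
  }
  return results;
}

const sales = [
  { region: "east", amount: 10 },
  { region: "west", amount: 4 },
  { region: "east", amount: 20 },
];

const grouped = group(sales, "region", {
  total: ["$sum", "amount"],
  average: ["$avg", "amount"],
  amounts: ["$push", "amount"],
});
console.log(grouped);
```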
  • 9. Demo
    (See script at https://gist.github.com/993733)
  • 10. Usage Tips
    Use $match in a pipeline as early as possible
    The query optimizer can then be used to choose an index and avoid scanning the entire collection
  • 11. Driver Support
    Initial version is a command
    For any language, build a JSON database object, and execute the command
    { aggregate : <collection>, pipeline : […] }
    Beware of command result size limit
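For instance, a driver-agnostic command document could be assembled as below. The collection name `articles` and the field names are assumptions, and the pipeline is written as an array of stages, which is the form the released command takes.

```javascript
// Build the command as a plain object; "aggregate" must be the first key
// because the first key names the command.
const command = {
  aggregate: "articles", // hypothetical collection name
  pipeline: [
    { $match: { author: "dave" } },
    { $group: { _id: "$author", totalViews: { $sum: "$pageViews" } } },
  ],
};

// In the mongo shell this would be sent with: db.runCommand(command)
// Other drivers expose an equivalent "run command" call.
console.log(JSON.stringify(command));
```

Since the whole result comes back in a single command reply, a pipeline that ends in a $group over a small number of keys is less likely to hit the result size limit than one that returns every document.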
  • 12. When is this being released?
    In final development now
    Expect to see this in the near future
  • 13. Sharding support
    Initial release will support sharding
    mongos analyzes the pipeline and forwards the operations up to $group to the shards, then combines the shard servers' results and continues with the rest of the pipeline
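The split described here might be sketched as follows. This is a guess at the general shape only: `splitForShards` is a hypothetical function, and keeping the $group stage itself on the merging side is an assumption of this sketch, since the slide does not say exactly where the boundary falls or whether part of the grouping also runs on the shards.

```javascript
// Split a pipeline at the first $group stage.
// Assumption: stages before $group run on each shard; $group and
// everything after it run on the combined shard results.
function splitForShards(pipeline) {
  const groupIndex = pipeline.findIndex((stage) => "$group" in stage);
  if (groupIndex === -1) {
    // No $group: every stage can be forwarded to the shards.
    return { shardPart: pipeline.slice(), mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, groupIndex),
    mergePart: pipeline.slice(groupIndex),
  };
}

const shardedExample = splitForShards([
  { $match: { author: "dave" } },
  { $project: { author: 1, pageViews: 1 } },
  { $group: { _id: "$author", totalViews: { $sum: "$pageViews" } } },
]);
console.log(shardedExample.shardPart.length, shardedExample.mergePart.length);
```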
  • 14. Pipeline Operations – Future Plans
    $sort
    Sorts the document stream according to a key
    $out
    Saves the document stream to a collection
    Similar to Map/Reduce's out option, but with sharded output
  • 15. Expressions – Future Plans
    Date field extraction
    Get year, month, day, hour, etc., from a Date
    Date arithmetic