DeNormalised London: Aggregation Framework Overview

  1. DeNormalised London: Aggregation Framework Overview
     Chris Harris
     Email: charris@10gen.com
     Twitter: cj_harris5
     Wednesday, 21 March 12
  2. Terminology

     RDBMS           MongoDB
     Table           Collection
     Row(s)          JSON Document
     Index           Index
     Join            Embedding & Linking
     Partition       Shard
     Partition Key   Shard Key
  3. Here is a “simple” SQL Model

     mysql> select * from book;
     +----+----------------------------------------------------------+
     | id | title                                                    |
     +----+----------------------------------------------------------+
     |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
     |  2 | Cosmos                                                   |
     |  3 | Programming in Scala                                     |
     +----+----------------------------------------------------------+
     3 rows in set (0.00 sec)

     mysql> select * from bookauthor;
     +---------+-----------+
     | book_id | author_id |
     +---------+-----------+
     |       1 |         1 |
     |       2 |         1 |
     |       3 |         2 |
     |       3 |         3 |
     |       3 |         4 |
     +---------+-----------+
     5 rows in set (0.00 sec)

     mysql> select * from author;
     +----+-----------+------------+-------------+-------------+---------------+
     | id | last_name | first_name | middle_name | nationality | year_of_birth |
     +----+-----------+------------+-------------+-------------+---------------+
     |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
     |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
     |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
     |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
     +----+-----------+------------+-------------+-------------+---------------+
     4 rows in set (0.00 sec)
  4. The Same Data in MongoDB

     {
       "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
       "title" : "Programming in Scala",
       "author" : [
         {
           "first_name" : "Martin",
           "last_name" : "Odersky",
           "nationality" : "DE",
           "year_of_birth" : 1958
         },
         {
           "first_name" : "Lex",
           "last_name" : "Spoon"
         },
         {
           "first_name" : "Bill",
           "last_name" : "Venners"
         }
       ]
     }
  5. What problem are we solving?
     • Map/Reduce can be used for aggregation…
          • Currently being used for totaling, averaging, etc.
     • Map/Reduce is a big hammer
          • Simpler tasks should be easier
          • Shouldn’t need to write JavaScript
          • Avoid the overhead of the JavaScript engine
     • We’re seeing requests for help in handling complex documents
          • Select only matching subdocuments or arrays
  6. How will we solve the problem?
     • New aggregation framework
          • Declarative framework (no JavaScript)
          • Describe a chain of operations to apply
          • Expression evaluation
               • Return computed values
          • Framework: new operations added easily
          • C++ implementation
  7. Aggregation - Pipelines
     • Aggregation requests specify a pipeline
     • A pipeline is a series of operations
     • Members of a collection are passed through a pipeline to produce a result
          • ps -ef | grep -i mongod
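The Unix pipe analogy on this slide can be sketched in plain JavaScript: each stage is a function that takes an array of documents and returns a new array, and the pipeline feeds each stage's output into the next. The names `runPipeline`, `grepMongod` and `pidsOnly`, and the sample process list, are hypothetical; this models the idea only, not how the server runs a pipeline.

```javascript
// A pipeline is an ordered list of stages. Each stage takes documents in
// and hands documents on, much like `ps -ef | grep -i mongod`.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up stand-in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "/usr/bin/mongod --dbpath /data/db" },
  { pid: 202, cmd: "/usr/sbin/sshd" },
  { pid: 303, cmd: "MONGOD --config /etc/mongod.conf" },
];

// Stand-in for `grep -i mongod`: a case-insensitive filter.
const grepMongod = (docs) => docs.filter((d) => /mongod/i.test(d.cmd));
// A second stage that reshapes what the first stage let through.
const pidsOnly = (docs) => docs.map((d) => d.pid);

const pids = runPipeline(processes, [grepMongod, pidsOnly]);
console.log(pids); // [ 101, 303 ]
```

Stage order matters here in the same way it does in a shell pipe: swapping the two stages would leave the filter with nothing but numbers to inspect.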
  8. Example - twitter

     {
       "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
       "user" : {
         "friends_count" : 73,
         "location" : "Brazil",
         "screen_name" : "Bia_cunha1",
         "name" : "Beatriz Helena Cunha",
         "followers_count" : 102
       }
     }

     • Find the number of followers and the number of friends by location
  9. Example - twitter

     db.tweets.aggregate(
       {$match:
         {"user.friends_count": { $gt: 0 },
          "user.followers_count": { $gt: 0 } }
       },
       {$project:
         { location: "$user.location",
           friends: "$user.friends_count",
           followers: "$user.followers_count" }
       },
       {$group:
         {_id: "$location",
          friends: {$sum: "$friends"},
          followers: {$sum: "$followers"} }
       }
     );
  12. Example - twitter

      db.tweets.aggregate(
        // Predicate
        {$match:
          {"user.friends_count": { $gt: 0 },
           "user.followers_count": { $gt: 0 } }
        },
        // Parts of the document you want to project
        {$project:
          { location: "$user.location",
            friends: "$user.friends_count",
            followers: "$user.followers_count" }
        },
        // Function to apply to the result set
        {$group:
          {_id: "$location",
           friends: {$sum: "$friends"},
           followers: {$sum: "$followers"} }
        }
      );
  13. Example - twitter

      {
        "result" : [
          {
            "_id" : "Far Far Away",
            "friends" : 344,
            "followers" : 789
          },
          ...
        ],
        "ok" : 1
      }
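To see how the three stages of the twitter example combine, the following plain-JavaScript sketch runs the same steps over a small in-memory array. The sample tweets are made up, and the code only imitates the semantics of `$match`, `$project` and `$group`; it is not how the server implements them.

```javascript
// Sample documents shaped like the tweet on slide 8 (values are made up).
const tweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "London", friends_count: 0, followers_count: 40 } },
  { user: { location: "London", friends_count: 7, followers_count: 9 } },
];

// $match: keep users with at least one friend and one follower.
const matched = tweets.filter(
  (t) => t.user.friends_count > 0 && t.user.followers_count > 0
);

// $project: pull the nested fields up to the top level under new names.
const projected = matched.map((t) => ({
  location: t.user.location,
  friends: t.user.friends_count,
  followers: t.user.followers_count,
}));

// $group: one output document per location, summing both counters.
const groups = new Map();
for (const doc of projected) {
  const g = groups.get(doc.location) ||
    { _id: doc.location, friends: 0, followers: 0 };
  g.friends += doc.friends;
  g.followers += doc.followers;
  groups.set(doc.location, g);
}
const result = [...groups.values()];
console.log(result);
```

The third sample tweet is dropped by the match step because its `friends_count` is 0, so it never contributes to the London totals. The sketch emits groups in first-seen order; the order of `$group` output from a real server should not be relied on without an explicit `$sort`.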
  14. Demo
      Demo files are at https://gist.github.com/2036709
  15. Projections
      • $project can reshape results
           • Include or exclude fields
           • Computed fields
                • Arithmetic expressions
           • Pull fields from nested documents to the top
           • Push fields from the top down into new virtual documents
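A rough illustration of the reshaping options listed for `$project`. The object `projectStage` shows how such a stage might be written; the function `project` is a hypothetical plain-JavaScript equivalent applied to one made-up tweet. The field names `reach` and `counts` are invented for the example.

```javascript
// How the stage might look when passed to aggregate() (illustrative only).
const projectStage = {
  $project: {
    _id: 0,                                     // exclude a field
    location: "$user.location",                 // pull a nested field up
    reach: {                                    // computed field
      $add: ["$user.friends_count", "$user.followers_count"],
    },
    counts: { friends: "$user.friends_count" }, // push down into a new subdocument
  },
};

// Plain JavaScript doing the same reshaping for a single document.
function project(doc) {
  return {
    location: doc.user.location,
    reach: doc.user.friends_count + doc.user.followers_count,
    counts: { friends: doc.user.friends_count },
  };
}

const tweet = {
  _id: 1,
  user: { location: "Brazil", friends_count: 73, followers_count: 102 },
};
const reshaped = project(tweet);
console.log(reshaped);
```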
  16. Unwinding
      • $unwind can “stream” arrays
           • Array values are doled out one at a time in the context of their surrounding documents
           • Makes it possible to filter out elements before returning
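The effect of unwinding can be sketched in plain JavaScript using the book document from slide 4: each array element is emitted as its own document, with the surrounding fields copied alongside it. The helper `unwind` is a hypothetical name, and in this sketch a document without the array simply produces no output.

```javascript
// Emit one copy of each document per element of doc[field].
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

const books = [
  {
    title: "Programming in Scala",
    author: [
      { first_name: "Martin", last_name: "Odersky" },
      { first_name: "Lex", last_name: "Spoon" },
      { first_name: "Bill", last_name: "Venners" },
    ],
  },
];

// Three documents, each carrying the title and a single author.
const perAuthor = unwind(books, "author");

// Once unwound, individual elements can be filtered like any document.
const spoonOnly = perAuthor.filter((d) => d.author.last_name === "Spoon");
console.log(spoonOnly);
```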
  17. Grouping
      • $group aggregation expressions
           • Define a grouping key as the _id of the result
           • Total grouped column values: $sum
           • Average grouped column values: $avg
           • Collect grouped column values in an array or set: $push, $addToSet
           • Other functions
                • $min, $max, $first, $last
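The grouping functions listed here can be imitated in plain JavaScript to show what each one computes per group. The `sales` data and the `group` helper are hypothetical; the comments name the `$group` function each line stands in for.

```javascript
// Made-up input documents.
const sales = [
  { city: "London", amount: 10, item: "tea" },
  { city: "London", amount: 30, item: "cake" },
  { city: "Paris", amount: 5, item: "tea" },
  { city: "London", amount: 20, item: "tea" },
];

function group(docs, key) {
  // Bucket documents by the grouping key (the _id of each result).
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc[key])) buckets.set(doc[key], []);
    buckets.get(doc[key]).push(doc);
  }
  return [...buckets].map(([id, members]) => {
    const amounts = members.map((m) => m.amount);
    const items = members.map((m) => m.item);
    const total = amounts.reduce((a, b) => a + b, 0);
    return {
      _id: id,
      total,                               // $sum
      average: total / amounts.length,     // $avg
      items,                               // $push
      distinctItems: [...new Set(items)],  // $addToSet
      smallest: Math.min(...amounts),      // $min
      largest: Math.max(...amounts),       // $max
      first: amounts[0],                   // $first
      last: amounts[amounts.length - 1],   // $last
    };
  });
}

const grouped = group(sales, "city");
console.log(grouped);
```

`first` and `last` depend on the order in which documents reach the group step, which is why they are usually paired with a preceding sort.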
  18. Sorting
      • $sort can sort documents
           • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
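A sort specification such as `{ key1: 1, key2: -1 }` means ascending on `key1`, with ties broken by descending `key2`. The plain-JavaScript sketch below imitates that rule; `sortBy` and the sample rows are hypothetical, and the comparison only handles values that JavaScript can compare directly (it does not model ordering across different BSON types).

```javascript
// Sort a copy of docs by each key in spec order; 1 = ascending, -1 = descending.
function sortBy(docs, spec) {
  const keys = Object.entries(spec);
  return [...docs].sort((a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0; // equal on every key
  });
}

const rows = [
  { key1: 2, key2: "a" },
  { key1: 1, key2: "a" },
  { key1: 1, key2: "b" },
];

const sorted = sortBy(rows, { key1: 1, key2: -1 });
console.log(sorted);
```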
  19. Computed Expressions
      • Available in $project operations
      • Prefix expression language
           • $add:["$field1", "$field2"]
           • $ifNull:["$field1", "$field2"]
           • Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
      • Other functions…
           • $divide, $mod, $multiply
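One way to read the prefix expression language is as a small tree that is evaluated recursively against each document. The evaluator below is a minimal plain-JavaScript sketch of that idea covering only the operators named on this slide; `evaluate` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Evaluate a prefix expression against a document.
function evaluate(expr, doc) {
  // "$field" or "$a.b" is a field reference.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return expr
      .slice(1)
      .split(".")
      .reduce((value, key) => (value == null ? undefined : value[key]), doc);
  }
  // Anything that is not an operator object is a literal.
  if (expr === null || typeof expr !== "object") return expr;

  // { $op: [arg, ...] }: evaluate the arguments first, then apply the operator.
  const [op, args] = Object.entries(expr)[0];
  const values = args.map((arg) => evaluate(arg, doc));
  switch (op) {
    case "$add": return values.reduce((a, b) => a + b, 0);
    case "$multiply": return values.reduce((a, b) => a * b, 1);
    case "$divide": return values[0] / values[1];
    case "$mod": return values[0] % values[1];
    case "$ifNull": return values[0] == null ? values[1] : values[0];
    default: throw new Error("unsupported operator: " + op);
  }
}

const doc = { field1: 2, field2: null, field3: 5 };
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
console.log(evaluate(nested, doc)); // 7
```

Because `field2` is null in the sample document, `$ifNull` falls back to `field3`, and the outer `$add` computes 2 + 5.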
  20. Computed Expressions
      • String functions
           • $toUpper, $toLower, $substr
      • Date field extraction
           • $year, $month, $dayOfMonth, $hour...
      • Date arithmetic
      • $ifNull
      • Ternary conditional ($cond)
           • Return one of two values based on a predicate
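For orientation, the string, date and conditional functions map onto familiar plain-JavaScript operations. The sketch below uses a made-up `event` document; the comments name the expression each line approximates, and the date parts are read in UTC, which is an assumption of this example.

```javascript
// Made-up document with a string, a date and a number.
const event = {
  name: "denormalised",
  when: new Date(Date.UTC(2012, 2, 21, 18, 30)), // 21 March 2012, 18:30 UTC
  seats: 0,
};

const computed = {
  upper: event.name.toUpperCase(),           // $toUpper
  lower: "LONDON".toLowerCase(),             // $toLower
  short: event.name.slice(0, 4),             // $substr: start 0, length 4
  year: event.when.getUTCFullYear(),         // $year
  month: event.when.getUTCMonth() + 1,       // $month (1 to 12)
  hour: event.when.getUTCHours(),            // $hour
  status: event.seats > 0 ? "open" : "full", // ternary conditional
};
console.log(computed);
```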
  21. download at mongodb.org
      We’re Hiring!
      Chris Harris
      Email: charris@10gen.com
      Twitter: cj_harris5
      Conferences, appearances and meetups:
      http://www.10gen.com/events
      http://www.meetup.com/London-MongoDB-User-Group