• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
De normalised london  aggregation framework overview
 

De normalised london aggregation framework overview

on

  • 790 views

 

Statistics

Views

Total Views
790
Views on SlideShare
790
Embed Views
0

Actions

Likes
0
Downloads
7
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    De normalised london  aggregation framework overview De normalised london aggregation framework overview Presentation Transcript

    • DeNormalised London: Aggregation Framework Overview Chris Harris Email : charris@10gen.com Twitter : cj_harris5Wednesday, 21 March 12
    • Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index Join Embedding & Linking Partition Shard Partition Key Shard KeyWednesday, 21 March 12
    • Here is a “simple” SQL Model mysql> select * from book; +----+----------------------------------------------------------+ | id | title | +----+----------------------------------------------------------+ | 1 | The Demon-Haunted World: Science as a Candle in the Dark | | 2 | Cosmos | | 3 | Programming in Scala | +----+----------------------------------------------------------+ 3 rows in set (0.00 sec) mysql> select * from bookauthor; +---------+-----------+ | book_id | author_id | +---------+-----------+ | 1| 1| | 2| 1| | 3| 2| | 3| 3| | 3| 4| +---------+-----------+ 5 rows in set (0.00 sec) mysql> select * from author; +----+-----------+------------+-------------+-------------+---------------+ | id | last_name | first_name | middle_name | nationality | year_of_birth | +----+-----------+------------+-------------+-------------+---------------+ | 1 | Sagan | Carl | Edward | NULL | 1934 | | 2 | Odersky | Martin | NULL | DE | 1958 | | 3 | Spoon | Lex | NULL | NULL | NULL | | 4 | Venners | Bill | NULL | NULL | NULL | +----+-----------+------------+-------------+-------------+---------------+ 4 rows in set (0.00 sec)Wednesday, 21 March 12
    • The Same Data in MongoDB { "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"), "title" : "Programming in Scala", "author" : [ { "first_name" : "Martin", "last_name" : "Odersky", "nationality" : "DE", "year_of_birth" : 1958 }, { "first_name" : "Lex", "last_name" : "Spoon" }, { "first_name" : "Bill", "last_name" : "Venners" } ] }Wednesday, 21 March 12
    • What problem are we solving? • Map/Reduce can be used for aggregation… • Currently being used for totaling, averaging, etc • Map/Reduce is a big hammer • Simpler tasks should be easier • Shouldn’t need to write JavaScript • Avoid the overhead of JavaScript engine • We’re seeing requests for help in handling complex documents • Select only matching subdocuments or arraysWednesday, 21 March 12
    • How will we solve the problem? • New aggregation framework • Declarative framework (no JavaScript) • Describe a chain of operations to apply • Expression evaluation • Return computed values • Framework: new operations added easily • C++ implementationWednesday, 21 March 12
    • Aggregation - Pipelines • Aggregation requests specify a pipeline • A pipeline is a series of operations • Members of a collection are passed through a pipeline to produce a result • ps -ef | grep -i mongodWednesday, 21 March 12
    • Example - twitter { "_id" : ObjectId("4f47b268fb1c80e141e9888c"), "user" : { "friends_count" : 73, "location" : "Brazil", "screen_name" : "Bia_cunha1", "name" : "Beatriz Helena Cunha", "followers_count" : 102, } } • Find the # of followers and # friends by locationWednesday, 21 March 12
    • Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } );Wednesday, 21 March 12
    • Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, Predicate "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } );Wednesday, 21 March 12
    • Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, Predicate "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", Parts of the friends: "$user.friends_count", document you followers: "$user.followers_count" want to project } }, {$group: {_id: "$location", friends: {$sum: "$friends"}, followers: {$sum: "$followers"} } } );Wednesday, 21 March 12
    • Example - twitter db.tweets.aggregate( {$match: {"user.friends_count": { $gt: 0 }, Predicate "user.followers_count": { $gt: 0 } } }, {$project: { location: "$user.location", Parts of the friends: "$user.friends_count", document you followers: "$user.followers_count" want to project } }, {$group: {_id: "$location", Function to friends: {$sum: "$friends"}, apply to the followers: {$sum: "$followers"} } result set } );Wednesday, 21 March 12
    • Example - twitter { "result" : [ { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 }, ... ], "ok" : 1 }Wednesday, 21 March 12
    • Demo Demo files are at https://gist.github.com/ 2036709Wednesday, 21 March 12
    • Projections • $project can reshape results • Include or exclude fields • Computed fields • Arithmetic expressions • Pull fields from nested documents to the top • Push fields from the top down into new virtual documentsWednesday, 21 March 12
    • Unwinding • $unwind can “stream” arrays • Array values are doled out one at time in the context of their surrounding documents • Makes it possible to filter out elements before returningWednesday, 21 March 12
    • Grouping • $group aggregation expressions • Define a grouping key as the _id of the result • Total grouped column values: $sum • Average grouped column values: $avg • Collect grouped column values in an array or set: $push, $addToSet • Other functions • $min, $max, $first, $lastWednesday, 21 March 12
    • Sorting • $sort can sort documents • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}Wednesday, 21 March 12
    • Computed Expressions • Available in $project operations • Prefix expression language • $add:[“$field1”, “$field2”] • $ifNull:[“$field1”, “$field2”] • Nesting: $add:[“$field1”, $ifNull:[“$field2”, “$field3”]] • Other functions…. • $divide, $mod, $multiplyWednesday, 21 March 12
    • Computed Expressions • String functions • $toUpper, $toLower, $substr • Date field extraction • $year, $month, $day, $hour... • Date arithmetic • $ifNull • Ternary conditional • Return one of two values based on a predicateWednesday, 21 March 12
    • download at mongodb.org We’re Hiring ! Chris Harris Email : charris@10gen.com Twitter : cj_harris5 conferences, appearances http://www.10gen.com/events and meetups http://www.meetup.com/London-MongoDB-User-GroupWednesday, 21 March 12