MongoDB MUG 2012-02-07



Data extraction for reporting on Mongo using

Notes for slides:
  • BREAK: Break-out and run the code for demo purposes.
  • Point out why you would want to do this: quick and easy for exploratory purposes. Also a dirty way to validate your finished code for accurate numbers. Some customers prefer this format.
  • Think of $match as being similar to the query operators of a find query. The purpose of $match is to identify the documents that are relevant to your aggregation and shed the rest. In our example, we’re using two operators to match documents where kill._id is one of three specified values AND the fragdate is within the specified time window. Any document not matching both criteria is discarded and not passed to the next stage of the pipeline.
  • The $project operator gives you the opportunity to reformat and refine your data before passing it on. We’re doing a couple of things: first, we’re removing all data from the document except the _id and displayname from the “kill” subdocument. We’re also bringing along the fragdate, but only the hour the event occurred, since that is what we’ll be aggregating on. All other data from the document is not passed on.
  • $group can be thought of as the main workhorse of the aggregation framework. This is where you’ll calculate your aggregate values based on the documents in the pipeline. $group must specify an _id field, and it has to be called _id. This can be a dotted field path reference, a subdocument with multiple fields, or a constant value. If need be, a $project operator can rename the _id further down the pipeline. In our example, we’re using a subdocument containing the displayname of the player who was killed and the $eventhour for the hour when the event took place. Our aggregated value is numKills, meaning we want to track the number of times this player was killed in the documents being considered. To achieve that, we’re using the $sum operator and specifying a value of 1 for a new field we’re creating called “numKills”. This has the effect of incrementing “numKills” by 1 for each matching document found in the collection.
  • Had numKills been an existing numeric value, we could have summed those values by specifying the $sum operator with a value of “$numKills”, which causes the aggregation framework to read the number found inside the numKills field of each document. Similarly, we could have used $min, $max, $first, $last, or $avg in place of $sum to achieve different aggregation results. One thing to note: $group currently performs its work in memory, so pipelines that produce a very large number of groups may run into memory limits.
  • Last, but certainly not least: the $sort operator sorts our data in the order we specify. As parameters, it accepts an object specifying fields with 1 or -1 as the sort order (ascending or descending, respectively). This works just like the sort() modifier on a standard MongoDB query.
  • The output of the aggregation query is a document with two fields: result and ok. The ok field is 1 if the query completed successfully, with error information returned if it did not. The result field contains an array of documents returned by the pipeline.
  • If we take a closer look at the result array, we see documents with an _id field showing the displayname of the player and the hour being represented. A second field, numKills, shows the aggregate value indicating the number of times this player was killed during the match.
  • To recap, the aggregation query accepts a series of pipeline operators to modify and aggregate a collection. On the surface, it is that simple. In practice, its pipeline approach can produce a wide array of results.
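The notes above walk through $match, $project, $group, and $sort one stage at a time, but the query screenshots themselves did not survive extraction. Below is a sketch of what the full pipeline likely looked like, reconstructed from the notes; the three kill IDs and the exact date window are hypothetical placeholders, not values captured from the deck.

```javascript
// Sketch of the pipeline described in the notes above. The kill IDs and
// the date window are placeholders (assumptions), not values from the deck.
const killIds = ["id-1", "id-2", "id-3"]; // stand-ins for three ObjectIds

const pipeline = [
  // $match: keep only kills of the three tracked players, inside the window
  { $match: {
      "kill._id": { $in: killIds },
      fragdate: { $gte: new Date("2012-12-23T14:00:00Z"),
                  $lt:  new Date("2012-12-23T22:00:00Z") } } },
  // $project: keep the kill subdocument's displayname, compute the hour
  { $project: {
      "kill.displayname": 1,
      eventhour: { $hour: "$fragdate" } } },
  // $group: count kills per (displayname, eventhour); $sum: 1 increments
  { $group: {
      _id: { displayname: "$kill.displayname", eventhour: "$eventhour" },
      numKills: { $sum: 1 } } },
  // $sort: most-killed first
  { $sort: { numKills: -1 } }
];
// In the mongo shell: db.kills.aggregate(pipeline)
```

Each object in the array is one pipeline operator; documents flow through them in order, exactly as the notes describe.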

    1. O Where Clause, Where Clause! Wherefore art thou, Where Clause? (a.k.a. Aggregation for Reporting)
    2. Overview
       • Discuss and demonstrate aggregating data
       • Specifically addresses reporting needs
       • Example study: aggregating video game stats
    3. Disclaimer!
    4. Kills
    5. Sales
    6. Player
    7. Product
    8. Dataset (aggregate the number of times each player was killed):
       {
         "_id" : ObjectId("50fc77ee364c74eba1afe1e3"),
         "fragdate" : ISODate("2012-12-24T00:00:19.901Z"),
         "gameId" : 1221,
         "gameName" : "Christmas Blitz",
         "kill" : {
           "_id" : ObjectId("50acfd45712e8bc7832ea7cb"),
           "username" : "player002",
           "avatar" : "",
           "displayname" : "Sniper the Clown",
           "rank" : "Sniper",
           "motto" : "If you run, youll just die tired."
         },
         "player" : {
           "userid" : 1,
           "username" : "ArmyD00d1221",
           "avatar" : "",
           "displayname" : "Army Grunt"
         },
         "server" : ""
       }
    9. Report Details (same sample document as above). Only aggregate kills on these three players:
       • Sniper the Clown
       • Kurious Killer
       • My L1ttl3 P0wn13
       Only on Dec 23, 2012, between 2pm and 10pm.
    10. Relational DB. Tables:
        • Kills: id, fragDate, gameID, gameName, server, fkKilled, fkPlayer
        • Killed: id, username, avatar, displayName, rank, motto
        • Player: id, username, avatar, displayName, rank, motto (could be the same table as Killed)
    11. Relational DB: the equivalent SQL:
        SELECT tk.fragDate,, count(
        FROM test.kills tk
        JOIN players p ON tk.fkPlayer =
        JOIN killed k ON tk.fkKilled =
        WHERE IN (1, 2, 3)
        GROUP BY fragDate,;
    12. Sidenote: Exploration
        • Software engineering tends to have more clearly defined goals
        • Report engineering tends to have more clearly defined questions
    13. Query From The Shell
    14. Output
    15. Next Step: Delimited Output
    16. Display in Excel, R, Processing, etc.
    17. Aggregation: Big Picture. A complexity spectrum: Mongo queries, then the Aggregation Framework, then Map/Reduce implementations.
        • Somewhere between Mongo queries and Map/Reduce implementations
        • Best suited for totaling and averaging functions
        • Similar functionality to the SQL GROUP BY clause
    18. Anatomy of the Aggregation Framework:
        db.collection.aggregate(      // aggregate command
          [ { do something },
            { do something else },    // pipeline operators
            { do even more stuff } ] )
    19. Pipeline Operators
        • Pipelines: transform documents from the collection as they pass through (think: grep e server.log | less)
        • Expressions: produce output documents based on calculations performed on input documents
    20. Pipelines: $project, $match, $limit, $skip, $unwind, $group, $sort
    21. Expressions
        • $group operators: $addToSet, $first, $last, $max, $min, $avg, $sum
        • Boolean operators: $and, $or, $not
        • Comparison operators: $cmp, $eq, $gt, $lt, $ne
        • Arithmetic operators: $add, $subtract, $multiply, $divide
        • String operators: $strcasecmp, $substr, $toLower, $toUpper
        • Date operators: $year, $month, $hour
        See for an exhaustive list
    22. Our Aggregation Query
    23. Our Aggregation Query: all the magic goes between the [ ]
    24. Our Aggregation Query: $match provides a query-like interface to filter documents out of the aggregation pipeline. $match drops documents that do not match the condition from the aggregation pipeline, and passes documents that match along the pipeline unaltered.
    25. Our Aggregation Query: $project reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-objects.
    26. Our Aggregation Query: $group groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, $group often supports tasks such as average page views for each page in a website on a daily basis.
    27. Our Aggregation Query: "numKills": { $sum: "$numKills" }
    28. Our Aggregation Query: the $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. { $sort : { <sort-key> } }
    29. Aggregation Output (produces a document with two fields, result and ok):
        {
          "result" : [
            { "_id" : { "displayname" : "My L1ttl3 P0wn13", "eventhour" : 21 }, "numKills" : 133 },
            { "_id" : { "displayname" : "Kurious Killer", "eventhour" : 21 }, "numKills" : 130 },
            // ******* Omitted for brevity *******
            { "_id" : { "displayname" : "Sniper the Clown", "eventhour" : 2 }, "numKills" : 6 }
          ],
          "ok" : 1
        }
    30. Aggregation Output (the same result document as the previous slide)
    31. Recap: Aggregation Framework:
        db.collection.aggregate(      // aggregate command
          [ { do something },
            { do something else },    // pipeline operators
            { do even more stuff } ] )
    32. 32. We’re not quite done…We can’t really give something like this to ourcustomers:
    33. 33. But if we had… A database config.
    34. 34. But if we had… To run our aggregation.
    35. 35. But if we had… Inside a node server.
    36. 36. Q/A/Comments Will Button @wfbutton