Developers love MongoDB because its flexible document model enhances their productivity. But did you know that MongoDB supports rich queries and lets you accomplish some of the same things you currently do with SQL statements? And that MongoDB's powerful aggregation framework makes it possible to perform real-time analytics for dashboards and reports?
Watch this webinar for an introduction to the MongoDB aggregation framework and a walkthrough of what you can do with it. We'll also demo an analysis of U.S. census data.
Webinar: Exploring the Aggregation Framework
1. Exploring the
Aggregation Framework
Jason Mimick - Senior Consulting Engineer
jason.mimick@mongodb.com @jmimick
Original Slide Credits:
Jay Runkel jay.runkel@mongodb.com
et al
2. 2
Warning or Whew
This is a “101” beginner talk!
Assuming you know some basics about
MongoDB
But basically nothing about the Aggregation
Framework
3. 3
Agenda
1. Analytics in MongoDB?
2. Aggregation Framework
3. Aggregation Framework in Action
– US Census Data
– Aggregation Framework Options
4. New 3.2 stuff
– Friends of friends $lookup for self-joins
5. 5
For Example: US Census Data
• Census data from 1990, 2000, 2010
• Question:
Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
Division = a group of US States
Population density = # of people/Area of division
Data is provided at the state level
10. 10
What is the Aggregation Pipeline?
A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection
Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
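As a rough sketch of how stages feed one another, here is a two-stage pipeline for the mongo shell. It uses the regions collection from the census data set; the particular stages are just an illustration, and the `typeof db` guard is only there so the snippet loads harmlessly outside the shell.

```javascript
// A pipeline is an ordered array of stage documents.
var examplePipeline = [
  // Stage 1: keep only documents for the Northeast region.
  { $match: { region: "Northeast" } },
  // Stage 2: receives stage 1's output and counts states per division.
  { $group: { _id: "$division", states: { $sum: 1 } } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.regions.aggregate(examplePipeline).forEach(printjson);
}
```

Reordering the array changes the result, since each stage only sees what the previous one emitted.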
17. 17
cData Collection
• Document For Each State
– Name
– Region
– Division
• Census Data For 1990, 2000, 2010
– Population
– Housing Units
– Occupied Housing Units
• Census Data is an array with three subdocuments
18. 18
Count, Distinct
• Check out cData docs
• count()
• distinct()
When you start building your
aggregations you need to ‘get to know’ your
data!
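A minimal sketch of that first look at the data, assuming the cData documents store their region and division under the field names `region` and `division` (the slides list the fields but not their exact spelling):

```javascript
// Fields to inspect; the names are assumptions about the cData schema.
var fieldsToInspect = ["region", "division"];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // How many documents are there? (One per state is expected.)
  print("documents: " + db.cData.count());
  // What distinct values does each field take?
  fieldsToInspect.forEach(function (field) {
    printjson(db.cData.distinct(field));
  });
  // Look at one full document to see the overall shape.
  printjson(db.cData.findOne());
}
```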
19. 19
Simple $group
Census data has a collection called regions
> db.regions.findOne()
{
"_id" : ObjectId("54d0e1ac28099359f5660f9f"),
"state" : "Connecticut",
"region" : "Northeast",
"regNum" : 1,
"division" : "New England",
"divNum" : 1
}
How can we find out how many states are in each
region?
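One way to answer this, sketched against the regions document shown above: group on the `region` field and add 1 for every document in the group. The output field name `numStates` is just a label chosen for this example.

```javascript
// Group by region and count one per state document.
var statesPerRegion = [
  { $group: { _id: "$region", numStates: { $sum: 1 } } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.regions.aggregate(statesPerRegion).forEach(printjson);
}
```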
21. 21
$group
• Group documents by value
– _id - field reference, object,
constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by
default
22. 22
Total US Area
Back to cData…
Can we use $group to find the total area of the
US (according to these data)?
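A possible answer, assuming each cData document stores the state's area in a numeric field called `areaM` (a hypothetical name; substitute whatever your documents use). Grouping on a constant `_id` of `null` puts every document into a single group.

```javascript
// _id: null collapses the whole collection into one group.
// "areaM" is an assumed field name for the state's area.
var totalAreaPipeline = [
  { $group: { _id: null, totalArea: { $sum: "$areaM" } } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(totalAreaPipeline).forEach(printjson);
}
```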
35. 35
$sort, $limit, $skip
• Sort documents by one or more
fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to
return
– In-memory unless early and indexed
• Limit and skip follow cursor
behavior
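A small sketch of the three stages together, paging through the regions collection alphabetically by state. The page size and offset are arbitrary values picked for illustration.

```javascript
// Sort first, then skip and limit, just as with a cursor.
var pagePipeline = [
  { $sort: { state: 1 } }, // ascending by state name
  { $skip: 10 },           // drop the first 10 documents
  { $limit: 5 }            // keep the next 5
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.regions.aggregate(pagePipeline).forEach(printjson);
}
```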
36. 36
$first, $last
• Accumulator operators, like $push and
$addToSet
• Must be used in $group
• $first and $last determined by document
order
• Typically used with $sort to ensure ordering is
known
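For instance, sorting the regions collection by state name before grouping makes $first and $last pick the alphabetically first and last state in each region. The output field names are made up for this sketch.

```javascript
// $sort fixes the document order that $first/$last depend on.
var firstLastPipeline = [
  { $sort: { state: 1 } },
  { $group: {
      _id: "$region",
      firstState: { $first: "$state" }, // first state seen in sorted order
      lastState: { $last: "$state" }    // last state seen in sorted order
  } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.regions.aggregate(firstLastPipeline).forEach(printjson);
}
```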
43. 43
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
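A hedged sketch of a $geoNear stage. It assumes the cData documents carry a GeoJSON point in a field named `center` with a 2dsphere index on it; both the field name and the query coordinates are hypothetical.

```javascript
// Assumed one-time setup in the shell (hypothetical field name):
//   db.cData.createIndex({ center: "2dsphere" })
var nearPipeline = [
  // $geoNear has to be the first stage in the pipeline.
  { $geoNear: {
      near: { type: "Point", coordinates: [-73.99, 40.73] }, // [longitude, latitude]
      distanceField: "distMeters", // computed distance is written to this field
      spherical: true
  } },
  // Keep the three closest documents.
  { $limit: 3 }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(nearPipeline).forEach(printjson);
}
```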
47. 47
Back To The Original Question
• Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
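One possible pipeline for this question, offered as a sketch rather than the answer shown in the webinar. It assumes each cData document looks roughly like `{ division, areaM, data: [ { year, totalPop }, ... ] }`; the names `areaM`, `data`, `year` and `totalPop` are assumptions, and density is taken as people per square mile.

```javascript
var densityGrowthPipeline = [
  // One document per state per census year.
  { $unwind: "$data" },
  // Keep only state/year entries with more than 1M people.
  { $match: { "data.totalPop": { $gt: 1000000 } } },
  // Total population and area per division and year.
  { $group: {
      _id: { division: "$division", year: "$data.year" },
      totalPop: { $sum: "$data.totalPop" },
      totalArea: { $sum: "$areaM" }
  } },
  // Keep only divisions larger than 100K square miles.
  { $match: { totalArea: { $gt: 100000 } } },
  // Density for each division and year.
  { $project: { density: { $divide: ["$totalPop", "$totalArea"] } } },
  // Order by year so $first/$last mean earliest/latest census.
  { $sort: { "_id.year": 1 } },
  { $group: {
      _id: "$_id.division",
      firstDensity: { $first: "$density" },
      lastDensity: { $last: "$density" }
  } },
  // Relative growth between the earliest and latest census.
  { $project: { growth: { $divide: [
      { $subtract: ["$lastDensity", "$firstDensity"] },
      "$firstDensity"
  ] } } },
  // Fastest-growing division first.
  { $sort: { growth: -1 } },
  { $limit: 1 }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(densityGrowthPipeline).forEach(printjson);
}
```

Note that the 1M filter is applied per census year here, so a state that crosses the threshold between censuses contributes area in some years and not others; whether that is what you want depends on how you read the question.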
53. 53
$sample
{ $sample: { size: <positive integer> } }
● If WiredTiger (WT) - uses a pseudo-random cursor to return
docs
● If MMAPv1 - uses _id index to randomly
select docs
Used by Compass; useful for unit tests, etc.
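A minimal sketch, pulling a handful of random state documents from cData; the sample size of 5 is arbitrary.

```javascript
// Ask for 5 randomly selected documents.
var samplePipeline = [
  { $sample: { size: 5 } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(samplePipeline).forEach(printjson);
}
```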
54. 54
$lookup
• Performs a left outer join to another collection in the same database to filter in
documents from the “joined” collection for processing.
• To each input document, the $lookup stage adds a new array field whose
elements are the matching documents from the “joined” collection.
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
The “from” collection CANNOT BE SHARDED
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
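To make the syntax concrete, here is a sketch that attaches each state's census document to its regions document. It assumes cData stores the state name in a field called `name` (an assumption about the schema); `census` is just the label chosen for the output array.

```javascript
// For every regions document, find cData documents whose name matches the state.
var lookupPipeline = [
  { $lookup: {
      from: "cData",        // collection to join (same database)
      localField: "state",  // field in the regions documents
      foreignField: "name", // assumed field in the cData documents
      as: "census"          // matches are stored in this array
  } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.regions.aggregate(lookupPipeline).forEach(printjson);
}
```

Because it is a left outer join, a regions document with no match still comes through, with an empty `census` array.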
60. 60
lots of new mathematical operators
$stdDevSamp Calculates sample standard deviation. { $stdDevSamp: <array> }
$stdDevPop Calculates population standard deviation. { $stdDevPop: <array> }
$sqrt Calculates the square root. { $sqrt: <number> }
$abs Returns the absolute value of a number. { $abs: <number> }
$log Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
$log10 Calculates the log base 10 of a number. { $log10: <number> }
$ln Calculates the natural log of a number. { $ln: <number> }
$pow Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
$exp Raises e to the specified exponent. { $exp: <number> }
$trunc Truncates a number to its integer. { $trunc: <number> }
$ceil Returns the smallest integer greater than or equal to the specified number. { $ceil: <number> }
$floor Returns the largest integer less than or equal to the specified number. { $floor: <number> }
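A short sketch of a few of these operators in use, assuming a numeric `areaM` field on the cData documents (a hypothetical field name). The first pipeline uses expression operators in $project; the second uses $stdDevPop as an accumulator in $group.

```javascript
// Per-document math on the assumed "areaM" field.
var mathPipeline = [
  { $project: {
      sideLength: { $sqrt: "$areaM" },   // side of a square with the same area
      areaWhole: { $trunc: "$areaM" },   // drop the fractional part
      areaLog10: { $log10: "$areaM" },   // order of magnitude
      areaSquared: { $pow: ["$areaM", 2] }
  } }
];

// Population standard deviation of state areas within each region.
var stdDevPipeline = [
  { $group: { _id: "$region", areaStdDev: { $stdDevPop: "$areaM" } } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(mathPipeline).forEach(printjson);
  db.cData.aggregate(stdDevPipeline).forEach(printjson);
}
```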
61. 61
new array operators
$slice Returns a subset of an array.
{ $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
$arrayElemAt Returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
$concatArrays Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ]}
$isArray Determines if the operand is an array. { $isArray: [ <expression> ] }
$filter Selects a subset of the array based on the condition.
{
$filter:
{
input: <array>,
as: <string>,
cond: <expression>
}
}
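A sketch of these array operators applied to the census array in cData, assuming the array is stored in a field called `data` and each subdocument has a `year` field (both names are assumptions about the schema).

```javascript
var arrayOpsPipeline = [
  { $project: {
      // First census entry in the array.
      firstCensus: { $arrayElemAt: ["$data", 0] },
      // First two entries.
      firstTwo: { $slice: ["$data", 2] },
      // Only entries from 2000 onward; "$$census" refers to each element.
      since2000: { $filter: {
          input: "$data",
          as: "census",
          cond: { $gte: ["$$census.year", 2000] }
      } },
      // true when the field holds an array.
      hasArray: { $isArray: ["$data"] }
  } }
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.cData.aggregate(arrayOpsPipeline).forEach(printjson);
}
```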