Exploring the
Aggregation Framework
Jason Mimick - Senior Consulting Engineer
jason.mimick@mongodb.com @jmimick
Original Slide Credits:
Jay Runkel jay.runkel@mongodb.com
et al
Warning or Whew
This is a “101” beginner talk!
Assuming you know some basics about MongoDB
But basically nothing about the Aggregation Framework
Agenda
1. Analytics in MongoDB?
2. Aggregation Framework
3. Aggregation Framework in Action
– US Census Data
– Aggregation Framework Options
4. New 3.2 stuff
– Friends of friends $lookup for self-joins
Analytics in MongoDB?
Create
Read
Update
Delete
Analytics
?
Group
Count
Derive Values
Filter
Average
Sort
For Example: US Census Data
• Census data from 1990, 2000, 2010
• Question:
Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
Division = a group of US States
Population density = # of people / Area of division
Data is provided at the state level
US Regions and Divisions
How would we solve this in SQL?
SELECT GROUP BY HAVING
Of course, we don’t have SQL; we’re a NoSQL database
The Aggregation Framework
Core Concept: Pipeline
ps -ef | grep mongod
What is the Aggregation Pipeline?
A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection
Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
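A minimal sketch of the pipeline idea in plain JavaScript (not MongoDB's engine): each stage is a function from an array of documents to a new array, and the stages run in order with each output feeding the next. The helper names (matchStage, projectStage, sortStage, runPipeline) are hypothetical, and the sample documents reuse the illustrative state areas from later slides.

```javascript
// Hypothetical helpers: each stage maps an array of documents to a new array.
function matchStage(predicate) {
  return (docs) => docs.filter(predicate);
}

function projectStage(fn) {
  return (docs) => docs.map(fn);
}

function sortStage(compare) {
  // Copy before sorting so the previous stage's array is not mutated.
  return (docs) => [...docs].sort(compare);
}

// Run the stages in sequence: the output of one stage is the input of the next.
function runPipeline(collection, stages) {
  return stages.reduce((docs, stage) => stage(docs), collection);
}

const states = [
  { name: "Oregon", region: "West", areaM: 245 },
  { name: "New York", region: "North East", areaM: 218 },
  { name: "California", region: "West", areaM: 300 },
];

const result = runPipeline(states, [
  matchStage((d) => d.region === "West"),
  projectStage((d) => ({ name: d.name, areaM: d.areaM })),
  sortStage((a, b) => b.areaM - a.areaM),
]);

console.log(result); // California (300), then Oregon (245); New York was filtered out
```

Like `ps -ef | grep mongod`, swapping or reordering stages changes the result, which is why stage order matters in real pipelines too.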
An Example Aggregation Pipeline
Syntax
> db.foo.aggregate( [ { stage1 }, { stage2 }, { stage3 }, … ] )
mongo shell
1 db - variable pointing to current database
2 foo - collection name
3 aggregate - method on collection
4 array of objects, each a pipeline operator
5 pipeline operators
Syntax - Driver - Java
db.hospital.aggregate( [
{ "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } },
{ "$match" : { "count" : { "$gte" : 5 } } },
{ "$sort" : { "count" : -1 } } ] )
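To make the three stages concrete, here is a rough plain-JavaScript equivalent of what that pipeline computes: count documents per PatientID, keep patients with five or more, and sort by count descending. The visit records are made-up sample data; only the PatientID field name comes from the slide.

```javascript
// Made-up sample visits; only the PatientID field name comes from the slide.
const visits = [];
for (let i = 0; i < 7; i++) visits.push({ PatientID: "A" });
for (let i = 0; i < 2; i++) visits.push({ PatientID: "B" });
for (let i = 0; i < 5; i++) visits.push({ PatientID: "C" });

// $group: { _id: "$PatientID", count: { $sum: 1 } }
const counts = new Map();
for (const doc of visits) {
  counts.set(doc.PatientID, (counts.get(doc.PatientID) || 0) + 1);
}
let grouped = [...counts].map(([id, count]) => ({ _id: id, count }));

// $match: { count: { $gte: 5 } }
grouped = grouped.filter((d) => d.count >= 5);

// $sort: { count: -1 }
grouped.sort((a, b) => b.count - a.count);

console.log(grouped); // A (7), then C (5); B (2) is filtered out
```

Note that the $match runs after the $group, so it filters on the computed count, much like SQL's HAVING.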
Some Popular Pipeline Operators
$match Filter documents
$project Reshape documents
$group Summarize documents
$unwind Expand arrays in documents
$sort Order documents
$limit/$skip Paginate documents
$redact Restrict documents
$geoNear Proximity sort documents
$let, $map Define variables, apply an expression to each array element
80+ operators available as of MongoDB 3.2
Aggregation Framework in Action
(let’s play with the census data)
cData Collection
• Document For Each State
– Name
– Region
– Division
• Census Data For 1990, 2000, 2010
– Population
– Housing Units
– Occupied Housing Units
• Census data is stored in the data array, with three subdocuments (one per census year)
Count, Distinct
• Check out cData docs
• count()
• distinct()
When you start building your aggregations, you need to ‘get to know’ your data!
Simple $group
Census data has a collection called regions
> db.regions.findOne()
{
"_id" : ObjectId("54d0e1ac28099359f5660f9f"),
"state" : "Connecticut",
"region" : "Northeast",
"regNum" : 1,
"division" : "New England",
"divNum" : 1
}
How can we find out how many states are in each region?
> db.regions.aggregate( [
{ "$group" : { "_id" : "$region",
"count" : { "$sum" : 1 }
}
} ] )
{ "_id" : "West", "count" : 13 }
{ "_id" : "South", "count" : 17 }
{ "_id" : "Midwest", "count" : 12 }
{ "_id" : "Northeast", "count" : 9 }
// make more readable - store your pipeline ops in variables
> var group = { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } };
> db.regions.aggregate( [ group ] )
$group
• Group documents by value
– _id - field reference, object, constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by default
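A minimal sketch of how these accumulators behave, written as plain JavaScript over an in-memory array. groupBy is a hypothetical helper, not a MongoDB API, and it assumes purely numeric values (MongoDB's $sum and $avg also skip non-numeric values, which this sketch does not handle). Note that $first and $last simply take the value from whichever document arrives first or last in each group, so they depend on input order. The sample documents reuse the illustrative state areas from the next slides.

```javascript
// Hypothetical helper: groups docs by keyField and applies several accumulators to valueField.
// Documents whose key is missing fall into the null group, as with "_id": "$field".
function groupBy(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField] ?? null;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups].map(([key, members]) => {
    const values = members.map((d) => d[valueField]);
    const sum = values.reduce((a, b) => a + b, 0);
    return {
      _id: key,
      count: members.length,           // { $sum: 1 }
      sum,                             // $sum of valueField
      avg: sum / values.length,        // $avg
      min: Math.min(...values),        // $min
      max: Math.max(...values),        // $max
      push: values,                    // $push keeps every value, duplicates included
      addToSet: [...new Set(values)],  // $addToSet keeps distinct values only
      first: values[0],                // $first: value from the first doc in the group
      last: values[values.length - 1], // $last: value from the last doc in the group
    };
  });
}

const docs = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "California", areaM: 300, region: "West" },
];
const out = groupBy(docs, "region", "areaM");
// North East: count 2, sum 308, avg 154; West: count 1, sum 300, avg 300
```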
Total US Area
Back to cData…
Can we use $group to find the total area of the US (according to these data)?
db.cData.aggregate([
{"$group" : {"_id" : null,
"totalArea" : {$sum : "$areaM"},
"avgArea" : {$avg : "$areaM"}
}
}])
{ "_id" : null,
"totalArea" : 3802067.0700000003,
"avgArea" : 73116.67442307693 }
Area By Region
db.cData.aggregate([
{"$group" : {"_id" : "$region",
"totalArea" : {$sum : "$areaM"},
"avgArea" : {$avg : "$areaM"},
"numStates" : {$sum : 1},
"states" : {$push : "$name"}}}
])
{ "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of Columbia", "Puerto Rico" ] }
{ "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states" : [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut", "Massachusetts", "New York" ] }
{ "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12, "states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota", "Kansas", "South Dakota", "Michigan", "Nebraska" ] }
{ "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13, "states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] }
{ "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [ "Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana", "North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ] }
Calculating Average State Area By Region
{ $group: {
_id: "$region",
avgAreaM: {$avg: "$areaM" }
}}
Input:
{
state: "New York",
areaM: 218,
region: "North East"
}
{
state: "New Jersey",
areaM: 90,
region: "North East"
}
{
state: "California",
areaM: 300,
region: "West"
}
Output:
{
_id: "North East",
avgAreaM: 154
}
{
_id: "West",
avgAreaM: 300
}
Calculating Total Area and State Count
{ $group: {
_id: "$region",
totArea: {$sum: "$areaM" },
sCount : {$sum : 1}}}
Input:
{
state: "New York",
areaM: 218,
region: "North East"
}
{
state: "New Jersey",
areaM: 90,
region: "North East"
}
{
state: "California",
areaM: 300,
region: "West"
}
Output:
{
_id: "North East",
totArea: 308,
sCount: 2}
{
_id: "West",
totArea: 300,
sCount: 1}
Total US Population By Year
db.cData.aggregate(
[{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {$sum : "$data.totalPop"}}},
{$sort : {"totalPop" : 1}}
])
{ "_id" : 1990, "totalPop" : 248709873 }
{ "_id" : 2000, "totalPop" : 281421906 }
{ "_id" : 2010, "totalPop" : 312471327 }
$unwind
• Flattens arrays
• Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error (MongoDB 3.2+ instead treats a non-array value as a single-element array)
• Pipe to $group to aggregate
{ "a" : "foo", "b" : [1, 2, 3] }
{ "a" : "foo", "b" : 1 }
{ "a" : "foo", "b" : 2 }
{ "a" : "foo", "b" : 3 }
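The flattening rule can be sketched in a few lines of plain JavaScript. unwind is a hypothetical helper for a top-level field only: it emits one copy of the document per array element and drops documents whose field is missing, null, or an empty array. For a non-array value it follows the MongoDB 3.2+ behavior of treating the value as a one-element array rather than raising an error.

```javascript
// Hypothetical helper mimicking { $unwind: "$<field>" } for a top-level field.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing, null, or empty-array fields produce no output documents.
    if (value === undefined || value === null) continue;
    if (Array.isArray(value) && value.length === 0) continue;
    // MongoDB 3.2+ treats a non-array value as a one-element array.
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) {
      // Copy every other field; replace the array with a single element.
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const rows = unwind([{ a: "foo", b: [1, 2, 3] }, { a: "bar" }], "b");
// rows: { a: "foo", b: 1 }, { a: "foo", b: 2 }, { a: "foo", b: 3 }; "bar" has no b, so it is dropped
```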
$unwind
{ $unwind: "$census" }
Input:
{
state: "New York",
census: [1990, 2000, 2010]
}
{
state: "New Jersey",
census: [1990, 2000]
}
{
state: "California",
census: [1980, 1990, 2000, 2010]
}
{
state: "Delaware",
census: [1990, 2000]
}
Output:
{ state: "New York", census: 1990 }
{ state: "New York", census: 2000 }
{ state: "New York", census: 2010 }
{ state: "New Jersey", census: 1990 }
{ state: "New Jersey", census: 2000 }
…
Southern State Population By Year
db.cData.aggregate(
[{$match : {"region" : "South"}},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" :
"$data.totalPop"}}}])
{ "_id" : 2010, "totalPop" : 113954021 }
{ "_id" : 2000, "totalPop" : 99664761 }
{ "_id" : 1990, "totalPop" : 84839030 }
$match
• Filter documents
–Uses existing query syntax
$match
{ $match:
{ "region" : "West" }
}
Input:
{
state: "New York",
areaM: 218,
region: "North East"
}
{
state: "Oregon",
areaM: 245,
region: "West"
}
{
state: "California",
areaM: 300,
region: "West"
}
Output:
{
state: "Oregon",
areaM: 245,
region: "West"
}
{
state: "California",
areaM: 300,
region: "West"
}
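Because $match reuses the query language, a dotted path into an array matches when any element satisfies the condition. The heavily simplified plain-JavaScript matcher below (hypothetical helpers valuesAtPath and matches, supporting only equality plus $gt/$gte/$lt/$lte) illustrates that rule; the Oregon 1990 and 2010 populations come from the population-delta results in this deck. The same rule means {"data.totalPop" : {"$gt" : 1000000}} keeps a state as soon as any one of its census years exceeds 1M.

```javascript
// Collect every value reachable at a dotted path, descending into arrays.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const candidates = Array.isArray(value) ? value : [value];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  // A terminal array is compared element by element.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const ops = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Every field condition must hold; a condition holds if any value on the path satisfies it.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const values = valuesAtPath(doc, path);
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        if (!ops[op]) throw new Error(`unsupported operator ${op}`);
        return values.some((v) => ops[op](v, arg));
      });
    }
    return values.some((v) => v === cond);
  });
}

const oregon = {
  state: "Oregon",
  region: "West",
  data: [{ year: 1990, totalPop: 2842321 }, { year: 2010, totalPop: 3831074 }],
};
matches(oregon, { region: "West" });                     // true
matches(oregon, { "data.totalPop": { $gt: 3000000 } });  // true: the 2010 element matches
```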
Population Delta By State from 1990 to
2010
db.cData.aggregate([
{$unwind : "$data"},
{$sort : {"data.year" : 1}},
{$group :{"_id" : "$name",
"pop1990" : {"$first" : "$data.totalPop"},
"pop2010" : {"$last" : "$data.totalPop"}}},
{$project : {"_id" : 0, "name" : "$_id",
"delta" : {"$subtract" : ["$pop2010", "$pop1990"]},
"pop1990" : 1,
"pop2010" : 1}
}
])
{ "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 }
{ "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 }
{ "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 }
{ "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 }
{ "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 }
{ "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 }
{ "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 }
{ "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 }
{ "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 }
{ "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 }
{ "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 }
{ "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 }
{ "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 }
{ "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 }
{ "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 }
{ "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 }
{ "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 }
{ "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 }
{ "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 }
{ "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }
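The delta pipeline can be mirrored in plain JavaScript to show why the $sort matters: after sorting by year, $first picks the 1990 value and $last the 2010 value within each state, and $project then subtracts them. The rows below are already "unwound", deliberately shuffled, and use the Oregon and Ohio figures from the results above; the 2000 census rows are omitted, so this is only a sketch.

```javascript
// Unwound census rows (values from the results above), in deliberately unsorted order.
const rows = [
  { name: "Oregon", year: 2010, totalPop: 3831074 },
  { name: "Ohio", year: 1990, totalPop: 10847115 },
  { name: "Oregon", year: 1990, totalPop: 2842321 },
  { name: "Ohio", year: 2010, totalPop: 11536504 },
];

// $sort: { "data.year": 1 }
const sorted = [...rows].sort((a, b) => a.year - b.year);

// $group with $first / $last: the first row seen per state is the earliest year,
// and the last row seen is the latest year.
const byState = new Map();
for (const r of sorted) {
  if (!byState.has(r.name)) byState.set(r.name, { pop1990: r.totalPop, pop2010: r.totalPop });
  byState.get(r.name).pop2010 = r.totalPop; // overwritten until the last row for the state
}

// $project: rename _id to name and compute delta with $subtract.
const deltas = [...byState].map(([name, g]) => ({
  name,
  pop1990: g.pop1990,
  pop2010: g.pop2010,
  delta: g.pop2010 - g.pop1990,
}));
// Oregon: delta 988753; Ohio: delta 689389 (matching the output above)
```

Without the sort, $first and $last would pick whichever rows happened to arrive first and last.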
$sort, $limit, $skip
• Sort documents by one or more fields
– Same order syntax as cursors
– Waits for all documents from the previous stage before returning any output
– In-memory unless early in the pipeline and indexed
• Limit and skip follow cursor behavior
$first, $last
• Accumulator operators, like $push and $addToSet
• Must be used in $group
• $first and $last are determined by document order
• Typically used with $sort to ensure the ordering is known
$project
• Reshape/Transform Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
Including and Excluding Fields
{ $project:
{ "_id" : 0,
"pop1990" : 1,
"pop2010" : 1
}}
Input:
{
"_id" : "Virginia",
"pop1990" : 6187358,
"pop2010" : 8001024
}
{
"_id" : "South Dakota",
"pop1990" : 696004,
"pop2010" : 814180
}
Output:
{
"pop1990" : 6187358,
"pop2010" : 8001024
}
{
"pop1990" : 696004,
"pop2010" : 814180
}
Renaming and Computing Fields
{ $project:
{ "_id" : 0,
"name" : "$_id",
"delta" :
{"$subtract" :
["$pop2010",
"$pop1990"]}}
}
Input:
{
"_id" : "Virginia",
"pop1990" : 6187358,
"pop2010" : 8001024
}
{
"_id" : "South Dakota",
"pop1990" : 696004,
"pop2010" : 814180
}
Output:
{
"name" : "Virginia",
"delta" : 1813666
}
{
"name" : "South Dakota",
"delta" : 118176
}
Compare number of people living within 500 km of Memphis, TN in 1990, 2000, 2010
db.cData.aggregate([
{$geoNear : {
"near" : {"type" : "Point", "coordinates" : [90, 35]},
"distanceField" : "dist.calculated",
"maxDistance" : 500000,
"includeLocs" : "dist.location",
"spherical" : true }},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" : "$data.totalPop"},
"states" : {"$addToSet" : "$name"}}},
{$sort : {"_id" : 1}}
])
{ "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
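Conceptually, a spherical $geoNear with maxDistance computes the great-circle distance from the query point to each document, keeps those within the limit, and sorts nearest first. The sketch below does that with a linear scan and the haversine formula, whereas the real stage relies on the geospatial index. Assumptions: a mean Earth radius of 6,371 km (MongoDB's internal constant may differ), a flat distanceField name ("dist") instead of the slide's "dist.calculated", and the query point and state centers exactly as they appear on these slides, in [longitude, latitude] order.

```javascript
// Assumption: mean Earth radius of 6,371 km; MongoDB's own constant may differ slightly.
const EARTH_RADIUS_M = 6371000;

// Great-circle distance in meters between two [longitude, latitude] points (haversine).
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dPhi = toRad(lat2 - lat1);
  const dLambda = toRad(lon2 - lon1);
  const a =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLambda / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Linear-scan stand-in for { $geoNear: { near, maxDistance, distanceField, spherical: true } }.
function geoNear(docs, near, maxDistance, distanceField) {
  return docs
    .map((doc) => ({ ...doc, [distanceField]: haversineMeters(near, doc.center.coordinates) }))
    .filter((doc) => doc[distanceField] <= maxDistance)
    .sort((a, b) => a[distanceField] - b[distanceField]);
}

const stateCenters = [
  { _id: "Tennessee", center: { type: "Point", coordinates: [86.6, 37.8] } },
  { _id: "Virginia", center: { type: "Point", coordinates: [78.6, 37.5] } },
];
const near = geoNear(stateCenters, [90, 35], 500000, "dist");
// Tennessee (roughly 435 km away) is kept; Virginia (over 1,000 km away) is dropped.
```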
$geoNear
{$geoNear : {
"near" : {"type" : "Point",
"coordinates" : [90, 35]},
"distanceField" : "dist.calculated",
"maxDistance" : 500000,
"spherical" : true }}
Input:
{
"_id" : "Tennessee",
"pop1990" : 4877185,
"pop2010" : 6346105,
"center" :
{"type" : "Point",
"coordinates" : [86.6, 37.8]}
}
{
"_id" : "Virginia",
"pop1990" : 6187358,
"pop2010" : 8001024,
"center" :
{"type" : "Point",
"coordinates" : [78.6, 37.5]}
}
Output:
{
"_id" : "Tennessee",
"pop1990" : 4877185,
"pop2010" : 6346105,
"center" :
{"type" : "Point",
"coordinates" : [86.6, 37.8]},
"dist" : { "calculated" : <distance in meters> }
}
What if I want to save the results to a collection?
db.cData.aggregate([
{$geoNear : {
"near" : {"type" : "Point", "coordinates" : [90, 35]},
"distanceField" : "dist.calculated",
"maxDistance" : 500000,
"includeLocs" : "dist.location",
"spherical" : true }},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" : "$data.totalPop"},
"states" : {"$addToSet" : "$name"}}},
{$sort : {"_id" : 1}},
{$out : "peopleNearMemphis"}
])
$out
db.cData.aggregate([ <pipeline stages>,
{"$out" : "resultsCollection"}])
• Save aggregation results to a new collection
• NOTE: Replaces the target collection (and all its data) if it already exists
• Must be the last stage in the pipeline
• Transform documents - ETL
Back To The Original Question
• Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
Division with Fastest Growing Pop Density
db.cData.aggregate(
[{$match : {"data.totalPop" : {"$gt" : 1000000}}},
{$unwind : "$data"},
{$sort : {"data.year" : 1}},
{$group : {"_id" : "$name",
"pop1990" : {"$first" : "$data.totalPop"},
"pop2010" : {"$last" : "$data.totalPop"},
"areaM" : {"$first" : "$areaM"},
"division" : {"$first" : "$division"}}},
{$group : {"_id" : "$division",
"totalPop1990" : {"$sum" : "$pop1990"},
"totalPop2010" : {"$sum" : "$pop2010"},
"totalAreaM" : {"$sum" : "$areaM"}}},
{$match : {"totalAreaM" : {"$gt" : 100000}}},
{$project : {"_id" : 0,
"division" : "$_id",
"density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]},
"density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]},
"denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},{"$divide" : ["$totalPop1990","$totalAreaM"]}]},
"totalAreaM" : 1,
"totalPop1990" : 1,
"totalPop2010" : 1}},
{$sort : {"denDelta" : -1}}])
{ "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997, "division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" : 200.6566049221612, "denDelta" : 55.03359806413451 }
{ "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995, "division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658, "denDelta" : 30.765371019911385 }
{ "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" : "Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674, "denDelta" : 29.90973998350529 }
{ "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" : "West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626, "denDelta" : 21.716845736155996 }
{ "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" : "East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348, "denDelta" : 17.754371635499567 }
{ "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" : "East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866, "denDelta" : 14.641944911508176 }
{ "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" : "Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta" : 13.101876233449651 }
{ "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" : "West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449, "denDelta" : 7.230812757118798 }
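As a quick arithmetic check on the South Atlantic row, the sketch below recomputes both densities (people per square mile) from the totals shown above. Because both densities share the same denominator, the denDelta expression in the $project could equivalently be a single $divide of (totalPop2010 - totalPop1990) by totalAreaM; the code confirms the two forms agree numerically.

```javascript
// Figures copied from the South Atlantic result above.
const southAtlantic = { totalPop1990: 42293785, totalPop2010: 58277380, totalAreaM: 290433.39999999997 };

// Population density = people / area (square miles).
const density1990 = southAtlantic.totalPop1990 / southAtlantic.totalAreaM;
const density2010 = southAtlantic.totalPop2010 / southAtlantic.totalAreaM;

// Both densities share the same denominator, so the delta reduces to one division.
const denDelta = density2010 - density1990;
const denDeltaSimplified =
  (southAtlantic.totalPop2010 - southAtlantic.totalPop1990) / southAtlantic.totalAreaM;

console.log(density1990.toFixed(3), density2010.toFixed(3), denDelta.toFixed(3)); // 145.623 200.657 55.034
```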
Aggregate Options
Aggregate options
db.cData.aggregate([<pipeline stages>],
{'explain' : false,
'allowDiskUse' : true,
'cursor' : {'batchSize' : 5}})
explain – similar to find().explain()
allowDiskUse – enable use of disk to store intermediate results
cursor – specify the size of the initial batch of results
New things in 3.2
$sample
{ $sample: { size: <positive integer> } }
● With WiredTiger: uses a pseudo-random cursor to return docs
● With MMAPv1: uses the _id index to randomly select docs
Used by Compass; also useful for unit tests, etc.
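For intuition about what { $sample: { size: N } } returns, here is a plain-JavaScript sketch that draws N distinct documents uniformly at random with a partial Fisher-Yates shuffle. This is a generic sampling technique, not the WiredTiger or MMAPv1 mechanism described above; the optional rng parameter is a hypothetical hook so a test can make the draw deterministic.

```javascript
// Draw `size` distinct elements uniformly at random (partial Fisher-Yates shuffle).
function sample(docs, size, rng = Math.random) {
  const copy = [...docs];
  const n = Math.min(size, copy.length); // asking for more than exists returns everything
  for (let i = 0; i < n; i++) {
    // Pick a random index from the not-yet-chosen tail [i, length) and swap it into place.
    const j = i + Math.floor(rng() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

const picked = sample([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3);
```

Injecting the random source is the same trick that makes sampled pipelines testable: unit tests can pin the "random" choice.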
$lookup
• Performs a left outer join to another collection in the same database to filter in
documents from the “joined” collection for processing.
• To each input document, the $lookup stage adds a new array field whose
elements are the matching documents from the “joined” collection.
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
In 3.2, the “from” collection CANNOT BE SHARDED
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
• Sample data:
> db.data.find()
{ "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 }
> db.keys.find()
{ "_id" : 0, "name" : "East Meter" }
{ "_id" : 1, "name" : "Central Meter 12" }
{ "_id" : 2, "name" : "New HIFI Monitor" }
• Find the average “v” value for each “k”, looking up the name of each “k”
db.data.aggregate( [
{ "$lookup" : {
"from" : "keys",
"localField" : "k",
"foreignField" : "_id",
"as" : "name" }
},
{ "$unwind" : "$name" },
{ "$project" : {
"k" : "$k",
"name" : "$name.name",
"v" : "$v" }
},
{ "$group" : {
"_id" : "$name",
"aveValue" : { "$avg" : "$v" }
}
},
{ "$project" : {
"_id" : 0,
"name" : "$_id",
"aveValue" : "$aveValue" }
}
]);
{ "aveValue" : 277.5, "name" : "New HIFI Monitor"}
{ "aveValue" : 559, "name" : "Central Meter 12"}
{ "aveValue" : 391, "name" : "East Meter"}
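The same join-then-average can be reproduced in plain JavaScript with the sample documents above (ObjectIds omitted), which is a handy way to check the expected output. lookup is a hypothetical helper mirroring $lookup's left-outer-join semantics: every input document is kept and gains an array of matching documents, empty if nothing matches; the inner loop plays the role of $unwind, which drops documents with no match.

```javascript
// Sample documents from the slide, without their ObjectIds.
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];

// Left outer join: each input doc gains an array of matches (possibly empty).
function lookup(docs, from, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// $lookup, then $unwind "$name" (one row per match), then $group with $avg on v.
const joined = lookup(data, keys, "k", "_id", "name");
const sums = new Map();
for (const doc of joined) {
  for (const match of doc.name) {
    const entry = sums.get(match.name) || { total: 0, count: 0 };
    entry.total += doc.v;
    entry.count += 1;
    sums.set(match.name, entry);
  }
}
const averages = [...sums].map(([name, e]) => ({ name, aveValue: e.total / e.count }));
// East Meter 391, Central Meter 12 559, New HIFI Monitor 277.5
```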
friends of friends
Use $lookup to perform "self-joins" for graph problems.
Simple case: find the friends of someone's friends
Can extend this to find cliques, paths, etc.
Dataset:
{ "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE",
"MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA",
"DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA",
"JOAN", "HIRAM" ] }
{ "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI",
"KESHIA" ] }
...
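The friends-of-friends pipeline that appears in the log below ($match on _id, $unwind friends, a self-$lookup on name, two more $unwind stages, then $group) can be traced step by step in plain JavaScript. The names are borrowed from the dataset, but the friendships in this tiny graph are made up. As with the logged pipeline, nothing filters out the starting person or the direct friends, so with symmetric friendships the starting person shows up in the result.

```javascript
// Hypothetical mini graph with the same document shape as the dataset.
const people = [
  { _id: 1, name: "FLOYD", friends: ["BILLIE", "JOAN"] },
  { _id: 2, name: "BILLIE", friends: ["FLOYD", "ALI"] },
  { _id: 3, name: "JOAN", friends: ["FLOYD", "KESHIA", "ALI"] },
  { _id: 4, name: "ALI", friends: ["BILLIE", "JOAN"] },
  { _id: 5, name: "KESHIA", friends: ["JOAN"] },
];

function friendsOfFriends(collection, id) {
  const result = new Set(); // $group on the friend name keeps distinct values
  const start = collection.filter((p) => p._id === id); // $match: { _id: id }
  for (const person of start) {
    for (const friendName of person.friends) { // $unwind "$friends"
      // $lookup: from the same collection, localField "friends", foreignField "name"
      const matches = collection.filter((p) => p.name === friendName);
      for (const friend of matches) { // $unwind "$friendsOfFriends"
        for (const fof of friend.friends) result.add(fof); // $unwind "$friendsOfFriends.friends"
      }
    }
  }
  return [...result].map((name) => ({ friendOfFriend: name })); // $project
}

const fof = friendsOfFriends(people, 1);
// friendOfFriend values: FLOYD, ALI, KESHIA
```

The inner `collection.filter` scan is exactly the work an index on name avoids, which is what the timing difference in the log illustrates.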
don't forget your indexes…
Running FOF.friendsOfFriends(1)
2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command:
aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind:
"$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField:
"name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind:
"$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, {
$project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740
keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount:
{ r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: {
r: 562 } } } protocol:op_command 48ms
with indexes { "friends" : 1 } & { "name" : 1 }:
2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command:
aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind:
"$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField:
"name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind:
"$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, {
$project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824
keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount:
{ r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r:
16 } } } protocol:op_command 2ms
lots of new mathematical operators
$stdDevSamp Calculates sample standard deviation. { $stdDevSamp: <array> }
$stdDevPop Calculates population standard deviation. { $stdDevPop: <array> }
$sqrt Calculates the square root. { $sqrt: <number> }
$abs Returns the absolute value of a number. { $abs: <number> }
$log Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
$log10 Calculates the log base 10 of a number. { $log10: <number> }
$ln Calculates the natural log of a number. { $ln: <number> }
$pow Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
$exp Raises e to the specified exponent. { $exp: <number> }
$trunc Truncates a number to its integer part. { $trunc: <number> }
$ceil Returns the smallest integer greater than or equal to the specified number.{$ceil:<number>}
$floor Returns the largest integer less than or equal to the specified number. {$floor: <number>}
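The two standard deviation operators differ only in the divisor: $stdDevPop divides the sum of squared deviations from the mean by n, while $stdDevSamp divides by n - 1. A small plain-JavaScript sketch (the function names are illustrative, not MongoDB APIs):

```javascript
// Population standard deviation: divide the squared deviations by n.
function stdDevPop(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const ss = values.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  return Math.sqrt(ss / values.length);
}

// Sample standard deviation: divide by n - 1 (Bessel's correction).
function stdDevSamp(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const ss = values.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  return Math.sqrt(ss / (values.length - 1));
}

const readings = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(stdDevPop(readings));  // 2
console.log(stdDevSamp(readings)); // about 2.138, i.e. sqrt(32 / 7)
```

Use $stdDevPop when the data is the whole population (e.g. every state), and $stdDevSamp when it is a sample drawn from something larger.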
new array operators
$slice Returns a subset of an array.
{ $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
$arrayElemAt Returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
$concatArrays Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ]}
$isArray Determines if the operand is an array. { $isArray: [ <expression> ] }
$filter Selects a subset of the array based on the condition.
{
$filter:
{
input: <array>,
as: <string>,
cond: <expression>
}
}
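A rough plain-JavaScript model of $slice, $arrayElemAt, and $filter semantics, assuming well-formed arguments (MongoDB also validates types and returns null for null inputs, which this sketch skips). Negative values count from the end of the array, as in the MongoDB operators. In $filter, the `as` variable simply becomes the callback's parameter here.

```javascript
// $slice: [array, n] or [array, position, n]
function slice(array, a, b) {
  if (b === undefined) {
    // Two-argument form: positive n takes from the front, negative n from the back.
    const n = a;
    return n >= 0 ? array.slice(0, n) : array.slice(Math.max(array.length + n, 0));
  }
  // Three-argument form: negative position counts back from the end; n must be positive.
  const start = a >= 0 ? a : Math.max(array.length + a, 0);
  return array.slice(start, start + b);
}

// $arrayElemAt: negative index counts from the end; out of range yields undefined ("missing").
function arrayElemAt(array, idx) {
  const i = idx >= 0 ? idx : array.length + idx;
  return i >= 0 && i < array.length ? array[i] : undefined;
}

// $filter: keep the elements for which cond(element) is true.
function filter(input, cond) {
  return input.filter(cond);
}

const years = [1990, 2000, 2010];
const firstTwo = slice(years, 2);                   // 1990 and 2000
const lastOne = slice(years, -1);                   // 2010
const fromSecond = slice(years, 1, 5);              // 2000 and 2010
const latest = arrayElemAt(years, -1);              // 2010
const recent = filter(years, (y) => y >= 2000);     // 2000 and 2010
```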
Summary
Analytics in MongoDB?
Create
Read
Update
Delete
Analytics
?
Group
Count
Derive Values
Filter
Average
Sort
YES!
Framework Use Cases
• Basic aggregation queries
• Ad-hoc reporting
• Real-time analytics
• Visualizing and reshaping data
Questions?
Thanks for attending & happy aggregating
Please complete survey
jason.mimick@mongodb.com
@jmimick

 
Doing More with MongoDB Aggregation
Doing More with MongoDB AggregationDoing More with MongoDB Aggregation
Doing More with MongoDB AggregationMongoDB
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Big Data Spain
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
Gov APIs: The Notorious Case of Official Statistics
Gov APIs: The Notorious Case of Official StatisticsGov APIs: The Notorious Case of Official Statistics
Gov APIs: The Notorious Case of Official StatisticsXavier Badosa
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregationsenterprisesearchmeetup
 
MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !Sébastien Prunier
 

Similar to Webinar: Exploring the Aggregation Framework (20)

MongoDB 3.2 - Analytics
MongoDB 3.2  - AnalyticsMongoDB 3.2  - Analytics
MongoDB 3.2 - Analytics
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
Starting out with MongoDB
Starting out with MongoDBStarting out with MongoDB
Starting out with MongoDB
 
De normalised london aggregation framework overview
De normalised london  aggregation framework overview De normalised london  aggregation framework overview
De normalised london aggregation framework overview
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and Profiling
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
 
Introduction to MongoDB for C# developers
Introduction to MongoDB for C# developersIntroduction to MongoDB for C# developers
Introduction to MongoDB for C# developers
 
How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6How to leverage what's new in MongoDB 3.6
How to leverage what's new in MongoDB 3.6
 
Querying mongo db
Querying mongo dbQuerying mongo db
Querying mongo db
 
Mongo db 101 dc group
Mongo db 101 dc groupMongo db 101 dc group
Mongo db 101 dc group
 
Doing More with MongoDB Aggregation
Doing More with MongoDB AggregationDoing More with MongoDB Aggregation
Doing More with MongoDB Aggregation
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
Gov APIs: The Notorious Case of Official Statistics
Gov APIs: The Notorious Case of Official StatisticsGov APIs: The Notorious Case of Official Statistics
Gov APIs: The Notorious Case of Official Statistics
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregations
 
MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !
 
Talk MongoDB - Amil
Talk MongoDB - AmilTalk MongoDB - Amil
Talk MongoDB - Amil
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDBMongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Webinar: Exploring the Aggregation Framework

  • 1. Exploring the Aggregation Framework. Jason Mimick, Senior Consulting Engineer, jason.mimick@mongodb.com, @jmimick. Original slide credits: Jay Runkel, jay.runkel@mongodb.com, et al.
  • 2. Warning or Whew: This is a “101” beginner talk! It assumes you know some basics about MongoDB, but basically nothing about the Aggregation Framework.
  • 3. Agenda: 1. Analytics in MongoDB? 2. Aggregation Framework 3. Aggregation Framework in Action (US Census Data; Aggregation Framework Options) 4. New 3.2 stuff (friends of friends: $lookup for self-joins)
  • 5. For Example: US Census Data. Census data from 1990, 2000, and 2010. Question: Which US Division has the fastest-growing population density? We only want to include data from states with more than 1M people, and we only want to include divisions larger than 100K square miles. A Division is a group of US States; population density = # of people / area of the division. Data is provided at the state level.
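Before building the pipeline, it may help to pin the question down with a plain JavaScript sketch of the calculation, run over a tiny in-memory array instead of MongoDB. The census-array field names (data, year, totalPop), the state and division names, and every number are hypothetical, not real census values. It also assumes two things the slide leaves open: the 1M-people filter is judged on the 2010 population, and “fastest growing” means the relative change in density from 1990 to 2010.

```javascript
// Hypothetical state documents: names and numbers are made up for illustration.
const states = [
  { name: "A", division: "D1", areaM: 50000, data: [{ year: 1990, totalPop: 2000000 }, { year: 2010, totalPop: 3000000 }] },
  { name: "B", division: "D1", areaM: 60000, data: [{ year: 1990, totalPop: 1500000 }, { year: 2010, totalPop: 1800000 }] },
  { name: "C", division: "D2", areaM: 120000, data: [{ year: 1990, totalPop: 4000000 }, { year: 2010, totalPop: 4400000 }] },
  { name: "D", division: "D2", areaM: 10000, data: [{ year: 1990, totalPop: 500000 }, { year: 2010, totalPop: 600000 }] },
];

// Population of one state in one census year (0 if that year is missing).
function popIn(state, year) {
  const entry = state.data.find((d) => d.year === year);
  return entry ? entry.totalPop : 0;
}

function densityGrowthByDivision(docs, minStatePop, minDivisionArea) {
  const divisions = new Map();
  for (const s of docs) {
    // State filter (assumption: judged on the 2010 population).
    if (popIn(s, 2010) <= minStatePop) continue;
    const d = divisions.get(s.division) || { area: 0, pop1990: 0, pop2010: 0 };
    d.area += s.areaM;
    d.pop1990 += popIn(s, 1990);
    d.pop2010 += popIn(s, 2010);
    divisions.set(s.division, d);
  }
  const results = [];
  for (const [division, d] of divisions) {
    // Division filter (assumption: total area of the states that survived).
    if (d.area <= minDivisionArea) continue;
    const density1990 = d.pop1990 / d.area; // people per square mile
    const density2010 = d.pop2010 / d.area;
    results.push({ division, growth: (density2010 - density1990) / density1990 });
  }
  // Fastest-growing division first.
  return results.sort((a, b) => b.growth - a.growth);
}
```

With these made-up numbers, state D has fewer than 1M people and is dropped, so D1 (density up about 37%) ranks ahead of D2 (up 10%). The aggregation pipeline expresses the same filter, group, filter, sort sequence as server-side stages.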
  • 6. US Regions and Divisions
  • 7. How would we solve this in SQL? With SELECT, GROUP BY, and HAVING. Of course, we don’t have SQL; we’re a NoSQL database.
  • 9. Core Concept: Pipeline. Just as in the shell pipeline ps -ef | grep mongod, the output of one command becomes the input of the next.
  • 10. What is the Aggregation Pipeline? A series of document transformations: executed in stages; the original input is a collection; output is returned as a cursor or written to a collection. A rich library of functions: filter, compute, group, and summarize data; the output of one stage is sent to the input of the next; operations are executed in sequential order.
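The “output of one stage feeds the next” idea can be sketched in plain JavaScript without a server: treat each stage as a function from an array of documents to a new array, and let reduce chain them in order. runPipeline, match, sortBy, and limit are hypothetical helpers for illustration only, loosely mirroring $match, $sort, and $limit; they are not part of any MongoDB API, and the real server stages are far richer.

```javascript
// Run stages in order: each stage receives the previous stage's output.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stage builders, each returning a function docs -> docs.
const match = (pred) => (docs) => docs.filter(pred);
const sortBy = (key, dir) => (docs) =>
  // Copy before sorting so the input array is left untouched; dir is 1 or -1 like $sort.
  [...docs].sort((a, b) => (a[key] < b[key] ? -dir : a[key] > b[key] ? dir : 0));
const limit = (n) => (docs) => docs.slice(0, n);

const people = [
  { name: "Ann", age: 34 },
  { name: "Bob", age: 19 },
  { name: "Cy", age: 52 },
  { name: "Di", age: 41 },
];

// Roughly [ { $match: { age: { $gt: 30 } } }, { $sort: { age: -1 } }, { $limit: 2 } ]
const pipelineResult = runPipeline(people, [
  match((p) => p.age > 30),
  sortBy("age", -1),
  limit(2),
]);
```

Here pipelineResult holds Cy (52) and Di (41): the filter drops Bob, the sort orders the rest oldest first, and the limit keeps two.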
  • 12. Syntax (mongo shell): > db.foo.aggregate( [ { stage1 }, { stage2 }, { stage3 }, … ] ) where (1) db is a variable pointing to the current database, (2) foo is the collection name, (3) aggregate is a method on the collection, (4) its argument is an array of objects, and (5) each object is a pipeline operator (stage).
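  • 13. Syntax - Driver - Java: db.hospital.aggregate( [ { "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } }, { "$match" : { "count" : { "$gte" : 5 } } }, { "$sort" : { "count" : -1 } } ] ). This counts documents per patient, keeps patients with a count of at least 5, and sorts them by count, highest first.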
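For readers without a server handy, the semantics of those three stages can be emulated in plain JavaScript over an in-memory array. The visit documents and the frequentPatients helper are hypothetical; this mirrors what $group, then $match, then $sort compute, not how the server executes them.

```javascript
// Hypothetical visit records; only PatientID matters for this pipeline.
const visits = [
  ...Array(6).fill({ PatientID: "p1" }),
  ...Array(5).fill({ PatientID: "p2" }),
  ...Array(2).fill({ PatientID: "p3" }),
];

function frequentPatients(docs, minCount) {
  // $group: { _id: "$PatientID", count: { $sum: 1 } }
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.PatientID, (counts.get(doc.PatientID) || 0) + 1);
  }
  return [...counts]
    .map(([id, count]) => ({ _id: id, count }))
    // $match: { count: { $gte: minCount } }
    .filter((g) => g.count >= minCount)
    // $sort: { count: -1 }
    .sort((a, b) => b.count - a.count);
}
```

Calling frequentPatients(visits, 5) yields p1 (6) followed by p2 (5); p3, with only two visits, is removed by the $match step.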
  • 14. Some Popular Pipeline Operators: $match: filter documents; $project: reshape documents; $group: summarize documents; $unwind: expand arrays in documents; $sort: order documents; $limit/$skip: paginate documents; $redact: restrict documents; $geoNear: proximity sort documents; $let: define variables and $map: apply an expression to each array element (both are expression operators used inside stages).
  • 15. 80+ operators are available as of MongoDB 3.2.
  • 16. Aggregation Framework in Action (let’s play with the census data)
  • 17. cData Collection: one document for each state, with its name, region, and division, plus census data for 1990, 2000, and 2010 (population, housing units, occupied housing units). The census data is an array of three subdocuments.
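Because the census figures live in an array, $unwind from the operator list will come in handy: it emits one document per array element. A plain JavaScript sketch of its basic behavior follows. unwind is a hypothetical helper, not a MongoDB API, and the sample document's array field name (data) and values are invented; the sketch ignores the 3.2 options preserveNullAndEmptyArrays and includeArrayIndex.

```javascript
// Emulates { $unwind: "$<field>" } for a top-level field.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null fields produce no output documents.
    if (value === undefined || value === null) continue;
    // A non-array value is treated as a one-element array; an empty array yields nothing.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      // Copy the document, replacing the array with a single element.
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

// A hypothetical cData-style document (names and numbers invented).
const censusDoc = {
  name: "Somestate",
  data: [
    { year: 1990, totalPop: 100 },
    { year: 2000, totalPop: 120 },
    { year: 2010, totalPop: 150 },
  ],
};
```

unwind([censusDoc], "data") returns three documents, each carrying the state's name and one census year, which is the shape later $group stages can sum over.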
  • 18. Count, Distinct: Check out the cData docs with count() and distinct(). When you start building your aggregations, you need to ‘get to know’ your data!
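In the mongo shell these checks are calls such as db.regions.count() and db.regions.distinct("region"). As a plain JavaScript analogue, assuming a few documents have been copied into an in-memory array, count is the array length and distinct collects unique field values. regionDocs and distinct are hypothetical names for this sketch.

```javascript
// A hand-typed subset of regions-style documents, for illustration only.
const regionDocs = [
  { state: "Connecticut", region: "Northeast" },
  { state: "Ohio", region: "Midwest" },
  { state: "Texas", region: "South" },
  { state: "Maine", region: "Northeast" },
];

// Analogue of count(): number of documents.
const docCount = regionDocs.length;

// Analogue of distinct(field): unique values of one field, in first-seen order.
function distinct(docs, field) {
  return [...new Set(docs.map((d) => d[field]))];
}
```

Here docCount is 4 and distinct(regionDocs, "region") has three entries, a quick hint that a $group on region will produce three groups for this subset.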
  • 19. Simple $group: The census data has a collection called regions: > db.regions.findOne() { "_id" : ObjectId("54d0e1ac28099359f5660f9f"), "state" : "Connecticut", "region" : "Northeast", "regNum" : 1, "division" : "New England", "divNum" : 1 } How can we find out how many states are in each region?
  • 20. > db.regions.aggregate( [ { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } } ] ) returns { "_id" : "West", "count" : 13 } { "_id" : "South", "count" : 17 } { "_id" : "Midwest", "count" : 12 } { "_id" : "Northeast", "count" : 9 }. To make pipelines more readable, store your pipeline stages in variables: > var group = { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } }; > db.regions.aggregate( [ group ] )
  • 21. $group: Groups documents by value. _id can be a field reference, an object, or a constant; the other output fields are computed with accumulators: $max, $min, $avg, $sum; $addToSet, $push; $first, $last. $group processes all data in memory by default (each stage is limited to 100 MB of RAM unless allowDiskUse is enabled).
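A rough plain JavaScript sketch of what the accumulators compute per group may make the list concrete. groupWithAccumulators and the orders documents are hypothetical. Unlike the server, this sketch assumes every value is a number (real $sum and $avg skip non-numeric values), returns $addToSet results in first-seen order (the server makes no ordering promise), and takes $first/$last from input order, which is why real pipelines usually $sort before $group.

```javascript
// Hypothetical documents; values must be numbers in this sketch.
const orders = [
  { item: "pen", qty: 5 },
  { item: "ink", qty: 2 },
  { item: "pen", qty: 1 },
  { item: "pen", qty: 5 },
];

function groupWithAccumulators(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    // Passing null mimics _id: null, which puts every document in one group.
    const key = keyField === null ? null : doc[keyField];
    const v = doc[valueField];
    if (!groups.has(key)) {
      groups.set(key, { _id: key, sum: 0, n: 0, min: v, max: v, first: v, last: v, push: [], set: new Set() });
    }
    const g = groups.get(key);
    g.sum += v;                 // $sum
    g.n += 1;                   // document count, used for $avg
    g.min = Math.min(g.min, v); // $min
    g.max = Math.max(g.max, v); // $max
    g.last = v;                 // $last ($first was set when the group was created)
    g.push.push(v);             // $push keeps duplicates
    g.set.add(v);               // $addToSet drops duplicates
  }
  return [...groups.values()].map((g) => ({
    _id: g._id, sum: g.sum, avg: g.sum / g.n, min: g.min, max: g.max,
    first: g.first, last: g.last, push: g.push, addToSet: [...g.set],
  }));
}
```

For the pen group, $push gives [5, 1, 5] while $addToSet keeps only 5 and 1; grouping with keyField set to null yields a single group whose avg is 13 / 4 = 3.25.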
  • 22. Total US Area
    Back to cData… Can we use $group to find the total area of the US (according to these data)?
  • 23. db.cData.aggregate([
      // "_id" : null puts every document into a single group
      {"$group" : {"_id" : null, "totalArea" : {$sum : "$areaM"}, "avgArea" : {$avg : "$areaM"} } }
    ])
    { "_id" : null, "totalArea" : 3802067.0700000003, "avgArea" : 73116.67442307693 }
  • 24. Area By Region
    db.cData.aggregate([
      {"$group" : {"_id" : "$region",
                   "totalArea" : {$sum : "$areaM"},
                   "avgArea" : {$avg : "$areaM"},
                   "numStates" : {$sum : 1},
                   "states" : {$push : "$name"}}}
    ])
    { "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of Columbia", "Puerto Rico" ] }
    { "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states" : [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut", "Massachusetts", "New York" ] }
    { "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12, "states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota", "Kansas", "South Dakota", "Michigan", "Nebraska" ] }
    { "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13, "states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] }
    { "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [ "Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana", "North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ] }
  • 25. Calculating Average State Area By Region
    Input:
      { state: "New York", areaM: 218, region: "North East" }
      { state: "New Jersey", areaM: 90, region: "North East" }
      { state: "California", areaM: 300, region: "West" }
    Stage:
      { $group: { _id: "$region", avgAreaM: { $avg: "$areaM" } } }
    Output:
      { _id: "North East", avgAreaM: 154 }
      { _id: "West", avgAreaM: 300 }
  • 26. Calculating Total Area and State Count
    Input:
      { state: "New York", areaM: 218, region: "North East" }
      { state: "New Jersey", areaM: 90, region: "North East" }
      { state: "California", areaM: 300, region: "West" }
    Stage:
      { $group: { _id: "$region", totArea: { $sum: "$areaM" }, sCount: { $sum: 1 } } }
    Output:
      { _id: "North East", totArea: 308, sCount: 2 }
      { _id: "West", totArea: 300, sCount: 1 }
  • 27. Total US Population By Year
    db.cData.aggregate([
      {$unwind : "$data"},
      {$group : {"_id" : "$data.year", "totalPop" : {$sum : "$data.totalPop"}}},
      {$sort : {"totalPop" : 1}}
    ])
    { "_id" : 1990, "totalPop" : 248709873 }
    { "_id" : 2000, "totalPop" : 281421906 }
    { "_id" : 2010, "totalPop" : 312471327 }
  • 28. $unwind
    • Flattens arrays
    • Creates one document per array element
    • The array is replaced by the element's value
    • Missing, null or empty fields → no output
    • Non-array fields → error before MongoDB 3.2; from 3.2 on, a non-array value is treated as a single-element array
    • Pipe the output to $group to aggregate
    { "a" : "foo", "b" : [1, 2, 3] } → { "a" : "foo", "b" : 1 }, { "a" : "foo", "b" : 2 }, { "a" : "foo", "b" : 3 }
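A plain JavaScript sketch of these $unwind rules, assuming MongoDB 3.2 behavior (a non-array value is treated as a one-element array instead of raising an error). The unwind helper is hypothetical and only handles a top-level field, not dotted paths such as "$data.year".

```javascript
// One output document per element of doc[field]; missing, null and
// empty arrays produce nothing; a scalar counts as a one-element array.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      // copy the document, replacing the array with a single element
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const exploded = unwind(
  [{ a: "foo", b: [1, 2, 3] }, { a: "bar", b: [] }, { a: "baz" }],
  "b"
);
// exploded holds three documents, all with a: "foo" and b = 1, 2, 3;
// "bar" (empty array) and "baz" (missing field) disappear.
```

This is why the census pipelines unwind "$data" before grouping by "$data.year": after the unwind, each document carries exactly one year's figures.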
  • 29. $unwind
    Input:
      { state: "New York", census: [1990, 2000, 2010] }
      { state: "New Jersey", census: [1990, 2000] }
      { state: "California", census: [1980, 1990, 2000, 2010] }
      { state: "Delaware", census: [1990, 2000] }
    Stage:
      { $unwind: "$census" }
    Output:
      { state: "New York", census: 1990 }
      { state: "New York", census: 2000 }
      { state: "New York", census: 2010 }
      { state: "New Jersey", census: 1990 }
      { state: "New Jersey", census: 2000 }
      …
  • 30. Southern State Population By Year
    db.cData.aggregate([
      {$match : {"region" : "South"}},
      {$unwind : "$data"},
      {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}}}
    ])
    { "_id" : 2010, "totalPop" : 113954021 }
    { "_id" : 2000, "totalPop" : 99664761 }
    { "_id" : 1990, "totalPop" : 84839030 }
  • 31. $match
    • Filter documents
      – Uses the existing query syntax
  • 32. $match
    Input:
      { state: "New York", areaM: 218, region: "North East" }
      { state: "Oregon", areaM: 245, region: "West" }
      { state: "California", areaM: 300, region: "West" }
    Stage:
      { $match: { "region" : "West" } }
    Output:
      { state: "Oregon", areaM: 245, region: "West" }
      { state: "California", areaM: 300, region: "West" }
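The filtering step can be sketched in plain JavaScript. The matches/match helpers below are hypothetical and cover only two bits of query syntax, plain equality and { $gt: n } on top-level fields; real $match accepts the full MongoDB query language.

```javascript
// Returns true when every field condition in the query holds for doc.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return typeof value === "number" && value > cond.$gt;
    }
    return value === cond; // plain equality, e.g. { region: "West" }
  });
}

// A $match-like stage: keeps only the documents that satisfy the query.
const match = (query) => (docs) => docs.filter((d) => matches(d, query));

const docs = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "Oregon", areaM: 245, region: "West" },
  { state: "California", areaM: 300, region: "West" },
];

const west = match({ region: "West" })(docs);     // Oregon, California
const big = match({ areaM: { $gt: 250 } })(docs); // California
```

Because $match only removes documents, putting it as early as possible shrinks the work for every later stage.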
  • 33. Population Delta By State from 1990 to 2010
    db.cData.aggregate([
      {$unwind : "$data"},
      {$sort : {"data.year" : 1}},
      {$group : {"_id" : "$name",
                 "pop1990" : {"$first" : "$data.totalPop"},
                 "pop2010" : {"$last" : "$data.totalPop"}}},
      {$project : {"_id" : 0, "name" : "$_id",
                   "delta" : {"$subtract" : ["$pop2010", "$pop1990"]},
                   "pop1990" : 1, "pop2010" : 1}}
    ])
  • 34. { "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 }
    { "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 }
    { "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 }
    { "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 }
    { "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 }
    { "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 }
    { "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 }
    { "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 }
    { "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 }
    { "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 }
    { "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 }
    { "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 }
    { "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 }
    { "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 }
    { "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 }
    { "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 }
    { "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 }
    { "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 }
    { "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 }
    { "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }
  • 35. $sort, $limit, $skip
    • Sort documents by one or more fields
      – Same sort-order syntax as cursors
      – Blocks until the preceding stage has returned all of its documents
      – Sorts in memory unless it appears early in the pipeline and can use an index
    • $limit and $skip follow cursor behavior
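A plain JavaScript sketch of a multi-field sort followed by $skip/$limit pagination. The sortBy, skip and limit helpers are hypothetical; mixed-type comparison rules are ignored, and the spec's key order (which Object.entries preserves for field names) decides precedence, so later fields only break ties on earlier ones.

```javascript
// Sketch of { $sort: spec } where spec maps field -> 1 (asc) or -1 (desc).
function sortBy(docs, spec) {
  return [...docs].sort((a, b) => {
    for (const [field, dir] of Object.entries(spec)) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // equal on every sort field
  });
}

const skip = (n) => (docs) => docs.slice(n);     // like { $skip: n }
const limit = (n) => (docs) => docs.slice(0, n); // like { $limit: n }

const docs = [
  { state: "California", areaM: 300, region: "West" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "New York", areaM: 218, region: "North East" },
];

// { $sort: { region: 1, areaM: -1 } }
const sorted = sortBy(docs, { region: 1, areaM: -1 });
// New York (218), New Jersey (90), California (300)

// Page 2 with a page size of 1: skip (page - 1) * size, then limit size.
const page2 = limit(1)(skip(1)(sorted)); // New Jersey
```

As with cursors, apply the skip before the limit; reversing them would return an empty page here.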
  • 36. $first, $last
    • Accumulator operators, like $push and $addToSet
    • Must be used in $group
    • The "first" and "last" values are determined by document order
    • Typically preceded by $sort so that the ordering is known
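Why the $sort matters: $first and $last simply keep the first and last value each group sees. The plain JavaScript sketch below (hypothetical helper firstLastByName, not MongoDB code) uses Virginia's 1990 and 2010 totals from the population-delta results, fed in newest-first to show what goes wrong without a sort.

```javascript
// Unwound census rows for one state, deliberately listed newest first.
const unwound = [
  { name: "Virginia", year: 2010, totalPop: 8001024 },
  { name: "Virginia", year: 1990, totalPop: 6187358 },
];

// Mimics { $group: { _id: "$name",
//   pop1990: { $first: "$data.totalPop" }, pop2010: { $last: "$data.totalPop" } } }
function firstLastByName(docs) {
  const groups = new Map();
  for (const d of docs) {
    // $first: only the first value seen for a group is kept
    if (!groups.has(d.name)) groups.set(d.name, { _id: d.name, pop1990: d.totalPop });
    // $last: every later value overwrites, so the final one wins
    groups.get(d.name).pop2010 = d.totalPop;
  }
  return [...groups.values()];
}

const unsorted = firstLastByName(unwound);
const sorted = firstLastByName([...unwound].sort((a, b) => a.year - b.year));
// unsorted[0]: pop1990 = 8001024, pop2010 = 6187358 (the labels are wrong)
// sorted[0]:   pop1990 = 6187358, pop2010 = 8001024 (delta 1813666)
```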
  • 37. $project
    • Reshape/transform documents
      – Include, exclude or rename fields
      – Inject computed fields
      – Create sub-document fields
  • 38. Including and Excluding Fields
    Input:
      { "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024 }
      { "_id" : "South Dakota", "pop1990" : 696004, "pop2010" : 814180 }
    Stage:
      { $project: { "_id" : 0, "pop1990" : 1, "pop2010" : 1 } }
    Output:
      { "pop1990" : 6187358, "pop2010" : 8001024 }
      { "pop1990" : 696004, "pop2010" : 814180 }
  • 39. Renaming and Computing Fields
    Input:
      { "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024 }
      { "_id" : "South Dakota", "pop1990" : 696004, "pop2010" : 814180 }
    Stage:
      { $project: { "_id" : 0, "name" : "$_id", "delta" : { "$subtract" : ["$pop2010", "$pop1990"] } } }
    Output:
      { "name" : "Virginia", "delta" : 1813666 }
      { "name" : "South Dakota", "delta" : 118176 }
    Fields that are not listed (pop1990, pop2010) are dropped automatically; $project cannot mix exclusions such as "pop1990" : 0 with included or computed fields (only _id may be excluded).
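The rename-and-compute projection behaves like a map over documents that builds a brand-new object from only the listed fields. A plain JavaScript sketch (the project helper is hypothetical) using the Virginia and South Dakota figures from this slide:

```javascript
// Sketch of
//   { $project: { _id: 0, name: "$_id", delta: { $subtract: ["$pop2010", "$pop1990"] } } }
// Only the listed output fields are built, so everything else is dropped.
const project = (docs) =>
  docs.map((d) => ({
    name: d._id,                // rename: "$_id" becomes name
    delta: d.pop2010 - d.pop1990, // computed: $subtract
  }));

const projected = project([
  { _id: "Virginia", pop1990: 6187358, pop2010: 8001024 },
  { _id: "South Dakota", pop1990: 696004, pop2010: 814180 },
]);
// Virginia: delta 1813666; South Dakota: delta 118176
```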
  • 40. Compare the number of people living within 500 km of Memphis, TN in 1990, 2000 and 2010
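  • 41. Compare the number of people living within 500 km of Memphis, TN in 1990, 2000 and 2010
    db.cData.aggregate([
      {$geoNear : {
         "near" : {"type" : "Point", "coordinates" : [90, 35]},
         "distanceField" : "dist.calculated",
         "maxDistance" : 500000,
         "includeLocs" : "dist.location",
         "spherical" : true }},
      {$unwind : "$data"},
      {$group : {"_id" : "$data.year",
                 "totalPop" : {"$sum" : "$data.totalPop"},
                 "states" : {"$addToSet" : "$name"}}},
      {$sort : {"_id" : 1}}
    ])
    For a GeoJSON point, maxDistance is in meters, so 500000 = 500 km. GeoJSON coordinates are [longitude, latitude], and real western-hemisphere longitudes are negative (Memphis is near [-90, 35]); the positive values here follow how this dataset appears to store its state centers.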
  • 42. { "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
    { "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
    { "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
  • 43. $geoNear
    • Orders/filters documents by location
      – Requires a geospatial index
      – Output includes the physical distance
      – Must be the first aggregation stage
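To get a feel for the "spherical" distance test, here is a plain JavaScript haversine sketch of great-circle distance in meters between two [longitude, latitude] points. It assumes a mean Earth radius of 6,371 km; MongoDB's internal constant and algorithm may differ slightly, so treat this as an approximation. The points are the ones shown on the $geoNear slides, in the dataset's own coordinate convention.

```javascript
const EARTH_RADIUS_M = 6371000; // assumed mean Earth radius
const toRad = (deg) => (deg * Math.PI) / 180;

// Haversine great-circle distance in meters between [lon, lat] points.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

const near = [90, 35];          // the "near" point from the slides
const tennessee = [86.6, 37.8]; // Tennessee's center as shown on the slide
const virginia = [78.6, 37.5];  // Virginia's center as shown on the slide
const maxDistance = 500000;     // 500 km, in meters

console.log(haversineMeters(near, tennessee) <= maxDistance); // true  (about 435 km)
console.log(haversineMeters(near, virginia) <= maxDistance);  // false (about 1,060 km)
```

This matches the slide's picture: Tennessee passes the 500 km filter while Virginia does not.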
  • 44. $geoNear
    Input:
      { "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : { "type" : "Point", "coordinates" : [86.6, 37.8] } }
      { "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024, "center" : { "type" : "Point", "coordinates" : [78.6, 37.5] } }
    Stage:
      {$geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, "distanceField" : "dist.calculated", "maxDistance" : 500000, "spherical" : true }}
    Output (plus the computed dist.calculated field):
      { "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : { "type" : "Point", "coordinates" : [86.6, 37.8] } }
  • 45. What if I want to save the results to a collection?
    db.cData.aggregate([
      {$geoNear : {
         "near" : {"type" : "Point", "coordinates" : [90, 35]},
         "distanceField" : "dist.calculated",
         "maxDistance" : 500000,
         "includeLocs" : "dist.location",
         "spherical" : true }},
      {$unwind : "$data"},
      {$group : {"_id" : "$data.year",
                 "totalPop" : {"$sum" : "$data.totalPop"},
                 "states" : {"$addToSet" : "$name"}}},
      {$sort : {"_id" : 1}},
      {$out : "peopleNearMemphis"}
    ])
  • 46. $out
    db.cData.aggregate([ <pipeline stages>, {"$out" : "resultsCollection"} ])
    • Saves the aggregation results to a new collection
    • Must be the last stage in the pipeline
    • NOTE: Replaces any data already in that collection
    • Useful for transforming documents (ETL)
  • 47. Back To The Original Question
    • Which US Division has the fastest growing population density?
      – We only want to include states with more than 1M people
      – We only want to include divisions larger than 100K square miles
  • 48. Division with Fastest Growing Pop Density
    db.cData.aggregate([
      // keep states where any census year has more than 1M people
      {$match : {"data.totalPop" : {"$gt" : 1000000}}},
      // one document per census year, oldest first
      {$unwind : "$data"},
      {$sort : {"data.year" : 1}},
      // one document per state: 1990 and 2010 populations, area, division
      {$group : {"_id" : "$name",
                 "pop1990" : {"$first" : "$data.totalPop"},
                 "pop2010" : {"$last" : "$data.totalPop"},
                 "areaM" : {"$first" : "$areaM"},
                 "division" : {"$first" : "$division"}}},
      // roll the states up into divisions
      {$group : {"_id" : "$division",
                 "totalPop1990" : {"$sum" : "$pop1990"},
                 "totalPop2010" : {"$sum" : "$pop2010"},
                 "totalAreaM" : {"$sum" : "$areaM"}}},
      // divisions larger than 100K square miles
      {$match : {"totalAreaM" : {"$gt" : 100000}}},
      // people per square mile in each year, and the change
      {$project : {"_id" : 0, "division" : "$_id",
                   "density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]},
                   "density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]},
                   "denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},
                                                {"$divide" : ["$totalPop1990", "$totalAreaM"]}]},
                   "totalAreaM" : 1, "totalPop1990" : 1, "totalPop2010" : 1}},
      // fastest-growing density first
      {$sort : {"denDelta" : -1}}
    ])
  • 49. { "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997, "division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" : 200.6566049221612, "denDelta" : 55.03359806413451 }
    { "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995, "division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658, "denDelta" : 30.765371019911385 }
    { "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" : "Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674, "denDelta" : 29.90973998350529 }
    { "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" : "West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626, "denDelta" : 21.716845736155996 }
    { "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" : "East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348, "denDelta" : 17.754371635499567 }
    { "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" : "East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866, "denDelta" : 14.641944911508176 }
    { "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" : "Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta" : 13.101876233449651 }
    { "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" : "West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449, "denDelta" : 7.230812757118798 }
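The denDelta field is the change in people per square mile over a fixed division area. Writing P_y for a division's totalPop in census year y and A for its totalAreaM (square miles), the two $divide terms share a denominator, so the South Atlantic figure can be checked by hand:

```latex
\text{density}_{y} = \frac{P_{y}}{A}, \qquad
\text{denDelta} = \frac{P_{2010}}{A} - \frac{P_{1990}}{A} = \frac{P_{2010} - P_{1990}}{A}

\text{South Atlantic:}\quad
\frac{58\,277\,380 - 42\,293\,785}{290\,433.4} = \frac{15\,983\,595}{290\,433.4} \approx 55.03
```

Because A is fixed per division, ranking by denDelta is the same as ranking by population gained per square mile.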
  • 51. Aggregate options
    db.cData.aggregate([<pipeline stages>], {'explain' : false, 'allowDiskUse' : true, 'cursor' : {'batchSize' : 5}})
    • explain: similar to find().explain()
    • allowDiskUse: enables the use of disk to store intermediate results
    • cursor: specifies the size of the initial batch of results
  • 53. $sample
    { $sample: { size: <positive integer> } }
    • With WiredTiger: uses a pseudo-random cursor to return documents
    • With MMAPv1: uses the _id index to randomly select documents
    Used by Compass; also useful for unit tests, etc.
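As a mental model of what $sample hands back, here is a plain JavaScript sketch that picks `size` documents at random with a partial Fisher-Yates shuffle. This is a swapped-in technique, not the server's storage-engine-specific strategy, and it samples without replacement, which is a simplification of what the server guarantees. The optional `random` parameter is a hypothetical hook for injecting a seeded generator in unit tests.

```javascript
// Returns min(size, docs.length) randomly chosen documents.
function sample(docs, size, random = Math.random) {
  const pool = [...docs]; // never reorder the caller's array
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    // pick a random index from the not-yet-chosen tail [i, length)
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i }));
const picked = sample(docs, 3); // three random documents
```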
  • 54. $lookup
    • Performs a left outer join to another collection in the same database, bringing in documents from the "joined" collection for processing.
    • To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the "joined" collection.
    { $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } }
    • The "from" collection CANNOT BE SHARDED.
    https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
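The "left outer" part is the key detail: every input document survives, and unmatched ones just get an empty array. A plain JavaScript sketch with a hypothetical lookup helper; unlike the real stage, `from` here takes an in-memory array rather than a collection name, and matching is simplified to strict equality on top-level fields. The second input document ({ v: 99, k: 7 }) is made up to show the no-match case.

```javascript
// Adds doc[as] = array of `from` documents whose foreignField equals doc's localField.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
];

const joined = lookup([{ v: 14, k: 0 }, { v: 99, k: 7 }], {
  from: keys, localField: "k", foreignField: "_id", as: "name",
});
// joined[0].name → [{ _id: 0, name: "East Meter" }]
// joined[1].name → []  (no match, but the document is still kept)
```

Following $lookup with $unwind (as the next example does) drops the documents whose array is empty, which turns the left outer join into an inner join.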
  • 55. Sample data:
    > db.data.find()
    { "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 }
    { "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 }
    { "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 }
    { "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 }
    { "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 }
    { "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 }
    { "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 }
    { "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 }
    { "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 }
    { "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 }
    > db.keys.find()
    { "_id" : 0, "name" : "East Meter" }
    { "_id" : 1, "name" : "Central Meter 12" }
    { "_id" : 2, "name" : "New HIFI Monitor" }
  • 56. Find the average "v" value for each "k", looking up the name of each "k":
    db.data.aggregate( [
      { "$lookup" : { "from" : "keys", "localField" : "k", "foreignField" : "_id", "as" : "name" } },
      { "$unwind" : "$name" },
      { "$project" : { "k" : "$k", "name" : "$name.name", "v" : "$v" } },
      { "$group" : { "_id" : "$name", "aveValue" : { "$avg" : "$v" } } },
      { "$project" : { "_id" : 0, "name" : "$_id", "aveValue" : "$aveValue" } }
    ] );
    { "aveValue" : 277.5, "name" : "New HIFI Monitor" }
    { "aveValue" : 559, "name" : "Central Meter 12" }
    { "aveValue" : 391, "name" : "East Meter" }
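To check these averages by hand, here is the same lookup, unwind and group logic in plain JavaScript over the sample data from the previous slide (ObjectIds omitted). It is an illustration of what the pipeline computes, not a replacement for it.

```javascript
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];

// $lookup + $unwind + first $project: one { name, v } row per matching key.
const named = data.flatMap((d) =>
  keys.filter((key) => key._id === d.k).map((key) => ({ name: key.name, v: d.v }))
);

// $group by name with $avg on v, then the final $project renames _id to name.
const totals = new Map();
for (const { name, v } of named) {
  const t = totals.get(name) || { sum: 0, count: 0 };
  t.sum += v;
  t.count += 1;
  totals.set(name, t);
}
const averages = [...totals].map(([name, t]) => ({ name, aveValue: t.sum / t.count }));
// East Meter:       (14 + 686 + 473) / 3      = 391
// Central Meter 12: (664 + 701 + 312) / 3     = 559
// New HIFI Monitor: (10 + 669 + 273 + 158) / 4 = 277.5
```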
  • 57. Friends of friends
    • Use $lookup to perform "self-joins" for graph problems.
    • Simple case: find the friends of someone's friends
    • This can be extended to find cliques, paths, etc.
    Dataset:
    { "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE", "MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA", "DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA", "JOAN", "HIRAM" ] }
    { "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI", "KESHIA" ] }
    ...
  • 59. Don't forget your indexes…
    Running FOF.friendsOfFriends(1) without indexes took 48ms and 562 collection lock acquisitions:
    2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: { r: 562 } } } protocol:op_command 48ms
    With indexes { "friends" : 1 } & { "name" : 1 }, the same query took 2ms and 16 collection lock acquisitions:
    2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } protocol:op_command 2ms
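The pipeline in the log reads as: $match one person, $unwind their friends, $lookup each friend by name, $unwind that friend's own friends, then $group to de-duplicate. A plain JavaScript sketch of the same steps; the names come from the dataset slide but the friendships below are hypothetical, and the Map keyed by name is only loosely analogous to the { "name" : 1 } index that speeds up the $lookup.

```javascript
// Hypothetical friendship data (names borrowed from the dataset slide).
const people = [
  { _id: 1, name: "FLOYD", friends: ["BILLIE", "JOAN"] },
  { _id: 2, name: "BILLIE", friends: ["ALI", "JOAN"] },
  { _id: 3, name: "JOAN", friends: ["ALI", "KESHIA"] },
];

function friendsOfFriends(id) {
  // name -> person, so each "lookup by name" is a direct hit, not a scan
  const byName = new Map(people.map((p) => [p.name, p]));
  const me = people.find((p) => p._id === id); // $match { _id: id }
  if (!me) return [];
  const result = new Set(); // $group on the friend-of-friend name de-duplicates
  for (const friendName of me.friends) {   // $unwind "$friends"
    const friend = byName.get(friendName); // $lookup on name
    if (!friend) continue;                 // no match: $unwind drops it
    for (const fof of friend.friends) result.add(fof); // $unwind the friend's friends
  }
  return [...result];
}

// friendsOfFriends(1) → ["ALI", "JOAN", "KESHIA"]
```

Like the original pipeline, this does not exclude direct friends (JOAN is both) or the starting person; an extra filter stage would be needed for that.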
  • 60. Lots of new mathematical operators
    – $stdDevSamp: calculates the sample standard deviation. { $stdDevSamp: <array> }
    – $stdDevPop: calculates the population standard deviation. { $stdDevPop: <array> }
    – $sqrt: calculates the square root. { $sqrt: <number> }
    – $abs: returns the absolute value of a number. { $abs: <number> }
    – $log: calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
    – $log10: calculates the log base 10 of a number. { $log10: <number> }
    – $ln: calculates the natural log of a number. { $ln: <number> }
    – $pow: raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
    – $exp: raises e to the specified exponent. { $exp: <number> }
    – $trunc: truncates a number to its integer part. { $trunc: <number> }
    – $ceil: returns the smallest integer greater than or equal to the specified number. { $ceil: <number> }
    – $floor: returns the largest integer less than or equal to the specified number. { $floor: <number> }
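The only difference between $stdDevPop and $stdDevSamp is the divisor: n for a whole population, n − 1 for a sample. A plain JavaScript sketch with hypothetical helper names; returning null for too few values is this sketch's own convention.

```javascript
// Sum of squared deviations from the mean.
function sumSquaredDeviations(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((acc, x) => acc + (x - mean) ** 2, 0);
}

// Population standard deviation: divide by n.
function stdDevPop(xs) {
  if (xs.length === 0) return null; // undefined for no values
  return Math.sqrt(sumSquaredDeviations(xs) / xs.length);
}

// Sample standard deviation: divide by n - 1 (Bessel's correction).
function stdDevSamp(xs) {
  if (xs.length < 2) return null; // needs at least two values
  return Math.sqrt(sumSquaredDeviations(xs) / (xs.length - 1));
}

const xs = [2, 4, 4, 4, 5, 5, 7, 9]; // mean 5, squared deviations sum to 32
console.log(stdDevPop(xs));  // 2   (sqrt(32 / 8))
console.log(stdDevSamp(xs)); // about 2.138 (sqrt(32 / 7))
```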
  • 61. New array operators
    – $slice: returns a subset of an array. { $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
    – $arrayElemAt: returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
    – $concatArrays: concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ] }
    – $isArray: determines if the operand is an array. { $isArray: [ <expression> ] }
    – $filter: selects a subset of the array based on the condition. { $filter: { input: <array>, as: <string>, cond: <expression> } }
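A plain JavaScript sketch of $slice and $arrayElemAt as we understand their documented semantics: a negative n or position counts from the end, an out-of-range position gives an empty array, and an out-of-range index gives no value. The helper names are hypothetical; check the operator reference for edge cases before relying on them.

```javascript
// { $slice: [arr, n] }: first n elements, or the last |n| if n < 0.
// { $slice: [arr, position, n] }: n elements starting at position;
// a negative position counts back from the end (clamped to the start).
function slice(arr, a, b) {
  if (b === undefined) return a >= 0 ? arr.slice(0, a) : arr.slice(a);
  const start = a < 0 ? Math.max(arr.length + a, 0) : a;
  return arr.slice(start, start + b);
}

// { $arrayElemAt: [arr, idx] }: a negative idx counts from the end;
// out of range yields no value (undefined here).
function arrayElemAt(arr, idx) {
  return idx < 0 ? arr[arr.length + idx] : arr[idx];
}

const xs = [1, 2, 3, 4, 5];
slice(xs, 2);        // [1, 2]
slice(xs, -2);       // [4, 5]
slice(xs, 1, 2);     // [2, 3]
slice(xs, -2, 1);    // [4]
arrayElemAt(xs, -1); // 5
```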
  • 64. Framework Use Cases
    • Basic aggregation queries
    • Ad-hoc reporting
    • Real-time analytics
    • Visualizing and reshaping data
  • 65. Questions? Thanks for attending, and happy aggregating! Please complete the survey.
    jason.mimick@mongodb.com @jmimick