We're working on a new aggregation framework for MongoDB that will make it much easier to do simple tasks like counting, averaging, and finding minima or maxima while grouping by keys in a collection. The new aggregation features are not a replacement for map-reduce, but they will make a number of things much easier to do without resorting to the big hammer that is map-reduce. After introducing the syntax and usage patterns of the new framework, we will demonstrate some aggregations with it.
3. STATE OF AGGREGATION
We're storing our data in MongoDB.
We need to do ad-hoc reporting,
grouping, common aggregations, etc.
What are we using for this?
4. DATA WAREHOUSING
SQL for reporting and analytics
Infrastructure complications
Additional maintenance
Data duplication
ETL processes
Real time?
6. MAPREDUCE IN MONGODB
Implemented with JavaScript
Single-threaded
Difficult to debug
Concurrency
Appearance of parallelism
Write locks (without inline or jsMode)
8. AGGREGATION FRAMEWORK
Declared in BSON, executes in C++
Flexible, functional, and simple
Operation pipeline
Computational expressions
Plays nice with sharding
10. PIPELINE
Process a stream of documents
Original input is a collection
Final output is a result document
Series of operators
Filter or transform data
Input/output chain
ps ax | grep mongod | head -n 1
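Each operator's output is the next operator's input. A minimal sketch of that chain, assuming an in-memory array stands in for the collection and plain JavaScript functions stand in for operators; `runPipeline`, `onlyAvailable`, and `titlesOnly` are hypothetical names, and the second book's values are made up for illustration:

```javascript
// Hypothetical in-memory model of a document pipeline; not MongoDB's
// implementation. Each stage takes an array of documents and returns a new one.
function runPipeline(docs, stages) {
  // Feed each stage's output into the next stage, like `a | b | c` in a shell.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two illustrative stages: one filters documents, one transforms them.
const onlyAvailable = (docs) => docs.filter((d) => d.available === true);
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

// Input "collection"; the second document is made up for illustration.
const books = [
  { _id: 375, title: "The Great Gatsby", available: true },
  { _id: 1, title: "Animal Farm", available: false },
];

const result = runPipeline(books, [onlyAvailable, titlesOnly]);
console.log(JSON.stringify(result)); // [{"title":"The Great Gatsby"}]
```

Because stages only see arrays in and arrays out, any stage can be reordered or removed without touching the others.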
11. PIPELINE OPERATORS
$match
$project
$group
$unwind
$sort
$limit
$skip
12. OUR EXAMPLE DATA
Library Books
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  language: "English",
  publisher: {
    city: "London",
    name: "Random House"
  }
}
20. $GROUP
Group documents by an ID
Field reference, object, constant
Other output fields are computed
$max, $min, $avg, $sum
$addToSet, $push
$first, $last
Processes all data in memory
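To make the accumulators concrete, here is a rough model of what a stage like {$group: {_id: "$language", count: {$sum: 1}, ...}} computes. It is a sketch under simplifying assumptions, not the server's code: `group` is a hypothetical helper, group keys must be scalars, $sum/$avg/$min/$max assume numbers, and only the Gatsby values come from our example data (the other two books are made up).

```javascript
// Hypothetical, simplified model of $group; not MongoDB's implementation.
// Each accumulator is [operator, source], where source is a field name or a
// numeric constant (e.g. ["$sum", 1] counts documents).
function group(docs, idField, accumulators) {
  const groups = new Map(); // key -> member documents, in input order
  for (const doc of docs) {
    const key = doc[idField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  // All groups are held in memory at once, as the slide notes.
  const out = [];
  for (const [key, members] of groups) {
    const row = { _id: key };
    for (const [name, [op, src]] of Object.entries(accumulators)) {
      const values = members.map((d) => (typeof src === "number" ? src : d[src]));
      const total = () => values.reduce((a, b) => a + b, 0);
      switch (op) {
        case "$sum": row[name] = total(); break;
        case "$avg": row[name] = total() / values.length; break;
        case "$min": row[name] = Math.min(...values); break;
        case "$max": row[name] = Math.max(...values); break;
        case "$first": row[name] = values[0]; break;
        case "$last": row[name] = values[values.length - 1]; break;
        case "$push": row[name] = values; break;
        case "$addToSet": row[name] = [...new Set(values)]; break;
        default: throw new Error(`unsupported accumulator ${op}`);
      }
    }
    out.push(row);
  }
  return out;
}

// Hypothetical sample: only the Gatsby values come from the example document.
const books = [
  { title: "The Great Gatsby", language: "English", pages: 218 },
  { title: "Sample Book A", language: "English", pages: 100 },
  { title: "Sample Book B", language: "French", pages: 300 },
];

const result = group(books, "language", {
  count: ["$sum", 1],
  totalPages: ["$sum", "pages"],
  avgPages: ["$avg", "pages"],
  maxPages: ["$max", "pages"],
  titles: ["$push", "title"],
});
console.log(JSON.stringify(result));
```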
24. $UNWIND
Operate on an array field
Yield new documents for each array element
Array replaced by element value
Missing/empty fields → no output
Non-array fields → error
Pipe to $group to aggregate array values
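The rules above fit in a few lines. A sketch, assuming documents are plain objects and the field is named without its "$" prefix; `unwind` is a hypothetical helper that follows the slide's behavior (including the error on non-array fields) rather than any particular server release, and the last two sample documents are made up:

```javascript
// Hypothetical model of $unwind following the rules on the slide; not
// MongoDB's implementation. `field` is the field name without the "$".
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // missing field -> no output
    if (!Array.isArray(value)) {
      // Non-array field -> error, as the slide describes.
      throw new Error(`$unwind: field "${field}" is not an array`);
    }
    // One output document per element; an empty array yields nothing.
    for (const element of value) {
      out.push({ ...doc, [field]: element }); // array replaced by the element
    }
  }
  return out;
}

const books = [
  { _id: 375, title: "The Great Gatsby", subjects: ["Long Island", "New York", "1920s"] },
  { _id: 1, title: "No subjects", subjects: [] }, // hypothetical: yields nothing
  { _id: 2, title: "Missing field" },             // hypothetical: yields nothing
];

const result = unwind(books, "subjects");
console.log(result.map((d) => d.subjects).join(", ")); // Long Island, New York, 1920s
```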
25. $UNWIND
Yielding multiple documents from one
{ _id: 375,
  title: "The Great Gatsby",
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ]
}
► {$unwind: "$subjects"}
▼
{ _id: 375,
  title: "The Great Gatsby",
  subjects: "Long Island"
}
{ _id: 375,
  title: "The Great Gatsby",
  subjects: "New York"
}
{ _id: 375,
  title: "The Great Gatsby",
  subjects: "1920s"
}
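Piping $unwind into $group is how array values get aggregated, for example {$unwind: "$subjects"} followed by {$group: {_id: "$subjects", count: {$sum: 1}}} to count books per subject. A plain JavaScript sketch of what that pair computes, assuming every `subjects` value is an array or missing; `countBySubject` is a hypothetical helper and the second book is made up:

```javascript
// Hypothetical model of $unwind followed by $group with {$sum: 1};
// not run against MongoDB.
function countBySubject(docs) {
  const counts = new Map(); // subject -> number of books, in first-seen order
  for (const doc of docs) {
    // $unwind step: visit each array element; a missing field yields nothing.
    for (const subject of doc.subjects ?? []) {
      // $group step: add 1 to the running count for this subject's key.
      counts.set(subject, (counts.get(subject) ?? 0) + 1);
    }
  }
  // One output document per group, shaped like $group output.
  return [...counts].map(([subject, count]) => ({ _id: subject, count }));
}

const books = [
  { _id: 375, title: "The Great Gatsby", subjects: ["Long Island", "New York", "1920s"] },
  { _id: 1, title: "Hypothetical Book", subjects: ["New York"] }, // made-up example
];

console.log(JSON.stringify(countBySubject(books)));
// [{"_id":"Long Island","count":1},{"_id":"New York","count":2},{"_id":"1920s","count":1}]
```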
26. $SORT, $LIMIT, $SKIP
Sort documents by one or more fields
Same order syntax as cursors
Waits for earlier pipeline operator to return
In-memory unless early and indexed
Limit and skip follow cursor behavior
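A sketch of these three operators over an in-memory array, assuming the cursor-style order spec (1 ascending, -1 descending, applied key by key). `sortStage`, `limitStage`, and `skipStage` are hypothetical names, values are compared with JavaScript's `<` and `>` rather than MongoDB's BSON type ordering, and the sample documents are made up:

```javascript
// Hypothetical helpers modeling $sort, $limit and $skip; not MongoDB's code.
function sortStage(docs, spec) {
  const keys = Object.entries(spec); // e.g. [["pages", -1], ["title", 1]]
  // Copy the input: the stage needs every document before it can emit one.
  return [...docs].sort((a, b) => {
    for (const [field, dir] of keys) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // equal on every key
  });
}

// $limit and $skip behave like their cursor counterparts.
const limitStage = (docs, n) => docs.slice(0, n);
const skipStage = (docs, n) => docs.slice(n);

// Hypothetical documents: sort by pages descending, then title ascending.
const docs = [
  { title: "B", pages: 100 },
  { title: "A", pages: 100 },
  { title: "C", pages: 300 },
];
const sorted = sortStage(docs, { pages: -1, title: 1 });
console.log(sorted.map((d) => d.title).join(", ")); // C, A, B
console.log(skipStage(limitStage(sorted, 2), 1).map((d) => d.title).join(", ")); // A
```

Applying $limit before $skip, as here, takes the first two documents and then drops one, which is why only "A" remains.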
27. $SORT
Sort all documents in the pipeline
(the eight book documents below, in unsorted order)
► {$sort: {title: 1}}
▼
{title: "Animal Farm"}
{title: "Brave New World"}
{title: "Fahrenheit 451"}
{title: "Fathers and Sons"}
{title: "Invisible Man"}
{title: "Lord of the Flies"}
{title: "The Grapes of Wrath"}
{title: "The Great Gatsby"}
28. $LIMIT
Limit documents through the pipeline
{title: "Animal Farm"}
{title: "Brave New World"}
{title: "Fahrenheit 451"}
{title: "Fathers and Sons"}
{title: "Invisible Man"}
{title: "Lord of the Flies"}
{title: "The Grapes of Wrath"}
{title: "The Great Gatsby"}
► {$limit: 5}
▼
{title: "Animal Farm"}
{title: "Brave New World"}
{title: "Fahrenheit 451"}
{title: "Fathers and Sons"}
{title: "Invisible Man"}
29. $SKIP
Skip over documents in the pipeline
{title: "Animal Farm"}
{title: "Brave New World"}
{title: "Fahrenheit 451"}
{title: "Fathers and Sons"}
{title: "Invisible Man"}
► {$skip: 2}
▼
{title: "Fahrenheit 451"}
{title: "Fathers and Sons"}
{title: "Invisible Man"}
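The $sort, $limit, and $skip examples chain together. A plain JavaScript reproduction of {$sort: {title: 1}}, then {$limit: 5}, then {$skip: 2}, assuming simple string comparison for the titles; the titles come from the slides, but their input order here is arbitrary:

```javascript
// Hypothetical reproduction of the sort -> limit -> skip examples; not run
// against MongoDB. Input order is arbitrary; only the titles are from the slides.
const books = [
  "The Great Gatsby", "Brave New World", "The Grapes of Wrath", "Animal Farm",
  "Lord of the Flies", "Fathers and Sons", "Invisible Man", "Fahrenheit 451",
].map((title) => ({ title }));

// $sort: ascending by title (plain string comparison).
const sorted = [...books].sort((a, b) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0));
// $limit: keep the first five documents.
const limited = sorted.slice(0, 5);
// $skip: drop the first two of those.
const skipped = limited.slice(2);

console.log(skipped.map((d) => d.title).join(", "));
// Fahrenheit 451, Fathers and Sons, Invisible Man
```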
30. EXPRESSIONS
State of Aggregation
Pipeline
Expressions
Usage and Limitations
Sharding
Looking Ahead
32. EXPRESSIONS
Logic: $and, $or, $not…
Comparison: $cmp, $eq, $gt…
Arithmetic: $add, $divide…
String: $strcasecmp, $substr…
Date: $year, $dayOfMonth…
Conditional: $cond, $ifNull…
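To show how expressions nest and refer to fields, here is a tiny evaluator for a few of the operators above. It is a sketch under stated assumptions: `evaluate` is hypothetical, operands are always arrays such as {$add: ["$pages", 10]}, strings starting with "$" are field paths, everything is evaluated eagerly with JavaScript semantics, and the server's actual behavior (type rules, null handling) may differ.

```javascript
// Hypothetical evaluator for a handful of expression operators; not MongoDB's code.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    // Field path such as "$publisher.city": walk nested objects.
    return expr
      .slice(1)
      .split(".")
      .reduce((value, key) => (value == null ? undefined : value[key]), doc);
  }
  if (expr === null || typeof expr !== "object" || Array.isArray(expr)) {
    return expr; // literal value
  }
  const [op, operands] = Object.entries(expr)[0];
  // Evaluate every operand first (eager evaluation is an assumption here).
  const args = (Array.isArray(operands) ? operands : [operands]).map((e) => evaluate(e, doc));
  switch (op) {
    case "$and": return args.every(Boolean);
    case "$or": return args.some(Boolean);
    case "$not": return !args[0];
    case "$eq": return args[0] === args[1];
    case "$gt": return args[0] > args[1];
    case "$add": return args.reduce((a, b) => a + b, 0);
    case "$divide": return args[0] / args[1];
    case "$cond": return args[0] ? args[1] : args[2];
    case "$ifNull": return args[0] == null ? args[1] : args[0];
    default: throw new Error(`unsupported operator ${op}`);
  }
}

// Fields taken from the example book document.
const gatsby = {
  _id: 375, title: "The Great Gatsby", available: true, pages: 218, chapters: 9,
  language: "English", publisher: { city: "London", name: "Random House" },
};

console.log(evaluate("$publisher.city", gatsby)); // London
console.log(evaluate({ $cond: [{ $gt: ["$pages", 200] }, "long", "short"] }, gatsby)); // long
console.log(evaluate({ $ifNull: ["$subtitle", "(none)"] }, gatsby)); // (none)
console.log(evaluate({ $and: ["$available", { $eq: ["$language", "English"] }] }, gatsby)); // true
```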
39. SHARDING
Split the pipeline at first $group or $sort
Shards execute pipeline up to that point
mongos merges results and continues
Early $match may excuse shards
CPU and memory implications for mongos
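The split rule itself is simple to state in code. A sketch that only finds the split point described above, assuming stages are BSON-like objects; `splitPipeline` is a hypothetical helper, and real mongos behavior is more involved than this:

```javascript
// Hypothetical sketch of the sharded split point; not mongos's actual logic.
// Stages before the first $group or $sort run on each shard; mongos runs the rest.
function splitPipeline(pipeline) {
  const at = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  if (at === -1) {
    return { shards: pipeline, mongos: [] }; // in this model, shards run every stage
  }
  return { shards: pipeline.slice(0, at), mongos: pipeline.slice(at) };
}

const pipeline = [
  { $match: { language: "English" } },
  { $unwind: "$subjects" },
  { $group: { _id: "$subjects", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

const { shards, mongos } = splitPipeline(pipeline);
console.log(shards.map((s) => Object.keys(s)[0]).join(", ")); // $match, $unwind
console.log(mongos.map((s) => Object.keys(s)[0]).join(", ")); // $group, $sort
```

Putting $match first, as in this pipeline, is what lets a shard with no matching documents contribute little or nothing.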
42. LOOKING AHEAD
43. FRAMEWORK USE CASES
Basic aggregation queries
Ad-hoc reporting
Real-time analytics
Visualizing time series data
44. EXTENDING THE FRAMEWORK
Adding new pipeline operators, expressions
$out and $tee for output control
https://jira.mongodb.org/browse/SERVER-3253
45. FUTURE ENHANCEMENTS
Move $match earlier when possible
Pipeline explain facility
Memory usage improvements
Grouping input sorted by _id
Sorting with limited output (top k)
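One way to read "sorting with limited output (top k)": when $sort is followed by $limit, only k documents ever need to be kept, instead of sorting everything in memory. A minimal sketch of that idea, assuming a hypothetical `topK` helper that keeps an ordered buffer of at most k documents (O(n·k) time, O(k) memory; a heap would be the usual choice for large k). This illustrates the technique, not the planned implementation.

```javascript
// Hypothetical top-k selection with a bounded, ordered buffer.
function topK(docs, k, compare) {
  const best = []; // at most k documents, kept in ascending order
  for (const doc of docs) {
    // Walk left past every kept document that sorts after `doc`.
    let i = best.length;
    while (i > 0 && compare(doc, best[i - 1]) < 0) i--;
    if (i < k) {
      best.splice(i, 0, doc); // insert in order
      if (best.length > k) best.pop(); // evict the one pushed out of the top k
    }
  }
  return best;
}

const byTitle = (a, b) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0);
const books = [
  "The Great Gatsby", "Brave New World", "The Grapes of Wrath", "Animal Farm",
  "Lord of the Flies", "Fathers and Sons", "Invisible Man", "Fahrenheit 451",
].map((title) => ({ title }));

const top5 = topK(books, 5, byTitle);
console.log(top5.map((d) => d.title).join(", "));
// Animal Farm, Brave New World, Fahrenheit 451, Fathers and Sons, Invisible Man
```

The result matches {$sort: {title: 1}} followed by {$limit: 5}, but the buffer never holds more than five documents.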
46. ENABLING DEVELOPERS
Doing more within MongoDB, faster
Refactoring MapReduce and groupings
Replace pages of JavaScript
Longer aggregation pipelines
Quick aggregations from the shell