Aggregation Framework

#MONGOBOSTON

AGGREGATION FRAMEWORK
Jeremy Mikola
@jmikola

AGENDA

State of Aggregation
Pipeline
Expressions
Usage and Limitations
Sharding
Looking Ahead

STATE OF AGGREGATION
We're storing our data in MongoDB.

We need to do ad-hoc reporting,
grouping, common aggregations, etc.

What are we using for this?

DATA WAREHOUSING
SQL for reporting and analytics
Infrastructure complications
Additional maintenance
Data duplication
ETL processes
Real time?

MAPREDUCE
Extremely versatile, powerful
Intended for complex data analysis
Overkill for simple aggregation tasks
Averages
Summation
Grouping

MAPREDUCE IN MONGODB
Implemented with JavaScript
Single-threaded
Difficult to debug
Concurrency
Appearance of parallelism
Write locks (without inline or jsMode)

And now for something completely different…

AGGREGATION FRAMEWORK
Declared in BSON, executes in C++
Flexible, functional, and simple
Operation pipeline
Computational expressions
Plays nice with sharding

PIPELINE
Process a stream of documents
Original input is a collection
Final output is a result document
Series of operators
Filter or transform data
Input/output chain

ps ax | grep mongod | head -n1
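The shell-pipe analogy can be mimicked in plain JavaScript: each stage is a function from an array of documents to a new array, and each stage's output feeds the next. This is only an illustrative sketch. `runPipeline`, `onlyEnglish`, and `firstOnly` are hypothetical helpers, not part of MongoDB or any driver, and the server runs real pipelines in C++.

```javascript
// Hypothetical helper: run each stage (an array => array function) in order,
// feeding one stage's output into the next, like a shell pipe.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two toy stages standing in for pipeline operators (roughly $match and $limit).
const onlyEnglish = (docs) => docs.filter((d) => d.language === "English");
const firstOnly = (docs) => docs.slice(0, 1);

const books = [
  { title: "The Great Gatsby", language: "English" },
  { title: "War and Peace", language: "Russian" },
  { title: "Atlas Shrugged", language: "English" },
];

const out = runPipeline(books, [onlyEnglish, firstOnly]);
console.log(out.map((d) => d.title)); // [ 'The Great Gatsby' ]
```

Reordering the stages changes the result, just as it does for operators in a real pipeline.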

PIPELINE OPERATORS
$match
$project
$group
$unwind
$sort
$limit
$skip

OUR EXAMPLE DATA
Library Books
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  language: "English",
  publisher: {
    city: "London",
    name: "Random House"
  }
}

$MATCH
Filter documents
Uses existing query syntax
No geospatial operations or $where

$MATCH
Matching field values
{ title: "The Great Gatsby", pages: 218, language: "English" }
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
► { $match: { language: "Russian" } }
▼
{ title: "War and Peace", pages: 1440, language: "Russian" }

$MATCH
Matching with query operators
{ title: "The Great Gatsby", pages: 218, language: "English" }
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
► { $match: { pages: { $gt: 1000 } } }
▼
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
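As a rough mental model, the two $match stages above act like `Array.prototype.filter` over the three sample books. This sketch covers only plain equality and a numeric `$gt`. The real $match accepts the full query language (array element matching, comparison across BSON types, and so on), so treat it as a simplification, not the server's implementation.

```javascript
// The three sample books from the $match slides, as an in-memory array.
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Roughly { $match: { language: "Russian" } }: keep documents whose field equals the value.
const russian = books.filter((b) => b.language === "Russian");

// Roughly { $match: { pages: { $gt: 1000 } } }: keep documents with more than 1000 pages.
const longBooks = books.filter((b) => b.pages > 1000);

console.log(russian.map((b) => b.title));   // [ 'War and Peace' ]
console.log(longBooks.map((b) => b.title)); // [ 'War and Peace', 'Atlas Shrugged' ]
```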

$PROJECT
Reshape documents
Include, exclude or rename fields
Inject computed fields
Manipulate sub-document fields

$PROJECT
Including and excluding fields
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  language: "English"
}
► { $project: { _id: 0, title: 1, language: 1 } }
▼
{ title: "The Great Gatsby", language: "English" }
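The include/exclude rules can be sketched for a single document. `_id` is kept unless the spec sets it to 0, and only fields marked 1 are copied. `includeFields` is a hypothetical helper for illustration and handles only top-level inclusion, not computed or nested fields.

```javascript
// Hypothetical helper mimicking an inclusion-style $project on one document.
function includeFields(doc, spec) {
  const out = {};
  // _id is included by default unless explicitly excluded with 0.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const gatsby = { _id: 375, title: "The Great Gatsby", pages: 218, chapters: 9, language: "English" };
const projected = includeFields(gatsby, { _id: 0, title: 1, language: 1 });
console.log(projected); // { title: 'The Great Gatsby', language: 'English' }
```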

$PROJECT
Renaming and computing fields
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  language: "English"
}
► { $project: {
    avgPagesPerChapter: { $divide: ["$pages", "$chapters"] },
    lang: "$language"
  } }
▼
{
  _id: 375,
  avgPagesPerChapter: 24.2222222,
  lang: "English"
}
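The arithmetic behind the computed field is simply pages divided by chapters (218 / 9 ≈ 24.2222222), and renaming just copies a value under a new key. This sketch builds the output object by hand to show that. It is plain JavaScript, not a MongoDB API.

```javascript
const gatsby = { _id: 375, title: "The Great Gatsby", pages: 218, chapters: 9, language: "English" };

// Hand-built equivalent of
// { $project: { avgPagesPerChapter: { $divide: ["$pages", "$chapters"] }, lang: "$language" } }
const reshaped = {
  _id: gatsby._id,                                    // _id is kept by default
  avgPagesPerChapter: gatsby.pages / gatsby.chapters, // computed field
  lang: gatsby.language,                              // renamed field
};

console.log(reshaped.avgPagesPerChapter.toFixed(7)); // 24.2222222
```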

$PROJECT
Creating and extracting sub-document fields
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  publisher: {
    city: "London",
    name: "Random House"
  }
}
► { $project: {
    title: 1,
    stats: {
      pages: "$pages",
      chapters: "$chapters"
    },
    pub_city: "$publisher.city"
  } }
▼
{
  _id: 375,
  title: "The Great Gatsby",
  stats: {
    pages: 218,
    chapters: 9
  },
  pub_city: "London"
}
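A field path like `"$publisher.city"` walks into the sub-document one key at a time. The sketch below imitates that walk with a hypothetical `getPath` helper, then builds the nested `stats` object by hand. It ignores paths that cross arrays, which the real server handles differently.

```javascript
// Hypothetical helper: resolve a dotted path such as "publisher.city".
// Returns undefined if any step along the way is missing.
function getPath(doc, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const gatsby = {
  _id: 375,
  title: "The Great Gatsby",
  pages: 218,
  chapters: 9,
  publisher: { city: "London", name: "Random House" },
};

// Hand-built equivalent of the $project on this slide.
const result = {
  _id: gatsby._id,
  title: gatsby.title,
  stats: { pages: getPath(gatsby, "pages"), chapters: getPath(gatsby, "chapters") }, // new sub-document
  pub_city: getPath(gatsby, "publisher.city"), // value extracted from a sub-document
};

console.log(result.pub_city); // London
```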

$GROUP
Group documents by an ID
Field reference, object, constant
Other output fields are computed
$max, $min, $avg, $sum
$addToSet, $push
$first, $last
Processes all data in memory

$GROUP
Calculating an average
{ title: "The Great Gatsby", pages: 218, language: "English" }
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
► { $group: {
    _id: "$language",
    avgPages: { $avg: "$pages" }
  } }
▼
{ _id: "Russian", avgPages: 1440 }
{ _id: "English", avgPages: 653 }
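One way to picture $avg inside $group is to keep a running sum and count for each `_id` and divide at the end. For English this gives (218 + 1088) / 2 = 653. `groupAvgPages` is a hypothetical in-memory sketch. It hard-codes the `language` key and `pages` field from the slide.

```javascript
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Roughly { $group: { _id: "$language", avgPages: { $avg: "$pages" } } }.
function groupAvgPages(docs) {
  const acc = new Map(); // group key -> { sum, count }
  for (const d of docs) {
    const entry = acc.get(d.language) || { sum: 0, count: 0 };
    entry.sum += d.pages;
    entry.count += 1;
    acc.set(d.language, entry);
  }
  // A Map keeps insertion order here; MongoDB makes no promise about group order.
  return [...acc].map(([key, { sum, count }]) => ({ _id: key, avgPages: sum / count }));
}

const averages = groupAvgPages(books); // English -> 653, Russian -> 1440
```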

$GROUP
Summing fields and counting
{ title: "The Great Gatsby", pages: 218, language: "English" }
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
► { $group: {
    _id: "$language",
    numTitles: { $sum: 1 },
    sumPages: { $sum: "$pages" }
  } }
▼
{ _id: "Russian", numTitles: 1, sumPages: 1440 }
{ _id: "English", numTitles: 2, sumPages: 1306 }
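`{ $sum: 1 }` counts documents because it adds the constant 1 once per document, and `{ $sum: "$pages" }` adds each document's field value. A hypothetical in-memory equivalent makes the difference visible.

```javascript
const books = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
];

// Roughly { $group: { _id: "$language", numTitles: { $sum: 1 }, sumPages: { $sum: "$pages" } } }.
function groupSumCount(docs) {
  const groups = {};
  for (const d of docs) {
    if (!groups[d.language]) groups[d.language] = { _id: d.language, numTitles: 0, sumPages: 0 };
    groups[d.language].numTitles += 1;      // constant 1 per document: a count
    groups[d.language].sumPages += d.pages; // the field's value: a sum
  }
  return Object.values(groups);
}

const totals = groupSumCount(books); // English: 2 titles, 1306 pages; Russian: 1 title, 1440 pages
```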

$GROUP
Collecting distinct values
{ title: "The Great Gatsby", pages: 218, language: "English" }
{ title: "War and Peace", pages: 1440, language: "Russian" }
{ title: "Atlas Shrugged", pages: 1088, language: "English" }
► { $group: {
    _id: "$language",
    titles: { $addToSet: "$title" }
  } }
▼
{ _id: "Russian", titles: ["War and Peace"] }
{ _id: "English", titles: ["Atlas Shrugged", "The Great Gatsby"] }
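$addToSet behaves roughly like collecting values into a JavaScript `Set` per group, so repeated titles appear only once. In this hypothetical sketch the `Set` keeps insertion order. $addToSet itself does not guarantee any order, which is why the slide's English titles may come back in either order.

```javascript
// Roughly { $group: { _id: "$language", titles: { $addToSet: "$title" } } }.
function groupTitles(docs) {
  const sets = new Map(); // group key -> Set of titles
  for (const d of docs) {
    if (!sets.has(d.language)) sets.set(d.language, new Set());
    sets.get(d.language).add(d.title); // a Set drops duplicates, as $addToSet does
  }
  return [...sets].map(([key, titles]) => ({ _id: key, titles: [...titles] }));
}

const books = [
  { title: "The Great Gatsby", language: "English" },
  { title: "War and Peace", language: "Russian" },
  { title: "Atlas Shrugged", language: "English" },
];

const distinctTitles = groupTitles(books);
```

Swapping the `Set` for a plain array that always appends would give $push-like behavior, which keeps duplicates.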

$UNWIND
Operate on an array field
Yield new documents for each array element
Array replaced by element value
Missing/empty fields → no output
Non-array fields → error
Pipe to $group to aggregate array values

$UNWIND
Yielding multiple documents from one
{
  _id: 375,
  title: "The Great Gatsby",
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ]
}
► { $unwind: "$subjects" }
▼
{ _id: 375, title: "The Great Gatsby", subjects: "Long Island" }
{ _id: 375, title: "The Great Gatsby", subjects: "New York" }
{ _id: 375, title: "The Great Gatsby", subjects: "1920s" }
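The $unwind rules listed above (one document per element, missing or empty arrays produce nothing, non-arrays are an error) can be sketched over an in-memory array. `unwind` is a hypothetical helper that follows the slide's description of this version's behavior. It treats only `undefined` and `[]` as "no output", which is an assumption for other edge cases such as `null`.

```javascript
// Hypothetical helper following the $unwind rules on the previous slide.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or empty array: the document produces no output.
    if (value === undefined || (Array.isArray(value) && value.length === 0)) continue;
    // Non-array value: treated as an error, as the slide describes.
    if (!Array.isArray(value)) throw new Error(`$unwind: field "${field}" is not an array`);
    // One copy of the document per element, with the array replaced by that element.
    for (const element of value) out.push({ ...doc, [field]: element });
  }
  return out;
}

const gatsby = { _id: 375, title: "The Great Gatsby", subjects: ["Long Island", "New York", "1920s"] };
const unwound = unwind([gatsby], "subjects");
console.log(unwound.map((d) => d.subjects)); // [ 'Long Island', 'New York', '1920s' ]
```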

$SORT, $LIMIT, $SKIP
Sort documents by one or more fields
Same order syntax as cursors
Waits for earlier pipeline operator to return
In-memory unless early and indexed
Limit and skip follow cursor behavior

$SORT
Sort all documents in the pipeline
{ title: "The Great Gatsby" }
{ title: "Brave New World" }
{ title: "The Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ title: "Fahrenheit 451" }
► { $sort: { title: 1 } }
▼
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Fahrenheit 451" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ title: "Lord of the Flies" }
{ title: "The Grapes of Wrath" }
{ title: "The Great Gatsby" }
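A single-field $sort with direction 1 or -1 can be imitated with a comparator that flips its sign for descending order. `sortBy` is a hypothetical helper. It compares plain ASCII strings with `<`, which matches the slide's ordering but does not model MongoDB's ordering across mixed BSON types.

```javascript
// Hypothetical helper: roughly { $sort: { [field]: direction } } for one field.
function sortBy(docs, field, direction) {
  // Copy first so the input array is not mutated.
  return [...docs].sort((a, b) =>
    a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
  );
}

const books = [
  "The Great Gatsby", "Brave New World", "The Grapes of Wrath", "Animal Farm",
  "Lord of the Flies", "Fathers and Sons", "Invisible Man", "Fahrenheit 451",
].map((title) => ({ title }));

const ascending = sortBy(books, "title", 1);
const descending = sortBy(books, "title", -1);
console.log(ascending[0].title); // Animal Farm
```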

$LIMIT
Limit documents through the pipeline
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Fahrenheit 451" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ title: "Lord of the Flies" }
{ title: "The Grapes of Wrath" }
{ title: "The Great Gatsby" }
► { $limit: 5 }
▼
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Fahrenheit 451" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }

$SKIP
Skip over documents in the pipeline
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Fahrenheit 451" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
► { $skip: 2 }
▼
{ title: "Fahrenheit 451" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
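On an in-memory array, $limit and $skip map naturally onto `slice`. This sketch chains them in the same order as the two slides above: sorted input, then `{ $limit: 5 }`, then `{ $skip: 2 }`. It only shows the arithmetic of which documents survive and is not a driver or server API.

```javascript
const sortedTitles = [
  "Animal Farm", "Brave New World", "Fahrenheit 451", "Fathers and Sons",
  "Invisible Man", "Lord of the Flies", "The Grapes of Wrath", "The Great Gatsby",
];

const limited = sortedTitles.slice(0, 5); // roughly { $limit: 5 }: keep the first 5
const skipped = limited.slice(2);         // roughly { $skip: 2 }: drop the first 2 of those

console.log(skipped); // [ 'Fahrenheit 451', 'Fathers and Sons', 'Invisible Man' ]
```

Because stages run in order, skipping before limiting would select a different window (documents 3 through 7 instead of 3 through 5).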

EXPRESSIONS

EXPRESSIONS
Return computed values
Used with $project and $group
Reference fields using $ (e.g. "$x")
Expressions may be nested

EXPRESSIONS
Logic: $and, $or, $not…
Comparison: $cmp, $eq, $gt…
Arithmetic: $add, $divide…
String: $strcasecmp, $substr…
Date: $year, $dayOfMonth…
Conditional: $cond, $ifNull…
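To show how field references and nested expressions fit together, here is a toy evaluator for a tiny subset ($add, $divide, $gt). `evaluate` is hypothetical and written for illustration. It evaluates nested arguments first, resolves `"$field"` strings against the current document, and treats everything else as a literal. It says nothing about how mongod actually evaluates expressions.

```javascript
// Hypothetical evaluator for a tiny subset of aggregation expressions.
function evaluate(expr, doc) {
  // "$field" or "$a.b": a field reference into the current document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return expr
      .slice(1)
      .split(".")
      .reduce((value, key) => (value == null ? undefined : value[key]), doc);
  }
  // { $op: [args...] }: evaluate the (possibly nested) arguments first.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op, args] = Object.entries(expr)[0];
    const values = args.map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return values.reduce((sum, v) => sum + v, 0);
      case "$divide":
        return values[0] / values[1];
      case "$gt":
        return values[0] > values[1];
      default:
        throw new Error(`unsupported operator: ${op}`);
    }
  }
  // Anything else is a literal constant.
  return expr;
}

const gatsby = { pages: 218, chapters: 9, publisher: { city: "London" } };
console.log(evaluate({ $divide: [{ $add: ["$pages", 7] }, "$chapters"] }, gatsby)); // 25
```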

USAGE

USAGE
aggregate database command
collection.aggregate() method
Mongo shell
Most drivers

COLLECTION METHOD
db.books.aggregate([
  { $sort: { created: 1 } },
  { $unwind: "$subjects" },
  { $group: { _id: "$subjects", n: { $sum: 1 }, fc: { $first: "$created" } } },
  { $project: { _id: 1, n: 1, fc: { $year: "$fc" } } }
]);

▼
{
  result: [
    { "_id" : "Fantasy", "n" : 6, "fc" : 2008 },
    { "_id" : "Historical", "n" : 7, "fc" : 2012 },
    { "_id" : "World Literature", "n" : 2, "fc" : 2009 },
    // Other results follow…
  ],
  ok: 1
}

DATABASE COMMAND
db.runCommand({ aggregate: "books", pipeline: [
  { $sort: { created: 1 } },
  { $unwind: "$subjects" },
  { $group: { _id: "$subjects", n: { $sum: 1 }, fc: { $first: "$created" } } },
  { $project: { _id: 1, n: 1, fc: { $year: "$fc" } } }
] });

▼
{
  result: [
    { "_id" : "Fantasy", "n" : 6, "fc" : 2008 },
    { "_id" : "Historical", "n" : 7, "fc" : 2012 },
    { "_id" : "World Literature", "n" : 2, "fc" : 2009 },
    // Other results follow…
  ],
  ok: 1
}

LIMITATIONS
Result limited by BSON document size
Final command result
Intermediate shard results
Pipeline operator memory limits
Some BSON types unsupported
Binary, Code, deprecated types

SHARDING

SHARDING
Split the pipeline at the first $group or $sort
Shards execute pipeline up to that point
mongos merges results and continues
Early $match may excuse shards
CPU and memory implications for mongos

SHARDING
[
  { $match: { /* filter by shard key */ } },
  { $group: { /* group by some field */ } },
  { $sort: { /* sort by some field */ } },
  { $project: { /* reshape result */ } }
]

SHARDING
shard1: $match → $group1
shard2: $match → $group1
shard3: (excused by the early $match)
↘ ↓
mongos: $group2 → $sort → $project
↓
Result
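The $group1 / $group2 split can be sketched for an average. It assumes each shard ships a partial sum and count per key rather than its own average, so mongos can merge exactly even when shards hold different numbers of documents. The shard assignment of the sample books is invented for illustration, and `shardGroup` and `mergeGroups` are hypothetical names, not MongoDB internals.

```javascript
// Hypothetical placement of the sample books on two shards.
const shard1Docs = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
];
const shard2Docs = [{ title: "Atlas Shrugged", pages: 1088, language: "English" }];

// $group1 on each shard: partial sum and count per language.
function shardGroup(docs) {
  const partial = {};
  for (const d of docs) {
    if (!partial[d.language]) partial[d.language] = { sum: 0, count: 0 };
    partial[d.language].sum += d.pages;
    partial[d.language].count += 1;
  }
  return partial;
}

// $group2 on mongos: combine the partials, then divide once per key.
function mergeGroups(partials) {
  const merged = {};
  for (const partial of partials) {
    for (const [key, { sum, count }] of Object.entries(partial)) {
      if (!merged[key]) merged[key] = { sum: 0, count: 0 };
      merged[key].sum += sum;
      merged[key].count += count;
    }
  }
  return Object.entries(merged).map(([key, { sum, count }]) => ({ _id: key, avgPages: sum / count }));
}

const merged = mergeGroups([shardGroup(shard1Docs), shardGroup(shard2Docs)]);
```

The merged averages match the single-server $group slide (English 653, Russian 1440). Any later $sort or $project would run on mongos over this merged output.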

LOOKING AHEAD

FRAMEWORK USE CASES
Basic aggregation queries
Ad-hoc reporting
Real-time analytics
Visualizing time series data

EXTENDING THE FRAMEWORK
Adding new pipeline operators, expressions
$out and $tee for output control
https://jira.mongodb.org/browse/SERVER-3253

FUTURE ENHANCEMENTS
Move $match earlier when possible
Pipeline explain facility
Memory usage improvements
Grouping input sorted by _id
Sorting with limited output (top k)

ENABLING DEVELOPERS
Doing more within MongoDB, faster
Refactoring MapReduce and groupings
Replace pages of JavaScript
Longer aggregation pipelines
Quick aggregations from the shell

THANKS!
http://goo.gl/G3cmD

QUESTIONS?

PHOTO CREDITS
http://dilbert.com/strips/comic/2012-09-05
http://www.flickr.com/photos/toolstop/4324416999
http://www.ristart.ee/web2/files/Product/large/13443203307.jpg
http://www.flickr.com/photos/vascorola/3164882131
http://www.flickr.com/photos/capcase/4970062156
http://img.timeinc.net/time/photoessays/2009/monty_python/monty_python_02.jpg
