2. Alexander C. S. Hendorf
CTO Königsweg GmbH
Always love data and new ideas
mongoDB master 2016, MUG Orga.
EuroPython organizer + program chair
speaker mongoDB world NYC, CEBIT,…
@hendorf
3. Agenda
1. Map Reduce
2. Aggregation framework
a. Pipeline model
b. Pipeline stages
c. Accumulators & Expression Operators
d. Boosting performance
3. Redemption & summary
11. // map
function () {
var artist = this.info.artistName;
emit(artist, 1);
}
// reduce
function (key, values) {
var total = 0;
for (var i = 0; i < values.length; i++) {
total += values[i];
}
return total;
}
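To see what the map and reduce functions above compute, here is a minimal in-memory sketch in plain JavaScript. It is a simplified emulation, not MongoDB's engine: `runMapReduce` and the sample documents are hypothetical, and `emit` is passed in as an argument here, whereas in the mongo shell it is a global.

```javascript
// Simplified emulation of map-reduce over an array of documents.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // Collect every emitted value under its key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // the document is bound to `this`, as in MongoDB
  }
  const result = {};
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key that has only one value.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// map: emit a 1 for each track, keyed by artist
function mapArtist(emit) {
  emit(this.info.artistName, 1);
}

// reduce: sum the emitted 1s per artist
function reduceCount(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Hypothetical sample documents
const sampleTracks = [
  { info: { artistName: "Katy Perry" } },
  { info: { artistName: "Katy Perry" } },
  { info: { artistName: "Prince" } },
];

const artistCounts = runMapReduce(sampleTracks, mapArtist, reduceCount);
console.log(artistCounts);
```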
17. • introduced with mongoDB 2.2 in 2012
• framework for data aggregation
• designed to be straightforward
• documents enter a multi-stage pipeline that transforms them into aggregated results
• aggregation pipeline operations have an optimization phase that attempts to reshape the pipeline for improved performance
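As a rough sketch of what such a pipeline looks like, the count-per-artist job from the map-reduce slide could be written as below. The collection name `tracks` is an assumption; the field `info.artistName` comes from the map function shown earlier.

```javascript
// A pipeline is an array of stage documents, processed in order.
const countByArtist = [
  // Group by artist name and count one per document
  { $group: { _id: "$info.artistName", total: { $sum: 1 } } },
  // Most frequent artists first
  { $sort: { total: -1 } },
];

// In the mongo shell (collection name "tracks" is hypothetical):
// db.tracks.aggregate(countByArtist)
console.log(JSON.stringify(countByArtist));
```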
19. Pipeline is like a relay race
$match: get the baton
$group: something smart
$project: present nicely
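The relay race can be sketched as a three-stage pipeline, each stage handing its output to the next. The field `info.genre`, the value "Pop" and the collection name `tracks` are assumptions for illustration only.

```javascript
const relayPipeline = [
  // $match: get the baton, keep only the documents of interest
  { $match: { "info.genre": "Pop" } },
  // $group: something smart, count tracks per artist
  { $group: { _id: "$info.artistName", tracks: { $sum: 1 } } },
  // $project: present nicely, rename _id to artist and keep the count
  { $project: { _id: 0, artist: "$_id", tracks: 1 } },
];

// In the mongo shell (hypothetical collection): db.tracks.aggregate(relayPipeline)
const stageOrder = relayPipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageOrder.join(" -> "));
```

Putting `$match` first keeps the later stages from processing documents that would be discarded anyway.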
37. $skip
skip documents in found set
$out
write the resulting documents of the aggregation pipeline to a
collection, also incremental.
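A small sketch combining both stages, assuming a hypothetical `tracks` collection and a hypothetical target collection `tracks_page`. `$out` has to be the last stage of the pipeline.

```javascript
const pagedExport = [
  // Sort first so that skipping is deterministic
  { $sort: { "info.artistName": 1 } },
  // $skip: drop the first 20 documents of the found set
  { $skip: 20 },
  // $out: write what is left to a collection (must be the final stage)
  { $out: "tracks_page" },
];

// In the mongo shell (hypothetical collection): db.tracks.aggregate(pagedExport)
const lastStage = Object.keys(pagedExport[pagedExport.length - 1])[0];
console.log(lastStage);
```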
38. $geoNear
returns an ordered stream of documents
based on the proximity to a geospatial point
$redact
reshapes each document in the stream by restricting its content
based on information stored in the document itself
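Two minimal sketches of these stages. The coordinates, the `level` field and its value "public" are made up for illustration. `$geoNear` has to be the first stage and needs a geospatial index on the collection.

```javascript
// $geoNear: documents ordered by distance to a point (GeoJSON uses [longitude, latitude])
const nearbyFirst = [
  {
    $geoNear: {
      near: { type: "Point", coordinates: [11.5755, 48.1374] }, // arbitrary example point
      distanceField: "distance", // computed distance is stored in this output field
      spherical: true,
    },
  },
];

// $redact: keep a (sub)document only if its own "level" field says "public"
const publicOnly = [
  {
    $redact: {
      $cond: {
        if: { $eq: ["$level", "public"] },
        then: "$$DESCEND", // keep this level and check embedded documents too
        else: "$$PRUNE", // drop this level entirely
      },
    },
  },
];

console.log(JSON.stringify(nearbyFirst), JSON.stringify(publicOnly));
```

Because `$$DESCEND` re-evaluates the condition on every embedded document, embedded documents without a `level` field would be pruned in this sketch.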
39. $lookup
left outer join with another collection.
new in 3.2
$indexStats
statistics on index usage (an actual performance metric)
$sample
select some random documents from a collection
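A hedged sketch of the three stages. The `artists` collection, its `name` field and the output field `artist` are assumptions; only `info.artistName` appears earlier in the deck.

```javascript
// $lookup: left outer join, matches land in an array field
const tracksWithArtist = [
  {
    $lookup: {
      from: "artists", // hypothetical collection to join with
      localField: "info.artistName", // field in the input documents
      foreignField: "name", // hypothetical field in "artists"
      as: "artist", // output array holding the matching documents
    },
  },
];

// $indexStats: takes an empty document and must be the first stage
const indexUsage = [{ $indexStats: {} }];

// $sample: pick 5 random documents
const randomFive = [{ $sample: { size: 5 } }];

// In the mongo shell (hypothetical collection): db.tracks.aggregate(tracksWithArtist)
console.log(Object.keys(tracksWithArtist[0].$lookup).join(","));
```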
47. The Nemesis. Google says
# By Katy_Perry_-_MTV_VMA_2011.jpg: Philip Nelson from San Antonio, TX, USA derivative work: Truu (Katy_Perry_-_MTV_VMA_2011.jpg) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
61. Tip
Infrastructure
work with a dedicated server for aggregation
e.g. a (hidden/delayed) member of a replica set or a standalone copy
especially useful if your primary is busy with writes
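One way to get such a dedicated member is to hide a secondary. The helper below is a sketch: `hideMember`, the set name and the host names are hypothetical. A hidden member must have priority 0 so it can never become primary.

```javascript
// Mark one member of a replica set configuration as hidden.
// Mutates and returns the config object, the way rs.conf() results are usually edited.
function hideMember(config, memberIndex) {
  config.members[memberIndex].priority = 0; // required for hidden members
  config.members[memberIndex].hidden = true; // invisible to normal client reads
  return config;
}

// Hypothetical configuration, shaped like the result of rs.conf()
const exampleConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "db3.example.net:27017" },
  ],
};

const reconfigured = hideMember(exampleConfig, 2);
console.log(JSON.stringify(reconfigured.members[2]));

// In the mongo shell, run on the primary: rs.reconfig(hideMember(rs.conf(), 2))
```

Clients connected to the replica set are not routed to a hidden member, so the aggregation jobs would connect to that host directly.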
63. Replica-Set
one primary + copies
(the whole Collection on every server)
Sharding
horizontal scaling
split the data across nodes
(Shard1 … Shard4 on server-1 … server-4)
"Micro-Sharding"
vertical scaling
one server - utilize multiple CPUs + IO
64. "Micro-Sharding"
Shard1 Shard2 … ShardN
CPUs 1 2 … N
High IOPS
RAM
69. Alexander C. S. Hendorf
@hendorf
self.Slides: https://goo.gl/VbzFrc
John Page's Tutorial Micro-Sharding:
https://gist.github.com/johnlpage/e0bb9971f4f1c4ed3a09