1.
MongoDB: Schema
Design at Scale
Rick Copeland
@rick446
http://arborian.com
2.
Who am I?
• Now a consultant/trainer, but formerly...
• Software engineer at SourceForge
• Author of Essential SQLAlchemy
• Author of MongoDB with Python and Ming
• Primarily code in Python
3.
The Inspiration
• MongoDB Monitoring Service (MMS)
• Free to all MongoDB users
• Minute-by-minute stats on all
your servers
• Hardware cost is important, so use it efficiently (remember, it’s a free service!)
4.
Our Experiment
• Similar to MMS but not identical
• Collection of 100 metrics, each with per-minute values
• “Simulation time” is 300x real time
• Run on 2x AWS small instance
• one MongoDB server (2.0.2)
• one “load generator”
5.
Load Generator
• Increment each metric as many times as
possible during the course of a simulated
minute
• Record number of updates per second
• Occasionally call getLastError to prevent
disconnects
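The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: `send_update` is a hypothetical stand-in for the real MongoDB update call, the clock is injectable so the loop can be exercised without waiting, and the occasional getLastError call is not modeled. Only the 300x speed-up comes from the experiment description.

```python
import time

SPEEDUP = 300  # simulated seconds per real second (from the experiment setup)


def simulated_minute(real_elapsed_s, speedup=SPEEDUP):
    """Map real elapsed seconds to a simulated minute-of-day (0..1439)."""
    simulated_s = real_elapsed_s * speedup
    return int(simulated_s // 60) % 1440


def run_simulated_minute(send_update, metrics, clock=time.monotonic,
                         speedup=SPEEDUP):
    """Increment every metric repeatedly until one simulated minute has passed.

    send_update(metric) is a hypothetical callback standing in for the
    real update. Returns the observed number of updates per real second.
    """
    real_budget_s = 60.0 / speedup  # real seconds in one simulated minute
    start = clock()
    updates = 0
    while clock() - start < real_budget_s:
        for metric in metrics:
            send_update(metric)
            updates += 1
    elapsed = clock() - start
    return updates / elapsed
```

At 300x, one simulated minute lasts 0.2 real seconds, so a full simulated day takes 288 real seconds.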
6.
Schema v1
• One document per metric (per server) per day
• Per hour/minute statistics stored as embedded documents
{
  _id: "20101010/metric-1",
  metadata: {
    date: ISODate("2010-10-10T00:00:00Z"),
    metric: "metric-1" },
  daily: 5468426,
  hourly: {
    "00": 227850,
    "01": 210231,
    ...
    "23": 20457 },
  minute: {
    "0000": 3612,
    "0001": 3241,
    ...
    "1439": 2819 }
}
7.
Update v1
• Use $inc to update fields in-place
• Use upsert to create document if it’s missing
increment = { daily: 1 }
increment['hourly.' + hour] = 1
increment['minute.' + minute] = 1
db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: increment },
  true) // upsert
• Easy, correct, seems like a good idea....
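As a rough Python sketch of how the `_id` and the `$inc` specification could be derived from a timestamp: the helper names `doc_id` and `build_increment` are hypothetical, and the zero-padded key formats (two-digit hour, four-digit minute-of-day) are an assumption inferred from the keys shown in the Schema v1 document.

```python
from datetime import datetime, timezone


def doc_id(ts, metric):
    """Build the per-metric, per-day _id, e.g. '20101010/metric-1'."""
    return f"{ts:%Y%m%d}/{metric}"


def build_increment(ts):
    """Build the $inc spec for a single observation at timestamp ts."""
    minute_of_day = ts.hour * 60 + ts.minute  # 0..1439
    return {
        "daily": 1,
        f"hourly.{ts.hour:02d}": 1,          # assumed key format: "00".."23"
        f"minute.{minute_of_day:04d}": 1,    # assumed key format: "0000".."1439"
    }


ts = datetime(2010, 10, 10, 1, 2, tzinfo=timezone.utc)
print(doc_id(ts, "metric-1"))
print(build_increment(ts))
```

With PyMongo, the result could then be sent with something like `collection.update_one({"_id": doc_id(ts, metric)}, {"$inc": build_increment(ts)}, upsert=True)`.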
11.
Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
12.
Document movement
problem
• MongoDB in-place updates are fast
• ... except when they’re not in place
• MongoDB adaptively pads documents
• ... but it’s better to know your doc size
ahead of time
13.
Midnight problem
• Upserts are convenient, but what’s our key?
• date/metric
• At midnight, you get a huge spike in inserts
14.
Fixing the document
movement problem
• Preallocate documents with zeros
db.stats.update(
  { _id: id, metadata: metadata },
  { $inc: {
      daily: 0,
      "hourly.0": 0,
      "hourly.1": 0,
      ...
      "minute.0": 0,
      "minute.1": 0,
      ... } },
  true) // upsert
• Crontab (?)
• NO! (makes the midnight problem even worse)
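A zero-filled document of the final size could be built along these lines. This is only a sketch: `preallocated_doc` is a hypothetical helper, and the key formats are assumed to follow the Schema v1 example ("00" to "23", "0000" to "1439") rather than the abbreviated keys in the snippet above.

```python
from datetime import datetime


def preallocated_doc(day, metric):
    """Return a zero-filled stats document for one metric and one day."""
    return {
        "_id": f"{day:%Y%m%d}/{metric}",
        "metadata": {"date": day, "metric": metric},
        "daily": 0,
        # 24 hourly counters and 1440 per-minute counters, all zero, so that
        # later $inc operations never grow the document.
        "hourly": {f"{h:02d}": 0 for h in range(24)},
        "minute": {f"{m:04d}": 0 for m in range(1440)},
    }


doc = preallocated_doc(datetime(2010, 10, 10), "metric-1")
print(len(doc["hourly"]), len(doc["minute"]))
```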
18.
Fixing the midnight
problem
• Could schedule preallocation for different
metrics, staggered through the day
• Observation: Preallocation isn’t required for
correct operation
• Let’s just preallocate tomorrow’s docs
randomly as new stats are inserted (with
low probability).
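The random preallocation idea might look like the following sketch. The probability value and the `preallocate` callback are hypothetical; the slides only say "low probability". The random source is injectable so the behaviour can be checked deterministically.

```python
import random
from datetime import date, timedelta

PREALLOC_PROBABILITY = 1.0 / 2000  # hypothetical value; tune for the workload


def maybe_preallocate(today, metric, preallocate, rng=random,
                      probability=PREALLOC_PROBABILITY):
    """With low probability, preallocate tomorrow's document for this metric.

    preallocate(day, metric) is a hypothetical callback that upserts a
    zero-filled document. Returns True if preallocation was triggered.
    """
    if rng.random() < probability:
        preallocate(today + timedelta(days=1), metric)
        return True
    return False
```

Because preallocation is not required for correct operation, a metric that never wins the draw still gets its document created by the first upsert of the new day; the random draw only spreads most of the inserts across the preceding day.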
21.
Performance with
Preallocation
[Chart: performance with preallocation; annotated “Experiment startup”]
• Well, it’s better
• Still have decreasing performance through the day... WTF?
23.
Problems with v1
• The document movement problem
• The midnight problem
• The end-of-the-day problem
• The historical query problem
24.
End-of-day problem
[Diagram: “0000” → Value, “0001” → Value, ..., “1439” → Value]
• BSON stores documents as an association list
• MongoDB must check each key for a match
• Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
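To make the cost concrete, here is a small sketch comparing a flat minute layout with a hierarchical one, in the spirit of the "hierarchy can help" advice in the conclusion. The nested `minute.<hour>.<minute>` layout and the simple linear-scan cost model are assumptions for illustration, not necessarily the schema used in the talk. The counts include the matching key itself.

```python
def flat_key_scans(minute_of_day):
    """Keys examined to reach a minute in a flat 'minute' subdocument."""
    return minute_of_day + 1


def nested_key_scans(minute_of_day):
    """Keys examined when minutes are nested under their hour."""
    hour, minute = divmod(minute_of_day, 60)
    # Scan hour keys up to the right hour, then minute keys within it.
    return (hour + 1) + (minute + 1)


def nested_minute_key(minute_of_day):
    """Dotted field path for the hypothetical nested layout."""
    hour, minute = divmod(minute_of_day, 60)
    return f"minute.{hour:02d}.{minute:02d}"


last = 1439  # final minute of the day
print(flat_key_scans(last), nested_key_scans(last), nested_minute_key(last))
```

Under this model the worst case drops from 1440 key comparisons to 24 + 60 = 84.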
29.
Historical Query
Problem
• Intra-day queries are great
• What about “performance year to date”?
• Now you’re hitting a lot of “cold”
documents and causing page faults
30.
Fixing the historical
query problem
• Store multiple levels of granularity in different collections
• 2 updates rather than 1, but historical queries much faster
• Preallocate along with daily docs (only infrequently upserted)
{ _id: "201010/metric-1",
  metadata: {
    date: ISODate("2010-10-01T00:00:00Z"),
    metric: "metric-1" },
  daily: {
    "0": 5468426,
    "1": ...,
    ...
    "31": ... }
}
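The "2 updates rather than 1" pattern could be sketched as below. The collection name `stats.monthly`, the helper name, and the use of the day of the month as the key in the monthly document are assumptions; the slides show only `stats.daily` and the keys "0" to "31".

```python
from datetime import datetime


def build_updates(ts, metric):
    """Return (collection, _id, $inc spec) for the daily and monthly docs."""
    minute_of_day = ts.hour * 60 + ts.minute
    daily = (
        "stats.daily",
        f"{ts:%Y%m%d}/{metric}",
        {
            "daily": 1,
            f"hourly.{ts.hour:02d}": 1,
            f"minute.{minute_of_day:04d}": 1,
        },
    )
    monthly = (
        "stats.monthly",               # hypothetical collection name
        f"{ts:%Y%m}/{metric}",
        {f"daily.{ts.day}": 1},        # assumed: keyed by day of the month
    )
    return [daily, monthly]


updates = build_updates(datetime(2010, 10, 10, 1, 2), "metric-1")
for collection, _id, inc in updates:
    print(collection, _id, inc)
```

A year-to-date query then reads about a dozen monthly documents per metric instead of hundreds of daily ones.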
31.
Queries
• Updates are by _id, so no extra index needed there
• Chart queries are by metadata
• Your range/sort should be last in the compound index
db.stats.daily.find(
  { "metadata.date": { $gte: dt1, $lte: dt2 },
    "metadata.metric": "metric-1" },
  { "metadata.date": 1, "hourly": 1 }
).sort({ "metadata.date": 1 })
db.stats.daily.ensureIndex({
  'metadata.metric': 1,
  'metadata.date': 1 })
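Why the range/sort field belongs last can be illustrated with a sorted list standing in for the compound index. This is a toy model only (the entries are made up and a real B-tree index differs in detail): with the equality field first, all entries for one metric are contiguous and already in date order, so a range query is a single slice with no separate sort.

```python
from bisect import bisect_left, bisect_right

# Entries sorted by (metric, date), mimicking a compound index on
# metadata.metric then metadata.date. Dates are ISO strings for brevity.
index = sorted([
    ("metric-1", "2010-10-01"),
    ("metric-1", "2010-10-02"),
    ("metric-1", "2010-10-03"),
    ("metric-2", "2010-10-01"),
    ("metric-2", "2010-10-02"),
])


def range_scan(index, metric, date_from, date_to):
    """Return entries for one metric within [date_from, date_to], date-ordered."""
    lo = bisect_left(index, (metric, date_from))
    hi = bisect_right(index, (metric, date_to))
    return index[lo:hi]


print(range_scan(index, "metric-1", "2010-10-02", "2010-10-03"))
```

If the date came first instead, entries for a single metric would be scattered among those of every other metric within the date range, and the scan would have to skip over them.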
32.
Conclusion
• Monitor your performance. Watch out for
spikes.
• Preallocate to prevent document copying
• Pay attention to the number of keys in your
documents (hierarchy can help)
• Make sure your index is optimized for your
sorts
33.
Questions?
MongoDB Monitoring Service
http://www.10gen.com/mongodb-monitoring-service
Rick Copeland
@rick446
http://arborian.com
MongoDB Consulting & Training