Schema Design at Scale

This deck is an overview of the process that 10gen went through to scale their MongoDB monitoring service MMS on a single unsharded replica set.

Schema Design at Scale: Presentation Transcript

  • MongoDB: Schema Design at Scale
    Rick Copeland, @rick446, http://arborian.com
  • Who am I?
      • Now a consultant/trainer, but formerly...
          • Software engineer at SourceForge
          • Author of Essential SQLAlchemy
          • Author of MongoDB with Python and Ming
          • Primarily code in Python
  • The Inspiration
      • MongoDB monitoring service (MMS)
      • Free to all MongoDB users
      • Minute-by-minute stats on all your servers
      • Hardware cost is important, use it efficiently (remember it’s a free service!)
  • Our Experiment
      • Similar to MMS but not identical
      • Collection of 100 metrics, each with per-minute values
      • “Simulation time” is 300x real time
      • Run on 2 AWS small instances
          • one MongoDB server (2.0.2)
          • one “load generator”
  • Load Generator
      • Increment each metric as many times as possible during the course of a simulated minute
      • Record number of updates per second
      • Occasionally call getLastError to prevent disconnects
  • Schema v1
        { _id: "20101010/metric-1",
          metadata: {
            date: ISODate("2000-10-10T00:00:00Z"),
            metric: "metric-1" },
          daily: 5468426,
          hourly: {
            "00": 227850,
            "01": 210231,
            ...
            "23": 20457 },
          minute: {
            "0000": 3612,
            "0001": 3241,
            ...
            "1439": 2819 }}
      • One document per metric (per server) per day
      • Per hour/minute statistics stored as documents
  • Update v1
        // Build the $inc document for the current hour and minute
        increment = { daily: 1 }
        increment['hourly.' + hour] = 1
        increment['minute.' + minute] = 1
        db.stats.update(
          { _id: id, metadata: metadata },
          { $inc: increment },
          true) // upsert
      • Use $inc to update fields in-place
      • Use upsert to create document if it’s missing
      • Easy, correct, seems like a good idea....
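The update on this slide can be tried without a server. The following is a minimal sketch in plain JavaScript: `pad`, `buildIncrement`, and `applyInc` are hypothetical helper names, the zero-padded key formats are taken from the Schema v1 slide, and `applyInc` is only a toy in-memory stand-in for an upsert with `$inc`, not MongoDB's implementation.

```javascript
// Zero-pad a number to a fixed width, e.g. pad(7, 4) gives "0007".
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Build the $inc document for one sample at a given hour and minute of the day.
// Key formats ("00".."23", "0000".."1439") follow the Schema v1 slide.
function buildIncrement(hour, minuteOfDay) {
  const increment = { daily: 1 };
  increment["hourly." + pad(hour, 2)] = 1;
  increment["minute." + pad(minuteOfDay, 4)] = 1;
  return increment;
}

// Toy stand-in for an upsert with $inc: create the document if it is missing,
// then add each amount at its dotted path.
function applyInc(store, id, increment) {
  if (!store.has(id)) store.set(id, { _id: id });
  const doc = store.get(id);
  for (const [path, amount] of Object.entries(increment)) {
    const parts = path.split(".");
    let node = doc;
    for (const part of parts.slice(0, -1)) {
      if (node[part] === undefined) node[part] = {};
      node = node[part];
    }
    const leaf = parts[parts.length - 1];
    node[leaf] = (node[leaf] || 0) + amount;
  }
  return doc;
}

const store = new Map();
applyInc(store, "20101010/metric-1", buildIncrement(0, 1));
applyInc(store, "20101010/metric-1", buildIncrement(0, 1));
console.log(JSON.stringify(store.get("20101010/metric-1")));
```

The first call creates the document (the "upsert" case) and the second only increments existing fields, which is the in-place path the slide is aiming for.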
  • Performance of v1
    [Chart not included in transcript. Annotations: “Experiment startup”, “OUCH!”]
  • Problems with v1
      • The document movement problem
      • The midnight problem
      • The end-of-the-day problem
      • The historical query problem
  • Document movement problem
      • MongoDB in-place updates are fast
          • ... except when they’re not in place
      • MongoDB adaptively pads documents
          • ... but it’s better to know your doc size ahead of time
  • Midnight problem
      • Upserts are convenient, but what’s our key?
          • date/metric
      • At midnight, you get a huge spike in inserts
  • Fixing the document movement problem
        db.stats.update(
          { _id: id, metadata: metadata },
          { $inc: {
              daily: 0,
              "hourly.0": 0,
              "hourly.1": 0, ...
              "minute.0": 0,
              "minute.1": 0, ... } },
          true) // upsert
      • Preallocate documents with zeros
      • Crontab (?)
          • NO! (makes the midnight problem even worse)
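A zero-filled `$inc` document like the one on this slide is tedious to write by hand, so it would normally be generated. The sketch below is one way to do that in plain JavaScript. `buildPreallocation` is a hypothetical helper, and it assumes the zero-padded key formats from the Schema v1 slide ("00".."23" and "0000".."1439") rather than the abbreviated `hourly.0` form shown here.

```javascript
// Build a $inc document whose every counter is 0. Upserting it creates the
// document at its full size, so later increments do not have to grow it.
function buildPreallocation() {
  const zeros = { daily: 0 };
  for (let h = 0; h < 24; h++) {
    zeros["hourly." + String(h).padStart(2, "0")] = 0;
  }
  for (let m = 0; m < 1440; m++) {
    zeros["minute." + String(m).padStart(4, "0")] = 0;
  }
  return zeros;
}

const zeros = buildPreallocation();
// 1 daily counter + 24 hourly counters + 1440 minute counters
console.log(Object.keys(zeros).length); // 1465
```

The result would be passed as the `$inc` argument of the upsert shown on the slide.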
  • Fixing the midnight problem
      • Could schedule preallocation for different metrics, staggered through the day
      • Observation: Preallocation isn’t required for correct operation
      • Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
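The random preallocation idea can be sketched as a coin flip on every stats update. In the code below, `shouldPreallocate`, `makeRandom`, and `updatesUntilPreallocated` are hypothetical names, and `probability` is an assumed tuning knob: the slides only say "low probability" and give no value.

```javascript
// On each stats update, decide whether to also preallocate tomorrow's document.
function shouldPreallocate(probability, random = Math.random) {
  return random() < probability;
}

// Small seeded generator (a linear congruential generator) so that a
// simulation run is repeatable. Returns values in [0, 1).
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Simulate one day of updates for one metric and report which update
// (1-based) triggered the preallocation, or null if none did.
function updatesUntilPreallocated(updatesPerDay, probability, random) {
  for (let i = 1; i <= updatesPerDay; i++) {
    if (shouldPreallocate(probability, random)) return i;
  }
  return null; // never preallocated: the upsert still creates the doc at midnight
}

console.log(updatesUntilPreallocated(1440, 0.01, makeRandom(1)));
```

With probability p per update and N updates in a day, the chance that a document is never preallocated is (1 - p)^N, so p can be small when N is large. Because the upsert still works without preallocation, a miss costs performance but not correctness.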
  • Performance with Preallocation
    [Chart not included in transcript. Annotation: “Experiment startup”]
      • Well, it’s better
      • Still have decreasing performance through the day... WTF?
  • Problems with v1
      • The document movement problem
      • The midnight problem
      • The end-of-the-day problem
      • The historical query problem
  • End-of-day problem
    [Diagram: a document laid out as a sequence of key/value pairs: “0000” Value, “0001” Value, ..., “1439” Value]
      • BSON stores documents as an association list
      • MongoDB must check each key for a match
      • Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
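The cost described on this slide can be illustrated by modelling a subdocument as an ordered list of key/value pairs and counting how many keys are passed over before the target is reached. This is a toy model of an association-list lookup, not MongoDB's code, and `keysSkipped` is a hypothetical helper name.

```javascript
// Count how many keys are compared and rejected before `key` is found
// in an ordered list of [key, value] pairs.
function keysSkipped(pairs, key) {
  let skipped = 0;
  for (const [k] of pairs) {
    if (k === key) return skipped;
    skipped++;
  }
  return skipped; // key absent: every pair was checked
}

// The flat `minute` subdocument from Schema v1: "0000" through "1439".
const minutes = [];
for (let m = 0; m < 1440; m++) {
  minutes.push([String(m).padStart(4, "0"), 0]);
}

console.log(keysSkipped(minutes, "0000")); // 0: first minute of the day
console.log(keysSkipped(minutes, "1439")); // 1439: last minute of the day
```

The work per update grows linearly with the minute of the day, which matches the steadily falling throughput seen in the preallocation results.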
  • Fixing the end-of-day problem
        { _id: "20101010/metric-1",
          metadata: {
            date: ISODate("2000-10-10T00:00:00Z"),
            metric: "metric-1" },
          daily: 5468426,
          hourly: {
            "0": 227850,
            "1": 210231,
            ...
            "23": 20457 },
          minute: {
            "00": {
              "0000": 3612,
              "0100": 3241,
              ... },
            ...,
            "23": { ..., "1439": 2819 } } }
      • Split up our ‘minute’ property by hour
      • Better worst-case keys scanned:
          • Old: 1439
          • New: 82
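The "Old: 1439, New: 82" figures can be checked with a little arithmetic. The sketch below counts only keys inside the `minute` subdocument, which is an assumption about how the slide's numbers were derived; `flatSkipped`, `hierarchicalSkipped`, and `minutePath` are hypothetical names.

```javascript
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Flat layout (Schema v1): every earlier minute of the day is skipped.
function flatSkipped(minuteOfDay) {
  return minuteOfDay;
}

// Hierarchical layout: skip the earlier hours, then the earlier minutes
// within the matching hour.
function hierarchicalSkipped(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minuteInHour = minuteOfDay % 60;
  return hour + minuteInHour;
}

// Dotted path used in the $inc update, e.g. "minute.23.1439".
function minutePath(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  return "minute." + pad(hour, 2) + "." + pad(minuteOfDay, 4);
}

let worstFlat = 0;
let worstHierarchical = 0;
for (let m = 0; m < 1440; m++) {
  worstFlat = Math.max(worstFlat, flatSkipped(m));
  worstHierarchical = Math.max(worstHierarchical, hierarchicalSkipped(m));
}
console.log(worstFlat, worstHierarchical); // 1439 82
console.log(minutePath(1439)); // minute.23.1439
```

The worst case for the hierarchical layout is the last minute of the day: 23 hour keys plus 59 minute keys, for 82 in total.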
  • “Hierarchical minutes” Performance
  • Performance Comparison
  • Performance Comparison (2.2)
  • Historical Query Problem
      • Intra-day queries are great
      • What about “performance year to date”?
          • Now you’re hitting a lot of “cold” documents and causing page faults
  • Fixing the historical query problem
        { _id: "201010/metric-1",
          metadata: {
            date: ISODate("2000-10-01T00:00:00Z"),
            metric: "metric-1" },
          daily: {
            "0": 5468426,
            "1": ...,
            ...
            "31": ... },
        }
      • Store multiple levels of granularity in different collections
      • 2 updates rather than 1, but historical queries much faster
      • Preallocate along with daily docs (only infrequently upserted)
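The "2 updates rather than 1" can be sketched as a function that turns one sample into two update descriptions, one per granularity. Everything here is illustrative: `buildUpdates` is a hypothetical helper, the collection name `stats.monthly` is an assumption (only `stats.daily` appears in the deck), and the key formats follow the hierarchical daily document and the monthly document shown on the slides.

```javascript
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Describe the two upserts for one sample: one against the per-day document
// and one against the per-month document.
function buildUpdates(metric, date) {
  const year = date.getUTCFullYear();
  const month = pad(date.getUTCMonth() + 1, 2);
  const day = date.getUTCDate();
  const hour = date.getUTCHours();
  const minuteOfDay = hour * 60 + date.getUTCMinutes();

  const daily = {
    collection: "stats.daily",
    id: `${year}${month}${pad(day, 2)}/${metric}`,
    inc: {
      daily: 1,
      ["hourly." + hour]: 1,
      [`minute.${pad(hour, 2)}.${pad(minuteOfDay, 4)}`]: 1,
    },
  };
  const monthly = {
    collection: "stats.monthly", // assumed name
    id: `${year}${month}/${metric}`,
    inc: { ["daily." + day]: 1 },
  };
  return [daily, monthly];
}

const [dailyUpdate, monthlyUpdate] = buildUpdates(
  "metric-1",
  new Date(Date.UTC(2010, 9, 10, 23, 59))
);
console.log(dailyUpdate.id, monthlyUpdate.id);
console.log(JSON.stringify(monthlyUpdate.inc));
```

A year-to-date query would then read at most twelve monthly documents per metric instead of hundreds of daily ones.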
  • Queries
        db.stats.daily.find(
          { "metadata.date": { $gte: dt1, $lte: dt2 },
            "metadata.metric": "metric-1" },
          { "metadata.date": 1, "hourly": 1 }
        ).sort({ "metadata.date": 1 })

        db.stats.daily.ensureIndex({
          "metadata.metric": 1,
          "metadata.date": 1 })
      • Updates are by _id, so no index needed there
      • Chart queries are by metadata
      • Your range/sort should be last in the compound index
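Why the range field belongs last in the compound index can be illustrated with a toy model: treat an index as a sorted array of key tuples and measure the span between the first and last matching entry. This is a simplification and not how MongoDB actually traverses its B-tree; dates are plain day numbers for brevity, and `buildIndex` and `spanScanned` are hypothetical names.

```javascript
// Compare two key tuples position by position.
function compareEntries(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Toy compound index: one key tuple per document, sorted by `fields` order.
function buildIndex(docs, fields) {
  return docs.map((d) => fields.map((f) => d[f])).sort(compareEntries);
}

// Number of index entries between the first and last match (inclusive) for
// metric === wanted and lo <= date <= hi.
function spanScanned(index, fields, wanted, lo, hi) {
  const mi = fields.indexOf("metric");
  const di = fields.indexOf("date");
  let first = -1;
  let last = -1;
  index.forEach((entry, i) => {
    if (entry[mi] === wanted && entry[di] >= lo && entry[di] <= hi) {
      if (first < 0) first = i;
      last = i;
    }
  });
  return first < 0 ? 0 : last - first + 1;
}

// 3 metrics x 10 days of documents.
const docs = [];
for (let day = 1; day <= 10; day++) {
  for (const metric of ["metric-0", "metric-1", "metric-2"]) {
    docs.push({ metric, date: day });
  }
}

const metricFirst = ["metric", "date"];
const dateFirst = ["date", "metric"];
console.log(spanScanned(buildIndex(docs, metricFirst), metricFirst, "metric-1", 3, 6)); // 4
console.log(spanScanned(buildIndex(docs, dateFirst), dateFirst, "metric-1", 3, 6)); // 10
```

With the equality field first, the four matching entries sit next to each other and are already in date order. With the date first, entries for the other metrics are interleaved inside the same range.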
  • Conclusion
      • Monitor your performance. Watch out for spikes.
      • Preallocate to prevent document copying
      • Pay attention to the number of keys in your documents (hierarchy can help)
      • Make sure your index is optimized for your sorts
  • Questions?
    MongoDB Monitoring Service: http://www.10gen.com/mongodb-monitoring-service
    Rick Copeland, @rick446, http://arborian.com
    MongoDB Consulting & Training