MongoDB: Schema Design at Scale. Rick Copeland, @rick446, http://arborian.com

Schema Design at Scale

This deck is an overview of the process that 10gen went through to scale their MongoDB monitoring service MMS on a single unsharded replica set.

Published in: Technology, Design
  • Schema Design at Scale

    1. MongoDB: Schema Design at Scale
       Rick Copeland, @rick446, http://arborian.com
    2. Who am I?
       • Now a consultant/trainer, but formerly...
         • Software engineer at SourceForge
         • Author of Essential SQLAlchemy
         • Author of MongoDB with Python and Ming
         • Primarily code Python
    3. The Inspiration
       • MongoDB monitoring service (MMS)
       • Free to all MongoDB users
       • Minute-by-minute stats on all your servers
       • Hardware cost is important, use it efficiently (remember it’s a free service!)
    4. Our Experiment
       • Similar to MMS but not identical
       • Collection of 100 metrics, each with per-minute values
       • “Simulation time” is 300x real time
       • Run on 2x AWS small instance
         • one MongoDB server (2.0.2)
         • one “load generator”
    5. Load Generator
       • Increment each metric as many times as possible during the course of a simulated minute
       • Record number of updates per second
       • Occasionally call getLastError to prevent disconnects
    6. Schema v1
       • One document per metric (per server) per day
       • Per hour/minute statistics stored as documents

           {
               _id: "20101010/metric-1",
               metadata: {
                   date: ISODate("2000-10-10T00:00:00Z"),
                   metric: "metric-1" },
               daily: 5468426,
               hourly: {
                   "00": 227850,
                   "01": 210231,
                   ...
                   "23": 20457 },
               minute: {
                   "0000": 3612,
                   "0001": 3241,
                   ...
                   "1439": 2819 }
           }
    7. Update v1
       • Use $inc to update fields in-place
       • Use upsert to create document if it’s missing
       • Easy, correct, seems like a good idea....

           increment = { daily: 1 }
           increment["hourly." + hour] = 1
           increment["minute." + minute] = 1
           db.stats.update(
               { _id: id, metadata: metadata },
               { $inc: increment },
               true) // upsert
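The update on slide 7 can be sketched in plain JavaScript without a database connection. This is a minimal sketch under stated assumptions: `pad` and `buildV1Update` are hypothetical helper names, timestamps are read in UTC, and the key formats ("00" to "23", "0000" to "1439") are taken from the Schema v1 slide.

```javascript
// Zero-pad a number to a fixed width, e.g. pad(7, 2) gives "07".
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Hypothetical helper: build the selector and $inc document for one
// increment of `metric` at time `when` (a Date, interpreted in UTC).
function buildV1Update(metric, when) {
  const year = when.getUTCFullYear();
  const month = when.getUTCMonth(); // 0-based
  const dayOfMonth = when.getUTCDate();
  const hour = when.getUTCHours();
  const minuteOfDay = hour * 60 + when.getUTCMinutes();

  const increment = { daily: 1 };
  increment["hourly." + pad(hour, 2)] = 1;        // "hourly.00" .. "hourly.23"
  increment["minute." + pad(minuteOfDay, 4)] = 1; // "minute.0000" .. "minute.1439"

  return {
    selector: {
      _id: pad(year, 4) + pad(month + 1, 2) + pad(dayOfMonth, 2) + "/" + metric,
      metadata: { date: new Date(Date.UTC(year, month, dayOfMonth)), metric: metric },
    },
    update: { $inc: increment },
  };
}

const v1 = buildV1Update("metric-1", new Date(Date.UTC(2010, 9, 10, 23, 59)));
console.log(v1.selector._id, v1.update);
```

In the mongo shell, the result would be passed along as `db.stats.update(v1.selector, v1.update, true)`, with the final `true` requesting an upsert as on the slide.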
    8-10. Performance of v1 (chart, annotated “Experiment startup” and “OUCH!”)
    11. Problems with v1
        • The document movement problem
        • The midnight problem
        • The end-of-the-day problem
        • The historical query problem
    12. Document movement problem
        • MongoDB in-place updates are fast
          • ... except when they’re not in place
        • MongoDB adaptively pads documents
          • ... but it’s better to know your doc size ahead of time
    13. Midnight problem
        • Upserts are convenient, but what’s our key?
          • date/metric
        • At midnight, you get a huge spike in inserts
    14. Fixing the document movement problem
        • Preallocate documents with zeros
        • Crontab (?)
          • NO! (makes the midnight problem even worse)

            db.stats.update(
                { _id: id, metadata: metadata },
                { $inc: {
                    daily: 0,
                    "hourly.0": 0,
                    "hourly.1": 0,
                    ...
                    "minute.0": 0,
                    "minute.1": 0,
                    ... } },
                true) // upsert
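The zero-filled `$inc` document from slide 14 is long to write by hand, so it is usually generated. A minimal sketch, assuming the zero-padded key format of the Schema v1 slide ("00" to "23" and "0000" to "1439") rather than the abbreviated `hourly.0` form shown above; `buildPreallocation` is a hypothetical name.

```javascript
// Hypothetical helper: build a $inc document that touches every field of a
// day's document with 0, so an upsert creates the document at full size.
function buildPreallocation() {
  const zeros = { daily: 0 };
  for (let hour = 0; hour < 24; hour++) {
    zeros["hourly." + String(hour).padStart(2, "0")] = 0;
  }
  for (let minute = 0; minute < 24 * 60; minute++) {
    zeros["minute." + String(minute).padStart(4, "0")] = 0;
  }
  return { $inc: zeros };
}

const preallocation = buildPreallocation();
console.log(Object.keys(preallocation.$inc).length); // 1 daily + 24 hourly + 1440 minute fields
```

Because every increment is 0, running this against a document that already exists leaves its counters unchanged, which is what makes it safe to issue more than once.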
    15-18. Fixing the midnight problem
        • Could schedule preallocation for different metrics, staggered through the day
        • Observation: Preallocation isn’t required for correct operation
        • Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability).
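The random preallocation idea can be sketched as follows. The slides say only "with low probability", so the probability value, the helper names `tomorrowId` and `maybePreallocate`, and the injectable `random` parameter are all assumptions made for illustration.

```javascript
// Hypothetical helper: the _id of tomorrow's document for `metric`,
// in the "YYYYMMDD/metric" form used by Schema v1 (UTC dates).
function tomorrowId(metric, when) {
  // Date.UTC normalizes day overflow, so "the 32nd" rolls into the next month.
  const t = new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate() + 1));
  const pad = (n, width) => String(n).padStart(width, "0");
  return pad(t.getUTCFullYear(), 4) + pad(t.getUTCMonth() + 1, 2) + pad(t.getUTCDate(), 2) + "/" + metric;
}

// Call this alongside every normal stats update. With probability
// `probability` it invokes `preallocate` for tomorrow's document.
// `random` is injectable so the behavior can be tested deterministically.
function maybePreallocate(metric, when, probability, preallocate, random = Math.random) {
  if (random() < probability) {
    preallocate(tomorrowId(metric, when));
    return true;
  }
  return false;
}

// Demonstration with a forced random value of 0 so the branch is taken.
const preallocated = [];
const noon = new Date(Date.UTC(2010, 9, 31, 12, 0));
maybePreallocate("metric-1", noon, 0.01, (id) => preallocated.push(id), () => 0);
console.log(preallocated);
```

Since preallocation is not required for correctness, a day on which the random branch never fires still works: the ordinary upsert creates the document, just without the size benefit.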
    19-22. Performance with Preallocation (chart, annotated “Experiment startup”)
        • Well, it’s better
        • Still have decreasing performance through the day... WTF?
    23. Problems with v1
        • The document movement problem
        • The midnight problem
        • The end-of-the-day problem
        • The historical query problem
    24. End-of-day problem
        (diagram: “0000” Value, “0001” Value, ..., “1439” Value)
        • BSON stores documents as an association list
        • MongoDB must check each key for a match
        • Load increases significantly at the end of the day (MongoDB must scan 1439 keys to find the right minute!)
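The cost described on slide 24 can be illustrated with a toy model of a linear key scan. This is only a sketch of the idea, not MongoDB's actual implementation; `keysScanned` and `minuteKeys` are hypothetical names.

```javascript
// Count how many non-matching keys are passed over before `target` is found
// in an ordered list of keys, mimicking a scan of an association list.
function keysScanned(keys, target) {
  let scanned = 0;
  for (const key of keys) {
    if (key === target) {
      return scanned;
    }
    scanned++;
  }
  return scanned; // target absent: every key was inspected
}

// The 1440 minute keys of a v1 document: "0000", "0001", ..., "1439".
const minuteKeys = Array.from({ length: 1440 }, (_, i) => String(i).padStart(4, "0"));

console.log(keysScanned(minuteKeys, "0000")); // first minute of the day
console.log(keysScanned(minuteKeys, "1439")); // last minute of the day
```

The first lookup skips nothing, while the last one passes over all 1439 earlier keys, which is why update cost grows as the simulated day progresses.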
    25. Fixing the end-of-day problem
        • Split up our ‘minute’ property by hour
        • Better worst-case keys scanned:
          • Old: 1439
          • New: 82

            { _id: "20101010/metric-1",
              metadata: {
                  date: ISODate("2000-10-10T00:00:00Z"),
                  metric: "metric-1" },
              daily: 5468426,
              hourly: {
                  "0": 227850,
                  "1": 210231,
                  ...
                  "23": 20457 },
              minute: {
                  "00": { "0000": 3612, "0001": 3241, ... },
                  ...,
                  "23": { ..., "1439": 2819 } }
            }
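The "New: 82" figure on slide 25 follows from scanning at most 23 hour keys and then at most 59 minute keys within the hour. A minimal sketch that checks this with the same toy linear-scan model; all function names here are hypothetical, and the nested key format ("23" then "1439") is taken from the slide.

```javascript
const pad = (n, width) => String(n).padStart(width, "0");

// Number of non-matching keys passed over before `target` in an ordered list.
function keysScanned(keys, target) {
  const index = keys.indexOf(target);
  return index === -1 ? keys.length : index;
}

// Dotted field path for a minute in the hierarchical layout,
// e.g. minute 1439 lives at "minute.23.1439".
function hierarchicalField(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  return "minute." + pad(hour, 2) + "." + pad(minuteOfDay, 4);
}

// Keys skipped to reach a minute: first among the 24 hour keys,
// then among the 60 minute keys stored under that hour.
function hierarchicalCost(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const hourKeys = Array.from({ length: 24 }, (_, h) => pad(h, 2));
  const minuteKeys = Array.from({ length: 60 }, (_, m) => pad(hour * 60 + m, 4));
  return keysScanned(hourKeys, pad(hour, 2)) + keysScanned(minuteKeys, pad(minuteOfDay, 4));
}

// Worst case over the whole day.
let worst = 0;
for (let minute = 0; minute < 1440; minute++) {
  worst = Math.max(worst, hierarchicalCost(minute));
}
console.log(hierarchicalField(1439), worst);
```

The worst case is still the last minute of the day, but it now costs 23 + 59 = 82 skipped keys instead of 1439.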
    26. “Hierarchical minutes” Performance (chart)
    27. Performance Comparison (chart)
    28. Performance Comparison (2.2) (chart)
    29. Historical Query Problem
        • Intra-day queries are great
        • What about “performance year to date”?
          • Now you’re hitting a lot of “cold” documents and causing page faults
    30. Fixing the historical query problem
        • Store multiple levels of granularity in different collections
        • 2 updates rather than 1, but historical queries much faster
        • Preallocate along with daily docs (only infrequently upserted)

            { _id: "201010/metric-1",
              metadata: {
                  date: ISODate("2000-10-01T00:00:00Z"),
                  metric: "metric-1" },
              daily: {
                  "0": 5468426,
                  "1": ...,
                  ...
                  "31": ... }
            }
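The second update from slide 30 targets a per-month document keyed by day. A minimal sketch, assuming the day-of-month number is used directly as the key under `daily` (the slide shows keys "0" through "31" without saying how they map to dates) and that dates are read in UTC; `buildMonthlyUpdate` is a hypothetical name.

```javascript
// Hypothetical helper: selector and $inc document for the monthly rollup.
// Assumption: the key under `daily` is the UTC day of the month (1..31).
function buildMonthlyUpdate(metric, when) {
  const pad = (n, width) => String(n).padStart(width, "0");
  const year = when.getUTCFullYear();
  const month = when.getUTCMonth(); // 0-based

  const increment = {};
  increment["daily." + when.getUTCDate()] = 1;

  return {
    selector: {
      _id: pad(year, 4) + pad(month + 1, 2) + "/" + metric,
      // First day of the month, mirroring the metadata.date on the slide.
      metadata: { date: new Date(Date.UTC(year, month, 1)), metric: metric },
    },
    update: { $inc: increment },
  };
}

const monthly = buildMonthlyUpdate("metric-1", new Date(Date.UTC(2010, 9, 10, 23, 59)));
console.log(monthly.selector._id, monthly.update);
```

Each incoming stat would then produce two upserts, one against the per-day collection and one against a per-month collection (its name is not given in the deck), which is the "2 updates rather than 1" trade-off noted above.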
    31. Queries
        • Updates are by _id, so no index needed there
        • Chart queries are by metadata
        • Your range/sort should be last in the compound index

            db.stats.daily.find(
                { "metadata.date": { $gte: dt1, $lte: dt2 },
                  "metadata.metric": "metric-1" },
                { "metadata.date": 1, "hourly": 1 }
            ).sort({ "metadata.date": 1 })

            db.stats.daily.ensureIndex({
                "metadata.metric": 1,
                "metadata.date": 1 })
    32. Conclusion
        • Monitor your performance. Watch out for spikes.
        • Preallocate to prevent document copying
        • Pay attention to the number of keys in your documents (hierarchy can help)
        • Make sure your index is optimized for your sorts
    33. Questions?
        MongoDB Monitoring Service: http://www.10gen.com/mongodb-monitoring-service
        Rick Copeland, @rick446, http://arborian.com
        MongoDB Consulting & Training