Schema Design at Scale

  1. MongoDB: Schema Design at Scale. Rick Copeland, @rick446, http://arborian.com
  2. Who am I?
     • Now a consultant/trainer; formerly a software engineer at SourceForge
     • Author of Essential SQLAlchemy
     • Author of MongoDB with Python and Ming
     • Primarily code in Python
  3. The Inspiration
     • MongoDB Monitoring Service (MMS)
     • Free to all MongoDB users
     • Minute-by-minute stats on all your servers
     • Hardware cost matters, so use it efficiently (remember, it’s a free service!)
  4. Our Experiment
     • Similar to MMS, but not identical
     • A collection of 100 metrics, each with per-minute values
     • “Simulation time” runs at 300x real time
     • Runs on two AWS small instances:
       • one MongoDB server (2.0.2)
       • one “load generator”
  5. Load Generator
     • Increment each metric as many times as possible during each simulated minute
     • Record the number of updates per second
     • Occasionally call getLastError to prevent disconnects
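At 300x, one real second covers five simulated minutes, so a full simulated day (1,440 minutes) passes in 288 real seconds. A minimal sketch of such a clock, assuming the simulation starts at midnight of an arbitrary day; `SPEEDUP`, `sim_time` and `minute_of_day` are hypothetical helpers, not code from the talk:

```python
from datetime import datetime, timedelta

SPEEDUP = 300  # simulated time runs 300x faster than real time

def sim_time(sim_start: datetime, real_elapsed_s: float) -> datetime:
    """Map real seconds elapsed since the run began to a simulated timestamp."""
    return sim_start + timedelta(seconds=real_elapsed_s * SPEEDUP)

def minute_of_day(ts: datetime) -> int:
    """Minute index 0..1439 within the simulated day."""
    return ts.hour * 60 + ts.minute

start = datetime(2010, 10, 10)  # hypothetical start of the simulation
print(minute_of_day(sim_time(start, 1)))  # 5 (1 real second = 5 simulated minutes)
print(sim_time(start, 288))               # 2010-10-11 00:00:00 (a full simulated day)
```

A load generator loop would read the wall clock, convert it with `sim_time`, and increment the counters for the resulting minute as often as it can.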
  6. Schema v1
     • One document per metric (per server) per day
     • Per-hour and per-minute statistics stored as embedded documents

       {
         _id: "20101010/metric-1",
         metadata: {
           date: ISODate("2010-10-10T00:00:00Z"),
           metric: "metric-1" },
         daily: 5468426,
         hourly: {
           "00": 227850,
           "01": 210231,
           ...
           "23": 20457 },
         minute: {
           "0000": 3612,
           "0001": 3241,
           ...
           "1439": 2819 }
       }
  7. Update v1
     • Use $inc to update fields in place
     • Use upsert to create the document if it’s missing
     • Easy, correct, seems like a good idea...

       // hour and minute are the key strings, e.g. "23" and "1439"
       increment = { daily: 1 }
       increment['hourly.' + hour] = 1
       increment['minute.' + minute] = 1
       db.stats.update(
         { _id: id, metadata: metadata },
         { $inc: increment },
         true) // upsert
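A Python sketch of the same v1 update, assuming the key formats shown on the Schema v1 slide ("YYYYMMDD/metric" ids, two-digit hour keys, four-digit minute-of-day keys). `v1_update` is a hypothetical helper that only builds the filter and update documents; it does not talk to a server:

```python
from datetime import datetime

def v1_update(metric: str, ts: datetime):
    """Build the (filter, update) pair for one increment in the v1 schema."""
    day = datetime(ts.year, ts.month, ts.day)
    doc_id = f"{day:%Y%m%d}/{metric}"
    metadata = {"date": day, "metric": metric}
    increment = {
        "daily": 1,
        f"hourly.{ts.hour:02d}": 1,                     # "00".."23"
        f"minute.{ts.hour * 60 + ts.minute:04d}": 1,    # "0000".."1439"
    }
    return {"_id": doc_id, "metadata": metadata}, {"$inc": increment}

filt, upd = v1_update("metric-1", datetime(2010, 10, 10, 23, 59))
print(filt["_id"])  # 20101010/metric-1
print(upd)          # {'$inc': {'daily': 1, 'hourly.23': 1, 'minute.1439': 1}}
```

With PyMongo the pair could be sent as `db.stats.update_one(filt, upd, upsert=True)`. Because the filter matches on both `_id` and `metadata`, an upsert copies both into the newly created document.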
  10. Performance of v1 [chart, annotated “Experiment startup”]: OUCH!
  11. Problems with v1
      • The document movement problem
      • The midnight problem
      • The end-of-the-day problem
      • The historical query problem
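  12. Document movement problem
      • MongoDB in-place updates are fast...
      • ...except when they’re not in place: a document that outgrows its allocated space must be copied to a new location on disk (and its index entries updated)
      • MongoDB adaptively pads documents...
      • ...but it’s better to know your document size ahead of time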
  13. Midnight problem
      • Upserts are convenient, but what’s our key?
      • date/metric
      • At midnight, every metric needs a new document, so you get a huge spike in inserts
  14. Fixing the document movement problem
      • Preallocate documents with zeros:

        db.stats.update(
          { _id: id, metadata: metadata },
          { $inc: {
              daily: 0,
              'hourly.00': 0, 'hourly.01': 0, ...
              'minute.0000': 0, 'minute.0001': 0, ... } },
          true) // upsert

      • Preallocate from a crontab job (?)
      • NO! (makes the midnight problem even worse)
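Writing out every zero increment by hand is impractical (1 + 24 + 1,440 counters per document), so generating the `$inc` document is easier. A minimal sketch for the v1 schema, assuming the key formats from the Schema v1 slide; `v1_prealloc_increment` is a hypothetical name:

```python
def v1_prealloc_increment():
    """$inc document that touches every v1 counter with 0.

    Incrementing by zero leaves existing counts unchanged, but an upsert
    creates the document with all of its fields present, so later $inc
    updates don't need to add fields and grow it.
    """
    inc = {"daily": 0}
    inc.update({f"hourly.{h:02d}": 0 for h in range(24)})
    inc.update({f"minute.{m:04d}": 0 for m in range(1440)})
    return {"$inc": inc}

upd = v1_prealloc_increment()
print(len(upd["$inc"]))  # 1465 (1 daily + 24 hourly + 1440 minute counters)
```

Sent with `upsert=True` and the same filter as a normal update, it either creates the full-size document or leaves an existing one's counts as they were.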
  18. Fixing the midnight problem
      • Could schedule preallocation for different metrics, staggered through the day
      • Observation: preallocation isn’t required for correct operation
      • Let’s just preallocate tomorrow’s docs randomly as new stats are inserted (with low probability)
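A sketch of the randomized preallocation idea, assuming a hypothetical `preallocate(metric, day)` callback that performs the zero-`$inc` upsert, and an arbitrary illustrative probability of 1/500 (not a value from the talk). The random source is injectable so the behaviour can be checked deterministically:

```python
import random
from datetime import datetime, timedelta

PREALLOC_PROB = 1.0 / 500  # hypothetical value, chosen only for illustration

def maybe_preallocate(metric, ts, preallocate, rng=random.random):
    """With low probability, preallocate tomorrow's document for `metric`.

    `preallocate(metric, day)` is a hypothetical callback that runs the
    zero-$inc upsert. Running it twice is harmless, and correctness never
    depends on it, so random scheduling is good enough.
    """
    if rng() < PREALLOC_PROB:
        tomorrow = datetime(ts.year, ts.month, ts.day) + timedelta(days=1)
        preallocate(metric, tomorrow)
        return True
    return False

# Usage with a forced "hit" so the effect is visible:
calls = []
maybe_preallocate("metric-1", datetime(2010, 10, 31, 13, 5),
                  lambda m, day: calls.append((m, day)), rng=lambda: 0.0)
print(calls)  # [('metric-1', datetime.datetime(2010, 11, 1, 0, 0))]
```

Since each metric receives many updates per simulated day, even a small probability makes it very likely that tomorrow's document is created well before midnight, at a different random moment for each metric.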
  22. Performance with Preallocation [chart, annotated “Experiment startup”]
      • Well, it’s better
      • Still have decreasing performance through the day... WTF?
  24. End-of-day problem
      [diagram: “0000” → Value, “0001” → Value, ..., “1439” → Value]
      • BSON stores documents as an association list (an ordered list of key/value pairs)
      • MongoDB must check each key in turn for a match
      • Load increases significantly toward the end of the day (MongoDB must scan past 1439 keys to find the right minute!)
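A toy model of why the last minute of the day is the slowest to update: treat the `minute` subdocument as an ordered list of pairs and count the keys that are checked and skipped before the match. This is a simplification for illustration, not MongoDB's actual update code; `keys_skipped` is a hypothetical helper:

```python
def keys_skipped(pairs, key):
    """Count keys checked and rejected before `key` in an ordered key/value list."""
    for skipped, (k, _value) in enumerate(pairs):
        if k == key:
            return skipped
    raise KeyError(key)

# v1 'minute' subdocument: 1,440 fields stored in order.
minute = [(f"{m:04d}", 0) for m in range(1440)]
print(keys_skipped(minute, "0000"))  # 0
print(keys_skipped(minute, "1439"))  # 1439
```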
  25. Fixing the end-of-day problem
      • Split up our ‘minute’ property by hour
      • Better worst-case keys scanned:
        • Old: 1439
        • New: 82

        {
          _id: "20101010/metric-1",
          metadata: {
            date: ISODate("2010-10-10T00:00:00Z"),
            metric: "metric-1" },
          daily: 5468426,
          hourly: {
            "0": 227850,
            "1": 210231,
            ...
            "23": 20457 },
          minute: {
            "00": { "0000": 3612, "0001": 3241, ... },
            ...,
            "23": { ..., "1439": 2819 } }
        }
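Under the hierarchical layout an update addresses the hour first and then the minute inside it, so the worst case skips 23 hour keys plus 59 minute keys: the slide's 82, versus 1439 for the flat layout. A sketch assuming the slide's key formats (two-digit hour, four-digit minute-of-day); both helpers are hypothetical:

```python
from datetime import datetime

def hierarchical_minute_key(ts: datetime) -> str:
    """Dotted $inc path for one minute: minute.<HH>.<minute-of-day>."""
    return f"minute.{ts.hour:02d}.{ts.hour * 60 + ts.minute:04d}"

def worst_case_skipped(outer: int, inner: int) -> int:
    """Keys skipped to reach the last inner key under the last outer key."""
    return (outer - 1) + (inner - 1)

print(hierarchical_minute_key(datetime(2010, 10, 10, 23, 59)))  # minute.23.1439
print(worst_case_skipped(24, 60))  # 82 (24 hours x 60 minutes)
```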
  26. “Hierarchical minutes” Performance
  27. Performance Comparison
  28. Performance Comparison (MongoDB 2.2)
  29. Historical Query Problem
      • Intra-day queries are great
      • What about “performance year to date”?
      • Now you’re hitting a lot of “cold” documents (one per metric per day, mostly not in RAM) and causing page faults
  30. Fixing the historical query problem
      • Store multiple levels of granularity in different collections
      • 2 updates rather than 1, but historical queries are much faster
      • Preallocate monthly docs along with daily docs (they are only upserted infrequently)

        {
          _id: "201010/metric-1",
          metadata: {
            date: ISODate("2010-10-01T00:00:00Z"),
            metric: "metric-1" },
          daily: {
            "0": 5468426,
            "1": ...,
            ...
            "31": ... }
        }
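Each event now costs two updates, one per granularity. A sketch that builds both, assuming documents are already preallocated (so the filter is just `_id`), the hierarchical daily layout, and monthly `daily` keys equal to the day of the month; that last point is an assumption, since the slide's keys start at "0". `rollup_updates` is a hypothetical helper:

```python
from datetime import datetime

def rollup_updates(metric: str, ts: datetime):
    """Build the two (filter, update) pairs applied for one event.

    Assumes preallocated documents (filter by _id only), the hierarchical
    daily layout, and monthly 'daily' keys equal to the day of the month.
    """
    minute_of_day = ts.hour * 60 + ts.minute
    daily = (
        {"_id": f"{ts:%Y%m%d}/{metric}"},
        {"$inc": {
            "daily": 1,
            f"hourly.{ts.hour}": 1,
            f"minute.{ts.hour:02d}.{minute_of_day:04d}": 1,
        }},
    )
    monthly = (
        {"_id": f"{ts:%Y%m}/{metric}"},
        {"$inc": {f"daily.{ts.day}": 1}},
    )
    return daily, monthly

daily, monthly = rollup_updates("metric-1", datetime(2010, 10, 10, 23, 59))
print(monthly)  # ({'_id': '201010/metric-1'}, {'$inc': {'daily.10': 1}})
```

With PyMongo these would go to separate collections, e.g. `db.stats.daily.update_one(*daily)` and a hypothetical `db.stats.monthly.update_one(*monthly)`. A year-to-date chart then reads about 12 monthly documents per metric instead of about 365 daily ones.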
  31. Queries
      • Updates are by _id, so no index is needed there
      • Chart queries are by metadata
      • Your range/sort field should be last in the compound index

        // dt1, dt2: start and end of the chart's date range
        db.stats.daily.find(
          { "metadata.date": { $gte: dt1, $lte: dt2 },
            "metadata.metric": "metric-1" },
          { "metadata.date": 1, "hourly": 1 }
        ).sort({ "metadata.date": 1 })

        // equality field (metric) first, range/sort field (date) last
        db.stats.daily.ensureIndex({ 'metadata.metric': 1, 'metadata.date': 1 })
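Why the range/sort field goes last: in an index ordered by (metric, date), all entries for one metric are adjacent and already sorted by date, so the query becomes a single contiguous scan with no extra sort. A toy model using a sorted Python list in place of a B-tree, with hypothetical data:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Toy compound index on (metadata.metric, metadata.date): entries sorted by
# metric first, then date, each pointing at a document _id.
index = sorted(
    ((metric, datetime(2010, 10, day)), f"201010{day:02d}/{metric}")
    for metric in ("metric-1", "metric-2")
    for day in range(1, 32)
)
keys = [k for k, _ in index]

def range_scan(metric, dt1, dt2):
    """Equality on metric + range on date = one contiguous slice, sorted by date."""
    lo = bisect_left(keys, (metric, dt1))
    hi = bisect_right(keys, (metric, dt2))
    return [doc_id for _, doc_id in index[lo:hi]]

print(range_scan("metric-1", datetime(2010, 10, 3), datetime(2010, 10, 5)))
# ['20101003/metric-1', '20101004/metric-1', '20101005/metric-1']
```

If the index were ordered (date, metric) instead, entries for other metrics would be interleaved within the date range and the scan would have to step over them.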
  32. Conclusion
      • Monitor your performance. Watch out for spikes.
      • Preallocate to prevent document copying
      • Pay attention to the number of keys in your documents (hierarchy can help)
      • Make sure your index is optimized for your sorts
  33. Questions?
      • MongoDB Monitoring Service: http://www.10gen.com/mongodb-monitoring-service
      • Rick Copeland, @rick446, http://arborian.com
      • MongoDB Consulting & Training
