Mongo scaling

Co-founder at Stealth Startup
Sep. 27, 2012


Editor's Notes

  3. All user activity is stored in Mongo: check-ins, game usernames, etc.
     The Heyzap SDK is in many top-tier titles, which means lots of events: analytics for the millions of game sessions involving the Heyzap SDK.
     Geospatial queries find where people checked in.
     Mongo is supplemented with MySQL (which allows joins, etc.).
     Redis also serves as a caching layer.
  4. High-burst writes: people deploy bad code and we get all their exceptions.
     Bugsnag uses Mongo and Redis alone, with Redis as a caching layer on top of Mongo.
  6. Schemaless: no migrations. Migrating SQL caused a lot of downtime for Heyzap.
     Fire & forget: by default Mongo doesn't wait for the write to complete before returning to the app.
  7. Many pros are also cons. Know what you are getting into.
     Schemaless means the app has to cope with bad data, migrations, bad states, etc.
     Fire & forget: you can use the safe option to wait for the write, but that costs speed.
     No joins: you can only pull data from one collection at a time.
     A single write lock covers the whole database. Not great for a high proportion of writes, although writes yield. In 2.2 you can mitigate this with one database per collection. Collection-level locks were expected in 2.4 (they ultimately shipped in 3.0).
  9. Design with performance in mind. Think future-proof.
     Work out where your pain points will be.
     Begin to scale before you hit 95% capacity: you need spare capacity in order to scale.
  11. Working set = frequently used data. In a logging app it would be the last n days of logs; 99% of queries would hit that.
      Indexes and documents should be in RAM for best results. The bare minimum is the indexes!
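
The rule in the note above can be written down as a small check. This is only a sketch: the function name, the three return labels, and the idea of feeding it byte counts taken from your own monitoring are assumptions for illustration, not part of the talk.

```python
GIB = 1024 ** 3


def memory_headroom(index_bytes: int, working_set_bytes: int, ram_bytes: int) -> str:
    """Classify a node by what fits in RAM.

    "ok"           -> indexes and the working-set documents both fit
    "indexes-only" -> only the indexes fit (the bare minimum from the note)
    "over"         -> not even the indexes fit
    """
    if index_bytes + working_set_bytes <= ram_bytes:
        return "ok"
    if index_bytes <= ram_bytes:
        return "indexes-only"
    return "over"


# Hypothetical numbers: 6 GiB of indexes, 20 GiB of hot documents, 16 GiB of RAM.
print(memory_headroom(6 * GIB, 20 * GIB, 16 * GIB))
```

A node that reports "indexes-only" still works, but reads of working-set documents will go to disk.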
  12. When RAM gets full, Mongo's performance drops massively. This is no exaggeration.
  13. For Heyzap, I/O is the single biggest headache on EC2: EBS spikes at random.
      Heyzap moved to provisioned IOPS when it was released, to smooth out the spikes rather than to get better throughput.
  14. XFS supports I/O suspend and write-cache flushing, which are essential for AWS snapshots.
      Increase the file descriptor limit to allow more open files.
      atime updates the access time on files, which turns reads into writes = bad. Turn it off (mount with noatime).
      Read-ahead means the system reads extra blocks from disk when doing a read. Good for sequential access, bad for random (Mongo) access.
  16. Bigger machine.
      Hard to get more on one machine, especially in the cloud.
      Can be viable in the short term. You can do this with no downtime; Heyzap and Bugsnag do.
  18. If you use replica sets, monitor the replication lag. It should be close to zero; otherwise users can write something but can't read it back.
      You can send a "write concern" that asks for the write to be replicated to secondaries. This can hurt you if the secondaries are behind.
      The whole working set must still fit in memory on each member: replica sets scale the volume of reads, not the data size.
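
One way to monitor the lag described above is to compare each secondary's last applied operation time with the primary's. The sketch below assumes a list of member documents shaped like the `members` array returned by the `replSetGetStatus` command (`name`, `stateStr`, `optimeDate`); the host names and timestamps are made up.

```python
from datetime import datetime, timedelta


def replication_lag(members: list) -> dict:
    """Map each SECONDARY's name to how far its optimeDate trails the PRIMARY's."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status for a three-member replica set.
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": datetime(2012, 9, 27, 12, 0, 10)},
    {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": datetime(2012, 9, 27, 12, 0, 10)},
    {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": datetime(2012, 9, 27, 12, 0, 6)},
]

lag = replication_lag(members)
for name, delta in lag.items():
    print(name, delta.total_seconds(), "seconds behind")
```

Graphing the per-member value over time makes it obvious when a secondary starts to fall behind.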
  19. Mongo supports automatic sharding. Pick your shard key carefully so that load is distributed correctly across shards.
      Sharding distributes the working set across all shards, which helps with big working sets. It also distributes writes.
      Heyzap did manual sharding by collection.
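
Why the shard key matters can be seen in a toy model. The sketch below is not MongoDB's chunk balancer: it assumes four shards, a range-based placement where each shard owns a contiguous slice of the key space, and an MD5-based hashed placement chosen purely for illustration. It compares where the newest 1,000 inserts land when the key increases monotonically (a timestamp or counter).

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4      # hypothetical cluster size
MAX_KEY = 100_000   # hypothetical upper bound of the key space


def range_shard(key: int) -> int:
    """Range placement: shard i owns the i-th contiguous quarter of the key space."""
    return min(key * NUM_SHARDS // MAX_KEY, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Hashed placement: the shard is chosen from a hash of the key."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# The newest 1,000 documents all have keys at the top of the range.
recent_keys = range(99_000, 100_000)
range_load = Counter(range_shard(k) for k in recent_keys)
hash_load = Counter(hashed_shard(k) for k in recent_keys)

print("range :", dict(range_load))   # every write lands on the last shard
print("hashed:", dict(hash_load))    # writes are spread over all shards
```

With the monotonically increasing key every new write goes to one shard, so adding shards does not add write capacity; a key that spreads values out avoids that hot spot at the cost of scattering range queries.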
  21. Returning only what you need will be faster.
      On large datasets, I advise ensuring that pretty much every query is indexed. Cron jobs running unindexed queries have caused Heyzap downtime. On smaller datasets unindexed queries are fine.
      Run explain on any new query you are about to deploy and verify that it uses an index. Saves a lot of downtime!
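
The "verify it uses an index" step can be automated in a deploy check. The sketch below inspects an explain document that has already been fetched; it assumes two shapes: the older output with a top-level `cursor` field ("BasicCursor" for a full scan, "BtreeCursor ..." for an index scan) and the newer `queryPlanner.winningPlan` tree where a full scan shows up as a `COLLSCAN` stage. The helper names and the sample documents are illustrative.

```python
def _has_collscan(node) -> bool:
    """Recursively look for a plan stage named COLLSCAN anywhere in the plan tree."""
    if isinstance(node, dict):
        if node.get("stage") == "COLLSCAN":
            return True
        return any(_has_collscan(value) for value in node.values())
    if isinstance(node, list):
        return any(_has_collscan(item) for item in node)
    return False


def uses_index(explain: dict) -> bool:
    """Return False when the explain output describes a full collection scan."""
    if "cursor" in explain:  # older explain format
        return not explain["cursor"].startswith("BasicCursor")
    if "queryPlanner" in explain:  # newer explain format
        return not _has_collscan(explain["queryPlanner"].get("winningPlan", {}))
    raise ValueError("unrecognised explain output")


# Hypothetical explain documents.
print(uses_index({"cursor": "BtreeCursor user_id_1", "nscanned": 12}))
print(uses_index({"cursor": "BasicCursor", "nscanned": 1500000}))
```

A check like this can fail a build before an unindexed query reaches production.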
  22. Means we don't have to read as many documents, which means we don't need to seek as much on disk.
      Not always applicable. Sometimes the same doc would be in too many different places, which would make updates too hard.
  23. If we wanted to index here on android and iphone separately, that would be two indexes.
      We can combine them into one "bitfield", halving our index size. Heyzap had a very similar issue with its schema.
      Means we can use less RAM. #1 rule in Mongo: use less RAM.
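
A minimal sketch of the bitfield idea, assuming the two booleans are called android and iphone as in the note. The combined field name `platforms`, the bit assignments, and the use of an `$in` filter to query one flag are illustrative choices, not taken from the slides.

```python
ANDROID = 1 << 0  # bit 0
IPHONE = 1 << 1   # bit 1
ALL_FLAGS = ANDROID | IPHONE


def encode_platforms(android: bool, iphone: bool) -> int:
    """Pack the two booleans into one small integer that needs a single index."""
    return (ANDROID if android else 0) | (IPHONE if iphone else 0)


def values_with(flag: int) -> list:
    """Every bitfield value that has `flag` set, suitable for an $in filter."""
    return [value for value in range(ALL_FLAGS + 1) if value & flag]


# Hypothetical document and query filter.
doc = {"name": "some game", "platforms": encode_platforms(android=True, iphone=False)}
android_filter = {"platforms": {"$in": values_with(ANDROID)}}

print(doc)
print(android_filter)
```

With only two flags there are four possible values, so the list of matching values stays tiny; the approach gets less attractive as the number of flags grows.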
  25. Whether it's worth it depends on how small your values/documents are.
      Can reduce your working set by making commonly accessed documents smaller.
      No effect on indexes.
  26. The small performance hit from using the profiler is worth it. You need to know how fast your db is running.
      In the mongo shell, run db.setProfilingLevel(1,100). This logs all queries that take more than 100ms.
      The profile collection (system.profile) is a capped collection. It may need resizing depending on your throughput.
  27. Sample output of the profiler.
  28. ts = when it ran. Tie that to your other logs.
      nscanned = number of index entries or documents scanned.
      scanAndOrder = set when Mongo can't use the index to sort.
      numYield = how many times it yielded; an indication of page faults, etc.
      millis = total duration.
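
The fields listed above are enough for a first-pass triage of profiler entries. The sketch below works on a plain dictionary shaped like a profiler document; the sample entry, the namespace `app.checkins`, and the 10x scanned-to-returned threshold are assumptions picked for illustration (`nreturned` is the profiler's count of returned documents).

```python
from datetime import datetime


def diagnose(entry: dict) -> list:
    """Return human-readable warnings for one profiler entry."""
    problems = []
    nscanned = entry.get("nscanned", 0)
    nreturned = entry.get("nreturned", 0)
    if entry.get("scanAndOrder"):
        problems.append("sort not served by an index")
    if nscanned > 10 * max(nreturned, 1):  # hypothetical threshold
        problems.append("scans far more than it returns")
    if entry.get("numYield", 0) > 0:
        problems.append("yielded (possible page faults)")
    return problems


# Hypothetical profiler entry.
sample = {
    "ts": datetime(2012, 9, 27, 12, 0, 0),
    "ns": "app.checkins",
    "millis": 412,
    "nscanned": 50000,
    "nreturned": 20,
    "scanAndOrder": True,
    "numYield": 3,
}

for problem in diagnose(sample):
    print(sample["ns"], sample["millis"], "ms:", problem)
```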
  30. Graphing index size lets you predict scaling needs. Heyzap could predict accurately to within about a day.
      Spikes in current ops show you when to look at the profiler.
      Indexes should rarely miss.
      Replication lag leads to a poor user experience on reads and to harder app code (reading from the primary).
  32. opid = operation ID. Pass this to db.killOp() to stop the operation.
      ns = namespace = database.collection.
      Can show you why everything has suddenly gone slow, but you can miss the guilty query; the profiler is better.
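
Picking kill candidates out of the current-operations output can be sketched as follows. The function assumes a document shaped like the result of `db.currentOp()`, with an `inprog` array whose entries carry `opid`, `active`, `secs_running` and `ns`; the five-second cutoff and the sample operations are hypothetical.

```python
def slow_ops(current_op: dict, min_secs: int = 5) -> list:
    """Return (opid, ns, secs_running) for active operations, slowest first."""
    hits = [
        (op["opid"], op.get("ns", ""), op["secs_running"])
        for op in current_op.get("inprog", [])
        if op.get("active") and op.get("secs_running", 0) >= min_secs
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical currentOp output.
current_op = {
    "inprog": [
        {"opid": 101, "active": True, "secs_running": 1, "ns": "app.users", "op": "query"},
        {"opid": 102, "active": True, "secs_running": 48, "ns": "app.checkins", "op": "query"},
        {"opid": 103, "active": False, "ns": "app.events", "op": "insert"},
        {"opid": 104, "active": True, "secs_running": 9, "ns": "app.events", "op": "update"},
    ]
}

for opid, ns, secs in slow_ops(current_op):
    print(f"opid {opid} on {ns} has been running for {secs}s")
```

The opid values it prints are what you would pass to db.killOp() after checking that the operation is safe to stop.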
  33. Locks: the time in microseconds spent holding locks and waiting for locks.
      Index counters say how many index hits we had. A miss means the index was not in RAM = bad.
  34. Useful stats. Index size: keep it in RAM.
      Graph index size.
      These metrics can help you predict the need for scaling.
      You can also call db.collection.stats() to get something similar.
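
Predicting the need for scaling from graphed index size can be as simple as fitting a straight line through recent samples. The sketch below assumes roughly linear growth and daily samples of total index size in bytes; the function name and the sample numbers are made up.

```python
GIB = 1024 ** 3


def days_until_full(samples: list, ram_bytes: int):
    """Least-squares line through (day, index_bytes) samples.

    Returns the number of days after the last sample at which the fitted
    index size reaches ram_bytes, or None if the size is not growing.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_size = sum(size for _, size in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        return None  # need samples from at least two different days
    slope = sum((day - mean_day) * (size - mean_size) for day, size in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_size - slope * mean_day
    day_full = (ram_bytes - intercept) / slope
    return day_full - samples[-1][0]


# Hypothetical daily samples: the indexes grow by 2 GiB per day.
samples = [(0, 10 * GIB), (1, 12 * GIB), (2, 14 * GIB), (3, 16 * GIB)]
remaining = days_until_full(samples, ram_bytes=32 * GIB)
print(f"indexes outgrow RAM in about {remaining:.1f} days")
```

Real growth is rarely perfectly linear, so treat the result as an early warning rather than a deadline.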
  35. You can use --locks to show lock statistics if you prefer that view.
      Good to check if you aren't sure which collections are heavily used.