Capacity Planning For Your Growing MongoDB Cluster

Your MongoDB deployment is growing, but are you prepared for that growth? Capacity planning is an essential practice when deploying any database system. You need to understand your usage patterns and determine the appropriate hardware based on your application's needs. Scaling reads and scaling writes require different types of resources. With the proper tools in place, you can understand your working set, see when it's time to add resources or start sharding, and avoid performance issues. In this session, you'll learn how to use MongoDB Management Service and other tools to identify patterns and predict growth, ensuring your success with MongoDB.

Published in: Technology

  1. 1. Capacity Planning: Deploying MongoDB • Sam Weaver, Solution Architect, MongoDB • #mongodb
  2. 2. Capacity Planning • Why is it important? • What is it? • When is it important? • How is it actually done?
  3. 3. Why?
  4. 4. Prepping for launch • You’ve written your application • The code is good • You’re looking to launch soon • How do I deploy?
  5. 5. Questions to ask yourself • Instance types – Standalone? – Replica set? – Sharded? • Architecture • Size of machines – Machines cost money – Size of machines may affect instance types required
  6. 6. Why does it matter? • What are the consequences of not planning?
  7. 7. Why • Once we launch, we don't want avoidable downtime due to poorly selected hardware • As our success grows, we want to stay in front of the demand curve • We want to meet the business's and users' expectations • We want to keep our jobs
  8. 8. What?
  9. 9. What is Capacity Planning? Requirements • Resources
  10. 10. Requirements • Availability – Planning for a crash – Planning for binary upgrades – Planning for hardware maintenance • Throughput – X many users at any one time – Bulk loads vs. random access • Responsiveness – SLA of x ms per page load – Amazon, Google study
  11. 11. How?
  12. 12. CPU • Non-indexed Data • Sorting • Aggregation – Map/Reduce – Framework • Data – Fields – Nesting – Arrays/Embedded-Docs
  13. 13. Network • Latency – WriteConcern – ReadPreference – Batching • Throughput – Update/Write Patterns – Reads/Queries
  14. 14. Understand memory usage for MongoDB • Data & indexes memory mapped into virtual address space • Data accessed is paged into RAM • OS evicts least recently used page • More frequently used pages stay in RAM
  15. 15. Identify your working set Number of active users on the system at any one time Number of distinct pages accessed per second =
  16. 16. Working Set
  17. 17. Working Set 4 distinct pages per second RAM Disk
  18. 18. Working Set 4 distinct pages per second
  19. 19. Working Set 4 distinct pages per second Worst case 4 disk accesses
  20. 20. Working Set 6 distinct pages per second
  21. 21. Working Set 6 distinct pages per second
  22. 22. Working Set 6 distinct pages per second
  23. 23. Working Set 6 distinct pages per second Worst case disk access on every op
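The working-set slides above can be reproduced with a small simulation. This is a minimal sketch, not MongoDB's actual paging logic: it assumes strict least-recently-used eviction, a hypothetical RAM capacity of 4 pages (`RAM_PAGES`), and a workload that cycles through the same pages every second. The function name `count_disk_reads` is illustrative only.

```python
from collections import OrderedDict


def count_disk_reads(accesses, ram_pages):
    """Count page faults (disk reads) for an access sequence under LRU eviction."""
    ram = OrderedDict()  # page id -> None, ordered from least to most recently used
    disk_reads = 0
    for page in accesses:
        if page in ram:
            ram.move_to_end(page)  # hit: mark as most recently used
        else:
            disk_reads += 1  # miss: the page has to come from disk
            ram[page] = None
            if len(ram) > ram_pages:
                ram.popitem(last=False)  # evict the least recently used page
    return disk_reads


RAM_PAGES = 4  # assumption: RAM holds 4 pages
SECONDS = 10

# 4 distinct pages per second, repeated: the working set fits in RAM.
fits = [page for _ in range(SECONDS) for page in range(4)]
# 6 distinct pages per second, repeated: the working set exceeds RAM.
too_big = [page for _ in range(SECONDS) for page in range(6)]

fits_reads = count_disk_reads(fits, RAM_PAGES)
too_big_reads = count_disk_reads(too_big, RAM_PAGES)

print(f"4 pages/s: {fits_reads} disk reads for {len(fits)} operations")
print(f"6 pages/s: {too_big_reads} disk reads for {len(too_big)} operations")
```

Under these assumptions the first workload only pays for the initial 4 reads, while the second goes to disk on every operation, which is the worst case the slides describe. Real workloads are rarely perfectly cyclic, so actual miss rates usually fall somewhere in between.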
  24. 24. Memory & Storage MOPs PFs
  25. 25. Memory • Working set affected by – Sorting – Aggregation – Connections
  26. 26. Working Set Estimator "workingSet" : { "note" : "thisIsAnEstimate", "pagesInMemory" : <num>, "computationTimeMicros" : <num>, "overSeconds" : <num> } Number of unique pages the server needed in the last 15 minutes. Use this to see whether you are outgrowing RAM
  27. 27. Storage • Different storage types deliver different IOPS – Spinning disk • 7,200 rpm SATA: 75-100 IOPS – SSD • 9,000-120,000 IOPS – EBS • 100 IOPS – Provisioned EBS • 2,000 IOPS • Work out how much data you need to write per time frame • MongoDB writes to a journal, and data files are flushed to disk • Replication adds oplog considerations
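One way to read the estimator output is to convert `pagesInMemory` into bytes and compare it with physical RAM. The sketch below is illustrative and makes several assumptions: a 4 KiB page size, a hypothetical helper named `working_set_report`, and made-up sample numbers in place of real server output (on a live server the document would come from `serverStatus`, on versions that expose a `workingSet` section).

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages


def working_set_report(working_set, ram_bytes, page_size=PAGE_SIZE_BYTES):
    """Summarise a workingSet document relative to the RAM on the host."""
    ws_bytes = working_set["pagesInMemory"] * page_size
    return {
        "working_set_bytes": ws_bytes,
        "ram_fraction": ws_bytes / ram_bytes,  # above 1.0 would mean the set no longer fits
        "window_seconds": working_set["overSeconds"],
    }


# Made-up sample values, shaped like the document on the slide.
sample = {
    "note": "thisIsAnEstimate",
    "pagesInMemory": 1_500_000,
    "computationTimeMicros": 12_000,
    "overSeconds": 900,
}

report = working_set_report(sample, ram_bytes=8 * 1024**3)  # hypothetical 8 GiB host
print(f"Working set is about {report['working_set_bytes'] / 1024**3:.1f} GiB, "
      f"{report['ram_fraction']:.0%} of RAM over {report['window_seconds']} s")
```

Tracking `ram_fraction` over time, rather than reading a single sample, gives a better sense of whether the deployment is heading toward the limit.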
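The "work out how much data you need to write per time frame" step can be turned into rough arithmetic. This is a back-of-the-envelope sketch under loud assumptions: every I/O is a 4 KiB random write, only the low end of each IOPS range quoted on the slide is used, and a hypothetical 50% headroom factor is applied. It ignores journal batching, sequential writes, and oplog traffic, so treat the result as a first filter rather than a sizing answer.

```python
import math

# Low end of the ranges quoted on the slide (IOPS).
DEVICE_IOPS = {
    "sata_spinning": 75,
    "ebs": 100,
    "provisioned_ebs": 2000,
    "ssd": 9000,
}


def required_write_iops(bytes_per_second, io_size_bytes=4096):
    """IOPS needed if every write is one I/O of io_size_bytes (assumption)."""
    return math.ceil(bytes_per_second / io_size_bytes)


def devices_that_keep_up(bytes_per_second, headroom=0.5):
    """Device types whose low-end IOPS, scaled by headroom, cover the write rate."""
    need = required_write_iops(bytes_per_second)
    return [name for name, iops in DEVICE_IOPS.items() if need <= iops * headroom]


write_rate = 2_000_000  # hypothetical: 2 MB of writes per second
print(required_write_iops(write_rate), devices_that_keep_up(write_rate))
```

For the hypothetical 2 MB/s write rate, this comes out to 489 IOPS, which rules out a single spinning disk or standard EBS volume and leaves provisioned EBS and SSD as candidates.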
  28. 28. Using this information • Plan hardware to hold the working set + indexes • Allow room to grow • If working set is larger than RAM and you can’t reasonably add more resources, then shard – Don’t shard too early – Lots of little instances vs. a few big instances • Think about architecture – Local disk or central storage – Don’t be surprised with x copies of data with x number of nodes
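The sizing advice on this slide can be sketched as a small calculation. Everything here is an assumption made for illustration: the function name `plan_cluster`, the 1.5x growth allowance, three replica set members per shard, and the sample sizes. It only captures the "working set + indexes must fit in RAM" rule and the "x copies of data with x nodes" reminder, and says nothing about CPU, network, or IOPS.

```python
import math


def plan_cluster(working_set_gb, index_gb, data_gb, ram_per_node_gb,
                 replicas=3, growth=1.5):
    """Rough shard and disk estimate from RAM needs (illustrative only)."""
    # Memory that should stay resident, with room to grow.
    hot_gb = (working_set_gb + index_gb) * growth
    # Shard only when one node's RAM cannot hold the hot data.
    shards = max(1, math.ceil(hot_gb / ram_per_node_gb))
    return {
        "shards": shards,
        "nodes": shards * replicas,
        # Every replica keeps a full copy of the data.
        "total_disk_gb": data_gb * replicas * growth,
    }


# Hypothetical sizes: 40 GB working set, 20 GB indexes, 500 GB data, 64 GB RAM per node.
estimate = plan_cluster(working_set_gb=40, index_gb=20, data_gb=500, ram_per_node_gb=64)
print(estimate)
```

With these sample numbers the hot data (90 GB after the growth allowance) does not fit on one 64 GB node, so the sketch suggests 2 shards, 6 nodes, and 2,250 GB of disk in total. Buying a larger node instead would bring the estimate back to a single replica set, which is the "don't shard too early" trade-off in numbers.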
  29. 29. Development to production • Don’t be surprised by: – More data = more/larger indexes – Indexes make your working set bigger • Replication adds a network overhead • Journal has different access patterns
  30. 30. What tools are there to help me?
  31. 31. IOStat
  32. 32. MongoStat
  33. 33. MongoPerf • Measure amount of data written to device per second
  34. 34. MongoDB Management Service • Free cloud-based or on-premises management tool – Monitoring – Automation – Backup
  35. 35. Scaling for capacity – MMS automation
  36. 36. When?
  37. 37. Capacity Planning: When • When? – Before it's too late! – Iterative process (timeline: Start, Launch, Version 2)
  38. 38. Repeat (continuously) • Repeat Testing • Repeat Evaluations • Repeat Deployment
  39. 39. What is failure? • We have failed at Capacity Planning when our resources don’t meet our requirements • Because our requirements can have many dimensions, we may exceed our requirements in one characteristic but not meet them in another • This means that we can spend many $$$ and still fail!
  40. 40. Models • Load/Users – Response Time/TTFB • System Performance – Peak Usage – Min Usage
  41. 41. Starter Questions • What is the working set? – How does that equate to memory – How much disk access will that require • How efficient are the queries? • What is the rate of data change? • How big are the highs and lows?
  42. 42. Questions?
  43. 43. Thank You • Sam Weaver, Solution Architect, MongoDB • #mongodb
