This document discusses capacity planning for MongoDB deployments. It defines capacity planning as determining the resources (CPU, memory, storage, and network capacity) needed to meet requirements such as availability, throughput, and responsiveness. It emphasizes starting capacity planning before launch to avoid downtime. Key aspects for MongoDB include estimating the working set size, sizing storage I/O from data size and access patterns, using tools such as iostat and MongoDB Management Service for monitoring and automation, and iterating through testing and deployment. Capacity planning fails when the planned resources cannot meet the requirements.
4. Prepping for launch
• You’ve written your application
• The code is good
• You’re looking to launch soon
• How do I deploy?
5. Questions to ask yourself
• Instance types
– Standalone?
– Replica set?
– Sharded?
• Architecture
• Size of machines
– Machines cost money
– Size of machines may affect instance types required
6. Why does it matter?
• What are the consequences of not planning?
7. Why
• Once we launch, we don't want to have avoidable downtime due to poorly selected hardware
• As our success grows, we want to stay in front of the demand curve
• We want to meet the business' and users' expectations
• We want to keep our jobs
10. Requirements
• Availability
– Planning for a crash
– Planning for binary upgrades
– Planning for hardware maintenance
• Throughput
– X many users at any one time
– Bulk loads vs. random access
• Responsiveness
– SLA of x ms per page load
– Amazon, Google study
14. Understand memory usage for MongoDB
• Data & indexes memory mapped into virtual address space
• Data accessed is paged into RAM
• OS evicts least recently used page
• More frequently used pages stay in RAM
15. Identify your working set
• Number of active users on the system at any one time
• Number of distinct pages accessed per second
25. Memory
• Working set affected by
– Sorting
– Aggregation
– Connections
26. Working Set Estimator
"workingSet" : {
"note" : "thisIsAnEstimate",
"pagesInMemory" : <num>,
"computationTimeMicros" : <num>,
"overSeconds" : <num>
}
Number of unique pages the server needed in the last 15 minutes. Use this to see if you are outgrowing RAM.
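As a rough sketch, the page count reported by the estimator can be converted to bytes and compared with the RAM on the machine. The 4 KiB page size, the function names, the 8 GiB of RAM, and the sample numbers are all assumptions made for illustration; none of them come from the deck, so check your platform's actual page size.

```python
# Sketch: turn a workingSet estimate into bytes and compare it with RAM.
# PAGE_SIZE_BYTES is an assumption (4 KiB is common on Linux x86-64).
PAGE_SIZE_BYTES = 4096


def working_set_bytes(working_set, page_size=PAGE_SIZE_BYTES):
    """Approximate working set size in bytes from the pagesInMemory count."""
    return working_set["pagesInMemory"] * page_size


def ram_fraction_used(working_set, ram_bytes, page_size=PAGE_SIZE_BYTES):
    """Fraction of RAM the estimated working set would occupy."""
    return working_set_bytes(working_set, page_size) / ram_bytes


# Hypothetical sample shaped like the estimator output (numbers are made up).
sample = {
    "note": "thisIsAnEstimate",
    "pagesInMemory": 1_500_000,
    "computationTimeMicros": 12_000,
    "overSeconds": 900,
}

ram = 8 * 1024**3  # 8 GiB of RAM, also an assumption
fraction = ram_fraction_used(sample, ram)
print(f"working set ~ {working_set_bytes(sample) / 1024**3:.2f} GiB, "
      f"{fraction:.0%} of RAM")
```

Tracking this fraction over time, rather than reading it once, is what shows whether the working set is growing toward the RAM limit.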
27. Storage
• Different storage types deliver different IOPS
– Spinning disk
• 7,200 RPM SATA: 75-100 IOPS
– SSD
• 9,000-120,000 IOPS
– EBS
• 100 IOPS
– Provisioned EBS
• 2,000 IOPS
• Work out how much data you need to write per time frame.
• MongoDB writes to a journal and periodically flushes data files to disk.
• Replication adds oplog considerations
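A back-of-the-envelope sketch of the write-rate exercise, using the IOPS figures listed on this slide (the lower bound where a range is given). The one-I/O-per-operation simplification, the headroom factor, the workload number, and all names are assumptions; journal writes and background flushes behave differently in practice, so treat the result as a starting point for testing rather than a sizing answer.

```python
import math

# Rough IOPS per device, taken from the figures on this slide
# (lower bound used where the slide gives a range).
DEVICE_IOPS = {
    "spinning_sata": 75,
    "ssd": 9_000,
    "ebs": 100,
    "provisioned_ebs": 2_000,
}


def devices_needed(ops_per_second, device, headroom=0.5):
    """Devices required to serve ops_per_second while using only a
    `headroom` fraction of each device's rated IOPS.

    Assumes one I/O per operation, which is a deliberate simplification.
    """
    usable = DEVICE_IOPS[device] * headroom
    return math.ceil(ops_per_second / usable)


# Hypothetical workload: 1,200 random I/O operations per second.
for name in DEVICE_IOPS:
    print(f"{name:16s} -> {devices_needed(1_200, name)} device(s)")
```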
28. Using this information
• Plan hardware to hold the working set + indexes
• Allow room to grow
• If the working set is larger than RAM and you can’t reasonably add more resources, then shard
– Don’t shard too early
– Lots of little instances vs. a few big instances
• Think about architecture
– Local disk or central storage
– Don’t be surprised by x copies of the data across x nodes
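The points above can be sketched as two small checks: whether the working set plus indexes fits in RAM, and how much disk a replica set needs once every node holds its own copy. The growth factor, the sample sizes, and the function names are assumptions for illustration, and the sketch ignores journal and oplog space.

```python
GIB = 1024**3


def fits_in_ram(working_set_bytes, index_bytes, ram_bytes):
    """True if the working set plus indexes fit in the given RAM."""
    return working_set_bytes + index_bytes <= ram_bytes


def total_disk_bytes(data_bytes, index_bytes, nodes, growth_factor=1.5):
    """Disk footprint across a replica set.

    Every node stores a full copy, so the per-node size (with room to
    grow) is multiplied by the number of nodes.
    """
    per_node = int((data_bytes + index_bytes) * growth_factor)
    return per_node * nodes


# Hypothetical deployment: 200 GiB data, 20 GiB indexes, 3-node replica set.
total = total_disk_bytes(200 * GIB, 20 * GIB, nodes=3)
print(f"total disk across nodes: {total / GIB:.0f} GiB")

# Hypothetical 40 GiB working set on a 64 GiB machine.
print("fits in RAM:", fits_in_ram(40 * GIB, 20 * GIB, 64 * GIB))
```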
29. Development to production
• Don’t be surprised by:
– More data = more/larger indexes
– Indexes make your working set bigger
• Replication adds network overhead
• Journal has different access patterns
40. What is failure?
• We have failed at Capacity Planning when our resources don’t meet our requirements
• Because our requirements can have many dimensions, we may exceed our requirements in one characteristic but not meet them in another
• This means that we can spend many $$$ and still fail!
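A minimal sketch of this idea: a plan fails if any single dimension falls short, no matter how far the others exceed their targets. The dimension names and numbers are hypothetical.

```python
def unmet_requirements(required, provisioned):
    """Return the dimensions where the provisioned amount is below the need."""
    return [
        name
        for name, need in required.items()
        if provisioned.get(name, 0) < need
    ]


# Hypothetical plan: plenty of RAM was bought, but storage IOPS fall short.
required = {"ram_gib": 64, "iops": 5_000, "network_mbps": 1_000}
provisioned = {"ram_gib": 256, "iops": 2_000, "network_mbps": 1_000}

shortfalls = unmet_requirements(required, provisioned)
print("unmet:", shortfalls)
```

Here the RAM budget is four times the requirement, yet the plan still fails on IOPS.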
42. Starter Questions
• What is the working set?
– How does that equate to memory?
– How much disk access will that require?
• How efficient are the queries?
• What is the rate of data change?
• How big are the highs and lows?