Hardware Provisioning

Chad Tindel
Solution Architect, MongoDB
chad.tindel@mongodb.com
@ctindel
#MongoDBWorld

MongoDB is so easy for
programmers…

Even a baby can write an
application!

MongoDB is so easy to
manage with MMS…

Even a baby can manage a cluster!

Hardware Selection for
MongoDB is…

A Cautionary Tale

Requirements – Step One
• It is impossible to properly size a MongoDB
cluster without first documenting your business
requirements
• Availability: what is your uptime requirement?
• Throughput
• Responsiveness
– what is acceptable latency?
– is higher latency during peak times acceptable?
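An uptime requirement is easier to discuss once it is converted into a downtime budget. The following is a minimal Python sketch of that conversion; the function name and the sample percentages are illustrative assumptions, not figures from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_budget_hours(uptime_percent: float) -> float:
    """Hours of downtime per year permitted by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Sample targets are hypothetical; substitute your own requirement.
for target in (99.0, 99.9, 99.99):
    hours = downtime_budget_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime per year")
```

The budget has to cover planned maintenance as well as failures, which is one reason the availability answer influences how many replica set members you provision.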

Requirements – Step Two
• Understand the resources available to you
– Storage
– Memory
– Network
– CPU
• Many customers are limited to the options available in
AWS or offered by their own enterprise
virtualization team

Continuing Requirements – Step Three
• Once you deploy initially, it is common for requirements to
change
– More users added to the application
• Causes more queries and a larger working set
– New functionality changes query patterns
• Adding new indexes causes a larger working set
– What started as a read-intensive application can add more and more
write-heavy workloads
• More write-locking increases reader queue depth
• You must monitor and collect metrics, and update your
hardware selection as necessary (scale up / add RAM? add
more shards?)

Run a Proof of Concept
• Forces you to:
– Do schema / index design
– Understand query patterns
– Get a handle on Working Set size
• Start small on a single node
– See how much performance you can get from one box
• Add replication, then add sharding
– Understand how these affect performance in your use case
• POC can be done on a smaller scale to infer what will be
needed for production

POC – Requirements to Gather
• Data Sizes
– Total Number of Documents
– Average Document Size
– Size of Data on Disk
– Size of Indexes on Disk
– Expected growth
– What is your document model?
• Ingestion
– Insertions / Updates / Deletes per second, peak &
average
– Bulk inserts / updates? If so, how large and how often?

• Query Patterns and Performance Expectations
– Read Response SLA
– Write Response SLA
– Range queries or single document queries?
– Sort conditions
– Is more recent data queried more frequently?
• Data Policies
– How long will you keep the data for?
– Replication Requirements
– Backup Requirements / Time to Recovery

• Multi-datacenter Requirements
– Number and location of datacenters
– Cross DC latency
– Active / Active or Active / Passive?
– Geographical / Data locality requirements?
• Security Requirements
– Encryption over the wire (SSL)?
– Encryption of data at rest?
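The data-size items in this checklist combine into a rough storage projection. The sketch below is one way to do that in Python, assuming compound monthly growth and assuming indexes grow at the same rate as the data; the function name and every input value are hypothetical.

```python
def projected_storage_gb(num_docs, avg_doc_kb, index_gb, monthly_growth, months):
    """Estimate data + index size after `months` of compound growth.

    Assumes indexes grow at the same rate as the data (an assumption).
    Uses decimal units: 1 GB = 1,000,000 KB.
    """
    data_gb = num_docs * avg_doc_kb / 1_000_000
    return (data_gb + index_gb) * (1 + monthly_growth) ** months


# Hypothetical inputs: 50M documents of 2 KB, 20 GB of indexes, 5% growth per month.
today = projected_storage_gb(50_000_000, 2, 20, 0.05, 0)
in_a_year = projected_storage_gb(50_000_000, 2, 20, 0.05, 12)
print(f"today: {today:.0f} GB, in 12 months: {in_a_year:.0f} GB")
```

Numbers measured during the POC should replace these placeholders; the retention policy caps how far the projection needs to run.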

Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput

Storage Capability
Device                 Approximate capability
7,200 rpm SATA         ~75-100 IOPS
15,000 rpm SAS         ~175-210 IOPS
Amazon SSD EBS         ~4,000 PIOPS / volume, ~48,000 PIOPS / instance
Intel X25-E (SLC)      ~5,000 IOPS
Fusion IO              ~135,000 IOPS
Violin Memory 6000     ~1,000,000 IOPS
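These figures can be turned into a first-pass device count for a target IOPS rate. The Python sketch below uses the lower bound of each range from the table above; the 10,000 IOPS target is a hypothetical workload, and the calculation ignores RAID overhead and the read/write mix.

```python
import math

# Conservative (lower-bound) IOPS figures taken from the table above.
DEVICE_IOPS = {
    "7,200 rpm SATA": 75,
    "15,000 rpm SAS": 175,
    "Intel X25-E (SLC)": 5_000,
    "Fusion IO": 135_000,
}


def devices_needed(target_iops: int, device: str) -> int:
    """Smallest number of devices whose combined IOPS meets the target."""
    return math.ceil(target_iops / DEVICE_IOPS[device])


TARGET_IOPS = 10_000  # hypothetical requirement, not from the talk
for name in DEVICE_IOPS:
    print(f"{name}: {devices_needed(TARGET_IOPS, name)} device(s)")
```

A real workload should be measured rather than estimated this way; the sketch only shows how quickly spinning disks stop being practical as the IOPS target grows.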

Memory Measuring
• Working set estimation was added in MongoDB 2.4
– Pass the workingSet option to db.serverStatus()
> db.serverStatus( { workingSet: 1 } )

Network
• Latency
– WriteConcern
– ReadPreference
• Throughput
– Update/Write Patterns
– Reads/Queries
• Come to love netperf

CPU Usage
• Non-indexed Queries
• Sorting
• Aggregation
– Map/Reduce
– Aggregation Framework

Case Study #1: A Spanish Bank
• Problem statement: want to store 6 months' worth of
logs
• 18TB of total data (3 TB/month)
• Primarily analyzing the last month’s worth of logs, so
Working Set Size is 1 month’s worth of data (3TB)
plus indexes (1TB) = 4 TB Working Set
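The sizing arithmetic on this slide, written out as a short Python sketch. The constants are the figures stated above; the variable names are illustrative.

```python
MONTHS_RETAINED = 6   # retention requirement
TB_PER_MONTH = 3      # log volume per month
HOT_MONTHS = 1        # only the most recent month is analyzed heavily
INDEX_TB = 1          # index size stated on the slide

total_data_tb = MONTHS_RETAINED * TB_PER_MONTH          # data kept on disk
working_set_tb = HOT_MONTHS * TB_PER_MONTH + INDEX_TB   # data that should fit in RAM

print(f"total data: {total_data_tb} TB, working set: {working_set_tb} TB")
```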

Case Study #1: Hardware Selection
• QA Environment
– Did not want to mirror a full production cluster. Just
wanted to hold 2TB of data
– 3 nodes / shard * 4 shards = 12 physical machines
– 2 mongos
– 3 config servers (virtual machines)
• Production Environment
– 3 nodes / shard * 36 shards = 108 physical machines
– 128 GB RAM per node * 36 shards = 4.6 TB RAM (counting one node per shard)
– 2 mongos
– 3 config servers (virtual machines)
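The machine and RAM totals above follow from a few multiplications. The Python sketch below reproduces them, assuming RAM is counted once per shard because replica set members hold copies of the same data; the function name is hypothetical, and decimal units (1 TB = 1,000 GB) are used to match the slide.

```python
def cluster_totals(shards: int, nodes_per_shard: int, ram_gb_per_node: int):
    """Return (physical machines, aggregate RAM in TB counted once per shard)."""
    machines = shards * nodes_per_shard
    ram_tb = shards * ram_gb_per_node / 1000
    return machines, ram_tb


WORKING_SET_TB = 4  # from the problem statement

prod_machines, prod_ram_tb = cluster_totals(shards=36, nodes_per_shard=3, ram_gb_per_node=128)
qa_machines, _ = cluster_totals(shards=4, nodes_per_shard=3, ram_gb_per_node=128)

print(f"production: {prod_machines} machines, {prod_ram_tb:.1f} TB RAM")
print(f"QA: {qa_machines} machines")
print(f"working set fits in RAM: {prod_ram_tb >= WORKING_SET_TB}")
```

The per-node RAM passed for QA is only there to satisfy the function signature; the slide does not state the QA machines' memory, so that value is discarded.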

Case Study #2: A Large Online Retailer
• Problem statement: Moving their product catalog
from SQL Server to MongoDB as part of a larger
architectural overhaul to Open Source Software
• 2 main datacenters running active/active
• On Cyber Monday they peaked at 214 requests/sec,
so let’s budget for 400 requests/sec to give some
headroom

Case Study #2: The POC
• A POC yielded the following numbers:
– 4 million product SKUs, average JSON document size
30KB
• Need to service requests for:
– a specific product (by _id)
– Products in a specific category (e.g. “Desks” or “Hard
Drives”)
• Returns 72 documents (or 200 if the request comes from
a Googlebot crawl)

Case Study #2: The Math
• Want to partition (Shard) by category, and have
products that exist in multiple categories duplicated
– The average product appears in 2 categories, so we
actually need to store 8M SKU documents, not 4M
• 8M docs * 30KB/doc = 240GB of data
• 270 GB with indexes
• Working Set is 100% of all data + indexes, because this is
core functionality that must be fast at all times
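The same math as a short Python sketch. The inputs are the figures from the POC and this slide; the index size is derived from the difference between the two stated totals, and decimal units (1 GB = 1,000,000 KB) are assumed.

```python
SKUS = 4_000_000
AVG_CATEGORIES_PER_PRODUCT = 2   # each product is duplicated per category
AVG_DOC_KB = 30
TOTAL_WITH_INDEXES_GB = 270      # stated on the slide

stored_docs = SKUS * AVG_CATEGORIES_PER_PRODUCT
data_gb = stored_docs * AVG_DOC_KB / 1_000_000
implied_index_gb = TOTAL_WITH_INDEXES_GB - data_gb

# The working set is 100% of data + indexes for this use case.
working_set_gb = TOTAL_WITH_INDEXES_GB

print(f"{stored_docs:,} documents, {data_gb:.0f} GB data, "
      f"{implied_index_gb:.0f} GB indexes, {working_set_gb} GB working set")
```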

Case Study #2: Our Recommendation
• MongoDB's initial recommendation was to deploy a single
Replica Set with enough RAM in each server to hold all the
data (at least 384GB RAM/server)
• 4-node Replica Set plus an arbiter (2 nodes in each DC, 1 arbiter in a 3rd DC)
– Allows for a node in each DC to go down for maintenance or a system
crash while still servicing the application servers in that datacenter
• Deploy using secondary reads (NEAREST read preference)
• This avoids the complexity of sharding, setting up mongos,
config servers, worrying about orphaned documents, etc.
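Placing the arbiter in a third datacenter matters because a replica set can only elect a primary while a majority of its voting members can reach each other. The Python sketch below checks that property for the recommended layout and, for contrast, for the same four nodes without an arbiter; the function names and the dictionary layout are illustrative assumptions.

```python
def majority(voting_members: int) -> int:
    """Votes required to elect a primary."""
    return voting_members // 2 + 1


def can_elect_primary_without(layout: dict, failed_dc: str) -> bool:
    """True if the surviving datacenters still hold a majority of votes."""
    total = sum(layout.values())
    remaining = total - layout[failed_dc]
    return remaining >= majority(total)


# Voting members per datacenter; DC3 holds only the arbiter.
recommended = {"DC1": 2, "DC2": 2, "DC3": 1}
without_arbiter = {"DC1": 2, "DC2": 2}

for dc in recommended:
    print(f"recommended layout survives loss of {dc}: "
          f"{can_elect_primary_without(recommended, dc)}")
print(f"layout without arbiter survives loss of DC1: "
      f"{can_elect_primary_without(without_arbiter, 'DC1')}")
```

With five voting members the majority is three, so losing either main datacenter still leaves two data nodes plus the arbiter.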

[Diagram: a primary and three secondaries split across Datacenter 1 and Datacenter 2, with an arbiter in Datacenter 3]

Case Study #2: Actual Provisioning
• Customer decided to deploy on their corporate
VMware Cloud
• IT would not give them nodes any bigger than 64
GB RAM
• Decided to deploy 3 shards (4 nodes each + arbiter)
= 192 GB of RAM cluster-wide (64 GB * 3 shards) in a
staging environment, and to add a fourth shard if
staging shows it would be worthwhile
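A rough way to see why the fourth shard is on the table is to compare cluster RAM against the 270 GB working set from the earlier math. The Python sketch below does that, assuming RAM is counted once per shard and ignoring memory used by the operating system and by mongod itself; the function name is hypothetical.

```python
import math

WORKING_SET_GB = 270     # data + indexes, from the earlier math
RAM_PER_SHARD_GB = 64    # largest VM that IT would provide


def ram_coverage(shards: int) -> float:
    """Fraction of the working set that fits in RAM, capped at 1.0."""
    return min(1.0, shards * RAM_PER_SHARD_GB / WORKING_SET_GB)


shards_for_full_fit = math.ceil(WORKING_SET_GB / RAM_PER_SHARD_GB)

for n in (3, 4):
    print(f"{n} shards: {n * RAM_PER_SHARD_GB} GB RAM, "
          f"{ram_coverage(n):.0%} of the working set")
print(f"shards needed to hold the whole working set: {shards_for_full_fit}")
```

Under these assumptions neither three nor four shards hold the entire working set in memory, which is why measuring the staging environment under a realistic workload is the deciding step.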

Key Takeaways
• Document your performance requirements up front
• Conduct a Proof of Concept
• Always test with a real workload
• Constantly monitor and adjust based on changing
requirements

Thank You