17. Important performance concepts
• Throughput
• How many “transactions” per “second” were completed
• Latency
• How many “seconds” did each “transaction” take
• Which is important to your use-case? Both?
• Each should be measured in detail
• Overall average
• Interval average (every 10 seconds)
• Exit (last 10% of run)
• Percentiles (99%, 95%)
• Outliers (find a way to catch them)
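The measurements above (overall average, percentiles, outliers) can be sketched with nothing but the standard library. This is a minimal, hypothetical helper — the function name and dict keys are illustrative, not from the original slides:

```python
import statistics

def summarize(latencies_ms, run_seconds):
    """Summarize a benchmark run from per-transaction latencies.

    latencies_ms: list of per-transaction latencies in milliseconds
    run_seconds:  total wall-clock duration of the run
    """
    n = len(latencies_ms)
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile, clamped to the last sample
    pct = lambda p: ordered[min(n - 1, int(n * p / 100))]
    return {
        "throughput_tps": n / run_seconds,            # transactions per second
        "avg_latency_ms": statistics.mean(latencies_ms),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "max_ms": ordered[-1],                        # one way to catch outliers
    }
```

Interval averages follow the same pattern: keep a timestamp per transaction and run the same summary over each 10-second window.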
18. What is A/B benchmarking?
• Always have two “sides” for comparison
• Today vs. yesterday
• directIO vs. bufferedIO
• WiredTiger vs. RocksDB
• Snappy compression vs. zlib
• EC2 m3.large vs. m3.2xlarge
• Change 1 thing
• Compare to prior run
• Repeat
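The A/B loop above reduces to one number per run: how far did side B move relative to side A? A minimal sketch (function name and metric keys are hypothetical):

```python
def ab_compare(side_a, side_b, metric="throughput_tps"):
    """Percent change of one metric on side B relative to side A.

    side_a / side_b: dicts of measured metrics from the two runs.
    """
    a, b = side_a[metric], side_b[metric]
    return 100.0 * (b - a) / a

# Example: yesterday (A) vs. today (B)
delta = ab_compare({"throughput_tps": 5000}, {"throughput_tps": 5400})
# delta == 8.0, i.e. today is 8% faster
```

Change one thing, run both sides, record the delta, repeat.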
20. Step 1: Model your workload
• Three techniques
• Use your real data and real workload if possible
• You probably can’t share it with others

• Capture/replay tools
• Same downside as above
• Also might be hard to modify data or workload
• Create a synthetic representation
• i.e., a benchmark
• Open source and share it
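A synthetic representation means generating documents that mimic your schema and value distributions. A tiny sketch, assuming a made-up schema (the field names and ranges below are illustrative only — a real benchmark should mirror your own data):

```python
import random
import string

def synthetic_doc(rng):
    """One synthetic document loosely shaped like a production record."""
    return {
        "user_id": rng.randrange(1_000_000),
        "status": rng.choice(["new", "active", "closed"]),
        "payload": "".join(rng.choices(string.ascii_letters, k=64)),
    }

def workload(n, seed=42):
    # Fixed seed: anyone who runs the open-sourced benchmark
    # generates byte-identical data, so results are comparable.
    rng = random.Random(seed)
    return [synthetic_doc(rng) for _ in range(n)]
```

Because it contains no real data, this is exactly the kind of benchmark you can open source and share.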
21. Step 2: Run it often
• Every day, or at least weekly
• Look for measurable changes
• Throughput, latency, CPU, RSS, IO
• Compare to yesterday, last week, last month
• Automation is a must
• Tutorial at http://bit.ly/benchmarkmongodb
• Use for testing any upcoming changes
• OS, hardware, application version, MongoDB upgrade
• Measure and save everything
• Save the data forever
• You are only measuring too much when it impacts performance
• Start with mongostat, iostat, ps
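Automating the "compare to yesterday" step can be as simple as diffing two saved metric snapshots and flagging anything that moved too far. A hedged sketch — the threshold and metric names are placeholders:

```python
def find_regressions(baseline, today, threshold_pct=10.0):
    """Flag metrics that moved more than threshold_pct vs. the baseline.

    baseline / today: dicts like {"throughput_tps": ..., "p99_ms": ...}
    Returns {metric: percent_change} for every metric over the threshold.
    """
    flagged = {}
    for name, old in baseline.items():
        new = today.get(name)
        if new is None or old == 0:
            continue
        change = 100.0 * (new - old) / old
        if abs(change) > threshold_pct:
            flagged[name] = change
    return flagged
```

Feed it yesterday's and today's saved summaries from your daily run; since the raw data is kept forever, the same check works against last week or last month.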
22. Step 3: Share with others (if possible)
• Open source your benchmark
• Blog about your results
• File crashes or performance issues (bug hunt!)
• https://jira.mongodb.org
• Encourage storage engine competition
23. Is it fast ENOUGH?
• What if your application is performing fine?
• But you’d like to reduce your infrastructure
• MongoDB v3.0 allows mixed storage engines within replica sets
• Add a hidden replica set member with a new storage engine into your production environment
• Compare CPU, RSS, IO, disk space with other secondaries
• You won’t see how it will perform as primary
• Far different concurrency model
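Adding that hidden member is a replica-set reconfiguration done from the mongo shell. A config sketch — the hostname, `_id`, and engine choice are placeholders for your environment:

```javascript
// Run against the primary in the mongo shell.
cfg = rs.conf()
cfg.members.push({
  _id: 9,                                   // any unused member _id
  host: "newengine.example.net:27017",      // placeholder hostname
  priority: 0,                              // never eligible to become primary
  hidden: true,                             // invisible to application reads
  votes: 0                                  // leave election math unchanged
})
rs.reconfig(cfg)
// Start that member's mongod with the engine under test, e.g.:
//   mongod --storageEngine wiredTiger --replSet <your-set> ...
```

The hidden member receives the full production write stream, so its CPU, RSS, IO, and disk-space numbers are directly comparable to the other secondaries.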
25. Things to look forward to, part 1
• MMAPv1 journal performance
• Collection-level locking in v3.0 only moved the bottleneck
• Group commit algorithm?
• WiredTiger as default makes this unimportant
• Capped collections are hard
• How “large” is a transactional data store at a given point in time?
• They are natural in MMAPv1 (CLL), but nowhere else
• TokuMX solved this by partitioning the oplog
• But used time based partitioning (by hour or by day)
• Interesting solutions are surely coming
• TokuMX
• Currently based on MongoDB v2.4, needs v2.6 or v3.0
• Public feature roadmap?
26. Things to look forward to, part 2
• The oplog gates performance
• It’s a capped collection (see prior slide)
• It’s a serious point of contention (writers and readers)
• Replication bottleneck
• Write concurrency on primaries is far higher than on secondaries
• Multiple mongod processes per physical server is a workaround
• But adds significant operational complexity
• MySQL is constantly improving this, as will MongoDB
• TTL indexes are painful
• In a write-optimized storage engine, inserts are far less work than deletes
• Extremely busy systems might fall behind and never catch up
27. DO TRY THIS AT HOME!
Tim Callaghan
Acme Benchmarking
www.acmebenchmarking.com
@acmebench