17. Important performance concepts
• Throughput
• How many “transactions” per “second” were completed
• Latency
• How many “seconds” did each “transaction” take
• Which is important to your use-case? Both?
• Each should be measured in detail
• Overall average
• Interval average (every 10 seconds)
• Exit (last 10% of run)
• Percentiles (99%, 95%)
• Outliers (find a way to catch them)
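The measurements above (overall average, percentiles, outliers) can be sketched with nothing but the standard library. This is a minimal, hypothetical helper — the function name and dict keys are illustrative, not from the original slides:

```python
import statistics

def summarize(latencies_ms, run_seconds):
    """Summarize a benchmark run from per-transaction latencies.

    latencies_ms: list of per-transaction latencies in milliseconds
    run_seconds:  total wall-clock duration of the run
    """
    n = len(latencies_ms)
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile, clamped to the last sample
    pct = lambda p: ordered[min(n - 1, int(n * p / 100))]
    return {
        "throughput_tps": n / run_seconds,            # transactions per second
        "avg_latency_ms": statistics.mean(latencies_ms),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "max_ms": ordered[-1],                        # one way to catch outliers
    }
```

Interval averages follow the same pattern: keep a timestamp per transaction and run the same summary over each 10-second window.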
18. What is A/B benchmarking?
• Always have two “sides” for comparison
• Today vs. yesterday
• directIO vs. bufferedIO
• WiredTiger vs. RocksDB
• Snappy compression vs. zlib
• EC2 m3.large vs. m3.2xlarge
• Change 1 thing
• Compare to prior run
• Repeat
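The A/B loop above reduces to one number per run: how far did side B move relative to side A? A minimal sketch (function name and metric keys are hypothetical):

```python
def ab_compare(side_a, side_b, metric="throughput_tps"):
    """Percent change of one metric on side B relative to side A.

    side_a / side_b: dicts of measured metrics from the two runs.
    """
    a, b = side_a[metric], side_b[metric]
    return 100.0 * (b - a) / a

# Example: yesterday (A) vs. today (B)
delta = ab_compare({"throughput_tps": 5000}, {"throughput_tps": 5400})
# delta == 8.0, i.e. today is 8% faster
```

Change one thing, run both sides, record the delta, repeat.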
20. Step 1: Model your workload
• Three techniques
• Use your real data and real workload if possible
• You probably can’t share it with others

• Capture/replay tools
• Same downside as above
• Also might be hard to modify data or workload
• Create a synthetic representation
• i.e., a benchmark
• Open source and share it
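A synthetic representation means generating documents that mimic your schema and value distributions. A tiny sketch, assuming a made-up schema (the field names and ranges below are illustrative only — a real benchmark should mirror your own data):

```python
import random
import string

def synthetic_doc(rng):
    """One synthetic document loosely shaped like a production record."""
    return {
        "user_id": rng.randrange(1_000_000),
        "status": rng.choice(["new", "active", "closed"]),
        "payload": "".join(rng.choices(string.ascii_letters, k=64)),
    }

def workload(n, seed=42):
    # Fixed seed: anyone who runs the open-sourced benchmark
    # generates byte-identical data, so results are comparable.
    rng = random.Random(seed)
    return [synthetic_doc(rng) for _ in range(n)]
```

Because it contains no real data, this is exactly the kind of benchmark you can open source and share.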
21. Step 2: Run it often
• Every day, or at least weekly
• Look for measurable changes
• Throughput, latency, CPU, RSS, IO
• Compare to yesterday, last week, last month
• Automation is a must
• Tutorial at http://bit.ly/benchmarkmongodb
• Use for testing any upcoming changes
• OS, hardware, application version, MongoDB upgrade
• Measure and save everything
• Save the data forever
• You are only measuring too much when it impacts performance
• Start with mongostat, iostat, ps
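Automating the "compare to yesterday" step can be as simple as diffing two saved metric snapshots and flagging anything that moved too far. A hedged sketch — the threshold and metric names are placeholders:

```python
def find_regressions(baseline, today, threshold_pct=10.0):
    """Flag metrics that moved more than threshold_pct vs. the baseline.

    baseline / today: dicts like {"throughput_tps": ..., "p99_ms": ...}
    Returns {metric: percent_change} for every metric over the threshold.
    """
    flagged = {}
    for name, old in baseline.items():
        new = today.get(name)
        if new is None or old == 0:
            continue
        change = 100.0 * (new - old) / old
        if abs(change) > threshold_pct:
            flagged[name] = change
    return flagged
```

Feed it yesterday's and today's saved summaries from your daily run; since the raw data is kept forever, the same check works against last week or last month.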
22. Step 3: Share with others (if possible)
• Open source your benchmark
• Blog about your results
• File crashes or performance issues (bug hunt!)
• https://jira.mongodb.org
• Encourage storage engine competition
23. Is it fast ENOUGH?
• What if your application is performing fine?
• But you’d like to reduce your infrastructure
• MongoDB v3.0 allows mixed storage engines within replica sets
• Add a hidden replica set member with a new storage engine into your production environment
• Compare CPU, RSS, IO, disk space with other secondaries
• You won’t see how it will perform as primary
• Far different concurrency model
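Adding that hidden member is a replica-set reconfiguration done from the mongo shell. A config sketch — the hostname, `_id`, and engine choice are placeholders for your environment:

```javascript
// Run against the primary in the mongo shell.
cfg = rs.conf()
cfg.members.push({
  _id: 9,                                   // any unused member _id
  host: "newengine.example.net:27017",      // placeholder hostname
  priority: 0,                              // never eligible to become primary
  hidden: true,                             // invisible to application reads
  votes: 0                                  // leave election math unchanged
})
rs.reconfig(cfg)
// Start that member's mongod with the engine under test, e.g.:
//   mongod --storageEngine wiredTiger --replSet <your-set> ...
```

The hidden member receives the full production write stream, so its CPU, RSS, IO, and disk-space numbers are directly comparable to the other secondaries.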
25. Things to look forward to, part 1
• MMAPv1 journal performance
• Collection-level locking in v3.0 only moved the bottleneck
• Group commit algorithm?
• WiredTiger as default makes this unimportant
• Capped collections are hard
• How “large” is a transactional data store at a given point in time?
• They are natural in MMAPv1 (CLL), but nowhere else
• TokuMX solved this by partitioning the oplog
• But used time based partitioning (by hour or by day)
• Interesting solutions are surely coming
• TokuMX
• Currently based on MongoDB v2.4, needs v2.6 or v3.0
• Public feature roadmap?
26. Things to look forward to, part 2
• The oplog gates performance
• It’s a capped collection (see prior slide)
• It’s a serious point of contention (writers and readers)
• Replication bottleneck
• Write concurrency on primaries is far higher than on secondaries
• Multiple mongod processes per physical server is a workaround
• But adds significant operational complexity
• MySQL is constantly improving this, as will MongoDB
• TTL indexes are painful
• In a write-optimized storage engine, inserts are far less work than deletes
• Extremely busy systems might fall behind and never catch up
27. DO TRY THIS AT HOME!
Tim Callaghan
Acme Benchmarking
www.acmebenchmarking.com
@acmebench