Mike Kania
Production Engineer @ Parse
Benchmarking, Load
Testing, and Preventing
Terrible Disasters
What Parse Does
We have 500k+ apps running on Parse.
Provide services to —
•Store user data
•Run server side JavaScript
•Send push notifications
•Handle crash reporting
•Generate analytics
Parse + MongoDB
• Use much of MongoDB’s feature set
• Support almost every type of workload you can imagine
• Millions of collections and indexes
• New ones being created every minute
• Run MongoDB exclusively on AWS
• We do crazy things with MongoDB
Why Should You Listen
to Me?
• Parse has one of the most complex MongoDB infrastructures (in the world?)
• Started using MongoDB at 1.8
• Upgraded to 2.6 everywhere 6 months ago
• We have some battle wounds from upgrading MongoDB to pass on to you
Why Shouldn’t You
Listen to Me?
MongoDB is a jack of all trades, and there are certain features that we haven’t touched.
•Sharding — We built our own way
to shard data
•Aggregation/Map Reduce — We
don’t touch this at all
History of MongoDB
Upgrades at Parse
1.8  2.0  2.2  2.4  2.6  3.0
(1.8 through 2.6: “do it live”)
Cowboy Upgrade
1. Review the “Upgrade Requirements” and known bugs in JIRA
2. Run integration/unit tests against the new version
3. Spin up a hidden secondary. Watch for problems
4. Unhide the SECONDARY. Watch for problems
5. Promote to PRIMARY
6. Declare success! Oh wait, I mean watch for problems.
What Went Wrong
• 60% perf reduction
• geo index queries held the global lock until the first document was found
• unindexable writes were suddenly refused
• the definition of scan limits changed
A New Approach
1.8  2.0  2.2  2.4  2.6  3.0
(1.8 through 2.6: “do it live”; for 3.0: do it with production workloads in a test environment)
Flashback
• Open sourced benchmarking
tool specifically for MongoDB
• Captures production
workloads
• Replay those workloads
over and over again with
configurable speeds
• Recently merged a pull request to support load testing with MongoDB sharding
Record
Get the config set up:
•oplog_server: A secondary that will be used to
tail the oplog for write operations
•profiler_server: The primary in the target replica
set to capture profiling data
•duration_sec: Defines how long you want to
record
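A minimal sketch of what those three settings might look like as a config fragment; the hostnames are placeholders and the exact file layout is an assumption (check the flashback README for the real format):

```python
# Hypothetical flashback record config; flashback's actual format may differ.
oplog_server = {"host": "secondary-1.example.com", "port": 27017}   # tails the oplog for writes
profiler_server = {"host": "primary-1.example.com", "port": 27017}  # reads profiling data
duration_sec = 3600  # record one hour of production traffic
```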
Enable Profiling
• Keep in mind, it does an additional write for every
operation.
•./set_mongo_profiling.py -a enable -n
$PRIMARY_HOSTNAME
Moar Better Recording
• What about just capturing it over the wire?
• Maybe use mongosniff
• MongoDB has a built-in pcap library
• Enter mongocaputils
• Also open source
• Still a little buggy
Running the Record
./record.py
Creating a Consistent
Snapshot
Need a way to quickly capture a consistent
snapshot of your dataset
We use EBS snapshots:
• lock mongod
• create an EBS snapshot of all the RAIDed volumes on /var/lib/mongodb
• unlock mongod
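The lock/snapshot/unlock cycle can be sketched as a shell function. This is a dry-run sketch that only echoes the commands; the volume IDs are placeholders, and you would replace each `echo` with a real invocation:

```shell
#!/bin/sh
# Dry-run sketch of the consistent-snapshot cycle; replace `echo` with
# real invocations. Volume IDs are placeholders.
snapshot_mongo() {
  # Flush dirty pages and block writes so the snapshot is consistent
  echo "mongo --eval 'db.fsyncLock()'"
  # Snapshot every EBS volume in the RAID under /var/lib/mongodb
  for vol in vol-raid0 vol-raid1 vol-raid2 vol-raid3; do
    echo "aws ec2 create-snapshot --volume-id $vol --description mongodb-raid"
  done
  # Writes resume once every member volume has a snapshot started
  echo "mongo --eval 'db.fsyncUnlock()'"
}

snapshot_mongo
```

The lock must cover all RAID members; snapshotting only some of them would give a torn stripe.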
Quickly Replaying
Workloads
• Pre-warming EBS snapshots after each run is slow
• Pulling the blocks down from S3 takes hours or days if you have terabytes of data
•We decided to use LVM on top of EBS
•Does incur I/O overhead
•Allows us to do LVM snapshots!
How we used LVM
Define a restore point before benchmarking:
• lvcreate -l 10%VG -s -n restore_point /dev/mongovg/mongoraid
Merge the copy-on-write logical volume to roll back:
• Stop MongoDB
• Unmount the filesystem
• lvconvert --merge /dev/mongovg/restore_point
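Put together, one benchmark iteration looks roughly like this. It is a dry-run sketch that echoes the commands; the device paths follow the slide, while the mongod stop and umount invocations are assumptions about the host setup:

```shell
#!/bin/sh
# Dry-run sketch of one benchmark iteration with LVM snapshots.
# Device paths are from the talk; service/mount commands are assumptions.
benchmark_iteration() {
  # 1. Take a cheap copy-on-write restore point (10% of the VG reserved for CoW)
  echo "lvcreate -l 10%VG -s -n restore_point /dev/mongovg/mongoraid"
  # 2. Run the replay against this dataset
  echo "flashback -ops_filename=OUTPUT -style=real -url=\$MONGO_HOST:27017 -workers=50"
  # 3. Roll back: stop mongod, unmount, merge the CoW volume into the origin
  echo "service mongod stop"
  echo "umount /var/lib/mongodb"
  echo "lvconvert --merge /dev/mongovg/restore_point"
}

benchmark_iteration
```

After the merge completes, the filesystem is back at the restore point and the next iteration can start from identical data.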
Creating the Test
Environment
• Spin up a new EC2 instance and restore the EBS volumes from snapshot
• New EBS volumes need to be pre-warmed; blocks are lazily loaded from S3
• A benchmark server runs Flashback and has the recorded workload on disk
• Nothing special needs to happen here
Benchmarking New
Shiny Storage Engines
In MongoDB 3.0, each storage engine has a different on-disk format. So we also need to run an initial sync of each new storage engine against our restored MMAPv1 backup, and then run benchmarks on each format.
[Diagram: MMAPv1 (restored from snapshot) → initial sync → WiredTiger; MMAPv1 → initial sync → RocksDB]
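A sketch of how each engine node might be stood up so it initial-syncs from the restored MMAPv1 member. This is a dry-run that echoes the commands; the replica set name and dbpaths are placeholders (`--storageEngine` is the real 3.0 flag, and `rocksdb` is only available in the mongodb-partners build):

```shell
#!/bin/sh
# Dry-run sketch: start one empty mongod per storage engine and let each
# initial-sync from the restored MMAPv1 replica set member.
# Replica set name and dbpaths are placeholders.
start_engine_node() {
  engine="$1"
  dbpath="/var/lib/mongodb-$engine"
  echo "mongod --replSet bench --storageEngine $engine --dbpath $dbpath"
}

start_engine_node wiredTiger
start_engine_node rocksdb
```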
Side Note: The Storage
Efficiency of the RocksDB/
WiredTiger is Amazing*
*You should totally check out the “Storage Engine Wars” talk
by Charity Majors and Igor Canadi
[Bar chart: on-disk storage size — MMAPv1 at 3,245GB vs. roughly 283–318GB for WiredTiger and RocksDB]
Running the Replay
• Two styles to replay: real and stress

flashback \
  -ops_filename=OUTPUT \
  -style=real \
  -url=$MONGO_HOST:27017 \
  -workers=50
[Diagram: Flashback replaying against MongoDB 2.6 MMAPv1, MongoDB 3.0 MMAPv1, and MongoDB 3.0 RocksDB]
Metrics Gathering
• Flashback reports percentile latencies broken down by operation type.
• Useful from a high level
• Not so useful when diving into query regressions
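Those per-operation percentiles can be computed with a simple nearest-rank method; a minimal sketch on toy data (Flashback's exact percentile method may differ):

```python
import math
from collections import defaultdict

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    xs = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(xs)))
    return xs[k - 1]

# Toy latencies (ms) bucketed by op type, standing in for a replay's results
latencies = defaultdict(list)
latencies["query"] = [2.8, 2.9, 3.0, 3.0, 3.1, 3.1, 3.2, 4.4, 180.0, 600.0]

for op, samples in latencies.items():
    print(op, "p50:", percentile(samples, 50), "p99:", percentile(samples, 99))
```

This also shows why p99 is the interesting number here: the median barely moves while a couple of slow outliers dominate the tail.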
Logging Pipeline
• Mongo logs are hard to parse.
• Thankfully you don’t need to worry about it
• Just use our open source PEG parser, mongologtools
• Ship JSON via Scribe to an internal Facebook data diving tool
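To show the shape of the problem, here is a toy parser for one classic 2.6-style query log line using a regex. This is only an illustration of what gets extracted; the real mongologtools PEG parser handles many more log shapes and versions, and the sample line and field names here are made up:

```python
import re

# A made-up mongod 2.6-style query log line for illustration.
LINE = ("2015-06-01T12:00:00.000+0000 [conn123] query appdata.users "
        "query: { _id: 1 } ntoreturn:1 nscanned:1 nreturned:1 104ms")

# Toy pattern: connection context, op type, namespace, docs scanned, duration.
pattern = re.compile(
    r"\[(?P<ctx>\w+)\] (?P<op>\w+) (?P<ns>\S+) .*?"
    r"nscanned:(?P<nscanned>\d+) .*?(?P<duration>\d+)ms$"
)

def parse_line(line):
    """Return a dict of extracted fields, or None if the line doesn't match."""
    m = pattern.search(line)
    return m.groupdict() if m else None

print(parse_line(LINE))
```

Once every line is a dict like this, shipping it as JSON to a downstream analysis tool is trivial.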
First Results

p50 Query Latency

Op    | 2.6 MMAPv1 | 3.0 MMAPv1 | 3.0 RocksDB
query | 2.93ms     | 4.43ms     | 3.04ms

p99 Query Latency

Op    | 2.6 MMAPv1 | 3.0 MMAPv1  | 3.0 RocksDB
query | 177.41ms   | 619471.47ms | 1441442.26ms
First Regression
• Regression in $nearSphere queries, just in 3.0
• SERVER-17469, patched in 3.0.2
• After the fix, average latency for $nearSphere went from 2354 ms to 35 ms
More Ad-Hoc Analysis
[Scatter plots: query duration (ms) vs. # documents scanned, for MMAPv1 and RocksDB]
P99 Latency
[Bar chart: p99 latency (0ms–40ms) per operation type (query, insert, remove, update, findandmodify, count) for 2.6 MMAPv1, 3.0 MMAPv1, and 3.0 RocksDB]
Some time later…
Benchmarks Won’t
Find Everything
• [RocksDB] Prefix collision could happen between restarts
  https://github.com/mongodb-partners/mongo/commit/da8a90b3b71bf291684ffc5a6d2fd32118ce1a7b
• [MongoDB] Secondary reads block replication
  https://jira.mongodb.org/browse/SERVER-18190
Where are we now with
testing 3.0?
• MongoDB 3.0 with RocksDB is serving some
production traffic and it looks amazing.
[Chart: API request latency in milliseconds]
Linkage
• Flashback
• https://github.com/ParsePlatform/flashback
• Mongologtools
• https://github.com/tmc/mongologtools
• MongoDB 3.0 Benchmarking Results
• http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/
• nearSphere regression
• https://jira.mongodb.org/browse/SERVER-17469
• WT/RocksDB secondary crash
• https://jira.mongodb.org/browse/SERVER-17882
