MongoDB Miami Meetup 1/26/15: Introduction to WiredTiger
1. Introduction to WiredTiger and the
Storage Engine API
Valeri Karpov
NodeJS Engineer, MongoDB
www.thecodebarbarian.com
www.slideshare.net/vkarpov15
github.com/vkarpov15
@code_barbarian
2. *
A Bit About Me
•NodeJS Engineer at MongoDB
•Maintainer of mongoose ODM
•Recently working on rewriting mongodump, etc.
3. *
Talk Overview
•What is the Storage Engine API?
•What is WT and why you should care
•Basic WT internals and gotchas
•MMS Automation and WT
•Some very basic performance numbers
4. *
Introducing Storage Engines
•How MongoDB persists data
•<= MongoDB 2.6: “mmapv1” storage engine
•MongoDB 3.0 has a few options
• mmapv1
• in_memory
• wiredTiger
• devnull (now /dev/null does support sharding!)
•Internal hack: Twitter storage engine
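Picking one of these engines is a startup option on mongod. A minimal sketch (the dbpath is a placeholder, and the data files must have been created by the same engine):

```sh
# Start mongod with the WiredTiger storage engine (MongoDB 3.0+)
mongod --storageEngine wiredTiger --dbpath /data/wt
```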
5. *
Why Storage Engine API?
•Different performance characteristics
•mmapv1 doesn’t handle certain workloads well
•Consistent API on top of storage layer
• Can mix storage engines in a replset or sharded cluster!
8. *
What is WiredTiger?
•Storage engine company founded by BerkeleyDB alums
•Recently acquired by MongoDB
•Available as a storage engine option in MongoDB 3.0
9. *
Why is WiredTiger Awesome?
•Document-level locking
•Compression on disk
•Consistency without journaling
•Better performance on certain workloads
10. *
Document-level Locking
•The often-criticized global write lock was removed in 2.2
•Database-level locking
•3.0 with mmapv1 has collection-level locking
•3.0 with WT only locks at the document level
•Writes no longer block all other writes
•Better CPU usage: more cores ~= more writes
11. *
Compression
•WT uses snappy compression by default
•Data is compressed on disk
•2 supported compression algorithms:
• snappy: default. Good compression, relatively low overhead
• zlib: Better compression, but at cost of more overhead
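The block compressor set at startup applies server-wide, but 3.0 also lets you override it per collection when the collection is created. A sketch in the mongo shell (the collection name is made up):

```javascript
// Create a collection whose data files use zlib instead of the
// server-wide default (snappy). Run inside the mongo shell.
db.createCollection("events", {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=zlib" }
  }
})
```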
12. *
Consistency without Journaling
•mmapv1 uses a write-ahead log to guarantee consistency as well as durability
•WT doesn’t have this problem: no in-place updates
•Potentially good for insert-heavy workloads
•Rely on replication for durability
•More on this in the next section
15. *
Upgrading to WT
•Can’t copy database files
•Can’t just restart with same dbpath
•Other methods for upgrading still work:
• Initial sync from replica set
• mongodump/mongorestore
•Can still do rolling upgrade of replica set to WT:
• Shut down secondary, delete dbpath, bring it back up with --storageEngine wiredTiger
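One node at a time, that rolling upgrade looks roughly like this (paths and the replica set name are placeholders, not a script to run verbatim):

```sh
# On the secondary being upgraded:
mongod --dbpath /data/db --shutdown      # stop the node cleanly
rm -rf /data/db/*                        # wipe the old mmapv1 files
mongod --dbpath /data/db --replSet rs0 \
       --storageEngine wiredTiger        # restart; node initial-syncs from the set
```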
17. *
Other Configuration Options
•directoryperdb: doesn’t exist in WT
•Databases are a higher level abstraction with WT
•Following options also have no WT equivalent
• noprealloc
• syncdelay
• smallfiles
• journalCommitInterval
18. *
Configuration with YAML Files
•MongoDB 2.6 introduced YAML config files
•The storage.wiredTiger field lets you tweak WT options
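A sketch of what that section of the YAML config file can look like (values are illustrative, not recommendations):

```yaml
storage:
  dbPath: /data/db
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4            # size of WT's internal cache
    collectionConfig:
      blockCompressor: snappy   # snappy | zlib | none
```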
19. *
WiredTiger Journaling
•Journaling in WT is a little different
•Write-ahead log committed to disk at checkpoints
•By default checkpoint every 60 seconds or 2GB written
•Data files are always consistent - running without journaling means you lose data since the last checkpoint
•No journal commit interval: writes are written to the journal as they come in
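For workloads that rely on replication rather than the journal for durability, journaling can be switched off in the same YAML config file (accepting that a crash loses anything since the last checkpoint):

```yaml
storage:
  engine: wiredTiger
  journal:
    enabled: false   # data files stay consistent; crash loses post-checkpoint writes
```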
21. *
Gotcha: No 32-bit Support
•WT storage engine will not work on 32-bit platforms at all
22. *
Using MMS Automation with WT
•MMS Automation allows you to manage and deploy MongoDB installations
•Demo of upgrading a standalone to WT
23. *
Some Basic Performance Numbers
•My desired use case: MongoDB for analytics data
•Write-heavy workloads aren’t mmapv1’s strong suit
•Don’t care about durability as much, but do care about high throughput
•Compression is a plus
•How does WT w/o journaling do on insert-only?
•Simple N=1 experiment
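The experiment was along these lines - a hedged sketch of an insert-only loop in the mongo shell (collection name and document shape are made up, not the actual benchmark):

```javascript
// Insert-only micro-benchmark, run in the mongo shell against a
// mongod started with --storageEngine wiredTiger --nojournal
var start = new Date();
for (var i = 0; i < 100000; ++i) {
  db.bench.insert({ _id: i, ts: new Date(), payload: new Array(101).join("x") });
}
print("elapsed ms: " + (new Date() - start));
```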
26. *
Some Basic Performance Numbers
•How does the compression work in this example?
•After the bench run, WT’s /data/db sums to ~23MB
•With --noprealloc, --nopreallocj, --smallfiles, mmapv1 has ~100MB
•Not a really fair comparison, since the data set is very small