Introduction to new high-performance storage engines in MongoDB 3.0 (announced as 2.8)
Agenda:
- MongoDB and NoSQL
- Storage Engine API
- WiredTiger configuration + performance
MongoDB is a Document Database
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5 km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
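To make the query and aggregation bullets concrete, here is a minimal in-memory sketch in plain JavaScript (Node.js), not a program that talks to MongoDB. It filters documents shaped like Paul's the way the shell query `db.people.find({city: 'London', cars: {$elemMatch: {year: {$gte: 1970, $lte: 1980}}}})` would, and computes the average value of Paul's cars the way an `$match` / `$unwind` / `$group` pipeline with `$avg` would. The collection name `people`, the second sample person and the helper names are hypothetical.

```javascript
// Sample documents shaped like the example above; the second person is hypothetical.
const people = [
  {
    first_name: "Paul",
    surname: "Miller",
    city: "London",
    cars: [
      { model: "Bentley", year: 1973, value: 100000 },
      { model: "Rolls Royce", year: 1965, value: 330000 },
    ],
  },
  {
    first_name: "Anna", // hypothetical sample document
    surname: "Smith",
    city: "Paris",
    cars: [{ model: "Citroen", year: 1975, value: 20000 }],
  },
];

// In-memory equivalent of:
// db.people.find({ city: "London",
//                  cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } } })
function londonersWithCarFrom(docs, fromYear, toYear) {
  return docs.filter(
    (d) =>
      d.city === "London" &&
      // one single car must satisfy both bounds, as with $elemMatch
      d.cars.some((c) => c.year >= fromYear && c.year <= toYear)
  );
}

// In-memory equivalent of:
// db.people.aggregate([
//   { $match: { first_name: "Paul" } },
//   { $unwind: "$cars" },
//   { $group: { _id: null, avgValue: { $avg: "$cars.value" } } } ])
function averageCarValue(docs, firstName) {
  const values = docs
    .filter((d) => d.first_name === firstName)
    .flatMap((d) => d.cars.map((c) => c.value));
  if (values.length === 0) return null; // nothing matched: no average
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(londonersWithCarFrom(people, 1970, 1980).map((d) => d.surname));
console.log(averageCarValue(people, "Paul"));
```

The `$elemMatch` form matters for the "car built between 1970 and 1980" query: a plain `{'cars.year': {$gte: 1970, $lte: 1980}}` filter can be satisfied by two different cars, one per bound.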
Current state in MongoDB 2.6
Read-heavy apps
• Great performance
– B-tree
– Low overhead
• Good scale-out performance
– Secondary reads
– Sharding
Write-heavy apps
• Good scale-out performance
– Sharding
• Per-node efficiency wish-list:
– Document-level locking
– Write-optimized data structures (LSM)
– Compression
Other
• Complex transactions
• In-memory engine
• SSD-optimized engine
• etc...
How to get all of the above?
MongoDB 3.0 Storage Engine API
• MMAP: read-heavy apps
• WiredTiger: write-heavy apps
• 3rd party: special apps
• One engine at a time:
– Many engines built into mongod
– Choose one at startup
– All data stored by the same engine
– Incompatible on-disk data formats (obviously)
– Compatible client API
• Compatible Oplog & Replication
– Same replica set can mix different engines
– No-downtime migration possible
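The "one engine at a time, same client API, same oplog" model can be sketched as a small interface with interchangeable implementations. This is a hypothetical JavaScript illustration, not MongoDB's storage API (the real C++ code lives under src/mongo/db/storage): the class names, the `insert`/`findById` methods and the oplog entry shape are all assumptions. In an actual 3.0 deployment the engine is picked when mongod starts, for example `mongod --storageEngine wiredTiger`.

```javascript
// Hypothetical engine interface: every engine exposes the same two methods,
// so the layer above never needs to know which engine is running.
class InMemoryEngine {
  constructor() { this.records = new Map(); }
  insert(id, doc) { this.records.set(id, { ...doc }); }
  findById(id) { return this.records.get(id) ?? null; }
}

// A different "on-disk format": documents kept as serialized strings.
class SerializedEngine {
  constructor() { this.records = new Map(); }
  insert(id, doc) { this.records.set(id, JSON.stringify(doc)); }
  findById(id) {
    const raw = this.records.get(id);
    return raw === undefined ? null : JSON.parse(raw);
  }
}

// Hypothetical "devnull"-style engine: accepts writes, keeps nothing.
class DevNullEngine {
  insert(_id, _doc) {}
  findById(_id) { return null; }
}

const ENGINES = { inMemory: InMemoryEngine, serialized: SerializedEngine, devnull: DevNullEngine };

// Chosen once at startup; all data on this node goes to that single engine.
function startNode(engineName) {
  const Engine = ENGINES[engineName];
  if (!Engine) throw new Error(`unknown storage engine: ${engineName}`);
  return { engine: new Engine(), oplog: [] };
}

// Primary: write through its engine and record a logical, engine-neutral oplog entry.
function primaryInsert(node, id, doc) {
  node.engine.insert(id, doc);
  node.oplog.push({ op: "i", id, doc });
}

// Secondary: replay the logical entries into whatever engine it runs.
// The oplog itself is kept as normal, even by the devnull engine.
function applyOplog(node, entries) {
  for (const e of entries) {
    if (e.op === "i") node.engine.insert(e.id, e.doc);
    node.oplog.push(e);
  }
}

const primary = startNode("inMemory");
const secondary = startNode("serialized"); // same replica set, different engine
primaryInsert(primary, 1, { first_name: "Paul", city: "London" });
applyOplog(secondary, primary.oplog);
console.log(secondary.engine.findById(1));
```

Because replication ships logical operations rather than data files, members with incompatible on-disk formats can still follow the same primary, which is what makes mixed-engine replica sets and no-downtime migration possible.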
Some existing engines
• MMAPv1
– Improved MMAP (collection-level locking)
• WiredTiger
– Discussed next
• RocksDB
– LSM-style engine developed by Facebook
– Based on LevelDB
• TokuMXse
– Fractal Tree indexing engine from Tokutek
Some rumored engines
• Heap
– In-memory engine
• Devnull
– Writes all data to /dev/null
– Based on an idea from a famous Flash animation...
– Oplog stored as normal
• SSD-optimized engine (e.g. Fusion-IO)
• KV: simple key-value engine
https://github.com/mongodb/mongo/tree/master/src/mongo/db/storage
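Covering 90% of your optimization needs
WiredTiger SE layers, top to bottom:
• Data structure: Btree | LSM | Columnar
• WiredTiger cache (default: 50% of RAM)
• Compression: None | Snappy | Zlib
• OS disk cache (default: the other 50%)
• Physical disk
Cost of going down a layer: decompression time (read served from the OS disk cache), disk seek time (read served from the physical disk)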
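The two costs on this slide can be turned into a back-of-the-envelope model. A read that hits the WiredTiger cache pays neither cost, a read served from the OS disk cache pays decompression, and a read that goes to the physical disk pays a seek plus decompression. The sketch assumes that the WiredTiger cache holds uncompressed pages while the OS disk cache holds the compressed file blocks; the function name and every number in the usage lines are hypothetical placeholders, not benchmarks.

```javascript
// Expected cost of one read, given the fraction of reads served by each layer.
// Assumption: WiredTiger cache = uncompressed pages (no extra cost),
// OS disk cache = compressed blocks (decompression), disk = seek + decompression.
function expectedReadCostUs({ wtHit, osHit, decompressUs, seekUs }) {
  if (wtHit < 0 || osHit < 0 || wtHit + osHit > 1) {
    throw new Error("hit fractions must be non-negative and sum to at most 1");
  }
  const diskHit = 1 - wtHit - osHit; // whatever is left goes to the physical disk
  return osHit * decompressUs + diskHit * (seekUs + decompressUs);
}

// Placeholder numbers only: two hypothetical hit distributions.
const bigCache = expectedReadCostUs({ wtHit: 0.9, osHit: 0.05, decompressUs: 20, seekUs: 5000 });
const smallCache = expectedReadCostUs({ wtHit: 0.5, osHit: 0.5, decompressUs: 20, seekUs: 5000 });
console.log(bigCache.toFixed(1), smallCache.toFixed(1));
```

The model makes the slide's point explicit: a seek typically dwarfs decompression, so keeping the working set off the physical disk matters more than which of the two caches holds it. Plug in your own measured hit ratios and latencies before drawing conclusions.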
Strategy 1: fit the working set in the WiredTiger cache
• WiredTiger cache: cache_size = 80%
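In MongoDB 3.0 the WiredTiger cache is sized in gigabytes rather than as a percentage, through the `storage.wiredTiger.engineConfig.cacheSizeGB` setting (command line `--wiredTigerCacheSizeGB`). This sketch converts a target fraction of RAM, such as the 80% of strategy 1, into that setting and returns the matching mongod configuration as a plain object mirroring the YAML keys. Rounding down to whole gigabytes with a 1 GB floor is a conservative assumption of this sketch, and the 64 GB machine is hypothetical.

```javascript
// Convert "give WiredTiger X% of RAM" into the GB-based setting mongod expects.
// Assumption: round down to whole GB and never go below 1 GB.
function cacheSizeGBFor(totalRamGB, fraction) {
  if (!(fraction > 0 && fraction <= 1)) throw new Error("fraction must be in (0, 1]");
  return Math.max(1, Math.floor(totalRamGB * fraction));
}

// Mirrors the YAML keys storage.engine and
// storage.wiredTiger.engineConfig.cacheSizeGB of a mongod config file.
function wiredTigerCacheConfig(totalRamGB, fraction) {
  return {
    storage: {
      engine: "wiredTiger",
      wiredTiger: { engineConfig: { cacheSizeGB: cacheSizeGBFor(totalRamGB, fraction) } },
    },
  };
}

// Strategy 1 on a hypothetical 64 GB machine: 80% of RAM for the WiredTiger cache.
console.log(JSON.stringify(wiredTigerCacheConfig(64, 0.8)));
```

Calling the same helper with a fraction of 0.1 gives the cache setting for strategy 2; everything not given to the WiredTiger cache is left to the OS disk cache and other processes.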
Strategy 2: fit the working set in the OS disk cache
• WiredTiger cache: cache_size = 10%
• OS disk cache: the remaining 90%
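Strategy 2 works because the OS disk cache holds WiredTiger's compressed file blocks, so a working set that would not fit uncompressed in RAM may still fit there once compressed. The check below makes that arithmetic explicit. The compression ratio has to be measured on your own data (the 3:1 in the example is made up), the function name is hypothetical, and the sketch ignores memory used by mongod itself and by other processes.

```javascript
// Which layer can hold the whole working set under a given cache split?
// Assumptions: the WiredTiger cache holds uncompressed data, the OS disk cache
// gets all remaining RAM and holds compressed blocks, nothing else uses memory.
function placeWorkingSet({ ramGB, cacheFraction, workingSetGB, compressionRatio }) {
  const wtCacheGB = ramGB * cacheFraction;
  const osCacheGB = ramGB - wtCacheGB;
  if (workingSetGB <= wtCacheGB) return "wiredtiger-cache";
  if (workingSetGB / compressionRatio <= osCacheGB) return "os-disk-cache";
  return "disk"; // neither cache can hold all of it: some reads hit the disk
}

// Hypothetical numbers: 64 GB RAM, 100 GB working set, 3:1 compression.
const ws = { ramGB: 64, workingSetGB: 100, compressionRatio: 3 };
console.log(placeWorkingSet({ ...ws, cacheFraction: 0.8 })); // strategy 1 split
console.log(placeWorkingSet({ ...ws, cacheFraction: 0.1 })); // strategy 2 split
```

With these placeholder numbers the 80% split leaves only about 12.8 GB for compressed blocks, while the 10% split leaves about 57.6 GB, enough for the roughly 33 GB compressed working set.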
Strategy 3: SSD disk + compression to save €
• Physical disk: SSD
• Compression: Snappy or Zlib
Strategy 4: SSD disk (no compression)
• Physical disk: SSD
• Compression: None
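Strategies 3 and 4 differ only in the block compressor. In MongoDB 3.0 that is the `storage.wiredTiger.collectionConfig.blockCompressor` setting, which accepts `none`, `snappy` or `zlib` (command line `--wiredTigerCollectionBlockCompressor`), with Snappy as the default. The sketch builds either configuration and estimates the SSD space the collection data would need; the compression ratios and the 600 GB data size are hypothetical placeholders, since real ratios depend entirely on the data.

```javascript
// Hypothetical compression ratios, for illustration only; measure your own data.
const ASSUMED_RATIO = { none: 1, snappy: 2, zlib: 3 };

// Build the mongod storage settings for a strategy and estimate the SSD space
// the collection data would need with that compressor (indexes and journal ignored).
function ssdStrategy(compressor, dataGB) {
  if (!Object.prototype.hasOwnProperty.call(ASSUMED_RATIO, compressor)) {
    throw new Error(`unsupported compressor: ${compressor}`);
  }
  return {
    config: {
      storage: {
        engine: "wiredTiger",
        wiredTiger: { collectionConfig: { blockCompressor: compressor } },
      },
    },
    estimatedSsdGB: dataGB / ASSUMED_RATIO[compressor],
  };
}

// Strategy 3 (compress to save €) vs strategy 4 (no compression), 600 GB of data.
for (const c of ["zlib", "snappy", "none"]) {
  console.log(c, ssdStrategy(c, 600).estimatedSsdGB);
}
```

The trade-off is the one from the WiredTiger layer diagram: a higher ratio means less SSD to buy, but every read that misses the WiredTiger cache pays more decompression time.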
What problem is solved by LSM indexes?
Performance:
• Fast reads: easy (add indexes)
• Fast writes: easy (no indexes)
• Both: hard
– Smart schema design (hire a consultant)
– LSM index structures (or columnar)
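LSM structures help with "both" because writes never update data in place. They go to a small in-memory buffer that is periodically flushed as an immutable sorted run, and reads check the buffer first and then the runs from newest to oldest. The toy below shows only that core idea in the textbook form; it is not WiredTiger's LSM implementation and leaves out logging, bloom filters, on-disk files and background merging. The class name `ToyLsm` and the `memtableLimit` parameter are hypothetical.

```javascript
// Order [key, value] entries by key (keys assumed to be all strings or all numbers).
const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

// Binary search in one sorted run; returns undefined when the key is absent.
function binarySearch(run, key) {
  let lo = 0, hi = run.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const k = run[mid][0];
    if (k === key) return run[mid][1];
    if (k < key) lo = mid + 1; else hi = mid - 1;
  }
  return undefined;
}

// Toy log-structured merge index: buffered writes, immutable sorted runs.
class ToyLsm {
  constructor(memtableLimit = 4) {
    this.memtableLimit = memtableLimit; // hypothetical flush threshold (entries)
    this.memtable = new Map();          // newest writes, mutable
    this.runs = [];                     // immutable sorted runs, newest first
  }

  put(key, value) {
    this.memtable.set(key, value);      // cheap: older data is never rewritten here
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  flush() {
    if (this.memtable.size === 0) return;
    this.runs.unshift([...this.memtable.entries()].sort(byKey)); // one sorted run
    this.memtable = new Map();
  }

  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const run of this.runs) {      // newest run wins
      const hit = binarySearch(run, key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }

  // Merge all runs into one so reads touch fewer runs; newer values win.
  compact() {
    const merged = new Map();
    for (let i = this.runs.length - 1; i >= 0; i--) { // oldest to newest
      for (const [k, v] of this.runs[i]) merged.set(k, v);
    }
    this.runs = merged.size > 0 ? [[...merged.entries()].sort(byKey)] : [];
  }
}

const idx = new ToyLsm(2);
idx.put("a", 1);
idx.put("b", 2); // second entry reaches the limit and triggers a flush
idx.put("a", 3); // newer value for "a" stays in the memtable
console.log(idx.get("a"), idx.get("b"), idx.runs.length); // 3 2 1
```

Writes stay cheap because each flush is one sequential write of sorted data, while reads pay for checking several runs; compaction is what keeps that read cost bounded.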