TELENAV CONFIDENTIAL
RocksDB vs. BoltDB
Xun Liu & Jay
2
TELENAV CONFIDENTIAL
Agenda
• RocksDB
  • What & Why & How
  • Internals
  • Use cases
• BoltDB
  • History & Keywords & Project Status
  • RocksDB vs. BoltDB
  • Practice in OSRM
RocksDB
History
What is RocksDB
1. Persistent key-value store
2. Embedded
3. Based on a log-structured merge-tree (LSM tree)
4. Optimized for flash
5. Fork of LevelDB
More info:
https://github.com/CodeBear801/tech_summary/blob/master/tech-summary/tools/rocksdb/rocksdb_index.md
Why RocksDB
Write amplification
Traditional DBs use a B+ tree
Optimized for write-heavy workloads
Embedded
RocksDB architecture
Topics we will focus on
- Avoid random writes
- Single writer + multiple readers
- Snapshots + multi-version concurrency control
Internals (based on LevelDB)
1. Put
2. Compaction
3. Get
4. Concurrent read/write
RocksDB Example
A write is extremely fast: just one disk append (WAL) plus one in-memory write (skiplist).
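The write path above can be sketched roughly as follows. This is a toy illustration, not RocksDB code: `TinyStore`, `recover`, and the JSON log format are hypothetical, and a plain dict stands in for the skiplist memtable.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a log file, then update an in-memory table."""

    def __init__(self, wal_path):
        self.memtable = {}  # stand-in for the skiplist memtable
        self.wal = open(wal_path, "a", encoding="utf-8")

    def put(self, key, value):
        # 1) one disk append: record the write in the write-ahead log
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2) one memory write: make the value visible to readers
        self.memtable[key] = value

    def close(self):
        self.wal.close()


def recover(wal_path):
    """Rebuild the in-memory table by replaying the log after a restart."""
    table = {}
    with open(wal_path, encoding="utf-8") as log:
        for line in log:
            key, value = json.loads(line)
            table[key] = value  # later records overwrite earlier ones
    return table


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    store = TinyStore(path)
    store.put("bob", 3)
    store.put("bob", 4)
    store.close()
    return recover(path)
```

Replaying the log is what makes the in-memory write safe: if the process dies before the memtable is flushed, the log still holds every acknowledged write.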
Write Ahead Log
key = bob, value = 3
Skiplist: a balance of space and performance; lock-free, using CAS (compare-and-swap)
Internals (based on LevelDB)
1. Put
2. Compaction
3. Get
4. Concurrent read/write
When to trigger compaction
- Size
- Seek
- Manual compaction
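As a rough illustration of the size trigger and of what a compaction then does, here is a minimal sketch. The names `needs_compaction`, `compact`, and `TOMBSTONE` are hypothetical, each run is assumed to be sorted with unique keys, and tombstones are dropped on the assumption that `older` is the bottom-most run.

```python
TOMBSTONE = None  # hypothetical marker for a deleted key


def needs_compaction(run_sizes_bytes, limit_bytes):
    """Size trigger: compact once a level's total size exceeds its limit."""
    return sum(run_sizes_bytes) > limit_bytes


def compact(newer, older):
    """Merge two sorted runs of (key, value) pairs into one sorted run.

    When both runs hold the same key, the entry from `newer` wins.
    Deleted keys are dropped from the output.
    """
    out = []
    i = j = 0
    while i < len(newer) or j < len(older):
        if j == len(older) or (i < len(newer) and newer[i][0] <= older[j][0]):
            key, value = newer[i]
            if j < len(older) and older[j][0] == key:
                j += 1  # the older entry is shadowed by the newer one
            i += 1
        else:
            key, value = older[j]
            j += 1
        if value is not TOMBSTONE:
            out.append((key, value))
    return out
```

Both inputs are read front to back and the output is written in key order, so the merge needs only sequential I/O.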
RocksDB put() -> compaction
Internals (based on LevelDB)
1. Put
2. Compaction
3. Get
4. Concurrent read/write
RocksDB Example
RocksDB get() example
Bloom filter
If the Bloom filter reports that a key is not in the set, the key is definitely absent: there are never false negatives.
If the Bloom filter reports that a key is in the set, the key is very likely present, but it may still be absent: that case is a false positive.
For RocksDB, 99.9% of get() calls find the value with a single SSTable load.
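The two properties above can be seen in a minimal Bloom filter sketch. This is a generic illustration, not RocksDB's implementation: the class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions per key in a fixed bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A `False` answer lets a read skip an SSTable without touching disk; a `True` answer still requires checking the table, because unrelated keys may have set the same bits.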
SSTable (Sorted String Table)
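Because an SSTable keeps its entries sorted by key, a point lookup can binary-search it. The sketch below is a simplified, in-memory picture of a get(): `sstable_get` and `get` are hypothetical names, each table is a Python list of `(key, value)` pairs, and real tables add index blocks and Bloom filters on top of this idea.

```python
import bisect


def sstable_get(table, key):
    """Binary-search one sorted run of (key, value) pairs."""
    # (key,) sorts just before any (key, value), so this finds the first
    # entry whose key is >= the one we are looking for.
    i = bisect.bisect_left(table, (key,))
    if i < len(table) and table[i][0] == key:
        return True, table[i][1]
    return False, None


def get(memtable, sstables, key):
    """Check the memtable first, then the tables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for table in sstables:  # assumed to be ordered newest first
        found, value = sstable_get(table, key)
        if found:
            return value
    return None
```

Searching newest first means the first hit is always the most recent value, so older copies of the same key never need to be read.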
Internals (based on LevelDB)
1. Put
2. Compaction
3. Get
4. Concurrent read/write
Concurrent Read/Write
Challenges
- Performance
- What if compaction happens during a query?
- What if a key is updated during a read?
- Consistent results (no stale data, no timeouts)
Put write tasks into a queue
Challenges
- Performance
- What if compaction happens during a query?
- What if a key is updated during a read?
- Consistent results (no stale data, no timeouts)
Solutions:
- Isolate compaction from queries
- Sequence ID + Version
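The sequence ID idea can be sketched as follows: every write gets an increasing sequence number, and a reader that holds a snapshot ignores anything written after it. `VersionedStore` and its methods are hypothetical names for this illustration, and it keeps all versions in memory rather than in a memtable and SSTables.

```python
class VersionedStore:
    """Toy multi-version store: each write is tagged with a sequence number."""

    def __init__(self):
        self.seq = 0
        self.entries = {}  # key -> list of (sequence, value), oldest first

    def put(self, key, value):
        self.seq += 1
        self.entries.setdefault(key, []).append((self.seq, value))

    def snapshot(self):
        """A snapshot is simply the sequence number at this moment."""
        return self.seq

    def get(self, key, snapshot=None):
        limit = self.seq if snapshot is None else snapshot
        # Return the newest version that is visible at the snapshot.
        for seq, value in reversed(self.entries.get(key, [])):
            if seq <= limit:
                return value
        return None
```

For example, after `put("bob", 3)`, taking a snapshot, and then `put("bob", 4)`, a read at the snapshot still sees 3 while a plain read sees 4. Writers never block such a reader, because they only append new versions.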
When to delete an old SSTable?
Reference counting per version
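A minimal sketch of reference counting per version, under simplifying assumptions: a version is just a list of file names, `TableFiles`, `acquire`, and `release` are hypothetical names, and "deleting" a file only records its name.

```python
class TableFiles:
    """Toy reference counting: a file is deleted once no version uses it."""

    def __init__(self):
        self.refcount = {}
        self.deleted = []  # stands in for removing files from disk

    def acquire(self, version):
        """Pin every file listed by a version (a list of file names)."""
        for name in version:
            self.refcount[name] = self.refcount.get(name, 0) + 1

    def release(self, version):
        """Unpin a version; files nobody references any more are deleted."""
        for name in version:
            self.refcount[name] -= 1
            if self.refcount[name] == 0:
                del self.refcount[name]
                self.deleted.append(name)


def demo():
    files = TableFiles()
    v1 = ["a.sst", "b.sst"]
    v2 = ["c.sst"]                # result of compacting a.sst and b.sst
    files.acquire(v1)             # v1 is the current version
    files.acquire(v1)             # a reader starts a query on v1
    files.acquire(v2)             # compaction installs v2 as current
    files.release(v1)             # v1 is no longer current
    before = list(files.deleted)  # reader still holds v1: nothing deleted
    files.release(v1)             # reader finishes
    return before, files.deleted
```

The old tables outlive the compaction for exactly as long as some reader still needs them, which is how compaction and queries stay isolated.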
Usage
#1 Chrome – storage engine
#2 Calculation Engine – record state
#3 Realtime newsfeed processing
More writes, more data, more queries
#4 Map making / map data aggregator
RocksDB Resources
Main pages
• https://github.com/facebook/rocksdb
• https://github.com/facebook/rocksdb/wiki
• https://github.com/google/leveldb/tree/master/doc
Posts
• RocksDB: A High Performance Embedded Key-Value Store for Flash Storage
• Code reading notes about LevelDB
Practices
• RocksDB usage at LinkedIn
• LinkedIn FollowFeed - (ALT - Aggregator Leaf Tailer)
BoltDB
BoltDB History
• 2011, Howard Chu introduced MDB, a memory-mapped database backend for OpenLDAP, later renamed LMDB (Lightning Memory-Mapped Database).
• 2013, Ben Johnson started Bolt as a port of LMDB to Go, but the two projects then diverged.
• The author of Bolt decided to focus on simplicity and on providing an easy-to-use Go API.
• 2018, Bolt is stable and widely used. etcd-io forked it for further improvement.
BoltDB Keywords
• Key/value
• Embedded
• Simplicity and easy-to-use API
• Fast
• B+ tree
• Pure Go
• Memory-mapped
BoltDB vs RocksDB
• B+ tree vs. LSM tree
• Optimized for read-heavy workloads and range scans vs. random writes
• Go vs. C++
https://github.com/etcd-io/bbolt#comparison-with-other-databases
BoltDB – Stable & Widely used
Project Status
• Bolt is stable, the API is fixed, and the file format is fixed.
• Full unit test coverage and randomized black box testing are used to ensure database consistency and thread safety.
• Bolt is currently used in high-load production environments serving databases as large as 1TB.
• Many companies such as Shopify and Heroku use Bolt-backed services every day.
⇒ Backend store of etcd
⇒ https://github.com/boltdb/bolt#other-projects-using-bolt
BoltDB Practice in OSRM
Generate DB
• Stores the fromNodeID,toNodeID -> wayID mapping; each key takes 16 bytes and each value takes 8 bytes.
• The DB file is about 28 GB (about 9.2 GB after Snappy compression), with ~5 billion keys.
• Generation takes about 4 hours.
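The 16-byte key and 8-byte value sizes above are consistent with packing each ID as a 64-bit integer. The sketch below shows one such layout; the function names, the big-endian byte order, and the signed integer type are assumptions for illustration, not necessarily the encoding used in the OSRM work.

```python
import struct

KEY_FORMAT = ">qq"   # assumption: two big-endian signed 64-bit node IDs
VALUE_FORMAT = ">q"  # assumption: one big-endian signed 64-bit way ID


def encode_key(from_node_id, to_node_id):
    """Pack a (fromNodeID, toNodeID) pair into a 16-byte key."""
    return struct.pack(KEY_FORMAT, from_node_id, to_node_id)


def decode_key(key):
    """Unpack a 16-byte key back into (fromNodeID, toNodeID)."""
    return struct.unpack(KEY_FORMAT, key)


def encode_value(way_id):
    """Pack a wayID into an 8-byte value."""
    return struct.pack(VALUE_FORMAT, way_id)


def decode_value(value):
    """Unpack an 8-byte value back into a wayID."""
    return struct.unpack(VALUE_FORMAT, value)[0]
```

Fixed-width keys keep every entry the same size, which makes the storage cost per mapping easy to reason about.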
Query
• Querying tens of nodes takes only 1 ms!
• Querying 226390 wayIDs from 226394 nodeIDs takes 1.399967 seconds. (shm)
References
• https://github.com/Telenav/osrm-backend/issues/257
• https://github.com/Telenav/osrm-backend/issues/272
BoltDB Resources
Main pages
• https://github.com/boltdb/bolt
• https://github.com/etcd-io/bbolt
• https://github.com/bmatsuo/lmdb-go
• https://github.com/dgraph-io/badger
Posts
• Intro to BoltDB: Painless Performant Persistence by Nate Finch.
• Bolt -- an embedded key/value database for Go by Progville
• https://tech.townsourced.com/post/boltdb-vs-badger/
• https://dgraph.io/blog/post/badger-lmdb-boltdb/
Practices
• https://github.com/Telenav/osrm-backend/issues/257
• https://github.com/Telenav/osrm-backend/issues/272
