In Memory
When Fast Storage Is Not Fast Enough
Gleb Natapov
Developer At ScyllaDB
Presenter bio
I am a long-time open-source developer. I previously
worked on the open-source routing suite “Zebra”, the
OpenMPI HPC library, the KVM hypervisor for Linux, and the
OSv unikernel, and I now work on Scylla.
Motivation
Scylla Single Node Storage Model
Cache Miss
SSTable Format
One SSTable consists of multiple files:
la-1-big-CRC.db
la-1-big-Data.db
la-1-big-Digest.sha1
la-1-big-Filter.db
la-1-big-Index.db
la-1-big-Scylla.db
la-1-big-Statistics.db
la-1-big-Summary.db
la-1-big-TOC.txt
SSTable Format (Cont.)
SSTable Read
At least two storage accesses per SSTable
Storage Speed
Device      Latency
Register    1 cycle
Cache       2-10 ns
DRAM        100-200 ns
NVMe        10-100 us
SATA SSD    400 us
Hard Disk   10 ms
Cache Miss Price
With an average of 2 SSTables per cache miss on NVMe:
~200us
Not too bad?
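One way to arrive at a figure of this size is sketched below. It takes the two numbers from the slides (at least two storage accesses per SSTable, two SSTables per miss) and multiplies them by a device latency. The 50us mid-range NVMe latency, the assumption that the accesses run one after another, and the function name are illustrative choices, not taken from the talk.

```python
# Back-of-envelope cache-miss cost.
# From the slides: at least 2 storage accesses per SSTable, 2 SSTables per miss.
# Assumed here: accesses are serialized, and 50 us is a mid-range NVMe latency.
ACCESSES_PER_SSTABLE = 2
SSTABLES_PER_MISS = 2


def cache_miss_latency_us(device_latency_us: float) -> float:
    """Latency of one cache miss if every storage access waits for the previous one."""
    return SSTABLES_PER_MISS * ACCESSES_PER_SSTABLE * device_latency_us


if __name__ == "__main__":
    scenarios = [
        ("NVMe, low end of range", 10),
        ("NVMe, mid-range (assumed)", 50),
        ("NVMe, high end of range", 100),
    ]
    for label, latency_us in scenarios:
        print(f"{label}: {cache_miss_latency_us(latency_us):.0f} us")
```

With the assumed 50us per access, the estimate matches the ~200us on the slide; the 10-100us range from the latency table gives 40-400us.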
Cache Miss Price (In Real World)
▪ SSD becomes slower over time
▪ fstrim is running in the background
▪ Many parallel requests cause tail latency to grow
Real latency may be much higher:
IOPING
No other IO:
99 requests completed in 8.54 ms, 396 KiB read, 11.6 k iops, 45.3 MiB/s
generated 100 requests in 990.3 ms, 400 KiB, 100 iops, 403.9 KiB/s
min/avg/max/mdev = 59.6 us / 86.3 us / 157.8 us / 27.2 us
Read/Write fio benchmark:
99 requests completed in 34.2 ms, 396 KiB read, 2.90 k iops, 11.3 MiB/s
generated 100 requests in 990.3 ms, 400 KiB, 100 iops, 403.9 KiB/s
min/avg/max/mdev = 73.0 us / 345.2 us / 5.74 ms / 694.3 us
fstrim:
99 requests completed in 300.3 ms, 396 KiB read, 329 iops, 1.29 MiB/s
generated 100 requests in 1.24 s, 400 KiB, 80 iops, 323.5 KiB/s
min/avg/max/mdev = 62.2 us / 3.03 ms / 83.4 ms / 14.5 ms
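To compare the three runs, the summary lines can be normalized to microseconds and divided by the idle baseline. The sketch below does that with the ioping lines copied from above; the run labels and helper names are made up for illustration.

```python
import re

# Summary lines copied verbatim from the ioping runs above.
RUNS = {
    "idle": "min/avg/max/mdev = 59.6 us / 86.3 us / 157.8 us / 27.2 us",
    "fio": "min/avg/max/mdev = 73.0 us / 345.2 us / 5.74 ms / 694.3 us",
    "fstrim": "min/avg/max/mdev = 62.2 us / 3.03 ms / 83.4 ms / 14.5 ms",
}

UNIT_TO_US = {"us": 1.0, "ms": 1000.0, "s": 1_000_000.0}


def parse_summary(line: str) -> dict:
    """Return min/avg/max/mdev from an ioping summary line, in microseconds."""
    values = re.findall(r"([\d.]+) (us|ms|s)\b", line)
    names = ("min", "avg", "max", "mdev")
    return {name: float(num) * UNIT_TO_US[unit] for name, (num, unit) in zip(names, values)}


def slowdown(run: str, baseline: str = "idle") -> dict:
    """Ratio of average and maximum latency relative to the baseline run."""
    base = parse_summary(RUNS[baseline])
    cur = parse_summary(RUNS[run])
    return {key: cur[key] / base[key] for key in ("avg", "max")}


if __name__ == "__main__":
    for name in ("fio", "fstrim"):
        ratios = slowdown(name)
        print(f"{name}: avg x{ratios['avg']:.1f}, max x{ratios['max']:.1f}")
```

On these numbers the average latency grows roughly 4x under the fio load and roughly 35x during fstrim, while the maximum grows by roughly 36x and over 500x respectively.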
In-Memory vs On-Disk
Implementation
New “In-Memory” File Type
Stores its contents in RAM
New “Mirror” File Type
▪ Writes to both regular and in-memory file, reads from memory only:
[Diagram: on the write path, the mirror file writes to both the disk file and the memory file; on the read path, it reads from the memory file only.]
▪ Reads everything into memory when opened.
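The mirror-file behavior described above can be illustrated with a toy Python class. This is only a sketch of the idea, not Scylla's implementation; the class name `MirrorFile` and its methods are hypothetical.

```python
import os
import tempfile


class MirrorFile:
    """Toy mirror file: writes go to disk and RAM, reads are served from RAM only."""

    def __init__(self, path: str) -> None:
        existed = os.path.exists(path)
        # Read everything into memory when opened (empty buffer for a new file).
        if existed:
            with open(path, "rb") as f:
                self._mem = bytearray(f.read())
        else:
            self._mem = bytearray()
        self._disk = open(path, "r+b" if existed else "w+b")

    def write(self, offset: int, data: bytes) -> None:
        # Write path: update the disk file, then the in-memory copy.
        self._disk.seek(offset)
        self._disk.write(data)
        self._disk.flush()
        end = offset + len(data)
        if end > len(self._mem):
            # Zero-fill any gap, matching what the disk file contains there.
            self._mem.extend(b"\x00" * (end - len(self._mem)))
        self._mem[offset:end] = data

    def read(self, offset: int, size: int) -> bytes:
        # Read path: served from memory, the disk is never touched.
        return bytes(self._mem[offset:offset + size])

    def close(self) -> None:
        self._disk.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        first = MirrorFile(path)
        first.write(0, b"hello")
        first.close()
        second = MirrorFile(path)  # reopening loads the disk content into RAM
        print(second.read(0, 5))
        second.close()
```

The disk copy exists for durability: after a restart the data is reloaded from it, while every read in between avoids storage latency entirely.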
Configuration
RAM Reservation
Command line:
--in-memory-storage-size-mb
scylla.yaml:
in_memory_storage_size_mb:
Table Creation
CREATE TABLE ks.cf (
key blob PRIMARY KEY,
"C0" blob,
) WITH compression = {}
AND read_repair_chance = '0'
AND speculative_retry = 'ALWAYS'
AND in_memory = 'true'
AND compaction = {'class':'InMemoryCompactionStrategy'};
InMemoryCompactionStrategy
Compacts aggressively until only one SSTable is left.
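The effect of compacting down to one SSTable can be sketched with a simplified model. Here an "SSTable" is just a list of (key, timestamp, value) rows, and compaction keeps the newest version of each key. The data model, the pairwise merge order, and all names are assumptions for illustration; the talk does not describe how the strategy picks or schedules its compactions.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an SSTable is a list of (key, timestamp, value) rows.
Row = Tuple[str, int, str]
SSTable = List[Row]


def compact(first: SSTable, second: SSTable) -> SSTable:
    """Merge two SSTables, keeping the newest version of each key."""
    newest: Dict[str, Tuple[int, str]] = {}
    for key, timestamp, value in first + second:
        if key not in newest or timestamp >= newest[key][0]:
            newest[key] = (timestamp, value)
    return [(key, ts, value) for key, (ts, value) in sorted(newest.items())]


def compact_until_one(sstables: List[SSTable]) -> SSTable:
    """Keep merging pairs of SSTables until only one is left."""
    tables = list(sstables)
    while len(tables) > 1:
        second = tables.pop()
        first = tables.pop()
        tables.append(compact(first, second))
    return tables[0] if tables else []


if __name__ == "__main__":
    result = compact_until_one([
        [("a", 1, "old")],
        [("a", 2, "new"), ("b", 1, "x")],
        [("c", 3, "y")],
    ])
    print(result)
```

With a single SSTable left, a read has only one place to look, which keeps the number of SSTables consulted per lookup at its minimum.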
Use In-Memory
For Predictably Low Latency
Thank You
Any Questions?
Please stay in touch
gleb@scylladb.com

Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough