Techniques to Improve Cache Speed

PRESENTED BY
Zohaib Sibte Hassan
DoorDash, Technical Lead

About me
Zohaib Sibte Hassan
Technical Lead at DoorDash
Dreamer, hacker, philosopher, troublemaker, loves
open source.
@zohaibility

IT’S HARD!
Creating delightful
experience

A tale of reducing tail
latency

Cache Stampede
(aka Cache miss storm)

Fixing stampede
● Treat a key as expired at a random point within a fixed window (gap) before its real expiry time.
● Store a timestamp (timestamp) with the value.
● Set a TTL (ttl) on the key in Redis.
● Consider the key expired if:
timestamp + ttl - (rand() * gap) < now()
RANDOM AHEAD OF TIME EVICTION
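
A minimal sketch of this check, assuming a key counts as expired once the current time passes its real expiry minus a random share of the gap. The function name and the injectable `now` and `rand` parameters are hypothetical, added only so the behaviour can be exercised deterministically.

```python
import random
import time


def is_expired_ahead_of_time(timestamp, ttl, gap, now=None, rand=random.random):
    """Return True if the key should be treated as expired.

    timestamp: when the value was stored (epoch seconds)
    ttl:       time to live set on the key (seconds)
    gap:       window before the real expiry in which early expiry may happen
    now, rand: hypothetical hooks for testing; default to the wall clock
               and random.random()
    """
    current = time.time() if now is None else now
    # Each reader draws its own random offset, so readers do not all
    # see the key expire at the same instant.
    early_expiry = timestamp + ttl - rand() * gap
    return current >= early_expiry
```

Because every reader draws a different offset, only some readers refresh the value early while the rest keep using the cached copy.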

Optimal Probabilistic Cache Stampede
Prevention
http://www.vldb.org/pvldb/vol8/p886-vattani.pdf

Implementation
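import time
from math import log
from random import random

def get_if_not_aot_expired(entry, expiry_time_window):
    # entry is an (expiry_ts, value) pair; expiry_ts is in epoch seconds
    expiry_ts, value = entry
    current_ts = int(time.time())
    # random() can return 0.0, which log() rejects; 1.0 - random() lies in (0, 1]
    predictive_expire_gap = expiry_time_window * log(1.0 - random())
    # predictive_expire_gap is never positive because log of (0, 1] is <= 0,
    # so subtracting it shifts the current time forward
    if current_ts - predictive_expire_gap >= expiry_ts:
        return None
    return value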
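
One way to read this check, assuming the random draw U is uniform on (0, 1) and ignoring the integer truncation of the current time: write w for expiry_time_window and d for the seconds left until expiry (expiry_ts minus current_ts). The entry is then treated as expired with a probability that grows exponentially as d shrinks.

```latex
P(\text{treated as expired})
  = P\left(-w \ln U \ge d\right)
  = P\left(U \le e^{-d/w}\right)
  = e^{-d/w}, \qquad d \ge 0
```

For example, with w = 1 second, a read 5 seconds before expiry is treated as a miss with probability of about 0.0067, a read 1 second before expiry with probability of about 0.37, and a read at the expiry time always misses.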

Typical cache setup

Under high traffic load, a similar cache
stampede (miss storm) can be observed
between the L1 and L2 caches (and so on)

We took inspiration from the frontend world
(debounce) and exploited promises*
* Also known as Deferred, Task, Future, etc.

How
● Every reader provides a function (L1_MISS_FUNCTION) that returns a promise. Usually this function
reads from L2, or from the DB if L2 also misses.
● When invoked via the debouncer with a key (ID), only the first reader is allowed to execute its
L1_MISS_FUNCTION, which immediately returns a promise.
● Readers that arrive while that promise is still unresolved are immediately handed the same
promise returned from the first call to L1_MISS_FUNCTION.
● All readers await the returned promise to read the value.
● Once the promise has resolved, the next call with the same ID invokes L1_MISS_FUNCTION again.
ANATOMY OF OUR DEBOUNCING

Implementation
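class Debouncer {
  constructor() {
    // Maps an id to the promise of the call currently in flight for it
    this.pendingBoard = {};
  }

  async debounce(id, callback) {
    // A call for this id is already in flight: share its promise
    if (this.pendingBoard[id] !== undefined) {
      return await this.pendingBoard[id];
    }

    // First reader: start the load and publish its promise
    this.pendingBoard[id] = callback(id);
    try {
      return await this.pendingBoard[id];
    } finally {
      // Clear the entry so the next call after resolution loads again
      delete this.pendingBoard[id];
    }
  }
}

const debouncer = new Debouncer();

async function menuItemLoader(key) {
  // Read from Redis/DB
}

// After cache miss
const menu = await debouncer.debounce(
  `menu-${id}`,
  menuItemLoader
);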
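
The same idea can be sketched with Python's asyncio, where a Task plays the role of the promise. This is an illustrative sketch, not the speaker's code: the `slow_loader` function, its 10 ms delay, and the reader count are assumptions made up for the demonstration.

```python
import asyncio


class Debouncer:
    """Shares one in-flight load per key among concurrent readers."""

    def __init__(self):
        self._pending = {}  # key -> asyncio.Task of the load in flight

    async def debounce(self, key, loader):
        task = self._pending.get(key)
        if task is None:
            # First reader: start the load and publish its task.
            task = asyncio.ensure_future(loader(key))
            self._pending[key] = task
            # Forget the task once it finishes so later calls load again.
            task.add_done_callback(lambda _t: self._pending.pop(key, None))
        # Every reader, first or not, awaits the same task.
        return await task


async def demo(readers=100):
    calls = 0

    async def slow_loader(key):
        # Hypothetical stand-in for an L2 or database read.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"value-for-{key}"

    debouncer = Debouncer()
    results = await asyncio.gather(
        *(debouncer.debounce("menu-1", slow_loader) for _ in range(readers))
    )
    return calls, results


calls, results = asyncio.run(demo())
print(f"{len(results)} readers served by {calls} loader call(s)")
```

A Task is used instead of a bare coroutine because a coroutine object can be awaited only once, while many readers can await the same Task.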

We wrote a simulator
https://repl.it/@x0a1b/DebounceSimulator

OTHER IMPLEMENTATIONS
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .refreshAfterWrite(1, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));
Caffeine cache

Stress test

Summary         WITHOUT DEBOUNCING    WITH DEBOUNCING
Count           1053108               1321340
Total           60769.34 ms           60064.00 ms
Slowest         8058.86 ms            578.69 ms
Fastest         0.38 ms               36.04 ms
Average         114.43 ms             90.77 ms
Requests/sec    17329.60              21998.87
The gRPC endpoint being tested performs a full read-through operation: it goes through an L1 cache
(5-second TTL), then an L2 cache (10-second TTL), and finally falls back to our Postgres database
(1-second AOT eviction). We used ghz to benchmark our service, running on 2 pods, for 60 seconds
with 2000 concurrent connections and no rate limit.

Why?
● Caching precompiled payloads (e.g. views, menus).
● ML models.
● Message queues or events.
● Even more use-cases with Redis stream message payloads.
WHY WOULD YOU NEED TO STORE LARGE PAYLOADS

Took a page from HTTP, and decided to
use compression.
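
A minimal sketch of compressing payloads before they reach the cache. Two assumptions are made to keep it self-contained: a plain dict stands in for Redis, and zlib from the Python standard library stands in for the compressors discussed in this talk (LZ4 and Snappy need third-party packages). The `CompressedCache` class and the sample menu are hypothetical.

```python
import json
import zlib


class CompressedCache:
    """Hypothetical cache wrapper; a dict stands in for Redis."""

    def __init__(self):
        self._store = {}

    def set(self, key, obj):
        # Serialize first, then compress the bytes that would go on the wire.
        raw = json.dumps(obj).encode("utf-8")
        self._store[key] = zlib.compress(raw)

    def get(self, key):
        blob = self._store.get(key)
        if blob is None:
            return None  # cache miss
        return json.loads(zlib.decompress(blob).decode("utf-8"))

    def stored_size(self, key):
        return len(self._store[key])


# Made-up payload shaped loosely like a menu.
menu = {"items": [{"name": f"item-{i}", "price_cents": 499} for i in range(500)]}
raw_size = len(json.dumps(menu).encode("utf-8"))

cache = CompressedCache()
cache.set("menu-1", menu)
print(f"raw: {raw_size} bytes, stored: {cache.stored_size('menu-1')} bytes")
```

Readers and writers only see plain objects; the compression step stays inside the wrapper, so the codec can be swapped without touching callers.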

Considerations
COMPRESSION RATIO
Should provide a decent
compression ratio.
LIGHTWEIGHT
Light on CPU and footprint
STABLE
Stable, well tested, and
well maintained

Meet lzbench
● https://github.com/inikep/lzbench
● https://morotti.github.io/lzbench-web/ - a nice UI with a set of
pre-benchmarked results.
● Compression results depend on the data being compressed.
BENCHMARKING COMPRESSORS
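
Because results depend on the data, it can help to time candidate codecs on a payload of your own. The sketch below is not lzbench: it only compares the three codecs that ship with the Python standard library (zlib, bz2, lzma), and the `benchmark` function and sample payload are hypothetical. LZ4 or Snappy could be added to `CODECS` if their third-party bindings are installed.

```python
import bz2
import json
import lzma
import time
import zlib

# name -> (compress, decompress); all three are standard-library codecs
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(payload, rounds=20):
    """Measure compressed size and round-trip timings for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        for _ in range(rounds):
            blob = compress(payload)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(rounds):
            restored = decompress(blob)
        decompress_s = time.perf_counter() - start

        if restored != payload:
            raise ValueError(f"{name} did not round-trip the payload")
        results[name] = {
            # Compressed size as a percentage of the original (lower is better).
            "ratio_pct": 100.0 * len(blob) / len(payload),
            "compress_s": compress_s,
            "decompress_s": decompress_s,
        }
    return results


# Made-up payload; replace with bytes taken from your own cache.
sample = json.dumps(
    {"items": [{"id": i, "name": f"item-{i}", "price_cents": 499} for i in range(1000)]}
).encode("utf-8")

results = benchmark(sample)
for name, stats in results.items():
    print(name, {k: round(v, 4) for k, v in stats.items()})
```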

Our scenario
● 64,220 bytes of Chick-fil-A menu (serialized JSON)
● 350,333 bytes of Cheesecake Factory menu (serialized JSON)

Observations
● On average, LZ4 compressed slightly better than Snappy: our serialized payloads shrank to 38.54% of
their original size with LZ4 vs. 39.71% with Snappy (lower is better).
● Compression speeds of LZ4 and Snappy were almost the same; LZ4 was fractionally slower than Snappy.
● LZ4 was hands down faster than Snappy for decompression. In some cases we found it to be 2x faster
than Snappy.

Benchmarks

Redis Operation    No Compression (seconds)    Snappy (seconds)    LZ4 (seconds)
Set (10000)        16.526179                   12.635553           12.802149
Get (10000)        12.047090                   7.560119            6.434711

Results
