ElastiCache/Redis
Part 2
Goal
Improve the stability and performance of our ElastiCache/Redis clusters by
introducing:
● Background context not in the AWS documentation
● Numbers for capacity/performance estimation
● Metrics/signals to pay attention to
● Anti-patterns and solutions not covered in part 1
Sharding
● Redis Cluster itself does NOT use consistent hashing; keys map to fixed
hash slots. This means resizing the cluster is expensive - think carefully
before you do it!
● For estimation, resharding 1TB takes about 30 minutes
● Consistent hashing is, however, available through Redis clients and proxies
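The consistent hashing that clients and proxies offer can be sketched in a few lines of Python. This is an illustrative ring, not any particular client's implementation; node names and the virtual-node count are made up. The point it demonstrates: when a node is added, only roughly 1/N of keys move, instead of the wholesale slot migration a naive modulo scheme (or a Redis Cluster reshard) implies.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring entry clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

keys = [f"user:{i}" for i in range(1000)]
ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
# Count keys that change owner when a 4th node joins - close to 1/4,
# not all of them.
moved = sum(1 for k in keys if ring3.node_for(k) != ring4.node_for(k))
```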
Eviction and memory usage
● Redis uses a timed + lazy expiration policy: every 100ms it randomly
samples some keys and deletes the expired ones, and it also deletes an
expired key when it is fetched.
● mem_fragmentation_ratio = used_memory_rss / used_memory. A ratio < 1
means there is not enough physical memory and the OS is swapping. A higher
ratio means more fragmentation. Try to keep it around 1.03.
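A tiny helper that interprets the ratio. The inputs are the used_memory_rss and used_memory fields from Redis INFO memory; the 1.0 floor follows the bullet above, while the 1.5 "too fragmented" ceiling is an assumed rule of thumb, not an AWS-documented threshold.

```python
def fragmentation_status(used_memory_rss: int, used_memory: int):
    """Classify mem_fragmentation_ratio = used_memory_rss / used_memory.

    The 1.5 upper bound is an assumed rule of thumb, not an official limit.
    """
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        return ratio, "swapping"    # RSS below logical usage: OS paged Redis out
    if ratio > 1.5:
        return ratio, "fragmented"  # consider activedefrag or MEMORY PURGE
    return ratio, "healthy"         # target is ~1.03
```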
Rules of thumb
● Keep string values < 10KB and collection sizes < 5000 elements
● 99th-percentile latency >= 5ms most likely indicates a problem, either
with the cluster or with the code
● Each Redis node handles 20k rps and 20k connections easily
● Redis using < 60% of total memory is generally fine
● A cache hit rate < 70% is often a problem
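The hit rate itself is not a single INFO field; it is derived from the keyspace_hits and keyspace_misses counters in INFO stats. A small helper applying the 70% rule above (function names are ours):

```python
def cache_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Hit rate from the keyspace_hits / keyspace_misses counters (INFO stats)."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 1.0  # no traffic yet: treat as fine

def hit_rate_ok(hits: int, misses: int) -> bool:
    """Flag anything under the 70% rule of thumb."""
    return cache_hit_rate(hits, misses) >= 0.70
```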
Remember: Redis is single-threaded
● The KEYS command should rarely, if ever, be used - a careless scan blocks
the whole thread and thus all other requests! Use SCAN instead.
● For the same reason, DEL on huge collections should be avoided. In fact,
use of large collections should be rare and carefully reviewed.
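A sketch of draining a huge set in bounded batches so no single command holds the thread for long. A plain dict stands in for Redis so the snippet runs anywhere; with redis-py the real equivalent is looping over r.sscan_iter(key) and calling r.srem(key, *chunk), or simply r.unlink(key) on Redis >= 4, which frees the memory in a background thread.

```python
def delete_large_set(store: dict, key: str, batch: int = 500) -> int:
    """Delete a big set in batches of `batch` members; returns batch count.

    `store` is a dict standing in for Redis; each chunk models one short,
    bounded SREM call instead of one giant blocking DEL.
    """
    members = store[key]
    batches = 0
    while members:
        chunk = [members.pop() for _ in range(min(batch, len(members)))]
        batches += 1  # in real code: r.srem(key, *chunk)
    del store[key]
    return batches

store = {"big:set": {f"m{i}" for i in range(2000)}}
batches = delete_large_set(store, "big:set")
```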
Why we need cache-aside
If we update the DB and then the cache:
● What if two threads update the same entry concurrently? The cache can end
up holding the older value.
● In write-heavy cases we update the cache more often than it is ever read.
If we delete the cache and then update the DB:
● What if another thread reads between the cache deletion and the new DB
write? It repopulates the cache with the stale value.
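For contrast, the cache-aside pattern itself: writes touch only the DB and invalidate the cache; reads repopulate it on a miss. Dicts stand in for Redis and the DB, and the function names are illustrative.

```python
def get_user(user_id, cache: dict, db: dict):
    """Cache-aside read: check cache, on miss read the DB and populate."""
    val = cache.get(user_id)
    if val is None:
        val = db[user_id]     # authoritative read
        cache[user_id] = val  # populate for subsequent reads (set a TTL in real code)
    return val

def update_user(user_id, value, cache: dict, db: dict):
    """Cache-aside write: update the DB, then invalidate (not update) the cache."""
    db[user_id] = value
    cache.pop(user_id, None)  # next read repopulates from the DB

cache, db = {}, {"u1": 1}
first = get_user("u1", cache, db)     # miss: reads DB, fills cache
update_user("u1", 2, cache, db)       # write invalidates the cached entry
second = get_user("u1", cache, db)    # miss again: sees the new value
```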
Updating a hot key
● The overall goal is to avoid a stampede on the main DB
● Use a separate process just to refresh hot keys. The refresh is triggered
periodically or just before the cached value expires.
● Alternatively, service processes must acquire a mutex before refreshing.
Processes that fail to acquire the mutex retry the fetch instead of hitting
the DB themselves.
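A minimal sketch of the mutex variant using an in-process threading.Lock; across multiple hosts you would use a distributed lock instead (e.g. Redis SET key token NX PX ttl). All names here are illustrative, and the dict cache stands in for Redis.

```python
import threading
import time

cache: dict = {}
lock = threading.Lock()
db_fetches = 0  # counts trips to the backing store

def load_from_db(key):
    """Stand-in for a slow DB query."""
    global db_fetches
    db_fetches += 1
    time.sleep(0.05)
    return f"value-of-{key}"

def get_hot_key(key):
    val = cache.get(key)
    if val is not None:
        return val
    with lock:                # only one thread refreshes at a time
        if key not in cache:  # losers re-check the cache instead of re-fetching
            cache[key] = load_from_db(key)
    return cache[key]

# Ten concurrent readers of a cold hot key -> exactly one DB fetch.
threads = [threading.Thread(target=get_hot_key, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```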
