RedisConf17 - Internet Archive - Preventing Cache Stampede with Redis and XFetch

Redis Labs
Preventing cache stampede with Redis & XFetch
Jim Nelson <jnelson@archive.org>
Internet Archive
RedisConf 2017
Internet Archive
Universal Access to All Knowledge
Founded 1996, based in San Francisco
Archive of digital and physical media
Includes Web, books, music, film, software & more
Digital holdings: over 30 petabytes & counting
Key collections & services:
Wayback Machine
Grateful Dead live concert collection
Internet Archive ♡ Redis
Caching & other services backed by 10-node sharded Redis cluster
Sharding performed client-side via consistent hashing (PHP, Predis)
Each node supported by two replicated mirrors (fail-over)
Specialized Redis instances also used throughout IA’s services, including
Wayback, search, and more
Caching: Quick terminology
I assume we all know what caching is. This is the terminology I’ll use today:
Recompute: Expensive operation whose result is cached
(database query, file system read, HTTP request to remote service)
Expiration: When a cache value is considered stale or out-of-date
(time-to-live)
Evict: Removing a value from the cache
(to forcibly invalidate a value prior to expiry)
Cache stampede
Cache stampede
“A cache stampede is a type of cascading failure that can
occur when massively parallel computing systems with
caching mechanisms come under very high load. This
behaviour is sometimes also called dog-piling.”
–Wikipedia
https://en.wikipedia.org/wiki/Cache_stampede
Cache stampede: A scenario
Multiple servers, each with multiple workers serving requests, accessing a
common cached value
When the cached value expires or is evicted, all workers experience a
simultaneous cache miss
Workers recompute the missing value, causing overload of primary data
sources (e.g. database) and/or hung requests
Congestion collapse
Hung workers due to network congestion or expensive recomputes—that’s bad
Discarded user requests—that’s bad
Overloaded primary data stores (“Sources of Truth”)—that’s bad
Harmonics (peaks & valleys): brief periods of intense activity (mini-outages)
followed by lulls—that’s bad
Imagine a cached value with TTL of 1hr enjoying 10,000 hits/sec—that’s good.
Now imagine 10,000 cache misses at 1hr + 1sec. That’s bad.
Typical cache code
function fetch(name)
    var data = redis.get(name)
    if (!data)
        data = recompute(name)
        redis.set(name, expires, data)
    return data
This “looks” fine, but consider tens of thousands of simultaneous workers calling this code at once:
no mutual exclusion, no upper-bound to simultaneous recomputes or writes … that’s a cache stampede
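To make the failure mode concrete, here is a minimal Python sketch of the same naive fetch run by several threads at once. The dictionary standing in for Redis, the `run_workers` helper, the key name, and the 0.1 second recompute cost are all assumptions made for illustration; they are not from the talk.

```python
import threading
import time

cache = {}                      # stand-in for Redis (assumption: no TTL handling here)
recompute_calls = 0
counter_lock = threading.Lock()

def recompute(name):
    """Pretend to run an expensive query and count how often it happens."""
    global recompute_calls
    with counter_lock:
        recompute_calls += 1
    time.sleep(0.1)             # simulated cost of the recompute
    return f"value-for-{name}"

def fetch(name):
    data = cache.get(name)
    if data is None:            # cache miss: nothing stops every worker from recomputing
        data = recompute(name)
        cache[name] = data
    return data

def run_workers(n):
    """Release n workers at the same moment and report the number of recomputes."""
    start = threading.Barrier(n)

    def worker():
        start.wait()
        fetch("front-page")

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return recompute_calls
```

Calling `run_workers(20)` on an empty cache will typically report close to 20 recomputes for a single key, because every worker sees the miss before any of them has written a value back.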
Typical stampede solutions
(a) Locking
One worker acquires lock, recomputes, and writes value to cache
Other workers wait for lock to be released, then retry cache read
Primary data source is not overloaded by requests
Redis is often used as a cluster-wide distributed lock:
https://redis.io/topics/distlock
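The locking approach can be sketched as follows. This is an illustration under stated assumptions, not the speaker's code: `InMemoryStore` is a hypothetical stand-in that mimics only the set-if-absent behaviour of a Redis `SET ... NX`, and `fetch_with_lock`, the `lock:` key prefix, and the timing values are invented for the example.

```python
import threading
import time

class InMemoryStore:
    """Hypothetical stand-in for Redis: get, set (optionally only if absent), delete."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value, nx=False):
        with self._mutex:
            if nx and key in self._data:
                return False            # another worker already holds this key
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)

store = InMemoryStore()
recomputes = []

def recompute(name):
    recomputes.append(name)
    time.sleep(0.05)                    # simulated expensive work
    return f"value-for-{name}"

def fetch_with_lock(name, retry_delay=0.01, max_wait=2.0):
    deadline = time.monotonic() + max_wait
    while True:
        data = store.get(name)
        if data is not None:
            return data
        if store.set(f"lock:{name}", "1", nx=True):
            try:
                data = store.get(name)  # re-check: a previous holder may have just finished
                if data is None:
                    data = recompute(name)
                    store.set(name, data)
                return data
            finally:
                store.delete(f"lock:{name}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"gave up waiting for {name}")
        time.sleep(retry_delay)         # other workers wait, then retry the cache read
```

The lock in this sketch never expires, so a worker that dies while holding it would block everyone until the timeout. A real deployment would normally give the lock key its own time-to-live.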
Problems with locking
Introduces extra reads and writes into code path
Starvation: expiration / eviction can lead to blocked workers waiting for a
single worker to finish recompute
Distributed locks may be abandoned
Typical stampede solutions
(b) External recompute
Use a separate process / independent worker to recompute value
Workers never recompute
(Alternately, workers recompute as fall-back when external process fails)
Problems with external recompute
One more “moving part”—a daemon, a cron job, work stealing
Requires fall-back scheme if external recompute fails to run
External recomputation is often not easily deterministic:
caching based on a wide variety of user input
periodic external recomputation of 1,000,000 user records
External recomputation may be inefficient if cached values are never read by workers
XFetch
(Probabilistic early recomputation)
Probabilistic early recomputation (PER)
Recompute cache values before they expire
Before expiration, one worker “volunteers” to recompute the value
Without evicting old value, volunteer performs expensive recompute—
other workers continue reading cache
Before expiration, volunteer writes new cache value and extends its
time-to-live
Under ideal conditions, there are no cache misses
XFetch
Full paper title: “Optimal Probabilistic Cache Stampede Prevention”
Authors:
Andrea Vattani (Goodreads)
Flavio Chierichetti (Sapienza University)
Keegan Lowenstein (Bugsnag)
Archived at IA:
https://archive.org/details/xfetch
The algorithm
XFetch (“exponential fetch”) is elegant:
delta * beta * ln(rand())
where
delta – Time to recompute value
beta – control (default: 1.0, > 1.0 favors earlier recomputation, < 1.0 favors later)
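rand – Random number in ( 0.0 … 1.0 ] (zero is excluded because log(0) is undefined)
Remember: the natural log of a number between 0 and 1 is negative, so XFetch produces a negative value (or zero when rand is exactly 1.0)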
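One way to build intuition for the formula is to compute how likely a single read is to trigger an early recompute. Because rand is uniform, the chance that the negated term `-delta * beta * ln(rand)` reaches at least `time_left` works out to `exp(-time_left / (delta * beta))`. The Python sketch below shows both quantities; the function names are chosen for this example and do not come from the paper or the talk.

```python
import math
import random

def early_offset(delta, beta=1.0, rng=random.random):
    """Return -delta * beta * ln(r): how far ahead of expiry this roll looks (never negative)."""
    r = 1.0 - rng()     # random.random() is in [0, 1); shift to (0, 1] so ln is defined
    return -delta * beta * math.log(r)

def recompute_probability(time_left, delta, beta=1.0):
    """Chance that one read recomputes when time_left seconds remain before expiry."""
    return math.exp(-time_left / (delta * beta))
```

With a 2 second recompute and beta at 1.0, a read arriving 10 seconds before expiry recomputes with probability exp(-5), under 1%, while a read arriving 2 seconds before expiry recomputes with probability exp(-1), roughly 37%.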
Updated code
function fetch(name)
    var data, delta, ttl = redis.get(name, delta, ttl)
    if (!data or xfetch(delta, time() + ttl))
        data, recompute_time = recompute(name)
        redis.set(name, expires, data)
        redis.set(delta, expires, recompute_time)
    return data
function xfetch(delta, expiry)
    /* The XFetch term is negative, so subtracting it adds to time() */
    return time() - (delta * BETA * log(rand(0,1))) >= expiry
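For readers who want something executable, the pseudocode can be approximated in Python as below. This is a sketch under assumptions: a module-level dictionary stands in for Redis, the entry stores an absolute expiry time rather than a remaining TTL, and the names `cache_get`, `cache_set`, and `should_recompute_early` as well as the injectable `clock` and `rng` parameters are invented here to make the logic testable.

```python
import math
import random
import time

BETA = 1.0
_cache = {}   # name -> (data, delta, expiry); hypothetical stand-in for Redis

def cache_get(name, now):
    entry = _cache.get(name)
    if entry is None or entry[2] <= now:       # missing or already expired
        return None, 0.0, 0.0
    return entry

def cache_set(name, data, delta, ttl, now):
    _cache[name] = (data, delta, now + ttl)

def should_recompute_early(delta, expiry, now, beta=BETA, rng=random.random):
    r = 1.0 - rng()                            # uniform in (0, 1], so log(r) <= 0
    return now - delta * beta * math.log(r) >= expiry

def fetch(name, recompute, ttl, clock=time.time, rng=random.random):
    now = clock()
    data, delta, expiry = cache_get(name, now)
    if data is None or should_recompute_early(delta, expiry, now, rng=rng):
        started = time.perf_counter()
        data = recompute(name)
        delta = time.perf_counter() - started  # duration of this recompute
        cache_set(name, data, delta, ttl, clock())
    return data
```

Storing the measured duration next to the value means each read has the delta it needs without a second lookup. With real Redis the value and its delta would have to be written and read together, for example in one hash, which is outside this sketch.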
Can more than one volunteer recompute?
Yes. You should know this before using XFetch.
It’s possible for more than one worker to “roll” the magic number and start a
recompute. The odds of this occurring increase as the expiration deadline
approaches.
If your data source absolutely cannot be accessed by multiple workers, use a
lock or another sentinel; XFetch will minimize lock contention.
How to determine delta?
XFetch must be supplied with the time required to recompute.
The easiest approach is to store the duration of the last recompute and read it
with the cached value.
What’s the deal with the beta value?
beta is the one knob you have to tweak XFetch.
beta > 1.0 favors earlier recomputation, < 1.0 favors later recomputation.
My suggestion: Start with the default (1.0), instrument your code, and change
only if necessary.
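A quick simulation can show what the beta knob does. Since the negated log of a uniform random number averages 1, the average lead time before expiry at which a roll succeeds is about delta * beta. The Python sketch below estimates that average; `mean_early_offset`, the sample count, and the seed are assumptions for this example.

```python
import math
import random

def mean_early_offset(delta, beta, samples=100_000, seed=42):
    """Estimate the average of -delta * beta * ln(r) over many rolls."""
    rng = random.Random(seed)           # seeded so repeated runs give the same estimate
    total = 0.0
    for _ in range(samples):
        r = 1.0 - rng.random()          # uniform in (0, 1]
        total += -delta * beta * math.log(r)
    return total / samples
```

For a 2 second recompute, beta values of 0.5, 1.0, and 2.0 should give averages near 1, 2, and 4 seconds, which matches the rule that larger beta favors earlier recomputation.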
XFetch & Redis
Let’s look at some sample code
Questions?
Redis & XFetch
Jim Nelson <jnelson@archive.org>
Internet Archive
RedisConf 2017
RedisConf17 - Internet Archive - Preventing Cache Stampede with Redis and XFetch

  • 1. Preventing cache stampede with Redis & XFetch Jim Nelson <jnelson@archive.org> Internet Archive RedisConf 2017
  • 2. Internet Archive Universal Access to All Knowledge Founded 1996, based in San Francisco Archive of digital and physical media Includes Web, books, music, film, software & more Digital holdings: over 30 petabytes & counting Key collections & services: Wayback Machine Grateful Dead live concert collection
  • 3. Internet Archive ♡ Redis Caching & other services backed by 10-node sharded Redis cluster Sharding performed client-side via consistent hashing (PHP, Predis) Each node supported by two replicated mirrors (fail-over) Specialized Redis instances also used throughout IA’s services, including Wayback, search, and more
  • 4. Caching: Quick terminology I assume we all know what caching is. This is the terminology I’ll use today: Recompute: Expensive operation whose result is cached (database query, file system read, HTTP request to remote service) Expiration: When a cache value is considered stale or out-of-date (time-to-live) Evict: Removing a value from the cache (to forcibly invalidate a value prior to expiry)
  • 6. Cache stampede “A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.” –Wikipedia https://en.wikipedia.org/wiki/Cache_stampede
  • 7. Cache stampede: A scenario Multiple servers, each with multiple workers serving requests, accessing a common cached value When the cached value expires or is evicted, all workers experience a simultaneous cache miss Workers recompute the missing value, causing overload of primary data sources (e.g. database) and/or hung requests
  • 8. Congestion collapse Hung workers due to network congestion or expensive recomputes—that’s bad Discarded user requests—that’s bad Overloaded primary data stores (“Sources of Truth”)—that’s bad Harmonics (peaks & valleys): brief periods of intense activity (mini-outages) followed by lulls—that’s bad Imagine a cached value with TTL of 1hr enjoying 10,000 hits/sec—that’s good. Now imagine @ 1hr+1sec 10,000 cache misses—that’s bad.
  • 9. Typical cache code
        function fetch(name)
            var data = redis.get(name)
            if (!data)
                data = recompute(name)
                redis.set(name, expires, data)
            return data
    This “looks” fine, but consider tens of thousands of simultaneous workers calling this code at once: no mutual exclusion, no upper-bound to simultaneous recomputes or writes … that’s a cache stampede
  • 10. Typical stampede solutions (a) Locking One worker acquires lock, recomputes, and writes value to cache Other workers wait for lock to be released, then retry cache read Primary data source is not overloaded by requests Redis is often used as a cluster-wide distributed lock: https://redis.io/topics/distlock
  • 11. Problems with locking Introduces extra reads and writes into code path Starvation: expiration / eviction can lead to blocked workers waiting for a single worker to finish recompute Distributed locks may be abandoned
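The locking approach on the two slides above can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's code: `InMemoryCache` is a hypothetical in-process stand-in for a shared cache, `set_if_absent` imitates the effect of a Redis `SET` with the `NX` option and an expiry, and `fetch_with_lock`, the `lock:` key prefix, and the timing defaults are all invented for the example. The lock holder re-checks the cache after acquiring the lock so that a late arrival does not recompute a value that was just written.

```python
import threading
import time
import uuid


class InMemoryCache:
    """Hypothetical in-process stand-in for a shared cache (not a Redis client)."""

    def __init__(self):
        self._entries = {}  # key -> (value, expires_at)
        self._guard = threading.Lock()

    def _live(self, key):
        # Return the entry if it has not expired, dropping it otherwise.
        entry = self._entries.get(key)
        if entry is not None and time.monotonic() >= entry[1]:
            del self._entries[key]
            return None
        return entry

    def get(self, key):
        with self._guard:
            entry = self._live(key)
            return None if entry is None else entry[0]

    def set(self, key, value, ttl):
        with self._guard:
            self._entries[key] = (value, time.monotonic() + ttl)

    def set_if_absent(self, key, value, ttl):
        # Imitates "set only if the key does not exist, with an expiry".
        with self._guard:
            if self._live(key) is not None:
                return False
            self._entries[key] = (value, time.monotonic() + ttl)
            return True

    def delete_if_equal(self, key, value):
        # Release the lock only if this worker still owns it.
        with self._guard:
            entry = self._live(key)
            if entry is not None and entry[0] == value:
                del self._entries[key]


def fetch_with_lock(cache, name, recompute, ttl=60.0, lock_ttl=5.0, poll=0.01):
    lock_key = "lock:" + name
    while True:
        data = cache.get(name)
        if data is not None:
            return data
        token = uuid.uuid4().hex
        if cache.set_if_absent(lock_key, token, lock_ttl):
            try:
                # Another worker may have filled the cache in the meantime.
                data = cache.get(name)
                if data is None:
                    data = recompute(name)
                    cache.set(name, data, ttl)
                return data
            finally:
                cache.delete_if_equal(lock_key, token)
        # Lock is held elsewhere: wait briefly, then retry the cache read.
        time.sleep(poll)
```

The polling loop makes the costs listed on the "Problems with locking" slide visible: every waiting worker performs extra reads, and all of them are blocked for as long as the single recompute takes. The lock expiry (`lock_ttl`) is what keeps an abandoned lock from blocking workers forever.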
  • 12. Typical stampede solutions (b) External recompute Use a separate process / independent worker to recompute value Workers never recompute (Alternately, workers recompute as fall-back when external process fails)
  • 13. Problems with external recompute
    One more “moving part”—a daemon, a cron job, work stealing
    Requires fall-back scheme if external recompute fails to run
    External recomputation is often not easily deterministic:
        caching based on a wide variety of user input
        periodic external recomputation of 1,000,000 user records
    External recomputation may be inefficient if cached values are never read by
  • 15. Probabilistic early recomputation (PER) Recompute cache values before they expire Before expiration, one worker “volunteers” to recompute the value Without evicting old value, volunteer performs expensive recompute— other workers continue reading cache Before expiration, volunteer writes new cache value and extends its time-to-live Under ideal conditions, there are no cache misses
  • 16. XFetch Full paper title: “Optimal Probabilistic Cache Stampede Prevention” Authors: Andrea Vattani (Goodreads) Flavio Chierichetti (Sapienza University) Keegan Lowenstein (Bugsnag) Archived at IA: https://archive.org/details/xfetch
  • 17. The algorithm
    XFetch (“exponential fetch”) is elegant:
        delta * beta * log_e(rand())
    where
        delta – Time to recompute value
        beta – control (default: 1.0, > 1.0 favors earlier recomputation, < 1.0 favors later)
        rand – Random number [ 0.0 … 1.0 ]
    Remember: the natural log of any number between 0 and 1 is negative (and log(0) is undefined, so rand must not return exactly 0), so XFetch produces a negative value
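To make the expression on this slide concrete, here is a small Python sketch. The function name `xfetch_gap` and the `rng` parameter are assumptions made for illustration. It returns the negated XFetch value, that is, how many seconds ahead of expiry a given call would be willing to start recomputing. Because the negated log of a uniform random number is exponentially distributed, the gap averages out to `delta * beta`.

```python
import math
import random


def xfetch_gap(delta, beta=1.0, rng=random):
    """Seconds before expiry at which this call would start recomputing.

    delta: time the last recompute took, in seconds
    beta:  tuning knob, 1.0 by default
    rng:   anything with a random() method (assumption, for testability)
    """
    # 1.0 - rng.random() lies in (0.0, 1.0], which avoids log(0).
    u = 1.0 - rng.random()
    return -delta * beta * math.log(u)


if __name__ == "__main__":
    rng = random.Random(2017)
    gaps = [xfetch_gap(delta=2.0, beta=1.0, rng=rng) for _ in range(5)]
    print(["%.2f s" % g for g in gaps])
```

Most sampled gaps are short and a few are long, which is why usually only an occasional worker volunteers well before the deadline.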
  • 18. Updated code
        function fetch(name)
            var data,delta,ttl = redis.get(name, delta, ttl)
            if (!data or xfetch(delta, time() + ttl))
                var data,recompute_time = recompute(name)
                redis.set(name, expires, data), redis.set(delta, expires, recompute_time)
            return data

        function xfetch(delta, expiry)
            /* XFetch is negative; value is being added to time() */
            return time() - (delta * BETA * log(rand(0,1))) >= expiry
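The slide's pseudocode might translate to Python roughly as follows. This is a sketch under assumptions rather than the talk's sample code: a plain `dict` named `store` stands in for Redis, the value, its recompute duration, and its expiry time are kept together in one tuple, and the `now` and `rng` parameters exist only so the logic can be exercised with a fake clock.

```python
import math
import random
import time

BETA = 1.0  # default suggested in the talk


def should_recompute_early(delta, expiry, now, beta=BETA, rng=random):
    """True when this call should volunteer to recompute before expiry."""
    # 1.0 - rng.random() lies in (0.0, 1.0], which avoids log(0).
    u = 1.0 - rng.random()
    # log(u) <= 0, so subtracting it pushes "now" toward the expiry time.
    return now - delta * beta * math.log(u) >= expiry


def fetch(store, name, recompute, ttl, now=time.time, rng=random):
    """Read-through cache lookup with probabilistic early recomputation.

    store is a dict standing in for the cache (assumption); each entry
    is a (value, delta, expiry) tuple.
    """
    entry = store.get(name)
    if entry is not None:
        value, delta, expiry = entry
        current = now()
        if current < expiry and not should_recompute_early(
            delta, expiry, current, rng=rng
        ):
            return value

    # Cache miss, expired entry, or this caller volunteered early.
    started = now()
    value = recompute(name)
    delta = now() - started  # remember how long the recompute took
    store[name] = (value, delta, now() + ttl)
    return value
```

A call such as `fetch(store, "front-page", render_front_page, ttl=3600)` (with a hypothetical `render_front_page` function) would serve the cached value on most reads and occasionally refresh it shortly before the hour is up.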
  • 19. Can more than one volunteer recompute? Yes. You should know this before using XFetch. It’s possible for more than one worker to “roll” the magic number and start a recompute. The odds of this occurring increase as the expiration deadline approaches. If your data source absolutely cannot be accessed by multiple workers, use a lock or another sentinel—XFetch will minimize lock contention
  • 20. How to determine delta? XFetch must be supplied with the time required to recompute. The easiest approach is to store the duration of the last recompute and read it with the cached value.
  • 21. What’s the deal with the beta value? beta is the one knob you have to tweak XFetch. beta > 1.0 favors earlier recomputation, < 1.0 favors later recomputation. My suggestion: Start with the default (1.0), instrument your code, and change only if necessary.
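One way to see what beta does is to compute the chance that a single read triggers an early recompute. A read volunteers when `-delta * beta * log(rand)` is at least the time left until expiry, which for a uniform random number happens with probability `exp(-time_left / (delta * beta))`. The sketch below encodes that; the function name and the example numbers are assumptions chosen for illustration.

```python
import math


def early_recompute_probability(time_left, delta, beta=1.0):
    """Chance that one read volunteers to recompute.

    time_left: seconds until the cached value expires
    delta:     seconds the last recompute took (must be > 0)
    beta:      tuning knob; larger values favor earlier recomputation
    """
    if time_left <= 0:
        return 1.0  # already expired: every reader recomputes
    return math.exp(-time_left / (delta * beta))


if __name__ == "__main__":
    # Hypothetical numbers: a 1-second recompute, 5 seconds before expiry.
    for beta in (0.5, 1.0, 2.0):
        p = early_recompute_probability(time_left=5.0, delta=1.0, beta=beta)
        print("beta=%.1f -> p=%.6f" % (beta, p))
```

The per-read probability is small far from the deadline, but at thousands of reads per second even a small probability produces a volunteer, which is also why more than one worker can end up recomputing.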
  • 22. XFetch & Redis Let’s look at some sample code
  • 24. Redis & XFetch Jim Nelson <jnelson@archive.org> Internet Archive RedisConf 2017