“TOO BIG TO FAILOVER”
A cautionary tale of scaling Redis
Aaron Pollack - May 2017
Presentation Summary
● How Redis is used at Napster
● Problems with failover at scale
● Our solution for constant-time failovers
Napster is still around?
● Rhapsody rebranded as Napster last spring
● Provides on-demand and radio streaming for mobile and desktop apps
● Powers on-demand streaming for apps like iHeartRadio
The cat is back!
API.NAPSTER.COM
NAPSTER API SNAPSHOT
● API Gateway Layer
● 1k developers using the API
● 70M requests/day
● 7k Redis ops/sec
We LOVE Redis (mostly)
● Fast! - Response times <10ms to the Redis cluster, network round trip included.
● Simple - Built-in data types translate easily into JS. Replication comes free.
● Available - Redis is mission critical for us. When it’s down, we’re down.
Architected for Speed
So What’s The Problem?
● Redis server and Sentinel share the same host, so losing a host takes out both
● Four Sentinels
a. With an even number of Sentinels, a 2/2 split can tie: the quorum of 2 is met, but neither side has the majority needed to authorize a failover
● Sending all read traffic to slaves means that you have downtime during failover
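The tie risk with four Sentinels can be sketched as below. This is a minimal illustration, not the speaker's code. It assumes the standard Sentinel rules: `quorum` Sentinels must agree that the master is down, and a majority of all configured Sentinels must authorize the failover. The function name `canFailover` and its parameters are made up for this example.

```typescript
// Sketch: can a group of Sentinels that can still reach each other start a failover?
// Assumption: `quorum` Sentinels must agree the master is down, and a majority
// of ALL configured Sentinels must authorize the failover.
function canFailover(
  totalSentinels: number,
  reachableSentinels: number,
  quorum: number
): boolean {
  const majority = Math.floor(totalSentinels / 2) + 1;
  const agreesMasterIsDown = reachableSentinels >= quorum;
  const canAuthorize = reachableSentinels >= majority;
  return agreesMasterIsDown && canAuthorize;
}

// Four Sentinels split 2/2 by a network partition, quorum = 2:
// the quorum is met, but a majority needs 3, so neither side can proceed.
const evenSplit = canFailover(4, 2, 2);

// Five Sentinels split 3/2: only the larger side can proceed.
const oddSplitLargerSide = canFailover(5, 3, 2);
const oddSplitSmallerSide = canFailover(5, 2, 2);

console.log({ evenSplit, oddSplitLargerSide, oddSplitSmallerSide });
```

With an odd number of Sentinels, one side of any two-way split always holds a majority, which is why odd counts are the usual recommendation.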
Steps in Failover
1. Master is unreachable
2. Sentinels reach quorum and failover is initiated
3. A slave is elected the new master
4. New master does full BGSAVE
5. Master syncs data to existing slaves
6. Data is loaded into memory
7. Slave serves traffic
Steps in Failover (1GB in Memory)
1. Master is unreachable
2. Sentinels reach quorum and failover is initiated (30 seconds)
3. A slave is elected the new master
4. New master does full BGSAVE (9 seconds)
5. Master syncs data to existing slaves (39 seconds)
6. Data is loaded into memory (8 seconds)
7. Slave serves traffic
Total Time: ~1.5 minutes
Steps in Failover (5GB in Memory)
1. Master is unreachable
2. Sentinels reach quorum and failover is initiated (30 seconds)
3. A slave is elected the new master
4. New master does full BGSAVE (40 seconds)
5. Master syncs data to existing slaves (122 seconds)
6. Data is loaded into memory (43 seconds)
7. Slave serves traffic
Total Time: ~4 minutes
Steps in Failover (20GB in Memory)
1. Master is unreachable
2. Sentinels reach quorum and failover is initiated (30 seconds)
3. A slave is elected the new master
4. New master does full BGSAVE (181 seconds)
5. Master syncs data to existing slaves (305 seconds)
6. Data is loaded into memory (238 seconds)
7. Slave serves traffic
Total Time: ~12.5 minutes
Steps in Failover (40GB in Memory)
1. Master is unreachable
2. Sentinels reach quorum and failover is initiated (30 seconds)
3. A slave is elected the new master
4. New master does full BGSAVE (243 seconds)
5. Master syncs data to existing slaves (425 seconds)
6. Data is loaded into memory (354 seconds)
7. Slave serves traffic
Total Time: ~18 minutes
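To see how the totals follow from the individual steps, the per-step timings on the slides can be summed directly. The sketch below only adds up the numbers shown above. The type and field names are invented for this example.

```typescript
// Per-step timings in seconds, copied from the failover slides above.
// The interface and field names are illustrative, not from the talk.
interface FailoverTimings {
  datasetGb: number;
  quorumSeconds: number; // step 2: Sentinels reach quorum
  bgsaveSeconds: number; // step 4: new master does full BGSAVE
  syncSeconds: number;   // step 5: master syncs data to slaves
  loadSeconds: number;   // step 6: data is loaded into memory
}

const measurements: FailoverTimings[] = [
  { datasetGb: 1, quorumSeconds: 30, bgsaveSeconds: 9, syncSeconds: 39, loadSeconds: 8 },
  { datasetGb: 5, quorumSeconds: 30, bgsaveSeconds: 40, syncSeconds: 122, loadSeconds: 43 },
  { datasetGb: 20, quorumSeconds: 30, bgsaveSeconds: 181, syncSeconds: 305, loadSeconds: 238 },
  { datasetGb: 40, quorumSeconds: 30, bgsaveSeconds: 243, syncSeconds: 425, loadSeconds: 354 },
];

function totalSeconds(t: FailoverTimings): number {
  return t.quorumSeconds + t.bgsaveSeconds + t.syncSeconds + t.loadSeconds;
}

for (const m of measurements) {
  const total = totalSeconds(m);
  console.log(`${m.datasetGb}GB: ${total}s (~${(total / 60).toFixed(1)} min)`);
}
// Totals: 86s, 235s, 754s and 1052s for 1GB, 5GB, 20GB and 40GB.
```

Only the 30-second detection step is constant. The other three steps grow with the dataset, and the listed 5GB steps add up to 235 seconds, which is closer to four minutes.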
Slaves Become Unreachable During Failover
Investigation
1. What is causing the failover?
2. Why is the data growing so quickly?
1. What’s causing the failover?
1. Out of memory
2. Saturated client connections
3. Gremlins
2. Why is the data growing so quickly?
1. Can you control the growth of data?
2. If you can’t control it, at least monitor it!
3. Think about data in terms of volatile vs non-volatile
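One way to act on "at least monitor it" is to turn two memory readings into a rough time-to-limit estimate. The sketch below is hypothetical and not from the talk. It assumes linear growth between two samples of the `used_memory` value reported by Redis INFO. `MemorySample` and `hoursUntilLimit` are names invented for this example.

```typescript
// Hypothetical helper: estimate hours until memory use reaches a limit,
// assuming linear growth between two samples.
interface MemorySample {
  takenAtMs: number;       // timestamp of the sample in milliseconds
  usedMemoryBytes: number; // for example `used_memory` from Redis INFO
}

function hoursUntilLimit(
  older: MemorySample,
  newer: MemorySample,
  limitBytes: number
): number | null {
  const elapsedHours = (newer.takenAtMs - older.takenAtMs) / 3600000;
  const grownBytes = newer.usedMemoryBytes - older.usedMemoryBytes;
  // No estimate when the data is not growing or the samples are out of order.
  if (elapsedHours <= 0 || grownBytes <= 0) return null;
  const bytesPerHour = grownBytes / elapsedHours;
  return Math.max(0, (limitBytes - newer.usedMemoryBytes) / bytesPerHour);
}

const GIB = 1024 ** 3;
const estimate = hoursUntilLimit(
  { takenAtMs: 0, usedMemoryBytes: 1 * GIB },
  { takenAtMs: 2 * 3600000, usedMemoryBytes: 2 * GIB },
  5 * GIB
);
console.log(estimate); // 6: growing 0.5 GiB per hour with 3 GiB of headroom
```

An estimate like this could feed an alert before the dataset reaches a size where failover takes many minutes.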
3. How can we be better clients of Redis?
1. Connection Pooling!
a. https://github.com/luin/ioredis
2. Fast fail if connection is not ready
3. Backoff strategy for retry
ioredis
https://github.com/luin/ioredis
https://www.npmjs.com/package/ioredis
Client Singleton
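The code from this slide did not survive in the transcript. As a rough idea of what a client singleton could look like, here is a minimal sketch. It is not the speaker's implementation. `FakeClient` is a stand-in for a real Redis client such as an ioredis instance, and the host name is a made-up placeholder.

```typescript
// Sketch of a client singleton. `FakeClient` is a placeholder for a real
// Redis client (for example an ioredis instance); it only counts constructions.
class FakeClient {
  static created = 0;
  readonly host: string;

  constructor(host: string) {
    this.host = host;
    FakeClient.created += 1;
  }
}

let sharedClient: FakeClient | undefined;

// Every caller gets the same instance, so the process reuses one connection
// instead of opening a new one per request.
function getClient(): FakeClient {
  if (sharedClient === undefined) {
    sharedClient = new FakeClient("redis.example.internal"); // hypothetical host
  }
  return sharedClient;
}

const first = getClient();
const second = getClient();
console.log(first === second, FakeClient.created); // true 1
```

In a Node application the same effect is often achieved by creating the client once in a module and exporting it, since modules are cached after the first import.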
Tuning ioredis Config
1. keepAlive - initial delay in milliseconds for TCP keep-alive on the socket (0 by default), which helps keep the shared connection to Redis open
2. connectTimeout - milliseconds before a timeout occurs during the initial connection to the Redis server
3. enableReadyCheck - wait for the server to load the database from disk before sending commands
4. retryStrategy - wait an increasing amount of time with each connection attempt
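The four options above could be put together roughly as follows. This is a sketch under assumptions, not the configuration used at Napster. `RedisClientOptions` is a local type that mirrors the ioredis option names so the example runs without the library. `enableOfflineQueue` is an ioredis option that the slide does not mention, included here as one assumed way to get "fast fail if connection is not ready". All numeric values are illustrative.

```typescript
// Local type mirroring the ioredis option names from the slide.
// enableOfflineQueue is not on the slide; it is an assumption about how
// fast-fail could be configured.
interface RedisClientOptions {
  keepAlive: number;
  connectTimeout: number;
  enableReadyCheck: boolean;
  enableOfflineQueue: boolean;
  retryStrategy: (attempt: number) => number;
}

// Linear backoff with a cap: wait longer after each failed attempt, but never
// longer than maxMs. The 50 ms step and 2 s cap are illustrative values.
function backoffDelayMs(attempt: number, stepMs = 50, maxMs = 2000): number {
  return Math.min(attempt * stepMs, maxMs);
}

const options: RedisClientOptions = {
  keepAlive: 0,              // TCP keep-alive initial delay in ms
  connectTimeout: 10000,     // give up on the initial connection after 10 s
  enableReadyCheck: true,    // wait until the server has loaded its data
  enableOfflineQueue: false, // reject commands immediately while not ready
  retryStrategy: (attempt) => backoffDelayMs(attempt),
};

console.log([1, 10, 100].map((n) => options.retryStrategy(n))); // [ 50, 500, 2000 ]
```

With ioredis installed, an object of this shape would be passed to the client constructor. A capped backoff keeps clients from hammering a server that is still loading data after a failover.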
4. Build your Redis env around your data
1. Volatile vs non-volatile
a. Are you setting a TTL on keys?
2. What data is accessed the most?
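To answer "are you setting a TTL on keys?", the replies of the Redis TTL command can be bucketed. The sketch below relies on the documented reply values: -2 when the key does not exist, -1 when it exists without an expiry, and a non-negative number of seconds otherwise. The sample replies are made up, and `classifyTtl` is a name invented for this example.

```typescript
// Redis TTL replies: -2 = key does not exist, -1 = key exists with no expiry,
// any value >= 0 = seconds until the key expires.
type Volatility = "volatile" | "non-volatile" | "missing";

function classifyTtl(ttlSeconds: number): Volatility {
  if (ttlSeconds === -2) return "missing";
  if (ttlSeconds === -1) return "non-volatile";
  return "volatile";
}

// Hypothetical TTL replies gathered while sampling keys.
const sampledTtls = [3600, -1, 120, -1, -2, 86400];

const counts: Record<Volatility, number> = {
  volatile: 0,
  "non-volatile": 0,
  missing: 0,
};
for (const ttl of sampledTtls) {
  counts[classifyTtl(ttl)] += 1;
}

console.log(counts); // volatile: 3, non-volatile: 2, missing: 1
```

A high share of non-volatile keys is a hint that the dataset will keep growing unless something else removes the data.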
Client Initializer
Architected for Availability
THANK YOU
Me:
apollack@napster.com
github.com/lolpack
lolpack.me
Napster API Team:
@napsterAPI
Links:
White Paper: lolpack.me/rediswhitepaper.pdf
Try out Napster: order.napster.com/developer
API Docs: developer.napster.com

More Related Content

What's hot

Automatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You JiAutomatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You JiCeph Community
 
Global deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon OhGlobal deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon OhCeph Community
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comIvan Kruglov
 
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuff
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuffBuildinga billionuserloadbalancer may2015-sre-con15europe-shuff
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuffPatrick Shuff
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
Running Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesRunning Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesanynines GmbH
 
[En] IPVS for Docker Containers
[En] IPVS for Docker Containers[En] IPVS for Docker Containers
[En] IPVS for Docker ContainersAndrey Sibirev
 
Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing Terral R Jordan
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthPerforce
 
SaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google ScaleSaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google ScaleSaltStack
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Serversupertom
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Community
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Fwdays
 
Experience Report: Cloud Foundry Open Source Operations | anynines
Experience Report: Cloud Foundry Open Source Operations | anyninesExperience Report: Cloud Foundry Open Source Operations | anynines
Experience Report: Cloud Foundry Open Source Operations | anyninesanynines GmbH
 
JRuby - Everything in a single process
JRuby - Everything in a single processJRuby - Everything in a single process
JRuby - Everything in a single processocher
 
SaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsSaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsThomas Jackson
 
Chaos Engineering for Docker
Chaos Engineering for DockerChaos Engineering for Docker
Chaos Engineering for DockerAlexei Ledenev
 
How Typepad changed their architecture without taking down the service
How Typepad changed their architecture without taking down the serviceHow Typepad changed their architecture without taking down the service
How Typepad changed their architecture without taking down the serviceroyans
 

What's hot (19)

Automatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You JiAutomatic Operation Bot for Ceph - You Ji
Automatic Operation Bot for Ceph - You Ji
 
Global deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon OhGlobal deduplication for Ceph - Myoungwon Oh
Global deduplication for Ceph - Myoungwon Oh
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.com
 
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuff
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuffBuildinga billionuserloadbalancer may2015-sre-con15europe-shuff
Buildinga billionuserloadbalancer may2015-sre-con15europe-shuff
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Running Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anyninesRunning Cloud Foundry for 12 months - An experience report | anynines
Running Cloud Foundry for 12 months - An experience report | anynines
 
[En] IPVS for Docker Containers
[En] IPVS for Docker Containers[En] IPVS for Docker Containers
[En] IPVS for Docker Containers
 
Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
 
SaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google ScaleSaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google Scale
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
 
Experience Report: Cloud Foundry Open Source Operations | anynines
Experience Report: Cloud Foundry Open Source Operations | anyninesExperience Report: Cloud Foundry Open Source Operations | anynines
Experience Report: Cloud Foundry Open Source Operations | anynines
 
JRuby - Everything in a single process
JRuby - Everything in a single processJRuby - Everything in a single process
JRuby - Everything in a single process
 
SaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertoolsSaltConf 2014: Safety with powertools
SaltConf 2014: Safety with powertools
 
Chaos Engineering for Docker
Chaos Engineering for DockerChaos Engineering for Docker
Chaos Engineering for Docker
 
Os Webb
Os WebbOs Webb
Os Webb
 
How Typepad changed their architecture without taking down the service
How Typepad changed their architecture without taking down the serviceHow Typepad changed their architecture without taking down the service
How Typepad changed their architecture without taking down the service
 

Similar to RedisConf17 - Too Big to Failover - A cautionary tale of scaling Redis

Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsParis Carbone
 
PhpTek Ten Things to do to make your MySQL servers Happier and Healthier
PhpTek Ten Things to do to make your MySQL servers Happier and HealthierPhpTek Ten Things to do to make your MySQL servers Happier and Healthier
PhpTek Ten Things to do to make your MySQL servers Happier and HealthierDave Stokes
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsKyle Hailey
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2ice799
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013Server Density
 
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...Flink Forward
 
Rails performance at Justin.tv - Guillaume Luccisano
Rails performance at Justin.tv - Guillaume LuccisanoRails performance at Justin.tv - Guillaume Luccisano
Rails performance at Justin.tv - Guillaume LuccisanoGuillaume Luccisano
 
The care and feeding of a MySQL database
The care and feeding of a MySQL databaseThe care and feeding of a MySQL database
The care and feeding of a MySQL databaseDave Stokes
 
My talk at Linux Piter 2015
My talk at Linux Piter 2015My talk at Linux Piter 2015
My talk at Linux Piter 2015Alex Chistyakov
 
Advanced Administration, Monitoring and Backup
Advanced Administration, Monitoring and BackupAdvanced Administration, Monitoring and Backup
Advanced Administration, Monitoring and BackupMongoDB
 
Failover or not to failover
Failover or not to failoverFailover or not to failover
Failover or not to failoverHenrik Ingo
 
Pluk2013 bodybuilding ratheesh
Pluk2013 bodybuilding ratheeshPluk2013 bodybuilding ratheesh
Pluk2013 bodybuilding ratheeshRatheesh Kaniyala
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status ReportLiu Yuan
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Low level java programming
Low level java programmingLow level java programming
Low level java programmingPeter Lawrey
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFireVMware Tanzu
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesRose Toomey
 

Similar to RedisConf17 - Too Big to Failover - A cautionary tale of scaling Redis (20)

Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analytics
 
PhpTek Ten Things to do to make your MySQL servers Happier and Healthier
PhpTek Ten Things to do to make your MySQL servers Happier and HealthierPhpTek Ten Things to do to make your MySQL servers Happier and Healthier
PhpTek Ten Things to do to make your MySQL servers Happier and Healthier
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O Statistics
 
Performance
PerformancePerformance
Performance
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
 
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...
Flink Forward Berlin 2018: Lasse Nedergaard - "Our successful journey with Fl...
 
Rails performance at Justin.tv - Guillaume Luccisano
Rails performance at Justin.tv - Guillaume LuccisanoRails performance at Justin.tv - Guillaume Luccisano
Rails performance at Justin.tv - Guillaume Luccisano
 
The care and feeding of a MySQL database
The care and feeding of a MySQL databaseThe care and feeding of a MySQL database
The care and feeding of a MySQL database
 
My talk at Linux Piter 2015
My talk at Linux Piter 2015My talk at Linux Piter 2015
My talk at Linux Piter 2015
 
Advanced Administration, Monitoring and Backup
Advanced Administration, Monitoring and BackupAdvanced Administration, Monitoring and Backup
Advanced Administration, Monitoring and Backup
 
Failover or not to failover
Failover or not to failoverFailover or not to failover
Failover or not to failover
 
Pluk2013 bodybuilding ratheesh
Pluk2013 bodybuilding ratheeshPluk2013 bodybuilding ratheesh
Pluk2013 bodybuilding ratheesh
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Low level java programming
Low level java programmingLow level java programming
Low level java programming
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 

More from Redis Labs

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Labs
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Redis Labs
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...Redis Labs
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020Redis Labs
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Redis Labs
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis Labs
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Redis Labs
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Redis Labs
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Redis Labs
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...Redis Labs
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Redis Labs
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Redis Labs
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Redis Labs
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020Redis Labs
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Redis Labs
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Redis Labs
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Redis Labs
 

More from Redis Labs (20)

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
RedisConf17 - Too Big to Failover - A cautionary tale of scaling Redis

  • 1. “TOO BIG TO FAILOVER” A cautionary tale of scaling Redis Aaron Pollack - May 2017
  • 2. Presentation Summary ● How Redis is used at Napster ● Problems with failover at scale ● Our solution for constant time failovers
  • 4. ● Rhapsody rebranded as Napster last Spring ● Provides on-demand and radio streaming for mobile and desktop apps ● Powers on-demand streaming for apps like iHeartRadio The cat is back!
  • 6. NAPSTER API SNAPSHOT ● API Gateway Layer ● 1k developers using the API ● 70m requests/day ● 7k Redis ops/sec
  • 7. We LOVE Redis (mostly) ● Fast! - Response times <10ms to Redis cluster with network round trip included. ● Simple - Built-in data types translate easily into JS. Replication comes free. ● Available - Redis is mission critical for us. When it’s down, we’re down.
  • 12. So What’s The Problem? ● Redis server and sentinel share the same host ● Four sentinels a. An even number means that there is a chance for ties if quorum is 2 ● Sending all read traffic to slaves means that you have downtime during failover
  • 18. Steps in Failover (1GB in Memory): 1. Master is unreachable 2. Sentinels reach quorum and failover is initiated (30 seconds) 3. A new slave is elected master 4. New master does full BGSAVE (9 seconds) 5. Master syncs data to existing slaves (39 seconds) 6. Data is loaded into memory (8 seconds) 7. Slave serves traffic Total Time: ~1.5 minutes
  • 19. Steps in Failover (5GB in Memory): 1. Master is unreachable 2. Sentinels reach quorum and failover is initiated (30 seconds) 3. A new slave is elected master 4. New master does full BGSAVE (40 seconds) 5. Master syncs data to existing slaves (122 seconds) 6. Data is loaded into memory (43 seconds) 7. Slave serves traffic Total Time: ~4 minutes
  • 20. Steps in Failover (20GB in Memory): 1. Master is unreachable 2. Sentinels reach quorum and failover is initiated (30 seconds) 3. A new slave is elected master 4. New master does full BGSAVE (181 seconds) 5. Master syncs data to existing slaves (305 seconds) 6. Data is loaded into memory (238 seconds) 7. Slave serves traffic Total Time: ~12.5 minutes
  • 21. Steps in Failover (40GB in Memory): 1. Master is unreachable 2. Sentinels reach quorum and failover is initiated (30 seconds) 3. A new slave is elected master 4. New master does full BGSAVE (243 seconds) 5. Master syncs data to existing slaves (425 seconds) 6. Data is loaded into memory (354 seconds) 7. Slave serves traffic Total Time: ~18 minutes
  • 22.
  • 23. Slaves Become Unreachable During Failover
  • 24. Investigation: 1. What is causing the failover? 2. Why is the data growing so quickly?
  • 27. 1. What’s causing the failover? 1. Out of memory 2. Saturated client connections 3. Gremlins
  • 28. 2. Why is the data growing so quickly? 1. Can you control the growth of data? 2. If you can’t control it, at least monitor it! 3. Think about data in terms of volatile vs non-volatile
  • 29. 3. How can we be better clients of Redis? 1. Connection Pooling! a. https://github.com/luin/ioredis 2. Fast fail if connection is not ready 3. Backoff strategy for retry
  • 32. Tuning ioredis Config 1. keepAlive - 0 (by default); the initial delay in milliseconds for TCP keep-alive on the long-lived connection to Redis 2. connectTimeout - milliseconds before a timeout occurs during the initial connection to the Redis server 3. enableReadyCheck - wait for the server to load the database from disk before sending commands 4. retryStrategy - wait an increasing amount of time with each reconnection attempt.
  • 33. 4. Build your Redis env around your data 1. Volatile vs non-volatile a. Are you setting a TTL on keys? 2. What data is accessed the most?
  • 37. Napster API Team: @napsterAPI Links: White Paper: lolpack.me/rediswhitepaper.pdf Try out Napster: order.napster.com/developer API Docs: developer.napster.com
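
The per-step timings on the "Steps in Failover" slides can be added up to see how total failover time grows with data set size. The snippet below is only a worked sum of the numbers quoted on those slides; the names `failoverTimings` and `totalSeconds` are made up for illustration and are not from the talk.

```javascript
// Per-step failover durations in seconds, copied from the
// "Steps in Failover" slides (1GB, 5GB, 20GB and 40GB in memory).
const failoverTimings = [
  { sizeGB: 1, quorum: 30, bgsave: 9, sync: 39, load: 8 },
  { sizeGB: 5, quorum: 30, bgsave: 40, sync: 122, load: 43 },
  { sizeGB: 20, quorum: 30, bgsave: 181, sync: 305, load: 238 },
  { sizeGB: 40, quorum: 30, bgsave: 243, sync: 425, load: 354 },
];

// Sum the four timed steps for one data set size.
function totalSeconds(t) {
  return t.quorum + t.bgsave + t.sync + t.load;
}

for (const t of failoverTimings) {
  const total = totalSeconds(t);
  console.log(`${t.sizeGB}GB: ${total}s (~${(total / 60).toFixed(1)} min)`);
}
// 1GB: 86s (~1.4 min)
// 5GB: 235s (~3.9 min)
// 20GB: 754s (~12.6 min)
// 40GB: 1052s (~17.5 min)
```

Only the 30-second quorum step stays constant. The BGSAVE, sync and load steps all grow with the data set, which is why the total goes from under two minutes at 1GB to roughly 18 minutes at 40GB.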

Editor's Notes

  1. Issues my team faced while scaling Redis in production.
  2. Address the elephant in the room.
  3. How we use Redis: I work on the team that provides the public-facing API for Napster. We use Redis to store information about our developers and to authenticate our users.
  4. 70 million requests a day that fall through the cache. We store data about our developers, some user data, but mostly token sets as part of the OAuth flow.
  5. If you lose one, you lose the other. You are subject to the 28K port limit.
  6. A quorum of 3 when you only have 4 sentinels can delay the time it takes to elect a new master.
  7. Once the new master is elected, it can immediately handle writes.
  8. The default of 30 seconds allows for network hiccups and any other event that might trigger an unnecessary failover. We’ve tried to tune this down to decrease overall failover time, but if it’s too short it becomes too sensitive.
  9. When developing with small data sets it’s almost unnoticeable.
  10. Authenticated calls are failing. Some health checks are failing. By the time you have been alerted and look at the problem, it’s fixed itself.
  11. Unacceptable amount of downtime. A restart won’t do anything for you. You are at the mercy of the time it takes to sync.
  12. Can anyone else who has been on call relate?
  13. There is a linear correlation between data growth and the time it takes a slave to recover and become readable. BGSAVE doubles memory. Perfect storm of connections piling up, the BGSAVE memory issue, and tokens not expiring fast enough.
  14. The dust has settled and now it’s time to investigate the issue.
  15. Set a maxmemory limit and a key expiry policy. A key expiry policy only works for ephemeral data, or if you are willing to lose persisted data.
  16. Make sure your app/client is not making a bad problem worse for Redis by re-establishing connections as soon as they fail.
  17. Systems will fail, so building redundancy into critical systems is essential.
  18. We are at the mercy of our clients’ implementation of OAuth. Monitoring usage allows us to proactively reach out to developers so they understand how the API should be used and we don’t have to store extra data. We found a client was requesting a new auth token before each authenticated call. We have to allow all new token sets in and don’t have a way of eagerly expiring old refresh tokens. Developer data has to stay; ephemeral data like refresh tokens can go.
  19. Switched NPM packages to ioredis and have never looked back. There was a bug in our old package where it wouldn’t kill the old connection after a failed Redis lookup. Hit the 28K port limit during a Redis outage.
  20. Finally, some code! Create a global client referenced in the function to create a JS singleton.
  21. Finally, some code! Create a global client referenced in the function to create a JS singleton. This ensures any place we require Redis throughout the app is using the same connection.
  22. Key configuration: `role: master`. These configs are helpful during problem or outage situations. Enabling the offline queue is dangerous for us: the only time we are offline is during an outage, so queueing up requests is not doing us any favors. Retry strategy: good for network outages or failovers.
  23. Redis is so fast and flexible, you may not consider volatility vs space issues. We were storing critical data alongside ephemeral data.
  24. The speed is not too shabby either: we can still auth a user in <50ms with backend roundtrip included. We traded some performance, but not too much. No Redis downtime since the split. Easy upgrades (30 second failover).
  25. You can go to order.napster.com/developer and get a free 6 month trial of Napster. Build an app with our APIs and then tweet at us, we would love to see what you come up with!
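
The singleton client and the ioredis settings described in the notes above might look roughly like the sketch below. The option names (`keepAlive`, `connectTimeout`, `enableReadyCheck`, `enableOfflineQueue`, `retryStrategy`) are ioredis options, but every value here is an assumption rather than Napster's actual configuration. `getClient` and its `createClient` parameter are hypothetical names, used so the sketch runs without ioredis installed.

```javascript
// Assumed values for illustration only; the talk does not list the real ones.
const redisOptions = {
  keepAlive: 0, // ioredis default
  connectTimeout: 1000, // ms to wait on the initial connection (assumed value)
  enableReadyCheck: true, // wait until the server has loaded its data set
  enableOfflineQueue: false, // fail fast instead of queueing commands while offline
  // Backoff: wait longer after each failed attempt, capped at 2 seconds.
  retryStrategy(times) {
    return Math.min(times * 50, 2000);
  },
};

// Module-level reference so every caller shares one client (JS singleton).
let client = null;

// createClient is a hypothetical factory, e.g. (opts) => new Redis(opts),
// where Redis is the constructor exported by the ioredis package.
function getClient(createClient) {
  if (client === null) {
    client = createClient(redisOptions);
  }
  return client;
}
```

In an application this would be called as `getClient((opts) => new Redis(opts))` after `const Redis = require("ioredis")`. With the offline queue disabled, commands issued while the connection is not ready are rejected immediately, which matches the "fast fail" goal rather than letting requests pile up during an outage.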