2. Goal
Understand common ElastiCache problems a platform engineer would receive
from application developers, including
1. Unexpected behavior
2. Performance
3. Cluster stability
4. HA/DR
3. Common problems in the application code
1. Not following the cache-aside pattern - cached data becomes stale
2. Redis call embedded in @Transactional - adds 1 ms to 2 ms of latency to the DB
transaction
3. Writing a huge key (> 1 MB) - causes p95 latency spikes during the MIGRATE call,
which is synchronous
4. Forgetting to cache an empty marker when the DB has no value - can cause
cache penetration
5. No reconciliation logic to defend against data inconsistency
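The cache-aside pattern from item 1 can be sketched as follows. This is a minimal illustration: plain dicts stand in for Redis and the database, and all names are hypothetical.

```python
# Minimal cache-aside sketch: dicts stand in for Redis and the DB.
# In production, `cache` would be a Redis client and `db` a real datastore.

cache = {}                      # stand-in for Redis
db = {"user:1": "alice"}        # stand-in for the database

def read(key):
    """Cache-aside read: check the cache first, fall back to the DB, then populate."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value      # populate the cache on a miss
    return value

def write(key, value):
    """Cache-aside write: update the DB, then invalidate (not update) the cache."""
    db[key] = value
    cache.pop(key, None)        # delete so the next read refetches fresh data
```

Invalidating on write rather than updating in place shrinks the window for stale data, which is the failure mode the note warns about.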
4. Cache penetration
Most traffic still hits the DB, because the requested value does not exist in the cache.
Solution:
1. Update cache with empty value if DB has no such value
2. Put a bloom filter in front of the cache to filter out keys that definitely do
not exist.
3. During load testing, make sure the service is still available even if cache
penetration happens.
5. Cache stampede
A large burst of traffic hits the DB, because a cache server just restarted or many values
expired around the same time
Solution:
1. Add randomized jitter to each key's TTL
2. Make sure service is still available even if cache stampede happens
3. Use sharding to limit the impact of a single instance's failure
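Solution 1 (randomized TTL jitter) can be sketched as below, so keys written together do not all expire at the same instant. The base TTL and jitter fraction are illustrative values.

```python
# Sketch: spread expirations over a window instead of a single instant,
# so a batch of keys written together does not expire simultaneously.
import random

BASE_TTL_SECONDS = 600          # nominal TTL (illustrative)
JITTER_FRACTION = 0.1           # spread expirations over +/-10%

def ttl_with_jitter():
    """Return the base TTL plus a uniform random offset within the jitter band."""
    jitter = BASE_TTL_SECONDS * JITTER_FRACTION
    return BASE_TTL_SECONDS + random.uniform(-jitter, jitter)
```

The returned value would then be passed as the expiry when setting the key (e.g., the `ex` argument of a Redis SET).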
6. Distributed lock
A common use case, but we cannot treat most distributed locks as being as safe as
locks within the same process
1. Shifts in a node’s clock (e.g., from NTP adjustments) may cause Redis to expire the
lock key early
2. Record the lock holder (e.g., a unique token) to avoid unlocking another client’s
lock by mistake
3. The naive SET NX implementation is not reliable when a replica is promoted to
master, since the lock key may not have been replicated yet
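Items 2 and 3 motivate storing a unique holder token with the lock and checking it before release. A minimal sketch, with a dict standing in for Redis; note that in real Redis the compare-then-delete in `release` must be made atomic (typically via a Lua script), which the dict version glosses over.

```python
# Sketch: SET NX semantics with a per-holder token, so only the holder
# can release the lock. `store` stands in for Redis.
import uuid

store = {}                      # stand-in for Redis

def acquire(lock_key):
    """Try to take the lock; returns a holder token on success, None otherwise.
    Mirrors: SET lock_key token NX (plus an expiry in real code)."""
    token = str(uuid.uuid4())   # unique per holder
    if lock_key not in store:
        store[lock_key] = token
        return token
    return None

def release(lock_key, token):
    """Release only if we still hold the lock; prevents unlocking by mistake.
    In real Redis this check-and-delete must be atomic (Lua script)."""
    if store.get(lock_key) == token:
        del store[lock_key]
        return True
    return False
```

A second caller's `acquire` fails while the lock is held, and `release` with the wrong token is a no-op.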
7. Sharding
As with any system whose number of shards is fixed upfront (e.g., Kafka), try to avoid
online resharding. Instead, leave headroom in the shard count during capacity planning
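A quick sketch of why online resharding is painful: with naive modulo sharding, changing the shard count remaps most keys. The hash function and key names are illustrative (Redis Cluster itself uses CRC16 over 16384 hash slots, but the remapping effect is the same).

```python
# Sketch: count how many keys change shard when going from 4 to 6 shards
# under naive hash-mod-N placement.
import hashlib

def shard_of(key, num_shards):
    """Map a key to a shard with a stable hash (md5 here, for illustration)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

keys = [f"user:{i}" for i in range(1000)]
moved = sum(1 for k in keys if shard_of(k, 4) != shard_of(k, 6))
# The large majority of keys typically land on a different shard,
# which is why provisioning shard headroom upfront is preferable.
```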
8. Metrics to monitor
1. In general, Redis is more network/memory bound than CPU bound.
2. Try to maintain a cache hit rate above 70%
3. Use Datadog’s ElastiCache dashboard as the starting point.
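The hit rate in item 2 can be derived from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in the stats section of INFO; the sketch below computes it from a plain dict of those fields (the sample numbers are illustrative).

```python
# Sketch: compute cache hit rate from Redis INFO "stats" counters.

def hit_rate(info_stats):
    """Hit rate = hits / (hits + misses); 0.0 when there is no traffic yet."""
    hits = info_stats["keyspace_hits"]
    misses = info_stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

sample = {"keyspace_hits": 900, "keyspace_misses": 100}
# hit_rate(sample) -> 0.9, comfortably above the 70% target
```

Note these counters are cumulative since server start, so for monitoring you would compute the rate over deltas between scrapes rather than over the raw totals.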