(DAT407) Amazon ElastiCache: Deep Dive


Peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns of our Memcached and Redis offerings and how customers have used them for in-memory operations and achieved improved latency and throughput for applications. During this session, we review best practices, design patterns, and anti-patterns related to Amazon ElastiCache.

Published in: Technology


  1. Amazon ElastiCache Deep Dive: Scaling Your Data in a Real-Time World (DAT407). Nate Wiger, Principal Solutions Architect, AWS; Tom Kerr, Software Engineer, Riot Games. October 8, 2015. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Amazon ElastiCache
      • Managed in-memory service
      • Memcached or Redis
      • Cluster of nodes
      • Read replicas
      • Monitoring + alerts
  3. Modern Web / Mobile App (diagram: ELB, App, External APIs)
  4. Memcached vs Redis
      Memcached:
      • Flat string cache
      • Multithreaded
      • No persistence
      • Low maintenance
      • Easy to scale horizontally
      Redis:
      • Single-threaded
      • Persistence
      • Atomic operations
      • Advanced data types - http://redis.io/topics/data-types
      • Pub/sub messaging
      • Read replicas / failover
  5. Storing JSON – Memcached vs Redis
          # Memcached: Serialize string
          str_json = Encode({"name": "Nate Wiger", "gender": "M"})
          SET user:nateware str_json
          GET user:nateware
          json = Decode(str_json)

          # Redis: Use a hash!
          HMSET user:nateware name "Nate Wiger" gender M
          HGET user:nateware name
          >> Nate Wiger
          HMGET user:nateware name gender
          >> Nate Wiger
          >> M
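The difference between the two approaches on this slide can be sketched in Python. This is only an illustration: the two dictionaries are in-memory stand-ins for the stores, no server is contacted, and the helper names (`memcached_set_user`, `redis_hmset`, and so on) are hypothetical, not part of any client library.

```python
import json

# In-memory stand-ins for the two stores (assumption: no real server involved).
memcached = {}      # key -> opaque serialized string
redis_hashes = {}   # key -> {field: value}


def memcached_set_user(key, user):
    """Store the whole record as one serialized JSON string."""
    memcached[key] = json.dumps(user)


def memcached_get_field(key, field):
    """Reading one field still means fetching and decoding the whole string."""
    return json.loads(memcached[key])[field]


def redis_hmset(key, mapping):
    """Store each attribute as its own field, like HMSET."""
    redis_hashes.setdefault(key, {}).update(mapping)


def redis_hget(key, field):
    """Fetch a single field, like HGET."""
    return redis_hashes[key].get(field)


def redis_hmget(key, *fields):
    """Fetch several fields in one call, like HMGET."""
    return [redis_hashes[key].get(f) for f in fields]


memcached_set_user("user:nateware", {"name": "Nate Wiger", "gender": "M"})
redis_hmset("user:nateware", {"name": "Nate Wiger", "gender": "M"})
```

With a flat string cache the application pays the encode/decode cost on every access; with a hash, individual fields can be read or updated on their own.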
  6. ElastiCache with Memcached
  7. ElastiCache with Memcached – Development (diagram: Region; Availability Zone A, Availability Zone B; Auto Scaling group; ElastiCache cluster)
  8. ElastiCache with Memcached – Development (same diagram, marked "Nope")
  9. Add Nodes to Memcached Cluster
  10. Add Nodes to Memcached Cluster
  11. Add Nodes to Memcached Cluster
          aws elasticache modify-cache-cluster \
              --cache-cluster-id my-cache-cluster \
              --num-cache-nodes 4 \
              --apply-immediately

          # response
          "CacheClusterStatus": "modifying",
          "PendingModifiedValues": {
              "NumCacheNodes": 4
          },
  12. ElastiCache with Memcached – High Availability (diagram: Region; Availability Zone A, Availability Zone B; Auto Scaling group; ElastiCache cluster)
  13. ElastiCache with Memcached – Scale Out (diagram: Region; Availability Zone A, Availability Zone B; Auto Scaling group; ElastiCache cluster)
  14. Sharding
  15. Consistent Hashing
      Client pre-calculates a hash ring for best key distribution
      http://berb.github.io/diploma-thesis/original/062_internals.html
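A minimal sketch of the hash ring idea, assuming MD5 as the hash function and 100 virtual points per node; both are illustrative choices, and the `HashRing` class here is hypothetical rather than the implementation used by any of the client libraries in this deck.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each node owns many points on a circle of hashes."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def get_node(self, key):
        """Walk clockwise from the key's hash to the next point on the ring."""
        if not self._points:
            raise ValueError("hash ring is empty")
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for a cache cluster: when a node is added, the only keys that change owner are the ones that move to the new node, so most of the cache stays warm.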
  16. It’s All Been Done Before
      • Ruby
          • Dalli https://github.com/mperham/dalli
          • Plus ElastiCache https://github.com/ktheory/dalli-elasticache
      • Python
          • HashRing / MemcacheRing https://pypi.python.org/pypi/hash_ring/
          • Django w/ Auto-Discovery https://github.com/gusdan/django-elasticache
      • Node.js
          • node-memcached https://github.com/3rd-Eden/node-memcached
          • Auto-Discovery example http://stackoverflow.com/questions/17046661
      • Java
          • SpyMemcached https://github.com/dustin/java-memcached-client
          • ElastiCache Client https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java
      • PHP
          • ElastiCache Client https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-php
      • .NET
          • ElastiCache Client https://github.com/awslabs/elasticache-cluster-config-net
  17. Auto-Discovery Endpoint
  18. Memcached Node Auto-Discovery
          # PHP
          $server_endpoint = "mycache.z2vq55.cfg.usw2.cache.amazonaws.com";
          $cache = new Memcached();
          $cache->setOption(
              Memcached::OPT_CLIENT_MODE, Memcached::DYNAMIC_CLIENT_MODE);
          # Set config endpoint as only server
          $cache->addServer($server_endpoint, 11211);
      DIY: http://bit.ly/elasticache-autodisc
  19. App Caching Patterns
  20. Be Lazy
          # Python
          def get_user(user_id):
              record = cache.get(user_id)
              if record is None:
                  # Run a DB query
                  record = db.query("select * from users where id = ?", user_id)
                  cache.set(user_id, record)
              return record

          # App code
          user = get_user(17)
  21. Write On Through
          # Python
          def save_user(user_id, values):
              record = db.query("update users ... where id = ?", user_id, values)
              cache.set(user_id, record)
              return record

          # App code
          user = save_user(17, {"name": "Nate Dogg"})
  22. Combo Move!
          def save_user(user_id, values):
              record = db.query("update users ... where id = ?", user_id, values)
              cache.set(user_id, record, 300)  # TTL
              return record

          def get_user(user_id):
              record = cache.get(user_id)
              if record is None:
                  record = db.query("select * from users where id = ?", user_id)
                  cache.set(user_id, record, 300)  # TTL
              return record

          # App code
          save_user(17, {"name": "Nate Diddy"})
          user = get_user(17)
  23. Web Cache with Memcached (Ruby on Rails Example)
          # Gemfile
          gem 'dalli-elasticache'

          # config/environments/production.rb
          endpoint    = "mycluster.abc123.cfg.use1.cache.amazonaws.com:11211"
          elasticache = Dalli::ElastiCache.new(endpoint)
          config.cache_store = :dalli_store, elasticache.servers,
            expires_in: 1.day, compress: true

          # if you change ElastiCache cluster nodes
          elasticache.refresh.client
  24. Thundering Herd
      Causes:
      • Cold cache – app startup
      • Adding / removing nodes
      • Cache key expiration (TTL)
      • Out of cache memory
      Effect: large # of cache misses, then a spike in database load
      Mitigations:
      • Script to populate cache
      • Gradually scale nodes
      • Randomize TTL values
      • Monitor cache utilization
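One way to randomize TTL values, sketched under the assumption that a spread of plus or minus 10% around the base TTL is acceptable; the function name `jittered_ttl` and the default spread are illustrative choices, not something the deck prescribes.

```python
import random


def jittered_ttl(base_ttl, spread=0.1, rng=random):
    """Return base_ttl (seconds) scaled by a random factor within +/- spread."""
    if base_ttl <= 0:
        raise ValueError("base_ttl must be positive")
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    low = base_ttl * (1 - spread)
    high = base_ttl * (1 + spread)
    # Never return less than one second, so the key is not dropped immediately.
    return max(1, round(rng.uniform(low, high)))
```

Used as `cache.set(key, record, jittered_ttl(300))`, keys written at the same moment expire at slightly different times instead of all at once, which spreads the resulting database reads.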
  25. ElastiCache with Redis
  26. Real-time Leaderboard!
      Need uniqueness + ordering
      Easy with Redis Sorted Sets
      (image captions: "It’s mine!" / "Not if I destroy it first!")
          ZADD "leaderboard" 1201 "Gollum"
          ZADD "leaderboard" 963 "Sauron"
          ZADD "leaderboard" 1092 "Bilbo"
          ZADD "leaderboard" 1383 "Frodo"

          ZREVRANGE "leaderboard" 0 -1
          1) "Frodo"
          2) "Gollum"
          3) "Bilbo"
          4) "Sauron"

          ZREVRANK "leaderboard" "Sauron"
          (integer) 3
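To make the sorted-set semantics concrete, here is a small pure-Python emulation of the three commands on this slide. It is a teaching sketch only: the `Leaderboard` class is hypothetical, it re-sorts on every read (Redis does not), and it talks to no Redis server.

```python
class Leaderboard:
    """In-memory emulation of ZADD / ZREVRANGE / ZREVRANK for one sorted set."""

    def __init__(self):
        self._scores = {}   # member -> score; dict keys give uniqueness

    def zadd(self, member, score):
        """Add a member, or update its score if it already exists."""
        self._scores[member] = score

    def _ranked(self):
        # Highest score first; equal scores fall back to descending member order.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=True)

    def zrevrange(self, start, stop):
        """Members from rank start to stop inclusive; negative indexes count from the end."""
        ranked = self._ranked()
        n = len(ranked)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        return ranked[start:stop + 1]

    def zrevrank(self, member):
        """Zero-based rank with the highest score at rank 0, or None if absent."""
        if member not in self._scores:
            return None
        return self._ranked().index(member)


board = Leaderboard()
for name, score in [("Gollum", 1201), ("Sauron", 963), ("Bilbo", 1092), ("Frodo", 1383)]:
    board.zadd(name, score)
```

Here `board.zrevrange(0, -1)` gives Frodo, Gollum, Bilbo, Sauron and `board.zrevrank("Sauron")` gives 3, matching the slide.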
  27. Rate Limiting
      Ex: Throttling requests to an API
      Leverages Redis Counters (diagram: ELB, Externally Facing API)
      Reference: http://redis.io/commands/INCR
          FUNCTION LIMIT_API_CALL(APIaccesskey)
              limit = HGET(APIaccesskey, "limit")
              time = CURRENT_UNIX_TIME()
              keyname = APIaccesskey + ":" + time
              count = GET(keyname)
              IF count != NULL && count > limit THEN
                  ERROR "API request limit exceeded"
              ELSE
                  MULTI
                      INCR(keyname)
                      EXPIRE(keyname, 10)
                  EXEC
                  PERFORM_API_CALL()
              END
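The same per-second counter idea can be sketched in Python with an in-memory dictionary standing in for Redis. The class name `FixedWindowRateLimiter` and its storage are hypothetical. One deliberate difference from the pseudocode: this sketch rejects once the count reaches the limit (`>=`), so exactly `limit` calls are allowed per one-second window.

```python
import time


class FixedWindowRateLimiter:
    """Per-key, per-second request counter with expiring entries."""

    def __init__(self, limit, clock=time.time):
        self.limit = limit
        self._clock = clock
        self._counters = {}   # keyname -> (count, expires_at)

    def allow(self, api_key):
        now = self._clock()
        keyname = f"{api_key}:{int(now)}"   # one counter per key per second
        # Stand-in for EXPIRE: drop counters whose lifetime has passed.
        self._counters = {k: v for k, v in self._counters.items() if v[1] > now}
        count, _ = self._counters.get(keyname, (0, None))
        if count >= self.limit:
            return False
        # Stand-in for INCR + EXPIRE(keyname, 10).
        self._counters[keyname] = (count + 1, now + 10)
        return True
```

In Redis the increment and expiry run inside MULTI/EXEC so that a counter is never left without a TTL; the in-memory version has no such concern because it is single-process.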
  28. Recommendation Engines
      • Redis counters – increment likes/dislikes
      • Redis hashes – list of everyone’s ratings
      • Process with an algorithm like Slope One or Jaccard similarity
      • Ruby example - https://github.com/davidcelis/recommendable
          INCR item:38927:likes
          HSET item:38927:ratings "Susan" 1
          INCR item:38927:dislikes
          HSET item:38927:ratings "Tommy" -1
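As an illustration of the processing step, the sketch below computes Jaccard similarity between two users from a ratings structure shaped like the hashes on this slide. The `ratings` dictionary (including the second item and the rating values beyond Susan and Tommy on item 38927) and the helper names are made-up sample data and hypothetical helpers, not taken from the deck.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def liked_items(ratings, user):
    """Items the user rated +1, given {item_key: {user: rating}}."""
    return {item for item, by_user in ratings.items() if by_user.get(user) == 1}


# Sample data in the shape of the item:<id>:ratings hashes (values are invented).
ratings = {
    "item:38927": {"Susan": 1, "Tommy": -1},
    "item:38928": {"Susan": 1, "Tommy": 1},
}

similarity = jaccard(liked_items(ratings, "Susan"), liked_items(ratings, "Tommy"))
```

Users with a high similarity score are candidates for recommending each other's liked items.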
  29. Chat and Messaging
      • PUBLISH and SUBSCRIBE Redis commands
      • Game or Mobile chat
      • Server intercommunication
          SUBSCRIBE chat_channel:114
          PUBLISH chat_channel:114 "Hello all"
          ["message", "chat_channel:114", "Hello all"]
          UNSUBSCRIBE chat_channel:114
  30. ElastiCache with Redis – Development (diagram: Region; Availability Zone A, Availability Zone B; Auto Scaling group; ElastiCache cluster)
  31. Redis Multi-AZ (diagram: Availability Zone A, Availability Zone B)
      • Use Primary Endpoint
      • Use Read Replicas
      • Auto-Failover
          • Chooses replica with lowest replication lag
          • DNS endpoint is same
  32. ElastiCache with Redis Multi-AZ (diagram: Region; Availability Zone A, Availability Zone B; Auto Scaling group; ElastiCache cluster)
  33. ElastiCache with Redis Multi-AZ (same diagram, next step)
  34. ElastiCache with Redis Multi-AZ (same diagram, next step)
  35. ElastiCache with Redis Multi-AZ (same diagram, next step)
  36. Redis Multi-AZ – Reads and Writes (diagram: ELB, App, External APIs, Replication Group; Reads, Writes)
  37. Redis – Read/Write Connections
          # Ruby example
          # Writes go to the primary endpoint
          redis_write = Redis.new(
            host: 'mygame-dev.z2vq55.ng.0001.usw2.cache.amazonaws.com')
          # Reads are spread across the read replicas
          redis_read = Redis::Distributed.new([
            'redis://mygame-dev-002.z2vq55.ng.0001.usw2.cache.amazonaws.com',
            'redis://mygame-dev-003.z2vq55.ng.0001.usw2.cache.amazonaws.com'
          ])
          # zadd takes the score before the member
          redis_write.zadd("leaderboard", 1976, "nateware")
          # Ranks 0 through 9 are the top 10
          top_10 = redis_read.zrevrange("leaderboard", 0, 9)
  38. Recap – Endpoint Autodetection
      • Cluster endpoints:
          aws elasticache describe-cache-clusters \
              --cache-cluster-id mycluster \
              --show-cache-node-info
      • Redis read replica endpoints:
          aws elasticache describe-replication-groups \
              --replication-group-id myredisgroup
      • Can listen for SNS events: http://bit.ly/elasticache-sns
      Whitepaper: http://bit.ly/elasticache-whitepaper
  39. Splitting Redis By Purpose (diagram: ELB, App, External APIs; Reads, Writes; Replication Group: Leaderboards; Replication Group: User Profiles)
  40. Don’t Plan Ahead!!
      1. Start with one Redis Multi-AZ cluster
      2. Split as needed
      3. Scale read load via replicas
      4. Rinse, repeat
  41. Tune It Up!
  42. Monitoring with CloudWatch (Alarms)
      • CPU
      • Evictions
      • Memory
      • Swap Usage
      • Network In/Out
  43. Key ElastiCache CloudWatch Metrics
      • CPUUtilization
          • Memcached – up to 90% ok
          • Redis – divide by cores (ex: 90% / 4 = 22.5%)
      • SwapUsage low
      • CacheMisses / CacheHits Ratio low / stable
      • Evictions near zero
          • Exception: Russian doll caching
      • CurrConnections stable
      • Whitepaper: http://bit.ly/elasticache-whitepaper
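The "divide by cores" rule for Redis follows from Redis being single-threaded: one saturated core on a four-core node shows up as only about a quarter of host CPU. A small sketch of the arithmetic, with hypothetical function names and the 90% single-core ceiling taken from the slide:

```python
def redis_cpu_alarm_threshold(vcpus, single_core_threshold=90.0):
    """Host-level CPUUtilization (%) at which one Redis core is near its ceiling."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return single_core_threshold / vcpus


def redis_cpu_is_saturated(cpu_utilization, vcpus, single_core_threshold=90.0):
    """True when the reported host CPU implies the Redis core is at or past the ceiling."""
    return cpu_utilization >= redis_cpu_alarm_threshold(vcpus, single_core_threshold)
```

For a four-core node, `redis_cpu_alarm_threshold(4)` reproduces the slide's 22.5%; for Memcached, which is multithreaded, the 90% figure applies to the host metric directly.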
  44. Scaling Up Redis
      1. Snapshot existing cluster to Amazon S3: http://bit.ly/redis-snapshot
      2. Spin up new Redis cluster from snapshot: http://bit.ly/redis-seeding
      3. Profit!
      4. Also good for debugging against a copy of production data
  45. Common Issues
  46. DNS Caching – Redis Failover
      • Failover requires updating a DNS CNAME
      • Can take up to two minutes
      • Watch out for app DNS caching – esp. Java! http://bit.ly/jvm-dns
      • No API for triggering Redis failover
          • Turn off Multi-AZ temporarily
          • Promote replica to primary
          • Turn on Multi-AZ
  47. Swapping During Redis Backup (BGSAVE)
      1. Forks main Redis process
      2. Writes data to disk from child process
      3. Continues to accept traffic on main process
      4. Any key update causes a copy-on-write
      5. Potentially DOUBLES memory usage by Redis
  48. Swapping During Redis Backup – Solutions
      Reduce memory allocated to Redis
      • Set reserved-memory field in parameter groups
      • Evicts more data from memory
      Use larger cache node type
      • More expensive
      • But no data eviction
      Write-heavy apps need extra Redis memory
  49. Redis reserved-memory Parameter
  50. Redis Engine Enhancements
      • Only Available in Amazon ElastiCache
      • Forkless backups = Lower memory usage
          • If enough memory, will still fork (faster)
      • Improved replica sync under heavy write loads
      • Smoother failovers (PSYNC)
      • Two new CloudWatch metrics
          • ReplicationBytes: Number of bytes sent from primary node
          • SaveInProgress: 1/0 value that indicates if save is running
      • Try it today! Redis 2.8.22 or later.
  51. Riot Games: ElastiCache in the Wild (Tom Kerr)
  52. LEAGUE OF LEGENDS
  53. APOLLO
  54. APOLLO: COMMENTS ANYWHERE
  55. APOLLO: COMMENTS ANYWHERE
  56. APOLLO: ARCHITECTURE
  57. APOLLO
      • Replication with automatic failover
      • Replication across availability zones
      • More snapshots, more often
  58. LESS GOOD: Fun Stuff, Deploy Stuff. GOOD: Fun Stuff, Deploy Stuff. (chart labels)
  59. APOLLO
  60. LEADERBOARDS
  61. LEADERBOARDS: ARCHITECTURE
  62. LEADERBOARDS: DATA STORE
  63. US-WEST2:NA:3848433 37
      US-WEST2:NA:3848 37433
      http://redis.io/topics/memory-optimization
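The linked memory-optimization page describes storing many small values inside hashes by splitting each key into a hash key and a field. One plausible reading of this slide, and it is only an interpretation of the flattened text, is that the key `US-WEST2:NA:3848433` with value `37` becomes hash `US-WEST2:NA:3848`, field `433`, value `37`. The sketch below shows that split with a dictionary standing in for Redis; `bucket_key`, `hset`, `hget`, and the three-character field length are assumptions made for illustration.

```python
def bucket_key(full_key, field_len=3):
    """Split a key into (hash_key, field): the last field_len characters become the field."""
    if field_len < 1 or len(full_key) <= field_len:
        raise ValueError("key is too short to split")
    return full_key[:-field_len], full_key[-field_len:]


store = {}   # in-memory stand-in for Redis: hash_key -> {field: value}


def hset(full_key, value):
    """Store the value under the bucketed hash instead of a top-level key."""
    hash_key, field = bucket_key(full_key)
    store.setdefault(hash_key, {})[field] = value


def hget(full_key):
    """Look the value up through the same key split."""
    hash_key, field = bucket_key(full_key)
    return store.get(hash_key, {}).get(field)


hset("US-WEST2:NA:3848433", 37)
hset("US-WEST2:NA:3848434", 12)
```

Both entries land in the single hash `US-WEST2:NA:3848`, so up to a thousand neighbouring IDs share one top-level key; small hashes are what Redis can encode compactly.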
  64. LEADERBOARDS
  65. BEST PRACTICES
      • Replicas with automatic failover
      • Manually snapshot more often
      • Monitor your replication metrics
      • Redis hash key trick
  66. Thank you! Nate Wiger, Principal Solutions Architect, AWS; Tom Kerr, Software Engineer, Riot Games
  67. Remember to complete your evaluations!
