Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks

Learning Objectives:
- Learn how to horizontally scale Redis clusters within ElastiCache
- Learn about features to secure data in ElastiCache for Redis
- Learn about ElastiCache for Redis use cases to speed up real-time applications in web, gaming, ad-tech, media

  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Michael Labib, NoSQL Specialist SA 12/6/2017 Scaling Redis Workloads with Amazon ElastiCache
  2. 2. What to expect from this session
        • Amazon ElastiCache overview
        • Scaling your cluster with online re-sharding
        • Amazon ElastiCache security and encryption
        • Common usage patterns
        • Best practices
  3. 3. Amazon ElastiCache
        • In-memory key-value store supporting Redis 3.2.10 and Memcached 1.4.34
        • High-performance
        • Fully managed; zero admin
        • Highly available and reliable
        • Hardened by Amazon
  4. 4. Redis Overview
        • Powerful: ~200 commands + Lua scripting
        • In-memory data structure server
        • Utility data structures: strings, lists, hashes, sets, sorted sets, bitmaps, and HyperLogLogs
        • Simple
        • Atomic operations; supports transactions
        • Ridiculously fast: <1 ms latency for most commands
        • Highly available: replication
        • Persistence
        • Open source
  5. 5. Amazon ElastiCache
        REDIS:6379> SMEMBERS features
        1) "Easy to deploy & monitor"
        REDIS:6379> hget feature:details "deploy-monitor"
        [Diagram: AWS Management Console, AWS CLI and SDKs, AWS CloudFormation, AWS Config, AWS CloudTrail, Amazon CloudWatch; alarm; Amazon SNS email notification; AWS Lambda]
  6. 6. Amazon ElastiCache
        REDIS:6379> SMEMBERS features
        2) "Enhanced Redis Engine"
        REDIS:6379> hget feature:details "enhancements"
        • Optimized swap memory: mitigates the risk of increased swap usage during syncs and snapshots
        • Dynamic write throttling: improved output buffer management when the node's memory is close to being exhausted
        • Smoother failovers: clusters recover faster because replicas avoid flushing their data to do a full re-sync with the primary
  7. 7. Redis Topologies
        • Cluster Mode Disabled (vertically scaled): one keyspace on 1 primary with 0–5 replicas; primary endpoint; max storage 407 GiB
        • Cluster Mode Enabled (horizontally scaled): keyspace split into slots 0–16383 across 1–15 primaries/shards (for example slots 0–5461, 5462–10922, 10923–16383), each with 0–5 replicas; configuration endpoint; max storage 6+ TiB
  8. 8. Redis Cluster mode enabled vs disabled
        • Failover
          - Enabled: 15–30 sec (non-DNS)
          - Disabled: ~1.5 min (DNS-based)
        • Failover risk
          - Enabled: writes affected on a partial dataset (less risk with more partitions); reads available
          - Disabled: writes affected on the entire dataset; reads available
        • Performance
          - Enabled: scales with cluster size (90 nodes: 15 primaries + 0–5 replicas per shard)
          - Disabled: 6 nodes (1 primary + 0–5 replicas)
        • Max connections
          - Enabled: primaries 65,000 x 15 = 975,000; replicas 65,000 x 75 = 4,875,000
          - Disabled: primary 65,000; replicas 65,000 x 5 = 325,000
        • Storage
          - Enabled: 6+ TiB
          - Disabled: 407 GB
        • Cost example (assume 175 GB is needed)
          - Enabled: smaller nodes but more $$: 9 x cache.r3.xlarge ($0.455/hr) = $4.095/hr, 255.6 GB
          - Disabled: larger nodes, less $: 1 x cache.r3.8xlarge = $3.640/hr, 237 GB
  9. 9. Amazon ElastiCache: closer look at cluster-mode enabled
  10. 10. Redis Cluster: automatic client-side sharding
          • 16384 hash slots per cluster
          • Slot for a key is CRC16(key) mod 16384
          • Slots are distributed across the cluster into shards
          • Developers must use a Redis Cluster-aware client
          • Clients are redirected to the correct shard
          • Smart clients store a map
          • Example with five shards (S1–S5): S1 = slots 0–3276, S2 = slots 3277–6553, S3 = slots 6554–9829, S4 = slots 9830–13106, S5 = slots 13107–16383
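The slot rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not client code: it assumes the CRC16 variant Redis Cluster uses is the XMODEM one (polynomial 0x1021, initial value 0), which `binascii.crc_hqx` computes, and it ignores hash tags (real clients hash only the part of the key inside `{...}` when present). The names `key_slot`, `shard_for_slot`, and `SHARD_RANGES` are made up for this example; the ranges are the five-shard split from the slide.

```python
import binascii

TOTAL_SLOTS = 16384

# Five-shard layout copied from the slide (inclusive slot ranges).
SHARD_RANGES = {
    "S1": (0, 3276),
    "S2": (3277, 6553),
    "S3": (6554, 9829),
    "S4": (9830, 13106),
    "S5": (13107, 16383),
}


def key_slot(key: bytes) -> int:
    """Return CRC16(key) mod 16384 (hash tags are not handled here)."""
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int) -> str:
    """Look up which shard owns a slot, as a smart client's slot map would."""
    for shard, (low, high) in SHARD_RANGES.items():
        if low <= slot <= high:
            return shard
    raise ValueError(f"slot {slot} is outside 0..{TOTAL_SLOTS - 1}")


if __name__ == "__main__":
    slot = key_slot(b"user:1000")
    print(slot, shard_for_slot(slot))
```

A cluster-aware client keeps a map like `SHARD_RANGES` and refreshes it when a node redirects the request to a different shard.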
  11. 11. Redis Cluster: architecture (Multi-AZ)
          • A cluster consists of 1 to 15 shards
          • Example: 3-shard cluster, 2 read replicas
          [Diagram: slot ranges 0–5454, 5455–10909, and 10910–16363, each with a copy in Availability Zones A, B, and C]
  12. 12. Redis Cluster: architecture
          • Each shard has a primary node and up to 5 replica nodes
          [Diagram: same three-AZ layout with one shard highlighted as a primary and two replicas]
  13. 13. Redis Cluster: architecture
          • Each shard has a primary node and up to 5 replica nodes
          [Diagram: same layout with a second shard highlighted; its primary sits in a different Availability Zone]
  14. 14. Redis Cluster: architecture
          • Each shard has a primary node and up to 5 replica nodes
          [Diagram: same layout with the third shard highlighted]
  15. 15. Scenario 1: single primary failure
          [Diagram: three-shard cluster across Availability Zones A, B, and C]
  16. 16. Scenario 1: single primary failure
          Mitigation:
          1. Automatic failure detection and replica promotion (~15–30 s)
          2. Repair failed node
  17. 17. Scenario 2: majority of primaries fail
          [Diagram: three-shard cluster across Availability Zones A, B, and C]
  18. 18. Scenario 2: majority of primaries fail
          Mitigation: Redis enhancements on ElastiCache
          • Automatic failure detection and replica promotion
          • Repair failed nodes
  19. 19. Resizing via backup & restore (3 shards → 5 shards)
          Step 1: aws elasticache create-snapshot --replication-group-id redisclusterID --snapshot-name sname
          Step 2: aws elasticache copy-snapshot --source-snapshot-name sname --target-snapshot-name sname --target-bucket s3bucketname
          Step 3: aws elasticache create-replication-group --replication-group-id NewRedisClusterID … --snapshot-arns arn:aws:s3:::bucketname/redisbackup-0001.rdb, etc.
          Step 4: Once the new cluster is up, update your app with the new Amazon ElastiCache endpoint, then terminate the old cluster.
          Downtime: new writes made after the snapshot are not in the RDB file.
          Pro tip (DR strategy): enable CRR on the S3 bucket and have it trigger an AWS Lambda function that hydrates the destination cluster.
  20. 20. Amazon ElastiCache: Zero-Downtime Online Re-sharding
  21. 21. Online Re-Sharding: Zero Downtime
          • Simple API to scale in or out:
            aws elasticache modify-replication-group-shard-configuration --replication-group-id rep-group-id --apply-immediately --node-group-count 5
          [Diagram: Shard 1 = slots 0-5461, Shard 2 = slots 5462-10922, Shard 3 = slots 10923-16383]
  22. 22. Online Re-Sharding: Zero Downtime, Scale Out
          • No application interruption (reads/writes continue)
          • Uniform slot distribution across shards
          • Before (3 shards): 0-5461 | 5462-10922 | 10923-16383
          • After (5 shards): 0-2909, 5095-5461 | 5462-5783, 6876-9830 | 10923-14199 | 2910-5094, 9831-10922 | 5784-6875, 14200-16383
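The "uniform slot distribution" claim on the scale-out slide can be checked with simple arithmetic. The sketch below copies the five post-resharding slot lists from the slide and compares their sizes with an even split of 16384 slots. The function names are invented for this example, and which of the two new lists belongs to Shard 4 versus Shard 5 is not something the transcript states.

```python
TOTAL_SLOTS = 16384


def balanced_slot_counts(shards: int) -> list[int]:
    """Sizes of an even split: the first `extra` shards get one more slot."""
    base, extra = divmod(TOTAL_SLOTS, shards)
    return [base + 1] * extra + [base] * (shards - extra)


def count_slots(ranges: list[tuple[int, int]]) -> int:
    """Number of slots covered by a list of inclusive (low, high) ranges."""
    return sum(high - low + 1 for low, high in ranges)


# Slot lists after scaling out from 3 to 5 shards, as shown on the slide.
AFTER_SCALE_OUT = [
    [(0, 2909), (5095, 5461)],
    [(5462, 5783), (6876, 9830)],
    [(10923, 14199)],
    [(2910, 5094), (9831, 10922)],
    [(5784, 6875), (14200, 16383)],
]

if __name__ == "__main__":
    sizes = [count_slots(ranges) for ranges in AFTER_SCALE_OUT]
    print(sizes, sorted(sizes, reverse=True) == balanced_slot_counts(5))
```

Each of the three original shards keeps a subset of its old range, so only the slots handed to the two new shards have to move.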
  23. 23. Online Re-Sharding: Zero Downtime, Scale In
          • No application interruption (reads/writes continue)
          • Uniform slot distribution across shards
          [Diagram: Shards 1–5 consolidated to three shards holding slots 0-5461, 5462-10922, and 10923-16383]
  24. 24. Online Re-Sharding: CloudWatch alarm triggered
          [Diagram: an Amazon CloudWatch "MEMORY HIGH!" alarm, Amazon SNS, and an AWS Lambda function that resizes the cluster from 3 shards to 5 shards ("Cluster Resized")]
          AWS Lambda code:
            …
            var params = {
              ApplyImmediately: true,
              NodeGroupCount: 5,
              ReplicationGroupId: 'rep-group-id',
              …
            }
            elasticache.modifyReplicationGroupShardConfiguration(params, function(err, data) {
              if (err) console.log(err, err.stack);
              else console.log(data);
            });
            …
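For readers working in Python rather than Node.js, the same call can be sketched with boto3. This is an assumption-laden illustration, not the speaker's code: the handler name, the `client` injection parameter, and `build_reshard_params` are invented here, the replication group ID is the slide's placeholder, and the 1–15 shard bound is simply the limit quoted earlier in the deck. Only scale-out is shown; scaling in also requires telling the API which node groups to remove or retain.

```python
MAX_SHARDS = 15  # per-cluster shard limit quoted in the slides (2017)


def build_reshard_params(replication_group_id: str, target_shards: int) -> dict:
    """Build the request for a scale-out to `target_shards` shards."""
    if not 1 <= target_shards <= MAX_SHARDS:
        raise ValueError(f"target_shards must be between 1 and {MAX_SHARDS}")
    return {
        "ReplicationGroupId": replication_group_id,
        "NodeGroupCount": target_shards,
        "ApplyImmediately": True,
    }


def lambda_handler(event, context, client=None):
    """Hypothetical Lambda entry point invoked when the memory alarm fires."""
    if client is None:
        # Imported lazily so the helper above can be used without AWS access.
        import boto3

        client = boto3.client("elasticache")
    params = build_reshard_params("rep-group-id", 5)
    return client.modify_replication_group_shard_configuration(**params)
```

Keeping parameter construction separate from the API call makes it possible to unit-test the sizing decision without touching a real cluster.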
  25. 25. Healthy
          [Diagram: clients, a cache cluster across AZ1 and AZ2, relational data, and search; arrows labeled reads/writes and reads]
  26. 26. Heavy pressure
          [Diagram: same components as the previous slide, with the cache cluster under heavy load]
  27. 27. Healthy: auto scaled out
          [Diagram: same components, with the cache cluster scaled out]
  28. 28. Amazon ElastiCache: Security Overview
  29. 29. REDIS:6379> hget feature:details "ref-arch"
          [Diagram: Availability Zones A, B, and C]
  30. 30. REDIS:6379> hget feature:details "ref-arch"
          [Diagram: adds a private subnet in each Availability Zone]
  31. 31. REDIS:6379> hget feature:details "ref-arch"
          [Diagram: adds a security group around each private subnet]
  32. 32. REDIS:6379> hget feature:details "ref-arch"
          [Diagram: adds the ElastiCache Redis cluster and a Redis RDB snapshot stored in an Amazon S3 bucket, with encryption at rest]
  33. 33. REDIS:6379> hget feature:details "ref-arch"
          [Diagram: adds public subnets with their own security groups, encryption in transit, and Redis AUTH (Redis 3.2.6), alongside the encrypted RDB snapshot in Amazon S3]
  34. 34. Amazon ElastiCache: Encryption and Compliance
          Encryption
          • In-transit: encrypt all communications between clients and the Redis server as well as between nodes
          • At-rest: encrypt backups on disk and in Amazon S3
          • Fully managed: set up via API or console; automatic issuance and renewal
          Compliance
          • HIPAA eligibility for ElastiCache for Redis
          • Included in the AWS Business Associate Addendum
          • Redis 3.2.6
  35. 35. Amazon ElastiCache: Common Usage Patterns
  36. 36. Usage patterns
          • Session management
          • Database caching
          • APIs (HTTP responses)
          • IoT
          • Streaming data analytics (filtering/aggregation)
          • Pub/sub
          • Social media (sentiment analysis)
          • Standalone database (metadata store)
          • Leaderboards
  37. 37. Caching
          [Diagram: clients, Elastic Load Balancing, Amazon EC2, Amazon ElastiCache Redis, Amazon DynamoDB, Amazon RDS, Amazon S3; labels: reads/writes, write-through, DDB streams, mysql.lambda_async, object data, unstructured data, relational data]
  38. 38. Caching NoSQL
          • Smaller NoSQL DB clusters needed = lower costs
          • Faster data retrieval = better performance
          [Diagram: clients, Amazon EC2 (reads/writes, reads), MongoDB cluster, Cassandra cluster, Elasticsearch cluster]
  39. 39. Caching NoSQL databases with Amazon ElastiCache
          • MongoDB cluster: DBObject doc = collection.findOne();
            - Cache serialized DBObject in Redis (good)
            - Cache rows in Redis hash (faster/more efficient)
          • Cassandra cluster: ResultSet rs = session.execute(stmt);
            - Cache serialized ResultSet in Redis (good)
            - Cache rows in Redis hash (faster/more efficient)
          • Smaller NoSQL DB clusters needed = lower costs
          • Faster data retrieval = better performance
  40. 40. Streaming data enrichment/processing
          [Diagram: data sources, raw stream (Amazon Kinesis Streams), AWS Lambda function 1 (continual data filtering/enrichment), Amazon ElastiCache (Redis), cleansed stream (Amazon Kinesis Streams), Amazon Kinesis Analytics, AWS Lambda function 2, real-time pub/sub, subscribers]
  41. 41. Big data architectures using Redis
          [Diagram: stages Collect, Store, Process, Analyze; components: data sources, Amazon EC2, AWS IoT, Amazon Kinesis, Apache Kafka, Amazon S3, Amazon ElastiCache, AWS Lambda, Amazon Kinesis app, custom app, Apache Storm on EMR, Spark Streaming on Amazon EMR, Spark on Amazon EMR]
  42. 42. IoT powered by ElastiCache
          [Diagram: AWS IoT devices, AWS IoT rules engine with direct integration to Lambda, SNS, SQS, S3, Kinesis, and DDB; AWS Lambda; Amazon ElastiCache Redis as the sensor store]
  43. 43. Mobile apps powered by ElastiCache
          • Update points of interest: GEOADD
          • Search points of interest: GEORADIUS
          • https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
          [Diagram: Amazon API Gateway, AWS Lambda, Amazon ElastiCache Redis, Amazon DynamoDB, DDB streams, Amazon EC2]
  44. 44. Ad tech powered by ElastiCache
          • Flow: user visits page; publisher places ad slot for auction; ad network calls for bids; bidders respond with bids; winning bid's ad is displayed
          • Amazon ElastiCache Redis: <40 ms
          • https://aws.amazon.com/caching/database-caching/
          [Diagram: clients/consumer, ad placement (websites/apps), ad slot publishers, ad network, advertisers, clickstream (shopping events)]
  45. 45. Chat apps powered by ElastiCache
          • https://aws.amazon.com/blogs/database/amazon-elasticache-utilizing-redis-geospatial-capabilities/
          [Diagram: clients (chat apps), Application Load Balancer, WebSockets (persistent connections), Elastic Beanstalk, Amazon ElastiCache Redis as PubSub server]
          SUBSCRIBE chat_channel:114
          PUBLISH chat_channel:114 "Hello all"
          >> ["message", "chat_channel:114", "Hello all"]
          UNSUBSCRIBE chat_channel:114
  46. 46. Gaming: real-time leaderboards
          • Very popular for gaming apps that need uniqueness and ordering
          • Easy with Redis sorted sets
          ZADD "leaderboard" 1201 "Gollum"
          ZADD "leaderboard" 963 "Sauron"
          ZADD "leaderboard" 1092 "Bilbo"
          ZADD "leaderboard" 1383 "Frodo"
          ZREVRANGE "leaderboard" 0 -1
          1) "Frodo"
          2) "Gollum"
          3) "Bilbo"
          4) "Sauron"
          ZREVRANK "leaderboard" "Sauron"
          (integer) 3
  47. 47. Rate limiting
          Example: throttling requests to an API using Redis counters (ELB in front of an externally facing API)
          Reference: http://redis.io/commands/INCR
          FUNCTION LIMIT_API_CALL(APIaccesskey)
              limit = HGET(APIaccesskey, "limit")
              time = CURRENT_UNIX_TIME()
              keyname = APIaccesskey + ":" + time
              count = GET(keyname)
              IF count != NULL && count > limit THEN
                  ERROR "API request limit exceeded"
              ELSE
                  MULTI
                      INCR(keyname)
                      EXPIRE(keyname, 10)
                  EXEC
                  PERFORM_API_CALL()
              END
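The counter logic in the pseudocode can be exercised without a Redis server. The sketch below is a pure-Python model, not a Redis client: a dict stands in for the keyspace, and the class name `FixedWindowLimiter` and its injectable `clock` are assumptions made for testability. It keeps the slide's per-second key (`APIaccesskey:unix_time`) and 10-second expiry, but compares with `>=` so exactly `limit` calls pass per second; the slide's `>` lets one extra call through.

```python
import time


class FixedWindowLimiter:
    """In-memory model of the per-second counter; a dict stands in for Redis."""

    def __init__(self, limit, ttl_seconds=10, clock=time.time):
        self.limit = limit
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._counters = {}  # keyname -> (count, expires_at)

    def _purge_expired(self, now):
        # Models key expiry, which Redis performs on its own after EXPIRE.
        expired = [k for k, (_, exp) in self._counters.items() if exp <= now]
        for keyname in expired:
            del self._counters[keyname]

    def allow(self, api_key):
        """Return True if the call may proceed in the current one-second window."""
        now = self.clock()
        self._purge_expired(now)
        keyname = f"{api_key}:{int(now)}"
        count, _ = self._counters.get(keyname, (0, None))
        if count >= self.limit:
            return False
        # Equivalent of INCR + EXPIRE, which the slide wraps in MULTI/EXEC.
        self._counters[keyname] = (count + 1, now + self.ttl_seconds)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=3)
    print([limiter.allow("demo-key") for _ in range(5)])
```

Against a real cluster, the read and the increment come from different commands, so two clients can both pass the check at the boundary; the model above does not capture that race.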
  48. 48. Amazon ElastiCache: Best Practices
  49. 49. Cluster sizing best practices
          • Storage: clusters should have adequate memory
            - Recommended: memory needed + 25% reserved memory (for Redis) + some room for growth (optional 10%)
            - Optimize using eviction policies and TTLs
            - Scale up or out before reaching max memory, using CloudWatch alarms
            - Use memory-optimized nodes for cost effectiveness (R4 support)
          • Performance: performance should not be compromised
            - Benchmark operations using the Redis Benchmark tool
            - For more read IOPS: add replicas
            - For more write IOPS: add shards (scale out)
            - For more network IO: use network-optimized instances and scale out
            - Use pipelining for bulk reads/writes
            - Consider Big O time complexity for data structure commands
          • Cluster isolation (apps sharing key space): choose a strategy that works for your workload
            - Identify what kind of isolation is needed based on the workload and environment
            - Isolation: No Isolation $ | Isolation by Purpose $$ | Full Isolation $$$
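The storage recommendation can be turned into a quick capacity check. The sketch below rests on one interpretation that the slide does not spell out: the 25% reserved memory is taken as a share of total node memory, so the data (plus optional growth) has to fit in the remaining 75%. The function names and defaults are invented for this example; the 175 GB, 237 GB, and 255.6 GB figures come from the cost example on the cluster mode comparison slide.

```python
def required_memory_gb(data_gb, reserved_fraction=0.25, growth_fraction=0.0):
    """Total cluster memory needed so that data plus growth fits after the reserve."""
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    return data_gb * (1 + growth_fraction) / (1 - reserved_fraction)


def fits(capacity_gb, data_gb, **kwargs):
    """True if a cluster with `capacity_gb` of memory satisfies the sizing rule."""
    return capacity_gb >= required_memory_gb(data_gb, **kwargs)


if __name__ == "__main__":
    print(round(required_memory_gb(175), 1))            # reserve only
    print(fits(237, 175), fits(255.6, 175))             # the two options from the deck
    print(fits(255.6, 175, growth_fraction=0.10))       # with 10% growth headroom
```

Under this reading both options from the cost example hold 175 GB with the default reserve, while adding the optional 10% growth margin pushes the requirement past either of them.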
  50. 50. Redis Benchmark tool
          Open source utility to benchmark performance
          Example: src/redis-benchmark -h r3-xlarge-perf.foio87.0001.use1.cache.amazonaws.com -p 6379 -n 150000 -d 100
          Syntax: redis-benchmark -h <host> -p <port> -c 50 -n 1000 -d 500 -q
          • -c <clients>: number of parallel connections (default 50)
          • -n <requests>: total number of requests (default 100000)
          • -d <size>: data size of GET and SET values in bytes
          • -t <test1,test2>: comma-separated list of tests to perform
          • -q: quiet operation; displays only the result
  51. 51. Redis maxmemory policies
          Select a maxmemory policy based on your workload needs
          • noeviction: return errors when the memory limit was reached and the client is trying to execute commands that might result in more memory being used
          • allkeys-lru: evict keys trying to remove the less recently used (LRU) keys first
          • volatile-lru: evict keys trying to remove the less recently used (LRU) keys first, but only among keys that have an expire set
          • allkeys-random: evict random keys to make space for the new data added
          • volatile-random: evict random keys to make space for the new data added, but only evict keys with an expire set
          • volatile-ttl: evict only keys with an expire set, and try to evict keys with a shorter time to live (TTL) first
  52. 52. Key ElastiCache CloudWatch Metrics
          • CPUUtilization
            - Memcached: up to 90% OK
            - Redis: divide by cores (ex: 90% / 4 = 22.5%)
          • SwapUsage: low
          • CacheMisses / CacheHits ratio: low / stable
          • Evictions: near zero
            - Exception: Russian doll caching
          • CurrConnections: stable
          • Set up alarms with CloudWatch Metrics
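The "divide by cores" rule for Redis CPUUtilization is plain arithmetic: Redis executes commands on a single thread, so a host-level percentage has to be scaled down by the node's core count. A tiny sketch, with a hypothetical function name:

```python
def redis_cpu_alarm_threshold(single_core_target_pct: float, cores: int) -> float:
    """Host-level CPUUtilization at which one core reaches the target percentage."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return single_core_target_pct / cores


if __name__ == "__main__":
    # The slide's example: a 90% target on a 4-core node.
    print(redis_cpu_alarm_threshold(90, 4))
```

The result is the value to use as the CloudWatch alarm threshold on a node with that many cores.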
  53. 53. ElastiCache Modifiable Parameters
          • Maxclients: 65000 (unchangeable)
            - Use connection pooling
            - timeout: closes a connection after it has been idle for a given interval
            - tcp-keepalive: detects dead peers given an interval
          • Databases: 16 (default) for non-clustered mode
            - Logical partition
          • Reserved-memory: 25% (default)
            - Recommended: 50% of maxmemory before Redis 2.8.22; 25% after 2.8.22 (ElastiCache)
          • Maxmemory-policy
            - The eviction policy for keys when maximum memory usage is reached
            - Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction
  54. 54. Caching tips
          • Understand the frequency of change of underlying data
          • Set appropriate TTLs on keys that match that frequency
          • Choose appropriate eviction policies that are aligned with application requirements
          • Isolate your cluster by purpose (e.g. cache cluster, queue, standalone database)
          • Maintain cache freshness with write-throughs
          • Performance test and size your cluster appropriately
          • Monitor the cache hit/miss ratio and alarm on poor performance
          • Use the Failover API to test application resiliency
  55. 55. Thank You! Amazon ElastiCache: https://aws.amazon.com/elasticache/
