Counting image views using redis cluster

Streaming Logs and Processing View Counts using Redis Cluster
Seandon Mooy
(Imgur)

When you browse Imgur, each post shows how many times it has been viewed. Imgur processes over 3 billion views per month and powers its view count feature with Redis. This talk covers our current architecture for streaming logs and processing view counts using Redis Cluster, as well as some of the alternatives we explored and why we chose Redis.

Published in: Technology

  1. Counting Image Views using Redis Cluster (Seandon Mooy, DevOps Engineer, @erulabs)
  2. Or… how I stopped map-reducing and learned to love the stream
  3. 3 Billion!
  6. Delay! Failures! Failures!
  7. Also… I may not be the best zookeeper
  10. Challenges with HBase
      • Roughly 5% of all requests through Thrift were failing…
      • So many tunables!
      • Optimized timeouts, added circuit breakers, etc.
      • A trickle of working requests during an outage means circuit breakers are hard to design…
      • “HBase down == Imgur down”
      • Downtime == sadtime :(
  11. 3 Billion!
  12. Solution? Redis Cluster!
  16. ViewCount V2 - Real time with less complexity!
      • Fastly
      • TCP syslog stream
      • Ingest service: parses syslog lines, reports metrics via statsd
      • Redis 3.2 cluster!
  18. ViewCount V2 - Real time with less complexity!
      • Fastly
      • Ingest service
      • HBase
      • Backfill service
      • Internet
      • API service
  20. ViewCount V2 - Results:
      • Request latency: min 1ms, max 16.9ms, median 1.6ms, p95 2.6ms, p99 4.6ms
      • Codes: 200: 10000
  22. ViewCount V2 - Results: 20 billion commands! > 400GB in memory!
  23. Things to be aware of:
      1. Redis Cluster shard maps: redirections, etc. Monitor redirections and gracefully restart workers after shard moves.
      2. AOF can slow down or fail large “redis-trib.rb” operations. Make sure to disable it before and re-enable it after!
      3. Not all legacy systems support Redis Cluster, and if they do… they might not support it well (PHP-FPM)!
      4. Over-memory-capacity behavior? Previously we would hard-crash; now we’d LRU old 1-view images. Neither is good, but for us, one is much less painful.
  24. ViewCount V3? Approaching the point of minimal gains for man-hours, but what else might be fun?
      1. Moving PHP7 off the NodeJS API and directly to Redis Cluster. Downsides: dealing with shard maps is complex in a stateless / process-per-request environment!
      2. Using Redis 3's BITFIELD or HSET to save on key storage costs. Downsides: complicates the system; the current design reduces “hit-by-a-bus” issues - keys are just hashes, values are just counts!
      3. Dealing with the nature of TCP streams (TCP is not HTTP!). One connection to rule them all! - Node’s Cluster module helps, but perhaps Rust or Golang? Downsides: vertical scaling is non-obvious on EC2.
  28. ViewCount V2 - Results: Redis is:
      • Faster - Imgur response time decreased ~50ms
      • Cheaper - EC2 cost reduced by 75%
      • Simpler - No Java, no MR, no ZK, no third parties, just INCR + GET!
      • More fun! - I got to talk at RedisConf17!
  29. Acknowledgment: Imgur DevOps Team, Imgur Platform Team
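
Slide 10 notes that a trickle of working requests during an outage makes circuit breakers hard to design. One plausible reading is that a breaker which trips on consecutive failures gets reset by every stray success. The sketch below contrasts that with a breaker that trips on failure rate over a sliding window. The class names, window size, and thresholds are hypothetical and not taken from the talk, and recovery (half-open) logic is left out.

```python
from collections import deque


class RateBreaker:
    """Opens when the failure rate over the last `window` calls exceeds `threshold`."""

    def __init__(self, window: int = 20, threshold: float = 0.5) -> None:
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def is_open(self) -> bool:
        # Stay closed until the window is full, so one early failure cannot trip it.
        if len(self.results) < self.results.maxlen:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) > self.threshold


class ConsecutiveBreaker:
    """Opens after `limit` failures in a row; any success resets the streak."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        self.streak = 0

    def record(self, ok: bool) -> None:
        self.streak = 0 if ok else self.streak + 1

    @property
    def is_open(self) -> bool:
        return self.streak >= self.limit


# Simulated outage (assumed numbers): 1 request in 4 still succeeds, the "trickle".
outcomes = [i % 4 == 0 for i in range(40)]
rate, consecutive = RateBreaker(), ConsecutiveBreaker()
for ok in outcomes:
    rate.record(ok)
    consecutive.record(ok)
```

With this simulated traffic the consecutive-failure breaker never sees more than three failures in a row and stays closed, while the rate-based breaker opens once its window fills at a 75% failure rate.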
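
Slide 16 describes an ingest service that parses syslog lines, and slide 28 says the data model is just INCR + GET. A minimal sketch of that loop follows. The log line shape and the regular expression are assumptions, since the talk does not show the log format, and a `Counter` stands in for Redis so the block runs without a server. Against a real cluster, each increment would be an `INCR` on the image's key.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical line shape: "<priority>timestamp host GET /<image-hash>.<ext> <status>".
# The real log format is configurable and is not shown in the talk.
LINE_RE = re.compile(r"GET /(?P<hash>[A-Za-z0-9]+)\.(?:jpg|png|gif) (?P<status>\d{3})$")


def parse_view(line: str) -> Optional[str]:
    """Return the image hash for a successful image request, else None."""
    match = LINE_RE.search(line.strip())
    if match is None or match.group("status") != "200":
        return None
    return match.group("hash")


def ingest(lines: Iterable[str], counts: Counter) -> int:
    """Count one view per parsed line; return how many lines were skipped."""
    skipped = 0
    for line in lines:
        image_hash = parse_view(line)
        if image_hash is None:
            skipped += 1  # a real service might report this via statsd
            continue
        counts[image_hash] += 1  # stand-in for Redis INCR <image_hash>
    return skipped


# Made-up sample lines for illustration only.
sample = [
    "<134>2017-01-01T00:00:00Z cache-a GET /abc123.jpg 200",
    "<134>2017-01-01T00:00:01Z cache-a GET /abc123.jpg 200",
    "<134>2017-01-01T00:00:01Z cache-a GET /zzz999.png 404",
    "not a log line",
]
views = Counter()
skipped = ingest(sample, views)
```

Reading a count back is then a single lookup (`views["abc123"]` here, `GET` against Redis), which is what keeps the API side simple.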
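
Slide 23 recommends monitoring redirections after shard moves. As background, Redis Cluster maps each key to one of 16384 hash slots using CRC16 of the key, and a node that no longer owns a slot answers with a `MOVED <slot> <host>:<port>` error. The sketch below computes the slot and tracks redirections in a small client-side table. The `ShardMap` class is hypothetical; real cluster clients keep their own slot tables.

```python
import binascii
from typing import Dict, Tuple

SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Hash slot per the Redis Cluster spec: CRC16 (XMODEM) of the key mod 16384.

    If the key contains a non-empty {hash tag}, only the tag is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # the tag must be non-empty
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


def parse_moved(error: str) -> Tuple[int, str]:
    """Parse a 'MOVED <slot> <host>:<port>' error reply into (slot, address)."""
    kind, slot, address = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED reply: {error!r}")
    return int(slot), address


class ShardMap:
    """Hypothetical client-side slot table that also counts redirections."""

    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}
        self.redirections = 0

    def on_error(self, error: str) -> None:
        slot, address = parse_moved(error)
        self.owner[slot] = address
        self.redirections += 1  # a metric worth graphing and alerting on


shard_map = ShardMap()
shard_map.on_error("MOVED 3999 127.0.0.1:6381")
```

A steadily climbing `redirections` counter suggests workers are holding a stale shard map, which is the situation where the slide suggests a graceful restart.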
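
Slide 24 floats using hashes to save on key storage costs. One common way to do that is to bucket many counters into fewer Redis hashes, so each image becomes a field rather than a top-level key. The sketch below shows only the key mapping. The `views:` prefix and the four-character bucket size are hypothetical, and nested dicts stand in for Redis hashes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

BUCKET_CHARS = 4  # hypothetical: leading characters of the image hash that name the bucket


def bucket_for(image_hash: str) -> Tuple[str, str]:
    """Split an image hash into (Redis hash key, field within that hash).

    Assumes image hashes are longer than BUCKET_CHARS, so the field is never empty.
    """
    return f"views:{image_hash[:BUCKET_CHARS]}", image_hash[BUCKET_CHARS:]


# Nested dicts stand in for Redis hashes; against a real server each
# increment would be HINCRBY <key> <field> 1.
store: DefaultDict[str, Dict[str, int]] = defaultdict(dict)


def record_view(image_hash: str) -> int:
    """Increment and return the view count for one image."""
    key, field = bucket_for(image_hash)
    store[key][field] = store[key].get(field, 0) + 1
    return store[key][field]


for h in ["abcd123", "abcd456", "abcd123", "wxyz999"]:
    record_view(h)
```

The extra mapping step is the kind of added complexity the slide lists as a downside: the plain layout, where the key is the image hash and the value is the count, needs no explanation.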
