RedisDay London 2018 - How We Run Redis in Multiple Datacenters

RedisDay London 2018

  1. How we run Redis in multiple datacenters, by Moe Chaieb, Shopify
  2. Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops.
  3. • One of the oldest and largest Ruby on Rails monoliths
     • 1,000+ developers
     • 50 deploys per day
     • 80K peak RPS
     • 2.5 million LOC
     • 100,000 unit tests
     • 1.6 billion background jobs processed per day
  4. Outline
     • Why Redis?
     • Architecture Overview
     • Multi-tenancy
     • Multi-DC
     • Resque
     • Resiliency
     • Persistence
     • Failovers
  5. Why Redis?
  6. Shopify Multi-tenancy
  7. Shopify Multi-DC
  8. Workers: web
     fashionstore.ca
     shop_id: 87235
     pod_id: A
     location: east
     controller: checkout
     action: create
  9. Workers: job
  10. Platform Resiliency
      • Ability to be resilient at the system level, not just at the component level
      • Expect single components to fail routinely
        • Especially true in cloud environments
        • Server pre-emption is out of your hands
        • Expect processes to be restarted routinely
      • Graceful termination and fallbacks are key
  11. Redis Persistence
      • Depends on the level of data criticality
        • Jobs are critical
        • Shopping carts are not
      • SIGKILLs are still dangerous
        • Still potential for some data loss (up to 1 second)
  12. Shopify Pod Failovers
      • Useful for:
        • Incident response
        • Load balancing
      • Need to be fast
      • Need to be safe
      • Manually triggered (for now)
  13. Failover Orchestration
  14. Failover Steps
  15. Shopify Redis “Failover”: TransferJobs
  16. Redis Replication & Resque
      • Write throughput is high
        • Stress on tunnels
        • Cross-DC traffic $$$
      • Writes are briefly no-oped
      • A bit more complex than what we need
  17. TransferJobs issues
      • Bugs can result in data loss
      • Delays can result in missed SLOs
      • Race conditions with job workers
  18. Closing remarks
      • Sharding is helpful (in write-heavy applications)
      • No silver-bullet solution for multi-DC, active/passive architectures
      • Keep it simple
        • Know when to build it yourself
        • …and when to defer to Redis
  19. Thank you!
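Slide 8 shows the context a web worker attaches to a request for fashionstore.ca: the shop, its pod, the pod's current location, and the controller action. A minimal sketch of that lookup chain follows, assuming simple in-memory tables; the names `SHOPS_BY_DOMAIN`, `POD_BY_SHOP`, `POD_LOCATION`, `RequestContext`, and `route_request` are hypothetical and not Shopify's actual code.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; a real system would keep these in a
# routing service or database rather than module-level dicts.
SHOPS_BY_DOMAIN = {"fashionstore.ca": 87235}
POD_BY_SHOP = {87235: "A"}
POD_LOCATION = {"A": "east"}  # datacenter currently active for each pod


@dataclass(frozen=True)
class RequestContext:
    domain: str
    shop_id: int
    pod_id: str
    location: str
    controller: str
    action: str


def route_request(domain: str, controller: str, action: str) -> RequestContext:
    """Resolve domain -> shop -> pod -> active datacenter."""
    shop_id = SHOPS_BY_DOMAIN[domain]
    pod_id = POD_BY_SHOP[shop_id]
    location = POD_LOCATION[pod_id]
    return RequestContext(domain, shop_id, pod_id, location, controller, action)


ctx = route_request("fashionstore.ca", "checkout", "create")
print(ctx.shop_id, ctx.pod_id, ctx.location)  # 87235 A east
```

In this sketch the location is stored per pod rather than per shop, so moving a pod to another datacenter (a pod failover, slide 12) only means updating one entry in a table like `POD_LOCATION`.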
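Slide 10 names graceful termination as key when processes are restarted routinely. A minimal sketch of what that can mean for a job worker, assuming the platform sends SIGTERM before stopping a process: the signal handler only sets a flag, and the worker stops taking new jobs once the current one finishes. `GracefulWorker` and its methods are hypothetical, not Resque's API.

```python
import signal
from collections import deque
from typing import Callable, Deque


class GracefulWorker:
    """Toy job worker: on SIGTERM it finishes the current job, then stops."""

    def __init__(self, queue: Deque[str], handler: Callable[[str], None]) -> None:
        self.queue = queue
        self.handler = handler
        self.shutting_down = False

    def request_shutdown(self, signum=None, frame=None) -> None:
        # Only set a flag here; the run loop checks it between jobs.
        self.shutting_down = True

    def install_signal_handler(self) -> None:
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGTERM, self.request_shutdown)

    def run(self) -> int:
        processed = 0
        # The flag is checked only between jobs, so a job that has
        # started always runs to completion.
        while self.queue and not self.shutting_down:
            job = self.queue.popleft()
            self.handler(job)
            processed += 1
        return processed


done = []


def handle(job: str) -> None:
    done.append(job)
    if job == "b":
        worker.request_shutdown()  # simulate SIGTERM arriving mid-job


worker = GracefulWorker(deque(["a", "b", "c"]), handle)
print(worker.run(), done, list(worker.queue))  # 2 ['a', 'b'] ['c']
```

The unprocessed job `c` stays queued for another worker. SIGKILL cannot be caught or handled at all, so no handler like this helps against it, which is consistent with slide 11 calling SIGKILLs dangerous.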
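Slide 11's "up to 1 second" figure matches Redis's append-only file with the `appendfsync everysec` policy, though the slide does not say which setting Shopify uses. Assuming that policy, the toy model below shows why the exposure window is bounded by the fsync interval: only writes made since the last fsync are treated as at risk. Real Redis behavior has more nuances (fsyncs run in a background thread, for example), so this is an illustration, not a guarantee; `writes_at_risk` is a hypothetical helper.

```python
import math
from typing import List, Tuple

# Assumption: the append-only file is fsynced once per second
# (Redis's `appendfsync everysec`), starting at t = 0.
FSYNC_INTERVAL_S = 1.0


def writes_at_risk(write_times: List[float], crash_time: float,
                   interval: float = FSYNC_INTERVAL_S) -> Tuple[float, List[float]]:
    """Return (time of last fsync, writes made after it) for a crash at crash_time.

    Simplified model: anything written before the last fsync is durable,
    anything written after it is treated as lost.
    """
    last_fsync = math.floor(crash_time / interval) * interval
    lost = [t for t in write_times if last_fsync < t <= crash_time]
    return last_fsync, lost


last, lost = writes_at_risk([0.2, 1.5, 2.1, 2.9], crash_time=2.95)
print(last, lost)  # 2.0 [2.1, 2.9]
```

Whatever the crash time, `crash_time - last_fsync` stays below the interval, which is where a bound like "up to 1 second" comes from under this model.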
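Slides 15 to 17 describe TransferJobs, which moves queued jobs between datacenters instead of relying on Redis replication, and list race conditions with job workers among its issues. The talk does not show the implementation. A minimal sketch of one race-safe way to move jobs, assuming FIFO queues where jobs are pushed on the right and popped from the left: move one job per step, so each job is always in exactly one queue. In Redis, a single `LMOVE source target LEFT RIGHT` call (available since Redis 6.2) performs such a move atomically; here plain `deque`s stand in for Redis lists in a single thread, and `move_one` and `transfer_jobs` are hypothetical names.

```python
from collections import deque
from typing import Deque, Optional


def move_one(source: Deque[str], target: Deque[str]) -> Optional[str]:
    """Move the oldest job from source to the back of target.

    Stand-in for one Redis `LMOVE source target LEFT RIGHT`, which Redis
    executes atomically. This deque version only models the semantics;
    the two operations are not atomic across threads.
    """
    if not source:
        return None
    job = source.popleft()
    target.append(job)
    return job


def transfer_jobs(source: Deque[str], target: Deque[str]) -> int:
    """Drain source into target one job at a time, preserving FIFO order."""
    moved = 0
    while move_one(source, target) is not None:
        moved += 1
    return moved


source = deque(["job1", "job2", "job3"])
target: Deque[str] = deque()
move_one(source, target)         # job1 moves to the new queue
worker_got = source.popleft()    # meanwhile, a worker takes job2
transfer_jobs(source, target)    # job3 moves
print(list(target), worker_got)  # ['job1', 'job3'] job2
```

Interleaving the worker's pop cannot lose or duplicate a job here, because each step removes and appends in one operation. A transfer that instead copied the whole list and then deleted the source could drop any job pushed in between.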
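Slide 18 calls sharding helpful for write-heavy applications. The talk does not say how Shopify shards Redis; one common approach, assumed here, is client-side sharding, where the application spreads keys across several Redis instances using a stable hash. The shard names, key format, and `shard_for` helper are hypothetical.

```python
import zlib
from typing import List


def shard_for(key: str, shards: List[str]) -> str:
    """Pick a shard for a key using a stable hash modulo the shard count."""
    # zlib.crc32 gives the same value in every process; Python's built-in
    # hash() for str is randomized per process, so it is unsuitable here.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


SHARDS = ["redis-0", "redis-1", "redis-2", "redis-3"]  # hypothetical instances

# Keying by shop keeps all of one shop's data on the same shard.
print(shard_for("shop:87235:jobs", SHARDS))
```

With plain modulo, changing the number of shards remaps most keys. Redis Cluster avoids that by hashing keys (CRC16) into 16,384 fixed slots that are assigned to nodes, so resharding moves slots rather than rehashing every key.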
