Redis to the Rescue?

Based in Berlin, wooga is the leading European social games developer. Social games pose interesting scaling challenges: when a game becomes popular, its user base can grow quickly, often by 50,000 people per day or more.

This case study recounts how we successfully scaled up two Facebook games to one million daily active users each, why we decided to replace MySQL (once partially, once fully) with Redis, what difficulties we encountered on the way and how we solved them.

As an in-memory database, Redis offers an order-of-magnitude reduction in query round-trip latency, but it also introduces new challenges: how do you guarantee durability of data across server outages, and how do you best structure your data when there are no ad-hoc query capabilities?

This talk will go into technical details of our backend architecture and discuss both its advantages and disadvantages, how it stacks up against other possible setups, and what lessons we have learned.



  1. Redis to the Rescue? (O’Reilly MySQL Conference, 2011-04-13)
  2. Who? • Tim Lossen / @tlossen • Berlin, Germany • backend developer at wooga
  3-4. Redis Intro • Case 1: Monster World • Case 2: Happy Hospital • Discussion
  5. What? • key-value store • in-memory database • “data structure server”
  6-8. Data Types (shown as a build over three slides) • strings (integers) • lists • hashes • sets • sorted sets
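For readers new to Redis, the types listed above each map to a small family of commands. A minimal redis-cli session (key names and values are illustrative, not from the talk):

```
SET   score 41            # string
INCR  score               # strings hold integers; atomic, returns 42
RPUSH queue "a" "b"       # list
HSET  user:1 level 4      # hash
SADD  friends u2 u3       # set
ZADD  highscores 900 u2   # sorted set (member u2, score 900)
```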
  9. (photo: flickr.com/photos/atzu/2645776918)
  10-12. Performance • same for reads / writes • 50K ops/second on a regular notebook or EC2 instance • 200K ops/second on an Intel Core i7 X980 (3.33 GHz)
  13. Durability • snapshots • append-only log
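The two durability mechanisms above correspond to redis.conf directives; an illustrative fragment (the values are examples, not the speaker's settings):

```
# RDB snapshots: dump to disk if at least 1 key changed within 900 s
save 900 1

# append-only log (AOF), fsynced once per second
appendonly yes
appendfsync everysec
```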
  14. Other Features • master-slave replication • virtual memory • ...
  15. Redis Intro • Case 1: Monster World • Case 2: Happy Hospital • Discussion
  16-17. Daily Active Users (growth chart, April to October 2010)
  18-20. Challenge • traffic growing rapidly • bottleneck: write throughput (EBS volumes pretty slow) • MySQL already sharded (4 x 2 = 8 shards)
  21-22. Idea • move write-intensive data to Redis • first candidate: inventory (integer fields, frequently changing)
  23-24. Solution • inventory = Redis hash (atomic increment / decrement!) • on-demand migration of users, with batch roll-up
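The atomic increment/decrement the solution relies on is HINCRBY on a hash field; an illustrative redis-cli session (key and field names assumed, not from the talk):

```
HMSET   inventory:u17 wood 10 gold 5
HINCRBY inventory:u17 wood -3    # atomic decrement, returns 7
HINCRBY inventory:u17 gold 2     # atomic increment, returns 7
```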
  25-26. Results • quick win: implemented in 2 weeks, 10% less load on MySQL servers • decision: move over more data
  27-28. But ... • “honeymoon soon over” • growing memory usage (fragmentation): servers need a periodic “refresh” and a replication dance
  29-30. Current Status • hybrid setup: 4 MySQL master-slave pairs + 4 Redis master-slave pairs • evaluating other alternatives (Riak, Membase)
  31. Redis Intro • Case 1: Monster World • Case 2: Happy Hospital • Discussion
  32. Challenge • expected peak load: 16,000 concurrent users, 4,000 requests/second, mostly writes
  33. “Memory is the new Disk, Disk is the new Tape.” (Jim Gray)
  34-35. Idea • use Redis as the main database: excellent (write) performance, virtual memory for capacity • no sharding = simple operations
  36-37. Data Model • user = single Redis hash, each entity stored in a hash field (serialized to JSON) • custom Ruby mapping layer (“Remodel”)
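A minimal sketch of this data model (not Remodel's actual API): each entity is serialized to a JSON string and stored in one field of a single per-user hash. A plain Ruby Hash stands in for the Redis hash (HSET / HGET semantics) so the example runs without a server:

```ruby
require 'json'

# One instance per user; @hash plays the role of that user's Redis hash.
class UserStore
  def initialize
    @hash = {}   # field name => JSON string
  end

  # like HSET <user> <field> <json>
  def put(field, entity)
    @hash[field] = JSON.generate(entity)
  end

  # like HGET <user> <field>, deserializing the JSON value
  def get(field)
    json = @hash[field]
    json && JSON.parse(json)
  end
end

store = UserStore.new
store.put('u1', 'level' => 4, 'xp' => 241)
store.put('u1_pets', %w[p7 p8])
store.put('p7', 'pet_type' => 'Cat')

store.get('u1')       # => {"level"=>4, "xp"=>241}
store.get('u1_pets')  # => ["p7", "p8"]
```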
  38-43. (data-model diagram, shown as a build over six slides: one Redis hash per user — e.g. hash 1220032045 with fields u1 = {“level”: 4, “xp”: 241}, u1_pets = [“p7”, “p8”], p7 = {“pet_type”: “Cat”}, p8 = {“pet_type”: “Dog”}; hash 1234599660 with fields u1 = {“level”: 1, “xp”: 22}, u1_pets = [“p3”], ...)
  44-45. Virtual Memory • server: 24 GB RAM, 500 GB disk; can only keep “hot data” in RAM • 380 GB swap file (50 million pages, 8 KB each)
  46-48. Dec 2010: Crisis • memory usage growing fast • cannot take snapshots any more, so cannot start new slaves • random crashes
  49-50. Analysis • Redis virtual memory not compatible with persistence or replication • need to implement our own!
  51. Workaround • “dumper” process: tracks active users, writes them into a YAML file every 10 minutes
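A minimal sketch of the dumper/restore idea (the file layout, naming, and function names here are assumptions, not wooga's actual code): periodically serialize each active user's data to a YAML file, so that after a Redis crash users can be restored on demand:

```ruby
require 'yaml'
require 'tmpdir'

# Write one YAML file per active user (hypothetical layout: <dir>/<user_id>.yml).
def dump_users(active_users, dir)
  active_users.each do |user_id, data|
    File.write(File.join(dir, "#{user_id}.yml"), YAML.dump(data))
  end
end

# On-demand restore after a crash: load the user's last dump, or nil if none.
def restore_user(user_id, dir)
  path = File.join(dir, "#{user_id}.yml")
  File.exist?(path) ? YAML.safe_load(File.read(path)) : nil
end

dir = Dir.mktmpdir
dump_users({ 'u1' => { 'level' => 4, 'xp' => 241 } }, dir)
restore_user('u1', dir)  # => {"level"=>4, "xp"=>241}
```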
  52-56. (architecture diagram, shown as a build over five slides: Ruby app ↔ Redis ↔ disk)
  57. Workaround • in case of a Redis crash: start with an empty database, restore users on demand from the YAML files
  58-59. Real Solution • Redis “diskstore”: keeps all data on disk, swaps data into memory as needed • under development (expected Q2)
  60-61. Results • average response time: 10 ms • peak traffic: 1,500 requests/second, 15,000 Redis ops/second
  62. Current Status • very happy with the setup: simple, robust, fast, easy to operate • still lots of spare capacity
  63. Redis Intro • Case 1: Monster World • Case 2: Happy Hospital • Discussion
  64. Advantages • order-of-magnitude performance improvement: removes the main bottleneck, enables a simple architecture
  65-66. Disadvantages • main challenge: durability (diskstore very promising) • no ad-hoc queries: think hard about the data model; hybrid approach?
  67. Conclusion • Ruby + Redis = killer combo
  68. Q&A
  69. redis.io • github.com/tlossen/remodel • wooga.com/jobs
