Scaling Social Games

Short talk given at the Berlin Hadoop Get Together on 27 January 2011.

  1. Scaling social games: “the order of magnitude challenge”
     Paolo Negri (@hungryblank)
  2. Order of magnitude
     [Chart: DAU (daily active users) from July to December, y axis 0 to 1.000.000]
  3. Social games: Flash client (the game) and HTTP API
     (image: http://www.flickr.com/photos/stars6/4381851322)
  4. Social games: Flash client
     • Game actions need to be persisted and validated
     • 1 API call every few seconds
  5. Social games: HTTP API
     • 5000 HTTP requests/sec
     • more than 90% writes
     • 60K queries/sec
     (image: http://www.flickr.com/photos/stars6/4381851322)
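Dividing the two throughput figures on this slide gives the average number of SQL queries behind each API call, assuming both were measured over the same period:

```latex
\frac{60\,000\ \text{queries/s}}{5\,000\ \text{requests/s}} = 12\ \text{queries per request}
```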
  6. July 2010 (stack: HAProxy → Ruby on Rails → MySQL)
     • ~170,000 daily users
     • Plain Ruby on Rails app
     • Persistence 100% SQL
  7. July 2010
     • 1 HAProxy server
     • multiple RoR servers
     • 4 MySQL servers (sharded dataset)
  8. July 2010: slowdown
     (diagram: HAProxy → Ruby on Rails → MySQL)
  9. July 2010: a high queries/request ratio causes the slowdown
     (diagram: HAProxy → Ruby on Rails → MySQL)
  10. Queries/request
     • Which code is triggering extra queries?
     • Why is the ratio lower in our test environment than in live?
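A first step toward both questions is to count the queries each request issues and attribute each one to the code layer that issued it; comparing the same breakdown in test and in live then shows where the two diverge. The Python sketch below is illustrative only: `RequestQueryLog` and the layer labels are hypothetical. In a Rails app, the natural hook would be a subscriber to ActiveSupport's `sql.active_record` notification.

```python
from collections import Counter


class RequestQueryLog:
    """Counts the SQL queries issued while serving one request.

    Each query is attributed to the code layer that issued it. The layer
    labels used below ("app", "plugin") are hypothetical; a real hook would
    sit in the database adapter and derive the label from the call stack.
    """

    def __init__(self) -> None:
        self.by_layer: Counter = Counter()

    def record(self, layer: str, sql: str) -> None:
        # The SQL text is ignored here; a fuller version could also group by statement.
        self.by_layer[layer] += 1

    def total(self) -> int:
        return sum(self.by_layer.values())

    def report(self) -> list:
        # Noisiest layer first, so unexpected sources of queries stand out.
        return self.by_layer.most_common()


log = RequestQueryLog()
log.record("app", "SELECT * FROM users WHERE id = 123")
log.record("plugin", "SELECT * FROM users WHERE id = 123")
log.record("plugin", "SELECT * FROM items WHERE user_id = 123")
print(log.total(), log.report())  # 3 [('plugin', 2), ('app', 1)]
```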
  11. Queries/request: running code of the live system
     (layers: Application, Plugins, Ruby on Rails)
  12. Queries/request: source of extra queries (Plugins)
     • the sharding plugin “breaks” the standard Rails query cache
     • the Flash wire protocol plugin generates extra queries
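The slides do not say exactly how the sharding plugin broke the cache. As a hedged illustration of why it matters, here is a simplified Python model of a per-request query cache and of what happens when a layer sends queries around it. `QueryCache` and `fake_execute` are hypothetical names, not Rails APIs.

```python
class QueryCache:
    """Simplified request-scoped query cache, loosely modeled on Rails' query cache.

    Identical SELECTs within one request are answered from memory; any write
    clears the cache so that later reads see fresh data.
    """

    def __init__(self, execute):
        self._execute = execute  # function that actually sends SQL to the database
        self._cache = {}

    def query(self, sql: str):
        if sql.lstrip().upper().startswith("SELECT"):
            if sql not in self._cache:
                self._cache[sql] = self._execute(sql)
            return self._cache[sql]
        self._cache.clear()  # INSERT/UPDATE/DELETE invalidate every cached result
        return self._execute(sql)


db_hits = []


def fake_execute(sql: str) -> str:
    # Stand-in for a MySQL connection: records each query that reaches the database.
    db_hits.append(sql)
    return f"result of {sql}"


cache = QueryCache(fake_execute)
for _ in range(3):
    cache.query("SELECT * FROM users WHERE id = 123")
print(len(db_hits))  # 1: repeats are served from the cache

# A layer that sends queries straight to a shard connection skips the cache,
# so every repetition reaches the database.
for _ in range(3):
    fake_execute("SELECT * FROM users WHERE id = 123")
print(len(db_hits))  # 4
```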
  13. Plugins
     • Deceptive “feature for free”
     • Might provide the right feature
     • But might not meet scaling needs
  14. Plugins
     • Instant legacy code, even for new projects!
     • Once added, it’s your code
     • Even if it’s maintained, will it follow your needs?
  15. Plugins
     • Assess code quality when you add it
     • Can you afford to maintain/change it?
  16. Plugins
     • We fixed it!
     • Queries cut by up to 40% on some requests
  17. Early August: the MySQL hiccup
     [Chart: query time in ms (0 to 30) from 6:00 to 8:10]
     • every 70 mins, query time spikes x7
  18. Hiccup causes: who is periodically blocking MySQL?
     • Code (app + plugins + Rails)?
     • Some periodic job?
     • The devil (AWS)?
  19. Hiccup quick fix
     • We shard out the most-queried table (40% of all queries)
     (diagram: MySQL servers, shards 1 to 4)
  20. Hiccup quick fix
     • We shard out the most-queried table (40% of all queries)
     (diagram: top table in shards 1 to 4, other tables in shards 1 to 4)
  21. Hiccup quick fix
     • MySQL likes it
     • the “top table” shards will go a long way in the scaling process
     (diagram: top table in shards 1 to 4, other tables in shards 1 to 4)
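The slides don't show the routing code. A minimal Python sketch of the idea follows, assuming rows are partitioned by user id modulo the shard count; the table name and host names are hypothetical.

```python
NUM_SHARDS = 4
TOP_TABLE = "top_table"  # hypothetical name for the most-queried table

# Hypothetical host names: one server group for the top table and one for
# every other table, each split into 4 shards.
TOP_TABLE_HOSTS = [f"mysql-top-{i}" for i in range(1, NUM_SHARDS + 1)]
OTHER_HOSTS = [f"mysql-other-{i}" for i in range(1, NUM_SHARDS + 1)]


def shard_for(user_id: int) -> int:
    # Assumption: rows are partitioned by user id modulo the shard count.
    return user_id % NUM_SHARDS


def host_for(table: str, user_id: int) -> str:
    hosts = TOP_TABLE_HOSTS if table == TOP_TABLE else OTHER_HOSTS
    return hosts[shard_for(user_id)]


print(host_for("top_table", 123))  # mysql-top-4
print(host_for("items", 123))      # mysql-other-4
```

Because the same user maps to the same shard index in both server groups, moving the top table changes which hosts serve its queries without repartitioning the other tables.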
  22. Hiccup causes: who is periodically blocking MySQL?
     • Code (app + plugins + Rails)?
     • Some periodic job?
     • The devil (AWS)?
     None of the above!
  23. Hiccup real cause
     • A MySQL internal behavior that only emerges at high volume
     • MySQL flushes its buffer
     • Under heavy write IO, the flush is blocking
  24. Hiccup solution
     • Percona MySQL patches (XtraDB) avoid the blocking behavior
     • The query time profile becomes smooth
     • Reaching the IO capacity limit now shows up as gradual performance decay
  25. Write-through cache
     • Memcache in front of MySQL
     • Evaluated before sharding
     • Was discarded
     • Because of our read/write ratio
  26. Write-through cache
     90% of the time, we read data in order to modify it
  27. Write-through cache
     This means that 90% of the time we:
       1. read the cache
       2. write the cache
       3. write SQL
  28. Write-through cache: what performance is bound to
     • Read heavy: memcache performance
     • Write heavy: MySQL writes (unless async)
     • Is the write-through library optimized for writes?
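A toy Python model of the read-modify-write pattern makes the trade-off concrete. Plain dicts stand in for memcache and MySQL, and the class and key names are hypothetical; the point is only to count operations.

```python
class WriteThroughCache:
    """Write-through cache: every write goes to the cache and to SQL."""

    def __init__(self) -> None:
        self.cache = {}  # stand-in for memcache
        self.sql = {}    # stand-in for MySQL
        self.ops = {"cache_read": 0, "cache_write": 0, "sql_read": 0, "sql_write": 0}

    def read(self, key):
        self.ops["cache_read"] += 1
        if key in self.cache:
            return self.cache[key]
        self.ops["sql_read"] += 1     # cache miss: fall back to SQL
        value = self.sql.get(key)
        self.ops["cache_write"] += 1  # populate the cache for the next read
        self.cache[key] = value
        return value

    def write(self, key, value) -> None:
        self.ops["cache_write"] += 1
        self.cache[key] = value
        self.ops["sql_write"] += 1    # synchronous SQL write on every update
        self.sql[key] = value


store = WriteThroughCache()
store.write("user:123:coins", 100)
for _ in range(10):
    # The typical game action: read a value only in order to modify it.
    coins = store.read("user:123:coins")
    store.write("user:123:coins", coins + 5)
print(store.ops)  # {'cache_read': 10, 'cache_write': 11, 'sql_read': 0, 'sql_write': 11}
```

The cache removes every SQL read, but each game action still pays a synchronous SQL write, so a write-heavy workload stays bound to MySQL.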
  29. MySQL
     • Sharding SQL is a painful way to scale
     • Data migrations at high load imply downtime
     • ACID benefits are all lost, because of sharding or in the name of performance
  30. Redis
     • A persistent cache
     • Fast: 60,000 qps on AWS hardware
     • Interesting data structures, not only key/value
     • Some small-scale experience in house already
  31. Redis adoption
     • Which data to start from?
     • How do we migrate without downtime?
     • Which library maps Ruby objects to Redis structures?
  32. Redis adoption: which data to start from?
     • Best data fit for Redis hashes
     • The 3rd most-queried table
     • A collection of integer fields that only need increment/decrement
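A table of integer counters maps naturally onto one Redis hash per user: each column becomes a hash field, and each update becomes a single atomic HINCRBY (a negative amount decrements). The sketch below uses a small in-memory stand-in so it runs without a server; the key and field names are hypothetical. With the redis-py client the equivalent calls would be `hincrby(name, key, amount)` and `hgetall(name)`, which return strings or bytes rather than ints.

```python
from collections import defaultdict


class FakeRedisHashes:
    """In-memory stand-in for the two Redis hash commands used here.

    Mirrors HINCRBY and HGETALL semantics: a missing field counts as 0 and
    HINCRBY returns the new value. This is not a Redis client.
    """

    def __init__(self) -> None:
        self._data = defaultdict(dict)

    def hincrby(self, name: str, field: str, amount: int = 1) -> int:
        new_value = self._data[name].get(field, 0) + amount
        self._data[name][field] = new_value
        return new_value

    def hgetall(self, name: str) -> dict:
        return dict(self._data.get(name, {}))


r = FakeRedisHashes()
key = "user:123:stats"        # hypothetical key: one hash per user
r.hincrby(key, "coins", 50)
r.hincrby(key, "coins", -20)  # decrement = negative increment
r.hincrby(key, "energy", 3)
print(r.hgetall(key))  # {'coins': 30, 'energy': 3}
```

Unlike the read-modify-write cycle of a write-through cache, each update here is one command, with no prior read.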
  33. Redis adoption: how do we migrate without downtime?
     • Migrate one user at a time
     • Use a Redis set to keep track of migrated/non-migrated users
     • No downtime, transparent to users
  34. Redis adoption: how do we migrate without downtime?
     (diagram: User 123 → RoR server, with MySQL and Redis behind it)
  35. Redis adoption: how do we migrate without downtime?
     (diagram: the RoR server reads User 123’s original data from MySQL)
  36. Redis adoption: how do we migrate without downtime?
     (diagram: the RoR server writes User 123’s migrated data to Redis)
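The per-user flow on these slides can be sketched in Python as follows. Dicts and a set stand in for MySQL, the Redis hashes, and the Redis set of migrated users; the class and method names are hypothetical. Writing the data before marking the user as migrated means a crash between the two steps only causes a harmless re-migration. The sketch ignores concurrency: two simultaneous first requests for the same user would need a guard such as a lock.

```python
class LazyMigrator:
    """Migrates each user's data from MySQL to Redis on first access."""

    def __init__(self, mysql_rows: dict) -> None:
        self.mysql = mysql_rows  # user_id -> {field: int}: the original SQL data
        self.redis = {}          # user_id -> {field: int}: the migrated hashes
        self.migrated = set()    # stands in for the Redis set of migrated user ids

    def load(self, user_id: int) -> dict:
        if user_id not in self.migrated:
            # First request for this user: read the original data from MySQL,
            row = self.mysql[user_id]
            # write it to Redis,
            self.redis[user_id] = dict(row)
            # and only then mark the user as migrated.
            self.migrated.add(user_id)
        # From now on this user's data is read and updated in Redis only.
        return self.redis[user_id]


m = LazyMigrator({123: {"coins": 30}, 456: {"coins": 7}})
m.load(123)["coins"] += 5
print(m.load(123), sorted(m.migrated))  # {'coins': 35} [123]
```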
  37. Redis adoption: how do we migrate without downtime?
     • The migration might never complete (users who never return are never migrated)
     • SQL + Redis set information are used to generate a final batch migration
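The final batch step reduces to a set difference: every user id known to SQL, minus the ids already in the Redis set. A minimal Python sketch follows, with hypothetical function names and plain collections standing in for both stores; it assumes the lazy path is not migrating users at the same time. With real Redis, the same difference could also be computed server-side with SDIFF, if the SQL ids were first loaded into a temporary set.

```python
def users_left_to_migrate(all_user_ids, migrated) -> list:
    """Users never migrated during the lazy phase.

    `all_user_ids` would come from SQL and `migrated` from the Redis set;
    plain Python collections stand in for both here.
    """
    return sorted(set(all_user_ids) - set(migrated))


def batch_migrate(mysql_rows: dict, redis: dict, migrated: set) -> None:
    # Copy every remaining user in one pass, marking each one as migrated.
    for user_id in users_left_to_migrate(mysql_rows, migrated):
        redis[user_id] = dict(mysql_rows[user_id])
        migrated.add(user_id)


mysql_rows = {1: {"coins": 1}, 2: {"coins": 2}, 3: {"coins": 3}}
redis = {2: {"coins": 9}}  # user 2 was migrated lazily and has played since
migrated = {2}
print(users_left_to_migrate(mysql_rows, migrated))  # [1, 3]
batch_migrate(mysql_rows, redis, migrated)
print(sorted(migrated))  # [1, 2, 3]
```

User 2's newer Redis data is left untouched, because only users missing from the set are copied.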
  38. Redis: 1st result
     • 10% of the query load on the 4 MySQL servers is moved to 1 Redis server
     • Redis server load is 0.05
  39. Redis
     • Becomes the tool to use
     • Migration plan for all write-intensive data
     • Migrate one “class” at a time
  40. End of the Redis honeymoon
     • Memory usage grows faster than the data
     • Snapshots to disk cause spikes in query time
     • Starting new slaves eats memory on the master node
  41. End of the Redis honeymoon: the Russian roulette feeling
     • Redis machines sized with overabundant RAM
     • Rigorous plan for starting slaves/masters
  42. Redis
     • The Redis team acknowledges the persistence/replication problems
     • The Redis 2.4 diskstore plan starts
  43. 1.000.000 and counting...
  44. 1.000.000: painless scaling
     (diagram: HAProxy → Ruby on Rails → Persistence)
  45. 1.000.000
     • HAProxy, Ruby on Rails: just add servers as load grows
     • Persistence
  46. 1.000.000
     • HAProxy, Ruby on Rails
     • Persistence: painful and troublesome
  47. Infrastructure
     • AWS
     • Chef, through Scalarium
     • Ganglia
  48. Thanks ...
  49. wooga is looking for a Business Intelligence Engineer: http://wooga.com/jobs