How to Handle 1,000,000 Daily Users Without Using a Cache (RailsWayCon 2012)

Online games pose a few interesting challenges for their backend: a single user generates one HTTP call every few seconds, and the ratio of reads to writes is close to 50/50, which makes a write-through cache and other common scaling approaches less effective. We started from a rather classic Ruby on Rails application and, as traffic grew, changed it gradually to meet the required performance. When small changes were no longer enough, we turned parts of our data persistence layer inside out, migrating from SQL to NoSQL without any downtime longer than a few minutes. Follow the problems we hit, how we diagnosed them, and how we got around the limitations. See which tools we found useful and what other lessons we learned running the system with a team of just two developers, without a sysadmin or operations team for support.

Published in: Technology, Education


  1. HOW TO HANDLE 1,000,000 DAILY USERS (without using a cache). Jesper Richter-Reichhelm, @jrirei. Tuesday, June 5, 2012
  3. The overall architecture is not that complex: Flash client, Backend
  4. The overall architecture is not that complex. Flash client: game session, asynch. communication
  5. The overall architecture is not that complex. Backend: state changes, validation, persistence
  6. But the scale is interesting: 14 billion requests / month
  8. But the scale is interesting: 14 billion requests / month; >100,000 DB operations / second
  9. But the scale is interesting: 14 billion requests / month; >100,000 DB operations / second; >50,000 DB updates / second
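To put the monthly figure next to the per-second ones, it helps to convert it. The sketch below is back-of-the-envelope arithmetic only: the 30-day month is an assumption, and the result is an average (roughly 5,400 requests per second), while the database figures on the slide may well describe peak load.

```ruby
# Assumption: a 30-day month. All other numbers are taken from the slide.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60 # 2,592,000

requests_per_month    = 14_000_000_000
db_ops_per_second     = 100_000 # the slide says "more than"
db_updates_per_second = 50_000  # the slide says "more than"

# Average request rate over the whole month.
avg_requests_per_second = requests_per_month / SECONDS_PER_MONTH.to_f

# Rough ratio only: it compares an average with a figure that may be a peak.
db_ops_per_request = db_ops_per_second / avg_requests_per_second

# Share of database operations that are writes.
update_share = db_updates_per_second.to_f / db_ops_per_second

puts format("average requests/s: %.0f", avg_requests_per_second)
puts format("DB operations per request: %.1f", db_ops_per_request)
puts format("share of DB operations that are updates: %.0f%%", update_share * 100)
```

The update share of one half is what the talk description means by a read/write balance close to 50/50.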
  10. A journey to 1,000,000 daily users: Start of the journey, 6 weeks of pain, Paradise, Conclusion
  11. October 2009: wooga’s first simulation game
  12. Instead of PHP we used Ruby
  13. Our database was MySQL
  14. Our database was MySQL (even user ids / odd user ids)
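The even/odd split on this slide amounts to choosing a database from the parity of the user id. A minimal sketch, assuming two databases with the hypothetical names `:db_even` and `:db_odd` (the slides do not say how the lookup was implemented):

```ruby
# Hypothetical shard names; the slide only shows "even user ids" and "odd user ids".
SHARDS = { 0 => :db_even, 1 => :db_odd }.freeze

# Pick the database for a user from the parity of the id.
def shard_for(user_id)
  SHARDS.fetch(user_id % 2)
end

puts shard_for(42)
puts shard_for(1337)
```

Because everything a user touches lives on one shard, a request never has to join across databases.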
  15. And we went into the cloud
  16. Master-slave replication for DBs worked fine [diagram: lb, 3 app servers, 2 DBs]
  17. We added a few application servers over time [diagram: lb, 9 app servers, 2 DBs]
  18. 250K daily users and no problems. Life was good [chart: daily users over time]
  19. Life was good and I went on a nice vacation [placeholder: picture of Jesper in a slot canyon]
  21. Our bane: MySQL hiccups [chart]
  24. A journey to 1,000,000 daily users: Start of the journey, 6 weeks of pain, Paradise, Conclusion
  25. SQL queries generated by Rubyamf gem: AMF responses to Flash client
  26. SQL queries generated by Rubyamf gem: AMF responses to Flash client; wrong config... so associated data was included, too
  27. SQL queries generated by Rubyamf gem: AMF responses to Flash client; wrong config... so associated data was included, too => Easy to fix
  28. More traffic using the same cluster [diagram: lb, 9 app servers, 2 DBs]
  29. Config tweaks brought us to 300K DAU [chart: daily users over time, marked "Config fixes"]
  30. ActiveRecord’s checks caused 20% extra DB: checking connection state; MySQL process list full of ‘status’ calls
  31. ActiveRecord’s checks caused 20% extra DB: checking connection state; MySQL process list full of ‘status’ calls => Fixed by 1 line of code
  32. I/O on MySQL masters still was the bottleneck. New Relic: 60% of all UPDATEs on ‘tiles’ table
  33. Tiles are part of the core game loop. Core game loop: 1) plant 2) wait 3) harvest
  34. We started to shard on model, too. Adding new shards [diagram: old master, old slave]
  35. We started to shard on model, too. Adding new shards: 1) Set up new masters as slaves of old ones [diagram: old master, old slave, new master]
  36. We started to shard on model, too. Adding new shards: 1) Set up new masters [diagram: old master, old slave, new master, new slave]
  37. We started to shard on model, too. Adding new shards: 1) Set up new masters 2) Start using new masters
  38. We started to shard on model, too. Adding new shards: 1) Set up new masters 2) Start using new masters 3) Cut replication
  39. We started to shard on model, too. Adding new shards: 1) Set up new masters 2) Start using new masters 3) Cut replication 4) Truncate
  40. 4 DB masters and a few more servers [diagram: lb, app servers, 4 DBs (2 labeled "tiles")]
  41. Sharding by model brought us to 400K DAU [chart: daily users over time, marked "Shard by model"]
  42. We improved our MySQL setup: RAID-0 of EBS volumes
  43. We improved our MySQL setup: RAID-0 of EBS volumes; using XtraDB
  44. We improved our MySQL setup: RAID-0 of EBS volumes; using XtraDB; tweaking my.cnf
  45. Sharding gem circumvented AR’s internal cache: ActiveRecord caches SQL queries...
  46. Sharding gem circumvented AR’s internal cache: ActiveRecord caches SQL queries... only in our development environment!
  47. Sharding gem circumvented AR’s internal cache: ActiveRecord caches SQL queries... only in our development environment! => Fixed by 2 lines of code
  48. I/O still was not fast enough: If 2 + 2 is not enough, ...
  49. I/O still was not fast enough: If 2 + 2 is not enough, ... perhaps 4 + 4 masters will do?
  50. It’s no fun to handle 8+8 MySQL DBs [diagram: lb, app servers, 4 DBs (2 labeled "tiles")]
  51. It’s no fun to handle 8+8 MySQL DBs [diagram: lb, app servers, 8 DBs (4 labeled "tiles")]
  52. At 500K DAU we were at a dead end [chart: daily users over time]
  54. I/O remained the bottleneck for MySQL UPDATEs: each DB master could do about 1000 DB writes/s.
  55. I/O remained the bottleneck for MySQL UPDATEs: each DB master could do about 1000 DB writes/s. That’s not enough!
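A quick calculation shows why adding more masters was a dead end. The sketch below reads "8+8" as 8 masters plus 8 slaves, which is an assumption, and it borrows the ">50,000 DB updates / second" figure from the opening slides, which describes the scale at the time of the talk rather than at 500K DAU.

```ruby
writes_per_master  = 1_000  # "about 1000 DB writes/s" per master (slide)
updates_per_second = 50_000 # ">50,000 DB updates / second" (opening slides)
masters_in_cluster = 8      # assumption: "8+8" means 8 masters + 8 slaves

# Upper bound on write throughput of the existing cluster.
cluster_capacity = masters_in_cluster * writes_per_master

# Masters required if MySQL alone had to absorb the later update rate.
masters_needed = (updates_per_second.to_f / writes_per_master).ceil

puts "capacity of #{masters_in_cluster} masters: #{cluster_capacity} writes/s"
puts "masters needed for #{updates_per_second} updates/s: #{masters_needed}"
```

Under these assumptions the existing cluster tops out at 8,000 writes per second, and reaching the later load with MySQL alone would have meant about 50 masters, each with its own slave.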
  56. Pick the right tool for the job!
  58. Redis is fast but goes beyond simple key/value: Redis is a key-value store; Hashes, Sets, Sorted Sets, Lists; atomic operations like set, get, increment
  59. Redis is fast but goes beyond simple key/value: Redis is a key-value store; Hashes, Sets, Sorted Sets, Lists; atomic operations like set, get, increment; 50,000 transactions/s on EC2; writes are as fast as reads
  60. We could learn from another team using Redis
  62. Shelf tiles: an ideal candidate for Redis. Shelf tiles: { plant1 => 184, plant2 => 141, plant3 => 130, plant4 => 112, … }
  63. Shelf tiles: an ideal candidate for using Redis. Redis Hash: HGETALL, HGET, HSET, HINCRBY, …
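The shelf-tile counters map naturally onto one Redis hash per user: one field per plant, changed with atomic increments. The sketch below uses a small in-memory stand-in so that it runs without a Redis server. `FakeRedis` and the key layout `user:42:shelf_tiles` are made up for illustration; the method names mirror the redis-rb client's `hset`, `hincrby`, and `hgetall`.

```ruby
# In-memory stand-in for three Redis hash commands (illustration only).
class FakeRedis
  def initialize
    @hashes = Hash.new { |store, key| store[key] = {} }
  end

  # HSET key field value (Redis stores hash values as strings)
  def hset(key, field, value)
    @hashes[key][field.to_s] = value.to_s
  end

  # HINCRBY key field increment: returns the new integer value
  def hincrby(key, field, increment)
    new_value = @hashes[key].fetch(field.to_s, "0").to_i + increment
    @hashes[key][field.to_s] = new_value.to_s
    new_value
  end

  # HGETALL key: every field with its value, as strings
  def hgetall(key)
    @hashes[key].dup
  end
end

redis = FakeRedis.new
key = "user:42:shelf_tiles" # hypothetical key layout

{ "plant1" => 184, "plant2" => 141 }.each do |plant, count|
  redis.hset(key, plant, count)
end

redis.hincrby(key, "plant1", -1) # one plant1 tile taken from the shelf
redis.hincrby(key, "plant3", 1)  # first plant3 tile put on the shelf

p redis.hgetall(key)
```

With a real server, each call is a single round trip that touches only one field, instead of a read followed by an UPDATE on a table row.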
  64. On-demand migrations from MySQL to Redis
  72. Typical migration throughput over 3 days
  73. Migrate on the fly - and clean up later: 1. Let migration run until everything cools down
  74. Migrate on the fly - and clean up later: 1. Let migration run until everything cools down; 2. Migrate the rest manually
  75. Migrate on the fly - and clean up later: 1. Let migration run until everything cools down; 2. Migrate the rest manually; 3. Remove migration code
  76. Migrate on the fly - and clean up later: 1. Let migration run until everything cools down; 2. Migrate the rest manually; 3. Remove migration code; 4. Wait until no fallback necessary
  77. Migrate on the fly - and clean up later: 1. Let migration run until everything cools down; 2. Migrate the rest manually; 3. Remove migration code; 4. Wait until no fallback necessary; 5. Remove SQL table
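The migration slides are diagrams, so their exact mechanics are not in this transcript. One plausible reading of "on-demand" is: look in Redis first, copy the user's data over from MySQL on a miss, and leave the SQL rows in place as a fallback. The sketch below shows that reading with plain Hashes standing in for both stores; every name in it is hypothetical.

```ruby
# Plain Hashes stand in for the two stores (assumption, for illustration).
MYSQL_TILES = { 1 => { "plant1" => 3 }, 2 => { "plant2" => 5 } }
REDIS_TILES = {}

# Read path: prefer Redis; on a miss, copy the user's data from MySQL once.
# Users unknown to MySQL simply start with an empty set of tiles.
def load_tiles(user_id)
  REDIS_TILES[user_id] ||= (MYSQL_TILES[user_id] || {}).dup
end

# Write path: only Redis changes. The MySQL row stays untouched and can
# serve as a fallback until the table is finally removed.
def add_tile(user_id, plant)
  tiles = load_tiles(user_id)
  tiles[plant] = tiles.fetch(plant, 0) + 1
end

# Step 2 of the cleanup: migrate the users who never showed up on their own.
def migrate_remaining
  (MYSQL_TILES.keys - REDIS_TILES.keys).each { |user_id| load_tiles(user_id) }
end

add_tile(1, "plant1") # user 1 is migrated by simply playing
migrate_remaining     # user 2 is migrated by the cleanup job

p REDIS_TILES
```

Active players spread the migration load over their own sessions, which fits the throughput curve cooling down after a few days; the manual pass only has to deal with the inactive remainder.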
  78. A journey to 1,000,000 daily users: Start of the journey, 6 weeks of pain, Paradise (or not?), Conclusion
  79. Again: Tiles are part of the core game loop. Core game loop: 1) plant 2) wait 3) harvest
  80. Size matters for migrations
  81. Size matters for migrations: migration check overload
  82. Size matters for migrations: migration check overload; migration only on startup
  83. Size matters for migrations: migration check overload; migration only on startup; overlooked an edge case
  84. Size matters for migrations: migration check overload; migration only on startup; overlooked an edge case; only migrate 1% of users
  85. Size matters for migrations: migration check overload; migration only on startup; overlooked an edge case; only migrate 1% of users; continue if everything is ok
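Migrating only 1% of users first can be done with a deterministic bucket per user id. This is a minimal sketch of that idea, not the code from the talk; `MIGRATION_PERCENT` and `migrate_user?` are hypothetical names.

```ruby
# Hypothetical knob: percentage of users allowed to migrate right now.
MIGRATION_PERCENT = 1

# Deterministic bucket (0..99) per user, so users already selected stay
# selected when the percentage is raised later.
def migrate_user?(user_id, percent = MIGRATION_PERCENT)
  user_id % 100 < percent
end

selected = (1..10_000).count { |user_id| migrate_user?(user_id) }
puts "#{selected} of 10000 users selected at #{MIGRATION_PERCENT}%"
```

Raising the percentage step by step continues the rollout once the first group has run without problems.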
  86. In-memory DBs don’t like dumping to disk
  87. In-memory DBs don’t like dumping to disk: SAVE is blocking
  88. In-memory DBs don’t like dumping to disk: SAVE is blocking; BGSAVE needs free RAM
  89. In-memory DBs don’t like dumping to disk: SAVE is blocking; BGSAVE needs free RAM; latency increase by 100%
  90. In-memory DBs don’t like dumping to disk: SAVE is blocking; BGSAVE needs free RAM; latency increase by 100% => BGSAVE on slaves every 15 minutes
  91. Redis replication starts with a BGSAVE: starting up a new slave by replication
  92. Redis replication starts with a BGSAVE: starting up a new slave by replication; BGSAVE on master
  93. Redis replication starts with a BGSAVE: starting up a new slave by replication; BGSAVE on master; slave imports dumped file
  94. Redis replication starts with a BGSAVE: starting up a new slave by replication; BGSAVE on master; slave imports dumped file => No RAM means no new slaves
  95. Redis had a memory fragmentation problem [chart labels: 24 GB, 44 GB in 8 days]
  96. Redis had a memory fragmentation problem [chart labels: 24 GB, 38 GB in 3 days]
  97. Redis had a memory fragmentation problem [chart labels: 24 GB, 38 GB in 3 days]. Fixed in v2.2
  98. If MySQL is a truck: fast enough, disk based, robust
  99. If MySQL is a truck, Redis is a race car: super fast, RAM based, fragile
  100. Big and static data in MySQL, rest goes to Redis. MySQL: 256 GB data, 10% writes. Redis: 60 GB data, 50% writes. http://www.flickr.com/photos/erix/245657047/
  101. Lots of boxes, but automation helps a lot! [diagram: 2 lb, app servers, 5 db, 5 redis]
  102. We reached 1 million daily users! 1,000,000 - Big party! [chart: daily users over time]
  103. We started archiving inactive users: 50% DB reduction [chart: daily users over time]
  104. We even survived a complete data center loss: "EBS no more!" [chart: daily users over time]
  105. We improved our MySQL schema on-the-fly: 30% DB reduction [chart: daily users over time]
  106. Meanwhile we have more than 2M daily users [chart: daily users over time]
  107. A journey to 1,000,000 daily users: Start of the journey, 6 weeks of pain, Paradise (or not?), Conclusion
  108. Evolution every week: EVOLUTION of software [chart: daily users over time]
  110. Evolution every week: REVOLUTION of software [chart: daily users over time]
  113. Works for teams ...
  114. Each new game is a revolution
  119. Works for teams and for companies
  120. Questions? Jesper Richter-Reichhelm, @jrirei, slideshare.net/wooga, wooga.com/jobs
