1,000,000 daily users and no cache (Splash 2011)

Online games pose a few interesting challenges for their backends: a single user generates one HTTP call every few seconds, and the balance between data reads and writes is close to 50/50, which makes a write-through cache and other common scaling approaches less effective.

Starting from a rather classic Ruby on Rails application, we gradually changed it as traffic grew in order to meet the required performance. When small changes were no longer enough, we turned parts of our data persistence layer inside out, migrating from SQL to NoSQL without any downtime longer than a few minutes.

Follow the problems we hit, how we diagnosed them, and how we got around limitations. See which tools we found useful and which other lessons we learned by running the system with a team of just two developers, without a sysadmin or operations team as support.

Comments

  • At peak we handle about 8,000 HTTP requests per second, which generate 100,000 DB operations per second (of which 50,000 are updates). Most HTTP requests generate only 2-3 DB updates, but a few very complex requests cause dozens of updates, hence the high ratio.
  • Page 12 says you get 14 billion requests/month, which equals 4,000 requests per second, while there are 5000 DB updates per second. These two numbers do not seem to fit together.
  • Re myst1313: We only serve the landing page using SSL (400 million hits per month). We just added another load balancer to handle the load. Our Flash client does not use SSL to call our API, so we do not have a problem there.
  • Re Алик Нематов: Redis is a NoSQL solution that matches our requirements. For other projects we also use CouchDB and Riak. We have experimented with Cassandra, too. Nice product, but Redis was better suited for this game.
  • Hi! Very nice. You never talk about HTTPS. Facebook has required SSL for access to apps since the beginning of October, and I think SSL creates a big bottleneck because SSL negotiation takes a lot of time (at least for the first requests). How do you handle this and how does it impact the load? Just curious to see your graphs after October.

Presentation Transcript

  • Who is that guy? Jesper Richter-Reichhelm, Twitter: @jrirei, Head of Engineering, wooga, Berlin, Germany
  • wooga is #3 game developer on Facebook
  • Wooga has dedicated game teams. Coming soon
  • Flash client sends state changes to backend (Flash client, Ruby backend)
  • Social games need to scale quite a bit: 400 million PIs / month; 14 billion requests / month; 100,000 DB operations / second; 50,000 DB updates / second; no cache
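To put the monthly figures on this slide next to the per-second ones, here is a minimal sketch of the conversion. It assumes a 30-day month and evenly spread traffic, neither of which the slides state, so the results are averages and say nothing about peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(total_per_month: float, days_in_month: int = 30) -> float:
    """Average rate per second, assuming traffic is spread evenly over the month."""
    return total_per_month / (days_in_month * SECONDS_PER_DAY)


# Monthly totals taken from the slide; the 30-day month is an assumption.
requests_per_second = average_per_second(14_000_000_000)
page_impressions_per_second = average_per_second(400_000_000)

print(f"requests: ~{requests_per_second:,.0f}/s on average")
print(f"page impressions: ~{page_impressions_per_second:,.0f}/s on average")
```

Peak load is higher than such an average, which is why the per-second figures on the slide cannot be derived from the monthly totals alone.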
  • A journey to 1,000,000 daily users: Start of the journey; 6 weeks of pain; Paradise; Conclusion
  • October 2009: wooga’s first simulation game
  • Instead of PHP we used Ruby
  • Our database was MySQL: even user ids, odd user ids
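The even/odd split on this slide can be sketched as a simple modulo on the user id. This is only an illustration in Python (the original backend was Ruby), and the shard names are hypothetical; the slides do not show how the routing was implemented.

```python
# Hypothetical shard names; the slide only says "even user ids" and "odd user ids".
SHARDS = ["mysql_even", "mysql_odd"]


def shard_for_user(user_id: int) -> str:
    """Pick a database shard from the parity of the user id."""
    return SHARDS[user_id % len(SHARDS)]


print(shard_for_user(42))
print(shard_for_user(7))
```

Keying on the user id keeps all rows of one user on the same server, so a single request never has to talk to both shards.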
  • And we went into the cloud
  • Master-slave replication for DBs worked fine (diagram: load balancer, app servers, 2 DBs)
  • We added a few application servers over time (diagram: load balancer, more app servers, 2 DBs)
  • 250K daily users and no problems. Life was good (chart: daily users over time)
  • Life was good and I went on a nice vacation (picture: Jesper in a slot canyon)
  • Our bane: MySQL hiccups (chart)
  • A journey to 1,000,000 daily users: Start of the journey; 6 weeks of pain; Paradise; Conclusion
  • SQL queries generated by Rubyamf gem: AMF responses to Flash client. Wrong config, so associated data was included, too => Easy to fix
  • More traffic using the same cluster (diagram: load balancer, app servers, 2 DBs)
  • Config tweaks brought us to 300K DAU (chart: daily users over time, annotated "Config fixes")
  • ActiveRecord’s checks caused 20% extra DB: checking connection state; MySQL process list full of ‘status’ calls => Fixed by 1 line of code
  • I/O on MySQL masters still was the bottleneck. New Relic: 60% of all UPDATEs on ‘tiles’ table
  • Tiles are part of the core game loop. Core game loop: 1) plant 2) wait 3) harvest
  • We started to shard on model, too. Adding new shards: 1) Set up new masters as slaves of old ones 2) Start using new masters 3) Cut replication 4) Truncate (diagram: old master, old slave, new master, new slave)
  • 4 DB masters and a few more servers (diagram: load balancer, app servers, 2 tiles DBs, 4 DBs)
  • Sharding by model brought us to 400K DAU (chart: daily users over time, annotated "Shard by model")
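Sharding on model in addition to user id can be pictured as a two-level lookup: first pick the group of masters that owns the model, then pick a master inside that group by user id. The sketch below is a Python illustration under that assumption; the shard names and the routing table are hypothetical and not taken from the talk.

```python
from typing import Dict, List

# Hypothetical names: the diagrams show separate "tiles" masters next to the generic "db" masters.
SHARDS_BY_MODEL: Dict[str, List[str]] = {
    "tiles": ["tiles_even", "tiles_odd"],
}
DEFAULT_SHARDS: List[str] = ["db_even", "db_odd"]


def shard_for(model: str, user_id: int) -> str:
    """Route by model first, then by user id parity within that model's shard group."""
    shards = SHARDS_BY_MODEL.get(model, DEFAULT_SHARDS)
    return shards[user_id % len(shards)]


print(shard_for("tiles", 42))
print(shard_for("users", 42))
```

Moving the write-heavy table to its own masters takes its UPDATE load off the servers that hold everything else.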
  • We improved our MySQL setup: RAID-0 of EBS volumes; using XtraDB; tweaking my.cnf
  • Sharding gem circumvented AR’s internal cache: ActiveRecord caches SQL queries... only in our development environment! => Fixed by 2 lines of code
  • I/O still was not fast enough: If 2 + 2 is not enough, perhaps 4 + 4 masters will do?
  • It’s no fun to handle 8+8 MySQL DBs (diagram: load balancer, app servers, 4 tiles DBs, 8 DBs)
  • At 500K DAU we were at a dead end (chart: daily users over time)
  • I/O remained bottleneck for MySQL UPDATEs: Each DB master could do about 1000 DB writes/s. That’s not enough!
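A rough back-of-the-envelope check shows why this was a dead end. The sketch combines the roughly 1000 writes/s per master from this slide with the 50,000 DB updates/s quoted at the start of the talk; that second figure describes a later load level, so pairing the two is an assumption made only for illustration.

```python
import math


def masters_needed(updates_per_second: int, writes_per_master: int = 1000) -> int:
    """Number of write masters needed if each one sustains `writes_per_master` writes/s."""
    return math.ceil(updates_per_second / writes_per_master)


# 8 masters at about 1000 writes/s each give roughly this much write capacity.
capacity_with_8_masters = 8 * 1000

print(capacity_with_8_masters)
print(masters_needed(50_000))
```

Each master also needs a slave, so the server count would grow twice as fast as the master count.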
  • Pick the right tool for the job!
  • Redis is fast but goes beyond simple key/value: Redis is a key-value store; Hashes, Sets, Sorted Sets, Lists; atomic operations like set, get, increment; 50,000 transactions/s on EC2; writes are as fast as reads
  • Wooga has dedicated game teams
  • Shelf tiles: An ideal candidate for using Redis. Shelf tiles: { plant1 => 184, plant2 => 141, plant3 => 130, plant4 => 112, … }. Redis Hash: HGETALL, HGET, HSET, HINCRBY, …
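To show how the shelf-tile counts map onto hash operations, here is a small self-contained sketch. `InMemoryHashStore` is a hypothetical stand-in that imitates the behaviour of the HSET, HGET, HGETALL and HINCRBY commands named on the slide; it is not a Redis client, and the key layout is an assumption.

```python
from typing import Dict, Optional


class InMemoryHashStore:
    """Hypothetical stand-in imitating a few Redis hash commands (real Redis stores strings)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, int]] = {}

    def hset(self, key: str, field: str, value: int) -> None:
        self._hashes.setdefault(key, {})[field] = value

    def hget(self, key: str, field: str) -> Optional[int]:
        return self._hashes.get(key, {}).get(field)

    def hgetall(self, key: str) -> Dict[str, int]:
        return dict(self._hashes.get(key, {}))

    def hincrby(self, key: str, field: str, amount: int) -> int:
        fields = self._hashes.setdefault(key, {})
        fields[field] = fields.get(field, 0) + amount
        return fields[field]


store = InMemoryHashStore()
key = "user:42:shelf_tiles"  # hypothetical key layout, not from the talk
store.hset(key, "plant1", 184)
store.hset(key, "plant2", 141)

# Using up one plant1 tile is a single increment by -1 on one field of the hash.
new_count = store.hincrby(key, "plant1", -1)
print(new_count, store.hgetall(key))
```

One hash per user keeps all counters of that user under a single key, and each change touches only one field.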
  • Migrate on the fly when accessing new model
  • Migrate on the fly - but only once: true if id could be added, else false
  • Typical migration throughput over 3 days (chart)
  • Migrate on the fly - and clean up later: 1. Let migration run until everything cools down 2. Migrate the rest manually 3. Remove migration code 4. Wait until no fallback necessary 5. Remove SQL table
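The "migrate on the fly, but only once" idea can be sketched as a guarded lazy copy: the first access to a user's data copies it from the old store to the new one, and a set of already migrated ids prevents a second copy. The Python below uses plain dictionaries as stand-ins for the SQL table and for Redis. All names are hypothetical, and the slides do not show which Redis command provided the "true if id could be added" check; a set add that reports whether the member was new is one assumption that fits the wording.

```python
from typing import Dict, Set

# Stand-ins for the old SQL table and the new Redis store (illustration only).
sql_tiles: Dict[int, Dict[str, int]] = {7: {"plant1": 184, "plant2": 141}}
redis_tiles: Dict[int, Dict[str, int]] = {}
migrated_ids: Set[int] = set()


def add_if_absent(ids: Set[int], user_id: int) -> bool:
    """True if the id could be added, else False (mirrors the wording on the slide)."""
    if user_id in ids:
        return False
    ids.add(user_id)
    return True


def load_tiles(user_id: int) -> Dict[str, int]:
    """Return the user's tiles from the new store, migrating them on first access."""
    if add_if_absent(migrated_ids, user_id):
        # First access: copy the data over from the old store exactly once.
        redis_tiles[user_id] = dict(sql_tiles.get(user_id, {}))
    return redis_tiles[user_id]


first = load_tiles(7)
second = load_tiles(7)
print(first, second, migrated_ids)
```

Once the set of migrated ids covers every user, the guard and the copy step can be deleted, which corresponds to steps 3 to 5 of the clean-up list.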
  • A journey to 1,000,000 daily users: Start of the journey; 6 weeks of pain; Paradise (or not?); Conclusion
  • Again: Tiles are part of the core game loop. Core game loop: 1) plant 2) wait 3) harvest
  • Size matters for migrations. Migration check overload: migration only on startup. Overlooked an edge case: only migrate 1% of users, continue if everything is ok
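Migrating only 1% of users first, and continuing if everything is ok, amounts to a percentage gate. A minimal sketch follows; the bucketing by `user_id % 100` is an assumption, since the talk does not say how the 1% were selected.

```python
def in_rollout(user_id: int, percent: int) -> bool:
    """Place each user in one of 100 buckets and enable the first `percent` buckets."""
    return user_id % 100 < percent


# Start with 1% of users; raise `percent` step by step once no problems show up.
enabled_at_1_percent = sum(in_rollout(user_id, 1) for user_id in range(100_000))
print(enabled_at_1_percent)
```

Because the bucket of a user never changes, everyone enabled at 1% stays enabled when the percentage is raised, so no user is moved back to the old path.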
  • In-memory DBs don’t like to dump to disk. Dumping to disk: SAVE is blocking; BGSAVE needs free RAM; latency increase by 100% => BGSAVE on slaves every 15 minutes
  • Redis replication starts with a BGSAVE: BGSAVE on master; slave imports dumped file => No RAM means no new slaves
  • Redis had a memory fragmentation problem (chart labels: 44 GB, in 8 days, 24 GB)
  • Redis had a memory fragmentation problem (chart labels: 38 GB, in 3 days, 24 GB)
  • If MySQL is a truck: fast enough, disk based, robust
  • If MySQL is a truck, Redis is a race car: super fast, RAM based, fragile
  • Big and static data in MySQL, rest goes to Redis. MySQL: 256 GB data, 10% writes. Redis: 60 GB data, 50% writes. Photo: http://www.flickr.com/photos/erix/245657047/
  • Lots of boxes, but automation helps a lot! (diagram: 2 load balancers, app servers, 5 DBs, 5 Redis servers)
  • We reached 1 million daily users! 1,000,000 - Big party! (chart: daily users over time)
  • We started archiving inactive users: 50% DB reduction (chart: daily users over time)
  • We even survived a complete data center loss: EBS no more! (chart: daily users over time)
  • We improved our MySQL schema on-the-fly: 30% DB reduction (chart: daily users over time)
  • Will we reach 2 million daily users? (chart: daily users over time)
  • A journey to 1,000,000 daily users: Start of the journey; 6 weeks of pain; Paradise (or not?); Conclusion
  • You do not know the future: Plan ahead; Learn; Adapt
  • Evolution every week: EVOLUTION of software (chart: daily users over time)
  • Evolution every week, Revolution if necessary: REVOLUTION of software (chart: daily users over time)
  • Each new game is a revolution. Coming soon
  • Works for teams and for companies
  • Thank you! Jesper Richter-Reichhelm, @jrirei, slideshare.net/wooga, wooga.com/jobs