Getting real with Erlang: From the idea to a live system

From 0 to 1,000,000 daily users with Erlang

This talk sums up our experience building and running in production our first social game server written entirely in Erlang.

The game has now been published, and it's time to reflect on the results achieved. In this talk we'll describe the details of our architecture, the principles that drove its design, and considerations about hosting the infrastructure.

We'll go through in detail what worked well and what needs to be improved, all backed by data from our live system.
Come and follow us if you want a taste of real-world Erlang.

Published in: Technology, News & Politics
  • I love Wooga slideshares. It's like a game for programmers. You've got the slides but not the audio, so you have to infer the missing info by deepening your knowledge of system architectures and cross-referencing several presentations. Very educational, in a fun way! Though I still wish we had all the audio... ;)
  • do you make parallel replicas of your session processes?

Getting real with Erlang: From the idea to a live system

  1. Getting real with Erlang: From the idea to a live system. Knut Nesheim (@knutin), Paolo Negri (@hungryblank). Thursday, November 3, 2011
  2. Social Games: Flash client (game), HTTP API. Image: http://www.flickr.com/photos/theplanetdotcom/4879421344
  3. Social Games, HTTP API • @ 1 000 000 daily users • 5000 HTTP reqs/sec • more than 90% writes
  4. November 2010 • At the Erlang user group! • Learn where/how Erlang is used • 0 lines of Erlang code across the entire @wooga code base
  5. Why look into Erlang? HTTP API • @ 1 000 000 daily users • 5000 HTTP reqs/sec • around 60000 queries/sec
  6. 60000 qps • Most maintenance effort goes into databases • Mix of SQL / NoSQL • Already using in-RAM data stores (Redis) • RAM DBs are fast but expensive / high maintenance
  7. Social games data • User data is self-contained • Strong hot/cold pattern (gaming session) • Heavy write load (caching is ineffective)
  8. User session vs. user data: (1) start session: load all; (2) game actions: read/update many times; (3) end session: data becomes cold
  9. User session vs. Erlang process: (1) start session: start (load state); (2) game actions: responds to messages (use state); (3) end session: stop (save state)
  10. User session DB usage, stateful server (Erlang) vs. stateless server: start session: load user state (both); game actions: many queries (stateless only); end session: save user state (stateful only)
  11. Erlang process • Follows session lifecycle • Contains and defends state (data) • Acts as a lock/serializer (one message at a time)
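The session lifecycle on slides 9 to 11 can be sketched as a small gen_server. This is only an illustration under stated assumptions, not the wooga implementation: the module name `session_sketch`, the `Load`/`Save` funs (stand-ins for whatever datastore is used) and the map-based state are all hypothetical, and maps require a modern OTP release.

```erlang
-module(session_sketch).
-behaviour(gen_server).

-export([start/2, action/2, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

%% Load :: fun((UserId) -> GameState), Save :: fun((UserId, GameState) -> any()).
%% Both are hypothetical stand-ins for the real datastore calls.
start(UserId, {Load, Save}) ->
    gen_server:start(?MODULE, {UserId, Load, Save}, []).

%% Fun :: fun((GameState) -> {Reply, NewGameState}).
action(Pid, Fun) ->
    gen_server:call(Pid, {action, Fun}).

stop(Pid) ->
    gen_server:stop(Pid).

%% 1. start session: load the whole user state into the process.
init({UserId, Load, Save}) ->
    {ok, #{user => UserId, save => Save, state => Load(UserId)}}.

%% 2. game actions: the mailbox serializes requests, so only one action
%% at a time ever touches the state (the "lock/serializer" role).
handle_call({action, Fun}, _From, S = #{state := GameState}) ->
    {Reply, NewGameState} = Fun(GameState),
    {reply, Reply, S#{state := NewGameState}}.

handle_cast(_Msg, S) ->
    {noreply, S}.

%% 3. end session: write the state back once.
terminate(_Reason, #{user := UserId, save := Save, state := GameState}) ->
    Save(UserId, GameState),
    ok.
```

With this shape the datastore is touched twice per session (load and save), however many game actions arrive in between.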
  12. December 2010 • Prototype: Erlang process = user session [diagram: user1 through user8, userN]
  13. January 2011 • Erlang team goes from 1 to 2 developers. Open topics • Distribution/clustering • Error handling • Deployments • Operations
  14. Architecture
  15. Architecture goals • Move data into the application server • Be as simple as possible • Graceful degradation when DBs go down • Easy to inspect and repair state of cluster • Easy to add more machines for scaling out
  16. [Diagram: three Sessions]
  17. [Diagram: one Worker, three Sessions]
  18. [Diagram: three Workers, nine Sessions]
  19. [Diagram: three Workers, seven Sessions, two Coordinators]
  20. [Diagram: three Workers, seven Sessions, two Coordinators, Lock]
  21. [Diagram: three Workers, seven Sessions, two Coordinators, Lock, DBs]
  22. New user comes online [Diagram: three Workers, seven Sessions, Coordinator, Lock, DBs]
  23. New user comes online [Diagram labels: Flash calls "setup"]
  24. New user comes online [Diagram labels: Flash calls "setup"; Coordinator: session:start(Uid) on suitable worker]
  25. New user comes online [Diagram labels: Flash calls "setup"; Coordinator: session:start(Uid) on suitable worker; s3:get(Uid)]
  26. New user comes online [Diagram labels: Flash calls "setup"; lock:acquire(Uid) (test-and-set); Coordinator: session:start(Uid) on suitable worker; s3:get(Uid)]
  27. New user comes online [Diagram labels: Flash calls "setup"; lock:acquire(Uid) (test-and-set); Coordinator: session:start(Uid) on suitable worker; s3:get(Uid); game actions from Flash]
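The "new user comes online" build-up names three calls: lock:acquire(Uid) as a test-and-set, s3:get(Uid), and session:start(Uid). The sketch below shows one plausible ordering of those steps. Everything in it is an assumption made for illustration: the slides do not show how the Lock service is implemented, so a local ETS table with `ets:insert_new/2` stands in for it, a `Fetch` fun stands in for s3:get/1, and a bare process loop stands in for the session.

```erlang
-module(login_sketch).
-export([new_lock/0, acquire/2, release/2, setup/3]).

%% Hypothetical local stand-in for the Lock service on the slides.
new_lock() ->
    ets:new(session_locks, [set, public]).

%% Test-and-set: ets:insert_new/2 only inserts if the key is absent,
%% so at most one caller can win the lock for a given user id.
acquire(Locks, Uid) ->
    case ets:insert_new(Locks, {Uid, self()}) of
        true  -> ok;
        false -> {error, already_locked}
    end.

release(Locks, Uid) ->
    ets:delete(Locks, Uid),
    ok.

%% Fetch :: fun((Uid) -> State), a stand-in for s3:get/1.
setup(Locks, Uid, Fetch) ->
    case acquire(Locks, Uid) of
        ok ->
            State = Fetch(Uid),
            {ok, spawn(fun() -> loop(State) end)};
        {error, _} = Error ->
            Error
    end.

%% Minimal session loop, just enough to show the state lives in the process.
loop(State) ->
    receive
        {get, From} -> From ! {state, State}, loop(State);
        stop        -> ok
    end.
```

The point of the test-and-set is that a second "setup" for the same user cannot start a second session holding a competing copy of the state. A real version would also need to release the lock if fetching the state fails, which this sketch leaves out.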
  28. User goes offline (session times out) [Diagram labels: gen_server timeout; lock:release/1; s3:put/1; three Workers, seven Sessions, Coordinator, Lock, DBs]
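The "gen_server timeout" label on slide 28 refers to a standard OTP feature: a callback may return a timeout in milliseconds, and if no message arrives within that time the server receives `timeout` in handle_info/2. A minimal sketch of an idle-session shutdown built on it follows. The module name and the `OnStop` fun (a stand-in for the s3:put/1 and lock:release/1 calls on the slide) are hypothetical.

```erlang
-module(idle_session_sketch).
-behaviour(gen_server).

-export([start/3, action/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% OnStop :: fun((Uid, ActionCount) -> any()), a stand-in for saving the
%% state and releasing the lock when the session ends.
start(Uid, IdleMs, OnStop) ->
    gen_server:start(?MODULE, {Uid, IdleMs, OnStop}, []).

action(Pid) ->
    gen_server:call(Pid, action).

%% Returning a timeout from every callback re-arms the idle timer.
init({Uid, IdleMs, OnStop}) ->
    {ok, #{uid => Uid, idle => IdleMs, on_stop => OnStop, actions => 0}, IdleMs}.

handle_call(action, _From, S = #{idle := IdleMs, actions := N}) ->
    {reply, N + 1, S#{actions := N + 1}, IdleMs}.

handle_cast(_Msg, S = #{idle := IdleMs}) ->
    {noreply, S, IdleMs}.

%% No message for IdleMs milliseconds: the user is considered offline.
handle_info(timeout, S) ->
    {stop, normal, S};
handle_info(_Other, S = #{idle := IdleMs}) ->
    {noreply, S, IdleMs}.

terminate(_Reason, #{uid := Uid, on_stop := OnStop, actions := N}) ->
    OnStop(Uid, N),
    ok.
```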
  29. Implementation of game logic
  30. Dream game logic: Fast + Safe
  31. Dream game logic [image only]
  32. Dream game logic • We want high throughput (for scaling) – Try to spend as little CPU time as possible – Avoid heavy computation – Try to avoid creating garbage • ...and simple and testable logic (correctness) – Functional game logic makes thinking about code easy – Single entry point, gets "request" and game state – Code for happy case, roll back on game-related exception
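The last three points on slide 32 (functional logic, a single entry point taking the request and the game state, rollback on game-related exceptions) fit together naturally in Erlang because data is immutable. The sketch below is an assumed shape, not the real game code: the `{buy, Item, Price}` request, the `{game_error, Reason}` exception term and the map layout are invented for illustration.

```erlang
-module(game_logic_sketch).
-export([handle/2]).

%% Single entry point: request and game state in, reply and game state out.
%% On a game-related exception the untouched old state is returned, which
%% is all a "rollback" needs to be when data is immutable.
handle(Request, State) ->
    try apply_request(Request, State) of
        NewState -> {ok, NewState}
    catch
        throw:{game_error, Reason} -> {{error, Reason}, State}
    end.

%% Written for the happy case; rule violations are thrown, not threaded
%% through return values.
apply_request({buy, Item, Price}, State = #{coins := Coins, items := Items}) ->
    Coins >= Price orelse throw({game_error, not_enough_coins}),
    State#{coins := Coins - Price, items := [Item | Items]};
apply_request(_Unknown, _State) ->
    throw({game_error, unknown_action}).
```

Because `handle/2` is a pure function, it can be tested without starting any process or datastore.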
  33. How we avoid using CPU • Remove need for DB serialization by storing data in process • Game is designed to avoid heavy lifting in the backend, very simple game logic • Optimize hot parts on the critical path, like collision detection, regrowing of forest • Generate Erlang modules for read-heavy configuration (~1 billion reads/1 write per week) • Use NIFs for parsing JSON (jiffy)
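The slide does not show how the configuration modules are generated. One way to do it, sketched here purely as an assumption, is to build the abstract forms of a module with one function clause per key and compile them at runtime with `compile:forms/1`. The module name `config_gen_sketch` and the generated `lookup/1` function are hypothetical.

```erlang
-module(config_gen_sketch).
-export([compile/2]).

%% Turns a non-empty list of {Key, Value} pairs into a loaded module
%% exporting lookup/1, with one function clause per key. Reads then cost
%% a plain function call, with no process or table lookup involved.
compile(Module, KVs = [_ | _]) ->
    Clauses = [{clause, 3, [erl_parse:abstract(K)], [], [erl_parse:abstract(V)]}
               || {K, V} <- KVs],
    Forms = [{attribute, 1, module, Module},
             {attribute, 2, export, [{lookup, 1}]},
             {function, 3, lookup, 1, Clauses}],
    {ok, Module, Binary} = compile:forms(Forms),
    %% Drop old code left over from an earlier version, if nothing uses it.
    _ = code:soft_purge(Module),
    {module, Module} = code:load_binary(Module, atom_to_list(Module) ++ ".erl", Binary),
    ok.
```

For example, after `config_gen_sketch:compile(game_config, [{tree_cost, 4}])` the call `game_config:lookup(tree_cost)` returns 4. Regenerating the module on each rare write is expensive, which is acceptable only for data read far more often than it changes, as on the slide.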
  34. How to find where CPU is used • Profile (eprof, fprof, kprof[1]) • Measure garbage collection (process_info/1, gcprof[2]) • Conduct experiment: change code, measure, repeat • Is the increased performance worth the increase in complexity? • Sometimes a radically different approach is needed... [1]: github.com/knutin/kprof [2]: github.com/knutin/gcprof
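The "change code, measure, repeat" loop can be supported by a very small harness. The following is a sketch using only standard BIFs (`timer:tc/1` and `process_info/2`), not kprof or gcprof. The module name is hypothetical, and the `minor_gcs` figure is the number of minor collections since the last full sweep, so it is a rough indicator rather than a running total.

```erlang
-module(measure_sketch).
-export([run/1]).

%% Runs Fun in a fresh process so that its heap and counters are not
%% polluted by whatever the caller did before, then reports basic numbers.
run(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() ->
        {Micros, _Result} = timer:tc(Fun),
        {garbage_collection, GC} = process_info(self(), garbage_collection),
        {reductions, Reds} = process_info(self(), reductions),
        Parent ! {Ref, #{micros => Micros,
                         minor_gcs => proplists:get_value(minor_gcs, GC),
                         reductions => Reds}}
    end),
    receive
        {Ref, Stats} -> Stats
    end.
```

Running two candidate implementations through `measure_sketch:run/1` several times gives a first answer to whether an optimization is worth its added complexity.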
  35. Operations, August 2011 - ...
  36. Operations • At wooga, developers also operate the game • Most developers are ex-sysadmins • Simple tools: – remsh for deployments, maintenance, debugging – automation with chef – syslog, tail, cut, awk, grep – verbose crash logs (SASL) – alarms only when something really bad happens
  37. Deployments • Goal: upgrade without annoying users • Soft purge • Set system quiet (webserver & coordinator) • Reload • Open the flood gates • Migrate process memory state on-demand • Total time not answering game requests: < 1s
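"Migrate process memory state on-demand" is not explained further on the slide. One reading, sketched here as an assumption, is that newly loaded code accepts both the old and the new state layout and upgrades a session's state the first time it is touched, rather than converting every live session during the deployment. The state layouts and function names below are invented for illustration.

```erlang
-module(migrate_sketch).
-export([upgrade/1, coins/1]).

%% Hypothetical old layout: {state, Coins}.
%% Hypothetical new layout: a map carrying an explicit version tag.
upgrade({state, Coins}) ->
    #{vsn => 2, coins => Coins, items => []};
upgrade(State = #{vsn := 2}) ->
    State.

%% Every accessor upgrades first, so old-format state is converted lazily,
%% the first time a game action reaches the session after the reload.
coins(State0) ->
    State = #{coins := Coins} = upgrade(State0),
    {Coins, State}.
```

OTP also offers the code_change/3 callback for upgrading gen_server state during release upgrades. Whether the authors used that or a lazy scheme like the one above is not stated on the slides.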
  38. How we know what's going on • Event logging to syslog – Session start, session end (process memory, gc, game stats) – Game-related exceptions • Latency measurement within the app • Use munin to pull overall server stats – CPU and memory usage by beam.smp – Memory used by processes, ets tables, etc – Throughput of app, dbs – Throughput of coordinator, workers, lock
  39. How we know what's going on [image] Game error: The game action is not allowed with the current server state and configuration
  40. How we know what's going on [image] Half-word emulator
  41. How we know what's going on [image only]
  42. How we know what's going on [image only]
  43. How we know what's going on [image only]
  44. How we know what's going on [image only]
  45. Conclusions
  46. Conclusions - database [Bar chart of queries/sec, Ruby stateless vs. Erlang stateful; axis ticks 0, 7500, 15000, 22500, 30000; data label 700]
  47. Conclusions: DB maintenance • AWS S3 as a main datastore • one document/user • 0 maintenance/setup cost • Redis for the derived data (leaderboard etc.) • load very far from Redis max capacity
  48. Conclusions: data locality • average game call < 1ms • no db roundtrip at every request • no need for low latency network • efficient setup for cloud environment
  49. Conclusions: data locality • finally CPU bound • no CPU time for serializing/deserializing data from db • CPU only busy transforming data (minimum possible activity)
  50. Conclusions: CPU usage • 300K daily users • 1000 HTTP req/sec (game actions) • 4 m1.large AWS instances (dual core, 8GB RAM) • 2 instances (coordinators) at 5% CPU load • 2 instances (workers) at 20% CPU load
  51. Conclusions: extra benefits • entire user state in one process + immutability = transactional behavior
  52. Conclusions: extra benefits • One user session -> one Erlang process • The Erlang VM is aware of processes => the Erlang VM is aware of user sessions
  53. Conclusions: thanks to VM process introspection • process reductions -> cost of a game action • process used memory -> memory used by session • We gained a lot of knowledge about a fundamental "business" entity
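Because one session is one process, the figures on slide 53 can be read from outside the session with `process_info/2`. The sketch below assumes the caller has the session pid and a fun that performs one synchronous game action against it. The module and function names are hypothetical.

```erlang
-module(introspect_sketch).
-export([cost_of/2]).

%% Action :: fun(() -> any()), expected to perform one synchronous request
%% against the session process Pid and return once it has been handled.
%% process_info/2 returns undefined for a dead process, in which case this
%% sketch simply crashes with a badmatch.
cost_of(Pid, Action) when is_pid(Pid), is_function(Action, 0) ->
    {reductions, Before} = process_info(Pid, reductions),
    Action(),
    {reductions, After} = process_info(Pid, reductions),
    {memory, Bytes} = process_info(Pid, memory),
    #{reductions => After - Before, memory_bytes => Bytes}.
```

Reductions are the VM's own unit of work, so the difference is a machine-independent estimate of what one game action costs, and `memory_bytes` is the footprint of that one user's session.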
  54. Conclusions • a radical change was made possible by a radically different tool (Erlang)
  55. Conclusions • a radical change was made possible by a radically different tool (Erlang) • Erlang can be good for data intensive/high throughput applications
  56. Conclusions • a radical change was made possible by a radically different tool (Erlang) • Erlang can be good for data intensive/high throughput applications • stateful is not necessarily hard/dangerous/unmaintainable
  57. Conclusions • November 2010: 0 lines of Erlang @wooga • November 2011: 1 Erlang game server live ...with more Erlang coming, join us
  58. Q&A. Knut Nesheim @knutin, Paolo Negri @hungryblank. http://wooga.com/jobs