Getting real with Erlang: From the idea to a live system
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Getting real with Erlang: From the idea to a live system

on

  • 8,503 views

From 0 to 1,000,000 daily users with Erlang ...

From 0 to 1,000,000 daily users with Erlang

This talk wants to sum up the experience about building and maintaining live our first social game server completely written in erlang.

The game has been now published and it’s time to think about the results achieved. In this talk we’ll describe details of our architecture, principles that drove its design and considerations about hosting the infrastructure.

We’ll go in detail through what worked well and what needs to be improved, all backed by data coming from our live system.
Come and follow us if you want to get the taste of real world erlang.

Statistics

Views

Total Views
8,503
Views on SlideShare
8,424
Embed Views
79

Actions

Likes
33
Downloads
194
Comments
2

8 Embeds 79

http://a0.twimg.com 37
https://twitter.com 34
http://us-w1.rockmelt.com 3
http://tweetedtimes.com 1
https://twimg0-a.akamaihd.net 1
https://si0.twimg.com 1
http://courtneybolton.parasoup.de 1
http://www.twylah.com 1
More...

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • I love Wooga slideshares. It's like a game for programmers. You've got the slides but not the audio, so you have to infer the missing info by deepening your knowledge of system architectures and cross-referencing several presentations. Very educational, in a fun way! Though I still wish we had all the audio... ;)
    Are you sure you want to
    Your message goes here
    Processing…
  • do you make parallel replicas of your session processes?
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Getting real with Erlang: From the idea to a live system Presentation Transcript

  • 1. Getting real with erlang From the idea to a live system Knut Nesheim @knutin Paolo Negri @hungryblankThursday, November 3, 2011
  • 2. Social Games Flash client (game) HTTP API http://www.flickr.com/photos/theplanetdotcom/4879421344Thursday, November 3, 2011
  • 3. Social Games HTTP API• @ 1 000 000 daily users• 5000 HTTP reqs/sec• more than 90% writesThursday, November 3, 2011
  • 4. November 2010 • At the erlang user group! • Learn where/how erlang is used • 0 lines of erlang code across all @wooga code baseThursday, November 3, 2011
  • 5. Why looking into erlang? HTTP API• @ 1 000 000 daily users• 5000 HTTP reqs/sec• around 60000 queries/secThursday, November 3, 2011
  • 6. 60000 qps • Most maintenance effort in databases • mix of SQL / NoSQL • Already using in RAM data stores (REDIS) • RAM DB are fast but expensive / high maintenanceThursday, November 3, 2011
  • 7. Social games data • User data self contained • Strong hot/cold pattern - gaming session • Heavy write load (caching is ineffective)Thursday, November 3, 2011
  • 8. User session User data 1. start session 1. load all 2. game actions 2. read/update many times 3. end session 3. data becomes coldThursday, November 3, 2011
  • 9. User session Erlang process 1. start session 1. start (load state) 2. game actions 2. responds to messages (use state) 3. end session 3. stop (save state)Thursday, November 3, 2011
  • 10. User session DB usage Stateful server Stateless server (Erlang ) start session load user state load user state game actions many queries end session save user stateThursday, November 3, 2011
  • 11. Erlang process • Follows session lifecycle • Contains and defends state (data) • Acts as a lock/serializer (one message at a time)Thursday, November 3, 2011
  • 12. December 2010 • Prototype erlang user1 user2 user3 process = user session user4 user5 user6 user7 user8 userNThursday, November 3, 2011
  • 13. January 2011 • erlang team goes from 1 to 2 developers Open topics • Distribution/clustering • Error handling • Deployments • OperationsThursday, November 3, 2011
  • 14. ArchitectureThursday, November 3, 2011
  • 15. Architecture  goals • Move  data  into  the  applica4on  server • Be  as  simple  as  possible • Graceful  degrada4on  when  DBs  go  down • Easy  to  inspect  and  repair  state  of  cluster • Easy  to  add  more  machines  for  scaling  out 15Thursday, November 3, 2011
  • 16. Session Session Session 16Thursday, November 3, 2011
  • 17. Worker Session Session Session 17Thursday, November 3, 2011
  • 18. Worker Worker Worker Session Session Session Session Session Session Session Session Session 18Thursday, November 3, 2011
  • 19. Worker Worker Worker Session Session Session Session Session Session Session Coordinator Coordinator 19Thursday, November 3, 2011
  • 20. Worker Worker Worker Session Session Session Session Session Session Session Coordinator Coordinator Lock 20Thursday, November 3, 2011
  • 21. Worker Worker Worker Session Session Session Session Session Session Session Coordinator Coordinator Lock DBs 21Thursday, November 3, 2011
  • 22. New  user  comes  online Worker Worker Worker Session Session Session Session Session Session Session Coordinator Lock DBs 22Thursday, November 3, 2011
  • 23. New  user  comes  online Worker Worker Flash  calls  ”setup” Worker Session Session Session Session Session Session Session Coordinator Lock DBs 22Thursday, November 3, 2011
  • 24. New  user  comes  online Worker Worker Flash  calls  ”setup” Worker Session Session Session Session Session Session Session Coordinator session:start(Uid) on  suitable  worker Lock DBs 22Thursday, November 3, 2011
  • 25. New  user  comes  online Worker Worker Flash  calls  ”setup” Worker Session Session Session Session Session Session Session Coordinator session:start(Uid) s3:get(Uid) on  suitable  worker Lock DBs 22Thursday, November 3, 2011
  • 26. New  user  comes  online Worker Worker Flash  calls  ”setup” Worker Session Session Session Session Session Session Session lock:acquire(Uid) Coordinator test-­‐and-­‐set session:start(Uid) s3:get(Uid) on  suitable  worker Lock DBs 22Thursday, November 3, 2011
  • 27. New  user  comes  online Game  acFons   Worker from  Flash Worker Worker Flash  calls  ”setup” Session Session Session Session Session Session Session lock:acquire(Uid) Coordinator test-­‐and-­‐set session:start(Uid) s3:get(Uid) on  suitable  worker Lock DBs 22Thursday, November 3, 2011
  • 28. User  goes  offline  (session  Fmes  out) Worker Worker Worker gen_server  Fmeout Session Session Session Session Session Session Session lock:release/1 s3:put/1 Coordinator Lock DBs 23Thursday, November 3, 2011
  • 29. Implementa4on  of  game  logicThursday, November 3, 2011
  • 30. Dream  game  logic Fast Safe + 25Thursday, November 3, 2011
  • 31. Dream  game  logic 26Thursday, November 3, 2011
  • 32. Dream  game  logic • We  want  high  throughput  (for  scaling) – Try  to  spend  as  liPle  CPU  Fme  as  possible – Avoid  heavy  computaFon – Try  to  avoid  creaFng  garbage • ..and  simple  and  testable  logic  (correctness) – FuncFonal  game  logic  makes  thinking  about  code  easy – Single  entry  point,  gets  ”request”  and  game  state – Code  for  happy  case,  roll  back  on  game-­‐related  excepFon 27Thursday, November 3, 2011
  • 33. How  we  avoid  using  CPU • Remove  need  for  DB  serializaFon  by  storing  data  in   process • Game  is  designed  to  avoid  heavy  liXing  in  the  backend,   very  simple  game  logic • OpFmize  hot  parts  on  the  criFcal  path,  like  collision   detecFon,  regrowing  of  forest • Generate  erlang  modules  for  read-­‐heavy  configuraFon   (~1  billion  reads/1  write  per  week) • Use  NIFs  for  parsing  JSON  (jiffy) 28Thursday, November 3, 2011
  • 34. How  to  find  where  CPU  is  used • Profile  (eprof,  fprof,  kprof[1]) • Measure  garbage  collecFon  (process_info/1,  gcprof[2]) • Conduct  experiment:  change  code,  measure,  repeat • Is  the  increased  performance  worth  the  increase  in   complexity? • SomeFmes  a  radically  different  approach  is  needed.. [1]:  github.com/knuFn/kprof [2]:  github.com/knuFn/gcprof 29Thursday, November 3, 2011
  • 35. Opera4ons August  2011  -­‐  ...Thursday, November 3, 2011
  • 36. Opera4ons • At  wooga,  developers  also  operate  the  game • Most  developers  are  ex-­‐sysadmins • Simple  tools: –remsh  for  deployments,  maintenance,  debugging –automaFon  with  chef –syslog,  tail,  cut,  awk,  grep –verbose  crash  logs  (SASL) –alarms  only  when  something  really  bad  happens 31Thursday, November 3, 2011
  • 37. Deployments • Goal:  upgrade  without  annoying  users • SoT  purge • Set  system  quiet  (webserver  &  coordinator) • Reload • Open  the  flood  gates • Migrate  process  memory  state  on-­‐demand • Total  4me  not  answering  game  requests:  <  1s 32Thursday, November 3, 2011
  • 38. How  we  know  what’s  going  on • Event  logging  to  syslog – Session  start,  session  end  (process  memory,  gc,  game  stats) – Game-­‐related  excepFons • Latency  measurement  within  the  app • Use  munin  to  pull  overall  server  stats – CPU  and  memory  usage  by  beam.smp – Memory  used  by  processes,  ets  tables,  etc – Throughput  of  app,  dbs – Throughput  of  coordinator,  workers,  lock 33Thursday, November 3, 2011
  • 39. How  we  know  what’s  going  on Game  error:  The  game  acFon  is  not  allowed  with  the  current  server  state  and  configuraFon 34Thursday, November 3, 2011
  • 40. How  we  know  what’s  going  on Half-­‐word  emulator 35Thursday, November 3, 2011
  • 41. How  we  know  what’s  going  on 36Thursday, November 3, 2011
  • 42. How  we  know  what’s  going  on 37Thursday, November 3, 2011
  • 43. How  we  know  what’s  going  on 38Thursday, November 3, 2011
  • 44. How  we  know  what’s  going  on 39Thursday, November 3, 2011
  • 45. ConclusionsThursday, November 3, 2011
  • 46. Conclusions  -­‐  database Ruby Stateless Erlang Stateful 30000 22500 15000 7500 700 0 queries/sec 41Thursday, November 3, 2011
  • 47. Conclusions Db  maintenace AWS  S3  as  a  main  datastore one  document/user 0  maintenance/setup  cost Redis  for  the  derived  data  (leaderboard  etc.) load  very  far  from  Redis  max  capacity 42Thursday, November 3, 2011
  • 48. Conclusions data  locality • average  game  call  <  1ms • no  db  roundtrip  at  every  request • no  need  for  low  latency  network • efficient  setup  for  cloud  environment 43Thursday, November 3, 2011
  • 49. Conclusions data  locality • finally  CPU  bound • no  CPU  4me  for  serializing/deserializing  data   from  db • CPU  only  busy  transforming  data  (minimum   possible  ac4vity) 44Thursday, November 3, 2011
  • 50. Conclusions CPU  usage • 300K  daily  users • 1000  hcp  req/sec  (game  ac4ons) • 4  m1.large  AWS  instances  (dual  core  8GB  RAM) • 2  instances  (coordinators)  5%  CPU  load • 2  instances  (workers)  20%  CPU  load 45Thursday, November 3, 2011
  • 51. Conclusions extra  benefits en4re  user  state  in  one  process + immutability = Transac4onal  behavior 46Thursday, November 3, 2011
  • 52. Conclusions extra  benefits One  user  session  -­‐>  one  erlang  process The  erlang  VM  is  aware  of  processes => the  erlang  VM  is  aware  of  user  sessions 47Thursday, November 3, 2011
  • 53. Conclusions Thanks  to  VM  process  introspec4on process  reduc4ons  -­‐>  cost  of  a  game  ac4on process  used  memory  -­‐>  memory  used  by  session We  gained  a  lot  of  knowledge  about  a  fundamental   ”business”  en4ty 48Thursday, November 3, 2011
  • 54. Conclusions • a  radical  change  was  made  possible  by  a  radically   different  tool  (erlang) 49Thursday, November 3, 2011
  • 55. Conclusions • a  radical  change  was  made  possible  by  a  radically   different  tool  (erlang) • erlang  can  be  good  for  data  intensive/high   throughput  applica4ons 49Thursday, November 3, 2011
  • 56. Conclusions • a  radical  change  was  made  possible  by  a  radically   different  tool  (erlang) • erlang  can  be  good  for  data  intensive/high   throughput  applica4ons • stateful  is  not  necessarily  hard/dangerous/ unmaintainable 49Thursday, November 3, 2011
  • 57. Conclusions • november  2010:  0  lines  of  erlang  @wooga • november  2011:  1  erlang  game  server  live ...with  more  erlang  coming,  join  us 50Thursday, November 3, 2011
  • 58. Q&A Knut Nesheim @knutin Paolo Negri @hungryblank hPp://wooga.com/jobs 51Thursday, November 3, 2011