Getting real with erlang

7,219 views

Published on

Presentation given at the Erlang user conference 2011 in Stockholm about building social game servers at wooga

Published in: Technology, News & Politics
3 Comments
30 Likes
Statistics
Notes
No Downloads
Views
Total views
7,219
On SlideShare
0
From Embeds
0
Number of Embeds
215
Actions
Shares
0
Downloads
105
Comments
3
Likes
30
Embeds 0
No embeds

No notes for slide

Getting real with erlang

  1. 1. Getting real with erlang From the idea to a live system Knut Nesheim @knutin Paolo Negri @hungryblank
  2. 2. Social GamesFlash client (game) HTTP API http://www.flickr.com/photos/theplanetdotcom/4879421344
  3. 3. Social Games HTTP API• @ 1 000 000 daily users• 5000 HTTP reqs/sec• more than 90% writes
  4. 4. November 2010• At the erlang user group!• Learn where/how erlang is used• 0 lines of erlang code across all @wooga code base
  5. 5. Why looking into erlang? HTTP API• @ 1 000 000 daily users• 5000 HTTP reqs/sec• around 60000 queries/sec
  6. 6. 60000 qps• Most maintenance effort in databases• mix of SQL / NoSQL• Already using in RAM data stores (REDIS)• RAM DB are fast but expensive / high maintenance
  7. 7. Social games data• User data self contained• Strong hot/cold pattern - gaming session• Heavy write load (caching is ineffective)
  8. 8. User session User data 1. start session 1. load all 2. game actions 2. read/update many times 3. end session 3. data becomes cold
  9. 9. User session Erlang process 1. start session 1. start (load state) 2. game actions 2. responds to messages (use state) 3. end session 3. stop (save state)
  10. 10. User session DB usage Stateful server Stateless server (Erlang )start session load user state load user stategame actions many queriesend session save user state
  11. 11. Erlang process• Follows session lifecycle• Contains and defends state (data)• Acts as a lock/serializer (one message at a time)
  12. 12. December 2010 • Prototype erlanguser1 user2 user3 process = user sessionuser4 user5 user6user7 user8 userN
  13. 13. January 2011• erlang team goes from 1 to 2 developers Open topics• Distribution/clustering• Error handling• Deployments• Operations
  14. 14. Architecture
  15. 15. Architecture goals• Move data into the applica4on server• Be as simple as possible• Graceful degrada4on when DBs go down• Easy to inspect and repair state of cluster• Easy to add more machines for scaling out 15
  16. 16. Session Session Session 16
  17. 17. WorkerSession Session Session 17
  18. 18. Worker Worker WorkerSession Session Session Session Session Session Session Session Session 18
  19. 19. Worker Worker Worker Session Session Session Session Session Session SessionCoordinatorCoordinator 19
  20. 20. Worker Worker Worker Session Session Session Session Session Session SessionCoordinatorCoordinator Lock 20
  21. 21. Worker Worker Worker Session Session Session Session Session Session SessionCoordinatorCoordinator Lock DBs 21
  22. 22. New user comes online Worker Worker Worker Session Session Session Session Session Session SessionCoordinator Lock DBs 22
  23. 23. New user comes online Worker Worker Flash calls ”setup” Worker Session Session Session Session Session Session SessionCoordinator Lock DBs 22
  24. 24. New user comes online Worker Worker Flash calls ”setup” Worker Session Session Session Session Session Session SessionCoordinator session:start(Uid) on suitable worker Lock DBs 22
  25. 25. New user comes online Worker Worker Flash calls ”setup” Worker Session Session Session Session Session Session SessionCoordinator session:start(Uid) s3:get(Uid) on suitable worker Lock DBs 22
  26. 26. New user comes online Worker Worker Flash calls ”setup” Worker Session Session Session Session Session Session Session lock:acquire(Uid)Coordinator test‐and‐set session:start(Uid) s3:get(Uid) on suitable worker Lock DBs 22
  27. 27. New user comes online Game ac4ons  Worker from Flash Worker Worker Flash calls ”setup” Session Session Session Session Session Session Session lock:acquire(Uid)Coordinator test‐and‐set session:start(Uid) s3:get(Uid) on suitable worker Lock DBs 22
  28. 28. User goes offline (session 4mes out) Worker Worker Workergen_server 4meout Session Session Session Session Session Session Session lock:release/1 s3:put/1 Coordinator Lock DBs 23
  29. 29. Implementa4on of game logic
  30. 30. Dream game logicFast Safe + 25
  31. 31. Dream game logic 26
  32. 32. Dream game logic• We want high throughput (for scaling) – Try to spend as li[le CPU 4me as possible – Avoid heavy computa4on – Try to avoid crea4ng garbage• ..and simple and testable logic (correctness) – Func4onal game logic makes thinking about code easy – Single entry point, gets ”request” and game state – Code for happy case, roll back on game‐related excep4on 27
  33. 33. How we avoid using CPU• Remove need for DB serializa4on by storing data in  process• Game is designed to avoid heavy licing in the backend,  very simple game logic• Op4mize hot parts on the cri4cal path, like collision  detec4on, regrowing of forest• Generate erlang modules for read‐heavy configura4on  (~1 billion reads/1 write per week)• Use NIFs for parsing JSON (jiffy) 28
  34. 34. How to find where CPU is used• Profile (eprof, fprof, kprof[1])• Measure garbage collec4on (process_info/1, gcprof[2])• Conduct experiment: change code, measure, repeat• Is the increased performance worth the increase in  complexity?• Some4mes a radically different approach is needed.. [1]: github.com/knu4n/kprof [2]: github.com/knu4n/gcprof 29
  35. 35. Opera4onsAugust 2011 ‐ ...
  36. 36. Opera4ons• At wooga, developers also operate the game• Most developers are ex‐sysadmins• Simple tools: –remsh for deployments, maintenance, debugging –automa4on with chef –syslog, tail, cut, awk, grep –verbose crash logs (SASL) –alarms only when something really bad happens 31
  37. 37. Deployments• Goal: upgrade without annoying users• Soc purge• Set system quiet (webserver & coordinator)• Reload• Open the flood gates• Migrate process memory state on‐demand• Total 4me not answering game requests: < 1s 32
  38. 38. How we know what’s going on• Event logging to syslog – Session start, session end (process memory, gc, game stats) – Game‐related excep4ons• Latency measurement within the app• Use munin to pull overall server stats – CPU and memory usage by beam.smp – Memory used by processes, ets tables, etc – Throughput of app, dbs – Throughput of coordinator, workers, lock 33
  39. 39. How we know what’s going onGame error: The game ac4on is not allowed with the current server state and configura4on 34
  40. 40. How we know what’s going onHalf‐word emulator 35
  41. 41. How we know what’s going on 36
  42. 42. How we know what’s going on 37
  43. 43. How we know what’s going on 38
  44. 44. How we know what’s going on 39
  45. 45. Conclusions
  46. 46. Conclusions ‐ database Ruby Stateless Erlang Stateful300002250015000 7500 700 0 queries/sec 41
  47. 47. Conclusions Db maintenace AWS S3 as a main datastore one document/user 0 maintenance/setup costRedis for the derived data (leaderboard etc.) load very far from Redis max capacity 42
  48. 48. Conclusions data locality• average game call < 1ms• no db roundtrip at every request• no need for low latency network• efficient setup for cloud environment 43
  49. 49. Conclusions data locality• finally CPU bound• no CPU 4me for serializing/deserializing data  from db• CPU only busy transforming data (minimum  possible ac4vity) 44
  50. 50. Conclusions CPU usage• 300K daily users• 1000 h[p req/sec (game ac4ons)• 4 m1.large AWS instances (dual core 8GB RAM)• 2 instances (coordinators) 5% CPU load• 2 instances (workers) 20% CPU load 45
  51. 51. Conclusions extra benefitsen4re user state in one process + immutability = Transac4onal behavior 46
  52. 52. Conclusions extra benefitsOne user session ‐> one erlang process The erlang VM is aware of processes =>the erlang VM is aware of user sessions 47
  53. 53. Conclusions Thanks to VM process introspec4on process reduc4ons ‐> cost of a game ac4onprocess used memory ‐> memory used by sessionWe gained a lot of knowledge about a fundamental  ”business” en4ty 48
  54. 54. Conclusions• a radical change was made possible by a radically  different tool (erlang) 49
  55. 55. Conclusions• a radical change was made possible by a radically  different tool (erlang)• erlang can be good for data intensive/high  throughput applica4ons 49
  56. 56. Conclusions• a radical change was made possible by a radically  different tool (erlang)• erlang can be good for data intensive/high  throughput applica4ons• stateful is not necessarily hard/dangerous/ unmaintainable 49
  57. 57. Conclusions• november 2010: 0 lines of erlang @wooga• november 2011: 1 erlang game server live...with more erlang coming, join us 50
  58. 58. Q&A Knut Nesheim @knutin Paolo Negri @hungryblankh[p://wooga.com/jobs 51

×