Erlang as a Cloud Citizen

This talk sums up the experience of designing, deploying, and maintaining an Erlang application that targets the cloud, specifically AWS, as its hosting infrastructure.

The application now serves a large user base with a sustained throughput of thousands of game actions per second, so we can retrospectively analyse our engineering and architectural choices and see how Erlang fits in the cloud environment, comparing it with our previous experience of deploying other platforms in the cloud.

We'll discuss the properties of Erlang as a language and OTP as a framework, and how we used them to design a system that is a good cloud citizen. We'll also discuss problems that are still unsolved.

Published in: Technology, News & Politics

Erlang as a Cloud Citizen

  1. 1. Erlang as a cloud citizen Paolo Negri @hungryblank
  2. 2. 1 day at • 10 million players • 2 billion game server requests (HTTP) • 20 devops people
  3. 3. Cloud: “A cloud is made of billows upon billows upon billows that look like clouds. As you come closer to a cloud you don’t get something smooth, but irregularities at a smaller scale.” Benoît B. Mandelbrot http://www.flickr.com/photos/nirak/644336486
  4. 4. AWS Cloud
  5. 5. This talk will answer • Why build a system targeting the cloud? • How many EC2 instances do you need to respond to 0.25 billion uncacheable game reqs/day?
  6. 6. 15 months ago... http://www.flickr.com/photos/wheatfields/515068701
  7. 7. 1st cloud hosted project, lessons learned 1: Pushing the 60th application server live? No different from adding the 6th! Push a button
  8. 8. 1st cloud hosted project, lessons learned 2: Local network/local disk are low-performance, general-purpose tools (nothing like ad hoc data center solutions)
  9. 9. 1st cloud hosted project, lessons learned 3: Complete automation is cool. Ease of adding hosts/automation can lead to bloated infrastructure
  10. 10. 1st cloud hosted project, points of pain • A lot of inefficient app servers (as per tweets) • Much effort to scale up/maintain databases (MySQL & Redis) • Expensive, not crazy expensive, but expensive
  11. 11. Why try again? http://www.flickr.com/photos/kky/704056791/
  12. 12. Uncertainty • Will we reach 100K or 3 million users? • 3 million users in 2 weeks or 12 months? • Cheat tool released 1h ago => single game call up 5000% • Weekly releases: new feature performance impact?
  13. 13. The cloud • standard units (instances) of computing capacity • a network connecting all instances • an API to provision/dismiss instances
  14. 14. The cloud: Sounds like a good framework to compose computing capacity. Why didn’t it work as a framework to compose throughput?
  15. 15. Scaling in the cloud, the recipe: CLOUD: composable units of computing capacity + DEVELOPER: turn a unit of computing capacity into a unit of throughput = composable throughput, a plan for scaling BIG
  16. 16. Turn a unit of computing capacity into a unit of throughput http://www.flickr.com/photos/pasukaru76
  17. 17. Unit of throughput, where? App server Database
  18. 18. Unit of throughput, joke? App server App server App server App server Cache Database Database
  19. 19. Unit of throughput? App server Database No unit!
  20. 20. Unit of throughput? App server Database: Tightly coupled throughput?
  21. 21. Unit of throughput? App server Database: Monolithic throughput?
  22. 22. Monolithic throughput
  23. 23. Monolithic throughput• likes monolithic infrastructure!• scales well vertically• wants screaming fast stack (network, disks...)• any performance glitch impacts the whole system
  24. 24. Tightly coupled throughput + loosely coupled hardware (like cloud) = frustration
  25. 25. Who leads the tightly coupled dance? App server Database
  26. 26. Who leads the tightly coupled dance? App server Database The stateless application server!
  27. 27. Stateless application servers guarantee one thing... which?
  28. 28. Data is never where you need it
  29. 29. And another one...
  30. 30. If you can feed them data fast enough... they’ll choke on garbage collection
  31. 31. We measure memcache HIT / MISS. Why do app servers need to be 100% MISS?
  32. 32. Where’s the best knowledge about hot/cold data?
  33. 33. Even the reverse makes more sense: 1. pick your data up (Database) 2. go into the stateless app server (App server)
  34. 34. What Went Wrong?
  35. 35. He can tell you! • Rich Hickey • Clojure author “...If not in Erlang which I think has a complete story for how they do state”[1] [1] Value Identity State @0.27 http://goo.gl/Zdjv0 http://www.flickr.com/photos/ghoseb/5120173586
  36. 36. Most languages and runtimes don’t have a safe solution for concurrent, long lived state. Erlang stands out as an exception in this panorama
  37. 37. Erlang... Processes are the primary means to structure an Erlang application. wikipedia
  38. 38. Erlang + OTP Generic Server Behaviour: A generic server process (gen_server) implemented using this module... otp documentation
  39. 39. Erlang + OTP Generic Server Behaviour
        handle_call(_Request, _From, State) ->
            {reply, ignored, State}.
        handle_cast(_Msg, State) ->
            {noreply, State}.
        handle_info(_Info, State) ->
            {noreply, State}.
        terminate(_Reason, _State) ->
            ok.
        code_change(_OldVsn, State, _Extra) ->
            {ok, State}.
  40. 40. gen_server • An Erlang process • With LOCAL state • Responding to requests from clients. A unit of throughput!
  41. 41. Composing throughput with Erlang: gen_server + EC2 instance
  42. 42. Composing throughput with Erlang: 1 EC2 instance + 1 Erlang VM = N kilo gen_servers (N kilo units of throughput)
  47. 47. Scale by adding units (instances)
  48. 48. Scale up by adding units (instances)
  49. 49. Erlang distribution
  50. 50. Throughput complexity • Loosely coupled peers, independent throughput VS. • Tightly coupled roles, dependent throughput
  51. 51. Where does the state come from? Start Database gen_server Stop
  52. 52. Now with database AWS S3
  53. 53. Database scalability: DB is (almost) never on the latency-critical path. No need for a low-latency DB. AWS S3
  54. 54. Database scalability: Throughput required is low. We can approximate S3 capacity as infinite. AWS S3
  55. 55. Database scalability: Ubiquitous and uniform from the application servers’ point of view. AWS S3
  56. 56. Remember? AWS S3
  57. 57. How it actually works: Data from S3 is uniformly available to any EC2 instance
  58. 58. And as you zoom in...
  59. 59. And zoom in...
  60. 60. And zoom in you see... EC2 instance
  61. 61. Always the same kind of structure. A fractal approach to throughput
  62. 62. Fractal: “A cauliflower shows how an object can be made of many parts, each of which is like a whole, but smaller.” Benoît B. Mandelbrot http://www.flickr.com/photos/paulobrabo/3588387063
  63. 63. Homework: The exact same solution might not work for you... but look for that unit of throughput
  64. 64. Answer time: You need XXXX smallish instances to serve 0.25 billion uncacheable reqs/day
  65. 65. Answer time: You need 0XXX smallish instances to serve 0.25 billion uncacheable reqs/day
  66. 66. Answer time: You need 00XX smallish instances to serve 0.25 billion uncacheable reqs/day
  67. 67. Answer time: You need 0012 smallish instances to serve 0.25 billion uncacheable reqs/day
  68. 68. What’s 1200?
  69. 69. Thanks Paolo Negri @hungryblank http://www.wooga.com/jobs
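
The "unit of throughput" from slides 40 and 51 can be sketched as a single gen_server that keeps one player's state in process memory. This is a minimal illustration and not the code behind the talk: the module name player_session, the add_points and get_state calls, and the {LoadFun, SaveFun} pair standing in for the database are all assumptions made for the example. It also assumes one reading of slide 51, namely that state is read from the database when the process starts and written back when it stops.

```erlang
-module(player_session).
-behaviour(gen_server).

%% Hypothetical module: one process per player, holding that player's
%% game state in memory for as long as the process lives.
-export([start_link/2, add_points/2, get_state/1, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Store is a {LoadFun, SaveFun} pair standing in for the database
%% (S3 in the talk). Both funs are assumptions made for this sketch.
start_link(PlayerId, Store) ->
    gen_server:start_link(?MODULE, {PlayerId, Store}, []).

add_points(Pid, N) -> gen_server:call(Pid, {add_points, N}).
get_state(Pid)     -> gen_server:call(Pid, get_state).
stop(Pid)          -> gen_server:stop(Pid).

init({PlayerId, {Load, _Save} = Store}) ->
    %% Trap exits so terminate/2 also runs when the parent shuts us down.
    process_flag(trap_exit, true),
    %% Start: read the state from the database once.
    Game = Load(PlayerId),
    {ok, #{id => PlayerId, game => Game, store => Store}}.

%% Requests are served from LOCAL state; no database round trip.
handle_call({add_points, N}, _From, #{game := Game} = State) ->
    Points = maps:get(points, Game, 0) + N,
    {reply, Points, State#{game := Game#{points => Points}}};
handle_call(get_state, _From, #{game := Game} = State) ->
    {reply, Game, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

%% Stop: write the state back to the database.
terminate(_Reason, #{id := Id, game := Game, store := {_Load, Save}}) ->
    Save(Id, Game),
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

To try it, pass any pair of funs as the store, for example one backed by a public ETS table: start the process, call add_points/2 a few times, then call stop/1 and check that the table holds the final state. Between start and stop the database is never touched, which is the property that keeps it off the latency-critical path.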
