Large Partially-connected Erlang Clusters

374 views

Published on

The video is available on YouTube: https://www.youtube.com/watch?v=dEot3Fb1iB4

Have you heard of a Node100 problem? Naïve Erlang clusters are complete network graphs which tend to blow up with a only a handful (100+) number of nodes. Solution is partially-connected mesh networks. The tricky part is easy communication for devs and transparent cluster management for devops.

In Spil Games n-tier architecture we use native Erlang message passing between tiers. Thus our cluster size (way over 500) requires partially connected network.

In this talk I will present a library which helps to create and maintain medium to large Erlang clusters. We have been using it for over 2 years now and time has come to announce and open-source it.

This talk is a part of the continuous effort Spil Games is making to give back to the community, as seen in Erlang Factory 2014 with the release of erl-cache and erl-memcache.

Target audience

Engineers interested or working with medium to large-scale Erlang clusters.

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
374
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Large Partially-connected Erlang Clusters

  1. 1. 1/34 Large Partially-connected Erlang Clusters Motiejus Jakˇstys @mo kelione Sr. backend developer Erlang User Conference 2014 Stockholm
  2. 2. 2/34 Background and Agenda Big service-oriented backend infrastructure. Infrastructure which provides API for our gaming portals A couple of dozen servers running about 50 services each
  3. 3. 2/34 Background and Agenda Big service-oriented backend infrastructure. Infrastructure which provides API for our gaming portals A couple of dozen servers running about 50 services each Agenda: Historical introduction Technical stuff
  4. 4. 2/34 Background and Agenda Big service-oriented backend infrastructure. Infrastructure which provides API for our gaming portals A couple of dozen servers running about 50 services each Agenda: Historical introduction Technical stuff Don’t hesitate to interrupt!
  5. 5. 3/34 My point today
  6. 6. 3/34 My point today Management of big battles is very similar to running distributed systems.
  7. 7. 3/34 My point today Management of big battles is very similar to running distributed systems. I have a good example for you.
  8. 8. 4/34 Battle of Stalingrad 1942.08.23 – 1943.02.02 Image source: Deutsches Bundesarchiv, RIA Novosti Archive
  9. 9. 5/34 Rubble battle Image source: Deutsches Bundesarchiv, RIA Novosti Archive
  10. 10. 6/34 Peer-to-peer communication Impossible to determine location. Radio was unreliable and useless. Reinforcement/supply requests were just voice.
  11. 11. 7/34 Heterogeneous Infantry Tank fleet Air Force Medical staff Commanders ... Image source: http://navy.mil
  12. 12. 8/34 Communication channels It is clear who gets orders from who. Very clear. Image source: http://www.vetfriends.com/military_structure/
  13. 13. 9/34 Dynamic environment Nature of the battle is dynamic:
  14. 14. 9/34 Dynamic environment Nature of the battle is dynamic: Losses and reinforcements change the dynamics of the battlefield.
  15. 15. 10/34 Big Plan It is not enough to only take care of your business. All units must work to achieve a common goal.
  16. 16. 11/34 Situation at 1942.11.15 Dire Soviet situation: Huge causalities
  17. 17. 11/34 Situation at 1942.11.15 Dire Soviet situation: Huge causalities Red Army life expectancy: Soldier: < 1 day Officer: < 3 days
  18. 18. 11/34 Situation at 1942.11.15 Dire Soviet situation: Huge causalities Red Army life expectancy: Soldier: < 1 day Officer: < 3 days Germans have 90% of the city Image source: Antill, P., Dennis, P. Stalingrad 1942 (Campaign). Osprey Publishing (June 19, 2007).
  19. 19. 12/34 Situation after 1942.11.23 Germans surrounded by soviets
  20. 20. 12/34 Situation after 1942.11.23 Germans surrounded by soviets City is Germans’, Germans’ are Soviets’
  21. 21. 12/34 Situation after 1942.11.23 Germans surrounded by soviets City is Germans’, Germans’ are Soviets’ Turning point of the battle
  22. 22. 12/34 Situation after 1942.11.23 Germans surrounded by soviets City is Germans’, Germans’ are Soviets’ Turning point of the battle 6’th Army (the surrounded one) was destroyed. Image source: Antill, P., Dennis, P. Stalingrad 1942 (Campaign). Osprey Publishing (June 19, 2007).
  23. 23. 13/34 Outline 1 Historical introduction 2 Technical stuff Motivation Features API 3 QA You sure you have no questions?
  24. 24. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal.
  25. 25. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server
  26. 26. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers
  27. 27. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores
  28. 28. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores Authentication
  29. 29. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores Authentication Chat
  30. 30. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores Authentication Chat User profiles
  31. 31. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores Authentication Chat User profiles ...
  32. 32. 14/34 Set the grounds We have a lot of Erlang nodes trying to achieve a common goal. ≈ 50 nodes per server a couple of dozens of servers Example services: Higscores Authentication Chat User profiles ... How to connect them?
  33. 33. 15/34 Peer-to-peer communication You don’t want bottlenecks. You don’t want single points of failure.
  34. 34. 16/34 Dynamic nodes Nodes and services start and stop all the time.
  35. 35. 16/34 Dynamic nodes Nodes and services start and stop all the time. The system must continue to function and self-heal.
  36. 36. 17/34 Partially connected network n : number of nodes. Total connections = n(n−1) 2
  37. 37. 17/34 Partially connected network n : number of nodes. Total connections = n(n−1) 2 0 2 M 4 M 6 M 8 M 10 M 12 M 14 M 0 20 40 60 80 100 Total connections, 50 nodes per machine
  38. 38. 18/34 Conclusions (for now) It pays off to optimize the topology so communication is more effective, e.g.: Army Software Defined Networking Management
  39. 39. 19/34 Remember? Erlang Battle P2P communication P2P communication Multi app groups Heterogeneous Dynamic nodes Dynamic environment Partially connected network Communication channels
  40. 40. 19/34 Remember? Erlang Battle P2P communication P2P communication Multi app groups Heterogeneous Dynamic nodes Dynamic environment Partially connected network Communication channels Maintaining a distributed system is like managing a battle.
  41. 41. 19/34 Remember? Erlang Battle P2P communication P2P communication Multi app groups Heterogeneous Dynamic nodes Dynamic environment Partially connected network Communication channels Maintaining a distributed system is like managing a battle. We can be generals.
  42. 42. 20/34 We need to solve this What spapi-router pg2 gproc P2P communication Multi app groups Dynamic nodes Partially connected network by limiting connections x x
  43. 43. 21/34 spapi-router features in a nutshell A library. Creates a mesh network Connects (hidden) nodes like configured
  44. 44. 21/34 spapi-router features in a nutshell A library. Creates a mesh network Connects (hidden) nodes like configured Abstracts destination RPC-call a service, not a node
  45. 45. 21/34 spapi-router features in a nutshell A library. Creates a mesh network Connects (hidden) nodes like configured Abstracts destination RPC-call a service, not a node Mature and optimized Used for 2,5 years in a sufficiently large SOA
  46. 46. 21/34 spapi-router features in a nutshell A library. Creates a mesh network Connects (hidden) nodes like configured Abstracts destination RPC-call a service, not a node Mature and optimized Used for 2,5 years in a sufficiently large SOA Instrumented
  47. 47. 22/34 For example pagebuilder@host3.fqdn kernel stdlib pagebuilder spapi_router kernel stdlib header kernel stdlib mainsection kernel stdlib mainsection header@host1.fqdn mainsec2@host2.fqdn mainsec1@host1.fqdn test2@host2.fqdn kernel stdlib mainsection
  48. 48. 22/34 For example pagebuilder@host3.fqdn kernel stdlib pagebuilder spapi_router kernel stdlib header kernel stdlib mainsection kernel stdlib mainsection header@host1.fqdn mainsec2@host2.fqdn mainsec1@host1.fqdn test2@host2.fqdn kernel stdlib mainsection Configuration of pagebuilder@host3.fqdn: {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]}
  49. 49. 23/34 More configuration hosts monitor interval ms world monitor interval ms worker monitor interval ms callback module
  50. 50. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]}
  51. 51. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]} 1. Connects to host names
  52. 52. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]} 1. Connects to host names 2. Asks EPMD for running nodes
  53. 53. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]} 1. Connects to host names 2. Asks EPMD for running nodes 3. Connects to nodes matching regexp
  54. 54. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]} 1. Connects to host names 2. Asks EPMD for running nodes 3. Connects to nodes matching regexp 4. Checks for applications in nodes
  55. 55. 24/34 How it works {spapi_router, [ {host_names, [ "host1.fqdn", "host2.fqdn", ]}, {workers, [ {"^header[0-9]*", [header]}, {"^mainsec[0-9]*", [mainsection]}, ]} ]} 1. Connects to host names 2. Asks EPMD for running nodes 3. Connects to nodes matching regexp 4. Checks for applications in nodes 5. Connects to relevant nodes
  56. 56. 25/34 Why instrument? Helps understand the system is sound. The first thing you want to instrument is the border of your service. Effortless instrumentation for all calls via spapi-router.
  57. 57. 26/34 callback module #1 %% New resource is detected -callback new_resource({Service :: atom(), node()}, opts()) -> any(). %% Existing resource is lost %% (node disconnect, shutdown, etc). -callback lost_resource({Service :: atom(), node()}, opts()) -> any().
  58. 58. 27/34 callback module #2 -type log_spec() :: { Service :: atom(), Module :: atom(), Function :: atom() }. %% Time instrumentation -callback measure(log_spec(), fun(() -> A), opts()) -> A. %% Called on success/failure of a function call. -callback success(log_spec(), opts()) -> term(). -callback failure(log_spec(), opts()) -> term().
  59. 59. 28/34 Calling others
  60. 60. 28/34 Calling others spr_router:call(piqi_rpc, erlang, node, []).
  61. 61. 28/34 Calling others spr_router:call(piqi_rpc, erlang, node, []). spr_router:call_all(piqi_rpc, erlang, node, []).
  62. 62. 28/34 Calling others spr_router:call(piqi_rpc, erlang, node, []). spr_router:call_all(piqi_rpc, erlang, node, []). Extras: call/5 call all/5 list workers/0 list workers/1 list hosts/0
  63. 63. 29/34 Future optimizations Takes time to figure out a ’stop’. Monitor application controller instead of node. One node == one service.
  64. 64. 30/34 How to change nodes? Puppet plus RelUp ... or anything really: spr app:config change([], [], []).
  65. 65. 31/34 Battle stories
  66. 66. 31/34 Battle stories Tried to disconnect from irrelevant nodes first
  67. 67. 31/34 Battle stories Tried to disconnect from irrelevant nodes first what if relationship is one-way?
  68. 68. 31/34 Battle stories Tried to disconnect from irrelevant nodes first what if relationship is one-way? one misbehaving component can bring the system down
  69. 69. 31/34 Battle stories Tried to disconnect from irrelevant nodes first what if relationship is one-way? one misbehaving component can bring the system down Fully connected network experience (> 1K nodes)
  70. 70. 31/34 Battle stories Tried to disconnect from irrelevant nodes first what if relationship is one-way? one misbehaving component can bring the system down Fully connected network experience (> 1K nodes) everyone should try that
  71. 71. 31/34 Battle stories Tried to disconnect from irrelevant nodes first what if relationship is one-way? one misbehaving component can bring the system down Fully connected network experience (> 1K nodes) everyone should try that thanks to off-peak and 10G NIC
  72. 72. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw)
  73. 73. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw) 2012-02-16 1.0.0 – first prod (Thijs Terlouw)
  74. 74. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw) 2012-02-16 1.0.0 – first prod (Thijs Terlouw) 2012-08-24 1.2.0 – broadcast (Enrique Paz)
  75. 75. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw) 2012-02-16 1.0.0 – first prod (Thijs Terlouw) 2012-08-24 1.2.0 – broadcast (Enrique Paz) 2014-06-20 2.1.0 – many contributions by Spil Games
  76. 76. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw) 2012-02-16 1.0.0 – first prod (Thijs Terlouw) 2012-08-24 1.2.0 – broadcast (Enrique Paz) 2014-06-20 2.1.0 – many contributions by Spil Games 2014-07-04 Public pre-release commit b3b7aad9ca14ed230f28635826b371b6bbea3840 Author: Motiejus Jakˇstys <motiejus.jakstys@spilgames.com> Date: Wed Jun 4 14:39:40 2014 +0200 Initial commit 21 files changed, 2984 insertions(+)
  77. 77. 32/34 Stats 2012-01-12 Initial commit (Thijs Terlouw) 2012-02-16 1.0.0 – first prod (Thijs Terlouw) 2012-08-24 1.2.0 – broadcast (Enrique Paz) 2014-06-20 2.1.0 – many contributions by Spil Games 2014-07-04 Public pre-release commit b3b7aad9ca14ed230f28635826b371b6bbea3840 Author: Motiejus Jakˇstys <motiejus.jakstys@spilgames.com> Date: Wed Jun 4 14:39:40 2014 +0200 Initial commit 21 files changed, 2984 insertions(+) 2014-07-10 http://github.com/spilgames/spapi-router
  78. 78. 33/34 Outline 1 Historical introduction 2 Technical stuff Motivation Features API 3 QA You sure you have no questions?
  79. 79. 34/34 QA

×