Large Partially-connected Erlang Clusters

The video is available on YouTube: https://www.youtube.com/watch?v=dEot3Fb1iB4

Have you heard of the Node100 problem? Naïve Erlang clusters are complete network graphs, which tend to blow up with only a handful (100+) of nodes. The solution is partially connected mesh networks. The tricky part is keeping communication easy for devs and cluster management transparent for devops.

In Spil Games' n-tier architecture we use native Erlang message passing between tiers. Our cluster size (way over 500) therefore requires a partially connected network.

In this talk I will present a library which helps to create and maintain medium to large Erlang clusters. We have been using it for over 2 years now, and the time has come to announce and open-source it.

This talk is a part of the continuous effort Spil Games is making to give back to the community, as seen in Erlang Factory 2014 with the release of erl-cache and erl-memcache.

Target audience

Engineers interested in or working with medium to large-scale Erlang clusters.

Transcript

  • 1. 1/34 Large Partially-connected Erlang Clusters. Motiejus Jakštys (@mo_kelione), Sr. backend developer. Erlang User Conference 2014, Stockholm.
  • 4. 2/34 Background and Agenda
    Background: a big service-oriented backend infrastructure that provides the API for our gaming portals. A couple of dozen servers running about 50 services each.
    Agenda: historical introduction, then technical stuff.
    Don’t hesitate to interrupt!
  • 7. 3/34 My point today: management of big battles is very similar to running distributed systems. I have a good example for you.
  • 8. 4/34 Battle of Stalingrad 1942.08.23 – 1943.02.02 Image source: Deutsches Bundesarchiv, RIA Novosti Archive
  • 9. 5/34 Rubble battle Image source: Deutsches Bundesarchiv, RIA Novosti Archive
  • 10. 6/34 Peer-to-peer communication Impossible to determine location. Radio was unreliable and useless. Reinforcement/supply requests were just voice.
  • 11. 7/34 Heterogeneous: infantry, tank fleet, air force, medical staff, commanders, ... Image source: http://navy.mil
  • 12. 8/34 Communication channels: it is clear who gets orders from whom. Very clear. Image source: http://www.vetfriends.com/military_structure/
  • 14. 9/34 Dynamic environment. The nature of the battle is dynamic: losses and reinforcements change the dynamics of the battlefield.
  • 15. 10/34 Big Plan It is not enough to only take care of your business. All units must work to achieve a common goal.
  • 18. 11/34 Situation on 1942.11.15
    Dire Soviet situation: huge casualties.
    Red Army life expectancy: soldier < 1 day, officer < 3 days.
    Germans hold 90% of the city.
    Image source: Antill, P., Dennis, P. Stalingrad 1942 (Campaign). Osprey Publishing (June 19, 2007).
  • 22. 12/34 Situation after 1942.11.23
    Germans surrounded by Soviets.
    The city is the Germans’, the Germans are the Soviets’.
    Turning point of the battle.
    The 6th Army (the surrounded one) was destroyed.
    Image source: Antill, P., Dennis, P. Stalingrad 1942 (Campaign). Osprey Publishing (June 19, 2007).
  • 23. 13/34 Outline
    1. Historical introduction
    2. Technical stuff: Motivation, Features, API
    3. QA
    You sure you have no questions?
  • 32. 14/34 Set the grounds
    We have a lot of Erlang nodes trying to achieve a common goal: ≈ 50 nodes per server, a couple of dozen servers.
    Example services: Highscores, Authentication, Chat, User profiles, ...
    How to connect them?
  • 33. 15/34 Peer-to-peer communication You don’t want bottlenecks. You don’t want single points of failure.
  • 35. 16/34 Dynamic nodes Nodes and services start and stop all the time. The system must continue to function and self-heal.
  • 37. 17/34 Partially connected network
    n: number of nodes. Total connections = n(n−1)/2.
    [Plot: total connections (y-axis, 0 to 14 M) against x-axis values from 0 to 100, at 50 nodes per machine.]
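
The formula on slide 17/34 counts the links in a fully connected cluster, which is what the partially connected design avoids. A minimal sketch of that arithmetic follows; the module and function names are illustrative and are not part of spapi-router.

```erlang
-module(mesh_size).
-export([total_connections/1, for_machines/2]).

%% Number of links in a fully connected cluster of N nodes: n(n-1)/2.
-spec total_connections(non_neg_integer()) -> non_neg_integer().
total_connections(N) when is_integer(N), N >= 0 ->
    N * (N - 1) div 2.

%% Same figure for a cluster described as machines x nodes per machine.
-spec for_machines(non_neg_integer(), non_neg_integer()) -> non_neg_integer().
for_machines(Machines, NodesPerMachine) ->
    total_connections(Machines * NodesPerMachine).
```

With 50 nodes per machine, `mesh_size:for_machines(24, 50)` gives 719400 links for 1200 nodes, and `mesh_size:for_machines(100, 50)` gives 12497500, which is consistent with the 0 to 14 M range of the plot.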
  • 38. 18/34 Conclusions (for now). It pays off to optimize the topology so communication is more effective, e.g. in the army, in software-defined networking, in management.
  • 41. 19/34 Remember?
    Erlang                        Battle
    P2P communication             P2P communication
    Multi app groups              Heterogeneous
    Dynamic nodes                 Dynamic environment
    Partially connected network   Communication channels
    Maintaining a distributed system is like managing a battle. We can be generals.
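  • 42. 20/34 We need to solve this
    [Comparison table. Columns: spapi-router, pg2, gproc. Rows: P2P communication; Multi app groups; Dynamic nodes; Partially connected network by limiting connections. Only two "x" marks survive from the cells, and their position is unclear.]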
  • 46. 21/34 spapi-router features in a nutshell
    A library.
    Creates a mesh network: connects (hidden) nodes as configured.
    Abstracts the destination: RPC-call a service, not a node.
    Mature and optimized: used for 2.5 years in a sufficiently large SOA.
    Instrumented.
  • 48. 22/34 For example
    [Diagram: pagebuilder@host3.fqdn runs kernel, stdlib, pagebuilder and spapi_router; header@host1.fqdn runs kernel, stdlib and header; mainsec1@host1.fqdn, mainsec2@host2.fqdn and test2@host2.fqdn each run kernel, stdlib and mainsection.]
    Configuration of pagebuilder@host3.fqdn:
        {spapi_router, [
            {host_names, [
                "host1.fqdn",
                "host2.fqdn"
            ]},
            {workers, [
                {"^header[0-9]*", [header]},
                {"^mainsec[0-9]*", [mainsection]}
            ]}
        ]}
  • 49. 23/34 More configuration: hosts_monitor_interval_ms, world_monitor_interval_ms, worker_monitor_interval_ms, callback_module
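
To show where the extra keys from slide 23/34 would sit, here is a hedged sys.config fragment. The key names come from the slides; every value is a placeholder chosen for illustration (the slides give neither defaults nor value types), and `example_spr_callbacks` is a hypothetical module name.

```erlang
%% sys.config fragment. All values are illustrative assumptions,
%% not documented defaults of spapi-router.
[
 {spapi_router, [
   {host_names, ["host1.fqdn", "host2.fqdn"]},
   {workers, [
     {"^header[0-9]*", [header]},
     {"^mainsec[0-9]*", [mainsection]}
   ]},
   {hosts_monitor_interval_ms, 5000},       %% assumed: integer milliseconds
   {world_monitor_interval_ms, 10000},      %% assumed: integer milliseconds
   {worker_monitor_interval_ms, 1000},      %% assumed: integer milliseconds
   {callback_module, example_spr_callbacks} %% hypothetical module name
 ]}
].
```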
  • 55. 24/34 How it works
        {spapi_router, [
            {host_names, [
                "host1.fqdn",
                "host2.fqdn"
            ]},
            {workers, [
                {"^header[0-9]*", [header]},
                {"^mainsec[0-9]*", [mainsection]}
            ]}
        ]}
    1. Connects to host names
    2. Asks EPMD for running nodes
    3. Connects to nodes matching regexp
    4. Checks for applications in nodes
    5. Connects to relevant nodes
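
Steps 2 and 3 on slide 24/34 can be approximated with standard OTP calls. The sketch below is not spapi-router's implementation: the module and function names are made up, and it only uses `net_adm:names/1` to query EPMD and `re:run/3` to filter node names.

```erlang
-module(epmd_discovery).
-export([candidates/2, matching_nodes/3]).

%% Ask EPMD on Host for its registered node names and keep those that
%% match any of the given regular expressions.
-spec candidates(string(), [string()]) -> [node()].
candidates(Host, Patterns) ->
    case net_adm:names(Host) of
        {ok, Names} -> matching_nodes(Names, Host, Patterns);
        {error, _Reason} -> []
    end.

%% Pure helper. Names has the shape returned by net_adm:names/1,
%% i.e. [{Name :: string(), Port :: integer()}].
%% Note: list_to_atom/1 creates atoms dynamically, which is acceptable
%% only for a bounded set of node names.
-spec matching_nodes([{string(), integer()}], string(), [string()]) -> [node()].
matching_nodes(Names, Host, Patterns) ->
    [list_to_atom(Name ++ "@" ++ Host)
     || {Name, _Port} <- Names,
        lists:any(fun(P) -> re:run(Name, P, [{capture, none}]) =:= match end,
                  Patterns)].
```

A caller could then try `net_kernel:connect_node/1` on each candidate and use `rpc:call(Node, application, which_applications, [])` to check which applications run there, roughly matching steps 3 to 5. How the library itself does this is not shown in the slides.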
  • 56. 25/34 Why instrument? It helps you understand whether the system is sound. The first thing you want to instrument is the border of your service. Effortless instrumentation for all calls via spapi-router.
  • 57. 26/34 callback module #1
        %% New resource is detected
        -callback new_resource({Service :: atom(), node()}, opts()) -> any().
        %% Existing resource is lost
        %% (node disconnect, shutdown, etc).
        -callback lost_resource({Service :: atom(), node()}, opts()) -> any().
  • 58. 27/34 callback module #2
        -type log_spec() :: {
            Service :: atom(),
            Module :: atom(),
            Function :: atom()
        }.
        %% Time instrumentation
        -callback measure(log_spec(), fun(() -> A), opts()) -> A.
        %% Called on success/failure of a function call.
        -callback success(log_spec(), opts()) -> term().
        -callback failure(log_spec(), opts()) -> term().
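
A callback module satisfying the five callbacks on slides 26/34 and 27/34 might look like the sketch below. The module name is hypothetical, `opts()` is treated as opaque because the slides do not define it, and the behaviour's module name is not given in the slides, so no `-behaviour` attribute is declared.

```erlang
-module(example_spr_callbacks).
-export([new_resource/2, lost_resource/2, measure/3, success/2, failure/2]).

%% A service became reachable on a node.
new_resource({Service, Node}, _Opts) ->
    error_logger:info_msg("resource up: ~p on ~p~n", [Service, Node]).

%% A service disappeared (node disconnect, shutdown, etc).
lost_resource({Service, Node}, _Opts) ->
    error_logger:warning_msg("resource lost: ~p on ~p~n", [Service, Node]).

%% Run Fun, log how long it took, and return its result unchanged.
measure({Service, Module, Function}, Fun, _Opts) ->
    {Micros, Result} = timer:tc(Fun),
    error_logger:info_msg("~p ~p:~p took ~p us~n",
                          [Service, Module, Function, Micros]),
    Result.

%% Successful calls are not logged in this sketch.
success(_LogSpec, _Opts) ->
    ok.

%% Failed calls are logged with their service, module and function.
failure({Service, Module, Function}, _Opts) ->
    error_logger:warning_msg("call failed: ~p ~p:~p~n",
                             [Service, Module, Function]).
```

The important contract is in `measure/3`: it must return whatever the wrapped fun returns, so the timing stays invisible to the caller.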
  • 62. 28/34 Calling others
        spr_router:call(piqi_rpc, erlang, node, []).
        spr_router:call_all(piqi_rpc, erlang, node, []).
    Extras: call/5, call_all/5, list_workers/0, list_workers/1, list_hosts/0
  • 63. 29/34 Future optimizations Takes time to figure out a ’stop’. Monitor application controller instead of node. One node == one service.
  • 64. 30/34 How to change nodes? Puppet plus RelUp ... or anything really: spr_app:config_change([], [], []).
  • 71. 31/34 Battle stories
    Tried to disconnect from irrelevant nodes first:
      what if the relationship is one-way?
      one misbehaving component can bring the system down
    Fully connected network experience (> 1K nodes):
      everyone should try that
      thanks to off-peak and 10G NIC
  • 77. 32/34 Stats
    2012-01-12 Initial commit (Thijs Terlouw)
    2012-02-16 1.0.0 – first prod (Thijs Terlouw)
    2012-08-24 1.2.0 – broadcast (Enrique Paz)
    2014-06-20 2.1.0 – many contributions by Spil Games
    2014-07-04 Public pre-release:
        commit b3b7aad9ca14ed230f28635826b371b6bbea3840
        Author: Motiejus Jakštys <motiejus.jakstys@spilgames.com>
        Date: Wed Jun 4 14:39:40 2014 +0200
        Initial commit
        21 files changed, 2984 insertions(+)
    2014-07-10 http://github.com/spilgames/spapi-router
  • 79. 34/34 QA