Deep Dive into Cluster Manager in Couchbase Server 4.0: Couchbase Connect 2015

If the data nodes are the heavy lifters of Couchbase Server, then the cluster manager is the brains behind the operation. The job of the cluster manager covers provisioning and managing an increasing number of services; tracking and distributing topology and metadata; and coordinating all other cluster-wide actions such as replication, failover and rebalance. Oh yes, and it must be fault-tolerant, so that it behaves robustly when, as is inevitable, things go wrong. In this session, we’ll cover all of this and provide concrete examples of how to use the cluster manager to optimize your Couchbase implementation.

Published in: Technology

  1. DEEP DIVE INTO CLUSTER MANAGER IN 4.0. Dave Finlay, Snr Director Server Engineering, Couchbase
  2. ©2015 Couchbase Inc. Cluster Manager: What is this “Cluster Manager” of which you speak?
  3. First: We Need a Cluster! [Diagram labels: machine 1, machine 2, machine 3, Ethernet]
  4. The “Depth Gauge”: cluster, node, OS process, Erlang process. The “depth gauge” indicates the current zoom level as we dive into the Cluster Manager.
  5. The “Depth Gauge”: cluster, node, OS process, Erlang process.
  6. Back to the Cluster. [Diagram labels: machine 1, machine 2, machine 3, Ethernet, Couchbase Node (three times)]
  7. Back to the Cluster. [Diagram labels: machine 1, machine 2, machine 3, Ethernet, Couchbase Node (three times)]
  8. Back to the Cluster. [Diagram labels: machine 1, machine 2, machine 3, Ethernet, node 1, node 2, node 3] This is the Cluster Manager.
  9. What Does the Cluster Manager Do?
  10. What Does the Cluster Manager Do? Manage services; manage data topology; manage replications; provide metadata service; auth / security; REST API; UI; stats, monitoring, simple alerting.
  11. Anatomy of a Node. [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…]
  12. Anatomy of a Node. [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…] The Cluster Manager is babysitter and ns-server.
  13. Anatomy of a Node. [Diagram labels: machine 1, babysitter, memcached, ns-server] Memcached serves key-value memcached requests and is deployed on “data service” nodes.
  14. Anatomy of a Node. [Diagram labels: machine 1, babysitter, memcached, ns-server, xdcr, view-engine, other…] XDCR, view-engine and some other services are also deployed on “data nodes”.
  15. Anatomy of a Node. [Diagram labels: machine 1, babysitter, memcached, ns-server, xdcr, view-engine, other…] XDCR and view-engine were previously part of ns-server. They’ve been carved out for improved robustness.
  16. Anatomy of a Node. [Diagram labels: machine 1, babysitter, indexer, memcached, ns-server, xdcr, view-engine, other…] Indexer is deployed on nodes with the “indexing service”.
  17. Anatomy of a Node. [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…] Query is deployed on “query nodes” and serves N1QL requests.
  18. Oh no, it’s the Chaos Monkey! [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…, xdcr]
  19. Oh no, it’s the Chaos Monkey! [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…, xdcr]
  20. Inside ns-server. [Diagram labels, in extraction order: per-node-&-bucket services generic distributed facilities generic local facilities vclock, uuid, work queue, events, misc, logging (ALE) distributed node discovery master-only services REST admin config gossip replication local config store per-node services per-node-&-bucket services]
  21. A Little Bit About Erlang. Developed in the 1980s for use in switches, such as the famous AXD 301 switch, which boasted ridiculous reliability. (Even allowing for hyperbole, it was pretty reliable.)
  22. What’s Cool About Erlang? According to “Four-fold Increase in Productivity and Quality: Industrial-Strength Functional Programming in Telecom-Class Products”, http://www.erlang.se/publications/Ulf_Wiger.pdf
  23. Key Feature of Erlang #1: Processes & Messages. [Diagram label: process] Active, “lightweight processes”. Supports the “actor model” style of programming. Processes have a mailbox for incoming messages.
  24. Key Feature of Erlang #1: Processes & Messages. [Diagram labels: process P, x y z, process Q] Values are immutable (no shared state). Communication is asynchronous. P ! x, P ! y, P ! z.
  25. Key Feature of Erlang #1: Processes & Messages. [Diagram labels: a, process Q, x y z, process P] “Selective receive”: receive y -> Q ! a end.
  26. Key Feature of Erlang #1: Processes & Messages. [Diagram labels: process Q, x z, process P, a]
  27. Key Feature of Erlang #1: Processes & Messages. [Diagram labels: process Q, x z, a b, process P]
  28. Key Feature of Erlang #2: OTP. [Diagram labels: gen_server, process] A framework that provides useful higher-level constructs built from Erlang primitives, such as gen_server, a generic, stateful, startable and stoppable server.
  29. Key Feature of Erlang #2: OTP. [Diagram labels: supervisor, gen_server 1, gen_server 2, gen_server 3] Another such construct is supervisor, a process responsible for starting, restarting, stopping and monitoring its child processes.
  30. Oh no, it’s the Chaos Monkey! [Diagram labels: supervisor, gen_server 1, gen_server 2, gen_server 2, gen_server 3]
  31. Oh no, it’s the Chaos Monkey! [Diagram labels: supervisor, gen_server 1, gen_server 2, gen_server 2, gen_server 3]
  32. Inside ns-server. [Diagram label: ns_server] The ns_server ‘application’. Applications describe the bundle of code that is started and stopped together.
  33. Inside ns-server. [Diagram labels: ns_server, ns_server_cluster_sup] The application is responsible for starting the top-level supervisor, ns_server_cluster_sup, which supervises all supervisors and, by extension, all Erlang processes in ns_server.
  34. Inside ns-server. [Diagram labels: ns_server, ns_server_cluster_sup, ns_config_sup, ns_cluster, other services …, ns_server_nodes_sup] ns_server_nodes_sup: main supervisor for the node outside of config mgmt. ns_config_sup: everything related to managing config, e.g. storing, updating, change notifications. ns_cluster: performs node join / leave requests. Other services: other stuff we won’t focus on today.
  35. Inside ns-server. [Diagram labels: ns_server, ns_server_cluster_sup, ns_config_sup, ns_cluster, other services …, ns_server_nodes_sup, ns_server_sup, other services …] Caption: controls the main part of the supervision hierarchy.
  36. Inside ns-server. [Diagram labels: ns_server, ns_server_cluster_sup, ns_config_sup, ns_cluster, other services …, ns_server_nodes_sup, ns_server_sup, other services …]
  37. Inside ns-server. [Diagram labels: ns_server_sup, mb_master, ns_node_disco_sup, menelaus_sup, ns_orchestrator, mb_master_sup] mb_master: responsible for master election. ns_node_disco_sup: node discovery supervisor. menelaus_sup: supervisor for the REST API and web UI app server.
  38. Inside ns-server. [Diagram labels: ns_server_sup, ns_bucket_sup, other processes]
  39. Inside ns-server. [Diagram labels: ns_bucket_sup, single_bucket_kv_sup, other processes, ns_memcached_sup, ns_memcached] ns_memcached mediates communication with the memcached process and tracks its health.
  40. Inside ns-server. [Diagram labels: ns_bucket_sup, single_bucket_kv_sup, other processes, ns_memcached_sup, replication manager, dcp_sup, dcp_replicator-node1, dcp_replicator-node2, ns_memcached] dcp_sup and the replicators manage inbound (consumer) DCP replication streams.
  41. Inside ns-server. [Diagram labels: ns_bucket_sup, single_bucket_kv_sup, other processes, ns_memcached_sup, replication manager, dcp_sup, dcp_replicator-node1, dcp_replicator-node2, janitor_agent_sup, janitor agent, ns_memcached] The janitor agent keeps things nice and clean and provides a remote communication façade for bucket management.
  42. NS-Server in Action: Rebalance. [Diagram labels: node, master node, menelaus_web, ns_orchestrator, ns_rebalancer, idle, rebalancing, rebalance] User clicks the “Rebalance” button. State -> “rebalancing”. Orchestrator starts rebalancer.
  43. NS-Server in Action: Rebalance. [Diagram labels: master node, ns_rebalancer, ns_bucket_mover, ns_single_vbucket_mover 1, ns_single_vbucket_mover 2] Rebalance computes the new vbucket map and starts the bucket mover. The bucket mover in turn starts single vbucket movers.
  44. NS-Server in Action: Rebalance. [Diagram labels: master node, ns_single_vbucket_mover, begin, to-node for vbucket, janitor agent, dcp_sup, dcp_replicator, memcached to-node, from-node for vbucket, janitor agent, memcached from-node] The single vbucket mover sends a request to the new node to begin replicating.
  45. NS-Server in Action: Rebalance. [Diagram labels: master node, ns_single_vbucket_mover, wait, to-node for vbucket, janitor agent, dcp_sup, dcp_replicator, memcached to-node, from-node for vbucket, janitor agent, memcached from-node] Then the mover waits until the backfill is done and replication is up to date.
  46. NS-Server in Action: Rebalance. [Diagram labels: master node, ns_single_vbucket_mover, takeover, to-node for vbucket, janitor agent, dcp_sup, dcp_replicator, memcached to-node, from-node for vbucket, janitor agent, memcached from-node] Lastly, the mover converts the backfill replication stream to a “takeover” stream.
  47. Inside ns-server. [Diagram labels, in extraction order: per-node-&-bucket services generic distributed facilities generic local facilities vclock, uuid, work queue, events, misc, logging (ALE) distributed node discovery master-only services REST admin config gossip replication local config store per-node services per-node-&-bucket services]
  48. Anatomy of a Node. [Diagram labels: machine 1, babysitter, query, indexer, memcached, ns-server, xdcr, view-engine, other…]
  49. Back to the Cluster. [Diagram labels: machine 1, machine 2, machine 3, Ethernet, node 1, node 2, node 3]
  50. couchbase.com/beta
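
Slides 23 to 27 describe Erlang mailboxes, asynchronous sends and "selective receive". The following is a minimal Python sketch of that behaviour, not Erlang itself: the `Mailbox` class and its methods are hypothetical names introduced for illustration, and a real Erlang `receive` would block when nothing matches rather than return `None`.

```python
class Mailbox:
    """Toy model of an Erlang process mailbox (illustrative only)."""

    def __init__(self):
        self._messages = []

    def send(self, message):
        # Like `P ! x`: enqueue the message and return at once (asynchronous).
        self._messages.append(message)

    def receive(self, matches):
        # Selective receive: remove and return the oldest matching message,
        # leaving all other messages queued in their original order.
        for index, message in enumerate(self._messages):
            if matches(message):
                del self._messages[index]
                return message
        return None  # assumption: a real Erlang receive would block here

    def pending(self):
        return list(self._messages)


p = Mailbox()
q = Mailbox()

for message in ("x", "y", "z"):  # P ! x, P ! y, P ! z.
    p.send(message)

# receive y -> Q ! a end.
if p.receive(lambda m: m == "y") is not None:
    q.send("a")

print(p.pending())  # ['x', 'z']
print(q.pending())  # ['a']
```

The end state mirrors slide 26: x and z are still waiting in P's mailbox, and a has arrived at Q.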

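
Slides 29 to 31 describe a supervisor that starts, restarts, stops and monitors its child processes. A rough Python model of the restart part is sketched below. It assumes a one-for-one style policy (only the dead child is replaced), which the slides do not state, and it polls for dead children where a real OTP supervisor is notified of exits. `Worker`, `Supervisor` and `check_children` are hypothetical names.

```python
class Worker:
    """Stand-in for a gen_server: it has a name and is either alive or dead."""

    def __init__(self, name, generation):
        self.name = name
        self.generation = generation  # how many times this child has been started
        self.alive = True

    def kill(self):
        self.alive = False


class Supervisor:
    """Starts its children and replaces any child that is found dead."""

    def __init__(self, child_names):
        self.children = {name: Worker(name, 1) for name in child_names}

    def check_children(self):
        # Assumption: polling stands in for the exit signals OTP would deliver.
        restarted = []
        for name, worker in list(self.children.items()):
            if not worker.alive:
                self.children[name] = Worker(name, worker.generation + 1)
                restarted.append(name)
        return restarted


sup = Supervisor(["gen_server 1", "gen_server 2", "gen_server 3"])
sup.children["gen_server 2"].kill()  # the Chaos Monkey strikes
restarted = sup.check_children()

print(restarted)                                 # ['gen_server 2']
print(sup.children["gen_server 2"].generation)  # 2
```

Only the killed child is replaced; its siblings keep running with their original state.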
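
Slides 42 to 46 walk through a rebalance: a new vbucket map is computed, and each single vbucket mover then runs a begin, wait, takeover sequence. The Python sketch below shows only the shape of that flow. The map format (vbucket id to owning node name), the hand-written target map and the function names are assumptions made for illustration; replicas and the actual map computation are left out.

```python
def plan_moves(current_map, target_map):
    """Return (vbucket, from_node, to_node) for every vbucket whose owner changes."""
    return [
        (vbucket, current_map[vbucket], target_map[vbucket])
        for vbucket in sorted(current_map)
        if current_map[vbucket] != target_map[vbucket]
    ]


def move_steps(vbucket, from_node, to_node):
    """The three phases of one vbucket move, following slides 44 to 46."""
    return [
        f"begin: {to_node} starts replicating vbucket {vbucket} from {from_node}",
        f"wait: backfill of vbucket {vbucket} completes and replication catches up",
        f"takeover: {to_node} takes over vbucket {vbucket} from {from_node}",
    ]


# Hypothetical maps: node3 has just been added to a two-node cluster.
current_map = {0: "node1", 1: "node1", 2: "node2", 3: "node2"}
target_map = {0: "node1", 1: "node3", 2: "node2", 3: "node3"}

moves = plan_moves(current_map, target_map)
print(moves)  # [(1, 'node1', 'node3'), (3, 'node2', 'node3')]

for move in moves:
    for step in move_steps(*move):
        print(step)
```

Vbuckets whose owner does not change produce no move, so only the difference between the two maps generates work.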