Untangling Cluster Management with Helix

Helix team @ LinkedIn
Kishore Gopalakrishna
http://www.linkedin.com/in/kgopalak
@kishoreg1980
     Recruiting Solutions                  1
Outline


 What is Helix
 Use case 1: distributed data store
 Architecture
 Use case 2: consumer group
 Helix at LinkedIn
 Q&A


                                       2
What is Helix




  Cluster management framework for distributed systems
  using declarative state model




                                                         3
Distributed system examples




                              4
Motivation

 A system starts out simple…
 …but gets complex in the real world
 …as you address real requirements

                          Application

                           client library
  Scale
  Failover
  Bootstrapping
                           Call Routing
                             System

          Replica 1                         …

          Replica 2                         …
                                                5
Motivation




 These are cluster management problems
  Helix solves them once…
     Scale
  …so you can focus on your system
     Failover
  Bootstrapping




                                          6
Outline


 What is Helix
 Use case 1: distributed data store
 Architecture
 Use case 2: consumer group
 Helix at LinkedIn
 Q&A


                                       7
Use-Case: Distributed Data Store

 Distributed




                          P.1




      Node 1            Node 2     Node 3


                                            8
Use-Case: Distributed Data Store

 Distributed
 Partitioned




  P.1    P.2     P.3   P.5     P.6    P.7   P.9    P.1     P.11
                                                   0
  P.4                  P.8                  P.1
                                            2



        Node 1               Node 2               Node 3


                                                                  9
Use-Case: Distributed Data Store

 Distributed
 Partitioned
 Replicated




  P.1    P.2     P.3   P.5      P.6    P.7   P.9    P.1     P.11
                                                    0
  P.4    P.5     P.6   P.8      P.1    P.2   P.1    P.3     P.4
                                             2
  P.9    P.1           P.11     P.1          P.7    P.8
         0                      2

        Node 1                Node 2               Node 3


                                                                   10
Partition Layout

 Highly Available
 Master accepts writes
 Balanced distribution
                                                            Master
                                                            Slave




  P.1    P.2     P.3   P.5      P.6    P.7   P.9    P.1       P.11
                                                    0
  P.4    P.5     P.6   P.8      P.1    P.2   P.1    P.3       P.4
                                             2
  P.9    P.1           P.11     P.1          P.7    P.8
         0                      2

        Node 1                Node 2               Node 3


                                                                     11
Failover




                                                            Master
                                                            Slave




  P.1    P.2     P.3   P.5      P.6    P.7   P.9    P.1       P.11
                                                    0
  P.4    P.5     P.6   P.8      P.1    P.2   P.1    P.3       P.4
                                             2
  P.9    P.1           P.11     P.1          P.7    P.8
         0                      2

        Node 1                Node 2               Node 3
Add Capacity


  P.1    P.5     P.9


  P.1    P.1     P.8
  0      2
                                                            Master
        Node 4                                              Slave




  P.1    P.2     P.3   P.5      P.6    P.7   P.9    P.1       P.11
                                                    0
  P.4    P.5     P.6   P.8      P.1    P.2   P.1    P.3       P.4
                                             2
  P.9    P.1           P.11     P.1          P.7    P.8
         0                      2

        Node 1                Node 2               Node 3
Use-case requirements

  • Partition constraints
     • 1 master per partition
     • Balance partitions across cluster
     • No single-point-of-failure: replicas on different nodes
  • Handle failures: transfer mastership
  • Elasticity
     • Distribute workload across added nodes
      Minimize partition movement
  • Meet SLAs
      Throttle concurrent data movement




                                                                 14
Recruiting Solutions   ‹#›
Generalizing cluster management



                   STATE MACHINE




          CONSTRAINTS              OBJECTIVE

                                               16
Outline


 What is Helix
 Use case 1: distributed data store
 Architecture
 Use case 2: consumer group
 Helix at LinkedIn
 Q&A


                                       17
Helix Based System Roles

                                                                                 PARTICIPANT
    IDEAL STATE

                                                                                 SPECTATOR
                                    Controller


                                                       Parition routing
                                                             logic
   CURRENT STATE
                         RESPONSE        COMMAND




   P.1     P.2     P.3          P.5        P.6   P.7       P.9       P.1   P.1
                                                                     0     1

   P.4     P.5     P.6          P.8        P.1   P.2       P.1       P.3   P.4
                                           P.1
                                                           2

   P.9     P.1                  P.1        P.1             P.7       P.8
           0                    1          2


         Node 1                       Node 2                     Node 3

                                                                                       18
Controller Execution Flow



             N1   P1   P2               SLAVE              N1   P1   P2
                                          S
             N2   P2   P3                                  N2   P2   P3


             N3   P3   P1                                  N3   P3   P1

                                                                           N1
                             O                        M
                            OFFLINE               MASTER

                                      REBALANCER                           N2

                                                            P1:OS
                                                           P1:SM
             N1   P1   P2

                                                                           N3
             N2   P2   P3
                                      ZooKeeper

SPECTATORS   N3   P3   P1



                                                           MESSAGE QUEUE
Controller fault tolerance




                             Zookeeper




               Controller    Controller   Controller
                  1             2            3




               LEADER        STANDBY      STANDBY




                                                       20
Controller fault tolerance




                             Zookeeper




               Controller    Controller   Controller
                  1             2            3




               OFFLINE       LEADER       STANDBY




                                                       21
Participant Plug-in code




                           22
Spectator Plug-in code




                         23
Benefits

 Cluster operations “just work”
   – Bootstrapping
   – Failover
   – Add nodes
 Global vs Local
   – Helix Controller
        Global knowledge
        Makes cluster decisions
   – Participant
        Local knowledge
        Follows orders




                                   24
Outline


 What is Helix
 Use case 1: distributed data store
 Architecture
 Use case 2: consumer group
 Helix at LinkedIn
 Q&A


                                       25
consumer group




                 26
Consumer group: Scaling




                          27
Consumer group: Fault tolerance




                                  28
Consumer group: state model


                   ONLINE     MAX=1




                   OFFLINE


                                      29
Outline


 What is Helix
 Use case 1: distributed data store
 Architecture
 Use case 2: consumer group
 Helix at LinkedIn
 Q&A


                                       30
Helix usage at LinkedIn (Pictures)

 Espresso
   – a timeline-consistent, distributed data store
 Databus
   – a change data capture service
 Search as a Service
   – a multi-tenant service for multiple search applications
 More planned




                                                               31
Summary

 Building Distributed Data Systems is hard
   – Abstraction and modularity is key
 Helix: A Generic framework for Cluster Management
 Simple programming model: declarative state machine




                                                        32
Helix: Future Roadmap


• Features
   • Span multiple data centers
   • Load balancing


• Announcement
   • Open source: https://github.com/linkedin/helix
   • Apache incubation
   • New contributors
Questions?




             34

Apache Helix presentation at SOCC 2012

  • 1.
    Untangling Cluster Managementwith Helix Helix team @ LinkedIn Kishore Gopalakrishna http://www.linkedin.com/in/kgopalak @kishoreg1980 Recruiting Solutions 1
  • 2.
    Outline  What isHelix  Use case 1: distributed data store  Architecture  Use case 2: consumer group  Helix at LinkedIn  Q&A 2
  • 3.
    What is Helix Cluster management framework for distributed systems using declarative state model 3
  • 4.
  • 5.
    Motivation  A systemstarts out simple…  …but gets complex in the real world  …as you address real requirements Application client library  Scale  Failover  Bootstrapping Call Routing System Replica 1 … Replica 2 … 5
  • 6.
    Motivation  These arecluster management problems   Helix solves them once… Scale   …so you can focus on your system Failover  Bootstrapping 6
  • 7.
    Outline  What isHelix  Use case 1: distributed data store  Architecture  Use case 2: consumer group  Helix at LinkedIn  Q&A 7
  • 8.
    Use-Case: Distributed DataStore  Distributed P.1 Node 1 Node 2 Node 3 8
  • 9.
    Use-Case: Distributed DataStore  Distributed  Partitioned P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.11 0 P.4 P.8 P.1 2 Node 1 Node 2 Node 3 9
  • 10.
    Use-Case: Distributed DataStore  Distributed  Partitioned  Replicated P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.11 0 P.4 P.5 P.6 P.8 P.1 P.2 P.1 P.3 P.4 2 P.9 P.1 P.11 P.1 P.7 P.8 0 2 Node 1 Node 2 Node 3 10
  • 11.
    Partition Layout  HighlyAvailable  Master accepts writes  Balanced distribution Master Slave P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.11 0 P.4 P.5 P.6 P.8 P.1 P.2 P.1 P.3 P.4 2 P.9 P.1 P.11 P.1 P.7 P.8 0 2 Node 1 Node 2 Node 3 11
  • 12.
    Failover Master Slave P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.11 0 P.4 P.5 P.6 P.8 P.1 P.2 P.1 P.3 P.4 2 P.9 P.1 P.11 P.1 P.7 P.8 0 2 Node 1 Node 2 Node 3
  • 13.
    Add Capacity P.1 P.5 P.9 P.1 P.1 P.8 0 2 Master Node 4 Slave P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.11 0 P.4 P.5 P.6 P.8 P.1 P.2 P.1 P.3 P.4 2 P.9 P.1 P.11 P.1 P.7 P.8 0 2 Node 1 Node 2 Node 3
  • 14.
    Use-case requirements • Partition constraints • 1 master per partition • Balance partitions across cluster • No single-point-of-failure: replicas on different nodes • Handle failures: transfer mastership • Elasticity • Distribute workload across added nodes  Minimize partition movement • Meet SLAs  Throttle concurrent data movement 14
  • 15.
  • 16.
    Generalizing cluster management STATE MACHINE CONSTRAINTS OBJECTIVE 16
  • 17.
    Outline  What isHelix  Use case 1: distributed data store  Architecture  Use case 2: consumer group  Helix at LinkedIn  Q&A 17
  • 18.
    Helix Based SystemRoles PARTICIPANT IDEAL STATE SPECTATOR Controller Parition routing logic CURRENT STATE RESPONSE COMMAND P.1 P.2 P.3 P.5 P.6 P.7 P.9 P.1 P.1 0 1 P.4 P.5 P.6 P.8 P.1 P.2 P.1 P.3 P.4 P.1 2 P.9 P.1 P.1 P.1 P.7 P.8 0 1 2 Node 1 Node 2 Node 3 18
  • 19.
    Controller Execution Flow N1 P1 P2 SLAVE N1 P1 P2 S N2 P2 P3 N2 P2 P3 N3 P3 P1 N3 P3 P1 N1 O M OFFLINE MASTER REBALANCER N2 P1:OS P1:SM N1 P1 P2 N3 N2 P2 P3 ZooKeeper SPECTATORS N3 P3 P1 MESSAGE QUEUE
  • 20.
    Controller fault tolerance Zookeeper Controller Controller Controller 1 2 3 LEADER STANDBY STANDBY 20
  • 21.
    Controller fault tolerance Zookeeper Controller Controller Controller 1 2 3 OFFLINE LEADER STANDBY 21
  • 22.
  • 23.
  • 24.
    Benefits  Cluster operations“just work” – Bootstrapping – Failover – Add nodes  Global vs Local – Helix Controller  Global knowledge  Makes cluster decisions – Participant  Local knowledge  Follows orders 24
  • 25.
    Outline  What isHelix  Use case 1: distributed data store  Architecture  Use case 2: consumer group  Helix at LinkedIn  Q&A 25
  • 26.
  • 27.
  • 28.
  • 29.
    Consumer group: statemodel ONLINE MAX=1 OFFLINE 29
  • 30.
    Outline  What isHelix  Use case 1: distributed data store  Architecture  Use case 2: consumer group  Helix at LinkedIn  Q&A 30
  • 31.
    Helix usage atLinkedIn (Pictures)  Espresso – a timeline-consistent, distributed data store  Databus – a change data capture service  Search as a Service – a multi-tenant service for multiple search applications  More planned 31
  • 32.
    Summary  Building DistributedData Systems is hard – Abstraction and modularity is key  Helix: A Generic framework for Cluster Management  Simple programming model: declarative state machine 32
  • 33.
    Helix: Future Roadmap •Features • Span multiple data centers • Load balancing • Announcement • Open source: https://github.com/linkedin/helix • Apache incubation • New contributors
  • 34.

Editor's Notes

  • #27 Partitioned queue consumption, lets say there are 6 queues and some consumers to consume form these queues.The requirement is simple, the number of queues must be equally divided among the consumers. On top of the we need partition affinity while consuming instead of randomly picking up from any queue.