Apache Helix presentation at SOCC 2012

Untangling Cluster Management with Helix SOCC 2012 presentation

Published in: Technology
  • Partitioned queue consumption: let's say there are 6 queues and some consumers that consume from these queues. The requirement is simple: the queues must be divided equally among the consumers. On top of that, we need partition affinity while consuming, instead of picking randomly from any queue.
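The requirement in this note can be sketched in a few lines. This is a minimal illustration, not Helix code: the function name `assign_queues`, the queue and consumer names, and the round-robin policy are all assumptions made for the example.

```python
def assign_queues(queues, consumers):
    """Spread queues over consumers so that counts differ by at most one.

    Each queue gets exactly one owner, which gives the partition affinity
    described in the note. Names and the round-robin policy are illustrative.
    """
    owners = sorted(consumers)
    assignment = {consumer: [] for consumer in owners}
    if not owners:
        return assignment  # no consumers yet: nothing can be assigned
    for index, queue in enumerate(sorted(queues)):
        assignment[owners[index % len(owners)]].append(queue)
    return assignment


# Hypothetical setup: 6 queues, first with 2 consumers, then with 3.
queues = [f"queue-{i}" for i in range(1, 7)]
two = assign_queues(queues, ["consumer-a", "consumer-b"])
three = assign_queues(queues, ["consumer-a", "consumer-b", "consumer-c"])
```

Round-robin keeps the split even, but it may reassign many queues when a consumer joins or leaves. A policy that keeps existing owners where possible would move fewer queues, at the cost of more bookkeeping.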
  • Apache Helix presentation at SOCC 2012

    1. Untangling Cluster Management with Helix. Helix team @ LinkedIn. Kishore Gopalakrishna, http://www.linkedin.com/in/kgopalak, @kishoreg1980
    2. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
    3. What is Helix: a cluster management framework for distributed systems, using a declarative state model
    4. Distributed system examples
    5. Motivation: A system starts out simple, but gets complex in the real world as you address real requirements: scale, failover, bootstrapping. [Diagram labels: Application, client library, Call Routing, System, Replica 1 …, Replica 2 …]
    6. Motivation: These are cluster management problems (scale, failover, bootstrapping). Helix solves them once, so you can focus on your system.
    7. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
    8. Use-Case: Distributed Data Store. Distributed. [Diagram: P.1; Node 1, Node 2, Node 3]
    9. Use-Case: Distributed Data Store. Distributed, partitioned. [Diagram: partitions P.1 to P.12 spread across Node 1, Node 2, Node 3]
    10. Use-Case: Distributed Data Store. Distributed, partitioned, replicated. [Diagram: replicated partitions P.1 to P.12 across Node 1, Node 2, Node 3]
    11. Partition Layout: highly available; master accepts writes; balanced distribution. [Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2, Node 3]
    12. Failover. [Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2, Node 3]
    13. Add Capacity. [Diagram: Node 4 added next to Node 1, Node 2, Node 3, holding some of the master and slave replicas]
    14. Use-case requirements. Partition constraints: 1 master per partition; balance partitions across the cluster; no single point of failure (replicas on different nodes). Handle failures: transfer mastership. Elasticity: distribute workload across added nodes; minimize partition movement. Meet SLAs: throttle concurrent data movement.
    16. Generalizing cluster management: state machine, constraints, objective
    17. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
    18. Helix Based System Roles. [Diagram labels: PARTICIPANT, SPECTATOR, Controller, IDEAL STATE, CURRENT STATE, COMMAND, RESPONSE, partition routing logic; partitions P.1 to P.12 across Node 1, Node 2, Node 3]
    19. Controller Execution Flow. [Diagram labels: partition tables (N1: P1, P2; N2: P2, P3; N3: P3, P1); states OFFLINE, SLAVE, MASTER; REBALANCER; transitions P1:OS, P1:SM; MESSAGE QUEUE; ZooKeeper; SPECTATORS]
    20. Controller fault tolerance. [Diagram: Zookeeper; Controller 1 LEADER, Controller 2 STANDBY, Controller 3 STANDBY]
    21. Controller fault tolerance. [Diagram: Zookeeper; Controller 1 OFFLINE, Controller 2 LEADER, Controller 3 STANDBY]
    22. Participant Plug-in code
    23. Spectator Plug-in code
    24. Benefits. Cluster operations "just work": bootstrapping, failover, add nodes. Global vs Local: the Helix Controller has global knowledge and makes cluster decisions; a Participant has local knowledge and follows orders.
    25. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
    26. Consumer group
    27. Consumer group: Scaling
    28. Consumer group: Fault tolerance
    29. Consumer group: state model. [Diagram: ONLINE (MAX=1), OFFLINE]
    30. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
    31. Helix usage at LinkedIn (Pictures). Espresso: a timeline-consistent, distributed data store. Databus: a change data capture service. Search as a Service: a multi-tenant service for multiple search applications. More planned.
    32. Summary. Building distributed data systems is hard: abstraction and modularity are key. Helix: a generic framework for cluster management. Simple programming model: declarative state machine.
    33. Helix: Future Roadmap. Features: span multiple data centers; load balancing. Announcement: open source at https://github.com/linkedin/helix; Apache incubation; new contributors.
    34. Questions?
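Slides 16 and 19 describe the controller in terms of a state machine, constraints, and a comparison of ideal and current state that yields transitions such as P1:OS (OFFLINE to SLAVE) and P1:SM (SLAVE to MASTER). The sketch below illustrates that idea only. It does not use the Helix API: the `NEXT_STEP` table, the `compute_transitions` function, and the dictionary layout are assumptions made for the example.

```python
# Single-step moves of an OFFLINE/SLAVE/MASTER state model (assumed shape):
# (current state, target state) -> next state to request.
NEXT_STEP = {
    ("OFFLINE", "SLAVE"): "SLAVE",
    ("OFFLINE", "MASTER"): "SLAVE",   # must pass through SLAVE first
    ("SLAVE", "MASTER"): "MASTER",
    ("SLAVE", "OFFLINE"): "OFFLINE",
    ("MASTER", "SLAVE"): "SLAVE",
    ("MASTER", "OFFLINE"): "SLAVE",   # demote before going offline
}


def compute_transitions(ideal, current):
    """Return sorted (node, partition, from_state, to_state) messages that
    move the current state one step toward the ideal state.

    Both arguments map (node, partition) to a state name. The constraint
    "1 master per partition" is enforced by holding back a promotion while
    another replica of that partition is still MASTER.
    """
    masters = {part for (_, part), state in current.items() if state == "MASTER"}
    messages = []
    for (node, part), target in ideal.items():
        state = current.get((node, part), "OFFLINE")
        if state == target:
            continue
        step = NEXT_STEP[(state, target)]
        if step == "MASTER" and part in masters:
            continue  # wait until the old master has been demoted
        messages.append((node, part, state, step))
    # Replicas that exist but are no longer wanted head toward OFFLINE.
    for (node, part), state in current.items():
        if (node, part) not in ideal and state != "OFFLINE":
            messages.append((node, part, state, NEXT_STEP[(state, "OFFLINE")]))
    return sorted(messages)


# Hypothetical example: N1 should become master of P1, N2 its slave.
ideal = {("N1", "P1"): "MASTER", ("N2", "P1"): "SLAVE"}
current = {("N1", "P1"): "SLAVE", ("N2", "P1"): "OFFLINE"}
messages = compute_transitions(ideal, current)
```

Calling the function again after each round of transitions completes moves the cluster step by step toward the ideal state. A real controller would also apply the other requirements from slide 14, such as throttling concurrent data movement.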