
Apache Helix presentation at SOCC 2012



Untangling Cluster Management with Helix: SOCC 2012 presentation.



  1. Untangling Cluster Management with Helix. Helix team @ LinkedIn: Kishore Gopalakrishna (http://www.linkedin.com/in/kgopalak, @kishoreg1980)
  2. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
  3. What is Helix: a cluster management framework for distributed systems, using a declarative state model
  4. Distributed system examples
  5. Motivation: A system starts out simple, but gets complex in the real world as you address real requirements: scale, failover, bootstrapping. (Diagram labels: Application, Client library, Call routing, System, Replica 1, Replica 2.)
  6. Motivation: These are cluster management problems (scale, failover, bootstrapping). Helix solves them once, so you can focus on your system.
  7. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
  8. Use case: distributed data store. Distributed. (Diagram: partition P.1; Node 1, Node 2, Node 3.)
  9. Use case: distributed data store. Distributed, partitioned. (Diagram: 12 partitions, P.1 to P.12, spread across Node 1, Node 2 and Node 3.)
  10. Use case: distributed data store. Distributed, partitioned, replicated. (Diagram: two replicas of each partition P.1 to P.12, spread across Node 1, Node 2 and Node 3.)
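
Slides 9 and 10 split the data into partitions and replicate each partition. A minimal sketch of such a layout, assuming hash partitioning with `zlib.crc32` and a simple ring placement; both choices and the helper names `partition_for_key`, `place_replicas` and `replicas_per_node` are hypothetical illustrations, not the algorithm Helix or the presenters use:

```python
import zlib
from collections import Counter

def partition_for_key(key, num_partitions):
    """Map a key to a partition number 1..num_partitions with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions + 1

def place_replicas(num_partitions, nodes, replicas):
    """Assign each partition's replicas to distinct nodes in ring order.

    Returns {"P.1": [node, ...], ...}. Requires replicas <= len(nodes) so
    that no two replicas of one partition share a node.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    layout = {}
    for p in range(num_partitions):
        # Partition p starts on node p (mod N) and takes the following nodes
        # in the ring: replicas stay apart, and the load is even whenever the
        # partition count is a multiple of the node count.
        layout[f"P.{p + 1}"] = [nodes[(p + r) % len(nodes)] for r in range(replicas)]
    return layout

def replicas_per_node(layout):
    """Count how many partition replicas each node hosts."""
    return Counter(node for hosts in layout.values() for node in hosts)
```

With 12 partitions, three nodes and two replicas, as on the slide, every node ends up hosting 8 replicas.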
  11. Partition layout: highly available; the master accepts writes; balanced distribution. (Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2 and Node 3.)
  12. Failover. (Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2 and Node 3.)
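
On failover, every partition whose master lived on the failed node needs a surviving slave promoted. A sketch, assuming the layout is a plain dict of partition to {node: state}; `fail_node` is a hypothetical helper, and promoting the slave with the lowest node name is an arbitrary tie-break chosen for determinism:

```python
def fail_node(states, failed):
    """Remove `failed` from every partition and restore a master where needed.

    `states` maps partition -> {node: "MASTER" | "SLAVE"}.
    Returns a new dict; the input is left unchanged.
    """
    result = {}
    for partition, replicas in states.items():
        remaining = {n: s for n, s in replicas.items() if n != failed}
        if "MASTER" not in remaining.values():
            # The master was lost: promote one surviving slave, if any.
            slaves = sorted(n for n, s in remaining.items() if s == "SLAVE")
            if slaves:
                remaining[slaves[0]] = "MASTER"
        result[partition] = remaining
    return result
```

Partitions whose master survived are left alone, so only the affected partitions change state.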
  13. Add capacity. (Diagram: Node 4 is added and takes replicas of P.1, P.5, P.8, P.9, P.10 and P.12; master and slave replicas of P.1 to P.12 across Node 1, Node 2 and Node 3.)
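
Adding a node should even out the load while moving as few replicas as possible. A greedy sketch, assuming the layout is partition to list of hosting nodes; `add_node` is hypothetical, and "take one replica at a time from the most loaded node" is a simple heuristic, not Helix's rebalancing algorithm:

```python
from collections import Counter

def add_node(layout, new_node):
    """Move just enough replicas onto `new_node` to reach its fair share.

    `layout` maps partition -> list of nodes hosting a replica. Replicas that
    do not move stay where they are. Returns (new_layout, moves), where each
    move is (partition, source_node, new_node).
    """
    layout = {p: list(hosts) for p, hosts in layout.items()}   # do not mutate input
    load = Counter(n for hosts in layout.values() for n in hosts)
    load[new_node] = 0
    target = sum(load.values()) // len(load)                   # fair share per node
    moves = []
    while load[new_node] < target:
        donor = max(load, key=lambda n: (load[n], n))          # most loaded node
        # Only move a partition the new node does not already host, so two
        # replicas of one partition never land on the same node.
        candidates = [p for p, hosts in sorted(layout.items())
                      if donor in hosts and new_node not in hosts]
        if not candidates:
            break
        p = candidates[0]
        layout[p][layout[p].index(donor)] = new_node
        load[donor] -= 1
        load[new_node] += 1
        moves.append((p, donor, new_node))
    return layout, moves
```

For the slide's 24 replicas on three nodes, adding Node 4 takes exactly six moves and leaves every node with six replicas; each move is needed to bring the new node up to its share.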
  14. Use-case requirements:
      • Partition constraints
          – 1 master per partition
          – Balance partitions across the cluster
          – No single point of failure: replicas on different nodes
      • Handle failures: transfer mastership
      • Elasticity
          – Distribute workload across added nodes
          – Minimize partition movement
      • Meet SLAs
          – Throttle concurrent data movement
  15. Recruiting Solutions
  16. Generalizing cluster management: state machine, constraints, objective
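
Slide 16 reduces the data-store requirements to a state machine, constraints and an objective. A sketch of the first two for the master/slave example, written as plain Python data; the encoding and the helpers `transition_path` and `satisfies_constraints` are hypothetical and are not Helix's API. An objective such as balance would be applied by whatever computes the layout on top of these rules.

```python
from collections import deque

# Hypothetical encoding of a MASTER/SLAVE/OFFLINE state machine.
TRANSITIONS = {("OFFLINE", "SLAVE"), ("SLAVE", "MASTER"),
               ("MASTER", "SLAVE"), ("SLAVE", "OFFLINE")}
# Constraint: at most one MASTER replica per partition.
MAX_PER_PARTITION = {"MASTER": 1}

def transition_path(src, dst):
    """Shortest sequence of legal transitions from src to dst (breadth-first)."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return list(zip(path, path[1:]))
        for a, b in sorted(TRANSITIONS):
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    raise ValueError(f"no legal path from {src} to {dst}")

def satisfies_constraints(replica_states):
    """Check one partition's {node: state} map against the per-state limits."""
    counts = {}
    for s in replica_states.values():
        counts[s] = counts.get(s, 0) + 1
    return all(counts.get(s, 0) <= limit for s, limit in MAX_PER_PARTITION.items())
```

Because OFFLINE to MASTER is not a direct transition, a new replica must pass through SLAVE first; this is what lets the system catch a replica up before it accepts writes.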
  17. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
  18. Helix-based system roles. (Diagram labels: Controller, PARTICIPANT, SPECTATOR, IDEAL STATE, CURRENT STATE, COMMAND, RESPONSE, Partition routing logic; replicas of partitions P.1 to P.12 across Node 1, Node 2 and Node 3.)
  19. Controller execution flow. (Diagram: partition layout N1: P1, P2; N2: P2, P3; N3: P3, P1. State model: OFFLINE (O), SLAVE (S), MASTER (M). The REBALANCER emits transitions such as P1:OS (OFFLINE to SLAVE) and P1:SM (SLAVE to MASTER) onto a MESSAGE QUEUE. Also shown: ZooKeeper, SPECTATORS.)
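
The rebalancer step on slide 19 can be read as a diff between the current state and the desired state that emits one transition per replica, using the slide's O/S/M message codes. A sketch under stated assumptions: states are keyed by (partition, node), a missing entry means OFFLINE, `compute_messages` and `max_in_flight` are hypothetical names, and the two safety rules (no promotion while another replica is still MASTER, and a cap on concurrent transitions, echoing "throttle concurrent data movement") are illustrative choices:

```python
# Next single step toward a target state in the MASTER/SLAVE/OFFLINE model.
NEXT_STEP = {
    ("OFFLINE", "SLAVE"): "SLAVE", ("OFFLINE", "MASTER"): "SLAVE",
    ("SLAVE", "MASTER"): "MASTER", ("SLAVE", "OFFLINE"): "OFFLINE",
    ("MASTER", "SLAVE"): "SLAVE", ("MASTER", "OFFLINE"): "SLAVE",
}

def compute_messages(current, ideal, max_in_flight=None):
    """Emit the next transition for every replica not yet in its ideal state.

    current/ideal map (partition, node) -> state. Returns messages such as
    ("N1", "P1:OS"): node N1 should move P1 from OFFLINE to SLAVE.
    """
    messages = []
    for key in sorted(set(current) | set(ideal)):
        partition, node = key
        cur = current.get(key, "OFFLINE")
        want = ideal.get(key, "OFFLINE")
        if cur == want:
            continue
        step = NEXT_STEP[(cur, want)]
        if step == "MASTER" and any(
                p == partition and n != node and s == "MASTER"
                for (p, n), s in current.items()):
            continue  # wait until the old master has been demoted
        messages.append((node, f"{partition}:{cur[0]}{step[0]}"))
        if max_in_flight is not None and len(messages) >= max_in_flight:
            break     # throttle: issue the rest in a later round
    return messages
```

Only one step is issued per replica per round; once participants report their new current state, the next round issues the following step.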
  20. Controller fault tolerance. (Diagram: ZooKeeper; Controller 1 LEADER, Controller 2 STANDBY, Controller 3 STANDBY.)
  21. Controller fault tolerance. (Diagram: ZooKeeper; Controller 1 OFFLINE, Controller 2 LEADER, Controller 3 STANDBY.)
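
Slides 20 and 21 show one leader controller with standbys that take over when the leader goes offline. A common way to get this on ZooKeeper is an ephemeral node that only one session can hold and that disappears when its owner's session ends. The sketch below simulates that with an in-memory stand-in; `FakeCoordinator`, `elect` and the path `LEADER_PATH` are hypothetical and do not model a real ZooKeeper client:

```python
class FakeCoordinator:
    """In-memory stand-in for ZooKeeper ephemeral nodes (illustration only)."""

    def __init__(self):
        self.nodes = {}                       # path -> owning session

    def create_ephemeral(self, path, owner):
        """Create `path` owned by `owner`; fail if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def expire_session(self, owner):
        """An ephemeral node vanishes when its owner's session ends."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}

LEADER_PATH = "/CLUSTER/CONTROLLER/LEADER"    # hypothetical path

def elect(coordinator, controllers):
    """Every live controller races for the leader node; the holder leads."""
    for c in controllers:
        coordinator.create_ephemeral(LEADER_PATH, c)   # no-op if already held
    leader = coordinator.nodes.get(LEADER_PATH)
    return {c: "LEADER" if c == leader else "STANDBY" for c in controllers}
```

Running `elect` again after `expire_session("Controller 1")` makes Controller 2 the leader, matching the transition from slide 20 to slide 21.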
  22. Participant plug-in code
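
The participant code on slide 22 is an image and cannot be recovered here. The idea it illustrates is that a participant only implements one callback per state transition and follows the controller's orders. A Python sketch of that pattern; the class, method names and messages are hypothetical (Helix's own API is Java and is built around the same per-transition-callback idea, but this is not that API):

```python
class MasterSlaveParticipant:
    """Hypothetical participant plug-in: one callback per state transition."""

    CODES = {"O": "offline", "S": "slave", "M": "master"}

    def __init__(self, node):
        self.node = node
        self.state = {}          # partition -> current state, reported back
        self.log = []

    def on_become_slave_from_offline(self, partition):
        self.log.append(f"{self.node}: open {partition}, start replicating")

    def on_become_master_from_slave(self, partition):
        self.log.append(f"{self.node}: catch up {partition}, accept writes")

    def on_become_slave_from_master(self, partition):
        self.log.append(f"{self.node}: stop accepting writes for {partition}")

    def on_become_offline_from_slave(self, partition):
        self.log.append(f"{self.node}: close {partition}")

    def handle(self, message):
        """Apply a controller message like 'P1:OS' and return the new state."""
        partition, code = message.split(":")
        src, dst = self.CODES[code[0]], self.CODES[code[1]]
        getattr(self, f"on_become_{dst}_from_{src}")(partition)
        self.state[partition] = dst.upper()
        return self.state[partition]
```

The participant holds only local knowledge: it never decides which transition to make, it just executes the one it is sent and reports the result.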
  23. Spectator plug-in code
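
The spectator code on slide 23 is likewise an image. A spectator (for example a router) only needs to read which node hosts each partition in which state. A sketch, assuming the spectator keeps a snapshot `view` of partition to {node: state}; the helpers `route_write` and `route_read` and the "reads prefer slaves" policy are hypothetical:

```python
def nodes_in_state(view, partition, state):
    """Sorted nodes currently hosting `partition` in `state`."""
    return sorted(n for n, s in view.get(partition, {}).items() if s == state)

def route_write(view, partition):
    """Writes must go to the partition's master."""
    masters = nodes_in_state(view, partition, "MASTER")
    if not masters:
        raise LookupError(f"no master for {partition} (failover in progress?)")
    return masters[0]

def route_read(view, partition):
    """Reads prefer slaves to offload the master, falling back to the master."""
    return nodes_in_state(view, partition, "SLAVE") or [route_write(view, partition)]
```

When the controller changes the layout, the spectator only has to refresh `view`; the routing logic itself stays unchanged.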
  24. Benefits: Cluster operations “just work” (bootstrapping, failover, adding nodes). Global vs. local: the Helix controller has global knowledge and makes cluster decisions; a participant has local knowledge and follows orders.
  25. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
  26. Consumer group
  27. Consumer group: scaling
  28. Consumer group: fault tolerance
  29. Consumer group: state model with two states, OFFLINE and ONLINE, and at most one ONLINE replica per partition (MAX=1)
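
With the ONLINE/OFFLINE model and MAX=1, scaling and fault tolerance for a consumer group both reduce to recomputing which single consumer owns each partition. A sketch, assuming round-robin assignment over the live consumers; `assign` is a hypothetical helper, and recomputing from scratch can move more partitions than strictly necessary (a minimal-movement strategy would be preferable in practice):

```python
def assign(partitions, consumers):
    """Give every partition exactly one ONLINE consumer (the MAX=1 rule).

    Partitions are dealt round-robin over the live consumers, so adding a
    consumer spreads the load and removing one hands its partitions to the
    rest. With no live consumers every partition stays OFFLINE ({}).
    """
    live = sorted(consumers)
    if not live:
        return {p: {} for p in partitions}
    return {p: {live[i % len(live)]: "ONLINE"} for i, p in enumerate(partitions)}
```

Scaling out is a call with one more consumer; a consumer failure is a call with one fewer.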
  30. Outline: What is Helix; Use case 1: distributed data store; Architecture; Use case 2: consumer group; Helix at LinkedIn; Q&A
  31. Helix usage at LinkedIn:
      • Espresso: a timeline-consistent, distributed data store
      • Databus: a change data capture service
      • Search as a Service: a multi-tenant service for multiple search applications
      • More planned
  32. Summary: Building distributed data systems is hard; abstraction and modularity are key. Helix: a generic framework for cluster management. Simple programming model: a declarative state machine.
  33. Helix: future roadmap
      • Features: span multiple data centers; load balancing
      • Announcement: open source (https://github.com/linkedin/helix); Apache incubation; new contributors
  34. Questions?
