Apache Helix presentation at SOCC 2012
 


Untangling Cluster Management with Helix SOCC 2012 presentation


  • Partitioned queue consumption: let's say there are 6 queues and some consumers that consume from them. The requirement is simple: the queues must be divided equally among the consumers. On top of that, we need partition affinity while consuming, instead of randomly picking from any queue.

Presentation Transcript

  • Untangling Cluster Management with Helix. Helix team @ LinkedIn. Kishore Gopalakrishna, http://www.linkedin.com/in/kgopalak, @kishoreg1980
  • Outline What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A 2
  • What is Helix Cluster management framework for distributed systems using declarative state model 3
  • Distributed system examples 4
  • Motivation A system starts out simple… …but gets complex in the real world …as you address real requirements Application client library  Scale  Failover  Bootstrapping Call Routing System Replica 1 … Replica 2 … 5
  • Motivation These are cluster management problems  Helix solves them once… Scale  …so you can focus on your system Failover  Bootstrapping 6
  • Outline What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A 7
  • Use-Case: Distributed Data Store. Distributed. [Diagram: P.1; Node 1, Node 2, Node 3]
  • Use-Case: Distributed Data Store. Distributed, partitioned. [Diagram: partitions P.1 to P.12 spread across Node 1, Node 2, Node 3]
  • Use-Case: Distributed Data Store. Distributed, partitioned, replicated. [Diagram: partitions P.1 to P.12 and their replicas across Node 1, Node 2, Node 3]
  • Partition Layout: highly available; master accepts writes; balanced distribution. [Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2, Node 3]
  • Failover. [Diagram: master and slave replicas of P.1 to P.12 across Node 1, Node 2, Node 3]
  • Add Capacity. [Diagram: Node 4 added alongside Node 1, Node 2, Node 3, holding a share of the master and slave partitions]
  • Use-case requirements
      • Partition constraints: 1 master per partition; balance partitions across the cluster; no single point of failure (replicas on different nodes)
      • Handle failures: transfer mastership
      • Elasticity: distribute workload across added nodes; minimize partition movement
      • Meet SLAs: throttle concurrent data movement
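The partition constraints and the failure-handling requirement above can be illustrated with a small placement sketch. This is not Helix's rebalancing algorithm: it is a plain round-robin scheme, and the names `assign_replicas` and `fail_over` are hypothetical. It only shows that one master per partition, replicas on different nodes, and mastership transfer can be expressed in a few lines.

```python
from collections import Counter


def assign_replicas(partitions, nodes, replicas=2):
    """Round-robin placement (an assumption, not Helix's algorithm).

    Replica r of partition i goes to node (i + r) % len(nodes). Replica 0 is
    the master. The offsets r are distinct and smaller than the node count,
    so replicas of one partition never share a node.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    layout = {}
    for i, partition in enumerate(partitions):
        owners = [nodes[(i + r) % len(nodes)] for r in range(replicas)]
        layout[partition] = {"MASTER": owners[0], "SLAVE": owners[1:]}
    return layout


def fail_over(layout, dead_node):
    """Drop dead_node everywhere and promote a surviving slave to master."""
    for roles in layout.values():
        roles["SLAVE"] = [n for n in roles["SLAVE"] if n != dead_node]
        if roles["MASTER"] == dead_node:
            # Transfer mastership; None means no replica survived.
            roles["MASTER"] = roles["SLAVE"].pop(0) if roles["SLAVE"] else None
    return layout


partitions = [f"P.{i}" for i in range(1, 13)]
nodes = ["Node 1", "Node 2", "Node 3"]
layout = assign_replicas(partitions, nodes)
masters_per_node = Counter(roles["MASTER"] for roles in layout.values())
```

With 12 partitions on 3 nodes, each node masters 4 partitions. The sketch ignores the elasticity and throttling requirements, which need knowledge of the previous layout and of in-flight transfers.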
  • Generalizing cluster management: state machine, constraints, objective
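One way to picture the three ingredients named on this slide is as plain data plus two small checks. The sketch below is an illustration only: the dictionary layout, the reverse transitions (MASTER to SLAVE, SLAVE to OFFLINE), and the function names `violations` and `master_spread` are assumptions, not Helix's actual state model format.

```python
# Hypothetical declarative description of a master/slave state model.
STATE_MODEL = {
    "states": ["OFFLINE", "SLAVE", "MASTER"],
    # Legal transitions; the two downward transitions are assumed here.
    "transitions": [
        ("OFFLINE", "SLAVE"),
        ("SLAVE", "MASTER"),
        ("MASTER", "SLAVE"),
        ("SLAVE", "OFFLINE"),
    ],
    # Constraint: at most one MASTER replica per partition.
    "max_per_partition": {"MASTER": 1},
}


def violations(state_model, assignment):
    """List (partition, state, count) for every broken per-partition limit.

    assignment maps partition -> {node: state}.
    """
    problems = []
    for partition, replica_states in assignment.items():
        for state, limit in state_model["max_per_partition"].items():
            count = sum(1 for s in replica_states.values() if s == state)
            if count > limit:
                problems.append((partition, state, count))
    return problems


def master_spread(assignment):
    """Objective: difference between the busiest and idlest node by master count."""
    counts = {}
    for replica_states in assignment.values():
        for node, state in replica_states.items():
            counts.setdefault(node, 0)
            if state == "MASTER":
                counts[node] += 1
    return max(counts.values()) - min(counts.values()) if counts else 0
```

A controller built along these lines would search for an assignment with no violations and the smallest spread, rather than hard-coding failover logic for each system.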
  • Outline What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A 17
  • Helix Based System Roles: Controller, Participant, Spectator. [Diagram labels: IDEAL STATE, CURRENT STATE, COMMAND, RESPONSE, partition routing logic; participants Node 1, Node 2, Node 3 hosting partitions P.1 to P.12]
  • Controller Execution Flow. [Diagram labels: REBALANCER; states OFFLINE (O), SLAVE (S), MASTER (M); nodes N1, N2, N3 with partitions P1, P2, P3; transition messages P1:OS and P1:SM; MESSAGE QUEUE; ZooKeeper; SPECTATORS]
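The transition messages in this slide (P1:OS, then P1:SM) suggest that the controller compares current state with ideal state and issues one legal transition at a time. A minimal sketch of that comparison follows, under assumptions: the transition table, the `(node, partition)` keys, and the function names are hypothetical and do not reproduce the Helix controller pipeline.

```python
from collections import deque

# Assumed legal transitions of a master/slave state model.
TRANSITIONS = {
    "OFFLINE": ["SLAVE"],
    "SLAVE": ["MASTER", "OFFLINE"],
    "MASTER": ["SLAVE"],
}


def path(src, dst):
    """Breadth-first search for the shortest state sequence from src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        route = queue.popleft()
        if route[-1] == dst:
            return route
        for nxt in TRANSITIONS[route[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    raise ValueError(f"no path from {src} to {dst}")


def next_messages(current, ideal):
    """Emit one (node, partition, from_state, to_state) per out-of-place replica.

    Replicas missing from either map are treated as OFFLINE.
    """
    messages = []
    for key in sorted(set(current) | set(ideal)):
        state = current.get(key, "OFFLINE")
        target = ideal.get(key, "OFFLINE")
        if state != target:
            node, partition = key
            messages.append((node, partition, state, path(state, target)[1]))
    return messages


ideal = {("N1", "P1"): "MASTER", ("N2", "P1"): "SLAVE"}
first_round = next_messages({}, ideal)
```

Calling `next_messages` again after the participants report their new current state yields the SLAVE to MASTER step for N1, so a replica reaches MASTER in two rounds, matching the O to S to M sequence on the slide.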
  • Controller fault tolerance (via ZooKeeper): Controller 1 LEADER, Controller 2 STANDBY, Controller 3 STANDBY
  • Controller fault tolerance (via ZooKeeper): Controller 1 OFFLINE, Controller 2 LEADER, Controller 3 STANDBY
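The two slides above show leadership moving from Controller 1 to Controller 2 when the leader goes offline. The toy model below reproduces only that outcome in memory. It assumes succession follows join order and uses a hypothetical `ControllerGroup` class; a real deployment would delegate the election to ZooKeeper rather than a local object.

```python
class ControllerGroup:
    """In-memory stand-in for leader election among controllers (illustrative only)."""

    def __init__(self, names):
        self.order = list(names)  # assumed: join order decides succession
        self.live = set(names)

    def fail(self, name):
        """Mark a controller as offline."""
        self.live.discard(name)

    def roles(self):
        """First live controller in join order leads; other live ones stand by."""
        roles = {}
        leader_seen = False
        for name in self.order:
            if name not in self.live:
                roles[name] = "OFFLINE"
            elif not leader_seen:
                roles[name] = "LEADER"
                leader_seen = True
            else:
                roles[name] = "STANDBY"
        return roles


group = ControllerGroup(["Controller 1", "Controller 2", "Controller 3"])
before = group.roles()
group.fail("Controller 1")
after = group.roles()
```

Here `before` and `after` correspond to the two slide states: one leader with two standbys, then the first standby promoted once the leader is gone.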
  • Participant Plug-in code
  • Spectator Plug-in code
  • Benefits Cluster operations “just work” – Bootstrapping – Failover – Add nodes Global vs Local – Helix Controller  Global knowledge  Makes cluster decisions – Participant  Local knowledge  Follows orders 24
  • Outline What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A 25
  • Consumer group
  • Consumer group: Scaling
  • Consumer group: Fault tolerance
  • Consumer group: state model. States: ONLINE (MAX=1), OFFLINE
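The consumer group slides can be read as: every queue has at most one ONLINE consumer, queues are spread evenly, and a queue stays with its consumer unless balance forces a move. The sketch below shows one way to compute such an assignment. The `rebalance` function and its quota scheme are assumptions made for illustration, not the algorithm Helix uses.

```python
def rebalance(queues, consumers, previous=None):
    """Give each queue exactly one consumer, keeping previous owners when possible.

    previous maps queue -> consumer from the last assignment (may be empty).
    """
    previous = previous or {}
    if not consumers:
        return {}
    ordered = sorted(consumers)
    base, extra = divmod(len(queues), len(ordered))
    # The first `extra` consumers in sorted order may hold one more queue.
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(ordered)}
    load = {c: 0 for c in ordered}
    assignment, unassigned = {}, []
    for queue in queues:
        owner = previous.get(queue)
        if owner in load and load[owner] < quota[owner]:
            assignment[queue] = owner  # affinity: the queue stays where it was
            load[owner] += 1
        else:
            unassigned.append(queue)
    for queue in unassigned:
        # Pick the consumer furthest below its quota.
        target = min(ordered, key=lambda c: load[c] - quota[c])
        assignment[queue] = target
        load[target] += 1
    return assignment


queues = [f"q{i}" for i in range(1, 7)]
two = rebalance(queues, ["c1", "c2"])
three = rebalance(queues, ["c1", "c2", "c3"], previous=two)  # scaling out
survivors = rebalance(queues, ["c2", "c3"], previous=three)  # c1 fails
```

With 6 queues, scaling from two to three consumers moves only the two queues the new consumer needs, and a consumer failure reassigns only the failed consumer's queues.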
  • Outline What is Helix Use case 1: distributed data store Architecture Use case 2: consumer group Helix at LinkedIn Q&A 30
  • Helix usage at LinkedIn: Espresso, a timeline-consistent, distributed data store; Databus, a change data capture service; Search as a Service, a multi-tenant service for multiple search applications; more planned.
  • Summary: Building distributed data systems is hard; abstraction and modularity are key. Helix: a generic framework for cluster management. Simple programming model: declarative state machine.
  • Helix: Future Roadmap. Features: span multiple data centers; load balancing. Announcement: open source at https://github.com/linkedin/helix; Apache incubation; new contributors.
  • Questions?