Apache Helix presentation at SOCC 2012

Untangling Cluster Management with Helix

Helix team @ LinkedIn
Kishore Gopalakrishna
http://www.linkedin.com/in/kgopalak
@kishoreg1980

Outline

• What is Helix
• Use case 1: distributed data store
• Architecture
• Use case 2: consumer group
• Helix at LinkedIn
• Q&A


What is Helix

A cluster management framework for distributed systems, using a declarative state model


Distributed system examples


Motivation

• A system starts out simple…
• …but gets complex in the real world
• …as you address real requirements

[Figure: Application, client library with Call Routing, and a System of replicas (Replica 1 …, Replica 2 …); labels: Scale, Failover, Bootstrapping]

Motivation

• These are cluster management problems: scale, failover, bootstrapping
• Helix solves them once…
• …so you can focus on your system


Outline

• What is Helix
• Architecture
• Q&A


Use-Case: Distributed Data Store

• Distributed

[Figure: partition P.1 on a cluster of three nodes (Node 1, Node 2, Node 3)]



• Distributed
• Partitioned

Node 1: P.1 P.2 P.3 P.4
Node 2: P.5 P.6 P.7 P.8
Node 3: P.9 P.10 P.11 P.12




• Distributed
• Partitioned
• Replicated

Node 1: P.1 P.2 P.3 P.4 P.5 P.6 P.9 P.10
Node 2: P.5 P.6 P.7 P.8 P.1 P.2 P.11 P.12
Node 3: P.9 P.10 P.11 P.12 P.3 P.4 P.7 P.8
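In the replicated layout, the two copies of each partition sit on different nodes and every node holds the same number of replicas. A minimal Java sketch of one placement rule with both properties, assuming a simple round-robin offset (a hypothetical scheme for illustration; it does not reproduce the exact assignment drawn on the slide and is not Helix's rebalancer):

```java
import java.util.*;

// Hypothetical placement rule (not Helix's rebalancer): replica r of partition p goes to
// node (p + r) mod N. Since r < N, the replicas of one partition land on distinct nodes.
class ReplicaPlacement {
    static Map<String, List<String>> place(int partitions, int replicas, List<String> nodes) {
        if (replicas > nodes.size()) {
            // More replicas than nodes would force two copies onto one node.
            throw new IllegalArgumentException("replicas must not exceed the number of nodes");
        }
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (String node : nodes) {
            byNode.put(node, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < replicas; r++) {
                String node = nodes.get((p + r) % nodes.size());
                byNode.get(node).add("P." + (p + 1));   // partitions are named P.1 .. P.n
            }
        }
        return byNode;
    }
}
```

For example, place(12, 2, List.of("Node 1", "Node 2", "Node 3")) gives each node 8 of the 24 replicas, with no partition appearing twice on the same node.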



Partition Layout

• Highly Available
• Master accepts writes
• Balanced distribution

[Figure: the replicated three-node layout, each replica marked Master or Slave]



Failover

[Figure: the replicated three-node layout during a node failure; legend: Master, Slave]
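Failover means that when a node dies, every partition whose master lived there gets a new master from its surviving slaves. A minimal sketch over a hypothetical in-memory assignment (partition to node to state); in a Helix-based system the controller achieves this by issuing state-transition commands to participants rather than by editing a map:

```java
import java.util.*;

// Hypothetical in-memory model: partition -> (node -> "MASTER" | "SLAVE").
// A real controller would send state-transition messages instead of editing a map.
class Failover {
    static void handleNodeFailure(Map<String, Map<String, String>> assignment, String failedNode) {
        for (Map<String, String> replicas : assignment.values()) {
            String lostState = replicas.remove(failedNode);   // the failed node's replica is gone
            if (!"MASTER".equals(lostState)) {
                continue;                                      // losing a slave needs no promotion
            }
            // Transfer mastership: promote the first surviving slave, keeping one master.
            for (Map.Entry<String, String> replica : replicas.entrySet()) {
                if ("SLAVE".equals(replica.getValue())) {
                    replica.setValue("MASTER");
                    break;
                }
            }
        }
    }
}
```

The inner maps must be mutable (for example TreeMap), since the failed replica is removed and a slave is promoted in place.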


Add Capacity

[Figure: Node 4 joins the three-node layout; legend: Master, Slave]
Node 4: P.1 P.5 P.9 P.10 P.12 P.8
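When Node 4 joins, the goal is to give it a fair share of replicas while moving as few as possible. A minimal greedy sketch, assuming a hypothetical node-to-replicas map and ignoring master/slave roles: it takes one replica at a time from the currently most loaded node and stops as soon as the new node reaches its share, so every other replica stays put.

```java
import java.util.*;

// Hypothetical greedy rebalancing for an added node: node -> list of hosted replicas.
// Master/slave roles are ignored; only replica placement is considered.
class AddCapacity {
    // Returns the number of replicas moved onto newNode.
    static int addNode(Map<String, List<String>> layout, String newNode) {
        layout.put(newNode, new ArrayList<>());
        int total = 0;
        for (List<String> replicas : layout.values()) {
            total += replicas.size();
        }
        int target = total / layout.size();            // fair share per node, rounded down
        List<String> added = layout.get(newNode);
        int moves = 0;
        while (added.size() < target) {
            // Donor: the most loaded existing node (the first one wins on ties).
            String donor = null;
            for (String node : layout.keySet()) {
                if (node.equals(newNode)) continue;
                if (donor == null || layout.get(node).size() > layout.get(donor).size()) {
                    donor = node;
                }
            }
            // Move a replica the new node does not already hold, so copies stay on distinct nodes.
            String moved = null;
            for (String replica : layout.get(donor)) {
                if (!added.contains(replica)) {
                    moved = replica;
                    break;
                }
            }
            if (moved == null) break;                  // simplification: no retry with other donors
            layout.get(donor).remove(moved);
            added.add(moved);
            moves++;
        }
        return moves;
    }
}
```

On the three-node layout with 8 replicas per node, addNode(layout, "Node 4") moves 6 of the 24 replicas and leaves every node with 6.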


Use-case requirements

• Partition constraints
– 1 master per partition
– Balance partitions across cluster
– No single point of failure: replicas on different nodes
• Handle failures: transfer mastership
• Elasticity
– Distribute workload across added nodes
– Minimize partition movement
• Meet SLAs
– Throttle concurrent data movement
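The partition constraints can be checked mechanically against an assignment. A minimal sketch of such a check, assuming a hypothetical partition-to-node-to-state map and reading "balance" as master counts per node differing by at most one; the elasticity and throttling requirements are not modeled:

```java
import java.util.*;

// Hypothetical checker for the partition constraints, over an in-memory assignment
// partition -> (node -> "MASTER" | "SLAVE"). Map keys are unique, so replicas of one
// partition are on distinct nodes by construction; the checks below cover the rest.
class ConstraintCheck {
    static List<String> violations(Map<String, Map<String, String>> assignment,
                                   List<String> nodes, int replicas) {
        List<String> problems = new ArrayList<>();
        Map<String, Integer> mastersPerNode = new TreeMap<>();
        for (String node : nodes) {
            mastersPerNode.put(node, 0);
        }
        for (Map.Entry<String, Map<String, String>> e : assignment.entrySet()) {
            Map<String, String> states = e.getValue();
            if (states.size() != replicas) {
                problems.add(e.getKey() + ": " + states.size() + " replicas, expected " + replicas);
            }
            long masters = states.values().stream().filter("MASTER"::equals).count();
            if (masters != 1) {
                problems.add(e.getKey() + ": " + masters + " masters, expected 1");
            }
            for (Map.Entry<String, String> s : states.entrySet()) {
                if ("MASTER".equals(s.getValue())) {
                    mastersPerNode.merge(s.getKey(), 1, Integer::sum);
                }
            }
        }
        // Balance: master counts per node should differ by at most one.
        if (!mastersPerNode.isEmpty()) {
            int max = Collections.max(mastersPerNode.values());
            int min = Collections.min(mastersPerNode.values());
            if (max - min > 1) {
                problems.add("unbalanced masters per node: " + mastersPerNode);
            }
        }
        return problems;
    }
}
```

An empty result means the assignment meets all three partition constraints.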


Generalizing cluster management

• State machine
• Constraints
• Objective
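Helix describes a system declaratively: a state machine, constraints on how many replicas may be in each state, and an objective. A minimal Java sketch of the first two for the master/slave data store, using a hypothetical builder-style class (not the Helix API); the transition set and the bounds (1 MASTER and replicas minus 1 SLAVEs per partition) are illustrative assumptions, and the objective is left out:

```java
import java.util.*;

// Hypothetical declarative state model: allowed transitions plus per-partition upper bounds.
class StateModelSpec {
    final String initialState;
    final Map<String, Set<String>> transitions = new LinkedHashMap<>();  // from -> allowed targets
    final Map<String, Integer> maxPerPartition = new HashMap<>();        // state -> upper bound

    StateModelSpec(String initialState) {
        this.initialState = initialState;
    }

    StateModelSpec transition(String from, String to) {
        transitions.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        return this;
    }

    StateModelSpec limit(String state, int max) {
        maxPerPartition.put(state, max);
        return this;
    }

    boolean allows(String from, String to) {
        return transitions.getOrDefault(from, Set.of()).contains(to);
    }

    // True if one partition's replica states respect every upper bound.
    boolean satisfiesLimits(Collection<String> replicaStates) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : replicaStates) {
            counts.merge(s, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> c : counts.entrySet()) {
            Integer max = maxPerPartition.get(c.getKey());
            if (max != null && c.getValue() > max) return false;
        }
        return true;
    }

    static StateModelSpec masterSlave(int replicas) {
        return new StateModelSpec("OFFLINE")
            .transition("OFFLINE", "SLAVE")
            .transition("SLAVE", "MASTER")
            .transition("MASTER", "SLAVE")
            .transition("SLAVE", "OFFLINE")
            .limit("MASTER", 1)
            .limit("SLAVE", replicas - 1);
    }
}
```

The point of the declarative form is that the application states what is legal, and a generic controller works out how to get there.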


Helix Based System Roles

[Figure: the Controller sends COMMANDs to PARTICIPANT nodes hosting the partition replicas and receives RESPONSEs; labels: IDEAL STATE, CURRENT STATE; a SPECTATOR holds the partition routing logic]


Controller Execution Flow

[Figure: node/partition tables (N1: P1 P2; N2: P2 P3; N3: P3 P1) with replica states OFFLINE (O), SLAVE (S) and MASTER (M); the REBALANCER emits transitions such as P1:OS (OFFLINE to SLAVE) and P1:SM (SLAVE to MASTER) into a MESSAGE QUEUE; ZooKeeper; SPECTATORS]
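The rebalancing step compares each replica's current state with its ideal state and emits the next legal transition, such as OFFLINE to SLAVE (P1:OS) and later SLAVE to MASTER (P1:SM). A minimal sketch of that diff, assuming hypothetical in-memory maps for both states, the master/slave transitions, and one hop per replica per round (the next hop is computed after participants report their new state):

```java
import java.util.*;

// Hypothetical controller step: diff CURRENT STATE against IDEAL STATE and emit the
// next transition each replica needs. Maps are partition -> (node -> state).
class ControllerStep {
    static final Map<String, List<String>> TRANSITIONS = Map.of(
        "OFFLINE", List.of("SLAVE"),
        "SLAVE", List.of("MASTER", "OFFLINE"),
        "MASTER", List.of("SLAVE"));

    // Shortest legal path between two states (breadth-first search).
    static List<String> path(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        parent.put(from, null);
        while (!queue.isEmpty()) {
            String s = queue.poll();
            if (s.equals(to)) break;
            for (String next : TRANSITIONS.getOrDefault(s, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, s);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(to)) throw new IllegalStateException("no path " + from + " -> " + to);
        LinkedList<String> states = new LinkedList<>();
        for (String s = to; s != null; s = parent.get(s)) {
            states.addFirst(s);
        }
        return states;
    }

    // Replicas missing from the current state are treated as OFFLINE.
    static List<String> nextMessages(Map<String, Map<String, String>> current,
                                     Map<String, Map<String, String>> ideal) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> p : new TreeMap<>(ideal).entrySet()) {
            for (Map.Entry<String, String> r : new TreeMap<>(p.getValue()).entrySet()) {
                String now = current.getOrDefault(p.getKey(), Map.of()).getOrDefault(r.getKey(), "OFFLINE");
                if (now.equals(r.getValue())) continue;
                String next = path(now, r.getValue()).get(1);   // first hop only
                messages.add(r.getKey() + " " + p.getKey() + ":" + now + "->" + next);
            }
        }
        return messages;
    }
}
```

With an ideal state of N1 as MASTER and N3 as SLAVE for P1 and nothing running yet, the first round sends OFFLINE->SLAVE to both nodes; only once both report SLAVE does N1 get SLAVE->MASTER.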

Controller fault tolerance

[Figure: Controllers 1, 2 and 3 connected through ZooKeeper; Controller 1 is LEADER, Controllers 2 and 3 are STANDBY]


Controller fault tolerance

[Figure: Controller 1 goes OFFLINE; Controller 2 becomes LEADER and Controller 3 remains STANDBY]
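With several controllers, exactly one is LEADER and the rest are STANDBY; when the leader goes OFFLINE, a standby takes over. A minimal in-memory stand-in for that election, assuming a single shared leader slot that behaves like a ZooKeeper ephemeral node (freed when its owner goes away); in a real deployment a standby would be notified through ZooKeeper rather than retrying by hand:

```java
// Hypothetical in-memory stand-in for ZooKeeper-based leader election among controllers.
class ControllerElection {
    private String leader;   // null while no controller holds leadership

    // Claims leadership if the slot is free; returns true if this controller is the leader.
    synchronized boolean tryAcquire(String controller) {
        if (leader == null) {
            leader = controller;
        }
        return controller.equals(leader);
    }

    // Called when a controller goes offline; frees the slot only if it held it.
    synchronized void offline(String controller) {
        if (controller.equals(leader)) {
            leader = null;
        }
    }

    synchronized String leader() {
        return leader;
    }
}
```

The first controller to claim the slot leads; the others stay standby until the slot is released.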


Participant Plug-in code
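On the participant side, the application supplies one callback per state transition, and the participant applies only the transitions the controller sends, reporting the new state back. A plain-Java sketch of that idea, with hypothetical class and method names (not the Helix participant API) and callbacks that only log what a real store might do:

```java
import java.util.*;

// Hypothetical participant plug-in: one callback per transition; the participant
// only applies transitions the controller asks for ("follows orders").
class ParticipantPlugin {
    private final Map<String, String> partitionState = new TreeMap<>();
    private final List<String> log = new ArrayList<>();

    // Illustrative application logic for each transition; here it only records the action.
    void onBecomeSlaveFromOffline(String partition) { log.add("bootstrap " + partition); }
    void onBecomeMasterFromSlave(String partition)  { log.add("accept writes for " + partition); }
    void onBecomeSlaveFromMaster(String partition)  { log.add("stop writes for " + partition); }
    void onBecomeOfflineFromSlave(String partition) { log.add("close " + partition); }

    // Applies one transition command and returns the resulting state as the response.
    String handle(String partition, String from, String to) {
        String current = stateOf(partition);
        if (!current.equals(from)) {
            throw new IllegalStateException(partition + " is " + current + ", not " + from);
        }
        switch (from + "->" + to) {
            case "OFFLINE->SLAVE": onBecomeSlaveFromOffline(partition); break;
            case "SLAVE->MASTER":  onBecomeMasterFromSlave(partition); break;
            case "MASTER->SLAVE":  onBecomeSlaveFromMaster(partition); break;
            case "SLAVE->OFFLINE": onBecomeOfflineFromSlave(partition); break;
            default: throw new IllegalArgumentException("unsupported transition " + from + "->" + to);
        }
        partitionState.put(partition, to);
        return to;
    }

    List<String> log() { return log; }

    String stateOf(String partition) { return partitionState.getOrDefault(partition, "OFFLINE"); }
}
```

Stale or out-of-order commands are rejected, which keeps all cluster-level decisions with the controller.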


Spectator Plug-in code
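A spectator only needs to know where each partition's master currently lives in order to route requests. A minimal routing sketch, assuming a hypothetical partition-to-node-to-state view and a hypothetical key-to-partition hash (both illustrative, not the Helix spectator API):

```java
import java.util.*;

// Hypothetical spectator-side router over a view partition -> (node -> state).
class SpectatorRouter {
    private final Map<String, Map<String, String>> view;
    private final int numPartitions;

    SpectatorRouter(Map<String, Map<String, String>> view, int numPartitions) {
        this.view = view;
        this.numPartitions = numPartitions;
    }

    // Hypothetical key-to-partition mapping: P.1 .. P.n by non-negative hash modulo n.
    String partitionFor(String key) {
        return "P." + (Math.floorMod(key.hashCode(), numPartitions) + 1);
    }

    // Writes go to the MASTER replica; empty if the partition currently has no master.
    Optional<String> masterFor(String key) {
        return view.getOrDefault(partitionFor(key), Map.of()).entrySet().stream()
                .filter(e -> "MASTER".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

When mastership moves after a failover, only the view changes; the routing code stays the same.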


Benefits

• Cluster operations “just work”
– Bootstrapping
– Failover
– Add nodes
• Global vs. local
– Helix Controller: global knowledge, makes cluster decisions
– Participant: local knowledge, follows orders


Consumer group: Scaling


Consumer group: Fault tolerance


Consumer group: state model

[Figure: state model with states OFFLINE and ONLINE; at most one ONLINE replica per partition (MAX=1)]
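With the ONLINE/OFFLINE model and MAX=1, each partition is consumed by exactly one live consumer. That covers both scaling (more consumers, fewer partitions each) and fault tolerance (a failed consumer's partitions go to the survivors). A minimal sketch, assuming a hypothetical round-robin assignment recomputed from scratch; it does not try to minimize partition movement:

```java
import java.util.*;

// Hypothetical consumer-group assignment: partition -> the single ONLINE consumer (MAX=1).
// Every other consumer is implicitly OFFLINE for that partition.
class ConsumerGroup {
    static Map<String, String> assign(List<String> partitions, Collection<String> liveConsumers) {
        List<String> consumers = new ArrayList<>(new TreeSet<>(liveConsumers));   // sorted, deduplicated
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("no live consumers");
        }
        Map<String, String> online = new LinkedHashMap<>();
        for (int i = 0; i < partitions.size(); i++) {
            online.put(partitions.get(i), consumers.get(i % consumers.size()));
        }
        return online;
    }
}
```

Re-running assign with the new set of live consumers after a join or a failure yields the next assignment.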


Helix usage at LinkedIn

• Espresso
– a timeline-consistent, distributed data store
• Databus
– a change data capture service
• Search as a Service
– a multi-tenant service for multiple search applications
• More planned


Summary

• Building distributed data systems is hard
– Abstraction and modularity are key
• Helix: a generic framework for cluster management
• Simple programming model: declarative state machine


Helix: Future Roadmap

• Features
– Span multiple data centers
– Load balancing

• Announcement
– Open source: https://github.com/linkedin/helix
– Apache incubation
– New contributors
