Multi-Tenant Data Cloud with YARN & Helix

Kishore Gopalakrishna (@kishore_b_g)
LinkedIn - Data infra: Helix, Espresso
Yahoo - Ads infra: S4
What is YARN?
Next-Generation Compute Platform

Hadoop 1.0: MapReduce (batch) runs directly on HDFS and is the only
programming model.
Hadoop 2.0: YARN (cluster resource management) sits between HDFS and the
applications, so MapReduce and others (batch, interactive, online,
streaming) share the same cluster.
This enables containers from many applications (A1-A3, B1-B5, C1-C5) to be
packed side by side onto the same cluster.
YARN Architecture

The client submits a job to the Resource Manager, placing the app package
in HDFS (the common area). Node Managers report node status to the
Resource Manager, which launches the Application Master in a container.
The Application Master sends container requests back to the Resource
Manager, and the Node Managers launch the resulting containers.
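To make the submission flow concrete, here is a minimal client-side sketch
using the stock Hadoop 2.x YarnClient API; the application name, AM launch
command, and container size are illustrative placeholders.

```java
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApp {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the RM for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("partitioned-data-server");

    // Launch context for the container that will run the Application
    // Master; the app package sitting in HDFS would be added as a local
    // resource here.
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(
        java.util.Collections.singletonList("$JAVA_HOME/bin/java AppMaster"));
    ctx.setAMContainerSpec(amContainer);

    // Resources the AM container needs.
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    capability.setVirtualCores(1);
    ctx.setResource(capability);

    yarnClient.submitApplication(ctx);  // RM launches the AM for us
  }
}
```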
So, let’s build something
Example System

- Generate data in Hadoop: an M/R job writes partitioned data to HDFS
- Use it for serving: Redis servers serve the generated partitions
Example System

Requirements:
- Big Data :-)
- Partitioned, replicated
- Fault tolerant, scalable
- Efficient resource utilization

The Application Master therefore has to request containers, assign work,
handle failures, and handle workload changes.
Allocation + Assignment

The M/R job generates partitioned data (p1-p6) in HDFS, and multiple
servers serve it: for example, Server 1 holds p1, p2 plus replicas p5, p4;
Server 2 holds p3, p4 plus replicas p1, p6; Server 3 holds p5, p6 plus
replicas p3, p2.

- Container Allocation - data affinity, rack-aware placement
- Partition Assignment - affinity, even distribution
- Replica Placement - on different physical machines
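A small sketch of the placement rules above, in plain Java rather than any
Helix/YARN API; the modular replica rule is one simple way to keep the
distribution even and replicas on different machines (all names are
illustrative):

```java
import java.util.*;

public class PartitionPlacement {
  // Replica r of partition p lands on server (p + r) mod numServers, so
  // replicas of the same partition never share a machine (assuming
  // replicationFactor <= numServers) and load stays even.
  static Map<Integer, List<Integer>> assignReplicas(
      int numPartitions, int numServers, int replicationFactor) {
    Map<Integer, List<Integer>> serverToPartitions = new TreeMap<>();
    for (int s = 0; s < numServers; s++) {
      serverToPartitions.put(s, new ArrayList<>());
    }
    for (int p = 0; p < numPartitions; p++) {
      for (int r = 0; r < replicationFactor; r++) {
        serverToPartitions.get((p + r) % numServers).add(p);
      }
    }
    return serverToPartitions;
  }

  public static void main(String[] args) {
    // 6 partitions, 3 servers, 2 replicas -> 4 partitions per server,
    // matching the example layout above.
    System.out.println(assignReplicas(6, 3, 2));
  }
}
```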
Failure Handling

- On failure - even load distribution while waiting for a new container:
  the failed server's partitions (p1, p2) are spread across the surviving
  Server 2 and Server 3
- Acquire a new container (Server 4) close to the data if possible
- Assign the failed partitions to the new container
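The interim "even load distribution" step could look like the following
sketch: spread the failed server's partitions across the least-loaded
survivors. Plain Java with illustrative names, not a Helix API:

```java
import java.util.*;

public class FailoverRedistribution {
  // Remove the failed server and hand each of its partitions to whichever
  // survivor currently holds the fewest partitions.
  static void redistribute(Map<String, List<Integer>> assignment,
                           String failedServer) {
    List<Integer> orphaned = assignment.remove(failedServer);
    if (orphaned == null) return;

    // Survivors ordered by current load; poll/offer keeps the ordering
    // correct as loads change.
    PriorityQueue<String> survivors = new PriorityQueue<>(
        Comparator.comparingInt(s -> assignment.get(s).size()));
    survivors.addAll(assignment.keySet());

    for (int p : orphaned) {
      String target = survivors.poll();
      assignment.get(target).add(p);
      survivors.offer(target);  // re-insert with its new load
    }
  }
}
```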
Workload Changes

- Monitor - CPU, memory, latency, TPS
- Workload change - acquire/release containers
- Container change - re-distribute work: when load grows, a fourth server
  is added (taking p4, p6, p2) so every server serves fewer partitions
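A hedged sketch of a monitor-driven scaling decision: derive a container
count from observed TPS against a per-container capacity estimate. The 30%
headroom mirrors the SLA example later in the deck; all names and
thresholds here are illustrative:

```java
public class ScalingPolicy {
  final double tpsPerContainer;  // measured capacity of one container
  final double headroom = 0.3;   // keep 30% throughput headroom

  ScalingPolicy(double tpsPerContainer) {
    this.tpsPerContainer = tpsPerContainer;
  }

  // Desired container count for the observed workload, clamped to the
  // configured min/max (cf. min 1 / max 25 in the example config).
  int targetContainers(double observedTps, int min, int max) {
    double needed = observedTps / (tpsPerContainer * (1.0 - headroom));
    int target = (int) Math.ceil(needed);
    return Math.max(min, Math.min(max, target));
  }
}
```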
Service Discovery

- Discover everything: what is running where
- Dynamically updated on changes
- Clients resolve partitions to servers through service discovery
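Helix ships a spectator-side helper for exactly this. A minimal sketch
using RoutingTableProvider, assuming a ZooKeeper at zkhost:2181 and the
cluster/resource/partition names used in this example:

```java
import java.util.List;
import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class DiscoveryClient {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "DATA_SERVING_CLUSTER", "client-1", InstanceType.SPECTATOR,
        "zkhost:2181");
    manager.connect();

    // Helix pushes external-view changes to the provider as they happen,
    // so the routing table stays dynamically up to date.
    RoutingTableProvider routing = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routing);

    // Which live instances currently hold partition p1 in state "Serve"?
    List<InstanceConfig> serving =
        routing.getInstances("PartitionedDataServer", "p1", "Serve");
    for (InstanceConfig i : serving) {
      System.out.println(i.getHostName() + ":" + i.getPort());
    }
  }
}
```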
Building a YARN Application

Writing an AM is hard and error-prone; handling faults and workload
changes is non-trivial and often overlooked.

- Request containers: how many containers, and where
- Assign work: place partitions & replicas, affinity
- Workload changes: acquire/release containers, minimize movement
- Fault handling: detect non-trivial failures, new vs. reused containers
- Other: service discovery, monitoring

Is there something that can make this easy?
Apache Helix
What is Helix?

- Built at LinkedIn, 2+ years in production
- Generic cluster management framework
- Contributed to Apache, now a TLP: helix.apache.org
- Decouples cluster management from core functionality
Helix at LinkedIn
In Production

[Diagram: user writes land in Oracle databases; change capture feeds
change consumers, a search index, and a data replicator; ETL moves the
data into HDFS for analytics.]
Helix at LinkedIn
In Production

- Over 1000 instances covering over 30,000 partitions
- Over 1000 instances for change-capture consumers
- As many as 500 instances in a single Helix cluster

(All numbers are per datacenter.)
Others Using Helix
Helix Concepts

- Resource (database, index, topic, task)
- Partitions: p1, p2 .. p6
- Replicas: r1, r2, r3 of each partition
- Container processes that host the replicas
- Assignment? Someone has to decide which replica runs in which container.
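These concepts map directly onto the Helix admin API. A minimal sketch,
assuming a ZooKeeper at zkhost:2181; it uses the stock OnlineOffline state
model as a stand-in (the custom Stop/bootstrap/Serve model for our example
is defined next):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class SetupCluster {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("zkhost:2181");
    admin.addCluster("DATA_SERVING_CLUSTER");

    // A resource with 6 partitions governed by a state model.
    admin.addResource("DATA_SERVING_CLUSTER", "PartitionedDataServer",
        6, "OnlineOffline");

    // Ask Helix to compute an assignment with 2 replicas per partition
    // (participants must have joined the cluster for this to take effect).
    admin.rebalance("DATA_SERVING_CLUSTER", "PartitionedDataServer", 2);
  }
}
```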
Helix Concepts
State Model and Constraints

State model: Stop -> bootstrap -> Serve (and Serve -> Stop).

            State constraints           Transition constraints
Partition   Serve: 3, bootstrap: 0      Max T1 transitions in parallel
Resource    -                           Max T2 transitions in parallel
Node        No more than 10 replicas    Max T3 transitions in parallel
Cluster     -                           Max T4 transitions in parallel

StateCount = replication factor (3)
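A sketch of declaring this state model with Helix's
StateModelDefinition.Builder. The counts mirror the table above: Serve is
bounded by the replication factor "R", and bootstrap is pinned to a
steady-state count of 0 (it is only a transient state). The model name is
illustrative:

```java
import org.apache.helix.model.StateModelDefinition;

public class ServeStateModel {
  public static StateModelDefinition build() {
    StateModelDefinition.Builder b =
        new StateModelDefinition.Builder("StopBootstrapServe");

    // States, in priority order (lower number = higher priority).
    b.addState("Serve", 0);
    b.addState("bootstrap", 1);
    b.addState("Stop", 2);
    b.initialState("Stop");

    // Legal transitions of the FSM.
    b.addTransition("Stop", "bootstrap");
    b.addTransition("bootstrap", "Serve");
    b.addTransition("Serve", "Stop");

    // Per-partition state counts: "R" = replication factor (3 here);
    // bootstrap: 0 per the table, i.e. no replica rests in bootstrap.
    b.dynamicUpperBound("Serve", "R");
    b.upperBound("bootstrap", 0);

    return b.build();
  }
}
```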
Helix Architecture

The Controller (Target Provider, Provisioner, Rebalancer) assigns work to
Participants via callbacks. Each participant hosts a set of partitions
(P1-P8) and moves them through the stop/bootstrap/serve state machine,
reporting metrics back to the controller. Clients are spectators that use
service discovery to find where partitions live.
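On the participant side, Helix drives the state machine by invoking
annotated transition callbacks. A minimal handler for the
Stop/bootstrap/Serve model; the method bodies are placeholders for the
real bootstrap/serve logic:

```java
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "Stop", states = {"Stop", "bootstrap", "Serve"})
public class ServeStateModelHandler extends StateModel {

  @Transition(from = "Stop", to = "bootstrap")
  public void onBecomeBootstrapFromStop(Message msg, NotificationContext ctx) {
    // e.g. pull msg.getPartitionName()'s data files from HDFS
  }

  @Transition(from = "bootstrap", to = "Serve")
  public void onBecomeServeFromBootstrap(Message msg, NotificationContext ctx) {
    // open the partition for reads
  }

  @Transition(from = "Serve", to = "Stop")
  public void onBecomeStopFromServe(Message msg, NotificationContext ctx) {
    // stop serving and release resources
  }
}
```

A participant registers a factory for this handler when it connects, and
the controller's computed transitions arrive as these callbacks.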
Helix Controller
High-Level Overview

Inputs: resource config, constraints, objectives. The controller is built
from three pluggable pieces: the TargetProvider decides the number of
containers, the Provisioner talks to the YARN RM, and the Rebalancer
computes the task-to-container mapping.
Helix Controller: Target Provider

Determines how many containers are required, along with their spec.
Default implementations are provided - Fixed, CPU, Memory, Bin Packing
(a monitoring system supplies the usage information); Bin Packing can be
customized further.

Inputs:
- Resources: p1, p2 .. pn
- Existing containers: c1, c2 .. cn
- Health of tasks and containers: cpu, memory, health
- Allocation constraints: affinity, rack locality
- SLA, e.g. fixed: 10 containers; CPU headroom: 30%; memory usage: 70%;
  time: 5h

Outputs:
- Number of containers, with an acquire list and a release list
- Container spec: cpu: x, memory: y, location: L
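A hypothetical sketch of what the TargetProvider contract could look like,
shown with the simplest default (Fixed); the interface and types here are
illustrative, not Helix's actual API:

```java
import java.util.List;

public class TargetProviderSketch {
  // Container spec the provisioner will request from YARN.
  record ContainerSpec(int cpuVcores, int memoryMb, String location) {}

  // Decision: how many containers, which to acquire, which to release.
  record Target(int numContainers,
                List<ContainerSpec> acquire,
                List<String> release) {}

  interface TargetProvider {
    Target computeTarget(List<String> partitions,
                         List<String> existingContainers,
                         double cpuUsage, double memoryUsage);
  }

  // Fixed-count provider: ignore health signals, converge on N containers.
  static class FixedTargetProvider implements TargetProvider {
    final int fixed;
    FixedTargetProvider(int fixed) { this.fixed = fixed; }

    @Override
    public Target computeTarget(List<String> partitions,
                                List<String> existing,
                                double cpu, double mem) {
      int delta = fixed - existing.size();
      List<ContainerSpec> acquire = new java.util.ArrayList<>();
      for (int i = 0; i < Math.max(0, delta); i++) {
        acquire.add(new ContainerSpec(1, 1024, "ANY"));
      }
      List<String> release =
          delta < 0 ? existing.subList(0, -delta) : List.of();
      return new Target(fixed, acquire, release);
    }
  }
}
```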
Helix Controller: Provisioner

Given the container spec, interacts with the YARN RM to acquire and
release containers, and with the NMs to start and stop them. The YARN
provisioner also subscribes to RM notifications.
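On the YARN side, the provisioner's acquire/start path maps onto the stock
AM-RM and AM-NM client libraries. A condensed Hadoop 2.x sketch; the
launch command and container size are placeholders:

```java
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnProvisionerSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    // Ask the RM for one container matching the TargetProvider's spec.
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    capability.setVirtualCores(1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Poll for allocations, then start each container through its NM.
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();
    for (Container c : rmClient.allocate(0.1f).getAllocatedContainers()) {
      ContainerLaunchContext ctx =
          Records.newRecord(ContainerLaunchContext.class);
      ctx.setCommands(java.util.Collections.singletonList(
          "$JAVA_HOME/bin/java ParticipantLauncher"));
      nmClient.startContainer(c, ctx);
    }
  }
}
```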
Helix Controller: Rebalancer

Based on the current nodes in the cluster and the constraints, finds an
assignment of tasks to nodes, then computes and fires the FSM transitions
to the Participants. Modes: Auto, Semi-Auto, Static, User-defined.

Inputs:
- Tasks: t1, t2 .. tn
- Existing containers: c1, c2 .. cn
- Allocation constraints & objectives: affinity, rack locality, even
  distribution of tasks, minimize movement while expanding

Output - an assignment such as:
  C1: t1, t2
  C2: t3, t4
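The "minimize movement while expanding" objective can be illustrated in a
few lines: keep every task on its current container where possible, and
place only orphaned or new tasks on the least-loaded containers. Plain
Java, not Helix's actual Rebalancer interface:

```java
import java.util.*;

public class MinimalMovementRebalancer {
  static Map<String, List<String>> rebalance(
      List<String> tasks, List<String> containers,
      Map<String, String> currentOwner) {
    Map<String, List<String>> assignment = new TreeMap<>();
    for (String c : containers) assignment.put(c, new ArrayList<>());

    List<String> unplaced = new ArrayList<>();
    for (String t : tasks) {
      String owner = currentOwner.get(t);
      if (owner != null && assignment.containsKey(owner)) {
        assignment.get(owner).add(t);  // keep placement: zero movement
      } else {
        unplaced.add(t);               // owner gone, or task is new
      }
    }
    for (String t : unplaced) {
      // Place on the currently least-loaded container.
      String target = Collections.min(assignment.keySet(),
          Comparator.comparingInt(c -> assignment.get(c).size()));
      assignment.get(target).add(t);
    }
    return assignment;
  }
}
```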
Example System: Helix-Based Solution

Solution:
- Configure App
- Configure Target Provider
- Configure Provisioner
- Configure Rebalancer
Example System: Helix-Based Solution
app_config_spec.yaml

Configure App
  App Name            Partitioned Data Server
  App Master Package  /path/to/GenericHelixAppMaster.tar
  App Package         /path/to/RedisServerLauncher.tar
  App Config          DataDirectory: hdfs:/path/to/data

Configure Target Provider
  TargetProvider      RedisTargetProvider
  Goal                Target TPS: 1 million
  Min containers      1
  Max containers      25

Configure Provisioner
  YARN RM             host:port

Configure Rebalancer
  Partitions                    6
  Replicas                      2
  Max partitions per container  4
  Rebalancer.Mode               AUTO
  Placement                     Data Affinity
  FailureHandling               Even distribution
  Scaling                       Minimize Movement
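A plausible rendering of this spec as YAML; the field names mirror the
table above, but the exact schema is illustrative, not the actual launcher
format:

```yaml
# app_config_spec.yaml - illustrative layout
app:
  name: PartitionedDataServer
  appMasterPackage: /path/to/GenericHelixAppMaster.tar
  appPackage: /path/to/RedisServerLauncher.tar
  config:
    DataDirectory: hdfs:/path/to/data
targetProvider:
  class: RedisTargetProvider
  goal: { targetTps: 1000000 }
  minContainers: 1
  maxContainers: 25
provisioner:
  yarnRM: host:port
rebalancer:
  partitions: 6
  replicas: 2
  maxPartitionsPerContainer: 4
  mode: AUTO
  placement: DATA_AFFINITY
  failureHandling: EVEN_DISTRIBUTION
  scaling: MINIMIZE_MOVEMENT
```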
Launch Application

  yarn_app_launcher.sh app_config_spec.yaml
Helix + YARN

1. The client submits the job to the YARN Resource Manager
2. The RM launches the Application Master, which embeds the Helix
   Controller (Target Provider, Provisioner, Rebalancer)
3. The controller requests containers from the RM
4. Node Managers launch the containers, which join the cluster as Helix
   participants
5. The controller assigns work: participant 1 gets p1, p2 (replicas p5,
   p4); participant 2 gets p3, p4 (replicas p1, p6); participant 3 gets
   p5, p6 (replicas p3, p2)
Auto Scaling

[Chart: non-linear scaling from 0 to 1M TPS and back]
Failure Handling: Random Faults

[Chart: recovering from faults at 1M TPS (5%, 10%, 20% failures/min)]
Summary

The stack: applications (batch, interactive, online, streaming) run on
HELIX (container + task management), which runs on YARN (cluster resource
management), which runs on HDFS.

- Generic Application Master
- Fault tolerance and expansion handled transparently
- Efficient resource utilization via the task model
Questions?
We love helping & being helped!

Website: helix.apache.org, #apachehelix
Twitter: @apachehelix, @kishore_b_g
Mail: user@helix.apache.org
Team: Kanak Biscuitwala, Zhen Zhang
