
Introduction to Kafka Cruise Control

An introduction to Kafka Cruise Control, which performs dynamic workload balancing for Kafka clusters.


  1. Introduction to Kafka Cruise Control
     Jiangjie (Becket) Qin
     Efe Gencer
  2. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  3. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  4. The operation challenges for Kafka
     ▪ The scale of Kafka's deployment @ LinkedIn
       – ~1,800 brokers
       – ~80,000 topics
       – > 1.3 trillion messages / day
  5. The operation challenges for Kafka
     ▪ Almost every day:
       – A broker dies
       – New topics are created
       – Partitions are reassigned to balance the workload
     ▪ Huge operational load
  6. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  7. What is Kafka Cruise Control?
     ▪ Dynamic workload balancing for resources
       – CPU
       – Disk utilization
       – Network inbound
       – Network outbound
     ▪ A predecessor (kafka-assigner) with only disk balancing
       – https://github.com/linkedin/kafka-tools
  8. What is Kafka Cruise Control?
     ▪ Failure detection and self-healing
       – Reassign the replicas on dead brokers
       – Reduce the window of under-replication
     ▪ Add / decommission a broker
  9. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  10. Complexity of Dynamic Workload Balancing
      ▪ Consider a Kafka cluster with
        – 2,000 topics
        – 16,000 partitions
        – 32,000 replicas (assuming a replication factor of 2)
  11. Complexity of Dynamic Workload Balancing
      ▪ Each replica has a different workload profile
        – CPU (leader > follower)
        – Disk utilization (leader = follower)
        – Network inbound (leader = follower)
        – Network outbound (follower = 0)
      [Diagram: a producer sends to the leader, which decompresses, optionally re-compresses, and appends the data; the follower replicates from the leader, and consumers fetch from the leader]
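A minimal sketch (not from the slides) of how the per-replica load on the four balanced resources might be represented; the class and field names here are hypothetical:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical per-replica load model covering the four balanced resources.
public class ReplicaLoad {
    public enum Resource { CPU, DISK, NETWORK_IN, NETWORK_OUT }

    private final Map<Resource, Double> utilization = new EnumMap<>(Resource.class);

    public void set(Resource resource, double value) {
        utilization.put(resource, value);
    }

    public double get(Resource resource) {
        return utilization.getOrDefault(resource, 0.0);
    }

    // A follower does no decompression and serves no consumers, so its load can be
    // derived from the leader's load (as the deck notes on a later slide).
    public ReplicaLoad deriveFollowerLoad(double cpuFollowerRatio) {
        ReplicaLoad follower = new ReplicaLoad();
        follower.set(Resource.CPU, get(Resource.CPU) * cpuFollowerRatio); // leader > follower
        follower.set(Resource.DISK, get(Resource.DISK));                  // leader = follower
        follower.set(Resource.NETWORK_IN, get(Resource.NETWORK_IN));      // leader = follower
        follower.set(Resource.NETWORK_OUT, 0.0);                          // follower = 0
        return follower;
    }
}
```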
  12. Complexity of Dynamic Workload Balancing
      ▪ Multiple values should be analyzed for each metric
        – Traffic patterns vary
        – A long enough observation period is necessary
      [Chart: throughput varying over time t]
  13. Complexity of Dynamic Workload Balancing
      ▪ Some more things to consider
        – Even distribution of all partitions among the brokers
        – Rack awareness
        – The load on each broker after one of the brokers fails
  14. Complexity of Dynamic Workload Balancing
      ▪ Partition reassignment is tricky
        – Should not affect normal traffic (KIP-73)
        – May need to be interrupted
          ▪ e.g., a failed broker comes back
  15. Complexity of Dynamic Workload Balancing
      ▪ Large number of replicas
      ▪ Workload on multiple resources
      ▪ Additional restrictions
      ▪ Two ways to balance
        – Leadership movement (cheap)
        – Replica movement (expensive)
  16. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  17. Goals of workload balancing
      1. Rack Awareness Goal (Hard Goal)
         – The replicas of the same partition have to be in different racks
      [Example: Rack0 holds p0r0 and p1r1; Rack1 holds p0r1 and p2r0; Rack2 holds p1r0 and p2r1]
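A minimal sketch of what the rack-awareness check amounts to; the types and map layout are assumptions, not the Cruise Control implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical check: every partition's replicas must live on brokers in distinct racks.
public class RackAwarenessCheck {
    // partitionReplicas: partition name -> broker ids hosting its replicas
    // brokerRack: broker id -> rack id
    public static boolean satisfied(Map<String, List<Integer>> partitionReplicas,
                                    Map<Integer, String> brokerRack) {
        for (Map.Entry<String, List<Integer>> entry : partitionReplicas.entrySet()) {
            Set<String> racks = new HashSet<>();
            for (int broker : entry.getValue()) {
                if (!racks.add(brokerRack.get(broker))) {
                    return false; // two replicas of the same partition share a rack
                }
            }
        }
        return true;
    }
}
```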
  18. Goals of workload balancing
      2. Resource Utilization Threshold Goal (Hard Goal)
         – The utilization of each resource on a broker has to be below a defined threshold
  19. Goals of workload balancing
      3. Resource Utilization During Failure Goal (Soft Goal)
         – The utilization of each resource on a broker cannot exceed the broker's capacity when there are broker failures
  20. Goals of workload balancing
      4. Resource Utilization Balance Goal (Soft Goal)
         – The utilization of each resource on a broker should not differ from the average utilization by more than X%
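A hedged sketch of the checks behind goals 2 and 4 above; the threshold and tolerance parameters are assumptions for illustration:

```java
// Hypothetical checks for goals 2 and 4; threshold and balancePercentage are assumed inputs.
public class UtilizationChecks {
    // Goal 2 (hard): every broker's utilization for a resource stays below a defined threshold.
    public static boolean belowThreshold(double[] brokerUtilization, double threshold) {
        for (double u : brokerUtilization) {
            if (u >= threshold) {
                return false;
            }
        }
        return true;
    }

    // Goal 4 (soft): every broker's utilization stays within X% of the cluster average.
    public static boolean balanced(double[] brokerUtilization, double balancePercentage) {
        double sum = 0.0;
        for (double u : brokerUtilization) {
            sum += u;
        }
        double avg = sum / brokerUtilization.length;
        double tolerance = avg * balancePercentage / 100.0;
        for (double u : brokerUtilization) {
            if (Math.abs(u - avg) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```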
  21. Goals of workload balancing
      5. Topic Partition Distribution Goal (Soft Goal)
         – The partitions of each topic should be distributed among the brokers as evenly as possible
      [Diagram: the replicas of topic T0 (T0_P0_R0, T0_P0_R1, T0_P1_R0, T0_P1_R1) spread across Broker0–Broker3 in three racks]
  22. Goals of workload balancing
      6. Global Partition Distribution Goal (Soft Goal)
         – Partitions of all the topics in the Kafka cluster should be distributed among the brokers as evenly as possible
      [Diagram: the replicas of topics T0 and T1 spread evenly across Broker0–Broker3 in three racks]
  23. Goals of workload balancing
      ▪ Each goal has a priority
        – Represented by a unique integer
        – Determines the order in which goals are satisfied
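A hedged sketch of a pluggable goal interface with an integer priority, as described above; the interface, the "lower priority value first" ordering, and the helper types are assumptions rather than Cruise Control's actual API:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical pluggable goal interface; the real Cruise Control interface may differ.
public interface BalancingGoal {
    // Unique integer priority; here lower values are satisfied first (an assumption).
    int priority();

    boolean isHardGoal();

    // Try to adjust the proposed assignment so this goal is satisfied.
    boolean optimize(ClusterModel clusterModel);
}

// Placeholder for the cluster workload model the goals operate on.
class ClusterModel { }

class GoalOrdering {
    // Goals are applied in priority order; a violated hard goal fails the whole run.
    static void apply(List<BalancingGoal> goals, ClusterModel model) {
        goals.sort(Comparator.comparingInt(BalancingGoal::priority));
        for (BalancingGoal goal : goals) {
            boolean met = goal.optimize(model);
            if (!met && goal.isHardGoal()) {
                throw new IllegalStateException("Hard goal could not be satisfied: " + goal);
            }
        }
    }
}
```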
  24. Agenda
     ▪ The operation challenges for Kafka
     ▪ What is Kafka Cruise Control?
     ▪ Complexity of dynamic workload balancing
     ▪ Goals for workload balancing
     ▪ System design and architecture
     ▪ Q&A
  25. Architecture of Kafka Cruise Control
      [Diagram: a User interacts through the REST API; the Workload Monitor (fed by a pluggable Metric Sampler) observes the Kafka Cluster, the Analyzer applies the goals (Goal0, Goal1, …), the Failure Detector watches for broker failures, and the Executor carries out changes on the Kafka Cluster]
  26. REST API
      ./rebalance_cluster [balance_percentage]
      ./add_brokers <broker-info>
      ./decommission_brokers <broker-info>
      ./assignment_history [option]
      ./restore_assignment [option]
      ./cancel_all
      ./list_assignment [option]
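For illustration only, a hedged sketch of calling one of these endpoints from a client; the slide lists only the paths, so the host, port, HTTP method, and parameter encoding below are all assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client call; host, port, method, and query format are assumptions.
public class CruiseControlClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://cruise-control.example.com:9090/rebalance_cluster?balance_percentage=10"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```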
  27. Many Interesting Challenges
      ▪ Trustworthy Workload Modeling (Workload Monitor)
      ▪ Fast Optimization Resolution (Analyzer)
      ▪ False Alarms in Failure Detection (Failure Detector)
      ▪ Controlled Balancing Execution (Executor)
      ▪ And so on…
  28. Trustworthy Workload Modeling
      ▪ Garbage in, garbage out
      ▪ A good workload model is critical
        – Robust workload sampling
      ▪ Workload sampling may fail intermittently
        – Only take action when we are confident
  29. Robust Workload Sampling
      ▪ Replica workload model (Workload Sample)
        – CPU utilization: derived from total CPU usage
          ▪ (Partition_Bytes_In / Total_Bytes_In) * CPU_UTIL
        – Disk utilization: partition size (latest size)
        – Network inbound: Kafka metrics
        – Network outbound: Kafka metrics
      ▪ Followers' loads are derived from leaders' loads
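A minimal sketch of the CPU derivation on this slide: broker CPU is attributed to a partition in proportion to its share of bytes in, exactly the formula shown; the method and variable names are hypothetical.

```java
// Hypothetical helper applying the slide's formula:
// partition CPU = (Partition_Bytes_In / Total_Bytes_In) * CPU_UTIL
public class CpuAttribution {
    public static double partitionCpuUtil(double partitionBytesIn,
                                          double totalBytesIn,
                                          double brokerCpuUtil) {
        if (totalBytesIn <= 0.0) {
            return 0.0; // no traffic observed; attribute no CPU to the partition
        }
        return (partitionBytesIn / totalBytesIn) * brokerCpuUtil;
    }

    public static void main(String[] args) {
        // Example: a partition receiving 5 MB/s on a broker receiving 50 MB/s at 40% CPU.
        double cpu = partitionCpuUtil(5.0, 50.0, 0.40);
        System.out.println("Derived partition CPU utilization: " + cpu); // 0.04
    }
}
```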
  30. Robust Workload Sampling
      ▪ Workload Monitor
        – Periodically samples the Kafka cluster
          ▪ e.g., every 5 min
          ▪ Many metrics if the cluster is big
            – Multiple customizable metric samplers
            – Parallel metric sampling
        – Each Workload Sample is for one partition
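A hedged sketch of what "customizable metric samplers run in parallel" might look like; the interface and scheduling below are assumptions, not the actual Cruise Control sampler API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical pluggable sampler interface and one parallel sampling round.
interface MetricSampler {
    // Collect one Workload Sample per partition in the assigned slice of the cluster.
    void sample();
}

public class SamplingRound {
    // Run all configured samplers in parallel, once per sampling interval (e.g. every 5 min).
    public static void runOnce(List<MetricSampler> samplers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(samplers.size());
        for (MetricSampler sampler : samplers) {
            pool.submit(sampler::sample);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}
```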
  31. Only take confident action
      ▪ Workload Snapshot
        – Represents the average workload in a defined window
        – Keep the most recent N snapshots for each partition
        – Multiple workload samples in each workload snapshot
        – Insufficient samples lead to an invalid Workload Snapshot
          ▪ e.g., 4 samples per snapshot window, at least 3 samples required
      [Diagram: three snapshot windows (Snapshot 0, 1, 2), each containing samples S0–S3]
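A minimal sketch of a snapshot that is only valid when enough samples were collected in its window (e.g. 3 of 4 expected samples); the class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Workload Snapshot: averages one window's samples and is only valid
// when the minimum sample count is reached.
public class WorkloadSnapshot {
    private final int minSamples;
    private final List<Double> samples = new ArrayList<>();

    public WorkloadSnapshot(int minSamples) {
        this.minSamples = minSamples;
    }

    public void addSample(double value) {
        samples.add(value);
    }

    public boolean isValid() {
        return samples.size() >= minSamples;
    }

    public double averageLoad() {
        if (!isValid()) {
            throw new IllegalStateException("Not enough samples in this snapshot window");
        }
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```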
  32. Only take confident action
      ▪ Never take action when we are not confident
        – Exclude a partition without enough valid snapshots
        – Exclude a topic if one of its partitions is excluded
        – Stop the analysis if too many topics are excluded
          ▪ e.g., stop if < 98% of topics are included
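A hedged sketch of the final confidence gate: abort the analysis when topic coverage drops below the configured minimum (98% in the slide's example); the class and numbers in main are illustrative assumptions:

```java
// Hypothetical confidence gate: abort the analysis when topic coverage is too low.
public class ConfidenceGate {
    public static void checkCoverage(int includedTopics, int totalTopics, double minCoverage) {
        double coverage = (double) includedTopics / totalTopics;
        if (coverage < minCoverage) {
            throw new IllegalStateException(String.format(
                "Only %.1f%% of topics have valid workload snapshots; refusing to rebalance",
                coverage * 100.0));
        }
    }

    public static void main(String[] args) {
        // Example: 1,950 of 2,000 topics have enough valid snapshots -> 97.5% < 98%, so abort.
        try {
            checkCoverage(1950, 2000, 0.98);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```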
  33. Many Interesting Challenges
      ▪ Trustworthy Workload Modeling (Workload Monitor)
      ▪ Fast Optimization Resolution (Analyzer)
      ▪ False Alarms in Failure Detection (Failure Detector)
      ▪ Controlled Balancing Execution (Executor)
      ▪ And so on…
  34. Fast Optimization Resolution
      ▪ A reminder: dynamic workload balancing is not easy
        – Tens of thousands of replicas
        – Multiple resources (CPU, disk, network)
        – 6 goals
      ▪ We need to get a solution quickly
        – Otherwise the workload model may be outdated
  35. An attempt at using Microsoft Z3
      ▪ Microsoft Z3
        – An open-source theorem prover (optimizer)
          ▪ https://github.com/Z3Prover/z3
        – Optimization by minimizing a set of cost functions
          ▪ In our case, a set of first-order formulas
  36. An attempt at using Microsoft Z3
      ▪ Microsoft Z3
        – An open-source theorem prover (optimizer)
          ▪ https://github.com/Z3Prover/z3
        – Optimization by minimizing a set of cost functions
          ▪ In our case, a set of first-order formulas
      ▪ It takes a couple of weeks to get a solution, assuming everything goes perfectly well
  37. Heuristic Analyzer
      ▪ Simple procedure
      [Flow chart: Start → "All goals met?" → if yes, Done; otherwise "Have unchecked broker?" → get an unchecked broker → "Can move a partition to other brokers?" → if yes, move a partition and re-check the goals; if no unchecked broker or no possible move remains, Failed when the goal is a hard goal, otherwise continue with the next goal]
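A rough sketch of the loop suggested by the flow chart; this is not the actual Cruise Control analyzer, all names are hypothetical, and the partition-move step is left as a stub:

```java
import java.util.List;

// Hypothetical heuristic: walk the goals in priority order; for each unmet goal, keep
// looking for a broker from which a partition can be moved away until the goal is met
// or no moves remain.
public class HeuristicAnalyzer {

    interface Goal {
        boolean isSatisfied(int[][] assignment); // assignment: broker index -> replica ids
        boolean isHard();
    }

    public static boolean balance(List<Goal> goals, int[][] assignment) {
        for (Goal goal : goals) {
            while (!goal.isSatisfied(assignment)) {
                boolean moved = false;
                // Try brokers one by one until some partition can be moved elsewhere.
                for (int broker = 0; broker < assignment.length && !moved; broker++) {
                    moved = tryMoveOnePartition(broker, assignment);
                }
                if (!moved) {
                    if (goal.isHard()) {
                        return false; // Failed: a hard goal cannot be satisfied
                    }
                    break; // soft goal: best effort only, continue with the next goal
                }
            }
        }
        return true; // Done: all goals handled
    }

    // Stub: the real analyzer would pick a replica and a destination broker that
    // reduce the goal's violation; here the decision is omitted.
    private static boolean tryMoveOnePartition(int broker, int[][] assignment) {
        return false;
    }
}
```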
  38. Heuristic Analyzer
      ▪ From weeks down to seconds
        – Not a globally optimal solution
        – But good enough
      ▪ Pluggable goals
        – Each goal implements an interface
        – Easy to add new goals
  39. Many Interesting Challenges
      ▪ Trustworthy Workload Modeling (Workload Monitor)
      ▪ Fast Optimization Resolution (Analyzer)
      ▪ False Alarms in Failure Detection (Failure Detector)
      ▪ Controlled Balancing Execution (Executor)
      ▪ And so on…
  40. Avoid false alarms in failure detection
      ▪ A broker may appear to have failed in a few cases
        – Rolling bounce
        – Machine reboot
        – Hard-kill testing
      ▪ Healing a cluster is expensive
        – Data movement
  41. Avoid false alarms in failure detection
      ▪ Trade-off between detection time and false alarms
        – A grace period for a broker to come back
          ▪ e.g., 30 min
        – Asking for human intervention
          ▪ e.g., when a broker will be back after a reboot
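A minimal sketch of a grace-period check as described above: a broker only triggers self-healing after it has been down longer than the grace period, and is forgotten if it comes back sooner. The class and method names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical failure detector with a grace period (e.g. 30 minutes).
public class GracePeriodFailureDetector {
    private final Duration gracePeriod;
    private final Map<Integer, Instant> firstSeenDown = new ConcurrentHashMap<>();

    public GracePeriodFailureDetector(Duration gracePeriod) {
        this.gracePeriod = gracePeriod; // e.g. Duration.ofMinutes(30)
    }

    public void markDown(int brokerId, Instant now) {
        firstSeenDown.putIfAbsent(brokerId, now);
    }

    public void markUp(int brokerId) {
        firstSeenDown.remove(brokerId); // broker came back within the grace period: no healing
    }

    public boolean shouldHeal(int brokerId, Instant now) {
        Instant downSince = firstSeenDown.get(brokerId);
        return downSince != null && Duration.between(downSince, now).compareTo(gracePeriod) >= 0;
    }
}
```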
  42. Many Interesting Challenges
      ▪ Trustworthy Workload Modeling
      ▪ Fast Optimization Resolution
      ▪ False Alarms in Failure Detection
      ▪ Controlled Balancing Execution
      ▪ And so on…
  43. Controlled Balancing Execution
      ▪ Leader movement is cheap
      ▪ Partition reassignment is expensive
        – A long-lasting job
        – Data movement
        – Difficult to interrupt
      ▪ When and how to interrupt?
  44. Controlled Balancing Execution
      ▪ KIP-73
        – Replication quotas to throttle replication traffic during partition reassignment
        – Avoid impact on normal traffic
      ▪ Executor in Kafka Cruise Control
        – Batched replica reassignment
        – Allows easy and safe interruption between batches
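A hedged sketch of batched execution with interruption between batches, as the slide describes; the class is hypothetical and the actual reassignment call is left as a stub:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical batched executor: replica movements are issued in small batches, and an
// interruption (e.g. a failed broker coming back) is honored at batch boundaries.
public class BatchedExecutor {
    private final AtomicBoolean interrupted = new AtomicBoolean(false);

    public void interrupt() {
        interrupted.set(true);
    }

    public void execute(List<String> replicaMovements, int batchSize) {
        for (int start = 0; start < replicaMovements.size(); start += batchSize) {
            if (interrupted.get()) {
                System.out.println("Execution interrupted between batches; stopping safely.");
                return;
            }
            List<String> batch = new ArrayList<>(
                replicaMovements.subList(start, Math.min(start + batchSize, replicaMovements.size())));
            submitAndAwait(batch);
        }
    }

    // Stub: the real executor would issue the reassignment (throttled per KIP-73) and wait
    // for the batch to finish before starting the next one.
    private void submitAndAwait(List<String> batch) {
        System.out.println("Reassigning batch: " + batch);
    }
}
```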
  45. Future Work
      ▪ Integration with cloud infrastructure
        – e.g., RAIN, Kubernetes
      ▪ GUI for Cruise Control
      ▪ Time machine for partition assignments
        – Allows restoring a previous partition assignment
      ▪ Optimize performance and reduce overheads
  46. Acknowledgements
      Aditya Auradkar
      Dong Lin
      Joel Koshy
      Kartik Paramasivam
      Kafka team @ LinkedIn
  47. Q&A
