Kafka Cluster Federation at Uber
Yupeng Fu, Xiaoman Dong
Streaming Data Team, Uber
Kafka Summit SF 2019
Apache Kafka @ Uber
[Ecosystem diagram: producers and consumers around Apache Kafka]
● Producers: Rider App, Driver App, API / Services, (Internal) Services, Mobile App, Payment, databases (Cassandra, MySQL), etc.
● Consumers: real-time analytics, alerts and dashboards (Samza / Flink), applications, data science, analytics and reporting (Vertica / Hive), ad-hoc exploration and debugging (ELK), Hadoop, Surge, AWS S3
Kafka Scale at Uber
● Trillions of messages / day
● PBs of data / day (excluding replication)
● Tens of thousands of topics
● Thousands of services
● Dozens of clusters
When disaster strikes...
● 2 AM on a Saturday morning
● Kafka on-call paged, service owners paged
● Emergency failover of services to another region performed (Region 1 → Region 2)
What if ...
● 2 AM on a Saturday morning, Region 1
● Redirect services’ traffic to another cluster instead of failing over the whole region
Cluster Federation: Cluster of Clusters
[Diagram: Kafka users see a single logical cluster; the Kafka team operates the physical clusters behind it]
Cluster Federation: benefits
● Availability
○ Tolerate a single cluster downtime without user impact and region failover
● Scalability
○ Avoid giant Kafka cluster
○ Horizontally scale out Kafka clusters without disrupting users
● Ease of operation and management
○ Easier maintenance of critical clusters, e.g. decommissioning, rebalancing
○ Easier topic migration from one cluster to another
○ Easier topic discovery for users without knowing the actual clusters
High-level Design Concepts
● Users see a single logical cluster
● A topic has a primary cluster and secondary cluster(s)
● Clients fetch the topic-to-cluster mapping and determine which cluster to connect to
● Dynamic traffic redirection of consumers/producers without restart
● Data replication between the physical clusters for redundancy
● Consumer progress sync between the clusters
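As a purely illustrative sketch (not Uber's actual schema), the topic-to-cluster mapping a client fetches could be as small as a route record per topic; topic and cluster names below are made up:

```java
// Hypothetical shape of the topic-to-cluster mapping fetched by federated clients.
import java.util.List;
import java.util.Map;

public class TopicClusterMapping {
    // Each topic routes to one primary cluster plus secondary cluster(s).
    public record TopicRoute(String primaryCluster, List<String> secondaryClusters) {}

    public static void main(String[] args) {
        Map<String, TopicRoute> routes = Map.of(
            "rider-events",   new TopicRoute("kafkaA", List.of("kafkaB")),
            "payment-events", new TopicRoute("kafkaB", List.of("kafkaA")));

        // A client resolves the physical cluster before connecting.
        TopicRoute route = routes.get("rider-events");
        System.out.println("produce/fetch against primary: " + route.primaryCluster());
    }
}
```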
Design Challenges
● Producer/Consumer client traffic redirection
● Aggregate and serve the topic-to-cluster mappings
● Replication of data between clusters
● Consumer offset management
Architecture Overview
1. Client fetches metadata from the Kafka Proxy
2. Metadata service manages the global metadata
3. Data cross-replicated between the clusters by uReplicator
4. Push-based offset sync between the clusters
#1 Kafka proxy for traffic redirection
● A proxy server that speaks the Kafka protocol for metadata requests
● Shares the same network implementation as Apache Kafka
● Routes the client to the right Kafka cluster for fetch and produce
● Triggers a consumer group rebalance when the primary cluster changes
#1 Kafka proxy and client interaction
The client bootstraps against the proxy fleet (bootstrap.servers: kafka-proxy-01, kafka-proxy-02) instead of a physical cluster; the proxy answers the metadata exchange:
● ApiVersionRequest → the configured API versions for the clusters
● MetadataRequest → the proxy looks up its cache of the primary cluster and returns the metadata of kafkaA (primary): brokers kafkaA-01, kafkaA-02
● (Consumer)GroupCoordinatorRequest → GroupCoordinator response pointing at kafkaA (primary); the primary cluster is cached for the client
● The client then fetches/produces from/to kafkaA directly
● On a later metadataUpdate (via getLeastLoadedNode / getRandomNode), the proxy can return the metadata of kafkaB as the new primary (kafkaB-01, kafkaB-02), and the client redirects its fetch/produce traffic to kafkaB
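A minimal client-side sketch: the only federation-specific detail is that bootstrap.servers points at the proxy fleet named on the slide (ports and the topic name are illustrative):

```java
// The proxy is just another bootstrap endpoint from the client's point of view:
// it answers the metadata requests, and fetches then go straight to the brokers
// of whichever primary cluster it returned.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FederatedConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap against the proxy fleet, not a physical Kafka cluster.
        props.put("bootstrap.servers", "kafka-proxy-01:9092,kafka-proxy-02:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("rider-events"));
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("partition=%d offset=%d%n", r.partition(), r.offset());
            }
        }
    }
}
```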
#1 Kafka proxy internals
● Socket Server: serves the incoming metadata requests
● Metadata Provider: collects information from the metadata service
● Zookeeper: local metadata cache
● Cluster Manager: manages client connections to the federated clusters
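A minimal sketch of the Metadata Provider plus local cache idea: resolve the primary cluster for a topic from a cache and fall back to the metadata service on a miss. MetadataServiceClient is a hypothetical interface, not Uber's actual component (the real proxy keeps this cache in ZooKeeper):

```java
// Hypothetical primary-cluster cache inside the proxy.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PrimaryClusterCache {
    public interface MetadataServiceClient {
        String fetchPrimaryCluster(String topic);  // e.g. returns "kafkaA"
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final MetadataServiceClient metadataService;

    public PrimaryClusterCache(MetadataServiceClient metadataService) {
        this.metadataService = metadataService;
    }

    /** Cluster whose brokers the proxy hands back in metadata responses. */
    public String primaryClusterFor(String topic) {
        return cache.computeIfAbsent(topic, metadataService::fetchPrimaryCluster);
    }

    /** Called when the metadata service announces a primary change; the next
     *  metadata response points clients at the new cluster, which also
     *  triggers a consumer group rebalance. */
    public void onPrimaryChanged(String topic, String newCluster) {
        cache.put(topic, newCluster);
    }
}
```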
#2 Kafka Metadata Service
● The central service that manages the topic and cluster metadata information
● Paired with a service that periodically syncs with all the physical clusters
● Exposes endpoints for setting the primary cluster
#2 Kafka Metadata Service
● Single entry point for topic metadata management
○ Topic creation/deletion
○ Partition expansion
○ Blacklist / quota control, etc.
[Diagram: a topic-creation request to the Metadata Service fans out as topic creation on both KafkaA and KafkaB, followed by replication setup between them]
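For illustration only, the fan-out could be sketched with the standard Kafka AdminClient; broker addresses, partition counts, and the replication-setup comment are placeholders, not Uber's metadata service API:

```java
// Hedged sketch: create the topic on each physical cluster, then register replication.
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class FederatedTopicCreation {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        String topic = "rider-events";
        for (String bootstrap : List.of("kafkaA-01:9092", "kafkaB-01:9092")) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrap);
            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 3 -- illustrative values only.
                admin.createTopics(List.of(new NewTopic(topic, 3, (short) 3))).all().get();
            }
        }
        // The metadata service would then add uReplicator whitelists so the topic
        // is cross-replicated between the two clusters.
    }
}
```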
#3 Data replication - uReplicator
● Uber’s Kafka replication service derived from MirrorMaker
● Goals
○ Optimized and stable replication, e.g. rebalance only occurs during startup
○ Easy to operate, e.g. add/remove whitelists
○ Scalable, high throughput
● Open sourced: https://github.com/uber/uReplicator
● Blog: https://eng.uber.com/ureplicator/
#3 Data replication - cont’d
Improvements for Federation
● Header-based filter to avoid cyclic replication
○ Source cluster info written into message header
○ Messages are not replicated back to their original cluster
○ Bi-directional replication becomes simple and easy
● Improved offset mapping for consumer management
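A minimal sketch of the loop filter, assuming a header key of "source-cluster" (the actual header key and format used by uReplicator are not given in the talk): a record is skipped when it originally came from the cluster it is about to be replicated into.

```java
// Hedged sketch of header-based cyclic-replication filtering.
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class ReplicationLoopFilter {
    private static final String SOURCE_HEADER = "source-cluster";  // assumed name

    /** Returns null if the record must not be replicated into destinationCluster. */
    public static ProducerRecord<byte[], byte[]> toReplica(
            ConsumerRecord<byte[], byte[]> record,
            String sourceCluster,
            String destinationCluster) {

        Header origin = record.headers().lastHeader(SOURCE_HEADER);
        String originCluster = origin == null
                ? sourceCluster  // first hop: the record originated in the source cluster
                : new String(origin.value(), StandardCharsets.UTF_8);

        if (originCluster.equals(destinationCluster)) {
            return null;  // would replicate the message back to its original cluster
        }

        ProducerRecord<byte[], byte[]> out = new ProducerRecord<>(
                record.topic(), null, record.key(), record.value(), record.headers());
        if (origin == null) {
            // Stamp the origin so downstream replicators can apply the same check.
            out.headers().add(SOURCE_HEADER, originCluster.getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }
}
```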
#4 Offset Management - Solutions
● Consumers should resume after switching clusters
● They will rejoin the consumer group with the same name
● Offset solutions considered:
○ Resume from largest offset → Data Loss
○ Resume from smallest offset → Lots of Backlog & Duplicates
○ Resume by timestamp → Complicated & Not Reliable
○ Trying to make topic offsets the same → Nearly impossible
○ ✅ Offset manipulation by a dedicated service
#4 Offset Management - Offset Mapping
Goal: no data loss
● uReplicator copies data between clusters
● uReplicator knows the offset mapping between clusters
● Offset mappings are reported periodically into a DB
● Starting consumption from the mapped offset pair guarantees no data loss
#4 Offset Management - Consumer Group
Example for a specific topic partition:
1. Consumer commits offset 17 (in Kafka A)
2. Offset sync service
a. Queries the store; the closest offset pair is (13 mapped to 29)
b. Commits offset 29 into Kafka B
3. Consumer redirected to cluster B
a. Joins the consumer group with the same name
b. Resumes from offset 29 -- no loss
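A sketch of the translation in step 2, assuming the mapping store is just a sorted map of (Kafka A offset → Kafka B offset) pairs for the partition; the numbers mirror the example above, and the commit into Kafka B reuses the standard consumer API rather than whatever the offset sync service actually does internally:

```java
// Hedged sketch of offset translation and commit into the destination cluster.
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetTranslation {
    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("rider-events", 0);

        // Offset pairs (offset in Kafka A -> offset in Kafka B) reported periodically
        // by uReplicator for this partition; values are illustrative.
        TreeMap<Long, Long> mapping = new TreeMap<>(Map.of(5L, 11L, 13L, 29L, 21L, 40L));

        long committedInA = 17L;                                      // step 1: commit 17
        long translated = mapping.floorEntry(committedInA).getValue(); // step 2a: (13 -> 29)

        // Step 2b: commit the translated offset into Kafka B under the same group
        // name, before the consumers are redirected there.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafkaB-01:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> kafkaB = new KafkaConsumer<>(props)) {
            kafkaB.assign(List.of(tp));
            kafkaB.commitSync(Map.of(tp, new OffsetAndMetadata(translated)));
        }
        // Step 3: when the group rejoins on Kafka B it resumes from offset 29 -- no loss.
    }
}
```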
#4 Offset Management - Efficient Update
Kafka __consumer_offsets internal topic
● Kafka's internal storage of consumer group state
● Each message is a changelog entry for a consumer group
● All offset commits are written as Kafka messages
● Can carry huge traffic (thousands of messages per second)
#4 Offset Management - Efficient Update
Offset Sync: A Streaming Job
● Reads from the __consumer_offsets topic
● Compacts offset commits into batches
● Translates each committed offset into the corresponding offset on the other cluster(s) and commits it there
The job also monitors and reports all consumer group metrics conveniently. Open sourcing is planned.
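A rough sketch of the compaction step only, assuming a plain Java consumer rather than the actual streaming framework: messages in __consumer_offsets are keyed by (group, topic, partition), so keeping the latest value per key collapses thousands of commits into one batch. Decoding the internal key/value format and the cross-cluster translation are deliberately left out.

```java
// Hedged sketch of batching/compacting __consumer_offsets traffic.
import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetSyncCompactor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafkaA-01:9092");
        props.put("group.id", "offset-sync-job");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("__consumer_offsets"));
            while (true) {
                // Latest commit per (group, topic, partition) key wins within the batch.
                Map<ByteBuffer, byte[]> latest = new HashMap<>();
                for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofSeconds(5))) {
                    if (r.key() != null && r.value() != null) {
                        latest.put(ByteBuffer.wrap(r.key()), r.value());
                    }
                }
                // The real job would decode each entry, translate the offset via the
                // uReplicator mapping store, and commit it into the other cluster(s).
            }
        }
    }
}
```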
Federation In Action (1/6 - 6/6)
[Six diagram-only slides walking through the federation in action]
Tradeoffs and limitations
● Data redundancy for higher availability: 2X replicas with 2 clusters
● Messages can arrive out of order during the failover transition
● Topic-level federation is challenging for the REST proxy, and also for consumers that subscribe to several topics or a pattern
● Consumers have to rely on the Kafka clusters to manage offsets (e.g., not friendly to some Flink consumers, which checkpoint offsets themselves)
Q&A
Proprietary and confidential © 2019 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any
form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains
information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified
that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate,
or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent
necessary for consultations with authorized personnel of Uber.
Highly available Kafka at Uber: active-active
● Provide business resilience and continuity as the top priority
● Active-active in multiple regions
○ Data produced locally via REST proxy
○ Data aggregated to the agg cluster
○ Active-active consumers
● Issues
○ Failover coordination and communication required
○ Data unavailable in the regional cluster during downtime
Highly available Kafka at Uber: secondary cluster
● Provide business resilience and continuity as the top priority
● When the regional cluster is unavailable
○ Data produced to the secondary cluster
○ Then replicated to the regional cluster when it's back
● Issues
○ Unused capacity when the regional cluster is up
○ Regional cluster unavailable for consumption during downtime
Topic Migration Challenge
[Diagram-only slides illustrating the topic migration challenge]
