Uber has one of the largest Kafka deployments in the industry. To improve scalability and availability, we developed and deployed a novel federated Kafka cluster setup that hides cluster details from producers and consumers. Users do not need to know which cluster a topic resides in; clients see a single "logical cluster". The federation layer maps clients to the actual physical clusters and keeps the location of the physical cluster transparent to the user. Cluster federation brings several benefits that support our business growth and ease our daily operation. Client control: Uber has a large number of applications and clients on Kafka, and it is challenging to migrate a topic with live consumers between clusters; coordination with users is usually needed to shift their traffic to the destination cluster. Federation gives the server side much more control over clients, because consumer traffic can be redirected to another physical cluster without restarting the application. Scalability: with federation, the Kafka service can scale horizontally by adding more clusters when a cluster is full. Topics can migrate freely to a new cluster without notifying users or restarting clients, and no matter how many physical clusters we manage per topic type, users still see only one logical cluster. Availability: with a topic replicated to at least two clusters, we can tolerate a single cluster failure by redirecting clients to the secondary cluster without performing a region failover. This also gives us much more freedom, and lowers the risk, when carrying out important maintenance on a critical cluster: before the maintenance, we mark the cluster as secondary and migrate the live traffic and consumers off it. We will present the details of the architecture and several interesting technical challenges we overcame.
2. Apache Kafka @ Uber
[Diagram: Apache Kafka at the center of Uber's data ecosystem. Producers include the Rider App, Driver App, API / services, (internal) services, Payment, and databases (Cassandra, MySQL, etc.). Consumers include real-time analytics, alerts, and dashboards (Samza / Flink), applications, data science, analytics and reporting (Vertica / Hive), ad-hoc exploration and debugging (ELK), Hadoop, Surge, mobile apps, and AWS S3.]
3. Kafka Scale at Uber
● Trillions of messages / day
● PBs of data / day (excluding replication)
● Tens of thousands of topics
● Thousands of services
● Dozens of clusters
4. When disaster strikes...
● 2 AM on a Saturday morning: Region 1 goes down
● Kafka on-call paged, service owners paged
● Emergency failover of services to another region (Region 2) performed
5. What if ...
● 2 AM on a Saturday morning: Region 1 cluster goes down
● Instead of a region failover, redirect services' traffic to another cluster
7. Cluster Federation: benefits
● Availability
○ Tolerate a single cluster's downtime without user impact or a region failover
● Scalability
○ Avoid a single giant Kafka cluster
○ Horizontally scale out Kafka clusters without disrupting users
● Ease of operation and management
○ Easier maintenance of critical clusters, e.g. decommissioning, rebalancing
○ Easier topic migration from one cluster to another
○ Easier topic discovery for users, without them needing to know the actual clusters
8. High-level Design Concepts
● Users view a logical cluster
● A topic has a primary cluster and secondary cluster(s)
● Clients fetch the topic-to-cluster mapping and determine which cluster to connect to (a sketch of such a mapping follows below)
● Dynamic traffic redirection of consumers/producers without restart
● Data replication between the physical clusters for redundancy
● Consumer progress sync between the clusters
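To make the topic-to-cluster mapping concrete, here is a minimal sketch of what one mapping entry could look like; the `TopicMapping` type, field names, and health-check logic are illustrative assumptions, not Uber's actual schema.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shape of one entry in the topic-to-cluster mapping
// served by the metadata layer; names are illustrative only.
public record TopicMapping(
        String logicalCluster,          // logical cluster name seen by users
        String topic,                   // logical topic name
        String primaryCluster,          // physical cluster currently serving traffic
        List<String> secondaryClusters  // physical clusters holding replicated copies
) {
    // Pick the cluster a client should connect to: the primary while it is
    // healthy, otherwise the first available secondary.
    public String clusterToConnect(Map<String, Boolean> clusterHealth) {
        if (clusterHealth.getOrDefault(primaryCluster, false)) {
            return primaryCluster;
        }
        return secondaryClusters.stream()
                .filter(c -> clusterHealth.getOrDefault(c, false))
                .findFirst()
                .orElse(primaryCluster);
    }
}
```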
9. Design Challenges
● Producer/Consumer client traffic redirection
● Aggregate and serve the topic-to-cluster mappings
● Replication of data between clusters
● Consumer offset management
10. Architecture Overview
1. Client fetches metadata from Kafka Proxy
2. Metadata service manages the global metadata
3. Data cross-replicated between the clusters by uReplicator
4. Push-based offset sync between the clusters
12. #1 Kafka proxy for traffic redirection
● A proxy server that speaks the Kafka protocol for metadata requests
● Shares the same network implementation as Apache Kafka
● Routes the client to the right Kafka cluster for fetch and produce
● Triggers a consumer group rebalance when the primary cluster changes
13. #1 Kafka proxy and client interaction
[Sequence diagram: the Kafka client sets bootstrap.servers to kafka-proxy-01 / kafka-proxy-02 instead of the brokers. The proxy answers ApiVersionRequest with the configured API versions for the clusters. For MetadataRequest and (Consumer)GroupCoordinatorRequest it looks up its cache of the primary cluster, replies with the metadata and group coordinator of kafkaA (the primary), and caches the primary cluster for the client. The client then fetches/produces directly from/to kafkaA (brokers kafkaA-01, kafkaA-02). On a later metadata update (via getLeastLoadedNode / getRandomNode), the proxy returns the metadata of kafkaB once it becomes primary (brokers kafkaB-01, kafkaB-02), and the client fetches/produces from/to kafkaB. A client-side configuration sketch follows below.]
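From the client's side this needs nothing beyond pointing bootstrap.servers at the proxy; the redirection is driven entirely by the metadata the proxy returns. Below is a minimal sketch using the standard Kafka Java consumer; the host names, group, and topic are made-up placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FederatedConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap against the Kafka proxy, not the physical brokers.
        // The proxy answers metadata requests with the brokers of the
        // topic's current primary cluster (kafkaA or kafkaB).
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "kafka-proxy-01:9092,kafka-proxy-02:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic"));
            while (true) {
                // Fetches go directly to the physical cluster returned in the
                // metadata; a primary switch surfaces as a normal rebalance.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```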
14. #1 Kafka proxy internals
● Socket Server: serves the incoming metadata requests
● Metadata Provider: collects information from the metadata service
● Zookeeper: local metadata cache
● Cluster Manager: manages client routing to the federated clusters
16. #2 Kafka Metadata Service
● The central service that manages topic and cluster metadata
● Paired with a service that periodically syncs with all the physical clusters
● Exposes endpoints for setting the primary cluster
17. #2 Kafka Metadata Service
● Single entry point for topic metadata management
○ Topic creation/deletion
○ Partition expansion
○ Blacklist / quota control, etc.
[Diagram: a topic-creation request goes to the Metadata Service, which creates the topic on both KafkaA and KafkaB and sets up replication between them. A sketch of this fan-out follows below.]
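A minimal sketch of the fan-out the metadata service performs on topic creation, assuming it uses the standard Kafka AdminClient against each physical cluster; the `setupReplication` helper stands in for the replication-whitelist call and is hypothetical.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class FederatedTopicCreation {

    // Create the topic on every physical cluster, then wire up replication.
    public static void createFederatedTopic(String topic, int partitions, short rf,
                                            List<String> clusterBootstraps) throws Exception {
        for (String bootstrap : clusterBootstraps) {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
            try (Admin admin = Admin.create(props)) {
                admin.createTopics(List.of(new NewTopic(topic, partitions, rf)))
                     .all().get();
            }
        }
        // Hypothetical: add the topic to the uReplicator whitelist in both
        // directions so data is cross-replicated between the clusters.
        setupReplication(topic, clusterBootstraps);
    }

    private static void setupReplication(String topic, List<String> clusters) {
        // Placeholder for the replication-setup call; not detailed in the talk.
    }
}
```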
19. #3 Data replication - uReplicator
● Uber’s Kafka replication service derived from MirrorMaker
● Goals
○ Optimized and stable replication, e.g. rebalance only occurs during startup
○ Operate with ease, e.g. add/remove whitelists
○ Scalable, High throughput
● Open sourced: https://github.com/uber/uReplicator
● Blog: https://eng.uber.com/ureplicator/
20. #3 Data replication - cont'd
Improvements for Federation
● Header-based filter to avoid cyclic replication (see the sketch below)
○ Source cluster info written into the message header
○ Messages will not be replicated back to their original cluster
○ Bi-directional replication becomes simple and easy
● Improved offset mapping for consumer management
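A minimal sketch of such a header-based filter using the standard Kafka record-headers API; the header key `source_cluster` and the surrounding structure are assumptions for illustration, not uReplicator's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class CyclicReplicationFilter {

    // Hypothetical header key carrying the cluster a record was first produced to.
    private static final String SOURCE_CLUSTER_HEADER = "source_cluster";

    /** Returns true if the record should be replicated to destinationCluster. */
    public static boolean shouldReplicate(ConsumerRecord<byte[], byte[]> record,
                                          String destinationCluster) {
        Header source = record.headers().lastHeader(SOURCE_CLUSTER_HEADER);
        if (source == null) {
            return true; // no provenance info; replicate by default
        }
        String sourceCluster = new String(source.value(), StandardCharsets.UTF_8);
        // Skip records that originated in the destination cluster,
        // so bi-directional replication does not loop.
        return !sourceCluster.equals(destinationCluster);
    }

    /** Stamp the source cluster on first produce, so replicators can filter later. */
    public static ProducerRecord<byte[], byte[]> stampSource(
            ProducerRecord<byte[], byte[]> record, String localCluster) {
        if (record.headers().lastHeader(SOURCE_CLUSTER_HEADER) == null) {
            record.headers().add(SOURCE_CLUSTER_HEADER,
                                 localCluster.getBytes(StandardCharsets.UTF_8));
        }
        return record;
    }
}
```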
22-26. #4 Offset Management - Solutions
● A consumer should resume after switching clusters
● It will rejoin the consumer group with the same name
● Offset solutions considered:
○ Resume from the largest offset → data loss
○ Resume from the smallest offset → lots of backlog & duplicates
○ Resume by timestamp → complicated & not reliable
○ Trying to make topic offsets the same across clusters → nearly impossible
○ ✅ Offset manipulation by a dedicated service
27-28. #4 Offset Management - Offset Mapping
Goal: no data loss
● uReplicator copies data between clusters
● uReplicator knows the offset mapping between clusters
● Offset mappings are reported periodically into a DB
● Consuming from the mapped offset pair guarantees no data loss (a sketch of the lookup follows below)
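To make the no-data-loss argument concrete, here is a minimal sketch of the mapping lookup, assuming the DB contents for one topic partition are loaded into a sorted map; the class and method names are illustrative, not the actual service.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory view of the offset mappings stored in the DB for one
// topic partition: source-cluster offset -> destination-cluster offset.
public class OffsetMappingStore {
    private final TreeMap<Long, Long> sourceToDestination = new TreeMap<>();

    public void record(long sourceOffset, long destinationOffset) {
        sourceToDestination.put(sourceOffset, destinationOffset);
    }

    /**
     * Translate a committed offset on the source cluster into an offset on the
     * destination cluster. Using the closest mapping at or below the committed
     * offset means the consumer may re-read a few messages, but never skips
     * any: no data loss.
     */
    public Optional<Long> translate(long committedSourceOffset) {
        Map.Entry<Long, Long> closest = sourceToDestination.floorEntry(committedSourceOffset);
        return closest == null ? Optional.empty() : Optional.of(closest.getValue());
    }
}
```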
29-32. #4 Offset Management - Consumer Group
Example for a specific topic partition (a code sketch follows below):
1. Consumer commits offset 17
2. Offset sync service
a. Queries the store; the closest offset pair is (13 mapped to 29)
b. Commits offset 29 into Kafka B
3. Consumer redirected to cluster B
a. Joins the consumer group with the same name
b. Resumes from offset 29 -- no data loss
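A minimal sketch of how the offset sync service could apply the translated offset on the destination cluster, using the Kafka AdminClient's alterConsumerGroupOffsets; the concrete values mirror the example above, and the bootstrap address, group, and topic are placeholders. (The group must not have active members on that cluster for the call to succeed, which holds before the consumers are redirected.)

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetSyncCommit {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Admin client against the destination cluster (Kafka B).
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafkaB-01:9092");

        try (Admin admin = Admin.create(props)) {
            // The consumer committed 17 on Kafka A; the closest stored mapping
            // is (13 -> 29), so commit 29 for the same group on Kafka B.
            TopicPartition partition = new TopicPartition("example-topic", 0);
            Map<TopicPartition, OffsetAndMetadata> translated =
                    Map.of(partition, new OffsetAndMetadata(29L));
            admin.alterConsumerGroupOffsets("example-group", translated).all().get();
        }
    }
}
```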
33. #4 Offset Management - Efficient Update
Kafka __consumer_offsets internal topic
● Kafka's internal storage of consumer group state
● Each message in it is a changelog entry for a consumer group
● All offset commits are written as Kafka messages
● Can carry huge traffic (thousands of messages per second)
34. #4 Offset Management - Efficient Update
Offset Sync: a streaming job (see the sketch below)
● Reads from the __consumer_offsets topic
● Compacts offset commits into batches
● Then translates the committed offsets into offsets on the other cluster(s) and commits them there
The job also conveniently monitors and reports all consumer group metrics. Open sourcing is planned.
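A minimal sketch of the compaction step, under the assumption that the raw __consumer_offsets records have already been decoded into (group, topic-partition, offset) tuples; the `OffsetCommit` record and surrounding class are hypothetical, since the real internal format requires Kafka's GroupMetadataManager schemas to parse.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

public class OffsetCommitCompactor {

    /** Decoded offset-commit record; the shape is assumed for illustration. */
    public record OffsetCommit(String group, TopicPartition partition, long offset) {}

    /**
     * Compact a batch of offset commits so that only the latest committed offset
     * per (group, partition) survives. This is what keeps the sync job cheap
     * even when __consumer_offsets carries thousands of messages per second.
     */
    public static Map<String, Map<TopicPartition, Long>> compact(Iterable<OffsetCommit> batch) {
        Map<String, Map<TopicPartition, Long>> latest = new HashMap<>();
        for (OffsetCommit commit : batch) {
            latest.computeIfAbsent(commit.group(), g -> new HashMap<>())
                  .merge(commit.partition(), commit.offset(), Math::max);
        }
        // Each surviving offset is then translated via the offset-mapping store
        // and committed to the other cluster(s), as in the earlier sketch.
        return latest;
    }
}
```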
41. Tradeoffs and limitations
● Data redundancy for higher availability: 2x replicas with 2 clusters
● Messages can arrive out of order during the failover transition
● Topic-level federation is challenging for the REST proxy, and also for consumers that subscribe to several topics or to a pattern
● Consumers have to rely on the Kafka clusters to manage offsets (e.g., not friendly to some Flink consumers)
44. Highly available Kafka at Uber: active-active
● Provide business resilience and continuity as the top priority
● Active-active in multiple regions
○ Data produced locally via the REST proxy
○ Data aggregated into an aggregate cluster
○ Active-active consumers
● Issues
○ Failover coordination and communication required
○ Data in a regional cluster is unavailable during its downtime
45. Highly available Kafka at Uber: secondary cluster
● Provide business resilience and continuity as the top priority
● When a regional cluster is unavailable
○ Data is produced to the secondary cluster
○ Then replicated to the regional cluster when it comes back
● Issues
○ Unused capacity while the regional cluster is up
○ Regional cluster unavailable for consumption during its downtime