More Data, More Problems:
Scaling Kafka Mirroring Pipelines at LinkedIn
Celia Kung
Software Engineer
Agenda
Use Cases & Motivation
Kafka MirrorMaker at LinkedIn
Brooklin MirrorMaker
Future
Use Cases & Motivation
Use Cases
• Aggregating data from all data centers
• Moving data from offline data stores into online environments
• Moving data between LinkedIn and external cloud services
Motivation
● Kafka data at LinkedIn continues to grow rapidly
● Kafka MirrorMaker (KMM) has not scaled well
● KMM is difficult to operate and maintain
Kafka MirrorMaker at LinkedIn
• 100+ pipelines
• 9 data centers
• 100+ clusters
• 6K+ hosts
• 1T+ messages/day
Topology
[Diagram: Datacenters A and B, each with tracking clusters mirrored by dedicated KMM clusters into an aggregate-tracking cluster]
• Each pipeline:
  ○ mirrors data from 1 source cluster to 1 destination cluster
  ○ constitutes its own KMM cluster
Topology
[Diagram: full KMM topology across Datacenters A, B, C, and beyond, with a separate KMM cluster for every tracking and metrics mirroring pipeline]
KMM does not scale well
● # of KMM clusters = # of data centers × # of Kafka clusters
  ○ At LinkedIn (9 data centers), this multiplies out to 100+ KMM clusters
● More consumer-producer pairs → provision more hardware
KMM is difficult to operate
● Static configuration file per KMM cluster
● Changes require deploying to 100+ clusters
KMM is fragile
● Poor failure isolation
● Unable to catch up with traffic
● Increased latency
Brooklin MirrorMaker
Brooklin MirrorMaker
● Optimized for stability and operability
● Built on top of our stream ingestion service, Brooklin
● BMM is in production at LinkedIn today
Mirroring pipelines at LinkedIn
100+ pipelines · 9 data centers
• Kafka MirrorMaker: 100+ clusters, 6K+ hosts, 1T+ messages/day
• Brooklin MirrorMaker: 9 clusters, <2K hosts, 1T+ messages/day
Topology
[Diagram: Datacenters A and B, each with a single BMM cluster mirroring the tracking clusters into the aggregate-tracking cluster]
• A single BMM cluster encompasses multiple pipelines
  ○ 1 cluster per data center
Topology
[Diagram: full BMM topology across Datacenters A, B, C, and beyond; one BMM cluster per data center handles both the tracking and metrics mirroring pipelines]
BMM Architecture
BMM is built on Brooklin
[Diagram: Brooklin moves data between many sources and destinations, including data stores and messaging systems such as Microsoft EventHubs]
BMM is built on Brooklin
[Diagram: Brooklin Engine with a Kafka source connector and a Kafka destination connector; a Management REST API and a Diagnostics REST API; ZooKeeper for coordination; a management/monitoring portal and SRE/ops dashboards sit on top of the REST APIs]
Dynamic Management
[Diagram: same architecture, with the Management REST API highlighted]
Creating a pipeline
[Diagram: create request → Management REST API → Brooklin Engine → ZooKeeper]
create POST /datastream
  name: mm_DC1-tracking_DC2-aggregate-tracking
  connectorName: KafkaMirrorMaker
  source:
    connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB
  destination:
    connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
  metadata:
    num-streams: 5
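As a concrete illustration, a datastream like the one above can be created with a plain HTTP call. The sketch below is an assumption-laden example: the management host, port, and exact JSON field layout are hypothetical and deployment-specific, not Brooklin's documented client.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateMirroringPipeline {
  public static void main(String[] args) throws Exception {
    // Hypothetical management endpoint; host and port depend on the deployment.
    String endpoint = "http://brooklin-management.example.com:32311/datastream";

    // JSON shaped after the fields shown on the slide; the exact schema Brooklin
    // expects (e.g. transport provider, metadata types) may differ.
    String datastream = """
        {
          "name": "mm_DC1-tracking_DC2-aggregate-tracking",
          "connectorName": "KafkaMirrorMaker",
          "source": { "connectionString": "kafkassl://DC1-tracking-vip:12345/topicA|topicB" },
          "destination": { "connectionString": "kafkassl://DC2-aggregate-tracking-vip:12345" },
          "metadata": { "num-streams": "5" }
        }
        """;

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(endpoint))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(datastream))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}

The same pattern applies to the PUT update shown on the next slides: the request targets /datastream/mm_DC1-tracking_DC2-aggregate-tracking with a modified body, and no BMM hosts need to be redeployed.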
Creating a pipeline
[Diagram: via ZooKeeper, the new pipeline is picked up by the BMM hosts in the cluster]
Updating a pipeline
[Diagram: update request → Management REST API → Brooklin Engine → ZooKeeper]
update PUT /datastream/mm_DC1-tracking_DC2-aggregate-tracking
  name: mm_DC1-tracking_DC2-aggregate-tracking
  connectorName: KafkaMirrorMaker
  source:
    connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB|topicC|topicD
  destination:
    connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
  metadata:
    num-streams: 10
Updating a pipeline
[Diagram: via ZooKeeper, the updated pipeline is picked up by the BMM hosts in the cluster]
On-demand Diagnostics
[Diagram: diagnostics query → Diagnostics REST API → Brooklin Engine → ZooKeeper]
getAllStatus GET /diag?datastream=mm_DC1-tracking_DC2-aggregate-tracking
  host1.prod.linkedin.com:
    datastream: mm_DC1-tracking_DC2-aggregate-tracking
    assignedTopicPartitions: [topicA-0, topicA-3, topicB-0, topicB-2]
    autoPausedPartitions: [{topicA-3: {reason: SEND_ERROR, description: failed to produce messages from this partition}}]
    manuallyPausedPartitions: []
  host2.prod.linkedin.com:
    datastream: mm_DC1-tracking_DC2-aggregate-tracking
    assignedTopicPartitions: [topicA-1, topicA-2, topicB-1, topicB-3]
    autoPausedPartitions: []
    manuallyPausedPartitions: []
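Fetching this status is again just an HTTP call; a minimal sketch follows, with a hypothetical host and port (the real location of the Diagnostics endpoint is deployment-specific).

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class QueryPipelineDiagnostics {
  public static void main(String[] args) throws Exception {
    // Hypothetical diagnostics endpoint; host and port depend on the deployment.
    String datastream = URLEncoder.encode("mm_DC1-tracking_DC2-aggregate-tracking", StandardCharsets.UTF_8);
    URI uri = URI.create("http://brooklin-management.example.com:32311/diag?datastream=" + datastream);

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(HttpRequest.newBuilder().uri(uri).GET().build(), HttpResponse.BodyHandlers.ofString());
    // Prints per-host assignments plus auto- and manually-paused partitions, as on the slide.
    System.out.println(response.body());
  }
}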
Error Isolation
● Manually pause and resume mirroring at every level
  ○ Entire pipeline, topic, or topic-partition
● BMM can automatically pause mirroring of problematic partitions (sketched below)
  ○ Auto-resumes the partitions after a configurable duration
● Messages from all other partitions continue to flow
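The sketch below shows one way such auto-pause/auto-resume bookkeeping can be layered on the Kafka consumer's pause()/resume() API. It is an illustration only, not Brooklin's actual implementation; the class name, method names, and threading arrangement are assumptions.

import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

/** Pauses partitions whose sends fail and resumes them after a configurable duration. */
public class AutoPauseTracker {
  private final Duration pauseDuration;  // configurable auto-resume delay
  // Written from producer callbacks, read and cleared on the consumer's poll thread.
  private final Set<TopicPartition> failedPartitions = ConcurrentHashMap.newKeySet();
  private final Map<TopicPartition, Instant> pausedUntil = new ConcurrentHashMap<>();

  public AutoPauseTracker(Duration pauseDuration) {
    this.pauseDuration = pauseDuration;
  }

  /** Producer send callback: remember the partition; the poll loop will pause it. */
  public void recordSendError(TopicPartition partition) {
    failedPartitions.add(partition);
  }

  /** Called once per poll-loop iteration, on the consumer's thread. */
  public void applyPausesAndResumes(Consumer<?, ?> consumer) {
    // Pause newly failed partitions; messages from all other partitions keep flowing.
    Set<TopicPartition> toPause = new HashSet<>(failedPartitions);
    failedPartitions.removeAll(toPause);
    toPause.forEach(tp -> pausedUntil.put(tp, Instant.now().plus(pauseDuration)));
    consumer.pause(toPause);

    // Resume partitions whose pause window has elapsed.
    Set<TopicPartition> toResume = new HashSet<>();
    pausedUntil.forEach((tp, deadline) -> {
      if (Instant.now().isAfter(deadline)) {
        toResume.add(tp);
      }
    });
    toResume.forEach(pausedUntil::remove);
    consumer.resume(toResume);
  }
}

In this arrangement applyPausesAndResumes(consumer) runs right before each consumer.poll(), so a failing partition stops being read while every other assigned partition keeps mirroring.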
Processing Loop
while (!shutdown) {
  ConsumerRecords<byte[], byte[]> records = consumer.poll(pollTimeout);
  for (ConsumerRecord<byte[], byte[]> record : records) {
    producer.send(toDestinationRecord(record));  // toDestinationRecord: hypothetical helper that re-wraps the record for the destination cluster
  }
  if (timeToCommit) {
    producer.flush();
    consumer.commitSync();
  }
}
Producer flush can be expensive
while (!shutdown) {
  ConsumerRecords<byte[], byte[]> records = consumer.poll(pollTimeout);
  for (ConsumerRecord<byte[], byte[]> record : records) {
    producer.send(toDestinationRecord(record));
  }
  if (timeToCommit) {
    producer.flush();        // expensive: blocks until every buffered record is acknowledged
    consumer.commitSync();
  }
}
Long Flush
producer.flush() can take several minutes
[Chart: flush latency (seconds), y-axis 0–200]
Rebalance Storms
consumers are evicted from the group and trigger a rebalance when poll() is not called within max.poll.interval.ms
[Chart: rebalances per second, y-axis 0–3]
Increase max.poll.interval.ms?
● Reduces chances of consumer rebalance
● Risks detecting real failures late (config sketch below)
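For reference, this is an ordinary Kafka consumer setting. The sketch below uses illustrative values and a group id borrowed from the datastream name; these are assumptions, not LinkedIn's actual configuration.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MirrorConsumerConfig {
  public static KafkaConsumer<byte[], byte[]> buildConsumer() {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "DC1-tracking-vip:12345");  // source cluster from the earlier slides
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "mm_DC1-tracking_DC2-aggregate-tracking");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    // Default is 5 minutes. Raising it tolerates long flush/commit pauses without a rebalance,
    // but a genuinely stuck consumer is then also detected that much later.
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");
    return new KafkaConsumer<>(props);
  }
}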
Flushless Produce
consumer.poll() → producer.send(records) → producer.flush() → consumer.commit()
Flushless Produce
Only commit “safe” acknowledged checkpoints:
consumer.poll() → producer.send(records) → consumer.commit(offsets)
Flushless Produce
[Diagram: offsets o1 and o2 from source partition sp0 flow consumer → producer → destination partitions dp0/dp1; the producer ack for (sp0, o2) is reported to the checkpoint manager]
● Checkpoint manager maintains producer-acknowledged offsets for each source partition
Source partition sp0
  in-flight: [o1]
  acked: [o2]
  safe checkpoint: --
Flushless Produce
[Diagram: offsets o3 and o4 from sp0 are now in flight; the producer ack for (sp0, o1) reaches the checkpoint manager]
● Update safe checkpoint to the largest acknowledged offset that is less than the oldest in-flight offset, if any (see the sketch below)
Source partition sp0
  in-flight: [o3, o4]
  acked: [o1, o2]
  safe checkpoint: o2
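A minimal sketch of that bookkeeping follows. It is illustrative only, assuming a single tracker shared by the consumer thread and the producer callbacks; the class and method names are hypothetical, not Brooklin's actual code.

import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeSet;
import org.apache.kafka.common.TopicPartition;

/** Tracks in-flight and producer-acknowledged offsets per source partition. */
public class SafeCheckpointTracker {
  private static final class PartitionState {
    final TreeSet<Long> inFlight = new TreeSet<>();
    final TreeSet<Long> acked = new TreeSet<>();
  }

  private final Map<TopicPartition, PartitionState> states = new HashMap<>();

  /** Record an offset handed to producer.send() for this source partition. */
  public synchronized void onSend(TopicPartition sourcePartition, long offset) {
    states.computeIfAbsent(sourcePartition, tp -> new PartitionState()).inFlight.add(offset);
  }

  /** Record a producer acknowledgement (typically invoked from the send callback). */
  public synchronized void onAck(TopicPartition sourcePartition, long offset) {
    PartitionState state = states.get(sourcePartition);
    if (state != null && state.inFlight.remove(offset)) {
      state.acked.add(offset);
    }
  }

  /**
   * Safe checkpoint: the largest acknowledged offset smaller than the oldest in-flight
   * offset, or simply the largest acknowledged offset when nothing is in flight.
   */
  public synchronized OptionalLong safeCheckpoint(TopicPartition sourcePartition) {
    PartitionState state = states.get(sourcePartition);
    if (state == null || state.acked.isEmpty()) {
      return OptionalLong.empty();
    }
    long bound = state.inFlight.isEmpty() ? Long.MAX_VALUE : state.inFlight.first();
    Long safe = state.acked.lower(bound);
    return safe == null ? OptionalLong.empty() : OptionalLong.of(safe);
  }
}

At commit time the loop commits, for each source partition, safeCheckpoint(...) + 1 (Kafka commits the next offset to read), so offsets are only committed once the corresponding sends have been acknowledged and producer.flush() drops out of the loop entirely.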
BMM Performance
● Use same consumer/producer configs as KMM
● Single host: 64 GB memory, 24 CPU cores (2 × 12-core CPUs)
● Metrics:
○ Throughput (output compressed bytes/sec)
○ Memory utilization %
○ CPU utilization %
[Charts: throughput, memory utilization %, and CPU utilization %]
BMM Performance
● BMM is CPU-bound
● Metrics (20 consumer-producer pairs):
○ Throughput: up to 28 MB/s
○ Memory utilization: 70%
○ CPU utilization: 97%
Future
Performance
• 70%+ of CPU time spent in decompression & re-compression
  ○ GZIPInputStream.read(): ~10%
  ○ GZIPOutputStream.write(): ~61%
• “Passthrough” mirroring
Stability
• Rebalances cause a drop in availability
• Kafka low-level consumer
Scalability
• Auto-scaling: adjust the number of consumers based on throughput needs
Thank you