Brooklin Mirror Maker (BMM) was created by LinkedIn to address the limitations of Kafka Mirror Maker (KMM) when mirroring data between data centers at large scale. BMM is built on Brooklin, LinkedIn's stream ingestion service, which provides better operability, fault isolation, and performance optimizations than KMM. BMM exposes REST APIs for dynamic management and diagnostics of mirroring pipelines, and it includes features such as flushless produce and passthrough compression to improve performance. LinkedIn has fully replaced KMM with BMM and plans to open-source BMM in the future.
8. Kafka Mirror Maker (KMM) Topology
[Diagram: Datacenters A and B each run tracking clusters; KMM clusters in each datacenter mirror the tracking data from both datacenters into a local aggregate-tracking cluster]
● Each KMM pipeline
○ mirrors data from 1 source cluster to 1 destination cluster
○ constitutes its own KMM cluster
10. Example - Add a Topic in KMM
● Static configuration file per KMM cluster requires every change to be deployed
● Let’s say we have a pipeline (a KMM cluster) with 100+ hosts
● And what about 100+ pipelines?
11. KMM Pain Points
● Hard to operate
○ hard to add a new topic
○ difficult to split a pipeline
● One bad partition brings down the whole pipeline
○ deleted topic
○ ACL issue
● Performance issues
○ unable to catch up with traffic
○ increased lag
12. : (
Your Kafka Mirror Maker runs into problems and need to restart. We’re just collecting some error
infos and we will restart for you. (0% completed)
14. Brooklin - Stream Ingestion Service
[Diagram: Brooklin moves data from sources (data stores, messaging systems, Microsoft EventHubs) to destinations (data stores, messaging systems, Microsoft EventHubs)]
15. BMM is built on Brooklin
[Diagram: the same sources-to-destinations architecture, repeated from the previous slide]
16. Brooklin Mirror Maker
● Built on top of our stream ingestion service, Brooklin
○ Better operability
○ Fault isolation
○ Performance optimizations
● BMM has fully replaced KMM at LinkedIn today
18. KMM vs BMM
[Diagram: the KMM topology (multiple KMM clusters per datacenter mirroring tracking data into each aggregate cluster) side by side with the BMM topology (a single BMM cluster per datacenter handling the same mirroring)]
● BMM is one cluster per data center
20. Dynamic Management API
[Diagram: the Brooklin engine (Kafka source and destination connectors, backed by ZooKeeper) exposes a Management REST API and a Diagnostics REST API, consumed by a management/monitoring portal and SRE/ops dashboards]
21. REST API - Creating a Pipeline
[Diagram: the Management REST API passes the request to the Brooklin engine, which stores the datastream in ZooKeeper]
create POST /datastream
name: mm_DC1-tracking_DC2-aggregate-tracking
connectorName: KafkaMirrorMaker
source:
  connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB
destination:
  connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
metadata:
  taskNums: 5
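As a sketch of what a client of this API might do, the snippet below builds the create request shown on the slide. The field names and the '|'-joined topic list come from the slide; the helper name and argument names are illustrative, not Brooklin's actual client code.

```python
import json

# Illustrative helper (not Brooklin's API): assemble the POST /datastream
# body from the slide's example. Topics are joined with '|' onto the
# source connection string, as shown on the slide.
def build_create_datastream(name, source_vip, dest_vip, topics, task_nums):
    payload = {
        "name": name,
        "connectorName": "KafkaMirrorMaker",
        "source": {"connectionString": source_vip + "/" + "|".join(topics)},
        "destination": {"connectionString": dest_vip},
        "metadata": {"taskNums": task_nums},
    }
    # A client would POST this body to the Management REST API at /datastream.
    return json.dumps(payload)

body = build_create_datastream(
    "mm_DC1-tracking_DC2-aggregate-tracking",
    "kafkassl://DC1-tracking-vip:12345",
    "kafkassl://DC2-aggregate-tracking-vip:12345",
    ["topicA", "topicB"],
    5,
)
```

Updating a pipeline (next slide) follows the same shape, with PUT against the datastream's name instead of POST.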
22. REST API - Updating a Pipeline
[Diagram: the Management REST API passes the update to the Brooklin engine, which updates the datastream in ZooKeeper]
update PUT /datastream/mm_DC1-tracking_DC2-aggregate-tracking
name: mm_DC1-tracking_DC2-aggregate-tracking
connectorName: KafkaMirrorMaker
source:
  connectionString: kafkassl://DC1-tracking-vip:12345/topicA|topicB|topicC|topicD
destination:
  connectionString: kafkassl://DC2-aggregate-tracking-vip:12345
metadata:
  taskNums: 10
● The topic list can also be given as a regular expression, e.g. ^topic*.
23. Pause a Pipeline
● Manually pause and resume mirroring for each pipeline
● BMM can automatically pause mirroring for bad partitions, for fault isolation
○ flow of messages from healthy partitions continues
○ auto-resumes the paused partitions after a configurable duration
27. Brooklin Mirroring Pseudocode
while (!shutdown) {
  records = consumer.poll();
  producer.send(records);
  if (timeToCommit) {
    producer.flush();
    consumer.commit();
  }
}
● Producer flush can be expensive
28. Flushless Produce
Only commit “safe” acknowledged checkpoints:
● flushless: consumer.poll() → producer.send(records) → consumer.commit(offsets)
● with flush: consumer.poll() → producer.send(records) → producer.flush() → consumer.commit()
30. Flushless Produce
[Diagram: the consumer hands offsets o3, o4 for source partition sp0 to the producer; the checkpoint manager tracks them as in-flight until the destination partitions (dp0, dp1) acknowledge, e.g. ack(sp0, o1)]
● Update the safe checkpoint to the largest acknowledged offset that is less than the oldest in-flight offset (if any)
Example for source partition sp0:
in-flight: [o3, o4]
acked: [o1, o2]
safe checkpoint: o2
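The safe-checkpoint rule can be sketched as a small checkpoint manager. This is an illustration of the rule stated above, not Brooklin's actual classes; the names are assumptions.

```python
# Sketch of the "safe checkpoint" rule: commit the largest acknowledged
# offset that is smaller than the oldest in-flight offset, or the largest
# acked offset if nothing is in flight.
class CheckpointManager:
    def __init__(self):
        self.in_flight = {}  # partition -> offsets sent but not yet acked
        self.acked = {}      # partition -> acknowledged offsets

    def sent(self, partition, offset):
        self.in_flight.setdefault(partition, set()).add(offset)

    def ack(self, partition, offset):
        self.in_flight.get(partition, set()).discard(offset)
        self.acked.setdefault(partition, set()).add(offset)

    def safe_checkpoint(self, partition):
        acked = self.acked.get(partition, set())
        if not acked:
            return None
        inflight = self.in_flight.get(partition)
        if inflight:
            oldest = min(inflight)
            safe = [o for o in acked if o < oldest]
            return max(safe) if safe else None
        return max(acked)

# Reproduce the slide's example for source partition sp0:
cm = CheckpointManager()
for o in (1, 2, 3, 4):
    cm.sent("sp0", o)
cm.ack("sp0", 1)
cm.ack("sp0", 2)
print(cm.safe_checkpoint("sp0"))  # in-flight [3, 4], acked [1, 2] -> 2
```

Committing only safe checkpoints means a crash can at worst re-mirror a few records (at-least-once delivery), while the expensive producer.flush() disappears from the hot path.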
31. Manage Performance through Tasks
● Datastream task
○ consists of a dedicated Kafka consumer and uses a shared producer pool to produce the data
○ performance is controlled by the number of tasks
○ tasks are assigned to hosts within the BMM cluster
● BMM uses sticky task assignment to speed up task allocation
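A sticky assignment can be sketched as below: keep each task on its previous host when possible and spread only new or orphaned tasks, so a rebalance moves as few tasks as it can. This is an illustrative strategy, not Brooklin's actual assignment algorithm.

```python
# Sketch of sticky task assignment: tasks stay on their previous host if it
# is still alive; remaining tasks go to the currently least-loaded hosts.
def sticky_assign(tasks, hosts, previous):
    assignment = {h: [] for h in hosts}
    unplaced = []
    for t in tasks:
        prev_host = previous.get(t)
        if prev_host in assignment:
            assignment[prev_host].append(t)  # sticky: stay where it was
        else:
            unplaced.append(t)  # new task, or its host is gone
    for t in unplaced:
        least = min(hosts, key=lambda h: len(assignment[h]))
        assignment[least].append(t)
    return assignment
```

Because surviving tasks keep their hosts, their consumers avoid tearing down and re-establishing state, which is what speeds up allocation.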
33. BMM Performance Numbers
● Testing environment
○ Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 12 cores, 64GB RAM
● Performance metrics with 20 datastream tasks:
○ throughput: up to 28 MB/s (compressed bytes)
○ memory utilization: 70%
○ CPU utilization: ~100%
34. Passthrough Compression
● BMM is CPU bound; 70%+ of CPU time is spent in decompression & re-compression
○ GZIPInputStream.read(): ~10%
○ GZIPOutputStream.write(): ~61%
● “Passthrough” mirroring - skip the decompression & recompression
○ throughput: ~100 MB/s
○ CPU utilization drops to 50%
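The idea can be shown with a toy example (plain gzip bytes, not Kafka record batches): a mirroring hop that forwards the compressed payload verbatim delivers the same logical data as one that decompresses and re-compresses it, while skipping the gzip work that dominates BMM's CPU profile.

```python
import gzip

# Toy illustration (not Kafka internals) of passthrough mirroring.
def mirror_recompress(compressed_batch):
    data = gzip.decompress(compressed_batch)  # CPU-heavy on every batch
    return gzip.compress(data)                # ...and again on the way out

def mirror_passthrough(compressed_batch):
    return compressed_batch                   # forward the bytes as-is
```

Passthrough requires that source and destination use the same compression codec, since the bytes are never re-encoded along the way.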