An embedded mirror maker is being prototyped to address the large number of dedicated machines currently used for mirroring. The proposed approach embeds the mirroring logic directly in the Kafka brokers to reduce latency, load, and machine count. It uses idempotent producers and dynamic configuration via Zookeeper, and handles scenarios such as leader movement. Challenges include tighter broker/mirror coupling and ensuring message ordering across clusters.
3. History
KAFKA-74 (Oct 2011): Originally implemented with an embedded approach
KAFKA-249 (Apr 2012): Deprecated and replaced by the standalone approach in 0.7.1
Now (Apr 2016): Revisiting and prototyping an embedded approach
6. Motivation
Save machines (412 dedicated machines across 26 fabrics)
Save network (Eliminate producer to destination cluster network utilization)
Reduced latency (Shorten processing and network time)
Reduced request load on destination cluster, equal request load on source cluster (Eliminate produce requests)
Equal processing load on source and destination cluster
Enable dynamic configuration of topics to mirror
7. Drawbacks
Tighter coupling of server and mirror features:
- Broker vulnerable to errors thrown from the mirror (needs good isolation)
- Mirror deployment tied to broker deployment (more difficult to hotfix)
Clunky consumer configurations must be passed in if customization is required (can be mitigated by dynamic configuration via Zookeeper)
More complex server and mirror code (the prototype shows this is manageable)
8. High level approach
Expected benefits:
- Idempotent producer and free exactly-once transfer
- Improved latency by supporting pipelining (especially for cross-geographic mirroring)
- No polling (especially for idle topics)
- Immediate reaction to partition expansion and topic deletion
Counterpoints:
- Idempotence can be done at the log level
- Pipelining does not help much (with throughput)
- Polling traffic is cheap
- Issue with automatic topic creation
[Diagram: Source Cluster and Destination Cluster, connected by Consume and Produce arrows]
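The counterpoint that idempotence can be done at the log level can be sketched as follows, assuming the destination log remembers the highest source offset already appended per source partition and drops anything at or below that watermark. This is a minimal illustration; all names are hypothetical and not from the actual prototype.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of log-level idempotence: track a per-source-partition
// high watermark of appended source offsets and skip duplicate deliveries.
class IdempotentAppender {
    private final Map<String, Long> lastAppended = new HashMap<>();

    /** Returns true if the message was appended, false if it was a duplicate. */
    boolean maybeAppend(String sourcePartition, long sourceOffset) {
        long last = lastAppended.getOrDefault(sourcePartition, -1L);
        if (sourceOffset <= last)
            return false;                      // duplicate delivery: skip
        lastAppended.put(sourcePartition, sourceOffset);
        return true;                           // first delivery: append to log
    }
}
```

Because the watermark lives with the log itself, a redelivery after a consumer restart is filtered out without any cooperation from the producer side, which is what makes exactly-once transfer "free" in this design.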
10. Static configuration
/** ********* Mirror configuration ***********/
val NumMirrorConsumersProp = "num.mirror.consumers"
val MirrorRefreshMetadataBackoffMsProp = "mirror.refresh.metadata.backoff.ms"
val MirrorOffsetCommitIntervalMsProp = "mirror.offset.commit.interval.ms"
val MirrorRequiredAcksProp = "mirror.required.acks"
val MirrorAppendMessageTimeoutMsProp = "mirror.append.message.timeout.ms"
val MirrorTopicMapProp = "mirror.topic.map"
/** ********* Mirror configuration ***********/
val NumMirrorConsumersDoc = "Number of mirror consumers to use per destination broker per source cluster."
val MirrorOffsetCommitIntervalMsDoc = "The interval in milliseconds that the mirror consumer threads will use to commit offsets."
val MirrorRefreshMetadataBackoffMsDoc = "The interval in milliseconds used by the mirror consumer manager to refresh metadata of both source and destination cluster(s)."
val MirrorRequiredAcksDoc = "This value controls when a message set append is considered completed."
val MirrorAppendMessageTimeoutMsDoc = "The amount of time the broker will wait trying to append message sets before timing out."
val MirrorTopicMapDoc = "A list of topics that this cluster should be mirroring. The format is SOURCE_BOOTSTRAP_SERVERS_0:TOPIC_PATTERN;SOURCE_BOOTSTRAP_SERVERS_1:TOPIC_PATTERN"
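As an illustration of the `mirror.topic.map` format documented above, a hypothetical parser. It assumes the topic pattern itself contains no `:`; the bootstrap server list may contain one (host:port), so each `;`-separated entry is split on its last `:`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the mirror.topic.map format,
// "SOURCE_BOOTSTRAP_SERVERS_0:TOPIC_PATTERN;SOURCE_BOOTSTRAP_SERVERS_1:TOPIC_PATTERN".
class TopicMapParser {
    static Map<String, String> parse(String value) {
        Map<String, String> patternBySource = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            int sep = entry.lastIndexOf(':');   // pattern is after the last ':'
            patternBySource.put(entry.substring(0, sep), entry.substring(sep + 1));
        }
        return patternBySource;
    }
}
```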
12. Demo
Setup:
Destination: Local 2-node cluster with local zookeeper (gitli trunk)
Source: kafka.uniform (0.8.2.66) & kafka.charlie (0.9.0.2)
Validation: Kafka monitor trunk
Scenarios:
- Clean shutdown broker
- Rolling bounce brokers
- Pause and resume mirror
- Restart mirror
Guarantee:
- Zero data loss
- Zero data duplication
17. Metadata refresh: finite state machine
States: Normal, Updated, Outdated, Paused
MirrorClusterCommandListener:
- Listens to Zookeeper data changes
- Commits offsets synchronously & assigns the new partition map to the MirrorConsumer
Partition map is updated by the MetadataRefreshThread periodically and upon request
On a not-leader-for-partition or unknown-topic-or-partition error from the ReplicaManager, a metadata refresh is requested from the MirrorConsumerManager
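The states and transition triggers above can be sketched as a small state machine. This is a hypothetical reconstruction; the state names come from the slide, but the method names and guards are illustrative.

```java
// Hypothetical sketch of the metadata-refresh finite state machine.
class MirrorMetadataFsm {
    enum State { NORMAL, OUTDATED, UPDATED, PAUSED }

    State state = State.NORMAL;

    // ReplicaManager reported a not-leader-for-partition or
    // unknown-topic-or-partition error: the cached partition map is stale.
    void onAppendError() {
        if (state == State.NORMAL) state = State.OUTDATED;
    }

    // MetadataRefreshThread fetched fresh metadata (periodically or on request).
    void onMetadataRefreshed() {
        if (state == State.OUTDATED) state = State.UPDATED;
    }

    // Offsets committed synchronously and the new partition map assigned.
    void onPartitionMapAssigned() {
        if (state == State.UPDATED) state = State.NORMAL;
    }

    void pause()  { state = State.PAUSED; }
    void resume() { state = State.NORMAL; }
}
```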
20. Appending to log
Append to log: only if the thread state is normal or paused (abort if metadata is outdated or updated)
Update appended offsets: when the required acks are fulfilled and a callback is received from the replica manager with no error (skip and request a metadata update if leadership has changed)
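The append gate described above can be written as a one-line guard. This is a self-contained sketch: the state names follow the slide, but the guard itself is illustrative.

```java
// Hypothetical append gate: appends proceed only while the mirror thread is in
// the normal or paused state, and abort while metadata is outdated or updated.
class AppendGate {
    enum ThreadState { NORMAL, PAUSED, OUTDATED, UPDATED }

    static boolean mayAppend(ThreadState state) {
        return state == ThreadState.NORMAL || state == ThreadState.PAUSED;
    }
}
```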
24. Message format version & timestamp
/**
 * The "magic" value.
 * When the magic value is 0, the message uses an absolute offset and does not have a timestamp field.
 * When the magic value is 1, the message uses a relative offset and has a timestamp field.
 */
val MagicValue_V0: Byte = 0
val MagicValue_V1: Byte = 1
val CurrentMagicValue: Byte = 1

/**
 * This method validates the timestamp of a message.
 * If the message is using create time, this method checks if it is within the acceptable range.
 */
private def validateTimestamp(message: Message,
                              now: Long,
                              timestampType: TimestampType,
                              timestampDiffMaxMs: Long) {
  if (timestampType == TimestampType.CREATE_TIME &&
      math.abs(message.timestamp - now) > timestampDiffMaxMs)
    throw new InvalidTimestampException(...)
  if (!mirrored && message.timestampType == TimestampType.LOG_APPEND_TIME)
    throw new InvalidTimestampException(...)
}
25. Message sets & offset assignment
Issue: No in-place offset assignment, and recompression is needed
Solution: Use a split iterator to split received message sets into singular message sets (each containing only one outer message)

Received message set:
Outer: | 4 | 7 | 10 |
Inner: | 0 1 2 3 4 | 0 1 2 | 0 1 2 |

Expected singular message sets (after split):
Set 1: Outer | 4 |, Inner | 0 1 2 3 4 |
Set 2: Outer | 7 |, Inner | 0 1 2 |
Set 3: Outer | 10 |, Inner | 0 1 2 |
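In the layout above, each outer (wrapper) message carries the absolute offset of its last inner message (4, 7, 10), while the inner messages carry 0-based relative offsets, so the absolute inner offsets can be recovered arithmetically. A hypothetical helper illustrating the math:

```java
// Recover absolute offsets of inner messages from the outer message's offset
// (the absolute offset of the LAST inner message) and the inner message count.
// Illustrative only; not code from the prototype.
class RelativeOffsets {
    static long[] absoluteOffsets(long outerOffset, int innerCount) {
        long[] absolute = new long[innerCount];
        for (int rel = 0; rel < innerCount; rel++)
            absolute[rel] = outerOffset - (innerCount - 1) + rel;
        return absolute;
    }
}
```

For the first set above, outer offset 4 with five inner messages yields absolute offsets 0 through 4; outer offset 7 with three inner messages yields 5 through 7.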
26. Future work
Support custom partition assignment scheme
Measure and reduce latency
Per-topic configurations
29. Number of Mirror Maker Machines
#!/bin/sh

TOTAL_MACHINES=0
NUM_FABRICS=0
for i in `eh -e '%fabrics'`; do
  NUM_IN_FABRIC=`eh -e %%${i}.kafka-mirror-maker | grep -iv noclusterdef | wc -l`
  if [ $NUM_IN_FABRIC -gt 0 ]; then
    TOTAL_MACHINES=$((TOTAL_MACHINES + NUM_IN_FABRIC))
    NUM_FABRICS=$((NUM_FABRICS + 1))
    echo ${i}: $NUM_IN_FABRIC
  fi
done
echo "There are $TOTAL_MACHINES machines in total across $NUM_FABRICS fabrics"
Editor's Notes
Mirror maker is a tool developed for Kafka to copy data from one Kafka cluster to another.
It is essentially a consumer/producer pair.
It runs on dedicated machines.
This allows us to aggregate data from multiple datacenters into one container.
Four years ago, LinkedIn ran a much smaller-scale Kafka deployment, and hardware cost was not the highest priority, so it was cleaner to decouple the mirror maker tool from core functionality.
But that decision was near-sighted: since then, our physical footprint has exploded, and we now have to consider its financial impact.
Eliminate the redundant step of decompressing and parsing into records; directly append to the log instead.
We propose to use the destination cluster zookeeper as a public interface to provide dynamic configuration and admin command functionalities.
Talk about implementation choices
I encountered several unexpected caveats while implementing the new design. I think they are interesting enough to share, and everyone can get deeper exposure to the inner workings of the Kafka server.