A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff Gilmore, AWS

A Look into the Mirror
Patterns and Best Practices for MirrorMaker2
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved. 1
Cliff Gilmore
Principal Solutions Architect – Streaming Data

Agenda
• MirrorMaker2 Overview
• Live Migration
• Active Passive
• Hub and Spoke / Aggregation
• Active Active
• Capacity Planning
• Best Practices

Key MirrorMaker2 Features
• Multi-cluster data replication engine based on Kafka Connect framework
• Detects new topics and partitions in self-discovery mode
• Automatically synchronizes topic configuration & ACLs
• Supports “active/active” cluster pairs, as well as any number of active clusters
• Provides new metrics including end-to-end replication latency across multiple clusters
• Emits the offsets required to migrate consumers between clusters and tooling for offset
translation (in Apache Kafka 2.7, offset sync is also automated)
• Configuration file for specifying multiple clusters and replication flows in one place

MirrorMaker2 Core Concepts
• Connectors
• MirrorSourceConnector
• MirrorSinkConnector
• MirrorCheckpointConnector
• MirrorHeartbeatConnector
• Replication Policies
• Controls naming of replicated topics
• DefaultReplicationPolicy
• <source>.<topic> convention
• CustomReplicationPolicy
• Create your own policy based on preferences (IE, no prefix for migration/active-passive)
• Configuration Parameters
• topics lets you control which topics are replicated
• Groups lets you control which consumer groups have offsets replicated
• More configuration for options, connectivity and topology defined at
• https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0

Live Migrations with MM2

Live Migration Pattern (Initial State)
Source
Cluster
Destination
Cluster
Consumer
Producers

Live Migration Pattern (Enable Replication)
Source
Cluster
Destination
Cluster
Consumer
Producers
Kafka Connect
MirrorSourceConnector
MirrorCheckpointConnector
MirrorHeartBeatConnector
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
Topic Names stay consistent via CustomReplicationPolicy

Live Migration Pattern (Stop Consumer, Update Offsets)
Source
Cluster
Destination
Cluster
Producers
Kafka Connect
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
Note: OffsetSync can be Automated with the MirrorMaker2 version part of Apache Kafka 2.7 release
RemoteClusterUtils to
update offsets

Live Migration Pattern (Start Consumer on Destination)
Source
Cluster
Destination
Cluster
Producers
Kafka Connect
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
Consumer

Live Migration Pattern (Point Producers to Destination)
Source
Cluster
Destination
Cluster
Kafka Connect
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
Consumer
Producers

Live Migration Pattern (Shut Down Source/Connect)
Destination
Cluster
Consumer
Producers
• The source cluster can now be safely shut
down
• This leaves us with our clients migrated to
the new cluster
• Very little delay is introduced with just the
restart time of each topic’s producers and
consumers being the disruption

Active – Passive Replication with DR Failover/Back

Active-Passive (Normal State)
Source
Cluster
Destination
Cluster
Consumer
Producers
Kafka Connect
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
Datacenter/Region #1 Datacenter/Region #2
topic1
topic1
region1.topic1

Active-Passive (Cluster Failure, Bounce clients)
Source
Cluster
Destination
Cluster
Consumer
Producers
Kafka Connect
topic1
region1.topic1
update offsets

Active-Passive (Datacenter Failure)
Source
Cluster
Destination
Cluster
Consumer
Producers
Kafka Connect
Copy of
Producers
Consumers with
same group ID
topic1
region1.topic1
update offsets

Active-Passive (Reverse Replication)
Source
Cluster
Destination
Cluster
Kafka Connect
Copy of
Producers
Consumers with
same group ID
topic1
region1.topic1
topic1
region2.topic1

Active-Passive (Reverse Replication)
Source
Cluster
Destination
Cluster
Kafka Connect
MirrorSinkConnector
Copy of
Producers
Consumers with
same group ID
topic1
region1.topic1
topic1
region2.topic1
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Dest Cluster
Heartbeats
RemoteClusterUtils
to
update offsets

Active-Passive (Failback Producer/Consumer)
Source
Cluster
Destination
Cluster
Kafka Connect
MirrorSinkConnector
topic1
region1.topic1
topic1
region2.topic1
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Dest Cluster
Heartbeats
Consumer
Producers

Active-Passive (Resume Normal Replication)
Source
Cluster
Destination
Cluster
Consumer
Producers
Kafka Connect
Replication for
Topics and Offsets
Replication for
Topics and Offsets
Source Cluster
Heartbeats
topic1
region2.topic1
topic1
region1.topic1

Hub and Spoke aka Aggregation Pattern

Hub and Spoke/Aggregation
Destination
Cluster
Consumer
Producers
Kafka Connect
Replication for
Topics
Consumer
Producers
Consumer
Producers
Consumer
Local Datacenter/Region #2
Central Datacenter/Region
region1.topic1
region2.topic1
region3.topic1

Active – Active also useful for Cloud Bridging

Active-Active
OnPrem
Cluster
Cloud
Cluster
Consumers
Producers
Kafka Connect
Datacenter
Cloud Region
topic1
cloud.topic1 topic1
datacenter.topic1
Producers
Consumers
Kafka Connect

Capacity Planning Considerations
• Run Kafka Connect in distributed mode
• Mainly size based on network throughput
• Always consider the impact of a downed worker in capacity planning
• Remember to measure impact on source cluster (extra consumer)
• Follow general task to CPU core ratio of 8-16 tasks per core
• Avoid SMTs and type conversion
• Use CPU/Network optimized instances/VMs/machines as disk or high amounts of RAM are not needed
• Increase the Heap size to something between 6-12GB

Best Practices / Observations
• Keep partitions per connect task ratio <=10:1
• There can be issues with offset replication when > 10:1
• https://issues.apache.org/jira/browse/KAFKA-12558
• Set consumer.task.id in the connect configuration manually
• This allows for client quotas to be set to control replication max throughput for balancing with other
producers/consumers
• Monitor task status vs Connect REST interface
• Monitor JMX metrics to ensure latency and throughput are as expected
• You can’t measure consumer lag due to the way MM2 consumes from the source assign() vs subscribe()
• Negative lag on replication destinations can occur if replication is pending or behind and offset sync is
enabled

Monitoring Metrics
• MirrorSourceConnector
• record-count
• record-age-ms
• record-age-ms-min
• record-age-ms-max
• record-age-ms-avg
• replication-latency-ms
• replication-latency-ms-min
• replication-latency-ms-max
• replication-latency-ms-avg
• byte-rate
• MirrorCheckpointConnector
• checkpoint-latency-ms
• checkpoint-latency-ms-min
• checkpoint-latency-ms-max
• checkpoint-latency-ms-avg

Thanks!
Twitter : @cwgdata
LinkedIN: https://www.linkedin.com/in/cliffgilmore

A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff Gilmore, AWS

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff Gilmore, AWS

Similar to A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff Gilmore, AWS (20)

More from HostedbyConfluent

More from HostedbyConfluent (20)

Recently uploaded

Recently uploaded (20)

A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff Gilmore, AWS