Disaster Recovery and High Availability with Kafka, SRM and MM2
Abdelkrim Hadjidj
© 2019 Cloudera, Inc. All rights reserved.
Quick intro
• Senior Specialist Solution Engineer at Cloudera
• Focus on CDF offering
● Edge Management & IoT (MiNiFi, CEM)
● Flow Management (NiFi, Registry)
● Stream Processing (Kafka, KStreams, SMM, SR, …)
• Founder of Future of Data Paris Meetup http://tiny.cc/fodp
• Founder of Solutions Engineers of Paris http://tiny.cc/PSE
@ahadjidj
Kafka Disaster Recovery options
• Multiple DC*: one cluster with brokers spread across DC1, DC2 and DC3. Zero RPO
• Dual ingest: data is written to both DC1 and DC2. Zero RPO
• Mirroring**: data is mirrored between the clusters in DC1 and DC2. Very low RPO
* A stretch cluster across geographically distributed DCs is not recommended
** "Mirroring" means copying data between clusters; "replication" is reserved for internal broker replication
Agenda
From MM to MM2 and SRM
Active Passive Architecture
Active Active Architectures
Other use cases
Monitoring
Q&A
Mirror Maker use cases
• Aggregation
• Data Deployment
• Segmentation
• Acquisitions & mergers
[Diagrams: Mirror Maker (MM) linking clusters K1, K2 and K3 in DC1, DC2 and DC3, with producers (P) and consumers (C) attached to each cluster]
Mirror Maker use cases
[Diagram: producers (P) and consumers (C) attached to local Tracking and Queuing clusters; MM mirrors them into Tracking Aggregate and Queuing Aggregate clusters, which serve further consumers (C) and HDFS]
Mirror Maker limitations for Disaster Recovery
• Static Whitelists and Blacklists
• No Configuration Sync between Clusters
• Manual Topic Naming to avoid Cycles
• Scalability and Throughput Limitations due to Rebalances
• Lack of Monitoring and Operational Support
• No Support for Disaster Recovery, Migration or Failover
• Too many MirrorMaker Clusters
Streams Replication Manager
• Mirror Maker 2 (KIP-382)
• Supports active-active, multi-cluster, cross-DC replication and other complex scenarios
• Leverages Kafka Connect for scalability and HA
• Replicates data and configurations (ACLs, partitioning, new topics, etc.)
• Offset translation for failover and failback
• Monitoring integration with SMM
[Diagram: an MM2 cluster running on Kafka Connect replicates between clusters (A, B, C, X, Y). Partitions topic1.part0 and topic1.part1 from clusters A and X appear downstream as A.topic1.part0, A.topic1.part1, X.topic1.part0 and X.topic1.part1, next to the local topic1 partitions]
Active/standby
• Producers send to the primary cluster if it is available, and to the secondary cluster if not.
• Consumers can be migrated between the primary and secondary clusters.
• SRM replicates data, offset syncs and consumer checkpoints.
[Diagram: producers and consumers reach the primary and secondary clusters through VIP/load balancers; SRM links the two clusters]
Configuration file
• Simple file configuration
• Multi directional
• Fine grained replication
• Topics white/black lists
• Group white/black lists
• Interval configurations
• Supports patterns
$ ./bin/connect-mirror-maker.sh mm2.properties
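To make the bullets above concrete, here is a minimal sketch of what an mm2.properties file could look like. The cluster aliases, host names, topic names and interval values are placeholders chosen for illustration; property names follow KIP-382 and may differ between Kafka and SRM versions, so check them against the release in use.

```properties
# mm2.properties: a minimal sketch, not a complete production configuration.
# Cluster aliases and host names are placeholders.
clusters = primary, secondary
primary.bootstrap.servers = primary-broker1:9092
secondary.bootstrap.servers = secondary-broker1:9092

# Replication is enabled per direction (multi directional)
primary->secondary.enabled = true
secondary->primary.enabled = true

# Fine grained selection; values are comma-separated names or regex patterns
primary->secondary.topics = topic1, topic2
secondary->primary.topics = .*
groups = .*

# Interval configurations (illustrative values)
refresh.topics.interval.seconds = 60
emit.heartbeats.interval.seconds = 5
emit.checkpoints.interval.seconds = 60
```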
Remote topics
• Replicated topics are renamed according to the ReplicationPolicy.
• Default policy: <source>.<topic>
• Custom policies can be implemented
[Diagram: with SRM replicating in both directions, the primary cluster holds topic1, topic2, secondary.topic1 and secondary.topic2; the secondary cluster holds topic1, topic2, primary.topic1 and primary.topic2]
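The default naming rule also explains how replication cycles are avoided without manual topic naming. The Python sketch below is a simplified model of that idea, not the MM2 implementation: the function names are hypothetical, and it assumes topic names contain no "." of their own.

```python
SEPARATOR = "."  # separator of the default <source>.<topic> policy


def format_remote_topic(source_cluster: str, topic: str) -> str:
    """Name a topic gets once it is replicated from source_cluster."""
    return f"{source_cluster}{SEPARATOR}{topic}"


def topic_source(topic: str):
    """Cluster alias a remote topic came from, or None for a local topic."""
    prefix, sep, _ = topic.partition(SEPARATOR)
    return prefix if sep else None


def upstream_topic(topic: str):
    """Topic name on the source cluster, or None for a local topic."""
    _, sep, rest = topic.partition(SEPARATOR)
    return rest if sep else None


def is_cycle(topic: str, target_cluster: str) -> bool:
    """True if replicating topic to target_cluster would send data back
    to a cluster it already passed through."""
    source = topic_source(topic)
    if source is None:
        return False  # local topic, safe to replicate
    if source == target_cluster:
        return True  # topic originated on the target
    return is_cycle(upstream_topic(topic), target_cluster)
```

With this rule, topic1 on the primary cluster becomes primary.topic1 on the secondary cluster, and primary.topic1 is never replicated back to the primary cluster.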
Heartbeats
• MM2 emits heartbeats to a heartbeats topic in each source cluster, which is replicated to the other clusters
• A downstream cluster uses this topic to verify that
● The connector is running
● The corresponding source cluster is available
[Diagram: the primary.heartbeats topic on the secondary cluster, populated by SRM. Example record: target=primary, source=secondary, timestamp=5434356]
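One way a downstream monitor could use replicated heartbeats is to check how old the newest one is. The Python sketch below illustrates that check only; the Heartbeat class, the link_is_healthy function and the max_age_ms threshold are hypothetical names, not part of MM2 or SRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    source: str        # source cluster alias
    target: str        # target cluster alias
    timestamp_ms: int  # time the heartbeat was emitted


def link_is_healthy(heartbeats, source, target, now_ms, max_age_ms):
    """True if the newest heartbeat for source -> target is recent enough.

    A missing or stale heartbeat means either the connector is not
    running or the source cluster is unavailable.
    """
    stamps = [h.timestamp_ms for h in heartbeats
              if h.source == source and h.target == target]
    return bool(stamps) and now_ms - max(stamps) <= max_age_ms
```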
Offset Syncs
• The offset sync stream maps offsets between mirrored clusters.
[Diagram: the primary.offset-syncs.internal topic on the secondary cluster, populated by SRM. Example record: topic=primary.topic1, partition=4, upstreamOffset=100, downstreamOffset=102]
Checkpoints
• The checkpoint stream replicates consumer group state.
• MM2 periodically emits checkpoints in the destination cluster
• The checkpoint topic is log-compacted to reflect only the latest offsets across consumer groups
[Diagram: the primary.checkpoints.internal topic on the secondary cluster, populated by SRM. Example record: topic=primary.topic1, partition=4, group=consumer-group-2, upstreamOffset=100, offset=102]
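Offset syncs and checkpoints fit together roughly as follows: a committed upstream offset is mapped to a downstream offset using the nearest earlier offset sync, and the result is published as a checkpoint. The Python sketch below is a simplified model under the assumption that offsets advance one-for-one after a sync point; MM2's actual translation logic differs between versions, and the function names here are hypothetical.

```python
from bisect import bisect_right


def translate_offset(syncs, upstream_offset):
    """Map an upstream offset to a downstream offset.

    syncs is a list of (upstream_offset, downstream_offset) pairs for one
    partition, sorted by upstream offset. Returns None when no sync at or
    before upstream_offset exists.
    """
    idx = bisect_right([up for up, _ in syncs], upstream_offset) - 1
    if idx < 0:
        return None
    up, down = syncs[idx]
    return down + (upstream_offset - up)


def make_checkpoint(group, topic, partition, upstream_offset, syncs,
                    source="primary"):
    """Build a checkpoint record shaped like the example on the slide."""
    downstream = translate_offset(syncs, upstream_offset)
    if downstream is None:
        return None
    return {
        "topic": f"{source}.{topic}",
        "partition": partition,
        "group": group,
        "upstreamOffset": upstream_offset,
        "offset": downstream,
    }
```

With the slide's offset sync (upstreamOffset=100, downstreamOffset=102), a group committed at upstream offset 100 gets a checkpoint with offset=102.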
Cross-cluster offset translation
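Translate offsets between clusters via RemoteClusterUtils
// Types come from org.apache.kafka.common, org.apache.kafka.clients.consumer,
// org.apache.kafka.connect.mirror and java.time; the 30 s timeout is illustrative.
Map<TopicPartition, OffsetAndMetadata> newOffsets =
    RemoteClusterUtils.translateOffsets(
        newClusterProperties, oldClusterName,
        consumerGroupId, Duration.ofSeconds(30));
consumer.assign(newOffsets.keySet());
newOffsets.forEach((tp, offset) -> consumer.seek(tp, offset.offset()));
● Offset translation is based on checkpoints in the new cluster
● No connection to the old cluster is required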
Active/standby
• Producers publish to topic
• Consumers subscribe to *.topic
• SRM replicates data, offset syncs and consumer checkpoints
[Diagram: producers and consumers reach the primary and secondary clusters through VIP/load balancers; SRM links the two clusters]
Primary down: fail over
• Producers publish to topic
• SRM replicates data, offset syncs and consumer checkpoints
• Migrate consumers: use RemoteClusterUtils to migrate to primary.topic (old data) and topic (new data)
[Diagram: producers and consumers reach the primary and secondary clusters through VIP/load balancers; SRM links the two clusters]
Primary down: fail over
• Producers publish to topic
• SRM replicates data, offset syncs and consumer checkpoints
• Migrate consumers from the command line by exporting the group's offsets and resetting the group from the exported file:
$ srm-control offsets --bootstrap-server :9092 --source primary --group foo --export > out.csv
$ kafka-consumer-groups --bootstrap-server B_host:9092 --reset-offsets --group foo --execute --from-file out.csv
Primary permanently lost? Recover from secondary.
• Lost primary topics can be recovered from remote topics on the secondary cluster.
• Producers publish to topic
[Diagram: a new cluster, Primary-2, takes the place of the lost primary cluster. Primary-2 holds topic1, topic2, secondary.topic1 and secondary.topic2, plus secondary.primary.topic1 and secondary.primary.topic2 (data from the old primary). The secondary cluster holds topic1, topic2, primary.topic1, primary.topic2, primary-2.topic1 and primary-2.topic2]
Active/standby Demo Scenario
• NiFi producers publish to retail-store
• Consumers subscribe to retail-store and nyc_retail-store
• SRM replicates between the Paris and NYC clusters
Active/active: Cross Consumer Groups or XDCR
Consumer subscription defines the patterns
• Producers publish to topic and produce to both clusters.
• Consumers consume from both clusters.
[Diagram: producers and consumers on each side reach the primary and secondary clusters through VIP/load balancers; SRM links the two clusters]
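The difference between the two active/active patterns comes down to which topics a consumer subscribes to on its local cluster. The Python sketch below illustrates this with regular expressions; the topic set and the two patterns are assumptions made for illustration, and the second pattern is one possible reading of the slides' "*.topic" that also matches the local topic.

```python
import re


def matching_topics(cluster_topics, pattern):
    """Topics on a cluster whose full name matches the subscription regex."""
    regex = re.compile(pattern)
    return sorted(t for t in cluster_topics if regex.fullmatch(t))


# Topics visible on the primary cluster when SRM replicates both ways
# (illustrative names)
primary_topics = {"topic", "secondary.topic", "heartbeats"}

# Cross-cluster consumer group: each side reads only locally produced records
LOCAL_ONLY = r"topic"

# XDCR: each side reads the local topic and every remote copy of it
LOCAL_AND_REMOTE = r"(.+\.)?topic"

print(matching_topics(primary_topics, LOCAL_ONLY))
print(matching_topics(primary_topics, LOCAL_AND_REMOTE))
```

The first call selects only topic; the second also selects secondary.topic, so consumers on the primary cluster see records produced on either side.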
Cross-cluster consumer groups
Effectively one big consumer group
• Producers publish to topic and produce to both clusters.
• Consumers on both clusters subscribe to topic
[Diagram: records R1 and R2 flow through the primary and secondary clusters, which are linked by SRM and reached through VIP/load balancers]
Cross-cluster consumer groups
What does it take to fail over? Nothing
• Producers publish to topic and produce to both clusters.
• Consumers subscribe to topic
[Diagram: the primary cluster's DC is temporarily lost; record R3 keeps flowing through the secondary cluster]
Cross-cluster consumer groups
What does it take to fail back? Also nothing
• Producers publish to topic and produce to both clusters.
• Once the DC issue is resolved, consumers recover from the last point and resume; some events may be delayed
[Diagram: record R4 flows through both the primary and secondary clusters again]
Cross-cluster consumer groups
DC permanently lost
• Bring up a new DC: a new cluster, Primary-2, takes the place of the lost primary cluster
• Data previously in primary is not lost and can be recovered from secondary
• Producers publish to topic and produce to both clusters; consumers subscribe to topic
Cross Data Center Replication (XDCR)
All consumers process all records
• Producers publish to topic and produce to both clusters.
• Consumers on both clusters subscribe to *.topic
[Diagram: records R1 and R2 are replicated by SRM and consumed on both the primary and secondary clusters]
Cloud migration or Kafka version upgrade