Kafka Disaster Recovery
Abdelkrim Hadjidj
Senior Data Streaming Specialist
© 2019 Cloudera, Inc. All rights reserved. 2
Quick intro
• Senior Specialist Solution Engineer at Cloudera
• Focus on CDF offering
● Edge Management & IoT (MiNiFi, CEM)
● Flow Management (NiFi, Registry)
● Stream Processing (Kafka, KStreams, SMM, SR, …)
• Founder of Future of Data Paris Meetup http://tiny.cc/fodp
• Founder of Solutions Engineers of Paris http://tiny.cc/PSE
@ahadjidj
Kafka Disaster Recovery options
• Dual ingest (DC1, DC2): Zero RPO
• Mirroring** (DC1, DC2): Very low RPO
• Multiple DC* (brokers spread across DC1, DC2, DC3): Zero RPO
* Stretch cluster on geographically distributed DC is not recommended
** Replication is used for internal broker replication
Agenda
From MM to MM2 and SRM
Active Passive Architecture
Active Active Architectures
Other use cases
Monitoring
Q&A
Mirror Maker use cases
• Aggregation
• Data Deployment
• Segmentation
• Acquisitions & mergers
[Diagrams: MirrorMaker (MM) topologies between clusters K1, K2 and K3 in DC1, DC2 and DC3, with producers (P) and consumers (C)]
Mirror Maker use cases
[Diagram: Tracking and Queuing clusters with producers (P) and consumers (C); MM mirrors them into Tracking Aggregate and Queuing Aggregate clusters, with consumers (C), HDFS and further MM instances downstream]
Mirror Maker limitations for Disaster Recovery
• Static Whitelists and Blacklists
• No configuration sync
• Manual Topic Naming to avoid Cycles
• Scalability and Throughput Limitations due to Rebalances
• Lack of Monitoring and Operational Support
• No Disaster Recovery, Migration, Failover
• Too many MirrorMaker Clusters
Streams Replication Manager
• Mirror Maker 2 (KIP-382)
• Supports active-active, multi-cluster, cross-DC replication and other complex scenarios
• Leverages Kafka Connect for scalability and HA
• Replicates data and configurations (ACLs, partitioning, new topics, etc.)
• Offset translation for failover and failback
• Monitoring integration with SMM
[Diagram: an MM2 cluster on Kafka Connect mirroring between clusters A, B, C, X and Y; partitions of topic1 (topic1.part0, topic1.part1) appear remotely with a source-cluster prefix, e.g. A.topic1.part0 and X.topic1.part0]
Active – Passive Architecture
Active/standby
• Producers send to primary if available, to secondary if not
• Consumers can be migrated between primary and secondary clusters.
[Diagram: producers reach the clusters through VIP/load balancers; SRM carries data, offset syncs and consumer checkpoints between the Primary and Secondary clusters; consumers read from the clusters]
Configuration file
• Simple file configuration
• Multi directional
• Fine grained replication
• Topics white/black lists
• Group white/black lists
• Interval configurations
• Supports patterns
$ ./bin/connect-mirror-maker.sh mm2.properties
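As an illustration of the bullets above, a minimal mm2.properties might look like the following. The cluster aliases reuse the primary/secondary names from this deck; the broker host names, topic pattern and interval values are assumptions chosen for the example, not recommendations.

```properties
# Cluster aliases used in the rest of the file (host names are hypothetical)
clusters = primary, secondary
primary.bootstrap.servers = primary-broker1:9092
secondary.bootstrap.servers = secondary-broker1:9092

# Multi-directional: each flow is enabled and scoped on its own
primary->secondary.enabled = true
primary->secondary.topics = topic.*
primary->secondary.groups = .*

secondary->primary.enabled = true
secondary->primary.topics = topic.*

# Interval configurations (illustrative values)
emit.heartbeats.interval.seconds = 5
emit.checkpoints.interval.seconds = 60
refresh.topics.interval.seconds = 600
```

The names of the topic and group exclusion lists have changed between Kafka releases, so they are left out here; check the documentation of the version in use.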
Remote topics
• Replicated topics are renamed according to ReplicationPolicy.
• Default policy: <source>.<topic>
• Can implement custom policies
[Diagram: with SRM mirroring in both directions, the Primary cluster holds topic1, topic2, secondary.topic1 and secondary.topic2; the Secondary cluster holds topic1, topic2, primary.topic1 and primary.topic2]
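The default <source>.<topic> naming can be sketched in plain Java as below. This is a simplified stand-in written for this deck, not the MM2 ReplicationPolicy class itself; the class name is hypothetical, and the sketch assumes local topic names contain no "." of their own.

```java
/** Simplified stand-in for the default "<source>.<topic>" naming; not the MM2 class itself. */
final class RemoteTopicNaming {
    private static final String SEPARATOR = ".";

    /** Name a topic receives when mirrored from the given source cluster. */
    static String formatRemoteTopic(String sourceAlias, String topic) {
        return sourceAlias + SEPARATOR + topic;
    }

    /** Source cluster alias of a remote topic, or null for a local topic. */
    static String topicSource(String topic) {
        int i = topic.indexOf(SEPARATOR);
        return i < 0 ? null : topic.substring(0, i);
    }

    /** Topic name one hop upstream, or null for a local topic. */
    static String upstreamTopic(String topic) {
        int i = topic.indexOf(SEPARATOR);
        return i < 0 ? null : topic.substring(i + 1);
    }

    public static void main(String[] args) {
        String remote = formatRemoteTopic("primary", "topic1");
        System.out.println(remote);                                     // primary.topic1
        System.out.println(topicSource(remote));                        // primary
        System.out.println(upstreamTopic("secondary.primary.topic1"));  // primary.topic1
    }
}
```

Because the prefix records where a topic came from, a name such as secondary.primary.topic1 shows that the data originated on primary and travelled through secondary, which is what makes replication cycles detectable.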
Heartbeats
• MM2 emits a heartbeat topic in each source cluster, which is replicated to other clusters
• Downstream cluster uses this topic to verify that
● The connector is running
● The corresponding source cluster is available
[Example record in primary.heartbeats on the Secondary cluster: target=primary, source=secondary, Timestamp=5434356]
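One way a downstream monitor could use the heartbeat timestamp is a simple staleness check, sketched below. The class, method and the 30-second threshold are all hypothetical; the deck does not say how SRM or SMM evaluates heartbeats.

```java
import java.time.Duration;

/** Hypothetical staleness check a downstream monitor could apply to heartbeat records. */
final class HeartbeatCheckSketch {
    /** True if the latest heartbeat is recent enough to treat the source as reachable. */
    static boolean sourceLooksAlive(long lastHeartbeatMillis, long nowMillis, Duration maxAge) {
        return nowMillis - lastHeartbeatMillis <= maxAge.toMillis();
    }

    public static void main(String[] args) {
        Duration maxAge = Duration.ofSeconds(30); // arbitrary threshold for illustration
        System.out.println(sourceLooksAlive(10_000L, 20_000L, maxAge));  // true: 10 s old
        System.out.println(sourceLooksAlive(10_000L, 100_000L, maxAge)); // false: 90 s old
    }
}
```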
Offset Syncs
• Offset sync stream maps offsets between mirrored clusters.
[Example record in primary.offset-syncs.internal on the Secondary cluster: topic=primary.topic1, partition=4, upstreamOffset=100, downstreamOffset=102]
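To make the mapping concrete, the sketch below translates an upstream offset from a single sync record, using the values shown on this slide (100 upstream, 102 downstream). It assumes records were mirrored one-for-one after the sync point; the real MM2 logic differs between versions and can be more conservative, so treat this as a model rather than the implementation. The class name is hypothetical.

```java
/** Simplified model of offset translation from a single offset sync record. */
final class OffsetSyncSketch {
    final long upstreamOffset;
    final long downstreamOffset;

    OffsetSyncSketch(long upstreamOffset, long downstreamOffset) {
        this.upstreamOffset = upstreamOffset;
        this.downstreamOffset = downstreamOffset;
    }

    /**
     * Translate an upstream offset at or after the sync point, assuming
     * records were mirrored one-for-one since that point.
     */
    long translate(long upstream) {
        if (upstream < upstreamOffset) {
            throw new IllegalArgumentException("offset precedes the sync point");
        }
        return downstreamOffset + (upstream - upstreamOffset);
    }

    public static void main(String[] args) {
        OffsetSyncSketch sync = new OffsetSyncSketch(100L, 102L);
        System.out.println(sync.translate(100L)); // 102
        System.out.println(sync.translate(105L)); // 107
    }
}
```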
Checkpoints
• Checkpoint stream replicates consumer group state.
• MM2 periodically emits checkpoints in the destination cluster
• The checkpoint topic is log-compacted to reflect only the latest offsets across consumer groups
[Example record in primary.checkpoints.internal on the Secondary cluster: topic=primary.topic1, partition=4, group=consumer-group-2, upstreamOffset=100, offset=102]
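The effect of log compaction on the checkpoint topic can be modeled with a map in which a later record replaces an earlier one with the same key. This is a toy model only: the class name and the key encoding (group, topic, partition joined with "|") are assumptions made for illustration, and the second offset value (110) is invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a log-compacted checkpoint topic: the latest record per key wins. */
final class CheckpointCompactionSketch {
    private final Map<String, Long> latest = new LinkedHashMap<>();

    /** Key built from group, topic and partition (a hypothetical encoding). */
    static String key(String group, String topic, int partition) {
        return group + "|" + topic + "|" + partition;
    }

    /** Append a checkpoint; a later record for the same key replaces the earlier one. */
    void emit(String group, String topic, int partition, long downstreamOffset) {
        latest.put(key(group, topic, partition), downstreamOffset);
    }

    /** Latest translated offset for the key, or null if no checkpoint was emitted. */
    Long latestOffset(String group, String topic, int partition) {
        return latest.get(key(group, topic, partition));
    }

    public static void main(String[] args) {
        CheckpointCompactionSketch topic = new CheckpointCompactionSketch();
        topic.emit("consumer-group-2", "primary.topic1", 4, 102L);
        topic.emit("consumer-group-2", "primary.topic1", 4, 110L);
        System.out.println(topic.latestOffset("consumer-group-2", "primary.topic1", 4)); // 110
    }
}
```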
Cross-cluster offset translation
Translate offsets between clusters via RemoteClusterUtils
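// Read the group's translated offsets from the new cluster (30 s timeout is arbitrary)
Map<TopicPartition, OffsetAndMetadata> newOffsets =
    RemoteClusterUtils.translateOffsets(
        newClusterProperties, oldClusterName,
        consumerGroupId, Duration.ofSeconds(30));
// Assign the translated partitions, then seek each one to its translated offset
consumer.assign(newOffsets.keySet());
newOffsets.forEach(consumer::seek);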
● offset translation based on checkpoints in new cluster
● no connection to old cluster required
Active/standby
[Diagram: producers publish to topic through VIP/load balancers; SRM carries data, offset syncs and consumer checkpoints between the Primary and Secondary clusters; consumers subscribe to *.topic]
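The *.topic subscription reads as a glob; written literally as a Java regular expression it would not compile. A regex with the same intent, matching the local topic and any prefixed remote copy, is sketched below with the standard library only. The class name is hypothetical; in a real consumer the compiled pattern would be passed to KafkaConsumer.subscribe(Pattern).

```java
import java.util.regex.Pattern;

/** Checks which topic names a failover-friendly subscription regex would match. */
final class SubscriptionPatternSketch {
    // Matches "topic" itself and any prefixed remote copy such as "primary.topic".
    static final Pattern LOCAL_AND_REMOTE = Pattern.compile("(.*\\.)?topic");

    static boolean matches(String topicName) {
        return LOCAL_AND_REMOTE.matcher(topicName).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("topic"));         // true
        System.out.println(matches("primary.topic")); // true
        System.out.println(matches("other-topic"));   // false
    }
}
```

Subscribing this way means a consumer moved to the other cluster picks up both the mirrored history (primary.topic) and newly produced records (topic) without a change to its subscription.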
Primary down: fail over
• Migrate consumers
• Use RemoteClusterUtils to migrate to primary.topic (old data) and topic (new data)
[Diagram: producers publish to topic through VIP/load balancers; SRM carries data, offset syncs and consumer checkpoints between the Primary and Secondary clusters; consumers move to the Secondary cluster]
Primary down: fail over
Migrate consumers from the command line:
$ srm-control offsets --bootstrap-server :9092 --source primary --group foo --export > out.csv
$ kafka-consumer-groups --bootstrap-server B_host:9092 --reset-offsets --group foo --execute --from-file out.csv
Primary permanently lost? Recover from secondary.
• Lost primary topics can be recovered from remote topics on secondary cluster.
[Diagram: a new cluster, Primary-2, replaces the lost Primary cluster. Primary-2 holds topic1, topic2, secondary.topic1, secondary.topic2, and secondary.primary.topic1, secondary.primary.topic2 (data from old primary); the Secondary cluster holds topic1, topic2, primary.topic1, primary.topic2, primary-2.topic1 and primary-2.topic2]
Active – Passive Demo
Active/standby Demo Scenario
[Diagram: NiFi producers publish to retail-store; SRM mirrors between the Paris and NYC clusters; a NiFi consumer subscribes to retail-store and nyc_retail-store]
Active - Active
Active/active: Cross Consumer Groups or XDCR
• Consumer subscription defines the patterns
• Produce to both clusters.
• Consume from both clusters.
[Diagram: producers publish to topic on both sides through VIP/load balancers; SRM mirrors between the Primary and Secondary clusters; consumers on both sides]
A/ Cross-cluster consumer groups
Cross-cluster consumer groups
• Effectively one big consumer group
• Produce to both clusters.
• Consumers on each side subscribe to topic
[Diagram: producers publish to topic through VIP/load balancers; SRM mirrors between the Primary and Secondary clusters; records R1 and R2 are shown moving through the clusters to the consumers]
Cross-cluster consumer groups
• What does it take to fail over? Nothing
• Produce to both clusters.
• Consumers on each side subscribe to topic
[Diagram: the Primary cluster's DC is temporarily lost; record R3 is still shown reaching consumers through the Secondary cluster]
Cross-cluster consumer groups
• What does it take to fail back? Also nothing
• Produce to both clusters.
• Recover from the last point and resume; some events may be delayed
[Diagram: DC issue resolved; record R4 is shown moving through both the Primary and Secondary clusters again]
Cross-cluster consumer groups
• DC permanently lost
• Bring new DC
• Data previously in primary is not lost and can be recovered from secondary
• Produce to both clusters.
[Diagram: a new Primary-2 cluster replaces the lost Primary cluster; SRM mirrors between Primary-2 and the Secondary cluster; consumers subscribe to topic]
XDCR
Cross Data Center Replication XDCR
• All consumers process all records
• Produce to both clusters.
• Consumers on each side subscribe to *.topic
[Diagram: producers publish to topic through VIP/load balancers; SRM mirrors between the Primary and Secondary clusters; records R1 and R2 each reach the consumers on both sides]
Active – Passive Demo
Other use cases
Cloud migration or Kafka version upgrade
Aggregation for Analytics
Monitoring: Demo integration with SMM
THANK YOU

Disaster Recovery and High Availability with Kafka, SRM and MM2
