Common Patterns of Multi Data-Center Architectures with Apache Kafka

1
Patterns of Multi Data-Center Architectures
Gwen Shapira, Product Manager
@gwenshap
2
What we’ll talk about today
• When is one cluster not enough?
• When is one DC not enough?
• Trade-offs in multi-DC architectures
• Architectures used in common use-cases
3
Multi-Cluster
Multi-DC
Multi-Region
4
Reasons to have multiple Kafka clusters in the same DC
• Isolation
• Tuning
• Convenience
• Organization structure
5
Workload Isolation
• Dev, Test, Staging, Prod
• Lower impact of cluster-wide failures
• Prioritize and protect important topics
• Separate high-throughput but low-value topics
• Different access patterns
• Security / access requirements
Payments
Insights
Metrics
6
Reasons to go Multi-DC
• Geo-locality
• Legal reasons
• Cloud and On-Prem
• Disaster Recovery
7
Multi-DC is going to involve some tough choices
8
Main trade-off
Low Latency / High Throughput
• Write to one DC, replicate later
Strong Consistency
• Wait for multiple DCs to acknowledge writes
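A rough way to see this trade-off is to compare what the producer waits for in each case. The sketch below is a deliberately simplified model, not a measurement: the function names and both latency figures are made-up assumptions, and real numbers depend on the network and on broker and producer settings.

```python
# Hypothetical figures, chosen only to illustrate the trade-off.
LOCAL_ACK_MS = 5       # assumed time for the local cluster to acknowledge a write
CROSS_DC_RTT_MS = 80   # assumed round-trip time between the two data centers


def async_write_latency(local_ack_ms: float) -> float:
    """Write to one DC, replicate later: the producer waits only for the local ack."""
    return local_ack_ms


def sync_write_latency(local_ack_ms: float, cross_dc_rtt_ms: float) -> float:
    """Wait for the remote DC as well: every write also pays the cross-DC round trip."""
    return local_ack_ms + cross_dc_rtt_ms


if __name__ == "__main__":
    print("replicate later:", async_write_latency(LOCAL_ACK_MS), "ms per write")
    print("wait for remote:", sync_write_latency(LOCAL_ACK_MS, CROSS_DC_RTT_MS), "ms per write")
```

In this model the synchronous option is slower by exactly the cross-DC round trip, which is why it costs latency and throughput in exchange for consistency.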
9
Operationalizing is hard
• Multi-DC is EXPENSIVE
• Install, configure and upgrade multiple clusters
• Monitor and troubleshoot multiple clusters
• Figure out a multi-DC architecture
• Choose, install, configure replication solution
• Monitor replication
• Failover?
This is complex enough to warrant another talk…
10
Geo-Locality
Why Geo-Locality
Data needs to be close to the users.
And the users are all over the place.
Main Challenges
• Topic names
• Management of replication pipelines
• Managing configuration
• Avoiding “loops”
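One way to think about avoiding loops is a naming rule: a topic that already carries a region prefix is a replica and should not be replicated again. The sketch below assumes a "<REGION>.<topic>" convention with the prefixes SF and NYC; the helper names are hypothetical and this is not how any particular replication tool implements loop prevention.

```python
# Assumed convention: replicated topics are named "<REGION>.<original topic>".
REGION_PREFIXES = ("SF.", "NYC.")


def is_replica(topic: str) -> bool:
    """A topic that starts with a known region prefix was copied from another cluster."""
    return topic.startswith(REGION_PREFIXES)


def topics_to_replicate(local_topics: list[str]) -> list[str]:
    """Replicate only locally produced topics, so copies are never copied back."""
    return [topic for topic in local_topics if not is_replica(topic)]


if __name__ == "__main__":
    topics = ["test-topic", "NYC.test-topic", "SF.orders", "orders"]
    print(topics_to_replicate(topics))
```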
11
Geo-locality scenarios
12
Minimize Number of Pipes
• Less to configure and maintain
• Less to monitor
• Easier to avoid loops
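The difference in pipe count grows quickly with the number of clusters. As a minimal sketch, assuming one replication pipe per direction between clusters (the two function names are illustrative only):

```python
def full_mesh_pipes(total_clusters: int) -> int:
    """Every cluster replicates directly to every other cluster, one pipe per direction."""
    return total_clusters * (total_clusters - 1)


def hub_and_spoke_pipes(total_clusters: int) -> int:
    """Every origin cluster replicates only into a single central cluster."""
    return total_clusters - 1


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"{n} clusters: full mesh={full_mesh_pipes(n)}, central cluster={hub_and_spoke_pipes(n)}")
```

With five clusters that is 20 pipes for a full mesh against 4 when everything flows into one central cluster.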
13
Replicator – One end-point to rule them all
curl -X POST -H "Content-Type: application/json" --data @replicator-sf.json http://localhost:28083/connectors
{ "name": "SF-Replicator",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "tasks.max": "4",
    "topic.whitelist": "demo-topic",
    "topic.rename.format": "SF.${topic}",
    "topic.auto.create": true,
    "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "src.kafka.bootstrap.servers": "dc1-kafka:19092",
    "src.zookeeper.connect": "dc1-zookeeper:12181",
    "dest.zookeeper.connect": "dc2-zookeeper:22181" }
}
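With several origin regions, the per-region files differ only in a few values, so they can be generated rather than hand-edited. The sketch below reuses the configuration keys from the slide and Kafka Connect's standard `tasks.max` setting; the function name, the NYC host names and the output file name are hypothetical placeholders.

```python
import json


def replicator_config(region, src_bootstrap, src_zookeeper, dest_zookeeper, topics):
    """Build the JSON body for one region's Replicator connector."""
    return {
        "name": f"{region}-Replicator",
        "config": {
            "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
            "tasks.max": "4",
            "topic.whitelist": ",".join(topics),
            # "${topic}" is left literal; it is expanded at replication time.
            "topic.rename.format": f"{region}." + "${topic}",
            "topic.auto.create": True,
            "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
            "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
            "src.kafka.bootstrap.servers": src_bootstrap,
            "src.zookeeper.connect": src_zookeeper,
            "dest.zookeeper.connect": dest_zookeeper,
        },
    }


if __name__ == "__main__":
    # Hypothetical hosts for a second origin region.
    config = replicator_config(
        "NYC", "nyc-kafka:9092", "nyc-zookeeper:2181", "dc2-zookeeper:22181", ["demo-topic"]
    )
    with open("replicator-nyc.json", "w") as handle:
        json.dump(config, handle, indent=2)
```

The resulting file can be posted to the Connect REST endpoint the same way as replicator-sf.json.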
14
[Diagram: Origin Regions (NYC and SF, each with ZooKeeper, a Kafka Broker and test-topic), Replicator running on Connect, and the Destination Central Cluster (ZooKeeper, Kafka Broker, NYC.test-topic, SF.test-topic), with a producer and consumers.]
15
Multi-DC for Legalities
What Legalities
• Similar to geo-locality, but…
• Different countries have different data storage laws
• But some data needs to be shared
• Laws regarding encryption
• Laws regarding privacy
• Also – legal usually wants failover
Main Challenges
• Avoid copying some data
• Encryption over the wire
• Lineage
• Security, audits
16
SMT to the rescue!
• Single Message Transforms (SMTs)
• Route, filter or modify events with just a bit of config
• Work with any connector, including Replicator
• Pluggable: you can add your own
17
Lineage + Message Filter SMT
"transforms": "InsertSourceDetails,DropField",
"transforms.InsertSourceDetails.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertSourceDetails.static.field": "messagesource",
"transforms.InsertSourceDetails.static.value": "MySQL demo on asgard",
"transforms.DropField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.DropField.blacklist": "very_private_field"
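To make the effect of this chain concrete, the sketch below imitates it on a plain Python dictionary: first a static lineage field is inserted, then the blacklisted field is dropped. It only mimics the behaviour for illustration and is not the Connect implementation; the function names and the sample record are hypothetical.

```python
def insert_static_field(value: dict, field: str, static_value: str) -> dict:
    """Imitates InsertField$Value with static.field / static.value."""
    result = dict(value)
    result[field] = static_value
    return result


def drop_fields(value: dict, blacklist: set) -> dict:
    """Imitates ReplaceField$Value with a blacklist: listed fields are removed."""
    return {name: content for name, content in value.items() if name not in blacklist}


def apply_transforms(value: dict) -> dict:
    """Applies the transforms in the configured order: InsertSourceDetails, then DropField."""
    value = insert_static_field(value, "messagesource", "MySQL demo on asgard")
    return drop_fields(value, {"very_private_field"})


if __name__ == "__main__":
    record = {"id": 42, "city": "SF", "very_private_field": "secret"}
    print(apply_transforms(record))
```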
18
Multi-DC for Cloud Migration
Why Cloud Migration?
• All the cool kids are moving to the cloud
• Likely to be a lengthy process
• Maybe on-prem + multi-clouds forever
• Many apps and teams involved
• Need a well-known central pipe
Main Challenges
• Most expensive network ever
• Random failure modes
• Many applications, teams, data-stores
• To be honest, Kafka is the easy part
19
At first, this is no big deal…
[Diagram: a handful of apps, a DWH, DBs and a KV store spread across DC1 and AWS]
20
6 months later...
[Diagram: many more apps, DBs, DWHs and KV stores spread across DC1 and AWS]
21
Are you kidding?
● This is expensive
● This is a maintenance nightmare
● We may need more than one region!
● We may need more than one cloud!
22
We’ve done this before...
This... To this...
23
There is a better way
24
Benefits of Kafka + Replicator for Cloud
Operations
1. Proven architecture
(Watch our online talk with Monsanto!)
2. Non-stop, low latency pipe
3. Cost savings
4. One throat to tune, manage, monitor, secure and improve
Future-proof
1. Connect lets you explore cloud services
2. Avoid lock-in: “Kafka is our escape valve”
3. Multi-zone, multi-region, multi-cloud…
4. Microservices ready
5. Streams ready
25
A few general lessons
1. Don’t be afraid of many clusters
2. Decide if you need to scale clusters, data centers, regions or all of the above
3. Choose your trade-offs
4. It is better to consume over distance than to produce
5. Unless you stop consuming when you can’t produce
6. Security over the wire:
• SSL encryption for consumers takes LOTS of broker resources
• Maybe consume locally without SSL and produce remotely with SSL
7. Monitor, monitor, monitor. Especially lag
8. Tuning over WAN is different: https://docs.confluent.io/current/multi-dc/replicator-tuning.html
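For the "especially lag" point, lag per partition is the distance between the newest offset in the log and the offset the consumer (or replication pipe) has committed. The sketch below works on plain dictionaries so it stays independent of any client library; how the offsets are fetched is left out, the function names are hypothetical, and treating a missing committed offset as 0 is an assumption.

```python
def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset minus committed offset.

    Assumption: a partition with no committed offset has not been consumed at all.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def total_lag(end_offsets: dict, committed_offsets: dict) -> int:
    """Sum of the per-partition lag, a convenient single number to alert on."""
    return sum(partition_lag(end_offsets, committed_offsets).values())


if __name__ == "__main__":
    end = {0: 1_000, 1: 500, 2: 750}
    committed = {0: 990, 1: 500}
    print(partition_lag(end, committed))
    print(total_lag(end, committed))
```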
26
MirrorMaker vs. Confluent Replicator: features and benefits
• Data Replication: Real-time event streaming between Kafka clusters and data centers
• Schema Replication: Integrate with Confluent Schema Registry for multi-DC data quality and governance
• Connect Replication: Manage data integration across multiple data centers
• Flexible topic selection: Select topics with white-lists, black-lists and regular expressions
• Auto-create topics: New topics are automatically detected and replicated. Minimize admin overhead.
• Add new partitions: New partitions are automatically detected and replicated. Minimize admin overhead.
• Configuration Replication: Topic configuration remains synchronized between the two clusters. Avoid configuration diverging due to human error.
• Auto-Scale: Scale replication processes as Kafka traffic increases with a single configuration
• Active-active replication: Redirect events to avoid infinite replication loops in active-active configurations
• Aggregate cluster: One management point for replicating more than a single cluster
• Control Center Integration: Manage and monitor replication via Control Center UI
• Support transformations: Via Connect’s SMT (lineage, routing, masking, filtering and more)
Confluent Replicator is an enterprise-ready MirrorMaker
27
We learned
1. Why you need multiple clusters. Or multiple data centers. Or multiple regions.
2. What trade-offs are involved
3. A few common use-cases and architectures
4. Next week: Data recovery and failover
28
Thank You!