1
Patterns of Multi Data-Center Architectures
Gwen Shapira, Product Manager
@gwenshap
2
What we’ll talk about today
• When is one cluster not enough?
• When is one DC not enough?
• Trade-offs in multi-DC architectures
• Architectures used in common use-cases
3
Multi-Cluster
Multi-DC
Multi-Region
4
Reasons to have multiple
Kafka clusters in same DC
• Isolation
• Tuning
• Convenience
• Organization structure
5
Workload Isolation
• Dev, Test, Staging, Prod
• Lower impact of cluster-wide failures
• Prioritize and protect important topics
• Separate high-throughput but low-value topics
• Different access patterns
• Security / access requirements
Payments
Insights
Metrics
6
Reasons to go
Multi-DC
• Geo-locality
• Legal reasons
• Cloud and On-Prem
• Disaster Recovery
7
Multi-DC is going to involve some tough choices
8
Main trade-off
Low Latency / High Throughput
• Write to one DC, replicate later
Strong Consistency
• Wait for multi-DC to acknowledge writes
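As a rough illustration of this trade-off, the sketch below models the client-visible latency of one write under each choice. The latency numbers are hypothetical, and the model ignores batching and retries; it only shows why waiting for remote acknowledgment adds at least one WAN round trip.

```python
def write_latency_ms(local_ack_ms, remote_rtts_ms, wait_for_remote):
    """Rough client-visible latency of a single write.

    local_ack_ms:    time for the local cluster to acknowledge the write.
    remote_rtts_ms:  round-trip times to each remote DC (hypothetical values).
    wait_for_remote: True models "wait for multi-DC to acknowledge writes",
                     False models "write to one DC, replicate later".
    """
    if wait_for_remote and remote_rtts_ms:
        # The write completes only once the slowest remote DC has answered.
        return local_ack_ms + max(remote_rtts_ms)
    return local_ack_ms


async_ms = write_latency_ms(5, [70, 140], wait_for_remote=False)
sync_ms = write_latency_ms(5, [70, 140], wait_for_remote=True)
print(async_ms, sync_ms)  # prints: 5 145
```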
9
Operationalizing is hard
• Multi-DC is EXPENSIVE
• Install, configure and upgrade multiple clusters
• Monitor and troubleshoot multiple clusters
• Figure out a multi-DC architecture
• Choose, install, configure replication solution
• Monitor replication
• Failover?
This is complex enough to warrant another talk…
10
Geo-Locality
Why Geo-Locality
Data needs to be close to the users.
And the users are all over the place.
Main Challenges
• Topic names
• Management of replication pipelines
• Managing configuration
• Avoiding “loops”
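One common way to avoid loops is to prefix replicated topics with the name of their origin DC and never replicate a topic that already carries a DC prefix. The sketch below assumes that naming convention; the prefix list and function names are hypothetical and not part of any Kafka API.

```python
# Hypothetical DC prefixes; replicated copies are named "<DC>.<topic>".
KNOWN_DC_PREFIXES = ("SF.", "NYC.")


def renamed(topic, source_dc):
    """Name a replicated topic gets in the destination cluster."""
    return f"{source_dc}.{topic}"


def should_replicate(topic, dc_prefixes=KNOWN_DC_PREFIXES):
    """Replicate only locally produced topics.

    A topic that already starts with a DC prefix arrived through
    replication, so copying it again could create a loop.
    """
    return not topic.startswith(tuple(dc_prefixes))


print(should_replicate("demo-topic"))              # prints: True
print(should_replicate(renamed("demo-topic", "SF")))  # prints: False
```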
11
Geo-locality
scenarios
12
Minimize
Number of Pipes
• Less to configure and
maintain
• Less to monitor
• Easier to avoid loops
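To make the "fewer pipes" point concrete, here is a small counting sketch. It assumes each pipe is a one-way replication flow, and compares a full mesh (every DC replicates to every other DC) with regions feeding a single central cluster. The function names are illustrative only.

```python
def full_mesh_pipes(n_dcs):
    """One-way pipes when every DC replicates to every other DC."""
    return n_dcs * (n_dcs - 1)


def hub_and_spoke_pipes(n_regions):
    """One-way pipes when each region replicates only to a central cluster."""
    return n_regions


# Five DCs in a mesh versus four regions plus one central cluster.
print(full_mesh_pipes(5), hub_and_spoke_pipes(4))  # prints: 20 4
```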
13
Replicator – One end-point to rule them all
curl -X POST -H "Content-Type: application/json" --data @replicator-sf.json http://localhost:28083/connectors
replicator-sf.json:
{
  "name": "SF-Replicator",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "tasks.max": 4,
    "topic.whitelist": "demo-topic",
    "topic.rename.format": "SF.${topic}",
    "topic.auto.create": true,
    "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "src.kafka.bootstrap.servers": "dc1-kafka:19092",
    "src.zookeeper.connect": "dc1-zookeeper:12181",
    "dest.zookeeper.connect": "dc2-zookeeper:22181"
  }
}
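With several origin regions, the per-region JSON files differ only in a few values, so they can be generated. The sketch below builds the same payload as the SF example using the configuration keys from the slide, with `tasks.max` (the standard Kafka Connect task-count setting) for the number of tasks. The helper name is hypothetical; it only builds the JSON and does not call the Connect REST API.

```python
import json

CONVERTER = "io.confluent.connect.replicator.util.ByteArrayConverter"


def replicator_config(region, src_bootstrap, src_zk, dest_zk, topics, tasks=4):
    """Build a Replicator connector payload for one origin region."""
    return {
        "name": f"{region}-Replicator",
        "config": {
            "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
            "tasks.max": tasks,
            "topic.whitelist": ",".join(topics),
            # "${topic}" is a literal placeholder that Replicator expands.
            "topic.rename.format": f"{region}." + "${topic}",
            "topic.auto.create": True,
            "key.converter": CONVERTER,
            "value.converter": CONVERTER,
            "src.kafka.bootstrap.servers": src_bootstrap,
            "src.zookeeper.connect": src_zk,
            "dest.zookeeper.connect": dest_zk,
        },
    }


sf = replicator_config(
    "SF", "dc1-kafka:19092", "dc1-zookeeper:12181",
    "dc2-zookeeper:22181", ["demo-topic"],
)
# Write this to replicator-sf.json and POST it with the curl command.
print(json.dumps(sf, indent=2))
```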
14
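Origin Regions / Destination Central Cluster
[Diagram: two origin regions, NYC and SF, each with its own ZooKeeper and Kafka broker holding test-topic. Replicator instances (NYC and SF) running on Connect copy each region's test-topic into the central cluster (ZooKeeper, Kafka broker) as NYC.test-topic and SF.test-topic. The diagram also labels a producer and consumers.]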
15
Multi-DC for Legalities
What Legalities
• Similar to geo-locality, but…
• Different countries have different data
storage laws
• But some data needs to be shared
• Laws regarding encryption
• Laws regarding privacy
• Also – legal usually wants failover
Main Challenges
• Avoid copying some data
• Encryption over the wire
• Lineage
• Security, audits
16
SMT to the rescue!
• Single Message Transforms (SMT)
• Route, filter or modify events with
just a bit of config
• Work with any connector –
specifically, Replicator.
• Pluggable – you can add your own
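To give a feel for what "modify events with just a bit of config" means, the sketch below imitates two transforms on a record value held as a plain dictionary: inserting a static field (lineage) and dropping a field (filtering private data). It is an illustration of the behaviour only, not Kafka Connect's implementation, and the field names and values are hypothetical.

```python
def insert_static_field(value, field, static_value):
    """Return a copy of the record value with one extra static field."""
    out = dict(value)
    out[field] = static_value
    return out


def drop_fields(value, blacklist):
    """Return a copy of the record value without the blacklisted fields."""
    return {k: v for k, v in value.items() if k not in blacklist}


def apply_chain(value):
    """Apply the transforms in order, as a transform chain would."""
    value = insert_static_field(value, "messagesource", "dc1")
    return drop_fields(value, {"very_private_field"})


record = {"id": 1, "very_private_field": "secret"}
print(apply_chain(record))  # prints: {'id': 1, 'messagesource': 'dc1'}
```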
17
Lineage + Message Filter SMT
"transforms": "InsertSourceDetails,DropField",
"transforms.InsertSourceDetails.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertSourceDetails.static.field": "messagesource",
"transforms.InsertSourceDetails.static.value": "MySQL demo on asgard",
"transforms.DropField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.DropField.blacklist": "very_private_field"
18
Multi-DC for Cloud Migration
Why Cloud Migration?
• All the cool kids are moving to the cloud
• Likely to be a lengthy process
• Maybe on-prem + multi-clouds forever
• Many apps and teams involved
• Need well known central pipe
Main Challenges
• Most expensive network ever
• Random failure modes
• Many applications, teams, data-stores
• To be honest, Kafka is the easy part
19
At first, this is no big deal…
[Diagram: five apps, two DBs, one KV store and one DWH, split between DC1 and AWS]
20
6 months later...
[Diagram: eight apps, three DBs, three KV stores and two DWHs, spread across DC1 and AWS]
21
Are you kidding?
• This is expensive
• This is a maintenance nightmare
• We may need more than one region!
• We may need more than one cloud!
22
We’ve done this before...
This... To this...
23
There is a
better way
24
Benefits of Kafka + Replicator for Cloud
Operations
1. Proven architecture
(Watch our online talk with Monsanto!)
2. Non-stop, low latency pipe
3. Cost savings
4. One throat to tune, manage, monitor,
secure and improve
Future-proof
1. Connect lets you explore cloud services
2. Avoid lock-in: “Kafka is our escape valve”
3. Multi-zone, multi-region, multi-cloud…
4. Microservices ready
5. Streams ready
25
A few general lessons
1. Don’t be afraid of many clusters
2. Decide if you need to scale clusters, data-centers, regions or all of the above
3. Choose your trade-offs
4. It is better to consume over distance than to produce
5. Unless you stop consuming when you can’t produce
6. Security over the wire:
• SSL encryption for consumers takes LOTS of broker resources
• Maybe consume locally without SSL and produce remotely with SSL
7. Monitor, monitor, monitor. Especially lag
8. Tuning over WAN is different: https://docs.confluent.io/current/multi-dc/replicator-tuning.html
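For lesson 7, lag is the distance between the newest offset in each partition and the offset the consumer (or replication task) has committed. The sketch below computes it from two plain dictionaries; how those offsets are fetched is left out, the offsets shown are made up, and treating a partition with no committed offset as starting from 0 is an assumption.

```python
def partition_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset.

    Both arguments map partition number -> offset. A partition with no
    committed offset is treated as committed at 0 (an assumption).
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def total_lag(end_offsets, committed_offsets):
    """Sum of lag over all partitions of a topic."""
    return sum(partition_lag(end_offsets, committed_offsets).values())


end = {0: 1500, 1: 900}
committed = {0: 1450, 1: 900}
print(partition_lag(end, committed))  # prints: {0: 50, 1: 0}
print(total_lag(end, committed))      # prints: 50
```

A steadily growing total is the signal to watch: it means replication is falling behind the producers.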
26
Confluent Replicator is Enterprise-ready MirrorMaker
[Comparison columns: MirrorMaker, Confluent Replicator. The per-feature marks are not preserved in the slide text.]
Feature | Benefit
Data Replication | Real-time event streaming between Kafka clusters and data-centers
Schema Replication | Integrate with Confluent Schema Registry for multi-DC data quality and governance
Connect Replication | Manage data integration across multiple data centers
Flexible topic selection | Select topics with white-lists, black-lists and regular expressions
Auto-create topics | New topics are automatically detected and replicated. Minimize admin overhead.
Add new partitions | New partitions are automatically detected and replicated. Minimize admin overhead.
Configuration Replication | Topic configuration remains synchronized between the two clusters. Avoid configuration diverging due to human error.
Auto-Scale | Scale replication processes as Kafka traffic increases with a single configuration
Active-active replication | Redirect events to avoid infinite replication loops in active-active configurations
Aggregate cluster | One management point for replicating more than a single cluster
Control Center Integration | Manage and monitor replication via Control Center UI
Support transformations | Via Connect’s SMT – Lineage, routing, masking, filtering and more
27
We learned
1. Why you need multiple clusters, multiple data centers, or multiple regions
2. What trade-offs are involved
3. A few common use cases and architectures
4. Next week: Data recovery and failover
28
Thank You!

Common Patterns of Multi Data-Center Architectures with Apache Kafka
