Kafka meetup seattle 2019 mirus reliable, high performance replication for apache kafka

Mirus
Reliable, high performance replication for Apache Kafka
pdavidson@salesforce.com
Paul Davidson, Principal Engineer
Seattle Apache Kafka Meetup
April 18, 2019

Kafka Data Replication at Salesforce
Must send data from multiple global data centres to aggregate clusters
• Securely
• No data loss
• Minimal latency
• Minimal data duplication
Challenges:
• Rapidly increasing load, multiple DCs
• Dynamic environment: topics frequently added and removed
• WAN connections

Kafka Data Replication at Salesforce
How was it done?
Apache Mirror Maker
• Simple tool provided with Apache Kafka
• A Kafka Consumer sending data to a Kafka Producer
• Static configuration
• Regex whitelist for topics

Mirror Maker at Scale
● Mirror Maker works well for small static clusters, but ...
● At large scale:
○ Re-balance loops
■ Fix: Careful configuration
● Increase session.timeout.ms, request.timeout.ms
○ Unhandled exceptions (especially in Kafka < 0.11)
■ Fix: Apply upstream patches and stay up-to-date!
○ Poor handling of missing / offline destination partitions
■ Fix: Custom patch for the Consumer Group Coordinator
● Filter any topics currently unavailable in the destination
● Reliable, but definitely a work-around

Mirror Maker at Scale
● Poor control of partition assignment impacts performance
○ Stops consuming while committing a batch
○ Large batches required for throughput across WAN
○ Must run many instances per node to achieve good throughput
● Not suitable for API-driven cluster management
○ Static configuration: release and restart to update!
○ Limited control of topic white-list
There must be a better way ...

Introducing...
Mirus
Dynamic replication service based on Kafka Connect

Mirus?
Latin root of Mirror: “wonderful, marvelous”

Aside: Kafka Connect
● Built-in Kafka framework
○ For reliably streaming data to and from Kafka
● Dynamic configuration and status with REST API
● Handles cluster management
○ “Distributed Herder” built on Kafka cluster management
○ Backed by compacted Kafka topics
● Continuous ingestion
○ Keeps consuming while committing offsets
○ Supports multiple consumers and producers

Introducing Mirus
● Based on the Kafka Connect framework
● Dynamic configuration
○ REST API for configuration updates
○ Precise control of replication
● Configurable parallelism
● Support for task-level consumer and producer monitoring
● Improved resiliency: automated restart on Kafka Connect task failure
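
To make the "automated restart on task failure" idea concrete, here is a minimal sketch of how failed tasks could be detected from the outside using the standard Kafka Connect REST API (GET /connectors/{name}/status, POST /connectors/{name}/tasks/{id}/restart). This is an illustration only: Mirus handles restarts internally with its own monitor threads, and the function name and example payload below are hypothetical.

```python
from typing import Dict, List


def failed_task_restart_paths(connector: str, status: Dict) -> List[str]:
    """Return the REST paths to POST to in order to restart each FAILED task.

    `status` is assumed to have the shape returned by
    GET /connectors/{name}/status in Kafka Connect.
    """
    paths = []
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            paths.append(f"/connectors/{connector}/tasks/{task['id']}/restart")
    return paths


# Hypothetical status payload: task 1 has failed, task 0 is healthy.
example_status = {
    "name": "source-name",
    "connector": {"state": "RUNNING", "worker_id": "worker-1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "worker-1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "worker-2:8083", "trace": "..."},
    ],
}

print(failed_task_restart_paths("source-name", example_status))
```

A supervising loop would poll the status endpoint periodically and POST to each returned path; polling interval and back-off are left out of this sketch.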

Kafka Connect Overview
● Kafka Connect cluster
○ A distributed set of Worker processes
○ One or more Workers per host
○ Managed by Distributed Herder
● Worker Processes
○ Tasks
○ Connectors
■ Source - read from X, write to Kafka
■ Sink - read from Kafka, write to X

Mirus Internals
● Mirus includes:
○ Custom Source Connector implementation
○ Custom Source Task implementation
○ Customized Worker entry point
○ Custom monitor threads

Mirus “Kafka Monitor”
● The heart of Mirus
○ Thread managed by the Mirus Source Connector
○ Monitors the state of source and destination Kafka clusters
● Handles partition assignment
○ Applies white-list to current state
○ Missing destination topics are counted but not mirrored
○ Triggers rebalances when partition assignments change:
■ Source configuration changes
■ Source partition changes
■ Destination partition creation / deletion
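
The partition-assignment steps above can be sketched roughly as follows. This is a simplified model, not the Mirus implementation: the function names, the dictionary inputs, and the whole-name regex match are assumptions made for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]


def partitions_to_mirror(
    source_partitions: Dict[str, int],
    destination_topics: Set[str],
    whitelist_regex: str,
) -> Tuple[List[TopicPartition], List[str]]:
    """Apply the whitelist to the current cluster state.

    source_partitions maps each source topic to its partition count.
    Returns (partitions to mirror, whitelisted topics missing in destination).
    """
    pattern = re.compile(whitelist_regex)
    assigned: List[TopicPartition] = []
    missing: List[str] = []
    for topic in sorted(source_partitions):
        if not pattern.fullmatch(topic):
            continue  # not whitelisted
        if topic not in destination_topics:
            missing.append(topic)  # counted, but not mirrored
            continue
        assigned.extend((topic, p) for p in range(source_partitions[topic]))
    return assigned, missing


def needs_rebalance(previous: List[TopicPartition], current: List[TopicPartition]) -> bool:
    """A rebalance is requested only when the set of assigned partitions changes."""
    return set(previous) != set(current)


source = {"topic-name-a": 2, "topic-name-b": 1, "other": 3}
before, missing = partitions_to_mirror(source, {"topic-name-a"}, r"^topic-name.*$")
# The destination topic is created later, so the next poll sees it.
after, _ = partitions_to_mirror(source, {"topic-name-a", "topic-name-b"}, r"^topic-name.*$")
print(before, missing, needs_rebalance(before, after))
```

In this model the monitor thread would call `partitions_to_mirror` on each poll and ask the framework for a rebalance only when `needs_rebalance` returns true.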

Dynamic Configuration
● REST API to POST connector configuration updates
○ Connector config dynamic, worker config static
● Config updates trigger rebalance
● Configuration storage location is configurable: can be in the Source or Destination Kafka cluster
○ Use the cluster closest to the Mirus Workers

Partition and Task Assignment
● Happens on every rebalance
● Mirus partition algorithm pluggable
○ Round-robin by default
○ Could use metadata, e.g. high-throughput.
● Task assignment
○ KC framework: round-robin only
○ Not pluggable (we could patch this).
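
As an illustration of the default round-robin strategy, here is a minimal sketch that spreads topic partitions across a fixed number of tasks. The function name and the sorted input order are assumptions for the example; the actual Mirus code is written in Java against the Kafka Connect API.

```python
from typing import List, Tuple

TopicPartition = Tuple[str, int]


def round_robin_assign(
    partitions: List[TopicPartition], num_tasks: int
) -> List[List[TopicPartition]]:
    """Deal partitions out to tasks one at a time, like cards."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    tasks: List[List[TopicPartition]] = [[] for _ in range(num_tasks)]
    # Sorting makes the assignment deterministic across rebalances.
    for index, tp in enumerate(sorted(partitions)):
        tasks[index % num_tasks].append(tp)
    return tasks


example = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)]
print(round_robin_assign(example, 2))
```

A metadata-aware strategy, as suggested on the slide, could replace the modulo step with one that balances tasks by expected throughput instead of partition count.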

Open Sourced
Released September 2018
BSD License
https://github.com/salesforce/mirus

What’s Next?
● Integration with our Kafka orchestration tooling
○ Rapidly provision and mirror new topics across multiple clusters
● Topic creation, topic metadata replication
○ Has been requested by users
● Mirus Sink Connector
○ Improved support for multiple destination clusters in push configurations

REST API Example
● PUT connector configuration update
○ E.g. increase number of tasks.
● Handler writes to configuration topic
● Distributed Herder triggers rebalance
Increase the number of parallel tasks
bash-4.1$ curl localhost:8093/connectors/source-name/config \
    -X PUT \
    -H "Content-Type: application/json" \
    --data '{
        "connector.class": "com.salesforce.mirus.kafka.connect.MirusSourceConnector",
        "consumer.bootstrap.servers": "source-hostname:9093",
        "destination.bootstrap.servers": "dest-hostname:9093",
        "name": "source-name",
        "tasks.max": "90",
        "topics.regex": "^topic-name.*$",
        ...
    }'

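Push Mode Replication
[Diagram: each Leaf DC hosts a Kafka Cluster and a Mirus Cluster; Mirus sends data across the WAN to the Kafka Cluster in the Aggregate DCs]

Pull Mode Replication
[Diagram: the Aggregate DCs host a Kafka Cluster and a Mirus Cluster; Mirus reads across the WAN from the Kafka Cluster in each Leaf DC]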
