Kafka Meetup Seattle 2019: Mirus, reliable, high performance replication for Apache Kafka
1. Mirus
Reliable, high performance replication for Apache Kafka
pdavidson@salesforce.com
Paul Davidson, Principal Engineer
Seattle Apache Kafka Meetup
April 18, 2019
2. Kafka Data Replication at Salesforce
Must send data from multiple global data centres to aggregate clusters
• Securely
• No data loss
• Minimal latency
• Minimal data duplication
Challenges:
• Rapidly increasing load, multiple DCs
• Dynamic environment: topics frequently added and removed
• WAN connections
3. Kafka Data Replication at Salesforce
Apache Mirror Maker
• Simple tool provided with Apache Kafka
• A Kafka Consumer sending data to a Kafka Producer
• Static configuration
• Regex whitelist for topics
How was it done?
5. Mirror Maker at Scale
● Mirror Maker works well for small static clusters, but ...
● At large scale:
○ Re-balance loops
■ Fix: Careful configuration
● Increase session.timeout.ms, request.timeout.ms
○ Unhandled exceptions (especially in Kafka < 0.11)
■ Fix: Apply upstream patches and stay up-to-date!
○ Poor handling of missing / offline destination partitions
■ Fix: Custom patch for the Consumer Group Coordinator
● Filter any topics currently unavailable in the destination
● Reliable, but definitely a work-around
6. Mirror Maker at Scale
● Poor control of partition assignment impacts performance
○ Stops consuming while committing a batch
○ Large batches required for throughput across WAN
○ Must run many instances per node to achieve good throughput
● Not suitable for API-driven cluster management
○ Static configuration: release and restart to update!
○ Limited control of topic white-list
There must be a better way ...
9. Aside: Kafka Connect
● Built-in Kafka framework
○ For reliably streaming data to and from Kafka
● Dynamic configuration and status with REST API
● Handles cluster management
○ “Distributed Herder” built on Kafka cluster management
○ Backed by compacted Kafka topics
● Continuous ingestion
○ Keeps consuming while committing offsets
○ Supports multiple consumers and producers
10. Introducing Mirus
● Based on the Kafka Connect framework
● Dynamic configuration
○ REST API for configuration updates
○ Precise control of replication
● Configurable parallelism
● Support for task-level consumer and producer monitoring
● Improved resiliency: automated restart on Kafka Connect task failure
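The automated restart mentioned above can be illustrated from the outside using the Kafka Connect REST API. The slides do not show how Mirus implements this internally, so the following is only a minimal sketch: it picks failed tasks out of a connector status payload and builds the matching restart paths. The function names and the sample payload are hypothetical.

```python
import json
from typing import List

def failed_task_ids(status: dict) -> List[int]:
    """Return ids of tasks reported as FAILED in a connector status payload."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]

def restart_paths(connector: str, task_ids: List[int]) -> List[str]:
    """Build the REST paths that would be POSTed to restart each task."""
    return [f"/connectors/{connector}/tasks/{tid}/restart" for tid in task_ids]

# Hypothetical sample, shaped like the response of GET /connectors/<name>/status
sample_status = json.loads("""
{
  "name": "source-name",
  "connector": {"state": "RUNNING", "worker_id": "host-a:8093"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "host-a:8093"},
    {"id": 1, "state": "FAILED", "worker_id": "host-b:8093"}
  ]
}
""")

paths = restart_paths(sample_status["name"], failed_task_ids(sample_status))
```

A monitor thread could run a check like this periodically and issue a POST for each returned path.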
11. Kafka Connect Overview
● Kafka Connect cluster
○ A distributed set of Worker processes
○ One or more Workers per host
○ Managed by Distributed Herder
● Worker Processes
○ Tasks
○ Connectors
■ Source - read from X, write to Kafka
■ Sink - read from Kafka, write to X
12. Mirus Internals
● Mirus includes:
○ Custom Source Connector implementation
○ Custom Source Task implementation
○ Customized Worker entry point
○ Custom monitor threads
13. Mirus “Kafka Monitor”
● The heart of Mirus
○ Thread managed by the Mirus Source Connector
○ Monitors the state of source and destination Kafka clusters
● Handles partition assignment
○ Applies white-list to current state
○ Missing destination topics are counted but not mirrored
○ Triggers rebalances when partition assignments change:
■ Source configuration changes
■ Source partition changes
■ Destination partition creation / deletion
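The partition assignment logic described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the Mirus implementation: cluster state is modelled as plain dictionaries and sets, and the names `partitions_to_mirror` and `needs_rebalance` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

TopicPartition = Tuple[str, int]

def partitions_to_mirror(
    source_partitions: Dict[str, int],   # topic name -> partition count in the source
    destination_topics: Set[str],        # topics that currently exist in the destination
    whitelist_regex: str,
) -> Tuple[List[TopicPartition], List[str]]:
    """Apply the whitelist to the current source state.

    Returns the partitions to mirror and the whitelisted topics that are
    missing in the destination (counted, but not mirrored).
    """
    pattern = re.compile(whitelist_regex)
    assigned: List[TopicPartition] = []
    missing: List[str] = []
    for topic in sorted(source_partitions):
        if not pattern.fullmatch(topic):
            continue
        if topic not in destination_topics:
            missing.append(topic)
            continue
        assigned.extend((topic, p) for p in range(source_partitions[topic]))
    return assigned, missing

def needs_rebalance(previous: Iterable[TopicPartition],
                    current: Iterable[TopicPartition]) -> bool:
    """A rebalance is needed only when the set of assigned partitions changed."""
    return set(previous) != set(current)
```

Comparing the previous and current assignment, rather than reacting to every metadata refresh, keeps rebalances limited to the three triggers listed above.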
15. Dynamic Configuration
● REST API to POST connector configuration updates
○ Connector config dynamic, worker config static
● Config updates trigger rebalance
● Configuration storage location is configurable: Source or Destination Kafka cluster
○ Use the cluster closest to the Mirus Workers
16. Partition and Task Assignment
● Happens on every rebalance
● Mirus partition algorithm pluggable
○ Round-robin by default
○ Could use metadata, e.g. high-throughput.
● Task assignment
○ KC framework: round-robin only
○ Not pluggable (we could patch this).
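A round-robin partition assignment of the kind named above might look like the sketch below. It assumes partitions are represented as (topic, partition) tuples and that the task count is capped by a `max_tasks` value; the function name is hypothetical and the actual Mirus code may differ.

```python
from typing import List, Tuple

TopicPartition = Tuple[str, int]

def round_robin_assign(partitions: List[TopicPartition],
                       max_tasks: int) -> List[List[TopicPartition]]:
    """Deal partitions out to tasks in turn.

    Never creates more task groups than there are partitions, so a large
    max_tasks value does not produce empty tasks.
    """
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    num_tasks = min(max_tasks, len(partitions))
    groups: List[List[TopicPartition]] = [[] for _ in range(num_tasks)]
    # Sort first so the same input always yields the same assignment.
    for i, tp in enumerate(sorted(partitions)):
        groups[i % num_tasks].append(tp)
    return groups
```

A pluggable alternative could replace the `sorted` step with an ordering based on topic metadata, for example spreading high-throughput partitions across tasks first.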
18. What’s Next?
● Integration with our Kafka orchestration tooling
○ Rapidly provision and mirror new topics across multiple clusters
● Topic creation, topic metadata replication
○ Has been requested by users
● Mirus Sink Connector
○ Improved support for multiple destination clusters in push configurations
22. REST API Example
● PUT connector configuration update
○ E.g. increase number of tasks.
● Handler writes to configuration topic
● Distributed Herder triggers rebalance
Increase the number of parallel tasks:
bash-4.1$ curl localhost:8093/connectors/source-name/config \
  -X PUT \
  -H "Content-Type: application/json" \
  --data '{
    "connector.class": "com.salesforce.mirus.kafka.connect.MirusSourceConnector",
    "consumer.bootstrap.servers": "source-hostname:9093",
    "destination.bootstrap.servers": "dest-hostname:9093",
    "name": "source-name",
    "tasks.max": "90",
    "topics.regex": "^topic-name.*$",
    ...
  }'
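The same configuration update can be issued from a script. The sketch below builds the PUT request with Python's standard library and deliberately does not send it; the helper name `build_config_update` is hypothetical, and the settings elided on the slide are left out rather than guessed.

```python
import json
import urllib.request

def build_config_update(base_url: str, connector: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request replacing a connector's configuration."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors/{connector}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

config = {
    "connector.class": "com.salesforce.mirus.kafka.connect.MirusSourceConnector",
    "consumer.bootstrap.servers": "source-hostname:9093",
    "destination.bootstrap.servers": "dest-hostname:9093",
    "name": "source-name",
    "tasks.max": "90",
    "topics.regex": "^topic-name.*$",
    # The slide elides further settings here; they are not reconstructed.
}

request = build_config_update("http://localhost:8093", "source-name", config)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the Kafka Connect worker.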