Flink Forward San Francisco 2022.
Flink consumers read from Kafka as a scalable, high-throughput, and low-latency data source. However, scaling out data streams becomes challenging when migrations and multiple Kafka clusters are required. Thus, we introduced a new Kafka source that reads sharded data across multiple Kafka clusters in a way that conforms well with elastic, dynamic, and reliable infrastructure. In this presentation, we will present the source design and show how the solution increases application availability while reducing maintenance toil. Furthermore, we will describe how we extended the existing KafkaSource to provide mechanisms to read logical streams located on multiple clusters, to dynamically adapt to infrastructure changes, and to perform transparent cluster migrations and failover.
by
Mason Chen
13. Manual Migration Steps (User)
• Change the source uid
• Change the bootstrap servers
• Upgrade the application
• Restart with non-restored state
• Change parallelism and resources to catch up with the lag
• Revert to the steady-state configuration once caught up
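The first two steps above amount to a small configuration change; a sketch, assuming the job reads its Kafka settings from a properties file (the key names are hypothetical):

```diff
- source.uid=kafka-source-cluster-a
- kafka.bootstrap.servers=cluster-a-broker-1:9092
+ source.uid=kafka-source-cluster-b
+ kafka.bootstrap.servers=cluster-b-broker-1:9092
```

Because the uid changes, the new source operator cannot match the old operator's state, so the job must be restarted allowing non-restored state and begins reading fresh from the new cluster, which is why the catch-up step follows.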
14. Manual Migration Steps: Drawbacks
• Application downtime
• Increased system resources needed for catch-up
• Manual user toil
• A user could have 100+ jobs
• Multiple hours of team coordination
15. Scaling Multiple Kafka Clusters
• Hybrid cloud: on-prem, private cloud, and public cloud providers
• Scalability
• Topic sharding
• Operability and failover
• In-place upgrades are complex and error-prone
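Topic sharding here means a single logical stream is spread over physical topics on several clusters, so a reader must resolve the stream to every (cluster, topic) pair. A minimal sketch of such a lookup, where the stream names, cluster addresses, and metadata shape are all hypothetical (in practice this would come from a metadata service):

```java
import java.util.List;
import java.util.Map;

public class StreamMetadata {
    // Hypothetical metadata: logical stream id -> (cluster -> physical topics).
    static final Map<String, Map<String, List<String>>> STREAMS = Map.of(
            "user-events", Map.of(
                    "cluster-a:9092", List.of("user-events-shard-0", "user-events-shard-1"),
                    "cluster-b:9092", List.of("user-events-shard-2")));

    // Resolve a logical stream to all (cluster, topics) pairs a reader must consume.
    static Map<String, List<String>> resolve(String streamId) {
        return STREAMS.getOrDefault(streamId, Map.of());
    }

    public static void main(String[] args) {
        resolve("user-events").forEach((cluster, topics) ->
                System.out.println(cluster + " -> " + topics));
    }
}
```

With this shape, adding a shard or a cluster only changes metadata, not the consuming application's code.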
55. Multi Cluster Kafka Source Benefits
• Migrations and failover are automated transparently within the source
• Simplifies operations between compute and storage infra
• Compatible with Hybrid Source
• Can be leveraged for topic migration
56. Future Work
• Integrate with split-level watermark alignment
• Optimizations to remove only the affected readers
• FLIP-246 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source)
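One way to read the "affected readers" optimization: when cluster metadata changes, only readers for clusters whose topic assignment actually changed need to be restarted, while the rest keep running. A small set-difference sketch of that idea (the data shapes are hypothetical, not the FLIP-246 implementation):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class AffectedReaders {
    // Given old and new (cluster -> assigned topics) maps, return the clusters
    // whose assignment changed; unchanged clusters can keep their readers.
    static Set<String> affected(Map<String, Set<String>> oldAssign,
                                Map<String, Set<String>> newAssign) {
        Set<String> clusters = new HashSet<>(oldAssign.keySet());
        clusters.addAll(newAssign.keySet());
        Set<String> changed = new HashSet<>();
        for (String cluster : clusters) {
            if (!Objects.equals(oldAssign.get(cluster), newAssign.get(cluster))) {
                changed.add(cluster);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> before = Map.of(
                "cluster-a", Set.of("t0"), "cluster-b", Set.of("t1"));
        Map<String, Set<String>> after = Map.of(
                "cluster-a", Set.of("t0"), "cluster-c", Set.of("t1"));
        // cluster-b was removed and cluster-c added; cluster-a is untouched.
        System.out.println(affected(before, after));
    }
}
```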