"Apache Flink and Kafka Streams are the dominant stream processing technologies available today. But which one should you choose to build an event streaming application? Both offer similar features despite being distinctly different distributed processing frameworks, so how do you go about selecting the right one?
In this talk, attendees will learn the information needed to match their event streaming requirements and objectives with the correct streaming framework. Specifically, I'll cover the following topics:
1. Architecture
2. Deploying applications
3. Work/Task Assignment and Distribution
4. API comparison
5. Stateful operations and state durability
You'll leave with the knowledge of both Kafka Streams and Flink's strengths and weaknesses and enough information to determine which framework best suits your event streaming needs."
@bbejeck
Task Distribution & Assignment – Flink
• Flink doesn’t operate on key-value pairs directly
• Keys are virtual – defined by functions (e.g., a key selector)
• Individual keys are allocated to Key Groups for distribution
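The key-to-Key-Group mechanics above can be sketched in plain Java. This is a simplified illustration, not Flink's actual implementation (Flink applies a murmur hash to the key's `hashCode()` before taking the modulo, via its `KeyGroupRangeAssignment` utility); the method names here are made up for the sketch:

```java
// Simplified sketch of key -> key group -> parallel subtask assignment.
// Method names are hypothetical; Flink's real logic lives in
// KeyGroupRangeAssignment and uses a murmur hash, not a raw modulo.
public class KeyGroupSketch {

    // A key lands in one of maxParallelism key groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Each parallel subtask owns a contiguous range of key groups,
    // so state can be reassigned in whole key groups when rescaling.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups
        int parallelism = 4;      // running subtasks

        int kg = keyGroupFor("AAPL", maxParallelism);
        int subtask = subtaskFor(kg, maxParallelism, parallelism);
        System.out.println("key group " + kg + " -> subtask " + subtask);
    }
}
```

Because state moves in whole key groups, rescaling a job only reshuffles group ranges between subtasks rather than rehashing every individual key.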
Task Distribution & Assignment – Flink
Max Parallelism
• Determines the number of Key Groups per operator
• Limits the number of parallel tasks that keyed state can scale to
• Set as a default at the cluster level; jobs can override it individually
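As a hedged example of those two levels: assuming the `pipeline.max-parallelism` option in `flink-conf.yaml`, a cluster-wide default might look like the fragment below, while an individual job can override it with `StreamExecutionEnvironment#setMaxParallelism` (the value 128 here is illustrative, not a recommendation):

```yaml
# flink-conf.yaml -- illustrative value
# Default max parallelism (i.e., number of key groups) for jobs on this cluster;
# a job can override it via env.setMaxParallelism(...)
pipeline.max-parallelism: 128
```

Choosing the value up front matters: keyed state can never scale beyond this bound without changing it, so it is typically set well above the expected running parallelism.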
API – Flink
// default parallelism set to 2 in configuration
KafkaSource<StockTrade> kafkaStockTradeSource =
    KafkaSource.<StockTrade>builder()
        .setTopics("stock-trades")
        .setBootstrapServers("bootstrap-servers")
        .setProperties(properties)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setGroupId("flink-stock-trade")
        .setDeserializer(new StockTradeDeserializationSchema())
        .build();
API - Flink
class StockTradeDeserializationSchema implements KafkaRecordDeserializationSchema<StockTrade> {
    // Details omitted for clarity
    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record,
                            Collector<StockTrade> out) throws IOException {
        // deserialize key and value
        // create the StockTrade object ("trade") from the key and value
        out.collect(trade);
    }
}
API - Flink
// Details omitted for clarity
class StockTradeSerializationSchema implements KafkaRecordSerializationSchema<TradeAgg> {
    final String topic; // constructor param

    @Override
    public ProducerRecord<byte[], byte[]> serialize(TradeAgg aggregate,
                                                    KafkaSinkContext context,
                                                    Long timestamp) {
        // serialize key and value from the aggregate object
        return new ProducerRecord<>(...);
    }
}
Summary
• Kafka Streams offers flexible deployment
  • High availability for processing
  • Standby tasks reduce the time to restore stateful processing
• Flink is highly optimized
  • Good for large stateful operations
  • Snapshots offer quick recovery for moderate state