Flink Forward San Francisco 2022.
In normal situations, the default Kafka consumer and producer configuration options work well. But we all know life is not all roses and rainbows, and in this session we'll explore a few knobs that can save the day in atypical scenarios. First, we'll take a detailed look at the parameters available when reading from Kafka. We'll inspect the parameters that help us quickly spot an application lock or crash, the ones that can significantly improve performance, and the ones to touch with gloves, since they could cause more harm than benefit. Moreover, we'll explore the partitioning options and discuss when diverging from the default strategy is needed. Next, we'll discuss the Kafka sink. After browsing the available options, we'll dive deep into understanding how to approach use cases like sinking enormous records, managing spikes, and handling small but frequent updates. If you want to understand how to make your application survive when the sky is dark, this session is for you!
by
Olena Babenko
13. Job Status: RUNNING
Fix in Kafka topic config:
'segment.bytes' >= MAX_MESSAGE_SIZE_IN_BYTES
Reason:
...org.apache.kafka.common.errors.RecordBatchTooLargeException:
The request included message batch larger than the configured
segment size on the server.
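The topic-level fix above can be written as a plain Kafka topic config override; a hedged sketch, assuming records up to roughly 8 MB (the value is an illustration, not a recommendation — size it to your actual largest record):

```properties
# Kafka topic config (not a Flink option): the segment must be able to
# hold the largest record batch written to it. 8 MB here is an assumption.
segment.bytes=8388608
```

Topic configs like this are usually applied at topic creation time or altered afterwards with the Kafka admin tooling; they cannot be set from the Flink side.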
14. Impact on Flink Source
Performance improvement:
'properties.request.timeout.ms' > 30 s (the default)
Job Status: RUNNING (No progress)
Reason:
...INFO...org.apache.kafka.clients.FetchSessionHandler - Error
sending fetch request to node 5:
...org.apache.kafka.common.errors.DisconnectException: null
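In a Flink Kafka source, Kafka client settings are passed through with the `properties.` prefix, so raising the fetch request timeout beyond the 30-second client default is a one-line change (the value below is illustrative):

```properties
# Flink Kafka source option: give slow brokers more time to answer a fetch
# before the client gives up and logs a DisconnectException.
properties.request.timeout.ms=60000
```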
20. Job Status: RUNNING (No progress)
Reason:
...WARN...org.apache.kafka.clients.producer.internals.Sender - Got
error produce response in correlation id 137 on topic-partition
response-0, splitting and retrying (2147483647 attempts left).
Error: MESSAGE_TOO_LARGE
21. Job Status: RUNNING
Fix in Kafka topic config:
'max.message.bytes' >= X * MAX_MESSAGE_SIZE_IN_BYTES
Reason:
...WARN...org.apache.kafka.clients.producer.internals.Sender - Got
error produce response in correlation id 137 on topic-partition
response-0, splitting and retrying (2147483647 attempts left).
Error: MESSAGE_TOO_LARGE
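As a concrete instance of the rule above — assuming a 10 MB maximum record and X = 2 to leave headroom for batching (both numbers are assumptions for illustration):

```properties
# Kafka topic config: must be at least as large as the biggest batch
# the producer may send. 2 * 10 MB = 20 MB here.
max.message.bytes=20971520
```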
22. Job Status: RUNNING (No progress)
Reason:
...WARN...org.apache.kafka.clients.producer.internals.Sender - Got
error produce response in correlation id 137 on topic-partition
response-0, splitting and retrying (2147483647 attempts left).
Error: MESSAGE_TOO_LARGE
23. Job Status: FINISHED
Fix in Flink Sink Config:
'properties.compression.type' = 'snappy' (or 'gzip', 'lz4', 'zstd')
Reason:
...WARN...org.apache.kafka.clients.producer.internals.Sender - Got
error produce response in correlation id 137 on topic-partition
response-0, splitting and retrying (2147483647 attempts left).
Error: MESSAGE_TOO_LARGE
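Wired into a Flink SQL Kafka sink, the compression fix could look like this (table name, schema, topic, and broker address are placeholders):

```sql
-- Sketch: enable producer-side compression on a Kafka sink table.
CREATE TABLE events_sink (
  id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  -- any of 'gzip', 'lz4', 'zstd' works here as well
  'properties.compression.type' = 'snappy'
);
```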
24. Batch compression
Pros:
● Saves disk space on Kafka
● The bigger the batch, the better the compression
● Fewer chances to hit a timeout
25. Batch compression
Cons:
● Every consumer has to decompress messages
● A Flink Sink task runs slower
● Compression uses additional memory (~ 315 MB of JVM heap
per 5000 partitions)
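Since bigger batches compress better, the producer batching knobs are worth tuning alongside compression; a hedged sketch with illustrative values (tune for your own load):

```properties
# Flink Kafka sink options; values are assumptions, not recommendations.
# batch.size: target batch size in bytes (the Kafka default is 16384).
properties.batch.size=262144
# linger.ms: wait briefly to fill batches instead of sending immediately.
properties.linger.ms=50
```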
26. Impact on Flink Source
Job Status: RUNNING
Performance improvement:
'properties.max.partition.fetch.bytes' >=
X * MAX_MESSAGE_SIZE_IN_BYTES
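For example, with a 10 MB maximum record and X = 2, the consumer-side fetch limit could be raised like this (both numbers are assumptions for illustration):

```properties
# Flink Kafka source option: the per-partition fetch buffer must fit the
# largest record batch, or the consumer cannot make progress.
# The Kafka client default is 1048576 (1 MB).
properties.max.partition.fetch.bytes=20971520
```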
34. High load
Warning
To keep record order within a partition when 'retries' > 0 (the default), use either:
'max.in.flight.requests.per.connection' = 1 AND
'enable.idempotence' = false
OR
'max.in.flight.requests.per.connection' = 5 (the default) AND
'acks' = 'all' AND 'enable.idempotence' = true
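The second option above, which keeps pipelining, as a Flink sink property block (a sketch; these are standard Kafka producer configs passed through the `properties.` prefix):

```properties
# Keep per-partition ordering while still pipelining requests:
# idempotence de-duplicates producer retries; acks=all is required for it.
properties.enable.idempotence=true
properties.acks=all
properties.max.in.flight.requests.per.connection=5
```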
51. Flink partitioner vs Kafka partitioner
Flink Partitioner
Class org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner
.option("sink.partitioner", "org.myorg.quickstart.FlinkWeightedPartitioner")
Kafka Partitioner
Class org.apache.kafka.clients.producer.Partitioner
.option("properties.partitioner.class", "org.apache.kafka.clients.producer.RoundRobinPartitioner")
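The core of a custom partitioner like the hypothetical `FlinkWeightedPartitioner` above is a single method that picks a partition per record. A dependency-free sketch of such selection logic (the real class would extend `FlinkKafkaPartitioner` and put this inside its `partition(...)` override; all names here are placeholders):

```java
import java.util.Arrays;

// Dependency-free sketch of partition-selection logic for a custom partitioner.
public class WeightedPartitionDemo {
    // Keyless records pin to the first available partition;
    // keyed records hash onto the partition list.
    static int choosePartition(byte[] key, int[] partitions) {
        if (key == null) {
            return partitions[0];
        }
        return partitions[Math.floorMod(Arrays.hashCode(key), partitions.length)];
    }

    public static void main(String[] args) {
        int[] partitions = {0, 1, 2, 3};
        System.out.println(choosePartition(null, partitions));
        System.out.println(choosePartition("user-42".getBytes(), partitions));
    }
}
```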
52. Impact on Flink Source/Kafka Consumer
❏ A custom partitioner could improve the performance of a Flink Source.
❏ Add a custom ConsumerPartitionAssignor in 'properties.partition.assignment.strategy' if necessary.
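On the consumer side, swapping the assignor is a one-line property. A hedged example using a stock Kafka assignor (a fully custom `ConsumerPartitionAssignor` implementation would put your own class name here):

```properties
# Flink Kafka source option: replace the default partition assignor.
# CooperativeStickyAssignor ships with the Kafka client; a custom
# ConsumerPartitionAssignor can be plugged in the same way.
properties.partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```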
53. Summary
● Enormous messages
● Grouping records into big batches
● Spikes, High load
● Custom partitions
Adapt Apache Kafka Sink/Source according
to business needs and data load
54. Ask me about Flink and Kafka!
Olena Babenko
Senior Software Engineer
olena@aiven.io
Editor's Notes
Kafka is optimised for small but frequent records; to work with big records, some tuning is needed.
Sometimes changing such a property requires Kafka knowledge, not only Flink tuning.
Why does Kafka do that? Why is it so restrictive?
Batch size high, but not too high
Some edge cases it might be uneven. Batch.size too high.
If there is time
Generate only DE, GB, FI, NO ES
Thank you for your attention. If you want to ask questions later, don't hesitate to contact me on LinkedIn or write to me. You can ask questions now.