Fan-in Flames: Scaling Kafka to Millions of Producers With Ryanne Dolan | Current 2022
At supermassive scale, a perennial problem with Kafka is "high fan-in" -- a large number of producers sending records to a small number of brokers. Even a relatively modest amount of data can overwhelm a broker when there are hundreds of thousands of concurrent producer requests.
This talk discusses a few real-world applications where high fan-in becomes a problem, and presents a few strategies for dealing with it. These include: fronting Kafka with an ingestion layer; separating brokers into read-only and write-only subsets; implementing specialized partitioning strategies; and scaling across clusters with "smart clients".
15. Data pipelines can reduce fan-out
One big topic → high fan-out
[Diagram: one big topic with high fan-out]
16. Data pipelines can reduce fan-out
Pipelines → smaller topics → less fan-out
[Diagram: one big topic feeding a filtered topic and a projected topic]
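The filter/project step above can be sketched in plain Python, with dicts and lists standing in for Kafka records and topics (a real pipeline would consume from one topic and produce to another; the record fields here are illustrative):

```python
# Sketch of a pipeline deriving smaller topics from one big topic.
# Plain Python stands in for a Kafka consumer/producer pair.

def filter_topic(records, predicate):
    """Filtered topic: keep only records matching a predicate."""
    return [r for r in records if predicate(r)]

def project_topic(records, fields):
    """Projected topic: keep only a subset of each record's fields."""
    return [{k: r[k] for k in fields if k in r} for r in records]

big_topic = [
    {"app": "search", "level": "ERROR", "msg": "timeout", "host": "h1"},
    {"app": "feed", "level": "INFO", "msg": "ok", "host": "h2"},
]

errors_only = filter_topic(big_topic, lambda r: r["level"] == "ERROR")
slim = project_topic(big_topic, ["app", "msg"])
```

Consumers that only need the errors fetch from the small filtered topic; only the pipeline itself fans out from the big one.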
17. Data pipelines for aggregation
Small topics → aggregated topics → less fan-out
[Diagram: many small topics feeding aggregated topic 1 and aggregated topic 2]
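The aggregation direction can be sketched the same way: a pipeline consumes many small topics and writes one combined topic, so downstream consumers fetch from one place instead of many (again, plain Python stands in for the Kafka plumbing):

```python
import itertools

def aggregate_topics(*small_topics):
    """Aggregated topic: merge many small topics into one stream."""
    return list(itertools.chain.from_iterable(small_topics))

topic_a = [{"app": "a", "msg": "x"}]
topic_b = [{"app": "b", "msg": "y"}]
combined = aggregate_topics(topic_a, topic_b)  # one topic to consume from
```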
18. Use case: Logging infrastructure
Two requirements
[Diagram: all logs routed into application-specific logs and host-specific logs]
19. Use case: Logging infrastructure
Application, container, and host log events
Two reasonable approaches:

One big topic
• All applications send log events to one big topic, which is sent to the cloud
• Data pipeline routes records based on application ID, container ID, host ID
• Derived topics for each application, container, host
• Consumers can process a single application, container, or host

Many small topics
• Each application sends to its own topic
• Data pipeline aggregates across containers and hosts, and sends to cloud
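The routing step in the "one big topic" approach can be sketched as a function from a log event to its derived topics (the topic naming scheme and record fields here are illustrative, not from the talk):

```python
def derived_topic_names(record):
    """Route a log event to per-application, per-container, and per-host
    derived topics based on its IDs."""
    return [
        f"logs.app.{record['app_id']}",
        f"logs.container.{record['container_id']}",
        f"logs.host.{record['host_id']}",
    ]

event = {"app_id": "search", "container_id": "c42", "host_id": "h7", "msg": "GC pause"}
targets = derived_topic_names(event)
```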
28. The Application -- Mirrored
N+1 producers and M+1 consumers per workload
[Diagram: the workload mirrored across Broker 1 and Broker 2]
29. The Workload
A modest amount of data: a constant 10K RPS, 10K QPS, and 300K BPS
30. Latency metrics
As used here:

End-to-End Latency: delay between when a record is created (before send) and when it is ultimately processed by a consumer (after fetch).

Send Latency: delay between when a record is created (before send) and when it is written to disk on the last broker (before ACK).

Fetch Latency: delay between when a record is written to disk on the last broker (before ACK) and when it is ultimately processed by a consumer (after fetch).
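By these definitions, end-to-end latency is exactly send latency plus fetch latency; a minimal sketch, assuming you have the three timestamps for a record:

```python
def latencies(created_ts, acked_ts, processed_ts):
    """Compute the three latency metrics from record timestamps (seconds).
    By construction, end_to_end == send + fetch."""
    send = acked_ts - created_ts           # created -> written on last broker
    fetch = processed_ts - acked_ts        # written on last broker -> processed
    end_to_end = processed_ts - created_ts
    return send, fetch, end_to_end

send, fetch, e2e = latencies(0.000, 0.030, 0.075)
```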
46. Smart clients
1. Round-robin among a subset of partitions → only a fraction of clients can send to a given broker
2. Measure latency and avoid slow partitions
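Both ideas can be sketched as a custom client-side partitioner; this is an illustrative implementation, not the one from the talk (class name, subset size, and the slow-partition signal are assumptions):

```python
import hashlib
import random

class SubsetPartitioner:
    """Strategy 1: each client round-robins over a small, stable,
    client-specific subset of partitions, so only a fraction of clients
    ever send to any given broker."""

    def __init__(self, client_id, num_partitions, subset_size=4):
        # Seed from the client ID so the subset is stable across restarts.
        seed = int(hashlib.sha256(client_id.encode()).hexdigest(), 16)
        self.subset = random.Random(seed).sample(range(num_partitions), subset_size)
        self.i = 0
        self.slow = set()

    def mark_slow(self, partition):
        """Strategy 2: measured send latency too high -> avoid this partition."""
        self.slow.add(partition)

    def next_partition(self):
        # Round-robin within the subset, skipping partitions marked slow.
        for _ in range(len(self.subset)):
            p = self.subset[self.i % len(self.subset)]
            self.i += 1
            if p not in self.slow:
                return p
        return self.subset[self.i % len(self.subset)]  # all slow: fall back
```

In a real client this would plug into the producer's partitioner hook, with `mark_slow` driven by per-partition latency measurements.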
47. Dealing with Fan-in
Some easy strategies

Add more brokers
Fetch-from-follower can be used to split brokers into read vs write sets

Shard workload
Divide workload into non-overlapping groups

Combine Producers
Avoid having many producer clients within the same application
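Sharding into non-overlapping groups can be as simple as hashing a workload key to one cluster, so each cluster sees only a fraction of the producers (cluster names and the key scheme are illustrative):

```python
import hashlib

CLUSTERS = ["kafka-shard-0", "kafka-shard-1", "kafka-shard-2"]  # illustrative names

def shard_for(workload_key):
    """Map each workload to exactly one cluster: the groups are
    non-overlapping, and the mapping is stable across restarts."""
    h = int(hashlib.sha256(workload_key.encode()).hexdigest(), 16)
    return CLUSTERS[h % len(CLUSTERS)]
```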
48. Dealing with Fan-in
Some easy strategies

Mirroring
Separate producers from consumers

Batching
Slow down to get faster!

Smart partitioning
Avoid having multiple producers write to the same partition at the same time
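"Slow down to get faster" is the standard Kafka batching tradeoff: letting the producer wait (`linger.ms`) to fill larger batches (`batch.size`) multiplies the records per produce request and divides the request rate the broker must handle. A back-of-the-envelope sketch (the numbers are illustrative):

```python
def broker_request_rate(records_per_sec, records_per_batch):
    """Bigger batches -> fewer produce requests hitting the broker."""
    return records_per_sec / records_per_batch

unbatched = broker_request_rate(10_000, 1)    # one record per request
batched = broker_request_rate(10_000, 100)    # e.g. after raising linger.ms
# 10,000 requests/s drops to 100 requests/s, at the cost of added send latency
```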