This document discusses Kafka deployment at online advertising company Criteo. Some key points:
1. Criteo uses Kafka to process up to 10 million messages per second and 180 TB of data per day across 13 Kafka clusters spanning multiple datacenters.
2. They size partitions at roughly 1 GB/partition/hour with 72-hour retention, i.e. about 72 GB retained per partition, which has led to topics with more than 1,300 partitions.
3. Criteo developed an in-house, open-source C# Kafka client optimized for high throughput, with the ability to blacklist partitions when needed. They are upgrading it to support new Kafka features such as idempotent producers and transactions.
4. Lag is the key client-facing metric: Criteo bases SLAs on it and uses watermark messages to measure it even when no messages are sent or when partitions are blacklisted, newly added, or offline.
Some Figures
• Up to 10 million msgs/s (500 billion msgs/day)
• 180 TB/day (compressed)
• Around 200 brokers
• Kafka in production for 4 years
The Infrastructure
• Multiple datacenters
• Bare-metal servers
• Servers managed by Chef
• User applications running on Mesos or YARN
• 13 Kafka clusters
Partition size
• How do we define the number of partitions?
• 1 GB/partition/hour
• 72 h retention
• We try to keep 72 GB per partition (see the sizing sketch below)
• No key, so no problem when increasing partitions
• We have topics with more than 1,300 partitions
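A quick back-of-the-envelope version of this sizing rule (the numbers and names are illustrative, not Criteo's actual tooling):

    using System;

    class PartitionSizing
    {
        // Rule from the slide: ~1 GB/partition/hour with 72 h retention,
        // i.e. ~72 GB retained per partition.
        const double GbPerPartitionPerHour = 1.0;

        // peakGbPerHour: the topic's peak ingest rate (compressed).
        static int PartitionsFor(double peakGbPerHour) =>
            (int)Math.Ceiling(peakGbPerHour / GbPerPartitionPerHour);

        static void Main()
        {
            // A topic ingesting 1,300 GB/hour at peak needs ~1,300
            // partitions, matching the figure above.
            Console.WriteLine(PartitionsFor(1300.0)); // 1300
        }
    }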
In-house C# Kafka Client
• First C# implementation
• Open source (https://github.com/criteo/kafka-sharp/)
• Built for high throughput
• Ability to blacklist partitions (see the sketch below)
  • It discards messages if needed
• Our use:
  • No key partitioning
  • No per-partition ordering guarantee
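The real API lives in the linked repository; the sketch below only illustrates the two behaviours named above, keyless round-robin partitioning plus a partition blacklist that discards messages when nothing is writable, with all names invented for the example:

    using System.Collections.Generic;

    // Illustrative only: round-robin over non-blacklisted partitions,
    // discarding the message when every partition is blacklisted.
    class BlacklistingPartitioner
    {
        private readonly int _partitionCount;
        private readonly HashSet<int> _blacklist = new HashSet<int>();
        private int _next;

        public BlacklistingPartitioner(int partitionCount) =>
            _partitionCount = partitionCount;

        public void Blacklist(int partition) => _blacklist.Add(partition);
        public void Unblacklist(int partition) => _blacklist.Remove(partition);

        // Returns the target partition, or null to signal "discard".
        public int? NextPartition()
        {
            for (int i = 0; i < _partitionCount; i++)
            {
                int candidate = _next;
                _next = (_next + 1) % _partitionCount;
                if (!_blacklist.Contains(candidate))
                    return candidate;
            }
            return null; // every partition blacklisted: drop, don't block
        }
    }

Rerouting a message to a different partition after a blacklist event is one reason the per-partition ordering guarantee is given up.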
Trade-off
Cons
• Costly to maintain
• Difficult to keep up to date
Pros
• Highly customizable
• Optimized for our use case
• Full control during the migration
SLA based on lag
• Lag is our main metric for the clients.
• We must be able to measure the lag in all conditions (see the sketch below):
  • No messages sent
  • Blacklisted partitions
  • New partitions added
  • Offline partitions
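A minimal sketch of why plain offset arithmetic is not enough here (the inputs are hypothetical stand-ins for metadata lookups, not Criteo's tooling): per-partition lag is just the log-end offset minus the committed offset, and that number is blind in exactly the conditions listed above.

    using System;

    class OffsetLag
    {
        // null models an offline or blacklisted partition, where the
        // offsets cannot be fetched at all.
        static long? PartitionLag(long? logEndOffset, long? committedOffset)
        {
            if (logEndOffset is null || committedOffset is null)
                return null; // offset-based lag is undefined here
            return Math.Max(0, logEndOffset.Value - committedOffset.Value);
        }

        static void Main()
        {
            // On an idle topic both offsets stop moving, so lag reads 0
            // whether the consumer is healthy or dead; offsets alone cannot
            // tell the difference, which motivates the watermarks below.
            Console.WriteLine(PartitionLag(100, 100)); // 0
            Console.WriteLine(PartitionLag(null, 42)); // blank: undefined
        }
    }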
Watermarks
• Special messages sent to each partition.
• They contain a monotonic timestamp.
• They provide a clock for the stream of messages.
• If a message has a timestamp lower than the previous timestamp, it is late (see the sketch below).
[Figure: streams for partitions 1, 2 and 4 carrying interleaved watermark timestamps, ordered from new to old]
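A minimal sketch of the watermark idea, with all types invented for the example and assuming the producer injects a timestamped watermark into every partition at a fixed interval:

    using System;
    using System.Collections.Generic;

    class WatermarkTracker
    {
        // Highest watermark timestamp seen so far, per partition.
        private readonly Dictionary<int, long> _highestTs =
            new Dictionary<int, long>();

        // Returns true if the message is late: its timestamp is lower than
        // the highest timestamp already seen on that partition.
        public bool Observe(int partition, long timestampMs)
        {
            if (_highestTs.TryGetValue(partition, out long highest) &&
                timestampMs < highest)
                return true; // late message
            _highestTs[partition] = timestampMs;
            return false;
        }

        // Time-based lag: how far the partition's clock trails wall-clock
        // time. Meaningful even when no business messages flow, because
        // watermarks keep arriving on every partition.
        public TimeSpan Lag(int partition, long nowMs) =>
            TimeSpan.FromMilliseconds(
                nowMs - _highestTs.GetValueOrDefault(partition, nowMs));
    }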
Front and Application clusters
• Producers/consumers can overload the cluster.
• Overloaded brokers may lead to data loss.
• Front cluster: receives all data, which goes to HDFS.
• Application clusters: streaming use cases.
[Diagram: Online Service → Front cluster → HDFS, with Application clusters fed from the Front cluster]
Replication
• Kafka Connect application running on Mesos.
• Custom connector.
• Writes offsets on the destination (see the sketch below).
• Replication within or across datacenters.
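Storing the source offset on the destination is what makes the replicator restartable without a separate offset store; a conceptual sketch, with every name hypothetical rather than taken from the actual connector:

    using System.Collections.Generic;

    record SourceRecord(long Offset, byte[] Value);

    // Hypothetical destination API: appends a record together with its
    // source offset and reports the last offset it has stored.
    interface IDestination
    {
        long LastStoredSourceOffset { get; }
        void Append(long sourceOffset, byte[] value);
    }

    class Replicator
    {
        // On restart, resume from the destination's own view of progress;
        // records at or below that offset were already replicated, so
        // duplicates are filtered out by offset.
        public static void Replicate(IEnumerable<SourceRecord> source,
                                     IDestination dest)
        {
            long resumeFrom = dest.LastStoredSourceOffset;
            foreach (var rec in source)
            {
                if (rec.Offset <= resumeFrom) continue; // already copied
                dest.Append(rec.Offset, rec.Value);
            }
        }
    }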
More generic problems: Kappa and Kafka
[Figure: a topic's partitions 0 through 4]
Kafka new features
• We are upgrading our C# Kafka client.
• We look forward to new features (see the sketch below):
  • Idempotent producers
  • Transactions
  • Headers
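For reference, here is how these three features look in a client that already exposes them, the Confluent .NET client (broker address, topic names, and ids are placeholders); this is not the in-house client's API:

    using System;
    using System.Text;
    using Confluent.Kafka;

    class TransactionalProducerSketch
    {
        static void Main()
        {
            var config = new ProducerConfig
            {
                BootstrapServers = "localhost:9092", // placeholder
                EnableIdempotence = true,            // no duplicates on retry
                TransactionalId = "demo-producer-1"  // atomic multi-partition writes
            };

            using var producer = new ProducerBuilder<Null, string>(config).Build();
            producer.InitTransactions(TimeSpan.FromSeconds(10));

            // Both messages become visible atomically; the first also
            // carries a header.
            producer.BeginTransaction();
            producer.Produce("topic-a", new Message<Null, string>
            {
                Value = "hello",
                Headers = new Headers { { "source-dc", Encoding.UTF8.GetBytes("dc1") } }
            });
            producer.Produce("topic-b", new Message<Null, string> { Value = "world" });
            producer.CommitTransaction(TimeSpan.FromSeconds(10));
        }
    }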
Streaming
• Challenges:
  • More and more streaming use cases.
  • Multiple frameworks: Flink, Kafka Connect, Kafka Streams, plain consumers.
  • Clients running on Mesos, YARN, or bare metal.
• Pulsar and Pravega evaluation.
• We are working on a framework to help:
  • Release
  • Schedule
  • Scale
  • Monitor
  • Maintain
Because of GC pauses, buffering time, and the partition-blacklist logic, some data can arrive out of order.
When you have a Spark hammer, everything looks like a micro-batch. The micro-batch paradigm feels natural to people who come from the batch world into near-line data processing.
- The main problem with Spark Streaming was not the latency.
- No event-time processing means no accurate data processing.
It doesn't matter how fast you are if you are wrong.
Processing time cannot give you reproducibility: if your logic depends on time, you need to implement some form of event time.
Why can't we have a Kappa architecture for such a pipeline? Job state is the main bottleneck we see in our systems. It means we still need a supervisor that restarts a job from an earlier offset when it detects that the job cannot catch up (see the sketch below). If partitions are unbalanced in Kafka, we risk issues with catching up.
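The supervisor idea from these notes, as a minimal sketch; the threshold and the restart hook are invented for illustration:

    using System;

    // If a job's lag keeps growing past a bound, it will not catch up on
    // its own, so restart it from a safe earlier offset.
    class CatchUpSupervisor
    {
        private long _previousLag;
        private readonly long _maxLag;
        private readonly Action<long> _restartFromOffset; // hypothetical hook

        public CatchUpSupervisor(long maxLag, Action<long> restartFromOffset)
        {
            _maxLag = maxLag;
            _restartFromOffset = restartFromOffset;
        }

        // Called on each lag sample (e.g. derived from watermarks).
        public void OnLagSample(long lag, long safeRestartOffset)
        {
            bool growing = lag > _previousLag;
            if (lag > _maxLag && growing)
                _restartFromOffset(safeRestartOffset);
            _previousLag = lag;
        }
    }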