How is Kafka so Fast?

3. What is Kafka?
• Apache Kafka is a distributed message queue
• Open-sourced by LinkedIn in 2011
• High-throughput
• Highly distributed
• Fault-tolerant
• Low-latency

4. Kafka @ Criteo
• Use cases:
  • GLUP pipeline (aka Kafka Local)
  • Streaming event processing platform (aka Kafka Stream)
• Some figures:
  • 14 clusters / 200 servers / 7 DC
  • Up to 7 million messages / sec
  • Up to 150 TB processed per day

5. Topics, Partitions and Offsets
[Figure: Topic A with Partition 0 and Partition 1, and Topic B with Partitions 0 to 3. Each partition is an ordered sequence of numbered offsets running from old to new, and writes are appended at the new end.]
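To make the topic / partition / offset vocabulary concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's implementation: the `Partition` and `Topic` classes and the payloads are hypothetical names chosen for illustration. The only point it shows is that each partition is an independent append-only sequence, and that an offset is just a record's position within its own partition.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Append-only list of raw byte records; the list index is the offset."""
    records: list = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        # New records always go at the end, so offsets only ever grow.
        self.records.append(payload)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        # Reading never removes anything: any offset can be re-read later.
        return self.records[offset:offset + max_records]


@dataclass
class Topic:
    name: str
    partitions: list

    @classmethod
    def create(cls, name: str, num_partitions: int) -> "Topic":
        return cls(name, [Partition() for _ in range(num_partitions)])


# Hypothetical topic with two partitions, like Topic A on the slide.
topic_a = Topic.create("topic-a", 2)
first = topic_a.partitions[0].append(b"event-1")
second = topic_a.partitions[0].append(b"event-2")
other = topic_a.partitions[1].append(b"event-3")
```

Offsets are per partition, so `first` and `other` are both 0 here: ordering is only meaningful inside one partition.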

6. Complexity inside the clients

7. Brokers
• Manage partitions
  • Receive records from producers for a (topic, partition)
  • Answer consumers asking for records at (topic, partition, offset)
• Manage replicas
• Manage consumer coordination
  • Assign the right partitions to the right consumers
[Figure: A producer sends records to Broker 1 and Broker 2. Consumers send Fetch (Topic A, Partition 4, Offset 10) and Fetch (Topic B, Partition 1, Offset 10) and receive bytes back.]

8. Producers
• Producers decide which partitions to send to;
• Producers can send a batch of messages;
• Producers can compress a batch;
• Producers wait for an acknowledgement from the leader broker (acks=1) or from the leader and its in-sync replicas (acks=all);
[Figure: The producer sends to the broker that is partition leader, which replicates to two replica brokers and returns an ack.]
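The producer-side work listed above (choosing a partition, batching, compressing) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Kafka client: `choose_partition` and `build_batches` are hypothetical helpers, CRC32 stands in for the hash used by the real Java client, and zlib stands in for whichever codec is configured.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Hypothetical partitioner: stable hash of the key modulo partition count.
    # (The real Kafka Java client uses a different hash function.)
    return zlib.crc32(key) % num_partitions


def build_batches(messages, num_partitions):
    """Group (key, value) pairs per partition and compress each group once."""
    grouped = {}
    for key, value in messages:
        grouped.setdefault(choose_partition(key, num_partitions), []).append(value)
    # One compressed blob per partition: the broker can store it as opaque bytes.
    return {p: zlib.compress(b"\n".join(values)) for p, values in grouped.items()}


# 1000 small, similar messages spread over 5 keys (made-up sample data).
messages = [
    (f"user-{i % 5}".encode(), b'{"event": "click", "campaign": 42}')
    for i in range(1000)
]
batches = build_batches(messages, num_partitions=4)

raw_size = sum(len(value) for _, value in messages)
compressed_size = sum(len(blob) for blob in batches.values())
```

Compressing a whole batch works much better than compressing each message alone because similar messages share most of their bytes. In the real producer, this behaviour is driven by settings such as `acks`, `batch.size`, `linger.ms` and `compression.type`.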

9. Consumers
• Consumers control which offset to consume from;
• Consumers commit offsets to Kafka, but they are stored in just another Kafka topic;
• Consumers can receive batched and / or compressed data;
• Kafka coordinates which partitions each consumer will consume from.
[Figure: A consumer asks the broker for records of Partition 2 starting at a given offset, receives a batch of records, then commits its offset.]
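A small sketch of the pull-and-commit loop may help here. It is a toy model, not the Kafka consumer API: `ToyConsumer`, the `offset_store` dictionary (standing in for the offsets topic) and the convention of committing the next offset to read are all assumptions made for illustration.

```python
class ToyConsumer:
    """Pull-based consumer that tracks its own position in one partition."""

    def __init__(self, log, offset_store, partition="partition-2"):
        self.log = log                    # list of records; index == offset
        self.offset_store = offset_store  # stand-in for the offsets topic
        self.partition = partition
        # Resume from the last committed offset, or from the start.
        self.position = offset_store.get(partition, 0)

    def poll(self, max_records=3):
        # The consumer decides where to read from; the log is not modified.
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        # In this sketch the committed value is the next offset to read.
        self.offset_store[self.partition] = self.position


log = [f"m{i}".encode() for i in range(13)]
offset_store = {}

consumer = ToyConsumer(log, offset_store)
first_batch = consumer.poll()    # offsets 0 to 2
consumer.commit()                # committed position is now 3
second_batch = consumer.poll()   # offsets 3 to 5, processed but never committed

# Simulated crash and restart: a new consumer resumes from the committed offset.
restarted = ToyConsumer(log, offset_store)
replayed = restarted.poll()
```

Because the second batch was never committed, the restarted consumer reads it again. The broker keeps no per-message delivery state; the committed offset is the only bookkeeping.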

10. Did you say SSD is better than HDD?

11. Faster, but not that much

12. Only sequential I/O
• Each Kafka partition is mapped to segment files
• Segment file: append-only log structure
• Records are immutable
• The broker does very few random disk seeks
[Figure: Kafka appends to the active segment file; old segment files sit alongside it.]
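The segment-file layout can be sketched as follows. This is a simplified model under stated assumptions: `SegmentedLog` is a hypothetical class, records are length-prefixed for illustration only, and segments roll after a fixed record count, whereas a real broker rolls on size or time. What it does show is that every write is an append to a single active file, and that older files are never modified.

```python
import os
import struct
import tempfile


class SegmentedLog:
    """Append-only log split into segment files named by their base offset."""

    def __init__(self, directory, max_records_per_segment=3):
        self.directory = directory
        self.max_records = max_records_per_segment
        self.next_offset = 0
        self.records_in_active = 0
        self.active = None
        self._roll()

    def _roll(self):
        # Close the active segment; from now on it is an immutable old segment.
        if self.active is not None:
            self.active.close()
        path = os.path.join(self.directory, f"{self.next_offset:020d}.log")
        self.active = open(path, "ab")
        self.records_in_active = 0

    def append(self, payload: bytes) -> int:
        if self.records_in_active >= self.max_records:
            self._roll()
        # Length-prefixed record, written at the end of the active file only.
        self.active.write(struct.pack(">I", len(payload)) + payload)
        self.active.flush()
        offset = self.next_offset
        self.next_offset += 1
        self.records_in_active += 1
        return offset

    def close(self):
        self.active.close()


with tempfile.TemporaryDirectory() as tmp:
    log = SegmentedLog(tmp)
    offsets = [log.append(f"record-{i}".encode()) for i in range(7)]
    log.close()
    segment_files = sorted(os.listdir(tmp))
```

Naming each file after the first offset it contains means a reader can locate the right segment for any offset from the file names alone, without a B-tree or any other index structure over the whole log.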

13. Caching data for free
• Kafka relies on the native Linux page cache (read-ahead and write-behind)
• JVM off-heap cache for free
• Kafka records aren’t deserialized in the Kafka JVM
  • No Java object memory overhead
  • No OutOfMemory issue
  • No big GC pauses
[Figure: Kafka reads and writes the active segment file through the OS, which sits in front of the disk holding the old segment files.]

14. Reliability with replication
• Kafka disk writes are asynchronous
• Kafka replica synchronisation (over the network) is synchronous
• Trust the replicas in case of data corruption / server crash
[Figure: Broker (partition leader), Broker (replica), Broker (previous leader)]

15. Zero Copy

16. Sending data from file to network (traditional approach)
    read(file, tmp_buf, len);
    write(socket, tmp_buf, len);

17. Sending data from file to network (zero-copy approach)
    transferTo(position, count, writableChannel);
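The two approaches on slides 16 and 17 can be compared side by side in a short runnable sketch. It uses Python's `socket.sendfile()`, which relies on `os.sendfile()` where the platform provides it, as a stand-in for the Java `transferTo` call on the slide. The file name, payload and `recv_exactly` helper are assumptions made for illustration, and a local socket pair replaces a real network connection.

```python
import os
import socket
import tempfile


def recv_exactly(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("socket closed early")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


payload = b"kafka-record-bytes" * 100  # small enough to fit in the socket buffer

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.log")
    with open(path, "wb") as f:
        f.write(payload)

    sender, receiver = socket.socketpair()
    with sender, receiver, open(path, "rb") as segment:
        # Traditional approach: copy file bytes into a user-space buffer,
        # then copy that buffer back into the kernel for the socket.
        tmp_buf = segment.read(len(payload))
        sender.sendall(tmp_buf)
        copied = recv_exactly(receiver, len(payload))

        # Zero-copy approach: ask the kernel to move the bytes from the file
        # to the socket without passing them through an application buffer.
        segment.seek(0)
        sent = sender.sendfile(segment, offset=0, count=len(payload))
        zero_copied = recv_exactly(receiver, len(payload))
```

Both paths deliver the same bytes. The difference is that in the second one the application never holds the data, which fits a broker that stores and forwards opaque bytes without deserializing them.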

18. Make things simple

19. Key takeaways
• Parallelism based on topic partitions;
• Data compressed/uncompressed on the client;
• Producers send a batch of messages;
• No serialization/deserialization costs on the brokers;
• Writing directly to file:
  • Append only (cheaper disks);
  • No complex data structure (no B-tree or LSM tree);
  • Uses OS memory management;
• Relies on replicas, not on disks;
• Zero-copy;

20. Thank you! #rivers

Editor's Notes

Quick introduction of each speaker, then a short agenda: first the Kafka basics, then the design choices that made it a great tool at our scale.
Why this name: simply because the initial creator (Jay Kreps) liked the author Kafka, liked the fact that he was a writer, and thought it was a good name for an open-source project.
A topic is just like a table in a DB, but for a queue: in a queue we call it a topic. You send messages to the bid request topic and you receive messages from the billable click topic. Partitions are sections of a topic. So here Topic A has two partitions and Topic B has four. Partitions are spread over different servers, but one partition always fits on one server. Topics can be, for example, bid request and billable click; bid request has 1000 partitions. Ordering is guaranteed only inside a partition. Each message has a monotonic offset.
Focus on:
- Kafka just stores bytes, with no schema, so you can send an image through Kafka if you want (not a wonderful idea, but it works).
The first thing we want to explain is that the complexity is not in the server but in the clients.
Producer and consumer. The broker is dumb. Difference with RabbitMQ or other queues: you can have a huge queue if you want (cf. event sourcing store), the limit is the disk; the broker doesn’t track the status of a message (whether it was received or not); and it is pull, not push. You can group consumers together to create a consumer group, and so a distributed application. The broker manages the coordination of consumers to assign the right partitions to the right consumers.
Focus on:
- No SPOF / no broker acting as a gateway for the cluster: the producer maintains the mapping (topic, partition) -> broker.
- A batch is only logical: one physical message (one send request / ack) contains several messages.
- Batch advantages: compression is efficient and network acks are efficient (one ack per 1,000 messages, for instance).
Warning: consumers receive compressed batch data only if the producer sent it that way.
Cost efficiency + highest performance. The advantage here is being able to use JBOD or RAID. Using SSDs would cost more for equal or even lower performance.
- Same caching approach as Varnish (HTTP cache server).
- Designed to work with Linux only.
- A heap of 4 GB is enough because it holds no data (it only manages metadata and client connections).
Disk writes are async (and that's OK because replication over the network is sync).
