How is Kafka so fast?
Understanding the design of Kafka and how it handles the Criteo workload
Ricardo Paiva and Hervé Rivière
What is Kafka?
Apache Kafka is a distributed message queue
• Open-sourced by LinkedIn in 2011
• High-throughput
• Highly distributed
• Fault-tolerant
• Low-latency
Kafka @ Criteo
• Use cases
  • GLUP pipeline (aka Kafka Local)
  • Streaming event processing platform (aka Kafka Stream)
• Some figures:
  • 14 clusters / 200 servers / 7 data centers
  • Up to 7 million messages / sec
  • Up to 150 TB processed per day
Topics, Partitions and Offsets
[Figure: Topic A with two partitions (Partition 0, Partition 1) and Topic B with four partitions (Partition 0 to Partition 3). Each partition is an ordered sequence of records numbered by offset, from 0 (old) upward (new). Writes are appended at the new end, and the partitions hold different numbers of records.]
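The topic / partition / offset model can be sketched in a few lines of Java. This is a toy in-memory model, not Kafka's storage code: the class `ToyTopic` and its methods are hypothetical names introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory topic (hypothetical, for illustration only). */
class ToyTopic {
    // One append-only list of raw byte records per partition.
    private final List<List<byte[]>> partitions = new ArrayList<>();

    ToyTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends a record to one partition and returns the offset assigned to it. */
    long append(int partition, byte[] record) {
        List<byte[]> log = partitions.get(partition);
        log.add(record);
        return log.size() - 1L; // offsets start at 0 and grow by one per record
    }

    /** Returns the record stored at (partition, offset). */
    byte[] read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    public static void main(String[] args) {
        ToyTopic topicA = new ToyTopic(2); // like Topic A: two partitions
        long first = topicA.append(0, "bid-1".getBytes(StandardCharsets.UTF_8));
        long second = topicA.append(0, "bid-2".getBytes(StandardCharsets.UTF_8));
        long other = topicA.append(1, "bid-3".getBytes(StandardCharsets.UTF_8));
        // Offsets are assigned per partition, so ordering is only defined inside one partition.
        System.out.println(first + " " + second + " " + other);
    }
}
```

Records are stored as plain `byte[]` on purpose: the model has no schema, it only stores bytes.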
Complexity inside the clients
Brokers
• Manage partitions
  • Receive records from producers for a (topic, partition)
  • Answer consumers asking for records at a (topic, partition, offset)
• Manage replicas
• Manage consumer coordination
  • Assign the right partitions to the right consumers
[Diagram: a producer, Broker 1, Broker 2 and two consumers. One consumer sends Fetch (Topic A, Partition 4, Offset 10), the other sends Fetch (Topic B, Partition 1, Offset 10); each receives bytes in response.]
Producers
[Diagram: a producer sends to the broker that is the partition leader; two other brokers hold replicas; an ack is returned to the producer.]
• Producers decide which partition to send to;
• Producers can send messages in batches;
• Producers can compress a batch;
• Producers wait for an acknowledgement from the partition leader (acks=1) or from the leader and its in-sync replicas (acks=all);
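The producer-side choices listed above can be sketched as follows. The property keys (`acks`, `batch.size`, `linger.ms`, `compression.type`, `bootstrap.servers`) are standard Kafka producer configuration names; everything else is an assumption made for illustration: the class `ProducerSketch`, the broker address, the numeric values, and the partition function, which is a simplified stand-in and not the hash Kafka's default partitioner uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

/** Producer-side choices from the slide (hypothetical helper class). */
class ProducerSketch {
    /** Builds producer settings as plain properties. */
    static Properties settings(boolean waitForReplicas) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1.example:9092"); // hypothetical address
        // "1": ack from the partition leader only; "all": leader plus in-sync replicas.
        props.setProperty("acks", waitForReplicas ? "all" : "1");
        props.setProperty("batch.size", "65536");     // upper bound on batch size in bytes (illustrative value)
        props.setProperty("linger.ms", "10");         // wait up to 10 ms to fill a batch (illustrative value)
        props.setProperty("compression.type", "lz4"); // batches are compressed on the client
        return props;
    }

    /** Simplified key-based partition choice; NOT Kafka's actual hash function. */
    static int partitionFor(byte[] key, int partitionCount) {
        return Math.floorMod(Arrays.hashCode(key), partitionCount);
    }

    public static void main(String[] args) {
        System.out.println(settings(true));
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(partitionFor(key, 4));
    }
}
```

Hashing the key means every record with the same key lands in the same partition, which is what preserves per-key ordering.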
Consumers
[Diagram: a consumer and a broker holding Partition 2 (offsets 6 to 12, with a marker at offset=7). Step 1: the consumer sends a fetch for Partition 2, offset 6. Step 2: the broker returns records 7, 8 and 9. Step 3: the consumer commits offset=9.]
• Consumers control which offset to consume from;
• Consumers commit offsets to Kafka, where they are stored in just another Kafka topic;
• Consumers can receive batched and / or compressed data;
• Kafka coordinates which partitions each consumer will consume from.
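The offset handling described above can be sketched with a toy consumer over an in-memory partition. `ToyConsumer` and its methods are hypothetical names, not the Kafka client API. One assumption to note: this sketch commits the next offset to read, so after processing records 7, 8 and 9 it stores 10.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy consumer that owns its read position (hypothetical, for illustration only). */
class ToyConsumer {
    private final List<String> partitionLog; // stands in for one partition held by a broker
    private long position;                   // next offset to fetch
    private long committedOffset;            // position saved by the last commit

    ToyConsumer(List<String> partitionLog, long startOffset) {
        this.partitionLog = partitionLog;
        this.position = startOffset;
        this.committedOffset = startOffset;
    }

    /** Fetches up to maxRecords starting at the current position. */
    List<String> poll(int maxRecords) {
        int from = (int) Math.min(position, partitionLog.size());
        int to = Math.min(from + maxRecords, partitionLog.size());
        List<String> batch = new ArrayList<>(partitionLog.subList(from, to));
        position = to; // advance past what was returned
        return batch;
    }

    /** The consumer, not the broker, decides where to read from. */
    void seek(long offset) {
        position = offset;
    }

    /** Records the current position so processing can resume from it later. */
    void commit() {
        committedOffset = position;
    }

    long committed() {
        return committedOffset;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i <= 12; i++) {
            log.add("m" + i);
        }
        ToyConsumer consumer = new ToyConsumer(log, 7);
        System.out.println(consumer.poll(3));
        consumer.commit();
        System.out.println(consumer.committed());
    }
}
```

Because the position lives in the consumer, replaying old data is just a `seek` to an earlier offset; the broker keeps no per-message delivery state.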
Did you say SSD is better than HDD?
Faster, but not by that much
Only sequential I/O
• Each Kafka partition is mapped to segment files
• Segment file: append-only log structure
• Records are immutable
• The broker does very few random disk seeks
[Diagram: Kafka writes to the active segment file; the old segment files sit behind it.]
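An append-only segment can be sketched with a `FileChannel` opened in append mode. This is a minimal illustration, not Kafka's on-disk format: the class `ToySegment` and the 4-byte length prefix are assumptions, and the sketch assumes it starts from an empty file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Toy append-only segment file (hypothetical format: 4-byte length, then payload). */
class ToySegment implements AutoCloseable {
    private final FileChannel channel;
    private long nextOffset = 0; // assumes the file starts empty

    ToySegment(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one record at the end of the file and returns its logical offset. */
    long append(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
        buf.putInt(record.length).put(record).flip();
        while (buf.hasRemaining()) {
            channel.write(buf); // always lands at the end of the file: sequential I/O
        }
        // No force() call here: flushing to disk is left to the OS.
        return nextOffset++;
    }

    long sizeInBytes() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        try (ToySegment segment = new ToySegment(file)) {
            segment.append("hello".getBytes(StandardCharsets.UTF_8));
            segment.append("kafka".getBytes(StandardCharsets.UTF_8));
            System.out.println(segment.sizeInBytes());
        }
        Files.delete(file);
    }
}
```

Existing bytes are never rewritten, which is why records can be treated as immutable and why no index structure such as a B-tree is needed on the write path.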
Caching data for free
• Kafka relies on the native Linux page cache (read-ahead and write-behind)
  • An off-heap cache, outside the JVM, for free
• Kafka records aren’t deserialized in the Kafka JVM
  • No Java object memory overhead
  • No OutOfMemory issues
  • No long GC pauses
[Diagram: Kafka, the OS and the disk, showing the active segment file and the old segment files.]
Reliability with replication
• Kafka disk writes are asynchronous
• Replica synchronisation (over the network) is synchronous
• Kafka trusts the replicas in case of data corruption or a server crash
[Diagram: a broker that is the partition leader, a replica broker, and a broker that was the previous leader.]
Zero Copy
Sending data from file to network (traditional approach)
read(file, tmp_buf, len);
write(socket, tmp_buf, len);
Sending data from file to network (zero-copy approach)
transferTo(position, count, writableChannel);
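The `transferTo` call on the slide corresponds to `java.nio.channels.FileChannel.transferTo`. A minimal sketch of sending a whole file to a channel follows; the class `ZeroCopySend` and the method `sendFile` are hypothetical names. Whether the copy is actually avoided depends on the operating system and on the target channel type, so treat this as an illustration of the API rather than a guarantee of zero-copy behaviour.

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sends a file to a channel with FileChannel.transferTo (hypothetical helper class). */
class ZeroCopySend {
    /** Transfers the whole file to the target and returns the number of bytes sent. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".bin");
        Files.write(file, "hello kafka\n".getBytes(StandardCharsets.UTF_8));
        // Standard output stands in for a socket channel here.
        sendFile(file, Channels.newChannel(System.out));
        Files.delete(file);
    }
}
```

Unlike the read/write pair of the traditional approach, no user-space buffer appears in this code: the application never touches the bytes it forwards.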
Make things simple
Key takeaways
• Parallelism based on topic partitions;
• Data compressed/decompressed on the client;
• Producers send batches of messages;
• No serialization/deserialization costs on the brokers;
• Writing directly to file:
  • Append only (cheaper disks);
  • No complex data structure (no B-tree or LSM tree);
• Uses OS memory management;
• Relies on replicas, not on disks;
• Zero-copy.
Thank you!
#rivers

Editor's Notes

  1. Quick introduction of both speakers, then a short agenda: first the Kafka basics, then the design choices that made it a great tool at our scale.
  2. Why this name: simply because the initial creator (Jay Kreps) liked the author Kafka, liked the fact that he was a writer, and thought it was a good name for an open-source project.
  3. A topic is like a table in a database, but for a queue. You send messages to the bid request topic and you receive messages from the billable click topic. A partition is a section of a topic: here topic A has two partitions and topic B has four. Partitions are spread over different servers, but a single partition always fits on one server. For example, the bid request topic has 1000 partitions, hosted on different servers. Ordering is guaranteed only inside a partition. Each message has a monotonically increasing offset. Focus on: Kafka just stores bytes, with no schema, so you can send an image through Kafka if you want (not a wonderful idea, but it works).
  4. The first thing we want to explain is that the complexity is not in the server but in the client.
  5. Producer and consumer. The broker is dumb. Differences from RabbitMQ or other queues: you can have a huge queue if you want (cf. event sourcing stores), the only limit is disk; the broker does not track the status of a message (whether it was well received); and consumption is pull, not push. You can group consumers together into a consumer group to build a distributed application. The broker manages consumer coordination to assign the right partitions to the right consumers.
  6. Focus on: no SPOF and no broker acting as a gateway for the cluster; the producer maintains the mapping (topic, partition) -> broker. A batch is only logical: one physical message (one send request / ack) contains several messages. Batch advantages: compression is efficient and network acks are efficient, for instance one ack per 1,000 messages.
  7. Warning: consumers receive compressed batch data only if the producer sent it that way.
  8. Cost efficiency + highest performance. The advantage here is being able to use JBOD or RAID. SSDs would cost more for equal or even lower performance.
  9. - Same caching approach as Varnish (HTTP cache server). - Designed to work with Linux only. - A 4 GB heap is enough because no data is held in it (it only manages metadata and client connections).
  10. - Same caching approach as Varnish (HTTP cache server). - Designed to work with Linux only. - A 4 GB heap is enough because no data is held in it (it only manages metadata and client connections).
  11. Disk writes are async (and that's OK because network replication is sync).