SlideShare a Scribd company logo
1 / 40
Kafka In Action
Sumit Jain
2 / 40
Introduction
●
Apache Kafka is a publish/subscribe
messaging system.
●
A distributed, partitioned, replicated commit
log (record of transactions) service.
●
High-throughput, low-latency platform for
handling real-time data feeds.
3 / 40
●
Activity Tracking: generating reports, feeding
machine learning systems, updating search results
●
Messaging: Send notifications to users, application
events
●
Metrics and logging: Aggregating statistics/logs from
distributed applications to produce centralized feeds
of operational data.
●
Commit Log: Database changes can be published to
Kafka and applications can easily monitor this
stream
●
Stream processing: Real time processing of infinite
data stream.
Use Cases
4 / 40
Example: Logging Pipeline
5 / 40
Kafka Architecture
6 / 40
Kafka Basics
●
The unit of data within Kafka is called a
message (array of bytes).
●
Message can have a key (byte array)
●
Keys are used to distribute messages among
partitions using a partitioner.
7 / 40
Kafka Internals
8 / 40
Topic Partitions
●
Messages in Kafka are categorized into topics.
●
Topics are broken into ordered commit logs called
partitions
●
Each message in a partition is assigned a sequential
id called offset.
●
Partitions are the way that Kafka provides
redundancy and scalability.
9 / 40
How many partitions?
●
Expected throughput you want to achieve.
●
Maximum throughput of 1 consumer thread.
●
Consider the number of partitions you will place on each
broker and available diskspace
●
Calculate throughput based on your expected future
usage, not the current usage
●
When publishing based on keys, difficult to change
partition count later.
●
Decreasing partitions not allowed.
●
Each partition uses memory/other resources on broker
and will increase the time for leader elections.
10 / 40
Kafka Cluster
●
Zookeper maintains list of current brokers
●
Every broker has a unique id (broker.id)
●
Kafka components subscribe to the /brokers/ids
path in Zookeeper
●
The first broker that starts in the cluster
becomes the controller by creating an
ephemeral node called /controller
– Responsible for electing partition leaders
when nodes join and leave the cluster
– Cluster will(should) only have one controller at
a time
11 / 40
Replication
●
Two types of replicas: leader & follower
●
All produce and consume requests go through the leader,
in order to guarantee consistency.
●
Followers send the leader Fetch requests, to stay in sync
(just like consumers)
●
A replica is considered in-sync if it is leader, or if it is a
follower that has an active session with Zookeeper and is
not too far behind the leader (configurable)
●
Only in-sync replicas are eligible to be elected as partition
leaders in case the existing leader fails
●
Each partition has a preferred leader—the replica that was
the leader when the topic was originally created
12 / 40
Partition Allocation
●
When creating a topic, Kafka places the partitions and
replicas in a way that the brokers with least number of
existing partitions are used first, and replicas for same
partition are on different brokers.
●
When you add a new broker, it is used for new partitions
(since it has lowest number of existing partitions), but
there is no automatic balancing of existing partitions to
the new broker. You can use the replica-reassignment tool
to move partitions and replicas to the new broker.
13 / 40
Kafka APIs
●
Producer API
●
Consumer API
●
Streams API
– Transforming streams of data from input topics to
output topics.
●
Connector API
– Implementing connectors that continually pull
from some source system into Kafka or push from
Kafka into some sink system.
●
AdminClient API
– Managing and inspecting topics, brokers, and
other Kafka objects.
14 / 40
Producer
15 / 40
●
Default Partitioner: Round-robin, used if Key not
specified.
●
Hashing partitioner is used if key is provided,
hashing over all partitions, not just available ones.
●
Define custome partitioner
●
Returns RecordMetadata: [topic, partition, offset]
●
max.in.flight.requests.per.connection: how many
messages to send without receiving responses
●
When a producer firsts connect, it issues a Metadata
request, with the topics you are publishing to, and
the broker replies with which broker has which
partition.
Producer
16 / 40
●
Kafka consumers are typically part of a consumer
group.
●
Multiple consumers of same group, will consume
from a different subset of the partitions in the topic.
●
One consumer per thread rule.
Consumer
17 / 40
●
Consumer keep track of consumed messages by
saving offset position in kafka.
●
__consumer_offsets topic: [partition key: groupId,
topic and partition number] > offset
●
When consumers are added or leave, partitions are
re-allocated. Moving partition ownership from one
consumer to another is called a rebalance.
●
Rebalances provide the consumer group with high
availability and scalability.
Consumer
18 / 40
Consumer: The poll loop
19 / 40
Performance
20 / 40
Kafka optimizations
●
Kafka achieves high throughput and low latency
primarily from 2 key concepts:
– Batching of individual messages to amortizee n/w
overhead and append/consume chunks together.
– Zero copy using sendFile, skips unnecessary
copying to user space.
●
Relies heavily on Linux page cache to batch writes
and re-sequence writes to minimize disk-head
movement
●
Automatically uses all free memory.
●
If consumers are caught up with producers, you are
essentially reading from cache.
●
Batches are also typically compressed
21 / 40
Cluster Sizing
●
No of brokers is goverened by data size &
capacity of the cluster to handle requests.
●
No. of brokers >= replication factor
●
Separate zookeper ensemble the atleast 3 nodes
(tolerates 1 failure), atleast 5 is ideal to support
2 failures. Requires (N/2 +1) majority to function.
●
Kafka and zookeper shouldn‘t be co-located as
both are disk I/O latency sensitive.
●
Kafka uses Linux OS page cache to reduce disk
I/O. Other apps "pollute" the page cache with
other data that takes away from cache for Kafka.
●
Single zookeper ensemble can be used for
multiple kafka clusters.
22 / 40
Hardware
●
Use SSDs if possible, multiple data directories.
●
Make good amount of memory available to
system, to improve page cache performance.
●
Available network throughput along with disk
storage governs cluster sizing.
●
If network interface become saturated, cluster
replication can fall behind
●
Ideally, clients should compress messages to
optimize network and disk usage.
●
Processing power not as important as disk and
memory. CPU is chiefly required for
compression/decompression.
23 / 40
Durability
24 / 40
Pesistence
●
Kafka always immediately writes all data to the
filesystem.
●
On Linux, the messages are written to the
filesystem cache and there is no guarantee
about when they will be written to disk.
●
Supports flush policy configuration (from OS
cache to disk).
●
Durability in Kafka (or other distributed systems)
does not require syncing data to disk, as a failed
node will recover from its replicas.
●
Recommended to use default flush settings
which disable application fsync entirely.
25 / 40
Message Retention
●
By time and/or size
●
log.retention.ms: 1 week enabled by default
●
log.retention.bytes: Applied per partition
●
Compacted Topics: Fine grained per record log
retention.
– Retains at least the last value for each key for a
single topic partition.
– Allows downstream consumers to restore their
state after a crash/reloading cache.
26 / 40
Reliability
27 / 40
Broker Guarantees
●
Kafka provides order guarantee of messages in a
partition
●
Produced messages are considered committed when
they were written to the partition on all its in-sync
replicas (but not necessarily flushed to disk)
●
Messages that are committed will not be lost as long
as at least one replica remains alive.
●
Consumers can only read messages that are
committed and see records in the order they are
stored in the log.
●
Stronger ordering guarantees than traditional
messaging system like rabbitmq.
28 / 40
●
Guarantees are necessary but not sufficient
to make the system fully reliable.
●
Trade-offs are invloved between
reliability/consistency and
availability/throughput/latency/hardware cost
●
Kafka’s replication mechanism, with multiple
replicas per partition, is at the core of its
reliability guarantees
●
Reliability can be configured per broker/topic
by certain parameters
Configurable Reliability
29 / 40
Reliability parameters
●
Replication Factor (Default: 1): A replication factor
of N allows you to lose N-1 brokers while still being
able to read and write data to the topic reliably. So a
higher replication factor leads to higher availability,
higher reliability, and fewer disasters.
●
On flipside, for RF=N, you need N times storage
space.
●
Placement of replicas is also important: Kafka
supports rack awareness.
●
Unclean Leader Election (Default true) : Allow out-
of-sync replica to become leader or not.
●
Minimum In-Sync Replicas (Default: 1): Minimum
no. of ISR to be available to accept produce requests.
30 / 40
Producing Reliably
●
Use the correct acks configuration:
– acks=0: successfully sent over the n/w
– acks=1: leader will send acknowledgment/error
after writing to partition
– acks=all: leader will wait until all in-sync replicas
got the message before sending back an
acknowledgment or an error. Here all means
min.insync.replica. Slowest option.
●
Handle errors correctly. Retry all retiable errors e.g
Leader not available, to prevent message loss.
●
Can lead to duplicate publish in case of lost
acknowledgments.
●
Error Handling strategies: Ignore/log/report
31 / 40
Consuming Reliably
●
Consumers can lose messages when committing
offsets for events they’ve read but haven‘t processed
●
auto.offset.reset: (earliest/latest/..) what the
consumer will do when no commited offsets are
found
●
enable.auto.commit: set it to true if you can finish
processing the batch within the poll loop in sync.
●
No control over the number of duplicate records you
may need to process
●
auto.commit.interval.ms: tradeoff between commit
overhead vs duplicate processing
●
Handle Rebalances properly: Commit offsets before
partitions are revoked, clear old state on assignment
32 / 40
●
Need to retry in case of error in upstream system like
database.
●
Can‘t block indefinitely, as need to poll to prevent
consumer session termination.
●
Commit last successfully processed record, store retry
packets in buffer and keep trying to process the
records.
●
Can lead to unbounded buffer problem
●
Use the consumer pause() method to ensure that
additional polls won’t return additional data and
resume if retry succeeds.
●
Dead letter topic: Write to separate topic and
continue
●
Separate or same consumer group can be used to
handle retries from the retry topic and pause the retry
topic between retries
33 / 40
Exactly Once Delivery
34 / 40
Idempotent Producer
●
Every new producer will be assigned a unique PID
during initialization
●
Producer will have a sequence no. for each
partition. It will be incremented by the producer
on every message sent to the broker.
●
The broker maintains persistent sequence
numbers it receives for each topic partition from
every PID.
●
The broker will reject a produce request if its seq-
no is not exactly one greater than the last
committed message from that PID/TopicPartition
pair
●
configure your producer to set
“enable.idempotence=true”
35 / 40
Excatly once consumer
●
Easiest way is to make consumer operation
idempotent (idempotent writes).
●
Write to a system that has transactions
– The idea is to write the records and their
offsets in the same transaction so they will
be in-sync.
– When starting up, retrieve the offsets of the
latest records written to the external store
and then use consumer.seek() to start
consuming again from those offsets.
36 / 40
Transactions
●
Kafka now supports atomic writes across
multiple partitions through the new
transactions API (v 0.11)
●
Allows a producer to send a batch of
messages to multiple partitions such that
either all or no messages are visible to any
consumer
●
Allows you to commit your consumer offsets
in the same transaction along with the data,
thereby allowing end-to-end exactly-once
semantics.
●
Idempotency and atomicity, make exactly-
once stream processing possible.
37 / 40
Validating Reliability
●
Leader election: what happens if I kill the leader?
How long does it take the producer and consumer to
start working as usual again?
●
Controller election: how long does it take the system
to resume after a restart of the controller?
●
Rolling restart: can I restart the brokers one by one
without losing any messages?
●
Unclean leader election test: what happens when we
kill all the replicas for a partition one by one (to
make sure each goes out of sync) and then start a
broker that was out of sync?
●
Rolling restart of consumers
●
Clients lose connectivity to the server
38 / 40
Kafka vs Rabbitmq
39 / 40
Apache Kafka RabbitMQ
Creation year 2011 2007
License Apache (Open source)
Mozilla Public License
(Open source)
Enterprise support
available
✘ ✓
Programming language Scala Erlang
AMQP compliant ✘ ✓
Officially supported clients1
in
Java2 Java, .NET/C#, Erlang
Other available clients in
~ 10 languages (incl.
Python and Node.js)
~ 13 languages (incl.
Python, PHP and Node.js)
Documentation, examples
and community3
∿
Immature and incomplete
✓
Table 1 -‐ General information comparison between Apache Kafka and RabbitMQ
Predefined queues on
broker
✓ ✓
Multiple different
consumers of same data set1
✓
∿
Load balancing across a set
of same consumers
✓ ✓
Remote2 queue definition ✘ ✓
Advanced/conditional
message routing
✘
Message to partition only
✓
Permission and access
control list support
✘ ✓
SSL support ✘ ✓
Clustering support ✓ ✓
Self-‐sufficient
✘
Needs ZooKeeper ✓
Management and
monitoring interface
✘
JMX and CLI-‐based
✓
Web and CLI-‐based
40 / 40
References
●
https://kafka.apache.org/documentation/
●
Kafka: The definitive Guide
●
https://www.confluent.io/blog/
●
https://community.hortonworks.com/articles/8081
3/kafka-best-practices-1.html
●
https://www.cloudera.com/documentation/kafka/la
test/topics/kafka_ha.html
●
www.stackoverflow.com
https://www.confluent.io/blog/publishing-apache-
kafka-new-york-times/
●
https://blog.serverdensity.com/how-to-monitor-ka

More Related Content

What's hot

Microservices With Istio Service Mesh
Microservices With Istio Service MeshMicroservices With Istio Service Mesh
Microservices With Istio Service Mesh
Natanael Fonseca
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
Araf Karsh Hamid
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
chrislusf
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
OpenShift Container Platform 4.12 Release Notes
OpenShift Container Platform 4.12 Release NotesOpenShift Container Platform 4.12 Release Notes
OpenShift Container Platform 4.12 Release Notes
GerryJamisola1
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
confluent
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Slim Baltagi
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Kai Wähner
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
HostedbyConfluent
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
SANG WON PARK
 
Service mesh
Service meshService mesh
Service mesh
Arnab Mitra
 
Kubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystemKubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystem
Sreenivas Makam
 
RethinkConn 2022!
RethinkConn 2022!RethinkConn 2022!
RethinkConn 2022!
NATS
 
Loki - like prometheus, but for logs
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logs
Juraj Hantak
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
confluent
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
Jean-Paul Azar
 
Infinispan, a distributed in-memory key/value data grid and cache
 Infinispan, a distributed in-memory key/value data grid and cache Infinispan, a distributed in-memory key/value data grid and cache
Infinispan, a distributed in-memory key/value data grid and cache
Sebastian Andrasoni
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage Service
Sijie Guo
 

What's hot (20)

Microservices With Istio Service Mesh
Microservices With Istio Service MeshMicroservices With Istio Service Mesh
Microservices With Istio Service Mesh
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
OpenShift Container Platform 4.12 Release Notes
OpenShift Container Platform 4.12 Release NotesOpenShift Container Platform 4.12 Release Notes
OpenShift Container Platform 4.12 Release Notes
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
 
Service mesh
Service meshService mesh
Service mesh
 
Kubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystemKubernetes design principles, patterns and ecosystem
Kubernetes design principles, patterns and ecosystem
 
RethinkConn 2022!
RethinkConn 2022!RethinkConn 2022!
RethinkConn 2022!
 
Loki - like prometheus, but for logs
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logs
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Kafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platformKafka Tutorial - introduction to the Kafka streaming platform
Kafka Tutorial - introduction to the Kafka streaming platform
 
Infinispan, a distributed in-memory key/value data grid and cache
 Infinispan, a distributed in-memory key/value data grid and cache Infinispan, a distributed in-memory key/value data grid and cache
Infinispan, a distributed in-memory key/value data grid and cache
 
Apache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage Service
 

Similar to Kafka in action - Tech Talk - Paytm

Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
MuleSoft Meetup
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
confluent
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Jemin Patel
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
Matteo Merli
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
Karthik Ramasamy
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
José Román Martín Gil
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Kafka - Messaging System
Kafka - Messaging SystemKafka - Messaging System
Kafka - Messaging SystemTanuj Mehta
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
Nitin Kumar
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
Knoldus Inc.
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
confluent
 

Similar to Kafka in action - Tech Talk - Paytm (20)

Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Columbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_IntegrationColumbus mule soft_meetup_aug2021_Kafka_Integration
Columbus mule soft_meetup_aug2021_Kafka_Integration
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
kafka
kafkakafka
kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Kafka - Messaging System
Kafka - Messaging SystemKafka - Messaging System
Kafka - Messaging System
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Kafka in action - Tech Talk - Paytm

  • 1. 1 / 40 Kafka In Action Sumit Jain
  • 2. 2 / 40 Introduction ● Apache Kafka is a publish/subscribe messaging system. ● A distributed, partitioned, replicated commit log (record of transactions) service. ● High-throughput, low-latency platform for handling real-time data feeds.
  • 3. 3 / 40 ● Activity Tracking: generating reports, feeding machine learning systems, updating search results ● Messaging: Send notifications to users, application events ● Metrics and logging: Aggregating statistics/logs from distributed applications to produce centralized feeds of operational data. ● Commit Log: Database changes can be published to Kafka and applications can easily monitor this stream ● Stream processing: Real time processing of infinite data stream. Use Cases
  • 4. 4 / 40 Example: Logging Pipeline
  • 5. 5 / 40 Kafka Architecture
  • 6. 6 / 40 Kafka Basics ● The unit of data within Kafka is called a message (array of bytes). ● Message can have a key (byte array) ● Keys are used to distribute messages among partitions using a partitioner.
  • 7. 7 / 40 Kafka Internals
  • 8. 8 / 40 Topic Partitions ● Messages in Kafka are categorized into topics. ● Topics are broken into ordered commit logs called partitions ● Each message in a partition is assigned a sequential id called offset. ● Partitions are the way that Kafka provides redundancy and scalability.
  • 9. 9 / 40 How many partitions? ● Expected throughput you want to achieve. ● Maximum throughput of 1 consumer thread. ● Consider the number of partitions you will place on each broker and available diskspace ● Calculate throughput based on your expected future usage, not the current usage ● When publishing based on keys, difficult to change partition count later. ● Decreasing partitions not allowed. ● Each partition uses memory/other resources on broker and will increase the time for leader elections.
  • 10. 10 / 40 Kafka Cluster ● Zookeper maintains list of current brokers ● Every broker has a unique id (broker.id) ● Kafka components subscribe to the /brokers/ids path in Zookeeper ● The first broker that starts in the cluster becomes the controller by creating an ephemeral node called /controller – Responsible for electing partition leaders when nodes join and leave the cluster – Cluster will(should) only have one controller at a time
  • 11. 11 / 40 Replication ● Two types of replicas: leader & follower ● All produce and consume requests go through the leader, in order to guarantee consistency. ● Followers send the leader Fetch requests, to stay in sync (just like consumers) ● A replica is considered in-sync if it is leader, or if it is a follower that has an active session with Zookeeper and is not too far behind the leader (configurable) ● Only in-sync replicas are eligible to be elected as partition leaders in case the existing leader fails ● Each partition has a preferred leader—the replica that was the leader when the topic was originally created
  • 12. 12 / 40 Partition Allocation ● When creating a topic, Kafka places the partitions and replicas in a way that the brokers with least number of existing partitions are used first, and replicas for same partition are on different brokers. ● When you add a new broker, it is used for new partitions (since it has lowest number of existing partitions), but there is no automatic balancing of existing partitions to the new broker. You can use the replica-reassignment tool to move partitions and replicas to the new broker.
  • 13. 13 / 40 Kafka APIs ● Producer API ● Consumer API ● Streams API – Transforming streams of data from input topics to output topics. ● Connector API – Implementing connectors that continually pull from some source system into Kafka or push from Kafka into some sink system. ● AdminClient API – Managing and inspecting topics, brokers, and other Kafka objects.
  • 15. 15 / 40 ● Default Partitioner: Round-robin, used if Key not specified. ● Hashing partitioner is used if key is provided, hashing over all partitions, not just available ones. ● Define custome partitioner ● Returns RecordMetadata: [topic, partition, offset] ● max.in.flight.requests.per.connection: how many messages to send without receiving responses ● When a producer firsts connect, it issues a Metadata request, with the topics you are publishing to, and the broker replies with which broker has which partition. Producer
  • 16. 16 / 40 ● Kafka consumers are typically part of a consumer group. ● Multiple consumers of same group, will consume from a different subset of the partitions in the topic. ● One consumer per thread rule. Consumer
  • 17. 17 / 40 ● Consumer keep track of consumed messages by saving offset position in kafka. ● __consumer_offsets topic: [partition key: groupId, topic and partition number] > offset ● When consumers are added or leave, partitions are re-allocated. Moving partition ownership from one consumer to another is called a rebalance. ● Rebalances provide the consumer group with high availability and scalability. Consumer
  • 18. 18 / 40 Consumer: The poll loop
  • 20. 20 / 40 Kafka optimizations ● Kafka achieves high throughput and low latency primarily from 2 key concepts: – Batching of individual messages to amortizee n/w overhead and append/consume chunks together. – Zero copy using sendFile, skips unnecessary copying to user space. ● Relies heavily on Linux page cache to batch writes and re-sequence writes to minimize disk-head movement ● Automatically uses all free memory. ● If consumers are caught up with producers, you are essentially reading from cache. ● Batches are also typically compressed
  • 21. 21 / 40 Cluster Sizing ● No of brokers is goverened by data size & capacity of the cluster to handle requests. ● No. of brokers >= replication factor ● Separate zookeper ensemble the atleast 3 nodes (tolerates 1 failure), atleast 5 is ideal to support 2 failures. Requires (N/2 +1) majority to function. ● Kafka and zookeper shouldn‘t be co-located as both are disk I/O latency sensitive. ● Kafka uses Linux OS page cache to reduce disk I/O. Other apps "pollute" the page cache with other data that takes away from cache for Kafka. ● Single zookeper ensemble can be used for multiple kafka clusters.
  • 22. 22 / 40 Hardware ● Use SSDs if possible, multiple data directories. ● Make good amount of memory available to system, to improve page cache performance. ● Available network throughput along with disk storage governs cluster sizing. ● If network interface become saturated, cluster replication can fall behind ● Ideally, clients should compress messages to optimize network and disk usage. ● Processing power not as important as disk and memory. CPU is chiefly required for compression/decompression.
  • 24. 24 / 40 Pesistence ● Kafka always immediately writes all data to the filesystem. ● On Linux, the messages are written to the filesystem cache and there is no guarantee about when they will be written to disk. ● Supports flush policy configuration (from OS cache to disk). ● Durability in Kafka (or other distributed systems) does not require syncing data to disk, as a failed node will recover from its replicas. ● Recommended to use default flush settings which disable application fsync entirely.
  • 25. 25 / 40 Message Retention ● By time and/or size ● log.retention.ms: 1 week enabled by default ● log.retention.bytes: Applied per partition ● Compacted Topics: Fine grained per record log retention. – Retains at least the last value for each key for a single topic partition. – Allows downstream consumers to restore their state after a crash/reloading cache.
  • 27. 27 / 40 Broker Guarantees ● Kafka provides order guarantee of messages in a partition ● Produced messages are considered committed when they were written to the partition on all its in-sync replicas (but not necessarily flushed to disk) ● Messages that are committed will not be lost as long as at least one replica remains alive. ● Consumers can only read messages that are committed and see records in the order they are stored in the log. ● Stronger ordering guarantees than traditional messaging system like rabbitmq.
  • 28. 28 / 40 ● Guarantees are necessary but not sufficient to make the system fully reliable. ● Trade-offs are invloved between reliability/consistency and availability/throughput/latency/hardware cost ● Kafka’s replication mechanism, with multiple replicas per partition, is at the core of its reliability guarantees ● Reliability can be configured per broker/topic by certain parameters Configurable Reliability
  • 29. 29 / 40 Reliability parameters ● Replication Factor (Default: 1): A replication factor of N allows you to lose N-1 brokers while still being able to read and write data to the topic reliably. So a higher replication factor leads to higher availability, higher reliability, and fewer disasters. ● On flipside, for RF=N, you need N times storage space. ● Placement of replicas is also important: Kafka supports rack awareness. ● Unclean Leader Election (Default true) : Allow out- of-sync replica to become leader or not. ● Minimum In-Sync Replicas (Default: 1): Minimum no. of ISR to be available to accept produce requests.
  • 30. 30 / 40 Producing Reliably ● Use the correct acks configuration: – acks=0: successfully sent over the n/w – acks=1: leader will send acknowledgment/error after writing to partition – acks=all: leader will wait until all in-sync replicas got the message before sending back an acknowledgment or an error. Here all means min.insync.replica. Slowest option. ● Handle errors correctly. Retry all retiable errors e.g Leader not available, to prevent message loss. ● Can lead to duplicate publish in case of lost acknowledgments. ● Error Handling strategies: Ignore/log/report
  • 31. 31 / 40 Consuming Reliably ● Consumers can lose messages when committing offsets for events they’ve read but haven‘t processed ● auto.offset.reset: (earliest/latest/..) what the consumer will do when no commited offsets are found ● enable.auto.commit: set it to true if you can finish processing the batch within the poll loop in sync. ● No control over the number of duplicate records you may need to process ● auto.commit.interval.ms: tradeoff between commit overhead vs duplicate processing ● Handle Rebalances properly: Commit offsets before partitions are revoked, clear old state on assignment
  • 32. 32 / 40 ● Need to retry in case of error in upstream system like database. ● Can‘t block indefinitely, as need to poll to prevent consumer session termination. ● Commit last successfully processed record, store retry packets in buffer and keep trying to process the records. ● Can lead to unbounded buffer problem ● Use the consumer pause() method to ensure that additional polls won’t return additional data and resume if retry succeeds. ● Dead letter topic: Write to separate topic and continue ● Separate or same consumer group can be used to handle retries from the retry topic and pause the retry topic between retries
  • 33. 33 / 40 Exactly Once Delivery
  • 34. 34 / 40 Idempotent Producer ● Every new producer will be assigned a unique PID during initialization ● Producer will have a sequence no. for each partition. It will be incremented by the producer on every message sent to the broker. ● The broker maintains persistent sequence numbers it receives for each topic partition from every PID. ● The broker will reject a produce request if its seq- no is not exactly one greater than the last committed message from that PID/TopicPartition pair ● configure your producer to set “enable.idempotence=true”
  • 35. 35 / 40 Excatly once consumer ● Easiest way is to make consumer operation idempotent (idempotent writes). ● Write to a system that has transactions – The idea is to write the records and their offsets in the same transaction so they will be in-sync. – When starting up, retrieve the offsets of the latest records written to the external store and then use consumer.seek() to start consuming again from those offsets.
  • 36. 36 / 40 Transactions ● Kafka now supports atomic writes across multiple partitions through the new transactions API (v 0.11) ● Allows a producer to send a batch of messages to multiple partitions such that either all or no messages are visible to any consumer ● Allows you to commit your consumer offsets in the same transaction along with the data, thereby allowing end-to-end exactly-once semantics. ● Idempotency and atomicity, make exactly- once stream processing possible.
  • 37. 37 / 40 Validating Reliability ● Leader election: what happens if I kill the leader? How long does it take the producer and consumer to start working as usual again? ● Controller election: how long does it take the system to resume after a restart of the controller? ● Rolling restart: can I restart the brokers one by one without losing any messages? ● Unclean leader election test: what happens when we kill all the replicas for a partition one by one (to make sure each goes out of sync) and then start a broker that was out of sync? ● Rolling restart of consumers ● Clients lose connectivity to the server
  • 38. 38 / 40 Kafka vs Rabbitmq
  • 39. 39 / 40 Apache Kafka RabbitMQ Creation year 2011 2007 License Apache (Open source) Mozilla Public License (Open source) Enterprise support available ✘ ✓ Programming language Scala Erlang AMQP compliant ✘ ✓ Officially supported clients1 in Java2 Java, .NET/C#, Erlang Other available clients in ~ 10 languages (incl. Python and Node.js) ~ 13 languages (incl. Python, PHP and Node.js) Documentation, examples and community3 ∿ Immature and incomplete ✓ Table 1 -‐ General information comparison between Apache Kafka and RabbitMQ Predefined queues on broker ✓ ✓ Multiple different consumers of same data set1 ✓ ∿ Load balancing across a set of same consumers ✓ ✓ Remote2 queue definition ✘ ✓ Advanced/conditional message routing ✘ Message to partition only ✓ Permission and access control list support ✘ ✓ SSL support ✘ ✓ Clustering support ✓ ✓ Self-‐sufficient ✘ Needs ZooKeeper ✓ Management and monitoring interface ✘ JMX and CLI-‐based ✓ Web and CLI-‐based
  • 40. 40 / 40 References ● https://kafka.apache.org/documentation/ ● Kafka: The definitive Guide ● https://www.confluent.io/blog/ ● https://community.hortonworks.com/articles/8081 3/kafka-best-practices-1.html ● https://www.cloudera.com/documentation/kafka/la test/topics/kafka_ha.html ● www.stackoverflow.com https://www.confluent.io/blog/publishing-apache- kafka-new-york-times/ ● https://blog.serverdensity.com/how-to-monitor-ka