I have covered the basic terminology of Kafka and the best practices to follow when producing data to Kafka, or when running a source or sink connector to consume data from Kafka and store it in another system.
These are some common configurations to keep in mind.
Kafka basics and best prectices
1. What is Kafka
Kafka is a distributed streaming platform: a fast, scalable, fault-tolerant
messaging system that enables communication between producers and
consumers using message-based topics
Two types of messaging systems:
● Point to point
● Publish and subscribe
5. Kafka Broker & ZooKeeper
● A broker is a Kafka server. As the name suggests, producers and consumers
don’t interact directly but use the Kafka server as an agent, or broker, to exchange
messages
● A Kafka cluster typically consists of multiple brokers to maintain load balance
● ZooKeeper is used for managing and coordinating Kafka brokers
● ZooKeeper is also used for broker leader election
6. Topics, Partitions and Offsets
● Topics: streams of “related” messages in Kafka
● In a Kafka cluster, a topic is identified by its name and must
be unique
● Topics are split into partitions and replicated across
brokers (this helps achieve parallelism)
● Always keep an appropriate number of partitions, so that
apps can be scaled up
● Within a partition, each message is assigned an incremental id,
which is known as its offset
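The partition/offset idea above can be shown with a toy model. This is a sketch only, not the Kafka API: each partition keeps an append-only log and hands out incremental offsets.

```python
# Toy model of a topic (NOT the Kafka API): a topic is a set of partitions,
# and each partition assigns every appended message an incremental offset.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, message):
        offset = len(self.log)   # next incremental id within this partition
        self.log.append(message)
        return offset

# A topic named "orders" (name is an assumption) split into 2 partitions
topic = {"orders": [Partition(), Partition()]}

p0 = topic["orders"][0]
print(p0.append("msg-a"))  # 0 — first message in partition 0
print(p0.append("msg-b"))  # 1 — offsets increase per partition, not per topic
```

Note that offsets are only meaningful within a single partition; two partitions of the same topic each start at offset 0.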
8. Partitioning in Kafka
● Producers use a partitioning strategy to assign each message to a
partition
● The partitioning strategy is specified by the producer
○ Default strategy: hash(key) % number_of_partitions
○ No key: round-robin across partitions
● A custom partitioner is also possible
● Always check whether your use case needs a partitioning key: keying
sends all messages with the same key to the same partition, so you get
purely incremental (ordered) updates per unique id
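The two strategies above can be sketched in a few lines. This is a simplified illustration: the real Java client hashes keys with murmur2, not the toy byte-sum used here, and the partition count is an assumption.

```python
import itertools

NUM_PARTITIONS = 3  # assumption for illustration
_rr = itertools.cycle(range(NUM_PARTITIONS))  # round-robin for keyless messages

def choose_partition(key):
    if key is not None:
        # Keyed message: deterministic hash(key) % number_of_partitions
        # (toy byte-sum hash; the real client uses murmur2)
        return sum(key.encode()) % NUM_PARTITIONS
    # No key: round-robin across partitions
    return next(_rr)

# Same key always lands on the same partition, preserving per-key order
print(choose_partition("user-42") == choose_partition("user-42"))  # True
```

Because the mapping is deterministic per key, all updates for one id stay in a single partition and are consumed in order.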
10. Message Delivery and Durability Guarantees
Acks options (default: all): [all, -1, 0, 1]
For data durability, the KafkaProducer has the configuration setting acks. The
acks setting specifies how many acknowledgments the producer must receive
before it considers a record delivered to the broker:
● none (0): this is basically “fire and forget”
● 1: the producer waits for the lead broker to acknowledge the write
● all (-1): the producer waits for an acknowledgment from the lead broker and
from all in-sync followers
In-sync replicas: an in-sync replica is a replica that has fully caught up with the
leader within the last 10 seconds (replica.lag.time.max.ms)
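The acks options above can be written out as a producer config. This is a sketch using the standard Kafka producer config key names in a plain dict (the form accepted by, e.g., confluent-kafka's Producer); the broker address is an assumption.

```python
# Sketch of a durability-focused producer config; keys are the standard
# Kafka producer config names, values shown as plain strings.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "acks": "all",  # wait for the leader AND all in-sync replicas
    # "acks": "1"   # -> leader acknowledgment only
    # "acks": "0"   # -> fire and forget, no acknowledgment
}
print(producer_config["acks"])  # all
```

Use `acks=all` when losing a record is worse than the extra latency; `acks=0` only for lossy, high-throughput telemetry-style data.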
13. Setting a Minimum (min.insync.replicas=1)
● The min.insync.replicas setting enforces the number of
replicas that must be in sync for a write to proceed
● min.insync.replicas is set at the broker or topic level,
and is not part of the producer configuration
● The default value for min.insync.replicas is 1
● If the number of in-sync replicas is below the configured
minimum, the lead broker won’t attempt to append the record to
its log
● By setting min.insync.replicas and the producer acks
configuration to work together in this way, we can
increase the durability of data
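The interaction described above can be sketched as a toy version of the broker-side check (this is an illustration, not Kafka source code): with acks=all, the leader rejects a write when the in-sync replica count falls below the configured minimum.

```python
# Toy sketch of the broker-side durability check (NOT Kafka source code).
MIN_INSYNC_REPLICAS = 2  # topic/broker-level setting (Kafka's default is 1)

def accept_write(in_sync_replica_count, acks):
    # Only acks=all writes are gated on min.insync.replicas; the producer
    # would see a NotEnoughReplicas-style error on rejection.
    if acks == "all" and in_sync_replica_count < MIN_INSYNC_REPLICAS:
        return False
    return True

print(accept_write(3, "all"))  # True  — enough replicas in sync
print(accept_write(1, "all"))  # False — write rejected, data stays durable
```

This is why the two settings are paired: min.insync.replicas=2 with acks=all guarantees every acknowledged record exists on at least two replicas.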
15. Important Configs:
● request.timeout.ms (30 sec): defines how long the client (both producer
and consumer) will wait to receive a response from the broker
● retries (INT_MAX): as implied by the name, the number of retries to
attempt
● delivery.timeout.ms (2 min): an upper bound on the total time a producer
will spend attempting to deliver a record
● linger.ms: controls how long the producer will wait while batching
messages before sending them
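The settings above can be collected into one config sketch. Keys are the standard Kafka producer config names; the first three values mirror the defaults stated above, while the linger.ms value is an assumption for illustration.

```python
# Sketch of the timeout/retry settings discussed above, as a config dict.
timeout_config = {
    "request.timeout.ms": 30_000,    # 30 s to wait for a broker response
    "retries": 2_147_483_647,        # INT_MAX: retry until delivery.timeout.ms
    "delivery.timeout.ms": 120_000,  # 2 min cap on total delivery time
    "linger.ms": 5,                  # assumption: wait up to 5 ms to fill a batch
}
```

Note that delivery.timeout.ms is the setting that actually bounds retries in practice: with retries at INT_MAX, the producer keeps retrying until the 2-minute delivery window expires.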
16. Message Delivery Semantics:
1. At least once
2. At most once
3. Exactly once
● enable.idempotence (true): the producer will ensure that exactly one copy
of each message is written to the stream. If false, producer retries due to
broker failures, etc., may write duplicates of the retried message into the
stream
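An idempotent producer config can be sketched as follows. Keys are the standard Kafka producer config names; the broker address is an assumption. Idempotence requires acks=all, which recent Kafka clients imply automatically when idempotence is enabled.

```python
# Sketch: enabling the idempotent producer to avoid duplicate writes on retry.
idempotent_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "enable.idempotence": True,  # exactly one copy of each message per partition
    "acks": "all",               # required by (and implied with) idempotence
}
```

Idempotence gives exactly-once writes per producer session and partition; full end-to-end exactly-once semantics additionally needs transactions on the producer side and read-committed consumers.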