Kafka 101
Presented by: Aparna Pillai
• What is Kafka?
• What problem does Kafka solve?
• How does Kafka work?
• What are the benefits of Kafka?
• Conclusion
Common pattern
[Diagram: four source systems, each integrating point-to-point with four target systems]

With Apache Kafka
[Diagram: the same four source systems publish to Kafka and the four target systems consume from Kafka, decoupling the two sides]
Taxonomy
• Producer – an application that sends data to Apache Kafka
• Consumer – an application that receives data from Apache Kafka
• Consumer Group – a group of consumers acting as a single logical unit
• Broker – a Kafka server
• Cluster – a group of Kafka brokers
• Topic – a named stream; all Kafka messages are organized into topics
• Partition – a part of a topic; each topic is split into one or more partitions
• Offset – the unique id of a message within a partition
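To ground these terms, here is a minimal sketch using the kafka-python client; the broker address, topic name, and group id are assumptions, not part of the slides:

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: an application that sends data to a Kafka topic
producer = KafkaProducer(bootstrap_servers="localhost:9092")  # broker address is an assumption
producer.send("demo-topic", b"hello kafka")                   # write one message to the topic
producer.flush()

# Consumer: an application that reads data from a Kafka topic
consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",             # consumers sharing this id form a consumer group
    auto_offset_reset="earliest",      # start from the oldest message if no offset is committed
)
for message in consumer:
    # Every message carries its topic, partition, and offset
    print(message.topic, message.partition, message.offset, message.value)
```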
Kafka Broker & Topic

Brokers
• A Kafka cluster is composed of brokers
• Each broker is identified by an id
• Each broker contains certain topic partitions
[Diagram: three brokers: Broker 101, Broker 102, Broker 103]

Brokers & Topics
[Diagram: Topic A's partitions 0, 1, and 2 and Topic B's partitions 0 and 1 spread across Broker 101, Broker 102, and Broker 103]
Example: Topic A with 3 partitions and Topic B with 2
Topic replication factor
• Topics should have a replication factor > 1 (usually 2 or 3)
• This way, if a broker is down, another broker can still serve the data
Example: Topic A with 2 partitions and a replication factor of 2
[Diagram: Topic A/Partition 0 lives on Brokers 101 and 102; Topic A/Partition 1 lives on Brokers 102 and 103]
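As a sketch, such a topic could be created programmatically with kafka-python's admin client (the broker address and topic name are assumptions; the kafka-topics command-line tool does the same job):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the cluster (broker address is an assumption)
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Topic A from the example: 2 partitions with replication factor 2,
# so each partition is stored on two different brokers
admin.create_topics([
    NewTopic(name="topic-a", num_partitions=2, replication_factor=2)
])
```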
Topic replication factor
[Diagram: the same layout with Broker 102 lost; Partition 0 is still available on Broker 101 and Partition 1 on Broker 103]
• If we lose Broker 102, we can still serve data from Brokers 101 and 103
Leader for a partition
• At any time, only ONE broker can be the leader for a given partition
• Only that leader can receive and serve data for the partition
• The other brokers synchronize the data
• Each partition has one leader and multiple ISRs (In-Sync Replicas)
[Diagram: Broker 101 is the leader for Topic A/Partition 0, Broker 102 is the leader for Topic A/Partition 1 and an ISR for Partition 0, and Broker 103 is an ISR for Partition 1]
• Producers can choose to receive acknowledgement of data writes
• acks=0 : the producer does not wait for acknowledgment (possible data loss)
• acks=1 : the producer waits for the leader's acknowledgment (limited data loss)
• acks=all : the producer waits for the leader and replica acknowledgments (no data loss)
[Diagram: two producers write to Topic A; Partition 0 on Broker 101, Partition 1 on Broker 102, and Partition 2 on Broker 103 each receive messages at sequential offsets 0, 1, 2, ...]
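A sketch of how these acknowledgement modes are configured with kafka-python (broker address and topic name are assumptions):

```python
from kafka import KafkaProducer

# acks=0    : fire-and-forget, no acknowledgment (fastest, possible data loss)
# acks=1    : wait for the partition leader only (limited data loss)
# acks="all": wait for the leader and all in-sync replicas (safest)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",
)

future = producer.send("topic-a", b"payment event")
record_metadata = future.get(timeout=10)   # raises an exception if the write was not acknowledged
print(record_metadata.partition, record_metadata.offset)
```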
• Producers write data to topics
• The load is balanced across many brokers
[Diagram: two producers write to Topic A, whose partitions 0, 1, and 2 live on Brokers 101, 102, and 103 respectively]
• Producers can choose to send a key with each message (string, number, ...)
• If key = null, data is sent to the partitions in a round-robin manner
• If a key is sent, all messages for that key go to the same partition
[Diagram: a producer writing to Topic A, which has Partitions 0, 1, and 2]
• Key = cc_payment_cc_123 : data will always go to partition 0 (every message with this key lands on the same partition)
• Key = cc_payment_cc_345 : data will always go to partition 1
• Key = cc_payment_cc_456 : data will always go to partition 1
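A sketch of keyed writes with kafka-python; the key is hashed to choose the partition, so a given key always lands on the same partition (the key values mirror the example above; the broker address and topic name are assumptions):

```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# All messages with the same key are hashed to the same partition,
# which preserves per-key ordering (e.g. all events for one credit card)
producer.send("topic-a", key=b"cc_payment_cc_123", value=b"charge $10")
producer.send("topic-a", key=b"cc_payment_cc_123", value=b"charge $25")  # same partition as the line above

# A message without a key is distributed in a round-robin manner
producer.send("topic-a", value=b"unkeyed event")
producer.flush()
```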
Consumer
• Consumers read data from topics
• Within each partition, data is read in order (by increasing offset)
[Diagram: two consumers read Topic A; Partitions 0, 1, and 2 are each read in order from offset 0 upward]
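A sketch of reading one partition in order with kafka-python (topic name and broker address are assumptions):

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

# Manually take one partition and read it from the start;
# messages come back strictly in offset order: 0, 1, 2, ...
partition = TopicPartition("topic-a", 0)
consumer.assign([partition])
consumer.seek_to_beginning(partition)

for message in consumer:
    print(f"partition={message.partition} offset={message.offset} value={message.value}")
```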
Consumer Groups
• Consumers read data in consumer groups
• Each consumer within a group reads from exclusive partitions (see the sketch after this slide)
• If you have more consumers than partitions, some consumers will be inactive
[Diagram: Topic A has Partitions 0, 1, and 2; in consumer group app 1, Consumers 1 and 2 share the three partitions; in consumer group app 2, Consumers 1, 2, and 3 get one partition each]
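A sketch of scaling out with a consumer group: running this same script several times with the same group_id makes Kafka divide the partitions among the running instances (all names are assumptions):

```python
from kafka import KafkaConsumer

# Every instance of this script that uses group_id="app-2" joins the same
# consumer group; Kafka assigns each instance an exclusive subset of partitions.
consumer = KafkaConsumer(
    "topic-a",
    bootstrap_servers="localhost:9092",
    group_id="app-2",
)

# The first poll joins the group; afterwards we can see which partitions this instance owns
consumer.poll(timeout_ms=1000)
print("assigned partitions:", consumer.assignment())

for message in consumer:
    print(message.partition, message.offset, message.value)
```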
Consumer Groups: what if there are too many consumers?
[Diagram: Topic A has Partitions 0, 1, and 2; consumer group app 2 runs Consumers 1, 2, 3, and 4, so Consumer 4 is inactive because there are only three partitions]
Consumer offsets
• Kafka stores the offsets at which a consumer group has been reading
• The committed offsets live in an internal Kafka topic named __consumer_offsets
• When a consumer in a group has processed the data received from Kafka, it should commit the offsets
• If a consumer dies, it can resume from where it left off, thanks to the committed offsets
[Diagram: a consumer from a consumer group reads a partition at offsets 1001 to 1008 and commits its offsets back to Kafka]
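A sketch of committing offsets explicitly and reading back the committed position with kafka-python (the handle() function and all names are hypothetical):

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    "topic-a",
    bootstrap_servers="localhost:9092",
    group_id="payments-app",
    enable_auto_commit=False,   # we commit offsets ourselves
)

for message in consumer:
    handle(message)             # hypothetical processing function
    consumer.commit()           # store the position in the __consumer_offsets topic

    # The committed offset is where the group resumes after a crash or restart
    committed = consumer.committed(TopicPartition("topic-a", message.partition))
    print("committed offset:", committed)
```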
Delivery semantics for consumers
• Consumers choose when to commit offsets
• There are 3 delivery semantics:
• At most once
  • Offsets are committed as soon as the message is received
  • If the processing goes wrong, the message is lost (it won't be read again)
• At least once
  • Offsets are committed only after the message is processed
  • If the processing goes wrong, the message will be read again
  • This can result in duplicate processing of messages, so make sure your processing is idempotent
• Exactly once
  • Achievable for Kafka-to-Kafka workflows using the transactional APIs; for Kafka-to-external-system workflows, use an idempotent consumer
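A sketch of where the commit goes for the first two semantics, using kafka-python with auto-commit disabled (the process() function and all names are hypothetical):

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "topic-a",
    bootstrap_servers="localhost:9092",
    group_id="payments-app",
    enable_auto_commit=False,
)

for message in consumer:
    # At most once: commit first, then process.
    # If process() fails, the offset is already committed and the message is lost.
    #
    #   consumer.commit()
    #   process(message)
    #
    # At least once: process first, then commit.
    # If process() fails before the commit, the message is re-read on restart,
    # which is why process() should be idempotent.
    process(message)        # hypothetical, idempotent processing function
    consumer.commit()
```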
Kafka Connectors
• You can use connectors to copy data between Apache Kafka and other systems that you want to pull data from or push data to
• Source Connectors import data from another system; Sink Connectors export data
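A sketch of registering a source connector through the Kafka Connect REST API from Python; it assumes a Connect worker running on its default port 8083 and uses the FileStreamSource connector that ships with Kafka, and the connector name, file path, and topic are illustrative assumptions:

```python
import json
import urllib.request

# Connector definition: tail a file and publish each line to a Kafka topic
connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "FileStreamSource",
        "file": "/tmp/input.txt",
        "topic": "connect-demo",
    },
}

# POST the definition to the Connect worker's REST API
request = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode())
```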
Streaming SQL for Apache Kafka
• Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.


Editor's Notes

• #4 This is a common data integration requirement in any large enterprise: source systems and target systems want to exchange data with one another. A target system could be another API, a database, or a utility. With four source systems and four target systems there are 16 possible integrations, which means managing URIs, connection details, and other configuration specific to each target system. It also means that every app in the source systems must be aware of all the APIs in the target systems it needs to call, and that the target systems must be available at the time the source system makes the call. This causes two major problems: over time the setup becomes highly unmaintainable, and the load on the target systems keeps increasing as more source systems are added. Source systems also need to implement ways of dealing with failed calls to the target systems.
• #5 Kafka provides solutions to both of our problems by decoupling source systems from target systems. Kafka is a highly scalable and fault-tolerant enterprise messaging system. It can be used as: (1) an enterprise messaging system, (2) a stream-processing platform, (3) a way to import or export bulk data from databases to other systems.
• #7 A Kafka cluster consists of one or more servers (Kafka brokers) running Kafka. Producers are processes that publish data (push messages) into Kafka topics within the broker. A consumer pulls messages off a Kafka topic.
• #8 All Kafka messages are organized into topics. Producer applications write data to topics and consumer applications read from topics. Messages published to the cluster stay there until a configurable retention period has passed; Kafka retains all messages for a set amount of time. Kafka topics are divided into a number of partitions, each containing messages in an immutable sequence. Each message in a partition is assigned and identified by its unique offset. A topic can have multiple partition logs, which allows multiple consumers to read from a topic in parallel. In Kafka, replication is implemented at the partition level. Details follow.