Introduction to Kafka

BY DUCAS FRANCIS

The problem
[Diagram: point-to-point links among Web, Mobile, API, Job, Security System, Real-time Monitoring, Logging System and Other services]
It’s simple enough at first…
Then it gets a little busy…
And ends up a mess.

The solution
[Diagram: Web, Mobile, API, Job, Security System, Real-time Monitoring, Logging System and Other services, all connected through a central Pub/Sub hub]
Decouple data pipelines using a pub/sub system
Producers Brokers Consumers

Apache Kafka
A UNIFIED, HIGH-THROUGHPUT, LOW-LATENCY PLATFORM FOR HANDLING REAL-TIME DATA FEEDS

A brief history lesson
 Originally developed at LinkedIn and open-sourced in 2011
 Graduated Apache Incubator in 2012
 Engineers from LinkedIn formed Confluent in 2014
 Up to version 0.9.0.x, with 0.10 on the horizon

Motivation
 Unified platform for all real-time data feeds
 High throughput for high volume streams
 Support periodic data loads from offline systems
 Low latency for traditional messaging
 Support partitioned, distributed, real-time processing
 Guarantee fault-tolerance

Common use cases
 Messaging
 Website activity tracking
 Metrics
 Log aggregation
 Stream processing
 Event sourcing
 Commit log

Benefits of Kafka
 High throughput
 Low latency
 Load balancing
 Fault tolerant
 Guaranteed delivery
 Secure

Some terminology
 Topic – feed of messages
 Producer – publishes messages to a topic
 Consumer – subscribes to topics and processes the feed of messages
 Broker – server instance that runs as part of a cluster

@apachekafkapowers
@microsot…

Libraries
 Python – kafka-python / pykafka
 Go – sarama / go_kafka_client / …
 C/C++ – librdkafka / libkafka / …
 .NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka
 Node.js – kafka-node / sutoiku/node-kafka / ...
 HTTP – kafka-pixy / kafka-rest
 etc.

Architecture
[Diagram: 2 producers, a cluster of 3 brokers with Zookeeper (label: x3), 2 consumers]

Show me the Kafka!!!
VAGRANT TO THE RESCUE

Anatomy of a topic
 Topics are broken into partitions
 Messages are assigned a sequential ID called an offset
 Data is retained for a configurable period of time
 Number of partitions can be increased after creation, but not decreased
 Partitions are assigned to brokers
Each partition is an ordered, immutable sequence of messages that is continually appended to… a commit log.
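
A minimal sketch of the idea, not Kafka's implementation: a partition modelled as an append-only list where a message's offset is simply its position in the log. The class name `Partition` and its methods are hypothetical, chosen for illustration only.

```python
class Partition:
    """Toy append-only log. Offsets are positions in the list (illustrative only)."""

    def __init__(self):
        self._log = []

    def append(self, message: bytes) -> int:
        """Append a message to the end of the log and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        end = offset + max_messages
        return list(enumerate(self._log))[offset:end]


partition = Partition()
first = partition.append(b"signup")    # offset 0
second = partition.append(b"login")    # offset 1
tail = partition.read(offset=1)        # only messages from offset 1 onwards
```

Existing entries are never modified; readers just pick the offset they want to start from.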

Broker
 Kafka service running as part of a cluster
 Receives messages from producers and serves them to consumers
 Coordinated using Zookeeper
 Zookeeper ensemble needs an odd number of nodes for quorum
 Store messages on the file system
 Replicate messages to/from other brokers
 Answer metadata requests about brokers and topics/partitions
 As of 0.9.0 – coordinate consumers

Replication
 Partitions on a topic should be replicated
 Each partition has 1 leader and 0 or more followers
 An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too far behind the leader
 Replication factor can be increased after creation, not decreased

./kafka-topics
--create
--replication-factor
--partitions
--describe

Producers
 Publishes messages to a topic
 Distributes messages across partitions
 Round-robin
 Key hashing
 Send synchronously or asynchronously to the broker that is the leader for the partition
 ACKS = 0 (none), 1 (leader), -1 (all ISRs)
 Synchronous is obviously slower, but more durable
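
The two distribution strategies above can be sketched roughly as follows. This is an illustration under stated assumptions: the `Partitioner` class is hypothetical, and `zlib.crc32` stands in for whatever hash function a real client library uses.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Toy partitioner: round-robin without a key, hash of the key otherwise."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # Round-robin: spread unkeyed messages evenly over partitions.
            return next(self._counter) % self.num_partitions
        # Key hashing: the same key always maps to the same partition.
        # crc32 is an assumption for this sketch, not a real client's hash.
        return zlib.crc32(key) % self.num_partitions


partitioner = Partitioner(num_partitions=4)
unkeyed = [partitioner.partition(None) for _ in range(6)]
keyed = [partitioner.partition(b"user-42") for _ in range(3)]
```

Key hashing is what makes per-key ordering possible: every message for `user-42` lands in one partition.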

Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN PUSH

Consumers
 Read messages from a topic
 Multiple consumers can read from the same topic
 Manage their offsets
 Messages stay in Kafka after they are consumed, until the retention period expires

Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN RECEIVE

It’s fast! But why…?
 Efficient protocol based on message set
 Batching messages to reduce network latency and small I/O operations
 Append/chunk messages to increase consumer throughput
 Optimised OS operations
 pagecache
 sendfile()
 Broker services consumers from cache where possible
 End-to-end batch compression
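
A rough illustration of why compressing a whole batch beats compressing message by message. The sample messages are made up, and gzip over newline-joined JSON is an assumption for this sketch, not Kafka's wire format.

```python
import gzip
import json

# Hypothetical sample data: 100 small, similar-looking events.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]

# Compressing each message separately pays the gzip overhead 100 times
# and cannot exploit repetition across messages.
per_message_bytes = sum(len(gzip.compress(m)) for m in messages)

# Compressing the batch once lets the compressor see the shared structure.
batch = gzip.compress(b"\n".join(messages))
batch_bytes = len(batch)

# The batch still decompresses back into the original messages.
restored = gzip.decompress(batch).split(b"\n")
```

Comparing `batch_bytes` with `per_message_bytes` shows the saving; the batch can also travel from producer to consumer without being unpacked in between.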

Load balanced consumers
 Distribute load across instances in a group by allocating partitions
 Handle failure by rebalancing partitions to other instances
 Commit their offsets to Kafka
[Diagram: a cluster of two brokers (Broker 1, Broker 2) holding partitions P0, P1, P2 and P3; Consumer Group 1 contains C0 and C1; Consumer Group 2 contains C2, C3, C4 and C6]
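
Allocation and rebalancing can be sketched like this. The `assign` function is a hypothetical, simplified round-robin; it is not one of Kafka's actual assignment strategies.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Simplified for illustration; assumes at least one consumer.
    """
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Two healthy instances share the partitions.
before = assign(partitions, ["C0", "C1"])

# C1 fails: rebalancing hands its partitions to the surviving instance.
after = assign(partitions, ["C0"])
```

With more consumers than partitions, the extra instances get an empty list and sit idle.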

Consumer groups and offsets
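[Diagram: a cluster of two brokers (Broker 1, Broker 2) holding partitions P0, P1, P2 and P3; Consumer Group 1 contains C0 and C1; partition P3 is drawn as a log of offsets 0 to 10, with markers for the read and commit positions of C0 and C1]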
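
The difference between a read position and a committed offset, as a minimal sketch. `GroupConsumer` and the `committed` dictionary are hypothetical stand-ins; in Kafka the committed offsets are stored by the cluster, not in a local dict.

```python
class GroupConsumer:
    """Toy consumer for one partition with separate read and commit positions."""

    def __init__(self, partition, log, committed):
        self.partition = partition
        self.log = log
        self.committed = committed  # stand-in for Kafka's offset storage
        # Start from the last committed offset, or 0 if nothing was committed.
        self.position = committed.get(partition, 0)

    def poll(self):
        """Return the next message and advance the read position."""
        if self.position >= len(self.log):
            return None
        message = self.log[self.position]
        self.position += 1
        return message

    def commit(self):
        """Record the current read position as the committed offset."""
        self.committed[self.partition] = self.position


log = [f"m{i}" for i in range(11)]   # offsets 0..10
committed = {}

c1 = GroupConsumer("P3", log, committed)
for _ in range(3):
    c1.poll()
c1.commit()                          # committed offset is now 3
c1.poll()
c1.poll()                            # read m3 and m4, then "crash" before committing

# A replacement instance resumes from the committed offset, not the read position.
replacement = GroupConsumer("P3", log, committed)
resumed = replacement.poll()
```

Messages read but not committed are delivered again after a failure, so processing should tolerate duplicates.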

Guarantees
 Messages sent by a producer to a particular topic’s partition will be appended in the order they are sent
 A consumer instance sees messages in the order they are stored in the log
 For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log

Ordered delivery
 Messages are guaranteed to be delivered in order by partition, NOT topic
P0: M1 M3 M5
P1: M2 M4 M6
 M1 before M3 before M5 – YES
 M1 before M2 – NO
 M2 before M4 before M6 – YES
 M2 before M3 – NO
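
A small simulation of the rule, assuming a consumer that drains both partitions and picks which one to read next at random. The function names are hypothetical, and the random interleaving only stands in for real-world timing.

```python
import random

partitions = {"P0": ["M1", "M3", "M5"], "P1": ["M2", "M4", "M6"]}


def consume_interleaved(partitions, rng):
    """Drain all partitions, choosing the next partition to read at random."""
    cursors = {name: 0 for name in partitions}
    delivered = []
    while True:
        ready = [n for n in partitions if cursors[n] < len(partitions[n])]
        if not ready:
            return delivered
        name = rng.choice(ready)
        delivered.append(partitions[name][cursors[name]])
        cursors[name] += 1


def in_order(sequence, delivered):
    """True if the messages of sequence appear in delivered in the same order."""
    positions = [delivered.index(m) for m in sequence]
    return positions == sorted(positions)


delivered = consume_interleaved(partitions, random.Random(7))
p0_ok = in_order(partitions["P0"], delivered)   # always True
p1_ok = in_order(partitions["P1"], delivered)   # always True
```

Whether M1 arrives before M2 depends on the interleaving, which is why cross-partition order is not guaranteed.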

Enough ALT… now .NET
USING RDKAFKA-DOTNET

Resources
 http://kafka.apache.org/documentation.html
 http://www.confluent.io/
 https://kafka.apache.org/090/configuration.html
 https://github.com/edenhill/librdkafka
 https://github.com/ah-/rdkafka-dotnet

Log compaction
 Keep the most recent payload for a key
 Use cases
 Database change subscription
 Event sourcing
 Journaling for HA
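
Keeping only the most recent payload per key can be sketched as below. The `compact` function is hypothetical and runs over an in-memory list; it only illustrates the outcome, not how Kafka cleans log segments.

```python
def compact(log):
    """Keep only the latest record for each key, preserving original offsets.

    log is a list of (key, value) pairs; the list index is the offset.
    """
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset          # later records overwrite earlier ones
    keep = set(latest_offset.values())
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if offset in keep
    ]


# Hypothetical change feed for two database rows.
log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),
]
compacted = compact(log)
```

A consumer replaying the compacted log still ends up with the current value for every key, which is what makes it usable for change subscription and state recovery.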
