4. Apache Kafka
A UNIFIED, HIGH-THROUGHPUT, LOW-LATENCY PLATFORM FOR HANDLING REAL-TIME DATA FEEDS
5. A brief history lesson
Originally developed at LinkedIn in 2011
Graduated Apache Incubator in 2012
Engineers from LinkedIn formed Confluent in 2014
Up to version 0.9.4 with 0.10 on horizon
6. Motivation
Unified platform for all real-time data feeds
High throughput for high volume streams
Support periodic data loads from offline systems
Low latency for traditional messaging
Support partitioned, distributed, real-time processing
Guarantee fault-tolerance
11. Some terminology
Topic – feed of messages
Producer – publishes messages to a topic
Consumer – subscribes to topics and processes the feed of messages
Broker – server instance that acts in a cluster
16. Anatomy of a topic
Topics are broken into partitions
Messages are assigned a sequential ID called an offset
Data is retained for a configurable period of time
Number of partitions can be increased after creation, but not decreased
Partitions are assigned to brokers
Each partition is an ordered, immutable sequence of messages that is continually appended to… a commit log.
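The partition described on this slide can be pictured as an append-only list in which a message's offset is simply its position. The sketch below is a toy in-memory model, not Kafka's on-disk format; the class name `PartitionLog` and its methods are hypothetical and exist only for illustration.

```python
class PartitionLog:
    """Toy model of one partition: an append-only sequence of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset, max_messages=10):
        """Return up to max_messages (offset, message) pairs starting at offset."""
        chunk = self._messages[offset:offset + max_messages]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
first = log.append(b"page_view")   # assigned offset 0
second = log.append(b"click")      # assigned offset 1
```

Existing entries are never modified or reordered; new messages only ever land at the end, which is what makes the offset a stable identifier.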
17. Broker
Kafka service running as part of a cluster
Receives messages from producers and serves them to consumers
Coordinated using Zookeeper
Need an odd number of Zookeeper nodes for quorum
Store messages on the file system
Replicate messages to/from other brokers
Answer metadata requests about brokers and topics/partitions
As of 0.9.0 – coordinate consumers
18. Replication
Partitions on a topic should be replicated
Each partition has 1 leader and 0 or more followers
An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too far behind the leader
Replication factor can be increased after creation, not decreased
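One way to read "not too far behind the leader" is as a time threshold: a follower stays in the ISR as long as it was fully caught up recently enough. The sketch below assumes that interpretation, loosely modelled on the `replica.lag.time.max.ms` broker setting. The function name and its arguments are hypothetical, and the Zookeeper session check is left out.

```python
def in_sync_replicas(leader_id, last_caught_up_ms, now_ms, max_lag_ms=10_000):
    """Return the sorted ids of replicas considered in sync.

    last_caught_up_ms maps follower id -> the last time (in ms) that follower
    had fully caught up with the leader. The leader always counts as in sync.
    """
    isr = {leader_id}
    for replica_id, caught_up_at in last_caught_up_ms.items():
        # A follower drops out once it has lagged for longer than the threshold.
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(replica_id)
    return sorted(isr)
```

For example, with broker 1 as leader, a follower that caught up 2.5 seconds ago stays in the ISR, while one that last caught up 12 seconds ago is dropped.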
20. Producers
Publishes messages to a topic
Distributes messages across partitions
Round-robin
Key hashing
Send synchronously or asynchronously to the broker that is the leader for the partition
ACKS = 0 (none), 1 (leader), -1 (all ISRs)
Synchronous is obviously slower, but more durable
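The two distribution strategies on this slide can be sketched in a few lines. This is an illustration only: the class name `SketchPartitioner` is hypothetical, and CRC32 stands in for the hash function (Kafka's Java client uses murmur2), so the partition numbers produced here will not match a real cluster.

```python
import itertools
import zlib


class SketchPartitioner:
    """Pick a partition: round-robin without a key, hashing with one."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._round_robin)
        # Keyed: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions
```

Key hashing is what lets a producer keep all messages for one entity (a user, an order) in a single partition, and therefore in order.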
22. Consumers
Read messages from a topic
Multiple consumers can read from the same topic
Manage their offsets
Messages stay on Kafka after they are consumed
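Because each consumer only tracks a position in the log, several consumers can read the same messages independently and nothing is removed by reading. A minimal sketch, using a plain Python list as the log and a hypothetical `SketchConsumer` class:

```python
class SketchConsumer:
    """Reads from a shared log and remembers its own offset."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=2):
        """Return the next batch and advance this consumer's offset."""
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Rewind or skip ahead; the log itself is untouched."""
        self.offset = offset
```

Rewinding with `seek(0)` replays the whole log, which only works because messages stay in place after they are consumed.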
24. It’s fast! But why…?
Efficient protocol based on message set
Batching messages to reduce network latency and small I/O operations
Append/chunk messages to increase consumer throughput
Optimised OS operations
pagecache
sendfile()
Broker services consumers from cache where possible
End-to-end batch compression
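The benefit of compressing a batch rather than single messages can be shown with a rough experiment. The sketch below uses `zlib` as a stand-in for whichever codec is configured, and the message contents and length-prefixed framing are made up for illustration; it is not Kafka's message set format.

```python
import json
import zlib


def compress_individually(messages):
    """Total bytes when every message is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in messages)


def compress_as_batch(messages):
    """Total bytes when the messages are framed together and compressed once."""
    framed = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return len(zlib.compress(framed))


# Hypothetical activity-tracking events that share most of their structure.
messages = [
    json.dumps({"event": "page_view", "user_id": i, "path": "/products"}).encode()
    for i in range(100)
]
```

Messages of the same type repeat field names and values, so a compressor sees far more redundancy across a batch than inside any single message.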
25. Load balanced consumers
Distribute load across instances in a group by allocating partitions
Handle failure by rebalancing partitions to other instances
Commit their offsets to Kafka
[Diagram: a cluster of two brokers (Broker 1, Broker 2) holding partitions P0 to P3. Consumer Group 1 contains consumers C0 and C1; Consumer Group 2 contains consumers C2, C3, C4 and C6.]
27. Guarantees
Messages sent by a producer to a particular topic’s partition will be appended in the order they are sent
A consumer instance sees messages in the order they are stored in the log
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log
28. Ordered delivery
Messages are guaranteed to be delivered in order by partition, NOT topic
P0: M1 M3 M5
P1: M2 M4 M6
M1 before M3 before M5 – YES
M1 before M2 – NO
M2 before M4 before M6 – YES
M2 before M3 – NO
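The YES/NO cases above can be checked with a small simulation. The sketch below merges the two partitions into one stream in an arbitrary order while keeping each partition's own order intact; the function names are hypothetical and the random interleaving is only a stand-in for whatever order a real consumer happens to fetch in.

```python
import random


def interleave(partitions, seed=None):
    """Merge partitions into one stream, preserving order within each partition."""
    rng = random.Random(seed)
    cursors = {name: 0 for name in partitions}
    merged = []
    while True:
        remaining = [n for n in partitions if cursors[n] < len(partitions[n])]
        if not remaining:
            break
        name = rng.choice(remaining)  # which partition is read next is arbitrary
        merged.append(partitions[name][cursors[name]])
        cursors[name] += 1
    return merged


def arrives_before(stream, a, b):
    """True if message a appears earlier in the stream than message b."""
    return stream.index(a) < stream.index(b)


partitions = {"P0": ["M1", "M3", "M5"], "P1": ["M2", "M4", "M6"]}
```

Whatever the seed, M1 precedes M3 and M3 precedes M5; whether M1 precedes M2 depends on the interleaving, which is why ordering across partitions cannot be relied on.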
High throughput – web activity tracking receiving tens of events per page hit or interaction.
Periodic data loads – every 5 minutes, receiving hundreds of thousands of messages
Low latency – pub/sub in milliseconds
Distributed – anyone sending or receiving messages should be able to achieve high availability (HA)
cd ~/Projects/kafka-vagrant
vagrant status
vagrant up
vagrant ssh kafka-1
cat /etc/kafka/server.properties
https://kafka.apache.org/090/configuration.html
Topic – feed of messages
Partition – topics are broken into partitions
Messages – written to the end of a partition within a topic and assigned a sequential identifier (a 64-bit integer), which is called an offset
Data is retained within a partition for a configurable amount of time. The time is defaulted in broker configuration, but can be set per topic. Messages are stored on the file system in segmented files.
Number of partitions can be increased after creation, but not decreased. This is because (as mentioned) the messages are stored on the file system on a per-partition basis, so reducing partitions would be effectively deleting data.
Partitions are assigned to brokers – not topics. Kafka attempts to balance the number of partitions across the available brokers, and the assignment can also be configured manually. This is how Kafka load balances its activity: in theory, brokers holding an equal number of partitions should receive an equal number of send and fetch requests.
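A simplified picture of how partitions could be spread over brokers is plain round-robin, with each partition's extra replicas placed on the brokers that follow. This is a sketch under that assumption, not Kafka's actual assignment algorithm; the function name `assign_replicas` is hypothetical.

```python
def assign_replicas(num_partitions, broker_ids, replication_factor):
    """Map each partition to a list of brokers; the first entry acts as leader."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start one broker further along for each partition, then wrap around.
        assignment[partition] = [
            broker_ids[(partition + replica) % len(broker_ids)]
            for replica in range(replication_factor)
        ]
    return assignment
```

With four partitions, two brokers and a replication factor of 2, each broker ends up leading two partitions and following the other two.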
…
The responsibilities of coordination are split between ZK and Kafka. Older versions of Kafka relied more on ZK, but coordination is being moved into the broker, and ZK is used more for service discovery and configuration.
Before 0.9.0, consumers were coordinated by ZK and had to carry a lot of logic around which partitions were assigned to them. This was changed so that, for the new consumer, a broker is assigned to be the consumer coordinator and tells the consumers which partitions are assigned to them.
Modern OSs maintain a page cache and aggressively use main memory for disk caching. If you ignore it and keep your own in-memory representation of the data, you effectively double the amount of memory your application consumes. By relying on the page cache instead, you use all available RAM for caching without GC penalties. The cache also stays in memory even if the application is restarted.
This is obviously advantageous when reading messages, but also when writing.
Rather than maintain as much as possible in-memory and flush it all out to the file system in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
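The difference between writing to the log and flushing to disk can be seen with the operating system's own calls. In the sketch below, `os.write` hands the bytes to the kernel (where they sit in the pagecache) and `os.fsync` is the optional step that forces them onto the physical disk. The function name, file name and records are hypothetical.

```python
import os
import tempfile


def append_to_log(fd, record, force_to_disk=False):
    """Append one record; by default it only reaches the kernel's pagecache."""
    written = os.write(fd, record)
    if force_to_disk:
        # Block until the kernel has pushed the file's data to the device.
        os.fsync(fd)
    return written


path = os.path.join(tempfile.mkdtemp(), "segment.log")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
try:
    append_to_log(fd, b"m1\n")                      # pagecache only
    append_to_log(fd, b"m2\n", force_to_disk=True)  # explicitly flushed
finally:
    os.close(fd)
```

Both records are immediately visible to readers of the file, because reads are served from the same pagecache; the flush only affects what survives a machine crash.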
Modern Unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket – the sendfile system call. Without it, sending a file over the network takes four steps:
OS reads data from a file into pagecache in kernel space
Application reads from kernel space to a user-space buffer
Application writes data back to kernel space into a socket buffer
OS copies from socket buffer to NIC buffer to send over the network
sendfile avoids this by instructing the OS to send data directly from the pagecache to the NIC. This means that consumers that are caught up will be served completely from memory.
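Python exposes the same idea through `socket.sendfile`, which uses the operating system's sendfile call where it is available and falls back to ordinary sends otherwise. The sketch below pushes a small file through a local socket pair; the function names, the file name and the payload are hypothetical, and a real broker would of course be serving a remote consumer.

```python
import os
import socket
import tempfile


def serve_segment(path, sock):
    """Send a whole file over a connected socket; returns the bytes sent."""
    with open(path, "rb") as segment:
        return sock.sendfile(segment)


def demo(payload=b"m1\nm2\nm3\n"):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.log")
        with open(path, "wb") as f:
            f.write(payload)
        broker_side, consumer_side = socket.socketpair()
        with broker_side, consumer_side:
            sent = serve_segment(path, broker_side)
            broker_side.shutdown(socket.SHUT_WR)  # signal end of stream
            received = b""
            while True:
                chunk = consumer_side.recv(4096)
                if not chunk:
                    break
                received += chunk
    return sent, received
```

The application never reads the file contents into its own buffer; it only tells the kernel which file to send and where.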
Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier.
For each group a broker is selected as the group coordinator. The coordinator is responsible for managing the state of the group. Its main job is to mediate partition assignment when new members arrive, old members depart, and when topic metadata changes. The act of reassigning partitions is known as rebalancing the group.
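Rebalancing can be sketched as recomputing the partition assignment every time group membership changes. The example below is a simplification: the class and function names are hypothetical, the coordinator computes the assignment itself for brevity, and partitions are dealt out round-robin rather than by any strategy Kafka ships with.

```python
def assign(partitions, members):
    """Deal partitions out to the sorted members of one group, round-robin."""
    members = sorted(members)
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


class GroupCoordinator:
    """Tracks one group's members and rebalances whenever they change."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = set()
        self.assignment = {}

    def join(self, member):
        self.members.add(member)
        self._rebalance()

    def leave(self, member):
        self.members.discard(member)
        self._rebalance()

    def _rebalance(self):
        # An empty group owns nothing; otherwise every partition has one owner.
        self.assignment = assign(self.partitions, self.members) if self.members else {}
```

With partitions P0 to P3 and consumers C0 and C1, each consumer owns two partitions; if C1 leaves, the rebalance hands all four to C0.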