Apache Kafka - Messaging System 
Dmitry Tolpeko, EPAM Systems – September 2014
Kafka Overview 
Kafka is a real-time, fault-tolerant, scalable messaging system.
It is a publish-subscribe system that connects applications - producers and consumers of information - through messages.
Producers and consumers are independent, messages are queued, and one producer can serve multiple consumers.
Kafka was originally developed at LinkedIn.
SECTION 
Apache Kafka 
CONCEPTS 
Kafka Architecture 
[Diagram: Producer(s) (client), Broker(s) (server), Consumer(s) (client), and ZooKeeper.]
• Brokers act as the server part of Kafka. Brokers are peers; there is no master broker.
• Brokers can run on multiple nodes, and you can also run multiple brokers on a single node. Each broker has its own IP address and port for client connections.
Topics
A topic is a way to handle multiple data streams (for example, different data feeds).
Each producer sends messages to a specified topic, and consumers read messages from it.
New topics can be created automatically when a message for a new topic arrives, or you can create a topic explicitly with the --create command.
[Diagram: Producer 1, Producer 2, and Producer 3 send messages to Topic 1 and Topic 2 on a Broker; Consumer 1 and Consumer 2 read from the topics.]
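The two ways of creating a topic can be illustrated with a small in-memory model. This is only a sketch: ToyBroker, create_topic, publish, and read are hypothetical names invented for the example and are not part of any Kafka client library.

```python
class ToyBroker:
    """In-memory stand-in for a broker; not the Kafka API."""

    def __init__(self, auto_create=True):
        self.auto_create = auto_create
        self.topics = {}  # topic name -> list of messages

    def create_topic(self, name):
        # Explicit creation, analogous to using a --create command.
        self.topics.setdefault(name, [])

    def publish(self, topic, message):
        if topic not in self.topics:
            if not self.auto_create:
                raise KeyError(f"unknown topic: {topic}")
            self.create_topic(topic)  # created when the first message arrives
        self.topics[topic].append(message)

    def read(self, topic):
        # Return a copy so callers cannot mutate the stored stream.
        return list(self.topics[topic])


broker = ToyBroker()
broker.publish("clicks", "page=/home")   # topic is created automatically
broker.create_topic("logs")              # topic is created explicitly
broker.publish("logs", "service started")
```

With auto_create=False the same publish call to an unknown topic raises an error, which mirrors a cluster where automatic topic creation is switched off.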
Partitions
A topic can contain one or more partitions.
Each partition is stored on a single server, and multiple partitions allow the queue to scale beyond the limits of a single system.
Partitions also allow a single consumer to read messages concurrently in multiple threads.
You can add new partitions dynamically.
An offset uniquely identifies a message within a partition.
[Diagram: Broker 1 holds Topic 1 / Partition 1 and Topic 2 / Partition 1; Broker 2 holds Topic 2 / Partition 2 and Partition 3.]
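A minimal sketch of how offsets relate to partitions, assuming each partition is modelled as an append-only Python list whose index serves as the offset. The Partition and Topic classes are hypothetical and exist only for illustration.

```python
class Partition:
    """Append-only list of messages; the list index plays the role of the offset."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read(self, offset, max_messages=10):
        # Reading never removes anything, so any offset can be read again.
        return self._log[offset:offset + max_messages]


class Topic:
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def add_partition(self):
        # Partitions can be added later; existing offsets are unaffected.
        self.partitions.append(Partition())
        return len(self.partitions) - 1


topic = Topic(num_partitions=2)
first = topic.partitions[0].append("a")
second = topic.partitions[0].append("b")
other = topic.partitions[1].append("c")  # offsets are counted per partition
```

Because offsets are counted separately in each partition, a message is identified by the pair (partition, offset) rather than by the offset alone.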
Replication
Each partition is replicated for fault tolerance.
Each partition has one server that acts as the Leader; it handles all read and write requests.
Zero or more servers act as Followers; they replicate the leader, and if it fails, one of them becomes the new Leader.
The Leader uses the ZooKeeper heartbeat mechanism to indicate that it is alive.
A follower acts as a normal consumer: it pulls messages and updates its own log.
Only when all followers in the in-sync replica (ISR) group have synced a message can it be sent to consumers.
When a follower rejoins after downtime, it can re-sync.
[Diagram: Broker 1 holds Topic 1 / Partition 1 as Leader; Broker 2 holds Topic 1 / Partition 1 as Follower.]
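The rule that consumers only see fully replicated messages can be sketched as follows. This is a toy model under simplifying assumptions: ReplicatedPartition and its methods are hypothetical names, and every follower in the dictionary is treated as a member of the ISR group.

```python
class ReplicatedPartition:
    """Toy leader/follower model; names and structure are illustrative only."""

    def __init__(self, follower_ids):
        self.leader_log = []
        # Number of messages each in-sync follower has copied so far.
        self.follower_positions = {fid: 0 for fid in follower_ids}

    def produce(self, message):
        # All writes go to the leader.
        self.leader_log.append(message)

    def follower_fetch(self, follower_id):
        # A follower pulls everything it is missing, like a normal consumer.
        self.follower_positions[follower_id] = len(self.leader_log)

    def high_watermark(self):
        # Messages below this position are on the leader and every in-sync follower.
        positions = list(self.follower_positions.values())
        return min(positions + [len(self.leader_log)])

    def consumer_read(self):
        # Consumers only see messages that all in-sync replicas have.
        return self.leader_log[:self.high_watermark()]


partition = ReplicatedPartition(["broker2", "broker3"])
partition.produce("m1")
partition.follower_fetch("broker2")
visible_before = partition.consumer_read()   # broker3 has not synced yet
partition.follower_fetch("broker3")
visible_after = partition.consumer_read()    # now every replica has m1
```

The slowest in-sync follower determines what consumers can read, which is why a message acknowledged by the leader alone is not yet visible.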
Consumer Groups
Consumers are organized into consumer groups.
For a single message to be consumed by multiple consumers, they must belong to different consumer groups.
A consumer group is a single-consumer abstraction: consumers in the same group read messages as if from a queue, and there is no message broadcast within the group. This helps balance load among consumers of the same type (fault tolerance, scalability).
The state of consumed messages is handled by consumers, not brokers.
Consumers store this state in ZooKeeper: the offset within each partition for each consumer group, not for each consumer (!)
A consumer group name is unique within the Kafka cluster.
[Diagram: Topic 1 with Partition 1, Partition 2, and Partition 3; Group 1 has two consumers, Group 2 has one consumer.]
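Keeping one offset per (group, partition) pair is what gives queue behaviour inside a group and broadcast behaviour across groups. The sketch below assumes a plain dictionary in place of ZooKeeper; GroupOffsets, poll, and the group names are hypothetical.

```python
class GroupOffsets:
    """Toy offset store keyed by (group, partition), standing in for ZooKeeper."""

    def __init__(self):
        self._offsets = {}

    def get(self, group, partition):
        return self._offsets.get((group, partition), 0)

    def commit(self, group, partition, offset):
        self._offsets[(group, partition)] = offset


def poll(log, offsets, group, partition):
    """Return unread messages for the group and advance the group's offset."""
    start = offsets.get(group, partition)
    messages = log[start:]
    offsets.commit(group, partition, len(log))
    return messages


partition_log = ["m0", "m1", "m2"]
store = GroupOffsets()

billing_first = poll(partition_log, store, "billing", 0)    # any consumer in "billing"
billing_second = poll(partition_log, store, "billing", 0)   # another consumer, same group
audit_first = poll(partition_log, store, "audit", 0)        # a different group
```

The second poll from the same group receives nothing new, while the other group still receives every message, because the stored position belongs to the group rather than to an individual consumer.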
Order Guarantees and Delivery Semantics 
Each partition can be consumed by only one consumer within a consumer group.
Kafka provides a total order guarantee only within a partition, not across different partitions of a topic.
If you need a total order over all messages, you have to use a single partition, and in that case you can use only one consumer process.
By default Kafka guarantees at-least-once delivery: messages are never lost but may be redelivered (keys can be used to handle duplicates). Kafka offers options to disable retries (so messages can be lost) if the application can tolerate this and needs higher performance.
Kafka retains all published messages, whether or not they have been consumed, for a configured period of time (7 days by default).
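One way to use keys to handle duplicates under at-least-once delivery is to make the handler idempotent. The sketch below is an assumption about how an application might do this, not something Kafka provides: deliver_at_least_once simulates retries, and IdempotentHandler is a hypothetical application class.

```python
def deliver_at_least_once(messages, redeliver_indexes):
    """Yield messages in order, repeating those at the given indexes (simulated retries)."""
    for i, msg in enumerate(messages):
        yield msg
        if i in redeliver_indexes:
            yield msg  # duplicate caused by a retry


class IdempotentHandler:
    """Applies each keyed message once by remembering the keys already seen."""

    def __init__(self):
        self.seen_keys = set()
        self.applied = []

    def handle(self, key, value):
        if key in self.seen_keys:
            return False  # duplicate, skip it
        self.seen_keys.add(key)
        self.applied.append(value)
        return True


messages = [("order-1", 10), ("order-2", 25), ("order-3", 5)]
handler = IdempotentHandler()
for key, value in deliver_at_least_once(messages, redeliver_indexes={1}):
    handler.handle(key, value)
```

The set of seen keys grows without bound here; a real application would need to persist and prune it, which is outside the scope of this sketch.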
Producers
A producer can assign a key to a message; the key determines which partition the message is published to. Partitioning options:
• Random (default, when no partitioner class or key is specified)
• Round-robin for load balancing
• Partition function (for example, a hash of the message key) - if the key identifies a class of messages (for example, a Source ID), then all messages of the same type go to one partition.
A producer can optionally require an acknowledgment from the broker that the message was received (synced to the Leader or to all followers).
Kafka can group multiple messages and compress them.
[Diagram: Producer 1 and Producer 2 publish to Topic 1, which has Partition 1, Partition 2, and Partition 3.]
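Hash-based and round-robin partition selection can be sketched in a few lines. The use of CRC32 is an assumption made so that the example is deterministic; the hash function used by Kafka's own partitioner may differ, and hash_partition and round_robin are hypothetical helper names.

```python
import itertools
import zlib


def hash_partition(key: bytes, num_partitions: int) -> int:
    # CRC32 is an assumption chosen for determinism; Kafka's own hash may differ.
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int):
    # Endless cycle 0, 1, ..., n-1, 0, 1, ... for load balancing without keys.
    return itertools.cycle(range(num_partitions))


NUM_PARTITIONS = 3
events = [(b"sensor-a", 1), (b"sensor-b", 2), (b"sensor-a", 3)]
placed = [(hash_partition(key, NUM_PARTITIONS), value) for key, value in events]

rr = round_robin(NUM_PARTITIONS)
first_four = [next(rr) for _ in range(4)]
```

Both events keyed b"sensor-a" land in the same partition, so their relative order is preserved. Note that changing the number of partitions changes the mapping for existing keys.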
Consumers
Consumers read messages from the brokers that lead the partitions (pull method).
A consumer labels itself with a consumer group.
If a consumer group has more consumers than there are partitions, some consumers will never see a message.
If there are more partitions than consumers in a consumer group, a consumer can receive messages from multiple partitions (with no ordering guarantee across them). When you add consumers, Kafka rebalances the partitions.
Consumers can receive a compressed batch of messages as a single message.
[Diagram: Partition 1, Partition 2, and Partition 3 are read by Group 1, which contains Consumer 1, Consumer 2, Consumer 3, and Consumer 4.]
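The effect of having more or fewer consumers than partitions can be seen with a simple assignment function. The modulo scheme below is an assumption made for illustration and is not Kafka's actual rebalancing algorithm; assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


partitions = [1, 2, 3]
four = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])  # c4 stays idle
two = assign_partitions(partitions, ["c1", "c2"])               # c1 reads two partitions
```

Calling the function again after the consumer list changes plays the role of a rebalance: every partition still has exactly one owner within the group.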
Consumer Advanced Features 
There are two consumer APIs: High Level and Simple.
A High Level Consumer uses the auto.commit.interval.ms option, which defines how often the offset is updated in ZooKeeper. If an error occurs between updates, the consumer will receive replayed messages (!)
The Simple Consumer is a low-level API that allows you to set any offset, explicitly read messages multiple times, or ensure that a message is processed only once.
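The replay effect of periodic offset commits can be simulated without a broker. As a simplification, this sketch commits after a fixed number of messages rather than after a time interval, and consume_with_auto_commit is a hypothetical function, not part of either consumer API.

```python
def consume_with_auto_commit(log, start_offset, commit_every, crash_at=None):
    """Process messages from start_offset, committing every commit_every messages.

    Returns (processed_messages, last_committed_offset). crash_at is the offset
    at which a simulated failure stops the consumer before processing that message.
    """
    processed = []
    committed = start_offset
    for offset in range(start_offset, len(log)):
        if crash_at is not None and offset == crash_at:
            return processed, committed  # uncommitted progress is lost
        processed.append(log[offset])
        if (offset + 1 - start_offset) % commit_every == 0:
            committed = offset + 1  # next offset to read after a restart
    return processed, len(log)  # a clean shutdown commits the final position


log = [f"m{i}" for i in range(10)]
first_run, committed = consume_with_auto_commit(log, 0, commit_every=4, crash_at=6)
second_run, _ = consume_with_auto_commit(log, committed, commit_every=4)
replayed = [m for m in second_run if m in first_run]
```

Messages processed after the last commit but before the failure are delivered again on restart. A shorter commit interval reduces the number of replayed messages at the cost of more frequent offset updates.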
SECTION 
Apache Kafka 
INTERNALS 
Persistence
Kafka relies heavily on the OS disk cache rather than the JVM heap, even for caching messages. Data is immediately written (appended) to a file. Consumed messages are not deleted.
Data files (called logs) are stored in the directories given by log.dirs.
A directory exists for each topic partition and contains log segments (files such as 0000000.log, named after the offset of the first message in the segment). log.segment.bytes and log.roll.hours define the rotation policy.
The log.flush.interval.xxx options define how often fsync is performed on files.
All options can be specified either globally or per topic.
[Diagram: the Broker (JVM app) writes through the OS page cache to /data/kafka-logs/TopicName-0/00000.log.]
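Naming each segment after the offset of its first message makes it cheap to find the file that holds a given offset. The sketch below assumes a sorted list of segment start offsets and a 20-digit zero-padded file name; both helper functions and the sample offsets are hypothetical.

```python
import bisect


def segment_file_name(base_offset: int) -> str:
    # Zero-padded so that names sort in offset order; the width of 20 is an assumption.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, offset):
    """Return the base offset of the segment that holds the given message offset."""
    if not base_offsets or offset < base_offsets[0]:
        raise ValueError("offset is before the first retained segment")
    index = bisect.bisect_right(base_offsets, offset) - 1
    return base_offsets[index]


bases = [0, 1000, 2500]             # hypothetical segment start offsets, sorted
holder = find_segment(bases, 1730)  # falls in the segment starting at 1000
name = segment_file_name(holder)
```

A binary search over the base offsets is enough to locate the segment, so the lookup cost grows only logarithmically with the number of segment files.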
Network I/O
Messages can be grouped together to minimize the number of network round-trips.
Multiple messages can also be compressed together (GZIP, Snappy), which helps achieve a good compression ratio and reduces the amount of data sent over the network.
A producer can specify compression.codec and compressed.topics.
[Diagram: Message1, Message2, and Message3 are compressed together and sent over the network.]
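The benefit of compressing messages together rather than one at a time can be checked with the standard gzip module. This is a sketch with made-up sample messages and a newline-delimited JSON encoding chosen for the example; it does not reproduce Kafka's wire format.

```python
import gzip
import json


def encode(message: dict) -> bytes:
    return json.dumps(message, sort_keys=True).encode("utf-8")


def compress_individually(messages):
    return [gzip.compress(encode(m)) for m in messages]


def compress_batch(messages):
    # One newline-delimited payload, compressed once and sent in one request.
    payload = b"\n".join(encode(m) for m in messages)
    return gzip.compress(payload)


def decompress_batch(blob):
    return [json.loads(line) for line in gzip.decompress(blob).split(b"\n")]


messages = [{"user": f"user-{i}", "event": "page_view", "page": "/home"} for i in range(100)]
individual_bytes = sum(len(b) for b in compress_individually(messages))
batch_bytes = len(compress_batch(messages))
```

The batch is smaller because the compressor can exploit repetition across messages and pays the fixed gzip header cost only once.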
Memory
There is no in-memory application-level cache; data is kept in the OS page cache.
Kafka uses the Linux sendfile API call, which sends data directly from the page cache to a network socket, so there is no need to read data into, or write it from, application memory space.
Grouped messages are stored compressed in the log and are decompressed only by consumers.
[Diagram: data flows from the OS page cache to the network, bypassing the Broker (JVM app).]
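The sendfile approach can be tried from Python, whose socket.sendfile method uses os.sendfile where the platform supports it and otherwise falls back to ordinary send calls. This is a sketch only: it sends a temporary file over a local socket pair rather than a real network connection, and send_log_segment is a hypothetical helper.

```python
import os
import socket
import tempfile


def send_log_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected socket and return the byte count."""
    with open(path, "rb") as segment:
        # No read() into a Python buffer: the kernel copies file data to the socket.
        return conn.sendfile(segment)


# Demonstration with a local socket pair instead of a real network peer.
data = b"message-1\nmessage-2\nmessage-3\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    segment_path = tmp.name

sender, receiver = socket.socketpair()
try:
    sent = send_log_segment(segment_path, sender)
    received = b""
    while len(received) < sent:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        received += chunk
finally:
    sender.close()
    receiver.close()
    os.remove(segment_path)
```

The application code never holds the file contents in its own memory on the sending side, which is the property the slide describes for the broker.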
Log Compaction 
Without log compaction (time series data):
Key1  Key2  Key3  Key1  Key2  Key1  Key3
A     B     C     AA    BB    AAA   CC
With log compaction, only the last update is stored for each key:
Key2  Key1  Key3
BB    AAA   CC
Log compaction can be defined per topic. It can help increase the performance of roll-forward operations and reduce storage.
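The compaction example above can be reproduced with a short function. This is a sketch of the idea only, assuming the log is a list of (key, value) pairs; compact is a hypothetical name and the real broker compacts segment files in the background.

```python
def compact(log):
    """Keep only the latest value for each key, ordered by that value's position."""
    last_index = {}
    for index, (key, _value) in enumerate(log):
        last_index[key] = index  # later entries overwrite earlier ones
    keep = sorted(last_index.values())
    return [log[index] for index in keep]


log = [
    ("Key1", "A"), ("Key2", "B"), ("Key3", "C"),
    ("Key1", "AA"), ("Key2", "BB"), ("Key1", "AAA"), ("Key3", "CC"),
]
compacted = compact(log)
# compacted == [("Key2", "BB"), ("Key1", "AAA"), ("Key3", "CC")]
```

Replaying the compacted log yields the same final state per key as replaying the full log, which is why roll-forward operations become faster.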
Kafka Use Cases 
• Messaging - decouple processing or buffer messages
• Monitoring and Tracking - collect activity, clickstream, status data, and logs from various systems
• Stream Processing - aggregate, enrich, handle micro-batches, etc.
• Commit Log - facilitate replication between systems
Thanks! 
Join us at https://www.linkedin.com/groups/Belarus-Hadoop-User-Group-BHUG-8104884
dmitry_tolpeko@epam.com

More Related Content

What's hot

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
iamtodor
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
Srikrishna k
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
Jean-Paul Azar
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Long Nguyen
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
Adam Kotwasinski
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Ramakrishna kapa
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
kafka
kafkakafka
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
Knoldus Inc.
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Viswanath J
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
confluent
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
VMware Tanzu
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 

What's hot (20)

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
kafka
kafkakafka
kafka
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 

Viewers also liked

Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
Grant Henke
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Data Con LA
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
Joe Stein
 
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Chen-en Lu
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
BioniceMe
BioniceMeBioniceMe
BioniceMe
Bionicme
 
Keys to Successful Governance
Keys to Successful GovernanceKeys to Successful Governance
Keys to Successful Governance
Advanta
 
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP System
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP SystemMaking Your IEP System Work for You: 5 Questions to Ask About Your IEP System
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP System
Accelify
 
More than a vocabulary lesson
More than a vocabulary lessonMore than a vocabulary lesson
More than a vocabulary lesson
Sarah Whitten
 
Thongtu01.2015/TT-BXD
Thongtu01.2015/TT-BXDThongtu01.2015/TT-BXD
Thongtu01.2015/TT-BXD
Mèo Hoang
 
LA VOTE ORY
LA VOTE ORYLA VOTE ORY
LA VOTE ORY
DrBamboozler
 
39.2015.tt.bnnptnt
39.2015.tt.bnnptnt39.2015.tt.bnnptnt
39.2015.tt.bnnptnt
Mèo Hoang
 
Meadowbrook Estates
Meadowbrook EstatesMeadowbrook Estates
Meadowbrook Estates
Paddie Ferraro
 
YVES ROCHER CATALOGO CAMPAÑA 10-2014
YVES ROCHER  CATALOGO CAMPAÑA 10-2014YVES ROCHER  CATALOGO CAMPAÑA 10-2014
YVES ROCHER CATALOGO CAMPAÑA 10-2014Selene Gamboa
 

Viewers also liked (17)

Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
kafka
kafkakafka
kafka
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
BioniceMe
BioniceMeBioniceMe
BioniceMe
 
Keys to Successful Governance
Keys to Successful GovernanceKeys to Successful Governance
Keys to Successful Governance
 
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP System
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP SystemMaking Your IEP System Work for You: 5 Questions to Ask About Your IEP System
Making Your IEP System Work for You: 5 Questions to Ask About Your IEP System
 
More than a vocabulary lesson
More than a vocabulary lessonMore than a vocabulary lesson
More than a vocabulary lesson
 
Thongtu01.2015/TT-BXD
Thongtu01.2015/TT-BXDThongtu01.2015/TT-BXD
Thongtu01.2015/TT-BXD
 
LA VOTE ORY
LA VOTE ORYLA VOTE ORY
LA VOTE ORY
 
39.2015.tt.bnnptnt
39.2015.tt.bnnptnt39.2015.tt.bnnptnt
39.2015.tt.bnnptnt
 
Meadowbrook Estates
Meadowbrook EstatesMeadowbrook Estates
Meadowbrook Estates
 
YVES ROCHER CATALOGO CAMPAÑA 10-2014
YVES ROCHER  CATALOGO CAMPAÑA 10-2014YVES ROCHER  CATALOGO CAMPAÑA 10-2014
YVES ROCHER CATALOGO CAMPAÑA 10-2014
 

Similar to Apache Kafka - Messaging System Overview

Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
Dimosthenis Botsaris
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
arconsis
 
Kafka Fundamentals
Kafka FundamentalsKafka Fundamentals
Kafka Fundamentals
Ketan Keshri
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet view
younessx01
 
Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
Viyaan Jhiingade
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
Léopold Gault
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
AnandMHadoop
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Jemin Patel
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
Ravindra kumar
 
Apache kafka
Apache kafkaApache kafka
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
Knoldus Inc.
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
Saumitra Srivastav
 
[@NaukriEngineering] Messaging Queues
[@NaukriEngineering] Messaging Queues[@NaukriEngineering] Messaging Queues
[@NaukriEngineering] Messaging Queues
Naukri.com
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Otávio Carvalho
 
Apache Kafka
Apache Kafka Apache Kafka
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQShameera Rathnayaka
 
Apache kafka introduction
Apache kafka introductionApache kafka introduction
Apache kafka introduction
Mohammad Mazharuddin
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
TarekHamdi8
 

Similar to Apache Kafka - Messaging System Overview (20)

Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
 
Kafka Fundamentals
Kafka FundamentalsKafka Fundamentals
Kafka Fundamentals
 
Kafka 10000 feet view
Kafka 10000 feet viewKafka 10000 feet view
Kafka 10000 feet view
 
Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
[@NaukriEngineering] Messaging Queues
[@NaukriEngineering] Messaging Queues[@NaukriEngineering] Messaging Queues
[@NaukriEngineering] Messaging Queues
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
Apache kafka introduction
Apache kafka introductionApache kafka introduction
Apache kafka introduction
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 

More from Dmitry Tolpeko

Big Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QABig Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QA
Dmitry Tolpeko
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
Dmitry Tolpeko
 
Epam BI - Near Realtime Marketing Support System
Epam BI - Near Realtime Marketing Support SystemEpam BI - Near Realtime Marketing Support System
Epam BI - Near Realtime Marketing Support System
Dmitry Tolpeko
 
Big Data Technology - Solit 2015 Conference
Big Data Technology - Solit 2015 ConferenceBig Data Technology - Solit 2015 Conference
Big Data Technology - Solit 2015 Conference
Dmitry Tolpeko
 
Apache Yarn - Hadoop Cluster Management
Apache Yarn -  Hadoop Cluster ManagementApache Yarn -  Hadoop Cluster Management
Apache Yarn - Hadoop Cluster Management
Dmitry Tolpeko
 
Bi 2.0 hadoop everywhere
Bi 2.0   hadoop everywhereBi 2.0   hadoop everywhere
Bi 2.0 hadoop everywhere
Dmitry Tolpeko
 

More from Dmitry Tolpeko (6)

Big Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QABig Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QA
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Epam BI - Near Realtime Marketing Support System
Epam BI - Near Realtime Marketing Support SystemEpam BI - Near Realtime Marketing Support System
Epam BI - Near Realtime Marketing Support System
 
Big Data Technology - Solit 2015 Conference
Big Data Technology - Solit 2015 ConferenceBig Data Technology - Solit 2015 Conference
Big Data Technology - Solit 2015 Conference
 
Apache Yarn - Hadoop Cluster Management
Apache Yarn -  Hadoop Cluster ManagementApache Yarn -  Hadoop Cluster Management
Apache Yarn - Hadoop Cluster Management
 
Bi 2.0 hadoop everywhere
Bi 2.0   hadoop everywhereBi 2.0   hadoop everywhere
Bi 2.0 hadoop everywhere
 

Recently uploaded

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
XfilesPro
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 

Recently uploaded (20)

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...

Apache Kafka - Messaging System Overview

  • 1. Apache Kafka - Messaging System Dmitry Tolpeko, EPAM Systems – September 2014
  • 2. Kafka Overview. Kafka is a real-time, fault-tolerant, scalable messaging system. It is a publish-subscribe system that connects applications through messages: producers publish information and consumers read it. Producers and consumers are independent, messages are queued, and one producer can serve multiple consumers. Kafka was originally developed at LinkedIn.
  • 3. Section: Apache Kafka Concepts
  • 4. Kafka Architecture. Brokers act as the server part of Kafka. Brokers are peers; there is no master broker. Brokers can run on multiple nodes, and you can also run multiple brokers on one node. Each broker has its own IP address and port for client connections. (Diagram: Producer(s) as clients, Broker(s) as servers, Consumer(s) as clients, and ZooKeeper.)
  • 5. Topics. A topic is a way to handle multiple data streams (for example, different data feeds). Each producer sends messages to a specified topic, and consumers read messages from it. New topics can be created automatically when a message for a new topic arrives, or you can create a topic explicitly with the --create command. (Diagram: a broker hosting Topic 1 and Topic 2, with Producers 1 to 3 and Consumers 1 and 2.)
  • 6. Partitions. A topic can contain one or more partitions. Each partition is stored on a single server, and multiple partitions allow the queue to scale beyond the limits of a single system. Partitions also allow a single consumer to read messages concurrently in multiple threads. You can add new partitions dynamically. An offset uniquely identifies a message within a partition. (Diagram: Broker 1 holds Topic 1 Partition 1 and Topic 2 Partition 1; Broker 2 holds Topic 2 Partitions 2 and 3.)
  • 7. Replication. Each partition is replicated for fault tolerance. A partition has one server that acts as the Leader and handles all read and write requests. Zero or more servers act as Followers: they replicate the leader, and if it fails, one of them becomes the new Leader. The leader uses the ZooKeeper heartbeat mechanism to indicate that it is alive. A follower acts as a normal consumer: it pulls messages and updates its own log. A message can be sent to consumers only after all in-sync replicas (the ISR group) have synced it. When a follower rejoins after downtime, it can re-sync. (Diagram: Broker 1 holds Topic 1 Partition 1 as Leader; Broker 2 holds Topic 1 Partition 1 as Follower.)
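The rule on slide 7, that a message reaches consumers only after every in-sync replica has it, can be sketched as a minimum over replica positions. This is a toy model rather than broker code: the function names, the broker names, and the dictionary of per-replica log positions are illustrative assumptions.

```python
def committed_offset(log_end_offsets, isr):
    """Return how many messages every in-sync replica already holds.

    log_end_offsets maps a replica name to the number of messages in its log
    (hypothetical bookkeeping for this sketch); isr lists the in-sync replicas.
    """
    return min(log_end_offsets[replica] for replica in isr)


def visible_to_consumers(leader_log, log_end_offsets, isr):
    """Return the prefix of the leader's log that consumers may read."""
    return leader_log[:committed_offset(log_end_offsets, isr)]


leader_log = ["m0", "m1", "m2", "m3"]
positions = {"broker1": 4, "broker2": 2, "broker3": 3}

# With all three replicas in sync, only the first two messages are readable.
all_in_sync = visible_to_consumers(leader_log, positions, ["broker1", "broker2", "broker3"])

# If the slowest follower drops out of the ISR, one more message becomes readable.
without_slowest = visible_to_consumers(leader_log, positions, ["broker1", "broker3"])
```

In this model, removing a lagging follower from the ISR raises the minimum, which is why a slow replica delays visibility only while it still counts as in sync.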
  • 8. Consumer Groups. Consumers are organized into consumer groups. For a single message to be consumed by multiple consumers, they must belong to different consumer groups. A consumer group is a single-consumer abstraction: consumers in the same group read messages as if from a queue, and there is no message broadcast within the group. This helps balance load among consumers of the same type (fault tolerance, scalability). The state of consumed messages is handled by consumers, not brokers. Consumers store this state in ZooKeeper as an offset within each partition for each consumer group, not for each consumer (!). A consumer group name is unique within the Kafka cluster. (Diagram: Topic 1 with Partitions 1 to 3, read by Group 1 with two consumers and Group 2 with one consumer.)
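The point on slide 8 that offsets are kept per consumer group and partition, not per consumer, can be illustrated with a small in-memory model. The OffsetStore class and poll function are hypothetical stand-ins for the ZooKeeper-backed storage the slide describes, not a Kafka API.

```python
class OffsetStore:
    """In-memory stand-in for offset storage, keyed by (group, partition)."""

    def __init__(self):
        self._offsets = {}

    def fetch(self, group, partition):
        # A group that has never committed starts from the beginning.
        return self._offsets.get((group, partition), 0)

    def commit(self, group, partition, offset):
        self._offsets[(group, partition)] = offset


def poll(log, store, group, partition, max_messages):
    """Read the next messages for a group and advance the group's offset."""
    start = store.fetch(group, partition)
    batch = log[start:start + max_messages]
    store.commit(group, partition, start + len(batch))
    return batch


partition_log = ["a", "b", "c"]
store = OffsetStore()

first = poll(partition_log, store, "group-1", 0, 2)   # any consumer in group-1
second = poll(partition_log, store, "group-1", 0, 2)  # continues where the group left off
other = poll(partition_log, store, "group-2", 0, 3)   # a different group sees everything
```

Because the position belongs to the group, two consumers in the same group never both receive the same message, while a second group reads the full log independently.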
  • 9. Order Guarantees and Delivery Semantics. Each partition can be consumed by only one consumer within a consumer group. Kafka provides a total order guarantee only within a partition, not across the partitions of a topic. If you need a total order over all messages, you have to use a single partition, and in that case you can use only one consumer process per consumer group. By default Kafka guarantees at-least-once delivery: messages are never lost but may be redelivered (keys can be used to handle duplicates). Kafka offers options to disable retries (so messages can be lost) if the application can tolerate this and needs higher performance. Kafka retains all published messages, whether or not they have been consumed, for a configured period of time (7 days by default, log.retention.hours=168).
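Slide 9 notes that keys can be used to handle the duplicates that at-least-once delivery may produce. A minimal sketch of that idea follows, assuming each message carries a unique key and that an in-memory set is enough to remember processed keys; a real consumer would need durable storage for this, and the function name is hypothetical.

```python
def process_without_duplicates(deliveries, handler):
    """Call handler once per key, skipping redelivered messages.

    deliveries is an iterable of (key, payload) pairs in arrival order.
    Returns the set of keys that were handled.
    """
    handled_keys = set()
    for key, payload in deliveries:
        if key in handled_keys:
            continue  # redelivery of a message that was already handled
        handler(payload)
        handled_keys.add(key)
    return handled_keys


results = []
# The message with key "order-1" is delivered twice.
deliveries = [("order-1", "create"), ("order-2", "create"), ("order-1", "create")]
keys = process_without_duplicates(deliveries, results.append)
```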
  • 10. Producers. A producer can assign a key to a message that determines which partition the message is published to. Partitioning strategies: random (the default when no partitioner class or key is specified); round-robin for load balancing; or a partition function (for example, a hash of the message key). If the key is a class type (for example, a source ID), then all messages of the same type go to one partition. A producer can optionally require an acknowledgment from the broker that the message was received (synced to the leader, or to all followers). Kafka can group multiple messages and compress them. (Diagram: Producers 1 and 2 publish to Topic 1, Partitions 1 to 3.)
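The partition function on slide 10 can be sketched as a hash of the key taken modulo the partition count, with a random choice when no key is given. CRC32 is used here only because it is deterministic across runs; it is an assumption standing in for whatever hash a real partitioner applies, and choose_partition is a hypothetical name.

```python
import random
import zlib


def choose_partition(key, num_partitions):
    """Pick a partition index for a message.

    Messages without a key are spread randomly; messages with the same key
    always map to the same partition.
    """
    if key is None:
        return random.randrange(num_partitions)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message keyed by the same source ID lands in the same partition.
first = choose_partition("source-42", 3)
second = choose_partition("source-42", 3)
```

Keeping one key in one partition is what lets the per-partition ordering guarantee apply to all messages from that source.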
  • 11. Consumers. Consumers read messages from the brokers that lead the partitions (pull method). A consumer labels itself with a consumer group. If a consumer group has more consumers than there are partitions, some consumers will never see a message. If there are more partitions than consumers in a group, a consumer can receive messages from multiple partitions (with no order guarantee across them). When you add consumers, Kafka rebalances the partitions. Consumers can receive a compressed batch as a single message. (Diagram: Partitions 1 to 3 read by Group 1 with Consumers 1 to 4.)
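The two cases on slide 11, more consumers than partitions and more partitions than consumers, can be seen in a simple assignment function. Round-robin assignment is an assumption made for brevity; Kafka's own rebalancing may distribute partitions differently, and assign_partitions is a hypothetical name.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of a group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four consumers, three partitions: the fourth consumer stays idle.
crowded = assign_partitions([1, 2, 3], ["c1", "c2", "c3", "c4"])

# Two consumers, three partitions: one consumer reads two partitions.
sparse = assign_partitions([1, 2, 3], ["c1", "c2"])
```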
  • 12. Consumer Advanced Features. There are two consumer APIs: the High Level Consumer and the Simple Consumer. The High Level Consumer uses the auto.commit.interval.ms option, which defines how often the offset is updated in ZooKeeper. If an error occurs between updates, the consumer will receive replayed messages (!). The Simple Consumer is a low-level API that allows you to set any offset, explicitly read messages multiple times, or ensure that a message is processed only once.
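The replay warning on slide 12 follows from the gap between the last committed offset and the last processed message. The simulation below commits every few messages instead of every few milliseconds, which is a simplification of auto.commit.interval.ms; both function names are hypothetical.

```python
def consume_with_periodic_commit(log, commit_every, crash_after):
    """Process messages until a simulated crash, committing periodically.

    Returns (processed messages, last committed offset).
    """
    committed = 0
    processed = []
    for offset, message in enumerate(log):
        if offset == crash_after:
            break  # the consumer fails before handling this message
        processed.append(message)
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
    return processed, committed


def replayed_after_restart(processed, committed):
    """Messages that were already processed but will be delivered again."""
    return processed[committed:]


log = ["a", "b", "c", "d", "e", "f", "g"]
processed, committed = consume_with_periodic_commit(log, commit_every=3, crash_after=5)
replayed = replayed_after_restart(processed, committed)
```

Here five messages are processed but only offset 3 is committed, so a restarted consumer sees "d" and "e" a second time. A shorter commit interval narrows this window at the cost of more frequent offset updates.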
  • 13. Section: Apache Kafka Internals
  • 14. Persistence. Kafka relies heavily on the OS disk cache rather than the JVM heap, even for caching messages. Data is written (appended) to a file immediately. Consumed messages are not deleted. Data files (called logs) are stored under log.dirs. Each topic partition has its own directory containing log segments (files such as 0000000.log, named after the offset of the first message in the segment). log.segment.bytes and log.roll.hours define the rotation policy. The log.flush.interval.xxx options define how often fsync is performed on the files. All options can be specified either globally or per topic. (Diagram: the broker JVM application writes through the OS page cache to /data/kafka-logs/TopicName-0/00000.log.)
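Because each segment file on slide 14 is named after the offset of its first message, locating the file that holds a given offset is a binary search over the sorted base offsets. The sketch below assumes the 20-digit zero-padded file names Kafka uses on disk; the function names and the sample offsets are illustrative.

```python
import bisect


def segment_file_name(base_offset):
    """Format a segment file name from the offset of its first message."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the name of the segment file that contains target_offset.

    base_offsets must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset is older than the first retained segment")
    return segment_file_name(base_offsets[index])


# Three segments whose first messages have offsets 0, 1000 and 2500.
segments = [0, 1000, 2500]
name = find_segment(segments, 1200)
```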
  • 15. Network I/O. Messages can be grouped together to minimize the number of network round trips. Multiple messages can also be compressed together (GZIP, Snappy), which helps achieve a good compression ratio and reduces the amount of data sent over the network. The producer can specify compression.codec and compressed.topics. (Diagram: Message1, Message2 and Message3 are compressed together before going over the network.)
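The benefit of compressing grouped messages, as described on slide 15, can be checked with the standard gzip module. The JSON framing of the batch is an assumption made for this sketch and is not Kafka's wire format; the function names are hypothetical.

```python
import gzip
import json


def compress_batch(messages):
    """Serialize a list of string messages and gzip them as one blob."""
    return gzip.compress(json.dumps(messages).encode("utf-8"))


def decompress_batch(blob):
    """Recover the original list of messages from a compressed batch."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


messages = [f"user={i} action=click page=/home" for i in range(100)]

batch_size = len(compress_batch(messages))
# Compressing each message separately repeats the gzip overhead every time
# and cannot exploit the redundancy between messages.
separate_size = sum(len(gzip.compress(m.encode("utf-8"))) for m in messages)
```

Comparing batch_size with separate_size for similar messages shows why batching and compression are applied together.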
  • 16. Memory. There is no in-memory application-level cache; data lives in the OS page cache. Kafka uses the Linux sendfile API call, which sends data directly from the page cache to a network socket, so there is no need to read or write through application memory. Grouped messages are stored compressed in the log and are decompressed only by consumers. (Diagram: data flows from the OS page cache to the network, bypassing the broker JVM application.)
  • 17. Log Compaction. Without log compaction (time series data) the log keeps every update: Key1=A, Key2=B, Key3=C, Key1=AA, Key2=BB, Key1=AAA, Key3=CC. With log compaction only the last update is stored for each key: Key2=BB, Key1=AAA, Key3=CC. Log compaction can be defined per topic. This can help increase the performance of roll-forward operations and reduce storage.
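The compaction example on slide 17 can be reproduced in a few lines: keep a record only if it is the last one written for its key. This is a sketch of the retention rule only, not of how the broker performs the cleaning; the function name is hypothetical.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order.

    log is a list of (key, value) pairs in the order they were written.
    """
    last_position = {key: index for index, (key, _) in enumerate(log)}
    return [
        (key, value)
        for index, (key, value) in enumerate(log)
        if last_position[key] == index
    ]


log = [
    ("Key1", "A"), ("Key2", "B"), ("Key3", "C"),
    ("Key1", "AA"), ("Key2", "BB"), ("Key1", "AAA"), ("Key3", "CC"),
]
compacted = compact(log)
```

The surviving records keep their original relative order, which matches the Key2, Key1, Key3 sequence shown on the slide.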
  • 18. Kafka Use Cases. Messaging: decouple processing or buffer messages. Monitoring and Tracking: collect activity, clickstream, status data and logs from various systems. Stream Processing: aggregate, enrich, handle micro-batches, and so on. Commit Log: facilitate replication between systems.
  • 19. Thanks! Join us at https://www.linkedin.com/groups/Belarus-Hadoop-User-Group-BHUG-8104884. Contact: dmitry_tolpeko@epam.com