Distributed Message Broker
Presented by
Majid Hajibaba
12 April 2015
What is Kafka?
• What?
  • A messaging system (integration between producers and consumers)
  • Distributed
  • Peer-to-peer
  • High-throughput
  • Fault-tolerant
  • Replicated
  • Developed at LinkedIn
• Why?
  • Log aggregation
  • Stream processing
  • Real-time processing
  • As an output sink
  • Acting as a buffer or feeder for messages
Terminology
• Topics: categories in which a message feed is maintained
• Producers: processes that publish messages to a Kafka topic
• Consumers: processes that subscribe to topics and process the feed of published messages
• Brokers: servers that form a Kafka cluster and act as a data transport channel between producers and consumers
The Kafka architecture
Topic
• A topic is a queue
• It has multiple partitions (scaling, parallelism)
• It is consumed by multiple consumers
• Reads and writes can happen to each partition in parallel
[Figure: a Kafka cluster of stateless brokers; partitions ≈ directories]
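The partition idea above can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka code: the `Partition` and `Topic` classes are hypothetical names, and each partition is modelled as an independent append-only list with its own offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition: an append-only sequence with its own offsets."""
    messages: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written


@dataclass
class Topic:
    """A named collection of partitions (hypothetical model)."""
    name: str
    partitions: list

    @classmethod
    def create(cls, name: str, num_partitions: int) -> "Topic":
        return cls(name, [Partition() for _ in range(num_partitions)])


topic = Topic.create("test", num_partitions=4)

# Offsets are counted per partition, so writes to different
# partitions do not interfere with each other.
first = topic.partitions[0].append(b"a")
second = topic.partitions[0].append(b"b")
other = topic.partitions[3].append(b"c")
```

Because no state is shared between partitions, each one could be read or written by a different thread or machine, which is where the scaling and parallelism come from.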
Pull vs. Push
• Should consumers pull data from brokers?
• Or should brokers push data to the consumers?
[Figure: push vs. pull, annotated "the maximum possible rate"]
Reads are done by giving the 64-bit logical offset of a message and an S-byte maximum chunk size.
Writes are serial appends, which always go to the last file.
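A pull-based read with an offset and a maximum chunk size might look roughly like the sketch below. It is an in-memory illustration under stated assumptions: `PartitionLog` and `PullConsumer` are hypothetical names, offsets are message sequence numbers, and the chunk limit is applied to payload bytes only.

```python
class PartitionLog:
    """Append-only log for one partition (hypothetical model)."""

    def __init__(self):
        self._messages = []

    def append(self, payload: bytes) -> int:
        """Serial append; returns the offset of the new message."""
        self._messages.append(payload)
        return len(self._messages) - 1

    def read(self, offset: int, max_bytes: int):
        """Return (offset, payload) pairs from `offset`, at most max_bytes in total."""
        batch, used = [], 0
        for i in range(offset, len(self._messages)):
            size = len(self._messages[i])
            if used + size > max_bytes:
                break
            batch.append((i, self._messages[i]))
            used += size
        return batch


class PullConsumer:
    """The consumer, not the broker, remembers how far it has read."""

    def __init__(self, log: PartitionLog, max_bytes: int):
        self.log = log
        self.max_bytes = max_bytes
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset, self.max_bytes)
        if batch:
            self.offset = batch[-1][0] + 1
        return [payload for _, payload in batch]


log = PartitionLog()
for i in range(5):
    log.append(f"msg{i}".encode())  # each payload is 4 bytes

consumer = PullConsumer(log, max_bytes=10)
polls = [consumer.poll() for _ in range(4)]
```

Since the consumer decides when to call `poll`, a slow consumer simply falls behind and catches up later instead of being overwhelmed.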
Producer
• Synchronous send
  • Producers get an acknowledgment back when they publish a message
• Asynchronous send
  • Does not guarantee message delivery
• Batching
  • The producer attempts to accumulate data in memory and send out larger batches in a single request
• Load balancing
  • The client controls which partition a message is published to
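Batching and client-side partition choice can be illustrated with a small sketch. Everything here is an assumption made for illustration: `BatchingProducer`, `choose_partition`, and the `send_request` callback are hypothetical, and CRC32 stands in for whatever hash a real client uses.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Client-side load balancing: the same key always maps to the same partition.
    CRC32 is an illustrative choice, not the hash a real Kafka client uses."""
    return zlib.crc32(key) % num_partitions


class BatchingProducer:
    """Accumulates messages in memory and sends them in larger batches."""

    def __init__(self, send_request, num_partitions: int, batch_size: int):
        self._send_request = send_request  # callable(partition, list_of_messages)
        self._num_partitions = num_partitions
        self._batch_size = batch_size
        self._buffers = {}  # partition -> pending messages

    def send(self, key: bytes, value: bytes) -> None:
        partition = choose_partition(key, self._num_partitions)
        buffer = self._buffers.setdefault(partition, [])
        buffer.append(value)
        if len(buffer) >= self._batch_size:
            self._flush_partition(partition)

    def _flush_partition(self, partition: int) -> None:
        batch = self._buffers.pop(partition, [])
        if batch:
            self._send_request(partition, batch)

    def flush(self) -> None:
        """Send whatever is still buffered."""
        for partition in list(self._buffers):
            self._flush_partition(partition)


requests = []  # records every (partition, batch) request that would go to a broker
producer = BatchingProducer(
    lambda partition, batch: requests.append((partition, batch)),
    num_partitions=4,
    batch_size=3,
)
for i in range(7):
    producer.send(b"user-42", f"event-{i}".encode())
producer.flush()
```

Seven messages with a batch size of three result in three requests instead of seven, which is the trade-off batching makes: fewer, larger requests at the cost of some buffering delay.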
Consumer
[Figure: consumers reading partition p, partition q, and partitions p and q]
Kafka Storage Architecture
Each log file is named with the offset of the first message it contains.
A log file is rolled over to a fresh file when it reaches a configurable size.
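The naming and rollover rules above can be sketched as follows. This is an in-memory model, not the real on-disk format: `SegmentedLog` is a hypothetical class, offsets are message sequence numbers, and the 20-digit zero-padded file name follows the convention Kafka uses for its segment files.

```python
import bisect


class SegmentedLog:
    """A log split into segments, each identified by its first offset."""

    def __init__(self, max_segment_bytes: int):
        self.max_segment_bytes = max_segment_bytes
        self.base_offsets = [0]   # first offset of each segment, kept sorted
        self.segments = {0: []}   # base offset -> messages in that segment
        self.next_offset = 0

    @staticmethod
    def file_name(base_offset: int) -> str:
        """Name a segment after the offset of the first message it contains."""
        return f"{base_offset:020d}.log"

    def append(self, payload: bytes) -> int:
        active = self.segments[self.base_offsets[-1]]
        active_bytes = sum(len(m) for m in active)
        if active and active_bytes + len(payload) > self.max_segment_bytes:
            # Roll over: start a fresh segment named after the next offset.
            self.base_offsets.append(self.next_offset)
            active = self.segments[self.next_offset] = []
        active.append(payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> bytes:
        if not 0 <= offset < self.next_offset:
            raise IndexError(offset)
        # Binary search for the last segment whose base offset is <= offset.
        index = bisect.bisect_right(self.base_offsets, offset) - 1
        base = self.base_offsets[index]
        return self.segments[base][offset - base]


log = SegmentedLog(max_segment_bytes=10)
for i in range(5):
    log.append(f"msg{i}".encode())  # 4 bytes each, so two messages fit per segment

names = [SegmentedLog.file_name(base) for base in log.base_offsets]
```

Naming files by base offset means a reader can locate the right file for any offset with a binary search over the file names, without a separate index of which message lives where.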
Replication
• n replicas can tolerate n-1 failures
• One replica acts as the lead replica
• The lead replica maintains the list of all in-sync follower replicas
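The "n replicas tolerate n-1 failures" claim can be illustrated with a toy model. It is a deliberate simplification and not Kafka's protocol: `ReplicatedPartition` is a hypothetical class, writes reach every in-sync replica synchronously, failed replicas never rejoin, and the in-sync set here includes the leader itself for brevity.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        self.logs = {broker: [] for broker in broker_ids}
        self.leader = broker_ids[0]
        self.in_sync = set(broker_ids)  # tracked by the leader; includes the leader here

    def append(self, message: bytes) -> int:
        if self.leader is None:
            raise RuntimeError("no live replica left")
        # A write is copied to every replica that is currently in sync.
        for broker in self.in_sync:
            self.logs[broker].append(message)
        return len(self.logs[self.leader]) - 1

    def fail(self, broker) -> None:
        """Mark a broker as failed; elect a new leader from the in-sync set if needed."""
        self.in_sync.discard(broker)
        if broker == self.leader:
            self.leader = min(self.in_sync) if self.in_sync else None

    def read(self, offset: int) -> bytes:
        if self.leader is None:
            raise RuntimeError("no live replica left")
        return self.logs[self.leader][offset]


partition = ReplicatedPartition([1, 2, 3])
partition.append(b"a")
partition.fail(1)            # the leader fails, broker 2 takes over
partition.append(b"b")
partition.fail(2)            # second failure, broker 3 takes over
survivor_view = [partition.read(0), partition.read(1)]

partition.fail(3)            # third failure: nothing left
try:
    partition.append(b"c")
    still_available = True
except RuntimeError:
    still_available = False
```

With three replicas the data survives two failures because any in-sync replica holds every acknowledged message and can therefore become the leader.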
Replication
bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 3 --partitions 4 --topic test
Message Delivery Semantics
• At most once: messages may be lost but are never redelivered
• At least once: messages are never lost but may be redelivered
• Exactly once: this is what people actually want; each message is delivered once and only once
• When publishing a message
  • At most once: send without waiting for any acknowledgment
  • Exactly once: needs acknowledgments from the broker plus a way to deduplicate retried sends
• When consuming a message
  • At most once: acknowledge right after receiving the message, before processing it
  • At least once: acknowledge only after processing the message
  • Exactly once: requires cooperation with the destination storage system
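On the consumer side, the difference between at most once and at least once comes down to the order of two steps. The sketch below simulates a crash between those steps; `Consumer`, `Crash`, and `run_with_one_crash` are hypothetical names, and the committed offset is assumed to survive the crash.

```python
class Crash(Exception):
    """Simulated consumer failure."""


class Consumer:
    def __init__(self, messages, commit_before_processing: bool):
        self.messages = messages
        self.commit_before_processing = commit_before_processing
        self.committed = 0    # assumed to survive a crash
        self.processed = []   # effects visible downstream

    def _commit(self, offset: int) -> None:
        self.committed = offset + 1

    def _process(self, offset: int) -> None:
        self.processed.append(self.messages[offset])

    def run(self, crash_at=None) -> None:
        """Resume from the committed offset; optionally crash between the two steps."""
        if self.commit_before_processing:
            first, second = self._commit, self._process   # at most once
        else:
            first, second = self._process, self._commit   # at least once
        offset = self.committed
        while offset < len(self.messages):
            first(offset)
            if offset == crash_at:
                raise Crash(f"crashed while handling offset {offset}")
            second(offset)
            offset += 1


def run_with_one_crash(commit_before_processing: bool):
    consumer = Consumer(["m0", "m1", "m2"], commit_before_processing)
    try:
        consumer.run(crash_at=1)
    except Crash:
        pass
    consumer.run()  # restart after the crash
    return consumer.processed


at_most_once = run_with_one_crash(commit_before_processing=True)
at_least_once = run_with_one_crash(commit_before_processing=False)
```

Committing first loses `m1`, while processing first handles `m1` twice. Avoiding both outcomes needs the destination to store the offset together with the result, or to treat repeated messages idempotently.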
Scaling
• Adding a new server
  • Just assign a unique broker ID and start up Kafka on it
  • It will connect to the others through ZooKeeper
  • It will not automatically be assigned any data partitions
  • It won't do any work until new topics are created
  • Some existing data should be migrated to these machines
  • Data migration is manually initiated but fully automated
Performance Test
• No replication, no partitioning
• Linux virtual machine, 2.6 GHz Intel Xeon, 2 GB memory
• Publish a total of 1 million messages
• Each of 300 bytes (300 MB in total)
[Figure: throughput in messages/sec (axis from 7,000 to 42,000) against accumulated messages in MB (300 to 1,200) for 1, 2, 4, 8, and 16 producers, and for 16 producers with 2 CPUs for the broker. Annotated peak: 40,000 messages/second, 12 MB/second.]
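The figures quoted for this test are consistent with each other, as the short calculation below shows. It assumes decimal megabytes (1 MB = 1,000,000 bytes); the time at peak rate is derived here and is not a number from the slides.

```python
MESSAGE_SIZE_BYTES = 300
TOTAL_MESSAGES = 1_000_000
PEAK_MESSAGES_PER_SEC = 40_000
BYTES_PER_MB = 1_000_000  # decimal megabytes (assumption)

# Total volume published: 1 million messages of 300 bytes each.
total_mb = MESSAGE_SIZE_BYTES * TOTAL_MESSAGES / BYTES_PER_MB

# Peak byte throughput implied by the peak message rate.
peak_mb_per_sec = MESSAGE_SIZE_BYTES * PEAK_MESSAGES_PER_SEC / BYTES_PER_MB

# Time to publish everything if the peak rate were sustained (derived, not measured).
seconds_at_peak = TOTAL_MESSAGES / PEAK_MESSAGES_PER_SEC

print(total_mb, peak_mb_per_sec, seconds_at_peak)
```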
Scalability test
• No replication, 2 partitions
[Figure: throughput in messages/sec (axis from 0 to 50,000) against the number of producers (1, 2, 4, 8, 16) for 1 broker and 2 brokers.]
Single node – multiple broker
Multiple node – multiple broker
Kafka Usage at LinkedIn
Commands
• List topics
bin/kafka-topics.sh --list --zookeeper 192.168.11.185:2181
• Create topic
bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 1 --partitions 1 --topic test
• Delete topic
bin/kafka-topics.sh --delete --zookeeper 192.168.11.185:2181 --topic test
END
Any questions?

 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 

Recently uploaded (20)

UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 

Kafka

  • 1. Distributed Message Broker. Presented by Majid Hajibaba, 12 April 2015.
  • 2. What is Kafka? What: a messaging system (integration between producers and consumers); distributed; peer-to-peer; high-throughput; fault tolerant; replicated; developed at LinkedIn. Why: log aggregation; stream processing; real-time processing; as the output sink; acts as a buffer or feeder for messages.
  • 3. Terminology. Topics: categories in which a message feed is maintained. Producers: processes that publish messages to a Kafka topic. Consumers: processes that subscribe to topics and process the feed of published messages. Brokers: servers that form a Kafka cluster and act as a data transport channel between producers and consumers.
  • 4. The Kafka architecture. A Kafka cluster. Stateless brokers.
  • 5. Topic. A topic is a queue; it has multiple partitions (scaling, parallelism); it is consumed by multiple consumers; reads and writes can happen to each partition in parallel. Partitions ≈ directories.
  • 6. Pull vs. Push. Should consumers pull data from brokers, or should brokers push data to the consumer? [Diagram labels: push; pull; the maximum possible rate.] Reads are done by giving the 64-bit logical offset of a message and an S-byte max chunk size. The write allows serial appends, which always go to the last file.
  • 7. Producer. Synchronous send: producers get an ack back when they publish a message. Asynchronous send: does not guarantee message delivery. Batching: will attempt to accumulate data in memory and to send out larger batches in a single request. Load balancing: the client controls.
  • 8. Consumer. [Diagram labels: partition p; partition q; partition p, q.]
  • 9. Kafka Storage Architecture. Each log file is named with the offset of the first message it contains. The file is rolled over to a fresh file when it reaches a configurable size.
  • 10. Replication. n replicas can afford n-1 failures; one replica acts as the lead replica; the lead replica maintains the list of all in-sync follower replicas.
  • 11. Replication. bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 3 --partitions 4 --topic test
  • 12. Message Delivery Semantics. At most once: messages may be lost but are never redelivered. At least once: messages are never lost but may be redelivered. Exactly once: this is what people actually want; each message is delivered once and only once. When publishing a message: at most once, without any acknowledgment; exactly once, can be achieved by producer and acknowledgment. When consuming a message: at most once, by sending the ack after taking the message; at least once, by sending the ack after processing the message; exactly once, requires co-operation with the destination storage system.
  • 13. Scaling. Adding a new server: just assign a unique broker id and start up Kafka on it; it will connect to the others through ZooKeeper; it will not automatically be assigned any data partitions; it won't be doing any work until new topics are created; some existing data should be migrated to these machines; data migration is manually initiated but fully automated.
  • 14. Performance Test. No replication, no partition; Linux virtual machine, 2.6 GHz Intel Xeon, 2 GB memory; publish a total of 1 million messages of 300 bytes each (300 MB). [Plot: messages/sec against accumulated messages in MB, for 1, 2, 4, 8, and 16 producers, and for 16 producers with 2 CPUs for the broker. Annotation: 40000 messages/second, 12 MB/second.]
  • 15. Scalability test. No replication, 2 partitions. [Plot: messages/sec against number of producers (1, 2, 4, 8, 16), for 1 broker and 2 brokers.]
  • 16. Single node, multiple brokers. [Diagram.]
  • 17. Multiple nodes, multiple brokers. [Diagram.]
  • 18. Kafka Usage at LinkedIn. [Diagram.]
  • 19. Commands. List topics: bin/kafka-topics.sh --list --zookeeper 192.168.11.185:2181 Create topics: bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 1 --partitions 1 --topic test Delete topic: bin/kafka-topics.sh --delete --zookeeper 192.168.11.185:2181 --topic test
  • 20. END. Any questions?

Editor's Notes

  1. Apache Kafka is a high-throughput, distributed, fault-tolerant, and replicated messaging system that was first developed at LinkedIn. Its use cases range from log aggregation to stream processing to replacing other messaging systems. Kafka has emerged as one of the important components of real-time processing pipelines in combination with Storm: it can act as a buffer or feeder for messages that need to be processed by Storm, and it can also be used as the output sink for results emitted from Storm topologies. Most of the time, the applications that produce information and the applications that consume it are far apart and inaccessible to each other. This sometimes leads to redeveloping producers or consumers just to provide an integration point between them. A mechanism is therefore required that integrates producers and consumers seamlessly, without rewriting the application at either end. Kafka has no concept of a master and treats all brokers as peers.
  2. Messages are published by a producer to named entities called topics. A broker receives messages from a producer (push mechanism) and delivers them to a consumer (pull mechanism). Kafka runs as a cluster of one or more servers, each of which is called a broker. Kafka uses ZooKeeper to share and save state between brokers. Each broker maintains a set of partitions, primary and/or secondary, for each topic. A set of Kafka brokers working together maintains a set of topics, and each topic has its partitions distributed over the participating brokers.
  3. A topic is a queue that can be consumed by multiple consumers. For parallelism, a Kafka topic can have multiple partitions. Each partition corresponds to a directory, and these directories can be on different disks, which overcomes the I/O limitations of a single disk. Each message in a partition has a unique sequence number associated with it, called an offset. Two partitions of a single topic can be allocated on different brokers, which increases throughput because the partitions are independent of each other. A consumer reads a range of messages from a broker. Most messaging systems keep metadata on the broker about which messages have been consumed. In many of them, messages are only marked as sent, not consumed, when they are sent; the broker waits for a specific acknowledgement from the consumer before recording the message as consumed. This creates two problems. First, if the consumer processes a message but fails before it can send the acknowledgement, the message is consumed twice. Second, performance suffers, because the broker must keep multiple states about every single message. In Kafka, the position of a consumer in each partition is just a single integer: the offset of the next message to consume. This makes the state about what has been consumed very small, just one number per partition.
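The point about consumer state being a single integer per partition can be illustrated with a small in-memory sketch. This is not Kafka's API; `PartitionLog` and `ConsumerPosition` are hypothetical names introduced here, and the list index simply stands in for the offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Append-only list of messages; the list index plays the role of the offset."""
    messages: list = field(default_factory=list)

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset, max_messages):
        # Return a chunk of the log starting at the requested offset.
        return self.messages[offset:offset + max_messages]


@dataclass
class ConsumerPosition:
    """All consumer state: one integer (next offset to read) per partition."""
    next_offset: dict = field(default_factory=dict)

    def poll(self, partition_id, log, max_messages=2):
        start = self.next_offset.get(partition_id, 0)
        batch = log.read(start, max_messages)
        self.next_offset[partition_id] = start + len(batch)
        return batch


log = PartitionLog()
for text in ["a", "b", "c"]:
    log.append(text)

position = ConsumerPosition()
first = position.poll(0, log)
second = position.poll(0, log)
```

The log itself holds no per-message "consumed" flag; rewinding a consumer would only mean setting its integer back to an earlier value.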
  4. Data is pushed to the broker from the producer and pulled from the broker by the consumer. A push-based system has difficulty dealing with diverse consumers because the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate.
  5. Producers get an acknowledgement back when they publish a message; it contains the record's offset. The first record published to a partition is given the offset 0, the second record 1, and so on in an ever-increasing sequence. Consumers consume data from a position specified by an offset, and they save their position in a log by committing periodically, so that if a consumer instance crashes, another instance can resume from its position. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load, or it can be done according to some semantic partition function (say, based on some key in the message). The producer sends data directly to the broker that is the leader for the partition. The client controls which partition it publishes messages to. This can be done at random, implementing a kind of random load balancing, or it can be done by some semantic partitioning function. Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer will attempt to accumulate data in memory and to send out larger batches in a single request.
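The two partition-selection strategies mentioned in this note can be sketched as follows. This is an illustration only: the function names are hypothetical, and `zlib.crc32` is used here merely as a stable hash, which is an assumption and not necessarily the hash a Kafka client uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a chooser that cycles through partitions to balance load."""
    counter = itertools.cycle(range(num_partitions))

    def choose(_key=None):
        return next(counter)

    return choose


def keyed_partitioner(num_partitions):
    """Return a chooser that maps equal keys to the same partition."""

    def choose(key):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


rr = round_robin_partitioner(4)
rr_choices = [rr() for _ in range(6)]

by_key = keyed_partitioner(4)
same_partition = by_key("user-42") == by_key("user-42")
```

Round-robin spreads messages evenly but gives no ordering guarantee per key; the keyed variant keeps all messages for one key in one partition, at the cost of uneven load when keys are skewed.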
  6. A consumer reads a range of messages from a broker. A group ID is associated with each consumer. All the consumers with the same group ID act as a single logical consumer. Each message of the topic is delivered to one consumer from a consumer group (with the same group ID). Different consumer groups for a particular topic can process messages at their own pace as messages are not removed from the topics as soon as they are consumed. In fact, it is the responsibility of the consumers to keep track of how many messages they have consumed. The broker subscription API requires the identifier of the last message a consumer had from a given partition and starts to stream from that point on. The Kafka consumer works by issuing "fetch" requests to the brokers leading the partitions it wants to consume. The consumer specifies its offset in the log with each request and receives back a chunk of log beginning from that position.
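As a rough sketch of the consumer-group idea in this note, the snippet below gives each partition to exactly one consumer within a group, while a second group independently receives every partition. The round-robin assignment and the name `assign_partitions` are assumptions made for illustration, not Kafka's actual rebalancing algorithm.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Two groups read the same four partitions independently of each other.
group_a = assign_partitions([0, 1, 2, 3], ["a1", "a2"])
group_b = assign_partitions([0, 1, 2, 3], ["b1"])
```

Within `group_a` the work is split between two consumers, so the group behaves like one logical consumer; `group_b` sees all partitions again because groups do not share positions.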
  7. A log for a topic named "my_topic" with two partitions consists of two directories (namely my_topic_0 and my_topic_1) populated with data files containing the messages for that topic. The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB).
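Because each data file is named after the offset of its first message, finding the file that holds a given offset reduces to a search over sorted base offsets. The sketch below assumes a 20-digit zero-padded file name with a `.log` suffix; both the naming format and the helper names are illustrative assumptions.

```python
import bisect


def segment_name(base_offset):
    """Hypothetical file name: the segment's first offset, zero-padded."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the name of the segment file that contains target_offset.

    base_offsets must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset precedes the first segment")
    return segment_name(base_offsets[index])


bases = [0, 1000, 2500]  # first offsets of three rolled-over files
located = find_segment(bases, 1700)
```

Appends always go to the file with the largest base offset, which is why only that last file ever changes.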
  8. Replication guarantees that a message will be published and consumed even in case of broker failure. With replication, each partition has n replicas and can afford n-1 failures while still guaranteeing message delivery. Out of the n replicas, one replica acts as the lead replica for the rest. ZooKeeper keeps the information about the lead replica and the current in-sync follower replicas (the lead replica maintains the list of all in-sync follower replicas). If the lead replica fails, either while writing the message partition to its local log or before sending the acknowledgement to the message producer, the message partition is resent by the producer to the new lead broker. The very first registered replica becomes the new lead replica, and the rest of the registered replicas become the followers. Each follower replica sends an acknowledgement to the lead replica once the message is written to its log. When a broker is restarted, it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.
  9. For each partition, a broker stores the incoming messages with monotonically increasing order identifiers (offsets) and persists the “deck” to disk using a data structure with access complexity of O(1). The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier. To help the producer do this, all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time, which allows the producer to direct its requests appropriately.
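A toy model of the "n replicas can afford n-1 failures" claim: the leader and every in-sync follower hold a copy of each message, and when the leader fails the first remaining in-sync replica takes over. The class `ReplicatedPartition` and its methods are hypothetical, and real leader election involves ZooKeeper and acknowledgements that this sketch leaves out.

```python
class ReplicatedPartition:
    """In-memory sketch of one partition with a leader and in-sync followers."""

    def __init__(self, replica_ids):
        self.leader = replica_ids[0]
        self.in_sync = list(replica_ids)
        self.logs = {replica: [] for replica in replica_ids}

    def publish(self, message):
        # The leader writes the message and each in-sync follower copies it.
        for replica in self.in_sync:
            self.logs[replica].append(message)
        return len(self.logs[self.leader]) - 1  # offset of the message

    def fail(self, replica_id):
        self.in_sync.remove(replica_id)
        if not self.in_sync:
            raise RuntimeError("all replicas failed; the partition is unavailable")
        if replica_id == self.leader:
            # The first remaining in-sync replica becomes the new leader.
            self.leader = self.in_sync[0]


partition = ReplicatedPartition(["b1", "b2", "b3"])
partition.publish("m0")
partition.fail("b1")  # leader fails, b2 takes over
partition.fail("b2")  # second failure, b3 takes over
partition.publish("m1")
surviving_log = partition.logs[partition.leader]
```

With three replicas, two failures still leave one replica holding every published message; a third failure would make the partition unavailable.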
  10. When publishing a message: if a producer attempts to publish a message and experiences a network error, it cannot be sure whether this error happened before or after the message was committed. When consuming a message: Kafka guarantees at-least-once delivery by default and allows the user to implement at-most-once delivery by disabling retries on the producer and committing the offset prior to processing a batch of messages. Exactly-once delivery requires co-operation with the destination storage system, but Kafka provides the offset, which makes implementing this straightforward. The classic way of achieving exactly once would be to introduce a two-phase commit between the storage for the consumer position and the storage of the consumer's output. But this can be handled more simply and generally by letting the consumer store its offset in the same place as its output.
  11. Adding servers to a Kafka cluster is easy: just assign them a unique broker id and start up Kafka on the new servers. However, these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So when you add machines to your cluster, you will usually want to migrate some existing data to them. The process of migrating data is manually initiated but fully automated. Under the covers, Kafka adds the new server as a follower of the partition it is migrating and allows it to fully replicate the existing data in that partition. When the new server has fully replicated the contents of this partition and joined the in-sync replica set, one of the existing replicas deletes its copy of the partition's data.
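The last idea in this note, storing the consumer's offset in the same place as its output, can be sketched with an in-memory SQLite database standing in for the destination storage system. The table names, column names, and `process_batch` helper are assumptions for illustration; the only point is that output rows and the new position are written in one transaction, so a redelivered batch is skipped.

```python
import sqlite3


def process_batch(conn, partition_id, messages, start_offset):
    """Write outputs and the new consumer position in a single transaction."""
    row = conn.execute(
        "SELECT next_offset FROM positions WHERE partition_id = ?",
        (partition_id,),
    ).fetchone()
    committed = row[0] if row else 0
    with conn:  # commits on success, rolls back if an exception is raised
        for msg_offset, value in enumerate(messages, start=start_offset):
            if msg_offset < committed:
                continue  # already processed in an earlier delivery
            conn.execute(
                "INSERT INTO output (partition_id, msg_offset, value) VALUES (?, ?, ?)",
                (partition_id, msg_offset, value.upper()),
            )
            committed = msg_offset + 1
        conn.execute(
            "INSERT OR REPLACE INTO positions (partition_id, next_offset) VALUES (?, ?)",
            (partition_id, committed),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (partition_id INTEGER, msg_offset INTEGER, value TEXT)")
conn.execute("CREATE TABLE positions (partition_id INTEGER PRIMARY KEY, next_offset INTEGER)")

batch = ["a", "b", "c"]
process_batch(conn, 0, batch, start_offset=0)
process_batch(conn, 0, batch, start_offset=0)  # simulated redelivery

output_rows = conn.execute("SELECT COUNT(*) FROM output").fetchone()[0]
stored_position = conn.execute(
    "SELECT next_offset FROM positions WHERE partition_id = 0"
).fetchone()[0]
```

Delivery from the broker is still at-least-once here; the duplicate is filtered on the consumer side because the position and the output can never disagree.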