SlideShare a Scribd company logo
Introduction to Kafka
BY DUCAS FRANCIS
The problem
Web
Security
System
Real-time
Monitoring
Logging
System
Other
services
Mobile
API
Job
It’s simple enough at first…
Then it gets a little busy…
And ends up a mess.
The solution
Web
Security
System
Real-time
Monitoring
Logging
System
Other
services
Mobile
API
Job
Pub/Sub
Decouple data pipelines using a pub/sub system
Producers Brokers Consumers
Apache Kafka
A UNIFIED, HIGH-
THROUGHPUT, LOW-LATENCY
PLATFORM FOR HANDLING
REAL-TIME DATA FEEDS
A brief history lesson
 Originally developed at LinkedIn in 2011
 Graduated Apache Incubator in 2012
 Engineers from LinkedIn formed Confluent in 2014
 Up to version 0.9.4 with 0.10 on horizon
Motivation
 Unified platform for all real-time data feeds
 High throughput for high volume streams
 Support periodic data loads from offline systems
 Low latency for traditional messaging
 Support partitioned, distributed, real-time processing
 Guarantee fault-tolerance
Common use cases
 Messaging
 Website activity tracking
 Metrics
 Log aggregation
 Stream processing
 Event sourcing
 Commit log
Benefits of Kafka
 High throughput
 Low latency
 Load balancing
 Fault tolerant
 Guaranteed delivery
 Secure
Performance comparison
Batch performance comparison
Some terminology
 Topic – feed of messages
 Producer – publishes messages to a topic
 Consumer – subscribes to topics and processes the feed of messages
 Broker – server instance that acts in a cluster
@apachekafkapowers
@microsot…
Libraries
 Python – kafka-python / pykafka
 Go – sarama / go_kafka_client / …
 C/C++ - librdkafka / libkafka / …
 .NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka
 Node.js – kafka-node / sutoiku/node-kafka / ...
 HTTP – kafka-pixy / kafka-rest
 etc.
Architecture
Producer Producer
Broker BrokerBroker
Consumer ConsumerZookeeper
Cluster
x3
Show me the
Kafka!!!
VAGRANT TO THE RESCUE
Anatomy of a topic
 Topics are broken into partitions
 Messages are assigned sequential ID
called and offset
 Data is retained for a configurable
period of time
 Number of partitions can be increased
after creation, but not decreased
 Partitions are assigned to brokers
Each partition is an ordered, immutable sequence of messages that is continually appended to…
a commit log.
Broker
 Kafka service running as part of a cluster
 Receives messages from producers and serves them to consumers
 Coordinated using Zookeeper
 Need odd number for quorum
 Store messages on the file system
 Replicate messages to/from other brokers
 Answer metadata requests about brokers and topics/partitions
 As of 0.9.0 – coordinate consumers
Replication
 Partitions on a topic should be replicated
 Each partition has 1 leader and 0 or more followers
 An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too
far behind the leader
 Replication factor can be increased after creation, not decreased
./kafka-topics
--CREATE
--REPLICATION-FACTOR
--PARTITIONS
--DESCRIBE
Producers
 Publishes messages to a topic
 Distributes messages across partitions
 Round-robin
 Key hashing
 Send synchronously or asynchronously to the broker that is the leader for the
partition
 ACKS = 0 (none),1 (leader), -1 (all ISRs)
 Synchronous is obviously slower, but more durable
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
PUSH
Consumers
 Read messages from a topic
 Multiple consumers can read from the same topic
 Manage their offsets
 Messages stay on Kafka after they are consumed
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
RECEIVE
It’s fast! But why…?
 Efficient protocol based on message set
 Batching messages to reduce network latency and small I/O operations
 Append/chunk messages to increase consumer throughput
 Optimised OS operations
 pagecache
 sendfile()
 Broker services consumers from cache where possible
 End-to-end batch compression
Load balanced consumers
 Distribute load across instances in a group by allocating partitions
 Handle failure by rebalancing partitions to other instances
 Commit their offsets to Kafka
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
Consumer Group 2
C2 C3 C4 C6
Consumer groups and offsets
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
0 1 2 3 4 5 6 7 8 9 10P3
C1
read
C1
commit
C0
read
C0
commit
Guarantees
 Messages sent by a producer to a particular topic’s partition will be appended in
the order they are sent
 A consumer instance sees messages in the order they are stored in the log
 For a topic with replication factor N, we will tolerate up to N-1 server failures
without losing any messages committed to the log
Ordered delivery
 Messages are guaranteed to be delivered in order by partition, NOT topic
M1 M3 M5
M2 M4 M6
P0
P1
 M1 before M3 before M5 – YES
 M1 before M2 – NO
 M2 before M4 before M6 – YES
 M2 before M3 - NO
Enough ALT… now
.NET
USING RDKAFKA-DOTNET
FIN. THANK YOU
Resources
 http://kafka.apache.org/documentation.html
 http://www.confluent.io/
 https://kafka.apache.org/090/configuration.html
 https://github.com/edenhill/librdkafka
 https://github.com/ah-/rdkafka-dotnet
Log compaction
 Keep the most recent payload for a key
 Use cases
 Database change subscription
 Event sourcing
 Journaling for HA
Log compaction

More Related Content

What's hot

Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
Dmitry Tolpeko
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
László-Róbert Albert
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
Whiteklay
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
MapR Technologies
 
Kafka connect
Kafka connectKafka connect
Kafka connect
Andrew Stevenson
 
Kafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summitKafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summit
Goutam Chowdhury
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Jemin Patel
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecture
techmaddy
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
Edgar Domingues
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
jimriecken
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
confluent
 

What's hot (20)

Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
 
Kafka connect
Kafka connectKafka connect
Kafka connect
 
Kafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summitKafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summit
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecture
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 

Viewers also liked

Building a robot with the .Net Micro Framework
Building a robot with the .Net Micro FrameworkBuilding a robot with the .Net Micro Framework
Building a robot with the .Net Micro Framework
Ducas Francis
 
A Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloperA Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloper
Ducas Francis
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphon
Nitin Kumar
 
Continuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patternsContinuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patterns
Sonatype
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
Maxim Shelest
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 

Viewers also liked (9)

Building a robot with the .Net Micro Framework
Building a robot with the .Net Micro FrameworkBuilding a robot with the .Net Micro Framework
Building a robot with the .Net Micro Framework
 
A Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloperA Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloper
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphon
 
Continuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patternsContinuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patterns
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 

Similar to Introduction to Kafka

Apache kafka
Apache kafkaApache kafka
Apache kafka
Ramakrishna kapa
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps_Fest
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Videoguy
 
Connecting mq&kafka
Connecting mq&kafkaConnecting mq&kafka
Connecting mq&kafka
Matt Leming
 
ppt
pptppt
ppt
pptppt
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
Collaboration and Grid Technologies
Collaboration and Grid TechnologiesCollaboration and Grid Technologies
Collaboration and Grid Technologies
Videoguy
 
Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)
GiuseppeBaccini
 
Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
Viyaan Jhiingade
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
Brian S. Paskin
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
GeeksLab Odessa
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
Ravindra kumar
 
Understanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at ScaleUnderstanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at Scale
ScyllaDB
 

Similar to Introduction to Kafka (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
 
Connecting mq&kafka
Connecting mq&kafkaConnecting mq&kafka
Connecting mq&kafka
 
ppt
pptppt
ppt
 
ppt
pptppt
ppt
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
 
Collaboration and Grid Technologies
Collaboration and Grid TechnologiesCollaboration and Grid Technologies
Collaboration and Grid Technologies
 
Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)
 
Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
 
Understanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at ScaleUnderstanding Apache Kafka P99 Latency at Scale
Understanding Apache Kafka P99 Latency at Scale
 

Recently uploaded

National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 

Recently uploaded (20)

National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 

Introduction to Kafka

Editor's Notes

  1. High throughput – web activity tracking receiving 10’s of events per page hit or interaction. Periodic data loads – every 5min receving 100,000s messages Low latency – pub/sub in ms Distributed – anyone sending or receiving messages should be able to accomplish HA
  2. cd ~/Projects/kafka-vagrant vagrant status vagrant up vagrant ssh kafka-1 cat /etc/kafka/server.properties https://kafka.apache.org/090/configuration.html
  3. Topic – feed of messages Partition – topics are broken into partitions Messages – written to the end of a partition within a topic and assigned a sequential identifier (a 64bit integer) which is called an offset Data is retained within a partition for a configurable amount of time. The time is defaulted in broker configuration, but can be set per topic. Messages are stored on the file system in segmented files. Number of partitions can be increased after creation, but not decreased. This is because (as mentioned) the messages are stored on the file system on a per-partition basis, so reducing partitions would be effectively deleting data. Partitions are assigned to brokers – not topics. Kafka attempts to balance the number of partitions across the available brokers, which can be manually configured too. This is how kafka attempts to load balance its activity because, in theory, each broker having an equal number of partitions should receive an equal number of send and fetch requests.
  4. … The responsibilities of coordination are mixed between ZK and Kafka. Older versions of kafka relied more on ZK, but this is being brought more into the broker and ZK is being used more for service discovery and configuration. Before 0.9.0, consumers were coordinated by ZK and had to have a lot of logic around which partitions were assigned to them. This was changed so that for a new consumer a broker is assigned to be the consumer coordinator and tell the consumers which partitions were assigned to them.
  5. cd ~/Downloads/confluent-2.0.0/bin ls ./kafka-topics ./kafka-topics --create --topic perf-test --partitions 10 --replication-factor 3 --zookeeper 192.168.32.11:2181 ./kafka-topics --list --zookeeper 192.168.32.12 ./kafka-topics --describe --topic perf-test --zookeeper 192.168.32.13
  6. ./kafka-producer-perf-test # Publish 10k x 4kb messages ./kafka-producer-perf-test --topic perf-test --num-records 10000 --record-size 4096 --throughput 1000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 # Up the throughput for 100k ./kafka-producer-perf-test --topic perf-test --num-records 100000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 # No ACKs ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=0 # ACKs from all ISRs ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=-1 # ACK from leader, use snappy ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy # linger for 100ms ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy linger.ms=100
  7. # Consumer 1M on 5 threads ./kafka-consumer-perf-test --zookeeper 192.168.32.11 --topic perf-test --messages 1000000 --group perf-test --threads 5
  8. Modern OSs maintain a page cache and aggressively use main memory for disk caching. By NOT utilizing this and storing an in-memory representation of data you’re effectively doubling up on the amount of memory you’re application is consuming. By utilizing this you’re utilizing all available RAM for caching without GC penalties. It’s also kept in memory even if the application is restarted. This is obviously advantageous when reading messages, but also when writing. Rather than maintain as much as possible in-memory and flush it all out to the file system in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache. Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket – the sendfile system call. OS reads data from a file into pagecache in kernel space Application reads from kernel space to a user space buffer Application writes data back to kernel space into a socker buffer OS copies from socket buffer to NIC buffer to send over the network sendfile avoids this by instructing the OS to send data directly from the pagecache to the NIC. This means that consumers that are caught up will be served completely from memory.
  9. Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier. For each group a broker is selected as the group coordinator. The coordinator is responsible for managing the state of the group. Its main job is to mediate partition assignment when new members arrive, old members depart, and when topic metadata changes. The act of reassigning partitions is known as rebalancing the group.