Introduction to Kafka
BY DUCAS FRANCIS
The problem
[Diagram: Web, Security System, Real-time Monitoring, Logging System, Other services, Mobile, API, Job]
It’s simple enough at first…
Then it gets a little busy…
And ends up a mess.
The solution
[Diagram: Web, Security System, Real-time Monitoring, Logging System, Other services, Mobile, API, Job, connected via Pub/Sub]
Decouple data pipelines using a pub/sub system
Producers Brokers Consumers
Apache Kafka
A UNIFIED, HIGH-THROUGHPUT, LOW-LATENCY PLATFORM FOR HANDLING REAL-TIME DATA FEEDS
A brief history lesson
 Originally developed at LinkedIn in 2011
 Graduated Apache Incubator in 2012
 Engineers from LinkedIn formed Confluent in 2014
 Up to version 0.9.0.x, with 0.10 on the horizon
Motivation
 Unified platform for all real-time data feeds
 High throughput for high volume streams
 Support periodic data loads from offline systems
 Low latency for traditional messaging
 Support partitioned, distributed, real-time processing
 Guarantee fault-tolerance
Common use cases
 Messaging
 Website activity tracking
 Metrics
 Log aggregation
 Stream processing
 Event sourcing
 Commit log
Benefits of Kafka
 High throughput
 Low latency
 Load balancing
 Fault tolerant
 Guaranteed delivery
 Secure
Performance comparison
Batch performance comparison
Some terminology
 Topic – feed of messages
 Producer – publishes messages to a topic
 Consumer – subscribes to topics and processes the feed of messages
 Broker – server instance that acts in a cluster
@apachekafkapowers
@microsot…
Libraries
 Python – kafka-python / pykafka
 Go – sarama / go_kafka_client / …
 C/C++ - librdkafka / libkafka / …
 .NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka
 Node.js – kafka-node / sutoiku/node-kafka / ...
 HTTP – kafka-pixy / kafka-rest
 etc.
Architecture
Producer Producer
Broker Broker Broker
Consumer Consumer Zookeeper
Cluster
x3
Show me the
Kafka!!!
VAGRANT TO THE RESCUE
Anatomy of a topic
 Topics are broken into partitions
 Messages are assigned a sequential ID called an offset
 Data is retained for a configurable
period of time
 Number of partitions can be increased
after creation, but not decreased
 Partitions are assigned to brokers
Each partition is an ordered, immutable sequence of messages that is continually appended to: a commit log.
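A minimal sketch of the idea, assuming a toy in-memory list in place of Kafka's on-disk segment files. The class name `PartitionLog` and its methods are made up for illustration and are not part of any Kafka client.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset, max_messages=10):
        """Return up to max_messages (offset, message) pairs starting at offset."""
        end = min(offset + max_messages, len(self._messages))
        return [(i, self._messages[i]) for i in range(offset, end)]


log = PartitionLog()
first = log.append(b"signup")
second = log.append(b"login")
print(first, second)   # offsets are assigned sequentially: 0 1
print(log.read(1))     # reading leaves the log untouched: [(1, b'login')]
```

Nothing is ever updated in place or removed by a read, which is why several consumers can read the same partition independently.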
Broker
 Kafka service running as part of a cluster
 Receives messages from producers and serves them to consumers
 Coordinated using Zookeeper
 Zookeeper needs an odd number of nodes for quorum
 Store messages on the file system
 Replicate messages to/from other brokers
 Answer metadata requests about brokers and topics/partitions
 As of 0.9.0 – coordinate consumers
Replication
 Partitions on a topic should be replicated
 Each partition has 1 leader and 0 or more followers
 An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too
far behind the leader
 Replication factor can be increased after creation, not decreased
./kafka-topics
--create
--replication-factor
--partitions
--describe
Producers
 Publishes messages to a topic
 Distributes messages across partitions
 Round-robin
 Key hashing
 Send synchronously or asynchronously to the broker that is the leader for the
partition
 acks = 0 (none), 1 (leader), -1 (all ISRs)
 Synchronous is obviously slower, but more durable
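The two distribution strategies can be sketched as follows. This is an illustration only: `make_partitioner` is a hypothetical helper, and `zlib.crc32` is a stand-in hash, since each real client library chooses its own hash function.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping an optional key to a partition number."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(round_robin)
        # Keyed: the same key always lands on the same partition.
        # crc32 is a stand-in; real clients choose their own hash function.
        return zlib.crc32(key) % num_partitions

    return choose


choose = make_partitioner(4)
unkeyed = [choose() for _ in range(6)]
keyed = [choose(b"user-42") for _ in range(3)]
print(unkeyed)   # [0, 1, 2, 3, 0, 1]
print(keyed)     # the same partition three times
```

Key hashing is what makes per-key ordering possible: every message for `user-42` goes to one partition, and ordering is guaranteed within a partition.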
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
PUSH
Consumers
 Read messages from a topic
 Multiple consumers can read from the same topic
 Manage their offsets
 Messages stay on Kafka after they are consumed
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
RECEIVE
It’s fast! But why…?
 Efficient protocol based on message set
 Batching messages to reduce network latency and small I/O operations
 Append/chunk messages to increase consumer throughput
 Optimised OS operations
 pagecache
 sendfile()
 Broker services consumers from cache where possible
 End-to-end batch compression
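A rough way to see why compressing a whole batch pays off, assuming made-up JSON page-view events and using `zlib` from the Python standard library as a stand-in for the codecs a producer can be configured with:

```python
import json
import zlib

# 200 hypothetical page-view events with a repetitive structure.
events = [
    json.dumps({"type": "page_view", "page": "/home", "user": i}).encode()
    for i in range(200)
]

# Compress each message on its own.
individually = sum(len(zlib.compress(e)) for e in events)

# Compress the whole batch as one unit, so repeated field names are shared.
batch = b"\n".join(events)
batched = len(zlib.compress(batch))

print(individually, batched)

# The batch still decompresses back to the original messages.
restored = zlib.decompress(zlib.compress(batch)).split(b"\n")
```

The batched size comes out much smaller because the compressor can exploit redundancy across messages, not just within one. The exact numbers depend on the data.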
Load balanced consumers
 Distribute load across instances in a group by allocating partitions
 Handle failure by rebalancing partitions to other instances
 Commit their offsets to Kafka
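Allocating partitions to group members, and rebalancing after a failure, can be sketched with a simple round-robin strategy. The `assign` function is hypothetical and is not Kafka's actual assignor code; the partition and consumer names match the diagram on this slide.

```python
def assign(partitions, members):
    """Spread partitions over the sorted members, one at a time, in turn."""
    members = sorted(members)
    assignment = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

before = assign(partitions, ["C0", "C1"])
print(before)   # {'C0': ['P0', 'P2'], 'C1': ['P1', 'P3']}

# C1 fails: rebalancing hands all four partitions to the survivor.
after = assign(partitions, ["C0"])
print(after)    # {'C0': ['P0', 'P1', 'P2', 'P3']}

# A second group with four members gets one partition each.
group2 = assign(partitions, ["C2", "C3", "C4", "C6"])
print(group2)
```

Each group receives every partition exactly once, so two groups consume the same topic independently of each other.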
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
Consumer Group 2
C2 C3 C4 C6
Consumer groups and offsets
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
P3: 0 1 2 3 4 5 6 7 8 9 10
C1
read
C1
commit
C0
read
C0
commit
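The difference between a consumer's read position and its committed offset can be sketched like this. `GroupConsumer` is a toy class invented for illustration, and the failover scenario is an assumed example rather than a description of the diagram.

```python
class GroupConsumer:
    """Toy consumer that tracks a read position and a committed position."""

    def __init__(self, log, committed=0):
        self.log = log                # a plain list standing in for a partition
        self.committed = committed    # last position saved for the group
        self.position = committed     # next offset to read

    def poll(self):
        """Return the next message, or None when caught up."""
        if self.position >= len(self.log):
            return None
        message = self.log[self.position]
        self.position += 1
        return message

    def commit(self):
        """Record that everything before the current position is processed."""
        self.committed = self.position
        return self.committed


p3 = [f"m{i}" for i in range(11)]     # offsets 0..10

c1 = GroupConsumer(p3)
for _ in range(3):
    c1.poll()                         # read offsets 0, 1, 2
saved = c1.commit()                   # committed position is now 3
c1.poll()                             # reads offset 3, then fails before committing

# Another group member takes over from the committed position,
# so offset 3 is delivered again.
c0 = GroupConsumer(p3, committed=saved)
redelivered = c0.poll()
print(saved, redelivered)             # 3 m3
```

Committing after processing gives at-least-once behaviour: a message read but not yet committed is seen again after a failure.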
Guarantees
 Messages sent by a producer to a particular topic’s partition will be appended in
the order they are sent
 A consumer instance sees messages in the order they are stored in the log
 For a topic with replication factor N, we will tolerate up to N-1 server failures
without losing any messages committed to the log
Ordered delivery
 Messages are guaranteed to be delivered in order by partition, NOT topic
M1 M3 M5
M2 M4 M6
P0
P1
 M1 before M3 before M5 – YES
 M1 before M2 – NO
 M2 before M4 before M6 – YES
 M2 before M3 - NO
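The YES/NO cases above can be checked with a small simulation. It assumes a consumer that picks the next partition to read at random; `consume` and `in_order` are hypothetical helpers.

```python
import random

# Two partitions holding the messages from the slide.
partitions = {"P0": ["M1", "M3", "M5"], "P1": ["M2", "M4", "M6"]}


def consume(partitions, rng):
    """Drain all partitions, picking which one to read next at random."""
    cursors = {name: 0 for name in partitions}
    seen = []
    while any(cursors[p] < len(msgs) for p, msgs in partitions.items()):
        ready = [p for p, msgs in partitions.items() if cursors[p] < len(msgs)]
        p = rng.choice(ready)
        seen.append(partitions[p][cursors[p]])
        cursors[p] += 1
    return seen


def in_order(seen, expected):
    """True if the messages in expected appear in seen in that relative order."""
    positions = [seen.index(m) for m in expected]
    return positions == sorted(positions)


for seed in range(5):
    seen = consume(partitions, random.Random(seed))
    print(seen, in_order(seen, partitions["P0"]), in_order(seen, partitions["P1"]))
```

Every run keeps M1, M3, M5 in order and M2, M4, M6 in order, while the interleaving between the two partitions varies from run to run.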
Enough ALT… now
.NET
USING RDKAFKA-DOTNET
FIN. THANK YOU
Resources
 http://kafka.apache.org/documentation.html
 http://www.confluent.io/
 https://kafka.apache.org/090/configuration.html
 https://github.com/edenhill/librdkafka
 https://github.com/ah-/rdkafka-dotnet
Log compaction
 Keep the most recent payload for a key
 Use cases
 Database change subscription
 Event sourcing
 Journaling for HA
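Keeping the most recent payload for each key can be sketched as below. This is a simplified model: the `compact` function and the sample changelog are made up, and details such as delete markers are left out.

```python
def compact(log):
    """Keep only the latest record for each key, preserving offsets and order."""
    latest = {}
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset          # later offsets overwrite earlier ones
    keep = sorted(latest.values())
    return [(offset, *log[offset]) for offset in keep]


# A hypothetical changelog of (key, value) records for a users table.
changelog = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),
    ("user-3", "carol@example"),
    ("user-2", "bob@work.example"),
]

for record in compact(changelog):
    print(record)
# (2, 'user-1', 'alice@new.example')
# (3, 'user-3', 'carol@example')
# (4, 'user-2', 'bob@work.example')
```

Surviving records keep their original offsets, so a consumer replaying the compacted log still ends up with the latest state for every key.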
Log compaction

Editor's Notes

  1. High throughput – web activity tracking receiving tens of events per page hit or interaction.
     Periodic data loads – every 5 min, receiving 100,000s of messages.
     Low latency – pub/sub in milliseconds.
     Distributed – anyone sending or receiving messages should be able to achieve HA.
  2. cd ~/Projects/kafka-vagrant
     vagrant status
     vagrant up
     vagrant ssh kafka-1
     cat /etc/kafka/server.properties
     https://kafka.apache.org/090/configuration.html
  3. Topic – feed of messages.
     Partition – topics are broken into partitions.
     Messages – written to the end of a partition within a topic and assigned a sequential identifier (a 64-bit integer) called an offset.

     Data is retained within a partition for a configurable amount of time. The default is set in the broker configuration, but can be overridden per topic. Messages are stored on the file system in segmented files.

     The number of partitions can be increased after creation, but not decreased. This is because (as mentioned) messages are stored on the file system per partition, so reducing the number of partitions would effectively delete data.

     Partitions, not topics, are assigned to brokers. Kafka attempts to balance the number of partitions across the available brokers, and the assignment can also be configured manually. This is how Kafka attempts to load balance its activity because, in theory, brokers holding an equal number of partitions should receive an equal number of send and fetch requests.
  4. … The responsibilities of coordination are split between ZK and Kafka. Older versions of Kafka relied more on ZK, but this is being brought more into the broker, and ZK is being used more for service discovery and configuration. Before 0.9.0, consumers were coordinated by ZK and needed a lot of logic around which partitions were assigned to them. This was changed so that, for the new consumer, a broker is assigned as the consumer coordinator and tells the consumers which partitions are assigned to them.
  5. cd ~/Downloads/confluent-2.0.0/bin
     ls
     ./kafka-topics
     ./kafka-topics --create --topic perf-test --partitions 10 --replication-factor 3 --zookeeper 192.168.32.11:2181
     ./kafka-topics --list --zookeeper 192.168.32.12
     ./kafka-topics --describe --topic perf-test --zookeeper 192.168.32.13
  6. ./kafka-producer-perf-test

     # Publish 10k x 4kb messages
     ./kafka-producer-perf-test --topic perf-test --num-records 10000 --record-size 4096 --throughput 1000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092

     # Up the throughput for 100k
     ./kafka-producer-perf-test --topic perf-test --num-records 100000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092

     # No ACKs
     ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=0

     # ACKs from all ISRs
     ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=-1

     # ACK from leader, use snappy
     ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy

     # linger for 100ms
     ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy linger.ms=100
  7. # Consume 1M on 5 threads
     ./kafka-consumer-perf-test --zookeeper 192.168.32.11 --topic perf-test --messages 1000000 --group perf-test --threads 5
  8. Modern OSs maintain a page cache and aggressively use main memory for disk caching. By NOT utilizing this and storing an in-memory representation of the data, you're effectively doubling the amount of memory your application consumes. By utilizing it, you use all available RAM for caching without GC penalties. The cache also stays warm even if the application is restarted.

     This is obviously advantageous when reading messages, but also when writing. Rather than maintaining as much as possible in memory and flushing it all out to the file system in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the file system without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

     Modern Unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket: the sendfile system call. Without it, the path from file to socket takes four copies:
     - The OS reads data from a file into pagecache in kernel space.
     - The application reads from kernel space into a user-space buffer.
     - The application writes the data back to kernel space, into a socket buffer.
     - The OS copies from the socket buffer to the NIC buffer to send over the network.

     sendfile avoids this by instructing the OS to send data directly from the pagecache to the NIC. This means that consumers that are caught up will be served completely from memory.
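The sendfile path from note 8 can be tried from Python's standard library. This is a sketch under assumptions: a Unix-like system, a temporary file standing in for a log segment, and a local socket pair standing in for the broker-to-consumer connection. It shows the call itself, not Kafka's broker code.

```python
import socket
import tempfile

payload = b"message-batch " * 64     # stand-in for a log segment on disk

with tempfile.NamedTemporaryFile() as segment:
    segment.write(payload)
    segment.flush()

    # A connected socket pair stands in for the broker-to-consumer connection.
    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        with open(segment.name, "rb") as f:
            # socket.sendfile uses os.sendfile where the platform supports it
            # and falls back to plain send() otherwise.
            sent = broker_side.sendfile(f)

        received = b""
        while len(received) < sent:
            received += consumer_side.recv(4096)

print(sent, received == payload)
```

The application never reads the file contents into its own buffer; it only hands the kernel a file and a socket.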
  9. Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier. For each group a broker is selected as the group coordinator. The coordinator is responsible for managing the state of the group. Its main job is to mediate partition assignment when new members arrive, old members depart, and when topic metadata changes. The act of reassigning partitions is known as rebalancing the group.