Markus Günther
Freelance Software Engineer / Architect
mail@mguenther.net | mguenther.net | @markus_guenther
Streaming Data with Apache Kafka
2
Point-to-point communication is simple to maintain – especially
if there is only a small number of systems involved.
[Figure: two systems communicating point-to-point]
3
Adding more systems increases the complexity of
communication channels in this kind of architecture.
[Figure: six systems communicating point-to-point]
4
A messaging solution can be used to decouple producing systems
from consuming systems and thus remove that complexity.
[Figure: three producers and three consumers connected through a messaging solution]
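To put a number on that complexity, here is a small back-of-the-envelope sketch. It assumes the worst case, in which every system talks directly to every other system; the two function names are made up for this illustration.

```python
def point_to_point_channels(n_systems: int) -> int:
    """Channels needed if every system talks directly to every other one."""
    return n_systems * (n_systems - 1) // 2


def brokered_channels(n_systems: int) -> int:
    """Channels needed if every system only talks to the messaging solution."""
    return n_systems


# Print: number of systems, point-to-point channels, brokered channels.
for n in (2, 6, 20):
    print(n, point_to_point_channels(n), brokered_channels(n))
# 2 1 2
# 6 15 6
# 20 190 20
```

With two systems the direct connection is the cheaper option, which matches the earlier observation that point-to-point is fine for a small number of systems. The gap only opens up as more systems join.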
5
Apache Kafka supports this communication model.
[Figure: three producers and three consumers connected through an Apache Kafka cluster]
6
Producers publish data to specific topics, consumers subscribe to
topics of interest and consume data at their own pace.
[Figure: three producers publish to Topic A, Topic B and Topic C; five consumers subscribe to these topics]
7
Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics.
History
▪ Apache Kafka originated at LinkedIn
▪ Maintained by the Apache Software Foundation
▪ Confluent drives further development
▪ Confluent provides various system components that enrich the Kafka ecosystem
Intentions
▪ Designed for near-real-time processing of events
▪ Supports multiple delivery semantics
  ▪ At-least-once
  ▪ Exactly-once (well, not quite)
▪ Optimized binary protocol for client-to-broker communication
▪ No integration with JMS, …
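The at-least-once delivery semantics listed among the intentions can be illustrated with a toy consumer loop. This is a plain-Python simulation, not the Kafka client API: the `log` list, the `consume` function and the `crash_before_commit_at` parameter are assumptions made up for this sketch. The point it shows is that committing an offset only after processing means a crash between the two steps leads to redelivery, not to loss.

```python
log = ["a", "b", "c", "d"]   # records of one partition, offsets 0..3


def consume(committed, crash_before_commit_at=None):
    """Process records starting at the committed offset.

    The offset is committed only AFTER a record has been processed.
    Returns the processed records and the last committed offset.
    """
    processed = []
    offset = committed
    while offset < len(log):
        processed.append(log[offset])          # 1. process the record
        if offset == crash_before_commit_at:   # simulated crash: commit never happens
            return processed, committed
        committed = offset + 1                 # 2. commit the next offset to read
        offset += 1
    return processed, committed


first_run, committed = consume(0, crash_before_commit_at=2)
second_run, committed = consume(committed)   # restart from the last commit
print(first_run)    # ['a', 'b', 'c']
print(second_run)   # ['c', 'd']
```

Record `c` is processed twice and nothing is lost. Consumers that cannot tolerate duplicates need idempotent processing or some form of deduplication on top.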
8
Apache Kafka is a distributed publish-subscribe messaging
system that supports topic access semantics. (cont.)
Innovations
▪ Messages are acknowledged in order
▪ Messages are persisted for days / weeks / indefinitely
▪ Consumers manage their offsets
9
Kafka uses a persistent log to implement publish-subscribe
messaging. Publishers append, consumers read sequentially.
[Figure: a log with offsets 0 to 9 that a producer publishes to. A consumer of consumer group A is at current position 8, a consumer of consumer group B is at current position 3]
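The log abstraction can be sketched in a few lines. This is a toy in-memory model under simplifying assumptions (one partition, no persistence, no network); the class and method names are invented for this illustration and are not the Kafka client API.

```python
class Log:
    """Append-only log; the index of a record is its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1          # offset of the new record

    def read(self, offset, max_records):
        return self._records[offset:offset + max_records]


class Consumer:
    """Reads sequentially and keeps track of its own position."""

    def __init__(self, log, group):
        self.log = log
        self.group = group
        self.position = 0                      # next offset to read

    def poll(self, max_records=1):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = Log()
for i in range(10):                            # offsets 0..9
    log.append(f"event-{i}")

group_a = Consumer(log, "A")
group_b = Consumer(log, "B")
group_a.poll(max_records=8)
group_b.poll(max_records=3)
print(group_a.position, group_b.position)      # 8 3
```

Because reading does not remove anything from the log, the two consumer groups progress independently, each at its own pace.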
10
11
A Kafka topic consists of at least one partition.
[Figure: a topic with 3 partitions (Partition 0, Partition 1, Partition 2); each partition is a separate log with its own offsets starting at 0]
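How a record ends up in a particular partition can be sketched as follows. The sketch assumes key-based partitioning, in which the key is hashed modulo the partition count. Kafka's Java client hashes keys with murmur2 by default; `zlib.crc32` is only a stand-in here, and `choose_partition` is a name invented for this example.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same one."""
    return zlib.crc32(key) % num_partitions


partitions = {p: [] for p in range(3)}         # a topic with 3 partitions
events = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
for key, value in events:
    partitions[choose_partition(key, 3)].append((key, value))

for p, records in partitions.items():
    print(p, records)
```

All records with the key `user-1` land in the same partition in the order they were published. Ordering is therefore guaranteed per partition, not across the whole topic.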
12
Consumers that participate in the same consumer group share the
read workload of a topic, here one with as many partitions as the group has consumers.
[Figure: a topic with 3 partitions (Partition 0, Partition 1, Partition 2) read by a consumer group of 3 consumers]
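A minimal sketch of how partitions might be divided among the members of a group, assuming a simple round-robin strategy. Kafka ships several assignment strategies; the `assign` function below is a simplified illustration and not one of them.

```python
def assign(partitions, consumers):
    """Round-robin: every partition goes to exactly one consumer of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign([0, 1, 2], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': [2]}
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The second call shows why the partition count caps the parallelism of a group: a fourth consumer on a three-partition topic stays idle.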
13
Kafka redistributes work if a consumer process fails and is no
longer able to process messages.
[Figure: a topic with 3 partitions and a consumer group of 3 consumers, one of which has failed]
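The redistribution can be sketched as a toy consumer group that recomputes its assignment whenever a member drops out. Failure detection (heartbeats, session timeouts) is left out, and `ConsumerGroup`, `member_failed` and the round-robin strategy are assumptions made for this illustration.

```python
def assign(partitions, members):
    """Round-robin assignment of partitions to the current group members."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


class ConsumerGroup:
    def __init__(self, partitions, members):
        self.partitions = list(partitions)
        self.members = list(members)
        self.assignment = assign(self.partitions, self.members)

    def member_failed(self, member):
        """Remove the failed member and rebalance among the survivors."""
        self.members.remove(member)
        self.assignment = assign(self.partitions, self.members)


group = ConsumerGroup([0, 1, 2], ["c1", "c2", "c3"])
print(group.assignment)   # {'c1': [0], 'c2': [1], 'c3': [2]}
group.member_failed("c2")
print(group.assignment)   # {'c1': [0, 2], 'c3': [1]}
```

After the rebalance every partition still has exactly one reader, so no partition is left unconsumed.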
14
A message (or record, or event, or what-have-you) contains
metadata alongside the actual message payload.
▪ Headers (optional)
▪ Key (optional)
▪ Value (set by application)
▪ Timestamp (set by Kafka or by application)
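The structure of a record can be written down as a small data class. This is a sketch of the four parts named above, not the record type of any Kafka client library. The field names, the millisecond resolution and the default of "now" for the timestamp are assumptions of this illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    value: bytes                                   # payload, set by the application
    key: Optional[bytes] = None                    # optional
    headers: Tuple[Tuple[str, bytes], ...] = ()    # optional (name, value) pairs
    # Defaults to the creation time unless the application supplies one.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


plain = Record(value=b"temperature=21.5")
keyed = Record(
    value=b"temperature=21.5",
    key=b"sensor-7",
    headers=(("content-type", b"text/plain"),),
    timestamp_ms=1_700_000_000_000,
)
print(plain.key, plain.headers)        # None ()
print(keyed.key, keyed.timestamp_ms)   # b'sensor-7' 1700000000000
```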
15
Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
[Figure: Apache Kafka cluster with a topic with 3 partitions, replication factor = 1. Broker 1 hosts partition 2, broker 2 hosts partition 0, broker 3 hosts partition 1]
16
Topic-partitions are spread across available brokers and can thus
span multiple machines in an Apache Kafka cluster.
[Figure: Apache Kafka cluster with a topic with 3 partitions, replication factor = 2. Broker 1 hosts leader-partition 2, broker 2 hosts leader-partition 0, broker 3 hosts leader-partition 1; follower-partitions 0, 1 and 2 are hosted on brokers other than their leaders]
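One way to picture replica placement is a round-robin scheme in which the replicas of a partition go to consecutive brokers. This is a deliberately simplified sketch; Kafka's actual placement logic differs, and the resulting layout is not the one drawn on the slide. `place_replicas` is a name invented for this example.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct, consecutive brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        # The first replica acts as leader, the remaining ones as followers.
        placement[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


placement = place_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
for partition, replicas in placement.items():
    print(partition, replicas)
# 0 {'leader': 1, 'followers': [2]}
# 1 {'leader': 2, 'followers': [3]}
# 2 {'leader': 3, 'followers': [1]}
```

The property that matters is that a leader and its followers never share a broker, so losing one machine does not take out every copy of a partition.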
17
The In-Sync-Replica set (ISR) of a topic-partition contains the leader
and all followers that are caught up with it.
[Figure: In-Sync-Replica set for partition 0. Broker 2 hosts leader-partition 0 and replicates to follower-partition 0 on broker 1, which acknowledges]
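ISR membership can be modelled with a toy rule: a follower counts as in sync if it has caught up with the leader recently enough. In Kafka the threshold is time-based and set by the broker configuration `replica.lag.time.max.ms`. Everything else in this sketch is an assumption made for illustration, including the dictionary layout, the function names and the lagging `broker-3`, which the slide does not show.

```python
def in_sync_replicas(replicas, leader, now_ms, max_lag_ms):
    """Return the leader plus all followers that caught up recently enough."""
    isr = []
    for broker, state in replicas.items():
        caught_up_recently = now_ms - state["last_caught_up_ms"] <= max_lag_ms
        if broker == leader or caught_up_recently:
            isr.append(broker)
    return isr


def high_watermark(replicas, isr):
    """Highest offset that every in-sync replica has already replicated."""
    return min(replicas[broker]["log_end_offset"] for broker in isr)


replicas = {
    "broker-2": {"log_end_offset": 100, "last_caught_up_ms": 10_000},  # leader
    "broker-1": {"log_end_offset": 100, "last_caught_up_ms": 9_500},   # follower
    "broker-3": {"log_end_offset": 40, "last_caught_up_ms": 1_000},    # hypothetical laggard
}

isr = in_sync_replicas(replicas, leader="broker-2", now_ms=10_000, max_lag_ms=5_000)
print(isr)                             # ['broker-2', 'broker-1']
print(high_watermark(replicas, isr))   # 100
```

Dropping the lagging follower from the ISR keeps a slow replica from blocking progress, at the cost of one fewer up-to-date copy until it catches up again.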
18
Code, anyone?
19
A reference architecture helps us to sort things into categories that
are driven by certain (non-)functional requirements.
[Figure: reference architecture with five tiers: Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier. Components shown: Collection Service (MQTT), Collection Service (HTTP), Cache, Topic 1, Topic 2, Topic 3, Subscriber 1 (Stream Processor), Subscriber 2 (Stream Processor), Subscriber 3 (Stream Processor), Search Engine, RDBMS, Client Application]
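The flow through the tiers can be sketched end to end with in-memory stand-ins. Nothing here talks to a real broker or database: the list, the dictionary, the function names and the sensor readings are placeholders invented for this illustration.

```python
from collections import Counter

topic = []    # messaging tier: stands in for a topic
store = {}    # persistence tier: stands in for a database or search engine


def collect(sensor_id, value):
    """Collection tier: accept a reading and publish it to the topic."""
    topic.append((sensor_id, value))


def process():
    """Analysis tier: count readings per sensor and persist the result."""
    counts = Counter(sensor_id for sensor_id, _ in topic)
    store.update(counts)


def query(sensor_id):
    """Data access tier: serve the persisted result to a client application."""
    return store.get(sensor_id, 0)


collect("s1", 20.5)
collect("s2", 19.0)
collect("s1", 21.0)
process()
print(query("s1"), query("s2"))   # 2 1
```

Each function only touches the tier next to it, which is what allows the tiers to be swapped or scaled independently.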
20
Apache Kafka features a rich ecosystem of supporting services that
fit nicely into the tiers of a streaming architecture.
[Figure: the five tiers (Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier) populated with components of the Kafka ecosystem. Components shown: Kafka Connect (Source Connector), Kafka Client DSL (Producing System), Kafka Cluster with Topic 1, Topic 2, Topic 3, Kafka Client DSL (Consuming System), Kafka Streams DSL or ksqlDB (Stream Processor), Kafka Connect (Sink Connector), Confluent Schema Registry, Confluent REST Proxy, Search Engine, RDBMS, Client Application]
21
Want to know more?
Books
▪ Narkhede N., Shapira G., Palino T., Kafka - The Definitive Guide: Real-time data and stream processing at scale, O'Reilly, 2nd Edition, 2021
▪ Koutanov E., Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications, Independently published, 2020
▪ Kreps J., I Heart Logs: Event Data, Stream Processing, and Data Integration, O'Reilly, 2014
▪ Seymour M., Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, O'Reilly, 2021
▪ Dunning T., Friedman E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, O'Reilly, 2016
▪ Akidau T., Chernyak S., Lax R., Streaming Systems, O'Reilly, 2018
▪ Young G., Versioning in an Event-sourced system, Leanpub, 2017
22
Want to know more?
Magazines
▪ Fresow B., Günther M., Nachrichten aus dem Archiv: Event-gestützte Applikationen mit Spring Kafka (Teil 3), JavaMagazin, 3/2018, p. 90-98
▪ Fresow B., Günther M., Briefe vom Windrad: Event-gestützte Applikationen mit Spring Kafka (Teil 2), JavaMagazin, 2/2018, p. 80-87
▪ Fresow B., Günther M., Frühlingsbotschaften: Event-gestützte Applikationen mit Spring Kafka (Teil 1), JavaMagazin, 1/2018, p. 73-77
▪ Günther M., Datenserialisierung mit Apache Avro, JavaSPEKTRUM, 5/2017, p. 35-38
▪ Günther M., Streaming-Applikationen mit Kafka Streams, JavaSPEKTRUM, 4/2017, p. 54-58
▪ Günther M., Skalierfähige, asynchrone Nachrichtenverarbeitung mit Apache Kafka, JavaSPEKTRUM, 3/2017, p. 48-51
23
Want to know more?
Other
▪ Confluent Developer Portal, https://developer.confluent.io/
▪ Various blogs on testing, data exploration, etc., https://www.mguenther.net/tag/kafka.html/
GitHub
▪ Kafka for JUnit on GitHub, https://mguenther.github.io/kafka-junit/
▪ User Guide to Kafka for JUnit, https://mguenther.github.io/kafka-junit/
▪ Event-sourcing using Spring Kafka, https://github.com/mguenther/spring-kafka-event-sourcing-sampler
▪ Spring Kafka for Large-Scale Event Processing, https://github.com/mguenther/spring-kafka-event-processing-sampler
▪ Introduction to Spring Kafka, https://github.com/mguenther/spring-kafka-introduction
24
Questions?
mguenther.net markus_guenther
mail@mguenther.net

More Related Content

What's hot

Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
natashasweety7
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
Edgar Domingues
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
Mohammed Fazuluddin
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
DataStax Academy
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Saroj Panyasrivanit
 
Event Hub & Kafka
Event Hub & KafkaEvent Hub & Kafka
Event Hub & Kafka
Aparna Pillai
 
Apache Kafka
Apache Kafka Apache Kafka
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical Overview
Sylvester John
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Apache Kafka Demo
Apache Kafka DemoApache Kafka Demo
Apache Kafka Demo
Edward Capriolo
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
Knoldus Inc.
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
kawamuray
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
ES & Kafka
ES & KafkaES & Kafka
ES & Kafka
Diego Pacheco
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Jemin Patel
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 

What's hot (20)

Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Event Hub & Kafka
Event Hub & KafkaEvent Hub & Kafka
Event Hub & Kafka
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical Overview
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Demo
Apache Kafka DemoApache Kafka Demo
Apache Kafka Demo
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
ES & Kafka
ES & KafkaES & Kafka
ES & Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 

Similar to Streaming Data with Apache Kafka

Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
Edunomica
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
DataStax Academy
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
Timothy Spann
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Apache kafkaApache kafka
Apache kafka
Ramakrishna kapa
 
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Timothy Spann
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
Knoldus Inc.
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
GeeksLab Odessa
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Timothy Spann
 
Large scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with KafkaLarge scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with Kafka
Rafał Hryniewski
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
Araf Karsh Hamid
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps_Fest
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Timothy Spann
 

Similar to Streaming Data with Apache Kafka (20)

Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
 
Large scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with KafkaLarge scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
 

Recently uploaded

Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
Yara Milbes
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 

Recently uploaded (20)

Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 

Streaming Data with Apache Kafka

  • 1. Markus Günther Freelance Software Engineer / Architect mail@mguenther.net | mguenther.net | @markus_guenther Streaming Data with Apache Kafka
  • 2. 2 Point-to-point communication is simple to maintain – especially if there is only a small number of systems involved. System System
  • 3. 3 Adding more systems increases the complexity of communication channels in this kind of architecture. System System System System System System
  • 4. 4 A messaging solution can be used to decouple producing systems from consuming systems and thus remove that complexity. Producer Consumer Producer Producer Consumer Consumer Messaging Solution
  • 5. 5 Apache Kafka supports this communication model. Producer Consumer Producer Producer Consumer Consumer Apache Kafka Cluster
  • 6. 6 Producers publish data to specific topics, consumers subscribe to topics of interest and consume data at their own pace. Producer Consumer Producer Producer Consumer Consumer Topic A Topic B Topic C Consumer Consumer
  • 7. 7 Apache Kafka is a distributed publish-subscribe messaging system that supports topic access semantics. History Intentions ▪ Designed for near-real-time processing of events ▪ Supports multiple delivery semantics ▪ At-least-once ▪ Exactly-once (well, not quite) ▪ Optimized binary protocol for client-to-broker communication ▪ No integration with JMS, … ▪ Apache Kafka originated at LinkedIn ▪ Maintained by the Apache Foundation ▪ Confluent drives further development ▪ Confluent provides various system components that enrich the Kafka ecosystem
  • 8. 8 Apache Kafka is a distributed publish-subscribe messaging system that supports topic access semantics. (cont.) Innovations ▪ Messages are acknowledged in order ▪ Messages are persisted for days / weeks / indefinite ▪ Consumers manage their offsets
  • 9. 9 Kafka uses a persistent log to implement publish-subscribe messaging. Publishers append, consumers read sequentially. 9 8 7 6 5 4 3 2 1 0 Producer publishes Consumer consumer group: A Consumer consumer group: B current position: 8 current position: 3
  • 10. 1
  • 11. 1 A Kafka topic is comprised of at least one partition. 8 7 6 5 4 3 2 1 0 1 0 4 3 2 1 0 Partition 0 Topic with 3 partitions Partition 1 Partition 2
  • 12. 1 Consumers that participate in the same consumer group share the read workload of an equally partition-sized topic. 8 7 6 5 4 3 2 1 0 1 0 4 3 2 1 0 Partition 0 Topic with 3 partitions Partition 1 Partition 2 Consumer Consumer Consumer Consumer group
  • 13. 1 Kafka redistributes work if a consumer process fails and is no longer able to process messages. 8 7 6 5 4 3 2 1 0 1 0 4 3 2 1 0 Partition 0 Topic with 3 partitions Partition 1 Partition 2 Consumer Consumer Consumer Consumer group
  • 14. A message (or record, or event, or what-have-you) contains metadata alongside the actual message payload.
      ▪ Headers (optional)
      ▪ Key (optional)
      ▪ Value (set by application)
      ▪ Timestamp (set by Kafka or by application)
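The four parts of a message listed on slide 14 map naturally onto a small data structure. This is an illustrative sketch only: the `Record` class and the `stamp` helper are hypothetical and do not mirror the classes of any Kafka client library.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical model of a message: payload plus metadata."""
    value: bytes                                   # set by the application
    key: Optional[bytes] = None                    # optional
    headers: Tuple[Tuple[str, bytes], ...] = ()    # optional
    timestamp_ms: Optional[int] = None             # None: not set by the application


def stamp(record, broker_time_ms):
    """Keep an application-provided timestamp, otherwise assign the given one."""
    if record.timestamp_ms is not None:
        return record
    return replace(record, timestamp_ms=broker_time_ms)


plain = Record(value=b"temperature=21.5")
dated = Record(value=b"temperature=21.5", key=b"sensor-7", timestamp_ms=5)

print(stamp(plain, 1000).timestamp_ms)  # 1000
print(stamp(dated, 1000).timestamp_ms)  # 5
```

The `stamp` helper models the "set by Kafka or by application" choice: a timestamp supplied by the application is preserved, and a missing one is filled in on arrival.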
  • 15. Topic-partitions are spread across available brokers and can thus span multiple machines in an Apache Kafka cluster. (Diagram: a topic with 3 partitions, replication factor = 1. Broker 1 hosts partition 2, broker 2 hosts partition 0, and broker 3 hosts partition 1.)
  • 16. Topic-partitions are spread across available brokers and can thus span multiple machines in an Apache Kafka cluster. (Diagram: a topic with 3 partitions, replication factor = 2. Broker 1 hosts leader-partition 2, broker 2 hosts leader-partition 0, and broker 3 hosts leader-partition 1; the follower replicas of partitions 0, 1, and 2 are placed on other brokers.)
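How partitions and their replicas might end up on different brokers (slides 15 and 16) can be sketched with a simple round-robin placement. This is an assumption made for illustration: `place_replicas` is a hypothetical function, and the placement it produces is not the exact one drawn in the diagrams nor the algorithm Kafka itself uses.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin."""
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + step) % len(brokers)]
            for step in range(replication_factor)
        ]
        # The first replica acts as leader, the remaining ones as followers.
        placement[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


brokers = [1, 2, 3]
single = place_replicas(3, brokers, replication_factor=1)
double = place_replicas(3, brokers, replication_factor=2)

for partition, replicas in double.items():
    print(partition, replicas)
```

With a replication factor of 2, every partition has one leader and one follower on a different broker, so the loss of a single broker never removes the only copy of a partition.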
  • 17. The In-Sync-Replica set (ISR) contains the brokers that hold a replica of a given topic-partition, either as its leader or as a follower that is caught up with the leader. (Diagram: In-Sync-Replica set for partition 0. Broker 2 hosts leader-partition 0 and replicates to broker 1, which hosts follower-partition 0 and acknowledges.)
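The replicate/acknowledge exchange of slide 17 can be modelled as a leader that tracks how far each follower has caught up. This is a deliberately simplified sketch: the `PartitionLeader` class and its `max_lag` threshold (counted in messages) are assumptions for illustration, whereas Kafka brokers decide ISR membership with a time-based criterion.

```python
class PartitionLeader:
    """Hypothetical leader replica that tracks follower acknowledgements."""

    def __init__(self, leader, followers, max_lag):
        self.leader = leader
        self.log_end = 0                               # number of records appended so far
        self.acked = {follower: 0 for follower in followers}
        self.max_lag = max_lag                         # assumed threshold, in messages

    def append(self, count=1):
        self.log_end += count

    def acknowledge(self, follower, offset):
        """A follower reports that it has replicated everything below offset."""
        self.acked[follower] = offset

    def isr(self):
        """The leader plus every follower that is within max_lag of the log end."""
        in_sync = {
            follower
            for follower, offset in self.acked.items()
            if self.log_end - offset <= self.max_lag
        }
        return {self.leader} | in_sync


# Partition 0 as on the slide: the leader lives on broker 2, the follower on broker 1.
partition_0 = PartitionLeader(leader=2, followers=[1], max_lag=4)

partition_0.append(5)
partition_0.acknowledge(1, 5)
caught_up = partition_0.isr()       # follower has replicated everything

partition_0.append(10)              # follower stops acknowledging
fallen_behind = partition_0.isr()   # follower lags by 10 messages

print(caught_up, fallen_behind)
```

The leader is always part of its own ISR; a follower drops out once it falls too far behind and would rejoin after acknowledging a sufficiently recent offset.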
  • 19. A reference architecture helps us to sort things into categories that are driven by certain (non-)functional requirements. (Diagram: five tiers: Collection Tier, Messaging Tier, Analysis Tier, Persistence Tier, Data Access Tier. Components shown: Collection Service (MQTT), Collection Service (HTTP), Cache, Topic 1, Topic 2, Topic 3, Subscriber 1 (Stream Processor), Subscriber 2 (Stream Processor), Subscriber 3 (Stream Processor), Search Engine, RDBMS, Client Application.)
  • 20. Apache Kafka features a rich ecosystem of supporting services that fit nicely into the tiers of a streaming architecture. (Diagram: the same five tiers. Components shown: Kafka Connect (Source Connector), Kafka Client DSL (Producing System), Kafka Cluster with Topic 1, Topic 2, Topic 3, Confluent Schema Registry, Confluent REST Proxy, Kafka Client DSL (Consuming System), Kafka Streams DSL or ksqlDB (Stream Processor), Kafka Connect (Sink Connector), Search Engine, RDBMS, Client Application.)
  • 21. Want to know more? Books
      ▪ Narkhede N., Shapira G., Palino T., Kafka - The Definitive Guide: Real-time data and stream processing at scale, O'Reilly, 2nd Edition, 2021
      ▪ Koutanov E., Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications, Independently published, 2020
      ▪ Kreps J., I Heart Logs: Event Data, Stream Processing, and Data Integration, O'Reilly, 2014
      ▪ Seymour M., Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, O'Reilly, 2021
      ▪ Dunning T., Friedman E., Streaming Architecture: New Designs Using Apache Kafka and MapR Streams, O'Reilly, 2016
      ▪ Akidau T., Chernyak S., Lax R., Streaming Systems, O'Reilly, 2018
      ▪ Young G., Versioning in an Event-sourced system, Leanpub, 2017
  • 22. Want to know more? Magazines
      ▪ Fresow B., Günther M., Nachrichten aus dem Archiv: Event-gestützte Applikationen mit Spring Kafka (Teil 3), JavaMagazin, 3/2018, p. 90-98
      ▪ Fresow B., Günther M., Briefe vom Windrad: Event-gestützte Applikationen mit Spring Kafka (Teil 2), JavaMagazin, 2/2018, p. 80-87
      ▪ Fresow B., Günther M., Frühlingsbotschaften: Event-gestützte Applikationen mit Spring Kafka (Teil 1), JavaMagazin, 1/2018, p. 73-77
      ▪ Günther M., Datenserialisierung mit Apache Avro, JavaSPEKTRUM, 5/2017, p. 35-38
      ▪ Günther M., Streaming-Applikationen mit Kafka Streams, JavaSPEKTRUM, 4/2017, p. 54-58
      ▪ Günther M., Skalierfähige, asynchrone Nachrichtenverarbeitung mit Apache Kafka, JavaSPEKTRUM, 3/2017, p. 48-51
  • 23. Want to know more? GitHub and other resources
      ▪ Confluent Developer Portal, https://developer.confluent.io/
      ▪ Various blogs on testing, data exploration, etc., https://www.mguenther.net/tag/kafka.html/
      ▪ Kafka for JUnit on GitHub, https://mguenther.github.io/kafka-junit/
      ▪ User Guide to Kafka for JUnit, https://mguenther.github.io/kafka-junit/
      ▪ Event-sourcing using Spring Kafka, https://github.com/mguenther/spring-kafka-event-sourcing-sampler
      ▪ Spring Kafka for Large-Scale Event Processing, https://github.com/mguenther/spring-kafka-event-processing-sampler
      ▪ Introduction to Spring Kafka, https://github.com/mguenther/spring-kafka-introduction