SlideShare a Scribd company logo
Introducing Apache Kafka®
Paul Brebner
Technical Evangelist February 2019
What is Kafka?
Message flow
Distributed streams
Processing System
Messages sent by
Distributed Producers
to
Distributed Consumers
Consumers
via
Distributed Kafka Cluster
Cluster
Kafka Benefits
 Fast
 Scalable
 Reliable
 Durable
 Open Source
 Managed Service
Kafka benefits
Kafka Benefits
 Fast – high throughput and low latency
 Scalable – horizontally scalable with nodes and partitions
 Reliable – distributed and fault tolerant
 Durable - zero data loss, messages persisted to disk with immutable log
 Open Source – An Apache project
 Available as a Managed Service - on multiple cloud platforms
Kafka benefits
Message flow
Message flow
To send messages from A to B
“A” is the Producer – sends a message, to
“B” the Consumer (recipient) of the message
Due to decline in “snail mail” direct deliveries
Due to decline in “snail mail” direct deliveries
Instead … “Poste Restante”
“Poste Restante”?
• Not a post office in a
restaurant
• General delivery (in the
US)
• The mail is delivered to
a post office, they hold
it for you until you call
Consumers “poll” for messages by visiting the
Poste Restante counter at the post office
Kafka topics act like a Post Office
Benefits include
• Disconnected delivery –
consumer doesn’t need to
be available to receive
messages
• Less effort for the
messaging service – only
has to deliver to a few
locations not every
consumer
• Can scale better and
handle more complex
delivery semantics!
Scalability? Many consumers for a topic?
A single counter introduces delays
More counters increases concurrency
Kafka topics have >= 1 Partitions (“counters”)
• Partitions increase
consumer concurrency
• Increase throughput
• Reduce latency
Santa
North Pole
What’s a Kafka message?
A Record – like a letter
Santa
North PoleTopic
Topic is the destination
Santa
North PoleTopic
Timestamp, offset, partition
The “Postmark”
Santa
North PoleTopic
Timestamp, offset, partition
The “Postmark”
Time semantics are flexible
time of event creation,
ingestion,
or processing.
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Key (optional)
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Key (optional)
Refines the destination
Send to Santa not just any Elf
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Value is contents (byte array)
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Kafka Producers and Consumers
need a serializer and de-serializer
to write & read key and value
• Kafka doesn’t look
at the value
• Consumer can
read value
• And try to make
sense of the
message
• What will Santa
be delivering?!
Next…
Delivery Semantics
Do we care
if the message arrives?
Yes! Guaranteed delivery is desirable
But homing pigeons got lost or eaten
Send multiple pigeons
One pigeon may make it
Eaten
Lost
Producer
Broker
1
Broker
2
Broker
3
M1
How does Kafka guarantee delivery?
Producer
Broker
1
Broker
2
Broker
3
M1
A Message (M1) is written to a broker (2)
Producer
Broker
1
Broker
2
Broker
3
M1
M1
The message is always persisted to disk.
Producer
Broker
1
Broker
2
Broker
3
M1
M1
This makes it resilient to loss of power.
Producer
Broker
1
Broker
2
Broker
3
M1 M1
M1 M1M1
The message is also replicated on multiple “brokers”
Producer
Broker
1
Broker
2
Broker
3
M1
M1 M1M1
Which makes it resilient to loss of most servers
Producer
Broker
1
Broker
2
Broker
3
Acknowledgement
M1 M1 M1
Producer gets acknowledgement
once the message is persisted and replicated
(configurable)
Consumer
Broker
1
Broker
2
Broker
3
M1 M1 M1
Multiple Brokers and Partitions 
increased read availability and concurrency
Producer
Consumer
Consumer
Consumer
Consumer
The 2nd aspect of delivery semantics:
Who gets the messages?
How many times are messages delivered?
Producer
Consumer
Consumer
Consumer
Consumer
Delivery Semantics - Kafka is “pub-sub”
- Loosely coupled
- Producers and consumers don’t know about each other
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumer
Which consumers get which messages
(filtering), is topic based
Topic “Work”
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumer
Topic “Work”
Consumers Subscribe to topic “Parties”
Subscribe
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumers
Subscribed to Topic “Parties”
Consumer
Publishers send messages to topics
Send
Topic “Work”Send
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumers
Subscribed to Topic “Parties”
Consumer
Consumers only receive messages from
subscribed topics
Consumers Poll
To receive messages
from ”Parties”
Topic “Work” Consumers not subscribed to “Work”
Don’t receive any “Work” messages
Partitions and Consumer Groups
Enable sharing of work across
consumers
Duplicate message delivery
Each message is
delivered to each
subscribed
consumer group
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer Group
Consumer
Consumer
Consumer
Consumer
Consumers subscribed to topic are allocated partitions
They will only get messages from their allocated partitions.
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer
Consumer
Consumer
Consumers share work
within groups
Consumers in the same group share the work around
Each consumer gets only a subset of messages
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer Group
Consumer
Consumer
Consumer
Consumer
Messages are duplicated across
Consumer groups
Multiple groups enable message broadcasting
Messages are duplicated across groups, each consumer
group receives a copy of each message.
Key
Partition based delivery
Which messages are delivered to
which consumers?
If a message has a key, then Kafka
uses Partition based delivery.
Messages with the same key are
always sent to the same partition
and therefore the same
consumer.
And the order is guaranteed.
No Key
Round robin delivery
If the key is null, then
Kafka uses round robin
delivery
Each message is delivered
to the next partition
Time for an Example, with 2 consumer groups.
Consumer Group = Nerds
Multiple consumers
Consumer Group = Nerds
Multiple consumers
Consumer Group = Hairy
Single consumer
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Group “Nerds”
Group “Hairy”
Consumer 1 (Bill)
Consumer 2 (Paul)
Consumer n
Consumer 1 (Chewy)
Consumers
Subscribed to “Parties”
No Key
Round Robin
M1
M2
etc
M1
M1
M1
M2
M2
M2
Case 1: No Key
Message (M1, M2, etc) sent to the next partition
All consumers allocated to that partition will receive a message when they poll next.
Here’s what happens (not showing producer or topics, have to imagine them)
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
Subscribe to “Parties” Subscribe to “Parties”
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
2. Producer sends record “Cool pool party – Invitation”
<key=null, value=“Cool pool party - Invitation”> to “parties” topic (no key)
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
2. Producer sends record “Cool pool party - Invitation”> to “parties” topic
3. Bill and Chewbacca receive a copy of the invitation and plan to attend
4. Producer sends another record “Cool pool party – Cancelled”
<key=null, value=“Cool pool party - Cancelled”> to “parties” topic
4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to “parties” topic
5. Paul and Chewbacca receive the cancellation.
Paul gets the message this time as it’s round robin, ignores it as he didn’t get the invitation. Bill wastes his
time trying to go to cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and
stay at home to do some hacking. Chewy is only consumer in his group so gets all messages, plans something
fun instead…
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Group “Nerds”
Group “Hairy”
Consumer 1 (Bill)
Consumer 2 (Paul)
Consumer n
Consumer 1 (Chewy)
Consumers
Subscribed to “Parties”
Key
Hashed to partition
M1, M2
etc
M1, M2
M1, M2
M1, M2
M3
M3
M3
M3
Case 2: If there is a Key
A key is hashed to a partition, and a Message with that key is always sent to that partition.
Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
Here’s what happens with a key: key is “title” of the message (e.g. “Cool pool party”)
Same set up as before:
1. Both Groups subscribe to Topic “parties” (11 partitions).
1. Both Groups subscribe to Topic “parties” (11 partitions).
2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
1. Both Groups subscribe to Topic “parties” (11 partitions).
2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
3. As before Bill and Chewbacca receive a copy of the invitation and plan to attend
4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
5. Bill and Chewbacca receive the cancellation (same consumers this time, as identical key)
6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
7. Paul and Chewy receive the invitation
Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is
allocated to
Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
Example Kafka Use Cases
Real-time data pipeline
Read-time data pipeline features:
• Ingestion of multiple heterogeneous sources
• Sending data to multiple heterogeneous sinks
• Acts as a buffer to smooth out load spikes
• Enables use cases which reprocess data (e.g. disaster recovery)
Anomaly Detection Pipeline
Real-time Event processing pipeline:
• Simple event driven applications (If X then Y…)
• May write and read from other data sources (e.g. Cassandra)
• New Events sent back to Kafka or to other systems
• E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
Kafka Streams Processing (Kongo IoT Blog series)
Streams processing features:
• Complex streams processing (multiple events and streams)
• Time, windows, and transformations
• Uses Kafka Streams API, includes state store
• Visualization of the streams topology
• Continuously computes the loads for trucks and checks if they are overloaded.
Linkedin - Before Kafka (BK)
A real example from Linkedin, who developed Kafka.
Before Kafka they had spaghetti integration of monolithic applications.
To accommodate growing membership and increasing site complexity, they migrated from a monolithic
application infrastructure to one based on microservices, which made the integration even more complex!
After Kafka (AK)
Rather than maintaining and scaling each pipeline individually, they invested in the
development of a single, distributed pub-sub platform - Kafka was born.
The main benefit was better Service decoupling and independent scaling.
The End (of the introduction) -
Find out more
Apache Kafka: https://kafka.apache.org/
Instaclustr blogs
• Mix of Cassandra, Spark, Zeppelin and Kafka
https://www.instaclustr.com/paul-brebner/
• Kafka introduction
https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/
https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2-
• Kongo – Kafka IoT logistics application blog series
https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/
• Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series
https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka-
Instaclustr’s Managed Kafka (Free trial)
https://www.instaclustr.com/solutions/managed-apache-kafka/

More Related Content

What's hot

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
Knoldus Inc.
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Apache kafka
Apache kafkaApache kafka
kafka
kafkakafka
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
VMware Tanzu
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Diego Pacheco
 
Apache Kafka - Overview
Apache Kafka - OverviewApache Kafka - Overview
Apache Kafka - Overview
CodeOps Technologies LLP
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Saroj Panyasrivanit
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
Mayank Bansal
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
confluent
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
confluent
 

What's hot (20)

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
kafka
kafkakafka
kafka
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka - Overview
Apache Kafka - OverviewApache Kafka - Overview
Apache Kafka - Overview
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 

Similar to A visual introduction to Apache Kafka

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Kafka101
Kafka101Kafka101
Kafka101
Aparna Pillai
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
Stream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's taleStream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's tale
João Vazão Vasques
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
Animated: Components of Kafka
Animated: Components of KafkaAnimated: Components of Kafka
Animated: Components of Kafka
Arvind Singh Rawat
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
MannMehta13
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
AdityaGanguly12
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
Ran Silberman
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress  - Natan SilnitskyExactly Once Delivery is a Harsh Mistress  - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
DevOpsDays Tel Aviv
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVExactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLV
Natan Silnitsky
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionExactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Natan Silnitsky
 
Kafka.pptx
Kafka.pptxKafka.pptx
Kafka.pptx
Tarun techme
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo
 

Similar to A visual introduction to Apache Kafka (14)

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Kafka101
Kafka101Kafka101
Kafka101
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Stream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's taleStream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's tale
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Animated: Components of Kafka
Animated: Components of KafkaAnimated: Components of Kafka
Animated: Components of Kafka
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress  - Natan SilnitskyExactly Once Delivery is a Harsh Mistress  - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVExactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLV
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionExactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
 
Kafka.pptx
Kafka.pptxKafka.pptx
Kafka.pptx
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)
 

More from Paul Brebner

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
Paul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
Paul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
Paul Brebner
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
Paul Brebner
 

More from Paul Brebner (20)

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
 

Recently uploaded

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

A visual introduction to Apache Kafka

  • 1. Introducing Apache Kafka® Paul Brebner Technical Evangelist February 2019
  • 2. What is Kafka? Message flow Distributed streams Processing System Messages sent by Distributed Producers to Distributed Consumers Consumers via Distributed Kafka Cluster Cluster
  • 3. Kafka Benefits  Fast  Scalable  Reliable  Durable  Open Source  Managed Service Kafka benefits
  • 4. Kafka Benefits  Fast – high throughput and low latency  Scalable – horizontally scalable with nodes and partitions  Reliable – distributed and fault tolerant  Durable - zero data loss, messages persisted to disk with immutable log  Open Source – An Apache project  Available as a Managed Service - on multiple cloud platforms Kafka benefits
  • 7.
  • 8. To send messages from A to B
  • 9. “A” is the Producer – sends a message, to
  • 10. “B” the Consumer (recipient) of the message
  • 11. Due to decline in “snail mail” direct deliveries
  • 12. Due to decline in “snail mail” direct deliveries
  • 13. Instead … “Poste Restante”
  • 14. “Poste Restante”? • Not a post office in a restaurant • General delivery (in the US) • The mail is delivered to a post office, they hold it for you until you call
  • 15. Consumers “poll” for messages by visiting the Poste Restante counter at the post office
  • 16. Kafka topics act like a Post Office
  • 17. Benefits include • Disconnected delivery – consumer doesn’t need to be available to receive messages • Less effort for the messaging service – only has to deliver to a few locations not every consumer • Can scale better and handle more complex delivery semantics!
  • 19. A single counter introduces delays
  • 20. More counters increases concurrency
  • 21. Kafka topics have >= 1 Partitions (“counters”) • Partitions increase consumer concurrency • Increase throughput • Reduce latency
  • 22. Santa North Pole What’s a Kafka message? A Record – like a letter
  • 24. Santa North PoleTopic Timestamp, offset, partition The “Postmark”
  • 25. Santa North PoleTopic Timestamp, offset, partition The “Postmark” Time semantics are flexible time of event creation, ingestion, or processing.
  • 26. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional)
  • 27. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional) Refines the destination Send to Santa not just any Elf
  • 28. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Value is contents (byte array)
  • 29. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Kafka Producers and Consumers need a serializer and de-serializer to write & read key and value
  • 30. • Kafka doesn’t look at the value • Consumer can read value • And try to make sense of the message • What will Santa be delivering?!
  • 31. Next… Delivery Semantics Do we care if the message arrives?
  • 32. Yes! Guaranteed delivery is desirable
  • 33. But homing pigeons got lost or eaten
  • 35. One pigeon may make it Eaten Lost
  • 40. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1 M1M1 The message is also replicated on multiple “brokers”
  • 41. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1M1 Which makes it resilient to loss of most servers
  • 42. Producer Broker 1 Broker 2 Broker 3 Acknowledgement M1 M1 M1 Producer gets acknowledgement once the message is persisted and replicated (configurable)
  • 43. Consumer Broker 1 Broker 2 Broker 3 M1 M1 M1 Multiple Brokers and Partitions  increased read availability and concurrency
  • 44. Producer Consumer Consumer Consumer Consumer The 2nd aspect of delivery semantics: Who gets the messages? How many times are messages delivered?
  • 45. Producer Consumer Consumer Consumer Consumer Delivery Semantics - Kafka is “pub-sub” - Loosely coupled - Producers and consumers don’t know about each other
  • 46. Producer Topic “Parties” Consumer Consumer Consumer Consumer Which consumers get which messages (filtering), is topic based Topic “Work”
  • 48. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Publishers send messages to topics Send Topic “Work”Send
  • 49. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Consumers only receive messages from subscribed topics Consumers Poll To receive messages from ”Parties” Topic “Work” Consumers not subscribed to “Work” Don’t receive any “Work” messages
  • 50. Partitions and Consumer Groups Enable sharing of work across consumers
  • 51. Duplicate message delivery Each message is delivered to each subscribed consumer group
  • 52. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Consumers subscribed to topic are allocated partitions They will only get messages from their allocated partitions.
  • 53. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Consumer Consumer Consumers share work within groups Consumers in the same group share the work around Each consumer gets only a subset of messages
  • 54. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Messages are duplicated across Consumer groups Multiple groups enable message broadcasting Messages are duplicated across groups, each consumer group receives a copy of each message.
  • 55. Key Partition based delivery Which messages are delivered to which consumers? If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order is guaranteed.
  • 56. No Key Round robin delivery If the key is null, then Kafka uses round robin delivery Each message is delivered to the next partition
  • 57. Time for an Example, with 2 consumer groups.
  • 58. Consumer Group = Nerds Multiple consumers
  • 59. Consumer Group = Nerds Multiple consumers Consumer Group = Hairy Single consumer
  • 60. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Group “Nerds” Group “Hairy” Consumer 1 (Bill) Consumer 2 (Paul) Consumer n Consumer 1 (Chewy) Consumers Subscribed to “Parties” No Key Round Robin M1 M2 etc M1 M1 M1 M2 M2 M2 Case 1: No Key Message (M1, M2, etc) sent to the next partition All consumers allocated to that partition will receive a message when they poll next.
  • 61. Here’s what happens (not showing producer or topics, have to imagine them)
  • 62. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). Subscribe to “Parties” Subscribe to “Parties”
  • 63. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party – Invitation” <key=null, value=“Cool pool party - Invitation”> to “parties” topic (no key)
  • 64. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party - Invitation”> to “parties” topic 3. Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 65. 4. Producer sends another record “Cool pool party – Cancelled” <key=null, value=“Cool pool party - Cancelled”> to “parties” topic
  • 66. 4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to “parties” topic 5. Paul and Chewbacca receive the cancellation. Paul gets the message this time as it’s round robin, ignores it as he didn’t get the invitation. Bill wastes his time trying to go to cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and stay at home to do some hacking. Chewy is only consumer in his group so gets all messages, plans something fun instead…
  • 67.
  • 68. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Group “Nerds” Group “Hairy” Consumer 1 (Bill) Consumer 2 (Paul) Consumer n Consumer 1 (Chewy) Consumers Subscribed to “Parties” Key Hashed to partition M1, M2 etc M1, M2 M1, M2 M1, M2 M3 M3 M3 M3 Case 2: If there is a Key A key is hashed to a partition, and a Message with that key is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
  • 69. Here’s what happens with a key: key is “title” of the message (e.g. “Cool pool party”) Same set up as before: 1. Both Groups subscribe to Topic “parties” (11 partitions).
  • 70. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  • 71. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic 3. As before Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 72. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
  • 73. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic 5. Bill and Chewbacca receive the cancellation (same consumers this time, as identical key)
  • 74. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
  • 75. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic 7. Paul and Chewy receive the invitation Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated to Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
  • 76.
  • 77.
  • 79. Real-time data pipeline Read-time data pipeline features: • Ingestion of multiple heterogeneous sources • Sending data to multiple heterogeneous sinks • Acts as a buffer to smooth out load spikes • Enables use cases which reprocess data (e.g. disaster recovery)
  • 80. Anomaly Detection Pipeline Real-time Event processing pipeline: • Simple event driven applications (If X then Y…) • May write and read from other data sources (e.g. Cassandra) • New Events sent back to Kafka or to other systems • E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  • 81. Kafka Streams Processing (Kongo IoT Blog series) Streams processing features: • Complex streams processing (multiple events and streams) • Time, windows, and transformations • Uses Kafka Streams API, includes state store • Visualization of the streams topology • Continuously computes the loads for trucks and checks if they are overloaded.
  • 82. Linkedin - Before Kafka (BK) A real example from Linkedin, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex!
  • 83. After Kafka (AK) Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform - Kafka was born. The main benefit was better Service decoupling and independent scaling.
  • 84. The End (of the introduction) - Find out more Apache Kafka: https://kafka.apache.org/ Instaclustr blogs • Mix of Cassandra, Spark, Zeppelin and Kafka https://www.instaclustr.com/paul-brebner/ • Kafka introduction https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/ https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2- • Kongo – Kafka IoT logistics application blog series https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/ • Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka- Instaclustr’s Managed Kafka (Free trial) https://www.instaclustr.com/solutions/managed-apache-kafka/

Editor's Notes

  1. What is Kafka? Kafka is a distributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
  2. Kafka benefits Fast – high throughput and low latency Scalable – horizontally scalable, just add nodes and partitions Reliable – distributed and fault tolerant Zero data loss – messages are persisted to disk with immutable log Open Source – An Apache project Available as a Managed Service - on multiple cloud platforms
  3. Kafka benefits Fast – high throughput and low latency Scalable – horizontally scalable, just add nodes and partitions Reliable – distributed and fault tolerant Zero data loss – messages are persisted to disk with immutable log Open Source – An Apache project Available as a Managed Service - on multiple cloud platforms
  4. Back to the intro Kafka overview diagram, it’s a bit monochrome and boring. This talk will be more colourful and it’s going to be an extended story…
  5. Back to the intro Kafka overview diagram, it’s a bit monochrome and boring. This talk will be more colourful and it’s going to be an extended story…
  6. Let’s build a modern day fully electronic postal service – the Kafka Postal Service
  7. What does a postal service do? It sends messages from A to B (animated with sounds, click!)
  8. A is a producer, sends a message…
  9. To B, the consumer.
  10. Actually, no. Due to the decline in “snail mail” volumes, direct deliveries have been cancelled.
  11. Actually, no. Due to the decline in “snail mail” volumes, direct deliveries have been cancelled.
  12. Instead we have “Poste Restante” - Not a post office in a restaurant It’s called general delivery (in the US). The mail is delivered to a post office, and they hold it for you until you call for it.
  13. Consumers poll for messages by visiting the counter at the post office.
  14. Consumers poll for messages by visiting the counter at the post office.
  15. Kafka topics act like a Post Office. What are the benefits? Disconnected delivery – consumer doesn’t need to be available to receive messages Less effort for the messaging service – only has to delivery to a few locations not larger number of consumer addresses Can scale better and handle more complex delivery semantics!
  16. Kafka topics act like a Post Office. What are the benefits? Disconnected delivery – consumer doesn’t need to be available to receive messages Less effort for the messaging service – only has to delivery to a few locations not larger number of consumer addresses Can scale better and handle more complex delivery semantics!
  17. First lets see how it scales. What if there are many consumers for a topic?
  18. A single counter introduces delays and limits concurrency
  19. More counters increases concurrency and can reduce delays
  20. Kafka Topics have 1 or more Partitions, partitions function like multiple counters and enable high concurrency.
  21. Before looking at delivery semantics what does a message actually look like? In Kafka a message is called a Record and is like a letter.
  22. The topic is the destination.
  23. The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. Time semantics are flexible, either the time of event creation, ingestion, or processing.
  24. The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. Time semantics are flexible, either the time of event creation, ingestion, or processing.
  25. There’s also a thing called a Key, which is is optional. It refines the destination so it’s a bit like the rest of the address. We want this letter sent to Santa not just any Elf.
  26. There’s also a thing called a Key, which is is optional. It refines the destination so it’s a bit like the rest of the address. We want this letter sent to Santa not just any Elf.
  27. The value is the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value.
  28. The value is the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value.
  29. Kafka doesn’t look inside the value, but the Producer and Consumer can, and the Consumer can try and make sense of the message… I wonder what Santa will be delivering?
  30. Next lets look at delivery semantics. For example, do we care if the message actually arrives or not?
  31. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  32. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  33. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  34. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  35. How does Kafka guarantee delivery? A Message (M1) is written to a broker (2)
  36. How does Kafka guarantee delivery? A Message (M1) is written to a broker (2)
  37. and the message is always persisted to disk.
  38. This makes it resilient to loss of power.
  39. The message is also replicated on multiple “brokers” , 3 is typical
  40. And makes it resilient to loss of most servers
  41. Finally the producer gets acknowledgement once the message is persisted and replicated (configurable for number, and sync or async) The message is now available from more than one broker in case some fail. This also increases the read concurrency as partitions are spread over multiple brokers.
  42. Finally the producer gets acknowledgement once the message is persisted and replicated (configurable for number, and sync or async) The message is now available from more than one broker in case some fail. This also increases the read concurrency as partitions are spread over multiple brokers.
  43. Now let’s look at the 2nd aspect of delivery semantics. Who gets the messages and how many times are messages delivered? Kafka is “pub-sub”, It’s loosely coupled, producers and consumers don’t know about each other
  44. Now let’s look at the 2nd aspect of delivery semantics. Who gets the messages and how many times are messages delivered? Kafka is “pub-sub”, It’s loosely coupled, producers and consumers don’t know about each other
  45. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  46. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  47. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  48. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  49. Just a few more details and we can see how this works. Partitions and consumer groups enable sharing of work across multiple consumers, the more partitions a topic has the more consumers it supports
  50. Kafka supports delivery of the same message to multiple consumers. Kafka doesn’t throw messages away immediately they are delivered, so the same message can easily be delivered to multiple consumer groups.
  51. Consumers subscribed to ”parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions.
  52. This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages.
  53. Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message.
  54. Which messages are delivered to which consumers? The final aspect of delivery semantics is to do with message keys. If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order is guaranteed.
  55. If the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition.
  56. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  57. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  58. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  59. Looking at the case where there’s No Key 1st, each message (1, 2, etc) is sent to the next partition, and all consumers allocated to that partition will receive the message when they poll next.
  60. Here’s what actually happens. We’re not showing the producer or topic for simplicity. You’ll have to imagine them.
  61. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
  62. 2 Producer sends record with the value “Cool pool party - Invitation” to “parties” topic (there’s no key)
  63. 3 Bill and Chewbacca receive a copy of the invitation and plan to attend
  64. 4. Producer sends another record with the value “Cool pool party – Cancelled” to “parties” topic
  65. In the Nerds group, Paul gets the message this time as it’s round robin, and Chewy gets it as he’s the only consumer in his group. Paul ignores it as he didn’t get the original invite Bill wastes his time trying to go The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking Chewy plans something else fun instead...
  66. A visit to the hairdressers!
  67. How does it work if there is a Key? The key is hashed to a partition, and the Message is sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
  68. Here’s what happens with a key, assuming that the key is the “title” of the message (“Cool pool party”) As before Both Groups subscribe to Topic “parties” (11 partitions). Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  69. Here’s what happens with a key, assuming that the key is the “title” of the message (“Cool pool party”) As before Both Groups subscribe to Topic “parties” (11 partitions). Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  70. As before, Bill and Chewbacca receive a copy of the invitation and plan to attend
  71. 4. Producer sends another record with the same key but a cancellation value to “parties” topic
  72. This time, Bill and Chewbacca receive the cancellation (the same consumers as the key is identical)
  73. Producer sends out another invitation to a Halloween party. The key is different this time.
  74. Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
  75. This time Chewy gets dressed up and goes to the party.
  76. Who did you imagine was producing the invitations? Maybe this fellow.
  77. Here’s a UML-like diagram with the main Kafka components. We’ve introduced Producers, Topics, Partitions, Consumer Groups and Consumes today. There’s a lot more to explore, including how Kafka provides replication, and the Connect and Streaming APIs.
  78. To finish up here are three important use cases for Kafka
  79. The Real-time Data pipeline features Ingestion of multiple heterogeneous sources Sending data to multiple heterogeneous sinks Acts as a buffer to smooth out load spikes Enables use cases which reprocess data (e.g. disaster recovery)
  80. Real-time Event processing features: Simple event driven applications (If X then Y…) May write and read from other data sources (e.g. Cassandra) New Events sent back to Kafka or to other systems E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  81. Streams processing features: Complex streams processing (multiple events and streams) Time, windows, and transformations Streams API, includes state store This a visualization of the streams topology from the streams processing pipeline from my previous blog series, Kongo, a Kafka IoT logistics application. It continuously computes the loads for trucks and checks if they are overloaded.
  82. Here’s a real example from Linkedin, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex.
  83. After Kafka Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform. Thus, Kafka was born. This main benefit was better Service decoupling and independent scaling.