1
What is Apache Kafka and What is an Event Streaming Platform?
Bern Apache Kafka® Meetup
2
Join the Confluent Community Slack Channel: cnfl.io/community-slack
Subscribe to the Confluent blog: cnfl.io/read
Welcome to the Apache Kafka® Meetup in Bern!
6:00pm Doors open
6:00pm - 6:30pm Food, Drinks and Networking
6:30pm - 6:50pm Matthias Imsand, Amanox Solutions
6:50pm - 7:35pm Gabriel Schenker, Confluent
7:35pm - 8:00pm Additional Q&A & Networking
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation
with and does not endorse the materials provided at this event.
3
About Me
● Gabriel N. Schenker
● Lead Curriculum Developer @ Confluent
● Formerly at Docker, Alienvault, …
● Lives in Appenzell, AI
● GitHub: github.com/gnschenker
● Twitter: @gnschenker
4
What is an Event Streaming Platform?
5
Event Streaming Platforms should do two things:
● Reliably store streams of events
● Process streams of events
6
The Event Streaming Paradigm
7
ETL/Data Integration Messaging
Batch
Expensive
Time Consuming
Difficult to Scale
No Persistence
Data Loss
No Replay
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
8
9
ETL/Data Integration (stored records)
Messaging (transient messages)
Batch
Expensive
Time Consuming
Difficult to Scale
No Persistence
Data Loss
No Replay
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
Event Streaming Paradigm
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
10
Event Streaming Paradigm
Rethink data not as stored records or transient messages, but as a continually updating stream of events.
11
12
13
14
15
16
(diagram: Apache Kafka®, with its CONNECT, STREAMS, and CLIENTS APIs, as a universal event pipeline linking data stores such as mainframes, Hadoop, and data warehouses, device logs, 3rd-party apps such as Splunk, and custom apps/microservices to contextual event-driven apps: real-time inventory, real-time fraud detection, real-time customer 360, machine learning models, and real-time data transformation)
17
18
A Modern, Distributed Platform for Data Streams
19
Apache Kafka® is made up of distributed, immutable, append-only commit logs
20
Writers
Kafka cluster
Readers
21
Kafka: Scalability of a filesystem
• hundreds of MB/s throughput
• many TB per server
• commodity hardware
22
Kafka: Guarantees of a Database
• Strict ordering
• Persistence
23
Kafka: Rewind and Replay
Rewind & Replay
Reset to any point in the shared narrative
24
Kafka: Distributed by design
• Replication
• Fault Tolerance
• Partitioning
• Elastic Scaling
25
Kafka Topics
(diagram: topic my-topic split into my-topic-partition-0, my-topic-partition-1, and my-topic-partition-2, spread across broker-1, broker-2, and broker-3)
26
Creating a topic
$ kafka-topics --bootstrap-server broker101:9092 \
    --create \
    --topic my-topic \
    --replication-factor 3 \
    --partitions 3
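The same topic can also be created programmatically. A minimal sketch using the Java AdminClient, with the broker address and topic settings taken from the command above:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker101:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // my-topic with 3 partitions and a replication factor of 3, as in the CLI example
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}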
27
Producing to Kafka
Time
28
Producing to Kafka
Time
29
Partition Leadership and Replication
(diagram: partitions 1-4 of Topic1 are spread across Brokers 1-4; each partition has one leader replica and two follower replicas on other brokers)
30
Partition Leadership and Replication - node failure
(diagram: the same layout after a broker fails; for each partition whose leader was on the failed broker, one of its follower replicas on a surviving broker takes over leadership)
31
Producing to Kafka
32
Clients - Producer Design
A ProducerRecord carries: Topic, [Partition], [Timestamp], [Headers], [Key], Value.
(diagram: send() passes the record through the Serializer and the Partitioner, then adds it to a per-topic-partition batch, e.g. Topic A / Partition 0 or Topic B / Partition 1, which is sent to the Kafka broker; on success the producer returns the record metadata, on a retriable failure it retries the send, and if it can't retry it throws an exception)
33
The Serializer
Kafka doesn't care what you send it, as long as it has been converted to a byte stream beforehand. JSON, CSV, Avro, Protobuf, or XML (if you must): a serializer turns each of them into bytes on the wire.
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
34
The Serializer
private Properties settings = new Properties();
settings.put("bootstrap.servers", "broker1:9092,broker2:9092");
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
settings.put("schema.registry.url", "https://schema-registry:8083");
KafkaProducer<String, Invoice> producer = new KafkaProducer<>(settings);
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
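With those settings in place, producing is a single send() call. A minimal sketch continuing the snippet above; the topic name "invoices" and the Invoice fields are assumptions, and the callback just logs the result:

// Invoice is assumed to be an Avro-generated class; its fields here are hypothetical.
Invoice invoice = Invoice.newBuilder()
        .setId("inv-1001")
        .setAmount(42.0)
        .build();

producer.send(new ProducerRecord<>("invoices", invoice.getId(), invoice),
    (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace();   // retries exhausted or a non-retriable error
        } else {
            System.out.printf("wrote to %s-%d at offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
        }
    });
producer.flush();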
35
Record Keys and why they're important - Ordering
Producer Record: Topic, [Partition], [Key], Value
Record keys determine the partition with the default Kafka partitioner. If a key isn't provided, messages are produced to the partitions in a round-robin fashion.
36
Record Keys and why they're important - Ordering
Producer Record: Topic, [Partition], Key=AAAA, Value
Record keys determine the partition with the default Kafka partitioner, and therefore guarantee order for a key. Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
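A rough sketch of that computation; the default partitioner applies a murmur2 hash to the serialized key (the helpers below live in Kafka's Utils class), so the same key always lands on the same partition:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyToPartition {
    // partition = hash(key) % numPartitions, as on the slide
    static int partitionFor(byte[] serializedKey, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "AAAA".getBytes(StandardCharsets.UTF_8);
        // Prints the same partition on every run, which is what preserves per-key ordering.
        System.out.println(partitionFor(key, 3));
    }
}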
37
38
39
(slides 37-39 repeat the previous build with keys BBBB, CCCC, and DDDD: each key consistently hashes to the same partition, preserving per-key order)
40
Record Keys and why they’re important - Key Cardinality
Consumers
Key cardinality affects the amount
of work done by the individual
consumers in a group. Poor key
choice can lead to uneven
workloads.
Keys in Kafka don't have to be primitives like strings or ints. Like values, they can be anything: JSON, Avro, etc. So create a key that will evenly distribute groups of records around the partitions.
Car·di·nal·i·ty
/ˌkärdəˈnalədē/
Noun
the number of elements in a set or other grouping, as a property of that grouping.
41
{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "Zip": "19101"
}
You don’t have to but... use a Schema!
Data
Producer
Service
Data
Consumer
Service
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
send JSON
“Where’s record.City?”
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you
-really-need-one/
42
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new
fields)
● Prevent backwards incompatible
changes
● Support multi-data center environments
(diagram: producer apps serialize records and write them to a Kafka topic; the Schema Registry checks each schema, and example consumers such as Elastic, Cassandra, and HDFS read the topic)
Open Source Feature
43
Developing with Confluent Schema Registry
We provide several Maven plugins for developing with
the Confluent Schema Registry
● download - download a subject’s schema to
your project
● register - register a new schema to the
schema registry from your development env
● test-compatibility - test changes made to
a schema against compatibility rules set by the
schema registry
Reference
https://docs.confluent.io/current/schema-registry/docs/maven-plugin.html
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>5.0.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://192.168.99.100:8081</param>
    </schemaRegistryUrls>
    <outputDirectory>src/main/avro</outputDirectory>
    <subjectPatterns>
      <param>^TestSubject000-(key|value)$</param>
    </subjectPatterns>
  </configuration>
</plugin>
44
{
"Name": "John Smith",
"Address": "123 Apple St.",
"Zip": "19101",
"City": "NA",
"State": "NA"
}
Avro allows for evolution of schemas
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
Data
Producer
Service
Data
Consumer
Service
send AvroRecord
Schema
Registry
Version 1 / Version 2
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you
-really-need-one/
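A sketch of what the version-2 schema could look like: City and State are added with default values, so records written with version 1 can still be read. The record name Customer is an assumption; the field names follow the JSON example above.

import org.apache.avro.Schema;

public class SchemaEvolutionSketch {
    public static void main(String[] args) {
        // Version 2 adds City and State with defaults, a backwards-compatible change.
        String schemaV2 = "{"
            + "\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"Name\",\"type\":\"string\"},"
            + "{\"name\":\"Address\",\"type\":\"string\"},"
            + "{\"name\":\"Zip\",\"type\":\"string\"},"
            + "{\"name\":\"City\",\"type\":\"string\",\"default\":\"NA\"},"
            + "{\"name\":\"State\",\"type\":\"string\",\"default\":\"NA\"}]}";
        Schema schema = new Schema.Parser().parse(schemaV2);
        System.out.println(schema.getField("City").defaultVal()); // prints NA
    }
}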
45
Use Kafka’s Headers
Reference
https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
Producer Record
Topic
[Partition]
[Timestamp]
Value
[Headers]
[Key]
Kafka headers are simple key/value pairs: the key is a String and the value a byte[]. They are carried on each ProducerRecord as an iterable Headers collection (a short sketch follows the use cases below).
Example Use Cases
● Data lineage: reference previous topic partition/offsets
● Producing host/application/owner
● Message routing
● Encryption metadata (which key pair was this message
payload encrypted with?)
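A minimal sketch of attaching headers to a record, assuming a producer configured as in the earlier snippets; the header names here are purely illustrative:

ProducerRecord<String, String> record =
    new ProducerRecord<>("my-topic", "some-key", "some-value");

// Header keys are Strings, header values are byte arrays
record.headers().add("producing-host", "app-server-01".getBytes(StandardCharsets.UTF_8));
record.headers().add("trace-id", "4711".getBytes(StandardCharsets.UTF_8));

producer.send(record);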
46
Producer Guarantees
(diagram: producer P writes to the leader replica of Topic1 partition1 on Broker 1; followers on Brokers 2 and 3 replicate it)
Producer Properties:
acks=0 (fire and forget, the producer does not wait for any acknowledgement)
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
47
Producer Guarantees
(diagram: the leader on Broker 1 acknowledges the write back to producer P)
Producer Properties:
acks=1 (the leader acknowledges the write; the followers replicate it afterwards)
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
48
Producer Guarantees
(diagram: the write is acknowledged only after the in-sync follower replicas have it as well)
Producer Properties:
acks=all
min.insync.replicas=2 (a topic/broker setting, not a producer property)
49
Producer Guarantees - without exactly once guarantees
(diagram: the write to Topic1 partition1 succeeds and is replicated, but the ack back to producer P is lost)
Producer Properties:
acks=all
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
Successful write, failed ack
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
50
Producer Guarantees - without exactly once guarantees
(diagram: the producer never saw the ack, so it retries the send and the broker appends the same record a second time)
Producer Properties:
acks=all
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
{key: 1234, data: abcd} - offset 3346  <- dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
51
Producer Guarantees - with exactly once guarantees
(diagram: with idempotence enabled, every record carries a producer id and sequence number; the broker rejects the duplicate on retry and simply re-sends the ack)
Producer Properties:
enable.idempotence=true
max.in.flight.requests.per.connection=5
acks=all
retries > 0 (preferably MAX_INT)
(pid, seq) [payload]
(100, 1) {key: 1234, data: abcd} - offset 3345
(100, 1) {key: 1234, data: abcd} - rejected, ack re-sent
(100, 2) {key: 5678, data: efgh} - offset 3346
no dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
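Put together, a sketch of producer settings for this configuration; the broker addresses and serializers are placeholders, and note that enable.idempotence=true requires acks=all and retries > 0:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Idempotent producer: the broker de-duplicates retries using (producer id, sequence number)
props.put("enable.idempotence", "true");
props.put("acks", "all");
props.put("retries", Integer.toString(Integer.MAX_VALUE));
props.put("max.in.flight.requests.per.connection", "5");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);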
52
Transactional Producer
Producer
(diagram: several records produced as part of a single transaction T1)
KafkaProducer producer = createKafkaProducer(
  "bootstrap.servers", "broker:9092",
  "transactional.id", "my-transactional-id");
producer.initTransactions();
producer.beginTransaction();
-- send some records --
producer.commitTransaction();
Consumer
KafkaConsumer consumer = createKafkaConsumer(
"bootstrap.servers", "broker:9092",
"group.id", "my-group-id",
"isolation.level", "read_committed");
Reference
https://www.confluent.io/blog/transactions-apache-kafka/
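A slightly fuller sketch of the producer side with the real client API (topic name, keys, and values are placeholders); everything sent between beginTransaction() and commitTransaction() becomes visible to read_committed consumers atomically:

Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "broker:9092");
producerProps.put("transactional.id", "my-transactional-id");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("my-topic", "key", "value-1"));
    producer.send(new ProducerRecord<>("my-topic", "key", "value-2"));
    producer.commitTransaction();
} catch (KafkaException e) {   // org.apache.kafka.common.KafkaException
    // none of the records in the transaction become visible to read_committed consumers
    producer.abortTransaction();
}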
53
Consuming from Kafka
54
A basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
try {
  while (true) {
    // poll for new records, waiting up to 100 ms
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      // Do Some Work …
    }
  }
} finally {
  consumer.close();
}
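For completeness, a sketch of the props used above, assuming the same brokers as earlier and one reasonable set of defaults; group.id is what places this consumer in a consumer group, which matters for the following slides:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "my-consumer-group");   // consumers sharing this id split the partitions between them
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "true");      // offsets are committed automatically
props.put("auto.offset.reset", "earliest");   // start from the beginning if the group has no committed offset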
55
Consuming From Kafka - Single Consumer
(diagram: a single consumer reads from all partitions of the topic)
56
Consuming From Kafka - Grouped Consumers
(diagram: two consumer groups, C1 and C2, each with two consumers; each group independently consumes the whole topic)
57
Consuming From Kafka - Grouped Consumers
(diagram: four consumers in a single group)
58
Consuming From Kafka - Grouped Consumers
(diagram: partitions 0-3 assigned one per consumer in the group)
59
Consuming From Kafka - Grouped Consumers
(diagram: the same assignment of partitions 0-3 across the group)
60
Consuming From Kafka - Grouped Consumers
(diagram: after a rebalance, one consumer owns partitions 0 and 3 while the others keep partitions 1 and 2)
61
Resources
Free E-Books from Confluent!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle
Confluent Blog: https://www.confluent.io/blog
Thank You!
gabriel@confluent.io
@gnschenker
62
Thank You!
25% off! KS19Comm25