1
What is Apache Kafka and What is an Event Streaming Platform?
Bern Apache Kafka® Meetup
2
Join the Confluent Community Slack Channel: cnfl.io/community-slack
Subscribe to the Confluent blog: cnfl.io/read
Welcome to the Apache Kafka® Meetup in Bern!
6:00pm Doors open
6:00pm - 6:30pm Food, Drinks and Networking
6:30pm - 6:50pm Matthias Imsand, Amanox Solutions
6:50pm - 7:35pm Gabriel Schenker, Confluent
7:35pm - 8:00pm Additional Q&A & Networking
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation
with and does not endorse the materials provided at this event.
3
About Me
● Gabriel N. Schenker
● Lead Curriculum Developer @ Confluent
● Formerly at Docker, Alienvault, …
● Lives in Appenzell, AI
● GitHub: github.com/gnschenker
● Twitter: @gnschenker
4
What is an Event Streaming Platform?
5
Event Streaming Platforms should do two things:
● Reliably store streams of events
● Process streams of events
6
The Event Streaming Paradigm
7
ETL/Data Integration Messaging
Batch
Expensive
Time Consuming
Difficult to Scale
No Persistence
Data Loss
No Replay
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
8
9
ETL/Data Integration (stored records)
Messaging (transient messages)
Batch
Expensive
Time Consuming
Difficult to Scale
No Persistence
Data Loss
No Replay
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
Event Streaming Paradigm
High Throughput
Durable
Persistent
Maintains Order
Fast (Low Latency)
10
Event Streaming Paradigm
Rethink data not as stored records or transient messages, but as a continually updating stream of events.
11
12
13
14
15
16
(diagram: Apache Kafka®, with its CONNECT, STREAMS, and CLIENTS APIs, as a universal event pipeline linking data stores such as mainframes, Hadoop, and data warehouses, device logs, 3rd-party apps such as Splunk, and custom apps/microservices to contextual event-driven apps: real-time inventory, real-time fraud detection, real-time customer 360, machine learning models, and real-time data transformation)
17
18
A Modern, Distributed Platform for Data Streams
19
Apache Kafka® is made up of distributed, immutable, append-only commit logs
20
Writers
Kafka cluster
Readers
21
Kafka: Scalability of a filesystem
• hundreds of MB/s throughput
• many TB per server
• commodity hardware
22
Kafka: Guarantees of a Database
• Strict ordering
• Persistence
23
Kafka: Rewind and Replay
Rewind & Replay
Reset to any point in the shared narrative
24
Kafka: Distributed by design
• Replication
• Fault Tolerance
• Partitioning
• Elastic Scaling
25
Kafka Topics
(diagram: topic my-topic split into my-topic-partition-0, my-topic-partition-1, and my-topic-partition-2, spread across broker-1, broker-2, and broker-3)
26
Creating a topic
$ kafka-topics --bootstrap-server broker101:9092 \
    --create \
    --topic my-topic \
    --replication-factor 3 \
    --partitions 3
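The same topic can also be created programmatically. A minimal sketch using the Java AdminClient, with the broker address and topic settings taken from the command above:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker101:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // my-topic with 3 partitions and a replication factor of 3, as in the CLI example
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}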
27
Producing to Kafka
Time
28
Producing to Kafka
Time
29
Partition Leadership and Replication
(diagram: partitions 1-4 of Topic1 are spread across Brokers 1-4; each partition has one leader replica and two follower replicas on other brokers)
30
Partition Leadership and Replication - node failure
(diagram: the same layout after a broker fails; for each partition whose leader was on the failed broker, one of its follower replicas on a surviving broker takes over leadership)
31
Producing to Kafka
32
Clients - Producer Design
A ProducerRecord carries: Topic, [Partition], [Timestamp], [Headers], [Key], Value.
(diagram: send() passes the record through the Serializer and the Partitioner, then adds it to a per-topic-partition batch, e.g. Topic A / Partition 0 or Topic B / Partition 1, which is sent to the Kafka broker; on success the producer returns the record metadata, on a retriable failure it retries the send, and if it can't retry it throws an exception)
33
The Serializer
Kafka doesn't care what you send it, as long as it has been converted to a byte stream beforehand. JSON, CSV, Avro, Protobuf, or XML (if you must): a serializer turns each of them into bytes on the wire.
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
34
The Serializer
private Properties settings = new Properties();
settings.put("bootstrap.servers", "broker1:9092,broker2:9092");
settings.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
settings.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
settings.put("schema.registry.url", "https://schema-registry:8083");
KafkaProducer<String, Invoice> producer = new KafkaProducer<>(settings);
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
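With those settings in place, producing is a single send() call. A minimal sketch continuing the snippet above; the topic name "invoices" and the Invoice fields are assumptions, and the callback just logs the result:

// Invoice is assumed to be an Avro-generated class; its fields here are hypothetical.
Invoice invoice = Invoice.newBuilder()
        .setId("inv-1001")
        .setAmount(42.0)
        .build();

producer.send(new ProducerRecord<>("invoices", invoice.getId(), invoice),
    (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace();   // retries exhausted or a non-retriable error
        } else {
            System.out.printf("wrote to %s-%d at offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
        }
    });
producer.flush();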
35
Record Keys and why they're important - Ordering
Producer Record: Topic, [Partition], [Key], Value
Record keys determine the partition with the default Kafka partitioner. If a key isn't provided, messages are produced to the partitions in a round-robin fashion.
36
Record Keys and why they're important - Ordering
Producer Record: Topic, [Partition], Key=AAAA, Value
Record keys determine the partition with the default Kafka partitioner, and therefore guarantee order for a key. Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
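A rough sketch of that computation; the default partitioner applies a murmur2 hash to the serialized key (the helpers below live in Kafka's Utils class), so the same key always lands on the same partition:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyToPartition {
    // partition = hash(key) % numPartitions, as on the slide
    static int partitionFor(byte[] serializedKey, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "AAAA".getBytes(StandardCharsets.UTF_8);
        // Prints the same partition on every run, which is what preserves per-key ordering.
        System.out.println(partitionFor(key, 3));
    }
}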
37
38
39
(slides 37-39 repeat the previous build with keys BBBB, CCCC, and DDDD: each key consistently hashes to the same partition, preserving per-key order)
40
Record Keys and why they’re important - Key Cardinality
Consumers
Key cardinality affects the amount
of work done by the individual
consumers in a group. Poor key
choice can lead to uneven
workloads.
Keys in Kafka don't have to be primitives like strings or ints. Like values, they can be anything: JSON, Avro, etc. So create a key that will evenly distribute groups of records around the partitions.
Car·di·nal·i·ty
/ˌkärdəˈnalədē/
Noun
the number of elements in a set or other grouping, as a property of that grouping.
41
{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "Zip": "19101"
}
You don’t have to but... use a Schema!
Data
Producer
Service
Data
Consumer
Service
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
send JSON
“Where’s record.City?”
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you
-really-need-one/
42
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new
fields)
● Prevent backwards incompatible
changes
● Support multi-data center environments
(diagram: producer apps serialize records and write them to a Kafka topic; the Schema Registry checks each schema, and example consumers such as Elastic, Cassandra, and HDFS read the topic)
Open Source Feature
43
Developing with Confluent Schema Registry
We provide several Maven plugins for developing with
the Confluent Schema Registry
● download - download a subject’s schema to
your project
● register - register a new schema to the
schema registry from your development env
● test-compatibility - test changes made to
a schema against compatibility rules set by the
schema registry
Reference
https://docs.confluent.io/current/schema-registry/docs/maven-plugin.html
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>5.0.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://192.168.99.100:8081</param>
    </schemaRegistryUrls>
    <outputDirectory>src/main/avro</outputDirectory>
    <subjectPatterns>
      <param>^TestSubject000-(key|value)$</param>
    </subjectPatterns>
  </configuration>
</plugin>
44
{
"Name": "John Smith",
"Address": "123 Apple St.",
"Zip": "19101",
"City": "NA",
"State": "NA"
}
Avro allows for evolution of schemas
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
Data
Producer
Service
Data
Consumer
Service
send AvroRecord
Schema
Registry
Version 1 / Version 2
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you
-really-need-one/
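A sketch of what the version-2 schema could look like: City and State are added with default values, so records written with version 1 can still be read. The record name Customer is an assumption; the field names follow the JSON example above.

import org.apache.avro.Schema;

public class SchemaEvolutionSketch {
    public static void main(String[] args) {
        // Version 2 adds City and State with defaults, a backwards-compatible change.
        String schemaV2 = "{"
            + "\"type\":\"record\",\"name\":\"Customer\",\"fields\":["
            + "{\"name\":\"Name\",\"type\":\"string\"},"
            + "{\"name\":\"Address\",\"type\":\"string\"},"
            + "{\"name\":\"Zip\",\"type\":\"string\"},"
            + "{\"name\":\"City\",\"type\":\"string\",\"default\":\"NA\"},"
            + "{\"name\":\"State\",\"type\":\"string\",\"default\":\"NA\"}]}";
        Schema schema = new Schema.Parser().parse(schemaV2);
        System.out.println(schema.getField("City").defaultVal()); // prints NA
    }
}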
45
Use Kafka’s Headers
Reference
https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
Producer Record
Topic
[Partition]
[Timestamp]
Value
[Headers]
[Key]
Kafka headers are simple key/value pairs: the key is a String and the value a byte[]. They are carried on each ProducerRecord as an iterable Headers collection (a short sketch follows the use cases below).
Example Use Cases
● Data lineage: reference previous topic partition/offsets
● Producing host/application/owner
● Message routing
● Encryption metadata (which key pair was this message
payload encrypted with?)
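A minimal sketch of attaching headers to a record, assuming a producer configured as in the earlier snippets; the header names here are purely illustrative:

ProducerRecord<String, String> record =
    new ProducerRecord<>("my-topic", "some-key", "some-value");

// Header keys are Strings, header values are byte arrays
record.headers().add("producing-host", "app-server-01".getBytes(StandardCharsets.UTF_8));
record.headers().add("trace-id", "4711".getBytes(StandardCharsets.UTF_8));

producer.send(record);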
46
Producer Guarantees
(diagram: producer P writes to the leader replica of Topic1 partition1 on Broker 1; followers on Brokers 2 and 3 replicate it)
Producer Properties:
acks=0 (fire and forget, the producer does not wait for any acknowledgement)
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
47
Producer Guarantees
(diagram: the leader on Broker 1 acknowledges the write back to producer P)
Producer Properties:
acks=1 (the leader acknowledges the write; the followers replicate it afterwards)
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
48
Producer Guarantees
(diagram: the write is acknowledged only after the in-sync follower replicas have it as well)
Producer Properties:
acks=all
min.insync.replicas=2 (a topic/broker setting, not a producer property)
49
Producer Guarantees - without exactly once guarantees
(diagram: the write to Topic1 partition1 succeeds and is replicated, but the ack back to producer P is lost)
Producer Properties:
acks=all
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
Successful write, failed ack
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
50
Producer Guarantees - without exactly once guarantees
(diagram: the producer never saw the ack, so it retries the send and the broker appends the same record a second time)
Producer Properties:
acks=all
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
{key: 1234, data: abcd} - offset 3346  <- dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
51
Producer Guarantees - with exactly once guarantees
(diagram: with idempotence enabled, every record carries a producer id and sequence number; the broker rejects the duplicate on retry and simply re-sends the ack)
Producer Properties:
enable.idempotence=true
max.in.flight.requests.per.connection=5
acks=all
retries > 0 (preferably MAX_INT)
(pid, seq) [payload]
(100, 1) {key: 1234, data: abcd} - offset 3345
(100, 1) {key: 1234, data: abcd} - rejected, ack re-sent
(100, 2) {key: 5678, data: efgh} - offset 3346
no dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-p
ossible-heres-how-apache-kafka-does-it/
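Put together, a sketch of producer settings for this configuration; the broker addresses and serializers are placeholders, and note that enable.idempotence=true requires acks=all and retries > 0:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Idempotent producer: the broker de-duplicates retries using (producer id, sequence number)
props.put("enable.idempotence", "true");
props.put("acks", "all");
props.put("retries", Integer.toString(Integer.MAX_VALUE));
props.put("max.in.flight.requests.per.connection", "5");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);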
52
Transactional Producer
Producer
(diagram: several records produced as part of a single transaction T1)
KafkaProducer producer = createKafkaProducer(
  "bootstrap.servers", "broker:9092",
  "transactional.id", "my-transactional-id");
producer.initTransactions();
producer.beginTransaction();
-- send some records --
producer.commitTransaction();
Consumer
KafkaConsumer consumer = createKafkaConsumer(
"bootstrap.servers", "broker:9092",
"group.id", "my-group-id",
"isolation.level", "read_committed");
Reference
https://www.confluent.io/blog/transactions-apache-kafka/
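A slightly fuller sketch of the producer side with the real client API (topic name, keys, and values are placeholders); everything sent between beginTransaction() and commitTransaction() becomes visible to read_committed consumers atomically:

Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "broker:9092");
producerProps.put("transactional.id", "my-transactional-id");
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("my-topic", "key", "value-1"));
    producer.send(new ProducerRecord<>("my-topic", "key", "value-2"));
    producer.commitTransaction();
} catch (KafkaException e) {   // org.apache.kafka.common.KafkaException
    // none of the records in the transaction become visible to read_committed consumers
    producer.abortTransaction();
}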
53
Consuming from Kafka
54
A basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
try {
  while (true) {
    // poll for new records, waiting up to 100 ms
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
      // Do Some Work …
    }
  }
} finally {
  consumer.close();
}
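For completeness, a sketch of the props used above, assuming the same brokers as earlier and one reasonable set of defaults; group.id is what places this consumer in a consumer group, which matters for the following slides:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "my-consumer-group");   // consumers sharing this id split the partitions between them
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "true");      // offsets are committed automatically
props.put("auto.offset.reset", "earliest");   // start from the beginning if the group has no committed offset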
55
Consuming From Kafka - Single Consumer
(diagram: a single consumer reads from all partitions of the topic)
56
Consuming From Kafka - Grouped Consumers
(diagram: two consumer groups, C1 and C2, each with two consumers; each group independently consumes the whole topic)
57
Consuming From Kafka - Grouped Consumers
(diagram: four consumers in a single group)
58
Consuming From Kafka - Grouped Consumers
(diagram: partitions 0-3 assigned one per consumer in the group)
59
Consuming From Kafka - Grouped Consumers
(diagram: the same assignment of partitions 0-3 across the group)
60
Consuming From Kafka - Grouped Consumers
(diagram: after a rebalance, one consumer owns partitions 0 and 3 while the others keep partitions 1 and 2)
61
Resources
Free E-Books from Confluent!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle
Confluent Blog: https://www.confluent.io/blog
Thank You!
gabriel@confluent.io
@gnschenker
62
Thank You!
25% off! KS19Comm25