SlideShare a Scribd company logo
1 of 49
1
Apache Kafka™ Fundamentals:
What Every Software Engineer Should
Know about Streams and Tables in Kafka
Kafka Meetup Zurich, Switzerland, Feb 2020
Michael G. Noll
Senior Technologist, Office of the CTO
@miguno
22
@miguno
Streams Tables
Event Streaming
33
@miguno
Streams Tables
Event Streaming
44
Events, Streams, Tables
A Primer
5
An Event Streaming Platform
gives you three key functionalities
5
Publish & Subscribe
to Events
Store
Events
Process & Analyze
Events
@miguno
6
An Event
records the fact that something happened
6
A good
was sold
An invoice
was issued to
Michael
Alice paid
$200 to Bob
on Feb 01, 2020
at 8:31am
A new customer
registered
@miguno
7
A Stream
records history as a sequence of Events
7
@miguno
88
@miguno
Event Streaming Paradigm
Highly Scalable
Durable
Persistent
Maintains Order
Fast (Low Latency)
Kafka = Source of Truth,
stores every article since 1851
Denormalized into
“Content View”
Normalized assets
(images, articles,
bylines, etc.)
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
Streams record history, even hundreds of years
99
“The ledger of sales.” “The sales totals.”
Streams
record history
Tables
represent state
@miguno
1010
1. e4 e5
2. Nf3 Nc6
3. Bc4 Bc5
4. d3 Nf6
5. Nbd2
“The sequence of moves.” “The state of the board.”
Streams
record history
Tables
represent state
@miguno
11
Streams = INSERT only
Immutable, append-only
Tables = INSERT, UPDATE, DELETE
Mutable, row key (event.key) identifies which row
11
@miguno
12
The key to mutability is … the event.key!
12
@miguno
Stream Table
Has unique key constraint? No Yes
First event with key bob arrives INSERT INSERT
Another event with key bob arrives INSERT UPDATE
Event with key bob and value null arrives INSERT DELETE
Event with key null arrives INSERT <ignored>
RDBMS analogy: A Stream is ~ a Table that has no unique key and is append-only.
13
Creating a table from a stream or topic
streams
1414
@miguno
Stream
(facts)
Table
(dims)
alice Berlin
bob Lima
alice Berlin
alice Rome
bob Lima
alice Paris
bob Sydney
alice Berlin
alice Rome
bob Lima
alice Paris
bob Sydney
90°
1515
aggregation
like SUM(), COUNT()
table changes
*
See the academic paper Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18
Streams
record history
Tables
represent state
Duality
@miguno
16
Aggregating a stream (COUNT example)
streams
@miguno
17
Aggregating a stream (COUNT example)
@miguno
streams
1818
Topics, Partitions, and
Other Storage Fundamentals
19
A Topic
is where events are durably stored,
like a file in a distributed file system
19
@miguno
An unbounded sequence of serialized events,
where each event is represented as an
encoded key-value pair*
or “Kafka message”.
00100 11101 11000 0001110101 10001
20
Storage formats
Clients serialize events to bytes when writing,
and deserialize when reading.
20
@miguno
00100 11101 11000 00011 00100 0011010101 10001
Producer serializes
and writes an event
Consumer reads and
deserializes an event
21
Storage formats
Brokers take no part in this. All they see is bytes.
Being this “dumb” is actually really smart!
21
@miguno
Bytes in Bytes out
22
Storage is partitioned
And ‘partitions’ are the most important concept in Kafka
22
...
...
...
...
P1
P2
P3
P4
Storage
Topic
@miguno
23
Partitions are spread across Kafka brokers
A broker can be leader and/or follower of 1+ partitions
23
@miguno
...
P1 ...
P2
...
P3 ...
P1
...
...
...P4 ...P4 ...
...
P1
...
P3
...
P2
...
...
...
Leader
Follower
24
Storage Layer
(Brokers)
Processing Layer
(ksqlDB, KStreams,
Kafka clients)
Partitions play a central role in Kafka
24
Topics are partitioned. Partitions enable scalability, elasticity, fault tolerance.
stored in
replicated based on
ordered based on
PartitionsData is joined based on
read from and written to
processed based on
@miguno
25
Event producers determine event partitioning
through a partitioning function ƒ(event.key, event.value)*
25
...
...
...
...
P1
P2
P3
P4
Storage
Topic
Event sent and appended
to partition 1
@miguno
26
Events with same key should be in same partition
to ensure proper ordering of related events
to ensure processing by Consumers returns expected results
26
...
...
...
...
P1
P2
P3
P4
Yellow events should always
be stored in partition 3
@miguno
27
Top causes for same key in different partitions
1. Topic configuration
You increased the number of partitions in a topic
2. Producer configuration
A producer uses a custom partitioner
→ Be careful in this situation!
27
@miguno
2828
Processing Fundamentals with
ksqlDB and Kafka Streams
From Topics to Streams and Tables
29
Topics
live in the Kafka storage layer, ‘filesystem’ (Brokers)
Streams and Tables
live in the Kafka processing layer (ksqlDB, KStreams)
29
@miguno
30
Processing Layer
(ksqlDB, KStreams)
30
00100 11101 11000 00011 00100 00110Topic
alice Paris bob Sydney alice RomeEvent Stream
plus schema (serdes)
alice 2
bob 1
Table
plus aggregation
Storage Layer
(Brokers)
Topics vs. Streams and Tables
@miguno
31
Kafka Processing
Data is processed per-partition
31
...
...
...
...
P1
P2
P3
P4
Storage Processing
read via
network
Topic App Instance 1 Application
App Instance 2
‘payments’ with consumer group
‘fraud-detection-app’
@miguno
32
Kafka Processing
Stream tasks are the unit of parallelism
32
...
...
...
...
P1
P2
P3
P4
Storage Processing State
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
read via
network
Application Instance 1Topic Application Instance 1
Application Instance 2
@miguno
33
Streams and Tables are partitioned, too
33
...
...
...
...
P1
P2
P3
P4
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
TABLE / KTable
2 GB
3 GB
5 GB
2 GB
Application Instance 1
Application Instance 2
34
Global Tables give complete data to every task
Great for joins without re-keying the input or to broadcast info
34
...
...
...
...
P1
P2
P3
P4
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
GlobalKTable
2 + 3 + 5 + 2 = 12 GB
12 GB
12 GB
12 GB
@miguno
3535
@miguno@miguno
Concept Schema Partitioned Unbounded Ordering
Guarantee***
Mutable Single Value
Per Key
Fault Tolerant
Storage Layer
Topic No* (raw bytes) Yes Yes Yes No No Yes
Processing Layer
Stream Yes Yes Yes Yes No No Yes
Table Yes Yes No** No Yes Yes Yes
Global Table Yes No No** No Yes Yes Yes
*When using Schema Registry with Avro, then a schema is technically available for the data in a topic.
**Generally speaking the answer is Yes but, in practice, tables are almost always bounded due to finite key space.
***Ordering per partition, not across the multiple partitions in a topic.
Topics vs. Streams and Tables
3636
Elasticity, Fault Tolerance, and
Other Advanced Concepts
37
Tables are materialized on local disk
so that processing is fast, and tables can be larger than RAM.
Enables large-scale state. Saves $$$ on cloud instances.
37
But machines and containers can be lost!
How can we make this fault-tolerant? @miguno
Stream Task 1 Materialized on local disk under /var
(via RocksDB by default)
Global
Table
Table
38
streaming ‘restore’
via network
Stream Task 1
On machine B
Tables and other state are made fault-tolerant
by storing their data durably, remotely in Kafka topics
38
streaming ‘backup’
via network
table’s changelog topic
(log-compacted)
Kafka
Storage
ksqlDB,
Kafka Streams
On machine A
Stream Task 1
39
Standby Replicas speed up application recovery
App instances can optionally maintain copies of another
instance’s table/state data to minimize failover times
39
@miguno
num.standby.replicas = 1
Stream Task 1
Stream Task 2
App Instance 1
App Instance 2
num.standby.replicas = 0 (default)
Stream Task 1
Stream Task 2
Stream Task 2
failover
40
Elastic scaling
Stream tasks are migrated, including tables/state (via Kafka)
40
@miguno
App Instance 1
App Instance 2
Stream Task 1
Stream Task 3
Stream Task 2
Stream Task 4
App Instance 3
App Instance 4
Stream Task 2
Stream Task 4
41
bob Zurich
alice Rome
bob Zurich
bob Sydney alice Romealice Bern alice Paris
Tables <-> topic log-compaction
A table’s underlying topic is compacted by default
to save Kafka storage space
to speed up failover and recovery for processing
41
Note: Compaction intentionally removes part of a table’s history. If you need the full
history and don’t have the historic data elsewhere, consider disabling compaction.
@miguno
4242
@miguno
TL;DR for Log Compaction
Have a Stream?
→ Disable log compaction for its topic (= default)
Have a Table?
→ Enable log compaction for its topic (= default)
Disable only when needed, see previous slide
43
Max processing parallelism = #input partitions
43
...
...
...
...
P1
P2
P3
P4
Topic Application Instance 1
Application Instance 2
Application Instance 3
Application Instance 4
Application Instance 5 *** idle ***
Application Instance 6 *** idle ***
→ Need higher parallelism? Increase the original topic’s partition count.
→ Higher parallelism for just one use case? Derive a new topic from the
original with higher partition count. Lower its retention to save storage.
@miguno
44
How to increase number of partitions when needed
ksqlDB example: statement below creates a new stream with
the desired number of partitions.
44
CREATE STREAM products_repartitioned
WITH (PARTITIONS=30) AS
SELECT * FROM products
EMIT CHANGES
@miguno
45
‘Hot’ partitions can be problematic, often caused by
1. Events not evenly distributed across partitions
2. Events evenly distributed but certain events take longer to process
45
Strategies to address hot partitions include
1a. Ingress: Find better partitioning function ƒ(event.key) for producers
1b. Storage: Re-partition data into new topic if you can’t change the original
2. Scale processing vertically, e.g. more powerful CPU instances
...
...
...
...
P1
P2
P3
P4
@miguno
46
How to re-partition your data when needed
ksqlDB example: statement below creates a new stream with
changed number of partitions and a new field as event.key
(so that its data is now correctly co-partitioned for joining)
46
CREATE STREAM products_repartitioned
WITH (PARTITIONS=42) AS
SELECT * FROM products
PARTITION BY product_id
EMIT CHANGES
@miguno
47
Capacity Planning and Sizing
Sorry, not covered here! I recommend:
ksqlDB Performance Tuning for Fun and Profit, by Nick Dearden
https://kafka-summit.org/sessions/ksql-performance-tuning-fun-profit/
ksqlDB documentation
https://docs.ksqldb.io/en/latest/operate-and-deploy/capacity-planning/
Kafka Streams documentation
https://docs.confluent.io/current/streams/sizing.html
47
@miguno
4848
Where to go from here?
My blog series on Kafka Fundamentals to learn more
https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming
Confluent Cloud, the fastest & easiest way to get started with Kafka
https://www.confluent.io/confluent-cloud/
Confluent Platform, if you want to operate Kafka yourself
https://www.confluent.io/product/confluent-platform/
48
@miguno
49
THANK YOU
@miguno
michael@confluent.io
cnfl.io/meetups cnfl.io/blog cnfl.io/slack

More Related Content

What's hot

SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...confluent
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaQAware GmbH
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkconfluent
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafkaconfluent
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...HostedbyConfluent
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...confluent
 
Etl, esb, mq? no! es Apache Kafka®
Etl, esb, mq?  no! es Apache Kafka®Etl, esb, mq?  no! es Apache Kafka®
Etl, esb, mq? no! es Apache Kafka®confluent
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connectorconfluent
 
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...confluent
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...confluent
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Servicesconfluent
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)KafkaZone
 
Architecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant AnalyticsArchitecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant Analyticsconfluent
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...confluent
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?confluent
 
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...Natan Silnitsky
 

What's hot (20)

SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafka
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
 
Etl, esb, mq? no! es Apache Kafka®
Etl, esb, mq?  no! es Apache Kafka®Etl, esb, mq?  no! es Apache Kafka®
Etl, esb, mq? no! es Apache Kafka®
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connector
 
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
 
Architecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant AnalyticsArchitecting Microservices Applications with Instant Analytics
Architecting Microservices Applications with Instant Analytics
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev...
 

Similar to Kafka Streams and Tables: A Primer

Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019Michael Noll
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8sVMware Tanzu
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetesconfluent
 
spark stream - kafka - the right way
spark stream - kafka - the right way spark stream - kafka - the right way
spark stream - kafka - the right way Dori Waldman
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Aljoscha Krettek
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Virtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaVirtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaJason Bell
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...StreamNative
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...confluent
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Tachyon Nexus, Inc.
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache KafkaMatthias J. Sax
 

Similar to Kafka Streams and Tables: A Primer (20)

Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019
 
Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
spark stream - kafka - the right way
spark stream - kafka - the right way spark stream - kafka - the right way
spark stream - kafka - the right way
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Virtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to KafkaVirtual Bash! A Lunchtime Introduction to Kafka
Virtual Bash! A Lunchtime Introduction to Kafka
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your Needs
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Stream Application Development with Apache Kafka
Stream Application Development with Apache KafkaStream Application Development with Apache Kafka
Stream Application Development with Apache Kafka
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Kafka Streams and Tables: A Primer

  • 1. 1 Apache Kafka™ Fundamentals: What Every Software Engineer Should Know about Streams and Tables in Kafka Kafka Meetup Zurich, Switzerland, Feb 2020 Michael G. Noll Senior Technologist, Office of the CTO @miguno
  • 5. 5 An Event Streaming Platform gives you three key functionalities 5 Publish & Subscribe to Events Store Events Process & Analyze Events @miguno
  • 6. 6 An Event records the fact that something happened 6 A good was sold An invoice was issued to Michael Alice paid $200 to Bob on Feb 01, 2020 at 8:31am A new customer registered @miguno
  • 7. 7 A Stream records history as a sequence of Events 7 @miguno
  • 8. 88 @miguno Event Streaming Paradigm Highly Scalable Durable Persistent Maintains Order Fast (Low Latency) Kafka = Source of Truth, stores every article since 1851 Denormalized into “Content View” Normalized assets (images, articles, bylines, etc.) https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ Streams record history, even hundreds of years
  • 9. 99 “The ledger of sales.” “The sales totals.” Streams record history Tables represent state @miguno
  • 10. 1010 1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. Nbd2 “The sequence of moves.” “The state of the board.” Streams record history Tables represent state @miguno
  • 11. 11 Streams = INSERT only Immutable, append-only Tables = INSERT, UPDATE, DELETE Mutable, row key (event.key) identifies which row 11 @miguno
  • 12. 12 The key to mutability is … the event.key! 12 @miguno Stream Table Has unique key constraint? No Yes First event with key bob arrives INSERT INSERT Another event with key bob arrives INSERT UPDATE Event with key bob and value null arrives INSERT DELETE Event with key null arrives INSERT <ignored> RDBMS analogy: A Stream is ~ a Table that has no unique key and is append-only.
  • 13. 13 Creating a table from a stream or topic streams
  • 14. 1414 @miguno Stream (facts) Table (dims) alice Berlin bob Lima alice Berlin alice Rome bob Lima alice Paris bob Sydney alice Berlin alice Rome bob Lima alice Paris bob Sydney 90°
  • 15. 1515 aggregation like SUM(), COUNT() table changes * See the academic paper Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18 Streams record history Tables represent state Duality @miguno
  • 16. 16 Aggregating a stream (COUNT example) streams @miguno
  • 17. 17 Aggregating a stream (COUNT example) @miguno streams
  • 18. 1818 Topics, Partitions, and Other Storage Fundamentals
  • 19. 19 A Topic is where events are durably stored, like a file in a distributed file system 19 @miguno An unbounded sequence of serialized events, where each event is represented as an encoded key-value pair* or “Kafka message”. 00100 11101 11000 0001110101 10001
  • 20. 20 Storage formats Clients serialize events to bytes when writing, and deserialize when reading. 20 @miguno 00100 11101 11000 00011 00100 0011010101 10001 Producer serializes and writes an event Consumer reads and deserializes an event
  • 21. 21 Storage formats Brokers take no part in this. All they see is bytes. Being this “dumb” is actually really smart! 21 @miguno Bytes in Bytes out
  • 22. 22 Storage is partitioned And ‘partitions’ are the most important concept in Kafka 22 ... ... ... ... P1 P2 P3 P4 Storage Topic @miguno
  • 23. 23 Partitions are spread across Kafka brokers A broker can be leader and/or follower of 1+ partitions 23 @miguno ... P1 ... P2 ... P3 ... P1 ... ... ...P4 ...P4 ... ... P1 ... P3 ... P2 ... ... ... Leader Follower
  • 24. 24 Storage Layer (Brokers) Processing Layer (ksqlDB, KStreams, Kafka clients) Partitions play a central role in Kafka 24 Topics are partitioned. Partitions enable scalability, elasticity, fault tolerance. stored in replicated based on ordered based on PartitionsData is joined based on read from and written to processed based on @miguno
  • 25. 25 Event producers determine event partitioning through a partitioning function ƒ(event.key, event.value)* 25 ... ... ... ... P1 P2 P3 P4 Storage Topic Event sent and appended to partition 1 @miguno
  • 26. 26 Events with same key should be in same partition to ensure proper ordering of related events to ensure processing by Consumers returns expected results 26 ... ... ... ... P1 P2 P3 P4 Yellow events should always be stored in partition 3 @miguno
  • 27. 27 Top causes for same key in different partitions 1. Topic configuration You increased the number of partitions in a topic 2. Producer configuration A producer uses a custom partitioner → Be careful in this situation! 27 @miguno
  • 28. 2828 Processing Fundamentals with ksqlDB and Kafka Streams From Topics to Streams and Tables
  • 29. 29 Topics live in the Kafka storage layer, ‘filesystem’ (Brokers) Streams and Tables live in the Kafka processing layer (ksqlDB, KStreams) 29 @miguno
  • 30. 30 Processing Layer (ksqlDB, KStreams) 30 00100 11101 11000 00011 00100 00110Topic alice Paris bob Sydney alice RomeEvent Stream plus schema (serdes) alice 2 bob 1 Table plus aggregation Storage Layer (Brokers) Topics vs. Streams and Tables @miguno
  • 31. 31 Kafka Processing Data is processed per-partition 31 ... ... ... ... P1 P2 P3 P4 Storage Processing read via network Topic App Instance 1 Application App Instance 2 ‘payments’ with consumer group ‘fraud-detection-app’ @miguno
  • 32. 32 Kafka Processing Stream tasks are the unit of parallelism 32 ... ... ... ... P1 P2 P3 P4 Storage Processing State Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 read via network Application Instance 1Topic Application Instance 1 Application Instance 2 @miguno
  • 33. 33 Streams and Tables are partitioned, too 33 ... ... ... ... P1 P2 P3 P4 Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 TABLE / KTable 2 GB 3 GB 5 GB 2 GB Application Instance 1 Application Instance 2
  • 34. 34 Global Tables give complete data to every task Great for joins without re-keying the input or to broadcast info 34 ... ... ... ... P1 P2 P3 P4 Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 GlobalKTable 2 + 3 + 5 + 2 = 12 GB 12 GB 12 GB 12 GB @miguno
  • 35. 3535 @miguno@miguno Concept Schema Partitioned Unbounded Ordering Guarantee*** Mutable Single Value Per Key Fault Tolerant Storage Layer Topic No* (raw bytes) Yes Yes Yes No No Yes Processing Layer Stream Yes Yes Yes Yes No No Yes Table Yes Yes No** No Yes Yes Yes Global Table Yes No No** No Yes Yes Yes *When using Schema Registry with Avro, then a schema is technically available for the data in a topic. **Generally speaking the answer is Yes but, in practice, tables are almost always bounded due to finite key space. ***Ordering per partition, not across the multiple partitions in a topic. Topics vs. Streams and Tables
  • 36. 3636 Elasticity, Fault Tolerance, and Other Advanced Concepts
  • 37. 37 Tables are materialized on local disk so that processing is fast, and tables can be larger than RAM. Enables large-scale state. Saves $$$ on cloud instances. 37 But machines and containers can be lost! How can we make this fault-tolerant? @miguno Stream Task 1 Materialized on local disk under /var (via RocksDB by default) Global Table Table
  • 38. 38 streaming ‘restore’ via network Stream Task 1 On machine B Tables and other state are made fault-tolerant by storing their data durably, remotely in Kafka topics 38 streaming ‘backup’ via network table’s changelog topic (log-compacted) Kafka Storage ksqlDB, Kafka Streams On machine A Stream Task 1
  • 39. 39 Standby Replicas speed up application recovery App instances can optionally maintain copies of another instance’s table/state data to minimize failover times 39 @miguno num.standby.replicas = 1 Stream Task 1 Stream Task 2 App Instance 1 App Instance 2 num.standby.replicas = 0 (default) Stream Task 1 Stream Task 2 Stream Task 2 failover
  • 40. 40 Elastic scaling Stream tasks are migrated, including tables/state (via Kafka) 40 @miguno App Instance 1 App Instance 2 Stream Task 1 Stream Task 3 Stream Task 2 Stream Task 4 App Instance 3 App Instance 4 Stream Task 2 Stream Task 4
  • 41. 41 bob Zurich alice Rome bob Zurich bob Sydney alice Romealice Bern alice Paris Tables <-> topic log-compaction A table’s underlying topic is compacted by default to save Kafka storage space to speed up failover and recovery for processing 41 Note: Compaction intentionally removes part of a table’s history. If you need the full history and don’t have the historic data elsewhere, consider disabling compaction. @miguno
  • 42. 4242 @miguno TL;DR for Log Compaction Have a Stream? → Disable log compaction for its topic (= default) Have a Table? → Enable log compaction for its topic (= default) Disable only when needed, see previous slide
  • 43. 43 Max processing parallelism = #input partitions 43 ... ... ... ... P1 P2 P3 P4 Topic Application Instance 1 Application Instance 2 Application Instance 3 Application Instance 4 Application Instance 5 *** idle *** Application Instance 6 *** idle *** → Need higher parallelism? Increase the original topic’s partition count. → Higher parallelism for just one use case? Derive a new topic from the original with higher partition count. Lower its retention to save storage. @miguno
  • 44. 44 How to increase number of partitions when needed ksqlDB example: statement below creates a new stream with the desired number of partitions. 44 CREATE STREAM products_repartitioned WITH (PARTITIONS=30) AS SELECT * FROM products EMIT CHANGES @miguno
  • 45. 45 ‘Hot’ partitions can be problematic, often caused by 1. Events not evenly distributed across partitions 2. Events evenly distributed but certain events take longer to process 45 Strategies to address hot partitions include 1a. Ingress: Find better partitioning function ƒ(event.key) for producers 1b. Storage: Re-partition data into new topic if you can’t change the original 2. Scale processing vertically, e.g. more powerful CPU instances ... ... ... ... P1 P2 P3 P4 @miguno
  • 46. 46 How to re-partition your data when needed ksqlDB example: statement below creates a new stream with changed number of partitions and a new field as event.key (so that its data is now correctly co-partitioned for joining) 46 CREATE STREAM products_repartitioned WITH (PARTITIONS=42) AS SELECT * FROM products PARTITION BY product_id EMIT CHANGES @miguno
  • 47. 47 Capacity Planning and Sizing Sorry, not covered here! I recommend: ksqlDB Performance Tuning for Fun and Profit, by Nick Dearden https://kafka-summit.org/sessions/ksql-performance-tuning-fun-profit/ ksqlDB documentation https://docs.ksqldb.io/en/latest/operate-and-deploy/capacity-planning/ Kafka Streams documentation https://docs.confluent.io/current/streams/sizing.html 47 @miguno
  • 48. 4848 Where to go from here? My blog series on Kafka Fundamentals to learn more https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming Confluent Cloud, the fastest & easiest way to get started with Kafka https://www.confluent.io/confluent-cloud/ Confluent Platform, if you want to operate Kafka yourself https://www.confluent.io/product/confluent-platform/ 48 @miguno