Kafka Streams
When streams meet tables
Cristiano Altmann
Software Architect
https://www.linkedin.com/in/crisaltmann/
Matheus Alagia
Computer Engineering
https://www.linkedin.com/in/matheusalagia/
https://ubots.com.br
“Data models are perhaps the most important
part of developing software, because they have
such a profound effect: not only on how the
software is written, but also on how we think
about the problem that we are solving.”
- Martin Kleppmann
Designing Data-Intensive Applications
Microservices
Independent deploy
Low coupling
Horizontal Scalability
Technological choices
Business time
Independent teams
Technological evolution
Resilient
Increasingly, we build ecosystems
Microservices are really just distributed systems!
The Hardest Part About Microservices:
Your Data
Common Patterns
● Shared database
● Database per service
● Event driven approaches
The beginning... Shared Database
serviceX serviceY
Database per service
serviceX serviceY
Event-driven architecture
S1 S2 S3 S4
Message Broker
What is Stream Processing?
“Is some kind of computation over a Data Stream. First and foremost, a data
stream is an abstraction representing an unbounded dataset. Unbounded means
infinite and ever growing.”
Kafka: The Definitive Guide
Stream processing is a programming paradigm...
Request-Response
Batch Processing
Stream Processing
Throughput
Latency
Stream-Processing Concepts
Time
● Event time
● Log append time
● Processing time
State
● Local state
● External state
Time Windows
● Sliding window
● Tumbling window
● Hopping window
Calculation of the time window
Size
Advance interval
How long the window remains updatable
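A minimal sketch of the three parameters above, in plain Python rather than the Kafka Streams API. The names `windows_for`, `accepts_update`, and `grace_ms` are hypothetical, and windows are assumed to be aligned to timestamp 0. When the advance interval equals the size the windows are tumbling; when it is smaller they overlap (hopping).

```python
def windows_for(timestamp_ms, size_ms, advance_ms):
    """Return every [start, end) window that contains timestamp_ms.

    Assumption: window starts are multiples of advance_ms, beginning at 0.
    """
    if not 0 < advance_ms <= size_ms:
        raise ValueError("advance interval must be in (0, size]")
    windows = []
    # Latest aligned window start at or before the event timestamp.
    start = timestamp_ms - (timestamp_ms % advance_ms)
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


def accepts_update(window_end_ms, stream_time_ms, grace_ms):
    """Assumed rule: a window stays updatable until its end plus a grace period."""
    return stream_time_ms < window_end_ms + grace_ms


# Tumbling: size == advance, so each event lands in exactly one window.
print(windows_for(7000, size_ms=5000, advance_ms=5000))
# Hopping: advance < size, so each event lands in several overlapping windows.
print(windows_for(7000, size_ms=5000, advance_ms=1000))
```

With a 5-second size and a 1-second advance, one event belongs to five windows, which is why the advance interval directly drives how much state is kept.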
The world always changes, and sometimes we are interested
in the events that caused those changes, whereas other
times we are interested in the current state of the world...
Stream-Table Duality
Systems that allow you to transition back and forth between
the two ways of looking at data are more powerful than
systems that support just one.
- Neha Narkhede (Kafka: The Definitive Guide)
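The duality can be illustrated with a small plain-Python sketch (not Kafka Streams code; `Table` and `replay` are hypothetical names). Every update to a table emits a change event, and replaying those events rebuilds the same table.

```python
class Table:
    """A table that records every change it receives as a stream of events."""

    def __init__(self):
        self.rows = {}        # current state of the world
        self.changelog = []   # the events that caused that state

    def put(self, key, value):
        self.rows[key] = value
        self.changelog.append((key, value))


def replay(changelog):
    """Table from stream: apply each change in order, latest value per key wins."""
    rows = {}
    for key, value in changelog:
        rows[key] = value
    return rows


stock = Table()
stock.put("alice", 1)
stock.put("bob", 5)
stock.put("alice", 3)

# Stream view: all three changes. Table view: only the latest value per key.
print(stock.changelog)
print(replay(stock.changelog) == stock.rows)
```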
Kafka vs Kafka Streams
Kafka
● Distributed log
● Highly available
● ⅓ of the Fortune 500
● APIs:
○ Producer
○ Consumer
○ Connect
○ Streams
Kafka Streams
● Part of the Kafka ecosystem
● Just a lib
● Simple API
● DSL
Stream-Processing Design Patterns
Single-Event Processing
FILTER
Code
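As a hedged illustration of single-event processing, here is a plain-Python sketch (not the Kafka Streams DSL). The function name `filter_stream` and the order fields are assumptions made for the example. Each event is handled on its own, so no state is kept between events.

```python
def filter_stream(events, predicate):
    """Yield only the events that satisfy the predicate, one at a time."""
    for event in events:
        if predicate(event):
            yield event


# Hypothetical order events; the field names are illustrative only.
orders = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 120},
]

large_orders = filter_stream(orders, lambda order: order["amount"] > 100)
print([order["id"] for order in large_orders])
```

Because the generator is lazy, it also works on an unbounded iterator of events.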
External Lookup: Stream-Table Join
Processing with Local State
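A minimal sketch of processing with local state, in plain Python with a hypothetical `running_counts` function. The dictionary stands in for a local state store; a real deployment would also need that state to be persisted and restored, which this sketch leaves out.

```python
from collections import defaultdict


def running_counts(keys):
    """Emit (key, count so far) for every incoming event key."""
    counts = defaultdict(int)  # local state, kept inside the processor
    for key in keys:
        counts[key] += 1
        yield key, counts[key]


print(list(running_counts(["a", "b", "a"])))
```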
But sometimes we need state...
Moving the state to a database just pushes the
problem to another layer.
Mantra: Stateless
Service
Client
STATELESS
STATE STORAGE
??
?
Stateless is good!
● Services start instantly
● Thread safe
● Scale out linearly
● No shared state
Stream
Table
Stream
Join
● Stream - Stream
● Stream - Table
● Table - Table
Stream - Stream (Windowed-join)
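A rough plain-Python sketch of a windowed stream-stream join (not Kafka Streams code). The name `windowed_join` and the record layout `(timestamp_ms, key, value)` are assumptions. Two events match when they share a key and their timestamps are at most `window_ms` apart. The nested loop is for clarity only; a real engine would buffer each side in a window store.

```python
def windowed_join(left, right, window_ms):
    """Inner-join two lists of (timestamp_ms, key, value) records."""
    joined = []
    for left_ts, left_key, left_value in left:
        for right_ts, right_key, right_value in right:
            same_key = left_key == right_key
            close_in_time = abs(left_ts - right_ts) <= window_ms
            if same_key and close_in_time:
                joined.append((left_key, left_value, right_value))
    return joined


# Hypothetical data: searches joined with clicks made within one second.
searches = [(1000, "u1", "search:kafka"), (2000, "u2", "search:flink")]
clicks = [(1500, "u1", "click:doc1"), (9000, "u2", "click:doc2")]
print(windowed_join(searches, clicks, window_ms=1000))
```

The click from `u2` arrives seven seconds after the search, so it falls outside the window and produces no output.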
Stream - Table
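A plain-Python sketch of a stream-table join (not Kafka Streams code). The name `stream_table_join` and the `("table" | "stream", key, value)` record layout are assumptions. Table records update a local lookup; each stream event is enriched with the table value current at that moment, and events with no match are dropped (inner join).

```python
def stream_table_join(records):
    """Join stream events against the latest table value for the same key."""
    table = {}  # local copy of the table, built from its updates
    for source, key, value in records:
        if source == "table":
            table[key] = value            # table update: remember latest value
        elif key in table:
            yield key, value, table[key]  # stream event: enrich and emit


# Hypothetical interleaving of profile updates and user activity.
records = [
    ("table", "u1", "silver"),
    ("stream", "u1", "click"),
    ("stream", "u2", "click"),     # no profile for u2 yet, so it is dropped
    ("table", "u1", "gold"),
    ("stream", "u1", "purchase"),
]
print(list(stream_table_join(records)))
```

The lookup happens in local memory, so no external call is made per event.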
Stream-Processing Landscape
Thank you!
Questions?
