Event-Driven Architecture Masterclass: Challenges in Stream Processing

A Tale of 3 Pipelines
Brian Taylor

Reference Architectures
● Stream to Stream
   What it is: Webhook system
   How it scales: Money = Performance
● Stream to State
   What it is: ODP “Analytic segments”
   How it scales: Write: Money = Performance; Read: Data-dependency limited
● Stateful Stream to Stream
   What it is: ODP “Real-Time Segments”
   How it scales: Data-dependency limited

Inter-Event Data Dependency
A property of both the stream and the problem. It measures how events impact the processing of the events that follow them.
For example, consider a stream of record mutations that must be applied one after another within a single record id:
● A stream with many records and no sequential mutations for any given record id has no data dependency
● A stream with a single record id and only sequential mutations has maximum data dependency
● A topic with a single partition has maximum data dependency (in some sense)
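One way to make the definition concrete: treat every event that touches the same record id as one data-dependent chain, then look at the chain lengths. This is a minimal sketch, assuming events are `(record_id, mutation)` tuples; the shape and the name `dependency_chains` are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Event = Tuple[Hashable, str]  # hypothetical shape: (record_id, mutation)


def dependency_chains(events: Iterable[Event]) -> Counter:
    """Return the length of each data-dependent chain, keyed by record id.

    Assumes mutations to the same record id must be applied in order,
    while mutations to different record ids are independent.
    """
    return Counter(record_id for record_id, _ in events)


# Many records, one mutation each: every chain has length 1 (no data dependency).
independent = [(f"user-{i}", "set") for i in range(1000)]
# One record, many sequential mutations: a single chain of length 1000.
dependent = [("user-0", "set") for _ in range(1000)]

print(max(dependency_chains(independent).values()))  # 1
print(max(dependency_chains(dependent).values()))    # 1000
```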

Inter-Event Data Dependency
The average length of the data-dependent chains in your stream decides your average throughput at any scale.
This is equivalent to the way “the sequential portion” of a problem constrains the maximum parallel speedup in Amdahl’s Law:
$$S = \frac{1}{p + \frac{1 - p}{s}}$$
S: maximum speedup, s: parallelism, p: “data dependency fraction”. As s grows without bound, S approaches 1/p.
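Amdahl’s Law gives S = 1 / (p + (1 - p) / s). Evaluating it directly shows how quickly the speedup flattens once any data dependency is present. This is a small sketch; the function name `max_speedup` is our own.

```python
def max_speedup(p: float, s: float) -> float:
    """Amdahl's Law: S = 1 / (p + (1 - p) / s).

    p: fraction of the work that is data dependent (must run sequentially)
    s: parallelism (workers, partitions, or shards)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1.0 / (p + (1.0 - p) / s)


# With p = 0.05, adding parallelism never gets past 1 / p = 20.
for p in (0.0, 0.05, 0.5):
    print(p, [round(max_speedup(p, s), 2) for s in (1, 10, 100, 1000)])
```

With p = 0 the speedup equals s; with p = 0.5 it can never reach 2, regardless of how many partitions you pay for.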

Big Idea
● “It’s all about the data-dependency, baby”
● No data dependency: Smooth scaling
● Data dependency: Navigating hell

Reference Architecture
Stream to Stream (Money = Performance)
Webhook system
Diagram components: Subscription information, Change Notifications, Delivery Requests
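A minimal sketch of the stream-to-stream step in a webhook system, assuming each change notification carries a topic and the subscription information is a read-only lookup from topic to subscriber URLs. The types `ChangeNotification` and `DeliveryRequest` and the function `to_delivery_requests` are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class ChangeNotification:  # hypothetical event shape
    topic: str
    payload: str


@dataclass(frozen=True)
class DeliveryRequest:  # hypothetical event shape
    url: str
    payload: str


def to_delivery_requests(
    notification: ChangeNotification,
    subscriptions: Dict[str, List[str]],  # topic -> subscriber webhook URLs
) -> Iterator[DeliveryRequest]:
    """Fan one change notification out into one delivery request per subscriber.

    The output depends only on this notification and the read-only
    subscription table, never on earlier notifications, so there is no
    inter-event data dependency: notifications can be processed in any
    order on any number of workers.
    """
    for url in subscriptions.get(notification.topic, []):
        yield DeliveryRequest(url=url, payload=notification.payload)


subs = {"orders": ["https://a.example/hook", "https://b.example/hook"]}
note = ChangeNotification(topic="orders", payload='{"id": 42}')
for request in to_delivery_requests(note, subs):
    print(request.url)
```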

What you can do with ∅DD
Abstractly
■ Data reshaping
■ Order-independent enrichment
■ Non-self Joins
Concrete Use Cases
■ Adapters
■ Sentiment detectors
■ Geo-IP mappers
■ Redaction
If no external data access is required: Redpanda transforms FTW!
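Redaction is a good example of a ∅DD transform: each output is a pure function of one input event. The sketch below is plain Python, not the Redpanda transforms API, and the email pattern is a deliberately simplistic assumption rather than a complete email matcher.

```python
import re
from typing import Dict

# Simplistic email pattern (an assumption for illustration, not RFC 5322).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def redact(event: Dict[str, str]) -> Dict[str, str]:
    """Replace anything that looks like an email address with a placeholder.

    No state and no external lookups: the result for one event never
    depends on any other event, so order and partitioning do not matter.
    """
    return {key: EMAIL.sub("[REDACTED]", value) for key, value in event.items()}


events = [
    {"id": "1", "note": "contact alice@example.com"},
    {"id": "2", "note": "no personal data"},
]
print([redact(e) for e in events])
```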

Performance Tradespace
More money = More Throughput
Tactics: Add shards and partitions until you have enough capacity
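With no data dependency, capacity planning reduces to arithmetic: divide the target rate by what one partition can handle and round up. A sketch under hypothetical numbers (50,000 events/s target, 4,000 events/s per partition); CRC32 here is only a stand-in for whatever hash your client's partitioner actually uses.

```python
import math
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partitions_needed(target_per_sec: float, per_partition_per_sec: float) -> int:
    """With no data dependency, capacity grows linearly with partitions."""
    return math.ceil(target_per_sec / per_partition_per_sec)


# Hypothetical numbers: 50k events/s target, 4k events/s per partition.
n = partitions_needed(50_000, 4_000)
print(n)  # 13
print(partition_for("user-17", n))
```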

Deferred Inter-Event Data Dependency (DEDD)

Stream to State
Write: Money = Performance
Read: Data-dependency limited
ODP “Analytic segments”
Optimizely Experimentation

What you can do with DEDD
Abstractly
● Use it when write performance is more important than read performance
Concrete Use Cases
● Reporting: Especially when users read less than they write
● Nightly model training
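A minimal in-memory sketch of deferring the data dependency: writes only append, and the per-record ordering is resolved when someone reads. The mutation shape and the `DeferredStore` class are hypothetical; in a real system the log would be a partitioned topic or warehouse table and the read would be a query.

```python
from typing import Dict, List, Tuple

# Hypothetical mutation shape: (record_id, sequence_number, field, value)
Mutation = Tuple[str, int, str, str]


class DeferredStore:
    """Writes append without resolving order; reads resolve it per record."""

    def __init__(self) -> None:
        self._log: List[Mutation] = []

    def write(self, mutation: Mutation) -> None:
        # Append only: no other event is consulted and no ordering is
        # resolved, which is what lets the write side scale with money.
        self._log.append(mutation)

    def read(self, record_id: str) -> Dict[str, str]:
        # The data dependency is paid here: fold this record's mutations
        # in sequence order to reconstruct its current state.
        state: Dict[str, str] = {}
        mine = sorted((m for m in self._log if m[0] == record_id), key=lambda m: m[1])
        for _, _, field, value in mine:
            state[field] = value
        return state


store = DeferredStore()
# Mutations may arrive out of order; the read side sorts them out.
store.write(("r1", 2, "plan", "pro"))
store.write(("r1", 1, "plan", "free"))
store.write(("r2", 1, "plan", "free"))
print(store.read("r1"))  # {'plan': 'pro'}
```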

Write side: More money = More Speed
Read side: Data-dependency limited
Tactics: Reduce data dependency with finer grained partitioning
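Finer-grained partitioning shortens the longest chain that must be processed in order, but only when the problem allows it (here, assuming ordering matters per user rather than per account). The event shape and the helper `longest_chain` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event shape


def longest_chain(events: List[Event], key: Callable[[Event], tuple]) -> int:
    """Longest run of events that must be processed in order under a partition key."""
    return max(Counter(key(e) for e in events).values())


# Hypothetical stream: one busy account with 50 users.
events = [{"account": "acme", "user": f"u{i % 50}"} for i in range(1000)]

coarse = longest_chain(events, key=lambda e: (e["account"],))
fine = longest_chain(events, key=lambda e: (e["account"], e["user"]))
print(coarse, fine)  # 1000 20
```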

Streaming Inter-Event Data Dependency (SEDD)

Stateful Stream to Stream
Data-dependency limited
ODP “Real-Time Segments”

What you can do with SEDD
Abstractly
■ Streaming aggregates
■ Pattern detectors
Concrete Use Cases
■ Segmentation
■ Real time model training
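A minimal sketch of real-time segmentation as a streaming aggregate: a user enters the segment the moment their count of some event reaches a threshold. The `(user_id, event_name)` shape and `realtime_segment` are hypothetical. Each output depends on all earlier events for that user, which is exactly what makes this SEDD.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def realtime_segment(
    events: Iterable[Tuple[str, str]],  # hypothetical shape: (user_id, event_name)
    event_name: str,
    threshold: int,
) -> Iterator[str]:
    """Emit a user id the moment that user's count of `event_name` reaches `threshold`.

    Events for a given user must be applied in order against shared state,
    so throughput is limited by the longest per-user chain.
    """
    counts: Dict[str, int] = defaultdict(int)
    for user_id, name in events:
        if name != event_name:
            continue
        counts[user_id] += 1
        if counts[user_id] == threshold:
            yield user_id


stream = [("a", "purchase"), ("b", "view"), ("a", "purchase"), ("b", "purchase"), ("a", "purchase")]
print(list(realtime_segment(stream, "purchase", threshold=2)))  # ['a']
```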

Throughput: Data-dependency limited
Tactics for reducing data-dependency:
■ Finer grained partitioning
■ Accept eventual consistency with CRDTs
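A grow-only counter (G-Counter) is one of the simplest CRDTs and shows why they help: merges commute, so replicas no longer need to process each other's updates in a single agreed order. This is a textbook sketch, not tied to any particular library.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each replica increments its own slot; merge takes per-replica max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

The price is the one named on the slide: between merges, different replicas can report different values.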

Fundamental Tradeoff
Inter-event data dependency vs. max throughput: the more inter-event data dependency a stream has, the lower its maximum throughput.
If you need SEDD and throughput, welcome to hell.

Query Latency vs. Data Latency
Query Latency: Time it takes to respond to a request
● Driven by: DD work remaining to resolve the request
● Impact: The places where it’s suitable to use your query API
Data Latency: How long it takes for new information to impact a query
● Driven by: How you cheated to hide from your data dependency
● Impact: How actionable the results from your API are

“Cheating” out of Hell
Stream to State
Periodic State to Stream
● What it is: Everyone else’s “Real-Time Segments”
● How it scales: Introduces a data latency / cost tradeoff. Minimum data latency is now data-dependency limited.
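A minimal sketch of periodic state-to-stream: re-run a query each period, diff the new snapshot against the previous one, and emit enter/exit events. The snapshot-as-set representation and the function names are hypothetical. Under these assumptions, data latency is roughly one period plus query time, and shortening the period means paying for more queries.

```python
from typing import Iterable, List, Set, Tuple


def diff_segment(previous: Set[str], current: Set[str]) -> List[Tuple[str, str]]:
    """Turn two consecutive snapshots of a segment into enter/exit events."""
    entered = [(user, "enter") for user in sorted(current - previous)]
    exited = [(user, "exit") for user in sorted(previous - current)]
    return entered + exited


def periodic_state_to_stream(snapshots: Iterable[Set[str]]) -> List[Tuple[str, str]]:
    """Emit changes between consecutive snapshots (e.g. one warehouse query per period)."""
    events: List[Tuple[str, str]] = []
    previous: Set[str] = set()
    for current in snapshots:
        events.extend(diff_segment(previous, current))
        previous = current
    return events


print(periodic_state_to_stream([{"a"}, {"a", "b"}, {"b"}]))
# [('a', 'enter'), ('b', 'enter'), ('a', 'exit')]
```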

Data Dependency Decides Everything
∅DD - Oddly common in example code and marketing materials. Very rarely happens in real life.
DEDD - Practical workaround most of the time. Became truly effective in the last decade as data warehouses have matured.
SEDD - Sounds like “sad” for a reason. A difficult place to be. Hopefully the next decade will bring some meaningful breakthroughs here.

Keep in touch!
Brian Taylor
Director of Engineering
Optimizely
brian.taylor@optimizely.com
@netguy204
