Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022

Unbundling the Modern Streaming Stack
Dunith Dhanushka - 05/10/2022
Navigating the Real-time Analytics Landscape

About Me
twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/
• I’m Dunith Dhanushka
• Big data solution architect -> DevRel
• Blogs at eventdrivenutopia.com

Background
• This talk is based on my blog post published in April 2022.
• It has been updated with a few new things since then.
• Enjoy!

Goal of the Talk
What Are We Going To Talk About Today?
Introduce you to the things
required to build real-time applications
that harness value from streaming data

The Plan
The Order of Things
1. A refresher on streaming data
2. The classic streaming stack
3. The modern streaming stack
4. Current trends and the future outlook

Streaming Data
What Is a stream?
A stream is a continuous, never-ending flow of data with no beginning or end. The data is made available incrementally over time, so you can act on it without having to download it first.
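A Python generator is a handy way to picture this. In the minimal sketch below, the hypothetical `sensor_readings` source never terminates, and `running_average` acts on each value as it arrives instead of waiting for a complete data set. The function names and the random readings are illustrative assumptions, not part of any real system.

```python
import itertools
import random
from typing import Iterator


def sensor_readings(seed: int = 42) -> Iterator[float]:
    """Hypothetical unbounded source: yields one reading at a time, forever."""
    rng = random.Random(seed)
    while True:  # no natural end: the stream is never "complete"
        yield round(rng.uniform(20.0, 30.0), 2)


def running_average(stream: Iterator[float]) -> Iterator[float]:
    """Emits an updated result for every element as it arrives."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count


# We can only ever consume a finite prefix; the stream itself never ends.
first_five = list(itertools.islice(running_average(sensor_readings()), 5))
print(first_five)
```

Both functions are lazy, so nothing is computed until a consumer pulls the next value.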

Events
Streams are made of events
A data stream consists of a series of data points ordered in time.
Each data point represents an “event” or a change in the state of the
business.
[Figure: events T0 to T4 emitted by an event source and flowing along an event stream over time]

Event First Thinking
Modelling State Changes in Systems
A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 in Austin, TX.
Fact | Value
User ID | 1234
Item ID | 567
Price Paid | $3.99
Date | 2022/06/12
Place | Austin, TX
• Events represent facts.
• Events are immutable.
• Events belong to the past.
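Here is a minimal sketch of the purchase event above as an immutable Python record. `ItemPurchased` is a hypothetical name chosen for illustration; `frozen=True` makes any attempt to change a field after creation raise an error, mirroring the idea that facts about the past are not rewritten.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ItemPurchased:
    """Hypothetical event type: a fact about something that already happened."""
    user_id: int
    item_id: int
    price_paid: Decimal
    purchase_date: date
    place: str


event = ItemPurchased(
    user_id=1234,
    item_id=567,
    price_paid=Decimal("3.99"),
    purchase_date=date(2022, 6, 12),
    place="Austin, TX",
)

try:
    event.price_paid = Decimal("0.99")  # attempting to rewrite history
except FrozenInstanceError:
    print("events are immutable")
```

A correction would be modelled as a new event (for example, a refund) rather than an edit to the original one.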

Making Sense of Streaming Data

Events Have A Shelf Life
Act Fast Before You Lose Their Value
Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png

Real-time Analytics
Extracting Value From Events As Soon as They Are Made Available
[Figure: streams of events feed real-time analytics, which produces insights you can react to]

A streaming stack is the processes, tools, and technologies
you use to derive insights from unbounded data.

The Beginning
• Real-time analytics dates back decades; it existed in the form of Complex Event Processing (CEP) and Event Stream Processing (ESP).
• Most of the work was academic, but a few vendors, such as Progress Apama, Esper, TIBCO, and StreamBase, tried to bring it to the mass market.

Lambda Architecture
Promotes a Unified Serving Layer
Image credit - https://www.databricks.com/glossary/lambda-architecture

• Overly complicated technology: requires a specialised skill set in distributed systems and performance engineering.
• Limited to the JVM: non-JVM developers had no option other than adapting.
• Higher infrastructure footprint: stream processors are heavy on CPU and RAM.
• Maintenance overhead: both the speed and batch layers have to be maintained.
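To illustrate where that maintenance overhead comes from, here is a toy Python sketch of how a Lambda-style serving layer answers a query by merging a periodically recomputed batch view with an incrementally updated speed view. The page-view counts and function names are hypothetical; in practice each layer is a separate system with its own copy of the processing logic.

```python
from collections import Counter

# Batch layer: view recomputed periodically from the full master dataset.
# (Hypothetical page-view counts up to the last batch run.)
batch_view = Counter({"home": 10_000, "pricing": 2_500})

# Speed layer: incremental view covering only events since the last batch run.
speed_view: Counter = Counter()


def on_event(page: str) -> None:
    """Speed layer updates its view as each new event arrives."""
    speed_view[page] += 1


def query(page: str) -> int:
    """Serving layer merges both views to return complete, fresh results."""
    return batch_view[page] + speed_view[page]


for page in ["home", "pricing", "home", "docs"]:
    on_event(page)

print(query("home"), query("docs"))  # 10002 1
```

Every time the batch job reruns, the speed view has to be trimmed to the events the batch has not yet covered, and the counting logic lives in two places.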

Modern Streaming Stack
Modern Cloud-native tools
Managed and Serverless platforms
Rich tooling and developer experience
Expressive programming model

The Modern Streaming Stack (MSS) is the classic streaming stack reimagined with self-service, cloud-native tools, providing a simplified yet powerful developer experience for building real-time analytics applications.

Modern Streaming Stack
[Figure: Modern Streaming Stack components: event producers, streaming data platform, tiered storage, stream processing, serving layer, and data API, metadata & governance; outputs go to data-driven applications, operational systems, and real-time analytics]

Event Production/Enablement
The Origins of Events
[Figure: events flowing into the streaming data platform through language-specific SDK clients]

Streaming Data Platform
• Ingests events from sources in a scalable manner and stores them durably until they are processed.
• Is based on an immutable, distributed log file. Events are appended to the log, which is partitioned across multiple servers for scalability and replicated for durability.
[Figure: event producers writing events to multiple topics on the streaming data platform]
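The append-only, partitioned log can be sketched in a few lines of Python. This is a toy model under stated assumptions: `Topic` is a hypothetical class, `zlib.crc32` stands in for whatever partitioner a real platform uses, and replication and persistence are left out entirely.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy topic: a fixed number of append-only partitions (no replication)."""
    num_partitions: int
    partitions: list[list[tuple[str, str]]] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Appends an event and returns its (partition, offset)."""
        # Same key -> same partition, so events for one key stay in order.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))  # existing entries are never modified
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> list[tuple[str, str]]:
        """Consumers read sequentially from an offset they track themselves."""
        return self.partitions[partition][offset:]


orders = Topic(num_partitions=3)
p1, o1 = orders.append("user-1234", "purchased item 567")
p2, o2 = orders.append("user-1234", "purchased item 890")
print(p1 == p2, o1, o2)  # True 0 1
print(orders.read(p1, 0))
```

Because the log is never rewritten, several independent consumers can read the same partition at different offsets without interfering with each other.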

Stream Processing
Streaming ETL
• Stream joins for enrichment
• Filtering/routing/transforming streams
• Data integration
• Repartitioning streams (re-keying)
Streaming Analytics
• Stateful aggregations
• Window operations
• Materialising streams, stream-table duality
Event-driven Microservices
• Actors
• Reactive logic execution
• Event-by-event processing, triggering side effects
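Window operations and stateful aggregations can be illustrated with a tumbling-window count. The sketch below assumes events arrive in timestamp order as `(timestamp_seconds, key)` pairs; real stream processors also handle late and out-of-order events, which this toy version does not. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_size: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Counts events per key in fixed, non-overlapping windows.

    Assumes in-order timestamps. Emits (window_start, counts) whenever a window closes.
    """
    current_start = None
    counts: dict[str, int] = defaultdict(int)  # per-window state
    for ts, key in events:
        window_start = ts - ts % window_size
        if current_start is not None and window_start != current_start:
            yield current_start, dict(counts)  # window closed: emit, then reset state
            counts = defaultdict(int)
        current_start = window_start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last window (finite input only)


clicks = [(0, "home"), (12, "pricing"), (59, "home"), (61, "home"), (130, "docs")]
results = list(tumbling_window_counts(clicks, window_size=60))
for start, window in results:
    print(start, window)
# 0 {'home': 2, 'pricing': 1}
# 60 {'home': 1}
# 120 {'docs': 1}
```

Only the counts for the currently open window are kept in memory, which is what makes this kind of aggregation feasible on an unbounded stream.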

[Figure: stream processing consumes from an input topic and produces to an output topic on the event streaming platform; the serving layer ingests the resulting events and serves real-time insights for internal/user-facing analytics, data applications, recommendations, and ad-hoc exploration]

Serving Layer
Expectations
• Serve queries with sub-second latency to provide a better user experience.
• Support a throughput of hundreds of thousands of queries per second to
serve an Internet-scale user base.
• Ensure data freshness: serve analytics from data ingested only a few seconds ago.
• Run complex OLAP queries, supporting joins, aggregations, and filtering on large data sets.

Serving Layer
Technology Choices
• Key-value stores, NoSQL databases
• Real-time OLAP databases

Tiered Storage
[Figure: the streaming data platform and serving layer, with new events on the platform and older events in tiered storage]
Offline Use Cases
• Backfilling
• Hydrating new applications
• Experimentation (ad-hoc querying)
• Archival/regulatory compliance
• Training ML models
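A rough sketch of the tiering idea, assuming a log made of fixed-size segments where only the newest few stay on local storage and older ones are moved to a cheaper remote tier. `TieredLog` and its parameters are hypothetical; the point is that a backfill can still replay the full history across both tiers.

```python
from dataclasses import dataclass, field


@dataclass
class TieredLog:
    """Toy log with a hot (local) tier and a cold (remote) tier.

    Assumes local_segments >= 1 so the active segment always stays local.
    """
    segment_size: int
    local_segments: int
    local: list[list[str]] = field(default_factory=lambda: [[]])
    remote: list[list[str]] = field(default_factory=list)

    def append(self, event: str) -> None:
        if len(self.local[-1]) == self.segment_size:
            self.local.append([])  # roll a new active segment
        self.local[-1].append(event)
        while len(self.local) > self.local_segments:
            self.remote.append(self.local.pop(0))  # offload the oldest segment

    def read_all(self) -> list[str]:
        """Backfill/replay: older events from the remote tier, then newer local ones."""
        return [e for seg in self.remote + self.local for e in seg]


log = TieredLog(segment_size=2, local_segments=2)
for i in range(7):
    log.append(f"event-{i}")
print(log.remote)  # [['event-0', 'event-1'], ['event-2', 'event-3']]
print(log.local)   # [['event-4', 'event-5'], ['event-6']]
```

Real-time consumers mostly touch the local segments, while offline use cases such as backfilling or model training read through to the remote tier.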

Data APIs, Metadata, and Governance

Analytics must be democratised
and accessible across the board…
Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/

Event Mesh
[Figure: an event mesh in which an event catalog, schema registry, streaming API, and GraphQL API sit on top of the event streaming platform, stream processor, and serving layer, delivering real-time insights to decision makers, data applications, regulatory bodies, and business partners]

Technology Choices
• Standards
• Schema registries
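As an example of what a schema registry enforces, here is a heavily simplified backward-compatibility check in Python. The schema representation and rules are assumptions loosely modelled on Avro-style evolution (a newly added field needs a default; existing fields keep their type); real registries support more formats and more nuanced rules, such as type promotions.

```python
# Hypothetical, simplified schema model: {field_name: (type_name, has_default)}.
Schema = dict[str, tuple[str, bool]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Can a consumer using `new` still read events written with `old`?

    Simplified rules (assumption): shared fields must keep their type, and any
    field added in `new` needs a default to fill in for old events. Removing a
    field is allowed because the new reader simply ignores it.
    """
    for name, (type_name, has_default) in new.items():
        if name in old:
            if old[name][0] != type_name:
                return False  # a type change breaks deserialisation
        elif not has_default:
            return False  # old events carry no value for this new field
    return True


v1: Schema = {"user_id": ("int", False), "item_id": ("int", False)}
v2_ok: Schema = {**v1, "place": ("string", True)}     # new field with a default
v2_bad: Schema = {**v1, "coupon": ("string", False)}  # new required field
print(is_backward_compatible(v1, v2_ok), is_backward_compatible(v1, v2_bad))  # True False
```

Running a check like this before a new schema version is registered is what lets producers and consumers evolve independently.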

Convergence of Stream Processing and Serving Layer
Streaming databases take stateful stream processing to the next level.
• SaaS offerings
• Integrated serving layer
• Write logic with SQL
• Pluggable integrations
• Affordable
• Developer friendly
• Pay-as-you-go
• Fewer components to manage
• Integrated tooling
• Caters to non-JVM developers
• Self-serve

Rise of The Lakehouse Architecture
A Lakehouse combines a data warehouse, a data lake, and an event streaming platform.
• High-throughput streaming ingestion
• Change Data Capture
• Upserts
• Transactions
• Table formats
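Change Data Capture and upserts come together when a change stream is materialised into a table keyed by primary key, which is also the essence of the stream-table duality. The sketch below assumes a hypothetical `(operation, key, row)` event shape; lakehouse table formats perform the same logical operation at scale, with transactional commits.

```python
from typing import Any, Iterable, Optional

# Hypothetical change-event shape: (operation, primary_key, row); row is None for deletes.
ChangeEvent = tuple[str, int, Optional[dict[str, Any]]]


def apply_changes(table: dict[int, dict[str, Any]], changes: Iterable[ChangeEvent]) -> None:
    """Materialises a CDC stream into a table keyed by primary key (upsert semantics)."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            table[key] = row  # upsert: insert if new, overwrite if the key exists
        elif op == "delete":
            table.pop(key, None)  # tolerate deletes for keys never seen
        else:
            raise ValueError(f"unknown operation: {op}")


products: dict[int, dict[str, Any]] = {}
apply_changes(products, [
    ("insert", 567, {"name": "widget", "price": "3.99"}),
    ("insert", 890, {"name": "gadget", "price": "9.99"}),
    ("update", 567, {"name": "widget", "price": "2.99"}),
    ("delete", 890, None),
])
print(products)  # {567: {'name': 'widget', 'price': '2.99'}}
```

Replaying the same change stream from the beginning always rebuilds the same table, which is why the stream can be treated as the source of truth.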

Takeaways
There’s No Silver Bullet
• Start small, build the critical path, and iterate.
• Pick components based on the need and know their limitations.
• Experiment, fail fast, and fail cheap.
• Go for managed services if the team is small and new to streaming technologies.
• Learn from mistakes, establish processes, and share wisdom!

Thank you!
Find me at:
twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/
