Unbundling the Modern Streaming Stack
Dunith Dhanushka - 05/10/2022
Navigating the Real-time Analytics Landscape
About Me
twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/
• I’m Dunith Dhanushka
• Big data solution architect -> DevRel
• Blogs at eventdrivenutopia.com
Background
• This talk is based on a blog post I published in April 2022.
• It has been updated with a few new things since then.
• Enjoy!
Goal of the Talk
What Are We Going To Talk About Today?
Introduce you to the things
required to build real-time applications
that harness value from streaming data
The Plan
The Order of Things
1. A refresher on streaming data
2. The classic streaming stack
3. The modern streaming stack
4. Current trends and the future outlook
What is a Streaming Stack?
Streaming Data
What Is a stream?
A stream is a continuous, never-ending flow of data with no beginning or end.
The data is made available incrementally over time, so you can act on it
without having to download it first.
Events
Streams are made of events
A data stream consists of a series of data points ordered in time.
Each data point represents an “event” or a change in the state of the
business.
[Diagram: an event source emits events T0 through T4 along a time axis, forming an event stream]
Events
Event First Thinking
Modelling State Changes in Systems
A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 at Austin, TX
Fact        Value
User ID     1234
Item ID     567
Price Paid  $3.99
Date        2022/06/12
Place       Austin, TX
• Events represent facts.
• Events are immutable.
• Events belong to the past.
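The purchase event above can be modelled as an immutable record. The following is a minimal sketch, not taken from the talk: the class name `PurchaseEvent` and its field names are hypothetical, and a frozen dataclass is only one of several ways to enforce immutability in Python.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen=True makes instances immutable
class PurchaseEvent:
    """A fact about something that already happened (hypothetical schema)."""
    user_id: int
    item_id: int
    price_paid: Decimal
    occurred_on: date
    place: str


event = PurchaseEvent(
    user_id=1234,
    item_id=567,
    price_paid=Decimal("3.99"),
    occurred_on=date(2022, 6, 12),
    place="Austin, TX",
)

# Events are immutable: an attempt to rewrite the past raises an error.
try:
    event.price_paid = Decimal("0.99")
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A correction would be expressed as a new event (for example, a refund) rather than as a change to this one.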
Making Sense of Streaming Data
Events Have A Shelf Life
Act Fast Before You Lose Their Value
Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
Real-time Analytics
Extracting Value From Events As Soon as They Are Made Available
[Diagram: streams of events feed real-time analytics, which produces insights to react to]
What is a Streaming Stack?
A streaming stack is the processes, tools, and technologies
you use to derive insights from unbounded data.
The Classic Streaming Stack
The Beginning
• Real-time analytics dates back decades, existing in the forms of
Complex Event Processing (CEP) and Event Stream Processing (ESP).
• Most of the work was academic, but a few vendors such as Progress
Apama, Esper, Tibco, and Streambase tried bringing it to the mass
market.
Then Came Big Data…
Lambda Architecture
Promotes a Unified Serving Layer
Image credit - https://www.databricks.com/glossary/lambda-architecture
Why Didn’t It Pick Up?
• Overly complicated technology: Required a specialised skillset in
distributed systems and performance engineering.
• Limited to the JVM: Non-JVM developers had no option but to adapt.
• Higher infrastructure footprint: Stream processors are heavy on
CPU and RAM.
• Maintenance overhead: Both the speed and batch layers had to be
maintained.
The Modern Streaming Stack
Modern Streaming Stack
Modern Cloud-native tools
Managed and Serverless platforms
Rich tooling and developer experience
Expressive programming model
The Modern Streaming Stack (MSS) is the classic streaming stack reimagined with
self-service cloud-native tools,
providing a simplified yet powerful developer experience
for building real-time analytics applications.
Modern Streaming Stack
[Diagram of the stack's components: event producers; streaming data platform; stream processing; tiered storage; serving layer; data API, metadata and governance; data-driven applications; operational systems; real-time analytics]
The Unbundling
Event Production/Enablement
The Origins of Events
STREAMING DATA
PLATFORM
Language-Specific SDK Clients
Streaming Data Platform
• Ingest events from sources in a
scalable manner, and store
them durably until they are
processed.
• Based on an immutable, distributed log file. Events are
appended to the log and partitioned across multiple
servers for durability and scalability.
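The append-only, partitioned log described above can be sketched in a few lines. This is an in-memory toy under stated assumptions, not how any particular platform implements it: the class `PartitionedLog`, its method names, and the CRC32-based partitioner are all hypothetical, and replication across servers (the durability part) is not modelled.

```python
import zlib
from collections import defaultdict
from typing import List, Tuple


class PartitionedLog:
    """Toy stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions: int = 3):
        self.num_partitions = num_partitions
        self._partitions = defaultdict(list)

    def append(self, key: str, value: dict) -> Tuple[int, int]:
        """Append an event and return its (partition, offset)."""
        # A stable hash sends every event with the same key to the same
        # partition, which preserves per-key ordering.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self._partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, from_offset: int = 0) -> List[tuple]:
        """Read events from an offset onward; reading never removes them."""
        return list(self._partitions[partition][from_offset:])


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user-1234", {"item_id": 567})
p2, o2 = log.append("user-1234", {"item_id": 890})
first_read = log.read(p1)
second_read = log.read(p1)
```

Because reads do not consume events, several independent consumers can replay the same partition from whichever offset they need.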
[Diagram: event producers write to topics on the streaming data platform]
Technology Choices
Stream Processors
STREAM PROCESSING
Streaming ETL
• Stream joins for enrichment
• Filtering/routing/transforming streams
• Data integration
• Repartitioning streams (re-keying)
Streaming Analytics
• Stateful aggregations
• Window operations
• Materialising streams, stream-table duality
Event-driven Microservices
• Actors
• Reactive logic execution
• Event-by-event processing, triggering side effects
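To make "stateful aggregations" and "window operations" concrete, here is a minimal sketch of a tumbling-window count. It is plain Python rather than any stream processor's API; the function name `tumbling_window_counts` and the sample click data are hypothetical. It also assumes a finite, in-order list of events, whereas a real stream processor works on unbounded input and has to deal with late and out-of-order events.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()  # the aggregation state carried across events
    for timestamp, key in events:
        # Align each timestamp to the start of the window containing it.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "home"), (12, "home"), (59, "cart"), (61, "home"), (130, "cart")]
result = tumbling_window_counts(clicks, window_seconds=60)
```

The resulting dictionary is a small instance of materialising a stream into a table: each incoming event updates one row keyed by window and page.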
Technology Choices
Serving Layer
[Diagram: stream processing reads from an input topic and writes to an output topic on the event streaming platform]
Serving Layer
[Diagram: events enter the serving layer through streaming ingestion; real-time insights are consumed by internal/user-facing analytics, data applications, recommendation, and ad-hoc exploration]
Serving Layer
Expectations
• Serve queries with sub-second latency to provide a better user experience.
• Support a throughput of hundreds of thousands of queries per second to
serve an Internet-scale user base.
• Ensure data freshness — serve analytics from data ingested a few seconds
ago.
• Run complex OLAP queries, supporting joins, aggregations, and filtering on
large data sets.
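The query shape in the last expectation (a join, a filter, and an aggregation) can be illustrated as follows. This is only a sketch: SQLite from Python's standard library stands in for the serving layer and is not a real-time OLAP database, and the `users` and `purchases` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE purchases (user_id INTEGER, item_id INTEGER, price_cents INTEGER);
    INSERT INTO users VALUES (1234, 'Austin'), (5678, 'Dallas');
    INSERT INTO purchases VALUES
        (1234, 567, 399), (1234, 890, 1000), (5678, 567, 399);
""")

# Join purchases to users, filter out tiny orders, aggregate per city.
rows = conn.execute("""
    SELECT u.city, COUNT(*) AS orders, SUM(p.price_cents) AS revenue_cents
    FROM purchases AS p
    JOIN users AS u ON u.user_id = p.user_id
    WHERE p.price_cents > 100
    GROUP BY u.city
    ORDER BY revenue_cents DESC
""").fetchall()
```

What distinguishes a serving layer is running queries of this shape at sub-second latency over freshly ingested data, which this toy does not attempt to show.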
Serving Layer
Technology Choices
Key-value stores, NoSQL databases
Real-time OLAP databases
Tiered Storage
[Diagram: serving layer and streaming data platform; new events stay on the platform while older events move to tiered storage]
Tiered Storage
Offline Use Cases
• Backfilling
• Hydrating new applications
• Experimentation (ad-hoc querying)
• Archival/regulatory compliance
• Training ML models
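A toy model may help show why tiered storage enables these use cases. The sketch below is an assumption-laden illustration, not a description of any product: the class `TieredLog`, the `hot_capacity` parameter, and the two in-memory lists standing in for fast local storage and cheap object storage are all hypothetical.

```python
class TieredLog:
    """Toy log keeping recent events in a hot tier and older ones in a cold tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.cold = []  # stands in for cheap, long-term storage
        self.hot = []   # stands in for fast, expensive local storage

    def append(self, event):
        self.hot.append(event)
        # Offload the oldest events once the hot tier exceeds its capacity.
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))

    def read_from(self, offset):
        """Replay events from a logical offset, spanning both tiers."""
        return (self.cold + self.hot)[offset:]


log = TieredLog(hot_capacity=2)
for n in range(5):
    log.append({"seq": n})

# A new application can hydrate itself by replaying from the beginning.
backfill = log.read_from(0)
```

Readers see one continuous log regardless of which tier holds an event, which is what makes backfilling and hydrating new applications possible without keeping all history on the hot tier.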
Data APIs, Metadata, and Governance
Analytics must be democratised
and accessible across the board…
Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/
Event Mesh
EVENT CATALOG SCHEMA REGISTRY
STREAMING API GRAPHQL API
Serving Layer
STREAM PROCESSOR
EVENT STREAMING
PLATFORM
Decision makers Data applications Regulatory bodies Business partners
Real-time Insights
Technology Choices
Standards Schema Registries
Observations &
Future Outlook
Convergence of Stream Processing and Serving Layer
Streaming databases take stateful stream processing to the next level.
• SaaS offerings
• Integrated serving layer
• Write logic with SQL
• Pluggable integrations
• Affordable
• Developer friendly
• Pay-as-you-go
• Fewer components to manage
• Integrated tooling
• Caters to non-JVM developers
• Self-serve
Rise of The Lakehouse Architecture
A Lakehouse combines a data warehouse, a data lake, and an event streaming platform.
High-throughput
streaming ingestion
Change Data Capture
Upserts
Transactions
Table formats
Takeaways
Takeaways
There’s No Silver Bullet
• Start small, build the critical path, and iterate.
• Pick components based on the need and know their limitations.
• Experiment, fail fast, and fail cheap.
• Go for managed services if the team is small and new to streaming
technologies.
• Learn from mistakes, establish processes, and share wisdom!!
Book Announcement!
Thank you!
Find me at:
twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/

More Related Content

Similar to Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022

Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataStylight
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Avere Systems
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
Vitaliy Bashun
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Maya Lumbroso
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Dataconomy Media
 
The New Model
The New ModelThe New Model
The New Model
David Kaiser
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Memoori
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
Sri Ambati
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Impetus Technologies
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
confluent
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
Guido Schmutz
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
Lars Albertsson
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
SoftServe
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Summit
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DATAVERSITY
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 

Similar to Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022 (20)

Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
The New Model
The New ModelThe New Model
The New Model
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 

Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022

  • 1. Unbundling the Modern Streaming Stack Dunith Dhanushka - 05/10/2022 Navigating the Real-time Analytics Landscape
  • 2. About Me twitter.com/dunithd medium.com/event-driven-utopia linkedin.com/in/dunithd/ • I’m Dunith Dhanushka • Big data solution architect -> DevRel • Blogs at eventdrivenutopia.com
  • 3. Background • This talk is based on my blog that I published in April, 2022. • This talk has been updated with a few new things since then. • Enjoy!
  • 4. Goal of the Talk What Are We Going To Talk About Today? Introduce you to the things required to build real-time applications that harness value from streaming data
  • 5. The Plan The Order of Things 1. A refresher on streaming data 2. The classic streaming stack 3. The modern streaming stack 4. Current trends and the future outlook
  • 6. What is a Streaming Stack?
  • 7. Streaming Data What Is a stream? A stream is a continuous, never-ending data flow with no beginning or end. The data is incrementally made available over time, enabling you to act upon it without needing to be downloaded first.
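To make "act upon it without needing to be downloaded first" concrete, here is a minimal sketch in Python. The `sensor_readings` source and its values are hypothetical stand-ins for a real event source; the point is only that each result is produced as soon as a data point arrives, while the stream itself never ends.

```python
import itertools


def sensor_readings():
    """Hypothetical unbounded source: yields one reading after another, forever."""
    for n in itertools.count():
        yield 20 + (n % 5)


def running_average(stream):
    """Emit an updated average after every data point, without buffering the stream."""
    total = 0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


# The source never ends, so we only look at the first three results.
first_three = list(itertools.islice(running_average(sensor_readings()), 3))
print(first_three)
```

The consumer decides how much of the stream to look at; nothing here waits for the data to be "complete".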
  • 8. Events Streams are made of events A data stream consists of a series of data points ordered in time. Each data point represents an “event” or a change in the state of the business. T4 T3 T2 T1 T0 Event source Event stream Time Events
  • 9. Event First Thinking Modelling State Changes in Systems A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 at Austin, TX Fact Value User ID 1234 Item ID 567 Price Paid $3.99 Date 2022/06/12 Place Austin, TX • Events represent facts. • Events are immutable. • Events belong to the past.
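The purchase on this slide could be modelled as an immutable record. This is a sketch only: the class name `PurchaseEvent`, its field names, and the helper `try_to_rewrite_history` are illustrative assumptions, not something the talk prescribes.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)  # frozen=True makes instances read-only after creation
class PurchaseEvent:
    user_id: int
    item_id: int
    price_paid: Decimal
    purchased_on: date
    place: str


# The fact from the slide, captured once and never changed afterwards.
event = PurchaseEvent(1234, 567, Decimal("3.99"), date(2022, 6, 12), "Austin, TX")


def try_to_rewrite_history(e: PurchaseEvent) -> bool:
    """Return True if the event could be mutated, False if the attempt was rejected."""
    try:
        e.price_paid = Decimal("0.00")
    except FrozenInstanceError:
        return False
    return True


print(try_to_rewrite_history(event))  # the assignment is rejected
```

A correction, such as a refund, would be recorded as a new event rather than as an edit to this one.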
  • 10. Making Sense of Streaming Data
  • 11. Events Have A Shelf Life Act Fast Before You Lose Their Value Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
  • 12. Real-time Analytics Extracting Value From Events As Soon as They Are Made Available REAL-TIME ANALYTICS Insights React Streams of Events
  • 13. What is a Streaming Stack?
  • 14. A streaming stack is the processes, tools, and technologies you use to derive insights from unbounded data.
  • 16. The Beginning • Real-time analytics dates back decades, in the forms of Complex Event Processing (CEP) and Event Stream Processing (ESP). • Most of the work was academic, but a few vendors like Progress Apama, Esper, Tibco, and Streambase tried bringing it to the mass market.
  • 17. Then Came Big Data…
  • 18.
  • 19. Lambda Architecture Promotes A Unified Serving Layer Image credit - https://www.databricks.com/glossary/lambda-architecture
  • 20. Why Didn’t It Pick Up?
  • 21.
  • 22. • Overly complicated technology: Required a specialised skillset in distributed systems and performance engineering. • Limited to the JVM: Non-JVM developers had no option other than to adapt. • Higher infrastructure footprint: Stream processors tax the CPU and RAM heavily. • Maintenance overhead: Having to maintain both speed and batch layers.
  • 24. Modern Streaming Stack Modern Cloud-native tools Managed and Serverless platforms Rich tooling and developer experience Expressive programming model
  • 25. MSS is the classic streaming stack reimagined with self-service cloud-native tools providing a simplified yet powerful developer experience to build real-time analytics applications.
  • 26. Modern Streaming Stack STREAMING DATA PLATFORM STREAM PROCESSING EVENT PRODUCERS TIERED STORAGE DATA API, METADATA & GOVERNANCE Data-driven Applications Operational Systems Real-time Analytics SERVING LAYER
  • 28. Event Production/Enablement The Origins of Events STREAMING DATA PLATFORM Language Specific SDK Clients
  • 30. • Ingest events from sources in a scalable manner, and store them durably until they are processed. • Based on an immutable, distributed log file. Events are appended to the log and partitioned across multiple servers for durability and scalability. EVENT PRODUCERS Streaming Data Platform TOPIC TOPIC TOPIC TOPIC TOPIC TOPIC
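The append-and-partition idea can be sketched with an in-memory toy. The `Topic` class, its method names, and the CRC32-based partitioner are assumptions made for illustration; real streaming data platforms use their own partitioners, persist the log to disk, and replicate partitions across servers.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed set of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int = 3):
        self.name = name
        self.num_partitions = num_partitions
        self._partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # A stable hash sends every event with the same key to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str):
        """Append to the end of the key's partition; return (partition, offset)."""
        partition = self.partition_for(key)
        log = self._partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int):
        """Return all events in a partition from the given offset onwards."""
        return self._partitions[partition][offset:]


topic = Topic("purchases")
p1, o1 = topic.append("user-1234", "bought item 567")
p2, o2 = topic.append("user-1234", "bought item 890")
print(p1 == p2, o1, o2)
```

Because events are only ever appended, a consumer can re-read a partition from any earlier offset and see the same events in the same order.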
  • 33. STREAM PROCESSING Streaming ETL • Stream joins for enrichment • Filtering/routing/transforming streams • Data integration • Repartitioning streams (re-keying) Streaming Analytics • Stateful aggregations • Window operations • Materialising streams, stream-table duality Event-driven Microservices • Actors • Reactive logic execution • Event-by-event processing, triggering side effects
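As an illustration of "stateful aggregations" and "window operations", the sketch below sums amounts per key over fixed, non-overlapping (tumbling) windows. The function name, the event tuple layout, and the sample data are hypothetical; a real stream processor would also handle late or out-of-order events and keep its state in a fault-tolerant store.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds):
    """Sum amounts per (window start, key).

    events: iterable of (timestamp_seconds, key, amount_in_cents) tuples.
    """
    totals = defaultdict(int)  # the state the processor carries between events
    for timestamp, key, amount in events:
        # Every timestamp falls into exactly one window of fixed length.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[(window_start, key)] += amount
    return dict(totals)


events = [
    (0, "user-1", 200),
    (30, "user-1", 300),
    (61, "user-1", 100),
    (65, "user-2", 400),
]
totals = tumbling_window_totals(events, window_seconds=60)
print(totals)
```

The resulting table of totals is a materialised view of the stream, which is the stream-table duality mentioned on the slide.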
  • 36. INPUT TOPIC OUTPUT TOPIC Event Streaming Platform STREAM PROCESSING Serving Layer Events Streaming ingestion Real-time Insights Consumption Internal/user-facing Analytics Data Applications Recommendation Ad-hoc Exploration
  • 37. Serving Layer Expectations • Serve queries with sub-second latency to provide a better user experience. • Support a throughput of hundreds of thousands of queries per second to serve an Internet-scale user base. • Ensure data freshness: serve analytics from data ingested a few seconds ago. • Run complex OLAP queries, supporting joins, aggregations, and filtering on large data sets.
  • 38. Serving Layer Technology Choices Key-value stores, NoSQL databases Real-time OLAP Databases
  • 40. Serving Layer STREAMING DATA PLATFORM New Events Older Events Tiered Storage • Backfilling • Hydrating new applications • Experimentation (ad-hoc querying) • Archival/regulatory compliance • Training ML models Offline Use Cases
  • 41. Data APIs, Metadata, and Governance
  • 42. Analytics must be democratised and accessible across the board… Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.confluent.io/blog/how-to-build-a-data-mesh-using-event-streams/
  • 43. Event Mesh EVENT CATALOG SCHEMA REGISTRY STREAMING API GRAPHQL API Serving Layer STREAM PROCESSOR EVENT STREAMING PLATFORM Decision makers Data applications Regulatory bodies Business partners Real-time Insights
  • 46. Convergence of Stream Processing and Serving Layer Streaming databases take stateful stream processing to the next level. SaaS offerings Integrated serving layer Write logic with SQL Pluggable integrations Affordable Developer friendly Pay-as-you-go Fewer components to manage Integrated tooling Caters to non-JVM developers Self-serve
  • 47. Rise of The Lakehouse Architecture A Lakehouse combines a data warehouse, data lake, and an event streaming platform together. High-throughput streaming ingestion Change Data Capture Upserts Transactions Table formats
  • 49. Takeaways There’s No Silver Bullet • Start small, build the critical path, and iterate. • Pick components based on the need and know their limitations. • Experiment, fail fast, and fail cheap. • Go for managed services if the team is small and new to streaming technologies. • Learn from mistakes, establish processes, and share wisdom!