SlideShare a Scribd company logo
A Tale of 3 Pipelines
Brian Taylor
Reference Architectures
2
Stream to Stream
Stream to State
Stateful Stream to
Stream
Write: Money = Performance
Read: Data dependency
limited
Data dependency
limited
Money = Performance
ODP “Analytic
segments”
ODP “Real-Time
Segments”
Webhook system
What it is: How it scales:
Inter-Event Data Dependency
3
A property of the stream and the problem. Measures the way that events impact
the processing of following events.
For example:
A stream of record mutations that must be applied one after another within a
single record id
● A stream with many records and no sequential mutations for any given record
id has no data dependency
● A stream with a single record id and only sequential mutations has maximum
data dependency
● A topic with a single partition has maximum data dependency (in some sense)
Inter-Event Data Dependency
The average length of the data-dependent chains in your stream
decide your average throughput at any scale.
This is equivalent to the way “the sequential portion” of a problem
constrains the maximum parallel speedup in Amdhal’s Law.
S: max speedup fraction, s: parallelism, p: “data dependency fraction”
Big Idea
5
● “It’s all about the data-dependency, baby”
● No data dependency: Smooth scaling
● Data dependency: Navigating hell
No Data Dependency
∅DD
6
Reference Architecture
7
Stream to Stream Money = Performance
Webhook system
What it is: How it scales:
Subscription information
Change
Notifications
Delivery
Requests
What you can do with ∅DD
8
Abstractly
■ Data reshaping
■ Order-independent enrichment
■ Non-self Joins
Concrete Use Cases
■ Adapters
■ Sentiment detectors
■ Geo-IP mappers
■ Redaction
If no external data
access is required:
Redpanda transforms
FTW!
Performance Tradespace
9
More money = More Throughput
Tactics: Add shards and partitions until you have enough capacity
Deferred Inter-Event
Data Dependency
DEDD
10
Reference Architecture
11
Stream to State
Write: Money = Performance
Read: Data dependency
limited
ODP “Analytic
segments”
Optimizely
Experimentation
What it is: How it scales:
What you can do with DEDD
12
Abstractly
● Use it when Write Performance is more important than Read
Performance
Concrete Use Cases
● Reporting: Especially when users read less than they write
● Nightly model training
Performance Tradespace
13
Write side: More money = More Speed
Read side: Data-dependency limited
Tactics: Reduce data dependency with finer grained partitioning
Streaming Inter-Event
Data Dependency
SEDD
14
Reference Architecture
15
Stateful Stream to
Stream
Data dependency
limited
ODP “Real-Time
Segments”
What it is: How it scales:
What you can do with SEDD
16
Abstractly
■ Streaming aggregates
■ Pattern detectors
Concrete Use Cases
■ Segmentation
■ Real time model training
Performance Tradespace
17
Throughput: Data-dependency limited
Tactics for reducing data-dependency:
■ Finer grained partitioning
■ Accept eventual consistency with CRDTs
Fundamental Tradeoff
Inter-event data dependency Max throughput
If you need SEDD and throughput, welcome to hell.
18
But…
19
Query Latency Data Latency
Query Latency: Time it takes to
respond to a request
Driven by: DD work remaining
to resolve the request
Impact: The places where it’s
suitable to use your query API
20
Data Latency: How long it
takes for new information to
impact a query
Driven by: How you cheated to
hide from your data
dependency
Impact: How actionable the
results from your API are
“Cheating” out of Hell
21
Stream to State
Introduces a data
latency / cost tradeoff
Min-data latency is
now data dependency
limited
Everyone else’s
“Real-Time
Segments”
What it is:
How it scales:
Periodic State to Stream
Wrapping it Up
Data Dependency Decides Everything
∅DD - Oddly common in example code and marketing materials. Very
rarely happens in real life.
DEDD - Practical workaround most of the time. Became truly effective
in the last decade as data warehouses have matured.
SEDD - Sounds like “sad” for a reason. A difficult place to be. Hopefully
the next decade will bring some meaningful breakthroughs here.
23
Keep in touch!
Brian Taylor
Director of Engineering
Optimizely
brian.taylor@optimizely.com
@netguy204

More Related Content

Similar to Event-Driven Architecture Masterclass: Challenges in Stream Processing

The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data Lake
Eric Kavanagh
 
Big data business case
Big data   business caseBig data   business case
Big data business case
Karthik Padmanabhan ( MLE℠)
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward
 
Application Performance Lecture
Application Performance LectureApplication Performance Lecture
Application Performance Lecture
Vishwanath Ramdas
 
Web Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der PraxisWeb Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der Praxis
Felix Gessert
 
Querying datasets on the Web with high availability
Querying datasets on the Web with high availabilityQuerying datasets on the Web with high availability
Querying datasets on the Web with high availability
Ruben Verborgh
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
Amazon Web Services
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Data Con LA
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
FogGuru MSCA Project
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
DataStax
 
Big data presentationandoverview_of_couchbase
Big data presentationandoverview_of_couchbaseBig data presentationandoverview_of_couchbase
Big data presentationandoverview_of_couchbase
AMAR NATH
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
Altocloud
 
Queues, Pools and Caches paper
Queues, Pools and Caches paperQueues, Pools and Caches paper
Queues, Pools and Caches paper
Gwen (Chen) Shapira
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
Datadog
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
Stavros Kontopoulos
 
Queues, Pools and Caches - Paper
Queues, Pools and Caches - PaperQueues, Pools and Caches - Paper
Queues, Pools and Caches - Paper
Gwen (Chen) Shapira
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
Dr. Shikha Mehta
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
Ververica
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
Kostas Tzoumas
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Philip Filleul
 

Similar to Event-Driven Architecture Masterclass: Challenges in Stream Processing (20)

The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data Lake
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
Application Performance Lecture
Application Performance LectureApplication Performance Lecture
Application Performance Lecture
 
Web Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der PraxisWeb Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der Praxis
 
Querying datasets on the Web with high availability
Querying datasets on the Web with high availabilityQuerying datasets on the Web with high availability
Querying datasets on the Web with high availability
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Big data presentationandoverview_of_couchbase
Big data presentationandoverview_of_couchbaseBig data presentationandoverview_of_couchbase
Big data presentationandoverview_of_couchbase
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
 
Queues, Pools and Caches paper
Queues, Pools and Caches paperQueues, Pools and Caches paper
Queues, Pools and Caches paper
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Queues, Pools and Caches - Paper
Queues, Pools and Caches - PaperQueues, Pools and Caches - Paper
Queues, Pools and Caches - Paper
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 

More from ScyllaDB

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
ScyllaDB
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
ScyllaDB
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
ScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
ScyllaDB
 

More from ScyllaDB (20)

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 

Recently uploaded

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

Event-Driven Architecture Masterclass: Challenges in Stream Processing

  • 1. A Tale of 3 Pipelines Brian Taylor
  • 2. Reference Architectures 2 Stream to Stream Stream to State Stateful Stream to Stream Write: Money = Performance Read: Data dependency limited Data dependency limited Money = Performance ODP “Analytic segments” ODP “Real-Time Segments” Webhook system What it is: How it scales:
  • 3. Inter-Event Data Dependency 3 A property of the stream and the problem. Measures the way that events impact the processing of following events. For example: A stream of record mutations that must be applied one after another within a single record id ● A stream with many records and no sequential mutations for any given record id has no data dependency ● A stream with a single record id and only sequential mutations has maximum data dependency ● A topic with a single partition has maximum data dependency (in some sense)
  • 4. Inter-Event Data Dependency The average length of the data-dependent chains in your stream decide your average throughput at any scale. This is equivalent to the way “the sequential portion” of a problem constrains the maximum parallel speedup in Amdhal’s Law. S: max speedup fraction, s: parallelism, p: “data dependency fraction”
  • 5. Big Idea 5 ● “It’s all about the data-dependency, baby” ● No data dependency: Smooth scaling ● Data dependency: Navigating hell
  • 7. Reference Architecture 7 Stream to Stream Money = Performance Webhook system What it is: How it scales: Subscription information Change Notifications Delivery Requests
  • 8. What you can do with ∅DD 8 Abstractly ■ Data reshaping ■ Order-independent enrichment ■ Non-self Joins Concrete Use Cases ■ Adapters ■ Sentiment detectors ■ Geo-IP mappers ■ Redaction If no external data access is required: Redpanda transforms FTW!
  • 9. Performance Tradespace 9 More money = More Throughput Tactics: Add shards and partitions until you have enough capacity
  • 11. Reference Architecture 11 Stream to State Write: Money = Performance Read: Data dependency limited ODP “Analytic segments” Optimizely Experimentation What it is: How it scales:
  • 12. What you can do with DEDD 12 Abstractly ● Use it when Write Performance is more important than Read Performance Concrete Use Cases ● Reporting: Especially when users read less than they write ● Nightly model training
  • 13. Performance Tradespace 13 Write side: More money = More Speed Read side: Data-dependency limited Tactics: Reduce data dependency with finer grained partitioning
  • 15. Reference Architecture 15 Stateful Stream to Stream Data dependency limited ODP “Real-Time Segments” What it is: How it scales:
  • 16. What you can do with SEDD 16 Abstractly ■ Streaming aggregates ■ Pattern detectors Concrete Use Cases ■ Segmentation ■ Real time model training
  • 17. Performance Tradespace 17 Throughput: Data-dependency limited Tactics for reducing data-dependency: ■ Finer grained partitioning ■ Accept eventual consistency with CRDTs
  • 18. Fundamental Tradeoff Inter-event data dependency Max throughput If you need SEDD and throughput, welcome to hell. 18
  • 20. Query Latency Data Latency Query Latency: Time it takes to respond to a request Driven by: DD work remaining to resolve the request Impact: The places where it’s suitable to use your query API 20 Data Latency: How long it takes for new information to impact a query Driven by: How you cheated to hide from your data dependency Impact: How actionable the results from your API are
  • 21. “Cheating” out of Hell 21 Stream to State Introduces a data latency / cost tradeoff Min-data latency is now data dependency limited Everyone else’s “Real-Time Segments” What it is: How it scales: Periodic State to Stream
  • 23. Data Dependency Decides Everything ∅DD - Oddly common in example code and marketing materials. Very rarely happens in real life. DEDD - Practical workaround most of the time. Became truly effective in the last decade as data warehouses have matured. SEDD - Sounds like “sad” for a reason. A difficult place to be. Hopefully the next decade will bring some meaningful breakthroughs here. 23
  • 24. Keep in touch! Brian Taylor Director of Engineering Optimizely brian.taylor@optimizely.com @netguy204