Engineering a Robust and High-Performance EDA with Redpanda
Christina Lin
About me
Christina Lin
Developer Advocate, Redpanda
SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, Microservice, EIP, K8s, Agile Integration, Data Mesh, ActiveMQ
Live data stack
Resilience - handle failures and scale gracefully
Elasticity – infrastructure that can scale dynamically
Decentralization - data ownership, empowering individual teams
Performance - low latency and high throughput
Autonomy – self service, define quality, and access
Nimble - efficient data movement
Distributed - distributed data processing for cloud native
Agility – quickly respond to change in data
Robust
Event Driven Architecture
[Diagram labels: Producer, Consumer; Services, Microservices, Databases, IoT Devices, Applications; System B, Team A, Department C, Team D, Group E]
Event Driven Architecture
Orders
Health records
Restock Signal
CDC Event
Streaming
Table/Materialized view
Data Store
Payroll
Payment
Shipment Signal
Inventory
The Contracts
[Diagram labels: Microservices, Databases / CDC, Data Lake/Data warehouse]
Schema Registry
Producer
Data structure encoding: Avro, Protobuf, and JSON
Data structure: {name: type}
Serialize
Download the schema (version)
Consumer
Schema Registry
Deserialize
Value (Binary)
Schema ID, Key (Binary), Value (Binary)
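The Schema ID that travels with the binary value can be sketched as a small framing step. This is a minimal sketch, assuming the Confluent-style wire format (one zero "magic" byte, then a 4-byte big-endian schema ID, then the encoded payload); the names frame, unframe, and KNOWN_SCHEMA_IDS are hypothetical and the payload bytes stand in for real Avro or Protobuf output.

```python
import struct

MAGIC_BYTE = 0  # assumption: Confluent-style framing, 1 magic byte + 4-byte schema ID

# Hypothetical stand-in for the IDs a schema registry has issued.
KNOWN_SCHEMA_IDS = {1, 2, 3}


def frame(schema_id: int, payload: bytes) -> bytes:
    """Producer side: prefix the serialized payload with the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Consumer side: split a message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]


def has_valid_schema_id(message: bytes) -> bool:
    """Validation side: accept only messages whose schema ID is registered."""
    try:
        schema_id, _ = unframe(message)
    except ValueError:
        return False
    return schema_id in KNOWN_SCHEMA_IDS
```

A consumer would use the returned schema ID to download (and cache) the matching schema version before deserializing the payload.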
Schema Registry
Server-side validation
Value (Binary)
Schema ID, Key (Binary), Value (Binary)
Schema Registry
Check if schema ID is valid
Schema Registry
Producer
Compatibility:
• Backward
• Forward
• Full
• None
Schema Registry
Version 1
Version 2
Version 3
Schema Registry in Redpanda
Service Registry
Service Registry
Restful Endpoint
Restful Endpoint
_schemas
_schemas
Schema Registry
• Assign a default value to the fields that you might remove in the future
• Do not rename an existing field—add an alias instead
• When using schema evolution, always provide a default value
• Never delete a required field
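Why default values matter for schema evolution can be shown with a simplified compatibility check. This is a sketch only, assuming flat Avro-like record schemas written as Python dicts; it looks at field names and defaults and ignores type changes and aliases, which a real registry also checks. All function names are hypothetical.

```python
def field_map(schema: dict) -> dict:
    """Index a record schema's fields by name."""
    return {f["name"]: f for f in schema["fields"]}


def can_read(reader: dict, writer: dict) -> bool:
    """True if `reader` can decode data written with `writer`.

    Every field the reader expects must either be present in the
    written data or carry a default value.
    """
    written = field_map(writer)
    return all(
        name in written or "default" in field
        for name, field in field_map(reader).items()
    )


def is_backward_compatible(new: dict, old: dict) -> bool:
    # Consumers on the new schema can read data produced with the old one.
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    # Consumers still on the old schema can read data produced with the new one.
    return can_read(reader=old, writer=new)


def is_fully_compatible(new: dict, old: dict) -> bool:
    return is_backward_compatible(new, old) and is_forward_compatible(new, old)


v1 = {"fields": [{"name": "id", "type": "long"},
                 {"name": "email", "type": "string"}]}
# Adds a field WITH a default: safe in both directions.
v2 = {"fields": v1["fields"] + [{"name": "tier", "type": "string", "default": "free"}]}
# Adds a field WITHOUT a default: new readers cannot decode old data.
v3 = {"fields": v1["fields"] + [{"name": "region", "type": "string"}]}
# Removes "email", which has no default: old readers cannot decode new data.
v4 = {"fields": [{"name": "id", "type": "long"}]}
```

Here v2 passes the full check against v1, v3 fails the backward check, and v4 fails the forward check, which mirrors the advice to give a default to any field you add or might later remove.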
When not to use Schema Registry
• You're certain the schema won't change in the future
• Hardware resources are limited and low latency is critical, so using it may hurt performance (e.g., for IoT)
• You want to serialize the data with an unsupported serialization scheme
Event validation & DLQ
[Diagram labels: Consumer, Validator, DLQ, Correction/Remedy]
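The validator-plus-DLQ pattern can be sketched in a few lines. This is a minimal sketch, assuming JSON events and in-memory lists standing in for the valid topic and the dead letter queue; the REQUIRED_FIELDS rules and the function names are hypothetical.

```python
import json

# Hypothetical contract: field name -> expected Python type.
REQUIRED_FIELDS = {"order_id": int, "amount": float}


def validate(raw: bytes) -> dict:
    """Parse one event and check it against the contract, or raise ValueError."""
    event = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict):
        raise ValueError("event is not a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(event.get(name), expected):
            raise ValueError(f"field {name!r} is missing or not {expected.__name__}")
    return event


def route(records: list[bytes]) -> tuple[list[dict], list[dict]]:
    """Send valid events onward and park invalid ones in the DLQ with a reason."""
    valid, dlq = [], []
    for raw in records:
        try:
            valid.append(validate(raw))
        except ValueError as err:
            # Keep the original bytes so a correction/remedy step can replay them.
            dlq.append({"raw": raw, "error": str(err)})
    return valid, dlq
```

Keeping the untouched bytes and the failure reason together is what lets a later correction step fix and resubmit the event.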
In broker validation – how it works
Replicate across clusters
customer partition 1
Load to cache
Validate against schema
Transform
Write back to disk with DMA
customer validated partition 1
Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro
In broker validation & transformation
• Firsthand processing, quick filtering
• Simple rerouting determined by the ingested data
• Masking, schema validation
• Stateless, functional processing
When not to use in broker transformation?
• When it requires external data dependencies
• Windowing or complex processing with multiple input streams
• When the processing needs to keep state
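A stateless, record-at-a-time function is the shape that fits in-broker transformation. The sketch below is illustrative Python only and does not use the Redpanda data transforms SDK (see the example repo above for a real transform); the mask_email name and the JSON record layout are assumptions.

```python
import json
from typing import Optional


def mask_email(value: bytes) -> Optional[bytes]:
    """Mask the email field of one JSON record.

    Returns the transformed record, or None to filter the record out.
    The function looks only at its input, so it keeps no state and
    needs no external data.
    """
    try:
        record = json.loads(value)
    except ValueError:
        return None  # quick filtering: drop records that are not valid JSON
    if not isinstance(record, dict):
        return None
    email = record.get("email")
    if isinstance(email, str) and "@" in email:
        local, domain = email.split("@", 1)
        record["email"] = local[:1] + "***@" + domain
    return json.dumps(record, sort_keys=True).encode()
```

Anything that would need a lookup table, a window, or a running total falls outside this shape and belongs in a separate stream processor.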
High Performance
Turning the knobs
[Diagram labels: Producer, Consumer]
The Broker
Infrastructure
Storage – XFS, NVMe
Network bandwidth
Memory
CPU
Location (Multi-AZ)
OS
Disk I/O
read_iops/bandwidth
write_iops/bandwidth
Broker
# Brokers
# Replicas
# Partitions
Log segment size
Partitions
Partitions
Producer
Consumer
Consumer
Consumer
Group A
• Round Robin
• Hashing Key Partition
• Custom Partitioner
Overhead
• File handles
• Follower, heartbeat
• Large metadata, quadratic (N²)
Idempotency
• Order guarantee in partition only
Higher latency
• Producer batch
Consumer rebalance
• RangeAssignor (SW)
• RoundRobinAssignor(SW)
• StickyAssignor(SW)
• CooperativeStickyAssignor
• Static (No Rebalance)
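The round-robin and key-hashing strategies listed above can be sketched as follows. This is a simplified illustration: real clients use their own hash function (Kafka's default partitioner uses murmur2), and zlib.crc32 is swapped in here only because it is in the standard library. The names key_partition and RoundRobin are hypothetical.

```python
import zlib
from itertools import count


def key_partition(key: bytes, num_partitions: int) -> int:
    """Hash partitioning: the same key always maps to the same partition,
    which is what preserves per-key ordering within that partition."""
    return zlib.crc32(key) % num_partitions


class RoundRobin:
    """Round-robin partitioning: spread keyless records evenly."""

    def __init__(self, num_partitions: int) -> None:
        self._num_partitions = num_partitions
        self._counter = count()

    def partition(self) -> int:
        return next(self._counter) % self._num_partitions
```

Note that changing the partition count changes the key-to-partition mapping, which is one reason to size partitions deliberately rather than adjust them often.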
Producer
Producer
fsync
Ack=0: doesn't wait for acknowledgments and doesn't retry sending messages
Ack=1: acknowledgment from the leader
Ack=all: majority of replicas acknowledge
write_caching_default=true
flush_bytes, flush_ms
Producer
batch.size
linger.ms
compression
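How batch.size and linger.ms interact can be seen with a toy model. This is a sketch under simplifying assumptions (a single partition, no compression, no in-flight limits); Msg and batches are hypothetical names, not part of any client library.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    arrival_ms: int  # when the application handed the message to the producer
    size: int        # serialized size in bytes


def batches(msgs: list[Msg], batch_size: int, linger_ms: int) -> list[list[Msg]]:
    """Group messages the way a simplified producer would.

    A batch is closed when the next message would push it past
    batch_size bytes, or when linger_ms has passed since the batch opened.
    """
    out: list[list[Msg]] = []
    current: list[Msg] = []
    used = 0
    opened_at = 0
    for m in msgs:
        full = used + m.size > batch_size
        expired = m.arrival_ms - opened_at >= linger_ms
        if current and (full or expired):
            out.append(current)
            current, used = [], 0
        if not current:
            opened_at = m.arrival_ms
        current.append(m)
        used += m.size
    if current:
        out.append(current)
    return out


# Ten 100-byte messages arriving one millisecond apart.
stream = [Msg(arrival_ms=i, size=100) for i in range(10)]
```

With this stream, no linger means one request per message, while a few milliseconds of linger collapses the same traffic into far fewer, larger requests at the price of added latency.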
Consumer
Consumer
fetch.min.bytes
max.poll.records
fetch.max.bytes
fetch.max.wait.ms
High Throughput
• There is no one-size-fits-all setting; many factors affect performance.
• More partitions allow more parallel processing and hence higher throughput, but they come at a cost. (Avoid over-partitioning or under-partitioning.)
• Experiment with acks settings and enable write caching.
• Explore how the producer batches messages. Increasing batch.size and linger.ms can increase throughput by letting the producer add more messages to one batch.
• Explore consumer fetch frequency and message size.
• Start with a baseline configuration and change it gradually, measuring the impact of each change on performance.
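The "baseline, then one change at a time" advice can be sketched as plain configuration data. The property names follow the Java client naming used on the slides (compression.type is the Java client's name for the compression setting); the values are illustrative placeholders, not recommendations, and with_change is a hypothetical helper.

```python
# Illustrative baseline only; measure on your own workload before adopting any value.
PRODUCER_BASELINE = {
    "acks": "all",
    "batch.size": 16384,
    "linger.ms": 0,
    "compression.type": "none",
}

CONSUMER_BASELINE = {
    "fetch.min.bytes": 1,
    "fetch.max.bytes": 52428800,
    "fetch.max.wait.ms": 500,
    "max.poll.records": 500,
}


def with_change(baseline: dict, key: str, value) -> dict:
    """Return a copy of the baseline with exactly one knob changed,
    so each benchmark run isolates the effect of a single setting."""
    if key not in baseline:
        raise KeyError(f"{key!r} is not part of the baseline")
    candidate = dict(baseline)
    candidate[key] = value
    return candidate


# One experiment per knob, each compared against the same baseline run.
experiments = [
    with_change(PRODUCER_BASELINE, "linger.ms", 10),
    with_change(PRODUCER_BASELINE, "batch.size", 65536),
    with_change(PRODUCER_BASELINE, "compression.type", "lz4"),
]
```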
Robust for Stateful Processes
Beyond just streams of events
[Diagram labels: Databases / CDC, Microservices, Processor]
Beyond just streams of events
[Diagram labels: Databases / CDC, Microservices, Processor]
Limited disk space
Event Sourcing
S3 Rehydrate
State Snapshot
[Diagram labels: Microservices, Databases / CDC, Processor, Snapshot]
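The snapshot idea can be sketched as "state at an offset, plus a replay of what came after". This is a minimal in-memory sketch, assuming inventory-style events of the form (sku, quantity delta); Snapshot and replay are hypothetical names, and a real processor would persist the snapshot durably.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    offset: int   # offset of the next event that still has to be applied
    state: dict   # materialized state as of that offset


def replay(events: list[tuple[str, int]], state: Optional[dict] = None) -> dict:
    """Fold events into state, starting from a copy of `state` if given."""
    result = dict(state or {})
    for sku, delta in events:
        result[sku] = result.get(sku, 0) + delta
    return result


log = [("apple", 5), ("pear", 2), ("apple", -1), ("pear", 3)]

# Take a snapshot after the first two events.
snapshot = Snapshot(offset=2, state=replay(log[:2]))

# After a failure, restore from the snapshot and replay only the tail.
restored = replay(log[snapshot.offset:], snapshot.state)
```

Restoring this way gives the same state as replaying the whole log, but it only needs the events after the snapshot offset, which matters when disk space for the full history is limited.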
Summary
■ Use a schema to ensure the data shape for consumers
■ When designing, think about compatibility
■ Use validation to ensure consumers always get the correct format
■ In-broker transforms are great for simple, functional, stateless processing
■ Provision an appropriate number of partitions for your topics
■ Depending on your use case, always set the right acks and buffer settings for the producer
■ For stateful stream processing, use snapshots for fault tolerance
On demand example
Batch
Every 10 mins
CSV
CSV
Batch
pipeline
Batch Processing
Batch
pipeline
Right away!
Stream
CSV
Keep in touch!
Christina Lin
Developer Advocate
Redpanda
Christina@redpanda.com

Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 

Event-Driven Architecture Masterclass: Engineering a Robust, High-performance EDA

  • 1. Engineering a Robust and High performance EDA with Redpanda, Christina Lin
  • 2. About me: Christina Lin, Developer Advocate, Redpanda. Background: SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, Microservice, EIP, K8s, Agile Integration, Data Mesh, Active MQ. Live data stack: Resilience - handle failures and scale gracefully; Elasticity - infrastructure that can scale dynamically; Decentralization - data ownership, empowering individual teams; Performance - low latency and high throughput; Autonomy - self service, define quality, and access; Nimble - efficient data movement; Distributed - distributed data processing for cloud native; Agility - quickly respond to change in data.
  • 4. Event Driven Architecture: producers and consumers include services, microservices, databases, IoT devices, applications, System B, Team A, Department C, Team D, and Group E.
  • 5. Event Driven Architecture: Orders, Health records, Restock Signal, CDC Event, Streaming, Table/Materialized view, Data Store, Payroll, Payment, Shipment Signal, Inventory.
  • 7. Schema Registry: data structure encoding - Avro, Protobuf and JSON; data structure - {name:type}. The producer serializes each record and sends the key (binary) together with the schema ID and the value (binary); the consumer downloads the schema (version) from the Schema Registry and deserializes the value.
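
To make the "schema ID + value (binary)" layout concrete, here is a minimal sketch of the framing commonly used by Schema Registry-compatible serializers: one zero "magic" byte, a 4-byte big-endian schema ID, then the encoded payload. The slide does not spell out this layout, so treat it as an assumption; `frame` and `unframe` are hypothetical helper names, and the payload is taken as already serialized (Avro, Protobuf or JSON).

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # assumed framing marker (Confluent-style wire format)
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown framing")
    return schema_id, message[HEADER_SIZE:]
```

A consumer would call `unframe`, look up the schema for the returned ID in the registry, and only then decode the payload.
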
  • 8. Schema Registry: server-side validation - for each record (key (binary), schema ID, value (binary)) sent by the producer, check if the schema ID is valid in the Schema Registry. Compatibility across schema versions (Version 1, Version 2, Version 3): • Backward • Forward • Full • None
  • 9. Schema Registry in Redpanda: each broker is shown with a Service Registry, a RESTful endpoint, and _schemas.
  • 10. Schema Registry: • Assign a default value to the fields that you might remove in the future • Do not rename an existing field; add an alias instead • When using schema evolution, always provide a default value • Never delete a required field. When not to use Schema Registry: • You’re certain the schema won’t change in the future • Hardware resources are limited and low latency is critical, since the registry may impact performance (e.g., for IoT) • You want to serialize the data with an unsupported serialization scheme
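
The evolution rules above can be illustrated with a small check over Avro-style schemas written as Python dicts. This is a deliberately narrow sketch, not a full compatibility checker: it only reports fields that are new in the later version and lack a default, and it treats a field whose `aliases` name an old field as a rename. The `Order` schemas and the function name are made up for illustration.

```python
def added_fields_without_default(old_schema: dict, new_schema: dict) -> list:
    """Return names of fields that are new in new_schema and have no default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    offenders = []
    for field in new_schema["fields"]:
        # A field counts as pre-existing if its name or one of its aliases
        # matches a field of the old schema (rename via alias).
        known_as = {field["name"], *field.get("aliases", [])}
        if known_as & old_names:
            continue
        if "default" not in field:
            offenders.append(field["name"])
    return offenders


ORDER_V1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "long"},
    {"name": "amount", "type": "double"},
]}

# Renames "amount" through an alias and adds "currency" with a default.
ORDER_V2 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "long"},
    {"name": "total", "type": "double", "aliases": ["amount"]},
    {"name": "currency", "type": "string", "default": "USD"},
]}

# Adds "note" without a default, which breaks readers of older data.
ORDER_V3_BAD = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "long"},
    {"name": "amount", "type": "double"},
    {"name": "note", "type": "string"},
]}
```
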
  • 11. Event validation & DLQ: a validator checks events before they reach the consumer; events that fail validation go to a dead-letter queue (DLQ) for correction/remedy.
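
A minimal in-memory sketch of the validator-plus-DLQ pattern follows. Plain lists stand in for the topics, and the `order_id` / `amount` rules are hypothetical; in a real deployment the validator would consume from one topic and produce to a validated topic and a DLQ topic.

```python
import json
from typing import Callable, List, Tuple


def validate_order(event: dict) -> None:
    """Hypothetical rules: integer order_id and a non-negative amount."""
    if not isinstance(event, dict):
        raise TypeError("event must be a JSON object")
    if not isinstance(event["order_id"], int):
        raise TypeError("order_id must be an integer")
    if event["amount"] < 0:
        raise ValueError("amount must be non-negative")


def route(raw_events: List[str],
          validate: Callable[[dict], None]) -> Tuple[List[dict], List[dict]]:
    """Split raw events into (valid events, DLQ entries)."""
    valid, dlq = [], []
    for raw in raw_events:
        try:
            event = json.loads(raw)  # malformed JSON raises a ValueError subclass
            validate(event)
            valid.append(event)
        except (ValueError, KeyError, TypeError) as err:
            # Keep the original payload and the reason so it can be remedied later.
            dlq.append({"payload": raw, "error": repr(err)})
    return valid, dlq
```

Keeping the untouched payload in the DLQ entry is what makes the correction/remedy step possible: the event can be fixed and replayed without asking the producer to resend it.
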
  • 12. In broker validation - how it works: customer partition 1 → load to cache → validate against schema → transform → write back to disk with DMA → customer validated partition 1; replicate across clusters. Example repo: https://github.com/redpanda-data/redpanda-labs/tree/main/data-transforms/to_avro
  • 13. In broker validation & transformation: • Firsthand processing, quick filtering • Simple rerouting determined by the ingested data • Masking, schema validation • Stateless, functional processing. When not to use in broker transformation: • When it requires external data dependencies • Windowing or complex processing with multiple input streams • When it needs to keep the state of the processes
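
The kind of work that suits an in-broker transform is a pure function of a single record. The sketch below shows masking in plain Python; it does not use the Redpanda data transforms SDK (see the example repo on the previous slide for that), and the `card_number` field is a hypothetical name.

```python
def mask_card(record: dict) -> dict:
    """Return a copy of the record with all but the last four card digits masked."""
    out = dict(record)  # never mutate the input: the function stays side-effect free
    card = out.get("card_number")
    if isinstance(card, str) and len(card) > 4:
        out["card_number"] = "*" * (len(card) - 4) + card[-4:]
    return out
```

The function needs no lookup in an external system and keeps nothing between calls, which is exactly the stateless, functional shape listed above. Anything that needs a window, a join, or remembered state belongs in a stream processor instead.
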
  • 16. The Broker Infrastructure: Storage (XFS, NVMe); Network bandwidth; Memory; CPU; Location (Multi-AZ); OS; Disk I/O (read_iops/bandwidth, write_iops/bandwidth). Broker: # Brokers, # Replicas, # Partitions, Log segment size.
  • 17. Partitions. Partitioning: • Round Robin • Hashing Key Partition • Custom Partitioner. Overhead: • File handles • Follower, heartbeat • Large metadata, quadratic (N²). Idempotency: • Order guarantee in partition only. Higher latency: • Producer batch. Consumer rebalance: • RangeAssignor (SW) • RoundRobinAssignor (SW) • StickyAssignor (SW) • CooperativeStickyAssignor • Static (No Rebalance)
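
Key hashing is what gives the "order guarantee in partition only": every record with the same key lands in the same partition. A minimal sketch, assuming CRC32 as a stand-in hash (Kafka's Java client uses murmur2, so the partition numbers here will not match a real client); `partition_for` is a hypothetical name.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions
```

Because the result depends on `num_partitions`, adding partitions later can move existing keys to a different partition, which is one reason to size the partition count deliberately up front.
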
  • 18. Producer acknowledgments: Ack=all - majority of replicas acknowledge, with fsync (with write_caching_default=true, flushing is governed by flush_bytes, flush_ms); Ack=1 - acknowledgment from the leader; Ack=0 - doesn’t wait for acknowledgments and doesn’t retry sending messages. Producer settings: batch.size, linger.ms, compression.
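
As a small worked example of the three acknowledgment levels, the function below returns how many replica acknowledgments the producer waits for, following the slide's description of Ack=all as a majority. It is an illustrative model only; `acks_awaited` is a hypothetical helper, not a client API.

```python
def acks_awaited(acks: str, replicas: int) -> int:
    """Number of replica acknowledgments awaited before a write is confirmed."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if acks == "0":
        return 0                  # fire and forget
    if acks == "1":
        return 1                  # leader only
    if acks == "all":
        return replicas // 2 + 1  # majority of replicas
    raise ValueError("acks must be '0', '1' or 'all'")
```

With three replicas, Ack=all waits for two acknowledgments, so one broker can be down without blocking writes.
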
  • 20. High Throughput: • There is no one size fits all; many factors affect performance. • More partitions allow more parallel processing, hence higher throughput, but they come at a cost (avoid over-partitioning or under-partitioning). • Experiment with acks settings and enable write caching. • Explore how the producer batches messages: increasing batch.size and linger.ms can increase throughput by letting the producer add more messages to one batch. • Explore consumer fetch frequency and message size. • Start with a baseline configuration and gradually make changes, measuring the impact of each change on performance.
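
To see why larger batch.size and linger.ms values reduce the number of requests, here is a simplified model of producer batching. It is an assumption-laden sketch: a batch is sent when the next message arrives after the linger time has elapsed or would overflow the batch size, whereas real clients also batch per partition and flush on a timer.

```python
from typing import List, Tuple

Message = Tuple[int, int]  # (arrival time in ms, size in bytes)


def batch_messages(messages: List[Message], batch_size: int,
                   linger_ms: int) -> List[List[Message]]:
    """Group messages (sorted by arrival time) into batches."""
    batches: List[List[Message]] = []
    current: List[Message] = []
    current_bytes = 0
    opened_at = 0
    for arrival_ms, size in messages:
        linger_expired = bool(current) and arrival_ms - opened_at >= linger_ms
        would_overflow = bool(current) and current_bytes + size > batch_size
        if linger_expired or would_overflow:
            batches.append(current)  # send the open batch
            current, current_bytes = [], 0
        if not current:
            opened_at = arrival_ms   # a new batch opens with this message
        current.append((arrival_ms, size))
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

For ten 100-byte messages arriving 10 ms apart, `linger_ms=0` yields ten single-message batches, while `linger_ms=50` with a generous `batch_size` yields two batches of five. The price is up to 50 ms of added latency per message, which is the trade-off to measure against a baseline.
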
  • 22. Beyond just streams of events: Databases / CDC, Microservices, Processor.
  • 23. Beyond just streams of events: Databases / CDC, Microservices, Processor.
  • 24. Limited disk space: Event Sourcing, S3, Rehydrate.
  • 25. State Snapshot: Microservices, Databases / CDC, Processor, Snapshot.
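
A snapshot lets a stateful processor recover without replaying the whole log: it restores the saved state and replays only the events after the snapshot's offset. The sketch below uses a per-key counter as the state and a list of `(offset, key)` pairs as the log; both, along with the class name, are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


class CountingProcessor:
    """Counts events per key and remembers the last applied offset."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.offset = -1  # offset of the last event applied

    def apply(self, offset: int, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset = offset

    def snapshot(self) -> dict:
        # Copy the state so later updates do not change the snapshot.
        return {"offset": self.offset, "counts": dict(self.counts)}

    @classmethod
    def restore(cls, snap: dict,
                log: Iterable[Tuple[int, str]]) -> "CountingProcessor":
        processor = cls()
        processor.counts = dict(snap["counts"])
        processor.offset = snap["offset"]
        snapshot_offset = snap["offset"]
        for offset, key in log:
            if offset > snapshot_offset:  # skip events already in the snapshot
                processor.apply(offset, key)
        return processor
```

Storing the offset together with the state is the important design choice: it tells the restored processor exactly where to resume, so no event is counted twice or skipped.
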
  • 26. Summary: ■ Use a schema to ensure the data shape for consumers ■ When designing, think about compatibility ■ Validation ensures consumers always get the correct format ■ In broker transforms are great for simple, functional, stateless processes ■ Provision an appropriate number of partitions for your topics ■ Depending on your use case, always set the right acks and buffer settings for the producer ■ For stateful stream processing, use snapshots for fault tolerance
  • 27. On demand example: Batch pipeline (batch processing, CSV, batch every 10 mins) vs. Stream (CSV, right away!).
  • 28. Keep in touch! Christina Lin, Developer Advocate, Redpanda, Christina@redpanda.com