SlideShare a Scribd company logo
1 of 25
Distributed Data Processing
for Real-time Applications
Distributed Data Expert and
Solutions Architect at AWS
Mahee Gunturu
Outline
1. Modern Application Architectures
i. Microservices
ii. Serverless
2. Fundamentals of Event Processing
i. Basic Terminology
ii. Advancements in Event Processing Platforms
iii. Connectors
iv. Benefits
3. Patterns in Event Processing
4. State Propagation in Distributed Systems
i. Data mesh architectures
2
Modern
Application
Architectures
1
4
4
Application architectures are evolving!
EAI
EII
ETL ESB
SOA
(WS-*)
CEP
iP
aaS
Service
Mesh
Data
Mesh
2020’s
2010’s
2000’s
1990’s
1980’s
Modern Real-time Applications Architecture
Business logic
Purpose-built
databases
Events Events
APIs
Queues/Messages
5
5
6
6
Benefits of Serverless
No servers to manage
Automatically scales
Only pay for consumption
Event-Driven Architectures
Write less code
7
7
If your application is cloud-native,
large-scale, or distributed, and
doesn’t include a messaging
component, that’s probably a bug.
Tim Bray
General-purpose internet-software geek
Fundamentals of
Event Processing
2
What is an Event?
■ An event is a change in state, or an update that carries entropy
and has a timestamp associated with it. Events can either carry
state or can be identifiers.
Event based architectures have three key components:
● event producers
● event routers
● event consumers
9
What is an Event Stream?
■ A series of continuous events that represent the changing
behaviour of a system form an event stream.
Event stream processing usually combines the information
from many streams and/or data sources in real-time,
to derive actionable insights.
10
Stream Processing
11
Microservices
IOT
Operational
Applications
Databases
Producers
Consumers
Database
changes
Microservices
events
IOT
Logs
Point of Sale
txns
Streams of real time events
Stream processing apps
Databases
Point of sale
Basic Terminology
12
Producer
API
Consumer
API
Publisher Subscriber
● Producer
● Broker
● Topic
● Topic Partition
● Offset
● Consumer
● Schema Registry
Brokers
Topics
Pulsar Terminology
13
● Producer
● Broker
● Topic
● Cursor
● Consumer
● Namespace/bundles
● Subscription
■ Key-Shared
■ Failover
■ Exclusive
■ Shared
● Reader
● Schema Registry
● ack/nack
● Message Delivery
■ At-most Once
■ At-least once
■ Exactly once
● Bookie/Bookkeeper
● Segment
● Dispatcher
● Ledger
● Tenant
Tiered Storage
Stateless
Stateful
Multi-tenancy Model in Pulsar
14
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Instance
Pulsar Cluster
Pulsar Cluster Pulsar Cluster
Global Replication
15
Data Center 3
Data Center 2
Data Center 1
● Data is replicated
Asynchronously
● Kafka uses Mirror-Maker to
replicate the data
● Pulsar has built-in cross data
center replication that is used in
production already
Clients
■ Client API
● Language bindings for Java, Go, Python, C++, Node.js and C#.
● Community client drivers for rust, erlang, haskell, websockets etc…
■ SQL
● Trino (Presto SQL)
■ Kafka on Pulsar
● Client wrapper for Kafka API compatibility
■ Apache Flink native integration
● Exactly once guarantees.
■ Pulsar adaptor for Apache Spark streaming
● Receives raw data from pulsar.
16
Pulsar Connectors
17
Support for multiple
protocols and multi-tenancy
FunctionMesh for K8
Connector Management
Preserves data schema
Integrated Observability
Processing Guarantees,
Load Balancing
Benefits of Event Stream Processing Platforms
■ Centralized infrastructure
■ Intermediate layer for buffering
■ Impedance mismatch between applications
■ Data export and import capabilities
■ Publish CDC (Change Data Capture) streams
■ Ability to recreate state
■ Streaming Data Transformations
■ Integrate with various applications
■ Event based custom Streaming workflows
18
Patterns in Event-
Processing
3
Patterns in Event Processing
■ Event Driven
● Message queues/routing
● Loosely coupled
● Async communication
■ Event Streaming
● Transformations
● Aggregator
● Filtering
20
■ Event Sourcing
● CQRS
● Stateful service pattern
● Saga pattern
State Propagation
in Distributed
Systems
4
2
Event Streaming DDD Microservices Data Warehouses
Domain
Inventory
Orders
Shipments
Data Product
Data Mesh
...
What is Data Mesh?
22
22
Design Principles of a Data Mesh
1. Data is a product
i. Each organization owns their data end-to-end
ii. Responsible for building, operating and serving their data
iii. Publish CDC logs across bounded contexts
iv. Organization needs to resolve any issues and make it easy for data
discoverability
2. Federated data governance
i. Provides data lineage
ii. Validates data quality
iii. Data is shared securely and appropriate access controls are enforced.
iv. Auditing and reporting are enabled
3. Platform should be self-service
i. Accessible via purpose built infra tooling
23
Keep in touch!
Mahee Gunturu
Solutions Architect
AWS
linkedin.com/in/maheedhargunturu/
@Vanguard_space
Implement a Data Mesh: Cheat Sheet
25
■ Centralize data in motion
Introduce a central event streaming platform
■ Nominate data owners
Firm owners for all key datasets in the organization
Make ownership information broadly accessible
■ Data on demand
Events are either stored in messaging systems
or can be republished by data products on demand
■ Handle schema change
Owners publish schema information to the mesh
Process for schema change approval
■ Secure event streams
Access to event streams is permissioned by a central
body
■ Connect from any database
Sink Connectors are made available for all
supported database types to ease the provisioning
of new output data ports in the mesh
■ Central user interface
Events are either stored in messaging systems
or can be republished by data products on demand
■ Discovery & registration of event streams
■ Search data schemas
■ Data lineage views
■ Request to access event streams
■ Previewing event streams

More Related Content

Similar to Distributed Data Processing for Real-time Applications

High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputinginside-BigData.com
 
Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! elangovans
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...HostedbyConfluent
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Tammy Bednar
 
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGateJeffrey T. Pollock
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Deconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignDeconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignVMware Tanzu
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...confluent
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk M sharifi
 
Digital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyDigital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyIlham Ahmed
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorialAnna Liu
 

Similar to Distributed Data Processing for Real-time Applications (20)

High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputing
 
Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers! Horizontal Scaling for Millions of Customers!
Horizontal Scaling for Millions of Customers!
 
VAS - VMware CMP
VAS - VMware CMPVAS - VMware CMP
VAS - VMware CMP
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
 
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
 
Microservices Patterns with GoldenGate
Microservices Patterns with GoldenGateMicroservices Patterns with GoldenGate
Microservices Patterns with GoldenGate
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
kumarResume
kumarResumekumarResume
kumarResume
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Technical Skillwise
Technical SkillwiseTechnical Skillwise
Technical Skillwise
 
Deconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven DesignDeconstructing Monoliths with Domain Driven Design
Deconstructing Monoliths with Domain Driven Design
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk
 
Digital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility companyDigital Business Transformation for Energy & Utility company
Digital Business Transformation for Energy & Utility company
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorial
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Distributed Data Processing for Real-time Applications

  • 1. Distributed Data Processing for Real-time Applications Distributed Data Expert and Solutions Architect at AWS Mahee Gunturu
  • 2. Outline 1. Modern Application Architectures i. Microservices ii. Serverless 2. Fundamentals of Event Processing i. Basic Terminology ii. Advancements in Event Processing Platforms iii. Connectors iv. Benefits 3. Patterns in Event Processing 4. State Propagation in Distributed Systems i. Data mesh architectures 2
  • 4. 4 4 Application architectures are evolving! EAI EII ETL ESB SOA (WS-*) CEP iP aaS Service Mesh Data Mesh 2020’s 2010’s 2000’s 1990’s 1980’s
  • 5. Modern Real-time Applications Architecture Business logic Purpose-built databases Events Events APIs Queues/Messages 5 5
  • 6. 6 6 Benefits of Serverless No servers to manage Automatically scales Only pay for consumption Event-Driven Architectures Write less code
  • 7. 7 7 If your application is cloud-native, large-scale, or distributed, and doesn’t include a messaging component, that’s probably a bug. Tim Bray General-purpose internet-software geek
  • 9. What is an Event? ■ An event is a change in state, or an update that carries entropy and has a timestamp associated with it. Events can either carry state or can be identifiers. Event based architectures have three key components: ● event producers ● event routers ● event consumers 9
  • 10. What is an Event Stream? ■ A series of continuous events that represent the changing behaviour of a system form an event stream. Event stream processing usually combines the information from many streams and/or data sources in real-time, to derive actionable insights. 10
  • 12. Basic Terminology 12 Producer API Consumer API Publisher Subscriber ● Producer ● Broker ● Topic ● Topic Partition ● Offset ● Consumer ● Schema Registry Brokers Topics
  • 13. Pulsar Terminology 13 ● Producer ● Broker ● Topic ● Cursor ● Consumer ● Namespace/bundles ● Subscription ■ Key-Shared ■ Failover ■ Exclusive ■ Shared ● Reader ● Schema Registry ● ack/nack ● Message Delivery ■ At-most Once ■ At-least once ■ Exactly once ● Bookie/Bookkeeper ● Segment ● Dispatcher ● Ledger ● Tenant Tiered Storage Stateless Stateful
  • 14. Multi-tenancy Model in Pulsar 14 Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Instance Pulsar Cluster Pulsar Cluster Pulsar Cluster
  • 15. Global Replication 15 Data Center 3 Data Center 2 Data Center 1 ● Data is replicated Asynchronously ● Kafka uses Mirror-Maker to replicate the data ● Pulsar has built-in cross data center replication that is used in production already
  • 16. Clients ■ Client API ● Language bindings for Java, Go, Python, C++, Node.js and C#. ● Community client drivers for rust, erlang, haskell, websockets etc… ■ SQL ● Trino (Presto SQL) ■ Kafka on Pulsar ● Client wrapper for Kafka API compatibility ■ Apache Flink native integration ● Exactly once guarantees. ■ Pulsar adaptor for Apache Spark streaming ● Receives raw data from pulsar. 16
  • 17. Pulsar Connectors 17 Support for multiple protocols and multi-tenancy FunctionMesh for K8 Connector Management Preserves data schema Integrated Observability Processing Guarantees, Load Balancing
  • 18. Benefits of Event Stream Processing Platforms ■ Centralized infrastructure ■ Intermediate layer for buffering ■ Impedance mismatch between applications ■ Data export and import capabilities ■ Publish CDC (Change Data Capture) streams ■ Ability to recreate state ■ Streaming Data Transformations ■ Integrate with various applications ■ Event based custom Streaming workflows 18
  • 20. Patterns in Event Processing ■ Event Driven ● Message queues/routing ● Loosely coupled ● Async communication ■ Event Streaming ● Transformations ● Aggregator ● Filtering 20 ■ Event Sourcing ● CQRS ● Stateful service pattern ● Saga pattern
  • 22. 2 Event Streaming DDD Microservices Data Warehouses Domain Inventory Orders Shipments Data Product Data Mesh ... What is Data Mesh? 22 22
  • 23. Design Principles of a Data Mesh 1. Data is a product i. Each organization owns their data end-to-end ii. Responsible for building, operating and serving their data iii. Publish CDC logs across bounded contexts iv. Organization needs to resolve any issues and make it easy for data discoverability 2. Federated data governance i. Provides data lineage ii. Validates data quality iii. Data is shared securely and appropriate access controls are enforced. iv. Auditing and reporting are enabled 3. Platform should be self-service i. Accessible via purpose built infra tooling 23
  • 24. Keep in touch! Mahee Gunturu Solutions Architect AWS linkedin.com/in/maheedhargunturu/ @Vanguard_space
  • 25. Implement a Data Mesh: Cheat Sheet 25 ■ Centralize data in motion Introduce a central event streaming platform ■ Nominate data owners Firm owners for all key datasets in the organization Make ownership information broadly accessible ■ Data on demand Events are either stored in messaging systems or can be republished by data products on demand ■ Handle schema change Owners publish schema information to the mesh Process for schema change approval ■ Secure event streams Access to event streams is permissioned by a central body ■ Connect from any database Sink Connectors are made available for all supported database types to ease the provisioning of new output data ports in the mesh ■ Central user interface Events are either stored in messaging systems or can be republished by data products on demand ■ Discovery & registration of event streams ■ Search data schemas ■ Data lineage views ■ Request to access event streams ■ Previewing event streams