Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Ecosystem
Distributed Database Design Decisions to Support High Performance Event Streaming
Peter Corless
Director of Technical Advocacy • ScyllaDB
Peter Corless is the Director of Technical Advocacy at ScyllaDB, the company behind the monstrously fast and scalable NoSQL database.
He is the editor of and frequent contributor to the ScyllaDB blog, and program chair of the ScyllaDB Summit and P99 CONF.
He recently hosted the Distributed Systems Masterclass, co-sponsored by StreamNative and ScyllaDB.
Requirements for This Next Tech Cycle
This Next Tech Cycle [timeline figure, 2000 to 2025+]
+ Transistor count and core count
  + Pentium 4 (2000): 42M transistors, 1 core
  + Pentium D (2005): 228M transistors, 2 cores
  + Xeon Nehalem-EX (2010): 2.3B transistors, 8 cores
  + SPARC M7 (2015): 10B transistors, 32 cores
  + Epyc Rome (2019): 39B transistors, 64 cores
  + Epyc Genoa (2022): ~60B? transistors, 96 cores
  + Epyc Bergamo (2023): ~80B? transistors, 128 cores
+ Broadband speeds: 1.5 Mbps (2002), 16 Mbps (2008), 105 Mbps (2014), 1 Gbps (2018), 3 Gbps (2021)
+ Wireless services: 3G (2002), 4G (2014), 5G (2018)
+ Zettabyte Era (10^21 bytes): 2 ZB data stored (2010), 1.2 ZB IP traffic (2016), 64 ZB data stored (2020), ~180 ZB data stored (2025)
+ Public cloud to multicloud: AWS (2006), GCP (2008), Azure (2010), Azure Arc
Hardware Infrastructure is Evolving
+ Compute
+ From 100+ cores → 1,000+ cores per server
+ From multicore CPUs → full System on a Chip (SoC) designs (CPU, GPU, Cache, Memory)
+ Memory
+ Terabyte-scale RAM per server
+ DDR4 — 800 to 1600 MHz, 2011-present
+ DDR5 — 4600 MHz in 2020, 8000 MHz by 2024
+ DDR6 — 9600 MHz by 2025
+ Storage
+ Petabyte-scale storage per server
+ NVMe 2.0 [2021] — separation of base and transport
Databases are Evolving
+ Consistency Models [CAP Model: AP vs. CP]
+ Strong, Eventual, Tunable
+ ACID vs. BASE
+ Data Model / Query Languages [SQL vs. NoSQL]
+ RDBMS / SQL
+ NoSQL [Document, Key-Value, Wide-Column, Graph]
+ Big Data → HUGE Data
+ Data Stored: Gigabytes? Terabytes? Petabytes? Exabytes?
+ Payload Sizes: Kilobytes? Megabytes?
+ OPS / TPS: Hundreds of thousands? Millions?
+ Latencies: Sub-millisecond? Single-digit milliseconds?
Databases are [or should be] designed for specific kinds of data, specific kinds of workloads, and specific kinds of queries.
How closely a database's design and implementation align with your specific use case, and with what you want from it, determines the resistance of the system.
Variable resistors, anyone?
Sure, you can use various databases for tasks they were never designed for, but should you?
DATA ENGINEERS
Δ Data / t
t ~ n × 0.001 s
For a database to be appropriate for event streaming, it needs to
support managing changes to data over time in “real time” —
measured in single-digit milliseconds or less.
And where changes to data can be produced at a rate of
hundreds of thousands or millions of events per second. [And
greater rates in future]
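To put rough numbers on these requirements, the sketch below computes the average time budget per event at a given rate and, using Little's law, how many events are in flight at a given latency. The function names and the sample rates are illustrative assumptions, not figures from the talk.

```python
def per_event_budget_us(events_per_second: float) -> float:
    """Average time per event, in microseconds, if events were handled one at a time."""
    return 1_000_000 / events_per_second


def events_in_flight(events_per_second: float, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time spent in the system."""
    return events_per_second * latency_ms / 1000


if __name__ == "__main__":
    # Hypothetical rates: "hundreds of thousands or millions" of events per second.
    for rate in (100_000, 1_000_000):
        print(rate, per_event_budget_us(rate), events_in_flight(rate, latency_ms=5))
```

At one million events per second the serial budget is a single microsecond per event, so single-digit-millisecond latencies are only reachable by keeping thousands of events in flight concurrently.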
Cloud Native Qualities
+ DBaaS
+ Single-cloud vs. Multi-cloud?
+ Multi-datacenter
+ Elasticity
+ Serverless
+ Orchestration
+ DevSecOps
All the “-ilities”
+ Scalability
+ Reliability
+ Durability
+ Manageability
+ Observability
+ Flexibility
+ Facility / Usability
+ Compatibility
+ Interoperability
+ Linearizability
Event-Driven
+ “Batch” → “Stream”
+ Change Data Capture (CDC)
+ Sink & Source
+ Time Series
+ Event Streaming
+ Event Sourcing* [* ≠ Event Streaming]
Best Fit to Use Case
+ SQL or NoSQL?
+ Query Language
+ Data Model
+ Data Distribution
+ Workload [R/W]
+ Speed
+ Price/TCO/ROI
While many database systems have been incrementally adapted to cloud native
environments, they still have underlying architectural limits or presumptions.
+ Strong consistency / record-locking — limits latencies & throughput
+ Single primary server for read/writes — replicas are read-only or only for failover;
bottlenecks write-heavy workloads
+ Local clustering/single datacenter design — inappropriate for high availability;
hampers global distribution; lack of topology-awareness induces fragility
Two flavors of responses:
+ NoSQL — Designed for non-relational data models, various query languages,
high availability distributed systems
+ Key value, document, wide column, graph, etc.
+ NewSQL — Still RDBMS, still SQL, but designed to operate as a highly
available distributed system
Database-as-a-Service (DBaaS)
+ Lift-and-Shift to Cloud — Same base offering as on-premises version,
offered as a cloud-hosted managed service
+ Easy/fast to bring to market, but no fundamental design changes
+ Cloud Native — Designed from-the-ground-up for cloud [only] usage
+ Elasticity — Dynamic provisioning, scale up/down for throughput, storage
+ Serverless — Do I need to know what hardware I’m running on?
+ Microservices & API integration — App integration, connectors, DevEx
+ Billing — making it easy to consume & measure ROI/TCO
+ Governance: Privacy Compliance / Data Localization
What does a database need to be, or have, or do, to properly support
event streaming in 2022?
+ High Availability [“Always On”]
+ Impedance Match of Database to Event Streaming Systems
+ Similar characteristics for throughput, latency
+ All the Appropriate “Goesintos/Goesouttas”
+ Sink Connector
+ Change Data Capture (CDC) / Source Connector
+ Supports your favorite streaming flavor of the day
+ Kafka, Pulsar, RabbitMQ Streams, etc.
Event Streaming Journey of a
NoSQL Database: ScyllaDB
ScyllaDB: Building on “Good Bones”
+ Performant: Shard-per-core, async-everywhere, shared-nothing architecture
+ Scalable: both horizontal [100s/1000s of nodes] & vertical [100s/1000s cores]
+ Available: Peer-to-Peer, Active-Active; no single point of failure
+ Distribution: Multi-datacenter clustering & replication, auto-sharding
+ Consistency: tunable; primarily eventual, but also Lightweight Transactions (LWT)
+ Topology Aware: Shard-aware, Node-aware, Rack-aware, Datacenter-aware
+ Compatible: Cassandra CQL & Amazon DynamoDB APIs
ScyllaDB Journey to Event Streaming — Starting with Kafka
+ Shard-Aware Kafka Sink Connector [January 2020]
+ Github: https://github.com/scylladb/kafka-connect-scylladb
+ Blog: https://www.scylladb.com/2020/02/18/introducing-the-kafka-scylla-connector/
+ Change Data Capture [January 2020 – October 2021]
+ January 2020: ScyllaDB Open Source 3.2 — Experimental
+ Course of 2020 - 3.3, 3.4, 4.0, 4.1, 4.2 — Experimental iterations
+ January 2021: 4.3: Production-ready, new API
+ March 2021: 4.4: new API
+ October 2021: 4.5: performance & stability
+ CDC Kafka Source Connector [April 2021]
+ Github: https://github.com/scylladb/scylla-cdc-source-connector
+ Blog: https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/
ScyllaDB Journey to Event Streaming with Pulsar
+ Pulsar Consumer: Cassandra Sink Connector
+ Comes by default with Pulsar
+ ScyllaDB is Cassandra CQL compatible
+ Docs: https://pulsar.apache.org/docs/io-cassandra-sink/
+ Github: https://github.com/apache/pulsar/blob/master/site2/docs/io-cassandra-sink.md
+ Pulsar Producer: Can use ScyllaDB CDC Source Connector using Kafka Compatibility
+ Pulsar makes it easy to bring Kafka topics into Pulsar
+ Docs: https://pulsar.apache.org/docs/adaptors-kafka/
+ Potential Developments:
+ Native Pulsar Shard-Aware ScyllaDB Consumer Connector — even faster ingestion
+ Native CDC Pulsar Producer — unwrap your topics
ScyllaDB CDC:
How Does It Work?
ScyllaDB Quickstart: Create a Table and Enable CDC
CREATE TABLE ks.tbl (
    pk int,
    ck int,
    val int,
    col set<int>,
    PRIMARY KEY (pk, ck)
) WITH cdc = { 'enabled': true };
CDC Options - Record Types
+ Delta: What was changed?
  + 'full': contains information about every modified column
  + 'keys': only the primary key of the change is recorded
+ Preimage: What was before?
  + 'false': disables the feature
  + 'true': contains only the columns that were changed by the write
  + 'full': contains the entire row (how it was before the write was made)
+ Postimage: What's the end result?
  + 'false': disables the feature
  + 'true': shows the affected row's state after the write. The postimage row always contains all the columns, whether or not they were affected by the change
CDC Options - Enabled and TTL
+ Enabled
  + 'false': disables the CDC feature
  + 'true': enables the CDC feature
+ TTL
  + 86400: in seconds. By default, records in the CDC log table expire within 24 hours
  + If set to 0, a separate cleaning mechanism is recommended
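As a minimal sketch of how the options above combine, the Python helper below validates a set of CDC options against the values listed on these slides and renders the corresponding `cdc = {...}` clause. The helper names (`cql_literal`, `cdc_clause`) are hypothetical, and whether a given ScyllaDB version accepts every combination should be checked against its documentation.

```python
# Allowed values as listed on the slides; 'ttl' is validated separately.
ALLOWED = {
    "enabled": (True, False),
    "delta": ("full", "keys"),
    "preimage": (False, True, "full"),
    "postimage": (False, True),
}


def cql_literal(value):
    """Render a Python value as it would appear inside the cdc option map."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value) + "'"


def cdc_clause(**options):
    """Build a `cdc = {...}` clause (hypothetical helper, not a driver API)."""
    for name, value in options.items():
        if name == "ttl":
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError("ttl must be a non-negative number of seconds")
        elif name not in ALLOWED:
            raise ValueError(f"unknown CDC option: {name}")
        elif not isinstance(value, (bool, str)) or value not in ALLOWED[name]:
            raise ValueError(f"invalid value for {name}: {value!r}")
    body = ", ".join(f"'{k}': {cql_literal(v)}" for k, v in options.items())
    return "cdc = {" + body + "}"


if __name__ == "__main__":
    clause = cdc_clause(enabled=True, preimage="full", postimage=True, ttl=86400)
    print(f"ALTER TABLE ks.tbl WITH {clause};")
```

Enabling both preimage and postimage, as in the usage example, gives consumers the "before" and "after" row state at the cost of an extra read on the write path.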
CDC Log Table
cqlsh> desc table ks.tbl_scylla_cdc_log;
CREATE TABLE ks.tbl_scylla_cdc_log (
    "cdc$stream_id" blob,
    "cdc$time" timeuuid,
    "cdc$batch_seq_no" int,
    "cdc$deleted_col" boolean,
    "cdc$deleted_elements_col" frozen<set<int>>,
    "cdc$deleted_val" boolean,
    "cdc$end_of_batch" boolean,
    "cdc$operation" tinyint,
    "cdc$ttl" bigint,
    ck int,
    col frozen<set<int>>,
    pk int,
    val int,
    PRIMARY KEY ("cdc$stream_id", "cdc$time", "cdc$batch_seq_no")
)
Primary key components: "cdc$stream_id" is the partition key; rows are sorted by time ("cdc$time"), then by batch sequence ("cdc$batch_seq_no").
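The primary key of the log table suggests how a consumer can replay changes in order. The sketch below is an assumption-laden illustration rather than the connector's actual logic: it takes log rows as plain dictionaries keyed by the column names shown in the table, sorts them by stream, time, and batch sequence number, and closes a batch whenever `cdc$end_of_batch` is true. `make_timeuuid` is a hypothetical helper that only exists to build sample data.

```python
import uuid


def make_timeuuid(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (sample-data helper only)."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                  # time_low
        (timestamp >> 32) & 0xFFFF,              # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,   # time_hi with version 1
        0x80, 0x00, 0,                           # clock sequence and node
    ))


def group_into_batches(log_rows):
    """Order rows by (stream, time, batch seq) and split at cdc$end_of_batch."""
    ordered = sorted(
        log_rows,
        key=lambda r: (r["cdc$stream_id"], r["cdc$time"].time, r["cdc$batch_seq_no"]),
    )
    batches, current = [], []
    for row in ordered:
        current.append(row)
        if row["cdc$end_of_batch"]:
            batches.append(current)
            current = []
    if current:  # trailing rows whose end-of-batch marker has not been seen
        batches.append(current)
    return batches
```

Sorting on the timestamp embedded in the timeuuid, rather than on the UUID's raw bytes, keeps rows in chronological order within each stream.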
How Do NoSQL CDC Implementations Compare?

Feature                 Cassandra      DynamoDB             MongoDB              ScyllaDB
Consumer location       on-node        off-node             off-node             off-node
Replication             duplicated     deduplicated         deduplicated         deduplicated
Deltas                  yes            no                   partial              optional
Pre-image               no             yes                  no                   optional
Post-image              no             yes                  yes                  optional
Slow consumer reaction  Table stopped  Consumer loses data  Consumer loses data  Consumer loses data
Ordering                no             yes                  yes                  yes
Writing to Base Table [No CDC]
INSERT INTO base_table(...)...
+ The CQL write goes to the coordinator node.
+ The coordinator node creates write calls to the replica nodes (replicated writes).
Writing to CDC Enabled Table
INSERT INTO base_table(...)...
+ The CQL write goes to the coordinator node.
+ If required, the coordinator reads existing row data for pre-/postimage generation (optional preimage read).
+ The coordinator creates CDC log table writes and piggybacks them on the base table writes to the same replica nodes. While the data size written is larger, the number of write requests does not change.
CDC Streams
▪ CDC data is grouped into streams
• Streams divide the token ring space
• Each stream represents a tokenization “slot” in the current topology
• The stream is the log partition key
• The stream for a given write is chosen based on base table PK tokenization
▪ Can read from all, one or some streams at a time
• Allows “round-robin” traversal of the data space to avoid too-large or cross-node queries
[Diagram: token ring divided into streams 1, 2, 3, 4...]
[Diagram: token ring with CDC streams 1, 2, 3, 4..., the CDC Java driver, the Kafka source connector, and the Kafka broker]
The Java driver handles round-robin traversal.
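The idea of streams as "slots" on the token ring can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions: tokens are signed 64-bit integers, the ring is split into equal slots, and `token_for_key` uses a stand-in hash rather than ScyllaDB's real partitioner. All function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

MIN_TOKEN = -2**63  # assumption: tokens modeled as signed 64-bit integers


def make_stream_bounds(stream_count):
    """Inclusive upper token bound of each equal-sized slot on the ring."""
    return [MIN_TOKEN + (2**64 * (i + 1)) // stream_count - 1
            for i in range(stream_count)]


def token_for_key(partition_key: bytes) -> int:
    """Stand-in hash for illustration; NOT the partitioner ScyllaDB uses."""
    digest = hashlib.blake2b(partition_key, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def stream_for_token(token, bounds):
    """Index of the slot (stream) whose token range contains the token."""
    return bisect_left(bounds, token)


def round_robin(stream_ids, streams_per_query):
    """Yield small groups of streams so each log query stays narrow."""
    for start in range(0, len(stream_ids), streams_per_query):
        yield stream_ids[start:start + streams_per_query]


if __name__ == "__main__":
    bounds = make_stream_bounds(4)
    print(stream_for_token(token_for_key(b"pk-42"), bounds))
    print(list(round_robin([0, 1, 2, 3], streams_per_query=3)))
```

Because the stream is derived from the base table partition key's token, all changes to one partition land in the same stream, which is what lets a consumer walk the streams a few at a time without issuing cross-node queries.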
Change Data Capture (CDC) lesson here:
https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/
Learn NoSQL for free!
university.scylladb.com
Peter Corless
Thank you!
peter@scylladb.com
@petercorless
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Distributed Database Design Decisions to Support High Performance Event Streaming - Pulsar Summit SF 2022

  • 1. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Ecosystem Distributed Database Design Decisions to Support High Performance Event Streaming Peter Corless Director of Technical Advocacy • ScyllaDB
  • 2. Peter Corless is the Director of Technical Advocacy of ScyllaDB, the company behind the monstrously fast and scalable NoSQL database. He is the editor of and frequent contributor to the ScyllaDB blog, and program chair of the ScyllaDB Summit and P99 CONF. He recently hosted the Distributed Systems Masterclass, co-sponsored by StreamNative+ScyllaDB Peter Corless Director of Technical Advocacy ScyllaDB
  • 3. Requirements for This Next Tech Cycle
  • 4. This Next Tech Cycle (timeline, 2000 to 2025+). Transistor count and core count: Pentium 4 (2000), 42M, 1 core; Pentium D (2005), 228M, 2 cores; Xeon Nehalem-EX (2010), 2.3B, 8 cores; SPARC M7 (2015), 10B, 32 cores; Epyc Rome (2019), 39B, 64 cores; Epyc Genoa (2022), ~60B?, 96 cores; Epyc Bergamo (2023), ~80B?, 128 cores. Zettabyte Era (10^21 bytes): 2 ZB data stored (2010); 1.2 ZB IP traffic (2016); 64 ZB data stored (2020); ~180 ZB data stored (2025). Broadband speeds: 1.5 Mbps (2002); 16 Mbps (2008); 105 Mbps (2014); 1 Gbps (2018); 3 Gbps (2021). Wireless services: 3G (2002); 4G (2014); 5G (2018). Public cloud to multicloud: AWS (2006); GCP (2008); Azure (2010); Azure Arc.
  • 5. Hardware Infrastructure is Evolving + Compute + From 100+ cores → 1,000+ cores per server + From multicore CPUs → full System on a Chip (SoC) designs (CPU, GPU, Cache, Memory) + Memory + Terabyte-scale RAM per server + DDR4: 800 to 1600 MHz, 2011-present + DDR5: 4600 MHz in 2020, 8000 MHz by 2024 + DDR6: 9600 MHz by 2025 + Storage + Petabyte-scale storage per server + NVMe 2.0 [2021]: separation of base and transport
  • 6. Databases are Evolving + Consistency Models [CAP Model: AP vs. CP] + Strong, Eventual, Tunable + ACID vs. BASE + Data Model / Query Languages [SQL vs. NoSQL] + RDBMS / SQL + NoSQL [Document, Key-Value, Wide-Column, Graph] + Big Data → HUGE Data + Data Stored: Gigabytes? Terabytes? Petabytes? Exabytes? + Payload Sizes: Kilobytes? Megabytes? + OPS / TPS: Hundreds of thousands? Millions? + Latencies: Sub-millisecond? Single-digit milliseconds?
  • 7. Databases are [or should be] designed for specific kinds of data, specific kinds of workloads, and specific kinds of queries. How closely a database's design and implementation align with your specific use case determines the resistance of the system. Variable resistors, anyone?
  • 8. Sure, you can use various databases for tasks they were never designed for, but should you? [Image label: DATA ENGINEERS]
  • 9. $\Delta\,\text{Data} / t$, where $t \sim n \times 0.001\,\text{s}$. For a database to be appropriate for event streaming, it needs to support managing changes to data over time in “real time”, measured in single-digit milliseconds or less, where changes to data can be produced at a rate of hundreds of thousands or millions of events per second [and greater rates in future].
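As a rough worked example of the rates and time scales on this slide, the sketch below counts how many events arrive during a window of t = n × 0.001 s. The function name and the sample rates are illustrative choices, not figures from the talk beyond "hundreds of thousands or millions of events per second".

```python
def events_in_window(events_per_second: int, n_ms: int) -> int:
    """Events produced during a window of t = n * 0.001 s (integer arithmetic)."""
    return events_per_second * n_ms // 1000


if __name__ == "__main__":
    # Sample rates and window sizes are illustrative only.
    for rate in (100_000, 1_000_000):
        for n in (1, 5, 9):
            count = events_in_window(rate, n)
            print(f"{rate:>9,} events/s, t = {n} ms: {count:,} events")
```

Even a single-digit-millisecond window at these rates covers hundreds to thousands of changes, which is why per-operation latency and sustained throughput have to be considered together.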
  • 10. Cloud Native Qualities: DBaaS; Single-cloud vs. Multi-cloud?; Multi-datacenter; Elasticity; Serverless; Orchestration; DevSecOps. All the “-ilities”: Scalability; Reliability; Durability; Manageability; Observability; Flexibility; Facility / Usability; Compatibility; Interoperability; Linearizability. Event-Driven: “Batch” → “Stream”; Change Data Capture (CDC); Sink & Source; Time Series; Event Streaming; Event Sourcing* [* ≠ Event Streaming]. Best Fit to Use Case: SQL or NoSQL?; Query Language; Data Model; Data Distribution; Workload [R/W]; Speed; Price/TCO/ROI.
  • 11. While many database systems have been incrementally adapted to cloud native environments, they still have underlying architectural limits or presumptions. + Strong consistency / record-locking: limits latencies & throughput + Single primary server for read/writes: replicas are read-only or only for failover; bottlenecks write-heavy workloads + Local clustering/single datacenter design: inappropriate for high availability; hampers global distribution; lack of topology-awareness induces fragility
  • 12. Two flavors of responses: + NoSQL: Designed for non-relational data models, various query languages, high availability distributed systems + Key value, document, wide column, graph, etc. + NewSQL: Still RDBMS, still SQL, but designed to operate as a highly available distributed system
  • 13. Database-as-a-Service (DBaaS) + Lift-and-Shift to Cloud: Same base offering as on-premises version, offered as a cloud-hosted managed service + Easy/fast to bring to market, but no fundamental design changes + Cloud Native: Designed from-the-ground-up for cloud [only] usage + Elasticity: Dynamic provisioning, scale up/down for throughput, storage + Serverless: Do I need to know what hardware I’m running on? + Microservices & API integration: App integration, connectors, DevEx + Billing: making it easy to consume & measure ROI/TCO + Governance: Privacy Compliance / Data Localization
  • 14. What does a database need to be, or have, or do, to properly support event streaming in 2022? + High Availability [“Always On”] + Impedance Match of Database to Event Streaming Systems + Similar characteristics for throughput, latency + All the Appropriate “Goesintos/Goesouttas” + Sink Connector + Change Data Capture (CDC) / Source Connector + Supports your favorite streaming flavor of the day + Kafka, Pulsar, RabbitMQ Streams, etc.
  • 15. Event Streaming Journey of a NoSQL Database: ScyllaDB
  • 16. ScyllaDB: Building on “Good Bones” + Performant: Shard-per-core, async-everywhere, shared-nothing architecture + Scalable: both horizontal [100s/1000s of nodes] & vertical [100s/1000s cores] + Available: Peer-to-Peer, Active-Active; no single point of failure + Distribution: Multi-datacenter clustering & replication, auto-sharding + Consistency: tunable; primarily eventual, but also Lightweight Transactions (LWT) + Topology Aware: Shard-aware, Node-aware, Rack-aware, Datacenter-aware + Compatible: Cassandra CQL & Amazon DynamoDB APIs
  • 17. ScyllaDB Journey to Event Streaming: Starting with Kafka + Shard-Aware Kafka Sink Connector [January 2020] + GitHub: https://github.com/scylladb/kafka-connect-scylladb + Blog: https://www.scylladb.com/2020/02/18/introducing-the-kafka-scylla-connector/
  • 18. ScyllaDB Journey to Event Streaming: Starting with Kafka + Shard-Aware Kafka Sink Connector [January 2020] + GitHub: https://github.com/scylladb/kafka-connect-scylladb + Blog: https://www.scylladb.com/2020/02/18/introducing-the-kafka-scylla-connector/ + Change Data Capture [January 2020 – October 2021] + January 2020: ScyllaDB Open Source 3.2 (experimental) + Over the course of 2020: 3.3, 3.4, 4.0, 4.1, 4.2 (experimental iterations) + January 2021: 4.3 (production-ready, new API) + March 2021: 4.4 (new API) + October 2021: 4.5 (performance & stability)
  • 19. ScyllaDB Journey to Event Streaming: Starting with Kafka + Shard-Aware Kafka Sink Connector [January 2020] + GitHub: https://github.com/scylladb/kafka-connect-scylladb + Blog: https://www.scylladb.com/2020/02/18/introducing-the-kafka-scylla-connector/ + Change Data Capture [January 2020 – October 2021] + January 2020: ScyllaDB Open Source 3.2 (experimental) + Over the course of 2020: 3.3, 3.4, 4.0, 4.1, 4.2 (experimental iterations) + January 2021: 4.3 (production-ready, new API) + March 2021: 4.4 (new API) + October 2021: 4.5 (performance & stability) + CDC Kafka Source Connector [April 2021] + GitHub: https://github.com/scylladb/scylla-cdc-source-connector + Blog: https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/
  • 20. Distributed Database Design Decisions to Support High Performance Event Streaming
  • 21. ScyllaDB Journey to Event Streaming with Pulsar + Pulsar Consumer: Cassandra Sink Connector + Comes by default with Pulsar + ScyllaDB is Cassandra CQL compatible + Docs: https://pulsar.apache.org/docs/io-cassandra-sink/ + GitHub: https://github.com/apache/pulsar/blob/master/site2/docs/io-cassandra-sink.md + Pulsar Producer: Can use ScyllaDB CDC Source Connector using Kafka Compatibility + Pulsar makes it easy to bring Kafka topics into Pulsar + Docs: https://pulsar.apache.org/docs/adaptors-kafka/ + Potential Developments: + Native Pulsar Shard-Aware ScyllaDB Consumer Connector: even faster ingestion + Native CDC Pulsar Producer: unwrap your topics
  • 22. ScyllaDB CDC: How Does It Work?
  • 23. ScyllaDB Quickstart: Create a Table and Enable CDC: CREATE TABLE ks.tbl ( pk int, ck int, val int, col set<int>, PRIMARY KEY (pk, ck) ) WITH cdc = { 'enabled': true };
  • 24. CDC Options - Record Types. Delta (What was changed?): 'full': contains information about every modified column; 'keys': only the primary key of the change is recorded. Preimage (What was before?): 'false': disables the feature; 'true': contains only the columns that were changed by the write; 'full': contains the entire row (how it was before the write was made). Postimage (What’s the end result?): 'false': disables the feature; 'true': shows the affected row’s state after the write. The postimage row always contains all the columns, whether or not they were affected by the change.
  • 25. CDC Options - Record Types. Enabled: 'false': disables the CDC feature; 'true': enables the CDC feature. TTL: 86400, in seconds. By default, records in the CDC log table expire within 24 hours. If set to 0, a separate cleaning mechanism is recommended.
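The options on the two slides above can be collected into one `cdc = {...}` clause. The following is a minimal sketch that validates values against the lists on the slides and renders the clause as a string. The helper names are hypothetical, the lowercase option keys (`delta`, `preimage`, `postimage`, `ttl`) are assumed from the slide labels, and the `ALTER TABLE ... WITH cdc = {...}` form is assumed to mirror the `CREATE TABLE` quickstart; check the ScyllaDB documentation before relying on it.

```python
# Allowed values per option, as listed on the slides (keys assumed lowercase).
ALLOWED = {
    "enabled": (True, False),
    "delta": ("full", "keys"),
    "preimage": (False, True, "full"),
    "postimage": (False, True),
}


def _is_allowed(value, allowed):
    # Compare type as well as value so that 1 is not accepted in place of True.
    return any(type(value) is type(a) and value == a for a in allowed)


def _cql_literal(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return f"'{value}'"


def cdc_clause(**options):
    """Render a `cdc = {...}` clause from keyword options (hypothetical helper)."""
    for name, value in options.items():
        if name == "ttl":
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError("ttl must be a non-negative number of seconds")
        elif name not in ALLOWED:
            raise ValueError(f"unknown CDC option: {name}")
        elif not _is_allowed(value, ALLOWED[name]):
            raise ValueError(f"invalid value for {name}: {value!r}")
    body = ", ".join(f"'{k}': {_cql_literal(v)}" for k, v in options.items())
    return "cdc = {" + body + "}"


if __name__ == "__main__":
    clause = cdc_clause(enabled=True, preimage="full", postimage=True, ttl=86400)
    print(f"ALTER TABLE ks.tbl WITH {clause};")
```

Validating before rendering keeps a typo such as `delta=True` from reaching the database as a malformed statement.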
  • 26. CDC Log Table. cqlsh> desc table ks.tbl_scylla_cdc_log; CREATE TABLE ks.tbl_scylla_cdc_log ( "cdc$stream_id" blob, "cdc$time" timeuuid, "cdc$batch_seq_no" int, "cdc$deleted_col" boolean, "cdc$deleted_elements_col" frozen<set<int>>, "cdc$deleted_val" boolean, "cdc$end_of_batch" boolean, "cdc$operation" tinyint, "cdc$ttl" bigint, ck int, col frozen<set<int>>, pk int, val int, PRIMARY KEY ("cdc$stream_id", "cdc$time", "cdc$batch_seq_no") ) Primary key annotations: "cdc$stream_id" is the partition key; rows are sorted by time ("cdc$time"), then by batch sequence ("cdc$batch_seq_no").
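The primary key above implies an ordering: log rows are partitioned by stream, then sorted by time and batch sequence number. A small in-memory sketch of that ordering follows. The rows are made-up sample data, and plain integers stand in for the `timeuuid` values of `cdc$time`.

```python
from collections import defaultdict


def order_log_rows(rows):
    """Group rows by stream id, then sort each group by (time, batch_seq_no)."""
    by_stream = defaultdict(list)
    for row in rows:
        by_stream[row["stream_id"]].append(row)
    for stream_rows in by_stream.values():
        stream_rows.sort(key=lambda r: (r["time"], r["batch_seq_no"]))
    return dict(by_stream)


if __name__ == "__main__":
    # Made-up rows; integer times are a stand-in for timeuuid values.
    rows = [
        {"stream_id": b"\x02", "time": 10, "batch_seq_no": 0, "pk": 7},
        {"stream_id": b"\x01", "time": 12, "batch_seq_no": 1, "pk": 1},
        {"stream_id": b"\x01", "time": 12, "batch_seq_no": 0, "pk": 1},
        {"stream_id": b"\x01", "time": 11, "batch_seq_no": 0, "pk": 3},
    ]
    for stream_id, stream_rows in order_log_rows(rows).items():
        keys = [(r["time"], r["batch_seq_no"]) for r in stream_rows]
        print(stream_id, keys)
```

A consumer that reads one stream at a time therefore sees that stream's changes in time order without any client-side sorting; the sort here only mimics what the clustering key provides.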
  • 27. How Do NoSQL CDC Implementations Compare?
      Feature | Cassandra | DynamoDB | MongoDB | ScyllaDB
      Consumer location | on-node | off-node | off-node | off-node
      Replication | duplicated | deduplicated | deduplicated | deduplicated
      Deltas | yes | no | partial | optional
      Pre-image | no | yes | no | optional
      Post-image | no | yes | yes | optional
      Slow consumer reaction | Table stopped | Consumer loses data | Consumer loses data | Consumer loses data
      Ordering | no | yes | yes | yes
  • 28. Writing to Base Table [No CDC]: CQL write goes to coordinator node. INSERT INTO base_table(...)...
  • 29. Writing to Base Table [No CDC]: Coordinator node creates write calls to replica nodes (CQL, replicated writes). INSERT INTO base_table(...)...
  • 30. Writing to CDC Enabled Table: CQL write goes to coordinator node. INSERT INTO base_table(...)...
  • 31. Writing to CDC Enabled Table (post/preimage): If required, the coordinator reads existing row data for pre-/postimage generation (CQL, optional preimage read). INSERT INTO base_table(...)...
  • 32. Writing to CDC Enabled Table: Coordinator creates CDC log table writes and piggybacks them on base table writes to the same replica nodes (CQL, CDC write). While the data size written is larger, the number of write requests does not change. INSERT INTO base_table(...)...
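The write path in slides 28 to 32 can be modeled in a few lines. This is a toy sketch, not ScyllaDB's coordinator code: the replica names and the request shape are invented for illustration. It shows the point made on slide 32, that enabling CDC makes each replica request carry more data without adding requests.

```python
def plan_writes(replicas, cdc_enabled):
    """Return one write request per replica; CDC adds a mutation, not a request."""
    requests = []
    for replica in replicas:
        mutations = ["base_table"]
        if cdc_enabled:
            # The CDC log write rides along with the base table write.
            mutations.append("cdc_log")
        requests.append({"replica": replica, "mutations": mutations})
    return requests


if __name__ == "__main__":
    replicas = ["node-a", "node-b", "node-c"]  # hypothetical replica set
    without_cdc = plan_writes(replicas, cdc_enabled=False)
    with_cdc = plan_writes(replicas, cdc_enabled=True)
    print(len(without_cdc), len(with_cdc))
    print(with_cdc[0])
```

The optional preimage read from slide 31 is left out of the model; it would be an extra read before the writes are planned, not an extra write request.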
  • 33. CDC Streams ▪ CDC data is grouped into streams • Divides the token ring space • Each stream represents a tokenization “slot” in current topology • Stream is log partition key • Stream chosen for given write based on base table PK tokenization ▪ Can read from all, one or some streams at a time • Allows “round-robin” traversal of data space to avoid too large or cross-node queries [Diagram: token ring divided into Stream 1, 2, 3, 4...]
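To make the idea of streams as slots on the token ring concrete, here is a toy sketch of choosing a stream from a base table partition key. Everything specific in it is an assumption for illustration: the ring is shrunk to tokens 0..99, `toy_token` is a stand-in for the real hash-based partitioner, and the four stream boundaries are invented.

```python
import bisect

# Hypothetical inclusive upper token bounds for four streams on a toy ring 0..99.
STREAM_UPPER_BOUNDS = [24, 49, 74, 99]
STREAM_IDS = ["stream-1", "stream-2", "stream-3", "stream-4"]


def toy_token(pk: int) -> int:
    """Stand-in tokenizer; a real cluster uses a hash-based partitioner."""
    return pk % 100


def stream_for_token(token: int) -> str:
    """Find the stream whose slot of the ring contains the token."""
    if not 0 <= token <= STREAM_UPPER_BOUNDS[-1]:
        raise ValueError(f"token {token} is outside the toy ring")
    index = bisect.bisect_left(STREAM_UPPER_BOUNDS, token)
    return STREAM_IDS[index]


def stream_for_pk(pk: int) -> str:
    return stream_for_token(toy_token(pk))


if __name__ == "__main__":
    for pk in (5, 130, 260, 399):
        print(pk, toy_token(pk), stream_for_pk(pk))
```

Because the stream is derived from the same token as the base table partition, a change and its log entry land in the same slot of the ring, which is what lets the log write share replicas with the base write.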
  • 34. CDC Streams. The Java driver handles round-robin traversal. [Diagram labels: token ring; CDC Streams (Stream 1, 2, 3, 4...); CDC Java Driver; Kafka Source Conn.; Kafka Broker]
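The round-robin traversal mentioned above can be pictured with a small in-memory model. This is not the CDC Java driver's implementation: pending changes are modeled as per-stream queues, and the function name is hypothetical. Each turn reads from exactly one stream, so no single read spans the whole data space.

```python
from collections import deque


def traverse_round_robin(pending):
    """Yield (stream_id, change) pairs, taking one change per stream per turn.

    `pending` maps stream id to a deque of unread changes; a stream leaves
    the rotation once its queue is empty.
    """
    order = deque(pending)
    while order:
        stream_id = order.popleft()
        queue = pending[stream_id]
        if queue:
            yield stream_id, queue.popleft()
            order.append(stream_id)


if __name__ == "__main__":
    # Made-up pending changes for three streams.
    pending = {
        "stream-1": deque(["a", "b"]),
        "stream-2": deque(["c"]),
        "stream-3": deque(["d", "e"]),
    }
    for stream_id, change in traverse_round_robin(pending):
        print(stream_id, change)
```

In a live system new changes keep arriving, so a consumer would keep every stream in the rotation and track a per-stream position instead of dropping empty streams as this model does.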
  • 35. Change Data Capture (CDC) lesson here: https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/ Learn NoSQL for free! university.scylladb.com
  • 36. Peter Corless Thank you! peter@scylladb.com @petercorless Pulsar Summit San Francisco Hotel Nikko August 18 2022