Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Ecosystem
Distributed Database
Design Decisions to
Support High
Performance Event
Streaming
Peter Corless
Director of Technical Advocacy • ScyllaDB
Peter Corless is the Director of
Technical Advocacy at ScyllaDB, the
company behind the monstrously fast
and scalable NoSQL database.
He is the editor of and frequent
contributor to the ScyllaDB blog, and
program chair of the ScyllaDB Summit
and P99 CONF.
He recently hosted the Distributed
Systems Masterclass, co-sponsored by
StreamNative+ScyllaDB
Peter Corless
Director of Technical Advocacy
ScyllaDB
Distributed Database Design Decisions to Support High Performance Event Streaming
Requirements for
This Next Tech Cycle
This Next Tech Cycle: 2000 → 2025+
+ Transistor Count: 42M Pentium 4 (2000); 228M Pentium D (2005); 2.3B Xeon Nehalem-EX (2010); 10B SPARC M7 (2015); 39B Epyc Rome (2019); ~60B? Epyc Genoa (2022); ~80B? Epyc Bergamo (2023)
+ Core Count: 1 → 2 → 8 → 32 → 64 → 96 → 128
+ Zettabyte Era (1 ZB = 10²¹ bytes): 2 ZB data stored (2010); 1.2 ZB IP traffic (2016); 64 ZB data stored (2020); ~180 ZB data stored (2025)
+ Broadband Speeds: 1.5 Mbps (2002); 16 Mbps (2008); 105 Mbps (2014); 1 Gbps (2018); 3 Gbps (2021)
+ Wireless Services: 3G (2002); 4G (2014); 5G (2018)
+ Public Cloud to Multicloud: AWS (2006); GCP (2008); Azure (2010); Azure Arc
Hardware Infrastructure is Evolving
+ Compute
+ From 100+ cores → 1,000+ cores per server
+ From multicore CPUs → full System on a Chip (SoC) designs (CPU,
GPU, Cache, Memory)
+ Memory
+ Terabyte-scale RAM per server
+ DDR4 — 800 to 1600 MHz, 2011-present
+ DDR5 — 4600 MHz in 2020, 8000 MHz by 2024
+ DDR6 — 9600 MHz by 2025
+ Storage
+ Petabyte-scale storage per server
+ NVMe 2.0 [2021] — separation of base and transport
Databases are Evolving
+ Consistency Models [CAP Model: AP vs. CP]
+ Strong, Eventual, Tunable
+ ACID vs. BASE
+ Data Model / Query Languages [SQL vs. NoSQL]
+ RDBMS / SQL
+ NoSQL [Document, Key-Value, Wide-Column, Graph]
+ Big Data → HUGE Data
+ Data Stored: Gigabytes? Terabytes? Petabytes? Exabytes?
+ Payload Sizes: Kilobytes? Megabytes?
+ OPS / TPS: Hundreds of thousands? Millions?
+ Latencies: Sub-millisecond? Single-digit milliseconds?
Databases are [or should be] designed for
specific kinds of data, specific kinds of
workloads, and specific kinds of queries.
How closely a database's design and
implementation align with your specific use
case determines the resistance of the system.
Variable resistors, anyone?
Sure you can use various databases for tasks they were
never designed for — but should you?
DATA ENGINEERS
Δdata / t, where t ≈ n × 0.001 s
For a database to be appropriate for event streaming, it needs to
support managing changes to data over time in “real time” —
measured in single-digit milliseconds or less — even when those
changes are produced at a rate of hundreds of thousands or
millions of events per second. [And greater rates in future.]
Cloud Native Qualities:
+ DBaaS
+ Single-cloud vs. Multi-cloud?
+ Multi-datacenter
+ Elasticity
+ Serverless
+ Orchestration
+ DevSecOps
All the “-ilities”:
+ Scalability
+ Reliability
+ Durability
+ Manageability
+ Observability
+ Flexibility
+ Facility / Usability
+ Compatibility
+ Interoperability
+ Linearizability
Event-Driven:
+ “Batch” → “Stream”
+ Change Data Capture (CDC)
+ Sink & Source
+ Time Series
+ Event Streaming
+ Event Sourcing* [* ≠ Event Streaming]
Best Fit to Use Case:
+ SQL or NoSQL?
+ Query Language
+ Data Model
+ Data Distribution
+ Workload [R/W]
+ Speed
+ Price/TCO/ROI
While many database systems have been incrementally adapted to cloud native
environments, they still have underlying architectural limits or presumptions.
+ Strong consistency / record-locking — limits latencies & throughput
+ Single primary server for read/writes — replicas are read-only or only for failover;
bottlenecks write-heavy workloads
+ Local clustering/single datacenter design — inappropriate for high availability;
hampers global distribution; lack of topology-awareness induces fragility
Two flavors of responses:
+ NoSQL — Designed for non-relational data models, various query languages,
high availability distributed systems
+ Key value, document, wide column, graph, etc.
+ NewSQL — Still RDBMS, still SQL, but designed to operate as a highly
available distributed system
Database-as-a-Service (DBaaS)
+ Lift-and-Shift to Cloud — Same base offering as on-premises version,
offered as a cloud-hosted managed service
+ Easy/fast to bring to market, but no fundamental design changes
+ Cloud Native — Designed from-the-ground-up for cloud [only] usage
+ Elasticity — Dynamic provisioning, scale up/down for throughput, storage
+ Serverless — Do I need to know what hardware I’m running on?
+ Microservices & API integration — App integration, connectors, DevEx
+ Billing — making it easy to consume & measure ROI/TCO
+ Governance: Privacy Compliance / Data Localization
What does a database need to be, or have, or do, to properly support
event streaming in 2022?
+ High Availability [“Always On”]
+ Impedance Match of Database to Event Streaming Systems
+ Similar characteristics for throughput, latency
+ All the Appropriate “Goesintos/Goesouttas”
+ Sink Connector
+ Change Data Capture (CDC) / Source Connector
+ Supports your favorite streaming flavor of the day
+ Kafka, Pulsar, RabbitMQ Streams, etc.
Event Streaming Journey of a
NoSQL Database: ScyllaDB
ScyllaDB: Building on “Good Bones”
+ Performant: Shard-per-core, async-everywhere, shared-nothing architecture
+ Scalable: both horizontal [100s/1000s of nodes] & vertical [100s/1000s cores]
+ Available: Peer-to-Peer, Active-Active; no single point of failure
+ Distribution: Multi-datacenter clustering & replication, auto-sharding
+ Consistency: tunable; primarily eventual, but also Lightweight Transactions (LWT)
+ Topology Aware: Shard-aware, Node-aware, Rack-aware, Datacenter-aware
+ Compatible: Cassandra CQL & Amazon DynamoDB APIs
ScyllaDB Journey to Event Streaming — Starting with Kafka
+ Shard-Aware Kafka Sink Connector [January 2020]
+ Github: https://github.com/scylladb/kafka-connect-scylladb
+ Blog: https://www.scylladb.com/2020/02/18/introducing-the-kafka-scylla-connector/
+ Change Data Capture [January 2020 – October 2021]
+ January 2020: ScyllaDB Open Source 3.2 — Experimental
+ Course of 2020 - 3.3, 3.4, 4.0, 4.1, 4.2 — Experimental iterations
+ January 2021: 4.3: Production-ready, new API
+ March 2021: 4.4: new API
+ October 2021: 4.5: performance & stability
+ CDC Kafka Source Connector [April 2021]
+ Github: https://github.com/scylladb/scylla-cdc-source-connector
+ Blog: https://debezium.io/blog/2021/09/22/deep-dive-into-a-debezium-community-connector-scylla-cdc-source-connector/
ScyllaDB Journey to Event Streaming with Pulsar
+ Pulsar Consumer: Cassandra Sink Connector
+ Comes by default with Pulsar
+ ScyllaDB is Cassandra CQL compatible
+ Docs: https://pulsar.apache.org/docs/io-cassandra-sink/
+ Github: https://github.com/apache/pulsar/blob/master/site2/docs/io-cassandra-sink.md
+ Pulsar Producer: Can use ScyllaDB CDC Source Connector using Kafka Compatibility
+ Pulsar makes it easy to bring Kafka topics into Pulsar
+ Docs: https://pulsar.apache.org/docs/adaptors-kafka/
+ Potential Developments:
+ Native Pulsar Shard-Aware ScyllaDB Consumer Connector — even faster ingestion
+ Native CDC Pulsar Producer — unwrap your topics
ScyllaDB CDC:
How Does It Work?
ScyllaDB Quickstart: Create a Table and Enable CDC
CREATE TABLE ks.tbl (
pk int,
ck int,
val int,
col set<int>,
PRIMARY KEY (pk, ck)
) WITH cdc = { 'enabled': true };
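With CDC enabled, writes to the base table transparently generate records in an auto-created log table. A minimal sketch, assuming the ks.tbl definition above (the log table name follows ScyllaDB's <base>_scylla_cdc_log convention):

```cql
-- Write to the base table as usual; no application changes are needed.
INSERT INTO ks.tbl (pk, ck, val, col) VALUES (1, 10, 100, {1, 2});

-- Read the generated change records from the auto-created log table.
SELECT * FROM ks.tbl_scylla_cdc_log;
```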
CDC Options - Record Types
Delta (“What was changed?”):
+ 'keys': only the primary key of the change is recorded
+ 'full': contains information about every modified column
Preimage (“What was before?”):
+ 'false': disables the feature
+ 'true': contains only the columns that were changed by the write
+ 'full': contains the entire row (as it was before the write was made)
Postimage (“What’s the end result?”):
+ 'false': disables the feature
+ 'true': shows the affected row’s state after the write; a postimage row always contains all the columns, whether or not they were affected by the change
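These record types are configured in the same cdc map used to enable the feature. A hedged sketch (option names per the ScyllaDB CDC documentation, applied to the quickstart's ks.tbl):

```cql
-- Record the full prior row ('preimage') and the resulting row
-- ('postimage') alongside the default delta records.
ALTER TABLE ks.tbl WITH cdc = {
    'enabled': true,
    'preimage': 'full',
    'postimage': true
};
```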
CDC Options - Enabled & TTL
Enabled:
+ 'true': enables the CDC feature
+ 'false': disables the CDC feature
TTL:
+ 86400: in seconds; by default, records in the CDC log table expire within 24 hours
+ If set to 0, records do not expire, and a separate cleaning mechanism is recommended
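The TTL is likewise set in the cdc map. A sketch, assuming the ks.tbl table from the quickstart:

```cql
-- Expire CDC log records after 1 hour instead of the default 86400 s.
ALTER TABLE ks.tbl WITH cdc = { 'enabled': true, 'ttl': 3600 };
```

Setting 'ttl': 0 keeps records indefinitely, which is why a separate cleaning mechanism is recommended in that case.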
CDC Log Table

cqlsh> DESC TABLE ks.tbl_scylla_cdc_log;

CREATE TABLE ks.tbl_scylla_cdc_log (
    "cdc$stream_id" blob,          -- partition key
    "cdc$time" timeuuid,           -- sorted by time
    "cdc$batch_seq_no" int,        -- batch sequence
    "cdc$deleted_col" boolean,
    "cdc$deleted_elements_col" frozen<set<int>>,
    "cdc$deleted_val" boolean,
    "cdc$end_of_batch" boolean,
    "cdc$operation" tinyint,
    "cdc$ttl" bigint,
    ck int,
    col frozen<set<int>>,
    pk int,
    val int,
    PRIMARY KEY ("cdc$stream_id", "cdc$time", "cdc$batch_seq_no")
)
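Consumers can query this log like any other CQL table; within a stream, records come back ordered by time and then batch sequence number. A sketch:

```cql
-- Inspect recent changes along with their cdc$ metadata columns.
SELECT "cdc$stream_id", "cdc$time", "cdc$operation", pk, ck, val
FROM ks.tbl_scylla_cdc_log
LIMIT 10;
```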
How Do NoSQL CDC Implementations Compare?

                        Cassandra      DynamoDB             MongoDB              ScyllaDB
Consumer location       on-node        off-node             off-node             off-node
Replication             duplicated     deduplicated         deduplicated         deduplicated
Deltas                  yes            no                   partial              optional
Pre-image               no             yes                  no                   optional
Post-image              no             yes                  yes                  optional
Slow consumer reaction  table stopped  consumer loses data  consumer loses data  consumer loses data
Ordering                no             yes                  yes                  yes
Writing to Base Table [No CDC]
CQL write goes to
coordinator node.
INSERT INTO base_table(...)...
Coordinator node creates
write calls to replica nodes.
INSERT INTO base_table(...)...
CQL
Replicated writes
Writing to Base Table [No CDC]
Writing to CDC Enabled Table
CQL write goes to
coordinator node.
INSERT INTO base_table(...)...
Writing to CDC Enabled Table (Pre-/Postimage)
If required, the coordinator reads existing
row data for pre-/postimage generation.
INSERT INTO base_table(...)...
CQL
(Opt) preimage read
Writing to CDC Enabled Table
The coordinator creates CDC log table writes
and piggybacks them on base table writes to the
same replica nodes. While the data size written
is larger, the number of write requests does not
change.
INSERT INTO base_table(...)...
CQL
CDC write
▪ CDC data is grouped into streams
• Divides the token ring space
• Each stream represents a tokenization “slot” in
current topology
• Stream is log partition key
• Stream chosen for given write based on base table
PK tokenization
▪ Can read from all, one or some streams at a time
• Allows “round-robin” traversal of data space to
avoid too large or cross-node queries
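Because the stream id is the log table's partition key, a consumer can poll one stream at a time with an ordinary CQL query. A sketch with a hypothetical stream id (real ids come from the cluster's current CDC generation):

```cql
-- Read changes for a single stream; the 0x... blob is a placeholder.
SELECT "cdc$time", "cdc$operation", pk, ck, val
FROM ks.tbl_scylla_cdc_log
WHERE "cdc$stream_id" = 0x0dc6e8e48a6f4f15a8dbcd2b4e85bd92;
```

In practice, the scylla-cdc client libraries and connectors handle stream discovery and round-robin traversal for you.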
[Diagram: token ring divided into Streams 1, 2, 3, 4...]
CDC Streams
[Diagram: CDC streams (Stream 1, 2, 3, 4...) on the token ring feed the Java driver and the Kafka source connector, which publishes to a Kafka broker.]
The Java driver handles round-robin traversal.
Change Data Capture (CDC) lesson here:
https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/
Learn NoSQL for free!
university.scylladb.com
Peter Corless
Thank you!
peter@scylladb.com
@petercorless
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
