Building a 100% ScyllaDB Shard-Aware Application Using Rust
Alexys Jacob, Joseph Perez, Yassir Barchi
Alexys Jacob, CTO
Joseph Perez, Senior Software Engineer
Yassir Barchi, Lead Software Engineer
The Path to a 100% Shard-Aware Application
Project Context at Numberly
The Omnichannel Delivery team was tasked with building a platform that could be the single point of entry for all the messages Numberly operates and routes for its clients.
■ Clients & Platforms send messages using REST API gateways (Email, SMS, WhatsApp)
■ Gateways render and relay the messages to the Central Message Routing Platform
■ The platform offers strong and consistent features for all channels:
■ Scheduling
■ Accounting
■ Tracing
■ Routing
(Diagrams: architecture before and after the Central Message Routing Platform)
Central Messaging Platform Constraints
■ High Availability
The platform is a Single Point of Failure for _all_ our messages, so it must be resilient.
■ Horizontal scalability
Ability to scale to match our message routing needs, no matter the channel.
Central Messaging Platform Guarantees
■ Observability
Expose per-channel metrics and allow per-message or per-batch tracing.
■ Idempotence
The platform guarantees that the same message can’t be sent twice.
Design Thinking & Key Concepts
We need to apply some key concepts in our design to keep up with the constraints and
guarantees of our platform.
Reliability
- Simple: a few share-(almost?)-nothing components
- Low coupling: keep remote dependencies to a minimum
- Coding language: performant with explicit patterns and strict paradigms
Design Thinking & Key Concepts
Scale
- Application layer: easy to deploy & scale with strong resilience
- Data bus: a high-throughput, highly resilient, horizontally scalable message bus with time- and order-preserving capabilities
- Data querying: low-latency, one-or-many query support
Idempotence
- Processing isolation: workload distribution should be deterministic
Considering Numberly’s stack, the first go-to architecture could have been…
Platform Architecture 101
Reliability
- HA with low coupling: relies on 3 data technologies
Scalability
- Easy to deploy: Kubernetes
- Data horizontal scaling: ScyllaDB / Kafka / Redis
- Data low-latency querying: ScyllaDB / Kafka / Redis
- Data ordered bus: ScyllaDB / Kafka / Redis
Idempotence
- Deterministic workload distribution: SUM(ScyllaDB + Kafka + Redis)?!
Platform Architecture Not So 101
Reliability
- HA with low coupling: use only ONE data technology
Scalability
- Easy to deploy: Kubernetes
- Data horizontal scaling: ScyllaDB / Kafka / Redis
- Data low-latency querying: ScyllaDB / Kafka / Redis
- Data ordered bus: ScyllaDB / Kafka / Redis
Idempotence
- Deterministic workload distribution: ScyllaDB?!
The Daring Architecture
What if I used ScyllaDB’s shard-per-core architecture inside my application?
ScyllaDB Shard-Per-Core Architecture
ScyllaDB shard-per-core data distribution and deterministic processing.
Using ScyllaDB Shard-Per-Core Architecture
Let’s align our application with ScyllaDB’s shard-per-core deterministic data distribution!
Using ScyllaDB’s shard-awareness at the core of our application, we gain (see the sketch after this list):
- Deterministic workload distribution
- Super-optimized data processing capacity, aligned from the application down to the storage layer
- Strong latency and isolation guarantees per application instance (pod)
- Infinite scale, patterned after ScyllaDB’s
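To make the alignment concrete, here is a minimal Rust sketch of the token-to-shard mapping used by shard-aware ScyllaDB drivers; it is an illustration rather than our production code, and it assumes nr_shards and msb_ignore are read from the cluster's sharding metadata while the Murmur3 token of the partition key is given as input.

// Map a partition key's Murmur3 token to a ScyllaDB shard, mirroring the
// calculation performed by shard-aware drivers. `nr_shards` and `msb_ignore`
// come from the node's sharding metadata.
fn shard_of(token: i64, nr_shards: u32, msb_ignore: u32) -> u32 {
    // Re-bias the signed token into unsigned space, drop the ignored
    // most-significant bits, then scale the result into [0, nr_shards).
    let mut biased = (token as u64).wrapping_add(1u64 << 63);
    biased <<= msb_ignore;
    ((biased as u128 * nr_shards as u128) >> 64) as u32
}

fn main() {
    // Made-up token value, for illustration only.
    let token: i64 = -7_509_452_495_886_106_294;
    println!("shard = {}", shard_of(token, 8, 12));
}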
The 100% Shard-Aware Application
Building a Shard-Aware Application
The Language Dilemma
■ We need a modern language that reflects our desire to build a reliable, secure and efficient platform.
■ The shard calculation algorithm requires performant hashing capabilities and great synergy with the ScyllaDB driver.
reliable + secure + efficient = Rust
Our Stack is Born
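As a hedged illustration of that synergy, here is the minimal shape of opening a session with the scylla Rust driver, assuming the tokio runtime; the contact point and the system.local probe query are placeholders rather than our actual bootstrap code, and method names can vary between driver versions.

use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder contact point; production code would list real nodes.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;
    // A shard-aware driver hashes each statement's partition key and routes
    // the request to the right node *and* the right shard (CPU core).
    session
        .query("SELECT cql_version FROM system.local WHERE key = ?", ("local",))
        .await?;
    Ok(())
}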
Deterministic Data Ingestion
(1) The Ingester stores each message event in the Store using a partition key of (channel, customer, message id) and a clustering key of (event date, event action).
(2) It also writes the message into a buffer table using a partition key of (channel, shard) and a clustering key of (timestamp, customer, message id), the shard being derived from the message's partition key.
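The sketch below illustrates this two-step write; the table and column names (message_event, buffer, customer_id, …) are hypothetical stand-ins derived from the keys above, the timestamps are taken from currentTimestamp() for brevity, and the scylla driver calls follow the 0.x query API, which may differ in newer releases.

use scylla::Session;
use scylla::transport::errors::QueryError;

// Hypothetical ingestion step: write the message event (1) and its buffer
// entry (2). The shard value is computed from the partition-key token, as in
// the earlier shard_of sketch.
async fn ingest(
    session: &Session,
    channel: &str,
    customer_id: i64,
    message_id: i64,
    event_action: &str,
    shard: i32,
) -> Result<(), QueryError> {
    // (1) message event, keyed by (channel, customer, message id)
    session
        .query(
            "INSERT INTO message_event (channel, customer_id, message_id, event_date, event_action) \
             VALUES (?, ?, ?, currentTimestamp(), ?)",
            (channel, customer_id, message_id, event_action),
        )
        .await?;
    // (2) buffer entry, keyed by (channel, shard): every shard handler will
    //     later read only its own deterministic slice of this table
    session
        .query(
            "INSERT INTO buffer (channel, shard, timestamp, customer_id, message_id) \
             VALUES (?, ?, currentTimestamp(), ?, ?)",
            (channel, shard, customer_id, message_id),
        )
        .await?;
    Ok(())
}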
Deterministic Data Processing
(3) The Scheduler runs one handler per shard: Shard (1), Shard (2), Shard (3) … Shard (N). Each handler pulls only its own slice of the buffer table:
SELECT ... FROM buffer
WHERE channel = ?
AND shard = 2
AND timestamp >= ?
AND timestamp <= currentTimestamp()
LIMIT ?
(4) The handler then processes the messages it fetched.
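A minimal sketch of what such a scheduler could look like in Rust follows; the tokio runtime, the handle_shard helper and the zero-based shard loop are assumptions for illustration, not the talk's actual code.

use std::sync::Arc;
use scylla::Session;

// Spawn one independent asynchronous handler per shard so that each task only
// ever touches rows whose (channel, shard) partition maps to it.
async fn run_scheduler(session: Arc<Session>, channel: String, nr_shards: i32) {
    let mut handles = Vec::new();
    for shard in 0..nr_shards {
        let session = Arc::clone(&session);
        let channel = channel.clone();
        handles.push(tokio::spawn(async move {
            handle_shard(&session, &channel, shard).await;
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}

async fn handle_shard(_session: &Session, _channel: &str, _shard: i32) {
    // Placeholder: pull this shard's slice of the buffer and process it,
    // as sketched in the pull cycle further below.
}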
Deterministic Message Routing
(5) Each shard handler then routes the messages it pulled to their destination channel, e.g. the EMAIL MTA.
Could We Replace Kafka With ScyllaDB?
(Diagram: Store → Scheduler → Shard (1), (2), (3) … (N) handlers. The sharding table uses a partition key of (channel, shard); the buffer table uses a partition key of (channel, shard) and a clustering key of (timestamp, customer, message id).)

Trying To Replace Kafka With ScyllaDB
(3.1) Each shard handler first reads the timestamp of its last pull from the sharding table:
SELECT buffer_last_pull_ts
FROM sharding
WHERE channel = ?
AND shard = 2

Replacing Kafka With ScyllaDB
(3.2) It then pulls its buffered messages over the time window [buffer_last_pull_ts - window_offset, currentTimestamp()]:
SELECT ... FROM buffer
WHERE channel = ?
AND shard = 2
AND timestamp >= ?
AND timestamp <= currentTimestamp()
LIMIT ?
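Putting 3.1 and 3.2 together, here is a hedged Rust sketch of one pull cycle; the column selection, the window_offset value, the final write-back of buffer_last_pull_ts and the chrono-based timestamp mapping are assumptions, and the typed-row helper shown exists under this name only in some versions of the scylla driver.

use chrono::{DateTime, Duration, Utc};
use scylla::Session;

type AnyError = Box<dyn std::error::Error + Send + Sync>;

// Hypothetical pull cycle for one shard handler, following steps 3.1 and 3.2.
async fn pull_cycle(
    session: &Session,
    channel: &str,
    shard: i32,
    window_offset: Duration, // safety margin re-read behind the last pull
    limit: i32,
) -> Result<(), AnyError> {
    // (3.1) read this shard's last pull timestamp from the sharding table
    let last_pull: Option<(DateTime<Utc>,)> = session
        .query(
            "SELECT buffer_last_pull_ts FROM sharding WHERE channel = ? AND shard = ?",
            (channel, shard),
        )
        .await?
        .maybe_first_row_typed()?;
    let since = last_pull
        .map(|(ts,)| ts - window_offset)
        .unwrap_or_else(|| Utc::now() - window_offset);
    let pulled_at = Utc::now();

    // (3.2) pull this shard's messages over [last pull - window_offset, now]
    let _rows = session
        .query(
            "SELECT timestamp, customer_id, message_id FROM buffer \
             WHERE channel = ? AND shard = ? \
             AND timestamp >= ? AND timestamp <= currentTimestamp() \
             LIMIT ?",
            (channel, shard, since, limit),
        )
        .await?;
    // ... process the fetched rows, then (assumed step) persist the new pull
    // timestamp so the next cycle starts where this one ended
    session
        .query(
            "UPDATE sharding SET buffer_last_pull_ts = ? WHERE channel = ? AND shard = ?",
            (pulled_at, channel, shard),
        )
        .await?;
    Ok(())
}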
Retrospective
What We Learned on the Road
■ Load testing is more than useful
It spotted a lot of non-trivial issues (batch execution delays, timeouts, large partitions, etc.).
■ Time-Window Compaction Strategy
Treating message buffering as time-series processing allowed us to avoid large partitions (see the sketch below)!
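For readers unfamiliar with TWCS, a hypothetical example of enabling it on the buffer table is shown below; the table name, window size/unit and TTL are illustrative values, not our production settings.

use scylla::Session;
use scylla::transport::errors::QueryError;

// Illustrative DDL: treat the buffer as time-series data and let TWCS compact
// it into bounded time windows. Window unit/size and TTL are example values.
async fn enable_twcs(session: &Session) -> Result<(), QueryError> {
    session
        .query(
            "ALTER TABLE buffer WITH compaction = { \
               'class': 'TimeWindowCompactionStrategy', \
               'compaction_window_unit': 'MINUTES', \
               'compaction_window_size': 10 } \
             AND default_time_to_live = 86400",
            &[],
        )
        .await?;
    Ok(())
}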
What We Contributed to Make it Possible
■ Rust driver contributions
■ Bug discoveries
❤ ScyllaDB support
What We Wish We Could Do
■ Long-polling for time-series
Our architecture implies regular fetching, but we have ideas to improve this.
■ A Rust driver with fewer allocations
We encountered some memory issues and have (a lot of?) ideas to improve the Rust driver!
Going Further with ScyllaDB Features
■ CDC Kafka source connector
Use CDC to stream message events to the rest of the infrastructure without touching application code.
■ Replace LWT with Raft?
We use LWT in a few places, e.g. dynamic shard workload attribution (see the sketch below), and can’t wait to test strongly-consistent tables!
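To make that LWT usage concrete, here is a hypothetical example of conditional shard attribution; the shard_attribution table and its columns are illustrative inventions, not our actual schema.

use scylla::Session;
use scylla::transport::errors::QueryError;

// Hypothetical dynamic shard attribution with LWT: an application instance
// claims a shard only if nobody else already owns it. Real code would check
// the [applied] column of the result to know whether the claim succeeded.
async fn try_claim_shard(
    session: &Session,
    channel: &str,
    shard: i32,
    owner: &str,
) -> Result<(), QueryError> {
    // IF NOT EXISTS turns this insert into a lightweight transaction
    // (Paxos-based today, hopefully Raft-backed tomorrow).
    session
        .query(
            "INSERT INTO shard_attribution (channel, shard, owner) \
             VALUES (?, ?, ?) IF NOT EXISTS",
            (channel, shard, owner),
        )
        .await?;
    Ok(())
}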
Thank You
Stay in Touch
Want to have fun with us? Reach out!
alexys@numberly.com | joseph@numberly.com | yassir@numberly.com
@ultrabug
numberly
