Learning Rust the Hard Way
for a Production
Kafka+ScyllaDB Pipeline
Alexys Jacob
CTO
Alexys Jacob
■ ScyllaDB-awarded Open Source & University contributor
■ Open Source author & contributor
• Apache Avro, Apache Airflow, MongoDB, MkDocs…
■ Tech speaker & writer
■ Gentoo Linux developer
■ Python Software Foundation contributing member
CTO, Numberly
@ultrabug
Your next 20 minutes
■ The thought process to move from Python to Rust
• Context, promises, arguments and decision
■ Learning Rust the hard way
• All the stack components I had to work with in Rust
• Tips, Open Source contributions and code samples
■ Was it worth it?
• Graphs, production numbers
• Personal notes
Choosing Rust over Python
Project context at Numberly
At Numberly, we move and process (a lot of) data using Kafka streams and
pipelines that are enriched using ScyllaDB.
[Pipeline diagram: raw data flows through processor apps that enrich it using Scylla; the enriched data feeds a client app, a partner API and a business app.]
Pipeline reliability = latency + resilience
[Pipeline diagram: raw data flows through processor apps that enrich it using Scylla; the enriched data feeds a client app, a partner API and a business app.]
If a processor or ScyllaDB is slow or fails, our business, partners & clients are at risk.
The (rusted) opportunity
A major change in our pipeline processors had to be undertaken, giving us the
opportunity to redesign them.
[Pipeline diagram: raw data flows through processor apps that enrich it using Scylla; the enriched data feeds a client app, a partner API and a business app.]
“Hey, why not rewrite those 3 Python processor apps into 1 Rust app?”
The (never tried before) Rust promises
A language empowering everyone to build reliable and efficient software.
■ Secure
• Memory and thread safety as first-class citizens
• No runtime or garbage collector
■ Easy to deploy
• Compiled binaries are self-sufficient
■ No compromises
• Strongly and statically typed
• Exhaustivity is mandatory
• Built-in error management syntax and primitives
■ Plays well with Python
• PyO3 can be used to run Rust from Python (or the reverse)
Efficient software != Faster software
■ “Fast” meanings vary depending on your objectives.
• Fast to develop?
• Fast to maintain?
• Fast to prototype?
• Fast to process data?
• Fast to cover all failure cases?
“Selecting a programming language can be a form of premature optimization.”
Efficient software != Faster software
■ “Fast” meanings vary depending on your objectives.
• Fast to develop? Python is way faster, did that for 15 years
• Fast to maintain? Nobody at Numberly does Rust yet
• Fast to prototype? No, code must be complete to compile and run
• Fast to process data? Sure: to prove it, measure it
• Fast to cover all failure cases? Definitely: mandatory exhaustivity + error handling primitives
“I did not choose Rust to be ‘faster’. Our Python code was fast enough to deliver our pipeline processing.”
Innovation cannot exist
if you are not willing to lose time.
The question is
knowing when, and on what project.
The Reliable software paradigms
■ What makes me slow will make me stronger.
• Low level paradigms (ownership, borrowing, lifetimes).
• Strong type safety.
• Compilation (debug, release).
• Dependency management.
• Exhaustive pattern matching.
• Error management primitives (Result).
The Reliable software paradigms
■ What makes me slow will make me stronger.
• Low level paradigms (ownership, borrowing, lifetimes). If it compiles, it’s safe
• Strong type safety. Predictable, readable, maintainable
• Compilation (debug, release). Compiler is very helpful vs a random Python exception
• Dependency management. Finally something looking sane vs Python mess
• Exhaustive pattern matching. Confidence that you’re not forgetting something
• Error management primitives (Result). Handle failure right from the language syntax
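The exhaustivity and Result points above can be sketched in a few lines. This is a hypothetical event-enrichment example, not code from the talk; all names are illustrative:

```rust
// Hypothetical enrichment step illustrating exhaustive matching and Result.

#[derive(Debug)]
enum EventKind {
    Click,
    View,
    Purchase,
}

fn parse_kind(raw: &str) -> Result<EventKind, String> {
    match raw {
        "click" => Ok(EventKind::Click),
        "view" => Ok(EventKind::View),
        "purchase" => Ok(EventKind::Purchase),
        other => Err(format!("unknown event kind: {other}")),
    }
}

fn score(raw: &str) -> Result<u32, String> {
    // `?` propagates the error to the caller instead of crashing at runtime.
    let kind = parse_kind(raw)?;
    // The compiler forces this match to cover every variant: adding a new
    // EventKind without handling it here is a compile error, not a prod incident.
    Ok(match kind {
        EventKind::Click => 1,
        EventKind::View => 2,
        EventKind::Purchase => 10,
    })
}

fn main() {
    println!("{:?}", score("purchase")); // Ok(10)
    println!("{:?}", score("scroll"));   // Err("unknown event kind: scroll")
}
```

Failure handling is part of the function signatures themselves, which is exactly the “handle failure right from the language syntax” point above.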
“I chose Rust because it provided me with the programming paradigms at the right abstraction level that I needed to finally understand and better explain the reliability and performance of an application.”
Learning Rust the hard way
Production is not a Hello World
■ Learning the syntax and handling errors everywhere
■ Confluent Kafka + Schema Registry + Avro
■ Asynchronous latency-optimized design
■ ScyllaDB multi-datacenter
■ MongoDB
■ Kubernetes deployment
■ Prometheus exporter
■ Grafana dashboarding
■ Sentry
[Diagram: processor app sitting between Confluent Kafka and Scylla.]
Confluent Kafka Schema Registry
■ Confluent Schema Registry breaks vanilla Apache Avro deserialization.
• Gerard Klijs’ schema_registry_converter crate helps
• I discovered performance problems, which we worked on and which are being addressed!
■ Latency-overhead-free manual approach: strip the 5-byte Confluent wire-format header (magic byte + big-endian schema ID) yourself and hand the remaining Avro payload straight to the deserializer.
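A minimal sketch of that manual approach, using only the standard library. It assumes the Confluent wire format (one 0x00 magic byte, a 4-byte big-endian schema ID, then the Avro payload); struct and function names are hypothetical:

```rust
// Manual Confluent wire-format handling (sketch): every message is
// [0x00 magic byte][4-byte big-endian schema ID][Avro payload]. Splitting it
// by hand lets the Avro deserializer run directly on the payload, with no
// per-message registry overhead once the schema for that ID is cached.

#[derive(Debug)]
struct ConfluentMessage<'a> {
    schema_id: u32,
    avro_payload: &'a [u8],
}

fn split_confluent_header(raw: &[u8]) -> Result<ConfluentMessage<'_>, String> {
    if raw.len() < 5 {
        return Err(format!("message too short: {} bytes", raw.len()));
    }
    if raw[0] != 0 {
        return Err(format!("unexpected magic byte: {:#04x}", raw[0]));
    }
    let schema_id = u32::from_be_bytes([raw[1], raw[2], raw[3], raw[4]]);
    Ok(ConfluentMessage {
        schema_id,
        avro_payload: &raw[5..],
    })
}

fn main() {
    // Magic byte, schema ID 42, then (opaque) Avro bytes.
    let raw = [0u8, 0, 0, 0, 42, 0xDE, 0xAD];
    let msg = split_confluent_header(&raw).unwrap();
    println!("schema {} / {} payload bytes", msg.schema_id, msg.avro_payload.len());
}
```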
Apache Avro Rust was broken!
■ The Avro Rust crate was donated to Apache Avro without an appointed committer.
• Deserialization of complex schemas was broken...
• I contributed fixes to Apache Avro
(AVRO-3232+3240)
• Now merged thanks to Martin Grigorov!
■ Rust compiler optimizations give a hell of a boost
(once Avro is fixed)
• Deserializing Avro is faster than JSON!
Asynchronous patterns to optimize latency
■ Tricks to make your Kafka consumer strategy more efficient.
• Deserialize your consumer messages on the consumer loop, not on green-threads
• Spawning a green-thread has a performance cost
• Control your green-thread parallelism
• Defer to green-threads when I/O starts to be required
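The production code uses tokio green-threads; the bounded-parallelism idea behind those bullets can still be sketched with the standard library alone. Here a permit channel plays the role that a `tokio::sync::Semaphore` would in async code, and `enrich` is a stand-in for the Scylla round-trip:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for the I/O-bound enrichment step (a Scylla round-trip in the talk).
fn enrich(msg: u32) -> u32 {
    msg * 2
}

// Consume messages, doing the cheap decoding on the consumer loop and handing
// only the I/O work to spawned workers, never more than `max_in_flight` at once.
fn run_pipeline(messages: &[u32], max_in_flight: usize) -> u32 {
    // Permit channel: a worker takes a permit to start and returns it when done.
    let (permit_tx, permit_rx) = mpsc::channel::<()>();
    for _ in 0..max_in_flight {
        permit_tx.send(()).unwrap();
    }

    let (done_tx, done_rx) = mpsc::channel::<u32>();
    let mut handles = Vec::new();

    for &msg in messages {
        let decoded = msg; // "deserialize on the consumer loop", not in the worker
        permit_rx.recv().unwrap(); // blocks while max_in_flight workers are busy
        let permit_tx = permit_tx.clone();
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || {
            done_tx.send(enrich(decoded)).unwrap();
            permit_tx.send(()).unwrap(); // return the permit
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    drop(done_tx); // close the channel so the sum below terminates
    done_rx.iter().sum()
}

fn main() {
    let messages: Vec<u32> = (0..20).collect();
    println!("enriched sum = {}", run_pipeline(&messages, 4)); // enriched sum = 380
}
```

Capping `max_in_flight` is what keeps a slow Scylla response from ballooning into thousands of in-flight tasks: backpressure propagates to the consumer loop instead.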
[Diagram: a Kafka consumer + Avro deserializer receives raw data, then spawns one green thread per message to write enriched data to Scylla.]
Absorbing tail latency spikes with parallelism
[Graph: tail latency under load at ×2 vs ×16 green-thread parallelism.]
Scylla Rust (shard-aware) driver
■ The scylla-rust-driver crate is mature enough for production.
• Use a CachingSession to automatically cache your prepared statements
• Beware: prepared queries are NOT paged; use paged queries with execute_iter() instead!
Exporting metrics properly for Prometheus
■ Effectively measuring latencies down to microseconds.
• Fine tune your histogram buckets to match your expected latencies!
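One way to sketch that tuning (the bounds here are hypothetical, chosen so that microsecond-scale latencies do not all collapse into one wide default bucket):

```rust
// Hypothetical bucket tuning: exponential bounds starting in the microsecond
// range, so latencies like 75µs and 250µs land in different buckets instead
// of all falling into the first bucket of a millisecond-scale default layout.
fn latency_buckets(start_seconds: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut bound = start_seconds;
    for _ in 0..count {
        bounds.push(bound);
        bound *= factor;
    }
    bounds
}

fn main() {
    // 50µs, 100µs, 200µs, ... doubling up to ~25ms.
    for bound in latency_buckets(50e-6, 2.0, 10) {
        println!("{:.0}µs", bound * 1e6);
    }
}
```

The resulting vector is what you would hand to your Prometheus client library when declaring the histogram, instead of its default buckets.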
...
Grafana dashboarding
■ Graph your precious metrics right!
• ScyllaDB prepared statement cache size
• Query and throughput rates
• Kafka commits occurrence
• Errors by type
• Kubernetes pod memory
• ...
■ Visualizing Prom Histograms
max by (environment)(histogram_quantile(0.50, processing_latency_seconds_bucket{...}))
Was it worth it?
Did I really lose time because of Rust?
■ I spent more time analyzing the latency impacts of code patterns and drivers’
options than struggling with Rust syntax.
■ Key figures for this application:
• Kafka consumer max throughput with processing? 200K msg/s on 20 partitions
• Avro deserialization P50 latency? 75µs
• Scylla SELECT P50 latency on 1.5B+ rows tables? 250µs
• Scylla INSERT P50 latency on 1.5B+ rows tables? 2ms
It went better than expected
■ Rust crates ecosystem is mature, similar to Python Package Index.
■ The scylla-rust-driver is stable and efficient!
■ It took me a while to accept that Apache Avro was broken, not me.
■ 3 Python apps totalling 54 pods replaced by 1 Rust app totalling 20 pods
■ This feels like the most reliable and efficient software I ever wrote!
Thank you!
Stay in touch
Alexys Jacob - Join us:
ultrabug
alexys@numberly.com

Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaDB Pipeline
