How we at Conviva ported a streaming data pipeline from Scala to Rust in a matter of months: the important human and technical factors in our port, and what we learned.
Porting a Streaming Pipeline from Scala to Rust
1. Lessons: Porting a Streaming Pipeline from
Scala to Rust
2023 Scale by the Bay
Evan Chan
Principal Engineer - Conviva
http://velvia.github.io/presentations/2023-conviva-scala-to-rust
1 / 38
3. Massive Real-time Streaming Analytics
5 trillion events processed per day
800-2000 GB/hour (not peak!)
Started with custom Java code, went through Spark Streaming and Flink iterations
Most backend data components in production are written in Scala
Today: 420 pods running custom Akka Streams processors
3 / 38
4. Data World is Going Native and Rust
Going native: Python, end of Moore's Law, cloud compute
Safe, fast, and high-level abstractions
Functional data patterns - map, fold, pattern matching, etc.
Static dispatch and no allocations by default
PyO3 - Rust is the best way to write native Python extensions
JVM                      Rust projects
Spark, Hive              DataFusion, Ballista, Amadeus
Flink                    Arroyo, RisingWave, Materialize
Kafka/KSQL               Fluvio
ElasticSearch / Lucene   Toshi, MeiliDB
Cassandra, HBase         Skytable, Sled, Sanakirja...
Neo4J                    TerminusDB, IndraDB
4 / 38
5. About our Architecture
graph LR;
  SAE(Streaming Data Pipeline)
  Sensors --> Gateways
  Gateways --> Kafka
  Kafka --> SAE
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards
5 / 38
6. What We Are Porting to Rust
graph LR;
  classDef highlighted fill:#99f,stroke:#333,stroke-width:4px
  SAE(Streaming Data Pipeline)
  Sensors:::highlighted --> Gateways:::highlighted
  Gateways --> Kafka
  Kafka --> SAE:::highlighted
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards

graph LR;
  Notes1(Sensors: consolidate fragmented code base)
  Notes2(Gateway: improve on JVM and Go)
  Notes3(Pipeline: improve efficiency, new operator architecture)
  Notes1 ~~~ Notes2
  Notes2 ~~~ Notes3
6 / 38
7. Our Journey to Rust
gantt
  title From Hackathon to Multiple Teams
  dateFormat YYYY-MM
  axisFormat %y-%b
  section Data Pipeline
  Hackathon : Small Kafka ingestion project, 2022-11, 30d
  Scala prototype : 2023-02, 6w
  Initial Rust Port : small team, 2023-04, 45d
  Bring on more people : 2023-07, 8w
  20-25 people, 4 teams : 2023-11, 1w
  section Gateway
  Go port : 2023-07, 6w
  Rust port : 2023-09, 4w
“I like that if it compiles, I know it will work, so it gives confidence.”
7 / 38
8. Promising Rust Hackathon
graph LR;
  Kafka --> RustDeser(Rust Deserializer)
  RustDeser --> RA(Rust Actors - Lightweight Processing)

Measurement        Improvement over Scala/Akka
Throughput (CPU)   2.6x more
Memory used        12x less
Mostly I/O-bound lightweight deserialization and processing workload
Found out Actix does not work well with Tokio
8 / 38
10. Key Lessons or Questions
What matters for a Rust port?
The 4 P's:
People: How do we bring developers onboard?
Performance: How do I get performance? Data structures? Static dispatch?
Patterns: What coding patterns port well from Scala? Async?
Project: How do I build? Tooling, IDEs?
10 / 38
12. A Phased Rust Bringup
We ported our main data pipeline in two phases:
Phase    Team                       Rust Expertise             Work
First    3-5, very senior           1-2 with significant Rust  Port core project components
Second   10-15, mixed, distributed  Most with zero Rust        Smaller, broken-down tasks

Have organized list of learning resources
2-3 weeks to learn Rust and come up to speed
12 / 38
13. Overcoming Challenges

Difficulties:
Lifetimes
Compiler errors
Porting previous patterns
Ownership and async
etc.

How we helped:
Good docs
Start with tests
ChatGPT!
Rust Book
Office hours
Lots of detailed reviews
Split project into async and sync cores
13 / 38
14. Performance
Data structures, static dispatch, etc.
"I enjoy the fact that the default route is performant. It makes you write
performant code, and if you go out of the way, it becomes explicit (e.g., with
dyn, Box, or clone, etc.)."
14 / 38
15. Porting from Scala: Huge Performance Win
graph LR;
  classDef highlighted fill:#99f,stroke:#333,stroke-width:4px
  SAE(Streaming Data Pipeline)
  Sensors --> Gateways
  Gateways --> Kafka
  Kafka --> SAE:::highlighted
  SAE --> DB[(Metrics Database)]
  DB --> Dashboards
CPU-bound, programmable, heavy data processing
Neither the Rust nor the Scala version was productionized or optimized
Same architecture and same inputs/outputs
Scala version was not designed for speed: lots of objects
Rust: we chose static dispatch and minimized allocations

Type of comparison                          Improvement over Scala
Throughput, end to end                      22x
Throughput, single-threaded microbenchmark  >= 40x
15 / 38
16. Building a Flexible Data Pipeline
graph LR;
  RawEvents(Raw Events)
  RawEvents -->|List of numbers| Extract1
  RawEvents --> Extract2
  Extract1 --> DoSomeMath
  Extract2 --> TransformSomeFields
  DoSomeMath --> Filter1
  TransformSomeFields --> Filter1
  Filter1 --> MoreProcessing
An interpreter passes time-ordered data through a flexible DAG of operators.
Span1: start time 1000, end time 1100, events ["start", "click"]
Span2: start time 1100, end time 1300, events ["ad_load"]
16 / 38
17. Data Structures: Scala vs Rust

Scala: Object Graph on Heap

graph TB;
  classDef default font-size:24px
  TL(Timeline - Seq) --> ArraySpan["`Array[Span]`"]
  ArraySpan --> Span1["`Span(start, end, Payload)`"]
  ArraySpan --> Span2["`Span(start, end, Payload)`"]
  Span1 --> EventsAtSpanEnd("`Events(Seq[A])`")
  EventsAtSpanEnd --> ArrayEvent["`Array[A]`"]

Rust: mostly stack based / 0 alloc:

flowchart TB;
  subgraph Timeline
    DataType ~~~ OutputSpans
    subgraph OutputSpans
      subgraph Span1
        TimeInterval ~~~ Events
        subgraph Events
          EvA ~~~ EvB
        end
      end
      subgraph Span2
        Time2 ~~~ Events2
      end
      Span1 ~~~ Span2
    end
  end
17 / 38
18. Rust: Using Enums and Avoiding Boxing
pub enum Timeline {
    EventNumber(OutputSpans<EventsAtEnd<f64>>),
    EventBoolean(OutputSpans<EventsAtEnd<bool>>),
    EventString(OutputSpans<EventsAtEnd<DataString>>),
}

type OutputSpans<V> = SmallVec<[Span<V>; 2]>;

pub struct Span<SV: SpanValue> {
    pub time: TimeInterval,
    pub value: SV,
}

pub struct EventsAtEnd<V>(SmallVec<[V; 1]>);

In the above, the Timeline enum can fit entirely on the stack and avoid all
boxing and allocations, if:
The number of spans is very small, below a limit set in code
The number of events in each span is very small (1 in this case, which is the common case)
The base type is a primitive, or a string below a certain length
18 / 38
19. Avoiding Allocations using SmallVec and
SmallString
SmallVec is something like this:
pub enum SmallVec<T, const N: usize> {
    Stack([T; N]),
    Heap(Vec<T>),
}
The enum can hold up to N items inline in an array with no allocations, but
switches to the Heap variant if the number of items exceeds N.
There are various crates for small strings and other data structures.
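To make the spill behavior concrete, here is a tiny runnable sketch. This is NOT the real smallvec crate; the names `MiniVec`, `Storage`, and `spilled` are invented for this illustration of the inline-then-heap idea:

```rust
// Toy small-vector: items live inline on the stack until capacity N is
// exceeded, then everything moves ("spills") to a heap-allocated Vec.
enum Storage<T, const N: usize> {
    Stack { buf: [Option<T>; N], len: usize }, // inline, no allocation
    Heap(Vec<T>),                              // spilled to the heap
}

struct MiniVec<T, const N: usize>(Storage<T, N>);

impl<T, const N: usize> MiniVec<T, N> {
    fn new() -> Self {
        MiniVec(Storage::Stack { buf: std::array::from_fn(|_| None), len: 0 })
    }

    fn push(&mut self, item: T) {
        match &mut self.0 {
            Storage::Stack { buf, len } if *len < N => {
                buf[*len] = Some(item);
                *len += 1;
            }
            Storage::Stack { buf, len } => {
                // Capacity exceeded: move the inline items onto the heap.
                let mut v: Vec<T> = buf.iter_mut().take(*len)
                    .map(|slot| slot.take().unwrap()).collect();
                v.push(item);
                self.0 = Storage::Heap(v);
            }
            Storage::Heap(v) => v.push(item),
        }
    }

    fn spilled(&self) -> bool {
        matches!(self.0, Storage::Heap(_))
    }

    fn len(&self) -> usize {
        match &self.0 {
            Storage::Stack { len, .. } => *len,
            Storage::Heap(v) => v.len(),
        }
    }
}

fn main() {
    let mut v: MiniVec<u32, 2> = MiniVec::new();
    v.push(1);
    v.push(2);
    assert!(!v.spilled()); // still inline: no allocation yet
    v.push(3);             // third item exceeds N = 2
    assert!(v.spilled());  // now backed by a heap Vec
    assert_eq!(v.len(), 3);
    println!("spilled = {}", v.spilled());
}
```

The real crate uses the `SmallVec<[T; N]>` form shown in the previous slide and keeps the inline items uninitialized rather than wrapped in Option, but the spill logic is the same in spirit.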
19 / 38
20. Static vs Dynamic Dispatch
Often one will need to work with many different structs that implement a Trait
-- for us, different operator implementations supporting different types. Static
dispatch and inlined code are much faster.
1. Monomorphisation using generics
fn execute_op<O: Operator>(op: O) -> Result<...>
Compiler creates a new instance of execute_op for every different O
Only works when you know in advance what Operator to pass in
2. Use Enums and enum_dispatch
fn execute_op(op: OperatorEnum) -> Result<...>
3. Dynamic dispatch
fn execute_op(op: Box<dyn Operator>) -> Result<...>
fn execute_op(op: &dyn Operator) -> Result<...> (avoids allocation)
4. Function wrapping
Embedding functions in a generic struct
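As a rough, runnable illustration of options 1 and 3, assuming a toy `Operator` trait (the trait shape, the `Double` struct, and the function names here are invented for this sketch):

```rust
// Toy operator trait for comparing dispatch strategies.
trait Operator {
    fn apply(&self, x: i64) -> i64;
}

struct Double;
impl Operator for Double {
    fn apply(&self, x: i64) -> i64 { x * 2 }
}

// 1. Monomorphization: the compiler emits a specialized copy of this function
//    for every concrete O, so the call can be inlined -- fastest, but the
//    concrete Operator must be known at the call site.
fn execute_static<O: Operator>(op: &O, x: i64) -> i64 {
    op.apply(x)
}

// 3. Dynamic dispatch through a vtable: one compiled function handles any
//    Operator, at the cost of an indirect call. Taking &dyn (rather than
//    Box<dyn>) avoids the heap allocation.
fn execute_dyn(op: &dyn Operator, x: i64) -> i64 {
    op.apply(x)
}

fn main() {
    let op = Double;
    assert_eq!(execute_static(&op, 21), 42);
    assert_eq!(execute_dyn(&op, 21), 42);
    println!("both dispatch paths agree");
}
```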
20 / 38
21. enum_dispatch
Suppose you have
trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

struct LinearKnob {
    position: f64,
}

struct LogarithmicKnob {
    position: f64,
}

impl KnobControl for LinearKnob...

enum_dispatch lets you do this:

#[enum_dispatch]
trait KnobControl {
    //...
}
21 / 38
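Since enum_dispatch is a proc macro, it helps to see roughly what it expands to. Below is a hand-written sketch of that expansion (no external crate): an enum over the concrete knob types, with the trait implemented on the enum via a match. The knob types are from the slide; the enum wiring and the logarithmic formula are our invented example.

```rust
trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

struct LinearKnob { position: f64 }
struct LogarithmicKnob { position: f64 }

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    fn get_value(&self) -> f64 { self.position }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    // Invented mapping for the example: log2(position + 1).
    fn get_value(&self) -> f64 { (self.position + 1.0).log2() }
}

// This enum + impl is roughly what #[enum_dispatch] would generate:
// each trait method becomes a match that forwards to the variant's impl.
// The match is statically dispatched, so no vtable is involved.
enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

impl KnobControl for Knob {
    fn set_position(&mut self, value: f64) {
        match self {
            Knob::Linear(k) => k.set_position(value),
            Knob::Logarithmic(k) => k.set_position(value),
        }
    }
    fn get_value(&self) -> f64 {
        match self {
            Knob::Linear(k) => k.get_value(),
            Knob::Logarithmic(k) => k.get_value(),
        }
    }
}

fn main() {
    // Heterogeneous collection without Box<dyn KnobControl>:
    let mut knobs = vec![
        Knob::Linear(LinearKnob { position: 0.5 }),
        Knob::Logarithmic(LogarithmicKnob { position: 0.0 }),
    ];
    for k in &mut knobs {
        k.set_position(1.0);
    }
    assert_eq!(knobs[0].get_value(), 1.0);
    assert_eq!(knobs[1].get_value(), 1.0); // log2(1.0 + 1.0) = 1.0
    println!("ok");
}
```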
22. Function wrapping
Static function wrapping - no generics
pub struct OperatorWrapper {
    name: String,
    func: fn(&Data) -> Data,
}

Need a generic - but accepts closures

pub struct OperatorWrapper<F>
where
    F: Fn(&Data) -> Data,
{
    name: String,
    func: F,
}
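A runnable usage sketch of both wrappers. The `Data` type and the operator bodies are invented for this example, and the generic version is renamed `ClosureWrapper` here only so the two structs can coexist in one file:

```rust
#[derive(Debug, PartialEq)]
struct Data(i64);

// Static function wrapping: a plain fn pointer, no generics.
// Cannot capture state, but the struct is a simple concrete type.
pub struct OperatorWrapper {
    name: String,
    func: fn(&Data) -> Data,
}

// Generic version: accepts closures, which can capture state.
// Each distinct closure monomorphizes its own ClosureWrapper<F>.
pub struct ClosureWrapper<F>
where
    F: Fn(&Data) -> Data,
{
    name: String,
    func: F,
}

fn double(input: &Data) -> Data {
    Data(input.0 * 2)
}

fn main() {
    let w1 = OperatorWrapper { name: "double".into(), func: double };
    assert_eq!((w1.func)(&Data(21)), Data(42));

    let offset = 10;
    let w2 = ClosureWrapper {
        name: "add_offset".into(),
        func: move |d: &Data| Data(d.0 + offset), // captures `offset`
    };
    assert_eq!((w2.func)(&Data(32)), Data(42));

    println!("{} and {} both produce 42", w1.name, w2.name);
}
```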
22 / 38
24. Rust Async: Different Paradigms
"Async: It is well designed... Yes, it is still a pretty complicated piece of code, but
the logic or the framework is easier to grasp compared to other languages."
Having to use Arc: data structures are not thread-safe by default!
Scala                                Rust
Futures                              futures, async functions
??                                   async-await
Actors (Akka)                        Actix, Bastion, etc.
Async streams                        Tokio streams
Reactive (Akka Streams, Monix, ZIO)  reactive_rs, rxRust, etc.
24 / 38
25. Replacing Akka: Actors in Rust
Actix threading model doesn't mix well with Tokio
We moved to tiny-tokio-actor, then wrote our own
pub struct AnomalyActor {}

#[async_trait]
impl ChannelActor<Anomaly, AnomalyActorError> for AnomalyActor {
    async fn handle(
        &mut self,
        msg: Anomaly,
        ctx: &mut ActorContext<Anomaly>,
    ) -> Result<(), Report<AnomalyActorError>> {
        use Anomaly::*;
        match msg {
            QuantityOverflowAnomaly {
                ctx: _, ts: _, qual: _,
                qty: _, cnt: _, data: _,
            } => {}
            PoisonPill => {
                ctx.stop();
            }
        }
        Ok(())
    }
}
25 / 38
26. Other Patterns to Learn
Old Pattern                          New Pattern
No inheritance                       Use composition! Compose data structures, compose small Traits
No exceptions                        Use Result and ?
Data structures are not thread-safe  Learn to use Arc etc.
Returning Iterators                  Don't return things that borrow other things; this makes life difficult.
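A minimal runnable sketch of two of these patterns, with invented names: `Result` plus `?` in place of exceptions, and `Arc` to share a data structure across threads.

```rust
use std::num::ParseIntError;
use std::sync::Arc;
use std::thread;

// No exceptions: the error is part of the return type, and `?` propagates it
// to the caller instead of unwinding like a thrown exception would.
fn parse_and_double(s: &str) -> Result<i64, ParseIntError> {
    let n: i64 = s.trim().parse()?; // early-returns the Err on failure
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("not a number").is_err());

    // Data structures are not thread-safe by default: to share read-only
    // data across threads, wrap it in an Arc (atomic reference count).
    let shared = Arc::new(vec![1, 2, 3]);
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let shared = Arc::clone(&shared); // cheap refcount bump, no copy
            thread::spawn(move || shared.iter().sum::<i32>())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }
    println!("ok");
}
```

(For shared *mutable* state you would reach for `Arc<Mutex<T>>` or `Arc<RwLock<T>>`.)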
26 / 38
27. Type Classes
In Rust, type classes (Traits) are smaller and more compositional.

pub trait Inhale {
    fn sniff(&self);
}

You can implement new Traits for existing types, and have different impls for
different types.

impl Inhale for String {
    fn sniff(&self) {
        println!("I sniffed {}", self);
    }
}

// Only implemented for specific N subtypes of MyStruct
impl<N: Numeric> Inhale for MyStruct<N> {
    fn sniff(&self) {
        // ...
    }
}
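A runnable variant of the example above. Here `sniff` is adapted to return a `String` so the result can be checked, and a concrete `f64` impl stands in for the slide's hypothetical `Numeric`-bounded impl:

```rust
pub trait Inhale {
    fn sniff(&self) -> String;
}

// A new trait implemented for an existing std type:
impl Inhale for String {
    fn sniff(&self) -> String {
        format!("I sniffed {}", self)
    }
}

// A different impl for a different type:
impl Inhale for f64 {
    fn sniff(&self) -> String {
        format!("I sniffed the number {}", self)
    }
}

fn main() {
    assert_eq!("coffee".to_string().sniff(), "I sniffed coffee");
    assert_eq!(42.0_f64.sniff(), "I sniffed the number 42");
    println!("ok");
}
```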
27 / 38
29. "Cargo is the best build tool ever"
Almost no dependency conflicts due to multiple dep versioning
Configuration by convention - common directory/file layouts for example
Really simple .toml - no need for XML, functional Scala, etc.
Rarely need code to build anything, even for large projects
[package]
name = "telemetry-subscribers"
version = "0.3.0"
license = "Apache-2.0"
description = "Library for common telemetry and observability functionality"
[dependencies]
console-subscriber = { version = "0.1.6", optional = true }
crossterm = "0.25.0"
once_cell = "1.13.0"
opentelemetry = { version = "0.18.0", features = ["rt-tokio"], optional = true }
29 / 38
30. IDEs, CI, and Tooling
IDEs/Editors       VSCode, RustRover (IntelliJ), vim/emacs/etc. with Rust Analyzer
Code Coverage      VSCode inline, grcov/lcov, Tarpaulin (Linux only)
Slow build times   Caching: cargo-chef, rust-cache
Slow test times    cargo-nextest
Property Testing   proptest
Benchmarking       Criterion
https://blog.logrocket.com/optimizing-ci-cd-pipelines-rust-projects/
VSCode's "LiveShare" feature for distributed pair programming is TOP NOTCH.
30 / 38
31. Rust Resources and Projects
https://github.com/velvia/links/blob/main/rust.md - this is my list of Rust
projects and learning resources
https://github.com/rust-unofficial/awesome-rust
https://www.arewelearningyet.com - ML focused
31 / 38
32. What do we miss from Scala?
More mature libraries, in some cases: HDFS, etc.
Good streaming libraries - like Monix, Akka Streams etc.
I guess all of Akka
"Less misleading compiler messages"
Rust error messages read better from the CLI, IMO (not an IDE)
32 / 38
33. Takeaways
It's a long journey but Rust is worth it.
Structuring a project for successful onramp is really important
Think about data structure design early on
Allow plenty of time to ramp up on Rust patterns, tools
We are hiring across multiple roles/levels!
33 / 38
36. Data World is Going Native (from JVM)
The rise of Python and Data Science
Led to AnyScale, Dask, and many other Python-oriented data
frameworks
Rise of newer, developer-friendly native languages (Go, Swift, Rust, etc.)
Migration from Hadoop/HDFS to more cloud-based data architectures
Apache Arrow and other data interchange formats
Hardware architecture trends - end of Moore's Law, rise of GPUs etc
36 / 38
37. Why We Went with our Own Actors
1. Initial Hackathon prototype used Actix
Actix has its own event-loop / threading model, using Arbiters
Difficult to co-exist with Tokio and configure both
2. Moved to tiny-tokio-actor
Really thin layer on top of Tokio
25% improvement over rdkafka + Tokio + Actix
3. Ultimately wrote our own, 100-line mini Actor framework
tiny-tokio-actor required messages to be Clone, so we could not, for example, send oneshot channels for other actors to reply on
Wanted ActorRef<MessageType> instead of ActorRef<ActorType, MessageType>
Supports tell() and ask() semantics
37 / 38
38. Scala: Object Graphs and Any
class Timeline extends BufferedIterator[Span[Payload]]
final case class Span[+A](start: Timestamp, end: Timestamp, payload: A) {
def mapPayload[B](f: A => B): Span[B] = copy(payload = f(payload))
}
type Event[+A] = Span[EventsAtSpanEnd[A]]
@newtype final case class EventsAtSpanEnd[+A](events: Iterable[A])
BufferedIterator must be on the heap
Each Span Payload is also boxed and on the heap, even for numbers
To be dynamically interpretable, we need BufferedIterator[Span[Any]] in many places :(
Yes, specialization is possible, at the cost of complexity
38 / 38