Performance Pitfalls of Rust Async Function Pointers (And Why It Might Not Matter)

A ScyllaDB Community
Byron Wasti
Founder of Balter Load Testing

Byron Wasti (he/him)
Founder of Balter Load Testing
■ Programming with Rust for 7+ years, 4+ professionally
■ Focused on building robust, high-performance, low-latency systems
■ Developer of an open source load testing framework for Rust called Balter (github.com/BalterLoadTesting/balter)

Motivation: Building a Load Testing Framework
■ Ability to run a user-provided function repeatedly and in parallel
// Balter code
pub fn load_test(user_func: fn()) {
    loop {
        user_func();
    }
}

// User code
fn main() {
    balter::load_test(my_load_test_scenario);
}

fn my_load_test_scenario() {
    ...
}

Function Pointers
■ Any function with the same signature
■ Run the function in multiple threads
■ Send the function to other machines (it's just a pointer)
use std::thread;

pub fn load_test(user_func: fn()) {
    for _ in 0..THREADS {
        // `move` copies the fn pointer into each thread's closure
        thread::spawn(move || {
            loop {
                user_func();
            }
        });
    }
}

load_test(my_func_a);
load_test(my_func_b);
load_test(my_func_c);
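A runnable variant of the slide above might look like the following sketch. It assumes a bounded number of iterations so the program terminates, and the `CALLS` counter, `THREADS`, `ITERATIONS`, and the body of `my_func_a` are illustrative stand-ins rather than Balter code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Hypothetical counter so the effect of each call is observable.
static CALLS: AtomicU64 = AtomicU64::new(0);

const THREADS: usize = 4;
const ITERATIONS: usize = 1_000;

// Stand-in for a user-provided scenario.
fn my_func_a() {
    CALLS.fetch_add(1, Ordering::Relaxed);
}

// Runs `user_func` a fixed number of times on each thread and returns
// how many calls were recorded while this load test was running.
pub fn load_test(user_func: fn()) -> u64 {
    let before = CALLS.load(Ordering::Relaxed);
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            // A fn pointer is `Copy + Send + 'static`, so it moves into the thread freely.
            thread::spawn(move || {
                for _ in 0..ITERATIONS {
                    user_func();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    CALLS.load(Ordering::Relaxed) - before
}

fn main() {
    let total = load_test(my_func_a);
    println!("total calls: {total}");
}
```

Because `fn()` is a plain pointer, no generics or boxing are needed to hand the same function to every thread.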

Async Function Pointers
For I/O-bound tasks (e.g. HTTP requests), async promises better performance
// Wishful pseudocode: `async fn()` is not a valid type in Rust
pub async fn load_test(user_func: async fn()) {
    for _ in 0..TASKS {
        tokio::spawn(async move {
            loop {
                user_func().await;
            }
        });
    }
}

async fn foo() {
}

load_test(foo).await;

Async Functions in Rust
■ Desugar into normal functions returning `impl Future<Output=?>`
■ The compiler auto-generates an opaque type for the `impl Trait`
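async fn foo() -> i32 {
    ...
}

async fn bar() -> i32 {
    ...
}

// Compiler error! `foo` and `bar` return different opaque future types,
// and `impl Trait` is not allowed in a fn pointer type
let arr: [fn() -> impl Future<Output=i32>] = [foo, bar];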
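To make the "opaque type" point concrete, the sketch below compares the types of the futures returned by two async functions with identical signatures. The `describe` helper and the function bodies are assumptions added for illustration. The printed type names are compiler-generated and may vary between compiler versions, so only the `TypeId` comparison is relied on.

```rust
use std::any::{type_name, TypeId};
use std::future::Future;

async fn foo() -> i32 {
    1
}

async fn bar() -> i32 {
    2
}

// Hypothetical helper: accepts any future yielding i32 and reports the
// concrete (compiler-generated) type it was instantiated with.
fn describe<F: Future<Output = i32> + 'static>(_fut: F) -> (TypeId, &'static str) {
    (TypeId::of::<F>(), type_name::<F>())
}

fn main() {
    let (foo_id, foo_name) = describe(foo());
    let (bar_id, bar_name) = describe(bar());
    println!("foo() returns: {foo_name}");
    println!("bar() returns: {bar_name}");
    // Same signature, but two distinct anonymous types.
    assert_ne!(foo_id, bar_id);
}
```

Since the two return types differ, no single `fn() -> SomeFuture` pointer type can describe both functions.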

Type-Erased Async Function Pointers
■ Common workaround is to use `Box::pin()`
fn foo() -> Pin<Box<dyn Future<Output=i32>>> {
    Box::pin(async {
        // Our usual async code
    })
}

fn bar() -> Pin<Box<dyn Future<Output=i32>>> {
    Box::pin(async {
        // Our usual async code
    })
}

// This works now!
let arr = [foo, bar];
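The following is a minimal runnable sketch of the boxed workaround. The `BoxFuture` alias, the function bodies, and the tiny `block_on` executor are assumptions added so the example runs with only the standard library. A real program would use a runtime such as Tokio instead.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Alias introduced for this sketch only.
type BoxFuture = Pin<Box<dyn Future<Output = i32>>>;

fn foo() -> BoxFuture {
    Box::pin(async { 1 })
}

fn bar() -> BoxFuture {
    Box::pin(async { 2 })
}

// Waker that does nothing; enough for futures that never truly suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor, a stand-in for a real async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Both functions now share one fn pointer type, so they fit in one array.
fn run_all() -> Vec<i32> {
    let arr: [fn() -> BoxFuture; 2] = [foo, bar];
    arr.iter().map(|f| block_on(f())).collect()
}

fn main() {
    println!("{:?}", run_all());
}
```

Note that every call to `foo` or `bar` performs a fresh heap allocation for the returned future.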

Performance Characteristics
use std::hint::black_box;

fn main() {
    load_test(black_box(foo));
}

fn load_test(func: fn(i32) -> i32) {
    for i in 0..250_000_000 {
        let _res = func(i);
    }
}

fn foo(arg: i32) -> i32 {
    black_box(arg * 2)
}

                          Time (mean ± σ)        Range (min … max)
Function Pointer          429.1 ms ± 7.0 ms      418.9 ms … 436.7 ms
Boxed Function Pointer    537.9 ms ± 2.5 ms      536.1 ms … 544.0 ms
Async Function            407.6 ms ± 3.6 ms      403.7 ms … 411.6 ms
Boxed Async Function      4.985 s ± 0.090 s      4.922 s … 5.198 s
Source: https://github.com/byronwasti/async-fn-pointer-perf

What is (Probably) Going On?
■ Boxed async functions are an order of magnitude slower than boxed functions
■ The heap allocation for an async function includes the opaque state-machine struct the compiler generates
● A normal boxed function is just… a pointer on the heap
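One way to see the size difference is to measure the generated state machine directly. The sketch below is an illustration under assumptions, not the benchmark from the talk: `with_buffer`, `tiny`, and `future_size` are hypothetical names, and the exact sizes printed depend on the compiler version.

```rust
use std::future::Future;
use std::hint::black_box;
use std::mem::{size_of, size_of_val};

// Hypothetical async function that keeps a 1 KiB buffer alive across an
// await point, so the buffer has to be stored inside the state machine.
async fn with_buffer() -> u8 {
    let buf = [7u8; 1024];
    std::future::ready(()).await;
    black_box(buf)[0]
}

// Hypothetical async function with no state to carry.
async fn tiny() -> i32 {
    1
}

// Size in bytes of the compiler-generated future, i.e. what `Box::pin`
// would have to allocate for it.
fn future_size<F: Future>(fut: F) -> usize {
    size_of_val(&fut)
}

fn main() {
    println!("fn pointer:           {} bytes", size_of::<fn() -> i32>());
    println!("tiny() future:        {} bytes", future_size(tiny()));
    println!("with_buffer() future: {} bytes", future_size(with_buffer()));
}
```

A boxed async function pays for an allocation of this size on every call, whereas a function pointer stays pointer-sized no matter what the function body does.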

Alternative 1: `Box::pin()` at the Edge
■ Make use of Generics to have one `Box::pin()` call.
async fn load_test<T, F>(func: T)
where
    T: Fn() -> F,
    F: Future<Output=i32>,
{
    loop {
        func().await;
    }
}

// One allocation per load test instead of one per call
let arr: [Pin<Box<dyn Future<Output=()>>>; 2] =
    [Box::pin(load_test(foo)), Box::pin(load_test(bar))];

Generic Async Boxed 318.1 ms ± 1.2 ms 317.1 ms … 320.9 ms
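A runnable sketch of this approach follows. It assumes a bounded `iterations` parameter and a summed return value so the result can be checked. Those, the function bodies, and the minimal `block_on` executor are additions for illustration and are not part of the talk's benchmark.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn foo() -> i32 {
    1
}

async fn bar() -> i32 {
    2
}

// Generic over the function and its future type, so each call to `func`
// is statically dispatched and needs no allocation.
async fn load_test<T, F>(func: T, iterations: u32) -> i32
where
    T: Fn() -> F,
    F: Future<Output = i32>,
{
    let mut total = 0;
    for _ in 0..iterations {
        total += func().await;
    }
    total
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor, a stand-in for a real async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn run_all() -> Vec<i32> {
    // Type erasure happens once per load test, at the edge.
    let mut tests: Vec<Pin<Box<dyn Future<Output = i32>>>> = Vec::new();
    tests.push(Box::pin(load_test(foo, 10)));
    tests.push(Box::pin(load_test(bar, 10)));
    tests.into_iter().map(|test| block_on(test)).collect()
}

fn main() {
    println!("{:?}", run_all());
}
```

The two boxed futures still share one type and can live in the same collection, but the allocation count no longer grows with the number of calls.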

Alternative 2: Use an Enum
async fn load_test(func: Func) {
    loop {
        func.run().await;
    }
}

async fn bar() -> i32 {
    ...
}

enum Func {
    Foo,
    Bar,
}

impl Func {
    async fn run(&self) -> i32 {
        match self {
            Func::Foo => foo().await,
            Func::Bar => bar().await,
        }
    }
}

Generic Async Boxed       318.1 ms ± 1.2 ms      317.1 ms … 320.9 ms
Async Enum Dispatch       526.5 ms ± 0.8 ms      525.6 ms … 528.1 ms
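The enum approach can be sketched end to end as follows. As before, the bounded `iterations` parameter, the function bodies, and the minimal `block_on` executor are assumptions added so the example terminates and runs without an external runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn foo() -> i32 {
    1
}

async fn bar() -> i32 {
    2
}

// One variant per known scenario; the set must be fixed at compile time.
#[derive(Clone, Copy)]
enum Func {
    Foo,
    Bar,
}

impl Func {
    // The match replaces the function pointer: no boxing, no dyn dispatch.
    async fn run(&self) -> i32 {
        match self {
            Func::Foo => foo().await,
            Func::Bar => bar().await,
        }
    }
}

async fn load_test(func: Func, iterations: u32) -> i32 {
    let mut total = 0;
    for _ in 0..iterations {
        total += func.run().await;
    }
    total
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor, a stand-in for a real async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn run_all() -> Vec<i32> {
    [Func::Foo, Func::Bar]
        .iter()
        .map(|func| block_on(load_test(*func, 10)))
        .collect()
}

fn main() {
    println!("{:?}", run_all());
}
```

The trade-off is that a library cannot accept arbitrary user functions this way, since every function has to be listed in the enum.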

Alternative 3: Reset the Future
■ Used by the Tower rate-limiting functionality [1]
■ Unfortunately, there is no generic way to implement this
pub struct RateLimit {
    ...
    sleep: Pin<Box<Sleep>>,
}

impl RateLimit {
    fn call(&mut self) {
        ...
        // The service is disabled until further notice
        // Reset the sleep future in place, so that we don't have to
        // deallocate the existing box and allocate a new one.
        self.sleep.as_mut().reset(until);
    }
}
[1] https://docs.rs/tower/latest/src/tower/limit/rate/service.rs.html#106-109
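The same idea can be sketched without Tower or Tokio using a hand-written future. `Countdown`, its `reset` method, and `run_rounds` are hypothetical names invented for this illustration. They only mimic the shape of resetting a pinned, boxed future in place and are not Tower's implementation.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical future that is pending for `remaining` polls, then ready.
struct Countdown {
    remaining: u32,
}

impl Countdown {
    // Re-arms the future in place; the existing heap allocation is reused.
    fn reset(self: Pin<&mut Self>, polls: u32) {
        // `Countdown` is `Unpin`, so `get_mut` is available.
        self.get_mut().remaining = polls;
    }
}

impl Future for Countdown {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Awaits the same boxed future `rounds` times, resetting it after each
// completion. Returns the completed rounds and whether the box stayed put.
async fn run_rounds(rounds: u32) -> (u32, bool) {
    let mut timer: Pin<Box<Countdown>> = Box::pin(Countdown { remaining: 3 });
    let addr = &*timer as *const Countdown;
    let mut completed = 0;
    for _ in 0..rounds {
        timer.as_mut().await;
        completed += 1;
        timer.as_mut().reset(3);
    }
    (completed, std::ptr::eq(addr, &*timer))
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor, a stand-in for a real async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    println!("{:?}", block_on(run_rounds(5)));
}
```

This works only because `Countdown` exposes its own `reset`. Compiler-generated futures from `async fn` offer no such method, which is why the technique does not generalize.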

Implementing In Practice
■ Converted Balter to use generics (pushing the `Box::pin()` to the edge)
■ Saw no performance difference
■ Function calls are ridiculously fast; a 10x slowdown is… still really fast
■ There is a Storage RFC for Rust which may add new options in the future

Thank you!
Byron Wasti
p99@byronwasti.com
www.byronwasti.com
github.com/byronwasti
