WebAssembly, also known as Wasm, is a binary format for representing executable code, designed to be easily embeddable into other projects. Its ease of integration, performance and popularity also make it a strong candidate for a user-defined functions (UDFs) back-end. ScyllaDB already supports user-defined functions expressed in WebAssembly in experimental mode, based on Wasmtime, an open-source runtime written natively in Rust.
This talk will cover a few examples of how to create Wasm functions in ScyllaDB, how to combine them into powerful user-defined aggregates, and what the future plans are for integrating with Wasmtime and Rust even further.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
2. Piotr Sarna
■ software engineer keen on open-source projects, C++ and Rust
■ used to develop a distributed file system (LizardFS)
■ wrote a few patches for the Linux kernel
■ graduated from University of Warsaw with MSc in Computer Science
■ maintainer of the Scylla Rust Driver project
Principal Software Engineer @ScyllaDB
3. WebAssembly
Binary format for expressing executable code, executed on a stack-based virtual
machine. Designed to be:
■ portable
■ easily embeddable
■ efficient
WebAssembly is binary, but it also specifies a standard human-readable format:
WAT (WebAssembly Text Format).
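To make "stack-based virtual machine" concrete, here is a minimal sketch in Rust of how such a machine evaluates a flat instruction sequence. The `Instr` enum and `run` function are hypothetical names invented for this illustration; they borrow instruction names from WebAssembly but cover only a tiny subset and are not how Wasmtime is implemented.

```rust
/// A tiny subset of instructions, named after their WebAssembly counterparts.
#[derive(Debug, Clone, Copy)]
pub enum Instr {
    I64Const(i64),
    I64Add,
    I64Sub,
    I64Mul,
}

/// Runs the instructions on an operand stack and returns the value left on top.
/// Returns None if an operator finds too few operands or the stack ends up empty.
pub fn run(program: &[Instr]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for instr in program {
        match *instr {
            // Constants are pushed onto the operand stack.
            Instr::I64Const(v) => stack.push(v),
            Instr::I64Add | Instr::I64Sub | Instr::I64Mul => {
                // Binary operators pop the right operand first, then the left.
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                // WebAssembly integer arithmetic wraps around on overflow.
                stack.push(match *instr {
                    Instr::I64Add => lhs.wrapping_add(rhs),
                    Instr::I64Sub => lhs.wrapping_sub(rhs),
                    Instr::I64Mul => lhs.wrapping_mul(rhs),
                    Instr::I64Const(_) => unreachable!(),
                });
            }
        }
    }
    stack.pop()
}
```

For example, the sequence `I64Const(5), I64Const(1), I64Sub` leaves 4 on the stack, which mirrors how `(i64.sub (local.get $n) (i64.const 1))` is evaluated once the text format is flattened into instructions.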
4. Runtime of choice: Wasmtime
A variety of WebAssembly engines are available for embedding into C++ projects
■ Wasmtime
• implemented in Rust
• WebAssembly only
• lightweight (esp. compared to v8)
• has bindings for C/C++
• native support for yielding
■ v8
• implemented in C++
• supports JavaScript too
• a heavy dependency
• no direct support for yielding the execution to reduce latency
5. Runtime of choice: Wasmtime
For an initial implementation, we chose Wasmtime and its C++ bindings, libwasmtime.
The next step is to drop the bindings because of their incomplete feature set,
and instead write the UDF support in Rust and compile it directly into Scylla.
7. How to code in WebAssembly?
Option 2: write in C, compile with clang
int fib(int n) {
    if (n < 2) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
clang -O2 --target=wasm32 --no-standard-libraries -Wl,--export-all -Wl,--no-entry fib.c -o fib.wasm
wasm2wat fib.wasm > fib.wat
8. How to code in WebAssembly?
Option 3: Rust!
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn fib(n: i32) -> i32 {
    if n < 2 {
        n
    } else {
        fib(n - 1) + fib(n - 2)
    }
}
rustup target add wasm32-unknown-unknown
cargo build --target wasm32-unknown-unknown
wasm2wat target/wasm32-unknown-unknown/debug/fib.wasm > fib.wat
9. How to code in WebAssembly?
Option 4: AssemblyScript
export function fib(n: i32): i32 {
  if (n < 2) {
    return n
  }
  return fib(n - 1) + fib(n - 2)
}
asc fib.ts --textFile fib.wat --optimize
source: https://www.assemblyscript.org/introduction.html
10. User-defined functions
User-defined functions are a CQL feature that allows applying a custom function
to the query result rows.
cassandra@cqlsh:ks> SELECT id, inv(id), mult(id, inv(id)) FROM t;
 id | ks.inv(id) | ks.mult(id, ks.inv(id))
----+------------+-------------------------
  7 |   0.142857 |                       1
  1 |          1 |                       1
  0 |   Infinity |                     NaN
  4 |       0.25 |                       1
(4 rows)
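The slide does not show how inv and mult are defined. As a sketch, assuming both operate on double-precision floats (the Rust signatures below are an assumption, not the talk's definitions), the Infinity and NaN in the row for id = 0 follow directly from IEEE 754 arithmetic:

```rust
/// Multiplicative inverse; assumed to work on doubles.
/// Dividing 1.0 by 0.0 yields positive infinity rather than an error.
pub fn inv(x: f64) -> f64 {
    1.0 / x
}

/// Plain multiplication; 0.0 multiplied by infinity is NaN under IEEE 754.
pub fn mult(a: f64, b: f64) -> f64 {
    a * b
}
```

With these definitions, `inv(4.0)` is 0.25, `inv(0.0)` is infinite, and `mult(0.0, inv(0.0))` is NaN, matching the last two columns of the output above.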
11. User-defined aggregates
A powerful tool for combining functions into accumulators, which aggregate
results from single rows.
cassandra@cqlsh:ks> SELECT * FROM words;
word
------------
monkey
rhinoceros
dog
(3 rows)
cassandra@cqlsh:ks> SELECT avg_length(word) FROM words;
ks.avg_length(word)
-----------------------------------------------
The average string length is 6.3333333333333!
(1 rows)
CREATE FUNCTION accumulate_len(acc tuple<bigint,bigint>, a text)
    RETURNS NULL ON NULL INPUT
    RETURNS tuple<bigint,bigint>
    LANGUAGE lua AS 'return {acc[1] + 1, acc[2] + #a}';
CREATE OR REPLACE FUNCTION present(res tuple<bigint,bigint>)
    RETURNS NULL ON NULL INPUT
    RETURNS text
    LANGUAGE lua AS
    'return "The average string length is " .. res[2]/res[1] .. "!"';
CREATE OR REPLACE AGGREGATE avg_length(text)
    SFUNC accumulate_len
    STYPE tuple<bigint,bigint>
    FINALFUNC present INITCOND (0,0);
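The aggregate above is a fold: the state starts at INITCOND, SFUNC is applied once per row, and FINALFUNC turns the final state into the result. The following Rust sketch mirrors the Lua functions to show that flow. It only illustrates the semantics; the `State` alias and the `avg_length` driver are names made up for this example and say nothing about how Scylla executes aggregates.

```rust
/// Aggregate state: (number of rows seen, total length in bytes).
pub type State = (i64, i64);

/// Counterpart of INITCOND (0,0).
pub const INITCOND: State = (0, 0);

/// Counterpart of SFUNC accumulate_len: folds one row into the state.
/// Like Lua's `#a`, `str::len` counts bytes, not characters.
pub fn accumulate_len(acc: State, a: &str) -> State {
    (acc.0 + 1, acc.1 + a.len() as i64)
}

/// Counterpart of FINALFUNC present: formats the final state.
pub fn present(res: State) -> String {
    format!("The average string length is {}!", res.1 as f64 / res.0 as f64)
}

/// Hypothetical driver showing the order in which the pieces are applied.
pub fn avg_length<'a>(rows: impl IntoIterator<Item = &'a str>) -> String {
    present(rows.into_iter().fold(INITCOND, accumulate_len))
}
```

For the three words in the table, the state ends up as (3, 19), so the average is 19/3. Rust prints more digits of the quotient than Lua does, so the formatted string differs slightly in its tail.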
12. User-defined aggregates
Possible scenarios for user-defined aggregates:
■ gathering statistical data: variance, standard deviation, percentiles, etc.
■ combining multiple rows into a new format, e.g. JSON or XML
■ custom predicates, e.g. "return 10 highest values"
■ you name it!
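As an illustration of the "return 10 highest values" scenario, a state function could keep a bounded, sorted list as its accumulator. This is a sketch only: `accumulate_top` and the constant `N` are hypothetical names, and a real UDA would express the state as a CQL type such as a list.

```rust
/// How many of the highest values to keep.
pub const N: usize = 10;

/// State function: inserts `value` into `acc`, which is kept sorted in
/// descending order, then drops everything beyond the N highest values.
pub fn accumulate_top(mut acc: Vec<i64>, value: i64) -> Vec<i64> {
    // Index of the first element that is smaller than `value`.
    let pos = acc.partition_point(|&x| x >= value);
    acc.insert(pos, value);
    acc.truncate(N);
    acc
}
```

Folding the values 1 through 25 with this function, starting from an empty list, leaves 25 down to 16 in the state; the state never holds more than N elements, regardless of how many rows are scanned.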
13. UDF coded with Wasm
Creating a user-defined function with Wasm is as easy as providing its source code
represented in WebAssembly Text Format:
CREATE FUNCTION fib(input bigint) RETURNS NULL ON NULL INPUT RETURNS bigint
LANGUAGE xwasm AS
'(module
  (func $fib (param $n i64) (result i64)
    (if
      (i64.lt_s (local.get $n) (i64.const 2))
      (then (return (local.get $n)))
    )
    (i64.add
      (call $fib (i64.sub (local.get $n) (i64.const 1)))
      (call $fib (i64.sub (local.get $n) (i64.const 2)))
    )
  )
  (export "fib" (func $fib))
)';
cassandra@cqlsh:ks> SELECT n, fib(n) FROM numbers;
 n | ks.fib(n)
---+-----------
 1 |         1
 2 |         1
 3 |         2
 4 |         3
 5 |         5
 6 |         8
 7 |        13
 8 |        21
 9 |        34
(9 rows)
14. UDF coded with Wasm
The interface for expressing CQL types, return values, NULL values and many other
details is explained thoroughly in a public design doc:
https://github.com/scylladb/scylla/blob/master/docs/design-notes/wasm.md
15. Try it out!
Support for Wasm-based user-defined functions and
user-defined aggregates is already available
in experimental mode.
Enable it for testing today by adding these entries
to your scylla.yaml configuration file:
enable_user_defined_functions: true
experimental_features:
- udf
Scylla currently supports Lua and Wasm for
user-defined functions.