Bratislava, Slovakia. June 3-5, 2024
Harnessing WebAssembly
for Real-time Stateless
Streaming Pipelines
Christina Lin
Online SaaS Services
AWS, GCP
Control
Plane
AWS, GCP
AWS, GCP
AWS, GCP
AWS, GCP
A
A
A
A
Not all brokers are at its full capacity!
Consumer
Broker
Data Ping-Pong
Data
Pipeline
Data
Pipeline
Efficient
Safe
Great UX
</>
C++
Memory
Thread
P P P P P P P P P
P P P P P P P P P
P P P P P P P P P
Client
Streaming
Parallelism
Partitions
Producer
Consumer
Consumer
Consumer
Efficient
Safe
Great UX
</>
C++
Memory
Thread
JavaScript
C/C++
Rust
Go
01110101101
11010101010
01101010110
11100010101
WebAssembly
Browser
Sandbox
Modules Memory Table
pointers
values
Function
Safe
WebAssembly System Interface
Portable Operating System Interface
File system Network Environment Variable
Clock Clock Arguments
No right to access resource beyond sandbox
• Gas metering in CPU
• Memory, restrict memory used.
• Pre-allocating memory
Core 1 Core 2
wasm
wasm
VM
mem
VM
mem mem
Broker
void displaylog(int n);
int max(int num1, int num2) {
if (num1 > num2)
result = num1;
else result = num2;
displaylog(result);
return result;
}
(module
(type $FUNCSIG$vi (func (param i32)))
(import "env" "displaylog" (func $displaylog (param
i32)))
(table 0 anyfunc)
(memory $0 1)
(export "memory" (memory $0))
(export "max" (func $max))
(func $max (; 1 ;) (param $0 i32) (param $1 i32)(result
i32)
(call $displaylog
(tee_local $0
(select
(get_local $0)
(get_local $1)
(i32.gt_s (get_local $0) (get_local $1))
)
)
)
(get_local $0)
)
0055fa0 6974 656d 632e 6f6c 656e 7453 6972 676e
0055fb0 01a8 7409 6d69 2e65 6144 6574 01a9 2817
0055fc0 742a 6d69 2e65 6f4c 6163 6974 6e6f 2e29
0055fd0 6f6c 6b6f 7075 01aa 2813 742a 6d69 2e65
0055fe0 6954 656d 2e29 6461 5364 6365 01ab 740a
0055ff0 6d69 2e65 7a74 6573 ac74 0e01 6974 656d
0056000 462e 7869 6465 6f5a 656e 01ad 2814 742a
0056010 6d69 2e65 6f4c 6163 6974 6e6f 2e29 6567
0056020 ae74 1101 6974 656d 612e 6f74 5b69 7473
0056030 6972 676e af5d 1d01 6974 656d 702e 7261
0056040 6573 614e 6f6e 6573 6f63 646e 5b73 7473
0056050 6972 676e b05d 1601 6974 656d 702e 7261
0056060 6573 6953 6e67 6465 664f 7366 7465 01b1
0056070 7411 6d69 2e65 6f6c 6461 6f4c 6163 6974
0056080 6e6f 01b2 740f 6d69 2e65 6f6c 6461 7a54
0056090 6e69 6f66 01b3 741b 6d69 2e65 6f4c 6461
00560a0 6f4c 6163 6974 6e6f 7246 6d6f 5a54 6144
00560b0 6174 01b4 7409 6d69 2e65 706f 6e65 01b5
00560c0 740b 6d69 2e65 7270 6165 6e64 01b6 2813
00560d0 742a 6d69 2e65 6164 6174 4f49 2e29 6572
00560e0 6461 01b7 740e 6d69 2e65 7a74 6573 4e74
00560f0 6d61 b865 1001 6974 656d 742e 737a 7465
0056100 664f 7366 7465 01b9 740e 6d69 2e65 7a74
0056110 6573 5274 6c75 ba65 0c01 6974 656d 612e
0056120 7362 6144 6574 01bb 740f 6d69 2e65 7a74
0056130 7572 656c 6954 656d 01bc 740d 6d69 2e65
0056140 7a74 6573 4e74 6d75 01bd 7417 6d69 2e65
Efficient
Transforms software development kit
func myTransform(evt redpanda.WriteEvent) ([]redpanda.Record, error) {
var data struct {...}
if err := xml.Unmarshal(evt.Record().Value, &data); err != nil {
return nil, err
}
v, err := json.Marshal(&data)
if err != nil {
return nil, err
}
return redpanda.Record{
Key: evt.Record().Key,
Value: v,
}, nil
}
Great
UX
rpk
cloud login
Choose my fav language!
Builds the
WebAssembly module
Define transformation rules
rpk transform build
rpk transform init
rpk transform deploy
--input-topic=customer
--output-topic=customer_masked
Deploy transformation to cluster
customer
customer_masked
customer
customer_masked
customer
customer_masked
Replicate
across
clusters
Stateless
Streaming Pipeline
Transform
format Change, masking, filtering, validating
Dispatch, Wiretap
Spilt, multiple destination
Control
reroute
Normalize/ Denormalize Enrich
Multiple ingestion
Stateful
Streaming Pipeline
Complex event processing
Time-window based
processing
Enrich
Multiple ingestion
Micro batch Pipeline
Transform for large output (Dataset)
Partitioning Split workload
Analytics
batch Pipeline
Analytics large volume
(legacy)
Transform large output (Dataset, legacy)
Transport large unstructured data
Better scalability for
pipelines
Redpanda Data Transform
Stateless
Streaming Pipeline
Transform
format Change, masking, filtering, validating
Dispatch, Wiretap
Spilt, multiple destination
Control
reroute
Normalize/ Denormalize
Enrich
Multiple ingestion
WASM
WebAssembly
Binary instruction format for a stacked-based VM.
Portable compilation
Go
Rust
JS
Python
Ruby
Demo
Redpanda University
Free, self-paced online learning
https://university.redpanda.com
•Learn the fundamentals of data streaming and Redpanda
•Install Redpanda and use the rpk CLI to configure it
•Create producers and consumers
in Java, Python and NodeJS
•Sign up today for free!
Questions?
LinkedIn: www.linkedin.com/in/weimeilin
X: @Christina_wm
Email: christina@redpanda.com

Harnessing WebAssembly for Real-time Stateless Streaming Pipelines

  • 1.
    Bratislava, Slovakia. June3-5, 2024 Harnessing WebAssembly for Real-time Stateless Streaming Pipelines Christina Lin
  • 2.
    Online SaaS Services AWS,GCP Control Plane AWS, GCP AWS, GCP AWS, GCP AWS, GCP A A A A Not all brokers are at its full capacity!
  • 3.
  • 4.
  • 5.
    P P PP P P P P P P P P P P P P P P P P P P P P P P P Client Streaming
  • 6.
  • 7.
  • 8.
  • 10.
    Browser Sandbox Modules Memory Table pointers values Function Safe WebAssemblySystem Interface Portable Operating System Interface File system Network Environment Variable Clock Clock Arguments No right to access resource beyond sandbox
  • 11.
    • Gas meteringin CPU • Memory, restrict memory used. • Pre-allocating memory Core 1 Core 2 wasm wasm VM mem VM mem mem Broker
  • 12.
    void displaylog(int n); intmax(int num1, int num2) { if (num1 > num2) result = num1; else result = num2; displaylog(result); return result; } (module (type $FUNCSIG$vi (func (param i32))) (import "env" "displaylog" (func $displaylog (param i32))) (table 0 anyfunc) (memory $0 1) (export "memory" (memory $0)) (export "max" (func $max)) (func $max (; 1 ;) (param $0 i32) (param $1 i32)(result i32) (call $displaylog (tee_local $0 (select (get_local $0) (get_local $1) (i32.gt_s (get_local $0) (get_local $1)) ) ) ) (get_local $0) ) 0055fa0 6974 656d 632e 6f6c 656e 7453 6972 676e 0055fb0 01a8 7409 6d69 2e65 6144 6574 01a9 2817 0055fc0 742a 6d69 2e65 6f4c 6163 6974 6e6f 2e29 0055fd0 6f6c 6b6f 7075 01aa 2813 742a 6d69 2e65 0055fe0 6954 656d 2e29 6461 5364 6365 01ab 740a 0055ff0 6d69 2e65 7a74 6573 ac74 0e01 6974 656d 0056000 462e 7869 6465 6f5a 656e 01ad 2814 742a 0056010 6d69 2e65 6f4c 6163 6974 6e6f 2e29 6567 0056020 ae74 1101 6974 656d 612e 6f74 5b69 7473 0056030 6972 676e af5d 1d01 6974 656d 702e 7261 0056040 6573 614e 6f6e 6573 6f63 646e 5b73 7473 0056050 6972 676e b05d 1601 6974 656d 702e 7261 0056060 6573 6953 6e67 6465 664f 7366 7465 01b1 0056070 7411 6d69 2e65 6f6c 6461 6f4c 6163 6974 0056080 6e6f 01b2 740f 6d69 2e65 6f6c 6461 7a54 0056090 6e69 6f66 01b3 741b 6d69 2e65 6f4c 6461 00560a0 6f4c 6163 6974 6e6f 7246 6d6f 5a54 6144 00560b0 6174 01b4 7409 6d69 2e65 706f 6e65 01b5 00560c0 740b 6d69 2e65 7270 6165 6e64 01b6 2813 00560d0 742a 6d69 2e65 6164 6174 4f49 2e29 6572 00560e0 6461 01b7 740e 6d69 2e65 7a74 6573 4e74 00560f0 6d61 b865 1001 6974 656d 742e 737a 7465 0056100 664f 7366 7465 01b9 740e 6d69 2e65 7a74 0056110 6573 5274 6c75 ba65 0c01 6974 656d 612e 0056120 7362 6144 6574 01bb 740f 6d69 2e65 7a74 0056130 7572 656c 6954 656d 01bc 740d 6d69 2e65 0056140 7a74 6573 4e74 6d75 01bd 7417 6d69 2e65 Efficient
  • 13.
    Transforms software developmentkit func myTransform(evt redpanda.WriteEvent) ([]redpanda.Record, error) { var data struct {...} if err := xml.Unmarshal(evt.Record().Value, &data); err != nil { return nil, err } v, err := json.Marshal(&data) if err != nil { return nil, err } return redpanda.Record{ Key: evt.Record().Key, Value: v, }, nil } Great UX
  • 14.
    rpk cloud login Choose myfav language! Builds the WebAssembly module Define transformation rules rpk transform build rpk transform init rpk transform deploy --input-topic=customer --output-topic=customer_masked Deploy transformation to cluster customer customer_masked customer customer_masked customer customer_masked Replicate across clusters
  • 15.
    Stateless Streaming Pipeline Transform format Change,masking, filtering, validating Dispatch, Wiretap Spilt, multiple destination Control reroute Normalize/ Denormalize Enrich Multiple ingestion Stateful Streaming Pipeline Complex event processing Time-window based processing Enrich Multiple ingestion Micro batch Pipeline Transform for large output (Dataset) Partitioning Split workload Analytics batch Pipeline Analytics large volume (legacy) Transform large output (Dataset, legacy) Transport large unstructured data Better scalability for pipelines
  • 16.
    Redpanda Data Transform Stateless StreamingPipeline Transform format Change, masking, filtering, validating Dispatch, Wiretap Spilt, multiple destination Control reroute Normalize/ Denormalize Enrich Multiple ingestion WASM WebAssembly Binary instruction format for a stacked-based VM. Portable compilation Go Rust JS Python Ruby
  • 17.
  • 18.
    Redpanda University Free, self-pacedonline learning https://university.redpanda.com •Learn the fundamentals of data streaming and Redpanda •Install Redpanda and use the rpk CLI to configure it •Create producers and consumers in Java, Python and NodeJS •Sign up today for free!
  • 19.

Editor's Notes

  • #2 If you add speaker notes, you would be able to see them on the computer on the podium, but we will not have floor screens for you to see them while walking around.
  • #11 Clock: Allows using timers and accessing system time Arguments: Inputs for int main(int argc, char* argv[]) Environment Variables:  Define a custom environment Random: Access to /dev/urandom Std{in,out,err}: Allow access to standard inputs and outputs Filesystem: fd_write, fd_read, path_open, and friends Socket: sock_accept, sock_recv, sock_send Also - it's lightweight, for example, Wasm micro runtime (WAMR) is on the order of kilobytes of memory and code size overhead
  • #12 does not hog up CPU, if run too long, abort prevent fragmentation from becoming an issue when deploying transforms to a long running cluster we reserve memory on system startup, and plug them into Wasm instances using custom allocators.