High Performance Rust UI
Raph Levien
Google Fonts
2022-12-02
Outline
● Goals and motivations
○ Performance!
● Rethinking the entire stack
○ Layering profoundly affects performance
○ Drawing: GPU compute shaders (piet-gpu)
○ Reactive model: enable Rust async
● Intro to Xilem architecture
● Intro to piet-gpu 2D rendering engine
● Deeper dive into (painting) layers and (architectural) layering
● Next steps
A bit of history
● xi-editor (2016)
○ Explore multi-process approach, use “native GUI” for front end
○ Async added much complexity, ultimately a failure
● Druid (ca. 2019)
○ ECS-inspired architecture
○ Use platform drawing capabilities
● Druid “muggle” architecture
○ Works well when app data is modeled as tree, and model->view map is straightforward
○ Other patterns painful to express
○ Runebender font editor (currently on back burner)
● GPU drawing (piet-metal, piet-gpu)
○ Ongoing research program
Research questions to answer
● How much performance is possible over existing state of the art?
○ Break down quantitatively: where are the wins?
● Is current architectural layering a local optimum?
● Can we express UI excellently in Rust?
○ Subgoals:
■ Code is concise
■ Idiomatic to write performant UI
■ Components compose well
Rust
● Good news:
○ High level, performant, and safe
○ Strong commitment to portability
■ Windows, macOS, and Linux are first-class; many other targets work
○ Very active ecosystem
● Bad news:
○ Traditional GUI patterns cannot be expressed idiomatically in Rust
■ Shared mutable state is discouraged (but possible through RefCell etc)
■ No traditional object-oriented inheritance
○ Lots of fragmentation in GUI space (blog post)
■ Elm and immediate mode GUI are most popular architectural approaches
○ Not optimized for dev experience (compiles can be slow)
Performance
● Many dimensions to performance
○ 120 fps refresh (not just 60) without jank
○ Lower latency (frame pacing & closer coordination with compositor)
○ Fast startup time
○ Small binary size
● Strategies
○ Use Rust 🦀
○ Modern GPU centric drawing approach
○ Do expensive resource creation (image decompression, text layout) in threads
○ Minimize accidental complexity
○ Incremental computation
■ Avoid unnecessary work at all levels
Xilem architecture
fn app(count: &mut u32) -> impl View<u32> {
    column((
        format!("Count: {}", count),
        button("Increment", |count| *count += 1),
    ))
}
Xilem architecture
● For more details, read blog post
● A proposal for the most excellent way to express reactive systems in Rust
○ A simple yet flexible incremental computation engine
■ Intelligent scheduling of re-renders plus diffing (virtual “DOM”)
○ Based on trees (view tree, view state tree, widget tree)
○ Statically typed
■ Type of view state tree and widget tree is automatically inferred from view tree type
○ Optional type erasure
○ Inspired by SwiftUI
Xilem core concept: View trait
● Two associated types:
○ View state
○ Widget
● Three methods:
○ build
■ create widget and initialize state
○ rebuild
■ diff against previous view tree, mutate widget & state
○ event
■ target of event is id path (slice of unique id values)
■ choose child based on [0], traverse using [1..]
■ body of event is Box<dyn Any>
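A minimal sketch of how a trait with this shape might look, to make the two associated types and three methods concrete. It is not the actual Xilem definition: the signatures are simplified (no context parameter, ids are fixed child indices), and `Id`, `Button`, `ButtonWidget`, `Pair`, and `example` are hypothetical names introduced only for illustration.

```rust
use std::any::Any;

/// Hypothetical id type; an id path is a slice of these.
type Id = u64;

/// Simplified View trait; `T` is the app state type.
trait View<T> {
    type State;
    type Widget;
    /// Create the widget and initialize the view state.
    fn build(&self) -> (Self::State, Self::Widget);
    /// Diff against the previous view; mutate widget and state.
    /// Returns true if anything changed.
    fn rebuild(&self, prev: &Self, state: &mut Self::State, widget: &mut Self::Widget) -> bool;
    /// Route an event along the id path toward its target.
    fn event(&self, id_path: &[Id], state: &mut Self::State, body: Box<dyn Any>, app: &mut T);
}

struct ButtonWidget {
    label: String,
}

struct Button<F> {
    label: String,
    on_click: F,
}

impl<T, F: Fn(&mut T)> View<T> for Button<F> {
    type State = ();
    type Widget = ButtonWidget;
    fn build(&self) -> ((), ButtonWidget) {
        ((), ButtonWidget { label: self.label.clone() })
    }
    fn rebuild(&self, prev: &Self, _state: &mut (), widget: &mut ButtonWidget) -> bool {
        // Only touch the widget when the label actually changed.
        let changed = self.label != prev.label;
        if changed {
            widget.label = self.label.clone();
        }
        changed
    }
    fn event(&self, id_path: &[Id], _state: &mut (), _body: Box<dyn Any>, app: &mut T) {
        // A leaf is the target when no path elements remain.
        if id_path.is_empty() {
            (self.on_click)(app);
        }
    }
}

/// Hypothetical two-child container; children have ids 0 and 1.
struct Pair<A, B>(A, B);

impl<T, A: View<T>, B: View<T>> View<T> for Pair<A, B> {
    type State = (A::State, B::State);
    type Widget = (A::Widget, B::Widget);
    fn build(&self) -> (Self::State, Self::Widget) {
        let (sa, wa) = self.0.build();
        let (sb, wb) = self.1.build();
        ((sa, sb), (wa, wb))
    }
    fn rebuild(&self, prev: &Self, state: &mut Self::State, widget: &mut Self::Widget) -> bool {
        let a = self.0.rebuild(&prev.0, &mut state.0, &mut widget.0);
        let b = self.1.rebuild(&prev.1, &mut state.1, &mut widget.1);
        a | b
    }
    fn event(&self, id_path: &[Id], state: &mut Self::State, body: Box<dyn Any>, app: &mut T) {
        // Choose the child based on [0], traverse using [1..].
        match id_path.split_first() {
            Some((&0, rest)) => self.0.event(rest, &mut state.0, body, app),
            Some((&1, rest)) => self.1.event(rest, &mut state.1, body, app),
            _ => {}
        }
    }
}

/// Builds a small tree and delivers one event to the second child.
fn example() -> u32 {
    let mut count = 0u32;
    let view = Pair(
        Button { label: "Increment".to_string(), on_click: |c: &mut u32| *c += 1 },
        Button { label: "Add ten".to_string(), on_click: |c: &mut u32| *c += 10 },
    );
    let (mut state, _widgets) = View::<u32>::build(&view);
    View::<u32>::event(&view, &[1], &mut state, Box::new(()), &mut count);
    count
}
```

Note that the state and widget types of `Pair` are derived from the child view types, which is the static-typing property described on the previous slide.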
Xilem flow
[Diagram. Labels: app state; view tree 1 and view tree 2 along a time axis; build; rebuild (diff); view state; widget tree; event arrows]
Xilem async
● Key concept: async wake sent to view node as an event
● In Rust lingo, each Xilem node can be its own executor, sharing one reactor
● Minimize flashing for fast-resolving futures
○ Kick off paint cycle and start timer
○ Call app logic to render view tree, count pending async futures
○ Deliver async wakes, decrement on ready
○ When count->0, or timeout (~5ms), actually render
● Demo!
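The render-gating steps above can be sketched as a small piece of bookkeeping. This is an illustration under assumptions, not Xilem code: `PaintCycle` and its methods are hypothetical names, and the caller passes in the current time so the logic stays easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical bookkeeping for one paint cycle.
struct PaintCycle {
    /// Number of async futures still pending after rendering the view tree.
    pending: usize,
    /// Point in time after which we render regardless of pending futures.
    deadline: Instant,
}

impl PaintCycle {
    /// Kick off the paint cycle and start the timer.
    fn start(now: Instant, pending: usize, timeout: Duration) -> Self {
        PaintCycle { pending, deadline: now + timeout }
    }

    /// An async wake was delivered and the future resolved.
    fn future_ready(&mut self) {
        self.pending = self.pending.saturating_sub(1);
    }

    /// Render once nothing is pending, or once the timeout has elapsed.
    fn should_render(&self, now: Instant) -> bool {
        self.pending == 0 || now >= self.deadline
    }
}
```

With a timeout of about 5 ms, a future that resolves quickly is folded into the same frame, so the UI never paints the intermediate "loading" state.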
piet-gpu
● GPU-accelerated 2D drawing
● A pipeline of compute shaders
● Break scene into 16x16 pixel tiles
● Coarse rasterization builds a command list for each tile
● Fine rasterization is basically a bytecode interpreter
● Inspired by Spinel, PathFinder, academic work
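To make the tiling idea concrete, here is a CPU-side sketch of building one command list per 16x16 tile. It is a deliberate simplification: it bins draw objects by bounding box only, whereas piet-gpu runs as compute shaders and works from actual path coverage. `tile_range` and `coarse_raster` are hypothetical names.

```rust
const TILE_SIZE: u32 = 16;

/// Tile range [x0, y0, x1, y1) covered by a pixel bounding box
/// [min_x, min_y, max_x, max_y], clamped to the tile grid.
fn tile_range(bbox: [f32; 4], width: u32, height: u32) -> [u32; 4] {
    let tiles_w = (width + TILE_SIZE - 1) / TILE_SIZE;
    let tiles_h = (height + TILE_SIZE - 1) / TILE_SIZE;
    let ts = TILE_SIZE as f32;
    let clamp = |v: f32, max: u32| (v.max(0.0) as u32).min(max);
    [
        clamp((bbox[0] / ts).floor(), tiles_w),
        clamp((bbox[1] / ts).floor(), tiles_h),
        clamp((bbox[2] / ts).ceil(), tiles_w),
        clamp((bbox[3] / ts).ceil(), tiles_h),
    ]
}

/// For each tile (row-major), the indices of the draw objects touching it.
fn coarse_raster(bboxes: &[[f32; 4]], width: u32, height: u32) -> Vec<Vec<usize>> {
    let tiles_w = (width + TILE_SIZE - 1) / TILE_SIZE;
    let tiles_h = (height + TILE_SIZE - 1) / TILE_SIZE;
    let mut lists = vec![Vec::new(); (tiles_w * tiles_h) as usize];
    for (i, bbox) in bboxes.iter().enumerate() {
        let [x0, y0, x1, y1] = tile_range(*bbox, width, height);
        for y in y0..y1 {
            for x in x0..x1 {
                lists[(y * tiles_w + x) as usize].push(i);
            }
        }
    }
    lists
}
```

Fine rasterization would then walk each tile's list in order, which is why the slide compares it to a bytecode interpreter.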
piet-gpu pipeline
[Pipeline diagram. Stages shown: element processing, binning, coarse path raster, tile allocation, coarse winding number, coarse raster, fine raster. Data labels: path segments, other drawing ops]
piet-gpu scene encoding
● Conceptually, a tree of nodes
○ Path fill, path stroke (path = moveto, lineto, quadto, curveto, closepath)
○ Brushes:
■ Solid color
■ Linear/radial gradient
■ Image
○ Affine transform
○ Clip to path (affects children)
○ Blend (e.g. hard light, exclusion)
● Concretely, an array of binary-encoded streams
○ ~One stream for each type of node
○ Efficient (variable size) byte encoding for path segments
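One way to picture a variable-size byte encoding for path segments is a tag stream plus a separate coordinate stream, where each segment contributes one tag byte and only as many points as it needs. The tag values and the `PathEl` / `PathEncoding` types below are assumptions made for illustration; the actual piet-gpu encoding differs in its details.

```rust
/// Path elements as listed on the slide.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PathEl {
    MoveTo([f32; 2]),
    LineTo([f32; 2]),
    QuadTo([f32; 2], [f32; 2]),
    CurveTo([f32; 2], [f32; 2], [f32; 2]),
    ClosePath,
}

/// Hypothetical encoding: one byte per segment in `tags`,
/// little-endian f32 coordinates in `coords`.
#[derive(Default)]
struct PathEncoding {
    tags: Vec<u8>,
    coords: Vec<u8>,
}

impl PathEncoding {
    fn push(&mut self, el: PathEl) {
        // The tag values are arbitrary and chosen only for this sketch.
        let (tag, points): (u8, Vec<[f32; 2]>) = match el {
            PathEl::MoveTo(p) => (0, vec![p]),
            PathEl::LineTo(p) => (1, vec![p]),
            PathEl::QuadTo(a, b) => (2, vec![a, b]),
            PathEl::CurveTo(a, b, c) => (3, vec![a, b, c]),
            PathEl::ClosePath => (4, vec![]),
        };
        self.tags.push(tag);
        for [x, y] in points {
            self.coords.extend_from_slice(&x.to_le_bytes());
            self.coords.extend_from_slice(&y.to_le_bytes());
        }
    }
}
```

A close-path costs one byte, a line costs 9, and a cubic costs 25, so simple geometry stays small.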
piet-gpu scene fragment
● A struct of byte-encoded streams
● Operation of concatenating scene fragments is cheap
○ mostly just memcpy of the streams, plus some reference fixup
● Can build fragments in multiple threads (doesn’t require context)
● Changing transform doesn’t require re-encoding
○ Glyph outlines can be retained and re-rendered with different transforms
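The "memcpy plus reference fixup" idea can be sketched as follows. The stream names (`tag_stream`, `path_data`, `draw_refs`) are hypothetical and stand in for whatever streams the real fragment holds; the point is only that appending copies bytes and shifts offsets, with no re-encoding.

```rust
/// Hypothetical scene fragment: a struct of byte-encoded streams.
#[derive(Default, Clone, Debug)]
struct SceneFragment {
    tag_stream: Vec<u8>,
    path_data: Vec<u8>,
    /// Offsets into `path_data`, one per draw object.
    draw_refs: Vec<u32>,
}

impl SceneFragment {
    /// Concatenate `other` onto `self`.
    fn append(&mut self, other: &SceneFragment) {
        // Offsets in `other` are relative to its own path_data,
        // so they must be shifted by the current length.
        let base = self.path_data.len() as u32;
        self.tag_stream.extend_from_slice(&other.tag_stream);
        self.path_data.extend_from_slice(&other.path_data);
        self.draw_refs.extend(other.draw_refs.iter().map(|r| r + base));
    }
}
```

Because a fragment owns plain byte vectors and needs no drawing context, it can be built on any thread and appended later.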
piet-gpu WGSL port
● Shaders rewritten in WGSL
● Running on wgpu for native
● Running on WebGPU in browser (Chrome Canary)
● Workload is “advanced compute,” and we have had to overcome some issues
○ miscompilations
○ uniformity analysis
● Opportunity: collaborate with rest of wgpu ecosystem
○ Looking into possibilities with Bevy
Signature of paint method in widget trait
● Traditionally:
○ fn paint(&self, render_context: &mut RenderContext);
○ Sometimes with lots of additional mechanism for layers (Flutter repaint boundaries)
● Proposed (conceptual):
○ fn paint(&self) -> SceneFragment;
● Implications are profound
○ Retain scene fragment and avoid re-rendering (similar to Flutter layers)
○ Distribute rendering to multiple threads
■ Mutable render context forces single threading
○ Maybe: fine-grained update of fragments (but maybe re-encoding is good enough)
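A sketch of why the proposed signature enables multithreading: with no shared `&mut RenderContext`, each widget can paint on its own thread and the results are combined afterwards. `Label`, `paint_all`, and the single-stream `SceneFragment` here are hypothetical stand-ins, and spawning one thread per widget is only for illustration; a real implementation would more likely use a thread pool.

```rust
use std::thread;

/// Stand-in for a scene fragment; a real one would hold several streams.
#[derive(Debug, Default, PartialEq)]
struct SceneFragment {
    bytes: Vec<u8>,
}

/// Conceptual widget trait with the proposed paint signature.
trait Widget: Sync {
    fn paint(&self) -> SceneFragment;
}

struct Label(String);

impl Widget for Label {
    fn paint(&self) -> SceneFragment {
        // Stand-in encoding: just the label's bytes.
        SceneFragment { bytes: self.0.as_bytes().to_vec() }
    }
}

/// Paint every widget on its own scoped thread, then concatenate in order.
fn paint_all(widgets: &[Box<dyn Widget>]) -> SceneFragment {
    let fragments: Vec<SceneFragment> = thread::scope(|s| {
        // Spawn all threads first so painting overlaps, then join in order.
        let handles: Vec<_> = widgets
            .iter()
            .map(|w| s.spawn(move || w.paint()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("paint thread panicked"))
            .collect()
    });
    let mut scene = SceneFragment::default();
    for fragment in fragments {
        scene.bytes.extend_from_slice(&fragment.bytes);
    }
    scene
}
```

The same structure also supports retention: a widget whose appearance has not changed can hand back a cached fragment instead of re-encoding.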
Scene fragments vs layers (compositing)
● Similar goals: reduce re-rendering work when no change
● Compositing downside: variance in rendering time
○ GPU resource allocation & memory bandwidth on first render/re-render
○ Extra GPU RAM for retained textures is also an issue!
● Compositing downside: encourage compositor-friendly visual effects
○ Ok: translation (scrolling/sliding), alpha fading
○ Not ok: animated vector shapes, variable font animation, high-quality smooth zoom
● Goal of piet-gpu: no performance cliffs
○ Compositing happens in GPU vector registers, avoiding memory traffic
○ That includes clips, soft masks, blends
A bit about the widget tree
● Fairly traditional retained architecture
○ Major goal: support accessibility
○ Beginnings of integration with AccessKit
● Layout is Flutter-inspired
○ Explored a more SwiftUI-like approach
○ Successful prototype of our version of GeometryReader
■ dependency edge from layout to app logic
AccessKit integration
● AccessKit is a portable abstraction layer over platform accessibility APIs
○ Mac and Windows back-ends working, Linux in progress
○ Written in Rust + bindings for other languages
● Lazy instantiation of accessibility tree
○ accesskit_tree() called when AT (screen reader) is connected
● Incremental update
○ Triggered by changes to the widget tree
○ Widget calls update_access_kit_if_active(), updates delta in closure
○ egui & prototype Druid integrations built entire tree then diffed
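The lazy-plus-incremental pattern can be sketched without touching the real AccessKit API. Everything below (`AccessState`, `TreeDelta`, `update_if_active`, `take_delta`) is a hypothetical illustration of the idea: widgets record changes through a closure, and the closure is skipped entirely when no assistive technology is connected.

```rust
use std::collections::BTreeMap;

type NodeId = u64;

/// Hypothetical delta: only the nodes that changed since the last update.
#[derive(Default, Debug)]
struct TreeDelta {
    changed: BTreeMap<NodeId, String>,
}

#[derive(Default)]
struct AccessState {
    /// Set once an assistive technology connects and requests the tree.
    active: bool,
    pending: TreeDelta,
}

impl AccessState {
    /// Called when AT connects; a real integration would also build
    /// the initial tree at this point.
    fn activate(&mut self) {
        self.active = true;
    }

    /// Run `f` to record changes, but only if accessibility is active.
    fn update_if_active(&mut self, f: impl FnOnce(&mut TreeDelta)) {
        if self.active {
            f(&mut self.pending);
        }
    }

    /// Hand the accumulated delta to the platform layer and reset it.
    fn take_delta(&mut self) -> TreeDelta {
        std::mem::take(&mut self.pending)
    }
}
```

Compared with building the entire tree and diffing it, the cost here is zero when inactive and proportional to the number of changed widgets when active.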
Layered architecture
● Rebuilding bottom-to-top, but layers could be repurposed
● Bottom layer: GPU-accelerated drawing
○ scene fragments allow multithreading, but could support an immediate-mode API
● Middle layer: fairly standard retained widget tree
○ could be driven by scripting language bindings
● Top layer: Rust-centric reactive architecture
○ Other architectures to consider:
■ Makepad has a dual-dispatch tweak on immediate mode
■ Dioxus tries to adapt React to Rust
■ Sycamore tries to adapt SolidJS to Rust
○ Xilem can support language bindings (thanks to type erasure)
○ Demo: python
Problems out of scope (for now)
● Theming/styling
○ Simple key/value indirection in place for now
● Higher level description of animations
○ Get lower levels in place first, be able to play what we’re given
● Design tools
○ Arguably make/break for product success
○ Open question: import from other tools?
● Interaction with system compositor
○ Necessary for good video playback
○ Very difficult portability and compatibility problems
○ See blog post outline for deeper discussion
Current status
● Components exist as prototypes
○ xilem repo has working prototype
■ Existing Druid widget tree will be adapted
○ piet-gpu
○ xilemweb (not yet public)
● Active development effort on piet-gpu
● Significant open source community interest
○ Zulip instance is home base for community
Next steps
● Will rebuild Druid widget set in Xilem architecture
○ Good opportunity for people to contribute
● Follow progress
○ raphlinus blog
○ xi.zulipchat.com
○ Weekly office hours (8am Pacific time Wednesdays)
○ Major goal: explain how and why, not just crank out code
● Non-goals (for now):
○ This is still research, not a product
○ Build things in it to learn, not to ship
