
WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon

Presentation WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

  1. Rendering Web Content @ 60FPS
     Vangelis Kokkevis & Brian Salomon
     vangelis@google.com, bsalomon@google.com
  2. Google Chrome
     ● Recently celebrated Chrome’s fifth anniversary!
     ● Hundreds of millions of active users
     ● Cross platform:
       ○ Windows (XP+), Mac, Linux
       ○ Chrome OS (x86 and ARM), Android, iOS (*)
     ● Open source: Chromium and Blink
     ● Rapid release cycle, four channels (canary, dev, beta, stable)
     ● Core Principles: Speed, Security, Stability, Simplicity
     | RENDERING WEB CONTENT AT 60FPS | NOVEMBER 12, 2013 | CONFIDENTIAL
  3. Chrome’s Multi-Process Architecture (pre-GPU)
     [Diagram labels: User Input; Browser; Screen; Shared Memory; three Renderer processes, each containing V8 (JavaScript), Blink (Web Renderer), and Skia (2D graphics)]
  4. Why use the GPU?
     ● Enable new platform features:
       ○ 3D CSS, WebGL
     ● Speed & Responsiveness
       ○ Less jank: Smoother scrolling, 60fps CSS animations
       ○ Page “sticks to your finger”
       ○ Faster <canvas>, <video>
  5. Accelerated Compositing
     Re-rasterizing is expensive and should be avoided if possible.
     Caching rasterized contents into textures is an effective way to reduce raster costs.
     Split the page contents into layers, use the GPU to composite them.
     What gets a layer?
     ● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
     ● Content that is expected to change infrequently:
       ○ CSS transform and opacity animations
       ○ Overflow scroll
       ○ Fixed position elements
     ● Content that overlaps other composited content
  6. Compositing Layers
     [Figure only]
  7. The Rendering Pipeline
     [Diagram: User Input or Timer Event → Run Script → Re-Layout Document → Rasterize Invalidated Content → Upload New Content to Textures → Draw Textured Quads. Other labels: “(if needed)”, “Compositor”, “< 16ms”]
  8. Tiling
     Large content layers get tiled
     ● Layer split up into 256 x 256 or 512 x 512 pixel tiles
     ● Cache rasterized contents in manageable chunks to
       ○ Speed up scrolling
       ○ Conserve VRAM
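The tiling scheme on this slide can be sketched in a few lines. This is only an illustration, not Chrome's implementation: the function names `tile_grid` and `visible_tiles` and the example layer and viewport sizes are assumptions made for the sketch.

```python
import math


def tile_grid(layer_w, layer_h, tile=256):
    """Return (columns, rows) of square tiles needed to cover a layer."""
    return math.ceil(layer_w / tile), math.ceil(layer_h / tile)


def visible_tiles(viewport, tile=256):
    """Return (col, row) indices of tiles that intersect the viewport.

    `viewport` is (x, y, width, height) in layer pixel coordinates.
    """
    x, y, w, h = viewport
    first_col, first_row = x // tile, y // tile
    last_col, last_row = (x + w - 1) // tile, (y + h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A hypothetical 1024 x 10000 pixel layer scrolled to y = 3000.
print(tile_grid(1024, 10000))                      # (4, 40)
print(len(visible_tiles((0, 3000, 1024, 768))))    # 16
```

Only the 16 tiles under the viewport need to be resident to draw the frame, which is why tiling both speeds up scrolling and conserves VRAM compared with caching the whole 160-tile layer.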
  9. Tiling Example
     [Figure only]
  10. GPU Architecture
      [Diagram labels: Browser; Screen; Shared Memory; Renderer with Blink (WebGL), Skia (Canvas), and Compositor; three CMD ringbuffers; three Transfer buffers; GLES2 Client; GPU Process with GLES2 Service; ANGLE (GL ES -> D3D)]
  11. The Challenge
      Ideally…
      [Timeline: each 16ms frame runs JS, Layout, Rasterize, Upload, Draw]
  12. The Challenge
      In practice...
      [Timeline: JS, Layout, Rasterize, Upload, Draw spread across two 16ms intervals]
  13. Threaded Compositing
      Solution: Move compositing to its own thread
      [Timeline: Main Thread runs JS, Layout, Rasterize; Compositor Thread runs Upload, Draw in each 16ms interval]
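The 16ms figure in these timelines is the per-frame budget at 60 frames per second (1000 ms / 60, about 16.67 ms). The sketch below checks that budget for a single-threaded pipeline and for a compositor thread that only uploads and draws. The stage timings are hypothetical numbers chosen for illustration, not measurements from the talk.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60 FPS


def fits_budget(stage_ms, stages):
    """Return (fits, total_ms) for running `stages` back to back in one frame."""
    total = sum(stage_ms[name] for name in stages)
    return total <= FRAME_BUDGET_MS, total


# Hypothetical per-stage costs in milliseconds.
stage_ms = {"JS": 6.0, "Layout": 4.0, "Rasterize": 8.0, "Upload": 2.0, "Draw": 3.0}

single_thread = fits_budget(stage_ms, ["JS", "Layout", "Rasterize", "Upload", "Draw"])
compositor_thread = fits_budget(stage_ms, ["Upload", "Draw"])

print(single_thread)      # (False, 23.0)
print(compositor_thread)  # (True, 5.0)
```

With these assumed numbers the serial pipeline misses the frame, while the compositor thread can still draw every frame from already rasterized content even when the main thread runs long.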
  14. Good enough?
      The devil’s in the details
      ● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
      ● How many tiles to pre-paint?
        ○ Too many: VRAM pressure, possibly lots of unnecessary work
        ○ Too few: Checkerboarding
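The pre-paint trade-off can be made concrete with some rough arithmetic. The sketch below assumes 32-bit RGBA tiles, a viewport aligned to the tile grid, and pre-painting only above and below the viewport. None of these assumptions come from the slides, and `prepaint_cost` is a hypothetical helper.

```python
import math

BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA tiles


def prepaint_cost(viewport_w, viewport_h, margin_px, tile=256):
    """Return (tile_count, bytes) for the viewport plus a vertical pre-paint margin.

    Assumes the viewport is aligned to the tile grid.
    """
    cols = math.ceil(viewport_w / tile)
    rows = math.ceil((viewport_h + 2 * margin_px) / tile)
    tiles = cols * rows
    return tiles, tiles * tile * tile * BYTES_PER_PIXEL


print(prepaint_cost(1024, 768, 0))    # (12, 3145728)   about 3 MiB
print(prepaint_cost(1024, 768, 768))  # (36, 9437184)   about 9 MiB
```

Pre-painting one extra screen in each direction triples the texture memory in this example, which illustrates why the margin cannot simply be made very large.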
  15. Deferred Rasterization
      Less checkerboarding: Move raster out of main thread
      [Timeline over 16ms intervals: Main Thread runs JS, Layout, Record Display List; Compositor Thread runs Sort Tiles, Issue Raster Tasks, Draw; Raster Thread(s) run RT and UT tasks]
  16. Tooling
      Lots of threads, lots of asynchronous tasks.
      Good performance tools are a must for debugging and improving!
      Tools we use when developing Chrome:
      ● Tracing (to monitor what each thread is doing in a timeline)
      ● FrameViewer (Inspect layers, tiles and rasterization)
      ● Telemetry (automated performance measurement framework)
  17. Tracing
      [Figure only]
  18. Frame-Viewer
      [Figure only]
  19. Telemetry
      [Figure only]
  20. Challenges
      ● Rasterization is a bottleneck
      ● The main thread is unpredictable (JS, layout, long records)
      ● There are not enough cores to go around (mobile)
      ● Bandwidth is at a premium
      ● GPU is a shared resource and can get oversubscribed
      ● Huge matrix of OS / GPU / CPU / Drivers
  21. What does the future hold
      More performance gains:
      ● Hardware accelerated rasterization
      ● “Zero-copy” texture uploads
      ● Hardware accelerated image decode
      ● Smarter and more efficient layers
  22. Skia
      ● Portable 2D graphics/text engine
        ○ Device independent coordinates
        ○ 3x3 matrices w/ perspective
        ○ Arbitrary clipping
        ○ Transparency, anti-aliasing, dithering, filters
        ○ Extension architecture for…
      ● Multiple Backends
        ○ SW rasterizer
        ○ GPU (“Ganesh”)
        ○ PDF
        ○ Picture (display list)
      ● Open source
        ○ code.google.com/p/skia
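As a reminder of what "3x3 matrices w/ perspective" means for 2D points, here is a minimal sketch of mapping a point through a 3x3 matrix with a homogeneous divide. It does not use Skia's API. `map_point` and the example matrices are assumptions made for illustration.

```python
def map_point(m, x, y):
    """Apply a row-major 3x3 matrix to (x, y), then divide by the homogeneous w."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w


# Affine case: the bottom row is (0, 0, 1), so w is always 1.
translate = [[1, 0, 10],
             [0, 1, 20],
             [0, 0, 1]]

# Perspective case: a non-zero entry in the bottom row makes w depend on x.
perspective = [[1, 0, 0],
               [0, 1, 0],
               [0.5, 0, 1]]

print(map_point(translate, 1, 2))    # (11.0, 22.0)
print(map_point(perspective, 2, 4))  # (1.0, 2.0)
```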
  23. Primitives
      ● Lines
      ● Rectangles
      ● Ellipses
      ● Rounded Corner Rectangles
      ● Text
      ● ...
      ● Paths
        ○ Made of contours
        ○ Contours are connected set of Bezier curves
          ■ lines
          ■ quadratics (rational)
          ■ cubics
        ○ Can be filled or stroked
        ○ Fills are based on winding number
      [Figure labels: Non-Zero, Even/Odd]
  24. Pipeline Stages
      SkPaint: Life of a Path
      Programmable (via Subclassing)
      ● SkPathEffect
        ○ Path -> Path
        ○ e.g. Dashing
      ● SkRasterizer
        ○ Path -> Coverage Mask
        ○ e.g. ?? [considering deprecating]
      ● SkMaskFilter
        ○ Coverage Mask -> Coverage Mask
        ○ e.g. Blur
      ● SkShader
        ○ Source-Space Coordinate -> Color
        ○ e.g. Gradients, Bitmap Fill
      ● SkColorFilter
        ○ Color -> Color
        ○ e.g. Color Matrix, Blend with constant Color
      ● SkImageFilter
        ○ Src Image -> New Src Image
        ○ e.g. Color Blur, Morphology Filter
        ○ Subsume SkColorFilter?
      ● SkXfermode
        ○ AKA Blend
        ○ Src Color + Dst Color -> New Dst Color
        ○ e.g. Porter-Duff modes, Darken, …
      Fixed Function
      ● Stroking (width, caps, joins)
      ● Text settings (typeface, pt size, …)
      ● AA enable/disable
      ● Image filtering quality level
      ● Alpha
      ● Default color if no SkShader
  25. GPU Shaders
      GPU Backend has an “effect” system for building shaders
      ● Effects arranged in linear order.
      ● Write a snippet of GLSL fragment code.
      ● Effect passes a vec4 “color” to the next effect.
        ○ Input to first effect is either constant or per-vertex value.
      ● Can insert uniforms, functions, textures.
      ● Internal effects can
        ○ Insert vertex shader code.
        ○ Require additional vertex attributes.
      [Diagram: Initial Color → Color Effect 1 → Color Effect 2 → final color; Initial Coverage → Cov. Effect 1 → Cov. Effect 2 → Cov. Effect 3 → final coverage; inputs labeled texture, matrix, uniform]
      Important to keep color and fractional coverage separate.
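One way to see why color and fractional coverage are kept separate is to model the two chains on the CPU. The sketch below is not Skia's effect API. The effects, the scalar coverage, and the helper names are all assumptions made for illustration, and colors are premultiplied RGBA tuples.

```python
def run_chain(effects, value):
    """Feed `value` through each effect in order, like a linear effect chain."""
    for effect in effects:
        value = effect(value)
    return value


def swap_red_blue(color):
    """Hypothetical color effect on a premultiplied RGBA tuple."""
    r, g, b, a = color
    return (b, g, r, a)


def edge_falloff(coverage):
    """Hypothetical coverage effect on a scalar in [0, 1]."""
    return coverage * 0.5


def blend_src(src, dst):
    """The 'src' blend mode: the source replaces the destination."""
    return src


def apply_with_coverage(blend, src, dst, coverage):
    """Blend at full strength, then mix with the old dst by coverage."""
    blended = blend(src, dst)
    return tuple(coverage * b + (1.0 - coverage) * d for b, d in zip(blended, dst))


dst = (0.0, 0.0, 1.0, 1.0)                                # opaque blue
color = run_chain([swap_red_blue], (0.0, 0.0, 1.0, 1.0))  # opaque red
coverage = run_chain([edge_falloff], 1.0)                 # 0.5

separate = apply_with_coverage(blend_src, color, dst, coverage)
folded = blend_src(tuple(coverage * ch for ch in color), dst)

print(separate)  # (0.5, 0.0, 0.5, 1.0)
print(folded)    # (0.5, 0.0, 0.0, 0.5)
```

Folding coverage into the color before a non-src-over blend loses the destination's contribution at partially covered pixels, whereas keeping coverage separate gives the expected half-red, half-blue edge pixel.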
  26. Pipeline Stages and GPU Backend
      ● SkPathEffect
        ○ Perform on CPU
        ○ Call filterPath(), draw the resulting path
        ○ Special hooks for some dashing cases
        ○ Future: general mechanism to avoid creating intermediate path object on CPU
      ● SkRasterizer: ignored
        ○ No known clients use custom rasterizers.
        ○ Act as though no rasterizer installed
      ● SkMaskFilter:
        ○ Filter object is given a gpu “context object” and primitive’s mask
          ■ Can create intermediate textures
          ■ Performs draws using Effects
          ■ Returns new mask as a texture.
        ○ Special case for filters that can be performed inline with the draw to dst
        ○ In practice the only significant SkMaskFilter is blur
        ○ Future: Specialize blur code path for simple primitive types (e.g. rects)
  27. Pipeline Stages and GPU Backend Continued
      ● SkShader
        ○ Produces an Effect object that is inserted into the draw
        ○ Implementations for bitmap shaders, various gradient types, noise shader.
      ● SkColorFilter:
        ○ Produces an effect that receives SkShader effect’s output.
        ○ Implementations for color matrix, color table, blend-against-const-color
      ● SkImageFilter:
        ○ Works the same way as SkMaskFilter but with color input/output
        ○ Implementations for
          ■ Color blur
          ■ Lighting effect
          ■ Any (color filter, shader, or xfermode) as an image filter
        ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
          ■ SVG image filter DAG
          ■ Future: Optimization pass to minimize intermediate draws.
        ○ Shortcuts for Image filters that can be done inline or are really just a matrix.
  28. Pipeline Stages and GPU Backend Continued
      ● SkXfermode: Either as GL coefficients or Effect
        ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
          ■ Big caveat here
        ○ Many others are not:
          ■ Luminance
          ■ Darken
          ■ Arithmetic
          ■ …
        ○ Xfermode can install an Effect
          ■ Access to the destination?
            ● Effect framework provides abstract interface for accessing the dst color
            ● GL_EXT_shader_framebuffer_fetch if available
            ● Future: GL_NV_texture_barrier
            ● Otherwise a dst-copy-to-texture is triggered
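The claim that Porter-Duff modes reduce to blend coefficients can be illustrated on the CPU. The sketch below assumes premultiplied RGBA colors and evaluates result = src * src_factor + dst * dst_factor using the standard OpenGL blend factor names. The table and the `blend` helper are written for this example and are not taken from Skia.

```python
# Blend factors as functions of source alpha (sa) and destination alpha (da).
FACTORS = {
    "GL_ZERO": lambda sa, da: 0.0,
    "GL_ONE": lambda sa, da: 1.0,
    "GL_SRC_ALPHA": lambda sa, da: sa,
    "GL_ONE_MINUS_SRC_ALPHA": lambda sa, da: 1.0 - sa,
    "GL_DST_ALPHA": lambda sa, da: da,
    "GL_ONE_MINUS_DST_ALPHA": lambda sa, da: 1.0 - da,
}

# (source factor, destination factor) for each Porter-Duff mode,
# assuming premultiplied colors.
PORTER_DUFF = {
    "clear": ("GL_ZERO", "GL_ZERO"),
    "src": ("GL_ONE", "GL_ZERO"),
    "dst": ("GL_ZERO", "GL_ONE"),
    "src-over": ("GL_ONE", "GL_ONE_MINUS_SRC_ALPHA"),
    "dst-over": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE"),
    "src-in": ("GL_DST_ALPHA", "GL_ZERO"),
    "dst-in": ("GL_ZERO", "GL_SRC_ALPHA"),
    "src-out": ("GL_ONE_MINUS_DST_ALPHA", "GL_ZERO"),
    "dst-out": ("GL_ZERO", "GL_ONE_MINUS_SRC_ALPHA"),
    "src-atop": ("GL_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
    "dst-atop": ("GL_ONE_MINUS_DST_ALPHA", "GL_SRC_ALPHA"),
    "xor": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
}


def blend(mode, src, dst):
    """Blend premultiplied RGBA tuples the way fixed-function GL blending would."""
    src_name, dst_name = PORTER_DUFF[mode]
    sa, da = src[3], dst[3]
    sf, df = FACTORS[src_name](sa, da), FACTORS[dst_name](sa, da)
    return tuple(sf * s + df * d for s, d in zip(src, dst))


half_red = (0.5, 0.0, 0.0, 0.5)  # 50% red, premultiplied
blue = (0.0, 0.0, 1.0, 1.0)      # opaque blue

print(blend("src-over", half_red, blue))  # (0.5, 0.0, 0.5, 1.0)
print(blend("dst-in", half_red, blue))    # (0.0, 0.0, 0.5, 0.5)
```

Modes such as Darken need a per-channel comparison against the destination color, which these linear factors cannot express. That is consistent with the slide's point that such modes require an Effect with access to the destination.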
  29. Primitives: Text
      ● Skia sits on top of system font engine:
        ○ FreeType
        ○ CoreText
        ○ GDI
        ○ DirectWrite
      ● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
        ○ Will use a second RGB(A) texture if there are “LCD” glyphs
        ○ Texture divided into 256x256 texel “plots”
      ● Strike: A unique combination of
        ○ Typeface
        ○ Size
        ○ Style (italic, bold, …)
      ● Strikes claim (multiple) plots
      ● Plots purged wholesale using LRU
      [Diagram: atlas grid of plots labeled Strike 0 through Strike 3, with one plot marked (free)]
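The plot bookkeeping described on this slide can be sketched as a small LRU structure. A 1024 x 2048 atlas divided into 256 x 256 plots gives 4 x 8 = 32 plots. The `PlotAtlas` class, its method names, and the strike keys below are hypothetical and only illustrate the "claim a plot, purge wholesale by LRU" idea.

```python
from collections import OrderedDict

ATLAS_W, ATLAS_H, PLOT = 1024, 2048, 256
NUM_PLOTS = (ATLAS_W // PLOT) * (ATLAS_H // PLOT)  # 4 * 8 = 32


class PlotAtlas:
    def __init__(self, num_plots=NUM_PLOTS):
        self.free = list(range(num_plots))
        # plot index -> owning strike, ordered from least to most recently used
        self.owner = OrderedDict()

    def touch(self, plot):
        """Mark a plot as recently used (for example, a glyph in it was drawn)."""
        self.owner.move_to_end(plot)

    def claim(self, strike):
        """Give `strike` a plot; purge the least recently used plot if none is free.

        Returns (plot_index, evicted_strike_or_None).
        """
        if self.free:
            plot, evicted = self.free.pop(0), None
        else:
            plot, evicted = self.owner.popitem(last=False)
        self.owner[plot] = strike
        return plot, evicted


# Tiny two-plot atlas to make the eviction visible.
atlas = PlotAtlas(num_plots=2)
a = atlas.claim(("TypefaceA", 12, "regular"))
b = atlas.claim(("TypefaceB", 16, "bold"))
atlas.touch(0)                                  # plot 0 was just used
c = atlas.claim(("TypefaceC", 10, "italic"))    # evicts plot 1
print(NUM_PLOTS, a, b, c)
```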
  30. Primitives: Text Continued
      ● Glyphs packed in plots using Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
      ● Attempt to perform all uploads for a frame before draws
        ○ Queue GL draws
        ○ Uploads go through immediately
      ● Avoid flushing draws
        ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
        ○ Matters a lot more on mobile, especially tiled architectures
      ● Works pretty well for scrolling
      ● Struggles with pinch-zoom
      ● Under development: distance field atlas
        ○ Same texture partitioning and replacement scheme
        ○ “Masks” are (mostly) resolution independent
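For readers unfamiliar with skyline packing, the following is a simplified bottom-left skyline packer for one 256 x 256 plot. It is a sketch of the general technique, not Skia's or Jylänki's code. The class name, the lowest-then-leftmost placement heuristic, and the omission of segment merging are simplifying assumptions.

```python
class SkylinePacker:
    """Pack rectangles by tracking the top edge ("skyline") of what is already placed."""

    def __init__(self, width=256, height=256):
        self.width, self.height = width, height
        self.skyline = [(0, 0, width)]  # segments as (x, y, width), left to right

    def _fit(self, i, w):
        """Return the y at which a rect of width w rests on segment i, or None."""
        x = self.skyline[i][0]
        if x + w > self.width:
            return None
        y, remaining = 0, w
        while remaining > 0:
            _, seg_y, seg_w = self.skyline[i]
            y = max(y, seg_y)
            remaining -= seg_w
            i += 1
        return y

    def _raise(self, i, x, y, w):
        """Insert a new skyline segment and trim the segments it covers."""
        self.skyline.insert(i, (x, y, w))
        end, j = x + w, i + 1
        while j < len(self.skyline):
            seg_x, seg_y, seg_w = self.skyline[j]
            if seg_x >= end:
                break
            overlap = end - seg_x
            if overlap >= seg_w:
                del self.skyline[j]
            else:
                self.skyline[j] = (seg_x + overlap, seg_y, seg_w - overlap)
                break

    def insert(self, w, h):
        """Place a w x h rect; return its (x, y) or None if it does not fit."""
        best = None
        for i, (x, _, _) in enumerate(self.skyline):
            y = self._fit(i, w)
            if y is None or y + h > self.height:
                continue
            if best is None or y < best[1]:
                best = (x, y, i)
        if best is None:
            return None
        x, y, i = best
        self._raise(i, x, y + h, w)
        return x, y


packer = SkylinePacker()
placements = [packer.insert(100, 50), packer.insert(100, 50), packer.insert(60, 10)]
print(placements)  # [(0, 0), (100, 0), (0, 50)]
```

The third glyph does not fit in the 56 texels left on the bottom row, so it is placed on top of the first one, which is the behavior that keeps plots densely filled.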
  31. Primitives: Rects
      Not anti-aliased: Simple, draw a quad!
      Two approaches for anti-aliasing (non-MSAA):
      ● Geometric
        ○ Create inner and outer offset geometry
        ○ Offset is 0.5 pixels
        ○ Use “coverage” vertex attribute
          ■ 0 at outer offset rect
          ■ 1 at inner offset rect
        ○ Handle degenerate cases
      ● Shader
        ○ Attributes:
          ■ W = rect.width() + 0.5, H = rect.height() + 0.5
          ■ Y = normalized y-axis of rect
          ■ C = center of rect
        ○ coverage in Y at pixel P is clamp(H-((p - C) dot Y), 0, 1)
      Geometry shaders could reduce VBO size and save CPU cycles
      [Figure labels: c=1, c=0, W, C, Y, H, p]
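The shader approach can be tried out on the CPU. The sketch below is one interpretation of the slide's formula, not the actual shader. It assumes that the extents used are half-sizes plus 0.5, that the projected distance is taken as an absolute value, and that the X and Y coverages are multiplied together. The function name and parameters are hypothetical.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def rect_coverage(p, center, x_axis, y_axis, width, height):
    """Approximate AA coverage of the pixel centered at p for a (possibly rotated) rect.

    x_axis and y_axis are the rect's normalized axes in device space.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    half_w = width / 2.0 + 0.5   # assumption: half-extent padded by half a pixel
    half_h = height / 2.0 + 0.5
    cov_x = clamp(half_w - abs(dx * x_axis[0] + dy * x_axis[1]), 0.0, 1.0)
    cov_y = clamp(half_h - abs(dx * y_axis[0] + dy * y_axis[1]), 0.0, 1.0)
    return cov_x * cov_y


# Axis-aligned 10 x 10 rect centered at (10, 10), so its right edge is at x = 15.
args = ((10.0, 10.0), (1.0, 0.0), (0.0, 1.0), 10.0, 10.0)
print(rect_coverage((10.0, 10.0), *args))  # 1.0  (interior)
print(rect_coverage((15.0, 10.0), *args))  # 0.5  (pixel centered on the edge)
print(rect_coverage((16.0, 10.0), *args))  # 0.0  (fully outside)
```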
  32. Primitives: Misc
      Adaptations for stroked rectangles
      Similar shader techniques for:
      ● Ellipses
      ● Circles
      ● Rounded-Rectangles
  33. Primitives: Paths
      ● Why are paths hard?
        ○ In most general case have to handle both the fill rule and anti-aliasing
        ○ After a blend coverage/alpha distinction is lost. Must only perform one blend in general.
      [Figure callouts: Can’t double blend in overlap! Can’t anti-alias interior edge! Multiple edges from different contours relevant to pixels in concavities!]
  34. Primitives: Paths Continued
      ● MSAA solves the AA problem
      ● Use the stencil to solve the fill rule problem
      ● Tessellate contours into line segments
      ● Pass 1:
        ○ Draw the tessellated contours as triangle fan
        ○ Disable color writes
        ○ Stencil op: +1 for front face, -1 for back face
      ● Pass 2:
        ○ Draw bounding geometry
        ○ Enable color writes
        ○ Stencil func
          ■ Winding: Pass if stencil is non-zero
          ■ Even/Odd: Pass if LSB is 1
      ● Avoid tessellating quadratic and cubic beziers:
        ○ Discard in FS if outside the curve [Kokojima et al.]
        ○ Need per sample discard or sample coverage mask
        ○ No-go on ES3 :(
      [Figure labels: Pass 1 with stencil values +1, -1, +1; Pass 2]
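The two-pass stencil approach can be simulated for a single sample point on the CPU. The sketch below assumes counter-clockwise triangles are front-facing in a y-up coordinate system and only handles sample points that do not lie exactly on a fan edge. The function names are invented for this illustration.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive when counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def point_in_triangle(p, a, b, c):
    d1, d2, d3 = signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def stencil_value(contours, p):
    """Pass 1: fan each contour from its first vertex and accumulate +1 / -1 at p."""
    stencil = 0
    for pts in contours:
        for i in range(1, len(pts) - 1):
            a, b, c = pts[0], pts[i], pts[i + 1]
            area = signed_area2(a, b, c)
            if area != 0 and point_in_triangle(p, a, b, c):
                stencil += 1 if area > 0 else -1  # front face: +1, back face: -1
    return stencil


def covered(contours, p, rule="winding"):
    """Pass 2: the stencil test applied while drawing the bounding geometry."""
    s = stencil_value(contours, p)
    return s != 0 if rule == "winding" else (s & 1) == 1


# Two nested squares with the same (counter-clockwise) orientation.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = [(3, 3), (7, 3), (7, 7), (3, 7)]
path = [outer, inner]

print(stencil_value(path, (5, 4)))        # 2
print(covered(path, (5, 4), "winding"))   # True
print(covered(path, (5, 4), "even-odd"))  # False
print(covered(path, (1, 0.5), "even-odd"))  # True
```

The point inside both squares gets a stencil value of 2, so it is filled under the non-zero rule but left empty under even/odd, matching the two stencil functions listed for Pass 2.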
  35. Primitives: Paths Continued
      For AA paths without MSAA:
      ● Detect if path is one of the other primitive types (e.g. rounded rectangle)
      ● If very thin stroke draw as AA lines (and ignore double blend problem)
      ● If path is convex fill rule problem goes away
        ○ Fan the on-contour control points
        ○ Draw bounding hulls of curves
        ○ Compute coverage using implicit eq. approx distance to curve [LoopBlinn]
      ● Otherwise, SW rasterize mask and upload
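The implicit-equation idea can be sketched for a quadratic Bezier. Following the Loop-Blinn formulation, the control points are assigned canonical coordinates (0, 0), (0.5, 0), (1, 1), the curve is the zero set of f(u, v) = u^2 - v, and f divided by the length of its screen-space gradient approximates the distance to the curve. The code below is a CPU illustration under those assumptions. The helper names and the 0.5 - distance coverage ramp are choices made for this sketch, not details taken from the slides.

```python
import math


def barycentric(p, a, b, c):
    """Barycentric weights of p with respect to triangle abc."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    wb = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    wc = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - wb - wc, wb, wc


def quad_coverage(p, p0, p1, p2):
    """Approximate coverage at pixel p for the region between the curve and chord p0-p2."""

    def uv(q):
        # Interpolate canonical coordinates (0,0), (0.5,0), (1,1) across the triangle.
        _, w1, w2 = barycentric(q, p0, p1, p2)
        return 0.5 * w1 + w2, w2

    u, v = uv(p)
    f = u * u - v  # negative on the chord side, positive on the control-point side

    # (u, v) is affine in p, so one-pixel differences are exact derivatives,
    # playing the role of screen-space derivatives in a fragment shader.
    ux, vx = uv((p[0] + 1.0, p[1]))
    uy, vy = uv((p[0], p[1] + 1.0))
    grad_x = 2.0 * u * (ux - u) - (vx - v)
    grad_y = 2.0 * u * (uy - u) - (vy - v)
    grad_len = math.hypot(grad_x, grad_y)
    if grad_len == 0.0:
        return 1.0 if f < 0.0 else 0.0

    distance = f / grad_len  # approximate signed distance in pixels
    return max(0.0, min(1.0, 0.5 - distance))


p0, p1, p2 = (0.0, 0.0), (8.0, 16.0), (16.0, 0.0)
print(quad_coverage((8.0, 8.0), p0, p1, p2))   # 0.5  (on the curve)
print(quad_coverage((8.0, 2.0), p0, p1, p2))   # 1.0  (well inside)
print(quad_coverage((8.0, 14.0), p0, p1, p2))  # 0.0  (well outside)
```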
  36. Questions?
