#Webperf Choreography

Partner Engineer & Web Craftsman at Mozilla Corp
Oct. 28, 2017

  1. #WebPerf Choreography: Improving the web for everybody. Harald Kirschner, Mozilla PM for Performance & Devtools
  2. Performance Primer: Different perspectives on performance for users, web developers, and browsers
  3. “Perceived performance, in computer engineering, refers to how quickly a software feature appears to perform its task.” Wikipedia: Perceived Performance https://en.wikipedia.org/wiki/Perceived_performance
  4. Measuring Perceived Performance: Evolution of metrics
  5. Page Load Time: (Generally) < 1s on cable or < 3s on 3G for showing something meaningful. Aka: PLT, Navigation Timing
  6. Responsiveness: Visually respond to any input in under 100ms, handling the full response in under 1s. Aka: Input Latency, Input Delay
  7. Smoothness: 60 FPS (common display refresh rate), or about 16ms per frame (1 second / 60 ≈ 16.67ms). Aka: Buttery, Long Frames, Slow Frames, Jitter
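The frame-budget arithmetic on this slide can be sketched in a few lines. This is a minimal illustration, not a measurement tool: `frameBudgetMs` and `droppedFrames` are hypothetical helper names, and the sketch assumes a task starts exactly at the beginning of a frame.

```javascript
// Frame budget in milliseconds for a given refresh rate (60 FPS -> ~16.67ms).
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Hypothetical helper: estimate how many frames a main-thread task of
// `taskMs` milliseconds misses, assuming it starts right at a frame boundary.
function droppedFrames(taskMs, fps = 60) {
  // taskMs * fps / 1000 is the number of frame budgets the task occupies.
  const budgetsUsed = Math.ceil((taskMs * fps) / 1000);
  // The first budget is "free"; every further one is a missed frame.
  return Math.max(0, budgetsUsed - 1);
}

console.log(frameBudgetMs(60).toFixed(2)); // "16.67"
console.log(droppedFrames(10)); // 0
console.log(droppedFrames(40)); // 2
```

A 40ms task at 60 FPS spills into three frame budgets, so roughly two frames are missed, which is what users perceive as jank.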
  8. Performance At Scale of the Web: Browsers and Web Developers have the same goals but different problems!
  9. As a Web Developer … “I want to ship my website in time!” … “But also make websites that feel fast for my current and potential users!” How: Hit fast paths & avoid anti-patterns
  10. As a Browser Engine … Make the web fast, for every website, for every user & for every device. Everywhere! How: • Add more fast paths & reduce anti-patterns • Interventions to fix user experience on websites that abuse the web • Performance timings to collect in the wild (RUM) • Outreach, console warnings and performance tooling
  11. Primer on Browser Engines: The art of drawing-pixels-from-web and its evolution over the years
  12. Anatomy of a Browser Engine: For more on this, read Lin Clark’s Code Cartoon on WebRender https://hacks.mozilla.org/2017/10/the-whole-web-at-maximum-fps-how-webrender-gets-rid-of-jank/
  13. Anatomy of a Browser Engine: • Figure out what the page should look like • Draw the page on the screen
  14. In The Beginning: The Single-Threaded Web. Everything needs to be finished in 16.6ms, otherwise frames are dropped
  16. Everything needs to be finished in 16.6ms (vsync): Otherwise frames are dropped, janking the experience
  17. Continuously Improving Engines: From single-threaded, single-core to a multi-process, multi-threaded and composited rendering pipeline. Goal: Free up the main thread for application logic
     • Reducing cost and complexity of specific steps
     • Skipping steps in the rendering pipeline
     • Moving work off the main thread that doesn’t touch the DOM: HTML parsing, script parsing, image decoding, etc.
  18. Paint only visible area: Paint just the pixels that the user will see (adding some margin). Aka: Culling
  19. Paint invalidation: Reduces paint cost. Figure out the smallest rectangle to repaint for changes. Aka: Dirty rectangles
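As a rough illustration of the dirty-rectangle idea, the sketch below merges the bounds of changed elements into one rectangle to repaint. It assumes rectangles are plain `{ x, y, width, height }` objects; `dirtyRegion` is a hypothetical name, and real engines track invalidated regions in far more detail than a single bounding box.

```javascript
// Smallest axis-aligned rectangle that covers every changed rectangle.
// Rectangles are plain objects: { x, y, width, height } in CSS pixels.
function dirtyRegion(changedRects) {
  if (changedRects.length === 0) {
    return null; // Nothing changed, nothing to repaint.
  }
  let left = Infinity;
  let top = Infinity;
  let right = -Infinity;
  let bottom = -Infinity;
  for (const r of changedRects) {
    left = Math.min(left, r.x);
    top = Math.min(top, r.y);
    right = Math.max(right, r.x + r.width);
    bottom = Math.max(bottom, r.y + r.height);
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

const region = dirtyRegion([
  { x: 10, y: 10, width: 20, height: 20 },
  { x: 25, y: 5, width: 10, height: 10 },
]);
console.log(region); // { x: 10, y: 5, width: 25, height: 25 }
```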
  20. Layers: Skips paints and reduces composite cost. Retains parts of the page that can be moved, like for scrolling and animations. Compositing then just needs to arrange layers without painting. Aka: Compositor Layers
  21. Layers, limitations: Creating & managing layers is costly and takes up memory. Aka: Compositor Layers
  22. GPU Compositing: Frees the main thread and makes compositing GPU-accelerated. Scrolling can be asynchronous (APZ), bypassing the main thread. GPUs are made for handling pixels and can easily combine layers. Aka: Accelerated Graphics
     • Compositor thread: Prepares work for the GPU and handles scroll input (APZ)
     • GPU thread: Handles all communication with the GPU, so the browser can recover from GPU crashes
  23. GPU Compositing, limitations: • Creating & managing layers is costly and takes up memory • Knowing when to create, combine or separate layers is complex • Layers have to be used sparingly, which limits how many things should be moving on the page • APZ: Scrolling can still be blocked by input event listeners. Aka: Accelerated Graphics
  24. Cost & Complexity Balancing: Anti-patterns & fast paths for each step in the pipeline.
  25. Loading: Holds the biggest gains, start here! Cost factors: RTT, bytes transferred, number of domains, number of requests
     • Bad: Unoptimized, large images. Good: Correctly sized and exported images
     • Bad: Missing/bad cache headers. Good: Correctly set cache headers
     • Bad: Uncompressed transfers. Good: Compressed files (see Brotli)
  26. Style: Happens a lot during loading & interactions. Cost factors: CSS selector complexity
     • Bad: Complex CSS rules. Good: Specific CSS rules
     • Bad: Deeply nested DOM. Good: Just enough DOM ;)
     • Bad: Invalidating many elements. Good: Reduce style invalidation
  27. Reflow: (Usually) invalidates layout for the whole document. Cost factors: Number of elements to reflow, complexity of layout
     • Bad: Complex, long-running JavaScript. Good: Break up work and/or move it off-thread
     • Bad: Expensive input event handlers. Good: Debounce expensive input event handlers
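One common way to debounce an expensive input handler is sketched below. `debounce` here is a hand-written helper rather than a browser API, and the 150ms wait and the `recomputeLayout` callback in the usage note are illustrative assumptions.

```javascript
// Returns a wrapper that postpones `fn` until `waitMs` milliseconds have
// passed without another call. Only the last call's arguments are used.
function debounce(fn, waitMs) {
  let timer = null;
  return function debounced(...args) {
    clearTimeout(timer); // Cancel the previously scheduled run, if any.
    timer = setTimeout(() => {
      timer = null;
      fn.apply(this, args);
    }, waitMs);
  };
}

// Usage sketch (browser only): run the expensive work once the user has
// stopped resizing, instead of on every single resize event.
if (typeof window !== "undefined") {
  const recomputeLayout = () => console.log("expensive work runs once");
  window.addEventListener("resize", debounce(recomputeLayout, 150));
}
```

Debouncing trades a small delay for far fewer handler runs. For handlers that must keep visual feedback going while input continues, throttling to one run per frame may be the better fit.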
  28. JavaScript: Triggered by everything from network, timers, promises, input, callbacks, … Cost factors: Running stuff, garbage collection, JIT de-optimizations
     • Bad: Complex paints (e.g. shadows, gradients)
     • Bad: Non-composited animations (e.g. background, colors, shadows)
     • Worse: Layout-based animations (e.g. width/height, borders, font, etc.)
     • Good: Animate compositor-only properties
     • Best: Use CSS animations
     • Good: Minimize invalidated regions
     • Best: Avoid triggering paints
  29. Paint & Composite: Triggered by everything from network, timers, promises, input, callbacks, … Cost factors: Amount/size of layers & pixels painted
     • Bad: Forced reflow/synchronous layout
     • Worse: Layout thrashing (forced reflow in a loop)
     • Good: Read reflow-forcing properties once, before changing styling
     • Better: Avoid reflow altogether
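The advice to read reflow-forcing properties first and only then change styling can be sketched as a two-phase function. `matchTallestHeight` is a hypothetical helper; it relies only on `offsetHeight` and `style.height`, so it works with real DOM elements or with stand-in objects as in the example.

```javascript
// Sets every element to the height of the tallest one without thrashing.
// The thrashing variant would read offsetHeight and write style.height
// inside the same loop, forcing a synchronous layout on every iteration.
function matchTallestHeight(elements) {
  // Phase 1: do all reads while layout is still clean.
  const heights = elements.map((el) => el.offsetHeight);
  const tallest = Math.max(0, ...heights);

  // Phase 2: do all writes; layout is invalidated once, not per element.
  for (const el of elements) {
    el.style.height = `${tallest}px`;
  }
  return tallest;
}

// Stand-in objects so the sketch also runs outside a browser.
const cards = [
  { offsetHeight: 120, style: {} },
  { offsetHeight: 180, style: {} },
];
console.log(matchTallestHeight(cards)); // 180
console.log(cards[0].style.height); // "180px"
```

In a page, the elements would typically come from something like `Array.from(document.querySelectorAll(".card"))`, where the `.card` selector is just an example.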
  30. Firefox Quantum, where anti-patterns become fast paths: Take advantage of modern hardware and parallelize work across cores!
  31. “Quantum is not a new web browser. Quantum is Mozilla's project to build the next-generation web engine for Firefox users, building on the Gecko engine as a solid foundation. Quantum will leverage the fearless concurrency of Rust and high-performance components of Servo to bring more parallelization and GPU offloading to Firefox.” Arrives November 14, 2017 https://www.mozilla.org/en-US/firefox/quantum/
  32. Quantum CSS, aka Stylo: https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/
  33. Quantum CSS: Reduces styling/re-styling cost by the number of cores (yup, parallelism). Ships in Firefox 57, November 14. Aka: Stylo
  34. Quantum Render, aka WebRender: https://hacks.mozilla.org/2017/10/the-whole-web-at-maximum-fps-how-webrender-gets-rid-of-jank/
  35. Quantum Render: Removes layer overhead and expensive paint overhead. Combines paint & compositing on the GPU and accelerates both. Aka: WebRender
  36. Quantum Render: Initial landing will provide a solid foundation to optimize even further and unlock more gains. Ships in Firefox 58, beginning of 2018. Aka: WebRender
  37. Quantum Projects continued …: Performance work never stops
     • Racing Cache With Network: New heuristics to let the browser fall back to the network if cache access is too slow
     • Pausing video stream decoding in background: Media keeps playing audio, but video decoding is paused and resumes when the tab is activated again
     • Quantum DOM: Prioritize foreground tab & input. Handle events first that matter most to the user
     • JavaScript Start-up Bytecode Cache: JIT-accelerated JavaScript on repeat page loads
  38. Prioritizing with the Browser to dance faster: New dance move to performance hints and primitives
  39. Performance Primitives: APIs for developers to tell the browser how to schedule and prioritize work. • Loading • Rendering • Parallelism
  40. Loading: HTTP/2 Priorities & Push. HTTP/2 allows for multiplexing of multiple streams. Support: All modern browsers
     • Stream dependencies: Indicate which resources are more important than others
     • Push: Send multiple responses for a single client request. Avoids a round trip between fetching HTML and linked stylesheets
  41. Loading: Preconnect. The network layer opens the socket early, eliminating round trips when the first request to that origin is made. Support: All modern browsers
     • HTTP header: Link: <https://fonts.gstatic.com>; rel=preconnect; crossorigin
     • HTML: <link href='https://fonts.gstatic.com' rel='preconnect' crossorigin>
  42. Loading: Prefetch. Low-priority fetching for resources that will be used in the next navigation/page load. Support: All modern browsers
     • HTTP header: Link: </images/big.jpeg>; rel=prefetch
     • HTML: <link rel="prefetch" href="/images/big.jpeg">
  43. Loading: Preload. High-priority fetching for resources that you know you’ll need in the current page (but that the browser would not discover until later). Preloaded resources don’t block the document’s onload, as they don’t execute. Support: All modern browsers
     • HTML: <link rel="preload" href="late_discovered_thing.js" as="script">
     • Uses: Preloading JavaScript dependencies for later, fonts
     • Responsive loading: <link rel="preload" as="script" href="map.js" media="(min-width: 601px)">
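On the server side, the preconnect, prefetch and preload hints above can all be sent as `Link` response headers. The sketch below builds such header values in the same form the slides use; `linkHeader` is a hypothetical helper, not part of any framework, and how the value is attached to a response depends on the server in use.

```javascript
// Builds one Link header value, e.g. "<url>; rel=preload; as=script".
// `params` maps parameter names to values; `true` emits a bare flag
// such as "crossorigin".
function linkHeader(url, rel, params = {}) {
  const parts = [`<${url}>`, `rel=${rel}`];
  for (const [name, value] of Object.entries(params)) {
    parts.push(value === true ? name : `${name}=${value}`);
  }
  return parts.join("; ");
}

console.log(linkHeader("https://fonts.gstatic.com", "preconnect", { crossorigin: true }));
// <https://fonts.gstatic.com>; rel=preconnect; crossorigin
console.log(linkHeader("/images/big.jpeg", "prefetch"));
// </images/big.jpeg>; rel=prefetch
console.log(linkHeader("/late_discovered_thing.js", "preload", { as: "script" }));
// </late_discovered_thing.js>; rel=preload; as=script
```

Several hints can be combined into a single `Link` header by joining the individual values with a comma.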
  44. Loading: Cache-Control Immutable. Header: Cache-Control: public,max-age=31536000,immutable. Marks a resource as “not changing”, so the browser never has to ask the server whether the resource changed. Reduces bandwidth and improves page load performance by avoiding requests. “In my testing a typical feed may initially be comprised of 150 different resources. Pressing refresh in Firefox 49 generates just 25 network requests.” (Patrick McManus on testing Facebook.com) Support: Firefox 49, Edge 15, Safari TP 24; Chrome has a workaround
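Marking a resource immutable is only safe when its URL changes whenever its content does. The sketch below picks a `Cache-Control` value on that basis. It assumes a naming convention of `name.<hash>.ext` with at least eight hex characters for fingerprinted files; both that convention and the `cacheControlFor` helper are illustrative, not something the slides prescribe.

```javascript
const ONE_YEAR_SECONDS = 31536000;

// Assumed convention: fingerprinted assets look like "app.3f9a1c2b.js".
const FINGERPRINTED = /\.[0-9a-f]{8,}\.[a-z0-9]+$/i;

// Chooses a Cache-Control value for a request path.
function cacheControlFor(path) {
  if (FINGERPRINTED.test(path)) {
    // Content never changes under this URL, so revalidation is pointless.
    return `public,max-age=${ONE_YEAR_SECONDS},immutable`;
  }
  // Everything else may change in place, so ask the server before reuse.
  return "no-cache";
}

console.log(cacheControlFor("/static/app.3f9a1c2b.js"));
// public,max-age=31536000,immutable
console.log(cacheControlFor("/index.html"));
// no-cache
```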
  45. Loading: Script Defer & Async. <script async src='https://www.google-analytics.com/analytics.js'></script> Support: All modern browsers
     • async: Script is fetched without blocking HTML parsing and executed as soon as it is available
     • defer: Script is executed when the page has finished parsing
  46. Rendering: Request Animation Frame. var handle = requestAnimationFrame((frameStart) => { … }); Support: All modern browsers
     Did you know: • Fires at the beginning of a frame • Does not fire when the page is in the background • Avoid forcing layout
  47. Rendering: Request Idle Callback. var handle = requestIdleCallback((idleDeadline) => { … }[, options]) Queues tasks that the browser runs when it determines there is free time. Idle time usually starts after work was handed over to the compositor. idleDeadline describes how much time is available and whether the callback was run because of the optional timeout. Provide the optional timeout for required work. Support: Not Edge
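A typical use of requestIdleCallback is draining a queue of small, non-urgent tasks while idle time remains. The sketch below is one way to do that; `runIdleTasks` and `scheduleIdleWork` are hypothetical helpers, the 1000ms timeout is an arbitrary choice, and the scheduling part is skipped where `requestIdleCallback` does not exist.

```javascript
// Runs queued tasks while the deadline reports idle time. If the callback
// fired because of the timeout, run at least one task so required work
// still makes progress. Returns the number of tasks left in the queue.
function runIdleTasks(tasks, deadline) {
  let mustRunOne = deadline.didTimeout;
  while (tasks.length > 0 && (mustRunOne || deadline.timeRemaining() > 0)) {
    const task = tasks.shift();
    task();
    mustRunOne = false;
  }
  return tasks.length;
}

// Schedules the queue and re-schedules itself until the queue is empty.
// Returns false where requestIdleCallback is unavailable.
function scheduleIdleWork(tasks, timeoutMs = 1000) {
  if (typeof requestIdleCallback !== "function") {
    return false;
  }
  requestIdleCallback(
    (deadline) => {
      if (runIdleTasks(tasks, deadline) > 0) {
        scheduleIdleWork(tasks, timeoutMs);
      }
    },
    { timeout: timeoutMs }
  );
  return true;
}
```

Keeping each task small matters: the browser cannot interrupt a task once it has started, so one long task can still overrun the idle period.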
  48. Rendering: CSS will-change. will-change: transform; Support: All modern browsers
     • Problem: Browsers have to infer when to create/combine/destroy layers as elements are created and animations start/end
     • Hint: Tells browsers what kind of changes to expect on an element, so they can apply ahead-of-time optimizations (usually creating layers)
     • Risk: Layers add memory use and compositing complexity, so apply will-change sparingly and only after profiling performance
  49. Parallelism: WebWorkers. var workSome = new Worker('worker.js'); workSome.postMessage('Hello world'); Support: All modern browsers
     Limitations: • Workers don’t have DOM access, but a growing number of APIs • Messaging data around is expensive and uses memory (uses structured cloning, a copy operation)
  50. Parallelism: WebWorkers & Transferable Objects. worker.postMessage(arrayBuffer, [arrayBuffer]); Passing data by transferring ownership (transferable objects) is a highly performant way of moving data. Comparable to pass-by-reference in native code. Support: All modern browsers
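The difference between copying and transferring can be observed without a worker. The sketch below uses the global `structuredClone` function as a stand-in for `worker.postMessage`, on the assumption that the runtime provides it (recent browsers and Node.js 17 or later); both use structured cloning and accept a transfer list. `compareCopyAndTransfer` is a hypothetical name.

```javascript
// Shows what happens to the sender's buffer when it is copied versus
// transferred. structuredClone stands in for worker.postMessage here.
function compareCopyAndTransfer(byteLength) {
  const original = new ArrayBuffer(byteLength);

  // Copy: the bytes are duplicated and the original stays usable.
  const copy = structuredClone(original);
  const originalAfterCopy = original.byteLength;

  // Transfer: ownership moves to the receiver; the original is detached.
  const moved = structuredClone(original, { transfer: [original] });

  return {
    copyBytes: copy.byteLength,
    originalAfterCopy,
    movedBytes: moved.byteLength,
    originalAfterTransfer: original.byteLength,
  };
}

console.log(compareCopyAndTransfer(1024));
// { copyBytes: 1024, originalAfterCopy: 1024,
//   movedBytes: 1024, originalAfterTransfer: 0 }
```

After a transfer the sending side can no longer read the data, which is why no copy is needed and why the sender must not keep relying on the buffer.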
  51. Parallelism: WebWorkers & OffscreenCanvas. var htmlCanvas = document.getElementById("canvas"); var offscreen = htmlCanvas.transferControlToOffscreen(); var worker = new Worker("offscreencanvas.js"); worker.postMessage({canvas: offscreen}, [offscreen]); Passing data by transferring ownership (transferable objects) is a highly performant way of moving data. Comparable to pass-by-reference in native code. Support: Chrome & Firefox, both behind a flag
  52. Parallelism: WebAssembly. https://hacks.mozilla.org/2017/02/what-makes-webassembly-fast/ Support: All modern browsers (Edge just shipped)
  54. Parallelism: WebAssembly & SharedArrayBuffer. WebAssembly is bytecode for the web. SharedArrayBuffer allows threads shared access to memory, eliminating the transfer cost. Both threads, main and worker, can write data to and read data from the same chunk of memory. Trade-off: Don’t use SharedArrayBuffer directly (race conditions). Use it with languages that offer concurrency abstractions (like Rust, or pthreads in C++) and cross-compile via WebAssembly. Support: Shipped by Safari, in development by all browsers. https://hacks.mozilla.org/2017/06/a-cartoon-intro-to-arraybuffers-and-sharedarraybuffers/
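To make "shared access to memory" concrete, the sketch below creates two typed-array views over one SharedArrayBuffer and updates a counter through `Atomics`. It runs on a single thread, so the second view only stands in for the view a worker would create over the same buffer, and `sharedCounterDemo` is a hypothetical name. It illustrates the primitive the slide warns about and is not a recommendation to hand-roll synchronization.

```javascript
// Two views over the same shared memory observe each other's writes.
// Atomics.add performs the read-modify-write as one indivisible step,
// which a plain `view[0] += n` does not guarantee across threads.
function sharedCounterDemo() {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const mainView = new Int32Array(shared);
  const workerView = new Int32Array(shared); // Stand-in for a worker's view.

  Atomics.add(mainView, 0, 2);
  Atomics.add(workerView, 0, 3);

  return {
    seenByMain: Atomics.load(mainView, 0),
    seenByWorker: Atomics.load(workerView, 0),
  };
}

console.log(sharedCounterDemo()); // { seenByMain: 5, seenByWorker: 5 }
```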
  55. If you choose to improve performance … “You start by being, above all else, fast.” (via codinghorror)
  56. How to make fast websites … There is no end game, don’t drop the ball!
     1. Profile, Profile, Profile! • Test on slow machines • Profile again, there as well
     2. Know where to optimize by being data-driven • Run synthetic performance testing continuously (WebPageTest, SpeedCurve, Calibre, etc.) • Collect RUM performance in the wild to …
     3. Report performance issues to browsers (me, @digitarald, @addyosmani, @paul_irish, etc.)
  57. Quantum Flow: Optimizing Firefox performance from top to bottom
     1. Cross-functional team effort focused on performance
     2. Root-cause analysis on recorded performance profiles
     3. Prioritization based on in-the-field performance data (RUM)
     4. Continuous triage of performance issues
  58. Questions? As always, also read more on https://developer.mozilla.org/ and http://hacks.mozilla.org/
  59. Thank You. For even more questions & feedback: @digitarald