Slide deck for a presentation at OSCON 2011 about why Netflix uses web technology for TV user interfaces and how we maximize performance for a broad range of devices.
2. These slides were originally designed for a presentation. They’ll make much more sense if you read the speaker notes. (On Slideshare, as of 8/11, speaker notes appear beneath the slide show.) README
3. What is Webkit TV UI? Why web? Engineering for UI variation Performance for TV devices Topics
15. Which component handles the next keystroke? How & where do we model navigation between components? …And also, these components should be reusable between completely different UIs Solve These
17. States as the C in MVC Can drive state transitions States are event handling contexts User input Programmatic events Current Solution: State Transition System
18. Search Input State Search Input State Search Compound State Search Results State
19. Search Input State Search Compound State Search Results State Search Results State
29. Progressive enhancement Baseline Enhanced Animations Request throttling Cache sizes Data pre-fetching None enabled 5 concurrent Small Delayed, Small batches All enabled 20 concurrent Large Frequent, Larger batches
30. 0.1 second: Feeling of instantaneous response 1.0 second: Keeps flow of thought seamless 10 seconds: Keeps the user’s attention Perceived Performance Nielsen 2010, 1993; Miller 1968; Card et al. 1991
31. Provide immediate feedback on user input Split up long running process Mask and reduce perceived wait times Background work and anticipate common requests Ways to Improve Responsiveness
32. Wait until the user settles for expensive operations or paints Avoid DOM changes at the beginning of / during animations Tune delays to find the sweet spot Ways to Improve Responsiveness
34. Naïve implementation Progressively inserted new DOM nodes Animated very large DOM parent Ever-growing height of DOM parent Bad: Performance degraded as you scrolled 1 2 3 4 5 6 7 Performance Evolution: Scrolling Rows
36. Optimized implementation Recycle DOM nodes Animate each row individually Delay modifying a row until it comes into the viewport or the user settles Good: Performance consistent regardless of location 1 2 3 4 2 1 5 Performance Evolution: Scrolling Rows
37. Optimized implementation Recycle DOM nodes Animate each row individually Delay modifying a row until it comes into the viewport or the user settles Good: Performance consistent regardless of location 3 4 5 1 2 Performance Evolution: Scrolling Rows
41. Enables GPU acceleration of compositing parts of the page Greatly benefits CSS animations Accelerated Compositing
42. DOM Tree -> Render Tree -> RenderLayer Tree Software path Changes to a render layer require repainting all overlapping layers Hardware path Some render layers paint to their own backing surface (compositing layer) Changes to a layer only repaint the contents of that layer Accelerated Compositing
43. 3D transforms Opacity changes Accidental Overlapping a layer Render engine Several ways to create layers
44. Safe CSS properties Transforms Opacity Un-safe Any other CSS properties DOM manipulation Leveraging layers
45. Keep layers small Don’t inadvertently create gigantic layers Memory consumption = width x height x 4 (bytes per pixel at 32-bit color depth) Animate smaller areas rather than large parts of the screen Trial and error, testing important Tips
47. Avoid unbounded growth Minimize the number of throwaway objects Use closures sparingly & only where necessary Dynamically load and unload code Memory
48. What’s Next? i was led to believe there would be flying cars
Core differences for UI design between 2’ UI (desktop) and 10’ UI (TV): - Mouse & keyboard vs. diverse remote controls, game controllers, etc. - Solitary vs. group - A device designed for activity (“lean-forward”) vs. a device designed for consumption (“lean-back”)
Our Webkit-based TV UI runs on a number of game consoles, set-top boxes, Internet-connected TVs, and Blu-ray players.
Image credits: PS3 image from store.sony.com; Sony Internet TV image from sonystyle.com; Wii image from nintendo.com; Logitech Revue image from logitech.com; Boxee image from laughingsquid.com
Why do we use web technologies instead of native applications? Most TV applications are written specifically for one device, in C/C++.
Why web technologies: Dynamic updates
With web technologies, we can update frequently without the customer even knowing. We typically push updates every two weeks, but we can push emergency updates at any time. That means we can easily add new features and fix bugs on existing hardware, just like any other website.
When our UI changes don’t require a firmware update, we need less coordination with partners. Device development cycles are very long, and firmware updates can be rare (certainly rarer than once every two weeks).
Why web technologies: Common technology
We don’t have to reimplement the application for every platform. Device manufacturers implement a DPI (device porting interface) to support our SDK, and our application should Just Work™ (though it may require special attention to performance). We have a certification team that verifies that the app works on a partner device as expected.
Even if the UI differs between devices, other client-side code can be shared (authentication, string bundles & i18n, metadata cache design, common UI components…).
Credits:
 - Old TV photo by Susan E Adams, http://www.flickr.com/photos/susanad813/4167385353/, Attribution-NonCommercial-ShareAlike 2.0 Generic (CC BY-NC-SA 2.0), http://creativecommons.org/licenses/by-nc-sa/2.0/
 - HTML5 logo by W3C, http://www.w3.org/html/logo/, Attribution 3.0 Unported (CC BY 3.0), http://creativecommons.org/licenses/by/3.0/
Why web technologies: Dynamically Add Locale Support
Since our UI text labels are just bundled in our JavaScript, we can add support for many new locales by uploading a new string bundle and a new UI version that’s aware of the bundle’s availability.
(Some locales may be more complicated, e.g. if they require large new fonts that we don’t want to download at runtime.)
Why web technologies: A/B testing
We rely on A/B testing to help us figure out which experiences are most effective for helping our customers find movies & TV shows they’ll love.
With web technologies, we can easily redirect different customers to different web pages, so those customers get different experiences.
Supporting variation
We build and maintain a number of different user interfaces. We constantly add new features, create new UIs to support new devices, etc.
To support this vast variation, we have to architect our applications and components for maximum flexibility.
Supporting variation: Problem description (1/2)
Consider one of our device UIs, codenamed “Special.” It may have dozens of UI components (menus, box art gallery, details page, episode selection, audio tracks, subtitles, star ratings, buffering screen, playback, fast forward mode…).
Supporting variation: Problem description (2/2)
How would we solve these requirements for Special?
The first two would be easy if not for the third.
Supporting variation: Solutions we’ve tried & abandoned
In our products and prototypes, we’ve tried a number of different ways of solving those problems.
Inadequate solution #1: Tight coupling
In this model, components are aware of one another, and manipulate each other directly, e.g.:
navigation.handleRight = function() { gallery.focus(); };
Tight coupling is very easy to understand, since there’s no abstraction. However, it makes change & reuse difficult:
 - What if the gallery is on the left of the navigation?
 - What if we want to show a settings panel instead of a gallery?
Inadequate solution #2: Mediator pattern
In the mediator pattern, components are completely ignorant of one another, and are therefore totally independent and reusable in other UIs. They broadcast events that the “mediator” class handles. The mediator models the relationship between components.
Unfortunately, since your variant UIs may need to reuse some or most of those relationships, you end up needing to reuse just part of your mediator. The components are reusable, but the monolithic mediator is not.
Inadequate solution #3: DOM focus & events
This model is what most developers use on the open web, and it works well to support mouse interaction: if it’s on the DOM and visible, you can click on it, and a handler/listener can be associated with that DOM element.
The approach can be used with LRUD input (left/right/up/down), too: programmatically assign focus to DOM elements, and key input events will be dispatched on those elements.
Unfortunately, this ties the controller & view:
 - It requires a DOM view, which precludes use of alternative rendering technologies (e.g. Canvas). (…Unless you decouple your controller DOM elements from your view DOM elements, but now we’re just making things complicated...)
 - It also falls apart when something is interactive but isn’t rendered (yet), like if metadata about some movies is still loading, or if we’re re-rendering a set of buttons to reflect available options for the selected title. Dynamic documents make a mess out of DOM focus.
 - It makes it harder to take input from non-DOM sources, e.g. touch/gesture input that doesn’t correspond to DOM events, automated testing, voice control, parts of the system that govern other parts programmatically, etc.
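To make the mediator trade-off concrete, here is a minimal sketch. Every name in it (`Component`, `wireSpecialUi`, the "left"/"right" event strings) is hypothetical and not taken from the actual application. The components know nothing about each other, but the layout knowledge ends up concentrated in one wiring function, which is the part that is hard to reuse piecemeal.

```javascript
// Hypothetical component: it only broadcasts events and tracks its own focus.
function Component(name) {
  this.name = name;
  this.focused = false;
  this.listeners = [];
}
Component.prototype.on = function (listener) {
  this.listeners.push(listener);
};
Component.prototype.emit = function (eventName) {
  for (var i = 0; i < this.listeners.length; i++) {
    this.listeners[i](eventName);
  }
};

// Hypothetical mediator: it encodes ONE layout (gallery to the right of
// navigation). A UI with a different layout would need a different mediator.
function wireSpecialUi(navigation, gallery) {
  navigation.on(function (eventName) {
    if (eventName === "right") {
      navigation.focused = false;
      gallery.focused = true;
    }
  });
  gallery.on(function (eventName) {
    if (eventName === "left") {
      gallery.focused = false;
      navigation.focused = true;
    }
  });
}

var navigation = new Component("navigation");
var gallery = new Component("gallery");
wireSpecialUi(navigation, gallery);
navigation.focused = true;
navigation.emit("right"); // focus moves from navigation to gallery
```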
Supporting variation: Current solution
Our state transition system models the application as a set of discrete application states. The states are also controllers (as in model-view-controller), so they handle user input and programmatic events and decide when it’s time to transition to a different state.
There can be only one active state at a time. Only the active state can receive state events like user input (e.g. an up key event) or programmatic events (e.g. video buffering complete). This helps disambiguate the application state, and tells us which UI component (state) should handle the next key press.
References:
http://en.wikipedia.org/wiki/State_transition_system
Supporting variation: State system example: Search
SearchInputState and SearchResultsState are two states/controllers. Only one is active at any time.
The SearchCompoundState models the relationship between the two (i.e. it owns the state transition criteria), and knows when to change the active state (i.e. it triggers the state transition).
When focus starts on SearchInputState and the customer presses the “right” key, SearchInputState handles the input state event and decides what to do:
 - If there’s another column of keys to the right of the focused key, increment the focused column by one and return true. Returning true halts event propagation.
 - Otherwise, return false. The event will bubble to the SearchCompoundState, which interprets a “right” key event as a state transition criterion, and sets the active state to SearchResultsState.
Note that the SearchResultsState still updates even when the SearchInputState is the active state, since the input state can manipulate a SearchModel that SearchResultsState listens to using the Observable pattern (i.e. events, but not state events).
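The handling and bubbling rules described for the search example can be sketched roughly as follows. The state names come from the notes, but the `handleEvent` method, the constructor arguments, and the plain-string events are assumptions made for illustration, not the real implementation.

```javascript
// Assumed interface: handleEvent(event) returns true if the event was handled.
function SearchInputState(columnCount) {
  this.columnCount = columnCount;
  this.focusedColumn = 0;
}
SearchInputState.prototype.handleEvent = function (event) {
  if (event === "right" && this.focusedColumn < this.columnCount - 1) {
    this.focusedColumn += 1;
    return true; // handled: halts propagation
  }
  return false; // not handled: the event bubbles to the compound state
};

function SearchResultsState() {}
SearchResultsState.prototype.handleEvent = function () {
  return false;
};

// The compound state owns the transition criteria between its two children.
function SearchCompoundState(inputState, resultsState) {
  this.inputState = inputState;
  this.resultsState = resultsState;
  this.activeState = inputState;
}
SearchCompoundState.prototype.handleEvent = function (event) {
  // Only the active child state sees the state event.
  if (this.activeState.handleEvent(event)) {
    return true;
  }
  // An unhandled "right" from the input state is a transition criterion.
  if (event === "right" && this.activeState === this.inputState) {
    this.activeState = this.resultsState;
    return true;
  }
  return false;
};

var input = new SearchInputState(2);
var search = new SearchCompoundState(input, new SearchResultsState());
search.handleEvent("right"); // moves focus to the next column of keys
search.handleEvent("right"); // no column left: transitions to the results state
```

Because the transition rule lives in the compound state, the two child states stay reusable in a UI where results sit below the input instead of to its right.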
Supporting variation: Loose coupling
In addition to our state transition system, we also use some other abstraction techniques to provide flexibility.
Event patterns help decouple event sources from consumers.
 - State actions: only the active state can receive state actions
 - Observable pattern: consumer knows source; source ignorant of consumer
 - Message bus: source & consumer ignorant of one another; implemented as a global Observable (with keys for event names, e.g. “TITLE_FOCUSED”)
Dependency injection allows us to bind components at runtime based on configuration. It’s similar to a factory, but instead of referencing multiple components like a factory, our configuration names a specific component associated with that UI. Some examples of how we use dependency injection:
 - States/controllers: Which video playback screen does this UI use? In that screen, for this device, which keys map to which behaviors?
 - Navigation model: How do we get from search input to results? Are results off the right edge of the input, or the bottom, or what? Does the customer have to press a button?
 - Magnitude configuration: How many search results shall we fetch?
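As a rough illustration of the message bus and of configuration-driven binding, consider the sketch below. Only the “TITLE_FOCUSED” key comes from the notes. The `Observable` methods, the registry entries, and the config shape are all hypothetical.

```javascript
// Minimal Observable keyed by event name (assumed API: subscribe/publish).
function Observable() {
  this.listenersByEvent = Object.create(null);
}
Observable.prototype.subscribe = function (eventName, listener) {
  if (!this.listenersByEvent[eventName]) {
    this.listenersByEvent[eventName] = [];
  }
  this.listenersByEvent[eventName].push(listener);
};
Observable.prototype.publish = function (eventName, payload) {
  var listeners = this.listenersByEvent[eventName] || [];
  for (var i = 0; i < listeners.length; i++) {
    listeners[i](payload);
  }
};

// Message bus: one global Observable, so source and consumer never meet.
var messageBus = new Observable();

// Dependency injection sketch: the per-UI config names one specific component.
// Both registry entries are made up for this example.
var componentRegistry = {
  ResultsToTheRight: { transitionKey: "right" },
  ResultsBelow: { transitionKey: "down" }
};
var specialUiConfig = { searchNavigation: "ResultsToTheRight" };
function resolve(config, role) {
  return componentRegistry[config[role]];
}

var focusedTitles = [];
messageBus.subscribe("TITLE_FOCUSED", function (title) {
  focusedTitles.push(title);
});
messageBus.publish("TITLE_FOCUSED", "Example Title");
var searchNavigation = resolve(specialUiConfig, "searchNavigation");
```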
Why we worry about performance and memory
Strategies for improving performance and efficiently using memory
Unbounded memory growth and memory fragmentation can occur. Both impact performance: they slow the app down or cause crashes. We want the app to feel snappy, smooth, and consistent.
All of the things above can vary across devices and impact performance and the memory footprint.
Credits: Circuit Board by John Morris, http://www.flickr.com/photos/jm999uk/182396962/, Attribution-NonCommercial-ShareAlike 2.0 Generic (CC BY-NC-SA 2.0), http://creativecommons.org/licenses/by-nc-sa/2.0/
Looking at CPU and memory alone, you can see that there's a wide range between the low end and the high end.
The graphics driver and GPU can also significantly affect performance on these devices.
It looks like we have a lot of memory available, but we don’t get to use all of it. Lots of things compete for memory:
 - Webkit is no small potato: our port, which uses Qt, is ~20MB of code, which reduces the overall memory available to the UI
 - Background processes
 - Graphics subsystem
Group devices into classes and target configuration based on the device class (a range of attributes). Currently, we have 3 tiers. Devices all start out in the middle tier and are then adjusted up or down based on performance.
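One way to picture the tiering is the sketch below. The baseline and enhanced settings are copied from the “Progressive enhancement” slide. The tier names, the property names, and the `adjustTier` helper are assumptions, and the slides do not say how the measured performance is turned into an up or down adjustment.

```javascript
// The two columns from the "Progressive enhancement" slide.
var ENHANCEMENT_LEVELS = {
  baseline: {
    animations: "none",
    maxConcurrentRequests: 5,
    cacheSize: "small",
    prefetch: "delayed, small batches"
  },
  enhanced: {
    animations: "all",
    maxConcurrentRequests: 20,
    cacheSize: "large",
    prefetch: "frequent, larger batches"
  }
};

// Hypothetical tier names; the notes only say that there are three tiers.
var TIERS = ["low", "middle", "high"];
var STARTING_TIER = "middle"; // every device starts in the middle tier

// Move a device one tier up (+1) or down (-1), clamped to the valid range.
function adjustTier(currentTier, direction) {
  var index = TIERS.indexOf(currentTier) + direction;
  index = Math.max(0, Math.min(TIERS.length - 1, index));
  return TIERS[index];
}
```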
Optimize for perceived performance: focus on improvements that reduce delays in responsiveness to user actions. Published studies, as well as our own qualitative research, support these thresholds:
 - 0.1 second: provide feedback on a user action within 100 milliseconds. Any longer and the perception of cause and effect is broken.
 - 1.0 second: try to complete actions within a second. After 1 second, provide feedback that the action is taking place, such as a progress indicator. After a second without feedback, users start to lose attention and feel that the app is broken.
A lot of what we deal with in perceived performance is in the sub-1-second range.
References:
 - “Usability Engineering” by Jakob Nielsen
 - http://www.useit.com/alertbox/response-times.html
 - http://www.useit.com/papers/responsetime.html
 - “Designing with the Mind in Mind: Simple Guide to Understanding User Interface Design Rules” by Jeff Johnson
 - http://www.sapdesignguild.org/editions/highlight_articles_02/responsiveness.asp
 - http://www.sapdesignguild.org/editions/highlight_articles_02/perc_perf.asp
 - http://developer.gnome.org/hig-book/3.0/feedback-response-times.html.en
Immediate feedback:
 - If you don’t provide it, the user assumes the key input was ignored or failed and repeats it. The result is navigating further than intended.
 - Sacrifice immediate content updates in favor of immediate feedback
Split up processes:
 - Execution Deferrer: “do this when you have time.” Examples: processing AJAX responses, events
Mask and reduce perceived wait times:
 - Spinner while we’re loading the movie metadata on our movie details page
 - During application start-up, we progress through text to give the user a sense that the application is proceeding
 - During playback startup, a progress bar advances through metadata loading, license acquisition, buffering, etc.
Background work:
 - Pre-fetch metadata (metadata = movie metadata and images)
 - Some applications or UIs might be able to take advantage of Web Workers
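The notes do not show how the Execution Deferrer works, so the following is only a guess at the idea: queue low-priority tasks and run one per timer turn, so key input handlers can run in between. The class shape and the injectable `schedule` function are assumptions.

```javascript
// Hypothetical deferrer: runs queued tasks one at a time, yielding between them.
function ExecutionDeferrer(schedule) {
  this.queue = [];
  this.pending = false;
  // By default, yield to the event loop with a zero-delay timeout.
  this.schedule = schedule || function (callback) {
    setTimeout(callback, 0);
  };
}

ExecutionDeferrer.prototype.defer = function (task) {
  this.queue.push(task);
  this.scheduleNext();
};

ExecutionDeferrer.prototype.scheduleNext = function () {
  if (this.pending || this.queue.length === 0) {
    return;
  }
  this.pending = true;
  var self = this;
  this.schedule(function () {
    self.pending = false;
    var task = self.queue.shift();
    task();
    self.scheduleNext(); // the next task waits for another turn
  });
};
```

Typical use would be something like `deferrer.defer(function () { /* process a response */ })`, with the handler for a key press running between two deferred tasks rather than behind all of them.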
Settles:
 - The user “settles” when they stop navigating for long enough (200-300 ms) that we believe they intend to consume the content they’re focused on.
 - It is an opportunity to update content/metadata on the screen, prefetch metadata, and perform delayed tasks
 - How: a setTimeout that is cleared when another key input event occurs
Animations:
 - Wait until the “webkitTransitionEnd” event, then slightly longer (a timeout) to allow key input
 - Animations are time-based. Doing anything else during an animation, such as making Webkit paint parts of the screen, eats up time and causes dropped frames.
 - Consistent and smooth has better perceived performance than inconsistent and jerky, even if the latter is faster.
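The settle timer described above (a setTimeout that is cleared on every new key input) could look roughly like this. The `SettleDetector` name and the injectable `timers` argument are illustrative assumptions. The 250 ms in the usage comment is just one value inside the 200-300 ms range mentioned in the notes.

```javascript
// Hypothetical settle detector: onSettle fires once the user stops pressing keys.
function SettleDetector(onSettle, delayMs, timers) {
  this.onSettle = onSettle;
  this.delayMs = delayMs;
  // Wrapped so the native timer functions are not called as object methods.
  this.timers = timers || {
    set: function (callback, ms) { return setTimeout(callback, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  this.timerId = null;
}

// Call this from every key input handler.
SettleDetector.prototype.onKeyInput = function () {
  if (this.timerId !== null) {
    this.timers.clear(this.timerId); // the user is still navigating
  }
  var self = this;
  this.timerId = this.timers.set(function () {
    self.timerId = null;
    self.onSettle();
  }, this.delayMs);
};

// Usage sketch:
// var detector = new SettleDetector(loadBoxArtForFocusedRow, 250);
// where loadBoxArtForFocusedRow is whatever expensive update was postponed.
```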
Because of memory constraints, we don’t prefetch and load all data (imagine endless rows vertically and horizontally).
Grey placeholders: shown when the user has scrolled to an area whose metadata has not loaded, or when we are delaying updates in favor of immediate feedback.
 - E.g. during rapid scroll, don’t load movie box art until the user settles
 - Do update list titles to give context within the grid
Bob = back of DVD box: lower its opacity to show that it no longer matches the focus.
 - This saves expensive paints, since it covers a large portion of the screen
On settle: update the focused row and the Bob first, then stagger out the rest of the rows.
 - Leave time for key input in between, and avoid the UI freezing up from large/expensive paints
Other visual cues:
 - Counter: provides context for where you are horizontally within the row of movies
 - Boots and hats (rows peeking through on the top and bottom) give the user context that there is more to navigate to
The blue boxes are rows of box art and the dotted line is the viewport.
Create and insert new DOM nodes into the parent node.
DOM recycling: reusing DOM nodes for other movie rows.
 - Don’t change their location in the DOM (don’t do any insertions)
 - Just change their visual position with -webkit-transform
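A minimal sketch of row recycling, under several assumptions: a fixed pool of row elements that already exist in the DOM, a fixed row height, and a caller-supplied `renderRow` function. None of these names come from the actual code. Each logical row maps to a pool element by modulo, so scrolling by one row re-renders a single element and merely repositions the others.

```javascript
// rowElements: fixed pool of elements already in the DOM (never re-inserted).
function RowRecycler(rowElements, rowHeightPx, renderRow) {
  this.rowElements = rowElements;
  this.rowHeightPx = rowHeightPx;
  this.renderRow = renderRow;
  this.shownIndex = []; // logical row currently shown by each pooled element
}

// Show logical rows firstRowIndex .. firstRowIndex + poolSize - 1.
// Assumes firstRowIndex is a non-negative integer.
RowRecycler.prototype.showFrom = function (firstRowIndex) {
  var poolSize = this.rowElements.length;
  for (var slot = 0; slot < poolSize; slot++) {
    var rowIndex = firstRowIndex + slot;
    var poolIndex = rowIndex % poolSize;
    var element = this.rowElements[poolIndex];
    if (this.shownIndex[poolIndex] !== rowIndex) {
      // Only touch the contents when this element takes over a new row.
      this.renderRow(element, rowIndex);
      this.shownIndex[poolIndex] = rowIndex;
    }
    // Move the element visually; its place in the DOM tree never changes.
    element.style.webkitTransform =
      "translateY(" + (slot * this.rowHeightPx) + "px)";
  }
};
```

With this scheme the number of DOM nodes stays constant however far the user scrolls, which matches the "performance consistent regardless of location" outcome on the slide.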
Take advantage of what the hardware is optimized for / good at. The GPU is designed for drawing and compositing operations that involve large numbers of pixels. Avoid computationally expensive work in software.
Cool new CSS3 features can come at a cost on low-end devices. CSS can be very powerful, but often you are doing work in software that could be pre-baked into images. Hardware can just paint images; CSS requires software to calculate, render and paint.
Example: the full-screen radial gradient was a background image, and the UI was the foreground.
You can often cause paints accidentally without realizing it.
 - The rendering engine doesn’t optimize for no-ops, e.g. changing textContent to the same text
 - Minimize DOM manipulation
 - Localize changes: update attributes, e.g. img src, CSS, textContent
References:
 - http://code.google.com/chrome/devtools/docs/overview.html
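Since the rendering engine does not skip no-op writes, the application can guard them itself. A small sketch follows. The helper name is made up, and it works on any object with a `textContent` property.

```javascript
// Write textContent only when the value actually changes, so a no-op update
// never reaches the rendering engine. Returns true if a write happened.
function setTextIfChanged(element, text) {
  if (element.textContent === text) {
    return false;
  }
  element.textContent = text;
  return true;
}
```

The same guard applies to other localized updates, such as an image's `src` attribute.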
Without accelerated compositing, animations would be slow. Every device we’ve seen has benefited from accelerated compositing, even high-end platforms like the PS3. Accelerated compositing trades memory for performance, but devices often have plenty of video memory.
Render Tree: each node in the DOM tree that produces visual output has a corresponding RenderObject.
RenderLayer Tree: each RenderObject directly or indirectly has a RenderLayer. RenderLayers exist so that the elements of the page are composited in the correct order to properly display overlapping content, semi-transparent elements, etc.
Hardware path: compositing layer = bitmap in video memory (unless it’s a container).
References:
 - http://www.chromium.org/developers/design-documents/gpu-accelerated-compositing-in-chrome
It’s easy to create layers accidentally.
 - Overlapping an element on top of a layer will create an implicit layer for that element
 - The render engine may choose to render an element as a layer
Safe: CSS properties that can be changed on a layer without causing the layer to re-render.
Unsafe:
 - Any other properties are not safe to modify
 - DOM insertions and manipulation (attributes, textContent)
Gigantic layers: e.g. text-indent -8000px.
Some devices paint larger areas more slowly.
You could actually make things slower if you just turned everything into a layer.
It’s not always intuitive what improves performance: trial and error is important.
Use the tools to verify performance on actual devices.
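The slide's rule of thumb (memory consumption = width x height x 4) is easy to turn into a quick estimate. In the sketch below the 4 is taken as bytes per pixel, and the layer dimensions are purely illustrative, not measurements from any device.

```javascript
var BYTES_PER_PIXEL = 4; // 32-bit color: one byte each for R, G, B, and alpha

// Approximate backing-store size of a compositing layer, in bytes.
function layerBytes(widthPx, heightPx) {
  return widthPx * heightPx * BYTES_PER_PIXEL;
}

var fullScreenLayer = layerBytes(1280, 720); // 3686400 bytes, about 3.5 MiB
var boxArtLayer = layerBytes(200, 300);      // 240000 bytes, about 234 KiB
```

A single full-screen layer costs as much as roughly fifteen of these box-art-sized layers, which is why an accidentally gigantic layer is worth hunting down.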
Mention availability in Safari and Chrome. Describe visual hints. Discuss optimizations: vertical up/down and horizontal left/right navigation.
Visual hints:
 - Red: a layer
 - Yellow: container
 - Number in the top left corner of the layer: how many times it has been repainted
 - Cyan: helps with debugging “overflow”. Not expensive for memory
 - Green: tiled layer. Created when the layer width/height is > 1024
Optimizations:
 - Each row has 2 layers: one for the title, one for the boxart. This allows fast updates to the title during rapid scrolling, as it’s a smaller area to repaint than 1 big layer for the row. For each row, we animate 2 layers.
 - Left to right, break down the boxart layer into individual layers for each boxart. This optimizes for left/right animation and recycling the boxart nodes.
 - The counter updates on each key input, but the movie details layer doesn’t, since it only updates when the user settles.
Enabling in browsers:
 - Safari: in a terminal, run the following command:
   defaults write com.apple.Safari IncludeInternalDebugMenu 1
   Then go to the “Debug” menu and select “show compositing borders”.
 - Chrome: open about:flags and enable “Composited render layer borders”.
Unbounded growth:
 - Eliminate references to free objects up for garbage collection
 - Use pools to fix sizes and re-use objects, e.g. DOM elements
Objects:
 - Lots of objects will cause more frequent and slower garbage collections
 - Unpack options objects
 - Memory fragmentation: the video buffer needs a large contiguous block of free space. Too many small blocks = can’t allocate large blocks
Closures:
 - Avoid persistent closures that increase the scope chain and memory footprint
Loading:
 - Infrequently visited areas of the app
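The pooling advice can be illustrated with a fixed-size pool. This is only a sketch: the `ObjectPool` name and its `acquire`/`release` methods are assumptions, and a real pool would also need a policy for resetting objects before reuse.

```javascript
// Fixed-size pool: every object is created up front, so memory use is bounded
// and no throwaway objects are produced while the user navigates.
function ObjectPool(size, create) {
  this.free = [];
  for (var i = 0; i < size; i++) {
    this.free.push(create());
  }
}

// Returns a pooled object, or null when the pool is exhausted.
ObjectPool.prototype.acquire = function () {
  return this.free.length > 0 ? this.free.pop() : null;
};

// Hand an object back so that it can be re-used.
ObjectPool.prototype.release = function (object) {
  this.free.push(object);
};
```

Returning null on exhaustion, rather than silently creating more objects, is what keeps growth bounded. The caller has to decide what to recycle.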
What’s next
Devices
 - New game consoles with novel capabilities
More UI and feature testing
Input types
 - Pointer: We’ve already developed some UIs for TV devices with pointer input.
 - Touch
 - Motion/gesture
 - Voice
Performance tuning
 - Greater capabilities on inexpensive devices
 - Greater capabilities on high-end devices