"Observability is for User Happiness," performance.now() 2019 (Amsterdam)


Within the observability community, there’s a saying, “nines don’t matter if users aren’t happy,” meaning that 99.999% server uptime is a pointless goal if our customers aren’t having a fast, smooth, productive experience. But how do we know if users are happy? As members of the web performance community, we’ve been thinking about the best ways to answer that question for years. Now the observability community is asking the same questions, but coming at them from the opposite side of the stack. What can we learn from each other? Emily will talk about how approaching web performance through the lens of observability has changed the way her team thinks about performance instrumentation and optimization. She’ll cover the nuts & bolts of how Honeycomb instrumented its customer-facing web app, and she’ll show how the Honeycomb team is using this data to find and fix some of its trickiest performance issues, optimize customer productivity, and drive the design of new features.

  1. Observability and User Happiness, @eanakashima
  2. about:// Emily Nakashima (she/her), Director, Engineering
  3. Web Performance?
  4. Observability?
  5. NONSENSE.
  6. [Map slide] Regions: Site Reliability Engineering (SRE) Island, The Isle of Ops, Backend Engineer Island, Fullstackville, Frontend Island, Perf Island, The Isle of Browser Vendor, and the Expansive Domain of Designers & Product Managers; two areas are labeled Observability and Web Perf.
  7. Observability: An observable system is one whose internal state can be deeply understood just by observing its outputs.
  8. Observability
  9. Observability vs. Web Perf: Let’s talk about birds.
  10. Hawks vs. Falcons (photos: Red-Tailed Hawk by @bwmaddog21, https://www.flickr.com/photos/bwmaddog21/38890588682; Peregrine Falcon by @beckymatsubara, https://www.flickr.com/photos/beckymatsubara/42017229951)
  11. Parrots && Falcons (photos: Parrot (Lovebird, Agapornis lilianae) by @nikborrow, https://www.flickr.com/photos/nikborrow/31643657028/; Peregrine Falcon by @beckymatsubara, https://www.flickr.com/photos/beckymatsubara/42017229951)
  12. Convergent evolution! (photos: Red-Tailed Hawk by @bwmaddog21, https://www.flickr.com/photos/bwmaddog21/38890588682; Peregrine Falcon by @beckymatsubara, https://www.flickr.com/photos/beckymatsubara/42017229951)
  13. Same deal with Web Perf & Observability folks (photos: Web Perf practitioner and Observability practitioner, both by @solutionist, https://www.flickr.com/photos/solutionist/48227528782/)
  14. Observability: An observable system is one whose internal state can be deeply understood just by observing its outputs. Observability is a system property, just like performance.
  15. “Nines don’t matter if users aren’t happy.” - Charity Majors, all the time
  16. Same deal with Web Perf & Observability folks (photos: Web Perf practitioner and Observability practitioner, both by @solutionist, https://www.flickr.com/photos/solutionist/48227528782/). Annotated traits: • Worries about developer adoption • Thinks these numbers look wrong • Cares deeply about UX • Not sure how to balance this, my real job, with what they think they pay me for • Debating nine different ways to measure the same thing • Lots of emotional energy going into some standards or spec process • Obsessed with numbers
  17. Let’s join forces!
  18. This talk: (1) Talk about birds for five minutes, (2) Data models, (3) SLOs vs. performance budgets, (4) Observability for perf optimization, (5) Observability for UX design
  19. References! bit.ly/hawks-vs-falcons
  20. Part 2: Data models. Events, metrics, logs, traces … oh my!
  21. Two communities, different superpowers. Web Perf: • Sophisticated tooling, many tool types • Amazing, mature developer experience • Lots of experience improving the ecosystem through specs & new browser APIs. Observability: • Focused on instrumentation best practices • Goal is to enable answering any question about the state of your software • Just starting on specs
  22. Data models: Events are the fundamental unit of data in observability.
  23. Data models: observability events != DOM events
  24. What is an event? 1 event == 1 unit of work ~= 1 request
  25. What’s in an event? { "GojiPattern": "/user_event/:event_type", "Header.Content-Type": "[\"application/json\"]", "Header.Cookie": "[\"_ga=GA1.2.2033006133.1516389900; …\"]", "Header.User-Agent": "[\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1)…\"]", "Host": "127.0.0.1:8080", "IsXHR": true, "Method": "POST", "RequestURI": "/user_event/page-unload", "ResponseContentLength": 443, "ResponseHttpStatus": 200, "ResponseTime_ms": 123, "Timestamp": "2018-03-02T06:14:57.206349701Z", "UserEmail": "nathan@honeycomb.io", "UserID": 18, "availability_zone": "us-east-1b", "build_id": "6552", "env": "dogfood", "infra_type": "aws_instance", "instance_type": "t2.micro", "memory_inuse": 15450056, "num_goroutines": 56, "request_id": "poodle-a38f5e39/5fIUGkX5D1-001814", "server_hostname": "poodle-a38f5e39", "type": "request" }
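Slides 24 and 25 describe an event as one record per unit of work (roughly one request) with many fields attached. The Python sketch below illustrates that pattern under stated assumptions: handlers take and return plain dicts, and the hypothetical `instrumented` decorator times each call and emits exactly one wide event, even when the handler raises. `emit` and `EMITTED_EVENTS` are illustrative stand-ins for a real telemetry client, and only a few field names from the example event are reused.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real telemetry backend.
EMITTED_EVENTS: List[Dict[str, Any]] = []


def emit(event: Dict[str, Any]) -> None:
    EMITTED_EVENTS.append(event)


Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def instrumented(handler: Handler) -> Handler:
    """Wrap a request handler so that every call produces exactly one event."""

    @wraps(handler)
    def wrapper(request: Dict[str, Any]) -> Dict[str, Any]:
        # Start the event with request context; more fields are added as work proceeds.
        event: Dict[str, Any] = {
            "type": "request",
            "Method": request.get("method"),
            "RequestURI": request.get("uri"),
            "UserID": request.get("user_id"),
        }
        start = time.perf_counter()
        try:
            response = handler(request)
            event["ResponseHttpStatus"] = response.get("status", 200)
            return response
        except Exception as exc:
            event["ResponseHttpStatus"] = 500
            event["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so each request yields one event.
            event["ResponseTime_ms"] = (time.perf_counter() - start) * 1000
            emit(event)

    return wrapper


@instrumented
def handle_user_event(request: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": 200, "body": "ok"}


handle_user_event({"method": "POST", "uri": "/user_event/page-unload", "user_id": 18})
print(EMITTED_EVENTS[-1]["RequestURI"], EMITTED_EVENTS[-1]["ResponseHttpStatus"])
# prints: /user_event/page-unload 200
```

Putting the `emit` call in `finally` is what keeps the "one event per unit of work" rule intact for failed requests too.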
  26. High cardinality: fields that may have many unique values. Common examples: • email address • username / user id / team id • server hostname • IP address • user agent string • build id • request url • feature flags / flag combinations
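To make "many unique values" concrete, the Python sketch below counts the distinct values each field takes across a batch of events. The `min_ratio` cutoff used to label a field "high cardinality" is an arbitrary illustrative heuristic, not a standard definition, and the sample events are made up using field names from the example event.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def distinct_counts(events: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count the unique values each field takes across a batch of events."""
    seen: Dict[str, Set[Any]] = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(value)  # assumes field values are hashable
    return {field: len(values) for field, values in seen.items()}


def high_cardinality_fields(events: List[Dict[str, Any]], min_ratio: float = 0.75) -> List[str]:
    """Fields whose unique-value count is at least `min_ratio` times the event count.

    The 0.75 default is an arbitrary illustrative cutoff.
    """
    counts = distinct_counts(events)
    return sorted(f for f, n in counts.items() if n >= min_ratio * len(events))


events = [
    {"env": "dogfood", "UserEmail": "a@example.com", "build_id": "6552"},
    {"env": "dogfood", "UserEmail": "b@example.com", "build_id": "6552"},
    {"env": "dogfood", "UserEmail": "c@example.com", "build_id": "6553"},
    {"env": "dogfood", "UserEmail": "d@example.com", "build_id": "6553"},
]
print(distinct_counts(events))          # {'env': 1, 'UserEmail': 4, 'build_id': 2}
print(high_cardinality_fields(events))  # ['UserEmail']
```

In many tag-based metrics systems each distinct value of a tag becomes its own time series, which is why fields like `UserEmail` are usually avoided there but are cheap to keep as plain fields on an event.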
  27. What about the Three Pillars? Logs, Metrics, Traces. Example log line: 207.46.1.2 - [03/Nov/2016:16:11:43 -0700] "GET /robots.txt HTTP/1.1"
  28. Events: { "type": "db.select", "duration": 23, "host": "poodle-437987", "query_shape": "SELECT * FROM teams WHERE id = ?" }. Derive all Three Pillars from Events.
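The claim that logs and metrics fall out of events can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular backend works: each event is a dict with `type` and `duration` fields like the `db.select` example, a structured log line is taken to be the event serialized as JSON, and the metrics (count, mean, and a nearest-rank p95, which is only one of several percentile definitions) are computed from the raw events at read time.

```python
import json
import math
from collections import defaultdict
from typing import Any, Dict, List


def to_log_line(event: Dict[str, Any]) -> str:
    """A structured log line is just the event serialized (keys sorted for stability)."""
    return json.dumps(event, sort_keys=True)


def nearest_rank(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def derive_metrics(events: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate raw events into per-type metrics: count, mean and p95 duration."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for event in events:
        durations[event["type"]].append(float(event["duration"]))
    metrics: Dict[str, Dict[str, float]] = {}
    for event_type, values in durations.items():
        values.sort()
        metrics[event_type] = {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "p95_ms": nearest_rank(values, 95),
        }
    return metrics


events = [
    {"type": "db.select", "duration": d, "host": "poodle-437987"}
    for d in (10, 20, 30, 40)
]
print(to_log_line(events[0]))
# {"duration": 10, "host": "poodle-437987", "type": "db.select"}
print(derive_metrics(events))
# {'db.select': {'count': 4, 'mean_ms': 25.0, 'p95_ms': 40.0}}
```

Because the raw events are kept, a different aggregation (say, p99 grouped by `host`) can be computed later without re-instrumenting anything.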
  29. Derive all Three Pillars from Events: 1 (structured) log line ~= 1 event; metrics can be derived from events; traces = n events (spans) with parent/child relationships.
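"Traces = n events (spans) with parent/child relationships" can be illustrated by rebuilding a tree from flat span events. The Python sketch below assumes all spans belong to one trace and uses hypothetical field names (`span_id`, `parent_id`, `start_ms`, `duration_ms`); real tracing libraries such as OpenTelemetry define their own schema. The span names and timings are made up.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Span = Dict[str, Any]


def render_trace(spans: List[Span]) -> List[str]:
    """Assemble span events from one trace into an indented outline.

    Each span is an ordinary event plus `span_id` and `parent_id`
    (None for the root); siblings are ordered by `start_ms`.
    """
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_ms"])

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


spans = [
    {"span_id": "a", "parent_id": None, "name": "page-load", "start_ms": 0, "duration_ms": 1278},
    {"span_id": "c", "parent_id": "a", "name": "render dataset list", "start_ms": 600, "duration_ms": 127},
    {"span_id": "b", "parent_id": "a", "name": "GET /datasets", "start_ms": 200, "duration_ms": 300},
    {"span_id": "d", "parent_id": "b", "name": "db.select", "start_ms": 250, "duration_ms": 23},
]
print("\n".join(render_trace(spans)))
# page-load (1278 ms)
#   GET /datasets (300 ms)
#     db.select (23 ms)
#   render dataset list (127 ms)
```

Nothing in a span is special beyond the two ID fields, which is the point of the slide: the same events can be read as logs, aggregated as metrics, or linked into traces.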
  31. Wait, maybe you should just use Tracing?
  32. Distributed Tracing: OpenCensus, OpenTelemetry, OpenTracing
  33. Distributed Tracing … in the Browser (bit.ly/hawks-vs-falcons)
  34. Simple trace: just load a page
  35. When we create events (spans): • On page load • On history state change (SPA navigation) • On significant user actions • On error (also send to error monitoring tools) • On page unload
  36. What’s in an event? Page load event (collect information about the page): { "type": "page-load", "duration": "1278", "device_type": "tablet", "connection_type": "3g", "user_agent": "Mozilla/5.0 (Macintosh)…", ...all feature flag states, ...all navigation timing measurements }. Button click event (collect information about the interaction): { "type": "usage-mode-button-click", "duration": "28", "location": "dataset list", "animation_render_duration": "127" }
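Slide 36 suggests attaching all feature flag states and all navigation timing measurements to the page-load event. Kept in the same Python as the other sketches here, the hypothetical `page_load_event` function shows one way to flatten both into a single wide event. In a browser the timing values would typically come from the Navigation Timing entry returned by `performance.getEntriesByType("navigation")`. Here they are a plain dict using a few of that entry's attribute names, and the flag name and the `timing.` / `flag.` prefixes are invented for illustration.

```python
from typing import Any, Dict


def page_load_event(
    timing: Dict[str, float],
    flags: Dict[str, bool],
    context: Dict[str, Any],
) -> Dict[str, Any]:
    """Flatten navigation timing marks and feature flags into one wide page-load event."""
    event: Dict[str, Any] = {"type": "page-load", **context}
    start = timing["startTime"]
    event["duration"] = timing["loadEventEnd"] - start
    # One field per timing mark, measured from the start of the navigation.
    for mark, value in timing.items():
        if mark != "startTime":
            event[f"timing.{mark}"] = value - start
    # One field per flag, so any flag can later be used to group or filter.
    for flag, enabled in flags.items():
        event[f"flag.{flag}"] = enabled
    return event


event = page_load_event(
    timing={"startTime": 0.0, "responseStart": 180.0,
            "domContentLoadedEventEnd": 950.0, "loadEventEnd": 1278.0},
    flags={"new-query-builder": True},
    context={"device_type": "tablet", "connection_type": "3g"},
)
print(event["duration"], event["timing.responseStart"], event["flag.new-query-builder"])
# prints: 1278.0 180.0 True
```

Flattening everything into top-level fields (rather than nested objects) keeps each flag and timing mark directly queryable, for example "page-load duration grouped by `flag.new-query-builder`".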
  37. page-load duration, graphed
  38. Complicated trace: lots of user interaction
  39. Hawks vs. Falcons: Network tools (a single view, exportable as HAR) vs. a distributed trace for the same page view (RUM data)
  40. What’s next? Measuring React renders! (bit.ly/hawks-vs-falcons)
  41. Part 3: SLOs vs. performance budgets. How to make sure we’re optimizing what really matters
  42. Performance budgets
  44. Service Level Objectives (SLOs)
  45. Service Level Objectives (SLOs): bit.ly/hawks-vs-falcons
  46. SLOs, SLIs, SLAs. For each facet of system performance (latency, errors, etc.) ask: • SLI (Service Level Indicator): what do we measure? ○ response time of web app requests • SLA (Service Level Agreement): what did we promise our customers? ○ response time will be under 10 seconds, 99% of the time • SLO (Service Level Objective): what number would keep our users happy? ○ response time should be under 1 second, 99.9% of the time
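The SLI, SLA, and SLO definitions on slide 46 reduce to simple arithmetic over request latencies. A minimal Python sketch, assuming latencies are already collected in milliseconds and that "under 1 second" means strictly less than 1,000 ms. The function names and sample data are hypothetical. The error budget is the share of requests the SLO allows to be slow (0.1% for a 99.9% target).

```python
from typing import List


def sli(latencies_ms: List[float], threshold_ms: float) -> float:
    """SLI: the fraction of requests that finished under the latency threshold."""
    good = sum(1 for t in latencies_ms if t < threshold_ms)
    return good / len(latencies_ms)


def meets_objective(latencies_ms: List[float], threshold_ms: float, target: float) -> bool:
    """True if the SLI is at or above the target, e.g. 0.999 for 99.9%."""
    return sli(latencies_ms, threshold_ms) >= target


def error_budget_remaining(latencies_ms: List[float], threshold_ms: float, target: float) -> float:
    """Share of the error budget still unspent: 1.0 untouched, 0.0 used up, < 0 blown."""
    allowed_bad = (1 - target) * len(latencies_ms)
    actual_bad = sum(1 for t in latencies_ms if t >= threshold_ms)
    return 1 - actual_bad / allowed_bad


# Hypothetical sample: 10,000 requests, 5 of which took longer than one second.
latencies = [120.0] * 9_995 + [1_500.0] * 5
print(sli(latencies, 1_000))                            # 0.9995
print(meets_objective(latencies, 1_000, 0.999))         # SLO: under 1 s, 99.9% of the time
print(meets_objective(latencies, 10_000, 0.99))         # SLA: under 10 s, 99% of the time
print(error_budget_remaining(latencies, 1_000, 0.999))  # about 0.5: half the budget left
```

The same data answers both questions; only the threshold and target differ, which is why the SLO can be much stricter than the SLA.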
  48. Two tools, different superpowers. Performance budgets: • Many easy ways to get started • Great tooling support (webpack, Lighthouse) • Easy to understand • Business stakeholders might not care. SLOs (Service Level Objectives): • Extremely flexible: use any number you measure! • Get burn alerts when your budget runs low • Read multiple book chapters to understand • You get business stakeholder buy-in up front
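Slide 48 mentions getting burn alerts when the error budget runs low. One common way to express this, shown here as an assumption and not necessarily how Honeycomb's burn alerts are computed, is a burn rate: the fraction of bad requests in a recent window divided by the fraction the SLO allows. The Python sketch below uses hypothetical names and an arbitrary alert threshold of 10.

```python
from typing import List


def burn_rate(outcomes: List[bool], target: float) -> float:
    """How fast a window of requests is spending the error budget.

    `outcomes` holds True for a good request and False for a bad one.
    1.0 means the budget would last exactly the whole SLO period;
    10.0 means it is being spent ten times too fast.
    """
    bad_fraction = outcomes.count(False) / len(outcomes)
    return bad_fraction / (1 - target)


def should_alert(outcomes: List[bool], target: float, max_burn_rate: float) -> bool:
    """Alert when the recent burn rate exceeds a chosen threshold (an assumed policy)."""
    return burn_rate(outcomes, target) > max_burn_rate


# Hypothetical last hour: 1,000 requests, 20 of them slower than the SLO allows.
last_hour = [True] * 980 + [False] * 20
print(round(burn_rate(last_hour, 0.999), 1))            # 20.0
print(should_alert(last_hour, 0.999, max_burn_rate=10))  # True
```

Alerting on the rate of spend, not on individual slow requests, is what lets the team ignore brief blips while still hearing about problems that would exhaust the budget early.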
  49. Part 4: Observability for perf optimization. Using observability to make things faster
  50. Regular product architecture
  51. Secure Tenancy product architecture
  52. Observability for Perf Optimization: it’s slow for only one team, on the secure architecture
  53. Secure product performance regression
  55. High-performance browser instrumentation code: (1) Batch requests together so you don’t run down the battery and use up resources. (2) Use the Beacon API to send events in a non-blocking way. (3) Use `requestIdleCallback` or `setTimeout` to handle slower calculations. Don’t shoot yourself in the foot while trying to look at your own foot.
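The batching advice on slide 55 does not depend on the language, so it is sketched here in the same Python as the other examples rather than as browser code. The hypothetical `EventBatcher` buffers events and hands them to an injected `send` function in groups. In a browser, `send` would plausibly wrap `navigator.sendBeacon` (point 2), and slower per-event calculations could be deferred with `requestIdleCallback` (point 3); neither is modeled here. The default batch size of 20 is arbitrary.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventBatcher:
    """Buffer events and send them in batches instead of one request per event.

    `send` stands in for the transport (in a browser, something like a
    wrapper around navigator.sendBeacon).
    """

    def __init__(self, send: Callable[[List[Event]], None], max_batch: int = 20) -> None:
        self._send = send
        self._max_batch = max_batch
        self._buffer: List[Event] = []

    def add(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; also call this when the page is going away."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


sent: List[List[Event]] = []
batcher = EventBatcher(send=sent.append, max_batch=3)
for i in range(7):
    batcher.add({"type": "usage-mode-button-click", "seq": i})
batcher.flush()  # e.g. on page unload
print([len(batch) for batch in sent])  # [3, 3, 1]
```

A real implementation would likely also flush on a timer and when the page is being hidden, so that a half-full buffer is not lost.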
  56. Another performance regression
  60. Part 5: Observability for UX design. Using observability to drive design
  61. Using observability to drive design
  68. We’re all product engineers now!
  69. How to tell a hawk from a falcon (photos: Red-Tailed Hawk by @hmclin, https://www.flickr.com/photos/hmclin/14119319574; Peregrine Falcon by @zonotrichia, https://www.flickr.com/photos/zonotrichia/31001823086)
  70. Thank you! @eanakashima, bit.ly/hawks-vs-falcons
