The Statistics of Web Performance Analysis

If you're interested in measuring real user web performance, you'll find tools like boomerang or episodes quite handy. Some popular web frameworks even have modules that make it easy to add them to your site. However, what do you do once you've collected the data? How do you filter out the noise and get meaningful insights from what's left?

In this talk, I'll go over the techniques we've picked up by analyzing millions of datapoints daily. I'll cover some simple rules to filter out invalid data, and the statistics to analyze and make sense of what's left. Do you use the mean, median or mode? What about the geometric mean and standard deviation? How confident are we in the results? And finally, why should we care?

This talk should help you gain useful insights from a histogram, or at the very least point you in the right direction for further analysis.

The Statistics of Web Performance Analysis

  1. • Philip Tellis • .com • philip@lognormal.com • @bluesmoon • geek paranoid speedfreak • http://bluesmoon.info/
  2. I'm a Web Speedfreak
  3. We measure real user website performance
  4. This talk is about the Statistics we learned while building it
  5. The Statistics of Web Performance Analysis. Philip Tellis / philip@lognormal.com. Boston #WebPerf Meetup / 2012-08-14
  6. 0 Numbers
  7. Accurately measure page performance*
  8. Be unintrusive. If you try to measure something accurately, you will change something related – Heisenberg's uncertainty principle
  9. And one number to rule them all
  10. What do we measure? • Network Throughput • Network Latency • User perceived page load time
  11. We measure real user data
  12. Which is noisy
  13. 1 Statistics - 1
  14. Disclaimer: I am not a statistician
  15. 1-1 Random Sampling
  16. Population: All possible users of your system
  17. Sample: Representative subset of the population
  18. Bad sample: Sometimes it's not
  19. How to randomize? http://xkcd.com/221/
  20. How to randomize? • Pick 10% of users at random and always test them, OR • For each user, decide at random if they should be tested. http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
  21. Select 10% of users - I: if($sessionid % 10 === 0) { // instrument code for measurement } • Once a user enters the measurement bucket, they stay there until they log out • Fixed set of users, so tests may be more consistent • Error in the sample results in positive feedback
  22. Select 10% of users - II: if(rand() < 0.1 * getrandmax()) { // instrument code for measurement } • For every request, a user has a 10% chance of being tested • Gets rid of positive feedback errors, but sample size != 10% of population
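
A minimal PHP sketch of the two sampling approaches from slides 21 and 22 above; the $sessionid variable, the helper names, and the 10% rate are illustrative assumptions, not part of boomerang or any particular framework:

    <?php
    // Approach I: session-based bucket (assumes $sessionid is a numeric session id).
    // The same users are always measured, so tests are consistent,
    // but any bias in the bucket feeds back into every measurement.
    function inBucketBySession($sessionid, $modulus = 10) {
        return ($sessionid % $modulus) === 0;
    }

    // Approach II: per-request coin flip.
    // Every request has a 10% chance of being measured; no feedback loop,
    // but the measured share of traffic only fluctuates around 10%.
    function inBucketByRequest($rate = 0.1) {
        return mt_rand() < $rate * mt_getrandmax();
    }

    if (inBucketByRequest(0.1)) {
        // instrument code for measurement, e.g. emit the measurement snippet
    }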
  23. How big a sample is representative? Select n such that 1.96 · σ / √n ≤ 5% of µ
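
Rearranging the condition on slide 23 gives n ≥ (1.96σ / (0.05µ))². A small sketch, assuming you already have rough estimates of the mean and standard deviation from earlier data (the example numbers are made up):

    <?php
    // Minimum sample size so the 95% margin of error is at most 5% of the mean.
    // Solves 1.96 * sigma / sqrt(n) <= 0.05 * mu for n.
    function minSampleSize($mu, $sigma, $z = 1.96, $tolerance = 0.05) {
        return (int) ceil(pow(($z * $sigma) / ($tolerance * $mu), 2));
    }

    echo minSampleSize(4000, 3000);   // assumed mean 4000ms, std dev 3000ms => 865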
  24. 1-2 Margin of Error
  25. Standard Deviation • Standard deviation tells you the spread of the curve • The narrower the curve, the more confident you can be
  26. MoE at 95% confidence: ±1.96 · σ / √n
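
As a sketch of slide 26's formula, the margin of error can be computed directly from a sample; the load-time array below is illustrative, and the standard deviation uses the usual n-1 denominator:

    <?php
    // 95% margin of error: +/- 1.96 * sigma / sqrt(n)
    function marginOfError(array $values, $z = 1.96) {
        $n = count($values);
        $mean = array_sum($values) / $n;
        $sumSq = 0;
        foreach ($values as $v) {
            $sumSq += ($v - $mean) * ($v - $mean);
        }
        $sigma = sqrt($sumSq / ($n - 1));   // sample standard deviation
        return $z * $sigma / sqrt($n);
    }

    $loadTimes = array(3200, 2800, 4100, 3900, 3600, 12000, 3300, 2900);
    printf("%.1f +/- %.1f ms\n", array_sum($loadTimes) / count($loadTimes), marginOfError($loadTimes));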
  27. MoE & Sample size: the margin of error shrinks with the inverse square root of the sample size, so halving the MoE takes four times as many samples
  28. 1-3 Central Tendency
  29. One number • Mean (Arithmetic) • Good for symmetric curves • Affected by outliers. Mean(10, 11, 12, 11, 109) = 30.6
  30. One number • Median • Middle value measures central tendency well • Not trivial to pull out of a DB. Median(10, 11, 12, 11, 109) = 11
  31. One number • Mode • Not often used • Multi-modal distributions suggest problems. Mode(10, 11, 12, 11, 109) = 11
  32. Other numbers • A percentile point in the distribution: 95th, 98.5th or 99th • Used to find out the worst user experience • Makes more sense if you filter data first. P95(10, 11, 12, 11, 109) = 12
  33. Other means • Geometric mean • Good if your data is exponential in nature (with the tail on the right). GMean(10, 11, 12, 11, 109) ≈ 17.37
  34. Wait... how did I get that? Taking the Nth root of the product ∏ x_i (i = 1..N) directly could lead to overflow; computing e^((Σ ln x_i) / N) is computationally simpler and equivalent
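
Pulling slides 29 through 34 together, here is a sketch that computes each measure for the example data; the geometric mean uses the log-sum form from the previous slide to avoid overflowing the product:

    <?php
    $data = array(10, 11, 12, 11, 109);

    // Arithmetic mean: pulled up sharply by the 109 outlier.
    $mean = array_sum($data) / count($data);                 // 30.6

    // Median: sort and take the middle value.
    $sorted = $data;
    sort($sorted);
    $median = $sorted[(int) floor(count($sorted) / 2)];      // 11 (odd-length sample)

    // Mode: the most frequent value.
    $counts = array_count_values($data);
    arsort($counts);
    $mode = key($counts);                                    // 11

    // Geometric mean via logs: exp(mean(ln x)) instead of the Nth root of the product.
    $logSum = 0;
    foreach ($data as $x) {
        $logSum += log($x);
    }
    $gmean = exp($logSum / count($data));                    // ~17.4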
  35. Other means: And there is also the Harmonic mean, but forget about that
  36. ...though consequently we have other margins of error • Geometric margin of error: uses the geometric standard deviation • Median margin of error: uses ranges of actual values from the data set • Stick to the arithmetic MoE – simpler to calculate, simpler to read, and not incorrect
  37. 2 Statistics - 2
  38. 2-1 Distributions
  39. Let's look at some real charts
  40. Sparse Distribution (chart)
  41. Log-normal distribution (chart)
  42. Bimodal distribution (chart)
  43. What does all of this mean?
  44. Distributions • Sparse distribution suggests that you don't have enough data points • Log-normal distribution is typical • Bi-modal distribution suggests two (or more) distributions combined
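
To check which of these shapes your own data takes, it is usually enough to bucket the load times and eyeball the counts. A throwaway sketch; the 500ms bucket width and the sample data are arbitrary choices:

    <?php
    // Bucket page load times (ms) into fixed-width bins to see the distribution shape.
    function histogram(array $values, $bucketWidth = 500) {
        $bins = array();
        foreach ($values as $v) {
            $bin = (int) floor($v / $bucketWidth) * $bucketWidth;
            $bins[$bin] = isset($bins[$bin]) ? $bins[$bin] + 1 : 1;
        }
        ksort($bins);   // order bins by their lower bound
        return $bins;
    }

    $loadTimes = array(3200, 2800, 4100, 3900, 3600, 12000, 3300, 2900, 3700, 3500);
    foreach (histogram($loadTimes) as $bin => $count) {
        printf("%6d ms | %s\n", $bin, str_repeat('#', $count));
    }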
  45. In practice, a bi-modal distribution is not uncommon
  46. Hint: Does your site do a lot of back-end caching?
  47. 2-2 Filtering
  48. Outliers • Out of range data points • Nothing you can fix here • There's even a book about them
  49. DNS problems can cause outliers • 2 or 3 DNS servers for an ISP • 30 second timeout if the first fails • ...a 30 second increase in page load time • Maybe measure both and fix what you can • http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
  50. Band-pass filtering (chart)
  51. Band-pass filtering • Strip everything outside a reasonable range • Bandwidth range: 4kbps - 4Gbps • Page load time: 50ms - 120s • You may need to revisit these ranges regularly
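
A sketch of band-pass filtering with the ranges from slide 51 (50ms to 120s for page load time); the exact bounds are parameters you would revisit as your traffic changes:

    <?php
    // Keep only values inside a plausible range; everything outside is treated as noise
    // (broken beacons, clock skew, stalled tabs and the like).
    function bandPass(array $values, $min = 50, $max = 120000) {   // milliseconds
        return array_values(array_filter($values, function ($v) use ($min, $max) {
            return $v >= $min && $v <= $max;
        }));
    }

    $loadTimes = array(3200, 2800, 4100, 0, 3900, 3600, 1200000, 3300);
    $clean = bandPass($loadTimes);   // drops the 0ms and the 1,200,000ms entries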
  52. IQR filtering (chart)
  53. IQR filtering: Here, we derive the range from the data
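
A sketch of IQR filtering as described on slides 52 and 53, using the common 1.5 × IQR fences (the multiplier and the interpolation method are assumptions, since the slides don't specify them):

    <?php
    // Linear-interpolation percentile on a sorted copy of the data.
    function percentile(array $values, $p) {
        sort($values);
        $idx = $p * (count($values) - 1);
        $lo = (int) floor($idx);
        $hi = (int) ceil($idx);
        return $values[$lo] + ($idx - $lo) * ($values[$hi] - $values[$lo]);
    }

    // Drop points more than $k * IQR outside the first/third quartiles.
    function iqrFilter(array $values, $k = 1.5) {
        $q1 = percentile($values, 0.25);
        $q3 = percentile($values, 0.75);
        $iqr = $q3 - $q1;
        return array_values(array_filter($values, function ($v) use ($q1, $q3, $iqr, $k) {
            return $v >= $q1 - $k * $iqr && $v <= $q3 + $k * $iqr;
        }));
    }

    $loadTimes = array(3200, 2800, 4100, 3900, 3600, 45000, 3300, 2900);
    $clean = iqrFilter($loadTimes);   // the 45s point falls outside Q3 + 1.5 * IQR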
  54. Further Reading: lognormal.com/blog/2012/08/13/analysing-performance-data/
  55. Summary • Choose a reasonable sample size and sampling factor • Tune sample size for minimal margin of error • Decide based on your data whether to use mode, median or one of the means • Figure out whether your data is Normal, Log-Normal or something else • Filter out anomalous outliers
  56. • Philip Tellis • .com • philip@lognormal.com • @bluesmoon • geek paranoid speedfreak • http://bluesmoon.info/
  57. Thank you
  58. Photo credits • http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas • http://www.flickr.com/photos/cobalt/56500295/ by cobalt123 • http://www.flickr.com/photos/sophistechate/4264466015/ by Lisa Brewster
  59. List of figures • http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg • http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg • http://en.wikipedia.org/wiki/File:KilroySchematic.svg • http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png