The Statistics of Web Performance Analysis

If you're interested in measuring real user web performance, you'll find tools like boomerang or episodes quite handy. Some popular web frameworks even have modules that make it easy to add them to your site. However, what does one do once one has collected the data? How do you filter out the noise and get meaningful insights from the data?

In this talk, I'll go over the techniques we've picked up by analyzing millions of datapoints daily. I'll cover some simple rules to filter out invalid data, and the statistics to analyze and make sense of what's left. Do you use the mean, median or mode? What about the geometric mean and standard deviation? How confident are we in the results? And finally, why should we care?

This talk should help you gain useful insights from a histogram, or at the very least point you in the right direction for further analysis.

  1. • Philip Tellis • .com • philip@lognormal.com • @bluesmoon • geek paranoid speedfreak • http://bluesmoon.info/ (Boston #WebPerf Meetup / 2012-08-14, The Statistics of Web Performance Analysis)
  2. I’m a Web Speedfreak
  3. We measure real user website performance
  4. This talk is about the statistics we learned while building it
  5. The Statistics of Web Performance Analysis • Philip Tellis / philip@lognormal.com • Boston #WebPerf Meetup / 2012-08-14
  6. Section 0: Numbers
  7. Accurately measure page performance∗
  8. Be unintrusive • "If you try to measure something accurately, you will change something related" (the observer effect, often loosely equated with Heisenberg’s uncertainty principle)
  9. And one number to rule them all
  10. What do we measure? • Network throughput • Network latency • User-perceived page load time
  11. We measure real user data
  12. Which is noisy
  13. Section 1: Statistics - 1
  14. Disclaimer: I am not a statistician
  15. Section 1-1: Random Sampling
  16. Population: all possible users of your system
  17. Sample: a representative subset of the population
  18. Bad sample: sometimes it’s not representative
  19. How to randomize? http://xkcd.com/221/
  20. How to randomize? • Pick 10% of users at random and always test them, OR • For each user, decide at random if they should be tested (http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html)
  21. Select 10% of users - I: if($sessionid % 10 === 0) { // instrument code for measurement } • Once a user enters the measurement bucket, they stay there until they log out • Fixed set of users, so tests may be more consistent • An error in the sample persists across every visit, resulting in positive feedback
  22. Select 10% of users - II: if(rand() < 0.1 * getrandmax()) { // instrument code for measurement } • For every request, a user has a 10% chance of being tested • Gets rid of positive feedback errors, but the sample size is not exactly 10% of the population
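The two strategies above can be sketched in Python. This is a minimal illustration, not the talk's implementation: `in_session_bucket` and `in_request_sample` are hypothetical names, and hashing the session id (instead of `$sessionid % 10`) is an assumption made so that non-numeric session ids also spread evenly across buckets.

```python
import hashlib
import random

SAMPLE_RATE = 0.1  # 10%, as on the slides


def in_session_bucket(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Strategy I (deterministic): the same session always gets the same answer.

    Assumption: the session id is hashed so string ids also work.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1000  # 0..999
    return bucket < rate * 1000


def in_request_sample(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Strategy II (per request): every request is an independent coin flip."""
    return rng.random() < rate
```

Seeding `random.Random` keeps runs reproducible. Over many requests the sampled fraction lands near, but rarely exactly at, 10%, which is the caveat on the second strategy.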
  23. How big a sample is representative? Select $n$ such that $1.96\,\frac{\sigma}{\sqrt{n}} \le 0.05\,\mu$, where $\sigma$ is the standard deviation and $\mu$ the mean of the measurements
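Solving that inequality for n gives n ≥ (1.96 σ / (0.05 μ))². A minimal sketch, assuming σ and μ are estimated from an earlier pilot sample; `required_sample_size` is a hypothetical helper name.

```python
import math


def required_sample_size(sigma: float, mu: float,
                         rel_moe: float = 0.05, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= rel_moe * mu."""
    if sigma < 0 or mu <= 0 or rel_moe <= 0:
        raise ValueError("need sigma >= 0, mu > 0, rel_moe > 0")
    # Rearranged: sqrt(n) >= z * sigma / (rel_moe * mu), then square and round up.
    return max(1, math.ceil((z * sigma / (rel_moe * mu)) ** 2))
```

For example, a pilot with σ = 1000 ms and μ = 2000 ms calls for 385 samples to get the margin of error under 5% of the mean.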
  24. Section 1-2: Margin of Error
  25. Standard Deviation • Standard deviation tells you the spread of the curve • The narrower the curve, the more confident you can be
  26. MoE at 95% confidence: $\pm 1.96\,\frac{\sigma}{\sqrt{n}}$
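The 95% margin of error can be computed directly from a list of measurements. A minimal sketch; `margin_of_error` is a hypothetical name, and using the sample standard deviation as the estimate of σ is an assumption.

```python
import math
import statistics


def margin_of_error(samples, z: float = 1.96) -> float:
    """Half-width of the ~95% confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return z * sigma / math.sqrt(n)
```

Report the result alongside the mean, e.g. `statistics.mean(load_times)` ± `margin_of_error(load_times)`.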
  27. MoE & sample size: the margin of error is inversely proportional to the square root of the sample size
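A worked consequence of the MoE formula, assuming σ stays roughly the same as more data arrives: quadrupling the sample only halves the margin of error.

```latex
\text{MoE}(n) = 1.96\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\text{MoE}(4n) = 1.96\,\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\,\text{MoE}(n)
```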
  28. Section 1-3: Central Tendency
  29. [Figure]
  30. One number • Mean (arithmetic) • Good for symmetric curves • Affected by outliers • Mean(10, 11, 12, 11, 109) = 30.6
  31. One number • Median • The middle value; measures central tendency well • Not trivial to pull out of a DB • Median(10, 11, 12, 11, 109) = 11
  32. One number • Mode • Not often used • Multi-modal distributions suggest problems • Mode(10, 11, 12, 11, 109) = 11
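Python's standard `statistics` module reproduces these three numbers; a quick sketch using the same example data. The mean works out to 153 / 5 = 30.6.

```python
import statistics

# The five example load times used on the slides.
load_times = [10, 11, 12, 11, 109]

print(statistics.mean(load_times))    # 30.6, pulled up by the 109 outlier
print(statistics.median(load_times))  # 11
print(statistics.mode(load_times))    # 11
```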
  33. Other numbers • A percentile point in the distribution: 95th, 98.5th or 99th • Used to find out the worst user experience • Makes more sense if you filter data first • P95th(10, 11, 12, 11, 109) = 109 (nearest rank); with the outlier 109 filtered out, P95th(10, 11, 12, 11) = 12
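Percentile definitions vary between tools. A minimal sketch using the nearest-rank definition, which is an assumption here; many tools interpolate between neighbouring values instead and give different answers on tiny samples like this one. `percentile_nearest_rank` is a hypothetical name.

```python
import math


def percentile_nearest_rank(values, p: float):
    """Smallest value such that at least p% of the data is at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


print(percentile_nearest_rank([10, 11, 12, 11, 109], 95))  # 109
print(percentile_nearest_rank([10, 11, 12, 11], 95))       # 12, after dropping 109
```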
  34. Other means • Geometric mean • Good if your data is exponential in nature (with the tail on the right) • GMean(10, 11, 12, 11, 109) ≈ 17.37
  35. Wait... how did I get that? • $\sqrt[N]{\prod_{i=1}^{N} x_i}$ could lead to overflow • $e^{\frac{1}{N}\sum_{i=1}^{N} \ln x_i}$ is computationally simpler
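The log form of the geometric mean in code; a minimal sketch with a hypothetical function name. `math.fsum` is used for an accurate sum of the logs; Python 3.8+ also ships `statistics.geometric_mean`, which should agree.

```python
import math


def geometric_mean(values):
    """exp(mean(ln x)): same result as the N-th root of the product, without overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    log_sum = math.fsum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


print(round(geometric_mean([10, 11, 12, 11, 109]), 2))  # 17.37
```

Multiplying ten values of 1e300 directly overflows to infinity in floating point, while `geometric_mean([1e300] * 10)` returns 1e300.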
  36. Other means: and there is also the harmonic mean, but forget about that
  37. ...and consequently, we have other margins of error • Geometric margin of error: uses the geometric standard deviation • Median margin of error: uses ranges of actual values from the data set • Stick to the arithmetic MoE: simpler to calculate, simpler to read and not incorrect
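One common way to build a geometric margin of error, sketched here as an assumption rather than the talk's method: apply the arithmetic MoE formula to ln(x), then map the bounds back with exp(). The exponential of the standard deviation of ln(x) is the geometric standard deviation, and the resulting interval is multiplicative (gmean / factor to gmean × factor) rather than ± symmetric. `geometric_interval` is a hypothetical name.

```python
import math
import statistics


def geometric_interval(values, z: float = 1.96):
    """~95% interval around the geometric mean, built on the log scale.

    Assumption: ln(x) is roughly normal (i.e. the data is log-normal).
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive values")
    logs = [math.log(v) for v in values]
    log_mean = statistics.fmean(logs)
    log_sd = statistics.stdev(logs)  # ln of the geometric standard deviation
    factor = math.exp(z * log_sd / math.sqrt(len(values)))
    gmean = math.exp(log_mean)
    return gmean / factor, gmean, gmean * factor
```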
  38. Section 2: Statistics - 2
  39. Section 2-1: Distributions
  40. Let’s look at some real charts
  41. Sparse distribution
  42. Log-normal distribution
  43. Bimodal distribution
  44. What does all of this mean?
  45. Distributions • A sparse distribution suggests that you don’t have enough data points • A log-normal distribution is typical • A bimodal distribution suggests two (or more) distributions combined
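To tell these shapes apart, it can help to bucket load times on a logarithmic axis, where log-normal data shows up as a single roughly symmetric hump, two humps hint at a bimodal mix, and many empty buckets hint at sparse data. The sketch below is an illustration under assumptions: `log_histogram` is a hypothetical helper, and four buckets per factor of ten is an arbitrary choice.

```python
import math
from collections import Counter


def log_histogram(load_times_ms, buckets_per_decade: int = 4):
    """Return (bucket lower bound in ms, count) pairs on a log scale, empty buckets included."""
    counts = Counter()
    for t in load_times_ms:
        if t > 0:  # log is undefined for non-positive values
            counts[math.floor(math.log10(t) * buckets_per_decade)] += 1
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    return [(10 ** (b / buckets_per_decade), counts.get(b, 0))
            for b in range(lo, hi + 1)]
```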
  46. In practice, a bimodal distribution is not uncommon
  47. Hint: does your site do a lot of back-end caching?
  48. Section 2-2: Filtering
  49. Outliers • Out-of-range data points • Nothing you can fix here • There’s even a book about them
  50. DNS problems can cause outliers • 2 or 3 DNS servers for an ISP • 30-second timeout if the first fails • ... a 30-second increase in page load time • Maybe measure both and fix what you can • http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
  51. Band-pass filtering • Strip everything outside a reasonable range • Bandwidth range: 4 kbps to 4 Gbps • Page load time: 50 ms to 120 s • You may need to revisit these ranges regularly
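A band-pass filter is a simple range check. A minimal sketch using the ranges from the slide; the constant names and `band_pass` are hypothetical, and treating the bounds as inclusive (and 4 kbps as 4,000 bits per second) is an assumption.

```python
# Ranges from the slide; names are illustrative only.
PAGE_LOAD_MS = (50, 120_000)             # 50 ms .. 120 s
BANDWIDTH_BPS = (4_000, 4_000_000_000)   # 4 kbps .. 4 Gbps


def band_pass(values, low, high):
    """Keep only values inside [low, high]; everything else is treated as invalid."""
    return [v for v in values if low <= v <= high]
```

Usage: `band_pass(load_times_ms, *PAGE_LOAD_MS)` drops a 10 ms beacon and a 130 s one while keeping everything in between.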
  52. IQR (interquartile range) filtering: here, we derive the range from the data
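The slide does not say exactly how the range is derived. One conventional approach, assumed here, is Tukey's fences: keep values within 1.5 × IQR below the first quartile and above the third. `iqr_filter` is a hypothetical name, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is another assumption; other quartile definitions shift the fences slightly.

```python
import statistics


def iqr_filter(values, k: float = 1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 2:
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

On the example data (10, 11, 12, 11, 109), the quartiles are 11 and 12, so the fences are 9.5 and 13.5 and only 109 is dropped.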
  53. Further reading: lognormal.com/blog/2012/08/13/analysing-performance-data/
  54. Summary • Choose a reasonable sample size and sampling factor • Tune the sample size for minimal margin of error • Decide based on your data whether to use the mode, the median or one of the means • Figure out whether your data is normal, log-normal or something else • Filter out anomalous outliers
  55. • Philip Tellis • .com • philip@lognormal.com • @bluesmoon • geek paranoid speedfreak • http://bluesmoon.info/
  56. Thank you
  57. Photo credits • http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas • http://www.flickr.com/photos/cobalt/56500295/ by cobalt123 • http://www.flickr.com/photos/sophistechate/4264466015/ by Lisa Brewster
  58. List of figures • http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg • http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg • http://en.wikipedia.org/wiki/File:KilroySchematic.svg • http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png