If you're interested in measuring real user web performance, you'll find tools like boomerang or episodes quite handy. Some popular web frameworks even have modules that make it easy to add them to your site. However, what does one do once one has collected the data? How do you filter out the noise and get meaningful insights from the data?
In this talk, I'll go over the techniques we've picked up by analyzing millions of datapoints daily. I'll cover some simple rules to filter out invalid data, and the statistics to analyze and make sense of what's left. Do you use the mean, median or mode? What about the geometric mean and standard deviation? How confident are we in the results? And finally, why should we care?
This talk should help you gain useful insights from a histogram, or at the very least point you in the right direction for further analysis.
The Statistics of Web Performance Analysis
1. • Philip Tellis
• .com
• philip@lognormal.com
• @bluesmoon
• geek paranoid speedfreak
• http://bluesmoon.info/
Boston #WebPerf Meetup / 2012-08-14 The Statistics of Web Performance Analysis 1
2. I’m a Web Speedfreak
3. We measure real user website performance
4. This talk is about the Statistics we learned while building it
5. The Statistics of Web Performance Analysis
Philip Tellis / philip@lognormal.com
Boston #WebPerf Meetup / 2012-08-14
6. 0
Numbers
7. Accurately measure page performance∗
8. Be unintrusive
If you try to measure something accurately, you will change something related
– Heisenberg’s uncertainty principle
9. And one number to rule them all
10. What do we measure?
• Network Throughput
• Network Latency
• User perceived page load time
11. We measure real user data
12. Which is noisy
13. 1
Statistics - 1
14. Disclaimer
I am not a statistician
15. 1-1 Random Sampling
16. Population
All possible users of your system
17. Sample
Representative subset of the population
18. Bad sample
Sometimes it’s not
19. How to randomize?
http://xkcd.com/221/
20. How to randomize?
• Pick 10% of users at random and always test them
OR
• For each user, decide at random if they should be tested
http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
21. Select 10% of users - I
if($sessionid % 10 === 0) {
// instrument code for measurement
}
• Once a user enters the measurement bucket, they stay there until they log out
• Fixed set of users, so tests may be more consistent
• Any error in the sample is reinforced, because the same users are measured every time (positive feedback)
22. Select 10% of users - II
if(rand() < 0.1 * getrandmax()) {
// instrument code for measurement
}
• For every request, a user has a 10% chance of being tested
• Gets rid of positive feedback errors, but the sample size is no longer exactly 10% of the population
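The two selection strategies can be sketched side by side. This is an illustration in Python rather than the PHP used on the slides; the function names, the `rate` parameter and the choice to hash the session id (so that non-numeric ids also work) are assumptions, not part of the talk.

```python
import hashlib
import random
from typing import Optional


def in_fixed_bucket(session_id: str, rate: float = 0.1) -> bool:
    """Strategy I: the decision depends only on the session id,
    so a session that is selected once stays selected."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto the interval [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < rate


def in_random_sample(rate: float = 0.1,
                     rng: Optional[random.Random] = None) -> bool:
    """Strategy II: every request is selected independently
    with probability `rate`."""
    source = rng if rng is not None else random
    return source.random() < rate
```

Calling `in_fixed_bucket` twice with the same id always gives the same answer, while `in_random_sample` may differ from one request to the next for the same user.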
23. How big a sample is representative?
Select n such that
$$1.96 \, \frac{\sigma}{\sqrt{n}} \le 5\% \cdot \mu$$
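Solving the inequality for n gives n ≥ (1.96σ / (0.05µ))². A minimal sketch of that calculation follows; the function name and parameters are hypothetical, and in practice µ and σ would have to be estimated from data you have already collected.

```python
import math


def min_sample_size(mean: float, stddev: float,
                    z: float = 1.96, tolerance: float = 0.05) -> int:
    """Smallest n for which z * stddev / sqrt(n) <= tolerance * mean."""
    if mean <= 0 or stddev < 0:
        raise ValueError("mean must be positive and stddev non-negative")
    return max(1, math.ceil((z * stddev / (tolerance * mean)) ** 2))
```

For made-up values of a 2000 ms mean load time and a 1500 ms standard deviation, `min_sample_size(2000, 1500)` asks for 865 samples.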
24. 1-2 Margin of Error
25. Standard Deviation
• Standard deviation tells you the spread of the curve
• The narrower the curve, the more confident you can be
26. MoE at 95% confidence
$$\pm 1.96 \, \frac{\sigma}{\sqrt{n}}$$
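As a rough sketch, the margin of error can be computed from a list of measurements as below. It assumes the sample standard deviation is an acceptable stand-in for σ; the function name is hypothetical.

```python
import math
import statistics


def margin_of_error(samples, z=1.96):
    """z * s / sqrt(n), where s is the sample standard deviation."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return z * statistics.stdev(samples) / math.sqrt(n)
```

The result is in the same unit as the samples, so for load times in milliseconds it reads as "mean ± so many milliseconds at 95% confidence".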
27. MoE & Sample size
The margin of error is inversely proportional to the square root of the sample size: quadrupling the sample size halves the margin of error
28. 1-3 Central Tendency
30. One number
• Mean (Arithmetic)
• Good for symmetric curves
• Affected by outliers
Mean(10, 11, 12, 11, 109) = 30.6
31. One number
• Median
• Middle value measures central tendency well
• Not trivial to pull out of a DB
Median(10, 11, 12, 11, 109) = 11
32. One number
• Mode
• Not often used
• Multi-modal distributions suggest problems
Mode(10, 11, 12, 11, 109) = 11
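The three measures can be computed for the slides' sample data with Python's standard `statistics` module. This is only an illustration (the variable names are made up); it is not how the talk's own tooling computes them.

```python
import statistics

timings = [10, 11, 12, 11, 109]  # the sample used on the slides

mean = statistics.mean(timings)      # pulled upward by the outlier 109
median = statistics.median(timings)  # middle value of the sorted data
mode = statistics.mode(timings)      # most frequent value
```

The single outlier moves the mean far away from the bulk of the data, while the median and mode stay with the typical values.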
33. Other numbers
• A percentile point in the distribution: 95th, 98.5th or 99th
• Used to find out the worst user experience
• Makes more sense if you filter data first
P95th (10, 11, 12, 11, 109) = 12
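Percentile definitions differ between tools, and the slides do not say which one is used. The sketch below assumes the nearest-rank definition; the function name is hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]
```

Under this definition the raw five-point sample has a 95th percentile of 109, the outlier itself. Once 109 is filtered out, the remaining values (10, 11, 12, 11) give 12, which illustrates why a percentile makes more sense on filtered data.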
34. Other means
• Geometric mean
• Good if your data is exponential in nature (with the tail on the right)
GMean(10, 11, 12, 11, 109) = 17.37
35. Wait... how did I get that?
Direct definition (the product could lead to overflow):
$$\sqrt[N]{\prod_{i=1}^{N} x_i}$$
Equivalent log form (computationally simpler):
$$e^{\frac{1}{N}\sum_{i=1}^{N} \log_e(x_i)}$$
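The log form translates directly into code. A minimal sketch (the function name is hypothetical):

```python
import math


def geometric_mean(values):
    """exp of the mean of the natural logs; avoids the huge
    intermediate product of the direct definition."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Because only logarithms are summed, the intermediate values stay small even for millions of data points, whereas the direct product would quickly exceed the range of a floating-point number.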
39. Other means
And there is also the Harmonic mean, but forget about that
40. ...though consequently
We have other margins of error
• Geometric margin of error
• Uses geometric standard deviation
• Median margin of error
• Uses ranges of actual values from data set
• Stick to the arithmetic MoE
– simpler to calculate, simpler to read and not incorrect
42. 2
Statistics - 2
43. 2-1 Distributions
44. Let’s look at some real charts
45. Sparse Distribution
46. Log-normal distribution
47. Bimodal distribution
48. What does all of this mean?
49. Distributions
• Sparse distribution suggests that you don’t have enough data points
• Log-normal distribution is typical
• Bi-modal distribution suggests two (or more) distributions combined
50. In practice, a bi-modal distribution is not uncommon
51. Hint: Does your site do a lot of back-end caching?
52. 2-2 Filtering
53. Outliers
• Out of range data points
• Nothing you can fix here
• There’s even a book about them
57. DNS problems can cause outliers
• 2 or 3 DNS servers for an ISP
• 30 second timeout if first fails
• ... 30 second increase in page load time
• Maybe measure both and fix what you can
• http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
58. Band-pass filtering
59. Band-pass filtering
• Strip everything outside a reasonable range
• Bandwidth range: 4kbps - 4Gbps
• Page load time: 50ms - 120s
• You may need to revisit these ranges regularly
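A band-pass filter is little more than a range check. The sketch below uses the page load limits from the slide (50 ms to 120 s); the function name and the sample values are made up for illustration.

```python
def band_pass(values, low, high):
    """Keep only values inside the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [v for v in values if low <= v <= high]


# Hypothetical page load times in milliseconds.
load_times_ms = [12, 850, 1900, 2300, 45000, 600000]
filtered = band_pass(load_times_ms, low=50, high=120_000)
```

Here the 12 ms and 600 s readings are dropped and the four plausible values are kept.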
60. IQR filtering
61. IQR filtering
Here, we derive the range from the data
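One common way to derive the range from the data is Tukey's fences: keep everything between Q1 − 1.5·IQR and Q3 + 1.5·IQR. The slide does not state the multiplier or the quartile method, so the factor 1.5 and the "inclusive" quartile method below are assumptions, and the function name is hypothetical.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

Applied to the sample used earlier in the talk (10, 11, 12, 11, 109), this keeps the four small values and drops 109 without any hand-picked limits.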
62. Further Reading
lognormal.com/blog/2012/08/13/analysing-performance-data/
63. Summary
• Choose a reasonable sample size and sampling factor
• Tune sample size for minimal margin of error
• Decide based on your data whether to use mode, median or one of the means
• Figure out whether your data is Normal, Log-Normal or something else
• Filter out anomalous outliers
64. • Philip Tellis
• .com
• philip@lognormal.com
• @bluesmoon
• geek paranoid speedfreak
• http://bluesmoon.info/
66. Photo credits
• http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas
• http://www.flickr.com/photos/cobalt/56500295/ by cobalt123
• http://www.flickr.com/photos/sophistechate/4264466015/ by Lisa Brewster
67. List of figures
• http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
• http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
• http://en.wikipedia.org/wiki/File:KilroySchematic.svg
• http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png