The Statistics of Web Performance Analysis

If you're interested in measuring real user web performance, you'll find tools like boomerang or episodes quite handy. Some popular web frameworks even have modules that make it easy to add them to your site. However, what does one do once one has collected the data? How do you filter out the noise and get meaningful insights from the data?

In this talk, I'll go over the techniques we've picked up by analyzing millions of datapoints daily. I'll cover some simple rules to filter out invalid data, and the statistics to analyze and make sense of what's left. Do you use the mean, median or mode? What about the geometric mean and standard deviation? How confident are we in the results? And finally, why should we care?

This talk should help you gain useful insights from a histogram, or at the very least point you in the right direction for further analysis.

Usage Rights: CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • Philip Tellis
      • .com
      • philip@lognormal.com
      • @bluesmoon
      • geek paranoid speedfreak
      • http://bluesmoon.info/
    Boston #WebPerf Meetup / 2012-08-14 The Statistics of Web Performance Analysis
  • I’m a Web Speedfreak
  • We measure real user website performance
  • This talk is about the Statistics we learned while building it
  • The Statistics of Web Performance Analysis
    Philip Tellis / philip@lognormal.com
    Boston #WebPerf Meetup / 2012-08-14
  • 0 Numbers
  • Accurately measure page performance*
  • Be unintrusive: "If you try to measure something accurately, you will change something related" – Heisenberg’s uncertainty principle
  • And one number to rule them all
  • What do we measure?
      • Network Throughput
      • Network Latency
      • User perceived page load time
  • We measure real user data
  • Which is noisy
  • 1 Statistics - 1
  • Disclaimer: I am not a statistician
  • 1-1 Random Sampling
  • Population: All possible users of your system
  • Sample: Representative subset of the population
  • Bad sample: Sometimes it’s not
  • How to randomize? http://xkcd.com/221/
  • How to randomize?
      • Pick 10% of users at random and always test them, OR
      • For each user, decide at random if they should be tested
    http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
  • Select 10% of users - I
        if ($sessionid % 10 === 0) {
            // instrument code for measurement
        }
      • Once a user enters the measurement bucket, they stay there until they log out
      • Fixed set of users, so tests may be more consistent
      • Error in the sample results in positive feedback
  • Select 10% of users - II
        if (rand() < 0.1 * getrandmax()) {
            // instrument code for measurement
        }
      • For every request, a user has a 10% chance of being tested
      • Gets rid of positive feedback errors, but sample size != 10% of population
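
The two selection strategies above can be compared side by side. The following is a minimal Python sketch (the slides themselves use PHP); the function names `in_sticky_bucket` and `sample_this_request` are made up for illustration, and integer session ids are an assumption.

```python
import random

SAMPLE_RATE = 0.1  # 10%, as in the slides


def in_sticky_bucket(session_id: int, buckets: int = 10) -> bool:
    """Strategy I: a session is measured if and only if its id lands in bucket 0.

    The decision depends only on the id, so the same sessions are always measured.
    """
    return session_id % buckets == 0


def sample_this_request(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Strategy II: every request is measured independently with probability `rate`."""
    return rng.random() < rate


# Strategy I picks exactly every tenth id; Strategy II only averages 10%.
sticky = [s for s in range(1, 1001) if in_sticky_bucket(s)]
rng = random.Random(42)  # seeded only to make the sketch repeatable
per_request = [s for s in range(1, 1001) if sample_this_request(rng)]
print(len(sticky), len(per_request))
```

With the first strategy the sampled set is fixed, which is what makes its results consistent but also lets a skewed sample stay skewed. With the second, the number of measured requests fluctuates around 10% rather than hitting it exactly.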
  • How big a sample is representative? Select $n$ such that $1.96 \frac{\sigma}{\sqrt{n}} \le 5\% \, \mu$
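
Solving the inequality above for n gives n ≥ (1.96 σ / (0.05 μ))². The sketch below turns that into a small helper. The function name and the example figures (a 2000 ms mean with a 1500 ms standard deviation) are assumptions for illustration, not numbers from the talk.

```python
import math


def required_sample_size(mean: float, stdev: float,
                         rel_error: float = 0.05, z: float = 1.96) -> int:
    """Smallest n for which z * stdev / sqrt(n) <= rel_error * mean."""
    if mean <= 0 or stdev < 0:
        raise ValueError("mean must be positive and stdev non-negative")
    return max(1, math.ceil((z * stdev / (rel_error * mean)) ** 2))


# Hypothetical page: mean load time 2000 ms, standard deviation 1500 ms.
print(required_sample_size(2000, 1500))  # 865
```

In practice μ and σ are not known up front, so they would have to be estimated from an initial batch of measurements and the sample size revisited as more data arrives.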
  • 1-2 Margin of Error
  • Standard Deviation
      • Standard deviation tells you the spread of the curve
      • The narrower the curve, the more confident you can be
  • MoE at 95% confidence: $\pm 1.96 \frac{\sigma}{\sqrt{n}}$
  • MoE & Sample size: The margin of error is inversely proportional to the square root of the sample size
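
The 95% margin of error and its dependence on sample size can be checked numerically. This is a minimal sketch; the helper names are illustrative, and using the sample standard deviation (`statistics.stdev`, which divides by n - 1) as the estimate of σ is an assumption.

```python
import math
import statistics


def margin_of_error(stdev: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval: z * stdev / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * stdev / math.sqrt(n)


def margin_of_error_of(samples, z: float = 1.96) -> float:
    """Same formula, with the standard deviation estimated from the samples."""
    samples = list(samples)
    return margin_of_error(statistics.stdev(samples), len(samples), z)


# Quadrupling the sample size halves the margin of error.
print(margin_of_error(10, 100), margin_of_error(10, 400))
```

Because of the square root, each further halving of the margin of error costs four times as many samples.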
  • 1-3 Central Tendency
  • One number
      • Mean (Arithmetic)
          • Good for symmetric curves
          • Affected by outliers
    Mean(10, 11, 12, 11, 109) = 30.6
  • One number
      • Median
          • Middle value measures central tendency well
          • Not trivial to pull out of a DB
    Median(10, 11, 12, 11, 109) = 11
  • One number
      • Mode
          • Not often used
          • Multi-modal distributions suggest problems
    Mode(10, 11, 12, 11, 109) = 11
  • Other numbers
      • A percentile point in the distribution: 95th, 98.5th or 99th
      • Used to find out the worst user experience
      • Makes more sense if you filter data first
    P95th(10, 11, 12, 11, 109) = 12 (with the outlier 109 filtered out first; unfiltered, the nearest-rank P95 is 109)
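
The example figures on these slides can be reproduced with Python's standard `statistics` module. The nearest-rank percentile helper below is one of several common percentile definitions and is an assumption, since the slides do not say which one was used. On all five values nearest-rank gives 109; 12 is what it yields once the outlier 109 has been filtered out.

```python
import math
import statistics

load_times = [10, 11, 12, 11, 109]


def percentile_nearest_rank(values, p: float):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


print(statistics.mean(load_times))    # 30.6
print(statistics.median(load_times))  # 11
print(statistics.mode(load_times))    # 11

# The single outlier dominates the unfiltered 95th percentile.
print(percentile_nearest_rank(load_times, 95))  # 109
filtered = [v for v in load_times if v < 100]
print(percentile_nearest_rank(filtered, 95))    # 12
```

The single value 109 pulls the mean to nearly three times the median, which is why the choice of summary number matters for skewed data.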
  • Other means
      • Geometric mean
          • Good if your data is exponential in nature (with the tail on the right)
    GMean(10, 11, 12, 11, 109) = 17.37
  • Wait... how did I get that?
      • $\sqrt[N]{\prod_{i=1}^{N} x_i}$: could lead to overflow
      • $e^{\frac{1}{N} \sum_{i=1}^{N} \log_e(x_i)}$: computationally simpler
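
The log-sum form above translates directly into code. This is a minimal sketch with an illustrative function name. For the example data the product is 1,582,680, whose fifth root is about 17.37.

```python
import math


def geometric_mean(values) -> float:
    """exp of the mean of the natural logs; never forms the raw product."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([10, 11, 12, 11, 109]), 2))  # 17.37
```

Python 3.8 and later also ship `statistics.geometric_mean`, which could be used instead of a hand-written helper.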
  • Other means: And there is also the Harmonic mean, but forget about that
  • ...though consequently, we have other margins of error
      • Geometric margin of error
          • Uses geometric standard deviation
      • Median margin of error
          • Uses ranges of actual values from data set
      • Stick to the arithmetic MoE – simpler to calculate, simpler to read and not incorrect
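
For readers curious what a geometric margin of error looks like, one common construction applies the arithmetic formula to the logarithms of the data and exponentiates the result. The slides do not spell out the formula, so this sketch is an assumption about what is meant, and the function name is illustrative.

```python
import math
import statistics


def geometric_interval(values, z: float = 1.96):
    """Return (low, gmean, high): the arithmetic MoE on log(values), exponentiated."""
    logs = [math.log(v) for v in values]  # requires strictly positive values
    n = len(logs)
    center = statistics.fmean(logs)
    half_width = z * statistics.stdev(logs) / math.sqrt(n)
    return (math.exp(center - half_width),
            math.exp(center),
            math.exp(center + half_width))


low, gmean, high = geometric_interval([10, 11, 12, 11, 109])
print(low, gmean, high)
```

The resulting interval is multiplicative (gmean divided by a factor, gmean times the same factor) rather than a symmetric plus-or-minus, which is part of why the arithmetic MoE is simpler to read.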
  • 2 Statistics - 2
  • 2-1 Distributions
  • Let’s look at some real charts
  • Sparse Distribution (chart)
  • Log-normal distribution (chart)
  • Bimodal distribution (chart)
  • What does all of this mean?
  • Distributions
      • Sparse distribution suggests that you don’t have enough data points
      • Log-normal distribution is typical
      • Bi-modal distribution suggests two (or more) distributions combined
  • In practice, a bi-modal distribution is not uncommon
  • Hint: Does your site do a lot of back-end caching?
  • 2-2 Filtering
  • Outliers
      • Out of range data points
      • Nothing you can fix here
      • There’s even a book about them
  • DNS problems can cause outliers
      • 2 or 3 DNS servers for an ISP
      • 30 second timeout if first fails
      • ... 30 second increase in page load time
      • Maybe measure both and fix what you can
      • http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
  • Band-pass filtering
      • Strip everything outside a reasonable range
      • Bandwidth range: 4kbps - 4Gbps
      • Page load time: 50ms - 120s
      • You may need to revisit the ranges regularly
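
Band-pass filtering amounts to a fixed range check. The sketch below uses the page load limits from the slide (50 ms to 120 s); the constant and function names are illustrative.

```python
MIN_LOAD_MS = 50        # 50 ms lower bound from the slide
MAX_LOAD_MS = 120_000   # 120 s upper bound from the slide, in milliseconds


def band_pass(values, low=MIN_LOAD_MS, high=MAX_LOAD_MS):
    """Keep only values inside [low, high]; everything else is dropped."""
    return [v for v in values if low <= v <= high]


print(band_pass([3, 50, 800, 120_000, 120_001, -5]))  # [50, 800, 120000]
```

Keeping the bounds as parameters makes it easy to adjust them when the ranges are revisited, or to reuse the same helper for bandwidth with its own limits.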
  • IQR filtering: Here, we derive the range from the data
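
One way to derive the range from the data is with Tukey fences: keep everything within 1.5 times the interquartile range of the first and third quartiles. The 1.5 multiplier and the quartile method (the default of `statistics.quantiles`) are assumptions; the slides do not specify either.

```python
import statistics


def iqr_filter(values, k: float = 1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR], preserving input order."""
    values = list(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


samples = [10, 11, 12, 11, 10, 12, 11, 13, 109]
print(iqr_filter(samples))  # [10, 11, 12, 11, 10, 12, 11, 13]
```

Unlike the band-pass filter, the limits here move with the data, so the same code works for metrics on very different scales. With only a handful of points, though, the quartiles are unstable and a single outlier can stretch the fences enough to survive.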
  • Further Reading: lognormal.com/blog/2012/08/13/analysing-performance-data/
  • Summary
      • Choose a reasonable sample size and sampling factor
      • Tune sample size for minimal margin of error
      • Decide based on your data whether to use mode, median or one of the means
      • Figure out whether your data is Normal, Log-Normal or something else
      • Filter out anomalous outliers
  • Thank you
  • Photo credits
      • http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas
      • http://www.flickr.com/photos/cobalt/56500295/ by cobalt123
      • http://www.flickr.com/photos/sophistechate/4264466015/ by Lisa Brewster
  • List of figures
      • http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
      • http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
      • http://en.wikipedia.org/wiki/File:KilroySchematic.svg
      • http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png