There are a few contenders for what the magic number should be. Do you use the mean, median, mode, or something else? How do you determine the correctness of this number or whether your sample size is large enough? Is one number sufficient?

This talk covers some of the statistics behind figuring out which numbers one should be looking at and how to go about extracting them from the sample.

Published in:
Technology

- 1. The Statistics of Web Performance. Philip Tellis / philip@bluesmoon.info. ConFoo / 2010-03-12. Sections: Introduction, Statistics - I, Statistics - II
- 2. $ finger philip
  - Philip Tellis
  - philip@bluesmoon.info
  - @bluesmoon
  - yahoo
  - geek
- 3. Introduction: The goal, Performance Measurement
- 4. Accurately measure page performance
  - At least, as accurately as possible
- 6. Be unintrusive
  - If you try to measure something accurately, you will change something related – Heisenberg’s uncertainty principle
- 7. And one number to rule them all
- 8. Bandwidth
  - Real bandwidth v/s advertised bandwidth
  - Bandwidth to your server, not to the ISP
  - Bandwidth during normal internet usage
  - If the user’s always watching movies, you’re not winning
- 10. Latency
  - How long does it take a byte to get to the user?
  - Wired, wireless, mobile, satellite?
  - How many hops in between?
  - Speed of light is constant
  - This is not a battle we will soon win.
  - When was the last time you heard latency mentioned in a TV ad?
  - http://www.stuartcheshire.org/rants/Latency.html
- 13. User perceived page load time
  - Time from “click on a link” to “spinner stops spinning”
  - This is what users notice
  - Depends on how long your page takes to build
  - Depends on what’s in your page
  - Depends on how long components take to load
  - Depends on how long the browser takes to execute and render
- 14. We need to measure real user data
- 15. The statistics apply to any kind of performance data though
- 16. Statistics - I: Random Sampling, Margin of Error, Central Tendency
- 17. Disclaimer: I am not a statistician
- 18. Population: All possible users of your system
- 19. Sample: Representative subset of the population
- 20. Bad sample: Sometimes it’s not representative
- 21. How to randomize?
  - Pick 10% of users at random and always test them
  - OR
  - For each user, decide at random if they should be tested
  - http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
- 22. Select 10% of users - I
  - if ($sessionid % 10 === 0) { /* instrument code for measurement */ }
  - Once a user enters the measurement bucket, they stay there until they log out
  - Fixed set of users, so tests may be more consistent
  - Error in the sample results in positive feedback
- 23. Select 10% of users - II
  - if (rand() < 0.1 * getrandmax()) { /* instrument code for measurement */ }
  - For every request, a user has a 10% chance of being tested
  - Gets rid of positive feedback errors, but sample size != 10% of population
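
The two selection strategies on slides 22 and 23 can be compared side by side. What follows is a minimal Python sketch rather than the talk's PHP; the function names, the integer session id and the seeded generator are assumptions made for illustration only.

```python
import random

SAMPLE_RATE = 0.10  # 10% sampling rate, as on the slides


def in_sticky_bucket(session_id: int, buckets: int = 10) -> bool:
    """Strategy I: a session is measured for its whole lifetime
    if its id falls into bucket 0 (hypothetical integer session id)."""
    return session_id % buckets == 0


def in_random_sample(rng: random.Random, rate: float = SAMPLE_RATE) -> bool:
    """Strategy II: every request independently has a `rate` chance
    of being measured, so no user is permanently in or out."""
    return rng.random() < rate


# Strategy I selects a fixed set of sessions; strategy II selects
# roughly, but not exactly, 10% of requests.
rng = random.Random(42)  # seeded only to make the demo repeatable
sticky_hits = sum(in_sticky_bucket(sid) for sid in range(1, 10_001))
random_hits = sum(in_random_sample(rng) for _ in range(10_000))
print(sticky_hits, random_hits)
```

With strategy I the count is exact but the same sessions are measured every time; with strategy II the count fluctuates around 10% of requests, which is the trade-off the slides describe.
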
- 24. How big a sample is representative? Select n such that $1.96 \frac{\sigma}{\sqrt{n}} \le 5\%\,\mu$
- 25. Standard Deviation
  - Standard deviation tells you the spread of the curve
  - The narrower the curve, the more confident you can be
- 26. MoE at 95% confidence: $\pm 1.96 \frac{\sigma}{\sqrt{n}}$
- 27. MoE & Sample size: The margin of error is inversely proportional to the square root of the sample size
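
Rearranging the sample-size condition from slide 24 gives $n \ge \left(\frac{1.96\,\sigma}{0.05\,\mu}\right)^2$. The sketch below evaluates both that bound and the margin of error from slide 26. The function names are hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the slides do not say which estimator is meant.

```python
import math
import statistics

Z_95 = 1.96  # z-score for 95% confidence, as on the slides


def margin_of_error(samples, z=Z_95):
    """Half-width of the confidence interval: z * sigma / sqrt(n)."""
    sigma = statistics.stdev(samples)  # sample standard deviation (assumed)
    return z * sigma / math.sqrt(len(samples))


def required_sample_size(sigma, mu, rel_error=0.05, z=Z_95):
    """Smallest n for which z * sigma / sqrt(n) <= rel_error * mu."""
    return math.ceil((z * sigma / (rel_error * mu)) ** 2)


# Halving the tolerated relative error roughly quadruples the sample size.
print(required_sample_size(sigma=1.0, mu=1.0, rel_error=0.05))
print(required_sample_size(sigma=1.0, mu=1.0, rel_error=0.025))
```
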
- 28. But wait... it’s not complicated enough. We have different types of margins of error ...more about that later
- 31. Ding dong
- 32. One number: Mean (Arithmetic)
  - Good for symmetric curves
  - Affected by outliers
  - Mean(10, 11, 12, 11, 109) = 30.6
- 33. One number: Median
  - Middle value measures central tendency well
  - Not trivial to pull out of a DB
  - Median(10, 11, 12, 11, 109) = 11
- 34. One number: Mode
  - Not often used
  - Multi-modal distributions suggest problems
  - Mode(10, 11, 12, 11, 109) = 11
- 35. Other numbers
  - A percentile point in the distribution: 95th, 98.5th or 99th
  - Used to find out the worst user experience
  - Makes more sense if you filter data first
  - P95th(10, 11, 12, 11, 109) = 12
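
The candidate numbers from slides 32 to 35 can be checked on the same five example values. This is a sketch using Python's standard library; the nearest-rank percentile definition and the cut-off of 100 used for filtering are assumptions, as the slides do not name a percentile method or a filter threshold.

```python
import math
import statistics

data = [10, 11, 12, 11, 109]  # the example values from the slides


def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


mean = statistics.mean(data)      # pulled upwards by the outlier 109
median = statistics.median(data)  # middle value, ignores the outlier
mode = statistics.mode(data)      # most frequent value

# On the raw data the 95th percentile is the outlier itself.
p95_raw = percentile_nearest_rank(data, 95)

# Hypothetical filter: drop anything >= 100 before taking the percentile.
filtered = [v for v in data if v < 100]
p95_filtered = percentile_nearest_rank(filtered, 95)

print(mean, median, mode, p95_raw, p95_filtered)
```

Under these assumptions the raw 95th percentile is 109, and the value 12 only appears once the outlier has been filtered out, which seems to be the case the percentile slide has in mind when it says filtering first makes more sense.
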
- 36. Other means: Geometric mean
  - Good if your data is exponential in nature (with the tail on the right)
  - GMean(10, 11, 12, 11, 109) = 17.37
- 37. Wait... how did I get that?
  - $\sqrt[N]{\prod_{i=1}^{N} x_i}$ (could lead to overflow)
  - $e^{\frac{1}{N}\sum_{i=1}^{N} \log_e(x_i)}$ (computationally simpler)
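
Both forms of the geometric mean can be written in a few lines. This is a minimal sketch with hypothetical function names; it assumes strictly positive inputs, since the logarithm is undefined otherwise.

```python
import math


def gmean_product(values):
    """N-th root of the product; the product can overflow for floats."""
    return math.prod(values) ** (1 / len(values))


def gmean_logs(values):
    """exp of the mean of natural logs; avoids huge intermediate products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


data = [10, 11, 12, 11, 109]
print(round(gmean_product(data), 2), round(gmean_logs(data), 2))

# With large float inputs the product form overflows to infinity,
# while the log form still returns a usable value.
big = [1e200, 1e200, 1e200]
print(gmean_product(big), gmean_logs(big))
```

On the five example values both forms give approximately 17.37. Python 3.8 and later also ship `statistics.geometric_mean`, which could be used instead of either helper.
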
- 41. Other means: And there is also the Harmonic mean, but forget about that
- 42. ...though consequently, we have other margins of error
  - Geometric margin of error: uses geometric standard deviation
  - Median margin of error: uses ranges of actual values from data set
  - Stick to the arithmetic MoE: simpler to calculate, simpler to read and not incorrect
- 44. Statistics - II: Filtering, The Log-Normal distribution
- 45. Outliers
  - Out of range data points
  - Nothing you can fix here
  - There’s even a book about them
- 49. DNS problems can cause outliers
  - 2 or 3 DNS servers for an ISP
  - 30 second timeout if first fails
  - ... 30 second increase in page load time
  - Maybe measure both and fix what you can
  - http://nms.lcs.mit.edu/papers/dns-ton2002.pdf
- 51. Band-pass filtering
  - Strip everything outside a reasonable range
  - Bandwidth range: 4kbps - 4Gbps
  - Page load time: 50ms - 120s
  - You may need to relook at the ranges all the time
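
Band-pass filtering as described on slide 51 amounts to a range check. A minimal sketch follows, using the ranges from the slide; the constant names, the units (milliseconds and kbps) and the sample values are assumptions for illustration.

```python
# Ranges taken from the slide; revisit them as your data changes.
MIN_LOAD_MS, MAX_LOAD_MS = 50, 120_000     # 50ms - 120s
MIN_BW_KBPS, MAX_BW_KBPS = 4, 4_000_000    # 4kbps - 4Gbps


def band_pass(values, low, high):
    """Keep only values inside the closed range [low, high]."""
    return [v for v in values if low <= v <= high]


# Hypothetical page load times in milliseconds.
load_times_ms = [10, 50, 800, 2_300, 120_000, 300_000]
print(band_pass(load_times_ms, MIN_LOAD_MS, MAX_LOAD_MS))
```
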
- 53. IQR filtering: Here, we derive the range from the data
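
One common way to derive the range from the data is Tukey's fences: keep values within 1.5 times the interquartile range of the quartiles. The slides do not state the multiplier or the quartile method, so the factor of 1.5 and the "inclusive" quantile method below are assumptions, and `iqr_filter` is a hypothetical name.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


data = [10, 11, 12, 11, 109]
print(iqr_filter(data))
```

Unlike band-pass filtering, nothing here is hard-coded: the bounds move with the data, so they do not need to be revisited by hand. `statistics.quantiles` requires Python 3.8 or later.
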
- 54. Let’s look at some real charts
- 55. Bandwidth distribution for web devs (x-axis is linear)
- 56. Now let’s use log(kbps) instead of kbps (x-axis is exponential)
- 57. Exponential == Geometric
  - Categories/Buckets grow exponentially
  - Data is related geometrically
  - Use the geometric mean and geometric margin of error
  - $\text{Error range} = \left(\frac{gmean}{gmoe},\ gmean \times gmoe\right)$
  - Non-linear ranges are hard for humans to grok
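
The error range on slide 57 divides and multiplies by the geometric margin of error instead of subtracting and adding. The slides do not define gmoe, so the sketch below assumes one plausible definition: compute the ordinary 95% margin of error on the natural logs of the data, then exponentiate it. The function name is hypothetical.

```python
import math
import statistics


def geometric_error_range(values, z=1.96):
    """Return (gmean / gmoe, gmean * gmoe) for strictly positive values."""
    logs = [math.log(v) for v in values]
    gmean = math.exp(statistics.mean(logs))
    # Assumed definition: MoE computed in log space, then exponentiated.
    gmoe = math.exp(z * statistics.stdev(logs) / math.sqrt(len(logs)))
    return gmean / gmoe, gmean * gmoe


low, high = geometric_error_range([10, 11, 12, 11, 109])
print(low, high)
```

Because the bounds are a ratio away from the geometric mean, the interval is wider above the mean than below it, which is the non-linearity the slide warns is hard to read.
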
- 60. So...
- 61. Further reading
  - Web Performance - Not a Simple Number: http://www.netforecast.com/Articles/BCR+C25+Web+Performance+-+Not+A+Simple+Number.pdf
  - Revisiting statistics for web performance (introduction to Log-Normal): http://home.pacbell.net/ciemo/statistics/WhatDoYouMean.pdf
  - Random Sampling: http://tech.bluesmoon.info/2010/01/statistics-of-performance-measurement.html
  - Khan Academy’s tutorials on statistics: http://khanacademy.com/
  - Learning about Statistical Learning: http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html
  - Wikipedia articles on Random Sampling, Central Tendency, Standard Error, Confounding, Means and IQR
- 62. Summary
  - Choose a reasonable sample size and sampling factor
  - Tune sample size for minimal margin of error
  - Decide based on your data whether to use mode, median or one of the means
  - Figure out whether your data is Normal, Log-Normal or something else
  - Filter out anomalous outliers
- 63. contact me
  - Philip Tellis
  - philip@bluesmoon.info
  - bluesmoon.info
  - @bluesmoon
- 64. Photo credits
  - http://www.flickr.com/photos/leoffreitas/332360959/ by leoffreitas
  - http://www.flickr.com/photos/cobalt/56500295/ by cobalt123
  - http://www.flickr.com/photos/sophistechate/4264466015/ by Lisa Brewster
  - http://www.flickr.com/photos/nchoz/243216008/ by nchoz
- 65. List of figures
  - http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
  - http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
  - http://en.wikipedia.org/wiki/File:KilroySchematic.svg
  - http://en.wikipedia.org/wiki/File:Boxplot_vs_PDF.png
