Statistics for Dummies
Fred Moyer
@phredmoyer
@circonus
Data
“Without data, you’re just another person
with an opinion”
W. Edwards Deming
Average
“Arithmetic Mean”
Avg = sum / # samples
avg(1,2,3) = 6/3 = 2
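A minimal sketch of the formula above in Python. The function name `average` is only illustrative and is not from the original slides.

```python
def average(samples):
    """Arithmetic mean: sum of the samples divided by the number of samples."""
    if not samples:
        raise ValueError("average of an empty sample set is undefined")
    return sum(samples) / len(samples)


print(average([1, 2, 3]))  # 2.0
```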
Median
Midpoint of a data set
50th percentile: q(0.5) = 33
Value   11  22  33  44  55
Sample   1   2   3   4   5
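One way to find the midpoint, sketched in Python. The function name `median` is illustrative. Averaging the two middle values for an even-sized set is a common convention that the slide does not cover, so treat that branch as an assumption.

```python
def median(samples):
    """Middle value of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle sample.
        return ordered[mid]
    # Even count (assumed convention): average the two middle samples.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([11, 22, 33, 44, 55]))  # 33
```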
90th Percentile
90% of the values are at or below it
q(0.9) = 100
Value   11  22  33  44  55  66  77  88  99  100  111
Sample   1   2   3   4   5   6   7   8   9   10   11
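A sketch of how the value on this slide can be reproduced, assuming the "nearest rank" definition of a percentile. Other definitions interpolate between samples and can give slightly different answers. The function name `percentile_nearest_rank` is hypothetical, and it takes the percentile as a whole number (90 rather than 0.9) only to avoid floating-point rounding in the rank calculation.

```python
def percentile_nearest_rank(samples, percent):
    """Smallest sample such that at least `percent`% of samples are <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, giving a 1-based rank.
    rank = -(-percent * len(ordered) // 100)
    return ordered[rank - 1]


values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 100, 111]
print(percentile_nearest_rank(values, 90))  # 100
```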
Histogram
[Figure: histogram. x-axis: sample value; y-axis: number of samples]
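A minimal sketch of how samples end up in histogram bins, assuming fixed-width bins that start at zero. The function name `histogram`, the bin width, and the sample data are all made up for illustration.

```python
from collections import Counter


def histogram(samples, bin_width):
    """Count samples per bin; each key is the lower edge of a bin."""
    counts = Counter((s // bin_width) * bin_width for s in samples)
    return dict(sorted(counts.items()))


samples = [1, 2, 2, 3, 11, 12, 25]  # made-up data
bins = histogram(samples, 10)
for lower_edge, count in bins.items():
    # One '#' per sample: bin value on the left, number of samples as bar length.
    print(lower_edge, "#" * count)
```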
Normal Distribution
[Figure: bell curve. x-axis: sample value; y-axis: number of samples]
Median = Mode = Average
Standard Deviation
[Figure: bell curve with one standard deviation marked. x-axis: sample value; y-axis: number of samples]
About 68.3% of values fall within one
sigma (standard deviation) of the mean
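The one-sigma figure can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). This is only a sanity check for a true normal distribution; the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k_sigma):
    """Fraction of a normal distribution within k_sigma of the mean."""
    return standard_normal.cdf(k_sigma) - standard_normal.cdf(-k_sigma)


within_one_sigma = fraction_within(1)
print(f"{within_one_sigma:.1%}")  # 68.3%
```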
Standard Deviation (σ)
Subtract the mean μ from each sample, square the
differences, sum them, divide by the number of samples,
then take the square root
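The recipe above, coded up as a sketch. It computes the population standard deviation (dividing by the number of samples, as the slide says) and includes the final square root; without the square root the result is the variance. The function name and the sample data are made up for illustration.

```python
import math


def population_stddev(samples):
    """Square root of the mean squared difference from the mean."""
    n = len(samples)
    if n == 0:
        raise ValueError("standard deviation of an empty sample set is undefined")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(variance)


print(population_stddev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```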
Non Normal Distribution
[Figure: bimodal distribution. x-axis: sample value; y-axis: number of samples]
Non Normal Distribution
[Figure: skewed distribution with the median and mode marked. x-axis: sample value; y-axis: number of samples]
What’s the standard deviation?
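A small sketch of why the question is awkward for a skewed data set. The sample values are made up. A standard deviation can still be computed, but the mean, median, and mode no longer coincide, and the share of samples within one sigma is not the roughly 68% expected from a normal distribution.

```python
import statistics

skewed = [1, 1, 1, 2, 2, 3, 4, 10]  # made-up, right-skewed data

mean = statistics.mean(skewed)
median = statistics.median(skewed)
mode = statistics.mode(skewed)
sigma = statistics.pstdev(skewed)  # population standard deviation

# Fraction of samples within one standard deviation of the mean.
within_one_sigma = sum(1 for x in skewed if abs(x - mean) <= sigma) / len(skewed)

print(mean, median, mode)
print(within_one_sigma)
```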
Percentiles
Sample set A q(0.95) = 10
Sample set B q(0.95) = 20
What is q(0.95) for A U B?
Percentiles
q(0.95){A U B} !=
avg( q(0.95){A}, q(0.95){B} )
q(0.95){A U B } needs raw data
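A sketch that demonstrates the point with made-up data chosen so that q(0.95) is 10 for A and 20 for B, as on the earlier slide. It assumes the "nearest rank" percentile definition; the function name `percentile_nearest_rank` is hypothetical and takes the percentile as a whole number to avoid floating-point rounding.

```python
def percentile_nearest_rank(samples, percent):
    """Smallest sample such that at least `percent`% of samples are <= it."""
    ordered = sorted(samples)
    rank = -(-percent * len(ordered) // 100)  # integer ceiling, 1-based
    return ordered[rank - 1]


set_a = list(range(1, 11))   # 1..10, made-up data
set_b = list(range(11, 21))  # 11..20, made-up data

q95_a = percentile_nearest_rank(set_a, 95)
q95_b = percentile_nearest_rank(set_b, 95)

averaged = (q95_a + q95_b) / 2                         # the wrong shortcut
merged = percentile_nearest_rank(set_a + set_b, 95)    # from the raw data

print(q95_a, q95_b)      # 10 20
print(averaged, merged)  # 15.0 19
```

With differently sized or differently shaped sets the gap can be much larger, which is why the raw samples (or a mergeable summary such as a histogram) have to be kept.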
Null Hypothesis
“The Double” (2011)
Determine who is Cassius the Assassin
Null Hypothesis
“Stephen Hawking is Cassius”
Formulate a null hypothesis
“Stephen Hawking is not Cassius”
Null Hypothesis
Try to disprove the null hypothesis
Hawking wasn’t in any crime scene photos
Null hypothesis not rejected: no evidence that Hawking is Cassius
Large p-value, > 0.05, weak evidence against the null hypothesis
Null Hypothesis
“Richard Gere is Cassius”
Formulate a null hypothesis
“Richard Gere is not Cassius”
Null Hypothesis
Try to disprove null hypothesis
Richard Gere was in ALL of the crime
scene photos
Small p-value, <= 0.05, strong evidence against the null hypothesis
Null Hypothesis
Null hypothesis rejected
Richard Gere is Cassius
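A toy sketch of where a p-value could come from in this story. Everything numeric here is invented for illustration: it assumes 5 crime scene photos and that, under the null hypothesis, an uninvolved person has an independent 50% chance of showing up in any one photo. The p-value is then the probability of appearing in at least as many photos as observed if the null hypothesis were true.

```python
from math import comb

ALPHA = 0.05  # significance threshold from the slides


def p_value_at_least(k, n, p_null):
    """P(appearing in k or more of n photos) under the null hypothesis."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


PHOTOS = 5      # hypothetical number of crime scene photos
P_CHANCE = 0.5  # hypothetical chance an uninvolved person is in a photo

p_hawking = p_value_at_least(0, PHOTOS, P_CHANCE)  # seen in no photos
p_gere = p_value_at_least(5, PHOTOS, P_CHANCE)     # seen in all photos

print(p_hawking, p_hawking <= ALPHA)  # 1.0 False
print(p_gere, p_gere <= ALPHA)        # 0.03125 True
```

A small p-value only says the observation would be surprising if the null hypothesis were true; it is grounds to reject the null hypothesis, not proof of the alternative.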
Don’t be a Dummy
Go learn some statistics!


Editor's Notes

  • #3 Without math, your interpretation of the data is likely misleading. So we want to apply math to the data, and apply it correctly. This is statistics - applying math to data.
  • #7 This is a histogram. The vertical bars are called bins, or buckets. The x axis is the sample value, and the number of samples is on the y axis.
  • #8 This is a normal distribution, or Gaussian distribution. The median equals the mode equals the average. The distribution is symmetrical on both sides. Often this is called a bell curve.
  • #9 This is the standard deviation. In a normal distribution, about 68.3% of the values fall within one standard deviation of the mean. We call that one sigma.
  • #10 This is how we calculate the standard deviation. Just take this complicated formula, and code it up. Easy, right?
  • #11 This is a bimodal distribution
  • #12 This is also a non-normal distribution; this one is skewed. Where’s the median? Where’s the mode? You can still compute a standard deviation for a non-normal distribution, but the 68% rule no longer applies, so it tells you much less about the shape of the data.
  • #13 Let’s talk about percentiles. If we have a 95th percentile for set A of 10, and a 95th percentile for set B of 20, what’s the 95th percentile for the set A union B?
  • #14 You can’t generate a quantile for a combination of data sets by averaging the quantiles of the data sets. You have to generate the percentile from the raw data for both sets.
  • #15 Who here has seen The Double? Topher Grace and Richard Gere star. There’s an assassin named Cassius running around killing people, and they are trying to find him.
  • #16 Let’s say we want to prove Stephen Hawking is Cassius, we formulate a null hypothesis that says Hawking is not Cassius.
  • #17 Then we try to disprove the null hypothesis. He wasn’t in the crime scene photos, so we have weak evidence against the null hypothesis. We have
  • #18 Now we examine if Richard Gere is Cassius. Let’s formulate a null hypothesis, that Richard Gere is not Cassius.
  • #19 Since Gere was in all the crime scene photos, we have strong evidence against the null hypothesis.
  • #20 Thus we can reject the null hypothesis. Gere is Cassius. If this stuff seems confusing, it’s because it is. Explaining this null hypothesis theory confuses me almost every single time.
  • #21 Thanks folks. Don’t be a dummy, go learn some statistics. It will make you a better engineer.