3. Remember the sample of 6 fish that we caught from
the lake . . .
They were the following lengths:
3”, 4”, 5”, 6”, 8”, 10”
The mean length was 6 inches. Recall that we
calculated the deviations from the mean. What was the
sum of these deviations?
Can we find an average deviation?
What can we do to the deviations so that
we could find an average?
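As a quick check of the questions above, here is a minimal Python sketch (the variable names are illustrative and not from the slides). It computes the deviations for the six fish and shows why their plain average is not useful: the positive and negative deviations cancel. Squaring is one way to remove the signs.

```python
lengths = [3, 4, 5, 6, 8, 10]  # fish lengths in inches, from the slide

mean = sum(lengths) / len(lengths)        # 6.0
deviations = [x - mean for x in lengths]  # -3, -2, -1, 0, 2, 4

# Deviations from the mean always sum to zero, so their plain average is zero.
sum_of_deviations = sum(deviations)

# Squaring makes every term non-negative, so the terms no longer cancel.
squared_deviations = [d ** 2 for d in deviations]

print(sum_of_deviations)   # 0.0
print(squared_deviations)  # [9.0, 4.0, 1.0, 0.0, 4.0, 16.0]
```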
4. The estimated average of the deviations squared
is called the variance.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
5. Standard Deviation
- is the square root of the variance.
- is, roughly, the average distance from the
center (mean).
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
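The two population formulas (dividing by N) can be sketched in Python as follows. The function names and the eight-value data set are hypothetical, chosen only so the arithmetic comes out to round numbers; they do not appear in the slides.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


# Illustrative data (an assumption, not from the slides): the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # 4.0
print(population_std_dev(data))   # 2.0
```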
7. $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Degrees of freedom: the denominator, n – 1
When calculating sample variance, we use degrees of freedom (n – 1)
in the denominator instead of n because this tends to produce
better estimates.
Degrees of freedom will be revisited in Chapter 8.
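One way to see what "better estimates" can mean is a small exhaustive check, sketched below under stated assumptions: the six fish lengths are treated as an entire population, and every possible sample of size 2 is drawn with replacement. Averaged over all such samples, dividing by n – 1 recovers the population variance exactly, while dividing by n underestimates it. The helper name `variance` and the sample size are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Assumption for this sketch: the six fish are the whole population.
population = [3, 4, 5, 6, 8, 10]


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values) / divisor


pop_var = variance(population, len(population))  # divides by N

n = 2  # sample size (illustrative)
samples = list(product(population, repeat=n))    # all samples, with replacement

avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(pop_var)            # 17/3
print(avg_div_n)          # 17/6, too small on average
print(avg_div_n_minus_1)  # 17/3, matches the population variance
```

Exact fractions are used instead of floats so that the equality holds without rounding error.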
8. Remember the sample of 6 fish that we caught from the lake . . .
Find the variance of the length of fish.

  x      (x - x̄)    (x - x̄)²
  3      -3          9
  4      -2          4
  5      -1          1
  6       0          0
  8       2          4
 10       4         16
 Sum      0         34

First square the deviations.
What is the sum of the deviations squared? 34
Divide this by 5.
s² = 34 / 5 = 6.8
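The steps in this table can be reproduced with a short Python sketch. The variable names are illustrative; the data and the divisor n – 1 = 5 come from the slide.

```python
lengths = [3, 4, 5, 6, 8, 10]  # fish lengths in inches
n = len(lengths)
x_bar = sum(lengths) / n       # sample mean, 6.0

print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
sum_sq = 0.0
for x in lengths:
    dev = x - x_bar            # deviation from the mean
    sum_sq += dev ** 2         # accumulate the squared deviation
    print(f"{x:>4} {dev:>9.0f} {dev ** 2:>13.0f}")

sample_variance = sum_sq / (n - 1)  # divide by degrees of freedom

print("sum of squared deviations:", sum_sq)  # 34.0
print("sample variance:", sample_variance)   # 6.8
```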
9. A typical deviation from the mean is the
standard deviation.
s² = 6.8 inches², so s = √6.8 ≈ 2.608 inches.
The fish in our sample typically deviate from the mean of
6 inches by about 2.608 inches.
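As a cross-check, Python's standard `statistics` module gives the same values; `statistics.variance` and `statistics.stdev` both use the n – 1 divisor. This is only a verification sketch, not part of the original slides.

```python
import statistics

lengths = [3, 4, 5, 6, 8, 10]  # fish lengths in inches

s_squared = statistics.variance(lengths)  # sample variance, divides by n - 1
s = statistics.stdev(lengths)             # square root of the sample variance

print(round(s_squared, 1), round(s, 3))   # 6.8 2.608
```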
10. The most commonly used measures of
center and variability are the mean
and standard deviation, respectively.
11. Choosing Measures of Center and Spread
- Mean and Standard Deviation
- Median and Interquartile Range
12. • The median and IQR are usually better than
the mean and standard deviation for
describing a skewed distribution or a
distribution with outliers.
• Use mean and standard deviation only for
reasonably symmetric distributions that don’t
have outliers.
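To illustrate why outliers matter, the sketch below adds one hypothetical outlier (a 40-inch fish, not in the original sample) and compares how much each summary moves. The quartiles come from `statistics.quantiles` with its default method; other quartile conventions give slightly different IQR values, so treat the exact numbers as one possible convention.

```python
import statistics


def summarize(values):
    """Return the two center/spread pairs discussed on the slide."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


fish = [3, 4, 5, 6, 8, 10]
with_outlier = fish + [40]  # hypothetical outlier, for illustration only

before = summarize(fish)
after = summarize(with_outlier)

for key in before:
    print(f"{key:>6}: {before[key]:.2f} -> {after[key]:.2f}")
```

In this example the mean and standard deviation shift far more than the median and IQR, which is the behavior the slide describes.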
13. Rule of Thumb
The range is roughly 4 times the standard deviation,
so the standard deviation can be estimated as range / 4.
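A short sketch shows how rough this rule can be. It compares range / 4 with the sample standard deviation for two data sets from these slides: the six fish lengths and the 16-value data set from the symmetric-distribution slide. The function name `range_rule_estimate` is hypothetical.

```python
import statistics


def range_rule_estimate(values):
    """Estimate the standard deviation as range / 4."""
    return (max(values) - min(values)) / 4


fish = [3, 4, 5, 6, 8, 10]
symmetric = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]

for name, data in [("fish", fish), ("symmetric", symmetric)]:
    estimate = range_rule_estimate(data)
    actual = statistics.stdev(data)
    print(f"{name}: range/4 = {estimate:.2f}, s = {actual:.2f}")

# fish: range/4 = 1.75, s = 2.61
# symmetric: range/4 = 1.50, s = 1.46
```

The estimate is close for the larger, mound-shaped data set and noticeably off for the small fish sample, so it is best treated as a quick sanity check.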
14. Symmetrical Distribution of Data
Consider the following data set:
4 5 6 6 6 7 7 7 7 7 7 8 8 8 9 10
This data set produces the histogram shown below. Each interval has width one, and each
value is located in the middle of an interval. The histogram displays
a symmetrical distribution of data.
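A text version of the histogram can be sketched in Python, along with a simple symmetry check: the count at each distance below the center should equal the count at the same distance above it. The variable names are illustrative.

```python
from collections import Counter
import statistics

data = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]
counts = Counter(data)

# One row per value; the number of '#' marks is the frequency.
for value in range(min(data), max(data) + 1):
    print(f"{value:>2} | {'#' * counts[value]}")

mean = sum(data) / len(data)
median = statistics.median(data)
center = int(mean)  # 7 for this data set

# Symmetric if the frequencies mirror each other around the center.
is_symmetric = all(
    counts[center - k] == counts[center + k]
    for k in range(1, max(data) - center + 1)
)

print(mean, median, is_symmetric)  # 7.0 7.0 True
```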
15. Skewness
• Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A
distribution, or data set, is symmetric if it looks the same to the left and right of the
center point.
• The skewness for a normal distribution is zero, and any symmetric data should have
a skewness near zero. Negative values for the skewness indicate data that are
skewed left and positive values for the skewness indicate data that are skewed right.
By skewed left, we mean that the left tail is long relative to the right tail. Similarly,
skewed right means that the right tail is long relative to the left tail.
• [Ref: https://en.wikipedia.org/wiki/Skewness] Skewness in a data series may
sometimes be observed not only graphically but by simple inspection of the
values. For instance, consider the numeric sequence (49, 50, 51), whose values are
evenly distributed around a central value of 50. We can transform this sequence
into a negatively skewed distribution by adding a value far below the mean, e.g.
(40, 49, 50, 51). Similarly, we can make the sequence positively skewed by adding
a value far above the mean, e.g. (49, 50, 51, 60).
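The three sequences above can be checked numerically. The sketch below uses one common definition of skewness, the moment coefficient m3 / m2^1.5 computed with population (divide-by-n) moments. The slides do not specify a formula, so this choice and the function name `skewness` are assumptions.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = skewness([49, 50, 51])        # zero: evenly spread around 50
left_skewed = skewness([40, 49, 50, 51])  # negative: long left tail
right_skewed = skewness([49, 50, 51, 60]) # positive: long right tail

print(symmetric, left_skewed, right_skewed)
```

Because the second and third sequences are mirror images of each other, their skewness values have the same magnitude and opposite signs.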