3. Mean
Median
Mode
Range
Percentile
Variance
Standard Deviation
Covariance
Correlation Coefficient
Skewness
4. Why do we care about Central Tendency?
What is most valuable to you:
Average price of home in a neighborhood
Median price of home …
Range of prices …
Mode …
What does it say about neighborhood if:
Average price is $500K
Median price is $350K
Range is $750K
5. Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $\bar{x} = \frac{\sum X}{n}$
Data: 1,2,3,4,4,5,5,5,5,6
Mean = 40 / 10 = 4
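As a quick check of the formula, here is a minimal R sketch that sums the values and divides by their count, then compares the result with the built-in mean(). The variable names are illustrative only.

```r
# Data from the slide
x <- c(1, 2, 3, 4, 4, 5, 5, 5, 5, 6)

# Mean by the formula: sum of the values divided by the number of values
manual_mean <- sum(x) / length(x)

# Built-in equivalent
builtin_mean <- mean(x)

print(manual_mean)   # 4
print(builtin_mean)  # 4
```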
6. The “middle” value from sorted list
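$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}$ term
When n is even, the position falls between two terms; average them
Data: 1,2,3,4,4,5,5,5,5,6 Median: 4.5
Data: 1,2,3,4,4,5,5,5,5,6,7 Median: 5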
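The position rule can be sketched in R as follows. The helper median_by_position() is a hypothetical name written for this illustration; in practice the built-in median() does the same job.

```r
# Median via the ((n + 1) / 2)th-term rule on sorted data
median_by_position <- function(x) {
  s <- sort(x)
  pos <- (length(s) + 1) / 2
  # For even n, pos ends in .5, so average the two neighbouring terms;
  # for odd n, floor and ceiling pick the same term
  mean(s[c(floor(pos), ceiling(pos))])
}

even_n <- c(1, 2, 3, 4, 4, 5, 5, 5, 5, 6)
odd_n  <- c(1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 7)

print(median_by_position(even_n))  # 4.5
print(median_by_position(odd_n))   # 5
print(median(even_n))              # built-in, same value as the first line
```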
7. The number that occurs the most
Data: 1,2,3,4,4,5,5,5,5,6,7
Mode: 5 (appears 4 times)
> table(c(1,2,3,4,4,5,5,5,5,6,7))
1 2 3 4 5 6 7
1 1 1 2 4 1 1
8. The value at or below which n percent of the data falls
Quiz Scores:
67,72,88,82,80,90,95,60,77,89,99,85,77
What is the score that cuts off 30% of all scores? 50%?
> quantile(c(67,72,88,82,80,90,95,60,77,89,99,85,77), c(.3, .5))
30% 50%
 77  82
30% of all scores were 77 or below; 50% of all scores were 82 or below
9. Graph showing data within quartiles
[Figure: box plot labeled with Max Value, Q3 (75%), Median, Q1 (25%), Min Value]
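The five values a box plot is built from can be computed directly. This sketch assumes the quiz scores from the percentile slide as input, since no data is given for the plot itself.

```r
# Assumed input: the quiz scores from the percentile slide
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(scores)
names(five) <- c("Min", "Q1", "Median", "Q3", "Max")
print(five)

# In an interactive session, boxplot(scores) draws the corresponding plot
```

For this data the hinges coincide with the quartiles from quantile(); in general the two can differ slightly because they use different interpolation rules.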
10. How to see the “dispersion” of data
Sets of quiz scores for different classes:
Set1: 80, 79, 80, 81, 80, 80, 79, 79
Set2: 75, 100, 60, 100, 100, 75, 75, 60
Just by looking at the numbers you should be able to state that Set2 is more dispersed
Standard deviation measures the dispersion around the mean
Other measures include range and variance
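To put numbers on that impression, a short R sketch compares the two sets. The means are close, while the spread measures differ widely.

```r
set1 <- c(80, 79, 80, 81, 80, 80, 79, 79)
set2 <- c(75, 100, 60, 100, 100, 75, 75, 60)

# Similar centers ...
print(c(mean_set1 = mean(set1), mean_set2 = mean(set2)))

# ... but very different spread (sd() uses the sample formula, n - 1)
print(c(sd_set1 = sd(set1), sd_set2 = sd(set2)))
print(c(range_set1 = diff(range(set1)), range_set2 = diff(range(set2))))
```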
11. Difference between the highest and lowest values in the data
Data: 1,2,3,4,4,5,5,5,5,6,7
Range: 6 (7-1)
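In R, range() returns the minimum and maximum rather than their difference, so a minimal sketch of the calculation looks like this:

```r
x <- c(1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 7)

# range() gives c(min, max); diff() turns that pair into max - min
limits <- range(x)
data_range <- diff(limits)

print(limits)      # 1 7
print(data_range)  # 6
```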
12. Measures dispersion around the mean (how far values typically fall from it)
$\sigma^2 = \frac{\sum (x-\mu)^2}{N}$
Steps to calculate population variance ($\sigma^2$)
Calculate mean
For each number in set
▪ Subtract the mean
▪ Square the result (why square it?)
Get the average of the squared differences
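The steps can be sketched in R on the small data set from the mean slide (an illustrative choice of data). The sketch also hints at why squaring is needed: the raw deviations cancel out and sum to zero.

```r
x <- c(1, 2, 3, 4, 4, 5, 5, 5, 5, 6)

mu <- mean(x)               # step 1: the mean (4)
deviations <- x - mu        # step 2a: subtract the mean from each value
print(sum(deviations))      # 0: positive and negative deviations cancel

squared <- deviations^2     # step 2b: squaring makes every term non-negative
pop_var <- mean(squared)    # step 3: average of the squared differences
print(pop_var)              # 2.2
```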
13. Data:
67,72,88,82,80,90,95,60,77,89,99,85,77
Large variance indicates data is spread out; small indicates data is close to mean.
For Sample (note n-1)
$s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$
Mean: 81.62
$(67-81.62)^2 = 213.74$
$(72-81.62)^2 = 92.54$
…
$(77-81.62)^2 = 21.34$
Sum of 213.74, 92.54, …, 21.34, divided by n-1 = 12
$s^2 = 123.09$ (dividing by $N = 13$ instead gives $\sigma^2 = 113.62$)
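A short R sketch shows how the divisor changes the result for these scores. The 123.09 figure corresponds to dividing by n - 1, which is also what R's var() does.

```r
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)
n <- length(scores)

# Sum of squared deviations from the mean
ss <- sum((scores - mean(scores))^2)

pop_var  <- ss / n        # population formula, about 113.62
samp_var <- ss / (n - 1)  # sample formula, about 123.09

print(round(c(population = pop_var, sample = samp_var), 2))
print(var(scores))        # built-in var() matches the sample formula
```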
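14. Simply the square root of variance
$\sigma = \sqrt{\frac{\sum_i (x_i-\mu)^2}{N}}$
Sample variance from the previous example: $s^2 = 123.09$
$s = \sqrt{123.09} = 11.09$
The standard deviation is useful in determining what is “normal”
The mean of scores was 81.62, so most scores are within 1 standard deviation (81.62 +/- 11.09). All scores are within 2 standard deviations (+/- 22.18).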
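The "most scores" and "all scores" claims can be checked by counting. This sketch uses R's sd(), which is based on the n - 1 formula and reproduces the 11.09 figure.

```r
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)

m <- mean(scores)
s <- sd(scores)  # square root of var(scores)

# Count how many scores lie within 1 and 2 standard deviations of the mean
within_1sd <- sum(abs(scores - m) <= s)
within_2sd <- sum(abs(scores - m) <= 2 * s)

print(round(c(mean = m, sd = s), 2))
print(c(within_1sd = within_1sd, within_2sd = within_2sd, total = length(scores)))
```

Here 9 of the 13 scores fall within one standard deviation and all 13 fall within two.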
16. Measures how two variables (x,y) are linearly related. Positive value indicates they tend to increase together; negative value indicates one tends to decrease as the other increases.
Test Scores (x):
67,72,88,82,80,90,95,60,77,89,99,85,77
Study Time (y):
30,45,80,85,75,85,120,30,45,75,85,110,40
Is there a relationship between Test Scores and Time spent studying?
$s_{xy} = 269.42$ (sample covariance, dividing by $n-1$)
What if everyone studied for 30 minutes?
$s_{xy} = 0$ (so no linear relation)
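A minimal R sketch of the calculation, using the sample formula (dividing by n - 1), which is what R's cov() computes. The variable names scores and study are illustrative.

```r
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)
study  <- c(30, 45, 80, 85, 75, 85, 120, 30, 45, 75, 85, 110, 40)
n <- length(scores)

# Average product of paired deviations, with n - 1 in the denominator
manual_cov  <- sum((scores - mean(scores)) * (study - mean(study))) / (n - 1)
builtin_cov <- cov(scores, study)
print(c(manual = manual_cov, builtin = builtin_cov))

# If everyone studied 30 minutes, y never deviates from its mean,
# so every product is 0 and the covariance is 0
constant_cov <- cov(scores, rep(30, n))
print(constant_cov)
```

Covariance carries the units of both variables (here points times minutes), which makes its magnitude hard to interpret on its own.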
17. A normalized measurement of how two variables are linearly related.
Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
From Previous Example: $r_{xy} = 0.83$ (the closer this value is to 1 or -1, the stronger the linear relationship)
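The normalization can be sketched in R by dividing the covariance by the product of the two standard deviations and comparing with the built-in cor(). Variable names are illustrative.

```r
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)
study  <- c(30, 45, 80, 85, 75, 85, 120, 30, 45, 75, 85, 110, 40)

# Covariance divided by the product of the standard deviations
r_manual  <- cov(scores, study) / (sd(scores) * sd(study))
r_builtin <- cor(scores, study)

print(round(c(manual = r_manual, builtin = r_builtin), 2))  # both about 0.83
```

Because the n - 1 (or N) factors cancel in the ratio, the sample and population formulas give the same coefficient for a given data set.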
18. Intuitively we would say there is no relation
But be careful …
Let’s say I have data for sales of ice vs. temp
> cor(temps, sales)
[1] -0.001413245
Correlation Coefficient is nearly 0, so no
relation, right?
19. The scatter plot clearly shows a strong relationship between sales and temps, just not a linear one. Maybe when it’s too hot people just don’t want to leave the house.
Always visualize data!
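The effect is easy to reproduce with made-up numbers. The data below is hypothetical, not the data behind the slide: sales rise with temperature up to 80 degrees and fall symmetrically after that, so the relationship is perfectly predictable yet the correlation coefficient is essentially 0.

```r
# Hypothetical data for illustration only
temps <- seq(60, 100, by = 5)
sales <- 400 - (temps - 80)^2   # peaks at 80 degrees, symmetric on both sides

r <- cor(temps, sales)
print(r)  # essentially 0, despite the exact (nonlinear) dependence

# In an interactive session, plot(temps, sales) reveals the curve
```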
21. Measure of “symmetry” of data.
Negative value indicates mean is less than median (left skewed).
Positive value indicates mean is larger than median (right skewed).
> skewness(scores)
[1] -0.302365
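skewness() is not part of base R; it is provided by add-on packages such as e1071 or moments, which use slightly different definitions. The sketch below assumes one common definition (third central moment divided by the cube of the sample standard deviation), which appears to match the value shown above. The function name skew() is hypothetical.

```r
scores <- c(67, 72, 88, 82, 80, 90, 95, 60, 77, 89, 99, 85, 77)

# Assumed definition: mean cubed deviation over the sample sd cubed
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / sd(x)^3
}

print(skew(scores))                   # about -0.30
print(mean(scores) < median(scores))  # TRUE, consistent with a left skew
```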
24. Mean, Median, Mode exactly at center
About 99.7% of all data within 3 $\sigma$ of mean
Important for making inferences
Test scores are generally normally distributed
Human heights follow an approximately normal distribution
Need to be careful not to apply normal distribution rules against non-normal data
https://en.wikipedia.org/wiki/Normality_test
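The share of a normal distribution lying within 1, 2 and 3 standard deviations of the mean can be computed from the normal cumulative distribution function, pnorm() in R:

```r
k <- 1:3

# P(-k < Z < k) for a standard normal variable Z
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within_", k, "_sd")

print(round(coverage, 4))  # 0.6827 0.9545 0.9973
```

These percentages only hold when the data really is close to normal, which is why a normality check matters before applying them.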
25. z-Score indicates how far above or below the mean
a given score in the distribution is
Scenario: On which exam did Scott do better?
Scott got a 65/100 on Exam1; 𝜇 is 60; 𝜎 is 10
Scott got a 42/200 on Exam2; 𝜇 is 37; 𝜎 is 5
First, need to standardize scores (Exam1 is out of
100; Exam2 out of 200)
This standardization is the z-score
$z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}}$ or $z = \frac{X-\mu}{\sigma}$
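Applying the formula to Scott's two exams, as a minimal R sketch (z_score() is a hypothetical helper name):

```r
# z = (raw score - mean) / standard deviation
z_score <- function(raw, mu, sigma) (raw - mu) / sigma

z_exam1 <- z_score(65, mu = 60, sigma = 10)
z_exam2 <- z_score(42, mu = 37, sigma = 5)

print(c(exam1 = z_exam1, exam2 = z_exam2))  # 0.5 and 1
```

Scott was 5 points above the mean on both exams, but on Exam2 that is a full standard deviation, so relative to the other students he did better on Exam2.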
26. z of -1.5 means student scored 1.5 standard deviations below the mean
In the case of test scores, positive numbers are good
Less than 10% scored worse
Which score marks the 97th percentile?
What percentage of population scored between score1 and score2 (say 75 and 90)?
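Under the assumption that scores are normally distributed, these questions can be answered with pnorm() and qnorm(). The mean and standard deviation below (81.62 and 11.09) are borrowed from the earlier quiz-score example purely for illustration; the slide does not state which distribution it has in mind.

```r
# Share of students scoring below z = -1.5
p_below <- pnorm(-1.5)
print(p_below)            # about 0.067, i.e. less than 10%

# z value that marks the 97th percentile
z_97 <- qnorm(0.97)
print(z_97)               # about 1.88

# Assumed parameters, taken from the earlier quiz-score example
mu <- 81.62
sigma <- 11.09

# Raw score at the 97th percentile: undo the standardization
score_97 <- mu + z_97 * sigma
print(score_97)

# Share of the population scoring between 75 and 90
p_between <- pnorm(90, mean = mu, sd = sigma) - pnorm(75, mean = mu, sd = sigma)
print(p_between)
```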
27. Measurements of Central Tendency and Variability are critical to study of statistics
Central Tendency tries to provide information about the “central” value of your set
Variability tries to provide information about
the dispersion of data in your set
Covariance tries to provide information about
how two variables are related
z-Scores are useful with a normal distribution