2. Content
• Scales of Measurement
– Categorical Variables
– Numerical Variables
• Displays of Categorical Data
– Frequencies
– Bar Graph
– Pie Chart
• Numerical Measures of Central Tendency
– Mean
– Median
– Mode
• Numerical Measures of Spread
• Association
• Correlation
• Regression
3. Scales of Measurement
• Categorical Variables:
– Nominal: Categorical variable with no order (e.g. blood
type: A, B, AB or O).
– Ordinal: Categorical variable with an order (e.g. pain: “none”,
“mild”, “moderate” or “severe”).
• Numerical Variables:
– Interval: Quantitative data where differences are
meaningful but ratios are not (e.g. calendar years: the
difference between 2009 and 2010 is one year, but the ratio
2010/2009 has no meaning).
– Ratio: Quantitative data where both differences and ratios
are meaningful (e.g. weights: 200 lbs is twice as heavy as 100 lbs).
5. Categorical Variables
Variable (Sex)   Frequency   Proportion
Male             609         0.61
Female           391         0.39
Total            1000        1.00
[Figure: bar graph and pie chart of the frequencies for Male and Female; the bar graph's vertical axis runs from 0 to 700]
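The proportion column of the table can be reproduced from the frequencies alone. The following is a minimal sketch; the variable names are illustrative, and the counts are the ones given in the table.

```python
# Frequencies taken from the table (Sex: Male / Female).
counts = {"Male": 609, "Female": 391}

total = sum(counts.values())  # total number of observations

# Proportion = category frequency / total frequency.
proportions = {category: count / total for category, count in counts.items()}

# Print a small frequency table; proportions are rounded to two decimals.
for category, count in counts.items():
    print(f"{category:<8}{count:>6}{proportions[category]:>8.2f}")
print(f"{'Total':<8}{total:>6}{sum(proportions.values()):>8.2f}")
```

The proportions always sum to 1, which is a quick check that no category was left out.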
9. Measures of Central Tendency
Sample Mean
The sample mean, x̄, is the sum of all values in the
sample divided by the total number of observations,
n, in the sample.
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
11. Sample Mean
• The mean is affected by extreme observations
and is not a resistant measure.
Scenario 2:
Mean = (120 + 135 + 115 + 110 +
105 + 140 + 280)/7 = 1005/7 ≈ 143.6
Subjects BP
1 120 (x1)
2 135 (x2)
3 115 (x3)
4 110 (x4)
5 105 (x5)
6 140 (x6)
7 280 (x7)
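To see how strongly one extreme value pulls the mean, the sketch below computes it with and without subject 7. The function name is illustrative, and the six-value list is assumed to be the Scenario 1 data used in the median example (the same subjects without the reading of 280).

```python
def sample_mean(values):
    """Sum of all values divided by the number of observations n."""
    return sum(values) / len(values)

scenario_1 = [120, 135, 115, 110, 105, 140]  # subjects 1 to 6
scenario_2 = scenario_1 + [280]              # subject 7 added

mean_1 = sample_mean(scenario_1)
mean_2 = sample_mean(scenario_2)

print(round(mean_1, 1))  # 120.8
print(round(mean_2, 1))  # 143.6
```

A single reading of 280 raises the mean by more than 20 units, which is what "not resistant" means in practice.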
12. Median
• The sample median, M, is the number such
that “half” the values in the sample are
smaller and the other “half” are larger.
• Use the following steps to find M.
– Sort the data (arrange in increasing order).
– Is the size of the data set n even or odd?
– If odd: M = value in the exact middle.
– If even: M = the average of the two middle
numbers.
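The steps above can be sketched as a short function. This is one possible implementation, not the only one; the function name and the small example lists are illustrative.

```python
def sample_median(values):
    """Median following the steps: sort, then pick the middle value(s)."""
    ordered = sorted(values)  # step 1: arrange in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n odd: the value in the exact middle
        return ordered[middle]
    # n even: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(sample_median([7, 1, 4]))     # 4
print(sample_median([7, 1, 4, 2]))  # 3.0
```

Sorting first matters: taking the middle of the data in its recorded order generally gives a different, incorrect value.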
13. Example: Sample Median
• Median systolic BP:
Scenario 1:
120 : 135 : 115 : 110 : 105 : 140
Sorted: 105 : 110 : 115 : 120 : 135 : 140
Median = (115 + 120)/2 = 117.5
Scenario 2:
120 : 135 : 115 : 110 : 105 : 140 : 280
Sorted: 105 : 110 : 115 : 120 : 135 : 140 : 280
Median = 120
• The median is barely affected by extreme
observations (117.5 versus 120 here) and is a
resistant measure.
14. Mode
• The sample mode is the value that occurs
most frequently in the sample (a data set can
have more than one mode).
• This is the only measure of center that can
also be used for nominal (unordered categorical) data.
• The population mode is the value at which the
population distribution reaches its highest point.
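Because the mode only counts occurrences, the same code works for categorical and numerical samples. The sketch below returns every value tied for the highest count; the function name and both data sets are hypothetical and serve only as illustration.

```python
from collections import Counter

def sample_modes(values):
    """Return all values that occur most frequently (there may be several)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

blood_types = ["A", "O", "B", "O", "AB", "A", "O"]  # hypothetical categorical sample
print(sample_modes(blood_types))      # ['O']
print(sample_modes([1, 2, 2, 3, 3]))  # [2, 3] (two modes)
```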
19. Numerical Measures of Spread
Range: The range of the data set is the
difference between the highest value and the
lowest value.
– Range = highest value - lowest value
– Easy to compute BUT ignores a great deal of
information.
– The range is affected by extreme
observations and is not a resistant measure.
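As a quick illustration of how sensitive the range is, the sketch below applies it to the blood pressure readings from the earlier slides, with and without the reading of 280. The function name is illustrative.

```python
def sample_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

scenario_1 = [120, 135, 115, 110, 105, 140]  # without the extreme reading
scenario_2 = scenario_1 + [280]              # with the extreme reading

print(sample_range(scenario_1))  # 35
print(sample_range(scenario_2))  # 175
```

Only the two most extreme observations enter the calculation, so one unusual value changes the result completely.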
20. Numerical Measures of Spread
• Variance: equal to the sum of squared deviations
from the sample mean divided by n - 1, where n is
the number of observations in the sample.
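The definition translates directly into code. The sketch below is a minimal version with an illustrative function name; it is cross-checked against `statistics.variance` from the Python standard library, which also divides by n - 1. The data are the six blood pressure readings from the earlier slides.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

bp = [120, 135, 115, 110, 105, 140]
s2 = sample_variance(bp)

print(round(s2, 2))                       # 194.17
print(round(statistics.variance(bp), 2))  # 194.17, same n - 1 definition
```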
21. Numerical Measures of Spread
• Percentile: The pth percentile of a distribution
is the value such that p percent of the
observations fall at or below it.
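One way to read a percentile is through the share of observations at or below a given value. The sketch below computes that share for a sample; the function name is illustrative, and the data are the six blood pressure readings from the earlier slides.

```python
def percent_at_or_below(values, threshold):
    """Percentage of observations that fall at or below `threshold`."""
    at_or_below = sum(1 for x in values if x <= threshold)
    return 100 * at_or_below / len(values)

bp = [120, 135, 115, 110, 105, 140]

# Three of the six readings (105, 110, 115) are at or below 115.
print(percent_at_or_below(bp, 115))  # 50.0
```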
22. Numerical Measures of Spread
• The most commonly used percentiles are the
quartiles.
1st quartile Q1 = 25th percentile.
2nd quartile Q2 = 50th percentile (the median).
3rd quartile Q3 = 75th percentile.
23. Numerical Measures of Spread
Interquartile Range (IQR)
A simple measure of spread, giving the range covered
by the middle half of the data, is the IQR, defined
below.
IQR = Q3 - Q1
The IQR is a resistant measure of spread.
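The quartiles and the IQR can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). Several conventions exist for interpolating quartiles in small samples, so other software may give slightly different values; the `"inclusive"` method used here is an assumption, not something the slides specify. The data are the blood pressure readings from the earlier slides.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' interpolation method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

scenario_1 = [120, 135, 115, 110, 105, 140]  # without the extreme reading
scenario_2 = scenario_1 + [280]              # with the extreme reading

for label, data in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    q1, q2, q3, iqr = quartiles_and_iqr(data)
    print(f"{label}: Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

Adding the reading of 280 changes the IQR only slightly, whereas the range of the same data jumps from 35 to 175.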
24. Numerical Measures of Spread
Outliers: extreme observations that fall well
outside the overall pattern of the distribution.
• An outlier may be the result of:
– A recording error,
– An observation from a different population,
– An unusual extreme observation (biological
diversity).
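The slides do not give a numerical rule for flagging outliers. One widely used convention, assumed here purely for illustration, flags any observation more than 1.5 × IQR below Q1 or above Q3. The sketch below applies it to the blood pressure readings; the function name, the multiplier `k` and the quartile method are all assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

bp = [120, 135, 115, 110, 105, 140, 280]
print(iqr_outliers(bp))  # [280]
```

A flagged value is only a candidate: whether it is a recording error, a member of a different population, or a genuine extreme observation still has to be judged from the context.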