1. PUBH 601 Concepts and Methods of Biostatistics
2-Numerical summaries
Manar Elhassan, PhD
Department of Public Health
College of Health Sciences, Qatar University
Fall 2022
PUBH 601 Concepts and Methods of Biostatistics. Fall 2022 1
Department of Public Health
2. Objectives of today’s class
Numerical summaries
Frequency distributions:
• Shape, location and spread
• Frequency tables
• Proportions and percentages
Measures of central location:
• Mean, median, mode
• Comparison of mean, median, and mode
Measures of spread/variability:
• Variance and standard deviation
• Change of units
• Range and interquartile range
• Percentiles
Coefficient of variation
3. Student Learning Outcomes of the Course
At the end of this course, students will be able to:
• Summarize the distribution of continuous and categorical data in frequency tables
• Report and interpret appropriate continuous data summary measures for central tendency, variability and position
8. Once we have obtained our sample, we would like to summarize it.
Depending on the type of the data and its dimension, there are different methods of summarizing the data.
Types
• Numerical
• Categorical
Dimensions
• Univariate
• Bivariate
• Multivariable
10. Frequency table – Categorical variables
Counts: frequencies
Percentages: also called relative frequencies; should add up to 100
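A frequency table for a categorical variable can be sketched in a few lines of Python. This is only an illustration: the smoking-status data and the helper name `frequency_table` are hypothetical and do not come from the lecture.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percentage) rows in first-seen order."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.items()]

# Hypothetical data, not from the lecture: smoking status of 10 participants.
smoking = ["never"] * 5 + ["former"] * 3 + ["current"] * 2
table = frequency_table(smoking)

for category, n, pct in table:
    print(f"{category:<8} {n:>3} {pct:6.1f}%")

# The relative frequencies (percentages) should add up to 100.
print(f"{'Total':<8} {len(smoking):>3} {sum(p for _, _, p in table):6.1f}%")
```

The final line is the check from the slide: the percentage column sums to 100.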
11. Frequency table – Ordinal variables
Normal: SBP < 120 and DBP < 80
Pre-hypertension: SBP 120-139 or DBP 80-89
Stage I hypertension: SBP 140-159 or DBP 90-99
Stage II hypertension: SBP 160+ or DBP 100+
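The four ordered categories can be expressed as a small classification function. This is a sketch under one assumption that the slide does not state explicitly: when the systolic and diastolic readings fall into different categories, the more severe category is assigned, so the checks run from most to least severe. The function name `classify_bp` is hypothetical.

```python
def classify_bp(sbp, dbp):
    """Classify blood pressure (mmHg) into the four ordinal categories.

    Assumption: the more severe category wins when SBP and DBP disagree.
    """
    if sbp >= 160 or dbp >= 100:
        return "Stage II hypertension"
    if sbp >= 140 or dbp >= 90:
        return "Stage I hypertension"
    if sbp >= 120 or dbp >= 80:
        return "Pre-hypertension"
    return "Normal"  # SBP < 120 and DBP < 80

# Hypothetical readings for illustration only.
for sbp, dbp in [(118, 76), (125, 70), (130, 95), (165, 85)]:
    print(sbp, dbp, classify_bp(sbp, dbp))
```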
12. Frequency table – Ordinal variables
There are 3,311 patients with normal, pre-hypertension or Stage I hypertension.
13. Frequency table – Ordinal variables
75.2% of the patients are NOT classified as hypertensive
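Statements like the one above come from cumulative frequencies, which are meaningful because the categories are ordered. A minimal sketch follows; the counts are hypothetical and are not the lecture's data.

```python
from itertools import accumulate

# Hypothetical counts, not the lecture's data, in increasing order of severity.
categories = ["Normal", "Pre-hypertension",
              "Stage I hypertension", "Stage II hypertension"]
counts = [40, 35, 15, 10]

total = sum(counts)
cumulative_counts = list(accumulate(counts))
cumulative_percent = [100 * c / total for c in cumulative_counts]

for name, n, cum_n, cum_pct in zip(categories, counts,
                                   cumulative_counts, cumulative_percent):
    print(f"{name:<22} {n:>4} {cum_n:>4} {cum_pct:6.1f}%")
```

In this made-up example the cumulative percentage up to "Pre-hypertension" is the share of patients not classified as hypertensive.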
14. Frequency table – Numerical variables
15. Frequency table – Numerical variables
16. Frequency table – Numerical variables
17. Frequency table – Numerical variables
Shape or distribution of haemoglobin level in the sample of 70 women
What is this shape in the population?
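For a numerical variable, a frequency table is built by grouping values into intervals, and the counts per interval already hint at the shape of the distribution. The sketch below assumes equal-width intervals; the haemoglobin values and the helper `grouped_frequencies` are hypothetical.

```python
import math
from collections import Counter

def grouped_frequencies(values, width):
    """Count values in intervals [lower, lower + width), sorted by lower bound."""
    bins = Counter(math.floor(v / width) * width for v in values)
    return sorted(bins.items())

# Hypothetical haemoglobin levels (g/dl), not the lecture's sample of 70 women.
hb = [9.4, 10.2, 10.8, 11.1, 11.5, 11.9, 12.0, 12.3, 12.7, 13.1, 13.6, 14.2]
grouped = grouped_frequencies(hb, 1)

for lower, count in grouped:
    # A row of asterisks gives a rough text histogram of the shape.
    print(f"{lower:>2}-{lower + 1:<2} {count:>2} {'*' * count}")
```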
18. Shape of a distribution
20. Location and spread of a distribution
Location in terms of its center
21. Location and spread of a distribution
Location in terms of its center
Spread in terms of its variability or dispersion
22. Missing data
• In a certain study with 3500 patients, 1200 received treatment for hypertension. What is the relative frequency of those receiving treatment for hypertension?
• For 200 patients, information on treatment was missing. What is the relative frequency of those receiving treatment for hypertension?
• What is going on here?
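One way to work through the two questions is to compute the relative frequency with each denominator. The sketch assumes that all 1200 treated patients are among those with known treatment status, which the slide does not state explicitly.

```python
total_patients = 3500
treated = 1200
missing = 200  # patients with no information on treatment

# Denominator 1: everyone enrolled, ignoring that some records are missing.
rel_freq_all = 100 * treated / total_patients

# Denominator 2: only patients whose treatment status is known.
known = total_patients - missing
rel_freq_known = 100 * treated / known

print(f"Out of all {total_patients} patients: {rel_freq_all:.1f}%")
print(f"Out of {known} with known status:  {rel_freq_known:.1f}%")
```

The numerator is the same in both cases; only the denominator changes, so any reported percentage should say which denominator was used.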
23. Missing data
• When there is very little missing data (e.g., less than 5%) and there is no apparent pattern to the missingness (e.g., there is no systematic reason for missing data), then statistical analyses based on the available data are generally appropriate.
• If too many records are missing on a given variable, the variable probably needs to be discarded.
24. Measures of location
Numerical data
26. The mean is located at the balancing point of the histogram.
The median splits the histogram into two halves of equal area.
27. Mean or Median?
• The mean is usually the preferred measure since it takes into account each individual observation
• Use the median if there are one or two extremely high or low values (outliers) or the data are skewed
• Symmetric distribution ⇒ mean and median are the same (approximately so in a sample); use the mean.
• Skewed distribution ⇒ mean is farther out in the long tail than is the median; use the median.
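The effect of a single outlier on the two measures can be seen in a short sketch using Python's standard `statistics` module. The values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical data: five roughly symmetric values.
values = [2, 3, 3, 4, 5]
mean_before, median_before = mean(values), median(values)

# Add one extremely high value (an outlier), creating a long right tail.
with_outlier = values + [40]
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Without outlier: mean={mean_before}, median={median_before}")
print(f"With outlier:    mean={mean_after}, median={median_after}")
```

The mean is pulled far toward the outlier while the median barely moves, which is why the median is preferred for skewed data.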
28. Mean, Median and Mode
When mean = median → distribution is symmetrical
When mean > median → there is a positive skew
When mean < median → there is a negative skew
29. Examples: n=10 participants attending one of the Framingham study examinations
• Means and medians are not identical but are relatively close. The mean is the most appropriate summary
• If the mean and median are very different, it suggests that there may be outliers affecting the mean
Variable                   Mean     Median
Diastolic Blood Pressure   71.3     71
Systolic Blood Pressure    121.2    122.5
Total Serum Cholesterol    202.3    206.5
Weight                     176      169.5
Height                     67.175   69.375
Body Mass Index            27.26    26.6
30. Measures of spread
Numerical data
31. Measures of spread of the data (variability, dispersion)
• Variance, standard deviation
• Range
• Interquartile range
• Percentile
32. Percentiles
You are the fourth tallest person in a group of 20
• What percentage of people are shorter than you? Answer: 16 of the 20 are shorter, i.e. 80%
• That means you are at the 80th percentile.
• If your height is 1.85 m, then "1.85 m" is the 80th percentile height in that group.
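The percentile in this example is simply the percentage of the group lying below a given value. A minimal sketch follows; the individual heights are hypothetical and constructed so that 1.85 m is the fourth tallest of 20, and the helper name `percent_below` is an assumption.

```python
def percent_below(values, x):
    """Percentage of observations strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical heights in metres: 16 people shorter than 1.85, then 1.85,
# then 3 taller people, so 1.85 is the fourth tallest in a group of 20.
shorter = [round(1.52 + 0.02 * i, 2) for i in range(16)]
heights = shorter + [1.85, 1.88, 1.91, 1.95]

rank = percent_below(heights, 1.85)
print(f"{rank:.0f}% of the group is shorter than 1.85 m")
```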
33. Percentile/centile
A value below which a certain percentage of observations lie
Quartiles: divide the data into 4 equally sized groups
Quantiles?
Deciles?
34. Interquartile range and the range
Interquartile range = upper quartile (Q3) - lower quartile (Q1)
If the interquartile range is large, it means that the middle 50% of observations are spaced wide apart.
Q2 is the median
Range = highest value - lowest value
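Both measures can be computed with Python's standard library, as in the sketch below. The data are hypothetical. Note that software packages use slightly different rules for locating quartiles, so `statistics.quantiles` with its default settings may not match a hand calculation or Stata exactly.

```python
from statistics import quantiles

# Hypothetical diastolic blood pressure readings (mmHg), for illustration only.
data = [62, 64, 68, 70, 71, 72, 75, 77, 80, 85]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1                         # spread of the middle 50% of observations
value_range = max(data) - min(data)   # highest value minus lowest value

print(f"Q1={q1}, median={q2}, Q3={q3}")
print(f"IQR={iqr}, range={value_range}")
```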
35. Quartiles
Two cases: n is even, n is odd
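The worked figures for the two cases are not reproduced here, so the sketch below uses one common textbook convention as an assumption: split the sorted data at the median, leave the median out of both halves when n is odd, and take the median of each half as Q1 and Q3. The lecture's own method may differ, and the function name `quartiles_by_halves` is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Assumption: when n is odd, the middle observation is excluded
    from both the lower and the upper half.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

even_case = quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8])     # n is even
odd_case = quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8, 9])   # n is odd
print(even_case, odd_case)
```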
36. Interquartile range and the range
Advantages
• Range is easy to calculate
• IQR is not affected by extreme values or outliers.
Disadvantages
• Range only measures the spread between the highest and lowest values.
• Range is sensitive to outliers
• IQR uses only the two quartiles and ignores much of the information about how individual values vary.
If data are skewed, report the median and the interquartile range
37. Variance (σ² or SD²) & standard deviation (σ or SD)
• Variance ⇒ average of the squares of the deviations of the observations from their mean
• Standard deviation ⇒ square root of the variance
• The most common numerical description of a distribution is the mean and the standard deviation
39. SD = \sqrt{\text{Variance}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
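The definition can be followed step by step in code: subtract the mean, square the deviations, average them with the n-1 divisor used for sample data, then take the square root. The data and function names below are hypothetical.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(values))

# Hypothetical data: mean is 6, squared deviations are 4, 4, 0, 1, 1.
data = [4, 8, 6, 5, 7]
variance = sample_variance(data)   # 10 / 4
sd = sample_sd(data)

print(f"variance={variance}, SD={sd:.3f}")
```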
40. Properties of the Standard Deviation SD
• SD measures spread about the mean and should be used only when the mean is chosen as the measure of center.
• Observations more spread out about their mean ⇒ SD gets larger.
• SD has the same units of measurement as the original observations, e.g. the mean and standard deviation of age both have years as units
41. Properties of the Standard Deviation SD
• SD is never negative.
• SD = 0 only when all observations take the same value, e.g. 2, 2, 2, 2.
• SD is not resistant. Strong skewness or a few outliers can greatly increase SD.
• When computing the sample variance, we divide by n-1 instead of n. The number n-1 is called the degrees of freedom.
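These properties can be checked numerically with Python's `statistics` module, where `stdev` divides by n-1 and `pstdev` divides by n. The data sets are hypothetical.

```python
from statistics import pstdev, stdev

constant = [2, 2, 2, 2]        # all observations take the same value
base = [4, 8, 6, 5, 7]         # hypothetical data with mean 6
with_outlier = base + [30]     # the same data plus one extreme value

sd_constant = stdev(constant)      # no spread at all
sd_sample = stdev(base)            # divides by n - 1 (degrees of freedom)
sd_population = pstdev(base)       # divides by n
sd_outlier = stdev(with_outlier)   # SD is not resistant to outliers

print(f"constant data:   {sd_constant}")
print(f"n-1 divisor:     {sd_sample:.3f}")
print(f"n divisor:       {sd_population:.3f}")
print(f"with an outlier: {sd_outlier:.3f}")
```

Dividing by n-1 always gives a slightly larger value than dividing by n, and the difference shrinks as the sample size grows.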
42. Properties of the Standard Deviation SD
• SDs are useful when making comparisons
• If the standard deviation is very large, you should check for outliers, skewness, or some other unexpected shape.
43. Examples: n=10 participants attending one of the Framingham study examinations
Variable                   Mean     Standard Deviation   Median   Q1     Q3     IQR
Diastolic Blood Pressure   71.3     7.2                  71       64     77     13
Systolic Blood Pressure    121.2    11.1                 122.5    113    127    14
Total Serum Cholesterol    202.3    37.7                 206.5    163    227    64
Weight                     176      33                   169.5    151    206    55
Height                     67.175   4.205                69.375   63     70     7
Body Mass Index            27.26    3.1                  26.6     24.9   29.6   4.7
44. Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the sample mean.
CV has no units
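As a sketch, the CV can be computed from the means and standard deviations reported in the Framingham example (n=10) earlier in the lecture. Because the CV has no units, variables measured on different scales can be compared directly. The function name is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """CV: the standard deviation as a percentage of the mean."""
    return 100 * sd / mean

# (mean, standard deviation) pairs taken from the Framingham example table.
summary = {
    "Diastolic Blood Pressure": (71.3, 7.2),
    "Systolic Blood Pressure": (121.2, 11.1),
    "Total Serum Cholesterol": (202.3, 37.7),
    "Weight": (176, 33),
    "Height": (67.175, 4.205),
    "Body Mass Index": (27.26, 3.1),
}

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

for name, value in cv.items():
    print(f"{name:<26} CV = {value:5.1f}%")
```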
45. Coefficient of Variation
Which method is more precise?
46. Coefficient of Variation
Which method is more precise?
CV shows that both methods are equally precise
47. The Five-number Summary
Useful in descriptive analyses or during the preliminary investigation of a large data set
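The five-number summary consists of the minimum, Q1, the median, Q3 and the maximum. A minimal sketch using Python's standard library follows; the data are hypothetical, and the `inclusive` quantile method is one of several conventions, so other software may give slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical diastolic blood pressure readings (mmHg), for illustration only.
data = [62, 64, 68, 70, 71, 72, 75, 77, 80]
summary5 = five_number_summary(data)

print("min, Q1, median, Q3, max =", summary5)
```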
48. Activity 2-1: Learn through simulations
Learning outcome:
• Distinguish when to use the mean or the median.
• Distinguish when to use the standard deviation or the IQR.
• Interpret results from published research.
Work in pairs
Time: 20 minutes
49. Activity 2-2
Assessment on BB using team-based learning
6 questions
Work in pairs
Discuss
Time: 20 minutes
50. Practical 1 using Stata
Numerical and Graphical summaries
Access data and practical document from BB
Divide into pairs
Time: 45 minutes