1Basic biostatistics.pdf

Course title: Advanced biostatistics
Course code: ?
Credit hours: 3
21 February 2023 1

Course contents
1. Revision on basic biostatistics
2. Correlation analysis
3. Analysis of variance (ANOVA)
4. Introduction to Non-parametric tests
5. Linear regression
6. Analysis of categorical data:
a) Analysis of contingency tables
b) Logistic regression
7. Longitudinal data analysis
8. Survival analysis

Cont’d
• Software package required
– SPSS/STATA
• Assessment methods
a) Formative (40%):
– Individual and group assignments, project, appraisal, group presentations
b) Summative (60%):
– Final written exam

I. Review of basic biostatistics
Emiru Merdassa(MSc, Assistant Professor)

Learning objectives
• Review descriptive and inferential statistics
• Identify the levels of measurement of variables used in research
• Describe graphical displays of data appropriate for specific levels of measurement
• Identify measures of central tendency and dispersion appropriate for each level of measurement
• Conduct and interpret the following statistical tests using SPSS/STATA:
– Two-sample t-test for independent (unpaired) samples
– t-test for dependent (matched or paired) samples
• Explain the results of t-tests in an understandable way
• Select the correct statistical procedure for different questions

What is Statistics?
1. Collecting data
e.g., Sample, Survey, Observe, Simulate
2. Characterizing data
e.g., Organize/Classify, Count, Summarize
3. Presenting data
e.g., Tables, Charts, Statements
4. Interpreting results
e.g. Infer, Conclude, Specify Confidence
Why?
Data analysis
Decision making

Types of Statistics/biostatistics
1. Descriptive statistics
– Numerical or graphic summaries of data
– Charts, graphs, tables, summary statistics (e.g., mean and standard deviation)
2. Inferential Statistics
– Statistical techniques that allow conclusions to be drawn about the population
– Examples include Chi-square test, t test, ANOVA

Variable
• A variable is any characteristic that can and does assume different values for different people, objects, or events being studied
For example:
• heart rate,
• the heights of adult males,
• the weights of preschool children,
• the ages of patients seen in a dental clinic.

Types of variables

Measurement Scales
1. Nominal
• Numbers are simply used as a code to represent characteristics
• There is no order to the categories
• The assignment of numbers to categories is arbitrary
• Examples of Variables
– Gender: 1. Male 2. Female
– Ethnicity: 1. Oromo 2. Tigre 3. Amhara 4. Guraghe

2. Ordinal
– Numbers represent categories that can be placed in a meaningful numerical order (e.g., from
lowest to highest)
– There is no information regarding the size of the interval between the different values
– Example of an ordinal variable: pain scale
1. No pain
2. A little pain
3. Some pain
4. A lot of pain
♦ Note: Almost all subjective scales (satisfaction, pain, and depression) are considered ordinal

3. Interval
o Numbers can be placed in meaningful order
o The intervals between the numbers are equal
o It is possible to add and subtract across an interval scale
o There is no true zero, so ratios cannot be calculated
o Examples: temperature in Fahrenheit, IQ
o Note that neither of these has a “true zero”

4. Ratio
• Numbers can be placed in meaningful order
• The intervals between the numbers are equal
• There is a “true” zero, determined by nature, which represents the absence of the phenomenon
• Almost all biomedical measures (weight, pulse rate, and cholesterol level) are of ratio scale
• Examples of a Variable: Weight, Age, Number of minutes spent exercising, Cholesterol
level, Number of weeks pregnant
– Note that all of these do have a “true zero”

Population and Sample
Population
• The population is the group from which the data are to be collected.
• It is always defined first, before data collection begins for any statistical study.
• It need not consist of people; it could be micro-organisms, measurements of rainfall in an area, or a group of people.
• It is the collection of all items of interest or under investigation.
• N represents the population size.
• A numerical characteristic of a population is called a parameter.

Population and Sample
Sample
• The sample is the part of the population that is selected randomly for the study.
• The sample should be selected so that it represents all the characteristics of the population.
• n represents the sample size.
• A numerical characteristic of a sample is called a statistic.

Population vs. Sample
[Figure: a population (items a to z) and a sample drawn from it (b, c, g, i, n, o, r, u, y)]
• Values calculated using population data are called parameters
• Values computed from sample data are called statistics

Cont’d
• Data presentation
– Tabulation
– Graphs
• Data summary measure
– Measures of Location
– Measures of Dispersion
– Measures of Skewness & Kurtosis
• Inferential statistics
– Estimation: Point estimate & Interval estimate
– Hypothesis Testing
• Univariate analysis / multivariate analysis: multivariate analysis adjusts for confounders

Data Presentation
• The overall goal is to get a feeling for the distribution of the data
– Central tendency: most frequently occurring or typical/common values
– Dispersion: how the values are spread out
– Shape and skewness: symmetry or asymmetry of the distribution of the values
– Outliers: unusual values that do not fit the overall pattern of the data

Data Presentation
• Frequency distribution table
– A way of organizing the data in table form
• Table shows
– Possible values of the variable
– Raw frequencies (number of cases with that value)
– Relative frequency (% of cases with that value)
– Cumulative frequency (total % having up to and including a given value of the
variable)

Frequency distribution table
Weight range   Raw frequency   Relative frequency     Cumulative frequency
(interval)     (No)            (% of total sample)    (cumulative %)
45-54          2               5.1                    5.1
55-64          4               10.3                   15.4
65-74          5               12.8                   .
75-84          6               15.4                   .
85-94          11              28.2                   .
95-104         4               10.3                   .
105-114        3               7.7                    .
115-124        2               5.1                    .
125-134        1               2.6                    .
135-144        0               0                      .
145-154        1               2.6                    .
Total          39              100
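The relative and cumulative columns of the table can be derived from the raw frequencies alone. The sketch below is one way to do it in Python (an assumption of this note; the course itself uses SPSS/STATA). The variable names are illustrative, and the counts are copied from the table.

```python
# Raw frequencies copied from the weight table (11 class intervals).
intervals = ["45-54", "55-64", "65-74", "75-84", "85-94", "95-104",
             "105-114", "115-124", "125-134", "135-144", "145-154"]
counts = [2, 4, 5, 6, 11, 4, 3, 2, 1, 0, 1]

total = sum(counts)

# Relative frequency: each count as a percentage of the total sample.
relative = [100 * c / total for c in counts]

# Cumulative frequency: percentage of cases up to and including each interval.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(100 * running / total)

for label, c, rel, cum in zip(intervals, counts, relative, cumulative):
    print(f"{label:>8} {c:>3} {rel:6.1f} {cum:6.1f}")
```

Accumulating the counts first and converting to a percentage afterwards avoids the small drift that comes from adding already-rounded percentages.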

Graphic Presentation
• Graphs present a summary of the data
• It is usually suggested that a graphic representation of the data be examined before proceeding to formal statistical analysis
Common uses of graphs
• Visual representation
• Attractive and easy to understand
• Checking assumptions
• Helping in the selection of statistical tools

Types of Graphs for Categorical Variables
Graphing data
Bar-chart Pie-chart

Quantitative data graphical presentation
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph, etc.

General rules for designing graphs
• A graph should have a self-explanatory title and legend. By convention, the title goes above a table and below a graph.
• A graph should help the reader to understand the data
• Axes should be labeled, with units of measurement indicated
• Scales are important: start at zero; if an axis does not start at zero, mark the break with // on that axis
• Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes the values less easily)

Measures of Central Tendency
1. Mean:
• The arithmetic average of the distribution.
• Most appropriate for interval and ratio level data.
• Sometimes used for ordinal data.
2. Median:
• The value that is in the middle of the distribution, i.e. the 50th percentile.
• Appropriate for ordinal, interval, and ratio level data.
3. Mode:
• The most frequently occurring value.
• There can be multiple modes.
• Appropriate for all measurement levels.

Mean
• Mean is the sum of all of the values of the variable in a given data set divided by
the total number of values
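As a quick illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module (an assumption of this note; the course software is SPSS/STATA). The seven values are borrowed from the quartile example that appears later in these slides.

```python
import statistics

# Seven values taken from the quartile example in these slides.
values = [5, 7, 4, 4, 6, 2, 8]

# Mean: sum of all values divided by the number of values (36 / 7).
mean_value = statistics.mean(values)

# Median: the middle value once the data are sorted.
median_value = statistics.median(values)

# Mode(s): the most frequent value(s); multimode returns a list
# because a data set can have more than one mode.
modes = statistics.multimode(values)

print(mean_value, median_value, modes)
```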

Measures of dispersion or variability
• Overall goal is to get a feeling for the spread of the data.
• Range: The difference between the highest and the lowest value in a data set.
• Interquartile range: The difference between the third (Q3) and the first (Q1) quartiles of the distribution.
• Standard deviation: A measure of the typical distance (deviation) of the data points from the mean.
• Coefficient of variation: Compares the dispersion in two sets of data independently of the unit of measurement.
\[ CV = \frac{SD}{\bar{X}} \times 100 \]

Standard Deviation
• The sample variance (\( s^2 \)) is the sum of the squared deviations from the mean, divided by n − 1 (the number of values minus 1)
• The standard deviation (s) is the square root of the variance
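The dispersion measures defined so far can be checked on a small data set. This is a minimal sketch in Python (not the course's SPSS/STATA workflow); the seven values are borrowed from the quartile example in these slides, and the variable names are illustrative.

```python
import statistics

values = [5, 7, 4, 4, 6, 2, 8]

mean_value = statistics.mean(values)

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_variance = statistics.variance(values)

# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(values)

# Coefficient of variation: SD as a percentage of the mean (unit free).
cv_percent = sample_sd / mean_value * 100

print(value_range, sample_variance, sample_sd, cv_percent)
```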

Quartiles
• Quartiles are the values that divide a list of numbers into quarters:
• Put the list of numbers in order
• Then cut the list into four equal parts
• Example: 5, 7, 4, 4, 6, 2, 8
• Put them in order: 2, 4, 4, 5, 6, 7, 8
• Cut the list into quarters:
• Quartile 1 (Q1) = 4
• Quartile 2 (Q2), which is also the Median, = 5
• Quartile 3 (Q3) = 7

Interquartile Range
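• IQR = 3rd quartile − 1st quartile
• Equivalently, the 75th percentile − the 25th percentile
• Positions in the ordered data: Q3 is the value at position 3(n+1)/4 and Q1 is the value at position (n+1)/4
• Robust to outliers
• Covers the middle 50% of observations
For the quartile example above, the interquartile range is:
IQR = Q3 − Q1 = 7 − 4 = 3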
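The quartile example can be reproduced with Python's standard library. This is a sketch under the assumption that quartile positions follow the (n+1)/4 rule used on the slide, which corresponds to the `"exclusive"` method of `statistics.quantiles`; other software may use a different interpolation rule and return slightly different quartiles.

```python
import statistics

values = [5, 7, 4, 4, 6, 2, 8]

# n=4 splits the data into quarters and returns the three cut points.
# The "exclusive" method places quartile i at position i * (n + 1) / 4
# in the sorted data, matching the rule shown on the slide.
q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")

iqr = q3 - q1

print(q1, q2, q3, iqr)
```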

Which measure to use ?
• If the distribution of data is Symmetric, use/report
–Mean with Standard Deviation
• If the distribution of data is skewed, use/report
–Median with IQR

Measures of shape
• It is necessary to consider the shape of the data, i.e. the manner in which the data are distributed.
• There are two measures of the shape of a data set:
o Skewness and
o Kurtosis.

Skewness
❖ Skewness is a measure of the asymmetry of the distribution of scores
❖ Skewness is defined by the formula:
❖ Skewness:
• a3 > 0: distribution skewed to the right (positively skewed)
• a3 < 0: distribution skewed to the left (negatively skewed)
• a3 = 0: the distribution is symmetrical

Measure of Skew
Positive Skew
Negative Skew
Normal (skew = 0)

Kurtosis
• Kurtosis characterizes the relative Peakedness or flatness of a distribution compared with
the bell-shaped distribution (normal distribution).
• Kurtosis of a sample data set is calculated by the formula:
Kurtosis:
• a4 > 3: heavier tails and a higher peak than a normal distribution
• a4 < 3: lighter tails and a lower peak than a normal distribution
For a meaningful and comparable measure of a4, the distribution should be
symmetrical (hence again the need to have a normal distribution)

Kurtosis
• Kurtosis measures whether the scores are spread out more or less
than they would be in a normal (Gaussian) distribution
Mesokurtic (a4 = 3)
Leptokurtic (a4 > 3)
Platykurtic (a4 < 3)
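The formulas for a3 and a4 are not legible in this copy of the slides. A common choice, assumed here, is the moment coefficient: a3 = m3 / m2^1.5 and a4 = m4 / m2^2, where mk is the k-th central moment computed with divisor n. The sketch below uses that definition; the function name and the two small data sets are purely illustrative, and statistical packages often apply small-sample corrections that give slightly different numbers.

```python
def moment_coefficients(values):
    """Return (a3, a4) using central moments with divisor n (an assumed definition)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean_value) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean_value) ** 4 for x in values) / n  # fourth central moment
    a3 = m3 / m2 ** 1.5  # skewness: 0 for a symmetric data set
    a4 = m4 / m2 ** 2    # kurtosis: 3 for a normal distribution
    return a3, a4


# Hypothetical data sets, chosen only to illustrate the sign of a3.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

a3_sym, a4_sym = moment_coefficients(symmetric)
a3_right, a4_right = moment_coefficients(right_skewed)

print(a3_sym, a4_sym)
print(a3_right, a4_right)
```

The symmetric set gives a3 = 0 and a4 below 3 (platykurtic), while the long right tail of the second set gives a positive a3.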

Basic probability
• Definition and characteristics of probability
• Types of probability
– Objective probability (classical and empirical)
– Subjective probability
• Probability distributions
– Binomial distribution
– Continuous probability distribution (normal distribution)

Normal distribution, Sampling distribution & Estimation
I. Normal Distribution
• One of the most important theoretical (a priori) probability distributions in statistics

Properties of the Normal Distribution
Mean=median=mode
Bell shaped
Symmetrical around the mean
Area under the curve = 1
68% of the data lie within +/- one standard deviation (SD) from the mean
95% of the data lie within +/- two standard deviations (SD) from the mean
>99% of the data lie within +/- three standard deviations (SD) from the mean
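The 68%, 95% and >99% figures can be verified from the standard normal cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist` (an assumption of this note; a Z-table gives the same areas):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1 = coverage(1)
within_2 = coverage(2)
within_3 = coverage(3)

print(f"{within_1:.4f} {within_2:.4f} {within_3:.4f}")
```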

Percentile Ranks & Z-Scores
• Use this formula to convert the data to Z scores:
\[ Z = \frac{x - \mu}{\sigma} \]
• The Z-score of point “x” equals (x minus the mean) divided by the standard deviation
• The Z-score can be looked up in a Z-table to get the percentile rank
• Those points with a positive Z-score have a percentile rank of greater than 50 and those points
with a negative Z-score have a percentile rank of less than 50
• Those points with a Z-score of zero have a percentile rank of exactly 50. They lie at the median, which equals the mean in a normal distribution.

Example
• We have a group of 62 young women with a mean age of 16 years and a standard deviation of 2.94 years. What is the percentile rank of a girl aged 14 years? That is, what percentage of girls are 14 years or younger?
• Step #1: Obtain the Z-score
\[ Z = \frac{14 - 16}{2.94} \approx -0.68 \]
• Step #2: Look this number up in a Z-table (also called “table of the area under the normal curve”)
✓ A Z-score of −0.68 corresponds to an area of 0.2483 under the curve
✓ The percentile rank is 24.83
✓ Thus, 24.83% of the girls are aged 14 or younger
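The two steps of this example can be checked without a printed Z-table. The sketch below uses Python's `statistics.NormalDist` in place of the table lookup (an assumption of this note) and, like the slide, assumes that age is normally distributed.

```python
from statistics import NormalDist

mean_age = 16    # years
sd_age = 2.94    # years
age = 14         # years

# Step 1: convert the observation to a Z-score.
z = (age - mean_age) / sd_age

# Step 2: area under the standard normal curve to the left of z,
# expressed as a percentile rank.
percentile_rank = NormalDist().cdf(z) * 100

print(f"z = {z:.4f}, percentile rank = {percentile_rank:.2f}")
```

The result differs from the table value only in the second decimal place, because the table is read at Z rounded to −0.68.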

Diagnostic Tests
• Diagnostic tests attempt to classify whether somebody has a disease or not before
symptoms are present. There is a need to establish how good a diagnostic test is in
detecting disease.

Diagnostic Tests
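In the formulas below, a, b, c and d are the cells of the 2×2 table of test result by true disease status: a = true positives, b = false positives, c = false negatives, d = true negatives.
1. Sensitivity: This is the proportion of diseased individuals that are correctly identified by the test as having the disease, P(+ve | D).
\[ \text{Sensitivity} = \frac{a}{a + c} \]
2. Specificity: This is the proportion of non-diseased individuals that are correctly identified by the test as not having the disease, P(−ve | ND).
\[ \text{Specificity} = \frac{d}{b + d} \]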

cont’d
3. Positive Predictive Value: This is the proportion of individuals with positive test results that are correctly diagnosed and actually have the disease, P(D | +ve).
\[ \text{PPV} = \frac{a}{a + b} \]
4. Negative Predictive Value: This is the proportion of individuals with negative test results that are correctly diagnosed and do not have the disease, P(ND | −ve).
\[ \text{NPV} = \frac{d}{c + d} \]

Example
• Consider a test used to assess HIV status; if the test returns a positive result, the patient is presumed to have the disease. The true diagnosis is whether the patient truly has HIV or not.
                        True diagnosis
Test result     HIV      Non-HIV     Total
Positive        900      1100        2000
Negative        450      3550        4000
Total           1350     4650        6000

Solution
– Sensitivity: \( \frac{a}{a+c} = \frac{900}{1350} = 0.67 \)
– Specificity: \( \frac{d}{b+d} = \frac{3550}{4650} = 0.76 \)
– PPV: \( \frac{a}{a+b} = \frac{900}{2000} = 0.45 \)
– NPV: \( \frac{d}{c+d} = \frac{3550}{4000} = 0.89 \)
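The four measures follow directly from the cells of the 2×2 table. Below is a minimal sketch in Python; the function name `diagnostic_measures` is hypothetical, and the cell labels follow the convention used in the formulas (a = true positives, b = false positives, c = false negatives, d = true negatives).

```python
def diagnostic_measures(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table cells.

    a: test positive, diseased        b: test positive, not diseased
    c: test negative, diseased        d: test negative, not diseased
    """
    sensitivity = a / (a + c)  # P(test positive | diseased)
    specificity = d / (b + d)  # P(test negative | not diseased)
    ppv = a / (a + b)          # P(diseased | test positive)
    npv = d / (c + d)          # P(not diseased | test negative)
    return sensitivity, specificity, ppv, npv


# Cell counts from the HIV example table.
sensitivity, specificity, ppv, npv = diagnostic_measures(a=900, b=1100, c=450, d=3550)

print(f"Sensitivity {sensitivity:.2f}, Specificity {specificity:.2f}, "
      f"PPV {ppv:.2f}, NPV {npv:.2f}")
```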

Study designs

Study designs cont’d
• Observational: studies that do not involve any intervention or experiment.
• Experimental: studies that entail manipulation of the study factor (exposure) and randomization of subjects to treatment (exposure) groups.

Sampling distributions
♦ Sampling distribution
• The probability distribution of a sample statistic.
• Formed when samples of size n are repeatedly taken from a population.
♦ Examples
– Sampling distribution of sample means
– Sampling distribution of sample proportions

Sampling Distribution of Sample Means
• The sampling distribution consists of the values of the sample means, \( \bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_n \)

Properties of Sampling Distributions of Sample Means
• The mean of the sample means, \( \mu_{\bar{x}} \), is equal to the population mean μ:
\[ \mu_{\bar{x}} = \mu \]
• The standard deviation of the sample means, \( \sigma_{\bar{x}} \), is equal to the population standard deviation σ divided by the square root of the sample size n:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
• This is called the standard error of the mean.

The Central Limit Theorem
• If samples of size n ≥ 30 are drawn from any population with mean μ and standard deviation σ, then the sampling distribution of the sample means approximates a normal distribution. The greater the sample size, the better the approximation.
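A small simulation can make the sampling distribution concrete. The sketch below is an illustration only: the exponential population (mean 1, standard deviation 1), the seed, and the number of repetitions are all arbitrary choices of this note, picked because the population is clearly non-normal.

```python
import random
import statistics

random.seed(2023)  # fixed seed so the run is repeatable

n = 30                 # size of each sample
num_samples = 5000     # number of repeated samples
population_mean = 1.0  # mean of an exponential population with rate 1
population_sd = 1.0    # its standard deviation

# Draw many samples of size n and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = population_sd / n ** 0.5  # sigma / sqrt(n)

print(mean_of_means, sd_of_means, theoretical_se)
```

The mean of the sample means should sit close to the population mean, and their standard deviation close to σ/√n; a histogram of `sample_means` would look roughly bell shaped even though the population is strongly skewed.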

Statistical Inference
• Two types of statistical inference
I. Parameter Estimation
• Estimating a population’s characteristics from sample data
II. Hypothesis Testing
• Testing statements of relationships between two or more variables

Confidence Interval
• Indicates how good (precise) an estimate is, e.g. the sample mean as a point estimate of the population mean
• In general, the CI for a parameter is [estimate ± (critical value × SE of the estimate)]
• It is common to compute a 95% CI; however, other confidence levels can be used, e.g. 99% or 90%.
• Three components are needed to compute a CI: an estimate of the parameter of interest, a critical value (e.g. z, t) and the standard error of the estimate.

Comparison of distributions
Figure 1. Comparison of t and normal distribution

Conditions for using various test statistics (Z, t, or nonparametric)

Hypothesis testing
• A hypothesis is a statement or assertion or assumption or claim or
belief about the state of nature (about the true value of an unknown
population parameter)
• A Hypothesis Test is a statistical procedure that involves formulating a
hypothesis and using sample data to decide on the validity of the
hypothesis (to support or not to support)

Types of Hypotheses
• Null hypothesis (H0): states that there will be no relationship between the two variables
• Alternative (research) hypothesis (Ha): states that there will be a relationship between
the two variables
– Directional (one‐sided or one‐tailed), e.g. a mean higher than/less than another or a
null value, positive/negative
– Nondirectional (two‐sided or two‐tailed), e.g. means not equal

Example
Null hypothesis (H0)
There will be no relationship between height and weight in adolescent boys.
Alternative hypothesis (Ha)
Directional: Height will be positively related to weight in adolescent boys (e.g.,
taller boys will weigh more).
Nondirectional: There will be a relationship between height and weight in
adolescent boys.

Tails of the test

Choosing the appropriate Statistical test
• Type of variables
• Number of groups being compared &
• Sample size

Statistical Tests
Z-test:
• Study variable: Qualitative
• Outcome variable: Quantitative
• Comparison: Sample mean with population mean & two sample means
• Sample size: large in each group (> 30) and the population standard deviation is known
Student’s t-test:
• Study variable: Qualitative
• Outcome variable: Quantitative
• Comparison: sample mean with population mean; two means (independent samples); paired samples.
• Sample size: each group < 30 (can also be used for large samples)

Independent T‐test
• Used to compare two independent (unrelated) groups: tests whether the difference observed between the groups on a continuous outcome variable is greater than could occur by chance alone
• The grouping (independent) variable is dichotomous and the outcome (dependent) variable is continuous.
• Examples of two independent groups: Intervention group and a non‐matched
comparison group, Males and females, Groups with and without a medical condition
• Examples of dependent variable
Length of stay in hospital (days), Age (years), Weight (kilogram)

Example
• Research question: on average, will male dieters lose more weight over a 3‐month period
than female dieters?
• Independent variable (IV): gender (male/female)
• Dependent variable (DV): weight loss in pounds over a 3‐month period
• IV is dichotomous, and DV is continuous
• Report mean and SD of DV for each group

Two types of Independent t‐test
• There are two versions of independent samples t test
• Pooled (equal/homogeneity) variance t‐test
• Separate (unequal) variance t‐test
• We can use Levene’s test to test for homogeneity of variance between the two
groups
• However, SPSS provides both types of analyses by default.

Example: Performing Independent T‐test
• Suppose data come from a study of 17 male and 15 female dieters to test the
following null hypothesis:
• H0: There will be no difference in weight loss between male dieters and female
dieters.
• This hypothesis stems from the research question:
✓Do male dieters lose more weight, on average, than do female dieters?

State Null and Alternative Hypotheses
Null Hypothesis
• 𝐇𝟎: There will be no difference in weight loss between male dieters and female
dieters.
Alternative Hypothesis
• 𝐇𝐀: There will be a significant difference in weight loss between male dieters and
female dieters.

Null and Alternative Hypotheses Notation
Hypotheses
𝐇𝟎 : μ1 = μ2 , or μ1 ‐ μ2 = 0
𝐇𝟏 or 𝐇𝐀 : μ1 ≠ μ2 (two‐tailed or nondirectional)
Where,
μ1 = population mean for first group
μ2 = population mean for second group

Select α and Find the Critical Value
α‐Level
– Statistical significance will be defined as p < 0.05.
Critical Value
– Using a table of critical values of the t distribution (found in any statistics textbook), the critical values that define the rejection region are ±2.042 on 30 df at alpha = 0.05 (two‐tailed/nondirectional test)
– If the t‐statistic is greater than 2.042 or less than −2.042, we will reject the null hypothesis

Assumptions of Independent t‐Test
– The independent variable must be dichotomous (two categories, mutually exclusive)
• The two categories must be independent
– The dependent variable has to be continuous (interval or ratio)
–The dependent variable has to be approximately normally distributed
– If normality assumption is not satisfied, a non parametric test called Mann‐Whitney U
test can be used.

Histogram of weight loss variable

SPSS-independent-t-test

SPSS output

Determine Statistical Significance and State a Conclusion
• Since the computed t test statistic of 3.16 is larger than the critical value of 2.042, we conclude that the difference in mean weight loss between males and females is statistically significant. When SPSS is used, we rely on the p‐value (Sig.) instead.
• In short:
– Men lost an average of 18.6 lbs (SD 6.02) and women lost an average of 12.1 lbs (SD 5.46). This is a statistically significant difference at p < 0.05 by the independent t test.

Interval estimate: Manual calculation
Solution:
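• Point estimate: \( \bar{x}_1 - \bar{x}_2 = 18.59 - 12.13 = 6.46 \)
• Critical value: \( t_{\alpha/2} \) has \( n_1 + n_2 - 2 \) d.f.; with alpha = 0.05, the t-table gives \( t_{30,\,0.05} = 2.042 \)
• Pooled standard deviation:
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(17 - 1) \times 36.25 + (15 - 1) \times 29.83}{17 + 15 - 2}} = 5.77 \]
• Standard error:
\[ SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 5.77 \times \sqrt{\frac{1}{17} + \frac{1}{15}} = 5.77 \times 0.354 = 2.043 \]
• Margin of error: t critical × SE of the estimate = 2.042 × 2.043 = 4.17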

Estimate for the difference
• The confidence interval for μ1 − μ2 is:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
• where \( t_{\alpha/2} \) has \( n_1 + n_2 - 2 \) df, and
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
• 95% CI: 6.46 ± 2.042 × 2.043 ⟹ (LL, UL) ⟹ (6.46 − 4.17, 6.46 + 4.17)
• Ans: 95% CI: (2.29, 10.63); interpretation?

Hypothesis testing: hand calculation
• Step 1: Set up hypotheses and determine the level of significance
• Step 2: Select the appropriate test statistic
• Step 3: Set up the decision rule
• Step 4: Compute the test statistic
• Step 5: Conclusion

Cont’d
\[ t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{6.46}{2.043} = 3.16 \]
which is greater than t-critical.
• Decision rule: reject H0 if |t_cal| > 2.042 (from the t-table)
• Since 3.16 > 2.042, we reject the null hypothesis
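The hand calculation for the pooled-variance t-test and its 95% confidence interval can be reproduced from the summary statistics alone. This is a minimal sketch in Python (the course itself uses SPSS/STATA); the variable names are illustrative, and the critical value 2.042 is hard-coded from the t-table as on the slides rather than computed.

```python
import math

# Summary statistics from the weight-loss example (males, females).
n1, mean1, var1 = 17, 18.59, 36.25
n2, mean2, var2 = 15, 12.13, 29.83

# Two-tailed critical value for alpha = 0.05 on 30 df, read from a t-table.
t_critical = 2.042

# Pooled standard deviation and standard error of the difference in means.
pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)

# Test statistic: difference in means divided by its standard error.
difference = mean1 - mean2
t_statistic = difference / standard_error

# 95% confidence interval for the difference in means.
margin_of_error = t_critical * standard_error
ci_lower = difference - margin_of_error
ci_upper = difference + margin_of_error

reject_null = abs(t_statistic) > t_critical

print(f"sp = {pooled_sd:.2f}, SE = {standard_error:.3f}, t = {t_statistic:.2f}")
print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f}), reject H0: {reject_null}")
```

Note that the test statistic divides by the standard error (2.043), not by the margin of error (4.17); the interval excluding zero and the test rejecting H0 are two views of the same result.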

Summary
• Identify a research question that is based on two independent groups (groups can be
equal or unequal sized)
• Perform t‐test, either equal variance or unequal variance, based on Levene’s test of
homogeneity of variance
• If dependent variable is not approximately normally distributed, Mann‐Whitney U test
(a nonparametric equivalent method) can be used (to appear later in this course)
• If a computer program like SPSS is used, look at p‐value (sig.)
• Interpret results and state a conclusion
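For the separate (unequal) variance version mentioned in the summary, the standard error is built from each group's own variance, and the degrees of freedom come from the Welch–Satterthwaite approximation. The sketch below applies this to the same weight-loss summary statistics; it is an illustration in Python, and the figures SPSS reports on its "equal variances not assumed" row are not reproduced in these slides, so no comparison is claimed.

```python
import math

# Summary statistics from the weight-loss example (males, females).
n1, mean1, var1 = 17, 18.59, 36.25
n2, mean2, var2 = 15, 12.13, 29.83

# Each group's contribution to the variance of the difference in means.
term1 = var1 / n1
term2 = var2 / n2

# Separate-variance standard error and t statistic.
welch_se = math.sqrt(term1 + term2)
welch_t = (mean1 - mean2) / welch_se

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df = (term1 + term2) ** 2 / (term1 ** 2 / (n1 - 1) + term2 ** 2 / (n2 - 1))

print(f"SE = {welch_se:.3f}, t = {welch_t:.2f}, df = {welch_df:.2f}")
```

Because the two sample variances here are similar, the separate-variance result is very close to the pooled one; the two versions diverge when the group variances or sample sizes differ markedly.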

Paired t‐Test
• Used to compare two groups that are dependent or closely related.
• Are the means of two related groups (paired or matched pairs) different from one another?
• Tests whether the difference observed between the groups is greater than could occur by chance alone
• Example
– Population type: pretest‐posttest measures on the same person (over time), twin studies, couples
– Outcome: blood pressure pre‐medication and post‐medication; weight (case and control)

Cont’d
Pretest – posttest study
• Assesses the effect of an intervention on study participants by comparing posttest values of the dependent variable (after the intervention) with pretest values of the same variable (before the intervention)
• Example:
– Weight loss after an education intervention, assessed by comparing the study participants’ post‐intervention weight with their pre‐intervention weight

Cont’d
Matched pairs study
• Examines the effect of an intervention by matching each participant who receives the intervention to a control who does not. Matching criteria can include gender, education, socioeconomic status, etc.
• Example
– Does breast feeding affect bone density?
– Matched study of 58 female twin pairs, one twin with breast feeding and one without; a paired t test was used to examine the difference in bone density
– We want the two groups to be very similar on factors other than breast feeding status, and then compare bone density. Factors that are the same in this matched study include gender, age, BMI, ethnicity and health status (as they are twins).

Steps in computing paired t‐test
– State null hypothesis and alternative hypothesis.
– Define the significance level and the degrees of freedom, and hence the critical value for the computed t test statistic.
– Make sure that the data meet the assumptions for using paired t test.
– Compute the paired t-test statistic.
– Determine statistical significance and state a conclusion.
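The computation step can be sketched as a one-sample t-test on the within-pair differences. The pretest and posttest scores below are hypothetical, invented only to show the arithmetic (the raw data behind the SPSS example are not included in these slides), and the variable names are illustrative.

```python
import math
import statistics

# Hypothetical paired scores for five participants (illustration only).
pretest = [60, 55, 70, 65, 50]
posttest = [68, 63, 75, 72, 61]

# Work with the within-pair differences (posttest minus pretest).
differences = [post - pre for pre, post in zip(pretest, posttest)]
n_pairs = len(differences)

mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)  # sample SD of the differences

# Standard error of the mean difference and the paired t statistic.
standard_error = sd_difference / math.sqrt(n_pairs)
t_statistic = mean_difference / standard_error

# Degrees of freedom: number of pairs minus 1.
degrees_of_freedom = n_pairs - 1

print(mean_difference, t_statistic, degrees_of_freedom)
```

The computed statistic is then compared with the t-table critical value for the given degrees of freedom, exactly as in the independent-samples case.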

Example
Research question
Does a “healthy heart” education program increase knowledge about cardiovascular health in teenagers?
Independent (Grouping) variable
Pretest/posttest measure
Dependent variable
Score on a 100 point quiz about cardiovascular health
Mean and SD for each time and for difference between two time points would be relevant

State Hypotheses
❖Null Hypothesis
• H0: There will be no difference between the pretest and posttest
cardiovascular knowledge
❖Alternative Hypothesis (nondirectional)
• HA: There will be a significant difference in cardiovascular knowledge after
attending the program
❖This hypothesis stems from the research question:
• Does a “healthy heart” education program increase knowledge about cardiovascular health in teenagers?

Specify α Level and Find the Critical Value
1. α‐Level
• Statistical significance will be defined as p<0.05
2. Critical Value
• Using the table of critical values of the t‐table, we find that the critical values that
define the rejection region are ±2.042 (on 30 df)
• If the t‐statistic is greater than 2.042 or more negative (smaller than) −2.042, we will
reject the null hypothesis
• This would be relevant for hand calculations. We would use p‐value (sig.) from a
computer output, e.g. SPSS

Ensure data meet the assumptions
• There are two paired measures of the dependent variable
• Dependent variable is approximately normally distributed.

Paired t test: SPSS procedure

Paired t‐Test: SPSS Output

Determine statistical significance and state a conclusion
• Since the computed t test statistic of 6.935 is greater than the critical value of 2.042, we conclude that the difference in test scores from pretest to posttest is statistically significant.
In short:
• The posttest score was an average of 9.03 points higher than the pretest score, and this
was significant at p < 0.05 by the paired t test. It can be concluded that the teenagers
had significantly higher test scores after going through the educational program.

Summary
• Identify a research question that is based on true or matched pairs (dependent groups)
• Equal sample size across groups as they are paired, so degrees of freedom is number of
pairs minus 1
• Perform paired t‐test (one sample t‐test based on differences)
• If dependent variable is not approximately normally distributed, Wilcoxon Signed rank
test (a nonparametric equivalent method) can be used (to appear later in this course)
• If hand calculations are performed, use t‐distribution table
• If a computer program like SPSS is used, look at p‐value (sig.)
• Interpret results and state a conclusion

Hand calculation: Try using the above example
1. Interval estimate for true knowledge score
differences
2. Test the hypothesis
