Course title: Advanced biostatistics
Course code: ?
Credit hours: 3
21 February 2023 1
Course contents
1. Revision on basic biostatistics
2. Correlation analysis
3. Analysis of variance (ANOVA)
4. Introduction to Non-parametric tests
5. Linear regression
6. Analysis of Categorical Data :
a) Analysis of contingency tables and
b) logistic regression
7. Longitudinal data analysis
8. Survival analysis
Cont’d
• Software package required
– SPSS/STATA
• Assessment methods
a) Formative (40%):
– Individual, group assignments, project, Appraisal, Group presentations
b) Summative (60%):
– Final written exam
I. Review of basic biostatistics
Emiru Merdassa (MSc, Assistant Professor)
Learning objectives
• Review descriptive and inferential statistics
• Identify the levels of measurement of variables used in research
• Describe graphical displays of data appropriate for specific levels of measurement
• Identify measures of central tendency and dispersion appropriate for each level of measurement
• Conduct and interpret the following statistical tests using SPSS/STATA:
– Two-sample t-test for independent samples (unpaired)
– t-test for dependent samples (matched or paired samples)
• Explain the results of t-tests in an understandable way
• Select the correct statistical procedure for different questions
What is Statistics?
1. Collecting data
e.g., Sample, Survey, Observe, Simulate
2. Characterizing data
e.g., Organize/Classify, Count, Summarize
3. Presenting data
e.g., Tables, Charts, Statements
4. Interpreting results
e.g. Infer, Conclude, Specify Confidence
Why?
Data analysis
Decision making
Types of Statistics/biostatistics
1. Descriptive statistics
– Numerical or graphic summaries of data
– Charts, graphs, tables, summary statistics (e.g., mean and standard deviation)
2. Inferential Statistics
– Statistical techniques that allow conclusions to be drawn about the population
– Examples include Chi-square test, t test, ANOVA
Variable
• A variable is any characteristic that can and does assume different values for different people, objects, or events being studied.
For example:
• heart rate,
• the heights of adult males,
• the weights of preschool children,
• the ages of patients seen in a dental clinic.
Types of variables
Measurement Scales
1. Nominal
• Numbers are simply used as a code to represent characteristics
• There is no order to the categories
• The assignment of numbers to categories is arbitrary
• Examples of Variables
– Gender: 1. Male 2. Female
– Ethnicity: 1. Oromo 2. Tigre 3. Amhara 4. Guraghe
2. Ordinal
– Numbers represent categories that can be placed in a meaningful numerical order (e.g., from
lowest to highest)
– There is no information regarding the size of the interval between the different values
– Example of a variable: pain scale
1. No pain
2. A little pain
3. Some pain
4. A lot of pain
♦ Note: Almost all subjective scales (satisfaction, pain, and depression) are considered ordinal
3. Interval
o Numbers can be placed in meaningful order
o The intervals between the numbers are equal
o It is possible to add and subtract across an interval scale
o There is no true zero, so ratios cannot be calculated
o Examples: temperature in Fahrenheit, IQ
o Note that none of these has a “true zero”
4. Ratio
• Numbers can be placed in meaningful order
• The intervals between the numbers are equal
• There is a “true” zero, determined by nature, which represents the absence of the phenomenon
• Almost all biomedical measures (weight, pulse rate, and cholesterol level) are measured on a ratio scale
• Examples of variables: weight, age, number of minutes spent exercising, cholesterol level, number of weeks pregnant
– Note that all of these do have a “true zero”
Population and Sample
Population
• It is the group from which the data are to be collected.
• It is always defined first, before starting the data collection process for any statistical study.
• It need not consist of people; it could be microorganisms, measurements of rainfall in an area, or a group of people.
• It is the collection of all items of interest or under investigation.
• N represents the population size.
• A numerical characteristic of a population is called a parameter.
Population and Sample
Sample
• It is the part of the population that is selected randomly for the study.
• The sample should be selected such that it represents all the characteristics of the population.
• n represents the sample size.
• A numerical characteristic of a sample is called a statistic.
Population vs. Sample
[Figure: a population of items labelled a–z, from which a sample (b, c, g, i, n, o, r, u, y) is drawn]
Values calculated using population data are called parameters; values computed from sample data are called statistics.
Cont’d
• Data presentation
– Tabulation
– Graphs
• Data summary measure
– Measures of Location
– Measures of Dispersion
– Measures of Skewness & Kurtosis
• Inferential statistics
– Estimation: Point estimate & Interval estimate
– Hypothesis Testing
• Univariate/multivariate analysis: multivariable analysis is used to adjust for confounders
Data Presentation
• Overall goal is to get a feeling for the distribution of the data
– Central tendency: most frequently occurring or typical/common values
– Dispersion: how the values are spread out
– Shape and skewness: symmetry or asymmetry of the distribution of the values
– Outliers: unusual values that do not fit the overall pattern of the data
Data Presentation
• Frequency distribution table
– A way of organizing the data in table form
• Table shows
– Possible values of the variable
– Raw frequencies (number of cases with that value)
– Relative frequency (% of cases with that value)
– Cumulative frequency (total % having up to and including a given value of the
variable)
Frequency distribution table
Weight range   Raw frequency   Relative frequency    Cumulative frequency
(interval)     (No.)           (% of total sample)   (cumulative %)
45-54          2               5.1                   5.1
55-64          4               10.3                  15.4
65-74          5               12.8                  .
75-84          6               15.4                  .
85-94          11              28.2                  .
95-104         4               10.3                  .
105-114        3               7.7                   .
115-124        2               5.1                   .
125-134        1               2.6                   .
135-144        0               0                     .
145-154        1               2.6                   .
Total          39              100
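The relative and cumulative percentages can be computed directly from the raw counts. The Python sketch below is illustrative (the function name `frequency_table` is hypothetical); it accumulates the raw counts before converting to percentages, so rounding in the relative column does not build up in the cumulative column.

```python
# Weight intervals and raw frequencies copied from the table above
intervals = ["45-54", "55-64", "65-74", "75-84", "85-94", "95-104",
             "105-114", "115-124", "125-134", "135-144", "145-154"]
counts = [2, 4, 5, 6, 11, 4, 3, 2, 1, 0, 1]


def frequency_table(counts):
    """Return (relative %, cumulative %) lists, rounded to one decimal place."""
    total = sum(counts)
    relative, cumulative = [], []
    running = 0
    for c in counts:
        running += c
        relative.append(round(100 * c / total, 1))
        # Cumulative % is based on the running count, not on rounded percentages
        cumulative.append(round(100 * running / total, 1))
    return relative, cumulative


relative, cumulative = frequency_table(counts)
for label, c, r, cum in zip(intervals, counts, relative, cumulative):
    print(f"{label:>8} {c:>3} {r:>6.1f} {cum:>6.1f}")
```

The printed cumulative column can be used to check a hand-completed table.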
Graphic Presentation
• Graphs simply represent a summary of the data
• It is usually recommended that the graphical representation of the data be examined before proceeding to formal statistical analysis
Common uses of graphs
• Visual representation
• Easy to understand and visually appealing
• To check assumptions
• To help in selecting statistical tools
Types of Graphs for Categorical Variables
• Bar chart
• Pie chart
Quantitative data graphical presentation
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph, etc.
General rules for designing graphs
• A graph should have a self-explanatory legend. Titles: place the title of a table at the top and the title of a graph at the bottom.
• A graph should help the reader to understand the data
• Axes labeled, units of measurement indicated
• Scales are important: start the axis at zero; if it does not start at zero, mark the break with // on the axis
• Avoid graphs with a three-dimensional appearance; they may be misleading (readers perceive values less easily)
Measures of Central Tendency
1. Mean:
• The arithmetic average of the distribution.
• Most appropriate for interval and ratio level data.
• Sometimes used for ordinal data.
2. Median:
• The value that is in the middle of the distribution, i.e. the 50th percentile. Appropriate for ordinal, interval, and
ratio level data.
3. Mode:
• The most frequently occurring value.
• There can be multiple modes. Appropriate for all measurement levels.
Mean
• Mean is the sum of all of the values of the variable in a given data set divided by
the total number of values
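A minimal Python illustration of the three measures of central tendency, using the standard-library `statistics` module. As an assumption, the data are the seven values from the quartile example later in this review.

```python
import statistics

# Assumed illustrative data: the seven values used in the quartile example
data = [5, 7, 4, 4, 6, 2, 8]

mean = statistics.mean(data)        # sum of the values / number of values
median = statistics.median(data)    # middle value of the ordered list
modes = statistics.multimode(data)  # every most-frequent value (there may be several)

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
# mean = 5.14, median = 5, mode(s) = [4]
```

`multimode` is used instead of `mode` because, as noted above, a distribution can have more than one mode.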
Measures of dispersion or variability
• Overall goal is to get a feeling for the spread of the data.
• Range: the difference between the highest and the lowest value in a data set.
• Interquartile range: the difference between the third (Q3) and the first (Q1) quartile of the distribution.
• Standard deviation: roughly, the typical distance (deviation) of the observations from the mean; it is the square root of the variance.
• Coefficient of variation: a unit-free measure of relative dispersion, used to compare the dispersion of data sets measured in different units.
$CV = \frac{SD}{\bar{X}} \times 100$
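Range and coefficient of variation are simple enough to compute directly. The sketch below is illustrative: the function names are hypothetical, and it assumes the sample SD (n − 1 denominator) in the CV, which is one common convention.

```python
import statistics


def data_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


def coefficient_of_variation(values):
    """CV = SD / mean * 100, using the sample SD (n - 1 denominator).

    Only meaningful for ratio-scale data with a positive mean.
    """
    return statistics.stdev(values) / statistics.mean(values) * 100


print(data_range([2, 4, 4, 5, 6, 7, 8]))        # 6
print(coefficient_of_variation([10, 20, 30]))   # 50.0
```

Because the CV divides the SD by the mean, it has no units, which is what allows comparisons across data sets measured in different units.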
Standard Deviation
• The sample variance ($s^2$) is the sum of the squared deviations from the mean, divided by n − 1 (the number of values summed minus 1): $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
• The standard deviation (s) is the square root of the variance: $s = \sqrt{s^2}$
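The definition can be translated almost word for word into code. A minimal sketch with hypothetical data; `sample_variance` and `sample_sd` are illustrative names, not a library API.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
print(sample_variance(values))      # 32 / 7, about 4.571
print(sample_sd(values))            # about 2.138
```

Dividing by n instead of n − 1 would give the population variance, which underestimates the population variance when computed from a sample.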
Quartiles
• Quartiles are the values that divide a list of numbers into quarters:
• Put the list of numbers in order
• Then cut the list into four equal parts
• Example: 5, 7, 4, 4, 6, 2, 8
• Put them in order: 2, 4, 4, 5, 6, 7, 8
• Cut the list into quarters:
• Quartile 1 (Q1) = 4
• Quartile 2 (Q2), which is also the Median, = 5
• Quartile 3 (Q3) = 7
Interquartile Range
• 3rd quartile − 1st quartile
• 75th − 25th percentile
• Q1 is the ((n+1)/4)th and Q3 the (3(n+1)/4)th value in the ordered list
• Robust to outliers
• Covers the middle 50% of observations
The interquartile range for the example above is:
IQR = Q3 − Q1 = 7 − 4 = 3
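The quartile example can be checked with Python's `statistics.quantiles`. Its `method="exclusive"` option uses the (n+1)p positioning shown above; this is an assumption about which convention the slide follows, and software packages use several slightly different quartile definitions, so results for other data sets may differ between programs.

```python
import statistics

data = [5, 7, 4, 4, 6, 2, 8]

# quantiles() sorts the data itself. method="exclusive" places Q1 at ordered
# position (n+1)/4 and Q3 at 3(n+1)/4, interpolating between neighbouring
# values when a position is not a whole number.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)   # 4.0 5.0 7.0 3.0
```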
Which measure to use ?
• If the distribution of data is Symmetric, use/report
–Mean with Standard Deviation
• If the distribution of data is skewed, use/report
–Median with IQR
Measures of shape
• It is necessary to consider the shape of the data – the manner, in which the data
are distributed.
• There are two measures of the shape of a data set:
o Skewness and
o Kurtosis.
Skewness
❖ Skewness is a measure of the symmetry of the distribution of scores
❖ Skewness is defined by the formula:
❖ Skewness:
• a3 > 0: distribution skewed to the right (positively skewed)
• a3 < 0: distribution skewed to the left (negatively skewed)
• a3 = 0: the distribution is symmetrical
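The slide's skewness formula did not survive extraction. As an assumption, the sketch below uses the common moment coefficient a3 = m3 / m2^(3/2), where m_k is the k-th central moment (dividing by n). Statistical packages often report bias-adjusted versions, so their values can differ slightly, especially for small samples. The data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness, a3 = m3 / m2**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


print(skewness([1, 2, 3, 4, 5]))      # 0.0: symmetrical
print(skewness([1, 1, 1, 2, 10]))     # positive: long right tail
print(skewness([1, 9, 10, 10, 10]))   # negative: long left tail
```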
Measure of Skew
Positive Skew
Negative Skew
Normal (skew = 0)
Kurtosis
• Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the bell-shaped (normal) distribution.
• Kurtosis of a sample data set is calculated by the formula:
Kurtosis:
• a4 > 3: heavier tails and a sharper peak than a normal distribution
• a4 < 3: lighter tails and a flatter peak than a normal distribution
For a meaningful and comparable measure of a4, the distribution should be symmetrical (hence again the need to have a normal distribution)
Kurtosis
• Kurtosis measures whether the tails of a distribution are heavier or lighter than they would be in a normal (Gaussian) distribution
Mesokurtic (a4 = 3)
Leptokurtic (a4 > 3)
Platykurtic (a4 < 3)
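As with skewness, the slide's formula is not shown. Assuming the moment coefficient a4 = m4 / m2², which equals 3 for a normal distribution, a minimal sketch with hypothetical data follows. Note that some software reports excess kurtosis (a4 − 3), for which the normal reference value is 0.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def kurtosis(values):
    """Moment coefficient of kurtosis, a4 = m4 / m2**2 (about 3 for normal data)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


flat = [1, 2, 3, 4, 5]                    # evenly spread, no tails
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]  # tall centre with isolated extremes

print(kurtosis(flat))     # about 1.7: platykurtic (a4 < 3)
print(kurtosis(peaked))   # 5.0: leptokurtic (a4 > 3)
```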
Basic probability
• Definition and characteristics of probability
• Types of probability
– Objective probability (classical and empirical)
– Subjective probability
• Probability distribution
– Binomial distribution
– Continuous probability distribution (normal distribution)
Normal distribution, Sampling distribution & Estimation
I. Normal Distribution
• One of the most important theoretical (a priori) probability distributions in statistics
Properties of the Normal Distribution
Mean = median = mode
Bell shaped
Symmetrical around the mean
Area under the curve = 1
About 68% of the data lie within ±1 standard deviation (SD) of the mean
About 95% of the data lie within ±2 SD of the mean (exactly 95% within ±1.96 SD)
About 99.7% of the data lie within ±3 SD of the mean
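The 68-95-99.7 figures can be verified with Python's standard-library `statistics.NormalDist`, which provides the normal cumulative distribution function (`cdf`) and its inverse (`inv_cdf`). A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

areas = {}
for k in (1, 2, 3):
    # area under the curve between mean - k SD and mean + k SD
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {areas[k]:.4f}")

# z value that cuts off the central 95% exactly
z_95 = std_normal.inv_cdf(0.975)
print(f"central 95%: ±{z_95:.2f} SD")
```

The areas come out at about 0.6827, 0.9545 and 0.9973, which is why the rounded percentages above are approximations.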
Percentile Ranks & Z-Scores
• Use this formula to convert the data to Z scores:
$Z = \frac{x - \mu}{\sigma}$
• The Z-score of point “x” equals (x minus the mean) divided by the standard deviation
• The Z-score can be looked up in a Z-table to get the percentile rank
• Those points with a positive Z-score have a percentile rank of greater than 50 and those points
with a negative Z-score have a percentile rank of less than 50
• Those points with a Z-score of zero have a percentile rank of exactly 50. These scores are the
median value.
Example
• We have a group of 62 young women with a mean age of 16 years and a standard deviation of 2.94 years. What would be the percentile rank of a girl aged 14 years? What percentage of girls are 14 years or younger?
• Step #1: Obtain the Z-score
$Z = \frac{14 - 16}{2.94} = -0.6803$
Z-Scores
• Step #2: Look this number up in a Z-table (also called a “table of the area under the normal curve”)
✓ A Z-score of −0.68 corresponds to an area of 0.2483 under the curve (to the left of Z)
✓ The percentile rank is about 24.83
✓ Thus, about 24.83% of the girls are aged 14 or younger (assuming age is approximately normally distributed)
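Instead of a printed Z-table, `statistics.NormalDist().cdf` gives the area to the left of Z. A minimal sketch of the two steps, assuming, as the example does, that age is approximately normally distributed:

```python
from statistics import NormalDist

mean_age, sd_age = 16, 2.94   # summary values from the example
age = 14

z = (age - mean_age) / sd_age            # Step 1: Z-score
area_left = NormalDist().cdf(z)          # Step 2: area to the left of Z
percentile_rank = 100 * area_left

print(f"Z = {z:.4f}, percentile rank = {percentile_rank:.2f}")
```

Using the unrounded Z gives a percentile rank of about 24.82, marginally different from the table value of 24.83 obtained with Z rounded to −0.68.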
Diagnostic Tests
• Diagnostic tests attempt to classify whether somebody has a disease or not before
symptoms are present. There is a need to establish how good a diagnostic test is in
detecting disease.
Diagnostic Tests
1. Sensitivity: the proportion of diseased individuals that are correctly identified by the test as having the disease, P(+ve | D).
Sensitivity = $\frac{a}{a+c}$
2. Specificity: the proportion of non-diseased individuals that are correctly identified by the test as not having the disease, P(−ve | ND).
Specificity = $\frac{d}{b+d}$
cont’d
3. Positive predictive value (PPV): the proportion of individuals with positive test results who actually have the disease, P(D | +ve).
PPV = $\frac{a}{a+b}$
4. Negative predictive value (NPV): the proportion of individuals with negative test results who actually do not have the disease, P(ND | −ve).
NPV = $\frac{d}{c+d}$
Here a = true positives, b = false positives, c = false negatives, and d = true negatives (test result in rows, true disease status in columns).
Example
• Consider a test used to assess HIV status; if the test returns a positive result, the patient is presumed to have the disease. The true diagnosis is whether the patient truly has HIV or not.
                        True diagnosis
                        HIV     Non-HIV   Total
Test result  Positive   900     1100      2000
             Negative   450     3550      4000
Total                   1350    4650      6000
Solution
– Sensitivity = $\frac{a}{a+c} = \frac{900}{1350} = 0.67$
– Specificity = $\frac{d}{b+d} = \frac{3550}{4650} = 0.76$
– PPV = $\frac{a}{a+b} = \frac{900}{2000} = 0.45$
– NPV = $\frac{d}{c+d} = \frac{3550}{4000} = 0.89$
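The four measures are simple ratios of the 2×2 cells, so they are easy to script. A minimal sketch; the function name `diagnostic_summary` is hypothetical, and the cell labels follow the a, b, c, d convention defined above.

```python
def diagnostic_summary(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),   # P(+ve | D)
        "specificity": d / (b + d),   # P(-ve | ND)
        "ppv": a / (a + b),           # P(D | +ve)
        "npv": d / (c + d),           # P(ND | -ve)
    }


# Cell counts from the HIV example
result = diagnostic_summary(a=900, b=1100, c=450, d=3550)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the tested group, so they do not transfer directly to populations with a different prevalence.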
Study designs
Study designs cont’d
• Observational: studies do not involve any intervention
or experiment.
• Experimental: studies that entail manipulation of the
study factor (exposure) and randomization of subjects
to treatment (exposure) groups
Sampling distributions
♦ Sampling distribution
• The probability distribution of a sample statistic.
• Formed when samples of size n are repeatedly taken from a population.
♦ Example
– Sampling distribution of sample means
– Sampling distribution of sample proportions
Sampling Distribution of Sample Means
• The sampling distribution consists of the values of the sample means, $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots, \bar{x}_n$
Properties of Sampling Distributions of Sample Means
The mean of the sample means, $\mu_{\bar{x}}$, is equal to the population mean μ:
$\mu_{\bar{x}} = \mu$
The standard deviation of the sample means, $\sigma_{\bar{x}}$, is equal to the population standard deviation σ divided by the square root of the sample size n:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
This is called the standard error of the mean.
The Central Limit Theorem
• If samples of size n ≥ 30 are drawn from any population with mean μ and standard deviation σ, then the sampling distribution of the sample means approximates a normal distribution. The greater the sample size, the better the approximation. (n ≥ 30 is a common rule of thumb; strongly skewed populations may need larger samples.)
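The two properties of the sampling distribution and the central limit theorem can be checked by simulation. The sketch below is illustrative only: it assumes an exponential population with mean 1 and SD 1 (clearly non-normal and right-skewed), draws 5,000 samples of size 30, and compares the mean and SD of the sample means with μ and σ/√n.

```python
import math
import random
import statistics

random.seed(1)          # fixed seed so the run is reproducible
n, reps = 30, 5000      # sample size and number of repeated samples
mu, sigma = 1.0, 1.0    # exponential population with rate 1: mean 1, SD 1

# Draw `reps` samples of size n and keep each sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means = {mean_of_means:.3f} (population mean = {mu})")
print(f"SD of sample means   = {sd_of_means:.3f} (sigma / sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.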
Statistical Inference
• Two types of statistical inference
I. Parameter Estimation
• Estimating a population’s characteristics from sample data
II. Hypothesis Testing
• Testing statements of relationships between two or more variables
Confidence Interval
• Provides an idea of how good (precise) an estimate is, e.g. the sample mean as a point estimate of the population mean
• In general, a CI for a parameter is [estimate ± (critical value × SE of the estimate)]
• It is common to compute a 95% CI; however, other confidence levels can be used, e.g. 99% or 90%.
• Three components are needed to compute a CI: an estimate of the parameter of interest, a critical value (e.g. z, t), and the standard error of the estimate.
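The general recipe "estimate ± critical value × SE" can be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, it uses a z (standard normal) critical value, and the illustration reuses the age example's summary values (mean 16, SD 2.94, n = 62).

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Return (lower, upper) = estimate ± z critical value × SE."""
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    margin = crit * se
    return estimate - margin, estimate + margin


# Illustration with the age example's summary values: mean 16, SD 2.94, n = 62
se_mean = 2.94 / math.sqrt(62)        # standard error of the mean
low, high = confidence_interval(16, se_mean)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The z critical value is a large-sample approximation. Because the SD here is estimated from the sample, a t critical value (about 2.00 on 61 df) would give a slightly wider interval.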
Comparison of distributions
Figure 1. Comparison of t and normal distribution
Conditions for using the various test statistics [Z, t, or nonparametric]
Hypothesis testing
• A hypothesis is a statement or assertion or assumption or claim or
belief about the state of nature (about the true value of an unknown
population parameter)
• A Hypothesis Test is a statistical procedure that involves formulating a
hypothesis and using sample data to decide on the validity of the
hypothesis (to support or not to support)
Types of Hypotheses
• Null hypothesis (H0): states that there will be no relationship between the two variables
• Alternative (research) hypothesis (Ha): states that there will be a relationship between
the two variables
– Directional (one‐sided or one‐tailed), e.g. a mean higher than/less than another or a
null value, positive/negative
– Nondirectional (two‐sided or two‐tailed), e.g. means not equal
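To see how the choice of tails affects a result, the sketch below converts a standard-normal (z) test statistic into two-sided and one-sided p-values. The statistic z = 2.0 is a hypothetical value, the function name is illustrative, and the one-sided version assumes the alternative is in the upper direction (e.g. a mean higher than the null value).

```python
from statistics import NormalDist


def p_values(z):
    """Return (two-sided, upper one-sided) p-values for a z test statistic."""
    upper_tail = 1 - NormalDist().cdf(z)            # P(Z >= z), for Ha: parameter > null value
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|), for Ha: parameter != null value
    return two_sided, upper_tail


two_sided, one_sided = p_values(2.0)   # hypothetical test statistic
print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

For the same statistic, the one-sided p-value is half the two-sided one, which is why the direction of the alternative hypothesis must be chosen before the data are examined.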
Example
Null hypothesis (H0)
There will be no relationship between height and weight in adolescent boys.
Alternative hypothesis (Ha)
Directional: Height will be positively related to weight in adolescent boys (e.g.,
taller boys will weigh more).
Nondirectional: There will be a relationship between height and weight in
adolescent boys.
Tails of the test
Choosing the appropriate Statistical test
• Type of variables
• Number of groups being compared &
• Sample size
Statistical Tests
Z-test:
• Study variable: Qualitative
• Outcome variable: Quantitative
• Comparison: Sample mean with population mean & two sample means
• Sample size: larger in each group (>30) and the standard deviation is known
Student’s t-test:
• Study variable: Qualitative
• Outcome variable: Quantitative
• Comparison: sample mean with population mean; two means (independent samples); paired samples.
• Sample size: each group <30 (can also be used for large sample sizes)
Independent T‐test
• Used to compare two independent (unrelated) groups: whether the difference observed between the groups on a continuous outcome variable is greater than would be expected by chance alone
• The grouping (independent) variable is dichotomous and the outcome (dependent) variable is continuous.
• Examples of two independent groups: an intervention group and a non-matched comparison group; males and females; groups with and without a medical condition
• Examples of dependent variables: length of stay in hospital (days), age (years), weight (kilograms)
Example
• Research question: on average, will male dieters lose more weight over a 3‐month period than female dieters?
o Independent variable (IV): gender (male/female)
o Dependent variable (DV): weight loss in pounds over a 3‐month period
o IV is dichotomous, and DV is continuous
• Report the mean and SD of the DV for each group
Two types of Independent t‐test
• There are two versions of independent samples t test
• Pooled (equal/homogeneity) variance t‐test
• Separate (unequal) variance t‐test
• We can use Levene’s test to test for homogeneity of variance between the two
groups
• SPSS reports both versions by default, together with Levene’s test.
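Both versions can be computed from summary statistics. A sketch using the dieting example's means and variances (the variances 36.25 and 29.83 are the SDs squared, as in the manual calculation later in this section); the function names are hypothetical, and the Welch degrees of freedom use the standard Welch-Satterthwaite approximation.

```python
import math


def pooled_t(m1, v1, n1, m2, v2, n2):
    """Equal-variance (pooled) t statistic and df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


def welch_t(m1, v1, n1, m2, v2, n2):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    a, b = v1 / n1, v2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# (mean, variance, n) for men, then for women, from the dieting example
args = (18.59, 36.25, 17, 12.13, 29.83, 15)
t_p, df_p = pooled_t(*args)
t_w, df_w = welch_t(*args)
print(f"pooled: t = {t_p:.2f}, df = {df_p}")
print(f"Welch:  t = {t_w:.2f}, df = {df_w:.1f}")
```

Here the two versions agree closely (t ≈ 3.16 on 30 df versus t ≈ 3.18 on about 30 df) because the two group variances are similar; they diverge more when variances and group sizes differ markedly.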
Example: Performing Independent T‐test
• Suppose data come from a study of 17 male and 15 female dieters to test the
following null hypothesis:
• H0: There will be no difference in weight loss between male dieters and female
dieters.
• This hypothesis stems from the research question:
✓Do male dieters lose more weight, on average, than do female dieters?
State Null and Alternative Hypotheses
Null Hypothesis
• 𝐇𝟎: There will be no difference in weight loss between male dieters and female
dieters.
Alternative Hypothesis
• 𝐇𝐀: There will be a difference in weight loss between male dieters and female dieters.
Null and Alternative Hypotheses Notation
Hypotheses
𝐇𝟎 : μ1 = μ2 , or μ1 ‐ μ2 = 0
𝐇𝟏 or 𝐇𝐀 : μ1 ≠ μ2 (two‐tailed or nondirectional)
Where,
μ1 = population mean for first group
μ2 = population mean for second group
Select α and Find the Critical Value
α‐Level
– Statistical significance will be defined as p < 0.05.
Critical Value
– Using a table of critical values of the t-distribution (found in any statistics textbook), we find that the critical values that define the rejection region are ±2.042 on 30 df at alpha = 0.05 (two‐tailed/nondirectional test)
– If the t‐statistic is greater than 2.042 or smaller than −2.042, we will reject the null hypothesis
Assumptions of Independent t‐Test
– The independent variable must be dichotomous (two categories, mutually exclusive)
• The two categories must be independent
– The dependent variable has to be continuous (interval or ratio)
– The dependent variable has to be approximately normally distributed
– If the normality assumption is not satisfied, a nonparametric test called the Mann‐Whitney U test can be used.
Histogram of weight loss variable
SPSS independent t-test
Cont’d
SPSS output
Determine Statistical Significance and State a Conclusion
• Since the computed t test statistic of 3.16 is larger than the critical value of 2.042, we conclude that the difference in mean weight loss between males and females is statistically significant. When SPSS is used, the decision is based on the p‐value (Sig.).
• In short:
– Men lost an average of 18.6 lbs (SD 6.02) and women lost an average of 12.1 lbs (SD 5.46). This is a statistically significant difference at p < 0.05 by the independent t test.
Interval estimate: Manual calculation
Solution:
• Point estimate: $\bar{x}_1 - \bar{x}_2 = 18.59 - 12.13 = 6.46$
• Critical value: $t_{\alpha/2}$ has $(n_1 + n_2 - 2)$ df; with α = 0.05 (two-tailed), the t-table gives $t_{30,\,0.05} = 2.042$
• Pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(17 - 1) \times 36.25 + (15 - 1) \times 29.83}{17 + 15 - 2}} = 5.77$
• Standard error: $SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 5.77 \times \sqrt{\frac{1}{17} + \frac{1}{15}} = 5.77 \times 0.354 = 2.043$
• Margin of error: t critical × SE of the estimate = 2.042 × 2.043 = 4.17
Estimate for the difference
• The confidence interval for μ1 − μ2 is: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• where $t_{\alpha/2}$ has $(n_1 + n_2 - 2)$ df, and $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
• 95% CI: 6.46 ± 2.042 × 2.043 ⟹ (LL, UL) = (6.46 − 4.17, 6.46 + 4.17)
• Ans: 95% CI: (2.29, 10.63); interpretation?
Hypothesis testing: hand calculation
• Step 1: Set up the hypotheses and determine the level of significance
• Step 2: Select the appropriate test statistic
• Step 3: Set up the decision rule
• Step 4: Compute the test statistic
• Step 5: State the conclusion
Cont’d
$t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{6.46}{2.043} = 3.16$, which is greater than the critical t value
• Decision rule: reject $H_0$ if $|t_{cal}| > 2.042$ (from the t-table, two-tailed, 30 df)
Since 3.16 > 2.042, we reject the null hypothesis.
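The hand calculation above (pooled SD, standard error, confidence interval, test statistic, and decision) can be reproduced in a few lines. This is a sketch from the example's summary statistics only; the critical value 2.042 is taken from a t-table, as on the slides, rather than computed.

```python
import math

# Summary statistics from the example (group 1 = men, group 2 = women)
n1, mean1, var1 = 17, 18.59, 36.25
n2, mean2, var2 = 15, 12.13, 29.83
t_crit = 2.042   # two-tailed, alpha = 0.05, df = n1 + n2 - 2 = 30 (from a t-table)

diff = mean1 - mean2                                                   # point estimate
sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))  # pooled SD
se = sp * math.sqrt(1 / n1 + 1 / n2)                                   # SE of the difference
margin = t_crit * se
ci = (diff - margin, diff + margin)
t_calc = diff / se        # divide by the SE, not by the margin of error
reject_h0 = abs(t_calc) > t_crit

print(f"sp = {sp:.2f}, SE = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t_calc:.2f}, reject H0: {reject_h0}")
```

The interval (2.29, 10.63) excludes 0, which agrees with rejecting H0 at α = 0.05 in the two-tailed test.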
Summary
• Identify a research question that is based on two independent groups (groups can be
equal or unequal sized)
• Perform t‐test, either equal variance or unequal variance, based on Levene’s test of
homogeneity of variance
• If dependent variable is not approximately normally distributed, Mann‐Whitney U test
(a nonparametric equivalent method) can be used (to appear later in this course)
• If a computer program like SPSS is used, look at p‐value (sig.)
• Interpret results and state a conclusion
Paired t‐Test
• Used to compare two groups that are dependent or closely related.
• Are the means of two related groups (paired or matched pairs) different from one another?
• Whether the difference observed between the groups is greater than would be expected by chance alone
• Example
– Population type: pretest‐posttest measures on the same person (time), twin studies, couples
– Outcome: blood pressure pre-medication and post‐medication, weight (case and control)
Cont’d
Pretest – posttest study
• Effect of an intervention on study participants by comparing posttest values
of the dependent variable after the intervention to the pretest values of the
same variable before the intervention
• Example:
– Weight loss after an education intervention, assessed by comparing the study participants’ post‐intervention weight with their pre‐intervention weight
Cont’d
Matched pairs study
• Examine the effect of an intervention in which a participant who receives the intervention is
matched to a control who does not receive the intervention, matching criteria can be
gender, education, socioeconomic status etc.
• Example
– Does breast feeding affect bone density?
– Matched study, 58 female twin pairs, one with breast feeding, one not, used paired t test to examine difference in
bone density
– We want the two groups to be very similar on factors other than breast feeding status and observe bone density!
Other factors that are the same in this matched study are gender, age, BMI, ethnicity, health status (as they are
twins!)
Steps in computing paired t‐test
– State null hypothesis and alternative hypothesis.
– Define the significance level, the degrees of freedom, and thus the critical value for the computed t test statistic.
– Make sure that the data meet the assumptions for using the paired t test.
– Compute the paired t-test statistic.
– Determine statistical significance and state a conclusion.
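A paired t-test is a one-sample t-test on the within-pair differences. The sketch below follows the steps above using hypothetical pretest/posttest scores for five students (not the lecture's data, which appear only in the SPSS output); the function name is illustrative, and the critical value is passed in from a t-table for df = number of pairs − 1.

```python
import math
import statistics


def paired_t(before, after, t_crit):
    """Paired t-test as a one-sample t-test on the differences (after - before).

    Returns (t statistic, degrees of freedom, confidence interval for the mean difference).
    t_crit must be looked up for df = number of pairs - 1.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)                                   # number of pairs
    mean_d = statistics.mean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n)    # SE of the mean difference
    t = mean_d / se_d
    ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)
    return t, n - 1, ci


# Hypothetical scores for 5 students; these are NOT the lecture's data
pre = [50, 55, 60, 65, 70]
post = [52, 59, 66, 73, 80]

# 2.776: two-tailed critical t for alpha = 0.05 and df = 4 (from a t-table)
t, df, ci = paired_t(pre, post, t_crit=2.776)
print(f"t = {t:.3f} on {df} df, 95% CI for mean difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Working with the differences is what accounts for the pairing: between-person variation cancels out, leaving only the variation in the changes.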
Example
Research question
Does a “healthy heart” education program increase cardiovascular knowledge in teenagers?
Independent (Grouping) variable
Pretest/posttest measure
Dependent variable
Score on a 100-point quiz about cardiovascular health
Mean and SD for each time point and for the difference between the two time points would be relevant
State Hypotheses
❖Null Hypothesis
• H0: There will be no difference between the pretest and posttest
cardiovascular knowledge
❖Alternative Hypothesis (nondirectional)
• HA: There will be a difference in cardiovascular knowledge after attending the program
❖This hypothesis stems from the research question:
• Does a “healthy heart” education program increase cardiovascular knowledge in teenagers?
Specify α Level and Find the Critical Value
1. α‐Level
• Statistical significance will be defined as p<0.05
2. Critical Value
• Using the table of critical values of the t‐table, we find that the critical values that
define the rejection region are ±2.042 (on 30 df)
• If the t‐statistic is greater than 2.042 or more negative (smaller than) −2.042, we will
reject the null hypothesis
• This would be relevant for hand calculations. We would use p‐value (sig.) from a
computer output, e.g. SPSS
Ensure data meet the assumptions
• There are two paired measures of the dependent variable
• Dependent variable is approximately normally distributed.
Paired t test: SPSS procedure
cont’d
Paired t‐Test: SPSS Output
Determine statistical significance and state a conclusion
• Since the computed t test statistic of 6.935 is greater than the critical value of 2.042, we conclude that the difference in test scores from pretest to posttest is statistically significant.
In short:
• The posttest score was on average 9.03 points higher than the pretest score, and this difference was significant at p < 0.05 by the paired t test. It can be concluded that the teenagers had significantly higher test scores after going through the educational program.
Summary
• Identify a research question that is based on true or matched pairs (dependent groups)
• The sample sizes are equal across the two measurements because the observations are paired, so the degrees of freedom equal the number of pairs minus 1
• Perform a paired t‐test (a one-sample t‐test based on the differences)
• If the dependent variable is not approximately normally distributed, the Wilcoxon signed-rank test (a nonparametric equivalent method) can be used (to appear later in this course)
• If hand calculations are performed, use a t‐distribution table
• If a computer program like SPSS is used, look at the p‐value (Sig.)
• Interpret the results and state a conclusion
Hand calculation: Try using the above example
1. Interval estimate for the true difference in knowledge scores
2. Test the hypothesis

Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
 
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
 
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
Manyata Tech Park ( Call Girls ) Bangalore ✔ 6297143586 ✔ Hot Model With Sexy...
 
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
Call Girls Service Jaipur {9521753030} ❤️VVIP RIDDHI Call Girl in Jaipur Raja...
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
 
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 8250077686 Top Class Call Girl Service Available
 
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
 
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
 
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
Call Girls Service Jaipur {8445551418} ❤️VVIP BHAWNA Call Girl in Jaipur Raja...
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 

1Basic biostatistics.pdf

  • 1. Course title: Advanced biostatistics. Course code: ? Credit hours: 3
  • 2. Course contents: 1. Revision of basic biostatistics 2. Correlation analysis 3. Analysis of variance (ANOVA) 4. Introduction to nonparametric tests 5. Linear regression 6. Analysis of categorical data: a) analysis of contingency tables and b) logistic regression 7. Longitudinal data analysis; survival analysis
  • 3. Cont’d • Software package required: SPSS/STATA • Assessment methods: a) Formative (40%): individual and group assignments, project, appraisal, group presentations b) Summative (60%): final written exam
  • 4. I. Review of basic biostatistics. Emiru Merdassa (MSc, Assistant Professor)
  • 5. Learning objectives • Review descriptive and inferential statistics • Identify the levels of measurement of variables used in research • Describe graphical displays of data appropriate for specific levels of measurement • Identify measures of central tendency and dispersion appropriate for each level of measurement • Conduct and interpret the following statistical tests using SPSS/STATA: – two-sample t-test for independent (unpaired) samples – t-test for dependent (matched or paired) samples • Explain the results of t-tests in an understandable way • Select the correct statistical procedure for different questions
  • 6. What is Statistics? 1. Collecting data, e.g., sample, survey, observe, simulate 2. Characterizing data, e.g., organize/classify, count, summarize 3. Presenting data, e.g., tables, charts, statements 4. Interpreting results, e.g., infer, conclude, specify confidence. Why? For data analysis and decision making.
  • 7. Types of statistics/biostatistics 1. Descriptive statistics – Numerical or graphic summaries of data – Charts, graphs, tables, summary statistics (e.g., mean and standard deviation) 2. Inferential statistics – Statistical techniques that allow conclusions to be drawn about the population from a sample – Examples include the chi-square test, t-test and ANOVA
  • 8. Variable  A variable is any characteristic that can and does assume different values for different people, objects, or events being studied For example: • heart rate, • the heights of adult males, • the weights of preschool children, • the ages of patients seen in a dental clinic. 21 February 2023 8
  • 9. Types of variables
  • 10. Measurement scales 1. Nominal • Numbers are simply used as codes to represent characteristics • There is no order to the categories • The assignment of numbers to categories is arbitrary • Example variables: – Gender: 1. Male 2. Female – Ethnicity: 1. Oromo 2. Tigre 3. Amhara 4. Guraghe
  • 11. 2. Ordinal – Numbers represent categories that can be placed in a meaningful order (e.g., from lowest to highest) – There is no information about the size of the interval between the different values – Example variable: pain scale 1. No pain 2. A little pain 3. Some pain 4. A lot of pain ♦ Note: almost all subjective scales (satisfaction, pain, depression) are considered ordinal
  • 12. 3. Interval • Numbers can be placed in a meaningful order • The intervals between the numbers are equal • It is possible to add and subtract across an interval scale • There is no true zero, so ratios cannot be calculated • Examples: temperature in Fahrenheit, IQ • Note that none of these has a “true zero”
  • 13. 4. Ratio • Numbers can be placed in a meaningful order • The intervals between the numbers are equal • There is a “true” zero, determined by nature, that represents the absence of the phenomenon • Almost all biomedical measures (weight, pulse rate, cholesterol level) are on a ratio scale • Example variables: weight, age, number of minutes spent exercising, cholesterol level, number of weeks pregnant – Note that all of these have a “true zero”
  • 14. Population and Sample. Population • The group targeted for data collection • It is always defined first, before data collection starts in any statistical study • It need not consist of people; it could be microorganisms, rainfall measurements in an area, or a group of people • It is the collection of all items of interest or under investigation • N represents the population size • A numerical characteristic of a population is called a parameter
  • 15. Population and Sample. Sample • The part of the population that is selected randomly for the study • The sample should be selected so that it represents the characteristics of the population • n represents the sample size • A numerical characteristic of a sample is called a statistic
  • 16. Population vs. Sample [Figure: a population of elements a–z and a sample drawn from it (b, c, g, i, n, o, r, u, y)] Values calculated using population data are called parameters. Values computed from sample data are called statistics.
  • 17. Cont’d • Data presentation – Tabulation – Graphs • Data summary measures – Measures of location – Measures of dispersion – Measures of skewness and kurtosis • Inferential statistics – Estimation: point estimate and interval estimate – Hypothesis testing • Univariate/multivariate analysis (multivariate analysis to adjust for confounders)
  • 18. Data Presentation • The overall goal is to get a feeling for the distribution of the data: – Central tendency: the most frequently occurring or typical values – Dispersion: how the values are spread out – Shape and skewness: symmetry or asymmetry of the distribution of the values – Outliers: unusual values that do not fit the overall pattern of the data
  • 19. Data Presentation • Frequency distribution table: a way of organizing data in table form • The table shows – the possible values of the variable – raw frequencies (number of cases with each value) – relative frequency (% of cases with each value) – cumulative frequency (total % having up to and including a given value of the variable)
  • 20. Frequency distribution table

| Weight range (interval) | Raw frequency (No) | Relative frequency (% of total sample) | Cumulative frequency (cumulative %) |
|---|---|---|---|
| 45-54 | 2 | 5.1 | 5.1 |
| 55-64 | 4 | 10.3 | 15.4 |
| 65-74 | 5 | 12.8 | . |
| 75-84 | 6 | 15.4 | . |
| 85-94 | 11 | 28.2 | . |
| 95-104 | 4 | 10.3 | . |
| 105-114 | 3 | 7.7 | . |
| 115-124 | 2 | 5.1 | . |
| 125-134 | 1 | 2.6 | . |
| 135-144 | 0 | 0 | . |
| 145-154 | 1 | 2.6 | . |
| Total | 39 | 100 | |

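The blank cells in the cumulative column can be filled in directly from the raw frequencies. The sketch below is a minimal Python example that uses only the standard library and is not part of the original course material. It recomputes the relative and cumulative percentages from the counts in the weight table. The variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and raw frequencies copied from the weight table (n = 39)
intervals = ["45-54", "55-64", "65-74", "75-84", "85-94", "95-104",
             "105-114", "115-124", "125-134", "135-144", "145-154"]
counts = [2, 4, 5, 6, 11, 4, 3, 2, 1, 0, 1]

n = sum(counts)
relative = [100 * c / n for c in counts]                 # % of total sample
cumulative = [100 * k / n for k in accumulate(counts)]   # running total of counts, as %

print(f"{'Interval':>8} {'No':>3} {'Rel %':>6} {'Cum %':>6}")
for label, c, rel, cum in zip(intervals, counts, relative, cumulative):
    print(f"{label:>8} {c:>3} {rel:6.1f} {cum:6.1f}")
```

The running total is taken over the counts, not over the rounded relative percentages, so the last row comes out at exactly 100. If you add up the rounded relative column instead, the total drifts to 100.1.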
  • 21. Graphic Presentation • Graphs represent a summary of the data • It is usually recommended to examine a graphical representation of the data before proceeding to formal statistical analysis. Common uses of graphs: • visual representation • easy to understand and attractive • checking assumptions • helping to select statistical tools
  • 22. Types of graphs for categorical variables: bar chart, pie chart
  • 23. Graphical presentation of quantitative data • Histogram • Stem-and-leaf plot • Box plot • Scatter plot • Line graph, etc.
  • 24. General rules for designing graphs • A graph should have a self-explanatory legend. Title: place it at the top of a table and at the bottom of a graph • A graph should help the reader understand the data • Label the axes and indicate the units of measurement • Scales are important: start the axis at zero; if it does not start at zero, mark the break with // on the axis • Avoid graphs with a three-dimensional impression; they may be misleading (readers perceive them less easily)
  • 25. Measures of Central Tendency 1. Mean • The arithmetic average of the distribution • Most appropriate for interval and ratio data • Sometimes used for ordinal data 2. Median • The value in the middle of the ordered distribution, i.e., the 50th percentile • Appropriate for ordinal, interval, and ratio data 3. Mode • The most frequently occurring value • There can be more than one mode • Appropriate for all levels of measurement
  • 26. Mean • The mean is the sum of all values of the variable in a data set divided by the total number of values: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
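The sketch below illustrates the three measures of central tendency. It computes the mean from its definition and uses Python's standard `statistics` module for the median and the mode. The data are the seven values from the quartile example on slide 29. The course itself uses SPSS/STATA, so this Python version is only a stand-in chosen for illustration.

```python
import statistics

# Values reused from the quartile example: 5, 7, 4, 4, 6, 2, 8
data = [5, 7, 4, 4, 6, 2, 8]

mean = sum(data) / len(data)          # x̄ = Σx / n
median = statistics.median(data)      # middle value of the ordered data
modes = statistics.multimode(data)    # every most-frequent value (there may be several)

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
```

`statistics.multimode` returns a list because a distribution can have more than one mode.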
  • 27. Measures of dispersion or variability • The overall goal is to get a feeling for the spread of the data. • Range: the difference between the highest and the lowest value in a data set. • Interquartile range: the difference between the third (Q3) and the first (Q1) quartile of the distribution. • Standard deviation: roughly, the typical distance (deviation) of the observations from the mean; it is the square root of the variance. • Coefficient of variation: a unit-free measure of relative dispersion, used to compare the spread of two data sets even when they are measured in different units: $CV = \frac{s}{\bar{x}} \times 100\%$
  • 28. Standard Deviation • The sample variance ($s^2$) is the sum of the squared deviations from the mean divided by $n-1$ (the number of values summed minus 1): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ • The standard deviation ($s$) is the square root of the variance: $s = \sqrt{s^2}$
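The variance and standard deviation formulas translate directly into code. The Python sketch below is for illustration only. It applies the formulas to the same seven values and cross-checks the results against `statistics.variance` and `statistics.stdev`, which also divide by n − 1. It also computes the range and the coefficient of variation described on slide 27.

```python
import math
import statistics

data = [5, 7, 4, 4, 6, 2, 8]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)                 # standard deviation

value_range = max(data) - min(data)      # highest minus lowest value
cv = sd / mean * 100                     # coefficient of variation, in percent

# Cross-check against the standard library, which also uses n - 1
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))

print(f"range = {value_range}, s^2 = {variance:.3f}, s = {sd:.3f}, CV = {cv:.1f}%")
```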
  • 29. Quartiles • Quartiles are the values that divide an ordered list of numbers into quarters: • Put the list of numbers in order • Then cut the list into four equal parts • Example: 5, 7, 4, 4, 6, 2, 8 • In order: 2, 4, 4, 5, 6, 7, 8 • Cut into quarters: • Quartile 1 (Q1) = 4 • Quartile 2 (Q2), which is also the median, = 5 • Quartile 3 (Q3) = 7
  • 30. Interquartile Range 3rd quartile – 1st quartile  75th – 25th percentile 3(n+1)/4 - (n+1)/4 Robust to outliers Middle 50% of observations The Interquartile Range is: IQR = Q3 − Q1 = 7 − 4 = 3 21 February 2023 30
  • 31. Which measure to use? • If the distribution of the data is symmetric, report the mean with the standard deviation • If the distribution of the data is skewed, report the median with the IQR
  • 32. Measures of shape • It is necessary to consider the shape of the data, i.e., the manner in which the data are distributed. • There are two measures of the shape of a data set: skewness and kurtosis.
  • 33. Skewness ❖ Skewness is a measure of the symmetry of the distribution of scores ❖ A common definition is the moment coefficient of skewness, $a_3 = \frac{m_3}{m_2^{3/2}}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$ ❖ Interpretation: • $a_3 > 0$: distribution skewed to the right (positively skewed) • $a_3 < 0$: distribution skewed to the left (negatively skewed) • $a_3 = 0$: the distribution is symmetrical
  • 34. Measure of skew [Figure: positively skewed, negatively skewed, and normal (skew = 0) distributions]
  • 35. Kurtosis • Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the bell-shaped normal distribution. • A common definition for sample data is the moment coefficient of kurtosis, $a_4 = \frac{m_4}{m_2^{2}}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$ • Interpretation: • $a_4 > 3$: heavier tails and a higher, sharper peak than a normal distribution • $a_4 < 3$: lighter tails and a lower, flatter peak than a normal distribution • For $a_4$ to be meaningful and comparable, the distribution should be symmetrical (hence, again, the need for a normal distribution)
  • 36. Kurtosis • Kurtosis measures whether the scores are spread out more or less than they would be in a normal (Gaussian) distribution: – Mesokurtic ($a_4 = 3$) – Leptokurtic ($a_4 > 3$) – Platykurtic ($a_4 < 3$)
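The sketch below computes the moment coefficients a3 and a4. It assumes they are defined from the central moments m_k = (1/n)Σ(x − x̄)^k, a definition under which a normal distribution has a4 = 3. The function name `moment_coefficients` is hypothetical. Statistical packages may apply small-sample corrections, and SPSS, for example, reports excess kurtosis (a4 − 3), so a normal distribution scores 0 there.

```python
def moment_coefficients(data):
    """Return (a3, a4), the moment coefficients of skewness and kurtosis.

    Uses central moments m_k = (1/n) * sum((x - mean)**k), so that
    a3 = m3 / m2**1.5 and a4 = m4 / m2**2 (a4 = 3 for a normal distribution).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


print(moment_coefficients([1, 2, 3, 4, 5]))      # symmetric data: a3 = 0
print(moment_coefficients([1, 1, 2, 2, 3, 9]))   # long right tail: a3 > 0
```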
  • 37. Basic probability • Definition and characteristics of probability • Types of probability – Objective probability (classical and empirical) – Subjective probability • Probability distributions – Binomial distribution – Continuous probability distribution (normal distribution)
  • 38. Normal distribution, sampling distribution and estimation. I. Normal distribution: one of the most important theoretical (a priori) probability distributions in statistics
  • 39. Properties of the normal distribution • Mean = median = mode • Bell-shaped • Symmetrical around the mean • Total area under the curve = 1 • About 68% of the data lie within ±1 standard deviation (SD) of the mean • About 95% lie within ±2 SD of the mean • More than 99% (about 99.7%) lie within ±3 SD of the mean
  • 40. Percentile Ranks & Z-Scores • Use this formula to convert the data to Z-scores: $Z = \frac{x - \mu}{\sigma}$ • The Z-score of a point x equals x minus the mean, divided by the standard deviation • The Z-score can be looked up in a Z-table to get the percentile rank • Points with a positive Z-score have a percentile rank greater than 50, and points with a negative Z-score have a percentile rank less than 50 • Points with a Z-score of zero have a percentile rank of exactly 50; they are at the median.
  • 41. Example • We have a group of 62 young women with a mean age of 16 years and a standard deviation of 2.94 years. Assuming age is approximately normally distributed, what is the percentile rank of a girl aged 14 years? What percentage of the girls are 14 years or younger? • Step 1: Obtain the Z-score: $Z = \frac{14 - 16}{2.94} = -0.6803 \approx -0.68$ • Step 2: Look this number up in a Z-table (also called a “table of the area under the normal curve”) ✓ A Z-score of −0.68 corresponds to an area of 0.2483 under the curve ✓ The percentile rank is 24.83 ✓ Thus, about 24.83% of the girls are aged 14 or younger
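The table lookup in the example can be checked with `statistics.NormalDist` from Python's standard library, which gives the area under the standard normal curve to the left of z. The sketch assumes, as the example does implicitly, that age is normally distributed. It also checks the 68/95/99.7 rule from slide 39.

```python
from statistics import NormalDist

mu, sigma = 16, 2.94   # mean and SD of age from the example
x = 14

z = (x - mu) / sigma                # Z = (x - μ) / σ
percentile = NormalDist().cdf(z)    # area under the standard normal curve left of z
print(f"z = {z:.4f}, percentile rank = {100 * percentile:.2f}")

# Empirical rule: share of a normal distribution within ±1, ±2 and ±3 SD
for k in (1, 2, 3):
    print(k, round(NormalDist().cdf(k) - NormalDist().cdf(-k), 4))
```

The exact area for z = −0.6803 is about 0.2482. The table value of 0.2483 comes from rounding z to −0.68 before the lookup.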
  • 42. Diagnostic Tests • Diagnostic tests attempt to classify whether or not somebody has a disease before symptoms are present. We therefore need to establish how good a diagnostic test is at detecting disease.
  • 43. Diagnostic Tests (2 × 2 table cells: a = test positive with disease, b = test positive without disease, c = test negative with disease, d = test negative without disease) 1. Sensitivity: the proportion of diseased individuals that the test correctly identifies as having the disease, $P(T^{+} \mid D)$: Sensitivity $= \frac{a}{a+c}$ 2. Specificity: the proportion of non-diseased individuals that the test correctly identifies as not having the disease, $P(T^{-} \mid \bar{D})$: Specificity $= \frac{d}{b+d}$
  • 44. Cont’d 3. Positive predictive value: the proportion of individuals with a positive test result who actually have the disease, $P(D \mid T^{+})$: PPV $= \frac{a}{a+b}$ 4. Negative predictive value: the proportion of individuals with a negative test result who actually do not have the disease, $P(\bar{D} \mid T^{-})$: NPV $= \frac{d}{c+d}$
  • 45. Example • Consider a test used to assess HIV status: if the test returns a positive result, the patient is presumed to have the disease. The true diagnosis is whether or not the patient actually has HIV.

| Test result | HIV (true diagnosis) | Non-HIV (true diagnosis) | Total |
|---|---|---|---|
| Positive | 900 | 1100 | 2000 |
| Negative | 450 | 3550 | 4000 |
| Total | 1350 | 4650 | 6000 |

  • 46. Solution – Sensitivity $= \frac{a}{a+c} = \frac{900}{1350} = 0.67$ – Specificity $= \frac{d}{b+d} = \frac{3550}{4650} = 0.76$ – PPV $= \frac{a}{a+b} = \frac{900}{2000} = 0.45$ – NPV $= \frac{d}{c+d} = \frac{3550}{4000} = 0.89$
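The four accuracy measures depend only on the cells a, b, c and d of the 2 × 2 table, so they are easy to compute in code. Below is a minimal sketch, with `diagnostic_metrics` as a hypothetical helper name, applied to the HIV example.

```python
def diagnostic_metrics(a, b, c, d):
    """Accuracy measures from a 2x2 table.

    a = test+ with disease, b = test+ without disease,
    c = test- with disease, d = test- without disease.
    """
    return {
        "sensitivity": a / (a + c),   # P(test+ | disease)
        "specificity": d / (b + d),   # P(test- | no disease)
        "ppv": a / (a + b),           # P(disease | test+)
        "npv": d / (c + d),           # P(no disease | test-)
    }


m = diagnostic_metrics(a=900, b=1100, c=450, d=3550)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Running it reproduces 0.67, 0.76, 0.45 and 0.89. Note that the specificity uses d = 3550.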
  • 48. Study designs (cont’d) • Observational studies do not involve any intervention or experiment. • Experimental studies entail manipulation of the study factor (exposure) and randomization of subjects to treatment (exposure) groups.
  • 49. Sampling distributions ♦ Sampling distribution • The probability distribution of a sample statistic • Formed when samples of size n are repeatedly taken from a population ♦ Examples – sampling distribution of sample means – sampling distribution of sample proportions
  • 50. Sampling distribution of sample means • The sampling distribution consists of the values of the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots$ (one mean per sample)
  • 51. Properties of sampling distributions of sample means • The mean of the sample means, $\mu_{\bar{x}}$, is equal to the population mean $\mu$: $\mu_{\bar{x}} = \mu$ • The standard deviation of the sample means, $\sigma_{\bar{x}}$, is equal to the population standard deviation $\sigma$ divided by the square root of the sample size $n$: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. This is called the standard error of the mean.
  • 52. The Central Limit Theorem • If samples of size n ≥ 30 are drawn from any population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample means is approximately normal. The larger the sample size, the better the approximation.
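The two properties on slide 51 and the Central Limit Theorem can be seen in a small simulation. This sketch uses a hypothetical, clearly non-normal population (exponential with mean 10, so σ = 10) and draws 5000 samples of size 30. The seed and the sizes are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population: exponential with mean 10 (its SD is also 10)
mu = sigma = 10.0
n = 30        # size of each sample
reps = 5000   # number of repeated samples

# One sample mean per repetition forms the sampling distribution of the mean
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 2))
print("sigma / sqrt(n):     ", round(sigma / n ** 0.5, 2))
```

The mean of the sample means should land close to 10, and their SD close to σ/√n ≈ 1.83. A histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.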
  • 53. Statistical Inference • Two types of statistical inference: I. Parameter estimation • Estimating a population’s characteristics from sample data II. Hypothesis testing • Testing statements about relationships between two or more variables
  • 54. Confidence Interval • Gives an idea of how good (precise) an estimate is, e.g., the sample mean as a point estimate of the population mean • In general, a CI for a parameter is estimate ± (critical value × SE of the estimate) • A 95% CI is most common, but other confidence levels can be used, e.g., 99% or 90% • Three components are needed to compute a CI: an estimate of the parameter of interest, a critical value (e.g., z, t), and the standard error of the estimate
  • 55. Comparison of distributions. Figure 1. Comparison of the t and normal distributions
  • 56. Conditions for using the various test statistics (Z, t, or nonparametric)
  • 57. Hypothesis testing • A hypothesis is a statement, assertion, assumption, claim, or belief about the state of nature (about the true value of an unknown population parameter) • A hypothesis test is a statistical procedure that involves formulating a hypothesis and using sample data to decide on its validity (to support it or not)
  • 58. Types of hypotheses • Null hypothesis (H0): states that there is no relationship between the two variables • Alternative (research) hypothesis (Ha): states that there is a relationship between the two variables – Directional (one-sided or one-tailed), e.g., one mean higher or lower than another or than a null value; a positive or negative relationship – Nondirectional (two-sided or two-tailed), e.g., means not equal
  • 59. Example. Null hypothesis (H0): There will be no relationship between height and weight in adolescent boys. Alternative hypothesis (Ha): Directional: Height will be positively related to weight in adolescent boys (e.g., taller boys will weigh more). Nondirectional: There will be a relationship between height and weight in adolescent boys.
  • 60. Tails of the test
  • 61. Choosing the appropriate statistical test depends on • the type of variables • the number of groups being compared • the sample size
  • 62. Statistical tests. Z-test: • Study variable: qualitative • Outcome variable: quantitative • Comparison: sample mean with population mean, or two sample means • Sample size: large in each group (> 30), and the standard deviation is known. Student’s t-test: • Study variable: qualitative • Outcome variable: quantitative • Comparison: sample mean with population mean; two means (independent samples); paired samples • Sample size: < 30 in each group (it can also be used with large samples)
  • 63. Independent t-test • Used to compare two independent (unrelated) groups: is the difference observed between the groups on a continuous outcome variable larger than would be expected by chance alone? • The grouping (independent) variable is dichotomous and the outcome (dependent) variable is continuous • Examples of two independent groups: an intervention group and a non-matched comparison group; males and females; groups with and without a medical condition • Examples of dependent variables: length of hospital stay (days), age (years), weight (kilograms)
  • 64. Example • Research question: on average, will male dieters lose more weight over a 3-month period than female dieters? – Independent variable (IV): gender (male/female) – Dependent variable (DV): weight loss in pounds over a 3-month period – The IV is dichotomous and the DV is continuous • Report the mean and SD of the DV for each group
  • 65. Two types of independent t-test • There are two versions of the independent samples t-test: – pooled (equal/homogeneous) variance t-test – separate (unequal) variance t-test • Levene’s test can be used to test for homogeneity of variance between the two groups • SPSS, however, reports both versions by default.
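To see how the two versions differ, the sketch below computes both t statistics from summary statistics. It assumes the separate-variance version is Welch's test with Welch-Satterthwaite degrees of freedom, which is the form SPSS reports in its "Equal variances not assumed" row. The helper names `pooled_t` and `welch_t` are illustrative. The inputs are the means, SDs and group sizes from the dieting example (slides 66 and 75).

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance (pooled) two-sample t statistic and its df."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2


def welch_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df


# Males: mean 18.59, SD 6.02, n 17; females: mean 12.13, SD 5.46, n 15
args = (18.59, 6.02, 17, 12.13, 5.46, 15)
print("pooled: t = %.2f, df = %d" % pooled_t(*args))
print("welch:  t = %.2f, df = %.1f" % welch_t(*args))
```

With these inputs, the pooled t is about 3.16 on 30 df and the Welch t about 3.18 on roughly 30 df. The two agree closely here because the group SDs and sizes are similar.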
  • 66. Example: performing an independent t-test • Suppose the data come from a study of 17 male and 15 female dieters, used to test the following null hypothesis: • H0: There will be no difference in weight loss between male and female dieters. • This hypothesis stems from the research question: ✓ Do male dieters lose more weight, on average, than female dieters?
  • 67. State the null and alternative hypotheses. Null hypothesis • $H_0$: There is no difference in mean weight loss between male and female dieters. Alternative hypothesis • $H_A$: There is a difference in mean weight loss between male and female dieters.
  • 68. Null and alternative hypotheses in notation • $H_0: \mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$ • $H_1$ or $H_A: \mu_1 \neq \mu_2$ (two-tailed or nondirectional), where $\mu_1$ = population mean for the first group and $\mu_2$ = population mean for the second group
  • 69. Select α and find the critical value • α-level: statistical significance will be defined as p < 0.05. • Critical value: from a t-table (found in any statistics textbook), the critical values that define the rejection region are ±2.042 on 30 df ($n_1 + n_2 - 2 = 17 + 15 - 2$) at α = 0.05 (two-tailed/nondirectional test) • If the t-statistic is greater than 2.042 or less than −2.042, we reject the null hypothesis
  • 70. Assumptions of the independent t-test – The independent variable must be dichotomous (two mutually exclusive categories) • The two groups must be independent – The dependent variable must be continuous (interval or ratio) – The dependent variable must be approximately normally distributed – If the normality assumption is not satisfied, a nonparametric alternative, the Mann-Whitney U test, can be used
  • 71. Histogram of the weight loss variable
  • 75. Determine statistical significance and state a conclusion • Since the computed t statistic of 3.16 is larger than the critical value of 2.042, we conclude that the difference in mean weight loss between males and females is statistically significant. If SPSS is used, we use the p-value (Sig.) instead. • In short: men lost an average of 18.6 lbs (SD 6.02) and women lost an average of 12.1 lbs (SD 5.46). This difference is statistically significant at p < 0.05 by the independent t-test.
  • 76. Interval estimate: manual calculation. Solution: • Point estimate: $\bar{x}_1 - \bar{x}_2 = 18.59 - 12.13 = 6.46$ • Critical value: $t_{\alpha/2}$ with $n_1 + n_2 - 2 = 30$ df at α = 0.05 (two-tailed), from the t-table: $t_{30,\,0.05} = 2.042$ • Pooled standard deviation: $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(17-1)\times 36.25 + (15-1)\times 29.83}{17+15-2}} = 5.77$ • Standard error: $s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 5.77 \times \sqrt{\frac{1}{17} + \frac{1}{15}} = 5.77 \times 0.354 = 2.043$ • Margin of error: t critical × SE of the estimate $= 2.042 \times 2.043 = 4.17$
  • 77. Estimate for the difference • The confidence interval for $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $t_{\alpha/2}$ has $n_1 + n_2 - 2$ df and $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$ • 95% CI: $6.46 \pm 2.042 \times 2.043 \Rightarrow (LL, UL) = (6.46 - 4.17,\ 6.46 + 4.17)$ • Answer: 95% CI = (2.29, 10.63); interpretation?
  • 78. Hypothesis testing: hand calculation • Step 1: Set up the hypotheses and determine the level of significance • Step 2: Select the appropriate test statistic • Step 3: Set up the decision rule • Step 4: Compute the test statistic • Step 5: State the conclusion
  • 79. Cont’d 𝑡𝑐𝑎𝑙 = 𝐱𝟏−𝐱𝟐 𝐬𝐩 𝟏 𝐧𝟏 + 𝟏 𝐧𝟐 = 6.46 4.17 = 3.16, which is greater than t-critical • After the statistical computation: Reject 𝐻𝑜 if tcalc. >2.042(from t-table) Since 3.16 >2.042, we reject null hypothesis 21 February 2023 79
  • 80. Summary • Identify a research question that is based on two independent groups (the groups can be equal or unequal in size) • Perform the t-test, either the equal-variance or the unequal-variance version, based on Levene's test of homogeneity of variance • If the dependent variable is not approximately normally distributed, the Mann-Whitney U test (a nonparametric equivalent, covered later in this course) can be used • If a computer program like SPSS is used, look at the p-value (Sig.) • Interpret the results and state a conclusion
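When Levene's test suggests unequal variances, the unequal-variance version of the t-test is usually Welch's test. It uses unpooled standard errors and the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below assumes this is what the SPSS row "Equal variances not assumed" reports. The helper `welch_t` is hypothetical, and the inputs are the weight-loss summary statistics from slides 75 and 76.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic and approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of mean 1
    v2 = sd2 ** 2 / n2   # squared standard error of mean 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Weight-loss summary statistics: men (17, 18.59, SD 6.02), women (15, 12.13, SD 5.46)
t, df = welch_t(18.59, 6.02, 17, 12.13, 5.46, 15)
print(f"Welch t = {t:.2f} on {df:.1f} df")   # prints: Welch t = 3.18 on 30.0 df
```

Because the two SDs are similar here, the Welch result is close to the pooled t of 3.16. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` can serve as a cross-check.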
  • 81. Paired t-test • Used to compare two groups that are dependent or closely related • Are the means of two related groups (paired or matched pairs) different from one another? • Is the difference observed between the groups larger than would be expected by chance alone? • Examples – Population type: pretest-posttest measures on the same person (over time), twin studies, couples – Outcome: blood pressure before and after medication, weight (case and control)
  • 82. Cont'd Pretest-posttest study • Assesses the effect of an intervention on study participants by comparing the posttest values of the dependent variable (after the intervention) with the pretest values of the same variable (before the intervention) • Example: – Weight loss after an education intervention, assessed by comparing the participants' post-intervention weight with their pre-intervention weight
  • 83. Cont'd Matched pairs study • Examines the effect of an intervention in which each participant who receives the intervention is matched to a control who does not; matching criteria can be gender, education, socioeconomic status, etc. • Example – Does breastfeeding affect bone density? – Matched study of 58 female twin pairs, one twin who breastfed and one who did not; a paired t-test was used to examine the difference in bone density – We want the two groups to be very similar on factors other than breastfeeding status when we observe bone density. Other factors that are the same in this matched study are gender, age, BMI, ethnicity and health status (as they are twins!)
  • 84. Steps in computing the paired t-test – State the null and alternative hypotheses. – Define the significance level and the degrees of freedom, and hence the critical value for the computed t statistic. – Make sure that the data meet the assumptions for using the paired t-test. – Compute the paired t-test statistic. – Determine statistical significance and state a conclusion.
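The computational steps above can be sketched as a one-sample t-test on the within-pair differences. The function names `paired_t` and `decide` are hypothetical. The critical value is still assumed to come from a t-table, as on the slides.

```python
import math
import statistics

def paired_t(pre, post):
    """Paired t-test as a one-sample t-test on the differences (post - pre).

    Returns the mean difference, SD of differences, standard error,
    t statistic and degrees of freedom.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two complete pairs")
    d = [b - a for a, b in zip(pre, post)]   # one difference per pair
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)               # sample SD (n - 1 denominator)
    se = sd_d / math.sqrt(n)
    t = mean_d / se
    return mean_d, sd_d, se, t, n - 1        # df = number of pairs - 1

def decide(t, t_crit):
    """Two-sided decision rule: reject H0 if |t| exceeds the critical value."""
    return abs(t) > t_crit
```

To use it, call `paired_t(pretest_scores, posttest_scores)` with each participant's two scores listed in the same order, then pass the returned t and the t-table value for the returned df to `decide`.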
  • 85. Example • Research question: Does a “healthy heart” education program increase knowledge about cardiovascular health in teenagers? • Independent (grouping) variable: pretest/posttest measure • Dependent variable: score on a 100-point quiz about cardiovascular health • The mean and SD at each time point, and of the difference between the two time points, would be relevant
  • 86. State hypotheses ❖ Null hypothesis • H0: There will be no difference between the pretest and posttest cardiovascular knowledge scores ❖ Alternative hypothesis (nondirectional) • HA: There will be a difference in cardiovascular knowledge after attending the program ❖ These hypotheses stem from the research question: • Does a “healthy heart” education program increase knowledge about cardiovascular health in teenagers?
  • 87. Specify the α level and find the critical value 1. α level • Statistical significance will be defined as p < 0.05 2. Critical value • Using the table of critical values of the t-distribution, the critical values that define the rejection region are ±2.042 (on 30 df) • If the t-statistic is greater than 2.042 or less than −2.042, we will reject the null hypothesis • This is relevant for hand calculations; with computer output, e.g. from SPSS, we would use the p-value (Sig.)
  • 88. Ensure the data meet the assumptions • There are two paired measures of the dependent variable • The dependent variable (strictly, the paired differences) is approximately normally distributed.
  • 89. Paired t test: SPSS procedure
  • 91. Paired t‐Test: SPSS Output
  • 92. Determine statistical significance and state a conclusion • Since the computed t test statistic of 6.935 is greater than the critical value of 2.042, we conclude that the difference in test scores from pretest to posttest is statistically significant. • In short: The posttest score was on average 9.03 points higher than the pretest score, and this difference was significant at p < 0.05 by the paired t-test. It can be concluded that the teenagers had significantly higher test scores after going through the educational program.
  • 93. Summary • Identify a research question that is based on true or matched pairs (dependent groups) • Sample sizes are equal across groups because the observations are paired, so the degrees of freedom equal the number of pairs minus 1 • Perform the paired t-test (a one-sample t-test on the differences) • If the dependent variable (the paired differences) is not approximately normally distributed, the Wilcoxon signed-rank test (a nonparametric equivalent, covered later in this course) can be used • If hand calculations are performed, use the t-distribution table • If a computer program like SPSS is used, look at the p-value (Sig.) • Interpret the results and state a conclusion
  • 94. Hand calculation: try it using the above example 1. Interval estimate for the true difference in knowledge scores 2. Test the hypothesis
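You can check a hand calculation against SPSS output by recovering the standard error of the mean difference from the reported mean difference and t statistic (SE = mean difference / t), then building the interval. This sketch assumes the values reported on slides 87 and 92: a mean difference of 9.03, t = 6.935 and 30 df, with critical value 2.042. The helper name `ci_from_reported` is hypothetical.

```python
def ci_from_reported(mean_diff, t_stat, t_crit):
    """Back out the SE from a reported mean difference and t, then form the CI."""
    se = mean_diff / t_stat        # because t = mean_diff / SE
    margin = t_crit * se           # margin of error
    return se, (mean_diff - margin, mean_diff + margin)

se, (lo, hi) = ci_from_reported(9.03, 6.935, 2.042)
print(f"SE = {se:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")   # prints: SE = 1.30, 95% CI: (6.37, 11.69)
```

The interval excludes 0, which agrees with rejecting H0 at α = 0.05. A full hand calculation from the raw pretest and posttest scores should reproduce these numbers up to rounding.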