COM 301 INFERENTIAL STATISTICS SLIDES.ppt
1. AN OVERVIEW OF INFERENTIAL STATISTICS
BY EMMANANUEL J.O
DEPARTMENT OF COMMUNITY MEDICINE, PRINCE ABUBAKAR AUDU UNIVERSITY, ANYIGBA
4/8/2024 1
2. Outline
What is Inferential Statistics ?
Epidemiological Study Designs
Sampling Methods
Sampling Distributions
Methods of Inferential Statistics
Correlation and Regression Analyses
Exercises
Conclusion
Bibliography
3. Objectives of the Presentation
• To present a concise and straightforward overview of the basic methods and techniques of medical statistics
• To put the multitude of statistical methods applicable to medical research into their practical context
• To combine simplicity and depth in doing so
• To improve, hopefully, the statistical rigor of our scientific publications
• To promote the growth of evidence-based medicine
Are your expectations captured?
4. What is Inferential Statistics (I.S.)?
• I.S. concerns decision making about the general population based on data collected from a sample (i.e. a subset or part of the population)
• I.S. infers the true finding(s) in the larger population from the findings in the sample, using P-values and Confidence Intervals (CI)
• We INFER the parameter from the statistic
• We generalize findings from the sample(s) to the larger population
• I.S. therefore relies on the statistical properties of sample estimates
5. The Process of Making a Statistical Inference
Start from the POPULATION (parameter) → Sample (statistic) → P-values and Confidence Intervals → Inference
6. Validity of Results
• Internal Validity
– Conclusion supported by study designs?
• External validity
– Generalizable to reference population?
7. Some Epidemiological Study Designs
Epidemiological study designs
• Observational
– Descriptive
– Analytic
• Experimental
– RCTs (individual & community)
– Clinical trials
8. Probability (or Random) Sampling Methods
– Every unit in the population has a known (often equal) chance of being selected
– The sampling error can be estimated and may be very small
– Outcomes of studies can be generalized to the larger population
9. Examples of Probability (or Random) Sampling Methods
1. Simple Random Sampling
2. Systematic Random Sampling
3. Stratified Random Sampling
4. Cluster Sampling
5. Multi-phase Sampling
6. Multistage Sampling
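The first three methods in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 1,000 numbered units, the "urban"/"rural" strata, the 10% sampling fraction and the function names are all hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    return random.Random(seed).sample(units, n)

def systematic_sample(units, n, seed=None):
    """Pick a random start, then take every k-th unit from the sampling frame."""
    k = len(units) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

# Hypothetical sampling frame: 1,000 units numbered 1..1000
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 100, seed=1)
systematic = systematic_sample(frame, 100, seed=1)
stratified = stratified_sample(
    {"urban": frame[:600], "rural": frame[600:]}, 0.10, seed=1
)
```

Cluster, multi-phase and multistage sampling repeat these building blocks at more than one level (for example, sampling clinics first and then patients within the chosen clinics).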
10. Non-probability (or Non-Random) Sampling Methods
• The chance of selecting each unit is unknown or unequal
• Outcomes of studies cannot be reliably generalized to the larger population
12. Exercise
What sampling method(s) would you use in the following studies?
1. Selection of 100 women attending ANC at the clinic
2. Selection of 150 under-5 children in a nursery school for a study on malnutrition
3. Selection of 100 men into a clinical trial to test the effect of their wives' presence during HCT
13. Sampling Distributions
• Most events of interest can be described using probability distributions, e.g. the normal (Gaussian) distribution curve
• I.S. therefore uses probability concepts and sampling theory
• Inferences are drawn by comparing the observed data with the values expected under the null hypothesis (Ho), using a sampling distribution such as the Z, t, F or Pearson's Chi-square distribution
15. Types of probability distributions
Discrete probability distributions
I. Binomial distribution (for dichotomous outcomes where the events of interest are independent)
II. Poisson distribution (for counts of rare events, e.g. plane crashes)
Continuous probability distributions
I. Normal distribution (for quantitative continuous variables)
Note: survival data are usually analysed with the Cox proportional hazards model, which is a regression model rather than a probability distribution
16. Review: The Normal Distribution Curve
• The most widely used probability distribution
• Many significance (hypothesis) tests assume that the data collected follow this distribution
• By the Central Limit Theorem, sample estimates such as means and proportions are approximately normally distributed when the sample is large enough, whatever the nature of the variable (qualitative or quantitative); skewed data may also be transformed to approximate normality
• The normal distribution therefore plays a major role in statistical inference
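A small simulation can illustrate the Central Limit Theorem. The sketch below assumes a strongly skewed source population (an exponential distribution with mean 1), a sample size of 50 and 2,000 repeated samples; these choices and the variable names are illustrative and not from the slides.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 50                    # size of each sample
replicates = 2000         # number of repeated samples

# Draw repeated samples from a skewed (exponential, mean 1, SD 1) population
# and record the mean of each sample.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)   # population SD / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean = 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {theoretical_se:.3f})")
```

Although the individual observations are far from normal, the sample means cluster symmetrically around the population mean with a spread close to SD/√n.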
17. The Normal Distribution Curve
• About 68%, 95% and 99.7% of values lie within ±1, ±2 and ±3 SD of the mean, respectively
• µ-3σ µ-2σ µ-σ µ µ+σ µ+2σ µ+3σ
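These percentages can be checked directly from the normal distribution. A minimal sketch using only Python's standard library (the function name is an illustrative choice):

```python
import math

def coverage_within(k_sd):
    """Proportion of a normal distribution lying within +/- k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2))

coverage = {k: coverage_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within +/-{k} SD: {p:.4f}")
```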
20. Methods of Inferential Statistics
1. Hypothesis testing (Ho) or Significance Testing, summarized by P-values
2. Estimation of the magnitude of effect
a) Point estimates, e.g. a mean difference or odds ratio
b) Interval estimates, e.g. 95% CI
Caution!
I. Biological Plausibility
II. Confounding
21. Steps involved in Hypothesis Testing/Significance Testing
1. State the NULL Hypothesis (Ho)
2. State the ALTERNATIVE Hypothesis (Ha)
3. Set the ALPHA (α) level
4. Select and perform the appropriate statistical test, e.g. Student t-test, Paired t-test or Chi-square
5. Calculate the P-value from the test statistic
6. Decide statistical significance (is the result compatible with chance or not?)
7. Conclude (Clinical Significance)
22. General format for test statistics
• Test statistic = (Observed value (O) minus Expected value under Ho (E)) divided by the Standard Error (SE): $Z = (O - E) / SE$
• The P-value is then obtained from the sampling distribution of the test statistic
• S.E. of the sample mean = sample SD / square root of n (where n = number of observations in the sample): $SE = SD / \sqrt{n}$
• This format is used for the 1-sample Z test, 1-sample t-test, 2-sample t-test and Paired t-test; Pearson's Chi-square test instead sums $(O - E)^2 / E$ over all cells
• The P-value may be calculated manually or with statistical software (e.g. SPSS, STATA, EPI-INFO)
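As a rough illustration of this format, the sketch below computes a standard error, a Z statistic and a two-sided P-value. The figures (an observed mean of 130 against an expected mean of 125, SD 15, n = 36) and the function names are hypothetical. The normal distribution is assumed for the P-value; with a small sample and an estimated SD, the t distribution would be the usual choice.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the sample mean: SD / sqrt(n)."""
    return sample_sd / math.sqrt(n)

def z_statistic(observed, expected, se):
    """General format: (Observed - Expected) / Standard Error."""
    return (observed - expected) / se

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: Ho says the population mean is 125
se = standard_error(sample_sd=15, n=36)           # 15 / 6 = 2.5
z = z_statistic(observed=130, expected=125, se=se)
p = two_sided_p_value(z)

alpha = 0.05
print(f"SE = {se}, Z = {z}, P = {p:.4f}")
print("significant" if p < alpha else "not significant", f"at alpha = {alpha}")
```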
23. P-Values
• The P-value is the probability of getting a difference at least as big as that observed if the NULL hypothesis (Ho) is TRUE
• The smaller the P-value, the lower the chance of getting a difference as big as the one observed if Ho were true
• So the smaller the P-value (e.g. < 0.05), the stronger the evidence against Ho
• By convention, 2-sided (2-tailed) P-values are used
• The P-value is a guide that tells us whether a result is "significant"
• Significance is generally judged at the 5% level (corresponding to a 95% CI), and more rarely at the 1% level (99% CI)
24. P-Values 2
• P < 0.05 (significant at the 5% level, corresponding to a 95% CI) means that, if Ho were true, a difference at least as big as the one observed would occur by chance less than 5% of the time. It is NOT the probability that the result is true or valid
• Example: P-value < 0.01 (significant at the 1% level, i.e. 99% CI)
• Example: P-value = 0.36 (not significant at either the 5% or the 1% level)
25. Common Mistakes in the Interpretation of P-values
• Do not ignore all P-values > 0.05, especially in studies with a small sample size, because statistically non-significant differences are NOT always clinically or medically unimportant. Check the CI range as well.
• On average, about 1 in 20 comparisons in which Ho is true will give a false-positive P-value < 0.05, so be cautious when many comparisons are made, e.g. in studies of treatment effects
• A very large sample can detect even an extremely small difference in a population, so do not hurriedly treat every significant result as clinically important
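The "1 in 20" point can be made concrete. Assuming the comparisons are independent and Ho is true for all of them (a simplifying assumption, not stated on the slide), the chance of at least one false-positive result grows quickly with the number of comparisons:

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance of at least one P < alpha among k independent tests when Ho is true for all."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(f"{k:2d} comparisons: {prob_any_false_positive(k):.3f}")
```

With 20 such comparisons the expected number of false positives is 20 × 0.05 = 1.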
26. Confidence Intervals (CI)
• A CI is a range of plausible values for the true value of the parameter being estimated
• The parameter could be a mean, mean difference, odds ratio, difference in proportions etc
• A 95% CI gives the interval within which the true value of the parameter lies with about 95% confidence
• A 99% CI gives the interval within which the true value of the parameter lies with about 99% confidence
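A minimal sketch of a CI for a mean, using the large-sample normal approximation (estimate ± Z × SE). The ten blood-pressure readings and the function name are made up for illustration; with so few observations a t-based interval would normally be preferred.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation CI for a mean: mean +/- Z * (SD / sqrt(n))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%, 2.58 for 99%
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se

# Hypothetical systolic blood pressure readings (mmHg)
readings = [118, 125, 130, 122, 135, 128, 140, 119, 127, 133]
ci95 = mean_confidence_interval(readings, 0.95)
ci99 = mean_confidence_interval(readings, 0.99)
print(f"mean = {mean(readings):.1f}")
print(f"95% CI = ({ci95[0]:.1f}, {ci95[1]:.1f})")
print(f"99% CI = ({ci99[0]:.1f}, {ci99[1]:.1f})")
```

The 99% interval is wider than the 95% interval: more confidence is bought at the cost of precision.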
27. Confidence Intervals (CI)
• CIs are reported with risk ratios or relative risks (RR) and odds ratios (OR)
• The width of the CI tells us about the precision of our estimate
• With an OR or RR we can estimate the magnitude of the association between variables
• E.g. a 95% CI tells us that we can be 95% 'confident' that the true association is somewhere in that interval
• Example: OR = 7, 95% CI = (5.2, 8.8): a narrow, precise interval
• Example: OR = 7, 95% CI = (0.4, 18.7): a wide, imprecise interval
28. Interpretation of Confidence Intervals (CI)
• CIs generally agree with the P-values
• If the CI includes the null value of the parameter (ZERO for a difference, 1 for a ratio), the result is not significant, i.e. the P-value is > 0.05 for a 95% CI (and vice versa)
• This is because a Z value of 1.96 (95% CI) corresponds to a 2-sided P-value of 0.05
• So if P < 0.05, the 95% CI for a difference will not contain ZERO
• The size of the P-value also depends on the SAMPLE SIZE
29. Interpretation of Confidence Intervals (CI)
• CI for difference in means
  -3.5 to 8.9 (includes zero, not significant): P-value > 0.05 (or 0.01)
  5.8 to 11.5 (excludes zero, significant): P-value < 0.05 (or 0.01)
30. Exercise: Interpretation of Confidence Intervals (CI)
• CI for correlation coefficient
  -0.3 to 0.6 (significant?)
  0.5 to 0.72 (significant?)
• CI for odds ratios
  0.12 to 3.67 (significant?)
  3.67 to 5.89 (significant?)
31. Reasons for observed difference/association
1. Chance (assessed by hypothesis or significance testing)
2. Confounding e.g. smoking, lung cancer & asbestosis
3. Interaction (Effect modification)
4. Spurious factors (Bias) e.g. selection & information bias
32. Use of 2-By-2 Tables to Calculate OR and RR
            SICK    WELL    TOTAL
Exposed     a       b       a+b
Unexposed   c       d       c+d
Total       a+c     b+d     N
33. Use of 2-By-2 Tables Cont'd
• Odds Ratio = ad/bc
• Relative Risk = [a/(a+b)] / [c/(c+d)] = a(c+d) / c(a+b)
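The two formulas can be sketched as follows, using the cell labels a, b, c and d from the 2-by-2 table. The counts are hypothetical, and the confidence interval uses Woolf's log-based approximation, which is an addition for illustration and is not described on the slides.

```python
import math
from statistics import NormalDist

def odds_ratio(a, b, c, d):
    """OR = ad / bc."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Approximate CI for the OR on the log scale (Woolf's method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed are sick
a, b, c, d = 30, 70, 10, 90
or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)
ci_low, ci_high = odds_ratio_ci(a, b, c, d)
print(f"OR = {or_value:.2f}, RR = {rr_value:.2f}")
print(f"95% CI for OR = ({ci_low:.2f}, {ci_high:.2f})")
```

Here the interval for the OR excludes 1, so the association would be judged significant at the 5% level.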
35. Interpreting odds ratios and confidence intervals
• Odds ratios measure the association between 2 qualitative or categorical variables
• OR values range from zero to infinity
• OR > 1 when the association is positive (risk factor?)
• OR < 1 (a decimal fraction) when the association is negative (protective factor?)
• OR = 1 when there is no association, i.e. the odds in the 2 groups are the same
36. Interpreting OR and CI
• Unless both equal 1, the OR (also called the cross-product ratio) is always further away from 1 than the corresponding RR (risk ratio or prevalence ratio):
  If RR > 1 then OR > RR; if RR < 1 then OR < RR
• For rare outcomes the odds are approximately equal to the risks (OR ≈ RR)
• The OR for the occurrence of disease is the reciprocal of the OR for non-occurrence of the disease
• ORs are fundamental in the analysis of Case-Control studies
37. Interpreting CIs for odds ratios
• CIs for ORs are significant (P < 0.05) when the interval does not include 1
– Examples: 0.23 – 0.56, 2.67 – 5.78, 11.21 – 23.56
• They are NOT significant (P > 0.05) when the interval includes 1
– Examples: 0.24 – 4.78, 0.02 – 2.56 etc
38. Some Determinants of Sample Size
• The study design, e.g. is it a cross-sectional study?
• The size of the difference between groups that the study is designed to detect, e.g. 10% or 15%? The smaller the difference, the larger the sample size, and vice versa
• Statistical power to detect an actual difference (1 − β, where β is the probability of a type 2 error; commonly 90%)
• The level of error (alpha) the researcher is willing to tolerate (probability of a type 1 error), usually 5% (95% CI)
• Drop-out/attrition/non-response rate
39. Sample Size Calculation for a Cross-Sectional Study
Leslie Kish formula
$n = \dfrac{Z_\alpha^2 \, p \, q}{d^2}$
Where n = minimum sample size
$Z_\alpha$ = standard normal deviate at the 5% significance level (95% confidence interval) = 1.96
p = previous estimate of the proportion of interest = say 45.1% (0.451), i.e. from literature or a pilot study, or use 50%
q = 1 − p = 1 − 0.451 = 0.549
d = degree of precision = 5% (0.05)
40. Sample Size Calculation for a Cross-Sectional Study 2
• Evaluating the formula
• $n = (1.96)^2 \times 0.451 \times 0.549 / 0.05^2$
• ≈ 380
• Minimum sample size = 380
• Allow for a 10% non-response rate: 380 × 100/90 = 422.2
• Therefore N = 422
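The calculation can be reproduced with a short sketch. The function names are illustrative; the inputs are the ones used on the slide. The formula gives about 380.5 before rounding, and the slide's figure of 380 is carried forward here.

```python
def kish_sample_size(p, d, z=1.96):
    """Minimum sample size for estimating a proportion: Z^2 * p * q / d^2 (unrounded)."""
    q = 1 - p
    return (z ** 2) * p * q / (d ** 2)

def adjust_for_non_response(n, non_response_rate):
    """Inflate n so that the expected number of responders is still n."""
    return n / (1 - non_response_rate)

n_raw = kish_sample_size(p=0.451, d=0.05)
n_min = round(n_raw)                              # the slide uses 380
n_final = adjust_for_non_response(n_min, 0.10)    # same as 380 * 100 / 90

print(f"unrounded n = {n_raw:.2f}")
print(f"minimum sample size = {n_min}")
print(f"with 10% non-response = {n_final:.1f}, i.e. about {round(n_final)}")
```

Many authors round sample sizes up rather than to the nearest whole number, which would give 381 and then 424 here.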
41. Sample size formula to compare two independent proportions
Using the formula for calculating sample size for the comparison of two independent proportions:
$n \text{ per group} = \dfrac{2 (Z_\alpha + Z_\beta)^2 \, \pi (1 - \pi)}{d^2}$
Where,
n = minimum sample size per group
$Z_\alpha$ = standard normal deviate corresponding to α, the probability of making a type 1 error; at 5% = 1.96
$Z_\beta$ = standard normal deviate corresponding to β, the probability of making a type 2 error; at 90% statistical power = 1.28
42. Sample size formula to compare two independent proportions
π = mean of the two proportions P1 and P2
P1 = proportion of patients with the outcome of interest in the first group
P2 = proportion of patients with the outcome of interest in the second group
d = the desired level of difference between the two groups, P1 and P2
Assuming the prevalence of the outcome of interest is 24% (from literature or your pilot study), 24% is used as P1 to detect a difference of, say, 15% between the two groups
43. Sample size formula to compare two independent proportions 2
Therefore,
P1 = 24% = 0.24
P2 = 24% + 15% = 39% = 0.39 (= P1 + d)
π = (24 + 39)/2 = 63/2 = 31.5% = 0.315
1 − π = 1 − 0.315 = 0.685 ≈ 0.69
$n = \dfrac{2 (1.96 + 1.28)^2 \times 0.315 \times 0.69}{0.15^2}$
$n = \dfrac{21 \times 0.315 \times 0.69}{0.0225}$
n = 203 = minimum sample size for each group
Assuming a 10% attrition rate: 203 × 100/90 = 226 per group.
Total sample size for the two groups = 452 participants.
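A sketch of the same calculation in code, with an illustrative function name and the inputs taken from the slide. Without the intermediate rounding used on the slide (1 − π taken as 0.69 and 2(Zα + Zβ)² taken as 21), the formula gives about 201.3, i.e. 202 per group after rounding up, slightly below the slide's 203.

```python
import math

def n_per_group_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.28):
    """n per group = 2 (Z_alpha + Z_beta)^2 * pi * (1 - pi) / d^2, with pi the mean of p1 and p2."""
    pi = (p1 + p2) / 2
    d = abs(p2 - p1)
    return 2 * (z_alpha + z_beta) ** 2 * pi * (1 - pi) / d ** 2

n_exact = n_per_group_two_proportions(0.24, 0.39)
n_group = math.ceil(n_exact)                       # round up to whole participants
n_group_with_attrition = math.ceil(n_group / 0.90)  # allow for 10% attrition

print(f"unrounded n per group = {n_exact:.1f}")
print(f"n per group = {n_group}, with 10% attrition = {n_group_with_attrition}")
print(f"total for two groups = {2 * n_group_with_attrition}")
```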
44. Sample Size for RCTs
$N = \dfrac{1}{1 - f} \times \dfrac{2 (Z_\alpha + Z_\beta)^2 \, P (1 - P)}{(P_0 - P_1)^2}$
Where $P = (P_0 + P_1)/2$
SAMPLE SIZE FOR OTHER STUDY DESIGNS???
45. Bivariate/Multivariate Analyses
• Bivariate analyses are used to find the relationship between 2 variables, or the difference between groups with respect to a characteristic:
– Apply the Chi-square test, t-test etc as appropriate
– Use P-values and confidence intervals for estimates
• Multivariate logistic regression is the most widely used method when more than 2 variables are involved
46. Practical Considerations for Logistic
Regression
• Sample size
• Selection of best variable type as predictor
variable
• Prevalence of the outcome or dependent
variable etc
47. Logistic Regression Analyses
• Popular in medical research because many outcomes
are in qualitative units e.g. disease status, outcome
of illness etc
• Outcome variables are qualitative dichotomous or
multichotomous
• It is necessary to adjust for confounders (to develop
predictor models)
• The independent (or predictor) variables could be
quantitative or qualitative
48. Example of a result of a logistic regression analysis of
contraceptive use on women’s characteristics
49. Interpretation of results in the Table
• Age and location are significant
• Women aged less than 25 years had 4.76 times the odds of using contraceptives compared with those aged 35 years and above, and this was a significant result (95% CI = 2.45 – 8.23, P < 0.001)
50. Exercises
• What type of analyses/ test statistic would you
use?
• HIV status compared among four groups of 500
women each: those married, never married,
divorced, separated
• Nutritional status of children compared between
three socioeconomic classes
• To identify predictors of suicide attempt – age,
gender, educational status, associated medical
illness
51. Exercise
• Predicting the HIV status (dependent variable) of commercial sex workers using age of sex worker, base (brothel or non-brothel), years in sex work, number of sexual partners, condom use with partners, history of STI and exposure to HIV/AIDS intervention (independent or predictor variables)
52. • A z-test is a statistical test to determine whether two population means are different when the variances are known and the sample size is large.
• In contrast, a t-test is used to compare the means of two groups when the standard deviation or variance is unknown and must be estimated from the sample.
53. • A chi-square test is a statistical test used to
compare observed results with expected results.
• ANOVA, which stands for Analysis of Variance, is a
statistical test used to analyze the difference
between the means of more than two groups.
• The Student's t test is used to compare the
means between two groups, whereas ANOVA is
used to compare the means among three or
more groups.
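As an illustration of comparing observed with expected results, the sketch below computes Pearson's chi-square statistic for a 2-by-2 table. The counts are hypothetical and the function name is illustrative; the P-value shortcut shown applies only to 1 degree of freedom, which is what a 2-by-2 table has.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table: sum of (O - E)^2 / E over the four cells."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # expected count under Ho
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail P-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed are sick
chi2 = chi_square_2x2(30, 70, 10, 90)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```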
54. • The Paired Samples t Test compares the
means of two measurements taken from
the same individual, object, or related
units.
• A paired t-test takes paired observations
(like before and after), subtracts one from
the other, and conducts a 1-sample t-test
on the differences.
• Paired-samples t tests compare scores on
two different variables but for the same
group of cases; independent-samples t
tests compare scores on the same variable
but for two different groups of cases.
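The description of the paired t-test as a 1-sample t-test on the differences can be sketched directly. The before-and-after readings below are invented for illustration; the sketch stops at the t statistic and its degrees of freedom, which would then be compared with the t distribution (from tables or statistical software) to obtain the P-value.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t-test as a 1-sample t-test on the within-pair differences."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    se_diff = statistics.stdev(differences) / math.sqrt(n)   # SE of the mean difference
    return mean_diff / se_diff, n - 1                        # t statistic, degrees of freedom

# Hypothetical systolic blood pressure before and after treatment (same 5 patients)
before = [140, 135, 150, 145, 160]
after = [132, 130, 148, 137, 150]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} on {df} degrees of freedom")
```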
55. • Wilcoxon rank-sum test is used to compare
two independent samples, while Wilcoxon
signed-rank test is used to compare two
related samples, matched samples, or to
conduct a paired difference test of repeated
measurements on a single sample to assess
whether their population mean ranks differ.