2. Inferential statistics
• Comparison of two (or more)
variables
• Qual. vs qual., e.g. hypertension vs smoking
(count/proportions)
• Quant. vs qual., e.g. BP vs sex
• Quant. vs quant., e.g. BP vs weight
(metric/interval data)
• Drawing inference from the sample
for our population of interest
3. Scatter plots
• A way of portraying a
relationship between two
quantitative variables
[Figure: example scatter plots showing a linear relationship, a non-linear relationship and no relationship]
• Correlation and regression
4. Regression and correlation
• Analyze the association between two quantitative variables
• Assume independent observations
• Assume a linear relationship
• Allow hypothesis testing of the relationship, i.e. drawing inferences about the population
• Regression: gives the ‘best-fit’ line to the data
• Correlation: gives a measure of scatter of data points around this line
5. Regression line: y = bx + a
Here b is the slope and a is the intercept of the line.
Least squares method: the line is fitted to minimize the sum of the squared vertical distances of the observed values from the line
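The least squares fit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `least_squares_fit` and the weight/BP values are hypothetical and do not come from the slides.

```python
def least_squares_fit(x, y):
    """Return (b, a) for the line y = b*x + a that minimizes the
    sum of squared vertical distances from the observed points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx          # slope (regression coefficient)
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return b, a

# Hypothetical data (not from the slides): weight in kg, systolic BP in mmHg
weight = [60, 70, 80, 90]
bp = [112, 118, 132, 138]
b, a = least_squares_fit(weight, bp)
print(f"BP = {b:.2f} * weight + {a:.2f}")
```

For these made-up values the slope is 0.92, i.e. BP rises by about 0.92 mmHg for each extra kg of weight, and the intercept is 56.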
6. The regression equation
• Gives the ‘best fit’ line to the data
• The regression coefficient ‘b’:
• Measures the relationship
between variables
• ‘amount of change in the
variable y for a unit change in
x’
• Positive for a direct
relationship and negative for
an inverse one
7. Correlation
• The regression equation measures the average relationship between two variables
• Correlation gives the strength, or goodness of fit, of that relationship
• Correlation coefficient (Pearson’s) r: lies between -1 and +1
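As a rough sketch, Pearson’s r can be computed from the same sums of squares used for the regression line. The function name `pearson_r` and the data are hypothetical, chosen only to show that a direct relationship gives a positive r and an inverse one gives a negative r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (not from the slides)
weight = [60, 70, 80, 90]
bp = [112, 118, 132, 138]

r = pearson_r(weight, bp)                # direct relationship: r close to +1
r_inverse = pearson_r(weight, bp[::-1])  # reversed y values: r close to -1
print(round(r, 3), round(r_inverse, 3))
```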
8. Coefficient of determination: r²
• Interpreted as the proportion (usually quoted as a percentage) of the total variation in the dependent variable (y) that is explained by the regression line, i.e. by variation in the independent variable (x) alone
• r² of 1 would imply that 100 percent of the variation in y is explained by variation in x
• Values less than 1 imply that part of the variation in y is left unexplained, e.g. by other ‘unknown’ variables or random error
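One way to see the "explained variation" reading is to compute r² as 1 minus the ratio of residual to total variation. The sketch below assumes simple linear regression with one x; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    # Total variation of y about its mean
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation left over after fitting the line (unexplained)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total

# Hypothetical data (not from the slides)
weight = [60, 70, 80, 90]
bp = [112, 118, 132, 138]
r2 = r_squared(weight, bp)
print(f"r squared = {r2:.3f}")
```

With these made-up values about 97 percent of the variation in BP is accounted for by the fitted line; the remaining 3 percent is the residual scatter around it.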
9. Hypothesis testing
• The sample statistics b and r are used to make inferences on the
population parameters
• Assumptions for valid inferences:
o Independent data (two scatter points are independent)
o Linear relationship in mean of y vs x
o Distribution of y normal for each x
o Variances the same at each x
• Confidence intervals and p values are obtained based on the t distribution
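For reference, the standard t statistics for simple linear regression can be written as follows. The slides do not give these formulas, so the notation here (SE(b), n for the number of pairs) is an assumption; both statistics are referred to a t distribution with n - 2 degrees of freedom.

```latex
% Test of H0: population slope = 0
t_b = \frac{b}{SE(b)}, \qquad
SE(b) = \sqrt{\frac{\sum_i (y_i - a - b x_i)^2 / (n-2)}{\sum_i (x_i - \bar{x})^2}}

% Test of H0: population correlation = 0
t_r = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}}

% Confidence interval for the slope
b \pm t_{n-2,\, 1-\alpha/2} \cdot SE(b)
```

In simple linear regression the two statistics take the same value, so testing the slope and testing the correlation lead to the same p value.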
10. When the assumptions
do not hold
• Residual analysis
• Polynomial regression:
y = a + bx + cx²
• Data transformations
• Rank correlation: if
data transformation
fails
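Rank correlation (Spearman’s) can be sketched as Pearson’s r applied to the ranks of the data. This is a minimal illustration: the helper names `average_ranks` and `spearman_rho` and the assessor scores are hypothetical, and tied values are given the average of the ranks they span.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    s_xy = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    s_xx = sum((a - mean_rx) ** 2 for a in rx)
    s_yy = sum((b - mean_ry) ** 2 for b in ry)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical scores given by two assessors to the same five subjects
assessor_a = [55, 62, 70, 78, 90]
assessor_b = [60, 58, 75, 72, 95]
rho = spearman_rho(assessor_a, assessor_b)
print(round(rho, 3))
```

Because only the ordering matters, any monotone transformation of the scores leaves the coefficient unchanged, which is why it is a fallback when transformations fail to give a linear relationship.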
12. Significance test on Spearman’s ρ
• The test statistic is the coefficient ρ (also written rs) itself
• If the calculated coefficient lies within the limits ± rc (the critical value) given in the table for
o ‘n’ pairs (10)
o two-sided significance level α (5 %)
then the null hypothesis (that there is no actual correlation) can’t be
rejected
For the example the critical value is ± 0.6485, so it is concluded that there
is no difference between the ranks assigned by the two assessors
14. Wilcoxon rank-sum test / Mann-Whitney U test
• Used when the normality assumption doesn’t hold, esp. for small samples
• Hypothesis test of whether the values in one of two independent groups tend to be larger than those in the other
• Ranks are assigned to the values used for comparison
• Assumptions:
• Sample is randomly drawn
• Observations are independent
15. Steps
1. Rank all the values irrespective of the particular group
2. Sum the ranks in each group
[Table: original values and their ranks; rank sums W1 = 52, W2 = 101]
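The two steps can be sketched as follows. The helper names and the data are hypothetical (the original values behind W1 and W2 are not reproduced in the text). A useful check is that the two rank sums always add up to N(N+1)/2 for N observations in total; the slide's values, 52 + 101 = 153 = 17 × 18 / 2, are consistent with 17 observations.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(group_1, group_2):
    """Step 1: rank all values together. Step 2: sum the ranks per group."""
    combined = list(group_1) + list(group_2)
    ranks = average_ranks(combined)
    n1 = len(group_1)
    return sum(ranks[:n1]), sum(ranks[n1:])

# Hypothetical data (not from the slides)
group_1 = [3, 5, 8]
group_2 = [4, 9, 10, 12]
w1, w2 = rank_sums(group_1, group_2)
n_total = len(group_1) + len(group_2)
print(w1, w2, n_total * (n_total + 1) / 2)
```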
16. U statistic
Decision is based on the value of U
• For one-tailed: u1 or u2
• For two-tailed: u = min(u1, u2)
Reject the null hypothesis whenever the test statistic (u, u1 or u2) is less than the critical value
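The formula for U is not reproduced in the text. A commonly used form, assumed here, subtracts the smallest possible rank sum from each group's observed rank sum: u1 = W1 - n1(n1+1)/2 and u2 = W2 - n2(n2+1)/2, so that u1 + u2 = n1 × n2. Some textbooks label u1 and u2 the other way round, but min(u1, u2) is the same under either convention. The function name and the numbers are hypothetical.

```python
def mann_whitney_u(w1, n1, w2, n2):
    """Return (u1, u2) from the rank sums w1, w2 and group sizes n1, n2."""
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return u1, u2

# Hypothetical rank sums for groups of size 3 and 4 (not from the slides)
u1, u2 = mann_whitney_u(w1=8, n1=3, w2=20, n2=4)
u = min(u1, u2)  # statistic used for the two-tailed test
print(u1, u2, u)
```

The value of u is then compared with the tabulated critical value for the given n1, n2 and significance level.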
17. Comparing two paired groups: Wilcoxon signed-rank test
Paired tests are used when the observations between groups are
dependent in some way:
• Variable is measured before and after an intervention
• Subjects are recruited as matched pairs (such as for age, sex, co-morbidities)
• ‘Twins’ or siblings recruited as pairs
• ‘Right-left pairs’, e.g. different treatments for the right and left eye
Assumption: each pair is chosen at random and is independent of the other pairs
18. Wilcoxon signed-rank test
• Non-parametric test for paired data sets
• Tests the hypothesis that there is no difference between two paired groups
Steps:
• Calculate the difference between each matched pair, keeping track of the sign
• Rank the absolute values of the differences, ignoring the sign
• Sum the ranks of the ‘positive’ and ‘negative’ differences separately
• Calculate the test statistic and compute the p-value
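The steps above can be sketched as follows. Several details are assumptions rather than something stated on the slides: pairs with a zero difference are dropped (a common convention), tied absolute differences share the average rank, and the smaller of the two rank sums is taken as the test statistic. The p-value lookup is left out because it needs a table or an approximation. All names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(before, after):
    """Return (w_plus, w_minus): rank sums of positive and negative differences."""
    diffs = [b - a for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]           # assumption: drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])  # rank |d|, ignoring the sign
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical systolic BP before and after an intervention (not from the slides)
before = [140, 135, 150, 160, 145]
after = [132, 136, 144, 150, 148]
w_plus, w_minus = signed_rank_sums(before, after)
t_stat = min(w_plus, w_minus)
print(w_plus, w_minus, t_stat)
```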
19. Kruskal-Wallis test
• Similar to one-way ANOVA; an extension of the Mann-Whitney U test
• Non-parametric test for comparing the medians of more than
two groups of observations for a given variable
• Ranks are given to all the observations together, and then the sum of the ranks for each group is calculated
• Test statistic: H approximately follows a chi-square distribution with df = k-1, where k is the number of groups
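The formula for H is not given in the text. The usual form, assumed here, is H = 12 / (N(N+1)) × Σ(Rj² / nj) - 3(N+1), where Rj is the rank sum and nj the size of group j, and N is the total number of observations. The sketch below omits the correction for ties, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, df) for a list of groups; no correction for ties."""
    combined = [v for g in groups for v in g]
    ranks = average_ranks(combined)  # rank all observations together
    n_total = len(combined)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical measurements in three groups (not from the slides)
groups = [[12, 15, 11], [18, 20, 17], [25, 23, 27]]
h, df = kruskal_wallis_h(groups)
print(round(h, 2), df)
```

H is then compared with the chi-square distribution on df degrees of freedom; with groups as small as in this toy example the chi-square approximation is rough.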
20. Summary: non-parametric tests
• Nonparametric tests are less powerful: ‘some information is discarded
while using ranks’
• Sample size: compute the sample size for the corresponding parametric test and add
15%
• Nonparametric tests are usually not reported with CIs
• Nonparametric tests are not readily extended to regression models
21. Choice of test by variable type (paired-test counterpart in parentheses)
• Quantitative variable; 2 groups (mean or median)
o Parametric test: unpaired t test (paired t test)
o Non-parametric test: Mann-Whitney U test (Wilcoxon signed-rank test)
• Quantitative variable; > 2 groups (mean or median)
o Parametric test: one-way ANOVA (repeated measures ANOVA)
o Non-parametric test: Kruskal-Wallis test (Friedman test)
• Categorical variable/proportions
o Chi-square test (McNemar test)