2. How to think while selecting tests
• While it is easy to think in terms of groups and variables, it is better to think in the following format (a small lookup sketch follows this list):
• Categorical vs Categorical variables – Tests of
proportions
• Categorical vs Quantitative variables – Group-based parametric/non-parametric tests
• Quantitative vs Quantitative variables –
Correlation, Regression
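As a compact way to remember this mapping, here is a minimal Python sketch; the function name and labels are illustrative, not from the slides:

```python
# Illustrative lookup of the "variable-type pair -> test family" heuristic above.
TEST_FAMILIES = {
    ("categorical", "categorical"): "tests of proportions (Chi-square, Fisher exact)",
    ("categorical", "quantitative"): "group-based parametric/non-parametric tests",
    ("quantitative", "quantitative"): "correlation / regression",
}

def suggest_test_family(type1: str, type2: str) -> str:
    """Return the broad test family for a pair of variable types."""
    key = tuple(sorted((type1, type2)))
    return TEST_FAMILIES.get(key, "unknown combination")

print(suggest_test_family("quantitative", "categorical"))
# -> group-based parametric/non-parametric tests
```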
5. General outline of all these tests
• All these tests are based on certain probability
distributions – Standard normal (Z-test), T-distribution
(t-test), F-distribution (ANOVA), Chi-square distribution
(Chi-sq test).
• The distribution is essentially a mathematical equation/function that gives the probability of observing an “effect” if the same experiment were conducted in the same manner an infinite number of times.
• This “effect” is captured by calculating a “statistic” (e.g. the t-statistic for a difference of means, the F-statistic for comparing between-group variance with total variance, the chi-square statistic for a difference of proportions).
• The occurrence of this statistic (and hence the effect) is then compared with a scenario where no effect exists, at the given df (degrees of freedom, which encapsulate sample size). If the probability of obtaining this statistic at that df is very low, the “no effect scenario” is rejected; thus the effect seen was not due to chance. (A worked numerical example follows this slide.)
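To make the recipe above concrete, here is a minimal Python sketch of a one-sample t-test computed “by hand” and checked against scipy; the sample values are hypothetical and the null hypothesis is that the true mean effect is zero:

```python
import numpy as np
from scipy import stats

# Hypothetical paired differences (e.g. reduction in a measurement after treatment)
x = np.array([4.1, 6.3, 2.8, 5.0, 7.2, 3.9, 4.4, 5.8])

n = len(x)
t_stat = x.mean() / (x.std(ddof=1) / np.sqrt(n))  # the "effect" captured as a t-statistic
df = n - 1                                        # degrees of freedom encapsulate sample size

# Probability of a statistic at least this extreme under the "no effect" t-distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")

# Cross-check with scipy's built-in one-sample t-test
print(stats.ttest_1samp(x, popmean=0))
```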
6. You can forget the previous slide if you want to. It is just the conceptual basis for all the tests you do; not knowing it will not hamper you much.
7. Testing of Nominal Variables
• Organize the data into a contingency table.
• Simple scenario – two categorical variables – Exposure (yes/no), Outcome (yes/no).
• The contingency table is 2×2, showing the number/frequency of cases.
• If the data are from a cohort study design → can calculate the Risk Ratio/Relative Risk = [A/(A+C)] / [B/(B+D)].
• If the data are from a case-control design → can calculate the Odds Ratio = AD/BC.
• If you only want to find out whether a statistical difference in proportions is present → Chi-Square test/Fisher Exact test.
• Effect size is measured by the Odds Ratio. (A sketch computing RR and OR follows this slide.)
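A minimal sketch of the RR and OR calculations above, assuming the cell layout implied by the formulas (columns = exposed/unexposed, rows = outcome present/absent, so A = exposed with outcome, B = unexposed with outcome, C = exposed without outcome, D = unexposed without outcome); the counts are hypothetical:

```python
# Hypothetical 2x2 counts
A, B = 40, 20   # outcome present: exposed, unexposed
C, D = 60, 80   # outcome absent:  exposed, unexposed

# Risk Ratio (cohort design): risk among exposed vs risk among unexposed
rr = (A / (A + C)) / (B / (B + D))

# Odds Ratio (case-control design)
odds_ratio = (A * D) / (B * C)

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```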
9. Chi-Square on SPSS
• The two variables have to be categorical/nominal in
two separate columns – May be numerically or string
coded. But string code should be uniform. (Male ≠
male)
• Import the data into SPSS → use the “For Chi-Sq” sheet in Descriptives.xlsx.
• Go to Analyze → Descriptive Statistics → Crosstabs.
• Put one variable in Row(s) and the other in Column(s) (it does not matter which goes where).
• “Statistics” tab → check Chi-square.
• “Cells” tab → check “Expected” and the “Row” and “Column” percentages.
• Click OK. (An equivalent computation outside SPSS is sketched after this slide.)
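For reference outside SPSS, here is a minimal Python sketch of the same test on a 2×2 table using scipy (counts hypothetical); the expected counts it returns are what decide between the chi-square and Fisher exact p-values:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows = outcome yes/no, columns = exposed/unexposed
table = np.array([[40, 20],
                  [60, 80]])

chi2, p, df, expected = chi2_contingency(table)
print(f"Chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")
print("Expected counts:\n", expected)

# If any expected count is below 5, prefer Fisher's exact test (2x2 tables only)
if (expected < 5).any():
    _, p_exact = fisher_exact(table)
    print(f"Fisher exact p = {p_exact:.4f}")
```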
11. [SPSS Crosstabs output screenshot, annotated: the Chi-Square p-value is used when expected counts are ≥ 5, and the Fisher Exact p-value when an expected count is < 5.]
12. Chi-Sq/OR/RR in Epi-Info
• If you have the 2×2 contingency table ready, it is easier to use EpiInfo's StatCalc.
• Open EpiInfo → StatCalc → TABLES (2 x 2 x N) tab.
• Fill in the contingency table with the values and you get the OR/RR/Chi-Sq/Fisher Exact results ready.
13. StatCalc: fill in the contingency table → output
Note: The odds ratio is a bit cumbersome to obtain in SPSS – you need to run logistic regression for it (a sketch follows this slide). It is easier in StatCalc, but you need the contingency table at hand. Note that the p-values in StatCalc and SPSS are the same.
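To illustrate the note about needing logistic regression for the OR, here is a hedged Python sketch using statsmodels on the same hypothetical counts (A = 40, B = 20, C = 60, D = 80); exponentiating the exposure coefficient reproduces AD/BC:

```python
import numpy as np
import statsmodels.api as sm

# Expand the hypothetical 2x2 counts into individual records
exposed = np.r_[np.ones(40), np.zeros(20), np.ones(60), np.zeros(80)]
outcome = np.r_[np.ones(60), np.zeros(140)]   # 40 + 20 with outcome, 60 + 80 without

X = sm.add_constant(exposed)                  # intercept + exposure indicator
fit = sm.Logit(outcome, X).fit(disp=0)

odds_ratio = np.exp(fit.params[1])            # exp(beta_exposure) = odds ratio
print(f"OR from logistic regression = {odds_ratio:.2f}")   # ≈ 2.67 = (40*80)/(20*60)
```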
14. McNemar’s test
• A before-and-after test of the difference of proportions.
• Variable coding should be the same in both the before and after columns.
• In SPSS, in Crosstabs, under the Statistics tab, check McNemar. Everything else is the same as for chi-square.
• If Sig < 0.05, there is a difference of proportions before and
after the intervention.
• The “statistic” is calculated from the discordant cells of the contingency table (B and C). (A sketch follows this slide.)
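A minimal Python sketch of McNemar's test with statsmodels (paired counts hypothetical, chosen to match the 75% vs 25% discordant split interpreted on the next slide); only the discordant cells B and C drive the statistic:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired before/after table: rows = before (Good, Bad), columns = after (Good, Bad)
table = np.array([[30, 10],    # before Good: stayed Good, changed Good -> Bad (B = 10)
                  [30, 20]])   # before Bad:  changed Bad -> Good (C = 30), stayed Bad

result = mcnemar(table, exact=True)   # exact binomial test on the discordant pairs
print(f"statistic = {result.statistic}, p-value = {result.pvalue:.4f}")
# Discordant split: 30/(30+10) = 75% Bad -> Good vs 10/(30+10) = 25% Good -> Bad
```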
15. Interpretation: a significantly higher proportion of cases changed outcome from Bad to Good (75%) than from Good to Bad (25%), as indicated by the McNemar p-value in the output.
Editor's Notes
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
Hypothesis tests are procedures for making rational decisions about the reality of effects. All hypothesis tests proceed by measuring the size of an effect, or the relationship between two variables, by computing a statistic. A theoretical probability model, or distribution, of what that statistic would look like if there were no effects is created using the sampling distribution. The statistic that measures the size of the effect is compared to this model of no effects. If the obtained value of the statistic is unlikely given the model, the model of no effects is rejected and the alternative hypothesis that there are real effects is accepted. If the model could explain the results, the model of no effects is retained and the alternative hypothesis that there are real effects is not accepted.