# Inferential statistics nominal data

Neuroanaesthesiologist (Assistant Professor) at National Institute of Mental Health and Neurosciences, Bangalore
Jan. 28, 2018

### Slide transcript

• 1. Inferential Statistics: Nominal Data. Dhritiman Chakrabarti, Assistant Professor, Dept. of Neuroanaesthesiology and Neurocritical Care, NIMHANS, Bangalore
• 2. How to think while selecting tests • While it's easy to think in terms of groups and variables, it is better to think in the following format: • Categorical vs categorical variables – tests of proportions • Categorical vs quantitative variables – group-based parametric/non-parametric tests • Quantitative vs quantitative variables – correlation, regression
• 3. Selecting the Appropriate Test to use
• 4. [Flowchart for test selection: independent samples with expected frequency ≥ 5, Chi-square; expected frequency < 5, Fisher exact; paired samples with 2 groups, McNemar; >2 groups, Cochran's Q]
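The selection flowchart on this slide can be sketched as a small decision function. This is our own illustrative encoding of the flowchart's branches (the function name and argument names are invented for this sketch), not part of the original slides:

```python
# A sketch of the test-selection flowchart as code. The branch logic
# (paired vs independent samples, number of groups, minimum expected
# frequency) follows the slide; the names are our own.
def choose_test(paired: bool, n_groups: int, min_expected_freq: float) -> str:
    if paired:
        # Paired/repeated measurements of a categorical outcome
        return "McNemar" if n_groups == 2 else "Cochran's Q"
    # Independent samples: expected cell frequencies decide the test
    return "Chi-square" if min_expected_freq >= 5 else "Fisher exact"
```

For example, `choose_test(False, 2, 3.2)` returns `"Fisher exact"`, matching the "ExpFreq < 5" branch of the flowchart.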
• 5. General outline of all these tests • All these tests are based on certain probability distributions: standard normal (Z-test), t-distribution (t-test), F-distribution (ANOVA), chi-square distribution (chi-square test). • The distribution is basically a mathematical equation/function that gives the probability of occurrence of an "effect" if the same experiment were conducted in the same manner an infinite number of times. • This "effect" is encapsulated by calculating a "statistic" (e.g. the t-statistic for a difference of means, the F-statistic for the ratio of between-group variance to within-group variance, the chi-square statistic for a difference of proportions). • The occurrence of this statistic (and hence of the effect) is then compared with a scenario in which no effect exists, at the given df (degrees of freedom, which encapsulate sample size). If the probability of obtaining this statistic at that df is very low, the "no-effect scenario" is rejected; the effect seen was thus unlikely to be due to chance.
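The recipe above (compute a statistic, then ask how probable it is under the no-effect distribution at the given df) can be sketched for the chi-square case. The counts below are hypothetical, made up purely for illustration:

```python
# Sketch of the general logic behind these tests: compute a statistic
# measuring the "effect", then find its probability under the no-effect
# (null) distribution at the given degrees of freedom.
import numpy as np
from scipy.stats import chi2

observed = np.array([[20, 30],
                     [30, 20]])  # hypothetical 2x2 counts

# Expected counts under "no effect" (independence of rows and columns)
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Probability of a statistic at least this large if no effect exists;
# a small value means the "no-effect scenario" is rejected.
p_value = chi2.sf(statistic, df)
```

Here every expected count is 25, the statistic works out to 4.0 on 1 df, and the tail probability is about 0.046, so at the conventional 0.05 threshold the no-effect scenario would be rejected. (No continuity correction is applied in this bare-bones sketch.)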
• 6. You can forget the previous slide if you want to. It's just the conceptual basis for all the tests you do; not knowing it won't hamper you much in practice.
• 7. Testing of Nominal Variables • Organize the data into a contingency table. • Simple scenario: two categorical variables, Exposure (yes/no) and Outcome (yes/no). • The contingency table is 2x2, showing the number/frequency of cases (with A and C the exposed with and without the outcome, and B and D the unexposed with and without the outcome). • If the data come from a cohort study design, you can calculate the risk ratio/relative risk = (A/(A+C)) / (B/(B+D)). • If the data come from a case-control design, you can calculate the odds ratio = AD/BC. • If you only want to find whether a statistical difference in proportions is present, use the Chi-square test/Fisher exact test. • Effect size is measured by the odds ratio.
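The two effect measures on this slide are simple arithmetic on the 2x2 cells. A minimal sketch, using the slide's cell labels and made-up counts:

```python
# Risk ratio and odds ratio from a 2x2 table laid out as on the slide:
#   A = exposed with outcome,    B = unexposed with outcome,
#   C = exposed without outcome, D = unexposed without outcome.
# The counts are hypothetical, for illustration only.
A, B, C, D = 30, 10, 20, 40

risk_exposed = A / (A + C)              # risk of outcome among exposed
risk_unexposed = B / (B + D)            # risk of outcome among unexposed
risk_ratio = risk_exposed / risk_unexposed   # for cohort designs

odds_ratio = (A * D) / (B * C)          # AD/BC, for case-control designs
```

With these counts the exposed risk is 0.6 and the unexposed risk 0.2, giving a risk ratio of 3.0 and an odds ratio of 6.0; the odds ratio always sits further from 1 than the risk ratio when the outcome is common.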
• 8. The Chi-Square test
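The chi-square slide itself is an image in the original deck. As a stand-in, here is how the same test (with the fall-back to Fisher's exact test when expected counts are small, per the selection flowchart) can be run in scipy; the counts are made up for illustration:

```python
# Chi-square test on a 2x2 table, with Fisher's exact test as the
# fall-back when any expected count is below 5. Hypothetical counts.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[30, 10],
                  [20, 40]])

# For 2x2 tables, chi2_contingency applies Yates' continuity
# correction by default.
chi2_stat, p_chi2, df, expected = chi2_contingency(table)

# Rule of thumb from the flowchart: expected count < 5 in any cell
# means the chi-square approximation is unreliable; use Fisher exact.
if (expected < 5).any():
    _, p = fisher_exact(table)
else:
    p = p_chi2
```

For this table the expected counts are 20, 20, 30, 30 (all ≥ 5), so the Yates-corrected chi-square p-value is used, and the difference in proportions is significant at the 0.05 level.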
• 9. Chi-Square on SPSS • The two variables have to be categorical/nominal, in two separate columns; they may be numerically or string coded, but string coding must be uniform (Male ≠ male). • Import the data into SPSS (use the "For Chi-Sq" sheet in Descriptives.xlsx). • Go to Analyze → Descriptive Statistics → Crosstabs. • Put one variable in rows and the other in columns (it does not matter which goes where). • "Statistics" tab → check Chi-square. • "Cells" tab → check "Expected" and the "Row" and "Column" percentages. • Click OK.
• 11. [SPSS output screenshot: read the Pearson Chi-Square p-value when expected counts are ≥ 5, and the Fisher's Exact Test p-value when expected counts are < 5]
• 12. Chi-Sq/OR/RR in Epi Info • If you already have the 2x2 contingency table ready, it is easier to use StatCalc in Epi Info. • Open Epi Info → StatCalc → Tables (2 x 2 x n) tab. • Fill in the contingency table with the values and you get the OR/RR/Chi-square/Fisher exact results ready.
• 13. [Screenshots: filled contingency table and output] Note: the odds ratio is a bit cumbersome to obtain in SPSS (you need to run logistic regression for it). It is easier in StatCalc, but the contingency table needs to be at hand. Note that the p-values in StatCalc and SPSS are the same.
• 14. McNemar's test • A before-and-after test of difference of proportions. • Variable coding should be the same in both the before and after columns. • In SPSS, in Crosstabs, under the Statistics tab, check McNemar; the rest is the same as for chi-square. • If Sig < 0.05, there is a difference of proportions before and after the intervention. • The "statistic" is calculated from the discordant cells of the contingency table (B and C).
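The point that only the discordant cells drive McNemar's test is easy to see in code. A minimal sketch of the classic continuity-corrected statistic, with hypothetical paired counts:

```python
# McNemar's test from the discordant cells of a paired 2x2 table.
# Layout (hypothetical counts): rows = before, columns = after.
#                 after-Good   after-Bad
#   before-Good      a = 20       b = 5
#   before-Bad       c = 15      d = 10
from scipy.stats import chi2

b, c = 5, 15  # only the discordant pairs enter the statistic

# Classic McNemar statistic with continuity correction, 1 df
statistic = (abs(b - c) - 1) ** 2 / (b + c)
p_value = chi2.sf(statistic, 1)
```

Here the statistic is (|5 - 15| - 1)^2 / 20 = 4.05 on 1 df, giving p just under 0.05: the concordant cells a and d never appear, which is exactly the point made on the slide. (For small b + c, an exact binomial version of the test is preferred over this chi-square approximation.)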
• 15. [Output screenshot with p-value] Interpreted as a significantly higher proportion of cases changing outcome from Bad to Good (75%) than from Good to Bad (25%).

### Editor's Notes

1. A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Hypothesis tests are procedures for making rational decisions about the reality of effects. All hypothesis tests proceed by measuring the size of an effect, or the relationship between two variables, by computing a statistic. A theoretical probability model of what that statistic would look like if there were no effect is created from the sampling distribution. The statistic that measures the size of the effect is then compared against this model of no effects. If the obtained value of the statistic is unlikely under the model, the model of no effects is rejected and the alternative hypothesis that there is a real effect is accepted. If the model can explain the results, the model, and with it the hypothesis that there are no effects, is retained, and the alternative hypothesis of real effects is not accepted.