Non-Parametric Tests
Ryan Sain, Ph.D.

Non-parametric tests
  • Used in place of parametric statistics when your data are not normally distributed
  • They apply specific adjustments and procedures so that non-normality does not affect the result
  • Typically do not use the mean to make comparisons
  • Most rank the raw scores, then analyze the ranks

Independent samples
  • Comparing two groups of independent samples
  • Non-parametric counterpart to the independent-samples t-test
  • Two equivalent tests: Mann-Whitney and Wilcoxon rank-sum

Rank logic
  • Ignoring the specific groups, we rank all data from lowest (1st) to highest (nth)
  • If the groups are the same, you would expect similar ranks in each group
  • The sums of these ranks will likely be similar if no difference between groups exists
  • If the groups ARE different, you would expect a disproportionate set of ranks in one group compared to the other, and the sums of those ranks would differ
  • Identical raw scores each get the average of the ranks they occupy (tied ranks)
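
The ranking steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full Mann-Whitney implementation; the function names and the scores are invented for the example.

```python
def average_ranks(values):
    """Rank values from lowest (1) to highest (n); tied scores share the mean rank."""
    sorted_vals = sorted(values)
    # A value first appears at 0-based position index(v) and occupies count(v)
    # consecutive ranks, so the mean of those ranks is index + (count + 1) / 2.
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]


def rank_sums(group1, group2):
    """Pool both groups, rank the pooled scores, then sum the ranks per group."""
    ranks = average_ranks(list(group1) + list(group2))
    n1 = len(group1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Hypothetical scores, invented for illustration only.
group_a = [3, 5, 5, 8]
group_b = [9, 10, 12, 5]
w_a, w_b = rank_sums(group_a, group_b)
print(w_a, w_b)  # 12.0 24.0
```

The three scores of 5 occupy ranks 2, 3 and 4, so each receives rank 3. The two sums always add up to n(n+1)/2 for the pooled sample (here 36), which is a quick check on the ranking.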
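
Standardizing and significance
  • We can calculate the mean (expected) rank sum for group 1 using the n of each group: $\bar{W} = \frac{n_1(n_1 + n_2 + 1)}{2}$
  • But we still need a standard error: $SE_{\bar{W}} = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
  • Convert the raw rank sum to z using the mean and standard error from above: $z = \frac{W - \bar{W}}{SE_{\bar{W}}}$
  • Magical ±1.96: a z beyond ±1.96 is significant at the .05 level (two-tailed)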
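
As a rough sketch of the standardizing step, the code below converts a rank sum to z using the mean and standard error formulas from this slide. The function name and the input values (a rank sum of 12 with four people per group) are assumptions made up for the example, and with groups this small the normal approximation is only illustrative.

```python
import math


def rank_sum_z(w, n1, n2):
    """Convert the rank sum W of group 1 to a z-score."""
    mean_w = n1 * (n1 + n2 + 1) / 2               # expected rank sum for group 1
    se_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard error of the rank sum
    return (w - mean_w) / se_w


# Hypothetical values: rank sum of 12 for group 1, four people per group.
z = rank_sum_z(12.0, 4, 4)
significant = abs(z) > 1.96  # two-tailed test at the .05 level
print(round(z, 3), significant)  # -1.732 False
```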

Two related conditions
  • Wilcoxon signed-rank test
  • Used when the data are related (repeated measures of the same individuals)
  • Non-parametric counterpart to the dependent t-test
  • Rank the size of each person's change between test 1 and test 2, then give the rank a negative sign if that person's score dropped
  • Drop all people whose scores did not change
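
The signed-rank procedure can be sketched as follows. This is a minimal illustration under assumed data: the function names and the test scores are hypothetical, and the code stops at the positive and negative rank sums rather than computing a p-value.

```python
def average_ranks(values):
    """Rank values from lowest (1) to highest (n); tied scores share the mean rank."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]


def signed_rank_sums(test1, test2):
    """Return the sums of the positive and negative ranks of the changes."""
    # Change per person; people who did not change are dropped.
    diffs = [b - a for a, b in zip(test1, test2) if b != a]
    # Rank the size of each change, ignoring its direction.
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # scores that rose
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)  # scores that dropped
    return t_plus, t_minus


# Hypothetical repeated-measures scores, invented for illustration only.
test1 = [10, 12, 9, 14, 11]
test2 = [12, 12, 8, 18, 14]
t_plus, t_minus = signed_rank_sums(test1, test2)
print(t_plus, t_minus)  # 9.0 1.0
```

The second person did not change and is dropped, leaving four differences whose ranks sum to 10. A large imbalance between the two sums suggests a consistent direction of change.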

Testing multiple groups
  • Kruskal-Wallis
  • Uses the same ranking logic as the Mann-Whitney
  • Is akin to an ANOVA
  • It is an omnibus test as well: it tells you that a difference exists, not where it lies
  • Follow up with post hoc Mann-Whitney (Wilcoxon rank-sum) tests
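
The slide does not give a formula, so the sketch below assumes the standard Kruskal-Wallis statistic, H = 12 / (N(N+1)) × Σ(R²/n) − 3(N+1), with no correction for ties. The function names and scores are hypothetical. H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom.

```python
def average_ranks(values):
    """Rank values from lowest (1) to highest (n); tied scores share the mean rank."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from pooled ranks (assumed formula, no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical scores for three independent groups, invented for illustration.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2))  # 7.2
```

With three groups there are 2 degrees of freedom, and the .05 critical value of chi-square is 5.99, so an H of 7.2 would count as significant in this made-up example.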

Categorical Data
  • Categorical data place each case into one and only one category
  • Examples: gender, pregnancy, voting
  • We have looked at using categorical data for predicting something (point-biserial correlation), but now we want to examine the relationship between two categorical variables

The logic
  • There is no mean or median to work with
  • The values are arbitrary
  • All we can really look at are frequencies of occurrence

Chi-square
  • Two categorical variables
  • Example: pregnancy and contraception used
  • How likely is it that our observations are due to chance alone?
  • We cannot look at means, we can only look at frequencies, so we need to find the expected values
Contingency table

Expected distributions
  • So we look at what is expected in each cell
  • We cannot use n / (number of cells) to get this, because there is a different number of people in each condition
  • So we make an adjustment: $E = \frac{\text{row total} \times \text{column total}}{n}$
  • $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over every cell (O = observed, E = expected)
  • This statistic can then be looked up in a probability table
  • We can then decide whether the observed distribution is what chance would produce
  • Degrees of freedom: $df = (\text{rows} - 1)(\text{columns} - 1)$
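
The calculation above can be sketched as a short function. This is an illustration only: the function name is hypothetical and the 2 × 2 counts are invented, not taken from the lecture's contingency table.

```python
def chi_square(observed):
    """Chi-square statistic, degrees of freedom and expected counts for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count in each cell: row total x column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical 2 x 2 table of counts, invented for illustration only.
observed = [[30, 10],
            [20, 40]]
stat, df, expected = chi_square(observed)
print(expected)             # [[20.0, 20.0], [30.0, 30.0]]
print(round(stat, 2), df)   # 16.67 1
```

Note that the expected counts differ between rows (20 versus 30) because the row totals differ, which is why dividing n evenly across the cells would not work here.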

A sample
  • $\chi^2 = \frac{(20 - 70)^2}{70} + \ldots$
  • $\chi^2 = 35.71 + 35.71 + 31.25 + 31.25$
  • $\chi^2 = 133.92$ with 1 df (four cells form a 2 × 2 table, so $(2 - 1)(2 - 1) = 1$)

Assumptions
  • No repeated-measures situations: each person contributes to only one cell
  • Expected frequencies should be greater than 5 in each cell

Conclusion
  • If you have categorical data and want to know whether the observed distribution could have arisen by chance, use the chi-square analysis.
