Nonparametric Hypothesis Testing Guy Lion December 2005
Nonparametric tests handle variables that are not normally distributed.
When to use these methods <ul><li>With large samples (> 100), even if the variable is not normally distributed, the sample Mean is approximately normally distributed [Central Limit Theorem]. Use a Parametric test. </li></ul><ul><li>Nonparametric tests can be superior for samples with fewer than 100 observations. </li></ul><ul><li>Before proceeding with a nonparametric test, confirm that the variable does not have a normal distribution. A normal distribution has Kurtosis and Skewness close to Zero (check using Excel). </li></ul>
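The normality check above can also be sketched outside Excel. The functions below are a minimal sketch that assumes the sample skewness and excess kurtosis formulas used by Excel's SKEW and KURT; the function names and the income figures are made up for illustration.

```python
import math

def excel_skew(xs):
    """Sample skewness, following the formula Excel's SKEW is assumed to use."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample std dev
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

def excel_kurt(xs):
    """Sample excess kurtosis, following the formula Excel's KURT is assumed to use."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical incomes with one very high value: both measures land far from zero.
incomes = [30000, 32000, 35000, 36500, 38000, 41000, 240000]
print(excel_skew(incomes), excel_kurt(incomes))
```

Values far from zero on either measure suggest the variable is not normally distributed, which is the signal to consider a nonparametric test.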
Sign Test Testing for Differences in Paired data. <ul><li>Count the number of paired data values that are different. This is the Modified Sample Size (MSS). </li></ul><ul><li>Count how many outcome values have increased (or decreased). </li></ul><ul><li>Use the binomial distribution (with a 50% chance of an increase for each pair) to find the probability of a split at least this uneven if the two samples came from populations with identical distributions. </li></ul>
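The three steps above can be sketched in a few lines of Python. This is an illustration only: it assumes a two-sided test, and the function name `sign_test` and the before/after scores are hypothetical.

```python
from math import comb

def sign_test(before, after):
    """Return (MSS, number of increases, two-sided P value) for paired data."""
    # Step 1: keep only the pairs whose values differ (Modified Sample Size).
    diffs = [a - b for b, a in zip(before, after) if a != b]
    mss = len(diffs)
    # Step 2: count how many outcomes increased.
    increased = sum(1 for d in diffs if d > 0)
    # Step 3: binomial tail with p = 0.5 for the more frequent direction,
    # doubled for a two-sided test (an assumption) and capped at 1.
    k = max(increased, mss - increased)
    tail = sum(comb(mss, i) for i in range(k, mss + 1)) / 2 ** mss
    return mss, increased, min(1.0, 2 * tail)

# Hypothetical scores: 8 increases, 1 decrease, 1 unchanged pair.
before = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
after = [6, 6, 6, 6, 6, 6, 6, 6, 4, 5]
print(sign_test(before, after))
```

Dropping the unchanged pair before counting is what turns the 10 original pairs into an MSS of 9.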
Sign Test testing the creativity of an Ad campaign Classic situation where we need to use a nonparametric test. This is because the samples are small (17 observations), and the variables are not normally distributed (check Skewness and Kurtosis).
Sign Test testing the creativity of an Ad campaign (continued)
Mann-Whitney test for Unpaired testing Steps. <ul><li>Put both samples together. Rank values in ascending order. For repeated numbers (ties) across samples, use the average of their ranks so that identical numbers get identical ranks. </li></ul><ul><li>Find the average rank for each sample. </li></ul><ul><li>Calculate the Difference in average rank. </li></ul><ul><li>Find the Standard Error for the difference in average ranks: (n1 + n2) × SQRT[(n1 + n2 + 1) / (12 × n1 × n2)], where n1 and n2 are the two sample sizes. </li></ul><ul><li>Divide the Difference in avg. rank (step 3) by the Standard Error (step 4) to find the test statistic (a Z value). </li></ul><ul><li>Calculate the P Value using NORMSDIST. </li></ul>Note: Steps 2 through 6 are similar to the unpaired t Test, except that they use Ranks instead of Values.
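The six steps above can be sketched as follows. This is a minimal illustration, not the slide author's spreadsheet: the function names and sample incomes are hypothetical, `NormalDist().cdf` stands in for Excel's NORMSDIST, and a two-sided P value is assumed.

```python
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Rank in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_z_test(sample_a, sample_b):
    """Return (avg rank A, avg rank B, Z, two-sided P value)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))    # step 1
    avg_a = sum(ranks[:n1]) / n1                              # step 2
    avg_b = sum(ranks[n1:]) / n2
    diff = avg_a - avg_b                                      # step 3
    se = (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2))     # step 4
    z = diff / se                                             # step 5
    p = 2 * (1 - NormalDist().cdf(abs(z)))                    # step 6 (two-sided)
    return avg_a, avg_b, z, p

# Hypothetical incomes (in $000), 10 observations per sample.
fixed = [28, 31, 33, 36.5, 36.5, 40, 42, 45, 52, 240]
variable = [30, 36.5, 39, 44, 47, 50, 55, 58, 61, 70]
print(rank_z_test(fixed, variable))
```

Because only ranks enter the calculation, the single $240,000 value counts as the top rank and nothing more, which is why the test is less sensitive to outliers than the unpaired t Test.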
Example: Testing income of mortgage applicants Testing whether applicants for fixed-rate mortgages have higher income than applicants for variable-rate mortgages. The fixed-rate applicants have one high income value ($240,000). Kurtosis and Skewness of both samples confirm they are not normally distributed, so the unpaired t test would not work well.
Sorting and Ranking Ranked in ascending order. The figures in yellow are identical ($36,500). They originally ranked 12th, 13th, and 14th, so they all received the tied (average) rank of 13.
P Value (probability difference is due to chance) Based on ranks (not values), the P value is 18.3%: if the two samples came from the same population, a difference in average rank at least this large would arise by chance 18.3% of the time. The Variable Rate mortgage applicants have the higher average rank (17.79 vs 13.50), but with a P value of 18.3% the difference is not statistically significant at the usual 5% level.
Mann-Whitney U. Things to watch for <ul><li>Averaging the tied ranks may not have much impact. Redoing the last example without averaging the ties gives P values of 17.0% or 19.8%, depending on how the yellow figures are ranked. Neither is much different from 18.3%. </li></ul><ul><li>Important caveat. You need at least 10 observations in each of the two unpaired samples to obtain a valid Z value from which to calculate a P value. </li></ul><ul><li>The actual Mann-Whitney U statistic is calculated differently from the method shown here, which follows calculations by Andrew Siegel that reach the same result faster. See the Appendix on the next slide. </li></ul>
Appendix: The actual Mann-Whitney U Calculation
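For reference, here is a minimal sketch of the textbook form of the U statistic and its normal approximation. It is not necessarily the calculation shown on the original slide; the function name is hypothetical and no correction for ties is applied.

```python
from math import sqrt

def mann_whitney_u_z(rank_sum_a, n1, n2):
    """Return (U for sample A, U for sample B, Z) from sample A's rank sum."""
    u_a = rank_sum_a - n1 * (n1 + 1) / 2        # U for sample A
    u_b = n1 * n2 - u_a                         # the two U values sum to n1*n2
    mean_u = n1 * n2 / 2                        # expected U if there is no difference
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation of U (no ties)
    z = (u_a - mean_u) / sd_u
    return u_a, u_b, z

# Sample A holds ranks 1, 2, 3 out of six observations (rank sum 6).
print(mann_whitney_u_z(6, 3, 3))
```

Algebraically, this Z equals the difference in average ranks divided by its Standard Error, which is why the average-rank shortcut gives the same result.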