Randomization Tests – the unequal-N, unequal-σ problem

AK Dhamija
Agenda
Assumptions of t, F tests
Randomization tests
Problems of Randomization Test
  Too liberal
  Too conservative
  Computationally Intensive
Solving the problems
  Resampling
  Gill’s algorithm
Assumptions of t, F tests

The two samples are each drawn from
normal distributions.

The two samples are drawn randomly from
their respective populations.

  RANDOMIZATION TESTS TACKLE THESE
  UNREALISTIC ASSUMPTIONS
Randomization tests
An Example Comparing t-Test and Randomization Test Results
    Two fertilizers (A and B) are randomly applied to a type of sunflower seed.
    The maximum heights reached (in feet) are recorded after some time period.
    All other factors are held constant.

Null hypothesis : no difference between fertilizers A and B with respect to sunflower height.
Alternative hypothesis : fertilizer A is superior to fertilizer B on average with respect to sunflower height.

Sample      Fertilizer      Height (ft)
1           A               9.9
2           B               9.6
3           B               9.7
4           B               9.4
5           A               10.1
6           B               9.5
7           A               9.9
8           B               9.6
9           A               9.5
10          A               10.2
11          B               9.4

There are 462 (= 11!/(5! 6!)) equally likely assignments of the 11 heights to the two groups.
5 of the 462 show a mean difference at least as large as the observed 9.920 – 9.533 = 0.387.
p-value = 5/462 = 0.0108 => Reject H0 (the t-test also rejects) => fertilizer A outperforms fertilizer B.
So the t-test provides a reasonably good approximation to the randomization test here.
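The full enumeration above is easy to reproduce; the following is an illustrative sketch (mine, not from the slides) that walks all C(11,5) = 462 assignments with Python's itertools:

```python
from itertools import combinations

# Heights and fertilizer labels from the table above
heights = [9.9, 9.6, 9.7, 9.4, 10.1, 9.5, 9.9, 9.6, 9.5, 10.2, 9.4]
labels  = ["A", "B", "B", "B", "A", "B", "A", "B", "A", "A", "B"]

total_sum = sum(heights)
obs_a_sum = sum(h for h, g in zip(heights, labels) if g == "A")
obs_diff  = obs_a_sum / 5 - (total_sum - obs_a_sum) / 6   # 9.920 - 9.533 = 0.387

# Count assignments whose mean difference is at least the observed one
extreme = 0
n_perms = 0
for idx in combinations(range(11), 5):       # every way to label 5 plants "A"
    a_sum = sum(heights[i] for i in idx)
    diff = a_sum / 5 - (total_sum - a_sum) / 6
    n_perms += 1
    if diff >= obs_diff - 1e-9:              # small tolerance guards float ties
        extreme += 1

p_value = extreme / n_perms
print(extreme, n_perms, round(p_value, 4))   # -> 5 462 0.0108
```

The observed assignment itself is one of the 462, which is why the count can never be zero.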
Randomization tests
Randomization tests do not rely on normality, random sampling, equal variances, or
   other such assumptions.

   The conclusion is based solely on the observed results and the fact that the fertilizers
   were randomly assigned.

Why, then, are randomization tests not widely used, nor addressed in many statistical
  texts?

   The number of computations becomes astronomical with larger sample sizes:
   with two samples, each of size 30, there are over 1.18 × 10^17 possible assignments!

But randomization tests become sensitive to heteroscedasticity when the cells are
   unequal in size.

   Approximate randomization tests (evaluating only a sample of the combinations) are
        Unstable (the statistic varies from run to run)
        Unreplicable (a different random sample of combinations gives a different p-value)
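Both points can be made concrete with a small sketch (assumptions and naming mine, not from the slides): math.comb confirms the 1.18 × 10^17 count, and a Monte Carlo approximate randomization test on the fertilizer data shows how the p-value drifts from run to run:

```python
import math
import random

print(f"{math.comb(60, 30):.2e}")   # two samples of 30 -> ~1.18e+17 assignments

def approx_rand_test(x, y, n_shuffles, seed):
    """One-sided approximate randomization test on mean(x) - mean(y)."""
    rng = random.Random(seed)
    pooled = x + y
    obs = sum(x) / len(x) - sum(y) / len(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                       # random relabelling
        a, b = pooled[:len(x)], pooled[len(x):]
        if sum(a) / len(a) - sum(b) / len(b) >= obs - 1e-12:
            hits += 1
    return hits / n_shuffles

x = [9.9, 10.1, 9.9, 9.5, 10.2]          # fertilizer A heights
y = [9.6, 9.7, 9.4, 9.5, 9.6, 9.4]       # fertilizer B heights
# Different seeds -> different p-values: unstable and unreplicable
print([approx_rand_test(x, y, 1000, s) for s in (1, 2, 3)])
```

The three estimates scatter around the exact 5/462 ≈ 0.0108 but rarely agree, which is exactly the instability the slide describes.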
Randomization tests
Full Randomization Test Problems (similar to t, F tests)
    Too conservative if the larger cell has the larger variance (an unduly large effect is
    required for significance)
    Too liberal if the smaller cell has the larger variance (the true difference is exaggerated)

                                     Variance Ratios (group 1 : group 2)
N    n1,n2     C(N,n1)      1:10    1:4     1:2     1:1     2:1     4:1     10:1
16   8,8        12,870      .0744   .0585   .0594   .045    .0616   .0464   .0656
20   8,14      125,970      .0312   .03     .0319   .058    .0921   .0984   .1152
24   8,16      735,471      .0156   .0158   .0181   .0468   .1222   .1304   .1618
28   8,20    3,108,105      .0072   .0095   .0104   .052    .1414   .1577   .1946
32   8,24   10,518,300      .0042   .0052   .0094   .058    .1631   .2024   .2133

(Empirical rejection rates at a nominal 5% level.)
Randomization tests
Full Randomization Test Problems (similar to t, F tests)
    So the ideal is to keep n1 = n2, but this has practical limitations.

What can be done for N = 32 (8, 24) to bring the rejection level back from 20% to 5%?
  Use BOOTSTRAPPING (computationally intensive):
         Take scores at random (without replacement, say 100 times) from the larger group
         to create a sample of size equal to the smaller group, and run a standard
         randomization test each time, noting whether H0 is rejected at the 5% level.
         The increase is independent of the difference in N
         Curves are averaged over the different variance ratios
         The nominal level is controlled
         The ability to detect a difference depends only on the smaller n
         Resampling corrects the too-liberal behavior (the test remains
         sensitive to true effects)

For the F test with non-Gaussian parent distributions: similar results

Caution: for equal σ and unequal n, resampling is
conservative
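The subsampling remedy above can be sketched as follows. This is an illustrative implementation under my own naming, not the authors' code: the larger cell is repeatedly cut down (without replacement) to the smaller cell's size, an approximate randomization test is run on each balanced pair, and the fraction of rejections is tracked:

```python
import random

def rand_test_p(x, y, n_shuffles=500, rng=None):
    """One-sided approximate randomization p-value for mean(x) - mean(y)."""
    rng = rng or random.Random(0)
    pooled = x + y
    obs = sum(x) / len(x) - sum(y) / len(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = sum(pooled[:len(x)]) / len(x) - sum(pooled[len(x):]) / len(y)
        if diff >= obs - 1e-12:
            hits += 1
    return hits / n_shuffles

def resampled_rejection_rate(small, large, rounds=100, alpha=0.05, seed=42):
    """Subsample the larger cell down to the smaller cell's size `rounds`
    times (without replacement) and report how often H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(rounds):
        sub = rng.sample(large, len(small))   # equal-n subsample of the big cell
        if rand_test_p(small, sub, rng=rng) < alpha:
            rejections += 1
    return rejections / rounds

# Null example: both cells drawn from the same constant -> never rejected
print(resampled_rejection_rate([1.0] * 8, [1.0] * 24))   # -> 0.0
```

Because every round tests an equal-n design, the heteroscedasticity problem of the table above no longer inflates the rejection rate; the price is roughly 100× the computation.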
Randomization tests
Full Randomization Test Problems : bringing the computational cost under control
     Computations : (n1=10, n2=16, equal σ) => C(26,10) = 26!/(16! 10!) = 5,311,735 combinations
                    (larger variance in the smaller cell) => resampling => 100 randomization tests,
                       each involving C(20,10) = 184,756 combinations => 18,475,600 combinations in total
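These counts are easy to verify (a quick sanity sketch, mine):

```python
import math

full = math.comb(26, 10)                 # exhaustive test, n1=10 vs n2=16
per_round = math.comb(20, 10)            # after subsampling to a 10 vs 10 design
print(full, per_round, 100 * per_round)  # -> 5311735 184756 18475600
```

So 100 resampling rounds cost about 3.5× a single full enumeration here; the gap widens rapidly as the cells grow.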

    Gill’s Algorithm : Gill (2007) used a Fourier expansion to count the extreme cases.
                              Under H0, all combinations of the data in a randomization test are equally likely.
                              Compute the proportion of cases as extreme as, or more extreme than, the observed data:
                              one-tail probability = P(T > t) + P(T = t)/2

                              [Gill’s Fourier-expansion formulas did not survive extraction. In them,
                              t_r is the value of the statistic on the r-th combination,
                              k = 2k' – 1 runs over the odd integers (k' = 1, 2, ...),
                              and Im(a) denotes the imaginary part of a.]

Computational cost is brought down to a level practical on a PC (a little more costly than the
   F or t test, but far faster than full enumeration of all combinations).
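Gill's Fourier machinery aside, the mid-p quantity P(T > t) + P(T = t)/2 itself is easy to illustrate by brute force on the fertilizer data from earlier (a hypothetical sketch, mine; since the grand total is fixed, the A-group sum is an equivalent linear statistic to the mean difference):

```python
from itertools import combinations

heights = [9.9, 9.6, 9.7, 9.4, 10.1, 9.5, 9.9, 9.6, 9.5, 10.2, 9.4]
obs_a_sum = 9.9 + 10.1 + 9.9 + 9.5 + 10.2     # observed A-group sum

greater = ties = 0
for idx in combinations(range(11), 5):
    a_sum = sum(heights[i] for i in idx)
    if a_sum > obs_a_sum + 1e-9:              # strictly more extreme: T > t
        greater += 1
    elif abs(a_sum - obs_a_sum) <= 1e-9:      # T = t (includes the observed split)
        ties += 1

mid_p = (greater + ties / 2) / 462            # P(T > t) + P(T = t)/2
print(greater, ties, round(mid_p, 4))         # -> 3 2 0.0087
```

Splitting the ties in half is what makes the mid-p slightly smaller than the ordinary 5/462 = 0.0108; Gill's contribution is computing the same two counts without enumerating all 462 (or 10^17) combinations.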
Conclusion
Assumptions of the t and F tests create problems.
Randomization tests obviate them, but have problems of
their own:
  Too conservative, too liberal, and computationally
  intensive
  The liberal bias can be removed by bootstrapping, but this
  makes the procedure even more computationally intensive
  Gill’s algorithm saves computational cost
  However, the situation is still asymmetric: no algorithm is
  yet known that removes the conservative bias
References

Fisher, Ronald A. (1966). The Design of Experiments (8th ed.). New
York: Hafner Publishing Company Inc.

Mewhort, D. J. K., Kelly, Matthew, & Johns, Brendan T.
“Randomization tests and the unequal-N/unequal-variance
problem.”

Gill, P. M. W. (2007). Efficient calculation of p-values in linear-
statistic permutation significance tests. Journal of Statistical
Computation & Simulation, 77, 55–61.
