1. 04-09-2012
Understanding Equivalent calculations
Test statistic: $U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$ = # of $(X_i, Y_j)$ pairs where $X_i < Y_j$
Under $H_0$: $\mu_U = \frac{n_1 n_2}{2}$, $\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$, and $\frac{U - \mu_U}{\sigma_U}$ has approximately N(0,1) distribution
$R_1 + R_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2}$
Under $H_0$: $E(R_1) = \frac{n_1(n_1 + n_2 + 1)}{2}$
Understanding Equivalent calculations (cont.)
The p-value, and hence the conclusion, would be the same irrespective of what form of the test statistic is used, i.e. $R_1$ or $R_2$ or $U$ or $\tilde{U}$
But need to be careful on (a) left or right tailed, depending on the form (b) mean to be subtracted (S.E. would be the same)
$\tilde{U} = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$ = # of $(X_i, Y_j)$ pairs where $X_i > Y_j$
$U + \tilde{U} = n_1 n_2 \Rightarrow \tilde{U} - \frac{n_1 n_2}{2} = \frac{n_1 n_2}{2} - U$
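The equivalence between the rank-sum forms and the pair-counting forms can be checked numerically. The following is a minimal Python sketch, assuming no ties; the sample values `x` and `y` and the helper names `rank_sums` and `u_statistics` are hypothetical and not taken from the slides.

```python
def rank_sums(x, y):
    """Return (R1, R2): rank sums of x and y in the pooled sample (no ties assumed)."""
    pooled = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in x), sum(rank[v] for v in y)


def u_statistics(x, y):
    """Return (U, U_tilde) computed from the rank sums R1 and R2."""
    n1, n2 = len(x), len(y)
    r1, r2 = rank_sums(x, y)
    u = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u_tilde = n1 * n2 + n2 * (n2 + 1) // 2 - r2
    return u, u_tilde


# Hypothetical illustration data (not from the slides)
x = [12, 15, 19, 24, 30]
y = [10, 17, 21, 26]

r1, r2 = rank_sums(x, y)
u, u_tilde = u_statistics(x, y)

# Direct pair counts for comparison
pairs_less = sum(1 for xi in x for yj in y if xi < yj)
pairs_greater = sum(1 for xi in x for yj in y if xi > yj)

print(r1, r2)                      # rank sums
print(u, pairs_less)               # both count pairs with X_i < Y_j
print(u_tilde, pairs_greater)      # both count pairs with X_i > Y_j
print(u + u_tilde, len(x) * len(y))
```

Because $U + \tilde{U} = n_1 n_2$, a left-tailed test on one form corresponds to a right-tailed test on the other, which is the caution raised on the slide.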
Numerical illustration: small sample
Q. Do Model B planes fly faster? (modified ex14.4)
Travel time in two models of copter-planes:
Model A: 35 38 40 43   $n_1 = 4$
Model B: 27 29 36      $n_2 = 3$
$R_1 = 3 + 5 + 6 + 7 = 21$, $R_2 = 1 + 2 + 4 = 7$
$U = 12 + \frac{4 \times 5}{2} - 21 = 1$. Easier to note this directly from pairs!
p-value = $P(U \le 1) = 0.0571$ (Table 9 in P798)
Note that the distribution of U is symmetric (about…?) under the null hypothesis
Can you see why U = No. of $(X_i, Y_j)$ pairs with $X_i < Y_j$ $= n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$?
Let the rank of X obs. be $r_1, \dots, r_{n_1}$. $R_1 = r_1 + \dots + r_{n_1}$
An X with rank $r$ has $n_1 + n_2 - r$ observations above it; removing the X's above it (0 for the largest X, 1 for the next, up to $n_1 - 1$ for the smallest) leaves the Y's above it.
$U = (n_1 + n_2 - r_1) + \dots + (n_1 + n_2 - r_{n_1}) - \frac{n_1(n_1-1)}{2}$
$= n_1(n_1 + n_2) - R_1 - \frac{n_1(n_1-1)}{2}$
$= n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$
14-34 in Aczel-Sounderpandian
Test if the (average) current ratio for the 3 industries are the same.
Industry  Current ratio                             mean   sd
A         1.38 1.55 1.9 2 1.22 2.11 1.98 1.61       1.719  0.324
B         2.33 2.5 2.79 3.01 1.99 2.45              2.512  0.356
C         1.06 1.37 1.09 1.65 1.44 1.11             1.287  0.238
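The small-sample p-value for the Model A / Model B travel times in the numerical illustration can be reproduced by enumerating every possible set of ranks for Model A. The following is a minimal Python sketch, assuming no ties and that under $H_0$ all rank assignments are equally likely; the variable names are illustrative only. It counts U directly from pairs and also checks that the null counts are symmetric about $n_1 n_2 / 2$, the null mean $\mu_U$.

```python
from fractions import Fraction
from itertools import combinations

model_a = [35, 38, 40, 43]   # travel times from the slide
model_b = [27, 29, 36]
n1, n2 = len(model_a), len(model_b)

# U counted directly from pairs: number of (X_i, Y_j) with X_i < Y_j
observed_u = sum(1 for a in model_a for b in model_b if a < b)

# Under H0 each choice of n1 ranks out of n1 + n2 is equally likely for Model A
null_counts = {}
for ranks in combinations(range(1, n1 + n2 + 1), n1):
    u = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks)
    null_counts[u] = null_counts.get(u, 0) + 1

total = sum(null_counts.values())
p_value = Fraction(
    sum(count for u, count in null_counts.items() if u <= observed_u), total
)

# Symmetry check: the count at u equals the count at n1*n2 - u
symmetric = all(
    null_counts[u] == null_counts[n1 * n2 - u] for u in null_counts
)

print(observed_u)                       # 1
print(p_value, round(float(p_value), 4))  # 2/35 0.0571
print(symmetric)                        # True
```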
Kruskal-Wallis Test
• For comparing means of more than 2 populations (alternative to ANOVA)
• Use if data is ordinal or the assumptions of ANOVA are violated
• Pool all observations and rank them
• Compute total of the ranks of observations coming from 1st, 2nd, 3rd… populations
• Null distribution is approximately Chi-square with k-1 d.f.
• T.S. is $\frac{12}{n(n+1)} \sum_i \frac{R_i^2}{n_i} - 3(n+1)$
Solving 14-34 using Kruskal-Wallis
Industry  Current ratio  Rank
A         1.38           6
A         1.55           8
A         1.9            11
A         2              14
A         1.22           4
A         2.11           15
A         1.98           12
A         1.61           9
B         2.33           16
B         2.5            18
B         2.79           19
B         3.01           20
B         1.99           13
B         2.45           17
C         1.06           1
C         1.37           5
C         1.09           2
C         1.65           10
C         1.44           7
C         1.11           3
Industry  Rank sum  Sample size  R^2/n
A         79        8            780.125
B         103       6            1768.167
C         28        6            130.6667
total     210       20           2678.958
T.S. is $\frac{12 \times 2678.96}{20 \times 21} - 3 \times 21 = 13.54$
p-value = $P(\chi^2_{2\,\mathrm{df}} > 13.54) = 0.001$
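The Kruskal-Wallis calculation for problem 14-34 can be retraced step by step. The following is a minimal Python sketch, assuming no ties (true for this data) and using the fact that the chi-square upper-tail probability with 2 d.f. is $e^{-x/2}$; the variable names are illustrative only.

```python
import math

# Current ratios by industry, as given in the slide
samples = {
    "A": [1.38, 1.55, 1.9, 2, 1.22, 2.11, 1.98, 1.61],
    "B": [2.33, 2.5, 2.79, 3.01, 1.99, 2.45],
    "C": [1.06, 1.37, 1.09, 1.65, 1.44, 1.11],
}

# Pool all observations and rank them (no ties in this data)
pooled = sorted(v for values in samples.values() for v in values)
rank = {value: i + 1 for i, value in enumerate(pooled)}
n = len(pooled)

# Rank sum R_i for each industry
rank_sums = {g: sum(rank[v] for v in values) for g, values in samples.items()}

# Test statistic: 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)
sum_r2_over_n = sum(rank_sums[g] ** 2 / len(samples[g]) for g in samples)
test_statistic = 12 / (n * (n + 1)) * sum_r2_over_n - 3 * (n + 1)

# Chi-square upper tail with k - 1 = 2 d.f. has the closed form exp(-x / 2)
p_value = math.exp(-test_statistic / 2)

print(rank_sums)                  # {'A': 79, 'B': 103, 'C': 28}
print(round(test_statistic, 2))   # 13.54
print(round(p_value, 3))          # 0.001
```

For a number of groups other than 3 the closed-form tail no longer applies, and a chi-square table or a statistics library would be needed for the p-value.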
2. 04-09-2012
Run test: A test for randomness
• Which of the following sequences appear to be ‘random’?
– HTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHT
– HHHHHHHHHHHHHHTTTTTTTTTTTTTTTTTT
– HHHHHHHHHHHTTTTTTTTTTTHHHHHHHHH
• NONE! How to determine objectively or statistically?
• Calculate the no. of runs
• A run is a sequence of identical symbols/events
• Too many (or few) runs indicate lack of randomness
– HTHT HTHT HTHT HTHT HTHT HTHT HTHT HT
– HHHHHHHHHHHHHHTTTTTTTTTTTTTTTTTT
– HHHHHHHHHHHTTTTTTTTTTTHHHHHHHHH
Problem 3
A sequence of small glass sculptures was inspected for shipping damage. The sequence of acceptable and damaged pieces was as follows:
D,A,A,A,D,D,D,D,D,A,A,D,D,A,A,A,A,D,A,A,D,D,D,D,D
Test for the randomness of the damage to the shipment using the 0.05 significance level.
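Counting runs is mechanical once a run is read as a maximal block of consecutive identical symbols. The following is a minimal Python sketch; the helper name `count_runs` is hypothetical, and the sequences are copied from the slides.

```python
from itertools import groupby


def count_runs(sequence):
    """Count runs: maximal blocks of consecutive identical symbols."""
    return sum(1 for _ in groupby(sequence))


coin_sequences = [
    "HTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHT",
    "HHHHHHHHHHHHHHTTTTTTTTTTTTTTTTTT",
    "HHHHHHHHHHHTTTTTTTTTTTHHHHHHHHH",
]
for seq in coin_sequences:
    # Strict alternation gives as many runs as symbols; long blocks give very few
    print(len(seq), count_runs(seq))

# Problem 3: acceptable (A) and damaged (D) pieces
problem3 = "D,A,A,A,D,D,D,D,D,A,A,D,D,A,A,A,A,D,A,A,D,D,D,D,D".split(",")
print(problem3.count("A"), problem3.count("D"), count_runs(problem3))  # 11 14 9
```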
Run test (cont.)
• How to determine too many or too few?
• Acceptable no. of runs depends on $n_1$, $n_2$
If $H_0$ (the sequence is 'randomly' mixed) is true, $\frac{r - \mu_r}{\sigma_r}$ has approximately N(0,1) distribution, provided either $n_1$ or $n_2$ is moderately large (≥ 10).
Small sample distributions are available (Table 8 page 796-797).
$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$, $\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
Solution to Problem 3
$n_A = 11$, $n_D = 14$
$\mu_r = \frac{2 \times 11 \times 14}{11 + 14} + 1 = 13.32$
$\sigma_r = \sqrt{\frac{2 \times 11 \times 14 \,(2 \times 11 \times 14 - 25)}{25^2 \times 24}} = 2.41$
At $\alpha = 0.05$, the C.R. is $|Z| > 1.96$.
The observed r = 9, and value of the T.S. is $\frac{9 - 13.32}{2.41} = -1.79$.
Since $|-1.79| < 1.96$, at the 5% level we do not reject $H_0$: the data are consistent with damages occurring randomly.
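The large-sample runs test for Problem 3 can be retraced numerically. The following is a minimal Python sketch of the normal approximation stated on the slide; the function name `runs_test` and its return layout are hypothetical, and the two-sided p-value uses the standard normal tail via `math.erfc`.

```python
import math
from itertools import groupby


def runs_test(sequence):
    """Normal-approximation runs test for a sequence with exactly two symbols.

    Returns (r, mu_r, sigma_r, z, two_sided_p).
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    r = sum(1 for _ in groupby(sequence))          # observed number of runs
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    z = (r - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))           # P(|Z| > |z|) for Z ~ N(0,1)
    return r, mu, sigma, z, p


problem3 = "D,A,A,A,D,D,D,D,D,A,A,D,D,A,A,A,A,D,A,A,D,D,D,D,D".split(",")
r, mu, sigma, z, p = runs_test(problem3)

print(r, round(mu, 2), round(sigma, 2), round(z, 2))  # 9 13.32 2.41 -1.79
print(abs(z) > 1.96)                                  # False: H0 is not rejected
```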
• Data summarization and presentation
• Expected value in decision making; decision trees
• Probability
• Random variable and its distribution
– Discrete: General, Binomial, Poisson
– Continuous: General, Normal, Exponential; T, Chi-square, F
• Sampling; sampling distribution of $\bar{X}$, p, $S^2$
• Confidence interval / testing hypothesis: 1 or 2 sample; µ, π, σ
• Goodness of fit
• ANOVA
• NP
• Test for indep/homogeneity