F-1
Stat 423, Stat 523 Formulas
Chapter 7 Sections 7.1, 7.2, 7.3
We take a random sample X1, …, Xn from N(µ, σ²)
Two-Sided 100(1-α)% Confidence Intervals for µ
Requirements | Confidence Interval
Normal, σ known | (see formulas below)
Normal, σ unknown | (see formulas below)
Chapter 8 Sections 8.1, 8.2, 8.4
Steps in Testing Hypotheses
1. null hypothesis H0 and alternative hypothesis Ha
H0: µ = µ0    Ha: µ > µ0, µ < µ0, or µ ≠ µ0
where µ0 is the known hypothesized value of µ.
2. test statistic
Requirements Test Statistic Reference Distribution
Normal, σ known | z (formula below) | N(0,1)
Normal, σ unknown | t (formula below) | t(n-1)
3. rejection region or P-value (α = level of significance)

Ha      | Rejection Region (RR): Z test      | T test
µ > µ0  | z ≥ z_α                            | t ≥ t_{α,n-1}
µ < µ0  | z ≤ -z_α                           | t ≤ -t_{α,n-1}
µ ≠ µ0  | z ≥ z_{α/2} or z ≤ -z_{α/2}        | t ≥ t_{α/2,n-1} or t ≤ -t_{α/2,n-1}

Ha      | P-value: Z test    | R command            | T test            | R command
µ > µ0  | 1 - P(Z ≤ z)       | 1-pnorm(z)           | P(t_{n-1} ≥ t)    | 1-pt(t,n-1)
µ < µ0  | P(Z ≤ z)           | pnorm(z)             | P(t_{n-1} ≤ t)    | pt(t,n-1)
µ ≠ µ0  | 2[1 - P(Z ≤ |z|)]  | 2*(1-pnorm(abs(z)))  | 2P(t_{n-1} ≥ |t|) | 2*(1-pt(abs(t),n-1))
4. Conclusion: Reject H0 at the α level of significance if the test statistic falls inside the RR or the P-value < α.
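The four steps above can be sketched for the σ-known case with Python's standard library (the numbers are hypothetical; `statistics.NormalDist` plays the role of R's `pnorm`):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary: test H0: mu = 50 vs. Ha: mu > 50 with sigma known
mu0, sigma, n, xbar, alpha = 50.0, 4.0, 25, 51.8, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))       # step 2: test statistic
p_value = 1 - NormalDist().cdf(z)          # step 3: upper-tail P-value, like 1-pnorm(z)
z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value z_alpha for the RR

reject = p_value < alpha                   # step 4: conclusion
```

Here z = 2.25 and the P-value is about 0.012, so H0 is rejected at α = 0.05.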
Normal, σ known:   CI = ( x̄ - z_{α/2}·σ/√n , x̄ + z_{α/2}·σ/√n ),    z = (x̄ - µ0)/(σ/√n)
Normal, σ unknown: CI = ( x̄ - t_{α/2,n-1}·s/√n , x̄ + t_{α/2,n-1}·s/√n ),    t = (x̄ - µ0)/(s/√n)
γ     z_γ
0.100 1.282
0.050 1.645
0.025 1.960
0.010 2.326
0.005 2.576
0.001 3.090
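The critical-value table can be reproduced from the standard normal inverse CDF, since z_γ satisfies P(Z > z_γ) = γ (stdlib only; the same values come from R's qnorm(1-γ)):

```python
from statistics import NormalDist

# z_gamma cuts off upper-tail area gamma under N(0,1): P(Z > z_gamma) = gamma
table = {0.100: 1.282, 0.050: 1.645, 0.025: 1.960,
         0.010: 2.326, 0.005: 2.576, 0.001: 3.090}
for gamma, z_gamma in table.items():
    assert abs(NormalDist().inv_cdf(1 - gamma) - z_gamma) < 1e-3
```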
F-2
Chapter 9 Section 9.1 z Tests and CIs
Assumptions
• X1, X2, ..., Xm = data from population 1 with mean µ1 and variance σ1²
• Y1, Y2, ..., Yn = data from population 2 with mean µ2 and variance σ2²
• data:
Case I Normal Populations with Known Variances
Hypothesis Test:
1. H0: µ1 - µ2 = Δ0 vs. Ha: µ1 - µ2 > Δ0, µ1 - µ2 < Δ0, or µ1 - µ2 ≠ Δ0, where Δ0 is a known constant (usually zero).
2. Test statistic:
3. Rejection region and P-value
Ha           | Rejection Region              | P-value            | P-value in R
µ1 - µ2 > Δ0 | z ≥ z_α                       | 1 - P(Z ≤ z)       | 1-pnorm(z)
µ1 - µ2 < Δ0 | z ≤ -z_α                      | P(Z ≤ z)           | pnorm(z)
µ1 - µ2 ≠ Δ0 | z ≤ -z_{α/2} or z ≥ z_{α/2}   | 2[1 - P(Z ≤ |z|)]  | 2*(1-pnorm(abs(z)))
100(1-a)% Confidence Intervals for µ1 - µ2:
2-sided CI:
1-sided CIs: ,
where z_α is defined as in Chapters 7 and 8.
Case II Large-Sample Procedures (σ1 and σ2 are unknown, m > 30, n > 30)
Replace σ1 and σ2 in Case I with the sample standard deviations s1 and s2.
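Case II can be sketched end to end with the standard library (hypothetical summary statistics; `NormalDist` replaces R's `pnorm`/`qnorm`):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical large-sample summaries for the two groups
xbar, s1, m = 42.1, 5.0, 45
ybar, s2, n = 40.3, 6.0, 52
delta0 = 0.0

se = sqrt(s1**2 / m + s2**2 / n)                     # estimated standard error
z = ((xbar - ybar) - delta0) / se                    # Case II test statistic
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))     # like 2*(1-pnorm(abs(z)))

half = NormalDist().inv_cdf(0.975) * se              # 95% margin of error
ci = ((xbar - ybar) - half, (xbar - ybar) + half)    # 2-sided CI for mu1 - mu2
```

With these numbers z ≈ 1.61 and the two-sided P-value is about 0.11, so the CI covers 0.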
Data summaries:
  Sample 1: mean x̄, standard deviation s1, sample size m
  Sample 2: mean ȳ, standard deviation s2, sample size n

(9A) Test statistic: z = [ (x̄ - ȳ) - Δ0 ] / √( σ1²/m + σ2²/n ) ~ N(0,1)

(9B) 2-sided CI: (x̄ - ȳ) ± z_{α/2}·√( σ1²/m + σ2²/n )
     1-sided CIs: ( (x̄ - ȳ) - z_α·√( σ1²/m + σ2²/n ), +∞ ),
                  ( -∞, (x̄ - ȳ) + z_α·√( σ1²/m + σ2²/n ) )
F-3
Section 9.2 t Test and Confidence Interval
• Normal populations, σ1 and σ2 are unknown, and sample sizes are small.
Case III t-based Procedures
Degrees of freedom ν (formula below); round ν down to the nearest integer.
t Test:
1. H0: µ1 - µ2 = Δ0 vs. Ha: µ1 - µ2 > Δ0, µ1 - µ2 < Δ0, or µ1 - µ2 ≠ Δ0
2. Test statistic: t = [ (x̄ - ȳ) - Δ0 ] / √( s1²/m + s2²/n ), with df ν
3. Rejection region and P-value
Ha           | Rejection Region                | P-value        | P-value in R
µ1 - µ2 > Δ0 | t ≥ t_{α,ν}                     | P(t_ν ≥ t)     | 1-pt(t,nu)
µ1 - µ2 < Δ0 | t ≤ -t_{α,ν}                    | P(t_ν ≤ t)     | pt(t,nu)
µ1 - µ2 ≠ Δ0 | t ≤ -t_{α/2,ν} or t ≥ t_{α/2,ν} | 2P(t_ν ≥ |t|)  | 2*(1-pt(abs(t),nu))
100(1-α)% Confidence Intervals for µ1 - µ2:
2-sided CI: (x̄ - ȳ) ± t_{α/2,ν}·√( s1²/m + s2²/n )
1-sided CIs: ( (x̄ - ȳ) - t_{α,ν}·√( s1²/m + s2²/n ), +∞ ),  ( -∞, (x̄ - ȳ) + t_{α,ν}·√( s1²/m + s2²/n ) )
ν = ( s1²/m + s2²/n )² / [ (s1²/m)²/(m-1) + (s2²/n)²/(n-1) ]
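The ν computation (with the round-down step) is easy to check numerically; the summary statistics below are hypothetical:

```python
from math import floor, sqrt

# Hypothetical two-sample summaries
s1, m = 5.0, 12
s2, n = 6.0, 10

v1, v2 = s1**2 / m, s2**2 / n                       # s1^2/m and s2^2/n
nu = (v1 + v2)**2 / (v1**2/(m-1) + v2**2/(n-1))     # Welch-style df
nu = floor(nu)                                      # round down to nearest integer

se = sqrt(v1 + v2)                                  # denominator of the Case III t statistic
```

For these numbers ν ≈ 17.6, which rounds down to 17.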
• J = common number of replications of each treatment
• µi = mean of treatment i (for i = 1, 2, ..., I)
• Xij = random variable that represents the measurement from the jth EU under treatment i (for i = 1, ..., I and j = 1, ..., J)
The One-Way Fixed Model: Xij = µi + εij
where εi1, εi2, ..., εiJ are iid N(0, σ²).
Definition Sums of Squares (SS)
Treatment i average: x̄_i.    Grand Average: x̄..    (formulas below)
• Total SS = SST    Treatment i standard deviation = si
• Treatment (Among) SS = SSTr
• Error (Within) SS = SSE
⇒ SST = SSTr + SSE
------------------------------------------------------------------------------
Alternative (Working) Formulas
Let x_i. = Σ_j x_ij (treatment i total) and x.. = Σ_i Σ_j x_ij (grand total).
• SST = Σ_i Σ_j x_ij² - x..²/(IJ)
• SSTr = ( Σ_i x_i.² )/J - x..²/(IJ)
• SSE = SST - SSTr
Remarks:
• e_ij = x_ij - x̄_i. is called a residual, and e_ij estimates ε_ij.
• SST = SSTr + SSE ⇒ SSE = SST - SSTr.
------------------------------------------------------------------------------
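The SS identity above can be verified on a tiny hypothetical balanced layout (I = 2 treatments, J = 3 replications):

```python
# Hypothetical balanced one-way data: I = 2 treatments, J = 3 replications each
data = {"trt1": [10.0, 12.0, 14.0], "trt2": [16.0, 18.0, 20.0]}

I, J = len(data), 3
grand = sum(sum(v) for v in data.values()) / (I * J)    # grand average x-bar..
trt_means = {t: sum(v) / J for t, v in data.items()}    # treatment averages x-bar_i.

sst  = sum((x - grand) ** 2 for v in data.values() for x in v)
sstr = J * sum((m - grand) ** 2 for m in trt_means.values())
sse  = sum((x - trt_means[t]) ** 2 for t, v in data.items() for x in v)
```

For these numbers SST = 70 splits exactly into SSTr = 54 and SSE = 16.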
ANOVA table:
x̄_i. = ( Σ_{j=1..J} x_ij ) / J,    x̄.. = ( Σ_{i=1..I} Σ_{j=1..J} x_ij ) / (IJ)
SST = Σ_i Σ_j ( x_ij - x̄.. )²
SSTr = J·Σ_i ( x̄_i. - x̄.. )²
SSE = Σ_i Σ_j ( x_ij - x̄_i. )²
e_ij = x_ij - x̄_i.
Source of Variation | df | Sum of Squares (SS) | Mean Square (MS) | Test Statistic F | P-value | P-value in R
Treatments (Among) | I-1 | SSTr | MSTr = SSTr/(I-1) | F = MSTr/MSE | P(F_{I-1,I(J-1)} > F) | 1-pf(F,I-1,I*(J-1))
Error (Within) | I(J-1) | SSE | MSE = SSE/(I(J-1)) | | |
Total | IJ-1 | SST | | | |
F-5
When H0: µ1 = µ2 = ... = µI is true, F = MSTr/MSE ~ F_{I-1,I(J-1)}.
Hypothesis Testing
H0: µ1 = µ2 = ... = µI vs. Ha: H0 is false
• F-statistic: F = MSTr/MSE
• P-value = P( F_{I-1,I(J-1)} > F )   (In R: 1-pf(F,I-1,I*(J-1)))
• rejection region: RR = {F > F_{α,I-1,I(J-1)}}
------------------------------------------------------------------------------
Section 10.2 Multiple Comparison in ANOVA (Equal Treatment Reps J)
Tukey's Procedure for Simultaneous 100(1-α)% CIs for µi - µj:
  x̄_i. - x̄_j. ± Q_{α,I,I(J-1)}·√(MSE/J)
T Method for Significant Differences
1. Compute w = Q_{α,I,I(J-1)}·√(MSE/J).
2. List the sample means in increasing order.
3. Underline groups of means that do not differ by more than w.
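Steps 2 and 3 can be sketched in Python; the means and the value of w below are hypothetical (R's qtukey would supply the actual Q), and the grouping is a simplified, non-overlapping version of the underline diagram:

```python
# Hypothetical treatment means and honest-significant-difference threshold w
means = {"A": 12.1, "B": 12.9, "C": 15.8, "D": 16.2}
w = 1.5

ordered = sorted(means.items(), key=lambda kv: kv[1])   # step 2: increasing order
groups = []
for name, m in ordered:
    # step 3: extend the current group while this mean is within w of its smallest member
    if groups and m - groups[-1][0][1] <= w:
        groups[-1].append((name, m))
    else:
        groups.append([(name, m)])

labels = [[name for name, _ in g] for g in groups]
```

Here A and B group together, as do C and D, so the two underlined groups differ significantly from each other.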
------------------------------------------------------------------------------
Contrast C = Σ_{i=1..I} c_i·µi where Σ_{i=1..I} c_i = 0.   Estimate: Ĉ = Σ_i c_i·x̄_i.
Hypothesis Test (Equal Sample Sizes J)
1. H0: C = c0 vs. Ha: C > c0, C < c0, or C ≠ c0
2. Test Statistic: t = ( Ĉ - c0 ) / √( (MSE/J)·Σ_i c_i² )
3. Rejection Region and P-value
Ha     | Rejection Region                        | P-value             | P-value in R
C > c0 | t ≥ t_{α,I(J-1)}                        | P(t_{I(J-1)} ≥ t)   | 1-pt(t,I*(J-1))
C < c0 | t ≤ -t_{α,I(J-1)}                       | P(t_{I(J-1)} ≤ t)   | pt(t,I*(J-1))
C ≠ c0 | t ≤ -t_{α/2,I(J-1)} or t ≥ t_{α/2,I(J-1)} | 2P(t_{I(J-1)} ≥ |t|) | 2*(1-pt(abs(t),I*(J-1)))
----
F Test for H0: C = 0 vs. Ha: C ≠ 0
Test Statistic: F = t² = Ĉ² / ( (MSE/J)·Σ_i c_i² )
• Rejection Region: {F ≥ F_{α,1,I(J-1)}}
• P-value = P( F_{1,I(J-1)} > F ),  (in R) 1-pf(F,1,I*(J-1))
In R: Q_{α,I,I(J-1)} = qtukey(1-α, I, I*(J-1))
F-6
100(1-α)% CIs for Contrast C (Equal Sample Sizes)
2-sided: Ĉ ± t_{α/2,I(J-1)}·√( (MSE/J)·Σ_i c_i² )
1-sided: ( Ĉ - t_{α,I(J-1)}·√( (MSE/J)·Σ c_i² ), +∞ ),  ( -∞, Ĉ + t_{α,I(J-1)}·√( (MSE/J)·Σ c_i² ) )
------------------------------------------------------------------------------
Section 10.3 ANOVA for Unequal Sample Sizes
Ji = sample size for treatment i, n = Σ Ji (total sample size).
Treatment i total: x_i. = Σ_j x_ij,    Treatment i average: x̄_i. = x_i./Ji
Grand Average: x̄.. = ( Σ_i Σ_j x_ij ) / n
• Total Sum of Squares: SST = Σ_i Σ_j ( x_ij - x̄.. )²
• Treatment Sum of Squares: SSTr = Σ_i Ji·( x̄_i. - x̄.. )²
• Error Sum of Squares: SSE = SST - SSTr    Treatment i standard deviation = si
ANOVA table:
Source | df | SS | MS | F | P-value | P-value in R
Treatments | I-1 | SSTr | MSTr = SSTr/(I-1) | F = MSTr/MSE | P(F_{I-1,n-I} > F) | 1-pf(F,I-1,n-I)
Error | n-I | SSE | MSE = SSE/(n-I) | | |
Total | n-1 | SST | | | |
• Rejection Region = {F ≥ F_{α,I-1,n-I}}
------------------------------------------------------------------------------
T Method for Significant Differences (Unequal Treatment Reps)
1. Compute w_ij = Q_{α,I,n-I}·√( (MSE/2)·(1/Ji + 1/Jj) ) for all pairs i, j where i ≠ j.
2. List the sample means in increasing order.
3. Underline x̄_i. and x̄_j. if they do not differ by more than w_ij.
F-8
Hypothesis Test with Contrasts (Unequal Sample Sizes)
1. H0: C = c0 vs. Ha: C > c0, C < c0, or C ≠ c0
2. Test Statistic: t = ( Ĉ - c0 ) / √( MSE·Σ_i (c_i²/Ji) )
3. Rejection Region and P-value
Ha     | Rejection Region                      | P-value          | P-value in R
C > c0 | t ≥ t_{α,n-I}                         | P(t_{n-I} ≥ t)   | 1-pt(t,n-I)
C < c0 | t ≤ -t_{α,n-I}                        | P(t_{n-I} ≤ t)   | pt(t,n-I)
C ≠ c0 | t ≤ -t_{α/2,n-I} or t ≥ t_{α/2,n-I}   | 2P(t_{n-I} ≥ |t|) | 2*(1-pt(abs(t),n-I))
100(1-α)% CIs for Contrast C (Unequal Sample Sizes)
2-sided: Ĉ ± t_{α/2,n-I}·√( MSE·Σ_i (c_i²/Ji) )
1-sided: ( Ĉ - t_{α,n-I}·√( MSE·Σ(c_i²/Ji) ), +∞ ),  ( -∞, Ĉ + t_{α,n-I}·√( MSE·Σ(c_i²/Ji) ) )
Special Case: for C = µi - µj (a pairwise difference), the CI is x̄_i. - x̄_j. ± t_{α/2,n-I}·√( MSE·(1/Ji + 1/Jj) ).
------------------------------------------------------------------------------
A Random Effects Model: Xij = µ + Ai + εij
where A1, A2, ..., AI are iid N(0, σA²) and εi1, εi2, ..., εiJ are iid N(0, σ²).
E(MSTr) = σ² + r·σA²,  E(MSE) = σ²,  where r = ( n - ΣJi²/n ) / (I-1).
• F = MSTr/MSE tests H0: σA² = 0 versus Ha: σA² ≠ 0.
• Estimates: σ̂² = MSE and σ̂A² = (MSTr - MSE)/r, where r = ( n - ΣJi²/n ) / (I-1).
• V(Xij) = σ² + σA² = total variance observed in measurements
• Estimate of V(Xij) = σ̂² + σ̂A²
• % of total variance explained by differences among treatments = σ̂A² / ( σ̂² + σ̂A² ) × 100%
F-9
Chapter 11 Formulas Set  Section 11.1 Two-Factor ANOVA with No Replications
Notation
• A = 1st factor, I = number of levels of A
• B = 2nd factor, J = number of levels of B
• Xij = the measurement from the combination of the ith level of A and the jth level of B
• xij = actual (observed) value of Xij
Two-Way Additive Fixed Model
Model equation and assumptions are
Xij = µ + αi + βj + εij
where Σαi = 0, Σβj = 0, and the εij's are iid N(0, σ²). The average response at level i of A and level j of B is
µij = E(Xij) = µ + αi + βj.
F-10
Hypothesis Tests
• Factor A: H0: α1 = α2 = ... = αI = 0 vs. Ha: at least one αi ≠ 0
• Factor B: H0: β1 = β2 = ... = βJ = 0 vs. Ha: at least one βj ≠ 0
Sums of Squares (df):
SST = Σ_i Σ_j ( x_ij - x̄.. )²   (df = IJ-1)
SSA = J·Σ_i ( x̄_i. - x̄.. )²   (df = I-1)
SSB = I·Σ_j ( x̄_.j - x̄.. )²   (df = J-1)
SSE = Σ_i Σ_j ( x_ij - x̄_i. - x̄_.j + x̄.. )²   (df = (I-1)(J-1))
ANOVA Table
Source | df | SS | MS | F | P-value in R
Factor A | I-1 | SSA | MSA = SSA/(I-1) | MSA/MSE | 1-pf(F,I-1,(I-1)*(J-1))
Factor B | J-1 | SSB | MSB = SSB/(J-1) | MSB/MSE | 1-pf(F,J-1,(I-1)*(J-1))
Error | (I-1)(J-1) | SSE | MSE = SSE/((I-1)(J-1)) | |
Total | IJ-1 | SST | | |
Two-Way Additive Random Effects Model: Xij = µ + Ai + Bj + εij
where the Ai's are iid N(0, σA²), the Bj's are iid N(0, σB²), and the εij's are iid N(0, σ²).
F = MSA/MSE tests H0: σA² = 0 vs. Ha: σA² ≠ 0.
F = MSB/MSE tests H0: σB² = 0 vs. Ha: σB² ≠ 0.
Estimates: σ̂² = MSE, σ̂A² = (MSA - MSE)/J, σ̂B² = (MSB - MSE)/I;
total variance = σ² + σA² + σB².
---
Two-Way Additive Mixed Model: Xij = µ + Ai + βj + εij
where the Ai's are iid N(0, σA²), Σβj = 0, and the εij's are iid N(0, σ²).
F = MSA/MSE tests H0: σA² = 0 vs. Ha: σA² ≠ 0.
F = MSB/MSE tests H0: β1 = β2 = ... = βJ = 0 vs. Ha: at least one βj ≠ 0.
Estimates: σ̂² = MSE, σ̂A² = (MSA - MSE)/J;  total variance = σ² + σA²
------------------------------------------------------------------------------
Section 11.2 Two-Way ANOVA with Replications
Two-Way Interaction Fixed Effects Model
Xijk = kth observation for level i of A and level j of B.
Xijk = µ + αi + βj + γij + εijk
for i = 1, ..., I, j = 1, ..., J, k = 1, ..., K, where
Σαi = 0, Σβj = 0, Σ_i γij = 0 for all j, Σ_j γij = 0 for all i,
and the εijk's are iid N(0, σ²). The mean response at level i of A and level j of B is
µij = E(Xijk) = µ + αi + βj + γij.
Estimates:
µ̂ = x̄...,  α̂i = x̄_i.. - x̄...,  β̂j = x̄_.j. - x̄...,  γ̂ij = x̄_ij. - x̄_i.. - x̄_.j. + x̄...
Fitted value: x̂_ijk = µ̂ + α̂i + β̂j + γ̂ij = x̄_ij.
Residual: e_ijk = x_ijk - x̂_ijk
F-12
Hypothesis Tests
• Factor A: H0: α1 = α2 = ... = αI = 0 vs. Ha: at least one αi ≠ 0
• Factor B: H0: β1 = β2 = ... = βJ = 0 vs. Ha: at least one βj ≠ 0
• Interaction: H0: γij = 0 for all i, j vs. Ha: at least one γij ≠ 0
ANOVA Table (Two-Way Interaction Fixed Model)
Source | df | SS | MS | F | P-value in R
A | I-1 | SSA | MSA = SSA/(I-1) | MSA/MSE | 1-pf(F,I-1,I*J*(K-1))
B | J-1 | SSB | MSB = SSB/(J-1) | MSB/MSE | 1-pf(F,J-1,I*J*(K-1))
Interaction | (I-1)(J-1) | SSAB | MSAB = SSAB/((I-1)(J-1)) | MSAB/MSE | 1-pf(F,(I-1)*(J-1),I*J*(K-1))
Error | IJ(K-1) | SSE | MSE = SSE/(IJ(K-1)) | |
Total | IJK-1 | SST | | |
• Factor A: RR = {F = MSA/MSE > F_{α,I-1,IJ(K-1)}}
• Factor B: RR = {F = MSB/MSE > F_{α,J-1,IJ(K-1)}}
• Interaction: RR = {F = MSAB/MSE > F_{α,(I-1)(J-1),IJ(K-1)}}
------------------------------------------------------------------------------
T Method for Factor Levels (Use only when interactions are not significant.)
Note that I = # of A levels, J = # of B levels, K = # of replications.
Section 11.3 Three-Factor Fixed Effects ANOVA
X_ijkl = µ + αi + βj + δk + γ^AB_ij + γ^AC_ik + γ^BC_jk + γijk + ε_ijkl
for i = 1, ..., I, j = 1, ..., J, k = 1, ..., K, l = 1, ..., L, where the ε_ijkl's are iid N(0, σ²) and the sum of the parameters over any subscript is 0:
Σ_i αi = Σ_j βj = Σ_k δk = Σ_i γ^AB_ij = Σ_j γ^AB_ij = ... = Σ_k γijk = 0.
The mean response at level i of A, j of B and k of C is
µijk = µ + αi + βj + δk + γ^AB_ij + γ^AC_ik + γ^BC_jk + γijk.
MSA = SSA/(I-1),    F = MSA/MSE,    P-value = P( F_{I-1,IJ(K-1)} > F )
MSB = SSB/(J-1),    F = MSB/MSE,    P-value = P( F_{J-1,IJ(K-1)} > F )
MSAB = SSAB/((I-1)(J-1)),    F = MSAB/MSE,    P-value = P( F_{(I-1)(J-1),IJ(K-1)} > F )
MSE = SSE/(IJ(K-1))
T method: factor A: compare x̄_1.., x̄_2.., ..., x̄_I.. using w = Q_{α,I,IJ(K-1)}·√(MSE/(JK));
          factor B: compare x̄_.1., x̄_.2., ..., x̄_.J. using w = Q_{α,J,IJ(K-1)}·√(MSE/(IK))
F-13
Test of Hypotheses
• Factor A: H0: α1 = α2 = ... = αI = 0 vs. Ha: at least one αi ≠ 0
• Factor B: H0: β1 = β2 = ... = βJ = 0 vs. Ha: at least one βj ≠ 0
• Factor C: H0: δ1 = δ2 = ... = δK = 0 vs. Ha: at least one δk ≠ 0
• AB Interaction: H0: all γ^AB_ij = 0 vs. Ha: at least one γ^AB_ij ≠ 0
• AC Interaction: H0: all γ^AC_ik = 0 vs. Ha: at least one γ^AC_ik ≠ 0
• BC Interaction: H0: all γ^BC_jk = 0 vs. Ha: at least one γ^BC_jk ≠ 0
• ABC Interaction: H0: all γijk = 0 vs. Ha: at least one γijk ≠ 0
Assume that there are L observations from each ABC level combination (balanced data). Total sample size is IJKL.
ANOVA Table (3 Factors Fixed Effects Model)
Source | df | SS | MS | F | P-value*
A | I-1 | SSA | MSA | MSA/MSE | P(F_{I-1,IJK(L-1)} > F)
B | J-1 | SSB | MSB | MSB/MSE | P(F_{J-1,IJK(L-1)} > F)
C | K-1 | SSC | MSC | MSC/MSE | P(F_{K-1,IJK(L-1)} > F)
AB Interaction | (I-1)(J-1) | SSAB | MSAB | MSAB/MSE | P(F_{(I-1)(J-1),IJK(L-1)} > F)
AC Interaction | (I-1)(K-1) | SSAC | MSAC | MSAC/MSE | P(F_{(I-1)(K-1),IJK(L-1)} > F)
BC Interaction | (J-1)(K-1) | SSBC | MSBC | MSBC/MSE | P(F_{(J-1)(K-1),IJK(L-1)} > F)
ABC Interaction | (I-1)(J-1)(K-1) | SSABC | MSABC | MSABC/MSE | P(F_{(I-1)(J-1)(K-1),IJK(L-1)} > F)
Error | IJK(L-1) | SSE | MSE | |
Total | IJKL-1 | SST | | |
* In R, 1-pf(F,m,n) gives P(F_{m,n} > F).
! There should be at least L = 2 observations per treatment to test for all interactions. If L = 1, there is no MSE and, hence, no F-test of interactions. !
• Factor A: RR = {F = MSA/MSE > F_{α,I-1,IJK(L-1)}}
• Factor B: RR = {F = MSB/MSE > F_{α,J-1,IJK(L-1)}}
• Factor C: RR = {F = MSC/MSE > F_{α,K-1,IJK(L-1)}}
• AB Interaction: RR = {F = MSAB/MSE > F_{α,(I-1)(J-1),IJK(L-1)}}
• AC Interaction: RR = {F = MSAC/MSE > F_{α,(I-1)(K-1),IJK(L-1)}}
• BC Interaction: RR = {F = MSBC/MSE > F_{α,(J-1)(K-1),IJK(L-1)}}
• ABC Interaction: RR = {F = MSABC/MSE > F_{α,(I-1)(J-1)(K-1),IJK(L-1)}}
T Method for Factor Levels (use when no interaction is significant)
w = Q_{α, {# of factor levels}, {MSE df}}·√( MSE / {total reps per level} )
where {total reps per level} = JKL for factor A, IKL for factor B, IJL for factor C
Coefficient of Determination: R² = 1 - SSE/SST;   Adjusted R²: R²_adj = 1 - (total df / error df)·(SSE/SST)
MSE = SSE / ( IJK(L-1) )
F-14
Latin Squares Design
Model Assumptions
Xij(k) = µ + αi + βj + δk + e_ij(k)
where Σαi = Σβj = Σδk = 0 and the e_ij(k)'s are iid N(0, σ²).
N = # of factor levels (note that N = I = J = K)
Row average: x̄_i..,  Column average: x̄_.j.,  Treatment average: x̄_..k,  Grand average: x̄...
Sums of Squares (df):
SST = Σ Σ ( x_ij(k) - x̄... )²   (df = N²-1)
SSA = N·Σ_i ( x̄_i.. - x̄... )²   (df = N-1)
SSB = N·Σ_j ( x̄_.j. - x̄... )²   (df = N-1)
SSC = N·Σ_k ( x̄_..k - x̄... )²   (df = N-1)
SSE   (df = (N-1)(N-2))
Note: SSE = SST - SSA - SSB - SSC
T Method for Factor Levels: For all factors, use w = Q_{α,N,(N-1)(N-2)}·√(MSE/N).
------------------------------------------------------------------------------
Section 11.4 2^p Factorial Experiments, Factor Effects, Yates Algorithm
2³ Factorial Model: X_ijkl = µ + αi + βj + δk + γ^AB_ij + γ^AC_ik + γ^BC_jk + γijk + ε_ijkl
for i = 1, 2, j = 1, 2, k = 1, 2, l = 1, ..., L
Estimates
• µ̂ = x̄....
• Fitted main effects of factors A, B and C:
  α̂i = x̄_i... - x̄....,  β̂j = x̄_.j.. - x̄....,  δ̂k = x̄_..k. - x̄....
• Fitted 2-way interactions:
  γ̂^AB_ij = x̄_ij.. - x̄_i... - x̄_.j.. + x̄....
  γ̂^AC_ik = x̄_i.k. - x̄_i... - x̄_..k. + x̄....
  γ̂^BC_jk = x̄_.jk. - x̄_.j.. - x̄_..k. + x̄....
• Fitted 3-way interactions:
  γ̂ijk = x̄_ijk. - x̄_ij.. - x̄_i.k. - x̄_.jk. + x̄_i... + x̄_.j.. + x̄_..k. - x̄....
F-15
Yates Algorithm
1. List the sample means (x̄'s) in Yates standard order.
   • Start with (1), then a.
   • "Multiply by b" the previous treatments to get b and ab.
   • "Multiply by c" the previous treatments to get c, ac, bc, abc; etc.
   There should be 2^p treatments in the list.
2. The next column is obtained by adding the numbers in the previous column in pairs and then subtracting in pairs (2nd minus 1st). Repeat this process p times.
3. Divide the pth new column by 2^p. The results are the overall mean and the fitted effects (with all factors at the 2nd level). Reverse the sign of a fitted effect if you change an odd number of subscripts.
------------------------------------------------------------------------------
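The add/subtract cycles of the algorithm can be sketched in a few lines of Python; the 2² treatment means below are hypothetical, listed in standard order (1), a, b, ab:

```python
def yates(means, p):
    """Run p add/subtract cycles over means in Yates standard order,
    then divide by 2**p to get the grand mean followed by fitted effects."""
    col = list(means)
    for _ in range(p):
        pairs = [(col[i], col[i + 1]) for i in range(0, len(col), 2)]
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]  # sums, then 2nd minus 1st
    return [c / 2**p for c in col]

# Hypothetical 2^2 means in standard order (1), a, b, ab
result = yates([10.0, 14.0, 12.0, 20.0], p=2)
```

For these means the output is [14.0, 3.0, 2.0, 1.0]: the grand mean, then the fitted A, B, and AB effects at the high level.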
Section 11.4 Fractional Factorial Studies
A. Choice of a 1/2^q Fraction of a 2^p Factorial
1. Pick any p-q factors and list all their level combinations using -'s and +'s.
2. Pick q different groups of these "first" factors and multiply the signs of the members of each group. Use the q products to determine the levels of the remaining q factors.
B. Determining the "Alias Structure" of the 1/2^q Fraction
Multiplication Rules:
• A*A = B*B = ... = I
• I*A = A, I*B = B, etc.
1. Take the q generators and apply multiplication so that I is on the left-hand side of each equation.
2. Multiply (LHS × LHS and RHS × RHS) the new equations in pairs, then in triples, then in sets of four, etc.
(2^q - 1) factor products are equivalent to I. Factor effects are aliased in 2^{p-q} groups of 2^q members.
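The multiplication rules (A*A = I, I*A = A) amount to symmetric difference on sets of factor letters, so deriving a defining relation can be sketched as follows (the generator choice E = ABD, F = ACD, i.e. I = ABDE = ACDF, is one hypothetical 2^{6-2} design):

```python
from itertools import combinations

def mult(u, v):
    """Multiply two factor 'words'; repeated letters cancel (A*A = I)."""
    return frozenset(u) ^ frozenset(v)

# Hypothetical generators E = ABD and F = ACD, i.e. I = ABDE and I = ACDF
generators = [frozenset("ABDE"), frozenset("ACDF")]

# Step 2: multiply the generator words in pairs, triples, ... for the full defining relation
defining = set(generators)
for r in range(2, len(generators) + 1):
    for combo in combinations(generators, r):
        word = frozenset()
        for g in combo:
            word = mult(word, g)
        defining.add(word)

relation = sorted("".join(sorted(w)) for w in defining)
```

Here the 2^q - 1 = 3 words equivalent to I come out as ABDE, ACDF, and their product BCEF, so the shortest word has length 4 (a resolution IV design).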
C. Analyzing a 2^{p-q} Fractional Factorial
1. Initially ignore the "last" q factors and treat the data as a full factorial in the "first" p-q factors. Estimate the factor effects (e.g., using the formulas in Section 11.4, or by the Yates algorithm with p-q cycles, dividing the last cycle by 2^{p-q}) and judge their statistical significance.
a. (with replication)
• Compute MSE = { sum of [ (sample size - 1) × corresponding s² ] } / { sum of (sample size - 1) },
  with df = sum of (sample size - 1), or get them from the ANOVA table.
• Compute r(α) = t_{α/2, df} × (1/2^{p-q}) × √( MSE × {sum of reciprocals of all sample sizes} ).
A 100(1-α)% CI for an effect is (fitted effect) ± r(α).
Note that an effect is judged not statistically significant at the α level if |fitted effect| < r(α).
b. If there is no replication, do a normal probability plot of the fitted effects (excluding µ̂).
2. Interpret the estimates in the light of the alias structure.
F-16
Chapter 12 Formulas
Linear Model: Y_i = β0 + β1·x_i + ε_i, where the ε's are iid N(mean = 0, variance = σ²).
• β1 = average change in Y for every unit change in x
• µ_{y·x*} = β0 + β1·x* = average response at x*
• σ²_{y·x*} = variance of Y at x = x*
------------------------------------------------------------------------------
Least-Squares Estimates:
β̂1 = Sxy/Sxx = [ Σx_i·y_i - (Σx_i)(Σy_i)/n ] / [ Σx_i² - (Σx_i)²/n ],    β̂0 = ȳ - β̂1·x̄
Fitted Value: ŷ_i = β̂0 + β̂1·x_i
Residual: e_i = y_i - ŷ_i
SSE = Σ( y_i - ŷ_i )² = Σe_i²  or  SSE = Σy_i² - β̂0·Σy_i - β̂1·Σx_i·y_i.
Estimate of σ²: mean square error = MSE = SSE/(n-2) = σ̂².
------------------------------------------------------------------------------
Numerical Diagnostics
Sxx = Σx_i² - (Σx_i)²/n,   Syy (or SST) = Σy_i² - (Σy_i)²/n,   Sxy = Σx_i·y_i - (Σx_i)(Σy_i)/n
Sample Correlation: r = Sxy / √( Sxx·Syy )
sx and sy are the sd's of x and y ⇒ β̂1 = r·(sy/sx).
Coefficient of Determination: R² = 1 - SSE/SST = SSR/SST,  SSR = SST - SSE
Adjusted R² = 1 - MSE/MST, where MST = SST/(n-1)
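The summary-sum formulas for the least-squares estimates can be checked on a small hypothetical data set:

```python
# Hypothetical (x, y) pairs
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
n = len(data)

sx  = sum(x for x, _ in data)                      # sum of x
sy  = sum(y for _, y in data)                      # sum of y
sxx = sum(x * x for x, _ in data) - sx**2 / n      # Sxx
sxy = sum(x * y for x, y in data) - sx * sy / n    # Sxy

b1 = sxy / sxx                 # least-squares slope beta1-hat
b0 = sy / n - b1 * sx / n      # intercept: ybar - b1 * xbar
```

For these four points the fitted line is ŷ = 0.15 + 1.94x.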
100(1-α)% CI for β1:  β̂1 ± t_{α/2,n-2}·s_{β̂1}
Hypothesis Test
1. H0: β1 = β10;  Ha: β1 > β10, β1 < β10, or β1 ≠ β10
2. test statistic: t = ( β̂1 - β10 ) / s_{β̂1}
3. rejection region and P-value (α = level of significance)
Ha       | Rejection Region                  | P-value                 | P-value in R
β1 > β10 | t ≥ t_{α,n-2}                     | 1 - P(t_{n-2} ≤ t)      | 1-pt(t,n-2)
β1 < β10 | t ≤ -t_{α,n-2}                    | P(t_{n-2} ≤ t)          | pt(t,n-2)
β1 ≠ β10 | t ≤ -t_{α/2,n-2} or t ≥ t_{α/2,n-2} | 2[1 - P(t_{n-2} ≤ |t|)] | 2*(1-pt(abs(t),n-2))
ANOVA table with F-test to test H0: β1 = 0 versus Ha: β1 ≠ 0 (model utility test)
See Formulas 12B and 12C for SSR, SSE and SST.
Source | df | SS | MS = SS/df | F | P-value | P-value in R
Regression | 1 | SSR | MSR | F = MSR/MSE | P(F_{1,n-2} > F) | 1-pf(F,1,n-2)
Error | n-2 | SSE | MSE | | |
Total | n-1 | SST | | | |
The rejection region is {F = MSR/MSE ≥ F_{α,1,n-2}}
------------------------------------------------------------------------------
Section 12.4 CI for Mean Response µ_{y·x*} and Prediction Interval at x = x*
CI for Mean Response
At x = x*, the mean response is µ_{y·x*} = β0 + β1·x*.
100(1-α)% CI for µ_{y·x*}:  ŷ* ± t_{α/2,n-2}·s_ŷ*
where ŷ* = β̂0 + β̂1·x* and s_ŷ* = √( MSE·( 1/n + (x* - x̄)²/Sxx ) ).
Prediction Interval
100(1-α)% PI for a response Y at x = x*:
ŷ* ± t_{α/2,n-2}·√( MSE + s_ŷ*² )  or  ŷ* ± t_{α/2,n-2}·√( MSE·( 1 + 1/n + (x* - x̄)²/Sxx ) )
Also: σ̂² = MSE and s²_{β̂1} = σ̂²/Sxx.
F-18
Section 13.1 More on Residuals
The ith residual (random version) is E_i = Y_i - Ŷ_i.
• Standardized residual: e_i* = e_i / s_{e_i}, where s_{e_i} = s·√( 1 - 1/n - (x_i - x̄)²/Sxx ).
Diagnostic Plots
1. e_i* (or e_i) versus x_i (no pattern)
2. e_i* (or e_i) versus ŷ_i (no pattern)
3. ŷ_i versus y_i (linear)
4. normal probability plot of e_i* (or e_i) (linear)
Section 13.2 Transformed Variables
• intrinsically linear models - a function of x and y that can be transformed as
  y' = β0 + β1·x'
where y' = {function of y only} and x' = {function of x only}
Sections 13.4, 13.3 Multiple and Polynomial Regression
Model: Y = β0 + β1x1 + β2x2 + ... + βkxk + ε
where the ε's are independently distributed N(0, σ²)
Data: (x11, x21, ..., xk1, y1), (x12, x22, ..., xk2, y2), ..., (x1n, x2n, ..., xkn, yn)
(Least-squares criterion) Find β̂0, β̂1, ..., β̂k that minimize
  Σ_i ( y_i - b0 - b1·x1i - ... - bk·xki )².
Fitted model/value: ŷ_i = β̂0 + β̂1·x1i + ... + β̂k·xki
Estimate for σ²: σ̂² = MSE = SSE/(n-k-1), where SSE = Σ_i ( y_i - ŷ_i )².
• n-k-1 is the SSE or MSE df.
• e_i = y_i - ŷ_i is the ith residual.
F-19
Diagnostics: Assessing Model Fit to Data
1. Plots of Residuals
   standardized residual: e_j* = e_j / s_{e_j} = ( y_j - ŷ_j ) / s_{e_j}
   • Residual Plots. Plot e_j* versus x1j, x2j, ..., xkj, and versus ŷ_j.
   • Normal Probability Plot of Residuals
2. Coefficient of Multiple Determination = R²
   R² = 1 - SSE/SST  or  R² = SSR/SST,
   where SST = Σ( y_j - ȳ )² and SSR = SST - SSE
3. R²_adj = Adjusted R²:
   R²_adj = 1 - (total df / error df)·(SSE/SST) = [ (n-1)·R² - k ] / (n-k-1)
   k = {number of predictor terms (x terms) in the model}
4. Mallows Cp:  Cp = SSEk/s_f² + 2(k+1) - n
   k = number of x's (predictors) in the smaller model, n = sample size
   SSEk = {fitted/smaller model's SSE}, s_f² = {MSE of the full model}
   Smaller values of Cp, close to k+1, indicate better models.
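A quick numeric sketch of Cp (all SSE values below are hypothetical):

```python
# Hypothetical: full model with 5 predictors, n = 30 observations
n = 30
sse_full, k_full = 42.0, 5
s2_f = sse_full / (n - k_full - 1)        # s_f^2 = MSE of the full model

def mallows_cp(sse_k, k):
    """Cp = SSE_k / s_f^2 + 2(k+1) - n for a smaller model with k predictors."""
    return sse_k / s2_f + 2 * (k + 1) - n

cp_small = mallows_cp(55.0, 2)            # hypothetical 2-predictor model
cp_full  = mallows_cp(sse_full, k_full)   # full model: Cp equals k+1 exactly
```

The full model always scores Cp = k+1; the smaller model here scores about 7.4, noticeably above its own k+1 = 3, which counts against it.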
Analysis of Variance and Regression
SST = Σ( y_j - ȳ )²,  SSE = Σ( y_j - ŷ_j )²,  and SSR = SST - SSE.
Source | df | SS | MS | F | P-value | P-value in R
Regression | k | SSR | MSR | MSR/MSE | P(F_{k,n-k-1} ≥ F) | 1-pf(F,k,n-k-1)
Error | n-k-1 | SSE | MSE | | |
Total | n-1 | SST | | | |
Model-Utility Test:
H0: β1 = β2 = ... = βk = 0 versus Ha: at least one of the β's is not 0
Rejection Region = {F ≥ F_{α,k,n-k-1}}
Inference for Model Coefficients
1. Confidence Intervals
   100(1-α)% CI for βi:  β̂i ± t_{α/2,n-k-1}·s_{β̂i}
   where s_{β̂i} is an estimate of the standard deviation of β̂i.
2. Test of Hypothesis
   H0: βi = 0 versus Ha: βi ≠ 0
   Test statistic t = β̂i / s_{β̂i};  P-value = 2·P(t_{n-k-1} ≥ |t|), (in R) 2*(1-pt(abs(t),n-k-1))
   RR = {t ≥ t_{α/2,n-k-1} or t ≤ -t_{α/2,n-k-1}}
F-20
More Intervals
1. Confidence Intervals for Mean Response at (x1*, x2*, ..., xk*)
   100(1-α)% CI for µ_{y·x*}:  ŷ* ± t_{α/2,n-k-1}·s_ŷ*
   where ŷ* = β̂0 + β̂1·x1* + ... + β̂k·xk* and s_ŷ* is an estimate of the standard deviation of Ŷ*.
2. Prediction Interval for New Observation
   100(1-α)% PI for a new response Y at (x1*, x2*, ..., xk*):
   ŷ* ± t_{α/2,n-k-1}·√( s² + s_ŷ*² ),  where s² = MSE.
Studentized Range Critical Values Q (table fragment; each row gives df, α, then Q for r = 2, 3, ..., 12 treatment means)
60  0.01 3.76 4.28 4.59 4.82 4.99 5.13 5.25 5.36 5.45 5.53 5.60
120 0.05 2.80 3.36 3.68 3.92 4.10 4.24 4.36 4.47 4.56 4.64 4.71
120 0.01 3.70 4.20 4.50 4.71 4.87 5.01 5.12 5.21 5.30 5.37 5.44
Inf 0.05 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.55 4.62
Inf 0.01 3.64 4.12 4.40 4.60 4.76 4.88 4.99 5.08 5.16 5.23 5.29
Stat 423 Section 02 Spring 2020
Name ______________________________________
Exam 3 (100 points)
ID Number __________________________
Part I. Workout Problems. Show solutions in support of your answers. Unsupported answers will not receive full credit. (61 points)
1. A 2^{6-2} fractional factorial involving factors A, B, C, D, E and F is to be run. Practitioners have these two sets of generators in mind:
Design 1 Generators: E=ABD and F=ACD
Design 2 Generators: E=ABCD and F=ABD
a. Consider Design 1. Which treatments in this experiment will have both factors A and B at their high (+) levels? [6 pts]
b. Consider Design 1. Derive its defining relation and determine its resolution. [8 pts]
c. The defining relation for Design 2 is I=CEF=ABDF=ABCDE. Which design (1 or 2) is better? Explain briefly and give at least one reason for your choice. [3 pts]
2. A 2^{4-1} fractional factorial was conducted to study the effects of four factors on the bond strength of an integrated circuit mounted on metallized glass substrate. The four factors (and their levels) that engineers identified as potentially important determiners of bond strength are listed in the table below.
Factor | Levels
A - Adhesive Type | D2A (-) vs. H-1-E (+)
B - Conductor Material | Copper (-) vs. Nickel (+)
C - Cure Time at 90°C | 90 min (-) vs. 120 min (+)
D - Deposition Material | Tin (-) vs. Silver (+)
Let α = main effect of A, β = main effect of B, γ = main effect of C, δ = main effect of D, and � = interaction effect. Summary statistics and the results of the Yates algorithm for computing fitted effects are given below.
Treatment | Replication | Sample Variance s² | Sample Mean x̄ | Cycle 1 | Cycle 2 | Cycle 3 | Fitted Effect
(1)  | 5 |  2.452 | 73.48 | 157.36 | 314.54 | 650.84 | 81.355
ad   | 5 |  4.233 | 83.88 | 157.18 | 336.30 |   7.84 |  0.980
bd   | 5 |  0.647 | 81.58 | 166.60 |   4.42 |   2.92 |  0.365
ab   | 5 | 26.711 | 75.60 | 169.70 |   3.42 |   2.08 |  0.260
cd   | 5 |  0.503 | 87.06 |  10.40 |  -0.18 |  21.76 |  2.720
ac   | 5 |  8.562 | 79.54 |  -5.98 |   3.10 |  -1.00 | -0.125
bc   | 5 |  1.982 | 79.38 |  -7.52 | -16.38 |   3.28 |  0.410
abcd | 5 |  3.977 | 90.32 |  10.94 |  18.46 |  34.84 |  4.355
a. The replications and the sample variances of the 8 treatment combinations are given in the 2nd and 3rd columns, respectively, in the table above. Compute r(0.05) for judging if a fitted effect is statistically significant at the α = 0.05 level. Note that the sum of the variances is 49.067. [8 pts]
b. The generator and defining relation were D=ABC and I=ABCD, respectively. If you have no answer in (a), use r(0.05) = �.���.
i. Based on your answer in (a), is the fitted effect 0.980 statistically significant? [2 pts]
   Select one: NO  YES
ii. What sum of effects does the fitted effect 0.980 estimate? Your answer should be a sum of subscripted/superscripted Greek letters (e.g., α₂ + γ^{BC}₂₂). [4 pts]
3. The diameter x of a tree at breast height (in cm, relatively easy to measure) is used to predict the height y of a tree (in m, difficult to measure). Summary data on n = 36 white spruce trees (in British Columbia) are given below.
Σx = 655.1,  Σx² = 12711.47,  Σy = 644.7,  Σy² = 11824.45,  Σxy = 12112.34,
Sxx = 790.4697,  SST = Syy = 278.9475,  x̄ = 18.1972,  ȳ = 17.9083.
a. Do some calculations to show that the least-squares line is ŷ = 9.1468 + 0.4815x. [10 pts]
b. Compute the sample correlation r between x and y. Give a quick interpretation. [6 pts]
Interpretation:
c. Construct an interval with 95% confidence for the height of a new spruce tree with a breast-height diameter x = 19 cm. Plug numbers into a formula and do not simplify. Use n = 36, x̄ = 18.1972, Sxx = 790.4697, s² = MSE = 2.815. [8 pts]
Problem 3 (continued).
d. A scatterplot of the data and SSE values for the linear and quadratic model fits are given below. Also, the total sum of squares for either model is SST = 1824.45. Which of the two models provides a better description of the data? Explain briefly. In your explanation, use both graphical AND numeric results. [6 pts]
Part II. Multiple Choice. Circle the letter of the correct/best answer. (39 points)
1. Which of the following statements is NOT true?
A. The simple linear regression model is y = β0 + β1x + ε, where ε is a random variable that is normally distributed with mean 0 and variance σ².
B. In simple linear regression, the independent variable x is also referred to as the predictor or explanatory variable.
C. The goal of least-squares regression is to find the curve that maximizes the sum of the squared distances between the curve and the data points.
D. A first step in a regression analysis involving two variables is to construct a scatter plot.
2. In fitting y = β0 + β1x + ε through data, (1.7, 2.5) is a 90% confidence interval for β1. What is a 90% confidence interval for the mean change in y when we reduce x by 0.65?
A. (-1.625, -1.105)
B. (1.05, 1.85)
C. (1.105, 1.625)
D. (2.35, 3.15)
3. Which of the following is/are TRUE about the correlation coefficient r between x and y?
A. For simple linear regression, 100% × r² = R², where R² is the coefficient of determination (in %).
B. A correlation of r = -0.87 is weaker than a correlation of r = 0.25.
C. The correlation r is a measure of the strength of the linear relationship between x and y.
D. If r = -0.1, and we convert x (in inches) to centimeters (1 in = 2.54 cm), then the correlation becomes 2.54 × (-0.1) = -0.254.
E. Both (A) and (C).
Model | SSE
y = β0 + β1x + ε | 95.703
y = β0 + β1x + β2x² + ε | 63.007
[Scatterplot for Problem 3d: Height y versus Breast-Height Diameter x]
4. Is y = β0 · β1^x intrinsically linear? If yes, what is the appropriate transformation to obtain a linear model?
Recall: log(uv) = log(u) + log(v), log(u^w) = w·log(u)
A. No.
B. Yes, log(y) = log(β0) + log(β1)·x
C. Yes, log(y) = log(β0) + β1·log(x)
D. Yes, log(y) = log(β0) + β1·x
For Problems 5 to 8: A study investigated the effects of x1 = Seal Temperature, x2 = Cooling Bar Temperature, and x3 = % Polyethylene Additive on the seal strength y. The three models in the first column of the table below were fit to the data. There were n = 20 observations, and the total sum of squares (for all 3 models) is SST = 82.17 (total df = 19).
5. What is SSE for Model (1)?
A. 30.96
B. 51.21
C. 21.36
D. 60.81
6. What is R²_adj for Model (2)?
A. 49.42%
B. 76.66%
C. 23.34%
D. 84.03%
7. What is the F statistic for testing H0: {β1 = β2 = ... = βk = 0} versus Ha: {H0 is false} with Model (3)?
A. 6.59
B. 9.69
C. 3.23
D. 5.36
8. In the fit of Model (2), we get β̂ = -0.5 and s_β̂ = 0.3552 for the x1x3 interaction term, and find that the P-value is 0.1827 for testing H0: β = 0 versus Ha: β ≠ 0. What are the t test statistic and the conclusion at the α = 0.10 significance level?
A. t = -1.41. There is NO significant interaction between x1 and x3.
B. t = 1.41. The interaction predictor has NO significant effect on the response y.
C. t = -0.84. There is NO significant interaction between x1 and x3.
D. t = -1.41. There is significant interaction between x1 and x3.
Model | R² | R²_adj | SSE
(3) | 85.57% | 72.58% | 11.8593
9. Which of the following is not true about 2^{p-q} fractional factorial studies?
A. The loss of information and ambiguity (confounding) can be held to a minimum by careful planning and wise analysis.
B. A loss of information is usually expected because we are unable to observe responses at all of the 2^p factor combinations.
C. If two effects are aliased or confounded together, it means that we can discuss their significance together but not apart from each other.
D. None of the above.
10. A fitted multiple regression model is ŷ = 10 - 4x1 + 3x2. If x1 is decreased by 2, while holding x2 fixed, then we can expect y
A. to increase by 8
B. to decrease by 6
C. to increase by 6
D. to decrease by 8
E. to remain the same
11. Suppose that the least-squares line is ŷ = -2.12 + 15.75x. If the F test statistic for testing H0: β1 = 0 against Ha: β1 ≠ 0 is F = 2.1 (from the ANOVA table), what is the t test statistic for testing the same hypotheses?
A. t = 1.45
B. t = -4.41
C. t = -1.45
D. t = 4.41
12. Which of the following statements is true?
A. Model 1 with more predictor terms may not necessarily be better than Model 2 with fewer predictor terms, even though Model 1's coefficient of multiple determination R² is larger.
B. To balance the cost of using more parameters against the gain in the coefficient of multiple determination R², many statisticians use R²_adj = {the adjusted R²}.
C. An objective of regression analysis is to find a model that is simple (relatively few parameters) and provides a good fit to the data.
D. All of the above.
13. A study investigated the effects of three explanatory variables x1, x2, and x3 on the response y. The model y = β0 + β1x1 + β2x2 + β3x3 + ε provided a good R² value. Which of the following is NOT appropriate in assessing the (statistical) significance of the relationship between x3 and y?
A. a t test of H0: β3 = 0 versus Ha: β3 ≠ 0
B. a prediction interval
C. a confidence interval for β3
D. the sample correlation between x3 and y
E. a comparison of R²_adj values for y = β0 + β1x1 + β2x2 + β3x3 + ε and y = β0 + β1x1 + β2x2 + ε