2. • Approximately 68% of the data lie within 1 standard
deviation of the mean
• Approximately 95% of the data lie within 2 standard
deviations of the mean
• Approximately 99.7% of the data lie within 3 stan-
dard deviations of the mean
• Population Mean from Grouped Data:
• Sample Mean from Grouped Data: x =
gxifi
gfi
m =
gxifi
gfi
s = 2s2
s = 2s2
s2 =
g1xi - x22
n - 1
=
gx2i -
1gxi22
n
n - 1
s2 =
g1xi - m22
3. N
=
gx2i -
1 gxi22
N
N
Range = Largest Data Value - Smallest Data Value
x =
gxi
n
m =
gxi
N
• Weighted Mean:
• Population Variance from Grouped Data:
• Sample Variance from Grouped Data:
• Population z-score:
• Sample z-score:
• Interquartile Range:
• Lower and Upper Fences:
• Five-Number Summary
4. Minimum, Q1 , M, Q3 , Maximum
Lower fence = Q1 - 1.51IQR2
Upper fence = Q3 + 1.51IQR2
IQR = Q3 - Q1
z =
x - x
s
z =
x - m
s
s2 =
g1xi - m22fi
Agfi B - 1 =
gx2i fi -
1gxifi22
gfi
gfi - 1
s2 =
g1xi - m22fi
gfi
=
gx2i fi -
1gxifi22
gfi
5. gfi
xw =
gwixi
gwi
CHAPTER 4 Describing the Relation between Two Variables
• observed
• for the least-squares regression model
• The coefficient of determination, measures the
proportion of total variation in the response variable
that is explained by the least-squares regression line.
R2,
yN = b1x + b0
R2 = r2
y - predicted y = y - yNResidual =
• Correlation Coefficient:
• The equation of the least-squares regression line is
where is the predicted value,
is the slope, and is the intercept.b0 = y - b1x
b1 = r #
sy
sx
yNyN = b1x + b0,
6. r =
a axi - xsx b a
yi - y
sy
b
n - 1
CHAPTER 5 Probability
• Empirical Probability
• Classical Probability
P1E2 = number of ways that E can occur
number of possible outcomes
=
N1E2
N1S2
P1E2 L frequency of E
number of trials of experiment
• Addition Rule for Disjoint Events
• Addition Rule for n Disjoint Events
• General Addition Rule
P1E or F2 = P1E2 + P1F2 - P1E and F2
P1E or F or G or Á 2 = P1E2 + P1F2 + P1G2 + Á
8. P1x2 = nCxpx11 - p2n- x
s2X = g1x - m22 # P1x2 = gx2P1x2 - m2X
mX = gx # P1x2
CHAPTER 7 The Normal Distribution
• Standardizing a Normal Random Variable
z =
x - m
s
• Finding the Score: x = m + zs
CHAPTER 8 Sampling Distributions
• Mean and Standard Deviation of the Sampling Distribu-
tion of
• Sample Proportion: pN =
x
n
mx = m and sx =
s
2n
x
• Mean and Standard Deviation of the Sampling
Distribution of
mpN = p and spN = C
9. p11 - p2
n
pN
CHAPTER 9 Estimating the Value of a Parameter Using
Confidence Intervals
Confidence Intervals
• A confidence interval about with
known is .
• A confidence interval about with
unknown is . Note: is computed using
degrees of freedom.
• A confidence interval about p is
.
• A confidence interval about is
.
1n - 12s2
xa/2
2 6 s
2 6
1n - 12s2
x1 -a/2
2
10. s211 - a2 # 100%
pn ; za/2 # C
pN11 - pn2
n
11 - a2 # 100%
n - 1
ta/2x ; ta/2 # s1n
sm11 - a2 # 100%
x ; za/2 # s1n
sm11 - a2 # 100%
Sample Size
• To estimate the population mean with a margin of error E
at a level of confidence:
rounded up to the next integer.
• To estimate the population proportion with a margin
of error E at a level of confidence:
rounded up to the next integer,
where is a prior estimate of the population proportion,
or rounded up to the next integer when
no prior estimate of p is available.
n = 0.25 a za/2
E
11. b2
pN
n = pN11 - pN2a za/2
E
b2
11 - a2 # 100%
n = a za/2 # s
E
b211 - a2 # 100%
• Complement Rule
• Multiplication Rule for Independent Events
• Multiplication Rule for n Independent Events
• Conditional Probability Rule
• General Multiplication Rule
P1E and F2 = P1E2 # P1F ƒ E2
P1F ƒ E2 = P1E and F2
P1E2 =
N1E and F2
N1E2
P1E and F and G Á 2 = P1E2 # P1F2 # P1G2 # Á
P1E and F2 = P1E2 # P1F2
13. • x20 =
1n - 12s2
s0
2
z0 =
pN - p0
C
p011 - p02
n
CHAPTER 11 Inferences on Two Samples
• Test Statistic for Matched-Pairs data
where is the mean and is the standard deviation of
the differenced data.
• Confidence Interval for Matched-Pairs data:
Note: is found using degrees of freedom.
• Test Statistic Comparing Two Means (Independent
Sampling):
• Confidence Interval for the Difference of Two Means
(Independent Samples):
1x1 - x22 ; ta/2C
s1
2
14. n1
+
s2
2
n2
t0 =
1x1 - x22 - 1m1 - m22
C
s1
2
n1
+
s2
2
n2
n - 1ta/2
d ; ta/2 # sd1n
sdd
t0 =
d - md
sdn1n
Note: is found using the smaller of or
15. degrees of freedom.
• Test Statistic Comparing Two Population Proportions
where
• Confidence Interval for the Difference of Two Proportions
• Test Statistic for Comparing Two Population Standard
Deviations
• Finding a Critical F for the Left Tail
F1 -a,n1 - 1,n2 - 1 =
1
Fa,n2 - 1,n1 - 1
F0 =
s1
2
s2
2
1pN1 - pN22 ; za/2C
pN111 - pN12
n1
+
pN211 - pN22
n2
pN =
16. x1 + x2
n1 + n2
.z0 =
pN1 - pN2 - (p1 - p2)
4pN11 - pN2B
1
n1
+
1
n2
n2 - 1n1 - 1ta/2
Test Statistics
• single mean, known
• single mean, unknownst0 =
x - m0
sn1n
s
z0 =
x - m0
sn1n
CHAPTER 12 Inference on Categorical Data
• Chi-Square Test Statistic
All and no more than 20% less than 5.Ei Ú 1
17. i = 1, 2, Á , k
x20 = a
1observed - expected22
expected
= a
1Oi - Ei22
Ei
• Expected Counts (when testing for goodness of fit)
• Expected Frequencies (when testing for independence or
homogeneity of proportions)
Expected frequency =
1row total21column total2
table total
Ei = mi = npi for i = 1, 2, Á , k
CHAPTER 13 Comparing Three or More Means
• Test Statistic for One-Way ANOVA
where
MSE =
1n1 - 12s12 + 1n2 - 12s22 + Á + 1nk - 12sk2
n - k
MST =
n11x1 - x22 + n21x2 - x22 + Á + nk1xk - x22
18. k - 1
F =
Mean square due to treatment
Mean square due to error
=
MST
MSE
• Test Statistic for Tukey’s Test after One-Way ANOVA
q =
1x2 - x12 - 1m2 - m12
A
s2
2
# a 1
n1
+
1
n2
b
=
x2 - x1
A
s2
20. x…
yN ; ta/2 # seC1 +
1
n
+
1x… - x22
g1xi - x22
yN
n - 2ta/2
x…
yN ; ta/2 # seC
1
n
+
1x… - x22
g1xi - x22
y, yN• Standard Error of the Estimate
• Standard error of
• Teststatistic fortheSlopeoftheLeast-SquaresRegressionLine
• Confidence Interval for the Slope of the Regression Line
where is computed with degrees of freedom.n - 2ta/2
b1 ; ta/2 # se4g1xi - x22
t0 =
21. b1 - b1
sen4g1xi - x22
=
b1 - b1
sb1
sb1 =
se
4g1xi - x22
b1
se = C
g1yi - yNi22
n - 2
= C
g residuals2
n - 2
Two-Tailed Left-Tailed Right-Tailed
CHAPTER 15 Nonparametric Statistics
• Test Statistic for a Runs Test for Randomness
Small-Sample Case If and the test statistic
in the runs test for randomness is r, the number of runs.
Large-Sample Case If or the test statistic is
where
22. • Test Statistic for a One-Sample Sign Test
mr =
2n1n2
n
+ 1 and sr = B
2n1n212n1n2 - n2
n2 1n - 12
z0 =
r - mr
sr
n2 7 20,n1 7 20
n2 … 20,n1 … 20
Large-Sample Case The test statistic, , is
where n is the number of minus and plus signs and k is obtained
as described in the small sample case.
z0 =
1k + 0.52 - n
2
1n
2
z0(n > 25)
The test statistic, k, will The test statistic, The test statistic,
be the smaller of the k, will be the k, will be the
23. number of minus signs number of number of
or plus signs. plus signs. minus signs.
H1 : M 7 MoH1 : M 6 MoH1 : M Z Mo
Ho : M = MoHo : M = MoHo: M = Mo
Small-Sample Case (n ◊ 25)
Large-Sample Case
where T is the test statistic from the small-sample case.
• Test Statistic for the Mann–Whitney Test
Small-Sample Case and
If S is the sum of the ranks corresponding to the sample
from population X, then the test statistic, T, is given by
Note: The value of S is always obtained by summing the ranks
of the sample data that correspond to in the hypothesis.
Large-Sample Case or
• Test Statistic for Spearman’s Rank Correlation Test
where the difference in the ranks of the two observations
in the ordered pair.
• Test Statistic for the Kruskal–Wallis Test
where is the sum of the ranks in the ith sample.Ri
=
12
24. N1N + 12 B
R1
2
n1
+
R2
2
n2
+ Á +
Rk
2
nk
R - 31N + 12
H =
12
N1N + 12a
1
ni
BRi - ni1N + 122 R
2
ith
di =
rs = 1 -
6gdi2
25. n1n2 - 12
z0 =
T -
n1n2
2
B
n1n21n1 + n2 + 12
12
(n2>20)(n1>20)
MX
T = S -
n11n1 + 12
2
n2 ◊ 20)(n1 ◊ 20
z0 =
T -
n1n + 12
4
C
n1n + 12 12n + 12
24
40. 28 12.461 13.565 15.308 16.928 18.939 37.916 41.337
44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557
45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773
46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758
59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505
71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082
83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531
95.023 100.425 104.215
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879
106.629 112.329 116.321
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145
118.136 124.116 128.299
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342
129.561 135.807 140.169
Chi-Square (X2) Distribution
Table VII
0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005
Area to the Right of Critical Value
Degrees of
Freedom
Two tails
The area to the right
41. of this value is .
The area to the right
of this value is 1� .
a
2
a
2
2X1�a
2
2Xa
2 a
2
a
2
Left tail
Area � 1�a
The area to the
right of this
value is 1�a.
a
2X1�a
Right tail
43. Explain.
2) An engineer wanted to determine how the weight of a car
affects gas mileages. The following data represent the weight of
various cars and their gas mileage.
Car weight pound miles per gallon
A 3590 18
B 2690 26
C 3180 23
D 3565 20
E 3680 19
(a) I have done it
(b) I’ve done it
(c) Compute the linear correlation coefficient between the
weight of a car and its miles per gallon. r = (…….) (Round to
three decimal places as needed)
(d) Comment on the type of relation that appears to exist
between the weight of a car and its miles per gallon based on
the scatter diagram and the linear correlation coefficient.
3) Researcher wondered whether the size of a person’s brain
was to related to the individual’s mental capacity. The selected
3 females and 3 males measured their MRI counts IQ scores.
The data is below.
Females Males
MRI IQ MRI IQ
857,781 135 924,060
138
991,305 138 965,355
132
856,473 141 1,038,438
138
Treat the MRI as the explanatory variable. Compute the linear
correlation coefficient between MRI count and IQ for both the
males and the females. Do you believe that MRI count and IQ
are linearly related?
44. A-the linear correlation coefficient for females is (…….)
B- the linear correlation coefficient for males is (…….)
(Rounds both to three decimal places as needed)
-Do you believe that MRI count and IQ are linearly related?
4) For the set below,
(a) Determine the list-squares regression line
(b) Graph the least-squares regression line on the
scatter diagram.
X 3 4 5 6 8
Y 4 5 6 9 13
(a) Determine the list-squares regression line. Ŷ= (……) X +
(…….)(Round to four decimal places as needed.
(b) Graph the least-squares regression line on the scatter
diagram.
5)A pediatrician want to determine the relation that exists
between a child’s height, x, and head circumference, y. She
randomly selects 11 children from her practice, measures their
heights and head circumferences and obtains the accompanying
data. Complete parts (a) through (g).
Data table. In inches
Height, x Head Circumference, y Height, x
Head Circumference, y
27.75 17.7 27 17.3
24.7 17.1 27.5
17.5
25.5 17.1 26.75
17.3
25.5 17.3 26.75
17.5
25 16.9 27.5
17.5
27.75 17.6
(a) Find the least -squares regression line treating height
as the explanatory variable and head circumference as the
response variable.
The least squares regression line is Ŷ= (……) X + (……)(Round
45. to four decimal places as needed)
(b) Interpret the slope and y-intercept, if appropriate.
(c) Use the regression equation to predict the head
circumference of the child who is 25 inches tall.
(d) Compute the residual based on the observed head
circumference of the 25-inch-tall child in the table. Is the head
circumference of child above average or below average?
(Residual=observed y – predicted y = y- Ŷ
(e) Draw the least squares regression line on the scatter diagram
of the data and label the residual from part (d).
(f) Notice that two children are 27 inches tall. One has a head
circumference 17.7 inch and other has a head circumference
17.3 inch. How can this be?
(g) Would it be reasonable to use the least-squares regression
line to predict the head circumference of a child who was 34
inches tall?
6) An engineer wants to determine how the weight of a car, x,
affects gas mileages, y. The following data represent the weight
of various cars and their miles per gallon.
Car A B C D E
Weight pounds, x 2660 2910 3290 3860 4015
Miles per Gallon,y 22.2 22.9 21.8 15.9 14.1
(a) Find the least-squares regression line treating weight as the
explanatory variable and miles per gallon as the response
variable. Write equation for the least-squares regression line.
Ŷ= (…..) x + (…….)(Round to four decimal places as needed).
(b) Interpret the slop and intercept, if appropriate.
(c) Predict the miles per gallon of car B and compute the
residual. Is the miles per gallon of this car above average or
below average for cars of this weights?
(d) Draw the least squares regression line on the scatter diagram
of the data and label the residual.
7) One of the biggest factors in determining the value of a home
is the square footage. The accompanying data represent the
46. square footage and asking prices (in thousand of dollars) for a
random sample of home for sales. Complete parts (a) through
(h).
Square Footage, x Asking Prince
($000s), y
1,148 154
1,096 159.9
1,151 169
1,288 169.9
1,322 170
1,466 179.9
1344 180
1,544 189
1,494 189.9
(a) And (b) have done.
(c) Determine the linear correlation coefficient between square
footage and asking price.
The linear correlation coefficient between square footage and
asking price is r = (……..)(Round to three decimal places as
needed)
(d) Is there a linear relation between the square footage and
asking price?
(e) Find the least-square regression line treating square footage
as the explanatory variable.
(f) Interpret the slop
(g) Is it reasonable to interpret the y-intercept?
(h) One home that is1, 094 square feet is listed as 189,900. Is
this home’s price above or below average for home of this size?
8) The following data represent the heights and weights of
various baseball players. (in inches/ pounds)
Player Height, x Weight, y Player
Height, x Weight, y
Nate 72 197 Greg 70 195
Derek 71 201 Randy
77 240
47. Mark 73 192 Josh
71 193
Jon 74 209 Pat
74 203
(a) Compute the least-square regression line. Ŷ=(…..)x-
(……)(Do not round until the final answer, then round to three
decimal places as needed)
(b) Remove the value corresponding to Randy and compute the
least square regression line the correlation coefficient.
(c) Do you thing that Randy is an influential observation.
9) The time it takes for a planet to complete its orbit around a
particular star is called the planet’s sidereal year.The sidereal
year of the planet is related to the distance the planet is from
the star. Accompanying data show the distance of the planets
from a particular star and their sidereal years. Complete part a
through e. x million of miles/ y sidereal year
Planet Distance from the star x
Sidereal Year, y
Planet1 36
0.22
Planet2 67
0.64
Planet3 93
1.00
Planet4 142
1.88
Planet5 483
11.9
Planet6 887
29.3
Planet7 1,785
84.0
Planet8 2,797
165.0
Planet9 3.675
48. 248.0
(a) Has been done.
(b) Determine the correlation between distance and sidereal
year. The correlation between distance and sidereal year is
(……)(Round to three decimal places as needed).
(c) Compute the least square regression line.
(d) Plot the residuals against the distance from the star
(e) Do you think the least squares regression line is good
model?
1) For the data set shown below, do they following. (a)
Compute the standard error the point estimate for σ. (b)
Assuming the residuals are normally distributed, determine Sb1.
(c) Assuming the residuals are normally distributed, test
H0;β1=0 versus H1; β1≠0 at the a=0.05 level of significance.
The null hypothesis is that x and y are not linearly related.
X 3 4 5 7 8
Y 5 6 8 13 15
a) Determine the point estimate for σ. Se ≈ (…….)(Do not
round until the final answer. Then round to four decimal places
as needed)
b) Recall that b1 is the estimate of slope of the regression line
β1. The sample standard error of b1 is given by the following
formula.
c) In order to conduct the hypothesis test, first calculate the test
statistic t0.(Explain the result; rejected or do not reject; do the
relation exists between x and y?).
2) For the data set shown below, do they following. (a)
Compute the standard error the point estimate for σ. (b)
Assuming the residuals are normally distributed, determine Sb1.
(c) Assuming the residuals are normally distributed, test
H0;β1=0 versus H1; β1≠0 at the a=0.05 level of significance.
X 20 30 40 50 60
Y 98 95 91 81 68
49. a) Determine the point estimate for σ. Se = (…….)(Do not
round until the final answer. Then round to four decimal places
as needed)
b) Assuming the residuals are normally distributed, determine
Sb1.
c) In order to conduct the hypothesis test, first calculate the test
statistic t0.(Explain the result; rejected or do not reject; do the
relation exists between x and y?).
3) A concrete cures it gains strength. The following data
represent 7 days and 28 day strength in pounds per square inch
(psi) of a certain type of concrete.
7 day strength psi, x 3330 2620 3380 2300 2480
28 day strength psi, y 4850 4190 5020 4070 4120
a) Compute the standard error of the estimate, Se. Se≈(……)
(Do not round until the final answer, Then round to four
decimal places as needed)
b) Determine whether the residuals are normally distributes
c) If the residuals are normally distributed, determine Sb1
d) If the residuals are normally distributed, test whether a linear
relation exists between 7 day strength and 28 day strength a=
0.05 level of significance.
e) If the residuals are normally distributed construct a 95%
confidence interval about the slop of the true least squares
regression line.
f) What is the estimate mean 28 day strength of this concrete if
the 7 day is 3000 psi?
4) Supposed a multiple regression model is given by ŷ= -3.98x1
– 9.03x2 – 37.73. What would an interception of the coefficient
of x1 be? Fill in the blank below
An interception of the coefficient of x1 would be “if x1
increases by 1 unit, then the response variable will decrease by
(……) units, on average, while holdingx2 constant.”
50. 5) The regression equation ŷ = 56,777.848 + 4401.106x – 46.38
describe the median income, y, at different ages, x, for residents
of a certain country. Complete parts (a) through (c)
(a) And (b) have been done
(c)How much does median income change from 40 to 50 years
of age?
The median income (Increases / decreases) by $(……..) from 40
to 50 of age (Round to the nearest cent as needed)
6) Researchers develop a model to predict the age, y, of the
individual based on the gender of the individual ,x1 (0=female,
1= male), the height of the second premolar, x2, the number of
teeth with root development, x3, and sum of the normalized
heights of seven teeth on the left side of the month, x4. The
normalized height of the seven teeth was found by dividing the
distance between teeth by the height of the tooth. Their model is
shown below. Complete parts (a) through (d).
Ŷ= 9.139 + 0.392x1 + 1.256x2 + 0.678x3 – 0.90x4 – 0.174x3x4
(a) Based on this model, what is the expected age of a female
with x2 = 27mm, x3=9, and x4= 16mm? (………) years.(
Round to one decimal place as needed)
(b) Based on this model, what is the expected of a male with
x2=27, x3=9 and x4=16mm
(c) What is the interaction term? What variable interact?
(d) The Coefficient of determination for this model is 85.5%.
Explain what this means.
7) Determine if there is a linear relation among air temperature
x1, wind speed x2, and wind child y. The following data show
the measurement value for various days.
X1 -10 30 0 -30 20 0 30 30 -20 -30 -30
-20 10 -10
X2 50 30 10 100 90 40 20 50 30 50 70
40 80 20
Y -62 0 -22 -104 -32 -44 6 -8 -64 - 88 -98
-70 -44 -44
51. (a) Find the least squares regression equation Ŷ= b0 + b1x1 +
b2x2, where x1 is air temperature and x2 is wind speed, and y is
the respond variable, wind chill.
Ŷ= (……) + (…….)x1 + (…….)x2 (Round to three decimal
places as needed).
(b) Draw residual plots to assess the adequacy of the model.
(Create a residual plot for air temperature and for wind speed).
What might you conclude based on the plot of residuals against
wind speet.