Introductory
Statistics
Laboratory
for Excel
Lab Manual Author:
R. J. (Bob) Baker
December 2003
Revised by:
Krista Wilde (2016)
Table of Contents
Assignment #0
Assignment #1
Assignment #2
Assignment #3
Assignment #4
Assignment #5
Assignment #6
Assignment #7
Assignment #8
Assignment #9
INTRODUCTION
Example 1: Reading data from a data file into the EXCEL worksheet
Example 2: Preparing a histogram of data
Example 3: Entering data from the keyboard into the EXCEL worksheet
Example 4: Calculating relative frequencies
Example 5: Leaving EXCEL and grading your assignment
Example 6: How to prepare a stem-and-leaf diagram
Example 7: How to draw a frequency (or relative frequency) polygon
Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample)
Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
Example 10: Further uses of EXCEL->As a calculator
Example 11: Calculations with a discrete probability distribution
Example 12: Reading and storing constants for further use
Example 13: Using EXCEL to answer questions about continuous distributions
Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test
Example 15: How to calculate a confidence interval for one mean when σ is known
Example 16: How to calculate a confidence interval for one mean when σ is NOT known
Example 17: How to calculate a confidence interval for a binomial proportion
Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known
Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variance is unknown and equal
Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal
Example 21: Large sample confidence intervals and tests of hypothesis for differences between two binomial proportions
Example 22: How to carry out a one-way analysis of variance
Example 23:
Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference)
Example 25: How to perform a two-way analysis of variance
Example 26: How to calculate a randomized complete block analysis of variance
Example 27: How to prepare a scatterplot of two variables
Example 28: How to calculate a correlation coefficient
Example 29: How to perform a regression analysis using EXCEL
ASSIGNMENT 0
2 INTRODUCTORY STATISTICS LABORATORY
Introductory Statistics Laboratory
Assignment #0
Purpose
This assignment is designed for use in the instructed
introduction for students using the
Introductory Statistics Laboratory for Excel (ISLeX) program.
NOTES
Log in to ISLeX and get the data for Assignment 0. Then start
Microsoft Excel and
determine the answers to the questions in this assignment. When
finished, exit from EXCEL,
return to ISLeX and submit your answers.
In this assignment, all students use the same data set. In
remaining assignments, each
student will have unique data sets.
See the examples indicated by {Example } to learn how to use
EXCEL to perform a
particular task. Reference to an example will be given at the end
of each major task. A symbol in the margin marks the
beginning of a new task.
Question A
Data called LAB0A.DAT in Table A represents measured
yields (q/ha, where 1q = 1
quintal = 100 kg) of a sample of wheat varieties tested at
Saskatoon.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data, using 20 as the first
midpoint (20.5 as its upper bin) and 1 as
the interval width (bin size).
LABORATORY ASSIGNMENTS
INTRODUCTORY STATISTICS LABORATORY 3
Record the frequencies from the histogram into the following
table; add the relative
frequencies later.
Bin     Midpoint    Frequency    Relative frequency
20.5    20          ____         ____
21.5    21          ____         ____
22.5    22          ____         ____
23.5    23          ____         ____
24.5    24          ____         ____
25.5    25          ____         ____
26.5    26          ____         ____
27.5    27          ____         ____
28.5    28          ____         ____
Record your answers to the following questions
1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
3. How many observations were there in the class with midpoint
equal to 22?
{Example 2}
Enter the midpoints and frequencies from the table into two columns of the EXCEL
worksheet. Verify that you have entered the correct data.
Calculate and store relative frequencies
in a new column. Record relative frequencies in the above table.
{Examples 3 and 4}
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
Read the data into an EXCEL worksheet and calculate the mean value.
4. How many observations were there in this data set?
5. What was the mean yield of this sample of wheat varieties?
{Example 1, and Example 8 a and b}
Once you have recorded numerical answers to each of
the five questions, you should now leave EXCEL and submit
your answers for grading by the
ISLeX program.
{Example 5}
- END OF ASSIGNMENT 0 -
Introductory Statistics Laboratory
Assignment #1
Purpose
This lab is an introduction to tabular and graphical methods of
descriptive statistics.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX
program.
Question A
Data in Table A represents measured yields (q/ha, where 1q = 1
quintal = 100 kg) of a
sample of wheat varieties tested at Saskatoon.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data, using 20 as the first
midpoint (20.5 as the starting bin) and 1
as the interval width (bin size). Note that the lower endpoint of
any interval is the midpoint
minus one-half the interval width, while the upper endpoint is
the midpoint plus one-half the
interval width. Record the frequencies in the following table;
add relative frequencies later.
Excel places data points that lie on a bin boundary in the lower
bin.
Bin     Midpoint    Frequency    Relative frequency
20.5    20          ____         ____
21.5    21          ____         ____
22.5    22          ____         ____
23.5    23          ____         ____
24.5    24          ____         ____
25.5    25          ____         ____
26.5    26          ____         ____
27.5    27          ____         ____
28.5    28          ____         ____
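The binning rule stated before the table can be sketched in Python. This is only an illustration and not part of the ISLeX procedure: the yields are invented, and `bin_frequencies` is a hypothetical helper name. Each upper boundary is the midpoint plus half the interval width, and a value lying exactly on a boundary is counted in the lower bin, which is the behaviour this assignment attributes to Excel.

```python
import bisect

def bin_frequencies(values, upper_bounds):
    """Count values per bin; a value equal to an upper bound stays in the lower bin."""
    counts = [0] * (len(upper_bounds) + 1)  # final slot collects values above the last bound
    for v in values:
        # bisect_left finds the first bound >= v, so boundary values go to the lower bin
        counts[bisect.bisect_left(upper_bounds, v)] += 1
    return counts

# Hypothetical yields (q/ha), made up for illustration only
yields = [20.1, 20.5, 21.3, 21.5, 21.6, 22.4, 22.5, 23.0]
midpoints = [20, 21, 22, 23]
width = 1
upper_bounds = [m + width / 2 for m in midpoints]  # 20.5, 21.5, 22.5, 23.5

frequencies = bin_frequencies(yields, upper_bounds)
for midpoint, count in zip(midpoints, frequencies):
    print(midpoint, count)
```

Note that 20.5, 21.5 and 22.5 are each counted with the class below the boundary, not the class above it.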
Record your answers to the following questions
1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
3. How many observations were greater than 21.5 and less than
or equal to 22.5 q/ha?
{Example 2}
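Enter the midpoints and frequencies from the table into the EXCEL worksheet.
{Example 3}
Calculate the relative frequencies in each class (these will be used in question C).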
Check the data you have entered and verify that the relative
frequencies sum to 1.0 (within
0.001). Record the relative frequencies in the preceding table.
4. What is the relative frequency of yields in sample A that
were greater than
21.5 and less than or equal to 22.5 ?
{Example 4}
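As a cross-check on the worksheet, the relative frequency calculation and the sum-to-one check can be sketched as follows. The frequencies are hypothetical and are not the Table A data.

```python
# Hypothetical class frequencies, for illustration only
frequencies = [2, 5, 9, 6, 3]

total = sum(frequencies)                       # number of observations
relative = [f / total for f in frequencies]    # relative frequency = frequency / total

# The relative frequencies should sum to 1.0 (within 0.001)
check_passed = abs(sum(relative) - 1.0) < 0.001
print(total, [round(r, 3) for r in relative], check_passed)
```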
Prepare a stem-and-leaf diagram of the data from sample A.
Use an increment of 1.0 between
consecutive stem positions (leaf unit = 0.1). Use the
stem-and-leaf diagram to answer the
following questions.
5. What is the value (in q/ha) of the leaf unit in this
stem-and-leaf diagram?
6. What is the yield (in q/ha) for the item represented by the last
leaf position in the fifth (from the top) stem position?
{Example 6}
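The structure of a stem-and-leaf diagram with a stem increment of 1.0 and a leaf unit of 0.1 can be sketched in Python. The data are hypothetical, `stem_and_leaf` is an illustrative helper name, and values are assumed to be recorded to one decimal place; the layout produced by the Excel procedure in Example 6 may differ.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} using a stem increment of 1.0 and a leaf unit of 0.1."""
    plot = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)           # work in whole tenths to avoid float noise
        stem, leaf = divmod(tenths, 10)  # stem = whole q/ha, leaf = tenths digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical yields (q/ha), for illustration only
yields = [21.3, 20.7, 22.0, 21.9, 20.2, 22.5]
plot = stem_and_leaf(yields)
for stem in sorted(plot):
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

A leaf of 7 on stem 20 therefore represents a yield of 20.7 q/ha.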
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data, using 24 as the first
midpoint (24.5 as the first bin) and 1 as
the interval width.
Record the frequencies in the following table; add relative
frequencies later.
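Bin     Midpoint    Frequency    Relative frequency
24.5    24          ____         ____
25.5    25          ____         ____
26.5    26          ____         ____
27.5    27          ____         ____
28.5    28          ____         ____
29.5    29          ____         ____
30.5    30          ____         ____
31.5    31          ____         ____
32.5    32          ____         ____
33.5    33          ____         ____
34.5    34          ____         ____
35.5    35          ____         ____
36.5    36          ____         ____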
Record your answers to the following questions
7. How many observations were there in this sample?
8. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
9. How many observations fell between 31.5 and 32.5 q/ha?
{Example 2}
Enter the midpoints and frequencies from the table into the EXCEL worksheet.
Calculate the relative frequencies in each class.
Check that the correct information has been entered, that
frequencies sum to the total
number of observations and that the relative frequencies sum to
1.0.
Record the relative frequencies in the preceding table. Answer
the following question.
10. What is the relative frequency of yields in sample B that
were greater than
31.5 and less than or equal to 32.5 q/ha ?
{Example 4}
Question C
Compare the distributions of yields of wheat varieties in
sample A (Saskatoon) with those
from sample B (Tisdale).
Prepare a relative frequency polygon of the data from both samples. Include
appropriate titles and axis labels. Use different line types for
each sample.
Answer the following questions from the relative frequency
polygon.
11. Which of the two samples, Saskatoon (1) or Tisdale (2) has
the highest
relative frequency in the class whose midpoint is 26 q/ha?
(Answer 1 or 2; 0 if same)
12. Which of the two samples, Saskatoon (1) or Tisdale (2), has
the greatest spread of midpoints
(i.e. the greatest difference between maximum and minimum
midpoint values)?
(Answer 1 or 2; 0 if same)
{Example 7}
Once you have recorded numerical answers to each of
the twelve questions, you should now leave EXCEL and submit
your answers for grading by the
ISLeX program.
{Example 5}
- End of Assignment #1 -
Introductory Statistics Laboratory
Assignment #2
Purpose
The three main objectives of this assignment are to:
a) use numerical values as descriptive statistics,
b) introduce the concept of sampling from a population, and
c) demonstrate the effects of sample size.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX program.
Question A
Data in Table A represents protein concentrations (g/kg) of
boxcar lots of durum wheat
delivered to Thunder Bay, Ontario. Treat this data as a
population of data points.
Read the data into a column of the EXCEL worksheet, and name the
column. When viewing the data for the first time, you should try
to determine approximately the
number of items and guess at the average value. Scan the data to
try to determine what the
smallest and largest values are.
{Example 1}
Calculate and record the values of the following
population characteristics (i.e. parameters).
1. How many data points are there in this data set?
2. What is the mean protein concentration (g/kg)?
3. What is the minimum protein concentration?
4. What is the maximum protein concentration?
5. What is the median protein concentration?
6. What is the value of the first quartile?
7. What is the value of the third quartile?
8. What is the standard deviation of the population of protein
concentrations?
{Example 8}
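For readers who want to verify their worksheet results independently, the population characteristics asked for above can be sketched with Python's standard `statistics` module. The protein values are hypothetical. Two assumptions are worth flagging: the population standard deviation (divisor N) is used because the data are treated as a population, and quartile conventions differ between packages, so the "inclusive" interpolation used here may not match the quartile rule applied in Example 8.

```python
import statistics

# Hypothetical protein concentrations (g/kg), for illustration only
protein = [128, 131, 135, 135, 140, 142, 147, 150]

n = len(protein)
mean = statistics.fmean(protein)
minimum, maximum = min(protein), max(protein)
median = statistics.median(protein)

# Quartiles by linear interpolation between data points ("inclusive" method);
# this is an assumed convention, other software may use a different rule
q1, _, q3 = statistics.quantiles(protein, n=4, method="inclusive")

# Population standard deviation: divides by N, not N - 1
sigma = statistics.pstdev(protein)

print(n, mean, minimum, maximum, median, q1, q3, round(sigma, 4))
```

If the data were a sample rather than a population, `statistics.stdev` (divisor N - 1) would be the corresponding call.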
Question B
The data in Table B constitutes 10 random samples, each of
size 7, from the population of
protein concentrations. The data file contains seven rows of
data with each row containing ten
columns.
Read the data for the ten samples into columns of the
EXCEL worksheet.
{Example 1}
Calculate the mean, median, standard
deviation, minimum, maximum, first quartile and third quartile
of each of the ten samples.
Record these descriptive statistics in the following table.
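Sample  Size  Mean  Median  Standard deviation  Minimum  Maximum  Q1    Q3
1       7     ____  ____    ____                ____     ____     ____  ____
2       7     ____  ____    ____                ____     ____     ____  ____
3       7     ____  ____    ____                ____     ____     ____  ____
4       7     ____  ____    ____                ____     ____     ____  ____
5       7     ____  ____    ____                ____     ____     ____  ____
6       7     ____  ____    ____                ____     ____     ____  ____
7       7     ____  ____    ____                ____     ____     ____  ____
8       7     ____  ____    ____                ____     ____     ____  ____
9       7     ____  ____    ____                ____     ____     ____  ____
10      7     ____  ____    ____                ____     ____     ____  ____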
{Example 9}
Compare these sample statistics with the population parameters calculated in question A to answer
the following questions.
These questions are designed to get you thinking about how
well sample statistics
represent the characteristics of the population from which the
sample was taken.
9. How many of the ten sample means are less than or equal to
the
population mean?
10. How many of the ten sample medians are exactly equal to
the population
median?
11. How many of the ten sample minimums are less than or
equal to the
population minimum?
12. How many of the ten sample maximums are greater than or
equal to the
population maximum?
13. How many of the sample first quartiles are less than or
equal to the
population first quartile?
14. How many of the sample third quartiles are greater than or
equal to the
population third quartile?
15. Which sample has the largest standard deviation?
16. Which sample has the largest range (=Maximum -
Minimum)?
17. What is the ratio of the largest sample standard deviation to
the smallest
sample standard deviation?
18. What is the ratio of the largest sample mean to the smallest
sample mean?
19. Of the two ratios (Questions 17 and 18), which is the
largest, the ratio
of standard deviations (17) or the ratio of means (18)?
{Answer 17 or 18}
{Example 10}
Question C
The data in Table C constitutes 10 random samples, each of
size 27, from the population
of protein concentrations.
Read the data for the ten samples into columns of the EXCEL worksheet.
{Example 1}
Calculate the mean, median, standard deviation,
minimum, maximum, first quartile and third quartile of each of
the ten samples. Record the
descriptive statistics in the following table.
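Sample  Size  Mean  Median  Standard deviation  Minimum  Maximum  Q1    Q3
1       27    ____  ____    ____                ____     ____     ____  ____
2       27    ____  ____    ____                ____     ____     ____  ____
3       27    ____  ____    ____                ____     ____     ____  ____
4       27    ____  ____    ____                ____     ____     ____  ____
5       27    ____  ____    ____                ____     ____     ____  ____
6       27    ____  ____    ____                ____     ____     ____  ____
7       27    ____  ____    ____                ____     ____     ____  ____
8       27    ____  ____    ____                ____     ____     ____  ____
9       27    ____  ____    ____                ____     ____     ____  ____
10      27    ____  ____    ____                ____     ____     ____  ____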
{Example 9}
Use these results and those from questions A and B to answer the
following questions.
The following questions are designed to get you thinking about
how the size of the
sample affects the relationship between sample statistics and
population parameters.
20. How many of the ten sample minimums were exactly equal
to the population
minimum?
21. How many of the ten sample maximums were exactly equal
to the population
maximum?
22. For samples of size 27, what is the ratio of the largest
sample mean to the
smallest sample mean?
23. For samples of 27, what is the ratio of the largest sample
standard deviation
to the smallest sample standard deviation?
For the following questions, answer 0 if the statement is false
or 1 if it is true.
24. The ratio of the largest sample mean to the smallest sample
mean was less in
samples of 27 than in samples of 7.
25. The ratio of the largest to the smallest sample standard
deviations was greater
in the larger samples.
{Example 10}
- Please use ISLeX to record and grade your answers -
- END OF ASSIGNMENT 2 -
Introductory Statistics Laboratory
Assignment #3
Purpose
This assignment is an introduction to questions concerning
discrete probability
distributions.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX
program.
Question A
A binomial experiment consists of repeated trials each with two
possible outcomes. The
outcome of any trial is independent of all other trials. The
binomial distribution gives the
probability that a number X of n independent trials will have
one type of outcome. X can be any
number from 0 up to the total number of trials.
The data in Table A gives the probabilities of observing that X
= 0, 1, ..., 20 out of 20
flower seeds from a given lot will germinate.
Read the data into two columns of the EXCEL worksheet and
attach appropriate names to those
two columns. Then, record the probabilities in the following
table.
{Example 1}
Number germinated (out of 20) Probability
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Use the table to answer the following questions.
1. What is the probability that all 20 seeds in a random sample
of 20 seeds will
germinate?
2. What is the probability that fewer than 15 seeds in a random
sample of 20
seeds will germinate?
3. What is the probability that at least 17 seeds in a random
sample of 20 will
germinate?
4. What is the probability that the number of seeds in a random
sample of 20 that
will germinate is between 10 and 15?
HINT: Do not include 10 and 15.
5. What is the probability that the number of seeds in a random
sample of 20 that
will germinate will be less than 10 or greater than 17?
HINT: You will have to add the probabilities for 0, 1, ..., 9
and 18, 19, 20.
6. What is the mean of this binomial distribution?
HINT: The mean of a discrete variable can be calculated by
summing the
products of each value multiplied by its corresponding
probability.
7. What is the variance of this binomial distribution?
HINT : The variance of a probability distribution is the
mean of the
squares of values minus the square of the mean of values.
{Example 11}
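The hints for questions 2 to 7 can be sketched in Python for a binomial distribution with n = 20. The germination probability p = 0.8 is an assumption made purely for illustration; it is not the value behind Table A, so the numbers printed here are not the answers to this assignment.

```python
from math import comb

n = 20
p = 0.8  # assumed germination probability, for illustration only

# P[X = x] for x = 0, 1, ..., 20 (the role played by Table A)
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

p_fewer_than_15 = sum(probs[:15])     # X = 0, ..., 14
p_at_least_17 = sum(probs[17:])       # X = 17, ..., 20
p_between_10_and_15 = sum(probs[11:15])  # X = 11, ..., 14 (10 and 15 excluded)

# Mean: sum of each value multiplied by its probability
mean = sum(x * pr for x, pr in enumerate(probs))
# Variance: mean of the squares minus the square of the mean
mean_of_squares = sum(x * x * pr for x, pr in enumerate(probs))
variance = mean_of_squares - mean**2

print(round(mean, 4), round(variance, 4))
```

For a binomial distribution these should agree with the closed forms n*p and n*p*(1 - p), which gives a quick check on the column arithmetic.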
Question B
This question is based on a Poisson discrete probability
distribution. The distribution is
important in biology and medicine, and can be dealt with in the
same way as any other discrete
distribution.
Red blood cell deficiency may be determined by examining a
specimen of blood under
the microscope. The data in Table B gives a hypothetical
distribution of numbers of red blood
cells in a certain small fixed volume of blood from normal
patients. Theoretically, there is no
upper limit to the value of a Poisson variable. In reality,
only so many red
blood cells can fit into a given volume.
Read the data into the EXCEL worksheet, name the columns, and
view the table. Since the table is quite large, you should attempt
to answer the following
questions without actually recording the table.
{Example 1}
Use the table to answer the following questions.
8. What is the probability that a blood sample from this
distribution will have
exactly 20 red blood cells?
9. What is the probability that a blood sample from a normal
person will have
between 19 and 26 red blood cells?
HINT: See questions 3 and 4.
10. What is the probability that a blood sample from a normal
person would have
fewer than 10 red blood cells?
11. What is the probability that a blood sample from a normal
person will have at
least 15 red blood cells?
HINT: Since there is no theoretical upper limit to the Poisson
distribution, the
correct way to answer this question is to calculate 1 –
probability of fewer than
15 red blood cells.
12. A person with a red blood cell count in the lower 2.5
percent of the
distribution might be considered as deficient. What is the red
blood cell
count below which 2.5 percent of the distribution lies?
HINT: You need to determine a value X so that if you sum all
the probabilities
for counts up to and including that value they will sum to at
least 0.025. The
sum of probabilities of all counts up to but excluding X should
be less than
0.025.
You can proceed in the following way.
Look at the table to guess how many probabilities
(P[X = 0] + P[X = 1] + ...)
should be added to give a sum of approximately 0.025.
Calculate sums of
probabilities for your guess of X.
Continue guessing X until you get a sum ≥ 0.025
for X while the sum for
X-1 is < 0.025.
13. What is the mean red blood cell count in this distribution?
14. What is the variance of red blood cell count in this
distribution?
HINT: See question 7, and remember it is a Poisson
distribution.
15. Is the following statement true (1) or false (0) for this
distribution?
In a Poisson distribution, the variance is equal to the
mean (within
rounding error). Record 1 if true, 0 if false.
{Example 11}
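The search described in the hint for question 12, together with the "1 minus lower tail" approach of question 11 and the mean and variance checks of questions 13 to 15, can be sketched in Python. The Poisson mean of 25 and the truncation of the table at a count of 100 are assumptions for illustration only; they are not the values in Table B.

```python
from math import exp

def poisson_probabilities(lam, max_count):
    """P[X = 0..max_count], built up with the recurrence p(k) = p(k-1) * lam / k."""
    probs = [exp(-lam)]
    for k in range(1, max_count + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def lower_tail_count(probs, tail=0.025):
    """Smallest X whose cumulative probability reaches the tail area."""
    cumulative = 0.0
    for x, pr in enumerate(probs):
        cumulative += pr
        if cumulative >= tail:
            return x
    raise ValueError("table does not reach the requested tail area")

lam = 25.0                                # assumed mean count, for illustration
probs = poisson_probabilities(lam, 100)   # table truncated at 100 (assumption)

cutoff = lower_tail_count(probs)
p_at_least_15 = 1 - sum(probs[:15])       # 1 - P[fewer than 15]

mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum(x * x * pr for x, pr in enumerate(probs)) - mean**2
print(cutoff, round(p_at_least_15, 4), round(mean, 4), round(variance, 4))
```

The loop replaces the manual guessing: it stops at the first count where the running sum reaches 0.025, so the sum for the previous count is necessarily below 0.025.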
Please enter your answers into the ISLeX program
- END OF ASSIGNMENT 3 –
Introductory Statistics Laboratory
Assignment #4
Purpose
This lab is an introduction to questions concerning cumulative
continuous probability
distributions.
NOTES
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX
program.
With continuous distributions, P{X = x} = 0. In words, the
probability that a continuous
variable equals a particular value is considered to be zero. For
this reason, all questions
concerning continuous distributions must be phrased in terms of
intervals. Furthermore, the
probability that a continuous variable is less than or equal (≤) to
a particular value is the same as
the probability that the variable is less (<) than that particular
value.
The EXCEL NORM.DIST function gives the probability that a
normal variable is less
than (or equal to) a specified constant.
The terminology concerning probability varies from one source
to another. For this
assignment, consider that probability = relative frequency =
proportion. Also for this
assignment, percentage = 100 * probability.
Question A
Suppose that height (cm) of male university students is
normally distributed with the
mean given in column 1 of Table A (LAB4A.DAT) and a
standard deviation given in column 2
of Table A.
Read the mean and standard deviation of heights from Table A and store
them for further use. The data file contains one row with two
columns. The first column contains
the mean, the second contains the standard deviation.
1. What is the mean height in this population?
2. What is the standard deviation of height in this population?
{Example 12}
Use these values, and the EXCEL NORM.DIST function, to calculate
answers for the following questions.
3. What proportion of male university students are expected to
have a height
between 170 and 180 cm?
4. What percentage of male university students would have a
height less than
170 cm?
5. If a student is chosen at random from this population, what is
the probability
that he will be taller than 180 cm?
{Example 13}
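The three kinds of question above (less than a value, between two values, greater than a value) all reduce to differences of the cumulative probability. A minimal Python sketch follows, using `statistics.NormalDist`, whose `cdf` plays the role of NORM.DIST with the cumulative option switched on. The mean of 175 cm and standard deviation of 7 cm are assumed values for illustration, not the contents of LAB4A.DAT.

```python
from statistics import NormalDist

# Assumed parameters, for illustration only (not the values in Table A)
heights = NormalDist(mu=175.0, sigma=7.0)

p_below_170 = heights.cdf(170)                   # P[X < 170]
p_between = heights.cdf(180) - heights.cdf(170)  # P[170 < X < 180]
p_above_180 = 1 - heights.cdf(180)               # P[X > 180]

percent_below_170 = 100 * p_below_170            # percentage = 100 * probability
print(round(p_below_170, 4), round(p_between, 4), round(p_above_180, 4))
```

Because the three intervals cover the whole range of heights, the three probabilities should sum to 1, which is a useful check on worksheet formulas.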
Question B
Suppose that the average length of telephone calls made by
teenagers is a normally
distributed variable with mean and standard deviation given in
columns 1 and 2 of Table B
(LAB4B.DAT).
Read the mean and standard deviation of the distribution of
lengths of telephone calls from the
first two columns of Table B and store them for further use.
{Example 12}
Use the values and the EXCEL NORM.DIST function to
calculate answers for the
following.
6. What is the mean length of telephone call?
7. What is the standard deviation of this distribution?
8. What is the probability that a random telephone call will last
a length of time
that is within one standard deviation of the mean (± 1 standard
deviation)?
9. What is the proportion of telephone calls that last a length of
time that is
within two standard deviations of the mean (± 2 standard
deviations)?
10. What is the relative frequency of lengths of teenage
telephone calls that lie
within three standard deviations of the mean (± 3 standard
deviations)?
11. What is the probability that a telephone call will be longer
than the mean by
more than 1.645 standard deviations?
{Example 13}
Question C
In a study conducted by Booth et al (Int. J. Sports Psychol.
17:269-279 1986), student
nurses at the University of Ottawa completed the Thurston-
Richardson attitude questionnaire and
voluntarily took the Canadian Home Fitness Test. They found
that the frequency response of
heart rates after a second exercise bout ranged from 101 to 190
beats per minute and seemed to
follow a normal distribution. The mean heart rate was 145 with
a standard deviation of 20.
Use this normal distribution (mean = 145 and standard deviation = 20) to
calculate the answer to the following question.
12. What is the estimated proportion that had a frequency
response of less than
130 after the second exercise session?
{Example 13}
Question D
A standard normal distribution is one for which the mean is
zero and the standard
deviation is unity (1.0). This distribution is often referred to as
the z-distribution.
Use an EXCEL normal distribution function to calculate answers
to the following questions.
13. What is the probability that a standard normal variable will
have a value less
than 1.96?
14. What is the probability that a standard normal variable will
have a value
between -1 and +1?
{Example 13}
Please enter your answers into the ISLeX program
- END OF ASSIGNMENT 4 -
Introductory Statistics Laboratory
Assignment #5
Purpose
The main objectives of this assignment are to:
a) use a goodness-of-fit test to demonstrate an important
statistical theorem and
b) calculate means and confidence intervals for a single sample
when σ is known and when σ is
not known.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
The central limit theorem states that means of samples of more
than 30 observations from
any distribution will have a distribution that
a) is approximately normal,
b) has a mean equal to the mean of the original distribution,
and
c) has a standard deviation equal to the standard deviation of
the original distribution
divided by the square root of the sample size.
The Poisson distribution is discrete and skewed; it is decidedly
non-normal! However,
the central limit theorem states that the means of sufficiently
large (n ≥ 30) samples from even a
Poisson distribution will be approximately normally distributed.
The means of 100 samples, each of size 40, from a Poisson
distribution are recorded in
Table A. For this first question, you are required to use the
'goodness-of-fit' test to test the
hypothesis that the means in this file are normally distributed
with a mean of 10 and a standard
deviation of 0.5.
Read the sample means from the Poisson distribution into the EXCEL worksheet.
{Example 1}
Calculate the mean and standard deviation of the sample means.
1. What is the mean of the 100 sample means?
2. What is the standard deviation of the 100 sample means?
{Example 9}
Determine the observed frequency of sample means in each of the classes
indicated in the following table. Note that interval endpoints are
midpoint ± 0.5*width and the
interval midpoint is the average of the two endpoints. Use 9.31, 9.61,
9.91, 10.21, 10.51 and 10.81 as the
“bin” boundaries for the Excel HISTOGRAM procedure.
Class interval    Midpoint    Expected frequency    Observed frequency
< 9.31            -           8.3794                ____
9.31 - 9.61       9.46        13.3902               ____
9.61 - 9.91       9.76        21.0881               ____
9.91 - 10.21      10.06       23.4181               ____
10.21 - 10.51     10.36       18.3379               ____
10.51 - 10.81     10.66       10.1248               ____
> 10.81           -           5.2616                ____
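The manual does not say how the expected frequencies in the table were obtained. One natural reading, sketched below as an assumption, is that each is 100 times the probability that a normal variable with mean 10 and standard deviation 0.5 falls in the class. The sketch uses Python's `statistics.NormalDist` and appears to reproduce the tabulated values to within rounding.

```python
from statistics import NormalDist

hypothesized = NormalDist(mu=10.0, sigma=0.5)   # distribution under the hypothesis
boundaries = [9.31, 9.61, 9.91, 10.21, 10.51, 10.81]
n_means = 100                                   # number of sample means in Table A

# Cumulative probability at each boundary, padded with 0 and 1 for the open-ended classes
cumulative = [0.0] + [hypothesized.cdf(b) for b in boundaries] + [1.0]

# Expected frequency of a class = n * (P[X < upper] - P[X < lower])
expected = [n_means * (upper - lower) for lower, upper in zip(cumulative, cumulative[1:])]

for value in expected:
    print(round(value, 4))
```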
3. What was the observed frequency of sample means that fell
between 9.91 and
10.21 ?
{Example 2}
Enter the seven expected frequencies into one column of the EXCEL worksheet and the seven
observed frequencies into another column. Make sure that
expected and observed frequencies for
the same class are entered in the same row. Check that both
columns of data sum to 100 (within
rounding error). If they do not, correct your error(s).
{Example 3}
A goodness-of-fit test should now be used to see if the
observed frequencies in two or more
classes of observed values agree sufficiently well with those
expected on the basis of some
hypothesis. In this example, the hypothesis is that the means of
samples will be normal with
mean 10 and standard deviation 0.5.
The test requires that you calculate a chi-squared statistic by:
a) calculating the differences between the observed and
expected frequencies in each class,
b) squaring the differences and dividing by the expected
frequencies in each class, and
c) summing the values from step b.
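Steps a) to c) can be sketched in Python as follows. The expected frequencies are those given in the table for this question; the observed frequencies are hypothetical and are not the Table A results, and `chi_squared` is an illustrative helper name.

```python
def chi_squared(observed, expected):
    """Sum over classes of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    # a) difference, b) square and divide by expected, c) sum
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected frequencies from the table in this question
expected = [8.3794, 13.3902, 21.0881, 23.4181, 18.3379, 10.1248, 5.2616]
# Hypothetical observed frequencies (sum to 100), for illustration only
observed = [9, 12, 22, 24, 17, 11, 5]

first_class_term = (observed[0] - expected[0]) ** 2 / expected[0]
statistic = chi_squared(observed, expected)
print(round(first_class_term, 4), round(statistic, 4))
```

The statistic is then compared with the critical value for the appropriate degrees of freedom; a small value indicates that observed and expected frequencies agree closely.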
4. What is the value of (O-E)²/E for the first class?
5. What is the value of the chi-squared statistic (that is, the sum
over all seven
classes of (O-E)²/E)?
With seven classes, the chi-square statistic has 7-1 = 6 degrees
of freedom and the critical
value of a 5% significance level is 12.6. If your test statistic is
less than 12.6, you should
conclude that the observed data show a good fit to the
hypothesis.
6. Does the data show a good fit to the normal distribution with
mean 10 and
standard deviation 0.5 (0 for no, 1 for yes) ?
7. Based on your limited experience, is the following statement
true (use 1) or
false (use 0)?
Means of samples of size 40 from a Poisson (discrete)
distribution are
approximately normal (continuous).
{Example 14}
Question B
The time (in minutes) required for six-year old children to
assemble a certain toy is
believed to be normally distributed with a known standard
deviation of 3.0. The data in Table
B gives the assembly times for a random sample of 25 children.
Read the data into the EXCEL worksheet, and compute and report the mean and
standard deviation.
8. What was the mean assembly time for this sample of 25 six-
year old children?
9. What was the estimated standard deviation?
{Examples 1 and 9}
When the population standard deviation is known or given, one
should use a standard normal distribution to calculate a
confidence interval for the population
mean. The procedure for calculating a large sample confidence
interval for one mean involves
three basic steps:
a) determine a critical value from the appropriate distribution
(for a 90% confidence
interval with known standard deviation the critical value is
z_{0.05} = 1.645),
b) calculate the margin of error of the estimate, E = z_{α/2} · σ/√n,
and
c) calculate lower limit = mean - margin of error,
and upper limit = mean + margin of error.
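The three steps can be sketched in Python as below. The sample mean, σ and n are hypothetical values chosen for illustration, not the Table B data, and `z_interval` is an illustrative helper name. The critical value is taken from the standard normal distribution via `NormalDist().inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.90):
    """Confidence interval for one mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # a) critical value, about 1.645 for 90%
    margin = z * sigma / sqrt(n)              # b) margin of error E
    return margin, mean - margin, mean + margin   # c) lower and upper limits

# Hypothetical inputs, for illustration only
margin, lower, upper = z_interval(mean=50.0, sigma=2.0, n=16)
print(round(margin, 4), round(lower, 4), round(upper, 4))
```

Note that the known σ, not the standard deviation estimated from the sample, enters the margin of error.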
10. What was the margin of error of the estimate for a 90%
confidence interval?
11. What was the lower limit of the 90% confidence interval for
average
assembly time?
12. What was the upper limit of the 90% confidence interval for
average
assembly time?
13. From this example, would you say that the following
statement is true (use 1)
or false (use 0) ?
The lower confidence limit must always be less than the sample
mean and
the upper confidence limit must always be greater.
14. From this example, would you say that the following
statement is true (use 1)
or false (use 0)?
When one has a choice of a known (or given) standard deviation
and an
estimated standard deviation, one should ignore the estimated
standard
deviation in calculating confidence intervals.
{Example 15}
Question C
The level of monoamine oxidase (MOA) activity (nmol/hr/mg
protein) was recorded for
fourteen non-responsive depressed patients who had been
treated with phenelzine. MOA activity
is assumed to follow a normal distribution. The data are stored
in a single column of Table C.
You are asked to calculate a point estimate and an interval
estimate of the mean MOA activity of
this type of patient. Nothing is known about the variability of
MOA activity.
Read the data into the EXCEL worksheet, and compute and
report the mean and standard deviation.
15. What was the point estimate for the mean MOA activity for
this sample of 14
depressed patients?
16. What was the standard deviation?
{Examples 1 & 9}
When data has a normal distribution but is from a small
(<30) sample or when data is from a
large sample (≥30) and in either case σ is not known, one
should use a t-distribution to calculate
a confidence interval for the population mean. The procedure
for calculating a confidence
interval for one mean when σ is not known involves three basic
steps:
a) determine a critical value from the appropriate distribution
(for a 90% confidence
interval with estimated standard deviation the critical value is
tα/2,n-1 = t0.05,13 = 1.771),
b) calculate the margin of error of the estimate, E = tα/2,n-1 × s/√n, and
c) calculate lower limit = mean – margin of error
and upper limit = mean + margin of error
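A minimal sketch of the t-based procedure follows. The critical value is passed in as an argument, as it would be read from a t table or the EXCEL T.INV function; the value 1.771 is the one quoted above for 13 degrees of freedom. The function name and the 14 data values are hypothetical and are not the Table C data.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(data, t_crit):
    """Confidence interval for one mean when sigma is estimated (hypothetical helper)."""
    n = len(data)
    xbar = mean(data)              # point estimate
    s = stdev(data)                # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)  # step b: margin of error
    return xbar, s, margin, xbar - margin, xbar + margin  # step c

# Hypothetical sample of 14 observations
sample = [10.0, 12.0] * 7
t_mean, t_sd, t_margin, t_lower, t_upper = t_confidence_interval(sample, t_crit=1.771)
print(t_mean, round(t_sd, 4), round(t_margin, 4), round(t_lower, 4), round(t_upper, 4))
```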
17. What was the margin of error of estimate for a 90%
confidence interval in
this sample of 14 depressed patients?
18. What was the lower limit of the 90% confidence interval for
average MOA
activity?
19. What was the upper limit of the 90% confidence interval for
average MOA
activity?
20. From these examples, would you say that the following
statement is true (use
1) or false (use 0)?
All confidence intervals are calculated by calculating a point
estimate and then
subtracting and adding a margin of error of the estimate.
{Example 16}
- END OF ASSIGNMENT 5 -
Introductory Statistics Laboratory
Assignment #6
Purpose
The objectives of this assignment are to:
a) calculate a confidence interval for a proportion and
b) present confidence intervals and tests of hypothesis for
matched pairs.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
Opinion polls are a popular method for assessing product
preference, political preference,
and more. As a simple example, consider that a poll was taken
ten days prior to a civic election
to try to predict what proportion of the electorate would vote for
the incumbent mayor. The data
in Table A represents the results of a moderate sample of
persons who were asked if they would
vote for the same mayor; a yes was recorded as 1, a no as 0.
You are required to analyze the
results of the poll and predict what proportion of voters will
vote for the incumbent.
Copy the data from Table A into the EXCEL worksheet, prepare a
histogram to count the number of yes (1)
and no (0) responses, and calculate the proportion who
indicated that they would vote for the
incumbent mayor. Note that, since yes and no are represented by
1 and 0, the proportion of yes
can be determined by calculating the sum and dividing by the
total sample size.
1. How large was the sample of voters represented in this poll?
2. What proportion of the sample voters indicated they would
vote for the
incumbent mayor?
{Examples 1, 2 and 10}
Calculate a 90% confidence interval for the proportion of voters expected to vote for the
incumbent mayor. The procedure for calculating a confidence
interval for a proportion involves
three basic steps.
3. Determine the α/2 critical value for the appropriate
distribution (standard
normal in this case). Use the NORM.INV function to calculate
the critical value
=NORM.INV(0.95,0,1). What is the critical value for a 90%
confidence interval
based on the standard normal distribution?
4. What is the standard error of the estimated proportion of
polled voters who
favour the incumbent? $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$
5. What is the margin of error of the estimated proportion?
6. What is the lower 90% confidence limit on the proportion of
voters who will
vote for the incumbent?
7. What is the upper 90% confidence limit on the estimated
proportion of voters
who will vote for the incumbent?
{Example 17}
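The calculations in questions 3 to 7 can be sketched as follows, assuming the poll is held as a list of 1 (yes) and 0 (no) responses. The function name and the poll of 100 responses are hypothetical and are not the Table A data.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(responses, confidence=0.90):
    """Large-sample confidence interval for a proportion (hypothetical helper)."""
    n = len(responses)
    p_hat = sum(responses) / n           # proportion of yes (1) responses
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # like =NORM.INV(0.95,0,1)
    se = sqrt(p_hat * q_hat / n)         # standard error of the proportion
    margin = z * se                      # margin of error
    return n, p_hat, se, margin, p_hat - margin, p_hat + margin

# Hypothetical poll: 40 yes and 60 no responses
poll = [1] * 40 + [0] * 60
poll_n, p_hat, p_se, p_margin, p_lower, p_upper = proportion_confidence_interval(poll)
print(poll_n, p_hat, round(p_se, 4), round(p_margin, 4), round(p_lower, 4), round(p_upper, 4))
```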
In the election, 23,217 of the 58,839 persons who voted
actually voted for the incumbent. Calculate and report the actual
proportion that voted for the
incumbent.
8. What was the proportion that actually voted for the
incumbent?
9. Based on the results given in questions 6, 7 and 8, which of
the following
statements (1, 2 or 3) is most correct?
1 - The poll of a sample of voters gave a good indication of the
final vote.
2 - Many of the voters who would have voted for the incumbent
at the time of the poll
must have changed their minds.
3 - The persons sampled in the poll must have contained an
unusually low proportion of
those who favoured the incumbent.
{Example 10}
Question B
The Monster Chemical Company believes that its herbicide
(Avena-doom) is better than
its competitor's herbicide (Avena-kill) for controlling wild oat
in barley fields. To demonstrate
the advantage of their herbicide over that of their competitor,
Monster grew side-by-side plots of
barley treated with each of the two herbicides in a large sample
of farmers' fields throughout
western Canada. The company then wished to compare the
yields of barley treated with the two
types of herbicides.
Yield of barley will vary from farm to farm regardless of which
herbicide is used. Differences in climate, agronomic practices, and
type of barley grown all
cause variation. For this reason, it is desirable to match the data
from the two plots on each farm.
The analysis is one of looking at differences between matched
pairs.
Copy the data, including barley yield with Avena-doom (second
column), and barley yield with Avena-kill (third column) from
the three columns in Table B into
columns of the EXCEL worksheet. Describe the data from the
two treatments.
10. What was the average barley yield for plots treated with the
Avena-doom
herbicide?
11. What was the standard deviation of yields of barley plots
treated with
Avena-doom?
12. What was the average yield of plots treated with Avena-kill?
13. What was the standard deviation with Avena-kill?
{Examples 1 and 9}
calculated and then analyze the
differences.
14. What was the mean of the differences between yield of
barley plots treated
with Avena-doom and Avena-kill?
15. What was the standard deviation of the differences (for each
pair)?
16. Was the standard deviation of the differences smaller (0) or
larger (1) than
the standard deviation of the barley yields from plots treated
with
Avena-doom?
{Examples 10 and 9}
Calculate a 95% confidence interval for the mean of the differences in yield between plots
treated with Avena-doom and those treated with Avena-kill.
NOTE: The standard deviation is estimated from the data so
we use the t distribution.
17. What is the critical value for the confidence interval?
18. What was the margin of error of the estimated mean
difference?
19. What was the lower limit of the 95% confidence interval for
the average
difference in yields of barley treated with Avena-doom and
barley treated with
Avena-kill?
20. What was the upper limit?
{Example 16}
Question C
Use the same data and results of Question B to investigate the
hypothesis that the
increase in barley yield by using Avena-doom instead of Avena-kill
is no greater than 3.0 q/ha
(300 kg/ha). The alternative to this hypothesis is that the
increase is greater than 3 q/ha.
To test this hypothesis, one must calculate a test statistic,
$t = \dfrac{\text{mean of differences} - \text{hypothesized mean } (= 3.0)}{\text{standard error of the differences}}$
The null hypothesis should be rejected if the test statistic
exceeds the critical value from
the theoretical distribution. For a 5% significance level, α =
0.05, the critical value for a
one-tailed test can be found by using the appropriate T.INV
function (see Example 18) with n-1
degrees of freedom. For matched pairs, n is the number of pairs.
In this instance, the null hypothesis should be rejected if the
test statistic exceeds the
critical value.
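The matched-pairs test described above can be sketched as follows. The critical value is supplied by the user (from T.INV or a t table); 1.833 is the one-tailed 5% value for 9 degrees of freedom. The function name and the ten pairs of yields are hypothetical and are not the Table B data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second, hypothesized_mean=0.0):
    """t statistic for matched pairs (hypothetical helper)."""
    diffs = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)                                  # n is the number of pairs
    se = stdev(diffs) / sqrt(n)                     # standard error of the mean difference
    return (mean(diffs) - hypothesized_mean) / se, n - 1

# Hypothetical yields (q/ha) on ten farms
doom = [34, 38, 32, 41, 35, 35, 37, 40, 34, 42]
kill = [30, 32, 28, 35, 31, 29, 33, 34, 30, 36]

t_stat, df = paired_t_statistic(doom, kill, hypothesized_mean=3.0)
t_crit = 1.833                 # one-tailed, alpha = 0.05, df = 9
reject_null = t_stat > t_crit  # reject only if the statistic exceeds the critical value
print(round(t_stat, 3), df, reject_null)
```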
21. What is the value of the test statistic for testing the
hypothesis that the mean
difference is 3.0 q/ha or less?
22. What is the critical value against which the test statistic in
question 21 should
be compared?
23. Should the hypothesis that the yield difference is 3 q/ha or
less be rejected
(1) or not (0)?
{Example 18}
- END OF ASSIGNMENT 6 -
Introductory Statistics Laboratory
Assignment #7
Purpose
This lengthy assignment serves to review calculations of
confidence intervals and tests of
hypothesis for:
a) two means of large independent samples from populations
with unknown and unequal
variances,
b) two means of small independent samples from populations
with the same unknown variance,
c) two proportions from large independent samples.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
The role that cholesterol plays in the development of
"hardening of the arteries"
(atherosclerosis) and heart disease has been widely reported. In
one experiment, a group of
patients who were considered to be high-risk were split into two
equal groups. The first group
was put on a special diet with a high proportion of fish (salmon,
tuna, mackerel and cod). Oil
from these deep-sea fish is known to be very rich in Omega-3
fatty acids. The other (control)
group was maintained on a standard diet (high-protein, low-fat,
complex carbohydrates and
polyunsaturated cooking oil). The change (decrease) in
cholesterol was measured after a period
of time. A greater change is desirable.
The (simulated) data (mg decrease per decilitre of blood) for
the Omega-3 group is stored
in Table A1, and the data for the control group is stored in
Table A2. You are required to
calculate a 95% confidence interval for the average difference
in cholesterol reduction and to test
the hypothesis that there was no difference between the two
diets in average reduction of
cholesterol.
Copy the data from the 'Omega-3' group [Table A1] and the data
from the 'control' group [Table
A2] into the EXCEL worksheet. Determine and report the
number of observations in each group,
the mean change (mg/dl) in each group and the standard
deviation of the change in each group.
1. How many patients were in each diet group?
2. What was the mean (decrease) in cholesterol for the Omega-3
group of
patients?
3. What was the standard deviation in that group?
4. What was the mean (decrease) in cholesterol for the control
group of patients?
5. What was the standard deviation in the control group?
{Examples 1 and 9}
variances that are unequal. We
can use the normal distribution as an approximation to the t
distribution when the sample sizes
are large. The method for calculating a large-sample confidence
interval for the difference
between two means consists of three basic steps.
a) Estimate the difference between the two sample means
and the standard error of the
difference between the two sample means.
6. What is the estimated difference of means?
7. Standard error of the difference between means $= \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
What is the standard error of the difference of means?
b) Calculate the margin of error of the estimated difference of
means. For this
large-sample 95% confidence interval we can approximate with
a z value which is z0.025 = 1.96.
Calculate the confidence interval as difference between means ±
margin of error.
8. What is the margin of error of the estimated difference?
9. What is the lower limit for the 95% confidence interval of the
difference in
cholesterol reduction between Omega-3 and control diets?
10. What is the upper limit?
{Example 19}
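The large-sample interval for a difference between two means can be sketched from summary statistics as below. The function name and all input values are hypothetical and are not the Table A1 or A2 results.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_difference_ci(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Large-sample CI for a difference of two means, unequal variances (hypothetical helper)."""
    diff = mean1 - mean2                                # step a: estimated difference
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)              # step a: standard error of the difference
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * se                                     # step b: margin of error
    return diff, se, margin, diff - margin, diff + margin

# Hypothetical summary statistics for two groups of 50 patients
d_diff, d_se, d_margin, d_lower, d_upper = large_sample_difference_ci(
    mean1=20.0, s1=6.0, n1=50, mean2=17.0, s2=8.0, n2=50)
print(d_diff, round(d_se, 4), round(d_margin, 4), round(d_lower, 4), round(d_upper, 4))
```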
The test of the hypothesis of no difference between the two diets
proceeds as follows. Since we expect that the Omega-3 diet
should give a greater decrease in
cholesterol than the control, we will use a one-tailed alternative
hypothesis. Use a 5%
significance level to test the null hypothesis that there is no
difference between the diets against
an alternative that the difference between Omega-3 and control
groups is greater than zero.
The test of hypothesis has two basic steps:
a) Compute the test statistic (z) as the difference in means
divided by standard error of the
difference.
b) The null hypothesis should be rejected if the test statistic
exceeds the critical value for
a one-tailed alternative (approximately 1.645 for 5%
significance in a large-sample, one-tailed
test).
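The two steps of the one-tailed test can be sketched as below. The p-value line is an addition for illustration and is not part of the lab procedure; the function name and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_z_test(mean1, s1, n1, mean2, s2, n2, alpha=0.05):
    """Large-sample one-tailed z test of mean1 > mean2 (hypothetical helper)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z_stat = (mean1 - mean2) / se             # step a: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z_stat)    # upper-tail probability (illustrative extra)
    return z_stat, z_crit, p_value, z_stat > z_crit  # step b: reject if z exceeds critical

# Hypothetical summary statistics for two groups of 50 patients
z_stat, z_crit, p_value, reject = one_tailed_z_test(20.0, 6.0, 50, 17.0, 8.0, 50)
print(round(z_stat, 4), round(z_crit, 4), reject)
```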
11. What is the value of the test statistic?
12. Should the null hypothesis be rejected and the conclusion be
that Omega-3
diet did indeed cause a greater reduction in cholesterol than the
control diet?
Yes =1, No = 0
{Example 19}
Question B
In some law schools, the score on a test known as LSAT is an
important criterion for
acceptance. Two law schools decided to compare the LSAT
scores of students registered in their
respective schools. LSAT scores for students in Law school 1
are stored in Table B1 and those
for students from Law school 2 in Table B2.
Assume that the variances of LSAT scores are equal in the two
schools. You are asked to
calculate a 90% confidence interval for the difference in
average LSAT scores and to test the
hypothesis that students from the two schools do not differ in
their average LSAT scores. Use a
5% significance level.
Copy the LSAT scores from Law school 1 and from Law school 2 into the
EXCEL worksheet. Compute and report the number, means and
standard deviations of scores
from each school.
13. How many LSAT scores from school 1?
14. What was the mean LSAT score from school 1?
15. What was the standard deviation of scores from school 1?
16. How many LSAT scores from school 2?
17. What was the mean LSAT score from school 2?
18. What was the standard deviation of scores from school 2?
{Examples 1 and 9}
Use the following steps to calculate a 90% confidence
interval for the difference in mean
LSAT scores when variances are unknown but assumed to be
equal.
a) calculate the difference between the two means (school 1 -
school 2)
b) calculate the pooled variance for the two samples:
c) calculate the standard error of the difference:
d) Calculate the critical value and margin of error for α = 0.10.
Use the T.INV function to
get the critical value. Multiply the critical value by the standard
error of the difference to get the
margin of error. Use degrees of freedom = n1 + n2 – 2.
e) Calculate the lower and upper 90% confidence limits
19. What is the estimated pooled variance for this data?
20. What is the standard error of the difference?
21. What is the margin of error of the difference?
22. What is the lower limit of the difference between the two
schools in LSAT
scores?
{Example 20}
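Pooled variance:
$s^2_{pool} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
where
n1 = size, sample 1
n2 = size, sample 2
s1 = st.dev, sample 1
s2 = st.dev, sample 2
Standard error of the difference:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2_{pool}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$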
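A minimal sketch of the pooled-variance interval follows. The critical value is passed in as it would be obtained from T.INV or a t table; 1.725 is the two-sided 90% value for 20 degrees of freedom. The function name and the summary statistics are hypothetical and are not the Table B1 or B2 results.

```python
from math import sqrt

def pooled_difference_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for a difference of two means with a pooled variance (hypothetical helper)."""
    diff = mean1 - mean2                                      # step a
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))  # step b
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))                 # step c
    margin = t_crit * se                                      # step d
    return diff, pooled_var, se, margin, diff - margin, diff + margin  # step e

# Hypothetical samples: n1 = 10, n2 = 12, so df = n1 + n2 - 2 = 20
pl_diff, pl_var, pl_se, pl_margin, pl_lower, pl_upper = pooled_difference_ci(
    mean1=160.0, s1=4.0, n1=10, mean2=157.0, s2=5.0, n2=12, t_crit=1.725)
print(pl_diff, pl_var, round(pl_se, 4), round(pl_margin, 4))
```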
Test the hypothesis that the means of the
two groups of LSAT scores are equal when the samples are
independent and the population
variances are unknown but equal. The test statistic is the
difference in means minus zero divided
by the standard error of the difference. The null hypothesis
should be rejected if the test statistic
is less than -tα/2,df or greater than tα/2,df where df = n1 + n2 -
2 and α=0.05 is the chosen
significance level. Use the T.INV function to calculate the
critical values for this two-tailed test.
23. What is the value of the test statistic for testing the
hypothesis that the mean
LSAT scores are the same for the two law schools?
24. Using the 5% significance level, should the null hypothesis
be rejected (1) or
not (0)?
{Example 20}
Question C
The legislature of a southern state in the U.S. passed a rule,
commonly called "no-pass,
no-play", which prohibits a student who fails in any subject
from participating in any
extracurricular activity for six weeks. Data were collected for
students involved in football,
volleyball, cross country, and band for the first six-week
grading period. Records were kept from
last year and this year.
The numbers of students are stored in column 1 and the
proportions sidelined because of
the rule are stored in column 2 of Table C, the first row being
for last year and the second for this
year.
values.
25. How many students were there in last year's sample?
26. What proportion of the last year's students were sidelined
because of one or
more failures?
27. How large was this year's sample?
28. What proportion failed and were sidelined this year?
{Example 1}
Calculate a 90% confidence interval for the change (last year minus this year)
in proportion of students sidelined.
a) Calculate the difference in proportions.
b) Calculate the standard error of the difference.
$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
c) Calculate the margin of error of estimate. For a 90%
confidence interval with large
samples, use z0.05 = 1.645.
d) Calculate the lower and upper limits.
29. What is the upper 90% confidence limit on the change in
proportion of
students sidelined because of failure?
{Example 21}
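Steps a) to d) for the difference between two proportions can be sketched as follows. The function name, sample sizes and proportions are hypothetical and are not the Table C values.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(p1, n1, p2, n2, confidence=0.90):
    """Large-sample CI for a difference of two proportions (hypothetical helper)."""
    diff = p1 - p2                                          # step a
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # step b
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # about 1.645 for 90%
    margin = z * se                                         # step c
    return diff, se, margin, diff - margin, diff + margin   # step d

# Hypothetical values: last year 30% of 200 sidelined, this year 20% of 250
pr_diff, pr_se, pr_margin, pr_lower, pr_upper = two_proportion_ci(0.30, 200, 0.20, 250)
print(round(pr_diff, 4), round(pr_se, 4), round(pr_margin, 4))
```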
Test the null hypothesis that the proportion sidelined has not changed against an alternative that the proportion
sidelined has decreased (that is, the difference in proportions is
greater than zero). Use a 5%
significance level.
NOTE: Under the null hypothesis, the proportions are equal
and we should therefore calculate
an average proportion for the two groups. This will result in a
new estimate of the standard error
of the difference between sample proportions.
average (pooled) proportion =
30. What was the average (pooled) proportion sidelined?
31. Now use the pooled proportion to calculate the standard
error of the
difference between the two proportions.
$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
What is the value of the test statistic for testing the hypothesis
that the
proportion did not change (remember to divide by the standard
error of the
difference between the two proportions which was calculated
using the
pooled proportion)?
Average (pooled) proportion: $\bar{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
Use a one-tailed test with a 5% significance level to answer
the following question.
Remember that you will reject the null hypothesis if the test
statistic exceeds the critical value
(1.645 in this case).
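The pooled-proportion test can be sketched as below, using the pooled proportion only for the standard error under the null hypothesis. The function name and the input values are hypothetical and are not the Table C values.

```python
from math import sqrt
from statistics import NormalDist

def pooled_proportion_z_test(p1, n1, p2, n2, alpha=0.05):
    """One-tailed z test that p1 > p2 using a pooled proportion (hypothetical helper)."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)                  # average (pooled) proportion
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))       # standard error under the null
    z_value = (p1 - p2) / se                                 # test statistic
    critical = NormalDist().inv_cdf(1 - alpha)               # about 1.645
    return p_bar, se, z_value, z_value > critical

# Hypothetical values: last year 30% of 200 sidelined, this year 20% of 250
p_bar, pooled_se, z_value, reject_equal = pooled_proportion_z_test(0.30, 200, 0.20, 250)
print(round(p_bar, 4), round(pooled_se, 4), round(z_value, 4), reject_equal)
```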
32. Was the superintendent of schools justified in saying, "We
are very pleased
with the improvement. It shows coaches and students are taking
the rule
seriously"? Answer 1 for yes or 0 for no.
{Example 21}
- END OF ASSIGNMENT 7 -
Introductory Statistics Laboratory
Assignment #8
Purpose
In this assignment calculations will be completed for analyses
of variance for:
a) a one-way design,
b) a two-way design with more than one observation per cell,
and
c) a two-way design with one observation per cell (randomized
complete block design)
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When you
have completed the assignment and exit from EXCEL, you are
required to enter your answers into
the ISLeX program.
Question A
Gasoline mileage (mpg) was measured on several cars of each
of four different makes
(coded 1, 2, 3 and 4). The make of each car is stored in the first
column, and the mileage for each
car is stored in the second column, of Table A. You need to
conduct an analysis of variance to see if
there are differences among the four makes in gasoline mileage.
You should also estimate the
mileage of each of the four makes of cars.
Copy the data into the EXCEL worksheet. Name the columns and
view the data.
{Example 1}
Carry out a one-way analysis of variance on this data. Since
each data point can be classified only
according to the make of car, a one-way analysis of variance is
required. It is important that students
be able to interpret analysis of variance tables such as those
produced by EXCEL. For this analysis,
you will need to copy data for each make into different adjacent
columns. Fill in the following
one-way analysis of variance table and answer the first five
questions.
Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
Make of car         | 3                  |                |             |   |
Error               |                    |                |             |   |
1. What is the value of the F-statistic for testing the null
hypothesis that there are no
differences in gasoline mileage among the four makes of
automobile?
2. What are the degrees of freedom associated with the
numerator of this test
statistic?
3. What are the degrees of freedom associated with the
denominator of the F-value
for MAKE of car?
4. What is the estimate of the pooled variance within makes of
cars (i.e. the Error
mean square)?
5. What are the degrees of freedom for this variance in #4?
{Example 22}
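To help interpret the one-way analysis of variance table, the sketch below computes its entries directly from grouped data. It is not the EXCEL procedure itself; the function name and the mileage values are hypothetical. The P column is omitted because the Python standard library has no F distribution.

```python
from statistics import mean

def one_way_anova(groups):
    """Entries of a one-way ANOVA table, without the p-value (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups (treatment) sum of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error   # pooled variance within groups
    return {"df_between": df_between, "df_error": df_error,
            "ss_between": ss_between, "ss_error": ss_error,
            "ms_between": ms_between, "ms_error": ms_error,
            "F": ms_between / ms_error}

# Hypothetical mileages (mpg) for four makes, three cars each
makes = [[20, 22, 24], [25, 27, 29], [30, 32, 34], [21, 23, 25]]
anova = one_way_anova(makes)
print(anova)
```

The numerator degrees of freedom are those of the make of car, and the denominator degrees of freedom are those of the error.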
NOTE: For the following questions (6 - 13), use the error mean
square and the error degrees of
freedom to calculate confidence intervals and to test hypotheses
about pairs of means.
Determine the number of cars tested and the average mileage for each make of car
and record them in the following table.
Make of car | Number tested | Average mileage
1           |               |
2           |               |
3           |               |
4           |               |
6. How many cars of make 2 were evaluated in this experiment?
7. What was the average gasoline mileage for make 2?
8. How many cars of make 3 were evaluated in this experiment?
9. What was the average gasoline mileage for make 3?
Calculate a 95% confidence interval for the average gasoline mileage of make 2. Use the method for single
means when σ is not known, but use the Error Mean Square as
the estimate of the variance. The
degrees of freedom will be the Error DF, not n-1!
Reminders:
Confidence Interval = mean ± margin of error
Margin of error = critical value * standard error
Use critical value for T at α/2 = 0.025 and df = error df (t table
or EXCEL T.INV function)
Use standard error = √(error mean square/number of
observations of that make of car)
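The reminders above can be sketched as below. The critical value is passed in (from a t table or T.INV) using the error degrees of freedom; 2.306 is the two-sided 95% value for 8 degrees of freedom. The function name and all input values are hypothetical.

```python
from math import sqrt

def error_ms_confidence_interval(group_mean, n_obs, error_ms, t_crit):
    """CI for one group mean using the ANOVA error mean square (hypothetical helper)."""
    se = sqrt(error_ms / n_obs)  # standard error from the Error MS, not the group's own variance
    margin = t_crit * se         # margin of error = critical value * standard error
    return se, margin, group_mean - margin, group_mean + margin

# Hypothetical values: mean 27.0 mpg from 3 cars, Error MS = 4.0 with 8 error df
ms_se, ms_margin, ms_lower, ms_upper = error_ms_confidence_interval(
    group_mean=27.0, n_obs=3, error_ms=4.0, t_crit=2.306)
print(round(ms_se, 4), round(ms_margin, 4), round(ms_lower, 4), round(ms_upper, 4))
```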
10. What was the margin of error for the confidence interval for
gasoline mileage
of make 2?
11. What was the lower 95% confidence limit for make 2
mileage?
12. What was the upper 95% confidence limit for make 2
mileage?
{Example 24}
Test the hypothesis that the average gasoline mileages of makes
2 and 3 do not differ. Use the
method for single means when σ is not known with the Error
MS serving as the pooled variance.
Reminders:
Test statistic t = difference of means / standard error of
difference of means.
The standard error of the difference equals square root of the
sum of variances of the two
means. The variance of each mean is estimated by the error
mean square/number of
observations in that mean.
13. What is the value of the t test statistic for testing the
hypothesis that makes 2
and 3 do not differ in mileage?
{Example 24}
Question B
The data in Table B represents the times (in seconds) for men
of three different ages (40, 50
and 60) in each of three different fitness classes (1, 2 and 3) to
run a 2 km course. For each runner,
age is recorded in the first column, fitness category is recorded
in the second column, and running
time is recorded in the third.
Two men in each of the nine categories ran the course. You
should be interested in
determining whether age and/or fitness affect running time.
Each data point can be classified
according to age of the runner or according to fitness of the
runner. The data therefore requires a
two-way analysis of variance. It is possible that differences
among ages of runners will depend upon
the fitness categories of those runners. The model for the
analysis should include an interaction
term.
Copy the data into the EXCEL worksheet, name the columns, and view the data. You
will have to copy the data into three different columns each
with six observations in order to
perform the following analysis (see Example 25).
{Example 1, 25}
Carry out a two-way analysis of variance and answer the
following questions.
Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
Age of runner       | 2                  |                |             |   |
Fitness of runner   | 2                  |                |             |   |
Interaction         | 4                  |                |             |   |
Error               | 9                  |                |             |   |
14. What is the value of the F test statistic for testing the
hypothesis that age, on
average, has no effect on running time?
15. What are the numerator degrees of freedom for that F
statistic reported in
question 14?
16. What are the denominator degrees of freedom for that F
statistic reported in
question 14?
17. What is the value of the F test statistic for testing the
hypothesis that fitness, on
average, has no effect on running time?
18. What is the value of the F test statistic for testing the
hypothesis that the effect
of age (if any) on running time does not depend on the runner's
fitness?
NOTE
In analysis of variance, the null hypothesis should be rejected
whenever the calculated F-statistic is
greater than the critical value for a chosen significance level
and appropriate numerator and
denominator degrees of freedom. Equivalently, the null
hypothesis should be rejected whenever the
computed p-value is less than the chosen significance level. Use
α = 0.01 (significance level =1 %)
and answer the following two questions.
19. Should the null hypothesis that age has no effect on running
time be rejected (1)
or not rejected (0)?
20. Should the null hypothesis that the effect of age is
independent of the effect of
fitness be rejected (1) or not rejected (0)?
{Example 25}
following three questions.
Age     | Fitness 1 | Fitness 2 | Fitness 3 | Average
40      |           |           |           |
50      |           |           |           |
60      |           |           |           |
Average |           |           |           |
21. What was the average running time for all 60-year olds?
22. What was the average running time for all men in fitness
category 3?
23. What was the mean running time of the two 60-year,
category 3 runners?
{Example 25}
Question C
In many agricultural and biological experiments, one may use a
two-way model with only
one observation per cell. When one of the factors is related to
the grouping of experimental units
into more uniform groups, the design may be called a
randomized complete block design (RCBD).
The analysis is similar to a two-way analysis of variance
(question B) except that the model does
not include an interaction term.
The specific leaf areas (area per unit mass) of three types of
citrus each treated with one of
three levels of shading are stored in Table C. The first column
contains the code for the shading
treatment, the second column contains the code for the citrus
species, and the third column contains
the specific leaf area. Assume that there is no interaction
between citrus species and shading. Carry
out a two-way analysis of this data.
The shading treatment and citrus species are coded as follows:
Treatment  | Code | Species             | Code
Full sun   | 1    | Shamouti orange     | 1
Half shade | 2    | Marsh grapefruit    | 2
Full shade | 3    | Clementine mandarin | 3
Copy the shading treatment, citrus species, and specific leaf area into the EXCEL worksheet,
label the columns and look at the data.
{Example 1}
Carry out a two-way (without interaction) analysis of this
data and answer the following questions.
Use a 5% significance level.
Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
Shading treatment   | 2                  |                |             |   |
Citrus species      | 2                  |                |             |   |
Error               | 4                  |                |             |   |
24. Should the hypothesis that shading treatment has no effect
on specific leaf area
be rejected (1) or not (0)?
25. Should the hypothesis that citrus species do not differ in
specific leaf area be
rejected (1) or not (0)?
26. What is the estimate of the average (pooled) variance in this
experiment (i.e.
Error mean square)?
27. What are the error degrees of freedom for the pooled
variance?
{Example 26}
Recall that the confidence interval for a difference between two
means is based on a
calculation of the margin of error of the estimated difference.
With a common variance (Error MS)
and the same number of observations in all shading treatments,
the margin of error of an estimated
difference will be the same whether we calculate it for
treatments 1 and 2, 1 and 3, or 2 and 3. This
margin of error of the difference between two means is
sometimes referred to as the least significant
difference (LSD).
experiment.
LSD = critical t value × standard error of difference.
Use the critical t value with 4 degrees of freedom, t0.025,4 = 2.776.
n is the number of times each treatment was tested (in
this case n = 3 for the 3 species).
$LSD(\alpha) = t_{\alpha/2,\,edf} \sqrt{\dfrac{2 \times \text{Error Mean Square}}{n}}$
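The least significant difference and the comparison rule can be sketched as follows. The critical value 2.776 and n = 3 come from the text above; the function name, the Error Mean Square of 6.0 and the treatment means are hypothetical and are not the Table C results.

```python
from math import sqrt

def least_significant_difference(t_crit, error_ms, n):
    """LSD = critical t value * standard error of a difference (hypothetical helper)."""
    se_difference = sqrt(2 * error_ms / n)  # standard error of a difference of two means
    return t_crit * se_difference

lsd = least_significant_difference(t_crit=2.776, error_ms=6.0, n=3)

# Hypothetical mean specific leaf areas
means = {"Full sun": 10.0, "Half shade": 14.0, "Full shade": 21.0}
sun_vs_half = abs(means["Full sun"] - means["Half shade"]) > lsd      # significant?
half_vs_full = abs(means["Half shade"] - means["Full shade"]) > lsd   # significant?
print(round(lsd, 3), sun_vs_half, half_vs_full)
```

Two treatments are judged different only when the absolute difference of their means exceeds the LSD.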
28. What is the least significant difference (α = 0.05) for
comparing shading
treatments in this experiment?
{Example 24}
Any two shading treatments are judged to be significantly
different if their absolute (ignore
the + or - sign) difference exceeds the least significant
difference.
differences. Compare the appropriate
differences to the LSD to answer the following questions.
Shading Treatment | Mean Specific Leaf Area
Full Sun          |
Half Shade        |
Full Shade        |
29. Should the hypothesis that the specific leaf area under full
sun is not different
from the specific leaf area in half shade be rejected (1) or not
rejected (0)?
30. Should the hypothesis that the specific leaf areas of half
shade and full shade
are not different be rejected (1) or not rejected (0)?
{Example 24}
- END OF ASSIGNMENT 8 -
Introductory Statistics Laboratory
Assignment #9
Purpose
This final assignment presents some of the important points to
consider in correlation
analysis and simple linear regression analysis.
Question A
The data in Table A gives the (simulated) advertising
expenditures of 25 large companies
for last year and this year. You are asked to investigate the
question of whether or not expenditures
in one year are related to expenditures in another. The data file
contains the company number in the
first column, last year's expenditures ($ millions) in the second
column, and this year's expenditures
($ millions) in the third column.
Copy the data into the EXCEL worksheet,
name the columns, and view the data.
1. Which company had the greatest advertising expenditures last
year?
2. Which company had the greatest advertising expenditures this
year?
{Example 1}
expenditures in the
two years and answer the following
question.
3. Which of the following three statements (1, 2 or 3) most
correctly describes the
relationship between last year's and this year's expenditures?
1 - There is little relationship between what a company spends
on advertising in one year and
what that company spends in another.
2 - Companies that spent most on advertising last year tended
to be among those spending the
greatest amount this year.
3 - Companies that spend a lot on advertising in one year tend
to reduce their advertising
expenditures in the next.
{Example 27}
The relationship between two variables
can be measured by the covariance.
The covariance is a measure of how much two random variables
vary together. The larger the
magnitude of the covariance, the stronger the
relationship.
The value of the covariance is interpreted as follows:
• Positive covariance - indicates that higher than average values
of one variable tend to be
paired with higher than average values of the other variable.
• Negative covariance - indicates that higher than average
values of one variable tend to
be paired with lower than average values of the other variable.
• Zero covariance - if the two random variables are independent,
the covariance will be
zero. However, a covariance of zero does not necessarily mean
that the variables are
independent. A nonlinear relationship can exist that still would
result in a covariance
value of zero.
Calculate the standard deviation for last year's expenditures, the
standard deviation for this year's
expenditures and the covariance between the two.
4. What is the standard deviation of last year's advertising
expenditures ($ millions)
of these 25 companies?
5. What is the standard deviation of this year's advertising
expenditures ($ millions)
of these 25 companies?
6. What is the covariance between the last year's and this year's
advertising
expenditures ($ millions²) of these 25 companies?
Because the covariance depends on the units of the data, it is
difficult to compare covariances
among data sets having different scales. A value that might
represent a strong linear relationship
for one data set might represent a very weak one in another.
The correlation coefficient (r) addresses this issue by
normalizing the covariance (i.e. dividing the
covariance s_xy by the product of the two standard deviations,
s_x × s_y), creating a dimensionless
quantity that allows the comparison of different data sets.
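A minimal Python sketch of this normalization follows. The expenditure values are hypothetical (they are not the Table A data), and `sample_covariance` is an illustrative helper that divides by n - 1 to match the sample standard deviations returned by `statistics.stdev`.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical expenditures in $ millions (not the assignment data)
last_year = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
this_year = [11.0, 13.0, 14.0, 12.0, 19.0, 23.0]

s_xy = sample_covariance(last_year, this_year)
s_x = stdev(last_year)
s_y = stdev(this_year)
r = s_xy / (s_x * s_y)             # correlation coefficient

# Expressing last year's values in $ thousands rescales the covariance,
# but leaves the correlation unchanged.
in_thousands = [v * 1000 for v in last_year]
s_xy_scaled = sample_covariance(in_thousands, this_year)
r_scaled = s_xy_scaled / (stdev(in_thousands) * s_y)

print(round(s_xy, 3), round(s_xy_scaled, 3), round(r, 4), round(r_scaled, 4))
```

Changing the units multiplies the covariance by 1000 while r stays the same, which is the reason the correlation can be compared across data sets.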
7. What is the correlation (r) between last year's and this year's
expenditures?
{Example 28}
expenditures from one year to another?
Test the null hypothesis that there is no relationship between
last year's and this year's expenditures
against an alternative that there is a positive relationship (r >
0). Use a 10% significance level.
Because this is a one-tailed test with 25 pairs of observations
(degrees of freedom = 23), we find
that the critical value against which to compare the estimated
correlation is t = 1.319. Using your r
value and n = 25, calculate the test statistic tcalc and compare.
If the test statistic is greater than the
critical value of 1.319, the null hypothesis will be rejected.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^{2}}}$$
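The calculation of the test statistic can be sketched in Python as follows. The function name `correlation_t` and the value r = 0.30 are hypothetical and chosen only for illustration; the critical value 1.319 and n = 25 come from the text.

```python
from math import sqrt

def correlation_t(r, n):
    """Test statistic for a correlation r from n pairs (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))

n = 25
critical = 1.319        # one-tailed, 10% level, 23 degrees of freedom (from the text)
r_example = 0.30        # hypothetical correlation, not the assignment answer

t_calc = correlation_t(r_example, n)
reject = t_calc > critical   # True means the null hypothesis is rejected

print(round(t_calc, 4), reject)
```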
8. Should the hypothesis that there is no relationship between
last year's and this
year's advertising expenditures be rejected (1) or not (0)?
{Example 28}
Question B
In a study of the role of young drivers in automobile accidents,
data on percentage of
licensed drivers under the age of 21 and the number of fatal
accidents per 1000 licenses were
determined for 32 cities. The data are stored in Table B. The
first column contains a number as the
city code, the second column contains the percentage of drivers
who are under 21, and the third
column contains the number of fatal accidents per 1000 drivers.
The primary interest is whether or
not the number of fatal accidents is dependent upon the
proportion of licensed drivers that are under
21.
Copy the data into the EXCEL worksheet, name the
columns, and view the data.
9. Which city (number) had the highest number of fatal
accidents per 1000 licensed
drivers?
{Example 1}
Plot the number of fatal accidents per 1000 licenses against the
percentage of drivers under 21. Based on the
plot, try to anticipate whether or not the following analysis will
show that there is a significant
increase or decrease in number of fatalities with increases in
percentage of drivers under 21.
{Example 27}
Regression can be used to predict levels of a
dependent variable for specified levels of an independent
variable. Use the EXCEL REGRESSION
command to calculate the intercept and slope of the least-
squares line, as well as the analysis of
variance associated with that line. Fill in the following table
and use the results to answer the next
few questions. Carefully choose your independent and
dependent variables and input them
correctly using EXCEL’s regression command. In this example,
the percentage of drivers under the
age of 21 affects the number of Fatals/1000 licenses.
The regression equation (least-squares line) is
Fatals/1000 licenses = ________ + ________ × % under 21
                       (intercept)  (slope)
Analysis of variance
Source             DF   SS         MS         F          P
Regression          1   ________   ________   ________   ________
Residual (Error)   30   ________   ________
10. What is the estimated increase in number of fatal accidents
per 1000 licenses
due to a one percent increase in the percentage of drivers under
21 (i.e. the
slope)?
11. What is the standard deviation of the estimated slope?
12. What is the estimated number of fatal accidents per 1000
licenses if there were
no drivers under the age of 21 (i.e. the y intercept)?
13. What percentage of the variation in accident fatalities can
be explained by the
linear relationship with drivers under 21 (i.e. 100 × the
unadjusted coefficient
of determination)?
14. Should the hypothesis that the slope does not differ from
zero (no effect of
young drivers on fatals) be rejected (1) or not (0) based on a
test at the 1%
significance level (i.e. is the p-value from the ANOVA less than
0.01)?
15. What are the degrees of freedom for the standard error of
estimate (and the
standard deviation of the slope); i.e. what are the error degrees
of freedom?
{Example 29}
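The quantities requested in questions 10 to 15 can be sketched in Python to show how they relate to one another. The five data points are hypothetical (they are not the Table B data), and the formulas are the standard least-squares ones, offered as an illustration and not as a description of how EXCEL computes its output internally.

```python
from statistics import mean

# Hypothetical data: x = % of drivers under 21, y = fatals per 1000 licenses
x = [8.0, 10.0, 12.0, 14.0, 16.0]
y = [1.0, 1.9, 2.4, 3.1, 3.6]

n = len(x)
mx, my = mean(x), mean(y)
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

slope = sxy / sxx                    # change in y per one-unit change in x
intercept = my - slope * mx          # predicted y when x = 0

ss_total = sum((b - my) ** 2 for b in y)
ss_regression = slope * sxy
ss_residual = ss_total - ss_regression
df_regression, df_residual = 1, n - 2
ms_residual = ss_residual / df_residual
f_ratio = (ss_regression / df_regression) / ms_residual

r_squared = ss_regression / ss_total   # unadjusted coefficient of determination
se_slope = (ms_residual / sxx) ** 0.5  # standard deviation (error) of the slope

print(round(slope, 4), round(intercept, 4), round(100 * r_squared, 2),
      round(se_slope, 5), df_residual, round(f_ratio, 2))
```

With a single independent variable, the square of slope / se_slope equals the F ratio of the analysis of variance, so the two tests of a zero slope agree.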
The regression output can also be used to calculate a confidence interval for
the slope of the least-squares line and to test hypotheses other
than H0 : β1 = 0. In both cases, one
needs to have an estimate of the slope and of its standard
deviation (sometimes called standard
error). Furthermore, one needs to recognize that the degrees of
freedom for the standard deviation are
the same as the error degrees of freedom (n - 2).
Note that EXCEL gives the standard error of the slope directly,
in the Standard Error column of its regression output; this is the
quantity called the standard deviation of the slope in this manual.
Therefore, you must not divide by the
square root of the sample size as in
Example 16.
Use the above information to calculate a 90% confidence
interval for the slope of the true regression
line. For 30 degrees of freedom and α = 0.1, the critical t-value
is 1.697.
16. What is the margin of error for calculating a 90%
confidence interval for the
slope of the regression line (i.e. 1.697 × the standard deviation
of the slope)?
17. What is the lower 90% confidence limit for the slope?
(i.e. slope – margin of error)
18. What is the upper 90% confidence limit for the slope?
(i.e. slope + margin of error)
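Questions 16 to 18 amount to the following arithmetic, sketched here in Python. The function name is hypothetical, and the slope (0.29) and its standard deviation (0.06) are made-up values, not Table B results; only the critical value 1.697 comes from the text.

```python
def slope_confidence_interval(slope, se_slope, t_critical):
    """Return (margin of error, lower limit, upper limit) for the slope."""
    margin = t_critical * se_slope
    return margin, slope - margin, slope + margin

# 1.697 is the critical t-value quoted in the text for 30 degrees of freedom.
# The slope and its standard deviation below are hypothetical.
margin, lower, upper = slope_confidence_interval(0.29, 0.06, 1.697)

print(round(margin, 4), round(lower, 4), round(upper, 4))
```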
Test the null hypothesis H0 : β1 = 0.05 against
a one-sided alternative H1 : β1 > 0.05. Use a 1 percent
significance level (for which the critical value with 30 degrees of freedom
is 2.457).
Reminder:
$$t = \frac{\text{estimated value} - \text{hypothesized value}}{\text{standard error (deviation) of estimate}} = \frac{\text{slope} - 0.05}{\text{st dev of slope}}$$
19. What is the value of the test statistic for testing this
hypothesis?
20. Should the hypothesis that the increase in fatals per one
percent increase in
drivers under 21 is not greater than 0.05 be rejected (1) or not
(0)?
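A minimal Python sketch of this one-sided test follows. The function names and the slope and standard deviation values are hypothetical; 2.457 is the tabulated one-tailed 1 percent critical value of t for 30 degrees of freedom.

```python
def slope_t_statistic(slope, se_slope, hypothesized):
    """(estimated value - hypothesized value) / standard deviation of the slope."""
    return (slope - hypothesized) / se_slope

def reject_one_sided(t_stat, critical):
    """Return 1 to reject H0 in favour of 'slope > hypothesized', otherwise 0."""
    return 1 if t_stat > critical else 0

# Hypothetical slope and standard deviation, not Table B results
t_stat = slope_t_statistic(0.29, 0.06, 0.05)
decision = reject_one_sided(t_stat, 2.457)

print(round(t_stat, 3), decision)
```

Because the alternative is one-sided, only large positive values of t lead to rejection; a large negative t would not.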
- END OF ASSIGNMENT #9 - THE LAST ASSIGNMENT -
Introductory
Statistics
Laboratory
for Excel
PC Instructions for Excel 2013
Excel Examples
INTRODUCTION
Note: Specific Excel 2013 instructions are shown in [Excel
2013: ] throughout the Excel
examples.
These EXCEL examples provide a basis for learning to use
MICROSOFT EXCEL to
perform various tasks required in the ISLeX laboratory
assignments.
The examples may not refer exactly to the task to be
performed. For instance, in some
cases, the example may use different columns than required for
a particular task.
Your laboratory sessions will be much less frustrating if you
study the assignment and
associated examples before sitting down at a computer.
The examples will not match exactly what you need to do to
complete your assignments.
They should provide an adequate outline, but you will have to
modify the example to complete
your assigned task. For instance, you will need to use different
file names in your lab
assignments than those used in examples. You will also have to
refer to different EXCEL
worksheet columns.
The EXCEL workbook contains one or more worksheets each
identified by a tab on the
lower left part of the window. EXCEL will assign default
names, such as Sheet1, to individual
worksheets or the user can change the name by clicking the
right mouse button on the tab and
choosing the 'rename' option.
Each worksheet is composed of cells arranged in rows and
columns. Rows are identified
by numbers 1, 2, 3 and so on, while columns are identified by
letters A, B, C and so on. After
column Z, naming starts with AA and proceeds to ZZ. Each cell
may contain a number, some
text, or a formula.
In this manual, cells and blocks of cells are referred to by
column letter and row number. To
refer to the cell located in the second row of column C, use C2.
To indicate all cells in the block
that includes rows 2 to 10 of columns B through D, use the cell
designations for the cell in the
upper left corner (i.e. B2) and for the cell in the lower right
corner (i.e. D10) separated by a
colon, thus B2:D10.
Sometimes, it will be useful to enter a formula into a cell and
then copy that formula to
other cells. If the formula in cell B2 refers to cell A1, it will
refer to cell D5 when the formula is
copied to cell E6. If you wish it to continue to refer to cell A1,
use $A$1 instead of A1 in the
formula.
EXCEL commands and subcommands can be selected by
clicking the left mouse button
on the required command or subcommand.
When you first start using Excel, you should become familiar
with three important areas
in the Excel window. Mention has already been made of the
cells arranged in rows and columns
in the worksheet. In fact there may be several worksheets in a
single workbook.
If you place the cursor in a particular cell, the “Name box”
located at the upper left hand
side of the worksheet will indicate the identity of the active
cell, e.g. B5.
If you type a number, name or formula into that cell, it will
also appear in the “Formula
bar” at the top of the worksheet. If you then press the enter key,
the cursor will move to the next
cell and the formula bar will become blank (if the next cell is
empty). If you had entered an
actual formula, it will be evaluated and the result will
appear in the cell in which you entered
the formula. If you made an error and need to edit the formula,
highlight the cell and then move
the cursor to the formula bar to edit the formula.
In these laboratory assignments, you are sometimes required to
combine information
from two parts of an assignment. Typically, each part will result
in a separate workbook in
Excel. You can copy data from one workbook to another by
using the following procedure.
Highlight the data you wish to copy and press Ctrl-C to copy
the data.
Use the Window command of Excel to choose the workbook you
wish to copy to.
Place the cursor where you wish to paste the data and press Ctrl-V.
Note: Rather than using Ctrl-C and Ctrl-V to copy and paste,
you may use Edit->Copy and
Edit->Paste.
Most data analysis tools of Excel default to printing their
results on a new worksheet.
However, most also have an option to specify an output range
on the same worksheet. If you
choose the Output range option, click in the adjacent box and
then highlight the area of the
worksheet where you wish to store the results.
Example 1: Copying data from the assignment webpage into the
EXCEL worksheet.
Your data will be presented to you in a web page. To copy
the data to Excel:
• First highlight the data and either press the key combination
ctrl-c, or select Copy from
the Edit menu to copy the data (to the clipboard).
• Then, switch to the Excel window and either use the key
combination ctrl-v, or select
Paste from the Edit menu to paste the data into Excel.
At this stage, you should now have the data on an Excel
worksheet. (If you wish, you
can name this worksheet LAB0A.DAT by right clicking on its
tab at the bottom and choosing
the rename option.)
This same procedure applies to all assignments. Follow the
above procedure even with
multi-column tables.
If you wish to add a label in cell 1 of column A, move the
cursor to that cell and then
choose Insert->Cells and click OK (or press enter) on the Insert
dialog box to move all cells
down. [Excel 2013: Home Tab – Insert] This will allow you to
type a label in cell A1.
The following procedure will allow you to calculate some
summary statistics for data in
a column. It is good practice to look at summary statistics
before proceeding with further
analysis. This will alert you to the number of data points, their
average value, and a few other
informative characteristics about the data.
Choose Data Analysis… to pop up the Data Analysis window [Excel 2013:
Data Tab – Data
Analysis over on far right side] (SEE NOTE BELOW if Data
Analysis is missing.)
Double click on Descriptive statistics.
With the cursor flashing in the Input Range: box, click on the column letter
for the column with the
data.
If you have entered a name in the first row, click Labels in
first row.
Click in the box preceding Summary statistics, and click on OK or
press the enter key.
EXCEL will create a new worksheet with the summary
statistics. You should note such key
characteristics as count, minimum, mean and maximum. At
more advanced stages, you may
choose to think about kurtosis, skewness and standard deviation
or standard error.
If you wish, you can delete this temporary worksheet by right-
clicking on its tab and
choosing the delete option.
The same basic procedures will be used in later assignments to
enter data from a file that
contains several columns.
NOTE: The Analysis ToolPak is a Microsoft Excel add-in
program that is available when
you install Microsoft Office or Excel. To use it in Excel,
however, you need to load it first.
1. Click the File tab, and then click Options.
2. Click Add-Ins, and then in the Manage box, select Excel
Add-ins.
3. Click Go.
4. In the Add-Ins available box, select the Analysis ToolPak
check box, and then click
OK.
a. If Analysis ToolPak is not listed in the Add-Ins available
box, click Browse to
locate it.
b. If you get prompted that the Analysis ToolPak is not
currently installed on your
computer, click Yes to install it.
5. After you load the Analysis ToolPak, the Data Analysis
command is available in the
Analysis group on the Data tab.
Example 2: Preparing a histogram of data
A histogram is a graphical summary of numerical data. In this
example, data stored in
EXCEL worksheet column A is summarized in a histogram.
Before calculating frequencies in
different groups, you must define the classes. In EXCEL, the
classes are called "bins". For this
example, suppose that the data to be summarized varies from 21
to 28 and you wish to group the
observations into "bins" each with one unit for a class width.
The first bin will include all data
points with values up to and including 22, the second bin will
include values greater than 22 up
to and including 23 and so on. You only need to indicate the
upper boundary for each bin. For
this example, use 22, 23, 24, 25, 26, 27, and 28. These values
need to be entered into a new
column, say column B. You can type the numbers into the first
seven rows of column B.
To actually draw the histogram, you must first calculate
frequencies of data in each bin.
Choose Data analysis [Excel 2013: Data Tab – Data Analysis]
and select Histogram
In the histogram dialog box,
move cursor to Input range and click on top of column A,
move cursor to Bin range and click on top of column B,
if you have labels in A1 and B1, check the Labels option,
and
click on OK or press the enter key.
EXCEL is very slow at this calculation, so be patient! In a few
seconds, you should get a
new sheet in the workbook that contains the upper ends of the
bins and the frequencies of
observations in each bin. In this example, the results look like
this:
Bin Frequency
22 9
23 6
24 6
25 5
26 7
27 2
28 1
More 0
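The binning rule described above (each bin includes its upper boundary, and anything above the last boundary is counted under "More") can be sketched in Python. The function name `bin_frequencies` and the ten data values are hypothetical; they are not the data behind the table above.

```python
from bisect import bisect_left

def bin_frequencies(data, upper_bounds):
    """Count values per bin; each bin includes its upper boundary.

    The final count corresponds to the "More" row (values above the last bound).
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in data:
        # bisect_left finds the first boundary that is >= value
        counts[bisect_left(upper_bounds, value)] += 1
    return counts

bins = [22, 23, 24, 25, 26, 27, 28]
# Hypothetical observations between 21 and 28
data = [21.0, 21.5, 22.0, 22.4, 23.0, 24.9, 25.0, 26.1, 27.0, 28.0]

counts = bin_frequencies(data, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Note that 22.0 falls in the first bin and 22.4 in the second, matching the "up to and including" rule in the text.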
At this point, you should have a numerical representation of a
histogram. Most
histograms are presented in graphical form. To develop a bar
graph to show the histogram,
proceed as follows. Note that Excel creates a bar graph not a
true histogram as there are spaces
between the bars. A true histogram has no spaces between the
bars.
Highlight the data, including titles, using the cursor.
Insert a chart. [Excel 2013: Insert Tab – in Charts choose Insert
Column Chart – select
2D (first choice of the options)]
Excel will automatically produce a chart.
A histogram gives the frequency (number of observations) in
each of various classes. In
EXCEL, the classes are defined by giving the upper boundaries
of each class (bin).
The + sign allows you to format your chart’s elements. You
can click on the boxes to
include whatever elements you feel are appropriate for your
chart. If you want to edit the Axis
Title, you can click into that box and type a new axis title.
The paint brush allows you to choose the style and color of your
chart.
The funnel icon (Chart Filters) allows you to select your data source and make
changes instead of having to
highlight the Excel cells that hold the data and start the chart
all over again.
How to make a true histogram: To get rid of the gaps between
the bars and make a true
histogram, right click on any bar and choose Format Data Series
from the menu that appears.
In the Format Data Series window, choose the three-column
symbol to open Series Options; Gap Width is at the bottom.
Change the gap width to zero and
you will have a true histogram.
You can change the outline of your bars to a different color to
have them appear separated by
clicking the Outline (see arrow below) and changing the color
to black or white.
The resulting chart looks like this (remember to make changes
to your titles according to best
graphing practices, not shown in this chart):
Example 3: Entering data from the keyboard into the EXCEL
worksheet
Occasionally, you will be required to enter data or intermediate
results directly into the
EXCEL worksheet. You merely type the data into the cells
where you wish to store the
information.
Example 4: Calculating relative frequencies
To calculate relative frequencies in each of several classes,
you must divide each
frequency of a class by the sum of all the frequencies. Consider
data summarized in three classes.
Class Frequency
1 5
2 10
3 5
Total 20
The relative frequency for Class 1 is 5/20 = 0.25, for Class
2 is 10/20 = 0.50, and for
Class 3 is 5/20 = 0.25. Note that the relative frequencies must
always sum to 1.0 (within
rounding error). Thus, 0.25 + 0.50 + 0.25 = 1.0.
If the frequencies are stored in EXCEL Worksheet column C,
you can calculate relative
frequencies and store them in another column in the following
way. Suppose 5 is in cell C1, 10
in cell C2 and 5 in cell C3. Move the cursor to cell D1, type
‘=C1/SUM($C$1:$C$3)’ in the
formula bar, and press enter. Don’t forget the = at the beginning
of your equation; otherwise it
will be entered only as text and will not calculate for you. You
should see the value 0.25 in cell D1.
To calculate the remaining relative frequencies, just copy the
formula in cell D1 to cells D2 and
D3. Note that, as the formula is copied, C1 will change to C2
and then to C3, but $C$1:$C$3
will remain constant.
An alternative would be to first calculate the sum (20) and
store in a cell that could then
be used to calculate all relative frequencies. For example, enter
the formula ‘=SUM(C1:C3)’ in
cell C4. Now, use the formula ‘= C1/$C$4’ in cell D1. Again,
copy cell D1 to cells D2 and D3.
You should also confirm that the relative frequencies sum to
1.0.
Use the formula ‘= SUM(D1:D3)’ in cell D4. You can also use
the Σ in the tool bar and Excel
will help you calculate a sum for that column. [Excel
2013:Home Tab – Σ ]
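The same calculation can be sketched in Python, using the three class frequencies from the table above. Dividing each frequency by the total plays the role of the fixed reference $C$1:$C$3 (or $C$4) in the EXCEL formulas.

```python
# Class frequencies from the example table (classes 1, 2 and 3)
frequencies = [5, 10, 5]

total = sum(frequencies)                       # like =SUM(C1:C3)
relative = [f / total for f in frequencies]    # like =C1/$C$4 copied down the column

check = sum(relative)                          # like =SUM(D1:D3); should equal 1.0

print(total, relative, check)
```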
Example 5: Leaving EXCEL and grading your assignment.
When you have completed an assignment and have recorded
numerical answers to each
of the questions in the INTRODUCTORY STATISTICS
LABORATORY, you should try your
answers in ISLeX.
In submitting your answers to the Introductory Statistics
Laboratory Program (ISLeX),
you are required to use numbers for all answers. Place the
cursor in the appropriate box and type
in your answer. Use the mouse or the tab key to move to the
next box. If you press enter, it will
go right to grading. (You have the option to go back again, so
DO NOT accept unless you are
completely finished.) Click on the “Check my answers” box to
grade your assignment.
At the end of the assignment, your grade will be displayed on
the screen and you will be
given the option of accepting the grade or repeating the
assignment. Once you accept your grade,
you will not be able to repeat the assignment. You are
encouraged to repeat the assignment until
you are satisfied with your effort. You must achieve 80 or
higher to move on to the next
assignment.
Example 6: How to prepare a stem-and-leaf diagram
A stem-and-leaf diagram combines graphical and numerical
methods to summarize data.
Unfortunately, EXCEL does not have a command for preparing
a stem-and-leaf diagram.
Suppose you wish to develop a stem-and-leaf diagram of the
following data.
25.6 26.0 25.3 27.2 23.6 26.3 25.4 23.8 21.1 23.4 23.9
23.8 26.0 20.0 22.5 28.0 26.7
24.8 25.1 24.9 26.6 24.9 25.0 27.5 20.6 24.0 22.1 20.0
21.8 24.7 21.7 25.2 27.1 24.8
25.8 26.9 25.6 24.4
Enter (or read) the data into a column in EXCEL and then sort
the data from lowest to
highest using the Data->Sort command. [Excel 2013: Data Tab –
Sort] The results follow.
20.0
20.0
20.6
21.1
21.7
21.8
22.1
22.5
23.4
23.6
23.8
23.8
23.9
24.0
24.4
24.7
24.8
24.8
24.9
24.9
25.0
25.1
25.2
25.3
25.4
25.6
25.6
25.8
26.0
26.0
26.3
26.6
26.7
26.9
27.1
27.2
27.5
28.0
If you decide to have leaf units of
0.1, the successive stem units will
be 10 × 0.1 = 1.0 higher than the
previous one. Start by writing the
stem units in a column followed by
a vertical bar.
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
Then, go down the data and write the last digit of each number in the leaf position.
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
And, finally, add a title and leaf
unit to complete the job.
Stem-and-leaf diagram of example
data.
Leaf unit = 0.1
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
The stem-and-leaf diagram consists of two columns of
numbers. The first column is
called the stem. The second column contains the leaves; one
leaf for each data point. The value
of any number in a leaf position is indicated by the leaf unit,
0.1 in this example. Any number in
a leaf position represents that number multiplied by the leaf unit
0.1. In the first row of the
diagram, the 0 stands for 0 × 0.1 = 0.0, and the 6 stands for 6 ×
0.1 = 0.6.
The value of each number in the stem position is 10 × leaf
unit, i.e. 1 in this case. In the
last row, the 28 stands for 28 × 1 = 28. The final value of any leaf is
calculated by adding the leaf value
to the corresponding stem value. The 0 in the last row
represents the number 0 × 0.1 + 28 × 1 =
28.0. The third leaf in stem position 21 represents 8 × 0.1 + 21
× 1 = 21.8.
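Because EXCEL has no stem-and-leaf command, the construction can also be sketched in a few lines of Python. The function name `stem_and_leaf` is hypothetical; the data are the sorted values listed in this example, and the leaf unit of 0.1 follows the text.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=0.1):
    """Return the rows of a stem-and-leaf diagram as strings."""
    rows = defaultdict(list)
    for value in sorted(data):
        scaled = round(value / leaf_unit)   # 21.8 becomes 218 when the leaf unit is 0.1
        stem, leaf = divmod(scaled, 10)     # stem 21, leaf 8
        rows[stem].append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first, last + 1):     # keep stems that have no leaves
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

data = [20.0, 20.0, 20.6, 21.1, 21.7, 21.8, 22.1, 22.5, 23.4, 23.6,
        23.8, 23.8, 23.9, 24.0, 24.4, 24.7, 24.8, 24.8, 24.9, 24.9,
        25.0, 25.1, 25.2, 25.3, 25.4, 25.6, 25.6, 25.8, 26.0, 26.0,
        26.3, 26.6, 26.7, 26.9, 27.1, 27.2, 27.5, 28.0]

lines = stem_and_leaf(data)
print("Leaf unit = 0.1")
for line in lines:
    print(line)
```

Rounding before splitting into stem and leaf avoids surprises from floating-point division such as 25.3 / 0.1 not being exactly 253.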
Example 7: How to draw a frequency (or relative frequency)
polygon.
In this example, midpoints for Samples 1 and 2 are stored in
column A, and relative
frequencies from Sample 1 are stored in column B and relative
frequencies from Sample 2 are
stored in column C of an EXCEL worksheet. In order to
compare the two samples, it will be
useful to plot relative frequencies for both samples on the same
graph.
Here are columns A, B, and C of an example worksheet.
20 0.0357 0.0000
21 0.1429 0.0270
22 0.2143 0.1081
23 0.1786 0.1081
24 0.2500 0.1622
25 0.1071 0.2162
26 0.0714 0.1892
27 0.0000 0.1081
28 0.0000 0.0811
[Excel 2013: highlight the data. Insert Tab – Charts and Choose
SCATTER, then click 2D
‘Straight Line with Markers’]. The resulting graph will look
like:
However, you will want to edit the graph. Click the + sign to edit the
chart. Choose Axes and
move the cursor over until the little right arrow appears, then
choose More Options and then
click on the histogram picture.
You can now edit the Axis. Change
the minimum Bounds to 19 and the
maximum Bounds to 29. Then
change the Major Units to 1.0.
The resulting graph will now have better representation.
Remember to label your chart title and
axes appropriately (not shown in chart below).
Example 8: How to use EXCEL to calculate various numbers
that summarize the characteristics
of a population (or sample).
In this example, the Function command is used to calculate
various constant values to be
stored in cells in the worksheet. [Excel 2013: Formulas Tab –
Insert Function (fx)]. There are
many different functions that can be used. Some refer to whole
columns, some to individual
observations. The following examples demonstrate a few of the
uses of functions in EXCEL.
You can type the function into any particular cell by first
typing an equal sign in the
formula bar and then typing the name of the function along with
its required arguments. As an
alternative, you can use the [Excel 2013: Formulas Tab – Insert
Function (fx)] to choose a
function and have EXCEL prompt you for necessary arguments.
In this course, you would
probably choose Function category = Statistical and then double
click on the Function name
for the function you want to use.
For this example, consider that there are 22 observations
stored in column A.
a) Determine the number of data points in the population.
=COUNT(A:A)
b) Calculate the mean (= sum of all observations divided by
number of observations)
=SUM(A1:A22)/COUNT(A1:A22)
=AVERAGE(A1:A22)
c) Determine the minimum in this population (the first value in
a magnitude array). If the data
have been sorted from smallest to largest, the smallest
(minimum) value will be in the first
position, cell A1, and the largest will be located in the last
position, cell A22 in this example.
=MIN(A1:A22)
d) Determine the maximum in this population (the last value in
a magnitude array).
=MAX(A1:A22)
e) Determine the median (the middle value in a magnitude
array).
For an odd number of data points, the median is the middle
value. For an even number of data
points, the median is the average of the values of the
two middle terms.
=MEDIAN(A1:A22)
f) Determine the first quartile.
The first quartile is that value below which one-quarter of the
observations lie. Because there is
no generally accepted definition of quartile, different programs
give different results for
quartiles. ISLeX is programmed to calculate quartiles in the
same way as Excel.
=QUARTILE(A1:A22,1)
g) Determine the third quartile.
The third quartile is that value below which three-quarters of
the observations lie.
=QUARTILE(A1:A22,3)
NOTE: The median is sometimes referred to as the second
quartile (Q2) because it is the
value below which 2/4 of the values lie. The first quartile (Q1),
the median (Q2) and the third
quartile (Q3) divide the data values into four groups. We know
that 1/4 of the data values are less
than Q1, 1/4 are between Q1 and Q2, 1/4 are between Q2 and
Q3, and 1/4 are greater than Q3.
For some purposes, it may be sufficient to summarize a large
data set by presenting these three
values.
h) Determine the standard deviation.
The standard deviation is the square root of the variance, and
the variance is the average of the
squares of differences between individual data points and the
overall mean (the sum of squares is divided by n for a population
and by n - 1 for a sample). Remember that the
standard deviation of a population is therefore calculated differently than
the standard deviation of a sample.
It is important to know if you have a sample or a population.
=STDEV.S(A1:A22) for a sample
=STDEV.P(A1:A22) for a population
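For comparison, the same summary values can be sketched with Python's standard `statistics` module. The eight observations are hypothetical (they are not the worksheet data of this example). The `method="inclusive"` option of `statistics.quantiles` interpolates between the minimum and maximum in the same way as EXCEL's QUARTILE function; treat that correspondence as an assumption to verify on your own data.

```python
import statistics

# Hypothetical observations (an even count, so the median is an average)
data = [12, 15, 17, 20, 22, 25, 28, 31]

count = len(data)                          # =COUNT(range)
average = sum(data) / count                # =SUM(range)/COUNT(range)
minimum, maximum = min(data), max(data)    # =MIN(range), =MAX(range)
median = statistics.median(data)           # =MEDIAN(range)

# Quartiles by linear interpolation between order statistics
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

sd_sample = statistics.stdev(data)         # divides by n - 1, like =STDEV.S(range)
sd_population = statistics.pstdev(data)    # divides by n, like =STDEV.P(range)

print(count, average, minimum, maximum, median, q1, q3,
      round(sd_sample, 4), round(sd_population, 4))
```

The sample standard deviation is always at least as large as the population value computed from the same numbers, because its divisor is smaller.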
23
20   22         Uses =COUNT(A1:A22) to count number of observations
29   22.77273   Uses =SUM(A1:A22)/COUNT(A1:A22) to calculate average
29   16         Uses =MIN(A1:A22) to calculate the minimum value
27   30         Uses =MAX(A1:A22) to calculate maximum value
23   23         Uses =MEDIAN(A1:A22) to calculate median value
17   19         Uses =QUARTILE(A1:A22,1) to calculate first quartile
17   27.75      Uses =QUARTILE(A1:A22,3) to calculate third quartile
22   4.669372   Uses =STDEV.S(A1:A22) to calculate standard deviation for a sample
23
25
21
21
18
16
21
24
19
27
19
25
24
Example 9: How to use the DESCRIPTIVE STATISTICS
command of EXCEL
The Descriptive statistics command of EXCEL will
automatically calculate most of the
summary statistics required of data in a single column [Excel
2013 Data Tab – Data Analysis and
then choose Descriptive Statistics]. By listing several columns,
the
Descriptive statistics command can be applied to several
columns simultaneously.
Consider that data has been stored in column A. To calculate
summary statistics for this
column, follow these steps.
Excel 2013: Data Tab and choose Data Analysis (on right)
Double click on Descriptive statistics in the Data Analysis
dialog box
Set Input range to = A:A (or just highlight the data with the
cursor)
Click on Summary statistics
Click on OK
Your results will be on a new worksheet and will look like this
(move column borders to
see full text).
Column1
Mean 23.90909
Standard Error 1.038041
Median 23.5
Mode #NUM!
Standard Deviation 4.868843
Sample Variance 23.70563
Kurtosis -1.32235
Skewness -0.11628
Range 14
Minimum 16
Maximum 30
Sum 526
Count 22
This approach gives many of the summary statistics described
in the preceding example
as well as several others. The #NUM! Message means only that
there are several possible values
for the mode in this data set.
Example 10: Further uses of EXCEL->As a calculator
EXCEL can also be used as a calculator.
The following statements would allow you to calculate 5.6-3.2
= 2.4 and store it in a cell
in the EXCEL worksheet. It is important to start your equation
with an “=”; otherwise the
calculator function is not enabled.
=5.6-3.2
If 5.6 was stored in cell D3 and 3.2 was stored in cell D4, you
could also use
=D3-D4
The second option may be useful if 5.6 and 3.2 may be used in
other calculations.
This same scheme may be used for all elementary mathematical
operations.
Use - to indicate subtraction [ = 5.6 - 3.2]
Use + to indicate addition [ = 5.6 + 3.2]
Use * to indicate multiplication [ = 5.6 * 3.2]
Use / to indicate division [ = 5.6 / 3.2]
Use POWER to indicate exponentiation [ = POWER(5.6, 3.2)]
Example 11: Calculations with a discrete probability
distribution
In this example, EXCEL is used to answer various questions
dealing with a discrete
probability distribution. EXCEL worksheet column A contains
the event names and column B
contains the corresponding probabilities. In PL SC 314, we will
discuss only events that
represent counts; e.g. number of seeds germinated, number of
red blood cells, number of live
plantlets, number of microbial colonies, et cetera.
0 0.018316
1 0.073263
2 0.146525
3 0.195367
4 0.195367
5 0.156293
6 0.104196
7 0.059540
8 0.029770
9 0.013231
10 0.005292
11 0.001925
12 0.000642
13 0.000197
14 0.000056
15 0.000015
16 0.000004
17 0.000001
18 0.000000
19 0.000000
20 0.000000
Suppose one were interested in the probability of exactly 10 in
this distribution. This can
be read directly from column B in the row position
corresponding to A = 10. Thus, P(X = 10) =
0.005292.
A powerful way of calculating the probabilities of compound
events is to sum parts of the
probability table.
Suppose you want the probability of less than 13. You must add
the probabilities for 0, 1,
..., 12. Those probabilities are in cells B1:B13. To calculate the
probability, you could move to
cell C1 and enter the formula = SUM(B1:B13). In this example,
the probability of less than 13 is
0.99973 or 99.973 percent.
Note that terms such as 'less than 13' and 'fewer than 13'
include all possible values from
the smallest up to, but excluding, 13.
Similarly, 'more than 13' or 'greater than 13' would not include
13. Moreover, the term
'between 5 and 10' would include 6, 7, 8 and 9, and would
exclude 5 and 10.
However, ‘no more than 13’ would include 13. ‘At least 13’
would include 13 and all
higher values.
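The wording conventions above can be summarized as a small Python sketch that lists which counts each phrase covers. The function names, and the constants LOW and HIGH (the smallest and largest counts in the table for this example), are assumptions introduced for illustration.

```python
# Smallest and largest counts in the example table (assumed from the table).
LOW, HIGH = 0, 20

def less_than(k):
    """'less than k' or 'fewer than k': excludes k."""
    return list(range(LOW, k))

def more_than(k):
    """'more than k' or 'greater than k': excludes k."""
    return list(range(k + 1, HIGH + 1))

def between(a, b):
    """'between a and b': excludes both a and b."""
    return list(range(a + 1, b))

def no_more_than(k):
    """'no more than k': includes k."""
    return list(range(LOW, k + 1))

def at_least(k):
    """'at least k': includes k and all higher values."""
    return list(range(k, HIGH + 1))

print(between(5, 10))        # [6, 7, 8, 9]
print(less_than(13)[-1])     # 12
print(no_more_than(13)[-1])  # 13
print(at_least(13)[0])       # 13
```

Each list tells you which rows of column B to include in the SUM formula for that phrase.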
The following three examples show other questions that can be
dealt with in this general
manner.
a) P[10 < X < 21] = ?
= P(11) + P(12) + P(13) + P(14) + … + P(20). P(11) is listed in
row 12 of column
B while P(20) is listed in row 21 of column B.
= SUM(B12:B21) = 0.0028398
b) P[(X < 6) or (X > 14)] = ?
In this example, calculate P(0) + P(1) + … + P(5) + P(15) +
P(16) + … + P(20)
= SUM(B1:B6)+SUM(B16:B21) = 0.78515
c) P[X > 0] = ?
= SUM(B2:B21) = 0.98168
or = 1 - B1 = 0.98168
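The three calculations above can be reproduced with the following Python sketch. It is an illustration under stated assumptions: the helper excel_sum is hypothetical and only mimics =SUM over a range of rows in column B, and the probabilities are the rounded table values, so results may differ from the manual's figures in the last digit.

```python
# Column B of the worksheet: probabilities for X = 0, 1, ..., 20.
prob = [
    0.018316, 0.073263, 0.146525, 0.195367, 0.195367, 0.156293, 0.104196,
    0.059540, 0.029770, 0.013231, 0.005292, 0.001925, 0.000642, 0.000197,
    0.000056, 0.000015, 0.000004, 0.000001, 0.000000, 0.000000, 0.000000,
]

def excel_sum(first_row, last_row):
    """Mimic =SUM(B<first_row>:B<last_row>) using 1-based worksheet rows."""
    return sum(prob[first_row - 1:last_row])

p_a = excel_sum(12, 21)                    # a) P[10 < X < 21]
p_b = excel_sum(1, 6) + excel_sum(16, 21)  # b) P[(X < 6) or (X > 14)]
p_c = excel_sum(2, 21)                     # c) P[X > 0]
p_c_alt = 1 - prob[0]                      # c) complement rule: = 1 - B1

print(f"a) {p_a:.6f}")
print(f"b) {p_b:.5f}")
print(f"c) {p_c:.5f} (sum) and {p_c_alt:.5f} (complement)")
```

The two answers for c) agree because the probabilities in the table add up to 1 to within rounding.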
In order to calculate the mean of a probability distribution, one
must use the methods for
calculating the mean of a relative frequency distribution. The
Introductory Statistics Laboratory for Excel .docx

Introductory Statistics Laboratory for Excel

Lab Manual Author: R. J. (Bob) Baker
December 2003
Revised by: Krista Wilde (2016)

Table of Contents

Assignment #0
Assignment #1
Assignment #2
Assignment #3
Assignment #4
Assignment #5
Assignment #6
Assignment #7
Assignment #8
Assignment #9
INTRODUCTION
Example 1: Reading data from a data file into the EXCEL worksheet.
Example 2: Preparing a histogram of data
Example 3: Entering data from the keyboard into the EXCEL worksheet
Example 4: Calculating relative frequencies
Example 5: Leaving EXCEL and grading your assignment
Example 6: How to prepare a stem-and-leaf diagram
Example 7: How to draw a frequency (or relative frequency) polygon
Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample)
Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
Example 10: Further uses of EXCEL->As a calculator
Example 11: Calculations with a discrete probability distribution
Example 12: Reading and storing constants for further use
Example 13: Using EXCEL to answer questions about continuous distributions
Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test
Example 15: How to calculate a confidence interval for one mean when σ is known
Example 16: How to calculate a confidence interval for one mean when σ is NOT known
Example 17: How to calculate a confidence interval for a binomial proportion
Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known
Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variance is unknown and equal
Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal
Example 21: Large sample confidence intervals and tests of hypothesis for differences between two binomial proportions.
Example 22: How to carry out a one-way analysis of variance.
Example 23: .
Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference).
Example 25: How to perform a two-way analysis of variance.
Example 26: How to calculate a randomized complete block analysis of variance
Example 27: How to prepare a scatterplot of two variables.
Example 28: How to calculate a correlation coefficient.
Example 29: How to perform a regression analysis using EXCEL
Introductory Statistics Laboratory
Assignment #0

Purpose

This assignment is designed for use in the instructed introduction for students using the Introductory Statistics Laboratory for Excel (ISLeX) program.

NOTES

Log in to ISLeX and get the data for Assignment 0. Then start Microsoft Excel and determine the answers to the questions in this assignment. When finished, exit from EXCEL, return to ISLeX and submit your answers.

In this assignment, all students use the same data set. In the remaining assignments, each student will have a unique data set.

See the examples indicated by {Example } to learn how to use EXCEL to perform a particular task. A reference to an example is given at the end of each major task. A symbol marks the beginning of each new task.

Question A

Data called LAB0A.DAT in Table A represent measured yields (q/ha, where 1 q = 1 quintal = 100 kg) of a sample of wheat varieties tested at Saskatoon.

Read the data into the EXCEL worksheet. {Example 1}

Prepare a histogram of the data using 20 as the first midpoint (20.5 as its upper bin) and 1 as the interval width (bin size).

Record the frequencies from the histogram in the following table; add the relative frequencies later.

    Bin     Midpoint   Frequency   Relative frequency
    20.5    20
    21.5    21
    22.5    22
    23.5    23
    24.5    24
    25.5    25
    26.5    26
    27.5    27
    28.5    28

Record your answers to the following questions.

1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class? (If tied, give the lowest midpoint.)
3. How many observations were there in the class with midpoint equal to 22?

{Example 2}

Enter the midpoints and frequencies into two columns of the EXCEL worksheet. Verify that you have entered the correct data. Calculate and store relative frequencies in a new column. Record the relative frequencies in the above table. {Examples 3 and 4}

Question B

Data in Table B represent measured yields (q/ha) of a sample of wheat varieties evaluated at Tisdale.

Read the data into the EXCEL worksheet and calculate the mean value.

4. How many observations were there in this data set?
5. What was the mean yield of this sample of wheat varieties?

{Example 1, and Example 8 a and b}

Having recorded numerical answers to each of the five questions, you should now leave EXCEL and submit your answers for grading by the ISLeX program. {Example 5}

- END OF ASSIGNMENT 0 -
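The histogram and relative-frequency steps in Question A can also be sketched outside EXCEL. The following Python sketch is illustrative only: the yields are hypothetical (the real values come from LAB0A.DAT), the function name is an assumption, and a value that falls exactly on a bin boundary is counted in the lower bin, which is how the manual describes EXCEL's behaviour.

```python
def bin_frequencies(values, upper_bounds):
    """Count values per bin; a value equal to a boundary goes in the lower bin.

    Values above the last boundary are not counted.
    """
    counts = [0] * len(upper_bounds)
    for value in values:
        for i, upper in enumerate(upper_bounds):
            if value <= upper:
                counts[i] += 1
                break
    return counts


# Hypothetical yields (q/ha), used only to illustrate the arithmetic.
yields = [20.1, 21.5, 21.7, 22.0, 22.5, 22.6, 23.4, 24.9, 25.0, 26.2]

upper_bounds = [20.5 + i for i in range(9)]          # 20.5, 21.5, ..., 28.5
midpoints = [upper - 0.5 for upper in upper_bounds]  # 20, 21, ..., 28

frequencies = bin_frequencies(yields, upper_bounds)
total = sum(frequencies)
relative = [f / total for f in frequencies]

# One row per bin: upper boundary, midpoint, frequency, relative frequency.
for upper, mid, f, r in zip(upper_bounds, midpoints, frequencies, relative):
    print(f"{upper:5.1f} {mid:5.1f} {f:3d} {r:6.3f}")
```

The relative frequencies should sum to 1.0 (within rounding), which is the same check the assignment asks for in the worksheet.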
Introductory Statistics Laboratory
Assignment #1

Purpose

This lab is an introduction to tabular and graphical methods of descriptive statistics.

NOTE

As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you are then required to enter the answers into the ISLeX program.

Question A

Data in Table A represent measured yields (q/ha, where 1 q = 1 quintal = 100 kg) of a sample of wheat varieties tested at Saskatoon.

Read the data into an EXCEL worksheet. {Example 1}

Prepare a histogram of the data using 20 as the first midpoint (20.5 as the starting bin) and 1 as the interval width (bin size). Note that the lower endpoint of any interval is the midpoint minus one-half the interval width, while the upper endpoint is the midpoint plus one-half the interval width. Record the frequencies in the following table; add relative frequencies later. Excel places data points that are on a bin boundary in the lower bin.

    Bin     Midpoint   Frequency   Relative frequency
    20.5    20
    21.5    21
    22.5    22
    23.5    23
    24.5    24
    25.5    25
    26.5    26
    27.5    27
    28.5    28

Record your answers to the following questions.

1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class? (If tied, give the lowest midpoint.)
3. How many observations were greater than 21.5 and less than or equal to 22.5 q/ha?

{Example 2}

Enter the midpoints and frequencies into the EXCEL worksheet. {Example 3}

Calculate the relative frequencies and store them in a new column (these will be used in question C). Check the data you have entered and verify that the relative frequencies sum to 1.0 (within 0.001). Record the relative frequencies in the preceding table.

4. What is the relative frequency of yields in sample A that were greater than 21.5 and less than or equal to 22.5 q/ha?

{Example 4}

Prepare a stem-and-leaf diagram of the data from sample A. Use an increment of 1.0 between consecutive stem positions (leaf unit = 0.1). Use the stem-and-leaf diagram to answer the following questions.

5. What is the value (in q/ha) of the leaf unit in this stem-and-leaf diagram?
6. What is the yield (in q/ha) for the item represented by the last leaf position in the fifth (from the top) stem position?

{Example 6}

Question B

Data in Table B represent measured yields (q/ha) of a sample of wheat varieties evaluated at Tisdale.

Read the data into the EXCEL worksheet. {Example 1}

Prepare a histogram of the data using 24 as the first midpoint (24.5 as the first bin) and 1 as the interval width.

Record the frequencies in the following table; add relative frequencies later.

    Bin     Midpoint   Frequency   Relative frequency
    24.5    24
    25.5    25
    26.5    26
    27.5    27
    28.5    28
    29.5    29
    30.5    30
    31.5    31
    32.5    32
    33.5    33
    34.5    34
    35.5    35
    36.5    36

Record your answers to the following questions.

7. How many observations were there in this sample?
8. What is the midpoint of the most frequent class? (If tied, give the lowest midpoint.)
9. How many observations fell between 31.5 and 32.5 q/ha?

{Example 2}

Enter the midpoints and frequencies into the EXCEL worksheet. Calculate the relative frequencies in each class. Check that the correct information has been entered, that the frequencies sum to the total number of observations, and that the relative frequencies sum to 1.0. Record the relative frequencies in the preceding table. Answer the following question.

10. What is the relative frequency of yields in sample B that were greater than 31.5 and less than or equal to 32.5 q/ha?

{Example 4}

Question C
Compare the distributions of yields of wheat varieties in sample A (Saskatoon) with those from sample B (Tisdale).

Prepare a relative frequency polygon of the data from both samples. Include appropriate titles and axis labels. Use different line types for each sample. Answer the following questions from the relative frequency polygon.

11. Which of the two samples, Saskatoon (1) or Tisdale (2), has the highest relative frequency in the class whose midpoint is 26 q/ha? (Answer 1 or 2; 0 if same)
12. Which of the two samples, Saskatoon (1) or Tisdale (2), has the greatest spread looking at the midpoints (i.e. the greatest difference between maximum and minimum midpoint values)? (Answer 1 or 2; 0 if same)

{Example 7}

Having recorded numerical answers to each of the twelve questions, you should now leave EXCEL and submit your answers for grading by the ISLeX program. {Example 5}

- END OF ASSIGNMENT 1 -

Introductory Statistics Laboratory
Assignment #2

Purpose

The three main objectives of this assignment are to:
a) use numerical values as descriptive statistics,
b) introduce the concept of sampling from a population, and
c) demonstrate the effects of sample size.

NOTE

As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you are then required to enter the answers into the ISLeX program.

Question A
Data in Table A represent protein concentrations (g/kg) of boxcar lots of durum wheat delivered to Thunder Bay, Ontario. These data are to be treated as a population of data points.

Read the data into a column of the EXCEL worksheet, and name the column. When viewing the data for the first time, you should try to determine approximately the number of items and guess at the average value. Scan the data to try to determine what the smallest and largest values are. {Example 1}

Calculate and record the values of the following population characteristics (i.e. parameters).

1. How many data points are there in this data set?
2. What is the mean protein concentration (g/kg)?
3. What is the minimum protein concentration?
4. What is the maximum protein concentration?
5. What is the median protein concentration?
6. What is the value of the first quartile?
7. What is the value of the third quartile?
8. What is the standard deviation of the population of protein concentrations?

{Example 8}

Question B

The data in Table B constitute 10 random samples, each of size 7, from the population of protein concentrations. The data file contains seven rows of data, with each row containing ten columns.

Read the data for the ten samples into columns of the EXCEL worksheet. {Example 1}

Calculate the mean, median, standard deviation, minimum, maximum, first quartile and third quartile of each of the ten samples. Record these descriptive statistics in the following table.

    Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
    1        7
    2        7
    3        7
    4        7
    5        7
    6        7
    7        7
    8        7
    9        7
    10       7

{Example 9}

Compare these statistics with the population parameters calculated in question A to answer the following questions.
These questions are designed to get you thinking about how well sample statistics represent the characteristics of the population from which the sample was taken.

9. How many of the ten sample means are less than or equal to the population mean?
10. How many of the ten sample medians are exactly equal to the population median?
11. How many of the ten sample minimums are less than or equal to the population minimum?
12. How many of the ten sample maximums are greater than or equal to the population maximum?
13. How many of the sample first quartiles are less than or equal to the population first quartile?
14. How many of the sample third quartiles are greater than or equal to the population third quartile?
15. Which sample has the largest standard deviation?
16. Which sample has the largest range (= Maximum - Minimum)?
17. What is the ratio of the largest sample standard deviation to the smallest sample standard deviation?
18. What is the ratio of the largest sample mean to the smallest sample mean?
19. Of the two ratios (Questions 17 and 18), which is the larger: the ratio of standard deviations (17) or the ratio of means (18)? {Answer 17 or 18}

{Example 10}

Question C

The data in Table C constitute 10 random samples, each of size 27, from the population of protein concentrations.

Read the data for the ten samples into columns of the EXCEL worksheet. {Example 1}
Calculate the mean, median, standard deviation, minimum, maximum, first quartile and third quartile of each of the ten samples. Record the descriptive statistics in the following table.

    Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
    1        27
    2        27
    3        27
    4        27
    5        27
    6        27
    7        27
    8        27
    9        27
    10       27

{Example 9}

Compare these statistics with the results of questions A and B to answer the following questions. The following questions are designed to get you thinking about how the size of the sample affects the relationship between sample statistics and population parameters.

20. How many of the ten sample minimums were exactly equal to the population minimum?
21. How many of the ten sample maximums were exactly equal to the population maximum?
22. For samples of size 27, what is the ratio of the largest sample mean to the smallest sample mean?
23. For samples of size 27, what is the ratio of the largest sample standard deviation to the smallest sample standard deviation?

For the following questions, answer 0 if the statement is false or 1 if it is true.

24. The ratio of the largest sample mean to the smallest sample mean was less in samples of 27 than in samples of 7.
25. The ratio of the largest to the smallest sample standard deviations was greater in the larger samples.

{Example 10}

- Please use ISLeX to record and grade your answers -

- END OF ASSIGNMENT 2 -
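As a cross-check on the worksheet results, the descriptive statistics requested in Questions B and C can be sketched with Python's standard `statistics` module. The two samples below are hypothetical, the `describe` helper is an illustrative name, and the `"inclusive"` quartile method is an assumption chosen because it interpolates in the same way as EXCEL's QUARTILE.INC; other quartile conventions give slightly different values.

```python
import statistics


def describe(sample):
    """Return the summary statistics asked for in the assignment table."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {
        "size": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "sd": statistics.stdev(sample),  # sample SD (divides by n - 1)
        "min": min(sample),
        "max": max(sample),
        "q1": q1,
        "q3": q3,
    }


# Two hypothetical samples of size 7 (protein concentration, g/kg).
samples = {
    1: [128, 131, 135, 137, 140, 142, 147],
    2: [130, 133, 134, 136, 138, 139, 141],
}
summary = {label: describe(values) for label, values in samples.items()}

# Ratios of largest to smallest, as in Questions 17 and 18.
sds = [s["sd"] for s in summary.values()]
means = [s["mean"] for s in summary.values()]
sd_ratio = max(sds) / min(sds)
mean_ratio = max(means) / min(means)

for label, stats in summary.items():
    print(label, stats)
print("SD ratio:", round(sd_ratio, 3), "mean ratio:", round(mean_ratio, 3))
```

For a full population (Question A), `statistics.pstdev` divides by n rather than n - 1 and would be the matching choice.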
Introductory Statistics Laboratory
Assignment #3

Purpose

This assignment is an introduction to questions concerning discrete probability distributions.

NOTE

As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you will then be required to enter the answers into the ISLeX program.

Question A

A binomial experiment consists of repeated trials, each with two possible outcomes. The outcome of any trial is independent of all other trials. The binomial distribution gives the probability that a number X of n independent trials will have one type of outcome. X can be any number from 0 up to the total number of trials. The data in Table A give the probabilities of observing that X = 0, 1, ..., 20 out of 20 flower seeds from a given lot will germinate.

Read the data into two columns of the EXCEL worksheet and attach appropriate names to those two columns. Then, record the probabilities in the following table. {Example 1}

    Number germinated (out of 20)   Probability
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20

Use the table to answer the following questions.

1. What is the probability that all 20 seeds in a random sample of 20 seeds will germinate?
2. What is the probability that fewer than 15 seeds in a random sample of 20 seeds will germinate?
3. What is the probability that at least 17 seeds in a random sample of 20 will germinate?
4. What is the probability that the number of seeds in a random sample of 20 that will germinate is between 10 and 15? HINT: Do not include 10 and 15.
5. What is the probability that the number of seeds in a random sample of 20 that will germinate will be less than 10 or greater than 17? HINT: You will have to add the probabilities for 0, 1, ..., 9 and 18, 19, 20.
6. What is the mean of this binomial distribution? HINT: The mean of a discrete variable can be calculated by summing the products of each value multiplied by its corresponding probability.
7. What is the variance of this binomial distribution? HINT: The variance of a probability distribution is the mean of the squares of the values minus the square of the mean of the values.

{Example 11}
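The hints for questions 2 to 7 amount to sums over the probability table. The sketch below illustrates them in Python for a hypothetical binomial distribution with n = 20 and germination probability p = 0.8; the value of p is an assumption made for illustration, since the actual probabilities are supplied in Table A.

```python
from math import comb

n, p = 20, 0.8  # p is hypothetical; Table A supplies the real probabilities

# Probability table: probs[k] = P(X = k) for k = 0, 1, ..., 20.
probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p_all_20 = probs[20]                              # question 1
p_fewer_than_15 = sum(probs[:15])                 # question 2: X = 0..14
p_at_least_17 = sum(probs[17:])                   # question 3: X = 17..20
p_between_10_and_15 = sum(probs[11:15])           # question 4: X = 11..14
p_tails = sum(probs[:10]) + sum(probs[18:])       # question 5: X < 10 or X > 17

# Mean: sum of value * probability.
mean = sum(k * pk for k, pk in enumerate(probs))
# Variance: mean of squares minus square of the mean.
mean_of_squares = sum(k * k * pk for k, pk in enumerate(probs))
variance = mean_of_squares - mean**2

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For a binomial distribution the results can be checked against the closed forms np and np(1 - p).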
Question B

This question is based on a Poisson discrete probability distribution. The distribution is important in biology and medicine, and can be dealt with in the same way as any other discrete distribution.

Red blood cell deficiency may be determined by examining a specimen of blood under the microscope. The data in Table B give a hypothetical distribution of numbers of red blood cells in a certain small fixed volume of blood from normal patients. Theoretically, there is no upper limit to the value of a Poisson variable. In reality, you can force only so many red blood cells into a given volume.

Read the data into the EXCEL worksheet, name the columns, and view the table. Since the table is quite large, you should attempt to answer the following questions without actually recording the table. {Example 1}

Use the table to answer the following questions.

8. What is the probability that a blood sample from this distribution will have exactly 20 red blood cells?
9. What is the probability that a blood sample from a normal person will have between 19 and 26 red blood cells? HINT: See questions 3 and 4.
10. What is the probability that a blood sample from a normal person would have fewer than 10 red blood cells?
11. What is the probability that a blood sample from a normal person will have at least 15 red blood cells? HINT: Since there is no theoretical upper limit to the Poisson distribution, the correct way to answer this question is to calculate 1 - (probability of fewer than 15 red blood cells).
12. A person with a red blood cell count in the lower 2.5 percent of the distribution might be considered deficient. What is the red blood cell count below which 2.5 percent of the distribution lies? HINT: You need to determine a value X such that the probabilities for all counts up to and including X sum to at least 0.025, while the probabilities for all counts up to but excluding X sum to less than 0.025. You can proceed in the following way. Look at the table to guess how many probabilities (P[X = 0] + P[X = 1] + ...) should be added to give a sum of approximately 0.025. Calculate the sum of probabilities for your guess of X. Continue guessing X until you get a sum ≥ 0.025 while the sum for X - 1 is < 0.025.
13. What is the mean red blood cell count in this distribution?
14. What is the variance of red blood cell count in this distribution? HINT: See question 7, and remember it is a Poisson distribution.
15. Is the following statement true (1) or false (0) for this distribution? In a Poisson distribution, the variance is equal to the mean (within rounding error). Record 1 if true, 0 if false.

{Example 11}

Please enter your answers into the ISLeX program.

- END OF ASSIGNMENT 3 -
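The search described in the hint for question 12 is a running sum over the probability table. The following Python sketch shows that search, together with the complement rule from question 11, for a hypothetical Poisson distribution with mean 20. The mean, the truncation of the table at a count of 100, and the function names are all assumptions for illustration; the real probabilities come from Table B.

```python
import math


def poisson_table(mean, max_count):
    """Return [P(X = 0), ..., P(X = max_count)] for a Poisson distribution."""
    probs = [math.exp(-mean)]
    for k in range(1, max_count + 1):
        probs.append(probs[-1] * mean / k)  # P(k) = P(k - 1) * mean / k
    return probs


def lower_cutoff(probs, tail=0.025):
    """Smallest X whose cumulative probability P(X <= x) reaches `tail`."""
    cumulative = 0.0
    for x, p in enumerate(probs):
        cumulative += p
        if cumulative >= tail:
            return x
    raise ValueError("tail probability never reached; extend the table")


probs = poisson_table(mean=20.0, max_count=100)  # hypothetical mean

cutoff = lower_cutoff(probs)            # question 12
p_at_least_15 = 1.0 - sum(probs[:15])   # question 11: 1 - P(X < 15)

# Mean and variance from the table, as in questions 13 and 14.
table_mean = sum(k * p for k, p in enumerate(probs))
table_variance = sum(k * k * p for k, p in enumerate(probs)) - table_mean**2

print("cutoff:", cutoff)
print(f"mean = {table_mean:.4f}, variance = {table_variance:.4f}")
```

Truncating the table at 100 discards only a negligible upper tail for this mean, so the mean and variance computed from the table agree to several decimal places.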
Introductory Statistics Laboratory
Assignment #4

Purpose

This lab is an introduction to questions concerning cumulative continuous probability distributions.

NOTES

As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you will then be required to enter the answers into the ISLeX program.

With continuous distributions, P{X = x} = 0. In words, the probability that a continuous variable equals a particular value is considered to be zero. For this reason, all questions concerning continuous distributions must be phrased in terms of intervals. Furthermore, the probability that a continuous variable is less than or equal to (≤) a particular value is the same as the probability that the variable is less than (<) that particular value. The EXCEL NORM.DIST function, with its cumulative argument set to TRUE, gives the probability that a normal variable is less than (or equal to) a specified constant.

The terminology concerning probability varies from one source to another. For this assignment, consider that probability = relative frequency = proportion. Also for this assignment, percentage = 100 * probability.

Question A

Suppose that height (cm) of male university students is normally distributed with the mean given in column 1 of Table A (LAB4A.DAT) and a standard deviation given in column 2 of Table A.

Read the mean and standard deviation of heights from Table A and store them for further use. The data file contains one row with two columns. The first column contains the mean; the second contains the standard deviation.

1. What is the mean height in this population?
2. What is the standard deviation of height in this population?

{Example 12}

Use the stored values, and the EXCEL NORM.DIST function, to calculate answers for the following questions.

3. What proportion of male university students are expected to have a height between 170 and 180 cm?
4. What percentage of male university students would have a height less than 170 cm?
5. If a student is chosen at random from this population, what is the probability that he will be taller than 180 cm?

{Example 13}

Question B

Suppose that the length of telephone calls made by teenagers is a normally distributed variable with mean and standard deviation given in columns 1 and 2 of Table B (LAB4B.DAT).

Read the mean and standard deviation of the distribution of lengths of telephone calls from the first two columns of Table B and store them for further use. {Example 12}

Use the values and the EXCEL NORM.DIST function to calculate answers for the following.

6. What is the mean length of telephone call?
7. What is the standard deviation of this distribution?
8. What is the probability that a random telephone call will last a length of time that is within one standard deviation of the mean (± 1 standard deviation)?
9. What is the proportion of telephone calls that last a length of time that is within two standard deviations of the mean (± 2 standard deviations)?
10. What is the relative frequency of lengths of teenage telephone calls that lie within three standard deviations of the mean (± 3 standard deviations)?
11. What is the probability that a telephone call will be longer than the mean by more than 1.645 standard deviations?

{Example 13}
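Every question in Questions A and B reduces to a difference of cumulative probabilities. A minimal Python sketch of that idea follows, using `statistics.NormalDist` as a stand-in for EXCEL's NORM.DIST; the mean of 12 minutes and standard deviation of 4 minutes are hypothetical, and the helper names are illustrative.

```python
from statistics import NormalDist

# Hypothetical call-length distribution (minutes); Table B supplies real values.
calls = NormalDist(mu=12.0, sigma=4.0)


def prob_between(dist, low, high):
    """P(low < X < high) as a difference of two cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)


def prob_above(dist, x):
    """P(X > x), the complement of the cumulative probability."""
    return 1.0 - dist.cdf(x)


def prob_within_k_sd(dist, k):
    """P(mean - k*sd < X < mean + k*sd)."""
    return prob_between(dist, dist.mean - k * dist.stdev, dist.mean + k * dist.stdev)


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within_k_sd(calls, k):.4f}")

upper_tail = prob_above(calls, calls.mean + 1.645 * calls.stdev)
print(f"more than 1.645 SD above the mean: {upper_tail:.4f}")
print(f"percentage below 8 minutes: {100 * calls.cdf(8.0):.2f}")
```

Because the limits are expressed in standard deviations from the mean, the results for questions 8 to 11 do not depend on which mean and standard deviation are used.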
Question C

In a study conducted by Booth et al. (Int. J. Sports Psychol. 17:269-279, 1986), student nurses at the University of Ottawa completed the Thurston-Richardson attitude questionnaire and voluntarily took the Canadian Home Fitness Test. They found that the frequency response of heart rates after a second exercise bout ranged from 101 to 190 beats per minute and seemed to follow a normal distribution. The mean heart rate was 145 with a standard deviation of 20.

Use the EXCEL NORM.DIST function (with mean = 145 and standard deviation = 20) to calculate the answer to the following question.

12. What is the estimated proportion that had a frequency response of less than 130 after the second exercise session?

{Example 13}

Question D

A standard normal distribution is one for which the mean is zero and the standard deviation is unity (1.0). This distribution is often referred to as the z-distribution.

Use the EXCEL NORM.DIST function to calculate answers to the following questions.

13. What is the probability that a standard normal variable will have a value less than 1.96?
14. What is the probability that a standard normal variable will have a value between -1 and +1?

{Example 13}

Please enter your answers into the ISLeX program.

- END OF ASSIGNMENT 4 -
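Questions C and D are linked by standardization: a probability for any normal variable equals the standard normal probability of its z-score, z = (x - mean) / standard deviation. The sketch below illustrates this in Python with the heart-rate figures from Question C; `statistics.NormalDist` is used here as a stand-in for NORM.DIST, and the EXCEL formulas in the comments are the presumed equivalents.

```python
from statistics import NormalDist

heart_rate = NormalDist(mu=145, sigma=20)  # values given in Question C
standard = NormalDist()                    # mean 0, standard deviation 1

x = 130
z = (x - heart_rate.mean) / heart_rate.stdev  # z-score of 130 beats per minute

# Presumed EXCEL equivalent: =NORM.DIST(130, 145, 20, TRUE)
p_direct = heart_rate.cdf(x)
# Presumed EXCEL equivalent: =NORM.S.DIST(-0.75, TRUE)
p_via_z = standard.cdf(z)

print(f"z = {z}")
print(f"P(X < 130) directly: {p_direct:.4f}")
print(f"P(Z < z) via the z-distribution: {p_via_z:.4f}")
```

The two probabilities agree, which is why a single table (or function) for the z-distribution is enough to answer questions about any normal distribution.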
Introductory Statistics Laboratory
Assignment #5

Purpose

The main objectives of this assignment are to:
a) use a goodness-of-fit test to demonstrate an important statistical theorem, and
b) calculate means and confidence intervals for a single sample when σ is known and when σ is not known.

NOTE

As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program.

Question A

The central limit theorem states that means of sufficiently large samples (conventionally n ≥ 30) from any distribution will have a distribution that
a) is approximately normal,
b) has a mean equal to the mean of the original distribution, and
c) has a standard deviation equal to the standard deviation of the original distribution divided by the square root of the sample size.

The Poisson distribution is discrete and skewed; it is decidedly non-normal! However, the central limit theorem states that the means of sufficiently large (n ≥ 30) samples from even a Poisson distribution will be approximately normally distributed. The means of 100 samples, each of size 40, from a Poisson distribution are recorded in Table A. For this first question, you are required to use the 'goodness-of-fit' test to test the hypothesis that the means in this file are normally distributed with a mean of 10 and a standard deviation of 0.5.

Read the sample means from the Poisson distribution into the EXCEL worksheet. {Example 1}

Calculate the mean and standard deviation of the sample means.

1. What is the mean of the 100 sample means?
2. What is the standard deviation of the 100 sample means?

{Example 9}

Prepare a histogram to determine the number of sample means in each of the classes indicated in the following table. Note that the interval endpoints are midpoint ± 0.5 * width and the interval midpoint is the average of the two endpoints. Use 9.31, 9.61, 9.91, 10.21, 10.51 and 10.81 as the "bin" boundaries for the Excel HISTOGRAM procedure.

    Class interval    Midpoint   Expected frequency   Observed frequency
    < 9.31            -          8.3794
    9.31 - 9.61       9.46       13.3902
    9.61 - 9.91       9.76       21.0881
    9.91 - 10.21      10.06      23.4181
    10.21 - 10.51     10.36      18.3379
    10.51 - 10.81     10.66      10.1248
    > 10.81           -          5.2616

3. What was the observed frequency of sample means that fell between 9.91 and 10.21?

{Example 2}

Enter the seven expected frequencies into one column of the EXCEL worksheet and the seven observed frequencies into another column. Make sure that expected and observed frequencies for the same class are entered in the same row. Check that both columns of data sum to 100 (within rounding error). If they do not, correct your error(s). {Example 3}

A goodness-of-fit test should now be used to see if the observed frequencies in two or more classes of observed values agree sufficiently well with those expected on the basis of some hypothesis. In this example, the hypothesis is that the means of samples will be normal with mean 10 and standard deviation 0.5.
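The expected frequencies in the table follow directly from the hypothesized normal distribution: each is 100 times the probability of the class interval. The Python sketch below reproduces them and then applies the chi-squared calculation that this question goes on to describe. The observed frequencies are hypothetical, and the remark that 0.5 equals √10 / √40 assumes the underlying Poisson distribution has mean (and therefore variance) 10, which the manual implies but does not state.

```python
from math import sqrt
from statistics import NormalDist

hypothesis = NormalDist(mu=10, sigma=0.5)
n_means = 100
boundaries = [9.31, 9.61, 9.91, 10.21, 10.51, 10.81]

# Assumed Poisson mean of 10: SD of means of 40 observations is sqrt(10 / 40).
assumed_sd_of_means = sqrt(10 / 40)

# Cumulative probability at each boundary, padded with 0 and 1 for open classes.
cumulative = [0.0] + [hypothesis.cdf(b) for b in boundaries] + [1.0]
expected = [n_means * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]

# Hypothetical observed frequencies (must also sum to 100).
observed = [9, 12, 22, 24, 17, 11, 5]

# Chi-squared statistic: sum over classes of (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

critical_value = 12.6  # 5% level with 7 - 1 = 6 degrees of freedom
good_fit = chi_squared < critical_value

for e in expected:
    print(f"{e:8.4f}")
print(f"chi-squared = {chi_squared:.4f}, good fit: {good_fit}")
```

A statistic below the critical value means the observed frequencies are consistent with the hypothesized normal distribution at the 5% level.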
The test requires that you calculate a chi-squared statistic by:
a) calculating the differences between the observed and expected frequencies in each class,
b) squaring the differences and dividing by the expected frequencies in each class, and
c) summing the values from step b.

4. What is the value of (O - E)^2 / E for the first class?
5. What is the value of the chi-squared statistic (that is, the sum over all seven classes of (O - E)^2 / E)?

With seven classes, the chi-squared statistic has 7 - 1 = 6 degrees of freedom and the critical value at a 5% significance level is 12.6. If your test statistic is less than 12.6, you should conclude that the observed data show a good fit to the hypothesis.

6. Do the data show a good fit to the normal distribution with mean 10 and standard deviation 0.5 (0 for no, 1 for yes)?
7. Based on your limited experience, is the following statement true (use 1) or false (use 0)? Means of samples of size 40 from a Poisson (discrete) distribution are approximately normal (continuous).

{Example 14}

Question B

The time (in minutes) required for six-year-old children to assemble a certain toy is believed to be normally distributed with a known standard deviation of 3.0. The data in Table B give the assembly times for a random sample of 25 children.

Read the data into the EXCEL worksheet, and compute and report the mean and standard deviation.

8. What was the mean assembly time for this sample of 25 six-year-old children?
9. What was the estimated standard deviation?

{Examples 1 and 9}

When the population standard deviation is known or given, one should use a standard normal distribution to calculate a confidence interval for the population mean. The procedure for calculating a large sample confidence interval for one mean involves three basic steps:
a) determine a critical value from the appropriate distribution (for a 90% confidence interval with known standard deviation the critical value is z_0.05 = 1.645),
b) calculate the margin of error of the estimate, E = z_{α/2} · σ / √n, and
c) calculate lower limit = mean - margin of error, and upper limit = mean + margin of error.

10. What was the margin of error of the estimate for a 90% confidence interval?
11. What was the lower limit of the 90% confidence interval for average assembly time?
12. What was the upper limit of the 90% confidence interval for average assembly time?
13. From this example, would you say that the following statement is true (use 1) or false (use 0)? The lower confidence limit must always be less than the sample mean and the upper confidence limit must always be greater.
14. From this example, would you say that the following statement is true (use 1) or false (use 0)? When one has a choice of a known (or given) standard deviation and an estimated standard deviation, one should ignore the estimated standard deviation in calculating confidence intervals.

{Example 15}
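The three steps for a confidence interval with known σ can be sketched in Python as follows. The standard deviation (3.0), the sample size (25) and the critical value (1.645) come from the question; the sample mean of 12.4 minutes is hypothetical, since the real mean comes from Table B, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, z_critical):
    """Return (margin of error, lower limit, upper limit) when sigma is known."""
    margin = z_critical * sigma / sqrt(n)
    return margin, sample_mean - margin, sample_mean + margin


sigma = 3.0          # known standard deviation (from the question)
n = 25               # sample size (from the question)
z_critical = 1.645   # z_0.05 for a 90% interval (from the question)
sample_mean = 12.4   # hypothetical; compute the real mean from Table B

margin, lower, upper = z_interval(sample_mean, sigma, n, z_critical)

# Step (a) can also be computed rather than looked up: 5% in each tail.
z_computed = NormalDist().inv_cdf(0.95)

print(f"margin of error = {margin:.3f}")
print(f"90% interval: ({lower:.3f}, {upper:.3f})")
print(f"computed critical value = {z_computed:.4f}")
```

Note that the margin of error depends only on the critical value, σ and n, so it is the same whatever the sample mean turns out to be.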
  • 52. Question C The level of monoamine oxidase (MOA) activity (nmol/hr/mg protein) was recorded for fourteen non-responsive depressed patients who had been treated with phenylzine. MOA activity is assumed to follow a normal distribution. The data are stored in a single column of Table C. You are asked to calculate a point estimate and an interval estimate of the mean MOA activity of this type of patient. Nothing is known about the variability of MOA activity. worksheet, and compute and report the mean and standard deviation. 15. What was the point estimate for the mean MOA activity for this sample of 14 depressed patients? 16. What was the standard deviation? {Examples 1 & 9}
  • 53. When data come from a normally distributed population but a small (<30) sample, or from a large (≥30) sample, and in either case σ is not known, one should use a t-distribution to calculate a confidence interval for the population mean. The procedure for calculating a confidence interval for one mean when σ is not known involves three basic steps: a) determine a critical value from the appropriate distribution (for a 90% confidence interval with estimated standard deviation the critical value is tα/2,n-1 = t0.05,13 = 1.771), b) calculate the margin of error of the estimate, E = tα/2,n-1 · s/√n, and c) calculate lower limit = mean – margin of error and upper limit = mean + margin of error. 17. What was the margin of error of the estimate for a 90% confidence interval in this sample of 14 depressed patients? 18. What was the lower limit of the 90% confidence interval for average MOA activity?
  • 54. 19. What was the upper limit of the 90% confidence interval for average MOA activity ? 20. From these examples, would you say that the following statement is true (use 1) or false (use 0)? All confidence intervals are calculated by calculating a point estimate and then subtracting and adding a margin of error of the estimate. {Example 16} - END OF ASSIGNMENT 5 - LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 31
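When σ is not known, the same three steps apply with a t critical value and the sample standard deviation. The sketch below assumes the critical value is looked up separately (for example with EXCEL's T.INV or a t table), because the Python standard library has no inverse t function. The function name `t_interval` and the data are hypothetical placeholders, not the values in Table C.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """Confidence interval for a mean with estimated standard deviation.

    t_crit is supplied by the caller, e.g. 1.771 for a 90% interval
    with 13 degrees of freedom, as quoted in the assignment.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                  # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)    # E = t * s / sqrt(n)
    return xbar, s, margin, xbar - margin, xbar + margin


# Fourteen placeholder observations; NOT the MOA data in Table C.
sample = [float(i) for i in range(1, 15)]
xbar, s, margin, lower, upper = t_interval(sample, t_crit=1.771)
print(round(xbar, 3), round(s, 3), round(margin, 3))
```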
  • 55. Blank page ASSIGNMENT 6 32 INTRODUCTORY STATISTICS LABORATORY Introductory Statistics Laboratory Assignment #6 Purpose The objectives of this assignment are to: a) calculate a confidence interval for a proportion and b) present confidence intervals and tests of hypothesis for matched pairs. NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A Opinion polls are a popular method for assessing product
  • 56. preference, political preference, and more. As a simple example, consider that a poll was taken ten days prior to a civic election to try to predict what proportion of the electorate would vote for the incumbent mayor. The data in Table A represents the results of a moderate sample of persons who were asked if they would vote for the same mayor; a yes was recorded as 1, a no as 0. You are required to analyze the results of the poll and predict what proportion of voters will vote for the incumbent. he EXCEL worksheet, prepare a histogram to count the number of yes (1) and no (0) responses, and calculate the proportion who indicated that they would vote for the incumbent mayor. Note that, since yes and no are represented by 1 and 0, the proportion of yes can be determined by calculating the sum and dividing by the total sample size. 1. How large was the sample of voters represented in this poll? 2. What proportion of the sample voters indicated they would vote for the incumbent mayor? {Examples 1, 2 and 10} LABORATORY ASSIGNMENTS
  • 57. voters expected to vote for the incumbent mayor. The procedure for calculating a confidence interval for a proportion involves three basic steps. 3. Determine the α/2 critical value for the appropriate distribution (standard normal in this case). Use the NORM.INV function to calculate the critical value: =NORM.INV(0.95,0,1). What is the critical value for a 90% confidence interval based on the standard normal distribution? 4. What is the standard error of the estimated proportion of polled voters who favour the incumbent? $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$ 5. What is the margin of error of the estimated proportion? 6. What is the lower 90% confidence limit on the proportion of voters who will
  • 58. vote for the incumbent? 7. What is the upper 90% confidence limit on the estimated proportion of voters who will vote for the incumbent? {Example 17} 23,217 of the 58,839 persons that voted actually voted for the incumbent. Calculate and report the actual proportion that voted for the incumbent. 8. What was the proportion that actually voted for the incumbent? 9. Based on the results given in questions 6, 7 and 8, which of the following statements (1, 2 or 3) is most correct? 1 - The poll of a sample of voters gave a good indication of the final vote. 2 - Many of the voters who would have voted for the incumbent at the time of the poll must have changed their minds. 3 - The persons sampled in the poll must have contained an
  • 59. unusually low proportion of those who favoured the incumbent. {Example 10} Question B The Monster Chemical Company believes that its herbicide (Avena-doom) is better than its competitor's herbicide (Avena-kill) for controlling wild oat in barley fields. To demonstrate the advantage of their herbicide over that of their competitor, Monster grew side-by-side plots of barley treated with each of the two herbicides in a large sample of farmers' fields throughout western Canada. The company then wished to compare the yields of barley treated with the two types of herbicides. Yield of barley will vary from farm to farm regardless of which herbicide is used. Differences in climate, in agronomic practices, and in the type of barley grown all cause variation. For this reason, it is desirable to match the data from the two plots on each farm.
  • 60. The analysis is one of looking at differences between matched pairs. with Avena-doom (second column), and barley yield with Avena-kill (third column) from the three columns in Table B into columns of the EXCEL worksheet. Describe the data from the two treatments. 10. What was the average barley yield for plots treated with the Avena-doom herbicide? 11. What was the standard deviation of yields of barley plots treated with Avena-doom ? 12. What was the average yield of plots treated with Avena-kill? 13. What was the standard deviation with Avena-kill? {Examples 1 and 9} calculated and then analyze the differences.
  • 61. 14. What was the mean of the differences between yield of barley plots treated with Avena-doom and Avena-kill ? 15. What was the standard deviation of the differences (for each pair)? 16. Was the standard deviation of the differences smaller (0) or larger (1) than the standard deviation of the barley yields from plots treated with Avena-doom? {Examples 10 and 9} differences in yield between plots treated with Avena-doom and those treated with Avena-kill. NOTE: The standard deviation is estimated from the data so we use the t distribution. LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 35
  • 62. 17. What is the critical value for the confidence interval? 18. What was the margin of error of the estimated mean difference? 19. What was the lower limit of the 95% confidence interval for the average difference in yields of barley treated with Avena-doom and barley treated with Avena-kill? 20. What was the upper limit? {Example 16} Question C Use the same data and results of Question B to investigate the hypothesis that the increase in barley yield by using Avena-doom instead of Avena-kill is no greater than 3.0 q/ha (300 kg/ha). The alternative to this hypothesis is that the increase is greater than 3 q/ha. To test this hypothesis, one must calculate a test statistic, $t = \dfrac{\text{mean of differences} - \text{hypothesized mean } (= 3.0)}{\text{standard error of the differences}}$. The null hypothesis should be rejected if the test statistic exceeds the critical value from the theoretical distribution. For a 5% significance level, α = 0.05, the critical value for a
  • 63. one-tailed test can be found by using the appropriate T.INV function (see Example 18) with n-1 degrees of freedom. For matched pairs, n is the number of pairs. In this instance, the null hypothesis should be rejected if the test statistic exceeds the critical value. 21. What is the value of the test statistic for testing the hypothesis that the mean difference is 3.0 q/ha or less? 22. What is the critical value against which the test statistic in question 21 should be compared? 23. Should the hypothesis that the yield difference is 3 q/ha or less be rejected (1) or not (0)? {Example 18} - END OF ASSIGNMENT 6 -
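The matched-pairs test in Question C reduces to a one-sample t test on the differences. The following Python sketch shows the arithmetic under stated assumptions: the function name `paired_t_statistic`, the five pairs of yields, and the critical value 2.132 (one-tailed, α = 0.05, 4 degrees of freedom, taken from a t table) are all illustrative and are not the values for Table B.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second, hypothesized_mean):
    """t = (mean of differences - hypothesized mean) / standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                       # n is the number of pairs
    std_error = stdev(diffs) / sqrt(n)   # standard error of the mean difference
    return (mean(diffs) - hypothesized_mean) / std_error, n - 1


# Placeholder yields (q/ha) for five farms; NOT the data in Table B.
doom = [50.0, 52.0, 48.0, 55.0, 51.0]
kill = [45.0, 46.0, 44.0, 50.0, 47.0]

t_stat, df = paired_t_statistic(doom, kill, hypothesized_mean=3.0)
t_crit = 2.132  # assumed table value for a one-tailed 5% test with df = 4
reject = t_stat > t_crit
print(round(t_stat, 4), df, reject)
```

With real data the critical value must be recomputed for the actual number of pairs, since it depends on the n-1 degrees of freedom.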
  • 64. Introductory Statistics Laboratory Assignment #7 Purpose This lengthy assignment serves to review calculations of confidence intervals and tests of hypothesis for: a) two means of large independent samples from populations with unknown and unequal variances, b) two means of small independent samples from populations with the same unknown variance, c) two proportions from large independent samples. NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A The role that cholesterol plays in the development of "hardening of the arteries" (atherosclerosis) and heart disease has been widely reported. In one experiment, a group of patients who were considered to be high-risk were split into two equal groups. The first group
  • 65. was put on a special diet with a high proportion of fish (salmon, tuna, mackerel and cod). Oil from these deep-sea fish is known to be very rich in Omega-3 fatty acids. The other (control) group was maintained on a standard diet (high-protein, low-fat, complex carbohydrates and polyunsaturated cooking oil). The change (decrease) in cholesterol was measured after a period of time. A greater change is desirable. The (simulated) data (mg decrease per decilitre of blood) for the Omega-3 group is stored in Table A1, and the data for the control group is stored in Table A2. You are required to calculate a 95% confidence interval for the average difference in cholesterol reduction and to test the hypothesis that there was no difference between the two diets in average reduction of cholesterol. m the 'Omega-3' group [Table A1] the data from the 'control' group [Table A2] into the EXCEL worksheet. Determine and report the number of observations in each group, the mean change (mg/dl) in each group and the standard deviation of the change in each group. 1. How many patients were in each diet group? 2. What was the mean (decrease) in cholesterol for the Omega-3 group of patients?
  • 66. 3. What was the standard deviation in that group? LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 37 4. What was the mean (decrease) in cholesterol for the control group of patients? 5. What was the standard deviation in the control group? {Examples 1 and 9} variances that are unequal. We can use the normal distribution as an approximation to the t distribution when the sample sizes are large. The method for calculating a large-sample confidence interval for the difference between two means consists of three basic steps. a) Estimate the difference between the two sample means and the standard error of the difference between the two sample means. 6. What is the estimated difference of means? 7. Standard error of the difference between means
  • 67. $= \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$. What is the standard error of the difference of means? b) Calculate the margin of error of the estimated difference of means. For this large-sample 95% confidence interval we can approximate with a z value which is z0.025 = 1.96. Calculate the confidence interval as difference between means ± margin of error. 8. What is the margin of error of the estimated difference? 9. What is the lower limit for the 95% confidence interval of the difference in cholesterol reduction between Omega-3 and control diets?
  • 68. 10. What is the upper limit? {Example 19} difference between the two diets proceeds as follows. Since we expect that the Omega-3 diet should give a greater decrease in cholesterol than the control, we will use a one-tailed alternative hypothesis. Use a 5% significance level to test the null hypothesis that there is no difference between the diets against an alternative that the difference between Omega-3 and control groups is greater than zero. The test of hypothesis has two basic steps: a) Compute the test statistic (z) as the difference in means divided by the standard error of the difference. b) The null hypothesis should be rejected if the test statistic exceeds the critical value for a one-tailed alternative (approximately 1.645 for 5% significance in a large-sample, one-tailed test).
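The large-sample interval and the one-tailed z test for two independent means use the same standard error. A minimal Python sketch, working from summary statistics, is given below. The function name `two_sample_z` and the summary values are hypothetical and are not taken from Tables A1 and A2.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2, confidence=0.95, alpha=0.05):
    """Large-sample CI and one-tailed z test for mean1 - mean2."""
    diff = mean1 - mean2
    # standard error with unequal variances: sqrt(s1^2/n1 + s2^2/n2)
    std_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # two-sided critical value for the interval (about 1.96 for 95%)
    z_interval = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z_interval * std_error
    # one-tailed critical value for the test (about 1.645 for alpha = 0.05)
    z_test = NormalDist().inv_cdf(1.0 - alpha)
    z_stat = diff / std_error
    return {
        "difference": diff,
        "std_error": std_error,
        "margin": margin,
        "lower": diff - margin,
        "upper": diff + margin,
        "z_stat": z_stat,
        "reject": z_stat > z_test,
    }


# Placeholder summary statistics (mg/dl); NOT the values from Tables A1 and A2.
result = two_sample_z(mean1=20.0, s1=8.0, n1=64, mean2=15.0, s2=6.0, n2=36)
print({k: round(v, 4) if isinstance(v, float) else v for k, v in result.items()})
```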
  • 69. 11. What is the value of the test statistic? 12. Should the null hypothesis be rejected and the conclusion be that Omega-3 diet did indeed cause a greater reduction in cholesterol than the control diet? Yes =1, No = 0 {Example 19} Question B In some law schools, the score on a test known as LSAT is an important criterion for acceptance. Two law schools decided to compare the LSAT scores of students registered in their respective schools. LSAT scores for students in Law school 1 are stored in Table B1 and those for students from Law school 2 in Table B2. Assume that the variances of LSAT scores are equal in the two schools. You are asked to calculate a 90% confidence interval for the difference in average LSAT scores and to test the hypothesis that students from the two schools do not differ in their average LSAT scores. Use a 5% significance level. from Law school 2 into the EXCEL worksheet. Compute and report the number, means and
  • 70. standard deviations of scores from each school. 13. How many LSAT scores from school 1? 14. What was the mean LSAT score from school 1? 15. What was the standard deviation of scores from school 1? 16. How many LSAT scores from school 2? 17. What was the mean LSAT score from school 2? 18. What was the standard deviation of scores from school 2? {Examples 1 and 9} LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 39 eps to calculate a 90% confidence interval for the difference in mean LSAT scores when variances are unknown but assumed to be equal. a) calculate the difference between the two means (school 1 - school 2) b) calculate the pooled variance for the two samples: c) calculate the standard error of the difference:
  • 71. d) Calculate the critical value and margin of error for α = 0.10. Use the T.INV function to get the critical value. Multiply the critical value by the standard error of the difference to get the margin of error. Use degrees of freedom = n1 + n2 – 2. e) Calculate the lower and upper 90% confidence limits. 19. What is the estimated pooled variance for this data? 20. What is the standard error of the difference? 21. What is the margin of error of the difference? 22. What is the lower limit of the difference between the two schools in LSAT scores? {Example 20} Pooled variance: $s^2_{pool} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$,
  • 72. where $n_1$ = size, sample 1; $n_2$ = size, sample 2; $s_1$ = st.dev, sample 1; $s_2$ = st.dev, sample 2. Standard error of the difference: $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2_{pool}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$. hypothesis that the means of the two groups of LSAT scores are equal when the samples are
  • 73. independent and the population variances are unknown but equal. The test statistic is the difference in means minus zero divided by the standard error of the difference. The null hypothesis should be rejected if the test statistic is less than -tα/2,df or greater than tα/2,df where df = n1 + n2 - 2 and α=0.05 is the chosen significance level. Use the T.INV function to calculate the critical values for this two-tailed test. 23. What is the value of the test statistic for testing the hypothesis that the mean LSAT scores are the same for the two law schools? 24. Using the 5% significance level, should the null hypothesis be rejected (1) or not (0)? {Example 20} Question C The legislature of a southern state in the U.S. passed a rule, commonly called "no-pass, no-play", which prohibits a student who fails in any subject from participating in any extracurricular activity for six weeks. Data were collected for students involved in football, volleyball, cross country, and band for the first six-week
  • 74. grading period. Records were kept from last year and this year. The numbers of students are stored in column 1 and the proportions sidelined because of the rule are stored in column 2 of Table C, the first row being for last year and the second for this year. values. 25. How many students were there in last year's sample? 26. What proportion of the last year's students were sidelined because of one or more failures? 27. How large was this year's sample? 28. What proportion failed and were sidelined this year? {Example 1}
  • 75. change (last year minus this year) in proportion of students sidelined. a) Calculate the difference in proportions. b) Calculate the standard error of the difference: $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$. c) Calculate the margin of error of estimate. For a 90% confidence interval with large samples, use z0.05 = 1.645.
  • 76. d) Calculate the lower and upper limits. 29. What is the upper 90% confidence limit on the change in proportion of students sidelined because of failure? {Example 21} an alternative that the proportion sidelined has decreased (that is, the difference in proportions is greater than zero). Use a 5% significance level. NOTE: Under the null hypothesis, the proportions are equal and we should therefore calculate an average proportion for the two groups. This will result in a new estimate of the standard error of the difference between sample proportions. Average (pooled) proportion: $p = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$. 30. What was the average (pooled) proportion sidelined? 31. Now use the pooled proportion to calculate the standard error of the difference between the two proportions:
  • 77. $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$. What is the value of the test statistic for testing the hypothesis that the proportion did not change (remember to divide by the standard error of the difference between the two proportions which was calculated using the pooled proportion)?
  • 78. ASSIGNMENT 7 42 INTRODUCTORY STATISTICS LABORATORY Use a one-tailed test with a 5% significance level to answer the following question. Remember that you will reject the null hypothesis if the test statistic exceeds the critical value (1.645 in this case). 32. Was the superintendent of schools justified in saying, "We are very pleased with the improvement. It shows coaches and students are taking the rule seriously"? Answer 1 for yes or 0 for no. {Example 21} - END OF ASSIGNMENT 7 – ASSIGNMENT 8 44 INTRODUCTORY STATISTICS LABORATORY Introductory Statistics Laboratory Assignment #8
  • 79. Purpose In this assignment calculations will be completed for analyses of variance for : a) a one-way design, b) a two-way design with more than one observation per cell, and c) a two-way design with one observation per cell (randomized complete block design) NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A Gasoline mileage (mpg) was measured on several cars of each of four different makes (coded 1, 2, 3 and 4). The make of each car is stored in the first column, and the mileage for each car is stored in the second column, of Table A. You need to conduct an analysis of variance to see if there are differences among the four makes in gasoline mileage. You should also estimate the mileage of each of the four makes of cars. worksheet. Name the columns and view the data.
  • 80. {Example 1} -way analysis of variance on this data. Since each data point can be classified only according to the make of car, a one-way analysis of variance is required. It is important that students be able to interpret analysis of variance tables such as those produced by EXCEL. For this analysis, you will need to copy data for each make into different adjacent columns. Fill in the following one-way analysis of variance table and answer the first five questions.
    Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
    Make of car | 3 | | | |
    Error | | | | |
  • 81. 1. What is the value of the F-statistic for testing the null hypothesis that there are no differences in gasoline mileage among the four makes of automobile? 2. What are the degrees of freedom associated with the numerator of this test statistic? 3. What are the degrees of freedom associated with the denominator of the F-value for MAKE of car? 4. What is the estimate of the pooled variance within makes of cars (i.e. the Error mean square)?
  • 82. 5. What are the degrees of freedom for this variance in #4? {Example 22} NOTE: For the following questions (6 - 13), use the error mean square and the error degrees of freedom to calculate confidence intervals and to test hypotheses about pairs of means. car and record them in the following table. Make of car Number tested Average mileage 1 2 3 4 6. How many cars of make 2 were evaluated in this experiment? 7. What was the average gasoline mileage for make 2? 8. How many cars of make 3 were evaluated in this experiment? 9. What was the average gasoline mileage for make 3?
  • 83. make 2. Use the method for single means when σ is not known, but use the Error Mean Square as the estimate of the variance. The degrees of freedom will be the Error DF, not n-1! Reminders: Confidence Interval = mean ± margin of error Margin of error = critical value * standard error Use critical value for T at α/2 = 0.025 and df = error df (t table or EXCEL T.INV function) Use standard error = √(error mean square/number of observations of that make of car) 10. What was the margin of error for the confidence interval for gasoline mileage of make 2? ASSIGNMENT 8 46 INTRODUCTORY STATISTICS LABORATORY 11. What was the lower 95% confidence limit for make 2 mileage? 12. What was the upper 95% confidence limit for make 2 mileage? {Example 24}
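The reminders above use the Error Mean Square from the one-way analysis of variance as the variance estimate. The Python sketch below shows where that quantity comes from and how it feeds the margin of error for one make and the t statistic for a pair of makes. The function name `one_way_anova`, the mileage values, and the critical value 2.306 (t at α/2 = 0.025 with 8 error degrees of freedom, from a t table) are assumptions for illustration only and are not the data in Table A.

```python
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table for a dict of lists."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_error = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "df_between": df_between,
        "df_error": df_error,
        "ms_between": ms_between,
        "ms_error": ms_error,
        "F": ms_between / ms_error,
    }


# Placeholder mileages (mpg) for four makes; NOT the data in Table A.
mileage = {
    1: [20.0, 22.0, 24.0],
    2: [30.0, 32.0, 34.0],
    3: [24.0, 26.0, 28.0],
    4: [28.0, 30.0, 32.0],
}
table = one_way_anova(mileage)

# 95% confidence interval for make 2, using the Error MS and the error df.
t_crit = 2.306  # assumed table value for alpha/2 = 0.025, df = 8
n2 = len(mileage[2])
margin_make2 = t_crit * sqrt(table["ms_error"] / n2)
lower_make2 = mean(mileage[2]) - margin_make2
upper_make2 = mean(mileage[2]) + margin_make2

# t statistic for the difference between makes 2 and 3.
n3 = len(mileage[3])
se_diff = sqrt(table["ms_error"] / n2 + table["ms_error"] / n3)
t_2_vs_3 = (mean(mileage[2]) - mean(mileage[3])) / se_diff
print(table, round(margin_make2, 4), round(t_2_vs_3, 4))
```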
  • 84. of makes 2 and 3 do not differ. Use the method for single means when σ is not known with the Error MS serving as the pooled variance. Reminders: Test statistic t = difference of means / standard error of difference of means. The standard error of the difference equals square root of the sum of variances of the two means. The variance of each mean is estimated by the error mean square/number of observations in that mean. 13. What is the value of the t test statistic for testing the hypothesis that makes 2 and 3 do not differ in mileage? {Example 24} Question B The data in Table B represents the times (in seconds) for men of three different ages (40, 50 and 60) in each of three different fitness classes (1, 2 and 3) to run a 2 km course. For each runner, age is recorded in the first column, fitness category is recorded in the second column, and running time is recorded in the third.
  • 85. Two men in each of the nine categories ran the course. You should be interested in determining whether age and/or fitness affect running time. Each data point can be classified according to age of the runner or according to fitness of the runner. The data therefore requires a two-way analysis of variance. It is possible that differences among ages of runner will depend upon the fitness categories of those two runners. The model for the analysis should include an interaction term. the columns, and view the data. You will have to copy the data into three different columns each with six observations in order to perform the following analysis (see Example 25). {Example 1, 25} LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 47
  • 86. out a two-way analysis of variance and answer the following questions.
    Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
    Age of runner | 2 | | | |
    Fitness of runner | 2 | | | |
    Interaction | 4 | | | |
    Error | 9 | | | |
    14. What is the value of the F test statistic for testing the hypothesis that age, on average, has no effect on running time?
  • 87. 15. What are the numerator degrees of freedom for that F statistic reported in question 14? 16. What are the denominator degrees of freedom for that F statistic reported in question 14? 17. What is the value of the F test statistic for testing the hypothesis that fitness, on average, has no effect on running time? 18. What is the value of the F test statistic for testing the hypothesis that the effect of age (if any) on running time does not depend on the runner's fitness? NOTE In analysis of variance, the null hypothesis should be rejected whenever the calculated F-statistic is greater than the critical value for a chosen significance level and appropriate numerator and
  • 88. denominator degrees of freedom. Equivalently, the null hypothesis should be rejected whenever the computed p-value is less than the chosen significance level. Use α = 0.01 (significance level =1 %) and answer the following two questions. 19. Should the null hypothesis that age has no effect on running time be rejected (1) or not rejected (0)? 20. Should the null hypothesis that the effect of age is independent of the effect of fitness be rejected (1) or not rejected (0)? {Example 25} ASSIGNMENT 8 48 INTRODUCTORY STATISTICS LABORATORY
  • 89. following three questions. Age Fitness 1 Fitness 2 Fitness 3 Average 40 50 60 Average 21. What was the average running time for all 60-year olds? 22. What was the average running time for all men in fitness category 3? 23. What was the mean running time of the two 60-year, category 3 runners? {Example 25} Question C In many agricultural and biological experiments, one may use a two-way model with only one observation per cell. When one of the factors is related to the grouping of experimental units into more uniform groups, the design may be called a randomized complete block design (RCBD). The analysis is similar to a two-way analysis of variance (question B) except that the model does not include an interaction term. The specific leaf areas (area per unit mass) of three types of citrus each treated with one of
  • 90. three levels of shading are stored in Table C. The first column contains the code for the shading treatment, the second column contains the code for the citrus species, and the third column contains the specific leaf area. Assume that there is no interaction between citrus species and shading. Carry out a two-way analysis of this data. The shading treatment and citrus species are coded as follows: Treatment Code Species Code Full sun 1 Shamouti orange 1 Half shade 2 Marsh grapefruit 2 Full shade 3 Clementine mandarin 3 leaf area into the EXCEL worksheet, label the columns and look at the data. {Example 1} LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 49 -way (without interaction) analysis of this data and answer the following questions.
  • 91. Use a 5% significance level.
    Source of variation | Degrees of freedom | Sum of squares | Mean square | F | P
    Shading treatment | 2 | | | |
    Citrus species | 2 | | | |
    Error | 4 | | | |
    24. Should the hypothesis that shading treatment has no effect on specific leaf area be rejected (1) or not (0)? 25. Should the hypothesis that citrus species do not differ in specific leaf area be
  • 92. rejected (1) or not (0)? 26. What is the estimate of the average (pooled) variance in this experiment (i.e. Error mean square)? 27. What are the error degrees of freedom for the pooled variance? {Example 26} Recall that the confidence interval for a difference between two means is based on a calculation of the margin of error of the estimated difference. With a common variance (Error MS) and the same number of observations in all shading treatments, the margin of error of an estimated difference will be the same whether we calculate it for treatments 1 and 2, 1 and 3, or 2 and 3. This margin of error of the difference between two means is sometimes referred to as the least significant difference (LSD). experiment. LSD = critical t value × standard error of difference. The critical t value with 4 degrees of freedom is t0.025,4 = 2.776. n is the number of times each treatment was tested (in this case n = 3 for the 3 species).
  • 93. $LSD(\alpha) = t_{\alpha/2,\,edf}\sqrt{\dfrac{2 \times \text{Error Mean Square}}{n}}$ 28. What is the least significant difference (α = 0.05) for comparing shading treatments in this experiment? {Example 24} Any two shading treatments are judged to be significantly different if their absolute (ignore the + or - sign) difference exceeds the least significant difference.
  • 94. differences. Compare the appropriate differences to the LSD to answer the following questions. Shading Treatment Mean Specific Leaf Area Full Sun Half Shade Full Shade 29. Should the hypothesis that the specific leaf area under full sun is not different from the specific leaf area in half shade be rejected (1) or not rejected (0)? 30. Should the hypothesis that the specific leaf areas of half shade and full shade are not different be rejected (1) or not rejected (0)? {Example 24} - END OF ASSIGNMENT 8 - LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 51
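The least significant difference comparison in Question C can be sketched in a few lines of Python. The critical value 2.776 and n = 3 come from the assignment text; the Error Mean Square of 6.0 and the three treatment means are hypothetical placeholders, not results from Table C, and the function name is an assumption.

```python
from math import sqrt


def least_significant_difference(t_crit, error_mean_square, n):
    """LSD = t * sqrt(2 * Error MS / n), with n observations per treatment."""
    return t_crit * sqrt(2.0 * error_mean_square / n)


# t value and n are from the assignment; the Error MS is a placeholder.
lsd = least_significant_difference(t_crit=2.776, error_mean_square=6.0, n=3)

# Placeholder treatment means; NOT the specific leaf areas from Table C.
means = {"Full Sun": 50.0, "Half Shade": 57.0, "Full Shade": 60.0}

# Two treatments differ significantly if |difference| exceeds the LSD.
sun_vs_half = abs(means["Full Sun"] - means["Half Shade"]) > lsd
half_vs_full = abs(means["Half Shade"] - means["Full Shade"]) > lsd
print(round(lsd, 3), sun_vs_half, half_vs_full)
```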
  • 95. Blank page ASSIGNMENT 9 52 INTRODUCTORY STATISTICS LABORATORY Introductory Statistics Laboratory Assignment #9 Purpose This final assignment presents some of the important points to consider in correlation analysis and simple linear regression analysis. Question A The data in Table A gives the (simulated) advertising expenditures of 25 large companies for last year and this year. You are asked to investigate the question of whether or not expenditures in one year are related to expenditures in another. The data file contains the company number in the first column, last year's expenditures ($ millions) in the second column, and this year's expenditures ($ millions) in the third column.
  • 96. t, name the columns, and view the data. 1. Which company had the greatest advertising expenditures last year? 2. Which company had the greatest advertising expenditures this year? {Example 1} ditures in the two years and answer the following question. 3. Which of the following three statements (1, 2 or 3) most correctly describes the relationship between last year's and this year's expenditures? 1 - There is little relationship between what a company spends on advertising in one year and what that company spends in another. 2 - Companies that spent most on advertising last year tended to be among those spending the greatest amount this year. 3 - Companies that spend a lot on advertising in one year tend to reduce their advertising expenditures in the next. {Example 27}
  • 97. LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 53 riables can be measured by the covariance. The covariance is a measure of how much two random variables vary together. The larger the magnitude of the product, the stronger the strength of the relationship. The value of the covariance is interpreted as follows: • Positive covariance - indicates that higher than average values of one variable tend to be paired with higher than average values of the other variable. • Negative covariance - indicates that higher than average values of one variable tend to be paired with lower than average values of the other variable. • Zero covariance - if the two random variables are independent, the covariance will be zero. However, a covariance of zero does not necessarily mean that the variables are independent. A nonlinear relationship can exist that still would result in a covariance value of zero.
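The three cases listed above can be illustrated with a short Python sketch. It assumes the sample covariance with an n − 1 divisor (the convention of EXCEL's COVARIANCE.S); the manual does not say here which divisor it intends, so treat that choice as an assumption. The function name and all data values are made up for illustration.

```python
from statistics import mean


def sample_covariance(x, y):
    """Sample covariance with an n - 1 divisor (assumed convention)."""
    x_bar, y_bar = mean(x), mean(y)
    products = [(a - x_bar) * (b - y_bar) for a, b in zip(x, y)]
    return sum(products) / (len(x) - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]

# High values of x paired with high values of y: positive covariance.
cov_positive = sample_covariance(x, [2.0, 4.0, 6.0, 8.0, 10.0])

# High values of x paired with low values of y: negative covariance.
cov_negative = sample_covariance(x, [10.0, 8.0, 6.0, 4.0, 2.0])

# y depends on u exactly (y = u squared), yet the covariance is zero
# because the relationship is nonlinear and symmetric about the mean of u.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
cov_zero = sample_covariance(u, [v ** 2 for v in u])

print(cov_positive, cov_negative, cov_zero)
```

The last case shows why a covariance of zero does not by itself establish independence.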
  • 98. Calculate the standard deviation for last year's expenditures, the standard deviation for this year's expenditures and the covariance between the two.
  4. What is the standard deviation of last year's advertising expenditures ($ millions) of these 25 companies?
  5. What is the standard deviation of this year's advertising expenditures ($ millions) of these 25 companies?
  6. What is the covariance between last year's and this year's advertising expenditures ($ millions²) of these 25 companies?
  Because the covariance depends on the units of the data, it is difficult to compare covariances among data sets having different scales. A value that represents a strong linear relationship for one data set might represent a very weak one in another. The correlation coefficient (r) addresses this issue by normalizing the covariance (i.e. dividing the covariance $s_{xy}$ by the product of the two standard deviations, $s_x \times s_y$), creating a dimensionless
  • 99. quantity that allows the comparison of different data sets.
  7. What is the correlation (r) between last year's and this year's expenditures? {Example 28}
  [...] expenditures from one year to another? Test the null hypothesis that there is no relationship between last year's and this year's expenditures against the alternative that there is a positive relationship (r > 0). Use a 10% significance level. Because this is a one-tailed test with 25 pairs of observations (degrees of freedom = n - 2 = 23), the critical value against which to compare the test statistic is t = 1.319. Using your r value and n = 25, calculate the test statistic tcalc and compare. If the test statistic is greater than the critical value of 1.319, the null hypothesis is rejected.
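The test described above can be sketched in Python as follows. This is a minimal illustration that assumes the usual statistic for a correlation, t = r × sqrt((n - 2) / (1 - r²)). The value r = 0.5 is hypothetical and is not the answer for Table A, and the function name `t_for_correlation` is made up for this example.

```python
from math import sqrt


def t_for_correlation(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


r = 0.5              # hypothetical correlation, NOT the Table A result
n = 25               # pairs of observations, as in Question A
t_critical = 1.319   # one-tailed 10% critical value for 23 degrees of freedom

t_calc = t_for_correlation(r, n)
reject = 1 if t_calc > t_critical else 0   # 1 = reject H0, 0 = do not reject
print(round(t_calc, 3), reject)
```

Substitute your own r to obtain tcalc; the comparison with 1.319 then gives the 1 or 0 answer in the form ISLeX asks for.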
  • 100. $t_{calc} = r\sqrt{\dfrac{n-2}{1-r^2}}$
  8. Should the hypothesis that there is no relationship between last year's and this year's advertising expenditures be rejected (1) or not (0)? {Example 28}
  Question B
  In a study of the role of young drivers in automobile accidents, data on the percentage of licensed drivers under the age of 21 and the number of fatal accidents per 1000 licenses were determined for 32 cities. The data are stored in Table B. The first column contains a number as the city code, the second column contains the percentage of drivers who are under 21, and the third column contains the number of fatal accidents per 1000 drivers. The primary interest is whether or not the number of fatal accidents is dependent upon the proportion of licensed drivers that are under 21. Copy the data into the EXCEL worksheet, name the
  • 101. columns, and view the data.
  9. Which city (number) had the highest number of fatal accidents per 1000 licensed drivers? {Example 1}
  [...] percentage of drivers under 21. Based on the plot, try to anticipate whether or not the following analysis will show that there is a significant increase or decrease in number of fatalities with increases in percentage of drivers under 21. {Example 27}
  [...] can be used to predict levels of a
  • 102. dependent variable for specified levels of an independent variable. Use the EXCEL REGRESSION command to calculate the intercept and slope of the least-squares line, as well as the analysis of variance associated with that line. Fill in the following table and use the results to answer the next few questions. Carefully choose your independent and dependent variables and input them correctly using EXCEL's regression command. In this example, the percentage of drivers under the age of 21 affects the number of Fatals/1000 licenses.
  The regression equation (least-squares line) is
      Fatals/1000 licenses = ________ + ________ × % under 21
                             (intercept) (slope)
  Analysis of variance
      Source             DF    SS         MS         F          P
      Regression          1    ________   ________   ________   ________
      Residual (Error)   30    ________   ________
  10. What is the estimated increase in number of fatal accidents per 1000 licenses due to a one percent increase in the percentage of drivers under 21 (i.e. the slope)?
  • 103. 11. What is the standard deviation of the estimated slope? 12. What is the estimated number of fatal accidents per 1000 licenses if there were no drivers under the age of 21 (i.e. the y intercept)? 13. What percentage of the variation in accident fatalities can be explained by the linear relationship with drivers under 21 (i.e. 100 × the unadjusted coefficient of determination)? 14. Should the hypothesis that the slope does not differ from zero (no effect of young drivers on fatals) be rejected (1) or not (0) based on a test at the 1% significance level (i.e. is the p-value from the ANOVA less than 0.01)? 15. What are the degrees of freedom for the standard error of estimate (and the standard deviation of the slope); i.e. what are the error degrees of freedom?
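The quantities asked for in questions 10 to 15 can be illustrated with a small least-squares sketch in Python. It is not the EXCEL REGRESSION command, only the textbook formulas. The five data pairs are hypothetical (not Table B), and the function name and dictionary keys are made up for this example.

```python
from math import sqrt
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x, with summary quantities."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    df_error = n - 2                      # error degrees of freedom
    ms_error = ss_error / df_error        # residual mean square
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_error / ss_total,   # unadjusted R-squared
        "se_slope": sqrt(ms_error / sxx),       # standard deviation of slope
        "df_error": df_error,
    }


# Hypothetical data; NOT the Table B data.
pct_under_21 = [1.0, 2.0, 3.0, 4.0, 5.0]        # independent variable
fatals_per_1000 = [2.1, 3.9, 6.2, 7.8, 10.1]    # dependent variable

fit = simple_linear_regression(pct_under_21, fatals_per_1000)
print(fit)
```

Note which list is passed first: swapping the independent and dependent variables gives a different line, which is why the assignment asks you to choose them carefully.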
  • 104. {Example 29}
  [...] to calculate a confidence interval for the slope of the least-squares line and to test hypotheses other than H0 : β1 = 0. In both cases, one needs an estimate of the slope and of its standard deviation (sometimes called standard error). Furthermore, one needs to recognize that the degrees of freedom for the standard deviation are the same as the error degrees of freedom (n - 2). Note that EXCEL gives the standard error of the estimate directly; this is the standard deviation of the slope. Therefore, you must not divide by the square root of the sample size as in Example 16. Use the above information to calculate a 90% confidence interval for the slope of the true regression line. For 30 degrees of freedom and α = 0.1, the critical t-value is 1.697.
  16. What is the margin of error for calculating a 90% confidence interval for the slope of the regression line (i.e. 1.697 × the standard deviation
  • 105. of the slope)?
  17. What is the lower 90% confidence limit for the slope? (i.e. slope - margin of error)
  18. What is the upper 90% confidence limit for the slope? (i.e. slope + margin of error)
  [...] null hypothesis H0 : β1 = 0.05 against a one-sided alternative H1 : β1 > 0.05. Use a 1 percent significance level (for which the critical value with 30 degrees of freedom is 2.457).
  Reminder: $t = \dfrac{\text{estimated value} - \text{hypothesized value}}{\text{standard error (deviation) of estimate}} = \dfrac{\text{slope} - 0.05}{\text{st dev of slope}}$
  19. What is the value of the test statistic for testing this hypothesis?
  20. Should the hypothesis that the increase in fatals per one percent increase in drivers under 21 is not greater than 0.05 be rejected (1) or not (0)?
  - END OF ASSIGNMENT #9 - THE LAST ASSIGNMENT -
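As a cross-check on questions 16 to 19, the arithmetic can be sketched in Python. The slope (0.29) and its standard deviation (0.04) below are hypothetical values chosen for illustration, not results from Table B, and the function names are made up for this example.

```python
def slope_confidence_interval(slope, se_slope, t_critical):
    """Return (lower limit, upper limit, margin of error) for the slope."""
    margin = t_critical * se_slope
    return slope - margin, slope + margin, margin


def t_for_slope(slope, se_slope, hypothesized):
    """t statistic for H0: true slope equals the hypothesized value."""
    return (slope - hypothesized) / se_slope


# Hypothetical regression output; NOT the Table B results.
slope = 0.29
se_slope = 0.04   # already the standard deviation of the slope: do not
                  # divide by the square root of the sample size

lower, upper, margin = slope_confidence_interval(slope, se_slope, 1.697)
t_calc = t_for_slope(slope, se_slope, 0.05)
print(margin, lower, upper, t_calc)
```

The test statistic is then compared with the one-sided critical value for 30 degrees of freedom quoted in the assignment.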
  • 106. Introductory Statistics Laboratory for Excel
  PC Instructions for Excel 2013
  Excel Examples
  • 107. INTRODUCTION
  Note: Specific Excel 2013 instructions are shown in [Excel 2013: ] throughout the Excel examples.
  These EXCEL examples provide a basis for learning to use MICROSOFT EXCEL to perform various tasks required in the ISLeX laboratory assignments. The examples will not match exactly what you need to do to complete your assignments. They should provide an adequate outline, but you will have to modify each example to complete your assigned task. For instance, you will need to use different file names in your lab assignments than those used in the examples, and you will often have to refer to different EXCEL worksheet columns. Your laboratory sessions will be much less frustrating if you study the assignment and associated examples before sitting down at a computer.
  The EXCEL workbook contains one or more worksheets, each identified by a tab on the lower left part of the window. EXCEL will assign default names, such as Sheet1, to individual
  • 108. worksheets or the user can change the name by clicking the right mouse button on the tab and choosing the 'rename' option. Each worksheet is composed of cells arranged in rows and columns. Rows are identified by numbers 1, 2, 3 and so on, while columns are identified by letters A, B, C and so on. After column Z, naming starts with AA and proceeds to ZZ. Each cell may contain a number, some text, or a formula. In this manual, only absolute referencing is used to refer to cells or blocks of cells. To refer to the cell located in the second row of column C, use C2. To indicate all cells in the block that includes rows 2 to 10 of columns B through D, use the cell designations for the cell in the upper left corner (i.e. B2) and for the cell in the lower right corner (i.e. D10) separated by a colon, thus B2:D10. Sometimes, it will be useful to enter a formula into a cell and then copy that formula to other cells. If the formula in cell B2 refers to cell A1, it will refer to cell D5 when the formula is copied to cell E6. If you wish it to continue to refer to cell A1, use $A$1 instead of A1 in the formula.
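The way a cell reference shifts when a formula is copied can be mimicked with a few lines of Python. This is only a sketch of the rule described above, not Excel's own implementation: the function names are hypothetical, and only single-cell references such as A1 or $A$1 are handled.

```python
import re

REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def column_to_number(letters):
    """A -> 1, Z -> 26, AA -> 27, and so on."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    """1 -> A, 26 -> Z, 27 -> AA, and so on."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def parse(cell):
    """Split a reference into ($ flag, column number, $ flag, row number)."""
    col_abs, col, row_abs, row = REFERENCE.fullmatch(cell).groups()
    return col_abs, column_to_number(col), row_abs, int(row)


def copy_reference(reference, source_cell, target_cell):
    """Reference seen after copying a formula from source_cell to target_cell."""
    _, src_col, _, src_row = parse(source_cell)
    _, dst_col, _, dst_row = parse(target_cell)
    col_abs, col, row_abs, row = parse(reference)
    if not col_abs:                 # no $ before the column: it shifts
        col += dst_col - src_col
    if not row_abs:                 # no $ before the row: it shifts
        row += dst_row - src_row
    return f"{col_abs}{number_to_column(col)}{row_abs}{row}"


print(copy_reference("A1", "B2", "E6"))      # the manual's example: D5
print(copy_reference("$A$1", "B2", "E6"))    # anchored with $: stays $A$1
```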
  • 109. EXCEL commands and subcommands can be selected by clicking the left mouse button on the required command or subcommand. When you first start using Excel, you should become familiar with three important areas in the Excel window. Mention has already been made of the cells arranged in rows and columns in the worksheet. In fact, there may be several worksheets in a single workbook. If you place the cursor in a particular cell, the “Name box” located at the upper left hand side of the worksheet will indicate the identity of the active cell, e.g. B5. If you type a number, name or formula into that cell, it will also appear in the “Formula bar” at the top of the worksheet. If you then press the enter key, the cursor will move to the next cell and the formula bar will become blank (if the next cell is empty). If you entered a formula, it will be evaluated and the result will be displayed in the cell in which you entered the formula. If you made an error and need to edit the formula, highlight the cell and then move the cursor to the formula bar to edit the formula.
  In these laboratory assignments, you are sometimes required to combine information from two parts of an assignment. Typically, each part will result in a separate workbook in
  • 110. Excel. You can copy data from one workbook to another by using the following procedure. Highlight the data you wish to copy and press Ctrl-C to copy the data. Use the Window command of Excel to choose the workbook you wish to copy to. Place the cursor where you wish to paste the data and press Ctrl-V.
  Note: Rather than using Ctrl-C and Ctrl-V to copy and paste, you may use Edit->Copy and Edit->Paste.
  Most data analysis tools of Excel default to placing their results on a new worksheet. However, most also have an option to specify an output range on the same worksheet. If you choose the Output range option, click in the adjacent box and then highlight the area of the worksheet where you wish to store the results.
  Example 1: Copying data from the assignment webpage into the EXCEL worksheet.
  Your data will be presented to you in a web page. To copy the data to Excel:
  • First highlight the data and either press the key combination ctrl-c, or select Copy from
  • 111. the Edit menu to copy the data (to the clipboard).
  • Then, switch to the Excel window and either use the key combination ctrl-v, or select Paste from the Edit menu to paste the data into Excel.
  At this stage, you should have the data on an Excel worksheet. (If you wish, you can name this worksheet LAB0A.DAT by right clicking on its tab at the bottom and choosing the rename option.) This same procedure applies to all assignments. Follow the above procedure even with multi-column tables.
  If you wish to add a label in row 1 of column A, move the cursor to that cell and then choose Insert->Cells and click OK (or press enter) on the Insert dialog box to move all cells down. [Excel 2013: Home Tab – Insert] This will allow you to type a label in cell A1.
  The following procedure will allow you to calculate some summary statistics for data in a column. It is good practice to look at summary statistics before proceeding with further analysis. This will alert you to the number of data points, their average value, and a few other informative characteristics of the data.
  [...] Data Analysis… to pop up the Data Analysis window [Excel 2013: Data Tab – Data Analysis over on far right side] (SEE NOTE BELOW if Data
  • 112. Analysis is missing.)
  Double click on Descriptive statistics.
  With the cursor flashing in the Input Range: box, click on the column letter for the column with data.
  If you have entered a name in the first row of the column, click Labels in first row.
  Click in the box preceding Summary statistics, and click on OK or press the enter key.
  EXCEL will create a new worksheet with the summary statistics. You should note such key characteristics as count, minimum, mean and maximum. At more advanced stages, you may choose to think about kurtosis, skewness and standard deviation or standard error. If you wish, you can delete this temporary worksheet by right-clicking on its tab and choosing the delete option.
  The same basic procedures will be used in later assignments to enter data from a file that contains several columns.
  • 113. NOTE: The Analysis ToolPak is a Microsoft Excel add-in program that is available when you install Microsoft Office or Excel. To use it in Excel, however, you need to load it first.
  1. Click the File tab, and then click Options.
  2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.
  3. Click Go.
  4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
     a. If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
     b. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
  5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
  • 114. Example 2: Preparing a histogram of data
  A histogram is a graphical summary of numerical data. In this example, data stored in EXCEL worksheet column A are summarized in a histogram. Before calculating frequencies in different groups, you must define the classes. In EXCEL, the classes are called "bins". For this example, suppose that the data to be summarized vary from 21 to 28 and you wish to group the observations into "bins", each with a class width of one unit. The first bin will include all data points with values up to and including 22, the second bin will include values greater than 22 up to and including 23, and so on. You only need to indicate the upper boundary for each bin. For this example, use 22, 23, 24, 25, 26, 27, and 28. These values need to be entered into a new column, say column B. You can type the numbers into the first seven rows of column B.
  To actually draw the histogram, you must first calculate frequencies of data in each bin. Choose Data analysis [Excel 2013: Data Tab – Data Analysis] and select Histogram. In the histogram dialog box, move the cursor to Input range and click on top of column A, move the cursor to Bin range and click on top of column B, check the Labels option if you have labels in A1 and B1, and click on OK or press the enter key. EXCEL is very slow at this calculation, so be patient! In a few
  • 115. seconds, you should get a new sheet in the workbook that contains the upper ends of the bins and the frequencies of observations in each bin. In this example, the results look like this:
      Bin     Frequency
      22      9
      23      6
      24      6
      25      5
      26      7
      27      2
      28      1
      More    0
  At this point, you should have a numerical representation of a histogram. Most histograms are presented in graphical form. To develop a bar graph to show the histogram, proceed as follows. Note that Excel creates a bar graph, not a true histogram, as there are spaces between the bars. A true histogram has no spaces between the bars.
  Highlight the data, including titles, using the cursor. Insert a chart. [Excel 2013: Insert Tab – in Charts choose Insert Column Chart – select 2D (first choice of the options)] Excel will automatically produce a chart.
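The binning rule described in this example (each bin holds values greater than the previous upper boundary, up to and including its own, with a final "More" row for anything larger) can be sketched in Python. The eight data values are hypothetical, and `bin_frequencies` is a made-up helper for illustration, not an Excel function.

```python
from bisect import bisect_left


def bin_frequencies(data, upper_bounds):
    """Count values per bin; each bin includes its own upper boundary."""
    counts = [0] * (len(upper_bounds) + 1)   # last slot is the "More" row
    for value in data:
        # bisect_left finds the first boundary that is >= value
        counts[bisect_left(upper_bounds, value)] += 1
    return counts


bins = [22, 23, 24, 25, 26, 27, 28]                       # upper boundaries
data = [21.0, 22.0, 22.5, 23.0, 24.2, 27.9, 28.0, 28.4]   # hypothetical data

frequencies = bin_frequencies(data, bins)
for label, count in zip(bins + ["More"], frequencies):
    print(label, count)
```

Here 22.0 falls in the first bin and 22.5 in the second, matching the "up to and including" rule, while 28.4 is counted under More.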
  • 116. A histogram gives the frequency (number of observations) in each of various classes. In EXCEL, the classes are defined by giving the upper boundaries of each class (bin).
  The + sign allows you to format your chart’s elements. You can click on the boxes to include whatever elements you feel are appropriate for your chart. If you want to edit the Axis Title, you can click into that box and type a new axis title.
  The paint brush allows you to choose the style and color of your chart.
  This icon allows you to select your data source and make changes instead of having to highlight your Excel cells that hold the data and start the chart
  • 117. all over again.
  How to make a true histogram: To get rid of the gaps between the bars and make a true histogram, right click on any bar and Excel opens a menu that includes Format Data Series. Choose Format Data Series (see above arrow). In this window, choose the three column symbol (see above arrow); Excel then opens Series Options, and at the bottom is Gap Width. Change the gap width to zero and you will have a true histogram.
  • 118. You can change the outline of your bars to a different color to have them appear separated by clicking the Outline (see arrow below) and changing the color to black or white. The resulting chart looks like this (remember to make changes to your titles according to best graphing practices, not shown in this chart):
  • 119. Example 3: Entering data from the keyboard into the EXCEL worksheet
  Occasionally, you will be required to enter data or intermediate results directly into the EXCEL worksheet. You merely type the data into the cells where you wish to store the information.
  Example 4: Calculating relative frequencies
  To calculate relative frequencies in each of several classes, you must divide the frequency of each class by the sum of all the frequencies. Consider data summarized in three classes.
      Class    Frequency
      1        5
      2        10
      3        5
      Total    20
  The relative frequency for Class 1 is 5/20 = 0.25, for Class 2 is 10/20 = 0.50, and for Class 3 is 5/20 = 0.25. Note that the relative frequencies must always sum to 1.0 (within rounding error). Thus, 0.25 + 0.50 + 0.25 = 1.0. If the frequencies are stored in EXCEL Worksheet column C,
  • 120. you can calculate relative frequencies and store them in another column in the following way. Suppose 5 is in cell C1, 10 in cell C2 and 5 in cell C3. Move the cursor to cell D1, type ‘= C1/SUM($C$1:$C$3)’ in the formula bar, and press enter. Don’t forget the = at the beginning of your formula; otherwise it will be entered only as text and will not be calculated. You should see the value 0.25 in cell D1. To calculate the remaining relative frequencies, just copy the formula in cell D1 to cells D2 and D3. Note that, as the formula is copied, C1 will change to C2 and then to C3, but $C$1:$C$3 will remain constant.
  An alternative would be to first calculate the sum (20) and store it in a cell that could then be used to calculate all relative frequencies. For example, enter the formula ‘=SUM(C1:C3)’ in cell C4. Now, use the formula ‘= C1/$C$4’ in cell D1. Again, copy cell D1 to cells D2 and D3.
  You should also confirm that the relative frequencies sum to 1.0. Use the formula ‘= SUM(D1:D3)’ in cell D4. You can also use the Σ in the tool bar and Excel will help you calculate a sum for that column. [Excel 2013: Home Tab – Σ]
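The same calculation can be written as a brief Python sketch, using the three class frequencies from this example. The variable names are illustrative only.

```python
frequencies = [5, 10, 5]          # classes 1, 2 and 3 from the example

total = sum(frequencies)          # plays the role of =SUM(C1:C3)
relative = [f / total for f in frequencies]   # like =C1/$C$4 copied down

print(relative)
print(sum(relative))              # should be 1.0, within rounding error
```

Dividing every frequency by one fixed total is what the $C$4 (or $C$1:$C$3) anchor achieves when the Excel formula is copied down the column.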
  • 121. Example 5: Leaving EXCEL and grading your assignment.
  When you have completed an assignment and have recorded numerical answers to each of the questions in the INTRODUCTORY STATISTICS LABORATORY, you should try your answers in ISLeX.
  In submitting your answers to the Introductory Statistics Laboratory Program (ISLeX), you are required to use numbers for all answers. Place the cursor in the appropriate box and type in your answer. Use the mouse or the tab key to move to the next box. If you press enter, it will go right to grading. (You have the option to go back again, so DO NOT accept unless you are completely finished.) Click on the “Check my answers” box to grade your assignment.
  At the end of the assignment, your grade will be displayed on the screen and you will be given the option of accepting the grade or repeating the assignment. Once you accept your grade, you will not be able to repeat the assignment. You are encouraged to repeat the assignment until you are satisfied with your effort. You must achieve 80 or higher to move on to the next assignment.
  • 122. Example 6: How to prepare a stem-and-leaf diagram
  A stem-and-leaf diagram combines graphical and numerical methods to summarize data. Unfortunately, EXCEL does not have a command for preparing a stem-and-leaf diagram. Suppose you wish to develop a stem-and-leaf diagram of the following data.
      25.6 26.0 25.3 27.2 23.6 26.3 25.4 23.8 21.1 23.4
      23.9 23.8 26.0 20.0 22.5 28.0 26.7 24.8 25.1 24.9
      26.6 24.9 25.0 27.5 20.6 24.0 22.1 20.0 21.8 24.7
      21.7 25.2 27.1 24.8 25.8 26.9 25.6
  Enter (or read) the data into a column in EXCEL and then sort the data from lowest to highest using the Data->Sort command. [Excel 2013: Data Tab – Sort] The results follow.
      20.0 20.0 20.6 21.1 21.7 21.8 22.1 22.5 23.4 23.6
      23.8 23.8 23.9 24.0 24.4 24.7 24.8 24.8 24.9 24.9
      25.0 25.1 25.2 25.3 25.4 25.6 25.6 25.8 26.0 26.0
      26.3 26.6 26.7 26.9 27.1 27.2 27.5 28.0
  • 123. If you decide to have leaf units of 0.1, each successive stem unit will be 10 × 0.1 = 1.0 higher than the previous one. Start by writing the stem units in a column followed by a vertical bar.
      20 |
      21 |
      22 |
      23 |
      24 |
      25 |
      26 |
      27 |
      28 |
  Then, go down the data and write the last digit of each number in the leaf position.
      20 | 0 0 6
      21 | 1 7 8
      22 | 1 5
      23 | 4 6 8 8 9
      24 | 0 4 7 8 8 9 9
      25 | 0 1 2 3 4 6 6 8
      26 | 0 0 3 6 7 9
      27 | 1 2 5
      28 | 0
  • 124. And, finally, add a title and leaf unit to complete the job.
      Stem-and-leaf diagram of example data. Leaf unit = 0.1
      20 | 0 0 6
      21 | 1 7 8
      22 | 1 5
      23 | 4 6 8 8 9
      24 | 0 4 7 8 8 9 9
      25 | 0 1 2 3 4 6 6 8
      26 | 0 0 3 6 7 9
      27 | 1 2 5
      28 | 0
  The stem-and-leaf diagram consists of two columns of numbers. The first column is called the stem. The second column contains the leaves; one leaf for each data point. The value of any number in a leaf position is indicated by the leaf unit, 0.1 in this example. Any number in
  • 125. a leaf position represents that number multiplied by the leaf unit 0.1. In the first row of the diagram, the 0 stands for 0 × 0.1 = 0.0, and the 6 stands for 6 × 0.1 = 0.6. The value of a number in the stem position is that number multiplied by 10 × leaf unit, i.e. by 1 in this case. In the last row, the 28 stands for 28 × 1 = 28. The final value of any leaf is calculated by adding the leaf value to the corresponding stem value. The 0 in the last row represents the number 0 × 0.1 + 28 × 1 = 28.0. The third leaf in stem position 21 represents 8 × 0.1 + 21 × 1 = 21.8.
  Example 7: How to draw a frequency (or relative frequency) polygon.
  In this example, midpoints for Samples 1 and 2 are stored in column A, relative frequencies from Sample 1 are stored in column B, and relative frequencies from Sample 2 are stored in column C of an EXCEL worksheet. In order to compare the two samples, it will be useful to plot relative frequencies for both samples on the same graph. Here are columns A, B, and C of an example worksheet.
      A     B         C
      20    0.0357    0.0000
  • 126. 21    0.1429    0.0270
      22    0.2143    0.1081
      23    0.1786    0.1081
      24    0.2500    0.1622
      25    0.1071    0.2162
      26    0.0714    0.1892
      27    0.0000    0.1081
      28    0.0000    0.0811
  [Excel 2013: highlight the data. Insert Tab – Charts and choose SCATTER, then click 2D ‘Straight Line with Markers’]. The resulting graph will look like:
  However, you will want to edit the graph. Click the + sign to edit the chart. Choose Axes and move the cursor over until the little right arrow appears, then choose More Options and then click on the histogram picture.
  The resulting graph will now have better representation. Remember to label your chart title and axes appropriately (not shown in chart below).
  • 127. You can now edit the Axis. Change the minimum Bounds to 19 and the maximum Bounds to 29. Then change the Major Units to 1.0.
  Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample).
  In this example, the Function command is used to calculate various constant values to be stored in cells in the worksheet. [Excel 2013: Formulas Tab – Insert Function (fx)]. There are many different functions that can be used. Some refer to whole columns, some to individual observations. The following examples demonstrate a few of the uses of functions in EXCEL. You can type the function into any particular cell by first typing an equal sign in the formula bar and then typing the name of the function along with its required arguments. As an alternative, you can use the [Excel 2013: Formulas Tab – Insert Function (fx)] to choose a function and have EXCEL prompt you for necessary arguments. In this course, you would probably choose Function category = Statistical and then double
  • 128. click on the Function name for the function you want to use. For this example, consider that there are 22 observations stored in column A.
  a) Determine the number of data points in the population.
      =COUNT(A:A)
  b) Calculate the mean (= sum of all observations divided by number of observations).
      =SUM(A1:A22)/COUNT(A1:A22)
      =AVERAGE(A1:A22)
  c) Determine the minimum in this population (the first value in a magnitude array). If the data have been sorted from smallest to largest, the smallest (minimum) value will be in the first position, cell A1, and the largest will be located in the last position, cell A22 in this example.
      =MIN(A1:A22)
  d) Determine the maximum in this population (the last value in a magnitude array).
      =MAX(A1:A22)
  e) Determine the median (the middle value in a magnitude array). For an odd number of data points, the median is the middle value. If the number of data points n is even, the median is the average of the two middle values.
      =MEDIAN(A1:A22)
  f) Determine the first quartile.
  • 129. The first quartile is that value below which one-quarter of the observations lie. Because there is no generally accepted definition of quartile, different programs give different results for quartiles. ISLeX is programmed to calculate quartiles in the same way that Excel does.
      =QUARTILE(A1:A22,1)
  g) Determine the third quartile. The third quartile is that value below which three-quarters of the observations lie.
      =QUARTILE(A1:A22,3)
  NOTE: The median is sometimes referred to as the second quartile (Q2) because it is the value below which 2/4 of the values lie. The first quartile (Q1), the median (Q2) and the third quartile (Q3) divide the data values into four groups. We know that 1/4 of the data values are less than Q1, 1/4 are between Q1 and Q2, 1/4 are between Q2 and Q3, and 1/4 are greater than Q3. For some purposes, it may be sufficient to summarize a large data set by presenting these three values.
  h) Determine the standard deviation.
  • 130. The standard deviation is the square root of the variance, and the variance is the average of the squares of differences between individual data points and the overall mean. Remember that the standard deviation of a population is calculated differently from the standard deviation of a sample. It is important to know whether you have a sample or a population.
      =STDEV.S(A1:A22) for a sample
      =STDEV.P(A1:A22) for a population
  Example worksheet (column A holds the data; results and the formulas used appear beside rows 2 to 9):
      23
      20    22          Uses =COUNT(A1:A22) to count number of observations
      29    22.77273    Uses =SUM(A1:A22)/COUNT(A1:A22) to calculate average
      29    16          Uses =MIN(A1:A22) to calculate the minimum value
      27    30          Uses =MAX(A1:A22) to calculate maximum value
      23    23          Uses =MEDIAN(A1:A22) to calculate median value
      17    19          Uses =QUARTILE(A1:A22,1) to calculate first quartile
      17    27.75       Uses =QUARTILE(A1:A22,3) to calculate third quartile
      22    4.669372    Uses =STDEV.S(A1:A22) to calculate standard deviation for a sample
      Remaining values in column A: 23 25 21 21 18 16 21 24 19 27 19 25 24
  • 131. Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
  The Descriptive statistics command of EXCEL will automatically calculate most of the summary statistics required of data in a single column [Excel 2013: Data Tab – Data Analysis and then choose Descriptive Statistics]. By listing several columns, the Descriptive statistics command can be applied to several columns simultaneously.
  Consider that data have been stored in column A. To calculate summary statistics for this column, follow these steps.
  Excel 2013: Data Tab and choose Data Analysis (on right)
  Double click on Descriptive statistics in the Data Analysis dialog box
  Set Input range to = A:A (or just highlight the data with the cursor)
  Click on Summary statistics
  • 132. Click on OK
  Your results will be on a new worksheet and will look like this (move column borders to see full text).
      Column1
      Mean                  23.90909
      Standard Error        1.038041
      Median                23.5
      Mode                  #NUM!
      Standard Deviation    4.868843
      Sample Variance       23.70563
      Kurtosis              -1.32235
      Skewness              -0.11628
      Range                 14
      Minimum               16
      Maximum               30
      Sum                   526
      Count                 22
  This approach gives many of the summary statistics described in the preceding example as well as several others. The #NUM! message means only that there are several possible values for the mode in this data set.
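Several of the summary statistics from Examples 8 and 9 can be cross-checked outside EXCEL with Python's standard `statistics` module. The sketch below uses eight hypothetical values, not the worksheet data shown in the examples. It assumes that the "inclusive" quantile method corresponds to the interpolation used by Excel's QUARTILE function, and the function name `describe` is made up for this illustration.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def describe(data):
    """Summary statistics similar to those discussed in Examples 8 and 9."""
    n = len(data)
    sd = stdev(data)                      # sample standard deviation
    # "inclusive" interpolates between the sorted values (assumed here to
    # agree with Excel's QUARTILE function)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "count": n,
        "sum": sum(data),
        "mean": mean(data),
        "median": median(data),
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
        "first_quartile": q1,
        "third_quartile": q3,
        "standard_deviation": sd,
        "sample_variance": sd ** 2,
        "standard_error": sd / sqrt(n),   # standard deviation / sqrt(count)
    }


data = [2, 4, 4, 5, 7, 9, 10, 12]         # hypothetical observations
summary = describe(data)
for name, value in summary.items():
    print(name, value)
```

Kurtosis, skewness and the mode are left out to keep the sketch short.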
  • 133. Example 10: Further uses of EXCEL->As a calculator
  EXCEL can also be used as a calculator. The following statement would allow you to calculate 5.6 - 3.2 = 2.4 and store it in a cell in the EXCEL worksheet. It is important to start your formula with an “=”; otherwise the entry is treated as text and is not calculated.
      =5.6-3.2
  If 5.6 was stored in cell D3 and 3.2 was stored in cell D4, you could also use
      =D3-D4
  The second option may be useful if 5.6 and 3.2 are to be used in other calculations. This same scheme may be used for all elementary mathematical operations.
      Use - to indicate subtraction [ = 5.6 - 3.2]
      Use + to indicate addition [ = 5.6 + 3.2]
      Use * to indicate multiplication [ = 5.6 * 3.2]
      Use / to indicate division [ = 5.6 / 3.2]
      Use POWER to indicate exponentiation [ = POWER(5.6, 3.2)]
  • 134. Example 11: Calculations with a discrete probability distribution
  In this example, EXCEL is used to answer various questions dealing with a discrete probability distribution. EXCEL worksheet column A contains the event names and column B contains the corresponding probabilities. In PL SC 314, we will discuss only events that represent counts; e.g. number of seeds germinated, number of red blood cells, number of live plantlets, number of microbial colonies, et cetera.
      A     B
      0     0.018316
      1     0.073263
      2     0.146525
      3     0.195367
      4     0.195367
      5     0.156293
      6     0.104196
      7     0.059540
      8     0.029770
      9     0.013231
      10    0.005292
      11    0.001925
      12    0.000642
      13    0.000197
  • 135. 14    0.000056
      15    0.000015
      16    0.000004
      17    0.000001
      18    0.000000
      19    0.000000
      20    0.000000
  Suppose one were interested in the probability of exactly 10 in this distribution. This can be read directly from column B in the row position corresponding to A = 10. Thus, P(X = 10) = 0.005292.
  A powerful way of calculating the probabilities of compound events is to sum parts of the probability table. Suppose you want the probability of less than 13. You must add the probabilities for 0, 1, ..., 12. Those probabilities are in cells B1:B13. To calculate the probability, you could move to cell C1 and enter the formula = SUM(B1:B13). In this example, the probability of less than 13 is 0.99973 or 99.973 percent.
  Note that terms such as 'less than 13' and 'fewer than 13' include all possible values from the smallest up to, but excluding, 13. Similarly, 'more than 13' or 'greater than 13' would not include 13. Moreover, the term 'between 5 and 10' would include 6, 7, 8 and 9, and would exclude 5 and 10.
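The summing of parts of the probability table can also be sketched in Python, using the probabilities listed in this example. The helper `probability_of` is a made-up name. Note that the list index equals the event value, whereas in the worksheet the value x sits in row x + 1.

```python
# Probabilities for X = 0, 1, ..., 20, copied from column B of the example.
probabilities = [
    0.018316, 0.073263, 0.146525, 0.195367, 0.195367, 0.156293, 0.104196,
    0.059540, 0.029770, 0.013231, 0.005292, 0.001925, 0.000642, 0.000197,
    0.000056, 0.000015, 0.000004, 0.000001, 0.000000, 0.000000, 0.000000,
]


def probability_of(events):
    """Sum the probabilities of the listed values of X."""
    return sum(probabilities[x] for x in events)


p_exactly_10 = probabilities[10]
p_less_than_13 = probability_of(range(0, 13))       # 0 to 12; excludes 13
p_more_than_13 = probability_of(range(14, 21))      # 14 to 20; excludes 13
p_between_5_and_10 = probability_of(range(6, 10))   # 6, 7, 8 and 9 only

print(p_exactly_10)
print(round(p_less_than_13, 5))
print(p_more_than_13)
print(p_between_5_and_10)
```

Writing the range endpoints out explicitly is a useful check on whether a phrase such as 'less than' or 'between' includes the boundary value.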
  • 136. However, ‘no more than 13’ would include 13. ‘At least 13’ would include 13 and all higher values. The following three examples show other questions that can be dealt with in this general manner.
  a) P[10 < X < 21] = ?
      = P(11) + P(12) + P(13) + P(14) + … + P(20). P(11) is listed in row 12 of column B while P(20) is listed in row 21 of column B.
      = SUM(B12:B21) = 0.0028398
  b) P[(X < 6) or (X > 14)] = ?
      In this example, calculate P(0) + P(1) + … + P(5) + P(15) + P(16) + … + P(20)
      = SUM(B1:B6)+SUM(B16:B21) = 0.78515
  c) P[X > 0] = ?
      = SUM(B2:B21) = 0.98168 or = 1 - B1 = 0.98168
  In order to calculate the mean of a probability distribution, one must use the methods for calculating the mean of a relative frequency distribution. The