This chapter begins our study of sampling. Sampling is a process of selecting items from a population so we can use the information to make judgments or inferences about the population.
There are many practical reasons why we prefer to select portions or samples of a population to observe and measure when studying characteristics of a population.
These are not the only methods of sampling available to a researcher. If you become involved in a major research project, consult books devoted to sample theory and sample design.
In an unbiased sample, all members of the population have the same chance of being selected for the sample. If the population is relatively small, one could place all the items of interest, say names, in a box and draw the sample from it. For a large population like the one in this example, this is very time-consuming.
Other ways to select a sample are to use a table of random numbers, like the one in Appendix B.4, or to use a statistical software program or Excel to select a simple random sample. This example shows how to select random numbers using a portion of a random number table. To choose a starting point, you could close your eyes and simply point at a number in the table. Another way is to randomly pick a column and a row. Suppose the time is 3:04; go to column three and down to row four. Here, the number is 03759, so we use the first three digits of the five-digit number, 037. To continue selecting, you could move in any direction; in this example, we move right. The next number is 44723, so we choose player 447. The next number begins with 961, which is too high, so we skip it. We also skip the number after that, which begins with 784, because it is too high as well. The next number is 18910, so we select player 189, and so on until we have a list of 10 players. The instructions on how to use Excel to select a random sample are in Appendix C.
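The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_from_table is made up for this sketch, the values are the leading three digits quoted in the example, and the population size of 500 players is hypothetical (the notes only tell us that 447 is a valid player number and 784 is too high).

```python
def select_from_table(three_digit_values, population_size, sample_size):
    """Walk through random-table values, keeping valid, unused member numbers."""
    selected = []
    for value in three_digit_values:
        if value < 1 or value > population_size:
            continue  # no member has this number (too high, or zero), so skip it
        if value in selected:
            continue  # already chosen: we sample without replacement
        selected.append(value)
        if len(selected) == sample_size:
            break
    return selected


# Leading three digits read from the table in the example: 037, 447, 961, 784, 189.
table_values = [37, 447, 961, 784, 189]

# ASSUMPTION: 500 players is a made-up population size used only for illustration.
players = select_from_table(table_values, population_size=500, sample_size=10)
print(players)  # [37, 447, 189]
```

With only five table values the list stops at three players; reading further along the row would continue until 10 are selected. When no printed table is required, random.sample(range(1, population_size + 1), 10) from the Python standard library draws a simple random sample directly.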
Caution: If the population is in some order already, like invoices arranged in increasing dollar amounts, the systematic procedure should not be used.
This example of stratified random sampling groups the firms by profitability, measured by the percent return on equity. The advantage of stratified random sampling is that it may reflect the characteristics of the population more accurately, because at least one firm is selected from each stratum, even from strata with few members. That might not happen when using simple random sampling or systematic random sampling.
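A minimal sketch of stratified random sampling, assuming proportional allocation with at least one item per stratum. The function name stratified_sample, the firm labels, and the stratum sizes are all hypothetical; they are not the firms from the example.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample inside each stratum, sized by its share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = total_sample_size * len(members) / population_size
        # At least one item per stratum, and never more than the stratum holds.
        k = min(len(members), max(1, round(share)))
        sample[name] = rng.sample(members, k)
    return sample


# ASSUMPTION: hypothetical firms grouped by return on equity (8, 35, and 7 firms).
strata = {
    "high": [f"firm_H{i}" for i in range(1, 9)],
    "medium": [f"firm_M{i}" for i in range(1, 36)],
    "low": [f"firm_L{i}" for i in range(1, 8)],
}

chosen = stratified_sample(strata, total_sample_size=10, seed=1)
for name, firms in chosen.items():
    print(name, len(firms))  # high 2, medium 7, low 1
```

Because each stratum's count is rounded separately, the total can differ slightly from the requested sample size for other stratum sizes.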
In this example, we wish to determine the views of residents in the greater Chicago area about state and federal environmental protection policies. Selecting a random sample of residents in the region and personally contacting each one would be time-consuming and expensive. Instead, let the counties serve as the primary units: select a random sample of the counties, and then select a random sample of residents from each selected county. This is a combination of cluster sampling and simple random sampling.
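The two-stage idea can be sketched as follows. The function name two_stage_sample, the county labels, and the resident IDs are hypothetical and stand in for real county rosters.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample members within each chosen cluster."""
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen_clusters
    }


# ASSUMPTION: six made-up counties with 100 made-up resident IDs each.
counties = {
    f"county_{c}": [f"{c}-{i}" for i in range(1, 101)]
    for c in "ABCDEF"
}

sample = two_stage_sample(counties, n_clusters=3, n_per_cluster=5, seed=42)
for county, residents in sample.items():
    print(county, residents)
```

Only the residents of the three selected counties need to be contacted, which is where the savings in time and cost come from.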
In this example, the population is the number of rooms rented on each of the 30 days in June. The sampling errors are 3.80 - 3.13 = 0.67, 3.40 - 3.13 = 0.27, and 1.80 - 3.13 = -1.33. Notice that the sampling error may be a positive value or a negative value. Using the combination formula, we find there are 142,506 possible samples of size 5 from a population of 30 values. If you summed the sampling errors for all 142,506 samples, the result would equal 0. This is because the sample mean is an unbiased estimator of the population mean.
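The arithmetic above can be checked with a short Python sketch. The variable names are illustrative; the numbers are the ones quoted in the example.

```python
import math

population_mean = 3.13
sample_means = [3.80, 3.40, 1.80]

# Sampling error = sample mean minus population mean (rounded to cents of a room).
sampling_errors = [round(x_bar - population_mean, 2) for x_bar in sample_means]
print(sampling_errors)  # [0.67, 0.27, -1.33]

# Number of distinct samples of size 5 from 30 values: 30! / (5! * 25!).
print(math.comb(30, 5))  # 142506
```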
We can expect there to be sampling error between sample standard deviations and the corresponding population standard deviation as well.
In the next few slides, we’ll use the sampling distribution of the sample mean to help explain how we can rely on sample estimates.
Here is an example using a small population of just 7 to highlight the relationship between the population mean and the various sample means.
We learned how to calculate a population mean in chapter 3; it is designated with the Greek letter mu, μ.
This is Table 8-3 in the text and is a table of all samples of size 2 taken from the population of the 7 Tartus Industries production employees. There are 21 possible samples and the sample mean of hourly earnings has been calculated for each.
This is Table 8-4 in the textbook and is a probability distribution called the sampling distribution of the sample mean for n=2.
The mean of the population is exactly equal to the mean of the sample means. The sample means range from $14 to $17, while the population values range from $14 to $18. The shape of the sampling distribution of the sample mean and the shape of the frequency distribution of the population values are different; as we see in Chart 8-2, the distribution of the sample mean tends to be more bell-shaped and to approximate the normal distribution.
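These observations can be reproduced by enumerating every sample of size 2 from a population of 7. The hourly earnings below are an assumption: they are illustrative values chosen to be consistent with the ranges quoted above ($14 to $18 for the population, $14 to $17 for the sample means), not a copy of the textbook table.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# ASSUMPTION: illustrative hourly earnings for 7 employees.
hourly_earnings = [14, 14, 16, 16, 14, 16, 18]

population_mean = mean(hourly_earnings)

# Every possible sample of size 2 (order does not matter) and its mean.
sample_means = [mean(pair) for pair in combinations(hourly_earnings, 2)]
print(len(sample_means))  # 21

# The sampling distribution of the sample mean: each value and how often it occurs.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value], round(distribution[value] / len(sample_means), 4))

print(population_mean, mean(sample_means))     # the two means agree
print(min(sample_means), max(sample_means))    # narrower range than the population
```

The sample means cluster in the middle of the range, which is why the sampling distribution looks more bell-shaped than the population itself.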
The Central Limit Theorem is one of the most useful conclusions in statistics and is true for all population distributions. This allows us to use the normal probability distribution to create confidence intervals for the population mean (chapter 9) and perform tests of hypothesis (chapter 10).
This is Chart 8-3 from the textbook; at the top of the chart there are 4 different population distributions. Following each of these downward, we observe that the sampling distributions of the sample mean appear more normal as the sample size increases. In other words, we observe the convergence to a normal distribution regardless of the shape of the population distribution.
Even if we do not select all samples, we can expect the mean of the distribution of the sample means to be close to the population mean. If the population standard deviation is sigma, the standard deviation of the distribution of sample means, called the standard error of the mean, is sigma divided by the square root of n. Note that when the sample size is increased, the standard error of the mean decreases.
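A small sketch of the standard error calculation shows how quickly it shrinks as n grows. The function name standard_error and the value sigma = 0.4 are illustrative assumptions.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


sigma = 0.4  # ASSUMPTION: an illustrative population standard deviation
for n in (4, 16, 64):
    print(n, standard_error(sigma, n))
# 4 0.2
# 16 0.1
# 64 0.05
```

Quadrupling the sample size cuts the standard error in half, because n enters through its square root.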
Use this formula when the population standard deviation is known or is assumed to be known.
First we find z, then we use the table in Appendix B.3 or statistical software to determine the probability. In this example, we find that it is unlikely, less than a 4% chance, that we could select a sample of 16 observations from a normal population with a mean of 31.2 ounces and a population standard deviation of 0.4 ounce and find the sample mean equal to or greater than 31.38 ounces. We conclude the process is putting too much cola in the bottles. The quality technician should see the production supervisor about reducing the amount of soda in each bottle.
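The calculation can be sketched with the Python standard library in place of the printed table. The variable names are illustrative; the figures are the ones given in the example, and the z formula assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

mu = 31.2      # population mean, in ounces
sigma = 0.4    # population standard deviation, in ounces
n = 16         # sample size
x_bar = 31.38  # observed sample mean, in ounces

# z = (sample mean - population mean) / (sigma / sqrt(n))
z = (x_bar - mu) / (sigma / sqrt(n))

# Upper-tail probability: P(sample mean >= 31.38) under the standard normal curve.
p_upper = 1 - NormalDist().cdf(z)

print(round(z, 2))        # 1.8
print(round(p_upper, 4))  # 0.0359
```

A probability of about 0.036 is the "less than a 4% chance" referred to above.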