In any sampling problem there is a relevant population, the set of all members about which the study intends to make inferences.
Before we select a sample from a given population, we typically need a list of all members of the population. This list is called the frame, and the potential sample members are called sampling units.
There are two types of samples: probability samples and judgmental samples.
The idea is very simple. We first generate a column of random numbers in column C. Then we sort the rows according to the random numbers and choose the first 10 families in the sorted rows.
The following procedure produces the results.
Random numbers. Enter the formula =RAND() in cell C10 and copy it down column C.
Replace with values. To enable sorting, we must “freeze” the random numbers - that is, replace their formulas with values. To do this, select the range C10:C49, use Edit/Copy, and then use Edit/Paste Special with the Values option.
Copy to a new range. Copy the range A10:C49 to the range E10:G49.
Sort. Select the range E10:G49 and use the Data/Sort menu item. Sort according to the Random # column in ascending order. Then the 10 families with the 10 smallest random numbers are the ones in the sample.
Means. Use the AVERAGE, MEDIAN and STDEV functions in row 6 to calculate summary statistics of the first 10 incomes in column F.
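The same sort-by-random-number idea can be sketched outside the spreadsheet. The following Python is only an illustration: the function name simple_random_sample, the seeds, and the 40 made-up incomes are assumptions, not the data from the worksheet.

```python
import random
import statistics

def simple_random_sample(units, sample_size, seed=None):
    """Select a simple random sample by sorting units on random keys."""
    rng = random.Random(seed)
    # Attach one random number in [0, 1) to each unit (the "Random #" column).
    keyed = [(rng.random(), unit) for unit in units]
    # Sort in ascending order of the random number.
    keyed.sort(key=lambda pair: pair[0])
    # The units with the smallest random numbers form the sample.
    return [unit for _, unit in keyed[:sample_size]]

# Hypothetical frame: 40 families with made-up incomes (not the worksheet data).
income_rng = random.Random(0)
frame = [(family, income_rng.randint(20000, 90000)) for family in range(1, 41)]

sample = simple_random_sample(frame, sample_size=10, seed=1)
incomes = [income for _, income in sample]

print("mean:", statistics.mean(incomes))
print("median:", statistics.median(incomes))
print("stdev:", statistics.stdev(incomes))  # sample standard deviation
```

Because the keys are generated once and then sorted, there is no separate "freeze" step here; that step is needed in the spreadsheet only because RAND() recalculates.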
To select small accounts only, insert a blank row after account 150 (the last small account).
Then, with the cursor anywhere in the small account data set, use the StatPro/Statistical Inference/Generate Random Samples menu item, enter 50 and 15 as the number of samples and the sample size, and put the results in a new sheet.
To find the amounts owed for the sampled accounts, enter the formula =VLOOKUP(B3,Data!Data,4) in cell B21 and copy it to the range B21:AY35.
Then calculate the averages in row 37 with the AVERAGE function, and transpose this row of averages to a column of averages in BA4:BA53 by entering the formula =TRANSPOSE(B37:AY37) and pressing Ctrl-Shift-Enter.
Use StatPro’s histogram procedure to create a histogram of these averages. The histogram will look different each time because of the random numbers selected.
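As a rough illustration of what the 50 samples of size 15 produce, the sketch below draws repeated samples and records each sample mean. The amounts owed are made up, the function name sample_means is hypothetical, and drawing without replacement within each sample is an assumption about how the samples are generated.

```python
import random
import statistics

def sample_means(values, num_samples, sample_size, seed=None):
    """Return the mean of each of num_samples random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Assumption: each sample is drawn without replacement.
        chosen = rng.sample(values, sample_size)
        means.append(statistics.mean(chosen))
    return means

# Hypothetical amounts owed for 150 small accounts (not the worksheet data).
data_rng = random.Random(0)
amounts_owed = [round(data_rng.uniform(10, 500), 2) for _ in range(150)]

means = sample_means(amounts_owed, num_samples=50, sample_size=15, seed=1)

print("number of sample means:", len(means))
print("smallest and largest mean:", min(means), max(means))
print("population mean:", statistics.mean(amounts_owed))
```

Changing the seed gives a different set of 50 means, which is why the histogram differs from run to run.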
To see which age category each customer falls in, we enter the formula =IF(B11<=$D$6,1,IF(B11<=$D$7,2,3)) in cell C11 and then copy it down column C.
Next, it is useful to “unstack” the data into three groups, one for each age category.
It is easy to unstack the data in columns A-C.
With the cursor anywhere in A10:C1010, select StatPro/Data Utilities/Unstack Variables. Select Category as the Code variable, select Cust and Age as the variables to unstack, and accept the default location for the unstacked variables.
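The categorize-then-unstack steps can be sketched in Python as follows. The two cutoffs (30 and 50) and the six customers are made up for illustration; in the worksheet the cutoffs come from cells D6 and D7, whose values are not given here.

```python
from collections import defaultdict

def age_category(age, cutoff_1, cutoff_2):
    """Mirror the nested IF: 1 if age <= cutoff_1, 2 if age <= cutoff_2, else 3."""
    if age <= cutoff_1:
        return 1
    if age <= cutoff_2:
        return 2
    return 3

def unstack(customers, cutoff_1, cutoff_2):
    """Group (customer, age) pairs by age category."""
    groups = defaultdict(list)
    for cust_id, age in customers:
        groups[age_category(age, cutoff_1, cutoff_2)].append((cust_id, age))
    return dict(groups)

# Hypothetical customers as (Cust, Age) pairs; cutoffs are assumed values.
customers = [(1, 23), (2, 35), (3, 61), (4, 30), (5, 47), (6, 52)]
groups = unstack(customers, cutoff_1=30, cutoff_2=50)

for category in sorted(groups):
    print(category, groups[category])
```

Once the data are grouped this way, a separate random sample can be drawn from each category.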
The advantage of cluster sampling is sampling convenience (and possibly less cost).
It is straightforward to select a cluster sample. The key is to define the sampling units as the clusters, then select a simple random sample of clusters. Then sample all the population members in each selected cluster.
When all sampling units within each selected cluster are taken, the scheme is called a single-stage sampling scheme.
Real applications are often more complex and result in multistage sampling schemes.
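A minimal sketch of single-stage cluster sampling, assuming the population is already organized as a mapping from cluster name to members. The cluster names, members, and the function name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Pick a simple random sample of clusters, then take every member of each."""
    rng = random.Random(seed)
    # The sampling units are the clusters themselves.
    chosen_ids = rng.sample(sorted(clusters), num_clusters)
    # Single stage: include all members of every selected cluster.
    members = [member for cid in chosen_ids for member in clusters[cid]]
    return chosen_ids, members

# Hypothetical population grouped into four clusters (for example, city blocks).
clusters = {
    "block_A": ["a1", "a2", "a3"],
    "block_B": ["b1", "b2"],
    "block_C": ["c1", "c2", "c3", "c4"],
    "block_D": ["d1", "d2"],
}

chosen_ids, members = cluster_sample(clusters, num_clusters=2, seed=1)
print("selected clusters:", chosen_ids)
print("sampled members:", members)
```

A multistage scheme would add a further random selection inside each chosen cluster instead of taking every member.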
What does this experiment have to do with random sampling?
Here, the population is the set of all outcomes we could obtain from a single spin of the wheel; that is, all dollar values from $0 to $1000. Each spin results in one randomly sampled dollar value from the population.
Furthermore, because we have assumed that the wheel is equally likely to land in any position, all possible values in the continuum from $0 to $1000 have the same chance of occurring.
The experiment simulates n spins of the wheel and calculates the average - that is, the winnings - from the n spins.
Based on these 1000 replications, we can then calculate the average winnings, the standard deviation of winnings, and a histogram of winnings for each n. These will show clearly how the distribution of winnings depends on n.
The following slide shows the results for n=1. Here, there is no averaging.
To replicate the experiment 1000 times and collect statistics, we proceed as follows.
Random outcomes. To generate outcomes uniformly distributed between $0 and $1000, we enter the formula =$B$3+RAND()*($B$4-$B$3) in cell B11 and copy it down column B. Here B3 and B4 hold the lower and upper limits, $0 and $1000, so the effect of this formula is to generate a random number between 0 and 1 and multiply it by $1000.
Summary measures. Calculate the average and standard deviation of the 1000 winnings in column B with the AVERAGE and STDEV functions. These values appear in cells E4 and E5.
The histogram is now triangular shaped - symmetric, but not yet bell shaped.
To develop similar simulations for n=3, n=6, n=10, or any other n, we insert additional outcome columns and make sure that the AVERAGE formula in the Winnings column averages all n outcomes to its left.
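The whole experiment, for any n, can be sketched in a few lines of Python. This is an illustration rather than a reproduction of the worksheet: the function name simulate_winnings and the seeds are assumptions, and each spin is generated as low + (random number between 0 and 1) * (high - low).

```python
import random
import statistics

def simulate_winnings(n, replications=1000, low=0.0, high=1000.0, seed=None):
    """Return the average of n uniform spins for each replication."""
    rng = random.Random(seed)
    winnings = []
    for _ in range(replications):
        # One row of the worksheet: n outcomes, then their average.
        spins = [low + rng.random() * (high - low) for _ in range(n)]
        winnings.append(statistics.mean(spins))
    return winnings

for n in (1, 2, 3, 6, 10):
    w = simulate_winnings(n, seed=n)
    print(
        "n =", n,
        "average:", round(statistics.mean(w), 1),
        "stdev:", round(statistics.stdev(w), 1),
    )
```

The average winnings should stay near $500 for every n, while the standard deviation shrinks as n grows, which is what the histograms show.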