1.
Session 2 - Learning Objectives <ul><li>By the end of today’s session you should be able to: </li></ul><ul><li>Describe key Sampling Terminology </li></ul><ul><li>Perform a simple random sample using Excel and StatPro </li></ul><ul><li>Differentiate between Systematic and Stratified Sampling </li></ul>
2.
Example 8.1 Selecting a Sample of Families to Analyze Annual Incomes Methods for Selecting Random Samples
3.
Objective To illustrate how Excel’s random number function, RAND, can be used to generate simple random samples.
4.
RANDSAMP.XLS <ul><li>This file contains data about the annual incomes of 40 families. </li></ul><ul><li>We want to choose a simple random sample of size 10 from this frame. </li></ul><ul><li>How can this be done? </li></ul><ul><li>And how do summary statistics of the chosen families compare to the corresponding summary statistics of the population? </li></ul>
6.
Sampling Terminology <ul><li>In any sampling problem there is a relevant population, the set of all members about which the study intends to make inferences. </li></ul><ul><li>Before we select a sample from a given population, we typically need a list of all members of the population. This list is called the frame, and the potential sample members are called sampling units. </li></ul><ul><li>There are two types of samples: probability samples and judgmental samples. </li></ul>
7.
Sampling Terminology -- continued <ul><li>A probability sample is a sample in which the sampling units are chosen from the population by means of a random mechanism such as a random number table. </li></ul><ul><li>No formal random mechanism is used to select a judgmental sample; in this case the sampling units are chosen according to the sampler’s judgment. </li></ul><ul><li>The simplest type of sampling scheme is appropriately called simple random sampling. </li></ul>
8.
Solution <ul><li>The idea is very simple. We first generate a column of random numbers in column C. Then we sort the rows according to the random numbers and choose the first 10 families in the sorted rows. </li></ul><ul><li>The following procedure produces the results. </li></ul><ul><ul><li>Random numbers. Enter the formula =RAND() in cell C10 and copy it down column C. </li></ul></ul><ul><ul><li>Replace with values. To enable sorting we must “freeze” the random numbers - that is, replace their formulas with values. To do this, select the range C10:C49, use Edit/Copy, and then use Edit/Paste Special with the Values option. </li></ul></ul>
9.
Solution -- continued <ul><ul><li>Copy to a new range. Copy the range A10:C49 to the range E10:G49. </li></ul></ul><ul><ul><li>Sort. Select the range E10:G49 and use the Data/Sort menu item. Sort according to the Random # column in ascending order. Then the 10 families with the 10 smallest random numbers are the ones in the sample. </li></ul></ul><ul><ul><li>Means. Use the AVERAGE, MEDIAN and STDEV functions in row 6 to calculate summary statistics of the first 10 incomes in column F. </li></ul></ul>
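The sort-by-random-number procedure above can be sketched outside Excel as follows. This is a minimal illustration in Python, not the workbook itself: the income figures are hypothetical placeholders rather than the RANDSAMP.XLS data, and the function name and `seed` parameter are assumptions introduced for the example.

```python
import random
import statistics

def simple_random_sample(frame, n, seed=None):
    """Attach a random number to each unit, sort by it, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), unit) for unit in frame]  # like =RAND() in column C
    keyed.sort(key=lambda pair: pair[0])              # like Data/Sort, ascending
    return [unit for _, unit in keyed[:n]]

# Hypothetical incomes for 40 families (not the values in RANDSAMP.XLS).
incomes = [30000 + 1250 * i for i in range(40)]
sample = simple_random_sample(incomes, 10, seed=1)

# Compare sample statistics with the corresponding population statistics.
print("sample:    ", statistics.mean(sample), statistics.median(sample),
      statistics.stdev(sample))
print("population:", statistics.mean(incomes), statistics.median(incomes),
      statistics.stdev(incomes))
```

Because every ordering of the random numbers is equally likely, each set of 10 families has the same chance of ending up in the first 10 rows, which is what makes this a simple random sample.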
11.
More Random Samples Automatically <ul><li>If we would like more random samples of size 10, we would need to repeat the process for each one. </li></ul><ul><li>To save you the trouble, we have set up a macro to automate the process. See the Automated sheet of the RANDSAMP.XLS file. By clicking on the button we get a different random sample. </li></ul>
12.
Example 8.2 Sampling from Accounts Receivable at Spring Mills Company Methods for Selecting Random Samples
13.
Objective To illustrate StatPro’s method of choosing simple random samples, and how the associated data can be found with Excel’s lookup functions.
14.
RECEIVE.XLS <ul><li>This file contains 280 accounts receivable for the Spring Mills Company. There are three variables: </li></ul><ul><ul><li>Size: customer size (small, medium, large), depending on its volume of business with Spring Mills </li></ul></ul><ul><ul><li>Days: number of days since the customer was billed </li></ul></ul><ul><ul><li>Amount: amount of the bill </li></ul></ul><ul><li>Generate 50 random samples of size 15 each from the small customers only, calculate the average amount owed in each random sample, and construct a histogram of these 50 averages. </li></ul>
16.
Solution <ul><li>To select small accounts only, insert a blank row after account 150 (the last small account). </li></ul><ul><li>Then, with the cursor anywhere in the small account data set, use the StatPro/Statistical Inference/Generate Random Samples menu item, enter 50 and 15 as the number of samples and the sample size, and put the results in a new sheet. </li></ul><ul><li>To find the amounts owed for the sampled accounts, enter the formula =VLOOKUP(B3,Data!Data,4) in cell B21 and copy it to the range B21:AY35. </li></ul>
17.
Solution -- continued <ul><li>Then calculate the averages in row 37 with the AVERAGE function and transpose this row of averages to a column of averages in BA4:BA53. </li></ul><ul><li>Use StatPro’s histogram procedure to create a histogram. Each histogram will look different because of the random numbers selected. </li></ul>
18.
Solution -- continued <ul><li>The histogram indicates the variability of sample means we might obtain by selecting many different random samples of size 15 from this population of small customer accounts. </li></ul>
19.
Solution -- continued <ul><li>This histogram, which is approximately bell shaped, approximates the sampling distribution of the sample mean. </li></ul>
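The steps in this example (draw 50 samples of 15 accounts, look up each amount, average, and bin the averages) can be sketched in Python as below. This is an illustration under stated assumptions, not StatPro's procedure: the account amounts are randomly generated placeholders rather than the RECEIVE.XLS data, sampling is assumed to be without replacement, and the helper names are hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical stand-in for the 150 small accounts: account number -> amount.
accounts = {acct: round(rng.uniform(100, 500), 2) for acct in range(1, 151)}

def sample_mean_amounts(accounts, num_samples, sample_size, rng):
    """Return the mean amount owed in each of num_samples random samples."""
    ids = sorted(accounts)
    means = []
    for _ in range(num_samples):
        chosen = rng.sample(ids, sample_size)          # without replacement
        amounts = [accounts[acct] for acct in chosen]  # lookup, like VLOOKUP
        means.append(statistics.mean(amounts))
    return means

def bin_counts(values, num_bins=8):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # guard against identical values
    counts = [0] * num_bins
    for v in values:
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return counts

means = sample_mean_amounts(accounts, 50, 15, rng)
for count in bin_counts(means):
    print("#" * count)  # crude text histogram of the 50 sample means
```

Changing the seed gives a different set of 50 averages and therefore a different histogram, which mirrors the point made on the slides.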
20.
Example 8.3 Stratified Sampling from the Smalltown Population of Sears Credit Card Holders Methods for Selecting Random Samples
21.
Objective To illustrate how stratified sampling, with proportional sample sizes, can be implemented in Excel.
22.
STRATIFIED.XLS <ul><li>This file contains a frame of all 1000 people in the city of Smalltown who have Sears credit cards. </li></ul><ul><li>Sears is interested in estimating the average number of other credit cards these people own, as well as other information about their use of credit. </li></ul><ul><li>The company decides to stratify these customers by age, select a stratified sample of size 100 with proportional sample sizes, and then contact these 100 people by phone. </li></ul><ul><li>How might Sears proceed? </li></ul>
23.
Systematic Sampling <ul><li>A systematic sample provides a convenient way to choose the sample. </li></ul><ul><li>It works as follows: </li></ul><ul><ul><li>First, we calculate the sampling interval as the population size, 1000, divided by the sample size, 100, which gives 10. </li></ul></ul><ul><ul><li>Next, we use a random mechanism to choose a number between 1 and 10 (say, 7). </li></ul></ul><ul><ul><li>Then we choose the 7th name, the 17th name, the 27th name, and so on. The result is a systematic sample of size n=100. </li></ul></ul>
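The systematic procedure above can be written as a short Python sketch. The function name and the optional `start` parameter are assumptions added for illustration; the frame here is just the positions 1 to 1000, standing in for the list of names.

```python
import random

def systematic_sample(frame, n, start=None, rng=None):
    """Pick every k-th unit, where k = len(frame) // n, from a random start."""
    interval = len(frame) // n                 # 1000 // 100 = 10
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)       # random number in 1..interval
    # Positions are 1-based on the slide, so subtract 1 to index the list.
    positions = range(start, len(frame) + 1, interval)
    return [frame[p - 1] for p in positions][:n]

frame = list(range(1, 1001))                   # stand-in for 1000 names
sample = systematic_sample(frame, 100, start=7)
print(sample[:3], len(sample))
```

With `start=7` the sample begins with the 7th, 17th, and 27th entries and contains 100 units, matching the slide. Only the starting point is random; once it is chosen, the rest of the sample is determined.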
24.
Stratified Sampling <ul><li>Suppose we can identify various subpopulations within the total population. We call these subpopulations strata. </li></ul><ul><li>It makes sense to select a simple random sample from each stratum instead of from the entire population. This is called stratified sampling. </li></ul><ul><li>This method is particularly useful when there is considerable variation between the various strata but relatively little variation within a given stratum. </li></ul>
25.
Stratified Sampling -- continued <ul><li>To obtain a stratified random sample we must choose a total sample size n, and we must choose a sample size n_i for each stratum i. </li></ul><ul><li>There are many ways to choose these numbers but the most popular method is proportional sample sizes. </li></ul><ul><li>The advantage of proportional sample sizes is that they are very easy to determine. The disadvantage is that they ignore differences in variability among the strata. </li></ul>
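Proportional sample sizes give each stratum the same share of the sample that it has of the population, that is, n_i = n × N_i / N rounded to a whole number. A minimal Python sketch follows. The stratum counts are assumptions: 132 is suggested by the range E11:E142 used later in the deck, while 600 and 268 are invented so that the total is 1000.

```python
import math

def proportional_sizes(stratum_counts, total_n):
    """Allocate total_n across strata in proportion to stratum size."""
    population = sum(stratum_counts.values())
    # floor(x + 0.5) rounds halves up, like Excel's ROUND for positive values;
    # Python's built-in round() rounds halves to the nearest even number.
    return {name: math.floor(total_n * count / population + 0.5)
            for name, count in stratum_counts.items()}

# Hypothetical stratum counts (only 132 is hinted at by the source).
counts = {"18-30": 132, "31-62": 600, "63-80": 268}
sizes = proportional_sizes(counts, 100)
print(sizes)
```

Because each size is rounded separately, the sizes can occasionally add up to one more or one less than the intended total, so it is worth checking the sum.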
26.
Solution <ul><li>First Sears must decide exactly how to stratify by age. </li></ul><ul><li>Their reasoning is that different age groups probably have different attitudes and behavior regarding credit. </li></ul><ul><li>After preliminary investigation they decide to have three age categories: 18-30, 31-62, and 63-80. </li></ul><ul><li>The spreadsheet is laid out as follows: </li></ul><ul><ul><li>the total sample size is in cell C3 </li></ul></ul><ul><ul><li>the definitions of the strata are in rows 6-8 </li></ul></ul><ul><ul><li>the customer data are in the range A11:B1010 </li></ul></ul>
28.
Solution -- continued <ul><li>To see what age category each customer falls in we enter the formula =IF(B11<=$D$6,1,IF(B11<=$D$7,2,3)) in cell C11 and then copy it down column C. </li></ul><ul><li>Next, it is useful to “unstack” the data into three groups, one for each age category. </li></ul><ul><ul><li>It is easy to unstack the data in columns A-C. </li></ul></ul><ul><ul><li>With the cursor anywhere in A10:C1010 select StatPro/Data Utilities/Unstack Variables. Select Category as the Code variable, select Cust and Age as the variables to unstack, and accept the default location for the unstacked variables. </li></ul></ul>
29.
Solution -- continued <ul><li>Once the variables are unstacked we can calculate the counts and sample sizes in F6:G8 with the formulas =COUNT(E11:E142) and =ROUND(TotSampSize*F6/1000,0) . </li></ul><ul><li>Finally, we copy the data in columns E and F into columns L and M, append a column of random numbers, sort on the random number column, and choose the first 13 (or however many) customers. </li></ul><ul><li>The file shows the calculations for the other categories. </li></ul>
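The whole stratified procedure (categorize by age, unstack into groups, compute proportional sample sizes, then sort each group by random numbers) can be sketched in Python as below. This is a rough illustration rather than the STRATIFIED.XLS workbook: the customer ages are randomly generated placeholders, the cutoffs 30 and 62 are assumed to be the values held in $D$6 and $D$7, and the function names are hypothetical.

```python
import math
import random

def age_category(age, upper1=30, upper2=62):
    """Mirror of =IF(B11<=$D$6,1,IF(B11<=$D$7,2,3)), assuming D6=30, D7=62."""
    if age <= upper1:
        return 1
    return 2 if age <= upper2 else 3

def stratified_sample(customers, total_n, rng):
    """Return {category: sampled (customer, age) pairs}, proportional sizes."""
    # "Unstack" the customers into one list per age category.
    strata = {1: [], 2: [], 3: []}
    for cust, age in customers:
        strata[age_category(age)].append((cust, age))

    population = len(customers)
    sample = {}
    for category, members in strata.items():
        # Proportional size, rounded like =ROUND(TotSampSize*F6/1000,0).
        n_i = math.floor(total_n * len(members) / population + 0.5)
        # Attach a random number to each member, sort, keep the first n_i.
        keyed = sorted((rng.random(), member) for member in members)
        sample[category] = [member for _, member in keyed[:n_i]]
    return sample

rng = random.Random(0)
# Hypothetical frame: 1000 customers with ages between 18 and 80.
customers = [(cust, rng.randint(18, 80)) for cust in range(1, 1001)]
sample = stratified_sample(customers, 100, rng)
for category in sorted(sample):
    print(category, len(sample[category]))
```

Each stratum is sampled independently with the same sort-by-random-number idea used for simple random sampling, so the result is a simple random sample within every age category.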