13 sampling

1. Research Methodology
Dr. Nimit Chowdhary, Professor
Saturday, April 14, 2012
© Dr. Nimit Chowdhary, Research Methodology Workshop
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population.
2. Conditions Favoring the Use of Sample vs. Census
Type of Study                           Sample         Census
1. Budget                               Small          Large
2. Time available                       Short          Long
3. Population size                      Large          Small
4. Variance in the characteristic       Small          Large
5. Cost of sampling errors              Low            High
6. Cost of nonsampling errors           High           Low
7. Nature of measurement                Destructive    Nondestructive
8. Attention to individual cases        Yes            No
Fig. 11.1: Define the population → Determine the sampling frame → Select sampling technique(s) → Determine the sample size → Execute the sampling process
4. Diagram: from target population to sample
• Target population: the population of interest
• Sampling frame: the list or rule defining the population
• Actual population to which generalizations are made: defined/listed by the sampling frame
• Target sample: the list of the target sample
• Sample: the people actually studied
• Links between the stages: method of selection, response rate, generalization
Key points:
• Need to distinguish between the population of interest and the actual population defined by the sampling frame
• Generalizations can be made only to the "actual population"
• Understand the crucial role of the sampling frame
5. Sampling frame:
• The list or procedure defining the population (from which the sample will be drawn).
• Distinguish the sampling frame from the sample.
• Examples:
  • Telephone book
  • Voter list
  • Random digit dialling
• Essential for probability sampling, but can be defined for non-probability sampling
Sampling techniques:
• Non-probability: Convenience, Judgmental, Quota, Snowball
• Probability: Random sampling, Stratified sampling (Proportionate, Disproportionate), Cluster sampling, Systematic
6. Probability sample:
• A probability sample is one in which each element of the population has a known non-zero probability of selection.
• It is not a probability sample if some elements of the population cannot be selected (have zero probability).
• It is not a probability sample if the probabilities of selection are not known.
Properties of a probability sample:
• Cannot guarantee "representativeness" on all traits of interest
• A sampling plan with known statistical properties
• Permits statements like "there is a 99% chance that the true population correlation falls between 0.46 and 0.56".
7. Sampling frame and random sampling:
• If the sampling frame is a poor fit to the population of interest, random sampling from that frame cannot fix the problem
• The sampling frame is non-randomly chosen. Elements not in the sampling frame have zero probability of selection.
• Generalizations can be made ONLY to the actual population defined by the sampling frame
Simple random sampling:
• Each element in the population has an equal probability of selection AND each combination of elements has an equal probability of selection
• Names drawn out of a hat
• Random numbers to select elements from an ordered list
8. http://faculty.elgin.edu/dkernler/statistics/ch01/4-1.html
A small catering business serves 9 reception centers. The owner wants to interview a sample of 4 clients in detail to find ways to improve services to his/her clients. To avoid bias, the owner chooses a simple random sample of size 4.
9. Each reception center is assigned a numerical label 1-9.
1 - Darlene's Wedding Center
2 - Magic Moments Reception Hall
3 - Rustic Realm Weddings
4 - Romance Gardens
5 - Classic Weddings
6 - Old Time Chapel
7 - Lovers Lane Weddings
8 - Accents-Modern Weddings
9 - Century Falls Reception Center
The owner decides to use a statistical software program to generate 4 numerical labels between 1 and 9 at random. The software returns the following numbers: 5, 8, 6, 4. Therefore, the simple random sample to be interviewed in detail will be:
Classic Weddings (5)
Accents-Modern Weddings (8)
Old Time Chapel (6)
Romance Gardens (4)
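The draw in the catering example can be sketched in a few lines of Python. This is only an illustration: the slides do not say which statistical software was used, and the function name `simple_random_sample` and its optional `seed` parameter are assumptions made here. Because the draw is random, the labels returned will generally differ from the 5, 8, 6, 4 shown in the example.

```python
import random

# Reception centers from the example, keyed by their numerical labels 1-9.
CENTERS = {
    1: "Darlene's Wedding Center",
    2: "Magic Moments Reception Hall",
    3: "Rustic Realm Weddings",
    4: "Romance Gardens",
    5: "Classic Weddings",
    6: "Old Time Chapel",
    7: "Lovers Lane Weddings",
    8: "Accents-Modern Weddings",
    9: "Century Falls Reception Center",
}


def simple_random_sample(labels, size, seed=None):
    """Draw `size` distinct labels without replacement.

    Every label, and every combination of `size` labels, is equally likely.
    `seed` is a hypothetical convenience for reproducible draws.
    """
    rng = random.Random(seed)
    return rng.sample(list(labels), size)


if __name__ == "__main__":
    for label in simple_random_sample(CENTERS, 4):
        print(label, CENTERS[label])
```

Sampling without replacement matters here: the owner wants 4 different clients, so the same label must not be drawn twice.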
10. Stratified sampling:
• Sometimes subpopulations within your entire population vary considerably. In this case, it is advantageous to divide your population into subpopulations called "strata" and then perform simple random sampling within each stratum. This is stratified sampling.
• Divide the population into groups that differ in important ways
• The basis for grouping must be known before sampling
• Select a random sample from within each group
11. Imagine you would like to interview students at schools that contract with different vendors to bring food to their cafeteria. We would expect opinions about cafeteria food to vary widely from school to school. Therefore, it makes sense to create school strata to sample from. Suppose the schools are as follows:
School 1: 1050 students
School 2: 565 students
School 3: 1554 students
School 4: 306 students
12. • Total students: 1050 + 565 + 1554 + 306 = 3475 students
• The administrator wishes to take a sample of 150 students. The first step is to find the total number of students (3475 above) and calculate the proportion of students in each stratum (rounded to two decimals).
School 1: 1050 / 3475 = .30
School 2: 565 / 3475 = .16
School 3: 1554 / 3475 = .45
School 4: 306 / 3475 = .09
13. Next, to select a sample in proportion to the size of each stratum (in this case school), the following number of students should be randomly selected:
School 1: 150 x .30 = 45
School 2: 150 x .16 = 24
School 3: 150 x .45 = 67.5 ~ 67
School 4: 150 x .09 = 13.5 ~ 14
The last two figures are rounded in opposite directions so that the total stays at 150. This tells us that our sample of 150 students should be composed of:
• 45 students randomly selected from School 1
• 24 students randomly selected from School 2
• 67 students randomly selected from School 3
• 14 students randomly selected from School 4
The primary advantage of stratified sampling over simple random sampling is that it improves the accuracy of estimation if you select a relevant stratification variable.
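A possible way to automate the proportionate allocation is sketched below. It is an illustration under stated assumptions, not the procedure used in the slides: the function name `proportional_allocation` is made up here, and the rounding rule (largest remainder) is one common choice for making the stratum counts sum exactly to the sample size. Because it works from the exact proportions rather than the two-decimal ones, it yields 45, 25, 67 and 13 for the four schools, slightly different from the hand-rounded 45, 24, 67 and 14.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split `sample_size` across strata in proportion to stratum size.

    Uses integer arithmetic: each stratum's exact share is
    sample_size * size / total, kept as (whole part, remainder).
    Units left over after taking whole parts go to the strata with the
    largest remainders, so the allocation always sums to `sample_size`.
    """
    total = sum(stratum_sizes.values())
    parts = {
        name: divmod(sample_size * size, total)
        for name, size in stratum_sizes.items()
    }
    allocation = {name: whole for name, (whole, _) in parts.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(parts, key=lambda name: parts[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


if __name__ == "__main__":
    schools = {"School 1": 1050, "School 2": 565, "School 3": 1554, "School 4": 306}
    print(proportional_allocation(schools, 150))
```

Once the counts are fixed, a simple random sample of that size is drawn within each school.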
14. Systematic sampling is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of systematic sampling is an equal-probability method, in which every kth element in the frame is selected, where k is the sampling interval (sometimes known as the skip).
• Number the units in the population from 1 to N
• Decide on the n (sample size) that you want or need
• k = N/n = the interval size
• Randomly select an integer between 1 and k
• Then take every kth unit
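The five steps above can be sketched as follows. This is a minimal illustration, assuming N is at least n and using integer division for the interval; the function name `systematic_sample` and the `seed` parameter are hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select n units by taking every kth unit after a random start.

    `units` is the ordered sampling frame (positions 1..N).
    Assumes len(units) >= n. When N is not a multiple of n, the
    interval k = N // n is rounded down and the last few units of
    the frame can never be selected.
    """
    N = len(units)
    k = N // n                      # sampling interval (the "skip")
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random integer between 1 and k, inclusive
    # Positions start, start + k, start + 2k, ... (1-based), converted to 0-based.
    return [units[start - 1 + i * k] for i in range(n)]


if __name__ == "__main__":
    frame = list(range(1, 21))      # N = 20 numbered units
    print(systematic_sample(frame, 5))
```

With N = 20 and n = 5 the interval is k = 4, so a start of 3 would select units 3, 7, 11, 15 and 19.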
15. Systematic sampling:
• Has the same error rate as a simple random sample if the list is in random or haphazard order
• Provides the benefits of implicit stratification if the list is grouped
16. Systematic sampling:
• Runs the risk of error if periodicity in the list matches the sampling interval
• This is rare.
• In this example, every 4th element is red, and red never gets sampled. If j had been 4 or 8, ONLY reds would be sampled.
Cluster sampling:
• Done correctly, this is a form of random sampling
• Population is divided into groups, usually geographic or organizational
17. Cluster sampling (continued):
• Some of the groups are randomly chosen
• In pure cluster sampling, the whole cluster is sampled.
• In simple multistage cluster sampling, there is random sampling within each randomly chosen cluster
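The difference between pure and simple multistage cluster sampling can be sketched in code. This is an illustrative outline only: the function name `cluster_sample`, the `within` parameter and the toy school data are assumptions made for the example, not part of the slides.

```python
import random


def cluster_sample(clusters, n_clusters, within=None, seed=None):
    """Randomly choose clusters, then sample inside them.

    clusters   : dict mapping cluster name -> list of members
    n_clusters : how many clusters to choose at random
    within     : None for pure cluster sampling (the whole chosen
                 cluster is taken); an integer for simple multistage
                 sampling (a simple random sample of that many members
                 per chosen cluster, or all members if fewer exist)
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if within is None:
            sample[name] = list(members)
        else:
            sample[name] = rng.sample(members, min(within, len(members)))
    return sample


if __name__ == "__main__":
    # Hypothetical clusters: pupils grouped by school.
    schools = {
        "North": ["n1", "n2", "n3", "n4"],
        "South": ["s1", "s2", "s3"],
        "East": ["e1", "e2", "e3", "e4", "e5"],
    }
    print(cluster_sample(schools, 2))            # pure cluster sampling
    print(cluster_sample(schools, 2, within=2))  # simple multistage
```

Members of clusters that are not chosen have no chance of entering the sample on that draw, which is why differences between clusters drive the error of this design.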
18. Cluster sampling:
• Population is divided into groups
• Some of the groups are randomly selected
• For a given sample size, a cluster sample has more error than a simple random sample
• Cost savings of clustering may permit a larger sample
• Error is smaller if the clusters are similar to each other
• Cluster sampling has very high error if the clusters are different from each other
• Cluster sampling is NOT desirable if the clusters are different
• It IS random sampling: you randomly choose the clusters
• But you will tend to omit some kinds of subjects
19. Example: Election forecast!
Stratified cluster sampling:
• Reduce the error in cluster sampling by creating strata of clusters
• Sample one cluster from each stratum
• Combines the cost savings of clustering with the error reduction of stratification
20. STRATIFICATION
• Divide population into groups different from each other: sexes, races, ages
• Sample randomly from each group
• Less error compared to simple random
• More expensive to obtain stratification information before sampling
CLUSTERING
• Divide population into comparable groups: schools, cities
• Randomly sample some of the groups
• More error compared to simple random
• Reduces costs to sample only some areas or organizations
Stratified cluster sampling:
• Combines elements of stratification and clustering
• First you define the clusters
• Then you group the clusters into strata of clusters, putting similar clusters together in a stratum
21. Stratified cluster sampling (continued):
• Then you randomly pick one (or more) cluster from each of the strata of clusters
• Then you sample the subjects within the sampled clusters (either all the subjects, or a simple random sample of them)
Non-probability sampling:
• Convenience sampling
• Purposive sampling
• Quota sampling
• Snowball sampling
22. Convenience sampling:
• Subjects selected because it is easy to access them.
• No reason tied to purposes of research.
• Students in your class, people on the street, friends
Purposive sampling:
• Subjects selected for a good reason tied to purposes of research
• Small samples (fewer than 30), not large enough for the power of probability sampling.
• Nature of research requires a small sample
• Choose subjects with appropriate variability in what you are studying
• Hard-to-get populations that cannot be found through screening the general population
23. Purposive sampling examples:
• Test markets
• Purchase engineers selected in industrial marketing research
• Bellwether precincts selected in voting behavior research
• Expert witnesses used in court
Quota sampling:
• Pre-plan the number of subjects in specified categories (e.g. 100 men, 100 women)
• In uncontrolled quota sampling, the subjects chosen for those categories are a convenience sample, selected any way the interviewer chooses
24. Quota sampling (continued):
• In controlled quota sampling, restrictions are imposed to limit the interviewer's choice
• No call-backs or other features to eliminate convenience factors in sample selection
Quota sampling vs. stratified sampling:
• In stratified sampling, selection of the subject is random. Call-backs are used to get that particular subject.
• Stratified sampling without call-backs may not, in practice, be much different from quota sampling.
• In quota sampling, the interviewer selects the first available subject who meets the criteria: it is a convenience sample.
• Highly controlled quota sampling uses probability sampling down to the last block or telephone exchange
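The "first available subject who meets the criteria" rule for quota sampling can be sketched as below. The function name `quota_sample`, its parameters and the toy stream of passers-by are hypothetical, chosen only to show why the result is a convenience sample: who gets selected depends entirely on the order in which subjects happen to turn up.

```python
def quota_sample(stream, category_of, quotas):
    """Fill pre-planned quotas with the first available subjects.

    stream      : iterable of subjects in the order they are encountered
    category_of : function mapping a subject to its quota category
    quotas      : dict mapping category -> number of subjects wanted
    """
    remaining = dict(quotas)
    selected = []
    for subject in stream:
        category = category_of(subject)
        if remaining.get(category, 0) > 0:
            selected.append(subject)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return selected


if __name__ == "__main__":
    # Hypothetical passers-by as (sex, id) pairs, in order of arrival.
    passers_by = [("m", 1), ("m", 2), ("f", 3), ("m", 4), ("f", 5), ("f", 6)]
    print(quota_sample(passers_by, lambda s: s[0], {"m": 2, "f": 2}))
```

No random selection and no call-backs are involved, which is the contrast with stratified sampling drawn in the list above.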
25. Snowball sampling:
• In snowball sampling, an initial group of respondents is selected, usually at random.
• After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
• Subsequent respondents are selected based on the referrals.
Sample size considerations:
• Heterogeneity: need a larger sample to study a more diverse population
• Desired precision: need a larger sample to get a smaller error
• Sampling design: smaller if stratified, larger if cluster
26. Sample size considerations (continued):
• Nature of analysis: complex multivariate statistics need larger samples
• Accuracy of the sample depends upon sample size, not the ratio of sample to population
Sampling frame:
• Often a non-random selection of the basic sampling frame (city, organization, etc.)
• Fit between sampling frame and research goals must be evaluated
• Sampling frame as a concept is relevant to all kinds of research (including non-probability)
27. Generalization:
• Non-probability sampling means you cannot generalize beyond the sample
• Probability sampling means you can generalize to the population defined by the sampling frame
Determining sample size:
• There are normally two cases:
  • Determining sample size for percents
  • Determining sample size for means
$e = \dfrac{Z\sigma}{\sqrt{n}}$
where
e = error that is acceptable
Z = Z score that is calculated on the basis of desired confidence
σ = estimated standard deviation of the population under study (from a past study or pilot study)
n = sample size to be determined
28. A fast food company wants to determine the average number of times that fast food users visit fast food restaurants per week. They have decided that their estimate needs to be accurate within plus or minus one-tenth of a visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more than one-tenth of a visit. Previous research has shown that the standard deviation is .7 visits. What is the required sample size?
Population standard deviation (σ): .7
Maximum acceptable difference (e): .1
Desired confidence level (%): 95 → Z = 1.96
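The fast food example can be checked numerically with the rearranged formula n = Z²σ²/e². The sketch below is only an illustration: the function names are made up here, and the Z score is computed from the standard normal distribution (about 1.95996) instead of the rounded 1.96, so the result differs from a hand calculation in the first decimal place.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided Z score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sigma, e, confidence=0.95):
    """Unrounded sample size n = (Z * sigma / e) ** 2 for estimating a mean.

    sigma : estimated population standard deviation
    e     : maximum acceptable difference (same units as sigma)
    """
    z = z_for_confidence(confidence)
    return (z * sigma / e) ** 2


if __name__ == "__main__":
    n = sample_size_for_mean(sigma=0.7, e=0.1)
    print(n)             # a little over 188
    print(math.ceil(n))  # rounded up to a whole respondent
```

Rounding up rather than to the nearest integer is the cautious choice, since a fractional respondent cannot be interviewed and rounding down would leave the error slightly above the target.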
29. $e = \dfrac{Z\sigma}{\sqrt{n}} \;\Rightarrow\; n = \dfrac{Z^2\sigma^2}{e^2}$

$n = \dfrac{(1.96)^2 (0.7)^2}{(0.1)^2} \approx 188$

A publishing company wants to know what percent of the population might be interested in a new magazine on making the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population is retired. They are willing to accept an error rate of 5% and they want to be 95% certain that their finding does not differ from the true rate by more than 5%. What is the required sample size?
30. $e = \dfrac{Z\sigma}{\sqrt{n}} \;\Rightarrow\; n = \dfrac{Z^2\sigma^2}{e^2}$, where in this case $\sigma = \sqrt{pq}$

Population proportion (p) = .22
Therefore, q = .78
Population standard deviation (σ) = √(pq) = √((.22)(.78))
Maximum acceptable difference (e): 0.05
Desired confidence level (%): 95 → Z = 1.96
31. $e = \dfrac{Z\sigma}{\sqrt{n}} \;\Rightarrow\; n = \dfrac{Z^2\sigma^2}{e^2} = \dfrac{Z^2 pq}{e^2}$

$n = \dfrac{(1.96)^2 (0.22)(0.78)}{(0.05)^2} \approx 263.7$, or roughly 264 respondents
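The magazine example can be verified the same way with n = Z²pq/e². As before, this is a hedged sketch: the function name `sample_size_for_proportion` is hypothetical and the Z score comes from the standard normal distribution rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, e, confidence=0.95):
    """Unrounded sample size n = Z^2 * p * q / e^2 for estimating a proportion.

    p : estimated population proportion (0.22 in the example)
    e : maximum acceptable difference, as a proportion (0.05 for 5%)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return z ** 2 * p * q / e ** 2


if __name__ == "__main__":
    n = sample_size_for_proportion(p=0.22, e=0.05)
    print(n)             # between 263 and 264
    print(math.ceil(n))  # rounded up to a whole respondent
    # With e = 0.10 the requirement drops to about a quarter of that,
    # because the sample size scales with 1 / e^2.
    print(sample_size_for_proportion(p=0.22, e=0.10))
```

The comparison in the last line shows how sensitive the answer is to the error term: a result in the 260s corresponds to e = 0.05, not 0.10.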