Lesson 11 Introduction To Sampling


Sampling Methods in Market Research

Published in: Business, Education, Technology

  1. Introduction to Sampling<br />Lesson 11<br />
  2. What is Sampling?<br />Sampling means drawing conclusions about a large group on the basis of a small sample drawn from it.<br />It is useful because a good sample can yield a wealth of knowledge about the whole universe.<br />Collecting data from a sample costs far less than a complete enumeration of the universe.<br />Sampling is the most important concept in market research (MR) and the bedrock on which MR is based.<br />
  3. Definition of the Universe Being Studied<br />The universe, or population, is the entire group of items which the researchers wish to study and about which they plan to generalize.<br />The definition is determined by the research objectives of the particular study. Examples:<br />All women homemakers over 40 years of age residing in the USA<br />All grocery stores in Delhi<br />
  4. Definition of the Variables Being Studied<br />Example: whether or not a grocery store has Diet Pepsi in stock.<br />Example: whether a grocery store stocks bottled soft drinks. Which of these count?<br />Glass bottles or plastic bottles?<br />Cans?<br />Take-away vis-à-vis in-store consumption?<br />Fountain soft drinks?<br />Soft drink concentrates?<br />Only carbonated beverages?<br />A variable may or may not have a strict definition.<br />
  5. Sample Design<br />Determining sampling units.<br />Selecting the sample items and determining sample size.<br />Estimating universe characteristics from sample data.<br />
  6. Choice of Sampling Units: Selecting the Sample<br />Probability sampling methods are those in which every item in the universe (e.g., every grocery store in Delhi) has a known chance, or probability, of being chosen for the sample.<br />This implies that the selection of sample items is independent of the person making the study; i.e., the sampling operation is controlled so objectively that the items are chosen strictly at random.<br />
  7. Selecting the Sample<br />Non-probability sampling methods are those which do not provide every item in the universe with a known chance of being included in the sample.<br />The selection process is, at least partially, subjective.<br />
  8. A Probability Sampling Method & a Non-probability Sampling Method<br />Probability sampling: From a list of all New York metropolitan area grocery stores, select a sample of 50 stores at random, i.e., in such a way as to give each store an equal chance of being selected. Fieldworkers visit all 50 stores and observe whether Pepsi Cola is in stock.<br />Non-probability sampling: Ten New York metropolitan area fieldworkers visit five ‘average’ grocery stores near their homes and observe whether Pepsi Cola is in stock.<br />
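The probability method above can be sketched with Python's `random.sample`, which draws without replacement so that every store has the same known inclusion chance of 50/N. The 2,000-store list is a made-up stand-in for the real New York list, and `draw_probability_sample` is a hypothetical helper name used only for illustration.

```python
import random

# Hypothetical sampling frame: 2,000 made-up store IDs standing in for
# "a list of all New York metropolitan area grocery stores".
frame = [f"store-{i:04d}" for i in range(1, 2001)]
SAMPLE_SIZE = 50

def draw_probability_sample(frame, k, rng):
    """Hypothetical helper: simple random sample without replacement, so each
    store has the same known chance, k / len(frame), of being included."""
    return rng.sample(frame, k)

rng = random.Random(2024)  # fixed seed so the draw is reproducible
sample = draw_probability_sample(frame, SAMPLE_SIZE, rng)
print(len(sample), len(set(sample)))   # 50 50 (no store is chosen twice)
print(SAMPLE_SIZE / len(frame))        # 0.025, the known inclusion probability

# Empirical check: over many repeated draws, any given store turns up
# in roughly 2.5% of the samples.
trials = 20_000
hits = sum("store-0001" in draw_probability_sample(frame, SAMPLE_SIZE, rng)
           for _ in range(trials))
print(hits / trials)
```

The fieldworkers' choice of ‘average’ stores near their homes has no comparable selection mechanism, so no inclusion probability can be computed for it.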
  9. Estimating Universe Characteristics from Sample Data<br />Market researchers are usually interested in summary numbers that describe particular properties of a given universe, such as the arithmetic mean or the percentage of observations that show a given characteristic.<br />In practice, researchers do not usually know these summary values for the universe.<br />They estimate them by measuring the given characteristics in a sample.<br />Thus, researchers are forced to rely on estimates of the universe values, which will generally differ from the true universe values.<br />It is important to note that a universe value (mean or percentage) is a fixed number, even though it is generally not known.<br />In contrast, the estimate of the universe value obtained from a sample will vary from one sample to the next.<br />If 100 independently selected samples of n items each were drawn from the same universe, one would expect to obtain a different sample mean or percentage each time, even though there is only one true universe mean or percentage.<br />
  10. Estimating Universe Characteristics from Sample Data<br />All face cards are removed from a deck of 52 playing cards.<br />What is left? 40 cards.<br />Removed: 3 face cards (King, Queen, Jack) × 4 suits = 12 cards.<br />Each remaining card has a numerical value, a suit, and a color (red or black).<br />Universe characteristics:<br />Arithmetic mean of the numerical values (Ace counted as 1, so values run from 1 to 10): 5.5<br />Proportion of cards represented by any one suit: 0.25<br />Proportion of red cards: 0.5<br />
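The three universe characteristics on this slide can be checked by building the 40-card universe explicitly. This is a small sketch assuming Ace = 1 (the only reading that gives a mean of 5.5); exact fractions are used so the results are not clouded by rounding.

```python
from fractions import Fraction
from itertools import product

# The 40-card universe: a standard deck with the 12 face cards removed.
# Assumption: Ace counts as 1, so each suit holds values 1..10.
SUITS = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
universe = [(value, suit) for suit, value in product(SUITS, range(1, 11))]

N = len(universe)
mean_value = Fraction(sum(v for v, _ in universe), N)
share_per_suit = {s: Fraction(sum(1 for _, t in universe if t == s), N) for s in SUITS}
share_red = Fraction(sum(1 for _, s in universe if SUITS[s] == "red"), N)

print(N)                    # 40
print(float(mean_value))    # 5.5
print(share_per_suit)       # every suit maps to Fraction(1, 4)
print(float(share_red))     # 0.5
```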
  11. Distribution of 50 Sample Proportions Taken from Samples of 5 Cards Each<br />[Figure: universe proportion of red cards = 0.5]<br />
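The slide's figure is not reproduced here, but a simulation along the same lines shows what it illustrates. It assumes that each sample of 5 cards is drawn without replacement from the full 40-card deck and that the cards are returned before the next sample is taken; the slides do not say how the samples were drawn, and the counts produced will not match the original figure. `red_proportion` is a hypothetical helper name.

```python
import random
from collections import Counter

# Only colour matters here: the 40-card universe has 20 red and 20 black cards.
universe = ["red"] * 20 + ["black"] * 20

def red_proportion(sample):
    """Hypothetical helper: share of red cards in one sample."""
    return sum(card == "red" for card in sample) / len(sample)

rng = random.Random(11)   # arbitrary seed, for reproducibility
# 50 independent samples of 5 cards each (no repeats within a sample).
proportions = [red_proportion(rng.sample(universe, 5)) for _ in range(50)]

# Text histogram: how often each sample proportion 0.0, 0.2, ..., 1.0 occurred.
for p, count in sorted(Counter(proportions).items()):
    print(f"{p:.1f}: {'*' * count}")
print("average of the 50 estimates:", sum(proportions) / len(proportions))
```

Runs with different seeds typically show the pattern described in the next three observations: estimates spread across the whole range, occasional extreme values, and a cluster around 0.5.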
  12. Observation 1<br />Not all samples lead to the same estimate of the universe value. Sample estimates of the proportion of red cards varied from 0 to 1, when the universe proportion was 0.5. Thus, unless all items in a universe are identical, different samples may lead to different estimates.<br />
  13. Observation 2<br />Some sample estimates differ in extreme fashion from the universe value being estimated (in the illustration, the 0.00 and 1.00 values are examples). This is an inevitable consequence of sampling.<br />It is the price which must be paid for generalizing about a universe characteristic on the basis of a sample.<br />
  14. Observation 3<br />Most of the estimates tend to cluster around the true universe proportion of red cards.<br />It is this property which is a major justification for the use of probability sampling.<br />No such justification exists for non-probability sampling methods.<br />
  15. Simple Random Sampling<br />Probability sampling is the only sampling technique that provides an objective measure of the reliability of the sample estimate.<br />The simplest probability sampling method is called simple random sampling.<br />In probability sampling, every possible sample of a given size drawn from a specified universe has a known chance of being selected.<br />In simple random sampling, every possible sample has a known and equal chance of selection.<br />
  16. Example of Simple Random Sampling<br /><ul><li>Place 15 chips in a bowl and mix thoroughly.</li><li>Choose 1 chip blindly.</li><li>The number on the chip identifies a random sample of two homemakers.</li><li>Repeat a large number of times.</li><li>Each time, return the previously obtained chip to the bowl.</li><li>Mix again before the next selection.</li></ul>In the long run, every chip and, therefore, every possible sample will be obtained with equal frequency and with a known probability of 1 in 15.<br />A simple random sample is a sample chosen by a process that guarantees, in the long run, that every possible sample will be represented with equal and known frequency.<br />
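The chip procedure can be sketched as repeated draws with replacement. The slide does not say how many homemakers there are; this sketch assumes six (labelled A to F), because six homemakers give exactly C(6, 2) = 15 possible samples of two, one per chip. The labels and the seed are illustrative.

```python
import random
from collections import Counter
from itertools import combinations

# Assumption: six homemakers A-F, giving C(6, 2) = 15 possible samples of two.
homemakers = "ABCDEF"
chips = list(combinations(homemakers, 2))   # chip i identifies sample chips[i]
assert len(chips) == 15

rng = random.Random(7)
DRAWS = 150_000
# rng.choice models "mix, draw one chip blindly, note it, return it":
# every draw is independent and every chip is equally likely.
counts = Counter(rng.choice(chips) for _ in range(DRAWS))

for sample, n in sorted(counts.items()):
    print("".join(sample), round(n / DRAWS, 4))   # each close to 1/15, about 0.0667
```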
  21. Sample Values as Estimates of Universe Values<br />The sample mean is an unbiased estimate of the universe mean, i.e., sample means do not tend, on the average, to be higher or lower than the universe mean.<br />
  22. Example<br />Universe: A = 1, B = 3, C = 4, D = 8 (universe mean = 16/4 = 4).<br />All possible samples of three items and their sample means:<br />ABC: 2 2/3<br />ABD: 4<br />ACD: 4 1/3<br />BCD: 5<br />The average of all possible sample means = \( (2\tfrac{2}{3} + 4 + 4\tfrac{1}{3} + 5)/4 = 16/4 = 4 \)<br />Thus, the sample mean affords an unbiased estimate of the universe mean.<br />
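The enumeration on this slide can be reproduced by listing every possible sample of three items and averaging their means. Exact fractions keep 8/3 and 13/3 from being rounded; the dictionary layout is just one convenient choice.

```python
from fractions import Fraction
from itertools import combinations

# Universe from the slide: A=1, B=3, C=4, D=8.
universe = {"A": 1, "B": 3, "C": 4, "D": 8}
universe_mean = Fraction(sum(universe.values()), len(universe))   # 16/4 = 4

# Every possible simple random sample of size 3, with its sample mean.
sample_means = {
    "".join(names): Fraction(sum(universe[k] for k in names), 3)
    for names in combinations(universe, 3)
}
for names, m in sample_means.items():
    print(names, m)     # ABC 8/3, ABD 4, ACD 13/3, BCD 5

average_of_means = sum(sample_means.values()) / len(sample_means)
print(average_of_means == universe_mean)   # True
```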
  23. Sample Values as Estimates of Universe Values<br />Similarly, a sample proportion (or percentage) provides an unbiased estimate of the corresponding universe proportion (or percentage).<br />For example, suppose a sample shows that 75% of respondents prefer to buy drugs directly from the manufacturer. Unbiasedness means that if a large number of simple random samples were drawn from this universe and the sample percentage computed each time, the average of these sample percentages would tend to equal the universe percentage.<br />
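The same enumeration works for a proportion. This sketch uses a made-up universe of eight respondents, six of whom (75%) prefer to buy directly from the manufacturer, and lists every possible sample of four; the universe and sample size are assumptions chosen to keep the enumeration small.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical universe of 8 respondents: 1 = prefers to buy drugs directly
# from the manufacturer, 0 = does not. Universe proportion is 6/8 = 75%.
universe = [1, 1, 1, 1, 1, 1, 0, 0]
universe_p = Fraction(sum(universe), len(universe))

# Every possible simple random sample of size 4, identified by positions
# so that respondents with the same answer are still distinct items.
samples = list(combinations(range(len(universe)), 4))
sample_ps = [Fraction(sum(universe[i] for i in s), 4) for s in samples]

print(len(samples))                                    # 70 possible samples
print(min(sample_ps), max(sample_ps))                  # 1/2 1
print(sum(sample_ps) / len(sample_ps) == universe_p)   # True
```

Individual samples give anything from 50% to 100%, yet their average is exactly the universe's 75%.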
  24. Biased Estimate<br />Not all sample values provide unbiased estimates of the corresponding universe values.<br />A conspicuous example is the group of estimates called ratio estimates, which are biased estimates of the corresponding universe values.<br />
  25. Biased Estimate Example<br />
  26. Biased Estimate Example<br />Overall, brand X has a 50% market share in 5 stores. Using samples of size two, it will be shown that the average brand X share over all possible samples is not equal to the brand share for the universe as a whole.<br />
  27. Biased Estimate Example<br />
  28. Biased Estimate Example<br />The average of all 10 possible sample brand shares is 0.493.<br />In this case, the sample value is called a biased estimate because the average of all possible sample brand shares (0.493) does not equal the universe brand share (0.500).<br />Therefore, the ratio estimate of brand share is a biased estimate of the universe brand share.<br />Thus, not all estimates derived from simple random samples are unbiased estimates of the corresponding universe values.<br />
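The slides' store-level data table is not reproduced, so this sketch uses a made-up five-store universe (brand X sales and total sales per store) whose overall share is also exactly 50%. It computes the ratio estimate for all 10 samples of two stores; the average comes out near 0.485 rather than the 0.493 reported for the slide's own data, but the conclusion is the same. `brand_share` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 5-store universe (NOT the slide's data):
# (brand X sales, total category sales) for each store.
stores = [(2, 10), (8, 20), (15, 30), (24, 40), (51, 100)]

def brand_share(subset):
    """Hypothetical helper: ratio estimate = brand X sales / total sales over the stores."""
    return Fraction(sum(x for x, _ in subset), sum(t for _, t in subset))

universe_share = brand_share(stores)                 # 100/200 = 1/2
sample_shares = [brand_share(pair) for pair in combinations(stores, 2)]
average_share = sum(sample_shares) / len(sample_shares)

print(len(sample_shares))        # 10 possible samples of two stores
print(float(universe_share))     # 0.5
print(round(float(average_share), 3), average_share == universe_share)
```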
  29. CONSTRUCTING A CONFIDENCE INTERVAL<br />Different samples from the same universe will give different estimates of the universe value.<br />This is because of sampling error, i.e., because the sample chosen will not be a precise replica of the universe.<br />The researcher therefore needs to determine how precise, or reliable, the sample estimates are.<br />After a sample has been taken, the researcher needs to determine a range of values within which the true universe value is likely to lie.<br />The researcher also needs to specify quantitatively how confident one can be that the indicated range of values will in fact include the true universe mean.<br />
  30. CONSTRUCTING A CONFIDENCE INTERVAL<br />Thus, a confidence interval estimate is needed.<br />After a sample has been taken, the researcher wishes to say, with a particular degree of confidence, that the true universe mean lies between two specified numerical values, which are called confidence limits.<br />With simple random sampling data, one can measure the sampling error associated with an estimated mean or percentage, thereby setting bounds within which the universe value being estimated will likely lie.<br />
  31. The Distribution of Sample Means<br />A listing of all possible sample means, together with their relative frequencies of occurrence, is called the sampling distribution of the mean, or the distribution of sample means.<br />The distribution of sample means depends on the size of the samples being taken.<br />The larger the sample, the more closely the sample means tend to cluster around the mean of the universe being sampled.<br />
  32. Sampling Distribution of the Mean in Large Samples<br />The Central Limit Theorem states that, for large samples, the distribution of sample means will be approximately normal, even when the universe itself is not normally distributed.<br />
  33. Characteristic Features of the Approximating Normal Distribution<br />[Figure: three curves plotted against household income (horizontal axis) with relative frequency of occurrence (vertical axis): A. the universe; B. distribution of sample means when n = 100; C. distribution of sample means when n = 400. A vertical dashed line marks the universe mean.]<br />
  34. Four Important Characteristics<br />The mean of the distribution of sample means is equal to the universe mean. The universe mean lies at the vertical dashed line; the means of Curve B and Curve C lie there also.<br />The distributions of sample means (Curve B and Curve C) are symmetrical about the universe mean.<br />In a distribution of sample means, there is a general tendency for the sample means to occur in the vicinity of the universe mean (i.e., small deviations of sample means from the universe mean are more likely than large deviations).<br />As the sample size gets larger, the distribution of sample means becomes more tightly clustered around the universe mean. Comparing the shapes of Curve B and Curve C illustrates this feature.<br />
  35. To Summarize These Four Characteristics of the Approximating Normal Distribution<br />The normal distribution approximation to the distribution of sample means is a symmetrical distribution centred on the mean of the universe sampled.<br />Small deviations of the sample mean from the universe mean are more probable than large deviations.<br />The larger the sample size, the more tightly the distribution of sample means clusters around the universe mean.<br />
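These characteristics can be observed in a simulation. The sketch below assumes a right-skewed universe of 100,000 made-up "household incomes" drawn from an exponential distribution with mean $25,000 (an illustrative choice, not the slides' data). It then takes 1,000 simple random samples at n = 100 and at n = 400. `sample_means` is a hypothetical helper name.

```python
import random
import statistics

# Hypothetical, right-skewed universe of 100,000 household incomes
# (exponential with mean $25,000); an assumption for illustration only.
rng = random.Random(42)
universe = [rng.expovariate(1 / 25_000) for _ in range(100_000)]
M = statistics.fmean(universe)
sigma = statistics.pstdev(universe)

def sample_means(n, reps=1_000):
    """Hypothetical helper: means of `reps` simple random samples of size n."""
    return [statistics.fmean(rng.sample(universe, n)) for _ in range(reps)]

results = {n: sample_means(n) for n in (100, 400)}
for n, means in results.items():
    spread = statistics.pstdev(means)
    within_1se = sum(abs(m - M) <= sigma / n ** 0.5 for m in means) / len(means)
    print(f"n={n}: average of sample means {statistics.fmean(means):,.0f} "
          f"(universe mean {M:,.0f}); spread {spread:,.0f} "
          f"(sigma/sqrt(n) = {sigma / n ** 0.5:,.0f}); within 1 SE: {within_1se:.0%}")
```

Although the universe is skewed, the sample means centre on M and roughly two-thirds fall within one standard error. Their spread at n = 400 is about half the spread at n = 100.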
  36. Standard Deviation<br />The standard deviation (σ) of any distribution or universe of items is a measure of the dispersion or variability of the items in that universe.<br />\( \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1}} \)<br />where: σ = universe standard deviation<br />Xᵢ = value of the ith item in the universe<br />M = universe mean<br />N = number of items in the universe<br />
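The formula above translates directly into code. This sketch uses the N − 1 divisor shown on the slide by default. A `ddof` parameter (an illustrative name borrowed from common statistics usage) switches to the divide-by-N form that many texts use for a complete universe. The four-item universe A = 1, B = 3, C = 4, D = 8 from the earlier example serves as input.

```python
import math

def universe_sd(values, ddof=1):
    """Standard deviation sqrt(sum((Xi - M)^2) / (N - ddof)).
    ddof=1 reproduces the slide's N - 1 divisor; ddof=0 divides by N."""
    N = len(values)
    M = sum(values) / N
    return math.sqrt(sum((x - M) ** 2 for x in values) / (N - ddof))

# The four-item universe A=1, B=3, C=4, D=8 (mean 4); squared deviations sum to 26.
values = [1, 3, 4, 8]
print(universe_sd(values))           # sqrt(26/3), about 2.944
print(universe_sd(values, ddof=0))   # sqrt(26/4), about 2.550
```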
  37. Standard Error of the Mean<br />When the distribution is that of sample means, its standard deviation is called the standard error of the mean.<br />The proportion of sample means between any two limits is determined by the distance of those limits from the universe mean, measured in numbers of standard errors.<br />
  38. Proportion of Sample Means Included within Certain Distances, Measured in Terms of Standard Errors (σx̄), from the Mean of the Universe (M)<br />[Figure: normal curve centred on M with marks at −3σx̄, −2σx̄, −1σx̄, M, 1σx̄, 2σx̄, 3σx̄. On each side of M, 34.1% of sample means lie between M and 1σx̄, 13.6% between 1σx̄ and 2σx̄, and 2.2% between 2σx̄ and 3σx̄.]<br />
  39. Observations<br />About two-thirds (68.2%) of all sample means will be within one standard error (on either side) of the universe mean.<br />About 95.4% of all sample means will be within two standard errors of the universe mean.<br />Practically all (99.7%) sample means will be within three standard errors.<br />
  40. The capital letter Z denotes a distance from the universe mean expressed in standard errors.<br />The Z values in the diagram are Z = 1, 2, 3.<br />When constructing confidence intervals, researchers use their knowledge of the percentage of sample means that fall between any two Z values.<br />They identify specific ranges from −Z to +Z and the percentage of sample means falling within those ranges.<br />
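The percentages associated with ±Z ranges come from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+) to compute them in both directions. `share_within` and `z_for_confidence` are hypothetical helper names.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def share_within(z):
    """Hypothetical helper: proportion of sample means within ±z standard errors of M."""
    return Z.cdf(z) - Z.cdf(-z)

for z in (1, 2, 3):
    print(z, round(share_within(z) * 100, 2))   # 68.27, 95.45, 99.73

def z_for_confidence(level):
    """Hypothetical helper: Z such that `level` of sample means fall within ±Z."""
    return Z.inv_cdf(0.5 + level / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))   # about 1.282, 1.645, 1.96, 2.576
```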
  41. The ±Z ranges most commonly used in MR are:<br />
  42. Standard Error of the Mean and Universe Standard Deviation<br />The standard error of the mean is related to the standard deviation of the universe:<br />\( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \) (this formula applies if less than 5% of the universe is included in the sample)<br />where: σx̄ = standard error of the mean<br />σ = standard deviation of the universe<br />n = number of observations in the sample<br />
  43. Relationship between Standard Error of the Mean and Standard Deviation of the Universe<br />This relationship implies that when larger sample sizes n are used, the standard error of the mean will be smaller and, therefore, the mean of a given sample is likely to be closer to the universe mean. Because n enters as √n, quadrupling the sample size halves the standard error.<br />
  44. Example<br />The universe average household income is $25,000, the universe standard deviation is $40,000, and n = 100.<br />Then σx̄ = 40,000/√100 = $4,000.<br />68.27% of sample means will lie within the interval universe mean ± 1 standard error<br />= $25,000 ± 1($4,000)<br />= $21,000 to $29,000<br />Similarly, for Z = 2 and Z = 3:<br />$17,000 to $33,000 for 95.45% of sample means<br />$13,000 to $37,000 for 99.73% of sample means<br />
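The arithmetic of this example can be checked in a few lines. All three inputs come from the slide; the loop simply applies universe mean ± Z × standard error for Z = 1, 2, 3.

```python
import math

universe_mean = 25_000   # universe average household income, from the slide
sigma = 40_000           # universe standard deviation, from the slide
n = 100                  # sample size, from the slide

se = sigma / math.sqrt(n)          # standard error of the mean: 4000.0
intervals = {z: (universe_mean - z * se, universe_mean + z * se) for z in (1, 2, 3)}
for z, (low, high) in intervals.items():
    print(f"Z={z}: ${low:,.0f} to ${high:,.0f}")
# Z=1: $21,000 to $29,000
# Z=2: $17,000 to $33,000
# Z=3: $13,000 to $37,000
```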
  45. Level of Confidence<br />The level of confidence is the probability that one’s confidence interval statement about the universe mean is correct, i.e., that the interval obtained from a given sample will in fact encompass the universe mean.<br />The higher the confidence level, the more likely it is that the confidence interval will in fact be correct, although the interval becomes wider.<br />Analysts commonly use 80%, 90%, or 95%, depending on how confident they wish to be about the location of the universe mean.<br />
  46. If the Universe Standard Deviation σ Is Known<br />If σ = $40,000, construct a 99.7% interval (i.e., Z = 3) for the unknown universe mean M, using a sample mean x̄ based on a sample of size n = 400.<br />σx̄ = σ/√n = 40,000/√400 = 2,000<br />Interval = ±3 standard errors = ±3(2,000) = ±6,000<br />This implies that one can be 99.7% confident that the obtained sample mean will be within $6,000 of the actual mean income M.<br />
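Turned around, the same calculation gives an interval for the unknown M around an observed sample mean. σ, n, and Z below come from the slide. The sample mean of $24,500 is a made-up value used only to show the output, and `mean_confidence_interval` is a hypothetical helper name.

```python
import math

def mean_confidence_interval(xbar, sigma, n, z):
    """Hypothetical helper: xbar ± z * sigma / sqrt(n), for a known universe sigma."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

xbar = 24_500   # hypothetical sample mean; sigma, n and z below are from the slide
low, high = mean_confidence_interval(xbar, sigma=40_000, n=400, z=3)
print(high - xbar)                       # 6000.0, i.e. 3 standard errors of 2,000
print(f"${low:,.0f} to ${high:,.0f}")    # $18,500 to $30,500
```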
  47. If the Standard Deviation of the Universe Is Not Known<br />Use the estimated standard error:<br />\( S_{\bar{x}} = \frac{S}{\sqrt{n}} \)<br />where: Sx̄ = estimated standard error of the mean<br />S = standard deviation of the sample<br />n = number of observations in the sample<br />
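A minimal sketch of the estimated standard error. It assumes the sample standard deviation S uses the usual n − 1 divisor, which is what `statistics.stdev` computes. The eight incomes are made-up numbers, not slide data.

```python
import math
import statistics

# Hypothetical sample of eight household incomes (made-up numbers).
sample = [18_000, 22_500, 31_000, 27_400, 15_800, 40_200, 24_900, 29_100]

n = len(sample)
S = statistics.stdev(sample)   # sample standard deviation, n - 1 divisor
se_hat = S / math.sqrt(n)      # estimated standard error of the mean, S / sqrt(n)
print(f"sample mean {statistics.fmean(sample):,.0f}, estimated standard error {se_hat:,.0f}")
```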
  48. Confidence Limits for Percentages<br />\( S_p = \sqrt{\frac{p\,q}{n}} \)<br />where: p = percentage of items in the sample possessing a given characteristic<br />q = percentage of items in the sample not possessing the characteristic (q = 100 − p)<br />n = sample size<br />
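Applied to the earlier drug-purchase example (p = 75%), the formula gives the standard error in percentage points. The sample size of 400 is an assumption, since that example does not state one. Z = 2 gives the 95.4% limits used earlier in the lesson, and `percentage_standard_error` is a hypothetical helper name.

```python
import math

def percentage_standard_error(p, n):
    """Hypothetical helper: estimated standard error of a sample percentage,
    sqrt(p * q / n) with q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)

p, n = 75, 400                  # 75% as in the drug-purchase example; n is hypothetical
sp = percentage_standard_error(p, n)
print(round(sp, 2))             # 2.17 percentage points
print(f"95.4% interval: {p - 2 * sp:.1f}% to {p + 2 * sp:.1f}%")   # 70.7% to 79.3%
```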