
Sampling Technique



  1. TOPIC: Report on sampling techniques. Prepared by Rajesh Narayanan, SRM University
  2. What is Sampling? Sampling is a process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, but can include simple random sampling, systematic sampling and observational sampling. Sampling is the technique of drawing samples, i.e., of collecting data on only a part of the population in order to reveal the characteristics of the entire population. Purpose of Sampling: The purpose of sampling is to provide various types of statistical information of a qualitative or quantitative nature about the whole by examining a few selected units. The sampling method is the scientific procedure of selecting those sampling units which would provide the required estimates with associated margins of uncertainty, arising from examining only a part and not the whole. Reasons for sampling instead of a census (need for sampling): There are six reasons for sampling: (1) Economy, (2) Timeliness, (3) Large size of many populations, (4) Inaccessibility of the entire population, (5) Destructive nature of many observations, and (6) Reliability or accuracy. (1) Economy: The unit cost of collecting data is significantly lower in a census than in a sample survey; for example, it may be 200 taka per unit in a census but 1,000 taka per unit in a sample survey. However, because a census covers a far larger number of units, its total cost is significantly higher than that of a sample survey. The total cost of a census is found by multiplying the population size N by the unit cost, and the total cost of a sample survey by multiplying the sample size n by the unit cost. For example, with N = 10,00,000 and n = 5,000: census, 10,00,000 × 200 = 20,00,00,000 taka; sample survey, 5,000 × 1,000 = 50,00,000 taka.
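The economy argument is plain arithmetic, so it can be checked in a few lines. The sketch below only restates the slide's figures (N = 10,00,000 units at 200 taka each versus n = 5,000 units at 1,000 taka each); the helper name `total_cost` is illustrative and not from the source.

```python
def total_cost(units: int, unit_cost: int) -> int:
    """Total data-collection cost: number of units observed x cost per unit."""
    return units * unit_cost


# Figures from the slide, in taka.
N, census_unit_cost = 1_000_000, 200  # census covers all N = 10,00,000 units
n, sample_unit_cost = 5_000, 1_000    # sample survey covers only n = 5,000 units

census_total = total_cost(N, census_unit_cost)  # 200,000,000 (20,00,00,000)
sample_total = total_cost(n, sample_unit_cost)  # 5,000,000 (50,00,000)
print(census_total // sample_total)             # 40
```

Although each sampled unit costs five times as much to observe, the census costs 40 times as much in total because it observes 200 times as many units.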
  3. (2) Timeliness: The time spent per unit may be greater in a sample survey than in a census, but because of the much larger size of the population, the total time required for a census is significantly higher than for a sample survey. (3) Large size of many populations: In some cases the population is extremely large, and not all of its members can be traced because of travel, illness, death, mental incapacity, imprisonment, etc. In that situation the only way to conduct the research is to collect data through a sample survey. (4) Inaccessibility of the entire population: In some cases the entire population may not be accessible, and sampling is then necessary. For example, following an aircraft crash, the entire population of interest may be inaccessible. (5) Destructive nature of many observations: When observing a unit destroys it, the researcher is compelled to collect information on only a part of the population. Examples are a blood test on a patient and measuring the life hours of a tube light. (6) Reliability: By using a scientific sampling technique one can minimize the sampling error, and because qualified investigators are employed, the non-sampling error committed in a sample survey is also minimal. The non-sampling error in a census is much higher than the combined sampling and non-sampling error of a sample survey, because less qualified investigators are involved in a census and its supervision, monitoring and quality-control mechanisms are weaker. Error and reliability are inversely related: as error decreases, reliability increases. Because sampling reduces both sampling and non-sampling error, it enhances the reliability of the information. How to Determine Sample Size: In order to prove that a process has been improved, you must measure the process capability before and after improvements are implemented.
This allows you to quantify the process improvement (e.g., defect reduction or productivity increase) and translate the effects into an estimated financial result: something business leaders can understand and appreciate.
  4. If data is not readily available for the process, how many members of the population should be selected to ensure that the population is properly represented? If data has been collected, how do you determine whether you have enough data? Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. In many cases, we can easily determine the minimum sample size needed to estimate a process parameter, such as the population mean $\mu$. When sample data is collected and the sample mean $\bar{x}$ is calculated, that sample mean is typically different from the population mean $\mu$. This difference between the sample and population means can be thought of as an error. The margin of error $E$ is the maximum difference, at a given level of confidence, between the observed sample mean $\bar{x}$ and the true value of the population mean $\mu$: $E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}$ is known as the critical value, the positive $z$ value that marks the boundary of an area of $\alpha/2$ in the right tail of the standard normal distribution; $\sigma$ is the population standard deviation; and $n$ is the sample size. Rearranging this formula, we can solve for the sample size necessary to produce results accurate to a specified confidence and margin of error: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$. This formula can be used when you know $\sigma$ and want to determine the sample size necessary to establish, with a confidence of $1-\alpha$, the mean value $\mu$ to within $\pm E$. If $\sigma$ is not known, the formula can still be used with an estimate of it: although it is unlikely that you know $\sigma$ when the population mean is not known, you may be able to estimate $\sigma$ from a similar process or from a pilot test/simulation.
  5. Let's put all this statistical mumbo-jumbo to work. Sample Size Calculation Example. Problem: We would like to start an Internet service provider (ISP) and need to estimate the average Internet usage of households in one week for our business plan and model. How many households must we randomly select to be 95 percent sure that the sample mean is within 1 minute of the population mean $\mu$? Assume that a previous survey of household usage has shown $\sigma$ = 6.95 minutes. Solution: We are solving for the sample size $n$. A 95% degree of confidence corresponds to $\alpha$ = 0.05. Each of the two tails of the standard normal distribution outside the confidence region has an area of $\alpha/2$ = 0.025. The region to the left of $z_{\alpha/2}$ and to the right of $z$ = 0 therefore has an area of 0.5 - 0.025 = 0.475. In the table of the standard normal ($z$) distribution, an area of 0.475 corresponds to a $z$ value of 1.96, so the critical value is $z_{\alpha/2}$ = 1.96. The margin of error is $E$ = 1 and the standard deviation is $\sigma$ = 6.95. Using the formula for sample size, we can calculate $n$: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 6.95}{1}\right)^2 \approx 185.56$. So we will need to sample at least 186 (rounded up) randomly selected households. With this sample we will be 95 percent confident that the sample mean $\bar{x}$ will be within 1 minute of the true population mean of Internet usage.
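The ISP calculation can be reproduced with Python's standard library. This is a sketch assuming, as the slide does, that the population standard deviation is known (or well estimated) and that households are drawn by simple random sampling; the function name is hypothetical. `statistics.NormalDist().inv_cdf` returns the exact critical value (about 1.95996) rather than the tabled 1.96, which does not change the rounded answer here.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n with the sample mean within +/- margin of mu at the given confidence.

    Implements n = (z_{alpha/2} * sigma / E)^2, rounded up to the next whole unit.
    """
    alpha = 1.0 - confidence
    # Critical value: the z that leaves an area of alpha/2 in the right tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / margin) ** 2)


# ISP example: sigma = 6.95 minutes, E = 1 minute, 95% confidence.
print(sample_size_for_mean(sigma=6.95, margin=1.0))  # 186
```

Because n grows with the square of sigma/E, halving the required margin of error roughly quadruples the sample size.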
  6. STEPS IN THE SAMPLING PROCESS: The sampling process is the procedure required from defining the population through to the actual selection of sample elements. There are seven steps involved in this process. Step 1: Define the population. The population is the aggregate of all the elements, defined prior to the selection of the sample. It is necessary to define the population in terms of (i) elements, (ii) sampling units, (iii) extent and (iv) time. A few examples are given here. If we were to conduct a survey on the consumption of tea in Gujarat, these specifications might be as follows: (i) Element: housewives; (ii) Sampling units: households, then housewives; (iii) Extent: Gujarat State; (iv) Time: January 1-10, 1999. If we were to monitor the sales of a product recently introduced by us, the population might be: (i) Element: our product; (ii) Sampling units: retail outlets and supermarkets, then our product; (iii) Extent: Delhi and New Delhi; (iv) Time: January 7-14, 1999.
  7. It may be emphasized that all four of these specifications must be contained in the designated population; omission of any of them would render the definition of the population incomplete. Step 2: Identify the sampling frame. The sampling frame could be a telephone directory, a list of blocks and localities of a city, a map, or any other list consisting of all the sampling units. It may be pointed out that if the frame is incomplete or otherwise defective, sampling will not be able to overcome these shortcomings. The question is: how can we ensure that the frame is perfect and free from any defect? Leslie Kish has observed that a perfect frame is one where "every element appears on the list separately, once, only once, and nothing else appears on the list". Such a perfect frame would have a one-to-one correspondence between frame units and sampling units, but such perfect frames are rather rare. Accordingly, one has to use frames with one deficiency or another, but one should ensure that the frame is not so deficient that it has to be given up altogether. This raises a pertinent question: what are the criteria for a suitable frame? In order to examine the suitability of a sampling frame, a number of questions need to be asked: (1) Does it adequately cover the population to be surveyed? (2) How complete is the frame? Is every unit that should be included represented? (3) Is it accurate? Is the information about each individual unit correct? Does the frame as a whole contain units that no longer exist? (4) Is there any duplication? If so, the probability of selection is disturbed, as a unit can enter the sample more than once. (5) Is the frame up to date? It could have met all the criteria when compiled but be deficient when it comes to be used; this is likely to be true of all frames involving a human population, as change is taking place continuously. (6) How convenient is it to use? Is it readily accessible?
Is it arranged in a way suitable for sampling? Can it easily be rearranged so as to enable us to introduce stratification and to undertake multi-stage sampling? These are demanding criteria, and it is most unlikely that any frame would meet them all. Nevertheless, they are the factors to be borne in mind whenever we undertake random sampling.
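Several of the frame criteria above (coverage, accuracy, duplication) can be checked mechanically when a reference list of the true population is available, which in practice it often is not. The sketch below is a hypothetical helper, assuming units are identified by hashable IDs; a frame that is perfect in Kish's sense ("once, only once, and nothing else") returns three empty sets.

```python
from collections import Counter


def audit_frame(frame, population):
    """Compare a sampling frame with the true population (hypothetical helper).

    frame: list of unit IDs as they appear on the list (order irrelevant).
    population: set of unit IDs that really belong to the target population.
    """
    counts = Counter(frame)
    listed = set(counts)
    return {
        "missing": population - listed,                         # coverage gaps
        "duplicated": {u for u, c in counts.items() if c > 1},  # listed more than once
        "foreign": listed - population,                         # e.g. units that no longer exist
    }


frame = ["H1", "H2", "H2", "H4", "H9"]  # hypothetical household list
population = {"H1", "H2", "H3", "H4"}
print(audit_frame(frame, population))
# {'missing': {'H3'}, 'duplicated': {'H2'}, 'foreign': {'H9'}}
```

Duplicated units matter because, as criterion (4) notes, they raise those units' chance of selection; missing and foreign units correspond to criteria (2) and (3).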
  8. In marketing research, most frames come from census reports, electoral registers, lists of member units of trade and industry associations, lists of members of professional bodies, lists of dwelling units maintained by local bodies, returns from an earlier survey, and large-scale maps. Step 3: Specify the sampling unit. The sampling unit is the basic unit containing the elements of the target population. The sampling unit may be different from the element. For example, if one wanted a sample of housewives, it might be possible to have access to such a sample directly. However, it is easier to select households as the sampling unit and then interview housewives in each of the households. As mentioned in the preceding step, the sampling frame should be complete and accurate; otherwise the selection of the sampling unit might be defective. The sampling unit needs to be specified further in both personal and telephone interviews. Thus, in personal interviews, a pertinent question is: of the several persons in a household, who should be interviewed? If interviews were held during office hours, when heads of families and other employed persons are away, interviewing would under-represent employed persons and over-represent elderly persons, housewives and the unemployed. In view of these considerations, it is necessary to have a random process for selecting among the adult residents of each household. One method that could be used for this purpose is to list all the eligible persons living at a particular address and then select one of them at random. Step 4: Specify the sampling method. This step indicates how the sample units are selected. One of the most important decisions here is whether a probability or a non-probability sample is to be chosen. In a probability sample, the probability or chance of every unit in the population being in the sample is known.
Further, the selection of specific units in the sample depends entirely on chance. No substitution of one unit for another is permissible. This means that no human judgment is involved in the selection of a sample. In contrast, in a non-probability sample, the probability that any given unit of the population is included in the sample is not known. In addition, the selection of units within a sample involves human judgment rather than pure chance. In a probability sample, it is possible to measure the sampling error, and thereby determine the degree of precision of the estimates, with the help of the theory of probability.
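Step 3's suggestion of listing all eligible persons at an address and then selecting one of them at random can be sketched as follows. The function name and the use of Python's `random` module are illustrative assumptions; the source does not prescribe a particular selection table or procedure.

```python
import random


def select_respondent(eligible_adults, rng):
    """Choose one eligible adult from a household listing, each with chance 1/k.

    eligible_adults: list of the k eligible persons listed at the address.
    rng: a random.Random instance (seeded if the draw must be reproducible).
    """
    if not eligible_adults:
        raise ValueError("no eligible adult listed at this address")
    return rng.choice(eligible_adults)


rng = random.Random(42)
household = ["head of family (employed)", "spouse", "grandmother"]  # hypothetical listing
print(select_respondent(household, rng))
```

Because the interviewer no longer simply takes whoever is at home, employed members are not systematically excluded. If households themselves are drawn with equal probability, a person in a household with k eligible adults has a within-household selection chance of 1/k, which is why such samples are often weighted by k at the analysis stage.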
  9. This theory also enables us to consider, from among the various possible sample designs, the one that will give the maximum information per rupee. This is not possible when a non-probability sample is used. Probability sampling enables us to choose representative sample designs. It also enables us to estimate the extent to which the results based on such a sample are likely to differ from what we would have obtained had we covered the entire population in our study. Similarly, probability sampling enables us to determine the sample size needed for a given degree of precision, i.e., to ensure, with a stated level of confidence, that our sample results do not differ by more than a specified amount from those yielded by a study covering the entire population. Although non-probability sampling does not yield these benefits, it is often preferred to probability sampling on account of its convenience and economy. If the researcher is convinced that the risks involved in using a non-probability sample are more than offset by its being relatively cheap and convenient, the choice should be in favor of non-probability sampling. There are various types of sample designs that can be covered under the two broad groups: random or probability samples, and non-random or non-probability samples. Step 5: Determine the sample size. This means deciding how many elements of the target population are to be chosen. Step 6: Specify the sampling plan. This means that one should indicate how the decisions made so far are to be implemented. For example, if a survey of households is to be conducted, a sampling plan should define a household, contain instructions to the interviewer on how to take a systematic sample of households, advise the interviewer on what to do when no one is available at the time of the visit, and so on. These are some pertinent issues in a sample survey to which a sampling plan should provide answers. Step 7: Select the sample. This is the final step in the sampling process.
A good deal of office and fieldwork is involved in the actual selection of the sampling elements. Most of the problems in this stage are faced by the interviewer while contacting the sample-respondents.
  10. Why Sample? Sampling is done in a wide variety of research settings. Listed below are a few of the benefits of sampling: 1. Reduced cost: It is obviously less costly to obtain data for a selected subset of a population than for the entire population. Furthermore, data collected through a carefully selected sample can be highly accurate measures of the larger population: public opinion researchers can usually draw accurate inferences for the entire population of the United States from interviews of only 1,000 people. 2. Speed: Observations are easier to collect and summarize with a sample than with a complete count. This consideration may be vital if the speed of the analysis is important, as with exit polls in elections. 3. Greater scope: Sometimes highly trained personnel or specialized equipment of limited availability must be used to obtain the data, so a complete census (enumeration) is not practical or possible. Thus, surveys that rely on sampling have greater flexibility regarding the type of information that can be obtained. It is important to keep in mind that the primary point of sampling is to create a small group from a population that is as similar to the larger population as possible. In essence, we want a little group that is like the big group. With that in mind, one of the features we look for in a sample is the degree of representativeness: how well does the sample represent the larger population from which it was drawn? How closely do the features of the sample resemble those of the larger population? There are, of course, good and bad samples, and different sampling methods have different strengths and weaknesses. Before turning to specific methods, a few specialized terms used in sampling should be defined. Sampling Terminology: Samples are always drawn from a population, but we have not yet defined the term "population". By "population" we denote the aggregate from which the sample is drawn.
The population to be sampled (the sampled population) should coincide with the population about which information is wanted (the target population). Sometimes, for reasons of practicality or convenience, the sampled population is more restricted than the target population. In such cases, precautions must be taken to ensure that the conclusions refer only to the sampled population. Before selecting the sample, the population must be divided into parts called sampling units, or units. These units must cover the whole of the population and must not overlap, in the sense that every element in the population belongs to one and only one unit. Sometimes the choice of unit is obvious, as with the population of Americans so often used in opinion polling. In sampling individuals in a town, the unit might be an individual person, the members of a family, or all persons living in the same city block. In sampling an agricultural crop, the unit might be a field, a farm, or an area of land whose shape and dimensions are at our disposal. The construction of this list of sampling units, called a frame, is often one of the major practical problems.
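The claim in slide 10 that about 1,000 interviews can support inferences about a whole country can be checked with the usual margin-of-error formula for a proportion. This is a sketch assuming a simple random sample and the normal approximation, neither of which the slide states; real polls use more complex designs and weighting, which can widen the margin. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error_proportion(n, p=0.5, confidence=0.95):
    """Normal-approximation margin of error for a proportion from an SRS of size n.

    p = 0.5 gives the largest (most conservative) margin; the finite population
    correction is ignored, which is reasonable when n is tiny relative to N.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


print(round(margin_of_error_proportion(1000), 3))  # 0.031, i.e. about +/- 3 points
```

The population size does not appear in the formula once N is much larger than n, so the same 1,000 interviews give roughly the same precision for a large town or a whole country; quadrupling the sample only halves the margin.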
  11. Types of Sampling: Although there are a number of different methods that might be used to create a sample, they generally can be grouped into one of two categories: probability samples or non-probability samples. Probability Samples: The idea behind this type is random selection. More specifically, each sample from the population of interest has a known probability of selection under a given sampling scheme. Four types of probability samples are described below. Simple Random Sampling: The most widely known type of random sample is the simple random sample (SRS). It is characterized by the fact that the probability of selection is the same for every case in the population. Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn. An example may make this easier to understand. Imagine you want to carry out a survey of 100 voters in a small town with a population of 1,000 eligible voters. With a town this size, there are "old-fashioned" ways to draw a sample. For example, we could write the names of all voters on pieces of paper, put them into a box and draw 100 slips at random: shake the box, draw a slip and set it aside, shake again, draw another, set it aside, and so on until we have 100 slips of paper. These 100 form our sample. This sample is drawn through a simple random sampling procedure: at each draw, every name remaining in the box has the same probability of being chosen. In real-world social research, designs that employ simple random sampling are difficult to come by. We can imagine some situations where it might be possible: for example, interviewing a sample of doctors in a hospital about their work conditions.
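The box-drawing procedure in the voter example can be mimicked with a pseudo-random number generator. This is a minimal sketch, assuming the sampling frame fits in a Python list; the voter IDs and the function name are invented for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely.

    This is the electronic equivalent of drawing slips from a box and setting
    each one aside.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


voters = [f"voter_{i:04d}" for i in range(1, 1001)]  # hypothetical list of 1,000 voters
sample = simple_random_sample(voters, 100, seed=2024)
print(len(sample), len(set(sample)))  # 100 100
```

`random.Random.sample` draws without replacement, like setting each slip aside, and fixing the seed makes the draw reproducible for auditing.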
In the hospital example, you would get a list of all the physicians who work in the hospital, write their names on pieces of paper, put those pieces of paper in the box, shake and draw. But in most real-world instances it is impossible to list everything on a piece of paper, put it in a box, and then randomly draw until the desired sample size is reached. There are many reasons why one would choose a different type of probability sample in practice. Stratified Random Sampling: In this form of sampling, the population is first divided into two or more mutually exclusive segments based on categories of variables of interest in the research. It is designed to organize the population into homogeneous subsets before sampling, and then to draw a random sample within each subset. With stratified random sampling, the population of $N$ units is divided into subpopulations of $N_1, N_2, \ldots, N_L$ units, respectively. These subpopulations, called strata, are non-overlapping, and together they comprise the whole of the population. When these have been determined, a sample is drawn from each, with a separate draw for each of the different strata.
  12. The sample sizes within the strata are denoted by $n_1, n_2, \ldots, n_L$, respectively. If an SRS is taken within each stratum, then the whole sampling procedure is described as stratified random sampling. The primary benefit of this method is to ensure that cases from smaller strata of the population are included in sufficient numbers to allow comparison. An example makes this easier to understand. Say that you're interested in how job satisfaction varies by race among a group of employees at a firm. To explore this issue, we need to create a sample of the employees of the firm. However, the employee population at this particular firm is predominantly White. If we were to take a simple random sample of employees, there's a good chance that we would end up with very small numbers of Blacks, Asians, and Latinos. That could be disastrous for our research, since we might end up with too few cases for comparison in one or more of the smaller groups. Rather than taking a simple random sample from the firm's population at large, in a stratified sampling design we ensure that appropriate numbers of elements are drawn from each racial group in proportion to its percentage of the population as a whole. Say we want a sample of 1,000 employees: we would stratify the sample by race (a group of White employees, a group of African American employees, etc.), then randomly draw 750 employees from the White group, 90 from the African American group, 100 from the Asian group, and 60 from the Latino group. This yields a sample that is proportionately representative of the firm as a whole. Stratification is a common technique, for many reasons, such as: 1. If data of known precision are wanted for certain subpopulations, then each of these should be treated as a population in its own right. 2.
Administrative convenience may dictate the use of stratification; for example, an agency administering a survey may have regional offices, each of which can supervise the survey for a part of the population. 3. Sampling problems may be inherent in certain subpopulations, such as people living in institutions (e.g., hotels, hospitals, prisons). 4. Stratification may improve the estimates of characteristics of the whole population. It may be possible to divide a heterogeneous population into subpopulations, each of which is internally homogeneous.
  13. If these strata are homogeneous, i.e., the measurements vary little from one unit to another, a precise estimate of any stratum mean can be obtained from a small sample in that stratum. These estimates can then be combined into a precise estimate for the whole population. 5. There is also a statistical advantage in the method, as a stratified random sample nearly always results in a smaller variance for the estimated mean or other population parameters of interest. Systematic Sampling: This method of sampling is at first glance very different from SRS. In practice, it is a variant of simple random sampling that involves some listing of elements: every k-th element of the list is drawn for inclusion in the sample. Say you have a list of 10,000 people and you want a sample of 1,000. Creating such a sample involves three steps: 1. Divide the number of cases in the population by the desired sample size. In this example, dividing 10,000 by 1,000 gives a value of 10. 2. Select a random number between one and the value obtained in Step 1. In this example, we choose a number between 1 and 10; say we pick 7. 3. Starting with the case number chosen in Step 2, take every tenth record (7, 17, 27, etc.). More generally, suppose that the N units in the population are ranked 1 to N in some order (e.g., alphabetic). To select a sample of n units, we take a unit at random from the first k units, where k = N/n, and take every k-th unit thereafter. The advantages of systematic sampling over simple random sampling include: 1. It is easier to draw a sample, and often easier to execute without mistakes. This is a particular advantage when the drawing is done in the field. 2. Intuitively, you might think that systematic sampling might be more precise than SRS. In effect it stratifies the population into n strata, consisting of the first k units, the second k units, and so on.
Thus, we might expect the systematic sample to be as precise as a stratified random sample with one unit per stratum. The difference is that in the systematic sample the units occur at the same relative position in each stratum, whereas in the stratified sample the position in each stratum is determined separately by randomization within that stratum. Cluster Sampling: In some instances the sampling unit consists of a group or cluster of smaller units that we call elements or subunits (these are the units of analysis for your study). There are two main reasons for the widespread application of cluster sampling. First, although the original intention may be to use the elements as sampling units, it is found in many surveys that no reliable list of elements in the population is available and that it would be prohibitively expensive to construct such a list. In many countries there are no complete and updated lists of the people, the houses or the farms in any large geographical region.
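The stratified and systematic procedures described in slides 12 and 13 can be sketched together. This is a minimal illustration under stated assumptions, not the author's method: the frame is an in-memory Python list, allocation is proportional with largest-remainder rounding (one of several reasonable rounding rules), all function names are hypothetical, and the employee counts are invented solely so that a 1,000-person sample reproduces the slide's 750/90/100/60 split.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample size n to strata in proportion to N_h / N.

    Largest-remainder rounding makes the allocations sum to exactly n.
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    shortfall = n - sum(alloc.values())
    # Give the units lost to rounding down to the strata with the largest fractional parts.
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


def stratified_random_sample(strata, n, rng):
    """strata: dict stratum -> list of units. Draws a separate SRS within each stratum."""
    alloc = proportional_allocation({h: len(units) for h, units in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}


def stratified_mean(stratum_means, stratum_sizes):
    """Combine stratum sample means: the sum over strata of (N_h / N) * mean_h."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_sizes)


def systematic_sample(frame, n, rng):
    """Random start among the first k units, then every k-th unit, with k = N // n.

    If N is not an exact multiple of n, the sample may hold slightly more than n units.
    """
    k = len(frame) // n
    start = rng.randint(1, k)   # e.g. 7 when k = 10
    return frame[start - 1::k]  # positions start, start + k, start + 2k, ...


rng = random.Random(7)
employees = {  # hypothetical firm of 10,000 employees, sized to match the slide's split
    "White": [f"W{i}" for i in range(7500)],
    "African American": [f"B{i}" for i in range(900)],
    "Asian": [f"A{i}" for i in range(1000)],
    "Latino": [f"L{i}" for i in range(600)],
}
sample = stratified_random_sample(employees, 1000, rng)
print({h: len(units) for h, units in sample.items()})
# {'White': 750, 'African American': 90, 'Asian': 100, 'Latino': 60}

people = list(range(1, 10_001))  # the list of 10,000 people, numbered 1..N
chosen = systematic_sample(people, 1000, rng)
print(len(chosen), chosen[1] - chosen[0])  # 1000 10
```

`stratified_mean` implements the combination step mentioned under reason 4: each stratum's sample mean is weighted by that stratum's share N_h/N of the population.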
  14. Second, even when a list of individual houses is available, economic considerations may point to the choice of a larger cluster unit. For a given sample size, a small unit usually gives more precise results than a large unit. For example, an SRS of 600 houses covers a town more evenly than 20 city blocks containing an average of 30 houses apiece. But greater field costs are incurred in locating 600 houses and in traveling between them than in covering 20 city blocks. When cost is balanced against precision, the larger unit may prove superior. Important things about cluster sampling: 1. Most large-scale surveys are done using cluster sampling. 2. Clustering may be combined with stratification, typically by clustering within strata. 3. In general, for a given sample size n, cluster samples are less precise than the other types of sampling, in the sense that the parameters you estimate will have greater variability than with an SRS, stratified random or systematic sample.
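The city-block comparison above can be made concrete as one-stage cluster sampling: drawing 20 blocks and taking every house in them yields 600 houses but only 20 locations to visit. The sketch assumes a dictionary of hypothetical block and house IDs with equal-sized blocks; real surveys often subsample houses within blocks (multi-stage sampling) or select blocks with probability proportional to size, neither of which is shown.

```python
import random


def cluster_sample(clusters, m, rng):
    """One-stage cluster sample: SRS of m clusters, keeping every element in each.

    clusters: dict cluster ID -> list of elements (e.g. city block -> houses).
    """
    chosen = rng.sample(sorted(clusters), m)
    return [element for c in chosen for element in clusters[c]]


rng = random.Random(3)
# Hypothetical town: 200 city blocks with exactly 30 houses each.
town = {
    f"block{b:03d}": [f"block{b:03d}-house{h:02d}" for h in range(30)]
    for b in range(200)
}
houses = cluster_sample(town, 20, rng)
print(len(houses))  # 600
```

Houses on the same block tend to resemble one another, so 600 houses from 20 blocks usually carry less information than 600 scattered houses, which is the loss of precision noted in point 3.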
  15. Nonprobability Sampling: Social research is often conducted in situations where a researcher cannot select the kinds of probability samples used in large-scale social surveys. For example, say you wanted to study homelessness: there is no list of homeless individuals, nor are you likely to create such a list. However, you need to get some kind of sample of respondents in order to conduct your research. To gather such a sample, you would likely use some form of non-probability sampling. To reiterate, the primary difference between probability and non-probability methods of sampling is that in the latter you do not know the likelihood that any element of a population will be selected for study. There are four primary types of non-probability sampling methods: Availability Sampling: Availability sampling is a method of choosing subjects who are available or easy to find. This method is also sometimes referred to as haphazard, accidental, or convenience sampling. The primary advantage of the method is that it is very easy to carry out, relative to other methods. A researcher can merely stand on his/her favorite street corner or in his/her favorite tavern and hand out surveys. One place this used to show up often is in university courses. Years ago, researchers would often conduct surveys of students in their large lecture courses. For example, all students taking introductory sociology courses would have been given a survey and compelled to fill it out. There are some advantages to this design: it is easy to do, particularly with a captive audience, and in some schools you can attain a large number of interviews through this method. The primary problem with availability sampling is that you can never be certain what population the participants in the study represent. The population is unknown, the method for selecting cases is haphazard, and the cases studied probably don't represent any population you could come up with.
However, there are some situations in which this kind of design has advantages. For example, survey designers often want to have some people respond to their survey before it is given out in the "real" research setting, as a way of making certain the questions make sense to respondents. For this purpose, availability sampling is not a bad way to get a group to take a survey, though in this case researchers care less about the specific responses given than about whether the instrument is confusing or makes people feel bad. Despite the known flaws of this design, it is remarkably common: ask a provocative question, give a telephone number and web site address ("Vote now at ..."), and announce the results of the poll. This method provides some form of statistical data on a current issue, but it is entirely unknown what population the results of such polls represent. At best, a researcher could make some conditional statement about people who were watching CNN at a particular point in time and who cared enough about the issue in question to log on or call in.
  16. Quota Sampling: Quota sampling is designed to overcome the most obvious flaw of availability sampling. Rather than taking just anyone, you set quotas to ensure that the sample you get represents certain characteristics in proportion to their prevalence in the population. Note that for this method, you have to know something about the characteristics of the population ahead of time. Say you want to make sure you have a sample proportional to the population in terms of gender: you have to know what percentage of the population is male and female, then collect a sample until yours matches. Marketing studies are particularly fond of this form of research design. The primary problem with this form of sampling is that even when we know that a quota sample is representative of the particular characteristics for which quotas have been set, we have no way of knowing whether the sample is representative in terms of any other characteristics. If we set quotas for gender and age, we are likely to attain a sample with good representativeness on age and gender, but one that may not be very representative in terms of income, education or other factors. Moreover, because researchers can set quotas for only a small fraction of the characteristics relevant to a study, quota sampling is really not much better than availability sampling. To reiterate, you must know the characteristics of the entire population to set quotas; otherwise there's not much point to setting up quotas. Finally, interviewers often introduce bias when allowed to self-select respondents, which is usually the case in this form of research. In choosing males aged 18-25, interviewers are more likely to choose those who are better dressed, seem more approachable or less threatening. That may be understandable from a practical point of view, but it introduces bias into research findings. Purposive Sampling: Purposive sampling is a sampling method in which elements are chosen based on the purpose of the study.
Purposive sampling may involve studying the entire population of some limited group (sociology faculty at Columbia) or a subset of a population (Columbia faculty who have won Nobel Prizes). As with other non-probability sampling methods, purposive sampling does not produce a sample that is representative of a larger population, but it can be exactly what is needed in some cases, such as the study of an organization, community, or some other clearly defined and relatively limited group. Snowball Sampling: Snowball sampling is a method in which a researcher identifies one member of some population of interest, speaks to him/her, then asks that person to identify others in the population that the researcher might speak to. Each of these people is then asked to refer the researcher to yet another person, and so on. Snowball sampling is very good for cases where members of a special population are difficult to locate. For example, several studies of Mexican migrants in Los Angeles have used snowball sampling to get respondents.
  17. The method also has an interesting application to group membership: if you want to look at the pattern of recruitment to a community organization over time, you might begin by interviewing fairly recent recruits, asking them who introduced them to the group. Then interview the people named, asking them who recruited them to the group. The method creates a sample of questionable representativeness. A researcher is not sure who is in the sample. In effect, snowball sampling often leads the researcher into a realm he/she knows little about, and it can be difficult to determine how the sample compares to a larger population. There is also the issue of whom respondents refer you to: friends refer you to friends, and respondents are less likely to refer you to people they dislike or fear.
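Two of the non-probability procedures described above are mechanical enough to sketch: filling quota cells from whoever turns up, and following referral chains outward from initial contacts. Both functions and the toy data are hypothetical illustrations; coding a procedure does not make the resulting sample any more representative, for the reasons the text gives.

```python
from collections import deque


def quota_sample(stream, quotas, cell_of):
    """Accept respondents in the order they are met until every quota cell is full.

    stream: iterable of respondents, in the order the interviewer encounters them.
    quotas: dict cell -> number of respondents wanted in that cell.
    cell_of: function mapping a respondent to a quota cell (e.g. gender).
    """
    filled = {cell: [] for cell in quotas}
    for person in stream:
        cell = cell_of(person)
        if cell in filled and len(filled[cell]) < quotas[cell]:
            filled[cell].append(person)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break  # all quotas met: stop interviewing
    return filled


def snowball_sample(seeds, referrals, max_size):
    """Follow referral chains outward from seed respondents, wave by wave.

    referrals: dict person -> list of people that person names.
    """
    sample, seen, queue = [], set(seeds), deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for named in referrals.get(person, []):
            if named not in seen:  # never interview the same person twice
                seen.add(named)
                queue.append(named)
    return sample


passers_by = [("M", 1), ("M", 2), ("F", 3), ("M", 4), ("F", 5), ("F", 6)]
print(quota_sample(passers_by, {"M": 2, "F": 2}, cell_of=lambda p: p[0]))
# {'M': [('M', 1), ('M', 2)], 'F': [('F', 3), ('F', 5)]}

named_by = {"A": ["B", "C"], "B": ["D", "A"], "C": ["D"], "D": ["E"]}
print(snowball_sample(["A"], named_by, max_size=10))  # ['A', 'B', 'C', 'D', 'E']
```

In `quota_sample`, the order of `stream` stands in for the interviewer's choices, which is exactly where the selection bias described under quota sampling enters. `snowball_sample` is a breadth-first traversal of the referral network, so who ends up in the sample depends entirely on whom the seeds know.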