
# Sampling Technique



1. Topic: Report on sampling techniques. Prepared by Rajesh Narayanan, SRM University.
2. What is sampling? Sampling is a process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, and may include simple random sampling, systematic sampling and observational sampling. In other words, sampling is the technique of drawing samples, i.e., of collecting data on only a part of the population in order to reveal the characteristics of the entire population.

   Purpose of sampling: The purpose of sampling is to provide various types of statistical information, of a qualitative or quantitative nature, about the whole by examining a few selected units. The sampling method is the scientific procedure of selecting those sampling units that would provide the required estimates with associated margins of uncertainty, arising from examining only a part and not the whole.

   Reasons for sampling instead of a census (need for sampling): There are six reasons for sampling: (1) economy, (2) timeliness, (3) the large size of many populations, (4) inaccessibility of the entire population, (5) the destructive nature of many observations, and (6) reliability or accuracy.

   (1) Economy: The unit cost of collecting data is significantly lower in a census than in a sample survey; for example, it might be 200 taka per unit in a census but 1,000 taka per unit in a sample survey. However, because a census covers far more units, its total cost is significantly higher than that of a sample survey. The total cost of a census is the population size N multiplied by the census unit cost, and the total cost of a sample survey is the sample size n multiplied by the sample unit cost. With N = 10,00,000 and n = 5,000:
   - Census: 10,00,000 × 200 = 20,00,00,000 taka
   - Sample: 5,000 × 1,000 = 50,00,000 taka
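
The cost comparison above is simple arithmetic: total cost is the number of units covered multiplied by the unit cost. A minimal Python sketch using the slide's figures (N = 10,00,000, n = 5,000, and unit costs of 200 and 1,000 taka) makes the gap explicit; the variable and function names are illustrative only.

```python
# Figures from the slide; costs are in taka.
POPULATION_SIZE = 1_000_000  # N, written 10,00,000 in Indian digit grouping
SAMPLE_SIZE = 5_000          # n
CENSUS_UNIT_COST = 200       # cost per unit in a census
SAMPLE_UNIT_COST = 1_000     # cost per unit in a sample survey


def total_cost(units: int, unit_cost: int) -> int:
    """Total cost of collecting information on `units` units."""
    return units * unit_cost


census_total = total_cost(POPULATION_SIZE, CENSUS_UNIT_COST)
sample_total = total_cost(SAMPLE_SIZE, SAMPLE_UNIT_COST)
print(f"Census: {census_total:,} taka")                        # Census: 200,000,000 taka
print(f"Sample: {sample_total:,} taka")                        # Sample: 5,000,000 taka
print(f"Census costs {census_total // sample_total}x as much")  # Census costs 40x as much
```

Python's `,` format specifier groups digits in thousands, so 20,00,00,000 and 50,00,000 in the slide's Indian grouping appear as 200,000,000 and 5,000,000.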
3. (2) Timeliness: The time spent per unit is greater in a sample survey than in a census, but because the population is so large, the total time required for a census is significantly higher than for a sample survey.

   (3) Large size of many populations: In some cases the population is extremely large, and not all of its members can be traced, for example because of travel, disease, death, mental abnormality or imprisonment. In that situation the only way to conduct the research is to collect data through a sample survey.

   (4) Inaccessibility of the entire population: In some cases the entire population is not accessible, for example after an aircraft crash, and sampling is then necessary.

   (5) Destructive nature of many observations: Because many observations destroy the unit being examined, one is compelled to collect information on only a part of the population. Examples are a blood test on a patient and measuring the life (in hours) of a tube light.

   (6) Reliability: By using a scientific sampling technique one can minimize the sampling error, and because qualified investigators are employed, the non-sampling error in a sample survey is also small. The non-sampling error in a census is usually much higher than the total sampling and non-sampling error in a sample survey, because less qualified investigators are involved in a census and its supervision, monitoring and quality-control mechanisms are weaker. Error and reliability are inversely related: as error decreases, reliability increases. A well-designed sample survey keeps both the sampling and the non-sampling error small, and so enhances the reliability of the information.

   How to determine sample size: In order to prove that a process has been improved, you must measure the process capability before and after improvements are implemented. This allows you to quantify the process improvement (e.g., defect reduction or productivity increase) and translate the effects into an estimated financial result, something business leaders can understand and appreciate. If data is not readily available for the process, how many members of the population should be selected to ensure that the population is properly represented?
4. If data has been collected, how do you determine whether you have enough data? Determining sample size is a very important issue, because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. In many cases, we can easily determine the minimum sample size needed to estimate a process parameter, such as the population mean $\mu$.

   When sample data is collected and the sample mean $\bar{x}$ is calculated, that sample mean is typically different from the population mean $\mu$. This difference between the sample and population means can be thought of as an error. The margin of error $E$ is the maximum difference between the observed sample mean $\bar{x}$ and the true value of the population mean $\mu$:

   $$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

   where $z_{\alpha/2}$ is the critical value, the positive $z$ value at the vertical boundary for the area of $\alpha/2$ in the right tail of the standard normal distribution; $\sigma$ is the population standard deviation; and $n$ is the sample size.

   Rearranging this formula (solving for $\sqrt{n}$ and squaring both sides), we can solve for the sample size necessary to produce results accurate to a specified confidence and margin of error:

   $$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

   This formula can be used when you know $\sigma$ and want to determine the sample size necessary to establish, with a confidence of $1-\alpha$, the mean value $\mu$ to within $\pm E$. You can still use this formula if you don't know your population standard deviation $\sigma$ and you have a small sample size. Although it is unlikely that you know $\sigma$ when the population mean $\mu$ is not known, you may be able to determine $\sigma$ from a similar process or from a pilot test/simulation.
5. Let's put all this statistical mumbo-jumbo to work with a sample size calculation example.

   Problem: We would like to start an Internet service provider (ISP) and need to estimate the average Internet usage of households in one week for our business plan and model. How many households must we randomly select to be 95 percent sure that the sample mean is within 1 minute of the population mean $\mu$? Assume that a previous survey of household usage has shown $\sigma = 6.95$ minutes.

   Solution: We are solving for the sample size $n$. A 95% degree of confidence corresponds to $\alpha = 0.05$. Each of the two shaded tails of the standard normal curve has an area of $\alpha/2 = 0.025$. The region to the left of $z_{\alpha/2}$ and to the right of $z = 0$ therefore has an area of $0.5 - 0.025 = 0.475$. In the table of the standard normal ($z$) distribution, an area of 0.475 corresponds to a $z$ value of 1.96, so the critical value is $z_{\alpha/2} = 1.96$. The margin of error is $E = 1$ and the standard deviation is $\sigma = 6.95$. Using the formula for sample size, we can calculate $n$:

   $$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 6.95}{1}\right)^2 = 13.622^2 \approx 185.56$$

   So we will need to sample at least 186 (rounded up) randomly selected households. With this sample we will be 95 percent confident that the sample mean $\bar{x}$ will be within 1 minute of the true population mean $\mu$ of Internet usage.
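
The ISP calculation can be checked with a short Python sketch of the formula n = (z σ / E)². It works under the same assumption as the slide, namely that σ is known or borrowed from an earlier survey, and the function names are hypothetical. Instead of a printed z table, it uses `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8+) to obtain the critical value, which is about 1.95996 for 95 percent confidence.

```python
import math
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Positive z leaving an area of alpha/2 in the right tail of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    return critical_value(confidence) * sigma / math.sqrt(n)


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n with margin_of_error(sigma, n) <= margin; assumes sigma is known."""
    n = (critical_value(confidence) * sigma / margin) ** 2
    return math.ceil(n)  # a fraction of a household cannot be sampled, so round up


# ISP example: sigma = 6.95 minutes, E = 1 minute, 95 percent confidence.
n = sample_size_for_mean(sigma=6.95, margin=1.0)
print(n)                                   # 186
print(round(margin_of_error(6.95, n), 4))  # 0.9988, just inside E = 1
```

Using the rounded table value 1.96 instead gives 185.56 before rounding up, so both routes arrive at the same 186 households.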
6. Steps in the sampling process: The sampling process is the procedure required right from defining a population to the actual selection of sample elements. There are seven steps involved in this process.

   Step 1: Define the population. The population is the aggregate of all the elements, defined prior to the selection of the sample. It is necessary to define the population in terms of (i) elements, (ii) sampling units, (iii) extent and (iv) time. A few examples are given here. If we were to conduct a survey on the consumption of tea in Gujarat, these specifications might be: (i) element: housewives; (ii) sampling units: households, then housewives; (iii) extent: Gujarat State; (iv) time: January 1-10, 1999. If we were to monitor the sales of a product we recently introduced, the population might be: (i) element: our product; (ii) sampling units: retail outlets and supermarkets, then our product; (iii) extent: Delhi and New Delhi; (iv) time: January 7-14, 1999.
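
The four specifications of Step 1 can be captured in a small record type, which makes it easy to check that none of them has been omitted. This Python sketch is illustrative only: the class and method names are hypothetical, and the two instances simply restate the tea-consumption and product-sales examples.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PopulationDefinition:
    """The four specifications that together define a population (hypothetical type)."""
    element: str
    sampling_units: str
    extent: str
    time: str

    def missing(self) -> List[str]:
        """Names of any specifications that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


tea_survey = PopulationDefinition(
    element="Housewives",
    sampling_units="Households, then housewives",
    extent="Gujarat State",
    time="January 1-10, 1999",
)
product_sales = PopulationDefinition(
    element="Our product",
    sampling_units="Retail outlets, supermarkets, then our product",
    extent="Delhi and New Delhi",
    time="January 7-14, 1999",
)
print(tea_survey.missing())  # []

incomplete = PopulationDefinition("Our product", "Retail outlets", "", "January 7-14, 1999")
print(incomplete.missing())  # ['extent']
```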
7. It may be emphasized that all four of these specifications must be contained in the designated population. Omission of any of them would render the definition of the population incomplete.

   Step 2: Identify the sampling frame. The sampling frame could be a telephone directory, a list of blocks and localities of a city, a map or any other list consisting of all the sampling units. If the frame is incomplete or otherwise defective, sampling will not be able to overcome these shortcomings. The question is: how can we ensure that the frame is perfect and free from any defect? Leslie Kish has observed that a perfect frame is one where "every element appears on the list separately, once and only once, and nothing else appears on the list." Such a frame would have a one-to-one correspondence between frame units and sampling units, but perfect frames of this kind are rare. Accordingly, one has to use frames with one deficiency or another, but should ensure that the frame is not so deficient that it has to be given up altogether.

   This raises a pertinent question: what are the criteria for a suitable frame? To examine the suitability of a sampling frame, a number of questions need to be asked:
   1. Does it adequately cover the population to be surveyed?
   2. How complete is the frame? Is every unit that should be included represented?
   3. Is it accurate? Is the information about each individual unit correct? Does the frame contain units that no longer exist?
   4. Is there any duplication? If so, the probability of selection is disturbed, as a unit can enter the sample more than once.
   5. Is the frame up to date? It could have met all the criteria when compiled but be deficient when it comes to be used. This is likely to be true of all frames involving human populations, as change takes place continuously.
   6. How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for sampling? Can it easily be rearranged to enable us to introduce stratification and to undertake multi-stage sampling?

   These are demanding criteria, and it is most unlikely that any frame would meet them all. Nevertheless, they are the factors to bear in mind whenever we undertake random sampling.
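
The duplication criterion can be made concrete. If a unit is listed twice and one entry is drawn at random, that unit's chance of selection doubles relative to the others. The Python sketch below uses a hypothetical five-entry frame of households to compute those single-draw probabilities, flag duplicates, and show that removing the repeated entry restores equal chances. Exact fractions are used so the probabilities print cleanly; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List


def single_draw_probabilities(frame: List[str]) -> Dict[str, Fraction]:
    """Chance that each distinct unit is chosen when one frame entry is drawn at random."""
    counts = Counter(frame)
    return {unit: Fraction(count, len(frame)) for unit, count in counts.items()}


def duplicated_units(frame: List[str]) -> List[str]:
    """Units that appear more than once in the frame."""
    return sorted(unit for unit, count in Counter(frame).items() if count > 1)


# Hypothetical frame of four households in which "H3" was listed twice.
frame = ["H1", "H2", "H3", "H3", "H4"]
print(duplicated_units(frame))  # ['H3']

probs = single_draw_probabilities(frame)
print(probs["H3"], probs["H1"])  # 2/5 1/5

deduplicated = list(dict.fromkeys(frame))  # keep the first occurrence, preserve order
print(single_draw_probabilities(deduplicated)["H3"])  # 1/4
```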
8. In marketing research, most frames come from census reports, electoral registers, lists of member units of trade and industry associations, lists of members of professional bodies, lists of dwelling units maintained by local bodies, returns from earlier surveys, and large-scale maps.

   Step 3: Specify the sampling unit. The sampling unit is the basic unit containing the elements of the target population, and it may be different from the element. For example, if one wanted a sample of housewives, it might be possible to have access to such a sample directly. However, it is easier to select households as the sampling unit and then interview the housewife in each household. As mentioned in the preceding step, the sampling frame should be complete and accurate; otherwise the selection of the sampling unit might be defective. The sampling unit also needs to be specified further, in both personal and telephone interviews. Thus, in personal interviews, a pertinent question is: of the several persons in a household, who should be interviewed? If interviews were held during office hours, when the heads of families and other employed persons are away, interviewing would under-represent employed persons and over-represent elderly persons, housewives and the unemployed. In view of these considerations, it is necessary to have a random process for selecting among the adult residents of each household. One method that could be used for this purpose is to list all the eligible persons living at a particular address and then select one of them at random.

   Step 4: Specify the sampling method. This indicates how the sample units are selected. One of the most important decisions here is which of the two, a probability or a non-probability sample, is to be chosen. In a probability sample, the probability (chance) of each unit in the population being included in the sample is known. Further, the selection of specific units depends entirely on chance, and no substitution of one unit for another is permitted. This means that no human judgment is involved in the selection of the sample. In contrast, in a non-probability sample the probability of any unit in the population being included in the sample is not known, and the selection of units involves human judgment rather than pure chance. With a probability sample, it is possible to measure the sampling error and thereby determine the degree of precision of the estimates with the help of the theory of probability.
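
The selection method at the end of Step 3 (list every eligible person at the address, then pick one at random) can be sketched in Python as follows. The household members are hypothetical, the function name is illustrative, and `random.Random` is seeded only so that a run can be repeated; the respondent printed depends on the seed, so no output is shown.

```python
import random
from collections import Counter
from typing import List


def select_respondent(eligible_adults: List[str], rng: random.Random) -> str:
    """Pick one eligible adult at random, so each has the same chance of selection."""
    if not eligible_adults:
        raise ValueError("no eligible adult listed at this address")
    return rng.choice(eligible_adults)


rng = random.Random(2024)  # fixed seed for reproducibility only
household = ["head of family (employed)", "spouse", "adult son (employed)", "grandmother"]
print(select_respondent(household, rng))

# Across many households of this shape, each listed member is chosen about 1/4 of the
# time, instead of the interview going to whoever happens to be at home.
tally = Counter(select_respondent(household, rng) for _ in range(40_000))
print({person: round(count / 40_000, 2) for person, count in tally.items()})
```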
9. This theory also enables us to consider, from amongst the various possible sample designs, the one that will give the maximum information per rupee. This is not possible when a non-probability sample is used. Probability sampling enables us to choose representative sample designs. It also enables us to estimate the extent to which results based on such a sample are likely to differ from what we would have obtained had we covered the whole population. Conversely, probability sampling enables us to determine the sample size needed for a given degree of precision, so that our sample results do not differ by more than a specified amount from those yielded by a study covering the entire population. Although non-probability sampling does not yield these benefits, it is often preferred to probability sampling on account of its convenience and economy. If the researcher is convinced that the risks involved in using a non-probability sample are more than offset by its being relatively cheap and convenient, the choice should be in favor of non-probability sampling. The various types of sample designs fall under two broad groups: random (probability) samples and non-random (non-probability) samples.

   Step 5: Determine the sample size. One has to decide how many elements of the target population are to be chosen.

   Step 6: Specify the sampling plan. This means that one should indicate how the decisions made so far are to be implemented. For example, if a survey of households is to be conducted, a sampling plan should define a household, instruct the interviewer on how to take a systematic sample of households, advise him on what to do when no one is available when he visits a household, and so on. These are some of the pertinent issues in a sample survey to which a sampling plan should provide answers.

   Step 7: Select the sample. This is the final step in the sampling process. A good deal of office and fieldwork is involved in the actual selection of the sampling elements. Most of the problems at this stage are faced by the interviewer while contacting the sample respondents.
10. Why sample? Sampling is done in a wide variety of research settings. Listed below are a few of the benefits of sampling:
    1. Reduced cost: It is obviously less costly to obtain data for a selected subset of a population than for the entire population. Furthermore, data collected through a carefully selected sample can be highly accurate measures of the larger population. Public opinion researchers can usually draw accurate inferences for the entire population of the United States from interviews of only 1,000 people.
    2. Speed: Observations are easier to collect and summarize with a sample than with a complete count. This consideration may be vital if the speed of the analysis is important, as with exit polls in elections.
    3. Greater scope: Sometimes highly trained personnel or specialized equipment of limited availability must be used to obtain the data, so a complete census (enumeration) is not practical or possible. Thus, surveys that rely on sampling have greater flexibility regarding the type of information that can be obtained.

    It is important to keep in mind that the primary point of sampling is to create a small group from a population that is as similar to the larger population as possible. In essence, we want a little group that is like the big group. With that in mind, one of the features we look for in a sample is the degree of representativeness: how well does the sample represent the larger population from which it was drawn? How closely do the features of the sample resemble those of the larger population? There are, of course, good and bad samples, and different sampling methods have different strengths and weaknesses. Before turning to specific methods, a few specialized terms used in sampling should be defined.

    Sampling terminology: Samples are always drawn from a population, but we have not yet defined the term "population." By "population" we denote the aggregate from which the sample is drawn. The population to be sampled (the sampled population) should coincide with the population about which information is wanted (the target population). Sometimes, for reasons of practicality or convenience, the sampled population is more restricted than the target population. In such cases, precautions must be taken to ensure that the conclusions refer only to the sampled population.

    Before selecting the sample, the population must be divided into parts called sampling units, or units. These units must cover the whole of the population and must not overlap, in the sense that every element in the population belongs to one and only one unit. Sometimes the choice of unit is obvious, as in the case of the population of Americans so often used for opinion polling. In sampling individuals in a town, the unit might be an individual person, the members of a family, or all persons living in the same city block. In sampling an agricultural crop, the unit might be a field, a farm, or an area of land whose shape and dimensions are at our disposal. The construction of this list of sampling units, called a frame, is often one of the major practical problems.
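
The requirement that sampling units cover the whole population without overlapping can be checked mechanically. This Python sketch, with a hypothetical five-person town grouped into families and an illustrative function name, returns True only when every element belongs to exactly one unit.

```python
from collections import Counter
from typing import Iterable, List


def is_valid_partition(population: Iterable[str], units: List[List[str]]) -> bool:
    """True if the units cover every element of the population exactly once."""
    counts = Counter(element for unit in units for element in unit)
    covers_whole_population = set(counts) == set(population)  # nothing missing or extra
    no_overlap = all(count == 1 for count in counts.values())  # no element in two units
    return covers_whole_population and no_overlap


town = ["Asha", "Ben", "Chitra", "Dev", "Esha"]
families = [["Asha", "Ben"], ["Chitra"], ["Dev", "Esha"]]
overlapping = [["Asha", "Ben"], ["Ben", "Chitra"], ["Dev", "Esha"]]
incomplete = [["Asha", "Ben"], ["Chitra"]]

print(is_valid_partition(town, families))     # True
print(is_valid_partition(town, overlapping))  # False
print(is_valid_partition(town, incomplete))   # False
```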
11. Types of sampling: Although there are a number of different methods that might be used to create a sample, they can generally be grouped into one of two categories: probability samples or non-probability samples.

    Probability samples: The idea behind this type is random selection. More specifically, each sample from the population of interest has a known probability of selection under a given sampling scheme. There are four categories of probability samples, described below.

    Simple random sampling: The most widely known type of random sample is the simple random sample (SRS). It is characterized by the fact that the probability of selection is the same for every case in the population. Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn. An example may make this easier to understand. Imagine you want to carry out a survey of 100 voters in a small town with a population of 1,000 eligible voters. With a town this size, there are "old-fashioned" ways to draw a sample. For example, you could write the names of all voters on pieces of paper, put them into a box and draw 100 at random: shake the box, draw a piece of paper and set it aside, shake again, draw another, set it aside, and so on until you have 100 slips of paper. These 100 form the sample, and it is drawn through a simple random sampling procedure: at each draw, every name remaining in the box has the same probability of being chosen. In real-world social research, designs that employ simple random sampling are difficult to come by. We can imagine some situations where it might be possible, for example if you want to interview a sample of doctors in a hospital about work conditions.
So you get a list of all the physicians who work in the hospital, write their names on pieces of paper, put them in the box, shake and draw. But in most real-world instances it is impossible to list everything on a piece of paper, put it in a box and randomly draw until the desired sample size is reached. There are many reasons why one would choose a different type of probability sample in practice.

    Stratified random sampling: In this form of sampling, the population is first divided into two or more mutually exclusive segments based on categories of the variables of interest in the research. It is designed to organize the population into homogeneous subsets before sampling, and then to draw a random sample within each subset. With stratified random sampling, the population of $N$ units is divided into subpopulations of $N_1, N_2, \ldots, N_L$ units, respectively. These subpopulations, called strata, are non-overlapping, and together they comprise the whole of the population. When these have been determined, a sample is drawn from each, with a separate draw for each of the different strata.
12. The sample sizes within the strata are denoted by $n_1, n_2, \ldots, n_L$, respectively. If an SRS is taken within each stratum, the whole sampling procedure is described as stratified random sampling. The primary benefit of this method is to ensure that cases from smaller strata of the population are included in sufficient numbers to allow comparison.

    An example makes this easier to understand. Say that you're interested in how job satisfaction varies by race among a group of employees at a firm. To explore this issue, we need to create a sample of the employees of the firm. However, the employee population at this particular firm is predominantly white (as the accompanying chart illustrates). If we were to take a simple random sample of employees, there's a good chance that we would end up with very small numbers of Blacks, Asians, and Latinos. That could be disastrous for our research, since we might end up with too few cases for comparison in one or more of the smaller groups. Rather than taking a simple random sample from the firm's population at large, in a stratified sampling design we ensure that appropriate numbers of elements are drawn from each racial group, in proportion to its percentage of the population as a whole. Say we want a sample of 1,000 employees: we would stratify the sample by race (a group of White employees, a group of African American employees, etc.), then randomly draw 750 employees from the White group, 90 from the African American group, 100 from the Asian group and 60 from the Latino group. This yields a sample that is proportionately representative of the firm as a whole.

    Stratification is a common technique, for many reasons, such as:
    1. If data of known precision are wanted for certain subpopulations, then each of these should be treated as a population in its own right.
    2. Administrative convenience may dictate the use of stratification; for example, an agency administering a survey may have regional offices, each of which can supervise the survey for part of the population.
    3. Sampling problems may be inherent in certain subpopulations, such as people living in institutions (e.g., hotels, hospitals, prisons).
    4. Stratification may improve the estimates of characteristics of the whole population. It may be possible to divide a heterogeneous population into subpopulations, each of which is internally homogeneous. If these strata are homogeneous, i.e., the measurements vary little from one unit to another, a precise estimate of any stratum mean can be obtained from a small sample in that stratum. These estimates can then be combined into a precise estimate for the whole population.
    5. There is also a statistical advantage in the method, as a stratified random sample nearly always results in a smaller variance for the estimated mean or other population parameters of interest.
13. Systematic sampling: This method of sampling is at first glance very different from SRS. In practice, it is a variant of simple random sampling that involves some listing of elements: every k-th element of the list is drawn for inclusion in the sample. Say you have a list of 10,000 people and you want a sample of 1,000. Creating such a sample involves three steps:
    1. Divide the number of cases in the population by the desired sample size. In this example, dividing 10,000 by 1,000 gives a value of 10.
    2. Select a random number between one and the value obtained in Step 1. In this example, we choose a number between 1 and 10; say we pick 7.
    3. Starting with the case number chosen in Step 2, take every tenth record (7, 17, 27, etc.).

    More generally, suppose that the N units in the population are ranked 1 to N in some order (e.g., alphabetical). To select a sample of n units, we take a unit at random from the first k units and take every k-th unit thereafter, where k = N/n. The advantages of the systematic sampling method over simple random sampling include:
    1. It is easier to draw a sample, and often easier to execute without mistakes. This is a particular advantage when the drawing is done in the field.
    2. Intuitively, you might think that systematic sampling might be more precise than SRS. In effect, it stratifies the population into n strata, consisting of the first k units, the second k units, and so on.
       Thus, we might expect the systematic sample to be as precise as a stratified random sample with one unit per stratum. The difference is that with the systematic sample the units occur at the same relative position in each stratum, whereas with the stratified sample the position in each stratum is determined separately by randomization.

    Cluster sampling: In some instances the sampling unit consists of a group, or cluster, of smaller units that we call elements or subunits (these are the units of analysis for your study). There are two main reasons for the widespread application of cluster sampling. First, although the initial intention may be to use the elements as sampling units, it is found in many surveys that no reliable list of elements in the population is available and that it would be prohibitively expensive to construct such a list. In many countries there are no complete and up-to-date lists of the people, the houses or the farms in any large geographical region.
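
The three probability designs described above (simple random, stratified and systematic sampling) can be sketched with Python's standard `random` module. This is an illustrative sketch, not a survey library: the function names are hypothetical, and the firm's stratum sizes (7,500, 900, 1,000 and 600 employees out of 10,000) are assumed values, chosen only so that proportional allocation reproduces the 750/90/100/60 split of the employee example. Non-integer allocations are rounded with the largest-remainder rule, which is one common choice among several.

```python
import random
from typing import Dict, List, Sequence


def simple_random_sample(population: Sequence, n: int, rng: random.Random) -> list:
    """SRS without replacement: every subset of size n is equally likely."""
    return rng.sample(population, n)


def proportional_allocation(stratum_sizes: Dict[str, int], n: int) -> Dict[str, int]:
    """Split n across strata in proportion to their sizes (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    alloc = {s: n * size // total for s, size in stratum_sizes.items()}
    remainders = {s: n * size % total for s, size in stratum_sizes.items()}
    leftover = n - sum(alloc.values())
    for s in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_random_sample(strata: Dict[str, List[str]], n: int,
                             rng: random.Random) -> Dict[str, list]:
    """A separate SRS within each stratum, sized by proportional allocation."""
    alloc = proportional_allocation({s: len(units) for s, units in strata.items()}, n)
    return {s: rng.sample(strata[s], k) for s, k in alloc.items()}


def systematic_sample(population: Sequence, n: int, rng: random.Random) -> list:
    """Random start among the first k units, then every k-th unit (k = N // n)."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n                    # Step 1: sampling interval
    start = rng.randint(1, k)                   # Step 2: random start between 1 and k
    return list(population[start - 1::k][:n])   # Step 3: every k-th record from the start


rng = random.Random(7)  # seeded only for reproducibility

voters = [f"voter{i}" for i in range(1, 1001)]
print(len(simple_random_sample(voters, 100, rng)))  # 100

firm = {"White": 7500, "African American": 900, "Asian": 1000, "Latino": 600}  # assumed
print(proportional_allocation(firm, 1000))
# {'White': 750, 'African American': 90, 'Asian': 100, 'Latino': 60}

strata = {group: [f"{group}-{i}" for i in range(size)] for group, size in firm.items()}
sample = stratified_random_sample(strata, 1000, rng)
print({group: len(units) for group, units in sample.items()})

cases = list(range(1, 10_001))
picked = systematic_sample(cases, 1000, rng)
print(picked[:3], len(picked))  # e.g. [7, 17, 27] 1000 when the random start is 7
```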
14. Second, even when a list of individual houses is available, economic considerations may point to the choice of a larger cluster unit. For a given sample size, a small unit usually gives more precise results than a large unit. For example, an SRS of 600 houses covers a town more evenly than 20 city blocks containing an average of 30 houses apiece. But greater field costs are incurred in locating 600 houses and in traveling between them than in covering 20 city blocks. When cost is balanced against precision, the larger unit may prove superior.

    Important things about cluster sampling:
    1. Most large-scale surveys are done using cluster sampling.
    2. Clustering may be combined with stratification, typically by clustering within strata.
    3. In general, for a given sample size n, cluster samples are less accurate than the other types of sampling, in the sense that the parameters you estimate will have greater variability than with an SRS, stratified random or systematic sample.
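
A one-stage cluster sample of the kind in the city-block example (an SRS of 20 blocks, then every house in each chosen block) can be sketched in Python as follows. The town of 100 blocks with 30 houses each is hypothetical, and the function name is illustrative only. The sketch shows the field-work trade-off described above: the 600 sampled houses are concentrated in 20 locations instead of being scattered across the town.

```python
import random
from typing import Dict, List


def cluster_sample(clusters: Dict[str, List[str]], m: int,
                   rng: random.Random) -> Dict[str, List[str]]:
    """One-stage cluster sampling: SRS of m clusters, then every element in each."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}


# Hypothetical town: 100 city blocks of 30 houses each (3,000 houses in all).
town = {
    f"block{b:03d}": [f"block{b:03d}-house{h:02d}" for h in range(1, 31)]
    for b in range(1, 101)
}

rng = random.Random(7)  # seeded only for reproducibility
sample = cluster_sample(town, 20, rng)
houses = [house for block in sample.values() for house in block]
print(len(sample), len(houses))  # 20 600
```

Clustering within strata, as mentioned in point 2, would amount to running this block selection separately within each stratum.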