Sampling Size



Some quick slides I put together on sampling size design


  1. 1. Sampling Size
  2. 2. Principles <ul><li>The information contained in a sample does not depend appreciably on the size of the lot from which the sample is taken (provided the lot is at least 10 times the sample) </li></ul><ul><li>The information in a sample increases when the size of the sample increases (larger samples are more faithful to the entire lot) </li></ul><ul><li>The information contained in a sample does not depend on the proportion that the sample represents of the lot (just on the sheer number) </li></ul>
  3. 3. Principles <ul><li>The information contained in a sample does not depend appreciably on the size of the lot from which the sample is taken (provided the lot is at least 10 times the sample). </li></ul><ul><li>The information in a sample increases when the size of the sample increases (larger samples are more faithful to the entire lot) </li></ul><ul><li>The information contained in a sample does not depend on the proportion that the sample represents of the lot (just on the sheer number) </li></ul>
  4. 4. Principles <ul><li>The information contained in a sample does not depend appreciably on the size of the lot from which the sample is taken (provided the lot is at least 10 times the sample) </li></ul><ul><li>The information in a sample increases when the size of the sample increases. Larger samples are more faithful to the entire lot. </li></ul><ul><li>The information contained in a sample does not depend on the proportion that the sample represents of the lot (just on the sheer number) </li></ul>
  5. 5. Principles <ul><li>The information contained in a sample does not depend appreciably on the size of the lot from which the sample is taken (provided the lot is at least 10 times the sample) </li></ul><ul><li>The information in a sample increases when the size of the sample increases (larger samples are more faithful to the entire lot) </li></ul><ul><li>The information contained in a sample does not depend on the proportion that the sample represents of the lot (just on the sheer number). In any case, larger samples contain more information than smaller ones. </li></ul>
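The "at least 10 times the sample" proviso can be checked against the standard finite population correction. That formula is not named on the slides and is brought in here as an assumption. A minimal Python sketch, with an illustrative function name, of how little the lot size matters once the lot is much larger than the sample:

```python
import math

def finite_population_correction(lot_size: int, sample_size: int) -> float:
    """Factor applied to the standard error of a sample mean when sampling
    without replacement from a finite lot (standard textbook formula)."""
    if lot_size < 2 or not 1 <= sample_size <= lot_size:
        raise ValueError("need lot_size >= 2 and 1 <= sample_size <= lot_size")
    return math.sqrt((lot_size - sample_size) / (lot_size - 1))

# Same sample size, lots of very different sizes (hypothetical numbers).
sample_size = 50
for lot_size in (500, 5_000, 50_000):
    fpc = finite_population_correction(lot_size, sample_size)
    print(f"lot={lot_size:>6}  sample={sample_size}  correction={fpc:.4f}")
```

With the lot at ten times the sample, the factor is about 0.95, so the standard error is within roughly 5% of the infinite-lot value. For larger lots the factor approaches 1 and the lot size stops mattering.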
  6. 6. Key principle <ul><li>Larger sample sizes yield better information </li></ul><ul><li>However, we want to minimize the amount of data collected and still achieve “good” results </li></ul><ul><ul><li>Minimize time for data collection </li></ul></ul><ul><ul><li>Minimize sample degradation </li></ul></ul><ul><ul><li>Etc. </li></ul></ul>
  7. 7. Sampling Sizes for Various Applications <ul><li>Estimating Averages, single variable </li></ul><ul><li>Estimating Variances, single variable </li></ul>
  8. 8. Single Variable - Estimating Average <ul><li>It can be shown that the length of a confidence interval for μ is given by: $L = 2 t \sigma / \sqrt{N}$, where N is the number of measurements </li></ul><ul><li>Where t is the critical value of Student’s t at the desired level of significance, with degrees of freedom equal to the DF used for the σ estimate </li></ul><ul><li>So, given a required value of L and an estimated standard deviation, we can determine how many measurements are necessary </li></ul><ul><li>Assumptions </li></ul><ul><li>We have an estimate of the standard deviation of a measurement </li></ul><ul><li>We want to have a certain level of confidence in our average value </li></ul><ul><li>The t value can be obtained from published tables or with Excel's TINV(1-p, DoF), where p is the desired confidence level expressed as a fraction (e.g. 0.95) </li></ul>
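As a worked example of turning a required interval length into a measurement count, here is a minimal Python sketch. It assumes the usual two-sided interval length L = 2·t·σ/√N, solved for N. The function name and the input numbers are illustrative, and the t value is taken from a table instead of being computed.

```python
import math

def measurements_needed(interval_length: float, sigma_estimate: float, t_critical: float) -> int:
    """Smallest N such that 2 * t * sigma / sqrt(N) <= interval_length."""
    if interval_length <= 0 or sigma_estimate <= 0 or t_critical <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil((2.0 * t_critical * sigma_estimate / interval_length) ** 2)

# Hypothetical inputs: sigma estimated as 2.0 with 20 degrees of freedom,
# 95% two-sided confidence (t = 2.086 from a Student's t table),
# and a desired total interval length of 1.0.
print(measurements_needed(interval_length=1.0, sigma_estimate=2.0, t_critical=2.086))  # -> 70
```

Because N grows with the square of σ/L, halving the interval length roughly quadruples the number of measurements.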
  9. 9. Single Variable - Estimating Variance <ul><li>Simple, approximate method (assume the distribution is normal): </li></ul><ul><li>We have a series of samples, all of the same size N, taken from a normally distributed population </li></ul><ul><li>Each sample will have its own variance. The distribution of these variances is NOT normal. It is a scaled chi-square distribution whose mean is the population variance. </li></ul><ul><li>We are interested in the RATIO of estimated variance to actual variance </li></ul>
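The behaviour of the variance ratio can be seen in a small simulation. The following is a rough Python sketch, assuming a standard normal population (so the true variance is 1). The function name, sample sizes, and trial count are illustrative choices, not values from the slides.

```python
import random
import statistics

def variance_ratios(sample_size: int, trials: int, seed: int = 0) -> list[float]:
    """Simulate (sample variance) / (true variance) for repeated normal samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        s2 = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
        ratios.append(s2)  # true variance is 1, so s2 is already the ratio
    return ratios

for n in (5, 20, 80):
    r = variance_ratios(n, trials=2000)
    print(f"N={n:>2}  mean ratio={statistics.mean(r):.2f}  "
          f"median ratio={statistics.median(r):.2f}  spread={statistics.stdev(r):.2f}")
```

For small N the median ratio sits below the mean, which reflects the skew of the chi-square shape. The spread narrows as N grows.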
  10. 10. Single Variable - Estimating Variance <ul><li>Complicated, exact method (because the distribution is not normal): </li></ul><ul><li>Look up in a chi-square table, under the proper percentage columns, the values whose quotient by the corresponding n is not less than $(1-\varepsilon)^2$ and not more than $(1+\varepsilon)^2$. </li></ul><ul><li>This is done in Excel by the following (CHIINV takes the right-tail probability, so the smaller probability gives the larger value): </li></ul><ul><ul><ul><li>$\chi^2_{\text{hi}}$ =CHIINV((1-p)/2,n) </li></ul></ul></ul><ul><ul><ul><li>$\chi^2_{\text{low}}$ =CHIINV(1-(1-p)/2,n) </li></ul></ul></ul>
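The table lookup can be automated as a search over n. The sketch below is only an approximation of that procedure. It swaps the exact chi-square table (or Excel's CHIINV) for the Wilson-Hilferty approximation so that it runs on the Python standard library alone. It also assumes that ε is the allowed relative error on the standard deviation, which is why the bounds are squared. All function names are illustrative.

```python
from statistics import NormalDist

def chi2_quantile_approx(prob_below: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the chi-square value with cumulative
    (left-tail) probability prob_below. Note: Excel's CHIINV uses the right tail."""
    z = NormalDist().inv_cdf(prob_below)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def within_tolerance(dof: int, p: float, eps: float) -> bool:
    """True if both chi-square limits divided by dof lie in [(1-eps)^2, (1+eps)^2]."""
    tail = (1.0 - p) / 2.0
    low = chi2_quantile_approx(tail, dof) / dof
    high = chi2_quantile_approx(1.0 - tail, dof) / dof
    return low >= (1.0 - eps) ** 2 and high <= (1.0 + eps) ** 2

def smallest_dof(p: float, eps: float, max_dof: int = 100_000) -> int:
    """Linear search for the first dof that meets the tolerance."""
    for dof in range(1, max_dof + 1):
        if within_tolerance(dof, p, eps):
            return dof
    raise ValueError("no dof below max_dof meets the tolerance")

# Hypothetical target: 95% confidence, standard deviation within +/-20%.
print(smallest_dof(p=0.95, eps=0.20))
```

A linear search is enough here because both quotients move monotonically toward 1 as n increases. For small n the approximation is crude, so an exact table or a statistics library is the safer choice there.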
  11. 11. The problem of fitting a line <ul><li>y = mx + b </li></ul><ul><li>We want to ask a few major questions about this line </li></ul><ul><ul><li>How good is our estimate of the slope? </li></ul></ul><ul><ul><li>How good is our estimate of the offset? (offset = $(m-1)\langle x \rangle + b$) </li></ul></ul><ul><ul><li>Over what range do these values hold? </li></ul></ul>
  12. 13. Errors in both x and y <ul><li>y = mx + b, written in terms of the true values as $Y = \alpha + \beta X$ </li></ul><ul><ul><li>where $x = X + \varepsilon$, $y = Y + \delta$ </li></ul></ul><ul><ul><li>We assume that ε and δ are normally distributed with $\langle \varepsilon \rangle = \langle \delta \rangle = 0$, and that $\sigma_\varepsilon$ and $\sigma_\delta$ are known </li></ul></ul><ul><li>We want to know how many sites we need for good confidence over specific ranges for both the slope (m) and the average offset ($\langle Y \rangle - \langle X \rangle = (m-1)\langle X \rangle + b$) </li></ul><ul><li>We can run Monte Carlo simulations to solve this </li></ul>
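A Monte Carlo run of this kind might look like the following Python sketch. The slides do not say which fitting method is used, so ordinary least squares is used here purely as a stand-in. Its slope is known to be biased toward zero when x carries error, so a method built for errors in both variables may be preferable in practice. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def simulate_fit_spread(n_sites, trials, alpha, beta, x_range,
                        sigma_eps, sigma_delta, seed=0):
    """Return (std of fitted slope, std of mean offset <y> - <x>)
    over repeated simulated experiments with noise on both x and y."""
    rng = random.Random(seed)
    slopes, offsets = [], []
    for _ in range(trials):
        true_x = [rng.uniform(*x_range) for _ in range(n_sites)]
        x = [X + rng.gauss(0.0, sigma_eps) for X in true_x]
        y = [alpha + beta * X + rng.gauss(0.0, sigma_delta) for X in true_x]
        mx, my = statistics.fmean(x), statistics.fmean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)   # ordinary least-squares slope (stand-in fit)
        offsets.append(my - mx)    # average offset <y> - <x>
    return statistics.stdev(slopes), statistics.stdev(offsets)

# Hypothetical setup: true line Y = 0.5 + 1.0 * X over X in [0, 10],
# measurement noise of 0.3 on both axes.
for n_sites in (10, 40, 160):
    slope_sd, offset_sd = simulate_fit_spread(
        n_sites, trials=500, alpha=0.5, beta=1.0, x_range=(0.0, 10.0),
        sigma_eps=0.3, sigma_delta=0.3)
    print(f"sites={n_sites:>3}  slope sd={slope_sd:.4f}  offset sd={offset_sd:.4f}")
```

Reading off the smallest number of sites whose slope and offset spreads fall below the required tolerances gives the sample size for that scenario.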
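  13. 14. Offset calculation <ul><li>The offset can be calculated by $\langle Y \rangle - \langle X \rangle$ </li></ul><ul><li>We know that the standard deviation of this value will be equal to $\sigma_{\text{offset}} = \sqrt{\sigma_\varepsilon^2 + \sigma_\delta^2 + 2(\langle XY \rangle - \langle X \rangle \langle Y \rangle)}$ </li></ul>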
