15 estimation and sample size

Published in: Technology, News & Politics

  1. Research Methodology Workshop. Dr. Nimit Chowdhary, Professor. Saturday, April 14, 2012. © Dr. Nimit Chowdhary
     • We want to know the behavior (characteristics) of a population.
     • We draw samples from this population.
     • We draw conclusions about the population parameters based on sample statistics.
     • A characteristic of the sample is an estimate of the corresponding characteristic of the population.
  2. Estimation takes two forms: point estimates and interval estimates.
     A point estimate uses a single sample value to estimate the desired population parameter. Example: the sample mean is a point estimate of the population mean.
  3. Properties of a good estimator:
     • Unbiasedness: the expected value of the estimate (the statistic derived from the sample) is equal to the population parameter.
     • Consistency: the statistic approaches the population parameter as the sample size increases and approaches the population size.
     • Efficiency: the value of the estimate remains stable from sample to sample. The best estimator shows the least variation between samples.
     • Sufficiency: the estimator uses all the information about the population parameter contained in the sample (the mean uses all the information, while the median does not).
  4. A point estimate:
     • May not exactly locate the population parameter.
     • Does not indicate how far the estimate is from the true value.
     • Does not specify how confident we can be that the estimate is close to the population parameter.
     An interval estimate addresses this:
     • Instead of a point value, we can give a range as the estimate.
     • We can be reasonably confident that the true parameter lies in this range.
  5. The sampling distribution of the sample mean is normally distributed with a mean of $\mu$ and a standard deviation of $\sigma_{\bar{x}}$, so that
     $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, hence $\bar{x} - \mu = Z\sigma_{\bar{x}}$, and the interval limits are $\bar{x} \pm Z\sigma_{\bar{x}}$.
     Suppose we want to find a confidence interval around the sample mean within which the population mean is expected to lie 95% of the time.
     [Figure: normal curve centred on $\bar{x}$ with limits x1 and x2; 95% of the area lies between them (47.5% on each side of the centre), leaving 2.5% in each tail.]
  6. The 95% confidence interval can be interpreted in any of these ways:
     • If all possible samples of size n were taken, then on average 95% of these samples would include the population mean within the interval around their sample means bounded by x1 and x2.
     • If we took a random sample of size n from a given population, the probability is 0.95 that the population mean would lie in the interval between x1 and x2 around the sample mean.
     • If a random sample of size n were taken from a given population, we can be 95% confident in our assertion that the population mean lies in the interval around the sample mean bounded by x1 and x2 (the 95% confidence interval).
     At the 95% confidence level, the z score taken from the z table is 1.96.
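The table value of 1.96 can be recovered from the standard normal distribution rather than looked up. The following is a minimal Python sketch using only the standard library; the function name `z_critical` is an illustrative choice and does not come from the slides.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.95."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the leftover probability equally between the two tails,
    # e.g. 2.5% in each tail for 95% confidence.
    tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 99% confidence this gives roughly 2.576, which z tables commonly round to 2.58.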
  7. Example: The sponsor of a TV programme targeted at the children's market (age 4-10 years) wants to find out the average amount of time children spend watching TV. A random sample of 100 children indicated that the average time these children spent watching TV per week was 27.2 hours. From previous experience, the population s.d. of weekly TV watching ($\sigma$) is known to be 8 hours. A confidence level of 95% is considered adequate.
     [Figure: normal curve centred on $\bar{x} = 27.2$ with $\sigma = 8$ and limits x1 and x2 at $\pm 1.96\,\sigma_{\bar{x}}$.]
     The confidence interval is given by $\bar{x} \pm Z\sigma_{\bar{x}}$, so $\mu$ must lie between $\bar{x} - Z\sigma_{\bar{x}}$ and $\bar{x} + Z\sigma_{\bar{x}}$; also, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
  8. Therefore, in our case $\bar{x} = 27.2$, $Z = 1.96$, $\sigma = 8$ and $n = 100$. Hence,
     $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$
     Therefore the confidence interval is
     $\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 1.96 \times 0.8 = 27.2 \pm 1.568 = (25.632,\ 28.768)$
     This means that we can conclude with 95% confidence that a child on average spends between 25.632 and 28.768 hours per week watching television.
     Exercise: Calculate the confidence interval in the previous example if we want to increase our confidence level from 95% to 99%. Other values remain the same.
     [Figure: normal curve centred on $\bar{x} = 27.2$ with $\sigma = 8$ and limits x1 and x2 at $\pm 2.58\,\sigma_{\bar{x}}$; 99% of the area lies between them (49.5% on each side of the centre), leaving 0.5% in each tail.]
  9. As in the previous case, $\bar{x} = 27.2$, $\sigma = 8$ and $n = 100$, but now $Z = 2.58$. Hence,
     $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$
     Therefore the confidence interval is
     $\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 2.58 \times 0.8 = 27.2 \pm 2.064 = (25.136,\ 29.264)$
     This means that we can conclude with 99% confidence that a child on average spends between 25.136 and 29.264 hours per week watching television. Note that the limits have to be spread further apart to give more confidence.
     When the population variation (s.d.) is unknown and the sample size is reasonably large (30 or more), we can approximate the population standard deviation ($\sigma$) by the sample standard deviation (s), so that the confidence interval $\bar{x} \pm Z\sigma_{\bar{x}}$ is approximated by the interval $\bar{x} \pm Z s_{\bar{x}}$ when n ≥ 30, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.
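Whether $\sigma$ is known or is approximated by s for n ≥ 30, the interval arithmetic is the same: mean ± Z × (s.d. / √n). The following is a minimal Python sketch of that calculation, checked against the television example (mean 27.2, s.d. 8, n = 100). The function name `z_interval` is an illustrative choice, not something from the slides, and the Z values are the table values used in the example.

```python
import math


def z_interval(sample_mean: float, sd: float, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) limits of sample_mean ± z * sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    standard_error = sd / math.sqrt(n)  # s.d. of the sampling distribution of the mean
    margin = z * standard_error         # half-width of the interval
    return sample_mean - margin, sample_mean + margin


# Television example: 95% (Z = 1.96) and 99% (Z = 2.58).
low95, high95 = z_interval(27.2, 8, 100, 1.96)
low99, high99 = z_interval(27.2, 8, 100, 2.58)
print(f"95%: {low95:.3f} to {high95:.3f}")  # 95%: 25.632 to 28.768
print(f"99%: {low99:.3f} to {high99:.3f}")  # 99%: 25.136 to 29.264
```

Passing the sample standard deviation as `sd` gives the large-sample approximation described above.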
  10. Since we are interested in the distribution of means, the dispersion of the distribution of sample means can be estimated from the sample standard deviation (we have assumed that the sample s.d. is equal to the population s.d. for n ≥ 30): $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
      Example: It is desired to estimate the average age of students who graduate with an MBA degree in the university system. A random sample of 64 graduating students showed that the average age was 27 years with a standard deviation of 4 years.
  11. Questions:
      • Construct a 95% confidence interval estimate of the true average (population mean) age of all such graduating students at the university.
      • How would the confidence interval limits change if the confidence level were increased from 95% to 99%?
      Since the sample size n is sufficiently large, we can approximate the population standard deviation by the sample standard deviation: $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{64}} = 0.5$
      [Figure: normal curve centred on $\bar{x} = 27$ with s = 4 and limits x1 and x2 at $\pm 1.96\, s_{\bar{x}}$.]
  12. The 95% confidence interval of the population mean $\mu$ is given by:
      $\bar{x} \pm Z s_{\bar{x}} = 27 \pm 1.96 \times 0.5 = (26.02,\ 27.98)$
      Hence, $26.02 \le \mu \le 27.98$.
      For 99% confidence, Z = 2.58:
      $\bar{x} \pm Z s_{\bar{x}} = 27 \pm 2.58 \times 0.5 = (25.71,\ 28.29)$
      Hence, $25.71 \le \mu \le 28.29$.
      Sample size:
      • A sample is studied to infer the population parameters.
      • The more variation there is in the population, the bigger the sample required for estimation.
      • The bigger the sample, the more confidence we have in the estimate.
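The last two points can be made concrete by looking at the half-width of the interval, $Z s / \sqrt{n}$. The following is a small Python sketch; the function name `margin_of_error` and the sample sizes other than 64 are illustrative assumptions, while s = 4 and Z = 1.96 are taken from the MBA example.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval: z * sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * sd / math.sqrt(n)


# Quadrupling the sample size halves the margin of error.
for n in (64, 256, 1024):
    print(f"n = {n:4d}: margin = {margin_of_error(4, n):.3f}")

# Doubling the standard deviation doubles the margin at the same n.
print(f"s = 8, n = 64: margin = {margin_of_error(8, 64):.3f}")
```

With s = 4 and n = 64 the margin is 0.98, matching the 95% interval (26.02, 27.98) above.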
  13. The choice of sample size depends upon two things:
      • The degree of accuracy we require in our estimate.
      • The degree of confidence we want that the error in the estimate remains within the desired degree of accuracy.
      Ideally, the sample mean $\bar{x}$ should be equal to the population mean $\mu$.
      • If the entire population is taken as the sample, then $\bar{x}$ would be equal to $\mu$.
      • For a sampling exercise, $(\bar{x} - \mu)$ can be considered the error, or deviation of the estimator from the population mean.
      We know that $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$. But $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ and $(\bar{x} - \mu) = E$.
  14. Substituting the error E and the standard error $\sigma/\sqrt{n}$:
      $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{E}{\sigma/\sqrt{n}}$, so $E = \frac{Z\sigma}{\sqrt{n}}$, $\sqrt{n} = \frac{Z\sigma}{E}$, and $n = \left(\frac{Z\sigma}{E}\right)^2 = \frac{Z^2\sigma^2}{E^2}$
      Therefore the sample size depends upon:
      • The confidence level desired, Z
      • The maximum error allowed, E
      • The variability of the population, $\sigma$
      We will need a larger sample if:
      • We want a smaller error (want to be more accurate),
      • We want to be more confident, and/or
      • The population variance is larger.
  15. Example: We would like to know how much time a child spends watching television over the weekend. We want our estimate to be within ± 1 hour of the true population average. (This means that the maximum allowable error is 1 hour.) Previous studies have shown the population s.d. to be 3 hours. What sample size should be taken for this purpose if we want to be 95% confident that the error in our estimate will not exceed the maximum allowable error?
      For the 95% confidence level, Z = 1.96; E = 1 hour (given); $\sigma$ = 3 hours (given). Then
      $n = \frac{Z^2\sigma^2}{E^2} = \frac{(1.96)^2 (3)^2}{(1)^2} = 34.57$, so $n \approx 35$ (rounded up to the next whole child).
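The sample-size formula $n = (Z\sigma/E)^2$ is easy to check numerically. The following is a minimal Python sketch; the function name `required_sample_size` is an illustrative choice, and rounding up with `math.ceil` is assumed so that the error bound is still met.

```python
import math


def required_sample_size(z: float, sigma: float, max_error: float) -> int:
    """Smallest whole n satisfying z * sigma / sqrt(n) <= max_error."""
    if sigma <= 0 or max_error <= 0:
        raise ValueError("sigma and max_error must be positive")
    n_exact = (z * sigma / max_error) ** 2
    # Round up: rounding down would leave the error slightly above the limit.
    return math.ceil(n_exact)


# Weekend television example: Z = 1.96, sigma = 3 hours, E = 1 hour.
print(required_sample_size(1.96, 3, 1))    # 35

# Halving the allowable error roughly quadruples the sample size.
print(required_sample_size(1.96, 3, 0.5))  # 139
```

The second call illustrates why tighter accuracy is expensive: n grows with the square of 1/E.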
