QT1 - 07 - Estimation


Class notes used in Quantitative Techniques - I course at Praxis Business School, Calcutta


  1. 1. Estimation
  2. 2. Why Estimation ? <ul><li>[ From ] Inference </li></ul><ul><ul><li>For a given population </li></ul></ul><ul><ul><li>Various statistical parameters are GIVEN or KNOWN </li></ul></ul><ul><ul><ul><li>Mean, Standard Deviation etc </li></ul></ul></ul><ul><ul><li>Task was to interpret them and take managerial decisions </li></ul></ul><ul><ul><ul><ul><li>How many shirts to be stocked in the store ? </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Is the machine setup faulty ? Should we fix it </li></ul></ul></ul></ul><ul><li>[ To ] Estimation </li></ul><ul><ul><li>For the given population or sample </li></ul></ul><ul><ul><li>Various statistical parameters are NOT KNOWN </li></ul></ul><ul><ul><li>So managerial decisions cannot be taken </li></ul></ul><ul><ul><li>UNLESS we can estimate the parameters </li></ul></ul>
  3. 3. Two kinds of Estimates <ul><li>Point Estimate </li></ul><ul><ul><li>A single number that is used to estimate a given population parameter </li></ul></ul><ul><ul><ul><li>Mean Age = 22.3 </li></ul></ul></ul><ul><li>Interval Estimate </li></ul><ul><ul><li>A range of values used to estimate a population parameter </li></ul></ul><ul><ul><ul><li>Mean Age is between 21.5 and 23 </li></ul></ul></ul><ul><li>Difficulty of point estimate </li></ul><ul><ul><li>It is either right or wrong ! </li></ul></ul><ul><ul><li>No way to know the quantum of error in the estimate </li></ul></ul><ul><ul><li>Needs to be accompanied by another estimate of the error that could have happened !! </li></ul></ul>
  4. 4. Estimator <ul><li>Estimator </li></ul><ul><ul><li>A sample statistic that is used to estimate a population parameter </li></ul></ul><ul><li>Estimate </li></ul><ul><ul><li>A specific value of the statistic that is observed </li></ul></ul>
  5. 5. Criteria for a Good Estimator <ul><li>Unbiased </li></ul><ul><ul><li>Example : Mean of the sampling distribution of sample means taken from the population is equal to the population mean itself. </li></ul></ul><ul><li>Efficient </li></ul><ul><ul><li>Depends on the standard error of the statistic </li></ul></ul><ul><ul><ul><li>Standard error = standard deviation of the sampling distribution </li></ul></ul></ul><ul><ul><li>If standard error is low, estimator is efficient </li></ul></ul><ul><li>Consistent </li></ul><ul><ul><li>When sample size increases the value of the statistic comes closer and closer to the value of the parameter </li></ul></ul><ul><li>Sufficient </li></ul><ul><ul><li>Uses all information that can be extracted from sample </li></ul></ul>
  6. 6. Point Estimate <ul><li>Estimate of Mean </li></ul><ul><li>Sample Mean: $\bar{x} = \frac{\sum x}{n}$ </li></ul><ul><li>Estimate of Standard Deviation (through the sample variance): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ </li></ul><ul><li>We cannot use the statistic with divisor $n$, that is $s^2 = \frac{\sum (x - \bar{x})^2}{n}$, because it can be shown that it has a bias ! </li></ul>
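The two point estimates on this slide can be sketched in a few lines of Python. This is only an illustration: the `ages` list is made-up data and the helper names `sample_mean` and `sample_variance` are hypothetical, not part of the course material.

```python
import statistics

# Hypothetical sample of ages (illustrative data, not from the slides)
ages = [21, 22, 22, 23, 24]


def sample_mean(xs):
    """Point estimate of the population mean: sum(x) / n."""
    return sum(xs) / len(xs)


def sample_variance(xs, ddof=1):
    """Sum of squared deviations divided by (n - ddof).

    ddof=1 gives the unbiased estimator from the slide,
    ddof=0 gives the biased version with divisor n.
    """
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


mean_age = sample_mean(ages)
unbiased_var = sample_variance(ages, ddof=1)
biased_var = sample_variance(ages, ddof=0)

# The standard library agrees with the n - 1 version
stdlib_var = statistics.variance(ages)
```

The divisor-n version is always the smaller of the two, which is the direction of the bias the slide refers to.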
  7. 7. Where are the errors here ? <ul><li>Potential number of patrons at a very popular musical concert that is always sold out .. </li></ul><ul><ul><li>Estimator : Average number of tickets sold </li></ul></ul><ul><li>Telephone calls are billed by whole minutes even if the duration is a fraction. What is the average length of a call ? </li></ul><ul><ul><li>Estimator : Average billing for all calls made over a day / Rate per minute </li></ul></ul>
  8. 8. Interval Estimate <ul><li>An interval estimate describes a range of values within which a population parameter is likely to lie </li></ul><ul><li>Consider an interval estimate for the mean </li></ul><ul><ul><li>Start with a point estimate </li></ul></ul><ul><ul><li>Find the likely error of this estimate </li></ul></ul><ul><ul><ul><li>Standard error is standard deviation of the estimator </li></ul></ul></ul><ul><ul><li>Make an interval estimate </li></ul></ul><ul><ul><ul><li>Defined in terms of the estimate and the standard error </li></ul></ul></ul><ul><ul><li>Find the probability that mean will fall in this interval estimate </li></ul></ul>
  9. 9. Example <ul><li>Estimate the average battery life of a car in months </li></ul><ul><li>From a sample of size 200 we get $\bar{x} = 36$ </li></ul><ul><li>Standard error of sample mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 10 / \sqrt{200} = 0.707$, assuming $\sigma = 10$ </li></ul><ul><li>Now we can make interval estimates like </li></ul><ul><ul><li>$\bar{x} - \sigma_{\bar{x}} < \mu < \bar{x} + \sigma_{\bar{x}}$ => $35.293 < \mu < 36.707$ </li></ul></ul><ul><ul><li>$\bar{x} - 2\sigma_{\bar{x}} < \mu < \bar{x} + 2\sigma_{\bar{x}}$ => $34.586 < \mu < 37.414$ </li></ul></ul><ul><ul><li>$\bar{x} - 3\sigma_{\bar{x}} < \mu < \bar{x} + 3\sigma_{\bar{x}}$ => $33.879 < \mu < 38.121$ </li></ul></ul>
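The arithmetic of the battery example can be checked with a short Python sketch. The numbers are the ones on the slide; the variable and function names are chosen here for illustration.

```python
import math

n = 200        # sample size from the slide
x_bar = 36.0   # sample mean in months
sigma = 10.0   # assumed population standard deviation

# Standard error of the sample mean: sigma / sqrt(n)
std_error = sigma / math.sqrt(n)


def interval(k):
    """Interval x_bar +/- k standard errors."""
    return (x_bar - k * std_error, x_bar + k * std_error)


# Intervals at 1, 2 and 3 standard errors
intervals = {k: interval(k) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"{k} SE: {low:.3f} < mu < {high:.3f}")
```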
  10. 10. Back to Probability <ul><li>Sampling distribution of the mean is also normal with </li></ul><ul><ul><li>Mean = 36.0 </li></ul></ul><ul><ul><li>Standard Deviation ( Standard Error) = 0.707 </li></ul></ul><ul><li>So probability of the real mean lying between the limits given by the interval estimate is known ! </li></ul>[Figure: normal curve with areas marked 68.3%, 95.5% and 99.7%]
  11. 11. Interval Estimate of Mean <ul><li>Probabilities are as follows </li></ul><ul><ul><li>68.3% => $35.293 < \mu < 36.707$ </li></ul></ul><ul><ul><li>95.5% => $34.586 < \mu < 37.414$ </li></ul></ul><ul><ul><li>99.7% => $33.879 < \mu < 38.121$ </li></ul></ul><ul><li>Here we note that the probabilities are odd, fractional kind of numbers ... </li></ul><ul><ul><li>So how can we have simpler probabilities like 50% or 90% probability ? </li></ul></ul>
  12. 12. Confidence Interval <ul><li>We observe that in a normal distribution </li></ul><ul><ul><li>90% of the values lie within $1.64\sigma$ of the mean </li></ul></ul><ul><ul><li>99% of the values lie within $2.58\sigma$ of the mean </li></ul></ul><ul><li>So we redefine our interval estimates as </li></ul><ul><ul><li>90% => $34.84 < \mu < 37.16$ </li></ul></ul><ul><ul><li>99% => $34.18 < \mu < 37.82$ </li></ul></ul>
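The multipliers 1.64 and 2.58 need not be memorised; they can be read off the standard normal distribution. A minimal sketch using Python's standard library follows, where the function names `z_for_confidence` and `confidence_interval` are illustrative choices and the battery figures are those of the earlier example.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided z multiplier: the central `confidence` share of a
    standard normal lies within +/- z of the mean."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(x_bar, std_error, confidence):
    """Interval x_bar +/- z * standard error."""
    z = z_for_confidence(confidence)
    return (x_bar - z * std_error, x_bar + z * std_error)


# Battery example: x_bar = 36, sigma = 10, n = 200
std_error = 10 / math.sqrt(200)
ci_90 = confidence_interval(36.0, std_error, 0.90)
ci_99 = confidence_interval(36.0, std_error, 0.99)
```

The unrounded multipliers are roughly 1.645 and 2.576; the slide rounds them to 1.64 and 2.58, which gives the same limits to two decimal places.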
  13. 13. Confidence Intervals <ul><li>Original Limits </li></ul><ul><ul><li>68.3% (limits at $1\sigma$) => $35.293 < \mu < 36.707$ </li></ul></ul><ul><ul><li>95.5% (limits at $2\sigma$) => $34.586 < \mu < 37.414$ </li></ul></ul><ul><ul><li>99.7% (limits at $3\sigma$) => $33.879 < \mu < 38.121$ </li></ul></ul><ul><li>More convenient limits </li></ul><ul><ul><li>90% (limits at $1.64\sigma$) => $34.84 < \mu < 37.16$ </li></ul></ul><ul><ul><li>99% (limits at $2.58\sigma$) => $34.18 < \mu < 37.82$ </li></ul></ul>
  14. 14. Is a “higher” confidence interval always better ? <ul><li>I am 99.999% sure that the average age of this class lies between 1 year and 50 years </li></ul><ul><ul><li>Does this really help you in any way ? </li></ul></ul><ul><li>I am 95% sure that the average age of this class lies between 23 and 26 years </li></ul><ul><ul><li>This gives me a far better idea of where the average age of the class lies </li></ul></ul><ul><ul><li>This information is better than the first information </li></ul></ul>
  15. 15. 95% confident that the mean battery life lies between 30 – 42 months <ul><li>Does NOT mean that </li></ul><ul><ul><li>There is 95% probability that the mean life of all our batteries falls within the interval established from this one sample </li></ul></ul><ul><li>It DOES mean that </li></ul><ul><ul><li>If we select many random samples of the same size and calculate a confidence interval for each of these samples then 95% of these intervals will contain the population mean </li></ul></ul>
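The repeated-sampling interpretation can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: a normal population with true mean 36 and standard deviation 10, samples of size 200, and a hypothetical function name `coverage`.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=36.0, sigma=10.0, n=200, confidence=0.95,
             trials=2000, seed=1):
    """Share of simulated confidence intervals that contain mu.

    Each trial draws a fresh random sample of size n, builds the
    interval x_bar +/- z * sigma / sqrt(n), and checks whether the
    true mean falls inside it.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    std_error = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - z * std_error <= mu <= x_bar + z * std_error:
            hits += 1
    return hits / trials


share = coverage()
print(f"{share:.1%} of the intervals contain the true mean")
```

The share should come out close to 95%, although any single interval either contains the population mean or does not.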
  16. 16. Calculation of Confidence Interval Example <ul><li>A large automotive parts wholesaler needs an estimate of the mean life that he can expect from a windshield wiper under normal driving conditions </li></ul><ul><li>It is known that the standard deviation of the population life is 6 months </li></ul><ul><li>Observations from one simple random sample of 100 blades are as follows </li></ul><ul><ul><li>Sample Size </li></ul></ul><ul><ul><ul><li>$n = 100$ </li></ul></ul></ul><ul><ul><li>Sample Mean </li></ul></ul><ul><ul><ul><li>$\bar{x} = 21$ months </li></ul></ul></ul><ul><ul><li>Population standard deviation </li></ul></ul><ul><ul><ul><li>$\sigma = 6$ months </li></ul></ul></ul>
  17. 17. 95% Confidence Interval <ul><li>Standard Error: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 6 / \sqrt{100} = 0.6$ months </li></ul><ul><li>95% confidence level will include 47.5 % on each side of the mean </li></ul><ul><li>Sample size is > 30 so we can assume that the sample mean follows a normal distribution </li></ul><ul><li>In a normal distribution </li></ul><ul><ul><li>95% values lie within 1.96 times the standard deviation </li></ul></ul><ul><ul><li>95 % values of the sample mean lie within 1.96 times the standard error </li></ul></ul>
  18. 18. Upper Confidence Limit Lower Confidence Limit <ul><li>Upper Confidence Limit </li></ul><ul><ul><li>$\bar{x} + 1.96\,\sigma_{\bar{x}}$ </li></ul></ul><ul><ul><li>= 21 + 1.96 ( 0.6) </li></ul></ul><ul><ul><li>= 22.18 months </li></ul></ul><ul><li>Lower Confidence Limit </li></ul><ul><ul><li>$\bar{x} - 1.96\,\sigma_{\bar{x}}$ </li></ul></ul><ul><ul><li>= 21 – 1.96 ( 0.6) </li></ul></ul><ul><ul><li>= 19.82 months </li></ul></ul><ul><li>Two major assumptions </li></ul><ul><li>Standard Deviation of Population is known </li></ul><ul><ul><li>In reality this may not be known </li></ul></ul><ul><li>The sampling distribution follows the normal distribution </li></ul><ul><ul><li>This assumption is valid only if the sample size is more than 30. </li></ul></ul>
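The wiper-blade calculation can be reproduced in Python as a quick check. The inputs are the figures from the example; the variable names are chosen here for illustration.

```python
import math

n = 100        # sample size
x_bar = 21.0   # sample mean in months
sigma = 6.0    # known population standard deviation in months
z = 1.96       # multiplier for 95% confidence

# Standard error of the sample mean
std_error = sigma / math.sqrt(n)

# Lower and upper confidence limits
lower = x_bar - z * std_error
upper = x_bar + z * std_error

print(f"95% CI: {lower:.2f} to {upper:.2f} months")
```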
  19. 19. Standard deviation is not known <ul><li>When Standard Deviation is Known </li></ul><ul><li>Standard Error of the sample mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ </li></ul><ul><li>When Standard Deviation is not Known </li></ul><ul><li>Estimated Standard Error of the sample mean: $\hat{\sigma}_{\bar{x}} = \hat{\sigma} / \sqrt{n}$, where $\hat{\sigma} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ </li></ul>
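When the population standard deviation is unknown, the sample itself supplies the estimate. A minimal sketch follows, assuming a small made-up list of blade lifetimes (`lifetimes` is hypothetical data, not from the slides).

```python
import math
import statistics

# Hypothetical lifetimes in months (illustrative data only)
lifetimes = [19.5, 22.0, 20.5, 23.0, 21.0, 20.0]

n = len(lifetimes)
x_bar = statistics.fmean(lifetimes)

# statistics.stdev uses the n - 1 divisor, matching the slide's estimator
sigma_hat = statistics.stdev(lifetimes)

# Estimated standard error of the sample mean
std_error_hat = sigma_hat / math.sqrt(n)
```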
  20. 20. How do we get this interval <ul><li>[Usually] we are trying to estimate the population mean $\mu$ </li></ul><ul><li>We have an estimator E which [ in most cases ] is the sample mean. </li></ul><ul><ul><li>E follows a distribution that has mean $\mu$ and standard error $\sigma$ </li></ul></ul><ul><li>We create a statistic $Q = (E - \mu) / \sigma$ </li></ul><ul><li>Q follows some distribution ( normal ? t ? ) </li></ul><ul><li>We identify two positive values $Q_1$, $Q_2$ such that the probability of Q falling between $-Q_2$ and $+Q_1$ is equal to the required confidence P </li></ul><ul><li>Interval is $E - Q_1\sigma < \mu < E + Q_2\sigma$ </li></ul>
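For the common symmetric case the interval can be derived in three steps. The sketch below assumes the distribution of Q is symmetric about zero and uses a single positive cut-off $q$ (a symbol introduced here, not on the slide) on both sides.

```latex
\begin{align*}
P\left(-q \le \frac{E - \mu}{\sigma} \le q\right) &= P \\
P\left(-q\sigma \le E - \mu \le q\sigma\right) &= P \\
P\left(E - q\sigma \le \mu \le E + q\sigma\right) &= P
\end{align*}
```

With a normal sampling distribution and P = 95%, the cut-off is $q = 1.96$, which is the multiplier used in the windshield wiper example.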
  21. 21. What is our goal ? <ul><li>What is known ? </li></ul><ul><ul><li>$E$, $\sigma$, $P$ </li></ul></ul><ul><li>What is to be calculated </li></ul><ul><ul><li>$Q_1$, $Q_2$ </li></ul></ul><ul><li>What is the objective </li></ul><ul><ul><li>To be confident that </li></ul></ul><ul><ul><li>Probability of $\mu$ </li></ul></ul><ul><ul><li>Lying between $E - Q_1\sigma$ and $E + Q_2\sigma$ </li></ul></ul><ul><ul><li>Is equal to $P$ </li></ul></ul>
  22. 22. What are the steps <ul><li>Identify an estimator </li></ul><ul><li>What distribution does the estimator follow ? </li></ul><ul><ul><li>Is the standard deviation known ? </li></ul></ul><ul><ul><ul><li>If not what is the estimator for the standard deviation </li></ul></ul></ul><ul><ul><li>Is the sample size big enough ? </li></ul></ul><ul><li>Get a value of the estimate </li></ul><ul><li>Get a value of the standard error for the estimator </li></ul><ul><li>Set an appropriate confidence level in terms of probability </li></ul><ul><li>From the graph / table of the sampling distribution get the upper and lower limits in terms of estimate and the standard error </li></ul>
  23. 23. Confidence Intervals Revisited <ul><li>Original Limits </li></ul><ul><ul><li>68.3% (limits at $1\sigma$) => $35.293 < \mu < 36.707$ </li></ul></ul><ul><ul><li>95.5% (limits at $2\sigma$) => $34.586 < \mu < 37.414$ </li></ul></ul><ul><ul><li>99.7% (limits at $3\sigma$) => $33.879 < \mu < 38.121$ </li></ul></ul><ul><li>More convenient limits </li></ul><ul><ul><li>90% (limits at $1.64\sigma$) => $34.84 < \mu < 37.16$ </li></ul></ul><ul><ul><li>99% (limits at $2.58\sigma$) => $34.18 < \mu < 37.82$ </li></ul></ul><ul><li>Point to note ... </li></ul><ul><ul><li>$\mu$ and $\sigma$ come from the estimate </li></ul></ul><ul><ul><li>How do we connect </li></ul></ul><ul><ul><ul><li>68.3%, 90%, 95.5%, 99%, 99.7% </li></ul></ul></ul><ul><ul><ul><li>1, 1.64, 2, 2.58, 3 </li></ul></ul></ul><ul><ul><li>By looking at the probability distribution function of the estimator </li></ul></ul>
  24. 24. Sample Size in Estimation <ul><li>Given a known population standard deviation, how large should the sample be to obtain a confidence interval of the desired width at the chosen confidence level ? </li></ul>
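The slide poses the question without giving a formula. One standard way to answer it, assuming a normal sampling distribution and a desired margin of error (half the interval width), is to solve $z\sigma/\sqrt{n} \le \text{margin}$ for $n$. The function name `required_sample_size` and the example margins are assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin.

    sigma:  known population standard deviation
    margin: desired half-width of the confidence interval
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Wiper example: sigma = 6 months, interval of +/- 1 month at 95%
n_needed = required_sample_size(sigma=6.0, margin=1.0, confidence=0.95)
```

Halving the margin roughly quadruples the required sample size, because the standard error shrinks only with the square root of n.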
  25. 25. Which distribution does the estimator follow ? <ul><li>So far ... and usually .. we assume that the estimator follows the normal distribution </li></ul><ul><li>That is how we get </li></ul><ul><ul><li>68.3% => $1.0\sigma$ </li></ul></ul><ul><ul><li>90.0% => $1.64\sigma$ </li></ul></ul><ul><ul><li>95.5% => $2.0\sigma$ </li></ul></ul><ul><ul><li>99.0% => $2.58\sigma$ </li></ul></ul><ul><ul><li>99.7% => $3.00\sigma$ </li></ul></ul>[Figure: normal curve with areas marked 68.3%, 95.5% and 99.7%]
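The pairing of percentages and multipliers follows from the normal cumulative distribution function. A short sketch using Python's standard library is given below; `central_probability` is a name chosen here for illustration.

```python
from statistics import NormalDist


def central_probability(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean."""
    return 2 * NormalDist().cdf(k) - 1


# Percentages for the multipliers listed on the slide
table = {k: 100 * central_probability(k) for k in (1.0, 1.64, 2.0, 2.58, 3.0)}

for k, pct in table.items():
    print(f"within {k} sigma: {pct:.2f}%")
```

The computed values agree with the slide's figures to within rounding.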
  26. 26. The Student's t distribution <ul><li>Used when </li></ul><ul><ul><li>Standard deviation of the population is NOT known AND </li></ul></ul><ul><ul><li>Sample size is less than 30 </li></ul></ul><ul><li>When this happens we cannot use the Normal distribution but must look up the tables for the t distribution </li></ul>
  27. 27. t-distribution instead of normal [Spreadsheet figure: normal-table entries computed with =NORMSINV( D5 +0.5), t-table entries computed with =TINV( $F5 ; G$2 )]
  28. 28. Usage of t-table <ul><li>The probability that we are working with </li></ul><ul><ul><li>IS NOT the probability that the estimated value will fall inside the confidence interval </li></ul></ul><ul><ul><li>INSTEAD </li></ul></ul><ul><ul><li>It is the probability that the estimated value will fall OUTSIDE the confidence interval </li></ul></ul><ul><ul><ul><li>This probability is defined as $\alpha$ </li></ul></ul></ul><ul><ul><ul><li>Confidence = $1 - \alpha$ </li></ul></ul></ul><ul><li>Degrees of Freedom </li></ul><ul><ul><li>sample size – 1 </li></ul></ul>
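Put together, the t-based interval for the mean takes the following form. This is the standard textbook expression, written with $\hat{\sigma}$ for the sample standard deviation (divisor $n - 1$) and $t_{\alpha/2,\,n-1}$ for the table value with $n - 1$ degrees of freedom; the notation is chosen here and does not appear on the slide.

```latex
\begin{align*}
\alpha &= 1 - \text{confidence} \\
\text{degrees of freedom} &= n - 1 \\
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{\hat{\sigma}}{\sqrt{n}}
  \;<\; \mu \;<\;
\bar{x} + t_{\alpha/2,\,n-1}\,\frac{\hat{\sigma}}{\sqrt{n}}
\end{align*}
```

A two-tailed t-table, like the spreadsheet function TINV, is indexed by the total outside probability $\alpha$, so for 95% confidence the lookup value is 0.05.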
  29. 29. Binomial Distribution / Proportions <ul><li>We have a binomial distribution with p as the success probability </li></ul><ul><ul><li>20% of the student population are engineers </li></ul></ul><ul><ul><li>45% of the employee population are married </li></ul></ul><ul><li>We need to have an estimate of p </li></ul><ul><li>Estimator is the sample proportion $\hat{p}$ </li></ul><ul><ul><li>Assumptions </li></ul></ul><ul><ul><ul><li>Estimator follows normal distribution </li></ul></ul></ul><ul><ul><ul><li>Number of successes has mean $\mu = np$, so the sample proportion has mean $p$ </li></ul></ul></ul><ul><ul><ul><li>Standard error of estimate: $\sigma_{\hat{p}} = \sqrt{pq / n}$, where $q = 1 - p$ </li></ul></ul></ul>
  30. 30. Example <ul><li>Sample size n = 75 </li></ul><ul><li>Fraction graduate </li></ul><ul><ul><li>p = 0.4 </li></ul></ul><ul><li>Fraction not graduate </li></ul><ul><ul><li>q = 0.6 </li></ul></ul><ul><li>Estimate of p = 0.4 </li></ul><ul><li>Standard Error for Estimator </li></ul><ul><ul><li>$\sqrt{pq / n} = \sqrt{0.4 \times 0.6 / 75} = 0.057$ </li></ul></ul><ul><li>99% confidence interval </li></ul><ul><li>Z = 2.58 </li></ul><ul><li>LCL </li></ul><ul><ul><li>0.4 – 0.057 * 2.58 </li></ul></ul><ul><ul><li>= 0.253 </li></ul></ul><ul><li>UCL </li></ul><ul><ul><li>0.4 + 0.057 * 2.58 </li></ul></ul><ul><ul><li>= 0.547 </li></ul></ul>
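The proportion example can be verified with a few lines of Python. The inputs are the slide's figures; the variable names are illustrative, and the standard error is taken as the square root of pq/n, which reproduces the 0.057 shown.

```python
import math

n = 75             # sample size
p_hat = 0.4        # sample fraction of graduates
q_hat = 1 - p_hat  # sample fraction of non-graduates
z = 2.58           # multiplier for 99% confidence

# Standard error of the sample proportion
std_error = math.sqrt(p_hat * q_hat / n)

# Lower and upper confidence limits
lcl = p_hat - z * std_error
ucl = p_hat + z * std_error

print(f"SE = {std_error:.4f}, 99% CI: {lcl:.3f} to {ucl:.3f}")
```

Without intermediate rounding the limits come out near 0.254 and 0.546; the slide rounds the standard error to 0.057 first, which gives 0.253 and 0.547.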
