Stats chapter 10

Published in: Technology, Sports

  1. 1. Chapter 10<br />Estimating with Confidence<br />
  2. 2. 10.1 Confidence Intervals: The Basics<br />
  3. 3. Definitions<br />Statistical Inference<br />The method of drawing conclusions about a population based on a sample<br />The last major topic for this statistics course<br />The mindset: we are looking at data that come from a random sample (or an experiment) and inferring characteristics of the population<br />
  4. 4. The Basic Plan<br />In the absence of other data, the sample estimate is the estimate of the parameter<br />We would like an interval estimate<br />State the parameter that is being estimated<br />Check whether the sampling distribution is approximately Normal<br />Compute the interval using area under the Normal curve<br />Write a nice conclusion<br />
  5. 5. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />
  6. 6. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />Confidence Level C = 95%<br />
  7. 7. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />Confidence Interval (CI) = (9.11, 12.05)<br />
  8. 8. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />Lower Bound = 9.11<br />Upper Bound = 12.05<br />
  9. 9. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />EX2: We are 90% confident that the proportion that support this law is 0.74 ± 0.08<br />
  10. 10. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />EX2: We are 90% confident that the proportion that support this law is 0.74 ± 0.08<br />Confidence Level C = 90%<br />
  11. 11. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />EX2: We are 90% confident that the proportion that support this law is 0.74 ± 0.08<br />Confidence Interval (CI) = 0.74 ± 0.08<br />
  12. 12. What does an interval estimate look like?<br />EX1: We are 95% confident the mean is in the interval (9.11, 12.05)<br />EX2: We are 90% confident that the proportion that support this law is 0.74 ± 0.08<br />Point Estimate = 0.74<br />Margin of Error (ME) = 0.08<br />
  13. 13. Confidence Level<br />The value of the parameter is fixed before we start our sampling<br />Either the parameter is ‘in’ our interval or it’s not.<br />There is no probability involved here<br />“There is a 95% probability the parameter is in the interval”<br />
  14. 14. Confidence Level<br />The value of the parameter is fixed before we start our sampling<br />Either the parameter is ‘in’ our interval or it’s not.<br />There is no probability involved here<br />“There is a 95% probability the parameter is in the interval”<br />FAIL<br />
  15. 15. Confidence Level<br />Remember that our sample is just one of many samples that could have been taken<br />If good sampling technique is used, 95% (for example) of the samples would produce intervals that contain the parameter<br />We just don’t know if our sample is part of the 95% or the 5%<br />
  16. 16. Confidence Level<br />Interpretation of “90% Confidence Level”<br />“We are 90% confident that our CI contains the value of the parameter”<br />“90% of CI’s computed with this technique will contain the parameter”<br />WRONG INTERPRETATION: “There is a 90% probability our CI contains the parameter”<br />
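The "95% of samples" idea can be checked by simulation. The sketch below is illustrative only: the population values (mu = 5.0, sigma = 0.74), the sample size, and the helper name `coverage_rate` are assumptions chosen for the demonstration, not values from the slides. It draws many samples from a Normal population, builds a z-interval from each, and counts how often the interval captures the true mean.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, conf_level, trials, seed=0):
    """Fraction of simulated z-intervals that capture the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One simulated sample of size n from the assumed population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population; only the long-run capture rate matters here.
rate = coverage_rate(mu=5.0, sigma=0.74, n=35, conf_level=0.95, trials=4000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains mu or it does not; the 95% describes the method over many samples.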
  18. 18. PANIC<br />When constructing Confidence Intervals for the AP test, there is a definite checklist that readers look for<br />Use the acronym P.A.N.I.C. to help memorize the steps for constructing a confidence interval<br />P = state the Parameter<br />A = check that Assumptions are satisfied<br />N = state the Name of the interval<br />I = compute numerical values of the Interval<br />C = write a Conclusion for your findings<br />
  19. 19. PANIC for Means<br />We are going to start with Means; we’ll save proportions for later<br />Parameter<br />Define what μ measures<br />Define what x-bar measures<br />“μ = mean length of all Great White Sharks’ flukes<br />x-bar = mean length of the Great White Shark flukes in our sample of 35 sharks”<br />
  20. 20. Assumptions for CI’s of Mean<br />SRS: the data must have come from an SRS<br />Independence: the population size must be more than 10 times the sample size, N > 10n<br />
  21. 21. Assumptions for CI’s of Mean<br />SRS: the data must have come from an SRS<br />Independence: the population size must be more than 10 times the sample size, N > 10n<br />This condition must be met because we are sampling without replacement and our formula for the std dev needs to hold true<br />
  22. 22. Assumptions for CI’s of Mean<br />SRS: the data must have come from an SRS<br />Independence: the population size must be more than 10 times the sample size, N > 10n<br />Normality: the sampling distribution is approximately Normal<br />
  23. 23. More on Normality<br />We are looking for justification that the sampling distribution is approximately Normal<br />If the population is Normal, then the sampling distribution is also Normal<br />If n > 30 and the sample data is not given, state “The Central Limit Theorem tells us that the sampling distribution is approximately Normal”<br />For smaller samples, check the following:<br />(1) the Histogram is single-peaked and symmetric with no outliers<br />(2) the Normal Probability Plot is approximately linear<br />
  24. 24. Name of the Interval<br />Currently, the only Interval we will worry about is the “z-interval for sample means”<br />The interval is always named after the distribution used<br />
  25. 25. Confidence Interval Computation<br />
  26. 26. Confidence Interval Computation<br />Margin of Error<br />
  27. 27. Confidence Interval Computation<br />Standard Error: always the std dev of the sampling distribution<br />Critical Value: from the Normal (z) distribution<br />
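Written out, the pieces named on these slides fit together as follows. This is the standard z-interval for a mean with known σ, consistent with the margin of error ME = z*(σ/√n) used later in the deck; the underbrace labels are added here for orientation.

```latex
\[
\underbrace{\bar{x}}_{\text{point estimate}}
\;\pm\;
\underbrace{
  \underbrace{z^{*}}_{\text{critical value}}
  \cdot
  \underbrace{\frac{\sigma}{\sqrt{n}}}_{\text{standard error}}
}_{\text{margin of error}}
\]
```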
  28. 28. Critical Value<br />The area between –z* and +z* = Confidence Level<br />Because they are used frequently, there is a shorthand method in Table C<br />The last row gives different confidence levels<br />The row above the CL row has the z*<br />Keep critical values to 3 decimal places<br />On the AP test, this value is called z<br />
  29. 29. A General z* curve<br />
  30. 30. z* for an 80% CL<br />
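As a cross-check on Table C, z* can be computed directly from the rule "area between –z* and +z* = confidence level". The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist


def z_critical(conf_level):
    """z* such that the area between -z* and +z* equals conf_level."""
    upper_tail = (1 - conf_level) / 2
    # inv_cdf takes the area to the LEFT of the point we want
    return NormalDist().inv_cdf(1 - upper_tail)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} CL -> z* = {z_critical(level):.3f}")
```

To three decimal places this gives 1.282, 1.645, 1.960 and 2.576.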
  31. 31. Computing a CI<br />Let’s compute the 95% CI for a sample of 35, with a mean of 5.38 and a population std dev σ = 0.74<br />(1) Locate the critical value<br />z* = 1.960<br />(2) Compute S.E. and M.E.<br />SE = 0.74/√35 = 0.1251<br />ME = (1.960)(0.1251) = 0.2452<br />
  32. 32. Computing a CI<br />State the CI: 5.38 ± 0.25<br />Interval estimate: (5.13, 5.63)<br />Notice that the point estimate is the average of the upper and lower bounds<br />
  33. 33. Conclusions<br />We are 95% confident that the mean length of a great white shark fluke is 5.38 ± 0.25 ft.<br /> OR<br />We are 95% confident that the mean length of all great white shark flukes is in the interval (5.13 ft, 5.63 ft). <br />
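The shark-fluke computation above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name `z_interval` is illustrative, and the inputs (x-bar = 5.38, σ = 0.74, n = 35, 95% confidence) are the ones given on the slides.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf_level):
    """Return (lower, upper, standard error, margin of error)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sigma / math.sqrt(n)   # std dev of the sampling distribution
    me = z_star * se            # margin of error
    return x_bar - me, x_bar + me, se, me


lower, upper, se, me = z_interval(x_bar=5.38, sigma=0.74, n=35, conf_level=0.95)
print(f"SE = {se:.4f}, ME = {me:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the bounds agree with the interval (5.13, 5.63) stated in the conclusion.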
  34. 34. Calculators TI83/TI84<br />
  35. 35. Calculators TI83/TI84<br />Your TI is very efficient for finding these intervals! This doesn’t excuse you from the mathematics, of course.<br />[stat] -> “TESTS” -> “Zinterval”<br />Inpt: “Stats” (if your data is in L1, you can use “Data”)<br />Enter values for σ, x-bar, n, and C-Level.<br />Voilà!<br />
  36. 36. Calculators TI89<br />Run the “Stats/List Editor” APP<br />[2nd] -> [F2] (F7) -> “Zinterval”<br />Input Method = “Stats”<br />Choose “Data” if all the observations are in a List<br />Enter values for σ, x-bar, n, and C-Level.<br />Voilà!<br />
  37. 37. Behavior of Margin of Error<br />ME = z*(σ/√n)<br />In practice, we would like to minimize ME.<br />(1) Decrease z*<br />This also means decreasing our CL!<br />(2) Increase n<br />This is usually a trade-off, since obtaining large samples could be more expensive/time consuming<br />(3) Decrease σ<br />Not really an option. σ is a known quantity.<br />
  38. 38. Sample Size<br />By using algebra on ME = z*(σ/√n), we can get a formula for the minimum sample size for a given ME: n ≥ (z*·σ/ME)²<br />You should always round the sample size up in this calculation<br />Note: ME is produced by sampling variability; it has nothing to do with “sloppy work”<br />
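Solving ME = z*·σ/√n for n gives n ≥ (z*·σ/ME)². The sketch below applies that formula and rounds up. The target margins (0.10 and 0.05) are hypothetical values picked for illustration; σ = 0.74 is borrowed from the shark example, and the function name is made up.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(sigma, margin, conf_level):
    """Smallest n whose z-interval margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Always round UP: rounding down would leave ME slightly too large
    return math.ceil((z_star * sigma / margin) ** 2)


n_needed = min_sample_size_mean(sigma=0.74, margin=0.10, conf_level=0.95)
n_half_margin = min_sample_size_mean(sigma=0.74, margin=0.05, conf_level=0.95)
print(n_needed, n_half_margin)
```

Because n sits under a square root in the margin of error, halving ME takes roughly four times the sample size. That is the cost trade-off mentioned under "increase n".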
  39. 39. 10.2 Estimating a Population Mean<br />
  40. 40. Student’s t-distribution<br />The confidence intervals computed in the previous section assumed that we knew σ.<br />It doesn’t seem likely that we would know σ and not know the value of μ<br />When σ is unknown, we can no longer use the Normal distribution<br />
  41. 41. Student’s t-distribution<br />The t-distribution is to be used when σ is unknown.<br />The t-distribution is very similar to the Normal distribution, with a key difference:<br />The shape of the distribution changes based upon the sample size/degrees of freedom (df)<br />Large samples have a tall peak and thinner tails<br />Small samples have a small peak and thicker tails.<br />
  42. 42. Student’s t-distribution<br />Degrees of Freedom = n -1<br />
  43. 43. Using the t-distribution<br />The CI is found using PANIC<br />The Assumptions are the same as for a z-interval<br />Since sample sizes tend to be small, you will most likely need to check the histogram (symmetric, no outliers) and the Normal probability plot (approximately linear)<br />We cannot use the t-distribution when there are outliers!<br />The Name of the interval is “1-sample t-interval for mean”<br />
  44. 44. Using the t-distribution<br />Interval is computed with x-bar ± t*(s/√n)<br />df = n – 1<br />t* is from Table C or your calculator<br />s is the sample’s standard deviation<br />Uses the sample std dev to approximate σ<br />
  45. 45. Upper Tail Area<br />
  46. 46. Upper Tail Area<br />This is the Upper Tail Area<br />
  47. 47. Using the t-distribution<br />
  48. 48. Using the t-distribution<br />Using TI84<br /> [2nd] -> [vars](DIST) -> “invT”<br />“invT(1-Upper Tail Area, df)”<br />Where “Upper Tail Area” = (1-CL)/2<br />The argument 1 − Upper Tail Area is the area to the left of the right critical value.<br />ALTERNATIVELY, you may use “−invT(Upper Tail Area, df)”<br />
  50. 50. Using the t-distribution<br />Using TI84<br /> [2nd] -> [vars](DIST) -> “invT”<br />“invT(1-Upper Tail Area, df)”<br />Where “Upper Tail Area” = (1-CL)/2<br />The argument 1 − Upper Tail Area is the area to the left of the right critical value.<br />ALTERNATIVELY, you may use “−invT(Upper Tail Area, df)”<br />Don’t forget the negative!<br />
  51. 51. Using the t-distribution<br />
  52. 52. Using the t-distribution<br />Using TI89<br />From home screen:<br />[catalog] -> [F3](FlashApps) -> inv_t…TIStat<br />“tistat.invT(1-Upper Tail Area, df)”<br />Where “Upper Tail Area” = (1-CL)/2<br />The argument 1 − Upper Tail Area is the area to the left of the right critical value.<br />ALTERNATIVELY, you may use “-tistat.invT(Upper Tail Area, df)”<br />
  54. 54. Using the t-distribution<br />Using TI89<br />From home screen:<br />[catalog] -> [F3](FlashApps) -> inv_t…TIStat<br />“tistat.invT(1-Upper Tail Area, df)”<br />Where “Upper Tail Area” = (1-CL)/2<br />The argument 1 − Upper Tail Area is the area to the left of the right critical value.<br />ALTERNATIVELY, you may use “-tistat.invT(Upper Tail Area, df)”<br />Don’t forget the negative!<br />
  55. 55. Using the t-distribution<br />ALTERNATIVE TI89 titanium<br />[APPS] -> “Stat/List Editor” -> [F5] (distrib) -> “Inverse” -> “Inverse t…” <br />“Area: 1- Upper Tail”<br />Upper Tail = (1 – CL)/2<br />“degrees of freedom, df: df”<br />This takes longer to get to, but the menu “guides” you through <br />
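  56. 56. Table C<br />Occasionally, you will need to find the t* for a df that does not appear in the chart.<br />When this happens, use the nearest smaller df that does appear (the greatest listed df that does not exceed yours), in the same column<br />This usually means “use the t* that is in the line above where the desired t* should be”<br />This gives a slightly larger t*, so the interval is a little wider and errs on the safe side<br />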
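The "use the line above" rule can be sketched as a lookup that rounds df down to the nearest listed row. The dictionary below holds a few rows of the 95% column of a standard t table; it is only an excerpt typed in for this illustration, and the names `T_STAR_95` and `conservative_t_star` are hypothetical.

```python
from bisect import bisect_right

# Excerpt of a t table: df -> t* for 95% confidence (upper tail area 0.025)
T_STAR_95 = {30: 2.042, 40: 2.021, 50: 2.009, 60: 2.000, 80: 1.990, 100: 1.984}


def conservative_t_star(df, table=T_STAR_95):
    """Return (row used, t*) for the largest listed df not exceeding df."""
    rows = sorted(table)
    idx = bisect_right(rows, df) - 1   # index of the last row <= df
    if idx < 0:
        raise ValueError("df is below the smallest row in this excerpt")
    used_df = rows[idx]
    return used_df, table[used_df]


used_df, t_star = conservative_t_star(45)
print(f"df = 45 is not listed; use df = {used_df}, t* = {t_star}")
```

For df = 45 the lookup falls back to the df = 40 row, whose t* (2.021) is slightly larger than the exact value would be.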
  57. 57. Example: 1 sample t-interval<br /> Problem 10.30<br /> The amounts of vitamin C (mg/100 g) in a random sample of 8 units of CSB are given as:<br /> 26, 31, 23, 22, 11, 22, 14, 31<br /> Construct a 95% confidence interval for the mean amount of vitamin C in CSB.<br />
  58. 58. Example: 1 sample t-interval<br />Parameter<br />μ = the mean amount of vitamin C in CSB produced at the factory<br />x-bar = the mean amount of vitamin C in a sample of n = 8<br />
  59. 59. Example: 1 sample t-interval<br />Assumptions<br />SRS<br />‘Our problem states that we have a random sample’<br />Independence<br />‘10n = 80 < N; we can infer that more than 80 units of CSB are produced in the factory’<br />
  60. 60. Example: 1 sample t-interval<br />Assumptions<br />Normality<br />‘The histogram is symmetric w/ no outliers’<br />‘The Norm Prob Plot is approx linear’<br />‘We have good evidence that our sampling distribution is approximately Normal’<br />[Figures: Histogram of x (axis from 10 to 35); Normal Probability Plot (z)]<br />
  61. 61. Example: 1 sample t-interval<br />Name of Interval<br /> ‘We will compute a 1-sample t-interval for a mean’<br />
  62. 62. Example: 1 sample t-interval<br />Name of Interval<br /> ‘We will compute a 1-sample t-interval for a mean’<br />Interval Calculation<br />
  63. 63. Example: 1 sample t-interval<br />Name of Interval<br /> ‘We will compute a 1-sample t-interval for a mean’<br />Interval Calculation<br />Conclusion<br /> ‘We are 95% confident that the mean amount of vitamin C in a unit of CSB produced in the factory is between 16.487 and 28.513 mg/100 g’<br />
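The interval calculation for this example can be checked as follows. The sketch uses the data from Problem 10.30 and takes t* = 2.365 from a t table (95% confidence, df = 7), since the Python standard library has no inverse-t function. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

data = [26, 31, 23, 22, 11, 22, 14, 31]   # vitamin C, mg/100 g
T_STAR = 2.365                            # t table: 95% confidence, df = n - 1 = 7

n = len(data)
x_bar = mean(data)          # sample mean
s = stdev(data)             # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)       # standard error of the mean
me = T_STAR * se            # margin of error
lower, upper = x_bar - me, x_bar + me

print(f"x-bar = {x_bar}, s = {s:.3f}, SE = {se:.3f}, ME = {me:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The bounds agree with the 16.487 and 28.513 mg/100 g quoted in the conclusion.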
  64. 64. 10.3 Estimating a Population Proportion<br />
  65. 65. Estimating a Proportion<br />As with means, we are going to estimate the population proportion based on the proportion in a sample: p-hat = x/n<br />x = # of positive responses<br />n = total number of responses<br />We will again use the PANIC procedures<br />Note: nothing is averaged. We are not looking at the average proportion from many samples<br />
  66. 66. Parameter<br />Some typical parameters:<br />‘p = prop of people in CA who support the proposition<br /> p-hat = proportion of people in a sample (n = 35) who support the proposition’<br />‘p = prop of students at THS who ride the bus<br /> p-hat = proportion of students in a sample (n = 14) from THS who ride the bus’<br />
  67. 67. Assumptions<br />Simple Random Sample<br />SRS must be either stated or inferred<br />Independence<br />Because you are usually sampling without replacement: N > 10n<br />Normality<br />n·p-hat > 10<br />n·q-hat > 10, where q-hat = 1 − p-hat<br />
  68. 68. Name of the Interval<br />“1-proportion z interval”<br />Unlike for means, you will always use the Normal curve when dealing with proportions!<br />
  69. 69. Interval Calculation<br />z* is calculated as before<br />Use Table C<br /> OR<br />“ − invNorm( (1 – C) /2 )”<br />This calculation is similar to the calculations for the last section<br />
  70. 70. Interval Calculation<br />z* is calculated as before<br />Use Table C<br /> OR<br />“ − invNorm( (1 – C) /2 )”<br />This calculation is similar to the calculations for the last section<br />Margin of Error (ME)<br />
  71. 71. Interval Calculation<br />z* is calculated as before<br />Use Table C<br /> OR<br />“ − invNorm( (1 – C) /2 )”<br />This calculation is similar to the calculations for the last section<br />Standard Error (SE)<br />
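For reference, the standard 1-proportion z-interval that these labels describe has the form below, with q-hat = 1 − p-hat. The underbrace labels are added here for orientation.

```latex
\[
\hat{p}
\;\pm\;
\underbrace{
  z^{*}\,
  \underbrace{\sqrt{\frac{\hat{p}\,\hat{q}}{n}}}_{\text{standard error (SE)}}
}_{\text{margin of error (ME)}},
\qquad \hat{q} = 1 - \hat{p}
\]
```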
  72. 72. Conclusion<br />Some Examples:<br />We are 90% confident that the proportion of voters in CA who support the proposition is 0.34 ± 0.03<br />We are 95% confident that the proportion of students at THS who ride the bus is in the interval (0.39, 0.44)<br />
  73. 73. Sample Size<br />The relevant formula (from ME) for the sample size is: n ≥ (z*/ME)²·p*·q*<br />p* and q* = 1 − p* are guessed values of the proportion<br />If there is no previous data or study, use p* = q* = 0.5 (this maximizes p*·q*, so it gives the largest, most conservative sample size)<br />As before, you are to round the sample size up to the nearest integer<br />
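Solving ME = z*·√(p*·q*/n) for n gives n ≥ (z*/ME)²·p*·q*. The sketch below applies it; the target margin of 0.03 and the prior guess of 0.34 are hypothetical inputs for illustration, and the function name is made up.

```python
import math
from statistics import NormalDist


def min_sample_size_prop(margin, conf_level, p_guess=0.5):
    """Smallest n giving a margin of error of at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    q_guess = 1 - p_guess
    # Round UP so the achieved margin does not exceed the target
    return math.ceil((z_star / margin) ** 2 * p_guess * q_guess)


n_no_prior = min_sample_size_prop(margin=0.03, conf_level=0.95)            # p* = 0.5
n_with_prior = min_sample_size_prop(margin=0.03, conf_level=0.95, p_guess=0.34)
print(n_no_prior, n_with_prior)
```

The default p* = 0.5 always gives the largest answer, because p*·q* peaks at 0.25 there. A guess away from 0.5 needs fewer respondents.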
  74. 74. TI83/84<br />
  75. 75. TI83/84<br />[stat] -> “TESTS” -> “1-PropZInt”<br />“x: number of successes” (this can be computed as p-hat × n)<br />“n: number of people in sample”<br />“C-Level: confidence level”<br />“Calculate” and you are done<br />You still need to fully write up “PANIC” procedures<br />
  76. 76. TI 89<br />
  77. 77. TI 89<br />[APPS] -> “Stat/List Editor”<br />[2nd] -> [F2](F7) -> “1-PropZInt”<br />“Successes x: # of successes”<br />“n: number of people in sample”<br />“C-Level : confidence level”<br />“Calculate” and you are done<br />You still need to fully write up “PANIC” procedures<br />
  78. 78. Example 1-PropZInt<br /> The 2004 Gallup Youth Survey asked a random sample of 439 US teens aged 13 to 17 whether they thought young people should wait to have sex until marriage. 246 said “Yes.” Let’s construct a 95% confidence interval for the proportion of all teens who would say “Yes”<br />
  79. 79. Parameter<br />p = the proportion of all teens in the US aged 13-17 who would answer “Yes” to the survey<br />p-hat = the proportion of teens in the survey of 439 who answered “Yes” to the survey<br />
  80. 80. Assumptions<br />SRS: We are told in the problem that the survey was a random sample.<br />Independence: there are more than 10(439) = 4390 teens aged 13-17 in the US<br />Normality: n × p-hat = 246 and n × q-hat = 193, both greater than 10<br />The sampling distribution is approx. Normal<br />(Note that these are just #successes and #failures)<br />
  81. 81. Name of Interval<br />We are constructing a “1 proportion Z interval”<br />
  82. 82. Construction of Interval<br />
  83. 83. Conclusion<br />“We are 95% confident that the true proportion of 13-17 year olds who would answer “yes” when asked if young people should wait to have sex until they are married is between 0.514 and 0.606.”<br />
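The Gallup example can be checked with a short calculation. This is a sketch using the counts from the slides (246 "Yes" out of 439); the function name `one_prop_z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(successes, n, conf_level):
    """Return (p_hat, lower, upper) for a 1-proportion z-interval."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    me = z_star * se                          # margin of error
    return p_hat, p_hat - me, p_hat + me


p_hat, lower, upper = one_prop_z_interval(successes=246, n=439, conf_level=0.95)
print(f"p-hat = {p_hat:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Carried at full precision, the upper bound comes out near 0.607. The 0.606 in the conclusion is consistent with rounding p-hat to 0.56 before adding the margin of error (0.56 + 0.046).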
